Partitioned for Performance

If you’ve done any of the Microsoft training or certifications in the last few years, you’ll probably noticed they’re really pushing for the use of partition tables-it’s appeared in at least 3 of my exams for the MSCA/MSCE. In my opinion, that’s a probably a good move as I believe partitioned tables are another one of those great features on SQL that’s not used as often as it should despite the many benefits it brings.

In my last blog on partition tables, I talked about the benefits from an archiving and management perspective. In this one, I’ll go over the query performance that can be achieved. If partitioning is something new to you, it’s worth thinking it as dividing large tables into smaller ones-the smaller the table, the fewer the number of records and reads and the faster the performance. The screen shot shows how this works and basically what we’ll be doing.

SQL Partitioned Tables
Partitioned vs Non Partitioned Table

We’ll be using the sample WideWorldImportedDW database from Microsoft for our demo,  a copy of this new database can be found here.  I’ll assume you know how to restore or attach a database and focus on the codes and what is happening as it is run. For those that want to follow along, the codes are available here.

To get things started, let’s examine the initial state of the DB after we’ve just restored it. For this table (Sales), note how it is already partitioned (7 partitions) and the database itself comprises of several file groups already. In comparison, database by default  on creation have just one.

Partitioned Tables-SQL Server
The fastest way to see if a table is partitioned-right click and select properties and Storage. As seen here, this table is already partitioned

Below are the properties for the database. See that it already has four file groups (including the logs). Also note the number of file groups need not correspond to the number of partitions.

FileGroupInitial
The Files page of the table properties shows details of the files

For the demo, we’ll create a table based on the Fact.Sales fact table as this table is already partitioned and I want to show this is done, but also as it has just enough data for our analysis and also recreating a copy means we can strip out the noise from the column-store index on this table and focus on the execution plan as it relates to our newly created partitions.

First off, create some new Filegroups-five to be exact. IF you go to the folder of the C-drive below after the code is run, you’ll see our new files -this is our database (files). Note however you do not need to use multi Filegroups to use partitioning-the extra Filegroups is to further optimize performance. We could well partition the table using the same file/filegroup had we wanted to.

CreateFileGroup

Now go back to the database properties and note the new Filegroups we just created, in contrast to what we saw earlier.

created

What we’ve done is essentially split our database into smaller chunks (or rather, add extra chunks). Next, create the three things we need-partition function, partition scheme and partitioned table. The partition function essentially sets the boundaries of where our data will reside, here, we’re using dates (years) but this can be any number of data types. The partition scheme basically binds the file groups for our partitions. For the partitioned table, the only requirement is the partitioning column (Invoice Date Key) must form part of the primary key.

CreateTable

Next, populate the table from the base table so we have data to play with. We’ll also create a secondary table by using a select into statement for comparison. This secondary table will have the same structure but won’t be partitioned and used for comparison.

As the focus is on partitions (and not index), we won’t worry about  recreating the clustered index on this table-having one will make no difference to the partitions read which is our area of focus.

CreateWorking

With our table populated, we can now check where our data now sits. The first query shows the count of records by year and the second query confirms which partition they are located. Noticed how the records are split and stored in different partitions.

partitioncheck

Now run a simple select on the non-partitioned table. If we look at the query plan of this query, we can see the query returned 29,458 records but it had to read (scan) the entire table with 228,265 records.

notpart

Now run the same query but on our partitioned table and we can see/confirm:

  1. The table is partitioned and we only had to read from one partition
  2. We got 29,458 records as above but only read 29,458 records as all the records were contained in this one partition (partition 5 for 2016 year) in contrast to the 228,265 records on the non partitioned table above.

part

Now lets expand our date parameter for our second query on the partitioned table and see what we get. This time:

  1. The table is partitioned as before but this time, we had to read from two partitions
  2. We had to read more rows this time for the two partitions (101,356) as our query parameter spanned two years.

CrossPart

What about queries with joins? Same logic applies and so you get the same outcome-the date parameter meant we only had to read from the one partition and not the entire table this time.

Joinsx

Can we improve the performance further? Absolutely, just create a simple non-clustered covering index. Once created, the optimiser now can perform a much more efficient Index Seek instead of doing a table scan and we now see the number of reads further reduced from 71,898 to 61.

IndexSeek

And finally, the most important thing to remember- while partitioning can be great for many reasons, it is not the panacea to all our problems. Primarily, if your query parameter(s) don’t align with your partition function, it could still read from one to all your file groups as the query below shows. Depending on your how your data is distributed and partitioned, the extra overhead could actually make your query slower than a non partitioned table.

SQL Server 2017 now supports up to 15,000 partitions, up from 1,000 for the SQL 2012 version. Clearly, Microsoft sees this as the way to go on handling high volumes and the proliferation of the ‘Big Tables’ and while you may not experience much performance gains on small tables like the one we just did for this demo (hence why we didn’t review the statistics) with just over two hundred thousand plus rows of data, you’ll certainly feel it when the volumes increases and runs into the billions as it’s becoming ever increasingly common in many workplaces. Hopefully, this blog has shown you how this is achieved.

For further reading, I strongly encourage you to read this article from Microsoft which go much more into detail.

allparts

 


Leave a Reply