I was considering building a micro course (1 hr) on performance tuning for SQL Server. Instead, I will build one highly focused on tuning Data Vault models (raw and business vault). That said, here are some incredible video links that I think you should take advantage of for tuning your Microsoft SQL Server (Azure as well):
- Brent Ozar Live - Performance Tuning
- About 27 minutes in, he covers COLUMNSTORE compression indexing, with a complete example showing up to 90% compression on repeating values
- I *highly* recommend you apply this technique to Satellites and Links where repeating values are huge, especially if you have a WIDE satellite, or replicated business keys (even hash keys), load dates, record sources, and other low-cardinality data.
- Brent Ozar - What to do when SQLServer is Slow
- Pinal Dave Live - SQLServer Perf Tuning Made Easy
- This video is incredible - how creating an index can SLOW your queries down... amazing
- Pinal Dave, 3 Common Mistakes to Kill SQL Server Performance
- Columnstore Indexing
- Something you SHOULD investigate, but understand before you apply it
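Brent's columnstore segment maps directly onto Data Vault structures. A minimal sketch of the technique (table and column names here are hypothetical, not from the videos):

```sql
-- Hypothetical wide satellite; a clustered columnstore index rewrites the
-- whole table into compressed columnar segments. Repeating hash keys,
-- load dates, and record sources compress extremely well -- often in the
-- range of the 90% figure mentioned above.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_SatCustomer
    ON dv.SatCustomer;

-- Check how much space you actually reclaimed:
EXEC sp_spaceused 'dv.SatCustomer';
```

Compare `sp_spaceused` output before and after the conversion to see your real compression ratio on your data.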
These presentations are awesome. I've been doing performance tuning on massive systems for years, from Oracle to SQL Server to Teradata and yes, even IBM DB2. Snowflake is auto-tuned (internally), so there it's all about the data layout. Yes, more is to come from me on these subjects.
For now, I will add the following tips to the above video suggestions:
- UPGRADE - make sure to move to AT LEAST SQL Server 2016; if you can, move all the way to 2019. It has the best optimizer and tons of enhancements for large-scale data sets.
- Move SQL Server to LINUX if at all possible; it will make your SQL Server scream (as long as you tune the Linux box properly and give SQL Server the memory it needs to run). By the way, on Linux you can and SHOULD change the default filesystem block size. On Windows it won't make much difference, because at the OS level the block size is STILL 8k.
- On SQL Server Azure or 2019, change your staging and Data Vault table structures over to COLUMNAR storage (this requires a CREATE COLUMNSTORE INDEX command) - more instructions and how-to in the upcoming micro course.
- CHANGE YOUR RECOVERY MODEL TO BULK_LOGGED!! For a data warehouse YOU DO NOT need the FULL recovery model's level of logging; bulk logging is good enough, as all loading and query processes should be fully restartable (IF you follow the Data Vault standards).
- Drop ALL indexes BEFORE loading (this is even in the video); it is always faster to rebuild the indexes AFTER loading. And if your data set is too big, learn how to partition, and create indexes on new partitions only.
- Learn how to use COMPUTED COLUMNS (SQL Server's version of function-based columns) instead of function-based WHERE clause predicates. IF you have an UPPER, a substring, or something else, build a computed column, then index it. IT WILL be faster than trying to evaluate that function in a SQL query WHERE clause.
- NEVER EVER EVER USE INDEX HINTS... THEY KILL PERFORMANCE. They may be faster for the first 10k rows, or for testing, but they die under volume. If you feel like you need index hints, then you have an underpowered platform, a misconfigured platform, or a poor design in your data model that needs to be optimized.
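The bulk-logging tip above is controlled through the database recovery model in SQL Server. A minimal sketch (the database name is hypothetical):

```sql
-- Switch a warehouse database to BULK_LOGGED so bulk operations
-- (BULK INSERT, SELECT INTO, index rebuilds) write minimal log records.
ALTER DATABASE DataVaultDW SET RECOVERY BULK_LOGGED;

-- Confirm the current recovery model:
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'DataVaultDW';
```

Remember that point-in-time restore is limited under BULK_LOGGED, which is exactly why your loads must be restartable.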
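The computed-column tip can be sketched like this (table, column, and index names are illustrative assumptions, not from the post):

```sql
-- WHERE UPPER(LastName) = 'SMITH' forces a scan, because the function
-- hides the column from the index. Instead, materialize the expression:
ALTER TABLE dv.SatCustomer
    ADD LastNameUpper AS UPPER(LastName) PERSISTED;

-- Then index the computed column:
CREATE NONCLUSTERED INDEX IX_SatCustomer_LastNameUpper
    ON dv.SatCustomer (LastNameUpper);

-- Now this predicate can use an index seek:
SELECT * FROM dv.SatCustomer WHERE LastNameUpper = 'SMITH';
```

The design choice: you pay a little storage and write cost once, instead of paying the function evaluation on every row of every query.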
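For the drop-indexes-before-loading tip, one common pattern is to disable rather than drop, so the index definition survives. A hedged sketch (names are hypothetical):

```sql
-- 1. Disable the nonclustered index before the bulk load;
--    this keeps the definition but removes the maintenance cost:
ALTER INDEX IX_SatCustomer_LoadDate ON dv.SatCustomer DISABLE;

-- 2. ... run the bulk load here ...

-- 3. Rebuild after loading (a REBUILD also re-enables a disabled index):
ALTER INDEX IX_SatCustomer_LoadDate ON dv.SatCustomer REBUILD;
```

Never disable the clustered index this way - that makes the whole table inaccessible until it is rebuilt.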
That's all for now; there are a bunch more hints and tips, but I hope you enjoy this - and IF you apply this and see a performance increase, please reply with the results! Measure your query BEFORE the changes, and measure it again after. These changes will affect run-times at volumes as small as 1 million rows.
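To do that before/after measurement, the simplest approach is the built-in session statistics (the query below is just a stand-in for your own):

```sql
-- Turn on per-query timing and I/O reporting for this session:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Run the query you are tuning (illustrative example):
SELECT COUNT(*) FROM dv.SatCustomer WHERE LoadDate >= '2019-01-01';

-- The Messages tab now shows logical reads plus CPU and elapsed
-- milliseconds; record these numbers before and after your changes.
```

Run each version a few times and compare warm-cache numbers, so you are measuring the plan and not the disk.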
Merry Christmas and Happy Holidays and Happy New Year!
Don't forget to LIKE or REPLY with your tips as well.