Normalization Data Models

Technical Tip: Clustered Hash Keys – Don’t Do it!

This is a short tip for everyone out there implementing Data Vaults.

We’ve heard throughout the years about people trying to CLUSTER the Hash Keys in Data Vault on SQLServer and other databases that support clustering.   This is a bad idea and should be avoided at all costs.

What is a Clustered Key?

Generally a Clustered Key means:  KEEP THE DATA IN PHYSICAL SORTED ORDER when inserting by Clustered Key.  This, co-locates data sets together that have similar keys.   HASH KEYS ARE VARIANT, and are essentially a hexadecimal representation of a mathematical outcome.  They are computed to be distributed over the length of bits available for the length of the hash.  This, makes them a very bad choice for clustering in SQLServer or any other database.  Click the links below to learn more about clustered keys.

When are Clustered Keys a GOOD IDEA?

Clustered Keys are a good idea WHEN you have true business values that you join on all the time, or where clause fields that you issue values for all the time, AND you need the data to be held together on disk to minimize I/O operations.   Clustered Keys are a good idea when you have increasing sequence values, then the optimizers realize this and most can run mathematical algorithms for min/max, range, and set operations.

Otherwise: STAY AWAY FROM CLUSTERED KEYS – and never, ever put a cluster on a HASH KEY or HASH DIFFERENCE.

  • Browse Categories

  • The Latest News—Unlocked and Straight to Your Inbox.

    Thanks for reading. Subscribe to get the latest blogs, podcasts and notifications.

  • The Latest News—Unlocked and Straight to Your Inbox.

    Thanks for reading. Subscribe to get the latest blogs, podcasts and notifications.

    View More

    Explore the strategic importance of evolving data relationships and their impact on data-driven insights in our latest blog post. Learn how shifts in business rules require significant re-engineering, affecting data management and decision-making. Essential reading for executives and business analysts, this discussion highlights the need for adaptable data practices to maintain competitive advantage in a dynamic business environment.
    Scroll to Top