Clear all

Do I need Hash Keys?

Posts: 482
Topic starter
Joined: 3 years ago

On some platforms no.  Teradata for instance will Hash your natural keys using their own internal algorithm.  If you want to know more, you should read about Teradata's PRIMARY INDEX definition.  They generate a location based hash, and resolve collisions internally using a Linked-List resolution.

Snowflake DB - no, performance with Natural business key columns (including multi-field/attribute) seem to perform better than Hash Key storage on SnowflakeDB.  Our community member: @kentgraziano can comment more on this going forward.

For all the rest of the platforms: Hash Keys are recommended, in order to take full advantage of high speed large scale parallel data loading mechanisms.

A note for all of you: Sequence numbers fail at an upper limit of hardware and data size.  If you really want to see this, push your own Relational DB system as high as possible with a simple Parent-Child construct.  Measure the performance of the load as the data set size grows, and the parent table causes bottlenecks with the "lookup" during the loading process. 

However, if you are always on a smaller system (<10TB) approximately, then you won't see this effect.  Keep in mind that 10TB in 2019 will become 30TB in 2020, and grow as hardware gets faster and more scalable.  And of course on Snowflake DB with elastic compute, there is "no upper limit", but they still show loading times causing issues (in 2019) with tables containing 100 billion records in them.

15 Replies