Hi all, I have a question regarding business keys. I am working with colleagues who are using DV2.0 for the first time. They are struggling to understand why we have separate hubs and satellites, which I've tried to explain the benefits (obviously poorly!)
More importantly, they are looking for avenues to simplify and standardise the design patterns and have proposing that a Hub should follow a simple physical design pattern that is the same for every Hub, that is containing (mostly) a hashkey, load date and a generic business key column. It is suggested that only a concatenated business key be stored in the Hub, with attributes converted to a string, and the original attributes in native format stored in the child Satellite.
One of the reasons given was a genuine fear that because the source systems are complex to understand, they may (indeed they thought likely) define the business key incorrectly for a given entity. As such, by having a generic column, they believe it would not matter if in the future the understanding of the business key evolves as the existing pattern can still be reused.
This does make me err towards stressing the importance of performing adequate analysis in order to articulate the business key correctly in the first place and that if done effectively the fear of "getting it wrong" it not as great as what may be feared, but the perception is genuine.
This does make me wonder what the advice is around evolving the Hub model at a later date if it found to be inaccurate and a correction is required. I am presuming this would be as simple as defining a new Hub (and associated Satellites and Links) and stopping the load of data into an existing tables - and that this is preferred over refactoring an existing Hub, for auditability purposes. The argument that a single Hub with a generic BK column prevents work doesn't seem to stack up - there will need to be refactoring of the load process from the staging layer anyway to point the correct attributes to the target Hub. I'm also a little nervous that once the hashkeys are generated and locked down into a primary key of a physical table, changing the BK rules could lead to duplicates in the future and means you are unable to reload the existing hub in a different environment if the hash rules have evolved. This issue cascades down to the Link tables that rely in the BK of the related Hubs in its generation. Keeping the evolved Hub in a different physical table seems cleaner. In essence, if there is a genuine misunderstanding of the data source, there is no avoiding rework anyway, so evolve the model in a safe way.
In general, I've been reading through many threads in this forum which suggests replication (not movement) of individual business key components to a Satellite could be helpful in some circumstances. Also, concatenation of business key components in the Hub when in addition to the raw components, is acceptable if the storing the concatenated value is essential. So, I'm feeling a little accommodating to the idea of having a composite key in addition to the raw values in the hub, as well as even replicating the business key attributes to the satellite - after all the hashdiff is calculated on all of the attributes, including the business key components. But this is not a reason to use a generic BK in the Hub.
As always, grateful for your insights on this, especially the use of the generic BK column to standardise the hubs and to support evolving the BK in the future where it was misunderstood, so that I can nudge the model design in the right direction - otherwise, we have a danger of having a mosaic of different patterns being employed in the physical design of the raw DV.