In our BI stack, our sources are replicated in real time into our data lake, which then contains the full change history of all source tables. This is used as input to our Raw Vault. Because these sources are loaded into the Data Vault in batches, we will always encounter intra-batch changes. On two occasions ( https://datavaultalliance.com/discussions/members-tech-discussions/processing-a-cdc-log-adds-complexity-to-the-loading-template-for-satellites/#post-326 as well as our DV2 course) I have seen/heard the recommendation to handle this by introducing a sub-sequence.
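To make sure we are talking about the same thing: here is a minimal sketch (Python, with illustrative column names that are not from any particular DV2 template) of how I understand the sub-sequence to work, i.e. numbering intra-batch changes per hash key in staging:

```python
from collections import defaultdict

def add_sub_sequence(rows):
    """Assign a per-key sub-sequence to intra-batch changes.

    `rows` is a list of dicts, each carrying a 'hash_key', already ordered
    by the CDC change timestamp. Column names are purely illustrative.
    """
    counters = defaultdict(int)
    for row in rows:
        counters[row["hash_key"]] += 1
        row["sub_seq"] = counters[row["hash_key"]]
    return rows

batch = [
    {"hash_key": "A", "attr": 1},
    {"hash_key": "A", "attr": 2},  # intra-batch change for key A
    {"hash_key": "B", "attr": 9},
]
add_sub_sequence(batch)
# key A gets sub_seq 1 and 2; key B gets sub_seq 1
```

In SQL staging this would typically be a window function partitioned by the hash key and ordered by the CDC timestamp.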
I do, however, have some concerns about this on which I would appreciate feedback. I am also assuming that we want to use the same stage and load templates, rather than introducing custom ones for various purposes.
- How does one split tables into satellites with different rates of change in this scenario? If the sub-sequence is required, the rate of change effectively becomes the same for the whole source table: even if one field changes frequently and the rest do not, the sub-sequence will still be unique across all changes.
- I also have a conceptual concern. All of our source tables behave this way, so every satellite would carry a newly introduced sub-sequence field, which feels wrong. There are also other scenarios where you might need a subtype or sub-sequence anyway, in which case the satellite would need both fields as additional primary key columns.
Your comments and feedback will be much appreciated.