I'm new to Data Vault and just reviewed the updated Modeling standards. One of the items I noticed - at the end of the standards - was that the Satellite Load End_Dates and Hub and Link Last Seen Dates have been DEPRECATED. This caught me a little off guard as I had just reviewed the "book" and it was still a part of that (at least my Kindle version had it).
I understand the rationale is to avoid update logic - and to basically only perform appends (inserts). That said - I'm struggling with what to use instead to understand when something is no longer in the source. Here are some scenarios I'm trying to resolve:
- Satellite build from a full load. The "creation" date is captured naturally as part of the design - and even updates over time based on the same business key are captured (and we even know when the hub was created). What I'm confused on is how to capture the fact this "row" no longer exists in the source. I can use a PIT table to capture daily snapshots - so that I can derive it no longer exists by the fact it's not in the ongoing PIT records - but that does mean the PIT is no longer derivable from the hub and satellites.
- Many to Many Link. Again we know when we initially found this connection (whether in the source or derived) but I'm not sure how to know when it no longer exists in the source. If this was a one to many we could build an assumption based on the most recent for one side of the link being the correct one - but that wouldn't work for a many to many. The only idea I could think of was to build a "status" type satellite that captures the "delete" event.