On today’s episode of Data Vault Mysteries, we discuss Driving Keys and Effectivity Satellites!
One of the most complex and often misunderstood Data Vault 2.0 artifacts is the beloved Effectivity Satellite. Some argue it doesn’t fit the Insert-Only architecture of Data Vault 2.0. Still, I can assure you, through a detailed example, that an Effectivity Satellite does indeed fit the Data Vault 2.0 architecture. I will show you where and why you would even consider using Effectivity Satellite and why you would not.
Although Data Vault 2.0 is INSERT-ONLY, why then do we have a start and end date in an Effectivity Satellite?
To understand this, let’s first discuss relationship effectivity. Hypothetically if two business entities we are tracking have a relationship, an account and a product (for instance) can be related in a one-to-one or one-to-many cardinality. It stands to reason that if an account begins a relation with a product, the relationship itself will have a start date with no end-date value. To draw a parallel with slowly changing dimension (type 2), this means the end-date is assigned a high-date like ‘9999-12-31’, meaning that until the relationship changes, when we query this relationship, it is the effective relationship until the end of time.
Note: we are using natural keys for a simpler representation of the concepts!
However, when the relationship changes, the account is now related to a different product (or products). We then have to end what we previously knew about this relationship. In a slowly changing dimension (Kimball style modelling), we would have updated that record and inserted the new effective (or active) record.
This can occur in an operation source as well and not just a data warehouse. Imagine if this effectivity is being tracked in a system like master data management (MDM). MDM uses an enterprise-type data model with relationships between entities tracked with start and end dates. When sourced and loaded to Data Vault, the model will look like a bi-temporal link satellite.
Although we have done nothing out of the ordinary in our link satellite load, the content is loaded and hashed for the record hash and persisted into the link satellite. Notice, however, the value of the surrogate link-hash-key…
Now, remember, if the source system is doing the effectivity tracking, then this is the best scenario. Data Vault scales naturally and simply ingests what the source tells us is happening to the account with its relation to the product. You could say the account number is the master entity MDM tracking the changes against.
Let’s see what the effectivity satellite does for us, shall we…
If the source does not provide a business date tracking the change in a relationship, or at least the business date to the cardinality of change we want, then we request that they do. If they can’t, then we may have to derive this relationship movement ourselves. Now that we have defined “account” as the master entity, or rather, the entity driving the change tracking in the link. Yes, that’s right, account id is the driving key of the relationship. Could we rely on the link structure to show the effectivity change in the relationship? Let’s see…
Remember, the effectivity satellite is only about relationship tracking and nothing else. On day two, we show some changes. Notice how the link-satellite and effectivity satellite have the same change iterations…
Now, this is the tricky bit. What happens when the relationship goes back….?
Ah, now you can see that the effectivity satellite is only about the relationship and nothing else. The end-date is persisted but, in essence, never updated. With no change to the “interest” column, the only way to detect a relationship change is in the effectivity satellite. You cannot rely on the link table because the link table is a unique list of relationships. Therefore, by returning to the original relationship, the load date of the previous relationship is not the active relationship.
Could we instead put effectivity in the link table structure?
To answer that, let’s explore a similar relationship tracking satellite, the status tracking satellite. Remember, just like the effectivity satellite, we only really use these structures if we do not get this intelligence from the source date already. For the effectivity satellite, it’s the business date of the relationship changes to a driving key for the status tracking satellite it’s because we do not have change data capture (CDC) statuses for either business keys or relationships from the source.
For a two-part link effectivity, the effectivity satellite and status tracking satellite outcome will be the same. If it were a three-part link (three-link participants / three hubs), the outcome between the effectivity and status tracking satellites will be very different…
Effectivity in Status Tracking Satellite
Status tracking satellite can be applied to a business key (based on a hub) or unit-of-work (based on a link) from a source. Can we use the status tracking satellite to infer effectivity? It does not have a start and end date and no driving key defined. Let’s compare the two. This time we are not showing the whole target satellites, just the records that get inserted, and two keys are driver keys.
Now, what happens when we change the active driving key columns from two to one? Yes, we can track different effectivity on the link based on a different set of driving keys; of course, they would against different effectivity satellites. We use the same source file (green header) in the following example but track different driver keys.
A single driver key would cause the same output between the two. The caveat is that the source file is a snapshot of the source. Status Tracking Satellite in the traditional implementation does not work with delta loads.
Of course, I don’t have to tell you that setting the driver key to all columns in a link does not make sense. I mean, what are you comparing the driver key(s) against?! However, the flexibility increases logarithmically as the number of participants in the link table increases. What I mean is, if
a. there is no business date tracking relationship changes or;
b. you want to track different cardinality to what is supplied by the source; you could, in fact, create as many effectivity satellites from a single link table.
2 hub link: maximum 2 effectivity satellites,
3 hub link: maximum 4 effectivity satellite
4 hub link: maximum 8 effectivity satellites, etc.
The flexibility is there. If you were to introduce effectivity into the link table itself, it means that every link table is a specialized configuration in the data vault model; i.e., the solution will not scale. It is also an anti-pattern if you build the data warehouse a switch- architecture, as we demonstrated above, you could have as many effectivity satellites as you want! In essence, you would need as many links as there are effectivity requirements which could be disastrous to the number of tables you end up managing and the number of tables to join!
And in the governance itself, the model is no longer is self-describing! How do you pick which link table to solve your query? To those that suggest it is possible, go back to the start and study Data Vault 2.0 all over again!
No, effectivity is a decoupled component for a reason… it will scale.
Note that the effectivity satellite not only solves
Relationships that return to a previous relationship state,
Business tracking of relationship effectivity,
but also it does not matter
what cardinality the relationship is either.
We have discussed relationships that are 1:1, but yes, the same structure is usable in a 1:M cardinality as well. After all, the link table itself is M:M and will always remain that way! Other peripheral satellite tables like record tracking and status tracking can live off a hub and link table, but an effectivity table cannot. It is purely a link table offspring. Unless how you choose to populate an effectivity satellite is not in the standard driver to the non-driver keyway.
The last point to make about effectivity satellites is this: identify the need for a driver key early. As you saw in the illustrations above, the effectivity satellite captures when a previous relationship returns. What this means is, there is no other way in data Vault to capture that effectivity by relying on a link table itself. An effectivity satellite is the only way to solve this source system gap, in the absence of the source providing a business date of that movement.
Use the effectivity satellite sparingly. If you need to implement it and want to learn how to use it, all SQL is provided in the Data Vault Guru. See the link below.
#datavault #datawarehouse #analytics #datamodelling #thedatamustflow
Launch that satellite!
Author: Patrick Cuba
Patrick has nearly 20 years working on data-inspired problems utilizing his experience and he has embraced Data Vault 2.0. He works by understanding the business before innovating the technology needed to ensure that his data-driven delivery is agile and automated. He is Data Vault 2.0 certified and regularly contributes to Data Vault Alliance.