
A Guide to "Special" Satellites - a Recipe for Fake Vault


Topic starter
(@patrick-cuba)

Continued from https://datavaultalliance.com/discussions/data-vault-2-0-standards/seven-deadly-sins-of-fake-vault/

Bullet points lifted from the article referenced below, each followed by a critique:

  • S_BOK; A Satellite which will hold all substitute, alternate, candidate and atomic identifier parts.

Problem with this approach: such keys have no real value to the business, since the idea behind Data Vault is to store the immutable business (object) key in the hub. When we are forced to use "other" keys, how would a bag of keys help to resolve integration if each key is unique to the source system that provided it? If alternate keys are to be used, then a simple "same-as" link would suffice. Further, if content is split out into a "Bag of Keys", how can you ever hope to "recreate the source" when proving data lineage and auditability? Alternate (non-business) keys are nothing more than regular raw vault satellite attributes.
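To make the same-as link idea concrete, here is a minimal sketch of how an alternate key could be resolved to the hub's business key. The names (`SameAsLink`, `resolve`) and the sample keys are hypothetical, and a real implementation would carry hash keys and load metadata rather than plain strings.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SameAsLink:
    """One row of a hypothetical same-as link."""
    master_key: str      # business key held in the hub
    duplicate_key: str   # alternate key supplied by a source system
    record_source: str   # which source supplied the alternate key


def resolve(key: str, links: Sequence[SameAsLink]) -> str:
    """Return the master business key for `key`, or `key` itself if unmapped."""
    for link in links:
        if link.duplicate_key == key:
            return link.master_key
    return key


# Illustrative data only: two sources using their own identifiers
links = [
    SameAsLink("CUST-001", "crm:8841", "crm"),
    SameAsLink("CUST-001", "billing:A17", "billing"),
]
```

The alternate keys themselves stay in ordinary raw vault satellites; the link only records which keys refer to the same business object.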

  • S_SBE; A Satellite which will contain the standard business effectivity of the instance of the ensemble.

This satellite fails to recognise that the source system or application is in fact an implementation of business rules; whether an object is effective or not is driven by the source. Additional rules to categorize whether something is still effective can indeed be built, but they belong in a business vault satellite and are driven by the business. These can be based on a record tracking satellite (the last time we saw the entity or relationship) in combination with a business vault status tracking satellite; typically this is deployed as "aging" rules. More explicit scenarios can be recorded in a status tracking satellite if the source happens to be a snapshot, in which case the entity's effectivity can be directly inferred.
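An "aging" rule of this kind can be sketched in a few lines. This is only an illustration: the 30-day window, the function name `is_effective`, and the sample dates are assumptions, and the real threshold would be a business decision.

```python
from datetime import date, timedelta


def is_effective(last_seen: date, as_of: date, max_age_days: int = 30) -> bool:
    """Aging rule: treat an entity as effective if it was seen within the window."""
    return (as_of - last_seen) <= timedelta(days=max_age_days)


# Hypothetical record tracking data: business key -> last load date it appeared in
last_seen = {
    "CUST-001": date(2024, 1, 31),
    "CUST-002": date(2023, 11, 15),
}

as_of = date(2024, 2, 10)

# The derived status would land in a business vault satellite, not in raw vault
status = {key: is_effective(seen, as_of) for key, seen in last_seen.items()}
```

Here `CUST-001` was seen 10 days before the as-of date and stays effective, while `CUST-002` falls outside the window.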

Why create an additional satellite to do something data vault already did?

My guess is that by voting to get rid of link-satellites, Ensemble Modellers now have to apply an additional satellite structure to achieve something that Data Vault 1.0 had already solved!

  • S_RAD; A Satellite which contains those attributes which need to be restricted in access.

Data Vault 2.0 already provides for this through satellite splitting in general, i.e. we tend to split a source's content into PII/PHI and non-PII/PHI satellites. There really isn't any need to consolidate these into a single satellite! PII/PHI content hardly (if ever) changes, so it is easier for teams to deal with Article 17 of the GDPR ("Right to be forgotten"); the split content can be isolated into a secure schema as well.
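As a rough sketch of satellite splitting, the function below routes a source record's attributes into a PII payload and a non-PII payload. The column classification in `PII_COLUMNS` and the sample record are hypothetical; in practice the classification would come from your data governance process.

```python
from typing import Any, Dict, Tuple

# Hypothetical classification of source columns
PII_COLUMNS = {"name", "email", "date_of_birth"}


def split_record(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one source record into (pii, non_pii) satellite payloads."""
    pii = {k: v for k, v in record.items() if k in PII_COLUMNS}
    non_pii = {k: v for k, v in record.items() if k not in PII_COLUMNS}
    return pii, non_pii


record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "segment": "retail",
    "credit_limit": 5000,
}

pii_payload, non_pii_payload = split_record(record)
```

Both payloads would hang off the same hub key, with the PII satellite placed in a restricted schema.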

An alternative approach may be to tokenize all this content and rely on RBAC (Role-Based Access Control) to manage which roles can see sensitive data. If an Article 17 request is made, then it might be viable to just delete the token! This approach could mean that no manipulation of the PII/PHI satellite is needed at all!

(Please check with your business what a satisfactory approach is in your jurisdiction and for the type and origin of the data you are housing.)
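The tokenization idea can be sketched as a small in-memory token vault. Everything here is an assumption made for illustration: the `TokenVault` class, the `pii_reader` role name, and the use of a plain dictionary in place of a secured token store.

```python
import secrets
from typing import Dict, Optional


class TokenVault:
    """Toy token store: the satellite keeps only the token, never the value."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}  # token -> original sensitive value

    def tokenize(self, value: str) -> str:
        """Store the value and return a random token to load into the satellite."""
        token = secrets.token_hex(8)
        self._values[token] = value
        return token

    def detokenize(self, token: str, role: str) -> Optional[str]:
        """Return the original value only for the privileged (hypothetical) role."""
        if role != "pii_reader":
            return None
        return self._values.get(token)

    def forget(self, token: str) -> None:
        """Handle an erasure request by deleting the token mapping only."""
        self._values.pop(token, None)


vault = TokenVault()
token = vault.tokenize("jane@example.com")
```

After `vault.forget(token)`, the token left behind in the satellite can no longer be resolved by any role, so the satellite rows themselves are untouched.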

  • S_PCP: A Satellite which holds the attributes with profiling data which can be kept at all times.

Related to the satellite above: if you think about it, why would you need to separate (PCP) content out from a regular satellite if you do not need to manage it uniquely? If it is non-identifying, as the article articulates, then why separate it out into a PCP satellite?

  • S_DEA; A Satellite which holds derived, enhanced business analytics & data science augmented context.

This is standard Business Vault satellite practice anyway. In addition, a BV satellite doesn't impose any restrictions or boundaries on where your data science and analytics should live, as long as it is decoupled from raw vault.

Why is it that the naming of the last two satellites seems to be drug related?

  • S_Shuttle; A Satellite which will provide a handshake to data stored in other environments or less structured data stored outside the EDW.

Not a thing, particularly with modern platforms that are able to store semi-structured and structured data alongside each other. By incorporating a "handshake" or "search string", this might actually be mixing up functional rules and hard rules in the same satellite. This leads to integration debt: if changes are needed, the satellite is far harder to decouple. It is therefore an anti-pattern.

Conclusions

"Special" satellites and Ensemble Modelling are not Data Vault 2.0 concepts. In fact, they seem to add layers of complexity and abstraction that are not designed to scale, and they look more like Data Vault anti-patterns than something that can be templated and repeated.

Ref: https://www.linkedin.com/pulse/guide-special-satellites-data-vault-remco-broekmans/

 
