One of my tasks when working on client sites (sometimes even before the engagement starts) is explaining Data Vault concepts and ideas. This usually comes in the form of onsite standards, modelling scenarios and even architectural concepts. I have cleaned up and de-identified some of these ideas and explanations, and posted some to LinkedIn and some to the Data Vault Alliance blog as well. Posting on here has certainly helped me explain concepts better, because there is such a variety of bright minds on this forum!
So if you have the time (and if you're interested), here are some of the articles on Data Vault 2.0 I've authored:
- “Advantage Data Vault 2.0” https://www.linkedin.com/pulse/advantage-data-vault-20-patrick-cuba/ This article dives into some of the fundamental differences between DV1.0 and DV2.0.
- For an approach to rapidly modelling data vault, see “Building Data Vault modelling capability through the Mob” here: https://www.linkedin.com/pulse/building-data-vault-modelling-capability-through-mob-patrick-cuba/ This is a proven methodology that I used to lead data vault modelling teams around the world. It was focused and it got results!
- Deprecating data became a real concern as we needed to solve data problems now and couldn’t wait for the source application to provide the solution. “Deprecating data in Data Vault 2.0” ( https://datavaultalliance.com/news/bring-out-your-dead-data/ ) was just a fun way to discuss this theme.
- If you have to deal with batch data arriving in the wrong order, look no further than this data-driven approach to solving the issue, “Solving the Time Crime of Back-dated Data in Your Data Vault” ( https://datavaultalliance.com/news/solving-the-time-crime-of-back-dated-data-in-your-data-vault/ ). I developed a significant part of this approach, which was presented at wwdvc2019, including the slide deck Nols presented at the forum!
- The thing about PIT tables is that you haven’t really eliminated the performance lag in querying the data vault, you’ve simply palmed it off to the creation of PIT tables instead. It is an improvement on querying no doubt, but only if you have indexes and if you have virtualized the information mart layer (and you should). This post, titled “How I can get away without paying the Pied Piper... in Data Vault 2.0”, presents a possible way of not paying the Pied Piper: https://www.linkedin.com/pulse/how-i-can-get-away-without-paying-pie-piper-data-vault-patrick-cuba/
- I hadn’t done much work on Apache Spark before I worked at this client, where we encountered a significant problem with Apache Spark SQL: it could not do recursion. We solved it, and it made for a great story titled “Apache Spark GraphX and the Seven Bridges of Königsberg”, published here: https://medium.com/macquarie-engineering-blog/apache-spark-graphx-and-the-seven-bridges-of-k%C3%B6nigsberg-5e98a7b8d99f
- If you have ever needed to explain passive integration to a customer… I attempted to here https://www.linkedin.com/pulse/passive-integration-explained-patrick-cuba/
- And then I did it again in this article here “A Rose by any other name… Wait.. is it still the same Rose?”, https://datavaultalliance.com/news/dv/a-rose-by-any-other-name/
- I revived an old hobby of mine… beer making! And yes, I managed to make a Data Vault 2.0 article out of it, called “Learning Data Vault is Like Learning How to Make Beer!”: https://datavaultalliance.com/news/dv/learning-data-vault-is-like-learning-how-to-make-beer/
- You’ve heard of DataOps right? I reused some images to write about how Data Vault 2.0 is very much aligned to DataOps, in an article called “Data Vault or: how I learnt to stop worrying and love Data Governance”, accessible here: https://www.linkedin.com/pulse/data-vault-how-i-learnt-stop-worrying-love-governance-patrick-cuba/
- If you find that you have a source system that really does treat business keys with case sensitivity, then here I present a data-driven approach to dealing with this called “Business Key Treatments” available here: https://www.linkedin.com/pulse/business-key-treatments-patrick-cuba/
- Ahh business vault, a great mystery! “Data Vault Mysteries... Business Vault”, click here: https://www.linkedin.com/pulse/data-vault-mysteries-business-patrick-cuba/
- I thought this was going to be a confusing topic at a customer site, even when I implemented both for the first time! So, without further ado, “Data Vault Mysteries... Zero Keys & Ghost Records”, here: https://www.linkedin.com/pulse/data-vault-mysteries-zero-keys-ghost-records-patrick-cuba/
Don’t forget the articles available here too! “Seven Deadly Sins of Fake Vault” ( https://datavaultalliance.com/discussions/data-vault-2-0-standards/seven-deadly-sins-of-fake-vault/ ) and “Data Vault Elevator Pitch” ( https://datavaultalliance.com/discussions/members-business-discussions/dv-elevator-pitch-depending-on-the-audience-an-open-discussion/ )
And lastly, “Effectivity Satellites”, poorly understood even by some of my former colleagues! This satellite has a fundamental piece of configuration that seems to fly in the face of DV2.0: it has an End-Date. But once you study how the effectivity satellite works and how to query it, it is perfectly fine the way it is structured. It is so important to understand what it solves, and that is why the article has such a teasing title: “Is the Effectivity Satellite a Raw or Business Vault artefact?” https://datavaultalliance.com/discussions/members-tech-discussions/is-the-effectivity-satellite-effs-a-raw-vault-or-business-vault-artefact/
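To give a rough feel for why an End-Date can coexist with insert-only loading, here is a minimal Python sketch. It is not taken from the linked article; the row layout, the `HIGH_DATE` convention, the sample keys and the `active_relationships` helper are all assumptions made for illustration. The idea shown is that a relationship is never updated in place: closing it means inserting a newer row that carries the End-Date, and a query picks the latest row per relationship as of a given date.

```python
from datetime import date

# Assumed convention: an open relationship carries a "high date" as its end date.
HIGH_DATE = date(9999, 12, 31)

# Hypothetical insert-only rows:
# (driving_key, secondary_key, load_date, start_date, end_date)
ROWS = [
    ("card-1", "acct-A", date(2020, 1, 1), date(2020, 1, 1), HIGH_DATE),
    # card-1 moves to acct-B: a NEW row closes acct-A, nothing is updated
    ("card-1", "acct-A", date(2020, 3, 1), date(2020, 1, 1), date(2020, 3, 1)),
    ("card-1", "acct-B", date(2020, 3, 1), date(2020, 3, 1), HIGH_DATE),
]


def active_relationships(rows, as_of):
    """Return the (driving_key, secondary_key) pairs active on `as_of`."""
    latest = {}
    for driving, secondary, load_date, start, end in rows:
        if load_date > as_of:
            continue  # this row was not loaded yet on the as-of date
        key = (driving, secondary)
        # keep only the most recently loaded row per relationship
        if key not in latest or load_date > latest[key][0]:
            latest[key] = (load_date, start, end)
    return sorted(
        key for key, (_, start, end) in latest.items() if start <= as_of < end
    )


print(active_relationships(ROWS, date(2020, 2, 1)))
print(active_relationships(ROWS, date(2020, 3, 1)))
```

With the sample rows, the first call sees only the open acct-A row, while the second call sees the closing row for acct-A and the opening row for acct-B. A real implementation would typically do this in SQL against the link and its satellite; the sketch only illustrates the shape of the query.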