Source System #datavault and Business Keys

Far too often I hear a tale about Data Vault not performing, or Data Vault being too complex, or the ever-popular: Data Vault has too many tables. I also hear things like: Data Vault has no value, or we don’t have any business keys, or we don’t know why we have to model by business key.  These are all warning signs, and if you have a consultancy or people who say these things, then it’s time to bring in an authorized CDVP2 trainer for an assessment of what went wrong, and to take corrective action to turn your Data Vault into the success story it should be.

What I Discover

When I dig deeper, I discover that a) the implementation team hasn’t been to training or certification, or b) the consultancy that claimed Data Vault 2.0 practices has no clue what they are doing, or has only one expert in the entire company – and that expert is on a different project, not working on your implementation or your efforts at all.  When I dig deeper I often hear: “We read the book, and this should be easy!!”

If “Enterprise integration and Enterprise Data Warehousing” were easy, then we wouldn’t need Data Vault to help us along the way.  First off, Data Vault is far more than just a data model.  If your team, your consultancy, or your experts don’t understand this, and they think that “reading the book is enough”, then we need to help you get that corrected as soon as possible.  If this is the case, or you’ve heard this on your Data Vault projects, then please contact us.  A bit of assessment now can save you a lot of pain later.  Let us help you get your course corrected and your Data Vault on track for future success.

So what’s the big hubbub?

Many consultancies (it turns out) are building something I call a Source System Data Vault. They are not exercising the best practices, the principles, or the rigor found in the Data Vault methodology (i.e., information engineering).  Therefore, they cannot hope to achieve the successes that we see with properly built Data Vaults.

What is a Source System Data Vault (SSDV) and Why is that Wrong?

An SSDV is quite simply a restructuring of your source system into Data Vault styled objects.  It’s wrong because it holds no business value and does not integrate any data by the real business key.  All it does is apply a ratio – 1 source table creates 2 to 3 Data Vault tables – leveraging source system primary and foreign keys to accomplish this.  Generators that produce SSDVs are a dime a dozen: cheap or free, homegrown, easy to write, and they can very quickly run your Data Vault efforts off the rails and down the drain!!
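To make the table explosion concrete, here is a minimal sketch (not any vendor’s tool) of what such a PK/FK-driven generator effectively does. The table names and metadata shape are hypothetical, purely for illustration:

```python
# Illustrative sketch of a naive "SSDV generator" that derives Data Vault
# objects purely from source PK/FK metadata. Table names are hypothetical.

def naive_ssdv_tables(source_tables):
    """For each source table, emit a hub keyed on the source surrogate PK,
    a satellite for its attributes, and one link per foreign key."""
    dv_tables = []
    for table in source_tables:
        dv_tables.append(f"HUB_{table['name']}")   # keyed on the surrogate PK
        dv_tables.append(f"SAT_{table['name']}")   # all non-key attributes
        for fk in table.get("foreign_keys", []):
            dv_tables.append(f"LNK_{table['name']}_{fk}")
    return dv_tables

# A toy 3-table source model already yields 9 Data Vault objects:
source = [
    {"name": "ORDER",      "foreign_keys": ["CUSTOMER"]},
    {"name": "CUSTOMER",   "foreign_keys": []},
    {"name": "ORDER_LINE", "foreign_keys": ["ORDER", "PRODUCT"]},
]
print(len(naive_ssdv_tables(source)))  # 3 hubs + 3 sats + 3 links = 9
```

Notice that nothing in this logic ever asks what the business key is – the source surrogate PK is blindly promoted into a hub, which is exactly the anti-pattern described above.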

SSDVs cause all kinds of problems, from maintenance issues, to join performance issues, to sheer numbers of tables.  It means your implementation team does not understand what they are doing, and should attend CDVP2 training before going further.  If you want to speak with customers all over the world who went the “SSDV route” and then corrected their actions to achieve success, please let me know.  I’d be happy to put you in contact with these folks.

Now, that said: SSDVs HOLD NO BUSINESS VALUE (I know I’m repeating myself).  Do NOT, I repeat do NOT, go to production with an SSDV.  It is bad practice.

You can have the BEST data model in the world, but if you don’t back it with the right approach, and the right methodology, it will not serve your business purposes, it will NOT produce a success – rather, it will simply fail (under volume, under real-time, under complexity – take your pick).

Integration by Source Sequence Surrogate Key is Wrong!

Now, there are times when this is the only option, and we do teach how, when and why this happens in our CDVP2 classes – BUT this should be the exception and not the rule!!  If the Data Vault model is built this way it means: a) the implementation team is lazy, b) the team leveraged a generation tool that doesn’t work properly, or c) they don’t truly understand what a Data Vault is.

Sequence keys in the source system are meaningless generated numbers.  They CHANGE when: a) the data is moved from one source application to another, b) the source application is upgraded, c) the surrogate “rolls over” and is later reassigned to new data, or d) data is exported and then re-imported / restored at a later date.

Surrogate sequence keys in the source system hold no real meaning for the actual business user or the real world.  I can’t walk up to a business user and ask: what does Order 23 represent / mean to you?  The human cannot respond with an accurate answer to that question without context: without a) knowing WHICH source application, and b) asking the machine to pull up the record on the screen.  Only then can they interpret the surrogate key.

True Business Keys Have Meaning

True business keys may be hard to find, hard to identify, and may not always be filled in (when they should be), but they are in fact keys assigned to represent data that means something in the real world.  Take an automobile VIN (vehicle identification number) for example.  Most everyone who works with VINs on a daily basis can (for the most part) interpret what a VIN means if you provide it to them, without looking it up.  VINs are what we call “smart keys” / intelligent keys (by the way, we teach this concept in our CDVP2 classes, and I encourage everyone to get certified – it will help you).

VINs are a prime example of proper business keys that hold the same semantic meaning, and the same semantic grain, across multiple source system applications.  It doesn’t matter if the VIN is in the Sales System (SAP, for example) and the data is moved to the Finance System (Oracle Financials, for example) – the VIN should not change just because the data is moved!  This is where sequence surrogates do change when the data moves, and that is very, very dangerous.

With the same semantic grain, same semantic meaning, the VIN creates a single Hub object in the Data Vault model – regardless of the number of source systems that it exists in.  This does not lead to an explosion of table objects, quite the opposite – it integrates, and consolidates data (as it is designed) at the raw level.
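A minimal sketch of this integration, in the style of a Data Vault 2.0 hash key: the hub key is computed from the (hardened) business key, never from the source surrogate. The two source rows and their surrogate values below are hypothetical:

```python
import hashlib

def hub_hash_key(business_key: str) -> str:
    """DV 2.0-style hub hash key: hash the normalized business key
    (trimmed, upper-cased), not the source-system surrogate."""
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# The same VIN arriving from two hypothetical source systems:
sap_row    = {"surrogate_id": 1042, "vin": "1HGCM82633A004352"}
oracle_row = {"surrogate_id": 77,   "vin": "1hgcm82633a004352"}

# The surrogates differ, but the business key resolves to ONE hub record:
assert sap_row["surrogate_id"] != oracle_row["surrogate_id"]
assert hub_hash_key(sap_row["vin"]) == hub_hash_key(oracle_row["vin"])
```

Because both systems produce the same hub key, loading either feed simply lands on the same HUB_VEHICLE row – the consolidation happens at the raw level, exactly as described above.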

Ok, Why is the Methodology Important then?

The methodology tells you HOW you should find business keys, how these business keys LINK the data to the business processes, and how to turn your data into an asset in the business (measure it, monitor it, trace it through your business).  The methodology provides the way you work to achieve success; it describes how to model and why modeling around business keys makes sense.  Without the methodology, all you have is an SSDV – a failure.

The methodology gives you standards, and best practices, guidelines, approaches and methods for security, encryption, data integration, consolidation, and of course: agile ways of working (compliments to Scott Ambler and Mark Lines for Disciplined Agile Delivery).  The Data Vault methodology tells you HOW to integrate by business key, HOW to find the right business keys to leverage, and so on.

SOURCE SYSTEM DATA VAULTS ARE DANGEROUS!!   They IGNORE the methodology, they IGNORE the standards, they IGNORE the best practices, they IGNORE the business keys, and by doing so they create a spaghetti mess that cannot ever be called an enterprise data warehouse.

Why are business keys so important?

Well, I stated some of that already (above), but let me shed light on a few more items:

  1. Business Keys allow cross-line-of-business integration
  2. Business Keys provide better Master Data integration
  3. Business Keys provide 100% traceability across time (regardless of sequence surrogate changes) – just think VIN number for a single car over time….
  4. Business Keys are far more stable than Sequence Surrogates
  5. Business Keys align with the Enterprise Extended Ontologies and Taxonomies for business based integration and BI outcomes
  6. Business Keys (mostly) make sense to humans without having to be “looked up” in the system.
  7. Business keys will (when properly selected) REDUCE the number of tables in your Data Vault (Enterprise Data Warehouse)
  8. Business Keys TRACE the data across the enterprise from inception to completion, allowing you to measure the TIME it takes to “accomplish something” in your business.

An example of #8: at Lockheed Martin we had Contract Numbers for building rockets in Astronautics, and we also assigned Tail Numbers to each rocket.  These two business keys allowed us to trace the entire cycle of a rocket contract and build across the enterprise.  We had over 250 source systems; 125 of them were integrated into the Data Vault in 6 months with a team of 3.5 people (including a part-time data architect).  All by business key.  The contract number allowed us to trace the rocket through Sales, Contracts, Finance, Manufacturing, Planning, Build, Delivery, Parts Maintenance and more.

There was no way that the “sequence numbers” that were assigned would ever have allowed us to accomplish this.  By the way, with a properly built Data Vault and 125 source systems, we actually ended up with about 75 Data Vault structures.  If your team is building 3 Data Vault tables for each source table, then they are doing it wrong, and it needs correction.

Contact us – it’s not too late to get it corrected!

We want you to succeed, we want to see you have success with your Data Vault efforts.  The Authorized Data Vault 2.0 Trainers around the world are happy to step in, and help you get your efforts corrected before it’s too late.  Training is the first step, but it’s not the only option.  We have a service called a Kick Start, along with assessment services.

Just remember: “Automation tools that simply generate Data Vaults according to source system primary and foreign keys will be the death of your Data Vault.”   Teams that follow this paradigm are headed for the cliff of disaster.   Let us help you today – contact us to find out more.

Thanks!  Dan
