This post is a short Q&A covering questions I'm frequently asked. Whether you're just getting started with Data Vault or are fully certified, there is something here for everyone. If you have additional questions for me, feel free to use the contact-us form on our site to send them to me directly. I'll try to answer them in the next blog entry.
Before you get on your high horse and think I have no clue what I'm talking about when it comes to Kimball and star schemas, let me share my credentials with you. I have built hundreds of Kimball-style "marts" over the years. I am CBIP certified at the mastery level, I am DAMA Data Management Specialist certified at the mastery level, and I have been working in IT since 1982 (38 years). I've built operational systems, and I grew up coding assembly language on CP/M machines with Z80 and 8080 CPUs. I've built hundreds of database applications and helped thousands of companies around the world build Data Vaults successfully. I've built parsers, object compilers, language interpreters, assemblers, compilers, and even a few data management engines. I taught classes every quarter at TDWI US events from 2001 to 2006, and I've taught at DAMA events, Data Modeling Zone conferences, and more.
DATA VAULT IS NOT AN OVERNIGHT FAD – it was designed from 1990 to 2000 by me and my team at Lockheed Martin Astronautics, and it has been engineered to tackle the most difficult of enterprise problems – regardless of hardware, technology, or database advancements. You can find hundreds of free posts and articles here: https://danLinstedt.com
Question: Have you ever refactored an existing data warehouse, e.g. one built on a Kimball model, and transformed it into a Data Vault? If so, what was your strategic and tactical refactoring approach? How did you ensure the reusability of the Data Vault in the long term?
Yes, I have done many of these. This, however, is not the recommended approach, because of one simple problem: Kimball models are based on business rules and changes applied to the data on the way into the data warehouse. This causes a loss of fidelity in the data sets, a loss of auditability, and generally merges or alters data in ways that no source system actually delivers.
There are, however, a few good things to pull from or leverage in Kimball models:
- business keys – known as durable keys in the Kimball model (see the sketch after this list)
- facts – if the facts are close to “raw” level of detail
- business rules – can be leveraged on the way out of the Data Vault model to rebuild mart solutions for business users.
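To make the durable-key point concrete, here is a minimal Python sketch of how a Kimball durable key can seed a Data Vault hub. Data Vault 2.0 commonly derives the hub's surrogate from a hash of the normalized business key; the table, column, and record-source names below are illustrative assumptions, not a prescribed standard.

```python
import hashlib
from datetime import datetime, timezone

def hub_hash_key(business_key: str) -> str:
    """Derive a deterministic hub hash key from a business key.

    Normalizing (trim + upper-case) before hashing is a "hard rule":
    it standardizes the value without changing what the source said,
    so the same key always yields the same surrogate across sources.
    """
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# A Kimball dimension row with a durable key can feed the hub directly.
dim_customer_row = {"durable_key": "CUST-00042", "name": "Acme Corp"}

hub_customer = {
    "hub_customer_hk": hub_hash_key(dim_customer_row["durable_key"]),
    "customer_bk": dim_customer_row["durable_key"],
    "load_date": datetime.now(timezone.utc),
    "record_source": "KIMBALL.DIM_CUSTOMER",  # illustrative source name
}
print(hub_customer)
```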
To build the Data Vault model properly from a strategic standpoint, the team must use the source systems as inputs to the design, not just the existing Kimball model. Furthermore, to be truly strategic, the team must consider their ways of working.
The methodology, the model, the architecture, and the implementation all need to be considered – well beyond the Data Vault model alone. This is where Data Vault 2.0 really takes shape.
It's not enough to simply "lift & shift" from a Kimball star schema to a Data Vault model. In fact, I would say: if that is your objective, DON'T build a Data Vault at all, because you will not get any of the value of Data Vault if you ignore the other foundational components.
Tactically speaking:
Training is always required, and it always comes first. Don't just dive in – that will lead you down the wrong path immediately. Additionally, retaining a coach, mentor, or authorized trainer along the way to ensure the Data Vault is properly executed will make the difference between success and failure.
Ensuring the reusability of the Data Vault long term requires management buy-in, changes to the ways of working, and a number of measures put in place to make this happen, including clearly defined roles and responsibilities and a solid foundation in QA/QC. It also requires consistent, regular reviews and assessments to correct any "mistakes" currently happening in the organization.
Question: What was the most cost- and time-consuming part of such projects?
The most cost- and time-consuming part of these projects is hard to measure. What is the most cost- and time-consuming part of any data warehouse project? Unfortunately there is no accurate way of predicting, or even speculating about, a good answer, other than to say: it's the same as any other data warehouse program or project.
The following still apply:
- Training is still required
- Management and good governance are still required
- Quality Assurance / Quality Control is still required
- Data Analysis and Profiling is still required
- Metadata and Documentation and Lineage are still required
- Version control is still required
- On-going assessments and reviews are still required
- Management backing and support are still required
- Management backing and support are still required
- Management backing and support are still required!! (Yes, I repeated it 3 times on purpose!)
Running a Data Vault project / program still requires all the same foundational efforts as any successful business intelligence or enterprise data warehouse project.
The best answer I can give is:
If you already have metrics and measures from successful business intelligence and enterprise data warehouse projects, you will find that the costs and time to deliver decrease (sometimes significantly). With Data Vault 2.0, when you engage with the methodology, you can actually drop your costs over time and increase your agility. The price to pay, however, is a willingness to change the way you work, to stick to the standards, to continuously train new resources as they on-board, and to engage in reviews and assessments.
Question: What do you see as the greatest risks down the refactoring path?
The answer is: the risks are the same as with any other refactoring project. Why? Because Data Vault 2.0 is a solution comprising Methodology, Modeling, Architecture, and Implementation. Data Vault 2.0 may change the way you work, the standards you build with, the design you use, and the tools you apply.
Data Vault 2.0 is not a magic silver bullet. You can't simply read the book and expect that your build will be successful – assuming so is a huge mistake! Data Vault 2.0 is an evolutionary change to building and deploying enterprise BI solutions.
Some of the common mistakes people make are listed below:
- Assuming "refactoring the data model" is all it takes, or taking a lift & shift approach – this will never work, especially not long term; in fact, it will fail FAST. It will end up costing the customer 4x more than it should, and will not produce any of the outcomes the customer is seeking.
- Refactoring JUST the data model to be data vault style – this will not provide any of the benefits that the Data Vault Solution can bring to the table. It’s simply “storing the same old junk in new data structures” – this DOES NOT WORK, ever.
- Unwilling to train – if you or your client are interested in following the book (Building a Scalable Data Warehouse with Data Vault 2.0) but not willing to go through training to be certified, this will not work. Again, it will introduce significant confusion and run off the rails quickly – causing budget problems and even slower delivery times. Your client must be willing to train, or Data Vault is NOT for them.
- Unwilling to decouple the business rules from the source data ingestion – this will cause all kinds of problems and will ultimately yield a massive failure (as the sketch after this list illustrates).
- Unwilling to dedicate QA / QC resources – again, QA / QC properly trained, can make the difference between success and failure.
- Outsourcing to a set of consultants / consultancies that “claim Data Vault expertise” but do not actually have a significant portion of their team trained in CDVP2 or certified by an authorized instructor.
- Failure to leverage the technical platform's best capabilities – again, a lift & shift approach can KILL this process dead in its tracks. Doing this right requires a solid foundation and architecture, in addition to training; otherwise it simply won't work.
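To illustrate the decoupling point from the list above, here is a minimal Python sketch, under assumed table and rule names, of the Data Vault 2.0 split between "hard" rules (technical alignment applied on the way in, preserving auditability) and "soft" business rules (interpretation applied on the way out, into the marts). The specific rules and names are hypothetical.

```python
from datetime import datetime, timezone

def stage_record(raw: dict) -> dict:
    """Hard rules only: trim, cast, and add load metadata.

    None of this changes what the source system said, so the raw vault
    stays a faithful, auditable system of record.
    """
    return {
        "order_bk": raw["order_id"].strip(),
        "amount": float(raw["amount"]),           # type cast, not recomputed
        "status": raw["status"],                  # stored exactly as received
        "load_date": datetime.now(timezone.utc),
        "record_source": "ERP.ORDERS",            # illustrative source name
    }

def mart_view(vault_row: dict) -> dict:
    """Soft (business) rules applied on the way OUT, into the mart.

    When a business rule changes, you rebuild the mart from the raw
    vault -- you never reload or restate the source data.
    """
    return {
        "order_key": vault_row["order_bk"],
        "is_open": vault_row["status"] in ("NEW", "PENDING"),  # hypothetical rule
        "amount_usd": round(vault_row["amount"], 2),
    }

raw = {"order_id": " SO-1001 ", "amount": "99.5", "status": "NEW"}
print(mart_view(stage_record(raw)))
```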
Let me ask you this: when you learned the Kimball method and model, did you simply "lift & shift" from 3rd normal form? Would that have worked? No, definitely not. It required an entire change to your thought process and designs. The same thing applies to Data Vault 2.0.
Question: What are your recommendations for a successful refactoring? What success factors should we care about that are critical?
Refactoring must be an engineered approach: take one piece at a time, break it into bite-sized chunks, manage it properly, and train your people appropriately. This is not something you can turn over to a single individual and then outsource to a typical team. If you take that approach, it will fail for sure.
I've already outlined quite a few requirements for refactoring above, along with some potential failure risks. That said, I will summarize them here for you. (Note: these are NO different from what you would be told when running an enterprise Kimball project.) These are enterprise-focused BI initiative issues – they are NOT just Data Vault issues. If you don't understand this, then you don't understand enterprise business intelligence.
- Do not assume lift and shift will simply work
- Do not assume that “you can do this with a single resource” on a rogue project
- Do not assume that a single certified resource is enough to manage an outsourced team to success
- Do not assume that this can be done overnight
- Do not assume that just reading the book is good enough
- Do not assume that all consultancies are created equal
- Do not assume that all consultancies who claim DV2 expertise actually have what it takes
- Do not assume that management can take a back seat, failing to engage with or empower the team responsible for the build
- Do not assume that a one-time training is all that it takes
- Do not assume that your project can continue without guidance
- Do not assume that “buying an automation tool” will simply make it work (without training & certification in Data Vault 2.0)
- Do not assume that "CDVDM is all you need" – that certification focuses on JUST THE DATA VAULT MODEL; furthermore, CDVDM is now over 15 years old and relies on outdated Data Vault 1.0 specifications.
I hope this has been helpful in your efforts. If you've not yet watched my video, please take a look. It's FREE, and while it's LONG, it is a fairly comprehensive introduction to Data Vault.
VIDEO: https://datavaultalliance.com/news/dv/understanding-data-vault-2-0/
Again, if there is something I missed here, or something you’d like me to answer for you, please use the CONTACT US form and let me know.
All the best,
Dan Linstedt (C) 2020 All Rights Reserved