Background: given that Data Vault is more than just a modelling methodology, how do you speak to different audiences about its benefits, its ambiguities and its solutions?
These are just my own thoughts, and I welcome changes, your own elevator pitches and ideas!
To the business executive. Data Vault brings proven experience in integrating your business process landscape into repeatable automation patterns, supported by technology and agile practices. These patterns not only speed up delivery and business value; they are also predictable, easy to change, and able to flex and grow as your business grows, all while providing a full audit history of your data.
To the data modeller. Data Vault does not replace a dimensional model. Instead, a data vault is best suited to modelling the data warehouse itself, because it maps the enterprise landscape to the enterprise ontology. Dimensional models still have their place: they best serve data presentation, which often has to conform to Business Intelligence tools or to other reporting and dashboard requirements. DV stores the audit history in a non-conformed way, so you can always recreate the source application's data as it was at any point in time, whereas your dimensional model conforms that data to present it to the business.
To the enterprise architect. The Data Vault chiefly maps your enterprise ontology to the Data Warehouse, making your enterprise ontology data driven. We identify the business entities, such as customers, accounts and products, and the Data Vault maps those as Hubs. The relationships between those business entities are recorded in Links, and the descriptive details (the enterprise history) are tracked in Satellite tables around these hubs and links. The DV model represents what we know about the business' application and data integration landscape and is agnostic to the platform it is delivered on. All in all, it represents the desired business architecture, ingrains data governance best practices and contains the components needed to address data privacy concerns.
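As a rough illustration of the three table types just described, here is a minimal Python sketch. The class names, column names and sample values are hypothetical and chosen only for this example; a real implementation would follow your own standards and platform.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Tuple


@dataclass
class HubRow:
    """One business entity (e.g. a customer), identified by its business key."""
    hub_hash_key: str        # hypothetical column name
    business_key: str
    record_source: str
    load_date: datetime


@dataclass
class LinkRow:
    """One relationship between two or more business entities."""
    link_hash_key: str
    hub_hash_keys: Tuple[str, ...]   # keys of the participating hubs
    record_source: str
    load_date: datetime


@dataclass
class SatelliteRow:
    """Descriptive details about a hub or link at a point in time."""
    parent_hash_key: str     # the hub or link this row describes
    load_date: datetime
    record_source: str
    attributes: Dict[str, str]


# Illustrative data only: a customer, an account, and the relationship between them.
now = datetime(2024, 1, 1)
customer = HubRow("hk_cust_1", "C-001", "crm", now)
account = HubRow("hk_acct_1", "A-900", "core_banking", now)
owns = LinkRow("lk_1", (customer.hub_hash_key, account.hub_hash_key), "core_banking", now)
customer_details = SatelliteRow(customer.hub_hash_key, now, "crm", {"name": "Ada"})
```

Note that the satellite carries only a parent key and a load date, so history accumulates as new rows rather than as updates to the hub.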
To the product owner. Data Vault accelerates the development of enterprise-tied analytics. It gives you confidence that what you are building is relevant to the business, because it relies on repeatable, established patterns with a proven track record of integrating hundreds to thousands of data sources into a single source of the facts.
To the scrum master. Data Vault uses the same loading patterns for every use case that needs to be modelled and loaded into an enterprise data warehouse. We identify what we want to model, split the model into chunks, and deliver those chunks in sprints that themselves follow a repeatable pattern.
To the business analyst. Think of the Data Vault as the place where we record the history of the business processes as they pertain to the parties and entities relevant to the business. You only need to understand what the business entities are and how they relate to each other to be able to deliver analytics that relate to the business. When we add more data sources to the data vault, the existing data vault artefacts do not change; we only add new artefacts, and only if the data and metrics are not yet available in the data vault.
To the solution architect. DV differs from Kimball or Inmon in that we establish loading patterns and rules for hubs, links and satellites (our raw tables). Data from our source applications is loaded into the raw vault, and the business vault fills the technical and application gaps, establishing a full enterprise view of the enterprise's data landscape. With the data warehouse populated by these three simple table types, the information marts are delivered on top of the data vault. Because the data vault holds the enterprise audit history, our information marts are disposable.
To the data engineer. Setting modelling aside, the appeal of Data Vault is the repeatability of its loading patterns once they are established. Once the landed files are staged (adding hash keys, record source columns, applied date and load date), a staged file will load one or more hub tables, zero or more hub-satellites, zero or more link tables and zero or more link-satellites. That's it! You just load three table types, and the rest of the work of managing a data vault is scheduling, task management and so on. The data vault modellers you support are left to model new data into the data vault using those repeatable loading patterns, and because the loading patterns are repeatable, the testing patterns are repeatable as well.
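To make the staging and hub-loading pattern concrete, here is a minimal Python sketch. It rests on several assumptions that are not part of the text above: the hash-key rule (trim, upper-case, join with `||`, then SHA-256), the function names `hash_key`, `stage` and `load_hub`, the column names, and the use of an in-memory dictionary in place of a hub table.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys):
    """Assumed hash-key rule: trim, upper-case, join with '||', then SHA-256."""
    normalised = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def stage(rows, key_column, record_source, applied_date):
    """Add the Data Vault metadata columns to each landed row."""
    load_date = datetime.now(timezone.utc)
    return [
        {
            **row,
            "hub_hash_key": hash_key(row[key_column]),
            "record_source": record_source,
            "applied_date": applied_date,
            "load_date": load_date,
        }
        for row in rows
    ]


def load_hub(hub, staged_rows, key_column):
    """Insert only the business keys the hub has not seen before."""
    inserted = 0
    for row in staged_rows:
        if row["hub_hash_key"] not in hub:
            hub[row["hub_hash_key"]] = {
                "business_key": row[key_column],
                "record_source": row["record_source"],
                "load_date": row["load_date"],
            }
            inserted += 1
    return inserted


# Illustrative landed data: the same customer appears twice with different formatting.
landed = [
    {"customer_id": "c-001", "name": "Ada"},
    {"customer_id": "C-001 ", "name": "Ada L."},
]
staged = stage(landed, "customer_id", "crm", datetime(2024, 1, 1, tzinfo=timezone.utc))

hub_customer = {}
first_load = load_hub(hub_customer, staged, "customer_id")
second_load = load_hub(hub_customer, staged, "customer_id")  # re-running adds nothing
```

Because the hub load only inserts unseen keys, re-running it over the same staged file is harmless, which is also what makes the matching test pattern repeatable: load twice and check that the second run inserts nothing.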
To the data scientist. Viewed holistically, the data vault model resembles a mathematical hypergraph with nodes and edges: the nodes are our business entities, represented as hubs, and the edges are the multi-node relationships, represented as links. This gives us the business reasons, rules and processes mapped to the data warehouse to support the business service. To this ontology-aligned structure we attach descriptive, historised content about the nodes and edges. That opens the opportunity to learn what we do well and what we can improve, for example by designing new neural networks over the established business application landscape.
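The hypergraph view can be sketched in a few lines of plain Python. The node and edge identifiers below are made up for illustration, and the helper names `build_index` and `neighbours` are hypothetical; the point is only that a single link (hyperedge) may connect more than two hubs (nodes).

```python
from collections import defaultdict

# Links as hyperedges: each edge id maps to the set of hub nodes it connects.
# All identifiers here are illustrative, not taken from a real model.
hyperedges = {
    "link_customer_account_product:1": frozenset(
        {"customer:C1", "account:A1", "product:P1"}
    ),
    "link_customer_account:2": frozenset({"customer:C2", "account:A1"}),
}


def build_index(edges):
    """Map each node to the ids of the hyperedges it participates in."""
    index = defaultdict(set)
    for edge_id, members in edges.items():
        for node in members:
            index[node].add(edge_id)
    return index


def neighbours(node, edges, index):
    """Return every node that shares at least one hyperedge with `node`."""
    related = set()
    for edge_id in index.get(node, ()):
        related |= edges[edge_id]
    related.discard(node)
    return related


index = build_index(hyperedges)
account_neighbours = neighbours("account:A1", hyperedges, index)
```

Here `account_neighbours` contains both customers and the product, because the account participates in both links. Satellite content could then be attached to nodes and edges as features for whatever model you build on top.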