What is it?
To understand the Business Vault, we first need to understand the Raw Vault. As is well documented, a data vault model comprises hubs, links, and satellites. Each of these artifacts has a specific purpose, defined as follows:
A hub table is the unique list of business keys. A Raw Vault model can have many hubs, each representing a domain, subject area, capability, or part of a business process of the organization. A unique list of account numbers will typically be found in a table called hub_account, customer ids would be loaded to hub_customer, and so on.
A link table not only contains the relationships between business keys but also represents the unit of work. Where does this unit of work come from? It comes from the automated data output of business processes.
A satellite table is used to track the change record for either a hub table (business key) or a link table (the unit of work).
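As a rough illustration of these three artifacts, here is a minimal in-memory sketch in Python. It is not a reference implementation: the hashing convention (trimmed, upper-cased keys joined by "||" and hashed with MD5), the column names, and the "core_banking" record source are all assumptions made for the example. Only the table names hub_account and hub_customer come from the text above.

```python
import hashlib
from datetime import datetime


def _digest(parts):
    """Hash a sequence of strings joined by an assumed '||' delimiter."""
    return hashlib.md5("||".join(parts).encode("utf-8")).hexdigest()


def hash_key(*business_keys):
    """Surrogate key from one or more business keys (trimmed and upper-cased)."""
    return _digest(str(k).strip().upper() for k in business_keys)


def load_hub(hub, business_key, load_date, record_source):
    """Hub: keep one row per unique business key."""
    hk = hash_key(business_key)
    hub.setdefault(hk, {"business_key": business_key,
                        "load_date": load_date,
                        "record_source": record_source})
    return hk


def load_link(link, business_keys, load_date, record_source):
    """Link: keep one row per unique combination of business keys (the unit of work)."""
    lk = hash_key(*business_keys)
    link.setdefault(lk, {"hub_hash_keys": tuple(hash_key(k) for k in business_keys),
                         "load_date": load_date,
                         "record_source": record_source})
    return lk


def load_satellite(satellite, parent_hash_key, attributes, load_date, record_source):
    """Satellite: append a row only when the descriptive attributes have changed."""
    hash_diff = _digest(f"{name}={attributes[name]}" for name in sorted(attributes))
    history = satellite.setdefault(parent_hash_key, [])
    if history and history[-1]["hash_diff"] == hash_diff:
        return False  # no change, nothing to record
    history.append({"load_date": load_date,
                    "record_source": record_source,
                    "hash_diff": hash_diff,
                    **attributes})
    return True


# Hypothetical sample load: one account, one customer, and their relationship.
hub_account, hub_customer, link_customer_account, sat_account = {}, {}, {}, {}
loaded_at = datetime(2024, 1, 1)
account_hk = load_hub(hub_account, "ACC-001", loaded_at, "core_banking")
load_hub(hub_customer, "CUST-42", loaded_at, "core_banking")
load_link(link_customer_account, ["CUST-42", "ACC-001"], loaded_at, "core_banking")
load_satellite(sat_account, account_hk, {"status": "open"}, loaded_at, "core_banking")
```

Loading the same business key or the same attribute values a second time leaves the tables unchanged, which is the insert-only behaviour the three definitions describe.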
Why then are these defined as RAW vault artifacts? Because the data is produced by source applications that are ultimately automation engines for business processes. And they continue to produce data that is collected and modeled into the raw vault.
The business vault has many uses; in the context of raw vault, the business vault fills the gaps in business processes. Imagine the viewpoint from the source applications. These are often third-party software products built with the vendor’s bottom line in mind, and they often serve hundreds or thousands of clients, sometimes globally. Their automated business processes will therefore be generic in nature, reflecting the vendor’s best knowledge and generalization of the industry they serve. Sure, vendors take suggestions on how to improve or adapt their software, and they update their data model through version and maintenance releases. Still, their clients usually cannot wait for those change requests to be actioned in order to proceed with their businesses. Some change requests may, in fact, never get done!
That is what Business Vault is designed to bridge! Fill in the business process gaps that source systems would have ideally solved! It would probably have been easier if the application was built in-house, but even in those circumstances, you cannot always expect the application to respond exactly to the business analytics’ timelines.
If the source application can fill these gaps, then that is the best place to solve them! Model the change when it is solved and migrate the business vault artifact history to a new raw vault artifact where necessary.
The second purpose of the business vault is to centralize derived content by reusing the same loading patterns as the raw vault! That means the same agility, audit-ability, and automation you expect in the raw vault are available in the business vault!
Where does it live?
Derived content that features the audit-ability mentioned above should contain the lineage of how the data was produced. That means the business vault is based on the raw vault and extends the overall raw vault model where needed. If a business vault is delivered as views instead of physical tables, you do gain performance (initially), but over time, as the raw vault grows, your business vault query performance starts to suffer. A view is unexecuted code logic that must run on the platform hosting it, which means defining a business vault as a view introduces vendor lock-in. If the outcome of the derived business rules is persisted instead, there is no vendor lock-in: you are free to derive results with any tool and persist the outcome against the unit of work or business entity.
Physicalizing the Business Vault means you can better manage business rules as versions. That is, an updated business rule does not rewrite history; rather, it applies from a point in time onward. It also means that Information Marts are not deployed as views on top of views! Now, that’s a recipe for a slowly degrading Information Mart!
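To make the idea of rule versions applying from a point in time onward concrete, here is a small Python sketch. The rule name "risk_band", its thresholds, and its effective dates are hypothetical and only illustrate the mechanism: each load picks the rule version in force on its load date, and rows already persisted by an earlier version are left untouched.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical rule versions, ordered by the date each one takes effect.
RULE_VERSIONS = [
    (date(2023, 1, 1), "risk_band_v1", lambda balance: "HIGH" if balance > 10_000 else "LOW"),
    (date(2024, 1, 1), "risk_band_v2", lambda balance: "HIGH" if balance > 5_000 else "LOW"),
]


def rule_for(as_of):
    """Return the (effective_from, name, function) tuple in force on the given date."""
    effective_dates = [version[0] for version in RULE_VERSIONS]
    index = bisect_right(effective_dates, as_of) - 1
    if index < 0:
        raise ValueError(f"no rule version is effective on {as_of}")
    return RULE_VERSIONS[index]


def derive(balance, load_date):
    """Build the row to persist; the record source names the rule and its version."""
    _, rule_name, rule = rule_for(load_date)
    return {"load_date": load_date, "record_source": rule_name, "risk_band": rule(balance)}


before_change = derive(7_000, date(2023, 6, 1))  # evaluated under risk_band_v1
after_change = derive(7_000, date(2024, 6, 1))   # evaluated under risk_band_v2
```

Because the outcome is persisted together with the rule version, the 2023 row keeps the result the older rule produced even after the threshold changes.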
What does it look like?
Although you re-use Raw Vault artifacts to load a Business Vault, it is never a replication of Raw Vault. You design derived business rule outcomes that are themselves autonomous; that is, a new derived value should not rely on a previously calculated derived value. Why? If any correction is needed in the previous value, the potential snowball effect of fixing downstream values could have a knock-on effect on every dashboard and report based on that value. That’s not to say that rolling aggregates are banned! Rather, consider the implications of designing such a Business Vault and the potential for corrections!
Business Vault is not where you conform column names to how the business would like to see them! Don’t forget, although it is not a Raw Vault, the Business Vault still requires many join conditions to be executed to return the data from the data vault! Business Vault serves as the decoupled derived intelligence based on the Raw Vault.
Business Vault will have the same Data Vault metadata tags as a Raw Vault, and here is where the integration comes in. A Business Vault record source should be the name of the derived business rule and its version number. Yes, this ensures business rule lineage tracking through your business rule version tool of choice. That means a change record can occur in the Business Vault when there is:
a) a change in the raw data the Business Vault artifact is based on, and/or
b) a change in another Business Vault artifact the Business Vault artifact is based on, and/or
c) a change in a combination of Business Vault and Raw Vault artifacts the Business Vault artifact is based on, and/or
d) an update to the business rule (derived code) the Business Vault artifact is based on
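The triggers listed above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed pattern: the "customer_tier" rule, its thresholds, and the row layout are hypothetical, and the hash diff here covers only the derived output. Under that choice, a rule update (case d) produces a change record only when it actually changes the derived outcome.

```python
import hashlib


def hash_diff(values):
    """Fingerprint of the derived attributes, used to detect a changed outcome."""
    payload = "||".join(f"{name}={values[name]}" for name in sorted(values))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def apply_rule(rows, raw_attributes, rule, rule_name, rule_version, load_date):
    """Run a derived rule and append a Business Vault satellite row if the outcome changed."""
    derived = rule(raw_attributes)
    new_diff = hash_diff(derived)
    if rows and rows[-1]["hash_diff"] == new_diff:
        return None  # same derived outcome as the latest row: no change record
    row = {"load_date": load_date,
           "record_source": f"{rule_name}_v{rule_version}",  # rule name plus version
           "hash_diff": new_diff,
           **derived}
    rows.append(row)
    return row


# Hypothetical rule in two versions; version 2 lowers the threshold for "gold".
def tier_v1(raw):
    return {"tier": "gold" if raw["balance"] >= 1000 else "silver"}


def tier_v2(raw):
    return {"tier": "gold" if raw["balance"] >= 500 else "silver"}


rows = []
apply_rule(rows, {"balance": 800}, tier_v1, "customer_tier", 1, load_date=1)  # first record
apply_rule(rows, {"balance": 800}, tier_v1, "customer_tier", 1, load_date=2)  # nothing changed
apply_rule(rows, {"balance": 800}, tier_v2, "customer_tier", 2, load_date=3)  # rule updated (d)
apply_rule(rows, {"balance": 300}, tier_v2, "customer_tier", 2, load_date=4)  # raw data changed (a)
```

If every rule release must leave a trace even when the outcome is identical, the rule version could be included in the hash diff instead; that is a design decision for the modeller.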
As we discussed, Business Vault output can be delivered by reusing Raw Vault artifacts. That means we can build:
- a Business Vault link – often used to represent the business view of the unit of work (which may differ from how a 3rd party application defines the unit of work), to resolve a relationship based on derived business rules, or to test business rules before persisting them; any of these may include dependent child keys;
- a Business Vault satellite – derived descriptive output stored against a business entity or relationship, from simple case logic to statistical outcomes; this may include dependent child keys as well;
- a Business Vault multi-active satellite – if your output needs a derived SET of outcomes against a business entity or relationship;
- a Business Vault effectivity satellite – tracking driver to non-driver relationships not available in the unit of work coming from raw data;
- a Business Vault status tracking satellite – deriving a status on a business entity or relationship by a derived business rule, such as entity or relationship aging
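As one example from the list above, status tracking by aging might look like the Python sketch below. The 90-day threshold, the status labels, and the "entity_aging_v1" record source are assumptions for illustration only.

```python
from datetime import date


def aging_status(last_activity, as_of, dormant_after_days=90):
    """Derive a status from how long ago the entity was last active."""
    days_idle = (as_of - last_activity).days
    return "DORMANT" if days_idle > dormant_after_days else "ACTIVE"


def track_status(history, entity_key, last_activity, as_of, record_source="entity_aging_v1"):
    """Append a status row for the entity only when the derived status changes."""
    status = aging_status(last_activity, as_of)
    rows = history.setdefault(entity_key, [])
    if rows and rows[-1]["status"] == status:
        return False
    rows.append({"load_date": as_of, "record_source": record_source, "status": status})
    return True


status_history = {}
last_seen = date(2024, 1, 1)
track_status(status_history, "ACC-001", last_seen, date(2024, 2, 1))  # 31 days idle
track_status(status_history, "ACC-001", last_seen, date(2024, 3, 1))  # 60 days idle, unchanged
track_status(status_history, "ACC-001", last_seen, date(2024, 5, 1))  # 121 days idle
```

Here the status changes without any new raw record arriving: the passage of time alone moves the entity from one derived status to another.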
How do I build one?
Define the derived rules that need audit-ability; other soft rules can be integrated into the Information Mart layer. Because the Business Vault is deployed as physical tables, this delivery method has an eye on performance! As the Data Vault model is built out with more Raw and Business Vault artifacts, the same table structures will contain the same Data Vault metadata tags and the same key and index structures required for optimal performance. When performance lags, additional table structures are available that are, in essence, not Data Vault tables (because they do not offer audit-ability like hubs, links and satellites do). They are the fabled point-in-time (PIT) and bridge tables! In the first diagram above, they are the query assistance tables designed to take advantage of index-on-index joins. In Teradata terminology, they act like join indexes.
In the absence of indexes, alternate Information Mart structures may be sought, or you may be looking at physicalizing the Information Mart layer and processing deltas into it. Are we creating more layers in the data warehouse? Yes and no. The Data Vault looks to build the data warehouse through agility (changing a Data Vault model is non-destructive and needs little or no refactoring) and to guarantee audit-ability, whereas the Information Mart in the Data Vault world is disposable. If it breaks, we have the full lineage and audit history in the Data Vault, and we can easily recreate the Information Mart whenever it is needed!
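For readers unfamiliar with PIT tables, the Python sketch below shows the core idea under simplifying assumptions: for each hub key and snapshot date, it records which satellite load date was current, so that a query can equi-join to each satellite instead of searching its history. The satellite names and the use of None for "no row yet" are illustrative choices; many implementations point at a ghost record instead.

```python
from bisect import bisect_right
from datetime import date


def build_pit(hub_keys, satellites, snapshot_dates):
    """Build PIT rows from {satellite_name: {hub_key: sorted list of load dates}}."""
    pit_rows = []
    for snapshot in snapshot_dates:
        for key in hub_keys:
            row = {"hub_key": key, "snapshot_date": snapshot}
            for sat_name, load_dates_by_key in satellites.items():
                load_dates = load_dates_by_key.get(key, [])
                # Index of the latest load date that is on or before the snapshot.
                index = bisect_right(load_dates, snapshot) - 1
                row[f"{sat_name}_load_date"] = load_dates[index] if index >= 0 else None
            pit_rows.append(row)
    return pit_rows


# Hypothetical raw and Business Vault satellites hanging off the same hub key.
satellites = {
    "sat_account_raw": {"ACC-001": [date(2024, 1, 1), date(2024, 1, 10)]},
    "sat_account_bv": {"ACC-001": [date(2024, 1, 5)]},
}
pit = build_pit(["ACC-001"], satellites, [date(2024, 1, 3), date(2024, 1, 12)])
```

Each PIT row then carries the exact key and load date pair needed to fetch the matching satellite row, for raw and Business Vault satellites alike.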
Building your Business Vault extends your business capabilities with audit-ability, agility and automation, the same qualities we expect from Raw Vault. The only difference is where the data came from! That means your automation patterns are already there. Your task as a data modeller is to decide which business entity or relationship the derived content will describe and generate that derived outcome. Of course, don’t forget about data governance!
Author: Patrick Cuba
Patrick has nearly 20 years of experience working on data-inspired problems and has embraced Data Vault 2.0. He works by understanding the business before innovating the technology needed to ensure that his data-driven delivery is agile and automated. He is Data Vault 2.0 certified and regularly contributes to Data Vault Alliance.