Today I had the opportunity to attend a presentation given by a company active in Data Vault 2.0 automation.
The presentation had some content that surprised me. It stated that once a source system has been configured in the tool, the standard way of working is to load all tables from the source system into the raw DV, even when no business keys have been assigned to the tables.
The presenter stated that two teams work in parallel: one team extracting the data from the source system, the other doing business analysis and defining business keys.
From what I understand, even in the raw DV the goal is to integrate on business keys. And what about building incrementally, as opposed to big-bang ingesting every table in a source?
This smells like the dreaded 'source system vault'. The principle of multi-master hubs exists, but doing so involves adding the source to the hub PK to avoid duplicates. This was explained as: 'when there is no business key, the system's surrogate key (simply the PK in the source system) is the best we've got. Given that 123 can be "Mr Orange" in system A and "Mr White" in system B, we cannot do without adding the source to the key... this is solved by simply adding a same-as link in the business vault'.
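To make the collision argument concrete, here is a minimal Python sketch. The function name, the MD5/`||` hashing convention, and the same-as-link structure are my assumptions for illustration, though they follow common DV 2.0 practice:

```python
import hashlib


def hub_hash_key(*parts: str) -> str:
    """Hypothetical DV 2.0-style hash key: delimit, normalize, then hash the key parts."""
    normalized = "||".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Keying on the surrogate key alone, 'Mr Orange' and 'Mr White'
# (both PK 123 in their own systems) collide into one hub row:
assert hub_hash_key("123") == hub_hash_key("123")

# Adding the record source to the key keeps them apart:
key_a = hub_hash_key("SYSTEM_A", "123")  # Mr Orange
key_b = hub_hash_key("SYSTEM_B", "123")  # Mr White
assert key_a != key_b

# A same-as link in the business vault would then assert, after analysis,
# that two hub keys refer to the same real-world entity (hypothetical shape):
same_as_link = {"master_hk": key_a, "duplicate_hk": key_b}
```

The trade-off this sketch exposes is exactly my concern: the source name becomes part of the key, so integration is deferred to same-as links rather than happening in the hub itself.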
Is this how automation is meant to happen? What I personally try to do when automating / generating code from metadata and definitions is to start from business definitions first, and then map source systems onto those definitions.
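For what it's worth, a minimal sketch of what I mean by definitions-first metadata (all names and the structure are hypothetical, just to illustrate the ordering):

```python
# Business definitions come first; each source system then maps its
# tables and columns onto them. Tables without a business-key mapping
# are simply not yet eligible for loading.
business_definitions = {
    "Customer": {"business_key": "customer_number"},
}

source_mappings = [
    {"source": "SYSTEM_A", "table": "cust",   "maps_to": "Customer", "key_column": "cust_no"},
    {"source": "SYSTEM_B", "table": "client", "maps_to": "Customer", "key_column": "client_id"},
    {"source": "SYSTEM_A", "table": "tmp_x",  "maps_to": None,       "key_column": None},
]


def tables_ready_to_load(definitions, mappings):
    """Only tables mapped to a defined business concept are loaded."""
    return [m for m in mappings if m["maps_to"] in definitions]


ready = tables_ready_to_load(business_definitions, source_mappings)
assert len(ready) == 2  # tmp_x waits until the business analysis catches up
```

The point of the gate in `tables_ready_to_load` is that ingestion grows incrementally with the business analysis, instead of landing every table up front and sorting the keys out later.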
Just interested in the opinion of this community.