A solution to reference tables in Datavault? A bold statement, but we think we might have cracked it. This isn’t the first time I’ve floated suggestions around alternative ways to represent reference data in Datavault and the options I outlined before meant some form of bending of the Datavault principles or a bending of the data modelling tools (see Options for reference data in DV).
Having had the privilege of working with my newfound friend John Giles (@johngiles) in recent months, the topic of reference data has come up in conversation on many occasions. We wrestled with the shortcomings of the current “standard” that Dan Linstedt (@dlinstedt) and Michael Olschimke (@michaelolschimke) describe in their book (Building a Scalable Data Warehouse with Data Vault 2.0 [Section 6.3]), but it always felt like this was an afterthought and not something that was well integrated into the modelling standard. John came up with an idea that has spawned what I believe to be a very simple, elegant, and (dare I say it) near genius solution to the problem.
The problem as I’d stated is that sometimes data looks like reference data and sometimes like a hub and satellite combination. The example I used was currency, where in many satellites currency is a simple look-up (reference). In other cases, such as currency exchange transactions, we’d really like this information placed in a satellite that references a currency hub via a link (From and To currencies). Under the current standard we can’t have both representations and ideally, we don’t want to duplicate the data in different forms.
The solution we’re proposing is that we introduce a new type of table into the Datavault standard, let’s call it a reference table; for currency it might be labelled REF_CURRENCY. What does this table look like? Say the currency hub looks like this.
(PK) HK_Currency (hash key)
(AK) Currency_Code (business key)
The new reference table then looks like this:
(PK) Currency_Code (business key)
(FK) HK_Currency (foreign key to HUB_CURRENCY)
That’s it!!! The reference table is exactly the same as the hub but has been flipped on its head. When we want to use currency as a reference, a satellite can refer to the REF_CURRENCY table which then points to the HUB_CURRENCY table so we maintain a point of integration. If we want to use Currency as a hub then we can do that too. Any contextual data for the currency will be recorded in a satellite (or satellites) that hang from the Currency hub table with all the standard loading patterns that handle history.
I’ve included an example of how this might look:
Of course, this needs more thought, for example, how do we factor in the Key Set (BKCC)? What about business keys with different structures?
Automation should also be simple; a reference table is almost identical to its parent hub.
I'd be really interested to see what everybody thinks of this "solution". Does it have legs?