Clear all

Flexibility and Fun with Links

Posts: 482
Topic starter
Joined: 3 years ago

Hi Everyone,

You can stick to the standards and still produce a flexible data model in the Raw Data Vault.  How?  Two ways:

  1. Leverage Non-Historized Links (just as you would FACT tables in Dimensional Models)
  2. Replace all links with PIT + BRIDGE combinations, including calculated fields (be CAREFUL with this one)

A word of Caution Before Starting:

First, if you "remove standard links and link Satellites" then you can end up NOT recording auditable data about relationships that exist, UNLESS you have a giant PSA (persistent staging area or Landing Zone) with all the history embedded.

That Said: Let's have some FUN shall we?

Links are NOT required to have Satellites, in fact, a Non-Historized Link will NEVER have a Satellite (which means it will NEVER have delta driven data sets).  These links are great for storing machine generated and IOT or Sensor data that arrives in real-time.

If you take a closer look at PITS and BRIDGES you might notice a few things:  a) they are both in the Business DV which means you can add calculated fields, put business rules upstream, change the structure, b) they are snapshot driven.  In other words: THEY DON'T HOLD DELTAS EITHER!!  They Hold SNAPSHOTS of what the data looks like "right now".

It also means that you can combine PITS and BRIDGES together to build a single fact table like structure.

So Where's the fun?

Assuming (again) you have either a PSA or a Landing zone with ALL the Raw History, you can play a game:

Replace all Non-Historized Links with PIT BRIDGE hybrids.  If you're really daring: get rid of ALL links and ALL link Satellites, and store just Snapshot & computed data in PBG hybrids.

Leaving you with Hubs and Satellites.

THEN: Add a second set of patterns to the load process:  Use the staging tables to determine what "deltas" are inserted to the Satellites, then make the PBG insert only with "new records being added".  Make a snapshot of the old ones to complete the compliment of rows.

If this sounds a bit convoluted, let me know - I am building a PIT/BRIDGE class right now, and can add this as bonus materials.  This is not for the feint of heart, but can be a huge performance booster to your Data Vault warehouse.   As long as the "relationships across hubs" where the link would be, are maintained at the lowest level of grain, you've accomplished all goals and maintained auditability.

End result?

  • Massive Performance Gains for outgoing queries
  • Downside?  A LOT of disk space required to keep continual snapshots of all keys and all data.  BUT if you have compression, OR a columnar DB - this will no longer be a problem.

Love to hear your thoughts