PIT structure  


Posts: 1
(@yongji)
New Member
Joined: 2 months ago

I'm reading Scalable Data Warehouse with Data Vault 2.0 now and have a question about the Passenger PIT example of chapter 6. below are the columns of the PitPassenger PIT table.

PitPassenger

  • PassengerKey
  • PassengerHashKey
  • SnapshotDate
  • PassengerNameHashKey
  • PassengerNameLoadDate
  • PreferredDishHashKey
  • PreferredDishLoadDate
  • PassengerAddressHashKey
  • PassengerAddressLoadDate

Since PitPassenger was built for three Satellites of Passenger Hub, values of the PassengerNameHashKey, PreferredDishHashKey, and PassengerAddressHashKey will have the same PassengerHashKey. If that's correct, what is the benefit of having those columns? Is it to handle ghost records?

When building an information mart from this raw data, querying the passenger state on a given date, e.g. January 5th, 2006, becomes complicated: the query should return the customer data as it was active according to the data warehouse delta process on the selected date. It requires OUTER JOIN queries with complex time range handling involved to achieve this goal. With more than three satellites on a hub or link, this becomes complicated and also slow. The better approach is to use equal-join queries for retrieving the data from the Raw Data Vault. To achieve this, a special entity type is used in Data Vault modeling: point-in-time tables (PIT) and a set of ghost records in satellite tables attached to fixed primary keys.

Reply
Topic Tags
PIT
1 Reply

Please Login or Register