I’m attempting to model the beginnings of a raw data vault from a 3NF source system. Currently, I’m modeling patients, providers, facilities, and appointments. The table script below represents the critical columns of the appointment table in the source system (there are other columns that will be included in satellite). I’ve left out the other tables representing patient, facility, and provider for simplicity. AppointmentId is actually a meaningless identity column. There is no alternate unique business key on the appointment table.
CREATE TABLE [dbo].[Appointment]
[AppointmentId] INT NOT NULL PRIMARY KEY IDENTITY,
[ProviderId] INT NOT NULL,
[PatientId] INT NOT NULL,
[FacilityId] INT NOT NULL,
My initial instinct is to model appointment as a link. This PatientID went to this FacilityID to see a ProviderID. My problem is that this set of business keys can repeat over time (follow up appointment, etc.). There are only two things that make it unique (AppointmentID or adding AppointmentDate as a business key). This brings up several considerations/options in my mind.
- Date seems like the classical reference entity type and, from what I understand, is not typically represented as it’s own hub (which would be required in order to add that business key to the Appointment link entity for uniqueness). It also has no business meaning by itself. It is a globally recognized reference set that does not change. Having said that, I did come across Kent Graziano's great post series on DV design patterns where he uses a date hub to represent the start date of an employee so maybe this is a case where a date hub is acceptable and could be re-used for any other entity where date is part of the natural key as well as being used to build the standard d_Date dimension in a star schema.
- If AppointmentDate ends up changing to DATETIME, that would necessitate creating a time hub.
- AppointmentID is a meaningless identity value which should be avoided when there is already a natural key that makes the relationship unique (which we would have if we added AppointmentDate, but then have concern #1).
- Although AppointmentID is an identity value, it does appear on UI screens (visible to the business).
- ClaimID is not in scope at this time and is not a required field for an appointment (NULLABLE), however the relationship between appointment and claim is definitely coming down the road in a later sprint. This one is interesting. If appointment is modeled as a hub, then I can add this link to claim whenever I want L_APPT_CLAIM at any given point in time with no issues whatsoever down the road. If I model appointment as a link, it feels like I would need a stub hub for claim as a placeholder for when claim data is introduced (keep in mind that ClaimId is NULLABLE).
Based on these considerations, I feel like I have a few approaches here (I lean toward Approach 2):
Create hub for date and use these business keys to create link entity along with creating a stub hub for ClaimId as a placeholder.
Create hub/sat for Appointments with AppointmentID being the business key (ugh) and then create a link (with no satellite) using AppointmentID, PatientID, FacilityID, and ProviderID
Approach 3: (don't kill me Dan - just throwing this one out there):
Create appointment link with PatientID, FacilityID, and ProviderID business keys and have multi-active satellite hanging off appointment link to capture multiple appointments between the same provider, facility, and patient (AppointmentDate would make it unique in sat).
Are there other considerations I should take into account here?
Are there other approaches that can better model this type of situation?
How would you DV gurus handle this type of situation?