I often find foreign key normalization to a reference table using a surrogate key column instead of actual unique code.
CREATE TABLE [dbo].[ProviderType] (
[ProviderTypeId] INT IDENTITY(1, 1) NOT NULL,
[ProviderCode] VARCHAR (50) NULL,
[ProviderCodeDescription] VARCHAR (50) NULL,
[Attribute1] VARCHAR (50) NULL,
[Attribute2] VARCHAR (50) NULL,
[Attribute3] VARCHAR (50) NULL,
CREATE TABLE [dbo].[Provider] (
[ProviderID] INT IDENTITY (1, 1) NOT NULL,
[ProviderTypeID] INT NOT NULL,
[ProviderCode] VARCHAR (10) NULL,
[ProviderName] VARCHAR (10) NULL,
[Specialty] VARCHAR (50) NULL
I believe this is a straight up reference table. It only exists in the context of a Provider as a result of normalization. If the table used "ProviderCode" as the primary key, that code would already exist in the Provider satellite which seems pretty straight forward (standard reference table). But it uses meaningless surrogate key so...because I see this often, I wanted to validate my understanding of how to deal with this in Data Vault (FYI - have read the book from cover to cover and am currently scheduled to take the Jan CDVP2 class, so no need to put me down the "I discuss this in my class..." route please...I'm already coming ; )
First, should surrogate key (ProviderTypeID) be included in satellite? A long time ago in a galaxy far far away...well...10 years ago, Dan posted this response to that line of question which was last commented around 2015.
That's always been my understanding since reading that article along with another one that talked about how that practice might promote usage of joining satellites to satellites for people that know these values are relationship keys (which could lead them into trouble).Recently, however, I have seen other articles where folks put them in purposely. I want to see if there has been a significant shift in thinking since then.
While the raw vault should not be available to the masses, I have seen examples that allow data professionals/data scientists access to the raw vault. Allowing these surrogate FKs in the satellite may be helpful to that small group trying to find insight into the source data that is conveniently located in one repository. They understand relationships between satellites and they know the risk. They can always remove these columns from Information Marts through views or by simply not bring over the column via mart ETL as a general rule.
I suppose there are generally two classes of these things: surrogate keys that have a relationship to a table that happens to be business entity (hub) and those, like in my example, where its more of a reference table as a result of normalization. Do we treat these exactly the same now and include them in satellites or do we separate how we deal with them such as the two options below?
Surrogate key to business table - INCLUDE (may or may not already be a hub)
- Include in satellite (useful for data professionals to build relationships that may not exist yet)
Surrogate key to reference table - EXCLUDE
- Perform LEFT JOIN to reference table when building s_Provider satellite to replace surrogate key with actual code column value (ProviderCode)
- Create reference table without ProviderTypeId (as this is also an identity column)
If we treat them the same and allow both types in satellites, can we treat the surrogate key to a reference table as the real code in order to avoid having to do the LEFT JOIN and filter out the ProviderTypeId?
The second question relates to if an argument can be made to make ProviderType it's own business hub and treat the relationship to Provider as a link? I wouldn't think that is appropriate, but wanted to throw that out there.
Thanks for taking the time to read this...I know its a bit wordy ; )