We are trying to use DataVault methodology on our new DWH. Before we dig deeply into automation we would like to understand if our vision is OK and we are moving the right direction.
This is the example of the function that would be the starting point of the data trip:
CREATE FUNCTION dbo.f_lease__legacy_data
, @row_version_to BINARY(8)
RETURN ( SELECT l.amount_financed
, l.lease_number AS lease_number_pk
, l.dealer_code AS dealer_code
FROM dbo.leases AS l
WHERE l.row_version > @row_version_from AND l.row_version <= @row_version_to
- To load the data
- To get needed metadata to create hub/satellite/link tables definition and get their objects/pk column names
Based on the function name we can figure out that there would be a hub with the name h_lease and satellite s_lease and their structure will be created on the column names from the function and their types. Both of these tables will have column lease_hk and surely all other needed columns as well.
I'm not yet sure what exactly will we do with links. Most probably they should have their own dedicated functions with all needed joins and keys there.
So, ETL will pump all the data from this function into the staging table that would be named in the similar naming convention and another step will load the data.
This is pretty much it. My main questions would be:
- Is such naming ok? I mean is that OK to put the "entity name" that will be used to for hubs and satellites?
- If yes, then are any best practices how to construct good name and what should it include?