Happy New Year all!
I've been thinking about a subject previously discussed regarding loading hubs, links and satellites in parallel and thinking specifically about the impact on link tables in scenarios where a hub is shared but requires some translation before the link association can be made.
We have a concept named Platform and it is loaded from a one data source provider. Platform is used by multiple transactions that get loaded into various Link tables.
In some cases, the provider of the transaction data provides the platform_number in the data set as the business usually refer to it as, so we can directly populate the Link table in parallel with the Hub and Sat table, knowing that the data will be ultimately consistent by the end of the raw DV load process.
In some cases (for a completely different type of transaction), the provided transaction data source, uses a different naming convention for 'platform' (and may refer to it with a slightly different name) and so supplies as an attribute a different variant of Platform - however, it can be easily translated directly into Platform as the mapping between the two is recorded by the business. This occurs sometimes when integrating third party data with internally provided data. With a basic implementation of the Link table it means a lookup needs to be conducted first in order to create the Link record and associate it to the H_PLATFORM hub above, otherwise it will always be unknown. The linking attribute can be provided in the Platform data source file from the business directly ('system_x_identifier' is the contrived example name I've given it). This means we have a dependency between the Link and Hub load where this look-up is needed which prevents parallelism.
The options I see are:
1. perform the lookup of the platform name when staging the System X transaction data (not ideal as new platform data may be getting staged too)
2. for this specific Link table, load it out of sequence of everything else - wait until all Hubs are loaded first, then run the link load process. Not the end of the world, but degrades performance
3. treat this Link table as a Business Vault Link table - after all I am deriving an association via the lookup - applying a basic business rule - not great when the link is a transaction and not just an association as we have some attributes that are 'raw' and not derived - so only creating them in the BV seems counter-intuitive
4. treat the system X variant of platform as either a new hub (or as additional records in the existing Platform hub) and use a Same As Link in the business vault to associate one with the other and to identify the master platform records of interest
What are your thoughts? Is there a better way? I'm erring towards the SAL route - it enables me to still load everything in parallel and mop up the associations of things that are essentially the same concept in the BV layer.
However, I'm not sure thought if the platform and platform variant data resides in one hub or two and if it is acceptable practice to create a SAL between records living in two separate Hubs, rather than two records in one hub?