Clear all

Integrating hub by business key, where the BK is stored in two columns in one source, but one column in another  

Posts: 8
Active Member
Joined: 7 months ago

Hi everyone! 

I'm running into a problem when trying to integrate multiple source systems into a single "Fund" hub in our raw vault. This problem has to do with a situation where my records have the same semantic meaning and granularity across these systems, but the business key is composed of two columns in one source, but is concatenated together into a single column in the other source.

The Background:

Each of our source systems has a unique code for each fund. The same fund might be represented by a different code in each source system (e.g. S&P 500 might be '0G01' in system A, and 'SP500' in system B). They also might have slightly different descriptions in each source system (e.g. 'S&P 500' in system A, and 'Standard and Poor's 500' in system B). 

We have a centralized SQL database today that takes the fund records from each source system, along with their unique fund codes, and uses a relationship table to map them together to indicate that fund '0G01' in system A is the same as 'SP500' in system B.

The Problem:

Now we are building a hub in our data vault to house a unique list of Funds, and I'm faced with a weird situation because of how the fund codes in each system are composed. Our thought was to ingest the fund codes from all source systems as well as the centralized SQL database that holds each fund record, hoping that we could construct a business key that would recognize that the fund coming from the source system is the same fund as the one coming from the centralized SQL database (since the centralized SQL database fund table was populated from that source system in the first place).

In one source system, a fund is uniquely identified by a combination of two columns, "Account Code" and "Division Code".

When we populate the centralized SQL database, we create a "Fund Code" by concatenating together the two fields above into a single value.

On the one hand, I don't think it is a good practice to concatenate Account Code and Division Code into a single value when populating the hub in the raw vault, but it seems to me that is the only way to match back to the record coming from the centralized SQL database.

On the other hand, I don't know how I could define the Fund business key as a combination of Account Code and Division Code (which might not be the case for other source systems) and then insert the "Fund Code" from the centralized SQL database into just one of those columns, since it is not an Account Code nor a Division Code - it is a combination of both of them. And I don't think I could split them back out into two columns now that they have been concatenated together.

The only option that makes sense to me right now is to create two separate hubs, one with the BK on "Fund Code" and the other with the BK on "Account Code" and "Division Code", and then creating a link between them to indicate which codes represent the same Fund. But then I end up with two hubs with the same semantic meaning and granularity (and possibly with some of the same Hub Hash Keys, unless I use a delimiter between Account and Division code in the one hub, like I should).

Anyone have any thoughts on this? Maybe the issue is a more fundamental one, with me trying to integrate source system A with source system B, when source system B's data came from source system A in the first place. Any input greatly appreciated!


6 Replies