There are always questions on what makes a good business key? how do I choose one? and so on. We cover all these questions and more in our CDVP2 (Certified Data Vault 2.0 Practitioner) class. If you've not attended, you probably should. It's a great way to really increase your knowledge. That said, here's a beginners guide to choosing business keys.
1: Do not simply choose the source system primary key
Source system primary keys often are meaningless sequence numbers. Meaningless sequence numbers are not the first choice, they are the last choice. Why? Because they do not integrate across multiple lines of business, multiple applications, and they are not durable over time. Most importantly, they are not meaningful to humans
2. Ask business users what they use to uniquely identify records of a particular type
Business users can and should be able to show you the actual fields in the source system (by opening the operational applications and demonstrating) that uniquely identify rows. These fields are the lookup fields, the pull-down or drop down lists, etc.. These fields should be non-editable. Business keys in Data Vault landscape are known as durable keys in Star Schemas.
3. Profile the data set, test your findings / assumptions
Attempt to check the data for uniqueness, as well as populated values. Sometimes Business keys have been defined, but sometimes are left 50% NULL (for example). This is a business break that needs to be fixed going forward in the operational applications.
4. As a last resort: use the Primary Key fields
This is truly the very last resort - if you choose source system PK fields as your business keys, you will end up with a lot more objects in your Data Vault than you want or need. Furthermore you will end up with a source system DV model - which has a lot of bad implications. Finally, you will end up with zero (and I do mean zero) integration by business key.
Some words of wisdom: Do NOT get caught in analysis paralysis. Choose a business key, and move forward. Remember: Business Keys may be comprised (made up of) multiple fields (aka: composite business key). Do not get stuck on the following: "BUT my business key has multiple values for the same data set!! I need to conform it!!" WRONG. We do not conform ANY data until we load the Business Data Vault.
The fact that a single Business Key may have multiple values is irrelevant. Business Keys are defined to be IN the same hub when: they have the same semantic meaning and the same granularity across systems.
Examples of good business keys: VIN (vehicle identification numbers), BAR codes on products, Employee Badges, Part Numbers that never change, Model Numbers of Parts, Lattitude + longitude + altitude/depth, Prescription Drug Codes.
Now, that said: here's one last example to consider: At Lockheed Martin we had 75 source applications from around the world that all carried "ROCKET CONTRACT NUMBER" - a long string of alphanumeric values, that identified a single rocket building contract. Hence - the rocket contract number was built in to a single hub (they were all defined SEMANTICALLY the same: "rocket contract number"). Each source system had a different physical attribute / field name. 75 Different field names. Each source system represented the same rocket contract number with a different value. That meant: we loaded up to 75 different values / records to a single hub, for a single rocket contract.
In order to resolve these to a single rocket contract (ie: master data resolution) we needed the source systems to feed us the "old mapped to the new" contract number, every time it changed the value mechanically. This data was then loaded to a same-as-link.
So hopefully this insight helps you get a grasp on Hubs, and Business Keys. Happy building!