Raw Vault is the raw data modelled as hubs, links and satellites, and the content of these artifacts is provided by the source. Modelling should be as easy as 1-hubs, 2-links, 3-satellites (+4, +5):
1) Identify the business keys
These will set up your hubs: the grain, that is, the number of columns that uniquely identify a business entity. Hash collisions are a challenge to consider here. This is where the real modelling happens, because it requires decisions and an understanding of how to map the source-supplied keys to the domains of the business. Is there master data involved?
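As a minimal sketch of how a single-column or composite business key might be turned into a hub hash key: the function name `hub_hash_key`, the `||` delimiter, the trim-and-uppercase normalisation and the choice of SHA-256 are all assumptions made for illustration, not something the source prescribes.

```python
import hashlib


def hub_hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Hash one or more business key columns into a single hub key (illustrative)."""
    # Assumed normalisation: trim and uppercase so ' cust-001 ' and 'CUST-001' match.
    normalised = [part.strip().upper() for part in business_key_parts]
    # The delimiter keeps ('AB', 'C') and ('A', 'BC') from collapsing into the
    # same input string, which would otherwise give them the same hash.
    payload = delimiter.join(normalised)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


single_column_key = hub_hash_key("cust-001")
composite_key = hub_hash_key("AB", "C")
```

The delimiter only removes one avoidable cause of clashing keys; it does not remove the possibility of a collision in the hash function itself.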
2) Identify the relationships
Relationships are supplied by the source: if there are three business keys supplied by the source, then all three must be mapped into a single link table. The unit of work is supplied by the source, and the unit of work is the set of related business keys. Breaking that up into separate link tables not only obscures the relationship depicted in the source but also makes querying the data that much more complex, when it would have been easier to keep the source-supplied relationship intact.
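To illustrate keeping the unit of work together, here is a rough sketch that builds one link row from a source row carrying three business keys. The column names (`customer_id`, `account_id`, `product_id`), the helper names and the hashing convention are hypothetical.

```python
import hashlib


def hash_key(*parts: str) -> str:
    """Illustrative hash of one or more key parts, joined with an assumed delimiter."""
    payload = "||".join(p.strip().upper() for p in parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def link_row(source_row: dict, key_columns: list) -> dict:
    """Build one link row that keeps every source-supplied key together."""
    values = [str(source_row[column]) for column in key_columns]
    # One link hash key over all participating business keys (the unit of work).
    row = {"link_hash_key": hash_key(*values)}
    # One hub hash key per participating business key.
    for column, value in zip(key_columns, values):
        row[f"hub_{column}_hash_key"] = hash_key(value)
    return row


source_row = {"customer_id": "C1", "account_id": "A9", "product_id": "P3", "balance": 100}
row = link_row(source_row, ["customer_id", "account_id", "product_id"])
```

All three keys land in a single row of a single link table, so the relationship can be queried as the source supplied it without stitching separate two-way links back together.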
3) Identify the satellite splits
Satellites should just be a copy of what is supplied, minus the business keys (sent to hubs) and relationships (sent to links). Variations between satellites come down to where you split them: by rate of change and/or because some of the content contains personally identifiable information (PII).
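A satellite split on PII could be sketched roughly as follows. The function `split_satellites` and the column names are invented for illustration, and deciding which columns count as PII is assumed to have been done elsewhere.

```python
def split_satellites(source_row: dict, key_columns: set, pii_columns: set):
    """Split the descriptive columns of a row into a general and a PII satellite."""
    # Business keys go to hubs (and links), so they are left out of the satellites.
    descriptive = {k: v for k, v in source_row.items() if k not in key_columns}
    sat_pii = {k: v for k, v in descriptive.items() if k in pii_columns}
    sat_general = {k: v for k, v in descriptive.items() if k not in pii_columns}
    return sat_general, sat_pii


source_row = {
    "customer_id": "C1",
    "name": "Ada",
    "email": "ada@example.com",
    "segment": "retail",
    "credit_limit": 5000,
}
sat_general, sat_pii = split_satellites(
    source_row, key_columns={"customer_id"}, pii_columns={"name", "email"}
)
```

A split by rate of change would follow the same shape, with the column sets chosen by how often each attribute changes rather than by sensitivity.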
4) Identify the dependent child keys
These are keys that make the business key unique, and keys that cannot exist on their own because they have no business meaning. They introduce the concept of having multiple active records per business key and behave similarly to multi-active satellites.
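The effect of a dependent child key on "active" records can be sketched like this, assuming an order number as the business key and a line number as the dependent child key. Both column names, the `load_date` field and the function `active_records` are hypothetical.

```python
def active_records(rows: list) -> list:
    """Return the latest row per (business key, dependent child key) pair."""
    latest = {}
    # ISO-formatted dates sort correctly as strings; later loads overwrite earlier ones.
    for row in sorted(rows, key=lambda r: r["load_date"]):
        latest[(row["order_id"], row["line_number"])] = row
    return list(latest.values())


rows = [
    {"order_id": "O1", "line_number": 1, "quantity": 1, "load_date": "2024-01-01"},
    {"order_id": "O1", "line_number": 2, "quantity": 5, "load_date": "2024-01-01"},
    {"order_id": "O1", "line_number": 1, "quantity": 3, "load_date": "2024-01-02"},
]
current = active_records(rows)
```

Order `O1` ends up with two active records at once, one per line number, which is why the line number has to take part in the key even though it means nothing on its own.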
5) Reference data
Model as plain lookup tables, or reuse the hub and satellite loading patterns to load the code into a hub-ref and the names and descriptions into sat-refs.
If you think about it, if the source system's logical data model has already identified the primary keys and foreign keys, then most of the raw vault modelling is already done!
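As a rough illustration of that last point, the sketch below derives candidate hubs and links from primary key and foreign key metadata. The metadata layout, the table names and the function `candidate_structures` are assumptions, and it takes every primary key to be a usable business key, which is a simplification.

```python
def candidate_structures(tables: dict) -> dict:
    """Propose hubs and links from primary key / foreign key metadata (illustrative)."""
    # Every table with a primary key suggests a hub candidate.
    hubs = sorted(f"hub_{name}" for name in tables)
    links = {}
    for name, meta in tables.items():
        referenced = meta.get("foreign_keys", [])
        if referenced:
            # Keep all related keys of the table in one link (the unit of work).
            links[f"link_{name}"] = [f"hub_{name}"] + [f"hub_{t}" for t in referenced]
    return {"hubs": hubs, "links": links}


tables = {
    "customer": {"primary_key": ["customer_id"]},
    "product": {"primary_key": ["product_id"]},
    "order": {"primary_key": ["order_id"], "foreign_keys": ["customer", "product"]},
}
result = candidate_structures(tables)
```

The output is only a starting list of candidates; the decisions about business keys described in step 1 still have to be made by the modeller.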