Hey Everyone, this topic got "started" in another forum post, and I think it deserves its own separate discussion. So here it is... My two cents are as follows:
All loading processes should convert to upper or lower case for all business keys on the way in to either the staging table or the Hub (depends on where the hash key is calculated) it needs to execute before the hash is created.
WHY do I feel this way?
99% of business WANT master keys, master data - 99% of applications do not apply "meaning" to Case.
Example: "Abc123" for a contract number, should represent the SAME contract even if it's entered: "ABC123"
If the Hash is CASE SENSITIVE, then these two "records" will create separate Hub entries / different business keys - when they SHOULD consolidate for the BI layers going out of the DV. To push the case handling downstream to the BI layers would result in far more I/O during the SQL query layers, not sure its a good idea, and definitely does not fit with the integration paradigm of Data Vault methodology.
In my career, I have run into 1 (one) company that said to me: Abc123 represents a different contract than ABC123 - case mattered to their Business Users... I said: what happens if someone makes a "case entry" mistake?? their faces went blank, and they started to ask me how they could detect any potential flaws / errors. I said: are your accounting numbers lining up? or are you always moving funds/payments around to balance the books? They said: the latter, I said: there's your answer... you have a BUSINESS process flaw that desperately needs fixing.
So with that, now I'd like to hear your opinion. By the way, CASE doesn't matter in the Satellites Either IF you truly follow the data vault 2.0 standards properly (either column compare or hash difference). And: by the way, most BI query tools are not case sensitive either in the entry of their where clauses. AND: an Auditor cannot tell the difference when they look at a value on-screen or on a report. To them, it's the SAME value, and the auditors at the company I mentioned above, kept FAILING the audits because of this mix up.
My recommendation? Follow the CDVP2 standards we teach in class, case-insensitive please... What are your thoughts?