Data Vault 2.0 Suggested Object Naming Conventions

Hello everyone,  We have a list of suggested naming conventions that are essentially best practices.  There are a few standards documented here (rules that must be adhered to), but the naming conventions themselves are best practices (suggestions for you to execute if you choose).

The Data Vault 2.0 Standards for Naming Objects

Objects in the physical database and in the processing layers must be named.  The standards indicate what is required to be named and how.  The standards do not dictate which naming conventions to use.  For that, we rely on best practices (also below).

  1. A Prefix or Suffix pattern must be chosen for each object type: processes, views, procedures, functions, tables, columns, indexes, and constraints.
  2. Each object type listed in #1 must apply the chosen pattern globally (everywhere) for consistency and repeatability purposes.  For example: if the Data Steward team decides a prefix is best for views, then ANY view built and released to production for Data Vault (anywhere in the world for this company) must adhere to a prefix notation.  The benefits of following this pattern are: faster time to market, easier understandability, easier team-member on-boarding, easier (and less) documentation required.
  3. Common (enterprise wide) documentation of the naming conventions must be created and maintained.
  4. Any object abbreviations chosen must not conflict with each other, or with source system object names – they must be unique!
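Standard #4 is easy to check mechanically. Below is a minimal sketch of such a uniqueness check; the abbreviation and source-system lists are hypothetical placeholders, not a recommendation:

```python
# Check that chosen object abbreviations are unique among themselves
# and do not collide with source-system object names (standard #4).
def find_conflicts(abbreviations, source_names):
    conflicts = set()
    seen = set()
    for abbr in abbreviations:
        # A conflict is either a duplicate choice or a clash with a source name.
        if abbr in seen or abbr in source_names:
            conflicts.add(abbr)
        seen.add(abbr)
    return sorted(conflicts)

chosen_abbreviations = ["HUB", "LNK", "SAT", "PIT", "BRG"]  # hypothetical choices
source_system_names = ["SAP", "CRM", "PIT"]                 # hypothetical source objects

print(find_conflicts(chosen_abbreviations, source_system_names))  # ['PIT']
```

A check like this belongs in the same QA/QC gate that enforces the rest of the conventions, so conflicts are caught before release rather than after.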

That’s it for the standards.  Now for the best practices and suggested naming conventions that we’ve been using for years.

Best Practices and Suggested Naming Conventions

The most important thing to remember is: whatever naming conventions you choose must be enforced (and enforceable) in QA/QC.  In other words, any standards your team decides to implement aren’t worth anything without enforcement in the TEST area.  Your QA/QC team must be empowered to send things back to development when the standards are broken, and NOT let them slip through to production.  Consistency, scalability, and yes, even agility are all riding on the success of sticking to the standards.

Naming conventions prove not only useful but essential when coordinating multiple geographically split teams and keeping each team’s delivery timelines agile.  They are also essential in delivering accurate and understandable production builds.
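As a sketch of what automated enforcement might look like in a QA/QC gate, the check below verifies that released object names carry an agreed prefix.  The specific prefixes and regexes are hypothetical team choices, not part of the standard:

```python
import re

# Hypothetical team decisions: prefix notation, with VW_ for views,
# HUB_ for hubs, and SAT_ for satellites.
NAME_RULES = {
    "view": re.compile(r"^VW_[A-Z0-9_]+$"),
    "hub":  re.compile(r"^HUB_[A-Z0-9_]+$"),
    "sat":  re.compile(r"^SAT_[A-Z0-9_]+$"),
}

def violations(objects):
    """Return (object_type, name) pairs that break the naming rules."""
    return [(otype, name)
            for otype, name in objects
            if otype in NAME_RULES and not NAME_RULES[otype].match(name)]

release = [("view", "VW_CUSTOMER_CURRENT"),
           ("hub", "CUSTOMER_HUB"),          # wrong: suffix instead of prefix
           ("sat", "SAT_CUSTOMER_DETAILS")]
print(violations(release))  # [('hub', 'CUSTOMER_HUB')]
```

Wired into the TEST environment, a check like this gives the QA/QC team an objective, automatic reason to send a build back to development.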

  • Processes: P, PRC, PROC, SP
  • Views: V, VW
  • Functions: F, FUNC, FNC, FUN
  • Tables:
    • Stage: STG, S, STAGE
    • Hub: H, HUB, HB
    • Link: L, LNK, LINK
    • Satellite: S, SAT
    • Hub-Sat: HS, HSAT
    • Link-Sat: LS, LSAT
    • Hierarchical Link: HL, HLNK, HLINK
    • Same-As Link: SAL, SALNK, SALINK, SL, SLINK
    • Effectivity Sat: ES, ESAT, EFFSAT, EFF_SAT
    • Status Sat: STATUS, ST_SAT, STAT_SAT, SSAT
    • Non-Historized Link: NHL, NHLNK, NHLINK, XLINK, XL, XLNK
    • Multi-Active Sat: MAS, MASAT, MSAT
    • Business Driven Structure (prefixes all above conventions): B, BD, BDV
    • Exploration-Link: EXPL, EXLNK, EXPLINK, EXL
    • Point-in-time: PIT, PT, P
    • Bridge: BR, BRDG, BRIDGE, BRG
  • Columns:
    • System Driven Columns:
      • Hash Key: HK, HKEY, HKY
      • Hash Diff: HD, HDIFF, HDF
      • Load-Date Time Stamp: LDT, LDTS, LOAD_DTS
      • End-Date Time Stamp: EDTS, ETS
      • Sequence: SQN
      • Record Source: RSRC, RECSRC, REC_SRC, RS
      • Applied Date Time Stamp: APPL_DTS, ADTS, APDTS, ATS
      • Sub Sequence: SUB_SQN, SSQN, SUBSQN
      • Status Indicator: STAT_IND
      • Deleted Date Time Stamp: DEL_DTS, DDTS, DELTS, DLTS
      • Snapshot Date Time Stamp: SNAP_DTS, SNAP_TS, SNPTS
      • Updated Date Time Stamp (used in BDV, Master Data, and Info Marts): UPTS, UPDTS, UDTS, UTS
    • Generic Prefixes and Suffixes for other columns
      • Business Keys: BK, BKEY
      • Dependent Child: DEPC, DC
      • Driving Key: DK, DKEY
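To illustrate how one picked set of these suggestions composes into concrete object and column names, here is a small sketch.  The selections below (prefix notation, HUB/LNK/SAT/PIT/BRG for tables, HK/HDIFF suffixes for columns) are just one possible choice from the lists above:

```python
# One hypothetical selection from the suggested conventions above.
TABLE_PREFIXES = {"hub": "HUB", "link": "LNK", "satellite": "SAT",
                  "pit": "PIT", "bridge": "BRG"}
COLUMN_SUFFIXES = {"hash_key": "HK", "hash_diff": "HDIFF",
                   "business_key": "BK"}

def table_name(kind, business_name):
    """Build a table name from the chosen prefix convention."""
    return f"{TABLE_PREFIXES[kind]}_{business_name.upper()}"

def column_name(business_name, role):
    """Build a column name from the chosen suffix convention."""
    return f"{business_name.upper()}_{COLUMN_SUFFIXES[role]}"

print(table_name("hub", "customer"))        # HUB_CUSTOMER
print(column_name("customer", "hash_key"))  # CUSTOMER_HK
```

Generating names through a single function like this (rather than by hand) is one way to make the "apply the chosen pattern globally" standard hold automatically.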

A Word about Classification

Naming conventions are good, but they are only the base level of object identification and purpose.  They carry meaning about what the object is supposed to be used for, and where it’s needed.  That said, there is another level: classification of objects (i.e., what should they mean, or be, at the conceptual level?).  For this, we develop taxonomies and ontologies, along with conceptual models.

From an implementation view (using most standard data modeling tools), we highly encourage the use of the COMMENT fields on table structures, columns, procedures, functions, and views.  We suggest leveraging JSON formatted comments to house extra metadata.  An example of this is below:

COMMENT FOR A HUB_CUSTOMER TABLE:   { "entity-class": "HUB" }

COMMENT FOR A STAGING TABLE:

{
  "entity-class" : "STAGE",
  "business-key-groups" : [
    { "bk-columns" : "field1, field2" },
    { "bk-columns" : "field1, field5" }
  ]
}

Adding JSON formatted COLUMN and TABLE descriptions:

  • is cross-platform (works on all database platforms)
  • can be generated (with some parsing) into business documentation
  • can be leveraged by automated QA/QC processes
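Because the comments are plain JSON, tooling can parse them directly.  A minimal sketch of such parsing is below; the comment string is the staging-table example from above, and how you read comments out of the catalog will vary by platform:

```python
import json

# Comment string as it might be stored on a staging table
# (same structure as the staging example above).
comment = '''{
  "entity-class" : "STAGE",
  "business-key-groups" : [
    { "bk-columns" : "field1, field2" },
    { "bk-columns" : "field1, field5" }
  ]
}'''

meta = json.loads(comment)
entity_class = meta["entity-class"]
# Split each comma-separated column list into individual column names.
bk_groups = [g["bk-columns"].split(", ") for g in meta["business-key-groups"]]

print(entity_class)  # STAGE
print(bk_groups)     # [['field1', 'field2'], ['field1', 'field5']]
```

The same few lines are all an automated QA/QC process or a documentation generator needs in order to know an object’s class and business-key groupings.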

Note for Snowflake Customers

If you are on Snowflake, you should instead leverage the TAGGING options for these purposes.  That said, if you wish to remain cross-platform in your approach, you should follow the JSON commenting style.
