Data Vault, Data Mesh, and Polysemes

If you are familiar with ‘Data Vault’ and curious about ‘Data Mesh’, you may be asking yourself whether any synergy exists between the two.  I’m happy to say that there is a beautiful symmetry between them.

Data Vault is a proven methodology for building an analytic solution end-to-end from the business’s perspective; in other words, Data Vault provides not only “what” is needed, but also “how” to design and deliver a scalable, auditable, affordable solution.  Data Mesh is a concept or framework for federated computational governance of data product domains, aimed primarily at enabling domain-specific teams to design analytic outcomes autonomously.  That’s the “short” of it, although the Data Mesh framework has further layers of concepts that add detail about its application, breadth, and flexibility within the data ecosystem.

Implementing the concepts of Data Mesh on top of a Data Vault foundation makes a lot of sense because most of the concepts expressed in Data Mesh have been at the core of the Data Vault Methodology for decades.  That’s good news for those who have a Data Vault analytic solution in place and are thinking about Data Mesh: you already have a solid methodology (the “how to”) that offers an accelerated path to your end goal.  This article focuses on the concept of a polyseme to illustrate the symmetry.

A POLYSEME is “a shared concept across different domains. They point to the same entity, with domain-specific attributes.  Polysemes represent shared core concepts in a business such as ‘artist’, ‘listener’, and ‘song’.”[i]

Data Vault focuses on the business – not just the technology aspects, but on the people and the processes.  A healthy vision of an enterprise-wide analytic solution cannot merely focus on tooling or on the technology.  It MUST span and incorporate People, Process and Technology.  This is exactly what Data Vault does.  The solution provides businesses with the methodology needed to embrace all three components of a healthy solution. We emphasize in our CDVP2 classes that “Data Vault is, was, and will always be, about the business.”

When we consider polysemes as being a “shared concept across different domains” with domain-specific attributes, our Data Vault minds should immediately be drawn to the longstanding definition of a logically integrated Hub.  The entire purpose of a logically integrated hub is to collect all values of a business concept as identified by its business key wherever those business keys originate across the enterprise.  In simple terms the rules for a hub are: (1) a Hub holds a unique list of business keys (for the business concept represented), and (2) all business keys collected in the Hub must have the same semantic meaning and be at the same semantic grain.
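The two hub rules above can be illustrated with a minimal sketch.  Everything here is hypothetical and for illustration only: the `Hub` class, the record source names, and the trim-and-uppercase normalization are assumptions, not part of any Data Vault tooling.  Rule (2), same semantic meaning and grain, is a modeling decision that code cannot check, so it appears only as a comment.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Hub:
    """A unique list of business keys for one business concept."""
    concept: str
    # business key -> (first record source, first load timestamp)
    rows: dict = field(default_factory=dict)

    def load(self, business_key: str, record_source: str) -> bool:
        """Insert the key if it is new; return True when a row was added."""
        # Assumed normalization rule, so the same key from two systems matches.
        key = business_key.strip().upper()
        if key in self.rows:
            return False  # rule 1: the hub holds each business key only once
        # Rule 2 (same meaning, same grain) must be agreed by the business
        # before a source is allowed to load into this hub.
        self.rows[key] = (record_source, datetime.now(timezone.utc))
        return True


customer_hub = Hub("CUSTOMER")
customer_hub.load("C-1001", "sales.crm")
customer_hub.load("c-1001 ", "billing.erp")      # same customer, another domain
customer_hub.load("C-2002", "support.tickets")
```

In this sketch the second load adds nothing, because the key is already known; the hub ends up with two customers regardless of how many domains supplied them.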

It has been my position from the beginning that no one domain can “own” the governance of a polyseme.  It is my perspective that a polyseme is a business concept that is core to the entire operation and sustainment of the business, shared throughout and across the business, and therefore owned at the corporate level.  This does not make it monolithic.  It treats the polyseme as it should be treated: as core to the business’s ability to continue generating revenue.  It should be governed as such.

There are other business concepts that are not necessarily core to the business, but are enablers to the business, and those are mostly evident in the various domains.  Hubs based on those business concepts should be governed by the domain that owns them.

There are also certain contextual elements of polysemes that are shared across the entire organization and need to be governed as such (e.g., CUSTOMER_NAME, EMPLOYEE_NAME, SUPPLIER_NAME, BILLING_ADDRESS, EMPLOYEE_PII … you get the picture).  Such elements represent the satellite data (the contextual data over time) of a polyseme and ought to be governed at that level.  Other contextual elements associated with a polyseme will be domain-specific; therefore, each domain should be free to store, own, govern, and utilize that descriptive / contextual data in a different satellite object as required.  Other domains may use these satellites of domain-specific contextual data across the mesh as needed based on quality and accuracy (via “up votes”).
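As a rough sketch of the split described above, one corporate-governed satellite and one domain-owned satellite can both hang off the same polyseme business key.  The `Satellite` class, the `owner` labels, the satellite names, and the hash-based change check are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def hash_diff(attributes: dict) -> str:
    """Fingerprint of the attribute values, used to detect change."""
    payload = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Satellite:
    name: str
    owner: str  # "corporate" or a domain name (hypothetical labels)
    # business key -> list of (load sequence, attributes) in load order
    history: dict = field(default_factory=dict)

    def load(self, business_key: str, attributes: dict) -> bool:
        """Append a new row only when the context has changed."""
        rows = self.history.setdefault(business_key, [])
        if rows and hash_diff(rows[-1][1]) == hash_diff(attributes):
            return False
        rows.append((len(rows) + 1, dict(attributes)))
        return True

    def current(self, business_key: str) -> dict:
        """Return the most recently loaded context for the key."""
        return self.history[business_key][-1][1]


# Shared context, governed at the corporate level.
shared = Satellite("SAT_CUSTOMER_CORE", owner="corporate")
# Domain-specific context, owned by a hypothetical marketing domain.
marketing = Satellite("SAT_CUSTOMER_MARKETING", owner="marketing")

shared.load("C-1001", {"CUSTOMER_NAME": "Acme Ltd", "BILLING_ADDRESS": "1 Main St"})
shared.load("C-1001", {"CUSTOMER_NAME": "Acme Ltd", "BILLING_ADDRESS": "1 Main St"})
shared.load("C-1001", {"CUSTOMER_NAME": "Acme Ltd", "BILLING_ADDRESS": "9 High St"})
marketing.load("C-1001", {"SEGMENT": "enterprise"})
```

Both satellites describe the same customer, yet each keeps its own history and its own owner, so the marketing domain can evolve its attributes without touching the shared ones.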

Polyseme links represent well-established relationships among business concepts.  They may or may not have one or more driving keys, but the relationship as defined by the link is “fixed” across the enterprise and governed at the highest level.  No one domain is allowed to “own” the polyseme relationship; it is owned at the corporate level.  Domains are free to create links in their own domain for discovery, or as data products for specific and different grains of the core link’s relationships, or for whatever else the domain requires.  Those domain-specific links may be shared through the mesh as well.

Polyseme links present immutable associations among the business concepts; they are common and consumed across the enterprise domains.  For example, an INVOICE is a relationship between ORDER x CUSTOMER or ACCOUNT x PRODUCT x LINE ITEM.  The core relationship concept of INVOICE is available for use by the entire enterprise.  One caution about governing polyseme links: when a polyseme link relationship is based on a driving key, it is one-directional.  Relationships, however, are perspective-based and may be multi-directional depending on a domain’s perspective.  We must allow the domains to view a relationship from their own perspective; therefore, governance allows for and encourages the product domain teams to generate such domain-specific links, with or without one or more driving keys, as needed.
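The directionality of a driving key can be sketched as follows.  This is an illustration under stated assumptions, not a reference implementation: the `Link` class, the link names, and the `active` lookup are hypothetical, and `active` merely stands in for the effectivity tracking that a real model would typically keep in a separate satellite.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    name: str
    owner: str                  # "corporate" or a domain name (hypothetical)
    key_names: tuple            # hub keys taking part in the relationship
    driving_keys: tuple = ()    # subset of key_names; empty means none
    rows: list = field(default_factory=list)    # unique relationships, insert-only
    active: dict = field(default_factory=dict)  # driving key values -> current row

    def load(self, **keys) -> bool:
        """Record a relationship; return True when it was not seen before."""
        row = tuple(keys[name] for name in self.key_names)
        is_new = row not in self.rows
        if is_new:
            self.rows.append(row)  # existing rows are never updated or deleted
        if self.driving_keys:
            driver = tuple(keys[name] for name in self.driving_keys)
            self.active[driver] = row  # latest relationship wins for this driver
        return is_new

    def current(self, **driving_values):
        """Look up the current relationship from the driving side only."""
        driver = tuple(driving_values[name] for name in self.driving_keys)
        return self.active.get(driver)


# Corporate link: each ORDER drives which CUSTOMER it currently belongs to.
core = Link("LINK_ORDER_CUSTOMER", "corporate",
            ("ORDER", "CUSTOMER"), driving_keys=("ORDER",))
core.load(ORDER="O-1", CUSTOMER="C-1001")
core.load(ORDER="O-1", CUSTOMER="C-2002")   # the order is reassigned

# Domain link over the same hubs, viewed from the customer's perspective.
by_customer = Link("LINK_CUSTOMER_LATEST_ORDER", "sales",
                   ("CUSTOMER", "ORDER"), driving_keys=("CUSTOMER",))
by_customer.load(CUSTOMER="C-2002", ORDER="O-1")
by_customer.load(CUSTOMER="C-2002", ORDER="O-7")
```

The corporate link answers "which customer does this order belong to now?", while the domain link answers "which order is current for this customer?".  Both keep every relationship ever loaded, and neither can answer the other's question without a scan, which is the one-directional behavior the caution refers to.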

I happen to believe that this perspective of polysemes versus domain-based data objects provides the appropriate level of centralized governance while providing latitude for domain areas to reuse the polyseme objects for enrichment, discovery, and the creation of valuable data products without data replication across the enterprise.  Notice, I did not state where any given polyseme “exists” – no platform, no technology, no nothin’.  I only say that it is defined, owned, and governed at the highest level of the organization – not by any specific domain.

One last thought here.  Business Vault objects that represent Master Data ought to be considered polysemes; owned and governed at the corporate level as well.  Domains may produce their own set of “master data” for consumption at the data product level, but the corporate Master Data produced at the polyseme level is highly governed and well defined for the entire organization.  It can be used to feed operational systems and domain-specific applications if needed.

In my vision of a Data Mesh, this presents a truly flexible, resilient, scalable hub and spoke concept that incorporates the right level of governance in the right place ensuring the enterprise is using common language around their core business concepts.  It also provides the freedom for each domain to produce valuable, high quality business products for their internal use as well as for enterprise use through publication to the mesh ecosystem.

[i] Zhamak Dehghani, Data Mesh: Delivering Data-Driven Value at Scale, March 2022, O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA, 95472, pg. 25.

#####

If you liked this topic, let us know!  If you’d like more content like this, let us know!  We’d love to hear your thoughts on this topic and any other you wish to share.
