Data Fabric is one of the latest buzzwords associated with another buzzword, Digital Transformation. In this post, let’s untangle the concept of Data Fabric and unsnarl it from marketing entanglements. Then, we can look at Data Fabric in the context of Data Vault, how the approaches complement each other, and how taking a Data Vault approach can help stretch the Data Fabric to go further.
The purpose of this article is to demystify some of the buzzwords. Buzzwords are meant to simplify concepts, but they often deceive people into thinking that they only need one solution. Unfortunately, life is more complicated!
Disentangling data sources for digital transformation value
It’s common wisdom that data volumes, types, and speeds are exploding. Businesses cannot behave like King Canute, shouting at the sea to hold back the tides of data reaching the organization; data is here to stay. What is less apparent is that data sources are becoming more and more entangled. Customers are no longer represented only by rectangular data in rows and columns. A 360-degree view of the customer now includes IoT, sensor, and social media data. Businesses are working to integrate these data sources, and the difficulty of doing so is a pernicious problem that impacts the organization’s continuous digital transformation.
Data Fabric and Data Vault aim to help businesses make their data work, but there is some confusion about the terms. The risk is that organizations can repeat the same mistakes and expect a different result while unintentionally creating another data silo.
What is a Data Fabric?
The objective of the Data Fabric is to deliver a unified layer for all forms of data consumption. It is an architecture and set of data services that provide consistent capabilities across a choice of endpoints, often spanning on-premises and multiple cloud environments. A data fabric architecture sews together historical and current data across various data silos to produce a uniform and unified business view of the data.
What issue is data fabric architecture trying to solve? Data fabrics are attractive because they are commonly associated with Digital Transformation efforts, and there is an assumption that the Data Fabric architecture will reduce costs. Organizations have issues in governing data sources, which, in turn, makes it difficult to derive value from the data. The Data Fabric focuses on integrable data structures rather than integrated data sources, breaking the data into reusable components.
Data Fabric is trying to solve the problem of data that disappears from the perspective of the business. As more data arrives in the organization, even more data disappears from view. As a result, the problem grows over time and pushes the organization further away from obtaining value from the data.
Data Fabric provides an elegant solution to the complex IT challenges in handling enormous amounts of data from disparate sources without replicating all of the data into yet another repository. This feat is accomplished by combining data integration, data virtualization, and data management technologies to create a unified semantic data layer that aids many business processes (such as accelerating data preparation and facilitating data science).
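The idea of a unified semantic layer that reads from sources in place, rather than copying them into another repository, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any product: the DataFabric class, the source names (crm, web), the field names, and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


class DataFabric:
    """Toy unified access layer (hypothetical): sources are read on demand, never copied."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], Iterable[Row]]] = {}
        self._mappings: Dict[str, Dict[str, str]] = {}

    def register(self, name: str, reader: Callable[[], Iterable[Row]],
                 mapping: Dict[str, str]) -> None:
        # mapping translates source column names into shared (semantic) field names
        self._readers[name] = reader
        self._mappings[name] = mapping

    def query(self, fields: List[str]) -> Iterator[Row]:
        # Pull rows lazily from every registered source and present them
        # under the shared field names, tagged with where they came from.
        for name, reader in self._readers.items():
            mapping = self._mappings[name]
            for row in reader():
                unified = {mapping[k]: v for k, v in row.items() if k in mapping}
                unified["_source"] = name
                yield {f: unified.get(f) for f in fields + ["_source"]}


# Hypothetical sources: in practice these readers would call a database or an API.
crm_rows = [{"cust_id": "C1", "full_name": "Ada"}]
web_rows = [{"customerKey": "C1", "lastSeen": "2024-01-05"}]

fabric = DataFabric()
fabric.register("crm", lambda: crm_rows, {"cust_id": "customer_id", "full_name": "name"})
fabric.register("web", lambda: web_rows, {"customerKey": "customer_id", "lastSeen": "last_seen"})

rows = list(fabric.query(["customer_id", "name"]))
```

Each source keeps its own shape; only the mapping is registered with the fabric. Fields that a source does not supply come back as None, which makes gaps between sources visible rather than hiding them.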
The concept of fabrics has been around since the grid computing concept appeared late last century. In computing, fabrics are interconnected structures where all the different computing nodes appear as a single logical unit. This concept is applied to a data fabric, where the various nodes are viewed as data sources and interconnected to one another. Data Fabric connects diverse data environments while providing continuous maintenance and care for data consumers, such as applications and services.
There are varying understandings of the Data Fabric architecture, but ultimately it is a unified architecture that serves as a platform for services and technologies. A common view is that the data fabric architecture enables obtaining, ingesting, blending, and sharing data in a distributed data environment. The unified perspective comes from perceiving the data sources as threads in a weave of fabric that connects data coming from separate locations.
Data Fabric architecture supports batch, real-time, and big data use cases and focuses on data ingestion and integration. The Data Fabric approach emphasizes built-in data quality, data preparation, data governance capabilities, and automation.
What is Data Vault?
Data Vault is built on three pillars: Architecture, Model, and Methodology. Together, they are explicitly designed to solve a complete business problem and to keep the solution optimized as requirements change. The Methodology is consistent, repeatable, and business-oriented, with a focus on collaboration. The Architecture is also crucial, since it ensures that the overall solution meets standards. Within the Methodology, Data Vault offers prescriptive implementation guidance, comprising rules, best practices, standards, process designs, and more.
What do the Data Vault and Data Fabric have in common?
At a high level, Data Fabric and Data Vault are both aimed at supporting the broad business need to work with data in a fluid, dynamic environment where data needs to be interpreted alongside other data sources. Both are designed to handle irregularly shaped data sets, where the data itself is extensive.
How does the Data Vault differ from Data Fabric architecture?
While Data Fabric is mainly focused on the integration piece, Data Vault comprises an end-to-end methodology that prioritizes crucial parts of the business from the data perspective. In essence, Data Fabric only solves one part of the problem, whereas Data Vault recognizes that the people and processes are front-and-center of the solution.
In the Data Vault methodology, the Key Process Areas (KPA) are a crucial part of driving the design of the overall solution. In contrast, the Data Fabric architecture supports the overall solution by offering integration, but it does not help the business to prioritize the order of implementation. According to the Data Vault methodology, priority would be given to the most feasible solution that derives the optimal value for the business.
The Data Vault approach is a methodology rather than a technology-focused architecture. Data Vault considers people and the processes, rather than simply the technology itself. There is a distinction between low impact and high impact changes and their consequences, and it is harder to recognize these consequences in the Data Fabric architecture.
Data Fabric and Data Vault both emphasize data provisioning to the business, but the emphasis manifests differently. The Data Fabric architecture emphasizes data ingestion and integration, viewing provisioning as a result. On the other hand, the Data Vault methodology simplifies this critical part of data warehousing by facilitating new data sources without disturbing the schema. Further, the Data Vault methodology carefully considers the concept of writing data back into the Data Vault, and it tackles data governance concerns through confidential auditing of data ingestion from the source systems.
Data Vault increases adoption among business users because they can see the improvements and optimizations that come from using the methodology, which supports the business domain through data. Data Vault also encourages collaboration with the data scientist community in the organization.
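To make the idea of adding a new data source without disturbing the schema more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. It assumes the standard Data Vault modeling terms hub and satellite, which this article does not otherwise cover; the table names, column names, and the load_ts and record_source audit columns are illustrative assumptions rather than a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A hub holds the business key; a satellite holds descriptive attributes
# from one source. All names here are hypothetical.
conn.executescript("""
CREATE TABLE hub_customer (
    customer_key  TEXT PRIMARY KEY,
    load_ts       TEXT NOT NULL,   -- when the key was first loaded
    record_source TEXT NOT NULL    -- which system supplied it (audit trail)
);
CREATE TABLE sat_customer_crm (
    customer_key  TEXT NOT NULL REFERENCES hub_customer(customer_key),
    load_ts       TEXT NOT NULL,
    record_source TEXT NOT NULL,
    name          TEXT,
    PRIMARY KEY (customer_key, load_ts)
);
""")


def table_columns(table: str) -> list:
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


hub_before = table_columns("hub_customer")

# A new source arrives: it gets its own satellite, and no existing table is altered.
conn.execute("""
CREATE TABLE sat_customer_web (
    customer_key  TEXT NOT NULL REFERENCES hub_customer(customer_key),
    load_ts       TEXT NOT NULL,
    record_source TEXT NOT NULL,
    last_seen     TEXT,
    PRIMARY KEY (customer_key, load_ts)
)
""")

hub_after = table_columns("hub_customer")
```

The hub’s columns are the same before and after the new source is added, so existing loads and queries are unaffected. Carrying a load timestamp and a record source on every row is one simple way to keep ingestion from each source system auditable.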
How can Data Vault and Data Fabric work together?
Tim Berners-Lee is quoted as having said that ‘Data is a precious thing and will last longer than the systems themselves.’ Often, data and technology are assumed to be the same thing, and businesses often categorize data as part of the IT department’s remit due to its guardianship role.
The Data Fabric approach fits in well with the IT department because it zeroes in on integrating data from disparate sources. However, it is a mistake to take only an IT-oriented stance on data integration, since that does not contribute to optimizing the business processes. In contrast, the Data Vault methodology takes a holistic approach to optimizing business performance and puts the business first. Data Fabric implementations can be made more successful by adopting the Data Vault methodology, which recognizes that data and technology are two separate entities and should be treated separately. The Data Fabric approach can help integrate data sources, but it is recommended that the Data Vault methodology of handling keys is followed to ensure the success of the overall solution.
The Data Vault methodology differs from Data Fabric in terms of focus and coverage. The Data Fabric approach can play a part in the overall architecture by supporting the integration aspects of the Data Vault methodology. The Data Fabric approach emphasizes data ingestion and integration, the fabric of how data connect. The Data Vault methodology is also concerned with generating value from the data, but it does so by focusing on solving business problems rather than on data ingestion and integration. It can be very tempting to focus on the technology and the data rather than on the business, its problems, and its processes, since data issues can be easier to solve.
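As an illustration of what disciplined key handling can look like, the sketch below derives a deterministic key by normalizing a business key and hashing it, a technique commonly associated with Data Vault implementations. The article does not specify the rules, so the normalization (trim and uppercase), the "||" delimiter, the choice of SHA-256, and the function name hash_key are all assumptions made for this example.

```python
import hashlib


def hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Derive a deterministic key from one or more business key parts.

    Hypothetical rules: trim whitespace, uppercase, join the parts with a
    delimiter, then hash, so every source system produces the same key
    for the same business entity.
    """
    normalized = [part.strip().upper() for part in business_key_parts]
    joined = delimiter.join(normalized)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# The same customer number, formatted differently by two source systems
key_from_crm = hash_key(" c-1001 ")
key_from_web = hash_key("C-1001")
```

Because both systems yield the same key, records integrated through the fabric can be matched consistently. The delimiter keeps composite keys unambiguous: hash_key("A", "B") and hash_key("AB") give different results.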
Jennifer Stirrup is the Founder and CEO of Data Relish, a UK-based AI and Business Intelligence leadership boutique consultancy delivering data strategy and business-focused solutions. Jen is a recognized leading authority in AI and Business Intelligence Leadership, a Fortune 100 global speaker, and has been named as one of the Top 50 Global Data Visionaries, one of the Top Data Scientists to follow on Twitter and one of the most influential Top 50 Women in Technology worldwide.
Jen has clients in 24 countries on 5 continents, and she holds postgraduate degrees in AI and Cognitive Science. Jen has authored books on data and artificial intelligence, and has been featured on CBS Interactive, the BBC, and well-known podcasts such as Digital Disrupted and Run As Radio, as well as her own Make Your Data Work webinar series.
Jen has also given keynotes for colleges and universities, and she donates her expertise to charities and non-profits as a Non-Executive Director. Jen’s keynotes cover AI Leadership, Diversity and Inclusion in Technology, Digital Transformation, and Business Intelligence. All of Jen’s keynotes are based on her more than two decades of global experience, dedication, and hard work.