Understanding Data Vault 2.0
Understanding #datavault 2.0, why DV2 is important, how it compares with other data modeling methods, and how it works for #bigdata, #iot, #streaming, #datawarehouse, and #datascience
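Before diving into the articles below, it helps to know the shape of a Data Vault 2.0 model: hubs hold business keys, links hold relationships between hubs, and satellites hold descriptive attributes with full history, all joined on deterministic hash keys computed from the business keys. As a minimal, illustrative sketch only (the table and field names here are hypothetical, and DV2 implementations commonly use MD5 or SHA-1 where this example uses SHA-256):

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys: str) -> str:
    # Deterministic hash over normalized business keys; DV2 joins
    # hubs, links, and satellites on keys derived this way.
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hub row: just the business key, its hash key, and load metadata.
customer_hub = {
    "customer_hk": hash_key("CUST-1001"),    # hash key (primary key)
    "customer_bk": "CUST-1001",              # source business key
    "load_dts": datetime.now(timezone.utc),  # load timestamp
    "record_source": "CRM",                  # lineage for auditability
}

# Satellite row: descriptive attributes hang off the hub's hash key
# and are versioned by load date, so history is never overwritten.
customer_sat = {
    "customer_hk": customer_hub["customer_hk"],
    "load_dts": customer_hub["load_dts"],
    "customer_name": "Acme Corp",
    "hashdiff": hash_key("Acme Corp"),       # change-detection hash
}
```

The record source and load timestamp on every row, combined with insert-only loading, are what underpin the auditability claims quoted in the case studies below.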
Introduction: The industry has struggled for a long time to define a data lake. Here we take the plunge and define one properly. I have seen hundreds of different definitions around the world, and none of them seems to give an organization the foundation it needs to build a successful data lake…
By Cynthia Meyersohn. To close out this series of articles, which has focused on the data replication introduced by the processes outlined in “Build a Schema-On-Read Analytics Pipeline Using Amazon Athena” by Ujjwal Ratan, Sep. 29, 2017, on the AWS Big Data Blog, https://aws.amazon.com/blogs/big-data/build-a-schema-on-read-analytics-pipeline-using-amazon-athena/ (Ratan, 2017), I will talk about an approach that I…
By Cynthia Meyersohn. Picking up from Part 2 of this series, where we left off having replicated the data a minimum of nine times, we will continue to identify additional data replication stages as we trace through the data processes outlined in “Build a Schema-On-Read Analytics Pipeline Using Amazon Athena” by Ujjwal Ratan, Sep. 29,…
By Cynthia Meyersohn. Continuing from Part 1 of this series, this article follows the breakdown of the AWS Schema-on-Read analytics pipeline with a focus on data movement and replication. You may recall we are tracing through the data processes outlined in “Build a Schema-On-Read Analytics Pipeline Using Amazon Athena” by Ujjwal Ratan, Sep. 29,…
By Cynthia Meyersohn. Last August I was asked to participate in an exercise to assess whether the Data Vault 2.0 System of Business Intelligence (DV2) still held value. Specifically, I was asked the following: does the size of the data now being delivered to the business make a compelling argument that the data…
Challenge: Automated Build-Out
Results: 2 people in 2 weeks merged 3 systems and built a full EDW: 5 star schemas, 3 BI reports, zero production errors.
Generated: 90% of the ETL code, 100% of the staging data model, 75% of the finished EDW data model, and 75% of the star schema data model.
The competing bid?…
Challenge: Rapid Merger and Acquisition
Results: merged 3 companies in 90 days (circa 2001). ALL systems, ALL DATA! 125 people, multiple cultures. Business value produced during the due diligence phase.
JP Morgan Chase used the Data Vault model, methodology, and architecture to perform due diligence, and ultimately to merge 3 companies in 90 days. I met with…
Challenge: Minimized Re-Engineering and Full Auditability
Results: “Data Vault 2.0 delivers full auditability by protecting the integrity and origin of the data.” “For us, the Data Vault 2.0 has provided significant benefits. We were able to change our data warehouse whilst minimizing the re-engineering costs. Now we have a system that is more closely aligned…
Challenge: Government Agencies Sharing Data
Results: CBS (the Dutch Central Bureau of Statistics) and the Belastingdienst (the Dutch tax authority) share data between the agencies by leveraging Data Vault 2.0 data models and DV2.0 processes.
Learn more about Data Vault 2.0 and how we can help securely and impactfully bring data together to produce more business value for…