NYU Builds Successful Data Vault

NYU Builds Data Vault 2.0 & Unlocks Big Business Value

New York University is as large as a major corporation, with 100,000 undergraduate students, 19,000 employees, and 18 schools in three countries. While they haven’t been collecting data for all 191 years of their existence, they have been building analytics to support various areas for many years. Their architecture has evolved through changing source systems, technology innovation, and the emergence of the cloud. We are proud to announce that NYU has built a Data Vault and continues to build more value from it to this day.

Over the last 15 years, processes were created to solve one problem at a time as needs arose. Like many analytics groups, Senior Director of Enterprise Data Management Satya Kunta and his team were skilled in building siloed Kimball-style dimensional data warehouses. Over time, maintaining and operating all these individual projects became a burden to the NYU team. As the legacy architecture diagram below shows, data flowed in many directions.

[Figure: NYU legacy architecture diagram]

Kunta Highlighted The Top Five Pain Points:

1. Technical Debt – keeping all the processes running became the focus. Many servers, many jobs, many databases, too few people.
2. Multiple Data Warehouses – duplicate processes and information collisions were common. Governing all these individual databases with data overlaps was difficult.
3. Convoluted ETL Process – batch processes ran nightly, generating many failures that kept the staff busy. “Hardcoded” methods required time to repair.
4. Too Rigid – approaching technical changes in an agile fashion was impossible with such a fragile architecture.
5. No Single Point of Truth – without governance, the truth became what one group and one solution said it was. No one owned the most important measures of their business.

NYU Found a Better Way – NYU Builds Data Vault

By 2017, NYU knew they needed a better way. They started to envision the traits they desired in a more modern analytics architecture: treating data as a product, resilience to source changes, development agility, and faster delivery to the business. During this period, one of NYU’s peers, Yale University, was exploring its own data modernization.

Yale was working with a consultancy, Eon Collective, and considering concepts like Data Vault, automation, and cloud databases. Through conversations with Yale and their own research, NYU recognized that the tenets of the Data Vault 2.0 methodology closely matched their wish list. Kunta and his team had to keep an open mind as they considered leaving the familiar dimensional modeling behind. They studied Dan Linstedt’s leading book on the subject and attended the Worldwide Data Vault Consortium to hear from the experts. It was all starting to make sense.

NYU took another step in 2018 toward adopting the Data Vault methodology by contracting with Eon Collective to assist in their modernization efforts. Eon Collective CTO, Robert Scott, has extensive experience in modernization and complex migrations. Scott explained, “there are always three areas to be addressed: people, process, and technology. They all must be considered – and Data Vault addresses all three.”

With assistance from Eon Collective, NYU made the key technology and process decisions that would form the foundation of their new architecture:

  • Data Vault Concepts and Design – speed and resilience to change, auditing, and traceability.
  • Institutional Repositories – constructed authoritative data assets for use across the university.
  • Automation – to attain development agility, NYU used template-based ELT automation methods. This allowed the development team to make changes quickly and automatically update documentation and lineage.
  • Governance – created a data governance framework and operating model.
  • KPIs – developed essential factors and KPIs to measure success.
  • CDC – change data capture (CDC), rather than nightly batch processing, powered the new ingestion engine.
  • Cloud Database and Storage – NYU selected the Snowflake database as a key technology partner.
  • New Consumers – NYU’s API access gives their data greater reach. They are also growing their data science capabilities.
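
To make the template-based automation idea above concrete, here is a minimal sketch of how one shared SQL template could generate the load statement for any hub from a few metadata values. It is an illustration under stated assumptions, not NYU's or Eon Collective's actual tooling: the template text, the `render_hub_load` helper, and the `hub_`, `stg_`, and `_hk` naming conventions are all hypothetical.

```python
from string import Template

# Hypothetical hub-load template. Table names, column names, and the MD5-based
# hash key convention are illustrative assumptions, not NYU's actual objects.
HUB_LOAD_TEMPLATE = Template("""\
INSERT INTO ${hub_table} (${hash_key}, ${business_key}, load_date, record_source)
SELECT DISTINCT MD5(UPPER(TRIM(stg.${business_key}))), stg.${business_key},
       CURRENT_TIMESTAMP, '${record_source}'
FROM ${staging_table} AS stg
WHERE NOT EXISTS (
    SELECT 1 FROM ${hub_table} AS hub
    WHERE hub.${business_key} = stg.${business_key}
);""")


def render_hub_load(entity: str, business_key: str, record_source: str) -> str:
    """Render the hub-load statement for one entity from shared metadata."""
    return HUB_LOAD_TEMPLATE.substitute(
        hub_table=f"hub_{entity}",
        hash_key=f"{entity}_hk",
        business_key=business_key,
        staging_table=f"stg_{entity}",
        record_source=record_source,
    )


if __name__ == "__main__":
    # Two entities, one template: the generated SQL differs only in metadata.
    print(render_hub_load("student", "student_id", "SIS"))
    print(render_hub_load("building", "building_code", "FACILITIES"))
```

Because every hub is loaded by the same template, a change to the loading pattern is made once and regenerated everywhere, which is the kind of consistency template-based development is meant to provide.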

NYU Data Vault Architecture

The diagram below shows the new, modernized NYU Data Vault architecture.

[Figure: NYU Data Vault 2.0 architecture]

Improvements From Data Vault Architecture

While completing the first project with the new architecture, NYU gained experience by working side by side with Scott and his team. Keith Belanger from Eon Collective commented, “NYU absorbed the data vault methodology completely.”

Even while learning new tools and methods, Kunta and his team were already seeing how the new architecture would help them serve the business better. They made some mistakes during the learning process, but they iterated and fixed the issues as they arose.

By the end of 2019, they saw significant improvement in these areas – all of which are incorporated in the Data Vault 2.0 Solution:

  • Agility – Kunta stated that agility is most important in delivering value to the business. Responding to business needs within a time frame that lets the business be more effective is critical.
  • Automation – template-based development created consistent coding regardless of who the developer was. Automation empowers agility.
  • Reusability – standard templates for modeling, transformation, and data handling. By designing the process correctly from the beginning using a standard, the process became reusable and much more efficient.
  • Transparency – brought the business to the process through governance and education on how to use the data.
  • Auditability – everyone was now on the same page about what data came from the source systems and what was done to it: in other words, the lineage and quality of the data.
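
As a rough illustration of how a Data Vault style record can carry its own audit trail, the sketch below builds a satellite row with a hash key, a hash diff for change detection, a load timestamp, and a record source. The article does not describe NYU's implementation at this level, so the function names, the `||` delimiter, the use of MD5, and the sample attributes are assumptions chosen for illustration.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def hash_key(*business_keys: str) -> str:
    """Hash normalized business keys so the same entity always maps to one key."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def hash_diff(attributes: dict) -> str:
    """Hash descriptive attributes in a fixed order to detect changes cheaply."""
    payload = "||".join(str(attributes[name]).strip() for name in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def satellite_row(business_key: str, attributes: dict, record_source: str,
                  load_date: Optional[datetime] = None) -> dict:
    """Build one satellite record with the audit columns attached."""
    return {
        "hash_key": hash_key(business_key),
        "hash_diff": hash_diff(attributes),
        "load_date": load_date or datetime.now(timezone.utc),
        "record_source": record_source,  # where the data came from
        **attributes,
    }


if __name__ == "__main__":
    # Hypothetical sample data; a changed attribute yields a new hash_diff.
    before = satellite_row("n123", {"campus": "New York", "status": "cleared"}, "SCREENING_APP")
    after = satellite_row("N123 ", {"campus": "New York", "status": "pending"}, "SCREENING_APP")
    print(before["hash_key"] == after["hash_key"])    # same entity
    print(before["hash_diff"] != after["hash_diff"])  # attributes changed
```

Storing the record source and load date on every row is what lets a team answer where a value came from and when it arrived without consulting separate documentation.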

NYU’s ability to deliver with the new architecture was put to the test during the COVID-19 pandemic. New applications were needed to track where students were located, their screening status, which buildings were open, and who was allowed entry. The NYU team delivered the needed applications in eight weeks. Speaking about the importance of the Data Vault methodology, Kunta stated, “if we had not had the Data Vault solution in place and had to use the old systems, forget it, we wouldn’t have been able to build it as fast as we did.”

After such a successful modernization project, what’s next? Kunta describes the next adventure for the NYU team: to offer their users a data marketplace where they can shop for data.

Having lived through the stress of maintaining the legacy processes, Kunta and the team are now excited about accepting new data sources and helping other parts of the university. We are beyond proud that NYU has built a Data Vault 2.0 solution and is creating an even larger impact with their data.
