What is a Data Lake?

Introduction

The industry has struggled for a long time to define a data lake. We are taking the plunge: let’s properly define one. I have seen hundreds of different definitions around the world, and none of them seems to give an organization the foundations it needs to build a successful data lake.

This post, along with this short 10-minute video, is meant to assist you in defining your data lake.

Special Thanks

I need to take a minute to give special thanks to a few people. I am constantly reading what the industry says, but I am also constantly vetting these ideas and challenging my friends and colleagues to discuss WHY and HOW these ideas shape, change, and apply to the industry. Here is a list of the contributors to my ideas around Data Lakes:

  • Tamara Dull (Unofficially blessed my definitions)
  • Kent Graziano
  • Sanjay Pande
  • Bruce McCartney
  • Scott Ambler
  • Cindi Meyersohn

These folks were instrumental in helping me formulate the ideas in this video and this blog post. I challenged them to assess the value of the statements made by the industry, and to separate what was a platform or vendor position from what was a conceptual notion.

Defining A Data Lake - Definitions

The original definition of Data Lake, written by Tamara Dull in 2015, really doesn’t do justice to what a Data Lake is or should be. In fact, the original chart is better read as a comparison of physical platform properties (comparing Traditional RDBMS platforms to NewSQL / NoSQL platforms). In order to properly define a Data Lake, we need to clarify what it is not.

In this case, we need to define:

  • Data Dump
  • Data Junkyard
  • Data Swamp
  • Landing Zone

Each one is unique in its own right. Too many people in the industry use the term Data Lake when they really mean Data Dump. The I.T. teams call it a Landing Zone to be politically correct, as business users grimace when they hear “Data Dump” or “Data Junkyard”.

Moving from Data Dump to Data Junkyard to Data Swamp to Landing Zone is a maturity curve.  Each component requires specific definition, as well as specific features and processes to be applied to the data.

Data Lakes are a solution for Business Intelligence / analytics. They are not a platform, they are not a tool, and they are not a file store in the cloud! Data Lakes are not Data Hubs, and this entry is not about defining a Data Hub; we will do that in another blog post soon.

Landing Zones are not Data Lakes!

Data Warehouses are part of a Data Lake solution, but should not be compared to a Data Lake.

Keys to Success

There are several keys to success in building a Data Lake. Often businesses forget that good governance, security, and methodology are part of a data lake strategy. Typically, data dumps are built and labeled as “data lakes”; then, when the business can’t use what’s built, it deems the effort unsuccessful (a failure) and shuts it down.

Some of the reasons the business can’t use it are: it is missing a solid definition, it lacks business use cases, the data isn’t properly defined, profiled, or documented in metadata, or there was a security breach at the file level.

A few of the top keys to success include:

  • Governance
  • Security
  • Profiling / Structure Assignment
  • Metadata & Lineage
  • Methodology & Approach
  • Design and Architecture
  • Flow (In and Out)
  • Collaboration (I.T., Data Science groups, and Business Analysts)
  • Use Case / Purpose for the Data
  • Leverage / Incorporation of a Data Warehouse
  • Productionalization Processes
  • Managed Self-Service BI
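
One way to keep these keys visible during a project is to treat the list as a checklist. The sketch below is only an illustration, not part of any tool or method described here: the names `KEYS_TO_SUCCESS` and `missing_keys` are hypothetical, and what counts as "addressed" is an assumption left for each organization to define.

```python
# Keys to success, copied from the list above.
KEYS_TO_SUCCESS = (
    "Governance",
    "Security",
    "Profiling / Structure Assignment",
    "Metadata & Lineage",
    "Methodology & Approach",
    "Design and Architecture",
    "Flow (In and Out)",
    "Collaboration",
    "Use Case / Purpose for the Data",
    "Leverage / Incorporation of a Data Warehouse",
    "Productionalization Processes",
    "Managed Self-Service BI",
)


def missing_keys(addressed):
    """Return the keys to success not yet addressed, in list order.

    `addressed` is any collection of key names; a name that is not in
    KEYS_TO_SUCCESS raises ValueError so that typos are caught early.
    """
    unknown = set(addressed) - set(KEYS_TO_SUCCESS)
    if unknown:
        raise ValueError(f"Unknown keys: {sorted(unknown)}")
    return [key for key in KEYS_TO_SUCCESS if key not in addressed]


if __name__ == "__main__":
    # Hypothetical project that has only addressed two of the keys so far.
    addressed = {"Governance", "Security"}
    for key in missing_keys(addressed):
        print("Still missing:", key)
```

A review meeting could run a check like this per data set or per subject area, so that a "data lake" with ten open items is recognized as a data dump before the business is asked to use it.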

It is our opinion that these (and more) elements are necessary for success in building, deploying, and managing a Data Lake. Defining a data lake or data lake architecture is challenging with buzzwords running amok. We hope our continued pursuit of defining a data lake will help you and your organization. Thank you for reading.
