The Smart Data Blog

Data Warehouse vs Data Lake: Understanding the Evolution

Posted by Sean Martin on May 10, 2016 3:00:00 PM

The times they are a-changing – and changing quickly – in enterprise data management.

Perhaps the biggest change is the move from the ubiquitous data warehouses of the past few decades to the rapid build-out of data lakes. While both have their pros and cons, as you can see in the graphic below, a new era of ‘smart’ data lakes based on semantic technology alleviates the downsides of both, creating a clear path for the industry.

Using tools from Oracle, IBM, Teradata and Microsoft, setting up, maintaining and evolving data warehouses has always required vast, expensive resources and infrastructure. Nobody ever really wants to create a new one. In some of the organizations I have consulted for, the words “data warehouse” are considered dirty! The high failure rate of warehouse implementation projects and their ongoing costs have the business side alarmed, but warehouses were the only real solution for reliably integrating enterprise data to produce analytical reports.

[Graphic: data warehouse vs data lake vs smart data lake]

Read also: Welcome to the Semantic Data Lake Revolution

As seen in the chart below, the benefits of well-implemented warehouses include effective governance and security, high data quality, and proven, repeatable performance. However, the downsides are many. Skilled IT and business analysts must do a tremendous amount of up-front preparation to set up the warehouse, and the result is too inflexible to adapt rapidly to changing business information needs. Warehouses are anything but agile.

The business has been pressing for quick access to new data, especially data coming from outside the organization and unstructured data, which makes up something like 80% of the information in an enterprise. At the same time, data storage costs have fallen in recent years and Big Data and NoSQL toolsets have emerged. Together, these forces led enterprises to turn to data lakes as an alternative to the challenges of creating yet another data warehouse. The hope was that lakes would serve as large repositories of structured and unstructured data that business analysts could access immediately, as needed, to extract quick value using big data tools.

Unfortunately, these first-generation lakes have presented their own issues. On the plus side, they have lower overall costs and the ability to incorporate and query unstructured data. However, the skills business analysts need to extract value from them effectively are scarce, and many of the critical principles of data governance, such as maintaining the context, provenance, quality, security, and integrity of enterprise data, effectively went out the window. Data integration became an exercise left to the end user.
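To make the contrast concrete, here is a minimal sketch of the two approaches, often described as schema-on-write (warehouse) and schema-on-read (lake). It is an illustration only, not how any particular product works: the schema, the field names (`customer_id`, `region`, `revenue`, `rev`) and all three functions are hypothetical and were invented for this example.

```python
import json

# Hypothetical warehouse table: the schema is agreed before any data is loaded.
WAREHOUSE_SCHEMA = {"customer_id": int, "region": str, "revenue": float}


def load_into_warehouse(row: dict) -> dict:
    """Schema-on-write: reject any row that does not match the agreed schema."""
    if set(row) != set(WAREHOUSE_SCHEMA):
        mismatched = sorted(set(row) ^ set(WAREHOUSE_SCHEMA))
        raise ValueError(f"columns do not match schema: {mismatched}")
    # Cast every value to the declared column type.
    return {col: cast(row[col]) for col, cast in WAREHOUSE_SCHEMA.items()}


def land_in_lake(raw: str, lake: list) -> None:
    """First-generation lake: store the raw record as-is, with no checks."""
    lake.append(raw)


def revenue_by_region(lake: list) -> dict:
    """Schema-on-read: the analyst interprets each raw record at query time."""
    totals = {}
    for raw in lake:
        record = json.loads(raw)
        # The end user has to reconcile differing field names across sources.
        region = record.get("region") or record.get("Region") or "unknown"
        amount = float(record.get("revenue", record.get("rev", 0)))
        totals[region] = totals.get(region, 0.0) + amount
    return totals
```

In this sketch the warehouse loader refuses a record whose columns differ from the schema, which is where the up-front preparation and the governance both come from. The lake accepts anything, so the reconciliation logic ends up inside the analyst's query, which is the integration burden described above.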

In my next post, we will look more closely at the newest kid on the block to tackle enterprise data management, query and analysis – the semantic or ‘smart’ data lake – that effectively solves most of the issues associated with data warehouses and first-generation data lakes.

[Graphic: data warehouse vs data lake]

Want to learn more? Click here to download the whitepaper "Data Lake Trends - The Rise of Data Lakes".


Topics: Data Lake, Smart Data Lake, Data Warehouse