The Smart Data Blog

Diving in: What Is a Data Lake?

Posted by Kirk Newell on Jun 9, 2016 12:30:00 PM

In recent years, the concept of a “data lake” has grown in popularity across the world of big data. From financial services to healthcare, companies are recognizing the value data lakes can bring to data analytics and discovery.

But what exactly is a data lake? The term has been used by so many people in so many different contexts that its definition has become a little hazy. In the early days, data lakes were known as “a disorderly collection of out-of-context data sources loaded into a mismatched technology – bereft of governance or reusability”. This lack of data preparation led many to believe that data lakes were too difficult to use for deriving insights and value.

From our point of view, data lakes are simply defined as large repositories of data, stored in native format and hosted on commodity hardware. Their appeal lies in the ability to rapidly assemble large volumes of unfiltered data and to store it cheaply relative to traditional data warehouses.

As a repository of raw data from many sources and in many formats, they contain a mess of structured and unstructured data whose value, until recently, has not been recognized by companies. While traditional data warehouses were used to convert data for specific analysis and applications, the raw data residing in the lakes was still waiting to be discovered.
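To make the contrast with a warehouse concrete, here is a minimal sketch of the "store raw, interpret later" idea. It assumes a tiny in-memory stand-in for a lake; the `lake` entries, the source names, and the helpers `read_records` and `records_mentioning` are all hypothetical and are not part of any product mentioned in this post.

```python
import csv
import io
import json

# Hypothetical in-memory "lake": each entry keeps its payload exactly as it arrived.
lake = [
    {"source": "crm", "format": "json", "raw": '{"customer": "Acme", "region": "EU"}'},
    {"source": "billing", "format": "csv", "raw": "customer,amount\nAcme,120\nGlobex,75\n"},
    {"source": "support", "format": "text", "raw": "Acme reported a login issue."},
]


def read_records(entry):
    """Interpret a raw payload only at read time (schema-on-read)."""
    if entry["format"] == "json":
        return [json.loads(entry["raw"])]
    if entry["format"] == "csv":
        return list(csv.DictReader(io.StringIO(entry["raw"])))
    # Unstructured text has no fields to parse; wrap it so callers get a uniform shape.
    return [{"text": entry["raw"]}]


def records_mentioning(lake_entries, customer):
    """Collect every parsed record, from any source, that mentions the customer."""
    hits = []
    for entry in lake_entries:
        for record in read_records(entry):
            if any(customer in str(value) for value in record.values()):
                hits.append((entry["source"], record))
    return hits
```

Calling `records_mentioning(lake, "Acme")` returns matches from all three sources, even though nothing was converted to a common schema when the data was stored. A warehouse would typically do that conversion up front, for one specific analysis.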

However, as we’ve seen, new tools such as Anzo Smart Data Lake (ASDL) can overcome the challenges presented by the traditional lake. ASDL makes it easy to semantically link, analyze and manage diverse data, both structured and unstructured, at big data scale and to make it available for self-service consumption by business users.

Our Anzo Graph Query Engine makes it even easier for anyone in the enterprise to intuitively “surf” the information in a data lake without specialized analytics skills. The lake’s data is reformulated into an “infinite” graph that allows business users in any department to “link and contextualize” information to get answers to specific questions, as well as to discover questions they haven’t even considered yet.
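As a rough illustration of what “link and contextualize” can mean in a graph, the sketch below stores facts as (subject, relation, object) triples and walks outward from one entity. This is a generic toy, not the Anzo Graph Query Engine or its API; the sample facts and the functions `build_graph` and `neighborhood` are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical facts drawn from different sources, as (subject, relation, object) triples.
triples = [
    ("Acme", "located_in", "EU"),
    ("Acme", "has_invoice", "INV-1"),
    ("INV-1", "amount", "120"),
    ("Acme", "filed_ticket", "T-7"),
    ("T-7", "about", "login issue"),
]


def build_graph(facts):
    """Index triples by subject so each node lists its outgoing edges."""
    graph = defaultdict(list)
    for subject, relation, obj in facts:
        graph[subject].append((relation, obj))
    return graph


def neighborhood(graph, start, depth):
    """Return every edge reachable from `start` within `depth` hops."""
    edges, frontier, seen = [], [start], {start}
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for relation, obj in graph.get(node, []):
                edges.append((node, relation, obj))
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return edges
```

With `graph = build_graph(triples)`, one hop from `"Acme"` shows its region, invoice and support ticket together. A second hop pulls in the invoice amount and the ticket subject, which is the kind of follow-on question a user may not have thought to ask at the start.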

For obvious reasons, Fortune 500 companies are exploring these tools in earnest to democratize big data analytics. As our President, Alok Prasad, explained:

“Business intelligence and other analytics tools require a lot of programming to put hard-coded, structured information into pretty visuals of data but offer nowhere near the depth of analysis and insight people can get from a graph-based approach to data analysis. With these tools, big data analytics stops being a single application and starts to become a capability across the organization, as easy to use as Excel. Essentially, everyone can now analyze everything.”

With these tools, perhaps the perception of the data lake and its definition should evolve. Rather than the vast repositories of unusable information they were once considered to be, data lakes are fast becoming the cornerstone of big data analytics, now and into the future.

To learn more, download our whitepaper, Data Lake Trends - the Rise of Data Lakes.