The Smart Data Blog

Winning the Wild Wild West of Data Lakes

Posted by Patrick Wall on Feb 21, 2016 12:06:39 PM

With no inherent means of enforcing governance and security protocols, data lakes are akin to the Wild West: they lack order and consistency. Each user manipulates his or her own data, which puts the reuse of that data by others at risk.

Smart Data Lakes reshape that frontier existence into structured civilization. Whereas integration in a conventional data lake is limited by the end user’s knowledge of transforming and linking data, integration in a Smart Data Lake is expedient and spans all enterprise data.


Comprising and collating all information assets has the greatest impact on analytics when paired with user-friendly visualizations, whether browser-based or delivered through conventional analytics or BI tools. Consequently, business users gain sustainable, self-service analytics that cover everything from initial integration to visualization.

Data Integration

Anzo Smart Data Lake automates integration via Anzo Smart Data Integration (ASDI), part of the Anzo Smart Data Platform. In accordance with an overarching semantic model that evolves with business needs, this toolset converts any data (structured, unstructured, or semi-structured) into a semantic graph format. By incorporating multiple servers, ASDI converts structured data via an ETL process that involves mapping, linking, and transformation. It handles semi-structured and unstructured data with text analytics or with popular tools such as Spark for Hadoop data.
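To make the idea of converting structured data into a semantic graph concrete, here is a minimal sketch in plain Python. It is not ASDI's implementation or API: the `http://example.org/` namespace, the function names, and the sample record are all assumptions made for illustration. It simply maps one tabular record to subject-predicate-object triples and prints them in N-Triples form.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not an Anzo URI


def row_to_triples(row, id_column, entity_type, base=BASE):
    """Map one tabular record to (subject, predicate, object) triples."""
    subject = f"<{base}{entity_type}/{quote(str(row[id_column]))}>"
    triples = []
    for column, value in row.items():
        if column == id_column or value is None:
            continue  # the id is encoded in the subject; skip empty cells
        predicate = f"<{base}{quote(column)}>"
        # Escape backslashes and quotes so the literal stays well-formed.
        literal = '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'
        triples.append((subject, predicate, literal))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Illustrative record; in practice this would come from a source system.
customer = {"id": 42, "name": "Ada Lovelace", "city": "London"}
customer_triples = row_to_triples(customer, "id", "customer")
print(to_ntriples(customer_triples))
```

Every column becomes a predicate and every cell becomes a literal attached to a subject built from the record's identifier. A real mapping would also assign datatypes and link to other entities rather than emitting only string literals.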


Data Modeling

ASDI’s conversion of all data into an RDF graph format is enhanced by a canonical linking model. This model creates graph models across sources and links them to one another through the germane elements of each source. Canonical linking enables business users to influence the semantic model’s evolution with new sources and their attributes, thereby expanding the semantic model at the pace of the business. Users can thus understand relationships between relevant data elements and gain the context needed for effective analytics.
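The linking idea can be illustrated with a small Python sketch. This is an assumption-laden toy, not Anzo's canonical model: the source names (`crm`, `billing`), the column names, and the choice of `email` as the linking property are all hypothetical. Each source declares how its local fields map to shared canonical properties, and records that agree on the canonical key are merged into one linked entity.

```python
from collections import defaultdict

# Hypothetical per-source mappings from local column names to canonical properties.
CANONICAL_MAP = {
    "crm": {"cust_email": "email", "full_name": "name"},
    "billing": {"email_addr": "email", "balance": "balance"},
}


def to_canonical(source, record):
    """Rename a record's fields to canonical property names, dropping unmapped ones."""
    mapping = CANONICAL_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def link_records(records, key="email"):
    """Merge records from any source that share the same canonical key value."""
    linked = defaultdict(dict)
    for source, record in records:
        canonical = to_canonical(source, record)
        if key in canonical:
            canonical[key] = canonical[key].lower()  # normalize before matching
            linked[canonical[key]].update(canonical)
    return dict(linked)


records = [
    ("crm", {"cust_email": "ada@example.org", "full_name": "Ada Lovelace"}),
    ("billing", {"email_addr": "ADA@example.org", "balance": 120.5}),
]
linked = link_records(records)
print(linked)
```

Bringing in a new source in this sketch only requires adding one entry to `CANONICAL_MAP`, which loosely mirrors how a canonical model can grow as sources and attributes are added.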

Robust Analytics

Anzo Smart Data Lake, courtesy of ASDI, allows end users to visualize the critical steps of the analytics process, from the discovery of relevant data sources to the results of codeless queries. Significantly, the underlying integration this requires is automated by the semantic technologies operating these platforms. The result is more meaningful analytics, powered by greater quantities of data, in service of the most important output: informed action.

Topics: Data Integration, Data Lake, Anzo, Smart Data Lake