Ten Reasons to Dive into the Smart (Semantic) Data Lake

Posted by Marty Loughlin on Jan 11, 2017 12:14:00 PM

The data lake is a modern and rapidly evolving data architecture. It promises ubiquitous access to enterprise data with compelling benefits in terms of cost and agility. Yet, in many cases, this promise has not been realized.

Today's data lakes are challenged by inadequate data governance, have difficulty providing business meaning for diverse data, and are often usable only by skilled data scientists.

There is a solution. Leading pharmaceutical, life science, financial services and other organizations are using semantic models and tools to create Smart Data Lakes (SDL). The Smart Data Lake provides governed self-service data discovery, enterprise analytics and data management, rapidly and at scale.

Here are ten reasons these organizations are diving into the Smart Data Lake:

  1. Model-Driven Data Governance
    In the SDL, the entire data lifecycle is driven by semantic models - conceptual business models that are used to generate code and queries for data movement, transformation and analytics. Data governance is no longer a separate activity; it is automatically linked to your data.
  2. Data Harmonization & Business Meaning
    Common business models enable data harmonization based on business meaning across disparate sources, structured and unstructured, internal and external. Models can be enterprise specific or leverage industry standards such as FIBO or CDISC.
  3. Self-service Business Analytics
    Model-aware tools enable business users to browse and select data sets based on business meaning. Consumption options include dynamic discovery tools, interactive dashboards and predefined reports, all driven by the common models.
  4. Automated Data Movement and Transformation
    Models and maps are used to generate code for data movement and transformation - no coding required.
  5. Full Data Lineage
    The same models and maps can be queried, so data lineage is complete, immutable and always up to date.
  6. Agility
    In the SDL, interactive analytic applications across diverse data sets can be built in hours or days with no coding. Full enterprise deployments take longer, of course, but the model-driven platform is ideal for sharing rapid prototypes with users for feedback and iterative development.
  7. Open Standards
    The platform and semantic models are built on open standards from the W3C (RDF, OWL and SPARQL) - great for getting data in and out easily as well as avoiding vendor lock-in.
  8. Elastic Scale
    Cambridge Semantics has a massively parallel, in-memory, semantic graph database that enables interactive queries over extremely large data sets (read about our 1 Trillion Triple benchmark). This breakthrough technology, developed by the team that developed Netezza and Paraccel, is a game changer for the application of semantic technology at scale.
  9. Works with Existing Tools
    The Smart Data Lake is designed to coexist with and augment existing data tools. It works with Hadoop HDFS, Apache Spark, BI tools, ETL tools and many others.
  10. Deploy On-Premise or in the Cloud
    The SDL runs on Linux and Windows, on your infrastructure or in the cloud (Google and Amazon today, Azure coming soon).
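
Several of the points above (model-driven governance, lineage, open standards) rest on one idea: the model is itself data that can be queried. The sketch below is a minimal illustration of that idea in plain Python, not the Smart Data Lake API. It stores a model as subject-predicate-object triples in the spirit of RDF and answers a lineage question by walking the graph. Every source, model and report name, the `mapsTo` and `feeds` predicates, and the `match` and `lineage` helpers are hypothetical and invented for this example.

```python
# Hypothetical model: each fact is a (subject, predicate, object) triple,
# in the spirit of RDF. All names below are invented for illustration.
TRIPLES = [
    ("crm.customers", "mapsTo", "model:Customer"),
    ("erp.accounts", "mapsTo", "model:Customer"),
    ("model:Customer", "feeds", "report:ChurnDashboard"),
    ("model:Trade", "feeds", "report:RiskSummary"),
    ("trading.orders", "mapsTo", "model:Trade"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def lineage(triples, target):
    """Walk edges backwards from target and return every upstream node."""
    upstream, frontier = set(), [target]
    while frontier:
        node = frontier.pop()
        for subject, _, _ in match(triples, o=node):
            if subject not in upstream:
                upstream.add(subject)
                frontier.append(subject)
    return sorted(upstream)


if __name__ == "__main__":
    # Which sources and model concepts sit behind the churn dashboard?
    print(lineage(TRIPLES, "report:ChurnDashboard"))
```

Because lineage here is derived from the same triples that describe the mappings, adding a new source mapping updates the lineage answer with no separate bookkeeping. A production system would express the same kind of question as a SPARQL query over an RDF store rather than a Python loop.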

To learn more, download the Smart Data Lake white paper here.

This post was originally published on LinkedIn.

Tags: Data Lake, Smart Data Lake