The Smart Data Blog

Smart Data Integration - Solving the data lineage problem with semantic technology

Posted by Marty Loughlin on Oct 1, 2014 10:07:00 AM

Provenance and lineage. Two wonderful words, used interchangeably, to describe a sticky problem for most large financial institutions. Together they ask: what is the origin, meaning and quality of my data? These questions are becoming increasingly important as data is sourced from more disparate locations, as regulators demand an audit trail for reporting, and as more data is exposed to internal and external consumers.
 
Traditional data integration approaches typically focus on moving data point-to-point and do a poor job of tracking the full lifecycle of data. At Cambridge Semantics, we're increasingly seeing and advocating a new approach to enterprise data integration that solves these problems with semantic technologies. We call this Smart Data Integration.
 
By deploying a semantic layer across existing infrastructure, you can build a full picture of your information landscape and lifecycle while preserving your existing infrastructure investments. In addition, you can achieve other critical benefits:
  • Dramatically lower the time and cost to onboard new customers and data sources
  • Support industry-standard, business-consumable, operationally agile enterprise data models (e.g., FIBO, the Financial Industry Business Ontology)
  • Put highly interactive, business-friendly data consumption in the hands of business users
  • Expose full enterprise-wide data provenance necessary for business and regulatory reporting

To make this concrete, we're developing a set of tools on top of Anzo, our semantic platform, to help deliver Smart Data Integration:

Business Analyst Mapping Tool
The mapping tool allows a business analyst to connect to source and target data systems, ingest schemas and review sample data. Using a familiar Excel-based interface, business analysts can map source fields to target fields and capture any required transformations using context-sensitive wizards.
 
Business Conceptual Model
During the mapping process, the business analyst (BA) has the option to map the source data to a target conceptual model, such as FIBO.
 
Automatic ETL Generation
Once the mapping process is complete, the map is saved for cataloging and reuse. At this point, the BA can also click a button to automatically create an ETL job for their tool of choice (e.g., Pentaho Kettle, Talend, Informatica, etc.). The ETL job is created from the mapping without any coding or manual intervention.
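
To illustrate the general idea of deriving a job from a saved mapping, here is a minimal Python sketch. It is not Anzo's implementation or API, and it does not emit a real Kettle, Talend or Informatica job. The names FieldMapping, TRANSFORMS, generate_job and run_job, along with the sample field names, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical catalog of named transformations (illustrative only).
TRANSFORMS: Dict[str, Callable[[str], object]] = {
    "identity": lambda v: v,
    "upper": lambda v: v.strip().upper(),
    "to_float": lambda v: float(v.replace(",", "")),
}


@dataclass(frozen=True)
class FieldMapping:
    """One row of the analyst's map: source field, target field, transformation."""
    source: str
    target: str
    transform: str = "identity"


def generate_job(name: str, mappings: List[FieldMapping]) -> dict:
    """Turn a saved mapping into a tool-neutral job description (a plain dict)."""
    unknown = sorted({m.transform for m in mappings if m.transform not in TRANSFORMS})
    if unknown:
        raise ValueError(f"unknown transformations: {unknown}")
    return {
        "job": name,
        "steps": [
            {"read": m.source, "apply": m.transform, "write": m.target}
            for m in mappings
        ],
    }


def run_job(job: dict, row: Dict[str, str]) -> Dict[str, object]:
    """Apply every step of the generated job to a single source record."""
    return {
        step["write"]: TRANSFORMS[step["apply"]](row[step["read"]])
        for step in job["steps"]
    }


# Assumed sample mapping from a source system to a target model.
mappings = [
    FieldMapping("cust_nm", "customerName", "upper"),
    FieldMapping("bal", "accountBalance", "to_float"),
]
job = generate_job("crm_to_warehouse", mappings)
result = run_job(job, {"cust_nm": " acme corp ", "bal": "1,250.50"})
print(result)
```

Because the job is generated entirely from the mapping rows, the same saved map could in principle be rendered into different tool formats, which is the property the section describes.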
 
Analyst Dashboards
BAs have full access to the target data and conceptual model through Anzo’s web dashboards. They can search on fields and get data provenance visualizations that show where data came from and what transformations were performed on it.
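
A provenance lookup of this kind can be sketched as a walk over recorded lineage edges. The Python below is a simplified illustration under assumed data, not Anzo's dashboard logic: the LINEAGE table, the trace function and all field names are hypothetical, and a semantic platform would more likely hold these relationships as RDF triples (for example, using a vocabulary such as W3C PROV-O) than as a Python dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical lineage records: target field -> [(source field, transformation)].
LINEAGE: Dict[str, List[Tuple[str, str]]] = {
    "report.exposure": [("warehouse.accountBalance", "sum by counterparty")],
    "warehouse.accountBalance": [("crm.bal", "to_float")],
}


def trace(field: str, lineage: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, str, str]]:
    """Walk from a field back to its origins.

    Returns (target, source, transformation) steps, starting at the
    requested field and moving toward the original sources.
    """
    steps: List[Tuple[str, str, str]] = []
    seen = set()
    stack = [field]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against cycles in the recorded lineage
            continue
        seen.add(current)
        for source, transform in lineage.get(current, []):
            steps.append((current, source, transform))
            stack.append(source)
    return steps


steps = trace("report.exposure", LINEAGE)
for target, source, transform in steps:
    print(f"{target} <- {source} via {transform}")
```

Each returned step answers the two questions the dashboards expose: where a value came from and what transformation was applied on the way.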
 
Business User Dashboards
Business users also have full access to the target data and conceptual model through web dashboards. These dashboards provide interactive data search, visualization and investigation capabilities.

Topics: Smart Data, Semantics, Data Management, Data Integration, Big Data