'Data, data everywhere....'

In “The Rime of the Ancient Mariner”, the epic poem by British poet Samuel Taylor Coleridge, sailors stranded at sea blame their torment and thirst on an ancient mariner after he shoots an albatross for sport. Believing they are cursed by his actions, the sailors lament, “Water, water everywhere, nor any drop to drink”, and hang the albatross carcass around the mariner’s neck as penance.

With the extraordinary growth in complex, distributed data, one can easily picture today’s enterprise data engineers echoing this lament with a cry of their own, “Data, data everywhere, nor any byte for insight”. More and more, executives in the boardroom and the C-suite are recognizing the value of data in transformation. Data is the new oil. It is the fuel that is enabling companies to improve the customer experience, enhance workforce productivity, increase operational efficiency, and adapt to competitor threats. It has also become a new source of revenue, as companies discover ways to productize their data or take advantage of it for delivering new products and features.

78% of enterprise data not used for insights

However, according to Forrester Research, despite new use cases and heightened demand for data and insights, only 22% of enterprise data is actually used for analytics. This underutilization can be attributed to the immense complexity and cost of managing and transforming massive amounts of disparate data; the shortcomings of traditional data management solutions and architectures; and a lack of organizational vision and skill. For many companies, collecting, integrating, and managing enterprise data has become the proverbial “albatross around the neck” for data engineers.

The rise of the data fabric

Out of necessity, companies have had to reimagine their data architectures and rethink their existing data practices to meet new demands and overcome their data challenges.

Three years ago, Forrester noted growing interest in an emerging data architecture - the data fabric. Noel Yuhanna first documented this trend in “The Forrester Wave™: Big Data Fabric, Q2 2018”, and defined the data fabric as a platform that could accelerate insights by automating ingestion, curation, discovery, preparation, and integration of data silos.

Earlier this year, Yuhanna published an updated research note, “Big Data 2.0 Drives Data Democratization”. It describes how advances in supporting technologies and growing interest from data-intensive operations have moved the data fabric from a loose collection of point solutions for managing data to a thoughtful organization of data assets and technologies. These assets and technologies must work in tandem to intelligently connect complex, distributed data, creating a unified, comprehensive, trusted view of enterprise data that can be accessed and shared to accelerate insight.

In Cambridge Semantics’ view, this updated research is particularly noteworthy because it:

Acknowledges the important role that a semantic layer and graph models play in reducing complexity and accelerating insight
Debuts a flexible data fabric reference architecture that companies can use as a roadmap for their journey and tailor to their specific requirements
Calls attention to the growing role that AI/ML plays in automating core processes and supporting advanced analytics

Semantics and graph models key to reducing complexity and accelerating insight

One of the key takeaways of Yuhanna’s research note is that graph data models - when coupled with a semantic layer - are integral to integrating and connecting data in a data fabric. By definition, graphs model the connections between entities and prioritize these relationships, which allows data from multiple sources to be linked into a composite whole. These connections also reduce complexity by providing context and meaning to the data. In turn, a semantic layer gives these connections “built-in” meaning: it presents the data in the business terms or nomenclature that data consumers are accustomed to, as opposed to the often cryptic, unintelligible naming typically found in the underlying databases and applications. The business context and reusability provided by the combination of graph data models and semantics reduce organizational dependency on IT and make the data more discoverable and more accessible to more users - including those of limited technical ability or analytic expertise.
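To make the idea concrete, here is a minimal sketch in plain Python of how a semantic mapping and a simple graph of triples might link two sources. Everything in it is an assumption made for illustration: the source names, the cryptic column names, the business terms, and the helper functions are hypothetical and do not come from the Forrester note or from any product.

```python
from collections import defaultdict

# Hypothetical records from two source systems with cryptic column names.
crm_rows = [{"CUST_ID": 17, "CUST_NM": "Acme Corp"}]
billing_rows = [{"cid": 17, "amt_usd": 1200.0}, {"cid": 17, "amt_usd": 300.0}]

# Assumed semantic layer: maps each source column to a shared business term.
SEMANTIC_MAP = {
    "crm": {"CUST_ID": "customer id", "CUST_NM": "customer name"},
    "billing": {"cid": "customer id", "amt_usd": "invoice amount"},
}


def to_triples(source, rows):
    """Yield (subject, predicate, object) triples keyed by the shared customer id."""
    terms = SEMANTIC_MAP[source]
    id_column = next(col for col, term in terms.items() if term == "customer id")
    for row in rows:
        subject = f"customer/{row[id_column]}"
        for column, value in row.items():
            if column != id_column:
                yield (subject, terms[column], value)


def build_graph(*sources):
    """Merge triples from every source into one graph: subject -> predicate -> values."""
    graph = defaultdict(lambda: defaultdict(list))
    for source, rows in sources:
        for subject, predicate, obj in to_triples(source, rows):
            graph[subject][predicate].append(obj)
    return graph


graph = build_graph(("crm", crm_rows), ("billing", billing_rows))
acme = graph["customer/17"]
print(acme["customer name"], sum(acme["invoice amount"]))
```

A consumer of this graph asks for “customer name” and “invoice amount” without needing to know that the values came from `CUST_NM` in one system and `amt_usd` in another.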

The semantic layer created by the data fabric also reduces complexity by automating processes across the data fabric. It pushes logic and processing to the underlying data platforms, makes use of data pipelines to automatically process data streams, and manages governance and security policies. This semantics-driven automation is particularly important at scale.

Debut of a data fabric reference architecture

In his latest research note, Yuhanna also debuts a reference architecture for the data fabric that reflects the growing maturity of the emerging technologies and methods deployed by early adopters. He describes the data fabric reference architecture as six core layers:

Data management
Data ingestion and streaming
Data processing and persistence
Data orchestration
Data discovery
Data access

These layers work in tandem to connect complex and distributed data to create a unified, comprehensive, and trusted view of enterprise data that can be used to build valuable, blended, analytics-ready data products, as needed.

This new reference architecture also reflects the desire of companies to leverage both emerging technologies and existing investments in technology, infrastructure, and skills, to support a broader range of use cases. The emerging data fabric is not any one solution for all circumstances. Rather, a data fabric may involve knitting together any number of solutions. This building block approach allows for unprecedented flexibility. Companies can deploy the solutions best suited for their use cases, when and where they need them - on premises and/or in the cloud - without costly retooling or retraining, to surface new insights.
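As a rough illustration of this building block approach, the sketch below models the six layers as a Python enumeration and checks which layers a set of deployed solutions leaves uncovered. The solution names and their layer assignments are hypothetical placeholders, not real products, and the coverage check is only one assumed way to reason about the architecture.

```python
from enum import Enum


class Layer(Enum):
    """The six core layers named in the reference architecture."""
    DATA_MANAGEMENT = "Data management"
    INGESTION_AND_STREAMING = "Data ingestion and streaming"
    PROCESSING_AND_PERSISTENCE = "Data processing and persistence"
    ORCHESTRATION = "Data orchestration"
    DISCOVERY = "Data discovery"
    ACCESS = "Data access"


# Hypothetical solutions and the layers each one is assumed to cover.
deployed = {
    "stream_ingest_tool": {Layer.INGESTION_AND_STREAMING},
    "warehouse": {Layer.PROCESSING_AND_PERSISTENCE, Layer.ACCESS},
    "catalog": {Layer.DATA_MANAGEMENT, Layer.DISCOVERY},
}


def uncovered_layers(solutions):
    """Return the layers that no deployed solution covers, in declaration order."""
    covered = set().union(*solutions.values())
    return [layer for layer in Layer if layer not in covered]


gaps = uncovered_layers(deployed)
print([layer.value for layer in gaps])
```

With this hypothetical set of solutions, the only gap is data orchestration, which is the kind of layer a company might fill with an additional building block rather than by replacing what it already runs.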

Anzo®, Cambridge Semantics’ data discovery and integration platform for the enterprise data fabric, spans the most critical layers of the data fabric: data management, data orchestration, and data discovery. An incrementally applied overlay over existing infrastructure, Anzo uses semantics and graph models to enable companies to quickly find, connect, and blend complex, distributed data; surface data insights that drive transformation; and create reusable, trusted views of enterprise data tailored to user requirements.

AI/ML functionality’s growing role in the data fabric

Lastly, Yuhanna’s updated research recognizes the growing role of AI/ML in the data fabric. At each layer, AI/ML capabilities enable enterprises to automate all processes of the data fabric, including data discovery, mapping, integration, orchestration, and metadata analysis. These capabilities reduce the complexity and drudgery of data management by automating many manual tasks, freeing technical resources for higher value activities. Additionally, such intelligence-driven automation enables enterprises to support more robust analytics and to accommodate a broader range of use cases, as well as non-technical users, at scale.
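To give a flavor of what automated mapping can look like, the sketch below suggests a business term for each source column. It is a deliberately simple stand-in: it uses plain string similarity from Python's standard `difflib` module rather than a trained AI/ML model, and the term list, column names, and `suggest_mapping` helper are all hypothetical.

```python
import difflib

# Hypothetical business glossary that source columns should be mapped to.
BUSINESS_TERMS = ["customer name", "customer id", "invoice amount", "order date"]


def normalize(column):
    """Lowercase a column name and turn underscores into spaces."""
    return column.lower().replace("_", " ").strip()


def suggest_mapping(columns, terms=BUSINESS_TERMS, cutoff=0.6):
    """Suggest the closest business term for each column, or None if nothing is close."""
    mapping = {}
    for column in columns:
        matches = difflib.get_close_matches(normalize(column), terms, n=1, cutoff=cutoff)
        mapping[column] = matches[0] if matches else None
    return mapping


suggestions = suggest_mapping(["CUSTOMER_NAME", "Invoice_Amt", "zz_flag"])
for column, term in suggestions.items():
    print(column, "->", term)
```

Columns with no confident match come back as `None`, so a person can review them by hand, which reflects the general goal of automating the routine cases and leaving the ambiguous ones to technical staff.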

Data fabric gaining in popularity

Though still in its nascency, the data fabric is quickly gaining in popularity amongst companies struggling to balance the robust analytics required for insight against the complexity of managing large volumes of distributed, disparate data. In his latest research note on the data fabric, Forrester's Noel Yuhanna deftly outlines how this emerging data architecture enables data-driven companies to overcome both challenges. It provides companies with a flexible architecture that leverages the technologies and approaches most suited to their data requirements. Additionally, when deployed with a semantic layer, graph models, and AI/ML capabilities, the data fabric simplifies data management and makes enterprise data accessible to more users for more use cases. With the volume and variety of data, and the need to ask complex questions across that data, expected to grow unabated, it is no surprise that Yuhanna concludes that the data fabric should be a part of any enterprise data strategy.

The Smart Data Blog