To begin, consider the following definitions provided in a 2022 U.S. Army Request For Information (RFI). (Emphasis added.)
“Data Mesh: A data architecture based on a federated and decentralized approach to analytical data production, management, and sharing. It is characterized by federated governance, self-service infrastructure platforms, treating and providing data as a product, and autonomous data domains which are responsible for their data.
Data Fabric: A set of technologies and infrastructure used to securely manage data within a domain. Data fabrics are data mesh nodes that create, expose, retrieve and use mesh conforming data products.
The Data Fabric and Data Mesh are complementary to each other. The Data Mesh includes the principles and governance that orchestrate data sharing activities implemented on the Data Fabric.”
Using these definitions, we can say the Data Mesh is a federated and decentralized architecture; and the Data Fabric is a set of technologies applied to manage data in ‘a’ domain.
Their distinction seems to be a matter of organization and scope. That is to say, a Data Fabric applies to a ‘node’ or a domain; and a collection of [interoperable] Data Fabric nodes covering multiple domains forms a Data Mesh.
As a retired military person, I like this definition! It’s awfully convenient to view the two constructs in this paradigm, and the paradigm holds up at first and second approximations. As with many concepts in data, however, it may not remain precise or accurate at a third approximation. For example, if I say I’m going to model some domain for my Data Fabric, who can say precisely which concepts I should model and exactly how they relate? In practice, it comes down to the customer; and that usually suffices.
As another example, since the Data Mesh is, by definition, federated, can I only issue federated queries via some Mesh Broker? Can’t I issue federated queries directly from my Fabric node to other Fabric nodes? Again, in practice, the customer will decide policy, and the solution will implement the policy.
Nevertheless, for practical purposes — and for this article — we’re going to stick with our paradigm that a Data Fabric is a node and multiple interoperable, federated fabric nodes may form a Data Mesh.
But what’s really important?
For us at Cambridge Semantics, this distinction is not the most important factor. The underlying data model is the critical factor because it is the foundation from which all capabilities — and limitations — derive. Accordingly, our software platform, Anzo®, is built on W3C RDF, OWL and SPARQL. It’s not just a graph; it’s a smart graph model developed to enable the vision of the Data Fabric and Data Mesh constructs. In my humble opinion, an enterprise architect would be wise to carefully examine these standards.
Solutions based on these standards and associated methodologies yield capabilities such as machine understandable data, semantic data integration, federated query, data virtualization, reasoning services, adaptability, and more.
Given these capabilities and the aforementioned definitions, we can say that a Data Fabric is a collection of heterogeneous data sources that a platform such as Anzo has normalized to RDF, semantically harmonized using OWL, and made accessible using SPARQL. We call this a query-able knowledge graph. Piethein Strengholt provides a great description of knowledge graphs in Chapter 10 of his book, “Data Management at Scale.”
Now, given a collection of Data Fabrics (knowledge graphs), we might need to interoperate them as a Data Mesh, or federation of cooperating participants in an ecosystem, web, or distributed enterprise. Supply chains represent a common and important example wherein multiple Data Fabrics interoperating as a Data Mesh would provide enormous value. And, given our foundation, it becomes quite feasible to implement Data Fabrics and Data Meshes.
Our standards-based, knowledge graph foundation enables us to construct data fabric ‘nodes’ and also to interoperate those nodes as a federated system, or data mesh. Each node provides a catalog component to support lookup, discovery and access functions. And the Mesh Services include governance and policy enforcement functions.
There are at least two subtle points worth mentioning. One, in a cooperating environment, an implementation based on W3C RDF, OWL, and SPARQL can obviate the need for a central Lookup, Discovery, and Access service because the data model is, by design, distributed, decentralized, and machine understandable. The standards themselves provide mechanisms that facilitate data discovery and access. As such, data product publication, discovery, and access may be federated.
In fact, data publishers are not required to materialize their data in knowledge graph format; publishers can simply publish what are called SPARQL endpoints. These are services that “look like” knowledge graph access points: when a consumer issues a SPARQL query, the service is responsible for interpreting the query and returning the appropriate content.
Two, an aspect of W3C OWL known as the Open World Assumption (OWA) assumes the model is never ‘complete.’ This makes it easier and more natural to adapt to change, which will happen: schemas, models, and semantics change, and data sources come and go. The OWA expects these changes and allows implementations to adapt gracefully. Referring to supply chains again, suppliers and consumers come and go routinely, and Data Fabrics and Data Meshes built on OWL are inherently more adaptable.
FWIW, I’m speaking from experience. As many readers know, point-to-point data integration models, and even standards-based models, are brittle and inflexible to change. Achieving semantic interoperability yields more powerful and robust implementations; historically, however, scalability has been the challenge. We’ve overcome those scale limitations in terms of time-to-value, query-able knowledge graph volumes, and repeatable solution development methodologies. Users and developers require no specialized knowledge or skills to succeed.
These are the capabilities that our Anzo software platform provides. Anzo is a complete, enterprise class knowledge graph capability that implements Data Fabrics and Data Meshes — at scale. Anzo will adapt to your needs, regardless of how you define Data Fabric and Data Mesh.