Knowledge Graph Technology for a Data Fabric

A portion of this post is an excerpt from the O’Reilly ebook The Rise of the Knowledge Graph, co-authored by Sean Martin, Dean Allemang, and myself. If you’d like to learn more about knowledge graphs and how they stitch together the concepts in this post, please check out the ebook as well as the other posts in this blog series.

One of the basic tenets of a data fabric is that enterprise data management should be able to embrace a wide variety of technologies, and in fact, it should be flexible enough to be able to adapt to any new technology. So it would be shortsighted to say that the data fabric must rely on one particular technology.

Nevertheless, any realization of a data fabric will be built on some technology. We believe that knowledge graph technology, in particular one based on the semantic web standards (RDF, RDFS, and SKOS), is the best way to achieve a successful data fabric or data mesh. If you’d like Gartner’s opinion, check out their data fabric design report.

Let’s examine how knowledge graph technology satisfies the requirements for a data fabric, and how it is indeed well suited to enable a data fabric. Note that many of the requirements lay out some pretty strict specifications of what a technology must provide in order to support a data fabric. These specifications tightly constrain the technical approach, driving any solution toward RDF, RDFS, and SKOS.

One by one, let’s review the requirements for a data fabric.

Flexibility in the face of complex or changing data

One of the many advantages of an explicitly represented ontology is that it is easy to extend a model to accommodate new concepts and properties. Since an ontology in RDFS is itself represented as a graph, it can be extended by merging with new graphs or by simply adding new nodes into the graph. There is no need to reengineer table structures and connections to accommodate new metadata. Together, explicit knowledge and a graph representation make metadata far more flexible than a fixed table schema allows.
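To make the merging idea concrete, here is a minimal sketch in plain Python. It assumes a graph can be modeled as a set of (subject, predicate, object) tuples and ignores blank nodes, which a real RDF merge has to handle. All of the `ex:` names are hypothetical, and the `merge` helper is ours, not part of any RDF library.

```python
# A graph is modeled as a set of (subject, predicate, object) triples.
# Simplification: no blank nodes, and terms are plain prefixed strings.
RDF_TYPE = "rdf:type"
RDFS_CLASS = "rdfs:Class"
RDFS_SUBCLASS = "rdfs:subClassOf"

# Hypothetical starting ontology.
base_ontology = {
    ("ex:Customer", RDF_TYPE, RDFS_CLASS),
    ("ex:Order", RDF_TYPE, RDFS_CLASS),
    ("ex:placedBy", "rdfs:domain", "ex:Order"),
    ("ex:placedBy", "rdfs:range", "ex:Customer"),
}

# A later extension that introduces a new concept.
extension = {
    ("ex:PremiumCustomer", RDF_TYPE, RDFS_CLASS),
    ("ex:PremiumCustomer", RDFS_SUBCLASS, "ex:Customer"),
    ("ex:Customer", RDF_TYPE, RDFS_CLASS),  # already known; merging deduplicates it
}


def merge(*graphs):
    """Merge graphs by taking the union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


merged_ontology = merge(base_ontology, extension)
```

Nothing in the original ontology had to be altered or migrated. The extension simply contributes two new triples, and everything already stated remains true in the merged graph.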

Description in terms of business concepts

Because an ontology is not bound to a particular technology, data modelers can use it to model classes and properties in a domain and provide them with names and structures that correspond to concepts familiar in the business. The art of matching business concepts and processes to data structures is usually the purview of a profession called “business analysis” or “business architecture.”

An ontology is a power tool for the business analyst, providing them a formal structure for expressing business models, data models, and the connections between them.

Ability to deal with unanticipated questions

An oft-lamented drawback of static data representations is that, while the process of building a data structure to answer any particular question is well understood, the process for reusing such a structure to answer a new question is difficult, and typically amounts to starting over. There is no incremental gain from incremental modeling effort. In contrast, a knowledge graph allows users to pivot their questions by following relationships in the graph. Explicit knowledge representation allows users to make sense of these relationships in asking their questions as well as their unanticipated follow-on questions. When requirements change drastically, semantic models also allow dynamic remodeling of data and the quick addition of new data sources to support new types of questions, analytic rollups, or context simplifications for a particular audience.
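As a rough illustration of pivoting, the sketch below answers one question and then an unplanned follow-on question over the same small graph, without any remodeling. The data, the `ex:` names, and the `follow` helper are all hypothetical. The `^` prefix for walking a relationship backward is borrowed from SPARQL property path notation.

```python
from collections import defaultdict

# Hypothetical order data expressed as triples.
triples = [
    ("ex:order1", "ex:placedBy", "ex:alice"),
    ("ex:order1", "ex:contains", "ex:widget"),
    ("ex:order2", "ex:placedBy", "ex:bob"),
    ("ex:order2", "ex:contains", "ex:widget"),
    ("ex:widget", "ex:suppliedBy", "ex:acme"),
]


def build_index(triples):
    """Index triples so each relationship can be walked in either direction."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for s, p, o in triples:
        forward[(s, p)].add(o)
        backward[(o, p)].add(s)
    return forward, backward


def follow(start_nodes, path, forward, backward):
    """Follow a chain of predicates; a leading '^' walks the edge backward."""
    current = set(start_nodes)
    for step in path:
        if step.startswith("^"):
            index, predicate = backward, step[1:]
        else:
            index, predicate = forward, step
        current = {n for node in current for n in index.get((node, predicate), ())}
    return current


forward, backward = build_index(triples)

# Original question: who supplies the products Alice ordered?
suppliers = follow(
    {"ex:alice"}, ["^ex:placedBy", "ex:contains", "ex:suppliedBy"], forward, backward
)

# Unanticipated follow-on: which customers depend on that supplier?
customers = follow(
    suppliers, ["^ex:suppliedBy", "^ex:contains", "ex:placedBy"], forward, backward
)
```

The second question reuses the same data and the same index. Only the path changes, which is the incremental gain that a fixed, question-specific structure does not offer.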

Data-centric (as opposed to application-centric)

A knowledge graph contributes to a data fabric approach in many ways, not least through standardization. Most data representations (especially relational databases) enclose data in applications of some sort; there is no standard way to exchange data, together with the models that describe it, on a large scale from one platform to another. ETL (extract, transform, and load) projects are expensive and brittle. Semantic web standards relieve this problem in an extreme way, by providing not only a way to write and read data but also a standard for specifying how to do this on an industrial scale. It is already possible to exchange data on a very large scale from one vendor’s software to another. This interoperability enables data to be the centerpiece of an enterprise information policy.

Data as a product (with SLA, customer satisfaction, etc.)

An upshot of a data fabric is that data now becomes valuable in its own right; providing data is a way that one part of the organization can support another. This shift in emphasis on data is sometimes referred to as seeing data as a product. Someone who provides data takes on the same responsibilities that we expect of anyone else who is providing a product, including guarantees, documentation, service agreements, responses to customer requests, etc. When someone views data as a product, other parts of the organization are less likely to want to take over maintenance of their own version of the data.

The semantic web standards (RDF, RDFS, and SKOS) go beyond simple syntactic standards; each of them is based on sound mathematical logic. Since most enterprises don’t have staff logicians, this might seem like a rather obscure, academic feature, but it supports key capabilities for viewing data as a product. With these standards, it is possible to know exactly what a set of data means (in a mathematical sense), and hence to know when some data provided in response to a request satisfies a requirement for a data service. This is analogous to having clear requirements and metrics for other kinds of products. You can’t support a service level agreement for a product if you don’t know what services have been promised.
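Here is a small sketch of what "knowing exactly what the data means" can look like in practice. It implements just two of the RDFS entailment rules (rdfs9, which propagates types up the class hierarchy, and rdfs11, which makes rdfs:subClassOf transitive) and then checks whether a delivered data set entails everything a request asked for. The `ex:` names and the `satisfies` helper are hypothetical, and a real RDFS reasoner covers many more rules.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply rules rdfs9 and rdfs11 repeatedly until nothing new is inferred."""
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))  # rdfs11: subclass transitivity
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))  # rdfs9: instance of subclass
        changed = not new <= inferred
        inferred |= new
    return inferred


def satisfies(delivered, required):
    """True if every required triple is entailed by the delivered data."""
    return required <= rdfs_closure(delivered)


# Hypothetical data product delivered by one team.
delivered = {
    ("ex:alice", TYPE, "ex:PremiumCustomer"),
    ("ex:PremiumCustomer", SUBCLASS, "ex:Customer"),
    ("ex:Customer", SUBCLASS, "ex:Party"),
}

# What the consuming team asked for.
required = {("ex:alice", TYPE, "ex:Party")}

closure = rdfs_closure(delivered)
meets_request = satisfies(delivered, required)
```

The delivered data never states that Alice is a Party, yet the standard defines precisely why that follows. That shared definition is what lets a provider and a consumer agree on whether a request has been met.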

FAIR (findable, accessible, interoperable, and reusable)

The FAIR data principles place a variety of requirements on data and metadata representations, many of which are supported by a standardized knowledge graph. Explicit knowledge representation makes it possible to find data that is appropriate for a particular task. Globally referenceable terms (in the form of URIs) allow for interoperability, since one data or metadata set can refer to any other. The web basis of the semantic web standards allows them to interoperate natively with web-based accessibility standards, and the extensibility of an ontology encourages reuse. FAIR is not explicitly a semantic web recommendation, but the semantic web covers all the bases when it comes to building a FAIR data infrastructure.
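The role of URIs can be sketched in a few lines. Below, two hypothetical data sets use different local prefixes, but once each prefixed name is expanded to its full URI, statements about the same thing line up with no mapping table. The SKOS namespace is the real one; the example.org namespaces, the data, and the `expand` helpers are assumptions made for illustration.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Each data set declares its own prefixes (the example.org ones are made up).
catalog_prefixes = {"skos": SKOS_NS, "a": "http://example.org/catalog/"}
sales_prefixes = {
    "s": SKOS_NS,  # same namespace, different local prefix
    "cat": "http://example.org/catalog/",
    "b": "http://example.org/sales/",
}

catalog = {("a:widget", "skos:broader", "a:hardware")}
sales = {
    ("b:sale42", "b:item", "cat:widget"),
    ("cat:widget", "s:broader", "cat:hardware"),
}


def expand(term, prefixes):
    """Expand a prefixed name such as 'skos:broader' to a full URI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term  # unknown prefix or already a full URI: leave unchanged


def expand_graph(triples, prefixes):
    """Expand every term of every triple using the data set's own prefixes."""
    return {tuple(expand(term, prefixes) for term in triple) for triple in triples}


catalog_full = expand_graph(catalog, catalog_prefixes)
sales_full = expand_graph(sales, sales_prefixes)

# Statements that both data sets make, recognized purely by URI equality.
shared = catalog_full & sales_full
```

Because the identifiers are global, the sales data can point at a catalog entry it does not own, and any consumer can recognize that both sources are talking about the same widget.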

My next blog post will cover how to get started on building your data fabric. Find the earlier posts in the An Integrated Data Enterprise series here:

The Smart Data Blog