Enterprise Knowledge Graph technology has entered mainstream discussion in the world of data architecture. This is exemplified in the recent book “Data Management at Scale” by Piethein Strengholt. Piethein’s book provides an important discussion of the Enterprise Knowledge Graph in his chapter about democratizing data with metadata. In this blog we will elaborate on some of the key points in his treatment.
Strengholt states, “The way objects and resources can be linked together overlaps with how the World Wide Web works: hypertext links can link anything to anything….” This foundational concept indirectly highlights a chief feature of knowledge graph technology: in a very practical, and limiting, sense, traditional data models do NOT consider “external context.”
Traditional models are insular, presume to be right, present ambiguities, struggle with complex data models, and are brittle to change. Having said that, we have collectively developed innumerable technologies and methodologies to get as much utility from them as possible. But, we argue, it’s time for a change; and knowledge graphs — constructed using key W3C standards — represent the next generation of data integration, representation, and access. But which standards? And are these knowledge graph standards ready for mainstream adoption? Let’s look more closely.
Consistent with what Strengholt says, in 2001 Tim Berners-Lee, inventor of the World Wide Web, and his co-authors published an article in Scientific American titled “The Semantic Web,” building on the idea that anything can be linked to anything. By extending existing Web standards with a “Semantic Web” framework, any data can be shared and reused across application, enterprise, and community boundaries.
That article provided a vision of a Semantic Web that empowers “intelligent agents” to automate many tasks. In the World Wide Web as we know it, humans publish and share documents; this Web is oriented toward human presentation and consumption of information. In the Semantic Web, published data is oriented toward machines. The vision articulated in the article describes how intelligent software agents will leverage rich, machine-understandable data to automate many tasks. In other words, the machine “knows what we mean.” Notably, Dr. James Hendler, a co-author, became a Program Manager of a DARPA program that developed technology foundational to one of the W3C Semantic Web family of standards: the W3C Web Ontology Language (OWL).
So far we have a vision and we have a conceptual technical framework. Now the framework needs to be defined and encoded. Strengholt lists several standards, and we narrow that further to three from the W3C:
- Resource Description Framework (RDF),
- Web Ontology Language (OWL), and
- SPARQL Protocol and RDF Query Language (SPARQL).
RDF is the data model; OWL is the metadata model; and SPARQL is the query language. RDF and OWL combine to create a knowledge graph; and SPARQL operationalizes it.
Each of these standards features critical characteristics for enabling the machine (or AI) Web. RDF is schemaless, and its unit of data is the triple: a “machine fact” or statement consisting of a Subject, Predicate, and Object. Each triple reads like a sentence. The Subject and Predicate are URIs, and the Object is either a literal value or a URI. Note that when the Object is a URI, there exists potential for linkage to other statements. In other words, RDF creates data networks vs data stovepipes.
OWL provides a way to specify some area or domain of interest. We need a way to describe the resources represented in RDF, and OWL ontologies provide that unambiguous, machine-understandable description. In contrast to “closed world” data models, which resist change, OWL ontologies embrace it. OWL is based on an “open world assumption,” which, in a nutshell, means that anything not stated is simply unknown rather than false, so the model can grow at any time without invalidating what is already there. This results in more robust and adaptable data architectures. As Strengholt says, “Although the ontology-based approach is difficult to implement and tools are scarce, it can greatly improve the overall consistency of the data landscape, so it is important to prepare yourself for this emerging development.”
We assert that ontology can be difficult to implement and scale. In fact, complexity and lack of scale historically were adoption inhibitors for Semantic Web technologies. This is no longer the case. Software platforms such as Anzo have matured to the point that an ontological approach is quite feasible, and it is increasingly applied in large, data-rich environments.
The Semantic Web query language, SPARQL, is, as Strengholt says, the query language used to retrieve and manipulate data stored in RDF. Importantly, SPARQL can join queries across federated data sources, so multi-model graph databases can be incorporated to support multiple data models against a single, integrated backend. In fact, the AnzoGraph database implements our Graph Data Interface (GDI), which leverages features in SPARQL to implement highly performant federated query architectures. Anzo and knowledge graphs have come a long way!
Over nearly two decades we have developed methodologies and technologies that make enterprise scope and scale knowledge graph solutions viable, if not essential, for competitive advantage.
Our Anzo knowledge graph platform has matured to the point where end users require no new skills to successfully employ it. Even IT personnel enjoy access to familiar JEE, RESTful and other APIs. Simultaneously, Anzo allows full access to the underlying technologies. To be sure, enterprise knowledge graph technology is already in production in world-renowned enterprises, and is ready for prime time.
Strengholt's recommendation for the knowledge-graph-based approach is to start small. Don’t try to create enormous graphs, with every possible connection, that only a small group of very technical people will understand. Build the graph up slowly, and make the model accessible to business users too, so everybody benefits from the new insight.
We agree. We consistently recommend an iterative and incremental approach, similar to the Agile process. Start with high value use cases and salient data sources. Encode “enough ontology” to describe the entities, concepts and relationships for your application. Then gradually add more data sources and more ontology.
Over time, an important new dynamic will become apparent: data modeling and remodeling will diminish, and end users will increasingly “configure” applications to answer new and unanticipated questions. As your knowledge graph-based solutions increasingly provide answers, decision makers will require trust, so the use-case-driven approach will give way to verification and validation. Yes, Anzo has thought about this too!