This post is an excerpt from the O’Reilly ebook The Rise of the Knowledge Graph published March 2021.
The NoSQL movement, along with its unruly child, the data lake, was in many ways a reaction to the frustrations and skepticism surrounding the sluggishness, high cost, and inflexibility of traditional approaches to enterprise data management. In some organizations, there was even outright rejection and rebellion; the words “data warehouse” were literally barred by the business, along with everything else that came with it. From the data consumer’s point of view, enterprise data management systems were too often a barrier to discovering and accessing the data they needed.
But, of course, numerous scenarios make the case for enterprise data management systems. Consider the following:
- A government manages various documents around family life: marriage licenses, birth certificates, citizenship records, etc. Built into this management are the assumptions behind the structure of this information, such as that a family consists of two parents, one man and one woman, and zero or more children. Suppose the government decides to recognize individuals whose gender is neither male nor female, or parent structures where the gender of the parents need not be one man and one woman. Or perhaps it decides to recognize single-parent families or a legal ersatz guardian, like a godparent, as part of the formal family structure. How do we make sure that all of our systems are up to date with the new policies? A failure to do this means that some systems are in violation of new legal dictates.
- A large industrial manufacturing company wants to support its customers with an integrated view across the entire life cycle of a product, from its inception in R&D, to its manufacturing, to its operation in the real world. Such a view supports analytic and operational use cases that were not previously possible, but requires integrating data sources never before considered.
- Privacy regulations like the General Data Protection Regulation (GDPR) include a “right to be forgotten.” An individual can request of an enterprise that personally identifiable information (PII) be deleted from all of its data systems. How can an enterprise assure the requester, as well as the government regulators, that it can comply with such a request? Even if we are able to say what constitutes PII, how do we know where this appears in our enterprise data? Which databases include names, birthdates, mother’s maiden names, etc.?
- A retail company buys out one of its competitors, gaining a large, new customer base. In many cases, they are not actually new customers, but customers of both original companies. How do we identify customers so that we can determine when we should merge these accounts? How do the pieces of information we have about these customers correspond to each other?
- An industrial trade organization wants to promote collaboration among members of its industry, to improve how the industry works as a whole. Examples of this are plentiful: in law, the West Key Number System allows legal documents to be indexed precisely and accurately so that the law can be applied in a uniform manner. The UNSPSC (United Nations Standard Products and Services Code) helps manufacturers and distributors manage the products they ship around the globe. The NAICS codes (North American Industry Classification System) allow banks and funding institutions to track in what industry a corporation operates, to support a variety of applications. How can we publish such information in a consistent and reusable manner?
There is a common theme in all of these cases: it isn’t sufficient to just have data to drive an application. Rather, we need to understand the structure, location, and governance of our data. This need can arise from regulatory requirements, or simply from the desire to better understand the customers and products of our enterprise. It isn’t sufficient just to know something; an agile enterprise has to know what it knows.
At first blush, it looks as if we have contradictory requirements here. If we want to have agile data integration, we need to be able to work without the fetters of an enterprise data standard. If we require one application to be aware of all the requirements of another application, our data management will cripple our application development. To a large extent, this dynamic, or even just the appearance of this dynamic, is what prompted so much interest in NoSQL. On the other hand, if we want to know what we know, and be able to cope with a dynamic business world, we need to have a disciplined understanding of our enterprise data.
A knowledge graph can support agile development and data integration while connecting data across the enterprise. To move forward on this topic, grab the ebook, where we examine the details of the knowledge graph, first by looking at the unique features of graph representations of data, then by examining how knowledge can be explicitly represented and managed. We then show how these two can be combined into a knowledge graph and, ultimately, how knowledge graphs are well suited to support the vision of a data fabric in an enterprise, changing the way we think about scalable enterprise data management.