As the physical world becomes increasingly digitized, the need for connected, interoperable data sources and systems that can communicate with each other seamlessly grows more critical. From business-to-business processes to supply chains and beyond, the vision for the future of data is one that enables delegation and autonomy. This is where a strong data architecture comes into play.
Interoperable data is a critical component of the connected-ecosystem vision. Semantic interoperability, in particular, is an area where data architecture trends are evolving rapidly. Knowledge graphs built on unambiguous ontologies will continue to gain adoption as a way to achieve this goal.
Knowledge graphs are modular, making them ideal for sharing, re-use, and maintainability. They can be physically distributed and logically interconnected, promoting decentralized data architectures. This is where the data fabric and data mesh architectures come into play.
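The "physically distributed, logically interconnected" idea can be illustrated with a toy sketch. Here, two independently maintained graphs are plain sets of subject-predicate-object triples; because they share ontology terms (URIs), their union is itself a graph that can answer questions neither could alone. All URIs and facts below are hypothetical examples, not any particular product's data model.

```python
# Minimal sketch of two physically separate knowledge graphs that are
# logically interconnected because they share URIs. Hypothetical data.

# Each graph is a set of (subject, predicate, object) triples.
hr_graph = {
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:alice", "ex:worksIn", "ex:Finance"),
}
sales_graph = {
    ("ex:acme", "rdf:type", "ex:Customer"),
    ("ex:acme", "ex:accountManager", "ex:alice"),  # same URI links the graphs
}

# "Logically interconnected": the union of the two graphs is itself a graph.
merged = hr_graph | sales_graph

def objects(graph, subject, predicate):
    """Return all objects for a given subject/predicate pair."""
    return {o for s, p, o in graph if s == subject and p == predicate}

# A question neither graph can answer alone: which department serves acme?
manager = objects(merged, "ex:acme", "ex:accountManager").pop()
print(objects(merged, manager, "ex:worksIn"))  # {'ex:Finance'}
```

In a real deployment the triples would live in separate stores and use a standard model such as W3C RDF, but the principle is the same: shared identifiers make modular graphs composable.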
The data fabric architecture is designed for enterprise scope and scale. In contrast, the data mesh architecture features multiple domains, introduces the idea of data products, and requires enterprise mesh services. Each domain and data product can be a knowledge graph, and the mesh catalog can also be a knowledge graph.
As publishers register their data product descriptions in the mesh catalog or broker, consumers can look up and discover data products of interest. Then, they can connect seamlessly to the data they need.
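That publish-and-discover flow can be sketched in a few lines. The catalog structure, field names, and endpoints below are illustrative assumptions, not any specific mesh implementation.

```python
# Toy mesh catalog: publishers register data product descriptions,
# consumers discover products by domain or tag and connect via the
# advertised endpoint. All names and URLs are hypothetical.

catalog = {}  # product name -> description

def register(name, domain, endpoint, tags):
    """Publisher side: add a data product description to the catalog."""
    catalog[name] = {"domain": domain, "endpoint": endpoint, "tags": tags}

def discover(domain=None, tag=None):
    """Consumer side: look up products by domain and/or tag."""
    return [
        name for name, desc in catalog.items()
        if (domain is None or desc["domain"] == domain)
        and (tag is None or tag in desc["tags"])
    ]

register("customer-360", "sales", "https://mesh.example/sales/customers",
         ["customer", "crm"])
register("orders-daily", "sales", "https://mesh.example/sales/orders",
         ["orders"])
register("hr-roster", "hr", "https://mesh.example/hr/roster", ["employee"])

print(discover(domain="sales"))  # ['customer-360', 'orders-daily']
# The endpoint is the consumer's connection point to the data product:
print(catalog[discover(tag="crm")[0]]["endpoint"])
```

In practice the descriptions themselves can be knowledge-graph metadata, which is what makes the catalog queryable with the same semantics as the products it describes.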
In effect, data mesh architecture is Service-Oriented Architecture combined with data products, using knowledge graphs to enable the vision of the connected ecosystem.
To summarize, we’ll see distributed data models that are logically connected and accessible as a seamless data model. This implies that we’ll see more adoption of machine-understandable semantics, robust federated query implementations, and more data virtualization. Cambridge Semantics is leading the way in this exciting field, helping to build the connected ecosystems of the future.
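A federated query, in this spirit, fans a single question out to several logically connected sources and merges the answers so the caller sees one seamless data model. A minimal sketch, with hypothetical sources and schema (a real system would use something like SPARQL 1.1 federation over remote endpoints):

```python
# Toy federated query: each "domain" exposes the same query interface
# over its own local triples; the federator runs one pattern against
# every source and unions the results. Hypothetical data throughout.

def make_source(triples):
    """Wrap a domain's local triples behind a uniform query interface."""
    def query(predicate):
        return {(s, o) for s, p, o in triples if p == predicate}
    return query

finance = make_source({("ex:acme", "ex:revenue", "12M")})
support = make_source({("ex:acme", "ex:openTickets", "3"),
                       ("ex:globex", "ex:revenue", "7M")})

def federated_query(sources, predicate):
    """Fan the pattern out to every source and union the answers."""
    results = set()
    for source in sources:
        results |= source(predicate)
    return results

# One query, answered seamlessly across both domains:
print(federated_query([finance, support], "ex:revenue"))
# contains ('ex:acme', '12M') and ('ex:globex', '7M')
```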
Sam Answers Your Questions
We've been hearing that data warehouses are now obsolete with the emergence of data lakes. What are your thoughts on that, Sam?
I think this is a great question. What comes to mind is that, in practice, data warehouses are basically built to answer known questions. But in the real world, we have new, unanticipated questions. New questions lead to building yet another data warehouse; and by the time the next data warehouse is finished, the questions have changed again. So, in the data fabric architecture (and some of this is in the implementation), we want to load the knowledge graph, which contains all you know about some area of interest, on demand. You want to arbitrarily query the knowledge graph at the speed of thought. And that, I think, is one of the differences in getting past traditional data warehouses and moving toward a more on-demand, unanticipated-question-answering capability.
Perhaps you can help us define what a data warehouse is. Is it defined around a dimensional model?
The thought that comes to my mind—and this is getting a little bit heady, but it's something that can be researched—is basing your ‘modern data warehouse’ on a graph data model, and a more ontologically constructed data schema or model that supports the Open World Assumption. That approach is going to give you the ability to change rapidly. So, I keep going back to the need to answer questions on demand at the speed of thought, not just the already known questions. So, a modern data warehouse is more dynamic, ontologically driven, and more adaptable to change.
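The Open World Assumption mentioned above is easy to demonstrate concretely: under it, a fact missing from the graph is unknown, not false, which is exactly what lets a model absorb new facts without breaking. A toy sketch with hypothetical data:

```python
# Closed world: anything not asserted in the data is treated as false.
# Open world: anything not asserted is simply unknown.

graph = {("ex:alice", "ex:speaks", "ex:English")}  # hypothetical facts

def closed_world_speaks(person, language):
    """Closed-world reading: absence of the triple means False."""
    return (person, "ex:speaks", language) in graph

def open_world_speaks(person, language):
    """Open-world reading: absence means unknown; new facts may arrive."""
    if (person, "ex:speaks", language) in graph:
        return True
    return None  # not False: we just don't know yet

print(closed_world_speaks("ex:alice", "ex:French"))  # False
print(open_world_speaks("ex:alice", "ex:French"))    # None (unknown)
```

This three-valued behavior is why ontologically modeled schemas can grow incrementally: adding a new triple later refines the answer rather than contradicting an earlier "false."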
Don't different data architectures (graph, dimensional, data frame, etc.) serve different purposes and hold their own value when given proper context?
We (CSI) spend a lot of time automating conversion of different data structures and models to create the knowledge graphs that give you all this goodness. So, from our perspective, we're kind of moving past a world that is more closed and insular, to a world that is more open and connected; and modeling things more ontologically versus modeling them based on use cases. So, in that sense, I would say, well, let's move past it. Having said that, at the end of the day, you have to use what makes sense for your requirements. But try to keep an eye toward the larger community. Because one way or another, you are not an island, and you will be required to interconnect with others. I mean, you'll want to.
With the emergence of data mesh architecture, is there still a need for the enterprise data warehouse?
I was thinking that I've been involved in semantic technologies for just over 20 years. And when I first learned about them, I thought they were a no-brainer. But then I grew up and realized how hard it is to get new technology adopted at scale. So, what I would say to this question is: think of this as a long transition. a) The sooner you can get your data warehouse over to what we described earlier as a modern data warehouse, the better; and b) as the mesh proliferates and becomes more robust, the way we model and implement a data warehouse will gradually change with it. But I would think of this as a long-term thing.
If there's one thing that you'd like our readers to walk away with, what would that be?
I know that I work for Cambridge Semantics, so there's a natural bias there. But based on what I've been involved in for the last couple of decades, it would be the ability to apply machine-understandable semantics and to model things more ontologically. Notice I always use the phrase ‘more ontologically.’ Don't go down the rabbit hole of trying to build the ‘perfect ontology,’ because I'm not sure you're going to finish that before you retire. But do model things more ontologically; look at standards like W3C RDF and OWL; and think more universal, more global. Think big, start small, and scale fast is what I would say. And if you want to look for something concrete, study knowledge graphs and the upcoming data mesh architecture.