At Cambridge Semantics, we have watched the explosion of AI awareness in the last year with interest. Nearly every tech leader considering our knowledge graph platform, Anzo, is also actively seeking to integrate artificial intelligence and machine learning into their operations. This drive is fueled by a growing emphasis on harnessing these technologies for enhanced insights and efficiency.
These organizations face an immediate obstacle: to leverage AI, whether with large language models (LLMs) or other machine learning techniques, they need access to clean, well-defined data sources.
Venture capital firms have quickly realized this and made it a cornerstone of their evaluation of opportunities. This TechCrunch quote nails it:
“The true value proposition of AI companies now lies not just within the models, but also predominantly in the underpinning datasets. It is the quality, breadth, and depth of these datasets that enable models to outshine their competitors.”1
As a corollary, the value of an AI project is directly tied to the breadth, depth, and quality of its underlying datasets. From a broad perspective, these datasets rely on a foundational technical stack, and several obvious companies have seen an immediate benefit:
- GPU Chipmakers: NVIDIA, Intel, AMD
- Infrastructure providers: AWS, GCP, Azure, etc.
- Data Platforms: The Big 3 above, plus Snowflake, Databricks, etc.
This list is a great place to start when thinking about the fundamental components for developing AI models, but it lacks one final layer: data integration. Think of data integration as the last mile, preparing clean and accurate data for AI.
Analysts have had a lot to say about the technologies that will play a prominent role in this last step, and if you’ve been paying careful attention, there’s been a consistent message: knowledge graphs (KGs) have become a critical enabler of the AI revolution. A recent report from Gartner notes that:
“The range for knowledge graphs is Now, as KG adoption has rapidly accelerated in conjunction with the growing use of AI, generally, and large language models (LLMs), specifically. GenAI models are being used in conjunction with KGs to deliver trusted and verified facts to their outputs, as well as provide rules to contain the model.”3
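The grounding pattern Gartner describes can be sketched in a few lines. In this illustration the triples, entity names, and prompt wording are all hypothetical; a real deployment would query an RDF store via SPARQL rather than an in-memory set of tuples, but the idea is the same: retrieve verified facts from the graph and supply them to the model alongside the question.

```python
# A minimal sketch of "grounding" an LLM prompt with knowledge-graph facts.
# All entities and predicates below are hypothetical examples.
triples = {
    ("Widget9000", "type", "Product"),
    ("Widget9000", "listPrice", "49.99 USD"),
    ("Widget9000", "suppliedBy", "AcmeCorp"),
    ("AcmeCorp", "locatedIn", "Ohio"),
}

def facts_about(entity):
    """Collect every triple whose subject is the given entity."""
    return sorted(t for t in triples if t[0] == entity)

def grounded_prompt(question, entity):
    """Prepend verified facts so the model answers from trusted data
    rather than whatever it absorbed during training."""
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in facts_about(entity))
    return f"Answer using only these verified facts:\n{facts}\n\nQ: {question}"

print(grounded_prompt("Who supplies Widget9000?", "Widget9000"))
```

Because every fact in the prompt traces back to a triple in the graph, the model's output can be checked against known, provenance-bearing data.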
Beyond this, knowledge graphs provide additional capabilities. In an ideal world, data engineers could choose well-described data points from a “single pane of glass,” integrating, aggregating, and harmonizing data from previously siloed sources into a common set of parameters to feed custom algorithms. Consider this quote from McKinsey:
“Context can be determined only from existing data and information across structured and unstructured sources. To improve output, CDOs will need to manage integration of knowledge graphs or data models and ontologies (a set of concepts in a domain that shows their properties and the relations between them) into the prompt.”4
This quote highlights two of the major advantages that knowledge graphs provide.
- First, unlike relational databases, knowledge graphs treat unstructured content (text files, PDFs, etc.) as a first-class citizen, allowing these data sources to be natively connected to structured data.
- Second, ontologies provide a semantic layer that natively expresses the relationships between data concepts. The semantic layer not only guides data engineers but also records data provenance.
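Both advantages can be illustrated with a toy example. Here a document is an entity in the same graph as structured records, linked to them by a "mentions" relation that the ontology layer declares; every name below (the classes, the PDF, the drug) is a hypothetical stand-in, not actual Anzo syntax.

```python
# Unstructured content as a first-class node: a PDF lives in the same
# graph as structured records, connected by an ontology-declared relation.
triples = {
    # Ontology layer: classes and the relation between them
    ("Document", "isA", "Class"),
    ("Drug", "isA", "Class"),
    ("mentions", "domain", "Document"),
    ("mentions", "range", "Drug"),
    # Instance layer: a PDF and a structured record, natively connected
    ("trial_report.pdf", "type", "Document"),
    ("aspirin", "type", "Drug"),
    ("aspirin", "approvedDose", "325 mg"),
    ("trial_report.pdf", "mentions", "aspirin"),
}

# Walk from unstructured source to structured fact in one traversal.
results = []
docs = [s for s, p, o in triples if p == "type" and o == "Document"]
for doc in docs:
    for _, _, drug in (t for t in triples if t[0] == doc and t[1] == "mentions"):
        dose = next(o for s, p, o in triples if s == drug and p == "approvedDose")
        results.append((doc, drug, dose))
print(results)
```

In a relational system the document would typically sit outside the database as an opaque blob; here the traversal from PDF to approved dose is a single hop, and the ontology tells an engineer (or an LLM) exactly what "mentions" is allowed to connect.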
Now, you might wonder how Anzo can empower AI and large language models. Anzo stands out as the only comprehensive knowledge graph platform with an architecture that enables users to dynamically construct knowledge graphs using a unique structure known as a "graphmart," overlaying and combining data from diverse sources, whether structured or unstructured. A graphmart is an ideal framework for creating knowledge graphs on the fly, offering features and a design specifically tailored to AI initiatives:
- In-Memory Activation: Each data source becomes an activated in-memory layer within the RDF knowledge graph engine. Additional layers can be seamlessly added, creating logical connections, extensions, and transformations within the knowledge graph. This approach limits data movement between sources and the knowledge graph.
- Codeless Workflows: Our latest 6.0 release marks a significant advancement, introducing an intuitive interface that empowers users to connect, map, and cleanse data effortlessly, all without the need for coding.
- MPP Query Engine: The massively parallel processing (MPP) query engine offers two advantages. First, users can load data without prior inspection, relying on the knowledge graph to cleanse it. Second, given the computational intensity of AI tasks, running queries within the MPP query engine, where applicable, conserves resources for downstream applications.
Furthermore, as alluded to above, Anzo represents data with ontologies, which provide several advantages over relational systems.
- Structured Knowledge Representation: Ontologies provide a structured way to represent knowledge. They define concepts, relationships, and categories in a domain, which helps in organizing, disambiguating and contextualizing data. When this structured knowledge is integrated with LLMs, it enhances the model's understanding of relationships and hierarchies in the data, leading to more accurate and contextually relevant responses.
- Domain-Specific Customization: Ontologies can be tailored to specific domains, providing a specialized knowledge base for LLMs. This is particularly beneficial in fields like pharmaceuticals, manufacturing, law, or engineering, where domain-specific knowledge is crucial for generating accurate and reliable content.
- Enhanced Learning and Adaptability: The combination of ontologies and generative AI models facilitates continuous learning. As the knowledge graph evolves and expands, the AI model can adapt and refine its outputs, leading to a system that improves over time.
- Scalability and Efficiency: Ontologies facilitate more efficient data management and querying. Many relationships, such as the ones that exist in supply chains, hierarchies, and investment portfolios are more efficiently represented by an ontology than a relational database. This efficiency translates into faster and more scalable responses from generative AI models, especially in handling large volumes of data or complex information networks.