Starting at the Finish Line: How Anzo’s Upcoming Release Will Create a Knowledge Graph with the Click of a Button

Posted by Eugene Linkov on Mar 10, 2021 6:22:17 AM


This blog highlights some of the new automation features that Cambridge Semantics is building into our scalable knowledge graph platform - Anzo. If you are interested in seeing these capabilities live, you can request a private demo here.

In a recent report from Gartner, “Demystifying the Data Fabric”, Jacob Orup Lund states that one of the pillars of a data fabric is a “Knowledge graph enriched with Semantics - The knowledge graph is designed and built to store and visualize the complex relationship between multiple entities.” Yet the missing piece that other platforms fail to recognize is the need for automation. Integrations don’t scale, and can’t scale, if everything has to be done by hand. Knowledge graphs were a niche technology until recently, but they are now entering the mainstream, driven in part by the adoption of the data fabric architecture. By definition, a data fabric is big, so the underlying knowledge graph platform must, to the extent possible, automate the steps required to stand up and maintain an enterprise-scale knowledge graph.

Over the last 7+ years, Cambridge Semantics Inc. (CSI) has built one of the most scalable knowledge graph platforms in the world: Anzo. Anzo has always featured industry-leading automation in all parts of the knowledge graph lifecycle, from initial onboarding, to modeling, to end-user query generation. Anzo’s upcoming 5.2 release will showcase a major technological advance that allows users to automatically build a knowledge graph from multiple sources without having to manually build the graph model representation of the data or work through step-by-step data onboarding. With one click, any collection of enterprise data sources can be transformed and presented to users in an interconnected graph form.

At the core of this automation in Anzo 5.2 is a new capability in our graph engine that lets you natively use SPARQL queries to directly load and integrate any commonly used file format, data service, or relational database. With this direct-loading capability, users no longer have to wait for Spark jobs to first ETL the data into graph format before loading it into the graph engine!
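To make "graph format" concrete, here is a minimal Python sketch of what turning tabular rows into graph triples looks like. It is only an illustration of the idea, not Anzo's loader or its SPARQL syntax: the function name `rows_to_triples`, the sample data, and the `Person/1`-style identifiers are all assumptions made up for this example.

```python
import csv
import io

def rows_to_triples(csv_text, class_name, key_column):
    """Turn each CSV row into (subject, predicate, object) triples.

    Hypothetical helper for illustration only; this is not an Anzo API.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Build a subject identifier from the class name and the key column.
        subject = f"{class_name}/{row[key_column]}"
        triples.append((subject, "rdf:type", class_name))
        for column, value in row.items():
            # Skip the key column and empty cells.
            if column != key_column and value:
                triples.append((subject, column, value))
    return triples

# Made-up sample data: the second row has no city.
sample = "id,name,city\n1,Ada,London\n2,Grace,\n"
triples = rows_to_triples(sample, "Person", "id")
for triple in triples:
    print(triple)
```

Each row becomes one node with a type and one edge per non-empty column. This is the kind of transformation that would otherwise run as a separate ETL job before the data reaches the graph engine.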

With this overall workflow, users will be able to follow two simple steps to automatically create a knowledge graph:

  1. Register new data sources (databases, CSV, JSON, etc.) with Anzo’s metadata catalog and provide the connection information where applicable. Anzo will process the connections to retrieve metadata and list the available schemas and tables so you can make a more curated selection.
  2. Select one or more sources and click the “Create Graphmart” button.

After Step 2, Anzo will use Kubernetes to automatically spin up an AnzoGraph cluster on the cloud provider of your choice and bring your data into memory. Connections between disjoint datasets from different sources will be created using a batch of profiling and inference queries, all of which can be tuned to provide the appropriate confidence intervals. Visualizations and metrics are also generated for the properties and classes that have been automatically materialized from the underlying data sources. The knowledge graph is then active and ready to use within Anzo’s native dashboard builder, which is specialized for navigating the connections in graph data. More comfortable with another BI tool? Simply grab a generated REST API connection string for the entire knowledge graph, or use the visual query builder to expose subsets of the knowledge graph that fit your dashboarding use cases. Not only is the data now ready for analytics, but Anzo has already done most of the work for you!
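One simple way to picture how profiling can connect datasets is value overlap: if nearly every value in one column also appears in a column of another source, the two are likely related. The Python sketch below illustrates that idea with a tunable threshold. It is an assumption-laden illustration, not the queries Anzo actually runs; the names `value_overlap` and `infer_links`, the sample tables, and the 0.8 default are all hypothetical.

```python
def value_overlap(left_values, right_values):
    """Fraction of distinct left values that also appear on the right."""
    left, right = set(left_values), set(right_values)
    if not left:
        return 0.0
    return len(left & right) / len(left)

def infer_links(tables, threshold=0.8):
    """Propose column-to-column links across tables whose overlap meets the threshold.

    Hypothetical illustration of profiling-based link inference; not Anzo's method.
    """
    columns = [
        (table, column, values)
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    links = []
    for left_table, left_col, left_vals in columns:
        for right_table, right_col, right_vals in columns:
            if left_table == right_table:
                continue  # only link columns from different sources
            score = value_overlap(left_vals, right_vals)
            if score >= threshold:
                links.append(
                    (f"{left_table}.{left_col}", f"{right_table}.{right_col}", score)
                )
    return links

# Made-up sample data standing in for two registered sources.
tables = {
    "orders": {"order_id": [1, 2, 3], "customer_id": ["c1", "c2", "c2"]},
    "customers": {"customer_id": ["c1", "c2", "c3"], "name": ["Ada", "Grace", "Alan"]},
}
links = infer_links(tables, threshold=0.8)
print(links)
```

Raising the threshold yields fewer, higher-confidence links, and lowering it surfaces weaker candidates. That trade-off is the kind of tuning the paragraph above refers to.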

Below is a screenshot of a complete knowledge graph generated solely using this new workflow. The underlying data sources in this example are a small set of CSV files from a Kaggle project, but the steps are the same as they would be for a much, much larger set of data sources. All the connections between the classes, the property type inferences, and the metrics were automatically generated from the data sources. You can now use point-and-click navigation to quickly draw further insights from the newly created knowledge graph, or use the APIs to hook it into an operational workflow or customized user experience.


While the fully automated and touchless workflow facilitates and accelerates the steps to an operational knowledge graph, Anzo offers many capabilities to further enrich your data. Using data layers, more advanced users can utilize hundreds of native graph algorithms, run data integrations through external ML and AI tools, mask data by setting multi-level permissions, run validations, and so much more.

Anzo 5.2 delivers a streamlined way to get you to an operational, interconnected knowledge graph without spending days ingesting data, mapping & modeling relationships, and applying business semantics with a patchwork of tools. If you are interested in seeing this new automated onboarding process live or other platform capabilities, let us know and we will gladly set up a demo for you. 

Tags: Anzo, Graph Database
