This blog post is taken from the Q&A portion of our recent Knowledge Graph Lightning Demo Showcase, which featured five unique knowledge graph use cases:
- Product lifecycle management
- Supply chain risk analysis
- Powering a 360-degree view
- Building a regulatory Data Fabric
- Uncovering holes in your data
You may well have had the same thoughts or questions reflected here; hopefully, these answers serve you well. Let's dive in:
How do you integrate the sources? Is this a manual process mapping field by field?
We actually demonstrated multiple ways to integrate the sources in this showcase. One way is to automatically generate a model from the source schema and then manually transform the data to the target ontology wherever needed (ELT). The other is ETL, where you load your data and transform it directly into a graph. And there are two ways to do this.
Yes, you can do it manually. You could create SPARQL queries and assign the fields, but you also could do it automatically. And this was demonstrated in the product lifecycle management demo. There is an algorithm that matches the source data to different classes and instances and tells you whether there are similarities recommending how you can map your data from the sources. So you can do both manual and automatic integration mapping. It's usually a mix of both.
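To give a flavor of the automatic recommendation step, a matcher can score source field names against target ontology property labels and suggest the best candidates. This is a toy sketch of the idea using simple string similarity, not Anzo's actual algorithm; the field and property names are invented:

```python
from difflib import SequenceMatcher

def recommend_mappings(source_fields, ontology_properties, threshold=0.6):
    """Suggest a target ontology property for each source field,
    ranked by string similarity; skip weak matches below threshold."""
    suggestions = {}
    for field in source_fields:
        best, best_score = None, 0.0
        for prop in ontology_properties:
            score = SequenceMatcher(None, field.lower(), prop.lower()).ratio()
            if score > best_score:
                best, best_score = prop, score
        if best_score >= threshold:
            suggestions[field] = (best, round(best_score, 2))
    return suggestions

# Hypothetical source columns and target ontology properties
fields = ["cust_name", "prod_id", "order_date"]
props = ["customerName", "productId", "orderDate", "shipAddress"]
print(recommend_mappings(fields, props))
```

A user would then review these recommendations and confirm or correct them, which is the "mix of both" manual and automatic mapping described above.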
We also have validation algorithms that let you analyze which parts of the source data have already been mapped to the target ontology, showing which data is still hanging in the air and not really integrated. So, what's the magic sauce of the matchmaker?
There are two things. One part uses syntactic similarity, which compares deviations between instance labels. The other, semantic similarity, compares classes and concepts based on their instances and relations: if two concepts have similar instances, they have a higher semantic similarity. You can define it yourself and make it more sophisticated than I just explained, but those are the basics.
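The instance-based comparison can be sketched as a set overlap (Jaccard similarity) between the instance sets of two classes. This is a simplified illustration of the idea, not the production algorithm; the class and instance names are hypothetical:

```python
def semantic_similarity(instances_a, instances_b):
    """Jaccard overlap of the instance sets of two classes:
    more shared instances -> higher semantic similarity."""
    a, b = set(instances_a), set(instances_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two hypothetical classes from different sources sharing instances
crm_customers = {"ACME", "Globex", "Initech"}
erp_accounts = {"ACME", "Globex", "Umbrella"}
print(semantic_similarity(crm_customers, erp_accounts))  # 2 shared of 4 total -> 0.5
```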
What tool do you have to create canonical models? Can you elaborate on the Ontology editor?
Anzo has its own modelling editor that uses the OWL spec. We can generate OWL models directly from sources. But often when customers are trying to build canonical models, they tend to use a hybrid of a couple of different approaches. One is to use industry standard models that they operationalize for a given set of use cases. The second is to use the model of one source as the starting point for a canonical model, then leverage and connect other sources into that model. It’s a little bit of both. We try to automate the construction of the model as much as possible, but there's always going to be user intervention to get that model as applicable as possible for a given client.
What’s the difference between RDF and LPG?
Going beyond the technicalities of what structurally is different between a label property graph (LPG) and RDF, the most important question is “what is the use case you have for it?”
Is your entire world graphs, where everything is modelled as graph data that conforms well to graph structures throughout? Then LPGs are a perfect fit, because you have to traverse the graph anyway. But LPGs lack the overarching model of relationships and the expression of an ontology that would let you reify your nodes into a new structure through cross-product construction, as you can inside of RDF.
If you have a desire to describe the world in a graph, RDF becomes much more valuable because of its extensibility. We've been talking about RDF exclusively through an ontological/semantics perspective, specifically the use of RDF with OWL. This allows for a common framework to create a model over the different RDF triples that constructs a graph at query time. And that's where we really get the concept of a knowledge graph. The knowledge graph is ephemeral and exists only at the time the query is run: the world as described by the RDF gathers the triples, and it all gets reified then and there.
Combined with RDF*, you can get some of the benefits of named graphs and other inferred triple relationships, like you would out of a graph query, but you must have a system like Anzo that supports RDF*.
We used to have to exclude RDF because of technical limitations of scale and economics, as it is more computationally intensive to use RDF. But that hasn't been the case for many years, especially with the dominance of elastic computing in the cloud. So it really comes down to: do you have a world that is best served with graphs? Or are you trying to describe the world, continuously extend the model, and refer to things outside your scope in an iterative fashion? If that's the case, labelled property graphs would be very painful, because you would have to continually remodel and traverse the graph to express shifting, amorphous relationships that may not always bind vertices and edges together when new relationships are introduced, resulting in a lot of hanging properties. Inside of RDF, by contrast, you can quickly shift and pivot based on a change to the model and then simply link the triples together using that context.
Mainly, those are the two use cases that exist in the world. If you find an application that exists solely within the graph space, there's not much real use or benefit to using RDF triples. However, if you are trying to describe multiple systems about the world in the language of those business users–such as in an Ontology where it's human-based language–the use of RDF becomes natural.
On top of that, you can make edits in real-time. You can have one version of your ontology, the world as you know it, then author a new one but still use that old ontology to refer to the current set of triples. Seeing the world as it was versus the world as it is. Whereas in a graph you'd have to make two copies of that graph in most systems.
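The idea that two ontology versions can interpret the same underlying triples, without copying the graph, can be sketched as two models projected over one triple set at query time. This is a conceptual toy with invented identifiers, not how an RDF store is implemented:

```python
# One shared set of RDF-style triples (subject, predicate, object)
triples = [
    ("ex:acme", "ex:locatedIn", "ex:boston"),
    ("ex:acme", "ex:sells", "ex:widgets"),
]

# Two ontology versions interpret the same predicates differently:
# the old model sees a plain location, the new one a headquarters.
ontology_v1 = {"ex:locatedIn": "location", "ex:sells": "product"}
ontology_v2 = {"ex:locatedIn": "headquarters", "ex:sells": "productLine"}

def view(triples, ontology):
    """Project the raw triples through a model at query time;
    the triples are stored once, the 'graph' is built per ontology."""
    return [(s, ontology.get(p, p), o) for s, p, o in triples]

print(view(triples, ontology_v1))  # the world as it was
print(view(triples, ontology_v2))  # the world as it is
```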
I hope that gives a little bit of clarity from a business or use case perspective as to why you might want to lean in one direction versus the other, instead of just purely a technical perspective.
Yeah, that's definitely not an easy question to answer because it has a lot of nuances. I think it's important to understand that it's sometimes very tempting to use LPGs, as they can be easier to get started with, have a lot of plugins, and there is a lot of open-source material one can reuse; but for enterprise-scale usage, LPGs would create new silos. They would still be graphs, but you wouldn't have solved your data integration problems.
Does Anzo have master data management capabilities?
Master Data Management is an interesting use case. Master data management is certainly something you can do in the context of the graph.
In the demonstrations, we danced around this a little bit. When you're building a canonical model, you're mapping sources and essentially doing master data management, especially when two sources map onto the same entity. Often, when we map sources to the canonical model, our approach is to have object properties that point back to the original source model in order to trace the provenance of that data.
There's going to be some level of manual work required to write the rules that match those sources up into a common entity. Anzo can accelerate that process, utilizing graph metadata to determine heuristically where those matches can be made. There is a more complicated approach that uses the actual structure of the graph itself, using graph embeddings, and if you're interested in this we should have a conversation about it.
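A minimal example of such a matching rule is normalizing entity labels and linking records across sources that normalize to the same key. This is a toy heuristic with invented record shapes, not Anzo's matching engine; real deployments layer many rules and metadata signals:

```python
import re

def normalize(name):
    """Normalize an entity label: lowercase and strip punctuation/whitespace."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def match_entities(source_a, source_b):
    """Link records from two sources into a common entity
    when their normalized names coincide (one toy matching rule)."""
    index = {normalize(r["name"]): r for r in source_a}
    matches = []
    for record in source_b:
        key = normalize(record["name"])
        if key in index:
            matches.append((index[key]["id"], record["id"]))
    return matches

crm = [{"id": "crm-1", "name": "ACME Corp."}]
erp = [{"id": "erp-9", "name": "acme corp"}]
print(match_entities(crm, erp))  # [('crm-1', 'erp-9')]
```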
What query languages are supported with Anzo?
We use SPARQL. We also have limited support for Cypher because of its recent popularity. SPARQL is the primary language because it is standardized by the W3C. So whenever you use RDF and OWL to define your ontologies, SPARQL is the language to use to define the rules, define the model, and query the data. You can do all of this with SPARQL. SPARQL might be tricky to learn in the first few days, but it is the standard, so whenever you want to switch solutions, you can, unlike with Cypher. SPARQL and RDF triple stores will always give you the opportunity to stay agile and avoid vendor lock-in.
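To give a flavor of how SPARQL works, a basic graph pattern binds variables against stored triples. The matching semantics of a single triple pattern can be sketched in a few lines; this is a toy evaluator over an in-memory triple list, not a SPARQL engine, and the identifiers are invented:

```python
def match_pattern(triples, pattern):
    """Match one SPARQL-like triple pattern ('?x' marks a variable)
    against a list of triples, returning variable bindings."""
    bindings = []
    for triple in triples:
        binding = {}
        for pat, val in zip(pattern, triple):
            if pat.startswith("?"):
                binding[pat] = val        # bind the variable
            elif pat != val:
                break                     # constant doesn't match
        else:
            bindings.append(binding)
    return bindings

data = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:globex"),
]
# Analogous to: SELECT ?who WHERE { ?who ex:worksFor ex:acme }
print(match_pattern(data, ("?who", "ex:worksFor", "ex:acme")))
```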
Do you support SHACL?
Not yet, but the first release of SHACL support is coming in a few weeks, by the end of this year. It will be offered to our customers a few months into the next year.
To add a little clarification: right now, our approach is to let you write scripts that translate SHACL shapes into SPARQL queries.
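To illustrate the idea, a SHACL `sh:minCount` constraint is essentially a query for subjects of a class that are missing a required property; that check can be sketched like this (a simplified illustration of the shapes-to-queries approach, not Anzo's implementation):

```python
def check_min_count(triples, target_class, property_, min_count=1):
    """Report instances of target_class with fewer than min_count
    values for property_ (the logic behind a SHACL sh:minCount shape)."""
    instances = {s for s, p, o in triples
                 if p == "rdf:type" and o == target_class}
    violations = []
    for inst in sorted(instances):
        count = sum(1 for s, p, o in triples
                    if s == inst and p == property_)
        if count < min_count:
            violations.append(inst)
    return violations

data = [
    ("ex:p1", "rdf:type", "ex:Person"),
    ("ex:p1", "ex:name", "Alice"),
    ("ex:p2", "rdf:type", "ex:Person"),  # missing ex:name
]
print(check_min_count(data, "ex:Person", "ex:name"))  # ['ex:p2']
```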
How does Anzo handle time series data and time dimensions?
This is an interesting use case for many industries including manufacturing and financial services. Often our approach is to use a temporal model where we have time properties that provide a time box which indicate when a specific property or connection between nodes in our data is applicable.
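The time-box idea can be sketched as edges carrying validity intervals, so a query "as of" a given date only sees the connections that applied at that moment. This is a minimal illustration with hypothetical property names (`validFrom`/`validTo`), not the actual temporal model:

```python
from datetime import date

def valid_at(edges, as_of):
    """Filter temporal edges to those whose time box covers as_of.
    Each edge carries validFrom/validTo properties (the 'time box')."""
    return [e for e in edges if e["validFrom"] <= as_of <= e["validTo"]]

# Hypothetical employment edges with validity intervals
edges = [
    {"from": "ex:alice", "to": "ex:acme", "rel": "worksFor",
     "validFrom": date(2019, 1, 1), "validTo": date(2021, 6, 30)},
    {"from": "ex:alice", "to": "ex:globex", "rel": "worksFor",
     "validFrom": date(2021, 7, 1), "validTo": date(2099, 12, 31)},
]
print([e["to"] for e in valid_at(edges, date(2020, 5, 1))])  # ['ex:acme']
```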
For example, we have one very interesting use case with a hedge fund customer. The timing of activities is critically important. The knowledge graph model is constructed so that all the activities around a specific transaction map to a temporal model. Using the knowledge graph, all interactions between employees prior to a given trade being made can be studied. Again, if this is something that's of interest to a particular group here, we're more than happy to schedule a follow up session.
That's the end of our Q&A. We hope you got a feeling for what Anzo is: in its simplest terms, a platform that supports building enterprise knowledge graphs. It's not just scalable and full of features; it also allows for workflows that include people from all over the organization. Anzo helps domain experts, and people who may not be experts in SPARQL, RDF, or knowledge graphs, bring their own knowledge and domain expertise into the solution. This is a key success factor in building enterprise knowledge graphs. Enterprise knowledge graphs and collective intelligence are only possible if the whole company is enabled, and we believe Anzo does this very well.