At Cambridge Semantics we use the W3C Semantic Web standards to create conceptual canonical data models, in particular the Web Ontology Language, OWL. The conceptual models are declarative and express information the way a domain expert or business user understands it, usually as a series of interlinked concepts and properties. Unlike most traditional technologies, these conceptual models are independent of how data is stored and provide an abstraction, sometimes called “a semantic layer”, for our Anzo software.
The result is unprecedented flexibility.
Because they are independent of storage system constraints, the conceptual models can reflect different versions of the truth if necessary, whether those versions evolved over time or arise from how different groups of users understand different concepts, with the groups sharing only what is common between them. OWL identifies each concept uniquely, regardless of human language, and supports multiple human-readable names or labels, including labels in different languages, for the concepts and properties being described. Using W3C open data standards means the conceptual models can encode a common vocabulary that lets different parts of an enterprise, or members of an information supply chain, talk about and share their data far more easily.
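As a rough illustration of one identifier carrying several human-readable labels, the sketch below uses plain Python dictionaries rather than an OWL toolkit. The concept identifier, the labels and the `label_for` helper are all hypothetical and are not drawn from any real ontology or from Anzo.

```python
# Hypothetical concept identifier; not taken from any real ontology.
CUSTOMER = "http://example.com/ontology#Customer"

# One unique identifier, several human-readable labels keyed by language tag.
labels = {
    CUSTOMER: {"en": "Customer", "fr": "Client", "de": "Kunde"},
}

def label_for(concept, lang, fallback="en"):
    """Return the label in the requested language, else the fallback language."""
    by_lang = labels[concept]
    return by_lang.get(lang, by_lang[fallback])

print(label_for(CUSTOMER, "fr"))  # label shown to a French-speaking user
print(label_for(CUSTOMER, "es"))  # no Spanish label, so the English one is used
```

Whatever label a user sees, the data itself refers only to the identifier, which is what allows groups with different vocabularies to share it.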
The conceptual models can encode property restrictions and controlled vocabularies, and can be annotated with information useful to users, ETL processes, query and form building tools, downstream data consumer applications, validation logic and so on. Again, this is all expressed in a declarative fashion, independent of all the storage systems and information consuming or producing applications, but reusable by any of them.
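To make the idea of declarative restrictions concrete, here is a minimal sketch in which the restrictions are stated as data and a generic function reuses them for validation. The `MODEL` structure, the `Order` concept and its properties are assumptions made for illustration only; they are not OWL syntax and not how Anzo stores its models.

```python
# Hypothetical model fragment: restrictions are stated as data, not as code.
MODEL = {
    "Order": {
        "status": {"required": True, "allowed": {"open", "shipped", "closed"}},
        "quantity": {"required": True, "type": int},
        "note": {"required": False, "type": str},
    }
}

def validate(concept, record):
    """Return a list of human-readable problems found in the record."""
    problems = []
    for prop, rule in MODEL[concept].items():
        if prop not in record:
            if rule.get("required"):
                problems.append(f"missing required property: {prop}")
            continue
        value = record[prop]
        # Datatype restriction, if the model declares one.
        if "type" in rule and not isinstance(value, rule["type"]):
            problems.append(f"{prop} should be {rule['type'].__name__}")
        # Controlled vocabulary, if the model declares one.
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{prop} has value outside controlled vocabulary: {value!r}")
    return problems

print(validate("Order", {"status": "open", "quantity": 3}))
print(validate("Order", {"status": "lost"}))
```

Because the rules live in the model rather than in the function, a form builder or an ETL process could in principle read the same `MODEL` entry without duplicating the logic.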
The conceptual models themselves are often based on, or can import elements of, a growing set of existing industry or domain models expressed in OWL or other representations. Since they are standards based, the models can easily be shared for reuse by partners, customers, vendors or anyone else who would like to align their information conceptually.
The OWL ontology language facilitates the tagging of data instances from multiple data sources with their meanings, to form a single integrated view of that data built on a multitude of simple factual statements called RDF triples. RDF is an open, standards-based, graph-oriented data representation in which objects, or graph nodes, have properties; some properties have data values and others are pointers to further nodes in the graph. The graph model is intuitive for humans, who tend to think by associations of objects and their properties. For most of us it is far easier to traverse a series of interlinked concepts to figure out what data we have or need, and how it is related, than to think about, say, the interlinked table structures of a relational database schema.
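The triple and graph ideas described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `ex:` names and data are invented, and a real system would use an RDF store and a query language rather than a Python list.

```python
# Hypothetical data: each fact is one (subject, property, value) triple.
triples = [
    ("ex:order42", "rdf:type", "ex:Order"),
    ("ex:order42", "ex:placedBy", "ex:alice"),   # pointer to another node
    ("ex:order42", "ex:quantity", 3),            # plain data value
    ("ex:alice", "rdf:type", "ex:Customer"),
    ("ex:alice", "ex:name", "Alice"),
]

def values(subject, prop):
    """All values of one property for a given node."""
    return [o for s, p, o in triples if s == subject and p == prop]

def follow(start, *path):
    """Traverse a chain of properties starting from one node."""
    nodes = [start]
    for prop in path:
        nodes = [o for n in nodes for o in values(n, prop)]
    return nodes

# "Who placed order 42, and what is their name?" as a walk over linked concepts.
print(follow("ex:order42", "ex:placedBy", "ex:name"))
```

The traversal names only concepts and properties; nothing in it refers to tables, keys or joins.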
Operationalizing the Conceptual Model
Once ontologies are established, these conceptual models can easily be operationalized. Together with middleware and tooling software that supports the family of W3C standards, which are designed to “play nice together”, the models can drive and underpin nearly every aspect of a system. Here are some examples supported in our own Anzo software:
- To import and integrate (conceptually align) multiple data sets into the Anzo system by mapping them from their private proprietary models to the standards based conceptual one;
- To allow end users to find on-the-fly the data they need through searching by both concepts and content in a manner that abstracts away from individual source and format issues;
- To guide the creation of queries and forms for access and manipulation of data: query and form builders work by allowing users to traverse familiar concepts to find what they want without being impeded by artifacts of storage (e.g. sources of data, formats of data, SQL joins etc.);
- To simplify data integration at the conceptual level for structured, semi-structured and unstructured data: data from any source is mapped to the same concepts drawn from the model;
- To automatically create expressions of how the data may be accessed or manipulated, e.g. code generation for Web Services access, the generation of programmable business objects, or the generation of relational database schemas that reflect the ontologies;
- Validation & data quality;
- Concept based access control;
- Inference and reasoning;
- To serve, increasingly, as a basis for expressing human readable and editable logic rules for making data driven decisions and manipulating data. Contrast this with the traditional approach, which requires that business requirements be passed to developers and DBAs, who scatter that logic into source code across multiple tiers of a system as well as its database schema. That makes it impossible for the business user to truly understand the actual logic of the system and creates an ongoing maintenance resource sink;
- If data is transferred out of the Anzo system, the conceptual model that describes that data can travel with it, included in the data stream using open standards like OWL and RDF. This greatly facilitates reuse because information is automatically de-siloed and downstream applications can also read the conceptual models and adjust themselves accordingly.
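The import and integration items in the list above can be pictured with a small sketch: two sources with different field names are mapped onto the same conceptual properties and merged into one view. Everything here is hypothetical (the `crm` and `billing` sources, the `MAPPINGS` layout, the `ex:` names); it shows the general idea of declarative mapping, not Anzo's actual mapping format or API.

```python
# Hypothetical source records with different, source-specific field names.
crm_rows = [{"cust_id": "C1", "full_name": "Alice Smith"}]
billing_rows = [{"customer": "C1", "amt_due": 120.0}]

# Declarative mappings: source field -> property in the conceptual model.
MAPPINGS = {
    "crm": {"key": "cust_id", "fields": {"full_name": "ex:name"}},
    "billing": {"key": "customer", "fields": {"amt_due": "ex:balance"}},
}

def to_triples(source, rows):
    """Turn source rows into (subject, property, value) triples."""
    mapping = MAPPINGS[source]
    out = []
    for row in rows:
        subject = "ex:customer/" + row[mapping["key"]]
        out.append((subject, "rdf:type", "ex:Customer"))
        for field, prop in mapping["fields"].items():
            out.append((subject, prop, row[field]))
    return out

# Merging is a set union: both sources describe the same node.
graph = set(to_triples("crm", crm_rows)) | set(to_triples("billing", billing_rows))

def describe(subject):
    """Collect every property known about one node, whatever its source."""
    return {p: o for s, p, o in graph if s == subject}

print(describe("ex:customer/C1"))
```

Adding a third source in this sketch would mean adding one entry to `MAPPINGS`; the `describe` function and anything built on it would not change.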
Conceptual Models underpin the Anzo Software
The Anzo software is entirely driven by standards based conceptual models. It includes many software components that are all designed to work together driven by the same conceptual models:
- Anzo Connect and Anzo for Microsoft Excel provide the ability to import structured and semi-structured data into the Anzo system (transforming it as necessary) and to describe it using the conceptual models.
- Anzo Unstructured can be used by end users to create their own multi-vendor based Natural Language Processing (NLP) pipelines that map data extracted from documents (emails, web pages, PDFs, etc.) onto the same conceptual models, thereby integrating both structured and unstructured data for blended uses.
- Anzo on the Web is an end user focused BI dashboard and form builder tool. Users can locate the data they want and visually mash it up, using the models to understand what data is available to them and reducing complex query building to the relatively simple task of traversing linked concepts. Users can create simple “info apps” for themselves and other less skilled users that include forms for changing data, charts, tables and advanced filters.
- Anzo for Microsoft Excel provides two-way interaction with data made available through access to the conceptual model. Spreadsheet data can be mapped to the model to make it easy to collect and automatically integrate data using worksheets or to build forms for entering and reporting data as worksheets.
- Anzo Workflow and Anzo Rules are also driven by the conceptual data model and are used to codify data flows and automated data driven decision making.
Flexibility Achieved
The Anzo approach is so flexible and dynamic because it treats all these software components, each providing a different function, holistically: they coordinate as a cohesive system using the common understanding provided by the shared conceptual model. In the traditional world, each of these components would be a separate piece part, often provided by a different vendor, requiring a system integrator to configure or program whatever is necessary to tie them into a single system.
In Anzo, a change to the conceptual model, or the creation of a new model, whether because the business changes, is better understood or develops a new need, is reflected everywhere immediately, and repairs to dashboards and ETL maps can quickly be made to reflect the new reality. Indeed the old reality can often be left to co-exist in the same system if there are downstream applications that still rely on it. Often it will be the end users themselves who make these changes, since access to data has been so simplified through the use of the conceptual models.
Contrast this conceptual semantic layer approach with traditionally built systems, whose components do not share an abstracted model. There, every alteration requires people skilled in understanding the interactions of all the different piece part components of a solution to modify the logic used to glue those parts together. This is generally a long and costly business that soaks up the greatest proportion of the overall IT spend.