Cambridge Semantics CTO and co-founder Sean Martin discusses the capabilities of Anzo Smart Data Lake 4.0, the latest version of the company’s flagship product, and the new platform’s potential impact on current and future markets.
Q: What is the value proposition that Anzo brings to the market?
The key one is that we simplify access to data by giving it business meaning. With Anzo, you can make data enormously more reusable than it currently is and give far quicker access to the people who need that data to make a decision.
Semantic technologies have always been seen as a great way to integrate data, since they greatly simplify the sophisticated modeling needed to connect information from multiple sources, including text. Until now, however, it was not possible to scale these solutions. They tended to stop at the departmental level at best before running out of steam, and querying them was simply too slow to make them useful in practice.
Cambridge Semantics is the first to provide the scale necessary to analyze massive volumes of data in a data lake platform using semantic open data standards. We provide an enterprise-level product, the Anzo Smart Data Lake, that can combine all of the unstructured and structured data from the various sources in your business, rather than just serving as a niche departmental solution. Enterprise users can take advantage of what we call an enterprise information fabric: a standardized way to access all their data, based on a semantic layer that gives that data common business meaning. Using open standards to model and connect all the data, together with a common access layer, ultimately creates a coherent network of data that can be used and reused in any number of different combinations to provide valuable insights that drive businesses forward.
Q: Can anyone in a company easily query and analyze data using Anzo in the same manner a data scientist would?
Certainly. People already use tools like Excel, Tableau and Spotfire to gather and analyze data. The question is, how do you give those people more direct access to much more data, in increasingly sophisticated combinations?
The traditional way of preparing data is that a business person figures out their question and then often works with a team of IT people who find, clean, integrate, and then extract the data needed to answer that question. This waterfall approach is slow and usually requires many iterations and baton passes to get right. It also requires the business side of the house to formulate and narrowly specify its questions in advance (often without even knowing what data is available), so that additional or altered questions go to the back of the queue. With our open standards approach, the connected data is so well described at a business level by the semantic layer that it can quickly be combined and reused in any manner, much closer to the business person asking the questions. The entire operation becomes far more agile, with quick iterations and fast pivots to evolve and tackle follow-on or new questions immediately.
The semantic layer enabled by Anzo provides both a venue and a means of translating all of an enterprise’s data, from the proprietary formats in which it is initially generated and stored across any number of siloed sources and even documents, into the language of the consuming user, department or organization.
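As a rough illustration of what such a translation layer does, consider two siloed systems that store the same business concept under different proprietary field names, which a shared vocabulary maps onto one business-level model. The system names, field names, and mapping below are invented for the example and are not Anzo’s actual API:

```python
# Hypothetical illustration of a semantic translation layer: two siloed
# systems use different proprietary field names for the same concepts;
# a shared vocabulary maps both onto common business terms.

BUSINESS_VOCAB = {
    "crm_system": {"cust_nm": "customerName", "cntry_cd": "country"},
    "erp_system": {"CUSTOMER": "customerName", "NATION": "country"},
}

def to_business_terms(source, record):
    """Rename a source record's fields into the common business vocabulary."""
    mapping = BUSINESS_VOCAB[source]
    return {mapping.get(field, field): value for field, value in record.items()}

crm_row = {"cust_nm": "Globex", "cntry_cd": "DE"}
erp_row = {"CUSTOMER": "Globex", "NATION": "DE"}

# Both rows now answer to the same business-level question.
assert to_business_terms("crm_system", crm_row) == to_business_terms("erp_system", erp_row)
```

Consumers then query one business model rather than learning each source’s internal naming, which is the agility the interview describes.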
Establishing a smart data lake is almost like creating a huge memory bank, with all of the data in the business connected together into a great network of interconnected facts we call an Enterprise Knowledge Graph. The Enterprise Knowledge Graph is easily accessible to all users in the network, who can use it to quickly extract the answer to their particular problem. You don’t need to go through several people and long iterative processes to manipulate data into the form you need to answer a question, because it is all stored, connected and queried in one convenient location. That’s what the smart data lake is about.
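The “network of interconnected facts” idea can be sketched with a toy triple store in the spirit of the RDF open standards the interview references. This is a minimal generic illustration with invented data, not Anzo’s query engine:

```python
# Toy knowledge graph: facts stored as (subject, predicate, object)
# triples, in the spirit of RDF. Entities and predicates are invented.

facts = {
    ("AcmeCorp", "hasCustomer", "Globex"),
    ("Globex", "locatedIn", "Germany"),
    ("AcmeCorp", "hasCustomer", "Initech"),
    ("Initech", "locatedIn", "USA"),
    ("Order-17", "placedBy", "Globex"),
}

def query(subject=None, predicate=None, obj=None):
    """Return every triple matching the pattern (None acts as a wildcard)."""
    return [
        (s, p, o) for (s, p, o) in facts
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# Traversing connections answers questions spanning what would otherwise
# be separate silos: which countries do AcmeCorp's customers operate in?
customers = [o for (_, _, o) in query("AcmeCorp", "hasCustomer")]
countries = {o for c in customers for (_, _, o) in query(c, "locatedIn")}
print(sorted(countries))  # prints ['Germany', 'USA']
```

Because every fact lives in one connected graph, a new question is just a new traversal pattern rather than a new integration project.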
Q: What advice can you share for enterprises that want to incorporate Anzo across their business?
One of the nice things about our technology, as opposed to the ERP systems implemented 20 years ago or even more recent data warehouse projects, is that it doesn’t need to be done with a big bang or carry a lot of execution risk and business pain. This is an overlay technology that can be added incrementally on top of the systems you already have, to extract more value from the data they generate. Enterprises should pick an area where there is obvious value in the answers to questions they would ask if doing so were practical, and where solving problems requires accessing data in sophisticated combinations.
Begin and succeed with one solution. Then add more data to the Knowledge Graph and move to an adjacent solution that perhaps reuses some of the data from the first and perhaps adds a bit more of its own to the expanding graph; then a third, and a fourth. The underlying data from each additional solution accrues into a massive knowledge graph of reusable and valuable connected data, described using open standards that make it future-proof. This is an approach that can be scaled to any level.
Q: What markets do you expect Anzo SDL 4.0 will expand into in 2018?
We are spreading to more data-intensive environments, to sophisticated customers who are quicker to see the advantages of our approach and who have been struggling with more traditional technologies. Pharma, financial services, industrial, oil and gas, retail, government: almost any enterprise has multiple complex-data environments. The idea is to take the capability beyond our initial verticals and apply it horizontally across multiple industries, since there is nothing about the technology that precludes its broad use. We have an excellent first step in the smart data lake space with this new product, and we have many more ideas for refining the software in really powerful ways.
Q: Does Anzo SDL 4.0 support AI or Machine Learning solutions?
Companies that use any kind of advanced predictive analytics or machine learning can benefit from Anzo and a smart data lake as an enabling infrastructure for both large-scale data management and the subsequent operationalizing of working ML models in analytics workflows. Machine learning is usually driven by tagged data sets used for training, and the resulting models need to be operationalized by applying them to yet more data in an analytics context to get value from them. Anzo supports both of these tasks very well. Graph technology also offers intriguing possibilities to those developing and training ML models, since it is so much faster to combine data across multiple dimensions to create very rich feature sets on which models can be trained to make predictions. We recently released a toolkit that makes it easy to plug external ML models and analytics directly into our MPP graph query engine in the smart data lake, so that they can seamlessly become part of a data analytics workflow. If there is one thing our customers want, it is operational repeatability: to figure out how to apply a particular analytic, and then be able to easily and reliably apply it over and over as new data enters the system.
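As a sketch of why connected graphs speed up feature engineering: once entities are linked, multi-dimensional attributes become simple traversals rather than multi-table joins. The graph, order totals, and feature names below are invented for illustration and do not reflect any Anzo interface:

```python
# Invented example: build ML training features per customer by
# traversing a small graph instead of joining several source tables.

graph = {
    "Globex":  {"orders": ["Order-17", "Order-18"], "country": "DE"},
    "Initech": {"orders": ["Order-19"], "country": "US"},
}
order_totals = {"Order-17": 25000, "Order-18": 5000, "Order-19": 12000}

def feature_vector(customer):
    """A one-hop traversal yields a rich feature set for one entity."""
    node = graph[customer]
    totals = [order_totals[o] for o in node["orders"]]
    return {
        "n_orders": len(totals),
        "total_spend": sum(totals),
        "avg_order": sum(totals) / len(totals),
        "country": node["country"],
    }

# Rows like these would feed model training, then the trained model
# would be re-applied as new entities enter the graph.
training_rows = {c: feature_vector(c) for c in graph}
print(training_rows["Globex"]["total_spend"])  # prints 30000
```

The same traversal re-run on fresh data is one form of the “operational repeatability” the interview mentions: the feature logic is defined once against the graph and applied over and over.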
To learn more about Anzo Smart Data Lake 4.0, watch our on-demand webinar "A Breakthrough Data Lake Platform for the Enterprise Information Fabric".