The Smart Data Blog

Riding The Competitive Intelligence Automation Wave - Part 1

Posted by Partha Sarathi Bhattacharjee on Mar 16, 2017 4:43:00 PM

“If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.” - Sun Tzu

Most Competitive Intelligence (CI) and Business Strategy practitioners are likely to have come across this immortal advice from Sun Tzu. One does not cease to be amazed at how this simple fact underlies the multi-million dollar intelligence gathering industry.

On the flip side, one cannot ignore the reality that CI practitioners are increasingly being asked to demonstrate bang for buck. No industry professional worth her salt will question the utility of CI; what is increasingly being questioned is whether CI methods have kept up with the times. For instance, are the hundreds of man-hours invested in preparing, say, a therapy area landscape optimized to generate the best possible output? In my (limited) experience, I have come across instances that only reinforce the question of when, not if, mainstream CI practices will catch up with current resources and technologies.

Graphic 1 is a (very) simplified representation of a typical CI workflow:



Having had an opportunity to develop CI solutions using a state-of-the-art data lake that harmonizes data from diverse sources by semantically enriching them, I can list four pressing issues one encounters while using the traditional approach described above:

1. Asymmetry between resource investment and value generation

Estimates suggest data professionals spend up to 80% of their project time on fetching data and beating it into a shape amenable to consumption by analytical tools. In most cases, this ‘shape’ happens to be tables; often lots and lots of tables. Analysis of the data jostles for the remaining 20% of the budgeted time with other activities such as:

  • Setting client expectations,
  • Resetting client expectations,
  • Sometimes re-resetting every stakeholder’s expectations,
  • Tweaking deliverables to suit individual tastes of senior management folks who already find it cumbersome to not conflate CI with the Marketing team,
  • Meetings,
  • Firefighting last moment discovery of additional highly relevant data by rushing it through the analytical framework with varying degrees of success,
  • More meetings,
  • And wondering how much more awesome this analysis could have been if only the team had more time.

2. Inability to meaningfully interlink large scale data

Thanks to several decades of education and professional experience, we face no difficulty in understanding the meaning of the text and data we come across on a project. Unfortunately, computers do, unless specific concepts and standards are used to represent that knowledge. Equally unfortunately, we humans have limited cognitive abilities that preclude us from reliably interconnecting large-scale data and information in a reusable manner, particularly within the compressed timeframes in which CI projects often need to operate. The following is an example:

A recent update on a subscribed database informs us that a projected blockbuster under clinical development, 'Awesomeol', is also codified as 'AWL123' in addition to 'NIR223' that we previously knew about. We then find a few hundred news articles and dozens of publications mentioning the drug either by its name, codes, or both. We now need to extract all the new data and information associated with AWL123 to be able to answer questions such as:

  1. Do any genes appear to be frequently co-mentioned with the drug?
  2. Which adverse events are mentioned in the new documents?

Application of the manual approach would entail re-performing extensive secondary research to fetch data about the asset. If this vital piece of information is discovered deep into the lifecycle of a short-term CI engagement, large swathes of prior analysis will need to be reperformed resulting in significant redundant effort.
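By contrast, here is a minimal Python sketch of how a machine-readable synonym table could make the first question above cheap to re-answer once a new code such as AWL123 turns up. Everything in it is an assumption made for illustration: the `SYNONYMS` table, the canonical ID `DRUG:AWESOMEOL`, the gene names `GENEA`, `GENEB` and `GENEC`, and the three sample sentences are all invented, and simple token matching stands in for the far richer semantic enrichment a real data lake would perform.

```python
import re
from collections import Counter

# Hypothetical synonym table: every known name or code of the (fictional)
# drug maps to one canonical ID. Learning a new code is a one-line change.
SYNONYMS = {
    "awesomeol": "DRUG:AWESOMEOL",
    "nir223": "DRUG:AWESOMEOL",
    "awl123": "DRUG:AWESOMEOL",  # the newly discovered code
}

# Hypothetical gene lexicon; the names are placeholders, not real genes.
GENES = {"GENEA", "GENEB", "GENEC"}

# Split text into alphanumeric tokens (punctuation and hyphens act as breaks).
TOKEN = re.compile(r"[A-Za-z0-9]+")


def mentions_drug(text, canonical_id):
    """Return True if any token in the text is a known synonym of the drug."""
    return any(
        SYNONYMS.get(token.lower()) == canonical_id
        for token in TOKEN.findall(text)
    )


def gene_comentions(docs, canonical_id):
    """Count, per gene, the documents that mention both the gene and the drug."""
    counts = Counter()
    for text in docs:
        if mentions_drug(text, canonical_id):
            counts.update(set(TOKEN.findall(text)) & GENES)
    return counts


# Invented sample sentences standing in for news articles and abstracts.
DOCS = [
    "AWL123 showed activity in GENEA-mutant models.",
    "Awesomeol (NIR223) was evaluated alongside GENEB and GENEA status.",
    "An unrelated compound was tested in GENEC-null cells.",
]

counts = gene_comentions(DOCS, "DRUG:AWESOMEOL")
for gene, n in counts.most_common():
    print(gene, n)
```

The point of the sketch is the shape of the workflow rather than the matching logic: because documents are tied to a canonical ID instead of a literal string, adding a synonym and re-running the query replaces the repeated secondary research.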

Also, if the number of literature resources to be analyzed is in the thousands, not dozens or hundreds, one can rule out the manual approach for all intents and purposes. The same rule of thumb applies if the data needs to be analyzed in a different context in association with other sources. The following questions (hopefully) illustrate the point:

  1. How does AWL123's publication profile compare to other approved and investigational assets in the therapy area? How about similar indications?
  2. What is the profile of authors associated with the publications? Do they work with other companies?

In a world where data burgeons by the millisecond, engaging CI practitioners in tasks such as information search and retrieval under evolving contexts, where machines are demonstrably better, is a colossal waste of their time and talents.

3. Lack of Reusability of the Data or Analytical Approach

The primary reusable elements of the CI effort are the acquired subject matter understanding of the team and the final report as study material for follow-on or related projects. By the time a project is completed, the project team morphs into a cult complete with complex rituals such as:

  • Exactly 2 of the 5 team members knowing what the ‘Landscape_Analysis_version_3.xlsx’ workbook contains,
  • One of those 2 members knowing how ‘List_of_Competitor_Assets_Final’ spreadsheet differs from ‘List_of_Competitor_Assets_Final_2’,
  • Two other members completely understanding the scoring model to rank assets (and it is very likely only they will for the rest of eternity),
  • The remaining 1 member knowing where she sourced all the literature from. Also, her conference notes have been securely mummified in ‘Hotshot_Conference_X_2017.docx’ that almost no one will ever bother to look up 6 months from now.

One can observe a pattern here. Despite all the collective effort invested in creating knowledge, most of it is not captured in a manner that can be largely reused by other teams or even the same team in the future.

4. We make mistakes

Lastly, I would admit some CI activities can be tedious and prone to human error. Plain and simple. Reading and classifying 500 publication abstracts for literature analysis can be painful, and human judgement is inconsistent. I would not even remember how I classified the 10th abstract when classifying the 490th. Wading through badly formatted spreadsheets to determine the relevance of data and clean it up can be eye-wateringly cumbersome. It is no wonder that issues cropping up from minor lapses of attention to detail at such steps often rear their ugly heads at subsequent stages of analysis.

 Advances in data and text analytics, particularly semantic enrichment, have resulted in tools that can turbo-charge the performance of CI analysts. Don’t believe me? Here’s a comparison of my execution of text analytics on content from PubMed in two different projects:

In the interest of full disclosure:

 1. Using a full-service data lake typically has a learning curve. The focus on user experience has meant, however, that learners now ascend the curve on motorbikes thereby becoming productive with the tool in a matter of days. Getting full throttle on an analytical engine can take some effort in terms of optimizing configuration. However, the benefits far outweigh the costs of learning.

 2. The sample size of 1 CI analyst (yours truly) is not representative of the entire community. The quality of the tool being used and the use case are important influencers of outcome (having said that, we often base our buying decisions for everything from detergents to pharmaceuticals on advertisements with a sample size of 1!).

Have you had experiences with automation of CI tasks? Have you used tools for normalizing CI data from diverse structured and unstructured sources? What have been your pain points? What have you liked? What kind of impediments have you come across while trying out advanced technologies for adding sophistication to CI tasks? Leave your comments below. Let’s get a conversation going.

To learn more about Semantic Technology's impact on CI, watch our on-demand webinar "CI Informatics: a Deeper, More Comprehensive Approach to Competitive Intelligence Using Semantic Technology".


Read Part 2 of this blogpost

This blogpost was originally posted at LinkedIn.

Topics: Clinical Trials, Text Analytics, Data Lake, Analytics, Unstructured Data, Competitive Intelligence