Machine Learning and Intelligent Forecasting: Beyond the Black Box

The black box. That mysterious widget in which myriad magic tricks reside. For some, it is a magical force that delivers exactly what you need, even if you don’t entirely understand how. For others, it is at best untrustworthy and at worst dangerous. And any solution that offers machine learning, artificial intelligence or predictive analytics is likely to fall into that “black box” arena.

What if we take the lid off the intelligent forecasting black box and investigate its contents? In our last blog post we talked about what intelligent forecasting is and some of its benefits over traditional forecasting models. Here, we’ll take a peek at what intelligent forecasting in life sciences looks like. If you’d like to rummage around in the black box a little more, there is much more detail in our white paper: Intelligent Forecasting for Pharmaceutical R&D.

If we break down intelligent forecasting to its component parts, it becomes a transparent methodology that generates accurate and comprehensive outputs that inform strategic decision making.

Near real-time data, coupled with a flexible and dynamic approach to forecasting, means the outputs are up to date and relevant to pharmaceutical companies’ current strategic direction, e.g. speciality care, multi-indication products, complex modalities and precision medicine.

Intelligent forecasting consists of a series of linked processes:

  1. Curation of a foundational dataset from high-confidence sources: This brings together many elements, including granular historical pipeline coverage, key R&D metrics and historical sales, product and company characteristics, competitive environment, and market news.
  2. Pre-forecast data processing: This includes standardisation and structuring to make data suitable for analysis. Events (clinical milestones, major news and outcomes) are coded, and subsets are then extracted to create training datasets.
  3. Development of training and test datasets: A balanced and unbiased subset of the data is used to train the machine learning model and forms the basis for estimating the predictability of each attribute. The test dataset is used to validate the accuracy of the machine learning model.
  4. Defining product and indication attributes: These are characteristics that define the product and the diseases it is intended to treat. They fall into four major categories:
    1. Product characteristics, e.g. MoA or target, clinical milestones
    2. Company characteristics, e.g. historical success rates, track record
    3. Unmet need within the indication, e.g. regulatory designations, success/failure rates
    4. Competition within the indication, e.g. number of products in development, approval order
  5. Attribute selection based on statistical analysis of predictability and attribute correlations: The predictability of each attribute is estimated from the training dataset and correlated with real-world events. This analysis defines which attributes should be included in the machine learning model, and the selection can be flexed depending on product type and indication.
  6. Intelligent forecasting methodology applied to real-time clinical and commercial events to predict product/indication-level outputs: Real-time data is fed into the machine learning models to generate product/indication-level probabilities of technical and regulatory success and/or forecasts of commercial opportunity.
  7. Forecast outputs are transparently produced and explained, quality assured and tested against industry benchmarks to ensure consistency: An audit of drivers can also be performed to identify which attributes are used to predict success rates or commercial opportunities. Customisation is also possible, to generate real-time predictions based on scenarios.
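
To make steps 3 and 5 a little more concrete, here is a minimal Python sketch of the general idea: split coded historical records into training and test sets, then keep only the attributes whose correlation with the historical outcome clears a threshold. Everything in it is an assumption made for illustration (the function names, the attribute names `designation` and `first_in_class`, the 0.1 threshold, the made-up records and the use of a simple Pearson correlation). It is not how Evaluate Omnium is implemented, and the final helper is a plain historical success rate standing in for a real machine learning model.

```python
import random
from statistics import mean, pstdev


def stratified_split(records, test_fraction=0.25, seed=0):
    """Split records into train/test sets, preserving the mix of outcomes in each."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):  # 0 = failed, 1 = succeeded (hypothetical coding)
        group = [r for r in records if r["outcome"] == label]
        rng.shuffle(group)
        cut = int(len(group) * test_fraction)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def select_attributes(train, candidates, threshold=0.1):
    """Keep attributes whose absolute correlation with the outcome meets the threshold."""
    outcomes = [r["outcome"] for r in train]
    scores = {a: pearson([r[a] for r in train], outcomes) for a in candidates}
    selected = [a for a in candidates if abs(scores[a]) >= threshold]
    return selected, scores


def empirical_success_rate(records, profile):
    """Share of historical records matching `profile` that succeeded (None if no match)."""
    matches = [r["outcome"] for r in records
               if all(r[a] == v for a, v in profile.items())]
    return sum(matches) / len(matches) if matches else None


if __name__ == "__main__":
    # Made-up records purely for illustration; not real pipeline data.
    records = [
        {"designation": i % 2,
         "first_in_class": (i // 2) % 2,
         "outcome": 1 if (i % 2 == 1 and i % 7 != 0) else 0}
        for i in range(40)
    ]
    train, test = stratified_split(records)
    selected, scores = select_attributes(train, ["designation", "first_in_class"])
    print("selected attributes:", selected)
    print("correlation scores:", scores)
    print("held-out success rate with designation:",
          empirical_success_rate(test, {"designation": 1}))
```

Because the correlation scores are returned alongside the selected attributes, the sketch also hints at the audit of drivers mentioned in step 7: it is possible to see which attributes were kept and how strongly each one tracked the historical outcome.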

Our own intelligent forecasting solution, Evaluate Omnium, uses machine learning to analyse historical datasets to identify signals of clinical success for products at all stages of the pipeline. It is capable of delivering commercially valuable insights into sales estimates for development assets often 12 to 15 years in advance of actual peak sales occurring.

That’s a bit of a whistle-stop tour of the black box. There’s loads more information, not only about how it works but also about what it means for your business, in the full white paper. And, of course, we’re always happy to share more if you have questions. Just let us know!


Karthik Subramanian

VP, Product Strategy & Management

