Blog

Demystifying AI: Natural Language Processing (Part One)

Geoff Cunningham

Head of Data Science

Published

August 4, 2022

Machine learning and artificial intelligence are increasingly used buzzwords in the pharma and biotech space. But what’s hiding in that black box?

Artificial intelligence (AI) seems to have found its way into almost every aspect of our lives. Depending on the application, or indeed on your point of view, this is anything from a brilliant innovation that saves time, resource and cost to a terrifying step towards a dystopian, cyborg-dominated future.

The truth is, of course, it’s somewhere in between. But, understandably, as AI creeps into healthcare, people are inclined to err on the side of caution. However, the role of AI (a term I dislike, but that’s for another day) is starting to become clear and there are huge potential benefits to applying certain types of AI to the worlds of pharma, healthcare and life sciences.

One of the most well-established and effective AI tools is Natural Language Processing (NLP). This is a process by which huge quantities of text can be effectively read and understood by computers in a fraction of the time it would take a human to complete the same task. It has thousands of applications, and it’s particularly relevant to healthcare: when you consider that around 80% of healthcare data is unstructured (doctors’ records, clinical notes, and so on), there’s a clear use case.

We’ll look in more detail at some of the applications in later blogs in this series, but I thought it would be useful to lift the lid on the black box and share just a little of how NLP works. It’s a technology we’re using a lot at Evaluate, so I’m keen to demystify it somewhat.

NLP combines computational linguistics (rule-based modelling of human language) with statistical, machine learning and deep learning models. It works at five levels, which are effectively resolutions, or zooms, of understanding of the text. We need these different levels because one of the key challenges of NLP is that the meaning of a text is often not well localised: it’s spread out, so the algorithms need to span all these resolutions, from fine-grained words to large documents.

So, what are these resolutions?

1. Lexical: analysis of word structure
2. Syntactic: grammatical analysis of the words (are they nouns, verbs etc.?)
3. Semantic: meaning of a word in the context of the sentence
4. Discourse: meaning of a sentence, which depends on the sentences around it
5. Pragmatic: looking at the entire text to get the meaning from it
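
To make the idea of "zooming" a little more concrete, here is a minimal Python sketch that looks at the same piece of text at three resolutions: individual words, sentences, and the whole document. It is only a toy. The function names and the example clinical note are made up for illustration, and simple pattern matching like this does none of the real syntactic or semantic analysis described above.

```python
import re
from collections import Counter


def split_sentences(text):
    """Sentence-level view: split the text after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Word-level view: lower-cased tokens, keeping hyphenated terms together."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def document_profile(text):
    """Document-level view: word counts across the entire text."""
    counts = Counter()
    for sentence in split_sentences(text):
        counts.update(tokenize(sentence))
    return counts


# Hypothetical example note, invented for this sketch.
note = "Patient started on drug-x 10 mg. No adverse events reported. Continue drug-x."

sentences = split_sentences(note)          # sentence resolution
tokens = [tokenize(s) for s in sentences]  # word resolution
profile = document_profile(note)           # whole-document resolution
```

Notice that the fact "drug-x is mentioned twice" only shows up at the document level; no single sentence contains it. That is a small version of the "meaning is spread out" problem.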

For a typical use case, you will be focusing on levels 3 (semantic, e.g. data mining), 4 (discourse, e.g. generating knowledge graphs) and 5 (pragmatic, e.g. classifying documents). NLP can be applied to any type of text about any topic, from insurance to market research to life sciences. But each instance of the NLP technology needs to be trained to understand the type of content it is dealing with.

Why? Well, as humans we know that legal contracts are a whole different language compared to tweets, or to biomedical research. The words are different, the style of language is different, and this means that the models need to learn the vocabulary and statistics of how the words are used in each case. That is a slow step that needs a lot of data, but once it is done, we have a general understanding of the language which we can fine-tune for specific purposes (such as detecting new drug names) with far less data. Single-purpose models can then be created for lots of use cases from the same base model.
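
The point about each type of content having its own vocabulary and word statistics can be shown with a very small Python sketch. It assumes two tiny, made-up sets of sentences standing in for legal and biomedical text, and it simply counts words. Real language models learn far richer statistics from far more data, so treat this as an illustration of the idea rather than of how any particular model is trained.

```python
from collections import Counter


def word_frequencies(texts):
    """Relative frequency of each lower-cased word across a list of texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical snippets, invented for this sketch.
legal = [
    "the party shall indemnify the other party",
    "the agreement shall terminate",
]
biomedical = [
    "the drug inhibits the kinase",
    "the trial met the primary endpoint",
]

legal_freq = word_frequencies(legal)
bio_freq = word_frequencies(biomedical)

# Words seen in one domain but never in the other, and words common to both.
legal_only = sorted(set(legal_freq) - set(bio_freq))
bio_only = sorted(set(bio_freq) - set(legal_freq))
shared = sorted(set(legal_freq) & set(bio_freq))
```

Even in this toy case, the only word the two domains share is "the". A model that has only ever seen one domain has no statistics at all for most of the words in the other, which is why the base model has to be trained on the right kind of content first.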

In the second part of this blog, we’ll look at the key applications of machine learning, including how we at Evaluate are using it to ensure that we are providing the most up-to-date, accurate insights to our clients.
