Demystifying AI: Natural Language Processing (Part One)

August 4, 2022

Machine learning and artificial intelligence are increasingly used buzzwords in the pharma and biotech space. But what’s hiding in that black box?

Artificial intelligence (AI) seems to have found its way into almost every aspect of our lives. Depending on the application – or indeed on your point of view – this is either a brilliant innovation that saves time, resource and cost, or a terrifying step towards a dystopian, cyborg-dominated future.

The truth is, of course, it’s somewhere in between. But, understandably, as AI creeps into healthcare, people are inclined to err on the side of caution. However, the role of AI (a term I dislike, but that’s for another day) is starting to become clear and there are huge potential benefits to applying certain types of AI to the worlds of pharma, healthcare and life sciences.

One of the most well-established and effective AI tools is Natural Language Processing (NLP), and it’s particularly relevant to healthcare. NLP is a process by which huge quantities of text can be read and understood by computers in a fraction of the time it would take a human to complete the same task. It has thousands of applications, but when you consider that around 80% of healthcare data is unstructured – doctors’ records, clinical notes, and so on – there’s a clear use case.

We’ll look in more detail at some of the applications in later blogs in this series, but I thought it would be useful to lift the lid on the black box and share just a little of how NLP works. It’s a technology we’re using a lot at Evaluate, so I’m keen to demystify it somewhat.

NLP combines computational linguistics – rule-based modelling of human language – with statistical, machine learning, and deep learning models. There are five elements – effectively resolutions or zooms of understanding of the text. We need these different levels because one of the key challenges of NLP is that the meaning of a text is often not well localised – it’s spread out, so the algorithms need to span all these resolutions, from fine-grained words to large documents.

So, what are these resolutions?

1. Lexical: analysis of word structure
2. Syntactic: grammatical analysis of the words (are they nouns, verbs etc.?)
3. Semantic: meaning of word in the context of the sentence
4. Discourse: meaning of any sentence depends on sentences around it
5. Pragmatic: looking at the entire text to get the meaning from it
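
To make the first two resolutions a little more concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how a production NLP system (or Evaluate's) works: the `TOY_POS_LEXICON` dictionary, the example sentence and the function names are all invented for illustration, and real systems learn their tags from large annotated corpora rather than looking them up in a hand-written table.

```python
import re

# Hypothetical hand-written lexicon, for illustration only.
# Real taggers learn part-of-speech information from annotated data.
TOY_POS_LEXICON = {
    "the": "DET",
    "patient": "NOUN",
    "was": "VERB",
    "prescribed": "VERB",
    "aspirin": "NOUN",
    "daily": "ADV",
}


def tokenize(text):
    """Lexical level (toy): split raw text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag(tokens):
    """Syntactic level (toy): look up a part of speech for each token.

    Tokens missing from the lexicon are marked "UNK" (unknown).
    """
    return [(tok, TOY_POS_LEXICON.get(tok, "UNK")) for tok in tokens]


sentence = "The patient was prescribed aspirin daily."
tokens = tokenize(sentence)
tagged = tag(tokens)

print(tokens)
print(tagged)
```

The higher resolutions (semantic, discourse and pragmatic) cannot be captured by a lookup table like this, which is where the statistical and deep learning models come in.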

A typical use case focuses on levels 3 (e.g. data mining), 4 (e.g. generating knowledge graphs) and 5 (e.g. classifying documents).

NLP can be applied to any type of text on any topic, from insurance to market research to life sciences. But each instance of the NLP technology needs to be trained to understand the type of content it is dealing with. Why? Well, as humans we know that the language of legal contracts is a world away from that of tweets, which in turn is nothing like that of biomedical research. The words are different, the style of language is different, and this means that the models need to learn the vocabulary and the statistics of how the words are used in each case. That is a slow step that needs a lot of data, but once it is done, we have a general understanding of the language which we can fine-tune for specific purposes (such as detecting new drug names) with far less data. Single-purpose models can then be created for many use cases from the same base model.
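
The point about vocabulary differing between domains can be illustrated with a small Python sketch. This is a deliberately simplified stand-in for real model training: it only counts words, the two tiny "corpora" are made up for the example, and the helper names (`build_vocabulary`, `out_of_vocabulary_rate`) are hypothetical. It builds a vocabulary from legal-style text and then measures what fraction of words in a new text that vocabulary has never seen.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(corpus):
    """Count how often each word appears across a list of documents."""
    counts = Counter()
    for doc in corpus:
        counts.update(tokenize(doc))
    return counts


def out_of_vocabulary_rate(vocabulary, text):
    """Return the fraction of tokens in `text` not present in `vocabulary`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    unknown = sum(1 for tok in tokens if tok not in vocabulary)
    return unknown / len(tokens)


# Invented example sentences standing in for a legal training corpus.
legal_corpus = [
    "The licensee shall indemnify the licensor against any claim.",
    "This agreement shall terminate upon written notice.",
]
legal_vocab = build_vocabulary(legal_corpus)

legal_text = "The licensor shall terminate this agreement."
biomedical_text = "The inhibitor reduced tumour growth in mice."

legal_rate = out_of_vocabulary_rate(legal_vocab, legal_text)
biomedical_rate = out_of_vocabulary_rate(legal_vocab, biomedical_text)

print(f"Unseen words in legal text:      {legal_rate:.0%}")
print(f"Unseen words in biomedical text: {biomedical_rate:.0%}")
```

A vocabulary learned from contracts covers the new legal sentence but almost none of the biomedical one, which is a crude picture of why a model has to be trained on the kind of content it will be asked to read.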

In the second part of this blog, we’ll look at the key applications of machine learning, including how we at Evaluate are using it to ensure that we are providing the most up-to-date, accurate insights to our clients.

Geoff Cunningham

Head of Data Science
