Demystifying AI: Natural Language Processing (Part One)

Machine learning and artificial intelligence are increasingly used buzzwords in the pharma and biotech space. But what's hiding in that black box?

Artificial intelligence (AI) seems to have found its way into almost every aspect of our lives. Depending on the application – or indeed on your point of view – it is anything from a brilliant innovation that saves time, resources and cost to a terrifying step towards a dystopian, cyborg-dominated future.

The truth, of course, is somewhere in between. But, understandably, as AI creeps into healthcare, people are inclined to err on the side of caution. However, the role of AI (a term I dislike, but that's for another day) is starting to become clear, and there are huge potential benefits to applying certain types of AI to the worlds of pharma, healthcare and life sciences.

One of the most well-established and effective AI tools is Natural Language Processing (NLP), and it's particularly relevant to healthcare. NLP is a process by which huge quantities of text can be read and understood by computers in a fraction of the time it would take a human to complete the same task. It has thousands of applications, but when you consider that around 80% of healthcare data is unstructured – doctors' records, clinical notes and so on – there's a clear use case.

We’ll look in more detail at some of the applications in later blogs in this series, but I thought it would be useful to lift the lid on the black box and share just a little of how NLP works. It’s a technology we’re using a lot at Evaluate, so I’m keen to demystify it somewhat.

NLP combines computational linguistics – rule-based modelling of human language – with statistical, machine learning and deep learning models. It operates at several levels – effectively resolutions, or zooms, of understanding of the text. We need these different levels because one of the key challenges of NLP is that the meaning of a text is often not well localised – it’s spread out, so the algorithms need to span all these resolutions, from fine-grained words to whole documents.

So, what are these resolutions?

  1. Lexical: analysis of word structure
  2. Syntactic: grammatical analysis of the words (are they nouns, verbs, etc.?)
  3. Semantic: the meaning of a word in the context of its sentence
  4. Discourse: how the meaning of a sentence depends on the sentences around it
  5. Pragmatic: looking at the entire text to derive meaning from it
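To make the first two levels concrete, here is a minimal sketch in Python. The tiny hand-built lexicon and the example sentence are invented for illustration – a real NLP system would use a trained part-of-speech tagger over a learned vocabulary, not a dictionary lookup.

```python
import re

# Toy lexicon for the syntactic level. This is purely illustrative – a real
# system learns these tags statistically rather than looking them up.
POS_LEXICON = {
    "the": "DET", "a": "DET",
    "drug": "NOUN", "patient": "NOUN", "trial": "NOUN",
    "reduces": "VERB", "shows": "VERB",
    "significantly": "ADV",
}

def lexical_analysis(text):
    """Level 1 (lexical): split raw text into word tokens."""
    return re.findall(r"[A-Za-z]+", text.lower())

def syntactic_analysis(tokens):
    """Level 2 (syntactic): attach a part-of-speech tag to each token."""
    return [(tok, POS_LEXICON.get(tok, "UNK")) for tok in tokens]

tokens = lexical_analysis("The drug significantly reduces symptoms.")
print(syntactic_analysis(tokens))
```

Note the `UNK` tag for "symptoms": even at these low resolutions, coverage of the vocabulary is the limiting factor, which is why domain training (discussed below) matters so much.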

For typical use cases, you will be focusing on levels 3 (e.g. data mining), 4 (e.g. generating knowledge graphs) and 5 (e.g. classifying documents). NLP can be applied to any type of text about any topic, from insurance to market research to life sciences, but each instance of the technology needs to be trained to understand the type of content it is dealing with. Why? Well, as humans we know that legal contracts are a whole different language from tweets, which in turn differ from biomedical research. The words are different and the style of language is different, which means the models need to learn the vocabulary and the statistics of how words are used in each case. That is a slow step that needs a lot of data, but once it is done we have a general understanding of the language, which we can fine-tune for specific purposes (such as detecting new drug names) with far less data. Many single-purpose models can then be created from the same base model.
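The level 5 use case – classifying documents – can be sketched with a toy bag-of-words model. The labelled training snippets below are invented for illustration; a production system would fine-tune a large pretrained language model on thousands of real domain documents rather than counting words over four examples.

```python
from collections import Counter

# Invented training data: a few labelled snippets per category.
TRAINING_DOCS = [
    ("phase iii trial shows efficacy in patients", "clinical"),
    ("the drug reduced tumour size in the cohort", "clinical"),
    ("quarterly revenue grew on strong product sales", "financial"),
    ("the company raised funding at a higher valuation", "financial"),
]

def train(docs):
    """Build a word-frequency profile for each label."""
    counts = {}
    for text, label in docs:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def classify(text, counts):
    """Pick the label whose vocabulary overlaps the document most."""
    words = text.split()
    return max(counts, key=lambda lbl: sum(counts[lbl][w] for w in words))

model = train(TRAINING_DOCS)
print(classify("trial patients showed efficacy", model))  # prints "clinical"
```

The design choice here mirrors the point above: the slow, data-hungry step is building the word statistics (`train`), while applying them to a new document (`classify`) is cheap and can be reused across many single-purpose tasks.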

In the second part of this blog, we’ll look at the key applications of machine learning, including how we at Evaluate are using it to ensure that we are providing the most up-to-date, accurate insights to our clients.
