April 8, 2019 | Artificial intelligence has the potential to stimulate and streamline drug discovery and development by increasing our understanding of complex biology, guiding drug design, and assisting with other, more mundane elements of pharmaceutical R&D and regulatory affairs, says Morten Sogaard, VP and head of Target Sciences at Pfizer.
On behalf of Bio-IT World, Kaitlin Kelleher spoke with Sogaard about AI and drug development. With more than 20 years of pharma industry experience, Sogaard provides insight into the way the industry is responding to this influx of AI.
Editor’s note: Kaitlin Kelleher, a Conference Producer at Cambridge Healthtech Institute, is planning tracks dedicated to Pharmaceutical R&D Informatics and AI in Pharma & Biotech at the upcoming Bio-IT World Conference & Expo in Boston, April 16-18. Sogaard will be speaking on the programs, discussing how AI is disrupting drug discovery and development for the better. Their conversation has been edited for length and clarity.
Bio-IT World: Can you speak to some of the hype surrounding AI in drug development?
Morten Sogaard: In some sense, when you ask about the hype, the significant components of AI are certainly not new. We’ve been doing machine learning and predictive analytics in pharma for the last couple of decades. What is new, I would say, is the explosion in data and, in particular, some key technological breakthroughs. One example is the famous Google cat and dog face recognition work from back in 2012. It illustrated that when you have a huge amount of data, you can apply automated machine learning algorithms to identify one type of object, such as a cat or a dog. Once you can do that, it becomes much easier to identify a tree, a cell, substructures within cells, or whole human bodies. This led to an explosion in how we could assess and analyze image data in an automated fashion.
Where is AI advancing, especially in pharmaceutical research?
There are other mature data types: certainly text and voice, and perhaps video. Then there are data types where we’re not quite at the same level yet, and those include some of the data types we work with in pharmaceutical research: chemical structures, annotations, and descriptions. There is also the question of how you handle genetic, transcriptomic, and other ’omic data, as well as more complex physiological data. Those are areas where some work is still needed. What is really needed to move forward on these newer, more challenging data types is high-quality data, and lots of it.
This is what allowed the field to crack the cat image problem. Once you’ve done the analysis with abundant high-quality data, you can go on to tackle new problems with much less data, and I think the same will be the case for some of these new data types. I think we will start to see things like common genetic variations linked to phenotypic data; that’s an area where we now have millions of individuals with phenotypic data, so I think it’s starting to mature. Other areas, like transcriptomic data, are still a work in progress, particularly as we venture into single-cell sequencing.
Where can AI and machine learning be most helpful in drug development?
I would probably put it into three main buckets, based on what you can do in the near term and often what is driving significant return on investment for the pharma company: 1) mature data types, 2) drug design, and 3) biological understanding.
For a company such as Pfizer, I think the focus is bucket 1, the more mature data types: things like automated document analysis and generation, e.g., for regulatory filings or safety assessments. We actually have an AI pilot that we are quite excited about. In pharmacovigilance, we have thousands of people manually curating the safety reports that come in on Pfizer medicines, and you typically have to sort them into three categories: A) not relevant at all; B) unsure, meaning there could be a safety issue and we have to look into it carefully; or C) a seemingly serious issue that we need to address aggressively.
The vision and hope is that AI can help us automatically sort those reports into one of those three categories. Of course, you have to be very careful because this is a regulated space. You want to make sure you don’t harm patients, so you want to proceed cautiously. But if fully implemented, that could probably save us hundreds of millions of dollars per year.
We’re also doing quite a lot of work on image analysis, automating pathology images, both from animal studies as well as clinical trials. And increasingly, we are analyzing those images in terms of understanding biological signals, what happens in the context of specific pathways in cells and in tissues; that’s something that we are actually quite heavily engaged in, as I’m sure most other pharmas are.
And what about bucket 2, drug design?
Bucket 2 focuses on drug design, both small molecules and large molecules. How do you predict the binding of a small molecule to its receptor, for example? Predictive modeling and in silico docking have certainly been around for a few decades, so the question here is where we see a compelling contribution from AI. Personally, I’m not sure we have really seen meaningful differentiation yet, but I think there are a number of promising startups.
Some other areas, such as automated deconstruction and analysis of synthetic pathways for making small molecules, are becoming more mature. There’s also been quite good progress in predicting stable salt forms, which can be laborious and is sometimes actually a bottleneck for small molecule projects.
Large molecules are of interest as well. The issue there is that for these much larger molecules, with many more degrees of freedom, we need a lot more data if we really want to address, for example, the binding and structure of antibodies and other biologics. I think that will require a bit more progress. On the other hand, there’s been some fantastic recent work on ab initio modeling of protein structure and folding from the likes of David Baker at UW, Google DeepMind in the UK, and others.
Can you give us insight into the third bucket?
The third bucket, I think, is really the “big kahuna” that will ultimately have the biggest impact in terms of innovation and the bottom line for pharma, but that impact is much further out. This bucket addresses the underpinnings of biological mechanisms, both in the ground state and in the disease state: really understanding disease biology and how a disease develops and progresses. That’s critical for us, because if we don’t get it right, we will in many cases develop the wrong medicines. If you look at Phase II attrition, studies from Pfizer and a number of other companies show that the availability of mechanistic biomarkers linking the drug target to downstream biological mechanisms is the main driver, or predictor, of Phase II attrition. Human genetic understanding, and the confidence in the rationale it provides, is another contributing factor. I think this is an absolutely critical area for AI. Even today, probably 50-60% of Phase II studies fail because of a lack of efficacy, typically because we don’t understand the mechanism as well as we thought we did. That also relates to understanding how you modulate the drug target, which is typically a protein: what sort of affinity do you need, what sort of PK, and so on.
I think that’s an important area for AI. As I mentioned, linking common variants to phenotypes in large genetic databases has seen great progress and is getting more mature. If you do a large genome-wide association study (GWAS), the results seem to be quite stable at this point, and there are some interesting and exciting developments in obtaining deeper exome or whole-genome data sets, such as the UK Biobank. Pfizer, together with five other companies, has participated in a consortium to sequence the exomes of half a million individuals in the UK Biobank.
That should give much deeper information on the genetics and really be a groundbreaking resource. I’d also point to transcriptomics as an area with a lot of exciting developments in linking clinical outcomes and physiology to mechanism at the individual cell level. We have a number of collaborations in this space to better understand the effect of drugs, but also to compare disease versus normal states: basically, perturbations of different types of immune cells, studied using transcriptomics. A number of startups are also doing quite exciting work, for example using knowledge graphs to correlate different pieces of data and enable insights with machine learning algorithms.
How does pharma decide when to pursue these more advanced buckets?
Which data types are mature, and how that affects where you want to go in the short term, is an important consideration. I think new algorithms advance your ability to innovate there. On the other hand, I’d say there are a lot of open-source capabilities and tools out there, such as TensorFlow from Google, and others. I think the biggest challenge, the biggest need for us, is really high-quality data sets. And then, of course, any company needs a few truly brilliant data scientists in order to execute; it doesn’t necessarily have to be a whole army.
Another thing I should mention is blockchain. That’s certainly an interesting, exciting development in pharma. It’s useful for things like anti-counterfeiting and supply chain management, and I think that’s really critical. It could have promise in other areas, such as clinical data management, but supply chain is where it has immediate promise. Going back to bucket two, I forgot to mention that ten or more years out, technologies like quantum computing could really disrupt how you analyze the structure and function of both small and large molecules.
It sounds like the biggest barrier to really bringing AI to fruition is quality data sets. Can you talk a little bit about the biggest barrier to getting these quality data sets?
They’re usually expensive to generate. For example, what you’d really like to have is high-quality, longitudinal clinical data. For data from the regular healthcare system on individuals, I’d probably point to the Nordic countries and the UK as places where you have such data. At the end of the day, you would like to have whole-genome sequencing, high-throughput plasma protein profiles, and imaging and transcriptomic analysis of single cells. For specific disease cohorts, you would like single-cell sequencing data to really understand the mechanism better. And not to forget detailed physiological and phenotypic measures, with increasing use of digital biomarkers.
I think we know how to do this. A lot of it is really, I wouldn’t say engineering, because it’s biology, but I would contend that we basically know how to do it, and the key question is money. I think we are making solid progress. Some of the funding is coming from public sources, such as the NIH and the Human Cell Atlas, and there is the UK Biobank initiative that six pharma companies have joined; that effort alone is $200M. These should help us. It’s a good project with the potential for substantial impact, because the data will also get into the hands of academics, who could leverage it in a very reasonable time frame. Then, of course, in our clinical trials we would ideally want to comprehensively profile all patients, and that comes up against the same constraint as any investment: there’s always an opportunity cost, because to make one investment, there’s something else you can’t do. So, for example, if you comprehensively profile all patients in all clinical trials, maybe you’ll have to cut your clinical portfolio by 20% or more. You have to think about these trade-offs in a really targeted manner: focus on the biomarkers and kinds of measurements that will really have an impact in making the project more successful, or think about selecting the right patient population.
The short answer is it’s really about money. I would contend that for the most part we know how to do it.
What is your prediction for where AI is going and what it will accomplish in the next 5 years?
The internet of things and 5G have the potential to transform our societies, with massive amounts of data available in real time and processed with the help of AI. It will be very exciting to see how data is applied at scale for more operational and transactional purposes. I think digital biomarkers are an area where we still need to see robust uptake of the first digital diagnostics. It’s a very exciting area and will mostly be based on sensor data initially. In the biological space, I think we will have a huge explosion in data in the next 5 years: genetics, omics, deep phenotyping. We should be in a much better place for understanding human biology and disease, and this might be the biggest transformational change.