What is modern in silico drug discovery?
Why use AI in drug discovery?
How is AI being used in drug discovery and development?
The future of AI drug discovery
AI drug discovery with LENSai Integrated Intelligence Platform

AI in Drug Discovery

Artificial Intelligence (AI) technologies are currently one of the prime movers in the global pharmaceutical industry’s transition towards a data-centric approach to computational drug discovery. These technologies are already making significant contributions across all stages of the drug discovery pipeline, helping to accelerate development, reduce costs, and increase the likelihood of successful outcomes.

The focus of AI in drug discovery has also expanded beyond small-molecule modalities to large-molecule therapeutics (biologics) such as antibodies, and to RNA-based therapies. As the technology continues to evolve, AI, complemented by experimental approaches, has become a central component of early-stage in silico drug discovery.

In this article, we will take a closer look at some potential applications of AI in modern in silico drug discovery. 

 

1. What is modern in silico drug discovery?

Modern in silico drug discovery is powered by a combination of conventional Computer-Aided Drug Design (CADD) techniques, the predictive capabilities of AI technologies, and knowledge derived from experimental research. This blended approach to early-stage drug discovery enables researchers to harness the power of in silico models to reduce the cost and time required to prioritize the most promising candidates that can then be validated through traditional experimental (wet lab) approaches.

 

2. Why use AI in drug discovery?

A 2023 report on the potential of AI in drug discovery presented a high-level value model to assess the comparative impact of AI technologies, relative to typical experimental drug discovery, across the key phases of the small-molecule discovery value chain: target identification and validation, target to hit, hit to lead, lead optimization, and preclinical. The model, which assessed three scenarios (new molecule discovery for poorly understood targets, new molecule discovery for well-understood targets, and repurposing existing molecules for new targets), revealed significant time, cost, and performance improvements across all phases of early-stage drug discovery.

In the target identification and validation phase, AI can facilitate quicker and more precise elucidation of structure/function relationships to support hypothesis generation, enable systematic prioritization of targets for validation, and help uncover potential disease-target relationships. In the target-to-hit stage, specifically for small molecules, AI-driven in silico models significantly expand the chemical space available for exploration, increasing the chances of finding hits against targets of interest and accelerating the discovery of novel target-molecule relationships. In the hit-to-lead and lead optimization phases, the predictive capabilities of AI enable more accurate forecasting of compound properties and help reduce the number of design-make-test cycles required to find and optimize leads. Finally, preclinical AI applications include more data-driven assessments of toxicity, pharmacokinetics, and pharmacodynamics that support more effective prioritization of compounds for testing and deliver significant time and cost efficiencies.

 

Core challenges in drug discovery


 

AI-driven approaches are also helping address several pain points in early-stage antibody design and optimization. Key challenges include identifying relevant or superior epitopes, selecting the most suitable candidates from experimental outputs and human repertoires, and characterizing epitope/paratope interactions.

The main advantages of these new AI models are their ability to find patterns in their training data, to encode these patterns in an abstract mathematical representation, and to find optimal solutions through “simple” algebraic operations. They do this much faster than an expert human could, or than a machine could using physics-based simulation (such as molecular dynamics), and with a high degree of automation.

On the one hand, antibody-specific language models are becoming critical for coping with the vast amounts of data generated by next-generation antibody sequencing technologies, and they power a wide range of applications, including de novo sequence generation and the prediction of properties such as binding sites, humanness, and thermostability. For example, a language model trained specifically to infer thermostability can return results for a large number of sequences within a few minutes. These models are typically used at the early stage of the discovery funnel.

On the other hand, significant progress has been made with deep learning models trained on extensive protein (and antibody) structure datasets for tasks including prediction of antibody CDR loops (including the CDR-H3 loop), epitope prediction (linear and conformational), paratope prediction, and prediction of complex conformations. For instance, deep learning models such as AlphaFold, IgFold, and ABodyBuilder supersede traditional homology modelling methods for variable-domain structural modelling, bringing in silico prediction of antibody structure closer to the accuracy of experimental methods such as X-ray crystallography. This improvement also translates into more accurate predictions of specific antibody-antigen complex conformations, which are critical for modelling and engineering their interactions at the atomic level. AI models that predict the bound conformation of antibody-antigen complexes are typically more demanding in terms of computational resources, so large-scale virtual screening (as performed for small molecules) is not feasible in this case. These models are therefore more useful further down the discovery funnel, for instance when optimizing a lead candidate.
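
A common way to quantify how close a predicted structure comes to an experimental one is the root-mean-square deviation (RMSD) of matched atoms after optimal superposition. The sketch below uses the Kabsch algorithm with NumPy. It assumes that both coordinate arrays list the same atoms (for example, Cα atoms of a variable domain) in the same order, and the coordinates are random stand-ins. Antibody benchmarks often superpose on the framework and then report RMSD over a single loop such as CDR-H3, which this simplified version does not do.

```python
import numpy as np


def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal rigid superposition."""
    # Remove translation by centering both structures on their centroids.
    p = pred - pred.mean(axis=0)
    q = ref - ref.mean(axis=0)
    # The SVD of the covariance matrix yields the optimal rotation (Kabsch algorithm).
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = p @ rot.T
    return float(np.sqrt(np.mean(np.sum((aligned - q) ** 2, axis=1))))


# Usage with hypothetical coordinates: a rotated and shifted copy should give RMSD close to 0.
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 3))
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
pred = ref @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(pred, ref))
```

Because the rigid-body part of the difference is removed first, the remaining RMSD reflects only genuine differences in shape.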

Deep generative architectures that integrate geometric graph neural networks (GNNs) with large-scale protein language models combine sequence and structure information. These architectures show the potential to significantly advance antibody sequence–structure co-design, enabling researchers to optimize the amino acid sequence and the 3D structure of an antibody simultaneously to achieve enhanced binding affinity, stability, and specificity.
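
The sketch below makes the idea of combining sequence and structure concrete. It shows a single, heavily simplified message-passing step over a residue graph. It assumes per-residue embeddings are already available from a protein language model (random arrays stand in for them here) and builds edges from Cα distances within a hypothetical 8 Å cutoff. Only pairwise distances enter the computation, so the output does not change when the structure is rotated or translated. This is not the architecture of any specific published co-design model. Those models typically add richer geometric features and use trained weights.

```python
import numpy as np


def geometric_message_pass(seq_emb, coords, w_self, w_msg, cutoff=8.0):
    """One distance-aware message-passing step combining sequence and structure.

    seq_emb: (N, D) per-residue embeddings (e.g. from a protein language model).
    coords:  (N, 3) C-alpha coordinates in angstroms.
    """
    # Pairwise distances are invariant to rotations and translations of the structure.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Connect residues closer than the cutoff, excluding self-edges.
    adj = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    # Closer neighbors get larger edge weights.
    edge_w = np.where(adj, np.exp(-dist / cutoff), 0.0)
    messages = edge_w @ seq_emb  # weighted sum of neighbor embeddings
    return np.tanh(seq_emb @ w_self + messages @ w_msg)


# Usage with random stand-ins for embeddings, coordinates and (untrained) weights.
rng = np.random.default_rng(1)
n_res, dim = 12, 16
seq_emb = rng.normal(size=(n_res, dim))
coords = rng.normal(scale=5.0, size=(n_res, 3))
w_self = rng.normal(scale=0.1, size=(dim, dim))
w_msg = rng.normal(scale=0.1, size=(dim, dim))
updated = geometric_message_pass(seq_emb, coords, w_self, w_msg)
```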

Finally, AI tools are helping accelerate and de-risk the whole antibody development process by creating seamless in silico workflows to design antibody sequences that have been humanized and optimized for affinity and developability.

AI in drug discovery, however, is not a monolithic system but rather a confluence of technologies that empower in silico drug development. The concept of AI itself is continuously evolving based on the emergence of a range of technological concepts such as machine learning (ML), deep learning (DL), artificial neural networks (ANN), reinforcement learning (RL), deep reinforcement learning (DRL), natural language processing (NLP), large language models (LLMs), etc.

 


 

In recent years, NLP and related AI-powered technologies have attracted considerable attention for their ability to unlock the value embedded in vast volumes of unstructured text in the life sciences industry. Approaches to NLP in drug discovery can be broadly classified as rules-based, ML-based, and hybrid.
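
The sketch below shows what the rules-based end of this spectrum looks like. It uses small hand-written dictionaries and a trigger-word pattern to extract (drug, relation, target) triples from a sentence. The dictionaries and the single "inhibits" rule are hypothetical and far smaller than the curated vocabularies that real systems rely on. An ML-based approach would instead learn such patterns from annotated text, and a hybrid approach would combine the two.

```python
import re

# Hypothetical, tiny dictionaries; production systems use curated vocabularies and ontologies.
DRUGS = {"imatinib", "gefitinib"}
TARGETS = {"ABL1", "EGFR", "KIT"}
# Hypothetical trigger pattern signalling an inhibition relationship.
INHIBITION_TRIGGER = re.compile(r"\b(?:inhibits?|inhibited|blocks?)\b", re.IGNORECASE)


def extract_inhibition(sentence: str) -> list[tuple[str, str, str]]:
    """Return (drug, 'inhibits', target) triples when a drug, a target and a trigger co-occur."""
    if not INHIBITION_TRIGGER.search(sentence):
        return []
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    drugs = [t.lower() for t in tokens if t.lower() in DRUGS]
    targets = [t for t in tokens if t in TARGETS]
    return [(d, "inhibits", t) for d in drugs for t in targets]


print(extract_inhibition("Imatinib inhibits ABL1 and KIT."))
# [('imatinib', 'inhibits', 'ABL1'), ('imatinib', 'inhibits', 'KIT')]
```

Rules like this are precise and transparent, but they miss phrasings that nobody wrote a rule for. This limitation is one motivation for ML-based and hybrid approaches.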

 

 

Currently, everyone is talking about LLMs and generative AI, with the focus squarely on broadening access to these transformative technologies by integrating them into bioinformatics platforms, pipelines, and workflows.

Finally, no conversation about data-centric AI applications would be complete without mentioning knowledge graphs (KGs). Knowledge graphs are at the epicenter of AI-driven drug discovery because they can integrate the diverse life sciences data, structured and unstructured, needed to build the ML models that drive decision-making, and because they provide context that augments AI/ML approaches. The NLP-based semantic data integration and representation capabilities of these graph models, combined with the scale of LLMs, will form the foundation for next-generation drug discovery.
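
At its simplest, a knowledge graph stores facts as (subject, predicate, object) triples. Multi-hop queries over those triples can surface hypotheses such as drug-disease links. The sketch below uses hypothetical entity names and two hypothetical relation types. Real drug discovery graphs hold far more triples, drawn from structured databases and NLP-extracted literature, and are usually queried with a graph database rather than with Python dictionaries.

```python
from collections import defaultdict

# Hypothetical triples for illustration only.
TRIPLES = [
    ("drug_A", "inhibits", "protein_X"),
    ("protein_X", "associated_with", "disease_Y"),
    ("drug_B", "inhibits", "protein_Z"),
    ("protein_Z", "associated_with", "disease_W"),
]


def build_graph(triples):
    """Adjacency list: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subj, pred, obj in triples:
        graph[subj].append((pred, obj))
    return graph


def drug_disease_paths(graph, drug):
    """Two-hop query: drug -inhibits-> target -associated_with-> disease."""
    paths = []
    for pred1, target in graph.get(drug, []):
        if pred1 != "inhibits":
            continue
        for pred2, disease in graph.get(target, []):
            if pred2 == "associated_with":
                paths.append((drug, target, disease))
    return paths


graph = build_graph(TRIPLES)
print(drug_disease_paths(graph, "drug_A"))  # [('drug_A', 'protein_X', 'disease_Y')]
```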

 

3. How is AI being used in drug discovery and development?

Accelerated drug design

Back in 2019, an article in Nature Biotechnology described a new deep generative model, generative tensorial reinforcement learning (GENTRL), that was used to design novel inhibitors of DDR1, a kinase target implicated in fibrosis and other diseases, in just 21 days. Of the six compounds selected for synthesis and testing, four were active in biochemical assays, two were validated in cell-based assays, and one lead candidate demonstrated favorable pharmacokinetics in mice.

Earlier this year, a team of researchers at MIT unveiled DiffDock, a diffusion generative model that reframes molecular docking as a generative modeling problem rather than a regression problem. This approach has the potential to accelerate the development of new drugs because it can dock ligands against computationally folded protein structures with higher precision and faster inference times than previous methods. The capabilities of generative diffusion models are also being integrated into antibody design, docking, and optimization pipelines to enhance antibody functionality and to enable antibody sequence–structure co-design with a focus on favorable developability attributes.

Today, DL methods have become a central component of many areas of drug discovery, including molecule generation, molecular property prediction, retrosynthesis, and reaction prediction, and have been key to accelerating the time-consuming and costly process of drug discovery. Though predominantly focused on ligand-based approaches, these techniques are now being applied to accelerate structure-based drug discovery by addressing key challenges such as polypharmacology by design, selectivity optimization, activity cliff prediction, and target deorphanization.

Generative AI frameworks, which build on DL algorithms, have expanded the data analysis, pattern identification, and prediction capabilities of classic AI systems into a new paradigm that uses a variety of inputs, including text, images, audio, video, and 3D models, to create entirely new outputs. The multimodal capabilities of generative AI to scale across diverse data types open up several new opportunities in early-stage research and discovery. Deep generative models, with their ability to generate novel chemical and biological structures with desired properties, are expanding the horizon for de novo drug design for both small and large molecules. Generative AI applications in de novo drug design include target-agnostic and target-aware molecule design, molecular conformation generation, protein and antibody representation learning, protein structure prediction, and the de novo design of proteins with desired functionalities.

Despite the growing adoption of these technologies in drug discovery, several challenges remain to be addressed, such as the need for large, high-quality training datasets, hallucinations, and ethical and regulatory considerations.

 


Antibody Discovery and Development with LENSai Integrated Intelligence Platform

 

Prediction of drug bioactivity

AI plays a crucial role in predicting drug bioactivity, the biological effect or response that a drug produces on a specific target or pathway. Bioactivity prediction is a critical step in drug discovery for identifying candidates with potential therapeutic effects and understanding their interactions with biological systems.

AI techniques help address several of the time- and cost-related challenges of classical approaches to bioactivity prediction and can predict the biological activity of both small and large molecules accurately and efficiently.

There are distinct approaches to AI-driven drug bioactivity prediction. At a foundational level, the ability of AI technologies to integrate diverse multi-omics data into a holistic molecular understanding of biology can significantly enhance bioactivity prediction. AI techniques have also paved the way for integrating chemoinformatics and bioinformatics data, chemical databases, biological assays, and clinical data, providing a comprehensive understanding of drug-target interactions and of how changes in chemical structure may affect bioactivity.

Deep learning techniques, such as graph neural networks, can be trained on diverse datasets, including chemical structures and biological activity profiles, to learn the patterns and correlations needed to predict bioactivity for new compounds. Machine learning paradigms such as reinforcement learning (RL) and deep reinforcement learning (DRL) have also been used to explore chemical space and identify compounds that are optimal in terms of pharmacokinetic properties and bioactivity.
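
The sketch below shows the core mechanics of a graph neural network for bioactivity prediction. Atoms are nodes and bonds are edges. Each layer averages features over neighboring atoms, and a sum over atoms produces one vector per molecule, which a final linear layer maps to an activity score. The one-hot atom features and the random, untrained weights are assumptions for illustration. A real model would learn its weights from measured activities and would use richer atom and bond features.

```python
import numpy as np


def gnn_score(adj: np.ndarray, atom_feats: np.ndarray, layer_weights, w_out) -> float:
    """Untrained message-passing forward pass: molecular graph -> scalar activity score."""
    # Add self-loops and row-normalize so each atom averages over itself and its neighbors.
    a_hat = adj + np.eye(adj.shape[0])
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)
    h = atom_feats
    for w in layer_weights:
        h = np.maximum(0.0, a_norm @ h @ w)  # aggregate, transform, ReLU
    graph_vec = h.sum(axis=0)  # sum readout: independent of atom ordering
    return float(graph_vec @ w_out)


# Heavy-atom graph of ethanol (C-C-O) with one-hot element features [C, O].
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
atom_feats = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
rng = np.random.default_rng(0)
layer_weights = [rng.normal(size=(2, 8)), rng.normal(size=(8, 8))]
w_out = rng.normal(size=8)
print(gnn_score(adj, atom_feats, layer_weights, w_out))
```

The sum readout is a deliberate design choice: a molecule has no canonical atom order, so renumbering its atoms must not change the prediction.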

Although DL models often outperform traditional ML approaches in property prediction tasks, the lack of transparency in their decision-making has shifted attention toward self-interpretable, explainable AI (XAI) models.
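
Self-interpretable models build explanations into their architecture. Occlusion is a simpler, post hoc alternative: remove one input feature at a time and measure how much the prediction changes. It is shown here only to illustrate what an explanation of a single prediction can look like. The "model" below is a hypothetical hand-written function over four fingerprint bits, standing in for a trained property predictor.

```python
import numpy as np


def occlusion_attribution(model, x: np.ndarray, baseline: float = 0.0) -> np.ndarray:
    """Change in model output when each feature is replaced by `baseline`."""
    reference = model(x)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        occluded = x.copy()
        occluded[i] = baseline
        # Positive score: the feature pushed the prediction up; negative: it pushed it down.
        scores[i] = reference - model(occluded)
    return scores


# Hypothetical nonlinear "model" over four fingerprint bits.
def toy_model(bits: np.ndarray) -> float:
    return float(np.tanh(2.0 * bits[0] + 0.5 * bits[2] - 1.0 * bits[3]))


x = np.array([1.0, 0.0, 1.0, 1.0])
print(occlusion_attribution(toy_model, x))
```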

 

Effective risk assessment: immunogenicity and developability

There are three broad computational approaches to addressing developability and immunogenicity risks in early-stage antibody drug development. The classical approach relies on standard sequence and structure-based drug design tools to re-engineer antibody hits derived from traditional hit discovery frameworks. The contemporary approach combines next-generation sequencing (NGS) and machine learning (ML) to engineer antibody libraries and minimize developability liabilities. The emerging approach leverages advanced DL and AI frameworks for protein structure prediction and de novo computational design to generate candidates that meet multiple criteria for potency and developability.

ML-based computational tools are increasingly used in antibody drug discovery to predict the developability and immunogenicity of antibody candidates from physicochemical properties, sequences, or structures. Both immunogenicity and developability screening require a blend of in silico and in vitro assays that are holistically predictive, starting with in silico assessments and progressing to in vitro/ex vivo assays as required.
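
A first-pass in silico developability check often applies simple sequence rules before any ML model is used. The sketch below scans a sequence for a few commonly flagged liability motifs: N-linked glycosylation sequons, deamidation-prone NG/NS sites, and isomerization-prone DG/DS sites. The motif list is simplified and non-exhaustive, and the example sequence is a hypothetical fragment. Flagged sites still need structural context, for example whether the site sits in an exposed CDR.

```python
import re

# Commonly screened sequence-liability motifs (a simplified, non-exhaustive list).
LIABILITY_MOTIFS = {
    "N-glycosylation": r"N[^P][ST]",
    "deamidation": r"N[GS]",
    "isomerization": r"D[GS]",
}


def scan_liabilities(seq: str) -> list[tuple[str, int, str]]:
    """Return (liability, 1-based position, matched motif) for each hit, overlaps included."""
    hits = []
    for name, pattern in LIABILITY_MOTIFS.items():
        # A zero-width lookahead lets overlapping motifs be reported.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda h: h[1])


print(scan_liabilities("AANGTDGS"))
# [('N-glycosylation', 3, 'NGT'), ('deamidation', 3, 'NG'), ('isomerization', 6, 'DG')]
```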

In recent years the availability of a critical mass of immunogenicity-related preclinical and clinical data has opened up the potential for the application of AI/ML technologies to learn from data even in the absence of hypotheses to test. AI-driven methods are also emerging as first-tier screening tools to profile small molecules and provide liability estimations.

 

Efficacy & toxicity prediction and optimization

An analysis of clinical trial data from 2010 to 2017 revealed that the two most common reasons for the clinical failure of new drugs were lack of clinical efficacy (40–50%) and unmanageable toxicity (30%). Therefore, early and accurate prediction of toxicity and efficacy, which depend largely on pharmacokinetic and pharmacodynamic parameters, is critical to delivering safe and effective drugs and to improving the success rate of drug development.
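
Efficacy and toxicity both depend on drug exposure over time. The simplest textbook illustration is a one-compartment model after an intravenous bolus dose, where concentration decays exponentially at a rate set by the elimination half-life. In an AI-assisted workflow, parameters such as half-life or volume of distribution might themselves be model predictions. The values below are hypothetical.

```python
import math


def concentration(dose_mg: float, vd_l: float, half_life_h: float, t_h: float) -> float:
    """Plasma concentration (mg/L) at time t after an IV bolus (one-compartment model)."""
    k_el = math.log(2) / half_life_h  # first-order elimination rate constant (1/h)
    return (dose_mg / vd_l) * math.exp(-k_el * t_h)


def hours_above(threshold_mg_l: float, dose_mg: float, vd_l: float, half_life_h: float) -> float:
    """How long the concentration stays above a threshold, e.g. a minimum effective level."""
    c0 = dose_mg / vd_l
    if c0 <= threshold_mg_l:
        return 0.0
    k_el = math.log(2) / half_life_h
    return math.log(c0 / threshold_mg_l) / k_el


# Hypothetical compound: 100 mg dose, 50 L volume of distribution, 4 h half-life.
print(concentration(100, 50, 4, 4))  # ~1.0 mg/L (one half-life after C0 = 2 mg/L)
print(hours_above(0.5, 100, 50, 4))  # ~8.0 h (two half-lives to fall from 2 to 0.5 mg/L)
```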

Currently, ML models are widely used to predict drug efficacy and toxicity, either as assessment frameworks, where potential efficacy or toxicity is predicted for a predefined therapeutic entity, or as drug design frameworks, where the models generate potentially safe and effective therapeutics for a particular disease state.

AI algorithms capable of analyzing vast volumes of chemical and biological data are used to design and optimize drug candidates based on efficacy, toxicity, and pharmacokinetic predictions. Generative AI models based on DL techniques such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) are now being used both to produce new chemical structures with properties similar to those in the training set and to generate novel compounds with desirable properties, such as low toxicity. DL-enabled natural language processing is also driving the automatic identification of directional pharmacokinetic drug–drug interactions from ever-increasing volumes of biomedical literature.
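
Two core pieces of a VAE fit in a few lines. The first is the reparameterization trick, which samples a latent vector from the encoder's mean and log-variance. The second is the closed-form KL divergence term, which keeps the latent space close to a standard normal so that new points can be sampled from it. The NumPy sketch below shows only these pieces. The encoder, the decoder (for example, one that emits SMILES tokens), the reconstruction loss, and gradient-based training would come from a deep learning framework and are omitted. The encoder outputs in the example are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def reparameterize(mu: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    """Sample z = mu + sigma * eps; in an autodiff framework this keeps sampling differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu: np.ndarray, log_var: np.ndarray) -> float:
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return float(-0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var)))


# Training minimizes: reconstruction loss + KL term.
# Generation then decodes z drawn from N(0, I) into a new molecule.
mu = np.array([0.3, -1.2])  # hypothetical encoder outputs for one molecule
log_var = np.array([-0.5, 0.1])
z = reparameterize(mu, log_var)
print(z, kl_to_standard_normal(mu, log_var))
```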

 

Multi-drug synergy prediction

Combination therapies, which involve administering several separate drugs or several drugs combined into a single medication, can enhance overall treatment benefit, improve patient outcomes, reduce drug resistance, and increase the success rate of repositioning/repurposing drugs for diseases that lack effective treatments. In complex diseases such as cancer, synergistic combinations of anti-cancer drugs have become the cornerstone of effective therapy.

However, identifying novel synergistic combinations remains a long-standing challenge because of the combinatorial explosion in the number of possible drug combinations.
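
A quick calculation shows why exhaustive screening breaks down. Assume a hypothetical library of 1,000 drugs. The number of distinct k-drug combinations is then the binomial coefficient C(1000, k), before any multiplication by doses or cell lines.

```python
from math import comb

N_DRUGS = 1000  # hypothetical library size

combos = {k: comb(N_DRUGS, k) for k in (2, 3, 4)}
for k, n in combos.items():
    print(f"{k}-drug combinations: {n:,}")
# 2-drug combinations: 499,500
# 3-drug combinations: 166,167,000
# 4-drug combinations: 41,417,124,750
```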

In recent years, there has been growing research interest in computational approaches for in silico screening of potential drug synergies, and DL techniques have been reported to outperform standard machine learning models in synergy prediction.
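
Synergy is always measured against a reference model of what non-interacting drugs would achieve together. One widely used reference is Bliss independence, sketched below. Others include Loewe additivity and the highest-single-agent model. Effects are fractional responses between 0 and 1, and the example values are hypothetical. DL synergy predictors are typically trained on scores derived from reference models such as these.

```python
from math import prod


def bliss_expected(effects: list[float]) -> float:
    """Expected combined fractional effect if the drugs act independently: 1 - prod(1 - e_i)."""
    return 1.0 - prod(1.0 - e for e in effects)


def bliss_excess(observed: float, effects: list[float]) -> float:
    """Observed minus expected: > 0 suggests synergy, < 0 suggests antagonism."""
    return observed - bliss_expected(effects)


# Hypothetical pair: each drug alone inhibits 50% of growth; together they inhibit 90%.
print(bliss_expected([0.5, 0.5]))     # 0.75
print(bliss_excess(0.9, [0.5, 0.5]))  # ~0.15 (synergistic)
```

The same formula extends to three or more drugs, which is relevant to the higher-order combinations discussed below.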

DeepSynergy was one of the first DL models to apply deep neural networks (DNNs) to chemical and genomic information to model drug synergies. It was quickly followed by a series of DL frameworks that have progressively advanced in silico synergy prediction. These include AuDNNsynergy (integrating multi-omics and chemical data), SYNDEEP (combining physicochemical, genomic, protein–protein interaction, and protein-metabolite interaction information), DeepTraSynergy (including drug–target interaction, protein–protein interaction, and cell–target interaction), and MatchMaker (based on chemical structure information and gene expression profiles of cell lines).

The focus of research into computational drug synergy analysis is now expanding beyond pairwise combinations to identifying higher-order combinations that are more effective in complex diseases and precision-medicine applications. DeepMDS, a DL-based multi-drug synergy prediction model, leverages a large-scale dataset that integrates target information, drug response data, and large-scale genomic profiles of cancer cell lines from varied tissues. This approach can accurately predict and rank the most potent synergistic combinations of three or more drugs against a specific cell line or subtype of interest.

 

De novo drug design & synthesis pathway generation

De novo design, the approach of generating new chemical entities with desired biological activities and properties, was among MIT Technology Review’s 10 Breakthrough Technologies of 2020. This approach represents a radical departure from the traditional drug discovery model of screening large libraries of candidate molecules.

Advances in AI technologies are powering this new paradigm to design entirely new molecules or to modify existing molecules to optimize their desired properties. Generative AI models, such as variational autoencoders (VAEs) or generative adversarial networks (GANs), are capable of generating novel molecular structures that conform to multiple potency, solubility, toxicity, and synthesizability criteria. There are already several real-world examples of compounds generated by fully autonomous or semi-autonomous de novo computational design.
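
A desirability function is one common way to turn several predicted properties into a single ranking score. Each property is mapped onto a 0-to-1 scale, and the individual scores are combined with a geometric mean, so a candidate that fails any one criterion is rejected outright. The property names, ranges, and values below are hypothetical. In practice, they would come from the project's design criteria and from property-prediction models.

```python
import math


def ramp(value: float, worst: float, best: float) -> float:
    """Map a property onto [0, 1]: 0 at `worst`, 1 at `best`, linear in between.

    Works for lower-is-better properties too (pass worst > best).
    """
    frac = (value - worst) / (best - worst)
    return min(1.0, max(0.0, frac))


def overall_desirability(scores: list[float]) -> float:
    """Geometric mean of individual desirabilities; any zero rejects the candidate."""
    if any(s == 0.0 for s in scores):
        return 0.0
    return math.prod(scores) ** (1.0 / len(scores))


# Hypothetical predictions for one generated molecule.
scores = [
    ramp(7.5, worst=5.0, best=8.0),     # potency (pIC50), higher is better
    ramp(-4.0, worst=-6.0, best=-2.0),  # aqueous solubility (logS), higher is better
    ramp(0.2, worst=1.0, best=0.0),     # predicted toxicity probability, lower is better
    ramp(3.0, worst=6.0, best=1.0),     # synthetic accessibility score, lower is better
]
print(round(overall_desirability(scores), 3))  # 0.669
```

A geometric mean is used instead of an arithmetic mean so that excellent potency cannot compensate for, say, unacceptable predicted toxicity.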

Generative AI is also revolutionizing synthesis planning by automating the generation of synthetic routes toward desired target molecules and optimizing potential synthesis pathways. This computational approach is capable of generating novel synthetic routes, even for complex molecules, with a high degree of accuracy and in a fraction of the time typically required.
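
Many computer-aided synthesis planning tools pair a learned model, which proposes single-step disconnections, with a search over the resulting tree of precursors. The search stops when every branch reaches a purchasable building block. The sketch below replaces the learned model with a hard-coded, hypothetical disconnection table. It also uses plain depth-first search instead of the more sophisticated strategies, such as Monte Carlo tree search, that some published tools use. Molecule names are placeholders rather than real structures.

```python
# Hypothetical single-step disconnections: product -> alternative sets of precursors.
# In ML-based tools, a trained model proposes and scores these; here they are hard-coded.
DISCONNECTIONS = {
    "target": [("intermediate_A", "reagent_B"), ("intermediate_C",)],
    "intermediate_A": [("building_block_1", "building_block_2")],
    "intermediate_C": [("unavailable_precursor",)],
}
PURCHASABLE = {"reagent_B", "building_block_1", "building_block_2"}


def find_route(molecule: str, depth: int = 0, max_depth: int = 5):
    """Depth-first AND-OR search; returns a list of (product, precursors) steps or None."""
    if molecule in PURCHASABLE:
        return []  # nothing to make
    if depth >= max_depth:
        return None  # give up on overly long routes
    for precursors in DISCONNECTIONS.get(molecule, []):  # OR: try each disconnection
        steps = [(molecule, precursors)]
        for p in precursors:  # AND: every precursor must itself be reachable
            sub_route = find_route(p, depth + 1, max_depth)
            if sub_route is None:
                break
            steps.extend(sub_route)
        else:
            return steps  # all precursors solved
    return None


print(find_route("target"))
# [('target', ('intermediate_A', 'reagent_B')), ('intermediate_A', ('building_block_1', 'building_block_2'))]
```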

More importantly, these technologies address several specific challenges and bottlenecks in small molecule and antibody drug discovery.

In the small molecule discovery pipeline, key limitations include lack of access to critical data sources such as ADMET data, the inability to scale across the vast chemical space, and prolonged design-make-test cycles. Applying AI to small molecule discovery expands the scope of compound identification beyond screening existing chemical libraries to include generative de novo design. It also enables more efficient identification of hit- or lead-like molecules that are optimized for favorable properties.

In antibody drug discovery, challenges include identifying superior epitopes, selecting appropriate candidates from experimental outputs/human repertoires, designing and optimizing novel antibodies, and the extended lead times needed to improve antibody preclinical properties. Here again, AI technologies offer a range of benefits, including more efficient screening of pre-existing libraries, the integration of de novo design capabilities, and the identification and subsequent optimization of antibody structures and formats for desired properties.

In short, AI technologies are having a transformative impact across the drug discovery pipeline and have the potential to revolutionize in silico drug discovery and development.

 

4. The future of AI drug discovery

As mentioned, AI in drug discovery is not so much about a monolithic system as it is about a diverse array of technologies, each with a specific and strategic role to play in end-to-end in silico drug discovery.

An AI-powered data integration and management architecture will enable the seamless, automated integration of large volumes of high-quality, well-governed data from disparate and distributed data sources. A metadata-driven semantic knowledge graph will intelligently integrate and curate all data, including incoming data, into a unified and contextualized framework that is both machine- and human-readable. Integrating LLMs with these knowledge graphs will help harness their potential across a range of drug discovery tasks. Techniques such as Retrieval Augmented Generation (RAG) will be central to achieving biomedical-research-grade LLM performance in terms of reasoning, accuracy, and knowledge recall. The future of in silico drug discovery will be powered by the tight integration of these diverse technologies into one unified, scalable, end-to-end life sciences research platform.
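
The retrieval step of RAG can be sketched in a few lines. Stored facts (for example, knowledge-graph triples rendered as text) are ranked by similarity to the question, and the best matches are placed in the prompt, so that the LLM answers from retrieved evidence rather than from memory alone. The facts below are hypothetical. Bag-of-words cosine similarity stands in for the dense embeddings and vector index that a production system would typically use, and the call to the LLM itself is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical facts, e.g. knowledge-graph triples rendered as text.
FACTS = [
    "drug_A inhibits protein_X",
    "protein_X is associated with disease_Y",
    "drug_B inhibits protein_Z",
]


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k facts most similar to the query."""
    q = bag_of_words(query)
    return sorted(FACTS, key=lambda f: cosine(q, bag_of_words(f)), reverse=True)[:k]


def build_prompt(query: str, k: int = 2) -> str:
    """Assemble the grounded prompt that would be sent to an LLM (call not shown)."""
    context = "\n".join(f"- {fact}" for fact in retrieve(query, k))
    return f"Answer using only these facts:\n{context}\n\nQuestion: {query}"


print(build_prompt("Which disease is protein_X associated with?"))
```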

 


Fully-integrated therapeutic end-to-end lead generation workflow with LENSai

 

5. AI drug discovery with LENSai Integrated Intelligence Platform

The LENSai Integrated Intelligence Platform offers a streamlined approach to antibody discovery, integrating advanced analytics and data handling to support informed decision-making and efficient research processes. With capabilities such as multi-omics data integration, continuous high-throughput learning, and AI-driven solutions, the platform is designed to optimize timelines and enhance research precision. Flexible options, including fee-for-service, SaaS, scalable API access, and strategic partnerships, ensure tailored solutions for various research needs. In addition, the LENSai platform provides the tools necessary for effective and insightful antibody discovery and development.

Reach out to our team to learn how we can support your research and accelerate your discovery journey.

 

LENSai | Foundation AI Model for multiscale biological data integration


 

 
