11 Queen St, Edinburgh EH2 1JQ, United Kingdom
Silicon Supremacy? AI and Chemical Intelligence
Dr. John B. O. Mitchell
Reader, School of Chemistry, University of St. Andrews
With Artificial Intelligence deeply embedded in contemporary life, are we witnessing a fundamental shift in research? Or are we simply gaining new tools to integrate seamlessly into our workflows? The growing number of massive, many-parameter LLMs in chemistry includes models trained specifically for the field as well as general-purpose models tested for chemical competence. Impressive claims have been made for the chemical applicability of LLMs. With cheminformatics and QSAR having evolved progressively from simple linear regressions into Chemical Machine Learning, our subject is ideally positioned to pioneer these developing technologies and to reflect thoughtfully upon them.
In education, an ever-increasing majority of students use AI as a go-to study resource. Is this the endgame for Higher Education and scholarship? Or, like the calculator and the internet, can AI be flexibly incorporated into our teaching? Addressing these questions is vital in a world where the carefully considered, well-informed and appropriate use of AI is an essential skill for young researchers and graduates.
Improving Alchemical Binding Free Energy Calculations Using Fully Adaptive Simulated Tempering (FAST)
Justina Ratkevičiūtė
PhD Student, University of Southampton, UK-QSAR Autumn 2025 Poster Prize Winner
Alchemical binding free energy (AFE) calculations are often performed over relatively short timescales, during which many ligand binding modes may be insufficiently sampled because of high kinetic barriers. As a result, these calculations often depend strongly on the initial structural choices, and enhanced sampling methods are needed to reduce this bias. A further challenge is designing an optimal protocol of intermediate states: a poorly chosen protocol lowers efficiency and forces a trade-off between sufficient long-timescale sampling and adequate convergence.
In this work we present fully adaptive simulated tempering (FAST) [1–2], a novel and robust variation of the simulated tempering and sequential Monte Carlo algorithms that calculates both the free energy profile and the optimal interpolation protocol on the fly, without the need for system-specific knowledge. Alongside improving efficiency in traversing the intermediate states, the method also achieves greater effective decorrelation from the initial coordinates, because the entire simulation time is spent on a single, continuous trajectory. FAST therefore aims to address both of the AFE challenges described above.
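The abstract does not give FAST's update equations, so the following is only a rough illustration of the simulated-tempering idea it builds on: one continuous trajectory moves along a fixed ladder of λ states for a toy harmonic system whose free energy is known analytically (ΔF = ½ kT ln(k(λ)/k₀)), while the state weights are adapted on the fly. The Wang-Landau-style weight penalty, the final histogram correction, and the function and parameter names are our own assumptions. Unlike FAST, this sketch does not optimise the λ protocol.

```python
import math
import random


def adaptive_tempering(k0=1.0, k1=4.0, kT=1.0, n_states=5, gamma0=0.5,
                       n_stages=12, steps_per_stage=10_000,
                       n_production=100_000, seed=0):
    """Toy adaptive simulated tempering along a fixed lambda ladder.

    U_i(x) = 0.5 * k_i * x**2 with k_i linear in lambda, so the exact answer
    is F_i - F_0 = 0.5 * kT * ln(k_i / k0). The Wang-Landau-style weight
    update below is an assumed stand-in, not the published FAST rule.
    """
    rng = random.Random(seed)
    beta = 1.0 / kT
    lambdas = [s / (n_states - 1) for s in range(n_states)]
    ks = [(1 - lam) * k0 + lam * k1 for lam in lambdas]
    g = [0.0] * n_states  # adaptive log-weights of the expanded ensemble

    def step(i):
        # Configurational move: this harmonic well can be sampled exactly.
        x = rng.gauss(0.0, math.sqrt(kT / ks[i]))
        # Lambda move to a neighbour, Metropolis test on exp(-beta*U_i + g_i).
        j = i + rng.choice((-1, 1))
        if 0 <= j < n_states:
            log_acc = -beta * 0.5 * (ks[j] - ks[i]) * x * x + g[j] - g[i]
            if log_acc >= 0 or rng.random() < math.exp(log_acc):
                return j
        return i

    # Adaptive phase: one continuous trajectory, penalising visited states
    # until all lambda states are visited roughly equally often.
    i, gamma = 0, gamma0
    for _ in range(n_stages):
        for _ in range(steps_per_stage):
            i = step(i)
            g[i] -= gamma
        gamma *= 0.5  # shrink the update as the weights settle

    # Production phase: freeze the weights and count visits per state.
    counts = [0] * n_states
    for _ in range(n_production):
        i = step(i)
        counts[i] += 1

    # p(i) is proportional to exp(g_i - beta*F_i), so F_i = kT*(g_i - ln p_i) + const.
    f = [kT * (g[s] - math.log(counts[s])) for s in range(n_states)]
    return lambdas, [fs - f[0] for fs in f]


lambdas, dF = adaptive_tempering()
for lam, df in zip(lambdas, dF):
    print(f"lambda={lam:.2f}  dF={df:.3f} kT")
```

The estimates should sit close to the analytical values. In a real AFE setting the exact Gaussian draw would be replaced by a short segment of molecular dynamics at the current λ.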
This algorithm can be used in hydration and binding free energy calculations, enhanced sampling of binding modes, and any Markov chains combining the two. In this work we will demonstrate how FAST deals with a variety of systems, from simple solutes to more challenging protein-ligand systems such as p38 and HSP90, showcasing its sampling efficiency and quality of free energy estimates in comparison with standard AFE calculations.
References
SpaceHASTEN: Boosting Structure-Based Virtual Screening Efficiency From Millions To Trillions Of Molecules
Dr. Tuomo Kalliokoski
Principal Scientist, Orion Pharma
The sizes of made-on-demand compound libraries such as Enamine REAL have increased dramatically in the last few years. These libraries have grown from hundreds of millions to trillions of compounds, so new methodologies are urgently needed for virtual screening of such large chemical spaces. We have developed open-source software, SpaceHASTEN [1], that enables easy and quick virtual screening of nonenumerated chemical spaces using the standard docking software Glide, without the need for supercomputing resources. The algorithm will be described in detail, together with results from both public validation targets and in-house prospective virtual screening campaigns. The software is freely available and can be downloaded from http://github.com/TuomoKalliokoski/SpaceHASTEN.
References
Predicting Protein Ligand Binding with Machine Learning and Alchemistry
Dr. Antonia Mey
Senior Lecturer and Chancellor’s Fellow, University of Edinburgh
Computational tools are essential for identifying lead compounds and predicting both binding affinity and ADMET properties. With recent advances in computing architectures, as well as machine learning algorithms, new ways of exploring these properties at scale are now possible.
While structural insights are provided by docking and co-folding models, such as Chai-1 and Boltz-2, the accurate estimation of binding affinity remains a significant hurdle. Methods ranging from traditional alchemical free energy workflows to modern deep learning models often perform well on retrospective benchmarks but underperform in prospective studies.
I will showcase how to assess affinity prediction models in statistically robust ways on benchmark datasets [1]. Furthermore, I will present how some of these models have performed on different tasks in a blinded prediction challenge on large-scale SARS, SARS-CoV-2, and MERS datasets provided by the ASAP consortium and hosted by Polaris [2].
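Reference [1] is not reproduced here, so the following is only one common way of attaching uncertainty to benchmark metrics, not necessarily the procedure used in that work: a percentile bootstrap over the ligands in a dataset. It makes visible when two methods' RMSE or Pearson r values cannot be distinguished on a few dozen compounds. The function names, the 2000 resamples and the 95% level are illustrative assumptions.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted affinities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation; None when either series is constant."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    vt = sum((t - mt) ** 2 for t in y_true)
    vp = sum((p - mp) ** 2 for p in y_pred)
    if vt == 0 or vp == 0:
        return None
    return cov / math.sqrt(vt * vp)


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = []
    for _ in range(n_boot):
        # Resample ligands with replacement, keeping each (measured, predicted) pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[k] for k in idx], [y_pred[k] for k in idx])
        if value is not None:  # skip degenerate resamples
            samples.append(value)
    samples.sort()
    lower = samples[int(alpha / 2 * (len(samples) - 1))]
    upper = samples[int((1 - alpha / 2) * (len(samples) - 1))]
    return metric(y_true, y_pred), lower, upper
```

To compare two methods fairly, the same resampled indices can be applied to both sets of predictions so that the difference in metric is bootstrapped as a paired quantity.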
References
Advancing Rational Drug Design with the Isomorphic Labs AI Engine
Dr. Franca Klingler
Research Leader, Isomorphic Labs
This presentation introduces the Isomorphic Labs Drug Design Engine (IsoDDE), which demonstrates excellent performance in predicting complex protein-ligand interactions, particularly in low-similarity regimes where other models often fail.
We showcase IsoDDE’s capabilities through a virtual screening campaign. Our engine accurately identified novel allosteric inhibitors and predicted complex conformational changes with high precision. Furthermore, our AI-directed de novo design consistently delivers progressable hits across diverse modalities, outperforming traditional virtual library screenings. By integrating predictive and generative models, IsoDDE enables the rapid discovery of novel chemical matter for even the most challenging biological targets.
Designing Safe and Sustainable Chemicals: The PINK Project
Dr. Alexe Haywood
Postdoctoral Research Fellow, Department of Cancer and Genomic Sciences, University of Birmingham
The transition to a climate-neutral and circular economy in Europe requires designing chemicals and materials in line with the Safe-and-Sustainable-by-Design (SSbD) framework [1]. SSbD balances functionality, cost-efficiency, safety, and sustainability considerations across a product’s life cycle and value chain. The EU-funded PINK project (https://pink-project.eu/) [2] is developing computational approaches to support the design of safe and sustainable chemicals and materials.
Recent advances in generative AI for molecular design, driven largely by applications in drug discovery, have demonstrated strong capability in exploring chemical space and proposing novel compounds. In this talk, we investigate how such methodologies can be repurposed beyond the pharmaceutical domain to identify alternative molecules for industrial chemical applications, where design criteria consider performance, safety, and sustainability.
Alongside model development, a semantic framework is being established to facilitate the documentation and reuse of approaches. The framework will leverage existing ontologies and cross-disciplinary metadata standards to provide a consistent representation of concepts, enabling interoperability between data, models, and workflows. In doing so, it supports adherence to FAIR (Findable, Accessible, Interoperable, and Reusable) principles.
References
Code and Collaboration: Machine Learning to Mitigate Assay Interference in HTS
Dr. Angelo Pugliese
Associate Director of In Silico Discovery, BioAscent
Code and collaboration are reshaping how we solve real-world problems in the lab. In this work, we present an AI-driven framework that tackles one of HTS’s most persistent challenges: assay interference from PAINS and other problematic chemotypes. Using 153 interference-prone compounds screened across 13 buffer formulations, we built a two-stage residual stacking model that separates chemical from buffer-driven effects, achieving an R² of 0.678 on held-out data. Domain-constrained Bayesian optimisation then identified experimentally feasible buffer compositions that reduced the predicted interference risk to 17.7%, placing more than 82% of compounds within a defined “Safe Zone”. Robustness testing under ISO-standard pipetting variability confirmed the stability of these recommendations in practical laboratory conditions. By combining data-driven modelling with biochemical constraints, this approach demonstrates how machine learning can rationally engineer assay environments, suppress artefacts, and improve the reliability of high-throughput screening.
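The abstract does not specify the learners or features behind the two-stage model, so the following is only a schematic sketch under stated assumptions: stage 1 is a per-compound mean standing in for a chemistry model, stage 2 is a per-buffer mean fitted to the stage-1 residuals, and `fit_two_stage` and `safe_fraction` are hypothetical names. It shows why stacking on residuals lets the buffer term capture only what the chemical term cannot explain, and how a “Safe Zone” share could be computed for a candidate buffer. The Bayesian optimisation over buffer composition is not shown.

```python
from collections import defaultdict


def fit_two_stage(records):
    """Fit a toy two-stage residual model.

    records: iterable of (compound_id, buffer_id, interference_score).
    Stage 1 estimates a chemical effect per compound; stage 2 fits a buffer
    effect to what stage 1 leaves unexplained. Group means stand in for the
    learners an actual study would train on descriptors.
    """
    records = list(records)
    per_compound = defaultdict(list)
    for cpd, _, y in records:
        per_compound[cpd].append(y)
    chem = {c: sum(v) / len(v) for c, v in per_compound.items()}

    per_buffer = defaultdict(list)
    for cpd, buf, y in records:
        per_buffer[buf].append(y - chem[cpd])  # stage-1 residual
    buffer = {b: sum(v) / len(v) for b, v in per_buffer.items()}

    def predict(cpd, buf):
        # Stacked prediction: chemical effect plus buffer correction.
        return chem[cpd] + buffer[buf]

    return predict


def safe_fraction(predict, compounds, buf, threshold):
    """Fraction of compounds predicted to stay below an interference threshold in `buf`."""
    return sum(predict(c, buf) < threshold for c in compounds) / len(compounds)
```

Because the stage-1 model here is a lookup table, it cannot score unseen compounds; replacing the group means with regressors on molecular and buffer descriptors is what would make such a scheme predictive.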
Enhancing ChEMBL: Integrating Drug and Clinical Candidate Data
Harris Ioannidis
Senior Drug Data Integrator, European Bioinformatics Institute, EMBL-EBI
ChEMBL (www.ebi.ac.uk/chembl) is a manually curated database of bioactive molecules with drug-like properties, compiled from the medicinal chemistry literature and direct data depositions, and including data on approved drugs and clinical candidates. Since its launch in 2009, ChEMBL has become a key resource in drug discovery projects owing to its unprecedented free access to large amounts of high-quality, curated data on bioactive molecules.
The systematic inclusion of drugs has become an integral part of ChEMBL’s offering. This presentation highlights the complexity of curating drug and clinical candidate data (“drug data”) and explains some of the underlying concepts to help users better understand the nature of the drug data within ChEMBL. Multiple automated processes, such as API requests and cron jobs, extract drug data from various sources and ingest them into an internal database (“Drugbase”), from which the data are migrated to the public release of ChEMBL. Well-established pipelines, such as the clinical trials pipeline (introduced in ChEMBL 12, 2011), have been expanded to include manually curated drug data, for example the International Nonproprietary Names (INN) source (ChEMBL 32, released in January 2023), and new pipelines have been added, such as the European Medicines Agency (EMA) source (ChEMBL 34, released in March 2024).
The extensively curated drug data in ChEMBL enable researchers to address key questions in drug discovery and chemical biology, such as identifying potential treatments for neglected diseases, tracking targets and indications through clinical trial progression, and leveraging drug indications to test large language models. Key curation areas include drug name, synonym(s), chemical structure or biological sequence, data source(s), drug indication(s), drug mechanism(s) of action, drug warning(s), and drug properties such as maximum development phase, orphan drug designation, and molecule type.
In summary, ChEMBL provides a rich, structured and searchable resource that brings together numerous drug data sources, accessible via both a relational database and a user-friendly interface, to support drug discovery research.
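For readers who want to retrieve some of these curated fields programmatically, a minimal sketch using ChEMBL's public REST web services is shown below. It assumes the `drug_indication` resource, its `molecule_chembl_id` filter, the `drug_indications` list key and the `mesh_heading` and `max_phase_for_ind` fields as we understand the current API; check the web-services documentation before relying on them. The helper names are hypothetical, and `summarise_indications` works on an already-parsed JSON payload, so it can be tried without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the public ChEMBL web services (assumed current).
CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"


def indication_url(chembl_id, limit=100):
    """Build a JSON query for the drug_indication records of one molecule."""
    query = urlencode({"molecule_chembl_id": chembl_id, "limit": limit})
    return f"{CHEMBL_API}/drug_indication.json?{query}"


def summarise_indications(payload):
    """Return (MeSH heading, max phase for the indication) pairs, highest phase first."""
    rows = []
    for record in payload.get("drug_indications", []):
        phase = record.get("max_phase_for_ind")
        rows.append((record.get("mesh_heading"),
                     float(phase) if phase is not None else None))
    # Missing phases sort last.
    return sorted(rows, key=lambda row: -(row[1] if row[1] is not None else -1.0))


def fetch_indications(chembl_id):
    """Network call: download and summarise indications for one molecule."""
    with urlopen(indication_url(chembl_id), timeout=30) as response:
        return summarise_indications(json.load(response))
```

For example, `fetch_indications("CHEMBL25")` would request the indication records for aspirin; the results depend on the ChEMBL release queried, and only the first page of up to `limit` records is read.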