Project

Summer 2024 challenge: winner

Assessing the Sociability of Commercial Fragment Libraries

Philipp Janssen, University of Muenster - Institute of Pharmaceutical and Medicinal Chemistry, Münster, Germany

Fragment-based drug discovery (FBDD) has become an invaluable and routinely used tool for probing targets and generating hits. Every fragment-to-lead project rests on the underlying library. While much effort has gone into optimising these libraries for diversity, chemical space coverage, and their properties, their developability has often been overlooked. A fragment is often merely a weak binder, and its progression is frequently hindered by synthetic challenges, although it should ideally not require excessive time and resources. Instead, one desires a “sociable” fragment: one with numerous commercial analogues available, so that all growth vectors can be accessed with different substituents. Many publications focus on the diversity, purpose, or properties of fragment libraries, yet none currently assesses their developability comprehensively. Consequently, we analysed all available commercial and academic libraries, which together contain over half a million unique fragments and more than 4.5 million growth vectors. infiniSee allowed us to use the novel ultra-large chemical spaces as a framework for what is synthetically feasible and accessible. Although these spaces cannot represent everything that is possible, they comprise typical medicinal chemistry reactions and available building blocks, thereby reflecting the chemical space of early-stage fragment development. The results, however, paint a grim picture: the vast majority of available fragments appear unsociable, which risks impeding their progression towards a potent lead. This underscores the need for improved libraries that take follow-up opportunities into account, and for better chemistries with broad applicability.
After 1 year, Philipp has achieved the following goals:
  1. Our initial aim was to gather all readily available fragment libraries and enumerate all potential growth vectors. In pursuit of this objective, we identified 97 distinct libraries from 19 different sources (including our own). These encompass 1,037,159 fragments and, following deduplication, 561,156 unique structures. Ninety per cent of them have one or zero “Rule of Three” violations; nonetheless, all compounds were included in the analysis, as their parent library was designated as a fragment library by the distributor. The enumeration of all possible attachment points was conducted using RDKit and resulted in 4,608,626 unique growth vectors, which served as input for infiniSee.
  2. The second goal was to identify all possible substituents for each growth vector in order to evaluate its accessibility. To this end, all seven available ultra-large spaces (AMBrosia, CHEMryia, eXplore, FreedomSpace, GalaXi, KnowledgeSpace, REALSpace) were used as a proxy for the readily accessible and synthesisable chemical space. infiniSee enabled the search of each space for each vector using the recently added “R-group” mode. To expedite the process, the number of results per vector was capped at 100, since a vector with at least 100 substituents is certainly accessible. Even though infiniSee is remarkably fast and easy to parallelise, the total search time still amounted to several months owing to the sheer size of the input. In total, across all spaces, 133,018,716 matches were found.
  3. The final step was to analyse the actual sociability of the fragments and their libraries. For 59% of the vectors, at least one substituent was found in at least one of the spaces, and 13.6% reached the cap of 100 substituents. However, when these results are aggregated back to the parent fragment level, the outcome appears less promising. A fragment was deemed sociable if at least 80% of its vectors had at least 5 matches and every vector had at least one match in any of the spaces. Examining the libraries by their share of sociable fragments gives the same picture: virtually all libraries are predominantly composed of unsociable fragments.
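
The sociability criterion in step 3 can be illustrated with a minimal Python sketch. It is not the project's actual analysis code. It assumes that each fragment is represented by a list of per-vector match counts pooled across all spaces, and it reads both thresholds as minimums (at least 5 matches on at least 80% of the vectors). The names `is_sociable` and `sociable_share`, the fragment identifiers, and the example counts are hypothetical.

```python
# Minimal sketch of the fragment-level sociability rule (assumptions noted inline).

MIN_MATCHES = 5        # assumed reading: "5 matches" means at least 5
MIN_PERCENT = 80       # assumed reading: "80% of the vectors" means at least 80%


def is_sociable(vector_matches: list[int]) -> bool:
    """Classify one fragment from its per-vector match counts.

    vector_matches holds, for every growth vector of the fragment, the number
    of substituents found across all spaces (capped at 100 in the study).
    """
    if not vector_matches:
        # Assumption: a fragment without any growth vector cannot be grown.
        return False
    if min(vector_matches) < 1:
        # Every vector needs at least one match.
        return False
    well_served = sum(1 for n in vector_matches if n >= MIN_MATCHES)
    # Integer comparison avoids floating-point edge cases at exactly 80%.
    return well_served * 100 >= MIN_PERCENT * len(vector_matches)


def sociable_share(library: dict[str, list[int]]) -> float:
    """Return the fraction of sociable fragments in one library."""
    if not library:
        return 0.0
    sociable = sum(1 for counts in library.values() if is_sociable(counts))
    return sociable / len(library)


# Hypothetical library with made-up match counts, for illustration only.
example_library = {
    "frag_A": [100, 40, 7, 5, 1],  # 4 of 5 vectors have >= 5 matches, none is empty
    "frag_B": [100, 100, 0],       # one vector without any match
    "frag_C": [3, 2, 9, 1],        # only 1 of 4 vectors has >= 5 matches
    "frag_D": [12, 6],             # both vectors have >= 5 matches
}

if __name__ == "__main__":
    for name, counts in example_library.items():
        print(name, "sociable" if is_sociable(counts) else "unsociable")
    print("sociable share:", sociable_share(example_library))
```

Under this reading, a single vector without any match is enough to make a fragment unsociable, regardless of how well its other vectors are served. This may help explain why the fragment-level picture looks worse than the vector-level statistics.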