Project

Summer 2024 challenge: phase 2 contestant

Assessing the Sociability of Commercial Fragment Libraries

Philipp Janssen, University of Muenster - Institute of Pharmaceutical and Medicinal Chemistry, Münster, Germany

The first step was to assemble as many commercially available fragment libraries as possible. This yielded 94 libraries of different sizes and types (covalent, NP-like, halogen, etc.), which together comprise more than a million fragments. After preparation and sanitization, more than 500,000 unique fragments remained. To analyze their sociability, we needed to enumerate every position where a substituent can be introduced, which led to more than 4.5 million growth vectors. We are currently searching all seven available chemical spaces to see which positions are accessible. The first three spaces are already done, and the remaining ones are expected to take no more than two months. Afterwards, we will start the in-depth analysis to see how sociable the libraries actually are. With such a large and comprehensive dataset, we also plan to analyze the libraries' property distributions and uniqueness.
After 3 months, Philipp has achieved the following milestones:
  1. Ninety-four libraries were collected, containing more than one million fragments, more than half a million unique fragments, and more than 4.5 million growth vectors. Growth vectors were enumerated only at positions where valency allows a substituent.
  2. A new feature of SpaceMACS enables us to search for compounds substituted at only one specified position. This allows us to analyse the chemical space around the fragments spatially. To speed up the process, we set a maximum of 100 substituents per vector; for assessing sociability, it makes no difference whether a position allows exactly 100 modifications or more. Even though SpaceMACS is incredibly fast, the searches still take some time because the input is so large.
  3. Once we have all the SpaceMACS results, we can assess the sociability. As a general rule, we propose calling a molecule sociable if at least all but one of its vectors have more than ten substituents available. It would also be interesting to find highly sociable molecules, in which every vector has at least 100 substituents. Based on this, we can see whether some libraries are more sociable than others and identify compounds that might need to be replaced. With the comprehensive dataset we now have, we also plan to analyse the libraries further; of particular interest are the distributions of physicochemical properties and the similarity between libraries.
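
The sociability rule from milestone 3 can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the project's actual analysis code: the function name `classify_sociability`, the constant names, and the input format (one hit count per growth vector) are assumptions made for this example, as is the handling of a molecule without any growth vectors.

```python
from typing import Sequence

# Thresholds taken from the text; the names are hypothetical.
SOCIABLE_MIN = 10          # a vector counts if it has MORE than ten substituents
HIGHLY_SOCIABLE_MIN = 100  # every vector needs AT LEAST 100 substituents
MAX_HITS = 100             # per-vector search cap mentioned in milestone 2


def classify_sociability(hits_per_vector: Sequence[int]) -> str:
    """Classify one fragment from the substituent counts of its growth vectors."""
    if not hits_per_vector:
        # Assumption: fragments without growth vectors are reported separately.
        return "no vectors"

    # Apply the search cap, so counts above 100 are treated like 100.
    capped = [min(hits, MAX_HITS) for hits in hits_per_vector]

    if all(hits >= HIGHLY_SOCIABLE_MIN for hits in capped):
        return "highly sociable"

    # "All but one vector": at most one vector may fall at or below the threshold.
    well_covered = sum(hits > SOCIABLE_MIN for hits in capped)
    if well_covered >= len(capped) - 1:
        return "sociable"
    return "unsociable"


if __name__ == "__main__":
    print(classify_sociability([100, 100, 250]))
    print(classify_sociability([50, 3, 11]))
    print(classify_sociability([10, 10, 50]))
```

Read literally, the rule makes any fragment with a single growth vector at least "sociable", since "all but one" then requires no vector to pass the threshold. Whether such fragments should be treated differently is a design decision this sketch leaves open.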