Chemical Space Overlap: Why We Need More and Larger Chemical Spaces


February 7, 2022 14:10 CET

Chemical Space overlap describes the set of molecules that can be found in two or more individual compound collections. We at BioSolveIT believe in growing the number of accessible compounds, so we avoid enumeration (that is, explicitly generating and touching every individual virtual compound in the computer) whenever we can; instead, we use combinatorial methods to conquer the explosion of possibilities. That said, overlap assessment is the most obvious and straightforward way to assess the uniqueness and characteristics of one Chemical Space compared to another.
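For small, fully enumerated collections, the definition of overlap boils down to a set intersection. The following is a minimal sketch, assuming each molecule is already represented by a canonical identifier such as a canonical SMILES string; the function name `overlap_report` and the toy collections are hypothetical and serve only as an illustration.

```python
def overlap_report(space_a: set[str], space_b: set[str]) -> dict[str, float]:
    """Summarize the overlap of two enumerated compound collections.

    Assumption: every molecule is given as a canonical identifier, so that
    identical molecules compare equal as plain strings.
    """
    shared = space_a & space_b  # molecules found in both collections
    return {
        "shared": len(shared),
        "fraction_of_a": len(shared) / len(space_a) if space_a else 0.0,
        "fraction_of_b": len(shared) / len(space_b) if space_b else 0.0,
    }


# Hypothetical toy collections (SMILES strings used as identifiers).
collection_a = {"CCO", "c1ccccc1", "CC(=O)O", "CCN"}
collection_b = {"CCO", "CCC", "c1ccncc1", "CC(=O)O", "CO"}

report = overlap_report(collection_a, collection_b)
print(report)
```

This works only as long as both collections fit into memory, which is exactly what breaks down for combinatorial Chemical Spaces.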

In general, the contents and chemical composition of a Chemical Space are shaped by three factors: the building blocks, the synthesis protocols, and the in-house knowledge. Commercial compound makers impress us all with the huge numbers of their unique building blocks, which is often perceived as an indicator of the achievable chemical diversity. Yet it is the synergy with the makers’ chemistry that creates the distinctiveness of a combinatorially built Chemical Space. An indole is available from multiple sources, but it is what can be done with it afterwards that spices up the possibilities.

The question remains: How many molecules are shared across commercial and public Chemical Spaces? Does each contain chemical novelty, or are they more or less the same? In any case, how much chemical information do they share? The thing is: combinatorial Chemical Spaces are too large to be compared molecule by molecule. Comparing n molecules of Space A with m molecules of Space B one by one quickly hits practical limits. But a remedy, even a “green” one, has been found for this seemingly unsolvable question, the answer to which is so important:

In a recent publication, Louis Bellmann et al. created an application (“SpaceCompare”) that avoids explicit enumeration. The tool does not need a supercomputer and is fast enough to be applied swiftly to the big Chemical Spaces out there. With SpaceCompare, Louis and team investigated and compared the composition of three commercial Chemical Spaces (REAL Space from Enamine, CHEMriya from OTAVA, and Galaxi® from WuXi) and the virtual Knowledge Space, which is based on publicly available building blocks and reactions. Their findings were, to put it mildly, surprising: The molecular overlap between the investigated combinatorial billions was almost negligible (see figure below). In most cases, far less than 1% of compounds were present in two Spaces, meaning that over 99% of all molecules(!) were exclusive to one compound maker’s Space. Let this sink in.

Figure: Chemical Space overlap between commercial Chemical Spaces and the virtual Knowledge Space. Figure modified after Bellmann et al.
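Why comparing Spaces without enumeration can be so much cheaper is easiest to see in a toy model. The sketch below is not the SpaceCompare algorithm; it is a deliberately simplified illustration under strong assumptions: both Spaces use the same single two-component reaction, and every pair of building blocks yields a distinct product. Under those assumptions, the number of shared products equals the product of the building-block intersections, so no product ever has to be built. All names and building-block IDs are hypothetical.

```python
from itertools import product


def enumerate_space(acids: set[str], amines: set[str]) -> set[str]:
    """Explicitly build every product; the cost grows with len(acids) * len(amines)."""
    return {f"{acid}>>{amine}" for acid, amine in product(acids, amines)}


def combinatorial_overlap(space_a, space_b) -> int:
    """Count shared products from the building-block intersections alone.

    Toy assumption: same reaction in both Spaces, and each building-block
    pair gives a unique product.
    """
    acids_a, amines_a = space_a
    acids_b, amines_b = space_b
    return len(acids_a & acids_b) * len(amines_a & amines_b)


# Hypothetical building-block IDs for two tiny Spaces.
space_a = ({"acid1", "acid2", "acid3"}, {"amine1", "amine2"})
space_b = ({"acid2", "acid3", "acid4"}, {"amine2", "amine3", "amine4"})

fast_count = combinatorial_overlap(space_a, space_b)
slow_count = len(enumerate_space(*space_a) & enumerate_space(*space_b))
print(fast_count, slow_count)
```

Both counts agree here, but only the enumerating route scales with the number of products. Real Chemical Spaces are harder: different reactions or different building blocks can lead to the same molecule, which this toy model ignores.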

In summary, it becomes evident that every Chemical Space has its own raison d’être. The underlying resources (building blocks, reagents) and the in-house knowledge (synthesis protocols, chemistry) result in unique content. Every drug hunter should consider all available options for their project, since the best solution for a particular challenge may well be contained in only one of the Spaces.


Did this get you excited about exploring the individual Chemical Spaces to discover accessible molecules? Download the Chemical Space Navigation Platform infiniSee now and dive into the billions of possibilities!