Combinatorial vs. Enumerated

Explained with Potatoes

This is a potato. And this potato contains 13.5 grams of carbohydrates.
In this example we will use this potato as a data storage unit to compare "combinatorial" and "enumerated" methods of compound handling.
For those interested: the variety of this potato is 'Sagitta'. The standard tuber has an average volume of 85.84 cm³ and weighs 91 grams.

Enumeration — Counting every molecule

Let's start with enumeration, that is, listing molecules one by one and storing each of them. Let's say grams are megabytes: our potato with 13.5 grams of carbohydrates then represents a data storage volume of 13.5 MB. In 13.5 MB it is possible to list ("enumerate") 201,600 drug-like molecules as SMILES together with their IDs.
The number decreases if further data such as 3D coordinates, physicochemical properties, or annotations are stored as well. With these included, 13.5 MB would be enough for only 2,711 molecules, a figure extrapolated from a molecule set of 2,450 entries with a size of 12.2 MB. This number would almost certainly shrink massively for structure-based methods (e.g. docking), where a growing number of generated poses has to be stored.
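The arithmetic behind these figures can be retraced with a few lines of Python. This is only a sketch: it uses the numbers quoted above and assumes that "MB" means 10^6 bytes, which the text does not state.

```python
# Figures quoted in the text; "MB" is treated as 10**6 bytes (an assumption).
POTATO_MB = 13.5              # grams of carbohydrates, read as megabytes
SMILES_ONLY_COUNT = 201_600   # molecules stored as SMILES + ID in 13.5 MB
RICH_SET_ENTRIES = 2_450      # reference set with 3D coordinates, properties, annotations
RICH_SET_MB = 12.2            # size of that reference set

# Average storage cost per molecule in each representation
bytes_per_smiles = POTATO_MB * 1e6 / SMILES_ONLY_COUNT
mb_per_rich_molecule = RICH_SET_MB / RICH_SET_ENTRIES

# How many fully annotated molecules fit into one potato
rich_molecules_per_potato = int(POTATO_MB / mb_per_rich_molecule)

print(f"SMILES + ID: about {bytes_per_smiles:.0f} bytes per molecule")
print(f"Annotated:   about {mb_per_rich_molecule * 1000:.1f} kB per molecule")
print(f"Annotated molecules per potato: {rich_molecules_per_potato}")
```

Under these assumptions, an annotated molecule costs roughly 75 times as much storage as a SMILES string with an ID, which is why the count per potato drops from 201,600 to 2,711.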

Combinatorics — Create solutions by providing rules

Now let's look at the combinatorial approach to storing molecules: we can fit a Chemical Space containing 290,000,000,000,000 (2.9 × 10¹⁴) molecules into 13.5 MB. This particular space is known as the Knowledge Space.
How is this possible? The Knowledge Space has been created from a diverse set of molecular building blocks and robust chemical reactions that can be performed in most synthesis labs by almost anyone with a background in chemistry. So instead of storing all the compounds, we store the recipe to make them. Searching this space with BioSolveIT applications can yield any of the 2.9 × 10¹⁴ molecules as a hit, by applying the reaction rules that describe how to combine one building block with another.
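The "store the recipe, not the compounds" idea can be illustrated with a toy example. Everything in the sketch below is hypothetical: the building blocks, the `couple` function, and the string it returns are placeholders and do not reflect the actual format or chemistry of any BioSolveIT Chemical Space.

```python
from itertools import islice, product
from math import prod

# Hypothetical toy building blocks (fragment SMILES), invented for illustration.
ACIDS = ["CC(=O)O", "OC(=O)c1ccccc1", "OC(=O)C1CC1"]
AMINES = ["NC", "NCC", "NCc1ccccc1", "NC1CCCCC1"]


def couple(acid, amine):
    """Toy 'reaction rule': label the product, without doing real chemistry."""
    return f"amide({acid} + {amine})"


def space_size(*block_lists):
    """Number of products the rule can generate, computed without generating them."""
    return prod(len(blocks) for blocks in block_lists)


def iter_products(acids, amines):
    """Lazily generate products; nothing is stored unless the caller keeps it."""
    for acid, amine in product(acids, amines):
        yield couple(acid, amine)


stored_items = len(ACIDS) + len(AMINES)        # what has to be kept on disk
virtual_products = space_size(ACIDS, AMINES)   # what the recipe can produce
first_three = list(islice(iter_products(ACIDS, AMINES), 3))

print(f"{stored_items} stored building blocks describe {virtual_products} products")
print(first_three)
```

Storage grows with the sum of the building block lists, while the number of reachable products grows with their product. Three lists of 10,000 blocks each, for example, would describe 10^12 combinations from only 30,000 stored entries.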
It is possible to create your own Chemical Space with one of BioSolveIT's Amazing Workflows.

Size comparison: Enumerated vs. Combinatorial

Given our premise from above, one potato being equivalent to 13.5 MB of storage, one would need billions of potatoes, namely ~19,400,000,000 (≙ 274.9 petabytes of data), to enumerate 2.9 × 10¹⁴ molecules. Piled up, that is roughly the volume of the Eiffel Tower!
The Eiffel Tower volume equivalent can differ depending on the variety of potato used. In this case we calculated with the above-mentioned 'Sagitta' and a volume of 85.84 cm³. The volume of the Eiffel Tower was assumed to be 1.7 × 10⁶ m³ for this calculation.
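The volume comparison can be checked with a short calculation. The sketch below uses only the figures quoted above and makes one simplifying assumption: the potatoes are treated as a solid block, ignoring the gaps that a real pile would have.

```python
POTATOES = 19_400_000_000      # potato count quoted in the text
POTATO_VOLUME_CM3 = 85.84      # average 'Sagitta' tuber volume from the text
EIFFEL_VOLUME_M3 = 1.7e6       # Eiffel Tower volume assumed in the text
CM3_PER_M3 = 100 ** 3          # 1 m^3 = 1,000,000 cm^3

# Total potato volume, assuming no empty space between tubers
pile_volume_m3 = POTATOES * POTATO_VOLUME_CM3 / CM3_PER_M3
towers = pile_volume_m3 / EIFFEL_VOLUME_M3

print(f"Potato pile: {pile_volume_m3:.3g} m^3 (about {towers:.2f} Eiffel Towers)")
```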

The Magic of the Combinatorial Approach

But how do so many molecules fit inside such a Chemical Space?
The compounds are generated combinatorially during the search by applying the given chemical reaction rules to the available building blocks; only what is important gets created and is subsequently enumerated.
The underlying concept is the FTrees (Feature Trees) algorithm. Starting from a building block (the symbol elements to the right), the algorithm follows only routes that create compounds fulfilling the desired parameters. Routes that lead to compounds that cannot match the query are not even looked at.
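The principle of skipping routes that cannot match the query can be shown with a small sketch. This is not the FTrees algorithm: it is a generic bound-based pruning over a made-up additive descriptor (a size value per building block), and all names and numbers are hypothetical.

```python
from math import prod

# Hypothetical building blocks per position: name -> size descriptor.
POSITIONS = [
    {"A1": 4, "A2": 9, "A3": 15, "A4": 22},
    {"B1": 3, "B2": 8, "B3": 14},
    {"C1": 2, "C2": 6, "C3": 12, "C4": 20},
]


def search(positions, low, high):
    """Return (hits, explored) for combinations whose summed size lies in [low, high]."""
    n = len(positions)
    # Smallest and largest contribution still obtainable from positions i..end
    min_rest = [0] * (n + 1)
    max_rest = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_rest[i] = min(positions[i].values()) + min_rest[i + 1]
        max_rest[i] = max(positions[i].values()) + max_rest[i + 1]

    hits, explored = [], 0

    def extend(i, chosen, total):
        nonlocal explored
        if i == n:
            hits.append(tuple(chosen))
            return
        for name, size in positions[i].items():
            new_total = total + size
            # Prune: no completion of this route can land inside [low, high].
            if new_total + min_rest[i + 1] > high or new_total + max_rest[i + 1] < low:
                continue
            explored += 1
            extend(i + 1, chosen + [name], new_total)

    extend(0, [], 0)
    return hits, explored


hits, explored = search(POSITIONS, low=10, high=14)
total = prod(len(blocks) for blocks in POSITIONS)
print(f"{len(hits)} hits after exploring {explored} partial routes; "
      f"{total} full combinations exist")
```

Only the combinations that reach the end of a route are ever built. A branch is dropped as soon as its best and worst possible completions both miss the query window, so the remaining combinations are never generated.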

Focus on what is important

Any of the 2.9 × 10¹⁴ molecules can, in principle, be found. Once the algorithm has identified the best candidates, only these are enumerated and shown. The remaining building block combinations stay a purely virtual possibility. Thus, data resources are invested in only a fraction of the molecules, which allows efficient exploration of the Chemical Space.
Combinatorial methods can therefore break the limits of structure-based approaches, or feed them with structures that are likely to be of great interest for hit enrichment. The ability to focus only on promising compound candidates increases the likelihood of finding something that is active.

Stop counting potatoes.
Start thinking in Chemical Spaces!