FTrees: the science of similarity searching

how it works

FTrees represents a molecule as a tree structure (illustrated below). Such a tree is composed of nodes (representing functional groups) that are connected so as to reflect the overall topology of the molecule. Each node carries a profile of the physicochemical properties of the substructure that it represents. The following attributes are captured:

Feature Tree
  • spatial volume
  • ring or not
  • pharmacophore profile:
    • donor
    • acceptor
    • amide-like
    • aromatic
    • hydrophobic
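
To make the attribute list concrete, here is a minimal sketch of how such a tree could be held in memory. It is not the FTrees data model: the class names `FeatureNode` and `FeatureTree`, the field names, and all numeric values are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureNode:
    """One node of an illustrative feature tree (all names are hypothetical)."""
    label: str
    volume: float             # spatial volume, arbitrary units (invented values)
    is_ring: bool             # ring or not
    donor: float = 0.0        # pharmacophore profile entries
    acceptor: float = 0.0
    amide_like: float = 0.0
    aromatic: float = 0.0
    hydrophobic: float = 0.0

    def profile(self):
        """Return the pharmacophore profile as a tuple in a fixed order."""
        return (self.donor, self.acceptor, self.amide_like,
                self.aromatic, self.hydrophobic)


@dataclass
class FeatureTree:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # pairs of node indices

    def add_node(self, node):
        """Append a node and return its index."""
        self.nodes.append(node)
        return len(self.nodes) - 1

    def connect(self, i, j):
        """Record an undirected edge between two node indices."""
        self.edges.append((i, j))

    def is_tree(self):
        """True if the graph is connected and acyclic (n nodes, n - 1 edges)."""
        n = len(self.nodes)
        if n == 0 or len(self.edges) != n - 1:
            return False
        neighbours = {i: [] for i in range(n)}
        for i, j in self.edges:
            neighbours[i].append(j)
            neighbours[j].append(i)
        seen, stack = {0}, [0]
        while stack:
            for k in neighbours[stack.pop()]:
                if k not in seen:
                    seen.add(k)
                    stack.append(k)
        return len(seen) == n


# Toy two-node example: a phenyl ring attached to an amide group.
toy = FeatureTree()
ring = toy.add_node(FeatureNode("phenyl", volume=75.0, is_ring=True,
                                aromatic=1.0, hydrophobic=1.0))
amide = toy.add_node(FeatureNode("amide", volume=40.0, is_ring=False,
                                 donor=1.0, acceptor=1.0, amide_like=1.0))
toy.connect(ring, amide)
```

Keeping the profile as a fixed-order tuple makes two nodes directly comparable entry by entry, which is what a node-to-node comparison needs.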

Two trees are then aligned with each other, much like a sequence alignment. For example, there can be gaps: in the example of two PAF antagonists below, unmatched substructures have no coloring around them. The alignment provides a mapping of corresponding nodes between the two trees (illustrated by the same coloring of corresponding substructures). Mapped nodes are compared based on their property profiles, resulting in a "Local Similarity". The "Global Similarity" (0.805 below) is essentially an overall average; it can be used to categorize multiple molecules by how similar they are, where 0 is dissimilar and 1 is identical.

The coloring of the substructures helps the user identify which substructure of the query is matched onto which substructure of the hit molecule. Among all possible alignments (or mappings), FTrees reports the one with the highest "Global Similarity".
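
The relationship between local and global similarity can be sketched in a few lines. This is an illustration under stated assumptions, not the FTrees scoring function: the min/max overlap used for the local score, the plain average used for the global score, and the choice to score gaps as 0 are all assumptions, and the mapping is taken as given rather than searched for.

```python
def local_similarity(p, q):
    """Min/max overlap of two non-negative profiles (assumed measure).

    Returns 1.0 for identical profiles and 0.0 for profiles with no overlap.
    """
    overlap = sum(min(a, b) for a, b in zip(p, q))
    union = sum(max(a, b) for a, b in zip(p, q))
    return overlap / union if union else 1.0


def global_similarity(query, hit, mapping):
    """Average the local scores over a given mapping (assumed aggregation).

    `mapping` is a list of (query_index, hit_index) pairs; None on either
    side marks a gap, which is scored as 0.0 in this sketch.
    """
    scores = []
    for qi, hi in mapping:
        if qi is None or hi is None:
            scores.append(0.0)
        else:
            scores.append(local_similarity(query[qi], hit[hi]))
    return sum(scores) / len(scores) if scores else 0.0


# Profiles in the order (donor, acceptor, amide-like, aromatic, hydrophobic).
query_profiles = [(1.0, 0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0, 1.0)]
hit_profiles = [(1.0, 0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0, 0.0),
                (0.0, 1.0, 0.0, 0.0, 0.0)]
mapping = [(0, 0), (1, 1), (None, 2)]  # third hit node is unmatched (a gap)

score = global_similarity(query_profiles, hit_profiles, mapping)
```

In this toy case the first pair matches perfectly, the second pair overlaps by half, and the gap contributes nothing, so the average over the three positions is 0.5. A full implementation would also have to search for the mapping that maximizes this value.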

scaffold hopping

The primary use case for FTrees is: "Find me a molecule with similar properties, but a different scaffold". For this purpose, large libraries are virtually screened and the top-x compounds are selected for closer inspection. As you go down such a hit list, molecules appear with less and less structural similarity. Typically, the most interesting results lie in the similarity range between 0.7 and 0.9: similar enough in terms of pharmacophore properties, but highly enriched with scaffold hops.
The example on the right shows a screen of the WDI for D4 antagonists[1]. The query molecule produced a list of hit molecules, three of which are shown as examples. Two observations should be noted here. First, the orange lines indicate the ranks of active molecules, which shows that sorting by FTrees similarity enriches actives towards the top of the list. Second, comparing the query with the molecules on ranks 1, 3, and 22 shows that the scaffolds of these molecules look increasingly different, with a clear scaffold hop for the compound on rank 22 at a similarity of 0.899.
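
The post-processing of such a hit list can be sketched as follows, assuming the similarity scores are already available as a simple name-to-score mapping. The function names, compound names, and scores are invented for illustration; only the 0.7 to 0.9 window comes from the text above.

```python
def rank_hits(scores, top_x=None):
    """Sort compounds by descending similarity and optionally keep the top x."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_x is None else ranked[:top_x]


def scaffold_hop_window(ranked, low=0.7, high=0.9):
    """Keep hits whose similarity lies within [low, high], preserving rank order."""
    return [(name, s) for name, s in ranked if low <= s <= high]


# Hypothetical screening output: compound name -> global similarity to the query.
scores = {
    "cmpd_A": 0.95,
    "cmpd_B": 0.88,
    "cmpd_C": 0.74,
    "cmpd_D": 0.61,
    "cmpd_E": 0.81,
}

ranked = rank_hits(scores)
candidates = scaffold_hop_window(ranked)
```

Here `candidates` holds `cmpd_B`, `cmpd_E`, and `cmpd_C` in that order: the near-identical `cmpd_A` and the weakly similar `cmpd_D` fall outside the window.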


FTrees has been shown to be orthogonal to other 2D descriptors. This means that one method is likely to pick up what other methods would miss, and vice versa. The example below is taken from a publication by Pfizer[2], in which the authors used Tropisetron to search a fragment space (cf. CoLibri). The results show that these hits would never have been picked up with Daylight fingerprints ("DY") or ECFPs ("PP"), whose scores are 0.5 and lower, whereas the FTrees similarities ("score") are all 0.9 and higher.

chemical intuition

All the examples above show how FTrees' chemical intuition works. By looking at molecules "fuzzily", it sees them more as a target would: as pharmacophore features in a certain topological arrangement rather than as exact structures. Because the molecules are aligned, the chemist can see directly which areas of two molecules correspond to each other, which helps in gaining an early understanding of the SAR. Datasets comprising millions of molecules can be processed easily, so that hit sets can quickly be boiled down to a manageable size.

[1] Data from: M.A. Sanner
Selective dopamine D4 receptor antagonists.
Exp. Opin. Ther. Patents, 1998, 8:4, pp 383-393
[2] M. Böhm, T.-Y. Wu, H. Claußen, and C. Lemmen
Similarity Searching and Scaffold Hopping in Synthetically Accessible Combinatorial Chemistry Spaces.
J. Med. Chem., 2008, 51 (8), pp 2468-2480