Boosting descriptors for similarity searches: feature trees trained by machine learning
Marcus Gastreich, Sally Ann Hindle, and Christian Lemmen
BioSolveIT GmbH, An der Ziegelei 79, 53757 Sankt Augustin, Germany
and
Jun Liao and Manfred Warmuth
Computer Science Department, University of California Santa Cruz, USA
FTrees, a program built on the Feature Trees descriptor, is a very fast and effective tool for similarity searching. Each compound is described as a tree that preserves the molecular topology, with physico-chemical properties assigned to the tree nodes. The similarity between two compounds is the score of the optimal superposition of their trees.
Multiple Feature Trees can be 'overlaid' into a so-called model that represents the characteristics of a series of compounds. Models can also store pharmacophore-related information.
Since the constituent parts of a model can be assigned weights, machine learning procedures can be used to adjust these weights and thereby improve the predictive power of the model.
This post-processing of Feature Trees-based calculations is highly time-efficient and offers a distinct advantage: through backmapping, the molecular features important for biological activity can be identified.
We report on the advantages of a specially developed machine learning algorithm.
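To make the idea of weight adjustment and backmapping concrete, the following is a minimal sketch under stated assumptions, not the algorithm reported here. It assumes that each compound has already been reduced to a list of per-part match scores in [0, 1] against the parts of a model, and that activity labels are 1.0 (active) or 0.0 (inactive). The update rule is a generic exponentiated-gradient step with squared loss, chosen only for illustration; all function names, part names, and numbers are hypothetical and are not part of FTrees.

```python
import math

def weighted_score(weights, part_scores):
    """Model score of one compound: weighted sum of its per-part match scores."""
    return sum(w * s for w, s in zip(weights, part_scores))

def train_weights(examples, n_parts, eta=0.5, epochs=50):
    """Adjust part weights with exponentiated-gradient updates (squared loss).

    examples: list of (part_scores, label) pairs, label 1.0 = active, 0.0 = inactive.
    The weights stay positive and are renormalized to sum to 1 after each step.
    """
    weights = [1.0 / n_parts] * n_parts
    for _ in range(epochs):
        for part_scores, label in examples:
            error = weighted_score(weights, part_scores) - label
            # Multiplicative update: parts that push the score the wrong way shrink.
            weights = [w * math.exp(-2.0 * eta * error * s)
                       for w, s in zip(weights, part_scores)]
            total = sum(weights)
            weights = [w / total for w in weights]
    return weights

def rank_parts(weights, names):
    """Backmapping in its simplest form: order model parts by learned weight."""
    return sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: three model parts, two actives, two inactives.
PART_NAMES = ["donor", "aromatic ring", "hydrophobe"]
EXAMPLES = [
    ([0.9, 0.8, 0.2], 1.0),
    ([0.8, 0.1, 0.9], 1.0),
    ([0.1, 0.9, 0.8], 0.0),
    ([0.2, 0.7, 0.1], 0.0),
]

weights = train_weights(EXAMPLES, len(PART_NAMES))
ranking = rank_parts(weights, PART_NAMES)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```

In this toy set only the first part matches well in both actives and poorly in both inactives, so it ends up with the largest weight. Reading the ranking back onto the parts of the model is the kind of inspection that backmapping makes possible.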