
Molecular Similarity

Based on: Chapter 1 of Foodinformatics, by Martinez-Mayorga, K. & Medina-Franco, J.L.,
and "Molecular Similarity in Medicinal Chemistry" by Maggiora, G. et al.
Brandon Meza-González

September 25, 2018

Although molecular similarity can be a subjective concept, there have been various attempts to quantify the
similarity of compounds through their properties, which can be reactivity, electronic or structural features.
Molecular similarity is an important concept in chemoinformatics because of its applications in medicinal
chemistry and its ability to predict properties from available compound information.
A computational similarity measure has three basic components: (I) a representation of certain molecular
properties, (II) a weighting of these properties, and (III) a similarity function that combines all the computed
information in order to compare molecules. The literature describes different similarity measures, categorized
by their implementation; some of them are listed below.

Set-Based Similarity Measures Binary structural fingerprints (FPs), compared through similarity and
dissimilarity coefficients, are widely used with the help of molecule sets or dictionaries. These methods are
computationally inexpensive but can suffer a notable loss of structural information. The most popular
similarity coefficients are the Tanimoto coefficient, which expresses the number of features that two molecules
have in common relative to the number of distinct features present in either of them; the Dice coefficient, which
differs from Tanimoto in that the common features are compared with the arithmetic mean of the two feature
counts; and the Cosine coefficient, which uses the geometric mean instead.
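
The three coefficients described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited works: each fingerprint is assumed to be stored as a set of "on" bit indices, the names mol_a and mol_b and their contents are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
import math

def tanimoto(a: set, b: set) -> float:
    """Shared features divided by the distinct features present in either set."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0  # 0.0 for two empty sets is an assumption

def dice(a: set, b: set) -> float:
    """Shared features divided by the arithmetic mean of the two set sizes."""
    common = len(a & b)
    arithmetic_mean = (len(a) + len(b)) / 2
    return common / arithmetic_mean if arithmetic_mean else 0.0

def cosine(a: set, b: set) -> float:
    """Shared features divided by the geometric mean of the two set sizes."""
    common = len(a & b)
    geometric_mean = math.sqrt(len(a) * len(b))
    return common / geometric_mean if geometric_mean else 0.0

# Hypothetical fingerprints, given as the indices of the bits that are set.
mol_a = {1, 4, 7, 9}
mol_b = {1, 4, 8, 9, 12}

print(tanimoto(mol_a, mol_b))  # 3 shared / 6 distinct = 0.5
print(dice(mol_a, mol_b))      # 3 / 4.5
print(cosine(mol_a, mol_b))    # 3 / sqrt(20)
```

The three functions share the same numerator and differ only in the denominator, which is why Dice and Cosine never score a pair lower than Tanimoto does.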

Vector-Based Similarity Measures Diverse macroscopic chemical properties are used as the components of
an abstract p-dimensional vector. Properties can be pKa, heat capacity, ionization potential, HOMO or
LUMO energies, and so on. Vector-based similarity coefficients are, in a sense, extensions of the Tanimoto,
Dice and Cosine set-based coefficients in which the relationships are computed over continuous, real-valued
vectors. This makes them a more powerful tool for describing molecular similarity, since a more complex set
of properties is captured in a single mathematical entity.
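
As a hedged sketch of how the set-based coefficients extend to real-valued vectors, the commonly used continuous forms replace the count of shared features with a dot product and the set sizes with squared vector norms. The vectors x and y below are arbitrary illustrative numbers, not measured properties, and in practice properties with different units would need to be scaled first (an assumption not covered in the text).

```python
import math
from typing import Sequence

def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension p")
    return sum(xi * yi for xi, yi in zip(x, y))

def tanimoto_vec(x: Sequence[float], y: Sequence[float]) -> float:
    """Continuous Tanimoto: x.y / (|x|^2 + |y|^2 - x.y)."""
    xy = dot(x, y)
    denom = dot(x, x) + dot(y, y) - xy
    return xy / denom if denom else 0.0  # 0.0 for two zero vectors is an assumption

def dice_vec(x: Sequence[float], y: Sequence[float]) -> float:
    """Continuous Dice: x.y over the arithmetic mean of |x|^2 and |y|^2."""
    denom = (dot(x, x) + dot(y, y)) / 2
    return dot(x, y) / denom if denom else 0.0

def cosine_vec(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine: x.y over the geometric mean of |x|^2 and |y|^2."""
    denom = math.sqrt(dot(x, x) * dot(y, y))
    return dot(x, y) / denom if denom else 0.0

# Arbitrary illustrative 3-dimensional property vectors (already scaled).
x = [1.0, 2.0, 2.0]
y = [2.0, 1.0, 2.0]

print(tanimoto_vec(x, y))  # 8 / (9 + 9 - 8) = 0.8
print(dice_vec(x, y))      # 8 / 9
print(cosine_vec(x, y))    # 8 / 9
```

When every component is 0 or 1, these expressions reduce to the set-based coefficients, which is the sense in which they extend them.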

Aggregating Similarity Measures Nowadays the study of molecular similarity is not carried out with only one
similarity measure. In order to improve results, scientists have taken into account different combinations of
properties and coefficients. Data fusion methods fall under the more general rubric of data aggregation methods.
Implementations used to obtain better results include: (I) Similarity Fusion, which combines the searches
obtained with multiple measures and a single reference molecule; (II) Group Fusion, which uses multiple
reference molecules instead of a single one; (III) Turbo Similarity, a variant of group fusion that provides
a procedure for applying group fusion when only a single active is known.
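
The three fusion schemes above can be sketched as follows. This is only an illustration under stated assumptions: raw scores are fused (published protocols often fuse ranks instead), the mean and maximum fusion rules and the parameter k are choices made here, and the database, reference, and function names are all hypothetical.

```python
from statistics import mean

def tanimoto(a: set, b: set) -> float:
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0

def dice(a: set, b: set) -> float:
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def similarity_fusion(reference, database, measures, fuse=mean):
    """One reference, several measures: fuse the per-measure scores."""
    return {name: fuse([m(reference, fp) for m in measures])
            for name, fp in database.items()}

def group_fusion(references, database, measure, fuse=max):
    """Several references, one measure: fuse the per-reference scores."""
    return {name: fuse([measure(ref, fp) for ref in references])
            for name, fp in database.items()}

def turbo_similarity(reference, database, measure, k=1):
    """Single known active: take its k nearest neighbours from a first
    search as extra references, then apply group fusion."""
    first_pass = {name: measure(reference, fp) for name, fp in database.items()}
    nearest = sorted(first_pass, key=first_pass.get, reverse=True)[:k]
    references = [reference] + [database[name] for name in nearest]
    return group_fusion(references, database, measure)

# Hypothetical database of fingerprints (sets of "on" bit indices).
database = {"m1": {1, 2, 3}, "m2": {1, 2, 3, 4}, "m3": {7, 8}, "m4": {3, 4, 5}}
reference = {1, 2}

print(similarity_fusion(reference, database, [tanimoto, dice]))
print(group_fusion([reference, {3, 4, 5}], database, tanimoto))
print(turbo_similarity(reference, database, tanimoto, k=1))
```

In this toy data, m4 shares no features with the reference, so a single search scores it 0; the turbo variant gives it a nonzero score because it overlaps with m1, the nearest neighbour promoted to a reference.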

Role of Chemical Intuition and Experience On certain occasions, computed values may not express a degree
of similarity consistent with personal intuition. Synthetic chemists play an important role in the decision
process. Throughout history, humans have used the concepts that they understand; although similarity
functions are known to have improved, there will still be judgments that are not clear-cut and that rest only
on the experience of scientists.
