Molecular Diversity (2006) 10: 415-427
DOI: 10.1007/s11030-006-9018-4
© Springer 2006
Full-length paper
On an aspect of calculated molecular descriptors in QSAR studies of quinolone antibacterials
Payel Ghosh1, Megha Thanadath2 & Manish C. Bagchi1,*
1 Drug Design, Development and Molecular Modelling Division, Indian Institute of Chemical Biology, 4 Raja S.C. Mullick Road, Jadavpur, Calcutta 700032, India; 2 K.V.M College of Information Technology, Cherthala, Alleppey, Kerala, India
(* Author for correspondence, E-mail: mcbagchi@iicb.res.in, Tel.: +91 33 2473 3491/3493/0493/6793, Fax: +91 33 2473 5197, +91-33-2472 3967)
Received 25 October 2005; Accepted 18 January 2006
Key words: quantitative structure activity relationship, quinolone antibacterials, molecular descriptors, intermolecular similarity,
PERL programming, ridge regression
Summary
Tuberculosis infections that are resistant to conventional drug therapy have risen steadily over the last decade and, as a result, fluoroquinolone drugs are being used as a second line of treatment. Yet hardly any study has examined specific structure-activity relationships of quinolone antibacterials against mycobacteria. In this paper, an attempt has been made to establish quantitative structure-activity relationship (QSAR) models for a series of quinolone compounds against Mycobacterium fortuitum and Mycobacterium smegmatis. Because sufficient physicochemical data are lacking for the antimycobacterial compounds, it is very difficult to develop predictive methods based on experimental data. The present paper therefore develops QSARs from the standpoint of physicochemical, constitutional, geometrical, electrostatic and topological indices. Molecular descriptors have been calculated solely from the chemical structure of N-1, C-7 and 8 substituted quinolone compounds, and ridge regression models have been developed to explain the structure-activity relationship. An intermolecular similarity analysis, implemented as a computer program in the PERL language, has been used to compare the influence of the various molecular descriptors in different data subsets. The comparison of the relative effectiveness of the calculated descriptors in our ridge regression models gives rise to some interesting results.
Introduction
The greatest threats to tuberculosis control are the association of this disease with the HIV epidemic and the increase
in resistance to the most effective anti-tuberculosis drugs.
The global increase of multi-drug resistant M. tuberculosis
strains and intolerance of first line anti-tuberculosis drugs
such as isoniazid, rifampicin, pyrazinamide and ethambutol may cause major problems and necessitate modification of the standard therapy regimen [1]. Recently developed
drugs like 6-fluoro-4-quinolone-3-carboxylic acids seem to
be very effective in cases of severe intolerance of first line
anti-tuberculosis medication [2, 3]. Of these fluoroquinolone
drugs, sparfloxacin seems to be the most potent agent because of its broad-spectrum efficacies, both in vitro and
in vivo, better than those of ofloxacin and ciprofloxacin
against mycobacterial infections [4, 5]. Development of quinolones that are more active against gram-positive organisms and mycobacteria continues, with substitutions at N-1 and C-7 as well as at the 8 position of the quinolone ring, with a view to relating structural modification at these positions to activity against mycobacteria [6, 7]. These agents were evaluated for their activities against Mycobacterium fortuitum and Mycobacterium smegmatis, as the activities of the compounds against these two organisms were used as a measure of Mycobacterium tuberculosis activity. However, hardly any study has examined specific structure-activity relationships of the quinolone antibacterials against mycobacteria. Quantitative Structure Activity Relationship
(QSAR) studies are based on the premise that biological
response is a function of the chemical structure. Thus, the significant parameters of chemical structure have been defined in numerical terms for use in the development of specific QSAR models [8]. Computer-aided drug design methods, in general, have developed rapidly in the last few years [9], and graph theoretical methods, in particular, are found to be very useful in QSAR problems for performing a rational analysis of different pharmacological activities. The TOPological Substructural MOlecular DEsign (TOPS-MODE) approach has been introduced for the development of novel in silico, graph theoretical and topological methods for modeling the physicochemical and biological properties of a chemical. The TOPS-MODE method has been applied to the description of physicochemical properties of organic compounds as well as to the design of biologically active compounds. In this sense, the approach has been extended not only to the discovery of novel leads, but also to the study of the physicochemical and absorption properties of drugs. The rationality of the search for novel antibacterial drugs using the TOPS-MODE approach, and the validation of the method for describing the biological activity of a heterogeneous series of compounds, have been studied in detail in very recent years. This approach has largely proved able to generate good predictive linear models that account for the anti-microbial activity of a broad range of molecular structural patterns [10]. Topological indices (TI)
or numerical graph invariants constitute an important subset of theoretical molecular descriptors. TIs are numerical quantifiers of molecular topology and encode information regarding the size, shape, branching pattern, cyclicity and symmetry of the molecular graph. They are widely used in QSAR research and in biological activity prediction for tuberculostatic drug design [11-13]. Topological descriptors, formulated in a graph theoretic approach, have found wide application in modeling studies encompassing quantitative structure-activity relationships (QSARs), quantitative structure-property relationships (QSPRs) and quantitative structure-toxicity relationships [14-17]. The Markovian Chemicals In Silico Design (MARCH-INSIDE) approach, a very useful approach in drug design, has been used together with linear discriminant analysis (LDA) to develop a QSAR that classifies compounds as antibacterial or not within structurally heterogeneous series [18].
The present paper aims at developing quantitative structure activity relationships for N-1 and C-7 as well as 8
substituted quinolone derivatives from the standpoint of
physicochemical, constitutional, geometrical, electrostatic,
and topological indices. Since physicochemical data are not
always available to develop predictive models, the only alternative is to utilize the above-mentioned theoretical molecular descriptors that can be derived solely from chemical
structures in the development of structure-activity relationship models. The biological activity data were sub-divided into two groups using our computer program for structural similarity, and an attempt has been made to investigate and compare the pattern of QSAR models within the data subsets. This similarity-based classification of the database of quinolone derivatives helps us to draw conclusions about the influence of physicochemical vis-à-vis other topological descriptors on the activity profile.
The results, along with the utility and limitations of our proposed QSAR models based on topological descriptors and similarity programs, are presented in the following sections.
Methods
Biological activity data of quinolone compounds
The actions of the quinolone antibacterials against Mycobacterium fortuitum and Mycobacterium smegmatis have been studied by Renau et al. by considering the effect of structural modifications in N-1 and C-7 as well as 8-substituted quinolone derivatives. The biological activity data, in the form of the Minimum Inhibitory Concentration (MIC in μg/mL), were determined experimentally [6, 7]. The activity of the compounds against M. fortuitum was used as a barometer of M. tuberculosis activity, so these activities may be considered for the construction of a valid QSAR model. QSAR models developed by using experimental properties as independent variables are essentially property-property correlations, whereas models developed by using molecular descriptors calculated solely from the molecular structure of these quinolone compounds will give some insight into structure-property correlations. Our aim is to utilize these activity data for creating structure-property correlations, which may provide a better tool for rational drug design [19, 20].
Table 1 shows the chemical structures of quinolone substrates
used in our study and the heterocyclic-R groups attached to
the substrates along with their activity against M. fortuitum
and M. smegmatis.
The substituents at C-7 and N-1 positions with optimal
groups were chosen to examine how modifications at the
8 position affect anti-mycobacterial activity. From Table 1,
it is seen that the contribution of the 8 position to antimycobacterial activity is highly dependent upon the substituent of N-1. It may be mentioned that within the series
of compounds substituted at N-1 with a cyclopropyl group,
the COMe derivatives were among the most active when the
MICs against M. fortuitum and M. smegmatis are considered.
Computer program for structural similarity and data sub-grouping
The biological activity data of the above quinolone compounds were obtained from the results of the antibacterial and mycobacterial assays, and were reported in terms of MIC (in μg/mL). In order to develop QSAR models based on structural modifications and activities, and to make a valid comparison of these models for predicting the influence of the different sets of molecular descriptors, it is necessary to subgroup the data. One possible way of sub-grouping the data is by the structural similarity of the above quinolone compounds, computed with the atom pair method [21].
Table 1. Quinolone substrates and their activities considered in the present study
[Table body: structure drawings of the R1 and R7 groups. Columns recoverable from the layout: Comp No., R1, R7, a CH / CBr / COMe entry, and MIC values (μg/mL) against M. fortuitum (M.fort) and M. smegmatis (M.smeg).]
In the present study, an attempt has been made to classify the set of 69 quinolone compounds using a criterion of structural similarity. This criterion keeps a close relationship between the molecules belonging to each of the classes and their biological activity. To study structural similarity, it is essential to build a mathematical space in which chemical structures are represented as vectors whose components describe topological features characteristic of their chemical nature. These chemical structures are expected to be distributed in the mathematical space according to their structural characteristics, so that neighborhoods of similar molecules can be found. For a well-defined structural space, molecules with similar biological activity are expected to lie in the same neighborhood of structural similarity [22]. A set of well-chosen descriptors, such as physicochemical, geometrical, constitutional, electrostatic and topological descriptors, may be used as variables. These descriptors arise from graph theoretical studies, which are often used as a powerful
tool in rational drug design. Thus, quantitative molecular similarity analysis was performed to sub-group the set of quinolone antibacterials by similarity. An atom-pair-oriented approach to intermolecular similarity, following the principle of Carhart and implemented by our group as a computer program in PERL script [23], allows us to subdivide the entire database into three categories: (a) the whole set of 69 compounds, (b) compounds having more than 50% similarity with Sparfloxacin, a known fluoroquinolone tuberculostatic drug, and (c) compounds having more than 60% similarity with Sparfloxacin. The chemical structure of Sparfloxacin, with its biological activity values in MIC (μg/mL) against M. fortuitum and M. smegmatis, is given in Figure 1.
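The sub-grouping by similarity threshold can be sketched in a few lines. This is an illustrative Python sketch, not the authors' PERL program; the function name `subgroup_by_similarity` is hypothetical, and the five similarity values are a small excerpt read from Table 2 (compound number mapped to percentage similarity with Sparfloxacin).

```python
def subgroup_by_similarity(similarity_percent, threshold):
    """Return the compound numbers whose similarity exceeds `threshold` (in %)."""
    return sorted(c for c, s in similarity_percent.items() if s > threshold)


# Excerpt of Table 2: compound number -> similarity with Sparfloxacin (%)
similarity_percent = {20: 49.11, 24: 38.19, 33: 50.15, 48: 77.09, 54: 82.58}

over_50 = subgroup_by_similarity(similarity_percent, 50.0)  # category (b)
over_60 = subgroup_by_similarity(similarity_percent, 60.0)  # category (c)
print(over_50, over_60)
```

Category (a) is simply the full set of keys; the two thresholds then give nested subsets, since every compound above 60% is also above 50%.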
The computer program has two main tasks. The first module generates the atom pairs for each of the quinolone derivatives and determines the shortest path separation. The second module deals with the calculation of the intermolecular
similarity between any two compounds, based on the atom pairs and the shortest path separations determined in the first module. A distinctive feature of this program is that it can determine the intermolecular similarity between any two chemical structures simply from the positions of the atoms and bonds of the structures concerned, as specified in the input format. The intermolecular similarity of each of the 69 quinolone antibacterials considered in our present study with Sparfloxacin was generated using the above program and is presented in Table 2.
The computational approach for the generation of the atom pairs and the similarity calculation is given below, whereas the main program in PERL script is given in the supplementary section. An atom pair is a substructure composed
of two non-hydrogen atoms, i and j, and their interatomic separation,

<atom description i> <separation> <atom description j>

Figure 1. Sparfloxacin, with MIC = 0.06 and 0.13 against M. fortuitum and M. smegmatis, respectively

To find the interatomic separation, which is the shortest path distance between any two atoms in a chemical structure, we represent the structure in the form of a tree, in which each level of the tree structure corresponding to a particular atom shows the number of neighbors that atom is attached to. The program thus computes atom pairs from a specific input format. The first line of the input should be a forward slash (/), which represents the start of the input. The format for the line following the forward slash is:

<symbol><atom name i> (positions of the neighboring atoms, separated by commas (,))

The <symbol> can be either # or [second symbol missing in the source], depending on whether the atom has a double bond or a single bond, respectively.
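As a rough illustration of the first module, the sketch below enumerates atom pairs with a breadth-first search. It is written in Python rather than PERL and does not reproduce the authors' input format: the adjacency-list representation, the function names, the use of plain element symbols as atom descriptions, and the counting of separation in bonds are all assumptions made for this example.

```python
from collections import Counter, deque
from itertools import combinations


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: bonds on the shortest path from `start` to each atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def atom_pairs(labels, adjacency):
    """Count (description_i, separation, description_j) over all unordered atom pairs."""
    dists = {atom: shortest_path_lengths(adjacency, atom) for atom in adjacency}
    pairs = Counter()
    for i, j in combinations(sorted(adjacency), 2):
        if j not in dists[i]:
            continue  # atoms in disconnected fragments form no pair
        a, b = sorted((labels[i], labels[j]))  # canonical order of the two ends
        pairs[(a, dists[i][j], b)] += 1
    return pairs


# Hypothetical input: the heavy-atom graph of ethanol, C0-C1-O2
labels = {0: "C", 1: "C", 2: "O"}
adjacency = {0: [1], 1: [0, 2], 2: [1]}
ethanol_pairs = atom_pairs(labels, adjacency)
print(ethanol_pairs)
```

A molecule with N non-hydrogen atoms yields N(N-1)/2 atom pairs, so the three-atom example gives three pairs. A fuller atom description would also encode the bonding environment of each atom, as the classification step of the program does.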
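Molecular similarity, S(s, t), between any two structures s and t may be calculated as

$$S(s, t) = \frac{2}{d(s) + d(t)} \sum_{i} \min\left[n(i, s),\, n(i, t)\right]$$

where the sum runs over the distinct types i of atom pairs, n(i, s) and n(i, t) are the numbers of atom pairs of type i in s and t, and d(s) and d(t) represent the total numbers of atom pairs in s and t, respectively.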
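The similarity formula can be checked on a small worked case. The sketch below is an illustrative Python version, not the authors' PERL code; the function name and the two toy atom-pair counts are hypothetical, with each structure given as a mapping from atom-pair type to its number of occurrences.

```python
from collections import Counter


def atom_pair_similarity(pairs_s, pairs_t):
    """S(s, t) = 2 * sum_i min(n(i, s), n(i, t)) / (d(s) + d(t))."""
    d_s = sum(pairs_s.values())  # total number of atom pairs in s
    d_t = sum(pairs_t.values())  # total number of atom pairs in t
    if d_s + d_t == 0:
        return 0.0
    shared = sum(min(n, pairs_t.get(pair_type, 0)) for pair_type, n in pairs_s.items())
    return 2.0 * shared / (d_s + d_t)


# Toy atom-pair counts (hypothetical): type -> number of occurrences
s = Counter({("C", 1, "C"): 2, ("C", 1, "N"): 1})
t = Counter({("C", 1, "C"): 1, ("C", 2, "O"): 2})

similarity = atom_pair_similarity(s, t)
print(f"{100 * similarity:.2f}%")
```

Here only the C-C pair type is shared, with a minimum count of 1, and d(s) = d(t) = 3, so S = 2 x 1 / 6, about 33.33%. Two identical structures give S = 1, i.e. 100%.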
Computational approach for the generation of atom pairs from two chemical structures
1. Check whether the input data files of the chemical structure, together with the bonds, neighbours and atomic symbols for the compounds, exist. If no, terminate the program. If yes, continue.
2. Calculate the shortest path from each atom to all other atoms using a tree structure representation in which the levels of the tree are treated as array positions.
3. Identify the initial and terminal atoms from the input data file for obtaining atom descriptions.
4. Classify each atom from its environment, consisting of the bonds and neighbouring atom(s) associated with it.
5. Print the obtained results in the atom-pair format.
6. Store the calculated atom pairs of the two chemical structures in files, comp1.txt in the first iteration and comp2.txt in the next iteration, for further analysis. Go to step 1 for the determination of atom pairs for the second structure.
7. Exit the program.

Similarity calculation for the two compounds
1. Calculate the number of atom pairs from the files, viz. comp1.pl and comp2.pl.
2. Count the similar types of atom pairs separately for each compound.
3. Compare the count of each atom-pair type from both structures and take the minimum of the numbers of occurrences.
4. Obtain the total count of these minimum numbers of occurrences.
5. Substitute these values in the calculation of the structural similarity between the two compounds.
6. Print the result, i.e. the molecular similarity between the two compounds as a percentage.
7. Exit the program.

Theoretical molecular descriptors calculation
The molecular descriptors used in the present paper fall into four categories, viz. (a) physicochemical, (b) constitutional and geometrical, (c) electrostatic and (d) topological descriptors. The physicochemical descriptors consist of molecular descriptors such as AlogP98 value, AMR value, buffer solubility, polarizability, vapour density, water solubility, etc. Descriptors such as formal charges, fraction of rotatable bonds, number of rigid bonds, number of rings, number of charged groups, etc. form the constitutional descriptors. The three-dimensional (3-D) or shape descriptors are more complex, encoding information about the three-dimensional aspects of molecular structure. The electrostatic descriptors include
charge polarization, local dipole index, maximum positive and negative charges, general polarity parameters, relative charge, etc., whereas the topological descriptors are the biggest set of molecular descriptors, which may again be subdivided into two classes: topostructural and topochemical descriptors. The topostructural descriptors encode information strictly on the neighborhood and connectivity of atoms within the molecule, while the topochemical descriptors encode information related to both the topology of the molecule and the chemical nature of the atoms and bonds within it.

Table 2. Structural similarity of quinolone derivatives against Sparfloxacin

Comp no.   Similarity with        Comp no.   Similarity with
           Sparfloxacin (%)                  Sparfloxacin (%)
 1         61.16                   2         55.75
 3         69.91                   4         79.09
 5         64.58                   6         54.05
 7         57.06                   8         55.03
 9         49.21                  10         60.86
11         61.45                  12         58.16
13         59.52                  14         60.36
15         57.18                  16         46.43
17         43.30                  18         52.89
19         57.65                  20         49.11
21         42.23                  22         44.62
23         43.27                  24         38.19
25         46.17                  26         47.21
27         44.39                  28         49.13
29         56.57                  30         63.13
31         44.25                  32         48.18
33         50.15                  34         45.13
35         51.78                  36         57.06
37         46.36                  38         47.79
39         50.20                  40         55.66
41         61.94                  42         67.99
43         50.07                  44         55.05
45         57.18                  46         59.29
47         67.99                  48         77.09
49         52.40                  50         59.29
51         59.53                  52         64.01
53         73.40                  54         82.58
55         56.79                  56         64.01
57         64.75                  58         63.16
59         71.88                  60         79.89
61         56.08                  62         63.44
63         63.49                  64         47.79
65         54.05                  66         59.81
67         43.72                  68         46.90
69         49.66
In our present paper, we have used the software package PreADMET [24], which is a web-based application for predicting ADME data and building drug-like libraries using in silico methods. Two commercial editions of PreADMET are available, (i) standard and (ii) professional. This program, which has been developed in response to the need for rapid prediction of drug-likeness and ADME/toxicity data, can calculate about 955 molecular descriptors, including constitutional, geometrical, topological, electrostatic and physicochemical descriptors. The input file may be created either by drawing the chemical structure or by using an appropriate SMILES notation of the compound concerned. A total of 444 molecular descriptors were calculated for our present investigation using the PreADMET program and, prior to model development, the set of calculated descriptors was reduced from 444 to 294. Descriptors were removed either because they had a constant (or nearly constant) value for all of the compounds, or because they were perfectly correlated with another descriptor. Table 3 lists the symbols of the calculated molecular descriptors used in our present study together with their corresponding groups.
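The descriptor reduction step can be illustrated as follows. This is a minimal NumPy sketch under stated assumptions, not the procedure actually used with PreADMET: the function name `prune_descriptors`, the two tolerances, and the toy descriptor matrix are all hypothetical, and "nearly constant" is approximated here by a variance threshold.

```python
import numpy as np


def prune_descriptors(X, names, var_tol=1e-12, corr_tol=1.0 - 1e-9):
    """Drop (near-)constant columns, then columns perfectly correlated with a kept one."""
    X = np.asarray(X, dtype=float)
    varying = [j for j in range(X.shape[1]) if X[:, j].var() > var_tol]
    selected = []
    for j in varying:
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, i])[0, 1]) >= corr_tol for i in selected
        )
        if not redundant:
            selected.append(j)
    return X[:, selected], [names[j] for j in selected]


# Toy matrix: rows are compounds, columns are descriptors (hypothetical values)
names = ["a", "const", "b", "c"]
X = np.array([
    [1.0, 5.0, 3.0, 1.0],
    [2.0, 5.0, 5.0, 0.0],
    [3.0, 5.0, 7.0, 2.0],
    [4.0, 5.0, 9.0, 1.0],
])
X_reduced, kept = prune_descriptors(X, names)
print(kept)
```

In the toy data, `const` takes the same value for every compound and `b` equals 2a + 1, so only `a` and `c` survive. Which member of a perfectly correlated pair is kept depends on column order; the paper does not say how that choice was made.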
Statistical analysis
Multivariate regression analysis (MRA), one of the oldest data reduction methodologies, continues to be widely used in QSAR [25] as it does not impose any restriction on the type and number of graph invariants used in structure-property-activity studies. For the MRA to be statistically valid, it is necessary to restrict the maximal number of descriptors, which will depend on the number of compounds investigated [26, 27]. To avoid ambiguities in the interpretation of the regression, only a few parameters, or ideally a single parameter, may be used. However, describing the structure-activity relationship of chemical compounds requires a large number of physicochemical and molecular descriptors. Theoretical molecular descriptors, such as constitutional, geometric, electrostatic and topological descriptors, have found wide application in quantitative structure-activity relationship modeling [28, 29]. To establish such a relationship between the activity and the structural descriptors of the quinolone compounds under consideration, it is essential to develop a regression or input-output model. Multiple linear regression and partial least squares are common for the development of linear QSAR models, while methods such as artificial neural networks are used for non-linear modeling. Topological indices in particular are indispensable in the development of successful multiple regression analyses for QSAR in rational drug design. The present QSAR study of quinolone antibacterials involves a large number of topological as well as physicochemical descriptors.
Table 3. List of molecular descriptors used in this study

Descriptor class: Constitutional Descriptors
Descriptor names: No. amino groups primary, No. amino groups secondary, No. amino groups tertiary, No. ester groups, No. halogen atoms, Molecular weight, No. Total atoms, No. Rotatable bonds, Fraction of Rotatable bonds, No. Rigid bonds, No. Rings, No. Aromatic rings, No. single bonds, No. aromatic bonds, No. H-bond acceptors, Ratio donors to acceptor.

Descriptor class: Geometrical Descriptors
Descriptor names: 2D-VDW surface, 2D-VDW volume, 2D-VSA hydrophobic, Fraction of 2D-VSA hydrophobic, 2D-VSA hydrophobic sat, 2D-VSA hydrophobic unsat, 2D-VSA other, 2D-VSA polar, Fraction of 2D-VSA polar, 2D-VSA Hbond acceptor, 2D-VSA Hbond donor, 2D-VSA Hbond all, Fraction of 2D-VSA Hbond, Fraction of 2D-VSA chargeable groups, Topological PSA.

Descriptor class: Electrostatic Descriptors
Descriptor names: Max negative charge, Max positive hydrogen charge, Total negative charge, Total positive charge, Total absolute atomic charge, Charge polarization, Local dipole index, Polarity parameter, Relative positive charge, Relative negative charge, PPSA1 (Partial Positive Surface Area 1st type), PPSA2, PPSA3, PNSA1 (Partial Negative Surface Area 1st type), PNSA3, DPSA1 (Difference in Charged Partial Surface Area), DPSA2, DPSA3, FPSA1 (Fractional charged partial positive surface area 1st type), FPSA2, FPSA3, FNSA1 (Fractional charged partial negative surface area 1st type), FNSA3, WPSA1 (Surface weighted charged partial positive surface area 1st type), WPSA2, WPSA3, WNSA1 (Surface weighted charged partial negative surface area 1st type), WNSA3, RPCS (Relative positive charge surface area), RNCS (Relative negative charge surface area), Hydrophobic SA MPEOE, Positive charged polar SA MPEOE, Negative charged polar SA MPEOE, SADH1 (Surface area on donor hydrogens 1st type), SADH2 (Surface area on donor hydrogens 2nd type), SADH3 (Surface area on donor hydrogens 3rd type), CHDH1 (Charge on donatable hydrogens 1st type), CHDH2, CHDH3, SCDH1 (Surface weighted charged area on donor hydrogens 1st type), SCDH2, SCDH3, SAAA1 (Surface weighted charged area on acceptor atoms 1st type), SAAA2, SAAA3, CHAA1 (Charge on acceptors atoms 1st type), CHAA2, CHAA3, SCAA1 (Surface weighted charged area on acceptor atoms 1st type), SCAA2, SCAA3, HRNCS, HRNCG.

Descriptor class: Topological Descriptors
Descriptor names: Total structure connectivity index, Chi 0 (Simple zero order chi index), Chi 1, Chi 2, Chi 3 path (Simple third order path chi index), Chi 3 cluster (Simple 3rd order cluster chi index), Chi 4 path, Chi 5 path, Chi 4 path/cluster (Simple 4th order path/cluster chi index), VChi 0 (Valence zero order chi index), VChi 1, VChi 2, VChi 3 path (Valence 3rd order path chi index), VChi 4 path, VChi 3 cluster, VChi 4 path/cluster, VChi 5 path, Kier shape 1 (encodes the degree of cyclicity in the graph, decreases as graph cyclicity increases), Kier shape 2 (encodes the degree of central branching in the graph, decreases as the degree of central branching increases), Kier shape 3 (encodes the degree of separated branching in the graph, increases as the degree of separation in branching increases), Kier alpha 1 (1st Order Kappa Alpha Shape Index), Kier alpha 2, Kier alpha 3, Kier flexibility, Kier symmetry index, Kier steric descriptor, Delta Chi 0 (Delta zero order chi index), Delta Chi 1, Delta Chi 2, Delta Chi 3 path, Delta Chi 3 cluster, Delta Chi 4 path, Delta Chi 4 cluster, Chi 4 path/cluster, Delta Chi 5 path, Difference chi 0 (Difference simple zero order chi index), Difference chi 1, Difference chi 2, Difference chi 3, Difference chi 4, Difference chi 5, IC (information content index), BIC (bond information content), CIC (complementary information content), SIC (structural information content), IAC total (total information index of atomic composition), I adj equ (Information index based on the vertex adjacency matrix equality), I adj mag (Information index based on the vertex adjacency matrix magnitude), I adj deg equ (Information index based on the degree adjacency matrix equality), I adj deg mag, I dist equ (Information index based on the distance matrix equality), I dist mag (Information index based on the distance matrix magnitude), I edge adj equ (Information index based on the edge adjacency matrix equality), I edge adj mag (Information index based on the edge adjacency matrix magnitude), I edge adj deg equ, I edge adj deg mag, I edge dist equ, I edge dist mag, Wiener index (Half-sum of the off-diagonal elements of the distance matrix of a graph), Hyper Wiener index, Harary index (Half-sum of the off-diagonal elements of the reciprocal molecular distance matrix), 1st Zagreb (1st Zagreb index), 2nd Zagreb, Quadratic index, Rouvray index, 2-MTI (Schultz Molecular Topological Index (MTI)), 2-MTI prime (Schultz MTI by valence vertex degrees), Gutman MTI, Graph diameter, Graph radius, Graph Petitjean, Eccentric connectivity index, Eccentric adjacency index, Platt number, Odd-even index, Vertex degree-distance index, Ring degree-distance index, Balaban index JX, Balaban index JY, Xu (Xu index), Superpendentic index, Unipolarity distance matrix, Centralization distance matrix, Dispersion distance matrix, SC-0 (Subgraph Count Index of order 0), SC-1, SC-2, SC-3 path, SC-3 cluster, SC-4 path, SC-4 cluster, SC-4 path/cluster, SC-5 path, SC-6 path, SC-7 path, SC-8 path, SC-9 path, SC-10 path, Solvation chi 0 (Solvation zero order chi index), Solvation chi 1, Solvation chi 2, Solvation chi 3 path, Solvation chi 3 cluster, Solvation chi 4 path, Solvation chi 4 cluster, Solvation chi 4 path/cluster, Solvation chi 5 path, VS-0 (Valence Shell Count of order 0), VS-1, VS-2, VS-3, VS-4, VS-5, Molecular walk count 2, Molecular walk count 3, Molecular walk count 4, Molecular walk count 5, Path/walk 2, Path/walk 3, Path/walk 4, Path/walk 5, Narumi ATI (Narumi simple topological index (log)), Narumi HTI (Narumi harmonic topological index), Narumi GTI (Narumi geometric topological index), Pogliani index, Ramification index, Degree complexity, Graph vertex complexity, Graph distance complexity, Graph distance index, Mean square distance index, Mean distance deviation, Edge Wiener index, Edge Hyper Wiener index, Edge MTI, Edge Gutman MTI, Edge connectivity index, E-state SsCH3, E-state SssCH2, E-state SdsCH, E-state SsssCH, E-state SaasC, E-state SssssC, E-state SsssNH, E-state SdO, E-state S hydrophobic, E-state S hydrophobic unsat, E-state S polar, E-state S hbond donor, E-state S negative charged group, E-state SHssNH2, E-state SHdsCH, E-state SHCHnX, E-state SH hydrophobic, E-state SH polar, E-state SaaCH, E-state SdssC, E-state SssNH2, E-state SsssN, E-state SsOHl, E-state SsF, E-state S hydrophobic sat, E-state S none, E-state S hbond acceptor, E-state S positive charged group, E-state SHsssNH, E-state SHaaCH.

Descriptor class: Physicochemical Descriptors
Descriptor names: Polarizability Miller, SKlogP value, Water solubility, Vapor pressure, Buffer solubility, SK MP, AMR value (Calculated molecular refractivity index), Polarizability MPEOE, SKlogS value, SKlogPvp, SKlogS buffer, SK BP, AlogP98 value, AlogP98 002C, AlogP98 006C, AlogP98 008C, AlogP98 024C, AlogP98 026C, AlogP98 038C, AlogP98 040C, AlogP98 047H, AlogP98 051H, AlogP98 053H, AlogP98 057O, AlogP98 067N, AlogP98 071N, AlogP98 073N, AlogP98 075N, AlogP98 094Br, AlogP98 084F, AlogP98 001C, AlogP98 003C, AlogP98 005C, AlogP98 011C, AlogP98 029C, AlogP98 046H, AlogP98 050H, AlogP98 052H, AlogP98 060O, AlogP98 066N, AlogP98 068N.

Conventional regression, i.e. ordinary least squares (OLS), does not produce reliable models when the number of descriptors exceeds the number of observations [30, 31]. In this situation, the alternative statistical methods that may be considered are ridge regression (RR) [32], principal component regression (PCR) [33] and partial least squares (PLS) [34-36]. All three of these linear statistical methods are very useful and widely applicable when the number of independent variables greatly exceeds the number of observations and when the independent variables are highly inter-correlated. Each of these methods makes use of
the entire available pool of independent variables, as opposed to selecting a subset, which introduces bias and may result in the elimination of important parameters from our studies. From the works of Miller [31] and Friedman [38], it is also known that subsetting is less effective than methods that retain all of the independent variables and use other approaches to deal with the rank deficiency. Among RR, PCR and PLS, RR has been found to be the best of the three in multiple comparative studies [18, 38-40]. For this reason, the models based on the large set of constitutional and geometric, electrostatic and topological descriptors were developed using the RR methodology. RR, like PCR, transforms the descriptors to their principal components (PCs) and uses the PCs as descriptors. However, unlike PCR, RR retains all of the PCs and shrinks them differentially according to their eigenvalues. The RR vector of regression coefficients,
425
b, is given by
b = (XT X+k I)1 XT Y
where X is the matrix of descriptors, Y is the vector of
observed activities, I is an identity matrix, and k is a non-negative constant known as the ridge constant. If k = 0,
RR reduces to conventional OLS regression. The ridge
regression (RR) method has therefore been applied to our dataset of
quinolone compounds. Models have been developed for the various sets of molecular descriptors, and
the RR models for these descriptor classes are compared and discussed in the next
section.
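The ridge estimate above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the paper reports using the NCSS package); the function names, the synthetic data and the value of k are assumptions made purely for illustration. The second function shows the principal-component view described in the text: all components are retained, and the component with singular value d is scaled by d/(d² + k).

```python
import numpy as np

def ridge_coefficients(X, y, k):
    """Closed-form ridge estimate b = (X^T X + k I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def ridge_via_components(X, y, k):
    """Same estimate through the SVD X = U diag(d) V^T.

    Every principal component is kept; the one with singular
    value d contributes with the factor d / (d^2 + k).
    """
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ ((d / (d ** 2 + k)) * (U.T @ y))

# Hypothetical data: 10 compounds, 25 descriptors (more variables than observations).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 25))
X = X - X.mean(axis=0)          # centre the descriptors
y = rng.normal(size=10)
y = y - y.mean()                # centre the activities

b_direct = ridge_coefficients(X, y, k=1.0)
b_components = ridge_via_components(X, y, k=1.0)
print(np.allclose(b_direct, b_components))
```

With k = 0 and a full-rank X the first function returns the ordinary least-squares solution, which matches the remark that RR reduces to OLS in that case.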
Results and discussion

QSAR studies have been performed using the theoretical
molecular descriptors calculated with the PreADMET
Molecular Descriptor Calculation package and the experimentally derived biological activity data of the quinolone
derivatives against both M. fortuitum and M. smegmatis; the
ridge regression analysis is summarized in Table 4. We have considered all 69 quinolone compounds, i.e. the N-1 and C-7 as
well as the 8 substituted derivatives of quinolone antibacterials,
together with different subsets, in the statistical analysis.
The biological activity data have been further subdivided
by molecular similarity analysis
so that the results of the RR analysis can be compared effectively.
The similarity analysis divides the data into two further categories,
viz., (i) compounds having 50% or more intermolecular similarity with sparfloxacin, the fluoroquinolone drug used as a
tuberculostatic agent, and (ii) compounds having 60%
or more similarity with the drug. Ridge
regression models were therefore developed for four cases: the complete set of 69 quinolone compounds; the set of 51 N-1 and C-7
substituted quinolone derivatives; and two other groups of
data consisting of 48 and 22 compounds arising out of the
above similarity analysis. To calculate the molecular similarity between any two compounds, we have developed a
computer program in Perl. The main utility of
this program is that it generates the complete set of atom pairs of each compound and then calculates the structural similarity
from input files containing only minimal information, i.e. the positions of the atoms and bonds of the respective
compounds.
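The atom-pair calculation can be sketched as follows. This is a hedged Python illustration, not the authors' Perl program. It assumes a Carhart-style similarity (twice the summed minimum counts of shared atom pairs, divided by the total number of atom pairs in both molecules), a simplified atom type (element symbol plus number of non-hydrogen neighbours, with no pi-electron count) and a separation measured in bonds along the shortest path. The input layout (a list of element symbols and a list of bonded index pairs) and the names `atom_pairs` and `similarity` are likewise assumptions.

```python
from collections import Counter, deque

def atom_pairs(atoms, bonds):
    """Count atom pairs of a hydrogen-suppressed molecular graph.

    atoms: list of element symbols, one per non-hydrogen atom.
    bonds: list of (i, j) index pairs of bonded atoms.
    """
    neighbours = {i: set() for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)
    # Simplified atom type: element plus number of heavy-atom neighbours.
    types = [f"{el}X{len(neighbours[i])}" for i, el in enumerate(atoms)]

    def distances_from(src):
        # Breadth-first search gives shortest-path lengths in bonds.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    pairs = Counter()
    for i in range(len(atoms)):
        dist = distances_from(i)
        for j in range(i + 1, len(atoms)):
            if j in dist:
                a, b = sorted((types[i], types[j]))
                pairs[(a, dist[j], b)] += 1
    return pairs

def similarity(pairs_s, pairs_t):
    """Shared atom pairs (minimum counts) relative to the mean pair total."""
    shared = sum(min(n, pairs_t[key]) for key, n in pairs_s.items())
    total = sum(pairs_s.values()) + sum(pairs_t.values())
    return 2.0 * shared / total if total else 0.0

# Toy example: heavy-atom skeletons of ethanol (C-C-O) and propane (C-C-C).
ethanol = atom_pairs(["C", "C", "O"], [(0, 1), (1, 2)])
propane = atom_pairs(["C", "C", "C"], [(0, 1), (1, 2)])
print(similarity(ethanol, propane))
```

Multiplying the returned value by 100 gives a percentage, so subsets such as "50% or more similarity with sparfloxacin" would correspond to keeping the compounds whose score against the reference structure is at least 0.5.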
Thus the similarity-based subgrouping of the entire data set
has in effect arranged the
quinolone compounds by activity, since it is known that
structurally similar compounds may possess similar activity.
From the subgroups in Table 4, we can study the
pattern of influence of each descriptor class on activity in the
proposed QSAR models of the quinolone compounds, which helps us
to arrive at some conclusions. To study the pattern of influence
of the descriptor classes, it is necessary to compare the R²
values of our ridge regression models. The entire RR analysis
was done using the NCSS software package [41].
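The comparison of descriptor classes by R² can be sketched in Python as below. This is an illustration under stated assumptions rather than a reproduction of the NCSS analysis: R² is taken as the fitted (not cross-validated) coefficient of determination, one ridge constant k is used for every descriptor block, and the block names and synthetic data are hypothetical.

```python
import numpy as np

def r_squared(y_obs, y_fit):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_obs - y_fit) ** 2)
    ss_tot = np.sum((y_obs - np.mean(y_obs)) ** 2)
    return 1.0 - ss_res / ss_tot

def ridge_r2(X, y, k):
    """Fit a ridge model on centred data and return its fitted R^2."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    b = np.linalg.solve(Xc.T @ Xc + k * np.eye(X.shape[1]), Xc.T @ yc)
    return r_squared(yc, Xc @ b)

def compare_blocks(blocks, y, k):
    """R^2 of one ridge model per descriptor block (dict: name -> matrix)."""
    return {name: ridge_r2(X, y, k) for name, X in blocks.items()}

# Hypothetical data: one informative block and one block of pure noise.
rng = np.random.default_rng(0)
informative = rng.normal(size=(40, 3))
noise = rng.normal(size=(40, 3))
activity = informative @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=40)

blocks = {
    "informative": informative,
    "noise": noise,
    "combined": np.hstack([informative, noise]),
}
for name, value in compare_blocks(blocks, activity, k=0.1).items():
    print(f"{name}: R2 = {value:.4f}")
```

Because a fitted R² generally rises as descriptors are added, a comparison of this kind is most informative between blocks of similar size or when supplemented by a validation statistic.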
Table 4 provides the regression summary for the QSARs
of the quinolone derivatives against Mycobacterium fortuitum and Mycobacterium smegmatis. For the complete set
of 69 compounds, the RR model with the topological
descriptors alone has R² values of 0.8357 and 0.8200 for M. fortuitum and M. smegmatis respectively, and the addition of
the other theoretical descriptors, i.e. the constitutional and geometrical and the electrostatic indices, contributes significantly
towards a better R² value, thus improving the model quality.
Table 4 also shows that each of these added descriptor classes, when considered alone, results in an inferior model. For
the same set of compounds, the RR model based on physicochemical descriptors is very poor compared with
the model derived from the topological descriptors. When we consider
the group of the first 51 compounds in Table 1, which excludes the
derivatives substituted at the 8 position, we see that
the RR models based on topological descriptors alone
fit the data very well, and clearly much better than the physicochemical model. Even the electrostatic
descriptors, with R² values of
0.7380 and 0.6850 for M. fortuitum and M. smegmatis respectively, describe the data better than the physicochemical descriptors, with
R² values of 0.6947 and 0.6408 against the respective mycobacteria. If we take all the calculated molecular
descriptors (topological, electrostatic, and constitutional
and geometrical indices) into account, we get an excellent fit,
with R² values of 0.9021 and 0.8830 for M. fortuitum and M. smegmatis respectively. In the third case, where
48 quinolone compounds were selected on the basis of
50% or more similarity, the R² values are better overall than
those for the previous datasets. Here also the topological descriptors
alone describe the data much better than the physicochemical property based model, and the combination of all the
calculated descriptors (constitutional and geometrical,
electrostatic and topological indices) yields a still better model. This trend continues in
the last subset of 22 quinolone compounds possessing 60%
or more structural similarity with sparfloxacin. The pattern of
influence of these structural descriptors seems to be the same
as in the previous cases when compared to the physicochemical descriptors. It is therefore evident from the QSARs reported in
Table 4 that the calculated molecular descriptors provide a better quality predictive model for N-1, C-7 and 8 substituted quinolone derivatives. The models based on physicochemical properties
were much inferior. The
QSAR models based on molecular descriptors that are calculated solely from the chemical structure can therefore serve as more
reliable models for predicting the potential of quinolone
derivatives. It is hoped that model development in this direction will throw new light on tuberculostatic drug
design.
Table 4. Regression summary for QSARs of Quinolone compounds
R2
M. fortuitum
M. smegmatis
Constitutional and Geometrical Descriptors

0.5991
0.6785
0.6265
0.6172
With all the above three sets of descriptors
0.8357
0.8928
0.8200
0.8932
0.6932
0.6424
0.5496
0.7380
0.8196
0.5954
0.6850
0.8226
0.9021
0.6947
0.8830
0.6408
0.6574
0.7689
0.7693
0.9226
0.9369
0.7360
0.7300
0.9333
0.9535
0.7350

0.7345
0.9585
0.9914
0.9931
0.8521
0.9415
0.9952
0.9962
0.7679
0.8890
Molecular Descriptors
N = 69 (whole set of 69 compounds)
N = 51 (considering N-1 and C-7 substitution)
N = 48 (50% and above similarity)
Compounds: 1–8, 10–15, 18–19, 29–30, 33, 35–36, 39–63, 65–66
N = 22 (60% and above similarity)
Compounds: 1, 3–5, 10–11, 14, 30, 41–42, 47–48, 52–54, 56–60, 62–63
Ref to Table 2
Acknowledgement
Payel Ghosh thanks the Council of Scientific and Industrial Research, New Delhi 110001, India for the grant of
a Junior Research Fellowship to her. The authors sincerely
acknowledge the valuable comments of the anonymous reviewers that helped to improve the quality of the final
manuscript.
References
1. Lubasch, A., Erbes, R., Munch, H. and Lode, H., Sparfloxacin in treatment of drug resistant tuberculosis and intolerance of first line therapy,
Eur. Respir. J., 17 (2001) 641–646.
2. Albino, J.A. and Reichman, L.B., The treatment of tuberculosis, Respiration, 65 (1998) 237–255.
3. O'Brien, R.J. and Vernon, M., New tuberculosis drug development,
Am J Respir Crit Care Med., 157 (1998) 1705–1707.
4. Nakamura, S., Minami, A., Nakata, K., Kurobe, N., Kouno, K.,
Sakaguchi, Y., Kashimoto, S., Yoshida, H., Kojima, T., Ohue, T.,
Fujimoto, K., Nakamura, M., Hashimoto, M. and Shimizu, M., In vitro
and in vivo antibacterial activities of AT-4140, a new broad-spectrum
quinolone, Antimicrob. Agents Chemother., 33 (1989) 1167–1173.
5. Rastogi, N., Labrousse, V., Goh, K.S. and Sousa, J.P., Antimycobacterial spectrum of sparfloxacin and its activities alone and in association
with other drugs against Mycobacterium avium complex growing extracellularly and intracellularly in murine and human macrophages,
Antimicrob. Agents Chemother., 35 (1991) 2473–2480.
6. Renau, T.E., Sanchez, J.P., Gage, J.W., Dever, J.A., Shapiro, M.A.,
Gracheck, S.J. and Domagala, J.M., Structure-activity relationships of
the quinolone antibacterials against mycobacteria: Effect of structural
changes at N-1 and C-7, J. Med. Chem., 39 (1996) 729–735.
7. Renau, T.E., Gage, J.W., Dever, J.A., Roland, G.E., Joannides, E.T.,
Shapiro, M.A., Sanchez, J.P., Gracheck, S.J., Domagala, J.M., Jacobs, M.R. and Reynolds, R.C., Structure-activity relationships of
quinolone agents against mycobacteria: Effect of structural modification at the 8 position, Antimicrob. Agents Chemother., 40 (1996)
2363–2368.
8. Hansch, C., On the structure of medicinal chemistry, J. Med. Chem.,
19 (1976) 1–6.
9. Hansch, C. and Leo, A., QSAR: Fundamentals and Applications in
Chemistry and Biology, American Chemical Society, Washington, DC,
1995.
10. Molina, E., Diaz, H.G., Gonzalez, M.P., Rodriguez, E. and Uriarte, E.,
Designing antibacterial compounds through a topological substructural approach, J. Chem. Inf. Comput. Sci., 44 (2004) 515–521.
11. Bagchi, M.C., Maiti, B.C., Mills, D. and Basak, S.C., Usefulness of
graphical invariants in quantitative structure-activity correlations of
tuberculostatic drugs of the isonicotinic acid hydrazide type, J Mol
Model, 10 (2004) 102–111.
12. Bagchi, M.C. and Maiti, B.C., On application of atom pairs on drug
design, J Mol Struct: THEOCHEM, 623 (2003) 31–37.
13. Bagchi, M.C., Maiti, B.C. and Bose, S., QSAR of antituberculosis drugs
of INH type using graphical invariants, J Mol Struct: THEOCHEM,
679 (2004) 179–186.
14. Roy, K., Topological descriptors in drug design and modeling studies,
Mol. Div., 8 (2004) 321–323.
15. Balaban, A.T., Basak, S.C., Beteringhe, A., Mills, D. and Supuran,
A.T., QSAR study using topological indices for inhibition of carbonic
anhydrase II by sulfanilamides and Schiff bases, Mol. Div., 8 (2004)
401–412.
16. Besalu, E., Ponec, R. and Julian-Ortiz, J.V., Virtual generation of
agents against Mycobacterium tuberculosis. A QSAR study, Mol. Div.,
6 (2003) 107–120.
17. Votano, J.R., Parham, M., Hall, L.H. and Kier, L.B., New predictors
for several ADME/Tox properties: Aqueous solubility, human oral absorption, and Ames genotoxicity using topological descriptors, Mol.
Div., 8 (2004) 379–391.
18. Gonzalez-Diaz, H., Torres-Gomez, L.A., Guevara, Y., Almeida, M.S.,
Molina, R., Castanedo, N., Santana, L. and Uriarte, E., Markovian
chemicals in silico design (MARCH-INSIDE), a promising approach
for computer-aided molecular design III: 2.5D indices for the discovery
of antibacterials, J. Mol. Model (Online), 11 (2005) 116–123.
19. Basak, S.C., Gute, B.D. and Mills, D., Quantitative molecular similarity analysis (QMSA) methods for property estimation: A comparison of
property-based, arbitrary, and tailored similarity spaces, SAR QSAR
Environ Res., 13 (2002) 727–742.
20. Basak, S.C., Mills, D., Hawkins, D.M. and El-Masri, H.A., Prediction
of tissue:air partition coefficient: A comparison of structure-based
and property-based methods, SAR QSAR Environ Res., 13 (2002)
649–665.
21. Carhart, R.E., Smith, D.H. and Venkataraghavan, R., Atom pairs as
molecular features in structure activity studies: Definition and applications, J. Chem. Inf. Comput. Sci., 32 (1985) 664–674.
22. Nino V., M., Daza C., E.E. and Tello, M., A criteria to classify biological activity of benzimidazoles from a model of structural similarity, J.
Chem. Inf. Comput. Sci., 41 (2001) 495–504.
23. Tisdall, J., Beginning Perl for Bioinformatics (1st ed.), O'Reilly
(2001).
24. http://preadmet.brdrc.org/.
25. Katritzky, A.R., Petrukhin, R., Tatham, D., Basak, S., Benfenati, E.,
Karelson, M. and Maran, U., Interpretation of quantitative structure-property
and -activity relationships, J. Chem. Inf. Comput. Sci., 41
(2001) 679–685.
26. Rao, C.R., Linear statistical inference and its applications (2nd ed.),
Wiley, New York (1973).
27. Randic, M., Novel shape descriptors for molecular graphs, J. Chem.
Inf. Comput. Sci., 41 (2001) 607–613.
28. Basak, S.C., Grunwald, G.D. and Niemi, G.J., In: A.T. Balaban (Ed.),
From Chemical Topology To Three-Dimensional Geometry, Plenum
Press, New York (1997), pp. 73–116.
29. Basak, S.C., Use of molecular complexity indices in predictive pharmacology and toxicology: A QSAR approach, Med. Sci. Res., 15 (1987)
605–609.
30. Estrada, E., In: J. Devillers and A.T. Balaban (Eds.), Topological Indices And Related Descriptors In QSAR And QSPR,
Gordon and Breach, Amsterdam, The Netherlands (1999), pp.
403–453.
31. Miller, A.J., Subset selection in regression, Chapman and Hall,
New York, NY (1990).
32. Rencher, A.C. and Pun, F.C., Inflation of R² in best subset regression,
Technometrics, 22 (1980) 49–53.
33. Hoerl, A.E. and Kennard, R.W., Ridge regression: Biased estimation for
nonorthogonal problems, Technometrics, 8 (1970) 27–51.
34. Massy, W.F., Principal components regression in exploratory statistical
research, J. Amer. Statist. Assoc., 60 (1965) 234–246.
35. Wold, H., Soft modeling by latent variables: The nonlinear iterative
partial least squares approach, In: J. Gani (Ed.), Perspectives in Probability and Statistics, papers in honor of M.S. Bartlett, Academic Press,
London (1975).
36. Hoskuldsson, A., PLS regression methods, Journal of Chemometrics,
2 (1988) 211–228.
37. Hoskuldsson, A., A combined theory for PCA and PLS, Journal of
Chemometrics, 9 (1995) 91–123.
38. Frank, I.E. and Friedman, J.H., A statistical view of some chemometrics
regression tools, Technometrics, 35 (1993) 109–135.
39. Basak, S.C., Mills, D., Hawkins, D.M. and El-Masri, H., Prediction
of human blood:air partition coefficient: A comparison of structure-based and property-based methods, Risk Analysis, 23 (2003)
1173–1184.
40. Basak, S.C., Mills, D., Mumtaz, M.M. and Balasubramanian, K., Use
of topological indices in predicting aryl hydrocarbon receptor binding potency of dibenzofurans: A hierarchical QSAR approach, Ind. J.
Chem., 42A (2003) 1385–1391.
41. NCSS Statistical and Power Analysis Software; Hintze, J. (2004),
NCSS and PASS. Number Cruncher Statistical Systems, Kaysville,
Utah, http://www.ncss.com/.
