
Symposium on Graph Theory in Chemistry

10

A Graph-Theoretical Approach
to Structure-Property Relationships

Zlatko Mihalić
Faculty of Science and Mathematics, The University of Zagreb, Strossmayerov trg 14, 41000 Zagreb, The Republic of Croatia
Nenad Trinajstić
The Rugjer Bošković Institute, P.O.B. 1016, 41001 Zagreb, The Republic of Croatia

A fundamental concept of chemistry is that the structural characteristics of a molecule are responsible for its properties (1). This was pointed out in the middle of the last century by Crum Brown and Fraser (2), who also devised one of the first structure-property models. However, the earliest work in which this relationship was observed (the toxicity of methyl and amyl alcohols) was a thesis by Cros in 1863 (3).
A Topological Model of Matter
The origin of the structure-property concept can be traced (4) to the work of the Croatian Jesuit priest, scientist, and philosopher Rugjer Josip Bošković (5), who introduced the idea of representing atoms as points in space (6). (His major work was the theory of a single law of forces.) By allowing the point atoms to assume a variety of different arrangements, Bošković was able to account for the existence of different substances.
In this way the Bošković model may be considered as the forerunner of a topological model for the structure of matter. Bošković's fundamental idea, which is of the greatest importance in chemistry, was that substances have different properties because they have different structures. This idea was used, for example, by Davy to rationalize the difference between diamond and graphite (4, 7).

Table 2. List of Properties that Are Desirable for Topological Indices as Proposed by Randić (18)

 1. Direct structural interpretation
 2. Good correlation with at least one molecular property
 3. Good discrimination of isomers
 4. Locally defined
 5. Generalizable
 6. Linearly independent
 7. Simplicity
 8. Not based on physical or chemical properties
 9. Not trivially related to other indices
10. Efficiency of construction
11. Based on familiar structural concepts
12. Correct size dependence
13. Gradual change with gradual change in structures

QSPR

The structure-property relationships quantify the connection between the structure and properties of molecules (8). These relationships are mathematical models that allow the prediction of properties from structural parameters. They are called quantitative structure-property relationships (QSPR).

Table 1. List of Selected Topological Indices

Topological index   Standard symbol   Structural interpretation (a)                                                     Author (Year)
Wiener number       W                 Sum of distances in a molecular graph                                             Wiener (1947)
Hosoya index        Z                 Sum of counts of nonadjacent edges in a molecular graph                           Hosoya (1971)
Randić index        χ                 Sum of weighted edges in a molecular graph                                        Randić (1975)
Balaban index       J                 Sum of weighted distances in a molecular graph                                    Balaban (1982)
Schultz index       MTI               Sum of elements of the structural row matrix v[A + D] (b) of a molecular graph    Schultz (1989)
Harary number       H                 Sum of squares of reciprocal distances in a molecular graph                       Plavšić, Nikolić, Trinajstić (1991)

(a) Graph-theoretical concepts are given in the following section.
(b) v = the valency row matrix; A = the adjacency matrix; D = the distance matrix.

Topological Indices
A graph-theoretical approach to QSPR is based on the use of topological (graph-theoretical) indices for encoding the structural information (8-14). The term topological index (15) indicates a characterization of a molecule (or a corresponding molecular graph (16)) by a single number. The need to represent molecular structure by a single number arises from the fact that most molecular properties are recorded as single numbers. Therefore, QSPR modelling reduces to a correlation between the two sets of numbers via an algebraic expression. (One set of numbers represents the properties, and the other set represents the structures of molecules under study.)
Characterizing a molecule by a single number represents a considerable loss of information: A three-dimensional object (molecule) is described by a one-dimensional object (topological index). However, what is surprising is how much of the relevant structural information is still retained in a given topological index.
There are more than 120 topological indices available to date in the literature (17), without any sign that their proliferation will stop in the near future. Here we will review only several selected topological indices. Table 1 lists six topological indices that will be considered in this report. Table 2 gives a list of useful properties that are desirable for topological indices (18).
The desirable properties proposed by Randić (18) represent the very high level of sophistication that a topological index should achieve. All six indices listed in Table 1 approach this ideal. Their weakest point is the discrimination of isomers. This particular property is rather low for all topological indices considered here except the Balaban index (19-22). However, this is the weak point of most topological indices, except for molecular identification numbers (23). Nonetheless, the low discriminatory power of many indices does not prevent them from being useful descriptors in structure-property-activity modelling.
In the next section we will give a brief survey of elementary (chemical) graph-theoretical concepts. This section will be followed by a section containing definitions of the six selected topological indices. In the fourth section a design of the structure-property relationships will be delineated. Then a didactic example will be presented.

Volume 69 Number 9 September 1992

Elementary Graph-Theoretical Concepts

We will cover only those graph-theoretical concepts that will be used in this report. In doing so, we will follow the book Graph Theory by Frank Harary (24) and both editions of our book Chemical Graph Theory (8, 16, 25).
Graph theory is a branch of discrete mathematics, related to topology and combinatorics. It deals with the way objects are connected and with all the consequences of the connectivity. The connectivity in a system is, thus, a fundamental quality of graph theory.
Chemical graph theory is a branch of mathematical chemistry, and consequently of theoretical chemistry. It is concerned with handling chemical graphs, that is, graphs that represent chemical systems. Hence, chemical graph theory deals with analyses of all consequences of connectivity in a chemical system. In other words, chemical graph theory is concerned with all aspects of the application of graph theory to chemistry.

The Concept of a Graph in Graph Theory

The central concept in graph theory is that of a graph. For a graph theorist, a graph is the application of a set on itself, that is, a collection of elements of the set and of binary relations between these elements. Graphs are one-dimensional objects, but they can be embedded or realized in spaces of higher dimensions.
For a chemist, the two-dimensional realization of a graph is more appealing, that is, a set of vertices (points) and of edges (lines) joining these vertices. A graph G can be visualized by a diagram when the vertices are drawn as small circles or dots, and the edges as lines or curves connecting the appropriate circles. Because a diagram of a graph completely describes the graph, it is customary and convenient to refer to the diagram of the graph as the graph itself.
Mainly due to their diagrammatic representation, graphs have appeal as structural models in science, in general, and in chemistry, in particular (26, 27). As an example, Figure 1 shows a diagram of a labeled graph. A graph is called labeled when a specific numbering of its vertices is introduced.

Figure 1. A diagram of a labeled (numbered) graph, showing vertices as circles and edges as lines. The graph is actually a one-dimensional entity, but it can be realized in two dimensions, as shown here.

The Concept of a Graph in Chemistry

In chemistry, graphs can be used to represent a variety of chemical objects such as molecules, reactions, crystals, polymers, and clusters. The common feature of chemical systems is the presence of sites and connections between them. Sites can be atoms, electrons, molecules, molecular fragments, intermediates, etc., while the connections between sites can represent bonds, reaction steps, van der Waals forces, etc. Chemical systems can be represented by chemical graphs using a simple conversion rule: Sites are replaced by vertices and connections by edges.

Molecular Graphs
A special class of chemical graphs are molecular graphs. Molecular graphs (or constitutional graphs) are chemical graphs that represent the constitution of molecules. In these graphs vertices correspond to individual atoms, and edges correspond to the bonds between them. An interesting historical detail is related to the concept of the molecular graph: The term graph was introduced by the English mathematician Sylvester (28) in 1878 on the basis of the constitutional formulas used by the chemists of his day.
To simplify the manipulation of molecular graphs, hydrogen-depleted graphs are often used. Such graphs represent only the molecular skeletons, omitting hydrogen atoms and their bonds. As an example, Figure 2 gives a labeled molecular hydrogen-depleted graph that depicts the carbon skeleton of 2,3,4-trimethylhexane.

Figure 2. A labeled, hydrogen-depleted, molecular graph corresponding to the carbon skeleton of 2,3,4-trimethylhexane. The vertices correspond to atoms, and the edges correspond to chemical bonds.

Journal of Chemical Education

Analyzing and Comparing Graphs

Two graphs G1 and G2 are isomorphic if there exists a one-to-one correspondence between their vertex sets V(G1) and V(G2), which induces a one-to-one correspondence between their edge sets E(G1) and E(G2). An invariant of a graph G is a quantity associated with G that has the same value for any graph that is isomorphic with G. It should be noted that topological indices are graph invariants.
Two vertices i and j of a graph G are adjacent if there is an edge joining them; the vertices i and j are then incident to such an edge. Similarly, two edges of G are adjacent if they have a vertex in common. The valency of a vertex i of G is the number of edges incident to i. This is denoted by d(i).
A walk of a graph G is an alternating sequence of vertices and edges, beginning and ending with vertices, in which each edge is incident with the two vertices immediately preceding and following it. A path is a walk in which no vertex occurs more than once. The distance between two vertices is the number of edges in the shortest path that joins the two vertices. A graph G is connected if every pair of its vertices is joined by a path. Otherwise, a graph is considered disconnected.
A graph whose vertices all have the same valency is called a regular graph. If all vertices in a regular graph have a valency of 2, then the graph is called a cycle. A graph is acyclic if it has no cycles. A tree is a connected acyclic graph. The molecular graph in Figure 2 is an example of a tree.
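The definitions of valency, distance, connectedness, and tree given above can be made concrete with a short sketch. The function names, the edge-list representation, and the vertex numbering of 2,3,4-trimethylhexane used here are assumptions made for illustration (the numbering in Figure 2 may differ). The tree test relies on the standard fact that a connected graph with N vertices is a tree exactly when it has N - 1 edges.

```python
from collections import deque


def adjacency_list(n, edges):
    """Map each vertex 1..n to the list of vertices adjacent to it."""
    neighbors = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return neighbors


def valency(neighbors, v):
    """d(v): the number of edges incident to vertex v."""
    return len(neighbors[v])


def distances_from(neighbors, start):
    """Breadth-first search: number of edges on the shortest path from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in neighbors[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_connected(neighbors):
    """True if every vertex can be reached from an arbitrary first vertex."""
    first = next(iter(neighbors))
    return len(distances_from(neighbors, first)) == len(neighbors)


def is_tree(neighbors):
    """A connected graph with N vertices and N - 1 edges is a tree."""
    n_edges = sum(len(nb) for nb in neighbors.values()) // 2
    return is_connected(neighbors) and n_edges == len(neighbors) - 1


# Hypothetical labeling of the 2,3,4-trimethylhexane skeleton:
# main chain 1-2-3-4-5-6, methyl carbons 7, 8, 9 on vertices 2, 3, 4.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7), (3, 8), (4, 9)]
G = adjacency_list(9, EDGES)
```

With this labeling, `valency(G, 3)` counts the three bonds at the branched carbon 3, and `distances_from(G, 1)` returns the distance from the chain end to every other vertex.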
Associating Graphs with Matrices
A labeled (chemical) graph may be associated with several matrices. Two very important graph-theoretical matrices are the vertex-adjacency matrix and the distance matrix.
The vertex-adjacency matrix, A = A(G), of a labeled connected graph G with N vertices, is a square symmetric matrix of order N. It is commonly called the adjacency matrix. It is defined below.

$$(A)_{ij} = \begin{cases} 1 & \text{if vertices } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

Table 4. The Computation of the Wiener Number for a Tree T Depicting the Carbon Skeleton of 2-Methylbutane

(a) A labeled tree T

(b) The distance matrix of T

(c) The Wiener number of T

$$W(T) = \frac{1}{2}(8 \cdot 1 + 8 \cdot 2 + 4 \cdot 3) = 18$$

The distance matrix, D = D(G), of a labeled connected graph G with N vertices is a square symmetric matrix of order N. It is defined below.

$$(D)_{ij} = \begin{cases} l_{ij} & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

where l_ij is the length of the shortest path (i.e., the distance) between the vertices i and j in G.
Very often the distance matrix of a graph G can be generated using powers of the corresponding adjacency matrix of G (29). Table 3 gives the adjacency matrix and the distance matrix that correspond to the molecular graph in Figure 2.

Table 3. The Adjacency Matrix and the Distance Matrix of the Molecular Graph in Figure 2

Table 5. The Computation of the Hosoya Index for a Tree T Representing the Carbon Skeleton of 2,3-Dimethylpentane

(a) A tree T

(b) The count of the p(T; i) quantities in T
(i) p(T; 0) = 1
(ii) p(T; 1) = 6
(iii) p(T; 2) = 8
(iv) p(T; 3) = 2

(c) The Hosoya index of T
Z(T) = p(T; 0) + p(T; 1) + p(T; 2) + p(T; 3) = 17

Definitions of the Selected Topological Indices

Wiener Number
The Wiener number, W = W(G) of G, was introduced by Wiener in 1947 as the path number (30). This topological index is defined as the half-sum of the elements of the distance matrix (15).

$$W = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (D)_{ij} \qquad (3)$$

Table 4 gives an example for computing the Wiener number.
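The remark that the distance matrix can be generated from powers of the adjacency matrix, and the half-sum definition of the Wiener number, can be sketched as follows. This is a minimal pure-Python illustration, not the authors' procedure: it assumes a connected graph, uses the fact that the distance between i and j is the smallest k for which the (i, j) entry of the k-th power of A is nonzero, and adopts a hypothetical labeling of the 2-methylbutane skeleton of Table 4.

```python
def adjacency_matrix(n, edges):
    """(A)ij = 1 if vertices i and j are adjacent, 0 otherwise (labels 1..n)."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = 1
        a[j - 1][i - 1] = 1
    return a


def mat_mul(x, y):
    """Plain product of two square integer matrices of equal order."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def distance_matrix(a):
    """(D)ij = smallest k with (A^k)ij != 0; assumes a connected graph."""
    n = len(a)
    d = [[0] * n for _ in range(n)]
    power = [row[:] for row in a]      # holds A^k, starting with k = 1
    remaining = n * (n - 1)            # off-diagonal entries still unassigned
    k = 1
    while remaining > 0 and k < n:     # no distance can exceed n - 1
        for i in range(n):
            for j in range(n):
                if i != j and d[i][j] == 0 and power[i][j] != 0:
                    d[i][j] = k
                    remaining -= 1
        power = mat_mul(power, a)
        k += 1
    return d


def wiener_number(d):
    """Half-sum of all elements of the distance matrix."""
    return sum(sum(row) for row in d) // 2


# Hypothetical labeling of 2-methylbutane: chain 1-2-3-4, methyl carbon 5 on 2.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]
D = distance_matrix(adjacency_matrix(5, EDGES))
W = wiener_number(D)
```

For larger graphs a breadth-first search from each vertex is cheaper than repeated matrix products; the matrix-power form is kept here only because it mirrors the statement in the text.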

Table 6. The Edge Weights of 10 Edge Types Which Appear in Graphs Corresponding to the Carbon Skeletons of Hydrocarbons

Edge type   Weight
1,1         1
1,2         0.7071
1,3         0.5774
1,4         0.5
2,2         0.5
2,3         0.4082
2,4         0.3536
3,3         0.3333
3,4         0.2887
4,4         0.25

Table 7. The Computation of the Randić Index for a Tree T Depicting the Carbon Skeleton of 4-Ethyl-2-methylheptane

(a) A tree T

(b) Count of the edge types (the numbers at the vertices represent their valencies)

b12 = 2
b13 = 2
b22 = 1
b23 = 4

(c) The Randić index of T

$$\chi(T) = 2 \cdot 0.7071 + 2 \cdot 0.5774 + 0.5 + 4 \cdot 0.4082 = 4.7018$$

Table 8. The Computation of the Balaban Index for a Labeled Tree T Representing the Carbon Skeleton of 2,3-Dimethylpentane

(a) A labeled tree T

(b) The distance sums

$$D(T) = \begin{bmatrix}
0 & 1 & 2 & 3 & 4 & 2 & 3 \\
1 & 0 & 1 & 2 & 3 & 1 & 2 \\
2 & 1 & 0 & 1 & 2 & 2 & 1 \\
3 & 2 & 1 & 0 & 1 & 3 & 2 \\
4 & 3 & 2 & 1 & 0 & 4 & 3 \\
2 & 1 & 2 & 3 & 4 & 0 & 3 \\
3 & 2 & 1 & 2 & 3 & 3 & 0
\end{bmatrix}$$

(c) The Balaban index of T

Hosoya Index

The Hosoya index, Z = Z(G), was introduced by Hosoya in 1971 as the Z index (15). This index is defined below.

$$Z = \sum_{i=0}^{\lfloor N/2 \rfloor} p(G; i) \qquad (4)$$

where p(G; i) is the number of selections of i mutually nonadjacent edges in G. By definition, p(G; 0) = 1, and p(G; 1) is the number of edges in G. Table 5 gives an example of computing the Hosoya index.

Randić Index

The Randić index, χ = χ(G) of G, was introduced by Randić in 1975 as the connectivity index (31). This is one of the most widely used topological indices in QSPR (32-34) (and also in quantitative structure-activity relationships (QSAR) (35)).
The Randić index is defined as

$$\chi = \sum_{\text{edges } ij} [d(i)\, d(j)]^{-1/2} \qquad (5)$$

where d(i) and d(j) are the valencies of the vertices i and j that define the edge ij.
For saturated hydrocarbons, eq 5 may be given in closed form. In molecular graphs that depict the carbon skeletons of hydrocarbons, only four types of vertices with respect to their valencies appear, that is, vertices with d = 1, 2, 3, 4. These give rise to 10 types of edges whose weights are given in Table 6.
If the number of each edge type is denoted by b_ij, where i = 1, ..., 4 and j = i, ..., 4, and if the edge weights from Table 6 are used, then eq 5 becomes the following.

$$\chi = b_{11} + 0.7071\, b_{12} + 0.5774\, b_{13} + 0.5\, b_{14} + 0.5\, b_{22} + 0.4082\, b_{23} + 0.3536\, b_{24} + 0.3333\, b_{33} + 0.2887\, b_{34} + 0.25\, b_{44} \qquad (6)$$

This expression reveals that the Randić indices of hydrocarbons are fully determined by the counts of the edge types in the corresponding hydrogen-depleted graphs. Table 7 gives an example of computing the Randić index by means of eq 6.
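The Hosoya and Randić definitions can be illustrated with a small sketch. The function names and the vertex labelings of 2,3-dimethylpentane (Table 5) and 4-ethyl-2-methylheptane (Table 7) are assumptions made for illustration. The Hosoya counts are obtained by brute-force enumeration of sets of mutually nonadjacent edges, which is adequate only for small graphs such as alkane skeletons.

```python
from math import sqrt


def hosoya_counts(edges):
    """Return [p(G;0), p(G;1), ...], where p(G;i) is the number of ways of
    selecting i mutually nonadjacent edges (edges sharing no vertex)."""
    counts = {}

    def extend(start, used_vertices, size):
        # Every call corresponds to exactly one selection of `size` edges.
        counts[size] = counts.get(size, 0) + 1
        for k in range(start, len(edges)):
            i, j = edges[k]
            if i not in used_vertices and j not in used_vertices:
                extend(k + 1, used_vertices | {i, j}, size + 1)

    extend(0, frozenset(), 0)
    return [counts[size] for size in sorted(counts)]


def hosoya_index(edges):
    """Z: the sum of all p(G;i)."""
    return sum(hosoya_counts(edges))


def randic_index(edges):
    """chi: the sum over edges ij of [d(i) d(j)]^(-1/2)."""
    valency = {}
    for i, j in edges:
        valency[i] = valency.get(i, 0) + 1
        valency[j] = valency.get(j, 0) + 1
    return sum(1.0 / sqrt(valency[i] * valency[j]) for i, j in edges)


# Hypothetical labelings of the carbon skeletons.
# 2,3-dimethylpentane: chain 1-2-3-4-5, methyl carbons 6 (on 2) and 7 (on 3).
DMP = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
# 4-ethyl-2-methylheptane: chain 1..7, methyl 8 on 2, ethyl 9-10 on 4.
EMH = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (2, 8), (4, 9), (9, 10)]

P_COUNTS = hosoya_counts(DMP)
Z = hosoya_index(DMP)
CHI = randic_index(EMH)
```

Because `randic_index` uses unrounded square roots, its result for the Table 7 tree can differ from the tabulated value in the fourth decimal place, since Table 6 lists weights rounded to four decimals.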
Table 9. The Computation of the Schultz Index for a Tree T Depicting the Carbon Skeleton

(a) A labeled tree T

(b) The adjacency matrix of T

(c) The distance matrix of T

(d) The adjacency-plus-distance matrix of T

(e) The valence row matrix of T

v(T) = [1 3 2 2 1 1]

(f) The v[A + D] row matrix

v[A + D](T) = [22 15 16 18 25 22]

(g) The Schultz index of T

$$MTI(T) = 2 \cdot 22 + 15 + 16 + 18 + 25 = 118$$

Table 10. The Computation of the Harary Number for a Tree T Depicting the Carbon Skeleton of 2,3-Dimethylhexane

(a) A labeled tree T

(b) The distance matrix of T

(c) The D^{-1} matrix of T

(d) The D^{-2} matrix of T

(e) The Harary number of T

$$H(T) = \frac{1}{2}(14 \cdot 1 + 16 \cdot 0.25 + 14 \cdot 0.11 + 8 \cdot 0.06 + 4 \cdot 0.04) = 10.10$$

Balaban Index

The Balaban index, J = J(G) of G, was introduced by Balaban in 1982 as the average-distance sum connectivity (36). It is defined as

$$J = \frac{M}{\mu + 1} \sum_{\text{edges } ij} [(D)_i (D)_j]^{-1/2} \qquad (7)$$

where M is the number of edges in G; μ is the cyclomatic number of G; and (D)_i is the distance sum where i = 1, 2, ..., N.
The cyclomatic number μ = μ(G) of a polycyclic graph G is equal to the minimum number of edges that must be removed from G to transform it to the related acyclic graph. For trees, μ = 0; for monocycles, μ = 1.
The distance sum (D)_i for a vertex i of G represents a sum of all entries in the corresponding row of the distance matrix.

$$(D)_i = \sum_{j=1}^{N} (D)_{ij} \qquad (8)$$

Clearly the Wiener number can also be expressed in terms of the distance sums.

$$W = \frac{1}{2} \sum_{i=1}^{N} (D)_i \qquad (9)$$

Table 8 gives an example of computing the Balaban index.
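The Balaban index and the distance sums it depends on can be sketched as below for the 2,3-dimethylpentane tree of Table 8. The function names and the labeling (chain 1-2-3-4-5 with methyl carbons 6 on vertex 2 and 7 on vertex 3) are assumptions for illustration. The cyclomatic number is taken as M - N + 1, the standard identity for a connected graph, which the text does not state explicitly.

```python
from collections import deque
from math import sqrt


def distance_sums(n, edges):
    """Return {vertex: (D)_i}, the row sums of the distance matrix."""
    neighbors = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    sums = {}
    for start in neighbors:
        dist = {start: 0}
        queue = deque([start])
        while queue:                      # breadth-first search from `start`
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        sums[start] = sum(dist.values())
    return sums


def balaban_index(n, edges):
    """J = M / (mu + 1) * sum over edges ij of [(D)_i (D)_j]^(-1/2)."""
    m = len(edges)
    mu = m - n + 1                        # assumed: graph is connected
    ds = distance_sums(n, edges)
    return m / (mu + 1) * sum(1.0 / sqrt(ds[i] * ds[j]) for i, j in edges)


# Hypothetical labeling of the 2,3-dimethylpentane skeleton.
DMP = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
SUMS = distance_sums(7, DMP)
J = balaban_index(7, DMP)
W = sum(SUMS.values()) // 2               # Wiener number from the distance sums
```

The same `SUMS` dictionary serves both the Balaban index and the Wiener number, which is the point of expressing W through the distance sums.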


Schultz Index

The Schultz index, MTI = MTI(G) of G, was introduced by Schultz in 1989 as the molecular topological index (37). This index is defined below (21, 37).

$$MTI = \sum_{i=1}^{N} e_i \qquad (10)$$

where the e_i's (i = 1, 2, ..., N) represent the elements of the following row matrix of order N.

$$v[A + D] = [e_1\ e_2\ \ldots\ e_N] \qquad (11)$$

where v is the valency row matrix, A is the adjacency matrix, and D is the distance matrix. Table 9 gives an example of computing the Schultz index.
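The row matrix v[A + D] and its sum can be computed directly from an edge list, as in the following sketch. The function names are illustrative, and the example tree (chain 1-2-3-4-5 with a methyl carbon 6 on vertex 2) is an assumed labeling chosen so that its valency row matrix is [1 3 2 2 1 1], as in Table 9. The graph is assumed to be connected.

```python
from collections import deque


def schultz_row(n, edges):
    """Return the row matrix v[A + D] as a list [e_1, ..., e_n]."""
    neighbors = {v: set() for v in range(1, n + 1)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    valency = {v: len(neighbors[v]) for v in neighbors}
    e = [0] * n
    for i in range(1, n + 1):
        # Row i of the distance matrix by breadth-first search.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        # Add the contribution v_i * (A + D)_ij to every column j.
        for j in range(1, n + 1):
            a_ij = 1 if j in neighbors[i] else 0
            e[j - 1] += valency[i] * (a_ij + dist[j])
    return e


def schultz_index(n, edges):
    """MTI: the sum of the elements of v[A + D]."""
    return sum(schultz_row(n, edges))


# Assumed labeling: chain 1-2-3-4-5, methyl carbon 6 on vertex 2.
TREE = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]
ROW = schultz_row(6, TREE)
MTI = schultz_index(6, TREE)
```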
Harary Number

The Harary number, H = H(G) of G, was introduced by Plavšić et al. (38) in 1991 in honor of Professor Frank Harary on his 70th birthday. He greatly influenced the development of graph theory and chemical graph theory. This index is defined below.

$$H = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (D^{-2})_{ij} \qquad (12)$$

Table 11. The Wiener Numbers (W), Hosoya Indices (Z), Randić Indices (χ), Balaban Indices (J), Schultz Indices (MTI), Harary Numbers (H), and Boiling Points (bp in °C) of Alkanes with Up to 10 Carbon Atoms

Alkane                          W
methane                         0
ethane                          1
propane                         4
2-methylpropane                 9
butane                          10
2,2-dimethylpropane             16
2-methylbutane                  18
pentane                         20
2,2-dimethylbutane              28
2,3-dimethylbutane              29
2-methylpentane                 32
3-methylpentane                 31
hexane                          35
2,2,3-trimethylbutane           42
2,2-dimethylpentane             46
3,3-dimethylpentane             44
2,3-dimethylpentane             46
2,4-dimethylpentane             48
2-methylhexane                  52
3-methylhexane                  50
3-ethylpentane                  48
heptane                         56
2,2,3,3-tetramethylbutane       58
2,2,3-trimethylpentane          63
2,3,3-trimethylpentane          62
2,2,4-trimethylpentane          66
2,2-dimethylhexane              71
3,3-dimethylhexane              67
3-ethyl-3-methylpentane         64
2,3,4-trimethylpentane          65
2,3-dimethylhexane              70
3-ethyl-2-methylpentane         67
3,4-dimethylhexane              68
2,4-dimethylhexane              71
2,5-dimethylhexane              74
2-methylheptane                 79
3-methylheptane                 76
4-methylheptane                 75
3-ethylhexane                   72
octane                          84
2,2,3,3-tetramethylpentane      82
2,2,3,4-tetramethylpentane      86
2,2,3-trimethylhexane           92
2,2-dimethyl-3-ethylpentane     88
3,3,4-trimethylhexane           88
2,3,3,4-tetramethylpentane      84
2,3,3-trimethylhexane           90
2,3-dimethyl-3-ethylpentane     86
2,2,4,4-tetramethylpentane      88
where D^{-2} is the matrix whose elements are the squares of the reciprocal distances in G.
The D^{-2} matrix may be considered as the distance matrix of a class of specially weighted graphs in which weights between vertices in G mimic the Coulomb law between the sites in the corresponding structure. Table 10 gives an example of computing the Harary number.
Table 11 gives the Wiener and Harary numbers, and the Hosoya, Randić, Balaban, and Schultz indices for alkanes with up to 10 carbon atoms.
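The Harary number, as the half-sum of the squared reciprocal distances, can be sketched as follows for the 2,3-dimethylhexane tree of Table 10. The function name and the labeling (chain 1 to 6 with methyl carbons 7 on vertex 2 and 8 on vertex 3) are assumptions for illustration, and the graph is assumed to be connected.

```python
from collections import deque


def harary_number(n, edges):
    """Half-sum of 1 / d(i, j)^2 over all ordered pairs of distinct vertices."""
    neighbors = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    total = 0.0
    for start in neighbors:
        dist = {start: 0}
        queue = deque([start])
        while queue:                      # breadth-first search from `start`
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d ** 2 for v, d in dist.items() if v != start)
    return total / 2.0


# Hypothetical labeling of the 2,3-dimethylhexane skeleton.
DMH = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7), (3, 8)]
H = harary_number(8, DMH)
```

With unrounded reciprocal squares this gives approximately 10.108, which agrees with the value in Table 10 to within the rounding of the tabulated matrix entries to two decimals.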

Designing QSPR Models

There are several ways to design QSPR models (39-44). Here we outline one possible strategy. Figure 3 contains a flow diagram of the steps involved in the design of a QSPR model. This is an iterative approach.
Step 1. Get a reliable source of experimental data for a given set of molecules. This initial set of molecules is sometimes called the training set (45). The data in this set must be reliable and accurate. The quality of the selected data is important because it will affect all the following steps.
Step 2. The topological index is selected and computed. This is also an important step because selecting the appropriate topological index (or indices) can facilitate finding the most accurate model.
Step 3. The two sets of numbers are then statistically analyzed using a suitable algebraic expression. The QSPR model is thus a regression model, and one must be careful about its statistical stability. Chance factors could yield spuriously accurate correlations (46-48). The quality of the QSPR models can be conveniently measured by the correlation coefficient r and the standard deviation s. A good QSPR model must have r > 0.99, while s depends on the property. For example, for boiling points, s < 5 °C. Therefore, Step 3 is a central step in the design of the structure-property models.
Step 4. Predictions are made for the values of the molecular property for species that are not part of the training set via the obtained initial QSPR model. The unknown molecules are structurally related to the initial set of compounds.
Step 5. The predictions are tested with unknown molecules by experimental determination of the predicted properties. This step is rather involved because it requires acquiring or preparing the test molecules.
Step 6. If the tests support the predictions, one presents the QSPR model in its final form with all necessary statistical characteristics.
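The statistical part of Step 3 can be illustrated with a minimal least-squares sketch. Everything specific here is an assumption made for illustration: the model is a simple straight line in one index, s is taken as the standard error of estimate with n - 2 degrees of freedom (the text does not give its formula), the Randić indices of the unbranched alkanes are computed from the edge weights 0.7071 and 0.5, and the boiling points are approximate handbook values typed in by hand rather than the entries of Table 11.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r, s), where r is the correlation coefficient
    and s is the standard error of estimate with n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s = sqrt(sum(e * e for e in residuals) / (n - 2))
    return slope, intercept, r, s


# Training set (illustrative only): butane to octane.
# Randic index of an unbranched alkane with n carbons (n >= 3):
# two (1,2) edges and n - 3 (2,2) edges.
CHI = [2 * 0.7071 + (n - 3) * 0.5 for n in range(4, 9)]
BP = [-0.5, 36.1, 68.7, 98.4, 125.7]   # approximate boiling points, deg C

SLOPE, INTERCEPT, R, S = fit_line(CHI, BP)


def predict(chi):
    """Step 4: use the fitted model for a molecule outside the training set."""
    return INTERCEPT + SLOPE * chi
```

The returned `R` and `S` are the quantities that would be compared with thresholds such as r > 0.99 and s < 5 °C, and `predict` corresponds to Step 4. A real model would be fitted to the full training set and might use a nonlinear expression.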

Table 11. Continued

Alkane                            W     Z    χ        J        MTI   H         bp
2,2,4-trimethylhexane
2,4,4-trimethylhexane
2,2,5-trimethylhexane
2,2-dimethylheptane
3,3-dimethylheptane
4,4-dimethylheptane
3-ethyl-3-methylhexane
3,3-diethylpentane
2,3,4-trimethylhexane
2,4-dimethyl-3-ethylpentane
2,3,5-trimethylhexane
2,3-dimethylheptane
3-ethyl-2-methylhexane
3,4-dimethylheptane
3-ethyl-4-methylhexane
2,4-dimethylheptane
4-ethyl-2-methylhexane
3,5-dimethylheptane
2,5-dimethylheptane
2,6-dimethylheptane
2-methyloctane
3-methyloctane
4-methyloctane
3-ethylheptane
4-ethylheptane
nonane
2,2,3,3,4-pentamethylpentane
2,2,3,3-tetramethylhexane
3-ethyl-2,2,3-trimethylpentane
3,3,4,4-tetramethylhexane
2,2,3,4,4-pentamethylpentane
2,2,3,4-tetramethylhexane
3-ethyl-2,2,4-trimethylpentane
2,3,4,4-tetramethylhexane
2,2,3,5-tetramethylhexane
2,2,3-trimethylheptane
2,2-dimethyl-3-ethylhexane
3,3,4-trimethylheptane
3,3-dimethyl-4-ethylhexane
2,3,3,4-tetramethylhexane
3,4,4-trimethylheptane
3,4-dimethyl-3-ethylhexane
3-ethyl-2,3,4-trimethylpentane
2,3,3,5-tetramethylhexane
2,3,3-trimethylheptane
2,3-dimethyl-3-ethylhexane
3,3-diethyl-2-methylpentane
2,2,4,4-tetramethylhexane
2,2,5-trimethylheptane
2,5,5-trimethylheptane
2,2,6-trimethylheptane
2,2-dimethyloctane
3,3-dimethyloctane
4,4-dimethyloctane
3-ethyl-3-methylheptane
4-ethyl-4-methylheptane
3,3-diethylhexane
2,3,4,5-tetramethylhexane         121   58   4.4641   3.8140   436   13.9933   161

If the tests do not support the initial QSPR model, it must be revised and

the procedure repeated. The QSPR model thus established, even for a narrow class of compounds, is a very useful tool for predicting the properties of hypothetical compounds and for the search for new compounds with programmed properties (12).

Table 11. Continued

Alkane                            W     Z    χ        J
2,3,4-trimethylheptane                                3.5833
2,3-dimethyl-4-ethylhexane                            3.7561
2,3-dimethyl-4-ethylhexane                            3.7561
2,4-dimethyl-3-ethylhexane                            3.7979
3,4,5-trimethylheptane                                3.6854
2,4-dimethyl-3-isopropylpentane                       3.9835
3-isopropyl-2-methylhexane                            3.7280
2,3,5-trimethylheptane                                3.4617
2,5-dimethyl-3-ethylhexane                            3.6033
2,4,5-trimethylheptane                                3.5027
2,3,6-trimethylheptane                                3.3014
2,3-dimethyloctane                                    3.1296
3-ethyl-2-methylheptane                               3.3978
3,4-dimethyloctane                                    3.3088
4-isopropylheptane                                    3.4999
4-ethyl-3-methylheptane                               3.5637
4,5-dimethyloctane                                    3.3759
3-ethyl-4-methylheptane                               3.5299
3,4-diethylhexane                                     3.6982
2,4,6-trimethylheptane                                3.3374
2,4-dimethyloctane                                    3.1600
4-ethyl-2-methylheptane                               3.3908
3,5-dimethyloctane                                    3.2686
3-ethyl-5-methylheptane                               3.4123
2,5-dimethyloctane                                    3.1244
5-ethyl-2-methylheptane                               3.2555
3,6-dimethyloctane                                    3.1682
2,6-dimethyloctane                                    3.0333
2,7-dimethyloctane                                    2.9095
2-methylnonane                                        2.7732
3-methylnonane                                        2.8862
4-methylnonane                                        2.9680
3-ethyloctane                                         3.0869
5-methylnonane                                        2.9984
4-ethyloctane                                         3.2055
4-propylheptane                                       3.2951
decane                            165   89   4.9142   2.6476

Step 1
The boiling points (°C) of the alkanes are taken from the CRC Handbook of Chemistry and Physics (49) and Beilstein (50).

Step 2
We will consider at this stage all six topological indices discussed in this report.

Step 3
The following structure-property models are the most successful for each index considered:
bp = 77.93 (M.97) ~30899'0'0137'- (3.35 f l . 0 2 ) 1 0 $
~

-164.24 (i4.99)

An Instructive Example
We will apply the procedure from the preceding section to give an instructive example of the design of the QSPR model for predicting the boiling points of alkanes. As the initial set we will consider alkanes with up to 8 carbon atoms (40 molecules).

MTI

(13)

Table 12. The Predicted Values of Boiling Points (°C) of Nonanes

                                  Predicted boiling point
Nonane                            eq 14     eq 15
2,2,3,3-tetramethylpentane        119.26    119.40
2,2,3,4-tetramethylpentane
2,2,3-trimethylhexane
2,2-dimethyl-3-ethylpentane
3,3,4-trimethylhexane
2,3,3,4-tetramethylpentane
2,3,3-trimethylhexane
2,3-dimethyl-3-ethylpentane
2,2,4,4-tetramethylpentane
2,2,4-trimethylhexane
2,4,4-trimethylhexane
2,2,5-trimethylhexane
2,2-dimethylheptane
3,3-dimethylheptane
4,4-dimethylheptane
3-ethyl-3-methylhexane
3,3-diethylpentane
2,3,4-trimethylhexane
2,4-dimethyl-3-ethylpentane
2,3,5-trimethylhexane
2,3-dimethylheptane
3-ethyl-2-methylhexane
3,4-dimethylheptane
3-ethyl-4-methylhexane
2,4-dimethylheptane
4-ethyl-2-methylhexane
3,5-dimethylheptane
2,5-dimethylheptane
2,6-dimethylheptane
2-methyloctane
3-methyloctane
4-methyloctane
3-ethylheptane
4-ethylheptane
nonane

Figure 3. A flow diagram of the steps involved in the design of a QSPR model. 1: Source of experimental data. 2: Selection of the topological index. 3: Statistical work and setting up the QSPR model. 4: Predictions. 5: Testing the predictions. 6: The final form of the QSPR model. S: Tests confirmed the initial model. The model appears to be satisfactory for further work. NS: Tests rejected the initial model as not satisfactory. The model must be revised and the procedure repeated until the satisfactory model is obtained.

The most accurate models are those based on ln Z (eq 14) and χ (eq 15). They will be used in the next step.

Step 4
We use eqs 14 and 15 to predict the boiling points of nonanes (35 molecules) (see Table 12).

Step 5
We compare the predicted and experimental values of the nonane boiling points (see Table 13).
Both models have problems with some members of the nonane series. However, when Step 3 is repeated using the boiling points of all alkanes with up to 9 carbon atoms, the QSPR models based on ln Z and χ did not improve. A slight improvement happened only when a biparametric model (with χ and N, where N is the number of carbon atoms in the alkane) was used. This model is given by eq 19.
The procedure may be repeated, and we will eventually arrive at the best possible QSPR model for predicting the boiling points of alkanes.

Step 6
All three models expressed as eqs 14, 15, and 19 may serve as reliable models for predicting the alkane boiling points. Plots of bp vs ln Z, bp vs χ, and bp vs Nχ and the accompanying statistical data are given, respectively, in Figures 4-6.
The boiling points of alkanes have been predicted many times (8, 13, 15, 30-37, 40, 51). Although most of the QSPR models produced are very accurate (r > 0.998, s < 2 °C), they suffer from several shortcomings.
i. Methane was not considered. In some cases other lighter alkanes, such as ethane and propane, were also eliminated from the study.
ii. Models were built for a limited set of alkanes, usually for C4-C7 families.
iii. The complexity of some of the accurate QSPR models in the literature is forbidding. For example, one of the most accurate QSPR models for predicting boiling points of alkanes is the following (40). (All alkanes with up to 9 carbon atoms have been considered but methane.)

Table 13. Comparison between Predicted (Two Models) and Experimental Values of Boiling Points (°C) of Nonanes

Nonane                            (bp)exp   Model (14)   Model (15)
2,2,3,3-tetramethylpentane
2,2,3,4-tetramethylpentane
2,2,3-trimethylhexane
2,2-dimethyl-3-ethylpentane
3,3,4-trimethylhexane
2,3,3,4-tetramethylpentane
2,3,3-trimethylhexane
2,3-dimethyl-3-ethylpentane
2,2,4,4-tetramethylpentane
2,2,4-trimethylhexane
2,4,4-trimethylhexane
2,2,5-trimethylhexane
2,2-dimethylheptane
3,3-dimethylheptane
4,4-dimethylheptane
3-ethyl-3-methylhexane
3,3-diethylpentane
2,3,4-trimethylhexane
2,4-dimethyl-3-ethylpentane
2,3,5-trimethylhexane
2,3-dimethylheptane
3-ethyl-2-methylhexane
3,4-dimethylheptane
3-ethyl-4-methylhexane
2,4-dimethylheptane
4-ethyl-2-methylhexane
3,5-dimethylheptane
2,5-dimethylheptane
2,6-dimethylheptane
2-methyloctane
3-methyloctane
4-methyloctane
3-ethylheptane
4-ethylheptane
nonane

Figure 4. A plot of bp vs ln Z for the first 40 alkanes.

Figure 5. A plot of bp vs χ for the first 40 alkanes.

Figure 6. A plot of bp vs Nχ for the first 75 alkanes.

The terms in eq 20 are defined as follows.
The extended connectivity index

$${}^{m}\chi = \sum [d(i)\, d(j) \cdots d(m+1)]^{-0.5} \qquad (21)$$

where m represents the order of possible fragments. When m = 1, fragments are edges, which leads to the first-order connectivity index ${}^{1}\chi$.
The zero-connectivity index

$${}^{0}\chi = n_1 + \frac{n_2}{\sqrt{2}} + \frac{n_3}{\sqrt{3}} + \frac{n_4}{2} \qquad (22)$$

where n1, n2, n3, and n4 are the numbers of vertices with valencies 1, 2, 3, and 4, respectively.
Connectivity indices ${}^{m}\chi_t$ of order m and type t can be obtained by summing analogous terms over subgraphs involving paths (t = p), clusters (t = c), or path-cluster (t = pc) combinations of m edges. Examples of a path, a cluster and a path-cluster are given in Figure 7.

Figure 7. Examples of a path (3rd order), a cluster (3rd order) and a path-cluster (4th order) for a tree T corresponding to 3-methylpentane.
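The zero-order and path-type connectivity indices can be sketched as below for the 3-methylpentane tree of Figure 7. The function names and the labeling (chain 1-2-3-4-5 with the methyl carbon 6 on vertex 3) are assumptions for illustration. Only path-type fragments are enumerated; cluster and path-cluster fragments are not covered by this sketch.

```python
from math import prod, sqrt


def build_neighbors(edges):
    """Map each vertex to the list of its adjacent vertices."""
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, []).append(j)
        neighbors.setdefault(j, []).append(i)
    return neighbors


def zero_order_connectivity(edges):
    """Sum of d(i)^(-1/2) over all vertices."""
    neighbors = build_neighbors(edges)
    return sum(1.0 / sqrt(len(nb)) for nb in neighbors.values())


def path_connectivity(edges, m):
    """Sum over all paths with m edges (m >= 1) of the reciprocal square root
    of the product of the valencies of the m + 1 vertices on the path."""
    neighbors = build_neighbors(edges)
    valency = {v: len(nb) for v, nb in neighbors.items()}
    total = 0.0

    def walk(path):
        nonlocal total
        if len(path) == m + 1:
            # Each path is found from both ends; keep one orientation only.
            if path[0] < path[-1]:
                total += 1.0 / sqrt(prod(valency[v] for v in path))
            return
        for w in neighbors[path[-1]]:
            if w not in path:
                walk(path + [w])

    for v in neighbors:
        walk([v])
    return total


# Hypothetical labeling of the 3-methylpentane skeleton.
MP3 = [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6)]
CHI0 = zero_order_connectivity(MP3)
CHI1 = path_connectivity(MP3, 1)   # first-order index: paths are single edges
CHI3 = path_connectivity(MP3, 3)   # third-order path index
```

For m = 1 the path enumeration reduces to a sum over edges, so `CHI1` is the first-order connectivity index of the tree.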
To conclude this section we stress that there is no simple QSPR model for predicting boiling points over a wide range of alkanes. However, if we limit ourselves to a simple family of alkanes (especially with fewer than 10 carbon atoms), then simple accurate models are possible (34).
Conclusions
In this report we presented a strategy for designing the quantitative structure-property relationships based on topological indices. The instructive example was directed to the design of the structure-property model for predicting the boiling points of alkanes. Six selected topological indices were tested. The most accurate QSPR models for alkane boiling points are based on ln Z, χ, and Nχ. The accuracy of the model was judged according to the correlation coefficient and the standard error. The upper limits for the accurate models were set at r > 0.995 and s < 5 °C.
We conclude that there is no simple single-parameter QSPR model for predicting the boiling points over a wide range of alkanes due to the great diversity among experimental values. Multivariate regression models appear to be very accurate due to a variety of parameters involved in the correlation. Each of these parameters takes care of a certain structural detail of a large alkane. When all diverse structural features of alkanes are considered, the model usually gives extremely good agreement between the experimental and calculated boiling points.

Acknowledgment
We are thankful to the Ministry of Science, Technology, and Informatics of the Republic of Croatia for support.

3. Lipnick, R. L. Environ. Toxicol. Chem. 1989, 8, 1.
4. honey, R. J. Chem. Educ. 1985, 62, 846.
5. Dadić, Ž. Rugjer Bošković; Školska knjiga: Zagreb, 1987. This is a bilingual edition: Croatian and English.
6. Boscovich, R. J. Philosophiae naturalis theoria redacta ad unicam legem virium in natura existentium; Remondini: Venetia, 1763. The English translation is also available: The Theory of Natural Philosophy; MIT: Cambridge, MA, 1966.
7. Davy, H. Elements of Chemical Philosophy; London, 1812.
8. Trinajstić, N. Chemical Graph Theory; CRC: Boca Raton, FL, 1983; Vol. II, Chapter
11. Rouvray, D. H. In Chemical Applications of Topology and Graph Theory; King, R. B., Ed.; Elsevier: Amsterdam, 1983; p 159.
12. Stankevich, M. I.; Stankevich, I. V.; Zefirov, N. S. Russ. Chem. Rev. 1988, 57, 337.
13. Hansen, P. J.; Jurs, P. C. J. Chem. Educ. 1988, 65, 575.
14. Randić, M. J. Math. Chem. 1990, 4, 337.
15. Hosoya, H. Bull. Chem. Soc. Japan 1971, 44, 2332.
16. Trinajstić, N. Chemical Graph Theory, 2nd revised ed.; CRC: Boca Raton, FL, 1992; Chapter 3.
17. Rouvray, D. H. J. Mol. Struct. (Theochem) 1989, 185, 187.
18. Randić, M. J. Math. Chem. 1991, 7, 155.
19. Bonchev, D.; Trinajstić, N. J. Chem. Phys. 1977, 67, 4517.
20. Balaban, A. T.; Quintas, L. V. Math. Chem. (Mülheim/Ruhr) 1983, 14, 213.
21. Müller, W. R.; Szymanski, K.; Knop, J. V.; Trinajstić, N. J. Chem. Inf. Comput. Sci. 1990, 30, 160.
22. Plavšić, D.; Nikolić, S.; Trinajstić, N. J. Math. Chem., in press.
23. Szymanski, K.; Müller, W. R.; Knop, J. V.; Trinajstić, N. Int. J. Quantum Chem.: Quantum Chem. Symp. 1986, 20, 173.
24. Harary, F. Graph Theory; Addison-Wesley: Reading, MA, 1971; 2nd printing.
25. Trinajstić, N. Chemical Graph Theory; CRC: Boca Raton, FL, 1983; Vol. I.
26. Chartrand, G. Graphs as Mathematical Models; Prindle, Weber, and Schmidt: Boston, MA, 1977.
27. Trinajstić, N. In MATH/CHEM/COMP 1987; Lacher, R. C., Ed.; Elsevier: Amsterdam, 1988; p 83.
28. Sylvester, J. J. Nature 1878, 17, 284.
29. Roberts, F. S. Discrete Mathematical Models; Prentice-Hall: Englewood Cliffs, NJ, 1976; p 56.
30. Wiener, H. J. Am. Chem. Soc. 1947, 69, 17.
31. Randić, M. J. Am. Chem. Soc. 1975, 97, 6609.
32. Razinger, M.; Chrétien, J. R.; Dubois, J. E. J. Chem. Inf. Comput. Sci. 1985, 25, 23.
33. Rouvray, D. H. Sci. Am. 1986, 254, 40.
34. Seybold, P. G.; May, M.; Bagal, U. A. J. Chem. Educ. 1987, 64, 575.
35. Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity Analysis; Wiley: New York, 1986.
36. Balaban, A. T. Chem. Phys. Lett. 1982, 89, 399.
37. Schultz, H. P. J. Chem. Inf. Comput. Sci. 1989, 29, 227.
38. Plavšić, D.; Nikolić, S.; Trinajstić, N. J. Math. Chem., submitted for publication.
39. Randić, M.; Jerman-Blažič, B.; Grossman, S. C.; Rouvray, D. H. Math. Comput. Modelling 1986, 8, 571.
40. Needham, D. E.; Wei, I.-C.; Seybold, P. G. J. Am. Chem. Soc. 1988, 110, 4186.
41. Nizhnii, S. V.; Epshtein, N. A. Russ. Chem. Rev. 1978, 47, 363.
42. Hol, W. G. J. Angew. Chem. Int. Ed. Engl. 1986, 25, 767.
43. Basak, S. C.; Niemi, G. J.; Veith, G. D. In Computational Chemical Graph Theory; Rouvray, D. H., Ed.; Nova: New York, 1990; p 235.
44. Testa, B.; Mayer, J. M. Acta Pharm. Jugosl. 1990, 40, 315.
45. Karcher, W.; Devillers, J. In Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental Chemistry and Toxicology; Karcher, W., Devillers, J., Eds.; Kluwer: Dordrecht, 1990; p 1.
46. Topliss, J. G.; Costello, R. J. J. Med. Chem. 1972, 15, 1066.
47. Topliss, J. G.; Edwards, R. P. J. Med. Chem. 1979, 22, 1238.
48. Bonchev, D.; Mekenyan, O. J. Math. Chem., in press.
49. Weast, R. C. CRC Handbook of Chemistry and Physics, 67th ed., 3rd printing; CRC: Boca Raton, FL, 1987.
50. Beilsteins Handbuch der Organischen Chemie.
51. Filip, P. A.; Balaban, T.-S.; Balaban, A. T. J. Math. Chem. 1987, 1, 61.