A Graph-Theoretical Approach
to Structure-Property Relationships
Zlatko Mihalić
Faculty of Science and Mathematics, The University of Zagreb, Strossmayerov trg 14, 41000 Zagreb, The Republic of Croatia
Nenad Trinajstić
The Rugjer Bošković Institute, P.O.B. 1016, 41001 Zagreb, The Republic of Croatia
A fundamental concept of chemistry is that the structural characteristics of a molecule are responsible for its
properties (1). This was pointed out in the middle of the
last century by Crum Brown and Fraser (2), who also
devised one of the first structure-property models. However, the earliest work in which this relationship was observed (the toxicity of methyl and amyl alcohols) was a thesis by Cros in 1863 (3).
A Topological Model of Matter
The origin of the structure-property concept can be
traced (4) to the work of the Croatian Jesuit priest, scientist, and philosopher Rugjer Josip Bošković (5), who introduced the idea of representing atoms as points in space (6).
(His major work was the theory of a single law of forces.)
By allowing the point atoms to assume a variety of different arrangements, Bošković was able to account for the existence of different substances.
In this way the Bošković model may be considered as the
forerunner of a topological model for the structure of matter. Bošković's fundamental idea, which is of the greatest
importance in chemistry, was that substances have different properties because they have different structures. This
idea was used, for example, by Davy to rationalize the difference between diamond and graphite (4, 7).
QSPR
Structure-property relationships quantify the connection between the structure and properties of molecules
(8). These relationships are mathematical models that
allow the prediction of properties from structural parameters. They are called quantitative structure-property relationships (QSPR).
Topological Indexes
A graph-theoretical approach to QSPR is based on the
use of topological (graph-theoretical) indices for encoding
the structural information (8-14). The term topological
index (15) indicates a characterization of a molecule (or a
corresponding molecular graph (16)) by a single number.
The need to represent molecular structure by a single
number arises from the fact that most molecular properties are recorded as single numbers. Therefore, QSPR
modelling reduces to a correlation between the two sets of
numbers via an algebraic expression. (One set of numbers
represents the properties, and the other set represents the
structures of molecules under study.)
Characterizing a molecule by a single number represents a considerable loss of information: A three-dimensional object (molecule) is described by a one-dimensional
object (topological index). However, what is surprising is
how much of the relevant structural information is still retained in a given topological index.
There are more than 120 topological indices available to
date in the literature (17), without any sign that their pro-
Volume 69 Number 9 September 1992
701
Molecular Graphs
A special class of chemical graphs are molecular graphs.
Molecular graphs (or constitutional graphs) are chemical
graphs that represent the constitution of molecules. In
these graphs vertices correspond to individual atoms, and
edges correspond to the bonds between them. An interesting historical detail is related to the concept of the molecular graph: The term graph was introduced by the English
mathematician Sylvester (28) in 1878 on the basis of the
constitutional formulas used by the chemists of his day.
To simplify the manipulation of molecular graphs, hydrogen-depleted graphs are often used. Such graphs represent only the molecular skeletons, omitting hydrogen
atoms and their bonds. As an example, Figure 2 gives a
labeled molecular hydrogen-depleted graph that depicts
the carbon skeleton of 2,3,4-trimethylhexane.
Figure 2. A labeled, hydrogen-depleted, molecular graph corresponding to the carbon skeleton of 2,3,4-trimethylhexane. The vertices correspond to atoms, and the edges correspond to chemical bonds.
Analyzing and Comparing Graphs
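The distance matrix D = D(G) of a graph G is defined by
$$(D)_{ij} = \begin{cases} l_{ij} & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases}$$
where $l_{ij}$ is the length of the shortest path (i.e., the distance) between the vertices i and j in G.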
Very often the distance matrix of a graph G can be generated using powers of the corresponding adjacency matrix
of G (29). Table 3 gives the adjacency matrix and the distance matrix that correspond to the molecular graph in
Figure 2.
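The generation of a distance matrix from powers of the adjacency matrix can be sketched as follows. This is a minimal illustration, not the authors' procedure: it relies on the fact that the smallest power k for which the (i, j) entry of A^k is nonzero equals the distance between i and j. The vertex labeling of the 2,3,4-trimethylhexane skeleton (main chain 1 to 6, methyl carbons 7, 8, and 9 on vertices 2, 3, and 4) and all function names are assumptions made here and need not match the labeling in Figure 2.

```python
def adjacency_matrix(n, edges):
    """Adjacency matrix of a graph with vertices labeled 1..n."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = A[j - 1][i - 1] = 1
    return A


def mat_mul(X, Y):
    """Plain product of two square integer matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def distance_matrix(A):
    """Distance matrix of a connected graph from powers of A.

    d(i, j) is the smallest k for which (A^k)[i][j] is nonzero.
    """
    n = len(A)
    D = [[0] * n for _ in range(n)]
    power = [row[:] for row in A]          # holds A^k, starting with k = 1
    for k in range(1, n):
        for i in range(n):
            for j in range(n):
                if i != j and D[i][j] == 0 and power[i][j] > 0:
                    D[i][j] = k
        power = mat_mul(power, A)
    return D


# Hydrogen-depleted graph of 2,3,4-trimethylhexane (assumed labeling).
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7), (3, 8), (4, 9)]
A = adjacency_matrix(9, EDGES)
D = distance_matrix(A)

for row in D:
    print(row)
```

For larger graphs a breadth-first search from each vertex is cheaper than repeated matrix multiplication; the matrix-power form is shown only because it mirrors the statement in the text.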
Table 3. The Adjacency Matrix and the Distance Matrix
of the Molecular Graph in Figure 2
(iv) p(T;3) = 2
Wiener Number
The Wiener number, W = W(G) of G, was introduced by
Wiener in 1947 as the path number (30). This topological
index is defined as the half-sum of the elements of the distance matrix (15).
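The half-sum definition translates directly into a few lines of code. The sketch below is an illustration under stated assumptions: distances are obtained by breadth-first search rather than by matrix powers, the function names are hypothetical, and the example tree is the carbon skeleton of 2,3-dimethylpentane with a labeling chosen here (main chain 1 to 5, methyl carbons 6 and 7 on vertices 2 and 3).

```python
from collections import deque


def distance_matrix(n, edges):
    """Distance matrix of a connected graph with vertices 1..n (BFS)."""
    adj = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    D = []
    for start in range(1, n + 1):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        D.append([dist[t] for t in range(1, n + 1)])
    return D


def wiener_number(n, edges):
    """Half-sum of all elements of the distance matrix."""
    D = distance_matrix(n, edges)
    return sum(sum(row) for row in D) // 2


# Carbon skeleton of 2,3-dimethylpentane (assumed labeling).
TREE_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
print(wiener_number(7, TREE_EDGES))
```

The integer division is exact because the distance matrix is symmetric with a zero diagonal, so the sum of its elements is always even.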
(c) The Hosoya index of T
Z(T) = p(T;0) + p(T;1) + p(T;2) + p(T;3) = 17
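The sum above can be reproduced by brute force. The sketch assumes the usual meaning of p(T;k), namely the number of ways of choosing k edges of T so that no two of them share a vertex, with p(T;0) = 1 by convention. The tree used here, the carbon skeleton of 2,3-dimethylpentane with the labeling shown in the code, is an assumption chosen because it yields the same counts and total as quoted; the function names are hypothetical.

```python
from itertools import combinations


def is_matching(edge_subset):
    """True if no two edges in the subset share a vertex."""
    vertices = [v for edge in edge_subset for v in edge]
    return len(vertices) == len(set(vertices))


def matching_counts(edges):
    """List [p(G;0), p(G;1), ...] up to the last nonzero count."""
    counts = []
    for k in range(len(edges) + 1):
        c = sum(1 for subset in combinations(edges, k) if is_matching(subset))
        if c == 0:
            break
        counts.append(c)
    return counts


def hosoya_index(edges):
    """Hosoya index Z as the sum of all p(G;k)."""
    return sum(matching_counts(edges))


# Carbon skeleton of 2,3-dimethylpentane (assumed labeling).
TREE_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
print(matching_counts(TREE_EDGES))
print(hosoya_index(TREE_EDGES))
```

Enumerating all edge subsets grows exponentially with the number of edges, so this form is suitable only for small molecular graphs such as the alkanes considered here.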
Table 4 gives an example for computing the Wiener number.
Edge type (d(i), d(j))    [d(i) d(j)]^(-1/2)
1,2                       0.7071
1,3                       0.5774
1,4                       0.5
2,2                       0.5
2,3                       0.4082
2,4                       0.3536
3,3                       0.3333
3,4                       0.2887
4,4                       0.25
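$$D(T) = \begin{bmatrix}
0 & 1 & 2 & 3 & 4 & 2 & 3 \\
1 & 0 & 1 & 2 & 3 & 1 & 2 \\
2 & 1 & 0 & 1 & 2 & 2 & 1 \\
3 & 2 & 1 & 0 & 1 & 3 & 2 \\
4 & 3 & 2 & 1 & 0 & 4 & 3 \\
2 & 1 & 2 & 3 & 4 & 0 & 3 \\
3 & 2 & 1 & 2 & 3 & 3 & 0
\end{bmatrix}$$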
(a)A tree T
34) (and also in quantitative structure-reactivity relationships (QSAR) (35)).
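The Randić index χ = χ(G) is defined as
$$\chi = \sum_{\text{edges}} [d(i)\,d(j)]^{-1/2}$$
where d(i) and d(j) are the valencies of the vertices i and j that define the edge i-j, and the sum runs over all edges of the hydrogen-depleted graph.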
42=2
43=2
bL2= 1
b3=4
Hosoya Index
This expression reveals that the Randić indices of hydrocarbons are fully determined by the counts of the edge
types in the corresponding hydrogen-depleted graphs.
Table 7 gives an example of computing the Randić index by
means of eq 6.
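A short sketch of the edge-type computation follows. It assumes that each edge joining vertices of valencies a and b contributes a weight of (ab)^(-1/2), so that the index is the sum of the edge-type counts multiplied by their weights. The example graph is the carbon skeleton of 2,3,4-trimethylhexane with a labeling chosen here (main chain 1 to 6, methyl carbons 7, 8, and 9 on vertices 2, 3, and 4); the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def valencies(edges):
    """Valency (vertex degree) of every vertex that appears in an edge."""
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def edge_type_counts(edges):
    """Count edges by the sorted pair of end-vertex valencies."""
    deg = valencies(edges)
    return Counter(tuple(sorted((deg[i], deg[j]))) for i, j in edges)


def randic_index(edges):
    """Sum over edge types of count * (a*b)**(-1/2)."""
    return sum(count / sqrt(a * b)
               for (a, b), count in edge_type_counts(edges).items())


# Carbon skeleton of 2,3,4-trimethylhexane (assumed labeling).
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7), (3, 8), (4, 9)]
print(dict(edge_type_counts(EDGES)))
print(round(randic_index(EDGES), 4))
```

Because only the edge-type counts enter the sum, two isomers with the same counts necessarily receive the same index value.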
Balaban Index
v(T) = [1 3 2 2 1 1]
MTZ(T) = 2.22
+ 15 + 16 + 18 + 25 = 118
N.
The cyclomatic number μ = μ(G) of a polycyclic graph G
is equal to the minimum number of edges that must be
removed from G to transform it to the related acyclic
graph. For trees, μ = 0; for monocycles, μ = 1.
The distance sum (D)_i for a vertex i of G represents the sum
of all entries in the corresponding row of the distance matrix.
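The two quantities just defined can be combined into a small computation of the Balaban index. The defining equation for the index does not survive in this text, so the sketch assumes the commonly quoted form J = [M/(μ + 1)] Σ [(D)_i (D)_j]^(-1/2), summed over all edges i-j, where M is the number of edges. It also assumes μ = M - N + 1 for a connected graph with N vertices. The example tree (carbon skeleton of 2-methylpentane, main chain 1 to 5, methyl carbon 6 on vertex 2) and the function names are likewise assumptions.

```python
from collections import deque
from math import sqrt


def distance_sums(n, edges):
    """Row sums of the distance matrix for vertices 1..n (BFS distances)."""
    adj = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    sums = {}
    for start in range(1, n + 1):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        sums[start] = sum(dist.values())
    return sums


def cyclomatic_number(n, edges):
    """mu = M - N + 1 for a connected graph (assumed formula)."""
    return len(edges) - n + 1


def balaban_index(n, edges):
    """Assumed form: J = M / (mu + 1) * sum over edges of (Di * Dj)**(-1/2)."""
    ds = distance_sums(n, edges)
    mu = cyclomatic_number(n, edges)
    return len(edges) / (mu + 1) * sum(1 / sqrt(ds[i] * ds[j]) for i, j in edges)


# Carbon skeleton of 2-methylpentane (assumed labeling).
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]
print(distance_sums(6, EDGES))
print(round(balaban_index(6, EDGES), 4))
```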
$$\mathrm{MTI} = \sum_{i=1}^{N} e_i \qquad (10)$$
where the $e_i$ are the elements of the row matrix v(A + D), v is the valency row matrix, A is the adjacency matrix, and D is the distance matrix. Table 9 gives an example of computing the Schultz index.
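The matrix form of the Schultz index can be checked with a short calculation. The sketch assumes that the index is the sum of the elements of the row matrix v(A + D), and it uses the carbon skeleton of 2-methylpentane (main chain 1 to 5, methyl carbon 6 on vertex 2) as an example. The tree, its labeling, and the function names are assumptions made here; with this labeling the valency row is [1 3 2 2 1 1].

```python
from collections import deque


def adjacency_matrix(n, edges):
    """Adjacency matrix for vertices labeled 1..n."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = A[j - 1][i - 1] = 1
    return A


def distance_matrix(A):
    """Distance matrix of a connected graph by BFS on the adjacency matrix."""
    n = len(A)
    D = []
    for start in range(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in range(n):
                if A[u][w] and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        D.append([dist[t] for t in range(n)])
    return D


def schultz_index(n, edges):
    """Return (v, e, MTI) with e = v(A + D) and MTI = sum of e."""
    A = adjacency_matrix(n, edges)
    D = distance_matrix(A)
    v = [sum(row) for row in A]                      # valency row matrix
    e = [sum(v[i] * (A[i][j] + D[i][j]) for i in range(n)) for j in range(n)]
    return v, e, sum(e)


# Carbon skeleton of 2-methylpentane (assumed labeling).
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]
v, e, mti = schultz_index(6, EDGES)
print(v, e, mti)
```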
Harary Number
0
1
4
9
10
16
18
20
28
29
32
31
35
42
46
44
46
48
52
50
48
56
58
63
62
66
71
67
64
65
70
67
66
71
74
79
76
75
72
84
82
86
92
88
88
84
90
86
88
MTI
bp
Alkane
2,2,4-trimethylhexane
2,4,4-trimethylhexane
2,2,5-trimethylhexane
2,2-dimethylheptane
3,3-dimethylheptane
4,4-dimethylheptane
3-ethyl-3-methylhexane
3,3-diethylpentane
2,3,4-trimethylhexane
2,4-dimethyl-3-ethylpentane
2,3,5-trimethylhexane
2,3-dimethylheptane
3-ethyl-2-methylhexane
3,4-dimethylheptane
3-ethyl-4-methylhexane
2,4-dimethylheptane
4-ethyl-2-methylhexane
3,5-dimethylheptane
2,5-dimethylheptane
2,6-dimethylheptane
2-methyloctane
3-methyloctane
4-methyloctane
3-ethylheptane
4-ethylheptane
nonane
2,2,3,3,4-pentamethylpentane
2,2,3,3-tetramethylhexane
3-ethyl-2,2,3-trimethylpentane
3,3,4,4-tetramethylhexane
2,2,3,4,4-pentamethylpentane
2,2,3,4-tetramethylhexane
3-ethyl-2,2,4-trimethylpentane
2,3,4,4-tetramethylhexane
2,2,3,5-tetramethylhexane
2,2,3-trimethylheptane
2,2-dimethyl-3-ethylhexane
3,3,4-trimethylheptane
3,3-dimethyl-4-ethylhexane
2,3,3,4-tetramethylhexane
3,4,4-trimethylheptane
3,4-dimethyl-3-ethylhexane
3-ethyl-2,3,4-trimethylpentane
2,3,3,5-tetramethylhexane
2,3,3-trimethylheptane
2,3-dimethyl-3-ethylhexane
3,3-diethyl-2-methylpentane
2,2,4,4-tetramethylhexane
2,2,5-trimethylheptane
2,5,5-trimethylheptane
2,2,6-trimethylheptane
2,2-dimethyloctane
3,3-dimethyloctane
4,4-dimethyloctane
3-ethyl-3-methylheptane
4-ethyl-4-methylheptane
3,3-diethylhexane
2,3,4,5-tetramethylhexane
121    58    4.4641    3.8140    436    13.9933    161
Alkane
2,3,4-trimethylheptane
2,3-dimethyl-4-ethylhexane
2,3-dimethyl-4-ethylhexane
2,4-dimethyl-3-ethylhexane
3,4,5-trimethylheptane
2,4-dimethyl-3-isopropylpentane
3-isopropyl-2-methylhexane
2,3,5-trimethylheptane
2,5-dimethyl-3-ethylhexane
2,4,5-trimethylheptane
2,3,6-trimethylheptane
2,3-dimethyloctane
3-ethyl-2-methylheptane
3,4-dimethyloctane
4-isopropylheptane
4-ethyl-3-methylheptane
4,5-dimethyloctane
3-ethyl-4-methylheptane
3,4-diethylhexane
2,4,6-trimethylheptane
2,4-dimethyloctane
4-ethyl-2-methylheptane
3,5-dimethyloctane
3-ethyl-5-methylheptane
2,5-dimethyloctane
5-ethyl-2-methylheptane
3,6-dimethyloctane
2,6-dimethyloctane
2,7-dimethyloctane
2-methylnonane
3-methylnonane
4-methylnonane
3-ethyloctane
5-methylnonane
4-ethyloctane
4-propylheptane
decane
165    89    4.9142
the procedure repeated. The QSPR model thus established, even for a narrow class of compounds, is a very useful tool for predicting the properties of hypothetical compounds and for the search for new compounds with
programmed properties (12).
3.5833
3.7561
3.7561
3.7979
3.6854
3.9835
3.7280
3.4617
3.6033
3.5027
3.3014
3.1296
3.3978
3.3088
3.4999
3.5637
3.3759
3.5299
3.6982
3.3374
3.1600
3.3908
3.2686
3.4123
3.1244
3.2555
3.1682
3.0333
2.9095
2.7732
2.8862
2.9680
3.0869
2.9984
3.2055
3.2951
2.6476
Step 1
The boiling points (°C) of the alkanes are taken from the
CRC Handbook of Chemistry and Physics (49) and Beilstein (50).
Step 2
We will consider at this stage all six topological indices
discussed in this report.
Step 3
The following structure-property models are the most
successful for each index considered:
bp = 77.93 (M.97) ~30899'0'0137'- (3.35 f l . 0 2 ) 1 0 $
~
-164.24 (i4.99)
An Instructive Example
We will apply the procedure from the preceding section
to give an instructive example of the design of the QSPR
model for predicting the boiling points of alkanes. As the
initial set we will consider alkanes with up to 8 carbon
atoms (40 molecules).
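The fitting step of such a procedure can be illustrated with a deliberately small example. The sketch below is not one of the article's models: it fits a straight line bp = a ln Z + b by ordinary least squares to the eight unbranched alkanes from methane to octane only. The boiling points are rounded textbook values supplied here as an assumption, not data taken from the article's tables, and the Hosoya indices of the unbranched chains are generated from the recurrence Z(n) = Z(n-1) + Z(n-2) that holds for path graphs. All names are hypothetical.

```python
from math import log


def hosoya_path(n):
    """Hosoya index of the unbranched chain with n carbon atoms."""
    a, b = 1, 1                     # values for chains of 0 and 1 vertices
    for _ in range(n - 1):
        a, b = b, a + b
    return b


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; also returns r^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)


# Rounded boiling points (deg C) of n-alkanes C1..C8; assumed textbook values.
BP_N_ALKANES = {1: -161.5, 2: -88.6, 3: -42.1, 4: -0.5,
                5: 36.1, 6: 68.7, 7: 98.4, 8: 125.7}

ln_z = [log(hosoya_path(n)) for n in BP_N_ALKANES]
bp = list(BP_N_ALKANES.values())
slope, intercept, r2 = linear_fit(ln_z, bp)
print(slope, intercept, r2)
```

A fit restricted to unbranched chains says nothing about how well an index separates isomers, which is the harder part of the problem treated in the steps of this example.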
MTI
(13)
Eq 14
119.26
Eq 15
119.40
2,2,3-trimethylhexane
2,2-dimethyl-3-ethylpentane
3,3,4-trimethylhexane
2,3,3,4-tetramethylpentane
2,3,3-trimethylhexane
2,3-dimethyl-3-ethylpentane
2,2,4,4-tetramethylpentane
2,2,4-trimethylhexane
2,4,4-trimethylhexane
2,2,5-trimethylhexane
2,2-dimethylheptane
3,3-dimethylheptane
4,4-dimethylheptane
3-ethyl-3-methylhexane
3,3-diethylpentane
2,3,4-trimethylhexane
2,4-dimethyl-3-ethylpentane
2,3,5-trimethylhexane
2,3-dimethylheptane
3-ethyl-2-methylhexane
3,4-dimethylheptane
3-ethyl-4-methylhexane
2,4-dimethylheptane
4-ethyl-2-methylhexane
3,5-dimethylheptane
2,5-dimethylheptane
2,6-dimethylheptane
2-methyloctane
3-methyloctane
4-methyloctane
3-ethylheptane
4-ethylheptane
nonane
Step 4
We use eqs 14 and 15 to predict the boiling points of nonanes (35 molecules) (see Table 12).
Step 5
We compare the predicted and experimental values of
the nonane boiling points (see Table 13).
Step 6
Both models have problems with some members of the
nonane series. However, when Step 3 is repeated using the
boiling points of all alkanes with up to 9 carbon atoms, the
QSPR models based on ln Z and χ did not improve. A
slight improvement happened only when a biparametric
model (with χ and N, where N is the number of carbon atoms in the alkane) was used.
This model is given by
up
Table 13. Comparison between Predicted (Two Models) and Experimental Values of Boiling Points (°C) of Nonanes
Nonane    (bp)exp    Model (14)    Model (15)
2,2,3,3-tetramethylpentane
2,3,4-trimethylhexane
2,2,3,4-tetramethylpentane
2,4-dimethyl-3-ethylpentane
2,2,3-trimethylhexane
2,3,5-trimethylhexane
2,2-dimethyl-3-ethylpentane
2,3-dimethylheptane
3,3,4-trimethylhexane
3-ethyl-2-methylhexane
2,3,3,4-tetramethylpentane
3,4-dimethylheptane
2,3,3-trimethylhexane
3-ethyl-4-methylhexane
2,3-dimethyl-3-ethylpentane
2,4-dimethylheptane
4-ethyl-2-methylhexane
2,2,4,4-tetramethylpentane
3,5-dimethylheptane
2,2,4-trimethylhexane
2,5-dimethylheptane
2,4,4-trimethylhexane
2,6-dimethylheptane
2,2,5-trimethylhexane
2-methyloctane
2,2-dimethylheptane
3-methyloctane
3,3-dimethylheptane
4-methyloctane
4,4-dimethylheptane
3-ethylheptane
3-ethyl-3-methylhexane
4-ethylheptane
3,3-diethylpentane
nonane
Figure 4. A plot of bp vs ln Z for the first 40 alkanes.
The ... in eq 20 are defined as follows.
Figure 7. Examples of a path (3rd order), a cluster (3rd order), and a path-cluster (4th order) for a
tree T corresponding to 3-methylpentane.
The extended connectivity index
m ~ C[d(i)
=
dm
(21)
Acknowledgment
We are thankful to the Ministry of Science, Technology,
and Informatics of the Republic of Croatia for support.
11. Rouvray, D. H. In Chemical Applications of Topology and Graph Theory; King, R. B., Ed.; Elsevier: Amsterdam, 1983; p 159.
12. Stankevich, M. I.; Stankevich, I. V.; Zefirov, N. S. Russ. Chem. Rev. 1988, 57, 337.
13. Hansen, P. J.; Jurs, P. C. J. Chem. Educ. 1988, 65, 575.
14. Randić, M. J. Math. Chem. 1990, 4, 337.
15. Hosoya, H. Bull. Chem. Soc. Japan 1971, 44, 2332.
16. Trinajstić, N. Chemical Graph Theory, 2nd revised ed.; CRC: Boca Raton, FL, 1992; chapter 3.
17. Rouvray, D. H. J. Mol. Struct. (Theochem) 1989, 185, 187.
18. Randić, M. J. Math. Chem. 1991, 7, 155.
19. Bonchev, D.; Trinajstić, N. J. Chem. Phys. 1977, 67, 4517.
20. Balaban, A. T.; Quintas, L. V. Math. Chem. (Mülheim/Ruhr) 1983, 14, 213.
21. Müller, W. R.; Szymanski, K.; Knop, J. V.; Trinajstić, N. J. Chem. Inf. Comput. Sci. 1990, 30, 160.
22. Plavšić, D.; Nikolić, S.; Trinajstić, N. J. Math. Chem., in press.
23. Szymanski, K.; Müller, W. R.; Knop, J. V.; Trinajstić, N. Int. J. Quantum Chem.: Quantum Chem. Symp. 1986, 20, 173.
24. Harary, F. Graph Theory; Addison-Wesley: Reading, MA, 1971; 2nd printing.
25. Trinajstić, N. Chemical Graph Theory; CRC: Boca Raton, FL, 1983; Vol. 1.
26. Chartrand, G. Graphs as Mathematical Models; Prindle, Weber, and Schmidt: Boston, MA, 1977.
27. Trinajstić, N. In MATH/CHEM/COMP 1987; Lacher, R. C., Ed.; Elsevier: Amsterdam, 1988; p 83.
28. Sylvester, J. J. Nature 1878, 17, 284.
29. Roberts, F. S. Discrete Mathematical Models; Prentice-Hall: Englewood Cliffs, NJ, 1976; p 56.
30. Wiener, H. J. Am. Chem. Soc. 1947, 69, 17.
31. Randić, M. J. Am. Chem. Soc. 1975, 97, 6609.
32. Razinger, M.; Chrétien, J. R.; Dubois, J. E. J. Chem. Inf. Comput. Sci. 1985, 25, 23.
33. Rouvray, D. H. Sci. Am. 1986, 254, 40.
34. Seybold, P. G.; May, M.; Bagal, U. A. J. Chem. Educ. 1987, 64, 575.
35. Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity Analysis; Wiley: New York, 1986.
36. Balaban, A. T. Chem. Phys. Lett. 1982, 89, 399.
37. Schultz, H. P. J. Chem. Inf. Comput. Sci. 1989, 29, 227.
38. Plavšić, D.; Nikolić, S.; Trinajstić, N. J. Math. Chem., submitted for publication.
39. Randić, M.; Jerman-Blažič, B.; Grossman, S. C.; Rouvray, D. H. Math. Comput. Modelling 1986, 8, 571.
40. Needham, D. E.; Wei, I.-C.; Seybold, P. G. J. Am. Chem. Soc. 1988, 110, 4186.
41. Nizhnii, S. V.; Epshtein, N. A. Russ. Chem. Rev. 1978, 47, 383.
42. Hol, W. G. J. Angew. Chem. Int. Ed. Engl. 1986, 25, 767.
43. Basak, S. C.; Niemi, G. J.; Veith, G. D. In Computational Chemical Graph Theory; Rouvray, D. H., Ed.; Nova: New York, 1990; p 235.
44. Testa, B.; Mayer, J. M. Acta Pharm. Jugosl. 1990, 40, 315.
45. Karcher, W.; Devillers, J. In Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental Chemistry and Toxicology; Karcher, W.; Devillers, J., Eds.; Kluwer: Dordrecht, 1990; p 1.
46. Topliss, J. G.; Costello, R. J. J. Med. Chem. 1972, 15, 1066.
47. Topliss, J. G.; Edwards, R. P. J. Med. Chem. 1979, 22, 1238.
48. Bonchev, D.; Mekenyan, O. J. Math. Chem., in press.
49. Weast, R. C. CRC Handbook of Chemistry and Physics, 67th ed., 3rd printing; CRC: Boca Raton, FL, 1987.
50. Beilsteins Handbuch der Organischen Chemie.
51. Filip, P. A.; Balaban, T.-S.; Balaban, A. T. J. Math. Chem. 1987, 1, 61.