Arti®cial Neural Networks For Computer-Based Molecular Design

Progress in
Biophysics
& Molecular
Biology
PERGAMON Progress in Biophysics & Molecular Biology 70 (1998) 175±222
Arti®cial neural networks for computer-based molecular

design
Gisbert Schneider a, *, Paul Wrede b
a
F. Homann-La Roche Ltd., Pharmaceuticals Division, Molecular Design and Bioinformatics, CH-4070 Basel,
Switzerland
b
Freie UniversitaÈt Berlin, UniversitaÈtsklinikum Benjamin Franklin, Institut fuÈr Medizinische/Technische Physik und
Lasermedizin, AG Molekulare Bioinformatik, Krahmerstraûe 6±10, D-12207 Berlin, Germany
Abstract
The theory of arti®cial neural networks is brie¯y reviewed focusing on supervised and unsupervised
techniques which have great impact on current chemical applications. An introduction to molecular
descriptors and representation schemes is given. In addition, worked examples of recent advances in this
®eld are highlighted and pioneering publications are discussed. Applications of several types of arti®cial
neural networks to compound classi®cation, modelling of structure±activity relationships, biological
target identi®cation, and feature extraction from biopolymers are presented and compared to other
techniques. Advantages and limitations of neural networks for computer-aided molecular design and
sequence analysis are discussed. # 1998 Elsevier Science Ltd. All rights reserved.
Keywords: Combinatorial chemistry; Compound library; Drug design; Genetic algorithm; Molecular diversity; Mol-
ecular descriptor; Neural network; Sequence analysis; Structure±activity relationship
Abbreviations: 2D, two-dimensional, 3D, three-dimensional, ANN, arti®cial neural networks, ART, adaptive res-
onance theory, BAM, bi-directional associative memory, Bp, back-propagation-of-errors, CAMD, computer-aided
molecular design, CPN, counter-propagation network, LMS, least-mean-square, PCA, principle component analysis,
QSAR, quantitative structure±activity relationship, RBF, radial basis function, ReNDeR, reversible non-linear
dimension reduction, SAR, structure±activity relationship, SBB, structural building blocks, SNN, supervised neural
networks, UNN, unsupervised neural networks.
* Corresponding author. Tel.: +41-61-6870696; fax: +41-61-6887408; e-mail: gisbert.schneider@roche.com.
0079-6107/98/$19.00 # 1998 Elsevier Science Ltd. All rights reserved.

PII: S 0 0 7 9 - 6 1 0 7 ( 9 8 ) 0 0 0 2 6 - 1
176 G. Schneider, P. Wrede / Progress in Biophysics & Molecular Biology 70 (1998) 175±222
1. Introduction
Intelligent algorithms for computer-aided molecular design (CAMD) have become an

important part of lead discovery and optimization, biological target identi®cation, protein and
nucleic acid design, and assessment of data base diversity (Hunter, 1993; Wrede and Schneider,
1994; Jackson, 1995; van de Waterbeemd, 1995a,b; Blundell, 1996; BoÈhm et al., 1996; Willett,
1997; Young et al., 1997). Various methodological CAMD approaches help speed up the drug
development process either by planned alteration of molecular template structure(s) or by de
novo design (MuÈller, 1995). Most of today's CAMD methods are based on knowledge of
three-dimensional molecular structures. In this work we highlight new techniques for molecule
design and molecular feature extraction, which can be applied when three-dimensional
molecular structures are unknown. A necessary prerequisite for any rational attempt to identify
or even design molecules with a desired property or activity is an accurate model of the
underlying structure±activity relationship (SAR) (Hansch and Leo, 1995). Such SAR models
serve as a guideline in the search for novel or optimized compounds in evolutionary design
cycles which have become possible due to advances in both compound generation and
screening technology. It is obvious that the quality of the model determines the success rate of
this multi-dimensional design process. Only if a relevant SAR model is used, e.g. a particular
pharmacophore hypothesis, can rational molecular design be successful.
How can we develop a good SAR model? It is apparent that no cure-all recipe exists,
nevertheless some general rules of thumb can be given. One approach is to consider the task as
a pattern recognition problem, where three main aspects must be considered: ®rst, the data
used for generation of a SAR hypothesis should be representative of the particular problem;
second, the way molecular structures are described for model generation and its level of
abstraction must allow for a reasonable solution for the pattern recognition task; and third,
the model must permit non-linear relationships to be formulated since the interdependence
between molecular activities and structural entities is generally non-linear. The ®rst point seems
to be trivial but selection of representative data for hypothesis generation is very dicult and
often impossible due to a lack of data. This article mainly focuses on a discussion of the two
latter points, namely dierent levels of data representations and descriptor types, and non-
linear feature extraction from a given data set by arti®cial neural networks (ANN). Various
types of ANN are of considerable value for many ®elds of research, including chemistry,
biology, medicine, and pharmaceutical research. Main tasks performed by these systems are
. feature extraction;
. function estimation and non-linear modelling;
. classi®cation;
. prediction.
For many applications alternative techniques exist (Milne, 1997); ANN provide, however, an
often more ¯exible and elegant approach oering unique solutions to these tasks. The
paradigms oered by ANN lie somewhere between purely empirical and ab initio approaches.
Neural networks
G. Schneider, P. Wrede / Progress in Biophysics & Molecular Biology 70 (1998) 175±222 177
. `learn' from examples and acquire their own `knowledge' (induction);

. are able to generalize;
. provide ¯exible non-linear models of input/output relationships;
. are able to cope with noisy data and are fault-tolerant.
Here we review some of the pioneering work in this ®eld highlighting the development of
ANN for (Q)SAR modelling and molecular design which has dramatically gained momentum
during the last several years (Bradley, 1997). A special emphasis will be on ANN for feature
extraction from nucleic acid and protein sequences (Section 4.1) which can help in ®nding new
drug targets and enlarging today's target spectrum (Drews and Ryser, 1997).
2. Principles of neural network development
Arti®cial neural networks have found a widespread use for classi®cation tasks and function
approximation in many ®elds of chemistry and bioinformatics (Table 1). For these kinds of
data analysis mainly two dierent types of networks are employed, `supervised' neural
networks (SNN) and ùnsupervised' neural networks (UNN). The main applications of SNN
are function approximation, classi®cation, pattern recognition and feature extraction, and
prediction tasks (Table 1). These networks require a set of molecular compounds with known
activities to model structure±activity relationships. In an optimization procedure, which will be
described below (Section 2.3), these known `target activities' serve as a reference for SAR
modelling. This principle coined the term `supervised' networks. Correspondingly,
ùnsupervised' networks can be applied to similar tasks even without prior knowledge of
molecular activities. These systems can perform automatic feature extraction and data
Table 1
Neural network types frequently used in the life sciences (adapted from Sumpter et al., 1994)
Network type/architecture Main applications
Supervised
Multilayer feed-forward (bp) non-linear modelling of (Q)SAR, prediction of molecule
activity and structure, pattern recognition, classi®cation,
signal ®ltering, noise reduction, feature extraction
Recurrent networks sequence and time series analysis
Encoder networks (ReNDeR) data compression, factor analysis, feature extraction
Learning vector quantization auto-associative recall, data compression
Unsupervised
Kohonen self-organizing map clustering, data compression, visualization
Hop®eld networks auto-associative recall, optimization
Bidirectional associative memory (BAM) pattern storage and recall (hetero-association)
Adaptive resonance theory (ART) models clustering, pattern recognition
Hybrid
Counterpropagation networks function approximation, prediction, pattern recognition
Radial basis function (RBF) networks function approximation, prediction, clustering
Adaptive fuzzy systems similar to ART and bp-networks
classi®cation. An introduction to UNN development is provided in chapter 2.2. Hybrid systems

have also been developed and successfully applied to various pattern recognition and modelling
tasks (Section 2.4). In Table 1 the main network types and characteristic applications in the life
sciences are listed. We restrict the following description of neural networks to the most widely
applied in the ®eld of molecular design and analysis of protein and nucleic acid sequences.
More extensive introductions to the theory of neural computation including comparisons with
statistical methods are given in various textbooks (e.g., Rumelhart et al., 1986; Hertz et al.,
1991; Amari, 1993; Bishop, 1995; Ripley, 1996). There are many books on neural networks
covering parts of the subject not discussed here, e.g. the relation of neural network approaches
to machine learning (Mitchie et al., 1994; Bratko and Muggleton, 1995; Langley and Simon,
1996; Langley, 1996), fuzzy classi®ers (Kosko, 1992), time series analysis (Weigend and
Gershenfeld, 1993), and special aspects of pattern recognition (Carpenter and Grossberg,
1991). Ripley provides an in depth treatment of the relation of neural network techniques to
Bayesian statistics and Bayesian training methods (Ripley, 1996). A commented collection of
pioneering publications in neurocomputing has been compiled by Anderson and Rosenfeld
(1988). Recently, two texts have become available covering both theory and applications of
ANN in chemistry, QSAR, and drug design (Zupan and Gasteiger, 1993; Devillers, 1996a).
Frequently data sets must be investigated for homogeneity. A typical example is the
investigation of a compound library for clusters of similar molecules (or performing a
`diversity' analysis) (Downs and Willett, 1995; Willett, 1997). UNN are able to automatically
recognize clusters of data in a given set based on a rational representation of the compounds
and a sensitive similarity measure. What is meant by `similarity' and `diversity' must be
carefully de®ned since the selection of an appropriate measure of similarity is crucial for the
clustering results (Good et al., 1993a,b; Dean, 1995; Sen, 1995a,b). The term ùnsupervised'
stresses the fact that no search template is required for this application, i.e. the network
automatically extracts data features, and no knowledge or de®nition of underlying attributes is
required. These systems can be helpful in generating an overview of the distributions of the
data sets, and they are, therefore, especially suited for a ®rst examination of a given data set or
raw data analysis. In Fig. 1a a distribution revealing two main clusters is shown which might
represent sets of active and inactive molecular compounds. Such inhomogeneities can be easily
detected by UNN. Some types of UNN are able to form a comprehensive model of the real
data distribution which can be helpful for extraction and an understanding of predominant
data features, e.g. molecular structures or properties that are responsible for a certain
biological activity.
In contrast to UNN, the structuring of a data set must be already known to apply SNN.
These systems are able to approximate arbitrary relationships between data points and their
functional or classi®cation values, i.e. they can be used to model all kinds of input±output
relationships (Fig. 1b) and classify data by establishing a classi®cator function (Fig. 1c). The
term `supervised' network indicates that in contrast to UNN for every data point one or more
corresponding functional values must be already known (these might be experimentally
measured molecule activities).
Fig. 1. (a) Data distribution revealing two major clusters; (b) modelling a mathematical function f(x) to establish an
input-output relationship by SNN; (c) a classi®cator (line) formed by a SNN separating two classes of data (shown
as circles and squares).
2.1. Building blocks of neural network architecture
Arti®cial neural networks consist of two elements, (i) formal neurons, and (ii) connections
between the neurons. Neurons are arranged in layers, where at least two layers of neurons (an
input layer and an output layer) are required for construction of a neural network. In Fig. 2
three network architectures are shown, both a two-layered network (Fig. 2a) and a three-
layered network (Fig. 2b) with a single output neuron, and a two-layered network with many
output neurons (Fig. 2c). In these three architectures every neuron of one layer is connected to
every neuron of the following layer, and no intra-layer connections exist. This property coined
the term `fully connected feed-forward networks'. Sometimes they are referred to as `multi-
layered perceptrons' although the classical perceptron contains only a single neuron
(Rosenblatt, 1958; Minsky and Papert, 1969). Fully connected feed-forward networks are by
far the most frequently used neural networks for molecular design and bioinformatical
applications.
Formal neurons transform a numerical input to an output value, and the neuron
connections represent numerical weight values. The weights and the neurons' internal variables
(termed bias or threshold values) are free variables of the system which must be determined in
the so-called `training phase' of network development. Selected feed-forward network types and
appropriate training algorithms are discussed in the following. For details on other network
architectures, see the literature (Hertz et al., 1991; Bishop, 1995; Ripley, 1996).
2.2. Unsupervised training of neural networks
Fig. 2c shows the architecture of a two-layered, fully-connected feed-forward net. This

architecture is often used for data analysis by UNN. However, a number of dierently
structured UNN architectures have been described for speci®c applications (Ritter et al., 1990).
A special type of UNN are self-organizing feature maps or Kohonen-networks with their
output layer neurons arranged as an array (Kohonen, 1982, 1984, Fig. 2d). In UNN both the
number of input layer neurons and the number of weights connected to an output neuron are
identical to the dimension of the input data (for example, molecular compounds represented by
a number of descriptors). The input layer neurons simply distribute the data values to the
Fig. 2. Dierent architectures of feed-forward networks. Formal neurons are drawn as circles, weights are
represented by lines connecting the neurons. The ¯ow of information through the networks is from top to bottom.
White circles: fan-out neurons; gray circles: sigmoidal neurons; black circles: linear neurons. (a) Perceptron; (b)
three-layered feed-forward network; (c + d) two-layered feed-forward network.
output neurons (`fan-out' units), and the output neurons compute a value of 0 or 1 depending
on their input. Output neurons represent possible data clusters or categories. The number of
clusters considered is identical to the number of output layer neurons. Only a single output
neuron computes the value 1 for a given data point at a time, all other output layer neurons
have an output value of 0. By this the data point under investigation is assigned to a cluster.
The decision which output neuron ®res is made by comparing the data vector with all of the
neurons' weight vectors, and the output neuron with the most similar weight vector to the data
input is selected. The network weights provide the basis for data classi®cation.
Weight vectors must be determined by an optimization procedure during the training phase.
The process of ùnsupervised' network training is similar to vector quantization (Fig. 3)
(Nasrabadi and King, 1988; Kohonen, 1989; Hertz et al., 1991). Initially the weights have
arbitrary values (they should, however, not be too dierent from the data values to facilitate
convergence of the training process). Then, a cyclic optimization procedure takes place, termed
`competitive' learning:
step 1: randomly select a data point, x, from the data set to be classi®ed;
step 2: determine the output neuron with the weight vector w closest to the data point
(`winner neuron' w * ):
i
jw * ÿ xjRjwi ÿ xj for all i;
i
step 3: update the weight values of the winner neuron according to the following
prescription:
Dw * Z x j ÿ w * ,
i j i j
where Z is a learning step size which usually is a function of time, Z(t). Conditions on Z(t)
needed to ensure convergence are discussed in the literature (Cottrell and Fort, 1986; Ritter
and Schulten, 1988);
step 4: go to step 1 or stop training (e.g., if the value of Z is below a critical threshold, or if
a certain number of cycles has passed).
This `winner-takes-all' strategy forces the weight values of the network to move towards
centroids of the data distribution and become a set of prototype vectors (Fig. 3). All data
points located within the `receptive ®eld' of an output neuron will be assigned to the same
cluster. The receptive ®elds of a fully trained UNN are comparable to the areas de®ned by
Voronoi tessellation (or Dirichlet tessellation) of the input space (Hertz et al., 1991; Preparata
and Shamos, 1985). All data which are closer to the weight vector of one neuron than to any
other weight vector belong to its receptive ®eld. The criterion employed to ®nd the winner
neuron can either be a distance between a weight vector, w, and a data vector, x. Frequently
used are the euclidian distance:
s
X
d wi ÿ x i 2 ,
i
Fig. 3. Principle of vector quantization by unsupervised network training. The network weight vectors (®lled
arrowheads) move towards centers of the data distribution during training. Location of the weight vectors before (a)
and after training (b). Data points are drawn as vectors with open arrowheads. Here two clusters are formed by two
neurons; 1 and 2 are two dimensions of the data space.
or the Manhattan distance:

X
d jwi ÿ x i j,
i
The complementary approach is to determine the neuron leading to the maximal formal
output, where
X
output wi x i :
i
As with similarity measures in statistical clustering, for example k-nearest neighbor methods,
the de®nition of similarity between data points (or weight vectors, respectively) has a great
in¯uence on the outcome of the analysis.
The optimum number of output neurons is not known a priori and the choice very much
depends on the knowledge available about the original data distribution. Sometimes it is
reasonable to expect only a limited number of data classes like àctive compounds' and
ìnactive compounds'. There are also simple methods that can be used to automatically adapt
the size of the output layer. For example, a new output neuron may be added located within
the receptive ®eld of an already existing neuron if data belonging to its receptive ®eld are less
similar to the weight vector than a minimum threshold; an output neuron might be removed if
it represents only a single data point or all data belonging to its receptive ®eld are more similar
to the weight vector than a maximum threshold. Currently much eort is spent in the
development of techniques for optimizing UNN architectures.
A useful modi®cation of the `winner-take-all' algorithm described above is to allow more
than just one single output neuron to adapt its weights during a training cycle as suggested by
Kohonen (1982, 1984, 1989). In a Kohonen-network the output layer neurons are arranged in
a plane or another low-dimensional geometry. During training not only the winner neuron
weights are updated but also neurons located close to the winner neuron in the output layer,
de®ned by a neighborhood-function. As a consequence, the topology of the usually high-
dimensional input space is conserved in the low-dimensional space spanned by the network
output neurons. Analysis of the features extracted by the network and their distribution in the
neuron map can help getting an idea of the structuring of the training data. Simple examples
of UNN training by the Kohonen-algorithm are shown in Fig. 4. Two-layered networks
containing two input neurons and 10 10 output neurons arranged in a plane were trained by
100 randomly distributed data points (Fig. 4a) and by ten data points roughly forming the
shape of a heart (Fig. 4b). The network architecture is shown in Fig. 2d. In the example the
two-dimensional input space was mapped onto a plane spanned by the output neurons. After
training the network topology re¯ects the corresponding data distribution. For visualization of
the network development process in Fig. 4 the output neurons were connected by lines, where
the co-ordinates of the intersections are given by the corresponding weight vectors.
Besides data clustering Kohonen-networks are useful tools for data visualization. For
example, they were applied to mapping the electrostatic surface potential of molecules onto a
plane (Gasteiger and Li, 1994; Barlow, 1995). This can be useful for visual inspection of
potential pharmacophores and comparison of potential drug candidates. An example of
Fig. 4. Examples of topology-preservation in Kohonen-networks. (a) training of a 10 10 Kohonen-network with

randomly distributed data; (b) training of a 10 10 Kohonen-network with only a few data points roughly forming
the shape of a heart. Stages of network development during the training process are shown. At the end of UNN
training the network topologies re¯ect the underlying data distributions.
visualizing a model of the chemical space covered by a molecular compound library using
Kohonen-mapping is presented in Section 4.2. Several modi®cations of the Kohonen algorithm
have been developed to improve the original method and overcome some limitations (e.g.,
Ritter and Schulten, 1988, 1989; Martinetz and Schulten, 1994). Of particular interest are
special neighborhood functions in the output layer of the Kohonen-network to increase
contrast between neuron clusters (`Mexican hat' function), and to use a toroidal geometry of
the output layer (Zupan and Gasteiger, 1993). The latter can help avoiding erroneous data
classi®cations due to border eects of a planar topology (Li et al., 1993), and is especially
useful for mapping a continuous input space, e.g. a molecular electrostatic potential (Zupan
and Gasteiger, 1993; Holzgrabe et al., 1996).
A disadvantage of Kohonen-networks can be the comparatively long training time needed,
especially if large data sets are used since every data vector must be presented several times to
the network for weight adaptation. Hybrid multi-layered UNN which can be trained extremely
fast employing very large data sets have already been developed (Lu et al., 1990a,b; Hertz et
al., 1991; Lu and Lerner, 1996). These systems can provide an alternative classi®cation tool to
Kohonen-networks if real-time or on-line computation is required, e.g. for control and analysis
of high-throughput screening results. Usually they contain more than two layers of neurons
where from layer to layer a more subtle data classi®cation is performed, and only parts of the
network are considered during one training cycle. This reduces the training time needed since a
data vector is not compared to every weight vector as in classical Kohonen-networks. Until
today, however, no application of these modi®ed modular systems to molecular design has
been published. An overview of early Kohonen-network applications in chemistry together
with a treatment of Hop®eld networks has been compiled by Melssen and co-workers (Melssen
et al., 1994). A comparison of unsupervised Kohonen-networks with the supervised learning
vector quantization approach was performed by Marabini and Carazo (1994) taking the
classi®cation of images of biological macromolecules as an example.
2.3. Supervised training of neural networks
SNN can be used as function estimators (Cybenko, 1988; Hertz et al., 1991) and
classi®cators (Minsky and Papert, 1969). They follow the principle of convoluting simple non-
linear functions for approximation of complicated input±output relationships (Kolmogorov's
theorem, Kolmogorov (1957)). Fourier transforms, for example, are based on a similar idea
using super-positions of Sin and Cos terms. This general concept has been thoroughly
discussed by Bishop (1995). Since in SNN mainly sigmoidal functions are employed the
following description of these systems will be restricted to this network type. Examples of
networks using other neuron activities have been described and ®rst applications to molecular
design exist (see below). We restrict the discussion of SNN to feed-forward networks because
of their dominating role in chemistry and biology. Other network architectures, e.g. recurrent
networks for time-series and sequence analysis, are not considered here.
Like UNN, a supervised neural network architecture consists of neurons and connection
weights. However, in contrast to UNN all neurons in a supervised net compute an output
value every time a data vector is fed into the network. A scheme of a fully connected three-
layered feed-forward network is shown in Fig. 2b. Input layer neurons simply distribute a data
vector to the hidden-layer neurons. They are often referred to as `fan-out units' because they
simply distribute the data vector to the next network layer without any calculation being
performed. The input layer of SNN is identical to the input layer of UNN (Fig. 2). In the
hidden layer neurons with a sigmoidal activation function (or `transfer function') are used. A
sigmoidal neuron computes an output value according to:
1
Sigminput ,
1 eÿinput
where
X
input wi x i ÿ W:
i
Here w is the weight vector connected to the neuron, x is the neuron's input signal, and W is
the neuron's bias or threshold value. If a single sigmoidal output neuron is used the overall
function represented by the fully connected two-layered feed-forward network shown in Fig. 2a
is:
X
fx Sigm wi x i ÿ W ,
i
where x is the input vector (data vector). The network shown in Fig. 2b with sigmoidal hidden
units and a sigmoidal output unit represents a more complicated function:
X X
out
fx Sigm vj Sigm wi,j x i ÿ Wj ÿ W ,
j i
where w are the input-to-hidden weights, v are the hidden-to-output weights, W are the hidden
layer bias values, and Wout is the output neuron's bias. The more layers are present in a
network the more complicated overall functions can be represented. At most two hidden layers
with non-linear neurons are required to approximate arbitrary continuous functions (Cybenko,
1989; Hornik et al., 1989). However, depending on the application and the accuracy of the ®t
the required number of layers and the number of neurons in a layer can vary. In general the
number of free variables in the system, i.e. the number of weights values, should be smaller
than the number of data points in the training set to avoid having an over-determined system.
Baum and Haussler (1989) addressed the question of what size of a network can be expected to
generalize from a given number of training examples. A `generalizing' solution derived from
the analysis of a limited set of training data will be valid for any further data point leading to
perfect predictions. This is possible only by the use of data which is representative of the
problem. Most solutions found by feature extraction from lifelike data sets will, however, be
sub-optimal in this general meaning (Section 2.4). Andrea and Kalayeh (1991) proposed a rule
of thumb, according to which the ratio of data points available (training data) and network
weights, r, should be around 2, where
number of training data points
r :
number of weights
A systematic investigation of ideal ranges of the parameter r in QSAR studies was

performed by Manallack and Livingston (1995). They found that a r value in excess of 3 is
needed to keep the risk of memorization (over-learning) low. It is not trivial to select an
appropriate network architecture for a given task. Several techniques for optimization of SNN
architecture have been developed or adapted, including `weight pruning' or genetic algorithms
(Hinton, 1986; Sietsma and Dow, 1988; Harp et al., 1990; Lohmann, 1993; Bishop, 1995;
Lohmann et al., 1996). Further rules for selecting an appropriate size of the hidden layer in bp-
networks have been compiled by Devillers (Devillers, 1996b).
Determination of a useful architecture by testing the performance of a SNN with
independent test data during and after SNN training is essential. The result gives a useful feed-
back to the researcher. If the test performance is bad either the network architecture must be
altered, or the data used for training is inadequate, i.e. the problem might be ill-posed.
Furthermore, by monitoring test performance the generalization ability can be estimated
during the training process and training can be stopped before over-®tting occurs, i.e. before
full convergence on the training data.
During SNN training the weights and threshold values of the network are determined which
is an optimization procedure. For training a set of data vectors plus one or more functional
values per data point are required. The optimization task is to adjust the free variables of the
system in a way the network computes the desired output values. The dierence between the
actual output and the desired output is minimized; a frequently used error function is the sum-
of-squares error (N is the number of data points (patterns) in the training set):
X
N
E outputactual
i ÿ outputdesired
i 2 :
i1
The sum-of-squares error function isPa P special case of a general error function called the
Minkowski-R error (Bishop, 1995): E n ck1 jyk xn ;w ÿ tnk jR , where yk is the actual network
output, tk is the desired (target) output vector, x is the input vector, and w is the weight vector.
The summation over k will be omitted if only a single output neuron is used. With R = 2 the
Minkowski-R error reduces to the sum-of-squares error function, R = 1 leads to the
computation of the conditional median of the data rather than the conditional mean as for
R = 2. The use of an R value less than 2 can help reducing the sensitivity to outliers which are
frequently present in a training set, especially if experimental data are used.
Several optimization algorithms can be used, e.g. gradient descent techniques, simulated
annealing procedures, or evolutionary algorithms performing some kind of adaptive random
search. Most frequently applied are gradient techniques using the `back-propagation of errors'
(bp) algorithm for weight update (Rumelhart et al., 1986). Gradient descent follows the deepest
descent on the error surface:

@E @E @E
DEw , ,..., :
@w1 @w2 @wl
For network training each weight value is incrementally updated according to:
@E
Dwi ÿg ; i 1,2, . . . ,l:
@wi
g is a constant de®ning the step size of each iteration along the downhill gradient direction.
If the weights of a two-layered network (Fig. 2a) must be updated we make these weight
changes individually for each input pattern xm in turn, and obtain the `delta rule' (also termed
Àdaline rule', `Widrow±Ho rule', or `LMS (least-mean-square) rule') (Rumelhart et al., 1986;
Widrow and Ho, 1960): Dwi,k gtmi ÿ ymi x mk :
Using a continuous dierentiable activation function for the hidden units of a multi-layered
network, it is an easy step to calculate the update values (`deltas') for hidden unit weights
following the bp procedure. A thorough description of this algorithm can be found in many
textbooks on neural networks (see, e.g. Bishop, 1995).
Several modi®cations of bp have been described (Hertz et al., 1991). As with all optimization
techniques gradient methods can have problems with premature convergence due to local
energy barriers or plateaus of the error surface (McInerny et al., 1989). This will be no
problem if the network is small enough to de®ne a comparatively low-dimensional space
lacking striking local optima. However, if large networks are trained classical bp-based
methods might easily get trapped in a local optimum. For training of complicated networks
especially adaptive evolutionary optimization can be useful (Schneider et al., 1996). Besides
considerations concerning network architecture considerable eort must be spent in selecting
an appropriate SNN training technique. A description of maximum likelihood techniques and
Bayesian learning is provided by Bishop (1995) and Ripley (1996).
2.4. Testing the generalization ability of neural network models
To assess the usefulness and applicability of trained SNN and UNN the systems must be
tested, just as any other prediction tool used in (Q)SAR. This can be done in several ways, and
a variety of statistical techniques serve this purpose (for review of rule extraction and
generalization, see (Denker et al., 1987)). Cross-validation and bootstrapping procedures are
well-suited; see Kubinyi (1993a) for details on signi®cance and validity of QSAR equations.
One simple possibility is to divide the data available in only two sets, a training set and a test
set. The training data are randomly divided into subsets for cross-validation of the training
results and to optimize network architecture. The test set contains independent data which is
presented to a fully trained network to evaluate its prediction or recognition ability. A slightly
dierent method is to perform a complete leave-one-out procedure, where there are trained as
many networks as there are data points. One network is trained at a time using all but one
data point for training. The performance of the networks on the left-out compound can be
used to assess the accuracy of test data prediction. Statistical measures like correlation
coecients, information indices, and various error functions are useful for this purpose (Ash,
1965; Kubinyi, 1993a; Ripley, 1996). A cross validated r2, for example, is frequently used in
SAR studies. It provides an estimate of model predictivity by comparing the predictive sum of
squares against the sum of squared deviations of each desired network output, yobs, to its
mean, hyobsi (Good, 1995):
X
yobs ÿ ypred 2
cross ÿ validated r2 1 ÿ X :
hyobs i ÿ yobs 2
This technique allows dierent models containing dierent numbers of components (e.g.,
numbers of network layers, neurons and weights) to be evaluated. For estimating the accuracy
of a two-state prediction the correlation coecient according to Matthews (1975) is frequently
employed:
cc p
PNÿOU ,
NU NOPU PO
where P is the number of correctly predicted `positive' examples (state 1), N is the number of
correctly predicted `negative' examples (state 2), O is the number of false-positives, and U is
the number of false-negative predictions. The value of cc is in [ÿ1,1], where cc = 1 indicates
perfect prediction.
2.5. Special network architectures and hybrid models
Multivariate display methods are of considerable importance to data visualization in

molecular design and SAR, as well as for reduction of data dimensions (Devillers, 1995). The
aim is to reveal the patterns hidden in complex data (Hudson et al., 1989; Domine et al.,
1996). A number of dierent methods which can be categorized as being linear or non-linear
have been described. Popular techniques in the ®eld of (Q)SAR are principle component
analysis (PCA) which is a linear method, multidimensional scaling (MDS) (Shepard, 1962a,b;
Kruskal, 1964) and Sammon mapping (Sammon, 1969; Devillers, 1995; Agra®otis, 1997), a
non-linear technique. Of considerable interest is a neural network approach termed èncoder'
networks or ReNDeR (reversible non-linear dimension reduction) (Livingstone, 1996). The
architecture of an encoder network is illustrated in Fig. 5. The network is symmetrical around
a central parameter layer. The number of input and output neurons is de®ned by the
dimension of the data vectors, and the task of the system is to reproduce the input data at the
output layer (auto-association) via an internal representation which is of lower complexity than
the original data. In Fig. 5 the parameter layer consists of only two neurons (although this
layer can have an arbitrary number of neurons) for a reduction of the data to only two
dimensions (factors). Once the network weights are optimized by conventional supervised
training techniques the input data can be described by the output values of the neurons in the
parameter layer. Two or three neurons are especially useful for graphical display. If no
intermediate hidden layers are used and the neurons are linear the factors found by the
Fig. 5. Architecture of an encoder network (ReNDeR) mapping a seven-dimensional input to two dimensions.
White circles: fan-out neurons; gray circles: sigmoidal neurons; black circles: linear neurons. After network training
the data projection onto a plane can be displayed by plotting the output of the central neurons.
encoder network are equivalent to principle components (Hertz et al., 1991). Presence of
hidden layers containing non-linear neuron activities enables the system to perform non-linear
mappings of the data, quite similar to Kohonen mapping (Salt et al., 1992). A unique
attraction of encoder networks is, however, the possibility to (re)construct data of the original
dimension directly from low-dimensional projections. In a two-dimensional map derived from
high-dimensional QSAR data, for example, any position is directly linked to the corresponding
original space. This might be useful for compound selection with a desired property (or
àctivity') by inspection of a low-dimensional graphical display. Until now, however, this
design technique is still in its infancy. Applications, advantages and drawbacks of the method
have been critically discussed (Reibnegger et al., 1993; Livingstone, 1996).
An example of data visualization and classi®cation using principle component analysis,
Sammon mapping, a Kohonen-network, and an encoder network is shown in Fig. 6 to discuss
some of the speci®c properties of each method. PCA was performed using the commercially
available software package TSAR V3.1 (Oxford Molecular, Oxford), all other software tools
were developed in-house at Roche (Schneider, unpublished). For the example application, the
20 naturally occurring amino acids (cystine was treated as cysteine) were described by the
respective values of seven partly highly correlated physicochemical properties (Schneider and
Fig. 6. Mapping of amino acid residues by dierent methods. Seven-dimensional descriptions of amino acid residues
were projected to planar display by principal component analysis (PCA, upper left), Sammon mapping (upper
right), a non-linear encoder network (lower left), and a 4 4 Kohonen-network (lower right). The locations of the
individual amino acids in the maps are indicated by their single letter code.
Wrede, 1993a): hydrophobicity (Engelman et al., 1986), volume (Harpaz et al., 1994), bulkiness
(Jones, 1975), surface area (Chothia, 1975), hydrophilicity (Hopp and Woods, 1981), and
refractivity and polarity (Jones, 1975). The task was to generate two-dimensional projections of
the seven-dimensional data. In our example the four techniques led to comparable results, i.e.
several groups (clusters) of amino acid residues have become visible. PCA reveals some clear
clusters by linear projection onto the plane spanned by the ®rst two principal components, pc1
and pc2 (Fig. 6, upper left). The principal components can be regarded as linear combinations
of features with high variance. This method is fast, and the relative explanation power of the
features (data variance covered) can be expressed by the corresponding eigenvalues (for review,
see Jackson, 1991). However, PCA is limited by its linearity: projections from high-dimensional
space to low-dimensional space can produce misleading relationships between individual data
points, especially if high-dimensional data is analyzed. This eect has been studied in detail by
Devillers (1995). In contrast to PCA, Sammon mapping aims at retaining the original relative
orientations of the data points in low-dimensional space (Fig. 6, upper right). The non-linear
projection generated might, therefore, more accurately display the relative distances between
the data. However, Sammon mapping is very time-consuming if many data points are used,
and the `factors' extracted have no statistical explanation power. Furthermore, the maps
generated result from an optimization procedure and are thus not fully reproducible. The same
objections hold for the encoder network (Fig. 6, lower left). Nevertheless, there is a potential
application of encoder networks to molecular design, as mentioned above. The Kohonen-
network led to a reproducible, non-linear projection of the data (Fig. 6, lower right). In
addition, an amino acid classi®cation scheme is generated automatically. The major drawbacks
of this technique are the loss of information about the original distances between the data
points, and the lack of statistical meaning of the network axes. Despite the obvious limitations
of the individual methods, combinations of dierent statistical approaches and neural
networks, especially PCA and Kohonen-networks, complement each other well and can be very
useful for molecular diversity analysis, compound clustering, and visualization (Section 4). A
similar conclusion was drawn by Kocjanc° ic° and Zupan (1997) who compared PCA, Kohonen-
network, and an encoder-type network taking the classi®cation of olive oils as an example.
Two types of hybrid networks shall brie¯y be mentioned here, radial basis function (RBF)
networks (Moody and Darken, 1989; Niranjan and Fallside, 1990) and counter-propagation
networks (CPN) (Hecht-Nielsen, 1987, 1988; Huang and Lippmann, 1988). RBF networks
provide a framework combining both unsupervised and supervised network training. A special
advantage of these networks is the short training time needed compared to multi-layered SNN
methods (Hertz et al., 1991; Bishop, 1995). Their basic principle is the use combinations of
localized functions (e.g. Gaussians) to approximate a given input/output relationship, where
each basis function acts like a hidden unit. The idea is to pave the part of the input space
where the input data is located with their receptive ®elds. The application of RBF networks to
molecular design and (Q)SAR studies is limited until now, although they belong to the most
widely used neural networks in other areas. It appears likely and reasonable that these systems
will soon be adapted for SAR modelling and molecular design. Recently a comparison of
multi-layered feed-forward networks with RBF networks for the task of peptide classi®cation
has been published (Schneider et al., 1995a).
Counter-propagation networks (Hecht-Nielsen, 1987; Carpenter and Grossberg, 1991) have

found several applications in the various ®elds of (Q)SAR (Peterson, 1992, 1995; Zupan and
Gasteiger, 1993; Novic° et al., 1997, Vrac° ko, 1997), and sequence analysis (Wu, 1996). Their
architecture consists of two layers, a Kohonen layer in¯uenced by the data vectors and an
output layer (Grossberg layer) in¯uenced by target values (`correct' or èxpected' answers). The
weights connecting the input and the hidden layer are determined by a standard competitive
learning rule in a similar manner to Kohonen-network training. As a result, the hidden units
divide up the input space in a Voronoi tessellation. A set of targets must, however, be known
to train the output layer using the conventional delta rule. The greatest appeal of CPN is their
speed; they can be trained 10±100 times faster than bp-networks yielding comparable results.
An introduction to CPN systems for chemical applications can be found in Zupan and
Gasteiger (1993).
Adaptive resonance theory (ART) based arti®cial neural networks were originally designed
to model real-time pattern recognition in the brain (Grossberg, 1976a,b; Carpenter and
Grossberg, 1991). Recently they were adapted for QSAR modelling (Wienke et al., 1996). Most
ART networks belong to the group of UNN. Their principle is to categorize data by checking
whether an input pattern ®ts into (resonates with) an existing network structure or not. In the
resonance case weight adaptation takes place. If, however, the input pattern belongs to a
hitherto unseen data class a new weight vector will be initialized, and the network grows. ART
systems can help solving the `stability±plasticity dilemma' of most ANN because they are
stable enough to preserve old features but are ¯exible enough to change their structure and
incorporate new information. It has been argued that the original ART networks are
somewhat tricky to adjust and are very sensitive to noisy data (Hertz et al., 1991), recent
developments and combinations with fuzzy systems may help overcoming these problems
(Kosko, 1992; Zupan and Gasteiger, 1993; Wienke et al., 1996; Wienke and Buydens, 1996).
A further neural network technique, termed `holographic neural network' (Sutherland, 1990),
has recently been introduced to the analysis of chemical data (Burden, 1998). The method
provides a discriminant technique which seems to be especially suited for classi®cation of
overlapping data sets. An attractive property of this system is that it gives a de®nitive model,
unlike a bp-network which may lead to a dierent model each time it is trained. However, a
rigorous comparison to bp-networks and other network types remains to be done. In a number
of simplifying test cases the holographic network technique was comparable, sometimes even
superior to conventional statistical discriminant methods (Burden, 1998). Vrac° ko (1997)
describes a `spectrum-like' representation of geometric and electronic molecular attributes in a
counter-propagation network approach which follows a concept related to `holographic'
representation of molecules.
3. Molecular descriptors and data pre-processing
An important concept of three-dimensional molecular similarity was coined by CarboÂ and

co-workers who proposed a similarity index originally based on electron density distributions
(CarboÂ et al., 1980). Many other similarity measures have been invented based on very
dierent 2D and 3D descriptors. An idea of neural networks is to automatically ®nd a
similarity function for a given data representation rather than using a pre-de®ned similarity
index (an elegant combination of established similarity measures and ANN has been developed
recently (So and Karplus, 1997a,b), cf. Section 4.2). However, the data used for neural network
training (or development of any other SAR model) must be represented in a way that
underlying trends or features can be recognized. In other words, molecules that behave similar
in a certain context or have a comparable biological activity should have a similar
representation in terms of molecular descriptors, because molecular design involves recognition
of patterns and subsequent attempts to classify the patterns. To obey the `principle of strong
causality' (Rechenberg, 1973; Holland, 1975), a prerequisite for any rational optimization,
similar molecular activity should correspond to similar molecular descriptors or
representations. However, several cases are known where small changes in the molecular
structure of a drug resulted in drastical functional changes, sometimes even in a complete loss
of activity (for review, see BoÈhm et al., 1996).
It is evident that the relative importance of individual molecular features varies depending
on the particular molecular activity studied. ANN are able to recognize and extract those
features which are responsible for activity and classify compounds accordingly. Nevertheless, it
is essential to focus on the appropriate level of abstraction for descriptor type selection. This
means, for example, that a coarse-grain pharmacophore description might be well-suited for a
rough classi®cation of data base entries, whereas a more detailed description containing, e.g.,
information about speci®c binding modes of potential enzyme inhibitors will be required for
quantitative SAR modelling. It is desirable to develop low-dimensional descriptors which
contain exactly the information needed to answer a particular question.
To a certain extent the need for this view of molecules in terms of patterns stems from
shortcomings of current ab initio methods, mainly the long computer time needed for
calculation of molecular properties. Pattern recognition is a heuristic approach to derive
structure±activity relationships. Keeping this in mind, a complete molecular design pattern may
be de®ned by molecular syntax (the `logical' structure, topology index, molecular building
blocks) and/or semantically with respect to a speci®c context-dependent meaning (e.g.,
experimental values or physicochemical properties). For dierent representation schemes, a
more detailed discussion, and alternative views, see the literature (e.g., Ugi et al., 1990;
Balaban, 1997; Klein and Babic, 1997; Randic, 1997; So and Karplus, 1997a). The relative
importance of any one of the descriptors may vary fundamentally for dierent molecular
àctivities' studied. For example, the positive charge of a lysine residue located in the active site
of an enzyme can be important for the formation of a ionic interaction with a negatively
charged inhibitor, but located within the hydrophobic core of the protein its lipophilic
character might be crucial. Understanding this context dependency of individual molecular
descriptors is a key to realistic SAR models. Due to their architecture and inherently parallel
way of information processing and feature extraction ANN are well-suited for this purpose.
Three major descriptor types containing information about
. overall (`bulk') molecular properties (e.g. lipophilicity, shape, overall charge),

. molecule architecture (2D or 3D structural information), and
. site-speci®c properties (e.g. fragment lipophilicity, point charges)
are frequently employed for representations of various types of molecules. Ever since the
pioneering work of Hansch et al. (1962, 1963) a vast number of dierent molecular descriptors
has been invented to establish useful QSAR equations for small organic compounds.
Comprehensive introductions to the principles of such descriptors for QSAR can be found in
the literature (Kubinyi, 1993a,b; Dean, 1995; van de Waterbeemd, 1995a,b; Hansch and Leo,
1995; BoÈhm et al., 1996). The potential of multivariate pattern recognition techniques in
connection with potent molecular descriptors has been highlighted (Hyde and Livingstone,
1988). An extensive compilation of molecular descriptors for electronic, steric, and
hydrophobic interactions is available from a publication of Jurs et al. (1995). For small
molecules fast 3D structure generators based on 2D representations are at hand which allows
3D descriptors to be derived if no experimental spatial structures are available (Sadowski and
Gasteiger, 1993, 1994). In particular the following 2D and 3D descriptors are often employed
in (Q)SAR studies, including neural network approaches: electronic, hydrophobic and steric
substituent constants; partition coecient (Log P, Log D); topological and connectivity indices;
substructure-based descriptors; molecular shape indices; electronic whole molecule descriptors;
reactivity indices; geometric descriptors (volume, surface area, charged partial surface area).
Several hundred dierent descriptors have been invented to cover the respective context of a
SAR, and there seems to be a set of `basic invariants' of molecular graphs which can be
calculated directly from the molecular structure of an organic compound as claimed by Baskin
et al. (1993, 1995, 1997).
Table 2
First ®ve principal components (PC) derived by PCA from a collection of 143 amino acid properties (for PCA the
original data of Tomii and Kanehisa (1996) was used); columns scaled by unit variance
Amino acid
residue PC 1 PC 2 PC 3 PC 4 PC 5
Ala 0.11 0.32 2.09 0.54 0.51

Cys 0.38 ÿ0.78 ÿ0.33 ÿ1.54 2.03
Asp ÿ1.38 0.46 ÿ0.07 0.09 1.33
Glu ÿ0.74 1.76 1.01 0.84 1.32
Phe 1.36 ÿ0.27 ÿ0.49 0.04 0.14
Gly ÿ1.3 2.09 1.09 ÿ1.32 ÿ0.35
His 0 0.77 ÿ0.74 ÿ0.87 0.36
Ile 1.54 ÿ0.67 0.37 0.48 ÿ0.81
Lys ÿ0.67 1.68 ÿ0.09 0.13 ÿ1.69
Leu 1.33 0.06 1.35 0.81 ÿ0.41
Met 1.27 0.66 0.33 ÿ0.13 1.16
Asn ÿ1.2 ÿ0.1 ÿ0.39 ÿ1.16 0.21
Pro ÿ1.28 ÿ1.62 ÿ1.09 3.34 0.36
Gln ÿ0.45 1.03 ÿ0.09 0.11 ÿ0.19
Arg ÿ0.47 1.3 ÿ0.92 ÿ0.37 ÿ1.8
Ser ÿ0.97 ÿ0.74 0.52 ÿ0.54 ÿ0.47
Thr ÿ0.36 ÿ0.6 0.31 ÿ0.27 ÿ0.84
Val 1.23 ÿ0.7 0.9 0 ÿ1.02
Trp 1.19 0.16 ÿ1.89 0.21 0.93
Tyr 0.43 ÿ0.62 ÿ1.87 ÿ0.38 ÿ0.74
In Table 2 a list of ®ve descriptor scales for amino acid residues is given which were derived
by PCA from 143 property scales collected by Tomii and Kanehisa (1996). These uncorrelated
(orthogonal) scales can be used for representation of amino acid residues in SAR studies and
neural network training. PC1 is highly correlated with hydrophobicity scales, PC2 with helix
propensity, PC3 with volume and amino acid frequency, and PC4 with polarity. PC5 does not
show signi®cant correlation to any of the 143 original properties. These results are in
accordance with an earlier work based on a collection of 180 property scales (Kidera et al.,
1985). Recently a set of amino acid descriptors for QSAR modelling has been published which
is based on ab initio quantum mechanical calculations (Norinder and Svensson, 1998). It is
well possible that these descriptors complement the information contained in more empirically
derived descriptors.
A challenging idea is to directly derive a context-dependent weighing scheme from formal
representations of molecules, rather than pre-calculating or selecting a set of physicochemical
descriptors (for a thorough discussion, see Randic, 1997). This is a ®eld where neural networks
excel, and several such neural network applications have been described for solving SAR and
drug design problems (Aoyama et al., 1990; Aoyama and Ichikawa, 1991, 1993; Gasteiger and
Zupan, 1993; Barlow, 1995; Baskin et al., 1993, 1997). Qian and Sejnowski (1988) published
the methodological principle taking the prediction of protein secondary structure from the
amino acid sequence as an example. Their idea was to represent each amino acid residue by a
20-digit unitary number, i.e. a number consisting of 19 zeros and a single 1 (also termed
`sparse encoding' or `distributed encoding' (Hirst and Sternberg, 1992; Presnell and Cohen,
1993; Schneider and Wrede, 1993b)). A SNN trained on secondary structure prediction
develops input weights for each digit. After training these weights express context-dependent
`properties' of the individual residues. In other words, the network input weight vector
introduces a residue ordering: similar weights correspond to similar residues. Following this
approach ANN introduce a context-sensitive similarity criterion. A related concept was
followed by Brunak and co-workers (Tolstrup et al., 1994). They trained a network to classify
a formal representation of the 61 nucleotide triplets of the genetic code into 20 amino acid
categories. The resulting internal network representation of the amino acids strongly correlates
with the GES scale of transfer free energies (Engelman et al., 1986).
Sometimes graphical display of weight vectors by Hinton diagrams (Qian and Sejnowski,
1988) can help understanding the features extracted. The numerical weight values indicate
contributions of an input variable (if input weights are analyzed) or individual features (if
hidden weights are analyzed) to the SAR model (Schneider et al., 1993, 1995b, 1998). A
method for estimating the relative importance of individual input variables of encoder
networks has also been described (Kocjanc° ic° and Zupan, 1997).
The unitary coding scheme leads to an orthogonal representation of the entities described
(residues, functional groups, or atoms). An advantage is that ANN can automatically derive an
ordering of the entities based on the training data. However, a clear disadvantage is the fact
that this data description often is high-dimensional. This in turn demands for a large number
of training data which might be available for protein or nucleic acid sequence analysis, but in
most cases only a few in vivo or in vitro pharmacological data are at hand.
Correlation functions are of particular attraction for representation of both small molecules
and polymer sequences because they lead to a compact description containing relevant
Fig. 7. Principle of the auto-correlation method for 2D representations of molecules. Step 1: construction of the
molecular graph. Step 2: calculation of the shortest distance between all atoms (expressed as number of bonds). Step
3: Selection of a property for correlation, here the property `nitrogen' was used. This led to entries in the correlation
matrix for the atom pairs (1/7) and (7/1) (auto-correlation of nitrogens over four bonds). For calculation of the ®nal
auto-correlation vector, see the text.
structural information (Jones, 1975; Moreau and Broto, 1980; Wagener et al., 1995). For each
molecule the resulting correlation vectors have identical length, and the vector elements contain
equivalent information. This is a necessary prerequisite for feature extraction by ANN because
the input layer size of a network is usually ®xed. Fig. 7 shows the principle of the atom-based
auto-correlation method for the 2D case. First the molecular graph neglecting hydrogens is
determined (step 1). Then, a table is calculated giving the inter-atomic distances expressed as
numbers of bonds between pairs of atoms, d(i,j), thereby providing the neighborhood relations
between atoms (step 2). In step 3, an atomic property P is selected (here, the property
`nitrogen' was used, and for nitrogen atoms `1' is added to the table), and the auto-correlation
vector for the property selected, AV = (C0, C1, C2, . . ., CN), is calculated, where:
X
Cn PiP j:
i,j
In our example, an auto-correlation of the property `nitrogen' over four bonds is observed
leading to AV = (2,0,0,0,2,0,0,0). Following this strategy, dierent properties can be combined
to form a composite auto-correlation vector. For example, de®nition of hydrogen-bond donors
and acceptors, charged groups, and lipophilic centers can provide a useful level of abstraction
for representation of molecular compounds, especially for inhibitor design (Fig. 8). An
advantage of 2D over 3D correlation vectors is that the 2D structure of a molecule is always
available. 2D correlation vectors can be used for rapid clustering of large sets of data (Section
4.2). However, lacking 3D information often prevents reasonable SAR models. Gasteiger and
co-workers introduced topological auto-correlation vectors and self-organizing networks for
identi®cation of dopamine and benzodiazepine agonists in large data sets (Bauknecht et al.,
Fig. 8. Location of functional groups in the 2D structure of the thrombin inhibitor PPACK. Lip: lipophilic; HA:
hydrogen-bond acceptor; HD: hydrogen-bond donor; +: positive charge.
1996). Recently an application of auto-correlation vectors as structural descriptors for

modelling antimalarial activity of molecules has been described (Calas et al., 1997). The
pioneering work of Jones (1975) introduced correlation methods for protein sequence analysis.
A useful grouping of amino acid residues has been conceived by Taylor (1986) (Fig. 9). This
classi®cation scheme has been shown to be very useful for feature extraction from amino acid
sequences (King and Sternberg, 1990; Schneider and Wrede, 1993c), and correlations between
these residue classes provide valuable sequence information. An example is shown in Fig. 10.
Here, the N-terminal parts of eukaryotic protein precursor sequences were represented by the
auto-correlation of the ten basic Taylor groups (Fig. 9). Correlations over up to ten bonds
were considered, leading to 10 10 = 100-dimensional auto-correlation vectors. A 10 10
Fig. 9. Venn diagram of amino acid residues according to Taylor (1986).

Fig. 10. Kohonen maps of N-terminal protein sequences. Neurons are drawn as squares. A Kohonen-network with
10 10 output neurons arranged in the plane was trained by cross-correlation vectors of amino acid residue classes
derived from N-terminal parts of protein precursor sequences. The location of the individual sequences in the ®nal
map is shown separately for cytoplasmic, secreted, mitochondrial, and nuclear proteins. Neuron shading gives the
fraction of the respective sequence class clustered (white: no sequence found in the neuron; black: only sequences
belonging to the sequence class under investigation were grouped together).
Kohonen-network was used to classify these data. The clustering results are unambiguous
(Fig. 10): N-terminal portions of protein precursor sequences contain information about the
protein targeting route, and this information can be expressed by the special coding scheme
selected. The Kohonen-network was able to map the high-dimensional descriptor space onto a
plane. The corresponding sequence space seems to be divided into three major regions:
cytoplasmic, secretory, and mitochondrial (proteins from other compartments were not
included in the analysis; previous studies using self-organizing feature maps for automatic
sequence classi®cation revealed relationships between mitochondrial and chloroplast targeting
signals (Schneider et al., 1997)). Nuclear proteins were clustered together with cytoplasmic
sequences. Indeed, the targeting signals of nuclear proteins are not located at the N-terminus,
rather internal or C-terminal parts seem to contain the respective information (Silver and
Goodson, 1989).
4. Applications of neural networks to protein and nucleic acid sequence analysis, compound
classi®cation and molecular design
In this chapter we summarize recent advances in protein and nucleic acid sequence analysis
and present current concepts along with some worked examples for compound clustering, and
the design of peptides and small organic molecules. Several pioneering ANN applications have
already been discussed in the previous chapters. For a comparison of ANN with other
techniques and concepts of arti®cial intelligence in sequence analysis, see the literature (Hunter,
1993; Doolittle, 1996).
4.1. Feature extraction and classi®cation of biopolymers
Identi®cation of potential drug targets and the localization of structural and functional
features in biopolymers is of great importance at the onset of a drug development project.
Molecular biology turned into the phase of sequencing the complete genomes of many dierent
organisms. The genomes of three archaebacteria (Methanococcus jannaschii, Methanobacterium
thermoautotrophicum, Archaeglobus fulgidus), nine eubacteria (Haemophilus in¯uenzae,
Mycoplasma genitalium, Synechocystis sp., Mycoplasma pneumoniae, Helicobacter pylori,
Escherichia coli, Bacillus subtilis, Borrelia burgdorferi, Aquifex aeolicus, Treponema pallidum),
and the yeast Saccharomyces cerevisiae have been deciphered. Knowing the DNA sequence of
a genome, that is speaking of several million base pairs, makes automatic identi®cation and
classi®cation tools absolutely necessary. New insight into the high complexity of gene
arrangements as well as of the gene structures themselves can be obtained from both rigorous
comparisons of individual nucleotide sequences and whole genomes (Tatusov et al., 1997).
Gene families which are present in bacteria only are potential targets of new antibiotics.
However, the biological functions of derived protein primary structures is still unknown in up
to 30% of the cases (Strachan and Read, 1996).
The aim is to ®nd decisive feature(s) in a biopolymer sequence as an indicator of biological
function. It is very well known that in most cases the appropriate SAR re¯ects a non-linear
correlation, and ANN have proven their value for automatic non-linear feature extraction from
biopolymer sequences. In the following part several recent advances of ANN applications to
the identi®cation of promotor sites, splice sites, ribosomal binding sites, prediction of protein
structure, identi®cation of membrane proteins, targeting signals, and recognition of
glycosylation sites are described. Several years ago extensive reviews appeared summarizing
earlier applications of ANN in molecular biology (Hirst and Sternberg, 1992; Presnell and
Cohen, 1993). The steady increase of publications ever since re¯ects the usefulness of ANN
approaches to many dierent classi®cation and modelling tasks in molecular biology and
biochemistry.
4.1.1. Identi®cation of promotor sites and polymerase binding regions

Almost from the beginning of molecular biology in the early sixties transcription signals, in
particular promotor regions, have been characterised. Sequencing studies of many promotor
regions in E. coli and other gram negative eubacteria revealed characteristic patterns around
positions ÿ10 and ÿ35 upstream of the transcription start point (Rosenberg and Court, 1979;
Lisser and Margalit, 1993). In E. coli the DNA-dependent RNA polymerase consists of four
subunits (also designated as E) and the initiation factor s, a polypeptide of 70 kDa molecular
weight. The s factor together with the apoenzyme E recognizes promotor sequences.
Identi®cation of promotor regions was a convincing model application for neural networks. A
recent work by Pedersen and Engelbrecht (1995) used experimentally veri®ed promotor site
sequences for training of SNN. Promotor regions were scanned using a sequence window
encompassing seven nucleotides. The authors observed maximal correlation for position 0, as
expected, and in addition for positions ÿ10, ÿ22, ÿ33 and ÿ44. Positions ÿ22 and ÿ44 were
not identi®ed in earlier work. Obviously RNA polymerase binds to more regions of the DNA
than previously suggested. The sequence pattern found by the ANN, a repeated distance of 11
nucleotides, implies that the core RNA-polymerase binds to one side of the DNA helix only.
Another important signal is the termination of transcription. The factor-independent
terminator sequence consists of a G/C-rich dyad symmetry followed by a stretch of 4±8
adjacent thymidines. Again SNN were applied to pattern recognition in these regions. Two
dierent coding schemes were used, unitary coding and a physicochemical property (the
electron±ion-interaction potential, EIIP). EIIP coding leads to a one-dimensional description
per nucleotide. A three-layered network and bp training led to convincing results with 98.1%
correct predictions for unitary coding and 95.6% for EIIP coding in an independent test set
(Nair et al., 1994).
A bp-network system was trained on the identi®cation of E. coli promotor regions of all
spacing classes (Mahadevan and Ghosh, 1994), where a three-module approach was employed:
the ®rst module recognizes consensus boxes, the second aligns the promoters to a length of 65
bases and the third module investigates the entire sequence of 65 bases under consideration of
the interactions between the bases of the promoters. Training was performed with 106
promotor sites, and the test was done with 126 examples. These neural network systems were
rather successful in promotor identi®cation, showing a prediction accuracy of 90.2% for non-
promotor regions and 98% for promotor regions. Especially mutated promotor regions in
plasmids could be identi®ed by this modular neural network system.
In eukaryotes there are three dierent polymerases, Pol I±III, each of which has a certain
function (Fickett and Hatzigeorgiou, 1997). Identi®cation of promotor regions in anonymous
DNA sequences is an important task for gene analysis programs. Several studies of promotor
sequences revealed species-speci®c sequences (Fickett and Hatzigeorgiou, 1997). There seems to
be a non-linear relation between a sequence and its polymerase binding ability. Again, arti®cial
neural networks turned out to reliably approximate this sequence-function relation (Cai and
Chen, 1995; Kraus et al., 1996). Several multi-layered, feed-forward networks were used to
characterize promotor regions, polyadenylation sites, and mutants responsible for transcription
regulation (Nair et al., 1995; Larsen et al., 1995; Matis et al., 1996). Special care was taken in
constituting proper data sets for training and testing ANN. Sequence regions of dierent size
were analyzed to determine the local correlation coecient. The input layer window was kept
®xed at 71 nucleotides. Regions encompassing the positions (ÿ250; 250), (ÿ150; 150), (ÿ100;
100) and (ÿ60; 60) were selected, and the number of hidden layer units was varied from 0 to
100. Negative sequence examples were taken from the promotor region in order to come close
to the natural conditions of polymerase recognition (Larsen et al., 1995). For identi®cation of
single features in promotor regions asymmetric window sizes were used, and the maximum test
correlation coecient was determined. Summarizing the results it was concluded that the most
important region are the TATA-box (ÿ29; 24), the Cap signal (ÿ1; 0) and a region around (8;
10) (Larsen et al., 1995). These regions were also identi®ed in a Shannon plot, a quanti®cation
of sequence information content along the sequence (Schneider and Stephens, 1990), except for
the (8; 10) region. In addition, some information seems to be present around (ÿ19; ÿ17).
Taking together all results the concerted interplay of the Cap signal and the TATA-box is
necessary for the recognition of the transcription initiation site by polymerase.
DNA-protein interactions have also been investigated by ANN. Using a bp-network
characteristic features of zinc ®nger proteins were extracted. Several dierent physicochemical
descriptors as well as secondary structure propensities were used for data representation.
Prediction accuracy of 97% was obtained for TFIIIA type zinc ®nger DNA binding motifs,
96% were reached for the steroid receptor type zinc (Nakata, 1995).
4.1.2. Splice site identi®cation

In eukaryotes, gene transcription leads to pre-mRNA which is processed within the cell
nucleus. Since the protein coding regions (exons) are interrupted by intron sequences
identi®cation of borders between introns and exons is a major task for gene analysis by
bioinformatical methods. Until today no perfect prediction tool for exon/intron borders, the
splice sites, in long anonymous DNA is available. Anyway, several approaches with neural
networks for the identi®cation of splice sites were performed (Brunak et al., 1991; Uberbacher
and Mural, 1991; Hirst and Sternberg, 1992). One of the earliest work was done by Brunak
and co-workers (Brunak et al., 1990) who used a multi-layer feed-forward network for analysis
of human gene sequences. The goal was to proof the assigned splice sites. Trained neural
networks detected several mistakes in the published splice sites. In some genes errors have been
found in data bank entries by this neural network analysis (Brunak et al., 1990). After
correction the reading frame ®tted well. For the donor site identi®cation the best correlation
coecient was obtained with 40 hidden units and a window size of 15 nucleotides. The
Matthews correlation coecient was cc = 0.41. Clearly this result re¯ects some shortcomings
either of data selection or network architecture, or premature convergence of the training
process.
Farber and co-workers described an identi®cation system for splice sites based on dierent
codon frequencies in exons and introns (Farber et al., 1992). Since a signi®cant mutual
information exists for adjacent codons in exons but not in introns, these dierences could be
perceived by a simple perceptron already. For training back-propagation of error was used.
The prediction accuracy depended on the length of exon regions. Those containing 60 codon
ORFs gave an accuracy of 99.4% for the best neural net scheme which was a perceptron. The
neural net turned out to be superior to conventional Bayesian prediction schemes based on 90
codons.
`GeneParser', a combination of an ANN with dynamic programming, was developed for

identi®cation of the coding regions in genomic DNA reaching a correlation coecient of 0.89
for an independent test set of human genes (Snyder and Stormo, 1993, 1995). The program
considers all subintervals in a sequence for content statistics. Some obvious characteristics
imply non-overlapping introns and some statistically signi®cant sequence properties of the
region near the intron±exon boundary. Based on this data a dynamic programming algorithm
is applied to ®nd the combination of introns and exons that maximizes the likelihood function.
Several sub-optimal solutions can be obtained, each of which is the optimum solution
containing a given intron±exon junction. For a special (G,C)-rich gene a correlation coecient
of 0.94 was obtained.
Another example of intron and exon identi®cation in junk DNA used a simple bp-neural
network (Granjeon and Tarroux, 1995). Junk DNA provided negative examples. The
correlation coecient obtained for discrimination of introns and exons was 0.50 and 0.64,
respectively.
A recent study using simple perceptrons for the direct identi®cation of donor splice sites
revealed the importance of sequence encoding schemes and training algorithms (Malik et al.,
1996). From the algorithms tested the evolution strategy (Rechenberg, 1973) led to the highest
prediction accuracy for donor splice sites yielding a relative trans-information of Trel=0.723
and a Matthews correlation coecient of cc = 0.799. A binary encoding scheme was superior
to physicochemical property descriptors. The weights of the trained perceptron correspond to
certain properties of the donor splice site signal and led to the design of an idealized sequence:
C(C/T)C/A)AG*GTAAGT (* indicates the splice site).
The conclusion that neural networks are very reliable tools for splice site recognition is
supported by Ogura and co-workers who used a ¯exible neural network with dierent
architectures (Ogura et al., 1997). Their system was able to identify splice site mutations in the
factor IX gene which may cause hereditary disease.
GRAIL II is a neural network system for recognition of protein coding regions, i.e. exons in
human DNA (Xu et al., 1994). The identi®cation of exon regions results from the generation
of a candidate pool containing all possible exons, translation start sites, acceptor±donor sites
and translation stop in all reading frames. Some heuristic rules are applied to vast elimination
of candidates. The center piece of the system are three neural networks for the exact
identi®cation of the starting exon, internal exon and terminal exon candidates. The special
advantages of GRAIL II are the acceptance of a variable length of sequence windows and that
the success of exon prediction is nearly independent of exon length. This program located 93%
of all exons independent of their size with an overall false-positive rate of 12%.
Hebsgaard and co-workers combined ANN and a rule based system to predict intron splice
sites in DNA sequences of the plant Arabidopsis thaliana (Hebsgaard et al., 1996). The authors
combined local and global feature extraction techniques and ended up in a prediction system
which is claimed to be signi®cantly better than other splice-site prediction methods. A main
advantage of this system is the small number of false±positive predictions.
4.1.3. Ribosomal binding sites

Identi®cation of the ribosomal binding site and of the initiation sites for translation in
mRNA is important for characterizing anonymous genes. Ribosome binding depends on
several parameters, namely certain sequence patterns around the initiation codons, cap
structures, the distance between the start codon and the Shine±Dalgarno (SD) sequence, and
several sequences far up- or downstream of the ribosome binding region (Pedersen and
Nielsen, 1997). The existence of several not strictly locally encoded features makes it dicult to
identify such sites from a local sequence window only. Especially secondary structures have a
great in¯uence on the strength of ribosome binding (Bisant and Maizel, 1995; DeSmit and van
Duin, 1994). Besides early studies on the use of a perceptron algorithm for recognition of
translation start sites (Stormo et al., 1982) only a few studies have recently been performed.
Perceptrons and a supervised multi-layer network trained on a larger data set were applied
(Bisant and Maizel, 1995). The best prediction result was obtained for an input window size of
101 nucleotides and 9 hidden neurons, yielding 75% correct predictions with only 0.08% false
positives.
4.1.4. Prediction and classi®cation of protein structure

Surprisingly arti®cial neural networks were ®rst applied to the prediction of RNA structure
and function at the beginning of the eighties, and only several years later studies on amino acid
sequences have been performed. The ®rst applications focused on secondary structure
prediction from the amino acid sequence (Qian and Sejnowski, 1988). With the beginning of
the nineties an increasing number of ANN studies appeared concerning structure and function
of proteins (Brunak and Engelbrecht, 1996). Modeling sequence-function/structure
relationships for amino acid sequences is a very complex task with many parameters to be
considered and optimized. Ever since the pioneering work of Qian and Sejnowski many
approaches have been developed for protein secondary structure prediction. A signi®cant
contribution was done by Rost and Sander who introduced the multiple alignment of training
sequences for a better overall accuracy and a balanced training (Rost and Sander, 1993a,b). A
conventional three-layer bp-network and a data set of sequence pro®les instead of single
sequences led to a higher prediction accuracy compared to the original work of Qian and
Sejnowski (1988). Since there is an unequal distribution of the three secondary structural
elements in globular proteins the training set was corrected accordingly. This resulted in a
better prediction for b-strand. A further advance was the application of the network system to
all-a, mixed-ab and all-b proteins. Two networks were coupled, the ®rst deals with the
sequence-to-structure relation and the second with the structure-to-structure relation. The
second network was trained to recognize the structural context of single residue states without
any reference to the sequence. The output string of the ®rst network feeds the input of the
second network. Such a network coupling did not lead to a higher overall accuracy but it
improved the reproduction of helix and strand length by comparison to known average sizes of
secondary structural elements. The majority of concurring outputs from 12 networks, called the
`jury', give a further 2% increase of prediction accuracy. The ®nal prediction accuracy is 70%
for helix, 72% for loop and 64% for strand. Further details can be found in Rost (1996).
On the way to improve prediction accuracy for the three-dimensional fold of proteins
algorithms for identi®cation of super-secondary structures can be helpful. A recent work on the
prediction of super-secondary structures by applying a three-layer network revealed correlation
coecients in the range of 0.39 to 0.57 depending on the predicted motif (the results
correspond to equivalent correctness ratios of 68% to 80.6%; with the correctness ratio = m/n;
n is the occurrence of motif A, m is the number of predicted motif A, Sun et al., 1997).
There are many ideas of how to improve the prediction accuracy for protein secondary
structure. One approach is to enlarge the data set as much as possible in order to catch all
possible dierent structures. By using 318 non-homologous proteins 67% accuracy was reached
without any input information derived from multiple sequence alignments (Chandonia and
Karplus, 1996). By introducing sequence pro®les the performance was increased to an overall
value of 72.9%.
The prediction of the three-dimensional structure of a protein solely from its amino acid
sequence will be of great importance for identi®cation of potential drug targets and structure-
based molecular design. The complete genome of the yeast Saccharomyces cerevisiae revealed
about 30% open reading frames for proteins with a very low similarity to known sequences
(Oliver et al., 1992; Mewes et al., 1997). In case of the human genome this ratio is even much
more unfavorable (Strachan and Read, 1996). Therefore, prediction tools are very important,
although partly inaccurate (Burset and Guigo, 1996). Most of the current tertiary structure
prediction methods are based on very successful sequence alignment methods (Jones, 1997). A
correct prediction is mainly based on the correct alignment, and sequence-structure homology
thresholds can be used as a guideline (Sander and Schneider, 1991; Abagyan and Batalov,
1997). Nevertheless, neural networks systems have recently been applied to the prediction of
folding classes (Bohr et al., 1993; Metfessel et al., 1993; Dubchak et al., 1993, 1995). Wu and
co-workers developed a neural network approach for rapid protein family identi®cation which
is claimed to be more robust than conventional motif and pro®le search, and hidden Markov
models (Wu et al., 1996).
The diversity of proteins originates to a large extent from non-regular coil and loop
structures connecting dierent numbers of regular secondary structure elements. Many
attempts have been made to classify all kind of loops and it became clear that several motifs
are recurring (Leszczynski and Rose, 1986). In order to characterize local protein structures an
auto-associative network has been developed to discover intrinsic features of protein structures
(Fetrow et al., 1997). The trained network was applied to a larger database of proteins leading
to structure classi®cation and subsequent amino acid frequency analysis, as well as to the
identi®cation of common patterns of structural building blocks (SBB). Six SBB categories were
found, two of them belong to a-helical or b-strand secondary structures while the other four
are consistently found at the N- and C-termini of helices or strands, including regions
identi®ed as `loops' or `random coil'. This approach is very versatile since classi®cation of more
detailed structural features of amino acid sequences can be deduced by the neural network
(Fetrow et al., 1997).
Another approach for an unbiased classi®cation of structural building blocks of proteins was
performed with self-organizing networks (Schuchhardt et al., 1996). Sequence windows
encompassing eight residues were described by a sequence of the two dihedral angles f and c.
The structural information was derived from the PDB-Select data set (Hobohm et al., 1992).
During unsupervised network training a two-dimensional feature map containing 10 10
neurons was developed. Dierent local structural motifs were found by the network including
typical helical and b-strand elements and all kinds of loop structures. The Kohonen map
re¯ects a topological projection of the high-dimensional input space onto a two-dimensional
map of structural elements. In one corner of the map the helical and helical-like elements were
located while in the diagonal corner b-strand-like examples were found. A comparison of
`trajectories' in the Kohonen map enables protein classi®cation independent of sequence length
(Schuchhardt et al., 1996).
An earlier approach towards classi®cation of protein structures into families by an
unsupervised neural network gave a 15 15 neuron map. The dipeptide composition of amino
acid sequences served as molecular descriptor. Proteins belonging to the same family were
actually clustered at the same or an adjacent neuron (Ferran et al., 1994).
A self-organizing network system for pattern recognition in protein sequences which is based
on the identi®cation of common ungapped residue stretches using a similarity matrix for
sequence comparison was developed by Reich and co-workers (Hanke et al., 1996). Similarity
was expressed by converting sequence (domains, aligned sequences, segments of secondary
structure) into a characteristic signal matrix (Hanke and Reich, 1996).
Some understanding about the dynamic properties of a neural network system motivated an
approach to simulating the dynamic properties of a protein (Liebovitch and Zochowski, 1997).
Neural networks and proteins have several properties in common such as many degrees of
freedom, con¯icting constraints on energy minimization especially for energy functions with
several local minima. As an example system the two-state conformations of ion channel
proteins were simulated by analyzing the dynamics of a Hop®eld network with 100 nodes and
two memories according to the two states of an open and closed ¯ow of ions. The distribution
of the ®rst passage times was used to characterize the dynamics of the network. The
synchronous weight update led to a Hop®eld network which was more consistent with the
expected physical properties than for an unsynchronous update. As a result of the neural
network approach the authors conclude that a synchronous update of the position and velocity
of one atom at a time is more meaningful in molecular dynamics simulations of proteins
(Liebovitch and Zochowski, 1997).
4.1.5. Prediction of membrane-spanning sequences

Prediction of integral membrane protein topology from an anonymous amino acid sequence
represents a special problem in protein structure prediction. Especially recognition of ion
transporters is a dicult task since their membrane spanning regions are very hydrophilic and
cannot be detected by conventional hydrophobicity analysis (Kyte and Doolittle, 1982). Here
arti®cial neural networks have proven to provide a useful alternative approach, although still
about 5% of cytoplasmic proteins are erroneously predicted having transmembrane segments
(Rost, 1996).
For a rapid search of single-spanning transmembrane regions a low-order neural network
®lter has been designed (Huang et al., 1996). The input window encompasses 50 amino acid
residues, and the ideal number of hidden neurons was six. The performance of the optimized
network resulted in a 98% probability of true-positive detection and 6% for false-positives.
With one exception all single-spanning membrane proteins were found. In addition,
transmembrane regions of the immunoglobulin G-binding protein precursor were identi®ed,
even melittin was successfully analyzed which raised problems for a helix-speci®c algorithm
(Rost et al., 1995). Surprisingly three out of ®ve b-sheet membrane proteins could be identi®ed
by this ®lter-system (Huang et al., 1996).
A commercially available neural network (BrainMaker) has been optimized for the
identi®cation of multispanning transmembrane regions (Dombi and Lawrence, 1994). An
increasing size of the training sets from 214 to 1751 sequences led to non-linear increase of
learning success from 75 to 98%. The training sets consisted of nearly equal amounts of
membrane-spanning sequences and non-membrane-spanning sequences. Characteristic amino
acid distributions inside and outside of the membrane region were derived from weight factors
analysis. As a general scheme transmembrane sequences revealed a polar region near the inner
lipid interface, and a hydrophobic sequence including the length of the lipid bilayer. The
network was tested for the prediction of transmembrane regions of the structurally known
archaebacterial membrane protein bacteriorhodopsin. Since the network was not trained by the
overlapping window technique the amount of overpredictions (false-positives) was high
although the seven transmembrane helices were correctly assigned.
A high prediction accuracy for identi®cation of transmembrane regions has been obtained
with arti®cial neural networks in combination with sequence alignment procedures (Rost et al.,
1995). Bp-network training was performed with experimentally veri®ed transmembrane and
non-transmembrane sequence data. The input window length was 13 amino acid residues and
the network output layer contained two neurons, one for each state of the central amino acid
residue, namely for inside and one for outside of the membrane helix. The information
stemming from multiple sequence alignments (pro®le) signi®cantly contributed to the overall
prediction accuracy. All transmembrane regions were identi®ed correctly. From a rigorous
cross-validation test on 69 proteins with experimentally determined locations of transmembrane
segments the accuracy per residue was estimated at 95%. From a negative control set less than
5% of the sequences were predicted as transmembrane helix. The tools were applied to the
identi®cation of transmembrane regions in 29 open reading frames of the yeast chromosome
VIII. At least two transembrane helices were indicated for 59 proteins corresponding to about
25% of all proteins on yeast chromosome VIII, and about 20% showed multiple
transmembrane segments (Rost et al., 1995).
In a further approach towards identi®cation of transmembrane regions special care was
taken in preparation of training data (Fariselli and Casadio, 1996). Single non-homologous
membrane proteins were used. The prediction accuracy for transmembrane helices was 78%,
and topology prediction yielded 70%. Two neural networks were coupled where the output of
network 1 provided the input for network 2. Porins were not identi®ed by the network since
their transmembrane regions are in b-sheet conformation (Fariselli and Casadio, 1996). A more
recent development is the program `TransMem' (Aloy et al., 1997) which yields a prediction
accuracy is in the range of 99.9% for the best and 71.7% for the worst prediction of the
proteins tested.
Identi®cation of transmembrane helices alone is not sucient for the characterization of
membrane proteins. Additional topology information is necessary to predict protein function.
An improvement of the method developed by Rost and co-workers (Rost et al., 1995) included
the combination of several dierent ®lter systems as well the inclusion of expert knowledge
(Rost et al., 1996). The observation that positively charged amino acids prefer to be located on
the cytoplasmic side, the `positive-inside rule', provided valuable information (von Heijne and
Gavel, 1988; Sipos and von Heijne, 1993). In a dynamic-like programming the output values of
the network were combined leading to a slight improvement of the prediction of
transmembrane regions and their topology. A comparison of prediction accuracy for

eukaryotic, prokaryotic and viral transmembrane proteins and their topology revealed a
slightly better result for eukaryotic membrane proteins than for the others.
A special ANN for detection of membrane-spanning regions in amino acid sequences has
been developed in our group (Lohmann et al., 1994, 1996). The idea was to predict membrane/
non-membrane borders in the amino acid sequences rather than complete membrane-spanning
segments, and to optimize both network architecture and residue encoding by an evolutionary
algorithm. In contrast to most of the other network applications a set of physicochemical
properties provided possible residue descriptors. The evolutionary algorithm applied performs
a simultaneous optimization of both architecture and connection weights of neural ®lter
system. It consists of the following steps:
1. Generation of an initial set of network structures.

2. Assignment of separate populations.
3. Optimization of network weights in each population during isolation time.
4. Selection among the network architectures on the level of populations.
5. Variation of the best architectures and replacing the worst.
6. Start next cycle at step 3 with the partly new set of architectures.
In Fig. 11 the architecture of an optimized network is shown. Seven amino acid properties
were used as network input. It turned out that for feature extraction from membrane/non-
membrane borders electronic properties (polarity, hydrophobicity) seem to be more important
than steric properties (bulkiness, surface area). This result substantiates earlier ®ndings (von
Fig. 11. Architecture of a topology-optimized network for prediction of membrane-spanning parts of amino acid
sequences. The number of hidden layer units and the connectivity of input and hidden units was subjected to
evolutionary optimization. Sequence windows encompassing 13 residues were encoded by up to seven physico±
chemical properties (1: hydrophobicity (Engelman et al., 1986); 2: volume (Zamyatnin, 1972); 3: surface area
(Chothia, 1975); 4: hydrophilicity (Hopp and Woods, 1981); 5: bulkiness; 6: refractivity; 7: polarity (Jones, 1975)).
After network optimization property 5 (bulkiness) remained unconnected.
Heijne and Gavel, 1988; von Heijne, 1989). The overall prediction accuracy for a test set of
membrane proteins was 92%.
4.1.6. Identi®cation of protein targeting sequences

The intracellular transport of newly synthesized proteins is regulated by well-de®ned
targeting signals. Targeting signals of secreted proteins, membrane proteins, and most of the
nuclear-encoded organelle proteins are often encoded in N-terminal extensions of the protein
(Pugsley, 1989; von Heijne, 1994a; Claros et al., 1997). After or during protein translocation
across a compartment membrane the targeting signal is removed by speci®c signal peptidase
activity releasing the mature protein (von Heijne, 1994b). Although sequence identity between
targeting sequences and their cleavage sites is very low it is possible to extract relevant features
by neural network systems (von Heijne, 1983, 1985; Schneider and Wrede, 1993a; Nielsen et
al., 1997). Prediction systems for both targeting signals and signal peptidase recognition sites
have been developed. A ®rst attempt towards identi®cation of cleavage site sequences yielded a
prediction accuracy of 97% in an independent test set (Schneider and Wrede, 1992, 1993a,
1994a). Amino acid sequences were encoded by selected physicochemical properties, and the
network was trained by an evolutionary algorithm. The ®lter system was successfully used as
®tness function for de novo design of idealized signal peptidase target sites (Schneider and
Wrede, 1994a; Schneider et al., 1995a; Wrede et al., 1998).
A recent successful study using neural networks for the identi®cation of secretory signal
peptides and signal anchors has been performed by the groups of von Heijne and co-workers
(Nielsen et al., 1997). These authors used bp-networks lacking hidden units or containing one
layer of 2 to 10 hidden units. Sequence data were presented to the network in conventionally
encoded moving windows (Qian and Sejnowski, 1988). Main emphasis was put on the
collection of a proper training set for prediction of cleavage sites against the background of all
other sequences and classi®cation of amino acids as belonging to the signal peptide or not. For
the ®rst task the `C-score' and for the second the `S-score' were calculated by the networks. An
elegant combination of the S- and C-scores lead to the `Y-score' which proved to provide
useful predictions for sequences of gram-negative bacteria. For eukaryotic and gram-positive
bacterial sequences the prediction accuracy was lower. Application of the prediction system to
the whole genome of Haemophilus in¯uenzae yielded 14% potentially secreted proteins (Nielsen
et al., 1997).
Recently an analysis using Kohonen-networks for clustering mitochondrial targeting signals
has been performed (Schneider et al., 1998). Sequences were encoded by physicochemical
properties. It turned out that there are three major classes of import signals which can be
characterised by the occurrence of arginine in either one of the positions ÿ10, ÿ3 or ÿ2 relative
to the signal peptidase processing site, thereby substantiating previous statistical ®ndings
(Gavel and von Heijne, 1990). These three cleavage site classes do not seem to be species-
speci®c. Prediction of mitochondrial signal peptidase cleavage site positions by supervised
networks failed due to a large amount of over-prediction; it is concluded that additional non-
local sequence information is required to recognize the experimentally observed cleavage site
from the set of candidates. Classi®cation of N-terminal sequence parts by unsupervised neural
networks might help deciphering the required relevant features (Schneider and Broger, 1998).
4.1.7. Peptide±protein binding

Many reactions in the living cell are carried out via ligand±protein interactions. Especially
the immune system is based on the recognition of many dierent peptides with the aim to
distinct self and non-self molecules. Prediction of peptide binding to MHC I without knowing
the three-dimensional ligand structure is a typical task in drug design. A recent work by
Gulukota et al. (1997) describes a comparison of two dierent methods, neural networks and a
polynomial method, for predicting speci®city and sensitivity of peptide binding to HLA-A.2.1.
It turned out that neural networks were superior to the applied statistical method. Several
predictions were determined as for speci®city, sensitivity, positive predicted value (PPV), and
negative predicted value (NPV). Sensitivity and PPV increased with the ratio to accept weak
binders. For speci®city and NPV the polynomial method leads to slightly better values than the
network approach. To test whether the peptide training set was representative of the problem
the ANN were trained by increasing numbers of training data. There was no increase in
speci®city compared to a small data set, but sensitivity increased signi®cantly.
4.1.8. Identi®cation of glycosylation sites

A large number of eukaryotic proteins contain glycosylated serine or threonine residues.
Glycosylation patterns are important for cell±cell recognition in tissues and for cell
development (Fukuda, 1991; Hart, 1992; Muramatsu, 1993). The enzyme reaction mechanism
catalyzing the covalent linkage between the OH-group of the amino acid and the sugar moiety
N-acetylgalactosamine (GalNAc) is still unknown. Identi®cation of glycosylation sites of a
protein can be done chemically by Edman degradation, which is very time-consuming. Since
many protein primary structures are derived from DNA sequences prediction tools for the
identi®cation of O-glycosylation sites can be useful. There are only two reports applying
arti®cial neural networks for GalNAc transferase reaction sites (Hansen et al., 1995; Cai et al.,
1997). A report by Brunak and co-workers (Hansen et al., 1995) gives a detailed description of
statistical analysis and development of several arti®cial neural network types as O-glycosylation
site predictors. Special care was taken to assemble a proper training data set based on
experimentally veri®ed O-glycosylation sites and low sequence identities. Several supervised
neural networks with a single hidden layer containing 0±15 hidden units were trained by
conventional bp. The sequence windows were symmetric, ranging from 3 to 49 residues. Amino
acid sequences were represented as bit strings. The acceptor speci®city of the GalNAc-
transferase prefers threonine, serine and proline upstream of position ÿ4 and downstream of
position +4. This pattern was not found in non-glycosylated examples. Cysteins and
tryptophans adjacent to the glycosylation sites as in position ÿ2 to +2 or in ÿ1 to +1 have
not been found. Most glycosylation sites were observed in the N-terminal part of the proteins.
The prediction accuracy for glycosylated threonine is higher than for glycosylated serine. For a
`threonine net' the optimal window size was 17 residues, yielding a Matthews correlation
coecient of cc = 0.42 in a low similarity test set and cc = 0.88 in a high similarity test set.
For the `serine net' correlation coecients of cc = 0.33 and cc = 0.7 were found respectively.
Compared to other statistical analysis like weight matrices the neural networks showed the best
prediction performance.
4.2. Neural networks as ®tness functions in drug design
Besides target identi®cation and prediction of protein structure the predominant current
applications of ANN in the drug development process are: (i) automatic clustering of
compound libraries by self-organizing networks (Kohonen and related approaches, ART
systems), and (ii) SAR modelling employing supervised techniques (bp-networks, RBF
networks). The ®rst successful attempts at self-determining compound classi®cation according
to molecular activity by Kohonen-networks were performed by Gasteiger and co-workers
(Zupan and Gasteiger, 1993; Gasteiger and Li, 1994; Gasteiger et al., 1994). They used the
electrostatic surface potentials of molecules as the basis for classi®cation. This approach has
been further developed since then (Bauknecht et al., 1996; Barlow, 1995; Holzgrabe et al.,
1996; Anzali et al., 1997; Polanski, 1997).
Independent of the choice of the molecular representation the basic idea using UNN is to
generate one or more clusters containing the àctive' compounds, and other clusters containing
the rest of the molecules contained in the initial data set. This approach was also used to
estimate the diversity of compound libraries (Sadowski et al., 1995). The clusters generated by
the UNN are characterised by the respective features extracted, i.e. the associated weight
vectors. The weight vectors can be used as feature templates for further in silico data base
screening. This can drastically reduce the screening eort needed to obtain lead compounds.
For example, one might start o with the construction of a small compound library by a
simple combinatorial approach. This limited set of molecules is tested for a desired activity,
e.g. enzyme inhibition (ki or IC50 measurements), and in parallel the data set is clustered by a
self-organizing network. If the majority of the inhibitory compounds form a single cluster or a
group of neighboring clusters in the neuron map the associated weight vectors serve as a
template for data base screening. The resulting hits are those data base entries that are
classi®ed by the network as belonging to one of the àctive' clusters containing the already
known inhibitors. This example demonstrates one possibility how high-throughput screening
and ANN can complement each other.
We followed this approach for the identi®cation of potential new thrombin inhibitors in a
large data base: to construct an initial small data set 409 four-component Ugi reactions were
performed in a combinatorial approach, and the products were tested for their ability to inhibit
thrombin enzymatic activity (Weber et al., 1995). For feature extraction by a Kohonen-
network each compound was described by the cross-correlation of hydrogen-bond donors,
hydrogen-bond acceptors, lipophilic and charged groups (cf. Section 3). The resulting clustering
in the output layer of the network is shown in Fig. 12. White neurons contain only compounds
with ki>10 mM (de®ned as ìnactive'), the darker a neuron is drawn the more Ugi products
with kiR10 mM (de®ned as àctive' inhibitors) are grouped together. Neuron (6/3) contains the
largest fraction of inhibitors, and the weights of neuron (6/3) were used to search the
MedChem data base (version 1997, as distributed by Daylight Chemical Information Systems,
Irvine) containing over 33,000 compounds for new potential inhibitors. Many known thrombin
inhibitors were assigned to neuron (6/3), the highest ranking known inhibitor was PPACK
which irreversibly binds to the enzyme (Kettner and Shaw, 1981, for a discussion of thrombin
active site features, see (Grootenhuis and Karplus, 1996)). Network development and the
computer-based data base search took only a few minutes on Roche in-house software. An
Fig. 12. A planar 7 7 Kohonen map of thrombin inhibitors. Neurons are drawn as squares. Four-component Ugi-
products were clustered by a Kohonen-network. The localization of compounds with ki<10 mM is shown by grey-
shading. In neuron (6/3) most of the inhibitors were grouped together. In the map the inhibitors are surrounded by
inactive compounds (white neurons).
evolutionary peptide design cycle based on this principle including further CAMD tools has
been published recently (Schneider et al., 1995c). Similar experiments have been described for
the development of a self-organizing network modelling corticosteroid and testosterone binding
globulins (Polanski, 1997).
A second example is thought to demonstrate the usefulness of Kohonen-type networks for
clustering large data bases. We encoded all 33,000 entries of the MedChem data base by the
cross-correlation vector of functional groups, as described above. Then, a Kohonen-network
containing 25 25 output neurons arranged in a plane was trained by these data. During the
training process the network adapted the topology of the data distribution (cf. Fig. 4), and as a
consequence similar molecules were grouped together. In Fig. 13 some examples are shown.
The resulting 25 25 = 625 compound classes are representative of the whole data base. In
other words, the 625 weight vectors of the trained network can be used to describe
predominant molecular features of the MedChem compound library. This can be helpful to
assess molecular diversity (Sadowski et al., 1995), and to construct small sets of compounds
covering a de®ned variety of chemical features (Schneider et al., 1995c; Bauknecht et al., 1996).
Following this concept new endothelin antagonists were found using a Kohonen-network for
clustering known endothelin receptor ligands and subsequent data bank screening (Anzali et
al., 1997). The authors identi®ed the benzothiadiazole group as a surrogate for
methylendioxyphenyl where the molecular electrostatic potential of the ligands served as the
network input parameters. A similar approach was followed by Polanski (1997) for
identi®cation of active site features in corticosteroid and testosterone binding globulins (CBG/
TBG): self-organizing networks were developed for pseudo-receptor modelling based on a set
of steroids with known CBG/TBG anity data. Both electrostatic and steric descriptors were
used for compound representation, including the de®nition of `hypermolecules' which are
Fig. 13. A planar 25 25 Kohonen map was trained with approx. 33,000 entries of the MedChem data base. The
resulting map re¯ects the chemical feature space of the data base. The four molecular structures are shown which
are most similar to the weight vectors of the neurons (1/1), (1/25), (25/1), and (25/25). The neuron map can be
regarded as a condensed version of the original large data base, where the original diversity has been retained.
special two-dimensional topological representations of superimposed molecular compounds

(Polanski, 1997).
Supervised networks were pioneered in drug design applications (Tetko et al., 1993, 1994a),
and also introduced to the peptide design ®eld (Schneider and Wrede, 1994a,b; Wrede et al.,
1998). For example, it was possible to train SNN by a set of known inhibitors of HIV-1
reverse transcriptase using the bp algorithm and neuron pruning, and to identify a new
inhibitor, `B2', using the trained network as prediction system (Tetko et al., 1994a; see also
references cited therein). Prior to network training a set of six molecular descriptors was
selected from an initial set of 50 descriptors by hierarchical clustering. This technique
drastically reduced the network input space (number of input neurons required) and made
possible the use of a small data set. The authors claim from a comparison of neural network
methods to adaptive least squares (Moriguchi et al., 1980), k-nearest neighbors (Tetko et al.,
1994b), and linear machine learning (Saaki et al., 1984) that ``.. the performance of neural
networks on learning and control sets is better'' (Tetko et al., 1994a). A further thorough
discussion of ANN as a non-linear data modelling device and a comparison to other
techniques can be found in Livingstone et al. (1997). Along with a case study dealing with
biodegradation modelling an intercommunicating hybrid system has been proposed using a
genetic algorithm and a bp-neural network model for solving the general problem of designing
molecules with speci®c properties (Devillers, 1996c). This idea consequently follows a strategy
which was already successfully applied to peptide design (Schneider and Wrede, 1994a;
Schneider et al., 1994a; Wrede et al., 1998).
A neural network system with adaptive input parameter selection by a genetic algorithm
seems to have a large potential being an improvement of conventional fully-connected bp-
networks with ®xed inputs (So and Karplus, 1997a). The authors describe an approach to the
construction of QSAR models from molecular similarity matrices. The system compares
favorably with partial least-squares (Dunn et al., 1984), genetic regressions, and other
established 3D-QSAR methods (So and Karplus, 1997a,b). The main advantage of the system
is an intelligent selection of neural network input variables by a genetic algorithm. Rather than
feeding in all available data descriptors a similarity matrix is computed, and from this matrix a
set of meaningful variables is automatically selected. In a way, this concept is comparable to
the neural network for membrane protein prediction described in Section 4.1 (Fig. 11). The
genetic algorithm selects the most relevant variables, and the neural networks provides a
model-free non-linear mapping device. The concept of descriptor selection by evolutionary
algorithms was originally coined by Rogers and Hop®nger (1994) and Luke (1994).
5. Conclusions and current challenges
Independent of the choice of network type and architecture applied the crucial parts of an
analysis are appropriate data selection, description and pre-processing. Like any other analysis
tool neural networks can only be successful if a solution to a given problem can be found on
the basis of the data used. Although this statement seems to be trivial in many real-life
applications it can be very dicult to collect representative data and de®ne a comprehensive
and useful data description. Sometimes techniques like PCA, data smoothing or ®ltering can be
used prior to network application to facilitate network training. In cases where the data
description is high-dimensional per se it can be helpful to focus only on a sub-channel
(however, this requires knowledge about essential data features) or to perform a PCA step to
reduce the number of dimensions to be considered without signi®cant loss of information.
Hybrid network architectures consisting of a pre-processing layer, an unsupervised layer for

course-grain data clustering, and a supervised layer for ®ne-analysis were already successfully
applied to a variety of tasks and seem to provide very useful techniques for (Q)SAR modelling
and molecular design. Both unsupervised and supervised networks have proven their usefulness
and applicability to a number of dierent problems. Nevertheless, it requires signi®cant
expertise to apply them eciently. The number of real molecule designs guided by ANN is still
rather limited. On the other hand, the number of ANN models for prediction of molecular
activities or properties is increasing as can be judged from the increasing number of
publications covering this topic during the past two or three years (a `typical' recent example
using bp-networks and a set of meaningful molecular descriptors is provided by Hosseini et al.
(1997)). Evolutionary algorithms are able to bridge the gap between QSAR models and de
novo design. By a cyclic process consisting of structure generation, property prediction and
selection the vast chemical space can be eciently searched for desired molecules (Schneider
and Wrede, 1994a; Schneider et al., 1994b; Devillers, 1996d; Wrede et al., 1998).
A current limitation of ANN is the lack of tools for interpretation of extracted features in
chemically meaningful terms. Sometimes graphical representation or statistical analysis of
connection weights can already help to provide an idea of what molecular features are required
for a molecular property or biological activity. However, in most cases weight analysis alone is
not sucient to explain the chemical nature of function-determining molecular attributes. To
overcome this limitation and to further enhance the great potential of ANN techniques a
massive knowledge transfer is needed from the computer sciences, especially the ®elds of
arti®cial intelligence and neurocomputing, to chemistry and bioinformatics. Eventually
symbolic machine learning approaches can complement feature extraction by ANN. The
importance of rational and meaningful data representation in combination with both ANN
and non-ANN techniques has been demonstrated in the prediction of protein secondary
structure (King and Sternberg, 1996; Kawabata and Doi, 1997).
For most potential applications of neural networks to drug design conventional techniques
exist, and neural networks should be considered as complementary (Loew et al., 1993; Marrone
et al., 1997). However, ANN can sometimes provide a simpler and more elegant, and
sometimes even superior solution to these tasks. Of special attraction is a combination of
evolutionary algorithms for descriptor selection and ANN as function approximators. This
conglomerate seems to provide a very useful general approach to QSAR modelling. Recently
an inductive logic programming technique was used for descriptor selection and determination
of most predictive data attributes by rule generation (King and Srinivasan, 1997). Such
approaches seem to be well-suited for data pre-processing.
In a recent book review of Neural Networks in QSAR and Drug Design, L.B. Kier wrote: ``...
The medicinal chemist who aspires to professional success in the new millennium must have a
rich background in these areas. It is no longer just sucient to synthesize and test; experiments
are played out in silico with prediction, classi®cation, and visualization being the necessary
tools of medicinal chemistry.'' (Kier, 1997). In connection with other methods the many types
of ANN provide a ¯exible modular framework to help speed up the drug development process
and lead to improved rational designs of molecules with desired structures and activities.
Acknowledgements
Petra Schneider is thanked for many valuable discussions and encouragement. Hans-Joachim
BoÈhm, Clemens Broger, Gerard Schmid, Manfred Kansy, Lutz Weber, Uwe Hobohm, Martin
Stahl, Daniel Bur, Daniel M. Doran, Martin Neeb, Stefan Fischer, Stefan MuÈller, Arif Malik,
Johannes Schuchhardt, and Gerhard MuÈller are thanked for their support. Parts of our
original research described were supported by grants from the Fonds der Chemischen Industrie
and the Boehringer Ingelheim Fonds (to G. Schneider) and by a grant from the
Bundesministerium fuÈr Bildung, Wissenschaft und Forschung (BMBF, DETHEMO project, to
P. Wrede).
References
Abagyan, R.A., Batalov, S., 1997. Do aligned sequences share the same fold?. J. Mol. Biol. 273, 355±368.
Agra®otis, D.K., 1997. Stochastic algorithms for maximizing molecular diversity. J. Chem. Inf. Comput. Sci. 37, 841±851.
Aloy, P., Cedano, J., Oliva, B., AvileÂs, F.X., Querol, E., 1997. `TransMem': A neural network implemented in Excel spreadsheets for
predicting transmembrane domains of proteins. Comput. Appl. Biosci. 13, 231±234.
Amari, S.I., 1993. Mathematical methods of neurocomputing. In Barndor-Nielsen, O.E., Jensen, J.L., Kendall, W.S. (Eds.),
Networks and Chaos ± Statistical and Probabilistic Aspects. Chapman and Hall, London, pp. 1±39.
Anderson, J.A., Rosenfeld, E. (Eds.), 1988. Neurocomputing: Foundations of Research. MIT Press, Cambridge, MA.
Andrea, T.A., Kalayeh, H., 1991. Applications of neural networks in quantitative structure±activity relationships of dihydrofolate re-
ductase inhibitors. J. Med. Chem. 34, 2824±2836.
Anzali, S., Mederski, W.W.K.R., Osswald, M., Dorsch, D., 1997. 1. Endothelin antagonists: Search for surrogates of methylendiox-
yphenyl by means of a Kohonen neural network. Bioorg. Med. Chem. Lett. 8, 11±16.
Aoyama, T., Ichikawa, H., 1991. Neural networks applied to pharmaceutical problems. V. Obtaining the correlation indexes between
drugs activity and structural parameters using a neural network. Chem. Pharm. Bull. 39, 372±378.
Aoyama, T., Ichikawa, H., 1993. How to see characteristics of structural parameters in QSAR analysis: Descriptor mapping using
neural networks. SAR QSAR Environ. Res. 1, 115±130.
Aoyama, T., Suzuki, Y., Ichikawa, H., 1990. Neural networks applied to structure±activity relationships. J. Med. Chem. 33, 905±908.
Ash, R.B., 1965. Information Theory. Reprinted 1990, Dover, New York.
Balaban, A.T., 1997. From chemical topology to 3D geometry. J. Chem. Inf. Comput. Sci. 37, 645±650.
Barlow, T.W., 1995. Self-organizing maps and molecular similarity. J. Mol. Graph. 13, 24±27.
Baskin, I.I., Palyulin, V.A., Ze®rov, N.S.A., 1993. Methodology for searching direct correlations between structures and properties of
organic compounds by using computational neural networks. Dokl. Akad. Nauk 333, 176±179.
Baskin, I.I., Skvortsova, M.I., Stankevich, I.V., Ze®rov, N.S., 1995. On basis of invariants of labeled molecular graphs. J. Chem. Inf.
Comput. Sci. 35, 527±531.
Baskin, I.I., Palyulin, V.A., Ze®rov, N.S., 1997. A neural device for searching direct correlations between structures and properties of
chemical compounds. J. Chem. Inf. Comput. Sci. 37, 715±721.
Bauknecht, H., Zell, A., Bayer, H., Levi, P., Wagener, M., Sadowski, J., Gasteiger, J., 1996. Locating biologically active compounds in
medium-sized heterogeneous datasets by topological auto-correlation vectors: Dopamine and benzodiazepine agonists. J. Chem.
Inf. Comput. Sci. 36, 1205±1213.
Baum, E.B., Haussler, D., 1989. What size net gives valid generalization?. Neural Computation 1, 151±160.
Bisant, D., Maizel, J., 1995. Identi®cation of ribosome binding sites in E. coli using neural network models. Nucl. Acids Res. 23, 1632±
1639.
Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford.
Blundell, T.L., 1996. Structure-based drug design. Nature 384, 23±26.
BoÈhm, H.-J., Klebe, G., Kubinyi, H., 1996. Wirkstodesign. Spektrum Akademischer Verlag, Heidelberg, Berlin, Oxford.
Bohr, J., Bohr, H., Brunak, S., Cotterill, R.M.J., Fredholm, H., Lautrup, B., Petersen, S.B., 1993. Protein structures from distance
inequalities. J. Mol. Biol. 231, 861±869.
Bradley, D., 1997. Distilled wisdom. New Scientist 154(15), 40±43.
Bratko, I., Muggleton, S., 1995. Applications of inductive logic programming. Commun. Assoc. Comput. Machinery 38, 65±70.
Brunak, S., Engelbrecht, J., 1996. Protein structure and sequential structure of mRNA: Alpha helix and beta-sheet signals at the
nucleotide level. Proteins 25, 237±252.
Brunak, S., Engelbrecht, J., Knudsen, S., 1990. Neural networks detects errors in the assignment of mRNA splice sites. Nucl. Acids
Res. 18, 4797±4801.
Brunak, S., Engelbrecht, J., Knudsen, S., 1991. Prediction of human mRNA donor and acceptor sites from the DNA sequence. J. Mol.
Biol. 220, 49±65.
Burden, F.R., 1998. Holographic neural networks as nonlinear discriminants for chemical applications. J. Chem. Inf. Comput. Sci. 38,
47±53.
Burset, M., Guigo, R., 1996. Evaluation of gene structure prediction programs. Genomics 34, 353±367.
Cai, Y., Chen, C., 1995. Arti®cial neural network method for discriminating coding regions of eukaryotic genes. Comput. Appl. Biosci.
11, 497±501.
Cai, Y.D., Yu, H., Chou, K.C., 1997. Arti®cial neural network method for predicting the speci®city of GalNAc-transferase. J. Protein
Chem. 16, 689±700.
Calas, M., Cordina, G., Bompart, J., Bari, M.B., Jei, T., Ancelin, M.L., Vial, H., 1997. Antimalarial activity of molecules interfering
with Plasmodium falciparum phospholipid metabolism. Structure±activity relationship analysis. J. Med. Chem. 40, 3557±3566.
CarboÂ, R., Leyda, L., Arnau, M., 1980. An electron density measure of the similarity between two compounds. Int. J. Quantum Chem.
17, 1185±1189.
Carpenter, G.A., Grossberg, S. (Eds.), 1991. Pattern Recognition by Self-Organizing Neural Networks. The MIT Press, Cambridge,
MA.
Chandonia, J.M., Karplus, M., 1996. The importance of larger data sets for protein secondary structure prediction with neural net-
works. Protein Sci. 5, 768±774.
Chothia, C., 1975. The nature of accessible and buried surfaces in proteins. J. Mol. Biol. 105, 1±14.
Claros, M.G., Brunak, S., von Heijne, G., 1997. Prediction of N-terminal protein sorting signals. Curr. Opin. Struct. Biol. 7, 394±398.
Cottrell, M., Fort, J.C., 1986. A stochastic model of retinotopy: A self-organizing process. Biol. Cybern. 53, 405±411.
Cybenko, G., 1988. Continuous valued neural networks with two hidden layers are sucient. Technical Report, Department of
Computer Science, Tufts University, Medford, MA.
Cybenko, G., 1989. Approximations by superpositions of a sigmoidal function. Math. Control Signals Syst. 2, 303±314.
Dean, P.M. (Ed.), 1995. Molecular Similarity in Drug Design. Blackie Academic and Professional, London.
Denker, J., Schwartz, D., Witner, B., Solla, S., Hop®eld, J., Howard, R., Jacke, L., 1987. Automatic learning, rule extraction and gen-
eralization. Complex Systems 1, 877±922.
DeSmit, M.H., van Duin, J., 1994. Translational initiation on structured messengers. J. Mol. Biol. 235, 173±184.
Devillers, J., 1995. Display of multivariate data using non-linear mapping. In van de Waterbeemd, H. (Ed.), Chemometric Methods in
Molecular Design. VCH, Weinheim, pp. 255±263.
Devillers, J. (Ed.), 1996. Neural Networks in QSAR and Drug Design. Academic Press, London.
Devillers, J., 1996. Strengths and weaknesses of the back-propagation neural network in QSAR and QSPR studies. In Devillers, J.
(Ed.), Neural Networks in QSAR and Drug Design. Academic Press, London, pp. 1±46.
Devillers, J., 1996c. Designing molecules with speci®c properties from intercommunicating hybrid systems. J. Chem. Inf. Comput. Sci.
36, 1061±1066.
Devillers, J. (Ed.), 1996. Genetic Algorithms in Molecular Modelling. Academic Press, London.
Dombi, G.W., Lawrence, J., 1994. Analysis of protein transmembrane helical regions by a neural network. Protein Sci. 3, 557±566.
Domine, D., Wienke, D., Devillers, J., Buydens, L., 1996. A new nonlinear neural mapping technique for visual exploration of QSAR
data. In Devillers, J. (Ed.), Neural Networks in QSAR and Drug Design. Academic Press, London, pp. 223±253.
Doolittle, R.F. (Ed.), 1996. Computer methods for macromolecular sequence analysis. Methods in Enzymology, Vol. 266. Academic
Press, San Diego, London.
Downs, G.M., Willett, P., 1995. Clustering of chemical structure databases for compound selection. In van de Waterbeemd, H. (Ed.),
Advanced Computer-Assisted Techniques in Drug Discovery. VCH, Weinheim, pp. 111±130.
Drews, J., Ryser, S., 1997. The role of innovation in drug development. Nature Biotechnol. 15, 1318±1319.
Dubchak, I., Holbrook, S.R., Kim, S.H., 1993. Prediction of protein folding class from amino acid composition. Proteins 16, 79±91.
Dubchak, I., Muchnik, I., Holbrook, S.R., Kim, S.H., 1995. Prediction of protein folding class using global description of amino acid
sequence. Proc. Natl. Acad. Sci. USA 92, 8700±8704.
Dunn, W.J., Wold, S., Edlund, U., Hellberg, S., 1984. Multivariate structure±activity relationships between data from a battery of
biological tests and an ensemble of structure descriptors: The PLS method. Quant. Struct.-Act. Relat. 3, 131±137.
Engelman, D.A., Steitz, T.A., Goldman, A., 1986. Identifying nonpolar transbilayer helices in amino acid sequences of membrane pro-
teins. Ann. Rev. Biophys. Biophys. Chem. 15, 321±353.
Farber, R., Lapedes, A., Sirotkin, K., 1992. Determination of eukaryotic protein coding regions using neural networks and infor-
mation theory. J. Mol. Biol. 226, 471±479.
Fariselli, P., Casadio, R., 1996. HTP: A neural network based method for predicting the topology of helical transmembrane domains in
proteins. Comput. Appl. Biosci. 12, 41±48.
Ferran, E.A., P¯ugfelder, B., Ferrara, P., 1994. Self-organized neural maps of human protein sequences. Protein Sci. 3, 507±521.
Fetrow, J.S., Palumbo, M.J., Berg, G., 1997. Patterns, structures, and amino acid frequenies in structural building blocks, a protein
secondary structure classi®cation scheme. Proteins Struct. Funct. Genet. 27, 249±271.
Fickett, J.W., Hatzigeorgiou, A.G., 1997. Eukaryotic promotor recognition. Genome Res. 7, 861±878.
Fukuda, M., 1991. Leukosialin, a major O-glycan-containing sialoglycoprotein de®ning leukocyte dierentiation and malignancy.
Glycobiology 1, 337±356.
Gasteiger, J., Zupan, J., 1993. Neural networks in chemistry. Angew. Chem. Int. Ed. Engl. 32, 503±527.
Gasteiger, J., Li, X., 1994. Representation of the electrostatic potentials of muscarinic and nicotinic agonists with arti®cial neural net-
works. Angew. Chem. Int. Ed. Engl. 33, 643±646.
Gasteiger, J., Li, X., Rudolph, C., Sadowski, J., Zupan, J., 1994. Representation of molecular electrostatic potential by topological
feature maps. J. Am. Chem. Soc. 116, 4608±4620.
Gavel, Y., von Heijne, G., 1990. Cleavage-site motifs in mitochondrial targeting peptides. Protein Eng. 4, 33±37.
Good, A.C., 1995. 3D molecular similarity indices and their application in QSAR studies. In Dean, P.M. (Ed.), Molecular Similarity in
Drug Design. Blackie Academic and Professional, London, pp. 25±56.
Good, A.C., Peterson, S.J., Richards, W.G., 1993a. QSAR from similarity matrices. Technique validation and application in the com-
parison of dierent similarity evaluation methods. J. Med. Chem. 36, 2929±2937.
Good, A.C., So, S.S., Richards, W.G., 1993b. Structure±activity relationships from molecular similarity matrices. J. Med. Chem. 36,
433±438.
Granjeon, E., Tarroux, P., 1995. Detection of compositional constraints in nucleic acid sequences using neural networks. Comput.
Appl. Biosci. 11, 29±37.
Grootenhuis, P.D., Karplus, M., 1996. Functionality map analysis of the active site cleft of human thrombin. J. Comput. Aided Mol.
Des. 10, 1±10.
Grossberg, S., 1976a. Adaptive pattern classi®cation and universal recoding: I. parallel development and coding of neural feature
detectors. Biol. Cybern. 23, 121±134.
Grossberg, S., 1976b. Adaptive pattern classi®cation and universal recoding: I. feedback, expectation, o¯action, illusions. Biol. Cybern.
23, 187±202.
Gulukota, K., Sidney, J., Sette, A., DeLisi, C., 1997. Two complementary methods for predicting peptides binding major histocompat-
ibility complex molecules. J. Mol. Biol. 267, 1258±1267.
Hanke, J., Reich, J.G., 1996. Kohonen map as a visualization tool for the analysis of protein sequences: multiple alignments, domains
and segments of secondary structures. Comput. Appl. Biosci. 12, 447±454.
Hanke, J., Beckmann, G., Bork, P., Reich, J.G., 1996. Self-organizing hierarchic networks for pattern recognition in protein sequence.
Protein Sci. 5, 72±82.
Hansch, C., Leo, A., 1995. Exploring QSAR. Fundamentals and Applications in Chemistry and Biology. ACS Professional Reference
Book, Washington.
Hansch, C., Maloney, P.P., Fujita, T., Muir, R.M., 1962. Correlation of biological activity of phenoxyacetic acids with Hammett sub-
stituent constants and partition coecients. Nature 194, 178±180.
Hansch, C., Muir, R.M., Fujita, T., Maloney, P.P., Geiger, F., Streich, M., 1963. Correlation of biological activity of plant growth
regulators and chloromycetin derivatives with Hammett constants and partition coecients. J. Am. Chem. Soc. 85, 2817±2824.
Hansen, J.E., Lund, O., Engelbrecht, J., Bohr, H., Nielsen, J.O., Brunak, S., 1995. Prediction of O-glycosylation of mammalian pro-
teins: speci®city patterns of UDP-GalNAc: polypeptide N-acetylgalactos-aminyltransferase. Biochem. J. 308, 801±813.
Harpaz, Y., Gerstein, M., Chothia, C., 1994. Volume changes on protein folding. Structure 2, 641±649.
Harp, S.A., Samad, T., Guha, A., 1990. Designing application-speci®c neural networks using the genetic algorithm. In Touretzky, D.S.
(Ed.), Advances in Neural Information Processing Systems II (Denver, 1989). Morgan Kaufmann, San Mateo, pp. 447±454.
Hart, G., 1992. Glycosylation. Curr. Opin. Cell Biol. 4, 1017±1023.
Hebsgaard, S.M., Korning, P.G., Tolstrup, N., Engelbrecht, J., Rouz, P., Brunak, S., 1996. Splice site prediction Arabidopsis thaliana
pre-mRNA by combining local and global sequence information. Nucl. Acids Res. 24, 3439±3452.
Hecht-Nielsen, R., 1987. Counterpropagation networks. Appl. Optics 26, 4979±4984.
Hecht-Nielsen, R., 1988. Applications of counterpropagation networks. Neural Networks 1, 131±139.
Hertz, J., Krogh, A., Palmer, R.G., 1991. Introduction to the Theory of Neural Computation. Addison±Wesley, Redwood City.
Hinton, G.E., 1986. Learning distributed representations of concepts. In Proceedings of the Eighth Annual Conference of the
Cognitive Science Society (Amherst, 1986). Erlbaum, Hillsdale, pp. 1±12.
Hirst, J.D., Sternberg, M.J.E., 1992. Prediction of structural and functional features of protein and nucleic acid sequences by arti®cial
neural networks. Biochemistry 31, 7211±7218.
Hobohm, U., Scharf, M., Schneider, R., Sander, C., 1992. Selection of representative protein data sets. Protein Sci. 1, 409±417.
Holland, J.H., 1975. Adaptation in natural and arti®cial systems. MIT Press, Cambridge, MA, 2nd edition 1992.
Holzgrabe, U., Wagener, M., Gasteiger, J., 1996. Comparison of structurally dierent allosteric modulators of muscarinic receptors by
self-organizing neural networks. J. Mol. Graph. 14, 185±193.
Hopp, T.P., Woods, K.R., 1981. Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. USA
78, 3824±3828.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feed-forward networks are universal approximators. Neural Networks 2,
359±366.
Hosseini, M., Maddalena, D.J., Spence, I., 1997. Using arti®cial neural networks to classify the activity of capsaicin and its analogues.
J. Chem. Inf. Comput. Sci. 37, 1129±1137.
Huang, W.Y., Lippmann, R.P., 1988. Neural net and traditional classi®ers. In Anderson, D.Z. (Ed.), Neural Information Processing
Systems (Denver, 1987). American Institute of Physics, New York, pp. 387±396.
Huang, G.M., Farkas, J., Hood, L., 1996. High-throughput cDNA screening utilizing a low order neural network ®lter. BioTechniques
21, 1110±1114.
Hudson, B., Livingstone, D.J., Rahr, E., 1989. Pattern recognition display methods for the analysis of computed molecular properties.
J. Comput. Aided Mol. Design 3, 55±65.
Hunter, L. (Ed.), 1993. Arti®cial Intelligence and Molecular Biology. AAAI Press/The MIT Press, Menlo Park, Cambridge, London.
Hyde, R.M., Livingstone, D.J., 1988. Perspectives in QSAR: Computer chemistry and pattern recognition. J. Comput. Aided Mol.
Design 2, 145±155.
Jackson, J.E., 1991. A User's Guide to Principal Components. Wiley, New York.
Jackson, R.C., 1995. Update on computer-aided drug design. Curr. Opin. Biotechnol. 6, 646±651.
Jones, D.D., 1975. Amino acid properties and side chain orientations in proteins: A cross-correlation approach. J. Theor. Biol. 50, 167±
183.
Jones, D.T., 1997. Progress in protein structure prediction. Curr. Opin. Struct. Biol. 7, 377±387.
Jurs, P.C., Dixon, S.L., Egolf, L.M., 1995. Molecular concepts: Representations of molecules. In van de Waterbeemd, H. (Ed.),
Chemometric Methods in Molecular Design. VCH, Weinheim, pp. 15±38.
Kawabata, T., Doi, J., 1997. Improvement of protein secondary structure prediction using binary word encoding. Proteins Struct.
Funct. Genet. 27, 36±46.
Kettner, C., Shaw, E., 1981. Inactivation of trypsin-like enzymes with peptides of arginine chloromethyl ketone. Methods Enzymol. 80,
824±826.
Kidera, A., Konishi, Y., Oka, M., Ooi, T., Scheraga, H.A., 1985. Statistical analysis of the physical properties of the 20 naturally
occurring amino acids. J. Protein Chem. 4, 23±55.
Kier, L.B., 1997. Book review: Neural Networks in QSAR and Drug Design (editor Devillers J.). J. Med. Chem. 40, 2967.
King, R.D., Sternberg, M.J.E., 1990. Machine learning approach for the prediction of protein secondary structure. J. Mol. Biol. 216,
441±457.
King, R.D., Srinivasan, A., 1997. The discovery of indicator variables for QSAR using inductive logic programming. J. Comput. Aided
Mol. Design 11, 571±580.
King, R.D., Sternberg, M.J.E., 1996. Identi®cation and application of the concepts important for accurate and reliable protein second-
ary structure prediction. Protein Sci. 5, 2298±2310.
Klein, D.J., Babic, D., 1997. Partial orderings in chemistry. J. Chem. Inf. Comput. Sci. 37, 656±671.
Kocjanc° ic° , R., Zupan, J., 1997. Application of a feed-forward arti®cial neural network as a mapping device. J. Chem. Inf. Comput. Sci.
37, 985±989.
Kohonen, T., 1982. Self-organized formation of topologically correct feature maps. Biol. Cybern. 43, 59±69.
Kohonen, T., 1984. Self-Organization and Associative Memory. Springer Series in Information Sciences 8 (3rd edition 1989). Springer
Verlag, Heidelberg.
Kohonen, T., 1989. Learning vector quantization for pattern recognition. Report TKK-F-A601. University of Technology, Helsinki.
Kolmogorov, A.N., 1957. On the representation of continuous functions of several variables by superposition of continuous functions
of one variable and addition. Dokl. Akad. Nauk SSSR 114, 953±956.
Kosko, B., 1992. Neural Networks and Fuzzy Systems. A Dynamical Systems Approach to Machine Intelligence. Prentice Hall
International, Englewood Clis.
Kraus, R.J., Murray, E.E., Wiley, S.R., Zink, N.M., Loritz, K., Gelembiuk, G.W., Mertz, J.E., 1996. Experimentally determined
weight matrix de®nitions of the initiator and TBP binding site elements of promotors. Nucl. Acids Res. 24, 1531±1539.
Kruskal, J.B., 1964. Non-metric multidimensional scaling: A numerical method. Psychometrika 29, 115±129.
Kubinyi, H., 1993a. QSAR: Hansch Analysis and Related Approaches. VCH, Weinheim.
Kubinyi, H. (Ed.), 1993b. 3D-QSAR in Drug Design. Escom, Leiden.
Kyte, J., Doolittle, R.F., 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105±132.
Langley, P., 1996. Elements of Machine Learning. Morgan Kaufmann, San Francisco.
Langley, P., Simon, H.A., 1996. Applications of machine learning and rule induction. Commun. Assoc. Comput. Machinery 38, 54±64.
Larsen, N.I., Engelbrecht, J., Brunak, S., 1995. Analysis of eukaryotic promotor sequences reveals a systematically occurring CT-sig-
nal. Nucl. Acids Res. 23, 1223±1230.
Leszczynski, J.F., Rose, G.D., 1986. Loops in globular proteins: A novel category of protein secondary structure. Science 234, 849±855.
Li, X., Gasteiger, J., Zupan, J., 1993. On the topology in self-organizing feature maps. Biol. Cybern. 70, 189±198.
Liebovitch, L.S., Zochowski, M., 1997. Dynamics of neural networks relevant to properties of proteins. Phys. Rev. (E) 56, 931±935.
Lisser, S., Margalit, H., 1993. Compilation of E. coli mRNA promotor sequences. Nucl. Acids. Res. 21, 1507±1516.
Livingstone, D.J., 1996. Multivariate data display using neural networks. In Devillers, J. (Ed.), Neural Networks in QSAR and Drug
Design. Academic Press, London, pp. 157±176.
Livingstone, D.J., Manallack, D.T., Tetko, I.V., 1997. Data modelling with neural networks: Advantages and limitations. J. Comput.
Aided Mol. Res. 11, 135±142.
Loew, G.H., Villar, H.O., Alkorta, I., 1993. Strategies for indirect computer-aided drug design. Pharm. Res. 10, 475±486.
Lohmann, R., 1993. Structure evolution and incomplete induction. Biol. Cybern. 69, 319±326.
Lohmann, R., Schneider, G., Behrens, D., Wrede, P., 1994. A neural network model for the prediction of membrane spanning amino
acid sequences. Protein Sci. 3, 1597±1601.
Lohmann, R., Schneider, G., Wrede, P., 1996. Structure optimization of an arti®cial neural ®lter detecting membrane spanning amino
acid sequences. Biopolymers 38, 13±29.
Lu, T., Lerner, J., 1996. Spectroscopy and hybrid neural network analysis. Proc. IEEE 84, 895±905.
Lu, T., Xu, X., Wu, S., Yu, F.T.S., 1990a. Neural network model using interpattern association. Appl. Optics 29, 284±288.
Lu, T., Yu, F.T.S., Gregory, D.A., 1990b. Self-organizing optical neural network for unsupervised learning. Opt. Eng. 29, 1107±1113.
Luke, B.T., 1994. Evolutionary programming applied to the development of quantitative structure±activity relationships and quanti-
tative structure±property relationships. J. Chem Inf. Comput. Sci. 34, 1279±1287.
Mahadevan, I., Ghosh, I., 1994. Analysis of E. coli promoter structures using neural networks. Nucl. Acids Res. 22, 2158±2165.
Malik, A., Schuchhardt, J., Schneider, G., Wrede, P., 1996. DNA sequence analysis by perceptrons: Identi®cation of splice-sites in
human DNA. Software Development in Chemistry 10 (editor Gasteiger, J.). GDCh, Frankfurt, pp. 217±236.
Manallack, D.T., Livingston, D.J., 1995. Neural networks and expert systems in molecular design. In van de Waterbeemd, H. (Ed.),
Advanced Computer-Assisted Techniques in Drug Discovery. VCH, Weinheim, pp. 293±318.
Marabini, R., Carazo, J.M., 1994. Pattern recognition and classi®cation of images of biological macromolecules using arti®cial neural
networks. Biophys. J. 66, 1804±1814.
Marrone, T.J., Briggs, J.M., McCammon, J.A., 1997. Structure-based drug design: Computational advances. Annu. Rev. Pharmacol.
Toxicol. 37, 71±90.
Martinetz, T., Schulten, K., 1994. Topology representing networks. Neural Networks 7, 507±522.
Matis, S., Xu, Y., Shah, M., Guan, X., Einstein, J.R., Mural, R., Uberbacher, E., 1996. Detection of RNA polymerase II promotors
and polyadenylation sites in human DNA sequence. Comput. Chem. 20, 135±140.
Matthews, B.W., 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta
405, 442±451.
McInerny, J.M., Haines, K.G., Biafore, S., Hecht-Nielsen, R., 1989. Back propagation error surfaces can have local minima. In
International Joint Conference on Neural Networks 2 (Washington, 1989). IEEE Press, New York, 627.
Melssen, W.J., Smits, J.R.M., Buydens, L.M.C., Kateman, G., 1994. Using arti®cial neural networks for solving chemical problems.
Part II. Kohonen self-organizing feature maps and Hop®eld networks. Chemom. Intell. Lab. Syst. 23, 267±291.
Metfessel, B.A., Saurugger, P.N., Connelly, D.P., Rich, S.S., 1993. Cross-validation of protein structural class prediction using stat-
istical clustering and neural networks. Protein Sci. 2, 1171±1182.
Mewes, H.W., Albermann, K., Bahr, M., Frishman, D., Gleissner, A., Hani, J., Heumann, K., Kleine, K., Maierl, A., Oliver, S.G.,
Pfeier, F., Zollner, A., 1997. Overview of the yeast genome. Nature 387, 7±65.
Mitchie, D., Spiegelhalter, D.J., Taylor, C.C. (Eds.), 1994. Machine Learning: Neural and Statistical Classi®cation. Ellis Horwood,
New York.
Milne, G.W.A., 1997. Mathematics as a basis for chemistry. J. Chem. Inf. Comput. Sci. 37, 639±644.
Minsky, M.L., Papert, S.A., 1969. Perceptrons. MIT Press, Cambridge, MA, Expanded edition 1990.
Moody, J., Darken, C., 1989. Fast learning in networks of locally-tuned processing units. Neural Computation 1, 281±294.
Moreau, G., Broto, P., 1980. The auto-correlation of a topological structure: A new molecular descriptor. Noveau Journal de Chimie
4, 359±360.
Moriguchi, I., Komatsu, K., Matsushita, Y., 1980. Adaptive least squares method applied to structure±activity correlation of hypo-
tensive N-alkyl-N0-cyano-N'-pyridylguanidines. J. Med. Chem. 23, 20±26.
MuÈller, K. (Ed.), 1995. De novo design. In Perspectives in Drug Discovery and Design Vol. 3 (series editors Anderson, P.S., Kenyon,
G.L., Marshall, G.R.). Escom Science Publishers, Leiden.
Muramatsu, T., 1993. Carbohydrate signals in metastasis and prognosis of human carcinomas. Glycobiology 3, 291±296.
Nair, T.M., Tambe, S.S., Kulkarni, B.D., 1994. Application of arti®cial neural networks for prokaryotic transcription terminator pre-
diction. FEBS Lett. 346, 273±277.
Nair, T.M., Tambe, S.S., Kulkarni, B.D., 1995. Analysis of transcription control signals using arti®cial neural networks. Comput.
Appl. Biosci. 11, 293±300.
Nakata, K., 1995. Prediction of zinc ®nger DNA binding protein. Comput. Appl. Biosci. 11, 125±131.
Nasrabadi, N.M., King, R.A., 1988. Vector quantization of images based upon the Kohonen self-organizing feature maps. In IEEE
International Conference on Neural Networks 1 (San Diego, 1988). IEEE Press, New York, pp. 101±108.
Nielsen, H., Engelbrecht, J., Brunak, S., von Heijne, G., 1997. Identi®cation of prokaryotic and eukaryotic signal peptides and pre-
diction of their cleavage sites. Protein Eng. 10, 1±6.
Niranjan, M., Fallside, F., 1990. Neural networks and radial basis functions in classifying static speech patterns. Comput. Speech
Language 4, 275±289.
Norinder, U., Svensson, P., 1998. Descriptors for amino acids using MolSurf parametrization. J. Comput. Chem. 19, 51±59.
Novic° , M., Nikolovska-Coleska, Z., Solmajer, T., 1997. Quantitative structure±activity relationship of ¯avonoid p56lck protein tyro-
sine kinase inhibitors. A neural network approach. J. Chem. Inf. Comput. Sci. 37, 990±998.
Ogura, H., Agata, H., Xie, M., Odaka, T., Furutani, H., 1997. A study of learning splice sites of DNA sequence by neural networks.
Comput. Biol. Med. 27, 67±75.
Oliver, S. et al, 1992. The complete DNA sequence of yeast chromosome III. Nature 357, 38±46 (152 authors).
Pedersen, G., Engelbrecht, J., 1995. Investigations of E. coli promotor sequences with arti®cial neural networks: New signals discov-
ered upstream of the transcriptional startpoint. ISMB 3, 292±299.
Pedersen, A.G., Nielsen, H., 1997. Neural network prediction of translation initiation sites in eukaryotes: Perspectives for EST and
genome analysis. ISMB 5, 226±233.
Peterson, K.L., 1992. Counter-propagation neural networks in the modeling and prediction of Kovats indices for substituted phenols.
Anal. Chem. 64, 379±386.
Peterson, K.L., 1995. Quantitative structure±activity relationships in carboquinones and benzodiazepines using counter-propagation
neural networks. J. Chem. Inf. Comput. Sci. 35, 896±904.
Polanski, J., 1997. The receptor-like neural network for modeling corticosteroid and testosterone binding globulins. J. Chem. Inf.
Comput. Sci. 37, 553±561.
Preparata, F.P., Shamos, M.I., 1985. Computational geometry: An introduction. Springer, New York.
Presnell, S.R., Cohen, F.E., 1993. Arti®cial neural networks for pattern recognition in biochemical sequences. Annu. Rev. Biophys.
Biomol. Struct. 22, 283±298.
Pugsley, A.P., 1989. Protein Targeting. Academic Press, New York.
Qian, N., Sejnowski, T.J., 1988. Predicting secondary structure of globular proteins using neural network models. J. Mol. Biol. 202,
865±884.
Randic, M., 1997. On characterization of chemical structure. J. Chem. Inf. Comput. Sci. 37, 672±687.
Rechenberg, I., 1973. Evolutionsstrategie-Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Fromann-
Holzboog, Stuttgart.
Reibnegger, G., Werner-Felmayer, G., Wachter, H., 1993. A note on the low-dimensional display of multivariate data using neural
networks. J. Mol. Graph. 11, 129±133.
Ripley, B.D., 1996. Pattern Recognition and Neural Networks. University Press, Cambridge.
Ritter, H., Schulten, K., 1988. Kohonen's self-organizing maps: Exploring their computational capabilities. IEEE ICNN 88 San Diego
1, 109±116.
Ritter, H., Schulten, K., 1989. Convergence properties of Kohonen's topology conserving maps: Fluctuations, stability and dimension
selection. Biol. Cybern. 60, 59±71.
Ritter, H., Martinez, T., Schulten, K., 1990. Neuronale Netze: Eine EinfuÈhrung in die Neuroinformatik selbstorganisierender
Netzwerke. Addison-Wesley, Bonn.
Rogers, D., Hop®nger, A.J., 1994. Application of genetic function approximation to quantitative structure±activity relationships and
quantitative structure±property relationships. J. Chem. Inf. Comput. Sci. 34, 854±866.
Rosenberg, M., Court, D., 1979. Regulatory sequences involved in the promotion and termination of RNA transcription. Ann. Rev.
Genet. 13, 319±353.
Rosenblatt, F., 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 65,
386±408.
Rost, B., 1996. PHD: Predicting one-dimensional protein structure by pro®le-based neural networks. Methods Enzymol. 266, 525±539.
Rost, B., Sander, C., 1993a. Secondary structure prediction of all-helical proteins in two states. Protein Eng. 6, 831±836.
Rost, B., Sander, C., 1993b. Improved prediction of protein secondary structure by use of sequence pro®les and neural networks. Proc.
Natl. Acad. Sci. USA 90, 7558±7562.
Rost, B., Casadio, R., Fariselli, P., Sander, C., 1995. Transmembrane helices predicted at 95% accuracy. Protein Sci. 4, 521±533.
Rost, B., Fariselli, P., Casadio, R., 1996. Topology prediction for helical transmembrane proteins at 80%. Protein Sci. 5, 1704±1718.
Rumelhart, D.E., McClelland, J.L. and The PDB Research Group, 1986. Parallel Distributed Processing. MIT Press, Cambridge.
Saaki, S., Abe, Y., Takahashi, Y., Takayama, T., Miyashita, Y., 1984. Introduction to pattern recognition for chemists. Tokyo
Kagaku Dojin, Tokyo.
Sadowski, J., Gasteiger, J., 1993. From atoms and bonds to three-dimensional atomic coordinates: Automatic model builders. Chem.
Rev. 93, 2567±2581.
Sadowski, J., Gasteiger, J., 1994. Comparison of automatic three-dimensional model builders using 639 X-ray structures. J. Chem. Inf.
Comput. Sci. 34, 1000±1008.
Sadowski, J., Wagener, M., Gasteiger, J., 1995. Assessing similarity and diversity of combinatorial libraries by spatial autocorrelation
functions and neural networks. Angew. Chem. Int. Ed. Engl. 34, 2674±2677.
Salt, D.W., Yildiz, N., Livingstone, D.J., Tinsley, C.J., 1992. The use of arti®cial neural networks in QSAR. Pest. Sci. 36, 161±170.
Sammon, J.W., 1969. A nonlinear mapping for data structure analysis. IEEE Trans. Comput. C 18, 401±409.
Sander, C., Schneider, R., 1991. Database of homology-derived protein structures and the structural meaning of sequence alignment.
Proteins Struct. Funct. Genet. 9, 56±68.
Schneider, T.D., Stephens, R.M., 1990. Sequence logos: A new way to display consensus sequences. Nucl. Acids Res. 18, 6097±6100.
Schneider, G., Wrede, P., 1992. Modular feature extraction in protein sequences with arti®cial neural networks. Analog model for sym-
biogenous constraints. Endocyt. Cell Res. 9, 1±12.
Schneider, G., Wrede, P., 1993a. Development of arti®cial neural ®lters for pattern recognition in protein sequences. J. Mol. Evol. 36,
586±595.
Schneider, G., Wrede, P., 1993b. Prediction of the secondary structure of proteins from the amino acid sequence with arti®cial neural
networks. Angew. Chem. Int. Ed. Engl. 32, 1141±1143.
Schneider, G., Wrede, P., 1993c. Signal analysis of protein targeting sequences. Protein Seq. Data Anal. 5, 227±236.
Schneider, G., Wrede, P., 1994a. The rational design of amino acid sequences by arti®cial neural networks and simulated molecular
evolution: De novo design of an idealized leader peptidase cleavage site. Biophys. J. 66, 335±344.
Schneider, G., Wrede, P., 1994b. Optimizing amino acid sequences by simulated molecular evolution. Mathem. Res. 81, 335±346.
Schneider, G., Broger, C., 1998. Visualizing protein sequence space by self-organizing neural networks: Classi®cation of protein tar-
geting signals. In Wagner, E. (Ed.), Endocytobiology VII, in press.
Schneider, G., RoÈhlk, S., Wrede, P., 1993. Analysis of cleavage-site patterns in protein precursor sequences with a perceptron-type
neural network. Biochem. Biophys. Res. Commun. 194, 951±959.
Schneider, G., Schuchhardt, J., Wrede, P., 1994a. Arti®cial neural networks and simulated molecular evolution are potential tools for
sequence-oriented protein design. Comput. Appl. Biosci. 10, 635±645.
Schneider, G., Lohmann, R., Wrede, P., 1994b. The rational design of amino acid sequences. In Wrede, P., Schneider, G. (Eds.),
Concepts in Protein Engineering and Design. Walter de Gruyter, Berlin, New York, pp. 281±317.
Schneider, G., Schuchhardt, J., Wrede, P., 1995a. Development of simple ®tness landscapes for peptides by arti®cial neural ®lter sys-
tems. Biol. Cybern. 73, 245±254.
Schneider, G., Schuchhardt, J., Wrede, P., 1995b. Peptide design in machina: Development of arti®cial mitochondrial protein precursor
cleavage sites by simulated molecular evolution. Biophys. J. 68, 434±447.
Schneider, G., Grunert, H.-P., Schuchhardt, J., Wolf, K., Zeichhardt, H., Habermehl, K.-H., MuÈller, G., Wrede, P., 1995c. A peptide
selection scheme for systematic evolutionary design and construction of synthetic peptide libraries. Minim. Invas. Med. 6, 106±115.
Schneider, G., Schuchhardt, J., Wrede, P., 1996. Evolutionary optimization in multimodal search space. Biol. Cybern. 74, 203±207.
Schneider, G., Schuchhardt, J., Malik, A., Glienke, J., Jagla, B., Behrens, D., MuÈller, S., MuÈller, G., Wrede, P., 1997. Analysis of mito-
chondrial and chloroplast targeting signals by neural network systems. In Schenk, H.E.A., Herrmann, R., Jeon, K.W., MuÈller,
N.E., Schwemmler, W. (Eds.), Eukaryotism and Symbiosis. Springer Verlag, Berlin, Heidelberg, pp. 214±229.
Schneider, G., SjoÈling, S., Wallin, E., Wrede, P., Glaser, E., von Heijne, G., 1998. Feature extraction from endopeptidase cleavage sites
in mitochondrial protein precursors. Proteins 30, 59±60.
Schuchhardt, J., Schneider, G., Reichelt, J., Schomburg, D., Wrede, P., 1996. Local structural motifs of protein backbones are classi-
®ed by self-organizing neural networks. Protein Eng. 9, 833±842.
Sen, K. (Ed.), 1995a. Molecular Similarities I, Topics in Current Chemistry, Vol. 173. Springer, Berlin.
Sen, K. (Ed.), 1995b. Molecular Similarities II, Topics in Current Chemistry, Vol. 174. Springer, Berlin.
Shepard, R.N., 1962a. The analysis of proximities: Multidimensional scaling with an unknown distance function. Psychometrika 27,
125±139.
Shepard, R.N., 1962b. The analysis of proximities: Multidimensional scaling with an unknown distance function. Psychometrika 27,
219±246.
Sietsma, J., Dow, R.J.F., 1988. Neural net pruning ± why and how. In IEEE International Conference on Neural Networks 1 (San
Diego, 1988). IEEE, New York, pp. 325±333.
Silver, P., Goodson, H., 1989. Nuclear protein transport. Crit. Rev. Biochem. Mol. Biol. 24, 419±435.
Sipos, L., von Heijne, G., 1993. Predicting the topology of eukaryotic membrane proteins. Eur. J. Biochem. 213, 1333±1340.
Snyder, E.E., Stormo, G.D., 1993. Identi®cation of coding regions in genomic DNA sequences: An application of dynamic program-
ming and neural networks. Nucl. Acids Res. 21, 607±613.
Snyder, E.E., Stormo, G.D., 1995. Identi®cation of protein coding regions in genomic DNA. J. Mol. Biol. 248, 1±18.
So, S.S., Karplus, M., 1997a. Three-dimensional quantitative structure±activity relationships from molecular similarity matrices and
genetic neural networks. 1. Method and validations. J. Med. Chem. 40, 4347±4359.
So, S.S., Karplus, M., 1997b. Three-dimensional quantitative structure±activity relationships from molecular similarity matrices and
genetic neural networks. 2. Applications. J. Med. Chem. 40, 4360±4371.
Stormo, G.D., Schneider, T.D., Gold, L., Ehrenfeucht, A., 1982. Use of the Perceptron algorithm to distinguish translational initiation
sites in E. coli. Nucl. Acids Res. 10, 2997±3011.
Strachan, T., Read, A.P., 1996. Human molecular genetics. Bioscienti®c-Wiley, New York.
Sumpter, B.G., Getino, C., Noid, D.W., 1994. Theory and applications of neural computing in chemical science. Annu Rev. Phys.
Chem. 45, 439±481.
Sun, Z., Rao, X., Peng, L., Xu, D., 1997. Prediction of protein supersecondary structures based on the arti®cial neural network
method. Protein Eng. 10, 763±769.
Sutherland, J.G., 1990. Holographic model of memory, learning, and expression. Int. J. Neurol. Syst. 1, 256±267.
Tatusov, R.L., Koonin, E.V., Lipman, D.J., 1997. A genomic perspective on protein families. Science 278, 631±637.
Taylor, W.R., 1986. The classi®cation of amino acid conservation. J. Theor. Biol. 119, 205±218.
Tetko, I.V., Luik, A.I., Poda, G.I., 1993. Applications of neural networks in structure±activity relationships of a small number of mol-
ecules. J. Med. Chem. 36, 811±814.
Tetko, I.V., Tanchuk, V.Y., Chentsova, N.P., Antonenko, S.V., Poda, G.I., Kukhar, V.P., Luik, A.I., 1994a. HIV-1 reverse transcrip-
tase inhibitor design using arti®cial neural networks. J. Med. Chem. 37, 2520±2526.
Tetko, I.V., Tanchuk, V.Y., Luik, A., 1994. Application of an evolutionary algorithm to the structure activity relationship. In Sebald,
A.V., Vogel, L.J. (Eds.), Proceedings of the Third Annual Conference on Evolutioany Programming. World Scienti®c, River Edge,
NJ, pp. 109±119.
Tolstrup, N., Toftgard, J., Engelbrecht, J., Brunak, S., 1994. Neural network model of the genetic code is strongly correlated to the
GES scale of amino acid transfer energies. J. Mol. Biol. 243, 816±820.
Tomii, K., Kanehisa, M., 1996. Analysis of amino acid indices and mutation matrices for sequence comparison and structure predic-
tion of proteins. Protein Eng. 9, 27±36.
Uberbacher, E.C., Mural, R.J., 1991. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network
approach. Proc. Natl. Acad. Sci. USA 88, 11261±11265.
Ugi, I., Wochner, M., Fontain, E., Bauer, J., Gruber, B., Karl, R., 1990. Chemical similarity, chemical distance, and computer-assisted
formalized reasoning by analogy. In Johnson, M.A., Maggiora, G.M. (Eds.), Concepts and Applications of Molecular Similarity.
John Wiley and Sons, New York, pp. 239±288.
van de Waterbeemd, H. (Ed.), 1995a. Chemometric Methods in Molecular Design. VCH, Weinheim.
van de Waterbeemd, H. (Ed.), 1995b. Advanced Computer-Assisted Techniques in Drug Discovery. VCH, Weinheim.
von Heijne, G., 1983. Patterns of amino acids near signal-sequence cleavage sites. Eur. J. Biochem. 133, 17±21.
von Heijne, G., 1985. Signal sequences: The limits of variation. J. Mol. Biol. 184, 99±105.
von Heijne, G., 1989. Control of topology and mode of assembly of a polytopic membrane protein by positively charged residues.
Nature 341, 456±458.
von Heijne, G., 1994. Design of protein targeting signals and membrane protein engineering. In Wrede, P., Schneider, G. (Eds.),
Concepts in Protein Engineering and Design. Walter de Gruyter, Berlin, New York, pp. 263±279.
von Heijne, G. (Ed.), 1994. Signal Peptidases. R.G. Landes Company, Austin.
von Heijne, G., Gavel, Y., 1988. Topogenic signals in integral membrane proteins. Eur. J. Biochem. 174, 671±678.
Vrac° ko, M., 1997. A study of structure±carcinogenic potency relationship with arti®cial neural networks. The using of descriptors re-
lated to geometrical and electronic structures. J. Chem. Inf. Comput. Sci. 37, 1037±1043.
Wagener, M., Sadowski, J., Gasteiger, J., 1995. Auto-correlation of molecular surface properties for modeling corticosteroid binding
globulin and cytosolic AH receptor activity by neural networks. J. Am. Chem. Soc. 117, 7769±7775.
Weber, L., Wallbaum, S., Broger, C., Gubernator, K., 1995. Optimization of the biological activity of combinatorial compound
libraries by a genetic algorithm. Angew. Chem. Int. Ed. Engl. 34, 2280±2282.
Weigend, A.S., Gershenfeld, N.A. (Eds.), 1993. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-
Wesley, Reading, MA.
Widrow, B., Ho, M.E., 1960. Adaptive switching circuits. In 1960 IRE WESCON Convention Record, Part 4. IRE, New York, pp.
96±104.
Wienke, D., Buydens, L., 1996. An adaptive resonance theory based arti®cial neural network for supervised chemical pattern recog-
nition (FuzzyARTMAP). Part 1: Theory and basic properties. Chemom. Intell. Lab. Sys. 32, 151±164.
Wienke, D., Domine, D., Buydens, L, Devillers, J., 1996. Adaptive resonance theory based neural networks explored for pattern rec-
ognition analysis of QSAR data. In Devillers, J. (Ed.), Neural Networks in QSAR and Drug Design. Academic Press, London, pp.
119±156.
Willett, P. (Ed.), 1997. Methods for the analysis of molecular diversity. Perspectives in Drug Discovery and Design, Vols. 7±8.
Wrede, P., Schneider, G. (Eds.), 1994. Concepts in Protein Engineering and Design ± An Introduction. Walter de Gruyter, Berlin, New
York.
Wrede, P., Landt, O., Klages, S., Fatemi, A., Hahn, U., Schneider, G., 1998. Peptide design aided by neural networks: Biological ac-
tivity of arti®cial signal peptidase I cleavage sites. Biochemistry 37, 3588±3593.
Wu, C.H., 1996. Gene classi®cation arti®cial neural system. Methods Enzymol. 266, 71±88.
Wu, C.H., Zhao, S., Chen, H.L., Lo, C.J., McLarty, J., 1996. Motif identi®cation neural design for rapid and sensitive protein family
search. Comput. Appl. Biosci. 12, 109±118.
Xu, Y., Mural, R., Shah, M., Uberbacher, E., 1994. Recognizing exons in genomic sequence using GRAIL II. Genet. Eng. 16, 241±253.
Young, S.S., Sheeld, C.F., Farmen, M., 1997. Optimum utilization of a compound collection or chemical library for drug discovery.
J. Chem Inf. Comput Sci. 37, 892±899.
Zamyatnin, A.A., 1972. Protein volume in solution. Prog. Biophys. Mol. Biol. 24, 107±123.
Zupan, J., Gasteiger, J., 1993. Neural Networks for Chemists. VCH, Weinheim.

Arti®cial Neural Networks For Computer-Based Molecular Design

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Arti®cial Neural Networks For Computer-Based Molecular Design

Uploaded by

Copyright:

Available Formats

Progress in

Arti®cial neural networks for computer-based molecular

0079-6107/98/$19.00 # 1998 Elsevier Science Ltd. All rights reserved.

Intelligent algorithms for computer-aided molecular design (CAMD) have become an

. `learn' from examples and acquire their own `knowledge' (induction);

2. Principles of neural network development

classi®cation. An introduction to UNN development is provided in chapter 2.2. Hybrid systems

2.1. Building blocks of neural network architecture

2.2. Unsupervised training of neural networks

Fig. 2c shows the architecture of a two-layered, fully-connected feed-forward net. This

or the Manhattan distance:

Fig. 4. Examples of topology-preservation in Kohonen-networks. (a) training of a 10 10 Kohonen-network with

2.3. Supervised training of neural networks

A systematic investigation of ideal ranges of the parameter r in QSAR studies was

2.4. Testing the generalization ability of neural network models

2.5. Special network architectures and hybrid models

Multivariate display methods are of considerable importance to data visualization in

Counter-propagation networks (Hecht-Nielsen, 1987; Carpenter and Grossberg, 1991) have

3. Molecular descriptors and data pre-processing

An important concept of three-dimensional molecular similarity was coined by CarboÂ and

. overall (`bulk') molecular properties (e.g. lipophilicity, shape, overall charge),

Ala 0.11 0.32 2.09 0.54 0.51

1996). Recently an application of auto-correlation vectors as structural descriptors for

Fig. 9. Venn diagram of amino acid residues according to Taylor (1986).

4.1. Feature extraction and classi®cation of biopolymers

4.1.1. Identi®cation of promotor sites and polymerase binding regions

4.1.2. Splice site identi®cation

`GeneParser', a combination of an ANN with dynamic programming, was developed for

4.1.3. Ribosomal binding sites

4.1.4. Prediction and classi®cation of protein structure

4.1.5. Prediction of membrane-spanning sequences

transmembrane regions and their topology. A comparison of prediction accuracy for

1. Generation of an initial set of network structures.

4.1.6. Identi®cation of protein targeting sequences

4.1.7. Peptide±protein binding

4.1.8. Identi®cation of glycosylation sites

4.2. Neural networks as ®tness functions in drug design

special two-dimensional topological representations of superimposed molecular compounds

5. Conclusions and current challenges

Hybrid network architectures consisting of a pre-processing layer, an unsupervised layer for

You might also like