
1.5
Human or Artificial Intelligence?

What makes the brain of a small child able to do tasks that conventional
supercomputers can't do? An infant is unlikely to rival a supercomputer at adding
reams of numbers, yet infants can recognize their parents despite haircuts or
concealment of the upper face in a game of peekaboo. It is quite difficult for a
computer to perform such a feat. And despite over thirty years of intensive research in
artificial intelligence, computers fall short of the expectations of intelligence we have
of them. Indeed, many problems, especially in speech, robotics, pattern recognition,
vision, and combinatorial optimization, are not well suited to the sequential approach
of conventional computational methods.

How can computers be made more like us, intelligent and rapid in sorting patterns
and solving problems? According to cognitive theories, two primary strengths of the
relatively slow human brain are massive interconnection and parallel-processing
architecture. Neural networks are an alternative computational approach based on
such theories of the human brain and intelligence. The artificial neural network (ANN)
approach can be
differentiated from the expert system approach of artificial intelligence in that the
latter typically involves programming a system to apply a hierarchy of explicit rules.
In contrast, ANN operates below a cognitive level of symbolic processing. One might
view expert systems as organizing behavior by description, whereas neural networks
attempt to imitate the behavior. One might make a case that, instead of using a set of
rules, human experts often apply intuition or deeper insight into the problem that they
have learned to solve. Another analogy is that an ANN is somewhat like a monkey
trained to make the right decision based on empirical reasoning without explicit
verbalization of the problem. Of course the monkey has its own mind or native neural
networks, capable of solving tasks more complex than any task we might train the
monkey to do.

1.6
Advantages of Artificial Neural Networks

The computational paradigm of neural networks has several advantages for solving
problems within environmental engineering and the geosciences.

1.6.1
Adaptability

Under supervised learning, the neural network learns to solve problems by example
rather than by following a set of heuristics or theoretical mechanisms. For many real-
world problems, precise solutions do not exist. In such cases, acquiring knowledge by
example may be the only solution. In other words, if it is not possible to describe the
logic behind a problem or to predict behavior with analytical or numerical solutions
to governing equations, traditional predictive analysis will be difficult. ANN analysis,
however, does not rely on a prescribed relationship but rather seeks its own, and
thus may have an advantage over traditional predictive analysis.

It may be helpful to think of an ANN as a nonparametric, nonlinear regression
technique. In traditional regression modeling, one must decide a priori
on a model to which the data will be fitted. The ANN approach is not as restrictive,
because the data will be fitted with the best combination of nonlinear or linear
functions as necessary, without the researcher rigidly preselecting the form of these
functions.

Under this umbrella of adaptability, we will consider a network's ability to develop its
own feature representation. Neural networks can organize data into the vital aspects
or features that enable one pattern to be distinguished from another. This quality of a
neural network stems from its adaptability in learning by example and leads to its
ability to generalize, as discussed below.
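
As a rough illustration of this nonparametric flavor, the short Python sketch below fits the same made-up data twice: once with a preselected straight line, and once with a small network that is free to form its own combination of nonlinear functions. The data, layer sizes, learning rate, and iteration count are assumptions chosen only for illustration, not code from the example problems of later chapters.

# Sketch only: a preselected linear fit versus a small adaptive neural fit
# on made-up data; purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 60).reshape(-1, 1)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(x.shape)   # hypothetical data

# (a) Traditional regression: the model form (a line) is fixed a priori.
A = np.hstack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
linear_error = np.mean((A @ coef - y) ** 2)

# (b) ANN view: a small network adapts its own combination of nonlinear functions.
W1 = 0.5 * rng.standard_normal((1, 8)); b1 = np.zeros(8)
W2 = 0.5 * rng.standard_normal((8, 1)); b2 = np.zeros(1)
for _ in range(5000):                        # plain gradient descent
    h = np.tanh(x @ W1 + b1)                 # hidden-layer features, learned from data
    yhat = h @ W2 + b2
    err = yhat - y
    gW2 = h.T @ err / len(x); gb2 = err.mean(0)
    gh = err @ W2.T * (1 - h ** 2)
    gW1 = x.T @ gh / len(x); gb1 = gh.mean(0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= 0.1 * g
ann_error = np.mean((yhat - y) ** 2)
print(linear_error, ann_error)   # typically the adaptive fit tracks the curvature better

The point is only that the network is never told the functional form of the curve; it adapts its weights to whatever relation the examples contain.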

1.6.2
Generalization

Generalization may be thought of as the ability to abstract, or to respond
appropriately to input patterns different from those involved in the training of the
network. In learning to solve a problem, the network must encode the generalization
of the vital components of the problem into the weights of the network nodes. One
might think of the weights on which the mature network has converged after
"learning" the training examples as a distillation of the problem to these major
components. Not only is a well-trained neural network skilled in generalizing the
information held in these weights to new patterns it may be asked to judge, but the
weights form a storehouse of knowledge that may be explored or mined for insights
about the problem as a whole.

1.6.3
Nonlinearity

A particular strength of the ANN approach is its ability to identify nonlinearities of
phenomena. Nonlinearities are common in the geosciences and environmental
engineering,
either from equations in which there are multiple unknowns that depend on one
another or from phenomena that are multidimensional or geometric rather than linear
in their behavior. The transfer, or activation, function of ANNs that transfers the sum
of the weighted inputs into the outputs is most often a nonlinear function. For
example, the sigmoid function, a nonlinear function with linear portions, is often
used. If the problem is linear, the network will operate in the linear region of this
function. However, if the problem is nonlinear, this function has the depth to handle
the nonlinearity, and the network adapts to the appropriate amount of nonlinearity.
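
A minimal sketch of this behavior, with made-up input values, is given below: near zero the sigmoid is approximately linear (with slope 1/4), while for large weighted sums it saturates and supplies the nonlinearity.

# Sketch: the sigmoid transfer function and its near-linear central region.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

v = np.array([-6.0, -0.5, -0.1, 0.0, 0.1, 0.5, 6.0])
print(sigmoid(v))          # saturates near 0 and 1 for large |v|
# Around v = 0 the curve is approximately linear: sigmoid(v) ~ 0.5 + v/4,
# so a network whose weighted sums stay small behaves much like a linear model.
print(0.5 + v / 4.0)       # compare with the exact values near v = 0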

1.6.4
Openness to Noisy, Fuzzy, or Soft Data

When designed well, ANNs can be relatively tolerant to noisy, incomplete (i.e.,
fuzzy), or even spurious data. This tolerance of patchy data is attractive in geologic
and environmental problems because we often attempt to interpret real-world
phenomena based on such data. It is not uncommon to have secondary data that may
or may not be correlated with the primary data. For example, secondary seismic data
may complement a primary database of vertical borehole descriptions. Sometimes
such secondary data are referred to as soft data if their correlation with the primary
data is not certain. This potential to integrate multiple information sources, even if
noisy, fuzzy, or soft, can make a network more robust and flexible.

1.6.5
Modularity

Once trained neural networks have synthesized understanding into a compact system
of weights, they become portable and easy to insert into other systems. This
modularity has many advantages. They can be easily retrained with new data and
inserted to upgrade existing systems. For example, in the earthquake forecasting
example problem (chapter 10), one can imagine adding new earthquakes to the neural
network's training base. Modularity also creates the potential for computational
shortcuts. In the groundwater remediation example problem (chapter 4), the neural
network is used to search through possible pumping strategies and eliminates
sequential calls to the time-intensive predictive model of flow and transport. Finally,
trained neural networks are fast systems that can be economically implemented on a
number of real-time computing platforms.

These qualities make the ANN technology particularly attractive to many
fundamental and applied problems in the geosciences and environmental engineering
that are complex, nonlinear, poorly understood, and difficult to model.

Figure: Architecture of a neural network for seismic event discrimination

Neural Networks:
Methods and Algorithms

Motivation

Neural networks are elegant and powerful tools for attacking difficult problems. In
the rest of this book, we discuss neural networks from the practical viewpoint of
using them to solve hard problems in environmental and geological applications.

For which problems and in what form are neural networks ideally developed? How
are such networks integrated with other systems? These are difficult but important
questions that can be addressed only with time and experience. We have had
extensive experience in the application of neural networks to many real problems, and
our aim is to share this experience with others. Toward that end, in this and the next
chapter, we provide general descriptions and properties of specific neural network
methods and algorithms.

The plan of this chapter is as follows. We first discuss the five steps that are important
in the successful use of neural networks and the generalized structure of ANNs. We then
discuss four important neural network methods: the backpropagation network, the
self-organizing Kohonen network, the Hopfield network, and the radial basis function
network. We have used these networks in many exploratory research problems.
Characterized by simplicity and generality, the networks and their variations are the
workhorses in neural computation and offer unique insights into neural systems.

2.1
Five Steps of Neural Network Design

The underlying theoretical principle in solving problems using neural networks is
based on the theory of learning (Valiant, 1984). Machine learning and the associated
problems of generalization cover a wide range of issues that are beyond the scope of
this book. However, in seeking practical solutions to problems in machine learning, it
is helpful to restructure the problem into the following five steps.

Step 1:
Definition

In this step the problem is defined precisely; that is, mathematically. Is the problem
one of detection, estimation, discrimination, classification, optimization, prediction,
interpolation, extrapolation, clustering, or some combination of these problems?
Neural networks are being used to solve a variety of these types of problems. For
example, neural networks are used to solve problems in discrimination, where the
task is to classify the input as one of two classes (chapter 5); in classification, where
the input data are categorized by computer automation as belonging to one of many
different classes (chapter 6); and in interpolation, where multidimensional input data
are interpolated on a multidimensional grid (chapter 8). A combination of estimation
and optimization occurs in chapter 4, where a search for the minimal cost solution is
found with the aid of a neural network that estimates the input patterns' performance
on the constraints and objective of the optimization.

Because of the versatility of ANNs, one may overlook the need to define the problem
precisely. However, it is important to define the problem before selecting an
appropriate network. For example, when the researcher must decide which network
(backpropagation, Kohonen, Hopfield, or radial basis function) is most suited for a
particular application, a precise mathematical definition of the problem often leads to
a good choice of the network. However, as many of the example problems discussed
in this text illustrate, the same network can be used to solve many different types of
problems. Furthermore, not only must we define the problem precisely, we must also
construct the network output in an appropriate form, as discussed below.

The real problem is not whether machines think, but whether men do.
B. F. Skinner
Contingencies of Reinforcement

Step 2:
Physics

In the second step, the underlying interactions and processes of the problems are
clarified so that a plausible or quasi-understandable relationship exists between
problem inputs and problem outputs. In many learning problems an error function is
often minimized. The goal in this step is to define the network output such that the
error minimization during learning is meaningful; that is, to determine from the
example inputs and outputs (the training set) provided to the network what function the
network would represent if it were to minimize the output error during learning.

One might argue that the physics of the underlying training set is important enough
that this should really be the first step in solving the problem. In practice, however,
the order of the first two steps has little significance because one iterates back and
forth between the first two steps and the next step, the representation of the problem
to the network.

For example, suppose we are interested in forecasting river flooding at a certain
location. Ideally, we would select as inputs from our database the dominant factors
affecting the physics of flooding at this location, perhaps including such factors as
rainfall, upstream water levels, and runoff volumes at selected locations. After
training our neural network with different combinations of these inputs, we may
begin to see how we might improve on our initial selection of inputs. Thus the ANN
performance will cast light on the physics of the problem, even as the physics of the
problem influences the initial ANN formulation.

Step 3:
Representation

In this third step, an appropriate representation of the parameters that can be used or
measured is determined. Are appropriate representations of input features being used?
This is an extremely important part of the successful use of an ANN. For example, to
avoid sensitivity to time offsets, one may choose the spectrum of a waveform, rather
than the waveform itself, as the input. Chapter
3 discusses various methods of representation of inputs to the network.

Step 4:
Development

In the fourth step of the design process, the neural network, or the nonlinear function,
is designed and developed. Perhaps contrary to a common viewpoint, we find that the
building of a powerful network requires a significant degree of experience and skill.
Although the tools of neural networks simplify the work of the researcher, developing
a network with good generalization properties demands skills that are not explicitly
or widely discussed in the neural network literature. However, with just a handful of
network-developing techniques discussed in this book, and with a little practice,
powerful neural networks can be developed quite rapidly.

Step 5:
Performance

In this final step, an objective measure of network performance is sought. For
example, what is the probability of false alarm or false positive? What is the
sensitivity of the network? How does one know that the network will operate reliably
in the field even when the input data look quite different from the training data?
Chapter 3 sheds some light on these very important issues. We also discuss in section
3.5.3 the method of leave-one-out, an exceedingly simple and useful method of
performance evaluation of the neural network when the number of examples is
limited, a fairly common situation in research problems.
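
As a preview of the leave-one-out idea, the sketch below holds out one example at a time, trains on the remainder, and scores the held-out prediction. The "train" and "predict" routines here are stand-ins for whatever learner is being evaluated (a trivial nearest-neighbor rule is used only to keep the sketch self-contained), and the data are made up; section 3.5.3 gives the method in full.

# Sketch of leave-one-out evaluation for a small example set.
import numpy as np

def train(X, y):
    return (X, y)                                  # memorize the training examples

def predict(model, x):
    Xt, yt = model
    return yt[np.argmin(np.sum((Xt - x) ** 2, axis=1))]   # nearest-neighbor stand-in

X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]])  # made-up features
y = np.array([0, 0, 1, 1])                                      # made-up classes

correct = 0
for k in range(len(X)):                            # hold out example k, train on the rest
    keep = np.arange(len(X)) != k
    model = train(X[keep], y[keep])
    correct += int(predict(model, X[k]) == y[k])
print(correct / len(X))                            # leave-one-out estimate of accuracy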

In summary, because discussions of neural networks often portray the mechanics of
training the network as straightforward, it is easy to overlook and even ignore the
different steps of the neural network design process. In our experience, however,
neural networks are best constructed when careful attention is paid to all five steps.
By following these design steps systematically, the researcher is more likely to
succeed in the design of a useful network.

Figure 2.1 Neural networks can be represented as nonlinear systems.

2.2
Neural Networks as Systems

We consider a number of types of neural networks in this text. All of them can be
viewed as nonlinear functions or systems, which we denote by f(). A system, defined
mathematically, is a transformation that uniquely maps an input pattern into an output
pattern. As shown in figure 2.1, when the input to the system is denoted by the vector
x and the output denoted by the vector y, the input-output relation can be written as y
= f(x, w), where w denotes the weights of the network. The weights and the structure
of interconnected nodes in the system define the input-output mapping performed by
the network.

From this viewpoint, a neural network is simply a nonlinear function. The training set
is the set of all input vectors or patterns {x1, x2, . . ., xN} and the corresponding set of
desired outputs or target patterns {y1, y2, . . ., yN} that are used to derive the network
weights w. The training process is illustrated in figure 2.2.

Note that an input pattern x_k can be a vector of input features, a time series, a 2-D
signal, or an image. Similarly, the output y_k is a vector of arbitrary dimensions
representing the output features. The training input set of L examples, each with N
features, can be represented by the input L × N matrix

$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & \vdots & & \vdots \\ x_{L1} & x_{L2} & \cdots & x_{LN} \end{bmatrix}$$

where the rows of X represent the examples. Similarly, the training output matrix can
be represented by

$$\mathbf{Y} = \begin{bmatrix} \mathbf{y}_1^T \\ \mathbf{y}_2^T \\ \vdots \\ \mathbf{y}_L^T \end{bmatrix}$$
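
The following short Python sketch is one way to make the system view concrete: a fixed set of weights w defines the mapping y = f(x, w), and stacking the example inputs row by row gives the training matrix. The layer sizes and weight values are arbitrary placeholders, not a prescription.

# Sketch of the system view y = f(x, w): one hidden layer with a sigmoid transfer
# function. Layer sizes and weight values are made-up placeholders.
import numpy as np

def f(x, w):
    W1, W2 = w                               # the weights define the mapping
    h = 1.0 / (1.0 + np.exp(-(W1 @ x)))      # hidden-layer outputs
    return W2 @ h                            # output vector y

rng = np.random.default_rng(1)
N, M, H, L = 3, 2, 4, 5                      # features, outputs, hidden units, examples
w = (rng.standard_normal((H, N)), rng.standard_normal((M, H)))

X = rng.standard_normal((L, N))              # training input matrix, one example per row
Y = np.vstack([f(xk, w) for xk in X])        # corresponding matrix of network outputs
print(X.shape, Y.shape)                      # (5, 3) (5, 2)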

2.3
Backpropagation Networks

2.3.1
Supervised Learning

The backpropagation network is a classic example of a supervised learning network.
It is convenient first to consider a general example, summarizing the basic approach
for solving a supervised learning problem with a backpropagation network.

Step 1

Select an architecture suitable for the problem. An example network architecture is
shown in figure 2.3a. The circles in figure 2.3a represent the neurons, and the lines
represent the connections between the neurons. The input layer is a set of registers
that contains the input vector elements. Each neuron in a network computes the
weighted sum of its inputs from the outputs of the previous layer and produces an
output that is a nonlinear transfer function of the weighted sum (see figure 2.3b). We
discuss these operations with specific details when we consider the learning
algorithms later in this chapter.
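
The operation of a single neuron in figure 2.3b can be sketched in a few lines; the input values, weights, and bias below are made up, and the sigmoid is only one possible transfer function.

# Sketch of a single neuron (figure 2.3b): weighted sum of the previous layer's
# outputs, then a nonlinear transfer function. Values are made up.
import numpy as np

inputs = np.array([0.2, -0.7, 0.5])     # outputs of the previous layer
weights = np.array([0.8, 0.1, -0.4])    # interconnecting weights into this neuron
bias = 0.05

v = np.dot(weights, inputs) + bias      # weighted sum
output = 1.0 / (1.0 + np.exp(-v))       # sigmoid transfer function
print(v, output)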

The number of units in the input and output layers is usually determined by the
structure and the representation of the problem. However, the number of hidden layers
and the number of units in a hidden layer are selected by the designer after
considering a number of constraints, such as the number and distribution of examples
used to train the network. We discuss the important concepts of generalization and
memorization in chapter 3. In network learning, generalization is increased and
memorization reduced by limiting the number of weights or interconnects; as a general
rule, the total number of weights in the network should be less than the number of
elements in the training set.

Figure 2.3a
Architecture of neural networks consists of input layer, hidden layer, and output layer.
Each layer consists of neurons or units connected to other neurons by interconnecting weights.

Figure 2.3b
Functional representation of single neuron.

Step 2

Construct the input-output vectors to form a large database of training data, the set of
training patterns or examples. For a set of p training examples the training set is
represented by

$$\{(\mathbf{x}_k, \mathbf{y}_k)\}, \qquad k = 1, \ldots, p$$

and the corresponding training error might be computed by

$$E = \sum_{k=1}^{p} (\mathbf{y}_k - \hat{\mathbf{y}}_k)^T (\mathbf{y}_k - \hat{\mathbf{y}}_k)$$

where $\mathbf{x}_k$ represents the input vector, $\mathbf{y}_k$ represents the target output (or the desired
output), $\hat{\mathbf{y}}_k$ is the network output, and T is the vector transpose operator. An important requirement
for backpropagation networks is the need for a large number of elements in the training
set. It is important that the training database spans the variations in the input-output space
if the network is to have good generalization properties. Indeed, an important concern in
the design of neural networks is determining what constitutes an adequate training set.
We discuss this subject further in chapter 3.

Step 3

Apply a learning algorithm, such that the network learns to associate the inputs with
the corresponding outputs, for all or most of the examples in the training set.
Backpropagation is really a learning algorithm, and the network is a multilayered
feed-forward architecture. The backpropagation algorithm usually uses a gradient
descent method to systematically modify the weights in the network so as to
minimize the network output error. Once the network can solve a problem with a very
large set of examples, the assumption is that the network can generalize from these
examples to previously unseen data, or test data.
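
A minimal sketch of this gradient descent procedure for a one-hidden-layer network is given below. The data, layer sizes, learning rate, and number of passes are assumptions made for illustration, and practical refinements (momentum terms, stopping criteria, and so on) are omitted.

# Minimal sketch of backpropagation: gradient descent on the output error of a
# one-hidden-layer network. Sizes, learning rate, and data are placeholders.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))                       # 20 training examples, 3 features
Y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # made-up target outputs

W1 = 0.5 * rng.standard_normal((3, 5)); b1 = np.zeros(5)
W2 = 0.5 * rng.standard_normal((5, 1)); b2 = np.zeros(1)
sig = lambda v: 1.0 / (1.0 + np.exp(-v))
eta = 0.5                                              # learning rate (assumed)

for epoch in range(2000):
    H = sig(X @ W1 + b1)                               # forward pass through hidden layer
    Yhat = sig(H @ W2 + b2)                            # network outputs
    E = Yhat - Y                                       # output error
    # backward pass: propagate the error to get gradients of the squared error
    dout = E * Yhat * (1 - Yhat)
    dhid = (dout @ W2.T) * H * (1 - H)
    W2 -= eta * H.T @ dout / len(X); b2 -= eta * dout.mean(axis=0)
    W1 -= eta * X.T @ dhid / len(X); b1 -= eta * dhid.mean(axis=0)

print(np.mean((Yhat - Y) ** 2))                        # training error after learning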

Another problem that must be avoided during the training of a network is referred to
as "overtraining." Overtraining occurs when we assume that the examples used are
noise free and we force the network, by continued exercise of error minimization, to
make the output error very small. Noisy data, however, often contain contradictory
examples, in which there is a high similarity among two or more inputs but a high
dissimilarity among the
corresponding outputs. (Note that, from the definition of a system, the network cannot
make the error arbitrarily low when the outputs are different for two or more identical
inputs.) With contradictory examples in the training set, when the network learning
error is made very small, the weights are forced to assume large positive and negative
values. The network uses a large dynamic range for weights (or more "bits") to
reproduce the input-output mapping of the training examples. In numerical
optimization, this is referred to as being "ill conditioned" because of the large
dynamic range of the singular values. This is akin to using too many weights, again at
the price of reducing the network's generalization capability. Ideally, the
generalization property of the network should be derived from the signal features and
not from the noise features. Overtraining of the network can be avoided by using a
large, well-distributed set of examples and by comparing the performance of the
training and testing events. When the test set results are as good as the training set
results, the network is assumed to have reached an optimum training state.
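
One simple way to put this comparison of training and testing performance into practice is sketched below: the test-set error is monitored as learning proceeds, and the weights that generalize best are the ones retained. The network, data, and constants are placeholders.

# Sketch: guarding against overtraining by comparing training- and test-set error
# as learning proceeds. Network size, data, and learning rate are placeholders.
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (40, 1))
y = np.sin(2 * X) + 0.2 * rng.standard_normal(X.shape)      # noisy made-up data
Xtr, ytr, Xte, yte = X[:30], y[:30], X[30:], y[30:]          # train / test split

W1 = rng.standard_normal((1, 20)); W2 = rng.standard_normal((20, 1))
mse = lambda A, B: float(np.mean((A - B) ** 2))
best_test, best_weights = np.inf, None

for epoch in range(3000):
    H = np.tanh(Xtr @ W1); err = H @ W2 - ytr
    gW2 = H.T @ err / len(Xtr)
    gW1 = Xtr.T @ ((err @ W2.T) * (1 - H ** 2)) / len(Xtr)
    W2 -= 0.05 * gW2; W1 -= 0.05 * gW1
    test_err = mse(np.tanh(Xte @ W1) @ W2, yte)
    if test_err < best_test:                                 # keep the weights that
        best_test = test_err                                 # generalize best so far
        best_weights = (W1.copy(), W2.copy())

print(best_test, mse(np.tanh(Xtr @ best_weights[0]) @ best_weights[1], ytr))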

Step 4

Use the trained network to analyze the unknown input patterns derived from new
observations or experiments. This is the feed-forward propagation step, one that the
network performs rapidly.

2.4
Kohonen Network and Feature Mapping

2.4.1
Unsupervised Learning:
Self-Organization

In many problems it is useful to have a system that forms its own classification of the
data from the training examples. In these types of problems, unsupervised learning
networks (e.g., the Kohonen network) are applicable. Unlike a supervised learning
network, where the training examples must be explicitly specified at the output, an
unsupervised Kohonen network clusters the data into groups that have common
features. Hence the Kohonen network is often called a "feature-mapping network."

Although the Kohonen network can be viewed as a clustering algorithm, the
clustering it forms is very special. The network not only groups the input patterns into
various clusters, it organizes the clusters in a one- or multidimensional space
according to similarity of the cluster centroids themselves (see figures 2.6a and 2.6b).
The motivation of the method is drawn from the knowledge that the brain utilizes
spatial mappings: different sensory and cognitive tasks are localized in different
regions of the brain, and these regions have a definite topological order. See box for
the Kohonen self-organizing neural network (SONN) algorithm.
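
The sketch below conveys the flavor of the 1-D self-organization rule: for each input pattern, the winning unit and its neighbors along the chain are pulled toward the pattern, while the learning rate and neighborhood width shrink over time. The schedules and sizes used here are assumptions, not the boxed algorithm itself.

# Sketch of a 1-D Kohonen self-organizing network: the winning unit and its
# neighbors move toward each input pattern. Rates and sizes are placeholders.
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(0, 1, (500, 2))                 # made-up 2-D input patterns
n_units = 10
W = rng.uniform(0, 1, (n_units, 2))                # one weight vector per unit, in a 1-D chain

for t, x in enumerate(data):
    alpha = 0.5 * (1 - t / len(data))              # learning rate shrinks over time
    radius = max(1, int(3 * (1 - t / len(data))))  # neighborhood shrinks too
    winner = np.argmin(np.sum((W - x) ** 2, axis=1))
    for j in range(n_units):
        if abs(j - winner) <= radius:              # update the winner and its chain neighbors
            W[j] += alpha * (x - W[j])

print(W)   # neighboring units end up holding similar (topologically ordered) centroids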

Figure 2.6a
Kohonen 1-D self-organizing neural network.

Figure 2.6b
Kohonen 2-D self-organizing neural network

2.4.2
Two-Dimensional Feature Maps

One of the appealing concepts of the Kohonen feature maps is that the topology of these
maps is not restricted to one-dimensional topological order. The multidimensional cluster
mapping offered by the Kohonen self-organization method can be particularly useful in
"data mining" applications, where the researcher is attempting to discover the important
features of a complex data set. The two-dimensional Kohonen map is particularly
interesting for many real problems, but has been less widely used in applications
compared to the 1-D algorithm. To encourage the use of 2-D Kohonen maps, we include
the 2-D algorithm.

2.4.3
K-Means Clustering

To compare the self-organization technique of the Kohonen algorithm with a
conventional pattern recognition method, we include the k-means algorithm (figure 2.6c).
The k-means clustering is a simple algorithm and could also be useful in understanding
the properties of the training set X. For example, by

Figure 2.6c
K-means clustering strategy consists of determining cluster
centers shown by filled circles. Open circles are locations
of examples in 2-D feature space.
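
For comparison, a bare-bones k-means sketch is given below: examples are assigned to the nearest center, and each center is then moved to the mean of its assigned examples, as suggested by figure 2.6c. The two-cluster data are made up.

# Sketch of k-means clustering: alternate between assigning examples to the nearest
# cluster center and moving each center to the mean of its examples. Data are made up.
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])  # two blobs
k = 2
centers = X[[0, 50]].copy()       # initial centers (one from each region, for simplicity)

for _ in range(20):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = np.argmin(d2, axis=1)                         # nearest-center assignment
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers)        # the filled-circle cluster centers of figure 2.6c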

2.4.4
Classification by Self-Organization

Although the SONN can be viewed as one way of clustering the data into cluster groups,
an important use of SONNs is the classification of new or unknown data. In the
SONNs described above, there is of course an asso-

2.5
Hopfield Networks

The Hopfield network is named after its inventor, John Hopfield of the California
Institute of Technology. Besides proving a number of elegant energy properties of
symmetric networks ($w_{ij} = w_{ji}$), Hopfield and others developed the network to solve
problems of associative memory, the traveling salesman problem, and many others.
When Hopfield network weights are derived using the Hebbian rule (see the Hopfield
network algorithm), the network can be rapidly applied to pattern recognition and

Hopfield neural network architecture.
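
A small sketch of the Hebbian construction and of recall is given below; the stored patterns are made-up ±1 vectors, and the update schedule is simplified for brevity.

# Sketch: Hopfield weights from the Hebbian rule, then recall of a stored +/-1
# pattern from a corrupted copy. Patterns and sizes are made-up placeholders.
import numpy as np

patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])
n = patterns.shape[1]

W = np.zeros((n, n))
for p in patterns:                       # Hebbian rule: w_ij accumulates p_i * p_j
    W += np.outer(p, p)
np.fill_diagonal(W, 0)                   # no self-connections; W stays symmetric

state = patterns[0].copy()
state[0] = -state[0]                     # corrupt one element
for _ in range(5):                       # asynchronous-style updates until stable
    for i in range(n):
        state[i] = 1 if W[i] @ state >= 0 else -1

print(state, np.array_equal(state, patterns[0]))   # recalls the stored pattern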

Figure 2.8
Radial basis function neural network architecture.

2.6
Radial Basis Function Network

We discuss radial basis function (RBF) networks from the viewpoint of interpolating
a 2-D space, as shown in figure 2.8. The generalization of the algorithm to higher
dimensions is straightforward. The basic step is to define a function of the form

$$f(\mathbf{x}) = \sum_{j=1}^{L} w_j\, \phi(\|\mathbf{x} - \mathbf{x}_j\|)$$

Using the training set, the radial basis method solves the set of equations

$$y_i = \sum_{j=1}^{L} w_j\, \phi_{ij}, \qquad i = 1, \ldots, L$$

RADIAL BASIS FUNCTION (RBF) ALGORITHM


1. Select a set of example patterns {x1, x2, . . ., xn} that will determine the number of
units in the network. The selection can be made, for instance, by first clustering the data
into N cluster groups and using the cluster centroids as exemplars.
2. Compute the matrix $\Phi$ according to equation (2.32b); $r_{ij}$ is the radial distance
between examples i and j.
3. Solve for the weights $(w_1, \ldots, w_L)$ using the set of equations defined in equation
(2.31a).

In matrix notation, equation (2.30) can be written as

$$\boldsymbol{\Phi}\, \mathbf{w} = \mathbf{y}$$

Note that the basis function defined in equation (2.32) is a radially symmetric
distance function, hence the name "radial basis function."

As might be suspected, other basis functions can also be used. The radial basis
function defined in equation (2.32) is particularly attractive because the matrix $\Phi$ can
be shown to be positive definite (Powell, 1987) and therefore invertible. In summary,
the weights of the radial basis network are obtained by inverting a set of linear
equations, once the width of the basis functions, defined by the parameters $\sigma_i$, has been
estimated. Note, however, that the network is still nonlinear. Because the dimensions
of the matrix $\Phi$ are proportional to the number of training examples, the matrix can
become very large in many problems. Hence a clustering algorithm is usually
employed first, and examples closest to the cluster centers are then used in evaluating
equation (2.31). See box for the details of the RBF algorithm.

Radial basis functions are the method used to solve the example problems in chapter
8.
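
The boxed procedure can be sketched compactly in Python. A Gaussian basis with a single fixed width is assumed here purely for illustration, and the sample points and values are made up.

# Sketch of the RBF procedure: form the matrix of radial basis values between
# example pairs and solve a linear system for the weights. A Gaussian basis with
# a fixed width sigma is an assumption made for this sketch.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (25, 2))                  # example patterns (e.g., cluster centroids)
y = np.sin(3 * X[:, 0]) + X[:, 1]               # made-up values to interpolate
sigma = 0.3                                     # assumed basis-function width

def phi(r2):
    return np.exp(-r2 / (2 * sigma ** 2))       # radially symmetric basis function

# Phi[i, j] is the basis value at the radial distance between examples i and j
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
Phi = phi(D2)
w = np.linalg.solve(Phi, y)                     # invert the set of linear equations

# Interpolate at a new point using f(x) = sum_j w_j * phi(||x - x_j||)
xnew = np.array([0.4, 0.6])
fnew = w @ phi(((X - xnew) ** 2).sum(axis=1))
print(fnew, np.sin(3 * 0.4) + 0.6)              # compare with the underlying surface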

2.6.1
Data Adaptive Radial Basis Functions

In our discussion of the radial basis function, we note that the precise location of the
radial centers and the width of the basis functions, as defined by $\sigma_i$, are parameters
that need to be chosen by the designer of the network. In
the following we discuss how this 2-D interpolation problem was handled in the RBF
algorithm that we developed.

Consider a 2-D interpolation problem, where the N points in the 2-D plane are
randomly sampled: $(x_k, y_k)$, where k = 1, . . ., N. Our strategy is to minimize the error at
these points by placing the centers of the radial basis functions on a uniform grid and
determining the widths optimally using a gradient descent error minimization rule. The
values at the radial centers are first determined by some nearest neighbor interpolation
algorithm, for example. Consider a radial center (x, y) and the value estimated at
$(x_k, y_k)$, equation (2.33). By substituting the form of the radial basis functions into
(2.33) and defining the error between the estimated and sampled values, it can be shown
that the derivative of this error with respect to the widths has a closed form, equation
(2.37). Equation (2.37) then forms the basis of a gradient descent update rule to obtain the
optimal values for the widths of the radial basis functions, $\sigma_i$.
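
The following sketch conveys the idea of this subsection without reproducing equations (2.33) through (2.37): the radial centers sit on a uniform grid, their amplitudes come from nearest-neighbor values, and the widths are then adjusted by gradient descent on the squared error at the sampled points. The Gaussian basis, step size, and all other constants are assumptions.

# Sketch of data-adaptive RBF widths: centers on a uniform grid, amplitudes fixed by
# nearest-neighbor values, widths tuned by gradient descent on the squared error.
# Gaussian bases and all constants are assumptions, not the book's equations.
import numpy as np

rng = np.random.default_rng(0)
P = rng.uniform(0, 1, (80, 2))                       # randomly sampled (x_k, y_k) points
z = np.sin(4 * P[:, 0]) * np.cos(4 * P[:, 1])        # made-up values at those points

gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
C = np.column_stack([gx.ravel(), gy.ravel()])        # radial centers on a uniform grid
d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # squared distances, (80, 25)

w = z[np.argmin(d2, axis=0)]                         # amplitude of each center taken from
                                                     # its nearest sampled point
sigma = np.full(len(C), 0.3)                         # initial widths (assumed)

for _ in range(200):                                 # gradient descent on the widths
    phi = np.exp(-d2 / (2 * sigma ** 2))             # basis values, (80, 25)
    err = phi @ w - z                                # error at the sampled points
    dphi_dsigma = phi * d2 / sigma ** 3              # derivative of each basis w.r.t. its width
    grad = 2 * (err[:, None] * dphi_dsigma * w).sum(axis=0)
    sigma = np.clip(sigma - 1e-3 * grad, 0.05, 1.0)

print(np.mean(err ** 2))                             # squared error after adapting the widths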

2.7
Building Complex Networks

Sometimes a problem might be complex enough that it is tempting to use all the units
on a single network and hope that the network is able to solve the problem. Our
experience suggests that this is not a good strategy. Keeping the network small and
simple is an important rule in developing neural networks. The complexity that a
problem demands is best met by breaking the problem into subproblems, solving each
subproblem separately (perhaps using neural networks) and then interconnecting the
components.

2.8
History

In this section we present a brief overview of the developmental foundations of
neural networks. For readers interested in more detailed treatment of the
history of neural networks, we recommend Simpson (1990). One interesting

Summary

Neural networks are best designed with attention to problem definition, underlying
physics, input representation, network architecture parameters, and performance
issues.

Training and test sets are critical in successful design and use of neural networks.

The four important types of neural networks are the backpropagation, the Kohonen,
the Hopfield, and the radial basis function.

Neural networks have had an interesting and lively history. Researchers from various
disciplines have contributed to the field. Developments in the last decade have been
rapid and significant.
