

The Multi-Layer Perceptron


Figure 8-4 shows the structure of a typical neural network model. It has an input layer, where data enters the network, and a second layer, known as the hidden layer, made up of artificial neurons, each of which receives multiple inputs from the input layer. The artificial neurons summarize their inputs and pass the results to the output layer, where they are combined again. Networks with this architecture are called multi-layer perceptrons (MLPs).

[Figure shows data flowing from the input layer through the hidden layer to the output layer.]

Figure 8-4: A multi-layer perceptron with a single hidden layer.

The input layer is a reminder that the inputs to the network must have similar ranges, which is usually achieved by standardizing all the inputs. If the ranges are not similar, then one of the inputs (probably the one with the largest values) will initially dominate, and the neural network will have to spend many training cycles "learning" that its weights should be small.

TIP: Unless the input data is already well behaved, with values close to zero, standardizing the training data for neural networks is a good idea.

The hidden layer contains the non-linear activation functions, of which the hyperbolic tangent is often preferred because it spans both positive and negative values.

The transfer function for the output layer depends on the target for the neural network. For a continuous target, a linear combination is used. For a binary target, a logistic function is used, so the network behaves like a logistic regression producing a probability estimate. In other cases, more exotic transfer functions might be chosen.
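To make this arithmetic concrete, here is a minimal sketch in Python (using numpy) of one forward pass through such a network. The function name, weights, and layer sizes are illustrative, not taken from any particular implementation:

    import numpy as np

    def forward(x, W_hidden, b_hidden, w_out, b_out, target_type="continuous"):
        # Hidden layer: weighted sums passed through the hyperbolic tangent.
        hidden = np.tanh(W_hidden @ x + b_hidden)
        # Output layer: a weighted combination of the hidden-layer outputs.
        combination = w_out @ hidden + b_out
        if target_type == "continuous":
            return combination                        # linear transfer function
        return 1.0 / (1.0 + np.exp(-combination))     # logistic, for a binary target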

A Network Example
The neural network shown in Figure 8-5 represents a model for estimating real
estate values. The topology, or structure, of this network is typical of networks
used for prediction and classification. The units are organized into three layers.
The layer on the left is connected to the inputs and is called the input layer. In this example, the input layer standardizes the values.

[Figure annotates an input, a weight, and the output from a unit.]

Figure 8-5: The real estate training example shown here provides the input into a neural network and illustrates that a network is filled with seemingly meaningless weights.

The hidden layer is connected neither to the inputs nor to the output of the network. Each unit in the hidden layer is typically fully connected to all the units in the input layer. Because this network contains standard units, the units in the hidden layer calculate their output by multiplying the value of each input by its corresponding weight, adding these up, and applying the transfer function. A neural network can have any number of hidden layers, but in general, one hidden layer is sufficient. The wider the layer (that is, the more units it contains), the greater the capacity of the network to recognize patterns. This greater capacity has a drawback, though, because the neural network can memorize "patterns of one" in the training examples. You want the network to generalize on the training set, not to memorize it. To achieve this, the hidden layer should not be too wide.

TIP: The risk of overfitting increases with the number of hidden-layer nodes. A small number of hidden nodes with non-linear transfer functions is sufficient to create very flexible models.

Notice that the units in Figure 8-5 each have an additional input coming
down from the top. This is the constant input, sometimes called a bias, and
is always set to 1. Like other inputs, it has a weight and is included in the
combination function. The bias is like the intercept in a regression equation;
it acts as a global offset that helps the network better capture patterns. The
training phase adjusts the weights on constant inputs just as it does on the
other weights in the network.
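As a sketch of what one standard unit computes (the function name and weights here are hypothetical), the bias is simply one more weighted input that is always 1:

    import numpy as np

    def unit_output(inputs, weights, bias_weight):
        # Multiply each input by its weight, add the results, and include
        # the bias: a constant input of 1 with its own learned weight.
        combination = np.dot(weights, inputs) + bias_weight * 1.0
        return np.tanh(combination)   # apply the transfer function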
The last unit on the right is the output layer because it is connected to the output of the neural network. It is also fully connected to all the units in the hidden layer. Most of the time, neural networks are used to calculate a single value, so there is only one unit in the output layer. In this example, the network produces a value of $176,228, which is quite close to the actual value of $171,000. The output layer uses a simple linear transfer function, so the output is a weighted linear combination of the hidden-layer outputs.

Network Topologies
It is possible for the output layer to have more than one unit. For instance, a department store chain wants to predict the likelihood that customers will be purchasing products from various departments, such as women's apparel, furniture, and entertainment. The stores want to use this information to plan promotions and direct target mailings.

To make this prediction, they might set up the neural network shown in Figure 8-6. This network has three outputs, one for each department. The outputs are a propensity for the customer described in the inputs to make his or her next purchase from the associated department.

[Figure shows inputs such as last purchase, age, gender, average balance, and so on feeding three outputs: the propensity to purchase women's apparel, furniture, and entertainment.]

Figure 8-6: This network has more than one output and is used to estimate the probability that customers will make a purchase in each of three departments.

After feeding the inputs for a customer into the network, the network calculates three values. Given all these outputs, how can the department store determine the right promotion or promotions to offer the customer? Some common methods used when working with multiple model outputs:

■ Take the department corresponding to the output with the maximum value.

■ Take departments corresponding to the outputs with the top three values.

■ Take all departments corresponding to the outputs that exceed some threshold value.

■ Take all departments corresponding to units that are some percentage of the unit with the maximum value.

All of these possibilities can work well, and each has its strengths and weaknesses in different situations. There is no one right answer that always works. In practice, you want to try several of these possibilities on the test set to determine which works best in a particular situation; a sketch of these selection rules in code follows.
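The sketch below is illustrative; the department names, output values, and parameter settings are made up:

    def pick_departments(scores, method="max", threshold=0.5, pct=0.8):
        # scores maps each department to its network output.
        best = max(scores.values())
        if method == "max":
            return [d for d, s in scores.items() if s == best]
        if method == "top3":
            return sorted(scores, key=scores.get, reverse=True)[:3]
        if method == "threshold":
            return [d for d, s in scores.items() if s >= threshold]
        if method == "pct_of_max":
            return [d for d, s in scores.items() if s >= pct * best]

    outputs = {"womens_apparel": 0.72, "furniture": 0.31, "entertainment": 0.64}
    print(pick_departments(outputs, method="pct_of_max"))
    # ['womens_apparel', 'entertainment']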
In the department store example, a customer could reasonably have a high propensity to purchase in all three departments. In other classification problems, such as recognizing letters of the alphabet, there is only one right answer. In these cases, a class is only assigned when one output has a probability above an acceptance threshold and all other outputs have probabilities below a rejection threshold. When there are two or more "winners," no classification can be made.
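A sketch of this acceptance/rejection logic, with hypothetical threshold values:

    def classify(outputs, accept=0.7, reject=0.3):
        # Assign a class only when exactly one output clears the acceptance
        # threshold and every other output falls below the rejection threshold.
        winners = [c for c, p in outputs.items() if p >= accept]
        others_low = all(p < reject for c, p in outputs.items() if c not in winners)
        return winners[0] if len(winners) == 1 and others_low else None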
Figure 8-7 illustrates several variations on the basic neural network architecture.

Figure 8-7: There are many variations on the basic neural network architecture.

The top network features multiple outputs. The middle network includes connections directly from the inputs to the output layer, as well as to the hidden layer. These connections are called direct connections. The bottom network contains more than one hidden layer. All of these networks are examples of feed-forward neural networks. The name arises from the fact that the data starts at the inputs and feeds forward to the output without any loops.
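As an illustration of the middle variation, direct connections add a linear term from the inputs straight to the output. This sketch assumes the same tanh hidden units as before; the names are illustrative:

    import numpy as np

    def forward_direct(x, W_hidden, b_hidden, w_hidden_out, w_direct, b_out):
        hidden = np.tanh(W_hidden @ x + b_hidden)
        # Direct connections let the inputs reach the output layer
        # without passing through the hidden layer.
        return w_hidden_out @ hidden + w_direct @ x + b_out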
A Sample Application: Real Estate Appraisal
Neural networks have the ability to learn by example, which is analogous to the way that human experts learn from experience. The following example applies neural networks to solve a problem familiar to most readers: real estate appraisal.
The appraiser or real estate agent is a good example of a human expert in a well-defined domain. Houses are described by a fixed set of standard features taken into account by the expert and turned into an appraised value. In 1992, researchers at IBM recognized this as a good problem for neural networks. This application, which was described by Joseph Bigus in his book, Data Mining with Neural Networks, is the earliest truly commercial application of neural networks that the authors are aware of.
A neural network takes specific inputs (in this case, the information from the housing sheet) and turns them into a specific output: an appraised value for the house. The list of inputs is well defined because of two factors: extensive use of the multiple listing service (MLS) to share information about the housing market among different real estate agents, and standardization of housing descriptions for mortgages sold on secondary markets. The desired output is well defined as well: a specific dollar amount. In addition, a wealth of experience exists in the form of previous sales for teaching the network how to value a house.
Why would you want to automate such appraisals? Clearly, automated appraisals could help real estate agents better match prospective buyers to prospective homes, improving the productivity of even inexperienced agents. Another use would be to set up Web pages where prospective buyers could describe the homes that they wanted and get immediate feedback on how much their dream homes would cost.
Perhaps an unexpected application is in the secondary mortgage market. Good, consistent appraisals are critical to assessing the risk of individual loans and loan portfolios, because one major factor affecting default is the proportion of the value of the property at risk. If the loan value is more than 100 percent of the market value, the risk of default goes up considerably. After the loan has been made, how can the market value be calculated? For this purpose, Freddie Mac, the Federal Home Loan Mortgage Corporation, developed a product called Loan Prospector that does these appraisals automatically for homes throughout the United States. Loan Prospector was originally based on neural network technology developed by a San Diego company, HNC, which has since been merged into Fair Isaac.

Back to the example. This neural network mimics an appraiser who estimates the market value of a house based on features of the property. She knows that houses in one part of town are worth more than those in other areas. Additional bedrooms, a larger garage, the style of the house, and the size of the lot are other factors that figure into her mental calculation. She does not apply some set formula, but balances her experience and knowledge of the sales prices of similar homes. Her knowledge about housing prices is not static. She is aware of recent sale prices for homes throughout the region and can recognize trends in prices over time, fine-tuning her calculation to fit the latest data.
TIP: Neural networks are good for prediction and estimation problems. A good problem has the following four characteristics:

■ The inputs are well understood. You have a good idea of which features of the data are important, but not necessarily how to combine them.

■ The output is well understood. You know what you are trying to model.

■ Experience is available. You have plenty of examples where both the inputs and the output are known. These known cases are used to train the network.

■ A black box model is acceptable. Interpreting the model or explaining how particular scores were arrived at is not necessary.

The first step in setting up a neural network to calculate estimated housing values is determining a set of features that affect the sales price. Some possible common features are shown in Table 8-1. In practice, these features work for homes in a single geographical area. To extend the appraisal example to handle homes in many neighborhoods, the input data might include ZIP code information, neighborhood demographics, and other neighborhood quality-of-life indicators, such as ratings of schools and proximity to transportation. To simplify the example, these additional features are not included here.

Table 8-1: Common Features Describing a House

FEATURE             DESCRIPTION                                   RANGE OF VALUES
Num_Apartments      Number of dwelling units                      Integer: 1-3
Year_Built          Year built                                    Integer: 1850-1986
Plumbing_Fixtures   Number of plumbing fixtures                   Integer: 5-17
Heating_Type        Heating system type                           Coded as A or B
Basement_Garage     Basement garage (number of cars)              Integer: 0-2
Attached_Garage     Attached frame garage area (in square feet)   Integer: 0-228
Living_Area         Total living area (square feet)               Integer: 714-4185
Deck_Area           Deck / open porch area (square feet)          Integer: 0-738
Porch_Area          Enclosed porch area (square feet)             Integer: 0-452
Recroom_Area        Recreation room area (square feet)            Integer: 0-672
Basement_Area       Finished basement area (square feet)          Integer: 0-810

Training the network builds a model that can then be used to estimate the target value for unknown examples. Training presents known examples (data from previous sales) to the network so that it can learn how to calculate the sales price. The training examples need two additional features: the sales price of the home and the sales date. The sales price is needed as the target variable. The date is used to separate the examples into training, validation, and test sets. Table 8-2 shows an example from the training set. In addition, the date is used to calculate the number of months in the past when the sale was made, so the network can learn about changes over time.
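A sketch of these two uses of the sales date; the reference date and the cutoff dates are hypothetical:

    from datetime import date

    def months_ago(sale_date, as_of=date(1992, 1, 1)):
        # Whole months between the sale and the reference date, so the
        # network can learn about price changes over time.
        return (as_of.year - sale_date.year) * 12 + (as_of.month - sale_date.month)

    def assign_set(sale_date, train_cutoff, validation_cutoff):
        # Older sales train the network, more recent ones validate it,
        # and the most recent sales form the test set.
        if sale_date < train_cutoff:
            return "train"
        if sale_date < validation_cutoff:
            return "validation"
        return "test"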

Table 8-2: Sample Record from Training Set with Values Scaled to Range -1 to 1

FEATURE             RANGE OF VALUES    ORIGINAL VALUE    SCALED VALUE
Months_Ago          0-23               4                 -0.6522
Num_Apartments      1-3                1                 -1.0000
Year_Built          1850-1986          1923              +0.0730
Plumbing_Fixtures   5-17                                 -0.3077
Heating_Type        Coded as A or B    B                 +1.0000
Basement_Garage     0-2                0                 -1.0000
Attached_Garage     0-228              120               +0.0524
Living_Area         714-4185           1,614             -0.4813
Deck_Area           0-738              0                 -1.0000
Porch_Area          0-452              210               -0.0705
Recroom_Area        0-672              0                 -1.0000
Basement_Area       0-810              175               -0.5672
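The scaled values in Table 8-2 are consistent with mapping each feature's range linearly onto the interval from -1 to +1. A sketch of that mapping:

    def scale(value, lo, hi):
        # Map a raw value in [lo, hi] linearly onto [-1, +1].
        return 2.0 * (value - lo) / (hi - lo) - 1.0

    print(round(scale(4, 0, 23), 4))   # Months_Ago: -0.6522, matching the table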

The process of training the network is actually the process of adjusting weights inside it to arrive at the best combination of weights for making the desired predictions. The network starts with a random set of weights, so it initially performs very poorly. However, by reprocessing the training set over and over and adjusting the internal weights each time to reduce the overall error, the network gradually does a better and better job of approximating the target values in the training set. When the approximations no longer improve, the network stops training. The training process is explained in the following section of this chapter.
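In outline, that loop looks something like the following sketch; the network object and its methods are hypothetical stand-ins for whatever implementation is used:

    def train(network, training_set, max_generations=1000, tolerance=1e-6):
        previous_error = float("inf")
        for generation in range(max_generations):
            error = 0.0
            for inputs, target in training_set:
                output = network.forward(inputs)          # hypothetical method
                error += (target - output) ** 2
                network.adjust_weights(inputs, target)    # hypothetical method
            if previous_error - error < tolerance:
                break   # the approximations no longer improve
            previous_error = error
        return network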
This process of adjusting weights is sensitive to the representation of the data going in. For instance, consider a field in the data that measures lot size. If lot size is measured in acres, then the values might reasonably go from about 1/8 to 1 acre. If measured in square feet, the same values would be 5,445 square feet to 43,560 square feet. Neural networks work best when their inputs are smallish numbers. For instance, when an input variable takes on very large values relative to other inputs, then this variable dominates the calculation of the target. The neural network wastes valuable iterations by reducing the weights on this input to lessen its effect on the output. That is, the first "pattern" that the network will find is that the lot size variable has much larger values than other variables. Because this is not particularly interesting, using the lot size as measured in acres rather than square feet would be better. In general, it is a good idea to standardize all numeric inputs to a neural network, although inputs that take on a small range of values (say from zero to one) do not really need to be standardized.
The only categorical variable in this data, Heating_Type, only takes on two values, A and B, so the solution is to create an indicator variable, which takes on the value of 0 (when the value is "A") and 1 (when the value is "B"). This can also be standardized, so one value is negative and the other positive. Chapters 18 and 19 discuss such transformations of categorical variables in detail.
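A sketch of both transformations; the helper names are hypothetical:

    import numpy as np

    def standardize(values):
        # Subtract the mean and divide by the standard deviation,
        # using statistics computed on the training data.
        values = np.asarray(values, dtype=float)
        return (values - values.mean()) / values.std()

    def encode_heating_type(heating_type):
        # Two-valued categorical field as a standardized indicator:
        # "A" becomes -1 and "B" becomes +1.
        return -1.0 if heating_type == "A" else +1.0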
With these simple techniques, it is possible to transform all the fields for the sample house record shown earlier (see Table 8-2) into values more suitable for training a neural network. Training is the process of iterating through the training set to adjust the weights in the neural network. Each iteration is sometimes called a generation.
After the network has been trained, the performance of each generation must be measured on the validation set to select the generation of weights that minimizes error on data not used for training. As with other modeling approaches, neural networks can learn patterns that exist only in the training set, resulting in overfitting. To find the best network for unseen data, the training process remembers each set of weights calculated during each generation. The final network comes from the generation that works best on the validation set, rather than the one that works best on the training set.
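A sketch of this selection process, again with a hypothetical network interface:

    import copy

    def train_with_validation(network, train_set, validation_set, generations=200):
        best_weights, best_error = None, float("inf")
        for _ in range(generations):
            network.train_one_generation(train_set)    # hypothetical method
            error = network.error_on(validation_set)   # hypothetical method
            if error < best_error:
                # Remember the weights from the generation that works
                # best on data not used for training.
                best_error = error
                best_weights = copy.deepcopy(network.weights)
        network.weights = best_weights
        return network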
When the model's performance on the validation set is satisfactory, the neural network model is ready for use. It has learned from the training examples and figured out how to calculate the sales price from all the inputs. The model takes descriptive information about a house, suitably mapped, and produces an output, which is the neural network's estimate of the home's value.

Training Neural Networks


Training a neural network means using the training data to adjust the network weights so that the model does a good job of estimating target values for records that are not part of the training data. In some ways, this is similar to finding the coefficients for the best-fit line in a regression model. However, a single best-fit line exists for a particular set of training observations, and there is a simple, deterministic method for calculating its coefficients; there is no equivalent method for calculating the best set of weights for a neural network. This is an example of an optimization problem. The goal is to find a set of weights that minimizes an error function, such as the average squared error.

Historically, the first successful training method for neural networks was back propagation. In addition to its historical importance, it also happens to be one of the easier methods to understand.
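For a set of known targets and the corresponding network predictions, the average squared error is simply:

    def average_squared_error(targets, predictions):
        # The quantity the training process tries to minimize.
        return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)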

How Does a Neural Network Learn Using Back Propagation?
At the heart of back propagation are the following three steps:

1. The network gets a training example and, using the existing weights in the network, it calculates the output or outputs.

2. Back propagation then calculates the error by taking the difference between the calculated result and the actual target value.

3. The error is fed back through the network and the weights are adjusted to minimize the error (hence the name back propagation, because the errors are sent back through the network).

A sketch of these three steps in code appears below.
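This sketch assumes a network with one tanh hidden layer and a linear output, trained to reduce squared error; the function and variable names are illustrative:

    import numpy as np

    def back_propagation_step(x, target, W_h, b_h, w_o, b_o, learning_rate=0.1):
        # Step 1: forward pass using the existing weights.
        hidden = np.tanh(W_h @ x + b_h)
        output = w_o @ hidden + b_o

        # Step 2: the error is the difference between the calculated
        # result and the actual target value.
        error = output - target

        # Step 3: send the error back through the network and adjust the
        # weights to reduce it (one gradient-descent step on squared error).
        delta_hidden = error * w_o * (1.0 - hidden ** 2)   # tanh'(z) = 1 - tanh(z)^2
        w_o = w_o - learning_rate * error * hidden
        b_o = b_o - learning_rate * error
        W_h = W_h - learning_rate * np.outer(delta_hidden, x)
        b_h = b_h - learning_rate * delta_hidden
        return W_h, b_h, w_o, b_o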
