Machine Learning
Machine learning involves adaptive mechanisms that enable computers to
learn from experience, learn by example and learn by analogy. Learning
capabilities can improve the performance of an intelligent system over
time. The most popular approaches to machine learning are artificial
neural networks and genetic algorithms. This lecture is dedicated to
neural networks.
A neural network can be defined as a model of reasoning based on the
human brain. The brain consists of a densely interconnected set of nerve
cells, or basic information-processing units, called neurons.
The human brain incorporates nearly 10 billion neurons and 60 trillion
connections, synapses, between them. By using multiple neurons
simultaneously, the brain can perform its functions much faster than the
fastest computers in existence today.
An artificial neural network consists of a number of very simple
processors, also called neurons, which are analogous to the biological
neurons in the brain.
The neurons are connected by weighted links passing signals from one
neuron to another.
Network Structure
The output signal is transmitted through the neuron's outgoing
connection. The outgoing connection splits into a number of branches
that transmit the same signal. The outgoing branches terminate at the
incoming connections of other neurons in the network.
Analogy between biological and artificial neural networks

Biological Neural Network    Artificial Neural Network
Soma                         Neuron
Dendrite                     Input
Axon                         Output
Synapse                      Weight

[Figure: a biological neural network (soma, dendrites, axon, synapses) alongside an artificial neural network (input layer, middle layer, output layer, with input signals and output signals)]
Biological neural network
[Figure: connected biological neurons with soma, dendrites, axon, and synapses labeled]
Neural Networks NN 1 7
NNs: goal and design
Knowledge about the learning task is given in the form of a set of
examples (dataset) called training examples.
A NN is specified by:
- an architecture: a set of neurons and links connecting neurons. Each
  link has a weight,
- a neuron model: the information processing unit of the NN,
- a learning algorithm: used for training the NN by modifying the
  weights in order to solve the particular learning task correctly on
  the training examples.
The aim is to obtain a NN that generalizes well, that is, that behaves
correctly on new examples of the learning task.
Dimensions of a Neural Network
network architectures
types of neurons
learning algorithms
applications
Network architectures
Three different classes of network architectures:
- single-layer feed-forward
- multi-layer feed-forward
- recurrent
In the two feed-forward classes, neurons are organized in acyclic layers.
The architecture of a neural network is linked with the learning
algorithm used to train it.
Single Layer Feedforward
[Figure: an input layer of source nodes connected to an output layer of neurons]
Multi layer feedforward
[Figure: a 3-4-2 network with an input layer, a hidden layer, and an output layer]
Recurrent network
Recurrent network with hidden neuron: the unit delay operator $z^{-1}$ is
used to model a dynamic system.
[Figure: recurrent network with input, hidden, and output neurons and three $z^{-1}$ unit delay elements]
Input Signal and Weights
Input signals
- An input may be either a raw or preprocessed signal or image.
  Alternatively, some specific features can also be used.
- If specific features are used as input, their number and selection
  are crucial and application dependent.
Weights
- Weights connect an input to a summing node and affect the summing
  operation.
- The quality of a network can be seen from its weights.
- Bias is a constant input with a certain weight.
- Usually the weights are randomized in the beginning.
The Neuron
The neuron is the basic information processing unit of a NN. It
consists of:
1. A set of links, describing the neuron inputs, with weights
   $w_1, w_2, \ldots, w_m$.
2. An adder function (linear combiner) for computing the weighted sum
   of the inputs (real numbers):
   $$u = \sum_{j=1}^{m} w_j x_j$$
3. An activation function (squashing function) $\varphi$ for limiting
   the amplitude of the neuron output:
   $$y = \varphi(u + b)$$
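The three parts of the neuron listed above can be sketched in a few lines of Python. This is only an illustration: the function names `neuron_output` and `logistic`, the example weights, and the choice of the logistic function as the squashing function are assumptions, not something the slides prescribe.

```python
import math

def neuron_output(inputs, weights, bias, activation):
    """Return y = activation(u + b), where u is the weighted sum of the inputs."""
    u = sum(w * x for w, x in zip(weights, inputs))  # adder (linear combiner)
    return activation(u + bias)                      # squashing step

def logistic(v):
    """One possible squashing function; maps any real v into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical example values: u = 0.5, b = -0.5, so the activation sees 0.
y = neuron_output([1.0, 0.0], [0.5, -0.5], -0.5, logistic)
print(y)  # 0.5
```

Passing the activation as an argument keeps the linear combiner separate from the choice of neuron model, which mirrors the split between items 2 and 3 in the list.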
The neuron as a simple computing element
[Figure: diagram of a neuron with input signals x1, x2, ..., xn, weights w1, w2, ..., wn, and the output signal Y sent along several outgoing branches]
The neuron computes the weighted sum of the input signals and compares
the result with a threshold value, $\theta$. If the net input is less
than the threshold, the neuron output is -1. But if the net input is
greater than or equal to the threshold, the neuron becomes activated and
its output attains a value +1.
The neuron uses the following transfer or activation function:
$$X = \sum_{i=1}^{n} x_i w_i$$
$$Y = \begin{cases} +1, & \text{if } X \ge \theta \\ -1, & \text{if } X < \theta \end{cases}$$
This type of activation function is called a sign function.
Architecture of a typical artificial neural network
[Figure: input signals enter an input layer, followed by a middle layer and an output layer that produces the output signals]
Bias of a Neuron
The bias b has the effect of applying an affine transformation to the
weighted sum u:
$$v = u + b$$
v is called the induced field of the neuron.
[Figure: for $u = x_1 - x_2$, the parallel lines $x_1 - x_2 = 0$, $x_1 - x_2 = 1$, and $x_1 - x_2 = -1$ in the $(x_1, x_2)$ plane]
Effect of Bias
[Figure: two panels, "Without bias function" and "With bias function"]
Bias as extra input
The bias is an external parameter of the neuron. It can be modeled by
adding an extra input $x_0 = +1$ with weight $w_0 = b$:
$$v = \sum_{j=0}^{m} w_j x_j, \qquad w_0 = b$$
[Figure: input signals $x_0 = +1, x_1, x_2, \ldots, x_m$ with synaptic weights $w_0, w_1, w_2, \ldots, w_m$ feed a summing function; the local field v passes through the activation function $\varphi(\cdot)$ to give the output y]
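A short check that the two formulations agree may help here. The sketch below computes the induced field once as the weighted sum plus b and once with the bias folded in as an extra input fixed at +1. The function names and the sample numbers are illustrative assumptions.

```python
def induced_field(inputs, weights, bias):
    """v = u + b, with u the weighted sum over inputs 1..m."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def induced_field_extra_input(inputs, weights, bias):
    """Same v, with the bias treated as weight w0 on a constant input x0 = +1."""
    augmented_inputs = [1.0] + list(inputs)    # x0 = +1
    augmented_weights = [bias] + list(weights)  # w0 = b
    return sum(w * x for w, x in zip(augmented_weights, augmented_inputs))

# Hypothetical sample values chosen so the arithmetic is exact in floating point.
x = [2.0, 3.0]
w = [0.5, -1.0]
b = 0.25
print(induced_field(x, w, b), induced_field_extra_input(x, w, b))  # -1.75 -1.75
```

Treating the bias as one more weight means a learning algorithm can adjust it with the same update rule it uses for the other weights.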
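Activation Function
There are different activation functions used in different applications.
The most common ones are:
Hard limiter:
$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}$$
Piecewise linear:
$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge \tfrac{1}{2} \\ v & \text{if } \tfrac{1}{2} > v > -\tfrac{1}{2} \\ 0 & \text{if } v \le -\tfrac{1}{2} \end{cases}$$
Sigmoid:
$$\varphi(v) = \frac{1}{1 + \exp(-a v)}$$
Hyperbolic tangent:
$$\varphi(v) = \tanh(v)$$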
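Neuron Models
The choice of $\varphi$ determines the neuron model. Examples:
step function:
$$\varphi(v) = \begin{cases} a & \text{if } v < c \\ b & \text{if } v > c \end{cases}$$
ramp function:
$$\varphi(v) = \begin{cases} a & \text{if } v < c \\ b & \text{if } v > d \\ a + \dfrac{(v - c)(b - a)}{d - c} & \text{otherwise} \end{cases}$$
sigmoid function with z, x, y parameters:
$$\varphi(v) = z + \frac{1}{1 + \exp(-x v + y)}$$
Gaussian function:
$$\varphi(v) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{v - \mu}{\sigma}\right)^{2}\right)$$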
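The parameterized neuron models above translate directly into code. The following is a minimal sketch, assuming the parameter roles a, b, c, d, z, x, y, mu, and sigma as reconstructed from the formulas. The slide leaves the step function undefined exactly at v = c, so returning b there is an assumption of this sketch.

```python
import math

def step_model(v, a, b, c):
    """Step: a below the threshold c, b above it (b at v == c is an assumption)."""
    return a if v < c else b

def ramp_model(v, a, b, c, d):
    """Ramp: a for v < c, b for v > d, linear interpolation in between."""
    if v < c:
        return a
    if v > d:
        return b
    return a + (v - c) * (b - a) / (d - c)

def sigmoid_model(v, z, x, y):
    """Sigmoid with offset z, slope x, and shift y."""
    return z + 1.0 / (1.0 + math.exp(-x * v + y))

def gaussian_model(v, mu, sigma):
    """Gaussian bump centred at mu with width sigma."""
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

print(ramp_model(0.5, a=0.0, b=1.0, c=0.0, d=1.0))  # 0.5
print(sigmoid_model(0.0, z=0.0, x=1.0, y=0.0))      # 0.5
```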
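Activation functions
[Figure: plots of Y against X for the step, sign, sigmoid, and linear functions]
Step function:
$$Y^{step} = \begin{cases} 1, & \text{if } X \ge 0 \\ 0, & \text{if } X < 0 \end{cases}$$
Sign function:
$$Y^{sign} = \begin{cases} +1, & \text{if } X \ge 0 \\ -1, & \text{if } X < 0 \end{cases}$$
Sigmoid function:
$$Y^{sigmoid} = \frac{1}{1 + e^{-X}}$$
Linear function:
$$Y^{linear} = X$$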
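The four activation functions on this slide can be written as plain Python functions. This is a sketch for experimentation; the function names are chosen here for illustration only.

```python
import math

def step(x):
    """Step function: 1 for x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0

def sign(x):
    """Sign function: +1 for x >= 0, otherwise -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid function: smooth transition from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    """Linear function: the output equals the net input."""
    return x

for f in (step, sign, sigmoid, linear):
    print(f.__name__, [f(x) for x in (-1.0, 0.0, 1.0)])
```

Step and sign are hard limiters that differ only in the value reported for an inactive neuron (0 versus -1), while sigmoid and linear produce graded outputs.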
Example
[Figure: a two-input neuron; inputs x1 and x2 with weights w1 and w2 feed a linear combiner followed by a hard limiter with a threshold, producing output Y]
A neuron uses a step function as its activation function, with
threshold $\theta = 0.2$, W1 = 0.1 and W2 = 0.4. What is the output for
the following values of x1 and x2?

x1   x2   Y
0    0
0    1
1    0
1    1
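One way to check a hand calculation for this example is a few lines of Python. The sketch assumes the threshold reads 0.2 (the value is damaged in the slide) and that the neuron outputs step(x1*w1 + x2*w2 - threshold), the form used in the perceptron training algorithm. Both are assumptions rather than details the slide states clearly.

```python
def step(x):
    """Step function: 1 for x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0

W1, W2 = 0.1, 0.4   # weights given in the example
THETA = 0.2         # assumed reading of the threshold

def neuron(x1, x2):
    """Linear combiner followed by a hard limiter."""
    return step(x1 * W1 + x2 * W2 - THETA)

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [neuron(x1, x2) for x1, x2 in rows]
for (x1, x2), y in zip(rows, outputs):
    print(x1, x2, y)
```

Under these assumptions the neuron fires whenever x2 = 1, because w2 alone exceeds the threshold while w1 alone does not.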
Learning Algorithms
Depend on the network architecture:
Error correcting learning (perceptron)
Delta rule (AdaLine, Backprop)
Competitive Learning (Self Organizing Maps)
The Perceptron
The operation of Rosenblatt's perceptron is based on the McCulloch and
Pitts neuron model. The model consists of a linear combiner followed by
a hard limiter.
The weighted sum of the inputs is applied to the hard limiter, which
produces an output equal to +1 if its input is positive and -1 if it is
negative.
How does the perceptron learn its classification tasks?
This is done by making small adjustments in the weights to reduce the
difference between the actual and desired outputs of the perceptron. The
initial weights are randomly assigned, usually in the range [-0.5, 0.5],
and then updated to obtain the output consistent with the training
examples.
If at iteration p, the actual output is $Y(p)$ and the desired output is
$Y_d(p)$, then the error is given by:
$$e(p) = Y_d(p) - Y(p), \qquad p = 1, 2, 3, \ldots$$
Iteration p here refers to the p-th training example presented to the
perceptron.
If the error, $e(p)$, is positive, we need to increase perceptron output
$Y(p)$, but if it is negative, we need to decrease $Y(p)$.
The perceptron learning rule
$$w_i(p + 1) = w_i(p) + \alpha \cdot x_i(p) \cdot e(p), \qquad p = 1, 2, 3, \ldots$$
where $\alpha$ is the learning rate, a positive constant less than unity.
The perceptron learning rule was first proposed by Rosenblatt in 1960.
Using this rule we can derive the perceptron training algorithm for
classification tasks.
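A single application of the learning rule can be sketched as below. The function name `update_weights` is hypothetical, and `fractions.Fraction` is used only so that values such as 0.3 and 0.1 stay exact instead of picking up floating-point rounding. The sample numbers are one row of the AND example that appears later in this lecture.

```python
from fractions import Fraction

def update_weights(weights, inputs, desired, actual, rate):
    """Apply w_i(p+1) = w_i(p) + rate * x_i(p) * e(p) to every weight."""
    error = desired - actual  # e(p) = Yd(p) - Y(p)
    return [w + rate * x * error for w, x in zip(weights, inputs)]

# Inputs (1, 0), desired output 0, actual output 1, learning rate 0.1.
old = [Fraction(3, 10), Fraction(-1, 10)]
new = update_weights(old, [1, 0], desired=0, actual=1, rate=Fraction(1, 10))
print([float(w) for w in new])  # [0.2, -0.1]
```

Only the weight on the active input changes: x2 is 0, so its correction term vanishes, and the negative error lowers w1 to reduce the output.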
Perceptron's training algorithm
Step 1: Initialisation
Set initial weights $w_1, w_2, \ldots, w_n$ and threshold $\theta$ to
random numbers in the range [-0.5, 0.5].
Perceptron's training algorithm (continued)
Step 2: Activation
Activate the perceptron by applying inputs
$x_1(p), x_2(p), \ldots, x_n(p)$ and desired output $Y_d(p)$. Calculate
the actual output at iteration p = 1
$$Y(p) = step\left[\sum_{i=1}^{n} x_i(p)\, w_i(p) - \theta\right]$$
where n is the number of the perceptron inputs, and step is a step
activation function.
$$e(p) = Y_d(p) - Y(p)$$
If the error, $e(p)$, is positive, we need to increase perceptron output
$Y(p)$, but if it is negative, we need to decrease $Y(p)$.
Perceptron's training algorithm (continued)
Step 3: Weight training
Update the weights of the perceptron
$$w_i(p + 1) = w_i(p) + \Delta w_i(p)$$
where $\Delta w_i(p)$ is the weight correction at iteration p.
The weight correction is computed by the delta rule:
$$\Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)$$
where
$$e(p) = Y_d(p) - Y(p)$$
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process
until convergence.
Example of perceptron learning: the logical operation AND

        Inputs   Desired   Initial weights   Actual   Error   Final weights
Epoch   x1  x2   Yd        w1      w2        Y        e       w1      w2
1       0   0    0         0.3    -0.1       0         0      0.3    -0.1
        0   1    0         0.3    -0.1       0         0      0.3    -0.1
        1   0    0         0.3    -0.1       1        -1      0.2    -0.1
        1   1    1         0.2    -0.1       0         1      0.3     0.0
2       0   0    0         0.3     0.0       0         0      0.3     0.0
        0   1    0         0.3     0.0       0         0      0.3     0.0
        1   0    0         0.3     0.0       1        -1      0.2     0.0
        1   1    1         0.2     0.0       1         0      0.2     0.0
3       0   0    0         0.2     0.0       0         0      0.2     0.0
        0   1    0         0.2     0.0       0         0      0.2     0.0
        1   0    0         0.2     0.0       1        -1      0.1     0.0
        1   1    1         0.1     0.0       0         1      0.2     0.1
4       0   0    0         0.2     0.1       0         0      0.2     0.1
        0   1    0         0.2     0.1       0         0      0.2     0.1
        1   0    0         0.2     0.1       1        -1      0.1     0.1
        1   1    1         0.1     0.1       1         0      0.1     0.1
5       0   0    0         0.1     0.1       0         0      0.1     0.1
        0   1    0         0.1     0.1       0         0      0.1     0.1
        1   0    0         0.1     0.1       0         0      0.1     0.1
        1   1    1         0.1     0.1       1         0      0.1     0.1

Threshold: $\theta$ = 0.2; learning rate: $\alpha$ = 0.1
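The AND training run can be replayed with a short script. This is a sketch under stated assumptions: the threshold is held fixed at 0.2 during training, the initial weights are 0.3 and -0.1, and the helper name `train_perceptron` is hypothetical. Exact fractions are used because several net inputs in this example land exactly on the threshold, where floating-point rounding could flip the step output.

```python
from fractions import Fraction

def step(x):
    """Step function: 1 for x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0

def train_perceptron(examples, weights, theta, rate, max_epochs=100):
    """Repeat Steps 2-4 until one full epoch produces no errors."""
    weights = list(weights)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for inputs, desired in examples:
            net = sum(x * w for x, w in zip(inputs, weights)) - theta
            error = desired - step(net)               # e(p) = Yd(p) - Y(p)
            if error != 0:
                mistakes += 1
                weights = [w + rate * x * error       # delta rule
                           for w, x in zip(weights, inputs)]
        if mistakes == 0:
            return weights, epoch                     # converged
    return weights, None                              # no convergence within max_epochs

AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, epoch = train_perceptron(
    AND_EXAMPLES,
    weights=[Fraction(3, 10), Fraction(-1, 10)],
    theta=Fraction(2, 10),
    rate=Fraction(1, 10),
)
print([float(w) for w in weights], epoch)  # [0.1, 0.1] 5
```

The first error-free pass is epoch 5, with both weights at 0.1. The `max_epochs` cap is a safety limit added for this sketch, since the loop would otherwise never end on a task the perceptron cannot learn.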
Homework Exercise
By hand train a perceptron to learn an 'or'
function (write down the training sequence)
Train one to learn inference
Train one to learn the XOR function
The aim of the perceptron is to classify inputs,
$x_1, x_2, \ldots, x_n$, into one of two classes, say $A_1$ and $A_2$.
In the case of an elementary perceptron, the n-dimensional space is
divided by a hyperplane into two decision regions. The hyperplane is
defined by the linearly separable function:
$$\sum_{i=1}^{n} x_i w_i - \theta = 0$$
Linear separability in the perceptrons
[Figure: (a) Two-input perceptron: the line $x_1 w_1 + x_2 w_2 - \theta = 0$ separates class $A_1$ (region 1) from class $A_2$ (region 2) in the $(x_1, x_2)$ plane. (b) Three-input perceptron: the plane $x_1 w_1 + x_2 w_2 + x_3 w_3 - \theta = 0$ separates regions 1 and 2 in $(x_1, x_2, x_3)$ space.]
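The decision regions can be illustrated by testing which side of the hyperplane a point falls on. In this sketch the function name `decision_region`, the sample weights, and the convention that the non-negative side is class A1 are all assumptions made for illustration.

```python
def decision_region(inputs, weights, theta):
    """Return 'A1' if the point lies on or above the hyperplane, else 'A2'."""
    net = sum(x * w for x, w in zip(inputs, weights)) - theta
    return "A1" if net >= 0 else "A2"

# Hypothetical two-input perceptron: the boundary is the line x1 + x2 - 1.5 = 0.
weights = [1.0, 1.0]
theta = 1.5
for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(point, decision_region(point, weights, theta))
```

With these values only the point (1, 1) lies on the A1 side, so this line separates the AND function. A single line of this kind cannot put (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other, which is why XOR is not linearly separable.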