Debasis Samanta
IIT Kharagpur
debasis.samanta.iitkgp@gmail.com
06.04.2018
Concept of learning

[Figure: Taxonomy of learning in neural networks — supervised learning (least-mean-square, back-propagation), unsupervised learning, and reinforced learning.]
Supervised learning

Here, the error between the network's actual output and the target output is computed. It is then fed back to the network for updating the weights. This results in an improvement.
Unsupervised learning

If the target output is not available, the error in prediction cannot be determined, and in such a situation the system learns on its own by discovering and adapting to structural features in the input patterns.
Reinforced learning
Stochastic learning
Hebbian learning

Here, the input-output pattern pairs $(x_i, y_i)$ are associated with the weight matrix $W = \sum_i x_i \cdot y_i^T$. $W$ is also known as the correlation matrix.
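As an illustration only (not from the slides), a minimal NumPy sketch of this correlation rule, using hypothetical bipolar pattern pairs:

```python
import numpy as np

# Hypothetical bipolar input-output pattern pairs (x_i, y_i)
patterns = [(np.array([1, -1, 1]), np.array([1, -1])),
            (np.array([-1, 1, 1]), np.array([-1, 1]))]

W = np.zeros((3, 2))         # correlation (weight) matrix
for x, y in patterns:
    W += np.outer(x, y)      # Hebbian rule: W = sum of x_i y_i^T
```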
Competitive learning
[Figure: A multilayer feed-forward neural network with input layer $N_1$ ($|N_1| = l$), hidden layer $N_2$ ($|N_2| = m$), and output layer $N_3$ ($|N_3| = n$). The weight matrix $[V]$ connects the input layer to the hidden layer ($v_{ij}$ from the i-th input neuron to the j-th hidden neuron), and $[W]$ connects the hidden layer to the output layer ($w_{jk}$ from the j-th hidden neuron to the k-th output neuron). Each input neuron simply forwards its input ($O_i^I = I_i^I$); each hidden neuron uses the log-sigmoid transfer function $f(I_j^H, \theta_H) = \frac{1}{1 + e^{-\theta_H I_j^H}}$; each output neuron uses the tan-sigmoid transfer function $f(I_k^O, \theta_o) = \frac{e^{\theta_o I_k^O} - e^{-\theta_o I_k^O}}{e^{\theta_o I_k^O} + e^{-\theta_o I_k^O}}$.]
$$W = \begin{bmatrix}
w_{11} & w_{12} & \cdots & w_{1k} & \cdots & w_{1n} \\
w_{21} & w_{22} & \cdots & w_{2k} & \cdots & w_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
w_{j1} & w_{j2} & \cdots & w_{jk} & \cdots & w_{jn} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
w_{m1} & w_{m2} & \cdots & w_{mk} & \cdots & w_{mn}
\end{bmatrix}$$
Since the outputs of the input layer's neurons are the inputs to the j-th hidden neuron (so that $I_j^H = \sum_{i=1}^{l} v_{ij} \cdot O_i^I$), and the j-th neuron follows the log-sigmoid transfer function, we have

$$O_j^H = \frac{1}{1 + e^{-\theta_H \cdot I_j^H}}$$

where $j = 1, 2, \cdots, m$ and $\theta_H$ is the constant coefficient of the transfer function.
Note that the outputs of all nodes in the hidden layer can be expressed as a one-dimensional column matrix:

$$O^H = \left[ \frac{1}{1 + e^{-\theta_H \cdot I_j^H}} \right]_{m \times 1}, \quad j = 1, 2, \cdots, m$$
Let us calculate the input to any k-th node in the output layer. Since the outputs of all nodes in the hidden layer go to the k-th output neuron with weights $w_{1k}, w_{2k}, \cdots, w_{mk}$, we have

$$I_k^O = w_{1k} \cdot O_1^H + w_{2k} \cdot O_2^H + \cdots + w_{mk} \cdot O_m^H$$

where $k = 1, 2, \cdots, n$.

In the matrix representation, we have

$$I^O = W^T \cdot O^H$$
$$[n \times 1] = [n \times m] \cdot [m \times 1]$$
Hence, the outputs of the output layer's neurons can be represented as

$$O^O = \left[ \frac{e^{\theta_o \cdot I_k^O} - e^{-\theta_o \cdot I_k^O}}{e^{\theta_o \cdot I_k^O} + e^{-\theta_o \cdot I_k^O}} \right]_{n \times 1}, \quad k = 1, 2, \cdots, n$$

where $\theta_o$ is the constant coefficient of the tan-sigmoid transfer function at the output layer.
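A minimal NumPy sketch of this forward pass, assuming $V$ is $l \times m$, $W$ is $m \times n$, and the input pattern $I$ is a length-$l$ vector; all names here are illustrative:

```python
import numpy as np

def forward(I, V, W, theta_H=1.0, theta_o=1.0):
    I_H = V.T @ I                                # inputs to hidden layer, shape (m,)
    O_H = 1.0 / (1.0 + np.exp(-theta_H * I_H))   # log-sigmoid hidden outputs O^H
    I_O = W.T @ O_H                              # inputs to output layer, shape (n,)
    O_O = np.tanh(theta_o * I_O)                 # tan-sigmoid outputs O^O
    return O_H, O_O
```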
Based on the error signal, the neural network should modify its configuration, which includes the synaptic connections, that is, the weight matrices.
[Figure: Error E plotted against the weights (V, W): starting from the initial weights, the weights are adjusted step by step until the best weight (minimum error) is reached.]
For simplicity, let us consider the connecting weights to be the only design parameters. The total error over the training set can then be written as

$$E = \sum_{\forall I_i \in T_I} e_i(\cdots)$$

where $I_i$ is the i-th input pattern in the training set and $e_i(\cdots)$ denotes the error computation of the i-th input.
Suppose A and B are two points on the error surface (see the figure above). The vector $\vec{AB}$ can be written as

$$\vec{AB} = (V_{i+1} - V_i) \cdot \bar{x} + (W_{i+1} - W_i) \cdot \bar{y} = \Delta V \cdot \bar{x} + \Delta W \cdot \bar{y}$$

The gradient along $\vec{AB}$ can be obtained as

$$e_{\vec{AB}} = \frac{\partial E}{\partial V} \cdot \bar{x} + \frac{\partial E}{\partial W} \cdot \bar{y}$$

Hence, the unit vector in the direction of the gradient is

$$\bar{e}_{\vec{AB}} = \frac{1}{|e_{\vec{AB}}|} \left[ \frac{\partial E}{\partial V} \cdot \bar{x} + \frac{\partial E}{\partial W} \cdot \bar{y} \right]$$

To descend along the error surface, $\vec{AB}$ is taken opposite to this direction, that is, $\vec{AB} = -k \cdot \bar{e}_{\vec{AB}}$, where

$$\eta = \frac{k}{|e_{\vec{AB}}|}$$

and k is a constant.
Let us consider any k-th neuron at the output layer. For an input pattern $I_i \in T_I$ (an input in the training set), let the target output of the k-th neuron be $T_{O_k}$. The error is then

$$e_k = \frac{1}{2} \left( T_{O_k} - O_{O_k} \right)^2$$

where $O_{O_k}$ denotes the observed output of the k-th neuron.
$$\Delta V = -\eta \cdot \frac{\partial E}{\partial V} \quad (1)$$

and

$$\Delta W = -\eta \cdot \frac{\partial E}{\partial W} \quad (2)$$
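As an illustration only (not from the slides), a minimal sketch of one steepest-descent step per Eqs. (1) and (2), assuming the gradients $\frac{\partial E}{\partial V}$ and $\frac{\partial E}{\partial W}$ have already been computed:

```python
def descend(V, W, grad_V, grad_W, eta=0.1):
    """One steepest-descent step: Delta V = -eta dE/dV, Delta W = -eta dE/dW."""
    return V - eta * grad_V, W - eta * grad_W
```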
Supervised learning: Back-propagation algorithm
Note that the negative sign signifies that if $\frac{\partial E}{\partial V} > 0$ (or $\frac{\partial E}{\partial W} > 0$), then we have to decrease V (or W), and vice versa.
Let $v_{ij}$ (and $w_{jk}$) denote the weight connecting the i-th neuron (at the input layer) to the j-th neuron (at the hidden layer), and the weight connecting the j-th neuron (at the hidden layer) to the k-th neuron (at the output layer), respectively. Also, let $e_k$ denote the error at the k-th neuron, with observed output $O_{O_k}$ and target output $T_{O_k}$, for a sample input $I \in T_I$.
We can calculate $\frac{\partial e_k}{\partial w_{jk}}$ using the chain rule of differentiation as stated below:

$$e_k = \frac{1}{2} \left( T_{O_k} - O_{O_k} \right)^2 \quad (6)$$

$$O_{O_k} = \frac{e^{\theta_o I_k^O} - e^{-\theta_o I_k^O}}{e^{\theta_o I_k^O} + e^{-\theta_o I_k^O}} \quad (7)$$

$$\frac{\partial e_k}{\partial w_{jk}} = \frac{\partial e_k}{\partial O_{O_k}} \cdot \frac{\partial O_{O_k}}{\partial I_k^O} \cdot \frac{\partial I_k^O}{\partial w_{jk}} \quad (8)$$
Thus,

$$\frac{\partial e_k}{\partial O_{O_k}} = -\left( T_{O_k} - O_{O_k} \right) \quad (9)$$

$$\frac{\partial O_{O_k}}{\partial I_k^O} = \theta_o \left( 1 + O_{O_k} \right) \left( 1 - O_{O_k} \right) \quad (10)$$

and

$$\frac{\partial I_k^O}{\partial w_{jk}} = O_j^H \quad (11)$$
Substituting the values of $\frac{\partial e_k}{\partial O_{O_k}}$, $\frac{\partial O_{O_k}}{\partial I_k^O}$ and $\frac{\partial I_k^O}{\partial w_{jk}}$, we have

$$\frac{\partial e_k}{\partial w_{jk}} = -\left( T_{O_k} - O_{O_k} \right) \cdot \theta_o \left( 1 + O_{O_k} \right) \left( 1 - O_{O_k} \right) \cdot O_j^H \quad (12)$$
Again, substituting the value of $\frac{\partial e_k}{\partial w_{jk}}$ from Eq. (12) in Eq. (3), we have

$$\Delta w_{jk} = \eta \cdot \theta_o \left( T_{O_k} - O_{O_k} \right) \cdot \left( 1 + O_{O_k} \right) \left( 1 - O_{O_k} \right) \cdot O_j^H \quad (13)$$

$$\bar{w}_{jk} = w_{jk} + \Delta w_{jk} = w_{jk} + \eta \cdot \theta_o \left( T_{O_k} - O_{O_k} \right) \cdot \left( 1 + O_{O_k} \right) \left( 1 - O_{O_k} \right) \cdot O_j^H \quad (14)$$
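A minimal sketch of the scalar update of Eq. (14) for a single weight $w_{jk}$; the quantities (target $T_k$, observed output $O_k$, hidden output $O_{jH}$) are assumed to come from a forward pass, and all names are illustrative:

```python
def update_w_jk(w_jk, T_k, O_k, O_jH, eta=0.1, theta_o=1.0):
    # Delta w_jk per Eq. (13); note (1 + O_k)(1 - O_k) = 1 - O_k**2
    delta = eta * theta_o * (T_k - O_k) * (1 + O_k) * (1 - O_k) * O_jH
    return w_jk + delta  # w_bar_jk of Eq. (14)
```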
Calculation of v̄ij

To calculate $\frac{\partial e_k}{\partial v_{ij}}$, we again apply the chain rule of differentiation:

$$\frac{\partial e_k}{\partial v_{ij}} = \frac{\partial e_k}{\partial O_{O_k}} \cdot \frac{\partial O_{O_k}}{\partial I_k^O} \cdot \frac{\partial I_k^O}{\partial O_j^H} \cdot \frac{\partial O_j^H}{\partial I_j^H} \cdot \frac{\partial I_j^H}{\partial v_{ij}} \quad (15)$$

$$e_k = \frac{1}{2} \left( T_{O_k} - O_{O_k} \right)^2 \quad (16)$$

$$O_{O_k} = \frac{e^{\theta_o I_k^O} - e^{-\theta_o I_k^O}}{e^{\theta_o I_k^O} + e^{-\theta_o I_k^O}} \quad (17)$$

$$I_k^O = w_{1k} \cdot O_1^H + w_{2k} \cdot O_2^H + \cdots + w_{jk} \cdot O_j^H + \cdots + w_{mk} \cdot O_m^H \quad (18)$$

$$O_j^H = \frac{1}{1 + e^{-\theta_H I_j^H}} \quad (19)$$
$$I_j^H = v_{1j} \cdot O_1^I + v_{2j} \cdot O_2^I + \cdots + v_{ij} \cdot O_i^I + \cdots + v_{lj} \cdot O_l^I \quad (20)$$
Thus,

$$\frac{\partial e_k}{\partial O_{O_k}} = -\left( T_{O_k} - O_{O_k} \right) \quad (21)$$

$$\frac{\partial O_{O_k}}{\partial I_k^O} = \theta_o \left( 1 + O_{O_k} \right) \left( 1 - O_{O_k} \right) \quad (22)$$

$$\frac{\partial I_k^O}{\partial O_j^H} = w_{jk} \quad (23)$$

$$\frac{\partial O_j^H}{\partial I_j^H} = \theta_H \cdot \left( 1 - O_j^H \right) \cdot O_j^H \quad (24)$$

$$\frac{\partial I_j^H}{\partial v_{ij}} = O_i^I = I_i^I \quad (25)$$
Calculation of v̄ij

Combining the partial derivatives of Eqs. (21) through (25), we have

$$\frac{\partial e_k}{\partial v_{ij}} = -\theta_o \cdot \theta_H \left( T_{O_k} - O_{O_k} \right) \cdot \left( 1 - O_{O_k}^2 \right) \cdot \left( 1 - O_j^H \right) \cdot O_j^H \cdot I_i^I \cdot w_{jk} \quad (26)$$

Substituting the value of $\frac{\partial e_k}{\partial v_{ij}}$ using Eq. (4), we have

$$\Delta v_{ij} = \eta \cdot \theta_o \cdot \theta_H \left( T_{O_k} - O_{O_k} \right) \cdot \left( 1 - O_{O_k}^2 \right) \cdot \left( 1 - O_j^H \right) \cdot O_j^H \cdot I_i^I \cdot w_{jk} \quad (27)$$

$$\bar{v}_{ij} = v_{ij} + \eta \cdot \theta_o \cdot \theta_H \left( T_{O_k} - O_{O_k} \right) \cdot \left( 1 - O_{O_k}^2 \right) \cdot \left( 1 - O_j^H \right) \cdot O_j^H \cdot I_i^I \cdot w_{jk} \quad (28)$$
Thus, we have

$$\Delta w_{jk} = \eta \cdot \theta_o \cdot \left( T_{O_k} - O_{O_k} \right) \cdot \left( 1 - O_{O_k}^2 \right) \cdot O_j^H \quad (29)$$

which is the update for the k-th neuron at the output layer receiving the signal from the j-th neuron at the hidden layer, and

$$\Delta v_{ij} = \eta \cdot \theta_o \cdot \theta_H \left( T_{O_k} - O_{O_k} \right) \cdot \left( 1 - O_{O_k}^2 \right) \cdot \left( 1 - O_j^H \right) \cdot O_j^H \cdot I_i^I \cdot w_{jk} \quad (30)$$

which is the update for the j-th neuron at the hidden layer for the i-th input at the i-th neuron at the input layer. A code sketch follows.
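Similarly to the output-layer sketch above, a minimal scalar sketch of the hidden-layer update of Eq. (30), under the same assumptions (quantities from one forward pass; all names illustrative):

```python
def update_v_ij(v_ij, T_k, O_k, O_jH, I_iI, w_jk,
                eta=0.1, theta_o=1.0, theta_H=1.0):
    # Delta v_ij per Eq. (30)
    delta = (eta * theta_o * theta_H * (T_k - O_k) * (1 - O_k**2)
             * (1 - O_jH) * O_jH * I_iI * w_jk)
    return v_ij + delta  # v_bar_ij of Eq. (28)
```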
Hence,

$$[\Delta W]_{m \times n} = \eta \cdot \left[ O^H \right]_{m \times 1} \cdot [N]_{1 \times n} \quad (31)$$

where

$$[N]_{1 \times n} = \left\{ \theta_o \left( T_{O_k} - O_{O_k} \right) \cdot \left( 1 - O_{O_k}^2 \right) \right\} \quad (32)$$

with $k = 1, 2, \cdots, n$.

Thus, the updated weight matrix for a sample input can be written as

$$\left[ \bar{W} \right]_{m \times n} = [W]_{m \times n} + [\Delta W]_{m \times n} \quad (33)$$
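In vectorized form, Eqs. (31) through (33) reduce to an outer product; a minimal NumPy sketch, assuming `O_H`, `T` and `O_O` are vectors from one forward pass (names illustrative):

```python
import numpy as np

def update_W(W, O_H, T, O_O, eta=0.1, theta_o=1.0):
    N = theta_o * (T - O_O) * (1 - O_O**2)  # [N] of Eq. (32), shape (n,)
    return W + eta * np.outer(O_H, N)       # Eqs. (31) and (33)
```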
For batch mode of training, the error is averaged over the entire training set $T$:

$$\bar{E} = \frac{1}{2} \cdot \frac{1}{|T|} \sum_{t=1}^{|T|} \left( T_{O_k}^t - O_{O_k}^t \right)^2$$

and

$$\Delta v_{ij} = \frac{1}{|T|} \sum_{\forall t \in T} \frac{\partial \bar{E}}{\partial V}$$

Once $\Delta w_{jk}$ and $\Delta v_{ij}$ are calculated, we will be able to obtain $\bar{w}_{jk}$ and $\bar{v}_{ij}$.
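Putting the pieces together, a minimal sketch of one batch-mode epoch that averages the updates of Eqs. (29) and (30) over the training set; the hidden-layer term sums the per-k contributions of Eq. (30) over all output neurons, and all names are illustrative:

```python
import numpy as np

def train_epoch(V, W, samples, eta=0.1, theta_H=1.0, theta_o=1.0):
    """One batch-mode epoch over samples = [(I, T), ...]."""
    dV, dW = np.zeros_like(V), np.zeros_like(W)
    for I, T in samples:
        O_H = 1.0 / (1.0 + np.exp(-theta_H * (V.T @ I)))  # hidden outputs
        O_O = np.tanh(theta_o * (W.T @ O_H))              # observed outputs
        N = theta_o * (T - O_O) * (1 - O_O**2)            # output-layer term, Eq. (32)
        M = theta_H * (1 - O_H) * O_H * (W @ N)           # hidden-layer term from Eq. (30)
        dW += eta * np.outer(O_H, N)                      # Delta W, Eq. (31)
        dV += eta * np.outer(I, M)                        # Delta V, vectorized Eq. (30)
    n = len(samples)
    return V + dV / n, W + dW / n
```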
Any questions??