You are on page 1of 26

10

Widrow-Hoff Learning
(LMS Algorithm)

1
10 ADALINE Network

A AA
Input Linear Neuron

A AA
p a
Rx1 W Sx1

A AA
SxR n
Sx1 a = purelin ( Wp + b ) = Wp + b
1 b
Sx1
R S

a = purelin (Wp + b)

w i, 1
T T w i, 2
a i = purelin ( n i ) = purelin ( iw p + b i ) = iw p + b i iw =


w i, R

2
10 Two-Input ADALINE

Inputs Two-Input Neuron p2

AA AA
a<0 a>0
-b/w1,2

AAAA
p1 w1,1 w
Σ
1
n a
p2 1w Tp + b = 0
w1,2 b
p1
1 -b/w1,1
a = purelin (Wp + b)

T T
a = purelin ( n ) = purelin ( 1w p + b ) = 1w p + b

T
a = 1w p + b = w 1, 1 p 1 + w 1, 2 p 2 + b

3
10 Mean Square Error
Training Set:
{p 1, t 1} , { p 2, t 2} , … , {pQ , tQ}

Input: pq Target: tq

Notation:
T
1w
T
x = z = p a =
1
w p+b a = x z
b 1

Mean Square Error:


2 2 2
F( x )= E[ e ] = E[ ( t – a ) ] = E[ ( t – xT z ) ]

4
10 Error Analysis
2 2 2
F( x )= E[ e ] = E[ ( t – a ) ] = E[ ( t – xT z ) ]

2 T
F ( x ) = E [ t – 2t x T z + x T zz x ]

2 T
F ( x ) = E [ t ] – 2 x T E [ t z ] + x T E [ zz ] x

T T
F ( x ) = c – 2 x h + x Rx

2 T
c = E[ t ] h = E[tz] R = E [ zz ]

The mean square error for the ADALINE Network is a


quadratic function:
T 1 T
F ( x ) = c + d x + --- x Ax
2

d = –2 h A = 2R
5
10 Stationary Point
Hessian Matrix:
A = 2R

The correlation matrix R must be at least positive semidefinite. If


there are any zero eigenvalues, the performance index will either
have a weak minumum or else no stationary point, otherwise
there will be a unique global minimum x*.

∇F ( x ) = ∇ c + d x + --- x Ax = d + Ax = – 2 h + 2 Rx
T 1 T
 2 

– 2 h + 2 Rx = 0

If R is positive definite:
x∗ = R – 1 h
6
10 Approximate Steepest Descent
Approximate mean square error (one sample):
2 2
F̂ ( x ) = ( t ( k ) – a ( k ) ) = e ( k )

Approximate (stochastic) gradient:

ˆ F ( x ) = ∇e2 ( k )

2
2 ∂e ( k ) ∂e ( k )
[ ∇e ( k ) ] j = ---------------- = 2e ( k ) ------------- j = 1, 2, … , R
∂w1, j ∂w 1, j

2
2
∂e ( k ) ∂e ( k )
[ ∇e ( k ) ] R + 1 = ---------------- = 2e ( k ) -------------
∂b ∂b

7
10 Approximate Gradient Calculation

∂e ( k ) ∂[ t ( k ) – a ( k ) ] ∂ T
------------- = ---------------------------------- = [ t ( k ) – ( 1w p ( k ) + b ) ]
∂w 1, j ∂w 1, j ∂ w1, j

R
∂e ( k ) ∂  
------------- = t ( k ) –  ∑ w 1, i p i ( k ) + b
∂w1, j ∂ w 1, j  
i=1

∂e ( k ) ∂e ( k )
------------- = – p j ( k ) ------------- = – 1
∂ w 1, j ∂b

ˆ F ( x ) = ∇e 2 ( k ) = – 2e ( k ) z ( k )

8
10 LMS Algorithm
x k + 1 = x k – α ∇F ( x )
x = xk

x k + 1 = x k + 2αe ( k ) z ( k )

1w ( k + 1 ) = 1w ( k ) + 2αe ( k ) p ( k )

b ( k + 1 ) = b ( k ) + 2αe ( k )

9
10 Multiple-Neuron Case

iw ( k + 1 ) = iw ( k ) + 2αe i ( k ) p ( k )

b i ( k + 1 ) = b i ( k ) + 2αe i ( k )

Matrix Form:

T
W ( k + 1 ) = W ( k ) + 2α e ( k ) p ( k )

b ( k + 1 ) = b ( k ) + 2α e ( k )

10
10 Analysis of Convergence
x k + 1 = x k + 2αe ( k ) z ( k )

E [ x k + 1 ] = E [ x k ] + 2αE [ e ( k ) z ( k ) ]

T
E [ x k + 1 ] = E [ x k ] + 2α { E [ t ( k ) z ( k ) ] – E [ ( x k z ( k ) ) z ( k ) ] }

T
E [ x k + 1 ] = E [ x k ] + 2α { E [ t k z ( k ) ] – E [ ( z ( k ) z ( k ) ) x k ] }

E [ x k + 1 ] = E [ x k ] + 2α { h – R E [ x k ] }

E [ x k + 1 ] = [ I – 2α R ]E [ x k ] + 2α h

For stability, the eigenvalues of this


matrix must fall inside the unit circle.
11
10 Conditions for Stability

eig ( [ I – 2α R ] ) = 1 – 2αλ i < 1

(where λi is an eigenvalue of R)

Since λi > 0 , 1 – 2αλ i < 1 .

Therefore the stability condition simplifies to

1 – 2 αλi > – 1

α < 1 ⁄ λi for all i

0 < α < 1 ⁄ λ max

12
10 Steady State Response

E [ x k + 1 ] = [ I – 2α R ]E [ x k ] + 2α h

If the system is stable, then a steady state condition will be reached.

E [ x ss ] = [ I – 2α R ]E [ x ss ] + 2α h

The solution to this equation is

E [ x ss ] = R h = x∗
–1

This is also the strong minimum of the performance index.

13
10 Example
 –1   1 
   
Banana  p1 = 1 , t1 = –1  Apple  p 2 = 1 , t 2 = 1 
   
 –1   – 1 

T 1 T 1 T
R = E [ pp ] = --- p 1 p 1 + --- p 2 p 2
2 2

–1 1 1 0 0
1 1
R = --- 1 – 1 1 – 1 + --2- 1 1 1 – 1 = 0 1 – 1
2
–1 –1 0 –1 1

λ 1 = 1.0, λ 2 = 0.0, λ 3 = 2.0

1 1
α < ------------ = ------- = 0.5
λmax 2.0
14
10 Iteration One
–1
Banana a ( 0 ) = W ( 0 ) p ( 0 ) = W ( 0 ) p1 = 0 0 0 1 = 0
–1

e ( 0 ) = t ( 0) – a( 0 )= t1 – a( 0 )= – 1 – 0= –1

W ( 1 ) = W ( 0 ) + 2αe ( 0 ) p T ( 0 )

T
–1
W ( 1 ) = 0 0 0 + 2 ( 0.2 ) ( – 1 ) 1 = 0.4 – 0.4 0.4
–1

15
10 Iteration Two

1
Apple a ( 1 ) = W ( 1 ) p ( 1 ) = W ( 1 ) p 2 = 0.4 – 0.4 0.4 1 = – 0.4
–1

e ( 1 ) = t ( 1 ) – a ( 1 ) = t 2 – a ( 1 ) = 1 – ( – 0 . 4) = 1 . 4

T
1
W ( 2 ) = 0.4 – 0.4 0.4 + 2 ( 0.2 ) ( 1.4 ) 1 = 0.96 0.16 – 0.16
–1

16
10 Iteration Three

–1
a ( 2 ) = W ( 2 ) p ( 2 ) = W ( 2 ) p 1 = 0.96 0.16 – 0.16 1 = – 0.64
–1

e ( 2 ) = t ( 2 ) – a ( 2 ) = t 1 – a ( 2 ) = – 1 – ( – 0.64 ) = – 0.36

T
W ( 3 ) = W ( 2 ) + 2αe ( 2 ) p ( 2 ) = 1.1040 0.0160 – 0.0160

W(∞ ) = 1 0 0

17
10 Adaptive Filtering
Tapped Delay Line Adaptive Filter

AA
Inputs ADALINE
y(k) p1(k) = y(k)

AA
y(k)

AAAAAA
D

AA
w1,1
p2(k) = y(k - 1)
D

AA
w1,2

AA AA AA
D n(k) a(k)

AA
Σ
SxR

AA
D

AA
b

AA
D 1
pR(k) = y(k - R + 1)
D w1,R

a(k) = purelin (Wp(k) + b)


R
a ( k ) = purelin ( Wp + b ) = ∑ w1, i y ( k – i + 1 ) + b
i=1 18
10 Example: Noise Cancellation
EEG Signal Contaminated Restored Signal
(random) s t Signal e
+
- "Error"
Contaminating
Noise Adaptively Filtered
Noise to Cancel
m
Contamination

Noise Path
Filter
Graduate
Student

v a
Adaptive
Filter
60-Hz
Noise Source
Adaptive Filter Adjusts to Minimize Error (and in doing
this removes 60-Hz noise from contaminated signal)

19
10 Noise Cancellation Adaptive Filter

Inputs ADALINE

AA A AA
v(k) w1,1
n(k) a(k)
Σ
AA
SxR
w1,2
D

a(k) = w1,1 v(k) + w1,2 v(k - 1)

20
10 Correlation Matrix
R = [ zz T ] h = E[ tz ]

z(k ) = v(k)
v(k – 1)

t (k ) = s( k ) + m( k )

2
E[v (k )] E [ v ( k )v ( k – 1 ) ]
R =
2
E [ v ( k – 1 )v ( k ) ] E[v (k – 1 )]

h = E [ ( s ( k ) + m ( k ) )v ( k ) ]
E [ ( s ( k ) + m ( k ) )v ( k – 1 ) ]
21
10 Signals
2πk 3π
v ( k ) = 1.2 sin  ---------  m ( k ) = 1.2 sin  --------- – ------ 
2πk
 3  3 4

3
2
 sin 2πk
---------  = ( 1.2) 0.5 = 0.72

2 21 2
E [ v ( k ) ] = ( 1.2) ---
3   3 
k=1

2 2
E [ v ( k – 1 ) ] = E [ v ( k ) ] = 0.72

3
2π ( k – 1 ) 
1
E [ v ( k )v ( k – 1 ) ] = ---
3 ∑ 1.2 sin 2πk
 --------
3 
-  1.2 sin ----------------------
3
-
k=1


= ( 1.2 ) 0.5 cos ------  = – 0.36
2
3

R = 0.72 – 0.36
– 0.36 0.72
22
10 Stationary Point
E [ ( s ( k ) + m ( k ) )v ( k ) ] = E [ s ( k )v ( k ) ] + E [ m ( k )v ( k ) ]

0
3
1 1.2 sin  2πk 3π   2πk 
E [ m ( k )v ( k ) ] = ---
3 ∑  --------
 3 - – -----
-
4  
1.2 sin ---------  = – 0.51
3
k=1

E [ ( s ( k ) + m ( k ) )v ( k – 1 ) ] = E [ s ( k )v ( k – 1 ) ] + E [ m ( k )v ( k – 1 ) ]

0
3
1.2 sin  2πk 2π ( k – 1 )
--------- – ------  1.2 sin -----------------------  = 0.70
1 3π
E [ m ( k )v ( k – 1 ) ] = ---
3 ∑   3 4   3 
k=1

h = E[ ( s( k ) + m( k ) )v( k ) ] h = – 0.51
E[ ( s( k ) + m( k ) )v( k – 1) ] 0.70

–1
x∗ = R h = 0.72 – 0.36 – 0.51 = – 0.30
–1

– 0.36 0.72 0.70 0.82


23
10 Performance Index
T T
F ( x ) = c – 2 x h + x Rx
2 2
c = E[t (k )]= E[ (s(k ) + m(k ) ) ]

2 2
c = E [ s ( k ) ] + 2E [ s ( k )m ( k ) ] + E [ m ( k ) ]
0.2
0.2
1 1

2 2 3
E [ s ( k ) ] = ------- s ds = ---------------s = 0.0133
0.4 3 ( 0.4 ) – 0.2
– 0.2

3 2
1   2π   = 0.72
∑ 
2
E [ m ( k ) ] = --- 3π
1.2 sin
3
-----
- – -----
- 
3 4 
k=1

c = 0.0133 + 0.72 = 0.7333

F ( x∗ ) = 0.7333 – 2 ( 0.72 ) + 0.72 = 0.0133

24
10 LMS Response

Original and Restored EEG Signals


2 4

2 Original and Restored EEG Signals

0
1
-2

-4
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
0

2 EEG Signal Minus Restored Signal


-1
0

-2

-2 -4
-2 -1 0 1 2 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
Time

25
10 Echo Cancellation

AAAA
AAAA
+
Transmission

AAA AAA AAA


Line
-

AAAAAAAA AAA AAA AAA


Adaptive Adaptive

AAA AAAAAAA
Phone Hybrid Hybrid Phone
Filter Filter

AAAA
Transmission
Line +

26

You might also like