

ECE531 Screencast 2.2: Fisher Information for Estimating a Scalar Parameter

D. Richard Brown III

Worcester Polytechnic Institute




A Definition of “Sensitivity” (Scalar Parameter – part 1)

◮ We require the likelihood function $p_Y(y;\theta)$ to be differentiable with
  respect to $\theta$ for each $y \in \mathcal{Y}$.
◮ Holding $y$ fixed, the relative steepness of the likelihood function
  $p_Y(y;\theta)$ (as a function of $\theta$) can be expressed as
  $$\psi(y;\theta) := \frac{\frac{\partial}{\partial\theta}\, p_Y(y;\theta)}{p_Y(y;\theta)} = \frac{\partial}{\partial\theta} \ln p_Y(y;\theta)$$
  (a short numerical check of this identity is sketched after the list below).

◮ Two problems:
  1. We don't care if the relative steepness is positive or negative. So we
     should square this result to give a non-negative measure of squared
     relative steepness.
  2. This "relative steepness" $\psi(y;\theta)$ or "squared relative steepness"
     $\psi^2(y;\theta)$ is only for a particular observation $Y = y$. We need to
     average this result over $Y$ (holding $\theta$ fixed).
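
As an aside, here is a minimal numerical sketch of the identity $\psi(y;\theta) = \frac{\partial}{\partial\theta} p_Y(y;\theta)/p_Y(y;\theta) = \frac{\partial}{\partial\theta}\ln p_Y(y;\theta)$, assuming Python with NumPy and the Gaussian likelihood used later in this screencast; the function names and the chosen values of $y$, $\theta$, and $\sigma$ are purely illustrative.

    import numpy as np

    def likelihood(y, theta, sigma=1.0):
        """Gaussian likelihood p_Y(y; theta) for Y = theta + W, W ~ N(0, sigma^2)."""
        return np.exp(-(y - theta) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

    def score_finite_diff(y, theta, sigma=1.0, h=1e-6):
        """Relative steepness psi(y; theta) via a central finite difference in theta."""
        dp = (likelihood(y, theta + h, sigma) - likelihood(y, theta - h, sigma)) / (2 * h)
        return dp / likelihood(y, theta, sigma)

    y, theta, sigma = 1.3, 0.5, 2.0
    print(score_finite_diff(y, theta, sigma))   # numerical relative steepness
    print((y - theta) / sigma ** 2)             # closed-form d/dtheta ln p_Y(y; theta)

Both prints agree (approximately 0.2 here), which is exactly the content of the identity.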




A Definition of “Sensitivity” (Scalar Parameter – part 2)


◮ Averaging the squared relative steepness: We compute the mean
  squared value of $\psi$ as
  $$
  \begin{aligned}
  I(\theta) := E[\psi^2(Y;\theta)]
  &= E\!\left[\left(\frac{\partial}{\partial\theta} \ln p_Y(Y;\theta)\right)^{\!2}\right] \\
  &= \int_{\mathcal{Y}} \left(\frac{\partial}{\partial\theta} \ln p_Y(y;\theta)\right)^{\!2} p_Y(y;\theta)\,dy \\
  &= \int_{\mathcal{Y}} \frac{\left(\frac{\partial}{\partial\theta}\, p_Y(y;\theta)\right)^{2}}{p_Y(y;\theta)}\,dy
  \end{aligned}
  $$
  (a Monte Carlo sketch of this average appears after the list below).

◮ Terminology: $I(\theta)$ is called the "Fisher information" that the random
  observation $Y$ provides, on average, about the parameter $\theta$.
◮ The Fisher information $I(\theta)$ is only a function of $\theta$ (and other known
  quantities). It is not a function of $y$.
◮ Fisher information $\neq$ mutual information (information theory).
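
As an aside, here is a minimal Monte Carlo sketch of $I(\theta) = E[\psi^2(Y;\theta)]$, assuming Python with NumPy. The score is approximated by a finite difference of the log-likelihood, so only a log-likelihood and a sampler are needed; the Gaussian model worked out on the next slide is used here only as a sanity check, since its exact answer is $1/\sigma^2$. All names and values are illustrative.

    import numpy as np

    def log_likelihood(y, theta, sigma=2.0):
        """ln p_Y(y; theta) for the Gaussian example Y = theta + W, W ~ N(0, sigma^2)."""
        return -0.5 * np.log(2 * np.pi * sigma ** 2) - (y - theta) ** 2 / (2 * sigma ** 2)

    def fisher_mc(log_lik, sampler, theta, n=200_000, h=1e-5, seed=0):
        """Monte Carlo estimate of I(theta) = E[(d/dtheta ln p_Y(Y; theta))^2]."""
        y = sampler(theta, n, np.random.default_rng(seed))
        score = (log_lik(y, theta + h) - log_lik(y, theta - h)) / (2 * h)  # finite-difference score
        return np.mean(score ** 2)

    # Sampler for the Gaussian model with sigma = 2, so the exact answer is 1/sigma^2 = 0.25.
    sampler = lambda theta, n, rng: theta + 2.0 * rng.standard_normal(n)
    print(fisher_mc(log_likelihood, sampler, theta=0.5))   # approximately 0.25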

Example: Single Sample of Unknown Parameter in Noise


Suppose we get one sample of an unknown parameter $\theta \in \mathbb{R}$ corrupted by
zero-mean additive Gaussian noise, i.e. $Y = \theta + W$ where $W \sim \mathcal{N}(0,\sigma^2)$.
The likelihood function is then
$$p_Y(y;\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(\frac{-(y-\theta)^2}{2\sigma^2}\right)$$
The relative slope of $p_Y(y;\theta)$ with respect to $\theta$ can be easily computed:
$$\psi(y;\theta) := \frac{\frac{\partial}{\partial\theta}\, p_Y(y;\theta)}{p_Y(y;\theta)} = \frac{y-\theta}{\sigma^2}$$
The Fisher information is then
$$
\begin{aligned}
I(\theta) &= E[\psi^2(Y;\theta)] \\
&= \int_{-\infty}^{\infty} \left(\frac{y-\theta}{\sigma^2}\right)^{\!2} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(\frac{-(y-\theta)^2}{2\sigma^2}\right) dy \\
&= \frac{1}{\sqrt{2\pi}\,\sigma^2} \int_{-\infty}^{\infty} t^2 \exp\!\left(\frac{-t^2}{2}\right) dt = \frac{1}{\sigma^2}
\end{aligned}
$$
where the last line uses the substitution $t = (y-\theta)/\sigma$.
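
As an aside, a minimal numerical quadrature check of this result, assuming Python with SciPy; the values of $\theta$ and $\sigma$ are arbitrary illustrations.

    import numpy as np
    from scipy.integrate import quad

    theta, sigma = 0.5, 2.0

    def integrand(y):
        """psi^2(y; theta) * p_Y(y; theta) for the Gaussian single-sample model."""
        p = np.exp(-(y - theta) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
        psi = (y - theta) / sigma ** 2
        return psi ** 2 * p

    fisher, _ = quad(integrand, -np.inf, np.inf)
    print(fisher, 1 / sigma ** 2)   # both approximately 0.25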

Fisher Information: Alternative Derivation


If $\frac{\partial^2}{\partial\theta^2}\, p_Y(y;\theta)$ exists for all $\theta \in \Lambda$ and $y \in \mathcal{Y}$ and
$$\int_{\mathcal{Y}} \frac{\partial^2}{\partial\theta^2}\, p_Y(y;\theta)\,dy = \frac{\partial^2}{\partial\theta^2} \int_{\mathcal{Y}} p_Y(y;\theta)\,dy = 0$$
then we can derive an alternative (equivalent) expression for the Fisher
information as follows:
$$
\begin{aligned}
E\!\left[\frac{\partial^2}{\partial\theta^2} \ln p_Y(Y;\theta)\right]
&= E\!\left[\frac{\frac{\partial^2}{\partial\theta^2}\, p_Y(Y;\theta)}{p_Y(Y;\theta)} - \left(\frac{\frac{\partial}{\partial\theta}\, p_Y(Y;\theta)}{p_Y(Y;\theta)}\right)^{\!2}\right] \\
&= E\!\left[\frac{\frac{\partial^2}{\partial\theta^2}\, p_Y(Y;\theta)}{p_Y(Y;\theta)}\right] - I(\theta) \\
&= \int_{y\in\mathcal{Y}} \frac{\frac{\partial^2}{\partial\theta^2}\, p_Y(y;\theta)}{p_Y(y;\theta)}\, p_Y(y;\theta)\,dy - I(\theta) \\
&= \int_{y\in\mathcal{Y}} \frac{\partial^2}{\partial\theta^2}\, p_Y(y;\theta)\,dy - I(\theta) = 0 - I(\theta)
\end{aligned}
$$
Hence $I(\theta) = -E\!\left[\frac{\partial^2}{\partial\theta^2} \ln p_Y(Y;\theta)\right]$.
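
As an aside, a symbolic sketch, assuming Python with SymPy, that the two expressions for $I(\theta)$ agree for the Gaussian single-sample example; both should reduce to $1/\sigma^2$. The variable names are illustrative.

    import sympy as sp

    y, theta = sp.symbols('y theta', real=True)
    sigma = sp.symbols('sigma', positive=True)

    # Gaussian log-likelihood from the single-sample example.
    log_p = -(y - theta) ** 2 / (2 * sigma ** 2) - sp.log(sp.sqrt(2 * sp.pi) * sigma)
    p = sp.exp(log_p)

    # Expected-squared-score form: E[(d/dtheta ln p)^2]
    i_score = sp.integrate(sp.diff(log_p, theta) ** 2 * p, (y, -sp.oo, sp.oo))

    # Negative-expected-curvature form: -E[d^2/dtheta^2 ln p]
    i_curv = sp.integrate(-sp.diff(log_p, theta, 2) * p, (y, -sp.oo, sp.oo))

    print(sp.simplify(i_score), sp.simplify(i_curv))   # both reduce to 1/sigma**2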

Additive Information from Independent Observations

Lemma
If $Y_1$ and $Y_2$ are independent random variables with densities $p_{Y_1}(y;\theta)$
and $p_{Y_2}(y;\theta)$ parameterized by $\theta$, then
$$I(\theta) = I_{Y_1}(\theta) + I_{Y_2}(\theta)$$
where $I_{Y_1}(\theta)$, $I_{Y_2}(\theta)$, and $I(\theta)$ are the information about $\theta$ contained in
$Y_1$, $Y_2$, and $\{Y_1, Y_2\}$, respectively.

Corollary
If $Y_0, \ldots, Y_{n-1}$ are i.i.d., and each has information $I(\theta)$ about $\theta$, then the
information in $\{Y_0, \ldots, Y_{n-1}\}$ about $\theta$ is $nI(\theta)$.
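
As an aside, a Monte Carlo sketch of the corollary for the Gaussian example, assuming Python with NumPy. The joint score of $n$ i.i.d. observations is the sum of the per-observation scores, so the estimated information should be close to $n/\sigma^2$; the chosen values are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    theta, sigma, n_samples, n_trials = 0.5, 2.0, 5, 200_000

    # Joint score of n i.i.d. Gaussian observations is the sum of per-observation scores.
    y = theta + sigma * rng.standard_normal((n_trials, n_samples))
    joint_score = np.sum((y - theta) / sigma ** 2, axis=1)

    print(np.mean(joint_score ** 2))   # Monte Carlo estimate of the joint information
    print(n_samples / sigma ** 2)      # n * I(theta) = 5 / 4 = 1.25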




Appendix: Useful Calculus Results

These results were used in the derivations:



$$\frac{\partial}{\partial\theta} \ln f(\theta) = \frac{\frac{\partial}{\partial\theta} f(\theta)}{f(\theta)}$$

$$
\begin{aligned}
\frac{\partial^2}{\partial\theta^2} \ln f(\theta)
&= \frac{\partial}{\partial\theta}\!\left[\frac{\frac{\partial}{\partial\theta} f(\theta)}{f(\theta)}\right] \\
&= \frac{\frac{\partial^2}{\partial\theta^2} f(\theta)\, f(\theta) - \left(\frac{\partial}{\partial\theta} f(\theta)\right)^{2}}{f^2(\theta)} \\
&= \frac{\frac{\partial^2}{\partial\theta^2} f(\theta)}{f(\theta)} - \left(\frac{\partial}{\partial\theta} \ln f(\theta)\right)^{2}
\end{aligned}
$$
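
As an aside, a symbolic check of both identities, assuming Python with SymPy, for an arbitrary differentiable $f(\theta)$.

    import sympy as sp

    theta = sp.symbols('theta')
    f = sp.Function('f')(theta)

    # First identity: d/dtheta ln f = (d/dtheta f) / f
    first = sp.diff(sp.log(f), theta) - sp.diff(f, theta) / f

    # Second identity: d^2/dtheta^2 ln f = (d^2/dtheta^2 f)/f - (d/dtheta ln f)^2
    second = sp.diff(sp.log(f), theta, 2) - (sp.diff(f, theta, 2) / f - sp.diff(sp.log(f), theta) ** 2)

    print(sp.simplify(first), sp.simplify(second))   # both print 0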

