You are on page 1of 212

ADVANCED

THEORY

OF

STATISTICS

1. Distribution Theory
2. Estimation

3. Tests of Hypotheses

"

Lectures delivered by D. G. Chapman


(Stat 611-612)

1958-1959
North Carolina state College
Notes recorded by R. C.. Taeuber

Institute of Statistics
Mimeo Series No. 241
September, 1959

'J
"

- ii TABLE OF CONTENTS

I. PRELIl1INARIES
Concepts of set theory
Probability measure
Distribution function (Th. 1)
Density funct ion (def. 8)

Random variable (def. 9)


Conditional distribution
Functions of random variables
Transformations
Rieman - Stieljes Integrals
0

o


0

"

PROPERTIES OF UNIVARIATE DISTRIBUTIONS -

III.

1
3
5
8
10
10
12
12
15

CHARACTERISTIC FUNCTIONS


0
0
0
0
0

0
0
0
0

Standard distributions

Expectation and moments
Tchebycheff 's Inequality (Th. e + corollary)
Characteristic functions
Cumulant generating functions (def. 17)
Inversion theorem for characteristic functions
Central Limit Theorem (Tho 14)

Laplace Transform "


Fourier Transform "

Derived distributions (from Normal) "
0

20

" 21
.. 24
e 25
28
0

I)

(Tho 12)

C)

30

33

36

36
36

CONVERGENCE
Convergence of distributions 0 0 0 0 0 e 0 39
Convergence in probability (def. 18) " 41
Weak law of large numbers (Th. 16)
41
A Convergence Theorem (Th~ 18) 0 0 0 0 0 44
Convergence almost everywhere (def. 20)
e
49
Strong law of large numbers (Th. 20)
51
Other types of convergence 0 0 0 0 " 0 0 53
0

IV.

"

"

POINT ESTIMATION
Preliminaries

Ways to formulate the problem of obtaining best estimates
Restricting the class of estimates
optimum properties in the large
Dealing only with large sampla (asymptotic) properties
Methods of Estimation
I'1ethod of moments e e
"
Method of least squares 0

Gauss-Markov Theorem (Th. 21) ,

"
Maximum Likelihood Estimates


Single parameter estimation (Th. 22)
Multiparameter estimation (Tho 23) 0 0 0
0

III

I)

ill

55
56
58
60


It

61
61
,.

61
64
65
70

-t

iii Unbiased Estimation


Information Theorem (Cramer-Rao) (Th. 24) 0
Multiparameter extension of Cramer-Rao (Tho 25)
Summary of usages and properties of m.l.e. 0 0
Sufficient statistics (def. 31)
Improving unbiased estimates (Th. 26) 0 0 0
Complete sufficient statistics (Tho 27)
2 Non-parametric estimation (m,vou.e.)
X -estimation 0 0 0 e .. 0 0
Maximum likelihood estimation 0 0 0 0 0
"
Min:unum
X2
2
Modified minimum X
Transformed nun:unum X2
Summary 0 0 0 0
0

..
..

Examples ".. .
Minimax estimation
c
Wolfowitzts minimum distance estimation
0
0

DO'

V.

~EST1NG

..

..

"

fi

..

..

..

76

80

85

86
88

92

95

98
102
102
103
103
104
10,
108
110

OF HYPOTHESES -- DISTRIBUTION FREE TESTS

Basic concepts
113
Distribution free tests
U5
One sample tests
o. 117
Probability transformation
~ 121
Tests for one sample problems
121
Convergence of sample distribution function (Tho 31)
124
Approaches to combining probabilities o. 129
Two sample tests 0 0 41 0 0 0 133
Pitman I s theorem on asymptotic relative efficiency (Tho 32) 140
k-sample tests 0 0 .. 0 0 0 0 ,. 0 ,. 146
0

....

.'

....

..

VI.

{PARAMETRIC TESTS -- POWER


Optimum tests 0 0 0 0 0 0
Probability ratio test (Neyman-Pearson) (Th. 33) . . . . . .
U,M.P. U. test (Th. 35) 0 " 0 0 0
Invariant tests
Distribution of non-central t (~)
e
Tests for variances
0
0
.
Summary of normal tests .. " ..
Maximum likelihood ratio tests e 0 0 e 0
General Linear Hypothesis 0.... 41 0 ..
Power of analysis of variance test
e
Multiple Comparisons 0 .. 0 0 0 0 0
Tukey's procedure " e .. ()
2 Scheffels test for all linear contrasts (Tho 38)
X -tests 0 0 .. 0 0 .. 0
Power of a simple X2-test e 0 0 0 " 0 ~
One sided x2.test

0
o Q o "
0

41

....

'"

..

..

..

41

148

149

156

161

163
168
169
173

175
182

186
186

187

191
191
192

- iv x 2.test of oomposite hypotheses 0 0


Asymptotic power of the x2.test (Th. 39)
Other approaches to testing hypotheses 0 0 o 0

VIII.'

I{(SCELLANEOUS~

193
0

194
197

CONFlDENCE REGIONS

Correspondence of confidence regions and tests (Tho 40) 199


Confidence interval for the ratio of two means .. .. 0 200
List of Theorems
List of Definitions
List of Problems

202

.,

204
206

CHAPl'ER I ..... kELIMINARIE8

I:

Preliminaries:

Set~

Def",

A collection of points in

1/ 8J.

lie

(Euclidian k-dimensional space) -. S

+ 8 2 is the set of points in either or both sets.

Sl + 8 2 is the set of points in both

(Sr e
in S:t

it' 81 is contained in 8

points in 5 2 bllt not

Exercise 1/ Show that 8

1 '" 8 2

~S2

c.l

8:l.

and 52'

8 ) then 8 2 .. ~ is the set of


2

82 + ~

I:

8 251

There is also the obvious extension of these definitions of set addition and multiplication to 3 or more sets.
CD

~ 18n

II;l

the set of points in at least one of the 8n

&:

the set of points common to all the Sn

CD

-nn'J 18n
s*

is defined as the complement of 8 and is the same as ~ -

Proof':

denote

Let

X. 8

(~ + 5 2 )* means that x is not a member of either ~ or 8 >


2

iGeo x

!l1is

s.

J 81J

therefore x e

an element of n

82

'*

~ ~ 1

e 8 2*

since x is common to both

Sr*

8 2*~ x

B ~

* 82*

I;

To complete the proof

=- ~

x e 81* 8 2*

* and x

:l 6

~ x_~

e 8 2*

andxJS2

. 7

x /.

(f\

::::::::r

(81 + S2)*'

+ 82 )

Exercise

2/

Show that 52 - 51 - S2~*0

Exercise

1/

In R define 81 .. {x,y: x 2 + y2 ~
2

1}

i.e~ the set of points x,y subject to the restriction x 2 + 1'2 , 1

~ 'xf~o8, t1"~ 08}

82

{x,;r

{x,y:

z O}

Represent graphically ~ + 8 2,
Def. 2/ If ~ c. 82 c 8

S:J.* 8 2"

S3S251~ 81 8 2 - SlS;"

(an exploding family)

co

We

def'ine~

lim Sn'" ~ Sn
n-l

n~(X)

CD

We

define~

lim Sn

n~tO

fr.B

n .1 n

Such sequences ot sets are called monotone sets ..


Exercise
(aj

hi
{x,;r: txt

8how that the closed interval in R


~ loll
2
as an inf1hite product of a set ot open intervals:

Ans~
(b)

8n [x,p1'g

Ixl

<:

1 +~,

lyl ~l

may be represented

1Y/< 1 + ~

Show that the open interval in R2 {x,y~ JxJ.( 1"


as an infinite sum of closed intervals~ .

11'1 <1

can be represented

-.1-

Probability is generally thought of in terms of sets, which is why we study sets.


Def ~

3/ Borel

Sets -

the family of sets which can be obtained from t.he family of


intervals in lie by a finite or enumerable sequenoe of operation
of set addition, multiplication, or complementation are called
Borel Sets.

The word multiplication oould be deleted, since multiplication can be performed by complementation, eog4l:

(.tIL * + S
-~

Def.

4/

*)*

(s*)*

S. S
-J. 2

~(S) is an additive set function i f

11 for

each Borel 8et A(S) is a real number, and


2/ it ~,\I 8 2, 0 0 are a sequence of disjoint sets

Examples: -

area is a set function

- in!l:L

C?'

~(S).

A(S). !f(X) dx

2Y

f:

dx

~ef3

5/

peS) is a probability measure on ~ i t

1/ P is an addi tive set function


2/ P is non-negative

3/ P(I1c) '" 1

fA will denote the empty set which contains no points,


Ex!!

C;;

P() .. 0

Exo

6/

i t ~ CS2 then P(S:!) ~ P(S2)

Lemma 2/ P (~ .;. 8 2 +

Problem 1:

Frovc lcr.una 2.

0)

~ P(~) + P(S2) +

1 oeo

til

R *;
k

9J

1\

t:t

R
k

Proof:

case I Define:

~ C 8 C 8,

~.

Sf

st
3

82 - ~

S3 - 52

etc.
CD

These sets Stare disjoint; &lso


n

n-l

CD

Sn

n-l n

CD

P ( 11m Sn)
n~CD

(~

8n )
-1

00

(~ S~)

n-l

the additive properly of P

P (~) + P (S2-51 ) + P (8 - 8 2 )
3

P(~,>

..P (8 ) + P(S2)
1
- P(S2) + P(S3)
etc.
P (Sn) after n steps

lim P(Sn)
n~

Case 2:

~::>

8 2 ,:) 8

Problem 2/ Prove lelllJl8 3 for case 2.


Defg

6/ Associated with any probabUity measure

rex)

peS) on

defined by
F(x.)

pc

00 I

x).

F(x) is called a distribution function -

dolo

Ii. there

is a point function

-sTheorem 1/ Any distribution function, F(x), has the following properties:


1. It is a monotone, non-decreasing sequence.
2e F (. CD) 0; F (+ CX +1
3. F(x) is continuous on the right.
Proof:

1/ For Xl < x 2 we

The interval (.
From exercise

have to show that

00" ~)

6 we

PC:;) ~ F(X2 )

C "'he interval (.

have that

P(I:t)

cx),

x2 )

~ P(I )

Therefore F(~) ~ F(x2 )


2/a/ It we define Gn as the interval (-

G a no+CD
lim G
n

cx)"

n)

n:d.,2 p 3,uuo

(the empty set)

~~ .F~n) l~cl(Gn) P(n~~ Gn )

From lemma 3

p(G) P() 0
b/ Follows in a simUar tashion by defining Gn
3/

= (-

CD" n)

Pick a point a ~ for this point we want to show

lim F(x)

= F(a),

JC.7a
x>a

Consider a nested sequence


Then

tn~O,

sn> 0

lig14(CX)a + en) =- F(a) is the property to be shown.

If we define ~ (-

lim Hn H.

n?CD

IX)

a + t n)

interval (-

n:l,2,,3,uuo
ex),

a)

lim P(~) P(lim \) P(H)


Therefore

l~~ a + en) F(a)

lemma 3

.6.
Problem 3/ Show that
F(a) .. lif ~(f) + P {Cal}

Where (a
is a.

x<a

Or in familiar terms F(a) F{x - 0) + Pr(x

is the set whose only point

aD

a)
Where F(x .. 0) is the
limit from the left #

Theorem 2/ To any point funotion F{x) satisfying properties 1,,2, and 3 of theorem
1, there oorresponds a probability measure peS) defined for all Borel
Sets suoh that for any interval (- ex>, x)
p(- co ~ x) F(x) ,

Proof omitted Theorem

3/ A distribution

see Cramer p.

53

referring to p. 22 II

funotion F(x) has at most a oountable number of dis-

oontinuitieso

Proof~ Let vnbe the number of points of disoontinuity with a jump,.


then v

n whroh is what we have to show(1

Suppo,se the oontrary holds, i.e 0

:> n

1#

Then if we let Sn be the set of such disoontinuities, we have


1 P{R:t)

P(Sn' ">

(n) ~ 1

whioh is a oontradiotion,
ex>

ex>

Therefore s the total number of discontinuities. ~ V <:. ~ n


n 1 n
1

co

where

is the sum of the integers whioh is a oountable sum_

Notationg

[ ]

square braokets -- means the end point is not inoluded in the interval -ioe cr , an open interval"

round brackets -- mean the end points are inoluded in the interval, ioe."
a olosed interval.

(a, bJ is the interval a

to

b, inoluding a

but not b.

In Rk to each probability measure peS) there corresponds a unique distributic.

function

7
The interval is the set of points in

1\
i .. 1,

Theorem

4:

F{~ .. x ' ~

&:1

2~ ~

0,

~) has the following properties:

10

It is cant inuous from the right in each variable

2.

F{- co, x 2' It . , x k ) F{~, co,


F{+co, +00'800, + 0 ) - 1

.3.

~ F{al' a2~ ", ~) ~ 0

3:1

0,

xk )

II:

..

..

see p. 79 Cramer

i.eo, the P measure of any interval is non-negative"


Conversely if F(X ~ x ' " "
l
2
P-measure defined by

0'

xk ) has these properties.ll then there is a unique

That is, I is the interval [ . co, -

00,

0,

~).

0 0; ",

('1.'

x2 '

0 0' ~).

&2 + h2~

-----~-------6---~(~ +

h.t,

4>

h 2)

In R
2

P(I) .. F(~ + ~,

~ F(~, a 2)

2 + h2 ) - F(~, a 2 + h2 ) - F(a1 + ~I a 2 ) + F(~I a 2 )

-8-

,,-P(Sl" ~
-F(sl +

-I-

h:t,

+F(~.; a 2.9

h2" &.3 + h3 ) F(~

1-

b:Ls

~i &.3 + h.3)

8,2 .. h2 , 8.,3)

3 + ~) + F(~D

8.

8.

2 + h2~ a.3) <c. F(~ .. ~) Q2' a.3)

...F(al , $2" a )
3
CII

~ F{~I

B- 2D

a,)

TIle proof of theorem 4 is by analogy with the linear case (theorem 1),
F(xl' +

Q)"

~ + 00) p(

CD" xl)

.. the marginal distribution of xl


(similarly for other dimensions)
Def.

8:

It F is continuous and differentiable in all variable s" then

is the density function of xl' x "


2

.!J

~,

Exercise 7~
F(x, y) .. 0

..

if

x ~0

ex + y)

for

.. 1 for x::>l,p '1) 1


Can this be a distribution function in R2?
How can the definitions be completed?

or

'1 ~ 0

O<x

~l

O<y

~l

- 9 -

Solution -- consider the marginal distribution of x


x +1
Fl (x) .. F(x" + (0) ~ F(x, 1 ) a=2
1)
F (0) .. (0
l

But in fact

ai

F(O, 0) .. 0
F(O, y) 0 for all y
Fl (0)

Therefore there is a contradictiono


function"

F cannot be a proper distribution

If F(x, y) is a proper distribution function, then the two marginal


distributions

F2 (7)
Proble!'l ~:

lit

F (+

00;

y)

If we define f(x, y) .. x + l'

must be proper and in this case


they break dowo
0 ~x ~1

o
.. 0

~l

elsewhere

Find F(x, y); F (x); and F (y)


2
l
Show that F(x" y) satisfies the properties of a distribution function..

.10 Delo

9:

Random Variable

We assume we have exper:lments which yield a vector valued set of observations

1.

la

(Xl' X2' II 0 <>1 Xlt ) with the properties:


For each Borel set S in ~ there is a probability measure peS) which is
13

the probability that the whole vector.! lalls in the set in S


(P(S) is non-negativeJ additive, and P(~)

then the combined vector

(.!ll

.!2J

()

1).

Cramer pe 1,2-4
axioms 1 and 2

Il'}on) is also a random variable

--

Conditional Distribution
( .~ Y)
- -

are random variables in Rk _:; R. .--~ -1(2

~2'

Let S and T be sets in "


!!et. 10:

It'

p( X belongs to 80" then we detine conditional probability


p( YeT

I xes).

P(Y C T.y XeS)


p( XeS)

We show that P( Y c:. T , XeS ) does satisfy the requirements of a probability


measure
1.. It is non-negative since p( YeT, XC.S ) is non-negative.
2... It is additive since p( Y c: T/1 xes) is additive in ~ ,.
2

P( Y C Tl J xes ) + P( yeT 2
,3-

P(Y c~,!) X CS)

p(X cS)

P(X <;.S)
If p(Y C T)

P(x

Co

1 X c: 5

CI

P(l CS)

> Owe could also define

SlY C T)

PCI c. SA yeT)
-P(YeT;

) P{ Y C Tl + T2 1 xes )

-11In familiar terminology what we are saying is that

Pr(A

B)

~tA)B)

or

Pr(A$ B) .. Pr(A' B) Pr(B).,

If we have the corresponding distribution functions

F(x, y); ~ (x) ... F(x, +

CD

>;

and F2 (Y) F(+

00,

y) then:

notation:

..- Oapital ktin lettet'~ used tor ranflom variables in general.


-- Small latin letters used for observations or specific values of the
random variables 0
Pr(X ~x) .. F(x)

Def. 11 -- extension:
In the case of n random variables,

Note':

X $
2

..

I)

these are independent if

Three variables may be pairwise independent, but may not be (mutually)


il'ldependent -- see the example on po 162 of Cramer..

If density functions exist, then

Note:

1.J.. ~

~,

X $
2

.,

independent means that

The fact that the density functions factor does not necessa.rily mean
independence since dependence may be brought in through the limitso
y
eog. X and Yare distributed uniformly on OAB

lex, y) 2
O~x~l
O~y~x

- 12 -

Var1ables~

Functions of Random

g is a Boral measurable function if for each Borel set S~ there is a set


such that

Sf

xeS' ~ 8(:)88
is also a Borel set o

U LLU,--St

Sf

A notat ion sometime s used is that S i


S under the mapping go

Sf

g-1 (S) -- that S i is the inverse image of

Now consider Y .. g(X) where X is a random variable.

.=

Pr(yc S) = Pr(x cS i)

p(S')

Therefore, arty' Borel measureable function" Y .. g(X) of a random variable, X" is


itself a random variable.
Pr(y

S) p [g-1(5)J

This extends rea.dily to k dimensionse

Transforrnations:
We have X and Y whioh are random variables with distribution function F(x" y) and
a density function I(x, y)q

Where ~l and 2 are 1 to 1 with';'" ~; are continuous" differentiable;


and the Jacobian of the transformation is non-vanishing."

dX

'Ox

d~

'OY

~Y

1:1

a.;.

o~

We then have the inverse functions

X ..

The density function f(a, b)


f ['t'l (a, b);

III

'f1 (.;."
'f2 (..<,

~)
~)

of the random variables ..<,

2(a, b)]

IJ

is

.. J3,-

Solution:

Consider alsoW

III

X .. Y

Consider the joint distribution ot (Z, V)

z X+

(0, 1)

f-------44!~$ 1)

t(x"Y) 1

V-x-y
o Z+W
C02

Ja

X Z -2 W III Y

III

(:..,,0)

22
1

l'lI

...

2'-2
Density of

IJI

Z" Will f(x, y)

-1

The limits of Z and W are dependent

Z X+ Y
WeaX..,Y
III

It Z .. z, then lIT takes on values from (0 .. z) thru 0 to z ... 0, so that

for Z
II
Q

III

Z ~

f(z"W) ..

with limits ..z.s; W'$ +z


z~l

Since we started with only Z, and "a.rtificially" added toT to get a solution" we must
now get the marginal distribution of Z (this being what we desire)o

z
z
F(z)

) F(t)dt

- 14
If X

'> 1, then

(z - 1) - 1

F(z)

takes on values from

to 1 - (z - 1)

(2-z) dz

2-z 2 .

2 . ~!L.

1
1
(2-Zi. +2'.. T2
~

z2
F(z) =T

1:1

f(z)

1 ..

o
(2_z)2
2

1:1

or from (a-2)to (2-z)

~z ~

(2-!l:.

-~

1 ~

.. z

2-z

1~z~2

~z ~l

See p.

24, ~ 6

in Cramer

F(z)
density

DoFo

2'

z
1

Joint density of

Z, W
fez, w}

2
1

1:1-

1
Oiz~l

... z-s.w'S.z
1

-"2

z - 2

~w ~

2 .. z

If the transformation is not 1 to 1 (that is J .. 0) then the usual devi. e to avoid


the difficulties that may arise is to divide ~ into regions in each of which
the transformation is 1 to 1, and then work separately in eaoh regiono

ioeo" consider in

R:J.

2
Y .. X

We should consider 2 separate


cases:
x.<o
Cramer-

p. 167

X~ 0
Riemann-Stie~jes

Integral:

0.,

Let F(x) be a d.f with at most a finite number of discontinuities in (a,9b) and
let g(x) be a continuous function, then we can define

g(x) dF(x)

Cramer
PCll 71-74

.s follows:

Divide (a , b) into

~l <.

-s;

sub-intervals xl' x 2"

but as

n ~ 00" !J. ----? 0

They can be shown to have a common limit.

0'

1i.

of length ~ !J.

S is increasing, S is deoreasing

So the common limit is called the R-S integral,

b(

+00

Also define

g(x) dF(x) lim

) g(x) dF(x)

a-7 -00 a

b-; +00

"'00

provided the :j.imit exists


and in general

b(
)

rf..x )

dF(x) bas all the usual properties of the fam:lliar Riemann


integral 0

If F(x) has density f(x) which is continuous except at a finite number of points,
,
F(x ) - F(x _l ) -.t(JI') (Xi - x _l )
x i _l < x
xi
i
i
i

t~n

<'

-!', (x ) ~ (x)

j' .. ~

[int g(x)]

~li

g(x) dF(x)

lim

l
D

f (x i) ]

fjx

g(x) .!(x) dx

ordinary Riemann integral.

JC:l.$ x 2,

If F(x) has only jumps at

.,

nand elsewhere

is oonstant

If g(x) is oontinuous then this limit (the R-S Integral) exists o Also, if g(x)
has at most a finite number of discontinuities and so does F(x) and they don1t
ooincide, then the R..S integral existso

Def. 12:

X is a disorete random variable if there exists a countable set of


points, ~, x 2' C I I , x n with Pr(X .. Xi) .. Pi and ~ (Pt) .. 1
[elsewhere F(x) is oonstant, ioe e FY(x)

=oJ.

~I

-I--..-------"~~ -

--'

...--- '

.----,r--__

g(x)

for such a discrete random variable, the R-S integral reduoes to a sum:

,J
b

g(x) dF(x)'" lim


n.....,co

~ g(x~ f) [F(X~) - F(X~"l)]

.. 17 r

rr

Where the xi are points of division of (a, b) and xi


point in the i th interval

is an intermediate

summed over the set of points xi


in (a, b) - .. the points where
there is some probability"
Det ~ 13: X is a continuous random variable if F(x) is continuous and has a
derivative f(x) continuous except at a oountable number of points e
F(X ) - F(X _ )
i
i l

aa

t(X~ t) [ xi
x _

i l

.i X _

i 1

(the theorem of the mean)

~ %~ r ~ Xi

g(x) dF(x)

.
J

Va can extend this definition to k-dimensions readily by writing:

g(~1 ~~6j ~) d . ~

x1 ,

HtJ,

For a del

II

Xt<:

F(x11 , x k )

of '\ see po B

.. 18 If F(~, x ' , ~) is continuous and the densitj f(X l , x '


2
2

It

.,

exists and is continuous}} then


b-

jrg('i'
a

0'

~) ~o.o ~ F('i'

~l ~
0

g('i"

~) f('i'

~)
0,

and this extends easily to the k-diU!ensional case,

So

In

l1.

0,

.,

~) dxl' ~

ak

d F(x) F(x) - F(- ClO) F(x)

-00

ji
b

dF(x). F(b) - F(a)

a.
+CD If we let b ~ .,. 00,

&.-7- 00

dF(x) 1

+00

d'i

~F('i'

0,

~)

-00

Consider k 2, and the marginal distributions


+00

Fl(xl ) F('i' + m).

)
-CD

lim

dFx (x1Si x 2 )

a1-CD
b -7+CD

that we have:

xk )

.. 19 ..

a.., ..co

b-7 +00

Fl (~, + m ) .. Fl (~" .00)

This also extends readily to

R.k

with xl held f:ixed.


If the density function exists 3 then this reduces to a k..l integral

+00

+00

0' ~) ~2'

0'

..m

- 00

Problem

f(>i' x 2, ' ,

6~

if Xl'

show that

U$

-2

ln Xi

have independent, uniform on (0.. l)distributions,

has a X2 distribution with 2n d.f.

and indicate the statistical application of thiso


see~

Snedecor ch~ 9
Fisher ...- about pG100
Anderson + BancrOft -- last chapter of section 1

From problem $b

Z .. -2 lnX Y

-2 ln X + ... 2 In Y

or is the sum of 2 X2 with 2 d.f" each


Referenoes on

integI'als~

-... Cramer pp. 39-40


-

Sokolnikoft -

Advanced Calculus -

ch.

.. 20 ..

Chapter II
Properties ot Univariate Distributionsl Characteristic Functions
Standard Distributionsl
A"

Trivial or point mass (discrete)


Pr

B0

[X ... a]

x.(a
x~a

Uniform (contimous)

F(x) = 0

x <0

F(x) ... x
1

rex)

C.

F(x) ... 0
F(x) 1

~x ~

>1

Binomial (discrete)
Pr

[x

sa

k]

.(:)pk(l

~ p)n-k

o ~p

] (:)pk (1 _ p),,"k [<l-P) + pJ n


D

1n

kaO

~ ( : )pk

(1 _ p)n-k

;; 1

is an identity in PI n

k=0

Do

Poisson (disorete)
'\

ct

00

e- A

'\ k

(X)

~ e->.. ~ .. e ->.. ~
kaO
E.

k~

k:

k~O k~

=1,

2,

.o~, 00

A >0

->..>..
.. e

e 1

Negative binomial (discrete)


Pr

[x . . kJ

r
. . ( r+k-l)
r-l
p

(1.. p)

k ...

l~

2, ..

oe,

00

O<::p<l
r is an integer
Example.

Draw trom an urn with proportion p of red ballS, with replacement, until
we get r reds out e The random variable in this situation is the number of
non-reds drawn in the processJ to have k black balls means that in the

first r + k - 1 trials, we got r .. 1 reds and k blaoks, and then on


the last trial got a red, the probability at this is

This is what is referred to as inverse sampling in that the number of


dofectives is speo:!1'ied rathez' than speoifying the sample size whioh
is then sorutinized for the mmber of defeotives.

F.

Normal distribution (oontinuous)


(x_ij)2

rex)

==

J'2ia

cl-0:>

Problem 71

<:

X <0:>

Prove that

t:~l

pr (1 - p)k 1

i"e., is an identity in r, p
Hints
Def. 14:

is in the Mmc ... express (a+b)-rl in an infinii.te series.

It' X is a random variable with distribution F(x) and if'

..

g(x) dF(x)

exists, then we define the expectation of g(X) as

g(x) dF(x)

this being the R .. S integral,


if X is oontinuous

E [g(X) ]

Cl

-co
if X is disorete
Problem 8,

Given

F(x)

==

E [g(X)

1 +

'2'

rex)

dx

-OD

v=O

g(xv ) Pv

x <.:0

1/2
Q

g(x)

)(

ffn

S .
o

x == 0

t 2/2

dt

>0

(This is a oensored normal distribution -- i.e., all the negative


values are conoentrated at the origin)

.. 22 Find,
Def .. 153

E(X)
exists it is defined to be the kth central moment and

If E [X - E(X)]k

"'k0

is denoted

If E(X)k exists it is defined as the kth moment about the origin" and is

denoted
Exercise 9g

Find E(X) for each of the standard distributions/1

53

Theorem

"\0

E [g(X) + hey)] .. E [g(X)] + E [hey)]

Proofs

Let F(x, y) be the joint d.f Cl of X, Y and F and F2 the marginal dof
l

E [g(X) + hey)]
III

Theorem 61

II

If X, Yare independent random variables, then

Proof.
Corollary.

[g(X)+ h(y) ]dx/(X,y) =jJg(X)dx/(X,y) 1f(y)dx/(X,y)

g(x)dl'l (x) + h(y)dy F2(y) .. E [g(X)] + E [hey>]

E [g(X) hey)] ..

E[g(X~ E~(y)J

See Cramer, po 1730

If X and Y are irdependent random variables, then

Var(X+ Y) .. Var(X) + Var(Y)


Mornentsa

'\

E(X) .. mean
"'-2 .. E(X ... lJ.) 2 .. variance
=:

~ .. E(X _ ~)3

""4 ..

E(X ..,

~)4,

etc.
for the normal distribution -- N(O, 1)
..x2/2

t(X! 0, 1)"

.J 2n

_1
{2i"
00

_..k
E(X--)

1
.---..-

all odd moments (k odd)

=0

(>

by a

"s~etry"

argument,

23
E(X2)

E(X 4)

III

dx = 1

{2n
3

or in general
E(X2n )

from integration by parts

2
2n -x /2

III

{2n

dx = 1 :3

5 ..

z: (2n - 1)

which can be shown by induction.


Theorem 7a Let.,(o III 1"
F(x), i.e o ,

~" ~,

u.,

be the moments of a distribution function

ilk

CD

then if' for some r

:>O~ ~ -

converges absolutely then F(x) is the


k=O
k~
only distribution with these moments~

Proof~

See Cramer, p. 176.

Example I

N(O, 1)

~k+l

=0

co

since odd terms drop out.

k=O
co

III

~ _~(_r2"",,)_k_ _
k=O 2k- 1 (k-l)! .2k

2
{r )k

~ ~(:;)k
k.

_r 2/2
III

::I

==

exponential series

- 24 converges absolutely for all r, therefore the only distribution


with these moments is the d.f. with the density
,

"

f(x)

1lI..!.-

V2it

-x /2

ilts., the normal

Problem 91

Find the moments of the uniform distribution and show that this is the
only d.f. with these moments.

Theorem 8a

(Tchebycheff's)

If g(X) is a non-negative function of X then for every K> 0

Pr [g(X)

~KJ ~

E g(X)]
K

00

Prooh E [g(X)]

g(x) dF(x)

Let S be the set of values of X where g(X)


Pr [g(X)

't KJ ? !.(X)
S g

dF(x)

.f

dF(x)

III

'.- Pr

Corollary II

since the smallest value of g(X) in S is K


KPr

[g(X)~ K]

E G(X) ]
K

The above (th. 8) converts readily into the more familiar form

Proof,
g(X)

~(X)~KJ !

~K

III

(See p. 182 in Cramer) setting


~)

(X .,

Pr [ (X - lJ.)

(k 0)

J~ TI2

'" k

taking the square root of the left hand side

- 25 -

X
n

II

munber of successes in n independent trials with a probability of


success in each trial I I p

O'2x = np(l-p)

Proof &

From corollary to theorem 8


Pr

[I :.r - pi ~ k

Choose s

=5

0"]

(,

or k

kJ p(l-p)
n

;l
1
II -

-J8

= e or

n = p(l-~l
8 e

Hence 1 n is chosen this large~ from the corollary to theorem 8 the stated
probability inequality follows o
Note I

Theorem 9 could be rewritten

n=~
8 s

Characteristic Functions 3
Def o

168

Characteristic functions

X(t)

=]

eixt dF(x)
or since e ixt = cos xt + i sin xt

CD

(X)

X(t) =

-00

cos xt dF(x) + i

-r~ sin xt dF(x)

- 26 -

the successive derivatives of (t) when evaluated at t = 0 yield the moments of


F(x) except for a factor of a power or tti".
d

(t)

dt

J ix e uro dF(x)

= v (t) =

m
-(I)

CD

' (0)

= i

x e O dF(x) i ~

-00

in general

the moment generating function operates in the same manner, except it does not include
the factor tti" $ and is therefore not as general in application,
MaF

= M(t) = E

(e R )

00

= .~

ext dF(x)

if' this integral exists

-00

and operates by evaluating successive derivatives with respect to t at t 0


Lemmas

i t E(Xk ) exists, then k (t) exists and is contimous.

also true.
Examples I
1.

Trivial distribution,

Pr

[X = a]
(t)

CD(
III

-00

20

Binomial.

1
xt

eJ.

The converse ~

- 27 co

3.

(t)

Poissona

k
eitk a-A _A

'"
~

III

kl

k=O

411

Normal (O~ 1) S

(t)

(1)( itx _x2/2 dx


1
)
e
e
-

1
III -

.J2n -00

1
=-

{2n

Fn

.~... e-

2
x

2' 2itx

dx

x2 ... 2itx + ( it l 2 + (it)2

e-

dx

setting y = x - it*

(Note the term in curly brackets is the integral of a normal density and equals one.)

The validity of this complex transformation has to be justif'ied o

If X is N(O, 1)

(t)

III

e-t

/2

2t _t 2/2
= - reI'
.)1

.),

(t)

= -e

(0)

=-

_t 2/2

2 _t 2/2
+ te"

22
E (X)
=i "

=1

See Problem 11.

.. 28 ..

Problem 10 s Prove

Problem 11.1

Show that

without using the transformation used in class


Hint. e itx oos tx + i sin tx
Theorem 108
AX +B (t). e

iBt

X (At)

where X is a random variable with a dof. F(x) and a characteristic function


(t)
y .. AX + B where A, B are constants

Proof:

i f X is N(O, 1) then Y a X + jJ. is N(jJ.,

Pyet)

I:

eitjJ. x(a t)

= e

Def 17 i
<)

i)

it

jJ. e

;:;e 1tjJ. - .

The cumulant generat ing function, K{t )" is defined to be i

K( t) :Ln ~(t)

Example. For Y which is N(jJ.,


2 2
t
K(t) itjJ. -

T-

i)

- 29 Note I

For further discussion of cumulants see Cramer or Kendall e

Note,

Originally cumulants = semi..invariants (British school name ~


Scandanavian school name) -- however, semi-invariants have been
extended so that now aumulants are a special case.

Theorem 11:

If

J1.P

X2J

0 UJl

Xn are independent

ran~om

variables, then

tr~

~f;.,~~.~ =.,.--1. ~AO~J'~. ~~}

.~x.-""
.i=l'
. J.

Ai

(the c.f e of a sum = the product of the individual c ,f' )


Proof,

=e

n
n
therefore we could say that Y is N(] j.(.i' ]
1

ai)

To justify this last step we need to show the converse of F(X),"""? X (t) ioe"
X(t)

~F(X)o

Therefore, we need the followirg lemma am theorem,

sin ht dt

- 1 b<.O
III

t
Proo.t'~

that

J(..(, ~)

~f3

III

a:>.

III

(-

cJ

0
+1

e-...(U

e-..al

cos

h '=0

b>O

~u

du

sm r U
u

du

'.,('>0
/

Note I

Differentiation under the


integral can be justified,

- 30 ..

J.

III

J.~+'P2

see tables or integrate by parts tWice

f~ dp =2~2
Let

djl

= arc

then J(.<., 0)

{3""'70

tan

~ +c:

e'<'u ~ 0 du ... 0

arc tan 0 + C = 0
o + C ... 0

Let J. ~O, aId put {3 ... h, then


lim J(J., h) -= arc tan:
0<.'--7 0

h>O

-T
n

... "' 2
Theorem 12&

h<O

It F(x) is contimous at

F(a + h) ... F(a - h)


If

1rfil

depending on h + or -

00

\-7 00n

(t) dt<", then rex) ='

Note~

Recall that (t)

1:..

lim

a - h, a + h, then

i.t

-1

....ita

SJ.ntht e

(t) dt

.1 .

-itxIt (t) dt

itx

(I)

lex) dx

~el~

16-1

Combining this with the above theorem means that given (t) or lex)
we can determine the other.

Prooll
Define

1
J=n

;T

sin ht e

-ita
wet) dt

-T
T
1
=n-T{

sin ht

-ita
Ell

CJ

/tx d

F(X)] dt

-31Interchanging integrals (reversing the order of integration) which can be justified


this becomes

or)

.l.
n

...ita

tee

...

J[J

The

T( sin ht

-00

1
a_

No1>e!

'-

s~

ht

S~ sin term is an

;it(x ..

Notes

_~_(
J

,r.b

dt

a)

[:OS t,(x - a)+

d F(x)

i sin t(x..a)] dt

_~

old ,function "".

The !~ cos term is an even function


- _;

itx

dF(x)

0. Ja2j

S;i~ h! cos t(x.-a) dt dF(x)

2 cos A sin B = sin (AotB) ... sin (A - B)


J

a}

1 fj

~ sin t(x-a.hl

sin t(x-a+h)

dt d F(x)

now take the limit as T 7(0


using the lemma just proved, with that h = (x-a+h) here
x=a+h

---11---------

For sin t(x-a+h),-_Xi_-_a_+h_<O...;..._I___-~_-_+_h)O~-

For sin t(X-a.-h)----__


For x in each region

both

= ... 2'n

x-a-h<..O

:1st part

n
'2"

I.. .

Xi_-_a_-....;h>O
both

lit

I
J 2nd
I

Whole integral (in


brackets)

part""

I
I

1t

.. 32 i,e o , in the region a .. h

rf
T

sin t(x

~a

~x~a

+ h) dt

+ h

T( 1

) t sin t(x - a .. h) dt ..
o

.. 0

elsewhere

a+h
1
Ja-

There

alh

a-h

CD

o.

dF(x) + ..

1l

dF(x) +

a~

0'1 dF(x)

.. F(a + h) .. F(a - h)
Proof of the second statement in the theorems
F(a +
h) F(a h) .

2h

ex>

sint&
ht e -ita rI.(t)dt
'P

-00

taking the limit of both sides as h -70

= 2l n

tea)

<X>J lim
_
h-.,o

sin ht e...
ht

ita

(t) dt

"-~=1
therefore
f(a) ...

-J

'2i'

-ita
e

(t) dt

Problem 12iLet Xi (i .. 1, 2,

eo,

a-l
f(x) ... a x

n) have a density function given by


O~xfl

a>O
n

Find the density of Y

-rr

i=1

X
i

X.~ are independent

(may need the re sult of Cramer p. 126)


Problem 13r

Define a factorial moment

B(X[r]) = E [X(X-l)
Define

F (t)...

.1
ex>

(x-rotl)]

(1 + t)X dF(x) as the factorial moment generating function.

- 33Find
F*(t) for the binomia1 and use this to get the factorial moments.

-It I

Problem 142 If (t) 1:1 e


find the density of f(x) corresponding to o Find the
distribution of the mean of n independent variables with this dof ~
Theorem 13: A necessary and wfficient comition that a sequenoe of distribution
fUnotions Fn tend to ., at every point of oontiwity of F is that n" the oharaoteristic function oorresponding to Fn' tend to a funotion, (t) which is continuous at
t 1:1 0 ~r tends to a funotion (t) which is itself a oharaoteristic function] It
Proof:

Omitted -

see Cramer po 96-98

This theorem says, if we have Fl

F2 F
3
(11. 2 3

ouu
fl

we go from the Fi to the i - .. observe that the i tend to a limit ~or exampl,
the normal approximation to the binomia~ -- observe that the limit is itself
a charaoteristio funotion [of the normaJJ .... then go to F by previously
disoussed teohniques.

Theorem

l4~

(Central Limit Theorem)

~" X u o u are a sequenoe of independently and identically distributed


3
random variables with a distribution function F(x) with finite first ilnd second

If'

X:t."

moments

(say mean

IoL

and variance ~2] then3

1-- the distribution of Yn =

vn- (*

~ Xi

IJ.)

tends" as n
normal distribution with mean 0 and variance (]2

2- for any interval (a" b)

1 is asymptotically N(O, i)

3-- the sequence [Y n


Proof:

Denote the o.f o of X. as(t)

to get the c.f. of Yn =

(t)
n

rn

~ (X...
~

IJ.)

~ (Xi""'

rn-

IJ.)

~oo"

to the

- 34 ...
it' we expand

(t)

where

(t)

in a Taylor series, we have

i2~ t 2/2

&:

1 + i ." t +

&:

1 + i lJ,t ... (lJ,2 + ci) t 2/2 + R(t)

~ ~O

." = loLl

+ remainder

= I0Io2+

as t?O

1\n (t) n ~n e -:I. fn t

In

&:

rn lJ, t

:: i IoL

Note: In (1 + x)

rn

x2

= Xae T

+ n In

(i>]

$6(*)

t + n In [ 1 +

ifi..

al :::". ) ]
lrn

... (1+2 + (j2 )t 2 +

2n

x 3 x4
'1"
.. "4

x
= x -"7

r6

+ In

+ R

.~......,

where
setting
ln

0{

~n

X!'I

vn

.rn

(t) = ..

1.

i/

i IJ. -

as x ~O

2
2
2 t
t
.. (IJ. + a ) ~ + R (~) we get
t:n
'In
2
2
2
t
~t + n i lJ. .., (~ + (] ) 2n + R (-)

[trn

(i j.L.i)

rn

3
~

It]

An + R ~t)

+nR(~ +V~~

n~CD

In

r (t) =- a 2
n

it2
"

lim
n-too

rA

'Yn

(t)

&:

e-

-r

.,.

Y2

denote this by

2 t2

lim

nJ/to.
n
~

Tn

Rill

since lim

n-)ro

\-;:::F/"" = 0
./ U

lim

R~ t

n~

Q)

~) l

&:

t
0
since as n~, ~~
which is the characteristic function of the normal
distribution N(O, (]2)

Note 8

See Cramer p

214""l215.

tI

The theorem says, for any given s there is some sufficiently large nee, a, b) such
that

b( -t2/202

a.
btl

j
dt <::::s

directly -- use Sterling's approximation


by characteristic functions

see Feller ch. VII

Problem 15 ..- says that the standardiZed Poisson random variable is asymptotically
normally distributed N(O, 1)
. ..,
.,
Cauchy Distribution - see problem
differentiable at the origin

14 --- has no mean or variance since e

lim
b~CD

- \t

is not

.!.
n

a"-7CD

=oo-co

therefore E(x) does not exist for a Cauchy random variable, even though the distribution is symmetric about the origin"
Theorem 15~

(Liapounoff)

If ~I X2,
2
0i and with

e&o

X are independent random variables with means

P~ = E

I Xi - i 1
8

3 4.. CD and lim

~p3

(~

VI -']00

then
Y

"O'i

Proof l

an

""

~X. -~J,t..
:L
)
:L

=c

;r:L

is asymptotically N(O, 1)

1/2.

See Cramer p tl 216-217 ..

."
3/2

and variances

- 36-

ft.Problem 16a Define M*(t)

III

E(Xt ) as the Mellin Transform


III

k xk- l

III

...

If X has density f(x)


then M*(t)

III

k~t

It X has dens! ty f{x)

then

Wet)

0 ~X~

&~

k2. ~"'l J,n x

O<x~l

Use this to find the density of Y


density f(x)
k x k-l

III

Xf2

where the Xi are independent and have

III

State a theorem necessary to validate this approach.


Laplace Transform:
if f(x)

III

x<O

-- extensive uses in differential equations


..,. extensive tables of the L.T o in the literature -- tables for passing from
the transform to the fuction and vice versa ...- see Doetscha Tables of the
Laplace Transform"
Notes

Replacing (s) by (-t) gives the m(Jgof .. , or by (-it) gives the c.f 0

Fourier Transform -. the mathematical name for the charact.eristic function


III

[eitx ]

a;

\Ie have previously noted that if X, Y are NID with means jJ.. and variances
221.

then Z a X + Y is normal N(~ + ~I a + a2 )c


1
There is a converse to this "addition theorem for normal variatesll to the effect
that:
if Z

=:

normal

X + Y where X, Y are independent and Z is normal$ then X, Y are both


._. see Cramer po 213.

Derived Distributions (from the Normal and othersh

name

density
1

n/2 ..
2

r (~)

n.

;~

...x/2

.Cof.
(1 ..

a
n

2it)~

2n

- 37 ..
n

~....... Xn are

.- it'

NID(O, 1),

x2t

is y.,2

= X; + X~

_. it has the additive property X;+n


Garruna

aA

-ax

"'-1
x

rCA)

.. XZ1) is a special case of the gamma

- = Pearson's
student's t
(n)

-_ t

Type III distribution with starting point at the origin"

~(~)

rm

r(~)

=XJ n
y

--

2 nil

(1+ !..) 7'"


n

X~

where

n-2

Ln ;;.2)

( n",>l)

Yare independent, X is N(O,? 1), Y is

X~

-- t with 1 d.r \) is a Cauchy distribution

r{~)

F(m, n)

r<;) r()

i 1

n/2

( ~)

---

X m+n
(1+ !!! x) T
n
.,

2
2n (m + n ... 2)

n
n'"

m(n_2)2 (n-4)
~

X>O
-

if X, Y are independent, X is

Fisher's z is defined by F

= e 2z

2
Xm,

Y J.s Xn, then F =

see Cramer p.

243~

Beta

13(p, q)

r (p+q) p...l
q-l
- - x
(l-x)
rp,rg-,

__ putting P

= ~o

= ~,

we get 13

.~.

p+q

~ F

= _n_ _
l+!!F
n

-- see Kendall for the relation to the Binomialo

Jil!!'YTr1

38 Gamma function,
rep)

rep)

III

n-l
x

(p-l) r(p-l)

i f p is a positive integer

r(p+l)

lim r~h)
p

n>O

dx

III

rep)

= (p-l) !
Stirling IS Formula

=1

h Fixed

rep)

P~
Problem 178

2.

X and Y are WID (0,

III

arc tan Y/X

Problem 18g
Cramer po 319

No. 10.

(j

).

Find the marginal distribution of

-39Chapter III
Convergence
Convergence of Distributions:
a. Central Limit Theorem (theorem 14)
b. Poisson distribution - (problem 15) -

Proof: 1. By c of'. is straightforward 2. Direct


>"+b~
P

(a, b)

~>...,y.
K!

= >..+a{X e

i f X is a Poisson r.v .., then

see sol. to problem 15, or p. 250.


where P(a" b) is the above
probability statement.
x= If->..

LetK= >"+VAx

-rr

X+6x aK + l - h

iA

ax
b

..

(a,b) . x=a

.-A AA +

xl?:.

111-

rr

(h + x~)!

using stirlingl s Formula:

nl

n -JIl ' .. 1/2


n e (2nn)

where e{n)--7 0
as
n --7 co

III

1 + Q(n)

-402

Notes: limit

= e-x

X 700 (1+

limit ln

lfi3:1t
Xvr -

:::J

1..-1 00 (1+ .:.)1..

{i

where 9 (X) -7 0

as 1..700

henoe the lim


Pea b) = lim
X 700 '

~:tG1:l18ml sum ..l.. ~

r
b

Note:

.l..

f2;a

2n

e"'x

h2~

e..

/2

dx

For a similar proof for the binomial oase, see J. Neymam First
. Course in statistics, Chapter 4.

o. If' X has the X2 distribution, then

:_n

~
V 2n

is A ~1(0,1) as

n-7

00

Proof: See Cramer, p. 250.


d.

If X has the Student's distribution with n d.f .., then

X-o

J~~

is A NCO,l) as n 700

Proof: Deferred for now) a proof working with densities is given by


Cramer, p. 250.
e.

If X has the Beta distribution with parameters p, q, then the standardized


variate is A N(O,l) as p -j 00, q -700, and p/q remains finite
Note:

X has mean...P....
p+q

Proof:

Omitted

1
1 + qJp

-41Problem 12:
a.
b.

X is a negative binomial r.v. (r,p)

Find the limiting d.f'. of' X as p~l; q-70; rq-7'A. (f'inite).


state and prove a theorem that shows that under certain conditions a
linear function of' X is AN{O,l).

Problem 20:

X,Y have a joint density f'(x,y)

Define E

[seX) (Y] ..

where f(x y)

g(x) f(xly)dx

-co

::I

Show that E[g(X) ]

f'(XJY>!f'l(y),

f'l(y) is the marginal density of y.

DEE [g(x>Iy]

Y X
Use this to find the unconditional distribution of' X if
Y is Poisson (A.).

xlY is B(Y, p) while

Convergence in Probabilit;y
Def. 18:

A sequence of' r.v., X:t' X2, .. ., J!h is said to converge'in probability


to a r ov. 'X i f given s, 8 there exists a number N( eo, 8) such that
f'or n ';7 N(e J 6)

Pr[1 Xn - X )?S J<8


which is written ~

-p

As a special case of' this we. have


Nsuoh that for n

-t tt

indicates converging
in probability.

J!h

p> c i f given

t,

8 there exists

>N

Pr[\~

cA>~e]< ~

Further, in this notation, theorem 9 may be written:


1 J!h is a binomial r.v with parameter n, then

p-1

Ii
Theorem 16:

If' ~,X2' ... 'J!h are a sequence of independent random variables with
means l-Li and variances
n

cri then i f

Lcr~
1 1

---""2- ~O
n

then
Yn

~ ~ (~ ~ - j

-42Proof:

Bit' the corollary to the Tchebyohe.fi'" Theorem (thm.8)


n

~ ~Xi - I'i~ 7 K "Yn1"' ~

Pr [I
given

~, ~

choose
"

Now

oi:'n

Put K

IOl

'.

iii

v'"f
V2

{6

n
)'

~ ~< ~
iii

o~ ~

-7

Gl)

and take n suffioiently large so that

(~ f' o~
n

)1/2 <.. s

with this choice of n


n

Pr[1

as n

~ ~

(Xi -

'1.~ ? ~ J< ~

If X:I.,X ,X , ~ a U ~ have the same distribution, then


2 3
2
2
andOy = .2..~o
asn--7oo
n

ai

2
o
=n-

so that in this particular case, X - P ~IJ.


Theorem 17: (Khintchine) (WGSk Law of Large Numbers)
If J1.,X2,

mean IJ., then

"X are independently and identioally distributed r


n

~ IJ..

Proof: (See Cramer, p. 253...4)

xCt)

In

IT t
E(e ~"')

i(~ IXi)t

l(t)

E(e

. (t)

==

n In (~)

where

~ -7 0

Lx(i)] n
rL

= n In 1 + i IJ.! + R]
n

as t

-1

.v. with

-43-

In

.
(t) .. -1IJ.t + n In(l + i lJ.! + R)
1-1.l.
n
Note:

In(l+x)
where

-:UJ.t .. n ~i

III

R~X) -7

(t) -) 1

as x

-j 0

RU(!)

...ntn.--to

so

henoe

IJ.* + R{i> + Rll]

Rn(1)
= in t whioh as n-f (I)

x + R(x)

as n

-t 00

!-lJ.

but i f (t) = 1

F{X) = 0

x <.0

x~o

III

or, in other words, the limiting distribution of X- lJ. is the


trivial d. whioh takes the value 0 with probability 1.
0

Questiom

X:J.'

X2, X ,
3

Do

II

lJ.

~3

Do

2
2
~ 2

If

1>&0'

Xn

." .

lJ.n ~lJ.

GO"

,in ---762

i.e ., does ~7 I

imply lJ. n }lJ. '1

Not neoessarily)as shown by the following example.


Examples
Let X be defined as follows:
n
2
In = n
with probability

with probability

-1
n

1
1- ....

-7

-44,"~ X

E \'" X
l n

-j 0

:I

0(1 - 1) + n2 (1)
n
n

as n -)00

:I

r.~1-t 00

Problem 21:

Let X be the r.v. defined as the length of the first run in a


series of binomial trials w~th Pr [ success J : I p. Find the distri
bution of X" E [XJ ' and ~(X).

Problem 22:

Let ~" 1 2 , '''''!n be independently unit.ormly distribu.ted (on 0,1


L~t
m1n OS.. " 42' ..." ~). Find the d.f. of I" ELY] , and
er(Y).

r=

Find the as~totic d.f. of nI and of I ... E(Y)


tTy
Is Y ... E(Y)
A N(O"l) as n - j oo~
0'1

Theorem 18:

u"x.a

Let X.,,~,
be a sequence of random variables with distributi~
functions Fl(x)" F2(x), "0' Fn(x) ~F(x). Let 11" 1 2, ""Yn be
a sequence of random variables tending in probability to c.
Define: Un

:I

~ + In;

Vn ~~;

and Wn

:I

~/Yn

a) the d.f. of Un ---7 F(x-c); and if c 70


b) the d.f. of Vn~ F(x/c)
c) the d.f. of Wn~ F(xc)
Proof:

All three parts are similar -

see Oramer p.

2S4...S for proof of

the third part.


Proof of the first statement (a):
Assume x ... c is a point of continuity of F. Lett be small so
that x ... c t is an interval of continuity.
Let B:t : I the set of points such that.
Xn +1<'
n- x;
8

:I

/1n

-cl<::.-

the set of points such that.

xn + Yn

.:::.

Xj

In .. c \ 7

+ 82 CI the set of points such that


Xn +1n_
<:"x

:I

..

..45-

IYn .... 0 \'7 e J~ Pr [I Yn .. 0)7 eJ whioh ten~s to 0 as


n -700" the~efore we can choose nl so that n >nl implies P(8 2 ) c::.. i
P(S2) Pre Xn + Yn L.-~.

In ~:
-1
F@x ....

0 ..

e -Soyn::- 0 +

0 .. 6) ....

P(~ + ~ L. x)

~ p[ V

&,

thus
.

f<F{X" c .. e)

= P(~ = x ..

~c - ~)] = Fn2 [

:=I

[xn ~ x .. (c + el J :::-

Yn)

x - c ..

~.o(x -

c + s) -

where n is chosen so that when n;>n Fn(x) .... F(x)(i in the vicinity of c.
2
2
Therefore" in Sl

i< P(Xn +Yn x) L F(x .. c + ~) +.i


oan be chosen so that F(x ... c +
F(x ..
i for n
F(x ..

0 ...

e) ""

6) -

NO~ing
.... 8 ~ ....

i?hat ,Pr[Un ~ x

i .. i ~ F( x ..

p( 8 ) +p( 8 ) ..

~ F(x -

+ e) +

0 -

6}<

J= ~[Xn + Yn~xJ = P{Sl) + P(S2)'

6) ..

~(x

c)

i + 0 - F( x ., c)

= Pr [Xn ~ .xl

i +- i . . F(x -

0)

~i

:"" F( x ...
~

whioh makes use of F(x

c + e)

F{x .., c)

II(

F(x

0 .... 6)

F{x .... 0)

III

This then reduces to ....

~ ~-

hence J Pr[Xn~xJ .... F(x ...

Pr

[xn~xJ

c)J~~

L.

.. F{x ..

~ ~ 6

i
i
o)~- ~

for n7max(n l ,n 2 )

which is what we set out to prove in the first plaoe

...

.,max

(~"n2)e

we can write:

-46-

:'heorem 19: If X
n
g(X )
n

ProBlem 23i

>0

and i f g(x) is a funotion oontinuous at x

0,

P-} g( 0)

Prove Theorem 190

(work with faot that g{x) is oontinuous)

ExamJ2le of Theorem 18:


Show t n is A N{O,l)
where

XJ.,~,

000

are independent with mean

sn =
whioh is in the Wn form

In is A N( 0,1) by the oentral limit theorem.


Henoe, 1.t' we show that
sn
y=p-71
n
(]
then the statement that t is A N( 0,1) follows from theorem l8{ e)

~(\ _ 1)2
{n _ 1)(]2

::I

n
n -1

=-

~, val' (]

-47-

by KhintchineJs theorem (No. 17) this sample mean tends to (]2

p ) (]2

therefore, the first term -

i.el)'

(Xi ... J.L)2

) 1

hence, it remains to show that

(x - J.L)2

2
(]

':j

but we know that ) X ...


theorem (No. 17).

J.L

(I ... t.l.)2 _

Therefore.
,

-7

J.

:;J

pI

as n -7

ex>

2
and sn

by Khintchine t s

2' ~
p

(]

(]

s
and thus by another application of Theorem 19 -;
Note:

p ., 1

The t n in this case is Student as distribution if the x are


independently and normally distributed -- however, this is
for general tn.

Re Theorem 19, see: Mann and Wald; Annals of Math. Stat., 1943;
itOn stochastic Order Relationships" -- for extending ordinary
limit properties to probability limit properties.
Mise .. remarks#

On the Taylor series remainder term as used in the


proofs of theorems 14, 17, and on p. 37 -see
CratAer p. 122.

If f(x) is continuous and a derivative exists, than we can write


f(x) = f(a) + (x

GO

a) ff La +

Q (x -

a)]

~ 9 ':=-1

=:

f(a.) + (x .. a)Lf'(a) + ,fl(a ... G(x - a) ) ... f~(a)J

10

f(a) ... (x .. a)fU(a) + R

where R

= (x -

a) [fl(a + "(x .. a - f' (a)]

-48R

fa [a + Q(x ... a)] .. fl{a)

IS

ft
then i f

[a + G{x ..

is continuous" as x --7 a

f~

lim

a)] ... fl{a) ----70

III

ioeo,P R converges to 0 faster than x - a

x...-.)a x-a
If there exists an A such that fA g(x) dF (x)

Remark:

0)

..
for n
)

1, 2~.3, 0lJ u then i f Fn--p"7~ F

g(X)dFn(X) - - ;

...co
Question,

r::;

and ,

<s
g{x)dFn(x)

Ref .. Cramer, P4 74

g(;)dF(x)

...( 1 ) '

Under what conditions does E(t n ) 0 for all n


or does E( t n) --j 0 as n ----1. co'l
(tn is defined as in the example illustrating theorem 18)

Counter-exampJ,elt ~

1
2
;-; e- x /2

Define Pn:=...L
~ 1-n
In is normal (0,1) except on the interval. (1 - ~ , 1)
and

III

1 with probability Pn '

X:L'

Then with probability (Pn)n


in which case t n

.l0-

IJ.)

= ,.,.
~

therefore E(t n ) is not defined.


Problem.: 24~

X is .Poisson All then Y III

is asymptotically

(X

X".)

X~l) as ".-7

CD

X2,

"0'

III

<s

-LeConvergence Almost Everywhere:


Def. 19:

A sequence of r.v.,

X:t'

--t X a.e.

X2, ...

i f given

8,

6 there

exists as N such that

Ref ..: Feller, Chapter 9.


~:

Convergence almost everywhere is sometimes called "strong convergence"


Convergence in probability is sometimes called 'lrseak convergence".

Exam:raleg . (of a case when Xn

but Xn~ c a.e.)

p ) c

with probability 1 .. ~

-,=:
1 with probability
n
the XI s are independent.

(1)

To show Xn

p -j 0

.. !n

for any 8<1

can be made arbitrarily small by increasing n.

(2)

~-/-10 ae ..
P.r

D~ "'"

0 )

<

00
III

IT Pr[~L.E]

nmN
Note:

l - x <:::.. e-x

= N,

N + 1, N + 2, ...
CD

= IT

(1 -

N;j)

j=O
0

~x

=1

which series is divergent

-50therefore

[I Xn - 0 I < e

Pr

therei'ore ~

= N,

N+l, N+2, ]

=0

0 a.e.

Problem 25:
If

then

=0 with probability

=1

--j 0

with probability

1 . . ,1~
1

7'

a.e.

Problem 26:
= cos ta

x(t)

1. What is the d.f. of X'1


2. Is 'f A.. N. as n -7 CD (suitably normalized) '1
3. To waht does X converge as a--j 0 '1
Prohlem 27:
X.~ and Y.~ are independent and identically distribu'ted random
.

variables with means

IJ.,

1/

and variances

vi ' a~.

Find and

prove the asymptotic d. of X Y (suitably normalized).


0

Kolmogorov ine9Wit,V

Let
2
and variances O'i"

IJ.i.

Then
Pr

I~

n n

[~~

be a sequence- of r

oVa

Xi

~ fl \ . q

(tai

)1/2

Exampls3

J1.

X2

Pr

[J

Xi

\-<

\ JS. + X2 \

<

Y2K ] '/

1 -

with means

Theorera 208
oo.

are independent random variables with means l1


i
00

2
and variances ai' then, i f. L\

l-7

l1n - 'iin

2/,2 converges
ail.

0 a

or we say that X obeys the strong


n

where jj;

= 1n I

J.!!! !

large numbers

l1i

Proof of Theorem 20:


.

Let A. be the event that for some n in the interval 2 j ...1 4.n '" 2j
J

(Xn .., I1n /;> s

(violating the definition of convergence a ..e.)

-52-

Pr

[J

t<l"- ~)

17

nJ

~ ~ L<x" - jJ.nl
I I n - ~JJ
e
'7---(I <7~
I
j 1

2 -

Pr

j l
2 ...

{X

L.

inserting the lower bounc


on n

PI'

)1/2

<7i )1/2

from the Kolmogorov ine'qUslity:


PI'

x..

-J.'"

h1

L.. If-

<71

IL(~ ~ ~t

(JS. + X2 ) .. (!1 + ~2

<::... If;

t:,;
~

(2 + 2)1/2
<71 . 0'2

..:..-_--..;. <:

{~ <7i)1/2

and inserting the


letting k == 2 - e
bound
of n in the
2)1/2
( <7i
summation

upp~

-53Interohanging the order of summation

~ ~: 2~j

L.. 4.
-2

1=1

2j >i

Note that

since it is a geometttio series.


1

1-':'2"""
2

2j 7i

16
38

III

Now this sum is finite by hypothesis -

henoe

I l
Pr

'X --1
Proof:

fJ.

a.e.

is immediate sinoe

\' O"i

L ":"2"

,,2

III

I!2

Aj ]

which converges, i.e.

other TYpes of Convergenc::e:


1. Convergence in the mean
1.i .m.
Note:

X
n

1.i om.

III

III

i f lim E [X

n-ja>

- X

J 2 ---7 0

limit in the mean

Implies convergence in probability but not convergence .a.e ..


Ref:

Cramer -

Annals of Math. stat. 1947.

~ a>

-542_

Law of the iterated logarithm


Ref.: Feller, Chapter 8

3.

st. Petersburg paradox


X = 2n with probability

Note:

Ln

~n

=1

e.g. Toss a ooin until heads comes up .- oount the total number of
tosses required ( II! n) .... bank pays 2n to the player.,

E(x)

III

for a fair game, the ' entry fee should be equal to the expeoted
gain, therefore, this game presents a problem in the

determinat~

of the fifair" entry fee.


Ref.~

Feller, p. 235"'7 -. he shows

~ ~n~:in ~ l>~}~
1

tha.t is, the game 'lbecomes .fair It if the.. entry' fee 'is .n. 1n n.

... 55CRAPI'ER "N

ESTIMATION (Point):
Ref:

E. L~ Lehman, "Notes on the Theory of Estimation U, U. of Cal. Bookstore


Cramer -- chc 32-3

F(x, 9) -- a family of distributions


X, 9 may be vector valued, in which case they will be denoted
I

!,

8 ~

9 s

n -- parameter space

Example~ 1;

if Xl' X ,

~ ., Xn are NID(~,

i)

then

consists of all possible ~

and all positive (l

2: F(x) Fs

7'

the family of all continuous distributions

is then the space of all continuous dof 0

-- this is the non-parametric case


- might wish to estimate Er(x).

1
co

x dF(x)

"'00

provided that we add the restriction that E.F(X)


Estimate of g(Q) is some function of
close to

from R to
k

-f.L which

<: co

in some sense comes

g(Q)

Or in the "general decision theory" point of view (Wald)


an estimate of g(Q) is a decision function d(X) and we have associated with
each decision function a loss function W [
whenever de!) ... g(g)

d(!>, Q] with

W .. 0

The choice of loss function is arbitrary, but we frequently choose

Poet;)

20~

Risk Function is defined as


R(d,

~)

... E

W[ d(!),

]1 . .

co

S
-00

W [ d(!),

dF(~ ~>

- 56 -

d~(X)

... f

R(d*,~)

E(X _ ~)2

101

~
n

A t'best" estimate might be defined as one which minimizes R(d, Q) with respect to

d uniformly in

Q.

R(d, Q) 15 R(d*, Q) for all Q with d* any other estimator ..


consider the estimate d(x) of g(g) defined as d(x)

..

g(Q)
o

Hence amiformly (supposedly) minimum risk estimate can be found only if there
exists a d(x) such that R [d(x), Q
0

J..

An example would be similar to asking which is better for estimating time -- a

stopped clock which is right twice a day, or one that runs five minutes slowo
Since a uniformly best estimate is Virtually impossible to find, we want to
consider alternative
WAYS to formulate the problem of obtaining best estimates~.
I.

by restrioting the class of estimates


l~
unbiased estimates

Det. 2l~ d(!.> is unbiased i f E [d(!.}

J...

g(Q)

d(X) is a minimum variance unbiased estimate (m~vOuQe.) if


E
~

t de!) -

g(Q)

J2

is minimized over unbiased estimates d

invariant estimates
Let heX) be the transformation of a real line into itself which induces
a transformation h on parameterspaceo If d [heX}] "" li[d(X)] then
d(X) is invariant under this transformation.
Example~

family of d.f c with E(X)

< 00

Problem is to estimate E(X)


hex) ... ax + b

Ii [E(X)]

An estimate d(X) of

~ is invariant if d [h(X) ]

d(aX + b)

Q.

a E(X) + b
101

Fi [d(Xi}

a d(X} + b

Therefore X is an invariant estimate of ~ under this


transformation.
Note that d(X) .. ~o is not invarianto

- 57 -

3. Bast linear unbiased estimates (b.l ~u.a. )

Dei'. 22: Estimates of g(Q) which are unbiased, linear in the X. and which among
- - such estimates have minimum variance are b .l.u.e.
J.

Problem

2~:

Xl' X2, 0 " OJ Xn are independent random variables with mean IJ. and
and variance (12

I is the

show that
Problem 29~

Xl' X2,

W [d(X),

0,

QJ

bolou~e. of IJ..

are NID (IJ., (1 )

co

b [d(I) - IJ. ]

III

[d(I) - IJ.] if d(X) ~ IJ.

a"

d(X)"
(note:

X find R(X,

b.

d(I)"

X+a

[note~

i f d(X) :> IJ.

(1)
the answer depends on the loss function constants only,
not IJ.)
j..L,

-- show how to determine


is minimized

such that R(d,IJ.,(1)

the answer involves (z) which is defined as

-co
Comment on this

problem~

An orthogonal transformation

.- is a rotation or reflection
-- l. .., ~ where A is orthogonal

.... I J 1= 1
.~

...-....wyi = -"ix i

...... For

II

+ a.l_x :

~ a~ ... 1; ~a'jakj
...
j J.

.Lunj=lJ.J

In general if d(~) is a function of T(X .. X )


l
2
\

R(~,

Q) .. E[W(d, Q)J

f,.. f
/

V [T(X), Q ] dF(!)

et

0'

In)

- 58Making the transformation

y ... T(x)
y ...

n-1 functions independent of the first one

Then R [d(T). 9}

f'l

[T(X). 9 ]

~(Yl!

..ro.o ) elF(Y2 Y),

000.

Y</

~arginal distribution ~conditional distribution


of Y1

~
g

II.

r'l
S

III

T(x)

of Y2' Q"11 Yn
given Y1 1

[Y1 , 9 ]elF(Yl)

WCT(X), 9] elF [T(X)]

Optimum Properties in the Large


1c Bayes Estimates

Defo 23:

If

o.r

has a known "a priori" distribution H(Q) then the Bayes estimate
g(G) is that d(G) which minimizes

)R(d, 9) dH(9) with respect to d(x)

ft
Example:

X is B(n, p) and p is uniform on (0, 1)


Let W [d(X), GJ
R(d, 9) =

&(1) .. p] 2 and minimize this with respect to


Id(X) - pJ

x ... O~

( : ) pX(l _ p)n-x

Average risk ... risk function averaged with respect to p


1

SR(d, Q) dp

- 59 -

=~

2
[d (x) - 2pd(x) +

x=o

i]

xJ(:~xli pX(l,

Note:

(a{)b
) p 1 - P dp

- p)n-x dp

a b~
(a+b+lj1

o
Using this evaluation on each part separately we have

x 0

[2
nl
xl (n-x) 1
2d{x)nJ
{x+l)J(n-x)J
d (x)xJ(n-x)J (n+lp - (x)J(n-x}l - (n+2H
+

1:1

~
a

~ r

. 1
2{) _
n+l ~ 0 ~d x

1
n+l " 0
x

nl
xHn-x}J

2
[d (X)
2d x) x+l
0
n+l - n+l n+2

~(x){x+l)
(n+2)

nrr: '

X+11
d(x) ... n+2

(x+2)J(n-x)J ]
-(n+3JJ

(x+l)(x+2)
]
m+l){n+2Hn+3)

{~+1)~X+2~ _(~
'\ 2]
n+2 J

+ (!:!:!)2 +
n+2
\n+2) n+3

x+l {n+l...x
}]
+ n+2 \.Tn+2)(n+3)

This is certainly minimized with respect to d(x) if each term in the first summation
is zero -- ioe 0 if
d(X)
.
Problem 30:

if d(X) ..

1:1

X;l
n+2

?f. find R{d.ll p) as a function of p and also

io R(d~

average R...

p) dp

if P is uniformly distributed on
R(d,? p)

os

E [d(X) ... pJ 2

(o~

1)

- 60 2.

Minimax Estimate

Def~

24:

d(X) is a minimax estimate if d(X) minimizes sUPeR(d, Q) in comparison


to al\Y other estimate d*(X)

(9)

i;Je., we get the min (with respect to d) of the max (with respect
to e) of R(d, 9)
-- or we take the inf (d) sup (e) of R(d, 9)
dl(x) is minimax estimate since it

l---"L------~
_ _- -

R('1. 1 Q}

'--------

.au!~ Q)

has a minimum maximum point

3. . Constant Risk Estimates


Def 0

25:

A constant risk estimate is one for which R(d, Q) is constant with


respect to 9

Problem 31:

II
Der.

Find a constant risk estimate among linear estimates or p if X is B(n,p)

-::J~l. de~~.,,",o~y_.w~p_~..la~~@...!~.!!!E.!e (a.~.~~':~~1~..Eertie~.5)! estimate s

26: Consistent Estimates

~-

d (X) is consistent
n -

it:

d (X)..
) 9
n p

(it does not necessarily follow that E(dn ) ~Q


Problem 32:

If' E(d ) -79


n

and J(d

HO

or that d~drl70)

then d (X) is consistent"


n

(these are the sufficient conditions for consistency)


Best Asymptotically Normal Estimates (B"A"N"Eo) .- d(X) is a B.AeN.
estimate:if:
dn-E(dn '
is A., N(O, 1)

atdnJ -

2.
lim
n-?co

if

--

d: is any other Ao No estimate, then

<:. 1

.61METHODS OF ESTIMATION
-.
A ......
~

~thods

ofdi>ments

(Ko Pear$on)

.. 91' 92,u", 9 .... equate the .first


k

sample moments to

moments (expressed as functions of 91 " 9 <1"" Qk).


2'
91' 92, <I <I . , Qk and these are the moment estimates.
example:

Xl" X2 '

then

3'
.It.

'$

population

Solve these equations for

~ are NID{l.', 02t

first population moment .. ~


second population moment ... 0 2
n
~.
2
..:J (Xi"!)
~2
<I 1
(Un" being the divisor used
, ";'--n--- .. 0
by Ko Pearson)

note: This method yields poor estimates in many cases -- has very few optimum
._ .properties
.
B -- M9thod of Least Squares (Gauss -- M:l.rkov)
let X ... {Xl' X2~

all ....
A ...

<I

a nl

0"

i~e.~

s~n

A is of rank

a ns

s
..

j .. 1

aijQ j

also
Def. 28:

both X and Q are column vectors

i ..

l~

2,

<I

ioe o , the covariances

is a least squares estimate of

if

OJ

III

* minimizes

(X - A9)' (X - AQ) .. ~ (X. - aJ.'191i-l


J.
Theorem 21:

00

e -

a is 98 )2

(Gauss -- Markov)

With the given conditions on Xl' X2' Q$ Xn the least squares estimate
~* is a best linear unbiased estimate {belouve.} of ~ ~

- 62
Proof Theorem 21:

ref:

we first show that

it' we write

g* ..

= C.1

c1

= (!. _ AC-1

A'

1
AC.. A'

1949, po 458

BiometrikaJ

At X

A!. + l

with respect to y,

= (! ..

Placket:
Lehman

where

1:1

AA

=C

since it ia
symetrio

then we are trying to minimize

i.

A ~) t (!.

(! ..

AC

A ~)

AC..1 A'!. .. AZ)

! .. Al)' (! ..
!.)' (! -

1
(1" AC.. A'o!)' Al
AC-1 A'!) + (Al) , (Al)

'!r -

1 A

(! ..

- (Az) t

now the croas-product terma equal zero

-! Az + !

eClg4

since

'-1 ' ,

) A Al

A(C

(Cool)

;:I

1:1

..

Cool

Al. +

(C..1 )tA 'A C-1 C

! Al. ..

= A'A

=I

similarly the seoond cross-produot also .. 0


hence

is minimized with respect to

happen i f

Az ..

or writing

it' (Al,) , (Al,) is minimized which will

sinoe A has maxilrrwn rank

this will happen only if' y=O

= ~ (X ... a 9 - 0" .... a .9.


U 1
iJ J
i .. 1
1

-I)

(I

a. 9)2

0"

1S S

.!

formally minimizing
in the usual fashion by differentiating with respeot to
the 9
j
n
=
-2
~
a .. (Xi .. a
9 .. () .. .. a ..9 .. 0
a. 9 ) .. 0
'1r
11 1
d~j
i
1 1J
1J j
16 S

'dl

(t

;:I

1, 2,

j ..

0, a

solving these equations


n

~.
=1

a i .x.
J J

(~

i. 1

or A X (A A) Q

aijan

91 + '-'

(~

i 1

aij&is) 9 s

. 6.3 ...
To show that

so defined

(ioe.,

' ) is B.LoUoE e
'"' C-1 A!
B~m!

consider a linear estimate

B AQ '"' Q

thus BA '"' I

note that o-2BB ' is the covariance matrix of B! -- the elements in the
diagonal are the variances of the estimates ~ ...., we thus wish to minimize these diagonal elE;1ments (by proper choice of B) subject to the
restriction that BA" I

(B .. C1A') (B ~ a1A'), '"' Bit. B{A')'(C-1 )' _ C1A'B' + C1A'(Ai)'(C l )


using the relationships that BA '"' I
(B_O..1A') (:B..O...1A ')'
thus

BB 8

..

(0...1 ) I

..

0-1 ... C-l

III

0-1

'"' BB' C-l

BB' '"' C1 + (B CnlA') (B ... C1A')'


or if B C...1 A'

minimization will occur if (B .. O-lA I) '"' 0


hence

0 '... 0 or (C.. l ) t

\ AOf A '"' 0

* are

example:

if

(~'"' C...1A '!)

B~LoUoEo

JS.,

X2J

.~

common variance

Xn are uncorrelated with E(X i ) '"' ~ and

ci

then the least squares estimate of ~ is f


1

let A

.,

here

1 (we have a lxn matrix of 1 s)

o '"' A'A '"'

0...1= ~

'"' 1 ~ x... f
n 1=1

= 1"

2,

0,

t i are known constants


find b o l" u. e. of

col., ~

and assume

~ ti
1

using theorem 21

lOl

].

64Aiken extended this result in 19.34 to the case where the X. are correlated and
we mow the correlation matrix V (up to an arbitrary mul tlplier) -b.lou"eo are also least squares estimates which are obtained by minimizing

(! _ A f~) f

V I

(! _ A t~)
ref:

c.

Plackett:

Biometrika, 1949,

p&45~

Maximum Likelihood Estimates (ref. Cramer oh .3.3)

Defe 29:

Likelihood function
L

III

f(JS., X2 "

I)

0,

III

p(Xl, X2 '

I)

."

In' ~) if the I's are continuous


Xn'

Q) where the pts are discrete probabilities


if the Xt S are discrete

i f the Xi are continuous, independent,and identically distributed

L .-n-t(X., Q}

1.1'

or ln L

II:

= 0

i ..

1, 2,

Q ,

~ ln
1

(X., Q)
1

Regularity conditions needed in the maximum likelihood derivations (ref: Cramer


500-504)
1.

2.

The

are continuous, independent, and identically distributed.


~ will assume first ttat ~ 1s a scalar..
~ ln (X ' Q)
.
i
is a function of X. and hence is a random variable.

dQ

we assume that

"lln (X
';)gk

.3.n is an interval and

4.

dk In t(Xi ,
09k

Q)

<

i , Q)

exists for

= 1,

2, .3

(the true value of Q) is an interior point

Fk(X i ) which is integrable over (-

00,

00)

- 65 E [F (X )].(M for all


3 i

50 E

ta;nQfi~2

i.e., it is bounded

...-

J(O ln~f~", Q)~(,,1'

O<k2< 00

Q) dxi k2

If f , n satisfy the regularity conditions, and i f L, or

Theorem 22:
"'-Ma'

In L, has

a unique maximum, then


the maximum likelihood estimate ~ is the solution of the equation

1-

,In L
0

09
1\

2- 9 is consistent

3-

rn (~ . . 9o )

Proof:

1-

is asymptotically normal with mean 0

'9t~ L is continuo'lls with a continuous derivative,

f
n

L(a

2..- to show that

[o~l~i

since
In

and varianca

In f(X1'

has a maximum,

\1~ L.. 0

at this

if

max~

is consistent

summing each term on both sides of the equality, dividing by n, and doing
some substituting
1
-

aIn L
e- 9

1
...::i.
n i=l

s:: -

d In f i
~Q

~ [~.21n f i

+ - ...::i
Q=9
n i=l
o

Q2

9=9

( 9-9)
0

z
(Q-Q )2
+ - ~ F (x )
0
n i-l 3 i
2

the term in the third derivative is replaced by the term in


regularity condition 4a -- and multiplied by the factor z(O$z-il)
to restobe the equality
this equation can be

!..n

a In L
~Q

written~

(9_9 )2

B + B (9 _ 9 ) +
0

BI)
....

20

-----~-----------------~~--------------~---

-66note that we have:

f(x i Q) dxi 1

..00

from which we can Show:

also" differentiating a second time we have:

- 67 ..

-----~--------------~~----~----------------

Thus, by Khintchine's theorem (noe 17)


B

Let S

= the

~O

B2

2
.. ..k

> E[F3 (xi >] <M

set of points where

I B0

IL.. ,}

We can f'ind an

~~

Bl

n j given

Pr[S] >1 - s

0.11 such that

8j

In S the right hand side (rohos.) of' the Taylor series expansion is to be
considered"
Consider: Q = Q + 6
o

rohas~ ... Bo + Blo

~ B2 62

~ F} (1 + M) ,., ~ k 2 8
k2

So that" 1
Considering:

Q ... 9

<

the l"ah~Sc

2{1+M)

~ 0

2
8 +

6" 9 0 + 6) and

in the interval. {at the root)a

.. M5

f'orthe same

Hence, in S, which occurs with probability

__

~ 0 "

- 8

'? . .

%'Oot in the interval (9 0

letting z 1

= -(M + 1)6

2 +

k 26

<~(l+M)

> (1 - 8

~ a"'al~

L .. 0 has a

ln L has a maximum

- 68.1\
Q

is then the maximum likelihood estimate and the solution of the equation

! ~lnQL
n

~..,

whioh yields

($

... B + B
l
0

Q )
0

+ !2 z B (Q
2

Bo

... _ _

co

_Q

)2 .. 0

_
.

B +2" B2 (Q - go)
1

Multiplying both sides by

k..r;;

We lmow that:

1
]
V [ Bo J ... ~ E

therefore

[nB o

is

[d d

ln

AoNo(O, 1)

Bl-~p->~ ...k2

B <M (i>e()~ it is bounded)


2

9-Q---0
o
p

Thus, by the relationships we have jUst stated, and by use of theorem 18

O~

2"

1
ln

III

E[

d~

f.i

----------------~----~-------~------------Example~

f(x)'" a a-ax
Find the moloe o of a, and its asymptotic distributiono
n .
L .. an a-a

Xi

.. 69 n
In L n In(a)....

xi

1
..........
..

i-l
n

.~~

therefore

We can easily verify that E(x)

= -1a ,
c>

E(x)

=-1a

'V

or a .. -

by the method of momentso

is A N(O~ 1).

or

--------------~-----~-----~----------~-----

Problem 346
-

2 ..a 2x
f()
x a e

find the m"loe& of a

x ,>0
and its as;ymptotic distribution.

-- verify that the same result could be obtained by a Taylor Series


expansion.

Problem 35~

--

Xl is Poisson ~
find the

~lc:>eo ot

(i = 1.1' 2, "

.J

n)

and its asymptotic distribution.

- 10 Problem 36:

X is uniform on the interval

- ... Find the mol.e", of a

(0" a) ..

[not by differentiating]

_... Correct it for bias and find the asymptotic distribution of the
unbiased estimate (~!, e
-- Another unbiased estimate is

* .. 2 ..x

Compare the actual variances of 1\,;


a, a*
Note:

a * is a moment estimate -- for another comparison of the method of moments


with the method of maximum likelihood" see Cramer p., 505"

NoteZ

The meloeo is invariant under single valued functional transformations i"eo,p


1\
the mol.e o of g(9) is gee).

Remark~

If dn(X) is

afn)

A N(j.L"

g[dj is oontinuous with continuous first

and

and second derivatives and the first derivative


then

F" [g(dn ) -

g(-IJ.).J

is A" N(O,

[g'(~)J2

.;, 0 at x=1J.
(12).

From 'Which we can get

Hence by use of theorem 18

"---~)

rn (g(d~
(j

... g(lJ')] is A N(O.9 1) "


g (j.L)

------------------------------------------Multiparameter case:
Theor~~~

Under generalized regularity conditions of theorem 22" if


~ "" (91 , Q 2 ,9

"

"

~!I 9n )" then for sufficiently large

and with

- 71
probability 1 - t, the
of the equations
0

III

and furthero

mol.e~

9 , Q2" ,
1

~o

~s

13

9s

."

V"" ... n

~, S

has asymptotically a joint normal

distribution with means Ql' Q2"


matrix V-l

where

is given as the solution

i 1, 2,

~o

"0

of

and variance-covariance

13

13

l
ri
In.1.\
Vlni
d
( dQsQl
s 2)
E

9 (0

(:lln f \
o

E (d2
;:

This is the so-called information matrix used in


multiparameter estimatIOn o
Sketch of part of the proof:

oa

d::l :

L
i ~ ~l~iL toa

+ second and higher degree terms

(Note:

ln L terms can be replaced by

In f i

CI

i 1" 2, , s

n" 111 f i termso)

From theorem

22:

lnL...

~
1

ln f(x i ,!)

E( ~~) E(~ P~l:jii1) =0


a

2
. E(ad ~
Q

Ii) . E(dl~ f2:)2


"l7

We also need a set of covariance terms:


2

ln
_ E f:.d ln f i ) .. E [-( d f i .) (.

C-d Qj

Qk

~Qj

~dln fi)J
Qk

which follows in a manner similar to the derivation of the variance


expression in theorem 220

- 72 ..
Now we oan write:

------:lJo)
p

n
b ok
J

!.n ~
1

II!

The maximum likelihood equations can be written (ignoring the second degree terms
in the expansions):
~ ~

A ~ 9 )
.. B (g

in matrix form or completely

written out as:

1)

(i.e., .. each element b jk replaced by E Lb


jk

B -p....--~> E(BJ

For la~ge n~

[ J -1

B E(B)
-B
o

or:

p ) 1

(~ -

B [E(B)]-l [E(B)]

13

($ - gO\.
[E(B)

-~(B)

J -1 Bo

gO)

13

(E(B)](Q -

(P)

"

V and is non-singular.

CI

S' ..

Variance-covariance matrix of

gO =

E(B) -1 [ var-.cov
V

Example of the information matrix used in the multiparameter estimation:


0

f Xj afj b
(

...

'fCE'
a

-ax b-l
x

X;>O

a 0

ln L .. n b ln a ~ n lnreb) .. a

Xi .., (b ... 1)

defining ~ ""

b'70

~ lnx
. n
i

- 73 ~ln

Lb."

....
n . . a a ." ..
a x 0

1.
dln L.
n~b

ln a -

r/i/f~t

+.~.o

From the first equation a:;


x
Substituting this in the second equation we get:
In.2 - F2(b) +

Xr. 0

ln (' (b) )

defining

lnb-F2(b).lnX-~"

Note:

Pearson has compiled tables for F2 (b) which he called the


di-gamma function
.
..

Thus we have:
.....

'd2 "In

da 2
~21n L
b

oaa

'?J2 In L

~b2

1
a

7
n

V -

IvI-

-1a

F (b)
2

[+ ~ F~ (b) -~ ]

note:

F2 (b) is called the tri-gamma.

The asymptotic variances or covarianoes are thus:


t
)
F2(b

1\

of

of

a 1\

a,P

JVI

J1.!.
Iv/

2
F2'(b)
a
n [bF ~ (b) - 1

----rOt- - - n[bF 2(b) - lJ

... 74

...;n

b);

(~

means

Vn (~ .. a) thus have a joint normal asymptotic distribution with


and variance-covariance matrix:
2
a F~(b)
a
b

F~(b)

:l
b

a
i

b F (b) .. 1
2
Exercise:

f'(x;~,il (] ) ..

j2"";'-e(] >1

.... Find the moloe" of

and

j.l.

(]2

(x - f,/:)2
2
20-

and find the information matrixo


n

..., The
....,.

mol~e

of

a and

~ (xi" i)2

1
are x" - -n - - -

~2

~3 '6-2

are independent:J therefore the covariance terms in the information


matrix are == 0 0

.. oo.(x <00
Note~

This is the so--called Laplace distributiono


It is an example of finite theory ."" the m.l.e" is not a minimumvariance estimate for any finite sample size.

a) Find the moloeQ of

j.l.

(based on n

independent observations).

Does it satisfy the conditions of theorem 22?


b) Is
Problem

38:

an unbiased estimate of
X is Poisson ~

A"

&

Y is Poisson ~ j.l.
Find the m~l<!le(l of

~? find a2
x

why not?

j.l.

We have a single observation"


0

and also their information matrix~

.. 75 ..
Check that the same results are obtained by expanding
about

I-t-I henoe this result is asymptotic as ~

Xi

Problem 39: a

ti'
~

in a Taylor series

ClO

I" Yare independent

is

a">O

b70

j 1,2, ... ,n

Find the m.loeo

matrix.,

ot a$ b and also the asymptotio variance-covariance

- 76 ..
D.

UNBIASED ESl'IMATIONe

Theorem 24:

Information Theorem (Cramer-Rao)

If d(X) is a regular estimate of Q and E [ d(Jt)

possible bias faotor) then

I:

9 + b (Q) (where b(Q) is a

for all n
where

-- the equality holds if and only i f d(X) is a linear funotion of

)~ngf

Regularity oonditions for the Information Theorems


1&

is a soalar in open interval.

JS.,

~,

ou.t

n are independently and identically distributed with density

f(X.? e) 0
2 '"

t~

exists for almost all x (the set Where; ~


on e).
(Noteg

Problem 36 where f(x)

I:

does not exist must not depend

~ (i.e o uniform on (0, a) ) would be an

exception to this oondition) 0

'3.

40

f(x, 9) dx can be differenUatedunder the integral sign with respect to 9.

d(x) f(x" e) dx can be differentiated under the integral sign with respect
to 9&

5. 0<k2~
Proof!

CD

From the proof of theorem 22 we remember that

Differentiut:i,ng both sides of this

~quation,

we get

- 77 -

wh 1.0ch ca n al so b e wrl.Ott en E

k(!) 4dl~
L

The correlation coefficient of S<,~)

(E[S(~)
-

Note~

O"d(~)

d<l) ] )2
2
O"s (:1)

(S<!>l

E [d<!:)]

I:

1 + br

Oln f(X)

dQ-

and d(X)

term vanishes since E CSl.!)

] =0

or

now

therefore
[1 + bR(e)] 2
'D in f (1)]2
n E -'09

Since r = 1 if and only if the random variables are linearly related it follows
that the equality in this result holds if and only if d(~) is a linear f!J.nction of
S(.!)
!2rC'Jrlplea

JS.,

X2 > ~I);"

2.
2
Xn are NID(&J" 0" )$ 0" known.

Find the C.,R lower bound for the variance of unbiased estimates of
jJ.)2

2: (\ _

L=

1.
(2 n 0")n/2

In L = K ""

2 )v

jJ.1)

- 78 -

.~

Hence any regular unbiased estimate d(~) of ~ must satisfy a~


2

but

ax

*2

Example (Discrete):

"minimum variance unbiased estimate (mov.u.e,,).

x is a

hence

Binomial

x is B(n, p) -- find the C-R lower bound for the uJiased estimates at p.
L

n )

=( x

In L

pX (1 - p)

K + x 1n P + (n - x) In (1 '" p)

:or

din L. ~

op

.. ?;}

n-x

In L
p2

E.:!

-:-

I-p

x
=~
P

;:

['d J= ~

Hence

In L
} p2

i >- R\l-p)

~:

d~

but

(!)

YI

'\

\-f )

n-X
(1_p)2

.. E

Y- '..L~~)

- - 0()

n(l-~

(l-p) .

n .
p(l-p)

= n k2

= E(l-p)

Recall that the C-R bound is achieved if' and only if de!) is a linear function

"a'dlQ.f --

of

if

is not a linear function of

d~~:f

then ~ will not achieve

the C-R 10W'er bound.


Problem
Example:

40:

Find the C-R lower bound tor the unbiased estimates of the Poisson
parametsr A. based on independent observations X:t" ~, ..., Xne

Negative Binomial (Ref. Lehman ChI 2" P. 2-21, 22)

L = Pr(x)

In

L = In p

= pqx
+

x In (1 - p)

1
x
=--p
I-p

i.e., sample until a single success occurs

.. 79

E(x)
+

p(l-p)

= I-p
p
= n k

p (l-p)

thus
To find the M.L.E" of P8
1
x
=- - - o r
p
I-p

1\

p .., l+X

- .. it oan be shown that this estimate of p is biased,

E [d(X~
or

= d(O)p
do +

+ d(l)pq + d(2)pq2 +

~q

+ d2q2 +

.00

c~o

+ d(n)pqn +

+ dnqn +

ooeo

fOO

5 1

If this power series is to be an identity in q we can equate the coefficients on the


lett and on the right
do

=1

=0

The unbiased estimate is thus

~. == 0

p* = 1

if x.., 0

=0

i f x~l

ioe .., the decision as to the status of the whole lot is based on the first
observation"
This is unbiased, but o~o.o
(this example is used to show Why we don't always want an unbiased estimate)

i *' = pq~p2

(the C-R lower bound) -

thus the C-R lower bound can't be met

p'

.' since this estimate is the only unbiased estimate by the uniqueness of power
series o

- 80To find an unbiased estimator we can try to solve the equation

.J
co

de!) f(~" Q)

dx

==

for the continuous case.

or
for the discrete case.
These equations" however, are in general rather messy and difficult to handle"
Problem 41. f(x)

k xk- 1

1.

Find the C..R lower bound.

Is it attained by the M.LoE o?

exte~sion

Multiparameter
We have

I:

fey 91 , 92"

'"

of the Information (C..R) Theorem


9k ) ..

Assume the same regularity conditions as for Theorem 24 in all the Gts.
Denotes

>tl

""ooe

~k

..
..

A=

"

"
~l

. / \ is non-singular

..

"""

as in theorems 22, 24
Let d(!.) be an unbiased estimate of
BO

~(~)

that

"J.

f{y 91' 92 , "., 9k )

By differentiation with respect to 9


1

E [d

81]

=1,'

By differentiation with respect to 9


Eld8 j

==

O~'

= l\ ~

- 81 Theorem 2$,

Under the regularity conditions stated

where

1. = (1"

O~

000"

0)

k components

or

~2

~k

fle.

0
0

'1ca

ad:!.

"u 0..

~k

f)
(>

"
0

e..

E [d:!. - 91 constants to be determined.


Proof:

Observe that

a i 8i

J2~ 0

where the a's are arbitrary

Which then becomes

k
ad:!.2

2 a1 "" ~ ~ a. a j A.
. 1 J=
. 1 J.
J.J
J.=

Call the right hand side of this inequality

r(J<,!)

We wish to maximize<:V(a) with respect to a to get the best possible statement about
the bound of the var'iaii'ce..
k
(1) (!)

_ _ _ _T

~
.L--.J;l.:=~l

= 28- -

a J.. AiJ.. -

i"

a.a j X..
J.
J.J
'---'---=-------

'_

- 82 -

d~~CP
or

=2

2~>-.u

- 2 ~
.. 2
J=

a j "\A' j
~

j=l

or

A..

~J

=1

~ a.

j=l

Aij

=0

Hence the equations in ! to maJdmize fJ(~) are


k

a'
j=l j

A:t. =1
J

~ a~ ~ .

j=l

where a

= 0

= 2, 3~ OQO~

is a maximizing value of a j:';.,.

How, multiply the i th equation by a? and add all the equations.


J.

The left hand side of the sum so obtaimd is


k

i=J.

a?
J.

~ a~ A. J~

j=l

J.J

=~~

;max

Therefore

(~)

aO

0 A.

i~ j. i a j

The right hand side is a~"


Hence (ZJ

= 2 a~ - a~ = a~

''ij

- 83 -

or~

as stated in the

O"~

v
I1

theorem~

>'22

C$$

c-

"

Pel

"

\:2

D1

o_1t

.1:.

An
0

III

E [SiSj

8.

1.

Examplei

00.

d 1n L
=-e"

A:Lk
~

Aij

--

7'1':k

"

<>

Remembers

~k

-ax b-l
ab
:rex) = - 8
x
reb)

x>O

-- the generalized Gamma. or Pearson's Type III


... a, b are the unknmm parameters
Find the lower bound for the variance of the unbiased estimate of a",
In L

=n

In a - n 1n
nb
III -

denote

82

= .~lb L. = n

Au=
A_
."2 2

r(b)

2
E [ 81

r (b) - ax

+ n(b...1) 1 n x

..,
-

In a .. n

fo [

1ll r (b )]

J.. F2 (b)

-1 = -

= E [822 J

:I

'd 2

J.n L

In L

nb

da"2-- "";~
2

d b 2-

= n F~ (b)

+ n 1n x

- 84 -

-1

F~{b)
a F~(b)
2

=--~;;...---

n [b F~ (b) - 1

- ~)
1

Note:

See the

e~'l"lJ.ple

for Theorem 23.

~I We started the proof with E [cL

J.

- 91

- ~ a.
i=l ).

_2

s. ]
).

~ o.

The strict inequality will hold unless


oonstant"
Therefore;

[d:J. - 91

~
1

a i 8i ]

If the multiparameter C...R lower bound is attained,


k

function of

'i.

is essentially

is a linear

~ a. 8.
i=l ). J.

and, as before, the M.. L\OE. (corrected for bias :if necessary) attains the
lower bound (IF there is any estimate that attains that lowe!' bound).
Problem 42:

P:r

X:t"

QUI

[x = x ] =

Xn are negative binomial


1'

X -

l' ..

1\

) pr qX

p +q

=1

Find the C-R lower bounds for the unbiased estimates of p

1.

If

l'

is known

2.

If

l'

is unknown

85
Problem 42 (b):

Show that it I
with density

I A.

is Poisson ( A ) and A. is distributed


b
f( A ) ..!.... e- a >. Ab- l

f1b)

then X is negative binomial.

Identify functions of a, b with p,r.

~-------~--~------~~--~----~--------~-------

Summary on usages re m.l,e.:


We have the following criteria for estimates:

1. Efficiency (Cramer) -- attains the CaR lower bOWld.


2"

(Asymptotic) Efficiency (Fisher) ...- (n) (variance) ~ ~ in the limit.


k

3. Minimum variance unbiased estimates.

4.

Best aSYmptotic normal (b.a.n.) -- among A.N. estimates, no other has a


smaller asymptotic variance,

Properties of m.l.e"

10

If there is an efficient (Cramer) estimate (i.e., it the estimate is a

linear function of
2.

Jo

d~nJ=.

) then the m.l.e. is efficient (Cramer).

If the m.l.e. is efficient (Cramer) and unbiased it is JIlc,v.u.e.


m.l,e. mayor may not be m.v.u.e.

other

Under the ~neral regularity conditions, the m.l.e. is asymptotically


effie ient (Fisher) 0

4& Among the class of unbiased estimates, then the m.l.e. are b.atln., otherwise
(ioe., if the m.l.e. are biased) we can not say that the m.l.eo are necessar
b.a.n.
ref: Le Cam; Univ. of Calif. Publications in Statistics;
Vol. I, No,) 11, 19.$3
-- The class of efficient (Cramer) estimates is contained in the class of m.vou.e. -which is the real reason we are interested in attaining the CaR lower bound.
-- 1, 2 are finite results, i"e. for any no
- 3, 4 are finite (asymptotic) results -- for large

n only.

.. 86 -

Eo

MilV"u.eo "".. COMPLETE SUFFICIENT STATISTICS:


."" Let X be a random variable with density f(!., ~), ~
~-

eno

T(X) is a statistic, ioeo, a function of the random variable (~)o

"".. Assume that the conditional dofa of


De1' 0 .31~

X" given T. t, exists",

T(I) is a sut'1'icient statistic for


T

1:11

if the distribution of
is (mathematically) independent of
that iS i f in
Q

1'(!.t 2,) g(T,9


h is independent of

.)

h(xlT)

~v

llt~!!:

NO e 1 -- suppose Xl' X2~


f(x, ~)

..

1. n
(.[2n)

CJ

Xn are N(~, 1)

~(r.i ... ~)2


e'" . ; 2 -

is thus sufficient for


.- distribution of

~::>

~x is N(x,9 1)

No.2 ..... Poisson:

~Xi) ~

Tfii ~

"'------v---------""
heX IT)

X given

- 87 -

~Xi

~ Xi'

is Poisson (nA); given

IJ!1ltinomia1 and T

=~ xi

n i

the individual observations are

Ao

is sufficient for

NO(J> :3 -- Normal

Here sufficient statistics for

IJ., ,,2

x;

are

~ (xi _ i)2
1

Problem 4:3~

Find sufficient statistics (if any) for the parameters of the following
distributions:

a--

a . e-ax xb-1

f(x) =

Gamma

x>o

((b)

b--

power

f(x)

=k

C"

Beta

f (x )

III

d..-

Cauchy

f(x)

x k- 1

pta,1

O~x~l

xa ...1

b)

=; (

O~~l

2)

1 +(x .. lJ.)

e-

nego binomial

p(x) ...

f--

normal

mean:

.,(, +

variance;

(1.. x)b-1

+ r 1'" 1) pr qX

r-

~ t

mown

,,2

--------------------------~----------------

~~:

for any real number


E

[X - 0)2 ~

ETfEx(XLT ) _

~2

..
- 88 Proof'~

E [x21 t]

where t is any particular value of the


statistic Til

[E(X t)]2

E [XJ ... ET[EX(Xt T)]


E [X 2].

~ F~[E(XI T)J2

E.r[EX (x2fT)]

we get

replacing X by (X - c)

tx -

cJ2 ~ ET[Ex(X

Proof:

coo

c f T~2

ET[Ex (X/T)

co

cJ2

Since T is suffioient then the distribution of d(~) \T


of Q and hence

is independent

t(T) ... Ex~(!)IT1 is independent of ~e


1-

2-

E[~(T)]" ETfEx EtC!) IT])

In the result of the lemma put de!)


then it follows immediately that

O'~

..

E[d(!) -

= g(~)

... E@(!)]

g(~IT2 ~

III

ET['t'(T) -

g(~)

.. c

g(~)J2

~ 0'2

/' 'r
" : .. Of

.. 89 Examples:
1....

Binomial

Xi is B(l, p)

1:1

1, 2,

.,

x ) pI(l _ p)n-X
n

OJ

where X

xi
1
(since f(x) is ordered, the coefficient
term is omitted)

X is a sufficient statistic for p.

JS.

E(X ) ... p so that


l

'C1

xl

is an unbiased estimate of

pel ... p)

eo

.... by the theorem


(X)
smaller varianceQ
Xl

in n trials we have

PI{Xr
Pr[Xl
Pr[Xl

1:1

success I X]

= olx
I:

llx]

X successes
II!

X/n

-1 .. -X

"" n

therefore

44:

...

Negative binomial:
show that

10

2B
find E[Xli
Example No, 3:

will be unbiased and have

1 if SUCCEUSS on trial 1
0 if failure on trial 1

II

II

2Problem

p:>

xl"

-Xn
lmown

X is sufficient for p.
(1 = Xl) is an unbiased estimate of p,
nO"iiegXi == 1 if faUure on trial i
0 otherwise

..

(0, s)

Uniform distribution on
Z ... max(Xl ,
E[X i ] .. ~

'f (Z)

II

x2'

01

so that

E[2Xil ZJ

Xn)
2X

II

~Xi

is sufficient for

So

is an unbiased estimate of 9.

will be unbiased and have smaller variance Gl

or>

proof~

90 -

If Xi is one of the observations less than

uniform on (O~

Z then

J 2( ~ ) ... Zc

If Xi is the maximum (1. e ~, ... Z) then


E [2xil

zJ

I:

...

is

Z),

then E [2x1IXi .c Z

Thus:

Xi

J 2Z ..

E[ 2xi Z

~l Z + ~ 2Z (1 - ~) Z + ~ Z
(1 + 1) z ... ( IL,+ 1 ) Z
n
n

4- Given that X is a Poisson random variable,?


observation) ..

T'" ~
Xi is sufficient for
~

we know that

A and

.J.

P (T) ... e-nX

(alSO X is mcl~ee for

(~;"j

(see example No

~]

e- A

PrLx 0]

..2

so that

is an unbiased estimate of

Define the estimate of

.. 0

if X.J.' :> 0

then nO"

~Y.
1

1.

e-'A.

2 on po 86 )

A and e-X is m~loeo of e- A ')

What is an unbiased estimate of e -X ??

E[

e-'A. ..

91III

III

.\j..J

To show that I (T)

III

1 -

n1 )T

r.

1 Ln(
Ii

III

T 0

1 )T]
Ii

is unbiased:

CD

E ['reT)]

1 ..

(1 - l)T
-n

( 1 _ ~ )T e -n>..
n

(n>..)T
T :

.~ en>.. [(1 *Hn>..)I:: -e-n>.. e n>..->..


T :
. Problem

45:

III

e ->..

Find the variance of


Compare it wi. th the CoR" lower bound for unbiased estimates of e->'"

--~------------~---------------------------

Sufficient statistics are not unique:


Example 1:

Poisson

f(x) ... (

~xi I ~f

is sufficient; but

{2n

!n = X

is equally goodo

)n

are sufficient for

""~

however

~Xi

gives no more information about I.L than does f

is not necessary and

92 -

Example

4: Same as No o 3, but
then

T...

~ (Xi

this set gives t.ha same inforM


mation (need ~o combine them
to estimate a )

is known"

lJ.

"" lJ.)2

is sufficient for

cr2 ~

-~-------~~~-~~----~-~~~-~--~-------------

Remark:

(aimed at problem 43.l1 but holds in general)


The i'llol:le~

is .~function of the sufficient statistic(s)

since L .. g(t, ~) h(b !)


ln L ln g + ln h
III

since it is
independent of

The solution of this equation (which gives the Molee.) obviously


depends only on T"
D!f. 32:

A sufficient statistic :is called complete if


implies

, Theorem 27:

Proof:

-=

h(T)

O~

=9 denotes identically

equal in

If:T is a complete sufficient statistic and 4(T) is an unbiased"


estimate of g(Q) then d(T) is an essentially unique minimum
variance unbiased estimate (mOv0uQ eG)e
Let

dl (T)

be any other unbiased estimate of

E [d(T}]
Q

II

g(Q)

gee); then we know that

., 93 Eg[t\(T)]

g(Q)

thus by subtraction Ee[d(T) - dl (T)

J..

0 for all

Q"

The completeness of T implies that d(T) - dl(T) 0


d(T) ~ ~(T).

or that
Further -- let d i Qf}
that E[diQi T)]..

(see defo 32)

be any other unbiaS~d estimatorG


reT)

holding only if

o~ ~ O~t

is unbiased and

We mow from theorem 26


with the equality

is a funotion of To

dv

'f (T)

But, by the first part of the proof

III

d(T)~

Hence, if we start with an unbiased estimator not a function of TJ we can improve


it. If we start with an unbiased estimator that is a function of T, it is d(Th
This is the contention of the theorem
Q

Example No"

1:

X is

Binomial

B(n, p)

For completeness of X; which has previously been shown to be a sufficient


statistic$ we need
n

x .. 0

heX) ( ~ ) px (l...p)n..,x

~a
p

======~~> heX) .. 0

The left hand side is a polynomial in p of degree

n"

For this to be identically zero in p implies that all coefficients are zero
which means h(X)" 0 for every X" 0" l~ 2, ~ 0 ~~ n,
therefore

heX)

hence
Example No.

2~

pta

and X is a complete sufficient statistic"

~ is m.vouoe e

Poisson
T

III

~Xi

is sufficient

and is Poisson (nX)o

For completeness, we must have


co

~
1;1

h(T) e.. nX (n;i


0

=0
1.

implying h(T) .. 0

on the integers

- 94 CD

or

h(T) (nA)T

T 0

TJ

1\

Such a power series identically zero means each ooefficient "" 0


so that T is complete ..... therefore

3~

Example No o

Normal

Xl' X2 ",

<)

.,

T
n

[h(!}

N(~, 1)0

Xn are

N(lL'~)

e~ ~]e-nX1L eli

~s

CI

IL 0

is a bilateral Laplace transform

By the theory of Laplace transforms (ref~ Widder Ws book) if the Laplace


transform
0 identically, then the function = 0 or
nX 2
heX) e"" -r
0
which implies that h(X)'" 0,

=-

III

therefore
If

Xl' X2' ~

complete for
Remark~

(J.J

X is completeo
Xn are

0",

2
N(lL, a)

then by a similar argument

(..X, s)
2

are

a2 c

lmy estimate which is unbiased and a function of

g( lLJI aC L

of

In particular, if'

2
2
g(lL, a ) ... lJ.2 + a 2 "" E[X ] ,

!n ~X2i

is an unbiased estimate of

~ ~X~

2
-2
.. (noel)s + nX
n

J.

= 0,

A,9

is m.v.uoe. of

is sufficient and is

-fin- .- '!f S

or h(T)

1J.2+ a 2

1 ~ 2
so ii A::d Xi is me V 0 u. e.

of

2
2
IJ. + a

- 95 !:oblem 46;

Find m"v"upe .. of Ap where Pr[x (hpl "" p"


and Xl" X21

."

are

Y1 " Y2,

~p

In are

Xn are NIO(~I a )

N(~.t

a)

~(<; a2 )

,
Sufficient statistios ar-e

2
(X$ Y', sx'

s;>
~

(i~e","

we have four suffioient statistios for three pal~ametersip


thus this suffioient stitistic is not oompleteCl
for oompleteness~ the suffioient statistic vector must have the same
number of oomponents as the parameter vector.
for an example of what oan happen by foroing the oriteria of' unbiasedness
on an estimate p see Lehmann p~ 3~13, 14@
Non-parametric: Estimation (m"vouoeo)
Let
The

1S.,

X2 'p

prob~e~

().t X be independent random variables with oontinuous dof. F(x).


n
is to estimate some funotion g(F), e.g"
9

00

g(F)

IS

)X dF(x)

a:

E[X]

provided E(X]

<.

co

-co

g(F)

i o e v5 we want to estimate the density function

F(x)

. g(F) "" F(a)

g(F)

g(F)

1:1

F(b)

""
~

Pr[x ~a1

F(a)

(Xl" Xn) such that F(Xn ) - F(Xl ) ~ 1 (two-sided toleranoe limit problem)

co<.

., 96 Theorem 28g

If

d(Xl ; X2.l1

E&qp J =
Proofg

()

0,

g(F)

JS."

In) is symmetrio in
then de!)

Suffioient statistic is T

is mov.u.eo

= (~Xi" ~xi"

X2"
of

'I

0$

<l

<)

0$

Xn and

g(F)"

~X~)e

Consider the n equations:

..

(I

These equations have e.t most

nJ

solutions 0

Assume (as is true with. probability 1) that all the I's are distinot -'" :1:f.'
x_, X2S 0 0 "I XIt is a solution, so is ar.:y permutation of the XiS"
-j,
There are nJ
solutionsQ

permutations of the Xts se these give a oomplete set of

Since the sufficient statistic may be regarded as a set of observations"


order disregarded" any function of ~ is s,ymetric in
X21 0 e I , Xn~

Xl,

To show completeness, we must show


) geT) dF(T) F

==)~>

g(T) .. 0 "

Consider the sub..family of densities

,
i

... 97

where:

p(tl' t 2J

t ) is polynomial in the t's obtained


n
from the Jacobian -- it does not involve the Q's and is
non-negative.
0

.,

..~ Qjt

CD

Hence, if

g(!.) e

jUll

p(~)

-CD

by the same type of theory as :in the normal case (uniqueness of the bilateral
Laplace transform) this implies
or g(l)'" 0
Example 1:

EstiD:ate E(X).

Xi,

X2, 0 0 .~ In'
further, it is unbiased so that by theorem 28 it is mj)v.u.e.

e
Example 2:

is symetric function of

Estimate

F(x).

We have, from the sample I

l..f------~-----~--:---===--:.:::::,,=-,..-.i;'-'~derlying
population dof.
sample d
-<.-:",~\~

01

.\1:::;> ..<'~"

The observations here are ordered


in size but not by observation
order (chronological).

. ?
j'

Fn(x)

~ (:I xi' ~x)

III

!.n .~1 ~(x)


1-

where

~1-' (x)

III

if

Xi(X

]..

n
F (x)
n

= .n i=l
~

Wi (x)

Ii

indicates symmetry in the xts

It then remains to show that it is unbiased$ i.ee, for each x: E[Fn (x)
~.

J . . F(x)

For each fixed x the number of x 'Ex is a binomial random variable with
parameters [n, F(x) J-- and henc~, since the sample frequency is an unbiased
estimate of the binomial parameter, we have the required unbiasedness.

Remark~ Fn(x)

) F(x)
p

for all x

[prOOf given later]

Ii

.
98

Problem

47: Find the mov o u e4\ of


Q

Pr[X '! a]

(a known)
with dof" F(x)o

~xi

liS

is sufficient for

(12,

among functions of the form aS~

(l

a)

find the minimwn risk estimate of

b)

is there a oonstant risk estimator of

of the form as + b?

(12

~----~---~----~------~~_._-----------------

Fit x2.estimation
.- most. generally applied to multinomial
~- us~ associated with Karl Pearson~
ref ~

Ferguson.... Annals of Matho Stat 0

Multinomial

situations~

_.

58

Dec 0

Distribution~

Given ..., a series of trials

with~

possible outcomes of each independent trial


outcome~

probabilities :tor a given


and

n experiments result

in:

v1 v 2

with the restrictions that

~Pi

I;>

vk

~v
... n
,
1.

Characteristic"function~

~
T

vl' v.2' c , v k

(t1 , t 2,

.,

t k ) .. ~

(I1.

v1

:l

n
v 2,

t
e

e 1 +

u.

0,I

0".

~~

(Ple)

Ge.

(p e k,
k

- 99 -

(t 11 0,

vl' v 2 OU$ v k

.00'

0) "" (Pie 1 P2 +

II

[l1.et l + (1 -

GO.

+ Pk)n

I1.~ n

i.e., any one variable in a multinomial situat:.ion is


is binomial B(n, Pi)~
M>ments~

Asymptotic
a)

b)

distributions~

""

vi-nPi
--~ nP (l-P )

is AN(O" 1) as n --.o.J>)' CD

k
k
2
~ 2
~ (vi'",np.)
~z. = A ~. has a

i=l].

i-l

nPi

.. ')
x""-distributionwith k-l d.f o as
.

(see Cramer for the transformation from


k...l independent variables)

v l' v "
2

~ ~

parameters

a,

n-7(Q,

k dependent to

v _ have a. -limiting multi...Poisson distribution with


k 1

h:t.t

~~

I)

0" \:...1 ("i'

v2" c o o , v k-l are independent


in the limit)"

------------------------------------------Example:

(Homogeneity)
We have 3 sets of multinomial trials with possible outcomes Ep E2" E )
3
with probabilities 91' 9 2,91 - 91 - Q2 for each set of trials.

- 100 Therefore we have the following outcomes


v

v l .3

n..t

22

v 2.3

v.3l

v
32

v3 .3

vol

v. 2

v.3

12

21

In L = In[n.. 1
V ll p v12
-~
12

These give the two


(1)
(2)

n.. V31.3 p v 2l p v 22
.~
21
22

P v 23
23

equations~

91 + vol Q2

(vol + v c )

v l2 Q1 + ( v $2

v 03)

-I-

Q2

vol

v 02

which when added together yield


(3)

N Q1 + N Q2

v"l

+ v 82 = N -

v 0.3;.;

Multiplying (3) by N,,2 we get


(4)

v Q2

Ql

V 02 Q

v 2( v 01+ va 2)
N

p v31
31

p v 32
32

P v337 + 1n K
33

- 101 -

It can also be found that

-----~----------------------------------~-

!!oblem 49:

( Independence)

Consider one sequence of N trials that result in one of the

where

i=l

Given:--s

p.

1 ;

IlS

j=l

J.

1:3

1 "

general case

serie~

p ..

't"j

and also their variances and covariances.

Pi" 'l;'j

of n

trials, each trial resulting in one of the events El1o .. ,E

_The probability of Ej
k
j=l

events

Find the lll"l"el} of

x2.estimation;

rc

=1

occuring on any trial in the

i th series is p..
J.J J

J.J

-the random variable in the problem is the number of occurrences of each


event:
k
o

-the PiJ'

G,

v ik'

are functions of

, Jl:ll

vij

= ni

(continuous with continuous first and second


derivatives) 0

_expectation is with respect to

I'jlf

... ioe." E[V

ij ]

= ni Pij

the @ti" subscript is a convenience, we could consider the


big trial"

-. the use of

s trials as one

.. 102 -

The following methods of estimating .i. are asymptotically equivalent, ieee:l


(i) they are consistent
(ii) they are asymptotically normal
(iii) they have equal asymptotic '\rariances
lE>

maximum likelihood
2
m:iriimum X

20

3" modified minimum x

transformed minimum X2

4$
1(1

Maximum Likelihood Estimation:

Maximize

k
~

sUbject to ~ Pi. (~) = 1


j=l
J

for each i,9 with respect to

Q;

or actually solve the equations:

~.~ ~ ~,~ d Pti =


i

p~
:'.J -

'9

" -

(provided su.itable regularity conditions are imposed as given in


theorems 22, 23)~

(vi - niPi)
,J

ni Pij

is asymptotically distributed as X2 with s(k-l) dof p

The method is to minimize this expression with respect to


the equations~
.. 2

This set of equations could be expressed as:

or to solye

therefore the second term


R the third tel"ll1

Hence, equation

r2] is the

eql~ls

>0

as

0
N ~oo

same as equation

[1] except

for R and thus it is seen

that under suitatle regularit,y condiGions the solutions of equation [2] tend in

probability to the solutions of eqtlation [11.

Replaoe Pij in the denominator of the regular minimum ,,2


estimated value, and then minimi3e with respeot olio ~:
2
Pij
i
XI~i1' .. ~ ~ (V;ij: n )
i j
ij
subject to

Pij 1

for

equation with its

i 1, 2, , s.

Comparing the two ,,2.,. methods

X ... Xm

III

...
R.
are

2
) 0 faster than does X --....aS~l!l.pi.;otioal1y equivalent.
p

therefore methods

(2)

and

(3)

The equations to be solled in this oase are:

~~

40

Transformed

ni

(V";j.:'ni~ij)
'''.,....,'1

1I'~i.'n''!!LL

Recall that 1

In(Xn..~)

-b;-

is a.symptotioally normal~ then

and the asymptotic variance of

g(X )
n

is

cf[

S(X )-g(~)

Cng'(~)

g t (IJ.>] 2

is ll.N.

104 Before writing the necessary equation, write

vii
qij .~

thus the qij

Thus the modified minimum

x2

X; ~ ~
i

are random variableso

equation can be written:


2
~i (qiJ - Pi
.
qij

32..

The equivalent estimation procedure is thus to minimize

~~
i

D1

[g(q~)

g(P~W

Qij Lg.(qij>]

(the numberator should be Pij [gf(Pij}J2 but the difference


occasioned by the modification
>0)
for gts which are continuous with continuous first and second derivatives" and
with g' (Pij ) bounded away from zero~
This method will be useful i f the transformation; gl!ij{~~

is simple.

The equations to be solved are thus;

Summary:
.

ni [g(Qij) ..

qij [gl(qij)]

Method (1) [mol"e"

Method (2)

g~;ij) ]

J... is most useful if the

Pij
products, i.e., Pij Pi'Cj

2
[minimum x ] -- most useful it Pij

can be expressed as
as in problem

49~

can be expressed as a

sum of probabilities, i.e. Pij .. .,(i + Pjo

J-

2
M9thod (3) [modified X
Method (4)

same as (2).

[tranSformed x2] -- may be most useful in some special cases..

\) t ,1----1

t V'\\.
01-) / V\,

\) '2'

- 10.5 -

Examples of minimum - "

prooedures:

1. Linear Trend:

-vobs.

p.)

ni 2

total

prob~

Pup-A

12

P12- l - p + A

v2l

P2l P

22

P22 1 - P

Il

31

31

., P + A

v\,

v
32

P32

III

1 .. P - A

Problem is to estimate both p and Ao


2
Using the modified minimum_x
procedure:

l [vu""':J.(P-d)j2 [V~l + V~2] + ("2l-n2"Y[~ + "~2] + [".3l-n3(P+d) 2 [vi + V~

-t ~~2.

~ [vll"~(P-A)J +
where

a 2 [v 2l-n2P] + a3

rV31.. n3(P+A) ]

a i ni'...L +..L)
Vil

i2

These tw equations yield the following two equations which can be solved simultaneously for p and A ~

Note:

remember that this procedure is asymptotically equivalent to the m.l.e o - therefore the asymptotic variance-covariance matrix can be found by evaluatin~

!.2 ).2y 2:

~#

! ~2y2

2U'

l_d_

2' ~ at the expectations since - ~ is the exponent

of the asymptotic normal distributiono

- 106 -1
a..
. J. = n1 ( v

l)
+
v12
ll

Recall

E[vlIJ

Replacing the random variables in

= ~ (p-A)

by their expectations yields

1
1
I
--p--). ( p-A + l ..p+l ).. {p..A) (i.. p+A)'
I
p 7 p(l..p)

Similarly~

...

thus the inverse of the Ao V-COV matrix is

-----------------------------------~-----~-

2e

Logistic (Bio-assay) problem (Bergson)


... applying greater dosages produces greater kill (or reaction)
s

= general

xl'

~$

ni

=2

xs

dosages

0,

ns

receive dosages

09

va

die (or react)

o~

nI" n 2"

vI,\l v 2'

:>

e-(-<+~xi)

= l+e.(-<+~xi)

1 .. Pi

Vi

q.J. = -n

=- proportion dying or reacting

107

2
Applying the transformed- X method to estimate

Putting in the values of

g, g',

.,/."

and using the transformation

we have

,,2 ~ n.
i=l

---------~-----------~----------~---------

Problem

50~

Consider rc sequenoes of
where in trial (ij )

n trials whioh 'lrS.y result in E or E

i
j

Set up equations to estimate

"/"i"

10

maximum likelihood..

20

minimum modified X2

~j

by

based on observations v ij ' n - vij

1,,2, 00 ."r
1,2,.06,0

- 108 Problem as stated produce.s r+c equations in r+c unknowns" but their matrix
is singular -- i.e., the equations are not independent by virtue of the fact that
the swn of the I' equations for the .,(i equals the swn of the c equations for
the ~jfl
Therefore, reduce the number of sequences to
and add the restriction that

P22

CI

4i

P12 -+ P2l - Pll

ioe o " reparameterize and find the equations necessary to solve for the
new parameters<ll

Given

51~

Problem
s

sequences of n i

(let the number of Eis

trials may result in E or E where in sequence

observed be denoted by vi)o

Set up the equations to estimate


l(t

maximum likelihood"

20

2
minimum lOOdified X "

A by

and find the asymptotic variance of


(noteg
G~

1.

Ferguson discusses this problem in his Dec.

-Minimax estimation:
Recall that

d(!) is a minimax estimate of


sup R~,
Q

Q]

R[d, Q] E[d(!> -

58

g<.~) if

is minimized by dqf)

g(~r

[see def e 24 on p.

article in the Annals)

[d(,y - g(,2.>r f(!.)

54]

- 109 Methods to try to find minimax estimates:


1.

Cramer-Rao

cl;:"

[l+b'(Q

d ;;;..-

nE[d ln

E[d{~)

g(~)J2

x ~1

J:.:l+ b'~lL

nE

~(!) -

III

<{ +
Rld,9] >-

,/

f1+b (9)1
nEI~ In

-an92f(X~J

[. d

E[d{!)] + E[de!)] -

Lb(9)}2

+ b 2 (Q)

k(Q)

f!&12
J

dQ

If we have an est:iJnate
d(X) which actually has the risk
k(9)
estimator with risk less than k(Q) leads to a contradiction.

Example:

~"X21 ~

0,

J2

g{~)

Xn are N(~,

then any other

J)

2
and we know R(x). ....2n

Lehmann shows that

which implies that b(Q)


Therefore X is a minimax estimator of
2..

~0

This method based on the following theoJ:em which will be stated without proof G

~eorem 29:

!.f ~ Bayes estimator [deft 23] has constant risk [defo 25] then
1.t 1.S minimax/}
d(~)

is constant risk if E [de;) - g(9)]2

de!)

is Bayes if d{!.) minimizes


where

ref:

Lehmann -- section

Example:

4,

is constant.

[d(!.) ..

gee) ]

G is the Ita priori" distribution of

f(x, Q) dG(9)

Q.

p. 19-21

Binomial -- X is binomial

B(n p p)e

Recall that we found that

d(X)

rn

!. +

1+.Jn n

1
2(1+Jl}

is a constant risk estimator of p [see problem 31" p.

54].

- liOhas a prior Beta distribution

If'

.f'n"

f(p)
then d(X)

I:

K P2"

12. 1

(1 _ p) 2

is Bayes;

hence tJ?,e minimax estimate of the binomial parameter p

rn_!+
l+Jii n

1
2(1+$)

is

Graphically we have the following risk functions:

1
rn-- - -- - - - -.---.-,.

.It.::;

o ;!.

-+-

Problem 52:

Compare

( d

Ho

Efd (X)_p,2
l - 'j

Rd' (p)

and Rd (p)
m

for

I:

:e(l-p)
n

n =25..

X
- ; d = minimax estimate)
n
m

Wolfowitz is Minimum Distance Estimation


Xl' X2,

.~

Xn are observations from the distribution F(x,

Q).

We are also given some measure of distance between 2 distributions

e.go:

(F, G) =

sup

1F(x)

ro (F,

- G(X) \

.co<x<oo

Ii

(F. G)

(fr(X) -

Also we have that the sample d.fo Jl

G(x>]2

<lex) ; G(x)

F (x) -= noo of.l's<x

(see p. 97).,

G) e

.. III -

Q'

is a minimum distance
minimized by choosing 9

Note:

es~mate of

= 90

if

f. fF(x,
t

Q), F (x)] is
n

We want the whole sample dof. to agree 'With the whole theoretical
distribution ... not just the means or the variances agreeingo
~

In particular, i f we use the "sup-distance" .then

which yields

suP' F(x" Q)- Fn(x)

min

9 -CD <x <CD


/V

Remar!.:

Proof:

"..-

n) 9

differs from 9 by

15

Then for some

and for some

jJ

Suppo se ,)for all sufficiently large


more than 8"

sup

is that estimate of

eo

is a consistent estimate of

I F(x, ~)

.. F(x" 90 >J

> 15

We want to show that


sup

"'CD<X <00

Let x'

F (x) - F(x,
n

sup l Fn(x). F(x,Q>


o >} L..-oo.(x.
<. 00

I"

be the x such that


,.J

F(x', Q) - F(x'a Qo > = 15


Q )

therefore F(x',
thus

< :315

'Q) '" F~ (XSj

sup F (x I, Q) - Fn (x W>J
1

15

>j
)

but this contradicts the fact that


therefore

is consistent.

",

15

sup

minimizes

Fn(x) .. F(x"

o)

sup Fn (x) .. F(x,

Q)}

Example: Find the minimum distance estimate for

IJ. given Xl' X2, 1


are
3

N(IJ., 1.

~ = 1

p
Fn(X)

_::::;:::....,,==....--

2/3
1/3
.\

o
Note:

The n:sup-distance" will obviously occur at one of the jump points on F

(x)~

n.

To find the minimum distance estimate of IJ. an itterative procedure was used$! whicJh
started by guessing at IJ. and then finding F(x, IJ.} at each X.o
J.

204 zi=xi-IJ.
P(zi)

767

lilt

-1.4

"",0.4

1,,6

..

,,081

0345

0945

e325 (067 - w345)

zi

...1.,3

-0 0 3

10 7

f(zi)

. . 097

0382

~955

~288

2 e 33

P(zi)

..371

&952

299

20 29

f(zi) ;:

0386

.956

284

2,,3

Other values of
therefore
~~em

e.

(\33

X
3
1 00

Fn(x)

X2

Xl

IJ. were
IV

tried~

IJ. ... 2,,29..

(x

but the min sup-distance

..

(.(,1- 3~ b)

284,

=2,,33)

53: Take 4 observations from a table of normal deviates (add an arbitrary


factor if desired) and find the minimum distance estimate of

IJ.p

113 Q.hapter
I~

V~

TESTING OF HYPOTHESES ... DISTRIBurroN FREE TESJfS


.~

Basic concepts..
Xl' X '
2

Der Q

~, X

have distribution function F(x) ... continuous


-- absolute continuous
[density f(x)]

The hypothesis

H specifies that Fe 7o(some family of dQf.)

Alternative

specifies that Fe

33:

A test is a function
iQe.~

if f(!) ... 1

7- f{

J(~) taking on values between 0 and 1


reject

H with probability

... k

II

tI

II

II

it

Illl

(note that this considers a test as a function instead of the usual


consideration of regions)
Def e

34:

.,( if E fI(!) ]

A test is of size

fr'(!lJ

rf(~) dF(~)

for Fe

? ~ .,(

"'00

(this says that the probability of rejecting

~~:
~...12~

Power of a test is

r~1

E[1'(!)]

Fe

1",. ~

and is denoted

is a consistent sequence of tests if for

~96 (F)

Def e ~:

for

H when it is true

as

Fe

~f (F)

7- (

n ~ (I)

Index of a sequence of tests~


N(

Cfn~

1: co<!)~)

is the least integer n

~ co<)

such that, given that

.. 114 ..

l-~

+-----/

I--

--Io-;-

7~

-- 1: 10)
Def. 38;

Asymptotic Relative Efficiency (A~RoE.)

Let

p(

Let

fn and 9ii

be some measure of the distance of ;/;rom

10

/\A.-'

be two sequences of consistent tests

then the AoR"E e of

lim

N(

--

~ to

~ is defined to be

~'1," 7~ ""'3 ~)

---~-----

p-';'O N(

fn9 7~ .<.~ 13)

provided the lindt exists

(0' mown)
alternative~

Consider two tests of

IJ.

H~

10

t he mean te st ~

2"

the median test:

reject

H if

>0
... ~-z
X)'_..-

(J

Vn

[use the large sample distribution of the median, i"e-the sample median is asymptotically normal with mean
... the population median and variance = Ra2/ 2n

if test:
a)

rejeot

H if the ssmple median

'7 "1--<~

find the index for each test for the alternative

~!

- 11S b)

find the

A~RoEo"

ieee
lim N~f',
IJ.' -7 0 N(

II g

r;

,*U 2

.,(,9

IJ. r;

co<."

@)
(3)

Distribution Free Tests


refs~

Siegel -- Non-parametric Statistics


Fraser -. Non-parametric Methods .in Statistics
Savage -- Bibliography in the Dec. 19S3 JoAcSeAo

Ao

Quantile and M:ldian Tests:


Let the solution of the equation F('" )

I:

If the solution is not unique, define

Given the problem to test


note~

Test:

Ho : \

I:

the number of

against the alternative

H~

<
-

A0

(q ~p) at level

co<.

JC]." x 2"

I:

pJ

~: Aq

= "'0

/)

"

oa X

X is B(n" p)

for the tl<ro sided test


reject

~::!!lple:

"'0

.. infx(l(x)

A.

the altemative could be stated ApI


-L A.0 but this statement precludes
.
consideration of the power of the test since the power in this case wouJ
be undetermined by virtue of the unknown behavior of F(x).

under

Problem
-.

..

P define the quantile

using the normal approximation

. /) Jx~ npl- ~ >

H if.

tJ np(l

- p)

;;

zl..<./
.., 2

SS: Compute the power of the test (using the normal approximation) as a
function of q for n"'" 100 and p == 005
for paired comparisons" to test if the median is equal to zero set up the
series of observations dl , d 2"

0"

dn,

di

= Xi-Y i

- 116 This test that the median is zero is often referred to as the sign test, sine(
in this case is merely the number of d with a negative signa
i

pete 40:

the sample quantUe~ is defined as the solution of , n (Ap )

>-p"

=P

Theorem 30:

\>

has density function

where
Proof:

ref:
Pr

l.L -= [np]

which denotes the greatest integer in

. np

Cramer po ,368

[Xp ~x J

Pr [l.L + 1 or more observations {x]

III

( j ) [F(X)] j

[1 - F(X)]n- ..

j-l.L+l

li'(~)

To get the density hex) [assuming that F(x) has a density f{X)]
we differentiate the summation, getting the following two summations
[trom each part of the produot thatoocurs in each term summed]
n

hex)

-=. ~ ( j ) j ~(x)]j-l
J=l.L+l

[1 . . F(X>]n- j

... ~ ( j )

(n - j) [F(x)]j

j-",,+l

f(x)

[1 - F(x8n-j-lf(x)

The corresponding terms [in ,a(l .. F)bJ cancel each other except for
the first (j'" l.L + 1) term in the first summation which has no
corresponding term in the second sum.,

/\

By the usual limiting process applied to density functions, A


normal" with mean

\>
.

and variance . 2 1

f{\)

p(l..,p)
n

is asymptotically

- 117 ~ob1em

56: Let

J$;

ha11e density

co

) rex)

where

dx .. 1

""co

and

rex)

....1
c

f(

X"/J: )
C1

co

x rex)

dx .. 0

x 2f(x) dx -= 1

"00

is symmetric about Oc

Find the condition on rex) such that the AcR.E. of the median to the mean
is greater than 1" Use large sample normal approximations tor both tests"
Specialize this r.esult to

I x..,u I
o;-~ ..

f(x)" ~ e-

CJ.)

<x <0:>

--~-------------------~-------------~-----

III.

One Sample
H:
o

Tes~~

F .. F0

completely specified

~ \)

the smaller distribution has larger


observations.

-J..

:Ell:> F == F.,,..F
.I.
0
problems:

1)

goodness of fit (usually a 'W-.lder problem since F


o
completely specified)

2)

ilslippage test!i.

is usually not

3) combine.tion of tests -. ref: A Birnbaum, JASA, Septo 1954

4)

randomness il1 time or space _.

re~

Bartholomew, Barton" and David;


Biometrika; circa 195,.6

Randomness in Time:
Assume:

events in (O,t) (the time interval

i"e." is Poisson with parameter

to

t)] e

-At J.~t)n
n'

At
"-.

Probability of event oocuring in one t.ime irrlierval is in:iependent of an occurrence ir


any other time ~tervalo

GOO

.. 118 i th event occurred c

Let T. be the time at which the


J.

~ t.

j=l

Pr[ti>t
tet)

III

= density

f(t 1s t ,
2

I)

prGo event occurs in time (O,t)] ..


of the t
0,

Distribution ot Tl' T2'

T1

== t

At

),e" At (the exponential distribution)

~ e "A~ti

0 <u;

Tn

t ) ..

e"'

s:L"lce the

IS

are independent

is obtained by transformation

Therefore, let us find the conditional distribution of Tp T , ~


2
given that n events occurred in the fixed time interval (0, T)e
f(T 1 , T2~

~ ~, Tn ; n)

= density

ot Tl , T2,

!>

.,

Tn

0'

Tn and also the probability


that no events occur in the time interval (T , T)
$

..

,\n
-AT
1\ e

ten)

This distribution is the distribution of


random variables on (O~ T)c

ordered uniform independent

- 119 Ordered Observations:


~, X "

0,

I , I 2"
l

.,

If the XQs
between the

X have density f(x) and are independent.


n
In are the ordered XiS.

have a continuous distribution then we may disregard any equalities


or the I's~

Xrs

The marginal distribution of the Y


i
10 Integration:

Q)

00

nl

can be obtained by three methods, i .. e.

f'(Yn ) dyn

Yn""l

Ii

f(Yi+l) dYi+l

Yi

Y3

f(Yi...l) dYi_l

/)

"'00

f(y.)
J.

Y2
f(Y2) dY2)

f(Yl) dYl

-00-00

note: ...- the observations above Yi are constrained by the next lower
observation$ since this is an ordered sequence
-- the observations below Yi are constrained from above
integrating this we get
gi(y)

20

lllI

the marginal density of Yi

By differentiation:

Pr\!i.( Y] =

Prei

or more observations fall to the left of Y]

1:1

J=i

( j ) [F(y)]j

[1 .. F(y)l n- j
:.J

... 120

3(

tn.i)~ti.ln

[F(y)]i-l

.., F(y)]n...i

fey)

Heuristic Method (Wilks):


dy

_ _ _ _-"'~1!i.._;

1i
Y

(1-1) observations
in the interval
(-eo, y)

(n...i) observations
in the interval
(y" + 00)

The probability of this is essentially a trinomial distribution therefore

Example:

In particular, if we have the uniform distribution:


fey)

F(y)

=Y

n).
J.-,l
gi ( y ) = "{ii..n trI:IYl y

(1

.. y

)n..."i

We could also define the spacings~

(which is a Beta distribution)

81 " Yi-Y i - 1

i 1 , 2,

QI

n+l

n+l

i-1

8 1 -1

Y .. 0

Problem

5n Show that each 81 has the same densityo


a) find this density.

b)

find

1)

{~5~]

n+l

11) E [

11

1~ r51 - n~~

----~---------------~----~-~--------------

-121Lemma~

Probability

Transformation~

If X has distribution F{x).. F(X) has a uniform distribution on (0" l)c


u

define the inverse function as

PrI!(X)~U]

Pr[X~F-l(u)

F'"'l (u)

...r.-

_ _ _ _ _ _ _.......I - -

in! [F{X)

F~"'l(u)J

au

One Sample Problem can be put in the following form by use of the probability
t ransl'ormation:
Given Xl' X21

')

0);

In; let

increasingly and define

Ui

III

Xll)'~' I(2)~

H:
o

o<X(n) be the IVs ordered

i = 1,9 2,

Fo(I(i

we have a set of ordered cbservations


interval (Oi 1) which ul'lder

U1 , U2s ..

eo...

!9

Un on the

has uniform distribution (with density n,t')


[or equ~salently;,l that the original XiS had do.f o Fo(X); i.eo" that
the correct probability transformation was used]

and under HI ~

has distribution F
1

two-sided problem

[F~l

a Go

one-sided problem
G{U)~ u -- with the

strict inequality for


some interval

Some of the Tests for One Sample Problems


1.

n which is

AoN~(~,

rk )

- for the oneo:osided sitnation


-- consistent
-- reject 1i if it is "too large"" ioe~ if
(for the above pictured situation)
or if

ii is "too small"

'O'>~ + zl-.-<. (lin)

in the 'converse situation .. [i"e." G(u) >uJ '..

.. 12.2 -

n
20

-2

i=l

whioh is X2 with 2n do!~ (rafo problem 6)

In U
i

e~

X2 is exact, not asymptotic

....
-""

consistent
for the one-sided problem
used in combination problems
reject H if' X2 is Of too largeU

30

<.02

i-l

ln (1 ., U ' _.. also is ,,2 with 2n d.! 0


i

....., tor the one-sided problem


Pearson's counter to Fisher's advocating
",,- consistent
... reject H if' X2 is ~too small If

=.

4G Distanoe

NOa

Type Problem

Kolmogorov Statistic defined as follows

n+ ... sup [Fn(U) .. u] )


~u~l

n-...

sup [ u - F (u) ]

sup

III

o~uu

for the one..sided problem

, Fn(u) - u I

for the two,.,sided problem

o~u~l

.- is consistent

5a

Related to the Ko1.mogorov statistic is


R+ ...

sup
a~u!l

III

sup
a~u~l

-----

Fn(u)...u
u

one""sided problem

tFn(u)-u)
u

two-sided problem

not necessarily consistent


lIa" is arbitrary but positive
derived by a Hungarian, RenYi
not of very great merit

6.,

12.3 -

eu~

)[Fn(U)

....
__
---

ul du

two-sided or one-sided problem


sort of a continuous analogue of ,, 2
consistent
due to Von Mises and Smirnov

n+1

7~

wn = ~ i=l
~'8i ~ I
co

-- ref: Sherman$ Annals, 1950


-- one-sided or two-sided problem
.... consistent
-- is AoN.

n+l
8f)

~ = ~ S~
i-1

J.

a_

one~sided or two-sided problem


-- due to lII.bran
...- is consistent

90 Ul

., 1

is AoNo(O$ 1)

(Wilkinson Combination Procedure)

-- one-sided problem
generally not consistent
Problem

58~

a) Find the test based on


with g(u) = G'(u)

Ul

for the set of alternatives

b) Find the power of the test for the alternative


c) Find the limiting power as n

~co

G(u)

:I

G(u)")u
uk

O<k<l

(if the limit is 1" the test is


consistent)

10. ,,2
11$ NeymanYs smooth tests
-- discovered by Neyman about 1937, but never generally used
-- re.t'g Neyman; Skandinavisk Aktuarietidskrift; 1937
Pearson ... Biometrika; 1938
David"" Biometrika; 1938.

- 124 -.

Problem ,9~

Xl" X2,
let

o.

= max

are independent with dof~ F(x), density rex)

Xn

0,

Xi -

min Xi

l~i'!tn

l~i~n

a)

Find the distribution of

b)

Suppose F(x)

R...

has density of the form

E(R 1- k(f"

show that

(J

x 2(>:) dx

)f(X) dx 1

where

1: f( x""tL
a

~1

n)O'o

-------~----------------------------------

Theorem 31:
F

If
!!22!.~

is continuous,

Fn

We want to show that given

- Pr

[I Fn (x)

.. F(x)

I <' s

p>F for
8, (3

for all

au

Xv

we oan find
x]., 1 -

such that for

n '/ N

(3

Let Af) B be such that

F(A) <. 8/2


l-F(B) <: 6/2

Flok out points X(1).9 X(2)" 0 0 .. , X(k_l) in (A, B) suoh that


whioh we can do because of uniform continuity.
Now set

F[x(i)

1.. F(X(i_l)] <

= X(o)

ConSider a sample of size


(x(i_l)' xCi~

no

Let

has a

,,2

i=o
We oan find an

1>1

such that

n.

1.

=;/-{x.
n J

that fall in the

distribution with

k+l dof"

i~ interval

- 12,5 -

or

Pr

[X~l

MJ /' 1.. S

<.

Since the above sum is less than Mj each term and aJ. so its square root are certainly
less than MJ therefore~
Ini..nPii
PI'
[ -JnPi ... Mfor each i 0, 1, , k + >1. 6

<

which could be written

Recall that

~ n.

j=o J
n

Now

J Fn(X(i)J

~ F[X(i)l/ /~? - ~ P I -/' ~(~. P ~1~ .~ I~ - P I


j=o

j=o

(k+2)MJP!

Choose n

rn

so large that

< 2'

J=O

JUO

for all i

Henoe, i f n is chosen this large s with probability l .. S the following


relationship will hold:

Fn(X(i)] '" F [XCi)

11

<.

(i+l)M.JP"i

vn

<:.

Consider x lying between x(i-I) and xCi)

<.

*
F[X(i)] - F[x(i)]

+ F[X(i_l)l Fn[x(i)l

F[x(i_l)l ...Fn [XCi)]

F(x) Fn(x)

~ F[X(i)J

<C 8

8
'2'+~&'lS

with probability 1 - S

* lltUieiI1g the following

x'elalllonsblps:

'i.'* making use of the following


relationship:

.. 126 ..

..

~ <: F[X(i)l

~ < F(x(i_l)l

.., Fn[X(i)]
..

<~

F(x(i~ <0

Therefore the theorem holds c


------------~-------------~----~-~-------~

Example:

Referring to test No o

4 given

on page 1220

is true, F ~uniform, thus for


n

If

If

G is true,

IFn(U). G(u)

1<

Therefore" the test based on D


probability tending to 1;

eo

sufficiently large

and thus

sup

IF

I Fn(u)

(u) - u

DO

I Fn(u) -

u, ..(&0

uJ '>s.

will reject

with

henoe the test based on D is consistent$


Problem 59 (Addenduml (see p. 12~ ~
c)

Find the distribution of R for

1) Xl' X2,

()

Xl' X2,

2)

note:

Xn uniform on (0, 1)
~1 Xn with exponential density rex)
0$

=a

e- ax

in the uniform case U and R are dependent, in the exponential case


they are independent l since the upper limit of the observations is co

---~-~----------------~------------------~

Remark:

The Kolmogorov statistics, Dn, D+, and D- are in fact invariant under
the probability integral transform"

Proof g

we have to show that

sup
f F (x) "" F(x)
-a><x..(oo n

III

sup

FJF(X)] ...

o !F(x)~

S!

Fex)

sup

IF

0 ~u ~l

(u) - u ,

- 127 -

Fn (x)

where the YI S are the 0 rdered


observations
X2'
In

JS.,

.i
n

Therefore:

III

o ~u-{l n

sup

, F (x)

..c:o~x <con

F(x)

D~

Distributions of

When H is true, and Ul, U ,


2

a) lim . Pr
n~CD

.,

where

, F (u) ,. u I

sup

rJ

(l

III

[rn D: <z J ..

Un are uniform, then

_,

QI

e_2z 2

(due to Smirnov)

0 {z <co

_. Finite dist!'ibution of D+ given by Zc Birnbaum + Tingey in the


Annals of Math., Stat..,
-- Tabled by Miller in JASA, 1956, pPo 113-11'0
b) 1(z)

III

lim PrrJii D+
n-7 co l~
n

.c. z

J.

22

CD

~ (_l)m+l e- 2m z (due to Kolmogorov)


m=l
O~z <CD

-- Tabled by Smirnov in the Annals of

Ma.th~

Stat"" 1948

The simplest proof of both results is due to Doob in the Al".nals of Mith"

Stat~"

1949

Some tables on the finite distribution of D are given by Massey in JABA" 1949"
ppe 68-77Q
n
Consider now that

is t:rue, i.eo, that U has, de!. G(U):

n- test is to reject

H if

PI{D~ > en1 . . .,(

n'"n

> en

where

Pr[.rn D~ >frl I-;nl


thus

lIn. .,( I
2n
~

max. difference between up G(u)

D~ .. sup [u ... F (u)]


o ~u~l
n

- 128 Suppose

is ture so that

maximum / u - G(u)
Then:

Pr[D::> fin

A at u

1~
~
~

Fn
III

is the sample d.f

l)

from

G( u) with

Uo G

Pr[uo - Fn(uo )

> en ]

Pr[Fn(Uo ) - uo<-e n ]
Pr[nFn(uo ) t::n(uo - en>]

But Fn(u ) is a binomial random variable with expectation


o
n(u -e )
. 0 n
and
Binomial probability'" ~ F(k; n, u - A)
k=o

Prob1em.Q.:

D~ test where G(u)'" u 2

Find the bound on the power of the

Using the normal approximation given an explicit f.orm for this bound
in terms of
x
2
,J, (x) = .1n, -<, 'I
e _t / 2 dt

J2n

Test No 0

IlC

5; Renyi Statistic
+

R
R

=0

Fn(U)-U

sup

a~u!l

= sup

u
Fn(U)-U.l

a<u~l
.....

Limiting distribution:
lim Pr [
n"-1 oo

rn R+<z]

For the distribution of R and further discussion see an article by


Renyi in Acta Mathematica (Magyar), 1953.
Test No. 6:

CJ~

I
1

1
sa

[Fn(U)

lF~(~)

UY

co

du

-2uFn (u) + u

j
2

(Fn(x) - F(X)]2 dF(x)

du

- 129 ttl

i
n

eeg., F (u) is flat for an interval


n

fo

recall:

un+1" 1

n+1

.. i=l
~

i ( u2+'"" u 2) + !3
-n
i
i

2 "'"
( -i )2 (u 1 .. u. ) ...,:r...c::J
1
n
+
1
~ 1-1
n

.. 1 +

2
u du

~ ~ (1

n~ 1-1

.., 2 )u .. ~ (~
i i
n 1=1

u~)

.., 1 +

i
J

.:F.,2
'n

.""4J

Cramer shows that: GJ2


n

III

Tables of lim Pr(nw2

1 + 1 ~ [u 12n2 n 1-1
1
z]

~
2n

have been given by Te We Anderson and Darling in

n~co

in the Annals of Math o

StatQ~

19$2, P3 20611

Approache.s to Combining Probabilities:


Exampte~

The following probabilities are from a one-sided t-test:

...R-.
0026
.11$
,,27
036
&7$

-.1F
02

03

04
S

J....
,,76
081
,,89
,,92
.98

-06F

07
08
09
1 0
0

.. 130 ..
Under the basic J::wpothesis the

pts are uniform on (O,\i 1)

Alternative tests:
acoept H

P OeS7
21.'

To test alternatives of the form G(uu the Kolmogorov-Smirnov test


statistic is
n ., U

D:. F

sup D+

OQ08S

10

Pr[~O '> .342 J 015

From Miller's tables:


..
1
U ... ~

34> D. eS88

"
0088

J~r ~

Pr~ <' 0964]


Since

0e964

067

BtU] under

H.L <: ELU under Ho we will reject

therefore we cannot rejeot

H if'

Ho~

40 Against two-sided alternatives


D = 035

(,,75 - 040)

See Massey's tables for small sample size probabilitieso


Usi.l'lg the large sample approximationg
co
22
Pr
D
2 ~ (__l)m e- 2m z

[rn z ]

Pr

[D)~ ]

no

(II

at

L(z)

mal

L(035)

Example~
-

In the following-

table~

the Xi are taken from a table of N(O,t 1) normal


-- the XCi) are the ordered Xi
-- the U...
1.

..L (XCi) e- t 2i/2

{2n

)
..co

dt = PI'

f J
Z

<Xi

deviat~so

z <zco(

- 131 -

the 8 are the spacings between the Ui


1

042

-0038

-0002

-0 0 02

.49

088

,,08

.53

040

009

.54

031

064

'?40

066

e08

042

.66

1012

088

011

-Oe38

1;p12

081

031

1016

&96

X(1)

1.16
009

Under H:

Ui

Xi

.35

8i
~

-;"09

.14

009

&04

009

~01

~09

.10

t>09

.02

009

,,00

1109

.05

:>09

c16

~09

009

t}09

004

009

n--!

the XIS are N(O,\l 1)0

The test statistic is: 10


1
l.Jn =-2
~
i-O

Is i . n+l
.1.../

Ill;

Oe38

Ignoring the slight negative oorrelation between the S., one could use a
normal approximation with:
1.
E[W ]
n

l:

! .
e

0037

v(

W)
n

2e-?2 (0.07)2

lDe

~amp1e:

It you want to test

transformation:

H:t ~

u..
2

the X' s are X(,5) then you should use the probability

57~i?5)
It ~

X
t 2 i-l
r e- / dt
t'

- 132 ...
Combination
-

Probabilit1es~

- --

of Test

...

Ho:

U is unito rm

H:J.:

U has distribution G(uu

(i~e.J the observations tend to be smaller)

ref: Ao Birnbaum, JABA, September 1954


He oonsiders a oomparison of the following
1... Fisher:

-2

ln Ui

Fearson~

-2

In (1 .. U )

2<:

tests~

i
..- found unsatisfaotory for moet applicationso

3- tJ.J.. .... not consistent, but is this important?


Birnbaum conoluded that -2
testing normal meansc
other possible

tests~

D,

~ lnUi was better

~,.9

1'0 r the one particular case of

...

U ; have not been studied in this light.

Goodness of Fit Tests:


HO:

F - F0 completely specified

The test usually involves estimation of parametersQ


The only oompletely worked out theory is for the x2-testo
For other suggestions see:
.- F.

David~

Biometrika" 19389

-- Kac, Kiefer, and Wolfowitz, Annals of Math o Stato" June 1955.


They present a Mmte Carlo derived distribution of W 2 and n
for the case of testing H~ Xts are N(fJ., 0'2) where lJ.; n 0'2
nare
estimated by
s2 working with n-25, n=100 o

x,

For consideration of one-sided tests see:

Ohapman, Annals of Mathe> Stat., 1958

... The nff test is allminimax" test (among Fisher, Pearsen$


of the one-sided hypothesis He: ,
ioe~,

'0 versus Hl

oR, W~ , U)

F .. '1<' Fo.

timinimax ll in the sense that it has minimum power to piok up easyto-detect alternatives" maximum power to pick up hard-to-detect alternatives.

- 133 -

IV. Two Sample Problems s:


X1"~"
Yp Y ,
2
Ho : F

1:1

\!l

.,
II

9'

have d~f. F(x)

Yn have d"f. G(y)


H1I F < G

or

F>G

Hl :. F "G
In the parametric case one could use the normal approximation en d the two-sample
t-test on the means, but these hypotheses are somewhat wider~

Partial list of testsi


1. Median Test
2. 0 Runs Test
3. WilcoxentsTest (also called the Mann..v-lhitney test)
4~ KolmogorovoooSmirnov D-test
5. Ven der Wa.erdents X-test (or Terry's C-test)

All we need for these tests is to be able to order the observations; magnitudes
are not important; eeg. lv y. y \/~ \.'
\

~3~~W~

Test 1:

For the median test set up a 2x2 table classifying the observations
as above or below the median o! the combined sample. For example:
Below

Above

Total

X:

Y:

m+n

m+n

m +n

Total:

2
Test Statistics is the usual X for 2x2 contingency table s with one d.f,

Test

2:

For the runs test, set


I'

= number of runs of Xt S and of Yt s in the combined sample (in our


example r = 4)

If' H is true, then the Xis and yts are intermingled and the value of

o
r will be "large"; 1 Hl is true, then r will be "smalllt

- .134 Test j :

For the Wilcoxen (Mann-Whitney) test, we define:


=1

= 0

otherwise

m n
~ ~

0:

in our example U = 22 (4+4+4+5+5)

~j

i-1 j-l

Wilcoxen's original test was based on

Rf

where ~ = the sum of the ranks of ~ in the combined sample ordering from
the smallest.

S1m1larly R':! = the sum of the ranks of Y.


Problem 62:

Test

4:

2.,.

Prove: U = mn + m(m

1) ...

~ ~

n
0:

3.=1

~ R~ _ n(n + 1)

. 1 J
JD

Kolmogorov-8mirnov define D for the two-sample problem as:


Dmn

= sup

\ Fm(x) - Gn (x)

-oo<x<co
Test

2:

For Van der Waerden' s X-test we


let . (u) ..

.l:fin

_t2~
dt

-eo

then the test statistic is:


Problem 63:

a..)

let Q.

X=

:t~1 r(

RX
m + ~ ...

1)

number of Y's which exceed max(X p ~,

I!l

8'

~) 0

Find the distribution of Q in the general case"

b) Specialize the result in (a) when H is true.


o
c)

Find the lindting distribution of Q for case (b) as m~, n

~oo,

.. 135 Runs Test (Test No 2):


Q

Ret lI' :
r

==

Mood Chap 16
fl\

no c of runs of X' s or Y' s in t he ordered combined sample.

There are two cases to be considered-that of an even and that of an odd


number of runs.
Even case:

The number of arrangements of m X's and of n Yrs that have tl:e


property of giving rise to 2kruns can be found by a simple
generating function device.

Again~ there are two possibilities, i~ell starting with a run of l'S or of
yts, so starting with either, we must multiply the end result by 2 since
the two starts are symmetric.

Starting with the X's, they are divided into k groups, all non-zero. To
find the number of ways of doing this, consider the ccefficient of t m in the
expansion of (t + t 2 + t3 + ., )k.
(t + t

23
k
t
+ t .;. He) = (l-t)

==

=tk

k
(1 .. t)-

t k ( 1 + kt + (-k) (-k-l) t 2 +

The coefficient of t m is found when j

=m -

k" and

_ fk + m - k - 1) 1 _ (m .. 1)
-m k)t (k .. I)l - k .. 1

Similarly the number of arrangements of the n Irs into k non-zero groups


is

Therefore" the number of arrangements of X' s and Irs with 2k runs is


2 (~ :

i) (~ : i)

The total number of arrangements possible with m

XiS

and n Its is (m + n)t


mt nL

-136Therefore:

(r = 2k + 1) For the case w..l1en the number of runs is odd, i oe '),


to determine Pr [r ... 2k + IJ , the argument is similar, but we
start and end with either X or Y (instead of starting with one
end and ending with the other), therefore

OOd Case:

E (r)

=m
~
+n

V( r )

= em

t'

2mn (2mn - m - n)
+ n) em + n _ i):;:::::;

N =m+n

Where:

+ 1 ;::;:;; 2 N..(

= No<.

2 2
N,.( ~

= N!3

0(

+ ~

=1

By Sterling's approximation methods, we can show that

lim
N~

Test:

00

r
2..<13

H is rejected i f r
o

- 2N4.

is N(O,l)

rN
<

r 0 can be determined from the left hand tail of the normal


appro:ximation or from tables in
Swed + Eisenhart, 1943 Annals of Math. Stat.
Siegel, Table F
3 Dixon + Mas sey" Table 11
1
2e
0

Wilcoxen (Mann-Whitney) Test (Test No o 3):

U=

~ ~
i

where

u ' ... 1 if X.
~
iJ .

=0
If H is true"

~(U) ...

..2
i

~ E(uij ) = ~
j

< Y.J

otherwise

-137. Pr

For the general case, assume

[y

> X~

= P

Pr

[Y > x

I X = x] dF(x)

-co
CD

= ) [1 -

G(x)] dF(x)

-00

In this general case

E(tf)

=m n

E(U)

[{I -

G(x)}

I F(x)]

2 )
~4 E(u..
~J

::1

"

+-Z] ~

B(u.. , u. b)

E(uJ.' " u )
aJ
J

(3 )

E(u .. , u b)
~J
a

(4)

J.J

ijfb

(1)

+~

2i

.z

isla

+~

ifa

jfb

J.

To evaluate this, we can examine each part separately, e og.:


(1)

2
E(U ) = p
ij

(4)

E(Uij " uab)

(2)

E(uij , uib) = Pr)j

mn such terms

= E(Uij )

m(m-l)n(n-l) such terms

E(uab ) = p2
Xi' Yb

JS.:

> x,

>x

Pr [Y j

Yb

I~

x] dF(x)

-co
00

= )

[1 - G(xll dF(x)

-00

= B[ {I -

G(x)} 2 tFJ

mn(n-l) such terms


00

(3 )

by similar argument

2
E(uij " u ) = ) F (y) dG(y)
aj
-00

=E

C; (y) IGJ

mn (m-l) such terms

-1,38Thus~

V(U) = mnp + mn(m-l) (n_l)p2+ mn (n-l) E [(1_G)21 FJ


2 2
+ mn(m-l) E [ ; G] - m n p2

If H is true" verify that E(F2 )

, Exercise:

and then Var(U) ==

= E(l

_ G)2

==

IT (m + n + 1)

Mann-Whitney in their further work on Wilcoxen's test proved that


mn
U -2
is AN(O,l). This they found by discovering a recursion

/~

(min+l)

formula to get all the moments of the distribution of U" then observing
that the limits of the moments" as m -+ (x)" n -7 (X) were the moments of
the normal distribution.

U -- may

be used for ane-sided or two-sided tests.

-- the most complete tables are given by Hodges + Fix, Annals of


Math. Stat." 1955.
Problem 64:
Take 10 observations of (a) X which is N(O,l) and
(b) Y which is N(l"l)
Apply each of the fi ve tests to the data to obtain tests for

=G

H F

Problem 65:
let a 1o' a 2 ,
let Xl'

~"

Define:

zi

(a)

IOU'

~-l

be fixed points o

n., Xm, Y1o' Y2"

= Fm(ai )

u.,

Yn be independent observations from


F(x), G(y).

- Gn(ai )

Find E(Z), Var (Z) in general and for the case F


denote:

F(a.)
J.

=f i

==

G.

-139(b)

Find E(Z), Var(Z) for the case when

=u

F(u)

=0

G(u)

"'_ 1 + u - 1 ... b
1 - 215

=1
In this problem let a

(c)

S. u

l-b

.s:. l-b

<.. u < 1

Assuming Z is AN, is it consistent for alternatives of the


form of G in (b).

- - - - - - - - - - - - - -- - - - -- -

--

...

_-------

KoJ.mogorov-8mirnov Test (Test No.4):

Our basic sample (X X X X Y Y y X Y Y) could be expressed graphically in


two ways:

(b)

(a)

Y (5,5)

Fm(x)

-roj-1,-,_

..........

L!.

'I

l~:

--.J'

I
I
~>

Gn(y)

i
~
I

.r-

if

I
I

(0,0) ,.'"'-----~~-----
i.e., the sample could be plotted as
a two-dimensi ona1 random walk reaching
the point (5,5)--reject if the walk
strays beyond a line parallel to the
0
45 line"
Asymptotic distributions of D , if H is true:

ron

lim
m,n --7
1(z) has been tabled in the Annals of Math. Stat., 1948.
+
+
Also" D- have the same limiting distribution as D- (one-sample
mn
n
statistic) with the normalization
(see p. 127 )

1m:

-J.Lo- Test:
.

Reject H if Dron )' d.

Van der Waerden's X-test (Test NO e

5):

where
u

(~) =...L. (
.;rn)

e-t /2 dt

-00

X is AN(O, : 1 Q)
Example:

where

2 i
r
(N+1)
i=l

Q =j

=m + n

XXXXyyYXyy

ItJ.

RX
i

11

11

11

11

11

111+1

e09, 018, .27, 036, 073

using normal deviate tables

(111+1)

= z .09

z .18

-.60,

For determining the variance of X, tables of


Van der Waerden in an appendix to his text.

have been gi van by

- - - - - - - - - - - - - --- - - - --- - - - - - - - - - - - - -Theorem 32 (Pitman's Theorem on A.R.E.):


Assume:

Tn' T1i are A. N. Statistics

.!. L '

is a subset of ..!n L indexed by p such that

when H is true p

= 0"

Let the sequence of p'S tends to

e,

i.e., P1 P2 " Pn

--+

O.

-141-

Test:

T -- reject H if Tn

> t n.,(

T -- reject H if Tl!-

>

Assumptions:

>0

(1)

-dd Ep (T )
.p
n

(2)

d Ep (T ) p=O
lim ~
n.

pm"
d

(3)

lim
n",,"*oo

ered

-dp

=c

J.'f

CJ

o
Ep (Tn)
Ep (Tn)

I p = pn

Ip = 0

l{
P =n

rn

=1

plJ"l (T)
n

CJ

(4)

lim

CJ;rr;J- = 1

n-1'OO
Theorem:
.

Under these regularity conditions, the limiting power of .the Tn

=..!. as
n{r1

test for alternatives p


~.

The A.R.E. of

= (c*)2 =

T:~:'
"l!

to

lim
n-7oo

~~

is

00

is 1 -

~ (zO'( -

kc).

-142-

Proof:
Pr

>

Tn - EO (Tn)

t n:< EO

0"0

~J.

= ..<

0"0

For large n, since T is asymptotically normal, t n.,(


Pr [Reject H I Pn ]

lim

= z.,(O"O

EO(Tn)

n~co

= lim
n-1oo
0"

= Z.,(

(...9...)
0"

lim
n..-?oo

Pn

from assumption 4, as n

0"0

00,

( _..) ~
0"

kc

Pn

using a Taylor series expansion:

o < pIn

=Therefore:

n -1

kc

prlReject Hr

lim

< p.n

P~

=1

(z.. -

00

x-'"

kc)

By a similar argument, the limiting power of the Tll- test is .

d EP(Toll-)
n

where

c~l-

-d

= lim
n-j

. . , '"'OS

00

P= 0

-143We want to determine sequences {n l

~ , fn11 such that

1 - (Zl_..< - k*ci~) mich means that kc

to be the same p
n

= p*

or ---

y'ii1

k*
=---

0i

= k*c*.

1 -

(Zl_..< - kc) =-

Also" for the two sequences

Thus we can determine the equality of the following ratios:

The AeR.E. of T to T* is given by any of these ratios, or as stated in the


theorem:

AeR.E.

=( -r-C'

= lim

c ,)

Example:

n ....,

Coa2).(~d Ep

CD

(Tn)

dp Ff' (Tjt)

Ip = 0 )
Ip

Obtaining the A.R.E. for the Wilcoxen test versus the normal
mean test, i.e e

U versus Z

I-X

which is asymptotically equivalent to the


two-sample t-test e

X and Y have d.t. F(x) under Hoe


X has def. F(x), Y has d.f. F(x -IJ,) under HI.
In both cases the variance =

8'

For vfilcoxen' s test:

co
p Pr

[Y

> X]

[1 - G(x) ] <IF(x)

-h>

- co
f (x - IJ,)

(x) dx
-00

[1 - F(x -101)] rex) dx

-144~ E(u)

E(U) = mnp

= mn

(x) dx

IJ.=O

_ mn(m + n + 1)
Var (U) 12
E(Z)

x dF(x -

Xx dF(x)

IJ,J -

crV/ .!n + m
!
dE(Z)
dIJ.

~/1+1

vI

iii

ii

Var (Z)

mn
~

(J

in

==

Therefore, the ADR.E e of U to Z is


00

) f2 (x)
-00

I-mn<m ; n + 1) ).

n~

oo\!

I ron
(] y' iii-+n

12

which reduces to:

= 12

A.R.E. (U to Z)

;. [

Thus we can compare U and Z for any f (x) whatsoever.


For ins tance, if f (x)

=-L
Il1tcr

e-x2/2cr2 then

f2 (x) dx

2no-

using the transformation

1
0

.1

...........

V2n .j2

l /2 dy

e-

-00

and

12

J [ )

2cr/fl

00

(x) dx

12i

-00

12i
4ncf"

=1= 955
n

=~

All of which says that the A.R.E. of the Wilcoxen test to the Normal
(two-sample) test" if the underlying populations are normal, and we are
testing IIs1ippage of the mean" is 31n.
Problem 66:
(a)

Evaluate the A.R ..E. of U and Z when


(1) f(x)

=1

F(x)

=x

F (x - j.l.)

a<x

=x

<1

IJ. <x ~ 1 + IJ.

j.l.

a <x <CD

Find an f(x) such that the A.RIlE. of U to Z is + co

(b)

It has been shown that the A.ReE. of U to Z in this case" i.e e


testing slippage" is always > e864. - Hodges and Lehmen, Annals
of Math" Stat .. , 1955.
AeRoE. of test to Z
(testi~ for slippage)

Test
1. Median

2/n

2. Runs

3.

3/n

4.

K-S

???

5.

~ 1" i.e,

= e-x

(2) f(x)

Remark;

a~ x

Consistency
yes, if the median
median of F
l
consistent for all
yes" if the median
median of F
l
consistent for all

of Fa

Fa = Fl
of F ,.

Fa = Fl
X
1
yes, if the median of Fa f
median of F
1
Robustnes s of a test (as propounded by Box) refers to the behavior of a
test when the various assumpti ons made .for the validity of the test are
not fulfilled.
Type 1 error - P.r [reject H when true under the assumptions]
Power

- P.r [reject H when false under theassumptionsl

A test is sa:.d to be robust if Pr [reject H when true if assumptions are


not fulfilled] remains close to
Note:

..4

regardless of the assumptions.

the Z-test, or two-tailed t-test, is robust.

The proponents of distribution-free statistics argue that the disadvantage


of t,he Z-tes tis that the power may slip if the as sumpti ons (of normalcy, etc)
are not satisfied.

r
- 146 -

1.
2.

3.

4.

Median Tests
X 2_Test with arbitrary groupings
Kruskal-Wallis Rank Order Test
Kolmogorov-Smirnov T,ype Tests

samples:

1.

..

2.

..

k.
1.

~l

\2

k
N= }"n.
J~ J

~~

Median Test is made by setting up a 2xk: table:


sample

...

.....
./

Above

mi

N/2

Below

n - m.J.
i

N/2

n
l

...

n.J.

nk

Where mi = the number of observations in sample i above the median


of the combined sample.
Use a x2.test of the null hypothesis with the expected values
is even, and X2

= n./2
J.

when N

with k-l d.f. under the null hypothesis

that the k samples all came from the same distribution.


2. x2.Test:
Being given or arbitrarily choosing groups Aj (a j < X

a j +l ) define nij

the number of X's in sample i that fall in A.


J

Under the null hypothesis Pr[Xfalls in A. J = p. independently of i.


..
J
J
2
This can 'be tested by x in the usual manner -- as an rxk: test of homogeniety,
2
where x has (r-l)(k-l) d.f.

- 147 -

=:

12
N(I~H)

where Ri = sum of the ranks of sample i taken wi thin the combined sample
ii

=:

Ri/n i

H is asymptotically distributed as X2 with k-l d.f.


ref: March 1959 JASA for small sample approximations.
Problem 67:

!to

What does H reduce, to when k = 21


Prove your answer.

K-S Type _Tests:

Would involve drawing a step-function for each sample on the same graph.
Unfortunately nothing is now known about the distributions.
ref:

Kefer, Annals of Math. Stat., article to be published probably in 1959.

Consistenc"I:
The Median and H are consistent against all alternatives if at least one of
the sub-group medians differs from the others.
A.R.E. :
A.R.E. for slippage alternatives:
median test against the standard ANOVA test

2/rr

against the standard ANOVA test

3/rr

H test

where the underlying distribution is normal.


If the underlying distribution is rectangular, then the A.R.E.fs become:

median against ANOVA


H

against ANOVA

1/3
1

.. 148 CRAnER VI
TESTING OF HYPGrHESES -- PARAMETRIC THEORY -

Cramer" ch. 35
Kendall" vol 0 2" chs. 26-27
Lehmann" IlTheory of Testing Bypotheses,lI, U. of Cale Bookstore
(notes-by C. Blyth)
Fraser" IINonparametric Methods in Statisticsll"ch. 5

Refsg

1.

't.-

POWER

Generalities

-- usually the XIS are independent with density f(x, 9), the density having a
specified parametric form with one or more unknOwnparameters.
For the parameter space.fl HO'

~. ~ e '--\

e CJo

Recall that (~) is a test function of size

0<

such that

reject H with probability 1


reject H with probability k

=k

. do not reject H (accept H)

==0

where E [()
Power function'

Def'll ~:

~ (Q) ~ E

e Wo ]

~-<

[ I ~J

* is a unif'ormly most powerful. (uom.. p.) test of size ""-, if being an.
. other size cI.. test

'\}

2.

Defs" 33..38 in chapter 512

Ref 8
--

Probability Ratio

T~

Neyman-Pearson Theorem,
X is a continuous random variable with density f(x, 9)0
HOi

9 == 9

(simple

Hlll

0
~pothesis)

Assume!

f(x, 9 0 )

Assumes

f(x" 91 )

f{x, QoJ

>

OJ

f(x" Ql)

>0

9 == 9

1
(simple alternative)
for the same set So

is a continuous random variable 0

- 149 ..
Theorem 33 (Neyman-Pearson):
The most powerful test of HO against ~ is given by *(x) defined as follows~
*(x)

1
elsewhere

- 0

where k can be chosen so that it- is of size ..<.


If the test

* is in~ependent of Q1 for Q1 &GJ~\hen * is the u.mopo test of HO

against Hl :

&~ 0

Remark: This is what has been called the probability ratio test, since Neyman.-,
Pearson originally expressed the theorem that

I----+--~--

__~--~----

Proof:ltl

To show there is a required k that makes -:l- of size ..<


define:

..<{k)

-o) .,
since

r:

Pr

[1f'O{x)
(x)

:.>

X has density f

lJ

oI

'-'(00) == 0

is a continuous random variable, 1 -

~(k)$

which is the

d.~

of

this random variable, is a continuous function and is monotone non-decreasing,ll


hence for some k f we must have -k f )

==-<.

- 1,50 2.

To show that * is uomopo


Let

be

any other test of size ~.

We want to ShO'!l'l that g

j/(x) t(x,
Consider!

~ Y(x) t(x, !ll) dx


[t(x,~) - k t(x, 90 )]

!ll) dx

[(x) - lI(x)]

That this integral is


when f

when f

- k f

follows since

>0" then'!t

- k f ~ Os then

1.9 and

rf! ... ~O

=.0" and

ff .. ~O

=:

Expanding this integral we get8

)1 t

l dx

fllt

l!f

but

j*

Therefore;

and

l dx

f O dx
dx

::&.

- k [

r1

j f O dx

~5 f 1

I\~

..,

j2n' a

Reject Ho when

.L

1
O

~ (Xi

'"'

l.

known

~)2
f

1
O

I:"

- -

J2n'a

~
2
~
2
- 2u..
.:::JX.
""
nu..
..
~Xi
.' J.
J.
".L
,

2a'2

== '" ... the size condition

>k
~

or

to dx

IJ.=~> 0

2&2

~X.

is u"mQpO
a

... ~l_

to dx -

~0

dx

Example I i

dx

>

;?

-151 -

(In k) 20' +

or

nlJ:t. 2

To satisfy the size condition, K a Zl-..


If Ho is true,

Remark:

Pr[ aJ,pr
X > zl....<l.
J

~)

X is NCO,

Er,Pl) (which is independent

If the most powerful test of H against


O

H:J..

of

IJ:LJ

=0;"<

= Q1

is independent of

for some family~" then the probability ratio test, *~ is u$m.Pe for

He'

O against~.

Q =: Q

Therefore"
For Hoa

X > zl_..<

<0

and l.J.

>0

uQom~po
,

u~m9P.

It'-

<0

11.:

the uOm0p. test is

j,l.

'> 0",

X< zl-'.

J;

test for this problem since the u.m.p. tests for

differ.
here&
+\

'j'

.'\

Problem 68:

6~t

is u.m()p. for HO against

IJ.. 0 against ~.

There can be no
l.J.

.fn

X< K'

u.m.p, hereg

~4

X> K

--._---

l.J.1

Xl' X2" , Xn are independently distributed with density

f(x.)

= Q e-Qx

x '>

(1)

Find the u"m.p~ test explicitly (i"e., find the distribution of the
test statistic) c>

(2)

Write down the power function in terms of a familiar tabulation.

- 152 Remark:

--

flex)
fo(X)

is required to be a continuous random

variable~

If this ratio is not a continuous random variable, then .J.. (k)

may be discontinuous, so that .t.. (k)

lOt

..< has no solution o

nomal situations where one can't get ..<

lit

.05

Pr(~>~

(eogo small bi-

exactly.)

in this case is defined to be e

=1

flex)
when i'O(x)

>k

=0

flex)
when l' (x.)

<k

I:

when

rflex)
(x)
o

=:

where e is chosen so that the size


of the test comes out as .,(

~(k).
I

.t.(k-ltQ)

'----

(k)
(k .. 0)

Problem 69g

For 1 let 8 be the set where f ~ o.


l
1
For fa let So be the set where fa

(1)

> 0(1

If So and Sl are not coincident then there may be no test of size ..<.

(2). If there is no test of size ..< given by the Neyman-Pearson Lemma

(Theorem 33) there is a test of size

<.,( with

Hypotheses noted thus far' have been simple hypotheses(l


the hypothesis is called a composite hypothesis, eGg. 3
Xl' X2,

eGO,

Xn are.NID

(~, cr2)

If

power
Cy

=1

has more tha~ one point:

13(Q) = Pr[rejeot H]
for HO: I-.t. =: 0 against

I1.1

I-.t.

>0
Testg

..

,,(

- ---

Extension of the u.m(>pe test to oomposite h;y'potheses by means of a "most unfavorable


distribution of

@"'o

w.

Let A (Q) be a distribution of Gover


Then X has a density f (x,
~(x)

i).

~ (x, Q) d :\(iI)

-=

density of X under H plus the


O
additional information.

w
NV

Let Hobet

I1. $

X has density h4: (x)

i~e~1

Let

Th~~

34:

Proog

be the most powerful test of

= Q

the density of X is (x~ Ql)

%against ~.

If rp~ is of size d.. for the original test" it is mOp$ for this test.

Let be any other test of

(x) f(x,
Then

'"

=
Thus

andl

is

~) dx ~ -< for all Ii

J[5 ~

iii

Ha-

(x)

(x) (x, iI) dx

f J(x~

lit

d:\{iI}

iI) d:\(Q}}

~ (x) ~ (G) dx

AA-

of required size for H ~


o

~" (x) f(x,

Q} dx

(x) (x, Q)

dx

dx

~54

ci known
rf' test~
H~:

!f

(;J.

I:

I(. /, ]

Put

reject 1 X ';> zl-<...2:- which was found to be uomop. for

rn

0 against Hl :

(../,> O.

~"CO<

O.

for(../,

We know that Pr [rejecting H with

A(Q) = 0

'3 <0

-1

Q>O
,/

(ioe&$ concentrate the distribution at a which is the worst spot from


the standpoint of testing or distinguishing)
';fhis makes our composite distributioni

~ (x)"

( :rex.

~) d},(Q)

..

:rex,

0)

so that our problem is back to HO and we have for HO a u.mop" test which
is also a test of the original HO'
One way to get an optimum test in the absense of a u"mop" test is to restrict the
---------class of tests to be considered and look for probabilH,y ratio test for restricted
class.

Such restrictions area

1<)

..Defo

42~

2/j1

Def. 43g

Unbiasedness

rJ

is an unbiased test if l3 (Q) ~

co<

for all G e ~

Similarity
is a similar test if Eg (

rJ)

=..< for all! e

Wo

3e Invarianoe
X has density f{x, ~)
G :=, a family of transformations of the sample space of X onto itself\)
eo go a g(x)

=:

cx.

= x + d

change of scale
translation of axes

Let g(X) have density f [ x, g(Q) ]

g is

g(Q)

JL

a transformation of the parameter space induced by the transfor~


mation of the sample space.

- 155 if X is N(~" (l)

g(~.f (12)
if X is

lit

OJ,.L,,

N(~, 0'2)

g(~,\1 (12)

X ... d is

lit

~ e wand H.J.I ~

IT we have HOi

oX is N(0 j,J."
2
0 0'2

~+

2
0'2 )

N(~ of!- d, (12)

ci

d"

an. w

then a transformation in-

duced by g leaves the problem invariant i f


i(Q}

8: (,.J

i( i)

8:

8: UJ

n... _

~ when

0'2)

Example:

when Q

H08

Q oS

~.

n .. w

g(X) .. oX

>

g(j,.L1 (12) .. oj,J., 020'2

W is the (12 axis

-----+-----1-/.

under H g(X) is N(O, 0 20'2)


O
2
H g(X) is N(~, 0 0'2)
l

Def.44t

whioh doesn't change

eu

A test, ,s(x)" is invariant under a transformation g (for whieh the


oorrespo'nding

Example t

t.

g leaves

the problem invariant) if W[g(x)]

l~
n
-"i X.
J.

=:.

81m (] (Xi .. f.)2)~2


.

lit

~(x)

l~

(n-l) n

(under g

ii ~cXi
ex) ---"':::-_~r-J"I\'

(~~i_Iq.)~\1/2.

(n-l) n )

ii~:'X'

lit

~(Xj-f)2J 2
(n-l.) n

Def* 45:

If among all unbiased (or similar or invariant) tests there is a

which is uom.pol then ~ is u.m.p9u. (or u.m~poso or u.m.p.i.).

rjf

-156 Eniformly Most Powerful Unbiased (uom.pou,) Tests:


HOi Q

Single parameter Q

"f" is differentiable with respect to


Remarks

If

=:

which is inside an open interval


O of the parameter space.

~G

is unbiased, then
=:

~~, (iO) 6.,(

Proof:

~ .(~)

~_ (Q)
~

>-t

Hence

for

~ ~O

(Q) has a minimum at ~

~_ (x)

(x, Q)

d~

(Q) ..

'aQ

dx

Ill:

G .... Q
O

tor unbiase~~-j~La..l:t~.!'-1?:~~~yes._mu~!:?.~.~:w.Q~~!c.!ed (otherwise the power curve


has no minimum 0
"

Assuming that

He:

=:

(Q) is differentiable with respect to Q

and henoe

Notes

"Q., 9

.Theorem 35:

5f(x, Q)
11.$

dx is differentiable under the integral sign, and with

@l (two-sided)

If there exist
(

I:

I:

~,

we can get.

so that

1 when f(x, Ql) >kl f(x" i o) + k 2

~~

"I
Q

=: ~O

0 elsewhere

is of size ..,{for H and unbiased, then it is mope for alternatives


O
Q e If the test does not depend on G it is uom.po u. for \ U i 6 GJ.
l
1
ProofS

(refS

Cramer p.532)
Let be any other unbiased testo
Then

Sfu fa dx ='" =
J'(w: ~"' Q

'='

90

)_ fa
dx =

dx

a=

J'(- ~ I.

dx -

= Qo

Bi~ conditions

unbias~dness

conditJ.ons

- 157 -

)Cf -lI)

f 1 dx

Cf - ) [f1 -

This integrand must always be ~


since if f 1 -

kJ.

~~ f 1 dx ~

We want to show that.

f 0 - k2
.

~i

k:J.

1 dx

~ IQ ~ QJdx~O

f O - k2

i =

~0

~O

1.

lit

#}

>,

o=f~ti

Thus the desired relationship always, holds o


Comment:

It: you are trying to find abounded function" a ~

f dx subject to side conditionsJ

~ b which max~mizes

f i dx = c

i = 1,,2, "",9 n;

then this maximum will be given by choosing,


where k. are chosen to
J.
satisfy the n side conditil

=a

otherwise

Exampleg

Consider a particular alternative

,
!.

~,

known

lJ:Le

~ (Xi~)2
2 -

2a

- 1.58 -

k'a, the u.m.p.u. test

If we can find proper

is~

Reject H i f .

..
(derivative evaluated at

~x.

~.

J.
--r
cr

nlJ:L

. -

~ k +
r

n~

-:r-!
e

n~
.......

kJ.

: 0)

2a

,-

where,

+. k 2 x

k'

-""1

=:

2a

n ~2

k'

e
2 =T
a

2
2a

If we set Yl = the left hand side of this inequality" Y2 = the right hand

and restrict ourselves to the cases where


ing type of

'1.. ~ 0;

then we could get the

graph~

side~

follo~

~l

\
J

- . - --.+
I

I
I
I
I","

'"

___-==-_......:"'

..,-__-l-

'"

It

,It

(Y2" Y2, Y2 "Y2

are possible Y lines)


2
The test says to reject H i f < a or > b where a may be -

b may be +
.

41'

II

Iff

Y2 gives a two-tailed test (finite a" b) ..- Y2' Y2 "Y2


tests (only one interseotion with Yl) ]

CD
00

all give one-tailed

-159 Requiring that the test be of size


Pr[a<i<bJ

0<

specifies thatll

... l.-.,{

rn

i.e."

.--'.--

,r2n 0'

me

20'2

di =1. ..

..<

Also, the unbiasedness condition requires that:

d (

dPl

'-

orl

d~

rn
f2io;
1. ..

e"

or

,f2na

_rn-

-{2ncr

2ci

..
e

):
00

dX+

rn
$0'

-co

- rn

n(i - lJ.:L)2

n(x - lJ.:L)2

2ci

..

-0

dx

'2=0

n(x _ ~)2

.. - 2(i

lJ.:L

..2
nx

2r:i

=0

dX

..

nx

ill

di

[2]

==0

It

[2 ]

The function under the integral in


is an odd function" so that the integral
is zero only if a == - b (i.e", if a, bare symetrical).

[1]

..

thus become sg

so that 111e can determine a asg

1.60 Thus the test is:

1x

Reject Hii'

>

and since it is independent of

IJ.r

it is U1>m.p"u.

Problem 70.

Find the

u~m.p.uo

note~
Theorem 36e

~ (Xi
If

test

test for H
O
-

~)2

2
is sufficient for cr

is sufficient for ~ then given any test (~) there exists a

'Y (~O with the same power function.

optimum tests, only functions of

Proof:

define
Y(:) = E
by definition
Thus 8

[. (o!)

'. need

( TJ

Hence in looking for

be considered.

which is independent of Q

- 161.

-Invariant Tests
Refs:

We have8

Lehmann" "Notes on Testing Hypotheses"" ch o 4


Fraser" "Non Parametric Methods in Statistics", ch. 2
observations.

Transformations:

parameter.

...

g(x)

density&

has density f

._ X t

[!.,

f(~

Q)

g (Q)]

ioeo, there is an induced transformation on the parameter space:

the group of transformations that leave the problem invariant, i.e.

Gg

Examplee

-g (Q)

$,

Wo

if Q e

""g (f})

.~

if

~ 8

~l

2
Xl' X2, " ... "', In are NlD (.I., (J )
t

=OX

g(x)

:0

+ d

CX 0\+-

g (.I.,

11.~

If

we set d ... 0 and c


X

...

> 0,

(C(.l.

+ d, c 2(J2)

:(0

(.I.

is NID

Xl

(J2)

oft-

d, c 2(J2)

~ >0

then the problem is invariant, and we have.

g(x) ... ex

Sufficient statistics for

~,

..
'"
cr2
are
X and S ... ~(X.

X)2

J.

Under the transformation:

t a - is invariant (the inclusion of constants does

rs

not affect invariance).

(discussion to be continued after def.


Def. 46t

46 and theorem 370)

m(x) is a maximal invariant function under a group of transformations if


1) m[g(x}]

=m(x)

2) m(xl )'" m(x 2 )

>

there is a g such that g(x2 ) = Xl of vice verB

- 162 Theorem

37~

The necessary and sufficient condition that the test function (~)
be invariant under G is that it depend only on m<.~).

Proof~ 1)

(2)

Y(m(~) ]

I:

rJ [g(~)]

'ft ~(!)] 1 = "r[m(~.> J


m

Suppose that (!) is invariant,

2)

= (x)
,

We have to show that (?E.) is

a function of m<.~~).
Given [g(~)

show m(x ) ~ m(x )


1
2
m(x1 ) = m(x 2 )

(!)

>

> for some

(~) == [g t (x2 )

thusg

(~)

:: (x2 )
t

g, call it g , g (x 2 ) == xl
I:'

(x 2 ) which is what we set out

to prove.
Returning to the IIIStudent D problema
It remains to show that t

... i

JS

is maximal invariant

(l

""I

Setting t :. - "

rs

==.1.

:c.L.-

J'Si

we have to show that given t

=t II

we can find

_)

g such that X , S : g(X i S

-,
Consider..!.. and call this ratio

"a""
t
2
or S = a S

But this is just one of the members of the original family of transformations so
t is maximal invariant.
Hence the problem is reduced to finding the u.mGp. test based on t.
In summary.

Xl' X2.9
H
3
'0

eo.,

X are NID (~.?


n

/lk=0

Cf )

- 1631) Reduce by using sufficient statistics,


2)

X,

SW

Impose the invariance conditions under the transformation x'


This reduces the problem to tests based on t.

3) Find the distribution of t under


Ratio Test).
-

fb

at

CX, C

:> 0.

and H and apply Theorem 33 (Probabilit:.


1

-------------------------------------------.
Distribution of Non-Central t{t-distribution under H)a
Refs~

.'C 5

Neyman -Ii- To:tmrska, jASA 1936, pp. 318-326 (tables for the one-sided case)
Welcl?- + Johnson; Biometrika, 1940, PP. 362-389
Resnikoff + Lieberman, "Tables of the Non-Central t-distribution ll , Stanford
University Press, 1957
the- non-central t-variable, is defined bya

z is N(O, 1) .

where I

d.f~

w is X with f

8 is a constant

>

The usual t-variab1e is

='C

CO,

= rn

f)

i
CT

-rhE
2

(j

when H is true
o
If ~ is true, lJ.

lit

11.

and

Jil (X-/.L:1.)
(]

. is N(O" 1)

(n-1)

-164 The joint density of z and w is,

2'

-1

'2

Q)

<.Z

<,co

w ~O

To get the density of

't"

(the non-central t) we make the transformatiom

't"

= _e +8

rr

Thuss

1 /fU 't"

co

~
o
To get the distribution of the usual t, put

=0

r rr

8)2

!:! _ ~
u 2 e

O.

The integral in f ('t" ) become s

;1 - \

~( l +1)

du

which can be readily evaluated recalling that

e-ax xb-1 dx

o
Thus,
ref:

1 +t

Cramer, p.238

du

- 165-

EO

The u.m.p.i. test is to reject

Note.

if &

This ratio is a monotone increasing function of 't'. (The simplest


proof of the required monotonisity is given by Kruskal" Annals of Math
Stat" 1954, pp. 162-30)
R('t')
ratio
kl-------~

Since t >t o when the above ratio> k" the probability ratio test thus reduces to:
Reject Ho when t

>

to

Final result is that the m.p.io test for Hli

Reject Ho when t
This is independent of
Example.

'1.

fl: >

=~ >

~o

is.

t 1-< ("..1)

and hence is u.. m.p.i. for

(~"

Xl" X2,

000"

Xmare NID

Yl" Y2"

000,

Yn are NID (~2'

EO

against Hlo

el)
2
(J

i~ u

Sufficient statistics for the three parameters are:

sp

==

~ (Xi

- X)2

m+n...2

+ ~ (Yi - 'Y)2

V\

\;"-0

\l\

_ 166 ..
Consider.

N( a

X'=a.X+b

Xt i

yt : oY + d

yt is N(c

+ b, a 2a2)

~2 + d, c 2a 2 )

Invariance requires that,


a~~b=o~+d

when

=- j.L.2 for all j.L.

a~ +b>o ~2 +d

b dJ

The first line requires that.

requirementthat~

The seoond line adds the


Thereforec

a=-o

Xi' =- aX + b, yt == aY

oft

b, a ')P 0

leaves the problem invariant.

To be provent
is a maximal invariant statistic.
t =- t

t===} there exists an a, b such that

X'= aX + b" yt

=- aY

-11

x. - "

=-

,?

define:

o(x .. i)
now let XI

or~

==

j( t

..

!I

=d

eX

o(x ... Y) =

ef ...

.. ei

il

'if =- of

Xf
T~e

..

=:

l:<o

d ..

d -

+ d

eX ...

so that the same transformation has been applied to


'X and Yo

u.m.pe test invariant under the family of transformations,


Xt =- aX ... b,

iSI

Rejeet II if

Y': aY + b"

"> 0

-161 ~

Under the alternative

- 1J.2

I:

d" thus

(!..:-Xcr - d)
s

cr

"'"

JH
m. .
T

!!

cr

1
Ii

has a non-central t distribution with parameters [

crJl:+!
m n

.
Exercise.

.( == 0.05
-cr = 0.8
Find m" n required to have the power

'?

,,90 (put m = n)~

Try to find m, n also by normal approximation.


dof 0

Answer:

non-centrality parameter =p

2(n-l)

=0

-30n
28
27

25
For the power to exceed e90" from the Neyman-Tokarska tables we need:

= 30
d.,f o = 00

p == 2.99 at dGf o
P ., 2.93 at

Thus, by rough interpolation, the minimum size required is n == m 28.


From the normal approximation,
Problem 7lg

a)

crl

= 26.7

or 27

Xl' X2,
.
Y
,
Yl~ 2
HO~

0"00,

"Ill""
2

X are NJD (~, cr )


m
1

Yn are NJD (1J.2" cr 2 )


2

== cr2

1\$

crl

>- cr22

b)
c)

Find the group of transformations leaving H invariant.


O
Find sufficient statistics for the parameterso
Find the maximum invariant function.

d)

Find the u.m.p.i. test.

-168 ..
e) Find the u~m~p.u.i. test (ioe., set down conditions to get the rejection
2
2
region as in problem 70) forI HO' cr1 = r:J2 against H~U (J12 rJ cr 2
2
f) Show that the usual test which is to reject if F '> Fl _..<. (m-l, n-l) where:
s 2

1:0

~ (Xi"

x)2

m-l

~ (Y ...

z 2
y

.~

!)

n-l

1=

is a test of size 2..< with lIequaltail probabilities""


g)

= 0.05.

Plot the power of the test of (r) for m = n = 10 with 2..<.


2 2
(Include the points cr1 /r:J2 = 0.5, 0.8, 1.25, 1.5s 2~0)~

Tests for Variances:

~ (Xi ..

~) IJ. known

Reject

Ba

it

~ (Xi ~ 1J.)2

Reject Ho if ~ (Xi - f!.L) 2


ref: problem 70

2) IJ. unknown

is sufficient for

/.1-)2

-- u"mopc> for HO against

)t K

< IS.

~(X.

or
-

>K2

xl

J._

.,

cr

n-l

--

H:t.

uemopo u<) for HO against

are sufficient for IJ.,

H~.

cr

Problem is invariant under translation, i.e


XI = X

-It

To find a maximum invariant function


f(X', s,2)

= f(X

+ a, s2) = f(X, s2) must hold for all a.

This says that f(X, s2) is independent of


tions of i only.

X.

Thus invariant functions are func-

Hence, since (n-l~ s is ,,2 with (n-l) d~f. when H is true, the problem .is
O
(j
exactly that of (1) with the

d~fo

reduced by 1

(i~eo,

n-l vice n).

-169No~alTests:

Summary of

Xl~ X2'

oe", Xm are NID

XiS" Y's are independent

Yp Y m ... ~ Y are NID


n
2
Altel'native
Hypothesis
1.

Ha :

HI:

~ )0

&It

Test:

(~

Reject H if

Classification
Invariant under
(f or H against the transformations
given alternati va)
of the fom

known)

x )zl

O"x

-..<

rm
-

O"x

2 fl

"

I-I )Zl_~ 1m
-

Hi: ~

:f

HO: ~x

= ""0 ( O"~

HI: ""x

> ""0

t ) t 1_..<

H'I"" ""x

I: ""0

It

IX

= i0

3. Ha : 0"2x
HI: 0"2x
2

Hr: O"x
1

unknown )
Xf = cX

I > tl~2

e)

Xf = eX

e arbitrary

("" known )
2

~ (Xi .. ",,) 2

~ (Xi ;.. ",,) < Kl or

>' 0"0

F 0"0

2 (m)
> X1-..<

--:1

u.m.p.

>K2

(for equations for K1' K2


see problem 7a)

4e

HO: 0"

= 0"02

HI:

O"~ )

HI:

O"~ F O"g

og

("" unknown)
~

- 2
(Xi - X)

~ (Xi

2
> X1-..<
(m-l)

>

- j{)2( Kl or 1<2
(for K , K see problem
2
l
70 - use m-l d.f.)

xr = X + a

Xt = X

-I-

-170Alternative
Hypothesis

Test:

Re ject H if

01assification Invariant under


(for H against the transformations
given a1ternati va)
of the form

H : lJ.x :: lJ.y
O

x' = X + a
I'=I+a
x' = X + a
I' :: Y + a

x,

a.

'f, s~

m+n-2

are sufficient for j.Lx'~~O'

x-f

x' = aX
= aY

yl

sp/ ! .
m+ !
n

I X - y I > tl_~(m+n-2)
2
s p(lii-r"ii
J!":T
ix r iy

X, f, s~J

+b
+ b

Xl :: aX + b
It :: aY + b

s; are sufficient statistics far IJ.x '

~, ~,

0';.

This is the classical Fisher-Behrens Prob1em--no exact test


is known. The approximations thus far have tried to keep the
size of the test under control, and very little attention has
been paid to the power. The approximation used is:

X-I

s2
2:. + _l

t' is approximately distributed as t with


modified daf .. (i.e., modifications by:
Smith-Satterthwaite, Cochran-Oox, and
Dixon-Massey)

Ref:

Anderson and Bancroft, p. 80.

Tables far t I in the one-sided case (-< = 0.05, 0 ..01) have been
given by Aspen in Biometrika, 1949.

-171If m = n

X-I

= -:;:::::::::;;;:
+ s2

ix
I
V

X-I

Empirical results:
based on 1000 samples"

= n = 15

cri = 1"

cr~ = 4 :

significance level
actual rejections

5%

1%
4.9% 1.1%

rejections based pn 28 d.!.


significance level
m=n=5

actual rejections

5%

1%
6.4% 1.8%

rejections based on 8 d.f.

In re power in the Fisher-Behrens problem:


Ref:

Gronow; Biometrika, 1951" pp. 252-256

He gives a note on the power of the. U and the t tests for


the 2 sample problems with unequal variances e

cri

vfith n l f n "
f cr~, the U test stays fairly close to ..(
2
in size, whereas the t-test jumps wildly ani a comparison
of the power becomes very difficult.

-172Alternative
Hypothesis

Test:

H'cl=i
O' x
y

Hl :

2
x

(j

>

Reject H if

Classification
Invariant under
(for H against the transformations
g!!~m alternative)
of the form

known)

2
y

XI = aX + b
yl = (a)Y+c

(j

XI = aX .,.. b
yl = (:ta)Y+c

vJhere Fl' F are chosen to satisfy the unbiasedness


2
conditions as in problem 71.
Since Fl' F depend on complicated equations, we usually use
2
the test:
Reject HO if max

~2

2 )

1-, 1

sx

>F l~

(m,n or n,m)

d.f. depending on which term is in the numerator.

8"

(lJ.x '

~y

unknown )

u ..m.p.i.

XI = aX + b
Yt

u.m.p.u,.i.

= (ta)Y +c

XI = aX + b
yl = (ta)Y+c

Same comments on the determination of F and F apply as


l
2
in No.7" and the same alternative approach is usually
taken (with the appr'opriate modification of d.f.)

- 1733..

Maximum Likelihood Ratio Tests.

Ha:

~:

Q 8

n-

max f(!I Q)
Q 8 Wa
11.= max f(~ gj

8..0..

Maximum Likelihood Ratio Test is to reject H if A <: A where A is chosen to

the size conditions.

For small samples in general this procedure may not give reasonable results.

satisf~

For

large samples, as with the mel.e., results are fairly good\t


Remarkc

Under suitable regularity conditions -2 ln A is asymptotically distributed


as x2 The degrees of freedom depend upon the number of parameters speci-

fied by the hypothesis, ice", if Q has m components in nand k component


in W then the dof" in the asymptotic distribution of -2 lnA. (under H )
a
o
are (m - k).
See S. S. Wilksi

Proof:

Annals of Math Stat, 1938" or "Hathematical

Statistics" II Princeton University Press


Wald has proven that the ffieloro test is asymptotically most powerful. or asymptoticall
most powerful unbiased testG
A~

Refi

Example
of
...

m,l.r~

Walda

Annals of Math Stat, 1943Transactions of American Math Society" 1943


(both papers in his collected papers)

tests

2
Xl" X2 " ou" ~ are N(~" 0.)

HaS

=a

~Fa

R.t.8

~ (Xi

2
20

- ,,)2

unknown

-174 ~

~X.

When H is true"

J.

is the m.loe. of a "

X"

And in general"

n~X2

----~X
i

- ~ (Xi" X)2
An: ~X2
2

n~

(X._X)2
_ _...;;J.
_

2~ (X -X)2
i

(n-I) s2

(n-l ) S 2

-it

nx-.2

.if

.-n2

.
:::

h(~) :::
of Ae

+~n:r
n )'=2s

J n-l

(A. -2!n _ 1)1/2

can be used if it is a monotone function


/

- 175 -

(~)

hf(A) ==-[n-l
h(~)

Therefore

(X 2/ n _1r l / 2 i\..f/n)-l

(~) <

0
~o

is a monotone decreasing function of

h('iI.)

</\o
h(:\) > h
o
ltl >t o

Reject H whens
O

Problem 72& Find the m,l or e test for H :

against

lJ,=O

1\:

I-L

> O.

Does it reduce to the u.m.p.i. test?


Problem 73:

Xl" X2,

Hog

X are NID(ft./., (i)

0>01

(j

-::: C

lJ,

lJ,J

(specified)

Find a uem.po io test for HO against


Perform the test where X :: 5J

4.

General Linear

(j

HIg

S :::

== c 1

ci unknown
'>

Co

~.

12..

00

=:

1, n

II;';

20

00

~" N

Hypothesis~

assumptions8

X,1. are NID(lJ""


1.
P

lJ" ==
3.

j=l

a ..
3.J

i = 1.1/ 2"

~,
J

~1"

$.0,

~p

are unknown parameters

p ~N

rank A ::: (a

ij

) is p

k = 1,,: 2" eu" s


s ~ p
i.eo" s linearly independent
equations in the parameters.

aBs" bes known


Alternative formulationg
We may solve for
assumptions:

~l"

13 2"

ClU"

13 p in terms of p of the l,J>is and then get as the

i==l

".ei lJ.i

=t

.f, ::.

1" 2"

u.

N..,p

HO can be written as an additional set of s equations in the lJ.'SI


N

HOtJ

i=l

C1'...
'10.

~,3.

= 0

k -1" 2"

GU',

- 176 ..

Example;
i.e.~

the 2-factor, no interaction modeJ

=- 1" 2,

eo .t

ab observations

'"'1." ~.,

Parameters are iJI,

~a-l' ~l" ... , (3b-l

0<.

j=l

parameters in all

are the a..l equations in the parameters

It:

b -

a-l .

Lemma:

a. jY' (i = 1., 2" ..


1. J

fl,

m) are m linearly independent equations in

n unknowns (m .c: n) then there exists an equivalent set of equations with


matrix C which is orthogonal" i.e.
n

~ c~. ::

j=l

ProofS

for i

l.J

= 1,

2,

'Ref a 11ann, Analysis and Design of Experiments.


given m equations in n unknowns=

..
"

CI

o~.,

.. 1.7'7 ..

The first row in the equivalent set is determined by settingt

To get the second row:


t

c 2j ::: a 2j ..

~lj

~clc;.
e~cla2'
j
J J
j
J J

..

A~ci,
j
J

This = 0 (as required) if A. :

~clja2'
j

which will hold i f the denominator ,. 0 _. but by


virtue of the independence of the original equations the equality can not holdo
For the third row:

~Cl,c3'
, =~a3,cl'
j
J J
j
J J
This = 0 i f

Similarly

A:t

=.

..

A:L (1) - ~(O)

~a3 .c l
j

'
J

.~ =~a3jC2j
J

Finally we set:

The completion of the proof follows readily by induction.

--------

~~-

-1-78 hypothesis~

In the alternative formulation of the general linear


N-p+a (~ N) linearly independent equations,

we had the followin

i=:~

A. ti

~i

Ill:

From the lemma we may assume these form the first N-p+s rows of an orthogonal matrix,
These can be extended to form a complete orthogonal matrix (NxN) -- this is a well i
known matrix algebr~lemma,\l proof is in Cramer or Mann. If we call the complete
'
orthogonal matrix ./ '- , we have

.0.

~l

..
II

"N.p"

III

Pu

.0.
o

"

":w

~-p,
Pl N

'A,ts-"

P Vs orthogonalizE

't'ts added to complete


the matrix.

..
0

C)

fJ

PaN

Psl
III

'l;lN

"

<l>

'l;

p-s, N

E(Y,e)

= 1.=1
~ 4 Di ~
'CI

by the assumptions

=0

.0

if H is true
O

ci

yts are independent normal variables with means as shown and variances
(an orthogonal transformation changes NID variables into other NID variables with the same
varia.nce s) "

-179 This is the canonical form of the general linear hypothesis.


2

Yi are NID (1,.(.1" a)


jJ..

].

H:t

where lJ.i

at:

=- 0
I

one or more of these s IJ.i

"

S .;.

MeL.R" Test for the canonical form:

=- -

( {2i1 a)N

m.loe. of the IJ. 's in ware found by setting IJ..]. = Y.]. (i=N-p+s+1"
N-p+s

and~2=~

1=1

A =

u."

N)

y. 2 "N
].

8~1

(~

e - N/2
e - N/2

N-p

~y.2

i=l ].
N-p+s
~y2
i=~ i

Qr

N-p+s
y. 2
== Q + ]
a
i=N-p+~].

'A,N

Q
a

CJ;

=(~r

= absolute

minimum (nothing
specified about the hypothes
r = relative minimum

-180 -

Qr - ~a

- 'l:9
1'1

A. -1=

X (s)

N;f

=~

y.

i=1

has the F-distribution with parameters

Therefore

~d.2

N.;fItS

Recall that

is

J.

i~-~l

>

(S.ll

N-p)

J.

y.2 is the sum of squares of variables with non-zero means" and


J.

hence has a non-central

x2..distribution

with parameters (s,

i=1

d. 2 }e
J.

(Q - Q )/s
The ratio of a non-central X2 to a central X2 ia a non-central F ~ thus ~r_,...-a
__
Qa P
s
under H.. has a non-central F-distribution with parameters (a, N-p.t ~ d. 2 ).
--~
i=l J.

7N-

Thus the problem is solved in terms of the yts (the orthogonalized XiS).
The original problem was:
XiS are NID

..

1)

i=l

)..0'
",J.

(~i' a2 )

~i

= 0

2)

i=l

Pki jJ.i = 0

.e = 1" 2"

Q $

N-p

- 181-

HeLoR. test:

defineS

Q;
Q;
~

=:

min

~ (Xi

=:

min

(in .f).)

G (in

w)

~)2

under restrictions (1)

(Xi - lJ.i)2
=:

under restrictions (1) and (2)

(Q~)lJ2

~ (Q r )1/2
r

')/2e-N/2'
.

"
a
"" =n'
-

""r

-Nl ~

As in the other case, this is a monotone decreasing function of

Q - Q

r,

Qa
Recall that under an orthogonal transformation, sums of squares are preserved.
Thus:

Q is carried into Q

Q is carried into Q

Hence:
and has the same distributions under H (F-distributioj
and under
,Problem

74:

11.

(non-central F).

X. 'k is N(j.l.", cl)


~J

j.l.

~J

=~

=~,

2,
= 1.. 2,

i
j

~J

0.0,9

a
b

+ ~i + ~. + (~)i'
J
J

all ~.

Find the usual F-test for 1)

H
O

2)

= OJ

all (-<13)

~J

=:

- 182-

Power of the ANOVA test:


Power of the ANOVA test has been tabled by Tang (Statistical Research 11emoirsl Volt\> 2
for ~ = 0005, ~ = 0.01 for various values,of.

Tables ing

lit

2~JJi2

s+.

Mann
Kempthorne

s
2

aO' A

==

d.

k=l

i~

YN-p*k =

i==l

P ki Xi

~ ~

==

Q-Q

11,=

--,-

=~y'
k=l

(N

=~]

N-p+k

k=l

i=l

hence 20'2", is Q - Q with X. replaced by E(X )


r
a
1
i

I:

IrA.

Example:

or

X" k =
1J

f,J.

13J

+ -<i + ..<. ""


1

]13. ""
j

-I;

(0<[3) 1J

e"

-I;

1J k

~
(0<[3) ..
i
1J

..<. "" 0

"" 0

X eo. l
under

~,

-<i not all zero

Qr .. Qa ==

~(~13) ..

E(l.1 ., ) = ~ + -<.1
E(l

C'

) ""

E(l.1.. -

+ 0 + 0

X.... ) "" -<.1

1J

=0

- 183 a
hence

~ ~

20'2",:::

thus

Exercise:

i=l jl:~ k=l

..{.

nb

1.

.,(. 2
i=l 1.

2nb .::J~i 2]1/2

I:

[ 20"2

If a

3" b = 5 how large should n be so that we can detect

I:

with probability .75 (~ ::: 0 0 05).


Try n.. calculate , enter the tables and find the power. After successive
trials n will be obtained to give the required power. (daf for the numerato
:0 2,; for the denominator::: ab(n-l) ::. 15{n-l).
Verify that n ::: 13.
0

Randomized blocks;
i .. I" 2"
j = 1" 2"
~..

J.J

~ +

J..

oft

.. e ...

"0.''

b.

(this is the additional assumption of randomized


blocks)
or

X,.
J.J

I:'.~

where

+ "<i + b. + e ..
J
J.J

ij
j

is

N(O, 0'2)

is

N(O, O'b 2 )

and they are independent.


some

-1...1.

rj.

~b,

Given b."
J

If H is true"

X.1..

ib

are N(~ + ..(. +


1.

X.

1..

is

N(~ ' ..

i=l J

II>

2
0' )

""0

0'2)" and

~ ~ {f. - X

1=. 'J=
1.'11....

a
)2

~ (X. _ X )2
'11.
1.=

2
has a X -distribution with a-1 dGf. This conditional (on the b,) distribution does
not involve the b j " therefore the unconditional distribution of J
a

~ ~ (X, - X

i=l j=~

1.

)2 is

x2 (a_l).

..

...

-184 ...

In general" given b.,


J

~ ~ (XiJ.
i

- Xi....

X.

-J

2
is X (a-l)(b-l).

)2

Again we have the same result for the unconditional distribution.


Hence a test of

He

is based on the statisticB

ft

.>y'a

(ili. - il

~ (X'j~x-.-'"';"'--x-.-It-X-)2-~~(a---~)-(b-"-l)

[b x.1"

is N(IJ.'

X -distribuliion with parameter b

+$ ..<.,
1
~c(i2.9

.~~'

.J

1.

which has the F-distribution with


If H is false,

-1

(a-l)" (a-l)(b-l)

dof.

~ ~(X1.... XO'~ )2

ri)e

has a non-central

dot' ~

a-l~

The Power is also calculated as i f the b t S were fixed parameterso .


Random model:

(Heirarchical Classification or Nested Sampling)

00.,

i =~J1 2,
j = 1.~ 2" ..

where the a

u,

a
n

are N(O, ?"a ).

ANOVA Table

~
aSs
error

a-I
a(n.,l)

E(MS)

SaiS.

~ ~(X ....!

)2

~ (X i {'X i )2

1.

CI<ll

+na

has a x2.distribution with a(n-I) d.f.

-18$ Since this is conditional on the ai' but independent of


them" it is also the unconditional distribution"
....
e~g.,

unconditional distribution.

in terms of moment generating functions

j.tb

~ (X. - X

i=l

J. 1t

)2

is ,,2 with a-l d.f.

Qa

a2 itnaa 2

He;

eTa

a leads

...

to the usual F-test against

~g

--:2.
a

+na
a

has an F-distribution with [(a-I), a(n-l)l d 0 f.


Eroblem 75~

Xijk

1ll:LJ

a)

lJ.ij ... l..l. ... ..<..J.

b)

I.kij

=:0

,i)

is N(l';.j'

l..l.

-I;

it

b ..

J.J

a. + b ..
J.

ij

J.J

..,

= 1,
= 1.,
= 1.,

a.
2,
b
2, .,
2, 0 , n
k
where b
are N(O, CT 2 )
ij
b
2
where b
are N(O, a )
i
j

a.

J.

fj.~,

are N(O,

CT

2)

and the a's and b is are independent.


For n

= 21

= 2;
1.)

2)

a =

5;

: 0.05,

in (a) against

in (b) against

plot the power of the tests:

eT2 +
aa

2~ 2 2

-186
~

MUltiple Comparisonsa
Tests of all contrasts with the general linear hypothesis model following ANOVAI!>
Tukeyts Procedures
observations:
let R =- max
~

se

Xl' X2 ,

i Xi ~
~

'5 i

is an estimate of

eo.,

Xn are NID(O,

- min {Xi}
n
1. ~ i ~n
2 which is independent of Xl' X ,
2

qU,

Xn with

f d.f.
(iQe., f s 2;02 has a X2.distribution with f d.fQ)
e
The distribution of Ria can be found in a general form" and in particular when the
X9s are normal (since the X.la are N(O, 1) -- it is free of any parameters.
:L

The distribution of Ri

RI!a

= !..
s e "'" s e

::0

Studentized Range

has been found by

numerical integration and percentage points have been tabulated by Hartley and Pearsor
in Biometrika Tables, Table 29Q
Consider the one..way ANOVA

situation~

Xij are NID (~i' cr )

i
j

2
!i. are NID (I-l-i , ~-)

I'1SE =Consider HO~

~ ~ (Xij

2<0

Xi )2

a(n,,",l)~-

J.!i::O ~

= 1"

2,

~ ~,

=- 1 ~ 2~ 0". II n i = n

2
is an independent estimate of a with a(n-l)

or IJ.i .., ~ = 0

Given HO:
Pr

This t is the statistic Q (a" f) in Snedecor, section 10.6, po 251


where f =0 d.f () =' a(n-l) in this case.
This inequality will hold for any possible pairwise comparison (i.e lll " any i, k).

d~f.

~87

Frame of reference is the total experiment" not the individual pairwise tests -- is eli,
for the total experiment and making all possible pairwise tests, the probability of
the Type I error is ~..<" where co< is the level chosen for the Q(a"f) statistic.

i.1. - '1<:

Hence we reject H:

I:'

for any i" k when

rn (Xi. - X
{Tm
k)

Q..< (a"f)

Scheffe Test for all contrasts (most general of all).


Ref:

Scheffe, Biometrika, June 1952


2

X.. are NID (lJ.., (1 )


l.J

l.

1, 2,

1" 2,

se 2 is an estimate of (12 which is independent of

with f dof.

~ c.~l.'

i=1

...

i=1

l.

Consider the totality of all such


Theorem 38g

X.l..

0 ....
l.

HO(~).

If the lJ.i are all equal" then

Pr

(a-I). C

a-l
where 0'

()

hence, if' we reject any such H (') when


O
a
2

[ i=l 0i

(Xl.'

----

- X

>}

2
s 2~ 0i
e. i=l

n.l.

then the type I error

~-<.

a-l.1
F"l
(1

r(a-~+! )

r()r{f)

_ a-ltf

F)

dF

- 1.88 Proof:

Under --0
H_

For all c. with ~ c.

~c.X

1. ...

1.

1.

0,

=:0

>

~Pr

.......

~c i =0
Y.1. are NID(O, 1)

define:

i=l, 2, ".:1 a

d.

1.

tit

~o.1. =~.r;-:'J u i d.1. =

(]

d. Y)2

Expre8$ion we wish to maximize is

1(,2
1.

i_

with respect to d

=- 1" 2" ... " a

r~ rn:J y.J~di
Y~li=
J
~n.J

multiply

[lJ by dj

since

~ (ltd.
1. 1.

:=

and sum over j:

2[;EdlJ [;EdiYiJ - 0 - 2~(l) - 0


~ - [~diYi] 2

since: i)

~ v:'d.
J. J.

ii) ]d. 2
J.

=- 0
=:0

- J.89 put

":t"

~2 in [1] to obtain the jth equation:

2Yj

[~d1YiJ ~ Tn;~Jii',.~J[~diyJ -2dj~d1yJ ~O


2

nj

= ~

(Y . - i)
J

j=l

Reoall:

Pr

.,
J.=..

c.J. !.J.. ]

t
~C'!i
J.
,

:il:. Pr

>t

~ n. Y.
_ _...J.---.;J.;;;.._

~ni

:=Pr
6e

Y.J.

22

and are NID(O" 1)0

(]

>t

... 190 ..,


We can make an orthogonal transformation of the formi

Then

has a x2..distribution with (a-l) dof ..

Hence

>a.:rt

Thus

has an F-distribution with (a-l, f) d.f.

but

Therefore the type I error will be less than or equal to ..< if we reject any H (!!) ifa
O

L~Oi 1 '>
2

Xio

se

2~ci

~-

ni

(a-l) Fl _-< (a-l" f)

191 $"

~.Tests
Simple hypothesis

Single series of trialsi


events)~

classes (or

pr6babilities~

observations:

l'J.

P2

vI

"

Pk

~P.J

~v j

=1

=n

If H is true

X.

is asymptoticall1 normal subject to the restricti..

lOt

By an orthogonal transformation it can be shewn that

(
0,
"'Iv. -flP.'/

~. -1L.~ 6~
j=1
l1P
j

--71

The power of the test

ha.s an asymptotic X2.distribution with k-l


d"f. as n --7 (!) 0

as n--7oo"

~ower of a simple x2-test:

p. = p.
J

v. - np. o

V ...

np.

=_L.-...:2
X. =- -L-_:.L
.r::-Or:;::'J
p
, np
'I n

$ j
p.

now e

-:L
0
Pj

p.

1. + -J

Pj

to>

p. 0
J

as
Under HI the Yj are asymptotically normal!)

n~oo.

- 192 -

~ Xj 2

With the same type of orthogonal transformation, we have

under

H:L

is a sum

J~~_ and henea has a llOn.-oantral x200distribntion with parameters

of squares of Yj +

Pj
k

j=l
k

The non-centrality parameter can also be written as

n~

j=l

Example a

v." noo of families with


l.

boys in families of 2 children.

Assume the number of boys is binomially distributed with parameter Pit


H:
O

P .. Oc4

p: 005

p.

No .. boys

Pj ~under

under HO)

.16

~48

.36

A=n

=t

I1.)

,,0816 n

For n = 100
The power can be determined from the Tables of the Non..Central x 2 by E. Fix,
University of California Publications in Statistics, Volume 1, No.2, Pp. 15-19.
From the tables, for

= f =- 2" .,( = 0.05


(power) <: 008

A. ... 8/116" k-1

00 7

~ ~

x2_test may also be a one-sided test:


for j

~kw

for j

> k'

(kt exceed P.O)


J

(k - k' fall below Pjo)

- 193 -

x2.teat

o2
(v, - np, )
caloulate ~ a J - as usual

is modified as follows:

for j .f k':

nPj

for j

>kit

<.

if v

> np,O
J

if v j

x2

if v j

<

np j

np j

put the contribution == 0


put the contribution

=0

calculate x2 as usual

is rejected if the sample value ;> X21.2~(k-l)

Examples

8(10)

12(10)

20

12(10)

8(10)

20

Refs:

if v1I

X~

=0

--

0.5

=:

therefore we fail to
reject H
O

20

20
noteg

Hog

> 10 we would

then calculate x2

(on the X2...test) Z


Cochrane

1952 Annals of Math Stat


1954 Biometrics

x2.test of a composite hypothesis:


classes:
s series of observations:

1
vll
'"
0

probabilities:

2
v12

0
0

k
vlk

.eo

"
v sl

v s2

..

P1I

P12

oQ

"

v sk
Plk

"l)

PSI

P.s2

.,,.

Psk

= 1, 2,

2:1

= 1,

...0",
~

j=l.

Pij

Vi'
J

=:

=:

f(~)

s
k

- 194 -

~
[P;;
j
~J
HO~

Pij

X' j
~

1:0

for i

1" 2" ..." s

= PijO(~)

Theorem 39.

1\

IrQ is estimated by

then as n

~ 00,

m.1.e~ or any asymptotically equivalent method


under the regularity conditions as for estimation:

r: .

1\

""\2

LV i; - ni.Pij (~) I
-~

7t

n-lp, . (9)
~J -

-=-

.l.

has a X2~distribution w~Gh s(k-1)-t d$fo when H is true.


(t =: no~ of components in ~)
0

If H₁ is true:

$$p_{ij} = p_{ij}^0 + \frac{c_{ij}}{\sqrt{n}}\,, \qquad n = \sum_i n_i\,, \qquad \frac{n_i}{n} = \ell_i \;\;(\text{the sampling fractions})\,,$$

then χ² has a non-central χ²-distribution with d.f. as before, and non-centrality parameter

$$\lambda = \boldsymbol{\delta}\,\big[I - B(B'B)^{-1}B'\big]\,\boldsymbol{\delta}'$$

where δ is the 1 × ks vector with components

$$\delta_{ij} = \frac{\sqrt{\ell_i}\;c_{ij}}{\sqrt{p_{ij}^0}}$$

and B is the ks × t matrix with elements

$$b_{ij,\ell} = \sqrt{\frac{\ell_i}{p_{ij}^0}}\;\frac{\partial p_{ij}^0}{\partial\theta_\ell} \qquad i = 1, 2, \ldots, s;\quad j = 1, 2, \ldots, k;\quad \ell = 1, 2, \ldots, t.$$

Ref:  Cramér (Mathematical Methods of Statistics), χ² section
      Mitra, December 1958, Annals of Math Stat
      Mitra, Ph.D. Dissertation, UNC, Institute of Statistics Mimeograph Series No. 142

- 195 -

Note on the general χ² tests of composite hypotheses:

$$X_{ij} = \frac{v_{ij} - n_i p_{ij}}{\sqrt{n_i p_{ij}}}$$

are asymptotically normal subject to the s linear restrictions

$$\sum_{j=1}^{k}\sqrt{p_{ij}}\;X_{ij} = 0 \qquad \text{for all } i = 1, 2, \ldots, s. \tag{1}$$

For sufficiently large n the m.l.e. reduces to solving the equation:

$$\frac{\partial\ln L}{\partial\theta}\bigg|_{\hat\theta} = 0 = \frac{\partial\ln L}{\partial\theta}\bigg|_{\theta_0} + \left(\frac{\partial^2\ln L}{\partial\theta^2}\bigg|_{\theta=\theta_0}\right)(\hat\theta - \theta_0)$$

Recall:

$$\ln L = K + \sum_i\sum_j v_{ij}\,\ln p_{ij}(\theta)$$

ln L is linear in the v_ij, as are ∂ʰ ln L/∂θʰ, h = 1, 2.

Hence the estimation of θ (single component) means one additional linear restriction on the v_ij, or on the X_ij (since the two are linearly related).   (2)

If this restriction (2) is linearly independent of the s restrictions in (1), then we can transform the s+1 restrictions to s+1 orthogonal linear restrictions, and then by the usual extension get an orthogonal transformation that takes Σ_ij X_ij² into Σ_ij Y_ij² with s(k-1) - 1 = sk - (s+1) terms, and hence we get the result of the general χ² theorem (Theorem 39).


Example:  Power of a χ² test of homogeneity, 2 × 2 table.

Probabilities:

1) Under H₀:   p₁₁ = p₂₁ = θ;    p₁₂ = p₂₂ = 1 - θ

2) Under H₁:   p₁₁ = θ + c/√n        p₁₂ = 1 - θ - c/√n
               p₂₁ = θ - c/√n        p₂₂ = 1 - θ + c/√n

$$\lambda = \boldsymbol{\delta}\big[I - B(B'B)^{-1}B'\big]\boldsymbol{\delta}'\,;$$

take ℓ₁ = ℓ₂ = 1/2 (equal sampling fractions), so that

$$\boldsymbol{\delta} = \left(\frac{c}{\sqrt{2\theta}}\,,\;\; \frac{-c}{\sqrt{2(1-\theta)}}\,,\;\; \frac{-c}{\sqrt{2\theta}}\,,\;\; \frac{c}{\sqrt{2(1-\theta)}}\right)$$
$$B' = \left(\frac{1}{\sqrt{2\theta}}\,,\;\; \frac{-1}{\sqrt{2(1-\theta)}}\,,\;\; \frac{1}{\sqrt{2\theta}}\,,\;\; \frac{-1}{\sqrt{2(1-\theta)}}\right)$$

- 196 -

$$B'B = \frac{1}{2\theta} + \frac{1}{2(1-\theta)} + \frac{1}{2\theta} + \frac{1}{2(1-\theta)} = \frac{1}{\theta(1-\theta)}\,, \qquad (B'B)^{-1} = \theta(1-\theta)$$

$$B(B'B)^{-1}B' = \begin{pmatrix}
\frac{1-\theta}{2} & -\frac{\sqrt{\theta(1-\theta)}}{2} & \frac{1-\theta}{2} & -\frac{\sqrt{\theta(1-\theta)}}{2} \\
-\frac{\sqrt{\theta(1-\theta)}}{2} & \frac{\theta}{2} & -\frac{\sqrt{\theta(1-\theta)}}{2} & \frac{\theta}{2} \\
\frac{1-\theta}{2} & -\frac{\sqrt{\theta(1-\theta)}}{2} & \frac{1-\theta}{2} & -\frac{\sqrt{\theta(1-\theta)}}{2} \\
-\frac{\sqrt{\theta(1-\theta)}}{2} & \frac{\theta}{2} & -\frac{\sqrt{\theta(1-\theta)}}{2} & \frac{\theta}{2}
\end{pmatrix}$$

and $\boldsymbol{\delta}B = 0$   (the patterns of + and - signs in δ and B are opposite, and sufficient to cancel all terms).
Hence the non-centrality parameter:

$$\lambda = \boldsymbol{\delta\delta}' = \frac{c^2}{\theta(1-\theta)}$$

which can be expressed in terms of the p's as:

$$\theta = \frac{p_{11} + p_{21}}{2}\,, \qquad 1 - \theta = \frac{p_{12} + p_{22}}{2}\,, \qquad c = \sqrt{n}\;\frac{p_{11} - p_{21}}{2}$$

- 197 -

$$\lambda = \frac{n}{4}\;\frac{(p_{11} - p_{21})^2}{\theta(1-\theta)}$$
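A quick numerical check (not in the original notes; numpy assumed, and θ, c are made-up values) that the projection formula reproduces λ = c²/θ(1-θ):

    # lam = delta [I - B (B'B)^{-1} B'] delta' : the squared length of delta
    # after projecting out the column space of B.
    import numpy as np

    theta, c = 0.3, 1.7                        # illustrative values
    l1 = l2 = 0.5                              # equal sampling fractions
    p0 = np.array([theta, 1-theta, theta, 1-theta])
    cij = np.array([c, -c, -c, c])
    delta = np.sqrt([l1, l1, l2, l2]) * cij / np.sqrt(p0)
    B = (np.sqrt([l1, l1, l2, l2]) * np.array([1, -1, 1, -1]) / np.sqrt(p0)).reshape(-1, 1)

    P = B @ np.linalg.solve(B.T @ B, B.T)      # projection onto the columns of B
    lam = delta @ (np.eye(4) - P) @ delta
    print(lam, c**2 / (theta * (1 - theta)))   # the two agree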

In the December 1958 Annals of Math Stat, Mitra gives the following formula for λ in the 2 × k contingency table.

Probabilities:

$$p_{ij}^0 = \theta_j\,, \qquad p_{ij} = \theta_j + \frac{c_{ij}}{\sqrt{n}}\,, \qquad \sum_j\theta_j = 1\,, \qquad \sum_j c_{1j} = 0\,, \quad \sum_j c_{2j} = 0$$

$$\lambda = \ell_1\ell_2\sum_{j=1}^{k}\frac{(c_{1j} - c_{2j})^2}{\theta_j}$$

For the above example:  ℓ₁ = ℓ₂ = 1/2, c₁₁ - c₂₁ = 2c, c₁₂ - c₂₂ = -2c, and the formula again gives λ = c²/(θ(1-θ)).

6. Other Approaches to Testing Hypotheses and Other Problems

1. Most Stringent Tests:

Let the family of tests of size α be Φ(α), and let

$$\beta^*(\theta) = \sup_{\varphi\,\in\,\Phi(\alpha)}\beta_\varphi(\theta)\,.$$

- 198 -

A test φ(x) is most stringent if:

i)  it is of size α;

ii) for any other size α test φ':

$$\sup_\theta\big[\beta^*(\theta) - \beta_\varphi(\theta)\big] \;\le\; \sup_\theta\big[\beta^*(\theta) - \beta_{\varphi'}(\theta)\big]\,.$$
2. Minimax:  Decision Theory Approach

L[D(x), θ] = the loss when D(x) is the decision made and θ is the true value of the parameter.

Example:  X is N(μ, 1);  H₀: μ = 0.

D(x) = 1 means reject H₀;  D(x) = 0 means accept.

L[D(x), μ] = μ²       when D(x) = 0
L[D(x), μ] = C/|μ|    when D(x) = 1

Having set up a loss function, we then have a risk function defined as:

$$r_D(\theta) = E(L) = \int L\big[D(\underline{x}),\,\theta\big]\,f(\underline{x},\,\theta)\,d\underline{x}$$

It is frequently impossible to minimize this universally with respect to θ; thus:

D(x) is a minimax decision rule if it minimizes (with respect to all possible decision rules) sup_θ r_D(θ); or, expressed another way, we determine

$$\inf_D\,\sup_\theta\,r_D(\theta)\,.$$

ref:  Blackwell and Girshick, Theory of Games and Statistical Decisions, Wiley
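A sketch (not from the notes; scipy assumed, and the cutoff rules and constant C are illustrative) of how the risk functions of two competing rules for the example above would be compared in the minimax sense:

    # Rule: accept H0 when |X| <= c, with X ~ N(mu, 1); loss mu^2 on accepting,
    # C/|mu| on rejecting.  Compare sup-risk over mu for two cutoffs c.
    import numpy as np
    from scipy.stats import norm

    C = 1.0

    def risk(c, mu):
        # r_D(mu) = mu^2 * Pr(accept) + (C/|mu|) * Pr(reject)
        p_accept = norm.cdf(c - mu) - norm.cdf(-c - mu)
        return mu**2 * p_accept + (C / abs(mu)) * (1.0 - p_accept)

    mus = np.linspace(0.05, 5.0, 200)          # grid standing in for sup over mu
    for c in (1.0, 2.0):
        print(c, max(risk(c, mu) for mu in mus))   # prefer the smaller supremum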

3. Admissible Decision Rules:

D(x) is admissible if there is no D'(x) such that

$$r_{D'}(\theta) \le r_D(\theta) \quad \text{for all } \theta\,; \qquad r_{D'}(\theta) < r_D(\theta) \quad \text{for some } \theta\,;$$

i.e., you cannot improve on D(x) uniformly.

---------

- 199 -

CHAPTER VII

MISCELLANEOUS

A. Confidence Regions:

Let B(Ω) = the totality of subsets of Ω.

Def. 47:  Let A(x) be a function from X (the sample space) to B(Ω). If

$$\Pr\big[A(\underline{X}) \supset \theta\big] \ge 1 - \alpha \qquad \text{for every } \theta\,,$$

then A(X) is a confidence region with confidence coefficient 1 - α.

Theorem 40:  If a non-randomized test of size α exists for H: θ = θ₀, for every θ₀, then there exists a confidence region for θ of confidence coefficient 1 - α.

Proof:  X is a random variable with d.f. f(x, θ). Let R(X, θ₀) be the set of X for which H: θ = θ₀ is rejected, e.g., [sketch: rejection region R(X, θ₀) shown as shaded intervals on the x-axis], so that

$$\Pr\big[\underline{X} \in R(\theta_0)\,\big|\,\theta = \theta_0\big] \le \alpha\,. \tag{1}$$

This R(X, θ₀) is defined and satisfies (1) for every θ₀.

Define:  A(x) = {θ : x ∉ R(θ)}.  Then

$$\Pr\big[A(\underline{X}) \supset \theta_0\big] = \Pr\big[\underline{X} \notin R(\theta_0)\big] = 1 - \Pr\big[\underline{X} \in R(\theta_0)\big] \ge 1 - \alpha\,.$$

q.e.d.
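A minimal sketch of this construction (not in the notes; scipy assumed) for X ~ N(θ, 1) with the usual two-sided size-α test:

    # R(theta_0) = {x : |x - theta_0| > z}; A(x) collects every theta_0
    # whose test does NOT reject x, which here is an interval.
    from scipy.stats import norm

    def confidence_region(x, alpha=0.05):
        z = norm.ppf(1 - alpha / 2)
        return (x - z, x + z)

    print(confidence_region(1.3))   # all theta_0 accepted when x = 1.3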

Def. 48:  A confidence region A(X) is uniformly most powerful if

$$\Pr\big[A(\underline{X}) \supset \theta'\,\big|\,\theta \text{ is true}\big]$$

is minimized for all θ' ≠ θ.

note:  Kendall uses this same definition with "uniformly most powerful" replaced by "uniformly most selective." Neyman replaces "u.m.p." with "shortest."

Def. 49:  A confidence interval is "shortest" if the length of the interval is minimized uniformly in θ.

Confidence Interval for the ratio of two means:

X_i, Y_i (i = 1, 2, ..., n) are bivariate normal, BIV N(μ_x, μ_y, σ_x², σ_y², ρ), where ρ = correlation coefficient and may = 0.

Problem:  find a confidence interval for α = μ_y/μ_x.

$$Z_i = \alpha X_i - Y_i \;\text{ is }\; N\big(0,\;\alpha^2\sigma_x^2 - 2\alpha\rho\sigma_x\sigma_y + \sigma_y^2\big)$$

Σ(Z_i - Z̄)² and Z̄ are independent, and distributed as χ² and normal respectively.

$$\frac{\sum_i(Z_i - \bar{Z})^2}{n-1} = \alpha^2 s_x^2 - 2\alpha s_{xy} + s_y^2$$

therefore:

$$\frac{\sqrt{n}\,(\alpha\bar{X} - \bar{Y})}{\big(\alpha^2 s_x^2 - 2\alpha s_{xy} + s_y^2\big)^{1/2}}$$

has a t-distribution with n-1 d.f., and

$$\Pr\left[\frac{n\,(\alpha\bar{X} - \bar{Y})^2}{\alpha^2 s_x^2 - 2\alpha s_{xy} + s_y^2} < t^2\right] = 1 - \epsilon\,, \qquad t = t_{1-\epsilon/2}(n-1)\,.$$

We can solve this and get confidence limits for α: the confidence set is where the following quadratic in α is ≤ 0. This yields the quadratic equation:

$$Q(\alpha) \equiv \alpha^2\big(n\bar{X}^2 - t^2 s_x^2\big) - 2\alpha\big(n\bar{X}\bar{Y} - t^2 s_{xy}\big) + \big(n\bar{Y}^2 - t^2 s_y^2\big) = 0$$

- 201 -

Among the various possible solutions of this quadratic equation we can have the following situations:

A.  If nX̄² - t²s_x² > 0:  it can be shown that 2 real roots α₁ < α₂ always exist, and thus the confidence interval is:  α₁ < α < α₂.

For the above inequality to hold, X̄ must be significantly away from the origin; i.e., we must reject H₀: μ_x = 0 on the basis of X₁, X₂, ..., X_n at the ε level. e.g., the following condition must hold:

$$\frac{\sqrt{n}\,|\bar{X}|}{s_x} > t_{1-\epsilon/2}$$

Note:  One could, if desired, change t (and thus ε) to insure that this inequality always held, and thus that a "real" confidence interval exists.

B.  If nX̄² - t²s_x² < 0:  α₁, α₂ may be real or complex. If the roots are real, then the "confidence interval" is (-∞, α₁) ∪ (α₂, +∞), i.e., we accept "in the tails."

C.  If 4(nX̄Ȳ - t²s_xy)² < 4(nX̄² - t²s_x²)(nȲ² - t²s_y²), i.e., b² < 4ac, then the roots are complex and we have a confidence interval (-∞, +∞). [Thus we can say that -∞ < α < +∞ with confidence 1 - ε.]

D.  If we get complex roots, Q(α) lies entirely below the axis:

[sketch: the parabola Q(α) plotted against α, with no real roots]

so that for all α

$$0 \le \frac{\sqrt{n}\,|\alpha\bar{X} - \bar{Y}|}{\big(\alpha^2 s_x^2 - 2\alpha s_{xy} + s_y^2\big)^{1/2}} < t\,;$$

thus we always accept H₀, since the test statistic never exceeds t.

Refs:  Fieller, Journal of the Royal Statistical Society, 1940
       Paulson, Annals of Math Stat, 1942
       Bennett, Sankhya, 1957
       Symposium, Journal of the Royal Statistical Society, 1954
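A sketch of the computation (not from the notes; numpy and scipy assumed, data hypothetical, and the function name fieller is ours) that solves the quadratic and sorts out cases A-C:

    import numpy as np
    from scipy.stats import t as t_dist

    def fieller(x, y, eps=0.05):
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        t = t_dist.ppf(1 - eps / 2, n - 1)
        sxx, syy = x.var(ddof=1), y.var(ddof=1)
        sxy = np.cov(x, y, ddof=1)[0, 1]
        a = n * x.mean()**2 - t**2 * sxx
        b = -2 * (n * x.mean() * y.mean() - t**2 * sxy)
        c = n * y.mean()**2 - t**2 * syy
        disc = b**2 - 4 * a * c
        if disc < 0:
            return (-np.inf, np.inf)                      # case C: whole real line
        r1, r2 = sorted(((-b - disc**0.5) / (2 * a),
                         (-b + disc**0.5) / (2 * a)))
        # case A (a > 0): between the roots; case B (a < 0): the two tails
        return (r1, r2) if a > 0 else ((-np.inf, r1), (r2, np.inf))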


- 202 -

LIST OF THEOREMS

Theorem                                                                    Page

1.  Properties of distribution functions
2.  1-1 Correspondence of distribution function and probability measure
3.  Discontinuities of a distribution function
4.  Multivariate distribution function properties
5.  Addition law for expectations of random variables ................. 22
6.  Multiplication law for expectations of random variables ........... 22
7.  Conditions for unique determination of a distribution function by
    moments (Second Limit Theorem) .................................... 23
8.  Tchebycheff's Inequality .......................................... 24
9.  Bernoulli's Theorem (Law of large numbers for binomial random
    variables) ........................................................ 25
10. Characteristic function of a linear function of a random variable . 28
11. Characteristic function of a sum of independent random variables .. 29
12. Inversion theorem for characteristic functions .................... 30
13. Continuity theorem for distribution functions (First Limit Theorem) 33
14. Central Limit Theorem (independently and identically distributed
    random variables) ................................................. 33
15. Liapounoff Theorem (Central limit theorem for non-identically
    distributed random variables) ..................................... 35
16. (Weak) Law of Large Numbers (LLN) ................................. 41
17. Khintchine's Theorem (Weak LLN assuming only E(x) < ∞) ............ 41
18. A Convergence Theorem (Cramér) .................................... 44
19. Convergence in probability of a continuous function ............... 46
20. Strong Law of Large Numbers (SLLN) ................................ 51
21. Gauss-Markov Theorem (b.l.u.e.) ................................... 61
22. Asymptotic properties of m.l.e. (scalar case) ..................... 65
23. Asymptotic properties of m.l.e. (multiparameter case) ............. 70

- 203 -

Theorem                                                                    Page

24. Information Theorem (Cramér-Rao) (scalar case) .................... 76
25. Information Theorem (Cramér-Rao) (multiparameter case) ............ 81
26. Blackwell's Theorem for improving an unbiased estimate ............ 88
27. Complete sufficient statistics and m.v.u.e. ....................... 92
28. Complete sufficient statistics in the non-parametric case ......... 96
29. Bayes, constant risk, and minimax estimators ...................... 109
30. Density of a sample quantile ...................................... 116
31. Convergence of a sample distribution function ..................... 124
32. Pitman's Theorem on Asymptotic Relative Efficiency ................ 140
33. Neyman-Pearson Theorem (Probability ratio test) ................... 149
34. Extension of Probability Ratio Test for composite hypotheses ...... 153
35. Sufficient conditions for u.m.p.u. test ........................... 156
36. Reduction of test functions to those depending on sufficient
    statistics ........................................................ 160
37. Conditions for invariance of a test function ...................... 162
38. Scheffé's Test of all linear contrasts ............................ 187
39. Asymptotic power of the χ² test (Mitra) ........................... 194
40. Correspondence of confidence regions and tests .................... 199

- 204 -

LIST OF DEFINITIONS

Definition                                                                 Page

1.  Notation of sets
2.  Families of sets
3.  Borel Sets
4.  Additivity of sets
5.  Probability measure
6.  Point function
7.  Distribution function
8.  Density function
9.  Random variable ................................................... 10
10. Conditional probability ........................................... 10
11. Joint distribution function ....................................... 16
12. Discrete random variable .......................................... 17
13. Continuous random variable ........................................ 17
14. Expectation of g(X) ............................................... 21
15. Moments ........................................................... 22
16. Characteristic function ........................................... 25
17. Cumulant generating function ...................................... 28
18. Convergence in probability ........................................ 41
19. Convergence almost everywhere ..................................... 49
20. Risk function ..................................................... 55
21. Minimum variance unbiased estimates (m.v.u.e.) .................... 56
22. Best linear unbiased estimates (b.l.u.e.) ......................... 57
23. Bayes estimates ................................................... 58
24. Minimax estimates ................................................. 60

- 205 -

Definition                                                                 Page

25. Constant risk estimates ........................................... 60
26. Consistent estimates .............................................. 60
27. Best asymptotically normal estimates (b.a.n.e.) ................... 60
28. Least Squares estimates ........................................... 61
29. Maximum likelihood estimates (m.l.e.) ............................. 64
30. Likelihood function ............................................... 64
31. Sufficient statistics ............................................. 86
32. Complete sufficient statistics .................................... 92
33. Test .............................................................. 113
34. Test of size α .................................................... 113
35. Power of a test ................................................... 113
36. Consistent sequence of tests ...................................... 113
37. Index of tests .................................................... 113
38. Asymptotic relative efficiency (A.R.E.) ........................... 114
39. Quantile .......................................................... 115
40. Sample quantile ................................................... 116
41. Uniformly most powerful tests ..................................... 148
42. Unbiased tests .................................................... 154
43. Similar tests ..................................................... 154
44. Invariant tests ................................................... 155
45. Uniformly most powerful unbiased tests (u.m.p.u.) ................. 155
46. Maximal invariant function ........................................ 161
47. Confidence region ................................................. 199
48. U.M.P. confidence region .......................................... 199
49. "Shortest" confidence interval .................................... 200

- 206 -

LIST OF PROBLEMS

Problem                                                                    Page

1.  Pr(Sum of sets) = Sum of Pr(sets)
2.  lim Pr(S_n) as n → ∞
3.  Cumulative probability at a point
4.  Joint and marginal distribution functions
5.  Transformations involving uniform distributions ................... 13
6.  Distribution of the sum of probabilities (combination of
    probabilities) .................................................... 19
7.  Prove that the density function of the negative binomial sums to 1  21
8.  Find the mean of a censored normal distribution ................... 21
9.  Uniqueness of moments (uniform) ................................... 24
10. Prove the normal density function integrates to 1 ................. 28
11. Derive the characteristic function of N(0, 1) ..................... 28
12. Density of the product of n geometric distributions ............... 32
13. Factorial m.g.f. for the binomial distribution .................... 32
14. Inversion of the characteristic function (Cauchy) ................. 33
15. Limiting distribution of the Poisson .............................. 35
16. Use of Mellin transforms .......................................... 36
17. Transformations on NID(0, σ²) ..................................... 38
18. Limiting distribution of the multinomial distribution ............. 38
19. Limiting distribution of the negative binomial .................... 41
20. Show E[g(X)] = E_Y{E_X[g(X)|Y]} ................................... 41
21. Distribution of the length of first run in a series of binomial
    trials ............................................................ 44
22. Distribution of the minimum observation from a uniform (0, 1)
    population ........................................................ 44
23. Prove theorem 19 .................................................. 46
24. Asymptotic distribution of (X - λ)²/X where X is Poisson λ ........ 48
25. Example of convergence almost everywhere .......................... 50

- 207 -

Problem                                                                    Page

26. Inversion of φ(t) = cos ta ........................................ 50
27. Asymptotic distribution of the product of two independent sample
    means ............................................................. 57
28. Show that x̄ is a b.l.u.e. of μ .................................... 57
29. Example of minimum risk estimation ................................ 59
30. Determination of risk function for Binomial p ..................... 59
31. Find a constant risk estimate for Binomial p ...................... 60
32. Proof of consistency of parameters in simple linear regression .... 63
33. Find the m.l.e. of 1/a², where f(x) = a² e^{-a²x} ................. 69
34. M.l.e. of Poisson λ and its asymptotic distribution ............... 69
35. M.l.e. of parameter "a" of R(0, a) ................................ 70
36. ... ............................................................... 74
37. M.l.e. of the parameter of the Laplace distribution ............... 75
38. Cramér-Rao bound for unbiased estimates of the Poisson parameter λ  78
39. Application of the multiparameter Cramér-Rao lower bound .......... 80
40. C-R lower bound for λ where X is Poisson λ, Y is Poisson ... ...... 80
41. C-R lower bound for estimates of the geometric parameter .......... 80
42. Application of the C-R lower bound to the negative binomial ....... 84
43. Derivation of sufficient statistics ............................... 87
44. Sufficient statistics and unbiased estimators for the negative
    binomial .......................................................... 89
45. Variance of sufficient statistics for the Poisson parameter λ ..... 91
46. M.v.u.e. of a quantile ............................................ 95
47. ... ............................................................... 98
48. Minimum and constant risk estimates of σ² (normal) ................ 98
49. M.l.e. of the multinomial parameters (p_ij = p_i. p_.j) ........... 101
50. M.l.e. and minimum modified χ² estimates of the multinomial
    parameters (p_ij = α_i + β_j) ..................................... 107

- 208 -

Problem                                                                    Page

51. M.l.e. and minimum modified χ² estimates of p_i = e^{-μ_i λ} ...... 108
52. Comparison of variance of minimax and m.l. estimators for
    Binomial p ........................................................ 110
53. Example of Wolfowitz minimum distance estimate of ... ............. 112
54. A.R.E. of median test to mean test (normal) ....................... 114
55. Power of the normal approximation for testing Binomial p .......... 115
56. A.R.E. of median test to mean test (general) ...................... 117
57. Distribution of intervals between R(0, 1) variables ............... 120
58. Example of the U test (Wilkinson Combination Procedure) ........... 123
59. Distribution of the range ......................................... 124-6
60. Power of the D_n test (Kolmogorov statistic) ...................... 128
61. Expected value of W_n² (von Mises-Smirnov statistic) .............. 129
62. Wilcoxon's U test ................................................. 134
63. Distribution and properties of the number of Y_i exceeding max X_i
    (two sample test) ................................................. 134
64. Application of two sample tests ................................... 138
65. Moments and properties of a particular non-parametric two-sample
    test statistic .................................................... 138
66. A.R.E. of Wilcoxon's U test to the normal test .................... 145
67. Equivalence of the Kruskal-Wallis H test and Wilcoxon's U test
    for k = 2 ......................................................... 147
68. U.M.P. test for negative exponential parameter .................... 151
69. Example of non-existence of tests of size α ....................... 152
70. ... ............................................................... 160
71. Example of application of Maximal Invariant Function .............. 167
72. Maximum likelihood ratio test for μ (normal) ...................... 175
73. U.m.p.i. test for the coefficient of variation (normal) ........... 175
74. Derivation of F-test for the general linear hypothesis
    (two-way ANOVA) ................................................... 181
75. Power of the two-way ANOVA F-test ................................. 185