
SIMULATION STUDY OF RUNS TEST FOR RANDOMNESS

Assignment for Nonparametric and Sequential Analysis

Pritam Dey
M.Stat 1st Year
OBJECTIVES
To conduct a simulation study of the runs test for randomness of a given sequence
of observations, with the following goals:

1. To check the distribution-free nature of the test.
2. To verify that, in the limiting case and under the null hypothesis, the test statistic has a
normal distribution.
3. To evaluate and compare powers for various sample sizes.
4. To check the consistency of the test procedure.
SETUP OF THE TEST
Let $X_1, X_2, \ldots, X_{m+n}$ be a sequence of random variables.
The Hypotheses
H0: $X_1, X_2, \ldots, X_{m+n}$ is a random sample
H1: $X_1, X_2, \ldots, X_{m+n}$ is not a random sample

Extra Assumption: The $X_i$ have a continuous distribution.


Construction of test statistic
We take the m smallest observations to be of type 1 (say H) and the remaining n observations to be of type
2 (say T).
We compute R = number of runs of H and T.
So, if $L_i$ denotes the label (H or T) of the ith observation, then
$$R = 1 + \sum_{i=2}^{m+n} I(L_i \neq L_{i-1})$$
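The construction above can be sketched in Python as follows. This is a minimal illustration and not the code used in the study; the function name `runs_statistic` and the example data are hypothetical.

```python
def runs_statistic(x, m):
    """Return (R, labels): the m smallest values of x are labelled 'H', the rest 'T'.

    Assumes the values are distinct, which holds with probability 1 for a
    continuous distribution.
    """
    a = len(x)
    # Indices of the observations, sorted by observed value.
    order = sorted(range(a), key=lambda i: x[i])
    labels = ["T"] * a
    for i in order[:m]:
        labels[i] = "H"
    # R = 1 + number of positions where the label differs from the previous one.
    r = 1 + sum(labels[i] != labels[i - 1] for i in range(1, a))
    return r, "".join(labels)


if __name__ == "__main__":
    x = [0.3, 0.9, 0.1, 0.7, 0.5, 0.2]  # hypothetical data with m = n = 3
    print(runs_statistic(x, m=3))  # (5, 'HTHTTH')
```

Here the three smallest values (0.1, 0.2, 0.3) sit in positions 3, 6 and 1, which gives the label sequence HTHTTH and hence 5 runs.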
Rejection region: We reject H0 for both small and large values of R.
Size: Throughout this study we use size 0.05. If an exact size of 0.05 is not attainable
because of the discrete nature of the test statistic, we use a conservative rejection
region.

Justification for the test: All (m+n)! arrangements of the data are equally
likely under the null hypothesis. Thus, by defining the m smallest observations as
type H and the rest as type T, all $\binom{m+n}{m}$ possible arrangements of H and T are
equally likely. So we are justified in using this test.
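The counting argument can be checked by brute force for a small case. The sketch below (the helper name `arrangement_counts` is hypothetical) enumerates all (m+n)! rank orderings and tallies the H/T arrangement each one produces; every arrangement should appear m! n! times.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial


def arrangement_counts(m, n):
    """Count how many of the (m+n)! rank orderings give each H/T arrangement."""
    a = m + n
    counts = Counter()
    # Each permutation of the ranks 0..a-1 is one equally likely ordering under H0.
    for ranks in permutations(range(a)):
        # The m smallest ranks are labelled H, the rest T.
        labels = "".join("H" if r < m else "T" for r in ranks)
        counts[labels] += 1
    return counts


if __name__ == "__main__":
    m, n = 2, 3
    counts = arrangement_counts(m, n)
    print(len(counts), comb(m + n, m))                         # 10 10
    print(set(counts.values()), factorial(m) * factorial(n))   # {12} 12
```

Enumeration is only practical for very small m+n, but it shows directly why equally likely orderings imply equally likely arrangements.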
DISTRIBUTION OF R UNDER H0
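The exact distribution of R is, for k = 1, 2, ...:
$$P(R = 2k) = \frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}}, \qquad P(R = 2k+1) = \frac{\binom{m-1}{k}\binom{n-1}{k-1} + \binom{m-1}{k-1}\binom{n-1}{k}}{\binom{m+n}{m}}$$
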
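Let a=m+n, then
$$\frac{R - 2a\lambda(1-\lambda)}{2\sqrt{a}\,\lambda(1-\lambda)} \to N(0,1) \text{ as } a \to \infty, \text{ with } \lim \frac{m}{a} = \lambda$$
So we can use this for an asymptotic test for relatively large a.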
We use the tabulated exact distribution of R, which is available for sample sizes less than 24. In all other
cases we use the asymptotic test.
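As a rough sketch of how these quantities can be computed, the code below evaluates the standard exact null probabilities of R (P(R=2k) and P(R=2k+1) in terms of binomial coefficients), derives a two-sided rejection region, and computes the standardized statistic. The function names are hypothetical. The equal-tailed convention (each tail at most alpha/2) and the plug-in λ = m/a are assumptions; the source does not state which convention it used, so the region here need not match the one behind its results.

```python
from math import comb, sqrt


def runs_pmf(m, n):
    """Exact null pmf of R as a dict {r: P(R = r)}."""
    total = comb(m + n, m)
    pmf = {}
    for r in range(2, m + n + 1):
        k, odd = divmod(r, 2)
        if odd:   # r = 2k + 1
            ways = (comb(m - 1, k) * comb(n - 1, k - 1)
                    + comb(m - 1, k - 1) * comb(n - 1, k))
        else:     # r = 2k
            ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
        if ways:
            pmf[r] = ways / total
    return pmf


def equal_tailed_region(m, n, alpha=0.05):
    """Largest lo with P(R <= lo) <= alpha/2 and smallest hi with P(R >= hi) <= alpha/2.

    Assumed convention: reject H0 when R <= lo or R >= hi. Either bound is
    None if even the most extreme value exceeds alpha/2.
    """
    pmf = runs_pmf(m, n)
    lo = hi = None
    cum = 0.0
    for r in sorted(pmf):                 # accumulate the lower tail
        cum += pmf[r]
        if cum > alpha / 2:
            break
        lo = r
    cum = 0.0
    for r in sorted(pmf, reverse=True):   # accumulate the upper tail
        cum += pmf[r]
        if cum > alpha / 2:
            break
        hi = r
    return lo, hi


def z_statistic(r, m, n):
    """Standardized R, using lambda = m / a in place of its limit (an assumption)."""
    a = m + n
    lam = m / a
    return (r - 2 * a * lam * (1 - lam)) / (2 * sqrt(a) * lam * (1 - lam))


if __name__ == "__main__":
    print(equal_tailed_region(7, 7))   # (3, 13)
    print(z_statistic(40, 50, 50))     # -2.0
```

Other conventions, for example choosing the two cut-offs so that the total size is as close to 0.05 as possible, give different regions for the same m and n.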
CHECKING DISTRIBUTION FREE NATURE
We choose 4 different distributions: Uniform(0,1), Normal(0,1) , Exponential(1) and
Cauchy(0,1).
For each distribution we draw a random sample of size m+n and compute R.
We repeat this 10000 times and estimate P[R=r] by the observed relative frequency of each value r.
We plot these estimates for the four distributions on a single graph.
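The simulation described here might look like the following sketch. It uses only the Python standard library; the names `SAMPLERS`, `count_runs` and `estimate_pmf` are hypothetical, and the Cauchy draw uses the inverse-transform method as an assumed implementation choice. A single random generator is shared across the four distributions so that they do not reuse the same underlying uniform draws.

```python
import math
import random
from collections import Counter


def count_runs(x, m):
    """Number of runs when the m smallest values are type H and the rest type T."""
    cutoff = sorted(x)[m - 1]            # m-th smallest value (assumes no ties)
    labels = [v <= cutoff for v in x]    # True = type H
    return 1 + sum(labels[i] != labels[i - 1] for i in range(1, len(x)))


SAMPLERS = {
    "uniform": lambda rng: rng.random(),
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
    # Cauchy(0,1) by inverse transform of a Uniform(0,1) draw.
    "cauchy": lambda rng: math.tan(math.pi * (rng.random() - 0.5)),
}


def estimate_pmf(draw, m, n, reps=10000, rng=None):
    """Relative frequency of each observed value of R over `reps` samples."""
    rng = rng or random.Random()
    counts = Counter(
        count_runs([draw(rng) for _ in range(m + n)], m) for _ in range(reps)
    )
    return {r: c / reps for r, c in sorted(counts.items())}


if __name__ == "__main__":
    rng = random.Random(2024)  # arbitrary seed, for reproducibility only
    for name, draw in SAMPLERS.items():
        est = estimate_pmf(draw, m=7, n=7, rng=rng)
        print(name, {r: round(p, 3) for r, p in est.items()})
```

If the test is distribution free, the four printed rows should agree up to Monte Carlo error.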
[Figure: Distribution free property check. Estimated P[R=r] against r for the uniform, normal, exponential and Cauchy samples. Panels: m=7 n=7; m=11 n=11; m=20 n=30.]
The estimated probabilities for the different distributions are very close to each other.
Thus, from a visual inspection, we conclude that the runs test is distribution free.
LIMITING DISTRIBUTION OF R
As we have already discussed,
$$Z_n = \frac{R - 2a\lambda(1-\lambda)}{2\sqrt{a}\,\lambda(1-\lambda)} \to N(0,1) \text{ as } a \to \infty,$$
where a=m+n and $\lim m/a = \lambda$.

We shall empirically verify this fact.
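One way to carry out such a check is sketched below. Under H0 every arrangement of the m H labels and n T labels is equally likely, so the sketch shuffles the labels directly instead of drawing the X values. The function name `simulate_z`, the plug-in λ = m/a, the seed and the number of replications are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, pvariance


def simulate_z(m, n, reps=2000, seed=0):
    """Simulate the standardized runs statistic under H0 by shuffling labels."""
    rng = random.Random(seed)
    a = m + n
    lam = m / a                                # plug-in value for lambda
    centre = 2 * a * lam * (1 - lam)
    scale = 2 * sqrt(a) * lam * (1 - lam)
    labels = [True] * m + [False] * n          # True = H, False = T
    zs = []
    for _ in range(reps):
        rng.shuffle(labels)                    # a uniformly random arrangement
        r = 1 + sum(labels[i] != labels[i - 1] for i in range(1, a))
        zs.append((r - centre) / scale)
    return zs


if __name__ == "__main__":
    for m, n in [(50, 50), (60, 80), (100, 150), (200, 200), (200, 50), (200, 15)]:
        zs = simulate_z(m, n)
        tail = sum(abs(z) > 1.96 for z in zs) / len(zs)
        print(m, n, round(mean(zs), 3), round(pvariance(zs), 3), round(tail, 3))
```

For a standard normal limit the printed mean should be near 0, the variance near 1 and the two-sided tail proportion near 0.05; a histogram of `zs` gives the visual comparison.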
[Figure: Histograms of the simulated values of z (y-axis: Proportion of Observations). Panels: m=50 n=50; m=60 n=80; m=100 n=150; m=200 n=200; m=200 n=50; m=200 n=15.]
CONCLUSIONS OF LIMITING DISTRIBUTION STUDY
As a=m+n becomes large, R, when properly standardized, appears to approach a
normal distribution, as expected.
The approximation seems to be best when λ is close to 0.5.
If m<<n or n<<m, the convergence does not seem to happen fast enough as a grows.
STUDY OF POWER
We need to check the power of the test procedure against different alternatives.
As our alternative hypothesis is nonparametric, we take 5 different possible alternative
distributions and estimate the power empirically for each.
The 5 chosen alternatives are
1. The $X_i$ are independent normal, but the means of successive $X_i$ increase by 0.01
2. The $X_i$ come from an MA(1) model with parameter 0.2
3. The $X_i$ come from an AR(1) model with parameter 0.6
4. The $X_i$ come from an MA(2) model with parameters 0.2 and 0.1
5. The $X_i$ come from an ARMA(1,1) model with parameters 0.2 and 0.2
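The five alternatives could be generated along the following lines. This is a sketch under stated assumptions, since the source does not specify them: unit-variance normal innovations, a starting mean of 0 and unit variance for alternative 1, the sign convention X_t = Σ φ_j X_{t-j} + e_t + Σ θ_j e_{t-j}, and a burn-in of 200 discarded values. The names `increasing_means`, `arma` and `ALTERNATIVES` are hypothetical.

```python
import random


def increasing_means(a, step=0.01, rng=random):
    """Independent N(i*step, 1) observations, i = 0, ..., a-1 (variance 1 is assumed)."""
    return [rng.gauss(i * step, 1.0) for i in range(a)]


def arma(a, ar=(), ma=(), burn=200, rng=random):
    """X_t = sum_j ar[j]*X_{t-1-j} + e_t + sum_j ma[j]*e_{t-1-j}, with e_t iid N(0,1)."""
    x, e = [], []
    for t in range(a + burn):
        e_t = rng.gauss(0.0, 1.0)
        x_t = e_t
        x_t += sum(c * x[t - 1 - j] for j, c in enumerate(ar) if t - 1 - j >= 0)
        x_t += sum(c * e[t - 1 - j] for j, c in enumerate(ma) if t - 1 - j >= 0)
        x.append(x_t)
        e.append(e_t)
    return x[burn:]  # drop the burn-in so the start-up values do not matter


# Alternative number -> generator taking (sample size a, random generator rng).
ALTERNATIVES = {
    1: lambda a, rng: increasing_means(a, rng=rng),
    2: lambda a, rng: arma(a, ma=(0.2,), rng=rng),
    3: lambda a, rng: arma(a, ar=(0.6,), rng=rng),
    4: lambda a, rng: arma(a, ma=(0.2, 0.1), rng=rng),
    5: lambda a, rng: arma(a, ar=(0.2,), ma=(0.2,), rng=rng),
}


if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed
    for k, generate in ALTERNATIVES.items():
        print(k, [round(v, 2) for v in generate(5, rng)])
```

Keeping all alternatives behind the same `(a, rng)` interface makes it easy to loop over them when estimating power.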
[Figure: Power comparison. Estimated power (y-axis: Power) against hypothesis index (x-axis: Hypothesis, 0 to 5). Panels: sample size 7+7; 11+11; 20+30; 75+125.]
CONCLUSION OF POWER STUDY
Power for all chosen alternatives is seen to increase with sample size.
Estimated sizes of the test for some values of m and n are given below. All
of these are close to 0.05.
m     n     Estimated size
7     7     0.0513
11    11    0.0493
20    30    0.061
40    60    0.0508
75    125   0.0519
STUDY OF CONSISTENCY OF THE TEST
For consistency, we want the power to increase with the sample size under every choice of
alternative distribution.
For this study we again take the 5 alternatives from the power study.
We take m=n, with m running from 25 to 100 in steps of 5, so that a=m+n runs from 50 to 200.
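A Monte Carlo estimate of power over this grid of sample sizes might be sketched as follows, shown for the AR(1) alternative only. The asymptotic two-sided test with critical value 1.96, the plug-in λ = m/a, the unit-variance normal innovations, the burn-in and the 300 replications are all assumptions for illustration; the helper names are hypothetical.

```python
import random
from math import sqrt


def count_runs(x, m):
    """Number of runs when the m smallest values are type H and the rest type T."""
    cutoff = sorted(x)[m - 1]            # m-th smallest value (assumes no ties)
    labels = [v <= cutoff for v in x]
    return 1 + sum(labels[i] != labels[i - 1] for i in range(1, len(x)))


def rejects(x, m, z_crit=1.96):
    """Asymptotic two-sided runs test at nominal size 0.05."""
    a = len(x)
    lam = m / a
    z = (count_runs(x, m) - 2 * a * lam * (1 - lam)) / (2 * sqrt(a) * lam * (1 - lam))
    return abs(z) > z_crit


def ar1(a, phi, rng, burn=200):
    """AR(1) series X_t = phi * X_{t-1} + e_t with e_t iid N(0,1), after a burn-in."""
    x, out = 0.0, []
    for t in range(a + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out


def estimate_power(generate, m, n, reps=300, seed=0):
    """Proportion of simulated samples of size m+n for which H0 is rejected."""
    rng = random.Random(seed)
    return sum(rejects(generate(m + n, rng), m) for _ in range(reps)) / reps


if __name__ == "__main__":
    for m in range(25, 101, 5):
        power = estimate_power(lambda a, rng: ar1(a, 0.6, rng), m, m)
        print(2 * m, round(power, 3))
```

Passing an i.i.d. generator instead of `ar1` gives an estimate of the size of the test, which is a useful sanity check before reading the power values.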
ALTERNATIVE 1: INCREASING MEANS
[Figure: Consistency for Alternative 1. Power against a, for a from 50 to 200.]
ALTERNATIVE 2: MA(1) PARAMETER=0.2
[Figure: Consistency for Alternative 2. Power against a, for a from 50 to 200.]
ALTERNATIVE 3: AR(1) PARAMETER=0.6
[Figure: Consistency for Alternative 3. Power against a, for a from 50 to 200.]
ALTERNATIVE 4: MA(2) PARAMETERS = (0.2,0.1)
[Figure: Consistency for Alternative 4. Power against a, for a from 50 to 200.]
ALTERNATIVE 5: ARMA(1,1) PARAMETERS=(0.2,0.2)
[Figure: Consistency for Alternative 5. Power against a, for a from 50 to 200.]
CONCLUSION OF CONSISTENCY
From the plots, the power of the test increases with the sample size for each choice of
alternative, so the test appears to be consistent under all our choices
of alternatives.
CONCLUDING REMARKS
1) The null distribution of the test statistic does not depend on the underlying distribution, as long as
that distribution is continuous. So this is a distribution-free test.
2) The test statistic is asymptotically normal.
3) The test is consistent under various alternatives.
4) There is no parametric counterpart of this test.
5) Other tests based on runs like the runs up and down test can be used for testing
the same set of hypotheses.
