
Research Methods Assignment 1999-2000

for
Chris Stewart

Written by Konstantinos Kostoulas 9910284

MSc in International Banking and Finance (F/T)

(1500-2000 words)

The exercise I shall analyse involves the use of Monte Carlo techniques. These techniques are
used to analyse the sampling performance of the OLS estimator, where the population (or true)
model is given by the expression Yi = b1 + b2Xi2 + b3Xi3 + ui. Because the disturbances in
this expression refer to the population, I shall symbolize them as ui in order to distinguish
them from the sample residuals ei. We are given that b1 = 10.0, b2 = 0.40, b3 = 0.60,
ui ~ N(0,1) and i = 1, 2, ..., 20. We also know the 20x3 design matrix for X.
In Excel, the random number generator was used to produce 10 samples of 20 observations
each for the dependent variable, Yi, from the population model given above. Estimating the
parameters b' = (b1, b2, b3) manually for each sample showed that the mean estimate of b
over the 10 samples approximates the population parameters. This indicates that, had the
random number generator been used to produce n samples, with n tending to infinity, the mean
estimate of b would have converged to the true parameters. The more times the simulation is
repeated, the closer the mean estimate comes to the population parameters.
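The sampling experiment described above can be sketched in Python rather than Excel. This is only an illustration under stated assumptions: the 20x3 design matrix used in the assignment is not reproduced in the text, so the matrix X below is hypothetical, and the seed and the function name simulate_estimates are arbitrary choices.

```python
import numpy as np

# Population parameters taken from the text: b1 = 10, b2 = 0.40, b3 = 0.60.
TRUE_B = np.array([10.0, 0.40, 0.60])
N_OBS = 20       # observations per sample
N_SAMPLES = 10   # number of samples drawn in the exercise

# ASSUMPTION: the assignment's design matrix is not given in the text,
# so this hypothetical 20x3 matrix (constant, X2, X3) stands in for it.
idx = np.arange(N_OBS)
X = np.column_stack([np.ones(N_OBS), 4.0 * idx / (N_OBS - 1), idx % 5])


def simulate_estimates(n_samples, seed=0):
    """Draw n_samples of Y = Xb + u with u ~ N(0, 1); return the OLS estimates."""
    rng = np.random.default_rng(seed)
    estimates = np.empty((n_samples, 3))
    for s in range(n_samples):
        u = rng.standard_normal(N_OBS)                  # population disturbances
        y = X @ TRUE_B + u                              # one simulated sample of Y
        b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit
        estimates[s] = b_hat
    return estimates


estimates = simulate_estimates(N_SAMPLES)
mean_estimate = estimates.mean(axis=0)
print("mean of the OLS estimates:", mean_estimate.round(3))
```

Each row of estimates plays the role of one of the ten regressions reported below, and mean_estimate corresponds to the mean estimate of b.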
The results from the regression analysis for sample number 1 can be summarised as follows:
Ŷi = 11.31 + 0.47Xi2 - 0.46Xi3
     (0.99)  (0.40)    (0.60)
R-bar-squared = -0.007909, s = 1.2186, DW = 2.0613
Pr[FSC1] = 0.363, Pr[FFF1] = 0.238, Pr[LMN2] = 0.464, Pr[FH1] = 0.940,
where R-bar-squared is the coefficient of determination adjusted for degrees of freedom, s
is the regression's standard error and DW is the Durbin-Watson statistic. The figures in
parentheses are the standard errors of the coefficients. Pr[#] denotes the p-value of the
diagnostic test #; a value above 0.05 means that the corresponding form of misspecification
is statistically insignificant at the 5% level. FSC1 is the F-test for first-order serial
correlation, FFF1 is the F-test for non-linearity, LMN2 is the Lagrange Multiplier test for
departures from normality and FH1 is the F-test for heteroscedasticity. These pieces of
information are supplied automatically by Microfit. The results obtained by Microfit for each
sample are identical to those obtained from Excel. For this reason I will use the above layout
to summarise the results of each regression; it combines both packages and takes advantage
of the additional statistical information provided by Microfit.
The results for the first sample show that the adjusted coefficient of determination,
R-bar-squared, is not only extremely low but also negative. The R-squared for this sample is
0.098, which suggests that the fitted model has almost no explanatory power. Further, the DW
statistic gives no indication of autocorrelation, since the value 2.0613 lies between 1.5 and
2.5, the range taken here as a rule of thumb for no autocorrelation. If DW were lower than 1.5
or higher than 2.5, we would have to compare it with the lower and upper limits DL and DU, and
even then it could fall in the inconclusive region where one cannot tell whether there is
autocorrelation or not. The rest of the statistics show no evidence of first-order serial
correlation, non-linearity, non-normality or heteroscedasticity. As regards the t-statistics,
only the intercept term (Xi1) appears to be statistically significant, with a t-ratio of
11.44. The t-ratios for Xi2 and Xi3 are 1.18 and -0.76 respectively, which indicates
statistical insignificance. The plot of actual and fitted values produced in Microfit for the
first sample is:

[Figure: Plot of Actual and Fitted Values, sample 1. Series: Y1 and Fitted; horizontal axis: Years (1971-1990).]

We can see that the fit is not satisfactory (see footnote 1).
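The link between the R-squared and the R-bar-squared quoted for the first sample can be checked with the standard degrees-of-freedom adjustment. The sketch below assumes n = 20 observations and k = 3 estimated parameters, as in the text; the function name is illustrative. Because the R-squared of 0.098 is itself rounded, the result only agrees approximately with the reported -0.007909.

```python
def adjusted_r_squared(r_squared, n_obs, n_params):
    """R-bar-squared = 1 - (1 - R^2) * (n - 1) / (n - k)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_params)


N_OBS, N_PARAMS = 20, 3   # 20 observations; intercept plus two slopes

# R-squared = 0.098 is the rounded value quoted for sample 1.
adj_sample_1 = adjusted_r_squared(0.098, N_OBS, N_PARAMS)
print(f"R-bar-squared for sample 1: {adj_sample_1:.4f}")
```

With so few observations the adjustment is large enough to push a small positive R-squared below zero, which is why a negative R-bar-squared is possible.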

The results from the second sample are:


Ŷi = 9.36 + 0.50Xi2 + 0.62Xi3
     (0.97) (0.39)    (0.59)
R-bar-squared = 0.046, s = 1.1999, DW = 1.4703
Pr[FSC1] = 0.228, Pr[FFF1] = 0.992, Pr[LMN2] = 0.714, Pr[FH1] = 0.220
Again the R-bar-squared is very low, which suggests that the regression has little
explanatory power. The DW statistic (1.4703) lies just below 1.5 and so indicates possible
positive autocorrelation. The t-ratios show that only the intercept is statistically
significant, with a t-value of 9.6175, while the t-values for Xi2 and Xi3 are 1.25 and 1.05,
which are below the critical t-value of approximately 2. The diagnostic tests provided by
Microfit all have p-values greater than 0.05. The F-test for serial correlation, for example,
has a p-value above 0.2, so the null hypothesis of no serial correlation cannot be rejected at
the 5% level. This implies that there is no statistically significant autocorrelation. The
other three diagnostic tests are also statistically insignificant, which is desirable.
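The significance judgements above rest on t-ratios, each being a coefficient divided by its standard error. As a minimal sketch, the ratios for the second sample can be recomputed from the rounded figures in the summary, so they differ slightly from the values Microfit reports. The critical value 2.110 is the tabulated two-sided 5% point of the t-distribution with 17 degrees of freedom (20 observations less 3 parameters); it is supplied here as an assumption, since the text only uses "approximately 2".

```python
# Rounded coefficients and standard errors reported for sample 2.
coefficients = {"intercept": 9.36, "Xi2": 0.50, "Xi3": 0.62}
standard_errors = {"intercept": 0.97, "Xi2": 0.39, "Xi3": 0.59}

# ASSUMPTION: tabulated two-sided 5% critical value for 17 degrees of freedom.
T_CRITICAL = 2.110

t_ratios = {name: coefficients[name] / standard_errors[name] for name in coefficients}
significant = {name: abs(t) > T_CRITICAL for name, t in t_ratios.items()}

for name, t in t_ratios.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict} at the 5% level)")
```

Only the intercept clears the critical value, which matches the conclusion drawn for this sample.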

Footnote 1: The dates in the diagrams are fictional and were introduced only to help import the sample data from Excel into Microfit.

The results from the regression of the third sample for the dependent variable Yi are
the following:
Ŷi = 10.4766 + 0.46303Xi2 + 0.238Xi3
     (0.81)    (0.333)      (0.500)
R-bar-squared = 0.014, s = 1.0053, DW = 1.3478
Pr[FSC1] = 0.231, Pr[FFF1] = 0.749, Pr[LMN2] = 0.521, Pr[FH1] = 0.083
This regression gives similar results: an R-bar-squared near zero and an R-squared of
11.8%. The standard error of the regression is low, which is a desirable property. The
t-ratios are 12.83 for the intercept term, 1.387 for the variable Xi2 and 0.476 for the
variable Xi3, so only the intercept is statistically significant. The error term assumptions
all hold. There is some sign of heteroscedasticity, but the p-value of the test is 0.083,
which is above 0.05 and indicates that the error term probably has a constant variance (is
homoscedastic).
The results from the regression analysis of the fourth sample can be summarised as follows:
Ŷi = 9.522 + 0.54Xi2 + 0.917Xi3
     (0.703) (0.287)   (0.431)
R-bar-squared = 0.262, s = 0.867, DW = 1.93
Pr[FSC1] = 0.888, Pr[FFF1] = 0.023, Pr[LMN2] = 0.641, Pr[FH1] = 0.369
This regression yields a much higher R-bar-squared than the previous ones: R-bar-squared
is 26.2% and R-squared is 34%, a marked increase in the explanatory power of the model. The
DW statistic is 1.93, which lies between 1.5 and 2.5, so there is no autocorrelation.
However, the diagnostic tests give evidence of non-linearity in this model, since the p-value
of the non-linearity test is only 0.023 < 0.05. The rest of the diagnostic tests are
insignificant. The t-ratio for the intercept is 13.53 and for the variable Xi3 it is 2.12,
while for Xi2 it is 1.88. This indicates significance not only for the intercept but also for
one of the explanatory variables, Xi3. It is the first sample in which at least one of the
variables Xi2 and Xi3 is significantly different from zero. The plot of actual and fitted
values is:

[Figure: Plot of Actual and Fitted Values, sample 4. Series: Y4 and Fitted; horizontal axis: Years (1971-1990).]

The fit is better in this sample.



The regression for the fifth sample is:


Ŷi = 10.66 - 0.271Xi2 + 1.002Xi3
     (0.83)  (0.342)    (0.514)
R-bar-squared = 0.10, s = 1.032, DW = 1.46
Pr[FSC1] = 0.306, Pr[FFF1] = 0.509, Pr[LMN2] = 0.604, Pr[FH1] = 0.40
The R-bar-squared here indicates that the model explains about 10% of the variation in the
dependent variable. DW is close to 1.5, so we cannot tell for sure from this test alone
whether there is autocorrelation; there is possibly positive autocorrelation. Again, the
t-tests show that the intercept is statistically significant, while the explanatory variables
are not. All the diagnostic tests FSC1, FFF1, LMN2 and FH1 are insignificant, which is what
we want.
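The Durbin-Watson statistic used throughout can be computed directly from a regression's residuals. The sketch below pairs the standard formula with the 1.5 to 2.5 rule of thumb applied in the text; it does not reproduce the formal DL and DU bounds, and the function names are illustrative.

```python
def durbin_watson(residuals):
    """DW = sum over t >= 2 of (e_t - e_{t-1})^2, divided by the sum of e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def rule_of_thumb(dw, low=1.5, high=2.5):
    """Classify a DW value with the informal 1.5-2.5 band used in the text."""
    if dw < low:
        return "possible positive autocorrelation"
    if dw > high:
        return "possible negative autocorrelation"
    return "no indication of autocorrelation"


# DW values reported for samples 1, 2, 4 and 5.
reported_dw = {1: 2.0613, 2: 1.4703, 4: 1.93, 5: 1.46}
for sample, dw in reported_dw.items():
    print(f"sample {sample}: DW = {dw} -> {rule_of_thumb(dw)}")
```

A value near 2 corresponds to uncorrelated residuals, values towards 0 to positive autocorrelation and values towards 4 to negative autocorrelation.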
The results from the sixth regression using Ordinary Least Squares are:
Ŷi = 9.282 + 0.698Xi2 + 0.4117Xi3
     (0.70)  (0.29)     (0.435)
R-bar-squared = 0.2123, s = 0.873, DW = 2.41
Pr[FSC1] = 0.392, Pr[FFF1] = 0.251, Pr[LMN2] = 0.415, Pr[FH1] = 0.095
This is the best sample so far in terms of finding a good model. R-bar-squared is
21.23%, R-squared is 29.5%, the standard error of the regression is 0.87, and the t-statistics
show that this time Xi2 and the intercept are statistically significant. In addition, all the
diagnostic tests are insignificant. The histogram of residuals obtained from Excel for this
sample is:

[Figure: Histogram of residuals, sample 6 (Excel). Vertical axis: Frequency (0 to 9); horizontal axis: Bin (-1, -0.5, 0, 0.5, 1, More).]

We observe that most of the residuals deviate from zero, which is undesirable.

The results from the seventh sample show that:


Ŷi = 10.43 + 0.509Xi2 + 0.064Xi3
     (0.73)  (0.301)    (0.45)
R-bar-squared = 0.047, s = 0.096, DW = 1.85
Pr[FSC1] = 0.801, Pr[FFF1] = 0.948, Pr[LMN2] = 0.797, Pr[FH1] = 0.406
The R-bar-squared is close to zero, suggesting almost no explanatory power in the model. The
DW statistic is well above 1.5 and below 2.5, which means there is no autocorrelation. The
t-ratios indicate significance of the intercept term (14.17 > 2) and statistical
insignificance of both Xi2 (1.69 < 2) and Xi3 (0.14 < 2). All the diagnostic tests are
insignificant.
The regression of the eighth sample is:
Ŷi = 9.482 + 0.98Xi2 + 0.621Xi3
     (0.741) (0.303)   (0.454)
R-bar-squared = 0.374, s = 0.913, DW = 1.93
Pr[FSC1] = 0.802, Pr[FFF1] = 0.765, Pr[LMN2] = 0.659, Pr[FH1] = 0.824
Here the R-bar-squared shows that the model has an explanatory power of 37.4%, while the
R-squared is 43.7%. The DW statistic shows no autocorrelation of the residuals. The t-ratios
for Xi2 and the intercept are 3.25 and 12.79 respectively, so Xi2 and the intercept are
statistically significant, while the t-ratio for Xi3 is 1.36, which means that Xi3 is not
statistically significant. The standard error of the regression is 0.91, and all the
diagnostic tests produced in Microfit show no evidence of first-order serial correlation,
non-linearity, non-normality or heteroscedasticity. The plot of actual and fitted values for
this sample is:

[Figure: Plot of Actual and Fitted Values, sample 8. Series: Y8 and Fitted; horizontal axis: Years (1971-1990).]

The fit is much more accurate in this sample than in the previous ones.

The histogram of residuals for this sample is:

[Figure: Histogram of residuals, sample 8 (Excel). Vertical axis: Frequency (0 to 7); horizontal axis: Bin (-1, -0.5, 0, 0.5, 1, More).]

Most of the residuals deviate only slightly from zero, which is desirable.

The next graph is the histogram of residuals produced by Microfit:

[Figure: Histogram of Residuals and the Normal Density (Microfit), Y8. Vertical axis: Frequency (0.0 to 0.8); horizontal axis from -3.651 to 3.48.]

What we can see here is the same as in the previous diagram: the residuals are concentrated
mainly at values close to zero.

The regression results from the ninth sample are the following:
Ŷi = 10.198 + 0.119Xi2 + 0.659Xi3
     (0.664)  (0.271)    (0.407)
R-bar-squared = 0.047, s = 0.818, DW = 1.64
Pr[FSC1] = 0.531, Pr[FFF1] = 0.920, Pr[LMN2] = 0.508, Pr[FH1] = 0.818
In this case the F-statistic for the overall significance of the regression (that is, of
R-squared) has a p-value of 0.257 > 0.05, which means that R-squared is statistically
insignificant. The DW statistic has a value of 1.64 and indicates no autocorrelation. The
t-ratios show statistical significance of the intercept term only. All the diagnostic tests
FSC1, FFF1, LMN2 and FH1 are insignificant, which is desirable.
The regression results from the tenth sample are:
Ŷi = 9.9675 + 0.203Xi2 + 0.59Xi3
     (0.802)  (0.328)    (0.492)
R-bar-squared = -0.0016, s = 0.988, DW = 1.54
Pr[FSC1] = 0.412, Pr[FFF1] = 0.944, Pr[LMN2] = 0.895, Pr[FH1] = 0.660
The R-bar-squared, the coefficient of determination adjusted for degrees of freedom, is
nearly zero, while the R-squared is 10%. This shows that the fitted model explains very little
of the variation in the dependent variable. The F-statistic also tells us that the R-squared
is statistically insignificant, with a p-value of 39.4%. The DW statistic indicates no
autocorrelation. The t-ratios show that the intercept is significant once again and the
explanatory variables are insignificant. All the diagnostic tests are insignificant.
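The p-values quoted for the F-test on R-squared in the ninth and tenth samples can be approximately recovered from the reported R-bar-squared values. The sketch below first undoes the degrees-of-freedom adjustment and then uses the closed-form tail probability of an F-distribution with 2 numerator degrees of freedom, P(F > f) = (1 + 2f/d)^(-d/2). That shortcut is valid only because this model has exactly two slope coefficients. The function names are illustrative, and small differences from the Microfit output come from rounding in the inputs.

```python
N_OBS, N_PARAMS = 20, 3   # 20 observations; intercept plus two slopes


def r_squared_from_adjusted(adj_r2, n_obs=N_OBS, n_params=N_PARAMS):
    """Invert R-bar-squared = 1 - (1 - R^2)(n - 1)/(n - k)."""
    return 1.0 - (1.0 - adj_r2) * (n_obs - n_params) / (n_obs - 1)


def overall_f(r2, n_obs=N_OBS, n_params=N_PARAMS):
    """F-statistic for the joint significance of all slope coefficients."""
    return (r2 / (n_params - 1)) / ((1.0 - r2) / (n_obs - n_params))


def f_pvalue_two_slopes(f_stat, df_den):
    """Upper-tail probability of F(2, df_den); closed form for 2 numerator df only."""
    return (1.0 + 2.0 * f_stat / df_den) ** (-df_den / 2.0)


# Reported R-bar-squared values for samples 9 and 10.
p_values = {}
for sample, adj_r2 in {9: 0.047, 10: -0.0016}.items():
    r2 = r_squared_from_adjusted(adj_r2)
    p_values[sample] = f_pvalue_two_slopes(overall_f(r2), N_OBS - N_PARAMS)
    print(f"sample {sample}: R-squared = {r2:.3f}, p-value = {p_values[sample]:.3f}")
```

Both p-values lie well above 0.05, in line with the conclusion that R-squared is statistically insignificant in these two samples.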
The mean estimates from the above ten samples are:
mean(b1-hat) = 10.07, mean(b2-hat) = 0.42, mean(b3-hat) = 0.46,
where the true (population) parameters are b1 = 10, b2 = 0.40 and b3 = 0.60.
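These means can be recomputed from the ten sets of coefficients reported above. The sketch uses the rounded estimates exactly as printed in the regression summaries, so the third decimal place may differ from a calculation on the unrounded Excel or Microfit output.

```python
# (b1-hat, b2-hat, b3-hat) for samples 1 to 10, as reported in the summaries.
reported_estimates = [
    (11.31, 0.47, -0.46),
    (9.36, 0.50, 0.62),
    (10.4766, 0.46303, 0.238),
    (9.522, 0.54, 0.917),
    (10.66, -0.271, 1.002),
    (9.282, 0.698, 0.4117),
    (10.43, 0.509, 0.064),
    (9.482, 0.98, 0.621),
    (10.198, 0.119, 0.659),
    (9.9675, 0.203, 0.59),
]
true_parameters = (10.0, 0.40, 0.60)

n_samples = len(reported_estimates)
# zip(*rows) regroups the rows into one column per coefficient.
mean_estimates = [sum(column) / n_samples for column in zip(*reported_estimates)]

for name, mean, truth in zip(("b1", "b2", "b3"), mean_estimates, true_parameters):
    print(f"{name}: mean estimate = {mean:.3f}, true value = {truth}")
```

The intercept and the coefficient on Xi2 are averaged closely; the coefficient on Xi3 is still some way from 0.60 after only ten samples.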
The conclusion of this exercise is that, had the simulation experiment been repeated many
(several thousand) times by the method of least squares, the average of the parameter
estimates would have converged to the population parameters. That is, OLS estimators are
unbiased. This is one of their desirable statistical properties (BLUE: Best Linear Unbiased
Estimators). Had we considered more than 10 sampling experiments, we would most likely have
come much closer to the true values.
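The claim that more replications bring the average estimate closer to the true parameters can be illustrated with a larger simulation. As before, this is a sketch under assumptions: the design matrix X is hypothetical because the assignment's matrix is not given in the text, and the replication counts and seed are arbitrary.

```python
import numpy as np

TRUE_B = np.array([10.0, 0.40, 0.60])   # population parameters from the text
N_OBS = 20

# ASSUMPTION: hypothetical 20x3 design matrix (constant, X2, X3).
idx = np.arange(N_OBS)
X = np.column_stack([np.ones(N_OBS), 4.0 * idx / (N_OBS - 1), idx % 5])

# OLS is b_hat = (X'X)^{-1} X'y, so the 3x20 matrix (X'X)^{-1} X' is computed once.
OLS_MAP = np.linalg.solve(X.T @ X, X.T)


def mean_ols_estimate(n_reps, seed=0):
    """Average the OLS estimates over n_reps simulated samples."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_reps, N_OBS))   # one row of disturbances per sample
    Y = X @ TRUE_B + U                         # each row is one simulated sample
    B_hat = Y @ OLS_MAP.T                      # each row is one OLS estimate
    return B_hat.mean(axis=0)


for n_reps in (10, 100, 5000):
    error = np.abs(mean_ols_estimate(n_reps) - TRUE_B)
    print(f"{n_reps:>5} replications: absolute error of the mean = {error.round(4)}")
```

The error of the averaged estimate falls roughly in proportion to one over the square root of the number of replications, so convergence is slow but steady.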
An additional remark on the above exercise concerns the regression results. Several samples
produced very low coefficients of determination. In some cases one of the explanatory
variables was statistically significant while the other was not, and vice versa. In one
sample (the fourth) a diagnostic test was statistically significant, indicating
non-linearity. In general, the diagnostic tests can reveal serial correlation among the
residuals, non-linearity, a non-normal distribution of the error term, or heteroscedasticity.
We have seen considerable variation across samples. Since every sample is generated from the
same population model and the same design matrix, this variation is due to the random draws
of the error term, whose influence is large in samples of only 20 observations.
Serial correlation is a serious problem because it violates the error term assumptions.
Namely, when the errors are correlated with each other, the usual OLS standard errors and
tests are no longer reliable.
The same happens when the variance (sigma-squared) of the error term is not constant
(homoscedastic) but changes. Then we have heteroscedasticity, and as a result the OLS
estimators are no longer efficient.
The distributional assumption about the error term also plays an important role. Without
knowledge of the sampling distribution we cannot tell how "close" the OLS estimators are to
their population values. We also need the errors ui to be normally distributed in order to
derive the probability distributions of the estimators of b1, b2 and b3. If these estimators
are linear functions of the normally distributed variable u, they are themselves normally
distributed, because any linear function of a normally distributed variable is itself
normally distributed.
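The argument can be written out as a short derivation. Matrix notation is assumed here for compactness, although the text does not use it: Y is the vector of observations, X the design matrix, b the parameter vector, u the vector of disturbances with variance sigma-squared, and the design matrix is treated as fixed.

```latex
\hat{b} = (X'X)^{-1}X'Y
        = (X'X)^{-1}X'(Xb + u)
        = b + (X'X)^{-1}X'u ,
\qquad
u \sim N(0, \sigma^{2} I)
\;\Longrightarrow\;
\hat{b} \sim N\!\left(b,\; \sigma^{2}(X'X)^{-1}\right).
```

The estimator equals the true parameter vector plus a fixed linear combination of the normal disturbances, so it is normally distributed and centred on b, which is also the unbiasedness result noted earlier.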
That is why we should run as many simulations as possible: averaging over many samples makes
up for the imprecision of individual samples and offsets undesirable results in some of them.

APPENDIX

[Figure: Autocorrelation function of residuals, sample from 1971 to 1990. Horizontal axis: Order of lags (0 to 8).]

[Figure: Standardized Spectral Density of Residuals (Parzen Window). Horizontal axis: Frequency (0 to 4).]

The above diagrams monitor the behaviour of residuals and refer to sample number 8.

BIBLIOGRAPHY

Damodar Gujarati, "Essentials of Econometrics", Second Edition, McGraw Hill, 1999

Watsham & Paramore, "Quantitative Methods for Finance", First Edition, Thomson Business Press, 1997

Pesaran & Pesaran, "Working with Microfit for Windows 4.0", Oxford University Press, Oxford, 1997
