Multiple Linear Regression I

EC114 Introduction to Quantitative Economics

17. Multiple Linear Regression I
Marcus Chambers
Department of Economics
University of Essex
28 February/01 March 2012
Outline
1. Introduction
2. Ordinary least squares with multiple regressors
3. The Classical Multiple Regression Model
Reference: R. L. Thomas, Using Statistics in Economics,
McGraw-Hill, 2005, sections 13.1 and 13.2.
Introduction
So far we have been concerned with regression models
involving a single explanatory variable, $X$, of the form
$$Y_i = \alpha + \beta X_i + \epsilon_i, \quad i = 1, \ldots, n,$$
where $\alpha$ and $\beta$ are the unknown population regression
parameters and $\epsilon_i$ denotes a random disturbance.
We have also considered the set of Classical assumptions
on $X$ and $\epsilon$ that imply that the ordinary least squares (OLS)
estimators of $\alpha$ and $\beta$, denoted $a$ and $b$, have good
sampling properties.
In particular, the OLS estimators are:
best linear unbiased estimators (BLUE);
efficient (under normality).
In addition to unbiasedness, the OLS estimators have the
smallest variance among linear unbiased estimators, and if
we assume normality they have the smallest variance
among all unbiased estimators.
The OLS estimators therefore provide a good basis for
making inferences about $\alpha$ and $\beta$.
For example, we can use the results that
$$\frac{a - \alpha}{s_a} \sim t_{n-2} \quad \text{and} \quad \frac{b - \beta}{s_b} \sim t_{n-2},$$
where $s_a$ and $s_b$ are the estimated standard errors of $a$ and
$b$, respectively, to conduct hypothesis tests using the
t-distribution.
However, many relationships that we study in Economics
are concerned with more than two variables, Y and X.
For example, the demand for a good ($Q_d$) may depend not
only on its own price ($P_1$) but also on consumers' income
($M$) and the prices of other goods (substitutes and
complements) ($P_2, P_3, \ldots$), e.g.
$$Q_d = f(P_1, M, P_2, P_3, \ldots).$$
We therefore need to extend our regression model to
include additional explanatory variables (regressors) while,
at the same time, keeping the desirable properties of the
OLS estimators in the two-variable model.
Fortunately it is possible to apply OLS to regressions with
multiple explanatory variables and the optimality properties
carry over under suitable assumptions.
Ordinary least squares with multiple regressors
We begin by assuming that a linear relationship exists
between a dependent variable $Y$ and $k - 1$ explanatory
variables, $X_2, X_3, \ldots, X_k$:
$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_k X_k + \epsilon,$$
where $\epsilon$ is a random disturbance and the $\beta_j$ ($j = 1, \ldots, k$)
are constants.
Note that it is common to denote the first explanatory
variable by $X_2$ rather than $X_1$.
In fact, it is convenient to interpret the intercept $\beta_1$ as the
coefficient on a variable $X_1$ that always takes the value 1.
Assuming that $E(\epsilon) = 0$ and taking the $X$ values as given,
we obtain
$$E(Y) = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_k X_k;$$
this is the population regression equation.
Each coefficient $\beta_j$ represents the effect on $E(Y)$ of a unit
change in $X_j$ holding all other $X$ variables constant.
For example, $\beta_2$ measures the change in $E(Y)$ when $X_2$
changes by one unit; it is the partial derivative $\partial E(Y)/\partial X_2$.
The $\beta_j$ coefficients are population parameters; their values
are unknown and we aim to estimate them from a sample
of observations on $Y$ and the $X$s.
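The "holding all other X variables constant" interpretation can be illustrated with a minimal Python sketch for the case k = 3. The coefficient values and the function name `expected_y` are hypothetical and chosen purely for illustration; they are not estimates from any data set.

```python
# Hypothetical population coefficients (illustrative values only).
beta = {"b1": 2.0, "b2": 0.5, "b3": -1.5}

def expected_y(x2, x3):
    """Population regression equation E(Y) = b1 + b2*X2 + b3*X3."""
    return beta["b1"] + beta["b2"] * x2 + beta["b3"] * x3

# Raise X2 by one unit while holding X3 fixed:
# E(Y) changes by exactly beta_2, whatever the starting values are.
base = expected_y(x2=10.0, x3=4.0)
bumped = expected_y(x2=11.0, x3=4.0)
effect_of_x2 = bumped - base
print(effect_of_x2)
```

Because the equation is linear, the same difference is obtained from any starting point, which is why a single number can summarise the partial effect of each regressor.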
We shall use the following notation:
$Y_i$: observation $i$ on the dependent variable $Y$;
$X_{ji}$: observation $i$ on explanatory variable $X_j$;
$\epsilon_i$: the (unobserved) value of $\epsilon$ for observation $i$.
For example, observation 6 consists of
$$Y_6, X_{26}, X_{36}, \ldots, X_{k6};$$
these values are related by
$$Y_6 = \beta_1 + \beta_2 X_{26} + \beta_3 X_{36} + \ldots + \beta_k X_{k6} + \epsilon_6.$$
For a general observation $i$ we have
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_k X_{ki} + \epsilon_i, \quad i = 1, \ldots, n.$$
Suppose we estimate $\beta_1, \ldots, \beta_k$ using $b_1, \ldots, b_k$; for the
time being we shall not specify how the estimates are
obtained.
The sample regression equation corresponding to
$b_1, \ldots, b_k$ is
$$\hat{Y}_i = b_1 + b_2 X_{2i} + b_3 X_{3i} + \ldots + b_k X_{ki}, \quad i = 1, \ldots, n;$$
the $\hat{Y}_i$ ($i = 1, \ldots, n$) are the fitted (or predicted) values of $Y$.
The difference between $Y$ and $\hat{Y}$ is, as before, called a
residual, and is denoted
$$e_i = Y_i - \hat{Y}_i, \quad i = 1, \ldots, n.$$
We can also write
$$Y_i = b_1 + b_2 X_{2i} + b_3 X_{3i} + \ldots + b_k X_{ki} + e_i, \quad i = 1, \ldots, n.$$
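Fitted values and residuals can be computed for any candidate set of estimates, not only the OLS ones. The sketch below uses NumPy with a small made-up sample (n = 5, k = 3); the data and the candidate values of b1, b2, b3 are hypothetical.

```python
import numpy as np

# Hypothetical sample with n = 5 observations and k = 3 (intercept, X2, X3).
Y = np.array([3.0, 4.5, 6.0, 6.5, 9.0])
X2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Arbitrary candidate estimates (not necessarily the OLS values).
b1, b2, b3 = 1.0, 1.0, 0.5

Y_hat = b1 + b2 * X2 + b3 * X3   # fitted values, one per observation
e = Y - Y_hat                    # residuals e_i = Y_i - Y_hat_i

# By construction Y_i = Y_hat_i + e_i for every observation.
print(Y_hat)
print(e)
```

Different candidate values give different residuals, which is what makes it meaningful to ask which choice of estimates is best.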
How do we choose $b_1, \ldots, b_k$?
The method of ordinary least squares (OLS) chooses the
estimates so as to minimise the sum of squared residuals,
$S = \sum e_i^2$.
We can express $e_i$ explicitly in terms of $b_1, \ldots, b_k$:
$$e_i = Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i} - \ldots - b_k X_{ki}, \quad i = 1, \ldots, n.$$
It follows that the objective function is
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i} - \ldots - b_k X_{ki})^2.$$
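In order to minimise $S$ with respect to $b_1, \ldots, b_k$ we must:
(i) partially differentiate $S$ with respect to each $b_j$;
(ii) set the $k$ partial derivatives equal to zero and solve for
$b_1, \ldots, b_k$.
In step (i) we obtain
$$\frac{\partial S}{\partial b_1}, \frac{\partial S}{\partial b_2}, \ldots, \frac{\partial S}{\partial b_k}.$$
In step (ii) we equate to zero and solve the following $k$
equations jointly:
$$\frac{\partial S}{\partial b_1} = 0, \quad \frac{\partial S}{\partial b_2} = 0, \quad \ldots, \quad \frac{\partial S}{\partial b_k} = 0.$$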
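As a sketch of what these k equations look like, applying the chain rule to the objective function S (the sum of squared residuals written in terms of the estimates) gives the following first-order conditions, often called the normal equations:

```latex
\begin{aligned}
\frac{\partial S}{\partial b_1} &= -2 \sum_{i=1}^{n} \left( Y_i - b_1 - b_2 X_{2i} - \ldots - b_k X_{ki} \right) = 0, \\
\frac{\partial S}{\partial b_j} &= -2 \sum_{i=1}^{n} X_{ji} \left( Y_i - b_1 - b_2 X_{2i} - \ldots - b_k X_{ki} \right) = 0, \quad j = 2, \ldots, k.
\end{aligned}
```

Since the term in brackets is the residual $e_i$, the conditions say that $\sum e_i = 0$ and $\sum X_{ji} e_i = 0$ for each $j$: the OLS residuals sum to zero and are orthogonal to every regressor.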
As $k$ gets larger this becomes more and more difficult!
For an arbitrary value of $k$ it is possible to write the solution
compactly in terms of matrices and vectors.
In practice we rely on computer software to compute OLS
estimates based on such a representation of the solution.
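A minimal sketch of the matrix representation, assuming the standard result that the OLS estimates solve (X'X)b = X'Y, where X is the n-by-k matrix of regressors with a first column of ones. The data below are simulated and all names and parameter values are hypothetical; this is an illustration with NumPy, not the routine that Stata uses internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (hypothetical) data: n = 50 observations, k = 3 coefficients.
n = 50
X2 = rng.normal(10.0, 2.0, size=n)
X3 = rng.normal(5.0, 1.0, size=n)
eps = rng.normal(0.0, 1.0, size=n)
Y = 1.0 + 0.5 * X2 - 2.0 * X3 + eps

# Design matrix: a column of ones plays the role of X1 (the intercept).
X = np.column_stack([np.ones(n), X2, X3])

# Solve the k normal equations (X'X) b = X'Y jointly for b.
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Residuals and the minimised sum of squared residuals S.
e = Y - X @ b
S = float(e @ e)

# First-order conditions: X'e should be zero up to rounding error.
foc = X.T @ e
print(b, S)
```

Solving the linear system directly avoids forming the inverse of X'X explicitly, which is usually the numerically safer choice.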
Example. We return to the money demand example first
encountered in Lecture 11.
Our two-variable regression of money stock ($Y$) on GDP ($X_2$)
yielded
$$\hat{Y} = 0.0212 + 0.1749 X_2,$$
based on our sample of 30 countries in 1985.
Suppose we also add the rate of interest variable, $X_3$, to
the regression; we obtain the following output in Stata:
. regress m g ir
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 2, 27) = 47.03
Model | 20.5135791 2 10.2567896 Prob > F = 0.0000
Residual | 5.88865732 27 .218098419 R-squared = 0.7770
-------------+------------------------------ Adj R-squared = 0.7604
Total | 26.4022364 29 .910421946 Root MSE = .46701
------------------------------------------------------------------------------
m | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
g | .172615 .0183198 9.42 0.000 .1350258 .2102042
ir | -.0006758 .0008844 -0.76 0.451 -.0024904 .0011388
_cons | .0569582 .125639 0.45 0.654 -.2008317 .3147481
------------------------------------------------------------------------------
The regression results, including standard errors in
parentheses, can be represented as:
$$\hat{Y} = \underset{(0.1256)}{0.0570} + \underset{(0.0183)}{0.1726}\, X_2 - \underset{(0.000884)}{0.000676}\, X_3,$$
with $R^2 = 0.777$.
The magnitudes of the estimated coefficients differ
substantially, with the coefficient on $X_3$ appearing to be
very small.
But this reflects the units of measurement of $X_3$,
which is measured as, for example, 16 (per cent) rather than 0.16.
If we had used the latter units of measurement (i.e. dividing
all observations on $X_3$ by 100), then the estimated
coefficient would have been 100 times larger.
Remember that statistical significance of a variable is
tested using a t-test and is not judged by the magnitude of
the estimated coefficient!
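The effect of changing units can be checked numerically. The sketch below uses simulated data (all values hypothetical) and assumes the standard matrix formula for the estimated standard errors, the square roots of the diagonal of s²(X'X)⁻¹; the helper name `ols_with_se` is made up for this illustration.

```python
import numpy as np

def ols_with_se(X, Y):
    """Return OLS coefficients and their estimated standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ Y
    e = Y - X @ b
    s2 = (e @ e) / (n - k)                 # estimate of the disturbance variance
    se = np.sqrt(s2 * np.diag(XtX_inv))    # standard errors of the coefficients
    return b, se

rng = np.random.default_rng(1)
n = 30
gdp = rng.uniform(1.0, 20.0, size=n)         # hypothetical regressor X2
rate_pct = rng.uniform(2.0, 40.0, size=n)    # interest rate in per cent, e.g. 16
Y = 0.1 + 0.17 * gdp - 0.001 * rate_pct + rng.normal(0.0, 0.5, size=n)

ones = np.ones(n)
b_pct, se_pct = ols_with_se(np.column_stack([ones, gdp, rate_pct]), Y)
b_dec, se_dec = ols_with_se(np.column_stack([ones, gdp, rate_pct / 100.0]), Y)

t_pct = b_pct[2] / se_pct[2]
t_dec = b_dec[2] / se_dec[2]
print(b_pct[2], b_dec[2])   # the second is 100 times the first
print(t_pct, t_dec)         # the t-ratios coincide
```

The coefficient and its standard error are both scaled by 100, so their ratio, and hence the conclusion of the t-test, does not depend on the units chosen.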
If we add another regressor, $X_4$ (the rate of price inflation),
we obtain:
. regress m g ir pi
-------------+------------------------------ F( 3, 26) = 30.70
Model | 20.5893701 3 6.86312337 Prob > F = 0.0000
-------------+------------------------------ Adj R-squared = 0.7544
Total | 26.4022364 29 .910421946 Root MSE = .47283
------------------------------------------------------------------------------
-------------+----------------------------------------------------------------
g | .1703745 .0189433 8.99 0.000 .1314361 .2093129
ir | -.0001693 .0012483 -0.14 0.893 -.0027353 .0023967
pi | -.002197 .0037733 -0.58 0.565 -.0099531 .0055592
_cons | .0893538 .1388419 0.64 0.525 -.1960399 .3747475
------------------------------------------------------------------------------
These regression results, including standard errors in
parentheses, can be represented as:
$$\hat{Y} = \underset{(0.1388)}{0.0894} + \underset{(0.0189)}{0.1704}\, X_2 - \underset{(0.001248)}{0.000169}\, X_3 - \underset{(0.0038)}{0.0022}\, X_4,$$
with $R^2 = 0.7798$.
We could also carry out the estimations using logarithms of
the variables; for example
. regress lm lg lir
-------------+------------------------------ F( 2, 27) = 175.53
Model | 59.8192409 2 29.9096204 Prob > F = 0.0000
-------------+------------------------------ Adj R-squared = 0.9233
Total | 64.4198259 29 2.22137331 Root MSE = .41279
------------------------------------------------------------------------------
lm | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lg | 1.026927 .0570772 17.99 0.000 .9098146 1.14404
lir | -.2486999 .0671987 -3.70 0.001 -.3865802 -.1108195
_cons | -1.248211 .1991953 -6.27 0.000 -1.656926 -.8394964
------------------------------------------------------------------------------
The logarithmic results can be represented as
$$\widehat{\ln(Y)} = \underset{(0.1992)}{-1.2482} + \underset{(0.0571)}{1.0269}\, \ln(X_2) - \underset{(0.0672)}{0.2487}\, \ln(X_3),$$
with $R^2 = 0.9286$ and where figures in parentheses are
standard errors.
The estimated coefficients now have the interpretation of
elasticities.
For example, the income elasticity of the demand for
money is estimated to be 1.0269, while the interest rate
elasticity of money demand is estimated as $-0.2487$.
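To see the elasticity interpretation at work, the sketch below plugs the coefficients from the log-log Stata output into the fitted equation and computes the percentage change in predicted money stock after a 1% change in one regressor. The starting values for GDP and the interest rate, and the function name `predicted_money`, are hypothetical.

```python
import math

# Estimated coefficients copied from the log-log Stata output (_cons, lg, lir).
b_cons, b_lg, b_lir = -1.248211, 1.026927, -0.2486999

def predicted_money(gdp, interest_rate):
    """Predicted money stock implied by the fitted log-log equation."""
    ln_y = b_cons + b_lg * math.log(gdp) + b_lir * math.log(interest_rate)
    return math.exp(ln_y)

# Hypothetical starting values (illustrative only).
gdp0, ir0 = 5.0, 10.0
base = predicted_money(gdp0, ir0)

# A 1% rise in GDP, holding the interest rate fixed.
pct_change_gdp = 100.0 * (predicted_money(1.01 * gdp0, ir0) / base - 1.0)

# A 1% rise in the interest rate, holding GDP fixed.
pct_change_ir = 100.0 * (predicted_money(gdp0, 1.01 * ir0) / base - 1.0)

print(round(pct_change_gdp, 3), round(pct_change_ir, 3))
```

The two percentage changes are close to the estimated coefficients themselves, and they do not depend on the starting values, which is the sense in which the coefficients of a log-log regression are elasticities.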
However, in order to conduct formal hypothesis tests, we
need to know the sampling properties of the OLS
estimators, and to do that we need to make some
assumptions. . .
The Classical Multiple Regression Model
Just as in the two-variable regression model, the OLS
estimators in the multiple regression model are subject to
sampling variability.
The properties of the OLS estimators and their
distributions depend on the conditions under which they
are obtained i.e. the assumptions made.
We have already studied the assumptions of the
two-variable Classical model, and the Classical multiple
regression model is basically a straightforward extension of
the two-variable case.
The assumptions we need to make concern the
explanatory variables $X_2, \ldots, X_k$ and the error term $\epsilon$.
As before we shall focus on the small-sample properties of
the estimators and shall ignore large-sample ($n \to \infty$)
properties.
The assumptions concerning the regressors are as follows:
Assumptions concerning the explanatory variables
IA (non-random): $X_2, \ldots, X_k$ are non-stochastic;
IB (fixed): The values of $X_2, \ldots, X_k$ are fixed in
repeated samples;
ID (no collinearity): There exist no exact linear relationships
between the sample values of any two
or more of the explanatory variables.
Note that Assumption IC, used in Thomas, is a
large-sample assumption and has been omitted here.
Assumptions IA (non-random) and IB (fixed) are identical
to those of the two-variable model but are now applied to all
regressors.
They mean that $X_2, \ldots, X_k$ are not random variables and the
same values would appear in each sample if it were
possible to conduct repeated sampling.
The new assumption is ID (no collinearity) which has no
equivalent in the two-variable model.
It is included in order to rule out the possibility of what is
called perfect multicollinearity, which we will study in
more detail in Lecture 18.
For now, simply note that the assumption rules out the
possibility that, for example, $X_{3i} = 5 + 2X_{2i}$ for all $i$.
If Assumption ID is violated then all estimation methods,
including OLS, are infeasible.
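A small numerical sketch of why an exact linear relationship makes estimation infeasible. It builds the relationship X3 = 5 + 2 X2 from the example with made-up values of X2, and checks the rank of the regressor matrix using NumPy; the variable names are hypothetical.

```python
import numpy as np

# Hypothetical regressor values with an exact linear relationship X3 = 5 + 2*X2.
X2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X3 = 5.0 + 2.0 * X2
X = np.column_stack([np.ones_like(X2), X2, X3])   # columns: X1 = 1, X2, X3

k = X.shape[1]
rank = np.linalg.matrix_rank(X)

# rank < k means X'X is singular, so the k first-order conditions
# do not have a unique solution for b1, b2, b3.
print(k, rank)
```

With only two linearly independent columns, infinitely many combinations of b1, b2 and b3 give the same fitted values, so the data cannot separate the effect of X2 from that of X3.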
The assumptions concerning $\epsilon$ are the same as in the
two-variable model.
For completeness they are repeated below:
Assumptions concerning the disturbances
IIA (zero mean): $E(\epsilon_i) = 0$ for all $i$;
IIB (constant variance): $V(\epsilon_i) = \sigma^2$ = constant for all $i$;
IIC (zero covariance): $Cov(\epsilon_i, \epsilon_j) = 0$ for all $i \neq j$;
IID (normality): each $\epsilon_i$ is normally distributed.
These assumptions govern the properties of the random
part of the model.
Given that $X_2, \ldots, X_k$ are fixed, they therefore govern the
variation in $Y$ in repeated samples.
Assumption IIA (zero mean) implies that the average
effect of $\epsilon$ in repeated samples is zero and the value of $Y$,
on average, is:
$$\begin{aligned}
E(Y_i) &= E(\beta_1 + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + \epsilon_i) \\
&= \beta_1 + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + E(\epsilon_i) \\
&= \beta_1 + \beta_2 X_{2i} + \ldots + \beta_k X_{ki}, \quad i = 1, \ldots, n,
\end{aligned}$$
because $E(\epsilon_i) = 0$ under IIA.
Note that $E(Y_i)$ is not the same for each $i$ but depends on
$X_{2i}, \ldots, X_{ki}$, which are not constant throughout the sample
(if they were constant they would violate Assumption ID).
Recall that combining IIA (zero mean), IIB (constant
variance) and IID (normality) gives
$$\epsilon_i \sim N(0, \sigma^2), \quad i = 1, \ldots, n.$$
Note that
$$Y_i - E(Y_i) = Y_i - \beta_1 - \beta_2 X_{2i} - \ldots - \beta_k X_{ki} = \epsilon_i;$$
this implies that
$$V(Y_i) = E\left[ (Y_i - E(Y_i))^2 \right] = E(\epsilon_i^2) = V(\epsilon_i) = \sigma^2,$$
which in turn implies that
$$Y_i \sim N(\beta_1 + \beta_2 X_{2i} + \ldots + \beta_k X_{ki},\; \sigma^2), \quad i = 1, \ldots, n.$$
The implications of the Assumptions for the OLS
estimators can be summarised as follows:
Property       Assumptions
Linearity      IA, IB, ID
Unbiasedness   IA, IB, ID, IIA
BLUness        IA, IB, ID, IIA, IIB, IIC
Efficiency     IA, IB, ID, IIA, IIB, IIC, IID
Normality      IA, IB, ID, IIA, IIB, IIC, IID
These are the same as in the two-variable model except
that we now require Assumption ID (no collinearity) in all
cases.
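The unbiasedness property can be illustrated with a small repeated-sampling experiment. In the sketch below the regressors are drawn once and then held fixed, the disturbances are drawn afresh in each replication, and the OLS estimates are averaged over replications. The population parameters, sample size and number of replications are all hypothetical choices, and the estimates are computed from the standard matrix form of the OLS solution.

```python
import numpy as np

rng = np.random.default_rng(42)

# Regressors are generated once and then held fixed in repeated samples (IA, IB).
n = 30
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n), rng.uniform(0.0, 5.0, n)])

beta = np.array([1.0, 0.5, -2.0])   # hypothetical population parameters
sigma = 1.5                          # hypothetical disturbance standard deviation

replications = 5000
estimates = np.empty((replications, 3))
for r in range(replications):
    # Disturbances with zero mean, constant variance, independent and normal.
    eps = rng.normal(0.0, sigma, size=n)
    Y = X @ beta + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ Y)

mean_b = estimates.mean(axis=0)   # should be close to beta if OLS is unbiased
print(mean_b)
```

Individual estimates vary from sample to sample, but their average across many samples settles near the population values, which is what unbiasedness means.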
The unbiasedness and normality properties imply that
$$b_j \sim N(\beta_j, \sigma^2_{b_j}), \quad j = 1, \ldots, k,$$
which can be used as a basis for inference.
For $k > 2$ the variances, $\sigma^2_{b_j}$, are complicated functions of
the regressors, but all are proportional to $\sigma^2 = V(\epsilon)$.
In order to conduct inference we therefore need to
estimate $\sigma^2$.
A generalisation of the estimator in the two-variable model
is used for this, and is given by
$$s^2 = \frac{\sum e_i^2}{n - k};$$
it is an unbiased estimator, i.e. $E(s^2) = \sigma^2$.
Note that the denominator of $s^2$ involves $n - k$.
This is because we have had to estimate $k$ parameters
($\beta_1, \ldots, \beta_k$) in order to compute the residuals $e_1, \ldots, e_n$ and
have therefore lost $k$ degrees of freedom.
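As a quick check of this formula, the sketch below recomputes the variance estimate from the residual sum of squares reported in the Stata output for `regress m g ir` (n = 30, k = 3). The variable names are chosen here for illustration.

```python
import math

# Figures copied from the Stata output for "regress m g ir".
residual_ss = 5.88865732   # sum of squared residuals ("Residual" row, SS column)
n = 30                     # number of observations
k = 3                      # estimated coefficients: _cons, g, ir

s2 = residual_ss / (n - k)   # sum of squared residuals divided by n - k
s = math.sqrt(s2)            # Stata reports this square root as "Root MSE"

print(round(s2, 9), round(s, 5))
```

The results agree with the residual mean square (.218098419) and the Root MSE (.46701) shown in that output, which confirms that the divisor used is n - k = 27 rather than n.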
If we use $s^2$ in the (complicated) formulae for the estimator
variances we obtain the estimated variances $s^2_{b_j}$ ($j = 1, \ldots, k$).
It follows that, for inference, we then use Student's
t-distribution instead of the normal distribution:
$$\frac{b_j - \beta_j}{s_{b_j}} \sim t_{n-k}.$$
So, to test the significance of a regressor, $X_j$, i.e. to test
$$H_0: \beta_j = 0 \quad \text{against} \quad H_A: \beta_j \neq 0,$$
we can use the test statistic
$$TS = \frac{b_j}{s_{b_j}} \sim t_{n-k} \quad \text{under } H_0.$$
Let $t_{0.025}$ denote the 5% critical value from the $t_{n-k}$
distribution that puts 2.5% of the distribution into each tail.
As before the decision rule is:
if $|TS| > t_{0.025}$ reject $H_0$; if $|TS| < t_{0.025}$ do not reject $H_0$.
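The decision rule can be written as a short function. The sketch below applies it to the coefficients on `ir` and `g` from the earlier Stata output for `regress m g ir`, where n - k = 27. The function name `t_test` is hypothetical, and the critical value 2.052 is taken from standard t tables for 27 degrees of freedom rather than from the lecture.

```python
def t_test(b_j, se_j, t_crit):
    """Two-sided test of H0: beta_j = 0; returns the statistic and the decision."""
    ts = b_j / se_j
    reject = abs(ts) > t_crit
    return ts, reject

# Coefficients and standard errors from the Stata output for "regress m g ir".
# 2.052 is the tabulated 5% two-tail critical value for t with 27 df.
ts_ir, reject_ir = t_test(-0.0006758, 0.0008844, t_crit=2.052)
ts_g, reject_g = t_test(0.172615, 0.0183198, t_crit=2.052)

print(round(ts_ir, 2), reject_ir)
print(round(ts_g, 2), reject_g)
```

The statistics match the t column of that output, and the decisions agree with its p-values: the hypothesis is rejected for `g` but not for `ir`.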
We can also use the t-distribution to form confidence
intervals (CIs) for the unknown population parameters
$\beta_1, \ldots, \beta_k$.
With $t_{0.025}$ as defined above, a 95% CI for
$\beta_j$ is of the form
$$b_j \pm t_{0.025}\, s_{b_j} \quad \text{or} \quad \left[ b_j - t_{0.025}\, s_{b_j},\; b_j + t_{0.025}\, s_{b_j} \right],$$
i.e. we are 95% confident that $\beta_j$ lies in this interval.
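A minimal sketch of the interval calculation, applied to the coefficient on `g` in the Stata output for `regress m g ir` (n - k = 27). The function name `confidence_interval` is hypothetical, and 2.052 is the tabulated value of t with 27 degrees of freedom and 2.5% in each tail, taken from standard tables.

```python
def confidence_interval(b_j, se_j, t_crit):
    """Return the (lower, upper) limits of b_j +/- t_crit * se_j."""
    half_width = t_crit * se_j
    return b_j - half_width, b_j + half_width

# Coefficient and standard error of g from the Stata output for "regress m g ir".
lower, upper = confidence_interval(0.172615, 0.0183198, t_crit=2.052)
print(round(lower, 4), round(upper, 4))
```

The limits agree with the interval reported in that output (.1350258 to .2102042) up to the rounding of the critical value.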
Example. Let's return to the money demand data where
we estimated the model
$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon,$$
where $Y$ denotes money stock, $X_2$ is GDP, $X_3$ is the interest
rate and $X_4$ is the rate of price inflation.
Let's test the hypotheses $\beta_2 = 0$ and $\beta_3 = 0$ and find a 95%
confidence interval for $\beta_4$.
The regression output is as follows:
. regress m g ir pi
-------------+------------------------------ F( 3, 26) = 30.70
Model | 20.5893701 3 6.86312337 Prob > F = 0.0000
-------------+------------------------------ Adj R-squared = 0.7544
Total | 26.4022364 29 .910421946 Root MSE = .47283
------------------------------------------------------------------------------
-------------+----------------------------------------------------------------
g | .1703745 .0189433 8.99 0.000 .1314361 .2093129
ir | -.0001693 .0012483 -0.14 0.893 -.0027353 .0023967
pi | -.002197 .0037733 -0.58 0.565 -.0099531 .0055592
_cons | .0893538 .1388419 0.64 0.525 -.1960399 .3747475
------------------------------------------------------------------------------
Note that t-ratios for testing $\beta_j = 0$ are given in the output
above, as are 95% CIs, but we shall go through the
calculations nonetheless!
To test $H_0: \beta_2 = 0$ against $H_A: \beta_2 \neq 0$ we use
$$TS = \frac{b_2}{s_{b_2}} = \frac{0.1703745}{0.0189433} = 8.99 \sim t_{26} \quad \text{under } H_0.$$
The 5% critical value for a two-tail test from the $t_{26}$
distribution is 2.056.
As $|TS| = 8.99 > 2.056$ we reject $H_0$ in favour of $H_A$,
i.e. there is evidence that $\beta_2 \neq 0$ and hence that GDP is a
significant determinant of the money stock.
Repeating the process for $\beta_3$ we obtain
$$TS = \frac{-0.0001693}{0.001248} = -0.14.$$
Here $|TS| = 0.14 < 2.056$ and hence we do not reject
$H_0: \beta_3 = 0$, i.e. we are unable to reject the hypothesis that
the interest rate is not a significant determinant of money.
A 95% CI for $\beta_4$ is obtained as
$$b_4 \pm t_{0.025}\, s_{b_4} = -0.002197 \pm (2.056 \times 0.003773),$$
which gives $-0.002197 \pm 0.007757$ or $[-0.00995, 0.00556]$.
Summary
the Classical multiple linear regression model
Next week:
the problem of multicollinearity
making inferences
