
Chapter 4

Predictions In Linear Regression Model

Prediction of values of study variable


An important use of linear regression modelling is to predict the average and actual values of the study variable. Prediction of a value of the study variable means knowing the value of $E(y)$ (in case of the average value) or of $y$ (in case of the actual value) for a given value of the explanatory variable. We consider both cases. The prediction consists of two steps. In the first step, the regression coefficients are estimated from the given observations. In the second step, these estimates are used to construct the predictor, which provides the prediction of the actual or average value of the study variable. Based on this construction, there are two situations in which the actual and average values of the study variable can be predicted: within-sample prediction and outside-sample prediction. We describe prediction in both situations.

Within sample prediction in simple linear regression model


Consider the linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$. Suppose we have a sample of $n$ paired observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$, following

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$

where the $\varepsilon_i$ are identically and independently distributed as $N(0, \sigma^2)$. The parameters $\beta_0$ and $\beta_1$ are estimated by ordinary least squares as $b_0$ and $b_1$, respectively:

$$b_0 = \bar{y} - b_1 \bar{x}, \qquad b_1 = \frac{s_{xy}}{s_{xx}},$$

where

$$s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \quad s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$

The fitted model is $\hat{y} = b_0 + b_1 x$.
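These formulas can be sketched numerically. A minimal illustration, assuming simulated data (the data and seed are not from the text):

```python
import numpy as np

def ols_fit(x, y):
    """OLS estimates (b0, b1) for the simple model y = b0 + b1*x."""
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)           # s_xx
    sxy = np.sum((x - xbar) * (y - ybar))   # s_xy
    b1 = sxy / sxx                          # slope
    b0 = ybar - b1 * xbar                   # intercept
    return b0, b1

# Simulated data from y = 2 + 3x + N(0, 1) noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 25)
y = 2 + 3 * x + rng.normal(0.0, 1.0, size=x.size)
b0, b1 = ols_fit(x, y)
```

With this sample size and noise level, the estimates land close to the true values 2 and 3.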

Case 1: Prediction of average value of $y$

Suppose we want to predict the value of $E(y)$ for a given value $x = x_0$. Then the predictor is

$$p_m = b_0 + b_1 x_0.$$

Here $m$ stands for mean value.
Econometrics | Chapter 4 | Predictions In Linear Regression Model | Shalabh, IIT Kanpur
Predictive bias
The prediction error is

$$p_m - E(y) = b_0 + b_1 x_0 - E(\beta_0 + \beta_1 x_0 + \varepsilon) = b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0) = (b_0 - \beta_0) + (b_1 - \beta_1) x_0.$$

Then the prediction bias is

$$E\left[p_m - E(y)\right] = E(b_0 - \beta_0) + E(b_1 - \beta_1) x_0 = 0 + 0 = 0.$$

Thus the predictor $p_m$ is an unbiased predictor of $E(y)$.

Predictive variance:
The predictive variance of $p_m$ is

$$
\begin{aligned}
PV(p_m) &= \operatorname{Var}(b_0 + b_1 x_0) \\
&= \operatorname{Var}\left[\bar{y} + b_1 (x_0 - \bar{x})\right] \\
&= \operatorname{Var}(\bar{y}) + (x_0 - \bar{x})^2 \operatorname{Var}(b_1) + 2 (x_0 - \bar{x}) \operatorname{Cov}(\bar{y}, b_1) \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2 (x_0 - \bar{x})^2}{s_{xx}} + 0 \\
&= \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].
\end{aligned}
$$

Estimate of predictive variance

The predictive variance can be estimated by substituting $\sigma^2$ by $\hat{\sigma}^2 = MSE$ as

$$\widehat{PV}(p_m) = \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right] = MSE \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].$$

Prediction interval:
The $100(1-\alpha)\%$ prediction interval for $E(y)$ is obtained as follows. The predictor $p_m$ is a linear combination of normally distributed random variables, so it is also normally distributed:

$$p_m \sim N\left(\beta_0 + \beta_1 x_0,\; PV(p_m)\right).$$

So if $\sigma^2$ is known, then the distribution of

$$\frac{p_m - E(y)}{\sqrt{PV(p_m)}}$$

is $N(0,1)$. So the $100(1-\alpha)\%$ prediction interval is obtained from

$$P\left[ -z_{\alpha/2} \le \frac{p_m - E(y)}{\sqrt{PV(p_m)}} \le z_{\alpha/2} \right] = 1 - \alpha,$$

which gives the prediction interval for $E(y)$ as

$$\left[ p_m - z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\;\; p_m + z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right].$$

When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of

$$\frac{p_m - E(y)}{\sqrt{MSE \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{s_{xx}} \right)}}$$

is the $t$-distribution with $(n-2)$ degrees of freedom, i.e., $t_{n-2}$.

The $100(1-\alpha)\%$ prediction interval in this case is

$$P\left[ -t_{\alpha/2,\,n-2} \le \frac{p_m - E(y)}{\sqrt{MSE \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{s_{xx}} \right)}} \le t_{\alpha/2,\,n-2} \right] = 1 - \alpha,$$

which gives the prediction interval as

$$\left[ p_m - t_{\alpha/2,\,n-2} \sqrt{MSE \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\;\; p_m + t_{\alpha/2,\,n-2} \sqrt{MSE \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right].$$

Note that the width of the prediction interval for $E(y)$ is a function of $x_0$. The interval width is minimum at $x_0 = \bar{x}$ and widens as $|x_0 - \bar{x}|$ increases. This is expected, since the best estimates of $y$ are made at $x$-values near the center of the data, and the precision of estimation deteriorates as we move toward the boundary of the $x$-space.
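The $t$-based interval above can be sketched as follows. A minimal example on simulated data (the helper name `mean_response_interval` and the data are assumptions, not from the text):

```python
import numpy as np
from scipy import stats

def mean_response_interval(x, y, x0, alpha=0.05):
    """t-based 100(1-alpha)% interval for E(y) at x = x0."""
    n = x.size
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * xbar
    mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)   # sigma^2-hat
    pm = b0 + b1 * x0                                 # predictor of E(y)
    half = stats.t.ppf(1 - alpha / 2, n - 2) * np.sqrt(
        mse * (1 / n + (x0 - xbar) ** 2 / sxx))
    return pm - half, pm + half

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 1 + 2 * x + rng.normal(0.0, 1.0, size=x.size)
lo_center, hi_center = mean_response_interval(x, y, x0=x.mean())
lo_edge, hi_edge = mean_response_interval(x, y, x0=10.0)
```

Comparing the two calls shows the interval is narrowest at $x_0 = \bar{x}$ and widens toward the edge of the data, matching the discussion above.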

Case 2: Prediction of actual value
If $x_0$ is the value of the explanatory variable, then the actual value predictor for $y$ is

$$p_a = b_0 + b_1 x_0.$$

Here $a$ means "actual". The true value of $y$ in the prediction period is $y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$, where $\varepsilon_0$ indicates the value that would be drawn from the distribution of the random error in the prediction period. Note that the form of this predictor is the same as that of the average value predictor, but its predictive error and other properties are different. This is the dual nature of the predictor.

Predictive bias:
The predictive error of $p_a$ is

$$p_a - y_0 = b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0 + \varepsilon_0) = (b_0 - \beta_0) + (b_1 - \beta_1) x_0 - \varepsilon_0.$$

Thus, we find that

$$E(p_a - y_0) = E(b_0 - \beta_0) + E(b_1 - \beta_1) x_0 - E(\varepsilon_0) = 0 + 0 + 0 = 0,$$

which implies that $p_a$ is an unbiased predictor of $y_0$.

Predictive variance
Because the future observation $y_0$ is independent of $p_a$, the predictive variance of $p_a$ is

$$
\begin{aligned}
PV(p_a) &= E(p_a - y_0)^2 \\
&= E\left[ (b_0 - \beta_0) + (x_0 - \bar{x})(b_1 - \beta_1) + (b_1 - \beta_1)\bar{x} - \varepsilon_0 \right]^2 \\
&= \operatorname{Var}(b_0) + \left[ (x_0 - \bar{x})^2 + \bar{x}^2 + 2\bar{x}(x_0 - \bar{x}) \right] \operatorname{Var}(b_1) + \operatorname{Var}(\varepsilon_0) + 2\left[ (x_0 - \bar{x}) + \bar{x} \right] \operatorname{Cov}(b_0, b_1)
\end{aligned}
$$

(the remaining terms are 0, assuming the independence of $\varepsilon_0$ with $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$)

$$
\begin{aligned}
&= \operatorname{Var}(b_0) + x_0^2 \operatorname{Var}(b_1) + \operatorname{Var}(\varepsilon_0) + 2 x_0 \operatorname{Cov}(b_0, b_1) \\
&= \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right) + \frac{\sigma^2 x_0^2}{s_{xx}} + \sigma^2 - \frac{2 \sigma^2 x_0 \bar{x}}{s_{xx}} \\
&= \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].
\end{aligned}
$$

Estimate of predictive variance
The estimate of predictive variance can be obtained by replacing $\sigma^2$ by its estimate $\hat{\sigma}^2 = MSE$ as

$$\widehat{PV}(p_a) = \hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right] = MSE \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].$$

Prediction interval:
If $\sigma^2$ is known, then the distribution of

$$\frac{p_a - y_0}{\sqrt{PV(p_a)}}$$

is $N(0,1)$. So the $100(1-\alpha)\%$ prediction interval for $y_0$ is obtained from

$$P\left[ -z_{\alpha/2} \le \frac{p_a - y_0}{\sqrt{PV(p_a)}} \le z_{\alpha/2} \right] = 1 - \alpha,$$

which gives the prediction interval for $y_0$ as

$$\left[ p_a - z_{\alpha/2} \sqrt{\sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\;\; p_a + z_{\alpha/2} \sqrt{\sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right].$$

When $\sigma^2$ is unknown, then

$$\frac{p_a - y_0}{\sqrt{\widehat{PV}(p_a)}}$$

follows a $t$-distribution with $(n-2)$ degrees of freedom. The $100(1-\alpha)\%$ prediction interval for $y_0$ in this case is obtained from

$$P\left[ -t_{\alpha/2,\,n-2} \le \frac{p_a - y_0}{\sqrt{\widehat{PV}(p_a)}} \le t_{\alpha/2,\,n-2} \right] = 1 - \alpha,$$

which gives the prediction interval for $y_0$ as

$$\left[ p_a - t_{\alpha/2,\,n-2} \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\;\; p_a + t_{\alpha/2,\,n-2} \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right].$$

The prediction interval is of minimum width at $x_0 = \bar{x}$ and widens as $|x_0 - \bar{x}|$ increases.

The prediction interval for $p_a$ is wider than the prediction interval for $p_m$ because the prediction interval for $p_a$ depends on both the error from the fitted model and the error associated with the future observation.
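A sketch contrasting the two interval half-widths on simulated data (the helper name and data are ours): the extra "$1+$" term in the actual-value variance makes its interval strictly wider.

```python
import numpy as np
from scipy import stats

def interval_halfwidths(x, y, x0, alpha=0.05):
    """Half-widths of the t-based intervals for E(y) and for y0 at x = x0."""
    n = x.size
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * xbar
    mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)
    tcrit = stats.t.ppf(1 - alpha / 2, n - 2)
    base = 1 / n + (x0 - xbar) ** 2 / sxx
    half_mean = tcrit * np.sqrt(mse * base)          # interval for E(y)
    half_actual = tcrit * np.sqrt(mse * (1 + base))  # extra 1 from eps_0
    return half_mean, half_actual

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 30)
y = 1 + 2 * x + rng.normal(0.0, 1.0, size=x.size)
hm, ha = interval_halfwidths(x, y, x0=x.mean())
hm_edge, ha_edge = interval_halfwidths(x, y, x0=10.0)
```

At every $x_0$, `half_actual` exceeds `half_mean`, and both grow as $x_0$ moves away from $\bar{x}$.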
Within sample prediction in multiple linear regression model
Consider the multiple regression model with $k$ explanatory variables:

$$y = X\beta + \varepsilon,$$

where $y = (y_1, y_2, \ldots, y_n)'$ is an $n \times 1$ vector of $n$ observations on the study variable,

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}$$

is an $n \times k$ matrix of $n$ observations on each of the $k$ explanatory variables, $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ is a $k \times 1$ vector of regression coefficients, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random error components (disturbance terms) following $N(0, \sigma^2 I_n)$. If an intercept term is present, take the first column of $X$ to be $(1, 1, \ldots, 1)'$.

Let the parameter $\beta$ be estimated by its ordinary least squares estimator $b = (X'X)^{-1} X'y$. Then the predictor is $p = Xb$, which can be used for predicting both the actual and the average values of the study variable. This is the dual nature of the predictor.

Case 1: Prediction of average value of $y$

When the objective is to predict the average value of $y$, i.e., $E(y)$, the estimation error is

$$p - E(y) = Xb - X\beta = X(b - \beta) = X(X'X)^{-1}X'\varepsilon = H\varepsilon,$$

where $H = X(X'X)^{-1}X'$.

Then

$$E\left[p - E(y)\right] = H\,E(\varepsilon) = 0,$$

which proves that the predictor $p = Xb$ provides unbiased prediction of the average value.

The predictive variance of $p$ is

$$
\begin{aligned}
PV_m(p) &= E\left[ \left(p - E(y)\right)'\left(p - E(y)\right) \right] \\
&= E(\varepsilon' H' H \varepsilon) \\
&= E(\varepsilon' H \varepsilon) \qquad (H \text{ is symmetric and idempotent}) \\
&= \sigma^2 \operatorname{tr} H = \sigma^2 k.
\end{aligned}
$$

The predictive variance can be estimated by $\widehat{PV}_m(p) = \hat{\sigma}^2 k$, where $\hat{\sigma}^2 = MSE$ is obtained from the analysis of variance based on the OLSE.
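The key step $\operatorname{tr} H = k$ can be verified numerically. A small sketch with an arbitrarily chosen design (dimensions are illustrative assumptions):

```python
import numpy as np

# Check tr(H) = k for the hat matrix H = X (X'X)^{-1} X'
rng = np.random.default_rng(3)
n, k = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # intercept + 3 regressors
H = X @ np.linalg.inv(X.T @ X) @ X.T
trH = np.trace(H)   # equals k (= 4 here) up to rounding
```

This holds because $H$ is a projection onto the column space of $X$, whose rank is $k$.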
When $\sigma^2$ is known, then the distribution of

$$\frac{p - E(y)}{\sqrt{PV_m(p)}}$$

is $N(0,1)$. So the $100(1-\alpha)\%$ prediction interval for $E(y)$ is obtained from

$$P\left[ -z_{\alpha/2} \le \frac{p - E(y)}{\sqrt{PV_m(p)}} \le z_{\alpha/2} \right] = 1 - \alpha,$$

which gives the prediction interval for $E(y)$ as

$$\left[ p - z_{\alpha/2} \sqrt{PV_m(p)},\;\; p + z_{\alpha/2} \sqrt{PV_m(p)} \right].$$

When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of

$$\frac{p - E(y)}{\sqrt{\widehat{PV}_m(p)}}$$

is the $t$-distribution with $(n-k)$ degrees of freedom, i.e., $t_{n-k}$.

The $100(1-\alpha)\%$ prediction interval for $E(y)$ in this case is

$$P\left[ -t_{\alpha/2,\,n-k} \le \frac{p - E(y)}{\sqrt{\widehat{PV}_m(p)}} \le t_{\alpha/2,\,n-k} \right] = 1 - \alpha,$$

which gives the prediction interval for $E(y)$ as

$$\left[ p - t_{\alpha/2,\,n-k} \sqrt{\widehat{PV}_m(p)},\;\; p + t_{\alpha/2,\,n-k} \sqrt{\widehat{PV}_m(p)} \right].$$

Case 2: Prediction of actual value of $y$

When the predictor $p = Xb$ is used for predicting the actual value of the study variable $y$, its prediction error is

$$p - y = Xb - X\beta - \varepsilon = X(b - \beta) - \varepsilon = X(X'X)^{-1}X'\varepsilon - \varepsilon = -\left[ I - X(X'X)^{-1}X' \right]\varepsilon = -(I - H)\varepsilon.$$

Thus

$$E(p - y) = 0,$$

which shows that p provides unbiased predictions for the actual values of study variable.
The predictive variance in this case is

$$
\begin{aligned}
PV_a(p) &= E\left[ (p - y)'(p - y) \right] \\
&= E\left[ \varepsilon'(I - H)'(I - H)\varepsilon \right] \\
&= E\left[ \varepsilon'(I - H)\varepsilon \right] \qquad (I - H \text{ is symmetric and idempotent}) \\
&= \sigma^2 \operatorname{tr}(I - H) \\
&= \sigma^2 (n - k).
\end{aligned}
$$

The predictive variance can be estimated by

$$\widehat{PV}_a(p) = \hat{\sigma}^2 (n - k),$$

where $\hat{\sigma}^2 = MSE$ is obtained from the analysis of variance based on the OLSE.

Comparing the performance of $p$ in predicting actual and average values, we find that $p$ is a better predictor of the average value than of the actual value when

$$PV_m(p) < PV_a(p) \iff k < n - k \iff 2k < n,$$

i.e., when the total number of observations is more than twice the number of explanatory variables.

Now we obtain the confidence interval for $y$.

When $\sigma^2$ is known, then the distribution of

$$\frac{p - y}{\sqrt{PV_a(p)}}$$

is $N(0,1)$. So the $100(1-\alpha)\%$ prediction interval for $y$ is obtained from

$$P\left[ -z_{\alpha/2} \le \frac{p - y}{\sqrt{PV_a(p)}} \le z_{\alpha/2} \right] = 1 - \alpha,$$

which gives the prediction interval for $y$ as

$$\left[ p - z_{\alpha/2} \sqrt{PV_a(p)},\;\; p + z_{\alpha/2} \sqrt{PV_a(p)} \right].$$

When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of

$$\frac{p - y}{\sqrt{\widehat{PV}_a(p)}}$$

is the $t$-distribution with $(n-k)$ degrees of freedom, i.e., $t_{n-k}$.


The $100(1-\alpha)\%$ prediction interval for $y$ in this case is obtained from

$$P\left[ -t_{\alpha/2,\,n-k} \le \frac{p - y}{\sqrt{\widehat{PV}_a(p)}} \le t_{\alpha/2,\,n-k} \right] = 1 - \alpha,$$

which gives the prediction interval for $y$ as

$$\left[ p - t_{\alpha/2,\,n-k} \sqrt{\widehat{PV}_a(p)},\;\; p + t_{\alpha/2,\,n-k} \sqrt{\widehat{PV}_a(p)} \right].$$

Outside sample prediction in multiple linear regression model


Consider the model

$$y = X\beta + \varepsilon \qquad (1)$$

where $y$ is an $n \times 1$ vector of $n$ observations on the study variable, $X$ is an $n \times k$ matrix of explanatory variables, and $\varepsilon$ is an $n \times 1$ vector of disturbances following $N(0, \sigma^2 I_n)$.

Further, suppose a set of $n_f$ observations on the same set of $k$ explanatory variables is also available, but the corresponding $n_f$ observations on the study variable are not. Assuming that this set of observations also follows the same model, we can write

$$y_f = X_f \beta + \varepsilon_f \qquad (2)$$

where $y_f$ is an $n_f \times 1$ vector of future values, $X_f$ is an $n_f \times k$ matrix of known values of the explanatory variables, and $\varepsilon_f$ is an $n_f \times 1$ vector of disturbances following $N(0, \sigma^2 I_{n_f})$. It is also assumed that the elements of $\varepsilon$ and $\varepsilon_f$ are independently distributed.

We now consider the prediction of the $y_f$ values for given $X_f$ from model (2). This is done by estimating the regression coefficients from model (1) based on the $n$ observations and using them to formulate the predictor in model (2). If ordinary least squares estimation is used to estimate $\beta$ in model (1) as

$$b = (X'X)^{-1} X'y,$$

then the corresponding predictor is

$$p_f = X_f b = X_f (X'X)^{-1} X'y.$$
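The two-model setup can be sketched as follows, with simulated in-sample and outside-sample data (design, $\beta$, and noise level are illustrative assumptions):

```python
import numpy as np

# In-sample fit from model (1), then outside-sample prediction p_f = X_f b
rng = np.random.default_rng(4)
n, n_f, k = 40, 5, 3
beta = np.array([1.0, 2.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ beta + rng.normal(0.0, 0.1, size=n)           # model (1)
X_f = np.column_stack([np.ones(n_f), rng.normal(size=(n_f, k - 1))])

b = np.linalg.solve(X.T @ X, X.T @ y)                 # OLSE b = (X'X)^{-1} X'y
p_f = X_f @ b                                         # predictor for y_f
```

With the small error variance used here, $b$ lands close to $\beta$ and $p_f$ close to $X_f\beta$.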

Case 1: Prediction of average value of study variable
When the aim is to predict the average value $E(y_f)$, the prediction error is

$$p_f - E(y_f) = X_f b - X_f \beta = X_f (b - \beta) = X_f (X'X)^{-1} X'\varepsilon.$$

Then

$$E\left[ p_f - E(y_f) \right] = X_f (X'X)^{-1} X' E(\varepsilon) = 0.$$

Thus $p_f$ provides unbiased prediction of the average value.

The predictive covariance matrix of $p_f$ is

$$
\begin{aligned}
\operatorname{Cov}_m(p_f) &= E\left[ \left(p_f - E(y_f)\right)\left(p_f - E(y_f)\right)' \right] \\
&= E\left[ X_f (X'X)^{-1} X' \varepsilon \varepsilon' X (X'X)^{-1} X_f' \right] \\
&= X_f (X'X)^{-1} X' E(\varepsilon \varepsilon') X (X'X)^{-1} X_f' \\
&= \sigma^2 X_f (X'X)^{-1} X'X (X'X)^{-1} X_f' \\
&= \sigma^2 X_f (X'X)^{-1} X_f'.
\end{aligned}
$$

The predictive variance of $p_f$ is

$$PV_m(p_f) = E\left[ \left(p_f - E(y_f)\right)'\left(p_f - E(y_f)\right) \right] = \operatorname{tr}\left[ \operatorname{Cov}_m(p_f) \right] = \sigma^2 \operatorname{tr}\left[ (X'X)^{-1} X_f' X_f \right].$$

If $\sigma^2$ is unknown, replace $\sigma^2$ by $\hat{\sigma}^2 = MSE$ in the expressions for the predictive covariance matrix and predictive variance; their estimates are

$$\widehat{\operatorname{Cov}}_m(p_f) = \hat{\sigma}^2 X_f (X'X)^{-1} X_f', \qquad \widehat{PV}_m(p_f) = \hat{\sigma}^2 \operatorname{tr}\left[ (X'X)^{-1} X_f' X_f \right].$$

Now we obtain the confidence interval for $E(y_f)$.

When $\sigma^2$ is known, then the distribution of

$$\frac{p_f - E(y_f)}{\sqrt{PV_m(p_f)}}$$

is $N(0,1)$. So the $100(1-\alpha)\%$ prediction interval for $E(y_f)$ is obtained as


$$P\left[ -z_{\alpha/2} \le \frac{p_f - E(y_f)}{\sqrt{PV_m(p_f)}} \le z_{\alpha/2} \right] = 1 - \alpha,$$

which gives the prediction interval for $E(y_f)$ as

$$\left[ p_f - z_{\alpha/2} \sqrt{PV_m(p_f)},\;\; p_f + z_{\alpha/2} \sqrt{PV_m(p_f)} \right].$$

When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of

$$\frac{p_f - E(y_f)}{\sqrt{\widehat{PV}_m(p_f)}}$$

is the $t$-distribution with $(n-k)$ degrees of freedom, i.e., $t_{n-k}$.

The $100(1-\alpha)\%$ prediction interval for $E(y_f)$ in this case is

$$P\left[ -t_{\alpha/2,\,n-k} \le \frac{p_f - E(y_f)}{\sqrt{\widehat{PV}_m(p_f)}} \le t_{\alpha/2,\,n-k} \right] = 1 - \alpha,$$

which gives the prediction interval for $E(y_f)$ as

$$\left[ p_f - t_{\alpha/2,\,n-k} \sqrt{\widehat{PV}_m(p_f)},\;\; p_f + t_{\alpha/2,\,n-k} \sqrt{\widehat{PV}_m(p_f)} \right].$$

Case 2: Prediction of actual value of study variable

When $p_f$ is used to predict the actual value $y_f$, the prediction error is

$$p_f - y_f = X_f b - X_f \beta - \varepsilon_f = X_f (b - \beta) - \varepsilon_f.$$

Then

$$E\left( p_f - y_f \right) = X_f E(b - \beta) - E(\varepsilon_f) = 0.$$

Thus $p_f$ provides unbiased prediction of the actual values.

The predictive covariance matrix of $p_f$ in this case is

$$
\begin{aligned}
\operatorname{Cov}_a(p_f) &= E\left[ (p_f - y_f)(p_f - y_f)' \right] \\
&= E\left[ \left( X_f (b - \beta) - \varepsilon_f \right)\left( (b - \beta)' X_f' - \varepsilon_f' \right) \right] \\
&= X_f V(b) X_f' + E(\varepsilon_f \varepsilon_f') \qquad \text{(using } b - \beta = (X'X)^{-1} X'\varepsilon \text{)} \\
&= \sigma^2 \left[ X_f (X'X)^{-1} X_f' + I_{n_f} \right].
\end{aligned}
$$

The predictive variance of $p_f$ is

$$PV_a(p_f) = E\left[ (p_f - y_f)'(p_f - y_f) \right] = \operatorname{tr}\left[ \operatorname{Cov}_a(p_f) \right] = \sigma^2 \left[ \operatorname{tr}\left( (X'X)^{-1} X_f' X_f \right) + n_f \right].$$

The estimates of the covariance matrix and predictive variance can be obtained by replacing $\sigma^2$ by $\hat{\sigma}^2 = MSE$ as

$$\widehat{\operatorname{Cov}}_a(p_f) = \hat{\sigma}^2 \left[ X_f (X'X)^{-1} X_f' + I_{n_f} \right], \qquad \widehat{PV}_a(p_f) = \hat{\sigma}^2 \left[ \operatorname{tr}\left( (X'X)^{-1} X_f' X_f \right) + n_f \right].$$
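The expression $PV_a(p_f) = \sigma^2\left[\operatorname{tr}\left((X'X)^{-1}X_f'X_f\right) + n_f\right]$ can be checked by simulation; a sketch with an illustrative design and $\beta$ (assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_f, k, sigma = 60, 4, 3, 1.0
beta = np.array([1.0, -2.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
X_f = np.column_stack([np.ones(n_f), rng.normal(size=(n_f, k - 1))])

# Theoretical predictive variance for the actual values y_f
theory = sigma**2 * (np.trace(np.linalg.inv(X.T @ X) @ X_f.T @ X_f) + n_f)

# Monte Carlo estimate of E[(p_f - y_f)'(p_f - y_f)]
reps, sq_err = 20000, 0.0
for _ in range(reps):
    y = X @ beta + rng.normal(0.0, sigma, size=n)      # fresh in-sample errors
    y_f = X_f @ beta + rng.normal(0.0, sigma, size=n_f)  # fresh future errors
    b = np.linalg.solve(X.T @ X, X.T @ y)
    sq_err += np.sum((X_f @ b - y_f) ** 2)
mc = sq_err / reps
```

The Monte Carlo average agrees with the theoretical value to within simulation error.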

Now we obtain the confidence interval for $y_f$.

When $\sigma^2$ is known, then the distribution of

$$\frac{p_f - y_f}{\sqrt{PV_a(p_f)}}$$

is $N(0,1)$. So the $100(1-\alpha)\%$ prediction interval is obtained from

$$P\left[ -z_{\alpha/2} \le \frac{p_f - y_f}{\sqrt{PV_a(p_f)}} \le z_{\alpha/2} \right] = 1 - \alpha,$$

which gives the prediction interval for $y_f$ as

$$\left[ p_f - z_{\alpha/2} \sqrt{PV_a(p_f)},\;\; p_f + z_{\alpha/2} \sqrt{PV_a(p_f)} \right].$$
 

When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of

$$\frac{p_f - y_f}{\sqrt{\widehat{PV}_a(p_f)}}$$

is the $t$-distribution with $(n-k)$ degrees of freedom, i.e., $t_{n-k}$.

The $100(1-\alpha)\%$ prediction interval for $y_f$ in this case is

$$P\left[ -t_{\alpha/2,\,n-k} \le \frac{p_f - y_f}{\sqrt{\widehat{PV}_a(p_f)}} \le t_{\alpha/2,\,n-k} \right] = 1 - \alpha,$$

which gives the prediction interval for $y_f$ as

$$\left[ p_f - t_{\alpha/2,\,n-k} \sqrt{\widehat{PV}_a(p_f)},\;\; p_f + t_{\alpha/2,\,n-k} \sqrt{\widehat{PV}_a(p_f)} \right].$$

Simultaneous prediction of average and actual values of study variable
The predictions are generally obtained either for the average values or for the actual values of the study variable. In many applications it may not be appropriate to confine attention to only one of the two; in some situations it is more appropriate to predict both values simultaneously. For example, suppose a firm sells fertilizer to users. The company's interest is in predicting the average yield, which it would like to use to show that the average yield of the crop increases with its fertilizer. The user, on the other hand, is not interested in the average value; the user wants to know the actual increase in yield from using the fertilizer. Suppose both seller and user go for prediction through regression modelling. Using the classical tools, the statistician can predict either the actual value or the average value, which safeguards the interest of only one of the two parties. Instead, it is desirable to safeguard the interests of both by striking a balance between the objectives of the seller and the user. This can be achieved by combining the predictions of actual and average values through an objective function, or target function. Such a target function has to be flexible: it should allow different weights on the two kinds of prediction, depending on their importance in a given application, and it should reduce to the individual cases of actual and average value prediction.

Now we consider the simultaneous prediction in within and outside sample cases.

Simultaneous prediction in within sample prediction

Define a target function

$$\tau = \lambda y + (1 - \lambda) E(y); \qquad 0 \le \lambda \le 1,$$

which is a convex combination of the actual value $y$ and the average value $E(y)$. The weight $\lambda$ is a constant lying between zero and one whose value reflects the importance assigned to actual value prediction. In particular, $\lambda = 1$ gives actual value prediction and $\lambda = 0$ gives average value prediction. For example, the value of $\lambda$ in the fertilizer example depends on the rules and regulations of the market, the norms of society, and other considerations; the value of $\lambda$ is the practitioner's choice.

Consider the multiple regression model

$$y = X\beta + \varepsilon, \qquad E(\varepsilon) = 0, \quad E(\varepsilon \varepsilon') = \sigma^2 I_n.$$

Estimate $\beta$ by ordinary least squares and construct the predictor

$$p = Xb.$$

Now employ this predictor for predicting the actual and average values simultaneously through the target function.
The prediction error is

$$
\begin{aligned}
p - \tau &= Xb - \lambda y - (1 - \lambda) E(y) \\
&= Xb - \lambda (X\beta + \varepsilon) - (1 - \lambda) X\beta \\
&= X(b - \beta) - \lambda \varepsilon.
\end{aligned}
$$

Thus

$$E(p - \tau) = X E(b - \beta) - \lambda E(\varepsilon) = 0,$$

so $p$ provides unbiased prediction of $\tau$.

The variance is

$$
\begin{aligned}
\operatorname{Var}(p) &= E\left[ (p - \tau)'(p - \tau) \right] \\
&= E\left[ \left( (b - \beta)'X' - \lambda \varepsilon' \right)\left( X(b - \beta) - \lambda \varepsilon \right) \right] \\
&= E\left[ \varepsilon' X (X'X)^{-1} X'X (X'X)^{-1} X'\varepsilon + \lambda^2 \varepsilon'\varepsilon - \lambda (b - \beta)'X'\varepsilon - \lambda \varepsilon' X (b - \beta) \right] \\
&= E\left[ (1 - 2\lambda)\, \varepsilon' X (X'X)^{-1} X'\varepsilon + \lambda^2 \varepsilon'\varepsilon \right] \\
&= \sigma^2 (1 - 2\lambda) \operatorname{tr}\left[ (X'X)^{-1} X'X \right] + \sigma^2 \lambda^2 \operatorname{tr} I_n \\
&= \sigma^2 \left[ (1 - 2\lambda) k + \lambda^2 n \right].
\end{aligned}
$$

The estimate of the predictive variance can be obtained by replacing $\sigma^2$ by $\hat{\sigma}^2 = MSE$ as

$$\widehat{\operatorname{Var}}(p) = \hat{\sigma}^2 \left[ (1 - 2\lambda) k + \lambda^2 n \right].$$
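As a sanity check, the weight $\lambda = 0$ in this formula recovers the average-value variance $\sigma^2 k$ and $\lambda = 1$ recovers the actual-value variance $\sigma^2(n - k)$ derived earlier. A small sketch ($\lambda$ written as `lam`; the numeric values of $n$ and $k$ are illustrative):

```python
def simultaneous_pv(sigma2, n, k, lam):
    """Predictive variance sigma^2 * [(1 - 2*lam)*k + lam^2 * n] of p
    for the target tau = lam*y + (1 - lam)*E(y)."""
    return sigma2 * ((1 - 2 * lam) * k + lam ** 2 * n)

pv_mean = simultaneous_pv(1.0, n=50, k=4, lam=0.0)    # sigma^2 * k = 4
pv_actual = simultaneous_pv(1.0, n=50, k=4, lam=1.0)  # sigma^2 * (n - k) = 46
```

Intermediate values of `lam` interpolate between the two pure cases.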

Simultaneous prediction in outside sample prediction:

Consider the model described earlier under outside-sample prediction:

$$y = X\beta + \varepsilon, \qquad E(\varepsilon) = 0, \quad V(\varepsilon) = \sigma^2 I_n,$$

where $y$ is $n \times 1$, $X$ is $n \times k$, $\beta$ is $k \times 1$, and $\varepsilon$ is $n \times 1$, and

$$y_f = X_f \beta + \varepsilon_f; \qquad E(\varepsilon_f) = 0, \quad V(\varepsilon_f) = \sigma^2 I_{n_f},$$

where $y_f$ is $n_f \times 1$, $X_f$ is $n_f \times k$, and $\varepsilon_f$ is $n_f \times 1$.

The target function in this case is defined as

$$\tau_f = \lambda y_f + (1 - \lambda) E(y_f); \qquad 0 \le \lambda \le 1.$$

The predictor based on the OLSE of $\beta$ is

$$p_f = X_f b; \qquad b = (X'X)^{-1} X'y.$$

The predictive error of $p_f$ is

$$
\begin{aligned}
p_f - \tau_f &= X_f b - \lambda y_f - (1 - \lambda) E(y_f) \\
&= X_f b - \lambda (X_f \beta + \varepsilon_f) - (1 - \lambda) X_f \beta \\
&= X_f (b - \beta) - \lambda \varepsilon_f.
\end{aligned}
$$

So

$$E(p_f - \tau_f) = X_f E(b - \beta) - \lambda E(\varepsilon_f) = 0.$$

Thus $p_f$ provides unbiased prediction of $\tau_f$.

The variance of $p_f$ is

$$
\begin{aligned}
\operatorname{Var}(p_f) &= E\left[ (p_f - \tau_f)'(p_f - \tau_f) \right] \\
&= E\left[ \left( (b - \beta)'X_f' - \lambda \varepsilon_f' \right)\left( X_f (b - \beta) - \lambda \varepsilon_f \right) \right] \\
&= E\left[ \varepsilon' X (X'X)^{-1} X_f' X_f (X'X)^{-1} X'\varepsilon + \lambda^2 \varepsilon_f'\varepsilon_f - 2\lambda \varepsilon_f' X_f (X'X)^{-1} X'\varepsilon \right] \\
&= \sigma^2 \operatorname{tr}\left[ X (X'X)^{-1} X_f' X_f (X'X)^{-1} X' \right] + \sigma^2 \lambda^2 n_f,
\end{aligned}
$$

assuming that the elements of $\varepsilon$ and $\varepsilon_f$ are mutually independent (so the cross term vanishes).

The estimate of the predictive variance can be obtained by replacing $\sigma^2$ by $\hat{\sigma}^2 = MSE$ as

$$\widehat{\operatorname{Var}}(p_f) = \hat{\sigma}^2 \left[ \operatorname{tr}\left( X (X'X)^{-1} X_f' X_f (X'X)^{-1} X' \right) + \lambda^2 n_f \right].$$
