Chapter 5: Ordinary Least Squares Estimation Procedure – The Mechanics
Chapter 5 Outline
• Best Fitting Line
• Clint’s Assignment
• Simple Regression Model
o Parameters of the Model
o Error Term and Random Influences
o Best Fitting Line
o Needed: A Systematic Procedure to Determine the Best Fitting
Line
• Ordinary Least Squares (OLS) Estimation Procedure
o Sum of Squared Residuals Criterion
o Finding the Best Fitting Line
• Importance of the Error Term
o Absence of Random Influences: A What If Question
o Presence of Random Influences: Back to Reality
• Error Terms and Random Influences: A Closer Look
• Standard Ordinary Least Squares (OLS) Premises
• Clint’s Assignment: The Two Parts
Chapter 5 Prep Questions
1. The following table reports the (disposable) income earned by Americans and
their total savings between 1950 and 1975 in billions of dollars:
Income Savings Income Savings Income Savings
Year (Billion $) (Billion $) Year (Billion $) (Billion $) Year (Billion $) (Billion $)
1950 210.1 17.9 1959 350.5 32.9 1968 625.0 67.0
1951 231.0 22.5 1960 365.4 33.7 1969 674.0 68.8
1952 243.4 23.9 1961 381.8 39.7 1970 735.7 87.2
1953 258.6 25.5 1962 405.1 41.8 1971 801.8 99.9
1954 264.3 24.3 1963 425.1 42.4 1972 869.1 98.5
1955 283.3 24.5 1964 462.5 51.1 1973 978.3 125.9
1956 303.0 31.3 1965 498.1 54.3 1974 1071.6 138.2
1957 319.8 32.9 1966 537.5 56.6 1975 1187.4 153.0
1958 330.5 34.3 1967 575.3 67.5
a. Construct a scatter diagram for income and savings. Place income on
the horizontal axis and savings on the vertical axis.
b. Economic theory teaches that savings increases with income. Do these
data tend to support this theory?
c. Using a ruler, draw a straight line through these points to estimate the
relationship between savings and income. What equation describes this
line?
d. Using the equation, estimate by how much savings will increase if
income increases by $1 billion.
2. Three students are enrolled in Professor Jeff Lord’s 8:30 am class. Every week,
he gives a short quiz. After returning the quiz, Professor Lord asks his
students to report the number of minutes they studied; the students always
respond honestly. The minutes studied and the quiz scores for the first quiz
appear in the table below:1
Minutes Quiz
Student Studied (x) Score (y)
1 5 66
2 15 87
3 25 90
a. Construct a scatter diagram for minutes studied and quiz scores. Place minutes on
the horizontal axis and score on the vertical axis.
b. Ever since first grade, what have your parents and teachers been telling
you about the relationship between studying and grades? For the most
part, do these data tend to support this theory?
c. Using a ruler, draw a straight line through these points to estimate the
relationship between minutes studied and quiz scores. What equation
describes this line?
d. Using the equation, estimate by how much a student’s quiz score would
increase if that student studies one additional minute.
3. Recall that the presence of a random variable brings forth both bad news and
good news.
a. What is the bad news?
b. What is the good news?
4. What is the relative frequency interpretation of probability?
5. Calculus problem: Consider the following equation:
$$SSR = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2$$
Differentiate SSR with respect to bConst and set the derivative equal to 0:
$$\frac{dSSR}{db_{Const}} = 0$$
Solve for bConst, and show that
$$b_{Const} = \bar{y} - b_x \bar{x} \quad \text{where} \quad \bar{y} = \frac{y_1 + y_2 + y_3}{3} \quad \text{and} \quad \bar{x} = \frac{x_1 + x_2 + x_3}{3}$$
6. Again, consider the following equation:
$$SSR = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2$$
Let
$$b_{Const} = \bar{y} - b_x \bar{x}$$
Substitute the expression for bConst into the equation for SSR. Show
that after the substitution:
$$SSR = [(y_1 - \bar{y}) - b_x (x_1 - \bar{x})]^2 + [(y_2 - \bar{y}) - b_x (x_2 - \bar{x})]^2 + [(y_3 - \bar{y}) - b_x (x_3 - \bar{x})]^2$$

Best Fitting Line


Recall the income and savings data we introduced in the chapter preview
questions:
Income and Savings Data: Annual time series data of U.S. disposable
income and savings from 1950 to 1975.

Year  Income (Billion $)  Savings (Billion $)  Year  Income (Billion $)  Savings (Billion $)  Year  Income (Billion $)  Savings (Billion $)
1950 210.1 17.9 1959 350.5 32.9 1968 625.0 67.0
1951 231.0 22.5 1960 365.4 33.7 1969 674.0 68.8
1952 243.4 23.9 1961 381.8 39.7 1970 735.7 87.2
1953 258.6 25.5 1962 405.1 41.8 1971 801.8 99.9
1954 264.3 24.3 1963 425.1 42.4 1972 869.1 98.5
1955 283.3 24.5 1964 462.5 51.1 1973 978.3 125.9
1956 303.0 31.3 1965 498.1 54.3 1974 1071.6 138.2
1957 319.8 32.9 1966 537.5 56.6 1975 1187.4 153.0
1958 330.5 34.3 1967 575.3 67.5
Table 5.1: U. S. Annual Income and Savings Data – 1950 to 1975

Economic theory suggests that as American households earn more income, they
will save more:
Theory: Additional income increases savings.
Project: Assess the effect of income on savings.
Question: How can we use our data to “test” this theory? That is, how can we
assess the effect of income on savings?
Answer: We begin by drawing a scatter diagram of the income-savings data.


Each point represents income and savings of a single year. The lower left point
represents income and savings for 1950: (210.1, 17.9). The upper right point
represents income and savings for 1975: (1187.4, 153.0). Each other point
represents one of the other years.

Figure 5.1: Income and Savings Scatter Diagram

The data appear to support the theory: as income increases, savings generally
increase.
Question: How can we estimate the relationship between income and savings?
Answer: Draw a line through the points that best fits the data; then, use the
equation for the best fitting line to estimate the relationship.
Figure 5.2: Income and Savings Scatter Diagram with Best Fitting Line

By choosing two points on this line, we can solve for the equation of the
best fitting line. It looks like the points (200, 15) and (1200, 155) are more or less
on the line. Let us use these two points to estimate the slope:
$$\text{Slope} = \frac{\text{Rise}}{\text{Run}} = \frac{155 - 15}{1{,}200 - 200} = \frac{140}{1{,}000} = 0.14$$
A little algebra allows us to derive the equation for this line:
$$\frac{y - 15}{x - 200} = 0.14$$
$$y - 15 = 0.14x - 28$$
$$y = 0.14x - 13$$
This equation suggests that if Americans earn an additional $1 of income, savings
will rise by an estimated $.14; or equivalently, we estimate that a $1,000 increase
in income causes a $140 increase in savings. Since the slope is positive, the data
appear to support our theory; additional income appears to increase savings.
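The two-point calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `line_through` is our own, and the two points are the eyeballed ones read off Figure 5.2, not values computed from the data.

```python
def line_through(p1, p2):
    """Return (intercept, slope) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)    # rise over run
    intercept = y1 - slope * x1      # solve y1 = intercept + slope * x1
    return intercept, slope


# Eyeballed points from the text: (200, 15) and (1200, 155)
intercept, slope = line_through((200, 15), (1200, 155))
print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
```

Choosing a different pair of eyeballed points would change both numbers, which is exactly why a systematic procedure is needed.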
Clint’s Assignment
Next, consider a second example. Three students are enrolled in Professor Jeff
Lord’s 8:30 am class. Every week, he gives a short quiz. After returning the quiz,
Professor Lord asks his students to report the number of minutes they studied; the
students always respond honestly. The minutes studied and the quiz scores for the
first quiz appear in Table 5.2:
Student   Minutes Studied (x)   Quiz Score (y)
1         5                     66
2         15                    87
3         25                    90
Table 5.2: First Quiz Results

The theory suggests that a student’s score on the quiz depends on the
number of minutes he/she studied:
Theory: Additional studying increases quiz scores.
Also, it is generally believed that Professor Lord, a very generous soul, awards
students some points just for showing up for a quiz so early in the morning. Our
friend Clint has been assigned the problem of assessing the theory. Clint’s
assignment is to use the data from Professor Lord’s first quiz to assess the theory:
Project: Use data from Professor Lord’s first quiz to assess the effect of
studying on quiz scores.
Simple Regression Model
The following equation allows us to use the simple regression model to assess
the theory:
$$y_t = \beta_{Const} + \beta_x x_t + e_t$$
where yt = Quiz score received by student t
      xt = Number of minutes studied by student t
      et = Error term for student t: Random influences
      t = 1, 2, and 3, denoting the three students: Student 1, Student 2, and Student 3
yt, quiz score, is called the dependent variable and xt, minutes studied, the
explanatory variable. The value of the dependent variable depends on the value
of the explanatory variable. Or putting it differently, the value of the explanatory
variable explains the value of the dependent variable.
Parameters of the Model
βConst and βx, the constant and coefficient of the equation, are called the
parameters of the model. To interpret the parameters recall that
• it is generally believed that Professor Lord gives students some points just
for showing up for the quiz.
• the theory postulates that studying more will improve a student’s score.
Using these observations, we can interpret the parameters, βConst and βx:
• βConst represents the number of points Professor Lord gives students just
for showing up.
• βx represents the number of additional points earned for an additional
minute of studying.
Error Term and Random Influences
et is the error term. The error term reflects all the random influences on student
t’s quiz score, yt. For example, if Professor Lord were in an unusually bad humor
when he graded one student’s quiz, that student’s quiz score might be unusually
low; this would be reflected by a negative error term. On the other hand, if
Professor Lord were in an unusually good humor, the student’s score might be
unusually high and a positive error term would result. Professor Lord’s
disposition is not the only source of randomness. For example, one particular
student could have just “lucked out” by correctly anticipating the questions
Professor Lord asked. In this case, the student’s score would be unusually high and
his/her error term would be positive. All these random influences are accounted
for by the error term. The error term accounts for all the factors that cannot be
determined or anticipated beforehand.
What Is Simple about the Simple Regression Model?
The word simple is used to describe the model because the model includes only a
single explanatory variable. Obviously, many other factors influence a student’s
quiz score; the number of minutes studied is only one such factor. However, we
must start somewhere. We will begin with the simple regression model. Later we
shall move on and introduce multiple regression models to analyze more realistic
scenarios in which two or more explanatory variables are used to explain the
dependent variable.
Best Fitting Line
Question: How can Clint use the data to assess the effect of studying on quiz
scores?
Answer: He begins by drawing a scatter diagram using the data appearing in
Table 5.2.
[Scatter diagram: Score (y) on the vertical axis, Minutes (x) on the horizontal axis; one point for each of Students 1, 2, and 3.]
Figure 5.3: Minutes and Scores Scatter Diagram

The data appear to confirm the “theory.” As minutes studied increase, quiz scores
tend to increase.
Question: How can Clint estimate the relationship between minutes studied and
the quiz score more precisely?
Answer: Draw a line through the points that best fits the data; then, use the best
fitting line’s equation to estimate the relationship.
[Scatter diagram: Score (y) on the vertical axis, Minutes (x) on the horizontal axis; points for Students 1, 2, and 3 with Clint’s eyeballed line.]
Figure 5.4: Minutes and Scores Scatter Diagram with Clint’s Eyeballed Best
Fitting Line

Clint’s effort to “eyeball” the best fitting line appears in Figure 5.4. By
choosing two points on this line, Clint can solve for the equation of his best fitting
line. It looks like the points (0, 60) and (20, 90) are more or less on the line. He
can use these two points to estimate the slope:
$$\text{Slope} = \frac{\text{Rise}}{\text{Run}} = \frac{90 - 60}{20 - 0} = \frac{30}{20} = 1.5$$
Next, Clint can use a little algebra to derive the equation for the line:
$$\frac{y - 60}{x - 0} = 1.5$$
$$y - 60 = 1.5x$$
$$y = 60 + 1.5x$$
This equation suggests that an additional minute of studying increases a student’s
score by 1.5 points.
Needed: A Systematic Procedure to Determine the Best Fitting Line
Let us compare the two examples we introduced. In the income-savings case, the
points were clustered tightly around our best fitting line. Two different individuals
might not “eyeball” the identical “best fitting line,” but the difference would be
slight. In the minutes-scores case, however, the points are not clustered nearly so
tightly. Two individuals could “eyeball” the “best fitting line” very differently;
therefore, two individuals could derive substantially different equations for the
best fitting line and would then report very different estimates of the effect
that studying has on quiz scores. Consequently, we need a systematic procedure to
determine the best fitting line. Furthermore, once we determine the best fitting
line, we need to decide how confident we should be in the theory. We will now
address two issues:
• What systematic procedure should we use to determine the best fitting line
for the data?
• In view of the best fitting line, how much confidence should we have in
the theory’s validity?
Ordinary Least Squares (OLS) Estimation Procedure
The ordinary least squares (OLS) estimation procedure is the most widely
used estimation procedure to determine the equation for the line that “best fits”
the data. Its popularity results from two factors. The procedure
• is computationally straightforward; it provides us (and computer software)
with a relatively easy way to estimate the regression model’s parameters,
the constant and slope of the best fitting line.
• possesses several desirable properties when the error term meets certain
conditions.
This chapter focuses on the computational aspects of the ordinary least squares
(OLS) estimation procedure. In Chapter 6 we turn to the properties of the
estimation procedure.
We begin our study of the ordinary least squares (OLS) estimation
procedure by introducing a little notation. We must distinguish between the actual
values of the parameters and the estimates of the parameters. We have used the
Greek letter beta, β, to denote the actual values. Recall the original model:
yt = βConst + βxxt + et
βConst denotes the actual constant and βx the actual coefficient.
We shall use Roman italicized b’s to denote the estimates. bConst denotes
the estimate of the constant for the best fitting line and bx denotes the estimate of
the coefficient for the best fitting line. That is, the equation for the best fitting line
is:
y = bConst + bxx
The constant and slope of the best fitting line, bConst and bx, estimate the values of
βConst and βx.2
Sum of Squared Residuals Criterion
The ordinary least squares (OLS) estimation procedure chooses bConst and bx so as
to minimize the sum of the squared residuals. We shall now use our example to
illustrate precisely what this means. We begin by introducing an equation for each
student’s estimated score: Esty1, Esty2, and Esty3.
$$\mathit{Esty}_1 = b_{Const} + b_x x_1 \qquad \mathit{Esty}_2 = b_{Const} + b_x x_2 \qquad \mathit{Esty}_3 = b_{Const} + b_x x_3$$
Esty1, Esty2, and Esty3 estimate the score received by students 1, 2, and 3 based on
the estimated constant, bConst, the estimated coefficient, bx, and the number of
minutes each student studies, x1, x2, and x3.
The difference between a student’s actual score, yt, and his/her estimated
score, Estyt, is called the residual, Rest:
$$\mathit{Res}_1 = y_1 - \mathit{Esty}_1 \qquad \mathit{Res}_2 = y_2 - \mathit{Esty}_2 \qquad \mathit{Res}_3 = y_3 - \mathit{Esty}_3$$
Substituting for each student’s estimated score:
$$\mathit{Res}_1 = y_1 - b_{Const} - b_x x_1 \qquad \mathit{Res}_2 = y_2 - b_{Const} - b_x x_2 \qquad \mathit{Res}_3 = y_3 - b_{Const} - b_x x_3$$
Next, we square each residual and add them together to compute the sum of
squared residuals, SSR:
$$\begin{aligned} SSR &= \mathit{Res}_1^2 + \mathit{Res}_2^2 + \mathit{Res}_3^2 \\ &= (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2 \end{aligned}$$
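The sum of squared residuals can be computed for any candidate line, not just the best fitting one. The sketch below is an illustration with names of our own choosing (`sum_squared_residuals`, `minutes`, `scores`); it evaluates the criterion for Clint's eyeballed line, y = 60 + 1.5x, using the data in Table 5.2.

```python
def sum_squared_residuals(b_const, b_x, xs, ys):
    """Sum of squared residuals for the line y = b_const + b_x * x."""
    ssr = 0.0
    for x, y in zip(xs, ys):
        estimate = b_const + b_x * x   # Esty_t
        residual = y - estimate        # Res_t = y_t - Esty_t
        ssr += residual ** 2
    return ssr


minutes = [5, 15, 25]   # x_t from Table 5.2
scores = [66, 87, 90]   # y_t from Table 5.2

# Clint's eyeballed line from Figure 5.4: y = 60 + 1.5x
ssr_eyeballed = sum_squared_residuals(60, 1.5, minutes, scores)
print(ssr_eyeballed)
```

For the eyeballed line the residuals are −1.5, 4.5, and −7.5, so the sum of squared residuals is 78.75. The OLS procedure looks for the constant and coefficient that make this number as small as possible.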

We can generalize the sum of squared residuals by considering a sample
size of T:
$$SSR = \sum_{t=1}^{T} \mathit{Res}_t^2 = \sum_{t=1}^{T} (y_t - b_{Const} - b_x x_t)^2 \quad \text{where } T = \text{sample size}$$
bConst and bx are chosen to minimize the sum of squared residuals. The following
equations for bConst and bx accomplish this:
$$b_{Const} = \bar{y} - b_x \bar{x} \qquad b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$$

To justify the equations, consider a sample size of 3:
$$SSR = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2$$

Finding the Best Fitting Line


First, focus on bConst. Differentiate the sum of squared residuals, SSR, with respect
to bConst and set the derivative equal to 0:
$$\frac{dSSR}{db_{Const}} = -2(y_1 - b_{Const} - b_x x_1) - 2(y_2 - b_{Const} - b_x x_2) - 2(y_3 - b_{Const} - b_x x_3) = 0$$
Dividing by −2.
$$(y_1 - b_{Const} - b_x x_1) + (y_2 - b_{Const} - b_x x_2) + (y_3 - b_{Const} - b_x x_3) = 0$$
Collecting like terms.
$$(y_1 + y_2 + y_3) + (-b_{Const} - b_{Const} - b_{Const}) + (-b_x x_1 - b_x x_2 - b_x x_3) = 0$$
Simplifying.
$$(y_1 + y_2 + y_3) - 3b_{Const} - b_x (x_1 + x_2 + x_3) = 0$$
Dividing by 3.
$$\frac{y_1 + y_2 + y_3}{3} - b_{Const} - b_x \frac{x_1 + x_2 + x_3}{3} = 0$$
Since $\frac{y_1 + y_2 + y_3}{3}$ equals the mean of y, $\bar{y}$, and $\frac{x_1 + x_2 + x_3}{3}$ equals the mean of x, $\bar{x}$:
$$\bar{y} - b_{Const} - b_x \bar{x} = 0$$
[Scatter diagram: Score (y) versus Minutes (x) with the OLS best fitting line y = bConst + bx x passing through the point of means, (15, 81).]
Figure 5.5: Minutes and Scores Scatter Diagram with OLS Best Fitting Line

Our first equation, our equation for bConst, is now justified. To minimize the sum
of squared residuals, the following relationship must be met:
$$\bar{y} = b_{Const} + b_x \bar{x} \quad \text{or} \quad b_{Const} = \bar{y} - b_x \bar{x}$$
As illustrated in Figure 5.5, this equation simply says that the best fitting line
must pass through the point $(\bar{x}, \bar{y})$, the point representing the mean of x, minutes
studied, and the mean of y, the quiz scores. It is easy to calculate the means:
$$\bar{x} = \frac{x_1 + x_2 + x_3}{3} = \frac{5 + 15 + 25}{3} = \frac{45}{3} = 15$$
$$\bar{y} = \frac{y_1 + y_2 + y_3}{3} = \frac{66 + 87 + 90}{3} = \frac{243}{3} = 81$$
The best fitting line passes through the point (15, 81).
Next, we shall justify the equation for bx. Reconsider the equation for the
sum of squared residuals and substitute $\bar{y} - b_x \bar{x}$ for bConst:
$$SSR = (y_1 - b_{Const} - b_x x_1)^2 + (y_2 - b_{Const} - b_x x_2)^2 + (y_3 - b_{Const} - b_x x_3)^2$$
Substituting $\bar{y} - b_x \bar{x}$ for bConst.
$$SSR = [y_1 - (\bar{y} - b_x \bar{x}) - b_x x_1]^2 + [y_2 - (\bar{y} - b_x \bar{x}) - b_x x_2]^2 + [y_3 - (\bar{y} - b_x \bar{x}) - b_x x_3]^2$$
Simplifying each of the three terms.
$$SSR = [y_1 - \bar{y} + b_x \bar{x} - b_x x_1]^2 + [y_2 - \bar{y} + b_x \bar{x} - b_x x_2]^2 + [y_3 - \bar{y} + b_x \bar{x} - b_x x_3]^2$$
Switching the order of the “bx terms” within each of the three squared terms.
$$SSR = [y_1 - \bar{y} - b_x x_1 + b_x \bar{x}]^2 + [y_2 - \bar{y} - b_x x_2 + b_x \bar{x}]^2 + [y_3 - \bar{y} - b_x x_3 + b_x \bar{x}]^2$$
Factoring out −bx within each of the three squared terms.
$$SSR = [(y_1 - \bar{y}) - b_x (x_1 - \bar{x})]^2 + [(y_2 - \bar{y}) - b_x (x_2 - \bar{x})]^2 + [(y_3 - \bar{y}) - b_x (x_3 - \bar{x})]^2$$

To minimize the sum of squared residuals, differentiate SSR with respect to bx
and set the derivative equal to 0:
$$\frac{dSSR}{db_x} = -2[(y_1 - \bar{y}) - b_x (x_1 - \bar{x})](x_1 - \bar{x}) - 2[(y_2 - \bar{y}) - b_x (x_2 - \bar{x})](x_2 - \bar{x}) - 2[(y_3 - \bar{y}) - b_x (x_3 - \bar{x})](x_3 - \bar{x}) = 0$$
Dividing by −2.
$$[(y_1 - \bar{y}) - b_x (x_1 - \bar{x})](x_1 - \bar{x}) + [(y_2 - \bar{y}) - b_x (x_2 - \bar{x})](x_2 - \bar{x}) + [(y_3 - \bar{y}) - b_x (x_3 - \bar{x})](x_3 - \bar{x}) = 0$$
Simplifying the expression.
$$(y_1 - \bar{y})(x_1 - \bar{x}) - b_x (x_1 - \bar{x})^2 + (y_2 - \bar{y})(x_2 - \bar{x}) - b_x (x_2 - \bar{x})^2 + (y_3 - \bar{y})(x_3 - \bar{x}) - b_x (x_3 - \bar{x})^2 = 0$$
Moving all terms containing bx to the right side.
$$(y_1 - \bar{y})(x_1 - \bar{x}) + (y_2 - \bar{y})(x_2 - \bar{x}) + (y_3 - \bar{y})(x_3 - \bar{x}) = b_x (x_1 - \bar{x})^2 + b_x (x_2 - \bar{x})^2 + b_x (x_3 - \bar{x})^2$$
Factoring out bx from the right side terms.
$$(y_1 - \bar{y})(x_1 - \bar{x}) + (y_2 - \bar{y})(x_2 - \bar{x}) + (y_3 - \bar{y})(x_3 - \bar{x}) = b_x [(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2]$$
Solving for bx.
$$b_x = \frac{(y_1 - \bar{y})(x_1 - \bar{x}) + (y_2 - \bar{y})(x_2 - \bar{x}) + (y_3 - \bar{y})(x_3 - \bar{x})}{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}$$
Now, let us generalize this to a sample size of T:
$$b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}$$

Therefore, we have justified our second equation.
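As a numerical sanity check on the derivation, the sketch below evaluates both derivatives at the values given by the two equations, using the quiz data from Table 5.2. It is an illustration only; the variable names are ours. It relies on the fact that the bracketed term in the bx derivative, (yt − ȳ) − bx(xt − x̄), equals the residual yt − bConst − bx xt once bConst = ȳ − bx x̄.

```python
minutes = [5, 15, 25]   # x_t from Table 5.2
scores = [66, 87, 90]   # y_t from Table 5.2

T = len(minutes)
x_bar = sum(minutes) / T
y_bar = sum(scores) / T

# The two equations justified above
b_x = (sum((y - y_bar) * (x - x_bar) for x, y in zip(minutes, scores))
       / sum((x - x_bar) ** 2 for x in minutes))
b_const = y_bar - b_x * x_bar

residuals = [y - b_const - b_x * x for x, y in zip(minutes, scores)]

# dSSR/dbConst = -2 * (sum of the residuals)
d_ssr_d_const = -2 * sum(residuals)
# dSSR/dbx = -2 * (sum of residual times deviation of x from its mean)
d_ssr_d_bx = -2 * sum(r * (x - x_bar) for r, x in zip(residuals, minutes))

print(d_ssr_d_const, d_ssr_d_bx)
```

Both derivatives come out as zero (up to floating-point rounding), as the first-order conditions require.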


Let us return to Professor Lord’s first quiz to calculate the constant and
slope, bConst and bx, of the ordinary least squares (OLS) best fitting line for the
first quiz’s data. We have already computed the means for the quiz scores and
minutes studied:
$$\bar{x} = \frac{x_1 + x_2 + x_3}{3} = \frac{5 + 15 + 25}{3} = \frac{45}{3} = 15$$
$$\bar{y} = \frac{y_1 + y_2 + y_3}{3} = \frac{66 + 87 + 90}{3} = \frac{243}{3} = 81$$
Now, for each student calculate the deviation of y from its mean and the
deviations of x from its mean:
Student   $y_t$   $\bar{y}$   $y_t - \bar{y}$   $x_t$   $\bar{x}$   $x_t - \bar{x}$
1         66      81          −15               5       15          −10
2         87      81          6                 15      15          0
3         90      81          9                 25      15          10
Next, for each student calculate the products of the y and x deviations and squared
x deviations:
Student   $(y_t - \bar{y})(x_t - \bar{x})$   $(x_t - \bar{x})^2$
1         (−15)(−10) = 150                   (−10)² = 100
2         (6)(0) = 0                         (0)² = 0
3         (9)(10) = 90                       (10)² = 100
          Sum = 240                          Sum = 200
bx equals the sum of the products of the y and x deviations divided by the sum of
the squared x deviations:
$$b_x = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} = \frac{240}{200} = \frac{6}{5} = 1.2$$
[Scatter diagram: Score (y) versus Minutes (x) with the OLS best fitting line y = 63 + 1.2x passing through the point of means, (15, 81).]
Figure 5.6: Minutes and Scores Scatter Diagram with OLS Best Fitting Line

To calculate bConst recall that the best fitting line passes through the point
representing the average value of x and y, $(\bar{x}, \bar{y})$:
$$\bar{y} = b_{Const} + b_x \bar{x}$$
Solving for bConst,
$$b_{Const} = \bar{y} - b_x \bar{x}$$
We just learned that bx equals 6/5. The average of the x’s, $\bar{x}$, equals 15 and the
average of the y’s, $\bar{y}$, equals 81. Substituting,
$$b_{Const} = 81 - \frac{6}{5} \times 15 = 81 - 18 = 63$$
Using the ordinary least squares (OLS) estimation procedure, the best
fitting line for Professor Lord’s first quiz is:
$$y = 63 + \frac{6}{5}x = 63 + 1.2x$$
Consequently, the least squares estimates for βConst and βx are 63 and 1.2. These
estimates suggest that Professor Lord gives each student 63 points just for
showing up; each minute studied earns the student 1.2 additional points. Based on
the regression we estimate that:
• 1 additional minute studied increases the quiz score by 1.2 points.
• 2 additional minutes studied increase the quiz score by 2.4 points.
• etc.
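The arithmetic above can be collected into one short Python function. This is a minimal sketch of the two OLS equations, with a function name (`ols_estimates`) of our own; it is not the routine any particular statistical package uses.

```python
def ols_estimates(xs, ys):
    """Return (b_const, b_x) for the simple regression of ys on xs."""
    T = len(xs)
    x_bar = sum(xs) / T
    y_bar = sum(ys) / T
    # Sum of products of the y and x deviations
    sum_products = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    # Sum of squared x deviations
    sum_squares = sum((x - x_bar) ** 2 for x in xs)
    b_x = sum_products / sum_squares
    b_const = y_bar - b_x * x_bar   # line passes through the point of means
    return b_const, b_x


minutes = [5, 15, 25]   # x_t from Table 5.2
scores = [66, 87, 90]   # y_t from Table 5.2

b_const, b_x = ols_estimates(minutes, scores)
print(f"Esty = {b_const:.1f} + {b_x:.1f}x")

# Estimated change in the quiz score for 1 and 2 additional minutes
for extra_minutes in (1, 2):
    print(extra_minutes, round(b_x * extra_minutes, 1))
```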
Let us now quickly calculate the sum of squared residuals for the best fitting
line:
Student   xt   yt   Estyt = 63 + (6/5)xt = 63 + 1.2xt           Rest = yt − Estyt   Rest²
1         5    66   63 + (6/5)×5 = 63 + 6 = 69                  66 − 69 = −3        9
2         15   87   63 + (6/5)×15 = 63 + 6×3 = 63 + 18 = 81     87 − 81 = 6         36
3         25   90   63 + (6/5)×25 = 63 + 6×5 = 63 + 30 = 93     90 − 93 = −3        9
                                                                SSR =               54
The sum of squared residuals for the best fitting line is 54.
Econometrics Lab 5.1: Finding the Ordinary Least Squares (OLS) Estimates
We can use our Econometrics Lab to emphasize how the ordinary least squares
(OLS) estimation procedure determines the best fitting line by accessing the Best
Fit simulation.

[Link to MIT-Lab 5.1 goes here.]

By default the data from Professor Lord’s first quiz are specified: the values of x
and y for the first student are 5 and 66, for the second student 15 and 87, and for
the third student 25 and 90:
Figure 5.7: Best Fitting Line Simulation – Data

Now, click Go. A new screen appears, as shown in Figure 5.8, with two
slider bars, one for the constant and one for the coefficient.

Figure 5.8: Best Fitting Line Simulation – Parameter Estimates

By default the constant and coefficient values are 63 and 1.2, the ordinary least
squares (OLS) estimates. Also, the arithmetic used to calculate the sum of squared
residuals is displayed. When the constant equals 63 and the coefficient equals 1.2,
the sum of squared residuals equals 54.00; this is just the value that we calculated.
Next, experiment with different values for the constant and coefficient values by
moving the two sliders. Convince yourself that the equations we used to calculate
the estimate for the constant and coefficient indeed minimize the sum of squared
residuals.
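Readers without access to the lab can mimic the slider experiment with a small grid search. The sketch below is an illustration under assumptions of our own: the grid ranges and step sizes are hypothetical and are not taken from the Best Fit simulation.

```python
def ssr(b_const, b_x, xs, ys):
    """Sum of squared residuals for the line y = b_const + b_x * x."""
    return sum((y - b_const - b_x * x) ** 2 for x, y in zip(xs, ys))


minutes = [5, 15, 25]   # x_t from Professor Lord's first quiz
scores = [66, 87, 90]   # y_t from Professor Lord's first quiz

# Hypothetical "slider" settings:
# constants 50.0, 50.5, ..., 75.0 and coefficients 0.0, 0.1, ..., 3.0
constants = [50 + 0.5 * i for i in range(51)]
coefficients = [round(0.1 * j, 1) for j in range(31)]

# Try every combination and keep the one with the smallest SSR
best_ssr, best_const, best_coef = min(
    ((ssr(c, b, minutes, scores), c, b)
     for c in constants for b in coefficients),
    key=lambda item: item[0],
)
print(best_const, best_coef, round(best_ssr, 2))
```

On this grid the smallest sum of squared residuals occurs at a constant of 63 and a coefficient of 1.2, the OLS estimates. A grid search only finds the best point on the grid; the OLS equations give the exact minimizer.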
Software and the Ordinary Least Squares (OLS) Estimation Procedure
Fortunately, we do not have to trudge through the laborious arithmetic to compute
the ordinary least squares (OLS) estimates. Statistical software can do the work
for us.
Professor Lord’s First Quiz Data: Cross section data of minutes studied and
quiz scores in the first quiz for the 3 students enrolled in Professor Lord’s
class.

Student   Minutes Studied (x)   Quiz Score (y)
1         5                     66
2         15                    87
3         25                    90
Table 5.3: First Quiz Results

[Link to MIT-Quiz1.wf1 goes here.]

Getting Started in EViews___________________________________________


We can use the statistical package EViews to perform the calculations. After
opening the workfile in EViews:
• In the Workfile window: Click on the dependent variable, y, first; and
then, click on the explanatory variable, x, while depressing the <Ctrl>
key.
• In the Workfile window: Double click on a highlighted variable
• In the Workfile window: Click Open Equation
• In the Equation Specification window: Click OK
This window previews the regression that will be run; note that the
dependent variable, “y,” is the first variable listed followed by two
expressions representing the explanatory variable, “x,” and the
constant “c.”
Do not forget to close the workfile.
__________________________________________________________________
Ordinary Least Squares (OLS)


Dependent Variable: y
Explanatory Variable(s): Estimate SE t-Statistic Prob
x 1.200000 0.519615 2.309401 0.2601
Const 63.00000 8.874120 7.099296 0.0891
Number of Observations 3
Sum Squared Residuals 54.00000
Estimated Equation: Esty = 63 + 1.2x
Interpretation of Estimates:
bConst = 63: Students receive 63 points for showing up.
bx = 1.2: Students receive 1.2 additional points for each additional minute
studied.
Critical Result: The coefficient estimate equals 1.2. The positive sign of the
coefficient estimate suggests that additional studying increases
quiz scores. This evidence lends support to our theory.
Table 5.4: OLS First Quiz Regression Results

Table 5.4 reports the values of the coefficient and constant for the best fitting line.
Note that the sum of squared residuals for the best fitting line is also
included.
Importance of the Error Term
Recall the regression model:
$$y_t = \beta_{Const} + \beta_x x_t + e_t$$
where yt = Quiz score of student t
      xt = Minutes studied by student t
      et = Error term for student t
The parameters of the model, the constant, βConst, and the coefficient, βx, represent
the actual number of
• points Professor Lord gives students just for showing up, βConst;
• additional points earned for each minute of study, βx.
Obviously, the parameters of the model play an important role, but what about the
error term, et? To illustrate the importance of the error term, suppose that
somehow we know the values of βConst and βx. For the moment, suppose that
βConst, the actual constant, equals 50 and βx, the actual coefficient, equals 2. In
words, this means that Professor Lord gives each student 50 points for showing
up; furthermore, each minute of study provides the student with 2 additional
points. Consequently, the regression model is:
yt = 50 + 2xt + et
NB: In the real world, we never know the actual values of the constant and
coefficient. We are assuming that we do here, just to illustrate the importance of
the error term.
The error term reflects all the factors that cannot be anticipated or
determined before the quiz is given; that is, the error term represents all random
influences. In the absence of random influences, the error terms would equal 0.
Absence of Random Influences: A What If Question

[Diagram: Score (y) versus Minutes (x); the points for Students 1, 2, and 3 lie exactly on the actual line y = 50 + 2x.]
Figure 5.9: Best Fitting Line with No Error Term

Assume, only for the moment, that there are no random influences; consequently,
each error term would equal 0. While this assumption is unrealistic, it allows us to
appreciate the important role played by the error term. Focus on the first student
taking Professor Lord’s first quiz. The first student studies for 5 minutes. In the
absence of random influences (that is, if e1 equaled 0), what score would the first
student receive on the quiz? The answer is 60.
y1 = 50 + 2×5 + 0 = 50 + 10 = 60
Next, consider the second student. The second student studies for 15 minutes. In
the absence of random influences, the second student would receive an 80 on the
quiz:
y2 = 50 + 2×15 + 0 = 50 + 30 = 80
The third student would receive a 100:
y3 = 50 + 2×25 + 0 = 50 + 50 = 100
We summarize this in Table 5.5:

Absence of Random Influences
Student   Minutes (x)   Score (y)
1         5             60
2         15            80
3         25            100
Table 5.5: Quiz Results with No Random Influences (No Error Term)

In the absence of random influences, the intercept and slope of the best fitting line
would equal the actual constant and the actual coefficient, βConst and βx:
y = βConst + βxx = 50 + 2x
Summary: In the absence of random influences, the error term of each
student equals 0 and the best fitting line fits the data perfectly. The slope of this
line equals 2, the actual coefficient, and the vertical intercept of the line equals 50,
the actual constant. Without random influences, it is easy to determine the actual
constant and coefficient. We shall now use a simulation to emphasize this point.
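The same point can be checked directly with a short Python sketch. It builds the scores in Table 5.5 from the assumed actual values (βConst = 50, βx = 2) with every error term set to 0 and then applies the OLS equations. The function and variable names are ours, chosen for illustration.

```python
def ols_estimates(xs, ys):
    """Return (b_const, b_x) for the simple regression of ys on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b_x = (sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
           / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b_x * x_bar, b_x


ACTUAL_CONST = 50   # beta_Const, assumed known only for this what-if exercise
ACTUAL_COEF = 2     # beta_x, assumed known only for this what-if exercise

minutes = [5, 15, 25]
# No random influences: every error term e_t equals 0
scores_no_error = [ACTUAL_CONST + ACTUAL_COEF * x + 0 for x in minutes]

b_const, b_x = ols_estimates(minutes, scores_no_error)
ssr = sum((y - b_const - b_x * x) ** 2
          for x, y in zip(minutes, scores_no_error))

print(scores_no_error)   # the scores in Table 5.5
print(b_const, b_x, ssr)
```

With no error terms the estimates coincide with the actual constant and coefficient, and the sum of squared residuals is 0 because the line fits the data perfectly.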
Econometrics Lab 5.2: Coefficient Estimates When Random Influences Are
Absent
Figure 5.10: Coefficient Estimate Simulation

The simulation allows us to do something we cannot do in the real world. It
allows us to specify the actual values of the constant and coefficient in the model;
that is, we can select βConst and βx. We can specify the number of
• points Professor Lord gives students just for showing up, βConst; by
default, βConst is set at 50.
• additional points earned for an additional minute of study, βx; by default,
βx is set at 2.
Consequently, the regression model is:
yt = 50 + 2xt + et
Each repetition of the simulation represents a quiz from a single week. In each
repetition, the simulation:
• Calculates the score for each student based on the actual constant (βConst),
the actual coefficient (βx), and the number of minutes the student studied;
then, to be realistic, the simulation can add a random influence in the form
of the error term, et. An error term is included whenever the Err Term
checkbox is checked.
• Applies the ordinary least squares (OLS) estimation procedure to estimate
the coefficient.
When the “Pause” box is checked the simulation stops after each
repetition; when it is cleared, quizzes are simulated repeatedly until the “Stop”
button is clicked.

[Link to MIT-Lab 5.2 goes here.]

We can eliminate random influences by clearing the Err Term box. After doing
so, click Start and then continue a few times. We discover that in the absence of
random influences the estimate of the coefficient value always equals the actual
value, 2:
Coefficient Estimate:
Repetition No Error Term
1 2.0
2 2.0
3 2.0
4 2.0
Table 5.6: Simulation Results with No Random Influences (No Error Term)

This is precisely what we concluded earlier from the scatter diagram. In the
absence of random influences, the best fitting line fits the data perfectly. The best
fitting line’s slope equals the actual value of the coefficient.
Presence of Random Influences: Back to Reality
The real world is not that simple, however; random influences play an important
role. In the real world, random influences are inevitably present:

Inclusion of Random Influences
Student   Minutes (x)   Score (y)
1         5             66
2         15            87
3         25            90
Table 5.7: Quiz Results with Random Influences (with Error Term)

In Figure 5.11, the actual scores on the first quiz have been added to the scatter
diagram. As a consequence of the random influences, Students 1 and 2
overperform while Student 3 underperforms.
[Scatter diagram: Score (y) versus Minutes (x) showing the actual line y = 50 + 2x and the actual first quiz scores of Students 1, 2, and 3.]
Figure 5.11: Scatter Diagram with Error Term

As illustrated in Figure 5.12, when random influences are present, we
cannot expect the intercept and slope of the best fitting line to equal the actual
constant and the actual coefficient. The intercept and slope of the best fitting line,
bConst and bx, are affected by the random influences.
[Scatter diagram: Score (y) versus Minutes (x) showing the actual line y = 50 + 2x, the OLS estimate y = 63 + 1.2x, and the points for Students 1, 2, and 3.]
Figure 5.12: OLS Best Fitting Line with Error Term

Consequently, the intercept and slope of the best fitting line, bConst and bx, are
themselves random variables. Even if we knew the actual constant and coefficient,
that is, the actual values of βConst and βx, we could not predict the intercept and
slope of the best fitting line, bConst and bx, with certainty before the quiz was
given.
Econometrics Lab 5.3: Coefficient Estimates When Random Influences Are Present
We shall now use the Coefficient Estimate simulation to emphasize this point. We
shall show that in the presence of random influences, the coefficient of the best
fitting line is a random variable.

[Link to MIT-Lab 5.3 goes here.]

Note that the Err Term checkbox is now checked to include the error term. Be
certain that the Pause checkbox is checked and then click Start. When the
simulation computes the best fitting line, the estimated value of the coefficient
typically is not 2 despite the fact that the actual value of the coefficient is 2. Click
Continue a few more times to simulate each successive week’s quiz.
What do you observe? We simply cannot expect the coefficient estimate to equal
the actual value of the coefficient. In fact, when random influences are present,
the coefficient estimate almost never equals the actual value of the coefficient.
Sometimes the estimate is less than the actual value, 2, and sometimes it is greater
than the actual value. When random influences are present, the coefficient
estimates are random variables:

Repetition    Coefficient Estimate: With Error Term
1             1.8
2             1.6
3             3.2
4             1.9
Table 5.8: Simulation Results with Random Influences (with Error Term)

While your coefficient estimates will no doubt differ from the ones in
Table 5.8, one thing is clear. Even if we know the actual value of the coefficient,
as we do in the simulation, we cannot predict with certainty the value of the
estimate from one repetition. Our last two simulations illustrate a critical point:
The coefficient estimate is a random variable as a consequence of the random
influences introduced by each student’s error term.
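Readers without access to the lab can mimic the two simulations with a few lines of code. The sketch below is not the lab's implementation. It assumes study times of 5, 15, and 25 minutes, the actual equation y = 50 + 2x, and normally distributed error terms with a variance of 500; the normal distribution, the variance value used here, and the function names are our assumptions.

```python
import random

def ols_slope(xs, ys):
    """Return the slope of the OLS best fitting line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

def simulate_estimates(repetitions, error_variance=500.0, seed=0):
    """Simulate repeated quizzes and return one coefficient estimate per quiz."""
    rng = random.Random(seed)      # seeded so that a run can be repeated
    minutes = [5, 15, 25]
    sd = error_variance ** 0.5
    estimates = []
    for _ in range(repetitions):
        # Each score = actual line + that student's error term (assumed normal)
        scores = [50 + 2 * x + rng.gauss(0, sd) for x in minutes]
        estimates.append(ols_slope(minutes, scores))
    return estimates

with_error = simulate_estimates(4)
no_error = simulate_estimates(4, error_variance=0.0)
print([round(b, 1) for b in with_error])
print([round(b, 1) for b in no_error])
```

With the error term switched off every estimate equals 2; with it switched on the estimates vary from one repetition to the next, just as in Tables 5.6 and 5.8.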
Error Terms and Random Influences: A Closer Look
We shall now use a simulation to gain insights into random influences and error
terms. As we know, random influences are those factors that cannot be anticipated
or determined beforehand. Sometimes random influences lead to a higher quiz
score and other times they lead to a lower score. The error terms embody these
random influences:
• Sometimes the error term is positive indicating that the score is higher
than “usual”;
• Other times the error term is negative indicating that the score is lower
than “usual.”
If the random influences are indeed random, they should be a “wash” after many,
many quizzes. That is, random influences should not systematically lead to higher
or lower quiz scores. In other words, if the error terms truly reflect random
influences, they should average out to 0 “in the long run.”
Econometrics Lab 5.4: Error Terms When Random Influences Are Present
Let us now check to be certain that the simulations are capturing random
influences properly by accessing the Random Influence – Error Terms simulation.

[Link to MIT-Lab 5.4 goes here.]



Figure 5.13: Error Term Simulation

Initially, the Pause checkbox is checked and the error term variance is 500.
Now, click Start and observe that the simulation reports the numerical value of
the error term for each of the three students. Record these three values. Also, note that the
simulation constructs a histogram for each student’s error term and also reports
the mean and variance. Click Continue again to observe the numerical values of
the error terms for the second quiz. Confirm that the simulation is calculating the
mean and variance of each student’s error terms correctly. Click Continue a few
more times. Note that the error terms are indeed random variables. Before the
quiz is given, we cannot predict the numerical value of a student’s error term.
Each student’s histogram shows that sometimes the error term for that student is
positive and sometimes it is negative. Next, clear the Pause checkbox and click
Continue. After many, many repetitions, click Stop.

            Student 1    Student 2    Student 3
Mean        0.0          0.0          0.0
Variance    500          500          500
Figure 5.14: Error Term Simulation Results

After many, many repetitions, the mean (average) of each student’s error
terms equals about 0. Consequently, each student’s error term truly represents a
random influence; it does not systematically influence the student’s quiz score. It
is also instructive to focus on each student’s histogram. For each student, the
numerical value of the error term is positive about half the time and negative
about half the time after many, many repetitions.
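The long-run behavior of one student's error term can also be sketched in code. The example below is an illustration only: it assumes the error term is normally distributed with mean 0 and variance 500 (the normal distribution is our assumption; the text specifies only the variance), and it uses a large but arbitrary number of repetitions.

```python
import random
import statistics

rng = random.Random(1)   # seeded so that a run can be repeated
sd = 500 ** 0.5          # standard deviation implied by a variance of 500
repetitions = 100_000

# One student's error term over many, many quizzes
error_terms = [rng.gauss(0, sd) for _ in range(repetitions)]

mean = statistics.fmean(error_terms)
variance = statistics.pvariance(error_terms)
share_positive = sum(e > 0 for e in error_terms) / repetitions

print(round(mean, 2), round(variance, 1), round(share_positive, 3))
```

The mean should be close to 0, the variance close to 500, and the share of positive error terms close to one half.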
Summary: The error terms represent random influences; consequently, the error
terms have no systematic effect on quiz scores, the dependent variable:
• Sometimes the error term is positive indicating that the score is higher
than “usual”;
• Other times the error term is negative indicating that the score is lower
than “usual.”
What can we say about the students’ error terms beforehand, before the next quiz?
We can describe their probability distributions. The chance that a student’s error
term will be positive is the same as the chance that it will be negative. For any one
quiz, the mean of each student’s error term’s probability distribution equals 0:
Mean[e1] = 0                   Mean[e2] = 0                   Mean[e3] = 0
     ↓                              ↓                              ↓
e1 has no systematic           e2 has no systematic           e3 has no systematic
effect on Student 1’s score    effect on Student 2’s score    effect on Student 3’s score
     ↓                              ↓                              ↓
e1 represents                  e2 represents                  e3 represents
a random influence             a random influence             a random influence
Standard Ordinary Least Squares (OLS) Premises
Initially, we shall make some strong assumptions regarding the explanatory
variables and the error terms:
• Error Term Equal Variance Premise: The variance of the error term’s
probability distribution for each observation is the same; all the variances
equal Var[e]:
Var[e1] = Var[e2] = … = Var[eT] = Var[e]
• Error Term/Error Term Independence Premise: The error terms are
independent: Cov[ei, ej] = 0 for i ≠ j.
Knowing the value of the error term from one observation does not
help us predict the value of the error term for any other observation.
• Explanatory Variable/Error Term Independence Premise: The
explanatory variables, the xt’s, and the error terms, the et’s, are not
correlated.
Knowing the value of an observation’s explanatory variable does not
help us predict the value of that observation’s error term.
We call these premises the standard ordinary least squares (OLS)
premises. They make the analysis as straightforward as possible. In Part Four of
this textbook, we relax these premises to study more general cases. Our strategy is
to start with the most straightforward case and then move on to more complex
ones. While we only briefly cite the premises here, we shall return to them in the
fourth part of the textbook to study their implications.
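To make the first two premises concrete, the sketch below generates error terms that satisfy them by construction and then checks the sample variances and one sample covariance. It is an illustration under our own assumptions: normally distributed error terms, a common variance of 500, independent draws for each student, and a helper named sample_cov.

```python
import random

def sample_cov(a, b):
    """Sample covariance of two equally long lists (divides by n)."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    return sum((u - a_mean) * (v - b_mean) for u, v in zip(a, b)) / n

rng = random.Random(2)   # seeded so that a run can be repeated
sd = 500 ** 0.5          # same variance for every student's error term
n = 50_000

# Independent draws for each of the three students
e1 = [rng.gauss(0, sd) for _ in range(n)]
e2 = [rng.gauss(0, sd) for _ in range(n)]
e3 = [rng.gauss(0, sd) for _ in range(n)]

variances = [sample_cov(e, e) for e in (e1, e2, e3)]  # Cov[e, e] = Var[e]
cov_12 = sample_cov(e1, e2)

print([round(v, 1) for v in variances], round(cov_12, 2))
```

The three sample variances should all be close to 500 and the sample covariance between Student 1's and Student 2's error terms close to 0. Data that violated either premise would show unequal variances or a covariance well away from 0.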
Clint’s Assignment: The Two Parts
Recall Clint’s assignment. He must assess the effect of studying on quiz scores by
using Professor Lord’s first quiz as evidence. Clint can apply the ordinary least
squares (OLS) estimation procedure; the OLS estimate for the value of the
coefficient is 1.2. But we now know that the estimate is a random variable. We
cannot expect the coefficient estimate from the one quiz, 1.2, to equal the actual
value of the coefficient, the actual impact that studying has on a student’s quiz
score. We shall proceed by dividing Clint’s assignment into two related parts:
• Reliability of the Coefficient Estimate: How reliable is the coefficient
estimate calculated from the results of the first quiz? That is, how
confident should Clint be that the coefficient estimate, 1.2, will be close to
the actual value?
• Assessment of the Theory: In view of the fact that Clint’s estimate of the
coefficient equals 1.2, how confident should Clint be that the theory is
correct, that additional studying increases quiz scores?
In the next few chapters, we shall address these issues.

1 NB: These data are not “real.” Instead, they were constructed to illustrate
important pedagogical points.
2 There is another convention that is often used to denote the parameter estimates,
the “beta-hat” convention. The estimate of the constant is denoted by β̂Const and
the estimate of the coefficient by β̂x. While the Roman italicized b’s estimation
convention will be used throughout this textbook, be aware that you will come
across textbooks and articles that use the beta-hat convention. The b’s and β̂’s
denote the same thing; they are interchangeable.
