
Chapter 1 Simple Linear Regression

(Part 2)
1 Software R and regression analysis
Downloadable from http://www.r-project.org/; some useful commands:
setwd("path") ... to change the working directory for data loading and saving
read.table ... for reading/loading data
data$variable ... a variable in the data
plot(X, Y) ... plotting Y against X (starting a new plot)
lines(X, Y) ... to add lines to an existing plot
object = lm(y ~ x) ... to call lm to estimate a model and store the calculation results in object
Exporting the plotted figure (save as .pdf, .ps or other files)
Example 1.1 Suppose we have 10 observations for (X, Y): (1.2, 1.91), (2.3, 4.50), (3.5, 2.13), (4.9, 5.77), (5.9, 7.40), (7.1, 6.56), (8.3, 8.79), (9.2, 6.56), (10.5, 11.14), (11.5, 9.88). They are stored in file (data010201.dat). We hope to fit a linear regression model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \dots, n$$
R code (the words after # are comments only):
mydata = read.table("data010201.dat") # read the data from the file
X = mydata$V1 # select X
Y = mydata$V2 # select Y
plot(X, Y) # plot the observations (data)
myreg = lm(Y ~ X) # do the linear regression
summary(myreg) # output the estimation results
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.3931     0.9726   1.432 0.189932
X             0.7874     0.1343   5.862 0.000378 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.406 on 8 degrees of freedom
Multiple R-squared: 0.8111, Adjusted R-squared: 0.7875
F-statistic: 34.36 on 1 and 8 DF, p-value: 0.0003778
lines(X, myreg$fitted) # plot the fitted
title("Scatter of (X,Y) and fitted linear regression model") # add title
# Please get to know how to make a figure file for later use
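One common way to save the plot to a file in R is shown below (a minimal sketch; the file name myfigure.pdf is just an example, not from the original notes):
pdf("myfigure.pdf") # open a pdf graphics device
plot(X, Y) # redraw the plot into the file
lines(X, myreg$fitted)
title("Scatter of (X,Y) and fitted linear regression model")
dev.off() # close the device and write the file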
Figure 1: Scatter plot of (X, Y) with the fitted regression line, produced by the R code above.
The fitted regression line/model is
$$\hat{Y} = 1.3931 + 0.7874X$$
For any new subject/individual with $X = X^*$, the prediction of E(Y) is
$$\hat{Y} = b_0 + b_1 X^*.$$
For the above data,
If $X^* = -3$, then we predict $\hat{Y} = -0.9690$;
If $X^* = 3$, then we predict $\hat{Y} = 3.7553$;
If $X^* = 0.5$, then we predict $\hat{Y} = 1.7868$.
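These predictions can be reproduced in R with predict() (a sketch, continuing with the myreg object fitted above; the X values -3, 3 and 0.5 match the cases listed):
newX = data.frame(X = c(-3, 3, 0.5)) # new X values
predict(myreg, newdata = newX) # predicted E(Y) at the new X values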
2 Properties of Least squares estimators
Statistical properties in theory
LSE is unbiased: $E\{b_1\} = \beta_1$, $E\{b_0\} = \beta_0$.
Proof: By the model, we have
$$\bar{Y} = \beta_0 + \beta_1\bar{X} + \bar{\varepsilon}$$
and
$$b_1 = \frac{\sum_{i=1}^n (X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^n (X_i-\bar{X})^2}
= \frac{\sum_{i=1}^n (X_i-\bar{X})(\beta_0+\beta_1 X_i+\varepsilon_i-\beta_0-\beta_1\bar{X}-\bar{\varepsilon})}{\sum_{i=1}^n (X_i-\bar{X})^2}
= \beta_1 + \frac{\sum_{i=1}^n (X_i-\bar{X})(\varepsilon_i-\bar{\varepsilon})}{\sum_{i=1}^n (X_i-\bar{X})^2}
= \beta_1 + \frac{\sum_{i=1}^n (X_i-\bar{X})\varepsilon_i}{\sum_{i=1}^n (X_i-\bar{X})^2}$$
Recall that $E(\varepsilon_i) = 0$. It follows that $E(b_1) = \beta_1$.
For $b_0$,
$$E(b_0) = E(\bar{Y} - b_1\bar{X}) = \beta_0 + \beta_1\bar{X} - E(b_1)\bar{X} = \beta_0 + \beta_1\bar{X} - \beta_1\bar{X} = \beta_0$$
Variance of the estimators
$$Var(b_1) = \frac{\sigma^2}{\sum_{i=1}^n (X_i-\bar{X})^2}, \qquad Var(b_0) = \left[\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^n (X_i-\bar{X})^2}\right]\sigma^2$$
[Proof:
$$Var(b_1) = Var\left(\frac{\sum_{i=1}^n (X_i-\bar{X})\varepsilon_i}{\sum_{i=1}^n (X_i-\bar{X})^2}\right)
= \left\{\sum_{i=1}^n (X_i-\bar{X})^2\right\}^{-2} Var\left\{\sum_{i=1}^n (X_i-\bar{X})\varepsilon_i\right\}
= \left\{\sum_{i=1}^n (X_i-\bar{X})^2\right\}^{-2} \sum_{i=1}^n (X_i-\bar{X})^2\sigma^2
= \frac{\sigma^2}{\sum_{i=1}^n (X_i-\bar{X})^2}.$$
We shall prove the second equation later.]
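A small simulation sketch in R illustrating both properties; the design points and the true values beta0 = 2, beta1 = 0.5, sigma = 1 below are hypothetical choices, not taken from the example data:
set.seed(1)
x = seq(1, 10, length = 20) # fixed design points
beta0 = 2; beta1 = 0.5; sigma = 1 # hypothetical true parameter values
b1 = replicate(5000, {
  y = beta0 + beta1 * x + rnorm(length(x), 0, sigma)
  coef(lm(y ~ x))[2] # slope estimate from one simulated sample
})
mean(b1) # close to beta1 = 0.5 (unbiasedness)
var(b1) # close to the theoretical variance below
sigma^2 / sum((x - mean(x))^2)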
Estimated (fitted) regression function: $\hat{Y}_i = b_0 + b_1 X_i$. We also call $\hat{Y}_i = b_0 + b_1 X_i$ the fitted value.
$$E\{\hat{Y}_i\} = E(Y_i)$$
[Proof: $E(\hat{Y}_i) = E(b_0 + b_1 X_i) = E(b_0) + E(b_1)X_i = \beta_0 + \beta_1 X_i = E(Y_i)$]
Numerical properties of the fitted regression line
Recall the normal equations
$$-2\sum_{i=1}^n (Y_i - b_0 - b_1 X_i) = 0, \qquad -2\sum_{i=1}^n X_i(Y_i - b_0 - b_1 X_i) = 0$$
and $e_i = Y_i - \hat{Y}_i = Y_i - b_0 - b_1 X_i$. It follows that
$$\sum_{i=1}^n e_i = 0, \qquad \sum_{i=1}^n X_i e_i = 0$$
The following properties follow (see the numerical check after this list):
$\sum_{i=1}^n e_i = 0$
$\sum_{i=1}^n Y_i = \sum_{i=1}^n \hat{Y}_i$
$\sum_{i=1}^n e_i^2 = \min_{b_0, b_1}\{Q\}$
$\sum_{i=1}^n X_i e_i = 0$
$\sum_{i=1}^n \hat{Y}_i e_i = 0$
The regression line always passes through $(\bar{X}, \bar{Y})$.
$Y_i - \bar{Y} = b_1(X_i - \bar{X}) + e_i$, where $e_i = Y_i - \hat{Y}_i$.
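These numerical properties can be verified directly in R for the fitted model above (a sketch; all sums are zero up to floating-point rounding):
e = resid(myreg) # fitted residuals e_i
sum(e) # sum of residuals, essentially 0
sum(X * e) # sum of X_i * e_i, essentially 0
sum(fitted(myreg) * e) # sum of fitted values times residuals, essentially 0
sum(Y) - sum(fitted(myreg)) # essentially 0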
The coefficient and the correlation coefficient
$$b_1 = r_{X,Y}\,\frac{s_Y}{s_X}$$
where
$$s_X = \sqrt{\frac{\sum_{i=1}^n (X_i-\bar{X})^2}{n-1}}, \qquad s_Y = \sqrt{\frac{\sum_{i=1}^n (Y_i-\bar{Y})^2}{n-1}}$$
$$r_{X,Y} = \frac{\sum_{i=1}^n (X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^n (X_i-\bar{X})^2 \sum_{i=1}^n (Y_i-\bar{Y})^2}}$$
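This relation can be checked numerically in R for the example data (a sketch using the objects defined above):
cor(X, Y) * sd(Y) / sd(X) # r_{X,Y} * s_Y / s_X
coef(myreg)[2] # the slope b_1, about 0.7874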
2.1 Estimation of the Error Term Variance $\sigma^2$
Sum of squares of residuals or error sum of squares (SSE):
$$SSE = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^n e_i^2$$
Estimate $\sigma^2$ by
$$s^2 = \frac{\sum_{i=1}^n (Y_i - \hat{Y}_i)^2}{n-2} = \frac{\sum_{i=1}^n e_i^2}{n-2},$$
called the mean squared error (MSE), i.e.
$$MSE = \frac{\sum_{i=1}^n e_i^2}{n-2},$$
also denoted by $\hat{\sigma}^2$.
Why is it divided by $n-2$? Because there are TWO constraints on $e_i$, $i = 1, \dots, n$, i.e. the normal equations.
$s^2$ is an unbiased estimator of $\sigma^2$, i.e. $E(s^2) = \sigma^2$.
[Proof: For any $\varepsilon_1, \dots, \varepsilon_n$ IID with mean $\mu$ and variance $\sigma^2$, we have
$$E\sum_{i=1}^n (\varepsilon_i - \bar{\varepsilon})^2 = E\sum_{i=1}^n [(\varepsilon_i - \mu) - (\bar{\varepsilon} - \mu)]^2 = E\left\{\sum_{i=1}^n (\varepsilon_i - \mu)^2 - n(\bar{\varepsilon} - \mu)^2\right\} = \sum_{i=1}^n Var(\varepsilon_i) - n\,Var(\bar{\varepsilon}) = n\sigma^2 - \sigma^2 = (n-1)\sigma^2$$
This is why we estimate $\sigma^2$ by
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^n (\varepsilon_i - \bar{\varepsilon})^2}{n-1}.$$
Consider
$$Var(\varepsilon_1 - \bar{\varepsilon}) = Var\Big\{\Big(1-\frac{1}{n}\Big)\varepsilon_1 \underbrace{- \frac{1}{n}\varepsilon_2 - \cdots - \frac{1}{n}\varepsilon_n}_{n-1 \text{ terms}}\Big\}
= \Big(1-\frac{1}{n}\Big)^2\sigma^2 + \frac{1}{n^2}\sigma^2 + \cdots + \frac{1}{n^2}\sigma^2
= \Big(1-\frac{2}{n}+\frac{1}{n^2}\Big)\sigma^2 + \frac{n-1}{n^2}\sigma^2
= \Big(1-\frac{1}{n}\Big)\sigma^2.$$
Similarly, for any $i$,
$$Var(\varepsilon_i - \bar{\varepsilon}) = \Big(1-\frac{1}{n}\Big)\sigma^2.$$
Now turn to the estimator $s^2$. Consider
$$E\Big\{\sum_{i=1}^n (Y_i - \hat{Y}_i)^2\Big\} = \sum_{i=1}^n E(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^n \Big[Var(Y_i - \hat{Y}_i) + \{E(Y_i - \hat{Y}_i)\}^2\Big]$$
The second term vanishes because $E(Y_i - \hat{Y}_i) = 0$, so, writing $Y_i - \hat{Y}_i = (Y_i - \bar{Y}) - b_1(X_i - \bar{X})$,
$$= \sum_{i=1}^n Var\{(Y_i - \bar{Y}) - b_1(X_i - \bar{X})\}$$
$$= \sum_{i=1}^n \{Var(Y_i - \bar{Y}) - 2Cov(Y_i - \bar{Y},\ b_1(X_i - \bar{X})) + Var(b_1)(X_i - \bar{X})^2\}$$
$$= \sum_{i=1}^n \{Var(Y_i - \bar{Y}) - 2Cov((Y_i - \bar{Y})(X_i - \bar{X}),\ b_1) + Var(b_1)(X_i - \bar{X})^2\}$$
$$= \sum_{i=1}^n \{Var(\varepsilon_i - \bar{\varepsilon}) - 2Cov((Y_i - \bar{Y})(X_i - \bar{X}),\ b_1) + Var(b_1)(X_i - \bar{X})^2\}$$
$$= (n-1)\sigma^2 - 2Cov\Big(\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X}),\ b_1\Big) + Var(b_1)\sum_{i=1}^n (X_i - \bar{X})^2$$
$$= (n-1)\sigma^2 - 2Cov\Big(b_1\sum_{i=1}^n (X_i - \bar{X})^2,\ b_1\Big) + Var(b_1)\sum_{i=1}^n (X_i - \bar{X})^2$$
$$= (n-1)\sigma^2 - Var(b_1)\sum_{i=1}^n (X_i - \bar{X})^2 = (n-2)\sigma^2.$$
Thus
$$E(s^2) = \sigma^2$$
Example For the above example, the MSE (the estimator of $\sigma^2 = Var(\varepsilon_i)$) is
$$MSE = \sum_{i=1}^n e_i^2/(n-2) = 1.975997,$$
or
$$\hat{\sigma} = \sqrt{MSE} = 1.405702,$$
which is also called the Residual standard error.
How to find this value in the output of R?
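One way to locate and verify this value in R (a sketch, continuing with the myreg object fitted above):
e = resid(myreg) # fitted residuals
MSE = sum(e^2) / (length(e) - 2) # 1.975997
sqrt(MSE) # 1.405702
summary(myreg)$sigma # the Residual standard error reported by summary()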
3 Regression Without Predictors
At first glance, it doesn't seem that studying regression without predictors would be very useful. Certainly, we are not suggesting that regression without predictors is a major data analysis tool. We do think that it is worthwhile to look at regression models without predictors to see what they can tell us about the nature of the constant. Understanding the regression constant in these simpler models will help us to understand both the constant and the other regression coefficients in later, more complex models.
Model
$$Y_i = \beta_0 + \varepsilon_i, \quad i = 1, 2, \dots, n,$$
where, as before, we assume $\varepsilon_i$, $i = 1, 2, \dots, n$ are IID with $E(\varepsilon_i) = 0$ and $Var(\varepsilon_i) = \sigma^2$.
(We shall call this model Regression Without Predictors.)
The least squares estimator $b_0$ is the minimizer of
$$Q = \sum_{i=1}^n \{Y_i - b_0\}^2$$
Note that
$$\frac{dQ}{db_0} = -2\sum_{i=1}^n \{Y_i - b_0\}$$
Setting it equal to 0, we have the normal equation
$$\sum_{i=1}^n \{Y_i - b_0\} = 0,$$
which leads to the (ordinary) least squares estimator
$$b_0 = \bar{Y}.$$
The fitted model is $\hat{Y}_i = b_0$.
The fitted residuals are
$$e_i = Y_i - \hat{Y}_i = Y_i - \bar{Y}$$
Can you prove the estimator is unbiased, i.e. $E(b_0) = \beta_0$?
How do we estimate $\sigma^2$?
$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^n e_i^2$$
Why is it divided by $n-1$?
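In R, this model can be fitted with an intercept-only formula (a sketch using the same Y as in the earlier example):
m0 = lm(Y ~ 1) # regression without predictors
coef(m0) # the intercept, which equals mean(Y)
mean(Y)
summary(m0)$sigma^2 # equals sum(resid(m0)^2)/(n-1)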
4 Inference in regression
Next, we consider the simple linear regression model
$$Y_1 = \beta_0 + \beta_1 X_1 + \varepsilon_1$$
$$Y_2 = \beta_0 + \beta_1 X_2 + \varepsilon_2$$
$$\vdots \qquad\qquad (1)$$
$$Y_n = \beta_0 + \beta_1 X_n + \varepsilon_n$$
under assumptions of normal random errors.
$X_i$ is known, observed, and nonrandom.
$\varepsilon_1, \dots, \varepsilon_n$ are independent $N(0, \sigma^2)$; thus $Y_i$ is random.
$\beta_0$, $\beta_1$ and $\sigma^2$ are parameters.
By the assumption, we have
$$E(Y_i) = \beta_0 + \beta_1 X_i \quad\text{and}\quad Var(Y_i) = \sigma^2$$
4.1 Inference of $\beta_1$
We need to check whether $\beta_1 = 0$ (or any other specified value, say -1.5). Why?
To check whether X and Y have a linear relationship.
To see whether the model can be simplified (if $\beta_1 = 0$, the model becomes $Y_i = \beta_0 + \varepsilon_i$, a regression model without predictors). For example, Hypotheses $H_0: \beta_1 = 0$ v.s. $H_a: \beta_1 \neq 0$.
Sample distribution of $b_1$: recall
$$b_1 = \frac{\sum_{i=1}^n (X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^n (X_i-\bar{X})^2}$$
Theorem 4.1 For model (1) with the normal assumption on $\varepsilon_i$,
$$b_1 \sim N\left(\beta_1,\ \frac{\sigma^2}{\sum_{i=1}^n (X_i-\bar{X})^2}\right)$$
Proof Recall the fact that any linear combination of independent normally distributed random variables is still normal. To find its distribution, we only need to find its mean and variance. Since the $Y_i$ are normal and independent, $b_1$ is normal, and
$$E(b_1) = \beta_1$$
and (we have proved that)
$$Var(b_1) = \frac{\sigma^2}{\sum_{i=1}^n (X_i-\bar{X})^2}$$
The theorem follows.
Question: what is the distribution of $b_1/\sqrt{Var(b_1)}$ under $H_0$? Can we use this Theorem to test the hypothesis $H_0$? Why?
Estimated Variance of $b_1$ (estimating $\sigma^2$ by MSE):
$$s^2(b_1) = \frac{MSE}{\sum_{i=1}^n (X_i-\bar{X})^2} = \frac{\sum_{i=1}^n e_i^2/(n-2)}{\sum_{i=1}^n (X_i-\bar{X})^2}$$
$s(b_1)$ is the Standard Error (or S.E.) of $b_1$ (also called the standard deviation).
Sample distribution of $(b_1 - \beta_1)/s(b_1)$:
$$\frac{b_1 - \beta_1}{s(b_1)} \sim t(n-2) \quad\text{for model (1)}$$
Confidence interval for $\beta_1$. Let $t_{1-\alpha/2}(n-2)$ or $t(1-\alpha/2, n-2)$ denote the $1-\alpha/2$ quantile of $t(n-2)$. Then
$$P\big(t(\alpha/2, n-2) \le (b_1 - \beta_1)/s(b_1) \le t(1-\alpha/2, n-2)\big) = 1-\alpha$$
By symmetry of the distribution, we have
$$t(1-\alpha/2, n-2) = -t(\alpha/2, n-2)$$
Thus, with confidence $1-\alpha$, we have
$$-t(1-\alpha/2, n-2) \le (b_1 - \beta_1)/s(b_1) \le t(1-\alpha/2, n-2),$$
i.e. the confidence interval for $\beta_1$ is
$$b_1 - t(1-\alpha/2, n-2)\, s(b_1) \le \beta_1 \le b_1 + t(1-\alpha/2, n-2)\, s(b_1)$$
Example 4.2 For the example above, find the 95% confidence interval for $\beta_1$.
Solution: since n = 10, we have $t(1-0.05/2, 8) = 2.306$; the SE of $b_1$ is $s(b_1) = 0.1343$. Thus the confidence interval is
$$b_1 \pm t(1-0.05/2, 8)\, s(b_1) = 0.7874 \pm 2.306 \times 0.1343 = [0.4777, 1.0971]$$
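This interval can be reproduced in R (a sketch using the myreg object from above):
qt(0.975, df = 8) # the t quantile, 2.306
0.7874 + c(-1, 1) * 2.306 * 0.1343 # the interval computed by hand
confint(myreg, "X", level = 0.95) # the same interval computed by R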
Test of $\beta_1$
Two-sided test: to check whether $\beta_1$ is 0,
$$H_0: \beta_1 = 0, \qquad H_a: \beta_1 \neq 0$$
Under $H_0$, we have the random variable
$$t = \frac{b_1}{s(b_1)} \sim t(n-2)$$
Suppose the significance level is $\alpha$ (usually 0.05 or 0.01). Calculate t, say $t^*$.
If $|t^*| \le t(1-\alpha/2; n-2)$, then accept $H_0$.
If $|t^*| > t(1-\alpha/2; n-2)$, then reject $H_0$.
The test can also be done based on the p-value, defined as $p = P(|t| > |t^*|)$. It is easy to see that
$$\text{p-value} < \alpha \iff |t^*| > t(1-\alpha/2; n-2)$$
Thus
If p-value $\ge \alpha$, then accept $H_0$.
If p-value $< \alpha$, then reject $H_0$.
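Both approaches give the same conclusion for the slope in the example above; a quick R check (a sketch; $t^* = 5.862$ with 8 degrees of freedom comes from the earlier output):
tstar = 5.862
abs(tstar) > qt(0.975, df = 8) # TRUE, so reject H0 at alpha = 0.05
2 * pt(-abs(tstar), df = 8) # two-sided p-value, about 0.000378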
One-sided test: for example, to check whether $\beta_1$ is positive (or negative),
$$H_0: \beta_1 \ge 0, \qquad H_a: \beta_1 < 0$$
Under $H_0$, we have
$$t = \frac{b_1}{s(b_1)} = \frac{b_1 - \beta_1}{s(b_1)} + \frac{\beta_1}{s(b_1)} \sim t(n-2) + \text{a nonnegative term}$$
Suppose the significance level is $\alpha$ (usually 0.05 or 0.01). Calculate t, say $t^*$.
If $t^* \ge t(\alpha; n-2)$, then accept $H_0$.
If $t^* < t(\alpha; n-2)$, then reject $H_0$.
4.2 Inference about $\beta_0$
Sample distribution of $b_0$:
$$b_0 = \bar{Y} - b_1\bar{X}$$
Theorem 4.3 For model (1) with the normal assumption on $\varepsilon_i$,
$$b_0 \sim N\left(\beta_0,\ \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^n (X_i-\bar{X})^2}\right]\right)$$
[Proof The expectation is
$$E(b_0) = E\{\bar{Y}\} - E(b_1)\bar{X} = (\beta_0 + \beta_1\bar{X}) - \beta_1\bar{X} = \beta_0$$
Let $k_i = \dfrac{X_i - \bar{X}}{\sum_{i=1}^n (X_i-\bar{X})^2}$; then (see the proof at the beginning of this part)
$$b_1 = \beta_1 + \sum_{i=1}^n k_i\varepsilon_i.$$
Thus
$$b_0 = \bar{Y} - b_1\bar{X} = \beta_0 + \frac{1}{n}\sum_{i=1}^n \varepsilon_i - \bar{X}\sum_{i=1}^n k_i\varepsilon_i = \beta_0 + \sum_{i=1}^n \left[\frac{1}{n} - k_i\bar{X}\right]\varepsilon_i$$
The variance is (using $\sum_{i=1}^n k_i = 0$ and $\sum_{i=1}^n k_i^2 = 1/\sum_{i=1}^n (X_i-\bar{X})^2$)
$$Var(b_0) = \sum_{i=1}^n \left[\frac{1}{n} - k_i\bar{X}\right]^2\sigma^2 = \left[\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^n (X_i-\bar{X})^2}\right]\sigma^2$$
Therefore the Theorem follows.]
Estimated Variance of $b_0$ (by replacing $\sigma^2$ with MSE):
$$s^2(b_0) = MSE\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^n (X_i-\bar{X})^2}\right]$$
$s(b_0)$ is the Standard Error (or S.E.) of $b_0$ (also called the standard deviation).
Sample distribution of $(b_0 - \beta_0)/s(b_0)$:
$$\frac{b_0 - \beta_0}{s(b_0)} \sim t(n-2) \quad\text{for model (1)}$$
Confidence interval for $\beta_0$: with confidence $1-\alpha$, we have the confidence interval
$$b_0 - t(1-\alpha/2, n-2)\, s(b_0) \le \beta_0 \le b_0 + t(1-\alpha/2, n-2)\, s(b_0)$$
Test of $\beta_0$
Two-sided test: to check whether $\beta_0$ is 0,
$$H_0: \beta_0 = 0, \qquad H_a: \beta_0 \neq 0$$
Under $H_0$, we have
$$t = \frac{b_0}{s(b_0)} \sim t(n-2)$$
Suppose the significance level is $\alpha$ (usually 0.05 or 0.01). Calculate t, say $t^*$.
If $|t^*| \le t(1-\alpha/2; n-2)$, then accept $H_0$.
If $|t^*| > t(1-\alpha/2; n-2)$, then reject $H_0$.
Similarly, the test can also be done based on the p-value, defined as $p = P(|t| > |t^*|)$. It is easy to see that
$$\text{p-value} < \alpha \iff |t^*| > t(1-\alpha/2; n-2)$$
Thus
If p-value $\ge \alpha$, then accept $H_0$.
If p-value $< \alpha$, then reject $H_0$.
One-sided test: to check whether $\beta_0$ is positive (or negative), e.g.
$$H_0: \beta_0 \le 0, \qquad H_a: \beta_0 > 0$$
Example 4.4 For the example above, with significance level 0.05,
1. Test $H_0: \beta_0 = 0$ versus $H_1: \beta_0 \neq 0$
2. Test $H_0': \beta_1 = 0$ versus $H_1': \beta_1 \neq 0$
3. Test $H_0'': \beta_0 \ge 0$ versus $H_1'': \beta_0 < 0$
Answer:
1. Since n = 10, $t(0.975, 8) = 2.306$ and $|t^*| = 1.432 < 2.306$. Thus we accept $H_0$.
(Another approach: p-value = 0.1899 > 0.05, so we accept $H_0$.)
2. The t-value is $|t^*| = 5.862 > 2.306$; thus we reject $H_0'$, i.e. $b_1$ is significantly different from 0.
(Another approach: p-value = 0.000378 < 0.05, so we reject $H_0'$.)
3. $t(0.05, 8) = -1.86$; since $t^* = 1.432 > -1.86$, we accept $H_0''$.
How to find these tests from the output of the R code?
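The quantities used in these tests can be read off (or recomputed) from the fitted object in R (a sketch, continuing with myreg from above):
coefs = summary(myreg)$coefficients # Estimate, Std. Error, t value, Pr(>|t|)
coefs
qt(0.975, df = 8) # two-sided critical value 2.306
pt(coefs["(Intercept)", "t value"], df = 8) # one-sided p-value for H0: beta_0 >= 0 vs Ha: beta_0 < 0; > 0.05, so accept H0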