1.2 Multiple Linear Regression

Hector Lemus
Spring 2016
Multiple Linear Regression
Examine the relationship between a set of independent variables and a single continuous dependent variable.
Uses of multiple linear regression:

1. To assess the relationship between the dependent and the independent
variables simultaneously taking into account the intercorrelations
among the independent variables.
2. To examine the effect of one or more variables on the dependent
variable after controlling (adjusting) for the effects of the other
variables in the model.
3. To assess the interaction of two or more independent variables with
respect to the dependent variable.
4. To develop a prediction equation.
Example
Dependent variable: Systolic blood pressure
Independent variables:
1. BMI
2. Age
3. Smoking history:
0 = Nonsmoker
1 = Current or Previous Smoker
Hypothetical Example
Carry out a clinical trial to examine the effectiveness of a drug to treat hypertension.
Half of the patients are randomly assigned to an active drug and half
assigned to placebo.
The dependent variable is change in diastolic blood pressure from the
baseline evaluation to the 6-month evaluation.
Suppose we observe the following mean changes in DBP stratified by age:

Age (years)    Drug Group
               Active    Placebo
<60            -10       -2
≥60            -1        -2
Example of an interaction between drug and age. Differential effect of active drug: its effectiveness varies by age.
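The interaction in the table can be quantified as a difference of differences: the drug effect (active minus placebo) in one age stratum minus the drug effect in the other. The sketch below uses the four cell means from the slide; the variable names are illustrative, and the second stratum is assumed to be the complement of "<60".

```python
# Mean change in DBP by age stratum and drug group, copied from the slide.
# Assumption: the second stratum is ">=60" (the complement of "<60").
mean_change = {
    "<60": {"active": -10, "placebo": -2},
    ">=60": {"active": -1, "placebo": -2},
}


def drug_effect(stratum):
    """Active minus placebo mean change within one age stratum."""
    cell = mean_change[stratum]
    return cell["active"] - cell["placebo"]


effect_young = drug_effect("<60")        # -10 - (-2) = -8
effect_old = drug_effect(">=60")         # -1 - (-2) = 1
interaction = effect_young - effect_old  # -8 - 1 = -9

print(effect_young, effect_old, interaction)
```

If there were no interaction, the two stratum-specific effects would be equal and the difference of differences would be zero. Whether a nonzero value is statistically significant is what the regression test addresses.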
Hypothetical Example
Multiple linear regression may be used to assess the degree of interaction and test whether the interaction is statistically significant.
Dependent variable: Change in DBP
Independent variables:
1. Age
2. Drug group (Active/Placebo)
3. Interaction term (to be discussed)
Multiple Linear Regression Model

Notation:
Let Y be the dependent variable.
Let $X_1, \ldots, X_k$ be the independent variables.
Model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + E = \beta_0 + \sum_{i=1}^{k} \beta_i X_i + E$$
where $\beta_0, \beta_1, \ldots, \beta_k$ are regression coefficients to be estimated and E is the error term, which is a random variable.
E has a distribution, and for testing hypotheses we need to make an assumption about its distribution.
Assumptions for MLR

1. Y is a random variable with a distribution of values for each specific combination of values of the Xs.
2. The observations of Y are statistically independent of each other.
3. The mean value of Y for each specific combination of the Xs is given by
$$\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$
4. The variance of Y is the same for any fixed combination of the Xs.
For hypothesis testing, we need one more assumption:
5. Y is normally distributed for each specific combination of the Xs.
Estimating with Least Squares

Basic Idea: Find estimates of the $\beta$s which minimize the sum of the squared distances between the observed and corresponding predicted values.
Let the predicted value be
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \cdots + \hat{\beta}_k X_k$$
Find the estimated parameters $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ which minimize
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \cdots - \hat{\beta}_k X_{ki})^2$$
This quantity defines the error sum of squares, denoted by SSE. It is also called the residual sum of squares.
The difference between the observed and predicted values of Y yields an estimate for E.
$\hat{E} = Y - \hat{Y}$ is called the residual.
Computing the Parameter Estimates
Details for computing the $\hat{\beta}$s will NOT be discussed.
Requires matrix algebra and calculus.
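Although the slides skip the computation, a small numerical sketch may help make the least-squares idea concrete. The example below solves the normal equations $(X'X)\hat{\beta} = X'Y$ with plain Gaussian elimination. The function names (`solve`, `least_squares`, `sse`) and the toy data are hypothetical and are not part of the course material. The data are generated exactly from $Y = 1 + 2X_1 + 3X_2$, so the fitted coefficients should recover those values and the SSE should be essentially zero.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(xs, ys):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    design = [[1.0] + [float(v) for v in row] for row in xs]  # intercept column first
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'Y


def sse(beta, xs, ys):
    """Error (residual) sum of squares for a given coefficient vector."""
    total = 0.0
    for row, y in zip(xs, ys):
        predicted = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
        total += (y - predicted) ** 2
    return total


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no error term).
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

beta_hat = least_squares(xs, ys)
sse_value = sse(beta_hat, xs, ys)
print([round(b, 6) for b in beta_hat], round(sse_value, 6))
```

In practice a statistical package (such as PROC REG in SAS) performs this step; the sketch only illustrates what "minimize the SSE" means.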
ANOVA Table for MLR

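$$SSY = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
$$SSY = SS_{Reg} + SS_{Res} = (SSY - SSE) + SSE$$

Source       df           SS           MS                             F
Regression   k            SSY - SSE    MS_Reg = (SSY - SSE) / k       F = MS_Reg / MS_Res
Residual     n - k - 1    SSE          MS_Res = SSE / (n - k - 1)
Total        n - 1        SSY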
Coefficient of Determination
Proportion of variability of Y that can be explained by the model:
$$R^2 = \frac{SSY - SSE}{SSY}, \qquad 0 \le R^2 \le 1$$
If $R^2 = 1$, we have a perfect fit. The model explains all of the variability.
If $R^2 = 0$, the model explains none of the variability.
Testing Hypotheses in MLR

Three test types:
1. Overall test: Does the set of independent variables taken together explain a
significant amount of the variability in Y?
Taken together, does the set of BMI, Age and Smoking History explain a
significant amount of the variability of SBP?
2. Test for addition of a single variable: Given a set of independent variables in
the model, does the addition of one variable explain a significant amount of the
variability of Y?
Evaluate the relationship between one independent variable and Y after controlling
(adjusting) for the other variables in the model.
Given that Age and Smoking History are in the model, what is the relationship
between BMI and SBP?
3. Test for the addition of a group of variables: Given a set of independent
variables in the model, does the addition of another set of variables explain a
significant amount of the variability of Y?
Assess the relationship of a set of behavioral variables measuring stress on DBP adjusting for known factors related to DBP such as Age and BMI.
Nested Models: The Full Model
A full model contains all of the variables of interest.
For example:
Suppose we test the association between BMI and SBP after adjusting for
Age and Smoking History
$$SBP = \beta_0 + \beta_1 (BMI) + \beta_2 (Age) + \beta_3 (Smoking\ History) + E$$
$H_0: \beta_1 = 0$
Nested Models: The Reduced Model
If $H_0$ is true, then the most appropriate model is
$$SBP = \beta_0 + \beta_2 (Age) + \beta_3 (Smoking\ History) + E$$
This is the reduced model when $H_0$ is true.
Testing $H_0$ is equivalent to testing which of the two models is most appropriate.
Note that no new variables are introduced in the reduced model.
The concepts of nested (full and reduced) models will apply to all of the tests that we discuss.
Test for Overall Regression
The full model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + E$$
Have k independent variables.
Three ways of stating the same null hypothesis:
1. $H_0$: The k independent variables taken together do not explain a significant amount of the variability in Y.
2. $H_0$: The overall regression using the k independent variables is not statistically significant.
3. $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
The reduced model:
$$Y = \beta_0 + E$$
The Test Statistic
Use the F statistic from the ANOVA table:
$$F = \frac{MS_{Reg}}{MS_{Res}}$$
When $H_0$ is true, F follows an F-distribution with k and n-k-1 degrees of freedom.
Reject $H_0$ for large values of F.
$F_{k,\, n-k-1,\, 1-\alpha}$: the $100(1 - \alpha)$th percentile of the F-distribution with k and n-k-1 degrees of freedom, where $\alpha$ is our chosen level of significance.
Decision rule: Reject $H_0$ if $F > F_{k,\, n-k-1,\, 1-\alpha}$.
The percentile is the critical value or critical point.
Alternatively, compute the p-value and compare it to the $\alpha$ level.
SBP Example
Determine whether BMI, age and smoking history taken together account
for a significant amount of the variability of SBP.
Y: SBP, X1: BMI, X2: Age, X3: Smoking History
n = 32 subjects, k = 3
Full model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$$
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
Reduced model:
$$Y = \beta_0 + E$$
Under $H_0$, F follows an F-distribution with k = 3 and n-k-1 = 28 df.
SAS Output
The REG Procedure
Model: MODEL1
Dependent Variable: SBP Systolic Blood Pressure (mmHg)

Number of Observations Read    32
Number of Observations Used    32

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model             3     4889.82570        1629.94190     29.71      <.0001
Error             28    1536.14305        54.86225
Corrected Total   31    6425.96875

Root MSE          7.40691      R-Square    0.7609
Dependent Mean    144.53125    Adj R-Sq    0.7353
Coeff Var         5.12478

Critical values:
At $\alpha = 0.05$, $F_{3,\,28,\,0.95} = 2.95$
At $\alpha = 0.01$, $F_{3,\,28,\,0.99} = 4.57$
At $\alpha = 0.001$, $F_{3,\,28,\,0.999} = 7.19$
Reject $H_0$ and conclude that taken together the 3 variables account for a significant amount of the variability of SBP.
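As a check on how the ANOVA quantities fit together, the following sketch recomputes the mean squares, the overall F statistic, and $R^2$ from the sums of squares in the SAS output, and compares F against the critical values quoted on the slide. The variable names are illustrative.

```python
# Values copied from the SAS Analysis of Variance table.
n, k = 32, 3
ss_total = 6425.96875   # Corrected Total (SSY)
ss_error = 1536.14305   # Error (SSE)

ss_reg = ss_total - ss_error        # regression SS = SSY - SSE
ms_reg = ss_reg / k                 # regression mean square, k df
ms_res = ss_error / (n - k - 1)     # residual mean square, n-k-1 df
f_stat = ms_reg / ms_res            # overall F statistic
r_squared = ss_reg / ss_total       # coefficient of determination

# Critical values F(3, 28, 1 - alpha) as quoted on the slide.
critical = {0.05: 2.95, 0.01: 4.57, 0.001: 7.19}
rejected = {alpha: f_stat > crit for alpha, crit in critical.items()}

print(round(ms_reg, 5), round(ms_res, 5), round(f_stat, 2), round(r_squared, 4))
print(rejected)
```

The computed F exceeds every listed critical value, which agrees with the reported p-value of <.0001.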
The Partial F Test
The regression sum of squares must be partitioned into components that can be used to test hypotheses about individual variables.
One type of breakdown is sequential, variables-added-in-order.
Called Type I in SAS.
X1: BMI, X2: Age, X3: Smoking History

Source                     df    SS
Regression   X1            1     3537.95
             X2 | X1       1     582.65
             X3 | X1, X2   1     769.23
Residual                   28    1536.14
SS(X1)
The sum of squares explained by using only X1 in the model.
This may be used to test whether BMI is linearly related to SBP without adjusting for any other variables.
Since, technically, X2 and X3 are not in the model, pool their terms with the residual.
SSRes = 1536.14 + 582.65 + 769.23 = 2888.02
dfRes = 28 + 1 + 1 = 30
So, given the full model $Y = \beta_0 + \beta_1 X_1 + E$, test $H_0: \beta_1 = 0$ using
$$F = \frac{3537.95 / 1}{2888.02 / 30} = \frac{3537.95}{96.27} = 36.75$$
SS(X2|X1)
The extra sum of squares explained by adding Age to the model given BMI already in the model.
Pooled error term:
SSRes = 1536.14 + 769.23 = 2305.37
dfRes = 28 + 1 = 29
Full: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + E$
Reduced: $Y = \beta_0 + \beta_1 X_1 + E$
$H_0: \beta_2 = 0$ [Age is not related to SBP after adjusting for BMI.]
$$F = \frac{582.65 / 1}{2305.37 / 29} = \frac{582.65}{79.50} = 7.33$$
SS(X3|X1, X2)
The extra sum of squares explained by adding Smoking history to the model given BMI and Age already in the model.
Full: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$
Reduced: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + E$
$H_0: \beta_3 = 0$ [Smoking history is not associated with SBP after adjusting for BMI and Age.]
$$F = \frac{769.23 / 1}{1536.14 / 28} = \frac{769.23}{54.86} = 14.02$$
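The three sequential tests share one pattern: each term is tested against a residual that pools the final residual with every term entered later. The sketch below reproduces the three F statistics from the Type I table under that pooling rule; the function name `sequential_f_tests` is illustrative.

```python
# Sequential (Type I) sums of squares, each on 1 df, in order of entry.
sequential_ss = [
    ("BMI", 3537.95),
    ("Age | BMI", 582.65),
    ("Smoking | BMI, Age", 769.23),
]
ss_res_full, df_res_full = 1536.14, 28  # residual line of the full model


def sequential_f_tests(terms, ss_res, df_res):
    """F statistic for each term, pooling later terms into the residual."""
    results = {}
    for i, (label, ss) in enumerate(terms):
        later = terms[i + 1:]                       # terms not yet in the model
        pooled_ss = ss_res + sum(s for _, s in later)
        pooled_df = df_res + len(later)             # each later term adds 1 df
        results[label] = (ss / 1) / (pooled_ss / pooled_df)
    return results


f_values = sequential_f_tests(sequential_ss, ss_res_full, df_res_full)
for label, f in f_values.items():
    print(label, round(f, 2))
```

Only the last term is tested against the residual of the three-variable model; the earlier terms use a larger pooled residual with more degrees of freedom.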
General Partial F Test

Full model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \beta^* X^* + E$$
$H_0$: The addition of $X^*$ to the model does not explain a significant amount of the variability of Y in the presence of $X_1, X_2, \ldots, X_p$.
$H_0$: $X^*$ is not significantly related to Y controlling for $X_1, X_2, \ldots, X_p$.
$H_0: \beta^* = 0$
Reduced model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + E$$
Construction of the Test
To construct the partial F test, you need the extra sum of squares for $X^*$.
Denote:
$$SS(X^* \mid X_1, X_2, \ldots, X_p) = RegSS(X_1, X_2, \ldots, X_p, X^*) - RegSS(X_1, X_2, \ldots, X_p) = RegSS(\text{Full}) - RegSS(\text{Reduced})$$
We also need the MSRes for the full model:
$$MSRes(\text{Full}) = \frac{SSRes(\text{Full})}{n - p - 2}$$
So,
$$F(X^* \mid X_1, \ldots, X_p) = \frac{SS(X^* \mid X_1, \ldots, X_p)}{MSRes(\text{Full})}$$
The statistic follows an F-distribution with 1 and n-p-2 df.
Reject $H_0$ if $F(X^* \mid X_1, X_2, \ldots, X_p) > F_{1,\, n-p-2,\, 1-\alpha}$.
Example 1
Test whether smoking history is related to SBP after controlling for Age and
BMI.
Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$
$H_0: \beta_3 = 0$
From the table:
SS(X3 | X1, X2) = 769.23
MSRes(X1, X2, X3) = 1536.14/28 = 54.86
F = 769.23/54.86 = 14.02
Since F = 14.02 > $F_{1,\,28,\,0.999}$ = 13.5, the p-value < 0.001.
Reject $H_0$ and conclude Smoking history is significantly related to SBP after adjusting for BMI and Age.
Example 2
Test the relationship of BMI to SBP controlling for Age and Smoking history.
$H_0: \beta_1 = 0$
Full: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$
Reduced: $Y = \beta_0 + \beta_2 X_2 + \beta_3 X_3 + E$
We need SS(X1 | X2, X3), but it is not available in the table.
Note that SS(X1 | X2, X3) = SS(X1, X2, X3) - SS(X2, X3).
We know that SS(X1, X2, X3) = 4889.83 from the SAS Output.
However, we would have to find SS(X2, X3) by fitting a model with only X2 and X3 in it.
It turns out SS(X2, X3) = 4689.69.
Example 2 (cont.)
SS(X1 | X2, X3) = 4889.83 - 4689.69 = 200.14
This is the marginal sum of squares; SAS can provide this information.
F(X1 | X2, X3) = 200.14/54.86 = 3.65
$F_{1,\,28,\,0.90} = 2.89$
$F_{1,\,28,\,0.95} = 4.20$
0.05 < p-value < 0.10
Fail to reject $H_0$ at $\alpha = 0.05$.
No evidence to suggest a significant relationship between SBP and BMI adjusting for Age and Smoking history.
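Both examples apply the same recipe: subtract the reduced-model regression SS from the full-model regression SS, then divide by the full-model residual mean square. A minimal sketch, with a hypothetical helper `partial_f` and the numbers from the slides:

```python
def partial_f(reg_ss_full, reg_ss_reduced, ss_res_full, df_res_full):
    """Extra sum of squares and partial F for adding a single variable."""
    extra_ss = reg_ss_full - reg_ss_reduced   # SS(X* | variables already in model)
    ms_res_full = ss_res_full / df_res_full   # MSRes of the full model
    return extra_ss, extra_ss / ms_res_full


# Example 1: smoking history given BMI and Age.
# RegSS(X1, X2) = SS(X1) + SS(X2 | X1) from the sequential table.
reg_ss_bmi_age = 3537.95 + 582.65
extra_smk, f_smk = partial_f(4889.83, reg_ss_bmi_age, 1536.14, 28)

# Example 2: BMI given Age and Smoking history; RegSS(X2, X3) = 4689.69.
extra_bmi, f_bmi = partial_f(4889.83, 4689.69, 1536.14, 28)

print(round(extra_smk, 2), round(f_smk, 2))
print(round(extra_bmi, 2), round(f_bmi, 2))

# Compare with the critical values quoted on the slides.
print(f_smk > 13.5)          # beyond F(1, 28, 0.999)
print(2.89 < f_bmi < 4.20)   # between F(1, 28, 0.90) and F(1, 28, 0.95)
```

The same variable-added-last logic applies to any single variable, provided the reduced model has been fitted to obtain its regression SS.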
A T-test Equivalent
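An equivalent test to the Partial F test.
Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \beta^* X^* + E$
Test: $H_0: \beta^* = 0$
Could use $F(X^* \mid X_1, X_2, \ldots, X_p)$ or equivalently
$$T = \frac{\hat{\beta}^*}{s_{\hat{\beta}^*}}$$
where $\hat{\beta}^*$ is the estimated regression parameter and $s_{\hat{\beta}^*}$ is its estimated standard error.
For a two-sided test:
Reject $H_0$ if $|T| > t_{n-p-2,\, 1-\alpha/2}$.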
Example 2 (again)
Relationship of BMI to SBP adjusting for Age and Smoking History.
Parameter Estimates
Variable    Label             DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   Intercept         1     45.10319              10.76488          4.19       0.0003
BMI         Body Mass Index   1     1.22225               0.63993           1.91       0.0664
AGE         Age (years)       1     1.21271               0.32382           3.75       0.0008
SMK         Smoking History   1     9.94557               2.65606           3.74       0.0008

$$T = \frac{1.2223}{0.6399} = 1.91, \qquad p\text{-value} = 0.066$$
$F = T^2 = (1.91)^2 = 3.65$
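The equivalence $F = T^2$ can be checked numerically with the BMI row of the parameter estimates and the partial F computed earlier from the sums of squares. This is a small sketch; the two values agree only up to the rounding in the printed output.

```python
# BMI row of the Parameter Estimates table.
estimate, std_error = 1.22225, 0.63993
t_stat = estimate / std_error
f_from_t = t_stat ** 2

# Partial F for BMI given Age and Smoking history, from the sums of squares:
# SS(X1 | X2, X3) = 200.14 and MSRes(Full) = 1536.14 / 28.
f_partial = 200.14 / (1536.14 / 28)

print(round(t_stat, 2), round(f_from_t, 2), round(f_partial, 2))
```

Because the two statistics are equivalent, the p-value reported by SAS for the t test (0.0664) is also the p-value of the partial F test, consistent with the bracket 0.05 < p < 0.10 obtained from the F tables.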
Partitioning the RegSS

1. SS(X1)
   SS(X2 | X1)
   SS(X3 | X1, X2)
   Leads to variables-added-in-order or sequential testing.
   This is SAS Type I SS.
   Useful if there is an ordering to the independent variables.

2. SS(X1 | X2, X3)
   SS(X2 | X1, X3)
   SS(X3 | X1, X2)
   Leads to variables-added-last or marginal testing.
   Each test adjusts for all other variables in the model.
   This is SAS Type II SS.

With the exception of the last test, SS(X3 | X1, X2), which appears in both partitions, these tests are not equivalent.
SAS Code and Output

proc reg data=sbp_data;
  model sbp = bmi age smk / ss1 ss2;
run;
quit;
Parameter Estimates
Variable    Label             DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept   Intercept         1     45.10319              10.76488          4.19       0.0003
BMI         Body Mass Index   1     1.22225               0.63993           1.91       0.0664
AGE         Age (years)       1     1.21271               0.32382           3.75       0.0008
SMK         Smoking History   1     9.94557               2.65606           3.74       0.0008

Parameter Estimates
Variable    Label             DF    Type I SS     Type II SS
Intercept   Intercept         1     668457        963.09739
BMI         Body Mass Index   1     3537.94574    200.14147
AGE         Age (years)       1     582.64651     769.45920
SMK         Smoking History   1     769.23345     769.23345
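Two properties of this output can be verified directly. The Type I sums of squares for the three predictors add up to the model sum of squares, and each Type II sum of squares divided by the residual mean square matches the square of the corresponding t value. The sketch below uses the printed SAS values; the dictionary names are illustrative.

```python
# Values copied from the SAS output (predictor rows only).
type1 = {"BMI": 3537.94574, "AGE": 582.64651, "SMK": 769.23345}
type2 = {"BMI": 200.14147, "AGE": 769.45920, "SMK": 769.23345}
estimates = {  # (parameter estimate, standard error)
    "BMI": (1.22225, 0.63993),
    "AGE": (1.21271, 0.32382),
    "SMK": (9.94557, 2.65606),
}
model_ss = 4889.82570  # Model sum of squares from the ANOVA table
ms_res = 54.86225      # Error mean square from the ANOVA table

# Sequential SS partition the regression SS exactly.
type1_total = sum(type1.values())

# Marginal (variables-added-last) F equals the squared t statistic.
type2_f = {v: ss / ms_res for v, ss in type2.items()}
t_squared = {v: (b / se) ** 2 for v, (b, se) in estimates.items()}

print(round(type1_total, 5))
for v in type2:
    print(v, round(type2_f[v], 2), round(t_squared[v], 2))
```

The Type II sums of squares do not add up to the model sum of squares in general, because each one adjusts for all of the other variables. Only the last variable entered (SMK here) has identical Type I and Type II values.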
MLR Table
Multiple Linear Regression of Systolic Blood Pressure versus selected characteristics (n = 32)

Characteristic         Estimated Coefficient    95% Confidence Interval    p-value
BMI (kg/m²)            1.2                      -0.1, 2.5                  0.066
Age (5 yr interval)    6.1                      2.7, 9.4                   <0.001
Smoking History        9.9                      4.5, 15.4                  <0.001

$R^2 = 0.76$
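The entries in this table can be reproduced from the SAS parameter estimates. The sketch below assumes the intervals are the usual estimate ± t × standard error with 28 residual df, using the standard table value $t_{28,\,0.975} \approx 2.048$. That critical value and the helper `conf_interval` are not taken from the slides. The Age row is rescaled by 5 because the table reports the effect per 5-year interval.

```python
T_CRIT = 2.048  # assumed: t percentile for 28 df at 0.975, from a standard t table


def conf_interval(estimate, std_error, scale=1.0):
    """Return (coefficient, lower, upper) after rescaling the predictor by `scale`."""
    coef = scale * estimate
    half_width = T_CRIT * scale * std_error
    return coef, coef - half_width, coef + half_width


rows = {
    "BMI (kg/m2)": conf_interval(1.22225, 0.63993),
    "Age (5 yr interval)": conf_interval(1.21271, 0.32382, scale=5),
    "Smoking History": conf_interval(9.94557, 2.65606),
}

table = {name: tuple(round(v, 1) for v in row) for name, row in rows.items()}
for name, row in table.items():
    print(name, row)
```

The BMI interval includes zero, which matches its p-value of 0.066 being above 0.05, while the other two intervals exclude zero.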
Multiple Partial F Test

Given that a set of independent variables is in the model, test for the addition
of another set.
Uses:
1. The additional set represents a related group of variables; test a set of behavioral variables controlling for a set of demographic variables.
2. Test a set of interactions.
3. Assess the relationship of a categorical variable with 3 or more categories.
Generalization of Partial F Test

Full model:
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \beta^*_{p+1} X^*_{p+1} + \cdots + \beta^*_k X^*_k + E$$
$H_0$: The addition of $X^*_{p+1}, \ldots, X^*_k$ to the model does not explain a significant amount of the variability of Y in the presence of $X_1, X_2, \ldots, X_p$.
$H_0$: The set of $X^*_{p+1}, \ldots, X^*_k$ is not significantly related to Y controlling for $X_1, X_2, \ldots, X_p$.
$H_0: \beta^*_{p+1} = \cdots = \beta^*_k = 0$
Reduced model:
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + E$$
Construction of the Test
Need the extra sum of squares from adding $X^*_{p+1}, \ldots, X^*_k$ to the model.
Denote:
$$SS(X^*_{p+1}, \ldots, X^*_k \mid X_1, X_2, \ldots, X_p) = RegSS(\text{Full}) - RegSS(\text{Reduced})$$
So,
$$F(X^*_{p+1}, \ldots, X^*_k \mid X_1, \ldots, X_p) = \frac{SS(X^*_{p+1}, \ldots, X^*_k \mid X_1, \ldots, X_p) / (k - p)}{MSRes(\text{Full})}$$
The statistic follows an F-distribution with k-p and n-k-1 df.
Reject $H_0$ if $F(X^*_{p+1}, \ldots, X^*_k \mid X_1, X_2, \ldots, X_p) > F_{k-p,\, n-k-1,\, 1-\alpha}$.
Example: SBP Data

Test for a set of interactions.
Let X1 = BMI
X2 = Age
X3 = Smoking History
X4 = BMI × Age interaction
X5 = BMI × Smoking History interaction
X6 = Age × Smoking History interaction
Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + E$
$H_0: \beta_4 = \beta_5 = \beta_6 = 0$
Reduced model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$
Example: SBP Data (cont.)

ANOVA for the full model:
Source        SS         df    MS
Regression    5092.83    6     848.80
Residual      1333.14    25    53.33

ANOVA for the reduced model:
Source        SS         df    MS
Regression    4889.83    3     1629.94
Residual      1536.14    28    54.86

SS(X4, X5, X6 | X1, X2, X3) = 5092.83 - 4889.83 = 203.00
$$F(X_4, X_5, X_6 \mid X_1, X_2, X_3) = \frac{203.00 / 3}{53.33} = 1.27$$
Under $H_0$, F follows an F-distribution with 3 and 25 df.
p-value > 0.25. Fail to reject $H_0$.
The interactions taken together do not explain a significant amount of the variability of SBP.
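The interaction test is an instance of the multiple partial F construction, with the numerator degrees of freedom equal to the number of added variables. A short sketch using the two ANOVA tables, where `multiple_partial_f` is a hypothetical helper name:

```python
def multiple_partial_f(reg_ss_full, reg_ss_reduced, ss_res_full, n, k, p):
    """Multiple partial F for adding k - p variables to a model with p variables."""
    extra_ss = reg_ss_full - reg_ss_reduced   # RegSS(Full) - RegSS(Reduced)
    df_num = k - p                            # number of variables added
    df_den = n - k - 1                        # residual df of the full model
    f_stat = (extra_ss / df_num) / (ss_res_full / df_den)
    return extra_ss, df_num, df_den, f_stat


# Full model: 6 predictors (3 main effects + 3 interactions); reduced: 3 predictors.
extra_ss, df_num, df_den, f_stat = multiple_partial_f(
    reg_ss_full=5092.83,
    reg_ss_reduced=4889.83,
    ss_res_full=1333.14,
    n=32,
    k=6,
    p=3,
)

print(round(extra_ss, 2), df_num, df_den, round(f_stat, 2))
```

Note that the denominator always comes from the full model, here 1333.14 on 25 df, not from the reduced model.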
Constructing Extra SS
Suppose we have:
SS(X1)
SS(X2|X1)
SS(X3|X1, X2)
Suppose we have the full model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$ and we want to test $H_0: \beta_2 = \beta_3 = 0$.
So we need SS(X2, X3 | X1), which does not appear in the table.
SS(X2, X3 | X1) is the extra sum of squares explained by adding X2 and X3 to the model given X1 already in the model.
SS(X2, X3 | X1) = SS(X1, X2, X3) - SS(X1)
Rewriting the Extra SS

SS(X2 | X1) = SS(X1, X2) - SS(X1)
SS(X3 | X1, X2) = SS(X1, X2, X3) - SS(X1, X2)
Therefore,
SS(X2 | X1) + SS(X3 | X1, X2) = SS(X1, X2) - SS(X1) + SS(X1, X2, X3) - SS(X1, X2)
= SS(X1, X2, X3) - SS(X1)
= SS(X2, X3 | X1)
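Applied to the SBP data, the identity lets the extra sum of squares for Age and Smoking history given BMI be assembled from the sequential table alone. The slides do not carry this test through, so the F statistic below is only an illustration of the multiple partial F formula with the numbers already shown. No critical value is quoted in the slides for 2 and 28 df, so none is assumed here.

```python
# Sequential (Type I) sums of squares from the SBP example.
ss_x1 = 3537.95            # SS(X1)
ss_x2_given_x1 = 582.65    # SS(X2 | X1)
ss_x3_given_x1x2 = 769.23  # SS(X3 | X1, X2)
reg_ss_full = 4889.83      # SS(X1, X2, X3)
ss_res_full, df_res_full = 1536.14, 28

# Two routes to SS(X2, X3 | X1) that should agree.
extra_by_sum = ss_x2_given_x1 + ss_x3_given_x1x2
extra_by_difference = reg_ss_full - ss_x1

# Multiple partial F for H0: beta2 = beta3 = 0, with 2 and 28 df.
f_stat = (extra_by_sum / 2) / (ss_res_full / df_res_full)

print(round(extra_by_sum, 2), round(extra_by_difference, 2), round(f_stat, 2))
```

To reach a decision, the statistic would be compared with the $100(1 - \alpha)$th percentile of the F-distribution with 2 and 28 df from a table or software.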
