Fu Ch11 Linear Regression

Chapter 11
Regression and Correlation Methods
EPI 809/Spring 2008

Learning Objectives
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Understand and check model assumptions
6. Predict Response Variable
7. Comment on SAS Output
8. Correlation Models
9. Link between a correlation model and a regression model
10. Test of the coefficient of correlation

Models

What is a Model?
1. Representation of Some Phenomenon
Non-Math/Stats Model

What is a Math/Stats Model?
1. Often Describes a Relationship between Variables
2. Types
- Deterministic Models (no randomness)
- Probabilistic Models (with randomness)

Deterministic Models
1. Hypothesize Exact Relationships
2. Suitable When Prediction Error Is Negligible
3. Example: Body mass index (BMI) is a measure of body fat based on weight and height
• Metric formula: $\text{BMI} = \dfrac{\text{weight in kilograms}}{(\text{height in meters})^2}$
• Non-metric formula: $\text{BMI} = \dfrac{\text{weight in pounds} \times 703}{(\text{height in inches})^2}$
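As a small illustration of a deterministic model, the two BMI formulas can be sketched in Python. The function names and the example person are hypothetical; only the formulas come from the slide.

```python
def bmi_metric(weight_kg: float, height_m: float) -> float:
    """BMI from weight in kilograms and height in meters."""
    return weight_kg / height_m ** 2


def bmi_non_metric(weight_lb: float, height_in: float) -> float:
    """BMI from weight in pounds and height in inches (conversion factor 703)."""
    return weight_lb * 703 / height_in ** 2


# Hypothetical person: 70 kg and 1.75 m, which is roughly 154.3 lb and 68.9 in.
print(round(bmi_metric(70, 1.75), 1))
print(round(bmi_non_metric(154.3, 68.9), 1))
```

The same inputs always give the same output: there is no error term, which is what makes the model deterministic.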

Probabilistic Models
1. Hypothesize 2 Components
• Deterministic
• Random Error
2. Example: Systolic blood pressure (SBP) of newborns is 6 times the age in days + random error
• $\text{SBP} = 6 \times \text{age (days)} + \varepsilon$
• Random error may be due to factors other than age in days (e.g., birthweight)
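The SBP example can be simulated to show the two components at work. This is only a sketch: the slide does not say how the random error is distributed, so a normal error with standard deviation 2 is assumed here, and `simulate_sbp` is a hypothetical helper name.

```python
import random


def simulate_sbp(age_days, sigma=2.0, rng=None):
    """Draw one SBP value: deterministic part 6 * age plus a random error."""
    if rng is None:
        rng = random.Random()
    deterministic = 6 * age_days
    # Assumption: normally distributed error with mean 0 and SD sigma.
    error = rng.gauss(0.0, sigma)
    return deterministic + error


rng = random.Random(809)  # fixed seed so the run is reproducible
for age in (1, 2, 3):
    print(age, 6 * age, round(simulate_sbp(age, rng=rng), 2))
```

With `sigma=0` the random component vanishes and the model reduces to the deterministic relationship SBP = 6 × age.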

Types of Probabilistic Models
• Regression Models
• Correlation Models
• Other Models

Regression Models
• Relationship between one dependent variable and explanatory variable(s)
• Use an equation to set up the relationship
- Numerical dependent (response) variable
- 1 or more numerical or categorical independent (explanatory) variables
• Used mainly for prediction & estimation

Regression Modeling Steps
1. Hypothesize Deterministic Component
• Estimate Unknown Parameters
2. Specify Probability Distribution of Random Error Term
• Estimate Standard Deviation of Error
3. Evaluate the Fitted Model
4. Use Model for Prediction & Estimation

Model Specification

Specifying the deterministic component
1. Define the dependent variable and independent variable
2. Hypothesize Nature of Relationship
• Expected Effects (i.e., Coefficients’ Signs)
• Functional Form (Linear or Non-Linear)
• Interactions

Model Specification Is Based on Theory
1. Theory of Field (e.g., Epidemiology)
2. Mathematical Theory
3. Previous Research
4. ‘Common Sense’

Thinking Challenge: Which Is More Logical?
[Figure: four candidate plots of CD+ counts (vertical axis) against years since seroconversion (horizontal axis)]

OB/GYN Study

Types of Regression Models
• Simple (1 explanatory variable)
- Linear
- Non-Linear
• Multiple (2+ explanatory variables)
- Linear
- Non-Linear

Linear Equations
Y = mX + b
m = Slope = (Change in Y) / (Change in X)
b = Y-intercept

Linear Regression Model
1. Relationship Between Variables Is a Linear Function
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
• $Y_i$ = dependent (response) variable (e.g., CD+ count)
• $X_i$ = independent (explanatory) variable (e.g., years since seroconversion)
• $\beta_0$ = population Y-intercept
• $\beta_1$ = population slope
• $\varepsilon_i$ = random error

Population & Sample Regression Models
• Population (unknown relationship): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• Random sample (estimated relationship): $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$

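Population Linear Regression Model
• Observed value: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• $\varepsilon_i$ = random error
• Population regression line: $E(Y) = \beta_0 + \beta_1 X_i$
[Figure: Y against X, showing an observed value, its random error, and the line E(Y)]
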
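Sample Linear Regression Model
• Observed value: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
• $\hat{\varepsilon}_i$ = random error (estimated)
• Fitted line: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
[Figure: Y against X, showing an observed value, an unsampled observation, and the fitted line]
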
Estimating Parameters:
Least Squares Method

Scatter plot
1. Plot of All (Xi, Yi) Pairs
2. Suggests How Well Model Will Fit
[Figure: scatter plot of Y against X, both axes from 0 to 60]

Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
[Figure: scatter plot of Y against X, both axes from 0 to 60, repeated with three candidate lines]
• Slope changed, intercept unchanged
• Slope unchanged, intercept changed
• Slope changed, intercept changed

Least Squares
1. ‘Best Fit’ Means the Differences Between Actual Y Values & Predicted Y Values Are a Minimum. But Positive Differences Offset Negative Ones. So Square the Errors!
$$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$$
2. LS Minimizes the Sum of the Squared Differences (Errors) (SSE)

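To make the two points above concrete, the sketch below compares a few candidate lines on the five estriol/birthweight pairs used in this chapter's worked example. The helper names and the two alternative lines are illustrative assumptions; the line with intercept -0.1 and slope 0.7 is the least-squares fit reported for that example.

```python
def raw_error_sum(x, y, b0, b1):
    """Sum of (actual - predicted); positive and negative errors can cancel."""
    return sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))


def sse(x, y, b0, b1):
    """Sum of squared differences between actual and predicted Y values."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]  # estriol (mg/24h)
y = [1, 1, 2, 2, 4]  # birthweight (g/1000)

candidates = {
    "least-squares line (b0=-0.1, b1=0.7)": (-0.1, 0.7),
    "intercept changed  (b0= 0.0, b1=0.7)": (0.0, 0.7),
    "slope changed      (b0=-0.1, b1=0.8)": (-0.1, 0.8),
}
for label, (b0, b1) in candidates.items():
    print(f"{label}: raw sum = {raw_error_sum(x, y, b0, b1):+.2f}, "
          f"SSE = {sse(x, y, b0, b1):.2f}")
```

The raw errors of the least-squares line sum to zero even though no point except one lies on the line, which is why the errors are squared before summing.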
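Least Squares Graphically
LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$
• Observed value: $Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2$
• Fitted line: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
[Figure: four points with vertical distances $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_4$ to the fitted line, Y against X]
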
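Coefficient Equations
• Prediction equation
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$
• Sample slope
$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
• Sample Y-intercept
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
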
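The coefficient equations translate almost line for line into code. The sketch below is one possible transcription; `fit_simple_ols` is a hypothetical name, and the test data are made up so that the points lie exactly on y = 1 + 2x.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) using b1 = SS_xy / SS_xx and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x, so the fit should recover it.
b0, b1 = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(f"intercept = {b0}, slope = {b1}")
```

Note that the slope must be computed first, because the intercept formula depends on it.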
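Derivation of Parameters (1)
• Least Squares (L-S): minimize the squared error
$$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2$$
• Set the partial derivative with respect to $\beta_0$ to zero:
$$0 = \frac{\partial \sum \varepsilon_i^2}{\partial \beta_0} = \frac{\partial \sum \left(y_i - \beta_0 - \beta_1 x_i\right)^2}{\partial \beta_0} = -2\left(n\bar{y} - n\beta_0 - n\beta_1 \bar{x}\right)$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
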
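Derivation of Parameters (2)
• Least Squares (L-S): minimize the squared error
• Set the partial derivative with respect to $\beta_1$ to zero:
$$0 = \frac{\partial \sum \varepsilon_i^2}{\partial \beta_1} = \frac{\partial \sum \left(y_i - \beta_0 - \beta_1 x_i\right)^2}{\partial \beta_1} = -2 \sum x_i \left(y_i - \beta_0 - \beta_1 x_i\right)$$
• Substitute $\beta_0 = \bar{y} - \beta_1 \bar{x}$:
$$0 = -2 \sum x_i \left(y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i\right)$$
$$\beta_1 \sum x_i (x_i - \bar{x}) = \sum x_i (y_i - \bar{y})$$
• Because $\sum (x_i - \bar{x}) = 0$ and $\sum (y_i - \bar{y}) = 0$, each leading $x_i$ can be replaced by $(x_i - \bar{x})$:
$$\beta_1 \sum (x_i - \bar{x})(x_i - \bar{x}) = \sum (x_i - \bar{x})(y_i - \bar{y})$$
$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}$$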
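The derivation says that both partial derivatives of the squared error are zero at the least-squares estimates. A quick numerical check, sketched here with the food/milk-yield data and the estimates (0.80, 0.65) from this chapter's thinking challenge, can confirm that; `gradient_of_sse` is a hypothetical helper name.

```python
def gradient_of_sse(x, y, b0, b1):
    """Partial derivatives of sum((y - b0 - b1*x)^2) with respect to b0 and b1."""
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    d_b0 = -2 * sum(residuals)
    d_b1 = -2 * sum(xi * ei for xi, ei in zip(x, residuals))
    return d_b0, d_b1


x = [4, 6, 10, 12]        # food intake (lb.)
y = [3.0, 5.5, 6.5, 9.0]  # milk yield (lb.)

at_ls_fit = gradient_of_sse(x, y, 0.80, 0.65)    # least-squares estimates
off_the_fit = gradient_of_sse(x, y, 0.80, 0.70)  # slope nudged away from the fit
print(at_ls_fit)    # both components are zero up to rounding error
print(off_the_fit)  # clearly non-zero
```

A non-zero gradient at the nudged slope means the squared error can still be reduced there, so that line is not the least-squares line.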

Computation Table
Xi      Yi      Xi²     Yi²     XiYi
X1      Y1      X1²     Y1²     X1Y1
X2      Y2      X2²     Y2²     X2Y2
:       :       :       :       :
Xn      Yn      Xn²     Yn²     XnYn
ΣXi     ΣYi     ΣXi²    ΣYi²    ΣXiYi

Interpretation of Coefficients
1. Slope ($\hat{\beta}_1$)
• Estimated Y changes by $\hat{\beta}_1$ for each 1-unit increase in X
- If $\hat{\beta}_1 = 2$, then Y is expected to increase by 2 for each 1-unit increase in X
2. Y-Intercept ($\hat{\beta}_0$)
• Average value of Y when X = 0
- If $\hat{\beta}_0 = 4$, then average Y is expected to be 4 when X is 0
Parameter Estimation Example
Obstetrics: What is the relationship between mother’s estriol level & birthweight using the following data?
Estriol (mg/24h)    Birthweight (g/1000)
1                   1
2                   1
3                   2
4                   2
5                   4

Scatterplot
[Figure: Birthweight vs. Estriol level; birthweight (0 to 4) on the vertical axis, estriol level (0 to 6) on the horizontal axis]

Parameter Estimation Solution Table
        Xi      Yi      Xi²     Yi²     XiYi
        1       1       1       1       1
        2       1       4       1       2
        3       2       9       4       6
        4       2       16      4       8
        5       4       25      16      20
Total   15      10      55      26      37

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{37 - \dfrac{(15)(10)}{5}}{55 - \dfrac{(15)^2}{5}} = 0.70$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - (0.70)(3) = -0.10$$
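The computation table and the shortcut formulas above can be reproduced with a few lines of Python. This is a sketch of the hand calculation, not of any particular library routine; the variable names are illustrative.

```python
x = [1, 2, 3, 4, 5]  # estriol (mg/24h)
y = [1, 1, 2, 2, 4]  # birthweight (g/1000)
n = len(x)

# One row per observation: Xi, Yi, Xi^2, Yi^2, XiYi.
rows = [(xi, yi, xi ** 2, yi ** 2, xi * yi) for xi, yi in zip(x, y)]
totals = [sum(column) for column in zip(*rows)]
sum_x, sum_y, sum_x2, sum_y2, sum_xy = totals

# Shortcut formulas based on the column totals.
slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
intercept = sum_y / n - slope * sum_x / n

for row in rows:
    print(*row)
print("totals:", *totals)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The Yi² column is not needed for the two coefficients; it is kept because the table carries it.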

Coefficient Interpretation Solution
1. Slope ($\hat{\beta}_1$)
• Birthweight (Y) is expected to increase by 0.7 units for each 1-unit increase in estriol (X)
2. Intercept ($\hat{\beta}_0$)
• Average birthweight (Y) is -0.10 units when estriol level (X) is 0
- Difficult to explain
- The birthweight should always be positive

SAS code for fitting a simple linear regression
    data BW;                      /* Read the data into SAS */
      input estriol birthw @@;    /* @@ allows several observations per line */
      cards;
    1 1 2 1 3 2 4 2 5 4
    ;
    run;
    proc reg data=BW;             /* Fit the linear regression model */
      model birthw = estriol;     /* response = explanatory variable */
    run;

Parameter Estimation: SAS Computer Output
Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    1     -0.10000              0.63509           -0.16      0.8849
Estriol      1      0.70000              0.19149            3.66      0.0354
The Intercept row gives $\hat{\beta}_0$; the Estriol row gives $\hat{\beta}_1$.

Parameter Estimation Thinking Challenge
You’re a vet epidemiologist for the county cooperative. You gather the following data:
Food (lb.)    Milk yield (lb.)
4             3.0
6             5.5
10            6.5
12            9.0
What is the relationship between cows’ food intake and milk yield?

Scattergram
[Figure: Milk Yield vs. Food intake*; milk yield in lb. (0 to 10) on the vertical axis, food intake in lb. (0 to 15) on the horizontal axis]

Parameter Estimation Solution Table*
        Xi      Yi      Xi²     Yi²       XiYi
        4       3.0     16      9.00      12
        6       5.5     36      30.25     33
        10      6.5     100     42.25     65
        12      9.0     144     81.00     108
Total   32      24.0    296     162.50    218

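Parameter Estimation Solution*
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{218 - \dfrac{(32)(24)}{4}}{296 - \dfrac{(32)^2}{4}} = 0.65$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80$$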

Coefficient Interpretation Solution*
1. Slope ($\hat{\beta}_1$)
• Milk yield (Y) is expected to increase by 0.65 lb. for each 1 lb. increase in food intake (X)
2. Y-Intercept ($\hat{\beta}_0$)
• Average milk yield (Y) is expected to be 0.8 lb. when food intake (X) is 0
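Once the coefficients are known, the fitted line can be used for prediction. The sketch below starts from the column totals of the solution table and predicts milk yield at 8 lb. of food, a value chosen for illustration because it lies inside the observed range of 4 to 12 lb.; `predict_milk_yield` is a hypothetical helper name.

```python
n = 4
sum_x, sum_y, sum_x2, sum_xy = 32, 24.0, 296, 218  # totals from the solution table

slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
intercept = sum_y / n - slope * sum_x / n


def predict_milk_yield(food_lb):
    """Predicted milk yield (lb.) for a given food intake (lb.)."""
    return intercept + slope * food_lb


print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted yield at 8 lb. of food: {predict_milk_yield(8):.2f} lb.")
```

The intercept corresponds to a food intake of 0 lb., which is outside the observed data, so it is best read as a property of the fitted line rather than as a realistic prediction.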
