Linear Models

Learning Objectives
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Understand and check model assumptions
6. Predict Response Variable
7. Comment on Python Output
Models
What is a Model?
Representation of Some Phenomenon
Non-Math/Stats Model
What is a Math/Stats Model?
1. Often Describe Relationship between Variables
2. Types
- Deterministic Models (no randomness)
- Probabilistic Models (with randomness)
Deterministic Models
1. Hypothesize Exact Relationships
2. Suitable When Prediction Error is Negligible
3. Example: Body mass index (BMI) is a measure of body fat based on weight and height
• Metric Formula: $\mathrm{BMI} = \dfrac{\text{weight (kg)}}{[\text{height (m)}]^2}$
• Non-metric Formula: $\mathrm{BMI} = \dfrac{\text{weight (lb)} \times 703}{[\text{height (in)}]^2}$
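A minimal Python sketch of the two BMI formulas may make the "no randomness" point concrete: the same inputs always give the same output. The function names and the example person (70 kg, 1.75 m) are illustrative assumptions, not part of the slides.

```python
def bmi_metric(weight_kg: float, height_m: float) -> float:
    """BMI from weight in kilograms and height in meters."""
    return weight_kg / height_m ** 2


def bmi_non_metric(weight_lb: float, height_in: float) -> float:
    """BMI from weight in pounds and height in inches (factor 703)."""
    return weight_lb * 703 / height_in ** 2


# Hypothetical person: 70 kg and 1.75 m tall.
print(round(bmi_metric(70, 1.75), 1))

# A deterministic model returns the same value every time it is called.
print(bmi_metric(70, 1.75) == bmi_metric(70, 1.75))
```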
Probabilistic Models
1. Hypothesize 2 Components
• Deterministic
• Random Error
2. Example: Systolic Blood Pressure (SBP) of Newborns Is 6 Times the Age in Days + Random Error
• $\mathrm{SBP} = 6 \times \text{age (days)} + \varepsilon$
• Random Error May Be Due to Factors Other Than Age in Days (e.g., Birthweight)
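The two components can be sketched in Python by simulating a few newborns of the same age. The slides do not specify the distribution of the random error, so the normal distribution with standard deviation 2 used here is purely an assumption for illustration, as are the function and variable names.

```python
import random


def simulate_sbp(age_days: float, rng: random.Random, error_sd: float = 2.0) -> float:
    """One draw of SBP = 6 * age (days) + random error."""
    deterministic = 6 * age_days
    # Assumed error distribution: normal with mean 0 (not given in the slides).
    random_error = rng.gauss(0.0, error_sd)
    return deterministic + random_error


rng = random.Random(0)  # seeded so the run is repeatable
draws = [simulate_sbp(3, rng) for _ in range(5)]
print(draws)  # values scattered around the deterministic part, 6 * 3 = 18
```

Setting `error_sd=0` removes the random component and reduces the model to a deterministic one.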
Types of Probabilistic Models
• Regression Models
• Correlation Models
• Other Models
Regression Models
• Relationship between one dependent variable and
explanatory variable(s)
• Use equation to set up relationship
• Numerical Dependent (Response) Variable
• 1 or More Numerical or Categorical Independent (Explanatory)
Variables
• Used Mainly for Prediction & Estimation
Regression Modeling Steps
• 1. Hypothesize Deterministic Component
  • Estimate Unknown Parameters
• 2. Specify Probability Distribution of Random Error Term
  • Estimate Standard Deviation of Error
• 3. Evaluate the Fitted Model
• 4. Use Model for Prediction & Estimation
Model Specification
Specifying the Deterministic Component
• 1. Define the Dependent Variable and Independent Variable
• 2. Hypothesize Nature of Relationship
  • Expected Effects (i.e., Coefficients’ Signs)
  • Functional Form (Linear or Non-Linear)
  • Interactions
EPI 809/Spring 2008 13

Model Specification
Is Based on Theory
• 1. Theory of Field (e.g., Epidemiology)
• 2. Mathematical Theory
• 3. Previous Research
• 4. ‘Common Sense’

Thinking Challenge:
Which Is More Logical?
[Figure: four candidate plots of CD+ counts (vertical axis) against years since seroconversion (horizontal axis)]

OB/GYN Study

Types of Regression Models
• Simple (1 Explanatory Variable)
  • Linear
  • Non-Linear
• Multiple (2+ Explanatory Variables)
  • Linear
  • Non-Linear

Linear Regression Model


Linear Equations
Y = mX + b
m = Slope = (Change in Y) / (Change in X)
b = Y-intercept
[Figure: straight line plotted on X and Y axes]
© 1984-1994 T/Maker Co.

Linear Regression Model
• 1. Relationship Between Variables Is a Linear Function
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• $Y_i$ = Dependent (Response) Variable (e.g., CD+ counts)
• $X_i$ = Independent (Explanatory) Variable (e.g., years since seroconversion)
• $\beta_0$ = Population Y-Intercept
• $\beta_1$ = Population Slope
• $\varepsilon_i$ = Random Error
Population & Sample Regression Models
• Population: Unknown Relationship $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• Random Sample: Estimated Relationship $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$

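Population Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$E(Y \mid X_i) = \beta_0 + \beta_1 X_i$
[Figure: observed values of Y plotted against X around the population line; $\varepsilon_i$ = random error is the vertical distance from an observed value to the line]
Sample Linear Regression Model
$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
[Figure: observed values and one unsampled observation plotted against X around the sample line; $\hat{\varepsilon}_i$ = random error is the vertical distance from an observed value to the line]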
Estimating Parameters:
Least Squares Method

Scatter plot
• 1. Plot of All (Xi, Yi) Pairs
• 2. Suggests How Well Model Will Fit
[Figure: scatter plot of Y against X, both axes running from 0 to 60]
Thinking Challenge
How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Figure: the same scatter plot (Y against X, both axes 0 to 60) with three sets of candidate lines: slope changed with intercept unchanged; slope unchanged with intercept changed; slope changed with intercept changed]
Least Squares
• 1. ‘Best Fit’ Means Differences Between Actual Y Values & Predicted Y Values Are a Minimum. But Positive Differences Off-Set Negative Ones. So Square the Errors!
$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$
• 2. LS Minimizes the Sum of the Squared Differences (Errors) (SSE)
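The off-setting problem can be seen in a short Python sketch. The four data points and the two candidate lines are hypothetical and chosen only for illustration; the second line happens to be the least squares line for these points. For both lines the raw errors sum to about zero, so the raw sum cannot rank them, while the SSE can.

```python
def residuals(xs, ys, intercept, slope):
    """Observed Y minus predicted Y for the line Y = intercept + slope * X."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sse(xs, ys, intercept, slope):
    """Sum of squared errors (SSE) for a candidate line."""
    return sum(e ** 2 for e in residuals(xs, ys, intercept, slope))


# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]

# Candidate lines given as (intercept, slope).
candidates = {"flat line": (2.5, 0.0), "sloped line": (1.0, 0.6)}
for name, (b0, b1) in candidates.items():
    raw_sum = sum(residuals(xs, ys, b0, b1))
    print(f"{name}: sum of errors = {raw_sum:.2f}, SSE = {sse(xs, ys, b0, b1):.2f}")
```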

Least Squares Graphically
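LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$
[Figure: four observed points and the fitted line $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ plotted on X and Y axes; each residual $\hat{\varepsilon}_1, \dots, \hat{\varepsilon}_4$ is the vertical distance from a point to the line, e.g. $Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2$]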
Coefficient Equations
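• Prediction equation
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
• Sample slope
$\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
• Sample Y-intercept
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$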
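The coefficient equations translate almost line for line into Python. This is a sketch using only the standard library; the function name `fit_simple_ols` and the example data are assumptions made for illustration. The example points lie exactly on Y = 1 + 2X, so the estimates should recover that line.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) from the SS_xy / SS_xx equations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical points that lie exactly on Y = 1 + 2X.
intercept, slope = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

The function divides by SS_xx, so it assumes the X values are not all identical.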
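Derivation of Parameters (1)
• Least Squares (L-S):
Minimize squared error
$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
$0 = \dfrac{\partial \sum \varepsilon_i^2}{\partial \beta_0} = \dfrac{\partial \sum (y_i - \beta_0 - \beta_1 x_i)^2}{\partial \beta_0}$
$= -2\,(n\bar{y} - n\beta_0 - n\beta_1 \bar{x})$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$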
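Derivation of Parameters (2)
• Least Squares (L-S):
Minimize squared error
$0 = \dfrac{\partial \sum \varepsilon_i^2}{\partial \beta_1} = \dfrac{\partial \sum (y_i - \beta_0 - \beta_1 x_i)^2}{\partial \beta_1}$
$= -2 \sum x_i (y_i - \beta_0 - \beta_1 x_i)$
$= -2 \sum x_i (y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i)$
$\beta_1 \sum x_i (x_i - \bar{x}) = \sum x_i (y_i - \bar{y})$
$\beta_1 \sum (x_i - \bar{x})(x_i - \bar{x}) = \sum (x_i - \bar{x})(y_i - \bar{y})$
$\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}}$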
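The derivation sets both partial derivatives of the squared error to zero. As a rough numerical check, the sketch below evaluates those two derivatives at the closed-form estimates and at a deliberately shifted slope. The data and the function name `sse_gradient` are hypothetical and chosen only for illustration.

```python
# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form least squares estimates from the derivation.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar


def sse_gradient(beta0, beta1):
    """Partial derivatives of the squared error with respect to beta0 and beta1."""
    errors = [y - beta0 - beta1 * x for x, y in zip(xs, ys)]
    d_beta0 = -2 * sum(errors)
    d_beta1 = -2 * sum(x * e for x, e in zip(xs, errors))
    return d_beta0, d_beta1


print(sse_gradient(b0, b1))        # both entries are zero up to rounding error
print(sse_gradient(b0, b1 + 0.5))  # nonzero away from the least squares estimates
```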
Computation Table
Xi       Yi       Xi²      Yi²      XiYi
X1       Y1       X1²      Y1²      X1Y1
X2       Y2       X2²      Y2²      X2Y2
:        :        :        :        :
Xn       Yn       Xn²      Yn²      XnYn
ΣXi      ΣYi      ΣXi²     ΣYi²     ΣXiYi

Interpretation of Coefficients

• 1. Slope ($\hat{\beta}_1$)
  • Estimated Y Changes by $\hat{\beta}_1$ for Each 1 Unit Increase in X
  • If $\hat{\beta}_1 = 2$, then Y Is Expected to Increase by 2 for Each 1 Unit Increase in X
• 2. Y-Intercept ($\hat{\beta}_0$)
  • Average Value of Y When X = 0
  • If $\hat{\beta}_0 = 4$, then Average Y Is Expected to Be 4 When X Is 0

Parameter Estimation Example
• Obstetrics: What is the relationship between
Mother’s Estriol level & Birthweight using the following
data?
Estriol (mg/24h)   Birthweight (g/1000)
1                  1
2                  1
3                  2
4                  2
5                  4

Scatterplot
Birthweight vs. Estriol level
[Figure: scatter plot of birthweight (vertical axis, 0 to 4) against estriol level (horizontal axis, 0 to 6)]

Parameter Estimation Solution Table
        Xi    Yi    Xi²   Yi²   XiYi
        1     1     1     1     1
        2     1     4     1     2
        3     2     9     4     6
        4     2     16    4     8
        5     4     25    16    20
Total   15    10    55    26    37
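Parameter Estimation Solution
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \dfrac{37 - \dfrac{(15)(10)}{5}}{55 - \dfrac{(15)^2}{5}} = \dfrac{7}{10} = 0.70$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - (0.70)(3) = -0.10$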
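The same arithmetic can be reproduced in Python from the column totals of the solution table. This is a sketch of the computational (shortcut) form of the slope formula; the variable names are illustrative assumptions, while the data are the estriol and birthweight values from the example.

```python
estriol = [1, 2, 3, 4, 5]      # X, mg/24h
birthweight = [1, 1, 2, 2, 4]  # Y, g/1000

# Column totals of the computation table.
n = len(estriol)
sum_x = sum(estriol)                                       # 15
sum_y = sum(birthweight)                                   # 10
sum_x2 = sum(x ** 2 for x in estriol)                      # 55
sum_xy = sum(x * y for x, y in zip(estriol, birthweight))  # 37

# Shortcut form of SS_xy / SS_xx, then the intercept from the means.
slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
intercept = sum_y / n - slope * sum_x / n

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```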

Coefficient Interpretation Solution

• 1. Slope ($\hat{\beta}_1$)
  • Birthweight (Y) Is Expected to Increase by 0.7 Units for Each 1 Unit Increase in Estriol (X)
• 2. Intercept ($\hat{\beta}_0$)
  • Average Birthweight (Y) Is -0.10 Units When Estriol Level (X) Is 0
  • Difficult to explain
  • The birthweight should always be positive

SAS code for fitting a simple linear regression

data BW;                /* Reading data in SAS */
  input estriol birthw @@;
  cards;
1 1 2 1 3 2 4 2 5 4
;
run;

proc reg data=BW;       /* Fitting linear regression models */
  model birthw = estriol;
run;

Parameter Estimation
SAS Computer Output
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    -0.10000             0.63509          -0.16     0.8849
Estriol     1     0.70000             0.19149           3.66     0.0354
The Parameter Estimate column gives $\hat{\beta}_0$ (Intercept row) and $\hat{\beta}_1$ (Estriol row).

Parameter Estimation Thinking
Challenge
• You’re a vet epidemiologist for the county cooperative. You gather the following data:
Food (lb.)   Milk yield (lb.)
4            3.0
6            5.5
10           6.5
12           9.0
• What is the relationship between cows’ food intake and milk yield?

Scattergram
Milk Yield vs. Food intake*
[Figure: scatter plot of milk yield in lb. (vertical axis, 0 to 10) against food intake in lb. (horizontal axis, 0 to 15)]

Parameter Estimation Solution Table*
        Xi    Yi     Xi²    Yi²      XiYi
        4     3.0    16     9.00     12
        6     5.5    36     30.25    33
        10    6.5    100    42.25    65
        12    9.0    144    81.00    108
Total   32    24.0   296    162.50   218

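Parameter Estimation Solution*
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \dfrac{218 - \dfrac{(32)(24)}{4}}{296 - \dfrac{(32)^2}{4}} = \dfrac{26}{40} = 0.65$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80$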

Coefficient Interpretation Solution*
• 1. Slope ($\hat{\beta}_1$)
  • Milk Yield (Y) Is Expected to Increase by 0.65 lb. for Each 1 lb. Increase in Food Intake (X)
• 2. Y-Intercept ($\hat{\beta}_0$)
  • Average Milk Yield (Y) Is Expected to Be 0.8 lb. When Food Intake (X) Is 0
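Once the coefficients are estimated, prediction is a direct use of the fitted equation. The sketch below plugs the estimates from this example (0.80 and 0.65) into a small helper; the function name and the chosen food intakes are illustrative assumptions. The intakes stay within the observed range of 4 to 12 lb., since the data say nothing about how the line behaves outside it.

```python
INTERCEPT = 0.80  # estimated Y-intercept from the milk yield example
SLOPE = 0.65      # estimated slope from the milk yield example


def predict_milk_yield(food_lb: float) -> float:
    """Predicted average milk yield (lb.) for a given food intake (lb.)."""
    return INTERCEPT + SLOPE * food_lb


# Food intakes inside the observed range of the data (4 to 12 lb.).
for food in (4, 8, 12):
    print(food, round(predict_milk_yield(food), 2))
```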
