
Linear Regression Model

Daniel Romero Rodriguez


2018

Predictive Models
Predictive modeling is a process that uses data mining and
probability to forecast outcomes. Each model is made up of a
number of predictors, which are variables that are likely to
influence future results.

Regression and Classification Problems

Classification Problems
Classification is the process of predicting the class of given
data points. Classes are sometimes called targets, labels, or
categories.
Examples: spam detection, medical diagnosis, fraud detection,
credit scoring, image classification, etc.

Classification Models
Logistic Regression
Linear Discriminant Analysis (LDA)
Classification Trees
K Nearest Neighbors

Linear Regression Model
Problems involving sets of variables, where it is known that
some inherent relationship exists among the variables, can be
solved with regression models.
Examples: house price based on key features, gas mileage
based on engine capacity, epidemiology, economics.

A model can be linear or nonlinear depending on the equation
structure and the relationship between the dependent and
independent variables.

Regression Models
Regression Trees
Linear Regression Model

Linear Regression Model
A simple linear regression model has the following structure:

$Y = \beta_0 + \beta_1 X + \varepsilon$

$\beta_0$ is the intercept.
$\beta_1$ is the slope: the change in $Y$ per unit change in $X$.
$Y$ is the dependent variable, the variable to be predicted.
$X$ is the independent variable; it can be controlled and is not random.
$\varepsilon$ is the random error (perturbation), with $E[\varepsilon] = 0$.
Linear Regression Model
• The objective is to estimate $\beta_0$ and $\beta_1$ based on a dataset with pairs $(x_i, y_i)$.

• $b_0$ and $b_1$ are estimates of $\beta_0$ and $\beta_1$:

$\hat{y} = b_0 + b_1 x$

Linear Regression Model Steps
• 1. Dispersion diagram (scatter plot)
• 2. Estimate $b_0$ and $b_1$
• 3. Validate the model with analysis of variance (ANOVA)
• 4. Coefficient of determination analysis ($R^2$)
• 5. Residuals analysis (normality, homoscedasticity, and independence)
• 6. Confidence intervals for the coefficients $b_0$ and $b_1$
• 7. Confidence interval for $\hat{Y}$ and prediction interval for $Y$
Step 1: Dispersion Diagram

Step 2: Estimate 𝑏0 and 𝑏1
We shall find b0 and b1, the estimates of β0 and β1 , so that the sum of the
squares of the residuals is a minimum. The residual sum of squares is often
called the sum of squares of the errors about the regression line and is
denoted by SSE.

$b_0 = \frac{\sum y_i - b_1 \sum x_i}{n} = \bar{y} - b_1 \bar{x} \qquad b_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$
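As a quick sketch (Python/NumPy, not part of the original slides), the two closed-form formulas translate directly to code; the helper name fit_simple_linear is illustrative:

```python
import numpy as np

def fit_simple_linear(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals (SSE)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # b1 = (n * sum(x*y) - sum(x) * sum(y)) / (n * sum(x^2) - (sum(x))^2)
    b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
    b0 = y.mean() - b1 * x.mean()  # b0 = y_bar - b1 * x_bar
    return b0, b1
```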

Example Errors

Example
A company assigns different prices to an electronic product. The following
table shows the product sales for different prices.

Sales: 400 440 380 450 420 420 380 350
Price:  60  50  65  45  50  55  60  65

Develop a dispersion diagram and compute a linear regression model.
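A possible solution sketch (Python with NumPy and matplotlib, not from the slides), assuming the natural column pairing of the table above:

```python
import numpy as np
import matplotlib.pyplot as plt

price = np.array([60, 50, 65, 45, 50, 55, 60, 65], dtype=float)
sales = np.array([400, 440, 380, 450, 420, 420, 380, 350], dtype=float)

# Dispersion (scatter) diagram
plt.scatter(price, sales)
plt.xlabel("Price")
plt.ylabel("Sales")
plt.show()

# Least-squares line; np.polyfit with degree 1 returns (slope, intercept)
b1, b0 = np.polyfit(price, sales, 1)
print(f"sales_hat = {b0:.2f} + ({b1:.2f}) * price")  # approx. 644.52 - 4.26 * price
```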

Step 3: Model Validation ANOVA
The analysis of variance splits the variation of $Y$ into components (regression model and error).

The idea is that the dependent variable $Y$ is explained by $\beta_0 + \beta_1 X$, and that the error has a small influence on the variation.

• $H_0: \beta_1 = 0$
• $H_1: \beta_1 \neq 0$
Step 3: Model Validation ANOVA

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

Sum of Squares Total ($SST$) = Sum of Squares Regression ($SSR$) + Sum of Squares Error ($SSE$)

Step 3: Model Validation ANOVA
• 𝑆𝑆𝑇: Total Variability of 𝑌.
• 𝑆𝑆𝑅: Variability of 𝑌 explained by the regression model.
• 𝑆𝑆𝐸: Variability of 𝑌 explained by the error.

$SST = S_{yy} \qquad S_{yy} = \sum (y_i - \bar{y})^2$

$SSR = b_1 S_{xy} \qquad S_{xy} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n}$

$SSE = S_{yy} - b_1 S_{xy} \qquad S_{xx} = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n}$
Step 3: Model Validation ANOVA
Source     | SS  | DF    | MS                | f_Test
Regression | SSR | 1     | MSR = SSR / 1     | (SSR / 1) / (SSE / (n - 2))
Error      | SSE | n - 2 | MSE = SSE / (n-2) |
Total      | SST | n - 1 |                   |
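A sketch of the full ANOVA computation for the pricing example, using the $S_{xx}$, $S_{xy}$, $S_{yy}$ formulas above and SciPy's F distribution for the p-value (illustrative, not from the slides):

```python
import numpy as np
from scipy import stats

x = np.array([60, 50, 65, 45, 50, 55, 60, 65], dtype=float)          # price
y = np.array([400, 440, 380, 450, 420, 420, 380, 350], dtype=float)  # sales
n = len(x)

Sxy = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / n
Sxx = (n * np.sum(x ** 2) - np.sum(x) ** 2) / n
Syy = np.sum((y - y.mean()) ** 2)

b1 = Sxy / Sxx            # same slope as the closed-form formula in Step 2
SST, SSR = Syy, b1 * Sxy
SSE = SST - SSR

MSR, MSE = SSR / 1, SSE / (n - 2)
f_test = MSR / MSE
p_value = stats.f.sf(f_test, 1, n - 2)  # small p-value -> reject H0: beta1 = 0
print(f"f_Test = {f_test:.2f}, p-value = {p_value:.4f}")
```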

Step 3: Model Validation ANOVA

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

If $H_0$ is not rejected, then the model is not valid for predicting $Y$.

If $H_0$ is rejected, then the model explains the variability of $Y$ and is therefore an appropriate model.

When $f_{Test}$ is large, $H_0$ is rejected in favor of $H_1$, which means that $\beta_1 \neq 0$.

Step 4: Coefficient of Determination

The coefficient of determination, denoted $R^2$, is the proportion of the variance in the dependent variable that is predictable from the independent variable(s):

$R^2 = 1 - \frac{SSE}{SST}$
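As a quick cross-check (Python/NumPy sketch, not from the slides): in simple linear regression, $R^2$ equals the squared Pearson correlation between $x$ and $y$, so it can be computed without the ANOVA sums:

```python
import numpy as np

x = np.array([60, 50, 65, 45, 50, 55, 60, 65], dtype=float)
y = np.array([400, 440, 380, 450, 420, 420, 380, 350], dtype=float)

# For simple linear regression, R^2 = r^2 (squared Pearson correlation).
r = np.corrcoef(x, y)[0, 1]
print(f"R^2 = {r ** 2:.3f}")
```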

Example
The grades of a class of 9 students on a midterm report (x) and on the final examination (y) are as follows:

x (midterm): 77 50 71 40 46 90 96 99 67
y (final):   82 66 78 34 47 85 99 99 68

(a) Estimate the linear regression line.
(b) Carry out an ANOVA.
(c) Estimate the final examination grade of a student who received a grade of 85 on the midterm report.
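A possible solution sketch for parts (a) and (c), with the data read off the table above (Python/NumPy, not from the slides):

```python
import numpy as np

midterm = np.array([77, 50, 71, 40, 46, 90, 96, 99, 67], dtype=float)  # x
final = np.array([82, 66, 78, 34, 47, 85, 99, 99, 68], dtype=float)    # y

# (a) least-squares line; np.polyfit returns (slope, intercept) for degree 1
b1, b0 = np.polyfit(midterm, final, 1)
print(f"y_hat = {b0:.3f} + {b1:.3f} x")

# (c) point estimate at a midterm grade of 85
print(f"Estimated final grade for x = 85: {b0 + b1 * 85:.1f}")
```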
Problem: Simple Linear Regression
A study was made of the amount of converted sugar in a certain process at various temperatures. The data were coded and recorded as follows:

Temperature (x):     1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
Converted Sugar (y): 7.7 7.8 8.2 8.4 8.8 8.9 8.6 9.0 9.3 9.2 10.5

(a) Estimate the linear regression line.
(b) Estimate the mean amount of converted sugar produced when the coded temperature is 1.75.
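A matching solution sketch for this problem (Python/NumPy, not from the slides):

```python
import numpy as np

temperature = np.array([1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2], dtype=float)
sugar = np.array([7.7, 7.8, 8.2, 8.4, 8.8, 8.9, 8.6, 9, 9.3, 9.2, 10.5], dtype=float)

# (a) least-squares line
b1, b0 = np.polyfit(temperature, sugar, 1)
print(f"sugar_hat = {b0:.3f} + {b1:.3f} * temperature")

# (b) estimated mean converted sugar at coded temperature 1.75
print(f"Estimate at x = 1.75: {b0 + b1 * 1.75:.3f}")
```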
Step 5: Residuals Analysis - Normality Test

• $H_0$: Residuals are normally distributed with mean $\mu = 0$ and variance $\sigma^2$ estimated as $s^2 = \frac{SSE}{n-2}$.
• $H_1$: Residuals are not normally distributed.

The residuals are estimated as $e_i = y_i - \hat{y}_i$. A goodness-of-fit test is carried out to evaluate the hypothesis.
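One way to carry out the check is the Shapiro-Wilk test from SciPy; this is an illustrative choice, since the slides do not name a specific goodness-of-fit test (sketch on the pricing data from the earlier example):

```python
import numpy as np
from scipy import stats

x = np.array([60, 50, 65, 45, 50, 55, 60, 65], dtype=float)          # price
y = np.array([400, 440, 380, 450, 420, 420, 380, 350], dtype=float)  # sales

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)       # e_i = y_i - y_hat_i

stat, p = stats.shapiro(residuals)  # H0: residuals are normally distributed
print(f"Shapiro-Wilk W = {stat:.3f}, p-value = {p:.3f}")
```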

Step 5: Residuals Analysis - Homogeneity of Variance

• $H_0$: Residuals have homogeneous variance.
• $H_1$: Residuals have heterogeneous variance.

Plot the standardized residuals versus $\hat{y}$ or $x$ and evaluate whether there are patterns or variance changes.
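A minimal sketch of this diagnostic plot, assuming the pricing data from the earlier example (Python with matplotlib, not from the slides):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([60, 50, 65, 45, 50, 55, 60, 65], dtype=float)
y = np.array([400, 440, 380, 450, 420, 420, 380, 350], dtype=float)

b1, b0 = np.polyfit(x, y, 1)
fitted = b0 + b1 * x
residuals = y - fitted
s = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))  # s^2 = SSE / (n - 2)

# Standardized residuals vs. fitted values; look for funnels or trends.
plt.scatter(fitted, residuals / s)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Standardized residuals")
plt.show()
```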

Step 5: Residuals Analysis- Independence
Residual independence can be tested by analyzing the residuals plot or by using a statistical test such as Durbin-Watson.

Step 5: Residuals Analysis- Independence
The Durbin-Watson statistic is estimated from the residuals $e_i$.

The statistic lies in the range $0 \le D_w \le 4$. When $D_w$ is close to 2, the residuals are assumed independent. For a given significance level, the critical values $d_l$ and $d_u$ are found from the table. The decision is based on the following guidelines:
If $0 \le D_w \le d_l$: positive correlation
If $d_l \le D_w \le d_u$: inconclusive
If $d_u \le D_w \le 4 - d_u$: residuals are independent
If $4 - d_u \le D_w \le 4 - d_l$: inconclusive
If $4 - d_l \le D_w \le 4$: negative correlation
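A direct sketch of the statistic from its definition, $D_w = \sum_{i=2}^{n}(e_i - e_{i-1})^2 / \sum_{i=1}^{n} e_i^2$; the helper name is illustrative, and statsmodels provides the same computation as statsmodels.stats.stattools.durbin_watson:

```python
import numpy as np

def durbin_watson(residuals):
    """D_w = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    e = np.asarray(residuals, dtype=float)
    # np.diff gives e_i - e_{i-1}; values of D_w near 2 suggest independence.
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```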

Step 6: Inference on $\beta_0$ and $\beta_1$
The estimates $b_0$ and $b_1$ of $\beta_0$ and $\beta_1$ are values of the random variables $B_0$ and $B_1$, with $E[B_0] = \beta_0$ and $E[B_1] = \beta_1$.

The variances of $B_0$ and $B_1$ are:

$V(B_0) = s^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right] \qquad V(B_1) = \frac{s^2}{S_{xx}}$

where $s^2$ is given by $s^2 = \frac{SSE}{n-2}$.

Step 6: Inference on $\beta_0$ and $\beta_1$
The $(1 - \alpha)100\%$ confidence intervals for $\beta_0$ and $\beta_1$ are:

$b_0 - t_{n-2,\,\alpha/2} \sqrt{V(B_0)} \le \beta_0 \le b_0 + t_{n-2,\,\alpha/2} \sqrt{V(B_0)}$

$b_1 - t_{n-2,\,\alpha/2} \sqrt{V(B_1)} \le \beta_1 \le b_1 + t_{n-2,\,\alpha/2} \sqrt{V(B_1)}$
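A sketch computing both intervals for the pricing example, using SciPy's t quantile; $\alpha = 0.05$ is an arbitrary illustration value (not from the slides):

```python
import numpy as np
from scipy import stats

x = np.array([60, 50, 65, 45, 50, 55, 60, 65], dtype=float)
y = np.array([400, 440, 380, 450, 420, 420, 380, 350], dtype=float)
n, alpha = len(x), 0.05

b1, b0 = np.polyfit(x, y, 1)
Sxx = np.sum((x - x.mean()) ** 2)
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)  # s^2 = SSE / (n - 2)
t = stats.t.ppf(1 - alpha / 2, n - 2)

half_b0 = t * np.sqrt(s2 * (1 / n + x.mean() ** 2 / Sxx))  # t * sqrt(V(B0))
half_b1 = t * np.sqrt(s2 / Sxx)                            # t * sqrt(V(B1))
print(f"beta0: [{b0 - half_b0:.2f}, {b0 + half_b0:.2f}]")
print(f"beta1: [{b1 - half_b1:.3f}, {b1 + half_b1:.3f}]")
```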

Step 7: Inference on $Y$
The $(1 - \alpha)100\%$ confidence interval for the mean response $\mu_{Y|X=x_0}$ is:

$\hat{y}_0 - t_{n-2,\,\alpha/2} \sqrt{V(\hat{Y})} \le \mu_{Y|X=x_0} \le \hat{y}_0 + t_{n-2,\,\alpha/2} \sqrt{V(\hat{Y})}$

where

$V(\hat{Y}) = s^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]$
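A sketch of this interval for the pricing example; $x_0 = 55$ and $\alpha = 0.05$ are arbitrary illustration values (not from the slides):

```python
import numpy as np
from scipy import stats

x = np.array([60, 50, 65, 45, 50, 55, 60, 65], dtype=float)
y = np.array([400, 440, 380, 450, 420, 420, 380, 350], dtype=float)
n, alpha, x0 = len(x), 0.05, 55.0

b1, b0 = np.polyfit(x, y, 1)
Sxx = np.sum((x - x.mean()) ** 2)
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
t = stats.t.ppf(1 - alpha / 2, n - 2)

y0 = b0 + b1 * x0
# half-width = t * sqrt(V(Y_hat)) with V(Y_hat) = s^2 * (1/n + (x0 - x_bar)^2 / Sxx)
half = t * np.sqrt(s2 * (1 / n + (x0 - x.mean()) ** 2 / Sxx))
print(f"mean response at x0 = {x0}: [{y0 - half:.2f}, {y0 + half:.2f}]")
```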

Step 7: Inference on $Y$
The $(1 - \alpha)100\%$ prediction interval for a new observation $Y$ at $x = x_0$ is:

$\hat{y}_0 - t_{n-2,\,\alpha/2} \sqrt{V(Y)} \le Y \le \hat{y}_0 + t_{n-2,\,\alpha/2} \sqrt{V(Y)}$

where

$V(Y) = s^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]$
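The matching sketch for the prediction interval; the only change from the previous block is the extra $1 +$ term inside the square root (same illustrative $x_0$ and $\alpha$):

```python
import numpy as np
from scipy import stats

x = np.array([60, 50, 65, 45, 50, 55, 60, 65], dtype=float)
y = np.array([400, 440, 380, 450, 420, 420, 380, 350], dtype=float)
n, alpha, x0 = len(x), 0.05, 55.0

b1, b0 = np.polyfit(x, y, 1)
Sxx = np.sum((x - x.mean()) ** 2)
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
t = stats.t.ppf(1 - alpha / 2, n - 2)

y0 = b0 + b1 * x0
# V(Y) = s^2 * (1 + 1/n + (x0 - x_bar)^2 / Sxx): the extra 1 widens the interval
half = t * np.sqrt(s2 * (1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx))
print(f"new observation at x0 = {x0}: [{y0 - half:.2f}, {y0 + half:.2f}]")
```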

