
Machine Learning

Dr. Faraz Akram

Riphah International University


Linear Regression
What is Linear Regression?
Linear regression tries to find the best line (or curve) to fit the data. It assumes that there is approximately a linear relationship between X and Y:

y = \beta_0 + \beta_1 x
[Figure: scatter plot of sample data with the fitted line y = 1.5x + 4]
Linear Regression Analysis
Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables).

y = \beta_0 + \beta_1 x

where
y = dependent or response variable
x = independent or predictor variable
\beta_0 = y-intercept of the line
\beta_1 = slope of the line
Meaning of \beta_0 and \beta_1

y = \beta_0 + \beta_1 x

\beta_1 = slope of the line (the change in y per unit change in x, i.e. \Delta y / \Delta x)
\beta_0 = y-intercept of the line (the value of y when x = 0)
Height vs Weight

Height (cm)   Weight (kg)
100           50.1
110           55.1
120           60.1
130           65.1
140           70.1
150           75.1
160           80.1
170           85.1
180           90.1
190           95.1

[Figure: Height (cm) vs Weight (kg) scatter plot with the fitted line y = 0.5x + 0.1; e.g. for a height of 150 cm the line predicts 0.5 × 150 + 0.1 = 75.1 kg]
Simple Linear Regression

When we have a single input attribute (x) and we want to use linear regression, this is called simple linear regression.

Dataset:

X   Y
1   0
2   30
4   30
3   20
5   50

[Figure: scatter plot of x versus y for this dataset]
Thinking Challenge

How would you draw a line through the points?

[Figure: scatter plot of x versus y]
Thinking Challenge

How would you draw a line through the points?

[Figure: scatter plot of x versus y]

How do you determine which line fits best?
Thinking Challenge

Least squares approach: choose the line that minimizes the sum of squared errors between the line and the points:

SSE = \sum_{i=1}^{5} \varepsilon_i^2 = \varepsilon_1^2 + \varepsilon_2^2 + \varepsilon_3^2 + \varepsilon_4^2 + \varepsilon_5^2

[Figure: the five data points with residuals \varepsilon_1, ..., \varepsilon_5 and the fitted line y = 10x - 4; this line minimizes the sum of squared errors (SSE) between the points and the line]

But where did the line equation come from? How did we get slope \beta_1 = 10 and intercept \beta_0 = -4?
Least Squares Approach

The least squares approach chooses \beta_0 and \beta_1 to minimize the SSE. Using some calculus, one can show that the minimizers are

\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

We can calculate \beta_0 using \beta_1 and some statistics from our dataset, as follows:

\beta_0 = \bar{y} - \beta_1 \bar{x}
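As a sanity check, here is a minimal Python sketch of this closed-form fit (plain Python, no libraries; variable names are illustrative). It uses the example dataset from the slides that follow:

```python
# Closed-form simple least squares fit, as in the formulas above.
xs = [1, 2, 4, 3, 5]
ys = [0, 30, 30, 20, 50]

n = len(xs)
x_bar = sum(xs) / n          # mean of x (3.0)
y_bar = sum(ys) / n          # mean of y (26.0)

# beta_1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
den = sum((x - x_bar) ** 2 for x in xs)
beta_1 = num / den                 # 100 / 10 = 10.0

# beta_0 = y_bar - beta_1 * x_bar
beta_0 = y_bar - beta_1 * x_bar    # 26 - 10*3 = -4.0

print(f"beta_1 = {beta_1}, beta_0 = {beta_0}")
```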
Example

Dataset:

X   Y
1   0
2   30
4   30
3   20
5   50

[Figure: scatter plot of x versus y for this dataset]
Estimating parameters (\beta_0 and \beta_1)

X   Y    x - \bar{x}   (x - \bar{x})^2   y - \bar{y}   (x - \bar{x})(y - \bar{y})
1   0    -2            4                 -26           52
2   30   -1            1                 4             -4
4   30   1             1                 4             4
3   20   0             0                 -6            0
5   50   2             4                 24            48

Mean: \bar{x} = 3, \bar{y} = 26
Sum: \sum (x - \bar{x})^2 = 10, \sum (x - \bar{x})(y - \bar{y}) = 100

\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{100}{10} = 10

\beta_0 = \bar{y} - \beta_1 \bar{x} = 26 - 10 \times 3 = -4
[Figure: the data points and the fitted line \hat{y} = 10x - 4 (predicted Y)]
Estimating error

\hat{y} = 10x - 4

X   Y    Predicted \hat{y}   Error (\hat{y} - y)   Squared error
1   0    6                   6                     36
2   30   16                  -14                   196
4   30   36                  6                     36
3   20   26                  6                     36
5   50   46                  -4                    16

Sum of squared errors (SSE) = 320
Root mean square error (RMSE) = \sqrt{320 / 5} = 8
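The same error calculation can be sketched in a few lines of Python (assuming, as in the table, that RMSE = \sqrt{SSE / n}):

```python
# Predictions from y = 10x - 4, then SSE and RMSE for the example dataset.
import math

xs = [1, 2, 4, 3, 5]
ys = [0, 30, 30, 20, 50]
beta_0, beta_1 = -4, 10

preds = [beta_0 + beta_1 * x for x in xs]      # [6, 16, 36, 26, 46]
errors = [p - y for p, y in zip(preds, ys)]    # [6, -14, 6, 6, -4]
sse = sum(e ** 2 for e in errors)              # 320
rmse = math.sqrt(sse / len(xs))                # sqrt(64) = 8.0
print(sse, rmse)
```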
Multiple Linear Regression

Simple linear regression is a useful approach for predicting a response on the basis of a single predictor variable. However, in practice we often have more than one predictor. We can extend the simple linear regression model to accommodate multiple predictors:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p

The values \beta_0, \beta_1, \dots, \beta_p that minimize the SSE are the multiple least squares regression coefficients.
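A hedged sketch of a multiple least squares fit using NumPy's least squares solver; the two-predictor dataset below is made up purely for illustration (generated from y = 1 + 2x_1 + 3x_2, so the solver recovers those coefficients):

```python
# Multiple linear regression y = b0 + b1*x1 + b2*x2 via least squares.
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])                  # two predictors x1, x2
y = np.array([9.0, 8.0, 19.0, 18.0, 26.0])  # generated as 1 + 2*x1 + 3*x2

A = np.column_stack([np.ones(len(X)), X])   # prepend a column of 1s for b0
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coeffs
print(b0, b1, b2)                           # recovers b0=1, b1=2, b2=3
```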
Linear Regression using Gradient Descent
The goal of all supervised machine learning algorithms is to best estimate a target function f() that maps input data (X) onto output variables (Y).

Different algorithms have different representations and different coefficients, but many of them require a process of optimization to find the set of coefficients that result in the best estimate of the target function.
Gradient Descent

Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function f that minimize a cost function.

The gradient descent method is a way to find a local minimum of a function.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Convex Optimization, Stephen Boyd & Lieven Vandenberghe
Convex functions

Convex functions look something like:

[Figure: example plots of convex functions]

One definition: the line segment between any two points on the function lies on or above the function.
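A small numeric illustration of this chord definition, assuming the simple convex function f(x) = x^2 as the test case:

```python
# Check that the chord between two points on f(x) = x^2 lies on or
# above the function at every intermediate point.
f = lambda x: x ** 2
x1, x2 = -1.0, 3.0

for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    x = lam * x1 + (1 - lam) * x2            # point between x1 and x2
    chord = lam * f(x1) + (1 - lam) * f(x2)  # height of the line segment
    assert f(x) <= chord + 1e-12             # convexity: function below chord
print("chord lies on or above f(x) = x^2 at all tested points")
```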
Finding the minimum

Gradient Descent Method:
1. Start with an initial guess of the solution
2. Choose a step size (learning rate)
3. Step the solution in the negative direction of the gradient
4. Repeat
Gradient Descent Algorithm

Given a starting point x_0 and step size \alpha:
Repeat
    x_{k+1} = x_k - \alpha \nabla f(x_k)
Until a stopping criterion is satisfied

Learning rate (\alpha): a small real value such as 0.1, 0.01 or 0.001

Starting point (x_0): any initial guess, e.g. x_0 = 0

Convex Optimization, Stephen Boyd & Lieven Vandenberghe
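A minimal sketch of this algorithm in Python; the objective f(x) = (x - 3)^2, the iteration cap, and the tolerance are illustrative choices, not from the slides:

```python
# Gradient descent on f(x) = (x - 3)^2 using x_{k+1} = x_k - alpha * f'(x_k).
def grad(x):
    return 2 * (x - 3)      # derivative of (x - 3)^2

x = 0.0                     # starting point x0
alpha = 0.1                 # learning rate
for _ in range(100):
    step = alpha * grad(x)
    x = x - step            # step in the negative gradient direction
    if abs(step) < 1e-6:    # stopping criterion
        break
print(x)                    # converges to the minimizer x = 3
```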
Linear Regression using Gradient Descent

\hat{y} = \beta_0 + \beta_1 x

We need to optimize \beta_0 and \beta_1.

Dataset:

X   Y
1   0
2   30
4   30
3   20
5   50

Initial values: \beta_0 = 0 and \beta_1 = 0.

We can calculate predicted values of y using the starting-point coefficients:

\hat{y} = 0.0 + 0.0 \times x = 0

and the corresponding error for each training instance:

error = \hat{y} - y
We can now use this error to update the coefficients after each training instance:

\beta_0(t+1) = \beta_0(t) - \alpha \times error

and similarly, scaling by the input value,

\beta_1(t+1) = \beta_1(t) - \alpha \times error \times x

with learning rate \alpha = 0.1. Cycling through the training instances and repeating these updates moves the coefficients toward the least squares line.
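A sketch of these per-instance updates on the slides' dataset. The update rules match those above; the epoch count is an illustrative choice, and a smaller learning rate than the slides' \alpha = 0.1 is used so that repeated passes converge stably:

```python
# Per-instance (stochastic) gradient descent for simple linear regression.
xs = [1, 2, 4, 3, 5]
ys = [0, 30, 30, 20, 50]

beta_0, beta_1 = 0.0, 0.0   # initial values, as on the slide
alpha = 0.01                # learning rate (smaller than the slide's 0.1, for stability)

for epoch in range(2000):               # repeated passes over the data
    for x, y in zip(xs, ys):
        pred = beta_0 + beta_1 * x      # prediction with current coefficients
        error = pred - y                # error for this training instance
        beta_0 -= alpha * error         # beta_0(t+1) = beta_0(t) - alpha * error
        beta_1 -= alpha * error * x     # beta_1(t+1) = beta_1(t) - alpha * error * x

print(round(beta_0, 2), round(beta_1, 2))   # close to the least squares line y = 10x - 4
```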
[Figure: the data points and the line predicted after gradient descent]
Polynomial Regression

Instead of a straight line, we can use higher order polynomials for regression:

y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \dots
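A brief sketch of a polynomial fit using NumPy's polyfit, applied to the earlier example dataset; the cubic degree and the prediction point are illustrative choices:

```python
# Least squares polynomial regression with NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 30.0, 30.0, 20.0, 50.0])   # the slides' example dataset

coeffs = np.polyfit(x, y, deg=3)   # cubic fit; coefficients, highest power first
model = np.poly1d(coeffs)          # callable polynomial
print(model(2.5))                  # predict y at x = 2.5
```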
Predict Height from Weight

[Figure: Height (inches) versus Weight (pounds) with four fitted curves: linear, second order polynomial, fourth order polynomial, and sixth order polynomial]

You might also like