
Machine Learning

Dr. Faraz Akram

Riphah International University


Linear Regression
What is Linear Regression?
Linear regression tries to find the best line (or curve) to fit the data. It assumes that there is approximately a linear relationship between X and Y:

y = \beta_0 + \beta_1 x
[Figure: scatter plot of sample data with the fitted line y = 1.5x + 4]
Linear Regression Analysis
Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables).

y = \beta_0 + \beta_1 x

where
y = dependent or response variable
x = independent or predictor variable
\beta_0 = y-intercept of the line
\beta_1 = slope of the line
Meaning of \beta_0 and \beta_1

y = \beta_0 + \beta_1 x

\beta_1 = slope of the line (the change in y per unit change in x, i.e. \Delta y / \Delta x)
\beta_0 = y-intercept of the line (the value of y when x = 0)
Height vs Weight

Height (cm)   Weight (kg)
100           50.1
110           55.1
120           60.1
130           65.1
140           70.1
150           75.1
160           80.1
170           85.1
180           90.1
190           95.1

[Figure: Height (cm) vs Weight (kg) scatter plot with the fitted line y = 0.5x + 0.1; e.g. for a height of 150 cm the line predicts 0.5 × 150 + 0.1 = 75.1 kg]
Simple Linear Regression

When we have a single input attribute (x) and we want to use linear regression, this is called simple linear regression.

Dataset:

X   Y
1   0
2   30
4   30
3   20
5   50

[Figure: scatter plot of x versus y for this dataset]
Thinking Challenge

How would you draw a line through the points?

[Figure: scatter plot of x versus y]
Thinking Challenge

How would you draw a line through the points?

[Figure: scatter plot of x versus y]

How do you determine which line fits best?
Thinking Challenge

Least squares approach: choose the line that minimizes the sum of squared errors between the line and the points:

SSE = \sum_{i=1}^{5} \varepsilon_i^2 = \varepsilon_1^2 + \varepsilon_2^2 + \varepsilon_3^2 + \varepsilon_4^2 + \varepsilon_5^2

[Figure: the five data points with residuals \varepsilon_1, ..., \varepsilon_5 and the fitted line y = 10x - 4; this line minimizes the sum of squared errors (SSE) between the points and the line]

But where did the line equation come from? How did we get slope \beta_1 = 10 and intercept \beta_0 = -4?
Least Squares Approach

The least squares approach chooses \beta_0 and \beta_1 to minimize the SSE. Using some calculus, one can show that the minimizers are

\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

We can calculate \beta_0 using \beta_1 and some statistics from our dataset, as follows:

\beta_0 = \bar{y} - \beta_1 \bar{x}
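As a sanity check, here is a minimal Python sketch of this closed-form fit (plain Python, no libraries; variable names are illustrative). It uses the example dataset from the slides that follow:

```python
# Closed-form simple least squares fit, as in the formulas above.
xs = [1, 2, 4, 3, 5]
ys = [0, 30, 30, 20, 50]

n = len(xs)
x_bar = sum(xs) / n          # mean of x (3.0)
y_bar = sum(ys) / n          # mean of y (26.0)

# beta_1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
den = sum((x - x_bar) ** 2 for x in xs)
beta_1 = num / den                 # 100 / 10 = 10.0

# beta_0 = y_bar - beta_1 * x_bar
beta_0 = y_bar - beta_1 * x_bar    # 26 - 10*3 = -4.0

print(f"beta_1 = {beta_1}, beta_0 = {beta_0}")
```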
Example

Dataset:

X   Y
1   0
2   30
4   30
3   20
5   50

[Figure: scatter plot of x versus y for this dataset]
Estimating parameters (\beta_0 and \beta_1)

X   Y    x - \bar{x}   (x - \bar{x})^2   y - \bar{y}   (x - \bar{x})(y - \bar{y})
1   0    -2            4                 -26           52
2   30   -1            1                 4             -4
4   30   1             1                 4             4
3   20   0             0                 -6            0
5   50   2             4                 24            48

Mean: \bar{x} = 3, \bar{y} = 26
Sum: \sum (x - \bar{x})^2 = 10, \sum (x - \bar{x})(y - \bar{y}) = 100

\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{100}{10} = 10

\beta_0 = \bar{y} - \beta_1 \bar{x} = 26 - 10 \times 3 = -4
[Figure: the data points and the fitted line \hat{y} = 10x - 4 (predicted Y)]
Estimating error

\hat{y} = 10x - 4

X   Y    Predicted \hat{y}   Error (\hat{y} - y)   Squared error
1   0    6                   6                     36
2   30   16                  -14                   196
4   30   36                  6                     36
3   20   26                  6                     36
5   50   46                  -4                    16

Sum of squared errors (SSE) = 320
Root mean square error (RMSE) = \sqrt{320 / 5} = 8
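The same error calculation can be sketched in a few lines of Python (assuming, as in the table, that RMSE = \sqrt{SSE / n}):

```python
# Predictions from y = 10x - 4, then SSE and RMSE for the example dataset.
import math

xs = [1, 2, 4, 3, 5]
ys = [0, 30, 30, 20, 50]
beta_0, beta_1 = -4, 10

preds = [beta_0 + beta_1 * x for x in xs]      # [6, 16, 36, 26, 46]
errors = [p - y for p, y in zip(preds, ys)]    # [6, -14, 6, 6, -4]
sse = sum(e ** 2 for e in errors)              # 320
rmse = math.sqrt(sse / len(xs))                # sqrt(64) = 8.0
print(sse, rmse)
```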
Multiple Linear Regression

Simple linear regression is a useful approach for predicting a response on the basis of a single predictor variable. However, in practice we often have more than one predictor. We can extend the simple linear regression model to accommodate multiple predictors:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p

The values \beta_0, \beta_1, \dots, \beta_p that minimize the SSE are the multiple least squares regression coefficients.
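A hedged sketch of a multiple least squares fit using NumPy's least squares solver; the two-predictor dataset below is made up purely for illustration (generated from y = 1 + 2x_1 + 3x_2, so the solver recovers those coefficients):

```python
# Multiple linear regression y = b0 + b1*x1 + b2*x2 via least squares.
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])                  # two predictors x1, x2
y = np.array([9.0, 8.0, 19.0, 18.0, 26.0])  # generated as 1 + 2*x1 + 3*x2

A = np.column_stack([np.ones(len(X)), X])   # prepend a column of 1s for b0
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coeffs
print(b0, b1, b2)                           # recovers b0=1, b1=2, b2=3
```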
Linear Regression using Gradient Descent
The goal of all supervised machine learning algorithms is to best estimate a target function f() that maps input data (X) onto output variables (Y).

Different algorithms have different representations and different coefficients, but many of them require a process of optimization to find the set of coefficients that result in the best estimate of the target function.
Gradient Descent

Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function f that minimize a cost function.

The gradient descent method is a way to find a local minimum of a function.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Convex Optimization, Stephen Boyd & Lieven Vandenberghe
Convex functions

Convex functions look something like:

[Figure: example plots of convex functions]

One definition: the line segment between any two points on the function lies on or above the function.
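A small numeric illustration of this chord definition, assuming the simple convex function f(x) = x^2 as the test case:

```python
# Check that the chord between two points on f(x) = x^2 lies on or
# above the function at every intermediate point.
f = lambda x: x ** 2
x1, x2 = -1.0, 3.0

for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    x = lam * x1 + (1 - lam) * x2            # point between x1 and x2
    chord = lam * f(x1) + (1 - lam) * f(x2)  # height of the line segment
    assert f(x) <= chord + 1e-12             # convexity: function below chord
print("chord lies on or above f(x) = x^2 at all tested points")
```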
Finding the minimum

Gradient Descent Method:
1. Start with an initial guess of the solution
2. Choose a step size (learning rate)
3. Step the solution in the negative direction of the gradient
4. Repeat
Gradient Descent Algorithm

Given a starting point x_0 and step size \alpha:
Repeat
    x_{k+1} = x_k - \alpha \nabla f(x_k)
Until a stopping criterion is satisfied

Learning rate (\alpha): a small real value such as 0.1, 0.01 or 0.001

Starting point (x_0): any initial guess, e.g. x_0 = 0

Convex Optimization, Stephen Boyd & Lieven Vandenberghe
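A minimal sketch of this algorithm in Python; the objective f(x) = (x - 3)^2, the iteration cap, and the tolerance are illustrative choices, not from the slides:

```python
# Gradient descent on f(x) = (x - 3)^2 using x_{k+1} = x_k - alpha * f'(x_k).
def grad(x):
    return 2 * (x - 3)      # derivative of (x - 3)^2

x = 0.0                     # starting point x0
alpha = 0.1                 # learning rate
for _ in range(100):
    step = alpha * grad(x)
    x = x - step            # step in the negative gradient direction
    if abs(step) < 1e-6:    # stopping criterion
        break
print(x)                    # converges to the minimizer x = 3
```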
Linear Regression using Gradient Descent

\hat{y} = \beta_0 + \beta_1 x

We need to optimize \beta_0 and \beta_1.

Dataset:

X   Y
1   0
2   30
4   30
3   20
5   50

Initial values: \beta_0 = 0 and \beta_1 = 0.

We can calculate predicted values of y using the starting-point coefficients:

\hat{y} = 0.0 + 0.0 \times x = 0

and the corresponding error for each training instance:

error = \hat{y} - y
We can now use this error to update the coefficients after each training instance:

\beta_0(t+1) = \beta_0(t) - \alpha \times error

and similarly, scaling by the input value,

\beta_1(t+1) = \beta_1(t) - \alpha \times error \times x

with learning rate \alpha = 0.1. Cycling through the training instances and repeating these updates moves the coefficients toward the least squares line.
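A sketch of these per-instance updates on the slides' dataset. The update rules match those above; the epoch count is an illustrative choice, and a smaller learning rate than the slides' \alpha = 0.1 is used so that repeated passes converge stably:

```python
# Per-instance (stochastic) gradient descent for simple linear regression.
xs = [1, 2, 4, 3, 5]
ys = [0, 30, 30, 20, 50]

beta_0, beta_1 = 0.0, 0.0   # initial values, as on the slide
alpha = 0.01                # learning rate (smaller than the slide's 0.1, for stability)

for epoch in range(2000):               # repeated passes over the data
    for x, y in zip(xs, ys):
        pred = beta_0 + beta_1 * x      # prediction with current coefficients
        error = pred - y                # error for this training instance
        beta_0 -= alpha * error         # beta_0(t+1) = beta_0(t) - alpha * error
        beta_1 -= alpha * error * x     # beta_1(t+1) = beta_1(t) - alpha * error * x

print(round(beta_0, 2), round(beta_1, 2))   # close to the least squares line y = 10x - 4
```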
[Figure: the data points and the line predicted after gradient descent]
Polynomial Regression

Instead of a straight line, we can use higher order polynomials for regression:

y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \dots
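A brief sketch of a polynomial fit using NumPy's polyfit, applied to the earlier example dataset; the cubic degree and the prediction point are illustrative choices:

```python
# Least squares polynomial regression with NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 30.0, 30.0, 20.0, 50.0])   # the slides' example dataset

coeffs = np.polyfit(x, y, deg=3)   # cubic fit; coefficients, highest power first
model = np.poly1d(coeffs)          # callable polynomial
print(model(2.5))                  # predict y at x = 2.5
```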
Predict Height from Weight

[Figure: Height (inches) versus Weight (pounds) with four fitted curves: linear, second order polynomial, fourth order polynomial, and sixth order polynomial]

You might also like