
POLYTECHNIC UNIVERSITY OF THE PHILIPPINES

COLLEGE OF ENGINEERING
DEPARTMENT OF INDUSTRIAL ENGINEERING

MODULE 6
MULTIPLE LINEAR REGRESSION

In Industrial Engineering, many applications of regression analysis involve situations wherein there is more than one regressor. A regression model that contains more than one regressor is called a multiple regression model.

I. The Multiple Linear Regression Model

Suppose that the effective life of a cutting tool (Y) depends on the cutting speed (x1) and the tool angle (x2). A multiple regression model that might describe this relationship is:

Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon

Specifically, the model stated above is a multiple linear regression model with two regressors because it is a linear function of the unknown parameters β0, β1, and β2.

Models that are more complex in structure may often still be analyzed by multiple linear regression techniques. Consider the polynomial model in one regressor variable:

Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon

If we let x_1 = x, x_2 = x^2, and x_3 = x^3:

Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon

Models that include interaction effects may also be analyzed by multiple linear regression methods:

Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon

If we let x_3 = x_1 x_2 and \beta_3 = \beta_{12}, we will have:

Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon
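To make the column-substitution idea concrete, here is a minimal Python sketch (numpy and the toy data values are assumptions, used only to illustrate how the columns are built) showing how a polynomial or interaction model reduces to an ordinary multiple linear regression once the new regressor columns are formed.

```python
import numpy as np

# Toy one-variable data (hypothetical values, for illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 9.2, 17.8, 30.5])

# Polynomial model: treat x, x^2, x^3 as three regressors x1, x2, x3.
X_poly = np.column_stack([np.ones_like(x), x, x**2, x**3])
beta_poly = np.linalg.lstsq(X_poly, y, rcond=None)[0]

# Interaction model: given regressors x1 and x2, add the product column x3 = x1 * x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([0.5, 1.5, 1.0, 2.0, 2.5])
X_int = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta_int = np.linalg.lstsq(X_int, y, rcond=None)[0]

print(beta_poly)  # estimates of beta0..beta3 for the polynomial model
print(beta_int)   # estimates of beta0..beta3 for the interaction model
```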

II. Least Squares Estimation of the Parameters

As in the simple linear regression model, the least squares estimation methodology may be used to estimate the regression parameters.

Using the same derivation method as in simple linear regression, and letting xij denote the ith observation of variable xj, we obtain the least squares normal equations:

n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{n} x_{i2} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik} = \sum_{i=1}^{n} y_i

\hat{\beta}_0 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1}^2 + \hat{\beta}_2 \sum_{i=1}^{n} x_{i1} x_{i2} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{i1} x_{ik} = \sum_{i=1}^{n} x_{i1} y_i

\vdots

\hat{\beta}_0 \sum_{i=1}^{n} x_{ik} + \hat{\beta}_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{n} x_{ik} x_{i2} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik}^2 = \sum_{i=1}^{n} x_{ik} y_i

EXAMPLE:

You are given the wire bond data in the succeeding table. Fit a
multiple linear regression model for the given data.

Observation Number    Pull Strength (y)    Wire Length (x1)    Die Height (x2)
1 9.95 2 50
2 24.45 8 110
3 31.75 11 120
4 35.00 10 550
5 25.02 8 295
6 16.86 4 200
7 14.38 2 375
8 9.60 2 52
9 24.35 9 100
10 27.50 8 300
11 17.08 4 412
12 37.00 11 400
13 41.95 12 500
14 11.66 2 360
15 21.65 4 205
16 17.89 4 400
17 69.00 20 600
18 10.30 1 585
19 34.93 10 540
20 46.59 15 250
21 44.88 15 290
22 54.12 16 510
23 56.63 17 590
24 22.13 6 100
25 21.15 5 400

Given the data above, we arrive at the following values:

n = 25, and all sums below are taken over i = 1, 2, \ldots, 25:

\sum y_i = 725.82 \qquad \sum x_{i1} = 206 \qquad \sum x_{i2} = 8,294

\sum x_{i1}^2 = 2,396 \qquad \sum x_{i2}^2 = 3,531,848 \qquad \sum x_{i1} x_{i2} = 77,177

\sum x_{i1} y_i = 8,008.37 \qquad \sum x_{i2} y_i = 274,811.31

For this particular case, the normal equations are:

25\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{25} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{25} x_{i2} = \sum_{i=1}^{25} y_i

\hat{\beta}_0 \sum_{i=1}^{25} x_{i1} + \hat{\beta}_1 \sum_{i=1}^{25} x_{i1}^2 + \hat{\beta}_2 \sum_{i=1}^{25} x_{i1} x_{i2} = \sum_{i=1}^{25} x_{i1} y_i

\hat{\beta}_0 \sum_{i=1}^{25} x_{i2} + \hat{\beta}_1 \sum_{i=1}^{25} x_{i2} x_{i1} + \hat{\beta}_2 \sum_{i=1}^{25} x_{i2}^2 = \sum_{i=1}^{25} x_{i2} y_i

Inserting the computed values into the normal equations, we have:

25\hat{\beta}_0 + 206\hat{\beta}_1 + 8,294\hat{\beta}_2 = 725.82

206\hat{\beta}_0 + 2,396\hat{\beta}_1 + 77,177\hat{\beta}_2 = 8,008.37

8,294\hat{\beta}_0 + 77,177\hat{\beta}_1 + 3,531,848\hat{\beta}_2 = 274,811.31

We have three unknowns (β̂0, β̂1, β̂2) and three equations. Solving this system of equations gives the following values for the three unknowns:

\hat{\beta}_0 = 2.264 \qquad \hat{\beta}_1 = 2.744 \qquad \hat{\beta}_2 = 0.013

Therefore, the fitted (multiple linear) regression equation is:

\hat{y} = 2.264 + 2.744 x_1 + 0.013 x_2

The equation above can now be used to predict pull strength for
pairs of values of the regressor variables wire length and die
height.
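As a quick check, the three normal equations above can also be solved numerically. The following is a minimal Python sketch, assuming numpy; the prediction point (wire length 8, die height 275) is a hypothetical example.

```python
import numpy as np

# Coefficient matrix and right-hand side of the three normal equations
# (values computed from the wire bond data above).
A = np.array([
    [25.0,   206.0,    8294.0],
    [206.0,  2396.0,   77177.0],
    [8294.0, 77177.0,  3531848.0],
])
b = np.array([725.82, 8008.37, 274811.31])

# Solve for the least squares estimates (beta0_hat, beta1_hat, beta2_hat).
beta_hat = np.linalg.solve(A, b)
print(beta_hat.round(3))  # approximately [2.264, 2.744, 0.013]

# Use the fitted equation to predict pull strength at a hypothetical point.
x1, x2 = 8.0, 275.0
y_hat = beta_hat[0] + beta_hat[1] * x1 + beta_hat[2] * x2
print(round(y_hat, 2))
```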

III. Matrix Approach to Multiple Linear Regression

In fitting a multiple regression model, it is much more convenient to express the mathematical operations using matrix notation. Suppose that there are k regressor variables and n observations, and that the model relating the regressors to the response variable y is:

y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i \qquad i = 1, 2, \ldots, n

This model is a system of n equations that can be expressed in matrix notation as

y = X\beta + \varepsilon

where

y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \qquad
X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix} \qquad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} \qquad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}

To solve for the values of β, we use the matrix equation:

\hat{\beta} = (X'X)^{-1} X'y

EXAMPLE # 1: Let us use the matrix approach in coming up with the fitted regression model using the data in the previous example.
X = \begin{bmatrix}
1 & 2 & 50 \\
1 & 8 & 110 \\
1 & 11 & 120 \\
1 & 10 & 550 \\
1 & 8 & 295 \\
1 & 4 & 200 \\
1 & 2 & 375 \\
1 & 2 & 52 \\
1 & 9 & 100 \\
1 & 8 & 300 \\
1 & 4 & 412 \\
1 & 11 & 400 \\
1 & 12 & 500 \\
1 & 2 & 360 \\
1 & 4 & 205 \\
1 & 4 & 400 \\
1 & 20 & 600 \\
1 & 1 & 585 \\
1 & 10 & 540 \\
1 & 15 & 250 \\
1 & 15 & 290 \\
1 & 16 & 510 \\
1 & 17 & 590 \\
1 & 6 & 100 \\
1 & 5 & 400
\end{bmatrix}
\qquad
y = \begin{bmatrix}
9.95 \\ 24.45 \\ 31.75 \\ 35.00 \\ 25.02 \\ 16.86 \\ 14.38 \\ 9.60 \\ 24.35 \\ 27.50 \\ 17.08 \\ 37.00 \\ 41.95 \\
11.66 \\ 21.65 \\ 17.89 \\ 69.00 \\ 10.30 \\ 34.93 \\ 46.59 \\ 44.88 \\ 54.12 \\ 56.63 \\ 22.13 \\ 21.15
\end{bmatrix}

The X'X matrix is (please check!):

X'X = \begin{bmatrix} 25 & 206 & 8,294 \\ 206 & 2,396 & 77,177 \\ 8,294 & 77,177 & 3,531,848 \end{bmatrix}

and the X'y vector is (please check!):

X'y = \begin{bmatrix} 725.82 \\ 8,008.37 \\ 274,811.31 \end{bmatrix}

The least squares estimates are found by:

\hat{\beta} = (X'X)^{-1} X'y

which gives

\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} =
\begin{bmatrix} 2.26379143 \\ 2.74426964 \\ 0.01252781 \end{bmatrix}

Therefore, the fitted regression model is:

\hat{y} = 2.264 + 2.744 x_1 + 0.013 x_2

which is the same as what we got previously.
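For comparison, a minimal Python sketch of the matrix approach, assuming numpy and using the wire bond data from the table, might look like the following.

```python
import numpy as np

# Wire bond data from the table above.
wire_length = np.array([2, 8, 11, 10, 8, 4, 2, 2, 9, 8, 4, 11, 12,
                        2, 4, 4, 20, 1, 10, 15, 15, 16, 17, 6, 5], dtype=float)
die_height = np.array([50, 110, 120, 550, 295, 200, 375, 52, 100, 300, 412, 400, 500,
                       360, 205, 400, 600, 585, 540, 250, 290, 510, 590, 100, 400], dtype=float)
pull_strength = np.array([9.95, 24.45, 31.75, 35.00, 25.02, 16.86, 14.38, 9.60, 24.35,
                          27.50, 17.08, 37.00, 41.95, 11.66, 21.65, 17.89, 69.00, 10.30,
                          34.93, 46.59, 44.88, 54.12, 56.63, 22.13, 21.15])

# Model matrix X = [1, x1, x2] and response vector y.
X = np.column_stack([np.ones_like(wire_length), wire_length, die_height])
y = pull_strength

# Least squares estimates from the normal equations (X'X) beta_hat = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # approximately [2.26379143, 2.74426964, 0.01252781]
```

Solving the normal equations with np.linalg.solve avoids explicitly forming the inverse (X'X)^{-1}, which is generally preferred numerically; the result is the same β̂ as the formula above.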

Estimating σ²

Just as in simple linear regression, it is important to estimate σ² in a multiple regression model. In simple linear regression, the equation was

\hat{\sigma}^2 = \frac{SS_E}{n - 2}
Remember that the reason we subtracted 2 from n is that we estimated two parameters in our regression model, β0 and β1. In multiple linear regression, however, we estimate more than two parameters, so we adjust the previous equation to take this into account. Thus, a more general equation for estimating the variance of the error term is

\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{SS_E}{n - p}

where p is the number of parameters estimated in the model, that is, the number of regressor coefficients plus the intercept (p = k + 1).
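As an illustrative continuation of the previous sketch (the arrays X, y, and beta_hat are assumed to be already defined there), the error variance estimate for the wire bond model could be computed as follows.

```python
# Residuals: observed minus fitted values.
residuals = y - X @ beta_hat

# Error sum of squares and the variance estimate, with n = 25 observations
# and p = 3 estimated parameters (intercept plus two regressor coefficients).
sse = np.sum(residuals ** 2)
n, p = X.shape
sigma2_hat = sse / (n - p)
print(sigma2_hat)
```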

EXAMPLE # 2: The electric power consumed each month by a chemical plant is thought to be related to the average ambient temperature (x1), the number of days in the month (x2), the average product purity (x3), and the tons of product produced (x4). The past year's historical data are available and are presented in the Excel file. Use the Data Analysis function in Excel to come up with a regression analysis of the problem.
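The Excel Data Analysis tool is the intended method for this example. Purely as an optional cross-check, a rough Python equivalent might look like the sketch below; pandas and numpy are assumptions, and the file name power_consumption.xlsx and its column names are hypothetical placeholders for the module's Excel file.

```python
import numpy as np
import pandas as pd

# Hypothetical file and column names standing in for the module's Excel data.
data = pd.read_excel("power_consumption.xlsx")
y = data["power"].to_numpy(dtype=float)
X = np.column_stack([
    np.ones(len(data)),
    data[["temperature", "days", "purity", "tons_produced"]].to_numpy(dtype=float),
])

# Least squares estimates via the normal equations, as in the wire bond example.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # intercept followed by the coefficients of x1, x2, x3, x4
```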

IV. Hypothesis Tests in Multiple Linear Regression

In multiple linear regression problems, there are certain tests of hypotheses about the model parameters that are useful in measuring model adequacy. As in the simple linear regression model, a major assumption for these hypothesis tests is that the error terms are NID(0, σ²).
