
MECHANICAL ENGINEERING SYSTEMS LABORATORY

Group 02

Asst. Prof. Dr. E. İlhan KONUKSEVEN


REGRESSION ANALYSIS
The regression problem deals with the relationship between the frequency distribution of one (dependent) variable and one or more (independent) variables, which are held fixed at each of several values.

In statistics, “regression” means simply average relationship. Thus, between two variables y and x, the regression of variable y on variable x implies a relationship between

• the average of the values of the variable y for a given value of the variable x,
and
• the value of the variable x.
[Figure: regression of y on x in the y versus x plane]
In the practice of engineering experimentation, regression analysis is used to estimate the “best” empirical constants, with their respective confidence limits, to fit a mathematical model to a set of measurement data.

Once a mathematical model is so established between the dependent variable y and the independent variable(s) x, it can then be used to predict y for new values of x, by treating x as a continuous variable in the implied interpolation process.
[Figure: the fitted model provides a continuous relationship between y and x]
LINEAR REGRESSION
It is the regression in which the model used is linear.

THE MODEL IS LINEAR

y = ax + b
CURVILINEAR REGRESSION
It is the regression in which the model used is nonlinear in x, whether a polynomial or not.

THE MODEL IS NON-LINEAR

y = f(x)
EXAMPLE

y = a x^b e^{cx}
MULTIVARIATE REGRESSION

MULTIPLE INDEPENDENT VARIABLES

It is the regression in which there exist multiple independent variables, used in a linear model

y = ax + bz + c
y = a1 x1 + a2 x2 + b

or in a nonlinear model

y = axz + bw + c
y = a(1 − x1²) x2 + b x3
METHOD OF LEAST SQUARES
It is a general method used in a very broad class of engineering problems, such as
• curve fitting,
• parameter estimation,
• system identification, and
• static and dynamic optimization.
Its major advantage over other techniques is that it results in a set of linear algebraic equations in terms of the unknown model parameters if these parameters appear linearly in the mathematical model.
IF THERE ARE n MEASUREMENTS x1, x2, x3, …, xn, THEN THE BEST ESTIMATE OF THIS DATA, x_best, IS THE VALUE FOR WHICH

E = \sum_{i=1}^{n} (x_i - x_{best})^2

IS MINIMIZED.
The minimization of E with respect to x_best requires that

\frac{dE}{dx_{best}} = 0

\frac{dE}{dx_{best}} = \frac{d}{dx_{best}} \sum_{i=1}^{n} (x_i - x_{best})^2 = -2 \sum_{i=1}^{n} (x_i - x_{best}) = 0

\sum_{i=1}^{n} x_i - n\,x_{best} = 0

x_{best} = \frac{1}{n} \sum_{i=1}^{n} x_i

THIS IS THE ARITHMETIC MEAN: the best estimate is nothing but the arithmetic mean of the data set given.
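The result above can be checked numerically with a short sketch (not part of the original lecture; the measurement values are made up for illustration): the sum of squared deviations E(m) is smallest when m equals the arithmetic mean.

```python
# Least-squares estimate of a single quantity: the value m that
# minimizes E(m) = sum_i (x_i - m)^2 is the arithmetic mean.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # illustrative measurements

def sse(m, xs):
    """Sum of squared deviations E(m)."""
    return sum((x - m) ** 2 for x in xs)

x_best = sum(xs) / len(xs)  # arithmetic mean

# E is no smaller at any other candidate value than at the mean.
assert all(sse(x_best, xs) <= sse(c, xs) for c in [0.0, 1.0, 1.7, 2.0, 3.5])
print(x_best)  # 1.75
```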
LET US APPLY THE METHOD OF LEAST SQUARES TO LINEAR REGRESSION
Let (x1, y1), (x2, y2), …, (xn, yn) constitute n paired data points. We seek the best linear regression of y on x in the form y = ax + b.

[Figure: data point i at (x_i, y_i) and the linear regression line y = ax + b in the y versus x plane]

y = ax + b;  at x_i, the regressed value is y_{ti} = a x_i + b

E = \sum_{i=1}^{n} (y_i - y_{ti})^2 = \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2

where y_i − (a x_i + b) is the vertical distance between the data point i and the regressed value of y (= a x_i + b) at x_i in the y versus x plane.

IN ORDER TO MINIMIZE E WITH RESPECT TO a AND b (THE BEST VALUES):

\frac{dE}{da} = 0, \quad \frac{dE}{db} = 0
a = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}

b = \frac{\left( \sum_{i=1}^{n} y_i \right) \left( \sum_{i=1}^{n} x_i^2 \right) - \left( \sum_{i=1}^{n} x_i y_i \right) \left( \sum_{i=1}^{n} x_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}
or, by defining the averages

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \quad \overline{x^2} = \frac{1}{n} \sum_{i=1}^{n} x_i^2, \quad \overline{xy} = \frac{1}{n} \sum_{i=1}^{n} x_i y_i

a = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \quad b = \frac{\overline{x^2}\,\bar{y} - \bar{x}\,\overline{xy}}{\overline{x^2} - \bar{x}^2}

or, in matrix form,

\begin{pmatrix} a \\ b \end{pmatrix} = \frac{1}{\overline{x^2} - \bar{x}^2} \begin{pmatrix} 1 & -\bar{x} \\ -\bar{x} & \overline{x^2} \end{pmatrix} \begin{pmatrix} \overline{xy} \\ \bar{y} \end{pmatrix}

Note also that b = \bar{y} - a\,\bar{x}.
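The averaged-form formulas above can be sketched in a few lines of code (an illustration, not code from the lecture); the data passed in is the experiment table that appears later in the notes.

```python
# Sketch of the closed-form least-squares fit y = a*x + b using the
# averaged quantities defined above.
def linear_fit(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    x2bar = sum(x * x for x in xs) / n               # mean of x_i^2
    xybar = sum(x * y for x, y in zip(xs, ys)) / n   # mean of x_i*y_i
    a = (xybar - xbar * ybar) / (x2bar - xbar ** 2)
    b = ybar - a * xbar                              # equivalent to the b formula
    return a, b

a, b = linear_fit([0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
                  [0.71, 1.33, 1.68, 1.88, 2.31, 2.48])
print(round(a, 3), round(b, 3))  # 0.685 0.533
```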
EXAMPLE: Let the following data show the result of an 
experiment as 6 pairs of values.
i 1 2 3 4 5 6
xi 0.5 1.0 1.5 2.0 2.5 3.0
yi 0.71 1.33 1.68 1.88 2.31 2.48

[Figure: scatter plot of the 6 data points, x from 0.0 to 3.5, y from 0.00 to 3.00]
EXAMPLE
i    1     2     3     4     5     6
xi   0.5   1.0   1.5   2.0   2.5   3.0
yi   0.71  1.33  1.68  1.88  2.31  2.48

xi     yi      xi·yi    xi²
0.5    0.71    0.355    0.25
1.0    1.33    1.33     1.00
1.5    1.68    2.52     2.25
2.0    1.88    3.76     4.00
2.5    2.31    5.775    6.25
3.0    2.48    7.44     9.00
Σ 10.5 10.39   21.18    22.75

a = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2} = \frac{6(21.18) - (10.5)(10.39)}{6(22.75) - (10.5)^2} = 0.685

b = \frac{(\sum y_i)(\sum x_i^2) - (\sum x_i y_i)(\sum x_i)}{n \sum x_i^2 - (\sum x_i)^2} = \frac{(10.39)(22.75) - (21.18)(10.5)}{6(22.75) - (10.5)^2} = 0.533
[Figure: data points with the regression line y = 0.685x + 0.533]

y  0.533  0.685x
CORRELATION COEFFICIENT
It is a measure of the degree of linear correlation existing between y and x. It is defined as:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}

or

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}

or

r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sqrt{\left( \overline{x^2} - \bar{x}^2 \right) \left( \overline{y^2} - \bar{y}^2 \right)}}

or, if the regression coefficient a has already been calculated,

r = a\,\frac{s_x}{s_y}
THE CORRELATION COEFFICIENT r ALWAYS LIES BETWEEN −1 AND +1.

IF AND ONLY IF ALL DATA POINTS LIE ON THE REGRESSION LINE, THEN r = ±1.

IF r = 0, THE REGRESSION DOES NOT EXPLAIN ANYTHING ABOUT THE VARIATION OF y, and the regression line is horizontal; that is, y_t = b = \bar{y}.
COEFFICIENT OF DETERMINATION

r^2 = \frac{\text{variation due to regression}}{\text{total variation}}

That is, 100·r² percent of the total dispersion of the data points y_i from their overall mean value \bar{y} can be explained by the existing regression between y and x.

A negative correlation coefficient (r<0) is simply an indication 
of an inverse correlation between y and x, leading to a negative 
slope (a<0) for the regression line.
Standard Deviation of Data From the Regression Line
(Standard Error of Estimate):

The deviation of each data point from the regression line is written as:

y_i - y_{ti} = y_i - (a x_i + b)

Then, as a measure of the vertical scatter (dispersion) of the data, the standard error of estimate is defined as:

s_{y,x} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (y_i - y_{ti})^2}

Note that s_{y,x}^2 can be considered as the estimate of the variance of y left unexplained by the regression of y on x. It is defined with n − 2 rather than n − 1 in the denominator because two degrees of freedom are used in estimating a and b.
The following are two equivalent expressions for s_{y,x}, used in its calculation if a regression has already been done:

s_{y,x} = \sqrt{\frac{n-1}{n-2} \left( s_y^2 - a^2 s_x^2 \right)} \qquad s_{y,x} = \sqrt{\frac{n-1}{n-2}} \; s_y \sqrt{1 - r^2}

For the last example, s_{y,x} = 0.133.
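The quoted value can be checked directly from the definition (a sketch; the rounded coefficients a = 0.685, b = 0.533 from the worked example are reused):

```python
# Standard error of estimate for the worked example, from the definition.
import math

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.71, 1.33, 1.68, 1.88, 2.31, 2.48]
n = len(xs)
a, b = 0.685, 0.533  # regression coefficients found earlier

# Residual sum of squares over (n - 2) degrees of freedom.
sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
syx = math.sqrt(sse / (n - 2))
print(round(syx, 3))  # 0.133
```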

Standard Error of Slope:

s_a = \frac{s_{y,x}}{s_x \sqrt{n-1}} = \frac{s_{y,x}}{\sqrt{n}} \cdot \frac{1}{\sqrt{\overline{x^2} - \bar{x}^2}}

For the last example, s_a = 0.064.


Standard Error of y-Intercept:

s_b = s_{y,x} \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{(n-1)\, s_x^2}} = \frac{s_{y,x}}{\sqrt{n}} \cdot \sqrt{\frac{\overline{x^2}}{\overline{x^2} - \bar{x}^2}}

For the last example, s_b = 0.124.
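Both quoted values follow from the averaged forms of the formulas (a sketch; s_{y,x} = 0.133 from the example is reused):

```python
# Standard errors of the slope and y-intercept for the worked example.
import math

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
n = len(xs)
syx = 0.133                       # standard error of estimate from above
xbar = sum(xs) / n
x2bar = sum(x * x for x in xs) / n

s_a = (syx / math.sqrt(n)) / math.sqrt(x2bar - xbar ** 2)
s_b = (syx / math.sqrt(n)) * math.sqrt(x2bar / (x2bar - xbar ** 2))
print(round(s_a, 3), round(s_b, 3))  # 0.064 0.124
```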

Standard Error of Mean Value of y_t at any x:

s_{yt} = \frac{s_{y,x}}{\sqrt{n}} \sqrt{1 + \frac{(x - \bar{x})^2}{\overline{x^2} - \bar{x}^2}}

For the last example, s_{yt} = 0.096 at x = 3.


Standard Error of Estimation of y_t at any x:

s_{ye} = \frac{s_{y,x}}{\sqrt{n}} \sqrt{1 + n + \frac{(x - \bar{x})^2}{\overline{x^2} - \bar{x}^2}}

For the last example, s_{ye} = 0.164 at x = 3.
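The two standard errors at x = 3 differ only by the extra n under the square root, which accounts for the scatter of a single new observation about the mean regressed value. A sketch for the worked example (s_{y,x} = 0.133 reused):

```python
# Standard errors of the mean regressed value (s_yt) and of a new
# estimate (s_ye) at x = 3 for the worked example.
import math

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
n = len(xs)
syx = 0.133
xbar = sum(xs) / n
x2bar = sum(x * x for x in xs) / n
x = 3.0

common = (x - xbar) ** 2 / (x2bar - xbar ** 2)
s_yt = (syx / math.sqrt(n)) * math.sqrt(1 + common)      # mean value
s_ye = (syx / math.sqrt(n)) * math.sqrt(1 + n + common)  # new estimate
print(round(s_yt, 3), round(s_ye, 3))  # 0.096 0.164
```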

Confidence Limits:

Assume a Gaussian distribution with a degree of confidence of 90 % → z = 1.645.

Confidence of Slope:

Δa = ± z s_a

For the last example: a = 0.685 ± 1.645 × 0.064
a = 0.685 ± 0.105

[Figure: regression line with the 90 % confidence band of the slope]
Confidence of y-Intercept:

Δb = ± z s_b

For the last example: b = 0.533 ± 1.645 × 0.124
b = 0.533 ± 0.204

[Figure: regression line with the 90 % confidence band of the y-intercept]
Confidence of Mean Value of y_t at x = 3:

Δy_t = ± z s_{yt}

For the last example: y_t = 2.588 ± 1.645 × 0.096
y_t = 2.588 ± 0.158

Confidence of Estimation of y_t at x = 3
(i.e., the size of the band at x = 3 which is expected to include 90 % of the new data points):

Δy_e = ± z s_{ye}

For the last example: y_e = 2.588 ± 1.645 × 0.164
y_e = 2.588 ± 0.270
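All of the quoted confidence limits are simply z times the corresponding standard errors, centered on the fitted values. A sketch collecting them for the worked example (the standard errors computed earlier are reused as inputs):

```python
# 90 % confidence limits for the worked example: half-widths are
# z times the corresponding standard errors.
z = 1.645                  # Gaussian, 90 % degree of confidence
a, b = 0.685, 0.533        # fitted slope and intercept
s_a, s_b = 0.064, 0.124    # standard errors of slope and intercept
s_yt, s_ye = 0.096, 0.164  # standard errors at x = 3

yt = a * 3 + b             # regressed mean value at x = 3
half_widths = {name: z * s for name, s in
               [("a", s_a), ("b", s_b), ("yt", s_yt), ("ye", s_ye)]}
print(round(yt, 3))
for name, h in half_widths.items():
    print(name, round(h, 3))
```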
