
MECHANICAL ENGINEERING SYSTEMS LABORATORY

Group 02

Asst. Prof. Dr. E. İlhan KONUKSEVEN


REGRESSION ANALYSIS
The regression problem deals with the relationship between the frequency distribution of one (dependent) variable and one or more (independent) variables, which are held fixed at each of several values.

In statistics, “regression” means simply average relationship. Thus, between two variables y and x, the regression of variable y on variable x implies a relationship between

• the average of the values of the variable y for a given value of the variable x,
and
• the value of the variable x.
[Figure: regression of y on x in the y versus x plane]
In the practice of engineering experimentation, regression analysis is used to estimate the “best” empirical constants, with their respective confidence limits, to fit a mathematical model to a set of measurement data.

Once a mathematical model is so established between the dependent variable y and the independent variable(s) x, it can then be used to predict y for new values of x, by treating x as a continuous variable in the implied interpolation process.
[Figure: the fitted model provides a continuous relationship between y and x]
LINEAR REGRESSION
It is the regression in which the model used is linear.

THE MODEL IS LINEAR

y = ax + b
CURVILINEAR REGRESSION
It is the regression in which the model used is nonlinear in x, whether a polynomial or not.

THE MODEL IS NON-LINEAR

y = f(x)
EXAMPLE

y = a x^b e^{cx}
MULTIVARIATE REGRESSION

MULTIPLE INDEPENDENT VARIABLES

It is the regression in which there exist multiple independent variables, used in a linear model

y = ax + bz + c
y = a1 x1 + a2 x2 + b

or in a nonlinear model

y = axz + bw + c
y = a(1 − x1²) x2 + b x3
METHOD OF LEAST SQUARES
It is a general method used in a very broad class of engineering problems, such as
• curve fitting,
• parameter estimation,
• system identification, and
• static and dynamic optimization.
Its major advantage over other techniques is that it results in a set of linear algebraic equations in terms of the unknown model parameters if these parameters appear linearly in the mathematical model.
IF THERE ARE n MEASUREMENTS x1, x2, x3, …, xn, THEN THE BEST ESTIMATE OF THIS DATA, x_best, IS THE VALUE FOR WHICH

E = \sum_{i=1}^{n} (x_i - x_{best})^2

IS MINIMIZED.
The minimization of E with respect to x_best requires that

\frac{dE}{dx_{best}} = 0

\frac{dE}{dx_{best}} = \frac{d}{dx_{best}} \sum_{i=1}^{n} (x_i - x_{best})^2 = -2 \sum_{i=1}^{n} (x_i - x_{best}) = 0

\sum_{i=1}^{n} x_i - n\,x_{best} = 0

x_{best} = \frac{1}{n} \sum_{i=1}^{n} x_i

THIS IS THE ARITHMETIC MEAN: the best estimate is nothing but the arithmetic mean of the data set given.
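The result above can be checked numerically with a short sketch (not part of the original lecture; the measurement values are made up for illustration): the sum of squared deviations E(m) is smallest when m equals the arithmetic mean.

```python
# Least-squares estimate of a single quantity: the value m that
# minimizes E(m) = sum_i (x_i - m)^2 is the arithmetic mean.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # illustrative measurements

def sse(m, xs):
    """Sum of squared deviations E(m)."""
    return sum((x - m) ** 2 for x in xs)

x_best = sum(xs) / len(xs)  # arithmetic mean

# E is no smaller at any other candidate value than at the mean.
assert all(sse(x_best, xs) <= sse(c, xs) for c in [0.0, 1.0, 1.7, 2.0, 3.5])
print(x_best)  # 1.75
```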
LET US APPLY THE METHOD OF LEAST SQUARES TO LINEAR REGRESSION
Let (x1, y1), (x2, y2), …, (xn, yn) constitute n paired data points. We seek the best linear regression of y on x in the form y = ax + b.

[Figure: data point i at (x_i, y_i) and the linear regression line y = ax + b in the y versus x plane]

y = ax + b;  at x_i, the regressed value is y_{ti} = a x_i + b

E = \sum_{i=1}^{n} (y_i - y_{ti})^2 = \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2

where y_i − (a x_i + b) is the vertical distance between the data point i and the regressed value of y (= a x_i + b) at x_i in the y versus x plane.

IN ORDER TO MINIMIZE E WITH RESPECT TO a AND b (THE BEST VALUES):

\frac{dE}{da} = 0, \quad \frac{dE}{db} = 0
a = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}

b = \frac{\left( \sum_{i=1}^{n} y_i \right) \left( \sum_{i=1}^{n} x_i^2 \right) - \left( \sum_{i=1}^{n} x_i y_i \right) \left( \sum_{i=1}^{n} x_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}
or, by defining the averages

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \quad \overline{x^2} = \frac{1}{n} \sum_{i=1}^{n} x_i^2, \quad \overline{xy} = \frac{1}{n} \sum_{i=1}^{n} x_i y_i

a = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \quad b = \frac{\overline{x^2}\,\bar{y} - \bar{x}\,\overline{xy}}{\overline{x^2} - \bar{x}^2}

or, in matrix form,

\begin{pmatrix} a \\ b \end{pmatrix} = \frac{1}{\overline{x^2} - \bar{x}^2} \begin{pmatrix} 1 & -\bar{x} \\ -\bar{x} & \overline{x^2} \end{pmatrix} \begin{pmatrix} \overline{xy} \\ \bar{y} \end{pmatrix}

Note also that b = \bar{y} - a\,\bar{x}.
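The averaged-form formulas above can be sketched in a few lines of code (an illustration, not code from the lecture); the data passed in is the experiment table that appears later in the notes.

```python
# Sketch of the closed-form least-squares fit y = a*x + b using the
# averaged quantities defined above.
def linear_fit(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    x2bar = sum(x * x for x in xs) / n               # mean of x_i^2
    xybar = sum(x * y for x, y in zip(xs, ys)) / n   # mean of x_i*y_i
    a = (xybar - xbar * ybar) / (x2bar - xbar ** 2)
    b = ybar - a * xbar                              # equivalent to the b formula
    return a, b

a, b = linear_fit([0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
                  [0.71, 1.33, 1.68, 1.88, 2.31, 2.48])
print(round(a, 3), round(b, 3))  # 0.685 0.533
```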
EXAMPLE: Let the following data show the result of an 
experiment as 6 pairs of values.
i 1 2 3 4 5 6
xi 0.5 1.0 1.5 2.0 2.5 3.0
yi 0.71 1.33 1.68 1.88 2.31 2.48

[Figure: scatter plot of the 6 data points, x from 0.0 to 3.5, y from 0.00 to 3.00]
EXAMPLE
i    1     2     3     4     5     6
xi   0.5   1.0   1.5   2.0   2.5   3.0
yi   0.71  1.33  1.68  1.88  2.31  2.48

xi     yi      xi·yi    xi²
0.5    0.71    0.355    0.25
1.0    1.33    1.33     1.00
1.5    1.68    2.52     2.25
2.0    1.88    3.76     4.00
2.5    2.31    5.775    6.25
3.0    2.48    7.44     9.00
Σ 10.5 10.39   21.18    22.75

a = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2} = \frac{6(21.18) - (10.5)(10.39)}{6(22.75) - (10.5)^2} = 0.685

b = \frac{(\sum y_i)(\sum x_i^2) - (\sum x_i y_i)(\sum x_i)}{n \sum x_i^2 - (\sum x_i)^2} = \frac{(10.39)(22.75) - (21.18)(10.5)}{6(22.75) - (10.5)^2} = 0.533
[Figure: data points with the regression line y = 0.685x + 0.533]

y  0.533  0.685x
CORRELATION COEFFICIENT
It is a measure of the degree of linear correlation existing between y and x. It is defined as:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}

or

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}

or

r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sqrt{\left( \overline{x^2} - \bar{x}^2 \right) \left( \overline{y^2} - \bar{y}^2 \right)}}

or, if the regression coefficient a has already been calculated,

r = a\,\frac{s_x}{s_y}
THE CORRELATION COEFFICIENT r ALWAYS LIES BETWEEN −1 AND +1.

IF AND ONLY IF ALL DATA POINTS LIE ON THE REGRESSION LINE, THEN r = ±1.

IF r = 0, THE REGRESSION DOES NOT EXPLAIN ANYTHING ABOUT THE VARIATION OF y, and the regression line is horizontal; that is, y_t = b = \bar{y}.
COEFFICIENT OF DETERMINATION

r^2 = \frac{\text{variation due to regression}}{\text{total variation}}

That is, 100·r² percent of the total dispersion of the data points y_i from their overall mean value \bar{y} can be explained by the existing regression between y and x.

A negative correlation coefficient (r<0) is simply an indication 
of an inverse correlation between y and x, leading to a negative 
slope (a<0) for the regression line.
Standard Deviation of Data From the Regression Line
(Standard Error of Estimate):

The deviation of each data point from the regression line is written as:

y_i - y_{ti} = y_i - (a x_i + b)

Then, as a measure of the vertical scatter (dispersion) of the data, the standard error of estimate is defined as:

s_{y,x} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (y_i - y_{ti})^2}

Note that s_{y,x}^2 can be considered as the estimate of the variance of y left unexplained by the regression of y on x. It is defined with n − 2 rather than n − 1 in the denominator because two degrees of freedom are used in estimating a and b.
The following are two equivalent expressions for s_{y,x}, used in its calculation if a regression has already been done:

s_{y,x} = \sqrt{\frac{n-1}{n-2} \left( s_y^2 - a^2 s_x^2 \right)} \qquad s_{y,x} = \sqrt{\frac{n-1}{n-2}} \; s_y \sqrt{1 - r^2}

For the last example, s_{y,x} = 0.133.
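The quoted value can be checked directly from the definition (a sketch; the rounded coefficients a = 0.685, b = 0.533 from the worked example are reused):

```python
# Standard error of estimate for the worked example, from the definition.
import math

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.71, 1.33, 1.68, 1.88, 2.31, 2.48]
n = len(xs)
a, b = 0.685, 0.533  # regression coefficients found earlier

# Residual sum of squares over (n - 2) degrees of freedom.
sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
syx = math.sqrt(sse / (n - 2))
print(round(syx, 3))  # 0.133
```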

Standard Error of Slope:

s_a = \frac{s_{y,x}}{s_x \sqrt{n-1}} = \frac{s_{y,x}}{\sqrt{n}} \cdot \frac{1}{\sqrt{\overline{x^2} - \bar{x}^2}}

For the last example, s_a = 0.064.


Standard Error of y-Intercept:

s_b = s_{y,x} \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{(n-1)\, s_x^2}} = \frac{s_{y,x}}{\sqrt{n}} \cdot \sqrt{\frac{\overline{x^2}}{\overline{x^2} - \bar{x}^2}}

For the last example, s_b = 0.124.
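Both quoted values follow from the averaged forms of the formulas (a sketch; s_{y,x} = 0.133 from the example is reused):

```python
# Standard errors of the slope and y-intercept for the worked example.
import math

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
n = len(xs)
syx = 0.133                       # standard error of estimate from above
xbar = sum(xs) / n
x2bar = sum(x * x for x in xs) / n

s_a = (syx / math.sqrt(n)) / math.sqrt(x2bar - xbar ** 2)
s_b = (syx / math.sqrt(n)) * math.sqrt(x2bar / (x2bar - xbar ** 2))
print(round(s_a, 3), round(s_b, 3))  # 0.064 0.124
```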

Standard Error of Mean Value of y_t at any x:

s_{yt} = \frac{s_{y,x}}{\sqrt{n}} \sqrt{1 + \frac{(x - \bar{x})^2}{\overline{x^2} - \bar{x}^2}}

For the last example, s_{yt} = 0.096 at x = 3.


Standard Error of Estimation of y_t at any x:

s_{ye} = \frac{s_{y,x}}{\sqrt{n}} \sqrt{1 + n + \frac{(x - \bar{x})^2}{\overline{x^2} - \bar{x}^2}}

For the last example, s_{ye} = 0.164 at x = 3.
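The two standard errors at x = 3 differ only by the extra n under the square root, which accounts for the scatter of a single new observation about the mean regressed value. A sketch for the worked example (s_{y,x} = 0.133 reused):

```python
# Standard errors of the mean regressed value (s_yt) and of a new
# estimate (s_ye) at x = 3 for the worked example.
import math

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
n = len(xs)
syx = 0.133
xbar = sum(xs) / n
x2bar = sum(x * x for x in xs) / n
x = 3.0

common = (x - xbar) ** 2 / (x2bar - xbar ** 2)
s_yt = (syx / math.sqrt(n)) * math.sqrt(1 + common)      # mean value
s_ye = (syx / math.sqrt(n)) * math.sqrt(1 + n + common)  # new estimate
print(round(s_yt, 3), round(s_ye, 3))  # 0.096 0.164
```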

Confidence Limits:

Assume a Gaussian distribution with a degree of confidence of 90 % → z = 1.645.

Confidence of Slope:

Δa = ± z s_a

For the last example: a = 0.685 ± 1.645 × 0.064
a = 0.685 ± 0.105

[Figure: regression line with the 90 % confidence band of the slope]
Confidence of y-Intercept:

Δb = ± z s_b

For the last example: b = 0.533 ± 1.645 × 0.124
b = 0.533 ± 0.204

[Figure: regression line with the 90 % confidence band of the y-intercept]
Confidence of Mean Value of y_t at x = 3:

Δy_t = ± z s_{yt}

For the last example: y_t = 2.588 ± 1.645 × 0.096
y_t = 2.588 ± 0.158

Confidence of Estimation of y_t at x = 3
(i.e., the size of the band at x = 3 which is expected to include 90 % of the new data points):

Δy_e = ± z s_{ye}

For the last example: y_e = 2.588 ± 1.645 × 0.164
y_e = 2.588 ± 0.270
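All of the quoted confidence limits are simply z times the corresponding standard errors, centered on the fitted values. A sketch collecting them for the worked example (the standard errors computed earlier are reused as inputs):

```python
# 90 % confidence limits for the worked example: half-widths are
# z times the corresponding standard errors.
z = 1.645                  # Gaussian, 90 % degree of confidence
a, b = 0.685, 0.533        # fitted slope and intercept
s_a, s_b = 0.064, 0.124    # standard errors of slope and intercept
s_yt, s_ye = 0.096, 0.164  # standard errors at x = 3

yt = a * 3 + b             # regressed mean value at x = 3
half_widths = {name: z * s for name, s in
               [("a", s_a), ("b", s_b), ("yt", s_yt), ("ye", s_ye)]}
print(round(yt, 3))
for name, h in half_widths.items():
    print(name, round(h, 3))
```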
