Professional Documents
Culture Documents
ENGINEERING SYSTEMS
LABORATORY
Group 02
Regression of y on x
x
In the practice of engineering experimentation, the
regression analysis is used to estimate the “best”
empirical constants, with their respective confidence
limits, to fit a mathematical model to a set of
measurement data.
CONTINUOUS
RELATIONSHIP
x
y
x
LINEAR REGRESSION
It is the regression in which the model used is
linear
y ax b
CURVILINEAR REGRESSION
y f x
CURVILINEAR REGRESSION
It is the regression in which the model used is
nonlinear (polynomial or not)
EXAMPLE
cx
y axbe
MULTIVARIATE REGRESSION
y ax bz c
y = a 1 x 1 + a2 x 2 + b
or in a nonlinear model
y axz bw c
y = a(1-x12)x2 + bx3
METHOD OF LEAST SQUARES
It is a general method used in a very broad class of engineering problems
like
curve fitting,
parameter estimation,
system identification, and
static and dynamic optimization
Its major advantage over other techniques is that it results in a set of
linear algebraic equations in terms of unknown model parameters if
these parameters appear linearly in the mathematical model.
IF THERE ARE (n) MEASUREMENTS
x1, x2, x3,……….. xn THEN THE BEST ESTIMATE OF THIS
DATA xbest THAT WILL SATISFY THE FOLLOWING
CONDITION :
n
E xi xbest
2
IS MINIMIZED
i 1
n
E x i
i 1
x best 2
The minimization of E with respect to xbest requires that
dE
0
dx best
n
d ( xi xbest ) 2 n
dE
i 1
2 ( xi xbest )0
dxbest dxbest i 1
n n n
x x
i 1
i
i 1
best xi nxbest 0
i 1
which is nothing but the arithmetic mean of the data set given.
n
x
i 1
i n.x best 0
x i THIS IS THE
xbest i 1
ARITHMETIC
n MEAN
LET US APPLY
THE METHOD OF LEAST SQUARES TO
LINEAR REGRESSION
Let (x1,y1), (x2,y2), ….., (xn,yn) constitute n paired data points. A best
linear regression of y on x as y= ax + b
y Data Point i
yi (x i,yi)
ax i+b
Linear Regression
of y on x
y = ax + b
xi x
y Data Point i
yi (x i,yi)
ax i+b
Linear Regression
of y on x
y = ax + b
xi x
y ax b at x i y ax i b
y Data Point i
yi (x i,yi)
ax i+b
Linear Regression
of y on x
y = ax + b
xi x
at xi y axi b
n n
E y y E y ax b
2 2
i i i
i 1 i 1
y – (axi+b)
is the vertical distance between the data point i and the
regressed value of y (=axi+b) at xi in the y versus x plane.
n
E y ax i b
2
i
i 1
IN ORDER TO MINIMIZE E
WITH RESPECT TO (y) BEST VALUE
dE dE
0 0
da db
n
n n
n xi y i xi y i
a i 1 i 1 i 1
2
2
n n
n x i x i
i 1 i 1
n n 2 n n
y i xi xi y i xi
b i 1 i 1 i 1 2
i 1
2
n n
n xi xi
i 1 i 1
or by defining
1 n 1 n
x xi y yi
n i 1 n i 1
1 n 2 1 n
x xi
2
xy xi yi
n i 1 n i 1
xy x. y x 2 . y x.xy
a 2 b
x x2 2
x x
2
or
a 1 x 1 y
b 2 2 2
x x x x xy
Note also that b y a x
EXAMPLE: Let the following data show the result of an
experiment as 6 pairs of values.
i 1 2 3 4 5 6
xi 0.5 1.0 1.5 2.0 2.5 3.0
yi 0.71 1.33 1.68 1.88 2.31 2.48
3,00
2,50
2,00
1,50
y
1,00
0,50
0,00
0,0 0,5 1,0 1,5 2,0 2,5 3,0 3,5
x
EXAMPLE
i 1 2 3 4 5 6
xi 0.5 1.0 1.5 2.0 2.5 3.0
yi 0.71 1.33 1.68 1.88 2.31 2.48
xi yi x iy i x i2
0,5 0,71 0,355 0,25
1 1,33 1,33 1
1,5 1,68 2,52 2,25
2 1,88 3,76 4
2,5 2,31 5,775 6,25
3 2,48 7,44 9
10,5 10,39 21,18 22,75
xi yi x iy i x i2
0,5 0,71 0,355 0,25
1 1,33 1,33 1
1,5 1,68 2,52 2,25
2 1,88 3,76 4
2,5 2,31 5,775 6,25
3 2,48 7,44 9
10,5 10,39 21,18 22,75
n
n n
n xi yi xi yi
i 1 i 1 i 1 6( 21.18) (10.5)(10.39)
a 0.685
n
2
n
2
6( 22.75) (10.5) 2
n xi xi
i 1 i 1
xi yi x iy i x i2
0,5 0,71 0,355 0,25
1 1,33 1,33 1
1,5 1,68 2,52 2,25
2 1,88 3,76 4
2,5 2,31 5,775 6,25
3 2,48 7,44 9
10,5 10,39 21,18 22,75
n n 2 n n
yi xi xi yi xi
(10.39)(22.75) (21.18)(10.5)
b i 1 i 1 i 1 2 i 1 0.533
n
2
n
6(22.75) (10.5) 2
n xi xi
i 1 i 1
3.00
2.50
2.00
1.50
y
Regression
1.00
Line
0.50 y=0.685x+0.533
0.00
0.0 0.5 1.0 1.5 2.0 2.5 3.0
x
y 0.533 0.685x
CORRELATION COEFFICIENT
It is a measure of the degree of linear correlation existing between
(y) and (x). It is defined as: n
( x x)(y i i y)
r i 1
(n 1) s x s y
or n _ _
(x i x )( y i y )
r i 1
n _ n _
( xi x )
i 1
2
i
( y
i 1
y ) 2
or
xy x . y
r
x 2
x
2
y 2
y
2
or if the regression coefficient a has already been calculated
r as x / s y
THE CORRELATION COEFFICIENT (r) ALWAYS LIES
BETWEEN ‐1 AND +1
IF AND ONLY IF ALL DATA POINTS LIE ON THE
REGRESSION LINE, THEN r = ±1
IF r = 0, THE REGRESSION DOES NOT EXPLAIN
ANYTHING ABOUT THE VARIATION OF y, and
the regression line is horizontal; that is y = b = y.
COEFFICIENT OF DETERMINATION
total variation
A negative correlation coefficient (r<0) is simply an indication
of an inverse correlation between y and x, leading to a negative
slope (a<0) for the regression line.
Standard Deviation of Data From the Regression Line
(Standard Error of Estimate):
1 n
s y,x yi yti 2
n 2 i 1
Note that s2y,x can be considered as the estimate of the variance of y left
unexplained by the regression of y on x. It is defined with n-2 rather than n-1 in
the denominator because two degrees of freedom are used in estimating a and
b.
The followings are two equivalent expressions for sy,x used in its
calculation if a regression is already done:
n 1 2 n 1 2
s y,x
n2
s y a 2 s x2 s y,x
n2
sy 1 r 2
For the last example, sy,x= 0.133.
s y,x s y,x 1
sa .
sx n 1 n x x
2 2
2
1 x s y,x x2
sb s y , x 2 .
n s x n 1 n x x
2 2
s y,x ( x x) 2
s yt . 1 2
n x x
2
s y,x ( x x) 2
s yt . 1n 2
n x x
2
Confidence Limits:
a = ± z sa
For the last example: a = 0.685 ± 1.645*0.064
a = 0.685 ± 0.105
3.00
2.50 Confidence
2.00 of Slope
1.50
y
1.00
0.50
0.00
0.0 0.5 1.0 1.5 2.0 2.5 3.0
x
Confidence of y-Intercept:
b = ± z sb
For the last example: b = 0.533 ± 1.645*0.124
b = 0.533 ± 0.204
3.00
2.50 Confidence of
2.00 y-Intercept
1.50
y
1.00
0.50
0.00
0.0 0.5 1.0 1.5 2.0 2.5 3.0
x
Confidence of Mean Value of yt at x(=3):
yt = ± z syt