Original Title: 31. Regression Analysis 1-1-11

Uploaded by Ani Krishna

Regression Analysis

Enables fit of linear or exponential function to data. The goal in regression analysis is the development of a statistical model that can be used to predict the values of a dependent or response variable from the values of the independent variable(s).

Linear fits are most common. For exponential functions, the data must be transformed.

If we have N pairs of data (xi, yi), we seek to fit a straight line through the data of the form:

$$y = a_0 + a_1 x$$

Determine the constants a0 and a1 such that the distance between the actual y data and the fitted (predicted) line is minimized:

$$a_0 = \frac{\sum x_i \sum x_i y_i - \sum x_i^2 \sum y_i}{\left(\sum x_i\right)^2 - N \sum x_i^2}, \qquad a_1 = \frac{\sum x_i \sum y_i - N \sum x_i y_i}{\left(\sum x_i\right)^2 - N \sum x_i^2}$$

Each xi is assumed to be error free. All the error is assumed to be in the y values.

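Raw Data

| | yi | xi | xi yi | xi² |
|---|---|---|---|---|
| | 1.2 | 1 | 1.2 | 1 |
| | 2 | 1.6 | 3.2 | 2.56 |
| | 2.4 | 3.4 | 8.16 | 11.56 |
| | 3.5 | 4 | 14 | 16 |
| | 3.5 | 5.2 | 18.2 | 27.04 |
| Sum | 12.6 | 15.2 | 44.76 | 58.16 |

Seeking an equation of the form y = a0 + a1 x:

$$a_0 = \frac{(15.2)(44.76) - (58.16)(12.6)}{(15.2)^2 - (5)(58.16)} = 0.878$$

y = 0.878 + 0.540x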
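The summation formulas for the straight-line fit can be checked with a short script. This is a minimal sketch in Python using the five data pairs from the raw-data table; the function name `fit_line` is an illustrative choice, not something from the slides.

```python
# Least squares straight-line fit using the summation formulas.
# Data are the five (x, y) pairs from the raw-data table.
x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]


def fit_line(x, y):
    """Return (a0, a1) for y = a0 + a1*x from the sums of x, y, x*y, x^2."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    denom = sum_x ** 2 - n * sum_xx  # common denominator of both formulas
    a0 = (sum_x * sum_xy - sum_xx * sum_y) / denom
    a1 = (sum_x * sum_y - n * sum_xy) / denom
    return a0, a1


a0, a1 = fit_line(x, y)
print(f"y = {a0:.3f} + {a1:.3f} x")
```

The same sums (15.2, 12.6, 44.76, 58.16) appear in the hand calculation, so intermediate values can be compared line by line.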

The coefficient of determination (R²) measures the goodness of fit: the proportion of the variation in the y values that is associated with the variation in the x variable in the regression. It is the ratio of the explained variation to the total variation:

$$R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$$

where ŷi is the value predicted by the fitted line at xi and ȳ is the mean of the y data.

R² = 1: perfect fit (good prediction).
R² = 0: no correlation between x and y.
For engineering data, R² will normally be quite high (0.80-0.90 or higher).
A low value might indicate that some important variable was not considered but is affecting the results.
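As an illustration, R² can be computed both as explained/total variation and as one minus residual/total variation. The sketch below assumes the five-point data set from the raw-data table; the helper name `r_squared_both_ways` is hypothetical.

```python
# R^2 computed two ways for a least squares straight-line fit.
x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]


def r_squared_both_ways(x, y):
    """Return (explained/total, 1 - residual/total) for the fitted line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a1 = sxy / sxx               # slope
    a0 = y_mean - a1 * x_mean    # intercept
    y_hat = [a0 + a1 * xi for xi in x]
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_explained = sum((yh - y_mean) ** 2 for yh in y_hat)
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return ss_explained / ss_total, 1.0 - ss_residual / ss_total


r2_explained, r2_residual = r_squared_both_ways(x, y)
print(f"R^2 = {r2_explained:.4f} (explained/total), {r2_residual:.4f} (1 - residual/total)")
```

The two values agree for a least squares line that includes an intercept, which is the case considered here.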

The standard error of estimate (SEE or Syx) is a statistical measure of how well the best-fit line represents the data. This is, effectively, the standard deviation of the differences between the data points and the best-fit line.

It provides an estimate of the scatter/random error in the data about the fitted line. This is analogous to the standard deviation for sample data. It has the same units as y. Two degrees of freedom are lost in calculating the coefficients a0 and a1.

$$S_{yx} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{N - 2}}$$

where ŷi is the value predicted by the best-fit line at xi.
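A minimal sketch of the standard error of estimate, again assuming the five-point data set from the raw-data table. The function name `standard_error_of_estimate` is illustrative only.

```python
import math

# Standard error of estimate: root of the residual sum of squares over N - 2.
x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]


def standard_error_of_estimate(x, y):
    """Fit y = a0 + a1*x by least squares and return SEE (same units as y)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    ss_residual = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_residual / (n - 2))  # two degrees of freedom lost


see = standard_error_of_estimate(x, y)
print(f"SEE = {see:.4f}")
```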

Variation in the data is assumed to be normally distributed and due to random causes.
Random variation is assumed to exist in the y values only; the x values are assumed to be error free.
Since error has been minimized in the y direction, an erroneous conclusion may be made if x is estimated based on a value for y.
For power law or exponential relationships, the data need to be transformed before carrying out linear regression analysis. (As we will discuss later, the method of least squares can also be applied to nonlinear functional relationships.)

In Excel, use Chart >> Add Trendline to obtain the coefficients. Use the functions RSQ() and STEYX() to determine R² and SEE.

[Figure: Output, Volts versus Length, cm]

Linear regression is a standard feature of statistical programs and most spreadsheet programs. It is only necessary to input the x and y data. The remaining calculations are performed immediately.

Performs linear regression only.
Nonlinear relationships must be transformed.
Calculates the slope, intercept, SEE, and the upper and lower confidence intervals for the slope and intercept.
Does not produce any graphical output on the user's plot.
Does not update automatically.
The user must interpret the results.

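Y = m1 X + b

| Torque, N-m (Y) | RPM (X) | Y Predicted | Residual | Residual/SEE = Residual/sey |
|---|---|---|---|---|
| 4.89 | 100 | 4.998433207 | 0.108433207 | 0.17558474 |
| 4.77 | 201 | 4.559896053 | -0.210103947 | -0.340219088 |
| 3.79 | 298 | 4.138726707 | 0.348726707 | 0.564689451 |
| 3.76 | 402 | 3.687163697 | -0.072836303 | -0.117943051 |
| 2.84 | 500 | 3.261652399 | 0.421652399 | 0.682777249 |
| 4.12 | 601 | 2.823115245 | -1.296884755 | -2.100031702 |
| 2.05 | 699 | 2.397603947 | 0.347603947 | 0.562871377 |
| 1.61 | 799 | 1.963408745 | 0.353408745 | 0.572271025 |

Outlier: the point at 601 RPM (Residual/SEE = -2.10).

Layout of the array returned by LINEST:

| | |
|---|---|
| m1 | b |
| se1 | seb |
| r^2 | sey |
| F | df |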

=LINEST(A2:A9,B2:B9,TRUE,TRUE)
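For readers working outside Excel, the quantities in the LINEST array can be reproduced with standard least squares expressions. The following is a sketch in plain Python, not Excel's implementation; the function name `linest_like` and the dictionary keys are illustrative, and the standard-error formulas for the slope and intercept are the usual textbook ones.

```python
import math

# Torque (Y) versus RPM (X), all eight points including the suspected outlier.
rpm = [100, 201, 298, 402, 500, 601, 699, 799]
torque = [4.89, 4.77, 3.79, 3.76, 2.84, 4.12, 2.05, 1.61]


def linest_like(y, x):
    """Return slope, intercept and fit statistics for Y = m1*X + b."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m1 = sxy / sxx
    b = y_mean - m1 * x_mean
    ss_resid = sum((yi - (m1 * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_reg = ss_total - ss_resid
    df = n - 2                                  # two coefficients estimated
    sey = math.sqrt(ss_resid / df)              # standard error of estimate
    se1 = sey / math.sqrt(sxx)                  # standard error of the slope
    seb = sey * math.sqrt(1.0 / n + x_mean ** 2 / sxx)  # standard error of the intercept
    return {
        "m1": m1, "b": b, "se1": se1, "seb": seb,
        "r2": ss_reg / ss_total, "sey": sey,
        "F": ss_reg / (ss_resid / df), "df": df,
    }


stats = linest_like(torque, rpm)
for name in ("m1", "b", "se1", "seb", "r2", "sey", "F", "df"):
    print(f"{name:>4} = {stats[name]:.6g}")
```

The argument order (y first, then x) mirrors the LINEST call, where the first range holds the known y values.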

With the point at 601 RPM removed, the fit is repeated on the remaining seven points:

| Torque, N-m (Y) | RPM (X) | Y Predicted | Residual | Residual/SEE = Residual/sey |
|---|---|---|---|---|
| 4.89 | 100 | 5.000219168 | 0.110219168 | 0.504559919 |
| 4.77 | 201 | 4.504157858 | -0.265842142 | -1.21696881 |
| 3.79 | 298 | 4.02774254 | 0.23774254 | 1.088334807 |
| 3.76 | 402 | 3.516946736 | -0.243053264 | -1.112646171 |
| 2.84 | 500 | 3.03561992 | 0.19561992 | 0.895506407 |
| 2.05 | 699 | 2.058231795 | 0.008231795 | 0.037683406 |
| 1.61 | 799 | 1.567081983 | -0.042918017 | -0.196469559 |

Uncertainties on Regression

Confidence Interval for Regression Line
95% C.I. = TINV(α=0.05, ν=5)*SEE/SQRT(7)

Prediction Band for Regression Line
95% P.I. = TINV(α=0.05, ν=5)*SEE

Uncertainty in Slope
Δm1 = TINV(0.05,5)*se1

Uncertainty in Intercept
Δb = TINV(0.05,5)*seb

| Quantity | Value |
|---|---|
| SEE = sey | 0.218446143 |
| TINV(α=0.05, ν=5) | 2.570581835 |
| 95% C.I. | 0.212239784 |
| 95% P.I. | 0.561533687 |
| Δm1 | 0.000895789 |
| Δb | 0.438558582 |
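The confidence interval and prediction band half-widths follow directly from SEE and the t-value. The sketch below uses the SEE and TINV values quoted for the seven-point torque fit; the function name `half_widths` is hypothetical, and the t-value is hard-coded rather than computed.

```python
import math


def half_widths(see, t_value, n):
    """Return (C.I., P.I.) half-widths: t*SEE/sqrt(N) and t*SEE."""
    ci = t_value * see / math.sqrt(n)
    pi = t_value * see
    return ci, pi


# Values quoted for the seven-point fit: SEE = sey and TINV(0.05, 5).
ci, pi = half_widths(see=0.218446143, t_value=2.570581835, n=7)
print(f"95% C.I. = +/-{ci:.6f} N-m")
print(f"95% P.I. = +/-{pi:.6f} N-m")
```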

Not only do you want to obtain a curve-fit relationship, but you also want to establish a confidence interval for the equation, that is, a measure of the random uncertainty in the curve fit. Use ν = N - 2 in the determination of the t-value: two degrees of freedom are lost because m1 and b are determined.

$$CI = \Delta y = \pm t_{\alpha,\nu} \frac{S_{yx}}{\sqrt{N}} = \pm t_{\alpha,\nu} \frac{S_{ey}}{\sqrt{N}} = \pm t_{\alpha,\nu} \frac{SEE}{\sqrt{N}}$$

where

$$t_{\alpha,\nu} = \mathrm{TINV}(\alpha, \nu)$$

[Figure: Torque, N-m versus RPM, showing the data, the least squares fit, the CI at ±95%, and the prediction band at ±95%]

Prediction band, Δy: an approximate form and a more accurate form. The more accurate band is at a minimum at the mean and flares out at the low and high extremes.
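The flaring behaviour can be illustrated with the standard textbook expressions for the bands about a fitted line, which add a term in (x - x̄)²/Sxx under the square root. This is an assumption about what the "more accurate" form looks like, since the slide's own expression did not survive; the function names are illustrative and the t-value is the TINV(0.05, 5) figure quoted for the seven-point torque fit.

```python
import math

# Seven-point torque data (outlier at 601 RPM removed).
rpm = [100, 201, 298, 402, 500, 699, 799]
torque = [4.89, 4.77, 3.79, 3.76, 2.84, 2.05, 1.61]
T_VALUE = 2.570581835  # TINV(0.05, 5), hard-coded from the text

n = len(rpm)
x_mean = sum(rpm) / n
y_mean = sum(torque) / n
sxx = sum((x - x_mean) ** 2 for x in rpm)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(rpm, torque))
m1 = sxy / sxx
b = y_mean - m1 * x_mean
see = math.sqrt(sum((y - (m1 * x + b)) ** 2 for x, y in zip(rpm, torque)) / (n - 2))


def confidence_band(x):
    """Half-width of the confidence interval on the fitted line at x."""
    return T_VALUE * see * math.sqrt(1.0 / n + (x - x_mean) ** 2 / sxx)


def prediction_band(x):
    """Half-width of the prediction band for a new observation at x."""
    return T_VALUE * see * math.sqrt(1.0 + 1.0 / n + (x - x_mean) ** 2 / sxx)


for x in (100, x_mean, 799):
    print(f"x = {x:7.1f}  CI = +/-{confidence_band(x):.3f}  PI = +/-{prediction_band(x):.3f}")
```

At the mean of x the confidence band reduces to t·SEE/√N, the approximate value used elsewhere in these notes; away from the mean both bands widen.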

Expressions used in regression analysis

Sample standard deviation:

$$S_x = \left[ \frac{1}{N-1} \sum (x_i - \bar{x})^2 \right]^{1/2}$$

Sum of squares for evaluating CI & PI:

$$S_{xx} = \sum (x_i - \bar{x})^2$$

Standard error of estimate:

$$S_{yx} = \left[ \frac{\sum (y_i - \hat{y}_i)^2}{N-2} \right]^{1/2}$$

Slope, m

Intercept, b

Note 1: ν = n - 2.
Note 2: m and b are not independent variables. Therefore, do not apply RSS to y = mx + b to determine Δy. Instead, use the CI for the curve fit.

The method involves computing the ratio of the residuals (predicted - actual) to the standard error of estimate (sey = SEE).

1. Residuals = ypredicted - yactual at each xi.
2. Plot the ratio residuals/SEE for each xi. These are the standardized residuals.
3. Standardized residuals exceeding ±2 may be considered outliers. Assuming the residuals are normally distributed, you can expect that 95% of the residuals are in the range ±2 (that is, within 2 standard deviations of the best-fit line).
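The three steps above can be sketched in Python using the eight-point torque data. The function name `standardized_residuals` and the threshold constant are illustrative; the residual sign convention (predicted minus actual) follows the text.

```python
import math

rpm = [100, 201, 298, 402, 500, 601, 699, 799]
torque = [4.89, 4.77, 3.79, 3.76, 2.84, 4.12, 2.05, 1.61]
THRESHOLD = 2.0  # standardized residuals beyond +/-2 are flagged


def standardized_residuals(x, y):
    """Return (predicted - actual)/SEE at each x for a least squares line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m1 = sxy / sxx
    b = y_mean - m1 * x_mean
    residuals = [(m1 * xi + b) - yi for xi, yi in zip(x, y)]  # step 1
    see = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return [r / see for r in residuals]                       # step 2


ratios = standardized_residuals(rpm, torque)
outliers = [xi for xi, r in zip(rpm, ratios) if abs(r) > THRESHOLD]  # step 3
for xi, r in zip(rpm, ratios):
    print(f"RPM {xi:4d}: residual/SEE = {r:+.3f}")
print("Possible outliers at RPM:", outliers)
```

A flagged point is only a candidate for removal; the decision to drop it still rests with the analyst.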

Data Transformation

Commonly, test data do not show an approximate linear relationship between the dependent (Y) and independent (X) variables and a direct linear regression is not useful.

The form of the relationship expected between the dependent and independent variables is often known. The data need to be transformed prior to performing a linear regression. Transformations can often be accomplished by taking the common (base-10) or natural logarithm of one or both sides of the equation.

Common Transformations

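| Relationship | Plot Method | Transformed Equation | Transformed Intercept, b | Transformed Slope, m1 |
|---|---|---|---|---|
| y = αx^β | Log y vs. Log x (log-log plot) | Log(y) = Log(α) + β Log(x) | Log(α) | β |
| y = αx^β | Ln y vs. Ln x (log-log plot) | Ln(y) = Ln(α) + β Ln(x) | Ln(α) | β |
| y = αe^(βx) | Log y vs. x (semi-log plot) | Log(y) = Log(α) + β Log(e) x | Log(α) | β Log(e) |
| y = αe^(βx) | Ln y vs. x (semi-log plot) | Ln(y) = Ln(α) + βx | Ln(α) | β |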

Example

A velocity probe provides a voltage output, E, that is related to velocity, U, by the form E = α + βU^γ, where α, β, and γ are constants.

| U (ft/s) | Ei (V) |
|---|---|
| 0 | 3.19 |
| 10 | 3.99 |
| 20 | 4.3 |
| 30 | 4.48 |
| 40 | 4.65 |

[Figure: probe output (V) versus Velocity, ft/s]

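E = α + βU^γ (E = α = 3.19 at U = 0)

Log(E - 3.19) = Log(βU^γ)

Log(E - 3.19) = Log(β) + γ Log(U)

This has the form Y = b + m1 X, with Y = Log(E - 3.19), b = Log(β), m1 = γ, and X = Log(U).

Let's transform this:

| U (ft/s) | Ei (V) | X = Log(U) | Y = Log(E - 3.19) |
|---|---|---|---|
| 0 | 3.19 | | |
| 10 | 3.99 | 1.00 | -0.097 |
| 20 | 4.3 | 1.30 | 0.045 |
| 30 | 4.48 | 1.48 | 0.111 |
| 40 | 4.65 | 1.60 | 0.164 |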

SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.998723855 |
| R Square | 0.997449339 |
| Adjusted R Square | 0.996174009 |
| Standard Error | 0.01 |
| Observations | 4 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 0.038118269 | 0.038118 | 782.1106 | 0.00127614 |
| Residual | 2 | 9.74754E-05 | 4.87E-05 | | |
| Total | 3 | 0.038215745 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | -0.525 | 0.021056315 | -24.9274 | 0.001605 | -0.61547736 | -0.4342812 |
| X Variable 1 | 0.432 | 0.015438034 | 27.96624 | 0.001276 | 0.36531922 | 0.49816831 |

Y = -0.525 + 0.432X
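The transform, fit, and back-transform sequence for the velocity probe can be sketched as follows. This is a plain-Python illustration rather than the spreadsheet procedure used in the slides; the variable and function names are hypothetical, and α is taken as the reading at U = 0, as in the example.

```python
import math

# Velocity probe data: U in ft/s, E in volts.
u = [0, 10, 20, 30, 40]
e = [3.19, 3.99, 4.30, 4.48, 4.65]

alpha = e[0]  # E = alpha at U = 0

# Transform: X = log10(U), Y = log10(E - alpha). The U = 0 point is skipped
# because the logarithm is undefined there.
pairs = [(math.log10(ui), math.log10(ei - alpha)) for ui, ei in zip(u, e) if ui > 0]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

# Least squares straight line Y = intercept + slope * X.
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Back-transform: gamma is the slope, beta = 10**intercept.
gamma = slope
beta = 10.0 ** intercept


def predict(u_value):
    """Predicted probe voltage E = alpha + beta * U**gamma."""
    return alpha + beta * u_value ** gamma


print(f"Y = {intercept:.3f} + {slope:.3f} X")
print(f"E = {alpha:.2f} + {beta:.3f} U^{gamma:.3f}")
```

Because the fit minimizes error in Log(E - 3.19) rather than in E itself, the back-transformed curve is not the least squares fit in the original variables.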

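| U (ft/s) | Y predicted | Y+ | Y- |
|---|---|---|---|
| 10 | -0.0931 | -0.0781 | -0.1082 |
| 20 | 0.0368 | 0.0519 | 0.0218 |
| 30 | 0.1129 | 0.1279 | 0.0978 |
| 40 | 0.1668 | 0.1818 | 0.1518 |

Transform it back again:

| U (ft/s) | E | E+ | E- |
|---|---|---|---|
| 0 | 3.19 | 3.19 | 3.19 |
| 10 | 4.00 | 4.03 | 3.97 |
| 20 | 4.28 | 4.32 | 4.24 |
| 30 | 4.49 | 4.53 | 4.44 |
| 40 | 4.66 | 4.71 | 4.61 |

$$E = 3.19 + 0.298\,U^{0.432}$$

[Figure: Example 4.10, E, V versus U, ft/s, with the fitted curve]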

Regression analysis can also be performed in situations where there is more than one independent variable (multiple regression) or for polynomials of an independent variable (polynomial regression).

Polynomial regression seeks the form

$$Y = b + m_1 x + m_2 x^2 + \cdots + m_k x^k$$

and, more generally,

$$Y = b + m_1 x_1 + m_2 x_2 + m_3 x_3 + \cdots + m_k x_k$$

where x may represent several independent variables. For example: x1 = x1, x2 = x2, x3 = x1 · x2.
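Both forms are linear in the coefficients, so they can be fitted by solving the least squares normal equations. The sketch below is one way to do that in plain Python; it is not the spreadsheet method, the function names are hypothetical, and the data are synthetic values generated from known coefficients purely for illustration. The second example assumes a product of two variables as the third regressor.

```python
def solve(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_linear_model(columns, y):
    """Return [b, m1, ..., mk] for Y = b + m1*x1 + ... + mk*xk."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    # Normal equations: (A^T A) coeffs = A^T y
    ata = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    aty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    return solve(ata, aty)


# Polynomial regression: regressors are x and x^2 (synthetic data).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y_poly = [1.0 + 2.0 * xi + 3.0 * xi ** 2 for xi in x]
poly_coeffs = fit_linear_model([x, [xi ** 2 for xi in x]], y_poly)

# Multiple regression with a product term x3 = x1 * x2 (synthetic data).
x1 = [0.0, 1.0, 0.0, 1.0, 2.0, 2.0]
x2 = [0.0, 0.0, 1.0, 1.0, 1.0, 3.0]
x3 = [a * b for a, b in zip(x1, x2)]
y_multi = [0.5 + 1.0 * a - 2.0 * b + 0.25 * c for a, b, c in zip(x1, x2, x3)]
multi_coeffs = fit_linear_model([x1, x2, x3], y_multi)

print("polynomial:", [round(c, 6) for c in poly_coeffs])
print("multiple:  ", [round(c, 6) for c in multi_coeffs])
```

For high-order polynomials the normal equations become poorly conditioned, so a library routine based on QR or SVD is usually a safer choice in practice.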

SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.99964308 |
| R Square (R²) | 0.99928628 |
| Adjusted R Square | 0.99910785 |
| Standard Error (SEE = sey) | 0.02788582 |
| Observations (N) | 6 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 4.35502286 | 4.35502286 | 5600.45805 | 1.9107E-07 |
| Residual | 4 | 0.00311048 | 0.00077762 | | |
| Total | 5 | 4.35813333 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept (b) | 0.02952381 | 0.02018228 | 1.46285828 | 0.21733392 | -0.02651117 | 0.08555879 |
| X Variable 1 (slope m1) | 0.99771429 | 0.01333197 | 74.8362082 | 1.9107E-07 | 0.9606988 | 1.03472978 |

Lower 95% and Upper 95% are the lower and upper bounds for the coefficients. To obtain the ± bound, subtract the lower from the upper and divide by two.
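As a quick check of that rule, the ± bounds for the intercept and slope can be computed from the Lower 95% and Upper 95% columns. The dictionary layout and names below are illustrative only.

```python
# (lower 95%, upper 95%) bounds from the coefficient table.
bounds = {
    "intercept b": (-0.02651117, 0.08555879),
    "slope m1": (0.9606988, 1.03472978),
}

half_width = {}
midpoint = {}
for name, (lower, upper) in bounds.items():
    half_width[name] = (upper - lower) / 2.0  # the +/- bound
    midpoint[name] = (upper + lower) / 2.0    # recovers the coefficient itself
    print(f"{name} = {midpoint[name]:.8f} +/- {half_width[name]:.8f}")
```

The midpoint of each interval reproduces the tabulated coefficient, which confirms the interval is symmetric about the estimate.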
