Regression analysis
All these notions can be extended to the case with multiple predictors...
Veronika Czellar, HEC Paris

Statistics: 1. Descriptive statistics; 2. Foundations of inferential statistics; 3. Estimation and confidence intervals; 4. Testing statistical hypotheses; 5. Regression analysis
Example: we can use two predictors for Intel: S&P500 and inflation.

[Scatter plots: Intel vs. SP500 and Intel vs. Inflation]
The multiple regression model is
\[ y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, \dots, n. \]
The least squares (LS) estimates minimize the sum of squared deviations:
\[ (\hat\beta_0, \dots, \hat\beta_k) = \arg\min_{\beta_0, \dots, \beta_k} \sum_{i=1}^{n} \bigl( y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}) \bigr)^2 . \]
. . . but this requires the matrix form of the regression model, \( Y = X\beta + \varepsilon \), with
\[ Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}. \]
The LS estimator of \( \beta \) minimizes \( (Y - X\beta)^T (Y - X\beta) \) and is \( \hat\beta = (X^T X)^{-1} X^T Y \). No need to learn this slide by heart; we will use Excel to estimate the parameters.
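A minimal numerical sketch of the LS estimator \( \hat\beta = (X^T X)^{-1} X^T Y \); the data below are simulated, not the course's Intel/S&P500/inflation series:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2
predictors = rng.normal(size=(n, k))           # stand-ins for S&P500 and inflation
X = np.column_stack([np.ones(n), predictors])  # prepend the intercept column of 1s
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve the normal equations (X^T X) beta = X^T Y;
# in practice np.linalg.lstsq is the numerically safer route.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # close to beta_true
```

Excel's Regression tool (and any statistics package) performs exactly this computation behind the scenes.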
The R Square in the Excel output is higher than in the case with the single predictor S&P500. Question: what does R Square mean in multiple regression?
Proposition:
\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{S_{yy}}, \]
where \( S_{yy} = \sum_{i=1}^{n} (y_i - \bar y)^2 \) and \( \hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik} \) are the fitted values.
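The proposition can be sketched numerically; the data here are made up (not the Intel example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=0.5, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat                      # fitted values
Syy = np.sum((y - y.mean()) ** 2)         # total sum of squares
r_squared = 1 - np.sum((y - y_hat) ** 2) / Syy
print(round(r_squared, 3))
```

This is the same R Square number that Excel reports in its regression output.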
R² ranges from 0 to 1. A value near 0 indicates little linear association between the set of independent variables and the dependent variable; a value near 1 indicates a strong association. R² cannot decrease when an extra predictor is added to the model, and it will generally increase. R² can almost always be made very close to 1 by using a model with k close to n, even if many of the predictors contribute only marginally to the variation in y.
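A small simulation of this warning, with made-up data: starting from an intercept-only model of pure noise, each added junk predictor can only push R² up, and it climbs toward 1 as k approaches n.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
y = rng.normal(size=n)                   # pure noise: nothing to explain

def r2(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

X = np.ones((n, 1))                      # intercept only
r2_values = []
for _ in range(25):                      # k grows toward n
    X = np.column_stack([X, rng.normal(size=n)])  # add a junk predictor
    r2_values.append(r2(X, y))

print(all(b >= a - 1e-9 for a, b in zip(r2_values, r2_values[1:])))  # monotone
print(r2_values[-1])                     # large, despite y being noise
```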
The adjusted R² is
\[ R_a^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{S_{yy}} ; \]
the adjusted R² penalizes the addition of extraneous predictors to the model, and the adjusted R² is smaller than R².
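A sketch of both quantities on made-up data: adding a junk predictor can only raise R², while the adjustment factor \((n-1)/(n-k-1)\) typically pulls the adjusted R² down.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x1 = rng.normal(size=n)
junk = rng.normal(size=n)                  # unrelated to y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def fit_stats(X, y):
    n, p = X.shape
    k = p - 1                              # number of predictors (excl. intercept)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    syy = np.sum((y - y.mean()) ** 2)
    r2 = 1 - sse / syy
    adj = 1 - (n - 1) / (n - k - 1) * sse / syy
    return r2, adj

r2_small, adj_small = fit_stats(np.column_stack([np.ones(n), x1]), y)
r2_big, adj_big = fit_stats(np.column_stack([np.ones(n), x1, junk]), y)
print(r2_small, adj_small)
print(r2_big, adj_big)      # R^2 cannot go down; adjusted R^2 usually does
```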
Question: high values of R² suggest that the model fit is a useful one. But how large should this value be before we draw this conclusion?
Model utility F test (of H0: β1 = · · · = βk = 0) for the Intel example with two predictors: [Excel regression output]
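The model utility F test checks H0: β1 = · · · = βk = 0 using the standard statistic \( F = \frac{R^2/k}{(1-R^2)/(n-k-1)} \), which follows an F distribution with (k, n−k−1) degrees of freedom under H0. A minimal sketch on made-up data (the actual Intel numbers come from the Excel output):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 48, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.1, 0.8, -0.3]) + rng.normal(scale=0.5, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1 - np.sum((y - X @ beta) ** 2) / np.sum((y - y.mean()) ** 2)

f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
p_value = stats.f.sf(f_stat, k, n - k - 1)   # upper-tail probability
print(f_stat, p_value)  # a small p-value rejects H0: beta_1 = beta_2 = 0
```

This F statistic and its p-value ("Significance F") appear in the ANOVA section of Excel's regression output.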
Warning: if the F test results in the rejection of H0, it does not mean that all predictors are useful.
To test an individual coefficient, H0: βj = 0, use the statistic
\[ t = \frac{\hat\beta_j}{SE(\hat\beta_j)} \sim t_{n-k-1} \ \text{under } H_0, \]
where \( SE(\hat\beta_j) \) is the standard error of the coefficient \( \hat\beta_j \) (in matrix form, \( SE(\hat\beta_j)^2 = \hat\sigma^2 \, [(X^T X)^{-1}]_{jj} \)).
Example: do an individual test of each independent variable for the Intel regression with two predictors. Which variable would you consider eliminating? Use the 0.05 significance level.
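A sketch of the individual t tests on made-up data standing in for the Intel regression (two predictors, the second deliberately useless):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.2, 1.0, 0.0]) + rng.normal(scale=0.7, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - k - 1)          # estimate of the error variance
cov = s2 * np.linalg.inv(X.T @ X)         # sigma^2 (X^T X)^{-1}
se = np.sqrt(np.diag(cov))                # standard errors SE(beta_hat_j)
t_stats = beta / se
p_values = 2 * stats.t.sf(np.abs(t_stats), n - k - 1)  # two-sided p-values

for name, t, p in zip(["intercept", "x1", "x2"], t_stats, p_values):
    print(name, round(t, 2), round(p, 4))
```

These are the "t Stat" and "P-value" columns in Excel's coefficient table.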
Remark: if there is more than one nonsignificant variable, we should delete only one variable at a time. Each time we delete a variable, we need to rerun the regression and check the remaining variables. This method is called the backward stepwise regression method.
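The remark can be sketched as a loop, on made-up data with two informative and two noise predictors: drop the least significant variable, refit, and repeat until every remaining variable is significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 80
Z = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * Z[:, 0] - 1.5 * Z[:, 1] + rng.normal(size=n)  # cols 2, 3 are noise

def p_values(X, y):
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return 2 * stats.t.sf(np.abs(beta / se), n - p)

kept = list(range(4))                     # indices of predictors still in the model
while kept:
    X = np.column_stack([np.ones(n), Z[:, kept]])
    pvals = p_values(X, y)[1:]            # skip the intercept
    worst = int(np.argmax(pvals))
    if pvals[worst] <= 0.05:              # everything is significant: stop
        break
    kept.pop(worst)                       # delete ONE variable, then refit

print(kept)  # the informative predictors survive
```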
Example (continued): we would like to investigate the impact of GDP per capita and population growth on the increase of CO2 emissions.

Year2008: emissions of CO2 in 2008 (in million tons, \( y_{i,2008} \))
Year2009: emissions of CO2 in 2009 (in million tons, \( y_{i,2009} \))
GDP2009realgrowth: GDP real growth rate (in %, \( x_{i,1} \))
PopGrowth2009: population growth rate (in %, \( x_{i,2} \))
SquareGDP2009: squared GDP2009realgrowth (\( x_{i,1}^2 \))
SquarePopGrowth2009: squared PopGrowth2009 (\( x_{i,2}^2 \))

Fit the following model:
\[ \frac{y_{i,2009}}{y_{i,2008}} = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,1}^2 + \beta_4 x_{i,2}^2 + \varepsilon_i, \qquad i = 1, \dots, 65. \]
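A minimal sketch of building the squared predictors and fitting this model by least squares; the course uses real data for 65 countries, while all numbers below are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 65
gdp_growth = rng.normal(0.0, 3.0, size=n)     # stand-in for GDP2009realgrowth (%)
pop_growth = rng.normal(1.0, 1.0, size=n)     # stand-in for PopGrowth2009 (%)
ratio = (1.0 + 0.01 * gdp_growth + 0.002 * gdp_growth**2
         + 0.02 * pop_growth + rng.normal(scale=0.02, size=n))  # y2009 / y2008

# The squared terms are just additional columns of the design matrix:
X = np.column_stack([np.ones(n), gdp_growth, pop_growth,
                     gdp_growth**2, pop_growth**2])
beta, *_ = np.linalg.lstsq(X, ratio, rcond=None)
print(np.round(beta, 3))  # (beta_0, ..., beta_4)
```

Note that the model stays *linear in the parameters* even though it is quadratic in the predictors, so ordinary least squares (or Excel's Regression tool, with the squared columns added to the worksheet) applies unchanged.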
Model assumptions in multiple regression can be verified in the same way as in simple linear regression (see 5.2.8).
However, there is an additional requirement in multiple regression: the predictors should not be correlated...
5.3.9 Multicollinearity
Multicollinearity exists when independent variables are correlated. Several clues indicate problems with multicollinearity:
An independent variable known to be an important predictor ends up being not significant.
A regression coefficient that should have a positive sign turns out to be negative, or vice versa.
When an independent variable is added or removed, there is a drastic change in the values of the remaining coefficients.
For further details about linear regression, see Kutner, Nachtsheim, Neter and Li (2005), Applied Linear Statistical Models, 5th ed., McGraw-Hill, and Fox (2008), Applied Regression Analysis and Generalized Linear Models, 2nd ed., Sage Publications.
Thank you! (Gracias, Spasibo, Köszönöm)