by Ken Black
Chapter 16
Building Multiple Regression Models
Learning Objectives
- Analyze and interpret nonlinear variables in multiple regression analysis.
- Understand the role of qualitative variables and how to use them in multiple regression analysis.
- Learn how to build and evaluate multiple regression models.
- Learn how to detect influential observations in regression analysis.
First-order with two independent variables:
  Y = β0 + β1X1 + β2X2

Second-order with one independent variable:
  Y = β0 + β1X + β2X²

Second-order with an interaction term:
  Y = β0 + β1X1 + β2X2 + β3X1X2

Second-order with two independent variables:
  Y = β0 + β1X1 + β2X2 + β3X1² + β4X2² + β5X1X2
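All of these models are linear in the parameters, so ordinary least squares still applies once the squared and interaction columns are constructed. A minimal sketch with hypothetical data (the variable names and values are illustrative, not from the text):

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x + 0.5x^2 (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.5, 7.0, 11.5, 17.0, 23.5, 31.0])

# Second-order model with one independent variable:
#   y = b0 + b1*x + b2*x^2, fit by ordinary least squares.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b[0], b[1], b[2] estimate the intercept, linear, and squared terms.
```

An interaction column is built the same way: add `x1 * x2` as an extra column of the design matrix before fitting.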
Regression output, one-predictor model:

Regression Statistics
  Multiple R           0.933
  R Square             0.870
  Adjusted R Square    0.858
  Standard Error       51.10
  Observations         13

ANOVA
  Source        df     MS        F       Significance F
  Regression     1     192395    73.69   0.000
  Residual      11     2611
  Total         12

[Scatter plot: Sales vs. Number of Mfg. Reps. Squared]
Regression output after adding the squared term:

Regression Statistics
  Multiple R           0.986
  R Square             0.973
  Adjusted R Square    0.967
  Standard Error       24.593
  Observations         13

Coefficients (as reported):
  t Stat    P-value
   0.73     0.481
  -1.65     0.131
   6.12     0.000

ANOVA:  F = 177.79,  Significance F = 0.000
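The Adjusted R Square figures in these outputs follow directly from R Square, the sample size, and the number of predictors. A quick check, assuming n = 13 with one and two predictors respectively, as the outputs indicate:

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
def adj_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

simple    = adj_r2(0.870, 13, 1)  # one-predictor model -> about 0.858
quadratic = adj_r2(0.973, 13, 2)  # model with squared term -> about 0.967
```

Both values match the Adjusted R Square lines reported above, to rounding.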
Tukey's four-quadrant approach: let the direction of the bulge in the scatter plot guide the re-expression.

- Move up the ladder in x: toward x², x³, ...
- Move down the ladder in x: toward √x, log x, -1/x
- Move up the ladder in y: toward y², y³, ...
- Move down the ladder in y: toward √y, log y, -1/y
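The ladder can be explored mechanically: apply each candidate re-expression and keep the one that makes the relationship most nearly linear. A sketch with hypothetical data, using absolute correlation as a rough straightness measure (the data and the `straightness` helper are my own illustration, not from the text):

```python
import numpy as np

# Hypothetical curved data: y grows roughly like x^2, so moving
# "up the ladder" on x should straighten the relationship.
x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
y = x**2 + np.array([0.1, -0.2, 0.15, 0.0, -0.1, 0.2, -0.05, 0.1])

def straightness(u, v):
    """Absolute Pearson correlation: closer to 1 means more linear."""
    return abs(np.corrcoef(u, v)[0, 1])

# Candidate x-transformations from the ladder.
ladder = {"x": x, "sqrt(x)": np.sqrt(x), "log(x)": np.log(x),
          "-1/x": -1.0 / x, "x^2": x**2}
best = max(ladder, key=lambda name: straightness(ladder[name], y))
```

For this data the winner is the x² rung, as the four-quadrant rule predicts.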
  Stock 1   Stock 2   Stock 3
    41        36        35
    39        36        35
    38        38        32
    45        51        41
    41        52        39
    43        55        55
    47        57        52
    49        58        54
    41        62        65
    35        70        77
    36        72        75
    39        74        74
    33        83        81
    28       101        92
    31       107        91
Model without interaction:
  Y = β0 + β1X1 + β2X2

Model with interaction:
  Y = β0 + β1X1 + β2X2 + β3X3,  where X3 = X1X2 (interaction term)
Minitab output, model without the interaction term:

R-Sq = 47.2%

Analysis of Variance
  Source        DF     SS       MS       F      P
  Regression     2   224.29   112.15    5.37   0.022
  Error         12   250.64    20.89
  Total         14   474.93

Minitab output, model with the interaction term:

  Predictor      Coef        StDev       T
  Constant      12.046       9.312       1.29
  Stock 2        0.8788      0.2619      3.36
  Stock 3        0.2205      0.1435      1.54
  Inter         -0.009985    0.002314   -4.31

S = 2.909   R-Sq = 80.4%   R-Sq(adj) = 25.1%

Analysis of Variance
  Source        DF     SS       MS       F      P
  Regression     3   381.85   127.28   15.04   0.000
  Error         11    93.09     8.46
  Total         14   474.93
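As an arithmetic check (the numbers are from the ANOVA tables above; the code is mine), R-Sq and F can be recovered from the sums of squares and mean squares:

```python
# R^2 = SS(Regression) / SS(Total);  F = MS(Regression) / MS(Error)
ss_total = 474.93

r2_no_inter   = 224.29 / ss_total     # model without interaction -> ~47.2%
r2_with_inter = 381.85 / ss_total     # model with interaction    -> ~80.4%

f_no_inter   = 112.15 / 20.89         # -> ~5.37
f_with_inter = 127.28 / 8.46          # -> ~15.04
```

Adding the interaction term raises R-Sq from about 47% to about 80%, which is why its t value (-4.31) is so clearly significant.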
Regression model for an exponential relationship:

  Y = b0 · b1^X

Taking logarithms of both sides gives a first-order (linear) form:

  log Y = log b0 + X log b1

i.e.  Y' = b0' + b1'X,  where:  Y' = log Y,  b0' = log b0,  b1' = log b1
Seven companies (1-7), with:
  Y = Sales ($ million/year)
  X = Advertising ($ million/year)
Regression output for the log-transformed model:

Regression Statistics
  Multiple R           0.990
  R Square             0.980
  Adjusted R Square    0.977
  Standard Error       0.054
  Observations         7

Coefficients: Intercept, X

ANOVA
  Source        df     MS        F        Significance F
  Regression     1     0.7392    250.36   0.000
  Residual       5     0.0030
  Total          6
Fitted model (base-10 logs):
  log Y = 2.900364 + 0.475127X

For X = 2:
  log Y = 2.900364 + 0.475127(2) = 3.850618
  Y = antilog(3.850618) = 7089.5

Back-transforming the coefficients:
  b0 = antilog(2.900364) = 794.99427
  b1 = antilog(0.475127) = 2.986256

so the fitted model in original units is Y = (794.99427)(2.986256)^X.
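The antilog arithmetic above can be reproduced directly (base-10 logs, as the antilog values indicate):

```python
# Fitted model on the log scale: log10(Y) = 2.900364 + 0.475127 * X
b0_log, b1_log = 2.900364, 0.475127

x = 2
log_y = b0_log + b1_log * x      # 3.850618
y_hat = 10 ** log_y              # antilog -> about 7089.5

# Back-transforming the coefficients gives the multiplicative form
# Y = b0 * b1**X:
b0 = 10 ** b0_log                # about 794.99
b1 = 10 ** b1_log                # about 2.9863
```

Both routes agree: 10^(2.900364 + 0.475127·X) equals (794.99)(2.9863)^X.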
R-Sq = 89.0%

Analysis of Variance:  DF — Regression 2, Error 12, Total 14

[Plot: separate fitted lines for Males and Females]
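A qualitative variable such as gender (Males/Females) enters the model as a 0/1 dummy variable; its coefficient shifts the intercept, producing two parallel fitted lines. A minimal sketch with hypothetical data (the names and values are illustrative, not from the text):

```python
import numpy as np

# Hypothetical data generated from y = 1 + x + 2*d, where d is a
# dummy variable (1 for group "B", 0 for group "A").
x     = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
group = np.array(["A", "B", "A", "B", "A", "B", "A", "B"])
y     = np.array([2., 5., 4., 7., 6., 9., 8., 11.])

d = (group == "B").astype(float)   # dummy coding: B -> 1, A -> 0

# Model: y = b0 + b1*x + b2*d  -> two parallel lines, b2 apart.
A = np.column_stack([np.ones_like(x), x, d])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Here b[2] estimates the vertical gap between the two group lines; a qualitative variable with c categories needs c - 1 such dummies.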
Data (22 observations; response Y and predictors X1-X5):

  Y      X1     X2      X3       X4     X5
  55.7   74.3    83.5    598.6   21.7   13.30
  55.7   72.5   114.0    610.0   20.7   13.42
  52.8   70.5   172.5    654.6   19.2   13.52
  57.3   74.4   191.1    684.9   19.1   13.53
  59.7   76.3   250.9    697.2   19.2   13.80
  60.2   78.1   276.4    670.2   19.1   14.04
  62.7   78.9   255.2    781.1   19.7   14.41
  59.6   76.0   251.1    829.7   19.4   15.46
  56.1   74.0   272.7    823.8   19.2   15.94
  53.5   70.8   282.8    838.1   17.8   16.65
  53.3   70.5   293.7    782.1   16.1   17.14
  54.5   74.1   327.6    895.9   17.5   17.83
  54.0   74.0   383.7    883.6   16.5   18.20
  56.2   74.3   414.0    890.3   16.1   18.27
  56.7   76.9   455.3    918.8   16.6   19.20
  58.7   80.2   527.0    950.3   17.1   19.87
  59.9   81.3   529.4    980.7   17.3   20.31
  60.6   81.3   576.9   1029.1   17.8   21.02
  60.2   81.1   612.6    996.0   17.7   21.69
  60.2   82.1   618.8    997.5   17.8   21.68
  60.6   83.9   610.3    945.4   18.2   21.04
  60.9   85.6   640.4   1033.5   18.9   21.48
Stepwise Regression
1. Perform k simple regressions and select the best one as the initial model.
2. Evaluate each variable not currently in the model; if none meets the entry criterion, stop.
3. Add the best such variable to the model; re-evaluate the variables already in the model and drop any that are no longer significant.
4. Return to step 2.
Forward Selection
Like stepwise, except variables are not reevaluated after entering the model
Backward Elimination
Start with the full model (all k predictors). If all predictors are significant, stop; otherwise, eliminate the least significant predictor, refit, and repeat.
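The forward-selection idea above can be sketched with a greedy R² criterion; the `min_gain` threshold and the data below are my own illustrative stand-ins for the significance test described in the procedure:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X (intercept included)."""
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def forward_selection(X, y, min_gain=0.01):
    """Add the predictor that most improves R^2 until no candidate
    adds at least min_gain (a simple proxy for an entry criterion)."""
    chosen, best_r2 = [], 0.0
    while True:
        candidates = [j for j in range(X.shape[1]) if j not in chosen]
        if not candidates:
            return chosen
        r2, j = max((r_squared(X[:, chosen + [j]], y), j) for j in candidates)
        if r2 - best_r2 < min_gain:
            return chosen
        chosen.append(j)
        best_r2 = r2

# Hypothetical data: y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=50)
selected = forward_selection(X, y)
```

With these data the procedure picks exactly the two informative columns. Full stepwise regression would additionally re-test `chosen` after each addition, as described above.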
Simple regressions of Y on each candidate predictor:

  Dependent   Independent     t       R-Sq
  Y           X2             4.43    45.0%
  Y           X3             3.91    38.9%
  Y           X4             1.08     4.6%
  Y           X5             3.54    34.2%
Stepwise Regression: Response is Coiler on 5 predictors, with N = 26

  Step            1         2
  Constant     13.075     7.140

  Seconds       0.580     0.772
    T-Value     11.77     11.91
    P-Value      0.000     0.000

  Fuel Rate              -0.52
    T-Value               -3.75
    P-Value                0.001

  S              1.52      1.22
  R-Sq          85.24     90.83
Multicollinearity
Condition that occurs when two or more of the independent variables of a multiple regression model are highly correlated
Problems caused by multicollinearity:
- The estimates of the regression coefficients are difficult to interpret.
- t values for the regression coefficients are inordinately small.
- Standard deviations of the regression coefficients are overestimated.
- The sign of a predictor variable's coefficient may be the opposite of what is expected.
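One common diagnostic for this condition (not shown in the slides) is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. A sketch with hypothetical data, in which one predictor is nearly a copy of another:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X. Values well
    above 10 are a common rule-of-thumb signal of multicollinearity."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ b
        tss = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r2 = 1 - resid @ resid / tss
        out.append(1.0 / (1.0 - r2))
    return out

# Hypothetical predictors: x2 is nearly a copy of x1, so both of
# their VIFs blow up, while the independent x3 stays near 1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=40)
x2 = x1 + rng.normal(scale=0.01, size=40)
x3 = rng.normal(size=40)
vifs = vif(np.column_stack([x1, x2, x3]))
```

A simpler first look is the correlation matrix of the predictors, but VIF also catches the case where one predictor is a combination of several others.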
Copyright 2008 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond that permitted in section 117 of the 1976 United States Copyright Act without express permission of the copyright owner is unlawful. Request for further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information herein.