Reference: Chapter 17 of Statistics for Management and Economics, 7th Edition, Gerald Keller.
Why regression?
1. Analyze specific relations between Y and X. How is Y related to X?
2. Forecast / predict the variable Y with the help of X. In this case, the relationship is assumed to be linear.
y = b0 + b1 x
A point prediction does not provide any information about how closely the predicted value will match the true value.
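A minimal sketch of a point prediction from y = b0 + b1 x, assuming b0 and b1 are the usual least-squares estimates. The function name `fit_line` and the data are hypothetical and are not the food company data.

```python
def fit_line(x, y):
    """Return least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
y_hat = b0 + b1 * 6  # point prediction at x = 6
print(round(b0, 2), round(b1, 2), round(y_hat, 2))
```

The result is a single number, which is why an interval is needed to say how far the true value may lie from it.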
Interval predictions
Prediction intervals for predicting y for a given value of x.
Confidence intervals for the average of y for a given x.
[Figure: scatter plot of SALES (vertical axis, 100 to 200) against ADVER]
Confidence interval for E[Y | X = x_g]:
$$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)\,s_x^2}}$$
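The two kinds of interval differ only in the term under the square root. The sketch below computes both half-widths from summary statistics, assuming the standard textbook forms: the confidence interval for E[Y | X = x_g] uses 1/n + (x_g - x̄)²/((n-1)s_x²), and the prediction interval adds 1 to that quantity. The function name, argument names, and numbers are hypothetical, and the critical value is passed in rather than looked up.

```python
import math


def interval_half_widths(t_crit, s_eps, n, x_g, x_bar, s_x2):
    """Half-widths of the confidence and prediction intervals at x_g.

    t_crit : critical value t_{alpha/2, n-2} (taken from a t table)
    s_eps  : standard error of estimate
    s_x2   : sample variance of x
    """
    core = 1 / n + (x_g - x_bar) ** 2 / ((n - 1) * s_x2)
    ci_half = t_crit * s_eps * math.sqrt(core)      # for the mean of y
    pi_half = t_crit * s_eps * math.sqrt(1 + core)  # for an individual y
    return ci_half, pi_half


# Hypothetical summary statistics, for illustration only
ci_half, pi_half = interval_half_widths(
    t_crit=2.0, s_eps=1.0, n=5, x_g=3.0, x_bar=3.0, s_x2=2.5
)
print(ci_half < pi_half)
```

The prediction interval is always the wider of the two, and both widen as x_g moves away from x̄.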
[Figure: plot of SALES (vertical axis, 100 to 200) against ADVER]
Residual Analysis
Most departures from the required conditions can be diagnosed through residual analysis.
Standardized residual = (residual - mean of the residuals) / (standard deviation of the residuals)
i-th standardized residual = $e_i / s_\varepsilon$
Food company example, 1st observation: when ADVER = 276, SALES = 115.0
predicted SALES = 118.0087
residual = 115.0 - 118.0087 = -3.008726
Standardized residual = -3.008726 / 16.09 = -0.1869935
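The arithmetic for the first observation can be checked with a few lines of code. This is a sketch: the function name is hypothetical, and 16.09 is taken from the example as the standard deviation of the residuals. The residual is observed minus predicted, so it is negative here because the observed SALES lies below the prediction.

```python
def standardized_residual(observed, predicted, s_eps):
    """Residual (observed - predicted) divided by its standard deviation."""
    return (observed - predicted) / s_eps


z = standardized_residual(observed=115.0, predicted=118.0087, s_eps=16.09)
print(round(z, 3))  # -0.187
```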
Nonnormality
Nonnormality of the residuals can be checked by drawing a histogram of the residuals.
Food Company Example
[Figure: histogram of the residuals; horizontal axis: Residual (-30 to 30), vertical axis: Frequency (0 to 3)]
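A histogram of this kind amounts to counting residuals in equal-width bins. The sketch below uses the same range as the figure (-30 to 30, in bins of width 10); the residual values and the function name are hypothetical.

```python
def bin_counts(residuals, lo=-30, hi=30, width=10):
    """Count residuals falling in [lo, lo+width), [lo+width, lo+2*width), ..."""
    n_bins = int((hi - lo) / width)
    counts = [0] * n_bins
    for r in residuals:
        if lo <= r < hi:  # values outside the range are ignored
            counts[int((r - lo) // width)] += 1
    return counts


# Hypothetical residuals, for illustration only
residuals = [-25, -3.0, 4, 12, 8, -11, 0.5]
counts = bin_counts(residuals)

# Print a rough text histogram, one row per bin
for i, c in enumerate(counts):
    left = -30 + 10 * i
    print(f"[{left:>4}, {left + 10:>4}) " + "*" * c)
```

A roughly symmetric, bell-shaped pattern of counts is consistent with normality; strong skew or several isolated bars far from zero suggests otherwise.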
Heteroscedasticity
Variance of the errors is not constant (Violation of the requirement)
Homoscedasticity
Variance of the errors is constant (No violation of the requirement)
Check: plot the residuals against the values of Y predicted by the model
[Figure: residuals (vertical axis, -20 to 20) plotted against predicted values (horizontal axis, 120 to 180)]
Outliers
An outlier is an observation that is unusually small or large. Several possibilities need to be investigated when an outlier is observed:
1. There was an error in measuring or recording the value.
2. The point does not belong in the sample.
3. The observation is valid.
Identify outliers from the scatter diagram. It is customary to suspect that an observation is an outlier if |standardized residual| > 2.
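The rule of thumb translates directly into a filter over the standardized residuals. A minimal sketch, with hypothetical values and a hypothetical function name:

```python
def flag_outliers(standardized_residuals, threshold=2.0):
    """Return the indices of observations with |standardized residual| > threshold."""
    return [
        i
        for i, z in enumerate(standardized_residuals)
        if abs(z) > threshold
    ]


# Hypothetical standardized residuals, for illustration only
z_values = [0.3, -2.4, 1.9, 2.0, 2.1]
suspects = flag_outliers(z_values)
print(suspects)  # [1, 4]
```

A flagged observation is only a suspect: it still has to be checked against the three possibilities listed above before it is corrected, dropped, or kept.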
Influential observations
[Figure: two scatter plots of SALES against ADVER]
If r = -1 (negative association) or r = +1 (positive association) every point falls on the regression line. If r = 0 there is no linear pattern.
The coefficient of correlation can be used to test for a linear relationship between two variables.
When no linear relationship exists between the two variables, ρ = 0. The hypotheses are:
H0: ρ = 0
H1: ρ ≠ 0
The statistic is Student t distributed with d.f. = n - 2, provided the variables are bivariate normally distributed.
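The formula for the statistic is not shown here. In its standard textbook form, with r the sample coefficient of correlation and n the sample size, it is:

$$t = r\,\sqrt{\frac{n-2}{1-r^{2}}}, \qquad \text{d.f.} = n - 2$$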
r = 0.826
Test statistic
[Figure: scatter plot of USINDEX and JAPINDEX]
Conclusion: There is sufficient evidence at α = 5% to infer that there is a linear relationship between the two variables.
Correlations

                                  USINDEX    JAPINDEX
USINDEX    Pearson Correlation    1          0.491**
           Sig. (2-tailed)        .          0.000
           N                      59         59
JAPINDEX   Pearson Correlation    0.491**    1
           Sig. (2-tailed)        0.000      .
           N                      59         59
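The reported correlation of 0.491 with N = 59 can be converted into the t statistic for H0: ρ = 0 using the standard formula t = r·sqrt((n - 2)/(1 - r²)). The sketch below is illustrative: the function name is hypothetical, and the critical value 2.002 is an approximate table value for 57 degrees of freedom at a two-tailed 5% level, supplied here as an assumption rather than taken from the output.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


t = correlation_t_statistic(r=0.491, n=59)

T_CRIT = 2.002  # approximate t_{0.025, 57}; an assumed table value
reject_h0 = abs(t) > T_CRIT
print(round(t, 2), reject_h0)
```

The statistic comes out at roughly 4.26, well beyond the critical value, which agrees with the two-tailed significance of 0.000 in the table.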