
Mean model

FORECASTS, ERROR MEASURES, AND CONFIDENCE INTERVALS


MEAN MODEL (p=1)

Y     FCST     ERROR    ABS ERR   SQ ERR    LOWER 95%   UPPER 95%
45    53.667   -8.667   8.667     75.111    38.358      68.975
58    53.667    4.333   4.333     18.778    38.358      68.975
50    53.667   -3.667   3.667     13.444    38.358      68.975
54    53.667    0.333   0.333      0.111    38.358      68.975
62    53.667    8.333   8.333     69.444    38.358      68.975
53    53.667   -0.667   0.667      0.444    38.358      68.975

Number of data points (n)                     6
Mean (AVERAGE(Y))                             53.667
Sample standard deviation (STDEV(Y))          5.955
Std error of the mean (STDEV(Y)/SQRT(n))      2.431
Lower 95% limit for mean                      47.417
Upper 95% limit for mean                      59.916

Mean absolute error (MAE)                     4.333
Sum of Squared Errors (SSE)                   177.333
Mean Squared Error (SSE/(n-p))                35.467
RMSE (square root of MSE)                     5.955
Critical t-value (95%, n-p d.f.)              2.571

Confidence interval for the mean = Mean +/- (t-value)*(std. error of mean)
Confidence interval for a prediction = Mean +/- (t-value)*(RMSE)
(This is the approximation used in the LOWER 95% and UPPER 95% columns. The exact standard error of a forecast from the mean model is RMSE*SQRT(1+1/n), which also accounts for the error in estimating the mean.)
Note that RMSE for the mean model is just the sample standard deviation of the dependent variable,
...which is also the sample standard deviation of the errors in this case.
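As a cross-check, here is a minimal Python (3.8+) sketch that recomputes the mean-model worksheet from the six Y values. The helper name `mean_model_summary` is hypothetical, and the critical t-value is passed in by hand (2.570582 for 5 d.f.) because the standard library has no t quantile function. Besides the worksheet's prediction interval, it also returns the exact one, which inflates the RMSE by SQRT(1+1/n).

```python
import math
import statistics


def mean_model_summary(y, t_crit):
    """Recompute the mean-model sheet (p = 1: the only coefficient is the mean).

    t_crit is the two-sided 95% critical t-value with n-1 d.f., supplied by hand.
    """
    n = len(y)
    mean = statistics.fmean(y)
    sd = statistics.stdev(y)            # sample std dev (divides by n-1), like STDEV
    se_mean = sd / math.sqrt(n)         # std error of the mean
    errors = [yi - mean for yi in y]    # actual minus forecast
    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)
    mse = sse / (n - 1)                 # n - p with p = 1
    rmse = math.sqrt(mse)               # equals sd for the mean model
    exact_se_fcst = rmse * math.sqrt(1 + 1 / n)
    return {
        "n": n, "mean": mean, "sd": sd, "se_mean": se_mean,
        "mae": mae, "sse": sse, "mse": mse, "rmse": rmse,
        "ci_mean": (mean - t_crit * se_mean, mean + t_crit * se_mean),
        # the worksheet's prediction interval: mean +/- t * RMSE
        "pi_sheet": (mean - t_crit * rmse, mean + t_crit * rmse),
        # exact prediction interval: RMSE inflated by sqrt(1 + 1/n)
        "pi_exact": (mean - t_crit * exact_se_fcst, mean + t_crit * exact_se_fcst),
    }


summary = mean_model_summary([45, 58, 50, 54, 62, 53], t_crit=2.570582)
for key, value in summary.items():
    print(key, value)
```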

Simple regression model

FORECASTS AND ERROR MEASURES


SIMPLE REGRESSION MODEL (p=2)

                                      X        Y        FCST     ERROR    ABS ERR  SQ ERR
                                      18       45       50.252   -5.252   5.252    27.587
                                      25       58       59.215   -1.215   1.215    1.476
                                      15       50       46.411   3.589    3.589    12.879
                                      22       54       55.374   -1.374   1.374    1.887
                                      24       62       57.935   4.065    4.065    16.528
                                      20       53       52.813   0.187    0.187    0.035

Number of data points (n)             6        6
Mean (AVERAGE)                        20.667   53.667   53.667   0.000    2.614    10.065
Sample standard deviation (STDEV)     3.777    5.955    4.836    3.475
Sample variance (VAR)                 14.267   35.467   23.388   12.079

Note that VAR(Y) = VAR(FCST) + VAR(ERR) = "explained variance" + "unexplained variance"


Correlation of X & Y (r = CORREL(X,Y))                   0.81206
Regression slope coefficient (b = r*STDEV(Y)/STDEV(X))   1.280374
Regression intercept (a = AVERAGE(Y)-b*AVERAGE(X))       27.20561
Square of correlation coefficient (r squared)            0.659441
Unadjusted R-squared (1-VAR(ERR)/VAR(Y))                 0.659441
Adjusted R-squared (1-MSE/VAR(Y))                        0.574301

Mean absolute error (MAE)              2.614
Sum of Squared Errors (SSE)            60.393
MSE (SSE/(n-p))                        15.098
RMSE (square root of MSE)              3.886
Critical t-value (95%, n-p d.f.)       2.776

The difference between unadjusted and adjusted R-squared is that VAR(ERR) = SSE/(n-1) whereas MSE = SSE/(n-2).
The former is a biased measure of the error variance, whereas the latter is an unbiased estimate, correcting for the
fact that 2 coefficients have been estimated, not 1.
Also note that MSE is not just the sample mean of the squared errors--it is the sum of squared errors divided by n-p,
not divided by n.
The RMSE for a regression model is also called the Standard Error of the Estimate (SEE).
The exact confidence interval for a prediction is equal to the prediction +/- (t-value) * (std. dev. of prediction)
...however, the std. dev. of the prediction is NOT simply the RMSE of the model.
Rather, it includes an additional factor that depends on the standard errors of the coefficients and the values
of the independent variables at that point. (Even in the mean model, the exact figure is RMSE*SQRT(1+1/n), not the RMSE itself.)


Multiple regression model

This worksheet shows the "brute force" calculation of regression coefficients, predictions, and confidence
intervals using matrix algebra. Essentially the same formulas would work for any number of data points and
independent variables, although the arrays would have to be reshaped.

The "X" matrix (constant & independent variable(s)):

X0    X1    X2...
1     18
1     25
1     15
1     22
1     24
1     20

The "Y" vector (dependent variable) and its deviations-from-mean and squared-deviations-from-mean:

Y           Y-AVG(Y)       (Y-AVG(Y))^2
45          -8.66666667    75.11111111
58          4.333333333    18.77777778
50          -3.66666667    13.44444444
54          0.333333333    0.111111111
62          8.333333333    69.44444444
53          -0.66666667    0.444444444
53.66667    0.00000        29.55556       Average values

(The blue cells are "live": you can change their contents and see what happens.)

Number of data points (named N)                  6
Number of coefficients to estimate (named P)     2
Number of "DEGREES OF FREEDOM" (named DF)        4

29.55556 = POPULATION VARIANCE of Y (named VARPY) is the average squared deviation of Y from its mean
35.46667 = SAMPLE VARIANCE of Y (named VARY) is the average squared deviation of Y from its mean ADJUSTED for the estimation of the mean from the finite sample
(i.e., it is the sum of squared deviations from the mean divided by N-1 rather than N)
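In Python's standard library the two denominators correspond to `statistics.pvariance` (divide by N) and `statistics.variance` (divide by N-1); a quick check:

```python
import statistics

y = [45, 58, 50, 54, 62, 53]
n = len(y)

varpy = statistics.pvariance(y)   # population variance: divide by N
vary = statistics.variance(y)     # sample variance: divide by N - 1

print(round(varpy, 5), round(vary, 5))
print(abs(vary - varpy * n / (n - 1)) < 1e-9)   # VARY = VARPY * N/(N-1)
```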
Here is "X-transpose" (the X matrix transposed, named XT)
1     1     1     1     1     1
18    25    15    22    24    20
Now here is "X-transpose-X" (i.e., X-transpose times X, named XTX)
6      124
124    2634
And here is "X-transpose-X inverse" (the matrix inverse of the previous thing, named XTXINV)
6.154206 -0.28972
-0.28972 0.014019
Here is "X-transpose-Y" (X-transpose times the Y vector, named XTY)
322
6746
The vector of COEFFICIENT ESTIMATES ("beta hat") is equal to "X-transpose-X-inverse times X-transpose-Y"
(the previous two things matrix-multiplied together):
27.20561
1.280374
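The matrix steps can be mirrored in plain Python with nested lists. This is a sketch only: `matmul` is a hypothetical helper, and the closed-form inverse handles just the 2 x 2 case (one independent variable). With more variables you would need a general inverse or linear solver, such as NumPy's `numpy.linalg.solve`.

```python
x = [18, 25, 15, 22, 24, 20]
y = [45, 58, 50, 54, 62, 53]

X = [[1.0, xi] for xi in x]              # constant column X0 plus X1
XT = [list(col) for col in zip(*X)]      # X-transpose (2 x 6)


def matmul(A, B):
    """Matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


XTX = matmul(XT, X)                      # [[6, 124], [124, 2634]]
det = XTX[0][0] * XTX[1][1] - XTX[0][1] * XTX[1][0]
XTXINV = [[XTX[1][1] / det, -XTX[0][1] / det],   # closed-form inverse, 2 x 2 only
          [-XTX[1][0] / det, XTX[0][0] / det]]
XTY = matmul(XT, [[yi] for yi in y])     # [[322], [6746]]
beta_hat = matmul(XTXINV, XTY)           # [[a], [b]]
yhat = matmul(X, beta_hat)               # predictions: X times beta-hat

print(XTX)
print([round(v, 6) for row in XTXINV for v in row])
print([round(row[0], 5) for row in beta_hat])
print([round(row[0], 5) for row in yhat])
```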
The vector of predictions (named YHAT) is now equal to "X beta-hat" (X times beta-hat):
50.25234
59.21495
46.41121
55.37383
57.93458
52.81308
ERRORS (actual minus predicted):
-5.25234
-1.21495
3.588785
-1.37383
4.065421
0.186916

SQUARED ERRORS:
27.58704
1.476111
12.87938
1.887414
16.52764
0.034938
60.39252 = Sum of Squared Errors (SSE)

The SIMPLE AVERAGE OF THE SQUARED ERRORS is the sum of squared errors divided by N:
10.06542 (This is a BIASED estimate of the average size of a squared error)
R-SQUARED is equal to 1 minus the average squared error divided by the population variance of Y:
0.659441 (This is a BIASED estimate of the fraction of variance "explained" by the model)
The MEAN SQUARED ERROR (MSE) is equal to the Sum of Squared Errors divided by the # Degrees of Freedom:
15.09813 (This is an UNBIASED estimate of the average size of a squared error)
ADJUSTED R-SQUARED is equal to 1 minus the MSE divided by the sample variance of Y:
0.574301 (This is an UNBIASED estimate of the fraction of variance "explained" by the model)
The STANDARD ERROR OF THE ESTIMATE (SEE) is the square root of the MSE:
3.885631
The COVARIANCE MATRIX OF THE COEFFICIENT ESTIMATES ("COVMAT") is equal to X-transpose-X-inverse
times the MSE:
92.917 -4.37422
-4.37422 0.211656
The STANDARD ERRORS OF THE COEFFICIENT ESTIMATES are the square roots of the diagonal elements
of the covariance matrix:
9.639347
0.460061
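A sketch of the covariance-matrix step, again limited to one independent variable so that the inverse of X-transpose-X can be written out directly from the sums of 1, X, and X^2:

```python
import math

x = [18, 25, 15, 22, 24, 20]
y = [45, 58, 50, 54, 62, 53]
n, p = len(x), 2

# X'X = [[n, sum(x)], [sum(x), sum(x^2)]] and its 2 x 2 inverse
s1, sx, sxx = n, sum(x), sum(xi * xi for xi in x)
det = s1 * sxx - sx * sx
xtx_inv = [[sxx / det, -sx / det], [-sx / det, s1 / det]]

# Coefficients from X'Y, then the MSE from the residuals
sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
a = xtx_inv[0][0] * sy + xtx_inv[0][1] * sxy
b = xtx_inv[1][0] * sy + xtx_inv[1][1] * sxy
mse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - p)

covmat = [[mse * v for v in row] for row in xtx_inv]     # COVMAT = MSE * XTXINV
std_errors = [math.sqrt(covmat[i][i]) for i in range(p)]  # sqrt of the diagonal

print([[round(v, 6) for v in row] for row in covmat])
print([round(se, 6) for se in std_errors])
```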
The T-STATISTICS OF THE COEFFICIENT ESTIMATES
are the coefficients divided by their standard errors:
2.82235
2.783053
...and their exact SIGNIFICANCE LEVELS (p-values)
can be calculated using the TDIST function:
0.047714
0.049663
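The Python standard library has no t distribution, so this sketch integrates the t density with Simpson's rule to get two-tailed p-values. It is an assumed stand-in for TDIST, not what Excel does internally; `t_pdf` and `two_tailed_p` are hypothetical helpers. With SciPy installed, `2 * scipy.stats.t.sf(abs(t), df)` should give the same numbers.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| > |t|) via Simpson's rule on the density from 0 to |t| (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t   # the density is symmetric about 0


coefficients = [27.20561, 1.280374]   # from the beta-hat vector above
std_errors = [9.639347, 0.460061]     # from the standard errors above
df = 4                                # N - P
for coef, se in zip(coefficients, std_errors):
    t_stat = coef / se
    print(round(t_stat, 5), round(two_tailed_p(t_stat, df), 6))
```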


Chart

[Chart: Y and FCST plotted against X]

Excel regression

SUMMARY OUTPUT: Tools/Data Analysis/Regression procedure

Regression Statistics
Multiple R             0.812059516
R Square               0.659440658
Adjusted R Square      0.574300822
Standard Error         3.885631331
Observations           6
ANOVA
              df    SS          MS          F          Significance F
Regression    1     116.9408    116.9408    7.745383   0.049663
Residual      4     60.39252    15.09813
Total         5     177.3333

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.000%   Upper 95.000%
Intercept      27.20560748    9.639347         2.82235    0.047714   0.442436    53.96878    0.442436        53.96878
X Variable 1   1.280373832    0.460061         2.783053   0.049663   0.003037    2.55771     0.003037        2.55771
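The Lower 95% and Upper 95% columns are coefficient +/- t-value * standard error. In this sketch the t-value (2.776445 for 4 d.f.) is typed in rather than computed, and the coefficients and standard errors are copied from the table; the results agree with Excel's to about four decimal places.

```python
# Coefficients and standard errors copied from the Excel output
estimates = {
    "Intercept": (27.20560748, 9.639347),
    "X Variable 1": (1.280373832, 0.460061),
}
t_crit = 2.776445  # assumed: two-sided 95% critical t-value for 4 d.f., typed in by hand

intervals = {}
for name, (coef, se) in estimates.items():
    half_width = t_crit * se
    intervals[name] = (coef - half_width, coef + half_width)
    print(f"{name}: {intervals[name][0]:.6f} to {intervals[name][1]:.6f}")
```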

RESIDUAL OUTPUT

Observation   Predicted Y    Residuals   Standard Residuals
1             50.25233645    -5.25234    -1.35173
2             59.21495327    -1.21495    -0.31268
3             46.41121495    3.588785    0.923604
4             55.37383178    -1.37383    -0.35357
5             57.93457944    4.065421    1.04627
6             52.81308411    0.186916    0.048104
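In this output each Standard Residual equals the residual divided by the Standard Error of the regression (3.885631); for example, -5.25234 / 3.885631 is about -1.35173. A short check using the residuals from the table:

```python
import math

residuals = [-5.25234, -1.21495, 3.588785, -1.37383, 4.065421, 0.186916]
n, p = len(residuals), 2

# Standard error of the estimate: sqrt(SSE / (n - p))
see = math.sqrt(sum(e * e for e in residuals) / (n - p))
standard_residuals = [e / see for e in residuals]

print(round(see, 4), [round(z, 5) for z in standard_residuals])
```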

[Chart: X Variable 1 Residual Plot (Residuals vs. X Variable 1)]

[Chart: X Variable 1 Line Fit Plot (Y and Predicted Y vs. X Variable 1)]

SG regression

Multiple Regression Analysis

----------------------------------------------------------------------------
Dependent variable: Y
----------------------------------------------------------------------------
                                  Standard            T
Parameter         Estimate        Error               Statistic       P-Value
----------------------------------------------------------------------------
CONSTANT          27.2056         9.63935             2.82235         0.0477
X                 1.28037         0.460061            2.78305         0.0497
----------------------------------------------------------------------------

Analysis of Variance
----------------------------------------------------------------------------
Source            Sum of Squares     Df   Mean Square     F-Ratio     P-Value
----------------------------------------------------------------------------
Model             116.941            1    116.941         7.75        0.0497
Residual          60.3925            4    15.0981
----------------------------------------------------------------------------
Total (Corr.)     177.333            5

R-squared = 65.9441 percent
R-squared (adjusted for d.f.) = 57.4301 percent
Standard Error of Est. = 3.88563
Mean absolute error = 2.61371
Durbin-Watson statistic = 1.79877
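The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals; values near 2 suggest little first-order autocorrelation. A sketch using the residuals, in observation order, from the Excel residual output:

```python
residuals = [-5.25234, -1.21495, 3.588785, -1.37383, 4.065421, 0.186916]

# Numerator: squared changes between consecutive residuals
num = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:]))
# Denominator: sum of squared residuals (SSE)
den = sum(e * e for e in residuals)
dw = num / den

print(round(dw, 5))
```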
Regression Results for Y (PREDICTION FOR X=30)
-----------------------------------------------------------------------------------------------------
        Fitted     Stnd. Error    Lower 95.0% CL   Upper 95.0% CL   Lower 95.0% CL   Upper 95.0% CL
Row     Value      for Forecast   for Forecast     for Forecast     for Mean         for Mean
-----------------------------------------------------------------------------------------------------
7       65.6168    6.00434        48.9461          82.2876          52.9075          78.3262
-----------------------------------------------------------------------------------------------------

Y = 27.2056 + 1.28037*X
