STATS 330: Lecture 15

Case Study
21.08.2014

Page 1/52
Housekeeping

- My contact details
  Office: 303S.265
  Email: aj.lee@auckland.ac.nz or lee@stat.auckland.ac.nz
- Office hours: Tuesday 10:30-12:00, Thursday 10:30-12:00
  I am happy to talk to students at any time if I am not too busy.

Page 2/52
Aims of today's lecture

- To illustrate the modelling process using the evaporation data.

Page 3/52
Evaporation data

Page 4/52
Evaporation data: Aims of the analysis

- Understand relationships between explanatory variables and the response.
- Be able to predict evaporation loss given the other variables.

Page 5/52
Evaporation data: The variables
A data frame with 46 observations on the following 11 variables:

evap:  amount of evaporation over 24 hour period (response)
avst:  average soil temperature over 24 hour period (x10)
minst: minimum soil temperature over 24 hour period (x10)
maxst: maximum soil temperature over 24 hour period (x10)
avat:  average air temperature over 24 hour period (x10)
minat: minimum air temperature over 24 hour period (x10)
maxat: maximum air temperature over 24 hour period (x10)
avh:   average humidity over 24 hour period (x10)
minh:  minimum humidity over 24 hour period (x10)
maxh:  maximum humidity over 24 hour period (x10)
wind:  average wind speed over a 24 hour period (x100)

Page 6/52
The Modelling Cycle

[Flowchart: PLOTS and THEORY -> Choose Model -> Fit Model -> Examine Residuals.
 Bad fit: Transform and go round the cycle again. Good fit: USE MODEL.]

Page 7/52
The Modelling Cycle: Our plan of attack

- Graphical check
  - Suitable for regression
  - Gross outliers
- Preliminary fit
- Model selection (for prediction and interpretation)
- Transforming (if required)
- Outlier check
- Use model for prediction and interpretation

Page 8/52
Step 1: Plots

- Preliminary plots
- Want to get an initial idea of the suitability of the data for regression modelling
- Check for linear relationships, outliers

Page 9/52
Step 1: Pairs plot (using gclus-package)
[Figure: pairs plot (scatterplot matrix) of avst, minst, maxst, avat, minat, maxat, avh, minh, maxh, wind and evap.]

Page 10/52
Step 1: Points to note

- avh takes very few distinct values.
- Strong relationship between response and some variables (particularly avst, maxh).
- Relationship between maxh and response looks curved.
- Not much relationship between response and wind, avh.
- Strong relationship between avst, minst, maxst and avat.
- No obvious outliers.
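
A scatterplot matrix of this kind can be sketched in base R. The slides use the gclus package; the version below falls back on `pairs()` and on a small synthetic stand-in (`toy.df`, a hypothetical name with made-up values), because the evaporation data are not reproduced here.

```r
# Synthetic stand-in for evap.df (hypothetical values, for illustration only).
set.seed(330)
n <- 46
avst  <- rnorm(n, mean = 85, sd = 5)
maxst <- 2 * avst + rnorm(n, sd = 4)          # strongly related to avst
maxh  <- rnorm(n, mean = 400, sd = 30)
evap  <- 90 + 0.8 * avst - 0.3 * maxh + rnorm(n, sd = 5)
toy.df <- data.frame(avst, maxst, maxh, evap)

# Scatterplot matrix: look for linear relationships and gross outliers.
pairs(toy.df)

# Pairwise correlations back up what the plot suggests.
cor.mat <- round(cor(toy.df), 2)
print(cor.mat)
```

With the real data, `toy.df` would simply be replaced by `evap.df`.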

Page 11/52
The Modelling Cycle: Our plan of attack
- Graphical check
- Preliminary fit
  - Fit a model of response against all explanatory variables
  - Check diagnostic plots
  - Investigate extreme outliers
  - Look at VIFs
- Model selection
- Transforming (if required)
- Outlier check
- Use model for prediction and interpretation

Page 12/52
Step 2: Preliminary fit
Call:
lm(formula = evap ~ ., data = evap.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -54.074877 130.720826 -0.414 0.68164
avst 2.231782 1.003882 2.223 0.03276 *
minst 0.204854 1.104523 0.185 0.85393
maxst -0.742580 0.349609 -2.124 0.04081 *
avat 0.501055 0.568964 0.881 0.38452
minat 0.304126 0.788877 0.386 0.70219
maxat 0.092187 0.218054 0.423 0.67505
avh 1.109858 1.133126 0.979 0.33407
minh 0.751405 0.487749 1.541 0.13242
maxh -0.556292 0.161602 -3.442 0.00151 **
wind 0.008918 0.009167 0.973 0.33733
---
Residual standard error: 6.508 on 35 degrees of freedom
Multiple R-squared: 0.8463, Adjusted R-squared: 0.8023
F-statistic: 19.27 on 10 and 35 DF, p-value: 2.073e-11
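
A minimal sketch of how a preliminary fit like this one is produced, using synthetic data as a stand-in (the names `toy.df` and `full.lm` and all values are hypothetical). The formula `y ~ .` regresses the response on every other column of the data frame, which is what `evap ~ .` does in the call above.

```r
# Synthetic stand-in data (hypothetical); use evap.df in place of toy.df when available.
set.seed(1)
n <- 46
toy.df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy.df$y <- 10 + 2 * toy.df$x1 - toy.df$x3 + rnorm(n)

# "y ~ ." means: regress y on all remaining columns.
full.lm <- lm(y ~ ., data = toy.df)
print(summary(full.lm))

# The four standard diagnostic plots in a 2 x 2 layout.
op <- par(mfrow = c(2, 2))
plot(full.lm)
par(op)
```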

Page 13/52
Step 2: Diagnostic plots

[Figure: diagnostic plots for the full model: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Labelled points include 2, 31, 33 and 41.]

Page 14/52
Removing point 31 and comparing

Coefficients:       With point 31            Without point 31
               Estimate  Std. Error      Estimate  Std. Error
(Intercept)  -54.074877  130.720826    -96.548794  133.201036
avst           2.231782    1.003882      2.116661    0.996830
minst          0.204854    1.104523      0.611714    1.134679
maxst         -0.742580    0.349609     -0.800539    0.348578
avat           0.501055    0.568964      1.502231    0.940245
minat          0.304126    0.788877      0.234797    0.782114
maxat          0.092187    0.218054     -0.123226    0.269796
avh            1.109858    1.133126      1.235595    1.124895
minh           0.751405    0.487749      1.138294    0.563481
maxh          -0.556292    0.161602     -0.660888    0.178177
wind           0.008918    0.009167      0.008438    0.009075

Page 15/52
Variance Inflation Factors

     avst     minst     maxst      avat     minat
39.294418 14.082537 52.340003  8.827574  8.887339
    maxat       avh      minh      maxh      wind
22.215658  1.980530 25.376391 24.115308  1.984635
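
For reference, the variance inflation factor of the j-th explanatory variable is 1 / (1 - R_j^2), where R_j^2 comes from regressing that variable on all the others; equivalently, the VIFs are the diagonal of the inverse correlation matrix of the explanatory variables. The sketch below checks that the two routes agree on synthetic data (all names and values are hypothetical, and this is not necessarily how the slides computed the numbers above).

```r
# Synthetic explanatory variables; x2 is built to be nearly collinear with x1.
set.seed(1)
n  <- 46
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)
x3 <- rnorm(n)
X  <- cbind(x1, x2, x3)

# Route 1: diagonal of the inverse correlation matrix.
vif_inv <- diag(solve(cor(X)))

# Route 2: 1 / (1 - R_j^2) from regressing each column on the others.
vif_r2 <- sapply(seq_len(ncol(X)), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
  1 / (1 - r2)
})

print(round(rbind(vif_inv, vif_r2), 3))
```

Large values for `x1` and `x2` and a value near 1 for `x3` mirror the pattern in the table: the soil temperature and humidity variables carry much of the same information.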

Page 16/52
Findings

- Plots OK, normality dubious.
- Point 31 has quite high Cook's distance, but its removal does not change the regression much.
- Model looks linear, could interpret coefficients, but variables highly correlated.

Page 17/52
The Modelling Cycle: Our plan of attack

- Graphical check
- Preliminary fit
- Model selection
  - Using APR to identify suitable models
  - Look at summary stats
  - Check diagnostic plots
- Transforming (if required)
- Outlier check
- Use model for prediction and interpretation

Page 18/52
Step 3: Model selection using APR
[Figure: Cp plot. Cp against number of variables (2 to 10); each point is labelled with the indices of the variables in the model, e.g. 6,9; 6,9,10; 1,3,6,9; 1,3,6,8,9; 1,3,6,8,9,10; 1,3,4,7,8,9,10; 1,3,4,6,7,8,9,10; 1,3,4,5,6,7,8,9,10; 1,2,3,4,5,6,7,8,9,10.]

Page 19/52
Model suggestions for different criteria

        rssp  sigma2  adjRsq      Cp     AIC     BIC       CV
 1  3071.255  69.801   0.674  30.519  76.519  80.177  293.984
 2  2101.113  48.863   0.772   9.612  55.612  61.098  220.242
 3  1879.949  44.761   0.791   6.390  52.390  59.705  202.706
 4  1696.789  41.385   0.807   4.065  50.065  59.208  206.391
 5  1599.138  39.978   0.813   3.759  49.759  60.731  207.462
 6  1552.033  39.796   0.814   4.647  50.647  63.448  210.884
 7  1521.227  40.032   0.813   5.920  51.920  66.549  228.658
 8  1490.602  40.287   0.812   7.197  53.197  69.654  245.759
 9  1483.733  41.215   0.808   9.034  55.034  73.321  260.160
10  1482.277  42.351   0.802  11.000  57.000  77.115  278.885
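
The criterion columns can be checked against the rssp column. The sketch below assumes definitions that are inferred from the numbers in the table rather than stated on the slides: with n = 46 and s2 the residual variance of the 10-variable model, Cp = RSS_p/s2 + 2(p+1) - n, AIC = RSS_p/s2 + 2(p+1) and BIC = RSS_p/s2 + log(n)(p+1). Under these assumed forms the table is reproduced up to rounding.

```r
# rssp column copied from the table above; n and p as given on the slides.
n    <- 46
p    <- 1:10
rssp <- c(3071.255, 2101.113, 1879.949, 1696.789, 1599.138,
          1552.033, 1521.227, 1490.602, 1483.733, 1482.277)

sigma2      <- rssp / (n - p - 1)          # residual variance of each model
sigma2_full <- sigma2[10]                  # variance estimate from the full model

# Assumed (inferred) scaled forms of the criteria.
cp  <- rssp / sigma2_full + 2 * (p + 1) - n
aic <- rssp / sigma2_full + 2 * (p + 1)
bic <- rssp / sigma2_full + log(n) * (p + 1)

print(round(cbind(p, sigma2, cp, aic, bic), 3))

# Model sizes minimising each criterion.
c(Cp = which.min(cp), AIC = which.min(aic), BIC = which.min(bic))
```

Under these forms AIC and Cp differ only by the constant n, which is why they always pick the same model.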

Page 20/52
Suggested models

    avst minst maxst avat minat maxat avh minh maxh wind
 1     0     0     0    0     0     0   0    0    1    0
 2     0     0     0    0     0     1   0    0    1    0
 3     0     0     0    0     0     1   0    0    1    1
 4     1     0     1    0     0     1   0    0    1    0
 5     1     0     1    0     0     1   0    1    1    0
 6     1     0     1    0     0     1   0    1    1    1
 7     1     0     1    1     0     0   1    1    1    1
 8     1     0     1    1     0     1   1    1    1    1
 9     1     0     1    1     1     1   1    1    1    1
10     1     1     1    1     1     1   1    1    1    1

Page 21/52
Suggested models

- CV favours model
  evap~maxat+maxh+wind
- BIC favours model
  evap~avst+maxst+maxat+maxh
- AIC and Cp favour model
  evap~avst+maxst+maxat+minh+maxh
- σ² and adjusted R² favour the model which adds wind to the AIC-favoured model.

Page 22/52
CV selected model: Summary

Call:
lm(formula = evap ~ maxat + maxh + wind, data = evap.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 123.901800 24.624411 5.032 9.60e-06 ***
maxat 0.222768 0.059113 3.769 0.000506 ***
maxh -0.342915 0.042776 -8.016 5.31e-10 ***
wind 0.015998 0.007197 2.223 0.031664 *
---
Residual standard error: 6.69 on 42 degrees of freedom
Multiple R-squared: 0.805, Adjusted R-squared: 0.7911
F-statistic: 57.8 on 3 and 42 DF, p-value: 5.834e-15

Page 23/52
CV selected model: Diagnostic plot

[Figure: diagnostic plots for the CV selected model: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Labelled points include 8, 33 and 41.]



Page 24/52
BIC selected model: Summary

Call:
lm(formula = evap ~ avst + maxst + maxat + maxh,
data = evap.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.30530 45.65538 1.321 0.19387
avst 1.29035 0.60287 2.140 0.03832 *
maxst -0.56355 0.18237 -3.090 0.00359 **
maxat 0.42601 0.09389 4.538 4.9e-05 ***
maxh -0.30734 0.05160 -5.956 5.0e-07 ***
---
Residual standard error: 6.433 on 41 degrees of freedom
Multiple R-squared: 0.824, Adjusted R-squared: 0.8069
F-statistic: 48 on 4 and 41 DF, p-value: 6.089e-15

Page 25/52
BIC selected model: Diagnostic plot

[Figure: diagnostic plots for the BIC selected model: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Labelled points include 8, 33 and 41.]



Page 26/52
AIC selected model: Summary

Call:
lm(formula = evap ~ avst + maxst + maxat + minh + maxh,
data = evap.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 69.5476 45.2608 1.537 0.132265
avst 2.1304 0.8000 2.663 0.011104 *
maxst -0.6857 0.1955 -3.507 0.001136 **
maxat 0.2908 0.1265 2.299 0.026802 *
minh 0.6021 0.3852 1.563 0.125960
maxh -0.4712 0.1165 -4.045 0.000232 ***
---
Residual standard error: 6.323 on 40 degrees of freedom
Multiple R-squared: 0.8342, Adjusted R-squared: 0.8134
F-statistic: 40.24 on 5 and 40 DF, p-value: 1.411e-14

Page 27/52
AIC selected model: Diagnostic plot

[Figure: diagnostic plots for the AIC selected model: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Labelled points include 2, 33 and 41.]



Page 28/52
Step 3: Summary

- Increase in residual sum of squares relative to the full model is acceptable according to anova.
- CV selection appears appealing. However, non-normality is an issue for prediction.
- BIC selection very appealing; normality seems more likely.
- AIC model is not much of an improvement over BIC, so will ignore it from now on.
- Will look at CV and BIC further.
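
The anova check mentioned in this summary compares a submodel with a larger model that contains it. A minimal sketch on synthetic data (names and values are hypothetical; with the real data the two fits would be a selected model and the full model):

```r
# Synthetic data in which only x1 truly matters.
set.seed(2)
n <- 46
toy.df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy.df$y <- 5 + 3 * toy.df$x1 + rnorm(n)

small.lm <- lm(y ~ x1, data = toy.df)
full.lm  <- lm(y ~ x1 + x2 + x3, data = toy.df)

# F test of H0: the coefficients of the extra terms (x2, x3) are all zero.
cmp <- anova(small.lm, full.lm)
print(cmp)
```

A large p-value in the second row says that the increase in residual sum of squares from dropping the extra terms is no bigger than chance would explain.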

Page 29/52
The Modelling Cycle: Our plan of attack

- Graphical check
- Preliminary fit
- Model selection
- Transforming (if required)
  - Check for remaining signal in residuals
  - Check for transformation in response or explanatory variables
  - Check for normality
- Outlier check
- Use model for prediction and interpretation

Page 30/52
Step 4: Checking for signal in residuals

[Figure: Residuals vs Fitted plots for the CV model (left) and the BIC model (right). Points 8, 33 and 41 are labelled in both.]

Page 31/52
Transformations for Response?

[Figure: Box-Cox profile log-likelihood against lambda (from -2 to 2) for the CV model and for the BIC model, each with the 95% confidence interval marked.]
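
A Box-Cox profile of this kind can be sketched with `boxcox()` from the MASS package (a recommended package that normally ships with R). The data here are synthetic and the names hypothetical; whether the slides used this exact function is an assumption.

```r
library(MASS)

# Synthetic positive response that is already linear in x.
set.seed(3)
n <- 46
x <- runif(n, 1, 10)
y <- 20 + 3 * x + rnorm(n)
fit <- lm(y ~ x)

# Profile log-likelihood over a grid of lambda values.
# plotit = TRUE would draw the curve with its 95% interval instead.
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)
lambda.hat <- bc$x[which.max(bc$y)]
lambda.hat
```

If the 95% interval for lambda contains 1, as the summary of this step concludes for both models, the response can be left untransformed.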

Page 32/52
Transformations for Variables: Cross Validation
[Figure: estimated smooth terms s(maxat,1.22), s(maxh,3.22) and s(wind,1), plotted against maxat, maxh and wind respectively.]

Page 33/52
Transformations for Variables: Cross Validation

Call:
lm(formula = evap ~ maxat + poly(maxh, 3) + wind, data = evap.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.041666 10.932152 -0.736 0.46627
maxat 0.201854 0.058888 3.428 0.00142 **
poly(maxh, 3)1 -69.247897 8.132711 -8.515 1.61e-10 ***
poly(maxh, 3)2 3.167952 6.631289 0.478 0.63544
poly(maxh, 3)3 15.969100 6.550355 2.438 0.01931 *
wind 0.015351 0.006896 2.226 0.03170 *
---
Residual standard error: 6.369 on 40 degrees of freedom
Multiple R-squared: 0.8317, Adjusted R-squared: 0.8107
F-statistic: 39.54 on 5 and 40 DF, p-value: 1.876e-14
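
A short sketch of what `poly(x, 3)` does in a fit like the one above, on synthetic data (all names and values hypothetical). `poly()` builds orthogonal polynomial terms, so the individual coefficients are not coefficients of x, x^2 and x^3, but the fitted curve is the same cubic.

```r
# Synthetic curved relationship on a scale similar to maxh.
set.seed(4)
n <- 46
x <- runif(n, 340, 480)
z <- (x - 410) / 40                              # centred and scaled copy of x
y <- 30 - 8 * z + 3 * z^3 + rnorm(n, sd = 2)

lin.lm   <- lm(y ~ x)
cubic.lm <- lm(y ~ poly(x, 3))                   # orthogonal terms of degree 1 to 3

# Does the cubic reduce the residual sum of squares significantly?
print(anova(lin.lm, cubic.lm))

# Raw powers of the scaled variable span the same cubic, so the fits agree.
raw.lm <- lm(y ~ poly(z, 3, raw = TRUE))
all.equal(fitted(cubic.lm), fitted(raw.lm))
```

The raw version is fitted on the scaled variable `z` only to keep the columns well conditioned; that choice is part of the sketch, not of the slides.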

Page 34/52
Transformations for Variables: Cross Validation

[Figure: diagnostic plots for the CV model with poly(maxh, 3): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Labelled points include 6, 8, 33 and 41.]



Page 35/52
Transformations for Variables: BIC

[Figure: estimated smooth terms s(avst,3.76), s(maxst,1), s(maxat,1) and s(maxh,3.28), plotted against avst, maxst, maxat and maxh respectively.]

Page 36/52
Transformations for Variables: BIC
Call:
lm(formula = evap ~ poly(avst, 3) + maxst + maxat
+ poly(maxh, 3), data = evap.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 61.32493 23.91563 2.564 0.01453 *
poly(avst, 3)1 57.17955 20.51578 2.787 0.00834 **
poly(avst, 3)2 21.68382 8.41793 2.576 0.01413 *
poly(avst, 3)3 14.24292 7.31494 1.947 0.05914 .
maxst -0.72595 0.15901 -4.565 5.35e-05 ***
maxat 0.52135 0.09295 5.609 2.13e-06 ***
poly(maxh, 3)1 -64.45204 9.89038 -6.517 1.26e-07 ***
poly(maxh, 3)2 -10.10864 8.01830 -1.261 0.21531
poly(maxh, 3)3 15.92728 6.37772 2.497 0.01709 *
---
Residual standard error: 5.346 on 37 degrees of freedom
Multiple R-squared: 0.8903, Adjusted R-squared: 0.8666
F-statistic: 37.55 on 8 and 37 DF, p-value: 1.794e-15

Page 37/52
Transformations for Variables: BIC
Call:
lm(formula = evap ~ poly(avst, 2) + maxst + maxat
+ poly(maxh, 3), data = evap.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 65.24568 24.69042 2.643 0.011885 *
poly(avst, 2)1 52.57309 21.11410 2.490 0.017267 *
poly(avst, 2)2 25.04976 8.53573 2.935 0.005637 **
maxst -0.65890 0.16084 -4.097 0.000212 ***
maxat 0.43969 0.08595 5.116 9.24e-06 ***
poly(maxh, 3)1 -71.74260 9.48446 -7.564 4.30e-09 ***
poly(maxh, 3)2 -13.57454 8.10027 -1.676 0.101985
poly(maxh, 3)3 10.75983 6.00852 1.791 0.081300 .
---
Residual standard error: 5.539 on 38 degrees of freedom
Multiple R-squared: 0.8791, Adjusted R-squared: 0.8568
F-statistic: 39.47 on 7 and 38 DF, p-value: 1.599e-15

Page 38/52
Transformations for Variables: BIC

Call:
lm(formula = evap ~ poly(avst, 2) + maxst + maxat + maxh,
data = evap.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 190.07107 28.88199 6.581 7.22e-08 ***
poly(avst, 2)1 51.96140 22.44731 2.315 0.025838 *
poly(avst, 2)2 18.49093 6.22026 2.973 0.004980 **
maxst -0.65449 0.16987 -3.853 0.000413 ***
maxat 0.48866 0.08857 5.517 2.25e-06 ***
maxh -0.33992 0.04853 -7.004 1.85e-08 ***
---
Residual standard error: 5.894 on 40 degrees of freedom
Multiple R-squared: 0.8559, Adjusted R-squared: 0.8378
F-statistic: 47.5 on 5 and 40 DF, p-value: 8.841e-16

Page 39/52
Transformations for Variables: BIC

[Figure: diagnostic plots for the BIC model with poly(avst, 2): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Labelled points include 2, 7, 33 and 41.]



Page 40/52
Comparisons

[Figure: Residuals vs Fitted plots for the CV model (left) and the transformed BIC model (right).]

Page 41/52
Comparisons

[Figure: Normal Q-Q plots of standardized residuals for the CV model (left) and the BIC model (right). The panels are annotated with WB test p-values of p=0 and p=0.1.]

Page 41/52
Step 4: Summary

- Response does not need transformation.
- Residual plot for the CV model is flat after fitting a 3rd degree polynomial in maxh; non-normality still an issue.
- BIC model suggests transformations for avst and maxh. After playing around, only a 2nd degree polynomial for avst seems to be sensible.
- WB test provides weak evidence against normality for transformed BIC model.
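
A normality check of this kind can be sketched in base R. The WB test quoted on the slides is not part of base R, so the sketch swaps in the Shapiro-Wilk test (`shapiro.test()`), a different but widely used normality test; data and names are synthetic and hypothetical.

```r
# Synthetic regression with normal errors.
set.seed(5)
n <- 46
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

# Normal Q-Q plot of the standardized residuals.
r <- rstandard(fit)
qqnorm(r)
qqline(r)

# Shapiro-Wilk test, used here as a stand-in for the WB test.
sw <- shapiro.test(r)
sw$p.value
```

A small p-value is evidence against normality of the errors, which matters most for prediction intervals.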

Page 42/52
The Modelling Cycle: Our plan of attack

- Graphical check
- Preliminary fit
- Model selection (for prediction)
- Transforming (if required)
- Outlier check
  - Check for outliers
  - Test effects of removing outliers
- Use model for prediction

Page 43/52
Step 5: Diagnostics for outliers for CV model

[Figure: influence diagnostics against observation number for the CV model. Panels: dfb.1_, dfb.maxt, dfb.maxh, dfb.I(^2, dfb.I(^3, dfb.wind, DFFITS, ABS(COV RATIO-1), Cook's D, Hats. Point 6 is labelled in most panels; points 2 and 41 are also labelled.]

Page 44/52
CV summary

- Points 2, 6, and 41 light up as potential outliers.
- Removing 2 and 41 removed significance for polynomial term.
- Removing point 6 kept polynomial terms significant and changed coefficients.
- Thus, remove point 6 as high-influence outlier.
- Non-normality still an issue after removal.

Page 45/52
Diagnostics for outliers for BIC model

[Figure: influence diagnostics against observation number for the BIC model. Panels: dfb.1_, dfb.p(,2)1, dfb.p(,2)2, dfb.mxst, dfb.maxt, dfb.maxh, DFFITS, ABS(COV RATIO-1), Cook's D, Hats. Labelled points include 2, 7, 32 and 41.]

Page 46/52
BIC summary

- Points 2, 7, 32, and 41 light up as potential outliers.
- Removal of any of these points kept polynomial terms significant.
- Removing any of the points also had no notable effect on the coefficients.
- Keep outliers and stick with polynomial model.
- Removing point 41 would reduce the evidence against normality from weak to none, so will keep a model without point 41 for prediction.
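
The influence measures shown in the outlier diagnostics (DFBETAS, DFFITS, covariance ratio, Cook's distance, hat values) are all available in base R. The sketch below uses a small deterministic toy data set with one planted high-leverage outlier (all values hypothetical), then refits without the flagged point, as was done for the CV and BIC models.

```r
# Toy data: a clean linear trend plus one planted outlier at observation 20.
x <- c(1:19, 40)
y <- 2 * x + sin(1:20)
y[20] <- 0
fit <- lm(y ~ x)

# Case-deletion diagnostics, one value (or row) per observation.
cd  <- cooks.distance(fit)
hat <- hatvalues(fit)
dfb <- dfbetas(fit)
dff <- dffits(fit)
cvr <- covratio(fit)

# Observation with the largest Cook's distance.
worst <- which.max(cd)

# Refit without it and compare the coefficients side by side.
fit.drop <- lm(y ~ x, subset = -worst)
print(cbind(all = coef(fit), dropped = coef(fit.drop)))
```

If the coefficients barely move, the point can stay; if they change a lot, as with point 6 in the CV model, the point is influential and its removal should be considered.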

Page 47/52
The Modelling Cycle: Our plan of attack

- Graphical check
- Preliminary fit
- Model selection (for prediction)
- Transforming (if required)
- Outlier check
- Use model for interpretation and prediction

Page 48/52
Step 6: Interpretation (BIC model)

With other variables fixed,

- Evaporation goes up with increased average soil temperature
  (but for fixed average, goes down with increasing maximum).
  This may sound a bit strange, but some plots will help
  explain what is going on (see next slide).
- Evaporation goes up with maximum air temperature.
- Evaporation goes down with maximum humidity.

Page 49/52
Step 6: Interpretation of negative maxst coefficient

[Figure: left, evap.df$evap against evap.df$avst, with each point plotted as its maxst group code (1 to 6); right, conditioning plot of evap against maxst given avst.]

Note: in the left-hand picture, we have coded the range of maxst
into groups, with 1 being the lowest and 6 the highest value. For
fixed avst, the evaporation goes down for increasing maxst.

Page 50/52
Step 6: Prediction

- Create a data frame containing average values of explanatory variables for each model.
- Check prediction and compare to prediction for full model.
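
The two steps above can be sketched as follows, on synthetic data with hypothetical names. The use of `interval = "prediction"` is an assumption; the slides do not say which interval type produced the lwr and upr columns of the summary.

```r
# Synthetic stand-in for a fitted evaporation model.
set.seed(6)
n <- 46
toy.df <- data.frame(x1 = rnorm(n, 170, 15), x2 = rnorm(n, 400, 30))
toy.df$y <- 100 + 0.3 * toy.df$x1 - 0.3 * toy.df$x2 + rnorm(n, sd = 6)
fit <- lm(y ~ x1 + x2, data = toy.df)

# One-row data frame holding the average of each explanatory variable.
new.df <- data.frame(x1 = mean(toy.df$x1), x2 = mean(toy.df$x2))

# Point prediction with a 95% prediction interval (columns fit, lwr, upr).
pred <- predict(fit, newdata = new.df, interval = "prediction")
print(pred)
```

Repeating this for each candidate model with the same `new.df` gives a table of predictions that can be compared row by row.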

Page 51/52
Step 6: Summary

                     fit       lwr       upr
CV all          34.75389  21.64577  47.86202
CV without 6    35.56229  22.74656  48.37803
BIC all         32.24676  20.09240  44.40113
BIC without 41  33.56925  22.64414  44.49436
Full            34.67391  21.31966  48.02817

Page 52/52
