
MATH2831/2931
Linear Models / Higher Linear Models

September 13, 2013

Week 7 Lecture 3 - Last lecture:

DFFITS

DFBETAS

COVRATIO

Week 7 Lecture 3 - This lecture:

Variance stabilizing transformations.

Examples: snow geese, inflation rates and central bank independence.

Weighted regression.

Transformations for modelling non-linear relationships between a response and a single predictor.

Week 7 Lecture 3 - Transformations

Diagnostic measures detect violations of assumptions: how do we deal with these violations?

One approach which is sometimes applicable: the use of transformations.

The appropriateness of a transformation, and the type of transformation used, depends on the nature of the violations of assumptions.

Are there problems with: constant variance, independence of the noise random variables εᵢ, the distributional assumption for the noise, the mean...

Week 7 Lecture 3 - Transformations

This lecture: variance stabilizing transformations (appropriate when constancy of the error variance is violated).

Applying a transformation to fix one violation of assumptions may cause another violation which did not appear on the original scale.

What is the effect of a transformation on the model for the errors?

Week 7 Lecture 3 - Variance stabilizing transformations


Write y for a typical response.

Common variance stabilizing transformations:

Square root transformation, √y: appropriate when the error variance is proportional to the mean.

Exercise: demonstrate this result by taking an appropriate 1st order Taylor series expansion of the response.

Week 7 Lecture 3 - Variance stabilizing transformations


Write y for a typical response.

First order Taylor series expansion of √y about the expected value E(y) of y: √y is approximately

    √E(y) + (1 / (2√E(y))) (y − E(y)).

From this, the variance of √y is approximately

    Var(y) / (4 E(y)).

Week 7 Lecture 3 - Variance stabilizing transformations

So if the variance of y is proportional to the mean, the variance of √y is approximately constant.

The square root transformation is often used with count data: if a Poisson distribution is an appropriate model for the counts, then the variance is equal to the mean (a property of the Poisson distribution):

    E(N) = Var(N) = λ.
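As a quick numerical check of this property (a hypothetical sketch, not part of the lecture), one can simulate Poisson counts at several means and compare the variance before and after the square root transformation:

```python
import numpy as np

# Sketch: for Poisson counts, Var(y) = E(y), so Var(sqrt(y)) should be
# roughly constant (about 1/4) whatever the mean; all values are illustrative.
rng = np.random.default_rng(0)

for lam in [5, 20, 80]:
    y = rng.poisson(lam, size=200_000)
    # The raw variance tracks the mean; the transformed variance stays near 0.25.
    print(lam, round(y.var(), 2), round(np.sqrt(y).var(), 3))
```

The raw variances grow roughly in step with λ, while the transformed ones sit close to 1/4, as the Taylor expansion predicts.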


Week 7 Lecture 3 - Variance stabilizing transformations


Write y for a typical response.

Other common variance stabilizing transformations:

log y: appropriate when the standard deviation is proportional to the mean.

Exercise: demonstrate this result by taking an appropriate 1st order Taylor series expansion of the response.


Week 7 Lecture 3 - Variance stabilizing transformations

Other common transformations.

Log transformation: log y is approximately

    log E(y) + (y − E(y)) / E(y),

giving the variance of log y approximately

    Var(y) / E(y)².

If the standard deviation is proportional to the mean, the log transformation stabilizes the variance.
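A small simulation illustrates this result (a hypothetical sketch; the constant c and the mean levels are made up for illustration):

```python
import numpy as np

# Sketch: when sd(y) = c * E(y), the delta method gives
# Var(log y) ~ Var(y) / E(y)^2 = c^2, the same at every mean level.
rng = np.random.default_rng(0)
c = 0.1  # assumed coefficient of variation: sd proportional to the mean

for mean in [10.0, 100.0, 1000.0]:
    y = rng.normal(loc=mean, scale=c * mean, size=200_000)
    # Var(log y) should sit near c**2 = 0.01 regardless of the mean.
    print(mean, round(np.log(y).var(), 4))
```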


Week 7 Lecture 3 - Variance stabilizing transformations

Inverse transformation 1/y: useful when the error standard deviation is proportional to the square of the mean.

All the transformations we've considered are only appropriate with a positive response.

When zeros occur, we might use log(y + 1) or 1/(y + 1) instead of the log and inverse transformations.


Week 7 Lecture 3 - Variance stabilizing transformations

Freeman-Tukey transformation √y + √(y + 1): useful when the variance of y is proportional to the mean and some of the responses are zero or very small.

sin⁻¹(√y): useful when the variance of y is proportional to E(y)(1 − E(y)), such as binomial proportions where 0 ≤ y ≤ 1.
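For the arcsine case, a short simulation (hypothetical, with made-up m and p values) shows the stabilization across different proportions:

```python
import numpy as np

# Sketch: for y = X/m with X ~ Binomial(m, p), Var(y) = p(1-p)/m, and
# Var(arcsin(sqrt(y))) ~ 1/(4m) for every p, so the transform stabilizes it.
rng = np.random.default_rng(0)
m = 50  # assumed binomial denominator

for p in [0.2, 0.5, 0.8]:
    y = rng.binomial(m, p, size=200_000) / m
    # Each line should show a variance near 1/(4m) = 0.005.
    print(p, round(np.arcsin(np.sqrt(y)).var(), 4))
```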


Week 7 Lecture 3 - Comparing models for transformations of the response

If we're building a model for predictive purposes, we're usually interested in predictions on the original scale.

We can't compare models for a response y and a transformation z = f(y) (f invertible) of that response by looking at R² or σ̂ for the two different models.

How should we compare models then?


Week 7 Lecture 3 - Comparing models for transformations of the response

Develop a statistic for the model for the transformed response z which can be compared with the PRESS statistic for the model for y.

Write ẑᵢ,ᵢ for the prediction of zᵢ based on a fit to all the data with the i-th observation deleted.

Prediction of yᵢ (original scale): f⁻¹(ẑᵢ,ᵢ).


Week 7 Lecture 3 - Comparing models for transformations of the response

Analogue of the PRESS residual on the original scale:

    yᵢ − f⁻¹(ẑᵢ,ᵢ).

Compare

    Σᵢ₌₁ⁿ (yᵢ − f⁻¹(ẑᵢ,ᵢ))²

with the PRESS statistic, or

    Σᵢ₌₁ⁿ |yᵢ − f⁻¹(ẑᵢ,ᵢ)|

with the sum of absolute PRESS residuals.

The same idea can be used to compare models for two different transformations of the response.
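The comparison can be sketched in code (a hypothetical illustration with synthetic data, not the course examples): refit the model with one observation deleted at a time, back-transform the deleted-case prediction, and accumulate squared residuals on the original scale.

```python
import numpy as np

def loo_press_sq(X, resp, y, back=lambda t: t):
    """Sum of squared original-scale PRESS-type residuals: each observation is
    deleted, the model is refit on resp, and the deleted-case prediction is
    mapped back to the y scale by `back` (the inverse transformation f^{-1})."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], resp[keep], rcond=None)
        total += (y[i] - back(X[i] @ beta)) ** 2
    return total

# Synthetic data where the log-scale model is the true one by construction.
rng = np.random.default_rng(0)
n = 60
x = rng.uniform(1, 10, n)
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.2, n))
X = np.column_stack([np.ones(n), x])

press_y = loo_press_sq(X, y, y)                    # model for y: ordinary PRESS
press_log = loo_press_sq(X, np.log(y), y, np.exp)  # model for log y, back-transformed
print(press_y, press_log)
```

The absolute-residual version is the same loop with `abs(...)` in place of the square.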


Week 7 Lecture 3 - Snow Geese

Aerial survey counting methods to estimate snow geese numbers (Hudson Bay, Canada).

Test the reliability of expert counters.

Record the expert's estimated number of birds, and compare to exact photo counts.

A common model for count data is the Poisson distribution (mean = variance).

A square root transformation is therefore expected to help.

Also consider the log-transformation.


Week 7 Lecture 3 - Snow Geese

The transformation is an improvement.
There is still some concern that the variance increases with the mean: try the log?


Week 7 Lecture 3 - Snow Geese

The log-transform appears better than the square root transform.

However, we may prefer to work with the square root for interpretability (the data are counts).


Week 7 Lecture 3 - Snow Geese

Comparing the PRESS statistics on each scale:

                            Σᵢ₌₁ⁿ (yᵢ − f⁻¹(ẑᵢ,ᵢ))²    Σᵢ₌₁ⁿ |yᵢ − f⁻¹(ẑᵢ,ᵢ)|
    Original                        172,738                   1,475.55
    Square root transform           137,603                   1,295.89
    Log-transformed                 122,704                   1,257.81

The two statistics are consistent (not always the case: see the next example).
The log-transformation again seems to give the better fit.


Week 7 Lecture 3 - Inflation rates


Week 7 Lecture 3 - Inflation rates

Comparing the PRESS statistics on each scale:

                            Σᵢ₌₁ⁿ (yᵢ − f⁻¹(ẑᵢ,ᵢ))²    Σᵢ₌₁ⁿ |yᵢ − f⁻¹(ẑᵢ,ᵢ)|
    Original                         16,071                    433.84
    Log-transformed                  21,611                    431.19

The conflict between the two statistics is due to an outlier.

The log-transformation is preferred on the squared-PRESS statistic if the outlier is removed.


Week 7 Lecture 3 - Weighted regression

If the constant error variance assumption seems to be violated, transformation is one approach to fixing the problem.

Another approach: change the model to allow a variance which is not constant.

Errors εᵢ, i = 1, ..., n. Suppose we know Var(εᵢ) = σᵢ² (the variances of the errors are not necessarily constant), or suppose we know weights wᵢ such that Var(εᵢ) = σ²wᵢ where σ² is unknown.

Write V for the covariance matrix of the errors. V is a diagonal matrix with diagonal elements the σᵢ² or σ²wᵢ. In the second situation V = σ²W, where W is the diagonal matrix of weights.


Week 7 Lecture 3 - Weighted regression

Maximum likelihood estimator of β:

    β̂ = (XᵀV⁻¹X)⁻¹ XᵀV⁻¹y,

where X is the design matrix and y is the vector of responses.

Covariance matrix of β̂:

    (XᵀV⁻¹X)⁻¹.

When V = σ²W, we have

    β̂ = (XᵀW⁻¹X)⁻¹ XᵀW⁻¹y

and the covariance matrix is

    σ²(XᵀW⁻¹X)⁻¹.
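In code the estimator is a one-liner (a hypothetical sketch with made-up data; the weights `w` and `beta_true` are illustrative, not from the lecture):

```python
import numpy as np

# Sketch: weighted least squares with known weights, Var(eps_i) = sigma^2 * w_i.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
w = 1.0 + x                                   # assumed known weights
X = np.column_stack([np.ones(n), x])
beta_true = np.array([2.0, 0.5])
y = X @ beta_true + rng.normal(0, 1, n) * np.sqrt(w)   # sigma = 1

Winv = np.diag(1.0 / w)
# beta_hat = (X' W^{-1} X)^{-1} X' W^{-1} y, as on the slide
beta_hat = np.linalg.solve(X.T @ Winv @ X, X.T @ Winv @ y)
# Its covariance matrix would be estimated by sigma_hat^2 (X' W^{-1} X)^{-1}.
print(beta_hat)  # close to beta_true
```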


Week 7 Lecture 3 - Weighted regression

β̂ minimizes

    Σᵢ₌₁ⁿ wᵢ⁻¹ (yᵢ − xᵢᵀβ)²,

a weighted least squares type criterion. Observations with large variance are less reliable and get less weight.


Week 7 Lecture 3 - Weighted regression

When suitable weights wᵢ are known, or when σᵢ² is known, weighted regression may be preferable to a variance stabilizing transformation.

Much of the theory of linear models for the constant variance case can be carried over.

We may be able to estimate weights from the data: if we have multiple observations for each combination of predictor values, the variances can be estimated from the data.

Sometimes it may be natural to take the weights wᵢ to be proportional to one of the predictors.


Week 7 Lecture 3 - Example: Transfer efficiency data

Model equipment efficiency (response) as a function of air velocity and voltage.

Experiment conducted:

For each of 2 levels of air velocity and 2 levels of voltage (4 combinations in all), ten observations were taken at each combination.

So we can estimate a variance for each of the 4 predictor combinations.

Perform weighted regression.
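The replicate-based strategy can be sketched as follows (a hypothetical illustration with made-up numbers; this is NOT the actual transfer efficiency data): estimate a variance within each predictor combination, then weight each observation by the inverse of its cell's estimated variance.

```python
import numpy as np

# Sketch: 2 x 2 predictor combinations, 10 replicates each; group variances
# are estimated from the replicates and used as (inverse) weights.
rng = np.random.default_rng(0)
voltage = np.repeat([50.0, 70.0], 20)
airvel = np.tile(np.repeat([100.0, 150.0], 10), 2)
group = voltage * 1000 + airvel                  # label each of the 4 cells
sd_by_group = {50100.0: 1.0, 50150.0: 2.0, 70100.0: 4.0, 70150.0: 8.0}
y = (142 - 0.9 * voltage - 0.12 * airvel
     + np.array([rng.normal(0, sd_by_group[g]) for g in group]))

# Within a cell the predictors are constant, so the within-cell sample
# variance estimates the error variance there ...
var_hat = {g: y[group == g].var(ddof=1) for g in np.unique(group)}
v = np.array([var_hat[g] for g in group])

# ... then do weighted least squares with the estimated variances.
X = np.column_stack([np.ones_like(y), voltage, airvel])
Vinv = np.diag(1.0 / v)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_hat)
```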


Week 7 Lecture 3 - Example: Transfer efficiency data

Unweighted Regression:
The regression equation is
Efficiency = 143 − 0.927 Voltage − 0.138 AirVelocity
S = 5.40890   R-Sq = 79.2%   R-Sq(adj) = 78.1%

Weighted Regression:
The regression equation is
Efficiency = 142 − 0.924 Voltage − 0.124 AirVelocity
S = 0.983881   R-Sq = 87.3%   R-Sq(adj) = 86.6%


Week 7 Lecture 3 - Transformations and nonlinearity

Previously: transformations of the response to stabilize the error variance.

Transformations can also be helpful when there is evidence of a nonlinear relationship between the response and predictors.

Simplest case: interested in describing a relationship between a response y and a single predictor x.

Different nonlinear relationships can be captured by transforming y, and incorporating transformation(s) of x into a linear model.

Common nonlinear relationships: parabolic, hyperbolic, exponential, inverse exponential, power.


Week 7 Lecture 3 - Transformations and nonlinearity

For the moment ignore the random component of the model. Assume y is positive.

Parabolic relationship between y and x:

    y = β₀ + β₁x + β₂x².

Introduce the new predictor x² (a transformation of x).

Hyperbolic relationship between y and x:

    y = x / (β₀ + β₁x).

Then

    1/y = β₁ + β₀ (1/x).

Transform the response to 1/y, and use the predictor 1/x.


Week 7 Lecture 3 - Transformations and nonlinearity

Exponential relationship:

    y = β₀ exp(β₁x).

Take logarithms:

    log y = log β₀ + β₁x.

Inverse exponential relationship:

    y = β₀ exp(β₁/x).

Take logarithms:

    log y = log β₀ + β₁/x.

So transform y to log y, and use 1/x as the predictor.

Power relationship:

    y = β₀ x^β₁.

Take logarithms:

    log y = log β₀ + β₁ log x.
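The power case, for example, can be fitted by ordinary least squares after the log-log transformation (a hypothetical sketch; β₀ = 3 and β₁ = 1.5 are made-up values):

```python
import numpy as np

# Sketch: y = b0 * x**b1 with multiplicative noise becomes linear after logs:
# log y = log b0 + b1 * log x.
rng = np.random.default_rng(0)
x = rng.uniform(1, 20, 100)
y = 3.0 * x ** 1.5 * np.exp(rng.normal(0, 0.05, 100))

X = np.column_stack([np.ones_like(x), np.log(x)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
b0_hat, b1_hat = np.exp(coef[0]), coef[1]
print(b0_hat, b1_hat)  # should land near 3.0 and 1.5
```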


Week 7 Lecture 3 - Transformations and model error structure

The effect of transformations of the response on the model error structure needs to be carefully considered.

Suppose

    yᵢ = β₀ exp(β₁xᵢ) + εᵢ

where the εᵢ are zero mean errors with constant variance.

Taking logarithms of the mean of yᵢ gives something linear in the unknown parameters.

But taking logarithms of both sides of the above does not give a model of the form

    log yᵢ = β₀ + β₁xᵢ + εᵢ

where the εᵢ are zero mean errors with constant variance.

The effect of a transformation on the model errors needs to be considered. It may be better to work with the original nonlinear model: nonlinear regression (will be covered in later statistics courses).
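A quick simulation makes the point concrete (hypothetical values: additive, constant-variance noise on the original scale):

```python
import numpy as np

# Sketch: y_i = b0*exp(b1*x_i) + eps_i with constant Var(eps_i); after taking
# logs, the implied errors log(y) - log(E(y)) no longer have constant variance.
rng = np.random.default_rng(0)
x = np.repeat([1.0, 5.0], 100_000)
mean = 2.0 * np.exp(0.5 * x)                 # b0 = 2, b1 = 0.5 (made up)
y = mean + rng.normal(0.0, 0.5, x.size)      # additive errors, sigma = 0.5

resid_log = np.log(y) - np.log(mean)
# The log-scale error variance differs sharply between the two x values.
print(resid_log[x == 1.0].var(), resid_log[x == 5.0].var())
```

The larger the mean, the smaller the log-scale error variance (roughly σ²/E(y)²), so the transformed model's constant-variance assumption fails.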


Week 7 Lecture 3 - Learning Expectations

Understand the use of variance stabilizing transformations.

Be able to perform and work with weighted regression (including estimation of weights).

Appreciate the role of transformations for modelling non-linear relationships between a response and a single predictor.
