
MATH2831/2931
Linear Models / Higher Linear Models

September 13, 2013

Week 7 Lecture 3 - Last lecture:

DFFITS

DFBETAS

COVRATIO

Week 7 Lecture 3 - This lecture:

Variance stabilizing transformations.

Examples: snow geese, inflation rates and central bank independence.

Weighted regression.

Transformations for modelling non-linear relationships between a response and a single predictor.

Week 7 Lecture 3 - Transformations

Diagnostic measures detect violations of assumptions: how do we deal with these violations?

One approach which is sometimes applicable: the use of transformations.

The appropriateness of a transformation, and the type of transformation used, depends on the nature of the violations of assumptions.

Are there problems with: constant variance, independence of the noise random variables εᵢ, the distributional assumption for the noise, the mean...

Week 7 Lecture 3 - Transformations

This lecture: variance stabilizing transformations (appropriate when constancy of the error variance is violated).

Applying a transformation to fix one violation of assumptions may cause another violation which did not appear on the original scale.

What is the effect of a transformation on the model for the errors?

Week 7 Lecture 3 - Variance stabilizing transformations


Write y for a typical response.

Common variance stabilizing transformations:

Square root transformation, √y: appropriate when the error variance is proportional to the mean.

Exercise: demonstrate this result by taking an appropriate 1st order Taylor series expansion of the response.

Week 7 Lecture 3 - Variance stabilizing transformations


Write y for a typical response.

First order Taylor series expansion of √y about the expected value E(y) of y: √y is approximately

    √E(y) + (1 / (2√E(y))) (y − E(y)).

From this, the variance of √y is approximately

    Var(y) / (4 E(y)).

Week 7 Lecture 3 - Variance stabilizing transformations

So if the variance of y is proportional to the mean, the variance of √y is approximately constant.

The square root transformation is often used with count data: if a Poisson distribution is an appropriate model for the counts, then the variance is equal to the mean (a property of the Poisson distribution):

    E(N) = Var(N) = λ.
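As a quick numerical check of this property (a hypothetical sketch, not part of the lecture), one can simulate Poisson counts at several means and compare the variance before and after the square root transformation:

```python
import numpy as np

# Sketch: for Poisson counts, Var(y) = E(y), so Var(sqrt(y)) should be
# roughly constant (about 1/4) whatever the mean; all values are illustrative.
rng = np.random.default_rng(0)

for lam in [5, 20, 80]:
    y = rng.poisson(lam, size=200_000)
    # The raw variance tracks the mean; the transformed variance stays near 0.25.
    print(lam, round(y.var(), 2), round(np.sqrt(y).var(), 3))
```

The raw variances grow roughly in step with λ, while the transformed ones sit close to 1/4, as the Taylor expansion predicts.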


Week 7 Lecture 3 - Variance stabilizing transformations


Write y for a typical response.

Other common variance stabilizing transformations:

log y: appropriate when the standard deviation is proportional to the mean.

Exercise: demonstrate this result by taking an appropriate 1st order Taylor series expansion of the response.


Week 7 Lecture 3 - Variance stabilizing transformations

Other common transformations.

Log transformation: log y is approximately

    log E(y) + (y − E(y)) / E(y),

giving the variance of log y approximately

    Var(y) / E(y)².

If the standard deviation is proportional to the mean, the log transformation stabilizes the variance.
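A small simulation illustrates this result (a hypothetical sketch; the constant c and the mean levels are made up for illustration):

```python
import numpy as np

# Sketch: when sd(y) = c * E(y), the delta method gives
# Var(log y) ~ Var(y) / E(y)^2 = c^2, the same at every mean level.
rng = np.random.default_rng(0)
c = 0.1  # assumed coefficient of variation: sd proportional to the mean

for mean in [10.0, 100.0, 1000.0]:
    y = rng.normal(loc=mean, scale=c * mean, size=200_000)
    # Var(log y) should sit near c**2 = 0.01 regardless of the mean.
    print(mean, round(np.log(y).var(), 4))
```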


Week 7 Lecture 3 - Variance stabilizing transformations

Inverse transformation 1/y: useful when the error standard deviation is proportional to the square of the mean.

All the transformations we've considered are only appropriate with a positive response.

When zeros occur, we might use log(y + 1) or 1/(y + 1) instead of the log and inverse transformations.


Week 7 Lecture 3 - Variance stabilizing transformations

Freeman-Tukey transformation √y + √(y + 1): useful when the variance of y is proportional to the mean and some of the responses are zero or very small.

sin⁻¹(√y): useful when the variance of y is proportional to E(y)(1 − E(y)), such as binomial proportions where 0 ≤ y ≤ 1.
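For the arcsine case, a short simulation (hypothetical, with made-up m and p values) shows the stabilization across different proportions:

```python
import numpy as np

# Sketch: for y = X/m with X ~ Binomial(m, p), Var(y) = p(1-p)/m, and
# Var(arcsin(sqrt(y))) ~ 1/(4m) for every p, so the transform stabilizes it.
rng = np.random.default_rng(0)
m = 50  # assumed binomial denominator

for p in [0.2, 0.5, 0.8]:
    y = rng.binomial(m, p, size=200_000) / m
    # Each line should show a variance near 1/(4m) = 0.005.
    print(p, round(np.arcsin(np.sqrt(y)).var(), 4))
```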


Week 7 Lecture 3 - Comparing models for transformations of the response

If we're building a model for predictive purposes, we're usually interested in predictions on the original scale.

We can't compare models for a response y and a transformation z = f(y) (f invertible) of that response by looking at R² or σ̂ for the two different models.

How should we compare models then?


Week 7 Lecture 3 - Comparing models for transformations of the response

Develop a statistic for the model for the transformed response z which can be compared with the PRESS statistic for the model for y.

Write ẑᵢ,ᵢ for the prediction of zᵢ based on a fit to all the data with the i-th observation deleted.

Prediction of yᵢ (original scale): f⁻¹(ẑᵢ,ᵢ).


Week 7 Lecture 3 - Comparing models for transformations of the response

Analogue of the PRESS residual on the original scale:

    yᵢ − f⁻¹(ẑᵢ,ᵢ).

Compare

    Σᵢ₌₁ⁿ (yᵢ − f⁻¹(ẑᵢ,ᵢ))²

with the PRESS statistic, or

    Σᵢ₌₁ⁿ |yᵢ − f⁻¹(ẑᵢ,ᵢ)|

with the sum of absolute PRESS residuals.

The same idea can be used to compare models for two different transformations of the response.
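The comparison can be sketched in code (a hypothetical illustration with synthetic data, not the course examples): refit the model with one observation deleted at a time, back-transform the deleted-case prediction, and accumulate squared residuals on the original scale.

```python
import numpy as np

def loo_press_sq(X, resp, y, back=lambda t: t):
    """Sum of squared original-scale PRESS-type residuals: each observation is
    deleted, the model is refit on resp, and the deleted-case prediction is
    mapped back to the y scale by `back` (the inverse transformation f^{-1})."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], resp[keep], rcond=None)
        total += (y[i] - back(X[i] @ beta)) ** 2
    return total

# Synthetic data where the log-scale model is the true one by construction.
rng = np.random.default_rng(0)
n = 60
x = rng.uniform(1, 10, n)
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.2, n))
X = np.column_stack([np.ones(n), x])

press_y = loo_press_sq(X, y, y)                    # model for y: ordinary PRESS
press_log = loo_press_sq(X, np.log(y), y, np.exp)  # model for log y, back-transformed
print(press_y, press_log)
```

The absolute-residual version is the same loop with `abs(...)` in place of the square.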


Week 7 Lecture 3 - Snow Geese

Aerial survey counting methods to estimate snow geese numbers (Hudson Bay, Canada).

Test the reliability of expert counters.

Record the expert's estimated number of birds, and compare to exact photo counts.

A common model for count data is the Poisson distribution (mean = variance).

A square root transformation is therefore expected to help.

Also consider the log-transformation.


Week 7 Lecture 3 - Snow Geese

The transformation is an improvement.
There is still some concern that the variance increases with the mean: try the log?


Week 7 Lecture 3 - Snow Geese

The log-transform appears better than the square root transform.

However, we may prefer to work with the square root for interpretability (the data are counts).


Week 7 Lecture 3 - Snow Geese

Comparing the PRESS statistics on each scale:

                            Σᵢ₌₁ⁿ (yᵢ − f⁻¹(ẑᵢ,ᵢ))²    Σᵢ₌₁ⁿ |yᵢ − f⁻¹(ẑᵢ,ᵢ)|
    Original                        172,738                   1,475.55
    Square root transform           137,603                   1,295.89
    Log-transformed                 122,704                   1,257.81

The two statistics are consistent (not always the case: see the next example).
The log-transformation again seems to give the better fit.


Week 7 Lecture 3 - Inflation rates


Week 7 Lecture 3 - Inflation rates

Comparing the PRESS statistics on each scale:

                            Σᵢ₌₁ⁿ (yᵢ − f⁻¹(ẑᵢ,ᵢ))²    Σᵢ₌₁ⁿ |yᵢ − f⁻¹(ẑᵢ,ᵢ)|
    Original                         16,071                    433.84
    Log-transformed                  21,611                    431.19

The conflict between the two statistics is due to an outlier.

The log-transformation is preferred on the squared-PRESS statistic if the outlier is removed.


Week 7 Lecture 3 - Weighted regression

If the constant error variance assumption seems to be violated, transformation is one approach to fixing the problem.

Another approach: change the model to allow a variance which is not constant.

Errors εᵢ, i = 1, ..., n. Suppose we know Var(εᵢ) = σᵢ² (the variances of the errors are not necessarily constant), or suppose we know weights wᵢ such that Var(εᵢ) = σ²wᵢ where σ² is unknown.

Write V for the covariance matrix of the errors. V is a diagonal matrix with diagonal elements the σᵢ² or σ²wᵢ. In the second situation V = σ²W, where W is the diagonal matrix of weights.


Week 7 Lecture 3 - Weighted regression

Maximum likelihood estimator of β:

    β̂ = (XᵀV⁻¹X)⁻¹ XᵀV⁻¹y,

where X is the design matrix and y is the vector of responses.

Covariance matrix of β̂:

    (XᵀV⁻¹X)⁻¹.

When V = σ²W, we have

    β̂ = (XᵀW⁻¹X)⁻¹ XᵀW⁻¹y

and the covariance matrix is

    σ²(XᵀW⁻¹X)⁻¹.
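In code the estimator is a one-liner (a hypothetical sketch with made-up data; the weights `w` and `beta_true` are illustrative, not from the lecture):

```python
import numpy as np

# Sketch: weighted least squares with known weights, Var(eps_i) = sigma^2 * w_i.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
w = 1.0 + x                                   # assumed known weights
X = np.column_stack([np.ones(n), x])
beta_true = np.array([2.0, 0.5])
y = X @ beta_true + rng.normal(0, 1, n) * np.sqrt(w)   # sigma = 1

Winv = np.diag(1.0 / w)
# beta_hat = (X' W^{-1} X)^{-1} X' W^{-1} y, as on the slide
beta_hat = np.linalg.solve(X.T @ Winv @ X, X.T @ Winv @ y)
# Its covariance matrix would be estimated by sigma_hat^2 (X' W^{-1} X)^{-1}.
print(beta_hat)  # close to beta_true
```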


Week 7 Lecture 3 - Weighted regression

β̂ minimizes

    Σᵢ₌₁ⁿ wᵢ⁻¹ (yᵢ − xᵢᵀβ)²,

a weighted least squares type criterion. Observations with large variance are less reliable and get less weight.


Week 7 Lecture 3 - Weighted regression

When suitable weights wᵢ are known, or when σᵢ² is known, weighted regression may be preferable to a variance stabilizing transformation.

Much of the theory of linear models for the constant variance case can be carried over.

We may be able to estimate weights from the data: if we have multiple observations for each combination of predictor values, the variances can be estimated from the data.

Sometimes it may be natural to take the weights wᵢ to be proportional to one of the predictors.


Week 7 Lecture 3 - Example: Transfer efficiency data

Model equipment efficiency (response) as a function of air velocity and voltage.

Experiment conducted:

For each of 2 levels of air velocity and 2 levels of voltage (4 combinations in all), ten observations were taken at each combination.

So we can estimate a variance for each of the 4 predictor combinations.

Perform weighted regression.
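The replicate-based strategy can be sketched as follows (a hypothetical illustration with made-up numbers; this is NOT the actual transfer efficiency data): estimate a variance within each predictor combination, then weight each observation by the inverse of its cell's estimated variance.

```python
import numpy as np

# Sketch: 2 x 2 predictor combinations, 10 replicates each; group variances
# are estimated from the replicates and used as (inverse) weights.
rng = np.random.default_rng(0)
voltage = np.repeat([50.0, 70.0], 20)
airvel = np.tile(np.repeat([100.0, 150.0], 10), 2)
group = voltage * 1000 + airvel                  # label each of the 4 cells
sd_by_group = {50100.0: 1.0, 50150.0: 2.0, 70100.0: 4.0, 70150.0: 8.0}
y = (142 - 0.9 * voltage - 0.12 * airvel
     + np.array([rng.normal(0, sd_by_group[g]) for g in group]))

# Within a cell the predictors are constant, so the within-cell sample
# variance estimates the error variance there ...
var_hat = {g: y[group == g].var(ddof=1) for g in np.unique(group)}
v = np.array([var_hat[g] for g in group])

# ... then do weighted least squares with the estimated variances.
X = np.column_stack([np.ones_like(y), voltage, airvel])
Vinv = np.diag(1.0 / v)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_hat)
```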


Week 7 Lecture 3 - Example: Transfer efficiency data

Unweighted Regression:
The regression equation is
Efficiency = 143 − 0.927 Voltage − 0.138 AirVelocity
S = 5.40890   R-Sq = 79.2%   R-Sq(adj) = 78.1%

Weighted Regression:
The regression equation is
Efficiency = 142 − 0.924 Voltage − 0.124 AirVelocity
S = 0.983881   R-Sq = 87.3%   R-Sq(adj) = 86.6%


Week 7 Lecture 3 - Transformations and nonlinearity

Previously: transformations of the response to stabilize the error variance.

Transformations can also be helpful when there is evidence of a nonlinear relationship between the response and predictors.

Simplest case: interested in describing a relationship between a response y and a single predictor x.

Different nonlinear relationships can be captured by transforming y, and incorporating transformation(s) of x into a linear model.

Common nonlinear relationships: parabolic, hyperbolic, exponential, inverse exponential, power.


Week 7 Lecture 3 - Transformations and nonlinearity

For the moment ignore the random component of the model. Assume y is positive.

Parabolic relationship between y and x:

    y = β₀ + β₁x + β₂x².

Introduce the new predictor x² (a transformation of x).

Hyperbolic relationship between y and x:

    y = x / (β₀ + β₁x).

Then

    1/y = β₁ + β₀ (1/x).

Transform the response to 1/y, and use the predictor 1/x.


Week 7 Lecture 3 - Transformations and nonlinearity

Exponential relationship:

    y = β₀ exp(β₁x).

Take logarithms:

    log y = log β₀ + β₁x.

Inverse exponential relationship:

    y = β₀ exp(β₁/x).

Take logarithms:

    log y = log β₀ + β₁/x.

So transform y to log y, and use 1/x as the predictor.

Power relationship:

    y = β₀ x^β₁.

Take logarithms:

    log y = log β₀ + β₁ log x.
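The power case, for example, can be fitted by ordinary least squares after the log-log transformation (a hypothetical sketch; β₀ = 3 and β₁ = 1.5 are made-up values):

```python
import numpy as np

# Sketch: y = b0 * x**b1 with multiplicative noise becomes linear after logs:
# log y = log b0 + b1 * log x.
rng = np.random.default_rng(0)
x = rng.uniform(1, 20, 100)
y = 3.0 * x ** 1.5 * np.exp(rng.normal(0, 0.05, 100))

X = np.column_stack([np.ones_like(x), np.log(x)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
b0_hat, b1_hat = np.exp(coef[0]), coef[1]
print(b0_hat, b1_hat)  # should land near 3.0 and 1.5
```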


Week 7 Lecture 3 - Transformations and model error structure

The effect of transformations of the response on the model error structure needs to be carefully considered.

Suppose

    yᵢ = β₀ exp(β₁xᵢ) + εᵢ

where the εᵢ are zero mean errors with constant variance.

Taking logarithms of the mean of yᵢ gives something linear in the unknown parameters.

But taking logarithms of both sides of the above does not give a model of the form

    log yᵢ = β₀ + β₁xᵢ + εᵢ

where the εᵢ are zero mean errors with constant variance.

The effect of a transformation on the model errors needs to be considered. It may be better to work with the original nonlinear model: nonlinear regression (will be covered in later statistics courses).
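A quick simulation makes the point concrete (hypothetical values: additive, constant-variance noise on the original scale):

```python
import numpy as np

# Sketch: y_i = b0*exp(b1*x_i) + eps_i with constant Var(eps_i); after taking
# logs, the implied errors log(y) - log(E(y)) no longer have constant variance.
rng = np.random.default_rng(0)
x = np.repeat([1.0, 5.0], 100_000)
mean = 2.0 * np.exp(0.5 * x)                 # b0 = 2, b1 = 0.5 (made up)
y = mean + rng.normal(0.0, 0.5, x.size)      # additive errors, sigma = 0.5

resid_log = np.log(y) - np.log(mean)
# The log-scale error variance differs sharply between the two x values.
print(resid_log[x == 1.0].var(), resid_log[x == 5.0].var())
```

The larger the mean, the smaller the log-scale error variance (roughly σ²/E(y)²), so the transformed model's constant-variance assumption fails.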


Week 7 Lecture 3 - Learning Expectations

Understand the use of variance stabilizing transformations.

Be able to perform and work with weighted regression (including estimation of weights).

Appreciate the role of transformations for modelling non-linear relationships between a response and a single predictor.
