
Variance Stabilizing Transformations

Variance is Related to Mean


• Usual Assumption in ANOVA and Regression is that
the variance of each observation is the same
• Problem: In many cases, the variance is not constant, but
is related to the mean.
– Poisson Data (Counts of events): $E(Y) = V(Y) = m$
– Binomial Data (and Percents): $E(Y) = np$, $V(Y) = np(1-p)$
– General Case: $E(Y) = m$, $V(Y) = W(m)$
– Power relationship: $V(Y) = s^2 = a^2 m^{2b}$

$$ s = a m^{b} \;\Rightarrow\; \ln(s) = \ln(a) + b \ln(m) = a^* + b\, m^* $$

where $a^* = \ln(a)$ and $m^* = \ln(m)$.
Transformation to Stabilize Variance
(Approximately)
• $V(Y) = s^2 = W(m)$. Then let:

$$ f(m) = \int \frac{1}{W(m)^{1/2}}\, dm \;\Rightarrow\; V(f(Y)) \approx \text{constant} $$

This results from a Taylor Series expansion:


$$ f(Y) \approx f(m) + (Y - m)\, f'(m) $$
$$ \Rightarrow\; [f(Y) - f(m)]^2 \approx (Y - m)^2\, [f'(m)]^2 $$
$$ \Rightarrow\; V(f(Y)) \approx W(m)\, [f'(m)]^2 = W(m) \left[ \frac{1}{(W(m))^{1/2}} \right]^2 = 1 $$
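The approximation can be checked numerically for the Poisson case, where W(m) = m and the integral gives f(m) = 2·sqrt(m). The sketch below is illustrative only: the helper names (`poisson_pmf`, `variance_of`, `raw`, `stabilized`) and the chosen means are assumptions, not part of the slides. It computes the variance of Y and of 2·sqrt(Y) directly from the Poisson probabilities rather than by simulation.

```python
import math

def poisson_pmf(k, m):
    """P(Y = k) for Y ~ Poisson(m), computed on the log scale to avoid overflow."""
    return math.exp(-m + k * math.log(m) - math.lgamma(k + 1))

def variance_of(f, m):
    """Variance of f(Y) for Y ~ Poisson(m), summed over a range holding essentially all the mass."""
    upper = int(m + 12 * math.sqrt(m) + 20)
    probs = [poisson_pmf(k, m) for k in range(upper + 1)]
    mean_f = sum(p * f(k) for k, p in enumerate(probs))
    return sum(p * (f(k) - mean_f) ** 2 for k, p in enumerate(probs))

def raw(y):
    return float(y)

def stabilized(y):
    # W(m) = m for Poisson data, so f(m) = integral of m**(-1/2) dm = 2 * sqrt(m)
    return 2.0 * math.sqrt(y)

# Hypothetical means chosen for illustration
results = {m: (variance_of(raw, m), variance_of(stabilized, m)) for m in (20, 50, 100)}
for m, (v_raw, v_stab) in results.items():
    print(f"m = {m:3d}   V(Y) = {v_raw:8.3f}   V(2*sqrt(Y)) = {v_stab:.3f}")
```

The untransformed variance grows with m, while the variance of the transformed variable stays close to 1 across the three means, which is what the Taylor series argument predicts to first order.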
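Special Case: $s^2 = a^2 m^{2b}$

Case 1: $b \neq 1$:
$$ f(m) = \int \frac{1}{W(m)^{1/2}}\, dm = \int \frac{1}{a m^{b}}\, dm = \frac{1}{a} \left[ \frac{m^{-b+1}}{-b+1} \right] = c\, m^{1-b} $$

Case 2: $b = 1$:
$$ f(m) = \int \frac{1}{W(m)^{1/2}}\, dm = \int \frac{1}{a m}\, dm = \frac{1}{a} \ln(m) $$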
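The two cases translate into a simple rule for picking a transformation once b is known or estimated. The following is a minimal sketch, assuming the constant multipliers (c and 1/a) can be dropped because they rescale the variance without changing whether it is constant. The function name `stabilizing_transform` and its tolerance argument are hypothetical.

```python
import math

def stabilizing_transform(b, tol=1e-9):
    """Return f for V(Y) = a**2 * m**(2*b), ignoring constant multipliers."""
    if math.isclose(b, 1.0, abs_tol=tol):
        return math.log              # Case 2: b = 1 gives the log transformation
    power = 1.0 - b
    return lambda y: y ** power      # Case 1: b != 1 gives the power 1 - b

sqrt_like = stabilizing_transform(0.5)    # b = 1/2: square root
log_like = stabilizing_transform(1.0)     # b = 1: natural log
recip_like = stabilizing_transform(2.0)   # b = 2: reciprocal

print(sqrt_like(16.0), log_like(math.e), recip_like(4.0))
```

Here b = 1/2 gives the square root, b = 1 the logarithm, and b = 2 the reciprocal, the familiar members of this power family.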
Estimating b From Sample Data

• For each group in an ANOVA (or set of similar X levels
in Regression), obtain the sample mean and
standard deviation
• Fit a simple linear regression, relating the log of
the standard deviation to the log of the mean
• The regression coefficient of the log of the mean
is an estimate of b
• For large n, can fit a regression of squared
residuals on predictors expected to be related to
variance
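The first three steps can be sketched in a few lines. This is an illustration under stated assumptions: the function names (`least_squares`, `estimate_b`) are hypothetical, and the groups are made-up data constructed so that each sample standard deviation is exactly 10% of its group mean, which makes the true b equal to 1.

```python
import math
import statistics

def least_squares(x, y):
    """Simple linear regression of y on x; returns (intercept, slope)."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def estimate_b(groups):
    """Regress ln(sd) on ln(mean) across groups; the slope estimates b."""
    log_means = [math.log(statistics.mean(g)) for g in groups]
    log_sds = [math.log(statistics.stdev(g)) for g in groups]
    return least_squares(log_means, log_sds)

# Hypothetical groups (not from the slides): [m - d, m, m + d] with d = 0.1 * m,
# so each sample standard deviation equals 0.1 * m.
groups = [[m - 0.1 * m, m, m + 0.1 * m] for m in (10.0, 20.0, 50.0, 100.0)]
intercept, b_hat = estimate_b(groups)
print(f"estimate of ln(a) = {intercept:.4f}, estimate of b = {b_hat:.4f}")
```

Because the standard deviation is proportional to the mean in this constructed data, the fitted slope recovers b = 1 and the intercept recovers ln(0.1). Real data will scatter around the line.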
Example - Bovine Growth Hormone
[Figure: Bovine Growth Hormone Data. Plot of group standard deviation (vertical axis, Std Dev, 0 to 70) against group mean (horizontal axis, Mean, 0 to 500).]
Example - Bovine Growth Hormone

ln(mean)    ln(sd)
5.7807      3.6687
6.0684      4.0993
5.7900      3.6661
5.7621      3.9703
5.7838      3.8351
5.7930      3.7612
4.9972      3.1946
5.3799      3.4751
4.9416      2.9755
5.0239      3.4340
4.9904      3.0910
5.0239      3.0204

Regression of ln(sd) on ln(mean):

             Coefficients    Standard Error
Intercept    -1.0553         0.5373
ln(mean)      0.8396         0.0984

Estimated b = 0.84 ≈ 1. After a logarithmic transformation, the data should have approximately constant variance.
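The regression output in this example can be checked directly from the twelve tabulated (ln(mean), ln(sd)) pairs. The sketch below uses the standard least-squares formulas for the coefficients and their standard errors. The variable names are illustrative assumptions, and only the data values come from the slides.

```python
import math

# (ln(mean), ln(sd)) pairs as tabulated in the example
ln_mean = [5.7807, 6.0684, 5.7900, 5.7621, 5.7838, 5.7930,
           4.9972, 5.3799, 4.9416, 5.0239, 4.9904, 5.0239]
ln_sd = [3.6687, 4.0993, 3.6661, 3.9703, 3.8351, 3.7612,
         3.1946, 3.4751, 2.9755, 3.4340, 3.0910, 3.0204]

n = len(ln_mean)
x_bar = sum(ln_mean) / n
y_bar = sum(ln_sd) / n
sxx = sum((x - x_bar) ** 2 for x in ln_mean)
syy = sum((y - y_bar) ** 2 for y in ln_sd)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ln_mean, ln_sd))

slope = sxy / sxx                      # estimate of b
intercept = y_bar - slope * x_bar      # estimate of ln(a)

# Residual mean square on n - 2 degrees of freedom, then the usual standard errors
mse = (syy - slope * sxy) / (n - 2)
se_slope = math.sqrt(mse / sxx)
se_intercept = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))

print(f"Intercept: {intercept:.4f} (SE {se_intercept:.4f})")
print(f"ln(mean):  {slope:.4f} (SE {se_slope:.4f})")
```

The printed coefficients and standard errors should land close to the tabulated regression output, with any small differences due to rounding in the listed logs.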
