
Comparing Two Means

Often we have two unknown means and are interested in comparing them to each other. Usually the null hypothesis is H0: no difference between the population means. There are a number of related testing procedures, and which one you choose depends on your data. We will present three basic procedures here:

- Paired t-test, for paired or matched data.
- Two-sample t-tests, for comparing two independent groups. Two basic independent-sample tests will be presented:
  - Equal-variance t-tests: the two groups can be assumed to have equal variances.
  - Unequal-variance t-tests: the two groups are not assumed to have equal variances.

Paired t-test
When the means being compared come from observations that are naturally paired or matched, a paired t-test is used. Examples: before vs. after studies (also called longitudinal studies) produce paired data; each patient contributes two paired observations, the before value and the after value. Other types of studies can produce paired data as well. One possibility would be a dental study where both opposing treatments are used in each patient, in randomly assigned half-mouths.

Computing a Paired t-test


To compute a paired t-test, focus on the within-pair differences (for example, after − before). Perform a t-test on the mean of the differences. To test whether the means are different, the null hypothesis is H0: mean difference = 0. Note: even though we are comparing two means, this is still considered a one-sample test.

Example: fluoride varnish study


In ten at-risk children, fluoride varnish is applied in randomly assigned half-mouths. The remaining half-mouths are left untreated. The children are followed for two years and the new dmfs and their locations are recorded:

  Patient:      1   2   3   4   5   6   7   8   9  10 |  mean   sd
  Varnish:      2   1   0   2   0   0   2   1   3   5 |  1.6
  Untreated:    3   2   1   0   0   2   5   1   7   4 |  2.5
  Difference:  -1  -1  -1   2   0  -2  -3   0  -4   1 | -0.90   1.79
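
As a quick cross-check of the hand calculation that follows, here is a minimal Python sketch (assuming NumPy and SciPy are available; the variable names are illustrative, not part of the original study):

```python
import numpy as np
from scipy import stats

# New dmfs per child from the fluoride varnish example
varnish   = np.array([2, 1, 0, 2, 0, 0, 2, 1, 3, 5])
untreated = np.array([3, 2, 1, 0, 0, 2, 5, 1, 7, 4])

# A paired t-test is a one-sample t-test on the within-pair differences
diff = varnish - untreated
print(diff.mean(), diff.std(ddof=1))     # -0.90 and about 1.79

t, p = stats.ttest_1samp(diff, 0.0)      # H0: mean difference = 0
print(t, p)                              # about -1.59, p about 0.15

# Equivalent shortcut: SciPy's paired t-test gives the same result
print(stats.ttest_rel(varnish, untreated))
```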

To perform the paired t-test, compute a one-sample t-test on the column of differences, with H0: mean difference = 0:

T = (−0.90 − 0) / (1.79 / √10) = −1.59.

For a two-tailed test, compare |−1.59| = 1.59 to t9,.975 = 2.262. We do not reject, since 1.59 < 2.262. The p-value is P(|t9| > 1.59) = 2 P(t9 > 1.59) = 0.15.

Comparing means of two independent samples

These are called two-sample tests. Our goal is usually to estimate μ1 − μ2 and the corresponding confidence intervals, and to perform hypothesis tests of H0: μ1 − μ2 = 0. For each sample we compute the relevant statistics:

  Sample 1: n1, X̄1, s1
  Sample 2: n2, X̄2, s2

The obvious statistic for comparing the two population means is X̄1 − X̄2. Probability theory tells us that:

1. X̄1 − X̄2 is the best estimate of μ1 − μ2.
2. Its standard error is √(σ1²/n1 + σ2²/n2).
3. For large n1 and n2, X̄1 − X̄2 ~ N(μ1 − μ2, σ1²/n1 + σ2²/n2).

In order to compute hypothesis tests and confidence intervals for μ1 − μ2 we will need to estimate the standard error of X̄1 − X̄2. Two different estimation procedures are commonly used, depending on whether one feels it is reasonable to assume the two groups have similar variances.

RULES OF THUMB for deciding whether to use the equal-variance or unequal-variance formulas:

1. For small samples, use the equal-variance formulas unless s1 is twice as big as s2, or the other way around.
2. If n1 and n2 > 80, you can use the unequal-variance formula for the SE (it's easier to compute) and use the Normal distribution.
3. If you are unsure, the unequal-variance formula will be the conservative choice (less power, but less likely to be incorrect).
4. The calculations are a snap with a computer program. If you are unsure about the variance assumptions, compute the test both ways and see if there is a conflict, as in the sketch below.
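
Rule 4 is easy to follow in practice; here is a minimal sketch (hypothetical data, assuming SciPy is installed) that runs the two-sample test both ways:

```python
from scipy import stats

# Hypothetical measurements for two independent groups
group1 = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]
group2 = [6.2, 7.1, 5.9, 8.3, 6.8, 7.4, 6.1]

# Equal-variance (pooled) t-test and unequal-variance (Welch) t-test
equal   = stats.ttest_ind(group1, group2, equal_var=True)
unequal = stats.ttest_ind(group1, group2, equal_var=False)

print("equal variances assumed:    ", equal)
print("equal variances not assumed:", unequal)
# If the two versions disagree, the Welch result is the conservative choice.
```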

Equal Variance case: σ1 = σ2

If it is reasonable to assume that σ1 = σ2, we can estimate the standard error more efficiently by combining the samples. The standard error of X̄1 − X̄2 is estimated by

SE(X̄1 − X̄2) = s_pooled × √(1/n1 + 1/n2),

where the pooled standard deviation, s_pooled, is

s_pooled = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ].

This pooled standard deviation is roughly the combined distance of the observations from their respective means. The T statistic

T_equal = (X̄1 − X̄2) / SE(X̄1 − X̄2)

has a t distribution with n1 + n2 − 2 degrees of freedom.

Example: Confidence interval for the difference between means, using the gum data from day 1.

            Gum A          Gum C
  n         n1 = 25        n2 = 40
  mean      X̄1 = −0.72     X̄2 = 2.63
  sd        s1 = 5.37      s2 = 3.80

Assume equal variances (s1/s2 = 5.37/3.80 < 2):

s_pooled = √[ (24 × 5.37² + 39 × 3.80²) / (25 + 40 − 2) ] = 4.46,

so the standard error of X̄1 − X̄2 is estimated as

SE(X̄1 − X̄2) = 4.46 × √(1/25 + 1/40) = 1.14,

and the 95% confidence interval is

(−0.72 − 2.63) ± 2.00 × 1.14 = (−5.63, −1.07),

where 2.00 = t63,.975.

Note: since the confidence interval does not cover 0, a two-sided hypothesis test of H0: μ1 − μ2 = 0 would reject at level α = 0.05. Check: T = |(−0.72 − 2.63)/1.14| = 2.94 > 2.00 = t63,.975.

SPSS output for the Gum example:

Group Statistics (change in DMFS)
  gum type   N    Mean      Std. Deviation   Std. Error Mean
  A          25   -0.7200   5.36594          1.07319
  C          40    2.6250   3.80073          0.60095

Independent Samples Test (change in DMFS)
                               Levene's Test       t-test for Equality of Means
                               F       Sig.     t        df      Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
  Equal variances assumed      0.924   0.340    -2.940   63      0.005             -3.345       1.138              -5.61840       -1.07160
  Equal variances not assumed                   -2.720   39.05   0.010             -3.345       1.230              -5.83279       -0.85721
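
The SPSS results above can be reproduced from the summary statistics alone; a hedged sketch using SciPy's `ttest_ind_from_stats` (available in recent SciPy versions):

```python
from scipy import stats

# Gum example summary statistics: group A (n=25) and group C (n=40)
equal = stats.ttest_ind_from_stats(mean1=-0.72, std1=5.37, nobs1=25,
                                   mean2=2.63,  std2=3.80, nobs2=40,
                                   equal_var=True)    # t about -2.94, p about 0.005
welch = stats.ttest_ind_from_stats(mean1=-0.72, std1=5.37, nobs1=25,
                                   mean2=2.63,  std2=3.80, nobs2=40,
                                   equal_var=False)   # t about -2.72, p about 0.01
print(equal)
print(welch)
```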

Unequal Variance case: σ1 ≠ σ2

If one is not sure that the variances are equal, it is usually safest to assume that they are not. The standard error of X̄1 − X̄2 is estimated by

SE(X̄1 − X̄2) = √(s1²/n1 + s2²/n2).

The T statistic

T_unequal = (X̄1 − X̄2) / SE(X̄1 − X̄2)

has a t distribution with degrees of freedom that can be estimated by

df ≈ (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ].

Note: if n1 and n2 > 80, then the standard Normal distribution can be used in place of t, which removes the need to estimate the degrees of freedom.

Example: NHANES III data

807 participants received a dental exam and answered the chewing-tobacco question. SPSS t-test output below.

Group Statistics (mean attachment loss)
  currently chew tobacco   N     Mean   Std. Deviation   Std. Error Mean
  yes                      341   1.71   1.724            0.093
  no                       466   1.50   1.381            0.064

Independent Samples Test (mean attachment loss)
                               Levene's Test       t-test for Equality of Means
                               F       Sig.    t       df      Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
  Equal variances assumed      5.682   0.02    1.980   805     0.048             0.22         0.109              0.002          0.431
  Equal variances not assumed                  1.914   632.3   0.056             0.22         0.113              -0.006         0.439

What to do? In this case choose the unequal-variances results. They rely less on assumptions, and the sample sizes are large enough that the SE estimates are probably close to optimal even if the variances are equal.
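
As a check on the SPSS output above, the unequal-variance (Welch) standard error and approximate degrees of freedom can be computed directly from the summary statistics; a minimal Python sketch:

```python
import math

# NHANES III summary statistics: mean attachment loss by chewing-tobacco use
n1, m1, s1 = 341, 1.71, 1.724   # currently chew tobacco: yes
n2, m2, s2 = 466, 1.50, 1.381   # currently chew tobacco: no

se = math.sqrt(s1**2 / n1 + s2**2 / n2)                  # about 0.113
df = (s1**2 / n1 + s2**2 / n2) ** 2 / (
    (s1**2 / n1) ** 2 / (n1 - 1) + (s2**2 / n2) ** 2 / (n2 - 1)
)                                                        # about 632
t = (m1 - m2) / se  # about 1.9 (differs slightly from SPSS's 1.914
                    # because the printed means are rounded)
print(se, df, t)
```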


ANOVA - Analysis of Variance



- Extends the independent-samples t test: compares the means of groups of independent observations.
- Can compare more than two groups.
- Don't be fooled by the name: ANOVA does not compare variances.

ANOVA Null and Alternative Hypotheses

Say the sample contains K independent groups. ANOVA tests the null hypothesis

H0: μ1 = μ2 = … = μK,

that is, the group means are all equal. The alternative hypothesis is

H1: μi ≠ μj for some i, j,

or, the group means are not all equal.

Example: Accuracy of Implant Placement

Implants were placed in a manikin using placement guides of various widths. 15 implants were placed using each guide. Error (discrepancy with a reference implant) was measured for each implant.

[Figure: Mean Error by Guide Width: mean implant height error (mm) for the 4mm, 6mm, and 8mm guides; the group means range from roughly 0.23 to 0.27 mm.]

The overall mean of the entire sample was 0.248 mm. This is called the grand mean, and is often denoted by X̄. If H0 were true then we'd expect the group means to be close to the grand mean.

The ANOVA test is based on the combined distances from X̄. If the combined distances are large, that indicates we should reject H0.


The ANOVA Statistic

To combine the differences from the grand mean we:
- square the differences,
- multiply by the numbers of observations in the groups,
- sum over the groups.

SSB = 15(X̄_4mm − X̄)² + 15(X̄_6mm − X̄)² + 15(X̄_8mm − X̄)²,

where the X̄_* are the group means. SSB stands for the Sum of Squares Between groups. Note: this looks a bit like a variance.

How big is big?

For the Implant Accuracy Data, SSB = 0.0047. Is that big enough to reject H0? As with the t test, we compare the statistic to the variability of the individual observations. In ANOVA the variability is estimated by the Mean Square Error, or MSE.

MSE (Mean Square Error)

The Mean Square Error is a measure of the variability after the group effects have been taken into account:

MSE = (1 / (N − K)) Σj Σi (xij − X̄j)²,

where xij is the i-th observation in the j-th group.

[Figure: Implant Height Error by Guide Width: individual implant height errors (mm), roughly 0.1 to 0.5, plotted for the 4mm, 6mm, and 8mm guides.]

Note that the variation of the means seems quite small compared to the variance of the observations within groups.

Notes on MSE
If there are only two groups, the MSE is equal to the pooled estimate of variance used in the equal-variance t test. ANOVA assumes that all the group variances are equal.

ANOVA F-statistic
The ANOVA is based on the F statistic

F = [SSB / (K − 1)] / MSE,

where K is the number of groups.

Under H0 the F statistic has an F distribution, with K − 1 and N − K degrees of freedom (N is the total number of observations).
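
Putting the definitions together, here is a minimal Python sketch (the helper name `anova_f` and the data arrays are my own, not from the handout) that computes SSB, MSE, and F for a list of groups:

```python
import numpy as np

def anova_f(groups):
    """Compute SSB, MSE, and the F statistic for a list of 1-D arrays."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.concatenate(groups)
    grand_mean = all_obs.mean()
    N, K = len(all_obs), len(groups)

    # Sum of squares between groups: n_j * (group mean - grand mean)^2, summed
    ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Mean square error: within-group squared deviations divided by (N - K)
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
    mse = sse / (N - K)

    f = (ssb / (K - 1)) / mse
    return ssb, mse, f

# Example with made-up errors for three guide widths (not the real implant data)
g4, g6, g8 = [0.21, 0.30, 0.24], [0.28, 0.22, 0.26], [0.25, 0.31, 0.20]
print(anova_f([g4, g6, g8]))
```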


Implant Data: p-value

To get a p-value we compare our F statistic to an F(2, 42) distribution. In our example

F = (0.0047 / 2) / (0.466 / 42) = 0.211,

and the p-value is

P(F(2,42) > 0.211) = 0.81.

[Figure: the F(2,42) distribution, with the observed value 0.211 marked near the left of the horizontal axis.]
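
The tail probability can be checked with SciPy (a one-line sketch, assuming SciPy is available):

```python
from scipy import stats

# P(F(2, 42) > 0.211), the p-value for the implant example
print(stats.f.sf(0.211, 2, 42))   # about 0.81
```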

ANOVA Table

Results are often displayed using an ANOVA table:

  Source           Sum of Squares   df   Mean Square   F      Sig.
  Between Groups   .005             2    .002          .211   .811
  Within Groups    .466             42   .011
  Total            .470             44

The Between Groups sum of squares is the SSB, the Within Groups mean square is the MSE, F is the F statistic, and Sig. is the p-value.

Post Hoc Tests

NHANES I data, women 40-60 years old. Compare cholesterol between periodontal groups.

  Group           N     Mean    Std. Deviation
  Healthy         802   221.5   46.2
  Gingivitis      490   223.5   45.3
  Periodontitis   347   227.3   48.9
  Edentulous      372   232.4   48.8

  Source           Sum of Squares   df     Mean Square   F     Sig.
  Between Groups   33383            3      11128         5.1   .002
  Within Groups    4417119          2007   2201
  Total            4450502          2010

The ANOVA shows good evidence (p = 0.002) that the means are not all the same. Which means are different? We can directly compare the subgroups using post hoc tests.

Least Significant Difference test

The simplest post hoc test is called the Least Significant Difference test. The computation is very similar to the equal-variance t test: compute an equal-variance t test, but replace the pooled variance (s²) with the MSE.


Least Significant Difference Test: Examples

Using the cholesterol group statistics and MSE = 2201 from the ANOVA table above:

Compare the Healthy group to the Periodontitis group:

T = (221.5 − 227.3) / √[ 2201 × (1/802 + 1/347) ] = −1.92,
p = 2 P(t1147 > 1.92) = 0.055.

Compare the Gingivitis group to the Periodontitis group:

T = (223.5 − 227.3) / √[ 2201 × (1/490 + 1/347) ] = −1.15,
p = 2 P(t835 > 1.15) = 0.25.
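
The same comparisons can be reproduced in a few lines of Python; a minimal sketch (the helper name `lsd_test` is my own) using the group means, group sizes, and the MSE from the ANOVA table:

```python
import math
from scipy import stats

MSE = 2201  # mean square error from the cholesterol ANOVA table

def lsd_test(mean1, n1, mean2, n2, mse, df):
    """Least Significant Difference t-test, using the ANOVA MSE as the variance."""
    t = (mean1 - mean2) / math.sqrt(mse * (1 / n1 + 1 / n2))
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

# Healthy vs Periodontitis: t about -1.92, p about 0.055
print(lsd_test(221.5, 802, 227.3, 347, MSE, df=802 + 347 - 2))
# Gingivitis vs Periodontitis: t about -1.15, p about 0.25
print(lsd_test(223.5, 490, 227.3, 347, MSE, df=490 + 347 - 2))
```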

Post Hoc Tests: Multiple Comparisons


Post-hoc testing usually involves multiple comparisons. For example, if the data contain 4 groups (Healthy, Gingivitis, Periodontitis, Edentulous), then 6 different pairwise comparisons can be made.

Each time a hypothesis test is performed at significance level α, there is probability α of rejecting in error, and performing multiple tests increases the chance of rejecting in error at least once. For example, if you did 6 independent hypothesis tests at α = 0.05 and, in truth, H0 were true for all six, the probability that at least one test rejects H0 is 26%:

P(at least one rejection) = 1 − P(no rejections) = 1 − 0.95⁶ = 0.26.

Bonferroni Correction for Multiple Comparisons


The Bonferroni correction is a simple way to adjust for multiple comparisons:

- Perform each test at significance level α.
- Multiply each p-value by the number of tests performed.
- The overall significance level (the chance of any of the tests rejecting in error) will be less than α.
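
The adjustment itself is just a multiplication capped at 1. A minimal Python sketch that reproduces the Bonferroni column of the example table below:

```python
# LSD p-values for the six pairwise cholesterol comparisons
lsd_pvalues = [0.46, 0.055, 0.00021, 0.25, 0.0056, 0.147]

# Bonferroni correction: multiply by the number of tests, but never exceed 1
bonferroni = [min(p * len(lsd_pvalues), 1.0) for p in lsd_pvalues]
print(bonferroni)   # [1.0, 0.33, 0.00126, 1.0, 0.0336, 0.882]
```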

Example: Cholesterol Data post-hoc comparisons

  Group 1         Group 2         Mean Difference         LSD p-value   Bonferroni p-value
                                  (Group 1 − Group 2)
  Healthy         Gingivitis       -2.0                   .46           1.0
  Healthy         Periodontitis    -5.8                   .055          .330
  Healthy         Edentulous      -10.9                   .00021        .00126
  Gingivitis      Periodontitis    -3.9                   .25           1.0
  Gingivitis      Edentulous       -8.9                   .0056         .0336
  Periodontitis   Edentulous       -5.1                   .147          .88

Conclusion: the Edentulous group is significantly different from the Healthy group and the Gingivitis group (p < 0.05), after adjustment for multiple comparisons.
