
Comparing Two Means

Often we have two unknown means and are interested in comparing them to each other. Usually the null hypothesis is H0: no difference between the population means. There are a number of related testing procedures, and which one you choose depends on your data. We will present three basic procedures here:

- Paired t-test, for paired or matched data.
- Two-sample t-tests, for comparing two independent groups. Two basic independent-sample tests will be presented:
  - Equal-variance t-tests: the two groups can be assumed to have equal variances.
  - Unequal-variance t-tests: the two groups are not assumed to have equal variances.

Paired t-test
When the means being compared come from observations that are naturally paired or matched, a paired t-test is used. Examples: before vs. after studies (also called longitudinal studies) produce paired data; each patient contributes two paired observations, the before value and the after value. Other types of studies can produce paired data as well. One possibility would be a dental study where both opposing treatments are used in each patient, in randomly assigned half-mouths.

Computing a Paired t-test


To compute a paired t-test, focus on the within-pair differences (for example, after − before). Perform a t-test on the mean of the differences. To test whether the means are different, the null hypothesis is H0: mean difference = 0. Note: even though we are comparing two means, this is still considered a one-sample test.

Example: fluoride varnish study


In ten at-risk children, fluoride varnish is applied in randomly assigned half-mouths. The remaining half-mouths are left untreated. The children are followed for two years and the new dmfs and their locations are recorded:

  Patient:      1   2   3   4   5   6   7   8   9  10 |  mean   sd
  Varnish:      2   1   0   2   0   0   2   1   3   5 |  1.6
  Untreated:    3   2   1   0   0   2   5   1   7   4 |  2.5
  Difference:  -1  -1  -1   2   0  -2  -3   0  -4   1 | -0.90   1.79
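
As a quick cross-check of the hand calculation that follows, here is a minimal Python sketch (assuming NumPy and SciPy are available; the variable names are illustrative, not part of the original study):

```python
import numpy as np
from scipy import stats

# New dmfs per child from the fluoride varnish example
varnish   = np.array([2, 1, 0, 2, 0, 0, 2, 1, 3, 5])
untreated = np.array([3, 2, 1, 0, 0, 2, 5, 1, 7, 4])

# A paired t-test is a one-sample t-test on the within-pair differences
diff = varnish - untreated
print(diff.mean(), diff.std(ddof=1))     # -0.90 and about 1.79

t, p = stats.ttest_1samp(diff, 0.0)      # H0: mean difference = 0
print(t, p)                              # about -1.59, p about 0.15

# Equivalent shortcut: SciPy's paired t-test gives the same result
print(stats.ttest_rel(varnish, untreated))
```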

To perform the paired t-test, compute a one-sample t-test on the column of differences, with H0: mean difference = 0:

T = (−0.90 − 0) / (1.79 / √10) = −1.59.

For a two-tailed test, compare |−1.59| = 1.59 to t9,.975 = 2.262. We do not reject, since 1.59 < 2.262. The p-value is P(|t9| > 1.59) = 2 P(t9 > 1.59) = 0.15.

Comparing means of two independent samples

These are called two-sample tests. Our goal is usually to estimate μ1 − μ2 and the corresponding confidence intervals, and to perform hypothesis tests of H0: μ1 − μ2 = 0. For each sample we compute the relevant statistics:

  Sample 1: n1, X̄1, s1
  Sample 2: n2, X̄2, s2

The obvious statistic for comparing the two population means is X̄1 − X̄2. Probability theory tells us that:

1. X̄1 − X̄2 is the best estimate of μ1 − μ2.
2. Its standard error is √(σ1²/n1 + σ2²/n2).
3. For large n1 and n2, X̄1 − X̄2 ~ N(μ1 − μ2, σ1²/n1 + σ2²/n2).

In order to compute hypothesis tests and confidence intervals for μ1 − μ2 we will need to estimate the standard error of X̄1 − X̄2. Two different estimation procedures are commonly used, depending on whether one feels it is reasonable to assume the two groups have similar variances.

RULES OF THUMB for deciding whether to use the equal-variance or unequal-variance formulas:

1. For small samples, use the equal-variance formulas unless s1 is twice as big as s2, or the other way around.
2. If n1 and n2 > 80, you can use the unequal-variance formula for the SE (it's easier to compute) and use the Normal distribution.
3. If you are unsure, the unequal-variance formula will be the conservative choice (less power, but less likely to be incorrect).
4. The calculations are a snap with a computer program. If you are unsure about the variance assumptions, compute the test both ways and see if there is a conflict, as in the sketch below.
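
Rule 4 is easy to follow in practice; here is a minimal sketch (hypothetical data, assuming SciPy is installed) that runs the two-sample test both ways:

```python
from scipy import stats

# Hypothetical measurements for two independent groups
group1 = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]
group2 = [6.2, 7.1, 5.9, 8.3, 6.8, 7.4, 6.1]

# Equal-variance (pooled) t-test and unequal-variance (Welch) t-test
equal   = stats.ttest_ind(group1, group2, equal_var=True)
unequal = stats.ttest_ind(group1, group2, equal_var=False)

print("equal variances assumed:    ", equal)
print("equal variances not assumed:", unequal)
# If the two versions disagree, the Welch result is the conservative choice.
```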

Equal Variance case: σ1 = σ2

If it is reasonable to assume that σ1 = σ2, we can estimate the standard error more efficiently by combining the samples. The standard error of X̄1 − X̄2 is estimated by

SE(X̄1 − X̄2) = s_pooled × √(1/n1 + 1/n2),

where the pooled standard deviation, s_pooled, is

s_pooled = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ].

This pooled standard deviation is roughly the combined distance of the observations from their respective means. The T statistic

T_equal = (X̄1 − X̄2) / SE(X̄1 − X̄2)

has a t distribution with n1 + n2 − 2 degrees of freedom.

Example: Confidence interval for the difference between means, using the gum data from day 1.

            Gum A          Gum C
  n         n1 = 25        n2 = 40
  mean      X̄1 = −0.72     X̄2 = 2.63
  sd        s1 = 5.37      s2 = 3.80

Assume equal variances (s1/s2 = 5.37/3.80 < 2):

s_pooled = √[ (24 × 5.37² + 39 × 3.80²) / (25 + 40 − 2) ] = 4.46,

so the standard error of X̄1 − X̄2 is estimated as

SE(X̄1 − X̄2) = 4.46 × √(1/25 + 1/40) = 1.14,

and the 95% confidence interval is

(−0.72 − 2.63) ± 2.00 × 1.14 = (−5.63, −1.07),

where 2.00 = t63,.975.

Note: since the confidence interval does not cover 0, a two-sided hypothesis test of H0: μ1 − μ2 = 0 would reject at level α = 0.05. Check: T = |(−0.72 − 2.63)/1.14| = 2.94 > 2.00 = t63,.975.

SPSS output for the Gum example:

Group Statistics (change in DMFS)
  gum type   N    Mean      Std. Deviation   Std. Error Mean
  A          25   -0.7200   5.36594          1.07319
  C          40    2.6250   3.80073          0.60095

Independent Samples Test (change in DMFS)
                               Levene's Test       t-test for Equality of Means
                               F       Sig.     t        df      Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
  Equal variances assumed      0.924   0.340    -2.940   63      0.005             -3.345       1.138              -5.61840       -1.07160
  Equal variances not assumed                   -2.720   39.05   0.010             -3.345       1.230              -5.83279       -0.85721
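
The SPSS results above can be reproduced from the summary statistics alone; a hedged sketch using SciPy's `ttest_ind_from_stats` (available in recent SciPy versions):

```python
from scipy import stats

# Gum example summary statistics: group A (n=25) and group C (n=40)
equal = stats.ttest_ind_from_stats(mean1=-0.72, std1=5.37, nobs1=25,
                                   mean2=2.63,  std2=3.80, nobs2=40,
                                   equal_var=True)    # t about -2.94, p about 0.005
welch = stats.ttest_ind_from_stats(mean1=-0.72, std1=5.37, nobs1=25,
                                   mean2=2.63,  std2=3.80, nobs2=40,
                                   equal_var=False)   # t about -2.72, p about 0.01
print(equal)
print(welch)
```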

Unequal Variance case: σ1 ≠ σ2

If one is not sure that the variances are equal, it is usually safest to assume that they are not. The standard error of X̄1 − X̄2 is estimated by

SE(X̄1 − X̄2) = √(s1²/n1 + s2²/n2).

The T statistic

T_unequal = (X̄1 − X̄2) / SE(X̄1 − X̄2)

has a t distribution with degrees of freedom that can be estimated by

df ≈ (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ].

Note: if n1 and n2 > 80, then the standard Normal distribution can be used in place of t, which removes the need to estimate the degrees of freedom.

Example: NHANES III data

807 participants received a dental exam and answered the chewing-tobacco question. SPSS t-test output below.

Group Statistics (mean attachment loss)
  currently chew tobacco   N     Mean   Std. Deviation   Std. Error Mean
  yes                      341   1.71   1.724            0.093
  no                       466   1.50   1.381            0.064

Independent Samples Test (mean attachment loss)
                               Levene's Test       t-test for Equality of Means
                               F       Sig.    t       df      Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
  Equal variances assumed      5.682   0.02    1.980   805     0.048             0.22         0.109              0.002          0.431
  Equal variances not assumed                  1.914   632.3   0.056             0.22         0.113              -0.006         0.439

What to do? In this case choose the unequal-variances results. They rely less on assumptions, and the sample sizes are large enough that the SE estimates are probably close to optimal even if the variances are equal.
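
As a check on the SPSS output above, the unequal-variance (Welch) standard error and approximate degrees of freedom can be computed directly from the summary statistics; a minimal Python sketch:

```python
import math

# NHANES III summary statistics: mean attachment loss by chewing-tobacco use
n1, m1, s1 = 341, 1.71, 1.724   # currently chew tobacco: yes
n2, m2, s2 = 466, 1.50, 1.381   # currently chew tobacco: no

se = math.sqrt(s1**2 / n1 + s2**2 / n2)                  # about 0.113
df = (s1**2 / n1 + s2**2 / n2) ** 2 / (
    (s1**2 / n1) ** 2 / (n1 - 1) + (s2**2 / n2) ** 2 / (n2 - 1)
)                                                        # about 632
t = (m1 - m2) / se  # about 1.9 (differs slightly from SPSS's 1.914
                    # because the printed means are rounded)
print(se, df, t)
```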


ANOVA - Analysis of Variance



- Extends the independent-samples t test: compares the means of groups of independent observations.
- Can compare more than two groups.
- Don't be fooled by the name: ANOVA does not compare variances.

ANOVA Null and Alternative Hypotheses

Say the sample contains K independent groups. ANOVA tests the null hypothesis

H0: μ1 = μ2 = … = μK,

that is, the group means are all equal. The alternative hypothesis is

H1: μi ≠ μj for some i, j,

or, the group means are not all equal.

Example: Accuracy of Implant Placement

Implants were placed in a manikin using placement guides of various widths. 15 implants were placed using each guide. Error (discrepancy with a reference implant) was measured for each implant.

[Figure: Mean Error by Guide Width: mean implant height error (mm) for the 4mm, 6mm, and 8mm guides; the group means range from roughly 0.23 to 0.27 mm.]

The overall mean of the entire sample was 0.248 mm. This is called the grand mean, and is often denoted by X̄. If H0 were true then we'd expect the group means to be close to the grand mean.

The ANOVA test is based on the combined distances from X̄. If the combined distances are large, that indicates we should reject H0.


The ANOVA Statistic

To combine the differences from the grand mean we:
- square the differences,
- multiply by the numbers of observations in the groups,
- sum over the groups.

SSB = 15(X̄_4mm − X̄)² + 15(X̄_6mm − X̄)² + 15(X̄_8mm − X̄)²,

where the X̄_* are the group means. SSB stands for the Sum of Squares Between groups. Note: this looks a bit like a variance.

How big is big?

For the Implant Accuracy Data, SSB = 0.0047. Is that big enough to reject H0? As with the t test, we compare the statistic to the variability of the individual observations. In ANOVA the variability is estimated by the Mean Square Error, or MSE.

MSE (Mean Square Error)

The Mean Square Error is a measure of the variability after the group effects have been taken into account:

MSE = (1 / (N − K)) Σj Σi (xij − X̄j)²,

where xij is the i-th observation in the j-th group.

[Figure: Implant Height Error by Guide Width: individual implant height errors (mm), roughly 0.1 to 0.5, plotted for the 4mm, 6mm, and 8mm guides.]

Note that the variation of the means seems quite small compared to the variance of the observations within groups.

Notes on MSE
If there are only two groups, the MSE is equal to the pooled estimate of variance used in the equal-variance t test. ANOVA assumes that all the group variances are equal.

ANOVA F-statistic
The ANOVA is based on the F statistic

F = [SSB / (K − 1)] / MSE,

where K is the number of groups.

Under H0 the F statistic has an F distribution, with K − 1 and N − K degrees of freedom (N is the total number of observations).
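
Putting the definitions together, here is a minimal Python sketch (the helper name `anova_f` and the data arrays are my own, not from the handout) that computes SSB, MSE, and F for a list of groups:

```python
import numpy as np

def anova_f(groups):
    """Compute SSB, MSE, and the F statistic for a list of 1-D arrays."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.concatenate(groups)
    grand_mean = all_obs.mean()
    N, K = len(all_obs), len(groups)

    # Sum of squares between groups: n_j * (group mean - grand mean)^2, summed
    ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Mean square error: within-group squared deviations divided by (N - K)
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
    mse = sse / (N - K)

    f = (ssb / (K - 1)) / mse
    return ssb, mse, f

# Example with made-up errors for three guide widths (not the real implant data)
g4, g6, g8 = [0.21, 0.30, 0.24], [0.28, 0.22, 0.26], [0.25, 0.31, 0.20]
print(anova_f([g4, g6, g8]))
```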


Implant Data: p-value

To get a p-value we compare our F statistic to an F(2, 42) distribution. In our example

F = (0.0047 / 2) / (0.466 / 42) = 0.211,

and the p-value is

P(F(2,42) > 0.211) = 0.81.

[Figure: the F(2,42) distribution, with the observed value 0.211 marked near the left of the horizontal axis.]
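
The tail probability can be checked with SciPy (a one-line sketch, assuming SciPy is available):

```python
from scipy import stats

# P(F(2, 42) > 0.211), the p-value for the implant example
print(stats.f.sf(0.211, 2, 42))   # about 0.81
```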

ANOVA Table

Results are often displayed using an ANOVA table:

  Source           Sum of Squares   df   Mean Square   F      Sig.
  Between Groups   .005             2    .002          .211   .811
  Within Groups    .466             42   .011
  Total            .470             44

The Between Groups sum of squares is the SSB, the Within Groups mean square is the MSE, F is the F statistic, and Sig. is the p-value.

Post Hoc Tests

NHANES I data, women 40-60 years old. Compare cholesterol between periodontal groups.

  Group           N     Mean    Std. Deviation
  Healthy         802   221.5   46.2
  Gingivitis      490   223.5   45.3
  Periodontitis   347   227.3   48.9
  Edentulous      372   232.4   48.8

  Source           Sum of Squares   df     Mean Square   F     Sig.
  Between Groups   33383            3      11128         5.1   .002
  Within Groups    4417119          2007   2201
  Total            4450502          2010

The ANOVA shows good evidence (p = 0.002) that the means are not all the same. Which means are different? We can directly compare the subgroups using post hoc tests.

Least Significant Difference test

The simplest post hoc test is called the Least Significant Difference test. The computation is very similar to the equal-variance t test: compute an equal-variance t test, but replace the pooled variance (s²) with the MSE.


Least Significant Difference Test: Examples

Using the cholesterol group statistics and MSE = 2201 from the ANOVA table above:

Compare the Healthy group to the Periodontitis group:

T = (221.5 − 227.3) / √[ 2201 × (1/802 + 1/347) ] = −1.92,
p = 2 P(t1147 > 1.92) = 0.055.

Compare the Gingivitis group to the Periodontitis group:

T = (223.5 − 227.3) / √[ 2201 × (1/490 + 1/347) ] = −1.15,
p = 2 P(t835 > 1.15) = 0.25.
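
The same comparisons can be reproduced in a few lines of Python; a minimal sketch (the helper name `lsd_test` is my own) using the group means, group sizes, and the MSE from the ANOVA table:

```python
import math
from scipy import stats

MSE = 2201  # mean square error from the cholesterol ANOVA table

def lsd_test(mean1, n1, mean2, n2, mse, df):
    """Least Significant Difference t-test, using the ANOVA MSE as the variance."""
    t = (mean1 - mean2) / math.sqrt(mse * (1 / n1 + 1 / n2))
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

# Healthy vs Periodontitis: t about -1.92, p about 0.055
print(lsd_test(221.5, 802, 227.3, 347, MSE, df=802 + 347 - 2))
# Gingivitis vs Periodontitis: t about -1.15, p about 0.25
print(lsd_test(223.5, 490, 227.3, 347, MSE, df=490 + 347 - 2))
```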

Post Hoc Tests: Multiple Comparisons


Post-hoc testing usually involves multiple comparisons. For example, if the data contain 4 groups (Healthy, Gingivitis, Periodontitis, Edentulous), then 6 different pairwise comparisons can be made.

Each time a hypothesis test is performed at significance level α, there is probability α of rejecting in error, and performing multiple tests increases the chance of rejecting in error at least once. For example, if you did 6 independent hypothesis tests at α = 0.05 and, in truth, H0 were true for all six, the probability that at least one test rejects H0 is 26%:

P(at least one rejection) = 1 − P(no rejections) = 1 − 0.95⁶ = 0.26.

Bonferroni Correction for Multiple Comparisons


The Bonferroni correction is a simple way to adjust for multiple comparisons:

- Perform each test at significance level α.
- Multiply each p-value by the number of tests performed.
- The overall significance level (the chance of any of the tests rejecting in error) will be less than α.
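
The adjustment itself is just a multiplication capped at 1. A minimal Python sketch that reproduces the Bonferroni column of the example table below:

```python
# LSD p-values for the six pairwise cholesterol comparisons
lsd_pvalues = [0.46, 0.055, 0.00021, 0.25, 0.0056, 0.147]

# Bonferroni correction: multiply by the number of tests, but never exceed 1
bonferroni = [min(p * len(lsd_pvalues), 1.0) for p in lsd_pvalues]
print(bonferroni)   # [1.0, 0.33, 0.00126, 1.0, 0.0336, 0.882]
```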

Example: Cholesterol Data post-hoc comparisons

  Group 1         Group 2         Mean Difference         LSD p-value   Bonferroni p-value
                                  (Group 1 − Group 2)
  Healthy         Gingivitis       -2.0                   .46           1.0
  Healthy         Periodontitis    -5.8                   .055          .330
  Healthy         Edentulous      -10.9                   .00021        .00126
  Gingivitis      Periodontitis    -3.9                   .25           1.0
  Gingivitis      Edentulous       -8.9                   .0056         .0336
  Periodontitis   Edentulous       -5.1                   .147          .88

Conclusion: the Edentulous group is significantly different from the Healthy group and the Gingivitis group (p < 0.05), after adjustment for multiple comparisons.
