Analysis of Variance
15.1 Introduction
Analysis of variance compares two or more populations of interval data. Specifically, we are interested in determining whether differences exist between the population means. The procedure works by analyzing the sample variance.
Weekly sales
529 658 793 514 663 719 711 606 461 529 498 663 604 495 485 557 353 557 542 614
Quality
804 630 774 717 679 604 620 697 706 615 492 719 787 699 572 523 584 634 580 624
Price
672 531 443 596 602 502 659 689 675 512 691 733 698 776 561 572 469 581 679 532
H0: $\mu_1 = \mu_2 = \mu_3$
H1: At least two means differ.
To build the statistic needed to test the hypotheses, use the following notation:
Notation
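Sample 1: $x_{11}, x_{21}, \ldots, x_{n_1,1}$; sample size $n_1$; sample mean $\bar{x}_1$
Sample 2: $x_{12}, x_{22}, \ldots, x_{n_2,2}$; sample size $n_2$; sample mean $\bar{x}_2$
...
Sample k: $x_{1k}, x_{2k}, \ldots, x_{n_k,k}$; sample size $n_k$; sample mean $\bar{x}_k$
Here $x_{ij}$ is the $i$th observation of the $j$th sample. For example, $x_{11}$ is the first observation of the first sample, and $x_{22}$ is the second observation of the second sample.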
Terminology
In the context of this problem:
Response variable: weekly sales.
Responses: actual sale values.
Experimental unit: weeks in the three cities when we record sales figures.
Factor: the criterion by which we classify the populations (the treatments). In this problem the factor is the marketing strategy.
Factor levels: the population (treatment) names. In this problem the factor levels are the marketing strategies.
Two types of variability are employed when testing for the equality of the population means
[Figure: two panels, each plotting the observations of Treatment 1, Treatment 2, and Treatment 3 with sample means $\bar{x}_1 = 10$, $\bar{x}_2 = 15$, $\bar{x}_3 = 20$.
Left panel: a small variability within the samples makes it easier to draw a conclusion about the population means.
Right panel: the sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.]
$$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$$
There are $k$ treatments; $n_j$ is the size of sample $j$; $\bar{x}_j$ is the mean of sample $j$; $\bar{\bar{x}}$ is the grand mean of all the observations.
Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SST. Thus, a large SST indicates large variation between sample means, which supports H1.
$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$
This is the sum of all squared differences between sales in city $j$ and the sample mean of city $j$ (over all three cities).
This sum is called the Sum of Squares for Error; in our example this is the SSE.
Is SST = 57,512.23 large enough relative to SSE = 506,983.50 to reject the null hypothesis that specifies that all the means are equal?
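The two sums of squares quoted above can be checked directly from the three samples of weekly sales listed earlier. The following is a minimal sketch in plain Python; the names `city1`, `city2`, `city3`, and `sum_of_squares` are illustrative assumptions, not part of the original slides.

```python
# Weekly sales for the three samples, copied from the data listed earlier.
# The variable names are hypothetical labels chosen for this sketch.
city1 = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
         498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
city2 = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
         492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
city3 = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]
samples = [city1, city2, city3]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sum_of_squares(samples):
    """Return (SST, SSE) for a list of independent samples."""
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    # Between-treatments variation: n_j * (sample mean - grand mean)^2
    sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-treatments variation: (observation - its own sample mean)^2
    sse = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return sst, sse


sst, sse = sum_of_squares(samples)
print(f"SST = {sst:,.2f}")
print(f"SSE = {sse:,.2f}")
```

SST uses only the sample means and sizes, while SSE needs every individual observation, which is why the within-sample spread enters the test only through SSE.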
Calculation of MST (Mean Square for Treatments)
$$MST = \frac{SST}{k-1}$$
Calculation of MSE (Mean Square for Error)
$$MSE = \frac{SSE}{n-k}$$
Required conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
When these conditions hold, the test statistic $F = MST/MSE$ is F-distributed with the following degrees of freedom: $\nu_1 = k - 1$ and $\nu_2 = n - k$.
The F test
H0: $\mu_1 = \mu_2 = \mu_3$
H1: At least two means differ.
$$F = \frac{MST}{MSE} = \frac{28{,}756}{8{,}894} = 3.23$$
p-value = .0467, obtained in Excel with FDIST(3.23,2,57).
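The F statistic and its p-value can be reproduced from the two sums of squares. The sketch below uses only the standard library. It relies on a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2 (that is, $k = 3$): $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$. For other values of $k$ a statistics library would be needed; the function name `f_upper_tail_df1_2` is a hypothetical helper.

```python
# Values taken from the example: k samples, n observations in total.
k, n = 3, 60
sst, sse = 57512.23, 506983.50

mst = sst / (k - 1)   # mean square for treatments
mse = sse / (n - k)   # mean square for error
f_stat = mst / mse


def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 numerator df and df2 denominator df.

    Closed form valid ONLY for numerator df = 2.
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


p_value = f_upper_tail_df1_2(f_stat, n - k)
print(f"MST = {mst:.2f}, MSE = {mse:.2f}")
print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")
```

Because the p-value is below .05, the statistic falls in the rejection region at the 5% significance level.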
SUMMARY
Groups            Count   Sum     Average   Variance
Sales - City 1    20      11551   577.55    10775.00
Sales - City 2    20      13060   653.00    7238.11
Sales - City 3    20      12173   608.65    8670.24

ANOVA
Source        df   MS      F crit
Treatments    2    28756   3.16
Error         57   8894
Total         59
Random effects
If the levels included in our analysis represent a random sample of all the possible levels, we have a random-effect ANOVA. The conclusion of the random-effect ANOVA applies to all the levels (not only those studied).
Randomized Blocks
Block all the observations with some commonality across treatments
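Layout of the data (rows are blocks, columns are treatments 1, 2, ..., k):
Block 1: $x_{11}, x_{12}, \ldots, x_{1k}$; block mean $\bar{x}[B]_1$
Block 2: $x_{21}, x_{22}, \ldots, x_{2k}$; block mean $\bar{x}[B]_2$
...
Block b: $x_{b1}, x_{b2}, \ldots, x_{bk}$; block mean $\bar{x}[B]_b$
Treatment means: $\bar{x}[T]_1, \bar{x}[T]_2, \ldots, \bar{x}[T]_k$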
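Recall that for the independent samples design, SS(Total) = SST + SSE. For the randomized block design, the total sum of squares is partitioned into three sources of variation: treatments, blocks, and error.
$$SS(Total) = SST + SSB + SSE$$
where $x_{ij}$ is the observation in block $i$ under treatment $j$, $\bar{\bar{x}}$ is the grand mean, and
$$SS(Total) = \sum_{j=1}^{k} \sum_{i=1}^{b} (x_{ij} - \bar{\bar{x}})^2 = (x_{11} - \bar{\bar{x}})^2 + (x_{21} - \bar{\bar{x}})^2 + \ldots + (x_{12} - \bar{\bar{x}})^2 + (x_{22} - \bar{\bar{x}})^2 + \ldots + (x_{1k} - \bar{\bar{x}})^2 + (x_{2k} - \bar{\bar{x}})^2 + \ldots$$
$$SST = \sum_{j=1}^{k} b\,(\bar{x}[T]_j - \bar{\bar{x}})^2$$
$$SSB = \sum_{i=1}^{b} k\,(\bar{x}[B]_i - \bar{\bar{x}})^2 = k(\bar{x}[B]_1 - \bar{\bar{x}})^2 + k(\bar{x}[B]_2 - \bar{\bar{x}})^2 + \ldots + k(\bar{x}[B]_b - \bar{\bar{x}})^2$$
$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{b} (x_{ij} - \bar{x}[T]_j - \bar{x}[B]_i + \bar{\bar{x}})^2$$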
Mean Squares
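To perform hypothesis tests for treatments and blocks we need:
Mean square for treatments: $MST = \dfrac{SST}{k-1}$
Mean square for blocks: $MSB = \dfrac{SSB}{b-1}$
Mean square for error: $MSE = \dfrac{SSE}{n-k-b+1}$
Test statistic for treatments
$$F = \frac{MST}{MSE}$$
Test statistic for blocks
$$F = \frac{MSB}{MSE}$$
Testing the mean response for treatments
Rejection region: $F > F_{\alpha,\,k-1,\,n-k-b+1}$
Testing the mean response for blocks
Rejection region: $F > F_{\alpha,\,b-1,\,n-k-b+1}$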
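The randomized block calculations can be sketched in a few lines of plain Python. The function name `randomized_block_anova` and the small data table are hypothetical, chosen only to illustrate the arithmetic; they are not the Xm15-02 data. The critical F values still have to be looked up separately.

```python
def randomized_block_anova(data):
    """Randomized block ANOVA sums of squares and F statistics.

    data[i][j] is the observation for block i under treatment j
    (b rows = blocks, k columns = treatments).
    """
    b, k = len(data), len(data[0])
    n = b * k
    grand = sum(sum(row) for row in data) / n
    block_means = [sum(row) / k for row in data]
    treat_means = [sum(data[i][j] for i in range(b)) / b for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    sst = b * sum((m - grand) ** 2 for m in treat_means)
    ssb = k * sum((m - grand) ** 2 for m in block_means)
    sse = sum((data[i][j] - treat_means[j] - block_means[i] + grand) ** 2
              for i in range(b) for j in range(k))

    mst = sst / (k - 1)
    msb = ssb / (b - 1)
    mse = sse / (n - k - b + 1)
    return {"SS_total": ss_total, "SST": sst, "SSB": ssb, "SSE": sse,
            "F_treatments": mst / mse, "F_blocks": msb / mse,
            "df_error": n - k - b + 1}


# Hypothetical data: 4 blocks (rows) by 3 treatments (columns).
data = [[10, 12, 14],
        [11, 14, 15],
        [13, 15, 19],
        [12, 13, 16]]
result = randomized_block_anova(data)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Removing the block variation SSB from the error term is what distinguishes this design from the independent samples design: a smaller SSE gives a larger F for treatments.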
Can we infer from the data in Xm15-02 that there are differences in mean cholesterol reduction among the four drugs?
Conclusion: At the 5% significance level there is sufficient evidence to infer that the mean cholesterol reductions gained by at least two drugs are different.
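[Figure: layout of the two-factor example, data file Xm15-03. Six cities (City 1 to City 6) are classified by marketing strategy (for example Quality, Price) and by advertising medium (TV, newspaper).]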
The p-value =.0452. We conclude that there is evidence that differences exist in the mean weekly sales among the six cities.
Are there differences in the mean sales caused by different marketing strategies?
Are there differences in the mean sales caused by different advertising media?
Test whether the mean sales of TV and Newspaper significantly differ from one another.
H0: $\mu_{TV} = \mu_{Newspapers}$
H1: The means differ.
Calculations are based on the sum of squares for factor B, SS(B).
Are there differences in the mean sales caused by interaction between marketing strategy and advertising medium?
Graphical description of the possible relationships between factors A and B.
[Figure: four panels, each plotting mean response against the levels of factor A (1, 2, 3) with one line per level of factor B (Level 1, Level 2):
1. Difference between the levels of factor A, and difference between the levels of factor B; no interaction.
2. Difference between the levels of factor A; no difference between the levels of factor B.
3. No difference between the levels of factor A; difference between the levels of factor B.
4. Interaction.]
Sums of squares
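$$SS(A) = rb \sum_{i=1}^{a} (\bar{x}[A]_i - \bar{\bar{x}})^2$$
$$SS(B) = ra \sum_{j=1}^{b} (\bar{x}[B]_j - \bar{\bar{x}})^2$$
$$SS(AB) = r \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}[AB]_{ij} - \bar{x}[A]_i - \bar{x}[B]_j + \bar{\bar{x}})^2$$
$$SSE = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{r} (x_{ijk} - \bar{x}[AB]_{ij})^2$$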
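Test for the difference between the levels of factor A
$F = \dfrac{MS(A)}{MSE}$; rejection region: $F > F_{\alpha,\,a-1,\,n-ab}$
Test for the difference between the levels of factor B
$F = \dfrac{MS(B)}{MSE}$; rejection region: $F > F_{\alpha,\,b-1,\,n-ab}$
Test for interaction between factors A and B
$F = \dfrac{MS(AB)}{MSE}$; rejection region: $F > F_{\alpha,\,(a-1)(b-1),\,n-ab}$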
Required conditions:
1. The response distribution is normal.
2. The treatment variances are equal.
3. The samples are independent.
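The two-factor sums of squares and F ratios can be sketched as follows for a balanced design with $r$ replicates per cell. The function name `two_factor_anova` and the small data set are hypothetical, used only to show the arithmetic; they are not the Xm15-03 data.

```python
def two_factor_anova(cells):
    """Two-factor ANOVA for a balanced design.

    cells[i][j] is the list of r replicates observed at level i of
    factor A and level j of factor B.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    n = a * b * r
    grand = sum(x for row in cells for cell in row for x in cell) / n
    cell_means = [[sum(cell) / r for cell in row] for row in cells]
    a_means = [sum(cell_means[i]) / b for i in range(a)]
    b_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]

    ss_a = r * b * sum((m - grand) ** 2 for m in a_means)
    ss_b = r * a * sum((m - grand) ** 2 for m in b_means)
    ss_ab = r * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    sse = sum((x - cell_means[i][j]) ** 2
              for i in range(a) for j in range(b) for x in cells[i][j])
    ss_total = sum((x - grand) ** 2 for row in cells for cell in row for x in cell)

    mse = sse / (n - a * b)
    return {"SS(A)": ss_a, "SS(B)": ss_b, "SS(AB)": ss_ab, "SSE": sse,
            "SS(Total)": ss_total,
            "F_A": (ss_a / (a - 1)) / mse,
            "F_B": (ss_b / (b - 1)) / mse,
            "F_AB": (ss_ab / ((a - 1) * (b - 1))) / mse}


# Hypothetical data: a = 3 levels of factor A, b = 2 levels of factor B, r = 3.
cells = [[[12, 14, 13], [10, 11, 12]],
         [[15, 17, 16], [14, 13, 15]],
         [[11, 12, 10], [13, 15, 14]]]
out = two_factor_anova(cells)
for name, value in out.items():
    print(f"{name}: {value:.3f}")
```

In a balanced design the four components add up to the total sum of squares, which gives a quick check on the arithmetic.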
491 712 558 447 479 624 546 444 582 672 464 559 759 557 528 670 534 657 557 474
677 627 590 632 683 760 690 548 579 644 689 650 704 652 576 836 628 798 497 841
575 614 706 484 478 650 583 536 579 795 803 584 525 498 812 565 708 546 616 587
quality
price
H1: At least two mean sales are different.
F = MS(Marketing strategy)/MSE = MS(A)/MSE = 5.33
$F_{critical} = F_{\alpha,\,a-1,\,n-ab} = F_{.05,\,3-1,\,60-(3)(2)}$
At 5% significance level there is evidence to infer that differences in weekly sales exist among the marketing strategies.
Degrees of freedom: advertising media 1; marketing strategies 2; interaction 2; error 54; total 59.
H1: The two mean sales differ.
F = MS(Media)/MSE = MS(B)/MSE = 1.42
$F_{critical} = F_{\alpha,\,b-1,\,n-ab} = F_{.05,\,2-1,\,60-(3)(2)} = 4.02$ (p-value = .2387)
At 5% significance level there is insufficient evidence to infer that differences in weekly sales exist between the two advertising media.
Interaction AB = Marketing*Media
H0: $\mu_{TV*quality} = \ldots = \mu_{newsp.*price}$
F = MS(AB)/MSE
At 5% significance level there is insufficient evidence to infer that the two factors interact to affect the mean weekly sales.
Two means are considered different if the difference between the corresponding sample means is larger than a critical number. Then, the larger sample mean is believed to be associated with a larger population mean.
Conditions common to all the methods here:
The ANOVA model is the one-way analysis of variance.
The conditions required to perform the ANOVA are satisfied.
The experiment is fixed-effect.
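This method builds on the equal-variances t-test of the difference between two means.
The test statistic is improved by using MSE rather than $s_p^2$.
We can conclude that $\mu_i$ and $\mu_j$ differ (at the $\alpha$ significance level) if $|\bar{x}_i - \bar{x}_j| > LSD$, where
$$LSD = t_{\alpha/2} \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}, \quad \text{d.f.} = n - k$$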
Bonferroni Adjustment
The procedure:
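Compute the number of pairwise comparisons, $C = k(k-1)/2$, where $k$ is the number of populations.
Set $\alpha = \alpha_E / C$, where $\alpha_E$ is the true probability of making at least one Type I error (called the experimentwise Type I error).
We can conclude that $\mu_i$ and $\mu_j$ differ (at the $\alpha_E / C$ significance level) if
$$|\bar{x}_i - \bar{x}_j| > t_{\alpha_E/(2C)} \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}, \quad \text{d.f.} = n - k$$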
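Both the LSD comparison and its Bonferroni adjustment reduce to comparing each pairwise difference of sample means with a critical difference. The sketch below applies them to the three city means, using MSE = 8894 from the ANOVA table. The critical t values are assumptions: approximate table values for 57 degrees of freedom ($t_{.025} \approx 2.002$ and, for $C = 3$ comparisons, $t_{.05/6} \approx 2.467$). The helper names are hypothetical.

```python
import math
from itertools import combinations

# Sample means and sizes from the one-way example.
means = {"City 1": 577.55, "City 2": 653.00, "City 3": 608.65}
sizes = {"City 1": 20, "City 2": 20, "City 3": 20}
MSE = 8894

# ASSUMED critical values (approximate, 57 degrees of freedom).
T_LSD = 2.002          # t_{.025, 57}
T_BONFERRONI = 2.467   # t_{.05/(2*3), 57}, for C = 3 comparisons


def critical_difference(t_crit, n_i, n_j, mse=MSE):
    """t_crit * sqrt(MSE * (1/n_i + 1/n_j))."""
    return t_crit * math.sqrt(mse * (1.0 / n_i + 1.0 / n_j))


def differing_pairs(t_crit):
    """Return the pairs whose sample means differ by more than the critical difference."""
    pairs = []
    for first, second in combinations(means, 2):
        cutoff = critical_difference(t_crit, sizes[first], sizes[second])
        if abs(means[first] - means[second]) > cutoff:
            pairs.append((first, second))
    return pairs


k = len(means)
C = k * (k - 1) // 2  # number of pairwise comparisons
lsd = critical_difference(T_LSD, 20, 20)
bonferroni = critical_difference(T_BONFERRONI, 20, 20)
print(f"C = {C}, LSD = {lsd:.2f}, Bonferroni cutoff = {bonferroni:.2f}")
print("LSD:", differing_pairs(T_LSD))
print("Bonferroni:", differing_pairs(T_BONFERRONI))
```

The Bonferroni cutoff is always at least as large as the LSD, so it can only remove pairs from the list of significant differences, never add them.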
The critical number is
$$\omega = q_{\alpha}(k, \nu) \sqrt{\frac{MSE}{n_g}}$$
where
$k$ = the number of samples
$\nu$ = degrees of freedom = $n - k$
$n_g$ = number of observations per sample (recall, all the sample sizes are the same)
$\alpha$ = significance level
$q_{\alpha}(k, \nu)$ = a critical value obtained from the studentized range table
If $\bar{x}_{max} - \bar{x}_{min} > \omega$, there is sufficient evidence to conclude that $\mu_{max} > \mu_{min}$.
Repeat this procedure for each pair of samples. Rank the means if possible.
If the sample sizes are not extremely different, we can use the above procedure with $n_g$ calculated as the harmonic mean of the sample sizes:
$$n_g = \frac{k}{\dfrac{1}{n_1} + \dfrac{1}{n_2} + \ldots + \dfrac{1}{n_k}}$$
Sample means:
Sales - City 1: 577.55
Sales - City 2: 653
Sales - City 3: 608.65
Pairwise differences:
City 1 vs. City 2: 653 - 577.55 = 75.45
City 1 vs. City 3: 608.65 - 577.55 = 31.1
City 2 vs. City 3: 653 - 608.65 = 44.35
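The pairwise differences above can be compared with the studentized-range critical number $\omega$ (the approach commonly called Tukey's method). The sketch below computes $\omega$ with $n_g$ taken as the harmonic mean of the sample sizes, which equals 20 here because all samples have 20 observations. The value $q_{.05}(3, 57) \approx 3.40$ is an assumption read from a studentized range table at the nearest tabulated degrees of freedom, and the helper names are hypothetical.

```python
import math
from itertools import combinations

means = {"City 1": 577.55, "City 2": 653.00, "City 3": 608.65}
sample_sizes = [20, 20, 20]
MSE = 8894

Q_CRIT = 3.40  # ASSUMED approximate q_.05(k=3, nu=57) from a table


def harmonic_mean_size(sizes):
    """n_g = k / (1/n_1 + 1/n_2 + ... + 1/n_k)."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def tukey_omega(q_crit, mse, sizes):
    """Critical number omega = q * sqrt(MSE / n_g)."""
    return q_crit * math.sqrt(mse / harmonic_mean_size(sizes))


omega = tukey_omega(Q_CRIT, MSE, sample_sizes)
print(f"n_g = {harmonic_mean_size(sample_sizes):.2f}, omega = {omega:.2f}")

significant = []
for first, second in combinations(means, 2):
    diff = abs(means[first] - means[second])
    verdict = "differ" if diff > omega else "no evidence of a difference"
    if diff > omega:
        significant.append((first, second))
    print(f"{first} vs. {second}: {diff:.2f} -> {verdict}")
```

Under these assumed table values only the City 1 vs. City 2 difference exceeds $\omega$, so only that pair of population means can be ranked.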