
Is ANOVA a STAR?

Introduction
In a previous chapter, the calculation of the t statistic was defined. In this article, the use of
the variance (the square of the standard deviation) and the F statistic will be described as an
alternative way to decide whether two samples differ.

What does ANOVA mean?


The term ANOVA is an acronym that stands for Analysis Of Variance. ANOVA is a
statistical technique used to test hypotheses concerning two or more items within a given
treatment (for example, do different lots of raw material behave the same or differently; or
does changing a reaction temperature change the yield) or between treatments (for example,
do reaction temperature changes or reaction time changes have the greater effect). With
ANOVA a researcher can tell which variables must be controlled at which levels, so that a
process is reproducible or the desired result is maximized, and can tell which variables can
be ignored because they don't affect the process. ANOVA is not a STAR, but the researcher
who uses this technique will be one to his company.

How are two treatments compared?


As described in the previous chapter, the means of two treatments can be compared using
the calculation of the t statistic (Equation 1), where X̄ is the average, s is the standard
deviation, n is the number of replicates used in the X̄ and s calculations, the subscripts 1 and
2 refer to the different levels or treatments of a variable being compared, t is the statistic, cf
is the desired confidence, and n1+n2−2 is the number of degrees of freedom in the study.
The t statistic allows an experimenter to determine whether two levels within a treatment or
whether two different treatments give the same results or not.

t(cf, n1+n2−2) = (X̄1 − X̄2) / √(s1²/n1 + s2²/n2)        (1)
In actual practice, the experimenter assumes the two treatments will give the same results—
the null hypothesis. He then compares the two treatments by running a number of replicates
on each and calculating the average, the standard deviation and finally the t value. The
experimenter then uses a published table of t values to look up the expected t value when
only error is operating and no real difference between the items exists. If the calculated
t value is larger than the t value in the table, the researcher can conclude that the hypothesis
was wrong and that the two treatments do not give the same results.
Example 1: A researcher wanted to know if two lots of polyester had the same acid number.
He believed that they did. He measured five replicates on each resin: Polyester 1: 3.9, 3.6,

1 First published in the American Paint and Coatings Journal, May 27, 1996; revised publication in Paint and Coatings
Industry, “deSigns of the Times: Or, when F is a passing grade,” submitted for publication August, 2000.

3.5, 3.4, 3.6; and Polyester 2: 3.8, 3.9, 3.6, 3.8, 3.7. The average acid value for Polyester 1
was 3.6 and for Polyester 2 was 3.76; and the standard deviation for the acid value of
Polyester 1 was 0.19 and for Polyester 2 was 0.11. The calculated t value was 1.63. A
published t Table (found in all statistical references) tells the researcher that he would be
wrong fifteen times in a hundred if he said these polyesters had different acid numbers.
These were not acceptable odds, so the researcher concluded that the polyesters had the
same acid numbers. A statistician would rather state that “the two polyesters could not be
proven to be different.”
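The arithmetic of Example 1 can be checked with a short Python sketch; the helper functions `mean` and `variance` are illustrative, not from any particular statistics package:

```python
import math

p1 = [3.9, 3.6, 3.5, 3.4, 3.6]  # Polyester 1 acid numbers
p2 = [3.8, 3.9, 3.6, 3.8, 3.7]  # Polyester 2 acid numbers

def mean(x):
    return sum(x) / len(x)

def variance(x):  # s^2, using n - 1 degrees of freedom
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

# Equation 1, with n1 + n2 - 2 = 8 degrees of freedom
t = abs(mean(p1) - mean(p2)) / math.sqrt(variance(p1) / len(p1) + variance(p2) / len(p2))
print(round(t, 2))  # 1.63
```

The t Table is then consulted with 8 degrees of freedom, exactly as in the text.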
Example 2: The researcher of Example 1 was still not satisfied so he decided to run five
more replicates the next day: Polyester 1, 3.6, 3.4, 3.6, 3.4, 3.5; and Polyester 2, 3.8, 3.6,
3.7, 3.8, 3.8. Combining the first results with the second results, the average acid value for
Polyester 1 was 3.55 and for Polyester 2 was 3.75, while the standard deviation for the acid
value of Polyester 1 was 0.15 and for Polyester 2 was 0.10. With twenty tests the researcher
has increased his power to make a decision. The calculated t value was 3.5. This time the t
Table tells that the researcher would be wrong less than one time in one hundred if he said these
polyesters had different acid numbers. The researcher and the statistician would both
conclude that with the additional data, the polyesters have different acid numbers.

Is there another way to evaluate data?


In Examples 1 and 2 the averages were used to look for differences between lots. Another
way is to use an Analysis of Variance (ANOVA) table to do the same thing.

What is Variance?
In a previous chapter, “What’s a Standard Deviation Good for Anyway?”, standard
deviation, s, was described as the square root of the variance (Equation 2).
s = √Variance        (2)
The variance is calculated by subtracting the average from each data point to obtain the
deviation, d (Equation 3); squaring each deviation; summing the squared deviations
(Equation 4); and dividing the sum-of-the-squares by the degrees of freedom, n−1 (Equation
5). Taking the square root of the variance then gives the standard deviation (Equation 2).

di = Xi − X̄        (3)

Sum of Squares = Σ di²        (4)

Variance = s² = Σ di² / (n − 1)        (5)
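The deviation, sum-of-squares and variance calculations can be followed step by step in Python; the data used here are the Polyester 1 acid numbers from Example 1:

```python
data = [3.9, 3.6, 3.5, 3.4, 3.6]        # Polyester 1 acid numbers, Example 1
xbar = sum(data) / len(data)            # average, 3.6
devs = [x - xbar for x in data]         # deviations, Equation 3
ss = sum(d * d for d in devs)           # sum of squares, Equation 4
variance = ss / (len(data) - 1)         # variance, Equation 5
s = variance ** 0.5                     # standard deviation, Equation 2
print(round(variance, 3), round(s, 2))  # 0.035 0.19
```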

What is ANOVA?
Variance is a statistic that is calculated by measuring the effects when the levels of the
variables are changed. In the example above a variance can be calculated for the difference
between Polyester 1 and Polyester 2 and a variance can be calculated for the experimental
error. These variances are additive.

How is an ANOVA table constructed?
Example 3: For simplicity, all the data in Examples 1 and 2 will be used.
Polyester 1, 3.9, 3.6, 3.5, 3.4, 3.6, 3.6, 3.4, 3.6, 3.4, 3.5
Polyester 2, 3.8, 3.9, 3.6, 3.8, 3.7, 3.8, 3.6, 3.7, 3.8, 3.8
The average for each polyester is calculated: 3.55 and 3.75.
The grand average is calculated: 3.65.
The grand sum of the deviation squares (often abbreviated to “grand sum of the squares” or
simply, Total SS) is calculated using twenty data points and the grand average (see for
example, Equation 4): 0.49.
The sum of the squares between the two resins is calculated using each resin average, the
grand average and the number of replicates.
The sum of the squares within the two resins is calculated using the averages of the two
resins and the individual data points.
The Total Sum of Squares is seen to be the sum of the Between-Resins and Within-Resins
Sums of Squares. This means that only two of the three SSs need to be calculated long-hand,
and the third can be calculated from the others.
The ANOVA table is completed by including the degrees of freedom, df; and calculating the
variance by dividing the SS by the df. The variance in an ANOVA table is usually called the
Mean Square.
In this example, two resins were compared, so the between-resins degrees-of-freedom is
one; there are ten analyses for each resin, or nine degrees-of-freedom for each resin, so the
experimental error or within-resins degrees-of-freedom is twice nine, or eighteen; and the
grand degrees-of-freedom is nineteen (the grand degrees-of-freedom is equal to the sum of
the between-resins degrees-of-freedom and the within-resins degrees-of-freedom; or, is
equal to the total number of experiments minus one). This is summarized in Table 1.²

Table 1: ANOVA

Source of Variation    Sum of Squares    df    Mean Square
Between Resins              0.200         1       0.200
Within Resins               0.290        18       0.016
Total                       0.490        19       0.026

² I know I'm waving my hands here, but I would refer you to any good text in applied statistics to find the algebraic
derivation.

In this example the Within-Resins Mean Square is the error variance. The standard deviation
for this acid number test is the square root of the Within-Resins Mean Square (s² = 0.016);
therefore, s = 0.127 with 18 degrees of freedom.
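The entries of Table 1 can be reproduced with a few lines of Python (a sketch of the long-hand calculation, not a call to a statistics library):

```python
p1 = [3.9, 3.6, 3.5, 3.4, 3.6, 3.6, 3.4, 3.6, 3.4, 3.5]  # Polyester 1, Example 3
p2 = [3.8, 3.9, 3.6, 3.8, 3.7, 3.8, 3.6, 3.7, 3.8, 3.8]  # Polyester 2, Example 3
both = p1 + p2

grand = sum(both) / len(both)                    # grand average, 3.65
m1, m2 = sum(p1) / len(p1), sum(p2) / len(p2)    # resin averages, 3.55 and 3.75

# Between-resins SS: resin averages vs. grand average, times the replicates
ss_between = len(p1) * (m1 - grand) ** 2 + len(p2) * (m2 - grand) ** 2  # 0.20, 1 df
# Within-resins SS: individual points vs. their own resin average
ss_within = sum((x - m1) ** 2 for x in p1) + sum((x - m2) ** 2 for x in p2)  # 0.29, 18 df
ss_total = sum((x - grand) ** 2 for x in both)   # 0.49, 19 df

ms_between = ss_between / 1     # 0.200
ms_within = ss_within / 18      # 0.016
```

Note that ss_total equals ss_between plus ss_within, so any one of the three can be obtained from the other two.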
The above demonstrates calculations to isolate the effects between and within different lots
of polyester, that is, the variance between the lots and the variance of the experimental error.
But how does ANOVA tell whether Resin 1 is different than Resin 2? This requires the
calculation of the F statistic and a comparison in an "F" test.

What is an F statistic?
The challenge is to describe the F statistic and how to use it to judge differences in samples
or treatments without going into its derivation (see Footnote 2). An Fsample is calculated by
dividing the variance between the samples, s1², by the error variance, s2² (Equation 6).

Fsample = s1² / s2²        (6)
This calculated Fsample is then compared to values in the F table. F tables are published in
most statistics references. An excerpt of the table is given in Table 2.

Table 2: Excerpts from an F table
(each cell lists the F value at 90%, 95% and 99% confidence)

Within df      Between df:     1        2        5        10       20
 1              90%          39.86    49.50    57.24    60.20    61.74
                95%          161.4    199.5    230.2    241.9    248.0
                99%           4052     4999     5764     6056     6209
 2              90%           8.53     9.00     9.29     9.39     9.44
                95%           18.5    19.00    19.30    19.40    19.45
                99%           98.5    99.00    99.30    99.40    99.45
 5              90%           4.06     3.78     3.45     3.30     3.21
                95%           6.61     5.79     5.05     4.74     4.56
                99%          16.26    13.27    10.97    10.05     9.55
10              90%           3.28     2.92     2.52     2.32     2.20
                95%           4.96     4.10     3.33     2.98     2.77
                99%          10.04     7.56     5.64     4.85     4.41
20              90%           2.98     2.59     2.16     1.94     1.79
                95%           4.35     3.49     2.71     2.35     2.12
                99%           8.10     5.85     4.10     3.37     2.94

A sample variance, s², is only an estimate of the population variance, σ². If two
different samples are taken, two variances, s², can be calculated; each s² is an independent
estimate of the population variance σ². The ratio of the variances for the two samples could
vary by as much as F and still be due to experimental variation. If the Fsample is larger than
the F from the table, something more than just experimental variation is happening. This
additional variation is attributed to the difference in the treatment.
Using the data from Example 3, the following holds. Since the between-groups
variance = polyester variance (with 1 degree of freedom) and the within-groups
variance = experimental error (with 18 degrees of freedom), the F value is calculated by
dividing the between-groups variance by the within-groups variance (Equation 7).

F(1,18) = Polyester Variance / Experimental Error = 0.200 / 0.016 = 12.5        (7)

The next step is to look up the F value in an F Table. Since the Between Groups had 1 degree
of freedom, the first column is used; and since the Within Groups had 18 degrees of freedom,
the row for 18 within degrees of freedom is used. The F table tells that at 90% confidence,
F1,18,90 = 3.01; at 95% confidence, F1,18,95 = 4.41; and at 99% confidence, F1,18,99 = 8.29.
The F1,18,95 = 4.41 tells that at the 95% confidence level, the experimental Between-Resins
Variance can be as small as 0.016 or as large as 4.41 × 0.016 = 0.070 and still be due only to
s²error. Since the Between-Resins Variance of Polyesters 1 and 2 = 0.200, this is larger than
can be explained simply by experimental error. An experimenter would be wrong only 1
time in 20 if he said the lots were different. Moreover, at 99% confidence, F1,18,99 = 8.29 and
the experimental Between-Resins Variance can be as small as 0.016 or as large as
8.29 × 0.016 = 0.133. Even at the 99% confidence level, the Between-Resins Variance of
0.200 is larger than can be explained by experimental error, and an experimenter would be
wrong only 1 time in 100 if he said the lots were different.
An easier way to use the F table is to go to the ANOVA table and compare the calculated
Fsample to the F1,18 from the F Table for each confidence level, Table 3.

Table 3: Statistical Comparison

Confidence, %    F1,18    Fsample
90               3.01     12.5
95               4.41
99               8.29

A comparison of Fsample = 12.5 to the values in the table shows that the experimenter would
have less than 1 chance in 100 of being wrong in concluding that these lots are different.
The result of the ANOVA is identical to the result of comparing the means in the t test: for
two groups, F = t².
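The Table 3 comparison amounts to a few lines of Python; the critical values are copied from a published F table for 1 and 18 degrees of freedom:

```python
# Critical F values for (1, 18) degrees of freedom, from a published F table
f_crit = {90: 3.01, 95: 4.41, 99: 8.29}
f_sample = 0.200 / 0.016  # between-resins MS / within-resins MS = 12.5

for conf, crit in sorted(f_crit.items()):
    verdict = "different" if f_sample > crit else "not proven different"
    print(f"{conf}% confidence: F = {f_sample:.1f} vs {crit} -> lots {verdict}")
```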

Why construct a complicated ANOVA table when the t-test is easier?
A t-test can only be used to compare two means at a time. When there are more than
two groups in a treatment, use of the t-test would require that all pairs be compared. For
example, if there were five groups, then ten comparisons would have to be made.
ANOVA can compare all five in one step.
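The number of pairwise t-tests grows as the number of groups taken two at a time, which can be checked in one line of Python:

```python
from math import comb

# Number of pairwise t-tests needed to compare k groups two at a time
for k in (2, 3, 5, 10):
    print(k, "groups ->", comb(k, 2), "pairwise t-tests")  # 5 groups -> 10
```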
Example 4: The acid numbers of five polyesters were to be compared to see if any
were different. Six replicate determinations of acid number were made for each resin. The
data is given in Table 4 and the Analysis of Variance is given in Table 5.
Table 4 / Example 4 data
Polyester Acid Numbers Average
3 3.4, 3.4, 3.5, 3.3, 3.5, 3.3 3.43
4 3.5, 3.5, 3.6, 3.4, 3.6, 3.5 3.52
5 3.2, 3.3, 3.2, 3.3, 3.3, 3.3 3.27
6 3.5, 3.6, 3.4, 3.4, 3.6, 3.5 3.50
7 3.8, 3.9, 3.8, 3.8, 3.9, 3.9 3.85

Table 5 / Example 4 ANOVA

Source            Sum of Squares    df    Mean Square    F Ratio
Between Resins        1.085          4      0.2712        52.2
Within Resins         0.130         25      0.0052       Prob > F
Total                 1.215         29      0.0419       <.0001

The F ratio of 52.2 tells that there is less than 1 chance in 10,000 of making a mistake
if the experimenter says that these five polyesters are not all the same. Figure 1 shows a plot
of the data for each polyester and comparison circles, which allow the experimenter to
judge which resins are the same and which are different.
Figure 1 / Data plot with comparison circles (acid value, 3.2 to 4.0, plotted for each
polyester, with a comparison circle for each resin at the right of the plot)
Overlapping comparison circles mean that the ANOVA cannot tell if the resins are
different. A comparison circle that is isolated indicates that the data disprove the hypothesis
that a sample is the same as the others. Figure 1 shows one polyester with a high acid
number, one with a low acid number and three polyesters whose acid numbers cannot be
told apart. For these three resins, since one of the comparison circles does not overlap the
other two exactly, additional experimentation on these three might show a difference.
This was an example of a one-way Analysis of Variance, because only one variable
was present.
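A one-way analysis such as Example 4 can be sketched as a small Python function, written out long-hand rather than calling a statistics library (`one_way_anova` is an illustrative helper). Note that the F value computed from the acid numbers as printed in Table 4 comes out near 51, close to but not exactly the published 52.2, suggesting a small transcription difference in the data:

```python
def one_way_anova(*groups):
    """Return (ms_between, ms_within, F) for a one-way ANOVA."""
    all_x = [x for g in groups for x in g]
    grand = sum(all_x) / len(all_x)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1            # 4 for five resins
    df_within = len(all_x) - len(groups)    # 25 for thirty tests
    ms_b, ms_w = ss_between / df_between, ss_within / df_within
    return ms_b, ms_w, ms_b / ms_w

# Acid numbers for Polyesters 3-7 as printed in Table 4
f_ratio = one_way_anova([3.4, 3.4, 3.5, 3.3, 3.5, 3.3],
                        [3.5, 3.5, 3.6, 3.4, 3.6, 3.5],
                        [3.2, 3.3, 3.2, 3.3, 3.3, 3.3],
                        [3.5, 3.6, 3.4, 3.4, 3.6, 3.5],
                        [3.8, 3.9, 3.8, 3.8, 3.9, 3.9])[2]
```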

How can ANOVA analysis be applied in a Coatings Lab?


Example 5: Several years ago a complaint was received that four lots of a commercial
polymeric isocyanate gave two-component coatings with different Gardner dry-time results.
The dry times were measured in the lab using the same pigmented polyol but varying the
polyisocyanate lot, Table 6.
Table 6: Gardner Dry Times
              Isocyanate Samples
Lot           A      B      C      D
Dry time     510    490    355    660

These results led to a conclusion that indeed there did seem to be differences in the lots. A
review of the production tests of these lots led to no definitive conclusions. At face value it
seemed that lot D was slow to dry, lot C was fast to dry and lots A and B were in the middle.
A statistically designed experiment was conducted to increase the number of degrees of
freedom in order to increase the statistical “power.” The dry times of these lots were
evaluated over several days, Table 7.
Table 7: Gardner Dry Times
                  Isocyanate Samples
Day        Lot A    Lot B    Lot C    Lot D
 1          510      490      355      660
 2          450      480      360      620
 3          345      450      450      540
 4          360      660      450      510

These results seemed to reveal a pattern about the performance of the lots and something
about the experimental error. Table 8 shows, for each day, the mean and standard deviation
of the tests run on the four lots; shows, for each lot, the mean and standard deviation of the
tests run on the four days; and shows the grand mean and grand standard deviation for the
test.

Table 8: Gardner Dry Times
                    Isocyanate Samples
Day          Lot A    Lot B    Lot C    Lot D    Day Mean    Day Std. Dev.
 1            510      490      355      660       504           125
 2            450      480      360      620       478           108
 3            345      450      450      540       446            80
 4            360      660      450      510       495           126
Lot Mean      416      520      404      583
Lot Std.
Deviation      78       95       53       69

Grand Mean                 481
Grand Standard Deviation   102

From this data two hypotheses can be stated: ① There are no differences between the lots;
and ② There are no day-to-day differences in dry time. The data seem to show that the lot-
to-lot averages fall into two groups: two lots have longer dry times than the other two; and
the day-to-day averages don't seem to fall into any pattern. To analyze this data an
ANOVA Table is constructed, Table 9, to test these hypotheses.

Table 9: ANOVA Table

Source          Sum of Squares    df    Mean Square     F      s    Prob > F
Between Days          7731         3       2577        0.4          0.768
Between Lots         87931         3      29310        4.4          0.037
Error                60481         9       6720               82
Total               156144        15

Confidence, %    F3,9
90               2.8
95               3.9
99               7.0

How were the calculations done for Table 9?


The Between Lots Sum-of-Squares was calculated by taking the difference between each
Lot's average dry time and the Grand Mean, squaring this difference, summing the squares
for the four Lots, and multiplying by the four tests per Lot. The Between Days Sum-of-
Squares was calculated by taking the difference between each Day's average dry time and
the Grand Mean, squaring this difference, summing the squares for the four Days, and
multiplying by the four tests per Day. The Total Sum-of-Squares was calculated by taking
the difference between each data point and the Grand Mean, squaring this difference and
taking the sum of the squares for the sixteen runs. Finally, the Error Sum-of-Squares was
calculated by subtracting the Between Lots Sum-of-Squares and the Between Days Sum-of-
Squares from the Total Sum-of-Squares. (See Footnote 2.)
Since there were four days there were three Between Days degrees-of-freedom; and there
were four lots so there were three Between Lots degrees-of-freedom. There were sixteen
experiments so there were fifteen Total degrees-of-freedom. The Error degrees-of-freedom,
9, were calculated by subtracting the Days and the Lots degrees-of-freedom from the Total
degrees-of-freedom
Each Mean Square was calculated by dividing each Sum of Squares by its respective
degrees of freedom.
Each Fsample was calculated by dividing the Lot or the Day Mean Square by the Error Mean
Square, respectively. Each Fsample was then compared to the Fs from the F table with 3 and 9
degrees of freedom.
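The sums of squares described above can be reproduced in Python from the Table 7 data (a long-hand sketch; note that the Between Lots sum of squares computed this way, about 87,900, is the value that, together with the Days and Total sums of squares, reproduces the Error entry of 60,481):

```python
# Gardner dry times from Table 7: dry_time[day][lot]
dry_time = {
    1: {"A": 510, "B": 490, "C": 355, "D": 660},
    2: {"A": 450, "B": 480, "C": 360, "D": 620},
    3: {"A": 345, "B": 450, "C": 450, "D": 540},
    4: {"A": 360, "B": 660, "C": 450, "D": 510},
}
days, lots = list(dry_time), ["A", "B", "C", "D"]

grand = sum(dry_time[d][l] for d in days for l in lots) / 16           # ~481
day_mean = {d: sum(dry_time[d][l] for l in lots) / 4 for d in days}
lot_mean = {l: sum(dry_time[d][l] for d in days) / 4 for l in lots}

ss_days = 4 * sum((day_mean[d] - grand) ** 2 for d in days)            # ~7731
ss_lots = 4 * sum((lot_mean[l] - grand) ** 2 for l in lots)            # ~87931
ss_total = sum((dry_time[d][l] - grand) ** 2 for d in days for l in lots)  # ~156144
ss_error = ss_total - ss_days - ss_lots                                # ~60481

ms_error = ss_error / 9                                                # ~6720
f_days = (ss_days / 3) / ms_error                                      # ~0.4
f_lots = (ss_lots / 3) / ms_error                                      # ~4.4
s = ms_error ** 0.5                                                    # ~82
```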
If an experimenter said that there is a difference between each day's testing, he would
be wrong 77 out of 100 times, so the second hypothesis held true. However, if he said that
there is a difference between the lots, he would be wrong only 4 times in 100, and the
first hypothesis was false.
The standard deviation for the experimental error was calculated by taking the square root of
the Error Mean Square. One surprise was that the day-to-day error was so large: a standard
deviation of 82 minutes. This means that if an experimenter determined only one dry time per
sample, he would have to see a difference of ~160 minutes before he could confidently say
the samples had different dry times. In this case, differences between lots could be seen
because of the increased power in the experiment due to the large number of replicates.

Was there a happy ending?


Yes! From an evaluation of comparison circles, Lot A was statistically not different
from Lot C; and Lot B was statistically not different from Lot D. However, the group A and
C was statistically different from the group B and D. Review of the production history was
able to pinpoint a production variable that caused the two groups to have different dry times.
As a result, the process, and so the product, was made more consistent. The process
inconsistency probably would not have been found if only one dry time had been run per lot
and if the experimental design had not been done.
This was an example of a two-way Analysis of Variance, because two variables were
present. If ten variables were present, ANOVA could be used to construct a ten-way
analysis. The ability to separate the variance for each variable is what makes ANOVA such
a powerful analytical tool.

Where else can ANOVA be used in the laboratory?


Analysis of Variance, ANOVA, is used for most statistically designed experimentation:
comparison of several sets of data, as seen in the above examples; regression analysis;
factorial statistics; mixture statistics; response surface analysis; Taguchi methods; robust
testing; etc. These are all topics for future discussions.

Disclaimer
The manner in which you use and the purpose to which you put and utilize our products, technical assistance
and information (whether verbal, written or by way of production evaluations), including any suggested
formulations and recommendations are beyond our control. Therefore, it is imperative that you test our
products, technical assistance and information to determine to your own satisfaction whether they are suitable
for your intended uses and applications. This application-specific analysis must at least include testing to
determine suitability from a technical as well as health, safety, and environmental standpoint. Such testing
has not necessarily been done by us. Unless we otherwise agree in writing, all products are sold strictly
pursuant to the terms of our standard conditions of sale. All information and technical assistance is given
without warranty or guarantee and is subject to change without notice. It is expressly understood and agreed
that you assume and hereby expressly release us from all liability, in tort, contract or otherwise, incurred in
connection with the use of our products, technical assistance, and information. Any statement or
recommendation not contained herein is unauthorized and shall not bind us. Nothing herein shall be
construed as a recommendation to use any product in conflict with patents covering any material or its use.
No license is implied or in fact granted under the claims of any patent.
