• The difference between two sample means can be studied through the
standard error of the difference of the means of the two samples, or through
Student's t-test, but a difficulty arises when we need to examine the
significance of the differences among more than two sample means at
once. Analysis of variance helps us to test whether more than two
population means can be considered equal.
• Analysis of variance will enable us to test for the significance of the
differences among more than two sample means.
• Using analysis of variance, we will be able to make inferences about
whether our samples are drawn from populations having the same mean.
• Sir R. A. Fisher originated the technique of analysis of variance.
• The analysis of variance is essentially a technique for testing the
difference between groups of data for homogeneity. It is a method of
analyzing the variance to which a response is subject into its various
components corresponding to the various sources of variation. There may
be variation between the samples or there may be variation within the
sample items. Thus, the technique of analysis of variance consists in
splitting the variance for analytical purposes into its various components.
Normally the variance (or what may be called the total variance) is
divided into two parts:
1. Variance between samples,
2. Variance within samples; such that
Total variance = Variance between samples + Variance within samples
Three steps in analysis of variance
Analysis of variance consists of three different steps.
1. Determine a first estimate of the population variance from the variance
among (between) the sample means.
2. Determine a second estimate of the population variance from the
variance within the samples.
3. Compare these two estimates. If they are approximately equal in value,
accept the null hypothesis.
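The three steps above can be sketched in code. This is a minimal illustration on invented data (the sample values below are made up for demonstration):

```python
from statistics import mean, variance

# Invented data: k = 3 samples, each of size n = 4
samples = [
    [12.0, 14.0, 13.0, 15.0],
    [13.0, 15.0, 14.0, 16.0],
    [12.5, 14.5, 13.5, 15.5],
]
k, n = len(samples), len(samples[0])

# Step 1: first estimate of the population variance, from the
# variance among the sample means: sigma^2 is estimated by n * s_ybar^2
sample_means = [mean(s) for s in samples]
between_estimate = n * variance(sample_means)

# Step 2: second estimate, from the variance within the samples
# (the average of the k sample variances)
within_estimate = mean(variance(s) for s in samples)

# Step 3: compare the two estimates via their ratio
f_ratio = between_estimate / within_estimate
print(between_estimate, within_estimate, f_ratio)
```

If the ratio is close to 1, the two estimates agree and the null hypothesis is retained; a ratio much larger than 1 suggests the sample means differ more than chance alone would explain.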
Assumption
In order to use analysis of variance, we must assume that each of the
samples is drawn from a normal population and that each of these
populations has the same variance.
Suppose that the experimenter has available the results of k independent
random samples, each of size n, from k different populations, and wishes to test
the hypothesis that the means of these k populations are equal.
If we denote the jth observation in the ith sample by y_ij, the general scheme
for a one-way classification is as follows:
Sample                 Observations                     Mean
Sample 1     y_11  y_12  …  y_1j  …  y_1n              ȳ_1
Sample 2     y_21  y_22  …  y_2j  …  y_2n              ȳ_2
  ⋮
Sample i     y_i1  y_i2  …  y_ij  …  y_in              ȳ_i
  ⋮
Sample k     y_k1  y_k2  …  y_kj  …  y_kn              ȳ_k
                                       Grand mean       ȳ.
where ȳ. is the overall mean (grand mean) of all kn observations.
To test the hypothesis that the samples were obtained from k populations with
equal means, we make the assumption that we are dealing with normal
populations having equal variances.
If µ_i denotes the mean of the ith population and σ² denotes the common
variance of the k populations, we can express each observation Y_ij as µ_i plus
the value of a random component. Thus, the model for the
observations is given by:

Y_ij = µ_i + ε_ij    for i = 1, 2, …, k;  j = 1, 2, …, n

where the ε_ij are independent, normally distributed random variables with zero
mean and the common variance σ².
The null hypothesis we shall want to test is that the population means are all
equal; that is, µ_1 = µ_2 = … = µ_k. Writing µ_i = µ + α_i, where µ is the
overall mean and α_i is the effect of the ith treatment, this is equivalent to

H_0: α_1 = α_2 = … = α_k = 0

Correspondingly, the alternative hypothesis is that the population means are
not all equal; that is,

H_1: α_i ≠ 0 for at least one value of i
To test the null hypothesis that the k population means are all equal, we shall
compare two estimates of σ2 - one based on the variation among (between) the
sample means, and one based on the variation within the samples.
Since by assumption each sample comes from a population having the
variance σ², this variance can be estimated by any one of the sample
variances s_i², and hence also by their mean:

σ̂²_W = (1/k) Σ_{i=1}^{k} s_i² = [ Σ_{i=1}^{k} Σ_{j=1}^{n} (y_ij − ȳ_i)² ] / [ k(n−1) ]

Note that each of the sample variances s_i² is based on (n−1) degrees of freedom.
Hence, σ̂²_W is based on k(n−1) degrees of freedom.
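The two forms of the within-sample estimate (the average of the k sample variances, and the pooled double sum divided by k(n−1)) are algebraically identical. A quick numerical check on invented data:

```python
from statistics import mean, variance

# Invented data: k = 2 samples of size n = 3, just to check the algebra
samples = [[4.0, 6.0, 8.0], [5.0, 5.0, 8.0]]
k, n = len(samples), len(samples[0])

# sigma_hat_W^2 as the average of the k sample variances s_i^2 ...
avg_of_variances = mean(variance(s) for s in samples)

# ... equals the double sum of squared deviations from each sample's
# own mean, divided by k(n - 1)
pooled = sum((y - mean(s)) ** 2 for s in samples for y in s) / (k * (n - 1))
print(avg_of_variances, pooled)
```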
The Variance Among (Between) the Sample Means
If the null hypothesis is true, the k sample means may be regarded as a random
sample from a population whose variance is σ²_ȳ = σ²/n, so that σ² = n·σ²_ȳ.
We do not know σ²_ȳ, but we can calculate the variance among the k
sample means, s²_ȳ, and substitute it for σ²_ȳ.
Then we have the estimated population variance, that is, the variance between
the samples:

σ̂²_B = n·s²_ȳ = n · [ Σ_{i=1}^{k} (ȳ_i − ȳ.)² ] / (k−1)

and it is based on (k−1) degrees of freedom.
The ratio of these two estimates,

F = σ̂²_B / σ̂²_W

is a value of a random variable having the F-distribution with (k−1) and k(n−1)
degrees of freedom.
The null hypothesis will be rejected if F exceeds F_α with (k−1) and k(n−1)
degrees of freedom.
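A simulation makes this distributional claim plausible: when all k samples really do come from the same normal population, the ratio should exceed the upper 5% point of the F-distribution about 5% of the time. This is a sketch on simulated data; the critical value 3.89 for (2, 12) degrees of freedom is taken from a standard F-table:

```python
import random
from statistics import mean, variance

random.seed(0)
k, n, trials = 3, 5, 2000
F_CRIT_05 = 3.89          # upper 5% point of F with (2, 12) d.f., from a table

exceed = 0
for _ in range(trials):
    # All k samples drawn from the SAME normal population (H0 is true)
    samples = [[random.gauss(50, 10) for _ in range(n)] for _ in range(k)]
    between = n * variance([mean(s) for s in samples])   # sigma_hat_B^2
    within = mean(variance(s) for s in samples)          # sigma_hat_W^2
    if between / within > F_CRIT_05:
        exceed += 1

print(exceed / trials)    # should come out close to 0.05
```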
The total variation can be partitioned according to the identity

Σ_{i=1}^{k} Σ_{j=1}^{n} (y_ij − ȳ.)² = Σ_{i=1}^{k} Σ_{j=1}^{n} (y_ij − ȳ_i)² + n Σ_{i=1}^{k} (ȳ_i − ȳ.)²

It is customary to refer to the expression on the left-hand side of this identity
as the Total Sum of Squares (SST), to the first term on the right-hand side as
the Error Sum of Squares (SSE), and to the second term on the right-hand side
as the Treatment Sum of Squares, SS(Tr).
F = [ SS(Tr) / (k−1) ] / [ SSE / (k(n−1)) ]
To simplify the calculation of the various sums of squares, we usually use the
following computing formulas:
SST = Σ_{i=1}^{k} Σ_{j=1}^{n} y_ij² − C

SS(Tr) = (1/n) Σ_{i=1}^{k} T_i² − C

where C, called the correction term, is given by C = T²/(kn).

In these formulas T_i is the total of the n observations in the ith sample,
whereas T is the grand total of all kn observations. The Error Sum of Squares,
SSE, is then obtained by subtraction: SSE = SST − SS(Tr).
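The shortcut formulas give the same sums of squares as the direct definitions. A numerical check on invented data (the sample values below are arbitrary):

```python
# Invented data: k = 2 samples of size n = 3 (values chosen arbitrarily)
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
k, n = len(samples), len(samples[0])

grand_total = sum(sum(s) for s in samples)          # T
C = grand_total ** 2 / (k * n)                      # correction term
sample_totals = [sum(s) for s in samples]           # T_i

# Shortcut computing formulas
SST = sum(y ** 2 for s in samples for y in s) - C
SS_Tr = sum(t ** 2 for t in sample_totals) / n - C
SSE = SST - SS_Tr                                   # error SS by subtraction

# Check against the direct definitions (deviations from means)
grand_mean = grand_total / (k * n)
sample_means = [t / n for t in sample_totals]
SST_direct = sum((y - grand_mean) ** 2 for s in samples for y in s)
SSE_direct = sum((y - m) ** 2 for s, m in zip(samples, sample_means) for y in s)
print(SST, SS_Tr, SSE)
```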
Example: A company wants to compare the cleansing action of three
detergents on the basis of the following whiteness readings made on 15
swatches of white cloth, which were first soiled with ink and then washed in
an agitator-type machine with the respective detergents:
Detergent 1 77 81 71 76 80
Detergent 2 72 58 74 66 70
Detergent 3 76 85 82 80 77
Test at the 0.01 level of significance whether the differences among the means
of the whiteness readings are significant.
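The arithmetic for this example can be carried out with the computing formulas given earlier:

```python
readings = {
    "Detergent 1": [77, 81, 71, 76, 80],
    "Detergent 2": [72, 58, 74, 66, 70],
    "Detergent 3": [76, 85, 82, 80, 77],
}
samples = list(readings.values())
k, n = len(samples), len(samples[0])          # k = 3 detergents, n = 5 swatches

T = sum(sum(s) for s in samples)              # grand total
C = T ** 2 / (k * n)                          # correction term
SST = sum(y ** 2 for s in samples for y in s) - C
SS_Tr = sum(sum(s) ** 2 for s in samples) / n - C
SSE = SST - SS_Tr                             # error SS by subtraction

# F statistic with (k - 1) and k(n - 1) degrees of freedom
F = (SS_Tr / (k - 1)) / (SSE / (k * (n - 1)))
print(SST, SS_Tr, SSE, round(F, 2))           # 666.0 390.0 276.0 8.48
```

Since the computed F ≈ 8.48 exceeds the tabled value F_0.01(2, 12) = 6.93, the differences among the mean whiteness readings are significant at the 0.01 level.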
The F-Distribution
Degrees of Freedom
• As we have mentioned each F-distribution has a pair of degrees of
freedom, one for the numerator of the F-ratio and the other for the
denominator.
• While calculating the variance between the sample means we used k
values of ȳ_i, one for each sample, to calculate s²_ȳ. In the example
above, once we knew two of the three deviations (ȳ_i − ȳ.), the third was
automatically determined and could not be freely specified. Thus, one df
is lost when we calculate the variance between samples. Hence, the
number of degrees of freedom for the numerator of the F-ratio is always
one fewer than the number of samples.
Number of degrees of freedom in the numerator of the F-ratio = (k − 1)
The F-Table
For analysis of variance, we shall use an F-table in which the columns
represent the number of degrees of freedom for the numerator and the rows
represent the degrees of freedom for the denominator. Suppose we are testing
a hypothesis at the 0.05 level of significance, using the F-distribution, and our
degrees of freedom are 2 for the numerator and 13 for the denominator. The value
we find in the F-table is 3.81 (first look up the column, then the row).
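The table lookup can be reproduced in code; this sketch uses `scipy.stats.f.ppf` (assuming SciPy is available), where the upper 5% point corresponds to the 0.95 quantile:

```python
from scipy.stats import f

# Upper 5% critical value of F with 2 numerator and 13 denominator d.f.
critical = f.ppf(0.95, dfn=2, dfd=13)
print(round(critical, 2))  # 3.81, matching the F-table entry
```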
If the calculated F-ratio is greater than the table value, we reject the null
hypothesis; otherwise we accept it.
Statement of Hypotheses
Null hypothesis: there is no significant difference between the population
means.
In the example above, suppose the director of training wants to test at the 0.05
level the hypothesis that there are no differences among the three training
methods.
We set the null hypothesis as H_0: µ_1 = µ_2 = µ_3, with the alternative
hypothesis that the three population means are not all equal.