

Running Head: HOMOGENEITY OF VARIANCE

The Importance of the Homogeneity of Variance Assumption in ANOVA/MANOVA

Pasquale Veleno

University of Calgary

In order for statistical tests to be valid when using univariate or multivariate analysis of variance, certain assumptions must hold. Three commonly held assumptions for univariate ANOVA are that observations are independent, that observations are normally distributed, and that the variances within the groups of the design are identical (homogeneity of variance). Similarly, the assumptions for MANOVA are that observations are independent, that observations are multivariate normally distributed, and that the population covariance matrices for the p dependent variables are identical (Stevens, 2009). The homogeneity of variance assumption concerns population variances and requires that they be equal; failure to meet this requirement alters the Type I error rate. This paper outlines the importance of the homogeneity of variance assumption in ANOVA and MANOVA, including how the assumption is tested and the effects of violations.

The homogeneity of variance assumption, otherwise referred to as homoscedasticity, holds that the population variances for the groups of the design are equal. Homoscedasticity is evaluated for pairs of variables (Schwab, 2005). Data are subdivided into appropriate groups, and the variances within each group are calculated and tested to ensure that they are consistent with sampling from a single normal distribution (Stevens, 2009). If this assumption is not met, the calculation of the error variance (SS error) is affected insofar as the result will not be an estimate of the common within-group variance (StatSoft, 2010). When the error variance is not consistent among groups, the data are said to be heteroscedastic. According to Tabachnick and Fidell (1996), homoscedasticity is related to the assumption of normality because, when the assumption of multivariate normality is met, the relationships between variables are homoscedastic (p. 85). Heteroscedasticity is typically produced by several factors: non-normality of one of the variables; a relationship between one variable and the transformation of another variable; and greater measurement error at some levels of an independent variable (Tabachnick & Fidell, 1996). Homoscedasticity can be evaluated using both graphical and statistical methods. Among the most common graphical methods is the boxplot, while Levene's statistic, the F statistic, and Wilks' lambda are among the common statistical methods used for this purpose.

ANOVA is robust to violations of this assumption when group sizes are approximately equal and the distributions within groups are close to normal. Group sizes can be considered approximately equal when the larger group is no more than one and a half times the size of the smaller group, and group variances are considered acceptable when the variance of the larger group remains less than five times that of the smaller group (ANOVA Assumptions, 2004). Slight heteroscedasticity has little effect on significance tests; however, when heteroscedasticity is pronounced, it can significantly reduce the accuracy of findings, thereby increasing the probability of a Type I error (rejecting the null hypothesis when it is true) (Tabachnick & Fidell, 1996). Stevens (2009) notes that the F statistic is "robust against heterogeneous variances when the group sizes are equal … (as) long as the group sizes are approximately equal (largest/smallest < 1.5), F is robust. On the other hand, when the group sizes are sharply unequal and the population variances are different, then if the large sample variances are associated with the small group sizes, the F statistic is liberal" (p. 227).

This would result in the researcher incorrectly rejecting the null hypothesis. Alternatively, when large variances are associated with the large group sizes, the F statistic is conservative, resulting in a reduction of power (Stevens, 2009). If sample sizes are approximately equal, violations of the homogeneity assumption have little effect on the Type I error rate.

While considered an extension of ANOVA, MANOVA differs from ANOVA in that it includes more than one dependent variable in the analysis. The homogeneity of variance assumption also applies in MANOVA; however, because there are several dependent variables in the design, it is additionally required that their covariances be homogeneous across the cells of the design (StatSoft, 2010). This is referred to as the assumption of homogeneous covariance matrices, and it is very restrictive (Stevens, 2009). Given that two matrices are equal only when all of their corresponding elements are equal, the assumption requires that the variances of all dependent variables be equal across groups and that all covariances between dependent variables be equal across groups. This becomes increasingly problematic as more variables are measured. When the numbers of subjects in each group are approximately equal, MANOVA remains robust; when they are unequal, however, an examination of the covariance matrices is required. Wilks' lambda and Box's M tests can be applied as a means of testing this assumption.

Testing Assumptions

Although a number of tests exist for checking homogeneity of variance, most if not all of them are affected to some degree by non-normality in the samples. The need to test for homogeneity of variance can be determined by inspecting the variances or standard deviations for each group: if the variances differ by a ratio of more than nine, or the standard deviations by a ratio of more than three, the homogeneity of variance assumption is threatened and further inspection is necessary (Linear Statistical, 1998).

Before testing for homoscedasticity, the independent variable must be non-metric and the dependent variable must be metric. If the independent variable is metric, it must first be converted to a categorical variable prior to testing (Schwab, 2005). Once this condition is met, the statistical methods can be applied. There are several statistical methods for checking the homogeneity of variance assumption; the following are briefly reviewed.

Fmax test (Hartley's test) - The Fmax test, otherwise known as Hartley's test, is useful when a univariate analysis of variance is appropriate for comparing population variances. If the ratio of the largest to the smallest group size is small, i.e., 4:1 or less, and Fmax is 10:1 or less, homogeneity of variance can be assumed not to be a problem. Hartley's test assumes that the data in each group are normally distributed and that each group has an equal number of members. The test involves computing the ratio of the largest group variance, $\max(s_j^2)$, to the smallest group variance, $\min(s_j^2)$. The resulting ratio, Fmax, is then compared to a critical value from a table of the sampling distribution of Fmax; if the computed ratio is less than the critical value, the groups are assumed to have similar or equal variances (Pearson & Hartley, 1970). The test statistic is as follows:

\[ F_{\max} = \frac{\max(s_j^2)}{\min(s_j^2)}, \qquad df = (p,\ n - 1) \]

where p is the number of groups and n is the common group size.
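As a brief illustration, the ratio can be computed directly. The sketch below (in Python, with invented data and group names) applies only the rough 10:1 screening rule described above, since the exact critical values are read from printed Fmax tables rather than from a standard library function.

```python
# Hartley's Fmax on three hypothetical groups (p = 3, n = 5 each).
# Exact critical values would come from Fmax tables with df = (p, n - 1);
# here only the rough 10:1 screen mentioned in the text is applied.
import numpy as np

group_a = np.array([12.1, 14.3, 13.8, 15.2, 12.9])  # hypothetical data
group_b = np.array([11.4, 12.0, 13.1, 12.7, 11.9])
group_c = np.array([14.8, 16.2, 15.5, 17.0, 15.1])

variances = [np.var(g, ddof=1) for g in (group_a, group_b, group_c)]
f_max = max(variances) / min(variances)  # largest variance / smallest variance

print(f"Fmax = {f_max:.2f}")
print("Likely acceptable" if f_max <= 10 else "Inspect further")
```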

Requirements of Hartley's test include approximately equal sample sizes, normal populations, and independent samples. One limitation of the test is that it is extremely sensitive to violations of the normality assumption; as a result, outliers have a disproportionate effect on its outcome (O'Brien, 1981).

Bartlett's test - Bartlett's test of homogeneity of variance is a chi-square statistic with (k - 1) degrees of freedom, where k is the number of categories in the independent variable. Bartlett's test is sensitive to violations of normality, and for this reason Levene's test and the Brown-Forsythe test have now largely replaced it (StatSoft, 2010). It is best used for larger samples only.

If there are k samples with sizes $n_i$ and sample variances $S_i^2$, then Bartlett's test statistic is

\[ X^2 = \frac{(N - k)\ln S_p^2 - \sum_{i=1}^{k} (n_i - 1)\ln S_i^2}{1 + \frac{1}{3(k-1)} \left( \sum_{i=1}^{k} \frac{1}{n_i - 1} - \frac{1}{N - k} \right)} \]

where $N = \sum_{i=1}^{k} n_i$ and $S_p^2 = \frac{1}{N-k} \sum_{i} (n_i - 1) S_i^2$ is the pooled estimate for the variance. The test statistic has approximately a $\chi^2_{k-1}$ distribution; thus the null hypothesis is rejected if $X^2 > \chi^2_{k-1,\alpha}$ (where $\chi^2_{k-1,\alpha}$ is the upper-tail critical value for the $\chi^2_{k-1}$ distribution) (Bartlett's Test, 2010).
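For readers working in Python, SciPy implements this statistic as scipy.stats.bartlett, which returns the X² value above together with its p-value from the chi-square (k - 1) approximation. The sketch below uses simulated, hypothetical groups, one of which is given a deliberately larger spread.

```python
# A minimal sketch of Bartlett's test on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(10, 2, size=30)  # hypothetical groups
g2 = rng.normal(10, 2, size=25)
g3 = rng.normal(10, 4, size=35)  # deliberately larger spread

x2, p = stats.bartlett(g1, g2, g3)
print(f"Bartlett X^2 = {x2:.3f}, p = {p:.4f}")
if p < .05:
    print("Reject the null hypothesis of equal variances.")
```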

Levene's test - Levene's test of homogeneity of variance tests the assumption that each group of the independent variable has the same variance on the dependent variable. If the Levene statistic is significant at the .05 level or better, the researcher rejects the null hypothesis that the groups have equal variances. The Levene test is considered more robust to non-normality than Bartlett's test, though it is sensitive to asymmetries of the residuals, in which case the median should be used in place of the mean in computing the $Z_{ij}$ scores (Garson, 2006). The test statistic, W, is defined as follows:

\[ W = \frac{N - k}{k - 1} \cdot \frac{\sum_{i=1}^{k} N_i (\bar{Z}_{i\cdot} - \bar{Z}_{\cdot\cdot})^2}{\sum_{i=1}^{k} \sum_{j=1}^{N_i} (Z_{ij} - \bar{Z}_{i\cdot})^2} \]

where W is the result of the test, k is the number of different groups to which the samples belong, N is the total number of samples, $N_i$ is the number of samples in the ith group, $Y_{ij}$ is the value of the jth sample from the ith group, $Z_{ij} = |Y_{ij} - \bar{Y}_{i\cdot}|$ (with the group mean $\bar{Y}_{i\cdot}$ replaced by the group median in the robust variant), $\bar{Z}_{i\cdot}$ is the mean of the $Z_{ij}$ for group i, and $\bar{Z}_{\cdot\cdot}$ is the mean of all $Z_{ij}$.
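In practice W is rarely computed by hand. SciPy's scipy.stats.levene implements it; center='mean' gives the original Levene statistic defined above, while center='median' gives the robust, median-based variant. The data below are hypothetical.

```python
# A minimal sketch of Levene's test (median-centered variant).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(50, 5, size=40)  # hypothetical groups
g2 = rng.normal(50, 5, size=40)
g3 = rng.normal(50, 9, size=40)  # wider spread

w, p = stats.levene(g1, g2, g3, center='median')
print(f"Levene W = {w:.3f}, p = {p:.4f}")  # small p => unequal variances
```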

Wilks' lambda - This is a commonly used test that is appropriate when there are more than two groups formed by the independent variables. Wilks' lambda is a multivariate F test that measures the differences between groups on the centroid (vector) of means (Garson, 2006). Wilks' lambda scores range from 0 to 1, with lower scores signifying a stronger relationship of predictors to responses.

Brown-Forsythe test - The Brown-Forsythe test of homogeneity of variances uses deviations from the group medians instead of the group means. It is more robust than the Levene test when groups are unequal in size and the deviations from the group means are highly skewed, violating the normality assumption as well as the assumption of equal variances. A transformed response variable is constructed to measure the statistical variability in each group. Let

\[ z_{ij} = |y_{ij} - \tilde{y}_j| \]

where $\tilde{y}_j$ is the median of group j. The Brown-Forsythe test statistic is the model F statistic from a one-way ANOVA on the $z_{ij}$:

\[ F = \frac{(N - p) \sum_{j=1}^{p} n_j (\bar{z}_{\cdot j} - \bar{z}_{\cdot\cdot})^2}{(p - 1) \sum_{j=1}^{p} \sum_{i=1}^{n_j} (z_{ij} - \bar{z}_{\cdot j})^2} \]

where p is the number of groups, $n_j$ is the number of observations in group j, and N is the total number of observations; $\bar{z}_{\cdot j}$ denotes the group means of the $z_{ij}$ and $\bar{z}_{\cdot\cdot}$ their grand mean (Brown-Forsythe Test, 2010).
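Because the statistic is simply the model F from a one-way ANOVA on the transformed responses, it can be assembled by hand as a check. The sketch below builds the $z_{ij}$ from hypothetical data and passes them to scipy.stats.f_oneway, which agrees with calling scipy.stats.levene with center='median'.

```python
# Brown-Forsythe by hand: one-way ANOVA on |y_ij - median_j|.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
groups = [rng.normal(0, s, size=n)                       # hypothetical groups,
          for s, n in [(1.0, 20), (1.0, 30), (2.5, 25)]]  # one with larger spread

z = [np.abs(g - np.median(g)) for g in groups]  # z_ij = |y_ij - median_j|
f, p = stats.f_oneway(*z)                       # model F from ANOVA on the z_ij
print(f"Brown-Forsythe F = {f:.3f}, p = {p:.4f}")
```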

Of these, Levene's test and the Brown-Forsythe test are considered the least affected by non-normality (Linear Statistical, 1998).

Box's M test - Box's M tests the multivariate homogeneity of variances and covariances, as required by MANOVA, using the F distribution. When the test is not significant (i.e., its probability value is greater than .05), the assumption of homoscedasticity is considered tenable, and the researcher does not reject the null hypothesis that the groups' covariance matrices do not differ. When the probability value is less than .05, the covariances are considered significantly different, and the null hypothesis is rejected. Box's M has been shown to be a conservative test, failing to reject the null hypothesis too often, and it is highly sensitive to violations of multivariate normality; as such, it is not considered the most useful tool for these purposes. To compensate for this sensitivity to non-normality, researchers have tended to test at the p = .001 level, especially when sample sizes are unequal (Garson, 2006).
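Box's M is not part of SciPy, but the statistic and its common chi-square approximation can be assembled from the standard textbook formulas. The following is a rough sketch on hypothetical bivariate data, not a validated implementation, and it uses the chi-square rather than the F approximation; per the advice above, its p-value would often be judged against .001 rather than .05.

```python
# A rough sketch of Box's M with the usual chi-square approximation.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
groups = [rng.multivariate_normal([0, 0], np.eye(2), size=n)
          for n in (30, 35, 40)]              # hypothetical bivariate groups

k = len(groups)                               # number of groups
p = groups[0].shape[1]                        # number of dependent variables
ns = np.array([g.shape[0] for g in groups])   # group sizes
N = ns.sum()

covs = [np.cov(g, rowvar=False) for g in groups]               # S_i
pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (N - k)  # S_pooled

logdet = lambda A: np.linalg.slogdet(A)[1]
M = (N - k) * logdet(pooled) - sum((n - 1) * logdet(S)
                                   for n, S in zip(ns, covs))

# Box's correction factor and chi-square approximation
c = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))) * \
    (np.sum(1.0 / (ns - 1)) - 1.0 / (N - k))
x2 = M * (1 - c)
df = p * (p + 1) * (k - 1) / 2
print(f"Box's M = {M:.3f}, X^2 = {x2:.3f}, p = {chi2.sf(x2, df):.4f}")
```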

Dealing With Violations

When output data do not meet the requirements of normal distribution or homogeneity of variance, the risk of making a Type I error increases. There are, however, specific situations that call for special attention. If conducting a between-subjects ANOVA with equal numbers of participants, violations of normality and homogeneity of variance can be tolerated without substantially increasing the chance of drawing false conclusions, since ANOVA is very robust (ANOVA Assumptions, 2004). If the design includes a within-subjects factor, especially one combined with unequal sample sizes, heterogeneity of variance becomes problematic, necessitating an examination of sphericity (ANOVA Assumptions, 2004). If the variances associated with the means differ to a substantial degree, particularly when cell sizes are unequal, it is best to proceed as though the assumptions have been violated (ANOVA Assumptions, 2004).

Assuming that there are no problems with the research design and the output data appear valid, consideration must be given to data transformation as a means of meeting the assumption of homogeneity of variance. The purpose of data transformation is to return the data to a symmetric (normal) distribution and to make them easier to visualize and interpret (Data Transformation, 2009). Data transformation remains a viable option for dealing with heteroscedasticity, though it should be undertaken only if the transformed data remain interpretable and the transformation does not alter the relationship of the dependent variable to the independent variable(s). According to Tabachnick and Fidell (1996), difficulty in interpretation is largely dependent upon the scale in which the variable is measured: "if the scale is meaningful or widely used, transformation often hinders interpretation, but if the scale is somewhat arbitrary … transformation does not notably increase the difficulty of interpretation" (p. 86). For example, if the data being transformed are measured in units such as seconds, hours, or days, transformation may not be well suited. If, however, the data were measured on a more arbitrary scale, it would be useful to transform them, since this does not affect the interpretation of results. The goal of transformation is to normalize the data, so it is important to assess normality after the transformation has taken place. A possible corollary advantage of transformation is that it may reduce the impact of outliers in grouped data, thereby improving the analysis (Tabachnick & Fidell, 1996, p. 86).


The process of determining which transformation is best suited usually involves multiple trials to find the one that yields the most nearly normal distribution. The type of transformation depends on how far the distribution deviates from normality: a square root transformation is best when the distribution differs only slightly from normal; a log transformation is best suited when the deviation is moderately pronounced; and an inverse transformation is used when the distribution is markedly non-normal. The direction of the deviation is also important. If the data are negatively skewed, they should be reflected before the transformation is applied. This involves creating a new variable in which each original value is subtracted from a constant; the constant is calculated by adding 1 to the largest value of the original variable (Introduction to Regression, 2007). In some cases, data transformations do not produce homoscedasticity. When this occurs, the result is a loss of power, making it much more difficult to accurately identify relationships, and the strength of relationships, between the dependent and independent variables.
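The trial-and-error workflow described above is easy to script. The sketch below applies the three transformations to a hypothetical positively skewed variable and re-checks normality after each step (the Shapiro-Wilk test is used here merely as one convenient check); it also shows the reflection step for negatively skewed data.

```python
# Trying candidate transformations and re-assessing normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
y = rng.lognormal(mean=2.0, sigma=0.8, size=100)  # hypothetical skewed data

candidates = {
    "sqrt": np.sqrt(y),   # slight deviation from normality
    "log": np.log(y),     # moderate deviation
    "inverse": 1.0 / y,   # marked deviation
}
for name, t in candidates.items():
    w, p = stats.shapiro(t)
    print(f"{name:8s} Shapiro-Wilk W = {w:.3f}, p = {p:.4f}")

# Negatively skewed data are reflected first: subtract each value from
# a constant equal to the largest value plus 1, then transform.
neg = -y                                # hypothetical negatively skewed data
reflected = (np.max(neg) + 1) - neg     # now positively skewed and > 0
```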

In conclusion, to reduce the likelihood of committing a Type I error, it is important to ensure homogeneity of variance. A number of statistical and graphical methods are available for doing so; the selection among them, however, depends on the nature of the research question and the type of research design. To supplement the information provided herein, the following resources are suggested:

Regression - http://dss.princeton.edu/online_help/analysis/regression_intro.htm


Homoscedasticity - utexas.edu/courses/schwab/sw388r7/.../Assumptions_Summer2003.ppt and http://www.utexas.edu/courses/schwab/sw388r7/SolvingProblems/MultipleRegression_AssumptionsAndOUtliers.ppt#104

MANOVA - http://faculty.chass.ncsu.edu/garson/PA765/manova.htm

ANOVA Homogeneity of Variance Assumption - www.math.umt.edu/graham/stat452/varhomog.pdf


References

ANOVA Assumptions. (2004). Retrieved June 20, 2010, from www.sjsu.edu/faculty/gerstman/StatPrimer/anova-b.pdf

Bartlett, M. S. (1937a). Some examples of statistical methods of research in agriculture and applied biology. J. Roy. Statist. Soc. Suppl., 4, 137-170.

Bartlett, M. S. (1937b). Properties of sufficiency and statistical tests. Proc. Roy. Statist. Soc. Ser. A, 160, 268-282.

Bartlett's Test. (2010, June 18). Retrieved June 22, 2010, from http://en.wikipedia.org/wiki/Bartlett%27s_test

Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika, 40, 318-335.

Brown-Forsythe Test. (2010). Retrieved June 22, 2010, from http://en.wikipedia.org/wiki/Brown%E2%80%93Forsythe_test

Data Transformation. (2009, August 11). Retrieved June 21, 2010, from http://en.wikipedia.org/wiki/Data_transformation_(statistics)

David, H. A. (1952). Upper 5 and 1% points of maximum F-ratio. Biometrika, 39, 422-424.

Garson, G. D. (2006). Multivariate GLM, MANOVA and MANCOVA. In Statnotes: Topics in multivariate analysis. Retrieved June 14, 2010, from http://faculty.chass.ncsu.edu/garson/pa765/statnote.htm

Huberty, C. J., & Morris, J. D. (1989). Multivariate analysis versus multiple univariate analyses. Psychological Bulletin, 105(2), 302-308.


Introduction to Regression. (2007). Retrieved June 18, 2010, from http://dss.princeton.edu/online_help/analysis/regression_intro.htm

Linear statistical models: Regression. (1998, January 26). Retrieved June 14, 2010, from http://www.gseis.ucla.edu/courses/ed230bc1/cnotes1/check.html

Martin, C. G., & Games, P. A. (1977). ANOVA tests for homogeneity of variances: Nonnormality and unequal samples. Journal of Educational Statistics, 2, 187-206.

O'Brien, R. G. (1981). A simple test for variance effects in experimental designs. Psychological Bulletin, 89, 570-574.

Pearson, E. S., & Hartley, H. O. (1970). Biometrika tables for statisticians (Vol. 1). Cambridge University Press.

Schwab. (2005). Assumptions of homoscedasticity. Retrieved June 8, 2010, from utexas.edu/courses/schwab/sw388r7/.../Assumptions_Summer2003.ppt

StatSoft, Inc. (2010). Electronic statistics textbook. Tulsa, OK: StatSoft. Retrieved from http://www.statsoft.com/textbook/

Stevens, J. (2009). Applied multivariate statistics for the social sciences (5th ed.). New York: Taylor & Francis.

Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: HarperCollins.

