
Autocorrelation, Multicollinearity & Heteroscedasticity

Autocorrelation
Autocorrelation is the cross-correlation of a signal with itself. Informally, it is the similarity between observations as a function of the time separation between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal which has been buried under noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

Definitions
Different fields of study define autocorrelation differently, and not all of these definitions are equivalent. In some fields, the term is used interchangeably with autocovariance.

Statistics
In statistics, the autocorrelation of a random process describes the correlation between values of the process at different points in time, as a function of the two times or of the time difference. Let X be some repeatable process, and i be some point in time after the start of that process (i may be an integer for a discrete-time process or a real number for a continuous-time process). Then X_i is the value (or realization) produced by a given run of the process at time i. Suppose that the process is further known to have defined values for mean μ_i and variance σ_i² for all times i. Then the definition of the autocorrelation between times s and t is

R(s, t) = E[(X_t − μ_t)(X_s − μ_s)] / (σ_t σ_s)

where "E" is the expected value operator. Note that this expression is not well-defined for all time series or processes, because the variance may be zero (for a constant process) or infinite. If the function R is well-defined, its value must lie in the range [−1, 1], with 1 indicating perfect correlation and −1 indicating perfect anti-correlation. If X_t is a second-order stationary process then the mean μ and the variance σ² are time-independent, and further the autocorrelation depends only on the difference between t and s: the correlation depends only on the time-distance between the pair of values but not on their position in time. This further implies that the autocorrelation can be expressed as a function of the time-lag, and that this would be an even function of the lag τ = s − t. This gives the more familiar form

R(τ) = E[(X_t − μ)(X_{t+τ} − μ)] / σ²

and the fact that this is an even function can be stated as

R(τ) = R(−τ)
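As an illustration of this statistical definition, the sample autocorrelation of a series can be estimated directly. The following sketch uses Python with numpy and statsmodels; the AR(1) series it generates, and the coefficient 0.7, are made-up examples rather than anything from the text.

import numpy as np
from statsmodels.tsa.stattools import acf

# Simulate an AR(1) process x_t = 0.7 * x_{t-1} + e_t as example data
rng = np.random.default_rng(0)
n = 500
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + e[t]

# Sample autocorrelation R(tau) for lags 0..10, normalized by the variance
r = acf(x, nlags=10, fft=True)
print(np.round(r, 3))   # r[0] is 1 by construction; r[1] should be near 0.7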

It is common practice in some disciplines, other than statistics and time series analysis, to drop the normalization by σ² and use the term "autocorrelation" interchangeably with "autocovariance". However, the normalization is important both because the interpretation of the autocorrelation as a correlation provides a scale-free measure of the strength of statistical dependence, and because the normalization has an effect on the statistical properties of the estimated autocorrelations.

Signal processing
In signal processing, the above definition is often used without the normalization, that is, without subtracting the mean and dividing by the variance. When the autocorrelation function is normalized by mean and variance, it is sometimes referred to as the autocorrelation coefficient.[1] Given a signal f(t), the continuous autocorrelation R_ff(τ) is most often defined as the continuous cross-correlation integral of f(t) with itself, at lag τ.

R_ff(τ) = f(τ) ∗ \overline{f}(−τ) = ∫ f(t + τ) \overline{f}(t) dt

where \overline{f} represents the complex conjugate and ∗ represents convolution. For a real function, \overline{f} = f.

The discrete autocorrelation R at lag j for a discrete signal x_n is

R_xx(j) = Σ_n x_n \overline{x}_{n−j}

The above definitions work for signals that are square integrable, or square summable, that is, of finite energy. Signals that "last forever" are treated instead as random processes, in which case different definitions are needed, based on expected values. For wide-sense-stationary random processes, the autocorrelations are defined as

R_ff(τ) = E[ f(t) \overline{f}(t − τ) ]
R_xx(j) = E[ x_n \overline{x}_{n−j} ]

For processes that are not stationary, these will also be functions of t, or n. For processes that are also ergodic, the expectation can be replaced by the limit of a time average. The autocorrelation of an ergodic process is sometimes defined as or equated to[1]

R_ff(τ) = lim_{T→∞} (1/T) ∫_0^T f(t + τ) \overline{f}(t) dt
R_xx(j) = lim_{N→∞} (1/N) Σ_{n=0}^{N−1} x_n \overline{x}_{n−j}

These definitions have the advantage that they give sensible well-defined single-parameter results for periodic functions, even when those functions are not the output of stationary ergodic processes. Alternatively, signals that last forever can be treated by a short-time autocorrelation function analysis, using finite time integrals. (See short-time Fourier transform for a related process.) Multi-dimensional autocorrelation is defined similarly. For example, in three dimensions the autocorrelation of a square-summable discrete signal would be

R(j, k, ℓ) = Σ_{n,q,r} x_{n,q,r} \overline{x}_{n−j, q−k, r−ℓ}
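In practice, the finite-energy (deterministic) definition above can be computed directly with a discrete correlation routine. The Python sketch below is a minimal illustration using numpy; the signal is an arbitrary example.

import numpy as np

# Example discrete signal (an arbitrary real-valued sequence)
x = np.array([1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0])
n = len(x)

# R_xx(j) = sum_n x_n * conj(x_{n-j}); np.correlate conjugates its second
# argument internally, so passing x twice matches that definition.
r_full = np.correlate(x, x, mode="full")

# r_full covers lags -(n-1) .. (n-1); keep only the non-negative lags
r = r_full[n - 1:]
for j, value in enumerate(r):
    print(f"lag {j}: {value:.1f}")   # lag 0 equals the signal energy sum(x**2)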

When mean values are subtracted from signals before computing an autocorrelation function, the resulting function is usually called an auto-covariance function.

Durbin-Watson test
A test that the residuals from a linear regression or multiple regression are independent. Method: Because most regression problems involving time series data exhibit positive autocorrelation, the hypotheses usually considered in the Durbin-Watson test are

H0: ρ = 0 (no first-order autocorrelation)
H1: ρ > 0

The test statistic is

d = Σ_{i=2}^{n} (e_i − e_{i−1})² / Σ_{i=1}^{n} e_i²

where e_i = y_i − ŷ_i, and y_i and ŷ_i are, respectively, the observed and predicted values of the response variable for individual i. d becomes smaller as the serial correlations increase. Upper and lower critical values, d_U and d_L, have been tabulated for different values of k (the number of explanatory variables) and n.
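As a quick illustration, the statistic d can be computed directly from regression residuals. The sketch below uses Python with statsmodels; the simulated data and the AR(1) coefficient 0.6 are only an example.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Simulate a regression with AR(1) errors so the DW statistic is informative
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# d = sum((e_i - e_{i-1})^2) / sum(e_i^2); values well below 2 suggest
# positive first-order autocorrelation
d = durbin_watson(res.resid)
print(f"Durbin-Watson d = {d:.3f}")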

Another test for autocorrelation: the Breusch-Godfrey test
Recall that DW is a test only of whether consecutive errors are related to one another. So, not only can the DW test not be applied if a certain set of circumstances are not fulfilled, there will also be many forms of residual autocorrelation that DW cannot detect. For example, if corr(u_t, u_{t−1}) = 0 but corr(u_t, u_{t−2}) ≠ 0, DW as defined above will not find any autocorrelation. One possible solution would be to replace u_{t−1} in (4.10) with u_{t−2}. However, pairwise examination of the correlations (u_t, u_{t−1}), (u_t, u_{t−2}), (u_t, u_{t−3}), . . . will be tedious in practice and is not coded in econometrics software packages, which have been programmed to construct DW using only a one-period lag. In addition, the approximation in (4.11) will deteriorate as the difference between the two time indices increases. Consequently, the critical values should also be modified somewhat in these cases. Therefore, it is desirable to examine a joint test for autocorrelation that will allow examination of the relationship between u_t and several of its lagged values at the same time. The Breusch-Godfrey test is a more general test for autocorrelation up to the rth order. The model for the errors under this test is

u_t = ρ_1 u_{t−1} + ρ_2 u_{t−2} + · · · + ρ_r u_{t−r} + v_t,   v_t ~ N(0, σ_v²)

The null and alternative hypotheses are:

H0: ρ_1 = 0 and ρ_2 = 0 and . . . and ρ_r = 0
H1: ρ_1 ≠ 0 or ρ_2 ≠ 0 or . . . or ρ_r ≠ 0

So, under the null hypothesis, the current error is not related to any of its r previous values. The test is carried out as in box 4.4. Note that (T − r) pre-multiplies R² in the test for autocorrelation rather than T (as was the case for the heteroscedasticity test). This arises because the first r observations will effectively have been lost from the sample in order to obtain the r lags used in the test regression, leaving (T − r) observations from which to estimate the auxiliary regression. If the test statistic exceeds the critical value from the Chi-squared statistical tables, reject the null hypothesis of no autocorrelation.

As with any joint test, only one part of the null hypothesis has to be rejected to lead to rejection of the hypothesis as a whole. So the error at time t has to be significantly related to only one of its previous r values in the sample for the null of no autocorrelation to be rejected. The test is more general than the DW test, and can be applied in a wider variety of circumstances since it does not impose the DW restrictions on the format of the first-stage regression.

One potential difficulty with Breusch-Godfrey, however, is in determining an appropriate value of r, the number of lags of the residuals, to use in computing the test. There is no obvious answer to this, so it is typical to experiment with a range of values, and also to use the frequency of the data to decide. So, for example, if the data is monthly or quarterly, set r equal to 12 or 4, respectively. The argument would then be that errors at any given time would be expected to be related only to those errors in the previous year. Obviously, if the model is statistically adequate, no evidence of autocorrelation should be found in the residuals whatever value of r is chosen.
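A minimal sketch of how this joint test is typically run in practice is shown below, using Python's statsmodels; the simulated AR(2) errors and the choice r = 4 are purely illustrative.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Simulate a regression whose errors follow an AR(2) process
rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(2, n):
    u[t] = 0.4 * u[t - 1] + 0.3 * u[t - 2] + rng.normal()
y = 0.5 + 1.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()

# The LM statistic is based on the R^2 of an auxiliary regression of the
# residuals on the regressors and r lagged residuals; compare with chi2(r)
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=4)
print(f"LM statistic = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")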

Remedies for Autocorrelation

If the form of the autocorrelation is known, we could use a GLS procedure, i.e. an approach that allows for autocorrelated residuals, e.g. Cochrane-Orcutt. But such procedures that correct for autocorrelation require assumptions about the form of the autocorrelation.

If these assumptions are invalid, the cure would be more dangerous than the disease! (see Hendry and Mizon, 1978). However, it is unlikely to be the case that the form of the autocorrelation is known, and a more modern view is that residual autocorrelation presents an opportunity to modify the regression.
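If one does choose the feasible-GLS route, an iterative Cochrane-Orcutt-style estimation can be sketched as follows in Python; statsmodels' GLSAR class performs an iterative AR(p) error correction of this kind. The AR(1) assumption and the simulated data below are illustrative only.

import numpy as np
import statsmodels.api as sm

# Simulated regression with AR(1) errors (the assumed form of autocorrelation)
rng = np.random.default_rng(3)
n = 250
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 2.0 + 0.8 * x + u

X = sm.add_constant(x)

# GLSAR alternates between estimating beta and the AR(1) coefficient rho,
# much like Cochrane-Orcutt; rho=1 here means "one autoregressive lag"
model = sm.GLSAR(y, X, rho=1)
results = model.iterative_fit(maxiter=10)
print("estimated rho:", np.round(model.rho, 3))
print(results.params)    # GLS estimates of the intercept and slope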

Multicollinearity
Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data themselves; it only affects calculations regarding individual predictors. That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others. A high degree of multicollinearity can also cause computer software packages to be unable to perform the matrix inversion that is required for computing the regression coefficients, or it may make the results of that inversion inaccurate. Note that in statements of the assumptions underlying regression analyses such as ordinary least squares, the phrase "no multicollinearity" is sometimes used to mean the absence of perfect multicollinearity, which is an exact (non-stochastic) linear relation among the regressors.

Definition
Collinearity is a linear relationship between two explanatory variables. Two variables are perfectly collinear if there is an exact linear relationship between the two. For example, X_1 and X_2 are perfectly collinear if there exist parameters λ_0 and λ_1 such that, for all observations i, we have X_{2i} = λ_0 + λ_1 X_{1i}. Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related. We have perfect multicollinearity if, for example as in the equation above, the correlation between two independent variables is equal to 1 or −1. In practice, we rarely face perfect multicollinearity in a data set. More commonly, the issue of multicollinearity arises when there is a strong linear relationship among two or more independent variables. Mathematically, a set of variables is perfectly multicollinear if there exist one or more exact linear relationships among some of the variables. For example, we may have

λ_0 + λ_1 X_{1i} + λ_2 X_{2i} + · · · + λ_k X_{ki} = 0

holding for all observations i, where the λ_j are constants and X_{ji} is the ith observation on the jth explanatory variable. We can explore one issue caused by multicollinearity by examining the process of attempting to obtain estimates for the parameters of the multiple regression equation

Y_i = β_0 + β_1 X_{1i} + · · · + β_k X_{ki} + ε_i

The ordinary least squares estimates involve inverting the matrix X^T X, where

X = [ 1  X_{11}  X_{21}  · · ·  X_{k1} ]
    [ 1  X_{12}  X_{22}  · · ·  X_{k2} ]
    [ ⋮     ⋮       ⋮              ⋮   ]
    [ 1  X_{1N}  X_{2N}  · · ·  X_{kN} ]

is the N × (k + 1) design matrix, with N the number of observations.

If there is an exact linear relationship (perfect multicollinearity) among the independent variables, the rank of X (and therefore of X^T X) is less than k + 1, and the matrix X^T X will not be invertible. In most applications, perfect multicollinearity is unlikely. An analyst is more likely to face a high degree of multicollinearity. For example, suppose that instead of the above equation holding, we have that equation in modified form with an error term v_i:

λ_0 + λ_1 X_{1i} + λ_2 X_{2i} + · · · + λ_k X_{ki} + v_i = 0

In this case, there is no exact linear relationship among the variables, but the X_j variables are nearly perfectly multicollinear if the variance of v_i is small for some set of values for the λ's. In this case, the matrix X^T X has an inverse, but it is ill-conditioned, so that a given computer algorithm may or may not be able to compute an approximate inverse; if it does so, the resulting computed inverse may be highly sensitive to slight variations in the data (due to magnified effects of rounding error) and so may be very inaccurate.
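The numerical side of this can be seen directly by checking the condition number of X^T X when one column is nearly a linear function of another. The short Python sketch below is illustrative only; the data are made up.

import numpy as np

rng = np.random.default_rng(4)
n = 100

# X2 is almost an exact linear function of X1 (small-variance noise v)
x1 = rng.normal(size=n)
x2 = 2.0 + 3.0 * x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([np.ones(n), x1, x2])

xtx = X.T @ X
# A huge condition number signals that inverting X^T X is numerically unstable
print(f"condition number of X'X: {np.linalg.cond(xtx):.2e}")

# For comparison, an unrelated second regressor gives a modest value
x2_indep = rng.normal(size=n)
X_indep = np.column_stack([np.ones(n), x1, x2_indep])
print(f"with independent regressors: {np.linalg.cond(X_indep.T @ X_indep):.2e}")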

Detection of multicollinearity
Indicators that multicollinearity may be present in a model:

- Large changes in the estimated regression coefficients when a predictor variable is added or deleted.
- Insignificant regression coefficients for the affected variables in the multiple regression, but a rejection of the joint hypothesis that those coefficients are all zero (using an F-test).
- Tolerance and variance inflation factor (VIF): some authors have suggested a formal detection-tolerance or the VIF for multicollinearity,

  tolerance_j = 1 − R_j²,   VIF_j = 1 / tolerance_j

  where R_j² is the coefficient of determination of a regression of explanator j on all the other explanators. A tolerance of less than 0.20 or 0.10 and/or a VIF of 5 or 10 and above indicates a multicollinearity problem (but see O'Brien 2007).[1] (A computational sketch appears after this list.)
- Condition number test: the standard measure of ill-conditioning in a matrix is the condition index. It will indicate that the inversion of the matrix is numerically unstable with finite-precision numbers (standard computer floats and doubles). This indicates the potential sensitivity of the computed inverse to small changes in the original matrix. The condition number is computed by finding the square root of the maximum eigenvalue divided by the minimum eigenvalue. If the condition number is above 30, the regression is said to have significant multicollinearity.
- Farrar-Glauber test:[2] if the variables are found to be orthogonal, there is no multicollinearity; if the variables are not orthogonal, then multicollinearity is present.
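A minimal sketch of the tolerance/VIF calculation in Python, using statsmodels' variance_inflation_factor on made-up data (the regressor names and coefficients are arbitrary):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.2, size=n)   # strongly related to x1
x3 = rng.normal(size=n)                          # unrelated regressor

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the rest
for j, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, j)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")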

Consequences of multicollinearity
As mentioned above, one consequence of a high degree of multicollinearity is that, even if the matrix X^T X is invertible, a computer algorithm may be unsuccessful in obtaining an approximate inverse, and if it does obtain one it may be numerically inaccurate. But even in the presence of an accurate X^T X matrix, the following consequences arise:

- In the presence of multicollinearity, the estimate of one variable's impact on y while controlling for the others tends to be less precise than if predictors were uncorrelated with one another. The usual interpretation of a regression coefficient is that it provides an estimate of the effect of a one-unit change in an independent variable, X1, holding the other variables constant. If X1 is highly correlated with another independent variable, X2, in the given data set, then we only have observations for which X1 and X2 have a particular relationship (either positive or negative). We don't have observations for which X1 changes independently of X2, so we have an imprecise estimate of the effect of independent changes in X1.
- In some sense, the collinear variables contain the same information about the dependent variable. If nominally "different" measures actually quantify the same phenomenon then they are redundant. Alternatively, if the variables are accorded different names and perhaps employ different numeric measurement scales but are highly correlated with each other, then they suffer from redundancy.
- One of the features of multicollinearity is that the standard errors of the affected coefficients tend to be large. In that case, the test of the hypothesis that the coefficient is equal to zero leads to a failure to reject the null hypothesis. However, if a simple linear regression of the dependent variable on this explanatory variable is estimated, the coefficient will be found to be significant; specifically, the analyst will reject the hypothesis that the coefficient is zero. In the presence of multicollinearity, an analyst might falsely conclude that there is no linear relationship between an independent and a dependent variable.

A principal danger of such data redundancy is that of overfitting in regression analysis models. The best regression models are those in which the predictor variables each correlate highly with the dependent (outcome) variable but correlate at most only minimally with each other. Such a model is often called "low noise" and will be statistically robust (that is, it will predict reliably across numerous samples of variable sets drawn from the same statistical population). Multicollinearity does not actually bias results; it just produces large standard errors in the related independent variables. In a pure statistical sense multicollinearity does not bias the results, but if there are any other problems which could introduce bias, multicollinearity can multiply (by orders of magnitude) the effects of that bias. More importantly, the usual use of regression is to take coefficients from the model and then apply them to other data. If the new data differ in any way from the data on which the model was fitted, large prediction errors may result, because the pattern of multicollinearity between the independent variables may be different in the new data from the data used for estimation.

Remedies for multicollinearity


- Make sure you have not fallen into the dummy variable trap; including a dummy variable for every category (e.g., summer, autumn, winter, and spring) and including a constant term in the regression together guarantee perfect multicollinearity.
- Try seeing what happens if you use independent subsets of your data for estimation and apply those estimates to the whole data set. Theoretically you should obtain somewhat higher variance from the smaller datasets used for estimation, but the expectation of the coefficient values should be the same. Naturally, the observed coefficient values will vary, but look at how much they vary.
- Leave the model as is, despite multicollinearity. The presence of multicollinearity doesn't affect the fitted model provided that the predictor variables follow the same pattern of multicollinearity as the data on which the regression model is based.
- Drop one of the variables. An explanatory variable may be dropped to produce a model with significant coefficients. However, you lose information (because you've dropped a variable). Omission of a relevant variable results in biased coefficient estimates for the remaining explanatory variables.
- Obtain more data, if possible. This is the preferred solution. More data can produce more precise parameter estimates (with lower standard errors), as seen from the formula in variance inflation factor for the variance of the estimate of a regression coefficient in terms of the sample size and the degree of multicollinearity.
- Mean-center the predictor variables. Mathematically this has no effect on the results from a regression. However, it can be useful in overcoming problems arising from rounding and other computational steps if a carefully designed computer program is not used.
- Standardize your independent variables. This may help reduce a false flagging of a condition index above 30.
- It has also been suggested that, using the Shapley value, a game theory tool, the model could account for the effects of multicollinearity. The Shapley value assigns a value for each predictor and assesses all possible combinations of importance.[3]
- Ridge regression or principal component regression can be used (a ridge sketch appears after this list).
- If the correlated explanators are different lagged values of the same underlying explanator, then a distributed lag technique can be used, imposing a general structure on the relative values of the coefficients to be estimated.
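As a sketch of the ridge-regression remedy mentioned above, the closed-form estimator β̂ = (X^T X + αI)^{-1} X^T y shrinks the coefficients and stabilises the inversion even when X^T X is nearly singular. The Python example below is illustrative only; the data, the true coefficients, and the penalty α are made up.

import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])

def ridge(X, y, alpha):
    # beta = (X'X + alpha * I)^{-1} X'y; the identity entry for the intercept
    # is zeroed so the constant term is not penalised
    p = X.shape[1]
    penalty = alpha * np.eye(p)
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

print("OLS:  ", np.round(ridge(X, y, alpha=0.0), 3))
print("ridge:", np.round(ridge(X, y, alpha=10.0), 3))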

Examples of contexts in which multicollinearity arises

Survival analysis


Multicollinearity may also represent a serious issue in survival analysis. The problem is that time-varying covariates may change their value over the time line of the study. A special procedure is recommended to assess the impact of multicollinearity on the results. See Van den Poel & Larivière (2004)[4] for a detailed discussion.

Interest rates for different terms to maturity


In various situations it might be hypothesized that multiple interest rates of various terms to maturity all influence some economic decision, such as the amount of money or some other financial asset to hold, or the amount of fixed investment spending to engage in. In this case, including these various interest rates will in general create a substantial multicollinearity problem because interest rates tend to move together. If in fact each of the interest rates has its own separate effect on the dependent variable, it can be extremely difficult to separate out their effects.

Heteroscedasticity
In statistics, a collection of random variables is heteroscedastic, or heteroskedastic, if there are sub-populations that have different variabilities than others. Here "variability" could be quantified by the variance or any other measure of statistical dispersion. Thus heteroscedasticity is the absence of homoscedasticity. The possible existence of heteroscedasticity is a major concern in the application of regression analysis, including the analysis of variance, because the presence of heteroscedasticity can invalidate statistical tests of significance that assume that the modelling errors are uncorrelated and normally distributed and that their variances do not vary with the effects being modelled.

Definition
Suppose there is a sequence of random variables {Y_t}, t = 1, …, n, and a sequence of vectors of random variables, {X_t}, t = 1, …, n. In dealing with conditional expectations of Y_t given X_t, the sequence {Y_t} is said to be heteroskedastic if the conditional variance of Y_t given X_t changes with t. Some authors refer to this as conditional heteroscedasticity to emphasize the fact that it is the sequence of conditional variances that changes and not the unconditional variance. In fact, it is possible to observe conditional heteroscedasticity even when dealing with a sequence of unconditionally homoscedastic random variables; however, the opposite does not hold. When using some statistical techniques, such as ordinary least squares (OLS), a number of assumptions are typically made. One of these is that the error term has a constant variance. This might not be true even if the error term is assumed to be drawn from identical distributions. For example, the error term could vary or increase with each observation, something that is often the case with cross-sectional or time series measurements. Heteroscedasticity is often studied as part of econometrics, which frequently deals with data exhibiting it. White's influential paper[1] used "heteroskedasticity" instead of "heteroscedasticity", whereas the latter has been used in later works.[2]

Consequences
Heteroscedasticity does not cause ordinary least squares coefficient estimates to be biased, although it can cause ordinary least squares estimates of the variance (and, thus, standard errors) of the coefficients to be biased, possibly above or below the true or population variance. Thus, regression analysis using heteroscedastic data will still provide an unbiased estimate for the relationship between the predictor variable and the outcome, but standard errors and therefore inferences obtained from data analysis are suspect. Biased standard errors lead to biased inference, so results of hypothesis tests are possibly wrong. An example of the consequence of the biased standard error estimates that OLS produces when heteroskedasticity is present is that a researcher may, at a selected significance level, fail to reject a null hypothesis that is in fact uncharacteristic of the actual population (i.e., make a type II error).

It is widely known that, under certain assumptions, the OLS estimator has a normal asymptotic distribution when properly normalized and centered (even when the data does not come from a normal distribution). This result is used to justify using a normal distribution, or a chi-squared distribution (depending on how the test statistic is calculated), when conducting a hypothesis test. This holds even under heteroscedasticity. More precisely, the OLS estimator in the presence of heteroscedasticity is asymptotically normal, when properly normalized and centered, with a variance-covariance matrix that differs from the case of homoscedasticity. In 1980, White[1] proposed a consistent estimator for the variance-covariance matrix of the asymptotic distribution of the OLS estimator. This validates the use of hypothesis testing using OLS estimators and White's variance-covariance estimator under heteroscedasticity.

Heteroscedasticity is also a major practical issue encountered in ANOVA problems.[3] The F test can still be used in some circumstances.[4] However, it has been said that students in econometrics should not overreact to heteroskedasticity.[2] One author wrote, "unequal error variance is worth correcting only when the problem is severe."[5] And another word of caution was in the form, "heteroscedasticity has never been a reason to throw out an otherwise good model."[6][2] With the advent of robust standard errors, which allow for inference without specifying the conditional second moment of the error term, testing conditional homoscedasticity is not as important as in the past. The econometrician Robert Engle won the 2003 Nobel Memorial Prize in Economic Sciences for his studies on regression analysis in the presence of heteroscedasticity, which led to his formulation of the autoregressive conditional heteroscedasticity (ARCH) modeling techniques.
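In practice, the robust (White-type) variance-covariance estimator mentioned above is available in standard software. A minimal Python sketch, using statsmodels and simulated heteroscedastic data, is shown below; the data-generating process is just an example.

import numpy as np
import statsmodels.api as sm

# Simulate data whose error standard deviation grows with x (heteroscedasticity)
rng = np.random.default_rng(7)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x)

X = sm.add_constant(x)

# Conventional OLS standard errors assume constant error variance
ols = sm.OLS(y, X).fit()
# Heteroscedasticity-consistent (White-type) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC3")

print("conventional SEs:", np.round(ols.bse, 4))
print("robust (HC3) SEs:", np.round(robust.bse, 4))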

Examples
Heteroscedasticity often occurs when there is a large difference among the sizes of the observations. A classic example of heteroscedasticity is that of income versus expenditure on meals. As one's income increases, the variability of food consumption will increase. A poorer person will spend a rather constant amount by always eating less expensive food; a wealthier person may occasionally buy inexpensive food and at other times eat expensive meals. Those with higher incomes display a greater variability of food consumption. One way to address this problem is weighted least squares, which gives less weight to the more variable observations so that the effective error variance is held constant across income levels.

Imagine you are watching a rocket take off nearby and measuring the distance it has traveled once each second. In the first couple of seconds your measurements may be accurate to the nearest centimeter, say. However, 5 minutes later as the rocket recedes into space, the accuracy of your measurements may only be good to 100 m, because of the increased distance, atmospheric distortion and a variety of other factors. The data you collect would exhibit heteroscedasticity.

Breusch-Pagan test

In statistics, the Breusch-Pagan test (named after Trevor Breusch and Adrian Pagan) is used to test for heteroscedasticity in a linear regression model. It tests whether the estimated variance of the residuals from a regression is dependent on the values of the independent variables. Suppose that we estimate the equation

y = β_0 + β_1 x + u

We can then estimate û, the residual. Ordinary least squares constrains these so that their mean is 0, so we can calculate the variance as the average squared value. Even simpler is to regress the squared residuals on the independent variables, which is the Breusch-Pagan test:

û² = γ_0 + γ_1 x + v

If an F-test confirms that the independent variables are jointly significant then we can reject the null hypothesis of homoscedasticity. The Breusch-Pagan test tests for conditional heteroscedasticity. It is a chi-squared test: the Lagrange multiplier statistic, n times the R² of the auxiliary regression, follows a chi-squared distribution with k degrees of freedom under the null. If the Breusch-Pagan test shows that there is conditional heteroscedasticity, it can be corrected by using the Hansen method, using robust standard errors, or re-thinking the regression equation.
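A minimal Python sketch of the test, using the het_breuschpagan routine from statsmodels on simulated data; the data-generating process is only an example.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulate a regression whose error variance increases with x
rng = np.random.default_rng(8)
n = 400
x = rng.uniform(1.0, 10.0, size=n)
y = 2.0 + 0.7 * x + rng.normal(scale=0.2 * x)

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# Regress the squared residuals on the regressors; the LM statistic is
# n * R^2 of that auxiliary regression, compared with chi2(k)
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(f"LM = {lm_stat:.2f} (p = {lm_pvalue:.4f}), F = {f_stat:.2f} (p = {f_pvalue:.4f})")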
