STATISTICAL ANALYSES OF MULTIVARIATE
TIME SERIES DATA

WITH APPLICATION TO COMPACTING
EFFECTS ON SOIL CHEMICAL AND
BIOLOGICAL PROPERTIES IN FORESTRY
VOLUME ONE
By Stuart Fenech BSc (AES)

Australian School of Environmental Studies
Faculty of Environmental Science
GRIFFITH UNIVERSITY
BRISBANE
This dissertation is submitted in partial fulfilment of the requirements of the degree of Bachelor of Science with Honours in Australian Environmental Studies.
October 2002
DECLARATION
I, Stuart Anthony Fenech, hereby declare that this work has not been submitted for a
degree or diploma in any university. To the best of my knowledge and belief, the
dissertation contains no material previously published or written by another person
except where due reference is made in the dissertation itself.
_____________________________________
Stuart Fenech
October 2002
ABSTRACT
When repeated measurements are recorded over time, the result is a time series. Because
the measurements are taken repeatedly over time, the resulting values are likely to be
correlated. Commonly, more than one (univariate) time series is recorded, resulting in a
multiple variable (multivariate) time series situation. Statistical analyses for univariate
and multivariate time series are the focus of this investigation.
A practical approach was adopted in presenting the available methods for dealing with
data correlated over time. Basic principles were presented before the full repeated
measures and time series techniques for statistical analysis. Repeated measures techniques
were found to be suited to shorter time series, while time series techniques were better
suited to longer time series (ie. a length of more than 25). Both areas of statistical
analysis can be applied to data correlated over time. Two main repeated measures
techniques, split plot designs and MANOVA, which are also used outside time series
analysis, were introduced and evaluated. Various traditional univariate time series models
were detailed, including autoregressive integrated moving average (ARIMA) models.
Multivariate time series models were then presented, including multiple independent
variable and vector based variants of ARIMA models. Practical examples from a rainfall
data set illustrated the well-developed and well-supported concepts described. Each
section built on those presented previously in a logical, orderly fashion.
A review of recent theoretical developments and practical applications in the area of
multiple time series was provided. A large variety of fields make use of multiple time
series, and the direction taken by the theoretical and practical literature varied largely
with the particular field. Recent developments in ARIMA modelling, genetic algorithms,
nonlinear methods and other areas were discussed. Pointers were given towards possible
future directions for the analysis of data correlated over time.
A detailed forestry application was presented, based on data from an experiment on the
effects of compaction and cultivation on soil chemical and biological properties over
time. Because the experimental time series were short (fewer than twenty time periods),
split plot and MANOVA designs were used for most of the analysis. Moving average
smoothers and cross correlation functions proved useful for exploring relationships
between treatments and variables. Interpreted together, the split plot and MANOVA
designs were found to be far more informative than either was in isolation. Many
significant relationships were identified in the original data set.
A number of statistical issues were found to be very important when analysing data
correlated over time. Large amounts of natural variation or error make it difficult to
establish significant relationships, so it is important to consider sources of variation
carefully when designing experiments. All of the analyses covered require certain
assumptions that need to be carefully monitored. For ARIMA time series analysis, the
assumption of stationarity in mean and variance is commonly violated. For split plot and
MANOVA designs, normality must be checked.
ACKNOWLEDGEMENTS
Firstly, a big thank you to my wife Leanne (Fenech) for her patience and occasional
bravery in having a go at understanding statistics. Thank you to Mum (Denise Fenech), Dad
(Louis Fenech) and Grandma (Hazel Corby), who have always been there and continue to
be. Cheers to my assorted uncles, aunts, cousins, rellies, in-laws, neighbours
and even brother Scott, who see Stuart first and mathematical nut second.
The old school friends have been supportive, fun, and kept my feet firmly on the
ground. Thank you particularly to Ben Marks, Walter Haas, Nicholas Page, Thomas
White, Simon Mahoney, Cara Barnes, Kym Higgins and Lisa Tarca.
Thank you to all at the Griffith University Cooperative Research Centre for Sustainable
Production Forestry (CRC) for your support. In particular, thank you to Tim Blumfield,
ZhiHong Xu and Chengrong Chen for your time.
A special thank you to my supervisor Janet Chaseling. It was not so long ago I was
seventeen and hiding in statistics lectures, happily unnoticed. Thank you for coaxing
me out of the shadows and into the light of statistics. Your time, dedication, experience,
general knowledge and well placed stirring are greatly appreciated.
Thank you to all the other people I have had the pleasure of dealing with at Griffith
University over the years. Of particular note are Andrew Rock, Rodney Topor, Carlo
Hamalainen, Alex Creagh, Rebecca O'Leary and Cameron Hurst.
Cheers!
Stuart Fenech October 2002

http://www.humanfrailty.com/
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
GLOSSARY
SYMBOLS
1 INTRODUCTION
2 TIME SERIES THEORY
  2.1 Fundamental Statistical Concepts
    2.1.1 Univariate Information
    2.1.2 Bivariate Information
    2.1.3 Dependence within Variables
    2.1.4 Statistical Measures and Terms
    2.1.5 Hypothesis Testing Overview
    2.1.6 Outliers
  2.2 Correlation Functions
    2.2.1 The Autocorrelation Function (ACF)
    2.2.2 The Partial Autocorrelation Function (PACF)
    2.2.3 The Inverse Autocorrelation Function (IACF)
    2.2.4 The Cross Correlation Function (CCF)
  2.3 Repeated Measures Models
    2.3.1 Background
    2.3.2 Split Plot Designs
    2.3.3 MANOVA
  2.4 Univariate Time Series Models
    2.4.1 Time Series Model Components
    2.4.2 General Time Series Models
    2.4.3 Moving Averages
    2.4.4 Simple Linear Regression
    2.4.5 Multiple Linear Regression
    2.4.6 Stationarity
    2.4.7 Backshift Notation
    2.4.8 AR (Autoregressive) Models
    2.4.9 MA (Moving Average) Models
    2.4.10 ARMA (Autoregressive Moving Average) Models
    2.4.11 ARIMA (Autoregressive Integrated Moving Average) Models
    2.4.12 Forecasting
  2.5 Multivariate Time Series Models
    2.5.1 Multivariate ARIMA Models
    2.5.2 Vector ARIMA Models
3 THEORY LITERATURE REVIEW
  3.1 AR/ARMA/ARIMA Developments
  3.2 ARIMA Alternative Developments
  3.3 Bayesian Developments
  3.4 Nonlinear Developments
  3.5 Miscellaneous Developments
4 APPLICATION LITERATURE REVIEW
  4.1 Medical Applications
  4.2 Economic Applications
  4.3 Sociology Applications
  4.4 Natural Phenomena Applications
5 FORESTRY CASE STUDY
  5.1 Background
  5.2 Previous Data Analysis
    5.2.1 Chemical Data
    5.2.2 Biological Data
  5.3 Limitations and Scope
  5.4 Data Analysis Techniques
    5.4.1 Analysis Direction
    5.4.2 Exploratory Data Analysis (EDA)
    5.4.3 Correlation Analysis
    5.4.4 Overall Split Plot Designs
    5.4.5 Overall MANOVA Designs
    5.4.6 Season Based Split Plot Designs
    5.4.7 Season Based MANOVA Designs
    5.4.8 Multiple Comparison Tests
  5.5 Data Analysis and Results
    5.5.1 Nitrate Levels
    5.5.2 Ammonium Levels
    5.5.3 Total Mineral Nitrogen Levels
    5.5.4 Nitrate Dynamics
    5.5.5 Ammonium Dynamics
    5.5.6 Total Mineral Nitrogen Dynamics
    5.5.7 Nitrate Leaching
    5.5.8 Ammonium Leaching
    5.5.9 Total Mineral Nitrogen Leaching
    5.5.10 Microbial Carbon Levels
    5.5.11 Microbial Nitrogen Levels
    5.5.12 Microbial Carbon to Nitrogen Ratio
  5.6 General Discussion
CONCLUSION
REFERENCES
APPENDIX A SAS EXAMPLES INPUT
APPENDIX B SAS EXAMPLES OUTPUT
APPENDIX C EXPERIMENT TIME PERIODS
APPENDIX D VARIABLE LIST
APPENDIX E NITRATE LEVELS
APPENDIX F AMMONIUM LEVELS
APPENDIX G TOTAL MINERAL NITROGEN LEVELS
APPENDIX H NITRATE DYNAMICS
APPENDIX I AMMONIUM DYNAMICS
APPENDIX J TOTAL MINERAL NITROGEN DYNAMICS
APPENDIX K NITRATE LEACHING
APPENDIX L AMMONIUM LEACHING
APPENDIX M TOTAL MINERAL NITROGEN LEACHING
APPENDIX N MICROBIAL CARBON LEVELS
APPENDIX O MICROBIAL NITROGEN LEVELS
APPENDIX P MICROBIAL C:N RATIO
LIST OF FIGURES
Figure 2.1: Skewness left, right and no skew.
Figure 2.2: May rainfall plotted against year.
Figure 2.3: Correlations of 1 (positive), -1 (negative) and 0.1 (weak positive).
Figure 2.4: Rainfall over time from 1985 to 2001 inclusive line graph.
Figure 2.5: Example autocorrelation function.
Figure 2.6: Monthly rainfall graphical autocorrelation function.
Figure 2.7: Monthly rainfall partial graphical autocorrelation function.
Figure 2.8: Monthly rainfall inverse graphical autocorrelation function.
Figure 2.9: Rainfall and Days of Rain over time from 1985 to 2001 inclusive.
Figure 2.10: Rainfall and Days of Rain cross correlation function.
Figure 2.11: Time plot of a pure trend component.
Figure 2.12: Time plot of a pure seasonal component.
Figure 2.13: Time plot of a pure random component.
Figure 2.14: Time plot of trend, season and random components (additive).
Figure 2.15: A time series before and after applying a moving average smoother.
Figure 2.16: Applying a 2×4 MA moving average to rainfall data (1994 to 2001).
Figure 2.17: Simple linear regression presented graphically.
Figure 2.18: Applying differencing to a created series with a clear trend.
Figure 2.19: Typical autocorrelation function for AR(1), positive φ₁.
Figure 2.20: Typical partial autocorrelation function for AR(1), positive φ₁.
Figure 2.21: Typical autocorrelation function for AR(p).
Figure 2.22: Typical partial autocorrelation function for AR(p).
Figure 2.23: Typical autocorrelation function for MA(1), positive θ₁.
Figure 2.24: Typical partial autocorrelation function for MA(1), positive θ₁.
Figure 2.25: Typical autocorrelation function for MA(q).
Figure 2.26: Typical partial autocorrelation function for MA(q).
Figure 2.27: Time plot of rainfall over time from 1985 to 2001 inclusive.
Figure 5.1: Picture of the forwarder used for compaction in experiments.
Figure 5.2: Three sampling cores in the ground at Yarraman. One is being removed.
Figure 5.3: Mean mineral nitrogen levels (kgN/ha) over the nineteen months.
Figure 5.4: Mean mineral nitrogen dynamics (kgN/ha) over the nineteen months.
Figure 5.5: Graphical notation for compaction and cultivation options.
Figure 5.6: Back transformed means (± S.E.) for compaction effects on mean nitrate levels in season one.
Figure 5.7: … nitrate levels in season two.
Figure 5.8: Back transformed means (± S.E.) for compaction and cultivation effects on mean ammonium levels in season one.
Figure 5.9: … ammonium levels in season two (each month separately).
Figure 5.10: Back transformed means (± S.E.) for cultivation effects on mean ammonium levels in season two (each month separately).
Figure 5.11: Back transformed means (± S.E.) for compaction and cultivation effects on mean ammonium levels in season three.
Figure 5.12: Back transformed means (± S.E.) for compaction effects on mean total mineral nitrogen levels in season one.
Figure 5.13: … effects on mean total mineral nitrogen levels in season three.
Figure 5.14: … nitrate dynamics in season one (each month separately).
Figure 5.15: … effects on mean ammonium dynamics.
Figure 5.16: … nitrate leaching.
Figure 5.17: … ammonium leaching.
Figure 5.18: Back transformed means (± S.E.) for cultivation effects on mean total mineral nitrogen leaching.
Figure 5.19: Microbial carbon levels by compaction and cultivation over time.
Figure 5.20: Graphical cross correlation function - microbial carbon and soil moisture.
Figure 5.21: … effects on mean microbial carbon levels in season two.
Figure 5.22: … effects on mean microbial carbon levels in season three.
Figure 5.23: Back transformed means (± S.E.) for block effects on mean microbial nitrogen levels.
Figure 5.24: … microbial nitrogen levels in season one (each month separately).
Figure 5.25: … microbial nitrogen levels in season three (each month separately).
Figure 5.26: Graphical cross correlation function - microbial carbon to nitrogen ratio and soil moisture.
Figure 5.27: … effects on the mean microbial carbon to nitrogen ratio in season one (each month separately).
Figure 5.28: … effects on the mean microbial carbon to nitrogen ratio in season two (each month separately).
Figure 5.29: … effects on the mean microbial carbon to nitrogen ratio in season three (month nine).
Figure 5.30: … effects on the mean microbial carbon to nitrogen ratio in season three (months ten and eleven).
Figure 5.31: … effects on the mean microbial carbon to nitrogen ratio in season four.
LIST OF TABLES
Table 2.1: Autocovariance and autocorrelation data from monthly rainfall.
Table 2.2: Split plot projected ANOVA for moisture repeated measures example.
Table 2.3: Rainfall data from January to December in 2001.
Table 2.4: First and second differencing applied to 2001 rainfall data.
Table 3.1: The fields of research involved in recent theoretical articles.
Table 3.2: Aspects looked at by researchers in recent theoretical articles.
Table 4.1: Field of application for recent literature articles.
Table 4.2: Techniques used in detail in recent literature articles.
Table 5.1: Summary of factors and variables provided for analysis in the case study.
Table 5.2: Standard summary notation for factor levels.
Table 5.3: Legend for symbols denoting significance.
Table 5.4: Structure and df in overall split plot ANOVA designs.
Table 5.5: Structure and df in seasonal split plot ANOVA designs.
GLOSSARY
ACF
Autocorrelation function.
AIC
Akaike's information criterion.
ANOVA
Analysis of variance.
AR
Autoregressive model.
ARIMA
Autoregressive integrated moving average model.
ARMA
Autoregressive moving average model.
ARMAX
Autoregressive moving average model with explanatory variables.
CCF
Cross correlation function.
CRD
Completely randomised design.
CV
Coefficient of variation.
DF
Degrees of freedom.
DPI
Department of Primary Industries.
EDA
Exploratory data analysis.
IACF
Inverse autocorrelation function.
MA
Moving average.
MANOVA
Multivariate analysis of variance.
MSE
Mean square error.
PACF
Partial autocorrelation function.
PCA
Principal components analysis.
RCB
Randomised complete block.
SE
Standard error.
VAR
Vector autoregressive model.
VARMA
Vector autoregressive moving average model.
VMA
Vector moving average model.
SYMBOLS
α
Significance level (usually 0.05).
β
Simple or multiple regression model parameters.
B
Backshift operator.
c
A constant (in models) and the sample covariance.
d
Order of first differencing used.
D
Order of seasonal first differencing used.
ε
Error or random variation.
μ
Population mean.
n
Sample size.
p
Order of autoregressive model components.
P
Order of seasonal autoregressive model components.
q
Order of moving average model components.
Q
Order of seasonal moving average model components.
r
Sample correlation coefficient.
r²
Coefficient of determination.
s
Sample standard deviation.
t
Time.
X̄
Sample mean (of a variable X).
φ
Autoregressive model parameter (and Dickey-Fuller test parameter).
Φ
Seasonal autoregressive model parameter.
θ
Moving average model parameter.
Θ
Seasonal moving average model parameter.
1 INTRODUCTION
When a random variable is measured at a number of different times, the result is a
univariate (single variable) time series. Values in a time series are usually correlated
because they are repeated recordings on the same entity. If many different variables are
recorded over time then the situation is a multivariate (multiple variable) time series. This
dissertation investigates the effective analysis of one or more variables where data are
correlated over time.
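To make "correlated over time" concrete, the sample autocorrelation r_k measures the correlation between a series and itself shifted k time periods. The following is a minimal Python sketch under stated assumptions, not the SAS procedures used in this dissertation: the function name sample_acf is hypothetical, the estimator assumed is the common one that divides the lag-k sum of cross products by the total sum of squares about the mean, and the two short series are made up purely for illustration.

```python
from typing import List, Sequence


def sample_acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Return the sample autocorrelations r_1, ..., r_max_lag of one series.

    Estimator assumed here:
        r_k = sum_{t=1}^{n-k} (x_t - mean)(x_{t+k} - mean) / sum_{t=1}^{n} (x_t - mean)^2
    """
    n = len(x)
    if n < 2 or not 1 <= max_lag < n:
        raise ValueError("need at least two values and 1 <= max_lag < len(x)")
    mean = sum(x) / n
    dev = [v - mean for v in x]      # deviations from the sample mean
    denom = sum(d * d for d in dev)  # total sum of squares
    if denom == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    # pair each deviation with the deviation k time periods later
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


# Made-up illustrative series (not the rainfall or Yarraman data)
print(sample_acf([1, 2, 3, 4, 5], 1))  # [0.4]
print(sample_acf([1, -1, 1, -1], 1))   # [-0.75]
```

A steadily rising series gives a positive lag-one autocorrelation, while a series that alternates up and down gives a negative one; values near zero would suggest little dependence between neighbouring observations.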
The motivation for this thesis comes from forestry experiments
conducted at Yarraman, Australia. The data used came from experiments on the effects of
compaction and cultivation on soil chemical and biological properties. Two data sets
were provided by the Griffith University Cooperative Research Centre for Sustainable
Production Forestry (CRC), one for chemical properties and one for biological properties.
The data sets contain many variables measured over time, and represent a multivariate
time series situation. Details of the experiment and of the initial data analysis are given in
Blumfield et al. (2002) for the chemical data set and Chen et al. (2002) for the biological
data set.
A plethora of techniques are available for the analysis of univariate and multivariate time
series situations. Time series occur in many fields, including finance, physics, computing,
medicine, ecology and forestry. The range of methods and techniques available is as
varied as the fields from which they come. This dissertation investigates modern time
series analysis techniques before providing a detailed forestry application involving
research into the effects of compaction and cultivation on soil chemical and biological
properties. The general aim is to inspire confidence and understanding in dealing
theoretically and practically with data correlated over time. To aid in achieving this,
complex time series concepts are presented by building from the basics.
Advances in computing mean that time series data can easily be analysed on an
average personal computer, given appropriate software. For the purposes of this
dissertation, the SAS statistical package (SAS Institute, 1999) is used for most
calculations. However, the best software in the world will not help without
an understanding of what the software is doing. Therefore software is regarded as a tool
for analysis and is referred to only after the theoretical aspects are covered.
Three main sections form this dissertation. The first reviews modern, accepted techniques
for time series analysis. The second critically reviews recent developments and
applications in the multivariate time series field. The final section presents a detailed
application of methods for dealing with data correlated over time. A more detailed
outline of the dissertation is provided below.
Chapter two presents analytical techniques commonly applied to data correlated over
time. Complicated techniques are addressed through a gradual approach that starts with
the basics. Basic statistical concepts that appear in the analysis of correlated data, such
as covariance, correlation, standard error, outliers and hypothesis testing, are covered
first. Four correlation functions commonly used as an exploratory precursor to time series
modelling are then presented.
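As an illustration of one of these correlation functions, the minimal Python sketch below estimates the sample cross correlation function between two equal-length series. The name sample_ccf is hypothetical, and the lag sign convention (lag k pairs x_t with y_{t+k}, so a peak at positive k suggests x leads y) is an assumption; texts and software packages differ on this convention, so it should be checked before comparing results against SAS output.

```python
import math
from typing import Dict, Sequence


def sample_ccf(x: Sequence[float], y: Sequence[float], max_lag: int) -> Dict[int, float]:
    """Return sample cross correlations r_xy(k) for k = -max_lag, ..., max_lag.

    Assumed convention: lag k pairs x_t with y_{t+k}, so a large value at
    positive k suggests that x leads y by k time periods.
    """
    n = len(x)
    if len(y) != n or n < 2 or not 0 <= max_lag < n:
        raise ValueError("need equal-length series (n >= 2) and 0 <= max_lag < n")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    # normalise by the square root of both sums of squares about the means
    scale = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if scale == 0:
        raise ValueError("a constant series has no defined cross correlation")
    ccf = {}
    for k in range(-max_lag, max_lag + 1):
        # only indices t where both x_t and y_{t+k} exist contribute
        overlap = range(max(0, -k), min(n, n - k))
        ccf[k] = sum(dx[t] * dy[t + k] for t in overlap) / scale
    return ccf


x = [0, 1, 0, 0]
y = [0, 0, 1, 0]  # y is x delayed by one time period
r = sample_ccf(x, y, 1)
print(max(r, key=r.get))  # 1
```

Because y is simply x delayed by one time period, the largest cross correlation appears at lag 1 under this convention.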
Two main areas of detailed analysis are covered in chapter two. The first is repeated
measures analysis, which can be applied to the time series case. Two particular methods
for repeated measures, namely split plot designs and MANOVA, are presented in detail.
These repeated measures techniques can be used to analyse any number of time series.
The second group of analytical techniques covered is known as time series techniques.
These techniques were developed specifically for the analysis of time series. Both
univariate and multivariate techniques are reviewed. Univariate techniques include AR
(autoregressive), MA (moving average), ARMA (autoregressive moving average) and
ARIMA (autoregressive integrated moving average) models. Multivariate techniques
covered include multivariate ARIMA and vector ARIMA. Forecasting is not investigated
as it is beyond the scope of this thesis.
The presentation of theory in chapter two is accompanied by a series of examples. The
practical examples are based on rainfall data collected by the author's father (Louis
Fenech) over seventeen years at Buccan, Queensland, Australia. These examples also
provide guidance on using the SAS statistical package (SAS Institute, 1999) for analysis.
Theoretical developments in multivariate time series analysis are covered in chapter
three. A rich and varied set of papers from the last eight years is reviewed for its
contribution to knowledge of multivariate time series. Many developments are based on
the ARIMA models from chapter two, while others involve ARIMA alternatives, Bayesian
statistics and nonlinear techniques, among others. This chapter gives an accessible
account of the many directions in which multivariate time series analysis has progressed.
Chapter four investigates practical applications of time series techniques from the last
eight years. The methods and techniques used are critically reviewed with a view towards
practical understanding. The recent articles reviewed involve medical, economic,
sociological or natural phenomena applications.
Chapter five presents a detailed case study analysing the Yarraman data introduced
earlier. Background information on the experiments is provided and the previous data
analysis is carefully reviewed. The purpose of the analysis is to investigate the effects
over time of soil cultivation, compaction and their possible interaction on soil biological
and chemical variables. Detailed analysis using advanced techniques is applied to twelve
separate variables, nine from the chemical data set and three from the biological data set.
Moving average smoothers, correlation, split plot designs, MANOVA and Bonferroni-modified
multiple comparison tests feature among the techniques applied in the data analysis.
The forestry case study concludes with a discussion of the statistical analysis
undertaken. The value of each method and technique used is reviewed, and possible
improvements to the analysis of this and similar future experiments are considered.
The dissertation is provided in two volumes. The first volume contains all of the
chapters discussed above, while the second volume contains the appendices. The majority
of the appendices consist of raw data analysis from the forestry application.
The specific aims of this dissertation are:
To provide a clear and precise introductory guide to techniques available for the
analysis of data correlated over time, including repeated measures, univariate time
series and multivariate time series techniques.
To investigate recent theoretical developments in modelling of multivariate time
series situations.
To investigate recent applications of multivariate time series techniques.
To apply techniques for dealing with data correlated over time to a data set on
compaction and cultivation effects on soil chemical and biological properties over
time.
While not every concept relating to data correlated over time is covered, every effort
has been made to ensure that a thorough cross section of current time series topics is
included. After reading this thesis, the reader should have a working knowledge of
dealing with data correlated over time in both the theoretical and the practical sense.
2 TIME SERIES THEORY
2.1 Fundamental Statistical Concepts
A random variable is a characteristic or attribute that assumes randomly different values
(Bluman, 2001). This definition allows many things (eg. temperature, monthly rainfall)
to be classified as random variables. For ease of notation this thesis denotes random
variables by upper case letters (eg. X) and particular values of a variable using a
subscript (eg. $X_1$).
In this section sample data are dealt with exclusively. Population data are occasionally
available for analysis, but as this is uncommon the focus here is on samples. Summary
values calculated from samples are referred to as statistics. A statistic is an estimator of
a population parameter, the true value that would be obtained if the entire population
were analysed.
2.1.1 Univariate Information
Initial investigation into a single random variable (univariate) is often referred to as
exploratory data analysis (EDA). A usual first stage in EDA is to plot a graph of the
variable being investigated. From here a number of summary measures can be
calculated to give more information than can usually be seen by eye.
The most common summary measures are measures of central tendency and measures of
dispersion. Measures of central tendency look at typical (expected) values and answer
questions such as "what is the average temperature?" Measures of dispersion look at how
spread out the data are and answer questions such as "how variable is the temperature?"
Two common measures of central tendency are the median and the mean (arithmetic
average). The median is the middle value of the variable when its values are ordered (the
average of the two middle values when the number of values is even). The mean, more
commonly seen in statistical analysis, is denoted by a bar over the variable name (eg.
$\bar{X}$ is the mean of variable X). The mean is calculated by adding all values of the
variable and dividing the total by the number of values. This is expressed in Equation 2.1,
where n is the number of values and $X_i$ represents the individual values. The mean is a
familiar and generally well understood summary statistic used throughout statistical
analysis in the time series area. Although mathematically precise, the mean is sensitive to
extreme values (outliers) and hence can sometimes be misleading.
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (2.1)$$
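The following Python sketch (an illustration only; the analyses in this thesis use SAS) computes Equation 2.1 and the median. The function names and the five sample values are hypothetical, chosen so that a single outlier pulls the mean well away from the median.

```python
def mean(values):
    """Arithmetic mean (Equation 2.1): the sum of the values divided by n."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data; the average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical values (not from the rainfall records); 100 acts as an outlier.
sample = [2, 3, 4, 5, 100]
print(mean(sample))    # 22.8
print(median(sample))  # 4
```

The outlier drags the mean far above every other value, while the median stays at the centre of the ordered data.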
The most common measures of dispersion are the range, variance and standard deviation.
The range is simply the smallest value of the variable subtracted from the largest value.
The variance $s^2$ is the sum of the squared differences between the data and the mean,
divided by $n-1$, in effect giving the amount of dispersion around the mean. The standard
deviation s (Equation 2.2) is the square root of the variance and is close in concept to the
average distance of the data from the mean. The units of the standard deviation are the
same as those of the original variable.
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2} \qquad (2.2)$$
Closely related to the standard deviation is the coefficient of variation (CV), which is in
effect a scaled standard deviation. The problem with the standard deviation and variance
is that as means get larger the standard deviation and variance are also likely to get larger,
making direct comparison of these measures difficult (Zar, 1999). By scaling by the mean,
as shown in Equation 2.3, the coefficient of variation gives a standardised, comparable
measure with no particular units.
$$CV = 100 \times \frac{s}{\bar{X}} \qquad (2.3)$$
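A minimal Python sketch of Equations 2.2 and 2.3, together with the range, is given below. The function names and the two hypothetical data sets are illustrative assumptions; the second set is the first multiplied by ten, so the standard deviation grows tenfold while the CV does not change.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean divided by n - 1 (the square of Equation 2.2)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation s (Equation 2.2), in the same units as the data."""
    return math.sqrt(sample_variance(values))


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def coefficient_of_variation(values):
    """CV = 100 * s / mean (Equation 2.3); unit-free."""
    xbar = sum(values) / len(values)
    return 100 * sample_sd(values) / xbar


# Hypothetical measurements on two scales: the second set is the first times ten.
small = [4.0, 5.0, 6.0]
large = [40.0, 50.0, 60.0]
print(sample_sd(small), sample_sd(large))                                # 1.0 10.0
print(coefficient_of_variation(small), coefficient_of_variation(large))  # 20.0 20.0
```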
Other measures of dispersion include the interquartile range, the mean of absolute
deviations (MAD) and the mean of squared deviations (MSD). The variance and standard
deviation are more commonly used and accepted because of their favourable statistical
properties (Makridakis et al., 1998).
There are many other summary measures available for random variables. One of these is
skewness, which describes the asymmetry of the distribution formed by a random
variable. The distribution of a variable is said to have a left skew, no skew or a right skew
depending on how its values are spread about the centre, as seen in Figure 2.1. The raw
formula for calculating skew involves cubed differences between the values and the mean
(whereas the standard deviation uses squared differences). Equation 2.4 shows one of
several forms used for the calculation of skew. A skew value of zero indicates no skew, a
value less than zero indicates a skew to the left and a value more than zero a skew to the
right.
Figure 2.1: Skewness left, right and no skew (panels: Left Skew, No Skew, Right Skew).

$$\text{skew} = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s}\right)^3 \qquad (2.4)$$
Kurtosis is a statistic that gives another indication of the shape of the distribution formed
by a random variable, in comparison with a normal distribution. The kurtosis formula
given in Equation 2.5 uses differences between the values and the mean raised to the
fourth power. A kurtosis value of zero indicates that the shape is mesokurtic, like the bell
shape of the normal distribution. When the kurtosis value is more than zero the
distribution is leptokurtic and tends to have more values further away from the mean,
causing a thinner appearance around the mean. Conversely, a kurtosis value less than zero
indicates a platykurtic distribution, where more values lie around the mean, causing a
wider appearance (Zar, 1999).
$$\text{kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} \qquad (2.5)$$
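Equations 2.4 and 2.5 translate directly into code. The Python sketch below is illustrative only (the function names and data are hypothetical); it uses the $n-1$ standard deviation of Equation 2.2, and other software may use slightly different skew and kurtosis formulae.

```python
import math


def _standardised(values):
    """Return (X_i - mean) / s for each value, with s the n - 1 standard deviation."""
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))
    return [(x - xbar) / s for x in values]


def skewness(values):
    """Sample skew as in Equation 2.4 (needs n >= 3)."""
    n = len(values)
    z = _standardised(values)
    return n / ((n - 1) * (n - 2)) * sum(zi ** 3 for zi in z)


def kurtosis(values):
    """Sample kurtosis as in Equation 2.5 (needs n >= 4); near 0 for normal-looking data."""
    n = len(values)
    z = _standardised(values)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * sum(zi ** 4 for zi in z) - correction


symmetric = [1, 2, 3, 4, 5]       # hypothetical, evenly spread
right_tail = [1, 1, 2, 2, 3, 10]  # hypothetical, one large value
print(abs(skewness(symmetric)) < 1e-12)  # True: no skew
print(skewness(right_tail) > 0)          # True: skew to the right
print(round(kurtosis(symmetric), 4))     # -1.2: flatter than normal (platykurtic)
```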
Example 1: Single Variable Investigation on May Rainfall

Being ever so slightly statistically obsessed, the Fenech family has been recording
rainfall since 1985 at their humble residence in Buccan, south of Brisbane, Queensland,
Australia. Every morning at around 6 am for the better part of seventeen years, Louis
Fenech, the author's father, has dutifully recorded the rainfall in millimetres.
In this example May rainfall is investigated. For the investigation it is assumed that the
May rainfall data available (1985 to 2002) is a random sample from a population of May
rainfall.
The Fenech household would like to know any interesting information, including the
median, mean, standard deviation and skewness. The total rainfall in millimetres (mm) for
each May from 1985 to 2002 was as follows: 81, 89, 124.6, 8, 162.6, 229.5, 68.1, 85.05,
51.5, 64.35, 41.4, 582.55, 142.35, 153.9, 94.95, 57, 40.75, 50. These data are graphed in
Figure 2.2.
Figure 2.2: May rainfall plotted against year (rainfall in mm, 1985 to 2002).

The median (middle value) is the middle of the ordered May rainfall measurements. It is a
good gauge of what to expect rainfall in May to be, as it is the point with as many
measurements above it as below it. With eighteen years of data, the median is the average
of the ninth and tenth ordered values (81 and 85.05), giving 83.025 mm.
The mean, another indication of expected rainfall, is calculated below. Notice that the
mean is dramatically higher than the median. The extreme rainfall of 583 mm in May
1996 contributes heavily to this difference.
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{81 + 89 + 124.6 + 8 + \cdots + 57 + 40.75 + 50}{18} = \frac{2126.6}{18} = 118.1444 \text{ mm}$$
The range gives an indication of the variation present in the data. Being simply the
highest value minus the lowest, it is easy to calculate. A massive range of 575 mm
(582.55 - 8 = 574.55 mm) has been present in May rainfall to date.
The more computationally intensive standard deviation and CV are calculated below.
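$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2} = \sqrt{\frac{(81 - 118.14)^2 + (89 - 118.14)^2 + \cdots + (50 - 118.14)^2}{18 - 1}} = 127.8903$$

$$CV = 100 \times \frac{s}{\bar{X}} = 100 \times \frac{127.8903}{118.1444} = 108.249$$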
This large standard deviation of 127.89 mm indicates that May rainfall typically lies a
long way (of the order of 128 mm) from the average rainfall. Such a large standard
deviation is perhaps unsurprising given the large variability in the original data. The CV,
often interpreted as a percentage since it compares the standard deviation with the mean,
is 108. A rule of thumb sometimes seen is that a CV over twenty indicates quite variable
data, which makes a CV of 108 extremely variable.
Given that there are a few extremely high values of May rainfall, a skew to the right
(towards higher rainfall) is anticipated. The result from the skew calculation is above
zero, which confirms a skew to the right.
$$\text{skew} = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s}\right)^3 = \frac{18}{(18-1)(18-2)} \times \frac{(81 - 118.14)^3 + (89 - 118.14)^3 + \cdots + (50 - 118.14)^3}{(127.89)^3} = 3.1046$$
Kurtosis is another measure of the shape of the distribution formed by a random variable.
Given the spread out nature of the May rainfall data, the result here is of little surprise.
The kurtosis value, well above zero, indicates that May rainfall is leptokurtic and tends to
have more values further away from the mean than a normal distribution would.
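$$\text{kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} = \frac{18 \times 19}{17 \times 16 \times 15} \times \frac{(81 - 118.14)^4 + \cdots + (50 - 118.14)^4}{(127.89)^4} - \frac{3 \times 17^2}{16 \times 15} = 11.1038$$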
Refer to Appendix A under this example for SAS code that calculates these statistics.
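For readers without SAS, the following Python sketch (an assumption, not the Appendix A code) recomputes the Example 1 summary statistics using the standard library `statistics` module together with the formulae of Equations 2.3 to 2.5.

```python
import statistics

# May rainfall totals (mm) for 1985 to 2002, as listed in Example 1.
may_rain = [81, 89, 124.6, 8, 162.6, 229.5, 68.1, 85.05, 51.5,
            64.35, 41.4, 582.55, 142.35, 153.9, 94.95, 57, 40.75, 50]

n = len(may_rain)
xbar = statistics.mean(may_rain)
s = statistics.stdev(may_rain)          # n - 1 divisor, as in Equation 2.2
z = [(x - xbar) / s for x in may_rain]  # standardised deviations

skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)             # Equation 2.4
kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))                    # Equation 2.5

summary = {
    "median": statistics.median(may_rain),
    "mean": xbar,
    "range": max(may_rain) - min(may_rain),
    "sd": s,
    "cv": 100 * s / xbar,                                            # Equation 2.3
    "skew": skew,
    "kurtosis": kurt,
}
for name, value in summary.items():
    print(f"{name:>8}: {value:.4f}")
```

Running it should give values matching those worked by hand above to about two decimal places.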
2.1.2 Bivariate Information
Although summary measures on single variables are common and important, often the
aim is to find relationships between different variables. A couple of commonly used
summary measures can quantify relationships between variables; this section
investigates covariance and correlation.
Covariance (c), a measure of how two variables X and Y vary together, is defined in
Equation 2.6. The mean of X is denoted by $\bar{X}$, the mean of Y by $\bar{Y}$ and the
number of pairs of X and Y values being compared by n.
$$c_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right) \qquad (2.6)$$
Closely related to covariance is the sample correlation coefficient, r, which is in effect a
scaled covariance whose values lie between -1 and 1. The correlation coefficient is
defined in Equation 2.7, where $s_X$ is the standard deviation of X and $s_Y$ is the
standard deviation of Y.
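$$r_{XY} = \frac{c_{XY}}{s_X s_Y} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}} \qquad (2.7)$$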
The correlation coefficient measures the level of linear correlation between two variables
X and Y. A linear relationship entails that a given change in X is accompanied by a
constant change in Y (and vice versa), no matter what the X value is. A value of 1
indicates a perfect positive linear relationship, -1 a perfect negative linear relationship,
and a value of 0 indicates no linear relationship at all. Figure 2.3 demonstrates these facts
graphically.
Figure 2.3: Correlations of 1 (positive), -1 (negative) and 0.1 (weak positive).
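Equations 2.6 and 2.7 can be sketched in Python as below. The function names and the short illustrative series are hypothetical; they are constructed to reproduce the perfect positive and perfect negative cases just described.

```python
import math


def covariance(x, y):
    """Sample covariance c_XY (Equation 2.6) with an n - 1 divisor."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Sample correlation r_XY (Equation 2.7): covariance scaled by s_X * s_Y."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


x = [1, 2, 3, 4, 5]
print(correlation(x, [2 * v + 1 for v in x]))   # 1.0  (perfect positive)
print(correlation(x, [10 - 3 * v for v in x]))  # -1.0 (perfect negative)
```

Note that `covariance(x, x)` is simply the variance of x, which is why the denominator reduces to $s_X s_Y$.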
2.1.3 Dependence within Variables
The correlation investigated previously (see section 2.1.2) was discussed from the
perspective of two different variables. However, a variable may also be correlated with
itself. This situation is common in time series, where values at one moment in time may
be correlated with values at previous (or other) moments in time. For example, there may
be correlation between temperatures or stock prices on successive days.
The terms autocovariance and autocorrelation are used to refer to covariance and
correlation within a variable. These measures are taken for particular lags or delays of
the given variable. For instance, if looking at daily temperature measures, a lag of one
would look at the measures one day apart, a lag of two would look at measures two days
apart and so on.
Given a lag k, Equation 2.8 calculates the sample autocovariance $c_k$ and Equation 2.9
the sample autocorrelation $r_k$. Notice that these formulae are similar to those given for
the two variable measures in section 2.1.2. In these formulae, $\bar{Y}$ is the mean of time
series variable Y, $Y_t$ represents the value of time series Y at time t, and $Y_{t-k}$ the value
of Y k time periods earlier (at lag k).
$$c_k = \frac{1}{n}\sum_{t=k+1}^{n}\left(Y_t - \bar{Y}\right)\left(Y_{t-k} - \bar{Y}\right) \qquad (2.8)$$
$$r_k = \frac{\sum_{t=k+1}^{n}\left(Y_t - \bar{Y}\right)\left(Y_{t-k} - \bar{Y}\right)}{\sum_{t=1}^{n}\left(Y_t - \bar{Y}\right)^2} \qquad (2.9)$$
For a data set where correlation is to be assessed at a number of lags, this process can
become time consuming and tedious. Thankfully, software packages including SAS (SAS
Institute, 1999) can be coaxed into generating autocovariance and autocorrelation
information.
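As an alternative to packaged output, Equations 2.8 and 2.9 take only a few lines of Python. The sketch below is illustrative; the function names and the period-four series are hypothetical stand-ins for a seasonal pattern.

```python
def autocovariance(y, k):
    """Sample autocovariance c_k (Equation 2.8); note the 1/n divisor."""
    n = len(y)
    ybar = sum(y) / n
    # Zero-based t = k..n-1 corresponds to t = k+1..n in Equation 2.8.
    return sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n)) / n


def autocorrelation(y, k):
    """Sample autocorrelation r_k (Equation 2.9), equal to c_k / c_0."""
    return autocovariance(y, k) / autocovariance(y, 0)


# Hypothetical series repeating every four observations, loosely mimicking
# a seasonal cycle (twelve months in the rainfall data).
y = [10, 20, 30, 20] * 6
for k in range(5):
    print(k, round(autocorrelation(y, k), 3))
```

For this series the autocorrelation is strongly negative at lag 2 (half a period) and strongly positive at lag 4 (a full period), the same kind of pattern Table 2.1 shows around lags 6 to 7 and at lag 12 for monthly rainfall.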
Example 2: Autocovariance and Autocorrelation in Rainfall Data
Returning to the Buccan rainfall data, it is of interest to see whether there is correlation
between rainfall in different months. That is, autocovariance and autocorrelation
information is wanted for monthly rainfall. Weather in south east Queensland typically
involves hot, humid, moderately wet summer conditions from around December to
March and cool, dry winters from around June to September.
A graphical summary from Microsoft Excel (Microsoft Excel, 2001) of the entire monthly
rainfall record from 1985 to 2001 is shown in Figure 2.4.
Figure 2.4: Rainfall over time from 1985 to 2001 inclusive line graph (rainfall against month).
Using SAS (SAS Institute, 1999) to retrieve autocovariance and autocorrelation data
involves two steps. Firstly the data must be read in, and then proc arima must be called to
analyse the data. Appendix A contains the SAS code used to obtain these results.
The first section of output from the arima procedure gives autocorrelation data. A
summary of the results obtained is provided in Table 2.1.
Lag   Autocovariance   Autocorrelation
0     7050.041         1
1     1618.361         0.22955
2     430.13           0.06101
3     81.115935        0.01151
4     -208.723         -0.02961
5     -686.799         -0.09742
6     -968.495         -0.13737
7     -1241.304        -0.17607
8     -218.564         -0.031
9     106.109          0.01505
10    281.02           0.03986
11    769.686          0.10917
12    1626.151         0.23066
13    870.382          0.12346
14    -28.987065       -0.00411
15    -31.67815        -0.00449
16    -1228.963        -0.17432
17    -1160.323        -0.16458
18    -1234.903        -0.17516
19    -965.033         -0.13688
20    -386.607         -0.05484
21    47.909485        0.0068
22    858.003          0.1217
23    1163.412         0.16502
24    1052.169         0.14924
Table 2.1: Autocovariance and autocorrelation data from monthly rainfall.
Notice that there is a perfect correlation (1) at a lag of zero. This makes sense because the
set of data is being compared with itself at this lag. The largest correlations are at lags of
one and twelve. A correlation at a lag of one could have been anticipated because there
may be some relationship between rainfall in successive months. The correlation at a lag
of twelve reflects the seasonal pattern frequently seen in rainfall data.
2.1.4 Statistical Measures and Terms
A number of common statistical measures and terms find their way into time series
analysis. This section takes a brief look at these concepts.
The standard error of the mean and the standard error of the difference between two
means are standard statistical measures. Both are common summary measures used in
confidence limits and hypothesis testing (Rao, 1998). The standard error of the mean is
given in Equation 2.10, where s is the standard deviation and n the sample size used to
calculate the mean. The standard error of the difference between two means is given in
Equation 2.11, where $n_1$ and $n_2$ are the sample sizes used in calculating the two means
and s is the common (pooled) standard deviation.
$$SE_{\bar{X}} = \frac{s}{\sqrt{n}} \qquad (2.10)$$

$$SE_{\bar{X}_1 - \bar{X}_2} = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad (2.11)$$
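A short Python sketch of Equations 2.10 and 2.11 follows. Treating s in Equation 2.11 as a pooled standard deviation is an assumption of this sketch; the first call simply applies Equation 2.10 to the Example 1 May rainfall figures, and the second uses hypothetical values.

```python
import math


def se_mean(s, n):
    """Standard error of the mean (Equation 2.10): s / sqrt(n)."""
    return s / math.sqrt(n)


def se_difference(s, n1, n2):
    """Standard error of the difference between two means (Equation 2.11).

    Assumes s is a common (pooled) standard deviation for both samples.
    """
    return s * math.sqrt(1 / n1 + 1 / n2)


print(round(se_mean(127.8903, 18), 2))  # 30.14 (May rainfall, Example 1)
print(se_difference(10.0, 8, 8))        # 5.0 (hypothetical values)
```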
In most statistical models of a time series, the value of a time series Y at time t ($Y_t$) is
seen as a combination of an explained part and a random error $\varepsilon_t$, as shown in
Equation 2.12. Random error is also commonly referred to as natural variation, residual or
simply error. Random error is a natural and logical consequence of natural variation,
measurement error and similar issues. There are a number of summary measures of
random error. Equation 2.13 shows the mean error (ME), Equation 2.14 the mean absolute
error (MAE) and Equation 2.15 the frequently used mean square error (MSE). In all of
these formulae n is the number of errors and $\varepsilon_t$ the error at time t.
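$$Y_t = [\text{Explained}] + \varepsilon_t \qquad (2.12)$$

$$\text{ME} = \frac{1}{n}\sum_{t=1}^{n}\varepsilon_t \qquad (2.13)$$

$$\text{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|\varepsilon_t\right| \qquad (2.14)$$

$$\text{MSE} = \frac{1}{n}\sum_{t=1}^{n}\varepsilon_t^2 \qquad (2.15)$$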
Makridakis et al. (1998) also briefly investigate some other summary error measures.
These include the relative or percentage error, mean percentage error and mean absolute
percentage error. These percentage based errors are useful for comparing models whose
errors are not in the same units.
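The error summaries in Equations 2.13 to 2.15 can be sketched in Python as follows; the function name and the observed and fitted values are hypothetical.

```python
def error_summaries(errors):
    """Return ME, MAE and MSE (Equations 2.13 to 2.15) for a list of errors."""
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return me, mae, mse


# Hypothetical observed values and fitted ("explained") values.
observed = [12.0, 15.0, 9.0, 14.0]
fitted = [10.0, 16.0, 9.0, 17.0]
errors = [y - f for y, f in zip(observed, fitted)]  # epsilon_t = Y_t - explained part
print(errors)                   # [2.0, -1.0, 0.0, -3.0]
print(error_summaries(errors))  # (-0.5, 1.5, 3.5)
```

Here the positive and negative errors partly cancel in the ME, which is one reason the MAE and MSE are usually reported as well.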
2.1.5 Hypothesis Testing Overview
Hypothesis testing is often applied to models as a whole and also to individual model
components. In hypothesis testing there is a null hypothesis that is assumed true and an
alternative hypothesis that holds if the null is not true. The p value, the probability of
obtaining a result at least as extreme as the one observed if the null hypothesis were true,
is evaluated. If the p value is less than a chosen significance level (commonly 0.05), the
null hypothesis is rejected and the alternative hypothesis accepted.
When testing models and model components, the null hypothesis more often than not
assumes no relationship. No relationship usually entails that a model or model
component has no noticeable impact or influence. For example, in ANOVA the null
hypothesis is that there is no difference in the mean of a dependent variable between the
different levels of a factor. A relationship is shown when the null hypothesis of no
relationship is rejected. In this way, hypothesis testing establishes the usefulness of a
model or of components in a model.
2.1.6 Outliers
Outliers are values in a data set that are so extreme that they do not appear to be part of
the data set (Zar, 1999). Outliers are frequently the result of errors in measurement or
inconsistency in units. While outliers may be the result of errors, they can also be an
integral, legitimate part of the data set. Sample data containing outliers can lead to severe
departures from standard assumptions made in statistical analysis (for example, equal
variances in ANOVA). Where outliers are present, careful consideration must be given to
what to do with them. Options include leaving them out of the analysis, correcting errors
(if known to be errors) or applying nonparametric statistical analyses, which are less
affected by outliers (Zar, 1999).
2.2 Correlation Functions
Correlation functions are used as a diagnostic tool on time series to judge the types of
relationships present. The first three correlation functions here look at relationships
within a time series, to see whether values at particular times are related in any way to
values at previous times. The final correlation function investigates relationships between
two time series.
There are a number of different relationships commonly seen in time series that can be
revealed by correlation functions. These include relationships where values at one time
are related to those immediately before them. Another common relationship is a seasonal
one, where values are related at a fixed interval of time (yearly temperature and rainfall
patterns, for example). These and other relationships produce characteristic patterns in
correlation functions that aid in their diagnosis.
2.2.1 The Autocorrelation Function (ACF)
The autocorrelation function (ACF) and its graphical form, the correlogram, are a natural
extension of autocorrelation (see section 2.1.3). Plotting the autocorrelations against lag
makes them easier to interpret and understand.
The autocorrelation function involves calculating the correlation at different lags within a
variable using the autocorrelation formula. Figure 2.5 shows a typical graphical
autocorrelation function (correlogram). Note that the correlogram is graphed only for lags
of zero and above. This is because the autocorrelation function is symmetrical around
zero, with negative lags identical to positive lags (Nemec, 1996). That is, the
autocorrelation at a lag of -a is the same as at a lag of a.
Figure 2.5: Example autocorrelation function (correlation from -1 to 1 against lag).
If there were absolutely no correlation at a lag, a correlation of exactly zero might be
expected. However, a small correlation will usually appear even if there is no relationship
whatsoever. To judge which correlations appear merely by chance and which are
significantly different from zero, the standard error of the correlation coefficient is used.
For a completely random series of length n, the standard error of an autocorrelation
coefficient is approximately $1/\sqrt{n}$. When an autocorrelation is more than two
standard errors away from zero, it is regarded as significant (Chatfield, 1980). That is,
correlations more than $2/\sqrt{n}$ away from zero are seen as significant. Should a time
series be completely random, about 95% of the correlations (excluding lag zero, which is
always one) should fall within this range, so an occasional value outside it can occur by
chance. Some software, including SAS, instead calculates the standard error at lag k using
Bartlett's approximation, $\sqrt{(1 + 2\sum_{j=1}^{k-1} r_j^2)/n}$, which grows with the lag as
the squared autocorrelations at lower lags accumulate.
The autocorrelation function can be used in initial time series analysis phases to deduce
the types of relationships at play, and also after fitting models to judge the success of
modelling.
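The significance limits can be checked against the rainfall results. The Python sketch below takes the autocorrelations from Table 2.1, assumes n = 204 monthly observations (January 1985 to December 2001), and compares the simple limit of $2/\sqrt{n}$ with limits built from Bartlett's approximate standard error, $\sqrt{(1 + 2\sum_{j=1}^{k-1} r_j^2)/n}$. The function names are illustrative.

```python
import math

# Autocorrelations r_1 to r_24 of monthly rainfall, copied from Table 2.1.
r = [0.22955, 0.06101, 0.01151, -0.02961, -0.09742, -0.13737, -0.17607,
     -0.031, 0.01505, 0.03986, 0.10917, 0.23066, 0.12346, -0.00411,
     -0.00449, -0.17432, -0.16458, -0.17516, -0.13688, -0.05484, 0.0068,
     0.1217, 0.16502, 0.14924]
n = 204  # assumption: 17 years x 12 months (January 1985 to December 2001)


def white_noise_se(n):
    """Approximate standard error 1/sqrt(n) of r_k for a completely random series."""
    return 1 / math.sqrt(n)


def bartlett_se(r, k, n):
    """Bartlett's approximate standard error of r_k: sqrt((1 + 2 * sum_{j<k} r_j^2) / n)."""
    return math.sqrt((1 + 2 * sum(rj ** 2 for rj in r[:k - 1])) / n)


simple_limit = 2 * white_noise_se(n)
simple = [k for k in range(1, len(r) + 1) if abs(r[k - 1]) > simple_limit]
bartlett = [k for k in range(1, len(r) + 1) if abs(r[k - 1]) > 2 * bartlett_se(r, k, n)]
print(round(simple_limit, 3), simple)
print(round(bartlett_se(r, 1, n), 3), round(bartlett_se(r, 24, n), 3), bartlett)
```

With Bartlett's approximation the standard error rises from about 0.070 at lag 1 to about 0.091 at lag 24, and the significant lags are 1, 7, 12, 16 and 18, in agreement with the SAS results reported in Example 3. The simpler $2/\sqrt{n}$ limit would additionally flag lags 17, 23 and 24.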
2.2.2 The Partial Autocorrelation Function (PACF)
Whereas the standard autocorrelation function looks at each lag without considering the
effect of other lags, the partial autocorrelation function takes smaller lags into account.
That is, the partial autocorrelation at a lag of k takes into account the correlation at lags of
1, 2, 3, up to k-1.
The use of the partial autocorrelation function is best shown by an example. Suppose
that there is a strong correlation between maximum temperatures one day apart; that is,
there is a strong autocorrelation in temperature at a lag of one. If this correlation is
sufficiently strong, standard autocorrelation will also report a significant correlation
between temperatures two days apart purely because of the lag one autocorrelation.
Since the temperatures on days one and two are highly correlated, as are those on days
two and three, days one and three (lag two) will show a certain amount of correlation due
entirely to the lag one correlation.
What may be desirable is a correlation measure that reflects only the correlation
genuinely present at a given lag, that is, a correlation that takes into account the effects of
correlation at smaller lags. This is the purpose of the partial autocorrelation function
(PACF).
Each partial autocorrelation coefficient $a_k$ is a measure of association between a time
series Y and the same time series at a lag of k (Makridakis et al., 1998), after the effects of
lags 1 to k-1 have been removed. Each partial autocorrelation coefficient $a_k$ is found by
fitting the multiple regression model in Equation 2.16, where $b_k$ is an estimate of $a_k$.
The parameter $b_k$ is a standard partial regression coefficient. For more information on
multiple linear regression and partial regression coefficients, please refer to section 2.4.5.
$$Y_t = b_0 + b_1 Y_{t-1} + b_2 Y_{t-2} + \cdots + b_k Y_{t-k} \qquad (2.16)$$
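Equation 2.16 can be fitted directly by ordinary least squares. The Python sketch below (the function names, the simulated series and its coefficient of 0.6 are all hypothetical) solves the normal equations with Gaussian elimination so that only the standard library is needed. Packaged routines may compute partial autocorrelations differently, for example from the sample autocorrelations, so their values can differ slightly from this regression.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(m):
        pivot = max(range(col, m), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, m):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, m + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        x[i] = (aug[i][m] - sum(aug[i][j] * x[j] for j in range(i + 1, m))) / aug[i][i]
    return x


def partial_autocorrelation(y, k):
    """Estimate a_k as the coefficient b_k from a least squares fit of Equation 2.16."""
    rows = [[1.0] + [y[t - j] for j in range(1, k + 1)] for t in range(k, len(y))]
    targets = [y[t] for t in range(k, len(y))]
    p = k + 1  # intercept b_0 plus slopes b_1..b_k
    # Normal equations (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)[k]


# Hypothetical simulated series in which each value depends only on the previous one.
random.seed(1)
y = [0.0]
for _ in range(1999):
    y.append(0.6 * y[-1] + random.gauss(0, 1))

print(round(partial_autocorrelation(y, 1), 2), round(partial_autocorrelation(y, 2), 2))
```

For such a series the lag 1 estimate should be near 0.6 and the lag 2 estimate near zero, even though the ordinary lag 2 autocorrelation of this kind of series is roughly 0.6² = 0.36, mirroring the temperature example above.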
Graphing the partial autocorrelation coefficients against lag creates the partial
autocorrelation function. This can give a better indication of the exact location of
autocorrelation within a variable than the standard autocorrelation function. Critical
values for judging significance are the same as in the standard autocorrelation function.
The SAS ARIMA procedure automatically creates partial autocorrelation functions.
2.2.3 The Inverse Autocorrelation Function (IACF)
Inverse autocorrelation is calculated for a time series by applying the standard
autocorrelation function to a modified time series model in which the autoregressive and
moving average components are swapped (Chatfield, 1980). Autoregressive and moving
average components are discussed in detail in sections 2.4.8 to 2.4.11.
The inverse autocorrelation function (IACF) is similar in use and result to the partial
autocorrelation function. It is regarded as particularly useful for data with seasonal
patterns (SAS Institute, 1999), as it tends to show seasonal (and subset) sources of
correlation more accurately than the other functions.
The SAS statistical package automatically provides the inverse autocorrelation function,
though it is not as common in time series literature as the other functions documented
here. Critical values are again the same as in the standard autocorrelation function.
Example 3: Rainfall Correlation Functions
From the monthly rainfall data this example generates autocorrelation, partial
autocorrelation and inverse autocorrelation data. The functions formed from this data are
presented graphically. The purpose of this investigation is to find relationships within the
time series. For instance, is the rainfall in a month correlated with rainfall in the previous
month? Is rainfall for a month correlated with rainfall for that month the previous year?
The statistical package SAS was used for the detailed mathematical calculations involved
in this example. Selected input is attached in Appendix A and selected output in
Appendix B.
The autocorrelation function is shown graphically in Figure 2.6. Because SAS calculates
the standard error at each lag using Bartlett's approximation, which accumulates the
squared autocorrelations at lower lags, the standard error increases from 0.07 at a lag of
one to 0.091 at lag 24. The lags found to be more than two standard errors away from
zero, and hence judged statistically significant, were lags of 0, 1, 7, 12, 16 and 18. The
correlation at a lag of 0 is perfect and positive, since the data are being compared with
themselves (a lag of 0 is really no lag at all). The significant correlation at a lag of one
reflects similar rainfall in consecutive months. The significant correlation at lag 12 is
predictable, a consequence of the seasonal patterns seen in rainfall data. Interestingly, lags
of 7, 16 and 18 are negative and significant, another reflection of seasonal patterns
(rainfall records half a year apart will frequently be opposites).
Figure 2.6: Monthly rainfall graphical autocorrelation function (correlation against lags 0 to 24).
The partial autocorrelation function shown in Figure 2.7 presents a similar result to the
autocorrelation function. Only three significant effects were evident this time, at lags of 1,
12 and 16. The lags of 7 and 18 were not significant once all lags prior to them were taken
into account. The lags found significant here were all significant in the autocorrelation
function and are assumed to have the same interpretation.
Figure 2.7: Monthly rainfall partial graphical autocorrelation function (correlation against lags 1 to 24).
The inverse autocorrelation function has a reputation for dealing more appropriately with
seasonal effects. It is clear from the inverse autocorrelation function in Figure 2.8 that the
data have been treated quite differently. Only the lag of 1 remains significant, while the
seasonal lag of 12 was close to being significant at 1.7 standard errors away from 0 (two
standard errors is judged significant). Although not significant here, the seasonal
correlation at 12 months was responsible for a number of significant lags appearing as
side effects in the standard autocorrelation function. The inverse autocorrelation function
effectively removed these side effects of the seasonal pattern.
Figure 2.8: Monthly rainfall inverse graphical autocorrelation function (correlation against lags 1 to 24).
2.2.4 The Cross Correlation Function (CCF)
Whereas the autocorrelation function investigates correlation within one time series
variable, the cross correlation function looks at correlation between two time series
variables. To investigate relationships between two time series, correlations at both
positive and negative lags need to be calculated. This is because a time series X may have
a delayed effect on a time series Y, or Y may have a delayed effect on X. The cross
correlation function uses a slight extension of the standard correlation r formula. These
formulae, which differ for positive and negative lags of k, are given in Equations 2.17 and
2.18 (modified from McCleary and Hay, 1980).
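$$CCF(+k) = \frac{\sum_{i=1}^{n-k}\left(X_i - \bar{X}\right)\left(Y_{i+k} - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}} \qquad (2.17)$$

$$CCF(-k) = \frac{\sum_{i=k+1}^{n}\left(X_i - \bar{X}\right)\left(Y_{i-k} - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}} \qquad (2.18)$$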
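A Python sketch of Equations 2.17 and 2.18 is given below, with a positive lag k pairing $X_i$ with $Y_{i+k}$, so that a peak at positive k suggests that X leads Y. The function name and the short series, in which Y simply repeats X two periods later, are hypothetical.

```python
import math


def cross_correlation(x, y, k):
    """CCF at lag k (Equations 2.17 and 2.18).

    k >= 0 pairs X_i with Y_{i+k}; k < 0 pairs X_i with Y_{i-|k|}.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [v - xbar for v in x]
    dy = [v - ybar for v in y]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if k >= 0:
        num = sum(dx[i] * dy[i + k] for i in range(n - k))
    else:
        num = sum(dx[i] * dy[i + k] for i in range(-k, n))
    return num / denom


# Hypothetical series: y repeats x two time periods later (two zeros pad the start).
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
y = [0, 0] + x[:-2]
ccf = {k: round(cross_correlation(x, y, k), 3) for k in range(-4, 5)}
best = max(ccf, key=lambda k: abs(ccf[k]))
print(best)  # 2: X leads Y by two periods
```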
Example 4: Rainfall Cross Correlation Function (CCF)
Rather than simply being interested in total monthly rainfall, the Fenech household
would also like to look at the number of rain days in each month. For this example, the
interest is in investigating correlation relationships between monthly rainfall and days of
rain. Does monthly rainfall depend on the days of rain in previous months, or vice versa?
Figure 2.9 presents a line graph showing both rainfall and days of rain over time.
Because of the large number of values included, it is difficult to draw many conclusions
from this graph.
Figure 2.9: Rainfall and Days of Rain over time from 1985 to 2001 inclusive (rainfall and days of rain against date).
Rather than calculating cross correlation data manually, the SAS statistical package was
called upon. The input for using SAS to generate these data is given in Appendix A and
selected output in Appendix B. Figure 2.10 shows a graphical summary of the cross
correlation function found. The cross correlation function is formed by comparing
rainfall with rain days at positive and negative time lags.
Figure 2.10: Rainfall and Days of Rain cross correlation function (correlation against lags -12 to 12).
The cross correlation function revealed a strong, significant correlation at a lag of zero,
reflecting that there is generally more rain in months with more rain days. The pattern
shown is almost periodic, a reflection of the strong seasonal patterns in the underlying
rainfall data.
2.3 Repeated Measures Models
Many standard statistical tests involve assumptions that samples are independently and
randomly collected from populations. These assumptions dictate that the error terms in
models are random and uncorrelated. Frequently this is not the case, and measurements are
correlated to some degree. For example, consider the one factor ANOVA model
for testing equality of treatment means given in Equation 2.19. In this model $\mu$ is the
overall mean, $\alpha_i$ the effect of treatment i, and the error terms ($\varepsilon_{ij}$)
represent randomly selected individuals within a treatment (Zar, 1999).
Should there be a relationship between specific individuals (within or between
treatments), the independence assumption is violated and an alternative method of
analysis should be considered (Rao, 1998).
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij} \tag{2.19}$$
Repeated measures is a label given to a large number of situations where multiple
measures are recorded on the same experimental unit (Rao, 1998). Repeated measures
situations inherently threaten the independence assumption, because the multiple measures
on the same experimental unit may be correlated. The error terms in models
involving repeated measures cannot safely be assumed independent and are likely to be
correlated.
A wide variety of experimental situations involve repeated measures. Examples include
where different measures are made at a particular time on each experimental unit and
where the same measure is taken at a number of different times on each experimental
unit. Time series situations can be regarded as repeated measures where observations are
made at a number of different times on each experimental unit (measure).
There is a range of techniques available for both repeated measures and time series
analysis. Typically, repeated measures methods are used for the analysis of time series when there
are few (up to ten) occasions, while analytical methods designed specifically for time series
analysis are used when there are many (at least 25) occasions (Nemec, 1996). Methods
available for analysis of repeated measures include split plot designs, mixed models and
multivariate ANOVA (MANOVA).
This section first presents a background to the theoretical and practical aspects involved
in repeated measures situations. Split plot designs and MANOVA are then reviewed as
techniques available for the analysis of repeated measures. The SAS statistical package
(SAS Institute, 1999) supports evaluations using all methods introduced here to some
degree.
Working knowledge of ANOVA is assumed in this section. There is a plethora of textbooks
available that detail degrees of freedom, models, blocking, ANOVA tables,
multiple comparison tests, calculations and other issues pertaining to the use of ANOVA
models. Recommended introductory texts include Bluman (2001) and Mann (1998). For
more in-depth information, Rao (1998) and particularly Zar (1999) are recommended.
2.3.1 Background
Repeated measures situations commonly occur in ANOVA, the univariate analysis of
variance. In its most basic form, ANOVA tests for equality of means and interactions
between any number of factors (Bluman, 2001). ANOVA tests for equality of sample
means use F-tests of variance ratios. The F-test answers the question of whether the samples
could have been taken from the same population. Briefly, the assumptions involved in standard
ANOVA hypothesis testing are as follows (Zar, 1999; Bluman, 2001).
1. Model components are additive.
2. Each combination of factors has the same variance.
3. Each combination of factors is taken from a normally distributed population.
4. Error (or natural variation) model terms are independently and normally
distributed. This requires random and independent samples.
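The variance ratio behind the one factor ANOVA of Equation 2.19 can be sketched in a few lines of NumPy. This is a minimal illustration under the assumptions listed above (in particular independent errors); the function name `one_way_anova_f` is hypothetical, and a p-value would come from an F-distribution, for example via `scipy.stats.f_oneway`.

```python
import numpy as np


def one_way_anova_f(groups):
    """F statistic for H0: all group means are equal (Y_ij = mu + alpha_i + e_ij).

    Returns (F, df_between, df_within). Only valid when the error terms are
    independent, which repeated measures on the same unit can violate.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    g = len(groups)
    n = all_values.size
    # Between-groups variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(gr) * (gr.mean() - grand_mean) ** 2 for gr in groups)
    # Within-groups variation: scatter of values around their own group mean.
    ss_within = sum(np.sum((gr - gr.mean()) ** 2) for gr in groups)
    df_between = g - 1
    df_within = n - g
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```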
The ANOVA F-test is regarded as robust with regard to the second and third assumptions,
as violating them usually makes little difference to the results (Zar, 1999). When a
repeated measures situation is present, however, the assumption of independent
error terms is violated. When the data have multiple measures recorded on a
particular experimental unit, the error terms are not random and independent: there is an
explicit correlation between the measures on each experimental unit.
The inclusion of time as a factor in an ANOVA model may be considered a possible way
of modelling a situation involving repeated measures over time. If exactly the same
measure on an experimental unit is taken at different moments in time there is likely to be
a degree of correlation between these measurements. This situation cannot be regarded as
having random, independent samples. Therefore a standard ANOVA model with time as
a factor does not accurately represent the situation.
To demonstrate how repeated measures situations naturally arise, two simple repeated
measures situations are presented in Equation 2.20 and Equation 2.21. Both involve the
measuring of moisture, where $\mu$ is the overall mean moisture level. For Equation 2.20,
$\mathrm{Depth}_i$ represents the depth at which the moisture measure was taken and $\varepsilon_{ij}$ an estimation of error
from replicates j within each depth level i. For Equation 2.21, $\mathrm{Time}_i$ represents the time when
the moisture measure was taken and $\varepsilon_{ij}$ an estimation of error from replicates j within
each time level i. Note that the second situation could involve time series data whereas
the first clearly does not.
$$\mathrm{Moisture}_{ij} = \mu + \mathrm{Depth}_i + \varepsilon_{ij} \tag{2.20}$$

$$\mathrm{Moisture}_{ij} = \mu + \mathrm{Time}_i + \varepsilon_{ij} \tag{2.21}$$
For the depth based moisture samples to be completely random, each moisture
measurement would have to be taken from a different, random soil sample. However, it is
likely to be more desirable to take a number of soil samples, divide each up into different
depths and calculate moisture levels from there. This likely situation involves taking
repeated measurements (moisture) from each experimental unit (soil sample). Therefore,
this can be considered a repeated measures situation where moisture is the repeated
measure. Measurements taken from within the same soil sample are likely to have a
degree of correlation, leading to error terms that are not independent using standard
ANOVA.
The time based moisture measurements would all need to be recorded at random
locations to be regarded as independent random samples. Practicalities and the need to
control other variables are likely to dictate that measurements are taken at the same
location at each time interval. In this case, repeated measures (moisture) are being
recorded on the same experimental unit (location). Hence this can also be considered a
repeated measures situation where moisture is the repeated measure. Measurements taken
at the same location are likely to have a degree of correlation, meaning that the error
terms are not going to be truly independent.
For examples later in this section, Equation 2.22 shows a univariate ANOVA model
where repeated measures occur. Because of assumption violations, it is important to note
that any standard ANOVA hinted at by this model is inappropriate. A dependent variable
moisture is being analysed using the variables location, time and block. There are to be
five locations, four times and three blocks. Due to the demonstrative nature of this
example, exact details such as units are not specified.
$$\mathrm{Moisture}_{ijk} = \mu + \mathrm{Location}_i + \mathrm{Time}_j + (\mathrm{Location} \times \mathrm{Time})_{ij} + \mathrm{Block}_k + \varepsilon_{ijk} \tag{2.22}$$
Where:
i = 1, 2, 3, 4, 5 (location level indicator).
j = 1, 2, 3, 4 (time level indicator).
k = 1, 2, 3 (block level indicator).
The block variable is regarded as a standard block that is assumed not to interact with
other factors. The error estimate $\varepsilon_{ijk}$ in Equation 2.22 therefore includes all interactions
of the block with other factors. In Equation 2.23, the error term is expanded to show all
the interactions implied to be within the error term.
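$$\begin{aligned} \mathrm{Moisture}_{ijk} = {} & \mu + \mathrm{Location}_i + \mathrm{Time}_j + (\mathrm{Location} \times \mathrm{Time})_{ij} + \mathrm{Block}_k \\ & + (\mathrm{Location} \times \mathrm{Block})_{ik} + (\mathrm{Time} \times \mathrm{Block})_{jk} \\ & + (\mathrm{Location} \times \mathrm{Time} \times \mathrm{Block})_{ijk} \end{aligned} \tag{2.23}$$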
Repeated measures are involved in this situation because of the time factor.
Measurements taken at different times with the same block and location combination are
likely to have correlation that violates assumptions of independence. Therefore moisture
is regarded as a repeated measure over time. The following sections introduce two
general approaches that can be used to analyse repeated measures situations and apply
them to this particular problem.
2.3.2 Split Plot Designs
Split plot designs are a common type of analysis that can be applied to repeated
measures. A split plot design involves placing a subplot (or number of subplots) within a
main plot to test additional factors. Factorial effects contained in the main plot and
subplots are tested separately using different error terms (Rao, 1998). The purpose of
using the different error terms is to avoid violations of independence of errors.
The technique of using each main plot as a complete replicate for another factor often
carries with it a substantial economic or logical advantage (Cochran and Cox, 1957). For
situations where the same measure is recorded at different times, it makes the analysis
possible (a logical advantage). A split plot design used to deal with correlation between
measurements at different times is referred to as a split plot over time.
In split plot designs, the main plot has an estimation of error separate from that of
any subplots. The main plot error satisfies the requirement of random independent errors,
leaving the factors that repeated measures are taken over (eg. time) in the subplots. In
effect, for tests of significance involving main plot factorial effects, values are averaged
over the variable that repeated measures are taken over (eg. time).
This averaging removes the influence of repeated measures from the main plot.
Every factorial effect contained in the subplots (along with subplots of subplots and so
forth, should they be involved) shares a common level of correlation in the error terms.
This correlation is due to the repeated measures over the factors contained in the subplot.
Factorial effects found in the subplot are tested for significance using a subplot error,
which is completely separate from the main plot error. All factorial effects isolated in the
subplot are affected by a standard random error and a common correlated error
component (that results from the repeated measures). Factorial effects within the subplot
can therefore be compared because they share the same error components.
Once a split plot design is set up, analysis is as per standard ANOVA using variance
ratios and F-tests with one important exception: factorial effects must be tested for
significance using the appropriate error (natural variation) term. Drawbacks of split plot
designs include the low error degrees of freedom that the separation may leave and the
possible loss of valuable information. For further information on split plot designs,
Cochran and Cox (1957) is recommended.
Example 5: Setting up a Split Plot Design
Equation 2.24 gives a version of the model introduced in section 2.3.1. In this base
ANOVA model, moisture is the repeated measure over time since identical moisture
measures taken at different times are assumed to be correlated. Hence factorial effects
involving time are placed in a subplot as is typical in a split plot over time design. Table
2.2 shows the resulting projected ANOVA for the split plot design. The location and
block factors are tested using an estimation of error from the location and block
interaction (1). The time factor and time by location interaction are tested using all terms
involving time and block interactions (2).
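$$\begin{aligned} \mathrm{Moisture}_{ijk} = {} & \mu + \mathrm{Location}_i + \mathrm{Block}_j + (\mathrm{Location} \times \mathrm{Block})_{ij} \\ & + \mathrm{Time}_k + (\mathrm{Location} \times \mathrm{Time})_{ik} + (\mathrm{Block} \times \mathrm{Time})_{jk} \\ & + (\mathrm{Location} \times \mathrm{Block} \times \mathrm{Time})_{ijk} \end{aligned} \tag{2.24}$$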
Source of Variation                 Degrees of Freedom
Location                            4
Block                               2
Location × Block (1)                8
Time                                3
Location × Time                     12
Block × Time (2)                    6
Location × Block × Time (2)         24
Total                               59
Table 2.2: Split plot projected ANOVA for moisture repeated measures example.
Note that the first estimation of error (1) has a fairly low number of degrees of freedom,
a common consequence of a split plot design. Each factorial effect is tested for significance
against an error term that shares the same error components. In the main plot, all
factorial effects are affected only by a random error because the values used in the analysis are
averaged over the four times. All factorial effects in the subplot are affected by a
completely random error component and a common correlated error component due to
the repeated measures over time.
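The split plot over time in Table 2.2 can be sketched in NumPy for a data array indexed as `y[location, block, time]` with one observation per cell. This is an illustrative implementation of the layout described above, not SAS output: the function name is hypothetical, the data are assumed balanced, and the error terms follow Table 2.2 (error 1 is Location × Block; error 2 pools Block × Time and Location × Block × Time).

```python
import numpy as np


def split_plot_over_time(y):
    """Split plot over time ANOVA for y[location, block, time] (Equation 2.24)."""
    y = np.asarray(y, dtype=float)
    nl, nb, nt = y.shape
    g = y.mean()
    ml, mb, mt = y.mean(axis=(1, 2)), y.mean(axis=(0, 2)), y.mean(axis=(0, 1))
    mlb, mlt, mbt = y.mean(axis=2), y.mean(axis=1), y.mean(axis=0)

    ss = {
        "Location": nb * nt * np.sum((ml - g) ** 2),
        "Block": nl * nt * np.sum((mb - g) ** 2),
        "Location x Block": nt * np.sum((mlb - ml[:, None] - mb[None, :] + g) ** 2),
        "Time": nl * nb * np.sum((mt - g) ** 2),
        "Location x Time": nb * np.sum((mlt - ml[:, None] - mt[None, :] + g) ** 2),
        "Block x Time": nl * np.sum((mbt - mb[:, None] - mt[None, :] + g) ** 2),
    }
    total = np.sum((y - g) ** 2)
    # With one observation per cell, the three-way interaction is what remains.
    ss["Location x Block x Time"] = total - sum(ss.values())

    df = {
        "Location": nl - 1,
        "Block": nb - 1,
        "Location x Block": (nl - 1) * (nb - 1),
        "Time": nt - 1,
        "Location x Time": (nl - 1) * (nt - 1),
        "Block x Time": (nb - 1) * (nt - 1),
        "Location x Block x Time": (nl - 1) * (nb - 1) * (nt - 1),
    }
    # Main plot error (1) and pooled subplot error (2), as in Table 2.2.
    ms_error1 = ss["Location x Block"] / df["Location x Block"]
    ms_error2 = (ss["Block x Time"] + ss["Location x Block x Time"]) / (
        df["Block x Time"] + df["Location x Block x Time"]
    )
    f = {
        "Location": ss["Location"] / df["Location"] / ms_error1,
        "Block": ss["Block"] / df["Block"] / ms_error1,
        "Time": ss["Time"] / df["Time"] / ms_error2,
        "Location x Time": ss["Location x Time"] / df["Location x Time"] / ms_error2,
    }
    return {"ss": ss, "df": df, "F": f, "total_ss": total, "total_df": y.size - 1}


# Hypothetical moisture values: 5 locations x 3 blocks x 4 times.
moisture = np.random.default_rng(1).normal(loc=20.0, scale=2.0, size=(5, 3, 4))
result = split_plot_over_time(moisture)
```

Each main plot effect is divided by error (1) and each subplot effect by error (2), which is the separation of error terms that the split plot design relies on.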
2.3.3 MANOVA
MANOVA (multivariate ANOVA) presents an alternative way of dealing with repeated
measures situations. MANOVA is a multiple variable (multivariate) equivalent of
ANOVA. Correlation between levels of the repeated measure is dealt with in MANOVA
by using vectors and matrices in place of the single values used in ANOVA.
MANOVA involves combining a number of univariate ANOVA models into
one large vector based model. Each level of each variable that repeated measures are
taken over is represented by a univariate ANOVA model, and these models are combined in MANOVA.
In a situation where repeated measures are recorded over time, this means that a univariate
model for each different time is combined in MANOVA.
Standard sums of squares measurements in the ANOVA table are replaced in
MANOVA with covariance matrices called sums of squares and cross products (s.s.p.)
matrices. These are needed because the correlation between dependent variables must be
taken into account and covariance matrices provide that functionality. Mathematically,
correlation and covariance are closely linked but not the same (see section 2.1.2).
Elements along the diagonal of the s.s.p. matrices are conventional sums of squares,
calculated for each univariate component of the model. All other (off-diagonal) elements
measure how pairs of dependent variables vary together. Detailed formulae for these
matrices can be found in Crowder and Hand (1990).
Hypothesis testing in MANOVA is again an extension of that used in ANOVA. A one
way ANOVA null hypothesis is usually of the form given in Equation 2.25, where g is
the number of groups. This claims equality of group means while the alternative
hypothesis is that two or more of the means are not equal. For MANOVA, the null
hypothesis is as in ANOVA for every dependent variable (Zar, 1999). So if there were n
separate dependent variables (still with g groups) the null hypothesis would look as in
Equation 2.26.
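$$H_0: \mu_1 = \mu_2 = \cdots = \mu_g \tag{2.25}$$

$$H_0: \mu_{11} = \cdots = \mu_{g1} \text{ and } \mu_{12} = \cdots = \mu_{g2} \text{ and } \ldots \text{ and } \mu_{1n} = \cdots = \mu_{gn} \tag{2.26}$$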
The alternative MANOVA hypothesis states that there are differences between at least
two groups in at least one of the dependent variables. This is a vague conclusion
and must therefore be investigated further should the null hypothesis be rejected.
Note that the MANOVA hypothesis does not test (or enforce) that the means for different
dependent variables are the same. A significant MANOVA result for a factorial effect can
reflect significant differences between mean levels of that factorial effect
and/or interactions between that factorial effect and the factors the repeated measures are
taken over.
There are a number of different test statistics for MANOVA, all derived in different ways
from the s.s.p. matrices. Standard notation used in MANOVA denotes the between
groups s.s.p. as $T_B$, within groups s.s.p. as $T_W$ and total s.s.p. as $T$. These matrices are
the equivalents of the sums of squares in standard ANOVA. The common
statistics are Wilks' lambda ($\det T_W / \det T$), Roy's largest root
(largest eigenvalue of $T_B T_W^{-1}$), the Hotelling-Lawley trace (sum of eigenvalues of $T_B T_W^{-1}$)
and Pillai's trace (sum of eigenvalues of $T_B T^{-1}$). The value obtained from each of these
statistics is converted to an approximate F-value (as used in ANOVA) and
compared to an F-distribution. The degrees of freedom used in the F-test depend
on the test statistic involved.
The best MANOVA test statistic depends on properties of the data being analysed.
Wilks' lambda tends to be the most common and is often used exclusively (as in
Johnson and Wichern, 1982). Zar (1999) points out that Pillai's trace tends to be the most
robust of the methods, handling departures from strict assumptions reasonably. The F-value
derived from Roy's largest root is an upper bound, so tests based on it tend to overstate significance.
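The sketch below builds $T_B$, $T_W$ and $T$ from grouped multivariate data and evaluates the four statistics exactly as defined above. It is a NumPy illustration only: the function names and data are hypothetical, and the conversion to approximate F-values (which statistical packages such as SAS report) is not included.

```python
import numpy as np


def ssp_matrices(groups):
    """Between-groups (T_B), within-groups (T_W) and total (T) s.s.p. matrices.

    groups: list of 2-D arrays, one per group; rows are experimental units and
    columns are the dependent variables (e.g. moisture at each time).
    """
    groups = [np.asarray(gr, dtype=float) for gr in groups]
    everything = np.vstack(groups)
    grand_mean = everything.mean(axis=0)
    p = everything.shape[1]
    t_b = np.zeros((p, p))
    t_w = np.zeros((p, p))
    for gr in groups:
        shift = gr.mean(axis=0) - grand_mean
        t_b += len(gr) * np.outer(shift, shift)  # between-groups cross products
        centred = gr - gr.mean(axis=0)
        t_w += centred.T @ centred              # within-groups cross products
    deviations = everything - grand_mean
    t = deviations.T @ deviations               # total; equals t_b + t_w
    return t_b, t_w, t


def manova_statistics(t_b, t_w, t):
    """The four common MANOVA statistics (before conversion to approximate F)."""
    eig = np.linalg.eigvals(t_b @ np.linalg.inv(t_w)).real
    return {
        "Wilks' lambda": np.linalg.det(t_w) / np.linalg.det(t),
        "Roy's largest root": eig.max(),
        "Hotelling-Lawley trace": eig.sum(),
        "Pillai's trace": np.trace(t_b @ np.linalg.inv(t)),
    }


# Hypothetical data: 3 groups of 6 units, each measured at 4 times.
rng = np.random.default_rng(7)
data = [rng.normal(loc=mu, size=(6, 4)) for mu in (0.0, 0.3, 0.8)]
manova_stats = manova_statistics(*ssp_matrices(data))
```

All four statistics are functions of the same eigenvalues $\lambda$ of $T_B T_W^{-1}$; for example, Wilks' lambda equals $\prod 1/(1+\lambda)$ and Pillai's trace equals $\sum \lambda/(1+\lambda)$, which is why they usually, but not always, agree.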
In univariate ANOVA, multiple comparison tests should be carried out if a significant
result is obtained from the main ANOVA. This highlights exactly where significant
differences lie among the groups being compared. Multiple comparison tests in MANOVA
involve splitting the MANOVA into the separate univariate ANOVA models and
applying ANOVA multiple comparison techniques to each univariate model. In a
MANOVA model of reasonable size, this ends up involving a lot of multiple comparison
tests.
Given a standard allowable error level ($\alpha$) of 0.05, an average of one in twenty multiple
comparison tests will be significant purely by chance when no real differences exist. This risk may be judged as
unacceptable when there are a large number of multiple comparison tests, as can be the
case in MANOVA. One way around this is the Bonferroni approach, where a
revised error level is used (Rao, 1998). As shown in Equation 2.27, the new allowable
error level is the original error level divided by the number of multiple comparison tests
k. The purpose of this modification is to reduce the occurrence of spurious significant results.
The Bonferroni approach can be applied to any number of standard multiple comparison
tests, including the protected t-test, SNK test and Tukey's HSD (Honestly Significant
Difference).
$$\alpha_{\text{new}} = \frac{\alpha_{\text{old}}}{k} \tag{2.27}$$
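Equation 2.27 is simple to apply in code. The sketch below (function names hypothetical) also computes the chance of at least one false positive among k tests when every null hypothesis is true; that second formula assumes the tests are independent, whereas the Bonferroni bound itself does not need independence.

```python
def bonferroni_alpha(alpha, k):
    """Equation 2.27: per-comparison error level for k comparisons."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return alpha / k


def familywise_error(alpha, k):
    """P(at least one false positive) for k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k
```

With k = 40 and $\alpha$ = 0.05, `familywise_error(0.05, 40)` is about 0.87, while the adjusted level from `bonferroni_alpha(0.05, 40)` brings it back below 0.05.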
There are a couple of drawbacks involved in MANOVA. Firstly, tests of equality of
means cannot be carried out for the factor the repeated measures were taken over
(eg. time). Furthermore, the additional parameter estimates required in MANOVA
over ANOVA may leave few degrees of freedom for error estimation. As the number
of dependent variables increases, the degrees of freedom available for error estimation
decrease.
Example 6: Setting up a MANOVA Design
In the ongoing example involving modelling moisture from location, time and a block
effect, repeated measures were taken over the time variable. An ANOVA model
considering moisture measures at one moment in time is shown in Equation 2.28, where i
represents the particular location and j the block. The interaction of location and block is
an estimation of error or natural variation.
$$\mathrm{Moisture}_{ij} = \mu + \mathrm{Location}_i + \mathrm{Block}_j + (\mathrm{Location} \times \mathrm{Block})_{ij} \tag{2.28}$$
Since there are four separate times, the MANOVA model simultaneously deals with four
such univariate models as in Equation 2.28. Equation 2.29 shows a vector based
interpretation of what happens when the four ANOVA models are combined. This
representation is compressed to give the form seen in Equation 2.30. Components here
have interpretations as follows:
$\mathrm{Moisture}_{ijk}$ is a particular moisture measure at time i, location j and block k.
$\mu_i$ is the mean moisture level at time i.
$\mathrm{Location}_{ij}$ is the effect of location j at time i.
$\mathrm{Block}_{ik}$ is the effect of block k at time i.
$(\mathrm{Location} \times \mathrm{Block})_{ijk}$ is the interaction between location j and block k at time i.
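$$\begin{pmatrix} \mathrm{Moisture}_{1jk} \\ \mathrm{Moisture}_{2jk} \\ \mathrm{Moisture}_{3jk} \\ \mathrm{Moisture}_{4jk} \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \end{pmatrix} + \begin{pmatrix} \mathrm{Location}_{1j} \\ \mathrm{Location}_{2j} \\ \mathrm{Location}_{3j} \\ \mathrm{Location}_{4j} \end{pmatrix} + \begin{pmatrix} \mathrm{Block}_{1k} \\ \mathrm{Block}_{2k} \\ \mathrm{Block}_{3k} \\ \mathrm{Block}_{4k} \end{pmatrix} + \begin{pmatrix} (\mathrm{Loc} \times \mathrm{Block})_{1jk} \\ (\mathrm{Loc} \times \mathrm{Block})_{2jk} \\ (\mathrm{Loc} \times \mathrm{Block})_{3jk} \\ (\mathrm{Loc} \times \mathrm{Block})_{4jk} \end{pmatrix} \tag{2.29}$$

$$\mathrm{Moisture}_{ijk} = \mu_i + \mathrm{Location}_{ij} + \mathrm{Block}_{ik} + (\mathrm{Location} \times \mathrm{Block})_{ijk} \tag{2.30}$$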
Note that the MANOVA presented here will not test for equality of time means. This
reflects the common drawback of MANOVA where factors the repeated measures are
recorded over cannot be tested for equality of means.
For illustration, suppose that location is found to be significant in the
MANOVA model. That is, mean differences in moisture levels exist between at least two
locations for at least one time. Multiple comparison tests should then be carried out to find
exactly where these differences lie. Recall that five locations were specified
previously. For each of the four univariate models there would be 10 multiple comparison
tests (4 + 3 + 2 + 1), leading to a total of 40 multiple comparison tests. Given a default
error level ($\alpha$) of 0.05, about two (40 × 0.05) tests would be expected to be significant purely by
chance. Therefore a Bonferroni-modified error level of 0.00125 (0.05 / 40) is
worth considering to lessen the occurrence of spurious significant results.
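The counting in Example 6 can be checked by listing the comparisons explicitly. This sketch uses only the numbers stated in the example (five locations, four times, $\alpha$ = 0.05); the variable names are illustrative.

```python
from itertools import combinations

locations = [1, 2, 3, 4, 5]
times = [1, 2, 3, 4]
alpha = 0.05

# Every pair of locations, compared separately within each time's univariate model.
comparisons = [(t, a, b) for t in times for a, b in combinations(locations, 2)]
n_tests = len(comparisons)             # 4 times x 10 location pairs
expected_by_chance = n_tests * alpha   # expected false positives if no real differences
alpha_bonferroni = alpha / n_tests     # Equation 2.27
```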
2.4 Univariate Time Series Models
All of the statistical models dealt with in this section involve modelling values of
one dependent variable from one or more independent variables. That is, models are
created to explain the behaviour of a dependent variable using other variables we label
independent. This section looks at different models commonly used for single
(univariate) time series. In this univariate time series context, the independent variables
are versions of the time series being investigated: different forms of the original time
series are used to predict that time series.
2.4.1 Time Series Model Components
Components of time series are often split into three separate parts: trend, season and
random error. These three components are usually viewed in their most fundamental
terms by plotting time (x-axis) against the value of the variable (y-axis). This type of plot
is commonly known as a time plot. The ongoing, purely theoretical example for this
section will be average monthly temperature for a location over 100 years (there is no
actual data).
The term trend is used in reference to long term changes in a property over time. We
may find that the average monthly temperature is slowly increasing over the 100 years,
forming a trend. Figure 2.11 shows a graph of a pure trend component.
[Figure 2.11 plot: "A Pure Trend Component"; x-axis Time, y-axis Value.]
Figure 2.11: Time plot of a pure trend component.
A recurring pattern over time is often regarded as a seasonal trend. Average monthly
temperature will tend to differ from month to month but return to about the same value in the same
month every year. For this reason temperature commonly has a strong seasonal trend with
a period of twelve months. It is possible to have other cyclic trends that are not based on
years; these are dealt with in analysis exactly as yearly seasonal trends are.
Figure 2.12 shows a graph of a pure seasonal (or cyclic) component.
[Figure 2.12 plot: "A Pure Seasonal Component"; x-axis Time, y-axis Value.]
Figure 2.12: Time plot of a pure seasonal component.
The final component is random error or natural variation. Once everything possible has
been taken into account, there will usually be a certain amount of unexplained variation.
This component is the difference between the temperature in a particular month
and the average temperature for that month (taking into account long term trends). For a
purely random series of n data entries, the sample autocorrelation coefficients are
approximately normally distributed with a mean of 0 and variance of 1/n (Chatfield, 1980).
Figure 2.13 shows a time series with only a pure random component.
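For a purely random series, Chatfield's rule of thumb concerns the sample autocorrelation coefficients $r_k$: with variance about 1/n, roughly 95% of them should fall within ±2/√n. The NumPy sketch below simulates a hypothetical random series to illustrate this; the `acf` helper is written here for illustration and is not a library function.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation coefficients r_0, r_1, ..., r_max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    n = d.size
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[: n - k] * d[k:]) / denom for k in range(max_lag + 1)])


# Simulated purely random series (hypothetical, standard normal values).
rng = np.random.default_rng(42)
n = 500
noise = rng.normal(size=n)
r = acf(noise, 20)

# If var(r_k) is about 1/n, roughly 95% of r_1..r_20 should lie within +/- 2/sqrt(n).
bound = 2 / np.sqrt(n)
share_inside = np.mean(np.abs(r[1:]) < bound)
```

The band applies to the autocorrelations, not to the raw values of the series, which keep whatever variance the random process has.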
[Figure 2.13 plot: "A Pure Random Component"; x-axis Time, y-axis Value.]
Figure 2.13: Time plot of a pure random component.
It is unrealistic to expect a time plot of a variable to show only a trend, a seasonal
trend or a random component. Most series are an intricate combination of these three
components. Figure 2.14 shows a typical result of combining trend, season and random
components. In this case, the three components have been added together; they
could also be multiplied together or combined in any number of other mathematical ways.
[Figure 2.14 plot: "Trend, Season and Random Components"; x-axis Time, y-axis Value.]
Figure 2.14: Time plot of trend, season and random components (additive).
2.4.2 General Time Series Models
Many different time series models are possible, so a general decomposition form such
as that in Equation 2.31 is often seen (Makridakis et al., 1998). Here, the dependent
variable Y at a particular time t ($Y_t$) is seen as a function of
seasonal ($S_t$), trend ($T_t$) and error ($\varepsilon_t$) components.
$$Y_t = f(S_t, T_t, \varepsilon_t) \tag{2.31}$$
The exact relationship between components is left out of the general model. This is
because there are many options for these relationships, including additive (Equation
2.32), multiplicative (Equation 2.33) and pseudo-additive (Equation 2.34). Details on the
building of additive models are given in later parts of section 2.4.
$$Y_t = S_t + T_t + \varepsilon_t \tag{2.32}$$

$$Y_t = S_t \times T_t \times \varepsilon_t \tag{2.33}$$

$$Y_t = T_t (S_t + \varepsilon_t - 1) \tag{2.34}$$
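The three decomposition forms can be illustrated by building synthetic series from known components, in the spirit of the theoretical monthly temperature example. All numbers are hypothetical. For the multiplicative and pseudo-additive forms this sketch assumes, as is usual for those forms, that $S_t$ and $\varepsilon_t$ are ratios centred on 1 rather than values centred on 0.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(120)                      # 10 years of hypothetical monthly values
trend = 15 + 0.01 * t                   # T_t: slow long term warming

# Additive form (Equation 2.32): components are in the units of the series.
season_add = 5 * np.sin(2 * np.pi * t / 12)        # S_t, period of twelve months
error_add = rng.normal(0.0, 1.0, t.size)           # epsilon_t, mean 0
y_additive = season_add + trend + error_add

# Multiplicative forms: S_t and epsilon_t are ratios centred on 1.
season_ratio = 1 + 0.3 * np.sin(2 * np.pi * t / 12)
error_ratio = 1 + rng.normal(0.0, 0.05, t.size)
y_multiplicative = season_ratio * trend * error_ratio          # Equation 2.33
y_pseudo_additive = trend * (season_ratio + error_ratio - 1)   # Equation 2.34
```

In the multiplicative series the seasonal swings grow with the trend, whereas in the additive series they stay the same size; this is the usual practical guide for choosing between the two forms.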
There is a clear limit to how much behaviour can be modelled in time series analysis. If
relationships are searched for in too much detail, random error may end up being
modelled, which can be unproductive and misleading. Relationships reported that are
really a result of random error do not really exist, yet they are presented as meaningful
sources of change. These misleading relationships render the results at least partially
invalid. Modelling randomness in this way is labelled overfitting.
There are common summary measures available for selecting the best time series
model. The model with the minimum mean square error (see section 2.1.4) gives the
smallest average squared difference between the predicted and actual values, which is an
indication of a good model. Another commonly used metric is Akaike's Information Criterion (AIC),
which evaluates a model using its likelihood, penalised by the number of estimated parameters.
The best model is regarded as the one that gives the minimum value of the AIC (Pynnönen, 2001).
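The sketch below compares two hypothetical fitted models by mean square error and AIC. It assumes normally distributed errors, for which AIC = 2p − 2 ln L reduces to n ln(RSS/n) + 2p plus a constant that is identical for all models fitted to the same data; the function names and data are illustrative, and other AIC variants exist.

```python
import numpy as np


def mse(actual, predicted):
    """Mean square error between observed and fitted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((actual - predicted) ** 2)


def gaussian_aic(actual, predicted, n_params):
    """AIC for normal errors with variance estimated as RSS / n.

    Equals n * ln(RSS / n) + 2p up to a constant that is the same for every
    model fitted to the same data, so the constant is dropped.
    """
    n = len(actual)
    return n * np.log(mse(actual, predicted)) + 2 * n_params


# Hypothetical series: compare a constant-mean model with a straight-line trend.
t = np.arange(24, dtype=float)
y = 5 + 0.3 * t + np.random.default_rng(3).normal(0, 1, t.size)
flat = np.full_like(y, y.mean())
line = np.polyval(np.polyfit(t, y, 1), t)
scores = {
    "mean only": (mse(y, flat), gaussian_aic(y, flat, 1)),
    "linear trend": (mse(y, line), gaussian_aic(y, line, 2)),
}
```

A more complex model always lowers the MSE on the data it was fitted to, so the 2p penalty in AIC is what guards against the overfitting described above.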
2.4.3 Moving Averages
Every specific value in a time series is inevitably going to be affected by natural
variation. This often leads to a time series having many spikes through time that can
make the time series difficult to interpret and analyse. One method commonly used to
combat these noisy time series is a moving average. Moving averages
are a common tool used in time series analysis to smooth out random variation in data.
Figure 2.15 gives an example of the smoothing effect of a simple moving average.
[Figure 2.15 plots: two panels, "Original Time Series" and "Smoothed Time Series"; x-axis Time, y-axis Value.]
Figure 2.15: A time series before and after applying a moving average smoother.
Applying a moving average (usually) creates a smoother time series. Each value in the
new time series is a modified version of the old value that takes into account surrounding
time series values. That is, each new value is an average of sorts of itself and surrounding
values in the original time series. Many methods use different weights for surrounding
values in the original series so that different time series positions are prioritised.
The general form of moving averages is given in Equation 2.35 (Chatfield, 1980). In this
general form $T_t$ represents the resulting moving average series T at time t, $Y_{t+j}$ the values
from the original time series around time t, and $a_j$ the weight on the original time
series value $Y_{t+j}$. The half width m is defined as $m = (k - 1)/2$, where k is the number of
values from the original time series being included. Note that this formula takes the same
number of values (m) on either side of the central position t, and the weights are usually
chosen to be symmetrical. Weights must add to one to preserve the same scale as the
original time series. The moving averages presented in this section are all specific types
of this general moving average form.
$$T_t = \sum_{j=-m}^{m} a_j Y_{t+j} \tag{2.35}$$
In all moving averages there are trade offs involving the number of positions k included
in each moving average calculation. The higher the value of k, the smoother the resulting
time series is likely to appear. However, including too many values may lead to
oversmoothing, where genuine patterns are removed. Furthermore, the larger the value of k,
the more values are lost at the beginning and end of the smoothed time series. This is
because smoothed values at these positions would require values from before the start
and after the end of the original time series.
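Equation 2.35 translates directly into a loop over positions. The NumPy sketch below (function name hypothetical) checks the conditions discussed above, namely an odd number of weights, symmetry and a sum of one, and marks the m positions at each end that cannot be smoothed as NaN.

```python
import numpy as np


def weighted_moving_average(y, weights):
    """Equation 2.35: T_t = sum_{j=-m}^{m} a_j Y_{t+j}.

    weights holds a_{-m}, ..., a_0, ..., a_m (odd length k = 2m + 1).
    The m positions at each end cannot be smoothed and are returned as NaN.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(weights, dtype=float)
    k = a.size
    if k % 2 == 0:
        raise ValueError("need an odd number of weights")
    if not np.isclose(a.sum(), 1.0):
        raise ValueError("weights must add to one")
    if not np.allclose(a, a[::-1]):
        raise ValueError("weights must be symmetric about the centre")
    m = (k - 1) // 2
    out = np.full(y.size, np.nan)
    for t in range(m, y.size - m):
        out[t] = np.dot(a, y[t - m : t + m + 1])  # weighted window centred on t
    return out
```

For example, `weighted_moving_average(y, [0.25, 0.5, 0.25])` gives the centre value twice the weight of its neighbours.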
The most basic form of a moving average is referred to as a simple moving average. This
type of moving average simply gives equal weighting to a set number of positions
surrounding each original position. The number of positions considered, k, must be odd for
the simple moving average so that the same number of values is considered on
either side of the original position. For example, each point could be replaced with an
average of five values (two from either side of the original position). This means that the
value at time 8 will now be the average of the values at times 6, 7, 8, 9 and 10. This
procedure of selecting values around each time is applied to every observation. This
particular example using five values is known as a 5MA simple moving average. The
general formula for a kMA moving average is given in Equation 2.36 (Makridakis et al.,
1998). Note that Y is the original time series, T is the resulting smoothed time series and
the half-width is $m = (k - 1)/2$.
$$T_t = \frac{1}{k} \sum_{j=-m}^{m} Y_{t+j} \tag{2.36}$$
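A kMA simple moving average is a convolution with k equal weights of 1/k. The sketch below uses `numpy.convolve` with `mode="valid"`, which returns only the positions where the full window fits, so the m values lost at each end are simply dropped. The function name is hypothetical.

```python
import numpy as np


def simple_moving_average(y, k):
    """Equation 2.36: k-point simple moving average (k odd).

    Returns the len(y) - k + 1 centred values that can be computed; the
    m = (k - 1) / 2 positions at each end are dropped.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    y = np.asarray(y, dtype=float)
    return np.convolve(y, np.ones(k) / k, mode="valid")
```

With `y = [1, 2, ..., 10]` and k = 5, the first output value is the smoothed value at time 3, and the smoothed value at time 8 is the average of times 6 to 10, matching the 5MA description above.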
The simple moving average only allows an odd number of terms (eg. k = 3, 5 or 7) to be
used, since the same number of values must be taken on either side of the position. A
centered moving average allows for moving averages using an even number of terms (k).
The centered moving average works by taking the average of two adjacent k-term moving
averages, centred half a time step either side of t. This concept is shown in Equation 2.37.
$$T_t = \frac{T_{t-0.5} + T_{t+0.5}}{2} \tag{2.37}$$
A moving average of this form is known as a 2×kMA moving average because it is the
average of two kMA moving averages. Consider a 2×4MA moving average, where k is 4 and
the two 4MA values in Equation 2.37 are centred at t − 0.5 and t + 0.5. An even number moving
average can be calculated for a value like $T_{3.5}$ because it means taking an equal number of
points (two) on either side of 3.5. For example, the 4MA moving average value $T_{4.5}$ would be
the average of values at times 3, 4, 5 and 6.
Centered moving averages give different weights to the first and last positions included in
each average: these values have half the weight of all other values involved. In the general
case of a 2×kMA centered moving average, all time series points used will have weights of
1/k except the first and last, which have a weight of 1/(2k).
This behaviour is best observed by looking at Example 7 at the end of this section.
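The weight pattern described above gives a one-pass way to compute a 2×kMA: a (k + 1)-point weighted average with end weights 1/(2k) and inner weights 1/k. The NumPy sketch below (function name hypothetical) does this and returns NaN where the window does not fit; it should agree with averaging two adjacent kMA values as in Equation 2.37.

```python
import numpy as np


def centered_moving_average(y, k):
    """2 x kMA for even k, computed as one (k + 1)-point weighted average.

    Weights are 1/(2k) for the first and last positions and 1/k otherwise.
    The k/2 positions at each end cannot be smoothed and are returned as NaN.
    """
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be a positive even integer")
    weights = np.full(k + 1, 1.0 / k)
    weights[0] = weights[-1] = 1.0 / (2 * k)
    y = np.asarray(y, dtype=float)
    m = k // 2
    out = np.full(y.size, np.nan)
    out[m : y.size - m] = np.convolve(y, weights, mode="valid")
    return out
```

A common use is k = 12 on monthly data, where the 2×12MA removes a twelve-month seasonal pattern and leaves an estimate of the trend.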
The last specific type of moving average for review here is a double moving average. A
double moving average is another advancement of the simple moving average. It involves
taking a moving average of a moving average. For example, Equation 2.38 shows how a
3×3MA double moving average $T'_t$ would be calculated (where each $T_i$ is calculated
using a 3MA). Expanding out these moving average forms can reveal interesting implied
weightings. For example, consider the expanded weights present in a 3×3MA, shown in
Equation 2.39.
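$$T'_t = \frac{T_{t-1} + T_t + T_{t+1}}{3}, \qquad T_t = \frac{Y_{t-1} + Y_t + Y_{t+1}}{3} \tag{2.38}$$

$$\begin{aligned} T'_t &= \frac{1}{3}\left[ \frac{Y_{t-2} + Y_{t-1} + Y_t}{3} + \frac{Y_{t-1} + Y_t + Y_{t+1}}{3} + \frac{Y_t + Y_{t+1} + Y_{t+2}}{3} \right] \\ &= \frac{1}{9} Y_{t-2} + \frac{2}{9} Y_{t-1} + \frac{1}{3} Y_t + \frac{2}{9} Y_{t+1} + \frac{1}{9} Y_{t+2} \end{aligned} \tag{2.39}$$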
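The implied weights of a double moving average are the convolution of the two sets of equal weights. This exact-arithmetic sketch uses Python's `fractions` module; the helper names are hypothetical. It reproduces the 3×3MA weights in Equation 2.39 and also shows that a 2×4MA is a 2-point average of a 4MA.

```python
from fractions import Fraction


def convolve(a, b):
    """Discrete convolution of two weight sequences."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, z in enumerate(b):
            out[i + j] += x * z
    return out


def double_ma_weights(k_inner, k_outer):
    """Implied weights on Y for a k_outer-point MA of a k_inner-point MA."""
    inner = [Fraction(1, k_inner)] * k_inner
    outer = [Fraction(1, k_outer)] * k_outer
    return convolve(inner, outer)
```

`double_ma_weights(3, 3)` gives 1/9, 2/9, 1/3, 2/9, 1/9, and `double_ma_weights(4, 2)` gives 1/8, 1/4, 1/4, 1/4, 1/8, the centered moving average pattern.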
Example 7: Calculating a 2×4MA (Centered Moving Average)
For this example a centered moving average value is calculated for one particular point
on a time series. The 2001 rainfall data in Table 2.3 are used to find a 2×4MA value at
April 2001.
Month        Rainfall
Jan 2001     86.9
Feb 2001     140
Mar 2001     415
Apr 2001     48
May 2001     40.8
Jun 2001     13.3
Jul 2001     22.9
Aug 2001     2.5
Sep 2001     12
Table 2.3: Rainfall data from January to December in 2001.
A moving average at time 4 is the average of the moving averages at times 3.5 and 4.5. Once
the moving averages at times 3.5 and 4.5 are fully expanded and simplified, an expression
showing different position weightings results.

$$\begin{aligned} T_4 &= \frac{T_{3.5} + T_{4.5}}{2} \\ &= \frac{1}{2}\left[ \frac{Y_2 + Y_3 + Y_4 + Y_5}{4} + \frac{Y_3 + Y_4 + Y_5 + Y_6}{4} \right] \\ &= \frac{Y_2 + 2Y_3 + 2Y_4 + 2Y_5 + Y_6}{8} \\ &= \frac{1}{8} Y_2 + \frac{1}{4} Y_3 + \frac{1}{4} Y_4 + \frac{1}{4} Y_5 + \frac{1}{8} Y_6 \end{aligned}$$
The above shows that the first and last values in the window have a weight of one eighth,
while the others have a weight of one quarter. This pattern, with the first and last values
having half the weight of all other values, is expected of a centered moving average.
Below, the observed time series values are substituted in to obtain the final moving
average result.
$$ T_4 = \frac{1}{8}(140) + \frac{1}{4}(415) + \frac{1}{4}(48) + \frac{1}{4}(40.8) + \frac{1}{8}(13.3) = 17.5 + 103.75 + 12 + 10.2 + 1.6625 = 145.1125 $$
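The same weights can be slid along the whole 2001 series to reproduce this value and to show which months lose their smoothed value at the ends. A minimal Python sketch, using the Table 2.3 values (the function name is illustrative only):

```python
# Monthly rainfall (mm) for January to September 2001, as listed in Table 2.3.
rain_2001 = [86.9, 140, 415, 48, 40.8, 13.3, 22.9, 2.5, 12]

# Implied weights of a 2x4 MA: first and last have half the weight of the rest.
WEIGHTS = [1 / 8, 1 / 4, 1 / 4, 1 / 4, 1 / 8]

def centered_2x4_ma(y):
    """Return {t: T_t} for every 1-based position t that has a complete 5-point window."""
    half = len(WEIGHTS) // 2
    smoothed = {}
    for centre in range(half, len(y) - half):   # 0-based index of the window centre
        window = y[centre - half : centre + half + 1]
        smoothed[centre + 1] = sum(w * v for w, v in zip(WEIGHTS, window))
    return smoothed

T = centered_2x4_ma(rain_2001)
print(round(T[4], 4))   # April 2001 (t = 4)
print(sorted(T))        # positions with a full window (March to July)
```

Only March to July receive a smoothed value here, because two months are needed on each side of the centre.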
Applying the above procedure to every position in the original time series results in a
smoothed time series. Figure 2.16 shows the effect resulting from this smoothing on
Buccan monthly rainfall data from 1994 to 2001 inclusive.
[Figure: two panels of Rainfall (mm) against Time; one panel is titled "Smoothed Time Series"]
Figure 2.16: Applying a 2×4 MA moving average to rainfall data (1994 to 2001).
2.4.4
Simple Linear Regression
Simple linear regression models one continuous dependent variable (Y) from one
continuous independent variable (X). For analysis, a data set must be available
containing values of X and the corresponding values of Y. Simple linear regression is
essentially the fitting of a line of best fit. Estimates of two population regression
parameters, $\beta_0$ and $\beta_1$, are calculated to fit a simple linear regression model,
and the $\varepsilon$ component of the model represents error or natural variation. The
general model is shown in Equation 2.40, where the subscript j represents a particular
occasion.
$$ Y_j = \beta_0 + \beta_1 X_j + \varepsilon_j \qquad (2.40) $$
It is good practice to test the $\beta_1$ regression coefficient for significance. If $\beta_1$ is
not significantly different from zero, then X is not a useful predictor of Y. The graphical
representation of simple linear regression in Figure 2.17 shows $\beta_0$ as the intercept and
$\beta_1$ as the slope of the line of best fit.
Figure 2.17: Simple linear regression presented graphically.
The coefficient of determination, symbolised by $r^2$, gives the proportion of the variation
in Y accounted for by X (Zar, 1999). It is calculated by dividing the regression sum of
squares by the total sum of squares found during model evaluation. The resulting $r^2$ is a
value between 0 and 1, where zero indicates that the model explains none of the variation
and one that the model explains all of it. Simple linear regression models work well
when there is a strong correlation (r) between X and Y.
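The least-squares estimates and the $r^2$ described above can be computed directly from their textbook formulas. The following Python sketch is illustrative only: the data are hypothetical and are not the Buccan records, and the function name is not from any library.

```python
def simple_linear_regression(x, y):
    """Least-squares estimates (b0, b1) and r^2 for Y = b0 + b1*X + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
    r2 = ss_regression / ss_total       # regression SS divided by total SS
    return b0, b1, r2

# Hypothetical illustrative data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
b0, b1, r2 = simple_linear_regression(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, r^2 = {r2:.4f}")
```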
2.4.5
Multiple Linear Regression
Multiple linear regression is an extension of simple linear regression in which multiple
independent variables are allowed. Instead of one independent variable X, there are now p
independent variables X1, X2, X3 up to Xp. For analysis, a data set must be available
containing a set of values of each X variable and the corresponding values of Y. The
additional independent variables increase the number of regression coefficient estimates
$\beta_i$, as seen in the general form given in Equation 2.41. Each regression coefficient $\beta_i$ is
now regarded as a partial regression coefficient. Once again, the subscript j represents a
particular occasion where values are available.
$$ Y_j = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \dots + \beta_p X_{pj} + \varepsilon_j \qquad (2.41) $$
The model as a whole should first be tested for significance. This F-test in effect tests the
hypothesis that Y has no dependence on any of the X variables (Zar, 1999). Should this
test show a relationship, each partial regression coefficient $\beta_i$ should then be tested
for significance. These tests examine whether each partial regression coefficient is equal
to zero (no relationship).
As in simple linear regression, $r^2$ gives an indication of how much of the variability in Y
is explained by the model. A variant called the adjusted $r^2$ is particularly suited to
multiple linear regression, because it adjusts for the number of variables included in the
model, whereas $r^2$ tends to increase as more variables are included (whether or not they
have any meaningful relationship with Y).
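The adjustment and the overall F-test can both be written in terms of $r^2$, the number of observations n and the number of independent variables p. The sketch below uses the standard textbook formulas (they are not given in this thesis); the figures are hypothetical and show a second, nearly useless variable nudging $r^2$ up while the adjusted $r^2$ falls.

```python
def adjusted_r2(r2, n, p):
    """Adjusted r^2 for a model with p independent variables fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def overall_f(r2, n, p):
    """F statistic (p and n - p - 1 df) for H0: all partial regression coefficients are zero."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

# Hypothetical figures: 20 observations, r^2 rising from 0.500 to 0.505
# when a second variable is added.
n = 20
one_var = adjusted_r2(0.500, n, 1)
two_var = adjusted_r2(0.505, n, 2)
print(round(one_var, 4), round(two_var, 4))          # adjusted r^2 drops
print(round(overall_f(0.500, n, 1), 2))              # overall F for the one-variable model
```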
Ideally, the independent variables should not be associated (correlated) with each other.
If there is correlation between independent variables, this leads to a situation known as
multicollinearity. The unusual situation of having a significant model as a whole but no
significant individual variables is one of many potential side effects of multicollinearity.
Multicollinearity is particularly important to note in the time series area, since many
time series models naturally involve a degree of it (see section 2.4.8). It must be carefully
monitored where correlation exists between independent variables, and a non-standard
evaluation strategy such as ridge regression or two phase least squares may be needed
to obtain accurate parameter estimates (Zar, 1999).
There are a number of techniques available for selecting appropriate variables to include
in a multiple linear regression model, including forward selection, backward selection
and stepwise selection. All are based on the significance of individual variables in the
model, the associated $r^2$ values for the entire model and sometimes other criteria.
Detailed discussion of these common techniques is beyond the scope of this thesis.
Forms of regression play an important role in time series. The basic autoregressive (AR)
models in section 2.4.8 and moving average (MA) models in section 2.4.9 both involve
forms of regression. In turn, autoregressive moving average (ARMA) models in section
2.4.10 and autoregressive integrated moving average (ARIMA) models in section 2.4.11
are strongly based on regression models.
Regression is also fundamental to a number of other, less common time series
techniques not discussed in detail in this thesis. Harmonic regression (HREG), for
instance, uses sine and cosine components in a multiple linear regression to model a time
series (Stergiou et al., 1997).
Example 8: Linear Regression
It has already been established that there is some relationship between days of rain and
total rain in any particular month for the Buccan rainfall data. For this exercise a simple
linear regression model is examined in which rainfall is modelled as dependent on rain days.
$$ \text{Rain}_j = \beta_0 + \beta_1\,\text{Days}_j + \varepsilon_j $$
The SAS statistical package (SAS Institute, 1999) provides the glm and reg
procedures to support the evaluation of regression models. Appendix A contains the
relevant code for using the SAS reg and glm procedures for regression analysis of the
Buccan rainfall data.
Both SAS procedures produce exactly the same results with subtly different output. The
reg procedure is the only one to output an adjusted r2 value while the glm procedure
gives a greater variety of sums of squares measures. Raw output for both procedures is
provided in Appendix B.
Evaluation of the model as a whole found that it was significant, with an F-value of
111.14 and a p value of less than 0.0001. The null hypothesis of there being no
relationship between the independent (days of rain) and dependent (rainfall) variables
is therefore confidently rejected.
An $r^2$ value of 0.3549 indicates that 35.49% of the variation observed in rainfall is
explained by the days of rain variable. In this simple linear regression, the adjusted $r^2$
is very similar at 0.3517.
Since the model as a whole was found to be significant, the model components are
investigated next. Within the model, days of rain is found to be significant (p < 0.0001).
In simple linear regression, the significance result for the model as a whole and for the
single independent variable will always be the same.
Parameter estimates for the intercept ($\beta_0$) and days ($\beta_1$) are calculated to give the
final model seen below. These parameter estimates are provided along with their standard
errors. Notice that these parameters do not necessarily make complete logical sense, as
the model suggests that when there are no rain days there will be rainfall of -17 mm.
$$ \widehat{\text{Rain}}_j = \underset{(11.073)}{-17.1837} + \underset{(1.221)}{12.869}\,\text{Days}_j $$
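In simple linear regression the overall F statistic equals the square of the slope's t statistic, which is why the model and variable significance results always agree. A short Python check using the reported Example 8 figures (the helper name is illustrative); because the reported estimates are rounded, $t^2$ only approximately matches the reported F of 111.14.

```python
# Values as reported for the Example 8 model.
intercept, slope = -17.1837, 12.869
slope_se = 1.221
reported_f = 111.14

def predicted_rain(days):
    """Fitted monthly rainfall (mm) from the Example 8 simple linear regression."""
    return intercept + slope * days

t_stat = slope / slope_se      # t statistic for H0: beta_1 = 0
print(round(t_stat, 2), round(t_stat ** 2, 2), reported_f)
print(predicted_rain(0), round(predicted_rain(10), 2))   # negative at zero rain days
```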
2.4.6
Stationarity
A number of the models looked at in later sections (AR, MA and ARMA models) require
a time series to be stationary (Chatfield, 1980). This section considers the issue of
stationarity.
For a time series to be stationary it must be stationary in both mean and variance; the
mean and variance must not change over time. If a time series is gradually increasing or
decreasing over time, this trend makes the time series non-stationary in mean. If the data
become more or less variable over time, then the time series does not have stationarity of
variance. In many cases it may be clear from a simple time plot (a variable against time)
whether a time series is stationary. At other times this will not be so clear, and for these
cases there are a number of standard tests available to assess stationarity. A group of
tests known as unit root tests can be used as tests of stationarity (Makridakis et al., 1998).
Three tests of stationarity are the Dickey-Fuller test, the random walk test and the
Phillips-Perron test (SAS Institute, 1999). The most commonly referenced, the Dickey-Fuller
test, is based on a multiple linear regression model. There are a few different forms of the
Dickey-Fuller test, including the zero mean, single mean and trend versions (SAS
Institute, 1999). The zero mean test assumes a mean of zero, the single mean test allows a
mean of any value and the trend test allows for a trend (a change in mean over time). All
three forms test for stationarity of variance, while all except the trend form also test for
stationarity of mean.
Equation 2.42 shows the regression model for the zero mean Dickey-Fuller test, Equation
2.43 the model for the single mean Dickey-Fuller test and Equation 2.44 the model for
the trend Dickey-Fuller test. All of these augmented Dickey-Fuller stationarity tests result
in an estimate of the parameter $\phi$ (phi). If the series is stationary, $\phi$ will be negative;
otherwise it will be close to zero. There are a number of options for testing, including
conversion to rho for comparison with the Dickey-Fuller null distribution and an F
approximation. The number of lagged differenced terms included in the analysis is
flexible, with around three recommended (Makridakis et al., 1998).
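$$ Y'_t = \phi Y_{t-1} + \beta_1 Y'_{t-1} + \beta_2 Y'_{t-2} + \dots + \beta_p Y'_{t-p} \qquad (2.42) $$
$$ Y'_t = \phi Y_{t-1} + \beta_0 + \beta_1 Y'_{t-1} + \beta_2 Y'_{t-2} + \dots + \beta_p Y'_{t-p} \qquad (2.43) $$
$$ Y'_t = \phi Y_{t-1} + \delta t + \beta_0 + \beta_1 Y'_{t-1} + \beta_2 Y'_{t-2} + \dots + \beta_p Y'_{t-p} \qquad (2.44) $$
In Equation 2.44, $\delta$ denotes the coefficient on the time index t.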
In the Dickey-Fuller regression models (Equations 2.42 to 2.44), $Y_{t-1}$ is the lag one
series taken from the original time series, $Y'_t$ is the differenced series and each $Y'_{t-k}$ is
a lagged series taken from the differenced time series. The values $\beta_0$, $\beta_1$ and so on
are standard regression coefficients. A differenced time series is formed by calculating the
differences at each point of an original time series, as shown in Equation 2.45, where $Y_t$
is the original time series, $Y_{t-1}$ is the lag one original time series and $Y'_t$ the resulting
differenced time series.
$$ Y'_t = Y_t - Y_{t-1} \qquad (2.45) $$
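The behaviour of $\phi$ can be illustrated with the simplest form, the zero mean regression of Equation 2.42 with no lagged differenced terms, fitted by least squares. This Python sketch uses simulated (hypothetical) series and only produces the point estimate of $\phi$; the formal test, which compares a standardised statistic against Dickey-Fuller tables as SAS does, is not reproduced here.

```python
import random

def difference(y):
    """First-order differenced series Y'_t = Y_t - Y_{t-1} (one value shorter)."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def df_phi_zero_mean(y):
    """Least-squares phi in Y'_t = phi * Y_{t-1} (zero mean form, no lagged differences)."""
    dy = difference(y)
    lagged = y[:-1]                      # Y_{t-1}, aligned with dy
    return sum(a * b for a, b in zip(lagged, dy)) / sum(a * a for a in lagged)

def simulate_ar1(phi, n, seed):
    """Y_t = phi * Y_{t-1} + e_t with standard normal errors (phi = 1 is a random walk)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0, 1))
    return y

stationary = simulate_ar1(0.5, 2000, seed=1)    # expect phi near 0.5 - 1 = -0.5
random_walk = simulate_ar1(1.0, 2000, seed=2)   # expect phi near 0
print(round(df_phi_zero_mean(stationary), 3))
print(round(df_phi_zero_mean(random_walk), 3))
```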
Stationarity needs to be induced in a time series before conventional analysis. This
process is referred to as prewhitening in McLeary and Hay (1980), among others. If a time
series is not stationary in variance, a mathematical transformation such as the natural log
or inversion (dividing one by each value) is applied to the time series (Makridakis et al.,
1998). Logarithms are particularly common and are used in many applications, such as
Chan et al. and Nicholson et al. (1998).
If a time series is not stationary in mean, differencing as discussed previously is used.
Sometimes second-order differencing is needed to remove non-stationarity in the mean;
that is, standard differencing is carried out a second time, as in Equation 2.46, where $Y''_t$
is the resulting second-order differenced series. In practice it is rarely necessary to go
further than second-order differencing (Makridakis et al., 1998).
$$ Y''_t = Y'_t - Y'_{t-1} \qquad (2.46) $$
Example 7: Differencing a Time Series
This example has two parts. In the first, first- and second-order differenced time series
are found from a subset of the Buccan rainfall data. In the second, a data set is created
and then made stationary using differencing.
Part 1
The first eight months of rainfall in 2001 are taken as the original time series Y. The first
lag of this time series, $Y_{t-1}$, is found simply by shifting $Y_t$. The first-order differenced
series $Y'_t$ can then be calculated using $Y'_t = Y_t - Y_{t-1}$. After lagging the differenced
series to create $Y'_{t-1}$, the second-order differenced series $Y''_t$ can be found by
calculating $Y''_t = Y'_t - Y'_{t-1}$. Table 2.4 shows the results of applying the differencing to
the 2001 data. Notice that with each successive difference one value is lost from the time
series. While this may not be important for long time series such as the full rainfall
series, it should be considered for short time series.
t    Y_t     Y_{t-1}   Y'_t = Y_t - Y_{t-1}   Y'_{t-1}   Y''_t = Y'_t - Y'_{t-1}
1    86.9
2    140     86.9      53.1
3    415     140       275                    53.1       221.9
4    48      415       -367                   275        -642
5    40.8    48        -7.2                   -367       359.8
6    13.3    40.8      -27.5                  -7.2       -20.3
7    22.9    13.3      9.6                    -27.5      37.1
8    2.5     22.9      -20.4                  9.6        -30
Table 2.4: First and second differencing applied to 2001 rainfall data.
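The columns of Table 2.4 can be reproduced by applying the same first-differencing step twice. A minimal Python sketch using the January to August 2001 values from the table (the function name is illustrative):

```python
def difference(y):
    """One round of first differencing: Y'_t = Y_t - Y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

rain = [86.9, 140, 415, 48, 40.8, 13.3, 22.9, 2.5]   # Jan-Aug 2001 (Table 2.4)
first = difference(rain)       # Y'_t for t = 2..8
second = difference(first)     # Y''_t for t = 3..8
print([round(v, 1) for v in first])
print([round(v, 1) for v in second])
```

Each pass shortens the series by one value, matching the blank cells at the top of the table.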
Part 2
A time series was created with a definite trend over time and a slightly increasing
amount of random error at each point as time goes on. Simple first-order differencing
was applied to make the time series stationary in mean. The original time series is
shown along with the resulting differenced time series in Figure 2.18. It is clear that the
trend has been completely removed.
[Figure: two panels of Value against Time; one panel is titled "Differenced Time Series"]
Figure 2.18: Applying differencing to a created series with a clear trend.
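A created series of this kind can be sketched in Python. The trend slope, starting level and error growth below are hypothetical choices, not the values used to draw Figure 2.18. The halves of the original series have very different means, while the differenced series has a mean close to the trend slope throughout.

```python
import random

def difference(y):
    """First-order differencing: Y'_t = Y_t - Y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def mean(values):
    return sum(values) / len(values)

# Hypothetical series: linear trend (slope 0.5) plus errors whose spread grows over time.
rng = random.Random(42)
n = 200
series = [10 + 0.5 * t + rng.gauss(0, 0.5 + 0.01 * t) for t in range(n)]
diffed = difference(series)

half = n // 2
print(round(mean(series[:half]), 1), round(mean(series[half:]), 1))   # trend: means differ
print(round(mean(diffed[:half]), 2), round(mean(diffed[half:]), 2))   # both near 0.5
```

Differencing fixes only the mean; the growing error spread is still present in the differenced series and would need a transformation if stationarity of variance were required.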
2.4.7
Backshift Notation
Backshift notation is commonly used to represent time series models (Makridakis et al.,
1998). Although fundamentally simple, it greatly eases the writing of complex models.
The backshift operator B is defined as in Equation 2.47.
$$ BY_t = Y_{t-1} \qquad (2.47) $$
The backshift operator B simply shifts a time series Y back by one period (a lag of one).
Repeated application of B shifts a time series back n time periods, as seen in Equation 2.48.
$$ B^n Y_t = Y_{t-n} \qquad (2.48) $$
2.4.8
AR (Autoregressive) Models
In section 2.1.3 the concepts of autocovariance and autocorrelation were introduced.
These measure covariance and correlation within a variable at different lags.
Autoregressive models work along these lines, using multiple linear regression to model
current values of a time series from previous values of the same series (Chatfield, 1980).
These models assume that current time series values depend entirely on previous values,
with any differences being due to random noise.
An autoregressive model of order p is referred to as AR(p). For example, an
autoregressive model that includes the first three lags of the current variable would be
referred to as AR(3). The general form of an autoregressive model as shown in Equation
2.49 is much the same as a standard multiple linear regression model. The backshift
notation form is shown in Equation 2.50.
$$ Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t \qquad (2.49) $$
Where:
$Y_t$ represents a time series Y observation at time t.
$c$ is a standard regression constant.
$\phi_i$ are standard regression coefficients.
$Y_{t-i}$ is the time series Y observation at a lag of i.
$\varepsilon_t$ is the random error at time t.
$$ (1 - \phi_1 B - \dots - \phi_p B^p) Y_t = c + \varepsilon_t \qquad (2.50) $$
Determination of whether a time series follows an autoregressive model and how many
terms to include can be aided by the use of autocorrelation and partial autocorrelation
functions.
An autoregressive model of order one, AR(1), is a model in which each time series value
depends only on the previous value. The autocorrelation function for this model is
expected to decay exponentially to zero on the positive side if $\phi_1$ is positive, and to
alternate in sign, starting with a negative correlation, if $\phi_1$ is negative. The partial
autocorrelation function is expected to show a spike at lag one (of the same sign as $\phi_1$)
before cutting off to zero. For the AR(1) case with positive $\phi_1$, Figure 2.19 displays a
typical autocorrelation function and Figure 2.20 a typical partial autocorrelation function.
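For a stationary AR(1) ($|\phi_1| < 1$) the theoretical autocorrelation at lag k is the standard result $\rho_k = \phi_1^k$, which produces both patterns described above. A short Python sketch with hypothetical values of $\phi_1$:

```python
def ar1_acf(phi, max_lag):
    """Theoretical autocorrelations rho_k = phi**k of a stationary AR(1), k = 1..max_lag."""
    return [phi ** k for k in range(1, max_lag + 1)]

print([round(r, 3) for r in ar1_acf(0.6, 5)])    # positive, decaying
print([round(r, 3) for r in ar1_acf(-0.6, 5)])   # alternating, starting negative
```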
[Figure: AR(1) Autocorrelation Function; Correlation (-1 to 1) against Lag]
Figure 2.19: Typical autocorrelation function for AR(1), positive $\phi_1$.
[Figure: AR(1) Partial Autocorrelation Function; Correlation (-1 to 1) against Lag]
Figure 2.20: Typical partial autocorrelation function for AR(1), positive $\phi_1$.
For the general autoregressive model of order p, AR(p), a characteristic pattern can be
anticipated. The autocorrelation function is expected to decay exponentially (potentially
in a damped sine-wave pattern), while the partial autocorrelation function is expected to
have spikes at lags one to p and then cut off to zero (Makridakis et al., 1998). There are
spikes at several lags because there are separate relationships between the series and its
values at each of those lags. For the AR(p) case, Figure 2.21 displays a typical
autocorrelation function and Figure 2.22 a typical partial autocorrelation function.
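The cut-off in the partial autocorrelation function can be demonstrated for an AR(2) with hypothetical, stationary coefficients $\phi_1 = 0.5$ and $\phi_2 = 0.3$. The theoretical ACF follows the Yule-Walker recursion $\rho_1 = \phi_1/(1-\phi_2)$, $\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2}$, and the PACF is obtained from it with the Durbin-Levinson recursion, a standard algorithm swapped in here for illustration rather than one described in this thesis.

```python
def ar2_acf(phi1, phi2, max_lag):
    """Theoretical ACF rho_0..rho_max_lag of a stationary AR(2) (Yule-Walker recursion)."""
    rho = [1.0, phi1 / (1 - phi2)]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho

def pacf_from_acf(rho):
    """Partial autocorrelations at lags 1..len(rho)-1 via the Durbin-Levinson recursion."""
    pacf = []
    prev = []                    # phi_{k-1,1..k-1} from the previous step
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1 - sum(prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

rho = ar2_acf(0.5, 0.3, 8)
pacf = pacf_from_acf(rho)
print([round(r, 3) for r in rho[1:]])   # gradual decay
print([round(a, 3) for a in pacf])      # spikes at lags 1 and 2, then essentially zero
```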
[Figure: AR(p) Autocorrelation Function; Correlation (-1 to 1) against Lag]
Figure 2.21: Typical autocorrelation function for AR(p).
[Figure: AR(p) Partial Autocorrelation Function; Correlation (-1 to 1) against Lag]
Figure 2.22: Typical partial autocorrelation function for AR(p).
2.4.9
MA (Moving Average) Models
Moving average models use a form of multiple linear regression to model time series
values from previous errors. In effect, the series is assumed to be formed from a moving
average of the error series.
A moving average model of order q is referred to as MA(q) and is shown in Equation
2.51. For example, a moving average model that includes the first three lags of the error
would be referred to as MA(3). The use of subtraction for the error terms is simply a
standard notational convention and serves no specific purpose.
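$$ Y_t = c + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q} \qquad (2.51) $$
Where:
$Y_t$ represents a time series Y observation at time t.
$c$ is a standard regression constant.
$\theta_i$ are standard regression coefficients.
$\varepsilon_t$ is the error at time t.
$\varepsilon_{t-i}$ is the random error at a time lag of i.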
The backshift notation form is shown in Equation 2.52. As was the case for AR models,
the autocorrelation and partial autocorrelation functions can help in diagnosing the
appropriateness of MA models and the terms to include.
$$ Y_t = c + (1 - \theta_1 B - \dots - \theta_q B^q)\varepsilon_t \qquad (2.52) $$
A moving average model of order one, MA(1), is a model in which each time series value
depends only on the current and previous errors. The autocorrelation function for this
model is expected to have a spike at lag one and then cut off to zero. With the subtraction
convention of Equation 2.51, the lag one autocorrelation is $-\theta_1/(1+\theta_1^2)$, so the spike
has the opposite sign to $\theta_1$. The partial autocorrelation function is expected to decay
exponentially on the negative side if $\theta_1$ is positive, and to alternate in sign starting from
the positive side if $\theta_1$ is negative. For the MA(1) case with positive $\theta_1$, Figure 2.23
displays a typical autocorrelation function and Figure 2.24 a typical partial
autocorrelation function. Interestingly, the expected patterns in the autocorrelation and
partial autocorrelation functions are almost exactly swapped around from the AR model case.
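These sign patterns follow from the standard theoretical results for an MA(1) written with the subtraction convention of Equation 2.51: $\rho_1 = -\theta_1/(1+\theta_1^2)$, all higher autocorrelations are zero, and the partial autocorrelation at lag k has the closed form $-\theta_1^k(1-\theta_1^2)/(1-\theta_1^{2(k+1)})$ for $|\theta_1| < 1$. A Python sketch with hypothetical values of $\theta_1$:

```python
def ma1_acf_lag1(theta1):
    """Lag one autocorrelation of Y_t = c + e_t - theta1 * e_{t-1}; higher lags are zero."""
    return -theta1 / (1 + theta1 ** 2)

def ma1_pacf(theta1, max_lag):
    """Theoretical PACF at lags 1..max_lag for the same MA(1), assuming |theta1| < 1."""
    return [
        -(theta1 ** k) * (1 - theta1 ** 2) / (1 - theta1 ** (2 * (k + 1)))
        for k in range(1, max_lag + 1)
    ]

for theta1 in (0.6, -0.6):
    print(theta1, round(ma1_acf_lag1(theta1), 3), [round(a, 3) for a in ma1_pacf(theta1, 5)])
```

With $\theta_1 = 0.6$ every value is negative; with $\theta_1 = -0.6$ the partial autocorrelations alternate in sign, starting positive.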
[Figure: MA(1) Autocorrelation Function; Correlation (-1 to 1) against Lag]
Figure 2.23: Typical autocorrelation function for MA(1), positive $\theta_1$.
[Figure: MA(1) Partial Autocorrelation Function; Correlation (-1 to 1) against Lag]
Figure 2.24: Typical partial autocorrelation function for MA(1), positive $\theta_1$.
For the general moving average model of order q, MA(q), a characteristic pattern can be
anticipated. The autocorrelation function is expected to have spikes at lags 1 to q and
then cut off to zero, while the partial autocorrelation function is expected to decay
exponentially, potentially in a damped sine-wave pattern (Makridakis et al., 1998). Once
again, the expected behaviour of MA models in the autocorrelation and partial
autocorrelation functions is almost exactly swapped around from the AR model case.
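The cut-off after lag q follows because $Y_t$ and $Y_{t-k}$ share no error terms once k exceeds q. The sketch below computes the theoretical ACF of Equation 2.51 by writing the model as weights on $\varepsilon_t, \varepsilon_{t-1}, \dots, \varepsilon_{t-q}$; the MA(2) coefficients used are hypothetical.

```python
def ma_acf(thetas, max_lag):
    """Theoretical ACF (lags 1..max_lag) of Y_t = c + e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q}."""
    psi = [1.0] + [-t for t in thetas]          # weights on e_t, e_{t-1}, ..., e_{t-q}
    gamma0 = sum(w * w for w in psi)
    acf = []
    for k in range(1, max_lag + 1):
        # Only error terms shared by Y_t and Y_{t-k} contribute; none remain once k > q.
        gamma_k = sum(psi[i] * psi[i + k] for i in range(len(psi) - k))
        acf.append(gamma_k / gamma0)
    return acf

acf = ma_acf([0.5, -0.3], 6)
print([round(r, 3) for r in acf])   # non-zero at lags 1 and 2 only
```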
[Figure: MA(q) Autocorrelation Function; Correlation (-1 to 1) against Lag]
Figure 2.25: Typical autocorrelation function for MA(q).
[Figure: MA(q) Partial Autocorrelation Function; Correlation (-1 to 1) against Lag]
Figure 2.26: Typical partial autocorrelation function for MA(q).
2.4.10
ARMA (Autoregressive Moving Average) Models
Autoregressive moving average (ARMA) models are a useful set of time series models
that combine autoregressive (AR) and moving average (MA) models together. The
resulting form can be seen in Equation 2.53.
$$ Y_t = c + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q} \qquad (2.53) $$
All symbols used in Equation 2.53 are as defined in sections 2.4.8 and 2.4.9 for the AR
and MA models combined to form this model. The backshift notation form can be found
in Equation 2.54.
$$ (1 - \phi_1 B - \dots - \phi_p B^p) Y_t = c + (1 - \theta_1 B - \dots - \theta_q B^q)\varepsilon_t \qquad (2.54) $$
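Equation 2.53 can be read as a recursion that builds each value from past values and past errors. The Python sketch below applies it to a supplied error series; treating values before the start of the series as zero is an assumption of the sketch, and the coefficients are hypothetical. Feeding in a single unit shock shows how AR terms make its effect decay gradually, while MA terms make it vanish after q lags.

```python
def arma_filter(errors, c, phis, thetas):
    """Build Y_t = c + sum(phi_i Y_{t-i}) + e_t - sum(theta_j e_{t-j}) from an error series.

    Values before the start of the series are taken as zero (an assumption of this sketch).
    """
    y = []
    for t, e_t in enumerate(errors):
        ar_part = sum(phi * y[t - i] for i, phi in enumerate(phis, start=1) if t - i >= 0)
        ma_part = sum(theta * errors[t - j] for j, theta in enumerate(thetas, start=1) if t - j >= 0)
        y.append(c + ar_part + e_t - ma_part)
    return y

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]               # a single unit shock at t = 0
print(arma_filter(impulse, 0.0, [0.5], []))       # AR(1): effect decays geometrically
print(arma_filter(impulse, 0.0, [], [0.4]))       # MA(1): effect gone after one lag
print(arma_filter(impulse, 0.0, [0.5], [0.4]))    # ARMA(1,1)
```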
ARMA models and their components are identified by looking for the underlying patterns
of the AR and MA models discussed in sections 2.4.8 and 2.4.9. Experimentation is
common with ARMA models: different model components are tested to see which turn
out to be significant or useful. Specification of model components in AR, MA, ARMA
and ARIMA (section 2.4.11) models is not an exact science.
2.4.11
ARIMA (Autoregressive Integrated Moving Average) Models
All models looked at so far (AR, MA, ARMA) require a time series to be stationary.
Stationarity of variance is enforced by mathematical transformations while stationarity of
mean by differencing. See section 2.4.6 for more information on stationarity.
Autoregressive Integrated Moving Average models, hereafter referred to as ARIMA
models, allow for autoregressive (AR), integrated (I) and moving average (MA)
components. The term integrated refers to integration, which returns the series to its
original form after differencing and evaluation. The integration process is the inverse of
the differencing process.
ARIMA models are written in the form ARIMA (p, d, q) where p indicates the order of
the autoregressive component, d is the number of first differences used and q is the order
of the moving average component. All models previously investigated can be represented
using this ARIMA notation. For example, an autoregressive model of order three is
ARIMA (3, 0, 0), a moving average model of order two is ARIMA (0, 0, 2) and an ARMA
model that combines these two without differencing would be ARIMA (3, 0, 2). A
completely random (white noise) model is simply ARIMA (0, 0, 0).
ARIMA models are best shown using backshift notation as the formulae quickly become
very large and complicated without it (Makridakis et al., 1998). The general ARIMA (p,
d, q) formula using backshift notation is as shown in Equation 2.55. The symbols used in
Equation 2.55 are as defined in sections 2.4.8 and 2.4.9 for AR and MA models.
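$$ \underbrace{(1 - \phi_1 B - \dots - \phi_p B^p)}_{\text{AR}}\,\underbrace{(1 - B)^d}_{\text{I}}\,Y_t = c + \underbrace{(1 - \theta_1 B - \dots - \theta_q B^q)}_{\text{MA}}\,\varepsilon_t \qquad (2.55) $$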
Notice that the backshift notation of Equation 2.55 allows for ease of distinction between
autoregressive (AR), integrated (I) and moving average (MA) model components. An
example model for ARIMA (1, 2, 1) is shown in Equation 2.56.
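$$ \underbrace{(1 - \phi_1 B)}_{\text{AR}}\,\underbrace{(1 - B)^2}_{\text{I}}\,Y_t = c + \underbrace{(1 - \theta_1 B)}_{\text{MA}}\,\varepsilon_t \qquad (2.56) $$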
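Backshift polynomials multiply like ordinary polynomials in B, so the left side of Equation 2.56 can be expanded to recover an explicit regression on lagged values. A Python sketch, storing each polynomial as a list of coefficients on $B^0, B^1, \dots$ and using a hypothetical $\phi_1 = 0.5$:

```python
def poly_mul(a, b):
    """Multiply two backshift polynomials stored as coefficient lists [B^0, B^1, ...]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

phi1 = 0.5                                   # hypothetical AR coefficient
ar = [1.0, -phi1]                            # (1 - phi1 B)
diff = [1.0, -1.0]                           # (1 - B)
lhs = poly_mul(ar, poly_mul(diff, diff))     # (1 - phi1 B)(1 - B)^2
print(lhs)   # [1.0, -2.5, 2.0, -0.5]
```

In general the product is $1 - (2+\phi_1)B + (1+2\phi_1)B^2 - \phi_1 B^3$, so the ARIMA (1, 2, 1) model is equivalent to $Y_t = c + (2+\phi_1)Y_{t-1} - (1+2\phi_1)Y_{t-2} + \phi_1 Y_{t-3} + \varepsilon_t - \theta_1\varepsilon_{t-1}$.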
The inclusion of seasonality has so far been avoided in the ARIMA models presented.
However, seasonality is often an important component to include in analysis. Seasonal
components in ARIMA models are considered separately from other components because
they can have their own behaviour. Consider the hypothetical modelling of monthly
temperature over many years where there is a season of twelve months. There may be
non-stationarity over time, perhaps as a result of global warming. Temperatures in one
month may be related to the temperature in the same month in the previous year. This
hypothetical example highlights in the practical sense why seasonal components are
important and have their own characteristics that need to be modelled.
Incorporating seasonality into ARIMA models adds further complication, and these
models are referred to as ARIMA (p,d,q)(P,D,Q)s models. The p, d and q components
represent the non-seasonal autoregressive, integrated and moving average components
respectively, while P, D and Q represent the seasonal autoregressive, integrated and
moving average components respectively. The length of the season is denoted by the
parameter s. The general model in backshift notation is given in Equation 2.57
(Makridakis et al., 1998).
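$$ (1 - \phi_1 B - \dots - \phi_p B^p)(1 - \Phi_1 B^s - \dots - \Phi_P B^{Ps})(1 - B)^d (1 - B^s)^D Y_t = (1 - \theta_1 B - \dots - \theta_q B^q)(1 - \Theta_1 B^s - \dots - \Theta_Q B^{Qs})\varepsilon_t \qquad (2.57) $$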
Where:
$Y_t$ is the original time series observation at time t.
Each $\phi_i$ is a parameter estimate for the non-seasonal autoregressive component.
Each $\Phi_i$ is a parameter estimate for the seasonal autoregressive component.
Each $\theta_i$ is a parameter estimate for the non-seasonal moving average component.
Each $\Theta_i$ is a parameter estimate for the seasonal moving average component.
$\varepsilon_t$ is the random error at time t.
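Because the seasonal and non-seasonal factors in Equation 2.57 are multiplied, they generate cross terms at combined lags. The sketch below expands an ARIMA (1,0,0)(1,0,0)12 autoregressive side using hypothetical coefficients $\phi_1 = 0.2$ and $\Phi_1 = 0.25$; the helper names are illustrative.

```python
def poly_mul(a, b):
    """Multiply two backshift polynomials stored as coefficient lists [B^0, B^1, ...]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def seasonal_poly(coefs, s):
    """(1 - C_1 B^s - ... - C_P B^(P*s)) as a coefficient list."""
    out = [0.0] * (len(coefs) * s + 1)
    out[0] = 1.0
    for i, c in enumerate(coefs, start=1):
        out[i * s] = -c
    return out

phi = [0.2]     # hypothetical non-seasonal AR coefficient
Phi = [0.25]    # hypothetical seasonal AR coefficient, season length s = 12
nonseasonal = [1.0] + [-p for p in phi]
combined = poly_mul(nonseasonal, seasonal_poly(Phi, 12))
print({lag: round(c, 4) for lag, c in enumerate(combined) if c != 0})
```

The non-zero terms sit at lags 0, 1, 12 and 13, so the model implies $Y_t = \phi_1 Y_{t-1} + \Phi_1 Y_{t-12} - \phi_1\Phi_1 Y_{t-13} + \varepsilon_t$ (ignoring any constant).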
All parameters used in ARIMA models have logical restrictions on the values they can
take. For more information on these issues consult Chatfield (1980) or Makridakis et al.
(1998).
The general approach for ARIMA modelling is as follows:
1. Plot the time series variable against time and observe; if the variance is not
stationary over time, then transform.
2. If the data are not stationary around the mean (i.e. there is a trend of some sort),
then use differencing, applying first differencing at most twice (second-order differencing).
3. Examine the autocorrelation functions of the remaining time series. Seasonality,
autoregressive or moving average components (or a combination of these) may
reveal themselves.
The goal of ARIMA modelling is to end up with a random (white noise) series for the
error component. Once all components have been identified and extracted, this is what
should remain. This is checked by reviewing the residual autocorrelation function after
the model has been fitted.
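A basic version of this residual check computes the sample autocorrelations of the residuals and flags any that fall outside approximate $\pm 2/\sqrt{n}$ bounds, a common rule-of-thumb 95% band assumed here for illustration. The Python sketch uses an obviously non-random alternating series as a stand-in for poorly modelled residuals.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series (e.g. model residuals)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(1, max_lag + 1)]

def flag_residual_lags(residuals, max_lag):
    """Lags whose sample autocorrelation lies outside the approximate +/- 2/sqrt(n) bounds."""
    bound = 2 / math.sqrt(len(residuals))
    return [k for k, r in enumerate(sample_acf(residuals, max_lag), start=1) if abs(r) > bound]

alternating = [1.0, -1.0] * 50          # clearly not white noise
print(flag_residual_lags(alternating, 4))
```

For genuinely random residuals, roughly one lag in twenty is still expected to fall outside these bounds by chance, so an isolated flagged lag is not necessarily evidence of a missing model component.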
An alternative paradigm to ARIMA sometimes used for analysis of univariate time series
data is Winters three parameter exponential smoothing (WES). Three smoothing
operations are used to handle trend, seasonality and randomness. Additive and
multiplicative forms of WES are available.
The focus in the remainder of this section is on supplying accurate input and drawing
valid conclusions using time series tools. For this investigation the particular tool of
choice for aiding in calculations is the SAS statistical software package (SAS Institute,
1999). The SAS arima procedure is dedicated to the use of ARIMA models. Below is a
brief overview of the structure of ARIMA input for using SAS and the following example
utilises SAS for a practical application. For full details on the functionality available in
proc arima, it is recommended to consult a SAS/ETS user guide or standard SAS Help
(SAS Institute, 1999).
Within the SAS arima procedure, there are two main instruction lines that are
fundamental for ARIMA models. These are the identify and estimate instructions.
The var option in the identify statement specifies the time series to be analysed. It is at
this point that differences (the integrated component of ARIMA models) should be
specified if they are required. For example, to read in a variable A that needs first
differencing applied twice (second-order differencing), VAR=A(1,1) should be entered.
The stationarity option in the identify statement can be used to conduct Dickey-Fuller,
Phillips-Perron and Random Walk tests. The form of the Dickey-Fuller test given in
section 2.4.6 shows that the number of autoregressive components in the test is not fixed.
SAS supports the specifying of different numbers of components for inclusion in
stationarity tests.
Seasonal and non-seasonal autoregressive and moving average model components are
specified using the estimate statement. Autoregressive lags are specified with the P
option and moving average lags with the Q option. Because seasonal and non-seasonal
components are modelled separately, care must be taken to ensure that any analysis tool
takes this into account. For example, to include moving average components at lags one
and two and a seasonal moving average at lag twelve, Q=(1,2)(12) should be entered.
Since the goal of ARIMA modelling is to end up with purely random residuals,
information on the form of the residuals after model fitting is valuable. Adding the
keyword plot to the estimate statement produces autocorrelation plots of the errors
(residuals) after the model is fitted.
Example 9: Rainfall ARIMA Model
In this example an ARIMA model is developed for the monthly Buccan rainfall data from
1985 to 2001. The first task involved is to graph a time plot. The SAS Insight package
was used to draw a scatter plot with time on the x-axis and rainfall on the y-axis. The
resulting graph is shown in Figure 2.27.
Figure 2.27: Time plot of rainfall over time from 1985 to 2001 inclusive.
From personal experience, and because only sixteen years were covered, the mean and
variance of the rainfall data are not expected to change over time. Dickey-Fuller tests
were nevertheless conducted using SAS. Since the mean can take any value, the
Dickey-Fuller test of interest is the single mean augmented Dickey-Fuller test. SAS allows
a number of different autoregressive orders to be used in the base regression model of the
Dickey-Fuller tests. By default SAS tests autoregressive orders of 0, 1 and 2, but it was
decided to test with orders of 0, 1, 3 and 5. Appendix A contains the SAS code used for
these tests and Appendix B contains selected output.
For each single mean augmented Dickey-Fuller test, the null hypothesis of non
stationarity was confidently rejected (p values all well under 0.01). The Buccan rainfall
data was already stationary and hence differencing is not required.
The autocorrelation, partial autocorrelation and inverse correlation functions of the
original time series have already been investigated in Example 3. The main suggestions
from the analysis were that there is a twelve month seasonal pattern and a relationship
between rainfalls from one month to the next. The model to be tested therefore has a
seasonal (autoregressive) component of lag twelve, and a normal autoregressive
component of lag one.
$$ \text{ARIMA}(p,d,q)(P,D,Q)_s = \text{ARIMA}(1,0,0)(1,0,0)_{12} $$
$$ (1 - \phi_1 B)(1 - \Phi_1 B^{12}) Y_t = \varepsilon_t $$
Specifying this model in SAS was not difficult using the estimate line in proc arima.
The exact ARIMA code is included in Appendix A under this example. The estimate
line told SAS to treat the normal and seasonal components differently. The use of P is
as used throughout this thesis and refers to autoregressive components. Specifying plot
tells SAS to include autocorrelation plots of the random error component in the model.
Plotting the error components helps judge the legitimacy of the model created.
The ease of specifying models in SAS with the estimate line allowed quick testing and
evaluation of many different models. For experimentation, different models were tested
against the base model proposed from the initial analysis. However, the most suitable
model was found to be the initial model proposed above. Suitability was judged on the
significance of the factors in the model, the residual autocorrelation information and, to a
small extent, personal judgement. The remainder of this example describes exactly how
the decision to accept the initial model was made.
The two autoregressive components in the initial model (lag one and seasonal lag twelve) were both found to be significant. With both p values below 0.005, there was little doubt that each was contributing to the model.
One indicator of the best model is a minimal residual standard error. The initial model gave a standard error estimate of 80.44209, which was not bettered by any of the other models tested.
The autocorrelation, partial autocorrelation and inverse autocorrelation function plots of the residuals were carefully scrutinised. If the model is adequate, the residuals should behave like white noise and these plots should show no significant lags. The autocorrelation and partial autocorrelation functions of the residuals each had a significant lag at 16. This lag had little practical interpretation and was dismissed as sporadic after including it in models did not improve model performance. The inverse autocorrelation function revealed no lag anywhere near significance (including lag 16).
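The residual check can be sketched in Python as below. This is an illustration under assumptions: it uses the common approximate bounds of ±2/√n for a white noise autocorrelation, and a simulated series stands in for the model residuals. With roughly 5% bounds, about one lag in twenty is expected to fall outside them by chance even for pure white noise, which is consistent with treating an isolated, uninterpretable spike such as lag 16 as sporadic. The function names are hypothetical.

```python
import math
import random

def acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def flag_significant_lags(residuals, max_lag=24):
    """Lags whose residual autocorrelation lies outside +/- 2/sqrt(n)."""
    bound = 2 / math.sqrt(len(residuals))
    return [k for k, r in enumerate(acf(residuals, max_lag), start=1)
            if abs(r) > bound]

# Simulated white noise standing in for model residuals.
random.seed(2)
noise = [random.gauss(0, 1) for _ in range(240)]
print(flag_significant_lags(noise))
```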
Having accepted the initial model, the values of the model parameters can be retrieved from SAS. The overall mean rainfall estimate is 88.47 with a standard error of 8.91. The lag one parameter estimate is 0.19926 with a standard error of 0.069, while the seasonal lag twelve parameter estimate is 0.2267 with a standard error of 0.073. These parameters are shown below in the form of the original model.
$$(1 - 0.19926\,B)(1 - 0.2267\,B^{12})\,Y_t = \varepsilon_t$$
2.4.12
Forecasting
Although forecasting is a common application of time series models, it is not investigated in detail in this thesis as it is beyond the scope of this thesis given the time available. Forecasting is the prediction of future values given previous data. There is a large body of literature on forecasting, as it is of keen interest in many fields (economics, biological systems, etc.).
2.5
Multivariate Time Series Models
Most of the techniques looked at thus far have been concerned with one variable and are
hence referred to as univariate techniques. This may not realistically reflect the situation
which is to be modelled. Models may be required that include more than one time series.
A variable may be more accurately modelled and predicted given information from a
number of different time series.
There are two main classes of multiple variable (multivariate) time series analysis.
Naming for these different methodologies is inconsistent and therefore in this thesis a
standard naming convention is adopted. Multivariate ARIMA models use many time
series to predict only one time series while vector ARIMA models use multiple time
series to predict multiple time series. For example, in multivariate ARIMA a daily
temperature time series may be modelled from previous daily temperature, humidity, and
atmospheric pressure. In vector ARIMA, all of the variables (daily temperature, humidity
and atmospheric pressure) may be modelled simultaneously from previous values of those
same variables. Both multivariate ARIMA and vector ARIMA methodologies are
extensions of univariate (single variable) ARIMA techniques.
2.5.1
Multivariate ARIMA Models
Multivariate ARIMA models allow for the prediction of one time series from a number of time series (including itself). This is of use when we suspect that another time series may be affecting the time series we are analysing and want to include this relationship in the model. We will refer to these additional influential time series as explanatory variables.
For example, daily temperature may perhaps be predicted better from including the
explanatory variables atmospheric pressure, humidity and rainfall in analysis.
To be included in a multivariate model, an explanatory time series may affect the time
series being modelled, but not vice versa. That is, in our model of daily temperature,
humidity may affect the temperature but temperature is not allowed to affect humidity. If
there are relationships in both directions, a more general approach like vector ARIMA is
more appropriate (Makridakis et al., 1998).
Conceptually, multivariate ARIMA simply adds the effect of any number of explanatory variables ($X_1, X_2, \ldots, X_n$) on top of a standard univariate ARIMA model. This concept is shown in Equation 2.58, where a function of each explanatory time series $X_i$ is added to a standard univariate ARIMA model for Y, denoted N.
$$Y_t = f_1(X_1) + f_2(X_2) + \cdots + f_n(X_n) + N_t \qquad (2.58)$$
Each additional time series $X_i$ may enter the model at lag 0, at any number of other lags, or both. This is because the relationship between a cause in X and its effect in Y may be delayed.
There is a general backshift notation for writing these multivariate models. The case of one explanatory variable is shown in Equation 2.59 (Makridakis et al., 1998). Generalisations with more explanatory variables simply sum the effects of the separate explanatory variables.
$$Y_t = c + v(B)X_t + N_t \qquad (2.59)$$
Where:
c is a constant.
Y is the time series being investigated.
X is an explanatory time series variable.
v(B) is the transfer function that calculates the effect of X on Y. It is defined by $v(B) = v_0 + v_1 B + v_2 B^2 + \cdots + v_k B^k$.
$N_t$ is an ARIMA model for the time series Y.
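To make the transfer function concrete, the Python sketch below applies v(B) to an explanatory series, so that v(B)X_t = v₀X_t + v₁X_{t-1} + ... + v_kX_{t-k}, and then forms Y_t = c + v(B)X_t + N_t as in Equation 2.59. The function names are hypothetical and the noise series N_t is simply supplied by the caller rather than generated by an ARIMA model.

```python
def apply_transfer_function(v, x):
    """Return v(B) X_t = v0*X_t + v1*X_(t-1) + ... + vk*X_(t-k).

    The output starts at t = k, the first time all k lags are available.
    """
    k = len(v) - 1
    return [sum(v[j] * x[t - j] for j in range(k + 1))
            for t in range(k, len(x))]

def transfer_model(c, v, x, noise):
    """Y_t = c + v(B) X_t + N_t, with N_t supplied aligned to the output."""
    effect = apply_transfer_function(v, x)
    return [c + e + n for e, n in zip(effect, noise)]

# A single pulse in X spreads into Y over lags 0, 1 and 2.
print(apply_transfer_function([0.5, 0.3, 0.2], [0, 0, 1, 0, 0, 0]))
```

Feeding in a single pulse returns the weights v₀, v₁, v₂ in order, which is why the v_j are called impulse response weights: they show how a one-off change in X is spread over current and later values of Y.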
Sometimes the transfer function v(B) is represented by the ratio $\omega(B)/\delta(B)$, where $\omega(B) = \omega_0 - \omega_1 B - \omega_2 B^2 - \cdots - \omega_s B^s$ and $\delta(B) = 1 - \delta_1 B - \delta_2 B^2 - \cdots - \delta_r B^r$. This alternative formulation provides a more efficient parameterisation, since it commonly requires fewer parameter estimates.
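The efficiency of the ratio form can be seen by recovering the weights v_j from ω(B) and δ(B). Matching powers of B in δ(B)v(B) = ω(B) gives the recursion v_j = δ₁v_{j-1} + ... + δ_rv_{j-r} + c_j, where c₀ = ω₀, c_j = -ω_j for 1 ≤ j ≤ s and c_j = 0 beyond that. The Python sketch below implements this recursion, assuming the sign conventions for ω(B) and δ(B) given in this section; the function name is illustrative.

```python
def impulse_weights(omega, delta, n_weights):
    """Impulse response weights v_0, v_1, ... of v(B) = omega(B) / delta(B).

    omega = [w0, w1, ..., ws] represents w0 - w1 B - ... - ws B^s.
    delta = [d1, ..., dr] represents 1 - d1 B - ... - dr B^r.
    """
    v = []
    for j in range(n_weights):
        if j == 0:
            c = omega[0]
        elif j < len(omega):
            c = -omega[j]
        else:
            c = 0.0
        ar_part = sum(d * v[j - i] for i, d in enumerate(delta, start=1)
                      if j - i >= 0)
        v.append(ar_part + c)
    return v

# Two parameters (w0 = 2, d1 = 0.5) generate geometrically decaying weights.
print(impulse_weights([2.0], [0.5], 5))
```

Here two parameters describe an effect that decays over every future lag, whereas writing v(B) directly would need a separate v_j for each lag retained.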
There are two common techniques for fitting multivariate ARIMA models. The first involves the use of prewhitening and cross correlation, while the second, called the linear transfer function method, is a more modern and more precise methodology (Makridakis et al., 1998).
The older method was developed by Box and Jenkins in 1970. McCleary and Hay (1980) provide a number of examples of its use. The basic steps involved are as follows:
1. Each time series is made stationary. An ARIMA model is then fitted to the explanatory series and the same filter is applied to both series, so that the filtered explanatory series is white noise. This filtering is referred to as prewhitening.
2. Explanatory variables at various lags are decided upon and fitted from the cross correlation functions of the prewhitened series (see section 2.2.4).
3. An ARIMA model is fitted to the time series of focus by using standard univariate methods on the residuals from the last phase.
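Prewhitening can be sketched in Python as follows. For brevity this assumes the explanatory series follows an AR(1) model (in practice any ARIMA model may be needed), fitted by the Yule-Walker estimate. Both series are then passed through the same filter (1 - φB), and the impulse weights are estimated with the usual Box-Jenkins identification formula v_k = r_αβ(k)·s_β/s_α, where α and β are the prewhitened explanatory and response series. The data are simulated with a known effect of 3 at lag 2, and all names are illustrative.

```python
import random

def ar1_coefficient(x):
    """Yule-Walker AR(1) estimate: the lag-1 sample autocorrelation of x."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    return sum(d[t] * d[t - 1] for t in range(1, n)) / sum(v * v for v in d)

def ar1_filter(series, phi):
    """Apply (1 - phi B): returns series_t - phi * series_(t-1) for t >= 1."""
    return [series[t] - phi * series[t - 1] for t in range(1, len(series))]

def prewhiten(x, y):
    """Filter x and y with the AR(1) filter fitted to the explanatory series x."""
    phi = ar1_coefficient(x)
    return ar1_filter(x, phi), ar1_filter(y, phi), phi

def impulse_estimate(alpha, beta, k):
    """v_k = r_ab(k) * s_b / s_a, i.e. cov(alpha_(t-k), beta_t) / var(alpha)."""
    n = len(alpha)
    ma, mb = sum(alpha) / n, sum(beta) / n
    cov = sum((alpha[t - k] - ma) * (beta[t] - mb) for t in range(k, n)) / n
    var_a = sum((a - ma) ** 2 for a in alpha) / n
    return cov / var_a

# Simulated example: x is AR(1) and y responds to x two periods later.
random.seed(3)
x = [0.0]
for _ in range(499):
    x.append(0.7 * x[-1] + random.gauss(0, 1))
y = [0.0, 0.0] + [3.0 * x[t - 2] for t in range(2, 500)]
alpha, beta, phi = prewhiten(x, y)
print(round(phi, 2), [round(impulse_estimate(alpha, beta, k), 2) for k in range(4)])
```

Because the filtered explanatory series is close to white noise, the estimated weights pick out lag 2 cleanly instead of smearing the effect across neighbouring lags.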
The more modern linear transfer function (LTF) method involves a number of different steps but is more precise. Makridakis et al. (1998) contains various pointers for achieving a successful model. The summarised LTF method steps are as follows:
1. A regression model is fitted using lags of the explanatory variables. The number of lags used should be enough to capture all potential relationships.
2. Stationarity is enforced (for example by differencing) if the errors from the initial regression model are not stationary.
3. Transfer functions are found to convert the explanatory time series into effects on the time series of focus.
4. A model is fitted using the transfer functions, and an ARMA model is fitted to the errors resulting from this fit.
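Step 1 of the LTF method amounts to an ordinary least squares regression of Y on the current and lagged values of X. The Python sketch below does this with the normal equations and a small Gaussian elimination solver so that it needs no external libraries. It is an illustration only: it ignores the autocorrelated errors that steps 2 to 4 deal with, the data are artificial with known coefficients, and the function names are hypothetical.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lagged_regression(y, x, k):
    """OLS fit of y_t = c + v0 x_t + v1 x_(t-1) + ... + vk x_(t-k) + e_t.

    Returns [c, v0, ..., vk]; the first k observations are lost to lagging.
    """
    rows = [[1.0] + [x[t - j] for j in range(k + 1)] for t in range(k, len(y))]
    targets = [y[t] for t in range(k, len(y))]
    p = k + 2
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Illustrative data: y depends on x at lags 0 and 1 only (no noise).
random.seed(4)
x_demo = [random.gauss(5, 2) for _ in range(100)]
y_demo = [0.0] + [2.0 + 3.0 * x_demo[t] + 0.5 * x_demo[t - 1] for t in range(1, 100)]
print([round(v, 3) for v in lagged_regression(y_demo, x_demo, 2)])
```

Including more lags than needed (here lag 2) is harmless in this first step: the unneeded coefficient is estimated as essentially zero, which is how irrelevant lags are identified.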
Once an entire model has been fitted, success is measured in the same way as for univariate ARIMA models. Autocorrelation plots of the model residuals should show no significant lags. The mean square error and, if available, Akaike's Information Criterion (AIC) should be minimised.
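The difference between the two criteria is that AIC penalises extra parameters. The Python sketch below uses the Gaussian approximation AIC = n·ln(σ̂²) + 2k, which differs from the -2 ln L + 2k form reported by packages such as SAS only by a constant when the models are fitted to the same data. The residual variances and parameter counts are hypothetical.

```python
import math

def approx_aic(n, sigma2, k):
    """AIC up to an additive constant: n * ln(sigma2) + 2k.

    n is the number of residuals, sigma2 the (maximum likelihood) residual
    variance and k the number of estimated parameters. Only differences
    between models fitted to the same data are meaningful.
    """
    return n * math.log(sigma2) + 2 * k

# Hypothetical comparison: model B fits slightly better but uses two more parameters.
aic_a = approx_aic(100, 1.00, 2)
aic_b = approx_aic(100, 0.99, 4)
print(round(aic_a, 3), round(aic_b, 3), "prefer A" if aic_a < aic_b else "prefer B")
```

A model can therefore have a slightly larger residual variance (and standard error) yet still be preferred on AIC because it uses fewer parameters.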
A commonly seen multivariate ARIMA model is ARMAX. An ARMAX(p, q, r) model is a standard autoregressive moving average model with autoregressive (AR) order p, moving average (MA) order q and explanatory (X) variables of order r. ARMAX models result from looking at a single dependent time series within a vector ARMA model (Franses, 1998). They contain the effects of many independent variables but only one dependent time series variable. In effect, ARMAX models are a version of the general multivariate models discussed in this section in which the explanatory variables have a fixed order r. Another common multivariate ARIMA form is the dynamic regression (DREG) model, which actually predates ARIMA (Stegiou et al., 1997).
The SAS statistical package (SAS Institute, 1999) contains functionality within proc arima to cope with these multivariate models. Within the identify statement, the crosscorr option allows explanatory variables to be specified along with their level of differencing. Then, within the estimate statement, the input option specifies which explanatory variables (and which lags of them) are included in the model.
Example 10: Rainfall Multivariate ARIMA Model
In this example we extend the univariate ARIMA model (see Example 9) deduced earlier
to make a multivariate model. While we still wish to focus on monthly rainfall, we will
include the days of rain in each month as an explanatory variable.
Dickey-Fuller stationarity tests revealed that rainfall and days of rain are both stationary and do not need transformation. A cross correlation function between rainfall and days of rain (see Example 4) showed a strong correlation at lag 0. This suggests a model in which only lag 0 of days of rain is included.
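The cross correlation function used here can be computed as in the Python sketch below, where a positive lag asks whether past values of the explanatory series are related to current values of the series of focus. The monthly values are hypothetical, not the Buccan data, and the function name is illustrative.

```python
import math

def cross_correlation(x, y, lag):
    """Sample cross-correlation between x_(t-lag) and y_t.

    A positive lag asks whether past x is related to current y; a negative
    lag asks whether past y is related to current x.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    total = sum((x[t - lag] - mx) * (y[t] - my)
                for t in range(max(0, lag), min(n, n + lag)))
    return total / (n * sx * sy)

# Hypothetical monthly values (not the Buccan data).
rain_days = [10, 12, 8, 15, 6, 9, 14, 11]
rainfall = [120, 160, 90, 210, 70, 100, 190, 140]
for lag in range(-2, 3):
    print(lag, round(cross_correlation(rain_days, rainfall, lag), 3))
```

A pattern like the one found in this example, a large value at lag 0 and small values elsewhere, points to a same-month relationship only.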
For the time being, the LTF (linear transfer function) method will be used, running a model with lags 0 to 10 of days of rain included. The model used is shown below, and Appendix A contains the SAS code to run it. The important components to note are the crosscorr option, which allows explanatory variables to be specified, and the input option, which specifies the explanatory variable components to be included in the model.
$$Y_t = c + (v_0 + v_1 B + v_2 B^2 + \cdots + v_{10} B^{10})X_t + \varepsilon_t$$
As expected, the results from this model showed only the first parameter (lag 0, $v_0$) to be anywhere near significant. In addition, the resulting residual autocorrelation plots revealed no significant lags. The SAS parameter estimates used to establish this are given in Appendix B.
Given that only the lag 0 term was significant, the model was rerun with only lag 0 of the explanatory variable included. The model used is shown below, and selected SAS output is given in Appendix B. The autocorrelation functions of the residuals for this model again showed no meaningful significant lags.
$$Y_t = c + v_0 X_t + \varepsilon_t$$
Normally at this point an ARMA model would be fitted to the errors to represent the
influence of previous rainfall values and errors on the current model. However, since
residual autocorrelation plots showed no meaningful significant lags, this is unlikely to
provide a better model.
Purely for experimentation, a model was then run that includes separate autoregressive components at lag one and lag twelve (seasonal), which were significant in the univariate rainfall model. The relevant model is shown below, and the SAS input is given in Appendix B.
$$(1 - \phi_1 B)(1 - \Phi_1 B^{12})\,Y_t = c + v_0 X_t + \varepsilon_t$$
As expected, the autoregressive components that were significant in the univariate model
were found not to be significant in this multivariate case. The effect of days of rain at lag
0 was found to be far more crucial than the lagged autoregressive terms.
The simpler model (lag 0 of days of rain only) was found to have a slightly higher standard error but performed better on the AIC. The differences were small in both cases, though. Accepting the basic model with only lag 0 of days of rain included, its numerical form is then derived. The model with parameter estimates (standard errors in parentheses) is shown below.
$$Y_t = c + v_0 X_t + \varepsilon_t$$
$$\text{Rain}_t = \underset{(11.073)}{17.1837} + \underset{(1.221)}{12.86902}\,\text{Days}_t + \varepsilon_t$$
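As a quick reading of the fitted equation, the Python sketch below evaluates it for a few numbers of rain days. Each extra rain day adds about 12.87 to the predicted monthly rainfall, in the units of the rainfall series. The function name is illustrative, and the prediction ignores the error term.

```python
INTERCEPT = 17.1837   # estimate of c (standard error 11.073)
SLOPE = 12.86902      # estimate of v0 (standard error 1.221)

def predicted_rainfall(rain_days):
    """Fitted rainfall for a month from the number of rain days in that month."""
    return INTERCEPT + SLOPE * rain_days

for days in (0, 5, 10, 20):
    print(days, round(predicted_rainfall(days), 2))
```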
A final note on this example is that it does not necessarily make much sense or meet the requirements of a multivariate time series model. While we know that rain days affect rainfall, is it also safe to assume that rainfall does not affect rain days? If rainfall does affect rain days, then another model, such as vector ARIMA, should be used. Furthermore, the use of lag 0 of rain days means that rainfall in a particular month is essentially being predicted from the number of rain days in that same month. The usefulness of such a model, in which previous values play no role whatsoever, is indeed questionable.
2.5.2
Vector ARIMA Models
Vector ARIMA models are an extension of base ARIMA models where instead of
looking at an individual time series, many time series are looked at concurrently. This
view allows for investigation of complex interrelations between different time series,
common components and so forth (Pynnönen, 2001).
All time series to be included in vector ARIMA must first be stationary so they can be modelled and compared (Franses, 1998). This removal of non-stationarity before analysis is the reason why these methods are referred to in the literature as vector ARMA rather than vector ARIMA.
As is the case with standard ARIMA models, there are a few separate models often used
within the class of vector ARIMA. These are vector AR (autoregression), vector MA
(moving average) and vector ARMA (autoregressive moving average). Commonly the
vector part is shortened to V and these models are referred to as VAR, VMA and
VARMA models.
The VARMA model can be represented in exactly the same form as univariate ARMA models (Equation 2.60), except that each time series component is a vector of m values (one per series) rather than a single value, and each coefficient is an m × m matrix. In Equation 2.61 the extended general form is shown using vectors and matrices (Franses, 1998).
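$$\mathbf{Y}_t = \mathbf{c} + \Phi_1\mathbf{Y}_{t-1} + \cdots + \Phi_p\mathbf{Y}_{t-p} + \Theta_1\boldsymbol{\varepsilon}_{t-1} + \cdots + \Theta_q\boldsymbol{\varepsilon}_{t-q} + \boldsymbol{\varepsilon}_t \qquad (2.60)$$
$$
\begin{aligned}
\begin{pmatrix} Y_{1,t} \\ Y_{2,t} \\ \vdots \\ Y_{m,t} \end{pmatrix}
={}& \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix}
+ \begin{pmatrix}
\phi_{11,1} & \phi_{12,1} & \cdots & \phi_{1m,1} \\
\phi_{21,1} & \phi_{22,1} & \cdots & \phi_{2m,1} \\
\vdots & \vdots & \ddots & \vdots \\
\phi_{m1,1} & \phi_{m2,1} & \cdots & \phi_{mm,1}
\end{pmatrix}
\begin{pmatrix} Y_{1,t-1} \\ Y_{2,t-1} \\ \vdots \\ Y_{m,t-1} \end{pmatrix}
+ \cdots +
\begin{pmatrix}
\phi_{11,p} & \phi_{12,p} & \cdots & \phi_{1m,p} \\
\phi_{21,p} & \phi_{22,p} & \cdots & \phi_{2m,p} \\
\vdots & \vdots & \ddots & \vdots \\
\phi_{m1,p} & \phi_{m2,p} & \cdots & \phi_{mm,p}
\end{pmatrix}
\begin{pmatrix} Y_{1,t-p} \\ Y_{2,t-p} \\ \vdots \\ Y_{m,t-p} \end{pmatrix} \\
&+ \begin{pmatrix}
\theta_{11,1} & \theta_{12,1} & \cdots & \theta_{1m,1} \\
\theta_{21,1} & \theta_{22,1} & \cdots & \theta_{2m,1} \\
\vdots & \vdots & \ddots & \vdots \\
\theta_{m1,1} & \theta_{m2,1} & \cdots & \theta_{mm,1}
\end{pmatrix}
\begin{pmatrix} \varepsilon_{1,t-1} \\ \varepsilon_{2,t-1} \\ \vdots \\ \varepsilon_{m,t-1} \end{pmatrix}
+ \cdots +
\begin{pmatrix}
\theta_{11,q} & \theta_{12,q} & \cdots & \theta_{1m,q} \\
\theta_{21,q} & \theta_{22,q} & \cdots & \theta_{2m,q} \\
\vdots & \vdots & \ddots & \vdots \\
\theta_{m1,q} & \theta_{m2,q} & \cdots & \theta_{mm,q}
\end{pmatrix}
\begin{pmatrix} \varepsilon_{1,t-q} \\ \varepsilon_{2,t-q} \\ \vdots \\ \varepsilon_{m,t-q} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \vdots \\ \varepsilon_{m,t} \end{pmatrix}
\qquad (2.61)
\end{aligned}
$$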
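The autoregressive part of Equation 2.61 is plain matrix-vector arithmetic. The Python sketch below computes c + Φ₁Y_{t-1} + ... + Φ_pY_{t-p} for a two-series example with made-up coefficients. The moving average terms of a full VARMA model are deliberately omitted, and the function names are illustrative. The off-diagonal entry φ₁₂,₁ = 0.2 is what lets last period's value of series 2 feed into series 1.

```python
def mat_vec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def var_conditional_mean(c, phis, history):
    """c + Phi_1 Y_(t-1) + ... + Phi_p Y_(t-p) for a VAR(p) model.

    phis[k - 1] is the m x m matrix Phi_k and history[-k] is the vector Y_(t-k).
    The moving average terms of a full VARMA model are omitted here.
    """
    mean = list(c)
    for k, phi in enumerate(phis, start=1):
        contribution = mat_vec(phi, history[-k])
        mean = [a + b for a, b in zip(mean, contribution)]
    return mean

# Two series with made-up coefficients.
c = [1.0, 0.5]
phi_1 = [[0.5, 0.2],
         [0.1, 0.3]]
print(var_conditional_mean(c, [phi_1], [[2.0, 4.0]]))
```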
A side effect of vector ARIMA models is that there is a large number of parameters to be estimated. When there is not enough data to estimate the required parameters reliably, the situation is known as overfitting. Continuing when overfitting is evident can result in grossly inaccurate and misleading results. The number of parameters in vector ARMA models is given in Equation 2.62, where m is the number of variables, p the order of the autoregressive component and q the order of the moving average component (Franses, 1998).
$$m + m^2(p + q) \qquad (2.62)$$
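The quadratic growth in Equation 2.62 is easy to tabulate. The Python sketch below counts the intercepts and coefficient matrices only; it does not include the error covariance matrix. The function name is illustrative.

```python
def varma_parameter_count(m, p, q):
    """Number of parameters m + m^2 (p + q) in a VARMA(p, q) model (Equation 2.62)."""
    return m + m * m * (p + q)

for m in range(1, 6):
    print(m, varma_parameter_count(m, p=2, q=1))
```

For a VARMA(2, 1) model the count rises from 4 parameters for a single series to 80 for five series, which is why short multiple time series quickly become unsuitable.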
The number of parameters involved is the key to why these models are not investigated
further in this dissertation. The length of a time series needs to be suitably long to model
the number of parameters involved. The application data for this thesis does not provide
anywhere near the length necessary for vector ARIMA given the number of parameters.
Whereas multivariate ARIMA models allow for the inclusion of additional explanatory
variables in ARIMA, multivariate dynamic regression models allow for additional
explanatory variables in vector ARIMA. Multivariate dynamic regression models allow
for improved modelling of a number of dependent variables in vector ARIMA by
incorporating additional variables purely for explanatory purposes.
Autocorrelation and autocovariance are generalised to autocorrelation and autocovariance matrices for the vector ARIMA application. The residuals are assumed to follow a multivariate normal distribution rather than a univariate normal distribution. Judging which elements of these matrices are significant helps decide an appropriate order for the autoregressive components. Provided with sufficient data, vector ARIMA can be used for a number of different purposes. Particularly useful is the modelling and evaluation of complex interrelations between multiple time series.
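A lag-k sample correlation matrix can be computed as in the Python sketch below. It assumes the convention that entry [i][j] relates series i at time t to series j at time t - k (texts differ on this ordering), and it flags entries with the rough ±2/√n bound used for univariate autocorrelations. The two artificial series are built so that series 2 copies series 1 one step later. All names are illustrative.

```python
import math

def correlation_matrix_at_lag(series, k):
    """Lag-k sample correlation matrix R(k) for m equal-length series.

    Entry [i][j] relates series i at time t to series j at time t - k.
    """
    m, n = len(series), len(series[0])
    means = [sum(s) / n for s in series]
    sds = [math.sqrt(sum((v - mu) ** 2 for v in s) / n)
           for s, mu in zip(series, means)]
    r = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            total = sum((series[i][t] - means[i]) * (series[j][t - k] - means[j])
                        for t in range(k, n))
            r[i][j] = total / (n * sds[i] * sds[j])
    return r

def significant_entries(r, n):
    """(i, j) positions whose correlation lies outside +/- 2/sqrt(n)."""
    bound = 2 / math.sqrt(n)
    return [(i, j) for i, row in enumerate(r)
            for j, value in enumerate(row) if abs(value) > bound]

# Series 2 copies series 1 one step later, so R(1)[1][0] should be large.
s1 = [1.0, -1.0] * 20
s2 = [0.0] + s1[:-1]
r1 = correlation_matrix_at_lag([s1, s2], 1)
print([[round(v, 2) for v in row] for row in r1], significant_entries(r1, 40))
```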
Vector ARIMA is beyond the capabilities of SAS proc arima (SAS Institute, 1999). Instead, SAS offers proc statespace, which can evaluate these models. Once again, full investigation is considered beyond the scope of this dissertation.
3 THEORY LITERATURE REVIEW

The purpose of this theoretical literature review is to sample progress in the multivariate
time series field. Many situations involve the simultaneous recording of more than one
time series. There is considerable variety in the types of data recorded and purposes for
analysis. Therefore, multivariate time series analyses have been the subject of a
significant amount of theoretical development in recent years.
For the purpose of review the focus was on more recent articles, with all of the articles finally reviewed published in the last eight years. A large portion of the multiple variable time series literature found from these years was tightly linked with a particular field. The main reason for these strong links was that the type of data analysis was only of value to the given field. Such divergent lines of analysis are a natural result of the different types of data involved and the different information to be extracted by analysis.
Articles on multivariate time series were rarely found in common statistics journals. No particular journal was found that specialises in multivariate time series.
The reason for this is that there is an interest in extracting meaning from naturally
occurring phenomena in a wide range of situations from diverse fields. Each field
develops multivariate time series techniques to suit their particular purpose and
contributes towards the overall wealth of knowledge. For this reason the theoretical
research contained in this review covers a diverse range of different research areas, with
no particular journals having dominance. The journals found with multivariate time series
literature included Computational Statistics and Data Analysis, Artificial Intelligence in
Medicine, Neurocomputing, Physica D, International Journal of Modern Physics,
Reliability Engineering and System Safety and Aquatic Living Resources. The different
fields that arose from a sample of relevant articles found are shown in Table 3.1.
Article                   Year   Field of Research (Statistics, Physics, Pattern Recognition, Medicine, Forestry, Computing, Chemistry)
Akman and De Gooijer      1996
Cao et al.                1998
Crucianu et al.           2001
Guimarães et al.          2001
Kulkarni and Parikh       2000
Lu et al.                 2001
Maharaj                   1999
Nemec                     1995
Ørstavik et al.           2000
Paluš                     1996
Pynnönen                  2001
Reick and Page            2000
Repucci et al.            2001
Swift and Liu             2002
Swift et al.              2001
Wilson et al.             2001
Table 3.1: The fields of research involved in recent theoretical articles.
Table 3.2 provides an overview of the base theoretical aspects found in recent articles.
Advanced time series text books commonly contain information on these base theories
used (eg. ARIMA, vector ARIMA). For the most part, recent articles found advance upon
particular aspects involved in this base theory. Where an article addresses more than one
base concept, usually concepts are considered in isolation and occasionally they are
combined. It is important to remember that while there may be different base concepts,
they are linked by the data they analyse and hence combinations of methods are always a
possibility for the future. The main areas of base theoretical concepts found in literature
from the last eight years are:
ARIMA: Multiple independent variable forms of autoregressive integrated moving average (ARIMA) models.
Vector ARIMA: Vector based versions of ARIMA, supporting multiple dependent and independent variables. Includes vector AR (VAR) and vector ARMA (VARMA) among others.
Bayesian: A more recent statistical paradigm allowing for the inclusion of prior information and additional experimental flexibility.
Component extraction: Finding common components (properties) within a number of time series.
Grouping variables: Placing a number of time series into groups depending on similarities.
Patterns: Searching for patterns in a number of time series. Analysing patterns may be used for diagnosis using artificial intelligence, for grouping together time series and so forth.
State-Space Modelling: A multiple time series analysis technique related to vector ARIMA.
Nonlinear Methods: Analysis for when standard assumptions of linear relationships are invalid.
The remainder of this chapter provides an overview of recent multivariate time series developments in a range of areas. The chapter is arranged into the following broad categories: ARIMA developments, ARIMA alternatives, Bayesian developments, nonlinear developments and miscellaneous developments.
Article                   Year   Techniques Involved (ARIMA, Vector ARMA, Bayesian, Component Extraction, Grouping Variables, Patterns, State-Space Modelling, Nonlinear Methods)
Akman and De Gooijer      1996
Cao et al.                1998
Crucianu et al.           2001
Guimarães et al.          2001
Kulkarni and Parikh       2000
Lu et al.                 2001
Maharaj                   1999
Nemec                     1995
Ørstavik et al.           2000
Paluš                     1996
Pynnönen                  2001
Reick and Page            2000
Repucci et al.            2001
Swift and Liu             2002
Swift et al.              2001
Wilson et al.             2001
Table 3.2: Aspects looked at by researchers in recent theoretical articles.
3.1
AR/ARMA/ARIMA Developments
ARIMA models and variants of them are common in standard texts on time series
analysis. It is therefore of little surprise that multiple variable forms of these models are
the subject of ongoing research. ARIMA based models are applicable to multiple time
series where there are fixed time intervals, linear relationships and time series that are not
short.
Time series and repeated measures options available and currently in use are reviewed by Nemec (1995) without introducing new material. The purpose of Nemec's review was to provide an understandable set of standard techniques for analysing time series situations that arise in forestry. Nemec (1995) looks at the base theory of time series, including correlation functions, ARIMA models and forecasting, before looking at the functionality provided by the SAS statistical package for analysis.
Pynnönen (2001) provides a comprehensive set of lecture notes that cover and advance upon the topics covered by Nemec (1995). Comparison with simpler material such as Chatfield (1980) suggests the lecture notes are mathematically accurate. The purpose of the work by Pynnönen was to present a set of ARIMA based methods and techniques suitable for use on economic multiple time series. Base univariate AR, MA, ARMA and ARIMA models are reviewed by Pynnönen (2001). A detailed review of concepts in vector AR and vector ARMA multiple time series models features in the remainder of the notes. Section 2.5.2 contains more information on vector ARIMA based methods.
Cointegration is a commonly used concept in recent literature, with particular relevance in finance for the long term modelling of economic time series (see Felmingham et al., 2000). When a time series has to be first differenced to be stationary before ARIMA modelling, it is referred to as being I(1), a notation taken from the I for integrated in ARIMA models. Two time series X and Y are cointegrated if they are both I(1) and there exists an a (a ≠ 0) such that the difference between Y and aX is a stationary, I(0), time series (Equation 3.1). The existence of cointegration suggests a long term relationship between the two time series, and tests are available to test the significance of cointegration relationships. An example of the use of a cointegration test can be found in Felmingham et al. (2000). Introductory cointegration information can be found in a number of financial application papers, including Diamandis et al. (2000), Felmingham et al. (2000), and Green and Sparks (1999).
$$Y_t - aX_t \sim I(0) \qquad (3.1)$$
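A common first step in checking Equation 3.1 is to estimate a by regressing Y on X and then examine whether the residuals Y - aX are stationary. The Python sketch below shows only the regression step on simulated data, where X is a random walk (an I(1) series) and Y shares its stochastic trend with a = 2. Including an intercept b in the regression is an assumption, and the stationarity test on the residuals is not shown; residual-based tests need their own critical values rather than the ordinary Dickey-Fuller ones. The function name is illustrative.

```python
import random

def cointegrating_regression(y, x):
    """OLS fit of y_t = b + a x_t + u_t; returns a, b and the residuals u_t."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    residuals = [yi - b - a * xi for xi, yi in zip(x, y)]
    return a, b, residuals

random.seed(5)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + random.gauss(0, 1))           # random walk: I(1)
y = [2.0 * xi + random.gauss(0, 1) for xi in x]    # shares x's stochastic trend
a, b, u = cointegrating_regression(y, x)
print(round(a, 3), round(b, 3))
```

The residuals u wander around zero with roughly constant spread even though x and y themselves drift, which is the behaviour a cointegration test looks for.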
When a number of time series are available on fundamentally the same quantity, parallel
time series often arise. Akman and De Gooijer (1996) devise a method for finding
common components in parallel time series. Finding the common components in multiple
time series can give interesting information about underlying patterns. In effect, these methods provide an additional tool for the types of multiple time series commonly found in text books and literature.
Common autoregressive (AR) and moving average (MA) components are searched for in
parallel time series using what Akman and De Gooijer (1996) call component extraction
analysis. Each time series was first assumed correlated with every other time series and
represented by a (stationary) ARMA model of order (p, q). The multiple time series are
then used to find common components. The component extraction method was applied to
each original univariate time series to result in a univariate common component time
series. Standard univariate time series analysis techniques are then advocated for the
modelling and forecasting of these resulting series. Simulation using varying sample sizes
and fundamental models was used to test the performance of Akman and De Gooijer's
(1996) proposed techniques. A medical example of male and female death rates from
certain diseases provides a demonstration of a typical practical usage. Although
mathematically intensive and not intuitive theoretically, the method performs well and
should be useful in the practical sense.
When a number of multiple variable time series as seen in Akman and De Gooijer (1996)
are available, these may need to be placed in groups. Maharaj (1999) developed a method
to compare a number of multiple time series and class them into groups. The input for
analysis was a number of multiple time series while the output was a number of groups
each containing multiple time series. The similarity of multiple time series for group
classification was decided by similarity of model parameter estimates. The purpose of
such analysis was to investigate patterns present across different multiple time series
groups.
The multiple variable time series models Maharaj (1999) used in analysis were vector
ARMA (VARMA) models. Each model was first converted to an infinite order vector AR
(VAR) model. Each infinite order VAR model was then truncated and compared to every
other model using provided formulae. The purpose of this was to ascertain how similar
each multiple time series was to every other multiple time series. Every combination of
two modified VAR models was compared using a hypothesis test. The null hypothesis of
this test was that there is no significant difference between the two multiple time series
models. The result of comparing all models was a matrix of p values. A clustering algorithm was proposed to form groups of multiple time series from these p values. These groups highlight which multiple time series are the most similar to each other.
The grouping algorithm proposed by Maharaj (1999) was performance tested using simulation. Bivariate (two variable) vector ARMA models of lengths 50 and 200 were compared. Larger sample sizes (200 or more) were recommended for best results, particularly as the number of time series in each multiple time series model increases. The provided techniques make clear logical sense despite being moderately intense mathematically, and are a useful addition to multiple time series analysis techniques.
Some recent developments in the modelling of multiple time series are presented by
Wilson et al. (2001). The focus of the article was on advances in vector based models (as
seen in Akman and De Gooijer, 1996). These authors looked first at a more efficient use
of parameters in vector autoregressive (VAR) models. The reason for wanting to do this
was that the large number of parameters required for standard VAR models (see Franses,
1998) leads to restrictive demands on the minimum size of the time series for analysis.
The goal of the modified model was therefore to provide a good representation of the
structure with as few parameters as possible. The technique presented involved showing
variable relationships using a directed acyclic graph (DAG). In the DAG, particular lags of each time series are nodes, and arrows linking nodes represent causal dependence. A provided procedure converts the DAG to an undirected graph called a conditional independence graph (CIG). The CIG indicates which parameters are to be estimated and which are to be left out of the model. The final result of this process was a VAR model with fewer than the standard number of parameters that performs better on standard model evaluation measures such as Akaike's Information Criterion (AIC).
An extension of VAR models, also proposed by Wilson et al. (2001), represented moving average (MA) components using a single smoothing coefficient θ. When θ = 0 the situation simplifies to standard VAR. These extended VAR models were referred to as VZAR models after the univariate ZAR label. Univariate versions are not always called ZAR models and can be found in many text books, including Makridakis et al. (1998). By including MA components, VZAR models provide an alternative to VARMA models in which fewer parameters are used to represent the MA components. The proposed smoothing works by applying powers of an operator W, shown in Equation 3.2, where B is the backshift operator. The advantage of such a scheme, as in the univariate equivalent, is that only one parameter (θ) needs to be estimated rather than different parameters for each time lag. Some questions remain unanswered with ZAR models, including how to make an appropriate choice of the smoothing coefficient θ.
$$W = \frac{B}{1 - \theta B} \qquad (3.2)$$
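Assuming Equation 3.2 has the form W = B/(1 - θB), with θ the smoothing coefficient, applying W to a series is a simple recursion: (1 - θB)w_t = x_{t-1} gives w_t = θw_{t-1} + x_{t-1}, an exponentially weighted sum of past values. The Python sketch below implements W and its powers under that assumption, starting the recursion from zero; the function names are illustrative. Setting θ = 0 turns W back into the plain backshift B, matching the statement that the model then reduces to standard VAR.

```python
def apply_w(x, theta):
    """Apply W = B / (1 - theta*B) to the series x, starting from w_0 = 0.

    (1 - theta*B) w_t = x_(t-1) gives the recursion w_t = theta*w_(t-1) + x_(t-1),
    i.e. w_t = x_(t-1) + theta*x_(t-2) + theta**2 * x_(t-3) + ...
    """
    w = [0.0]
    for t in range(1, len(x)):
        w.append(theta * w[-1] + x[t - 1])
    return w

def apply_w_power(x, theta, k):
    """Apply W k times, giving the k-th smoothed lag W**k x_t."""
    for _ in range(k):
        x = apply_w(x, theta)
    return x

print(apply_w([1.0, 0.0, 0.0, 0.0, 0.0], 0.5))   # impulse response of W
print(apply_w([1.0, 2.0, 3.0, 4.0], 0.0))        # theta = 0 reduces W to B
```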
Economic applications are used by Wilson et al. (2001) to test their theoretical models.
The combining of methods such as the two they investigated is a clear potential future
direction. Both advancements given are useful contributions to the time series field, with
the second model particularly promising given the known usefulness and popularity of
the univariate equivalent.
3.2
ARIMA Alternative Developments
At times ARIMA and its multiple variable equivalents are not suitable for various
reasons. The two articles covered in this section provide alternative methods of analysis
for similar data sets as dealt with in the ARIMA based methods in section 3.1.
A well known restriction of multivariate time series techniques using vector ARMA is
the large number of parameters involved. Without a sufficiently long time series, it may
not be possible to accurately estimate all parameters. This situation is known as
overfitting. To avoid this situation software packages often impose restrictions on T, the
necessary length of a time series. Equation 3.3 shows such a restriction for a VAR model
in the SPSS software package, where n is the number of variables and p the order of the
vector autoregressive component (Swift and Liu, 2002). For example, to model five time series jointly using three lags (the autoregressive order), the time series must be longer than 5 × (3 + 1) = 20 time units.
$$T > n(p + 1) \qquad (3.3)$$
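The restriction in Equation 3.3 can be checked directly, as in the short Python sketch below; the function name is illustrative. It reproduces the five series, three lag example: a length of 20 fails the strict inequality while 21 passes.

```python
def var_length_ok(t_len, n_vars, p):
    """True if the series length T satisfies T > n(p + 1) (Equation 3.3)."""
    return t_len > n_vars * (p + 1)

print(var_length_ok(20, 5, 3), var_length_ok(21, 5, 3))
```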
Swift and Liu (2002) report an unconventional way to successfully model shorter multivariate time series, which they call the vector autoregressive genetic algorithm (VARGA) method. Based on genetic algorithms, the method may reduce the necessary time series length to as low as p + 1, where p is the order of the autoregressive component.
As with all genetic algorithms, Swift and Liu's (2002) VARGA is an iterative learning
process where original (VAR) matrices are mixed and matched (crossover and mutation)
over successive generations to find a solution that performs best in a fixed fitness
function (Swift and Liu, 2002). Experiments with a multiple variable short time series
revealed good performance from VARGA compared to the traditional VAR method.
Clearly presented and understandable, the provided techniques are a sign of the
possibilities achievable from combining statistics and genetic algorithms. VARGA
presents a fascinating alternative to conventional time series modelling techniques.
Maharaj (1999) presented a technique to class a number of multiple time series into groups. A simpler problem that is also relevant to multiple time series is to place a number of univariate time series into groups (rather than multiple time series). This feature is useful but not often covered in standard time series literature. Swift et al. (2001) develop an unconventional technique for splitting a number of time series into groups where within-group dependencies are high and between-group dependencies are low. Swift et al. (2001) apply their grouping techniques to data from a chemical process
from an oil refinery and a medical data set about glaucoma eye conditions. The provided
techniques allow for multiple time series that are not necessarily long in length to be split
into appropriate groups.
The grouping algorithm of Swift et al. (2001) has two main phases. The first collects all
significant correlations between variables, including correlations between lagged
variables. The result is a list, referred to as Q, of triples, each composed of the
two correlated variables and the time lag at which they are correlated. The second
phase applies a grouping algorithm to Q to form groups from the original multiple
time series.
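The first phase can be sketched in Python as a brute-force search. This is a minimal sketch under stated assumptions: Swift et al. (2001) search for correlations with evolutionary programming rather than exhaustively, and the function names, the maximum lag of 3 and the correlation threshold of 0.5 below are hypothetical choices, not values from their paper. Each triple (i, j, lag) records that series i at time t is correlated with series j at time t + lag.

```python
import numpy as np


def lagged_corr(x, y, lag):
    """Correlation between x at time t and y at time t + lag (lag >= 0)."""
    if lag == 0:
        return float(np.corrcoef(x, y)[0, 1])
    return float(np.corrcoef(x[:-lag], y[lag:])[0, 1])


def collect_triples(X, max_lag=3, threshold=0.5):
    """Build Q as a list of (i, j, lag) triples with |correlation| > threshold."""
    n = X.shape[1]
    Q = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                  # only dependencies between different series
            for lag in range(max_lag + 1):
                if abs(lagged_corr(X[:, i], X[:, j], lag)) > threshold:
                    Q.append((i, j, lag))
    return Q


# Hypothetical data: series 1 repeats series 0 two steps later; series 2 is unrelated.
rng = np.random.default_rng(0)
T = 200
x0 = rng.normal(size=T)
x1 = np.roll(x0, 2) + 0.1 * rng.normal(size=T)
x2 = rng.normal(size=T)
Q = collect_triples(np.column_stack([x0, x1, x2]))
print(Q)
```

The exhaustive loop grows with the square of the number of series times the number of lags, which is one reason a stochastic search such as EP becomes attractive for larger problems.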
Swift et al. (2001) make use of evolutionary programming (EP), similar in nature to
genetic algorithms (GA), to find variables that are correlated at particular lags. The
grouping is then done using a genetic algorithm variant called a grouping genetic
algorithm (GGA).
The application of EP and GA to multivariate time series is fascinating and deserving of
further research. Although the potential is clear, methods involving EP and GA are
beyond the scope of this thesis.
3.3 Bayesian Developments
Bayesian statistics provides an entirely different approach to statistical problems,
with a degree of flexibility not found in conventional methods. Bayesian statistics
allows previous information to be included, allows an experiment to be stopped or
continued at any point, and offers other features that may be considered advantages. A
prior distribution is assumed for the unknown parameters before analysis, and this is
combined with the likelihood of the observed data to form a posterior distribution
(Berry and Stangl, 1996).
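The prior-to-posterior step can be illustrated with the simplest textbook case, which is not the method of Crucianu et al. (2001): a normal prior for an unknown mean, combined with normally distributed observations whose variance is assumed known. The posterior mean is then a precision-weighted average of the prior mean and the data. The function name and the numbers used are hypothetical.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior mean and variance for a normal mean, normal prior, known noise variance."""
    n = len(data)
    precision = 1.0 / prior_var + n / noise_var       # posterior precision
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


# Prior N(0, 1); four observations equal to 2, each with noise variance 1.
m, v = normal_posterior(prior_mean=0.0, prior_var=1.0, data=[2.0, 2.0, 2.0, 2.0], noise_var=1.0)
print(m, v)   # 1.6 0.2
```

The four observations pull the estimate from the prior mean of 0 towards the sample mean of 2, and the posterior variance (0.2) is smaller than the prior variance (1), reflecting the information added by the data.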
Crucianu et al. (2001) investigate the application of Bayesian techniques to the
multivariate time series situation, focusing on the modelling of time dependent
nonlinear processes. Extensions are proposed to deal with existing problems in
Bayesian multivariate time series analysis. The purpose of this and similar Bayesian work
is to provide a valid alternative method of analysis that may be more
appropriate than conventional techniques in certain conditions.
The bulk of the Crucianu et al. (2001) article concerns structuring the prior distribution
and combining it with multiple variable time series data to form a posterior
distribution. In particular, a general method was proposed for translating prior
knowledge into a prior distribution. The distributions presented are complex mixtures of
predominantly Gaussian, gamma and Dirac distributions. Recurrent Neural Networks
(RNNs) were used as a tool to estimate the model parameters.
Bayesian methods have only relatively recently come into widespread practical use, and
their application to multiple variable time series is in the early phases. Therefore,
Bayesian techniques are not discussed further in this thesis.
3.4 Nonlinear Developments
Linear relationships form the fundamental basis of most conventional time series
analysis. Many univariate and multivariate time series analysis techniques (for
example, cross correlation) assume linear relationships. Often
relationships are not linear, and methods and techniques have therefore been developed to
deal with nonlinear situations. This section looks at recent developments in the
area of nonlinear time series analysis.
A number of tests for nonlinearity are available for univariate time series, but they
are less common in the multiple time series context. Palu (1996) developed a test of
nonlinearity for the multiple time series situation. Such tests are useful for
deciding whether linear or nonlinear models are appropriate.
In the method devised by Palu (1996), surrogate (substitute) data were generated using a
Fourier-transform based algorithm. Nonlinear statistics were then computed for both the
surrogate and the actual data sets. If there was a significant difference between the
statistics for the surrogate data and the actual data, it was concluded that the actual
data were not generated by a linear process. Techniques and tests based on linear
redundancies were also investigated to provide additional information about variable
relationships and to avoid spurious results from imperfect surrogate data. The techniques
were tested using data generated from a two variable (bivariate) autoregressive model and
from the Lorenz (nonlinear mathematical) equations. Brief applications were shown for
meteorological and physiological data. The methods, although moderately complex
mathematically, are reliable and informative.
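The surrogate idea can be sketched with a common Fourier-transform scheme, phase randomisation. This is an assumption about the general approach rather than a reproduction of Palu's (1996) algorithm: each series is transformed, random phases are added, and the result is transformed back. Adding the same random phase at each frequency to every variable keeps each series' amplitude spectrum and the linear cross-correlations between series, so the surrogate mimics a linear process with the same second-order properties. A nonlinear statistic would then be compared between the actual data and many such surrogates. The function name and the test data are hypothetical.

```python
import numpy as np


def multivariate_surrogate(X, seed=None):
    """Phase-randomised surrogate of a (T, n) array of n time series."""
    rng = np.random.default_rng(seed)
    T = X.shape[0]
    F = np.fft.rfft(X, axis=0)                     # one spectrum per column
    phases = rng.uniform(0.0, 2.0 * np.pi, size=F.shape[0])
    phases[0] = 0.0                                # keep the zero-frequency (mean) term real
    if T % 2 == 0:
        phases[-1] = 0.0                           # keep the Nyquist term real
    F_surr = F * np.exp(1j * phases)[:, None]      # same phase shift for every column
    return np.fft.irfft(F_surr, n=T, axis=0)


rng = np.random.default_rng(0)
X = rng.normal(size=(128, 2)).cumsum(axis=0)        # hypothetical bivariate data
S = multivariate_surrogate(X, seed=1)
```

Because only the phases change, the surrogate has the same amplitude spectrum and cross-spectrum as the original, while any nonlinear structure that depends on phase relationships is destroyed.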
Although univariate models capturing nonlinear behaviour (dynamics) are common,
multiple variable forms are less well known. Cao et al. (1998) investigate the modelling
and forecasting of multiple time series that have nonlinear dynamics.
The purpose of developing multiple variable models was to reveal relationships and
information that may not be found or used in univariate analysis.
Cao et al. (1998) first consider each time series to see whether it is relevant for analysis. In
particular, if a variable can be predicted exactly from other variables, it is clearly not
necessary. A process of embedding was proposed to decide how many lags and which
time delays to include in modelling the multiple time series. Different time
intervals are allowed, unlike ARIMA based models, where fixed time intervals are
required. The focus was on finding the optimum embedding dimension (number of lags)
for predicting the values at the next time point. The new multivariate
time series techniques were found to be insensitive to the number of data points.
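A multivariate delay embedding can be sketched as follows. In this sketch each variable i contributes d_i lagged values spaced tau_i steps apart, so different variables may use different delays; the dimensions and delays are supplied by hand and are hypothetical, whereas Cao et al. (1998) select them to optimise prediction. The function name is also hypothetical.

```python
import numpy as np


def multivariate_embedding(X, dims, delays):
    """Delay vectors from a (T, n) array with a dimension and delay per variable.

    Variable i contributes x_i(t), x_i(t - tau_i), ..., x_i(t - (d_i - 1) tau_i).
    Returns the delay vectors (one row per usable t) and the matching times t.
    """
    T = X.shape[0]
    start = max((d - 1) * tau for d, tau in zip(dims, delays))
    times = np.arange(start, T)
    columns = []
    for i, (d, tau) in enumerate(zip(dims, delays)):
        for k in range(d):
            columns.append(X[times - k * tau, i])
    return np.column_stack(columns), times


# Hypothetical example: 10 time points, two variables.
X = np.arange(20).reshape(10, 2)
E, times = multivariate_embedding(X, dims=[2, 1], delays=[3, 1])
print(E[0], times[0])   # [6 0 7] 3
```

Each row of E is then the input for predicting the values at the next time point; the first usable time is set by the longest span (d_i - 1) tau_i among the variables.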
Cao et al. (1998) presented their prediction technique through a hands-on
approach based on physical problems. Theory was presented on determining
predictive relationships between variables. Agreement between the reconstructed
(modelled) time series and the original time series was found when modelling
two sets of nonlinear mathematical equations known as the Rössler and Lorenz equations. The
techniques presented are complex and are appropriate when a phenomenon is accurately
recorded and not subject to much natural variation, as was the case with the physical
examples provided.
Kulkarni and Parikh (1999) also investigate the modelling of multiple variable nonlinear
time series, but in a completely different way from Cao et al. (1998). The article extends a
univariate Artificial Neural Network (ANN) approach to a multivariate form. Their
motivation was that several variables are usually required to describe
system behaviour (dynamics). Univariate models are examined before dependent
variable, vector to scalar, vector to vector and partial data vector to vector
multivariate models. Computer generated data from the Lorenz equations
and the Hénon map were used to test the time series models. The method makes accurate
short term predictions using clear and concise techniques.
Reick and Page (2000) present a method for predicting the next value in a single time
series using information from many nonlinear time series. To do this, multivariate versions
of standard nonlinear univariate methods were created. The univariate methods extended
are classed as local and involve the use of nearest neighbour techniques. The nearest
neighbour of a section of a time series is the most similar other section that can be
found. What followed that earlier, nearest neighbour section is then used to predict how
the current section will continue. These multivariate techniques were created to take
advantage of the information that additional variables can provide.
The center-of-mass prediction (COM-prediction) and local linear prediction (LL-prediction)
methods extended by Reick and Page (2000) are both based on nearest neighbour
techniques. At each time point, vectors are considered rather than individual values, and
additional mathematics was included to compare sets of vectors instead of sets
of individual values. Using vectors allows a number of time series to be compared instead of
just one. This innovative prediction technique is useful for short term predictions but is
unlikely to provide feasible long term predictions.
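Centre-of-mass prediction can be sketched as follows, assuming Euclidean distance between vectors and a fixed number k of neighbours. The function name, k = 3 and the test data are hypothetical, and Reick and Page's (2000) own formulation (including any weighting of neighbours) may differ. LL-prediction would instead fit a local linear map from the neighbouring vectors to their successors rather than averaging the successors.

```python
import numpy as np


def com_predict(E, target, k=3):
    """Centre-of-mass prediction of the value of `target` one step after the last row of E.

    E holds one (possibly multivariate) vector per time point, aligned with
    `target`. The k earlier vectors closest to the final vector are found, and the
    values of `target` that followed them are averaged.
    """
    query, library = E[-1], E[:-1]           # every library row has a known successor
    distances = np.linalg.norm(library - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return float(np.mean(target[nearest + 1]))


# Hypothetical periodic data: two variables observed at t = 0, ..., 199.
t = np.arange(200)
x = np.sin(2 * np.pi * t / 20)
y = np.cos(2 * np.pi * t / 20)
E = np.column_stack([x, y])                  # each vector combines both series
prediction = com_predict(E, target=x)
print(prediction)                            # close to sin(2 * pi * 200 / 20) = 0
```

Using both series in each vector distinguishes points that share the same value of x but lie on different parts of the cycle, which is the kind of extra information the multivariate extension is meant to exploit.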
Ørstavik et al. (2000) investigate the modelling of a specific case of nonlinear multiple
variable time series, in which the multiple time series is generated by a spatio-temporal
system, that is, a system that varies over time and space. Such systems are commonly
nonlinear and include weather, rainfall and traffic. For these systems, Ørstavik et al.
(2000) propose algorithms to find certain measures (patterns) and report on the
performance of these algorithms. These developments are based on, and intended for use
with, physical problems. Further discussion requires considerable mathematical knowledge
and is beyond the scope of this thesis.
3.5 Miscellaneous Developments
This final theoretical review section considers several miscellaneous research articles
found in the literature. Included herein is a look at the use of hierarchical
decomposition, artificial intelligence and state space modelling. Although varied in
nature, these articles present some of the latest and most inventive approaches in the
multiple time series arena.
Repucci et al. (2001) propose a new approach to multivariate time series analysis that they
label hierarchical decomposition (HD). HD is conceptually similar to standard
decomposition methods such as PCA (principal components analysis). Standard
techniques do not handle complex, dynamically interacting variables very well in theory
or in practice. Crucially, unlike PCA, HD is suitable for use where there are nonlinear
dynamics. For HD to be applied, the variables must be organised hierarchically. This
requires one source time series that depends only on its own past values (autoregressive)
and noise. Each other time series depends on time series higher in the hierarchy, on its
own past state (autoregressive) and on its own independent noise. If the variables are not
in this form, HD is inappropriate.
The HD method given by Repucci et al. (2001) first uses PCA derived components of the
original data to create a multivariate linear autoregressive (MLAR) model. The MLAR
model is then transformed to be as consistent as possible with a hierarchical
relationship between variables. When a hierarchical structure was present, the
underlying generators were accurately recovered. The techniques were tested using
simulation and an application to a multiple variable time series data set. Where a
hierarchical structure is present between variables, this analysis is well suited.
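The hierarchical form can be made concrete with a first order multivariate linear autoregressive model x_t = A x_{t-1} + e_t: if the variables are ordered from the source downwards, every series depends only on itself and on series above it, so A is lower triangular. The sketch below simulates such a system (the coefficients and the function name are hypothetical), fits A by least squares and checks that the estimated couplings above the diagonal are small. It is a simplification: Repucci et al. (2001) start from PCA derived components and transform the fitted model, which is not reproduced here.

```python
import numpy as np


def fit_mlar1(X):
    """Least-squares estimate of A in x_t = A x_{t-1} + e_t (no intercept)."""
    past, present = X[:-1], X[1:]
    B, *_ = np.linalg.lstsq(past, present, rcond=None)   # present is approx. past @ B
    return B.T


rng = np.random.default_rng(0)
A_true = np.array([[0.6, 0.0, 0.0],     # source series: only its own past
                   [0.5, 0.3, 0.0],     # depends on series 0 and itself
                   [0.0, 0.4, 0.5]])    # depends on series 1 and itself
T = 2000
X = np.zeros((T, 3))
for t in range(1, T):
    X[t] = A_true @ X[t - 1] + rng.normal(size=3)

A_hat = fit_mlar1(X)
upper = np.abs(np.triu(A_hat, k=1)).max()   # largest coupling against the hierarchy
print(np.round(A_hat, 2))
print(upper)
```

If the variables were not hierarchically related, some entries above the diagonal would remain clearly non-zero for every ordering of the variables, which is the situation in which the text notes HD is inappropriate.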
Guimarães et al. (2000) apply the field of artificial intelligence to multivariate time
series analysis. Technically, genetic algorithms as seen in Swift and Liu (2002) can be
classed as artificial intelligence. However, in this application artificial intelligence was
used in the more common sense of simulating human intelligence. The purpose of this
type of analysis was to classify multiple variable time series into groups according to
their properties. Maharaj (1999) presented techniques for grouping multivariate time
series under a completely different paradigm.
Group allocations of multiple time series by Guimarães et al. (2000) were based on
characteristics predefined by human experts. The intended field of application for this
type of analysis was medical diagnosis, where observed multiple time series are
compared with typical features of diseases to reach a diagnosis. The specific method
developed by Guimarães et al. (2000) was referred to as temporal knowledge conversion
(TCon). All criteria for diagnosing particular conditions are written in a structured form
of plain English, for ease of understanding and application. The criteria are specified
using primitive patterns (ranges that a feature should be in), events, sequences (events
with time ranges between them) and temporal patterns (combinations of sequences).
Software can then apply these rules directly to multivariate time series
data and give results. The simple, clear and concise methods proposed are currently
specialised in the medical field but could feasibly be modified for use in other fields.
Modelling and prediction of system performance reliability using multivariate time series
analysis was investigated by Lu et al. (2001). State space modelling, a general approach
in multivariate time series analysis and a relative of vector ARIMA, was used. State
space modelling uses a state vector to give a picture of past and present behaviour. Future
behaviour is then described and determined from the present state and future inputs.
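The idea of propagating a state vector can be shown with a deterministic linear state space model, x_{t+1} = F x_t + G u_t with observations y_t = H x_t. This is a minimal sketch that omits the noise terms and the Kalman filtering used to estimate states from noisy data; the matrices describe a hypothetical performance measure with a level and a downward trend, the function name is hypothetical, and nothing here is taken from Lu et al. (2001).

```python
import numpy as np


def forecast_state_space(F, G, H, x0, inputs):
    """Propagate x_{t+1} = F x_t + G u_t from x0 and return y = H x at each step."""
    x = np.asarray(x0, dtype=float)
    outputs = []
    for u in inputs:
        x = F @ x + G @ np.atleast_1d(u)
        outputs.append(H @ x)
    return np.array(outputs)


F = np.array([[1.0, 1.0],      # level(t + 1) = level(t) + trend(t)
              [0.0, 1.0]])     # trend(t + 1) = trend(t) + input
G = np.array([[0.0],
              [1.0]])
H = np.array([[1.0, 0.0]])     # only the level is observed
y = forecast_state_space(F, G, H, x0=[10.0, -0.5], inputs=[0.0, 0.0, 0.0, 0.0])
print(y.ravel())               # forecast level for the next four steps
```

In a reliability setting of the kind described below, one might watch for the step at which such a forecast performance measure crosses a failure threshold.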
Reliability approaches typically involve probability distributions of time to failure. The
use of time series by Lu et al. (2001) was a new approach, and their article presents a
multiple variable model whereas previous models had been univariate. The new
approach was expected to better represent the dynamic underlying
conditions involved in reliability studies. System performance and reliability are usually
measured on one or more critical variables called performance measures. A state space
model was created to represent the multivariate time series formed from
these performance measures and to create forecasts. An example of using the
models for reliability analysis was provided, using a software package created specially
for reliability assessment. Although innovative and interesting, it is unclear at this stage
whether the new techniques offer an advantage over the probability based methods
typically used in reliability assessment. State space modelling, as a general approach to
time series models, shows considerably more promise for other, more standard time series
applications. Further information on state space models is beyond the scope of this thesis.
4 APPLICATION LITERATURE REVIEW

When a number of variables are measured over time, a multiple time series situation
arises. Such situations arise frequently in many different fields, where the reasons for
analysis and the underlying types of data differ. Research was conducted to show how
and where multiple time series analysis has been used in recent years.
Applications of multiple time series analysis exist in fields as diverse as economics,
ecology and sociology. For this reason, the literature search targeted multiple time
series analysis as a topic rather than particular journals. Journals found with applications
included the Journal of Hydrology, Agricultural and Forest Meteorology, Fisheries Research,
Continental Shelf Research and Artificial Intelligence in Medicine. Recent articles were
sought, and all applications finally included in this review were published in the last
eight years. Table 4.1 displays the fields involved in the recent application papers found.
Article                  Year   Field of Application (Chemistry, Ecology, Economics,
                                Forestry, Geology, Hydrology, Medicine, Meteorology,
                                Sociology)
Boyd and Murray          2001
Chan et al.              1999
Chen and Dyke            1998
Chin, D. A.              1995
Diamandis et al.         2000
Felmingham et al.        2000
Green and Sparks         1999
Guimarães et al.         2001
Jensen                   2001
Li and Kafatos           2000
Nemec                    1995
Nicholson et al.         1998
Pech et al.              2001
Peiris and McNicol       1996
Reick and Page           2000
Rod et al.               2002
Stergiou et al.          1997
Swift and Liu            2002
Swift et al.             2001
Van Dongen and Geuens    1998

Table 4.1: Field of application for recent literature articles.
Even more so than in theoretical papers, the directions taken in applications of multiple
time series tend to be tightly linked to the field of the researchers. The resulting variety
of topics addressed is summarised in Table 4.2. The main categories used in this table are:
One dependent: one dependent time series is investigated.
Many dependent: many dependent time series are investigated simultaneously.
ARIMA: multiple independent variable forms of autoregressive integrated moving
average (ARIMA) models.
Vector ARIMA: vector based versions of ARIMA, supporting multiple dependent and
independent variables. Includes vector AR (VAR) and vector ARMA (VARMA), among
others.
Combining series: combining a number of time series into a single time series.
Grouping variables: placing a number of time series into groups according to their
similarities.
Clustering: grouping together particular periods of time by analysing multiple time
series.
PCA: principal components analysis. Retrieving uncorrelated fundamental components
from time series.
Patterns: searching for patterns in a number of time series. Pattern analysis may be
used for diagnosis using artificial intelligence, for grouping time series and so forth.
Nonlinear methods: analysis when the standard assumption of linear relationships is
invalid.
Article                  Year   Techniques Involved (One Dependent, Many Dependent,
                                ARIMA, Vector ARMA, Combining Series, Grouping
                                Variables, Clustering, PCA, Patterns, Nonlinear Methods)
Boyd and Murray          2001
Chan et al.              1999
Chen and Dyke            1998
Chin, D. A.              1995
Diamandis et al.         2000
Felmingham et al.        2000
Green and Sparks         1999
Guimarães et al.         2001
Jensen                   2001
Li and Kafatos           2000
Nemec                    1995
Nicholson et al.         1998
Pech et al.              2001
Peiris and McNicol       1996
Reick and Page           2000
Rod et al.               2002
Stergiou et al.          1997
Swift and Liu            2002
Swift et al.             2001
Van Dongen and Geuens    1998

Table 4.2: Techniques used in detail in recent literature articles.
The purpose of this application literature review was to sample the practical usage of
multivariate time series. The remainder of this chapter provides an overview of the recent
multiple variable time series applications found in a range of fields. The chapter is
arranged into the following broad categories: medical applications, economic
applications, sociology applications and natural phenomena applications.
4.1 Medical Applications
Time series in medical applications are the logical result of monitoring an attribute over
time. Multiple time series in turn result when a number of attributes are monitored over
time. Commonly these attributes are different indicators of the health of a patient. The
three multivariate papers found in the medical field focus on diagnosis as the result of
analysis. None of the articles considered the use of these models for prediction, but this
is clearly a possibility.
Guimarães et al. (2000) take a different approach to the use and application of
multivariate time series. For the analysis of sleep related breathing disorders, many
variables can be monitored, including sleep related signals involving respiration
and circulation. The parallel recording of all of these variables is known as
polysomnography (PSG). There are many different types of sleep related breathing
disorders, which are normally diagnosed by a human expert from a visual representation of
multiple time series. The aim of these authors was to differentiate between disorders
automatically, without a human expert. Criteria were set up from human expert
knowledge to make the necessary classifications.
It was the method of obtaining results that set Guimarães et al. (2000) apart. The
technique used is based on artificial intelligence, where computers are programmed to
simulate human intelligence. The specific method used was referred to as temporal
knowledge conversion (TCon). The method works by using primitive patterns (ranges that
a feature should be in), events, sequences (events with time ranges between them) and
temporal patterns (combinations of sequences). These basic elements are constructed
using plain English for ease of human understanding. Software is then used to analyse
multivariate time series data from sleep indicators directly and give results. The software
automates the process of comparing typical sleep disorder characteristics with those observed
to find a suitable diagnosis. Note that the elements used to make diagnosis decisions are
completely independent of any particular patient's data. The challenge in this situation
lies not so much in the diagnosis process (which is rather simple) as in the accurate
portrayal of the expert information used in the diagnosis.
Swift and Liu (2002) consider a conventional multiple time series situation in which a
model was required to represent medical data. The large parameter requirements of
vector ARIMA are bypassed using a genetic algorithm based technique.
The application was to data from the group of medical eye conditions known as
glaucoma. The data involve many time series from different locations in the eye, but each
covers a short period for multiple time series analysis (typically between 10 and 44 time
points). That is, analysis of many time series over a short period was required, rather
than the typical few time series over a long period.
The method used by Swift and Liu (2002) was given the label VARGA for vector
autoregressive genetic algorithm. For the glaucoma application, VARGA was found to
perform better than an equivalent VAR technique most of the time. The use of genetic
algorithms for multivariate time series is an interesting one that has a clear potential for
further advancement.
Swift et al. (2001) look at classifying multiple time series into groups. Given a number of
time series, the goal was to classify them into groups where within-group dependencies
are high and between-group dependencies are low. Swift et al. apply innovative
grouping techniques to a medical data set on glaucoma eye conditions and to data from a
chemical process in an oil refinery. Use is made of evolutionary programming (EP)
and a grouping genetic algorithm (GGA), both similar in nature to genetic algorithms
(GA), to develop a method for classifying time series. The procedure involves
an initial correlation search followed by an iterative grouping algorithm. In
effect, initial correlation information between time series is used to place the time series
into suitable groups.
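Once a list Q of correlated (variable, variable, lag) triples is available, the simplest possible grouping rule places any two series linked by a triple, directly or through other series, into the same group. The sketch below does this with a union-find structure. It is a deliberately crude stand-in for the grouping genetic algorithm of Swift et al. (2001), which instead searches for groups that balance high within-group and low between-group dependency; the function name and the example triples are hypothetical.

```python
def group_from_triples(n_series, Q):
    """Group series linked, directly or indirectly, by any (i, j, lag) triple in Q."""
    parent = list(range(n_series))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j, _lag in Q:
        parent[find(i)] = find(j)           # merge the groups containing i and j
    groups = {}
    for i in range(n_series):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


Q = [(0, 1, 2), (1, 3, 0), (2, 4, 1)]        # hypothetical correlated triples
print(group_from_triples(6, Q))              # [[0, 1, 3], [2, 4], [5]]
```

This rule tends to chain many series into one large group when correlations are widespread, which is one reason an optimising search such as the GGA is preferable in practice.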
The glaucoma medical data analysed by Swift et al. (2001) comprised data recorded
from the right eye of 82 patients. Each patient was tested at different locations within the
eye every six months for between five and 22 years. The aim of grouping in this
application was to group together locations within the eye exhibiting similar behaviour.
Because there were 82 patients, 82 separate grouping applications took place (one for each
patient's data). The final group sizes produced by the grouping algorithm were smaller
than expected. Overall, the algorithm gave small groups of similar size for people with
good vision, and large groups of varied size for those with poor vision. The
chemical data set also analysed by Swift et al. (2001) involved 50 chemical processes
recorded from a fluid catalytic converter every minute. The purpose of grouping
here was to group similar chemical processes together. The full time series had a length
of 3000 time points, and a large delay between cause and effect was known to exist. The
results contain groups of various sizes, showing most variables to be dependent on others.
Applying the results to the original chemical process situation gave some interesting and
sometimes unexpected results.
4.2 Economic Applications
Economic data, including stock prices, exchange rates and interest rates, can form time
series. Particular economic time series are likely to be influenced by other time series,
since markets and rates are somewhat intertwined. Hence multivariate time series
analysis is of interest in economics. Not only could the relationships between time
series here be fascinating, but applying the findings could prove financially rewarding.
The four papers discussed here look at the analysis of economic multivariate time series.
Chan et al. (1999) used multivariate time series to model relationships between stock
markets in China, Hong Kong and Taiwan. As a result of political tension, Hong Kong
tends to function as a middleman in trading between China and Taiwan. A
multivariate ARIMA model of daily stock market indicators from 1992 to 1997 was used to
model market relationships. The four stock market indicators used were the Hong Kong
Hang Seng index, the Shanghai B share index, the Shenzhen B share index and the Taiwan
Stock Exchange capitalisation weighted stock index. To deal with non stationarity,
logarithms were taken of the stock market values and first differencing was then applied.
Refer to section 2.4.6 for more information on stationarity.
Chan et al. (1999) used cross correlation matrices between the four market indicators to
establish relationships. Statistically significant elements of these cross correlation
matrices gave information on relationships between markets, and these relationships
were the main interest of the analysis. Significant correlations were then used to decide
which factors to include in the multivariate model for each market indicator.
Interestingly, the Hong Kong market appeared to lead the other markets. These and
similar relationships can be found through appropriate use of multiple variable time
series techniques.
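The two steps described above, log differencing and lagged cross correlation, can be sketched as follows. The data are simulated and hypothetical, the function names are hypothetical, and the 2/sqrt(T) band is only a rough significance guide that assumes the series behave like white noise; Chan et al. (1999) may have used a different significance test. A large entry C1[i, j] means that index i today is correlated with index j tomorrow, which is the sense in which one market can appear to lead another.

```python
import numpy as np


def log_differences(prices):
    """First differences of the logarithms of each column (daily log returns)."""
    return np.diff(np.log(prices), axis=0)


def cross_corr_matrix(R, lag):
    """C[i, j] estimates corr(r_i(t), r_j(t + lag)) for a (T, n) array R."""
    T = R.shape[0]
    Z = (R - R.mean(axis=0)) / R.std(axis=0)
    return (Z[: T - lag].T @ Z[lag:]) / T


# Hypothetical indices: index B responds to index A one day later.
rng = np.random.default_rng(0)
T = 500
shocks = rng.normal(0.0, 0.01, T + 1)
ra = shocks[1:]                                        # returns of index A
rb = 0.8 * shocks[:-1] + rng.normal(0.0, 0.006, T)     # B follows A with a one-day lag
prices = 100.0 * np.exp(np.cumsum(np.column_stack([ra, rb]), axis=0))

R = log_differences(prices)
C1 = cross_corr_matrix(R, lag=1)
bound = 2.0 / np.sqrt(len(R))          # rough 5% bound for white noise
print(np.round(C1, 2), round(bound, 3))
```

Here the entry C1[0, 1] (A today against B tomorrow) lies far outside the band, while C1[1, 0] does not, matching the simulated lead of A over B.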
Felmingham et al. (2000) investigate the interdependence between Australian and foreign
short term interest rates. The foreign markets of interest were the United States, Japan,
the United Kingdom, Canada, Germany and New Zealand. A problem with modelling data
such as interest rates is that they are subject to sudden changes as a result of politics,
availability of resources and many other potential shocks. Sudden changes are referred
to as breaks, and care must be taken in their analysis. Felmingham et al. (2000) use a
version of the Augmented Dickey-Fuller (ADF) stationarity test that takes breaks into
account, and find that all time series require first order differencing for modelling.
The data analysed by Felmingham et al. (2000) were quarterly short term interest rates
from 1970 to 1997. Bivariate (two variable) and multivariate analysis using vector AR
based models was applied to the data, paying special attention to cointegration.
Cointegration is discussed earlier in section 3.1. The appearance of cointegration was
regarded as limited evidence of long term relationships.
Diamandis et al. (2000) took a detailed look at the Greek drachma to dollar and
drachma to mark exchange rates. The interest was in fitting an accurate model that could
then be used to produce better forecasts than the naïve random walk method. Money
supply, income and short-term interest rates were available for each market being
analysed. They applied a new approach to determine the order of integration (the number
of times first differencing is required) of model components (variables). The
problem with existing methods of determining the level of differencing needed is that they
have been developed for univariate models; these unit root tests are not suitable in the
multivariate context. The model was found to contain one component that required first
differencing twice. A data transformation was applied to convert this component into one
that only required first differencing once. This transformation was applied because
statistical inference techniques for time series requiring first differencing twice are not
as developed as those for series requiring first differencing once. The detailed analysis
that followed showed, among other things, a relationship between domestic money and
exchange rates.
In their paper, Diamandis et al. (2000) use a multivariate cointegration technique to find
combinations of non stationary variables that form stationary series. The chosen
cointegrated vector autoregressive (VAR) model took these relationships into account. A
dynamic error correction model for forecasting based on long term behaviour was found
to clearly outperform the naïve random walk method.
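The core idea of cointegration, a linear combination of non stationary series that is itself stationary, can be illustrated with two simulated series that share one random-walk trend. The sketch uses a simple two-variable least squares regression (in the style of the Engle-Granger first step) rather than the multivariate technique of Diamandis et al. (2000); a formal analysis would also apply a unit root test to the resulting combination. All data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
trend = np.cumsum(rng.normal(size=T))              # shared stochastic trend (a random walk)
x = trend + rng.normal(scale=0.5, size=T)          # non stationary series 1
y = 2.0 * trend + rng.normal(scale=0.5, size=T)    # non stationary series 2

# Step 1: estimate the cointegrating coefficient by least squares, y approx. a + b x.
design = np.column_stack([np.ones(T), x])
(a, b), *_ = np.linalg.lstsq(design, y, rcond=None)

# Step 2: the combination y - a - b x should behave like a stationary series.
spread = y - a - b * x
print(round(b, 2), np.var(y), np.var(spread))
```

Both x and y wander without settling, but the spread stays close to zero with a small, stable variance; an error correction model of the kind mentioned above uses deviations of this spread to predict how the series move back towards their long term relationship.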
Another example of modelling economic data using multivariate time series analysis is
provided by Green and Sparks (1999). The exact sources of growth and of the dynamics of
Canadian economic development around the start of the 20th century have been debated
for some time. This paper analysed annual data from 1870 to 1939 to resolve the debate
by pinpointing those sources.
Green and Sparks (1999) use a form of vector autoregression (VAR) model to represent
the variables of population, terms of trade, exports, investment and gross national product
(GNP). Specifically, a cointegration model was used to exploit linear combinations of
non stationary time series that result in stationary series. The model allowed
dramatic changes in certain variables, referred to as innovations, to affect specific other
variables. The largest impact on growth and trends in Canadian development was found
to come from innovations in population.
4.3 Sociology Applications
Social data on crimes, marriage rates and so forth collected over time form time series.
The analysis of multiple social time series has the potential to reveal patterns and trends
in human society. Although only one article was found in this category, the article by
Jensen (2001) gives an indication of the potential of multiple time series analysis in
sociology.
The literature commonly claims that there is a relationship between television and homicide
(murder) rates. Using multivariate time series regression similar to multivariate AR,
Jensen (2001) found this relationship to be spurious. Multivariate time series models
were formed with homicide rates dependent on many social indicators. These
indicators included the marriage-divorce ratio, cirrhosis death rate, immigration,
unemployment and percentage of the population 15 to 24 years old. Lagged terms of
previous homicide rates and television were included. The television effect, as measured
by the number of televisions, was lagged by ten years in one model and fifteen in another.
This lagging was in line with the claims of those linking television with murder.
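The way a lagged predictor such as the number of televisions enters a regression can be sketched as below. The series are simulated and hypothetical, only two predictors are used, and the function name is invented for illustration; Jensen's (2001) full set of indicators and model specification are not reproduced. Rows at the start of the record are dropped so that every lagged value is available, which is why a ten-year lag costs ten observations.

```python
import numpy as np


def lagged_design(columns, lags):
    """Intercept plus each column shifted by its lag, dropping rows without lagged values."""
    max_lag = max(lags)
    T = len(columns[0])
    X = [np.ones(T - max_lag)]
    for col, lag in zip(columns, lags):
        X.append(np.asarray(col, dtype=float)[max_lag - lag: T - lag])
    return np.column_stack(X)


# Hypothetical annual series, 48 years long.
rng = np.random.default_rng(0)
T = 48
divorce = np.cumsum(rng.normal(size=T))
televisions = np.cumsum(rng.normal(size=T))
homicide = 1.0 + 0.7 * divorce + rng.normal(scale=0.2, size=T)  # no television effect

X = lagged_design([divorce, televisions], lags=[0, 10])
y = homicide[10:]                        # align the response with the usable rows
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 2))                 # intercept, divorce, televisions lagged ten years
```

Adding further lags of each indicator only requires more entries in `columns` and `lags`, which is the kind of extension discussed in the critique of the analysis below.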
The analysis by Jensen (2001) was carried out on murder rates for white males and
females in the United States (1945 to 1992), murder rates in Canada (1950 to 1985) and
white murder rates in South Africa (1950 to 1985). Divorce was found to be a far better
indicator in all situations than lagged numbers of televisions. In the United States, where
cirrhosis data were available, a significant relationship was found between this indicator of
alcoholism in society and homicide rates.
The analysis by Jensen (2001) could have been improved by considering more
lags of the provided variables. As it stands, no lag was assumed for any variable except
television, for which set time lags were assumed prior to analysis. Relationships with
homicide levels may therefore have gone undetected. A full vector based
model could also provide fascinating information on the nature of relationships between
all of the variables included in analysis, rather than just homicide levels.
4.4 Natural Phenomena Applications
Many different fields use multivariate time series analysis for the fundamental task of
analysing natural phenomena. Statistical analyses are common for natural phenomena
because natural situations routinely involve natural variation or error. Often an integral
part of natural phenomena models is the effect of inevitable human intervention. Fields
with natural phenomena applications include chemistry, ecology, forestry, geology,
hydrology and meteorology.
Nemec (1995) is one of very few articles to apply time series in the forestry context.
Nemec presents a practical paper focusing on using repeated measures and time series
techniques for a set of forestry problems. The overall purpose of the paper was to give a
guide to dealing with situations where measurements are correlated. Where
measurements are taken repeatedly on a particular entity, correlation must be factored
into the analysis.
The specific examples investigated by Nemec (1995) are repeated measures of seedling
height, time series of tree rings over many years, and the relationship between tree rings
and rainfall. The time series techniques used are ARIMA variants. Although Nemec does
not go into much depth on multivariate methods, basic information is provided on the
availability and use of these techniques. The SAS statistical package was referred to and
used for all applications. For repeated measures, Nemec (1995) demonstrated the SAS GLM
procedure, with occasional use of the ANOVA, PRINT and SORT procedures; for time
series analysis, the ARIMA, AUTOREG and FORECAST procedures were used. Detailed
hypothesis testing provided a variety of information on the nature of the variables involved.
The potential for multivariate analysis could have been explored in more detail, but for the
most part the analysis was appropriate and informative.
Peiris and McNicol (1996) investigate modelling daily weather using multivariate time
series techniques. By modelling rain and non-rain variables simultaneously, they captured
wet and dry days in a single model. Previous models tended to be specific to a particular
task or site, whereas Peiris and McNicol sought a general model applicable to the entire
Scottish climate. Data from many sites spanning 15 to 50 years was available, and four sites
were chosen for detailed analysis. The purpose of the analysis was to investigate patterns
and trends in rainfall in Scotland.
Peiris and McNicol (1996) found that some rainfall variables had annual cyclic patterns
that were well modelled by cosine functions. Modelling with sine or cosine waves is an
alternative to ARIMA-based modelling techniques. In this case the cyclic components were
more of a hindrance than an interest, so they were removed before proceeding with the
multivariate analyses.
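Removing an annual cycle of this kind amounts to regressing the series on a cosine and a sine at the annual frequency and keeping the residuals. The sketch below assumes an equally spaced series whose length is a whole number of cycles, so that the regressors are orthogonal and the least squares coefficients reduce to simple averages; the function name and the synthetic monthly data are illustrative, and this is not necessarily the exact procedure used by Peiris and McNicol (1996).

```python
import math


def remove_annual_cycle(y, period):
    """Fit y_t = a + b*cos(w*t) + c*sin(w*t), w = 2*pi/period, by least
    squares and return (a, b, c, residuals).

    Assumes len(y) is a whole number of periods and period > 2, in which
    case the constant, cosine and sine regressors are orthogonal.
    """
    n = len(y)
    if period <= 2 or n % period != 0:
        raise ValueError("need period > 2 and len(y) a multiple of period")
    w = 2 * math.pi / period
    a = sum(y) / n
    b = 2 / n * sum(v * math.cos(w * t) for t, v in enumerate(y))
    c = 2 / n * sum(v * math.sin(w * t) for t, v in enumerate(y))
    residuals = [v - (a + b * math.cos(w * t) + c * math.sin(w * t))
                 for t, v in enumerate(y)]
    return a, b, c, residuals


# Synthetic monthly series: four years of a pure annual cycle.
freq = 2 * math.pi / 12
series = [50 + 20 * math.cos(freq * t) + 5 * math.sin(freq * t) for t in range(48)]
a, b, c, deseasonalised = remove_annual_cycle(series, period=12)
```

With real data the residuals would retain the non-seasonal variation, and those residual series are what would then be carried forward into a multivariate model.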
The authors used the resulting detrended variables in a vector ARMA (VARMA) model.
Parameters were estimated by maximum likelihood, and the final model used for the four
Scottish sites was a VARMA(2,1) model, that is, a vector ARMA model with an
autoregressive component of order two and a moving average component of order one. The
large number of parameters to be estimated in this model raised the possibility of
overfitting; practical ways to lessen this risk are to use fewer variables or longer time series.
Rainfall models were then created by applying logistic regression techniques to the other
weather variables. Finally, real and simulated weather were compared over time, and a
good level of agreement was found between the predicted and actual weather.
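The overfitting concern can be made concrete by counting parameters. In a VARMA(p, q) model with k variables, each of the p autoregressive and q moving average coefficient matrices is k × k, and a mean vector and a symmetric error covariance matrix add further parameters. The count below is a sketch under those assumptions (an unrestricted model with no zero constraints); the number of variables actually used by Peiris and McNicol (1996) is not stated here, so the values of k are illustrative.

```python
def varma_parameter_count(k, p, q):
    """Free parameters in an unrestricted k-variable VARMA(p, q) model."""
    coefficient_matrices = (p + q) * k * k   # p AR and q MA matrices, each k x k
    means = k                                # one mean (intercept) per series
    error_covariance = k * (k + 1) // 2      # distinct entries of a symmetric k x k matrix
    return coefficient_matrices + means + error_covariance


# How the count for a VARMA(2,1) grows with the number of variables k.
counts = {k: varma_parameter_count(k, p=2, q=1) for k in (2, 4, 6)}
print(counts)  # {2: 17, 4: 62, 6: 135}
```

Because the coefficient count grows with k², each variable removed cuts the count substantially, which is consistent with fewer variables or longer series being the practical remedies.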
Starting with an overview of conventional vector ARMA methods, Chin (1995) develops
and uses a scale model to represent a multivariate rainfall time series. Chin saw
conventional multivariate ARIMA-based methods as restrictive and inappropriate for his
purpose, which was to model monthly and yearly rainfall data from South Florida in the
United States. The scale model allowed a distinction between regional-scale processes that
apply to all locations and small-scale local processes that apply only to a particular
location. Processes (variables) are judged regional-scale if they are correlated with rainfall
at many locations, and small-scale if correlated with rainfall at only one location.
Chin (1995) created models for both monthly and annual rainfall. The monthly data was
found to be particularly suited to the state space model because the regional-scale
phenomena had a temporal structure; that is, average rainfall in particular months was
gradually changing over time. This behaviour violates the stationarity assumption of
standard multivariate ARIMA-based models.
Li and Kafatos (2000) investigate the relationship between the normalised difference
vegetation index (NDVI) and the El Niño Southern Oscillation (ENSO). The data set
analysed comprised 11 years' worth of monthly NDVI (and ENSO) measurements from 1982
to 1992 at locations throughout the United States. For more information on the nature of
the NDVI and ENSO measures, consult the original article. The authors first removed the
seasonal components and then applied principal components analysis (PCA) to the NDVI.
The purpose of the PCA was to find the main sources (components) of variation within the
time series data. The PCA produced uncorrelated time series from within the NDVI,
referred to as standardised linear combinations (SLCs). Interannual signals were
investigated by wavelet decomposition, using wavelets, which are fundamental building-block
functions localised in time or space (Li and Kafatos, 2000). The result was that the fifth
strongest principal component (time series) of the NDVI was significantly correlated with
the ENSO signal. However, this principal component explained only 0.3% of the variance in
the NDVI. Appropriate use of time series analysis thus uncovered the relationship between
the NDVI and the ENSO.
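The variance share quoted for a principal component is its eigenvalue divided by the total variance, that is, the trace of the covariance matrix. The sketch below computes the covariance matrix of several series and extracts the leading component by power iteration, in plain Python so that it is self-contained; the data is simulated and the functions are hypothetical helpers, not the method or software of Li and Kafatos (2000). Later components, such as the fifth component discussed above, would be obtained in the same way after deflating the matrix, or more practically with a library eigen-decomposition.

```python
import math
import random


def covariance_matrix(series):
    """Sample covariance matrix of equal-length series (one list per variable)."""
    k, n = len(series), len(series[0])
    means = [sum(s) / n for s in series]
    return [[sum((series[i][t] - means[i]) * (series[j][t] - means[j])
                 for t in range(n)) / (n - 1)
             for j in range(k)] for i in range(k)]


def leading_component(cov, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric covariance matrix
    by power iteration (assumes the start vector is not orthogonal to it)."""
    k = len(cov)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(k) for j in range(k))
    return eigenvalue, v


def variance_share(cov, eigenvalue):
    """Proportion of total variance (trace of cov) explained by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))


# Hypothetical deseasonalised series at three sites: a shared signal plus
# site-specific noise, so the first component should dominate.
rng = random.Random(0)
signal = [math.sin(t / 4) for t in range(132)]      # 11 years of months
sites = [[s + rng.gauss(0, 0.2) for s in signal] for _ in range(3)]
cov = covariance_matrix(sites)
lam, loadings = leading_component(cov)
share = variance_share(cov, lam)
```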
Rod et al. (2001) reconstruct the water level of the large Lake Gallocanta in Spain for the
years 1889 to 1994 before using multivariate time series techniques to explain the
observed levels. The main purpose of the analysis was to explain the influences on water
level. Lake water levels were first reconstructed using a geochemical method: detailed
analysis of sediment core samples taken at two sites in Lake Gallocanta produced a time
series of water level. Consult Rod et al. (2001) for the exact methods by which the lake
level evolution was reconstructed from the core samples. The theory and use of these
reconstruction methods formed the bulk of the article.
Multivariate time series techniques were then used by Rod et al. (2001) to find
influences on lake water level. Before analysis, first-order differencing was applied to
the lake level to make the time series stationary (see section 2.4.6). The final time series
model explained 62.5% of the lake level variance by modelling annual water level from
annual rainfall and mean maximum temperature. Other potential explanatory variables were
tested but not found to be significant. Additional variables such as evaporation, wind and
relative humidity were not available for the full range of years but could have been
valuable. Analysis over selected years where these variables are available may provide a
more detailed picture of water level influences.
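First-order differencing replaces each value by its change from the previous period, z_t = y_t - y_(t-1), so a series with a steady trend becomes a series fluctuating about a constant. A minimal sketch with made-up annual lake levels (not data from Rod et al., 2001):

```python
def difference(series, lag=1):
    """Differences z_t = y_t - y_(t - lag); the result is lag values shorter."""
    if not 1 <= lag < len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical annual lake levels (m) with an upward drift.
levels = [2.0, 2.5, 2.75, 3.5, 3.75, 4.5]
changes = difference(levels)
print(changes)  # [0.5, 0.25, 0.75, 0.25, 0.75]
```

Any subsequent model fitted to the differenced series describes year-to-year changes in level rather than the level itself, which is worth keeping in mind when interpreting the variance explained.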
A number of time series may share an underlying trend that is not immediately obvious
from plotting each series against time. Nicholson et al. (1998) investigate forming a single
time series that summarises the trend in a number of time series. The data analysed
involved five groups of phytoplankton sampled over 13 months at two locations in the
North Sea. The goal was to find a linear transformation (a univariate time series) of the
multiple time series that maximises the trend. A few techniques are available for this, and a
linear smoother method proposed by Hastie and Tibshirani in 1990 was used. This method
has the advantages that the data need not be equally spaced in time and that outliers
(extreme values) have less impact than in other techniques.
Nicholson et al. (1998) took logarithms of the data prior to applying the linear smoother.
The purpose of the logarithmic transform was not stated but is presumed to be to stabilise
the variance (see section 2.4.6). The amounts of particular forms of phytoplankton were
found to differ considerably between the two locations; that is, linear smoothers applied
to the two locations separately showed quite different trends. More investigation into the
details of the linear transformation for practical applications would be valuable for future
analyses.
Boyd and Murray (2001) investigate 22 years' worth of yearly measurements of 27
variables. The 27 highly correlated variables were recorded over time from a marine
ecosystem at South Georgia, where approximately 36% of the data was missing.
Logarithmic transforms were applied to some variables before inclusion in the model (for
stationarity of variance, see section 2.4.6).
For analysis, Boyd and Murray (2001) combined the multiple time series into a single
time series called a combined standardised index (CSI). Three approaches were used to
form the CSI; the third, which involved smoothing a covariance matrix to ensure positive
eigenvalues, was the most successful. The authors acknowledged that this combining may
lose important distinctions between the combined groups. The CSI of the data showed
periodic fluctuations but little evidence of a long-term trend.
Boyd and Murray (2001) also examined how each CSI handled correlated and relatively
uncorrelated data. Predictably, when the data was relatively uncorrelated, the correlation
between a modelled index and the actual index weakened as more data was removed. When
the variables were correlated, the index was robust, coping with 40 to 50 percent of
values missing.
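The basic idea of a combined standardised index can be sketched as follows: standardise each series using only its observed values (subtract the mean, divide by the standard deviation), then average whatever standardised values are available at each time point. This is a deliberately simple version for illustration. It is not the covariance-smoothing method that Boyd and Murray (2001) found most successful, and the function name and data are hypothetical.

```python
import statistics


def combined_standardised_index(series):
    """Mean of available z-scores at each time point; None marks a missing value."""
    z_scores = []
    for s in series:
        observed = [v for v in s if v is not None]
        mean, sd = statistics.mean(observed), statistics.stdev(observed)
        z_scores.append([None if v is None else (v - mean) / sd for v in s])
    index = []
    for t in range(len(series[0])):
        available = [z[t] for z in z_scores if z[t] is not None]
        # A time point with every series missing gets no index value.
        index.append(sum(available) / len(available) if available else None)
    return index


# Two hypothetical annual series, the second with a missing year.
series_a = [1.0, 2.0, 3.0, 4.0, 5.0]
series_b = [10.0, None, 30.0, 40.0, 50.0]
csi = combined_standardised_index([series_a, series_b])
```

Averaging over whatever is available keeps an index value at every time point, but when the series are weakly correlated the index then rests on fewer, less representative series, which is in line with the weakening reported above.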
Most univariate and multivariate time series analyses assume linear relationships. When
linearity cannot be assumed, nonlinear techniques are available; a number of recent
nonlinear techniques are discussed in section 3.4. Reick and Page (2000) provide an
application of nonlinear techniques referred to as next (or nearest) neighbour methods.
These methods have the added advantage that they do not require a time series to be
stationary. Previously used on univariate time series, next neighbour methods are
generalised to the multivariate context and applied. The most elementary concept behind
nearest neighbour prediction is analogue prediction, which involves searching for a section
of the time series as similar as possible to the section leading up to the point where a
prediction is wanted, and assuming that the earlier pattern continues. The common nearest
neighbour techniques of local linear (LL) prediction and center-of-mass (COM) prediction
are based on this concept of analogue prediction.
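Center-of-mass prediction can be sketched in a few lines: take the most recent window of observations, find the k past windows closest to it, and average the values that followed those windows. The sketch below handles the multivariate case by treating each time step as a tuple of variables and comparing flattened windows by Euclidean distance. The window length m, neighbour count k and distance measure are assumptions, and this is not Reick and Page's (2000) exact implementation; local linear prediction, which fits a linear map over the neighbours instead of averaging, is not shown.

```python
import math


def com_predict(observations, m=3, k=2, target=0):
    """Center-of-mass (analogue) prediction of the next value of one variable.

    observations: list of tuples, one tuple of variable values per time step.
    m: window length; k: number of nearest past windows to average;
    target: index of the variable being predicted.
    """
    if len(observations) < m + k:
        raise ValueError("series too short for this window length and k")

    def window(start):
        # Flatten m consecutive time steps into one vector.
        return [v for row in observations[start:start + m] for v in row]

    latest = window(len(observations) - m)
    # Every earlier window that has a known successor, paired with it.
    candidates = [
        (math.dist(window(s), latest), observations[s + m][target])
        for s in range(len(observations) - m)
    ]
    nearest = sorted(candidates)[:k]
    return sum(successor for _, successor in nearest) / len(nearest)


# Hypothetical bivariate series (e.g. abundance and a covariate) with a cycle.
pattern = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
obs = [(v, 10 + v) for v in pattern]
forecast = com_predict(obs, m=3, k=2, target=0)
print(forecast)  # 3.0
```

Nothing in the method requires stationarity; it only assumes that the future resembles what followed similar past states.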
Reick and Page (2000) analysed data from twelve time series of zooplankton numbers
collected in the German North Sea. Three data sets were created from these twelve time
series by applying moving averages of different lengths to deal with the noisy nature of the
initial data. A number of additional quantities covering aspects such as water temperature,
salinity and wind were also available. For each time series in each data set, three time
series models were created: univariate (one variable), bivariate (two variable) and
trivariate (three variable). In every model, one of the variables was the one being
predicted. In most cases the next neighbour prediction schemes gave better predictions
than comparable autoregressive (AR) models.
Various time series methods were applied and compared by Stergiou et al. (1997). The
data set analysed contained total monthly commercial catches for sixteen species from
eighteen fishing sub-areas around the Greek islands. A number of independent variables
covering fishing effort, economic factors and climatic factors were also available. Three
general categories of models were applied. The first were standard simple and multiple
linear regression models based on external independent variables. The second were the
univariate techniques of Winters' exponential smoothing and ARIMA. The third were
multivariate techniques, including harmonic regression, dynamic regression and vector
autoregression. All of these techniques are discussed in varying detail in sections 2.4 and
2.5. For most techniques one model was made for each species, giving sixteen models in
total for each modelling method.
The success of the modelling techniques used by Stergiou et al. (1997) was measured by a
number of standard criteria. The focus was on finding a model that minimised fitting error
and gave the most accurate forecasts. Despite the inclusion of multivariate techniques, the
best all-round performer by these standard measures turned out to be the univariate
ARIMA model, although the multivariate dynamic regression model was not far behind.
This demonstrates that in time series analysis a more complex model is not necessarily a
better one.
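Standard forecast accuracy criteria of the kind used to compare such models include the mean absolute error (MAE), root mean squared error (RMSE) and mean absolute percentage error (MAPE). The specific criteria used by Stergiou et al. (1997) are not listed here, so the three below are common choices rather than their exact set, and the catch figures are invented.

```python
import math


def accuracy(actual, forecast):
    """MAE, RMSE and MAPE (%) of a forecast; MAPE requires nonzero actual values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty series of equal length")
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Invented monthly catches (tonnes) and forecasts from two competing models.
actual = [100.0, 120.0, 80.0]
model_a = accuracy(actual, [110.0, 115.0, 85.0])
model_b = accuracy(actual, [90.0, 130.0, 70.0])
```

RMSE penalises large errors more heavily than MAE, and MAPE is scale-free but unstable when actual values are near zero, so a comparison across several criteria, as in the study above, is more informative than any single one.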
Pech et al. (2001) explore the logical relationship between fishing activity and the
availability of the resource being fished. Seven time series collected from 1974 to 1992
documenting fishing effort and the resulting catches were analysed, and all combinations of
fishing resources, tactics, strategies and locations used were noted. The approach to
analysis was mainly mathematical rather than statistical. Mathematical expressions were
derived to represent the effort, catch and biomass (amount of fishing resources) involved.
To suit the available data, the expressions were modified to involve lower-level (more
specific) variables, including a measure of catch effort. Overall, the formulae were created
specifically for the situation and based on assumptions unique to it; statistical techniques
were used only to estimate the parameters in the resulting formulae. This statistical
application is therefore tightly bound to the particular situation.
Chen and Dyke (1998) model suspended sediment concentration and its relationship to the
water current velocity profile using multivariate time series techniques. This is a new
approach to a problem typically dealt with using deterministic partial differential equations.
Previous investigation by Chen and Dyke found that a multivariate model was more
appropriate than a univariate model. The model chosen was a multivariate time series
model called ARMAX (refer to section 2.5.1).
Chen and Dyke (1998) used a recursive least squares algorithm to estimate the parameters
of the ARMAX model. A set of statistical measures (one-step prediction error, maximum
one-step prediction vector error, maximum one-step prediction element error and maximum
parameter variation) was used to judge the suitability of different ARMAX models. The
final suspended sediment model was an ARMAX(4, 2, 1) model. Although the model was
found to fit the data well, the large number of parameters offered little in the way of
practical physical, chemical or biological meaning. Significance testing of the model
parameters may have assisted in drawing meaning from them.
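Recursive least squares updates the parameter estimates one observation at a time rather than refitting from scratch. The sketch below applies the standard RLS update to a simple ARX model, y_t = a*y_(t-1) + b*u_t, using simulated noise-free data. It is an illustration under these assumptions, not Chen and Dyke's (1998) multivariate ARMAX estimator: a full ARMAX fit would typically use extended least squares, in which the unobserved noise terms in the regressor vector are replaced by past prediction errors. The forgetting factor `lam` and initial scale `delta` are conventional tuning choices.

```python
import random


def rls(regressors, targets, lam=1.0, delta=1000.0):
    """Recursive least squares estimate of theta in y_t = phi_t . theta + e_t.

    lam: forgetting factor (1.0 weights all observations equally).
    delta: initial P = delta * I, i.e. a weak prior centred on theta = 0.
    """
    k = len(regressors[0])
    theta = [0.0] * k
    P = [[delta if i == j else 0.0 for j in range(k)] for i in range(k)]
    for phi, y in zip(regressors, targets):
        P_phi = [sum(P[i][j] * phi[j] for j in range(k)) for i in range(k)]
        denom = lam + sum(phi[i] * P_phi[i] for i in range(k))
        gain = [v / denom for v in P_phi]
        error = y - sum(phi[i] * theta[i] for i in range(k))  # one-step prediction error
        theta = [theta[i] + gain[i] * error for i in range(k)]
        # P is symmetric, so phi^T P equals (P phi)^T, i.e. P_phi.
        P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(k)] for i in range(k)]
    return theta


# Simulate y_t = 0.6 * y_(t-1) + 0.8 * u_t with a random input u.
rng = random.Random(42)
u = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0]
for t in range(1, 300):
    y.append(0.6 * y[t - 1] + 0.8 * u[t])
phis = [(y[t - 1], u[t]) for t in range(1, 300)]
a_hat, b_hat = rls(phis, y[1:])
```

The recursive form suits long or streaming records such as sediment monitoring, and a forgetting factor below 1 lets the estimates track slowly changing parameters.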
Another example of multivariate time series analysis in an unconventional application can
be found in Van Dongen and Geuens (1998). Wastewater treatment problems typically
involve deterministic analysis of differential equations; time series techniques were used
here to handle the more variable nature of realistic situations.
Univariate and multivariate ARMA models were used by Van Dongen and Geuens (1998)
to model the behaviour of a lab-scale biological wastewater treatment plant. Every model
had one dependent variable. Multivariate models were created to explain the behaviour of
filtered effluent, effluent suspended solids and the amount of mixed liquor suspended solids
(MLSS). Success was judged using the Akaike Information Criterion (AIC) and the
adjusted r², the latter measuring the proportion of variance explained in the dependent
variable. A number of independent variables were available for inclusion in the
multivariate models. However, correlation between these variables meant that they were
not technically independent. To deal with these dependencies, the more advanced
estimation techniques of two-stage and three-stage least squares were used rather than
ordinary least squares.
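For reference, the two criteria can be computed as follows for a least squares fit with n observations. The adjusted r² penalises r² for the number of explanatory terms p, while AIC, in its Gaussian least squares form and up to an additive constant, trades the residual sum of squares against the number of estimated parameters. The numbers below are hypothetical and not from Van Dongen and Geuens (1998).

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted r-squared; p counts explanatory terms, excluding the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def aic_least_squares(rss, n, n_params):
    """AIC = n*ln(RSS/n) + 2*n_params, omitting constants common to all
    models fitted to the same n observations (lower is better)."""
    return n * math.log(rss / n) + 2 * n_params


# Two hypothetical models fitted to the same 50 observations.
print(round(adjusted_r2(0.60, n=50, p=3), 4))   # 0.5739
aic_small = aic_least_squares(rss=40.0, n=50, n_params=4)
aic_large = aic_least_squares(rss=38.0, n=50, n_params=6)
print(aic_small < aic_large)                     # True
```

Here the larger model reduces the residual sum of squares slightly, but not enough to offset its two extra parameters, so AIC prefers the smaller model.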
Van Dongen and Geuens (1998) also transformed some dependent variables, as some
variables were found to model better than others. Some variable values were inverted while
others were used in the form of ratios. The better performance after transformation is likely
to be because the resulting data better satisfied the assumption of linear relationships. The
final models were judged successful because of the significance of their stochastic
(probabilistic) parts and of the lagged explanatory variable effects.
As with Chen and Dyke (1998), Van Dongen and Geuens (1998) applied time series
techniques where deterministic models involving partial differential equations are typically
used. In these situations overall behaviour can be modelled effectively by time series, but at
a cost in interpretation. Models formed from partial differential equations are built from
basic relationships, so relating results back to the original context is fairly simple. When
time series techniques are used, on the other hand, the results are relatively detached from
the original situation. Time series techniques that convey results more effectively in terms
of the original context could be valuable in these situations.
5 FORESTRY CASE STUDY

This chapter investigates data sets resulting from an in-depth experiment on the effects of
mechanical harvesting operations and site management (particularly soil cultivation) on
plantation productivity in second rotation hoop pine plantations. The original trials were
set up by the Queensland Forestry Research Institute, Agency for Food and Fibre
Sciences, Queensland Department of Primary Industries (DPI), and are detailed in Smith
and Bubb (2000).
As part of the experiments, the Griffith University team of the Cooperative Research
Centre (CRC) for Sustainable Production Forestry collected soil chemical and biological
data. The data sets contained many variables measured over up to nineteen time periods.
Some results and interpretations produced prior to this case study can be found in
Blumfield et al. (2002) for the chemical data set and Chen et al. (2002) for the biological
data set.
The focus of this case study is on investigating the effects of soil compaction and soil
cultivation on chemical and biological variables. Since the experiments were conducted
over time it is also of interest to observe variable behaviour in relation to compaction and
cultivation over time. Correlated data is involved because there are repeated measures
over time.
This chapter first presents the data sets, along with information on how they were
collected and issues relating to their use. Previous analyses of the data are then
scrutinised, and detailed analyses of the data sets are provided using advanced statistical
methods.
5.1 Background
A large experiment was set up on approximately 4.6 hectares of land at Yarraman
(26°52′S, 151°51′E), north-west of Brisbane, Queensland, Australia. The experiment was to
investigate the effects of mechanical harvesting operations and soil cultivation on Red
Ferrosol (Krasnozem) soil properties under wet weather conditions (Smith and Bubb,
2000). Specifically, these effects were to be investigated at the establishment of a second
rotation (2R) hoop pine plantation.
The data sets provided come from experiments that were part of a larger set of experiments
conducted at the Yarraman site. The experimental design used in the data sets provided
was based on the original experimental design set up by the DPI. The original experiment
was a randomised complete block (RCB) design with three blocks and twelve treatments.
Within each block, the twelve treatments were randomly allocated to different locations.
The three blocks were based on slope, in three categories: upper (1), mid (2) and lower (3).
The twelve treatments were formed from combinations of two factors: compaction with
four levels and cultivation with three levels. Compaction levels were set using a fully laden
Hemek F18HP Cranab 1200 forwarder weighing 40.2 tons (see Figure 5.1). The four
compaction levels were zero pass (no compaction), one pass, four pass and sixteen pass.
The three cultivation options were none, disc plough and winged ripper. The three blocks
and twelve treatments gave 36 different sub-locations within the main Yarraman site. For
more details on the original experiments and design, consult Smith and Bubb (2000).
Figure 5.1: Picture of the forwarder used for compaction in experiments.
Data sets have been provided from measurements of soil chemical and biological
variables over time. Both sets involved measurements taken every 28 days, a period
referred to as a month. The chemical data set was measured over nineteen time periods,
while the biological data set covered only fourteen. The fourteen time periods in the
biological data set coincide with the first fourteen time periods in the chemical data set.
The exact sampling dates are given in Appendix C and range from the 3rd of February
2000 to the 19th of July 2001.
The soil chemical and biological experiments did not use all of the treatments available.
Three compaction levels of (1) zero pass, (2) one pass and (3) sixteen pass were used,
along with two cultivation levels of (1) none and (2) disc plough. This gave six treatments
in total out of the twelve available. All three blocks were still used, giving eighteen plots
within the Yarraman site.
General environmental measures were also provided to account for potential
environmental influences on data variation. In particular, monthly rainfall, mean
maximum temperature, mean minimum temperature and mean temperature range were
provided for each month. Rainfall was measured on site using a pluviometer, while
temperature data were calculated from Yarraman Forestry Office records.
Both data sets contain measurements of soil moisture for each soil sample but from
different perspectives. The chemical data set used weight of water in sample divided by
weight of wet soil while the biological data set used weight of water in sample divided by
weight of dry soil. The result is that the two measures are different in nature but
noticeably correlated.
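If both moisture measures were determined on exactly the same soil, they would be linked exactly. With water content on a wet-weight basis w = water / (water + dry soil) and on a dry-weight basis d = water / dry soil, it follows that d = w / (1 - w) and w = d / (1 + d). This monotonic relationship is one reason to expect strong correlation between the two variables; any remaining disagreement would come from differences in subsampling or measurement. The sketch below uses hypothetical weights and illustrative function names, with moisture expressed as a fraction (multiply by 100 for the percentages in Table 5.1).

```python
def wet_to_dry_basis(w):
    """Convert a water/wet-soil weight ratio to a water/dry-soil weight ratio."""
    if not 0 <= w < 1:
        raise ValueError("wet-basis moisture must be in [0, 1)")
    return w / (1 - w)


def dry_to_wet_basis(d):
    """Convert a water/dry-soil weight ratio to a water/wet-soil weight ratio."""
    if d < 0:
        raise ValueError("dry-basis moisture must be non-negative")
    return d / (1 + d)


# Hypothetical sample: 20 g of water in 100 g of wet soil (80 g dry soil).
water, wet = 20.0, 100.0
w = water / wet                  # 0.2, the chemical data set convention
d = water / (wet - water)        # 0.25, the biological data set convention
```

The relationship is nonlinear, so the dry-basis measure spreads out relative to the wet-basis one as soils get wetter; a rank correlation would treat the two as equivalent, whereas a Pearson correlation would fall slightly short of 1 even for identical samples.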
It was considered that the effects of both soil compaction and cultivation are likely to
vary according to soil depth. For this reason the chemical data set considers two soil
depths of 0-10 cm and 10-20 cm. In the biological data set only the 0-10 cm depth is
investigated.
In the chemical data set, there was interest in the dynamics of nitrogen transformations and
leaching. Soil mineral nitrogen dynamics covers soil mineral nitrogen fluxes over various
periods of measurement. To investigate soil mineral nitrogen dynamics and leaching, a
technique involving sequential, in situ exposure of soils was used. At each location in
each month, three sampling tubes called cores were driven into the ground (see Figure
5.2). One was removed immediately and used for baseline data. The remaining two were
left for the 28 days, one capped so that nitrogen leaching could not take place and one
uncapped so that nitrogen leaching could occur.
Figure 5.2: Three sampling cores in the ground at Yarraman. One is being removed.
Each soil sample was analysed for three forms of nitrogen: nitrite (NO2), nitrate (NO3)
and ammonium (NH4). The standard unit used in analysis for these measures was
kilograms of nitrogen per hectare (kgN/ha). Soil mineral nitrogen dynamics were
calculated by subtracting baseline core levels from capped core levels. For ammonium,
positive dynamics were assumed to reflect nitrogen mineralisation, while negative
dynamics were assumed to reflect immobilisation or nitrification. For nitrate and nitrite,
positive dynamics were assumed to result from nitrification, whilst negative dynamics were
assumed to result from denitrification. Leaching was calculated by subtracting the
uncapped core level from the capped core level. In this way, data from the three cores
produced baseline nitrogen levels, nitrogen dynamics and nitrogen leaching measures for
each form of nitrogen.
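The two derived measures are simple differences between the three cores. The sketch below encodes them for one form of nitrogen at one plot and month; the class name, field names and values are hypothetical, and the sign interpretation in the comments follows the assumptions stated above for ammonium.

```python
from dataclasses import dataclass


@dataclass
class CoreTriplet:
    """Mineral N (kgN/ha) for one form of nitrogen, one plot and one month."""
    baseline: float   # core removed immediately
    capped: float     # capped for 28 days: no leaching possible
    uncapped: float   # uncapped for 28 days: leaching possible

    @property
    def dynamics(self) -> float:
        # Net change over the 28 days in the absence of leaching.
        return self.capped - self.baseline

    @property
    def leaching(self) -> float:
        # Nitrogen lost from the uncapped core relative to the capped core.
        return self.capped - self.uncapped


# Hypothetical ammonium values for one plot and month.
nh4 = CoreTriplet(baseline=12.0, capped=15.5, uncapped=13.0)
print(nh4.dynamics, nh4.leaching)   # 3.5 2.5
# Under the stated assumptions, positive ammonium dynamics indicate mineralisation.
```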
Biological variables were measured from the same soil samples as used in the chemical
data set. Every month the baseline samples from the chemical data set were analysed at
the 0-10 cm depth level only. In particular, microbial biomass carbon (MBC) and
microbial biomass nitrogen (MBN) were measured. An additional ratio variable,
calculated from microbial carbon divided by microbial nitrogen (ie. MBC / MBN), was
also of interest for analysis.
Further details on the methods used for sampling and extracting data can be found in
Blumfield et al. (2002) for the chemical data set and Chen et al. (2002) for the biological
data set. Table 5.1 contains a summary of the factors and variables provided in the data
sets. For more information on the factors and variables provided, consult Appendix D.
Data Set(s)   Label         Units     Comments
Both          Month         [Levels]  Categories: 0 to 19, or 0 to 14.
Both          Block         [Levels]  Categories: 1 to 3, based on slope.
Both          Compaction    [Levels]  Categories: 1 to 3 (0, 1, 16 pass).
Both          Cultivation   [Levels]  Categories: 1 (none) or 2 (plough).
Chemical      Depth         [Levels]  Categories: 1 (0-10 cm) or 2 (10-20 cm).
Chemical      Grav          %         Gravimetric soil moisture.
Biological    Moist         %         Soil moisture, different to Grav.
Chemical      Rainfall      mm        Recorded monthly rainfall.
Chemical      MaxTemp       °C        Mean monthly maximum temperature.
Chemical      MinTemp       °C        Mean monthly minimum temperature.
Chemical      TmpRange      °C        Mean monthly temperature range.
Chemical      HaNO2         kgN/ha    Nitrite levels.
Chemical      HaNO3         kgN/ha    Nitrate levels.
Chemical      HaNH4         kgN/ha    Ammonium levels.
Chemical      HaTOTN        kgN/ha    Total mineral nitrogen levels.
Chemical      PotNO2        kgN/ha    Nitrite dynamics.
Chemical      PotNO3        kgN/ha    Nitrate dynamics.
Chemical      PotNH4        kgN/ha    Ammonium dynamics.
Chemical      PotTOTN       kgN/ha    Total mineral nitrogen dynamics.
Chemical      LchNO2        kgN/ha    Nitrite leaching.
Chemical      LchNO3        kgN/ha    Nitrate leaching.
Chemical      LchNH4        kgN/ha    Ammonium leaching.
Chemical      LchTOTN       kgN/ha    Total mineral nitrogen leaching.
Biological    MBN           µg/g      Microbial biomass nitrogen.
Biological    MBC           µg/g      Microbial biomass carbon.
Biological    MicroC:N      ratio     Ratio of MBC / MBN.
Biological    MBNFlux       µg/g      Changes in MBN each month.
Table 5.1: Summary of factors and variables provided for analysis in the case study.
The purpose of analysing the data sets is to investigate the effects of compaction and
cultivation over time on the chemical and biological variables introduced in this section. It
is of interest to see whether compaction affects these variables, whether cultivation affects
them, or whether an interaction of the two factors is involved. Furthermore, the role that
time plays in any of these relationships is of special interest. Related to this investigation
are other factors and variables (blocks, rainfall, etc.) that may also be influencing the
variables.
5.2 Previous Data Analysis
Existing articles can be found addressing initial analyses of the chemical data set in
Blumfield et al. (2002) and the biological data set in Chen et al. (2002). In this section
the initial analyses are reviewed in a statistical sense. The original papers contain detailed
analysis and conclusions in terms of forestry that are not pursued here. For more
information on the forestry issues involved, consult the original articles.
5.2.1 Chemical Data
Blumfield et al. (2002) analysed the chemical data set using the SPSS Base 10 software
system (SPSS, 1999). Parametric techniques included ANOVA for comparison of means
and group contrast multiple comparison tests. Where the assumption of normally
distributed populations seemed doubtful, nonparametric techniques were used, including
the Mann-Whitney U test for differences between two groups and Spearman's rho (rank
correlation). The use of nonparametric methods is a good idea, given that many of the
variables are unlikely to be normally distributed.
Decisions on significance were based on p values at the standard 0.05 significance level,
and p values were commonly quoted along with significance information.
An unexplained curiosity in the data was the extremely high values of ammonium
observed in month sixteen. No evidence could be found of measurement error or similar
that could have caused the extreme values. Therefore the values were treated as genuine
outliers and omitted from analyses.
Soil moisture was found to be correlated with rainfall, maximum temperature and
minimum temperature. Soil that had not been cultivated was found to have significantly
higher mean moisture content at 0-10 cm than cultivated soil. Moisture levels were not
significantly different between cultivation levels at the 10-20 cm depth. Soil moisture was
found to be significantly higher under sixteen pass compaction than under zero pass and
one pass compaction, both with and without cultivation.
Blumfield et al. (2002) conducted separate ANOVAs for each combination of the two
depths and three nitrogen dynamics measures (nitrate, ammonium and total nitrogen).
The data used in each ANOVA was cumulative totals over the nineteen months. Two small
amounts of missing data were ignored in these calculations; although this is not
recommended, it probably changed the results only negligibly.
There are advantages and disadvantages to the use of cumulative data. The resulting data
analysis looks at the overall behaviour exhibited but gives no information on how
behaviour changes over time. The picture at certain months and periods of time within
the data may be completely different to the overall picture but this will not be shown. The
effect of individual extreme values may be amplified and give the appearance of
relationships that are not really there.
The use of blocks in the experimental design presents a theoretical and practical problem.
The randomised complete block (RCB) design theoretically involves a block that is
assumed to have an effect but not interact with other factors (Zar, 1999). In the
experiment, the blocks were based on subtle differences in slope (Blumfield, personal
communication) that would not be expected to lead to significant differences in nitrogen
measures. This means that the blocks are not expected to have an effect and suggests that
a completely randomised design (CRD) may have been more appropriate from the outset.
Each ANOVA model tested the factorial effects of block, compaction, cultivation, the block
by compaction interaction, the block by cultivation interaction and the compaction by
cultivation interaction. Equation 5.1 shows the form of each model. Every model uses the
three-way interaction of block by compaction by cultivation as the error term, which has
only four degrees of freedom. A further characteristic of RCB designs is that interactions
involving the block are estimates of natural variation. To remain true to the block design,
the two block interactions that were tested for significance should have been included in
the error. More worrying is that these block interaction terms were at times found to be
significant. Combined with the lack of practical differences between the blocks, this
suggests that a significant block interaction may indicate that effects depend on the
particular piece of land.
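$$\text{Measure}_{ijk} = \mu + \text{Compaction}_i + \text{Cultivation}_j + \text{Block}_k + (\text{Comp.} \times \text{Cult.})_{ij} + (\text{Comp.} \times \text{Block})_{ik} + (\text{Cult.} \times \text{Block})_{jk} + \varepsilon_{ijk} \qquad (5.1)$$

Where:
i = 1, 2, 3 (compaction level indicator).
j = 1, 2 (cultivation level indicator).
k = 1, 2, 3 (block level indicator).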
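The four error degrees of freedom quoted for Equation 5.1 can be checked directly. With 18 plot totals (3 compaction × 2 cultivation × 3 blocks, one cumulative value each), each crossed term has the product of its factors' (levels - 1) degrees of freedom, and whatever remains after the mean and the six fitted terms is left for the error. The sketch below is a plain arithmetic check assuming a balanced design with no missing plots; the helper name is hypothetical.

```python
from itertools import combinations
from math import prod


def term_df(levels, term):
    """Degrees of freedom of a crossed factorial term in a balanced design."""
    return prod(levels[factor] - 1 for factor in term)


levels = {"Compaction": 3, "Cultivation": 2, "Block": 3}
n_obs = prod(levels.values())                       # 18 plots, one total each
terms = [(f,) for f in levels] + list(combinations(levels, 2))
df = {term: term_df(levels, term) for term in terms}
error_df = n_obs - 1 - sum(df.values())             # the three-way interaction

# Pooling the two block interactions into the error, as the RCB design implies.
pooled_error_df = (error_df + df[("Compaction", "Block")]
                   + df[("Cultivation", "Block")])
print(error_df, pooled_error_df)                    # 4 10
```

Pooling the block interactions would more than double the error degrees of freedom, giving noticeably more powerful tests of the treatment effects.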
The bulk of the article by Blumfield et al. (2002) deals with implications of these results
and more specific results obtained from multiple comparison tests after significant
ANOVA results. Due to the aforementioned concerns with the base design and the
specific nature of reported results, these are not detailed in this section. Blumfield et al.
(2002) should be consulted for further information.
5.2.2
Biological Data
Chen et al. (2002) analysed the biological data set using two main groups of ANOVA models. The first group comprised ANOVA models in which soil compaction, cultivation, block and sampling month were included as factors in each model of a biological measure. The factorial effects included in these models are shown in Equation 5.2. The second group comprised one-factor ANOVA models looking at the main effects of soil compaction and cultivation.
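$$\begin{aligned}
\text{Measure}_{ijkl} = \mu &+ \text{Month}_i + \text{Compaction}_j + \text{Cultivation}_k + \text{Block}_l \\
&+ (\text{Month}\times\text{Comp.})_{ij} + (\text{Month}\times\text{Cult.})_{ik} + (\text{Comp.}\times\text{Cult.})_{jk} \\
&+ (\text{Comp.}\times\text{Block})_{jl} + (\text{Cult.}\times\text{Block})_{kl} \\
&+ (\text{Month}\times\text{Comp.}\times\text{Cult.})_{ijk} + \varepsilon_{ijkl}
\end{aligned} \qquad (5.2)$$
Where:
i = 1, 2, 3, ..., 14 (month level indicator).
j = 1, 2, 3 (compaction level indicator).
k = 1, 2 (cultivation level indicator).
l = 1, 2, 3 (block level indicator).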
There are two main issues with the ANOVA as conducted by Chen et al. (2002). The first is that sampling month has been included as a factor in the ANOVA. Correlation between measures taken at different times at exactly the same location violates the ANOVA requirement of random, independent errors. Therefore models that simply include sampling month as an additional term are not appropriate and may lead to misleading conclusions. Secondly, examining main effects when there may be interactions is not recommended, as this can lead to misleading conclusions depending on the exact nature of the interactions.
The biological data set also faces the theoretical and practical problems arising from the RCB design, as was the case with the chemical data set. In the overall ANOVA models two block interactions were included, leaving a number of two-, three- and four-way interactions to form the estimate of error. It is unclear exactly how it was decided which factorial effects to include in the model and which to pool into the error. Of more interest is that there were occasional significant block interactions, again hinting at behaviour depending on the particular piece of land.
Most statistical results reported by Chen et al. (2002) were based on the two basic types of ANOVA model introduced above. Given the analysis problems, further results are not reported in this thesis, and the original article should be consulted for further information and biological interpretations.
5.3
Limitations and Scope
The main limitation for the application of statistical techniques is the short duration of the investigations. Both data sets cover less than one and a half years. This makes estimation of seasonal variation difficult because very few seasons are available. In fact, some months of the year appear only once in the data sets, meaning that any assumptions about seasonal behaviour would be rather naïve.
A small number of nitrogen measurements were missing from the chemical data set. Some appeared to have been left out by accident and were easily filled in from other data, while in two cases data were missing and could not be obtained: nitrogen measures were not provided for either cultivation level in month seven, block three, compaction level three and depth two. With less than 0.3% of the data missing, this small number of missing values is not a cause for concern. It was decided to allow for these missing values in the analysis rather than attempt to approximate them.
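The 0.3% figure can be checked with a little arithmetic. The sketch below assumes (this is not stated explicitly in the data description) that the chemical data form a complete grid of 19 months × 3 blocks × 3 compaction levels × 2 cultivation levels × 2 depths, with one soil sample per cell; the constant and function names are illustrative only.

```python
# Sketch: proportion of missing soil samples in the chemical data set.
# ASSUMPTION: one sample per month x block x compaction x cultivation x depth cell.
MONTHS = 19
BLOCKS = 3
COMPACTION_LEVELS = 3
CULTIVATION_LEVELS = 2
DEPTHS = 2  # 0-10 cm and 10-20 cm

TOTAL_SAMPLES = MONTHS * BLOCKS * COMPACTION_LEVELS * CULTIVATION_LEVELS * DEPTHS


def missing_fraction(n_missing: int) -> float:
    """Fraction of the full factorial grid that is missing."""
    return n_missing / TOTAL_SAMPLES


# Both cultivation levels in month 7, block 3, compaction 3, depth 2 were missing.
print(TOTAL_SAMPLES)                 # 684
print(f"{missing_fraction(2):.2%}")  # 0.29%
```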
The chemical data set contained measurements at two depths, 0-10 cm and 10-20 cm, while the biological data set only looked at the first depth. For the purposes of this thesis only the 0-10 cm depth is investigated, because the majority of the effects and behaviour are expected at this depth (which is why the biological experiment was only conducted at this depth).
Detailed biological and chemical interpretations of behaviour are beyond the scope of this
thesis. Analysis is dealt with from a statistical point of view in line with the objectives of
the thesis.
5.4
Data Analysis Techniques
This section provides a detailed discussion of the statistical models and techniques
applied in section 5.5 during data analysis. Features of the data set as a whole are
discussed before detailed information on the procedures used in data analysis. All
analysis was aided by the SAS statistical package (SAS Institute, 1999) and Microsoft
Excel (Microsoft Corporation, 2001).
5.4.1
Analysis Direction
A number of factors have influenced the direction taken towards data analysis in this thesis. This section reviews how the particular direction of analysis was chosen, based on the original experimental situation, the data sets and other relevant information.
The original samples could not be assumed to be taken from normally distributed populations (normality). The main difficulty with assuming normality is the nature of the variables' measurements. Many of the variables are concentrations, which are known to follow a lognormal distribution due to the way they are measured (Chaseling, personal communication). Most of the variables that were not concentrations (eg. soil moisture, rainfall) have distributions that are skewed to the right, with larger variable values likely to have a larger standard deviation. Such distributions are common in biological data (Rao, 1998).
A natural log transformation was applied to bring all variable samples closer to normality. This type of transformation is commonly applied to achieve closer adherence to normality, particularly for concentrations and biological variables (Rao, 1998 and Zar, 1999). The log transformation is shown in Equation 5.3, where y is the original value and g(y) the new, transformed value. This transformation was applied to all variables considered except the chemical variables (i.e. it was applied to the biological variables and to the rainfall, temperature and moisture variables). For the chemical variables the transformation shown in Equation 5.4 was used instead, as some variable values were zero. The logarithm of zero is undefined (log y tends to negative infinity as y approaches zero).
$$g(y) = \log_e(y), \qquad y = e^{g(y)} \qquad (5.3)$$
$$g(y) = \log_e(y + 0.1), \qquad y = e^{g(y)} - 0.1 \qquad (5.4)$$
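A minimal Python sketch of Equations 5.3 and 5.4 and their back transformations follows. The function names and example values are hypothetical; the offset of 0.1 is the one used in Equation 5.4.

```python
import math


def log_transform(y: float) -> float:
    """Equation 5.3: g(y) = ln(y); only defined for y > 0."""
    return math.log(y)


def back_transform(z: float) -> float:
    """Inverse of Equation 5.3: y = e^g(y)."""
    return math.exp(z)


def log_transform_chem(y: float, offset: float = 0.1) -> float:
    """Equation 5.4: g(y) = ln(y + 0.1), which allows zero readings."""
    return math.log(y + offset)


def back_transform_chem(z: float, offset: float = 0.1) -> float:
    """Inverse of Equation 5.4: y = e^g(y) - 0.1."""
    return math.exp(z) - offset


# A zero nitrite reading is valid under Equation 5.4 but not Equation 5.3.
print(round(log_transform_chem(0.0), 4))  # -2.3026
try:
    log_transform(0.0)
except ValueError:
    print("ln(0) is undefined")

# Round trips recover the original values (up to floating point error).
print(round(back_transform_chem(log_transform_chem(12.5)), 10))  # 12.5
```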
The natural log transformation for normality was not applied directly to the dynamics and
leaching measures in the chemical data set. Rather, the transformation was applied to the
original baseline, capped and uncapped core readings (derived from the provided data
set). The dynamics and leaching measurements used in analysis were derived from these
logged original recordings.
A small error was found in one chemical data measurement while back-calculating the original capped and uncapped core measurements from the nitrogen levels, dynamics and leaching values. In month thirteen, block three, compaction level three, cultivation level one and depth two, the nitrite level and nitrite dynamics measure form an invalid combination: if both values were taken as correct, the capped core would have a negative nitrite reading. Since the error was judged more likely to lie in the derivation of the dynamics measure, the dynamics value was changed to zero. Zero was chosen as it agrees mathematically with the other measures in the data set.
A number of extremely high values are present in the ammonium levels collected from the baseline samples in month sixteen. These outliers appear to be accurate but are completely inconsistent with the remainder of the measurements. A flow-on effect on ammonium nitrogen dynamics produces extremely high values in month fifteen and extremely low values in month sixteen. Figures 5.3 and 5.4 summarise the extreme ammonium behaviour observed. Leaching levels were unaffected by the unusual values, implying that only baseline levels were subject to extreme values. For the purposes of analysis these extreme values (outliers) were removed, as they could lead to misleading results. The ammonium contribution to the overall nitrogen variables was not changed, because the ammonium values may be part of an overall behaviour and the extreme values are in any case similar in magnitude to the nitrate measures.
[Figure: Mean Chemical Nitrogen Levels. Mean nitrogen (kgN/ha) for nitrite, nitrate and ammonium plotted against month (1 to 19).]
Figure 5.3: Mean mineral nitrogen levels (kgN/ha) over the nineteen months.
[Figure: Mean Chemical Nitrogen Dynamics. Mean nitrite, nitrate and ammonium dynamics plotted against month (1 to 19).]
Figure 5.4: Mean mineral nitrogen dynamics (kgN/ha) over the nineteen months.
Nearly 76% of nitrite level readings were zero, which is reflected in the very small (where visible) nitrite values in Figures 5.3 and 5.4. Nitrite is present only in small quantities because it is an intermediate product in an ongoing chemical reaction (Blumfield, personal communication). This, combined with the large proportion of zero nitrite measures, suggests that meaningful results are unlikely. It was therefore decided not to analyse nitrite in isolation in this thesis.
As introduced in section 5.2, the original design was a randomised complete block (RCB) design, with blocks based on small differences in slope. These subtle slope differences would not be expected to be a source of variation for any variable, which runs counter to the rationale for an RCB design. A completely randomised design (CRD) may therefore have been more appropriate, although possibly impractical.
For each variable, the experimental situation is an RCB design repeated at a number of different times. It is a repeated measures over time situation because each variable is recorded at a number of different times. The same variable recorded at exactly the same location each month is likely to show some correlation between measures, meaning that month cannot simply be included as a factor in an RCB design. Two main techniques are used in this thesis to deal with the repeated measures nature of this experiment: split plot designs and MANOVA. Both of these techniques are variants of ANOVA suitable for the analysis of repeated measures. For more information on split plot designs and MANOVA refer to section 2.3.
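Why repeated measures on the same plot are correlated can be illustrated with a small simulation. The sketch assumes a simple hypothetical model that is not fitted to the Yarraman data: each plot has a persistent effect (its particular piece of land) plus independent month-to-month noise. Under that assumption the correlation between two months is plot_sd² / (plot_sd² + noise_sd²), which is 0.8 with the defaults below, so the monthly errors are far from independent.

```python
import random


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_two_months(n_plots=2000, plot_sd=1.0, noise_sd=0.5, seed=1):
    """Correlation between two months when every plot keeps its own effect."""
    rng = random.Random(seed)
    month_a, month_b = [], []
    for _ in range(n_plots):
        plot_effect = rng.gauss(0.0, plot_sd)  # same land, same effect each month
        month_a.append(plot_effect + rng.gauss(0.0, noise_sd))
        month_b.append(plot_effect + rng.gauss(0.0, noise_sd))
    return pearson(month_a, month_b)


r = simulate_two_months()
print(round(r, 2))  # close to the theoretical value of 0.8
```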
Twelve different variables are investigated in this thesis. Nine chemical variables result from combining the three nitrogen variables (nitrate, ammonium and total nitrogen) with the three types of measure (levels, dynamics and leaching). The three biological variables are microbial carbon, microbial nitrogen and the carbon to nitrogen ratio. Each variable is investigated separately, in line with the focus of this thesis on analysis over time. Analysis combining several of these variables, particularly at a specific time, is a possibility for future work. Sections 5.4.2 to 5.4.8 detail the techniques and models applied to every variable.
Techniques specifically developed for time series analysis are for the most part not suited to this experimental situation. The main reason is that the time series involved are very short, having at most nineteen time periods. Time series this short can rarely be used to build accurate univariate time series models (Makridakis et al., 1998). In particular, short time series make it difficult (if not impossible) to estimate seasonal effects. Multiple variable versions of time series techniques are even more demanding, requiring still longer base time series (McCleary and Hay, 1980; Franses, 1998). With a length of at most nineteen time periods, accurate multivariate time series models are therefore unlikely to be achievable.
Seasons were created from the original months as an additional factor for analysis, on the suspicion that the behaviour of variables may differ depending on the season. Furthermore, individual months may contain a large amount of noise or random variation, and a season based approach can partly control this. Five appropriate seasons were identified from the original nineteen months. The seasons are shown in detail along with the original time periods in Appendix C. In summary, months two to four form season one (autumn 2000), months five to seven form season two (winter 2000), months nine to eleven form season three (spring 2000), months twelve to fourteen form season four (summer 2000/2001), and months fifteen to seventeen form season five (autumn 2001). Only the first four seasons apply to the biological data set because data were recorded for only fourteen months. Note that four months of data (months one, eight, eighteen and nineteen) are lost from the original nineteen in the conversion to seasons, because each sampling month is 28 days long and so the months do not align exactly with the seasons.
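The month to season conversion can be written as a simple lookup. This sketch assumes the biological data share the chemical data's month numbering (months 1 to 14); the dictionary and function names are illustrative.

```python
# Season number -> the three sampling months (28-day periods) it contains.
SEASON_MONTHS = {
    1: (2, 3, 4),     # autumn 2000
    2: (5, 6, 7),     # winter 2000
    3: (9, 10, 11),   # spring 2000
    4: (12, 13, 14),  # summer 2000/2001
    5: (15, 16, 17),  # autumn 2001
}

# Reverse lookup: month -> season.
MONTH_TO_SEASON = {
    month: season for season, months in SEASON_MONTHS.items() for month in months
}


def unused_months(n_months):
    """Months in 1..n_months that do not fall in any season."""
    return [m for m in range(1, n_months + 1) if m not in MONTH_TO_SEASON]


print(unused_months(19))  # chemical data: [1, 8, 18, 19]
print(unused_months(14))  # biological data: [1, 8]
```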
The unit used for all chemical variables in analysis was kilograms of nitrogen per hectare (kgN/ha). In the biological data set, the carbon to nitrogen variable is a ratio, while the other two variables are measured in micrograms per gram (µg/g). Logged values were used during analysis, but these transformed values are not quoted in results. Rather, back transformed means and standard errors are quoted where appropriate. Back transformation simply reverses the original log transformation. A back transformed mean is commonly labelled an equivalent mean (Chaseling, personal communication); it will generally differ from the arithmetic mean of the untransformed values (for Equation 5.3 it is the geometric mean).
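The difference between an equivalent mean and the arithmetic mean is easy to see numerically. In this sketch (hypothetical function name and values), the equivalent mean is computed by averaging the logged values and back transforming; with no offset this is the geometric mean, which is never larger than the arithmetic mean of positive values.

```python
import math


def equivalent_mean(values, offset=0.0):
    """Back transformed mean: exp(mean(ln(y + offset))) - offset.

    offset=0.0 corresponds to Equation 5.3 and offset=0.1 to Equation 5.4.
    """
    logs = [math.log(v + offset) for v in values]
    return math.exp(sum(logs) / len(logs)) - offset


readings = [2.0, 8.0]  # hypothetical values, eg. microbial carbon in µg/g
arithmetic = sum(readings) / len(readings)
print(arithmetic)                           # 5.0
print(round(equivalent_mean(readings), 6))  # 4.0, the geometric mean sqrt(2 * 8)
```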
A standard summary notation is used throughout to represent different factor levels, for ease of analysis and brevity in reporting results. The original data set used a similar scheme. The notation used in this chapter is summarised in Table 5.2.
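| Factor      | Level                     | Symbol |
|-------------|---------------------------|--------|
| Compaction  | No compaction (zero pass) | 1      |
| Compaction  | One pass                  | 2      |
| Compaction  | Sixteen pass              | 3      |
| Cultivation | None                      | 1      |
| Cultivation | Disc plough               | 2      |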
Table 5.2: Standard summary notation for factor levels.
Hypothesis tests are conducted frequently in data analysis, and their results are quoted with p values. Asterisks are used to indicate the level of significance of these p values, as shown in Table 5.3.
| Symbol | Interpretation |
|--------|----------------|
| *      | p < 0.05       |
| **     | p < 0.01       |
| ***    | p < 0.001      |
Table 5.3: Legend for symbols denoting significance.
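Table 5.3 amounts to a simple threshold rule. A minimal sketch follows; returning an empty string for p ≥ 0.05 is an assumption, since the table does not define a symbol for non-significant results.

```python
def significance_symbol(p: float) -> str:
    """Asterisk code for a p value, following Table 5.3."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at the 5% level (assumed convention)


for p in (0.0004, 0.004, 0.03, 0.2):
    print(p, significance_symbol(p) or "(none)")
```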
5.4.2
Exploratory Data Analysis (EDA)
Exploratory data analysis for each variable involved graphing the variable over time for all six treatments. That is, the variable is graphed over the nineteen (or fourteen) time periods for each compaction and cultivation combination. Values for each combination at each time are averaged over the three blocks since, in an RCB design, blocks should not interact with treatments.
To smooth the appearance of these graphs over time, a three month moving average (3MA) was applied to every variable graph. Each month's value is effectively the average of itself, the month before and the month after. Hence the smoothed graphs have no values for the first and last months (because there is no month before the first or after the last). The smoothed graphs were generally clearer and easier to interpret than the unsmoothed graphs.
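The centred three month moving average can be sketched as follows (hypothetical function name). It assumes the months are consecutive and equally spaced, and it leaves the first and last months empty, as described above.

```python
from typing import List, Optional


def three_month_moving_average(series: List[float]) -> List[Optional[float]]:
    """Centred 3MA: each month averaged with its two neighbours.

    The first and last months lack a neighbour on one side, so they are None.
    """
    smoothed: List[Optional[float]] = [None] * len(series)
    for t in range(1, len(series) - 1):
        smoothed[t] = (series[t - 1] + series[t] + series[t + 1]) / 3
    return smoothed


print(three_month_moving_average([3.0, 6.0, 9.0, 3.0]))
# [None, 6.0, 6.0, None]
```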
A standard notation was used to represent compaction and cultivation levels over time in the exploratory data analysis graphs. This notation is shown in Figure 5.5, where the first value is the level of compaction and the second is the level of cultivation. All graphs are titled "by compaction, cultivation", reflecting the notation's ordering of compaction first and cultivation second.
Figure 5.5: Graphical notation for compaction and cultivation options.
5.4.3
Correlation Analysis
Correlation is examined in detail to see whether each variable has a relationship with the physical measures of rainfall, temperature and moisture. Correlation was calculated using Pearson's correlation coefficient on transformed variable values (see section 5.4.1). Cross correlation functions were used to give a picture of each variable's correlation with a number of lags of the physical measures. Lags of up to and including four were applied, allowing for a delayed effect of up to four months. Where cross correlation functions are included, only positive lags are shown, because the reverse relationships (eg. that a variable affects rainfall) do not make practical sense.
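A sketch of the lagged correlations is given below. It assumes one value per month for each series (eg. the monthly mean of the transformed variable, and monthly rainfall) and defines lag k as pairing the variable in month t with the physical measure in month t − k, so only months with both values contribute. This is Pearson's coefficient on the overlapping pairs; cross correlation functions in statistical software may normalise slightly differently. Function names and data are hypothetical.

```python
def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lagged_correlations(variable, physical, max_lag=4):
    """Correlation of variable[t] with physical[t - k] for k = 0..max_lag.

    Only positive lags are computed: the physical measure leads the variable.
    """
    if len(variable) != len(physical):
        raise ValueError("series must cover the same months")
    result = {}
    for k in range(max_lag + 1):
        x = physical[: len(physical) - k]  # months 1 .. n-k
        y = variable[k:]                   # months 1+k .. n
        result[k] = pearson(x, y)
    return result


# Hypothetical example: the variable repeats rainfall two months later.
rain = [1.0, 5.0, 2.0, 8.0, 3.0, 9.0, 4.0, 7.0, 6.0, 10.0]
var = [0.0, 0.0] + rain[:-2]
ccf = lagged_correlations(var, rain)
print(round(ccf[2], 6))  # 1.0
```

With nineteen months and a lag of four, each coefficient rests on only fifteen pairs, which is worth bearing in mind when interpreting the longer lags.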
For soil moisture, correlation is calculated by comparing moisture with the chosen
variable from every soil sample. This was done because every soil sample has a unique
soil moisture measure. The soil moisture quantity used was different depending on the
data set (see section 5.1). The moisture quantity used for each variable was the one given
in the appropriate data set.
Correlation coefficients involving temperature and rainfall were calculated using average variable values for each month. The reason for this is that hypothesis testing on correlation requires that variable values are selected at random from normally distributed populations (Zar, 1999). This assumption clearly cannot be satisfied for rainfall and temperature if the same nineteen (or fourteen) monthly measures are paired repeatedly with every individual soil sample. In fact, given the form of the correlation coefficient (see section 2.1.2), it could be anticipated that variable values in months with particularly high or low rainfall or temperature would very strongly influence the correlation result.
5.4.4
Overall Split Plot Designs
Split plot designs over time can be applied to the Yarraman forestry data in both the chemical and biological data sets. This section contains the models, and information on their use, for the twelve separate variables derived from the data sets. Split plot models are discussed in detail in section 2.3.2.
The purpose of each model was to simultaneously investigate the effects of treatments, blocks and seasons. In both data sets there is a repeated measures situation, where a number of measures have been recorded repeatedly over time. The split plot design applied is therefore a form of split plot over time. It involves two splits: the main plot contains compaction, cultivation and blocks, the subplot contains seasons, and the sub-subplot contains months within seasons.
The main plot tests the cultivation, compaction and block main effects, along with the cultivation by compaction interaction, for significance. The main plot error term comprises the block interactions. All tests of significance for factorial effects in the main plot use values averaged over time, removing the effect of any correlation between measures at different times. All factorial effects confined to the subplot are assumed to share a common correlation arising from repeated measures over seasons, plus a random, independent error. This allows factorial effects in the subplot to be validly compared in tests of significance. The subplot error term is formed by the interactions involving both season and block. The sub-subplot contains replicates formed by months within seasons. Because it contains only replication (and interactions involving replication), the sub-subplot forms one large third error term that is not used for significance testing of any factorial effect.
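Equation 5.5 presents the base overall split plot model form, where each factorial effect is tested for significance using the error term to the right of it. The three separate error terms ($\varepsilon^{(1)}_{ijk}$, $\varepsilon^{(2)}_{ijkl}$ and $\varepsilon^{(3)}_{ijklm}$) use block interactions that are not explicitly shown in this form.
$$\begin{aligned}
\text{Measure}_{ijklm} = \mu &+ \text{Comp}_i + \text{Cult}_j + (\text{Comp}\times\text{Cult})_{ij} + \text{Block}_k + \varepsilon^{(1)}_{ijk} \\
&+ \text{Season}_l + (\text{Season}\times\text{Comp})_{il} + (\text{Season}\times\text{Cult})_{jl} + (\text{Season}\times\text{Comp}\times\text{Cult})_{ijl} + \varepsilon^{(2)}_{ijkl} \\
&+ \varepsilon^{(3)}_{ijklm}
\end{aligned} \qquad (5.5)$$
Where:
l = 1, 2, 3, 4, 5 (season level indicator; 1 to 4 for biological variables).
m = 1, 2, 3 (month within season indicator).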
The structure of the main plot, subplot and sub-subplot is shown in Table 5.4, along with the numbers of degrees of freedom (df) for analysis of chemical and biological variables. Degrees of freedom differ because of differences in the months available for analysis. The terms used as estimates of error are as per standard RCB designs, and are marked in Table 5.4 with an asterisk (*). Note that the main plot has ten error degrees of freedom, while the subplot has 48 (chemical) or 36 (biological). The main plot error degrees of freedom are low, but not low enough to cause concern.
| Plot        | Source of variation                                  | Chem. df | Biol. df |
|-------------|------------------------------------------------------|----------|----------|
| Main        | Compaction                                           | 2        | 2        |
| Main        | Cultivation                                          | 1        | 1        |
| Main        | Compaction×Cultivation                               | 2        | 2        |
| Main        | Block                                                | 2        | 2        |
| Main        | Block×Compaction (*)                                 | 4        | 4        |
| Main        | Block×Cultivation (*)                                | 2        | 2        |
| Main        | Block×Compaction×Cultivation (*)                     | 4        | 4        |
| Subplot     | Season                                               | 4        | 3        |
| Subplot     | Season×Compaction                                    | 8        | 6        |
| Subplot     | Season×Cultivation                                   | 4        | 3        |
| Subplot     | Season×Compaction×Cultivation                        | 8        | 6        |
| Subplot     | Season×Block (*)                                     | 8        | 6        |
| Subplot     | Season×Compaction×Block (*)                          | 16       | 12       |
| Subplot     | Season×Cultivation×Block (*)                         | 8        | 6        |
| Subplot     | Season×Compaction×Cultivation×Block (*)              | 16       | 12       |
| Sub-subplot | (Month:Season) (*)                                   | 10       | 8        |
| Sub-subplot | (Month:Season)×Compaction (*)                        | 20       | 16       |
| Sub-subplot | (Month:Season)×Cultivation (*)                       | 10       | 8        |
| Sub-subplot | (Month:Season)×Compaction×Cultivation (*)            | 20       | 16       |
| Sub-subplot | (Month:Season)×Block (*)                             | 20       | 16       |
| Sub-subplot | (Month:Season)×Compaction×Block (*)                  | 40       | 32       |
| Sub-subplot | (Month:Season)×Cultivation×Block (*)                 | 20       | 16       |
| Sub-subplot | (Month:Season)×Comp.×Cult.×Block (*)                 | 40       | 32       |
Table 5.4: Structure and df in overall split plot ANOVA designs.
The degrees of freedom for all potential factorial effects given in Table 5.4 are those expected if there were full and complete data for each variable. This is not always the case, as some data in the chemical data set were unavailable or had been removed. The actual degrees of freedom during analysis were therefore at times smaller. The reduction usually appeared in the months within season replication terms, which were not used in hypothesis testing anyway.
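The degrees of freedom in Table 5.4 (and Table 5.5) follow from the factor levels, assuming complete data: crossed terms take the product of (levels − 1), and the pooled error terms sum those products. The hypothetical helpers below reproduce the error degrees of freedom quoted in the text.

```python
def main_plot_df(comp=3, cult=2, blocks=3):
    """Main plot: treatment df, block df and pooled block-interaction error df."""
    a, b, r = comp - 1, cult - 1, blocks - 1
    treatments = a + b + a * b  # Compaction, Cultivation, Compaction x Cultivation
    return {"treatments": treatments, "block": r, "error": r * treatments}


def subplot_df(time_levels, comp=3, cult=2, blocks=3):
    """Subplot for a repeated factor (season, or month within one season).

    comp * cult equals 1 + a + b + ab: the time factor on its own plus its
    interactions with compaction, cultivation and both.
    """
    t = time_levels - 1
    cells = comp * cult
    return {"time terms": t * cells, "error": t * (blocks - 1) * cells}


def sub_subplot_df(seasons, months_per_season=3, comp=3, cult=2, blocks=3):
    """All (Month:Season) terms pooled together (not used for testing)."""
    return seasons * (months_per_season - 1) * comp * cult * blocks


main = main_plot_df()
print(main["error"])           # 10 (Tables 5.4 and 5.5)
print(subplot_df(5)["error"])  # 48 (Table 5.4, chemical)
print(subplot_df(4)["error"])  # 36 (Table 5.4, biological)
print(subplot_df(3)["error"])  # 24 (Table 5.5, months within a season)
print(sub_subplot_df(5))       # 180 (Table 5.4, chemical)
```

As a check, the chemical terms sum to 17 + 72 + 180 = 269, one less than the 270 season-averaged observations (5 seasons × 3 months × 6 treatments × 3 blocks).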
As with most split plot designs, factorial effects are tested for significance using a standard ANOVA F-test, except that different error mean squares are used as the denominator for specific factorial effects. All factorial effects in the main plot are tested using the main plot error, and those in the subplot using the subplot error.
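The use of separate error terms can be illustrated with hypothetical sums of squares (the numbers below are made up for illustration and are not results from the Yarraman data). The main plot error mean square pools the three block interactions marked (*) in Table 5.4, and the compaction F ratio is formed against it.

```python
def pooled_mean_square(terms):
    """Pool (sum of squares, df) pairs into a single error mean square."""
    ss = sum(s for s, _ in terms)
    df = sum(d for _, d in terms)
    return ss / df, df


def f_ratio(ss_effect, df_effect, error_ms):
    """F statistic: effect mean square over the chosen error mean square."""
    return (ss_effect / df_effect) / error_ms


# Hypothetical main plot: BlockxCompaction, BlockxCultivation and
# BlockxCompactionxCultivation sums of squares with their df.
main_error_ms, main_error_df = pooled_mean_square([(0.8, 4), (0.3, 2), (0.9, 4)])
f_comp = f_ratio(ss_effect=1.2, df_effect=2, error_ms=main_error_ms)
print(main_error_df, round(main_error_ms, 3), round(f_comp, 2))  # 10 0.2 3.0
```

Here F = 3.0 on (2, 10) degrees of freedom is below the tabulated 5% critical value of about 4.10, so compaction would not be significant. A season effect would instead be divided by the subplot error mean square.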
Hypotheses in these split plot designs concern means. For example, a test of significance for compaction tests the null hypothesis that the compaction means are equal, against the alternative hypothesis that not all means are equal. Because they are tested on values averaged over time, the main plot results present an overall picture and are effectively very similar to the cumulative analyses previously used by Blumfield et al. (2002).
Should season have a significant interaction with compaction or cultivation (or with their interaction), the results are interpreted further by considering analyses for each season. If there are no significant interactions involving season, multiple comparison tests are conducted to reveal the nature of the differences. For information on season based analyses refer to section 5.4.6, and for multiple comparison tests refer to section 5.4.8.
5.4.5
Overall MANOVA Designs
The multivariate analysis of variance (MANOVA) is one way to deal with repeated
measures situations such as found in the chemical and biological data sets. This section
contains the MANOVA models and associated important information for the application
of MANOVA to these data sets. MANOVA models are discussed in detail in section
2.3.3.
In the context of this application, MANOVA offers less flexibility than split plot analysis. In particular, MANOVA does not test the equality of means for the factor over which repeated measures were taken. A significant MANOVA result may reflect either an overall effect or interactions involving the factor over which the repeated measures are taken, whereas split plot designs evaluate these two possibilities separately.
The models in this section are the MANOVA equivalents of the split plot models presented in section 5.4.4. If every month were included in MANOVA, there would be nineteen (or fourteen) dependent variables. This number of dependent variables exceeds the available error degrees of freedom (ten, from the block interactions), so the error sums of squares and cross-products matrix would be singular and no feasible estimate of natural variation could be obtained. Therefore, the overall MANOVA models use seasons but do not include the detail of months.
To analyse seasons without including months, the data in each season were averaged over the three months. This created a data set with four or five time periods (the seasons) and no month component: four seasons for biological variables and five for chemical variables, because of the original number of sampled months. The general form of the models used is shown in Equation 5.6. Each model is for a particular variable (eg. total mineral nitrogen dynamics or microbial nitrogen level). Within each model there are five (or four) dependent variables, one for each season. Beyond this, the model resembles a standard ANOVA model in appearance. The error term is formed from treatment and block interactions as per standard RCB designs.
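The season averaging step, and the reason months cannot be used directly as MANOVA dependent variables, can be sketched as follows. Records are assumed (hypothetically) to be (plot, month, value) tuples of logged values, with each plot identified by (compaction, cultivation, block); the month to season mapping is the one given in section 5.4.1. The feasibility check uses the standard requirement that the error sums of squares and cross-products matrix can only be non-singular when the number of dependent variables does not exceed the error degrees of freedom.

```python
from collections import defaultdict

# Month -> season, from section 5.4.1 (months 1, 8, 18 and 19 are unused).
MONTH_TO_SEASON = {2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 9: 3, 10: 3, 11: 3,
                   12: 4, 13: 4, 14: 4, 15: 5, 16: 5, 17: 5}


def season_vectors(records, n_seasons):
    """Average each plot's monthly values within season.

    Returns {plot: [season 1 mean, ..., season n mean]}, i.e. one row of
    dependent variables per plot. A season with no data gives None.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for plot, month, value in records:
        season = MONTH_TO_SEASON.get(month)
        if season is None or season > n_seasons:
            continue  # month not assigned to a season used for this data set
        sums[(plot, season)] += value
        counts[(plot, season)] += 1
    plots = sorted({plot for plot, _ in sums})
    return {
        plot: [sums[(plot, s)] / counts[(plot, s)] if counts[(plot, s)] else None
               for s in range(1, n_seasons + 1)]
        for plot in plots
    }


def manova_feasible(n_dependent, error_df):
    """The error SSCP matrix can only be non-singular if p <= error df."""
    return n_dependent <= error_df


records = [((1, 1, 1), m, float(v)) for m, v in [(1, 9), (2, 1), (3, 2), (4, 3),
                                                  (5, 4), (6, 5), (7, 6)]]
print(season_vectors(records, n_seasons=2))  # {(1, 1, 1): [2.0, 5.0]}
print(manova_feasible(19, 10), manova_feasible(5, 10))  # False True
```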
$$\text{Measure}_{ijkm} = \mu_m + \text{Compaction}_{im} + \text{Cultivation}_{jm} + (\text{Compaction}\times\text{Cultivation})_{ijm} + \text{Block}_{km} + \varepsilon_{ijkm} \qquad (5.6)$$
Where:
$\text{Measure}_{ijkm}$ refers to each variable measure at a particular compaction level i, cultivation level j, block k and season m.
$\mu_m$ is the mean level for the variable being analysed at season m.
m = 1, 2, 3, 4, 5 (season level indicator; four seasons in the case of biological variables).
$\varepsilon_{ijkm}$ represents the natural variation or error term. This is formed from the interaction of the block with the other model components.
Four test statistics are commonly used in MANOVA: Wilks' lambda, Roy's largest root, the Hotelling-Lawley trace and Pillai's trace (Zar, 1999). Pillai's trace tends to be the most robust to deviations from strict MANOVA assumptions (Zar, 1999). Roy's largest root is not usually considered in isolation, as the F statistic based on it is an upper bound (so its p value is a lower bound).
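All four statistics are functions of the eigenvalues of E⁻¹H, where H and E are the hypothesis and error sums of squares and cross-products matrices. Computing the eigenvalues needs a linear algebra routine (the statistical package does this internally), so the sketch below simply takes hypothetical eigenvalues and applies the standard formulas. Roy's statistic is reported here as the largest eigenvalue, which is the form SAS reports; some texts instead use λ/(1 + λ).

```python
def manova_statistics(eigenvalues):
    """Four MANOVA test statistics from the eigenvalues of inv(E) H."""
    if not eigenvalues or any(lam < 0 for lam in eigenvalues):
        raise ValueError("need at least one non-negative eigenvalue")
    wilks = 1.0
    for lam in eigenvalues:
        wilks *= 1.0 / (1.0 + lam)                          # product of 1/(1+lambda)
    pillai = sum(lam / (1.0 + lam) for lam in eigenvalues)  # sum of lambda/(1+lambda)
    hotelling_lawley = sum(eigenvalues)                     # sum of lambda
    roy = max(eigenvalues)                                  # largest lambda
    return {"Wilks' lambda": wilks, "Pillai's trace": pillai,
            "Hotelling-Lawley trace": hotelling_lawley, "Roy's largest root": roy}


stats = manova_statistics([1.0, 0.25])  # hypothetical eigenvalues
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
# Wilks' lambda: 0.400, Pillai's trace: 0.700,
# Hotelling-Lawley trace: 1.250, Roy's largest root: 1.000
```

Smaller values of Wilks' lambda, and larger values of the other three statistics, indicate stronger evidence against the null hypothesis.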
Hypothesis tests in MANOVA differ from those in split plot models. In MANOVA, the hypotheses combine a number of univariate ANOVA hypotheses. For example, the test of significance for compaction tests the null hypothesis that the variable means for the three compaction levels are the same in every season (the same in season one, the same in season two, and so on). The alternative hypothesis is that there is a difference in mean variable level between at least two compaction levels in at least one season. Because this conclusion is not specific, season based MANOVA models (section 5.4.7) are chosen as the next analysis step if significant relationships are found.
5.4.6
Season Based Split Plot Designs
The purpose of season based split plot designs is to investigate the behaviour of treatments within each separate season. These designs are only relevant if significant interactions with season are found in the overall split plot designs, or if significant MANOVA results need to be investigated in more detail. These models test for relationships within particular seasons without consideration of overall behaviour.
For each variable for which season based split plot models are required, five (or four) different models are run, one for each season. Each model is a basic split plot over time, where treatments and the block are in the main plot and time related factorial effects are in the subplot. Note that month is now the time unit rather than season. Split plot models are discussed in detail in section 2.3.2.
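The season based model is shown in Equation 5.7. The month indicator runs from one to three because each season contains only three months. The two separate error terms ($\varepsilon^{(1)}_{ijk}$ and $\varepsilon^{(2)}_{ijkm}$) use block interactions that are not explicitly shown in this model form.
$$\begin{aligned}
\text{Measure}_{ijkm} = \mu &+ \text{Comp}_i + \text{Cult}_j + (\text{Comp}\times\text{Cult})_{ij} + \text{Block}_k + \varepsilon^{(1)}_{ijk} \\
&+ \text{Month}_m + (\text{Month}\times\text{Comp})_{im} + (\text{Month}\times\text{Cult})_{jm} + (\text{Month}\times\text{Comp}\times\text{Cult})_{ijm} + \varepsilon^{(2)}_{ijkm}
\end{aligned} \qquad (5.7)$$
Where:
m = 1, 2, 3 (month level indicator).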
The structure of the main plot and subplot is shown in Table 5.5, along with the expected number of degrees of freedom for analysis of chemical and biological variables. In this case the degrees of freedom are identical for the chemical and biological data sets because each season contains the same number of months, irrespective of the data set. The terms used as estimates of error are as per standard RCB designs. Note that the main plot has ten error degrees of freedom while the subplot has 24. The main plot error degrees of freedom are not high, but are not low enough to be considered a problem.
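| Plot    | Source of variation                         | Chem. df | Biol. df |
|---------|---------------------------------------------|----------|----------|
| Main    | Compaction                                  | 2        | 2        |
| Main    | Cultivation                                 | 1        | 1        |
| Main    | Compaction×Cultivation                      | 2        | 2        |
| Main    | Block                                       | 2        | 2        |
| Main    | Block×Compaction (*)                        | 4        | 4        |
| Main    | Block×Cultivation (*)                       | 2        | 2        |
| Main    | Block×Compaction×Cultivation (*)            | 4        | 4        |
| Subplot | Month                                       | 2        | 2        |
| Subplot | Month×Compaction                            | 4        | 4        |
| Subplot | Month×Cultivation                           | 2        | 2        |
| Subplot | Month×Compaction×Cultivation                | 4        | 4        |
| Subplot | Month×Block (*)                             | 4        | 4        |
| Subplot | Month×Compaction×Block (*)                  | 8        | 8        |
| Subplot | Month×Cultivation×Block (*)                 | 4        | 4        |
| Subplot | Month×Compaction×Cultivation×Block (*)      | 8        | 8        |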
Table 5.5: Structure and df in seasonal split plot ANOVA designs.
Hypotheses in these split plot designs concern means. Should significant differences be found, their exact nature is investigated further using multiple comparison tests. The multiple comparison techniques used are given in section 5.4.8.
5.4.7
Season Based MANOVA Designs
Season based MANOVA designs are used to investigate treatment and other effects within each separate season. As with the split plot equivalent in section 5.4.6, these tests are only relevant if the overall MANOVA gave significant results or if significant interactions with season were found in the overall split plot designs.
Four or five season based MANOVA models are run for each variable, depending on the number of seasons. Each model contains three dependent variables, one for each month. MANOVA models are discussed in detail in section 2.3.3.
The season based model is shown in Equation 5.8. The error term is formed from the interaction of treatments and the block as per standard RCB designs.
$$\text{Measure}_{ijkm} = \mu_m + \text{Compaction}_{im} + \text{Cultivation}_{jm} + (\text{Compaction}\times\text{Cultivation})_{ijm} + \text{Block}_{km} + \varepsilon_{ijkm} \qquad (5.8)$$
Where:
$\text{Measure}_{ijkm}$ refers to each variable measure at a particular compaction level i, cultivation level j, block k and month m.
$\mu_m$ is the mean level for the variable being analysed at month m.
m = 1, 2, 3 (month level indicator).
$\varepsilon_{ijkm}$ represents the natural variation or error term. This is formed from the interaction of the block with the other model components.
When a factorial effect is significant in MANOVA, the conclusion is rather vague. For example, should the compaction factor be significant in the season based MANOVA model presented here, the conclusion is that there are significant differences in mean variable level between at least two compaction levels during at least one month. This conclusion should be investigated further using multiple comparison tests. In MANOVA this is achieved by reverting to univariate ANOVA models (one for each dependent variable) and applying standard multiple comparison tests. More information on the multiple comparison tests used in this analysis is provided in section 5.4.8.
5.4.8
Multiple Comparison Tests
Multiple comparison tests were used to find exactly where differences lie, should significant differences be found between factorial effect levels. For example, if compaction is found to be significant in a model, multiple comparison tests can identify exactly which compaction levels differ significantly from each other.
Multiple comparison tests are conducted in this thesis using pairwise t-tests, though other methods may be just as appropriate. Each such test compares two means for a significant difference. Where software was not available to automate multiple comparison tests, they were done manually using the least significant difference (LSD) formula based on the pairwise t-test, given in Equation 5.9.
$$\text{LSD} = t_{edf}^{(\text{Significance Level}/2)} \, s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad (5.9)$$
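Where:
$t_{edf}^{(\text{Significance Level}/2)}$ is the two-sided critical value of Student's t distribution with edf degrees of freedom at the chosen significance level.
edf is the error degrees of freedom.
s is an approximation of the error standard deviation (commonly the root error mean square).
n1 and n2 are the numbers of observations in the two samples being compared for equality of means.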
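Equation 5.9 can be sketched in Python as below. The critical t value is passed in by the caller, because the Python standard library has no Student's t quantile function (SciPy's `scipy.stats.t.ppf` is one source if it is available); under the Bonferroni modification it would be taken at the modified significance level. The function names are hypothetical and the example numbers are made up for illustration.

```python
import math


def least_significant_difference(t_crit, s, n1, n2):
    """Least significant difference between two means, following Equation 5.9.

    t_crit: critical value of Student's t on the error degrees of freedom,
            supplied by the caller (e.g. from tables).
    s:      estimate of the error standard deviation (root error mean square).
    n1, n2: number of observations behind each of the two means.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    return t_crit * s * math.sqrt(1.0 / n1 + 1.0 / n2)


def means_differ(mean1, mean2, lsd):
    """Two means are declared significantly different when their gap exceeds the LSD."""
    return abs(mean1 - mean2) > lsd
```

For example, with t = 2.086 (two-sided 5% level, 20 error degrees of freedom), s = 0.5 and six observations per mean, the LSD is about 0.602.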
Often a full investigation involves a large number of multiple comparison tests. Using a standard allowable error rate of 0.05, about one in twenty tests can be expected to return a significant result purely by chance when no real differences exist. A common approach to combat this problem, used in this thesis, is the Bonferroni approach. The Bonferroni approach involves a simple modification to the significance level, determined by the number of multiple comparison tests involved: the standard allowable error level is divided by the number of multiple comparison tests to take place. For example, given a standard allowable error of 0.05 and twenty multiple comparison tests, the new allowable error is 0.0025 (0.05 / 20).
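The Bonferroni modification described above amounts to one division. A minimal sketch, with hypothetical function names and made-up p values in the usage example:

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after the Bonferroni modification."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p value that is at or below the Bonferroni-modified level."""
    level = bonferroni_alpha(alpha, len(p_values))
    return [p <= level for p in p_values]
```

For instance, `bonferroni_significant([0.001, 0.01, 0.04])` compares each value against 0.05 / 3 and flags only the first two.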
All graphs showing treatment means and multiple comparison results follow a standard
format. The lines above and below each mean each display one standard error of the
mean. Means are all assigned at least one letter. Means that share the same letter are not
significantly different. Where multiple months are shown on one diagram, each month is
considered separately. All means and standard errors shown are calculated by back
transformation from the logged values used in analysis (see section 5.4.1).
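A minimal sketch of the back transformation, assuming the analysis used natural logarithms with no added constant (the exact transformation is defined in section 5.4.1 and may differ, for example log base 10 or log(x + 1)). The one standard error bars are back transformed at their ends, so they are asymmetric on the original scale, and the back transformed mean is a geometric rather than an arithmetic mean. The function name is hypothetical.

```python
import math


def back_transform(mean_log, se_log):
    """Back-transform a mean and one standard error from the natural-log scale.

    Returns (lower, centre, upper): the ends of the one standard error bars
    and the back-transformed mean. Because exp() is non-linear the bars are
    not symmetric about the centre on the original scale.
    """
    centre = math.exp(mean_log)
    return math.exp(mean_log - se_log), centre, math.exp(mean_log + se_log)
```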
5.5
Data Analysis and Results
Full analysis of each chemical and biological variable is contained in this section. Each of the twelve variables is analysed separately by exploratory data analysis, investigation into correlation and the application of split plot and MANOVA models. The technicalities and model specifics for these applications are reviewed in section 5.4.
5.5.1
Nitrate Levels
Nitrate levels are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix E.
Exploratory data analysis of chemical nitrate levels revealed a rather confused intermingling of treatments over time. Early on, nitrate levels appeared to be lowest where sixteen compactions were applied and highest under one compaction, irrespective of cultivation. Beyond the initial months, though, clear relationships faded, and towards the end the treatment with no ploughing and no compaction tended to have the lowest nitrate levels and sixteen compactions with disc plough cultivation the highest. Applying a moving average smoother accentuated these relationships (or rather, the lack thereof). Both the raw and smoothed graphs showed that nitrate levels gradually increased over time. The exploratory data analysis revealed that treatment effects clearly differ over time, as treatments showed no consistent trends.
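The moving average smoother used in the exploratory graphs can be sketched as below. The window width is not stated in this section, so a centred three-month window is assumed purely for illustration; months without a full window are left as None rather than padded. The function name is hypothetical.

```python
def moving_average(series, window=3):
    """Centred moving average; positions without a full window are None."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for t in range(len(series)):
        if t < half or t + half >= len(series):
            smoothed.append(None)  # not enough neighbouring months on one side
        else:
            smoothed.append(sum(series[t - half:t + half + 1]) / window)
    return smoothed
```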
Investigating the correlation of nitrate levels with rainfall, maximum temperature, minimum temperature and soil moisture revealed only one significant correlation. Soil moisture was found to be significantly correlated with nitrate levels in the following month (p = 0.032). The strength of this correlation was weak (r = 0.11917) and could be regarded as spurious, given that twenty correlation tests were performed at a significance level of 0.05.
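The correlation screening can be sketched as follows. The response at month t is paired with the covariate at month t - lag, so a lag of one pairs soil moisture with nitrate in the following month. With four covariates and, as assumed here, lags of zero to four months, this gives the twenty tests mentioned above. The sketch works on a single aligned monthly series; how the plot-level data were pooled is not shown in this section, and the function names are hypothetical. Significance would be judged from t = r·sqrt((n - 2)/(1 - r²)) on n - 2 degrees of freedom, which is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(response, covariate, lag):
    """Correlate response[t] with covariate[t - lag]; lag is in months (>= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    # Drop the first `lag` responses and the last `lag` covariate values
    # so that each response is aligned with the covariate `lag` months earlier.
    return pearson_r(response[lag:], covariate[:len(covariate) - lag])
```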
Overall split plot and MANOVA designs found an interaction between season and compaction levels. In particular, the split plot design had a strongly significant season by compaction interaction (p < 0.0001). This indicates that the effect of different compaction levels depends on the season. The overall MANOVA model reported the compaction effect as significant for Roy's largest root (p = 0.0153) and close to significant (p < 0.10) for all of the other test statistics. As previously discussed, significant effects in MANOVA can reflect either an overall effect or an interaction between that effect and the factor over which repeated measures were taken. In this case, given the significant split plot interaction, the MANOVA result reflects an interaction of compaction with season. Because of this interaction, main effects are not investigated, as the results may be misleading. Instead, season based models are used to investigate behaviour within each season.
Strongly significant relationships involving compaction were found in the first season, which covers months two to four. Compaction was significant in both the split plot (p = 0.0007) and MANOVA (p < 0.05 for all four test statistics) season based models in the first season. Multiple comparison tests found significantly lower amounts of nitrate when sixteen pass compaction was applied compared with the other two compaction levels (both p < 0.005). Figure 5.6 presents these results graphically using back transformed means and standard errors. The only other effect found significant in season one was the month factor in the split plot design. This indicates that there are significant differences in mean nitrate levels between the three months in season one. Month differences are not a priority for analysis and are therefore not investigated further.
[Chart: Nitrate Levels By Compaction in Season 1. Y-axis: Mean Nitrate (kgN/ha); x-axis: Season; compaction levels None, 1 Pass and 16 Pass.]
Figure 5.6: Back transformed means (± S.E.) for compaction effects on mean nitrate levels in season one.
The significant compaction effect is also present in the second season, which covers
months five to seven. In this instance the compaction effect is not as strong but still
significant in both the split plot (p = 0.0277) and MANOVA (p < 0.05 for three of the
four test statistics) season based designs. At the 0.05 significance level, sixteen pass
compaction once again leads to significantly different mean nitrate levels compared to the
other two compaction levels (p < 0.05). However, using the Bonferroni modified
significance level for multiple comparison tests, only the sixteen pass and zero pass
compaction levels are significantly different (p = 0.012). Back transformed means,
standard errors and significant differences are shown in Figure 5.7. No other factorial
effects were significant in the season based split plot and MANOVA designs in season
two.
[Chart: Nitrate Levels By Compaction in Season 2. X-axis: Season; compaction levels None, 1 Pass and 16 Pass.]
Figure 5.7: Back transformed means (± S.E.) for compaction effects on mean nitrate levels in season two.
No significant effects from compaction (or cultivation) were found in the remaining three
seasons for nitrate levels. The only significant effects were for month in the third (p =
0.0048) and fourth (p = 0.02) seasons using the split plot designs. These simply indicate
differences in mean nitrate levels between months and are not investigated further as they
are not of specific interest.
In summary:
• Compaction significantly affected nitrate levels in the first and second seasons. In the first season, sixteen pass compaction led to significantly lower nitrate levels than the other two compaction levels. In the second season, sixteen pass compaction led to significantly lower nitrate levels than the zero pass compaction level.
• Cultivation was not found at any point to have a significant influence on nitrate levels.
• The block was not found to have a significant influence on nitrate levels, although a block effect was expected given the use of an RCB design.
• Months within seasons commonly differed in mean nitrate level. This was the case in the first, third and fourth seasons.
5.5.2
Ammonium Levels
Ammonium levels are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix F.
Exploratory data analysis (EDA) on ammonium levels found extreme values in month sixteen, which were subsequently omitted from analysis. The largely unclear trends became slightly easier to interpret with the use of a moving average smoother. The clearest relationship seen was that, in general, ammonium levels were higher when the disc plough was used. In particular, with zero and one pass compaction, mean ammonium levels were particularly low with no cultivation and particularly high with use of the plough. Over time, ammonium levels changed regularly and were rather unstable.
Correlation analysis examined possible correlation of ammonium levels with rainfall, maximum temperature, minimum temperature and soil moisture. This revealed three significant correlations, all of which are suspected to be spurious. The significant correlations suggested that rainfall affects ammonium four months later (p = 0.004), and that soil moisture affects ammonium three (p = 0.025) and four (p = 0.026) months later. Should a Bonferroni modification be applied to the significance level for correlation, none of these seemingly spurious relationships would be significant.
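Assuming, as stated for nitrate, that twenty correlation tests were performed, the Bonferroni check above works out as follows: each reported p value exceeds the modified significance level.

```latex
\alpha_{\text{Bonferroni}} = \frac{0.05}{20} = 0.0025,
\qquad p \in \{0.004,\ 0.025,\ 0.026\}, \quad \text{each } p > 0.0025
```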
A plethora of significant effects appears in the overall split plot and MANOVA designs for ammonium levels. It is clear that, unlike nitrate, ammonium levels are influenced by cultivation. The split plot design revealed a significant interaction of season with cultivation (p = 0.0319), while the MANOVA design found a significant cultivation effect (p = 0.0458 for all test statistics). Both overall models hint at a possible interaction of compaction and cultivation. Its exact nature is difficult to determine from the overall models, since only two of the four MANOVA test statistics found the interaction significant (p < 0.05), and although the term is significant in the split plot design (p = 0.0082), cultivation is also known to interact with season. To obtain a clearer picture of the influences on ammonium levels, season based models were evaluated.
In the first season, an interaction was found between compaction and cultivation. The interaction was clearly significant in both the split plot (p < 0.0001) and MANOVA (p < 0.05 for all test statistics; p < 0.001 for two) season based models. Multiple comparison tests found many significant differences in mean ammonium levels. Figure 5.8 graphically displays the following significant differences:
• At the zero pass compaction, there was significantly more ammonium when the plough was used for cultivation.
• At the sixteen pass compaction the opposite was true: there was significantly less ammonium when the plough was used.
• When there was no cultivation, significantly more ammonium was present under sixteen pass compaction than under the other two compaction levels.
• When there was disc plough cultivation, there was significantly more ammonium at the zero pass compaction than at the one pass compaction.
[Chart: Ammonium Levels By Compaction, Cultivation in Season 1. Y-axis: Mean Ammonium (kgN/ha); x-axis: Season; treatments 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough.]
Figure 5.8: Back transformed means (± S.E.) for compaction and cultivation effects on mean ammonium levels in season one.
In the second season the effect of compaction and cultivation depends on the particular month. The season based split plot design found a significant interaction between month and compaction (p = 0.037) as well as between month and cultivation (p = 0.0221). MANOVA also found a significant effect for cultivation (p = 0.0043 for all test statistics) and compaction (p < 0.05 for all). MANOVA significances can reflect either an overall effect or an interaction with the factor over which repeated measures are taken. In this case the split plot results make it clear that the MANOVA results indicate interactions of compaction and cultivation with month. Note that, unlike in season one, there is no significant compaction by cultivation interaction in season two. The exact nature of the season two compaction by month and cultivation by month interactions is investigated using multiple comparison tests within each month. No significant differences in ammonium means resulted from different compaction levels in any month. Back transformed means and results for these compaction multiple comparison tests are shown in Figure 5.9. With respect to cultivation, a significant difference was found only in month six, where there was significantly more ammonium when the disc plough was used. Back transformed means and results for these cultivation multiple comparison tests are shown in Figure 5.10.
[Chart: Ammonium Levels By Compaction in Season 2. X-axis: Month; compaction levels None, 1 Pass and 16 Pass.]
Figure 5.9: Back transformed means (± S.E.) for compaction effects on mean ammonium levels in season two (each month separately).
[Chart: Ammonium Levels By Cultivation in Season 2. X-axis: Month; cultivation levels None and Disc Plough.]
Figure 5.10: Back transformed means (± S.E.) for cultivation effects on mean ammonium levels in season two (each month separately).
The situation in the third season was similar to that in season one, except that the relationships were not as strong. The split plot design found a significant interaction between compaction and cultivation (p = 0.03). In the MANOVA design, only Roy's largest root gave a significant interaction result (p = 0.0199), but all test statistics had p values under 0.15. Cultivation clearly has a strong effect, as it was significant in both the split plot (p = 0.0006) and MANOVA (p = 0.0101 for all) designs, but this effect is known to depend on compaction level because of the interaction. Multiple comparison tests revealed the nature of the interaction between compaction and cultivation. For both the zero pass and one pass compaction levels, significantly higher ammonium levels were found when the plough was used (as opposed to no cultivation). These multiple comparison test means and results are shown graphically in Figure 5.11.
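[Chart: Ammonium levels by compaction and cultivation in season 3. X-axis: Season; treatments 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough.]
Figure 5.11: Back transformed means (± S.E.) for compaction and cultivation effects on mean ammonium levels in season three.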
Significant differences between months were common in the season based models. The first (p = 0.002), fourth (p < 0.0001) and fifth (p < 0.004) seasons all had significant month factors in the split plot designs. This indicates that mean ammonium levels differ between months, but this is not investigated further as it is not the focus of the investigation. With the exception of month, no significant factorial effects were found in seasons four and five. As was the case with nitrate levels, the effects of compaction and cultivation seem to wear off over time.
In summary:
• Both compaction and cultivation have varying effects on ammonium levels in the first three seasons. These seasons cover months two through eleven (excluding month eight). The behaviour in each season is as follows:
  o In the first season, there was an interaction between compaction and cultivation. When there was no compaction, there was significantly more ammonium when plough cultivation was used. With sixteen pass compaction, there was significantly more ammonium when no cultivation was used. When there was no cultivation, significantly more ammonium was present under sixteen pass compaction than under the other two compaction levels. When there was disc plough cultivation, there was significantly more ammonium at the zero pass compaction than at the one pass compaction.
  o In the second season, there were interactions between compaction and month and between cultivation and month. The pattern of ammonium means across compaction levels changed from month to month, but no pairwise differences between compaction levels were significant. In the second month of the season (month six), there was significantly more ammonium when the disc plough was used (as opposed to no cultivation).
  o In the third season, there was an interaction between compaction and cultivation. There was significantly more ammonium with disc plough cultivation than with no cultivation for the zero and one pass compaction levels.
• The block was not found to have a significant influence on ammonium levels, although a block effect was expected given the use of an RCB design.
• Months within seasons commonly differed in mean ammonium level. This was the case in the first, fourth and fifth seasons.
5.5.3
Total Mineral Nitrogen Levels
Total mineral nitrogen levels are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix G.
Exploratory data analysis using raw and smoothed time series for each treatment showed different behaviour at different times. Early on, one pass compaction with disc plough cultivation resulted in generally higher total mineral nitrogen levels. The other treatment combinations were more or less interchangeable, with sixteen pass compaction tending to have the lowest total mineral nitrogen levels (under both cultivation levels). The central months visibly lack distinctive patterns or consistency from month to month. In the later months, sixteen pass compaction with the disc plough tended to have the highest total mineral nitrogen and zero pass compaction with no cultivation the lowest.
Correlation analysis revealed a number of possible relationships between total mineral nitrogen and physical properties. Should the Bonferroni modification be applied, however, none of these relationships would remain significant. The most plausible possible relationships are with minimum temperature lagged by one month (p = 0.0417) and soil moisture with no lag (p = 0.0058).
Total mineral nitrogen levels were first investigated using overall split plot and MANOVA designs. The split plot design revealed a significant interaction of season with compaction (p = 0.0127). Interestingly, the overall MANOVA design did not find any significant effects whatsoever, including for compaction. Given an interaction between season and compaction, MANOVA would be expected to give a significant result for compaction, but most p values for compaction in the MANOVA are between 0.25 and 0.26. Season based MANOVA and split plot models were used to find the exact nature of the exhibited behaviour.
In the first season the split plot design found compaction significant (p = 0.019), while the MANOVA results were less clear. One of the four MANOVA test statistics found compaction significant (p = 0.0443), while two found the compaction by cultivation interaction significant (p < 0.05). The MANOVA test statistic most robust to departures from strict statistical assumptions, Pillai's trace, was nowhere near significant for the compaction by cultivation interaction (p = 0.1509). Therefore, given this fact and the lack of significance for this interaction in the split plot design (p = 0.4628), the interaction was not investigated at this stage. Investigation of the main effect of compaction during season one found that total mineral nitrogen levels are significantly higher under one pass compaction than under sixteen pass compaction. Means and significant differences resulting from compaction during season one are shown in Figure 5.12.
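The four MANOVA test statistics quoted throughout this chapter (Wilks' lambda, Pillai's trace, the Hotelling-Lawley trace and Roy's largest root) are all functions of the eigenvalues of E⁻¹H, where H is the hypothesis and E the error sums of squares and cross products matrix for the effect being tested. The sketch below assumes those eigenvalues have already been computed and uses hypothetical names. It follows the convention, used for example by SAS PROC GLM, of reporting Roy's statistic as the largest eigenvalue itself (some texts use λ/(1 + λ) instead), and it gives the statistics only, not the F approximations used to obtain p values.

```python
import math


def manova_statistics(eigenvalues):
    """The four MANOVA test statistics from the eigenvalues of E^-1 H."""
    # Clip tiny negative values that can arise from numerical round-off.
    lams = [max(float(v), 0.0) for v in eigenvalues]
    return {
        "Wilks' lambda": math.prod(1.0 / (1.0 + v) for v in lams),
        "Pillai's trace": sum(v / (1.0 + v) for v in lams),
        "Hotelling-Lawley trace": sum(lams),
        "Roy's largest root": max(lams),
    }
```

When there is only one non-zero eigenvalue, as for a two-level factor such as cultivation, the four statistics are equivalent and lead to the same exact F test, which is why several cultivation results in this chapter carry one p value "for all test statistics". With more than one non-zero eigenvalue the F approximation for Roy's largest root is only an upper bound, which is the sense of "upper bound Roy's largest root" in this chapter.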
[Chart: Total Mineral Nitrogen Levels By Compaction in Season 1. X-axis: Season; compaction levels None, 1 Pass and 16 Pass.]
Figure 5.12: Back transformed means (± S.E.) for compaction effects on mean total mineral nitrogen levels in season one.
The only other season to show significant differences relating to compaction or cultivation was season three. The compaction by cultivation interaction was found to be significant using the season based split plot design (p = 0.0321). The interaction was significant only for Roy's largest root in MANOVA (p = 0.029), but all test statistics returned p values under 0.15. Multiple comparison tests using the Bonferroni modification found no significant differences in mean total mineral nitrogen levels between compaction and cultivation combinations. Without the Bonferroni modification some comparisons are significant (p < 0.05). Figure 5.13 shows the means and (lack of) significant differences for compaction and cultivation combinations in season three.
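[Chart: Total mineral nitrogen levels by compaction and cultivation in season 3. X-axis: Season; treatments 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough.]
Figure 5.13: Back transformed means (± S.E.) for compaction and cultivation effects on mean total mineral nitrogen levels in season three.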
There are significant differences in mean total mineral nitrogen levels between months in
a number of seasons. Given the somewhat erratic nature of the exploratory data analysis
graphs this does not come as a surprise. Significant differences between months are
present in the first (p = 0.0001), third (p = 0.017), fourth (p = 0.0038) and fifth (p <
0.0001) seasons.
The block was close to having a significant effect in seasons two and five, because the upper bound Roy's largest root test statistic in MANOVA was significant in these two seasons. Furthermore, the block was close to being significant in the second season split plot design (p = 0.075). None of these results confirmed a significant block effect at any point.
In summary:
• In season one there were significant differences in mean total mineral nitrogen between compaction levels. There was significantly more total mineral nitrogen where one pass compaction had been applied than where sixteen pass compaction was applied.
• The block was not found to have a significant influence on mean total mineral nitrogen levels, although a block effect was expected given the use of an RCB design.
• Months within seasons commonly differed in mean total mineral nitrogen level. This was the case in the first, third, fourth and fifth seasons.
5.5.4
Nitrate Dynamics
Nitrate dynamics are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix H.
Exploratory data analysis assisted slightly in deciphering the complex, seemingly unrelated trends formed by the treatments over time. Even in smoothed form, relationships were unclear. In the earlier and later months there are no clear trends, with the ordering of treatments commonly changing dramatically from month to month. The only consistency over time is found in the middle months, where the one pass compaction with disc plough cultivation and the sixteen pass compaction with disc plough cultivation treatments clearly have higher nitrate dynamics.
Correlation analysis found no evidence that rainfall, maximum temperature, minimum temperature or soil moisture are correlated with nitrate dynamics. None of the twenty separate correlation tests gave a significant result, as all had p values over 0.05.
Not much was revealed by the overall split plot and MANOVA models. The split plot model revealed a three way interaction between season, compaction and cultivation (p = 0.0196). This can be interpreted as meaning that the interaction between compaction and cultivation on nitrate dynamics depends on the particular season. This is plausible given the rather inconsistent behaviour shown in the graphs created during exploratory data analysis. The MANOVA factorial effect equivalent to a season, compaction and cultivation interaction is the compaction by cultivation interaction. This interaction was, however, not significant in MANOVA (p > 0.05 for all test statistics). To investigate the nature of the interaction between season, compaction and cultivation found in the split plot model, season based MANOVA and split plot models were fitted next.
During season one, split plot and MANOVA revealed two significant relationships. The
first and most notable is an interaction between month and cultivation from the seasonal
split plot design (p = 0.0113). Upon further investigation it was found that there was a
significantly higher level of nitrate dynamics in month four when there is no cultivation
compared to when disc plough cultivation is used (p = 0.0099). Other months in the
season had higher dynamics when the disc plough was used (but not significantly more).
This cultivation situation is shown graphically in Figure 5.14. The equivalent seasonal
MANOVA model did not find cultivation significant (p = 0.1003 for all test statistics) but
instead found the block significant (p < 0.05 for three of the four test statistics). The
season one split plot model was close to finding the block term significant (p = 0.0547).
The block term was not found to be significant in any other season and is not investigated
further as it is not the focus of this case study.
[Chart: Nitrate Dynamics By Cultivation in Season 1. X-axis: Month; cultivation levels None and Disc Plough.]
Figure 5.14: Back transformed means (± S.E.) for cultivation effects on mean nitrate dynamics in season one (each month separately).
Significant relationships beyond the first season were few and far between. The split plot designs revealed a significant difference in mean nitrate dynamics between the three months in the fourth season. The MANOVA design in the final season was close to finding a significant interaction between compaction and cultivation, with the upper bound Roy's largest root having a p value under 0.05.
In summary:
• There was an interaction between month and cultivation on nitrate dynamics in season one. Further investigation revealed the only significant difference to be in the third month of the season (month four), where nitrate dynamics were significantly higher when there was no cultivation.
• The block was significant using MANOVA (and very close to being significant using a split plot design) in season one. In no other season was the block significant, although a block effect was expected given the use of an RCB design.
• There was a significant difference in mean nitrate dynamics between the three months of season four.
5.5.5
Ammonium Dynamics
Ammonium dynamics are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix I.
The graphs created during exploratory data analysis provided few clues to relationships between compaction, cultivation and ammonium dynamics. Early on, the sixteen pass compaction, no cultivation treatment had the lowest level of ammonium dynamics. For the remainder of the study there are no clear common trends in the time series resulting from the different treatments. The performance of a treatment appears to depend largely on the particular month. The smoothed graph removes much of the month-to-month variability and shows that treatments where disc plough cultivation occurred tend to have higher ammonium dynamics during the middle months. Extremely high values observed in month fifteen and extremely low values in month sixteen were removed prior to analysis to prevent misleading results.
Correlation analysis found a significant negative correlation with soil moisture (p = 0.0311). This suggests that the more soil moisture there is, the lower the ammonium dynamics. However, if a Bonferroni modification were applied to the twenty tests of correlation, none would be found significant. Caution is therefore needed, because this relationship between ammonium dynamics and soil moisture may be spurious.
Overall split plot and MANOVA design results were evaluated to investigate influences on ammonium dynamics. The overall split plot design found a significant interaction between compaction and cultivation (p = 0.0464). MANOVA found this interaction significant only for Roy's largest root (p = 0.0399), but all p values were under 0.15. Further investigation using multiple comparison tests revealed no significantly different means among the compaction and cultivation combinations, as shown in Figure 5.15. Without the Bonferroni modification on the multiple comparison tests, two significant differences appear. These are between the one pass and sixteen pass compaction levels when the disc plough is used, and between the two cultivation levels when sixteen pass compaction is used.
[Chart: Ammonium Dynamics By Compaction, Cultivation. X-axis: Season (1-5); treatments 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough.]
Figure 5.15: Back transformed means (± S.E.) for compaction and cultivation effects on mean ammonium dynamics.
The overall split plot design also found a significant season effect (p = 0.0036), which simply reflects mean ammonium dynamics differing between seasons. This relationship is not investigated further because it is not of interest. Season based models were not investigated because there was no evidence of interactions involving season, indicating similar behaviour (or lack thereof) across all seasons.
In summary:
• There is an overall interaction between compaction and cultivation on ammonium dynamics. Significant differences between treatment means are present only if the Bonferroni modification is not applied. These are between the one pass and sixteen pass compaction levels when the disc plough is used, and between the two cultivation levels when sixteen pass compaction is used.
• The block does not have a significant effect on ammonium dynamics.
• There is a significant difference in mean ammonium dynamics between seasons.
5.5.6
Total Mineral Nitrogen Dynamics
Total mineral nitrogen dynamics are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix J.
The picture given by the exploratory data analysis graphs for total mineral nitrogen dynamics is similar to that for nitrate and ammonium dynamics: there is little structure to see. Total mineral nitrogen dynamics rise slowly from the start before peaking in month fifteen, dropping sharply in month sixteen and partly returning to normal in month seventeen. For many months, treatments where the disc plough was used appear to have higher total mineral nitrogen dynamics.
Total mineral nitrogen dynamics were compared with lags of rainfall, minimum
temperature, maximum temperature and soil moisture to see if there were any significant
correlations. No correlation tests were found to be anywhere near significant (all p >
0.10). Therefore, it would appear that there is no relationship between total mineral
nitrogen dynamics and rainfall, temperature or soil moisture.
Overall split plot and MANOVA designs revealed very little about the nature of the
influences on total mineral nitrogen dynamics. The only term significant in either model
was season in the split plot design (p < 0.0001). That is, there is a different mean level of
mineral nitrogen dynamics depending on the season. This simply reflected the
exploratory data analysis graphs that showed different dynamics at different times.
In summary:
• It is unclear what exactly is influencing total mineral nitrogen dynamics, apart from time.
• Mean total mineral nitrogen dynamics were found to differ significantly between seasons.
• The block was not significant, although a block effect was expected given the use of an RCB design.
5.5.7
Nitrate Leaching
Nitrate leaching levels are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix K.
Exploratory data analysis involved the creation of graphs to observe the behaviour of different treatments on nitrate leaching over time. The clearest relationship was that treatments involving disc plough cultivation tended to have higher levels of nitrate leaching. Beyond the cultivation differences, the trends between compaction levels were inconsistent and varied erratically through the experiment. A smoothed version of the original graph emphasised the apparent differences in leaching levels between the two cultivation levels.
Nitrate leaching was found not to be correlated with rainfall, temperature or soil
moisture. This was discovered by using a number of correlation significance tests
between nitrate leaching and up to four lags of rainfall, minimum temperature, maximum
temperature and soil moisture. The p value for every correlation test was above 0.05.
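The lagged correlation screening described above can be sketched as follows. This is a minimal illustration, assuming monthly series of equal length aligned month by month and Pearson correlation tests via `scipy.stats.pearsonr`; the function name `lagged_correlations` and the commented usage names are hypothetical, not the software or data used in this case study.

```python
import numpy as np
from scipy import stats


def lagged_correlations(response, covariate, max_lag=4):
    """Correlate response[t] with covariate[t - lag] for lag = 0, 1, ..., max_lag.

    Returns a list of (lag, r, p) tuples from Pearson correlation tests.
    Both series are assumed to have equal length and be aligned month by month.
    """
    response = np.asarray(response, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    results = []
    for lag in range(max_lag + 1):
        y = response[lag:]                        # response at time t
        x = covariate[:len(covariate) - lag]      # covariate at time t - lag
        r, p = stats.pearsonr(x, y)
        results.append((lag, r, p))
    return results


# Hypothetical usage: one call per environmental variable.
# for name, series in {"rainfall": rain, "soil moisture": moisture}.items():
#     for lag, r, p in lagged_correlations(nitrate, series):
#         print(name, lag, round(r, 3), round(p, 4))
```

With four environmental variables and lags 0 to 4, this gives the twenty tests per response variable referred to in this section.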
The overall split plot result reaffirmed the suspicion that cultivation levels may be
affecting mean nitrate leaching levels. The cultivation factor was significant in the
split plot design (p = 0.0383) but not in the MANOVA (p = 0.2197 for all test
statistics). This inconsistency arises from the different hypotheses tested by split plots
and MANOVA. In this case, MANOVA tests for equality of mean nitrate leaching in each
season, while the split plot tests for equality of nitrate leaching means averaged over the
five seasons. That is, the split plot test looks at the bigger picture while the MANOVA
focuses on the details. Multiple comparison tests found overall leaching levels
significantly higher (p = 0.0383) when the disc plough was used than when no cultivation
was applied. The nitrate leaching means and cultivation level comparison are shown
graphically in Figure 5.16.
[Chart: Nitrate Leaching By Cultivation; x-axis: Season (1-5); bars: None, Disc Plough]
Figure 5.16: Back transformed means (± S.E.) for cultivation effects on mean nitrate
leaching.
The only other significant result from the overall designs was for season in the split plot
design (p = 0.0047). This is simply a reflection of different mean nitrate leaching in
different seasons and is not of interest for further analysis.
In summary:
Cultivation significantly affects mean nitrate leaching levels. Significantly more
nitrate leaching occurs when the disc plough is used.
No evidence was provided that compaction has any influence on nitrate leaching.
Season was found to significantly affect nitrate leaching. This reflects differences
in mean nitrate leaching at different times.
The block was not significant, although it should have been for an RCB design.
5.5.8
Ammonium Leaching
Ammonium leaching levels are investigated in this section to find out what can be seen to
be influencing these levels. Raw data, graphs and results are contained in Appendix L.
Raw and smoothed graphs were created for exploratory data analysis to investigate the
effects of different compaction and cultivation levels over time. Levels of ammonium
leaching appeared to be highest when the disc plough was used, while differences
between compaction levels were far less clear. Precise behaviour again varies from
month to month and treatments lack distinct trends overall.
Ammonium leaching levels were not correlated with rainfall, temperature or soil
moisture. A series of correlation significance tests returned no p values below 0.20,
indicating no detectable correlation of ammonium leaching with rainfall, maximum
temperature, minimum temperature or soil moisture.
Overall split plot and MANOVA designs investigated possible effects on mean levels of
ammonium leaching. The overall split plot reported a significant cultivation effect (p =
0.0132) which was not significant under MANOVA (p = 0.15156). This is probably
because cultivation has an overall effect that is not strong enough in any particular season
to give a significant MANOVA result. Multiple comparison tests reveal that overall
ammonium leaching levels are higher when the disc plough is used (p = 0.0132). This
result is shown graphically in Figure 5.17.
[Chart: Ammonium Leaching By Cultivation; x-axis: Season (1-5); bars: None, Disc Plough]
Figure 5.17: Back transformed means (± S.E.) for cultivation effects on mean ammonium
leaching.
The overall MANOVA returned a significant result for compaction using the upper
bound of Roy's largest root, with a p value of 0.0274. All other compaction test statistics
were far from significant, though, with p values over 0.10. Interestingly, the p value for
compaction in the split plot design was much higher, at 0.3504. These results do not
provide sufficient evidence that compaction affects ammonium leaching. Season was
significant in the split plot model (p = 0.0089), reflecting different mean ammonium
leaching levels in different seasons. Overall, no interactions with season were found in
the split plot design and there were no clearly significant factors in the overall
MANOVA. There is therefore no motivation for looking at season based models.
In summary:
Overall, cultivation significantly affected ammonium leaching. Significantly
higher levels of ammonium leaching occurred where the disc plough was used
compared to where no cultivation was applied.
No evidence was provided that compaction has any influence on ammonium leaching.
Season was found to be significant. This simply indicates that mean ammonium
leaching levels differ depending on the season.
The block was not significant, although it should have been for an RCB design.
5.5.9
Total Mineral Nitrogen Leaching
Total mineral nitrogen leaching levels are investigated in this section to find out what can
be seen to be influencing these levels. Raw data, graphs and results are contained in
Appendix M.
Graphs created for exploratory data analysis presented a similar picture for total mineral
nitrogen leaching as for nitrate and ammonium leaching. Higher levels of total mineral
nitrogen leaching tend to be present where the disc plough was used for cultivation.
There is not a clear relationship depending on the level of compaction. Mean levels of
total mineral nitrogen leaching differ depending on the month, with central months
having the highest levels.
Correlation analysis revealed no significant relationships between total mineral nitrogen
leaching and rainfall, minimum temperature, maximum temperature or soil moisture. All
twenty correlation significance tests returned p values over 0.05.
Overall split plot and MANOVA designs were used to establish influences on total
mineral nitrogen leaching. Cultivation was significant in the split plot design (p = 0.0482)
but not in the MANOVA (p = 0.2262). This apparent conflict is taken to reflect a
significant overall effect without significant effects in specific seasons. Investigation of
the overall situation found mean total mineral nitrogen leaching to be significantly higher
when the disc plough was used (p = 0.0482), as shown in Figure 5.18.
[Chart: Total Mineral Nitrogen Leaching By Cultivation; x-axis: Season (1-5); bars: None, Disc Plough]
Figure 5.18: Back transformed means (± S.E.) for cultivation effects on mean total
mineral nitrogen leaching.
The split plot design also revealed significant differences in mean total mineral nitrogen
leaching levels depending on season (p = 0.0004), reflecting different levels of leaching
in different seasons. The block was far from significant in both the overall split plot and
MANOVA designs.
In summary:
There is an overall effect of cultivation on total mineral nitrogen leaching. Where
the disc plough was used there was significantly more nitrogen leaching.
No evidence was provided that compaction has any influence on total mineral nitrogen
leaching.
Mean total mineral nitrogen leaching levels depend on the particular season.
The block was not significant, although it should have been for an RCB design.
5.5.10
Microbial Carbon Levels
Microbial carbon levels are investigated in this section to find out what can be seen to be
influencing these levels. Raw data, graphs and results are contained in Appendix N.
Graphs created during exploratory data analysis revealed interesting behaviour in
microbial carbon levels that clearly changed over time. Four treatments were relatively
consistent over time, while two treatments where the plough was used varied in a
systematic and interesting way, as shown by the smoothed graph in Figure 5.19.
[Chart: Microbial Carbon Levels By Compaction, Cultivation (3MA Smoothed); y-axis: Mean Microbial Carbon (µg/g); x-axis: Month; series: 0, 1 and 16 passes, each with None and Plough]
Figure 5.19: Microbial carbon levels by compaction and cultivation over time.
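The "3MA" smoothing in Figure 5.19 is taken here to mean a three-point moving average. A minimal sketch, assuming a centred window with the first and last months left undefined; whether the original graphs used a centred or trailing window is not stated, and `moving_average_3` is a hypothetical name.

```python
import numpy as np


def moving_average_3(series):
    """Centred three-point moving average; the first and last points are left as NaN."""
    x = np.asarray(series, dtype=float)
    smoothed = np.full_like(x, np.nan)
    # Each interior point becomes the mean of itself and its two neighbours.
    smoothed[1:-1] = (x[:-2] + x[1:-1] + x[2:]) / 3.0
    return smoothed


# Hypothetical monthly means for one treatment.
smoothed = moving_average_3([3.0, 6.0, 9.0, 3.0])   # [nan, 6.0, 6.0, nan]
```

Averaging each month with its neighbours damps month-to-month noise so that systematic treatment trends stand out, at the cost of blurring sharp single-month changes.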
Correlation analysis found that microbial carbon levels are significantly correlated (p <
0.0001) with soil moisture. The correlation coefficient is 0.286, indicating that although
the correlation is weak, it is clearly significant (from the p value). Figure 5.20 shows a
graphical cross correlation function resulting from calculating the correlation of
microbial carbon with five lags of soil moisture. The correlations at lags beyond zero are
assumed to be mainly a result of the correlation at lag zero, presumably carried over
because soil moisture is itself correlated from month to month. A significant correlation
was also found between rainfall and microbial carbon levels two months later (p =
0.018). This relationship is likely to be spurious: with a total of twenty correlation
calculations, about one result significant at the 0.05 level is expected by chance alone.
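The multiple testing argument can be made concrete with some simple arithmetic. This sketch assumes the twenty tests are lags 0 to 4 of the four environmental variables, each at the 0.05 level. The family-wise error rate calculation also treats the tests as independent, which is a simplification because lagged series are correlated with each other.

```python
alpha = 0.05
n_tests = 20                                    # assumed: 4 environmental variables x lags 0-4

# If no true correlations exist, about one test in twenty is significant by chance.
expected_false_positives = n_tests * alpha      # 1.0

# Chance of at least one false positive, treating the tests as independent.
fwer_independent = 1 - (1 - alpha) ** n_tests   # about 0.64

# Bonferroni: each test must reach alpha / n_tests to be declared significant.
bonferroni_threshold = alpha / n_tests          # 0.0025

p_rain_lag2 = 0.018                             # rainfall vs microbial carbon two months later
significant_after_adjustment = p_rain_lag2 < bonferroni_threshold   # False
```

Under a Bonferroni adjustment the rainfall result (p = 0.018) would not be significant, whereas the soil moisture result (p < 0.0001) would be.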
[Chart: Cross-Correlation: Microbial Carbon, Soil Moisture; y-axis: Correlation (-1 to 1); x-axis: Soil Moisture Lag]
Figure 5.20: Graphical cross correlation function - microbial carbon and soil moisture.
Overall split plot and MANOVA designs revealed a number of sources of variation for
microbial carbon levels. The only agreement between the split plot and MANOVA
models concerned the block. The split plot model found the block significant (p =
0.0234), as did two of the MANOVA test statistics (p < 0.05). In addition, the overall
split plot revealed a significant season by compaction interaction (p = 0.0189).
Compaction was not significant in the MANOVA; a significant result there would have
supported the season by compaction interaction. The season by compaction interaction
indicates that the effect of compaction depends on the season. For this reason, season
based models are pursued.
The first significant effects are found in the second season. The split plot design found
the compaction by cultivation interaction to be significant (p = 0.0408). Only Roy's
largest root found this term significant in the MANOVA (p = 0.027). Multiple comparison
tests using the Bonferroni modification found no significant differences between the
means of the compaction and cultivation combinations. The means involved and the
significance results from these tests are shown in Figure 5.21. Without the Bonferroni
modification, two significant differences (p < 0.05) would result: between zero pass and
sixteen pass compaction when there is no cultivation, and between the two cultivation
levels when there is no compaction.
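The effect of the Bonferroni modification on these comparisons can be sketched as follows. This assumes pairwise t tests based on the pooled error mean square of the fitted model with equal replication per mean, which may differ from the exact procedure used in this case study. The function `pairwise_bonferroni` and the example numbers are hypothetical. With six compaction by cultivation combinations there are 15 pairwise comparisons, so each raw p value is multiplied by 15 (equivalently, compared with 0.05/15 ≈ 0.0033).

```python
from itertools import combinations

import numpy as np
from scipy import stats


def pairwise_bonferroni(means, mse, n_per_mean, df_error):
    """Pairwise t tests between treatment means using the pooled error mean square.

    Returns {(i, j): (raw_p, bonferroni_p)} for every pair of means.
    """
    pairs = list(combinations(range(len(means)), 2))
    n_comparisons = len(pairs)                       # 15 for six treatment means
    se_diff = np.sqrt(2 * mse / n_per_mean)          # S.E. of a difference of two means
    results = {}
    for i, j in pairs:
        t = (means[i] - means[j]) / se_diff
        raw_p = 2 * stats.t.sf(abs(t), df_error)     # two-sided p value
        # Bonferroni: multiply by the number of comparisons, capped at 1.
        results[(i, j)] = (raw_p, min(1.0, n_comparisons * raw_p))
    return results


# Hypothetical means, error mean square, replication and error degrees of freedom.
comparisons = pairwise_bonferroni([10.0, 10.0, 20.0], mse=8.0, n_per_mean=4, df_error=10)
```

A difference with a raw p value of, say, 0.01 passes at the 0.05 level but becomes 0.15 after multiplying by 15. This is how differences that are significant without the adjustment can disappear once it is applied.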
[Chart: Microbial Carbon Levels By …; y-axis: Mean Carbon (kg/ha); x-axis: Season (2); series: 0, 1 and 16 Pass, each with None and Plough]
Figure 5.21: … mean microbial carbon levels in season two.
In the third season, the block is significant in both the split plot (p = 0.0039) and
MANOVA (p < 0.02 for all test statistics). The exact nature of the block effect is not
investigated further as it is not a focus of this case study. The split plot design also found
the compaction by cultivation interaction significant (p = 0.0424), as in the second
season. Multiple comparison tests using the Bonferroni approach did not find any
significant differences between means. Means, significant differences and standard errors
are shown in Figure 5.22. Had the Bonferroni approach not been applied, one relevant
significant difference would have resulted (p < 0.05), between the two cultivation levels
when there is no compaction. Note that this is the same comparison that was close to
being significant in season two.
[Chart: Microbial Carbon Levels By …; y-axis: Mean Carbon (kg/ha); x-axis: Season (3); series: 0, 1 and 16 Pass, each with None and Plough]
Figure 5.22: … mean microbial carbon levels in season three.
In summary:
There was a significant positive correlation between soil moisture and microbial
carbon levels.
There was an interaction between compaction and cultivation in seasons two and
three. Multiple comparison tests revealed no significant differences between
means, though some were very close.
The block had a significant effect on mean microbial carbon levels. This was
significant overall and in the third season.
5.5.11
Microbial Nitrogen Levels
Microbial nitrogen levels are investigated in this section to find out what can be seen to
be influencing these levels. Raw data, graphs and results are contained in Appendix O.
Exploratory data analysis using raw and smoothed graphs of treatments over time showed
some clear trends. The zero pass, no cultivation treatment tends to have the highest levels
of microbial nitrogen. The zero pass, disc cultivation treatment tends to have the lowest
levels of microbial nitrogen. The remaining treatments swap positions frequently and
their ordering is less clear. Over the nineteen months, microbial nitrogen levels decrease
overall, with spikes in months six and ten.
As with the other microbial variables, microbial nitrogen is significantly correlated with
soil moisture (p < 0.0001). With a correlation of 0.378 at no time lag, higher moisture
levels are associated with higher levels of microbial nitrogen. No significant correlations
were found between microbial nitrogen and a variety of lags of rainfall, minimum
temperature and maximum temperature.
Overall split plot and MANOVA designs were applied to the microbial nitrogen variable.
The block was significant in the split plot design (p = 0.0082) but only for Roy's largest
root (p = 0.0099) in the MANOVA. This suggests that while the block appears significant
when all times are considered together, within any particular month there is not a
strongly significant difference. Although the block is not a focus of this case study,
multiple comparison tests revealed significant mean differences between the third block
and the first and second blocks (see Figure 5.23). The split plot design also reported a
significant interaction between season and compaction (p = 0.0484). The MANOVA
compaction effect, which should also be significant if there is an interaction between
compaction and season, was not significant (except for Roy's largest root, with a p value
of 0.0401). Because of the significant interaction of compaction and season in the split
plot model, season based models are investigated to find the exact nature of the
interaction.
[Chart: Microbial Nitrogen Levels By Block; y-axis: Mean Nitrogen (µg/g); x-axis: Season (1-5); bars: Block 1, Block 2, Block 3]
Figure 5.23: Back transformed means (± S.E.) for block effects on mean microbial
nitrogen levels.
The interaction between month and compaction was found to be significant in two
different seasons. In season one, the split plot model found the interaction significant (p =
0.0397), while in the equivalent MANOVA model only one test statistic found
compaction significant (p = 0.015). In the third season, the split plot design found the
interaction strongly significant (p = 0.0079) and two test statistics found compaction
significant (p < 0.05) in the equivalent MANOVA model. Using multiple comparison
tests, the only significant compaction effect in season one was in month three, where
mean microbial nitrogen levels were significantly higher with no compaction than with
sixteen pass compaction. Only one significant difference was found in season three as
well: a significantly higher mean microbial nitrogen level with no compaction than with
one pass compaction in month nine. Compaction means and significances for each month
are shown for season one in Figure 5.24 and for season three in Figure 5.25. These
figures show that the effect of compaction varies substantially between months.
[Chart: Microbial Nitrogen By Compaction in Season 1; y-axis: Mean Nitrogen (µg/g); x-axis: Month; series: None, 1 Pass, 16 Pass]
Figure 5.24: Back transformed means (± S.E.) for compaction effects on mean microbial
nitrogen levels in season one (each month separately).
[Chart: Microbial Nitrogen By Compaction in Season 3; y-axis: Mean Nitrogen (µg/g); x-axis: Month (9-11); series: None, 1 Pass, 16 Pass]
Figure 5.25: Back transformed means (± S.E.) for compaction effects on mean microbial
nitrogen levels in season three (each month separately).
The second season was largely devoid of relationships, except for a strongly significant
month effect (p < 0.0001). This simply indicates that the mean microbial nitrogen level
differs between months.
The block was significant or close to being significant in a number of different seasons.
In the first season the block was significant with one MANOVA test statistic (p = 0.0134)
and close to being significant with the other test statistics (all p < 0.15). Three of the four
MANOVA test statistics found the block significant in the second season (p < 0.05) while
the split plot result was close to significant (p = 0.0685). The block was significant in the
third season for both the split plot (p = 0.0023) and MANOVA (p < 0.05 for three of the
four test statistics) models. In the fourth and final season the block was close to
significant (p = 0.0566) in the split plot model and significant (p = 0.0423) for one test
statistic in MANOVA (all p < 0.15).
In summary:
There was a significant positive correlation between soil moisture and microbial
nitrogen levels.
An interaction between month and compaction exists in seasons one and three.
This reflects different compaction behaviour in different months.
Significantly more microbial nitrogen was found at no compaction compared to
sixteen pass compaction in month three. Significantly more microbial nitrogen
was found at no compaction compared to one pass compaction in month nine.
Significant differences in mean microbial nitrogen exist between the blocks. The
block was significant, or close to significant, overall and in every season.
In the second season month was strongly significant. Mean microbial nitrogen levels
differ during this season depending on the month.
5.5.12
Microbial Carbon to Nitrogen Ratio
The microbial carbon to nitrogen ratio is investigated in this section to find out what can
be seen to be influencing these levels. Raw data, graphs and results are contained in
Appendix P.
Graphs created during exploratory data analysis provided a few clues to the influences
on the microbial carbon to nitrogen ratio. Overall, the ratio decreases slowly until month
seven, then rises sharply, drops sharply in month nine and rises sharply again in month
twelve. Treatments where the disc plough was applied tend to have higher ratios than
their non-cultivated counterparts. Otherwise, the differences between treatments are
unclear and vary from month to month.
Correlation analysis found that the microbial carbon to nitrogen ratio is not correlated
with rainfall, maximum temperature or minimum temperature. Like the other biological
variables, the ratio was found to be strongly correlated with soil moisture (p < 0.0001).
Unlike the other biological variables, this was a significant negative correlation, meaning
that higher values of the ratio are associated with lower soil moisture levels. Figure 5.26
shows the cross correlation function formed from the correlations between the ratio and a
number of lags of soil moisture. It is suspected that the correlation is really at lag zero
and that the other strong lags reflect this relationship.
[Chart: Cross-Correlation: Microbial C:N Ratio, Soil Moisture; y-axis: Correlation (-1 to 1); x-axis: Soil Moisture Lag]
Figure 5.26: Graphical cross correlation function - microbial carbon to nitrogen ratio and
soil moisture.
Many relationships were uncovered by the overall split plot and MANOVA designs. In
particular, the split plot design found a highly significant (p = 0.0005) three-way
interaction between season, compaction and cultivation. This indicated that the behaviour
of cultivation and compaction combinations depends on the season. Interestingly, the
equivalent MANOVA term that would be expected to be significant, the compaction by
cultivation interaction, was only significant for the upper bound of Roy's largest root (p =
0.0334). All MANOVA test statistics for this interaction did have p values under 0.15,
though. These interactions were taken as reason to investigate further using season based
designs. Note that graphs in this section use different scales for clarity.
During the first season the split plot design revealed a significant three-way interaction
involving month, compaction and cultivation. This is reaffirmed by three of the four
MANOVA test statistics for the compaction by cultivation interaction being significant (p
< 0.05). Multiple comparison tests found three notable significant mean differences. In
month three, the carbon to nitrogen ratio is significantly higher with the plough than
without it where one pass compaction has been applied. The reverse is true in month
four, where the ratio is significantly lower when the plough is used than when it is not, at
the one pass compaction level. When the plough is used in month four, the ratio is
significantly higher with no compaction than with one pass compaction. Many more
multiple comparison test results would have been significant had the Bonferroni
modification not been used; however, those results are somewhat erratic. Figure 5.27
presents the means and significant differences from the month by compaction by
cultivation interaction in season one.
[Chart: Microbial C:N Ratio By Compaction, Cultivation in Season 1; y-axis: Mean Carbon to Nitrogen Ratio; x-axis: Month; series: 0, 1 and 16 Pass, each with None and Plough]
Figure 5.27: … the mean microbial carbon to nitrogen ratio in season one (each month separately).
The exact nature of effects on the microbial carbon to nitrogen ratio in season two is
difficult to decipher from the season based split plot and MANOVA results. The split plot
reported a significant interaction of month and compaction (p = 0.0231) and a separate
significant cultivation effect (p = 0.009). The MANOVA, however, indicated an
interaction between compaction and cultivation, with all test statistics returning p values
under 0.05. To be on the safe side, the behaviour of compaction and cultivation levels was
investigated within each month using multiple comparison tests. In month five, the mean
carbon to nitrogen ratio was significantly lower for the sixteen pass compaction, no
cultivation treatment than for every other treatment. In month six, the only significant
difference was that the ratio was significantly higher for no compaction than for sixteen
pass compaction when there was no cultivation. Figure 5.28 presents these results
graphically; month seven has been left out for visual clarity (there were no significant
mean differences in this month anyway).

[Chart: … in Season 2; x-axis: Month; series: 0, 1 and 16 Pass, each with None and Plough]
Figure 5.28: … the mean microbial carbon to nitrogen ratio in season two (each month separately).
The third season presents a similar scenario to the first two seasons: the influences of
compaction and cultivation levels depend on the month. The split plot design found a
significant month by compaction interaction (p < 0.0001) and month by cultivation
interaction (p = 0.0009), while the equivalent MANOVA design found the compaction by
cultivation interaction significant (p < 0.05 for three of the four test statistics). These
results demonstrated that the carbon to nitrogen ratio means differed depending on the
month and that there may be an interaction between compaction and cultivation in these
months. Therefore multiple comparison tests were applied within each month of the third
season. Means and results of these tests are shown in Figure 5.29 for month nine and
Figure 5.30 for months ten and eleven. Many significant differences were found, and
these were sometimes contradictory between months. This is a side effect of the known
interactions with month.

[Chart: … in Season 3 (Month 9); x-axis: Month; series: 0, 1 and 16 Pass, each with None and Plough]
Figure 5.29: … the mean microbial carbon to nitrogen ratio in season three (month nine).

[Chart: … in Season 3 (Month 10, 11); x-axis: Month; series: 0, 1 and 16 Pass, each with None and Plough]
Figure 5.30: … the mean microbial carbon to nitrogen ratio in season three (months ten and eleven).
The behaviour in season four was similar to that seen in the other three seasons. The
overall split plot design found a significant interaction between season and cultivation (p
= 0.0066) that was reinforced by the significant cultivation effect in MANOVA (p =
0.0134 for all test statistics). While cultivation was interacting with seasons it was also
interacting with compaction as found in the split plot design (p = 0.0066) and hinted at by
MANOVA (p < 0.10 for all test statistics). The consequence of this is that the effects of
compaction and cultivation once again depended on the particular month. Multiple
comparison tests found that the mean carbon to nitrogen ratio value was significantly
higher for disc plough cultivation than no cultivation when there was one pass
compaction in month twelve. More multiple comparison test results would have been
significant without the use of the Bonferroni modification. The display of means and
significant differences for season four in Figure 5.31 shows the differences present in
mean carbon to nitrogen ratio values between treatments in different months.

[Chart: … in Season 4; x-axis: Month; series: 0, 1 and 16 Pass, each with None and Plough]
Figure 5.31: … the mean microbial carbon to nitrogen ratio in season four.
As was the case with the other biological variables, the block commonly led to significant
differences in mean carbon to nitrogen ratio measures. The overall split plot design found
the block significant (p = 0.026), while only the upper bound of Roy's largest root in the
overall MANOVA found it significant (p = 0.0258). In the first season only Roy's largest
root found the block significant (p = 0.0407). For the second season the block was
strongly significant in the MANOVA (p < 0.05 for all test statistics) but not in the split
plot design (p = 0.3982). The split plot design (p = 0.0124) and MANOVA (p < 0.05 for
all test statistics) agreed on block significance in the third season. Neither method found
the block significant in the fourth (and final) season.
More significant differences were found in the analysis of the carbon to nitrogen ratio
than for any other variable. This stems from the particularly small error in the statistical
models applied to it. Detailed biological reasons for this small error are beyond the scope
of this thesis. It would appear that the carbon to nitrogen ratio varies less between
particular pieces of land than the other variables do.
In summary:
There was a strong negative correlation between soil moisture and the microbial
carbon to nitrogen ratio.
The effect of compaction and cultivation on the carbon to nitrogen ratio
depends entirely on the month. Significant differences between treatments within
months are common, but because they change from month to month, the overall
picture is unclear.
The block term was significant (or close to being significant) in most models
evaluated. The block was particularly influential overall and in seasons two and
three.
5.6
General Discussion
The Yarraman experiments and their resulting data sets and analysis make an interesting
statistical case study. In this discussion, the main case study issues and problems are
covered from a statistical point of view. Data analyses are reflected upon with a purpose
towards advancing the potential to extract information out of similar situations in the
future. Lastly, future directions for the current data set are presented.
Problems with the initial randomised complete block (RCB) design were felt throughout
data analysis. They were evident in the block not being significant in most models. It is
also likely that the design contributed to the overall lack of sensible significant
relationships found. Randomised complete block (RCB) designs use a factor as a block.
This blocking factor is assumed to have an effect on the variable being modelled and not
to interact with any other factors. The data results suggest that both of these RCB
assumptions have been violated. Firstly, the block was rarely found to be significant in
models, which was not surprising given the subtle differences that distinguished the
three blocks (Blumfield, personal communication). Secondly, previous data analysis by
Blumfield et al. (2002) and Chen et al. (2002) suggested significant interactions of the
block with other factors. Given the relative failure of the blocking aspect of the initial
model, alternatives should be considered in future. Blocking is useful for providing
necessary replication but needs to be based on a factor that satisfies the RCB design
assumptions.
The log transformation of variable values prior to analysis was found to be fairly
successful. Bringing the samples closer to normality led to noticeably more significant
relationships than analysis of the raw data (not included). Furthermore, the split plot and
MANOVA results were more consistent when analysed using the log transformed values.
This suggests that the occasional inconsistency between split plot and MANOVA results
may be partly due to deviations from statistical assumptions.
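Because the analyses were carried out on the log scale, the figures report back transformed means. One way to compute them is sketched below. This is a minimal sketch, assuming strictly positive values and a delta-method standard error. This case study does not state exactly how the standard errors were back transformed, and `back_transformed_mean` is a hypothetical helper.

```python
import numpy as np


def back_transformed_mean(values):
    """Back transform a log-scale mean (and its S.E.) to the original scale.

    Assumes strictly positive values. exp(mean(log y)) is the geometric mean;
    the S.E. is approximated by the delta method as exp(m) * se_log.
    """
    logs = np.log(np.asarray(values, dtype=float))
    m = logs.mean()
    se_log = logs.std(ddof=1) / np.sqrt(logs.size)
    back_mean = np.exp(m)
    return back_mean, back_mean * se_log


# Hypothetical skewed sample: the arithmetic mean is 37, the back transformed mean is 10.
mean_bt, se_bt = back_transformed_mean([1.0, 10.0, 100.0])
```

The back transformed mean is the geometric mean, which is never larger than the arithmetic mean and is less influenced by a few very large values. An alternative to the delta method is to report the asymmetric interval exp(m - se_log) to exp(m + se_log).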
The possibility of other variables affecting the chemical and biological measures was
investigated during data analysis. This was done by cross correlation of each variable
with the available environmental variables: rainfall, maximum temperature, minimum
temperature and soil moisture. The only strongly significant relationships were between
soil moisture and the microbial carbon, microbial nitrogen and microbial carbon to
nitrogen ratio variables. Including soil moisture as a covariate in the models for these
variables is a possibility for the future.
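Including soil moisture as a covariate could take the form of an analysis of covariance. The following is a simplified illustration, assuming a single treatment factor, a common slope for soil moisture and ordinary least squares via `numpy.linalg.lstsq`. It ignores the block and split plot error structure of the case study, and `covariate_adjusted_means` and the example data are hypothetical.

```python
import numpy as np


def covariate_adjusted_means(y, treatment, covariate):
    """Fit y = treatment mean + beta * (covariate - overall mean) by least squares.

    Returns beta and the treatment means adjusted to the overall covariate mean.
    Assumes a common slope for every treatment and ignores block and split
    plot error structure.
    """
    y = np.asarray(y, dtype=float)
    treatment = np.asarray(treatment)
    levels = np.unique(treatment)
    centred = np.asarray(covariate, dtype=float)
    centred = centred - centred.mean()
    # One indicator column per treatment level, plus the centred covariate.
    X = np.column_stack([(treatment == lv).astype(float) for lv in levels] + [centred])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1], dict(zip(levels.tolist(), coef[:-1]))


# Made-up data in which the ploughed plots also happen to be wetter.
moisture = [10.0, 20.0, 30.0, 15.0, 25.0, 35.0]
cultivation = ["None"] * 3 + ["Plough"] * 3
carbon = [2 * m + (5.0 if c == "Plough" else 0.0) for m, c in zip(moisture, cultivation)]
beta, adjusted = covariate_adjusted_means(carbon, cultivation, moisture)
```

In this toy data the raw cultivation means differ by 15. After adjusting for soil moisture (beta = 2), the difference is 5, which is the part not explained by the wetter soil of the ploughed plots. Removing covariate-driven variation in this way is one route to the smaller error terms discussed below.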
Relatively few significant relationships were found during analysis. Often, while samples
revealed a sizable difference in means, the same could not be inferred for the populations
they came from. The reason was that high error estimates made significant differences
difficult to find: a factorial effect is judged significant only when the variation attributed
to it is sufficiently large relative to the error variation (the F ratio of the two mean
squares). The nature of the large amount of error is important to consider because it may
be the key to more successful future analyses. It is possible that soil samples are
extremely variable due to factors beyond control (eg. what was growing in the location 50
years ago). It is more likely that additional recordable variables (such as physical soil
properties) influence the levels of these variables. In this case an improved design may
be possible that reduces error by accounting for variation more effectively. The other
alternative for reducing error is to take more samples (replicates), but this is less sensible
and probably less feasible than improving the design.
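The role of the error term can be shown with the F ratio itself. The sketch below uses hypothetical mean squares and degrees of freedom. The same effect mean square is significant when judged against a small error mean square, but not against a larger one.

```python
from scipy import stats


def f_test(ms_effect, ms_error, df_effect, df_error):
    """F ratio of effect and error mean squares, with its upper-tail p value."""
    f = ms_effect / ms_error
    return f, stats.f.sf(f, df_effect, df_error)


# The same hypothetical effect mean square judged against two error mean squares.
f_small_err, p_small_err = f_test(12.0, 2.0, df_effect=1, df_error=10)  # F = 6, p < 0.05
f_large_err, p_large_err = f_test(12.0, 6.0, df_effect=1, df_error=10)  # F = 2, p > 0.05
```

Tripling the error mean square divides the F ratio by three, so a real treatment difference can go undetected when plot-to-plot variation is large.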
The split plot and MANOVA designs used were found to be effective in answering
statistical questions in the case study. The differences in test structure and hypotheses
for the most part complemented each other, with each technique answering slightly
different questions. The two techniques gave interesting and often seemingly conflicting
results.
Tests of significance for the same factor in split plot and MANOVA designs are quite
different in nature. Consider the overall designs from the case study where both evaluate
compaction, cultivation, the compaction by cultivation interaction and block factorial
effects for significance. In these models repeated measures were taken over season. Both
split plot and MANOVA test for equality of means but how they do so is different. The
split plot tests look at the average value for each factorial effect over time while
MANOVA searches within each time to gauge significant differences of each factorial
effect. Therefore, the split plot result gives a picture of overall behaviour without
175
consideration of specific seasons. The split plot test does not provide any indication of an
interaction. MANOVA on the other hand could return a significant result that is not an
overall effect but an indication of an interaction between season and a factorial effect
because it has evaluated the situation from within each season.
The main plot tests of significance are not the only tests provided by split plot designs.
Continuing with the overall models in the case study, every factorial effect (except the
block) in the main plot is evaluated for an interaction with season in the subplot. This can
be seen as an advantage over MANOVA, where the interaction between season and a factorial
effect is bound up with the result for that factorial effect. For example, a significant
result for compaction in MANOVA could indicate a significant compaction effect, a
significant compaction by season interaction, or both. The equivalent split plot model tests
for compaction and the compaction by season interaction separately.
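The separation of effects described above can be illustrated with a small sketch. Assuming a hypothetical profile of treatment differences across four seasons, the quantity examined by the split plot main plot test is the average of that profile over time, while the treatment by season subplot test examines the seasonal deviations from that average; a MANOVA on the season means sees the whole profile at once. The function `split_profile` is an illustrative helper, not part of any package.

```python
def split_profile(diff_by_season):
    """Split a treatment-difference profile into an overall (time-averaged)
    component and season-specific deviations (the interaction component)."""
    overall = sum(diff_by_season) / len(diff_by_season)
    interaction = [d - overall for d in diff_by_season]
    return overall, interaction


# Hypothetical compacted-minus-uncompacted mean differences over 4 seasons.
# Case 1: differences change sign between seasons, so they average to zero
# (no main plot effect, but a clear treatment by season interaction).
overall, interaction = split_profile([4.0, -4.0, 4.0, -4.0])
print(overall, interaction)    # 0.0 [4.0, -4.0, 4.0, -4.0]

# Case 2: a steady difference with no season-to-season change
# (a main plot effect and no interaction).
overall, interaction = split_profile([2.0, 2.0, 2.0, 2.0])
print(overall, interaction)    # 2.0 [0.0, 0.0, 0.0, 0.0]
```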
In some analyses factorial effects were significant in MANOVA but not in the split plot
design, and vice versa. At times the results from the two methods were complete opposites.
Although puzzling at first, these differences can be explained by considering the
hypotheses and analysis techniques of the two methods. When a factorial effect is
significant in MANOVA but not in the split plot design, the mean variable levels do not
differ significantly when averaged over time, but do at a particular time or combination of
times. In this case the split plot design commonly showed an interaction between the
factorial effect and time, since that interaction was the true nature of the significant
MANOVA result. The opposite case, where a factorial effect is significant in the split plot
design but not in MANOVA, also occurred. This arises when the differences at any
particular time are small but consistent, so that they become significant when considered
over all times. In these situations the split plot result will be significant (in the main
plot only), as it analyses values averaged over time, while MANOVA looks at each time
period individually.
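The second case can be illustrated with a back-of-envelope calculation. The sketch assumes, for illustration only, that the errors in different periods are independent with a common standard error; repeated measurements are usually positively correlated, which weakens the effect. Under that assumption the standard error of a difference averaged over T periods is smaller by a factor of the square root of T, so a difference too small to be significant in any one period can exceed the usual two-sided 5% critical value of 1.96 once averaged. All numbers are hypothetical.

```python
from math import sqrt


def z_single_period(diff, se):
    """Test statistic for a difference observed in a single period."""
    return diff / se


def z_time_average(diff, se, n_periods):
    """Test statistic for the same difference averaged over n_periods,
    assuming independent errors with a common standard error se."""
    return diff / (se / sqrt(n_periods))


diff, se = 1.0, 0.8   # hypothetical small but consistent difference
print(round(z_single_period(diff, se), 2))     # 1.25, below 1.96
print(round(z_time_average(diff, se, 5), 2))   # 2.8, above 1.96
```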
The benefit of using the Bonferroni modification for multiple comparison tests in this
application is open to debate. While it may prevent spurious relationships when many
comparisons take place, it sometimes left no significant results at all. Often, when a
factorial effect was significant, none of the multiple comparison tests were significant
because the Bonferroni-adjusted significance level was too stringent. Applying a dedicated
multiple comparison procedure such as Tukey's test or the Student-Newman-Keuls (SNK)
test, without the Bonferroni modification, may be a better approach in future.
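The Bonferroni modification simply divides the overall significance level by the number of comparisons m (equivalently, it multiplies each p value by m). The minimal sketch below, using hypothetical p values for the six pairwise comparisons among four treatment means, shows how four comparisons that would pass at 0.05 can all fail at the Bonferroni level of 0.05/6.

```python
from itertools import combinations


def bonferroni(p_values, alpha=0.05):
    """Bonferroni adjustment for m comparisons.

    Returns the per-comparison significance level alpha / m and the adjusted
    p values min(1, m * p); comparing p with alpha / m, or the adjusted p with
    alpha, gives the same decisions.
    """
    m = len(p_values)
    return alpha / m, [min(1.0, m * p) for p in p_values]


# Hypothetical: four treatment means give C(4, 2) = 6 pairwise comparisons.
pairs = list(combinations(["T1", "T2", "T3", "T4"], 2))
raw_p = [0.012, 0.030, 0.200, 0.045, 0.600, 0.009]
level, adjusted = bonferroni(raw_p)
print(f"per-comparison level: {level:.5f}")
for pair, p, adj in zip(pairs, raw_p, adjusted):
    verdict = "significant" if p < level else "not significant"
    print(pair, p, round(adj, 3), verdict)
```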
For the short time series found here, MANOVA and split plot designs were appropriate. As
the number of time intervals increases, MANOVA begins to falter, because the degrees of
freedom available for error shrink relative to the number of responses (each time interval
is treated as a separate response, and the error degrees of freedom must be at least as
large as the number of responses). In the case of the Yarraman data sets, this problem was
avoided by using seasons rather than months. Conventional time series techniques were not
appropriate because of the short length of the series: investigating a seasonal effect and
including multiple variables, although desirable, were not feasible with a maximum of
nineteen time points. As a series grows longer (ie. more than 25 time intervals),
conventional univariate time series techniques may become useful.
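Why MANOVA falters as time intervals are added can be sketched with a degrees of freedom check. In a repeated measures MANOVA each time interval is a separate response, and the p x p error sums of squares and cross products matrix can only be nonsingular (as Wilks' lambda and related statistics require) when the error degrees of freedom are at least p. The sketch assumes a randomised complete block design, whose error degrees of freedom are (blocks - 1)(treatments - 1); the count of four treatment combinations is hypothetical and is not taken from the case study.

```python
def rcb_error_df(n_blocks, n_treatments):
    """Error degrees of freedom for a randomised complete block design."""
    return (n_blocks - 1) * (n_treatments - 1)


def manova_feasible(error_df, n_times):
    """The n_times x n_times error SSCP matrix can be nonsingular only if
    error_df >= n_times."""
    return error_df >= n_times


# Hypothetical design: 3 blocks and 4 treatment combinations.
df_e = rcb_error_df(n_blocks=3, n_treatments=4)
print(df_e)                          # 6
print(manova_feasible(df_e, 19))     # False: nineteen monthly responses
print(manova_feasible(df_e, 5))      # True: five seasonal responses
```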
From the start, the analysis looked at each individual variable over time rather than
focusing on particular times. There is also potential for multivariate analysis of several
variables at a specific time, rather than of one variable at different times. For instance,
nitrate levels, dynamics and leaching could all be analysed in one MANOVA model (but not
in a standard split plot model, which requires a single dependent variable). These types of
models were beyond the scope of this thesis but present a possibility for future analysis.
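As an illustration of the kind of analysis suggested, the sketch below computes Wilks' lambda, a standard MANOVA test statistic, for a one-way design with two responses measured at a single time. It is a minimal pure-Python toy under stated assumptions (hypothetical data, two responses only, no conversion of lambda to an approximate F statistic or p value), not the SAS analysis used in the thesis. Lambda is det(E)/det(E + H), where E is the within-treatment and H the between-treatment sums of squares and cross products matrix: values near 1 indicate no separation between treatments and values near 0 indicate strong separation.

```python
def mean_vector(rows):
    """Column means of a list of equal-length tuples."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def add_outer(matrix, u, weight=1.0):
    """Add weight * u u' to a square matrix in place."""
    for i in range(len(u)):
        for j in range(len(u)):
            matrix[i][j] += weight * u[i] * u[j]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups):
    """Wilks' lambda = det(E) / det(E + H) for a one-way MANOVA with two
    responses; E is the within-group and H the between-group SSCP matrix."""
    grand = mean_vector([row for g in groups for row in g])
    E = [[0.0, 0.0], [0.0, 0.0]]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        g_mean = mean_vector(g)
        add_outer(H, [g_mean[j] - grand[j] for j in range(2)], weight=len(g))
        for row in g:
            add_outer(E, [row[j] - g_mean[j] for j in range(2)])
    T = [[E[i][j] + H[i][j] for j in range(2)] for i in range(2)]
    return det2(E) / det2(T)


# Hypothetical (nitrate level, leaching) pairs for two treatments at one time.
same = [[(1, 2), (2, 1), (3, 3)], [(1, 2), (2, 1), (3, 3)]]
apart = [[(1, 2), (2, 1), (3, 3)], [(6, 2), (7, 1), (8, 3)]]
print(wilks_lambda(same))              # 1.0: no separation between treatments
print(round(wilks_lambda(apart), 4))   # 0.0741: strong separation
```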
6 CONCLUSION
Data correlated over time result from situations where one or more variables are recorded
over time. This dissertation found the analysis of such data to be common across many
fields; medicine, chemistry and forestry are among the fields that work with data
correlated over time. A variety of techniques are available to deal with differences in
types of data and purposes of analysis.
Whatever the area of interest, an analysis option is likely to be available. The better
understood methods are covered in many textbooks and are not overly complex. Care must be
taken that data adhere to the assumptions of the statistical tests involved. Advanced
methods can even exploit long-run relationships between non-stationary series (eg.
cointegration models).
Techniques available for the analysis of data correlated over time can be placed into two
broad categories: repeated measures and time series techniques, both supported in modern
statistical software packages. Repeated measures techniques cover situations where
multiple measurements are recorded on the same experimental unit, and these need not be
over time. Time series techniques are designed specifically for situations where a number
of measurements are made over time. Despite this, time series techniques have drawbacks
with regard to the length of series required for analysis. A general rule is that at least
25 time points are recommended for univariate time series (Nemec, 1996), and this number
increases dramatically for multivariate techniques because of the additional parameters
involved. Where there are too few time points to feasibly conduct (univariate or
multivariate) time series analysis, repeated measures techniques can usually be applied.
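The growth in parameters can be made concrete by counting them. Assuming a vector autoregression of order p (a common multivariate form of ARIMA modelling) for k series, there are k means plus p coefficient matrices of size k x k, compared with 1 + p parameters for a univariate AR(p). The helper names below are illustrative.

```python
def ar_params(p):
    """Univariate AR(p): one mean plus p autoregressive coefficients."""
    return 1 + p


def var_params(k, p):
    """Vector AR(p) for k series: k means plus p coefficient matrices, each
    k x k (the k(k + 1)/2 error covariance terms are not counted)."""
    return k + p * k * k


for k in (1, 2, 4):
    print(f"{k} series, order 2: {var_params(k, 2)} parameters")
# 3, 10 and 36 parameters respectively
```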
Recent theoretical and practical applications found in the literature were quite varied in
nature, with ARIMA based methods featuring most prominently. A range of other techniques
exists on the fringes, though most of these are designed for particular situations.
Nonlinear techniques exist for use when assumptions of linear relationships are not
tenable. Bayesian techniques bring a different statistical paradigm to the multivariate
time series field. One of the most interesting possibilities found was the use of genetic
algorithms for accurately modelling short multivariate time series.
The Yarraman forestry case study involved many simultaneously recorded variables over a
maximum of nineteen months. Chemical and biological variables were recorded each month at
different compaction and cultivation levels in three blocks. In effect, the original design
was a randomised complete block (RCB) design recorded over a number of times. The original
decision to use slope as the blocking factor was based on experimental practicalities and
suspected differences. The validity of the block is questionable, though, as it was based
on very slight differences that would not have been expected to be a source of variation in
the recorded variables.
With only nineteen time periods in the case study, analysis using dedicated time series
methods was impractical. Instead, the focus was placed on repeated measures designs, using
the two techniques of MANOVA and split plot analysis. Some time series based techniques
were applied where appropriate. The original graphs of the twelve variables over time, for
each combination of compaction and cultivation levels, were smoothed with a three month
moving average. A form of cross correlation function was used effectively to judge the
correlation between particular variables and the environmental variables. Each variable was
compared with a number of lags of each environmental variable in case environmental
conditions had a delayed effect on the variable's response. Only one side of the typical
cross correlation function made practical sense here, as it is implausible that soil
properties affect environmental variables such as temperature.
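A three month moving average can be computed in more than one way. The sketch below assumes a centred window (the mean of the previous, current and next month), which leaves the first and last months without a smoothed value; the text does not state whether a centred or trailing window was used, and the monthly values are hypothetical.

```python
def moving_average_3(values):
    """Centred three-point moving average; the end points are returned as None
    because they lack a neighbour on one side."""
    smoothed = [None] * len(values)
    for i in range(1, len(values) - 1):
        smoothed[i] = (values[i - 1] + values[i] + values[i + 1]) / 3
    return smoothed


monthly = [12.0, 18.0, 9.0, 15.0, 21.0, 6.0]   # hypothetical monthly means
print(moving_average_3(monthly))
# [None, 13.0, 14.0, 15.0, 14.0, None]
```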
Split plot designs and MANOVA were both found to be valuable analytical tools for the
case study. Using them in conjunction revealed far more information than either technique
would have in isolation, because of the differences in the hypotheses they test. For data
correlated over time, factorial effects in the main plot section of the split plot are
estimated using values averaged over time. In MANOVA, the factorial effects are
investigated at each time, combined in a single global test. In practice, it worked best to
examine the more specific split plot results first and then refer to MANOVA.
Differences in approach between split plots and MANOVA led to occasional situations where
the two analytical methods appeared to give conflicting results. The case study shows that
these apparent differences, if interpreted correctly, add to the overall understanding of
the data. For example, mean nitrate leaching was found to be significantly affected by
cultivation overall in the split plot design (p = 0.0383) but not in the MANOVA
(p = 0.2197). This is because no individual season shows a significant difference between
the cultivation levels, whereas the difference is significant when considered over all
seasons.
A major hindrance to finding significant relationships in this experiment was the large
amount of natural variation, or error, present. The main suspected reason for the high
levels of variation was the presence of confounding variables that were not included in
the model. The biological variables were found to be significantly correlated with soil
moisture, and it is suspected that including soil moisture may improve the results. The
time constraints of this thesis prevented the investigation of relevant covariate analyses.
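The expected benefit of a covariate can be sketched with a deliberately simplified calculation: the residual sum of squares after a straight-line fit on the covariate is compared with the sum of squares about the mean. The data are hypothetical, and the sketch ignores the treatment and block structure that a full analysis of covariance would include; it only shows how variation explained by a covariate such as soil moisture is removed from the error.

```python
def total_ss(y):
    """Sum of squares about the mean (the error SS with no covariate)."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def residual_ss_after_covariate(y, x):
    """Residual SS after a least-squares straight-line fit of y on covariate x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return sum((b - mean_y - slope * (a - mean_x)) ** 2 for a, b in zip(x, y))


# Hypothetical plots: soil moisture (%) and microbial carbon (arbitrary units).
moisture = [10.0, 14.0, 18.0, 22.0, 26.0]
microbial = [50.0, 57.0, 61.0, 70.0, 72.0]
print(total_ss(microbial))                                         # 334.0
print(round(residual_ss_after_covariate(microbial, moisture), 2))  # 9.1
```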
When it comes to analysis of data correlated over time, it is clear from this investigation
that it is important to know what exactly is of interest. This is because there is a great
range of time series techniques available, and an appropriate selection will save time and
resources. Related to this is the importance of a suitable experimental design from the
outset.
Although not elementary statistical concepts, repeated measures and time series analyses
are not overly difficult, theoretically or practically. Of the methods focused on here,
ARIMA based methods, split plot designs and MANOVA all have advantages and
disadvantages. Modern statistical packages support all the basic techniques for ARIMA,
split plots and MANOVA. In the case of SAS (SAS Institute, 1999), split plot designs are
less well supported because a combination of several factorial effects cannot be specified
as a single error term (this must be done manually). ARIMA based methods involve more
human decision making, as the terms to include in a model are usually selected by
inspecting correlation functions and judging model success. Split plot designs and MANOVA
require less human decision making, but need expert interpretation when complex or
contrasting behaviour is found. All methods involve important statistical assumptions
about samples that must be carefully checked. For ARIMA, stationarity is a common issue,
while for split plot and MANOVA it is normality.
This dissertation has presented a set of practically based techniques for dealing with data
correlated over time. Recent developments and applications in the literature have been
reviewed, and a detailed example demonstrating the usefulness of these techniques has been
provided. It is hoped that the reader now has a working understanding of the concepts and
issues surrounding the analysis of data correlated over time.
REFERENCES
Akman, I., and De Gooijer, J. G. (1996), Component Extraction Analysis of Multivariate
Time Series, Computational Statistics and Data Analysis, 21, 487-499.
Alexandrov, G. A., Yamagata, Y., and Oikawa, T. (1999), Towards a Model for
Projecting Net Ecosystem Production of the World Forests, Ecological Modelling,
123, 183-191.
Berry, D. A. and Stangl, D. K. (1996), Bayesian Biostatistics, Marcel Dekker Inc,
New York, USA.
Bluman, A. G. (2001), Elementary Statistics: A Step By Step Approach, McGraw-Hill,
New York, New York, USA.
Blumfield, T. J., Xu, Z. H. and Chen, C. R. (2002), Soil Compaction and Mineral Nitrogen
Dynamics during Hoop Pine Plantation Establishment, Cooperative Research
Centre (CRC) for Sustainable Production Forestry, Griffith University, Nathan,
Queensland, Australia.
Boyd, I. L. and Murray, A. W. A. (2001), Monitoring a Marine Ecosystem Using
Responses of Upper Trophic Level Predators, Journal of Animal Ecology, 70, 747-760.
Cao, L., Mees, A., and Judd, K. (1998), Dynamics from Multivariate Time Series,
Physica D, 121, 75-88.
Chan, W. S., Lo, H. W. C., and Cheung, S. H. (1999), Return Transmission Among Stock
Markets of Greater China, Mathematics and Computers in Simulation, 48, 511-518.
Chatfield, C. (1980), The Analysis of Time Series: An Introduction, Second Edition,
Chapman and Hall, New York, New York, USA.
Chaturvedi, A., Wan, A. T. K., and Singh, S. P. (2002), Improved Multivariate Prediction
in a General Linear Model with an Unknown Error Covariance Matrix, Journal of
Multivariate Analysis, [Online] Available: http://www.academicpress.com/jmva
(23/07/2002).
Chen, C. R., Xu, Z. H., Blumfield, T. J. and Hughes, J. M. (2002), Soil Microbial
Biomass During the Early Establishment of Hoop Pine Plantation: Seasonal
Variation and Impacts of Site Preparation, Cooperative Research Centre (CRC) for
Sustainable Production Forestry, Griffith University, Nathan, Queensland,
Australia.
Chen, H., and Dyke, P. P. G. (1998), Multivariate Time-Series Model for Suspended
Sediment Concentration, Continental Shelf Research, 18, 123-150.
Chin, D. A. (1995), A Scale Model of Multivariate Rainfall Time Series, Journal of
Hydrology, 168, 1-15.
Cochran, W. G. and Cox, G. M. (1957), Experimental Designs Second Edition, John Wiley,
New York, USA.
Crowder, M. J. and Hand, D. J. (1990), Analysis of Repeated Measures, Chapman &
Hall, New York, USA.
Crucianu, M., Bon, R. and Asselin de Beauville, J. (2001), Bayesian Learning For
Recurrent Neural Networks, Neurocomputing, 36, 235-242.
Diamandis, P. F., Georgoutsos, D. A., and Kouretas, G. P. (2000), The Monetary Model
in the Presence of I(2) Components: Long-run Relationships, Short-run Dynamics
and Forecasting of the Greek Drachma, Journal of International Money and
Finance, 19, 917-941.
Felmingham, B., Qing, Z., and Healy, T. (2000), The Interdependence of Australian and
Foreign Real Interest Rates, Economic Record, 76, 163-171.
Franses, P. H. (1998), Time Series Models for Business and Economic Forecasting,
Cambridge University Press, United Kingdom.
Green, A. G. and Sparks, G. R. (1999), Population Growth and the Dynamics of
Canadian Development: A Multivariate Time Series Approach, Explorations in
Economic History, 36, 56-71.
Guimarães, G., Peter, J. H., Penzel, T., and Ultsch, A. (2001), A Method for Automated
Temporal Knowledge Acquisition Applied to Sleep-Related Breathing Disorders,
Artificial Intelligence in Medicine, 23, 211-237.
Jensen, G. F. (2001), The Invention of Television as a Cause of Homicide: The
Reification of a Spurious Relationship, Homicide Studies, 5, 114-130.
Johnson, R. A. and Wichern, D. W. (1982), Applied Multivariate Statistical Analysis,
Prentice-Hall, New Jersey, USA.
Keselman, H. J., Algina, J., Kowalchuk, R. K. and Wolfinger, R. D. (1999), A
Comparison of Recent Approaches to the Analysis of Repeated Measurements,
British Journal of Mathematical and Statistical Psychology, 52, 63-78.
Khalil, M., Panu, U. S. and Lennox, W. C. (2001), Groups and Neural Networks Based
Streamflow Data Infilling Procedures, Journal of Hydrology, 241, 153-176.
Kulkarni, D. R., and Parikh, J. C. (2000), Multivariate Time Series Modeling In A
Connectionist Approach, International Journal of Modern Physics, 11, 159-173.
Li, Z., and Kafatos, M. (2000), Interannual Variability of Vegetation in the United States
and Its Relation to El Niño/Southern Oscillation, Remote Sensing of Environment,
71, 239-247.
Lu, S., Lu, H. and Kolarik, W. J. (2001), Multivariate Performance Reliability Prediction
in Real-Time, Reliability Engineering and System Safety, 72, 39-45.
Maharaj, E. A. (1999), Comparison and Classification of Stationary Multivariate Time
Series, Pattern Recognition, 32, 1129-1138.
Makridakis, S., Wheelwright, S. C., and Hyndman, R. J. (1998), Forecasting Methods
and Applications Third Edition, John Wiley & Sons, New York, USA.
Mann, P. M. (1998), Introductory Statistics Third Edition, John Wiley & Sons, New
York, New York, USA.
McCleary, R., and Hay, R. A. Jnr. (1980), Applied Time Series Analysis For The Social
Sciences, Sage Publications, Beverly Hills, California, USA.
Microsoft Corporation (2001), Microsoft Excel 2002, Microsoft Corporation, USA.
Nemec, A. F. L. (1996), Analysis of Repeated Measures and Time Series: An
Introduction with Forestry Examples (Biometrics Information Handbook No. 6),
Ministry of Forests Research Program, British Columbia, Canada.
Nicholson, M., Fryer, R., and Maxwell, D. (1998), Multivariate Trends in Phytoplankton
Species Groups in the Southern North Sea, ICES Journal of Marine Science, 55,
581-586.
Ørstavik, S., Carretero-González, R., and Stark, J. (2000), Estimation of Intensive
Quantities in Spatio-Temporal Systems From Time-Series, Physica D, 147, 204-220.
Paluš, M. (1996), Detecting Nonlinearity in Multivariate Time Series, Physics Letters A,
213, 138-147.
Pech, N., Samba, A., Drapeau, L., Sabatier, R., and Laloë, F. (2001), Fitting a Model of
Flexible Multifleet-Multispecies Fisheries to Senegalese Artisanal Fishery Data,
Aquatic Living Resources, 14, 81-98.
Peiris, D. R., and McNicol, J. W. (1996), Modelling Daily Weather with Multivariate
Time Series, Agricultural and Forest Meteorology, 79, 219-231.
Perry, D. A. (1998), The Scientific Basis of Forestry, Annual Review of
Ecology and Systematics, 29, 435-466.
Pynnönen, S. (2001), Multivariate Time Series, Professor of Statistics, University of
Vaasa.
Rao, P. V. (1998), Statistical Research Methods in the Life Sciences, Brooks/Cole
Publishing Company, Pacific Grove, California, USA.
Reick, C. H., and Page, B. (2000), Time Series Prediction by Multivariate Next
Neighbour Methods with Application to Zooplankton Forecasts, Mathematics and
Computers in Simulation, 52, 289-310.
Repucci, M. A., Schiff, N. D., and Victor, J. D. (2001), General Strategy for Hierarchical
Decomposition of Multivariate Time Series: Implications for Temporal Lobe
Seizures, Annals of Biomedical Engineering, 29, 1135-1149.
Rodó, X., Giralt, S., Burjachs, F., Comín, F. A., Tenorio, R. G., and Julià, R. (2002),
High-Resolution Saline Lake Sediments As Enhanced Tools For Relating Proxy
Paleolake Records to Recent Climatic Data Series, Sedimentary Geology, 148, 203-220.
SAS Institute Inc (1999), The SAS System for Windows Version 8.00 Edition, SAS
Institute Inc., Cary, NC, USA.
Smith, T. E. and Bubb, K. A. (2000), The Effects of Mechanical Harvesting Operations
on Plantation Productivity in QDPIF Hoop Pine Plantations, Part A: Impacts to
Soil Physical Properties, Queensland Forestry Research Institute: Agency for Food
and Fibre Sciences, Gympie, Queensland, Australia.
SPSS (1999), SPSS Base 10 Application Guide, SPSS Inc., Chicago, USA.
Stărică, C. (1999), Multivariate Extremes for Models with Constant Conditional
Correlations, Journal of Empirical Finance, 6, 515-553.
Stergiou, K. I., Christou, E. D., and Petrakis, G. (1997), Modelling and Forecasting
Monthly Fisheries Catches: Comparison of Regression, Univariate and Multivariate
Time Series Methods, Fisheries Research, 29, 55-95.
Swift, S. and Liu, X. (2002), Predicting Glaucomatous Visual Field Deterioration Through
Short Multivariate Time Series Modelling, Artificial Intelligence in Medicine, 24,
5-24.
Swift, S., Tucker, A., Martin, N. and Liu, X. (2001), Grouping Multivariate Time Series
Variables: Applications to Chemical Process and Visual Field Data, Knowledge-Based
Systems, 14, 147-154.
Telenius, B. and Verwijst, T. (1995), The Influence of Allometric Variation, Vertical
Biomass Distribution and Sampling Procedure On Biomass Estimates In
Commercial Short-Rotation Forests, Bioresource Technology, 51, 247-253.
Van Dongen, G. and Geuens, L. (1998), Multivariate Time Series Analysis For Design
and Operation of a Biological Wastewater Treatment Plant, Water Research, 32,
691-700.
Wilson, G. T., Reale, M., and Morton, A. S. (2001), Developments in Multivariate Time
Series Modeling, University of Canterbury, Christchurch, New Zealand, Report
Number: VCDMS2001/1.
Wold, S., Sjöström, M. and Eriksson, L. (2001), PLS-Regression: A Basic Tool of
Chemometrics, Chemometrics and Intelligent Laboratory Systems, 58, 109-130.
Zar, J. H. (1999), Biostatistical Analysis Fourth Edition, Prentice-Hall International
Inc., New Jersey, USA.
STATISTICAL ANALYSES OF MULTIVARIATE
TIME SERIES DATA
WITH APPLICATION TO COMPACTING
EFFECTS ON SOIL CHEMICAL AND
BIOLOGICAL PROPERTIES IN FORESTRY
VOLUME TWO
By Stuart Fenech BSc (AES)

Australian School of Environmental Studies
Faculty of Environmental Science
GRIFFITH UNIVERSITY
BRISBANE
This dissertation is submitted in partial fulfilment of the requirements of the degree of
Bachelor of Science with Honours in Australian Environmental Studies.
October 2002
APPENDIX A SAS EXAMPLES INPUT

Example 1 - Exploratory Data Analysis
data EDA;
input mayRain;
datalines;
81
89
124.6
8
162.6
229.5
68.1
85.05
51.5
64.35
41.4
582.55
142.35
153.9
94.95
57
40.75
50
run;
proc univariate;
run;
Example 2 - Autocorrelation and Autocovariance

/* Reading in the data */
data rainy;
input mon $ year rain;
datalines;
Jan 1985 76.2
Feb 1985 94.8
Mar 1985 224
... [A LOT MORE DATA]
Oct 2001 56.9
Nov 2001 153
Dec 2001 102
run;
/* Using proc ARIMA for autocovariance and autocorrelation */
proc arima;
identify VAR=rain;
run;
Example 3 - Rainfall Correlation Functions

data rainy;
input mon $ year rain;
datalines;
Jan 1985 76.2
Feb 1985 94.8
Mar 1985 224
Apr 1985 119
...
Oct 2001 56.9
Nov 2001 153
Dec 2001 102
run;
/* Using proc ARIMA for autocorrelation function, etc. */
proc arima;
identify VAR=rain;
run;
Example 4 - Rainfall Cross Correlation Function

data rainy;
input mon $ year rain days;
datalines;
Jan 1985 76.2 4
Feb 1985 94.8 9
Mar 1985 224 14
...
Sep 2001 12 4
Oct 2001 56.9 6
Nov 2001 153 11
Dec 2001 102 7
run;
/* Using proc ARIMA for cross correlation */
proc arima;
identify VAR=rain CROSSCOR=days;
run;
Example 7 - Linear Regression

/* Data set rainy with variables rain and days, as read in Example 4 */
data rainy;
input mon $ year rain days;
datalines;
Jan 1985 76.2 4
Feb 1985 94.8 9
Mar 1985 224 14
...
Sep 2001 12 4
Oct 2001 56.9 6
Nov 2001 153 11
Dec 2001 102 7
run;
/* Using proc REG for regression */
proc reg;
model rain = days /* Additional variables if needed */;
run;
/* Using proc GLM for regression */
proc glm;
model rain = days /* Additional variables if needed */;
run;
Example 8 - Rainfall ARIMA Model

data rainy;
input mon $ year rain;
datalines;
Jan 1985 76.2
Feb 1985 94.8
Mar 1985 224
Apr 1985 119
...
Oct 2001 56.9
Nov 2001 153
Dec 2001 102
run;
/* Initial testing for stationarity */
proc arima;
identify VAR=rain STATIONARITY=(DICKEY=(0,1,5,10));
run;
/* Running an actual ARIMA model */
proc arima;
identify VAR=rain STATIONARITY=(DICKEY=(0,1,5,10));
estimate P=(1)(12) PLOT;
run;
Example 9 - Rainfall Multivariate ARIMA Model

/* Data set rainy with variables rain and days, as read in Example 4 */
data rainy;
input mon $ year rain days;
datalines;
Jan 1985 76.2 4
Feb 1985 94.8 9
Mar 1985 224 14
...
Sep 2001 12 4
Oct 2001 56.9 6
Nov 2001 153 11
Dec 2001 102 7
run;
proc arima; /* Testing lags 0 to 12 of days */
identify VAR=rain CROSSCOR=days STATIONARITY=(DICKEY=(0,1,5,10));
estimate INPUT=((0,1,2,3,4,5,6,7,8,9,10,11,12) days) PLOT;
run;
proc arima; /* Testing only lag 0 of days */
/* ESTIMATE needs a preceding IDENTIFY listing days in CROSSCOR= */
identify VAR=rain CROSSCOR=days;
estimate INPUT=((0) days) PLOT;
run;
proc arima; /* Testing lag 0 of days and lags 1 and 12 of rain */
identify VAR=rain CROSSCOR=days;
estimate INPUT=(0 days) P=(1)(12) PLOT;
run;
APPENDIX B SAS EXAMPLES OUTPUT

Example 3 - Rainfall Correlation Functions
Note that in the following SAS output the scale digits 1, 2, etc. represent the first
decimal place. The left hand side of each diagram shows negative correlations and the
right hand side positive correlations; for example, the 7 on the left hand side of each
diagram represents -0.7. Full stops mark two standard errors either side of zero. The
asterisks are (rather crude) representations of the correlation at each lag, each asterisk
representing 0.05.
Autocorrelations
Lag  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1     Std Error
  0   |                    |********************|   0
  1   |                 .  |*****               |   0.070014
  2   |                 .  |* .                 |   0.073611
  3   |                 .  |  .                 |   0.073858
  4   |                 . *|  .                 |   0.073867
  5   |                 .**|  .                 |   0.073925
  6   |                 ***|  .                 |   0.074552
  7   |                ****|  .                 |   0.075783
  8   |                 . *|  .                 |   0.077762
  9   |                 .  |  .                 |   0.077823
 10   |                 .  |* .                 |   0.077837
 11   |                 .  |**.                 |   0.077937
 12   |                 .  |*****               |   0.078683
 13   |                 .  |**.                 |   0.081931
 14   |                 .  |  .                 |   0.082838
 15   |                 .  |  .                 |   0.082839
 16   |                 ***|  .                 |   0.082840
 17   |                 ***|  .                 |   0.084619
 18   |                ****|  .                 |   0.086174
 19   |                .***|   .                |   0.087902
 20   |                .  *|   .                |   0.088940
 21   |                .   |   .                |   0.089106
 22   |                .   |** .                |   0.089109
 23   |                .   |***.                |   0.089920
 24   |                .   |***.                |   0.091392
"." marks two standard errors
Inverse Autocorrelations
Lag  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1  Correlation
  1   |                 ***|  .                 |   -0.14819
  2   |                 .  |  .                 |    0.01785
  3   |                 .  |  .                 |    0.00192
  4   |                 . *|  .                 |   -0.04323
  5   |                 . *|  .                 |   -0.02746
  6   |                 . *|  .                 |   -0.04143
  7   |                 .  |**.                 |    0.12140
  8   |                 . *|  .                 |   -0.06345
  9   |                 .  |  .                 |    0.01748
 10   |                 .  |* .                 |    0.03610
 11   |                 .  |* .                 |    0.02650
 12   |                 ***|  .                 |   -0.13184
 13   |                 . *|  .                 |   -0.03791
 14   |                 .  |**.                 |    0.07625
 15   |                 .**|  .                 |   -0.08401
 16   |                 .  |**.                 |    0.12260
 17   |                 .  |* .                 |    0.02838
 18   |                 .  |* .                 |    0.06193
 19   |                 .  |* .                 |    0.02578
 20   |                 .  |  .                 |    0.00497
 21   |                 .  |* .                 |    0.03667
 22   |                 .**|  .                 |   -0.08051
 23   |                 . *|  .                 |   -0.03179
 24   |                 . *|  .                 |   -0.02682
Partial Autocorrelations
Lag  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1  Correlation
  1   |                 .  |*****               |    0.22955
  2   |                 .  |  .                 |    0.00878
  3   |                 .  |  .                 |   -0.00464
  4   |                 . *|  .                 |   -0.03353
  5   |                 .**|  .                 |   -0.08803
  6   |                 .**|  .                 |   -0.10035
  7   |                 ***|  .                 |   -0.12725
  8   |                 .  |* .                 |    0.04279
  9   |                 .  |  .                 |    0.02107
 10   |                 .  |  .                 |    0.02495
 11   |                 .  |**.                 |    0.07991
 12   |                 .  |***                 |    0.17402
 13   |                 .  |  .                 |    0.01375
 14   |                 . *|  .                 |   -0.06698
 15   |                 .  |  .                 |    0.01440
 16   |                 ***|  .                 |   -0.17265
 17   |                 . *|  .                 |   -0.06877
 18   |                 .**|  .                 |   -0.07572
 19   |                 . *|  .                 |   -0.02888
 20   |                 .  |  .                 |   -0.01077
 21   |                 .  |  .                 |   -0.00723
 22   |                 .  |**.                 |    0.10487
 23   |                 .  |* .                 |    0.04594
 24   |                 .  |* .                 |    0.03017
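The Std Error column in the autocorrelation output above appears to follow Bartlett's approximation, under which the standard error of the lag k autocorrelation is sqrt((1 + 2(r_1^2 + ... + r_(k-1)^2))/n). The pure-Python sketch below assumes that formula (it is not SAS code). With n = 204 and r_1 = 0.22955, the lag 1 partial autocorrelation (which always equals the lag 1 autocorrelation), it reproduces the reported 0.070014 and 0.073611 for lags 1 and 2. The sample autocorrelation function uses the conventional divisor n.

```python
from math import sqrt


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag using the usual divisor n."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n / c0
            for k in range(1, max_lag + 1)]


def bartlett_se(acf, n):
    """Bartlett standard errors for lags 1..len(acf):
    SE(r_k) = sqrt((1 + 2 * (r_1^2 + ... + r_(k-1)^2)) / n)."""
    ses, running = [], 0.0
    for r in acf:
        ses.append(sqrt((1 + 2 * running) / n))
        running += r * r
    return ses


# n = 204 monthly values; r_1 = 0.22955 is taken from the partial
# autocorrelation output (the lag 1 PACF equals the lag 1 ACF).
# The second entry is a placeholder: r_2 does not affect SE(r_1) or SE(r_2).
se = bartlett_se([0.22955, 0.0], 204)
print([round(s, 6) for s in se])   # [0.070014, 0.073611]
```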
Example 4 - Cross Correlation Function (CCF)

Comments are as for Example 3 on autocorrelation functions.

Correlation of rain and days
Variance of input = 15.10861
Number of Observations = 204

Crosscorrelations
Lag  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1     Covariance       Corr
-12   |                 .  |*****               |    87.698391    0.26871
-11   |                 .  |****                |    62.931614    0.19282
-10   |                 .  |**.                 |    31.158648    0.09547
 -9   |                 . *|  .                 |   -16.809443    -.05150
 -8   |                 .  |  .                 |    -3.484089    -.01068
 -7   |                ****|  .                 |   -68.982688    -.21136
 -6   |                ****|  .                 |   -71.005827    -.21756
 -5   |                 ***|  .                 |   -44.118362    -.13518
 -4   |                 ***|  .                 |   -53.835095    -.16495
 -3   |                 . *|  .                 |   -11.265130    -.03452
 -2   |                 .  |**.                 |    31.202538    0.09561
 -1   |                 .  |*****               |    74.625334    0.22865
  0   |                 .  |************        |      194.433    0.59575
  1   |                 .  |*****               |    74.621641    0.22864
  2   |                 .  |**.                 |    33.891011    0.10384
  3   |                 .  |* .                 |    10.905942    0.03342
  4   |                 . *|  .                 |   -15.958500    -.04890
  5   |                 ***|  .                 |   -56.899180    -.17434
  6   |              ******|  .                 |   -96.559925    -.29586
  7   |               *****|  .                 |   -74.264974    -.22755
  8   |                 .**|  .                 |   -35.327098    -.10824
  9   |                 .**|  .                 |   -28.199919    -.08641
 10   |                 .  |* .                 |    16.475877    0.05048
 11   |                 .  |*****               |    74.464415    0.22816
 12   |                 .  |******              |      101.732    0.31171
Example 8 - Linear Regression

The REG Procedure
Model: MODEL1
Dependent Variable: rain

Analysis of Variance
                          Sum of         Mean
Source            DF     Squares       Square    F Value    Pr > F
Model              1      510441       510441     111.14    <.0001
Error            202      927767   4592.90592
Corrected Total  203     1438208

Root MSE          67.77098    R-Square    0.3549
Dependent Mean    88.29181    Adj R-Sq    0.3517
Coeff Var         76.75794

Parameter Estimates
                   Parameter    Standard
Variable    DF      Estimate       Error    t Value    Pr > |t|
Intercept    1     -17.18371    11.07325      -1.55      0.1223
days         1      12.86902     1.22072      10.54      <.0001

The GLM Procedure
Dependent Variable: rain

                           Sum of
Source            DF       Squares    Mean Square    F Value    Pr > F
Model              1    510441.398     510441.398     111.14    <.0001
Error            202    927766.995       4592.906
Corrected Total  203   1438208.393

R-Square    Coeff Var    Root MSE    rain Mean
0.354915     76.75794    67.77098     88.29181

Source    DF      Type I SS    Mean Square    F Value    Pr > F
days       1    510441.3983    510441.3983     111.14    <.0001

Source    DF    Type III SS    Mean Square    F Value    Pr > F
days       1    510441.3983    510441.3983     111.14    <.0001

                               Standard
Parameter        Estimate         Error    t Value    Pr > |t|
Intercept    -17.18370784   11.07324557      -1.55      0.1223
days          12.86902297    1.22072097      10.54      <.0001
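The summary statistics in this output follow from the analysis of variance table, and the short Python sketch below recomputes them from the printed sums of squares (standard formulas, not a SAS routine). It also checks that the slope for days equals the lag 0 cross-covariance of rain and days (194.433) divided by the variance of days (15.10861), both from the Example 4 output, as expected for simple linear regression.

```python
from math import sqrt

# Quantities copied from the Example 8 (linear regression) output above.
ss_model, ss_error, ss_total = 510441.398, 927766.995, 1438208.393
df_model, df_error = 1, 202
dependent_mean = 88.29181

ms_error = ss_error / df_error                # Mean Square for Error
f_value = (ss_model / df_model) / ms_error    # F Value
r_square = ss_model / ss_total                # R-Square
root_mse = sqrt(ms_error)                     # Root MSE
coeff_var = 100 * root_mse / dependent_mean   # Coeff Var (%)

# OLS slope = cross-covariance / variance of the regressor; both figures are
# the lag 0 covariance and input variance printed in the Example 4 output.
slope_days = 194.433 / 15.10861

for name, value in [("MS error", ms_error), ("F", f_value),
                    ("R-square", r_square), ("Root MSE", root_mse),
                    ("Coeff Var", coeff_var), ("slope for days", slope_days)]:
    print(f"{name}: {value:.5f}")
```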
194
Example 9 Rainfall ARIMA Model

Augmented Dickey-Fuller Unit Root Tests
Type
Zero Mean
Single Mean
Trend
Lags
0
1
3
5
0
1
3
5
0
1
3
5
Rho
-74.3640
-44.1021
-21.7670
-15.2863
-156.394
-152.853
-173.290
-908.962
-156.416
-152.913
-173.197
-909.016
Pr < Rho
<.0001
<.0001
0.0008
0.0060
0.0001
0.0001
0.0001
0.0001
0.0001
0.0001
0.0001
0.0001
Tau
-6.72
-4.66
-3.14
-2.54
-11.22
-8.67
-6.76
-6.56
-11.20
-8.65
-6.74
-6.54
Pr < Tau
<.0001
<.0001
0.0018
0.0110
<.0001
<.0001
<.0001
<.0001
<.0001
<.0001
<.0001
<.0001
Conditional Least Squares Estimation
Parameter
Estimate
MU
AR1,1
AR2,1
88.46947
0.19926
0.22670
Standard
Error
t Value
8.90639
0.06926
0.07276
Constant Estimate
Variance Estimate
Std Error Estimate
9.93
2.88
3.12
Approx
Pr > |t|
<.0001
0.0044
0.0021
54.78186
6470.93
80.44209
The ARIMA Procedure

AIC
2372.02
SBC
2381.974
Number of Residuals
204
* AIC and SBC do not include log determinant.
Correlations of Parameter Estimates

Parameter
MU
AR1,1
AR2,1
MU
AR1,1
AR2,1
1.000
0.001
0.003
0.001
1.000
-0.063
0.003
-0.063
1.000
195
Lag
0
1
12
Autocorrelation Plot of Residuals

Lag
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Correlation
1.00000
-.00418
0.02090
-.00335
0.02526
-.01751
-.05223
-.13569
0.00856
0.01327
-.00798
0.03288
-.01308
0.05938
-.02232
0.01769
-.15192
-.07228
-.10248
-.07657
-.01122
-.02114
0.09650
0.10554
0.04805
-1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|********************|
. | .
|
. | .
|
. | .
|
. |* .
|
. | .
|
. *| .
|
***| .
|
. | .
|
. | .
|
. | .
|
. |* .
|
. | .
|
. |* .
|
. | .
|
. | .
|
***| .
|
. *| .
|
.**| .
|
.**| .
|
. | .
|
. | .
|
. |**.
|
. |**.
|
. |* .
|
Inverse Autocorrelations
Lag
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
Correlation
0.05504
0.03323
-0.00812
-0.05064
-0.01044
-0.00149
0.11853
-0.02505
0.02746
0.02628
-0.00179
0.03220
-0.04304
0.03290
-0.03709
0.11700
0.06798
0.09580
0.08192
-0.00424
0.01902
-0.09172
-1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
. |* .
. |* .
. | .
. *| .
. | .
. | .
. |**.
. *| .
. |* .
. |* .
. | .
. |* .
. *| .
. |* .
. *| .
. |**.
. |* .
. |**.
. |**.
. | .
. | .
.**| .
196
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
23
24
-0.06844
-0.02149
|
|
. *|
. |
.
.
|
|
Partial Autocorrelations
Lag   Correlation
1     -0.00418
2     0.02089
3     -0.00318
4     0.02481
5     -0.01720
6     -0.05348
7     -0.13582
8     0.00804
9     0.02003
10    -0.00602
11    0.03808
12    -0.02016
13    0.04356
14    -0.03838
15    0.01749
16    -0.14918
17    -0.08034
18    -0.09348
19    -0.08340
20    0.00681
21    -0.02820
22    0.09420
23    0.07241
24    0.02325
Model for variable rain

Estimated Mean: 88.46947
Autoregressive Factors
Factor 1: 1 - 0.19926 B**(1)
Factor 2: 1 - 0.2267 B**(12)
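The fitted rain model multiplies a lag-1 factor and a lag-12 (seasonal) factor. Expanding (1 - 0.19926 B)(1 - 0.2267 B**12)(y_t - mu) = e_t gives y_t = c + 0.19926 y_(t-1) + 0.2267 y_(t-12) - 0.19926 * 0.2267 y_(t-13) + e_t, where the intercept c = mu(1 - 0.19926)(1 - 0.2267) should reproduce the Constant Estimate of about 54.78, and the square root of the Variance Estimate should give the Std Error Estimate. The minimal Python sketch below checks both; the names `constant` and `one_step_forecast` are illustrative, not part of the SAS output.

```python
# Multiplicative AR model for rain: (1 - phi_1 B)(1 - phi_12 B**12)(y_t - mu) = e_t
mu = 88.46947       # Estimated Mean (MU)
phi_1 = 0.19926     # Factor 1: 1 - 0.19926 B**(1)
phi_12 = 0.2267     # Factor 2: 1 - 0.2267 B**(12)
variance = 6470.93  # Variance Estimate

# Intercept of the expanded model: mu times both factors evaluated at B = 1.
constant = mu * (1 - phi_1) * (1 - phi_12)
# Std Error Estimate is the square root of the Variance Estimate.
std_error = variance ** 0.5


def one_step_forecast(y, t):
    """Forecast y[t] from y[t-1], y[t-12] and y[t-13] (requires t >= 13)."""
    return (constant
            + phi_1 * y[t - 1]
            + phi_12 * y[t - 12]
            - phi_1 * phi_12 * y[t - 13])


print(f"constant = {constant:.3f}, std error = {std_error:.3f}")
```

A series sitting exactly at the mean forecasts back to the mean, which is a convenient sanity check that the cross term -phi_1 * phi_12 * y[t-13] has the right sign.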
Example 10 Multivariate Rainfall ARIMA model

(Initial model)
Parameter   Estimate    Standard Error   Approx Pr > |t|   Lag   Variable
MU          41.31748    34.57051         0.2336            0     rain
NUM1        12.20554    1.44052          <.0001            0     days
NUM1,1      -0.01414    1.48190          0.9924            1     days
NUM1,2      0.81242     1.48888          0.5860            2     days
NUM1,3      -0.02988    1.48894          0.9840            3     days
NUM1,4      -0.84325    1.48745          0.5715            4     days
NUM1,5      1.54941     1.43216          0.2807            5     days
NUM1,6      1.31813     1.47719          0.3734            6     days
NUM1,7      1.63912     1.47407          0.2676            7     days
NUM1,8      0.0088404   1.46303          0.9952            8     days
NUM1,9      1.92493     1.45910          0.1887            9     days
NUM1,10     0.20047     1.42377          0.8882            10    days
(Later model)
Parameter   Estimate    Standard Error   Approx Pr > |t|   Lag   Variable
MU          -17.18371   11.07325         0.1223            0     rain
NUM1        12.86902    1.22072          <.0001            0     days

Constant Estimate     -17.1837
Variance Estimate     4592.906
Std Error Estimate    67.77098
AIC                   2301.1
SBC                   2307.736
Number of Residuals   204
* AIC and SBC do not include log determinant.
APPENDIX C EXPERIMENT TIME PERIODS

Month   Start Date
1       3 February 2000
2       2 March 2000
3       30 March 2000
4       27 April 2000
5       25 May 2000
6       22 June 2000
7       20 July 2000
8       17 August 2000
9       14 September 2000
10      12 October 2000
11      9 November 2000
12      7 December 2000
13      4 January 2001
14      1 February 2001
15      1 March 2001
16      29 March 2001
17      26 April 2001
18      24 May 2001
19      21 June 2001
20*     19 July 2001

Season
1: Autumn 2000
2: Winter 2000
3: Spring 2000
4: Summer 2000/2001
5: Autumn 2001

* No measurements were taken for month 20; it is included only to show the end of month 19.
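Each experimental "month" is a 28-day period, so every start date in the table follows from the first one by adding a multiple of 28 days; the leap day in February 2000 is handled automatically by date arithmetic. A short Python sketch that regenerates the table (`period_start` is an illustrative name):

```python
from datetime import date, timedelta

FIRST_START = date(2000, 2, 3)    # start of month 1
PERIOD = timedelta(days=28)       # each experimental "month" is 28 days


def period_start(month):
    """Start date of a 1-based 28-day period."""
    return FIRST_START + (month - 1) * PERIOD


starts = {month: period_start(month) for month in range(1, 21)}
for month, start in starts.items():
    # Day printed without a leading zero, to match the table layout.
    print(month, f"{start.day} {start:%B %Y}")
```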
APPENDIX D VARIABLE LIST

This appendix summarises the factors and variables provided for analysis. Some easily derived (and irrelevant) variables have been omitted for simplicity.
* Indicates a variable that is dependent on, or strongly linked with, a variable listed earlier.
CHEMICAL VARIABLES
MONTH
Values: 1 to 19.
Interpretation: Month (28-day period).
BLOCK
Values: 1, 2, 3.
Interpretation: Replicate (block). Blocks differ in slope position: 1 is upper slope, 2 is mid slope, 3 is lower slope.
COMPACT
Values: 1, 2, 3.
Interpretation: Level of compaction: none (1), one (2) or sixteen (3) compactions.
CULT
Values: 1, 2.
Interpretation: Whether the sample has been cultivated: not cultivated (1) or cultivated with a disc plough (2).
DEPTH
Values: 1, 2.
Interpretation: Soil depth analysed within the sample taken: 0-10 cm (1) or 10-20 cm (2).
GRAV
Units: Percentage.
Interpretation: Gravimetric soil moisture (percent of soil in sample).
MOIST *
Units: Percentage.
Interpretation: Soil moisture content (percent of water in sample).
Note: Closely related to GRAV above.
RAINFALL
Units: mm per month.
Interpretation: Monthly rainfall measured on site.
MAXTEMP
Units: Degrees Celsius.
Interpretation: Mean maximum temperature during a month.
MINTEMP *
Interpretation: Mean minimum temperature during a month.
Note: Very strong correlation (85.593%) with MAXTEMP.
TMPRANGE *
Interpretation: Mean temperature range during a month.
Note: 99.73% correlation with (MAXTEMP - MINTEMP); it is not exactly MAXTEMP - MINTEMP because of rounding error.
HANO2, HANO3, HANH4, HATOTN *
Units: kgN/ha.
Interpretation: Nitrite, nitrate, ammonium and total mineral nitrogen levels, respectively.
Note: The total is the sum of the previous three (subject to rounding error).
POTNO2, POTNO3, POTNH4, POTTOTN *
Units: kgN/ha.
Interpretation: Nitrite, nitrate, ammonium and total mineral nitrogen dynamics, respectively.
Note: The total is the sum of the previous three (subject to rounding error). Calculated from the capped core baseline.
LCHNO2, LCHNO3, LCHNH4, LCHTOTN *
Units: kgN/ha.
Interpretation: Nitrite, nitrate, ammonium and total mineral nitrogen leaching, respectively.
Note: The total is the sum of the previous three (subject to rounding error). Calculated from capped uncapped.
CUPOTNO2, CUPOTNO3, CUPOTNH4, CUPOTTN [All *]
Units: kgN/ha.
Interpretation: Cumulative nitrite, nitrate, ammonium and total mineral nitrogen dynamics.
CULCHNO2, CULCHNO3, CULCHNH4, CULCHTN [All *]
Units: kgN/ha.
Interpretation: Cumulative nitrite, nitrate, ammonium and total mineral nitrogen leaching.
BIOLOGICAL VARIABLES
Note: All measurements were taken at the 0-10 cm soil depth.
BLOCK
Values: 1, 2, 3.
Interpretation: Replicate (block). Blocks differ in slope position: 1 is upper slope, 2 is mid slope, 3 is lower slope.
Note: Also in the chemical data set.
MONTH
Values: 1 to 14.
Interpretation: Month (28-day period).
Note: Corresponds to the first 14 months of the chemical data set's MONTH variable.
COMPACT
Values: 1, 2, 3.
Interpretation: Level of compaction: none (1), one (2) or sixteen (3) compactions.
CULT
Values: 1, 2.
Interpretation: Whether the sample has been cultivated: not cultivated (1) or cultivated with a disc plough (2).
MBN
Units: µg/g (micrograms per gram).
Interpretation: Microbial nitrogen level, referred to as MBN.
MBC
Units: µg/g (micrograms per gram).
Interpretation: Microbial carbon level, referred to as MBC.
MICROC:N *
Units: Ratio.
Interpretation: Ratio of microbial carbon to microbial nitrogen (MBC/MBN).
MBNFLUX *
Units: µg/g (micrograms per gram).
Interpretation: Microbial nitrogen flux. Available for 13 months only, because it is the difference in microbial nitrogen between successive months (14 monthly values give 13 differences).
MOIST
Units: Percentage.
Interpretation: Soil moisture (percent of soil as oven-dried weight).
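Several starred variables are simple functions of others: totals are sums of the three nitrogen forms, the CUPOT* and CULCH* variables are running totals, MICROC:N is MBC/MBN and MBNFLUX is a month-to-month difference. The sketch below expresses those relationships in Python with made-up numbers. The function names are hypothetical, and the sign convention for MBNFLUX (later month minus earlier month) is an assumption, since the list only says it is the difference between successive months.

```python
from itertools import accumulate


def total_mineral_n(no2, no3, nh4):
    """Total mineral N (e.g. HATOTN) as the sum of nitrite, nitrate and ammonium."""
    return no2 + no3 + nh4


def cumulative(series):
    """Running total, as for the CUPOT* and CULCH* variables."""
    return list(accumulate(series))


def mbn_flux(mbn):
    """Month-to-month change in MBN (assumed later minus earlier): 14 months -> 13 values."""
    return [later - earlier for earlier, later in zip(mbn, mbn[1:])]


def micro_c_to_n(mbc, mbn):
    """MICROC:N ratio, MBC / MBN, month by month."""
    return [c / n for c, n in zip(mbc, mbn)]


# Hypothetical values for illustration only (not thesis data).
mbn = [40.0, 42.5, 39.0, 45.0]
mbc = [400.0, 425.0, 390.0, 495.0]
print(total_mineral_n(0.5, 10.0, 3.0))
print(cumulative([1.0, 2.0, 3.0]))
print(mbn_flux(mbn))
print(micro_c_to_n(mbc, mbn))
```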
APPENDIX E NITRATE LEVELS

Degrees of freedom, test statistics, F values and p values have their usual interpretations; Ndf and Ddf are the numerator and denominator degrees of freedom. In the MANOVA tables, Wilk refers to Wilks' lambda, Pillai to Pillai's trace, HL to the Hotelling-Lawley trace and Roy to Roy's largest root. The number of asterisks (*) represents the level of significance: *** if p is less than 0.001, ** if p is less than 0.01 and * if p is less than 0.05.
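The significance convention above amounts to the following mapping from p value to symbol. A minimal sketch; the function name is illustrative, and the example p values come from the nitrate Season 1 MANOVA (the four Compaction tests, then Cultivation).

```python
def significance_stars(p):
    """Map a p value to the asterisk convention used in these appendices."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Nitrate Season 1 MANOVA: Compaction (Wilk, Pillai, HL, Roy), then Cultivation.
for p in (0.0061, 0.0361, 0.0044, 0.0005, 0.3903):
    print(p, significance_stars(p) or "(not significant)")
```

Note that the thresholds are strict inequalities, so a p value of exactly 0.05 receives no asterisk.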
[Figure: Nitrate Levels By Compaction, Cultivation, Depth 0-10 cm. Nitrate level against month (1 to 19) for six treatment series: 0, 1 or 16 compactions, each with no cultivation (None) or disc-plough cultivation (Plough).]
[Figure: Nitrate Levels By Compaction, Cultivation, Depth 0-10 cm (3MA Smoothed). The same six series after smoothing.]
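The "3MA Smoothed" panels are assumed here to use a three-term moving average. The sketch below uses a centred window, so the first and last months have no smoothed value; whether the thesis used a centred or a trailing window is not stated, so treat the window choice as an assumption. The function name is illustrative.

```python
def moving_average_3(values):
    """Centred three-term moving average; the two end points have no full window."""
    smoothed = [None] * len(values)
    for i in range(1, len(values) - 1):
        smoothed[i] = (values[i - 1] + values[i] + values[i + 1]) / 3
    return smoothed


# Hypothetical monthly values, not thesis data.
print(moving_average_3([3.0, 6.0, 9.0, 12.0, 30.0]))
```

Averaging three neighbouring months damps single-month spikes (the jump to 30.0 becomes 17.0 here) at the cost of losing the end points.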
Correlations
Lag Of:      0             1             2             3             4
Rainfall     0.071774977   0.390098182   0.225078152   0.217614909   0.117667997
Max. Temp    0.31109683    0.345335295   0.423550487   0.459377045   0.402983877
Min. Temp    0.211662798   0.351742539   0.279488789   0.169258885   0.09976742
Moisture     0.08446       0.11917       0.03844       -0.00111      -0.03071

Correlation p-values
Lag Of:      0             1             2             3             4
Rainfall     0.770289183   0.10951225    0.38508926    0.41815731    0.676205849
Max. Temp    0.194825683   0.1604459     0.090235437   0.073443253   0.136392668
Min. Temp    0.384355859   0.152314574   0.277292335   0.53087977    0.723518466
Moisture     0.119         0.032         0.503         0.9851        0.6154
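The correlation tables pair nitrate levels with each explanatory variable lagged by 0 to 4 months. The exact computation is not spelled out in this appendix; a plausible reading, assumed here, is a Pearson correlation between the response at month t and the explanatory variable at month t - k, tested with t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom. Each extra lag drops one pair, so n shrinks as the lag grows. A stdlib-only Python sketch under those assumptions (names and data are illustrative):

```python
import math


def lagged_correlation(response, driver, lag):
    """Pearson r between response[t] and driver[t - lag], plus its t statistic.

    Both series must be aligned month by month and have the same length.
    """
    y = response[lag:]
    x = driver[:len(driver) - lag]
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # Compare t with a t distribution on n - 2 degrees of freedom.
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return r, t, n


# Hypothetical monthly series (not thesis data).
nitrate = [12.0, 18.0, 25.0, 20.0, 30.0, 28.0, 35.0]
rain = [80.0, 120.0, 60.0, 150.0, 90.0, 160.0, 110.0]
for k in range(0, 5):
    r, t, n = lagged_correlation(nitrate, rain, k)
    print(f"lag {k}: r = {r:.3f}, t = {t:.2f}, n = {n}")
```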
[Figure: Cross-correlation of Nitrate Levels with lagged Rainfall, Max. Temp., Min. Temp. and Soil Moisture; four panels of correlation (-1 to 1) against lag.]
Overall Split Plot

Source Of Variation                  Df   F Value    p value    Sig
Compaction                           2    4.463444   0.041172   *
Cultivation                          1    2.234167   0.165860
Compaction × Cultivation             2    1.088696   0.373447
Block                                2    2.745249   0.112117
Season                               4    25.47999   0.000000   ***
Season × Compaction                  8    6.50085    0.000010   ***
Season × Cultivation                 4    0.838996   0.507291
Season × Compaction × Cultivation    8    1.780685   0.104355
Overall MANOVA
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     5     6     0.51692      1.12      0.438600
                           Pillai   5     6     0.48308      1.12      0.438600
                           HL       5     6     0.93454      1.12      0.438600
                           Roy      5     6     0.93454      1.12      0.438600
Compaction                 Wilk     10    12    0.11085      2.4       0.076200
                           Pillai   10    14    1.20432      2.12      0.096700
                           HL       10    6.7   5.17796      2.95      0.086500
                           Roy      5     7     4.55356      6.37      0.015300   *
Compaction × Cultivation   Wilk     10    12    0.39833      0.7       0.708700
                           Pillai   10    14    0.65686      0.68      0.723200
                           HL       10    6.7   1.37193      0.78      0.651400
                           Roy      5     7     1.26215      1.77      0.238200
Block                      Wilk     10    12    0.38669      0.73      0.686700
                           Pillai   10    14    0.72814      0.8       0.631000
                           HL       10    6.7   1.28912      0.73      0.682900
                           Roy      5     7     0.98881      1.38      0.335000
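All four MANOVA criteria are functions of the eigenvalues (lambda) of E^-1 H, where E and H are the error and hypothesis SSCP matrices: Wilks' lambda is the product of 1/(1 + lambda), Pillai's trace is the sum of lambda/(1 + lambda), the Hotelling-Lawley trace is the sum of lambda, and SAS reports Roy's largest root as the largest lambda. A one-degree-of-freedom effect such as Cultivation has a single nonzero eigenvalue, which is why Wilk + Pillai = 1 and HL = Roy in its rows, and why all four tests give the same F (lambda * Ddf / Ndf = 0.93454 * 6 / 5, about 1.12). The sketch checks these identities. For Compaction, the two eigenvalues are backed out from the reported Roy (4.55356) and HL (5.17796) values rather than taken from the original output; the function name is illustrative.

```python
import math


def manova_statistics(eigenvalues):
    """Wilks, Pillai, Hotelling-Lawley and Roy criteria from the eigenvalues of E^-1 H."""
    wilks = math.prod(1 / (1 + lam) for lam in eigenvalues)
    pillai = sum(lam / (1 + lam) for lam in eigenvalues)
    hotelling_lawley = sum(eigenvalues)
    roy = max(eigenvalues)  # SAS reports Roy's greatest root as the largest eigenvalue
    return {"Wilk": wilks, "Pillai": pillai, "HL": hotelling_lawley, "Roy": roy}


# Cultivation (1 df): a single eigenvalue, equal to the reported HL and Roy values.
cultivation = manova_statistics([0.93454])
# Exact F for a single-eigenvalue test: lambda * Ddf / Ndf.
f_cultivation = cultivation["HL"] * 6 / 5

# Compaction: eigenvalues backed out from Roy = 4.55356 and HL = 5.17796.
compaction = manova_statistics([4.55356, 5.17796 - 4.55356])

print({k: round(v, 5) for k, v in cultivation.items()}, round(f_cultivation, 2))
print({k: round(v, 5) for k, v in compaction.items()})
```

The Compaction output comes out at about 0.11085 for Wilk and 1.20432 for Pillai, matching the table, which supports reading the four rows as different summaries of the same two eigenvalues.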
Seasonal Split Plot

Season 1
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    16.27645    0.000717   ***
Cultivation                         1    1.5912711   0.235776
Compaction × Cultivation            2    0.91367     0.432080
Block                               2    0.4690602   0.638685
Month                               2    11.827422   0.000266   ***
Month × Compaction                  4    0.9947435   0.429507
Month × Cultivation                 2    2.4712513   0.105707
Month × Compaction × Cultivation    4    1.8914225   0.144662

Season 2
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    5.2468002   0.027664   *
Cultivation                         1    0.5459634   0.476957
Compaction × Cultivation            2    3.4642485   0.071930
Block                               2    3.3826475   0.075499
Month                               2    1.4820891   0.247223
Month × Compaction                  4    0.6083719   0.660507
Month × Cultivation                 2    0.3492567   0.708737
Month × Compaction × Cultivation    4    0.3892338   0.814212

Season 3
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    0.143626    0.867967
Cultivation                         1    0.0002949   0.986636
Compaction × Cultivation            2    1.3966129   0.291810
Block                               2    0.6181587   0.558316
Month                               2    6.7366834   0.004763   **
Month × Compaction                  4    0.5769341   0.682100
Month × Cultivation                 2    0.0604309   0.941502
Month × Compaction × Cultivation    4    0.7190796   0.587274

Season 4
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    0.9748616   0.410403
Cultivation                         1    3.8284691   0.078874
Compaction × Cultivation            2    0.5316026   0.603386
Block                               2    2.0401966   0.180687
Month                               2    4.6030903   0.020319   *
Month × Compaction                  4    0.5033742   0.733572
Month × Cultivation                 2    1.2328767   0.309260
Month × Compaction × Cultivation    4    0.744417    0.571232

Season 5
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    1.3006132   0.314728
Cultivation                         1    0.2153729   0.652527
Compaction × Cultivation            2    0.1115725   0.895524
Block                               2    1.425204    0.285375
Month                               2    1.7952657   0.187678
Month × Compaction                  4    0.5815466   0.678914
Month × Cultivation                 2    0.3404418   0.714836
Month × Compaction × Cultivation    4    2.1860018   0.101080
Seasonal MANOVA

Season 1
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.70075      1.14      0.390300
                           Pillai   3     8     0.29925      1.14      0.390300
                           HL       3     8     0.42704      1.14      0.390300
                           Roy      3     8     0.42704      1.14      0.390300
Compaction                 Wilk     6     16    0.1312       4.7       0.006100   **
                           Pillai   6     18    0.98649      2.92      0.036100   *
                           HL       6     9.1   5.72481      7.34      0.004400   **
                           Roy      3     9     5.56357      16.69     0.000500   ***
Compaction × Cultivation   Wilk     6     16    0.41762      1.46      0.253500
                           Pillai   6     18    0.65069      1.45      0.251700
                           HL       6     9.1   1.23098      1.58      0.257400
                           Roy      3     9     1.07946      3.24      0.074600
Block                      Wilk     6     16    0.58917      0.81      0.578900
                           Pillai   6     18    0.46305      0.9       0.513700
                           HL       6     9.1   0.60865      0.78      0.605600
                           Roy      3     9     0.36735      1.1       0.397600

Season 2
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.79306      0.7       0.580200
                           Pillai   3     8     0.20694      0.7       0.580200
                           HL       3     8     0.26093      0.7       0.580200
                           Roy      3     8     0.26093      0.7       0.580200
Compaction                 Wilk     6     16    0.22955      2.9       0.041500   *
                           Pillai   6     18    1.03212      3.2       0.025700   *
                           HL       6     9.1   2.21635      2.84      0.076700
                           Roy      3     9     1.40505      4.22      0.040500   *
Compaction × Cultivation   Wilk     6     16    0.41398      1.48      0.247600
                           Pillai   6     18    0.68208      1.55      0.218000
                           HL       6     9.1   1.18356      1.52      0.274400
                           Roy      3     9     0.93553      2.81      0.100400
Block                      Wilk     6     16    0.49938      1.11      0.400800
                           Pillai   6     18    0.50771      1.02      0.443300
                           HL       6     9.1   0.98829      1.27      0.358900
                           Roy      3     9     0.97371      2.92      0.092700

Season 3
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.98768      0.03      0.991200
                           Pillai   3     8     0.01232      0.03      0.991200
                           HL       3     8     0.01247      0.03      0.991200
                           Roy      3     8     0.01247      0.03      0.991200
Compaction                 Wilk     6     16    0.74211      0.43      0.849000
                           Pillai   6     18    0.26932      0.47      0.823800
                           HL       6     9.1   0.33211      0.43      0.844700
                           Roy      3     9     0.27639      0.83      0.510500
Compaction × Cultivation   Wilk     6     16    0.57897      0.84      0.558500
                           Pillai   6     18    0.47451      0.93      0.495300
                           HL       6     9.1   0.63483      0.81      0.584700
                           Roy      3     9     0.40897      1.23      0.355400
Block                      Wilk     6     16    0.64104      0.66      0.679800
                           Pillai   6     18    0.37778      0.7       0.654300
                           HL       6     9.1   0.5306       0.68      0.670600
                           Roy      3     9     0.46782      1.4       0.304100

Season 4
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.49931      2.67      0.118300
                           Pillai   3     8     0.50069      2.67      0.118300
                           HL       3     8     1.00278      2.67      0.118300
                           Roy      3     8     1.00278      2.67      0.118300
Compaction                 Wilk     6     16    0.76853      0.38      0.884100
                           Pillai   6     18    0.24142      0.41      0.861500
                           HL       6     9.1   0.28824      0.37      0.881000
                           Roy      3     9     0.23259      0.7       0.576500
Compaction × Cultivation   Wilk     6     16    0.79247      0.33      0.912000
                           Pillai   6     18    0.21449      0.36      0.894400
                           HL       6     9.1   0.2531       0.32      0.908100
                           Roy      3     9     0.21157      0.63      0.611100
Block                      Wilk     6     16    0.51414      1.05      0.429500
                           Pillai   6     18    0.56152      1.17      0.364400
                           HL       6     9.1   0.79783      1.02      0.467700
                           Roy      3     9     0.50832      1.52      0.273800

Season 5
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.90827      0.27      0.845900
                           Pillai   3     8     0.09173      0.27      0.845900
                           HL       3     8     0.101        0.27      0.845900
                           Roy      3     8     0.101        0.27      0.845900
Compaction                 Wilk     6     16    0.65942      0.62      0.713900
                           Pillai   6     18    0.36137      0.66      0.681300
                           HL       6     9.1   0.48494      0.62      0.710200
                           Roy      3     9     0.40756      1.22      0.356700
Compaction × Cultivation   Wilk     6     16    0.4072       1.51      0.236700
                           Pillai   6     18    0.68112      1.55      0.218900
                           HL       6     9.1   1.2389       1.59      0.254600
                           Roy      3     9     1.02789      3.08      0.082800
Block                      Wilk     6     16    0.53227      0.99      0.465200
                           Pillai   6     18    0.48953      0.97      0.471500
                           HL       6     9.1   0.83776      1.07      0.442500
                           Roy      3     9     0.78561      2.36      0.139800
APPENDIX F AMMONIUM LEVELS

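Test abbreviations are as in Appendix E, and the number of asterisks (*) represents the level of significance: *** if p is less than 0.001, ** if p is less than 0.01 and * if p is less than 0.05.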
[Figure: Ammonium Levels By Compaction, Cultivation, Depth 0-10 cm. Ammonium level against month (1 to 19) for six treatment series: 0, 1 or 16 compactions, each with no cultivation (None) or disc-plough cultivation (Plough).]
[Figure: Ammonium Levels By Compaction, Cultivation. Second panel: the same six treatment series against month (1 to 19).]
Correlations
Lag Of:      0             1             2             3             4
Rainfall     0.374834073   0.300605831   0.091370791   -0.1447029    -0.70678132
Max. Temp    0.300855478   0.165882532   0.022838142   0.08629043    -0.13572033
Min. Temp    0.399884796   0.32567179    0.18765877    -0.00320268   -0.34585806
Moisture     0.03618       0.06732       0.06133       -0.13599      -0.13985

Correlation p-values
Lag Of:      0             1             2             3             4
Rainfall     0.114961938   0.226662833   0.727613915   0.593519395   0.003630989
Max. Temp    0.211776475   0.511274898   0.93075245    0.751034017   0.630281678
Min. Temp    0.090931187   0.18847401    0.471560417   0.990620832   0.208555538
Moisture     0.5164        0.2403        0.2996        0.0254        0.0264
[Figure: Cross-correlation of Ammonium Levels with lagged Rainfall, Max. Temp., Min. Temp. and Soil Moisture; four panels of correlation (-1 to 1) against lag.]
Overall Split Plot

Source Of Variation                  Df   F Value     p value    Sig
Compaction                           2    0.55428     0.591168
Cultivation                          1    16.20717    0.002416   **
Compaction × Cultivation             2    8.07254     0.008186   **
Block                                2    0.401862    0.679411
Season                               4    9.444786    0.000010   ***
Season × Compaction                  8    0.691267    0.697171
Season × Cultivation                 4    2.889562    0.031902   *
Season × Compaction × Cultivation    8    0.467082    0.873146
Overall MANOVA
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     5     6     0.20791      4.57      0.045800   *
                           Pillai   5     6     0.79209      4.57      0.045800   *
                           HL       5     6     3.80978      4.57      0.045800   *
                           Roy      5     6     3.80978      4.57      0.045800   *
Compaction                 Wilk     10    12    0.22061      1.35      0.305000
                           Pillai   10    14    0.90318      1.15      0.393000
                           HL       10    6.7   2.97184      1.69      0.254300
                           Roy      5     7     2.76922      3.88      0.052800
Compaction × Cultivation   Wilk     10    12    0.09638      2.67      0.055500
                           Pillai   10    14    1.09306      1.69      0.180000
                           HL       10    6.7   7.4103       4.22      0.037100   *
                           Roy      5     7     7.13481      9.99      0.004300   **
Block                      Wilk     10    12    0.28971      1.03      0.473900
                           Pillai   10    14    0.85878      1.05      0.452200
                           HL       10    6.7   1.93913      1.1       0.464600
                           Roy      5     7     1.62341      2.27      0.157100
Seasonal Split Plot

Season 1
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    9.92615     0.004218   **
Cultivation                         1    11.369482   0.007099   **
Compaction × Cultivation            2    32.614782   0.000042   ***
Block                               2    1.2204024   0.335550
Month                               2    8.1213334   0.002024   **
Month × Compaction                  4    0.5126283   0.727043
Month × Cultivation                 2    0.4873514   0.620199
Month × Compaction × Cultivation    4    1.4440088   0.250201

Season 2
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    0.1349308   0.875341
Cultivation                         1    2.4482647   0.148722
Compaction × Cultivation            2    0.5506311   0.593114
Block                               2    0.3888218   0.687671
Month                               2    3.3622712   0.051607
Month × Compaction                  4    3.0347654   0.037021   *
Month × Cultivation                 2    4.4844597   0.022145   *
Month × Compaction × Cultivation    4    0.5105076   0.728538

Season 3
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    0.1977355   0.823719
Cultivation                         1    24.47833    0.000581   ***
Compaction × Cultivation            2    5.0843357   0.029965   *
Block                               2    1.1089639   0.367293
Month                               2    2.0311934   0.153123
Month × Compaction                  4    0.4866067   0.745427
Month × Cultivation                 2    1.6554639   0.212079
Month × Compaction × Cultivation    4    1.4942311   0.235301

Season 4
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    0.7894138   0.480482
Cultivation                         1    4.8778011   0.051685
Compaction × Cultivation            2    1.2255112   0.334176
Block                               2    1.8676206   0.204559
Month                               2    51.581116   0.000000   ***
Month × Compaction                  4    0.0290996   0.998246
Month × Cultivation                 2    3.0976969   0.063569
Month × Compaction × Cultivation    4    0.1503591   0.961017

Season 5
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    0.6688134   0.533813
Cultivation                         1    0.1888934   0.673068
Compaction × Cultivation            2    0.2919175   0.752982
Block                               2    1.2187892   0.335986
Month                               1    12.64152    0.003956   **
Month × Compaction                  2    1.0380984   0.383862
Month × Cultivation                 1    0.2499859   0.626127
Month × Compaction × Cultivation    2    0.4339011   0.657751
Seasonal MANOVA

Season 1
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.32401      5.56      0.023300   *
                           Pillai   3     8     0.67599      5.56      0.023300   *
                           HL       3     8     2.08637      5.56      0.023300   *
                           Roy      3     8     2.08637      5.56      0.023300   *
Compaction                 Wilk     6     16    0.28025      2.37      0.078600
                           Pillai   6     18    0.76841      1.87      0.141300
                           HL       6     9.1   2.39464      3.07      0.063200
                           Roy      3     9     2.3198       6.96      0.010100   *
Compaction × Cultivation   Wilk     6     16    0.09157      6.15      0.001700   **
                           Pillai   6     18    0.94579      2.69      0.048100   *
                           HL       6     9.1   9.51314      12.2      0.000700   ***
                           Roy      3     9     9.47007      28.41     <.0001     ***
Block                      Wilk     6     16    0.65911      0.62      0.713300
                           Pillai   6     18    0.36883      0.68      0.669000
                           HL       6     9.1   0.4748       0.61      0.719100
                           Roy      3     9     0.35556      1.07      0.410600

Season 2
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.20901      10.09     0.004300   **
                           Pillai   3     8     0.79099      10.09     0.004300   **
                           HL       3     8     3.78444      10.09     0.004300   **
                           Roy      3     8     3.78444      10.09     0.004300   **
Compaction                 Wilk     6     16    0.15302      4.15      0.010500   *
                           Pillai   6     18    1.16514      4.19      0.008300   **
                           HL       6     9.1   3.45598      4.43      0.022800   *
                           Roy      3     9     2.68022      8.04      0.006500   **
Compaction × Cultivation   Wilk     6     16    0.51394      1.05      0.429100
                           Pillai   6     18    0.49822      1         0.458000
                           HL       6     9.1   0.92209      1.18      0.393500
                           Roy      3     9     0.89567      2.69      0.109400
Block                      Wilk     6     16    0.53485      0.98      0.470400
                           Pillai   6     18    0.50742      1.02      0.443800
                           HL       6     9.1   0.79067      1.01      0.472400
                           Roy      3     9     0.6733       2.02      0.181700

Season 3
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.26095      7.55      0.010100   *
                           Pillai   3     8     0.73905      7.55      0.010100   *
                           HL       3     8     2.83216      7.55      0.010100   *
                           Roy      3     8     2.83216      7.55      0.010100   *
Compaction                 Wilk     6     16    0.72612      0.46      0.825700
                           Pillai   6     18    0.28684      0.5       0.798500
                           HL       6     9.1   0.35934      0.46      0.821200
                           Roy      3     9     0.29981      0.9       0.478500
Compaction × Cultivation   Wilk     6     16    0.30654      2.15      0.103700
                           Pillai   6     18    0.7773       1.91      0.134700
                           HL       6     9.1   1.98871      2.55      0.099300
                           Roy      3     9     1.84007      5.52      0.019900   *
Block                      Wilk     6     16    0.47386      1.21      0.352400
                           Pillai   6     18    0.58589      1.24      0.331200
                           HL       6     9.1   0.98427      1.26      0.360900
                           Roy      3     9     0.83288      2.5       0.125600

Season 4
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.67201      1.3       0.339100
                           Pillai   3     8     0.32799      1.3       0.339100
                           HL       3     8     0.48808      1.3       0.339100
                           Roy      3     8     0.48808      1.3       0.339100
Compaction                 Wilk     6     16    0.55689      0.91      0.514400
                           Pillai   6     18    0.46043      0.9       0.517900
                           HL       6     9.1   0.76458      0.98      0.489800
                           Roy      3     9     0.72146      2.16      0.162100
Compaction × Cultivation   Wilk     6     16    0.52724      1.01      0.455300
                           Pillai   6     18    0.51642      1.04      0.430100
                           HL       6     9.1   0.81385      1.04      0.457400
                           Roy      3     9     0.69464      2.08      0.172700
Block                      Wilk     6     16    0.52835      1         0.457500
                           Pillai   6     18    0.5043       1.01      0.448600
                           HL       6     9.1   0.8309       1.07      0.446700
                           Roy      3     9     0.74833      2.24      0.152300

Season 5
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     2     9     0.97719      0.11      0.901400
                           Pillai   2     9     0.02281      0.11      0.901400
                           HL       2     9     0.02334      0.11      0.901400
                           Roy      2     9     0.02334      0.11      0.901400
Compaction                 Wilk     4     18    0.73202      0.76      0.565000
                           Pillai   4     20    0.28678      0.84      0.517800
                           HL       4     9.9   0.3404       0.75      0.581900
                           Roy      2     10    0.22751      1.14      0.358800
Compaction × Cultivation   Wilk     4     18    0.92112      0.19      0.941200
                           Pillai   4     20    0.07924      0.21      0.931900
                           HL       4     9.9   0.08525      0.19      0.939600
                           Roy      2     10    0.08044      0.4       0.679200
Block                      Wilk     4     18    0.66167      1.03      0.417700
                           Pillai   4     20    0.36166      1.1       0.382100
                           HL       4     9.9   0.47608      1.05      0.432000
                           Roy      2     10    0.38434      1.92      0.196700
Note: The season 5 model has only two dependent variables, because month 16 is not included.
APPENDIX G TOTAL MINERAL NITROGEN LEVELS

Test abbreviations are as in Appendix E, and the number of asterisks (*) represents the level of significance: *** if p is less than 0.001, ** if p is less than 0.01 and * if p is less than 0.05.
[Figure: Total Mineral Nitrogen Levels By Compaction, Cultivation, Depth 0-10 cm. Mean Mineral Nitrogen (kgN/ha) against month (1 to 19) for six treatment series: 0, 1 or 16 compactions, each with no cultivation (None) or disc-plough cultivation (Plough).]
[Figure: Total Mineral Nitrogen Levels By Compaction, Cultivation, Depth 0-10 cm (3MA Smoothed). The same six series after smoothing.]
Correlations
Lag Of:      0             1             2             3             4
Rainfall     0.149596438   0.43981975    0.355556502   0.311261646   0.130664642
Max. Temp    0.290523525   0.413163352   0.489020628   0.591480017   0.501739359
Min. Temp    0.265994179   0.484185647   0.426642884   0.331237485   0.175608414
Moisture     0.14902       0.14639       0.04027       -0.02697      -0.02886

Correlation p-values
Lag Of:      0             1             2             3             4
Rainfall     0.541025743   0.067786607   0.161333342   0.24059967    0.642534758
Max. Temp    0.227574098   0.088347873   0.046361569   0.015808237   0.056696484
Min. Temp    0.271023681   0.04173754    0.087662635   0.210124591   0.531302009
Moisture     0.0058        0.0083        0.4827        0.6486        0.6369
[Figure: Cross-correlation of Total N. Levels with lagged Rainfall, Max. Temp., Min. Temp. and Soil Moisture; four panels of correlation (-1 to 1) against lag.]
Overall Split Plot

Source Of Variation                  Df   F Value    p value    Sig
Compaction                           2    3.237539   0.082388
Cultivation                          1    4.135041   0.069400
Compaction × Cultivation             2    0.095566   0.909678
Block                                2    3.011809   0.094667
Season                               4    20.70201   0.000000   ***
Season × Compaction                  8    2.791478   0.012731   *
Season × Cultivation                 4    0.583816   0.675846
Season × Compaction × Cultivation    8    0.947481   0.487387
Overall MANOVA
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     5     6     0.57608      0.88      0.544600
                           Pillai   5     6     0.42392      0.88      0.544600
                           HL       5     6     0.73587      0.88      0.544600
                           Roy      5     6     0.73587      0.88      0.544600
Compaction                 Wilk     10    12    0.19859      1.49      0.252300
                           Pillai   10    14    1.01767      1.45      0.254800
                           HL       10    6.7   2.94664      1.68      0.257900
                           Roy      5     7     2.51336      3.52      0.065700
Compaction × Cultivation   Wilk     10    12    0.44719      0.59      0.791100
                           Pillai   10    14    0.56636      0.55      0.825200
                           HL       10    6.7   1.20591      0.69      0.715400
                           Roy      5     7     1.18024      1.65      0.263200
Block                      Wilk     10    12    0.34864      0.83      0.609000
                           Pillai   10    14    0.71556      0.78      0.647800
                           HL       10    6.7   1.68419      0.96      0.541700
                           Roy      5     7     1.56666      2.19      0.167300
Seasonal Split Plot

Season 1
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    6.041082    0.019045   *
Cultivation                         1    0.2516923   0.626746
Compaction × Cultivation            2    0.8330084   0.462793
Block                               2    0.0676657   0.934997
Month                               2    13.320379   0.000128   ***
Month × Compaction                  4    0.7994956   0.537388
Month × Cultivation                 2    1.3335844   0.282366
Month × Compaction × Cultivation    4    1.9201531   0.139671

Season 2
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    3.1798703   0.085333
Cultivation                         1    1.0836714   0.322392
Compaction × Cultivation            2    0.5966758   0.569114
Block                               2    3.3937885   0.075000
Month                               2    1.5692784   0.228820
Month × Compaction                  4    0.9593558   0.447634
Month × Cultivation                 2    0.1725496   0.842551
Month × Compaction × Cultivation    4    0.2471376   0.908557

Season 3
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    0.4805692   0.632007
Cultivation                         1    2.3308639   0.157819
Compaction × Cultivation            2    4.9457739   0.032111   *
Block                               2    0.6367425   0.549173
Month                               2    4.8493954   0.017028   *
Month × Compaction                  4    0.3432344   0.846042
Month × Cultivation                 2    0.0697412   0.932823
Month × Compaction × Cultivation    4    1.0485044   0.403169

Season 4
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    1.7424244   0.224269
Cultivation                         1    4.467433    0.060671
Compaction × Cultivation            2    0.2122454   0.812317
Block                               2    1.8501823   0.207176
Month                               2    7.0860892   0.003816   **
Month × Compaction                  4    0.8788308   0.491239
Month × Cultivation                 2    1.3181026   0.286330
Month × Compaction × Cultivation    4    0.7944589   0.540423

Season 5
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    1.5163622   0.265965
Cultivation                         1    0.0611204   0.809735
Compaction × Cultivation            2    0.1244686   0.884313
Block                               2    0.5135517   0.613328
Month                               2    26.795082   0.000001   ***
Month × Compaction                  4    1.5538633   0.218737
Month × Cultivation                 2    0.1527255   0.859193
Month × Compaction × Cultivation    4    1.4543626   0.247056
Seasonal MANOVA

Season 1
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.7936       0.69      0.581400
                           Pillai   3     8     0.2064       0.69      0.581400
                           HL       3     8     0.26007      0.69      0.581400
                           Roy      3     8     0.26007      0.69      0.581400
Compaction                 Wilk     6     16    0.28357      2.34      0.081500
                           Pillai   6     18    0.90768      2.49      0.062100
                           HL       6     9.1   1.85204      2.37      0.116600
                           Roy      3     9     1.35389      4.06      0.044300   *
Compaction × Cultivation   Wilk     6     16    0.25513      2.61      0.058400
                           Pillai   6     18    0.75602      1.82      0.150900
                           HL       6     9.1   2.87577      3.69      0.038700   *
                           Roy      3     9     2.86048      8.58      0.005300   **
Block                      Wilk     6     16    0.6056       0.76      0.611400
                           Pillai   6     18    0.40366      0.76      0.611300
                           HL       6     9.1   0.63599      0.82      0.583800
                           Roy      3     9     0.61098      1.83      0.211400

Season 2
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.65062      1.43      0.303600
                           Pillai   3     8     0.34938      1.43      0.303600
                           HL       3     8     0.537        1.43      0.303600
                           Roy      3     8     0.537        1.43      0.303600
Compaction                 Wilk     6     16    0.3996       1.55      0.224800
                           Pillai   6     18    0.72665      1.71      0.175500
                           HL       6     9.1   1.18654      1.52      0.273300
                           Roy      3     9     0.78307      2.35      0.140600
Compaction × Cultivation   Wilk     6     16    0.53245      0.99      0.465600
                           Pillai   6     18    0.51254      1.03      0.435900
                           HL       6     9.1   0.7936       1.02      0.470500
                           Roy      3     9     0.66689      2         0.184500
Block                      Wilk     6     16    0.41811      1.46      0.254400
                           Pillai   6     18    0.60215      1.29      0.310000
                           HL       6     9.1   1.34324      1.72      0.221500
                           Roy      3     9     1.30614      3.92      0.048300   *

Season 3
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.73508      0.96      0.456700
                           Pillai   3     8     0.26492      0.96      0.456700
                           HL       3     8     0.3604       0.96      0.456700
                           Roy      3     8     0.3604       0.96      0.456700
Compaction                 Wilk     6     16    0.69067      0.54      0.768900
                           Pillai   6     18    0.31193      0.55      0.760500
                           HL       6     9.1   0.44409      0.57      0.746300
                           Roy      3     9     0.43543      1.31      0.331200
Compaction × Cultivation   Wilk     6     16    0.31378      2.09      0.111300
                           Pillai   6     18    0.79948      2         0.119300
                           HL       6     9.1   1.82595      2.34      0.120300
                           Roy      3     9     1.60042      4.8       0.029000   *
Block                      Wilk     6     16    0.66737      0.6       0.728200
                           Pillai   6     18    0.34969      0.64      0.700400
                           HL       6     9.1   0.47285      0.61      0.720800
                           Roy      3     9     0.41057      1.23      0.353900

Season 4
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.50945      2.57      0.127300
                           Pillai   3     8     0.49055      2.57      0.127300
                           HL       3     8     0.9629       2.57      0.127300
                           Roy      3     8     0.9629       2.57      0.127300
Compaction                 Wilk     6     16    0.6961       0.53      0.778000
                           Pillai   6     18    0.31622      0.56      0.753800
                           HL       6     9.1   0.41887      0.54      0.768700
                           Roy      3     9     0.37119      1.11      0.393500
Compaction × Cultivation   Wilk     6     16    0.7811       0.35      0.899200
                           Pillai   6     18    0.23006      0.39      0.875800
                           HL       6     9.1   0.26597      0.34      0.898400
                           Roy      3     9     0.19133      0.57      0.646300
Block                      Wilk     6     16    0.53038      0.99      0.461500
                           Pillai   6     18    0.53763      1.1       0.398500
                           HL       6     9.1   0.75722      0.97      0.494800
                           Roy      3     9     0.50157      1.5       0.278600

Season 5
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     3     8     0.96712      0.09      0.963100
                           Pillai   3     8     0.03288      0.09      0.963100
                           HL       3     8     0.034        0.09      0.963100
                           Roy      3     8     0.034        0.09      0.963100
Compaction                 Wilk     6     16    0.40778      1.51      0.237600
                           Pillai   6     18    0.71627      1.67      0.184800
                           HL       6     9.1   1.14812      1.47      0.288000
                           Roy      3     9     0.73328      2.2       0.157700
Compaction × Cultivation   Wilk     6     16    0.29029      2.28      0.087700
                           Pillai   6     18    0.86384      2.28      0.081800
                           HL       6     9.1   1.91396      2.45      0.108400
                           Roy      3     9     1.57736      4.73      0.030100   *
Block                      Wilk     6     16    0.32136      2.04      0.119600
                           Pillai   6     18    0.7016       1.62      0.198600
                           HL       6     9.1   2.04035      2.62      0.093500
                           Roy      3     9     2.00471      6.01      0.015600   *
APPENDIX H NITRATE DYNAMICS

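Test abbreviations are as in Appendix E, and the number of asterisks (*) represents the level of significance: *** if p is less than 0.001, ** if p is less than 0.01 and * if p is less than 0.05.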
[Figure: Nitrate Dynamics By Compaction, Cultivation, Depth 0-10 cm. Nitrate dynamics (positive and negative values) against month (1 to 19) for six treatment series: 0, 1 or 16 compactions, each with no cultivation (None) or disc-plough cultivation (Plough).]
[Figure: Nitrate Dynamics By Compaction, Cultivation. Second panel: the same six treatment series against month (1 to 19).]
Correlations
Lag Of:      0             1             2             3             4
Rainfall     0.151193688   0.229692186   -0.32442784   -0.15824778   0.121083764
Max. Temp    0.096098762   0.332712651   0.428952959   -0.06717411   -0.25202723
Min. Temp    0.040455248   0.181337575   0.03836094    -0.25523954   -0.23340802
Moisture     0.01601       0.01005       0.04219       -0.01604      -0.00451

Correlation p-values
Lag Of:      0             1             2             3             4
Rainfall     0.53666426    0.359203252   0.203918649   0.558310579   0.667298054
Max. Temp    0.695534862   0.177319698   0.085775573   0.804769908   0.364846677
Min. Temp    0.869388484   0.471449988   0.883785992   0.340046937   0.402470603
Moisture     0.768         0.8571        0.4621        0.7864        0.9412
[Figure: Cross-correlation of Nitrate Dynamics with lagged Rainfall, Max. Temp., Min. Temp. and Soil Moisture; four panels of correlation (-1 to 1) against lag.]
Overall Split Plot

Source Of Variation                  Df   F Value    p value    Sig
Compaction                           2    0.361981   0.705056
Cultivation                          1    0.001363   0.971281
Compaction × Cultivation             2    1.458587   0.278075
Block                                2    0.415109   0.671141
Season                               4    3.56246    0.012676   *
Season × Compaction                  8    1.068198   0.400851
Season × Cultivation                 4    0.930248   0.454375
Season × Compaction × Cultivation    8    2.585801   0.019593   *
Overall MANOVA
Source Of Variation        Test     Ndf   Ddf   Test Stat.   F Value   p value    Sig
Cultivation                Wilk     5     6     0.7117       0.49      0.777100
                           Pillai   5     6     0.2883       0.49      0.777100
                           HL       5     6     0.40509      0.49      0.777100
                           Roy      5     6     0.40509      0.49      0.777100
Compaction                 Wilk     10    12    0.29475      1.01      0.486000
                           Pillai   10    14    0.8311       1         0.489900
                           HL       10    6.7   1.96573      1.12      0.457300
                           Roy      5     7     1.71706      2.4       0.142000
Compaction × Cultivation   Wilk     10    12    0.22964      1.3       0.327000
                           Pillai   10    14    0.86304      1.06      0.446400
                           HL       10    6.7   2.95108      1.68      0.257300
                           Roy      5     7     2.80731      3.93      0.051200
Block                      Wilk     10    12    0.38124      0.74      0.676200
                           Pillai   10    14    0.68382      0.73      0.689300
                           HL       10    6.7   1.45235      0.83      0.621700
                           Roy      5     7     1.3234       1.85      0.221300
Seasonal Split Plot

Season 1
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    3.1790453   0.085376
Cultivation                         1    0.1646168   0.693486
Compaction × Cultivation            2    0.8604193   0.452071
Block                               2    3.941777    0.054668
Month                               2    0.8581284   0.436558
Month × Compaction                  4    0.5881418   0.674368
Month × Cultivation                 2    5.4327159   0.011319   *
Month × Compaction × Cultivation    4    1.7116214   0.180283

Season 2
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    0.3696634   0.700027
Cultivation                         1    0.004026    0.950658
Compaction × Cultivation            2    4.0413913   0.051722
Block                               2    0.3935162   0.684684
Month                               2    1.0281372   0.372895
Month × Compaction                  4    0.5423882   0.706136
Month × Cultivation                 2    0.1178653   0.889327
Month × Compaction × Cultivation    4    0.8543679   0.505134

Season 3
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    0.3361052   0.722317
Cultivation                         1    1.9937817   0.188299
Compaction × Cultivation            2    2.032109    0.181728
Block                               2    0.2872698   0.756298
Month                               2    1.4895469   0.245588
Month × Compaction                  4    0.2070019   0.932012
Month × Cultivation                 2    0.2848167   0.754659
Month × Compaction × Cultivation    4    0.6277079   0.647382

Season 4
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    1.5497833   0.259248
Cultivation                         1    3.1929193   0.104256
Compaction × Cultivation            2    0.0034687   0.996538
Block                               2    0.4527546   0.648291
Month                               2    5.7225569   0.009287   **
Month × Compaction                  4    0.1571715   0.957832
Month × Cultivation                 2    2.0341718   0.152733
Month × Compaction × Cultivation    4    1.6123264   0.203616

Season 5
Source Of Variation                 Df   F Value     p value    Sig
Compaction                          2    0.4541577   0.647458
Cultivation                         1    0.0508765   0.826088
Compaction × Cultivation            2    0.1663616   0.849036
Block                               2    1.2011149   0.340801
Month                               2    2.2909384   0.122870
Month × Compaction                  4    0.162766    0.955149
Month × Cultivation                 2    0.0470829   0.954096
Month × Compaction × Cultivation    4    1.9005445   0.143058
Seasonal MANOVA
Season 1
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.47737     2.92     0.100300
Pillai  Cultivation             3  8    0.52263     2.92     0.100300
HL      Cultivation             3  8    1.09481     2.92     0.100300
Roy     Cultivation             3  8    1.09481     2.92     0.100300
Wilk    Compaction              6  16   0.46914     1.23     0.343600
Pillai  Compaction              6  18   0.53604     1.1      0.400800
HL      Compaction              6  9.1  1.1205      1.44     0.299100
Roy     Compaction              3  9    1.11055     3.33     0.070100
Wilk    CompactionCultivation   6  16   0.44637     1.32     0.302500
Pillai  CompactionCultivation   6  18   0.60111     1.29     0.311300
HL      CompactionCultivation   6  9.1  1.13389     1.45     0.293600
Roy     CompactionCultivation   3  9    1.03068     3.09     0.082300
Wilk    Block                   6  16   0.2136      3.1      0.032700 *
Pillai  Block                   6  18   1.05712     3.36     0.021100 *
HL      Block                   6  9.1  2.41433     3.1      0.061900
Roy     Block                   3  9    1.64288     4.93     0.027100 *

Season 2
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.89286     0.32     0.810900
Pillai  Cultivation             3  8    0.10714     0.32     0.810900
HL      Cultivation             3  8    0.12        0.32     0.810900
Roy     Cultivation             3  8    0.12        0.32     0.810900
Wilk    Compaction              6  16   0.78957     0.33     0.908800
Pillai  Compaction              6  18   0.22257     0.38     0.884900
HL      Compaction              6  9.1  0.25114     0.32     0.909600
Roy     Compaction              3  9    0.1455      0.44     0.732300
Wilk    CompactionCultivation   6  16   0.43887     1.36     0.289400
Pillai  CompactionCultivation   6  18   0.61635     1.34     0.292200
HL      CompactionCultivation   6  9.1  1.15275     1.48     0.286200
Roy     CompactionCultivation   3  9    1.03067     3.09     0.082300
Wilk    Block                   6  16   0.65806     0.62     0.711400
Pillai  Block                   6  18   0.36334     0.67     0.678100
HL      Block                   6  9.1  0.48711     0.62     0.708300
Roy     Block                   3  9    0.40727     1.22     0.357000

Season 3
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.80529     0.64     0.607700
Pillai  Cultivation             3  8    0.19471     0.64     0.607700
HL      Cultivation             3  8    0.24179     0.64     0.607700
Roy     Cultivation             3  8    0.24179     0.64     0.607700
Wilk    Compaction              6  16   0.74805     0.42     0.857300
Pillai  Compaction              6  18   0.25372     0.44     0.845300
HL      Compaction              6  9.1  0.33446     0.43     0.842700
Roy     Compaction              3  9    0.32725     0.98     0.443600
Wilk    CompactionCultivation   6  16   0.42663     1.42     0.268500
Pillai  CompactionCultivation   6  18   0.6352      1.4      0.269500
HL      CompactionCultivation   6  9.1  1.19907     1.54     0.268700
Roy     CompactionCultivation   3  9    1.06272     3.19     0.077100
Wilk    Block                   6  16   0.75083     0.41     0.861100
Pillai  Block                   6  18   0.25823     0.44     0.839200
HL      Block                   6  9.1  0.31979     0.41     0.855100
Roy     Block                   3  9    0.27608     0.83     0.510900

Season 4
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.47836     2.91     0.101100
Pillai  Cultivation             3  8    0.52164     2.91     0.101100
HL      Cultivation             3  8    1.09048     2.91     0.101100
Roy     Cultivation             3  8    1.09048     2.91     0.101100
Wilk    Compaction              6  16   0.62944     0.69     0.657700
Pillai  Compaction              6  18   0.38034     0.7      0.650000
HL      Compaction              6  9.1  0.57317     0.73     0.634600
Roy     Compaction              3  9    0.54463     1.63     0.249500
Wilk    CompactionCultivation   6  16   0.57351     0.85     0.547600
Pillai  CompactionCultivation   6  18   0.4685      0.92     0.504900
HL      CompactionCultivation   6  9.1  0.67039     0.86     0.557300
Roy     CompactionCultivation   3  9    0.53294     1.6      0.257000
Wilk    Block                   6  16   0.66111     0.61     0.716900
Pillai  Block                   6  18   0.35837     0.65     0.686200
HL      Block                   6  9.1  0.48314     0.62     0.711800
Roy     Block                   3  9    0.41156     1.23     0.353000

Season 5
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.98709     0.03     0.990600
Pillai  Cultivation             3  8    0.01291     0.03     0.990600
HL      Cultivation             3  8    0.01308     0.03     0.990600
Roy     Cultivation             3  8    0.01308     0.03     0.990600
Wilk    Compaction              6  16   0.60638     0.76     0.612900
Pillai  Compaction              6  18   0.39475     0.74     0.626100
HL      Compaction              6  9.1  0.64725     0.83     0.575000
Roy     Compaction              3  9    0.64435     1.93     0.194800
Wilk    CompactionCultivation   6  16   0.32519     2.01     0.124000
Pillai  CompactionCultivation   6  18   0.75246     1.81     0.153800
HL      CompactionCultivation   6  9.1  1.83635     2.35     0.118900
Roy     CompactionCultivation   3  9    1.69552     5.09     0.024900 *
Wilk    Block                   6  16   0.43874     1.36     0.289200
Pillai  Block                   6  18   0.62738     1.37     0.278700
HL      Block                   6  9.1  1.12857     1.45     0.295800
Roy     Block                   3  9    0.97381     2.92     0.092600
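The F values in the MANOVA tables can be checked directly from the statistics. When the hypothesis has one degree of freedom (every Cultivation row) all four tests are exact and share F = λ·Ddf/Ndf, where λ is the single eigenvalue (the HL or Roy entry). Roy's largest-root F takes the same form for multi-df effects, where it is an upper bound rather than an exact test. A minimal sketch, assuming these standard SAS-style formulas; `f_from_eigenvalue` is an illustrative name and the inputs are values from the Season 5 Seasonal MANOVA output above.

```python
def f_from_eigenvalue(lam, ndf, ddf):
    """F = lambda * Ddf / Ndf.

    Exact for a one-degree-of-freedom hypothesis (all four tests agree);
    for Roy's largest root with more hypothesis df it is an upper bound."""
    return lam * ddf / ndf

# (statistic, Ndf, Ddf, tabulated F) from the Season 5 Seasonal MANOVA output
season5 = {
    "Cultivation, HL": (0.01308, 3, 8, 0.03),
    "Compaction, Roy": (0.64435, 3, 9, 1.93),
    "Block, Roy": (0.97381, 3, 9, 2.92),
}
for label, (lam, ndf, ddf, f_table) in season5.items():
    print(label, round(f_from_eigenvalue(lam, ndf, ddf), 2), "table:", f_table)
```

Each computed value rounds to the tabulated F, which is a quick consistency check when transcribing these tables.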
APPENDIX I AMMONIUM DYNAMICS

symbol used is *.
Figure: Ammonium Dynamics By Compaction, Cultivation, Depth 0-10 cm (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 19)
Figure: Ammonium Dynamics By Compaction, Cultivation (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 19)
Correlations
Lag Of         0             1             2             3             4
Rainfall      -0.30827252   -0.06403202   -0.19477609   -0.04303375    0.501893265
Max. Temp      0.070493888   0.050717585  -0.02038566   -0.29792401   -0.17026102
Min. Temp     -0.22071841   -0.20659955   -0.31777376   -0.31542012    0.049907475
Moisture      -0.1233       -0.15974      -0.11039      -0.06122      -0.04534

p value
Lag Of         0             1             2             3             4
Rainfall       0.20143035    0.801178549   0.455584055   0.87464355    0.060410778
Max. Temp      0.774752405   0.841954766   0.938259345   0.265583316   0.545996066
Min. Temp      0.365567613   0.412524211   0.216840615   0.237364767   0.860296553
Moisture       0.0311        0.0066        0.0701        0.3331        0.49
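Each Correlations table gives Pearson correlations between the monthly response and a climate or soil moisture series shifted by 0 to 4 months, together with the associated p values. The sketch below shows only the lag calculation. It assumes that a lag of k pairs the response in month t with the explanatory variable in month t − k, and that both series are complete and of equal length. The function names and example series are illustrative, not thesis data.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(response, driver, lag):
    """Correlate response[t] with driver[t - lag] over the overlapping months."""
    if lag < 0 or lag >= len(driver) - 1:
        raise ValueError("lag must leave at least two overlapping months")
    return pearson(driver[:len(driver) - lag], response[lag:])

def cross_correlations(response, driver, max_lag=4):
    """Correlations at lags 0..max_lag, keyed by lag."""
    return {k: lagged_correlation(response, driver, k) for k in range(max_lag + 1)}

# Illustrative series: the response repeats the driver two months later.
rain = [3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0, 5.0, 10.0]
response = [0.0, 0.0] + rain[:-2]
for lag, r in cross_correlations(response, rain).items():
    print(lag, round(r, 3))
```

With this toy series the correlation peaks at exactly 1.0 at lag 2, which is the pattern a delayed response to rainfall would produce. Note that each additional lag shortens the overlapping series by one month, so the higher lags rest on fewer observations.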
Figure: Cross-Correlation: Ammonium Dynamics with Rainfall, Max. Temp., Min. Temp. and Soil Moisture (four panels; y-axis: Correlation, -1 to 1; x-axes: Rainfall Lag, Max. Temp. Lag, Min. Temp. Lag, Soil Moisture Lag)
Overall Split Plot
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  0.586346    0.574395
Cultivation                    1  1.163804    0.306011
CompactionCultivation          2  4.236824    0.046477  *
Block                          2  1.90036     0.199752
Season                         4  4.502184    0.003607  **
SeasonCompaction               8  1.356114    0.239880
SeasonCultivation              4  1.917511    0.122747
SeasonCompactionCultivation    8  0.342826    0.944551
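The Sig column marks p values with one to three asterisks. The flags in these tables are consistent with the conventional cut-offs of 0.05, 0.01 and 0.001 (for example, 0.046477 is marked * and 0.003607 is marked **). A small helper, assuming exactly those cut-offs, reproduces the column from the p values; the function name is illustrative.

```python
def significance_stars(p):
    """Asterisk code assumed for the Sig column: *** p < 0.001, ** p < 0.01, * p < 0.05."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# p values from the Ammonium Dynamics Overall Split Plot
for p in (0.574395, 0.046477, 0.003607, 0.944551):
    print(f"{p:.6f} {significance_stars(p)}".rstrip())
```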
Overall MANOVA
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             5  6    0.43147     1.58     0.294900
Pillai  Cultivation             5  6    0.56853     1.58     0.294900
HL      Cultivation             5  6    1.31765     1.58     0.294900
Roy     Cultivation             5  6    1.31765     1.58     0.294900
Wilk    Compaction             10  12   0.2916      1.02     0.478400
Pillai  Compaction             10  14   0.8445      1.02     0.471600
HL      Compaction             10  6.7  1.96264     1.12     0.458100
Roy     Compaction              5  7    1.68578     2.36     0.146800
Wilk    CompactionCultivation  10  12   0.11402     2.35     0.081100
Pillai  CompactionCultivation  10  14   1.28728     2.53     0.055100
HL      CompactionCultivation  10  6.7  4.25061     2.42     0.131500
Roy     CompactionCultivation   5  7    3.124       4.37     0.039900 *
Wilk    Block                  10  12   0.15152     1.88     0.148800
Pillai  Block                  10  14   1.214       2.16     0.091000
HL      Block                  10  6.7  3.18748     1.81     0.225900
Roy     Block                   5  7    1.95102     2.73     0.111400
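For effects with two hypothesis degrees of freedom, the Wilk and Pillai rows rely on F approximations rather than exact tests. The sketch below implements Rao's approximation for Wilks' Λ and the usual approximation for Pillai's trace. It assumes p = 5 response variables (one per season), q = 2 hypothesis degrees of freedom and an error df of ν = 10. These assumed values are not stated in the tables, but they reproduce the tabulated Ndf/Ddf of 10/12 (Wilk) and 10/14 (Pillai) and the Compaction F value of 1.02 above. The HL rows (Ddf 6.7) use a further approximation that is not reproduced here. Function names are illustrative.

```python
import math

def rao_f_wilks(wilks, p, q, nu):
    """Rao's F approximation for Wilks' lambda.

    p: number of response variables, q: hypothesis df, nu: error df.
    Returns (F, numerator df, denominator df)."""
    if p * p + q * q - 5 > 0:
        s = math.sqrt((p * p * q * q - 4) / (p * p + q * q - 5))
    else:
        s = 1.0
    m = nu - (p - q + 1) / 2
    ndf = p * q
    ddf = m * s - (p * q - 2) / 2
    root = wilks ** (1 / s)
    return (1 - root) / root * ddf / ndf, ndf, ddf

def pillai_f(v, p, q, nu):
    """F approximation for Pillai's trace V; returns (F, ndf, ddf)."""
    s = min(p, q)
    m = (abs(p - q) - 1) / 2
    n = (nu - p - 1) / 2
    ndf = s * (2 * m + s + 1)
    ddf = s * (2 * n + s + 1)
    return (2 * n + s + 1) / (2 * m + s + 1) * v / (s - v), ndf, ddf

# Compaction rows of the Ammonium Dynamics Overall MANOVA (assumed p=5, q=2, nu=10)
f, ndf, ddf = rao_f_wilks(0.2916, p=5, q=2, nu=10)
print(round(f, 2), ndf, ddf)   # 1.02 10 12.0
f, ndf, ddf = pillai_f(0.8445, p=5, q=2, nu=10)
print(round(f, 2), ndf, ddf)   # 1.02 10.0 14.0
```

With p = 3 instead of 5 the same functions give the Ndf/Ddf of 6/16 and 6/18 used in the seasonal MANOVA tables, consistent with three monthly responses per season.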
APPENDIX J TOTAL MINERAL NITROGEN DYNAMICS

symbol used is *.
Figure: Total Mineral Nitrogen Dynamics By Compaction (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 19)
Figure: Total Mineral Nitrogen Dynamics By Compaction (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 19)
Correlations
Lag Of         0             1             2             3             4
Rainfall       0.072752525   0.199769622  -0.31710666   -0.1705821     0.011916034
Max. Temp      0.249203995   0.241858587   0.275800519  -0.15791168   -0.36315432
Min. Temp      0.114758559   0.065315971  -0.09265454   -0.35713878   -0.27357296
Moisture      -0.06176      -0.02599      -0.01167      -0.03348      -0.02251

p value
Lag Of         0             1             2             3             4
Rainfall       0.767242234   0.426750497   0.214904381   0.527624829   0.966380845
Max. Temp      0.303546342   0.333588332   0.28394693    0.559157388   0.183373231
Min. Temp      0.639918885   0.796798672   0.723572761   0.17447377    0.323830451
Moisture       0.2547        0.6411        0.8389        0.5715        0.7127
Figure: Cross-Correlation: Total N. Dynamics with Rainfall, Max. Temp., Min. Temp. and Soil Moisture (four panels; y-axis: Correlation, -1 to 1; x-axes: Rainfall Lag, Max. Temp. Lag, Min. Temp. Lag, Soil Moisture Lag)
Overall Split Plot
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  0.113938    0.893455
Cultivation                    1  0.172506    0.686665
CompactionCultivation          2  0.564854    0.585573
Block                          2  0.307029    0.742323
Season                         4  7.699315    0.000071  ***
SeasonCompaction               8  0.952014    0.483951
SeasonCultivation              4  0.737626    0.570941
SeasonCompactionCultivation    8  1.951618    0.073588
Overall MANOVA
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             5  6    0.77256     0.35     0.863300
Pillai  Cultivation             5  6    0.22744     0.35     0.863300
HL      Cultivation             5  6    0.2944      0.35     0.863300
Roy     Cultivation             5  6    0.2944      0.35     0.863300
Wilk    Compaction             10  12   0.22711     1.32     0.320800
Pillai  Compaction             10  14   0.99443     1.38     0.280700
HL      Compaction             10  6.7  2.42767     1.38     0.347200
Roy     Compaction              5  7    1.91947     2.69     0.115000
Wilk    CompactionCultivation  10  12   0.27315     1.1      0.433800
Pillai  CompactionCultivation  10  14   0.8279      0.99     0.494300
HL      CompactionCultivation  10  6.7  2.29104     1.3      0.376400
Roy     CompactionCultivation   5  7    2.11623     2.96     0.094600
Wilk    Block                  10  12   0.28109     1.06     0.453100
Pillai  Block                  10  14   0.8403      1.01     0.477300
HL      Block                  10  6.7  2.12576     1.21     0.415300
Roy     Block                   5  7    1.89827     2.66     0.117500
APPENDIX K NITRATE LEACHING

symbol used is *.
Figure: Nitrate Leaching By Compaction, Cultivation, Depth 0-10 cm (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 19)
Figure: Nitrate Leaching By Compaction, Cultivation (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 19)
Correlations
Lag Of         0             1             2             3             4
Rainfall      -0.22372978    0.036330546  -0.47593451   -0.00302566   -0.07609275
Max. Temp      0.196864848   0.136222382  -0.10248736   -0.40839496   -0.38222812
Min. Temp     -0.02160497    0.020689343  -0.33868516   -0.41043598   -0.267336
Moisture      -0.06462      -0.06545      -0.07224       0.00638      -0.04805

p value
Lag Of         0             1             2             3             4
Rainfall       0.357177485   0.886196896   0.053467953   0.991127076   0.787522533
Max. Temp      0.419200558   0.589904278   0.695492116   0.116307856   0.159722667
Min. Temp      0.930043439   0.935057104   0.183591897   0.114317045   0.335420941
Moisture       0.2333        0.2401        0.2076        0.9141        0.4317
Figure: Cross-Correlation: Nitrate Leaching with Rainfall, Max. Temp., Min. Temp. and Soil Moisture (four panels; y-axis: Correlation, -1 to 1; x-axes: Rainfall Lag, Max. Temp. Lag, Min. Temp. Lag, Soil Moisture Lag)
Overall Split Plot
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  0.623358    0.555740
Cultivation                    1  5.686292    0.038311  *
CompactionCultivation          2  1.221334    0.335299
Block                          2  0.4524      0.648502
Season                         4  4.303844    0.004686  **
SeasonCompaction               8  1.451897    0.200016
SeasonCultivation              4  0.478589    0.751252
SeasonCompactionCultivation    8  0.615986    0.760030
Overall MANOVA
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             5  6    0.38083     1.95     0.219700
Pillai  Cultivation             5  6    0.61917     1.95     0.219700
HL      Cultivation             5  6    1.62587     1.95     0.219700
Roy     Cultivation             5  6    1.62587     1.95     0.219700
Wilk    Compaction             10  12   0.43324     0.62     0.769200
Pillai  Compaction             10  14   0.67993     0.72     0.694200
HL      Compaction             10  6.7  1.047       0.6      0.778400
Roy     Compaction              5  7    0.63683     0.89     0.534100
Wilk    CompactionCultivation  10  12   0.53235     0.44     0.895800
Pillai  CompactionCultivation  10  14   0.53626     0.51     0.854200
HL      CompactionCultivation  10  6.7  0.74957     0.43     0.891400
Roy     CompactionCultivation   5  7    0.4824      0.68     0.656000
Wilk    Block                  10  12   0.35023     0.83     0.612400
Pillai  Block                  10  14   0.81465     0.96     0.512600
HL      Block                  10  6.7  1.38447     0.79     0.646700
Roy     Block                   5  7    0.78397     1.1      0.438400
APPENDIX L AMMONIUM LEACHING

symbol used is *.
Figure: Ammonium Leaching By Compaction, Cultivation, Depth 0-10 cm (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 19)
Figure: Ammonium Leaching By Compaction, Cultivation (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 19)
Correlations
Lag Of         0             1             2             3             4
Rainfall      -0.20617833   -0.04946184   -0.06037652    0.163350894   0.306248611
Max. Temp      0.178228084   0.253037596   0.135222092  -0.03935116    0.041600246
Min. Temp     -0.01761848    0.13081083   -0.01605389   -0.13420019    0.062753102
Moisture      -0.00897      -0.03976      -0.06684      -0.03544      -0.05768

p value
Lag Of         0             1             2             3             4
Rainfall       0.397077594   0.845469738   0.817947011   0.54552195    0.266938457
Max. Temp      0.46538218    0.311018557   0.604838715   0.884954691   0.882972714
Min. Temp      0.942929317   0.604895997   0.951236976   0.620241463   0.824177992
Moisture       0.8687        0.4757        0.2437        0.5491        0.3451
Figure: Cross-Correlation: Ammonium Leaching with Rainfall, Max. Temp., Min. Temp. and Soil Moisture (four panels; y-axis: Correlation, -1 to 1; x-axes: Rainfall Lag, Max. Temp. Lag, Min. Temp. Lag, Soil Moisture Lag)
Overall Split Plot
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  1.166713    0.350414
Cultivation                    1  9.036143    0.013207  *
CompactionCultivation          2  0.168363    0.847393
Block                          2  0.070164    0.932696
Season                         4  3.827587    0.008854  **
SeasonCompaction               8  1.268665    0.281992
SeasonCultivation              4  1.356305    0.263073
SeasonCompactionCultivation    8  0.338753    0.946410
Overall MANOVA
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             5  6    0.32761     2.46     0.151600
Pillai  Cultivation             5  6    0.67239     2.46     0.151600
HL      Cultivation             5  6    2.05238     2.46     0.151600
Roy     Cultivation             5  6    2.05238     2.46     0.151600
Wilk    Compaction             10  12   0.16027     1.8      0.166800
Pillai  Compaction             10  14   1.04071     1.52     0.230400
HL      Compaction             10  6.7  3.98562     2.27     0.149500
Roy     Compaction              5  7    3.64124     5.1      0.027400 *
Wilk    CompactionCultivation  10  12   0.41624     0.66     0.740800
Pillai  CompactionCultivation  10  14   0.64082     0.66     0.742700
HL      CompactionCultivation  10  6.7  1.26535     0.72     0.692100
Roy     CompactionCultivation   5  7    1.14568     1.6      0.274700
Wilk    Block                  10  12   0.18663     1.58     0.224600
Pillai  Block                  10  14   1.12079     1.78     0.156100
HL      Block                  10  6.7  2.71108     1.54     0.294600
Roy     Block                   5  7    1.79176     2.51     0.131200
APPENDIX M TOTAL MINERAL NITROGEN LEACHING

symbol used is *.
Figure: Total Mineral Nitrogen Leaching By Compaction (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 19)
Figure: Total Mineral Nitrogen Leaching By Compaction (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 19)
Correlations
Lag Of         0             1             2             3             4
Rainfall      -0.17141961    0.050710422  -0.47405996   -0.06907472   -0.03634688
Max. Temp      0.243220592   0.225847715   0.003430887  -0.34116018   -0.39590333
Min. Temp      0.062611986   0.105032715  -0.27977062   -0.41998601   -0.31650144
Moisture      -0.03378      -0.02691      -0.05707       0.00437      -0.07238

p value
Lag Of         0             1             2             3             4
Rainfall       0.482869338   0.84161495    0.054549003   0.799349554   0.897674487
Max. Temp      0.315681241   0.367522798   0.989573271   0.195955369   0.144074128
Min. Temp      0.799000809   0.678306393   0.276787792   0.105317744   0.250425909
Moisture       0.5336        0.6293        0.3197        0.9559        0.2359
Figure: Cross-Correlation: Total N. Leaching with Rainfall, Max. Temp., Min. Temp. and Soil Moisture (four panels; y-axis: Correlation, -1 to 1; x-axes: Rainfall Lag, Max. Temp. Lag, Min. Temp. Lag, Soil Moisture Lag)
Overall Split Plot
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  0.675458    0.530696
Cultivation                    1  5.058849    0.048246  *
CompactionCultivation          2  1.110948    0.366697
Block                          2  0.087981    0.916479
Season                         4  6.180628    0.000428  ***
SeasonCompaction               8  1.101983    0.378597
SeasonCultivation              4  0.375236    0.825161
SeasonCompactionCultivation    8  0.937304    0.495150
Overall MANOVA
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             5  6    0.38547     1.91     0.226200
Pillai  Cultivation             5  6    0.61453     1.91     0.226200
HL      Cultivation             5  6    1.59425     1.91     0.226200
Roy     Cultivation             5  6    1.59425     1.91     0.226200
Wilk    Compaction             10  12   0.39965     0.7      0.711200
Pillai  Compaction             10  14   0.71065     0.77     0.654300
HL      Compaction             10  6.7  1.22619     0.7      0.707400
Roy     Compaction              5  7    0.92917     1.3      0.362000
Wilk    CompactionCultivation  10  12   0.49706     0.5      0.858200
Pillai  CompactionCultivation  10  14   0.57887     0.57     0.812400
HL      CompactionCultivation  10  6.7  0.85909     0.49     0.851700
Roy     CompactionCultivation   5  7    0.60774     0.85     0.555400
Wilk    Block                  10  12   0.3873      0.73     0.687900
Pillai  Block                  10  14   0.7444      0.83     0.609100
HL      Block                  10  6.7  1.24196     0.71     0.701200
Roy     Block                   5  7    0.83446     1.17     0.410000
APPENDIX N MICROBIAL CARBON LEVELS

symbol used is *.

Figure: Microbial Carbon Levels (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 14)

Figure: Microbial Carbon Levels (3MA Smoothed) (series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 14)
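The second plot in Appendices N to P is labelled "3MA Smoothed", which denotes a three-point moving average of the monthly means. This smoothing damps month-to-month noise before the treatment lines are compared. The sketch below assumes a simple unweighted average. The text does not state whether the window was centred on each month or covered the current and two preceding months, so both options are shown. The function name and example values are illustrative.

```python
def moving_average(values, window=3, centred=True):
    """Unweighted moving average; None where the window runs off either end."""
    n = len(values)
    half = window // 2
    smoothed = []
    for i in range(n):
        if centred:
            start, stop = i - half, i + half + 1
        else:  # trailing window: current month and the (window - 1) before it
            start, stop = i - window + 1, i + 1
        if start < 0 or stop > n:
            smoothed.append(None)
        else:
            smoothed.append(sum(values[start:stop]) / window)
    return smoothed

monthly_means = [120.0, 150.0, 90.0, 210.0, 180.0]   # illustrative values only
print(moving_average(monthly_means))                 # [None, 120.0, 150.0, 160.0, None]
print(moving_average(monthly_means, centred=False))  # [None, None, 120.0, 150.0, 160.0]
```

A centred window keeps smoothed peaks aligned with the month in which they occur. A trailing window shifts them later by one month, which matters when the smoothed curves are read against the lagged climate correlations.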
Correlations
Lag Of         0             1             2             3             4
Rainfall       0.085570019   0.284682759   0.565368398   0.028254635  -0.2751191
Max. Temp      0.238078047   0.184128101   0.060178663   0.368053099  -0.02979152
Min. Temp      0.227729794   0.285440056  -0.0113084     0.273313942  -0.15943005
Moisture       0.28629       0.20622       0.13565       0.09367       0.19816

p value
Lag Of         0             1             2             3             4
Rainfall       0.727608236   0.252214465   0.018022878   0.917272112   0.320993117
Max. Temp      0.326338458   0.464537119   0.818534269   0.160734648   0.916062829
Min. Temp      0.348417652   0.250899746   0.965641537   0.305719589   0.570338739
Moisture       <.0001        0.0015        0.0464        0.1893        0.0077
Figure: Cross-Correlation: Microbial Carbon with Rainfall, Max. Temp., Min. Temp. and Soil Moisture (four panels; y-axis: Correlation, -1 to 1; x-axes: Rainfall Lag, Max. Temp. Lag, Min. Temp. Lag, Soil Moisture Lag)
Overall Split Plot
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  0.39208     0.685596
Cultivation                    1  0.856956    0.376392
CompactionCultivation          2  3.435646    0.073157
Block                          2  5.59942     0.023358  *
Season                         3  5.247222    0.004160  **
SeasonCompaction               6  2.956556    0.018880  *
SeasonCultivation              3  0.40138     0.752847
SeasonCompactionCultivation    6  1.794473    0.127962
Overall MANOVA
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             4  7    0.39604     2.67     0.121700
Pillai  Cultivation             4  7    0.60396     2.67     0.121700
HL      Cultivation             4  7    1.52498     2.67     0.121700
Roy     Cultivation             4  7    1.52498     2.67     0.121700
Wilk    Compaction              8  14   0.26683     1.64     0.200400
Pillai  Compaction              8  16   0.8934      1.61     0.197200
HL      Compaction              8  8    2.14716     1.79     0.214100
Roy     Compaction              4  8    1.81659     3.63     0.056900
Wilk    CompactionCultivation   8  14   0.28697     1.52     0.236700
Pillai  CompactionCultivation   8  16   0.90919     1.67     0.182800
HL      CompactionCultivation   8  8    1.80119     1.5      0.289500
Roy     CompactionCultivation   4  8    1.2577      2.52     0.124300
Wilk    Block                   8  14   0.17264     2.46     0.067200
Pillai  Block                   8  16   0.93381     1.75     0.161700
HL      Block                   8  8    4.1757      3.48     0.048500 *
Roy     Block                   4  8    4.0224      8.04     0.006600 **
Seasonal Split Plot
Season 1
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  2.338017    0.146878
Cultivation                    1  2.0237218   0.185295
CompactionCultivation          2  1.5792748   0.253489
Block                          2  1.2394761   0.330453
Month                          2  0.7598793   0.478648
MonthCompaction                4  0.7103896   0.592842
MonthCultivation               2  0.0599677   0.941936
MonthCompactionCultivation     4  1.5528522   0.219008

Season 2
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  0.2814361   0.760484
Cultivation                    1  0.1273823   0.728578
CompactionCultivation          2  4.4823787   0.040763  *
Block                          2  2.9838509   0.096336
Month                          2  1.973918    0.160826
MonthCompaction                4  1.3737506   0.272592
MonthCultivation               2  0.3409503   0.714483
MonthCompactionCultivation     4  0.4103859   0.799361

Season 3
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  1.1097742   0.367049
Cultivation                    1  1.7627457   0.213792
CompactionCultivation          2  4.4073715   0.042414  *
Block                          2  10.145989   0.003921  **
Month                          2  2.5780606   0.096778
MonthCompaction                4  0.1928468   0.939761
MonthCultivation               2  0.2881036   0.752240
MonthCompactionCultivation     4  0.1105257   0.977646

Season 4
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  2.0527222   0.179088
Cultivation                    1  0.0168278   0.899359
CompactionCultivation          2  1.6428183   0.241595
Block                          2  2.4818863   0.133289
Month                          2  1.0105902   0.378975
MonthCompaction                4  0.7892213   0.543592
MonthCultivation               2  2.3392555   0.117993
MonthCompactionCultivation     4  0.4520114   0.769935
Seasonal MANOVA
Season 1
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.81681     0.6      0.634000
Pillai  Cultivation             3  8    0.18319     0.6      0.634000
HL      Cultivation             3  8    0.22428     0.6      0.634000
Roy     Cultivation             3  8    0.22428     0.6      0.634000
Wilk    Compaction              6  16   0.41683     1.46     0.252300
Pillai  Compaction              6  18   0.61614     1.34     0.292400
HL      Compaction              6  9.1  1.31996     1.69     0.228500
Roy     Compaction              3  9    1.25704     3.77     0.052900
Wilk    CompactionCultivation   6  16   0.36353     1.76     0.172100
Pillai  CompactionCultivation   6  18   0.73169     1.73     0.171100
HL      CompactionCultivation   6  9.1  1.48885     1.91     0.183100
Roy     CompactionCultivation   3  9    1.28502     3.86     0.050200
Wilk    Block                   6  16   0.5602      0.9      0.521000
Pillai  Block                   6  18   0.46144     0.9      0.516300
HL      Block                   6  9.1  0.74644     0.96     0.502200
Roy     Block                   3  9    0.6905      2.07     0.174400

Season 2
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.87689     0.37     0.774000
Pillai  Cultivation             3  8    0.12311     0.37     0.774000
HL      Cultivation             3  8    0.14039     0.37     0.774000
Roy     Cultivation             3  8    0.14039     0.37     0.774000
Wilk    Compaction              6  16   0.3441      1.88     0.146600
Pillai  Compaction              6  18   0.73405     1.74     0.169100
HL      Compaction              6  9.1  1.67903     2.15     0.144000
Roy     Compaction              3  9    1.53066     4.59     0.032600 *
Wilk    CompactionCultivation   6  16   0.32839     1.99     0.127700
Pillai  CompactionCultivation   6  18   0.75357     1.81     0.152900
HL      CompactionCultivation   6  9.1  1.79561     2.3      0.124800
Roy     CompactionCultivation   3  9    1.64377     4.93     0.027000 *
Wilk    Block                   6  16   0.49282     1.13     0.388200
Pillai  Block                   6  18   0.56443     1.18     0.360300
HL      Block                   6  9.1  0.91297     1.17     0.398500
Roy     Block                   3  9    0.76015     2.28     0.148200

Season 3
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.80247     0.66     0.601300
Pillai  Cultivation             3  8    0.19753     0.66     0.601300
HL      Cultivation             3  8    0.24616     0.66     0.601300
Roy     Cultivation             3  8    0.24616     0.66     0.601300
Wilk    Compaction              6  16   0.69444     0.53     0.775200
Pillai  Compaction              6  18   0.31669     0.56     0.753100
HL      Compaction              6  9.1  0.42398     0.54     0.764200
Roy     Compaction              3  9    0.38203     1.15     0.382100
Wilk    CompactionCultivation   6  16   0.52339     1.02     0.447700
Pillai  CompactionCultivation   6  18   0.47867     0.94     0.488700
HL      CompactionCultivation   6  9.1  0.90666     1.16     0.402000
Roy     CompactionCultivation   3  9    0.90229     2.71     0.107900
Wilk    Block                   6  16   0.15034     4.21     0.009900 **
Pillai  Block                   6  18   1.103       3.69     0.014400 *
HL      Block                   6  9.1  3.96657     5.09     0.014900 *
Roy     Block                   3  9    3.48273     10.45    0.002700 **

Season 4
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.58453     1.9      0.208800
Pillai  Cultivation             3  8    0.41547     1.9      0.208800
HL      Cultivation             3  8    0.71078     1.9      0.208800
Roy     Cultivation             3  8    0.71078     1.9      0.208800
Wilk    Compaction              6  16   0.49965     1.11     0.401300
Pillai  Compaction              6  18   0.5675      1.19     0.356100
HL      Compaction              6  9.1  0.86702     1.11     0.424800
Roy     Compaction              3  9    0.66491     1.99     0.185400
Wilk    CompactionCultivation   6  16   0.57997     0.83     0.560500
Pillai  CompactionCultivation   6  18   0.44453     0.86     0.543800
HL      CompactionCultivation   6  9.1  0.68198     0.87     0.548600
Roy     CompactionCultivation   3  9    0.61307     1.84     0.210300
Wilk    Block                   6  16   0.24935     2.67     0.054200
Pillai  Block                   6  18   0.87697     2.34     0.075500
HL      Block                   6  9.1  2.50382     3.21     0.056300
Roy     Block                   3  9    2.28181     6.85     0.010700 *
APPENDIX O MICROBIAL NITROGEN LEVELS

symbol used is *.
Figure: Microbial Nitrogen Levels By Compaction, Cultivation (y-axis: Mean Microbial Nitrogen (µg/g); series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 14)
Figure: Microbial Nitrogen Levels By Compaction, Cultivation (3MA Smoothed) (y-axis: Mean Microbial Nitrogen (µg/g); series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 14)
Correlations
Lag Of         0             1             2             3             4
Rainfall      -0.12603623    0.09066895    0.139363849  -0.19666723   -0.3582452
Max. Temp      0.065341584  -0.24046069    0.023840581   0.367826802   0.070785178
Min. Temp      0.259654118   0.033387725   0.195881708   0.375130526   0.106448489
Moisture       0.37576       0.34172       0.26111       0.06075       0.15905

p value
Lag Of         0             1             2             3             4
Rainfall       0.607153746   0.720494694   0.593715863   0.465370149   0.189809631
Max. Temp      0.79042018    0.336475947   0.927633935   0.161011908   0.802061769
Min. Temp      0.283038836   0.895365091   0.451168479   0.152224469   0.705740751
Moisture       <.0001        <.0001        0.0001        0.3952        0.033
Figure: Cross-Correlation: Microbial Nitrogen with Rainfall, Max. Temp., Min. Temp. and Soil Moisture (four panels; y-axis: Correlation, -1 to 1; x-axes: Rainfall Lag, Max. Temp. Lag, Min. Temp. Lag, Soil Moisture Lag)
Overall Split Plot
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  0.73746     0.502634
Cultivation                    1  3.028711    0.112427
CompactionCultivation          2  2.483271    0.133166
Block                          2  8.05243     0.008249  **
Season                         3  50.73149    0.000000  ***
SeasonCompaction               6  2.383457    0.048397  *
SeasonCultivation              3  0.360821    0.781635
SeasonCompactionCultivation    6  0.681095    0.665831
Overall MANOVA
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             4  7    0.74643     0.59     0.678200
Pillai  Cultivation             4  7    0.25357     0.59     0.678200
HL      Cultivation             4  7    0.33972     0.59     0.678200
Roy     Cultivation             4  7    0.33972     0.59     0.678200
Wilk    Compaction              8  14   0.29369     1.48     0.249200
Pillai  Compaction              8  16   0.76673     1.24     0.336900
HL      Compaction              8  8    2.19924     1.83     0.204900
Roy     Compaction              4  8    2.10134     4.2      0.040100 *
Wilk    CompactionCultivation   8  14   0.478       0.78     0.626500
Pillai  CompactionCultivation   8  16   0.60274     0.86     0.565500
HL      CompactionCultivation   8  8    0.92313     0.77     0.640200
Roy     CompactionCultivation   4  8    0.67161     1.34     0.333900
Wilk    Block                   8  14   0.19026     2.26     0.086800
Pillai  Block                   8  16   0.9195      1.7      0.173800
HL      Block                   8  8    3.67895     3.07     0.066900
Roy     Block                   4  8    3.51482     7.03     0.009900 **
Seasonal Split Plot
Season 1
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  0.2247053   0.802677
Cultivation                    1  1.0992693   0.319105
CompactionCultivation          2  1.4359927   0.282991
Block                          2  3.7777499   0.059970
Month                          2  0.561242    0.577809
MonthCompaction                4  2.9751665   0.039660  *
MonthCultivation               2  0.7881072   0.466122
MonthCompactionCultivation     4  2.3679733   0.081172

Season 2
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  0.0664551   0.936115
Cultivation                    1  3.0447931   0.111591
CompactionCultivation          2  2.0516011   0.179230
Block                          2  3.5467396   0.068525
Month                          2  19.832871   0.000008  ***
MonthCompaction                4  1.6427854   0.196155
MonthCultivation               2  0.6985255   0.507149
MonthCompactionCultivation     4  0.5739241   0.684182

Season 3
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  2.7697966   0.110357
Cultivation                    1  1.6339222   0.230033
CompactionCultivation          2  2.9406423   0.098986
Block                          2  11.835921   0.002310  **
Month                          2  14.509175   0.000074  ***
MonthCompaction                4  4.4397701   0.007927  **
MonthCultivation               2  1.4491961   0.254577
MonthCompactionCultivation     4  0.4014639   0.805637

Season 4
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  2.3597005   0.144727
Cultivation                    1  1.727438    0.218086
CompactionCultivation          2  1.0963493   0.371108
Block                          2  3.8791388   0.056623
Month                          2  0.4202438   0.661627
MonthCompaction                4  0.662817    0.623892
MonthCultivation               2  0.5661479   0.575107
MonthCompactionCultivation     4  0.7616197   0.560507
Seasonal MANOVA
Season 1
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.67994     1.26     0.352800
Pillai  Cultivation             3  8    0.32006     1.26     0.352800
HL      Cultivation             3  8    0.47073     1.26     0.352800
Roy     Cultivation             3  8    0.47073     1.26     0.352800
Wilk    Compaction              6  16   0.31874     2.06     0.116700
Pillai  Compaction              6  18   0.70365     1.63     0.196600
HL      Compaction              6  9.1  2.06713     2.65     0.090700
Roy     Compaction              3  9    2.03257     6.1      0.015000 *
Wilk    CompactionCultivation   6  16   0.40742     1.51     0.237100
Pillai  CompactionCultivation   6  18   0.72154     1.69     0.180100
HL      CompactionCultivation   6  9.1  1.13792     1.46     0.292000
Roy     CompactionCultivation   3  9    0.65374     1.96     0.190400
Wilk    Block                   6  16   0.28721     2.31     0.084800
Pillai  Block                   6  18   0.78514     1.94     0.129100
HL      Block                   6  9.1  2.22986     2.86     0.075500
Roy     Block                   3  9    2.1105      6.33     0.013400 *

Season 2
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.72433     1.01     0.435300
Pillai  Cultivation             3  8    0.27567     1.01     0.435300
HL      Cultivation             3  8    0.38058     1.01     0.435300
Roy     Cultivation             3  8    0.38058     1.01     0.435300
Wilk    Compaction              6  16   0.40329     1.53     0.230500
Pillai  Compaction              6  18   0.70941     1.65     0.191200
HL      Compaction              6  9.1  1.20019     1.54     0.268300
Roy     Compaction              3  9    0.88411     2.65     0.112200
Wilk    CompactionCultivation   6  16   0.38277     1.64     0.199300
Pillai  CompactionCultivation   6  18   0.74425     1.78     0.160500
HL      CompactionCultivation   6  9.1  1.28073     1.64     0.240700
Roy     CompactionCultivation   3  9    0.92006     2.76     0.103800
Wilk    Block                   6  16   0.21925     3.03     0.035700 *
Pillai  Block                   6  18   1.00437     3.03     0.031700 *
HL      Block                   6  9.1  2.5411      3.26     0.054200
Roy     Block                   3  9    2.04151     6.12     0.014800 *

Season 3
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.47535     2.94     0.098700
Pillai  Cultivation             3  8    0.52465     2.94     0.098700
HL      Cultivation             3  8    1.1037      2.94     0.098700
Roy     Cultivation             3  8    1.1037      2.94     0.098700
Wilk    Compaction              6  16   0.27138     2.45     0.071000
Pillai  Compaction              6  18   0.73067     1.73     0.172000
HL      Compaction              6  9.1  2.67722     3.43     0.047100 *
Roy     Compaction              3  9    2.67439     8.02     0.006500 **
Wilk    CompactionCultivation   6  16   0.44624     1.33     0.302300
Pillai  CompactionCultivation   6  18   0.61073     1.32     0.299100
HL      CompactionCultivation   6  9.1  1.11325     1.43     0.302100
Roy     CompactionCultivation   3  9    0.98342     2.95     0.090800
Wilk    Block                   6  16   0.17164     3.77     0.015700 *
Pillai  Block                   6  18   0.8329      2.14     0.098500
HL      Block                   6  9.1  4.79977     6.15     0.008000 **
Roy     Block                   3  9    4.79426     14.38    0.000900 ***

Season 4
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             3  8    0.65149     1.43     0.305000
Pillai  Cultivation             3  8    0.34851     1.43     0.305000
HL      Cultivation             3  8    0.53495     1.43     0.305000
Roy     Cultivation             3  8    0.53495     1.43     0.305000
Wilk    Compaction              6  16   0.51769     1.04     0.436400
Pillai  Compaction              6  18   0.5572      1.16     0.370400
HL      Compaction              6  9.1  0.78701     1.01     0.474800
Roy     Compaction              3  9    0.49443     1.48     0.283800
Wilk    CompactionCultivation   6  16   0.60014     0.78     0.600600
Pillai  CompactionCultivation   6  18   0.41514     0.79     0.592300
HL      CompactionCultivation   6  9.1  0.6408      0.82     0.580100
Roy     CompactionCultivation   3  9    0.59824     1.79     0.218100
Wilk    Block                   6  16   0.31867     2.06     0.116600
Pillai  Block                   6  18   0.82148     2.09     0.105300
HL      Block                   6  9.1  1.6983      2.18     0.140600
Roy     Block                   3  9    1.3795      4.14     0.042300 *
APPENDIX P MICROBIAL C:N RATIO

symbol used is *.
Figure: Microbial Carbon to Nitrogen Ratio By Compaction, Cultivation (y-axis: Mean Microbial Carbon to Nitrogen Ratio; series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 14)
Figure: Microbial Carbon to Nitrogen Ratio By Compaction, Cultivation (3MA Smoothed) (y-axis: Mean Microbial Carbon to Nitrogen Ratio; series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough; x-axis: Month, 1 to 14)
Correlations
Lag Of         0             1             2             3             4
Rainfall       0.181414152   0.008497303   0.068060908   0.249237729   0.316141731
Max. Temp      0.019091962   0.354591055  -0.00338806   -0.29198277   -0.09824805
Min. Temp     -0.21189515    0.075656439  -0.23589943   -0.34000751   -0.19648889
Moisture      -0.25247      -0.28758      -0.24208       0.00542      -0.03484

p value
Lag Of         0             1             2             3             4
Rainfall       0.457310377   0.973305208   0.795212646   0.351911053   0.250994357
Max. Temp      0.938164708   0.148792127   0.989703426   0.272499568   0.7275801
Min. Temp      0.383821949   0.765423105   0.36202593    0.19756854    0.482761667
Moisture       <.0001        <.0001        0.0003        0.9396        0.6425
Figure: Cross-Correlation: Microbial C:N Ratio with Rainfall, Max. Temp., Min. Temp. and Soil Moisture (four panels; y-axis: Correlation, -1 to 1; x-axes: Rainfall Lag, Max. Temp. Lag, Min. Temp. Lag, Soil Moisture Lag)
Overall Split Plot
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  1.786044    0.217153
Cultivation                    1  7.516088    0.020777  *
CompactionCultivation          2  0.102909    0.903153
Block                          2  5.37481     0.025999  *
Season                         3  107.9573    0.000000  ***
SeasonCompaction               6  3.059472    0.015975  *
SeasonCultivation              3  4.632102    0.007704  **
SeasonCompactionCultivation    6  5.364746    0.000483  ***
Overall MANOVA
Test    Source Of Variation   Ndf  Ddf  Test Stat.  F Value  p value  Sig
Wilk    Cultivation             4  7    0.25088     5.23     0.028600 *
Pillai  Cultivation             4  7    0.74912     5.23     0.028600 *
HL      Cultivation             4  7    2.98597     5.23     0.028600 *
Roy     Cultivation             4  7    2.98597     5.23     0.028600 *
Wilk    Compaction              8  14   0.19546     2.21     0.093100
Pillai  Compaction              8  16   1.0953      2.42     0.062900
HL      Compaction              8  8    2.62856     2.19     0.144100
Roy     Compaction              4  8    1.80393     3.61     0.057800
Wilk    CompactionCultivation   8  14   0.19083     2.26     0.087500
Pillai  CompactionCultivation   8  16   1.07103     2.31     0.073700
HL      CompactionCultivation   8  8    2.86792     2.39     0.119600
Roy     CompactionCultivation   4  8    2.26102     4.52     0.033400 *
Wilk    Block                   8  14   0.23047     1.9      0.141100
Pillai  Block                   8  16   0.90789     1.66     0.184000
HL      Block                   8  8    2.73856     2.28     0.132200
Roy     Block                   4  8    2.49825     5        0.025800 *
Seasonal Split Plot
Season 1
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  3.1441144   0.087223
Cultivation                    1  0.1351852   0.720778
CompactionCultivation          2  0.0376455   0.963190
Block                          2  3.1389074   0.087502
Month                          2  0.0707726   0.931867
MonthCompaction                4  3.2318507   0.029541  *
MonthCultivation               2  3.2104226   0.058141
MonthCompactionCultivation     4  9.3208489   0.000108  ***

Season 2
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  1.9565573   0.191813
Cultivation                    1  10.422711   0.009045  **
CompactionCultivation          2  3.0133435   0.094576
Block                          2  1.0109636   0.398226
Month                          2  121.84314   0.000000  ***
MonthCompaction                4  3.4508415   0.023072  *
MonthCultivation               2  0.7271065   0.493650
MonthCompactionCultivation     4  1.8520383   0.151800

Season 3
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  3.5457504   0.068564
Cultivation                    1  0.1743075   0.685134
CompactionCultivation          2  4.2775027   0.045467  *
Block                          2  7.0336997   0.012384  *
Month                          2  20.237078   0.000007  ***
MonthCompaction                4  14.306756   0.000004  ***
MonthCultivation               2  9.5827093   0.000873  ***
MonthCompactionCultivation     4  1.0279365   0.413076

Season 4
Source Of Variation           Df  F Value     p value   Sig
Compaction                     2  2.0741743   0.176389
Cultivation                    1  11.677755   0.006577  **
CompactionCultivation          2  8.6491272   0.006597  **
Block                          2  0.9382031   0.423228
Month                          2  1.3062356   0.289409
MonthCompaction                4  0.6745547   0.616144
MonthCultivation               2  4.9171217   0.016228  *
MonthCompactionCultivation     4  1.4502301   0.248306
Seasonal MANOVA

Season 1
Test    Source Of Variation   Ndf  Ddf   Test Stat.  F Value  p value   Sig
Wilk    Cultivation           3    8     0.48111     2.88     0.103200
Pillai  Cultivation           3    8     0.51889     2.88     0.103200
HL      Cultivation           3    8     1.07854     2.88     0.103200
Roy     Cultivation           3    8     1.07854     2.88     0.103200
Wilk    Compaction            6    16    0.26806     2.48     0.068300
Pillai  Compaction            6    18    0.78752     1.95     0.127400
HL      Compaction            6    9.1   2.5231      3.23     0.055200
Roy     Compaction            3    9     2.43805     7.31     0.008700  **
Wilk                          6    16    0.15982     4        0.012200  *
Pillai                        6    18    0.896       2.43     0.066900
HL                            6    9.1   4.90792     6.29     0.007500  **
Roy                           3    9     4.8357      14.51    0.000900  ***
Wilk    Block                 6    16    0.33944     1.91     0.140900
Pillai  Block                 6    18    0.7683      1.87     0.141400
HL      Block                 6    9.1   1.62859     2.09     0.153300
Roy     Block                 3    9     1.40223     4.21     0.040700
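The Ndf and Ddf columns of the MANOVA tables match the standard F approximations for the four criteria: Rao's approximation for Wilks' lambda (the Wilk rows), the usual approximation for Pillai's trace, McKeon's approximation for the Hotelling-Lawley (HL) trace, and the upper-bound F for Roy's largest root. The tables do not say which software produced them, so the Python sketch below is an independent reconstruction, not the author's code. It assumes p = 3 responses (the three months in a season) and nu_e = 10 error df, both inferred from the tabulated degrees of freedom; `manova_f` is a hypothetical helper name.

```python
import math


def manova_f(stat: float, test: str, p: int, q: int, nu_e: float):
    """Approximate F, numerator df and denominator df for a MANOVA criterion.

    stat : value of the test statistic
    test : "Wilk" (Wilks' lambda), "Pillai", "HL" or "Roy"
    p    : number of response variables
    q    : hypothesis df of the effect
    nu_e : error df
    """
    s = min(p, q)
    m = (abs(p - q) - 1) / 2
    n = (nu_e - p - 1) / 2

    if test == "Wilk":
        # Rao's F approximation (exact when min(p, q) <= 2)
        denom = p**2 + q**2 - 5
        t = math.sqrt((p**2 * q**2 - 4) / denom) if denom > 0 else 1.0
        r = nu_e - (p - q + 1) / 2
        u = (p * q - 2) / 4
        ndf, ddf = p * q, r * t - 2 * u
        root = stat ** (1 / t)
        f = (1 - root) / root * ddf / ndf
    elif test == "Pillai":
        ndf, ddf = s * (2 * m + s + 1), s * (2 * n + s + 1)
        f = (2 * n + s + 1) / (2 * m + s + 1) * stat / (s - stat)
    elif test == "HL":
        # McKeon's approximation; this sketch only handles n > 1
        if n <= 1:
            raise ValueError("sketch requires n > 1")
        b = (p + 2 * n) * (q + 2 * n) / (2 * (2 * n + 1) * (n - 1))
        k = (p * q + 2) / (b - 1)
        c = (2 + k) / (2 * n)
        ndf, ddf = p * q, 4 + k
        f = stat / c * ddf / ndf
    elif test == "Roy":
        # Roy's largest root: this F is an upper bound
        r_max = max(p, q)
        ndf, ddf = r_max, nu_e - r_max + q
        f = stat * ddf / r_max
    else:
        raise ValueError(f"unknown test: {test}")
    return f, ndf, ddf


if __name__ == "__main__":
    # Season 1 statistics, assuming p = 3 months and nu_e = 10
    for stat, test, q in [(0.48111, "Wilk", 1), (0.26806, "Wilk", 2),
                          (0.78752, "Pillai", 2), (2.5231, "HL", 2),
                          (2.43805, "Roy", 2)]:
        f, ndf, ddf = manova_f(stat, test, p=3, q=q, nu_e=10)
        print(f"{test:<7} q={q}  F={f:.2f}  Ndf={ndf:g}  Ddf={ddf:.1f}")
```

With these assumptions the Season 1 statistics map back to the reported F values (2.88, 2.48, 1.95, 3.23 and 7.31) and to the Ddf entries 8, 16, 18, 9.1 and 9.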
Season 2
Test    Source Of Variation   Ndf  Ddf   Test Stat.  F Value  p value   Sig
Wilk    Cultivation           3    8     0.45011     3.26     0.080700
Pillai  Cultivation           3    8     0.54989     3.26     0.080700
HL      Cultivation           3    8     1.22167     3.26     0.080700
Roy     Cultivation           3    8     1.22167     3.26     0.080700
Wilk    Compaction            6    16    0.13576     4.57     0.006900  **
Pillai  Compaction            6    18    0.9546      2.74     0.045300  *
HL      Compaction            6    9.1   5.70026     7.31     0.004500  **
Roy     Compaction            3    9     5.58101     16.74    0.000500  ***
Wilk                          6    16    0.14847     4.25     0.009500  **
Pillai                        6    18    1.17356     4.26     0.007600  **
HL                            6    9.1   3.56643     4.57     0.020700  *
Roy                           3    9     2.78865     8.37     0.005700  **
Wilk    Block                 6    16    0.12022     5.02     0.004500  **
Pillai  Block                 6    18    1.28391     5.38     0.002400  **
HL      Block                 6    9.1   3.95635     5.07     0.015000  *
Roy     Block                 3    9     2.72092     8.16     0.006200  **

Season 3
Test    Source Of Variation   Ndf  Ddf   Test Stat.  F Value  p value   Sig
Wilk    Cultivation           3    8     0.1361      16.93    0.000800  ***
Pillai  Cultivation           3    8     0.8639      16.93    0.000800  ***
HL      Cultivation           3    8     6.34779     16.93    0.000800  ***
Roy     Cultivation           3    8     6.34779     16.93    0.000800  ***
Wilk    Compaction            6    16    0.12778     4.79     0.005600  **
Pillai  Compaction            6    18    0.89702     2.44     0.066500
HL      Compaction            6    9.1   6.63165     8.5      0.002600  **
Roy     Compaction            3    9     6.60224     19.81    0.000300  ***
Wilk                          6    16    0.21373     3.1      0.032800  *
Pillai                        6    18    0.78645     1.94     0.128200
HL                            6    9.1   3.67808     4.72     0.018900  *
Roy                           3    9     3.67785     11.03    0.002300  **
Wilk    Block                 6    16    0.08989     6.23     0.001600  **
Pillai  Block                 6    18    1.07905     3.52     0.017600  *
HL      Block                 6    9.1   8.24579     10.57    0.001200  **
Roy     Block                 3    9     8.01118     24.03    0.000100  ***
Season 4
Test    Source Of Variation   Ndf  Ddf   Test Stat.  F Value  p value   Sig
Wilk    Cultivation           3    8     0.28018     6.85     0.013400  *
Pillai  Cultivation           3    8     0.71982     6.85     0.013400  *
HL      Cultivation           3    8     2.56911     6.85     0.013400  *
Roy     Cultivation           3    8     2.56911     6.85     0.013400  *
Wilk    Compaction            6    16    0.58204     0.83     0.564700
Pillai  Compaction            6    18    0.44206     0.85     0.547900
HL      Compaction            6    9.1   0.67669     0.87     0.552600
Roy     Compaction            3    9     0.60867     1.83     0.212600
Wilk                          6    16    0.26025     2.56     0.062200
Pillai                        6    18    0.90155     2.46     0.064600
HL                            6    9.1   2.22072     2.85     0.076300
Roy                           3    9     1.89214     5.68     0.018400
Wilk    Block                 6    16    0.68459     0.56     0.758500
Pillai  Block                 6    18    0.3251      0.58     0.739900
HL      Block                 6    9.1   0.44658     0.57     0.744100
Roy     Block                 3    9     0.41226     1.24     0.352300
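Most MANOVA rows above have an even Ndf (6), and for an even numerator df the F upper tail reduces to a finite sum, because the regularized incomplete beta function I_x(a, b) with integer b has a closed form. The Python sketch below uses that identity to recompute p values from the F values in the tables; since those F values are printed to two decimals, agreement is only approximate. Rows with Ndf 3 (the Cultivation rows and Roy's test) need a general incomplete-beta routine and are not covered. The function name is hypothetical.

```python
def f_pvalue_even_ndf(f: float, ndf: int, ddf: float) -> float:
    """Upper-tail p value P(F > f) for F(ndf, ddf) when ndf is even.

    P(F > f) equals the regularized incomplete beta I_x(ddf/2, ndf/2)
    with x = ddf / (ddf + ndf * f). When the second shape parameter
    b = ndf/2 is an integer,
        I_x(a, b) = x**a * sum_{j=0}^{b-1} C(a + j - 1, j) * (1 - x)**j,
    and this finite sum also holds for non-integer a (e.g. ddf = 9.1).
    """
    if ndf <= 0 or ndf % 2 != 0:
        raise ValueError("ndf must be a positive even integer")
    a = ddf / 2.0
    b = ndf // 2
    x = ddf / (ddf + ndf * f)
    term = 1.0   # j = 0 term of the sum
    total = 1.0
    for j in range(1, b):
        # ratio of consecutive terms: (a + j - 1) / j * (1 - x)
        term *= (a + j - 1) / j * (1.0 - x)
        total += term
    return x**a * total


if __name__ == "__main__":
    # Season 2, Compaction, Wilk row: F = 4.57 on (6, 16) df; tabulated p = 0.0069
    print(round(f_pvalue_even_ndf(4.57, 6, 16), 4))
    # Season 1, Compaction, HL row: F = 3.23 on (6, 9.1) df; tabulated p = 0.0552
    print(round(f_pvalue_even_ndf(3.23, 6, 9.1), 4))
```

These give about 0.0069 and 0.0553. The small gap in the second case comes from the F value being rounded to two decimals and the Ddf to one.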
