DSAASTAT - By Andrea Onofri

DSAASTAT: A NEW EXCEL VBA MACRO TO PERFORM BASIC STATISTICAL ANALYSES OF FIELD TRIALS
Andrea Onofri
Department of Agriculture and Environmental Sciences, University of Perugia, Borgo XX Giugno 74 - 06121 Perugia - ITALY

1. Introduction
2. General features
3. Specifications
4. Usage
5. Conclusions
6. Acknowledgements
7. References and further readings

Reference: Onofri A., 2007. Routine statistical analyses of field experiments by using an Excel extension. Proceedings 6th National Conference Italian Biometric Society: "La statistica nelle scienze della vita e dell'ambiente", Pisa, 20-22 June 2007, 93-96.
Version 1.1 (Update: 18/03/2011)

1. Introduction
Field agricultural experiments are generally planned to evaluate the actual effect produced by man-made chemical substances or human activities on crop yield and quality, environmental health, farmers' income and so on. Field experiments include the testing of new and traditional varieties, fertilisers (types and doses), pesticides (types and doses), cultural practices and rotations. With respect to greenhouse or laboratory experiments, field

http://accounts.unipg.it/~onofri/DSAASTAT/DSAASTAT.htm[14-May-12 11:48:26 PM]

trials have some important peculiarities, such as high variability (in response to genotype, environment and soil heterogeneity) and the need for plots big enough for machinery to operate. These peculiarities have pushed experimenters towards certain types of designs, such as the balanced randomised block design (to account for part of the soil variability) and the split-plot factorial design, which is very handy whenever one experimental factor requires big plots. Furthermore, replicating experiments across environments (years and/or locations) is common practice, to cope with environmental variability. These types of designs are not as common in greenhouse or, above all, laboratory experiments.
Innovative multivariate methods have recently been proposed for field experiments (Annichiarico, 1997; Crossa, 1990), but it is reasonable to state that traditional statistical methods (ANOVA, multiple comparison tests, correlation and linear regression) still play a major role in data analysis (Onofri and Ciriciofolo, 2004). Even though those traditional methods were established long ago and have become routine, some practical problems still exist. Routine agricultural experiments are very often carried out by extension and technical services, which frequently employ people with great experience in agriculture but limited training in statistics. This is particularly true in developing countries, where the importance of statistical analyses is often not perceived. Furthermore, time is always one of the most limiting resources in field research, so the availability of good computing facilities for the statistical tasks is very important. Unfortunately, the available software is either difficult to use, or very expensive, or not specifically designed for routine field experiments.
With reference to this latter aspect, statistical packages such as SAS (SAS Institute, 1985) or R (R Development Core Team, 2003) very often require some programming to perform ANOVA, multiple comparison tests or the tests that assess whether the basic assumptions for statistical analysis are met. This is seldom a problem for scientists, but it may trouble field technicians or students with a limited background in statistics and computing. It is therefore relevant to develop cheap, easily accessible and user-friendly specialised software for analysing data from field experiments.
The set of routines presented here has been progressively developed over the years, to support the intensive field research activity carried out at the author's institution. The aims were, on one side, to address all the peculiarities of routine field experiments and, on the other side, to enable users with a limited background in statistics and computer programming (mainly students and technicians) to perform statistical analyses directly from within the most widespread spreadsheet. The aim of this short note is to bring DSAASTAT to the attention of readers, as it may represent a quick, no-cost solution for performing sound statistical analyses of data from routine field trials.

2. General features

The macro was written in the VBA language of EXCEL 97 (Giaccaglini, 1997) and runs as an add-in to EXCEL 97 or higher. DSAASTAT performs the following tasks:
1. reads experimental data from the spreadsheet
2. performs the diagnostic analyses requested by the user, to verify whether the basic assumptions for ANOVA are met
3. performs the ANOVA
4. performs the multiple comparison tests requested by the user
5. calculates correlation matrices
6. performs simple linear regression analysis (up to fourth order polynomial)
7. performs multiple regression
8. compares regression lines.

3. Specifications
Analysis of Variance
DSAASTAT performs the ANOVA of experiments with up to four explanatory variables (five, including the block effect). In all cases, both completely randomised and randomised block designs are handled, while Latin square designs are considered only with one-way ANOVA. In the case of multi-way ANOVA, both factorial and split-plot designs may be analysed. One-way experiments may be repeated in different years (with the same or a different randomisation each year) and/or in different locations; locations may be the same or different across years. Two-way experiments may be repeated in different years (with the same or a different randomisation) or in different locations. In any case, years or environments can be set as fixed or random factors. In total, 23 different types of designs may be analysed by using DSAASTAT, covering a large part of the needs of field experimenters (LeClerg et al., 1962). One important aspect to be considered is that designs must be balanced; this is the norm with field experiments, so it does not represent a real practical limitation. However, one-way completely randomised unbalanced designs may also be analysed, to meet the possible requirements of simple greenhouse and laboratory experiments.
Diagnostic tools
A large part of field experiments is, or may be regarded as, one-way ANOVA. For these designs, DSAASTAT can look for violations of the basic assumptions for ANOVA and suggest possible remedies. The presence of outliers may be detected by the analysis of residuals, as shown by Anscombe and Tukey (1963). The largest residual is inspected by using a 'premium' of 2.5% (this being the percentage decrease in error variance when the outlier is rejected; see the cited paper for an explanation).
In the case of randomised block designs, multiple outliers are detected by a recursive procedure, wherein the first outlier is corrected by using the rule proposed by LeClerg et al. (1962), then residuals are recomputed and inspected again, until no more outliers can be found.
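As a rough illustration of the quantities involved in this residual analysis, the following Python sketch computes the ANOVA of a balanced randomised block design together with the expected and residual values. It is not DSAASTAT's VBA code: the function name `rcbd_anova`, the dictionary keys and the example yields are hypothetical, and the outlier rule, the correction of outliers and the probability level of the F test are deliberately left out.

```python
def rcbd_anova(data):
    """ANOVA for a balanced randomised block design.

    data[i][j] is the response of treatment i in block j
    (one observation per treatment and block).
    """
    t = len(data)       # number of treatments
    b = len(data[0])    # number of blocks
    n = t * b

    grand = sum(sum(row) for row in data) / n
    trt_means = [sum(row) / b for row in data]
    blk_means = [sum(data[i][j] for i in range(t)) / t for j in range(b)]

    # Expected values under the additive model: treatment + block - grand mean
    expected = [[trt_means[i] + blk_means[j] - grand for j in range(b)]
                for i in range(t)]
    residuals = [[data[i][j] - expected[i][j] for j in range(b)]
                 for i in range(t)]

    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_treat = b * sum((m - grand) ** 2 for m in trt_means)
    ss_block = t * sum((m - grand) ** 2 for m in blk_means)
    ss_error = sum(r ** 2 for row in residuals for r in row)

    df_treat, df_block, df_error = t - 1, b - 1, (t - 1) * (b - 1)
    ms_error = ss_error / df_error

    return {
        "ss_total": ss_total, "ss_treat": ss_treat,
        "ss_block": ss_block, "ss_error": ss_error,
        "df_treat": df_treat, "df_block": df_block, "df_error": df_error,
        "ms_error": ms_error,
        "f_treat": (ss_treat / df_treat) / ms_error,
        "expected": expected, "residuals": residuals,
    }


# Hypothetical yields: 3 treatments (rows) in 4 blocks (columns)
example = [[10, 12, 11, 13],
           [14, 15, 13, 16],
           [9, 11, 10, 12]]
result = rcbd_anova(example)
print("F for treatments:", round(result["f_treat"], 2))
print("Largest absolute residual:",
      round(max(abs(r) for row in result["residuals"] for r in row), 3))
```

In a balanced design the residuals sum to zero within every treatment and every block, and the treatment, block and error sums of squares add up to the total; both properties make convenient checks when the calculation is reproduced by hand or in a spreadsheet.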

At the end of the process, the user obtains observed, expected and residual values, plus a scatterplot of expected values vs. residuals, which may help to visualise possible problems related to non-normality of the distribution and heteroscedasticity of the response. Likewise, the user obtains a list of outliers and suggested values for correction. On randomised block designs, possible non-additivity is tested by using Tukey's procedure (Snedecor and Cochran, 1991). The user obtains an F test for non-additivity and a p value which can be used to transform data, within the Box-Cox transformation family (see later). The homogeneity of variances is tested by using Bartlett's test and Levene's test, the latter being less sensitive to non-normality of the distribution (Snedecor and Cochran, 1991). The user obtains respectively a χ² value (Bartlett's test) and an F value (Levene's test), with the corresponding probability levels. To help the user find a possible correcting transformation, the procedure proposed by Box and Cox (1964) has been implemented, as described by Draper and Smith (1981). The basic family of transformations is adopted, in the form:

$$W = \begin{cases} \dfrac{Y^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \ln Y & \text{if } \lambda = 0 \end{cases}$$

where W is the transformed variable, Y is the untransformed variable and λ is the transformation parameter. Values of λ ranging from -2.5 to 2.5 are iteratively used and the corresponding log-likelihood is calculated by DSAASTAT; the user may select the value of λ which corresponds to the maximum log-likelihood, that is the maximum likelihood estimate of λ. Confidence limits for λ are also provided, to verify whether a transformation is needed at all.
Multiple comparison tests
Multiple comparison tests are often used (and misused) in field agricultural experiments. A wide range of procedures has been implemented in DSAASTAT: readers are referred to Chew (1976) for a description and comparison of the procedures. In detail, Fisher's LSD, Duncan's MRT, the Newman-Keuls test, Tukey's multiple range test and HSD, Scheffé's method, Dunnett's test for a two-sided comparison with a control and the Scott-Knott cluster procedure have all been implemented, while the Waller-Duncan Bayesian LSD is currently being developed. All the procedures are accessible through a user-friendly input window that requires as inputs the residual sum of squares (from ANOVA), the degrees of freedom and the number of replications.
Correlation matrices and linear regression analyses
DSAASTAT allows users to calculate correlation matrices and to perform linear regression analysis (up to 4th order polynomial). The user may select which parameters to include in or exclude from the model. Multiple regression may also be easily performed. These functions are already implemented in EXCEL, but DSAASTAT adds more flexibility and a user-friendly interface to define the model. It is also possible to compare
regression lines, in order to verify whether they may be considered similar (not statistically different) or whether they have similar slopes or intercepts. This is done by using 'dummy variables' and simultaneously fitting all the regression lines to the whole dataset. At the beginning, the model is allowed to assume different slopes and intercepts for all the lines. Afterwards, the model is reduced by forcing the lines to assume common slopes and/or common intercepts, and judgments are made on the basis of consecutive F-tests (Draper and Smith, 1981).
Validation of the macro
DSAASTAT has been extensively validated by using datasets of several types; results were compared to those obtained by the GLM procedure of SAS (SAS Institute, 1985). Any remaining bugs may be reported to the author.
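The comparison of regression lines by dummy variables described in this section can be sketched in a few lines of Python. This is only an illustration of the general technique under stated assumptions, not the code used by DSAASTAT: the helper names (`design`, `fit_rss`, `f_statistic`), the two-group dataset and the choice of the larger model of each pair as the denominator of the F ratio are all hypothetical, and the probability level of each F value is omitted because it requires an F distribution function.

```python
def design(x, group, levels, separate_intercepts, separate_slopes):
    """Design matrix with dummy variables; levels[0] is the baseline line."""
    rows = []
    for xi, gi in zip(x, group):
        dummies = [1.0 if gi == lv else 0.0 for lv in levels[1:]]
        row = [1.0, float(xi)]
        if separate_intercepts:
            row += dummies                        # intercept shifts
        if separate_slopes:
            row += [d * xi for d in dummies]      # slope shifts
        rows.append(row)
    return rows


def fit_rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X.

    Solves the normal equations by Gauss-Jordan elimination; assumes the
    columns of X are linearly independent.
    """
    p = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def f_statistic(rss_reduced, p_reduced, rss_larger, p_larger, n):
    """F ratio for a reduced model nested in a larger one."""
    numerator = (rss_reduced - rss_larger) / (p_larger - p_reduced)
    return numerator / (rss_larger / (n - p_larger))


# Hypothetical data: two regression lines, five points each
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
group = ["A"] * 5 + ["B"] * 5
y = [3.1, 4.9, 7.2, 8.8, 11.0, 5.0, 7.1, 8.9, 11.2, 12.9]
levels = ["A", "B"]
n = len(y)

rss_full = fit_rss(design(x, group, levels, True, True), y)        # 4 parameters
rss_parallel = fit_rss(design(x, group, levels, True, False), y)   # 3 parameters
rss_single = fit_rss(design(x, group, levels, False, False), y)    # 2 parameters

print("F, common slope vs. separate slopes:",
      f_statistic(rss_parallel, 3, rss_full, 4, n))
print("F, single line vs. parallel lines:",
      f_statistic(rss_single, 2, rss_parallel, 3, n))
```

Each F value is then compared with the F distribution with (difference in number of parameters, n minus number of parameters of the larger model) degrees of freedom; a small F suggests that the simpler model describes the lines adequately.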

4. Usage
Please make sure you have downloaded the latest version (see top of page). The file DSAASTAT.XLS containing the macro DSAASTAT can be put anywhere on the hard disk or on external devices (pen drives, removable media...). After doing this, you might like to create a shortcut on your desktop to launch the macro more easily. Please also make sure that macros are enabled in EXCEL. To enable them, you need to follow a version-dependent procedure.
Excel 2010
To enable macros in EXCEL 2010: click the FILE menu and select OPTIONS from the left sidebar. Select TRUST CENTER and press TRUST CENTER SETTINGS. In Trust Center Settings, select MACRO SETTINGS from the left sidebar, choose DISABLE ALL MACROS WITH NOTIFICATION and hit the OK button. Save and close Excel completely, reopen Excel, open DSAASTAT.XLS and you will see a SECURITY WARNING notification that will let you enable macros. You will have to do this every time, unless you selected the radio button for "Enable all macros" in the window above (do so only if you know what you are doing!).
Excel 2007
Click the Office Button, then the "Excel Options" button in the lower right of the menu. Click the "Trust Center" button on the left. Then, at the bottom right, select "Trust Center Settings". In the next window, select "Macro Settings", then select the radio button for "Disable all macros with notification". To close the Trust Center window, click the lower right "OK" button. Save and close Excel completely, reopen Excel, open DSAASTAT.XLS and you will see a SECURITY WARNING notification beneath the Office ribbon. In the SECURITY WARNING banner, click the "Options" button and select the radio button beside "Enable this content", then click "OK". You will have to do this every time, unless you
selected the radio button for "Enable all macros" in the window above.
Excel 97 to 2003
Click on 'Tools' to open the menu; then click on 'Macro' and then on 'Security'.

The Security window will now open. Select the "Medium" alternative. From now on, each time the programme is launched, EXCEL opens and shows the Security Warning window: click on "Enable Macros". You will have to do this every time, unless you selected the Security Level "Low" in the window above.

To launch the programme, open the file DSAASTAT.XLS or click on your desktop icon (if you created a shortcut manually) and EXCEL is started. By clicking on the EXCEL menu bar to open the "Tools" menu, the user can see that five new entries have been added (Diagnostic tools, Anova, Multiple comparison tests, Correlation matrices, Regression analyses), which can be used to start the analyses. In Excel 2007 or 2010, the entries are added to the 'Add-Ins' tab of the ribbon.
Data preparation: all Excel versions
When launching the macros, the user is prompted to select the input data, which have to be prepared as follows (Fig. 1). Data have to be organised as a database, with observations in rows and variables in columns. Basically, one response variable plus one or more explanatory variables are required, the latter coding for each effect included in the ANOVA model or for the regression curves to be compared. The first row is reserved for variable names; the response variable must be numeric and measured on a continuous scale. Explanatory variables may be either numeric or string. Numeric variables are obviously required for correlation or regression analyses, though when several curves have to be compared, the curves can be coded by a string variable.
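The layout rules above (variable names in the first row, one observation per row, a numeric response) can be illustrated with a small Python sketch. It only mimics the arrangement of the worksheet range and is not part of DSAASTAT: the function `check_layout`, the column names "Variety", "Block" and "Yield" and the values are hypothetical.

```python
def check_layout(table, response):
    """Check a table laid out with names in the first row and one
    observation per following row; return the number of observations."""
    header, *rows = table
    if not all(isinstance(name, str) and name for name in header):
        raise ValueError("the first row must contain variable names")
    if response not in header:
        raise ValueError(f"response variable {response!r} not found")
    col = header.index(response)
    for line, row in enumerate(rows, start=2):
        if len(row) != len(header):
            raise ValueError(f"row {line}: wrong number of columns")
        value = row[col]
        # The response must be numeric; explanatory variables may be
        # numeric or string, so they are not checked here.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"row {line}: response is not numeric")
    return len(rows)


# Hypothetical one-way randomised block experiment:
# two explanatory variables (string and numeric) and one response
example_table = [
    ["Variety", "Block", "Yield"],
    ["A", 1, 4.52],
    ["A", 2, 4.81],
    ["B", 1, 5.10],
    ["B", 2, 5.43],
    ["C", 1, 3.98],
    ["C", 2, 4.15],
]
print(check_layout(example_table, "Yield"), "observations")
```

The same arrangement applies when regression lines have to be compared: the numeric explanatory variable occupies one column and a string column identifies the line each observation belongs to.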

Figure 1. Example of data selection.
All the other steps are menu driven and do not present any particular difficulties. However, an example spreadsheet is provided together with DSAASTAT.

5. Conclusions
Work carried out over the previous years and the intensive use of DSAASTAT by users with all kinds of statistical and computing backgrounds (students, technicians and researchers) have shown that this tool has the flexibility and the simplicity to meet the main needs of routine field research. This software does not do anything particularly innovative, but it does it quickly and easily. The advantage over other free statistical software is that EXCEL users may perform statistical analyses with no programming and without the need to learn any other software tool. DSAASTAT is a freeware, virus-free macro and can be downloaded from the Internet (URL: http://www.unipg.it/~onofri/DSAASTAT/DSAASTAT.htm), or requested from the author by e-mail.

6. Acknowledgements
The author wishes to thank Dino Alberati, Raffaele Alberati, Euro Pannacci (University of Perugia), Amnuai Adthalungrong (Horticulture Research Institute, Thailand) and Simone Bellinazzo for their help in testing and developing the macro and for providing useful comments on the manuscript.

7. References and further readings


1. Annichiarico, P. (1997). Nuove metodologie per lo studio dell'interazione trattamento-ambiente e per la definizione della raccomandazione tecnica. Rivista di Agronomia, 31, 817-823.
2. Anscombe, F. J., and Tukey, J. W. (1963). The examination and analysis of residuals. Technometrics, 161.
3. Box, G. E. P., and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, B-26, 211-243, discussion 244-252.
4. Chew, V. (1976). Comparing treatment means: a compendium. HortScience, 11(4), 348-357.
5. Crossa, J. (1990). Statistical analyses of multilocation trials. Advances in Agronomy, 44, 55-85.
6. Draper, N. R., and Smith, H. (1981). Applied regression. John Wiley & Sons, Inc., New York, 2nd ed.
7. Giaccaglini, G. (1997). Usare Excel 97 Visual Basic for Applications. Jackson Libri, Gruppo Editoriale Futura S.r.l., Milano, 427 pp.
8. LeClerg, E. L., Leonard, W. H., and Clark, A. G. (1962). Field Plot Technique. Burgess Publishing Company, Minneapolis, Minnesota, 373 pp.
9. Onofri, A., and Ciriciofolo, E. (2004). Characterisation of yield quality in durum wheat by canonical variate analysis. In Proceedings VIII ESA Congress "European Agriculture in a global context" (S.-E. Jacobsen, C. R. Jensen and J. R. Porter, eds.), Copenhagen, 11-15 July 2004, pp. 541-542.
10. R Development Core Team (2003). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-00-3. URL: http://www.R-project.org
11. SAS Institute Inc. (1985). SAS User's Guide: Statistics, Version 5 Edition. SAS Institute Inc., Cary, NC, 956 pp.
12. Snedecor, G. W., and Cochran, W. G. (1991). Statistical methods. Iowa State University Press, 8th Edition, Ames (Iowa), 503 pp.

8. Brief Version history


1. 1.017 (Update: 19/03/2007). Some errors on split-plot designs and ambiguities on multiple comparison tests have been corrected.
2. 1.018 (Update: 26/03/2007). Some errors on two-way designs repeated in more years and locations have been corrected.
3. 1.019 (Update: 25/07/2007). Some errors on Newman-Keuls and Tukey's MCT procedures have been corrected.
4. 1.0192 (Update: 09/09/2008). One minor bug on the one-factor completely randomised design, which caused an error on English versions of Excel, was corrected.
5. 1.021 (Update: 03/05/2010). Minor fixes on English language.
6. 1.022 (Update: 07/06/2010). Fixes one bug with the Latin square design.
7. 1.1 (Update: 18/03/2011). Fixes some minor bugs and adds one-factor designs with subsampling, as well as two-way strip-plot designs.

These pages were prepared by Miriana Cenerini. Last update: 03/05/2010
