
LAHORE COLLEGE FOR WOMEN UNIVERSITY

STATISTICS DEPARTMENT

Experimental Design
An Introduction to Experimental
Design
An Essay

Presented by Ifrah Mahmood


B.S. (Hons.) Statistics

This essay presents a brief introduction to experimental design, its history, and its application to real-world phenomena. The important assumptions, guidelines and basic statistical concepts that ought to be considered are also acknowledged. The three basic types of experimental design, namely the Completely Randomized Design, the Randomized Complete Block Design and the Latin Square Design, are illustrated with the help of an example.
Experimental Design

Experimental design is the gold standard of research design, and the best approach for
assessing cause and effect. The design, which relies on random assignment and repeated
measurements for its rigor, has its roots in agricultural experiments of the early twentieth
century and is now commonplace in a variety of scientific and industrial settings.
Elements of experimental design, as outlined by Fisher, include comparison of an
experimental group, which receives the treatment or intervention, and a baseline control
group. Other features include random assignment of subjects to treatment and control
groups to control for any differences between the groups that could bias the results.
Finally, experimental design requires multiple measures to estimate the level of variation
in measurements.

The real start must surely be the arrival of R. A. Fisher at the Rothamsted Experimental Station in the 1920s and his insistence on laying out field trials in special patterns such as Latin squares. Randomization was an additional method of isolating these effects. Experimental design was considered an active process; treatments were applied by human action. This work laid the foundations of the subject and integrated it with analytic methods such as the analysis of variance. It also led to a burst of activity in combinatorial design and a whole new branch of mathematics in block designs and other structures. Experimental design expanded beyond its agricultural roots during World War II, as the procedure became a method for assessing and improving the performance of weapons systems, such as long-range artillery.

By this time factorial design had also been well developed, particularly through the work of R. C. Bose, F. Yates and C. R. Rao. Setting up the problem as one of optimization enabled algorithms to be developed for fast solution, and the subject of computer-aided experimental design was born. This work, which started in the US, quickly spread to West and East Europe. Lately the subject has come of age industrially because of the increasing use of industrial experiments to test prototypes and improve products and processes. It has been a shock for main-line statisticians to see an engineer, Genichi Taguchi, credited with this new popularity. Fisher published The Design of Experiments, a book that articulated the features of experimental research design that are still used today. In the 1960s, randomized experiments became the standard for approval of new medications and medical procedures. Prior to that time, approval of medical devices and drugs relied on anecdotal data: a physician would examine a handful of patients and write a paper. This approach introduced bias, for which randomized clinical trials controlled.

Statistical design of experiments refers to the "process of planning the experiment so that appropriate data that can be analyzed by statistical methods will be collected, resulting in valid and objective conclusions" (Montgomery, 2005). When the problem involves data that are
subject to experimental errors, statistical methods are the only objective approach to
analysis. It is wise to take time and effort to organize the experiment properly to ensure
that the right type of data, and enough of it, is available to answer the questions of
interest as clearly and efficiently as possible. This process is called experimental design.
There are two aspects to any experimental problem: the design of the experiment and the statistical analysis of the data. These two subjects are closely related because the methods of analysis depend directly on the design employed. An experiment purposely imposes a treatment on a group of objects or subjects in the interest of observing the response.

Experimental design methods have found broad application in many disciplines. In fact, we may view experimentation as part of the scientific process and as one of the ways in which we learn about how systems or processes work. Generally we learn through a series of activities in which we make conjectures about a process, perform experiments to generate data from the process, and then use the information from the experiment to establish new conjectures, which lead to new experiments, and so on. Experimental design is a critically important tool in the engineering world for improving the performance of a manufacturing process. It also has extensive application in the development of new processes. The application of experimental design techniques early in process development can result in:

• Improved process yields
• Reduced variability and closer conformance to nominal or target requirements
• Reduced development time
• Reduced overall costs

Some applications of experimental design in engineering design include:

• Evaluation and comparison of basic design configurations
• Evaluation of material alterations
• Selection of design parameters so that the product will work well under a wide variety of field conditions, that is, so that the product is robust
• Determination of key product design parameters that impact product performance

The use of experimental design in these areas can result in products that are easier to
manufacture, products that have enhanced field performance and reliability, lower
product cost, and shorter product design and development time.

Attention to experimental design is extremely important because the validity of an experiment is directly affected by its construction and execution. The specific questions that the experiment is intended to answer must be clearly identified before carrying out
the experiment. We should also attempt to identify known or expected sources of
variability in the experimental units since one of the main aims of a designed experiment
is to reduce the effect of these sources of variability on the answers to questions of
interest. That is, we design the experiment in order to improve the precision of our
answers.

The plan that we choose to call a design is an essential part of research strategies. The design itself entails:

• selecting or assigning subjects to experimental units
• selecting or assigning units to specific treatments or conditions of the experiment (experimental manipulation)
• specifying the order or arrangement of the treatment or treatments
• specifying the sequence of observations or measurements to be taken

There are four basic principles of experimental design: randomization, replication, blocking and local control.

Randomization is the basis underlying the use of statistical methods in experimental design. Because it is generally extremely difficult for experimenters to eliminate bias using only their expert judgment, the use of randomization in experiments is common practice. By randomization we mean that both the allocation of the experimental material and the order in which the individual runs or trials of the experiment are to be performed
are randomly determined. Statistical methods require that the observations (or errors) be
independently distributed random variables. Randomization usually makes this
assumption valid. By properly randomizing the experiment, we also assist in “averaging
out” the effects of extraneous factors that may be present. In a randomized experimental
design, objects or individuals are randomly assigned (by chance) to an experimental
group. Using randomization is the most reliable method of creating homogeneous
treatment groups, without involving any potential biases or judgments.
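
As a concrete illustration, the following Python sketch randomly assigns twelve hypothetical subjects to two groups purely by chance; the subject labels, group names and group sizes are assumptions made for illustration only.

    # A minimal sketch of randomization: assigning experimental units
    # to treatment groups purely by chance. Labels are illustrative.
    import random

    units = [f"subject_{i}" for i in range(1, 13)]   # 12 hypothetical units
    treatments = ["control", "treatment"]            # two groups of 6 each

    random.shuffle(units)                            # chance decides the order
    groups = {t: units[i * 6:(i + 1) * 6] for i, t in enumerate(treatments)}

    for name, members in groups.items():
        print(name, members)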

By replication we mean an independent repeat of each factor combination. To improve the significance of an experimental result, replication, the repetition of an experiment on a large group of subjects, is required. Although randomization helps to ensure that
treatment groups are as similar as possible, the results of a single experiment, applied to a
small number of objects or subjects, should not be accepted without question. Randomly
selecting two individuals from a group of four and applying a treatment with "great
success" generally will not impress the public or convince anyone of the effectiveness of
the treatment. If a treatment is truly effective, the long-term averaging effect of
replication will reflect its experimental worth. If it is not effective, then the few members of the experimental population who may have reacted to the treatment will be negated by
the large numbers of subjects who were unaffected by it. Replication reduces variability
in experimental results, increasing their significance and the confidence level with which
a researcher can draw conclusions about an experimental factor. There are two important properties of replication. First, it allows the experimenter to obtain an estimate of the experimental error. Second, if the sample mean (ȳ) is used to estimate the true mean response for one of the factor levels in the experiment, replication permits the experimenter to obtain a more precise estimate of this parameter.
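
The precision gain from replication can be stated exactly: if each observation has variance σ², then the mean of n independent replicates has variance

Var(ȳ) = σ²/n, so that SE(ȳ) = σ/√n.

Quadrupling the number of replicates therefore halves the standard error of the estimated treatment mean.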

Blocking is a design technique used to improve the precision with which comparisons among the factors of interest are made. Often blocking is used to reduce or eliminate the variability transmitted from nuisance factors, that is, factors that may influence the experimental response but in which we are not directly interested. For example, an experiment in a chemical process may require two batches of raw material to make all the required runs; there could be differences between the batches due to supplier-to-supplier variability, and if we are not specifically interested in this effect, we would think of the batches of raw material as a nuisance factor. Generally, a block is a set of relatively homogeneous experimental conditions.

Local control is a term referring to the amount of balancing, blocking and grouping of the experimental units. Balancing means that the treatments should be assigned to the experimental units in such a way that the result is a balanced arrangement of the treatments. Blocking means that like experimental units should be collected together to form a relatively homogeneous group. A block is also a replicate. It has been observed that not all extraneous sources of variation are removed by randomization and replication. This necessitates a refinement in the experimental technique. In other words, we need to choose a design in such a manner that all extraneous sources of variation are brought under control. The main purpose of the principle of local control is to increase the efficiency of the design.

These basic principles of experimental design are part of every experiment. To use the statistical approach in designing and analyzing an experiment, it is necessary to have a clear idea in advance of exactly what is to be studied, how the data are to be collected, and at least a qualitative understanding of how these data are to be analyzed. The key guidelines for designing an experiment are as follows:

• Recognition of and statement of the problem
• Selection of the response variable
• Choice of factors, levels and ranges
• Choice of experimental design
• Performing the experiment
• Statistical analysis of the data
• Conclusions and recommendations

A few basic statistical concepts help to describe these procedures more clearly. Some of the basic concepts are briefly described below:

Each of the observations in the experiment is called a run. Since the individual runs differ, there is fluctuation, or noise. This noise is usually called experimental error or simply error. It is a statistical error, meaning that it arises from variation that is uncontrolled and generally unavoidable. A random variable may be either discrete or continuous. If the set of all possible values of the random variable is either finite or countably infinite, then the random variable is discrete, whereas if the set of all possible values of the random variable is an interval, then the random variable is continuous.

We often use simple graphical methods to assist in analyzing the data from an experiment. Dot diagrams, histograms, and box plots are useful for summarizing the information in a sample of data. To describe the observations that might occur in a sample more completely, we use the concept of a probability distribution. The probability structure of a random variable, say y, is described by its probability distribution. If y is discrete, we often call the probability distribution of y, say p(y), the probability function of y. If y is continuous, then the probability distribution of y, say f(y), is called the probability density function of y. The mean, μ, of a probability distribution is a measure of its central tendency or location. We may also express the mean in terms of the expected value E, the long-run average value of the random variable y. The variability or dispersion of a probability distribution can be measured by the variance σ².
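
In symbols, for a discrete random variable y with probability function p(y), or a continuous one with density f(y),

μ = E(y) = Σ y·p(y) (discrete), μ = E(y) = ∫ y·f(y) dy (continuous),

and the variance is σ² = E[(y − μ)²].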

The objective of statistical inference is to draw conclusions about a population using a sample. Often we are able to determine the probability distribution of a particular statistic if we know the probability distribution of the population from which the sample was drawn. The probability distribution of a statistic is called a sampling distribution. There are several useful sampling distributions, such as the normal, chi-square, t and F distributions. A simple comparative experiment can be analyzed using hypothesis testing and confidence interval procedures for comparing two treatment means. The technique of statistical inference called hypothesis testing can be used to assist the experimenter in comparing two formulations. Hypothesis testing allows the comparison of the two formulations to be made on objective terms, with knowledge of the risks associated with reaching the wrong conclusion.
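
As a sketch of how such a comparison of two treatment means might be carried out in practice, the following Python fragment applies scipy's two-sample t-test; the two samples here are hypothetical numbers invented purely for illustration.

    # Sketch of a two-sample t-test comparing two treatment means.
    # The data below are hypothetical, for illustration only.
    from scipy import stats

    formulation_1 = [16.85, 16.40, 17.21, 16.35, 16.52]
    formulation_2 = [17.50, 17.63, 18.25, 18.00, 17.86]

    t_stat, p_value = stats.ttest_ind(formulation_1, formulation_2)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
    # A small p-value is evidence that the two treatment means differ.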

A completely randomized design is the simplest type of randomization scheme in that treatments are assigned to units completely by chance. In addition, units should be run in random order throughout the experiment. The completely randomized design has several
advantages:
• It is completely flexible. Any number of treatments can be investigated. Each
treatment can have any number (more than one) of units although balance (an
equal number of units for each treatment) is desirable.
• The statistical analysis is straightforward.
• The analysis remains simple even if observations from some units are missing.

There are disadvantages too. Any variation in units shows up in the experimental error
sum of squares. Unless the units are very similar, a completely randomized design will
have larger experimental error than other designs. For a given number of observations a
completely randomized design has the largest degrees of freedom for error. The sum of
squares will be divided by the largest degrees of freedom possible to produce the mean
square error.
Given these advantages and disadvantages, a completely randomized design is most
appropriate when:
• Experimental units are similar
• Several units may be destroyed or fail to respond
• It is a relatively small experiment

Example: Suppose we have 4 different diets which we want to compare. The diets are
labeled Diet A, Diet B, Diet C, and Diet D. We are interested in how the diets affect the
coagulation (a complex process by which blood forms clots) rates of rabbits. The
coagulation rate is the time in seconds that it takes for a cut to stop bleeding. We have 16
rabbits available for the experiment, so we will use 4 on each diet. How should we use
randomization to assign the rabbits to the four treatment groups? The assignment of
rabbits to treatment is a completely randomized design. However, the arrangement of the
cages for convenience creates a bias in the results. The heat in the room rises, so the
rabbits receiving Diet A will be living in a very different environment than those
receiving Diet D. Any observed difference cannot be attributed to diet, but could just as
easily be a result of cage placement. Cage placement is not a part of the treatment, but
must be taken into account. In a completely randomized design, every rabbit must have
the same chance of receiving any diet at any location in the matrix of cages.

Label the cages 1-16. In a bowl put 16 strips of paper each with one of the integers 1-16
written on it. In a second bowl put 16 strips of paper, four each labeled A, B, C, and D.
Catch a rabbit. Select a number from the first bowl and a letter from the second. Place the rabbit in the
location indicated by the number and feed it the diet assigned by the letter. Repeat
without replacement until all rabbits have been assigned a diet and cage.
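
The bowl-drawing procedure translates directly into a few lines of code. A minimal sketch, mirroring the two bowls of the example:

    # Sketch of the completely randomized assignment described above:
    # 16 cages, 16 rabbits, four rabbits per diet, all decided by chance.
    import random

    cages = list(range(1, 17))            # slips numbered 1-16
    diets = ["A", "B", "C", "D"] * 4      # four slips of each letter

    random.shuffle(cages)                 # draw from the first bowl
    random.shuffle(diets)                 # draw from the second bowl

    for rabbit, (cage, diet) in enumerate(zip(cages, diets), start=1):
        print(f"rabbit {rabbit}: cage {cage}, diet {diet}")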
The sixteen cages are arranged in a 4 × 4 grid (four shelves of four cages each), numbered as follows:

 1   5   9  13
 2   6  10  14
 3   7  11  15
 4   8  12  16

If, for example, the first number selected was 7 and the first letter B, then the first rabbit
would be placed in location 7 and fed diet B. An example of the completed cage selection
is shown below.

Each cage number is shown with its assigned diet:

 1(C)   5(A)   9(B)  13(D)
 2(D)   6(B)  10(D)  14(C)
 3(C)   7(B)  11(A)  15(D)
 4(A)   8(A)  12(C)  16(B)

Notice that the completely randomized design does not account for the difference in
heights of the cages. It is a completely random assignment. In this case, we see that the
rabbits with Diet A are primarily on the bottom and those with Diet D are on the top. To
analyze the results of the experiment, we use a one-way analysis of variance. The
measured coagulation times for each diet are given below:

       Diet A   Diet B   Diet C   Diet D
         62       63       68       56
         60       67       66       62
         63       71       71       60
         59       64       67       61

Mean:    61      66.25     68      59.75

The null hypothesis H0 is: µA = µB = µC = µD (all treatment means are the same).
The alternative hypothesis H1 is: at least one mean differs.

The ANOVA table is given below:

Source of variation   Degrees of freedom   Sums of squares   Mean squares   F-ratio
Model                          3              191.50000         63.8333      9.1737
Error                         12               83.50000          6.9583     (P > F = 0.0020)
Total                         15              275.00000

From the computer output, we see that there is a statistically significant difference in
coagulation time (p = 0.0020).
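
The F-ratio and p-value in the table can be reproduced directly from the coagulation data above; for example, with scipy:

    # One-way ANOVA on the coagulation times from the table above.
    from scipy import stats

    diet_a = [62, 60, 63, 59]
    diet_b = [63, 67, 71, 64]
    diet_c = [68, 66, 71, 67]
    diet_d = [56, 62, 60, 61]

    f_ratio, p_value = stats.f_oneway(diet_a, diet_b, diet_c, diet_d)
    print(f"F = {f_ratio:.4f}, p = {p_value:.4f}")   # F = 9.1737, p = 0.0020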
Analysis of variance gets its name because it compares two different estimates of the variance. The first estimate considers the observations as a single set of data. Here we compute the variance using the standard formula. The sum of squared deviations from the overall mean is

SS(Total) = Σi Σj (yij − ȳ..)²

If we divide this quantity by N − 1, the total number of observations minus one, we have an estimate of the variance over all units, ignoring treatments. This is just the sample variance of the combined observations. In our example, SS(Total) = 275 and 275/15 ≈ 18.33.
The ANOVA table consolidates most of these computations, giving the essential sums of squares and degrees of freedom for our estimates of variance. For a treatments and N observations in total, the standard table is shown below:

Source      Degrees of freedom   Sums of squares   Mean squares
Treatment   a - 1                SS(Treatment)     SS(Treatment)/(a - 1)
Residual    N - a                SS(Residual)      SS(Residual)/(N - a)
Total       N - 1                SS(Total)

Notice that Total SS = Treatment SS + Residual SS. The total sum of squares has been
partitioned into two parts, the Treatment Sums of Squares and the Residual, or Error,
Sums of Squares. The Treatment Sums of Squares is a measure of the variation among
the treatment groups, which includes the variation of the rabbits. The Residual Sums of
Squares is a measure of the variation among the rabbits within each treatment group. The
MS(Treatment) is "explained" variance and MS(Residual) is "unexplained" variance. The variance estimated by MS(Treatment) is explained by the fact that the observations may
come from different populations while the MS(Residual) cannot be explained by variance
in population parameters and is therefore considered as random or chance variation. In
this terminology, the F-statistic is the ratio of explained variance to unexplained variance:

F = MS(Treatment) / MS(Residual), here 63.8333/6.9583 = 9.1737.

Like all hypothesis tests, the one-way ANOVA has several criteria that must be satisfied
(at least approximately) for the test to be valid. These criteria are usually described as
assumptions that must be satisfied, since one often cannot verify them directly. These
assumptions are listed below:
1. The population distribution of the response variable Y must be normal within each class.
2. The observed values must be independent, within and among groups.
3. The population variances of Y must be equal for all k classes.

These assumptions are not equally important. Normality is not critical; problems tend to arise only if the distributions are highly skewed, the design is unbalanced, and the sample sizes are small. The assumption of independence is critical. The assumption of equal variance is important, but the random assignment used in a designed experiment helps balance the variances; unequal variance is a greater problem in observational studies.
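
These assumptions can be examined informally from the data. A minimal sketch using scipy, with the Shapiro-Wilk test for normality within each class and Levene's test for equality of variances (one reasonable choice of checks, not the only one):

    # Rough checks of the one-way ANOVA assumptions on the coagulation data.
    from scipy import stats

    groups = {
        "A": [62, 60, 63, 59],
        "B": [63, 67, 71, 64],
        "C": [68, 66, 71, 67],
        "D": [56, 62, 60, 61],
    }

    # Normality within each class (samples are tiny, so indicative only).
    for diet, values in groups.items():
        w, p = stats.shapiro(values)
        print(f"diet {diet}: Shapiro-Wilk p = {p:.3f}")

    # Equality of variances across the classes.
    stat, p = stats.levene(*groups.values())
    print(f"Levene p = {p:.3f}")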

If we do not reject H0 in an ANOVA, the analysis is finished: there are no detectable differences among the means. If, however, we reject H0, then we want to know which of the μi's differ from each other. There are differences between the treatment means, but exactly which means differ is not specified. In this situation, further comparisons and analysis among groups of treatment means may be useful. Any method for carrying out this further analysis is called a multiple comparison procedure. There are a number of such procedures in the statistical literature: graphical comparison of means, contrasts, orthogonal contrasts, Scheffé's method, comparing pairs of treatment means, comparing treatment means with a control, the least significant difference test, the Student-Newman-Keuls multiple range test, and Duncan's multiple range test. Certainly a logical question at this point
is which one of these procedures should be used. Unfortunately, there is no clear-cut
answer to this question. Carmer and Swanson (1973) conducted a number of studies on multiple comparison tests and reported that the least significant difference method is a very effective test for detecting true differences in means if it is applied only after the F-test in the analysis of variance is significant at the 5 percent level. They also reported good performance in detecting true differences with Duncan's multiple range test.
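
For later reference, the least significant difference for comparing two treatment means, each based on n observations, is t(α/2, error df) × √(2·MSE/n). A sketch of the computation:

    # Least significant difference (LSD) at the 5% level.
    import math
    from scipy import stats

    def lsd(mse: float, error_df: int, n: int, alpha: float = 0.05) -> float:
        """t-based LSD for comparing two means, each from n observations."""
        t_crit = stats.t.ppf(1 - alpha / 2, error_df)
        return t_crit * math.sqrt(2 * mse / n)

    # With the one-way ANOVA above: MSE = 6.9583 on 12 error df, n = 4.
    print(f"LSD = {lsd(6.9583, 12, 4):.2f}")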

In any experiment, variability arising from a nuisance factor can affect the results and we
try to make the experimental error as small as possible. A randomized complete block
design can be defined as one in which the experimental units within a particular block are relatively homogeneous and each block contains a complete set of treatments, i.e., it constitutes a replication of treatments. At all stages during the experiment, the techniques
applied within a block should be as uniform as possible, thus keeping experimental error
within blocks as small as possible. Blocking will be effective only if the error variance
among units within blocks is smaller than the error variance over all units. The
advantages of RCB design are as follows:
• It improves precision relative to a CR design. Effective blocking reduces s², thus resulting in greater precision, or reduces the number of replications needed to achieve equal precision, and creates better treatment balance.
• This design is flexible. Any number of treatments and any number of block replicates can be applied. Extra replications for certain treatments may also be included, and not all blocks need to contain the same number of units.
• The scope of inference is increased and block means provide a comparison of the
differences among blocks.

The disadvantages are as follows:


• Certain assumptions may be required for some tests of hypotheses.
• Block treatment interactions may make interpretation of treatment effects more
difficult.
• Blocking for a single factor may not provide sufficient error control (precision).
• The gain in precision due to blocking generally decreases as the number of
experimental units in a block increases.
• Block degrees of freedom result in a reduction in error degrees of freedom, thus
reducing sensitivity in small experiments or when heterogeneity is small.
• Requires some prior knowledge about variability of experimental units for
successful blocking.

In our example the CR design ignored the physical layout of the cages and the potential effect of the height of the cage in which a rabbit was housed. If we wanted to acknowledge the potential effect of cage height on a systematic difference in response (coagulation time), we should organize the experiment using a randomized complete block design. One diet of each type will be used on each of the 4 shelves. The randomization procedure would assign a number 1-16 to each of the rabbits. Put four slips of paper marked 1, 2, 3, or 4 in a bowl. Select a number at random from 1-16 to select a rabbit for Diet A, and pull a number out of the bowl to select a position on the top row. Repeat three times without replacement for Diets B, C, and D to complete the assignment to the top row. Follow the same procedure to assign the other three rows; a sketch of one way to generate such a layout is shown below.
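
The following Python sketch generates one possible randomized complete block layout, with each diet appearing exactly once on each shelf; the shelf numbering is illustrative.

    # Sketch of a randomized complete block assignment: each shelf
    # (block) receives every diet exactly once, in a random order.
    import random

    diets = ["A", "B", "C", "D"]
    for shelf in range(1, 5):                 # shelves are the blocks
        order = random.sample(diets, k=4)     # random order within the block
        print(f"shelf {shelf}: {order}")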

This is a randomized block design. For the sake of illustration, the data collected with this method are taken to be the same as with the completely randomized design, ordered by diet and shelf so that they are easier to read.

In the two-way analysis of variance, the model that includes the blocking variable is

yij = µ + τi + βj + εij,

where µ is the overall mean, τi is the effect of the ith diet (treatment), βj is the effect of the jth shelf (block), and εij is the random error.
By blocking on the shelf position, we hope to increase the power of the test by removing
variability associated with shelf height. This would allow us to detect smaller differences
between treatments.
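
A sketch of the corresponding two-way analysis with statsmodels follows. The shelf label attached to each observation is a hypothetical placeholder, since the actual layout from the example is not reproduced in this essay; the printed numbers will therefore not match the output quoted next.

    # Sketch of a two-way (treatment + block) ANOVA with statsmodels.
    # The shelf labels are hypothetical placeholders, for illustration.
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    data = pd.DataFrame({
        "time":  [62, 60, 63, 59, 63, 67, 71, 64,
                  68, 66, 71, 67, 56, 62, 60, 61],
        "diet":  ["A"] * 4 + ["B"] * 4 + ["C"] * 4 + ["D"] * 4,
        "shelf": [1, 2, 3, 4] * 4,            # hypothetical block labels
    })

    model = ols("time ~ C(diet) + C(shelf)", data=data).fit()
    print(sm.stats.anova_lm(model, typ=2))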

From the computer output, we see that there is again a statistically significant difference in coagulation time, with the p-value slightly smaller (p = 0.0014). The mean square error has been reduced to 5.0556 and the error degrees of freedom are reduced to 9. In this case,

LSD = t(0.025, 9) √(2(5.0556)/4) = 2.262 × 1.590 ≈ 3.6.

Any difference in means greater than 3.6 is considered significant. The means for Diet B and Diet C are not significantly different, nor are those for Diet A and Diet D. However,
Diets B and C have larger mean coagulation times than Diets A and D.

The Latin square design is for a situation in which there are two extraneous sources of
variation; that is, it systematically allows blocking in two directions. Thus, the rows and
columns actually represent two restrictions on randomization. If the rows and columns of
a square are thought of as levels of the two extraneous variables, then in a Latin square
each treatment appears exactly once in each row and column. The advantages of Latin
square designs are:

1. They handle the case when we have several nuisance factors and we either cannot
combine them into a single factor or we wish to keep them separate.
2. They allow experiments with a relatively small number of runs.

The disadvantages are:

1. The number of levels of each blocking variable must equal the number of levels
of the treatment factor.
2. The Latin square model assumes that there are no interactions between the
blocking variables or between the treatment variable and the blocking variable.
3. Small squares have very few degrees of freedom for experimental error, and you cannot evaluate interactions between (1) rows and columns, (2) rows and treatments, or (3) columns and treatments.

In our example suppose we want to block for both the row and column position in the room. In this case, we need to ensure that each diet is found once in each row and once in each column. The Latin square allows this constraint to be satisfied. To set up a Latin square, begin with the standard square, in which each letter appears once in each row and column:

A B C D
B C D A
C D A B
D A B C

Put the integers 1-16 in a bowl. Select a rabbit and pick a number out of the bowl. That rabbit is assigned the position in the array indicated and given the diet specified for that position. A sketch of one way to randomize the square is shown below.
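
A common, simple way to randomize is to permute the rows and columns of the standard square, which preserves the property that each diet appears once in each row and column. A minimal sketch:

    # Sketch: randomize the standard 4x4 Latin square by permuting
    # its rows and columns; the Latin property is preserved.
    import random

    standard = [["A", "B", "C", "D"],
                ["B", "C", "D", "A"],
                ["C", "D", "A", "B"],
                ["D", "A", "B", "C"]]

    rows = random.sample(range(4), 4)         # random row order
    cols = random.sample(range(4), 4)         # random column order
    square = [[standard[r][c] for c in cols] for r in rows]

    for row in square:
        print(" ".join(row))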

Now the model is

yijk = µ + αi + τj + βk + εijk,

where αi is the row (shelf) effect, τj is the treatment (diet) effect, βk is the column effect, and εijk is the random error.
By blocking on the shelf position and the column position, we hope to increase the power
of the test by removing variability associated with both row and column positions. We
give up degrees of freedom for reduction in variability.

From the computer output, we see that there is again a statistically significant difference in coagulation time among the diets, with the p-value larger than when only one blocking variable was used (p = 0.0047). The mean square error is now 4.8333 and the error degrees of freedom are reduced to 6. In this case,

LSD = t(0.025, 6) √(2(4.8333)/4) = 2.447 × 1.555 ≈ 3.8,
which is larger than the LSD when only one blocking variable was used. Since we now
need a larger difference in means to consider the difference significant, we can conclude
that adding the second blocking variable does not add power. The reduction in mean
square error is compensated for in the loss in degrees of freedom. As a result of blocking
on a meaningless variable, we can only distinguish between treatments whose means are
more than 3.8 units apart. Blocking only on shelf allowed us to distinguish treatments whose means were 3.6 or more units apart.

The completely randomized design has the simplest analysis, and it should be used if
there are no other explanatory structural factors in the experiment. If the cages were
arranged so all cages were essentially equal relative to outside influences, there would be no need to block. The completely randomized design would be the most efficient. As
shown in this example, blocking when there is no need for blocking reduces the power of
the test, since it reduces the degrees of freedom without appreciably reducing the
variability.

The importance of experimental design stems from the quest for inference about causes
or relationships as opposed to simply description. Researchers are rarely satisfied to
simply describe the events they observe. They want to make inferences about what
produced, contributed to, or caused events. To gain such information without ambiguity, some form of experimental design is ordinarily required. The purpose of the design is to rule out alternative causes, leaving only the factor that is the real cause.
The kinds of planned manipulation and observation called experimental design often
seem to become a bit complicated. This is unfortunate but necessary, if we wish to pursue
the potentially available information so the relationships investigated are clear and
unambiguous.

References:

Box, George E. P., William G. Hunter, and J. Stuart Hunter, Statistics for Experimenters, John Wiley & Sons, New York, 1978.

Carmer, S. G., and M. R. Swanson, "An Evaluation of Ten Pairwise Multiple Comparison Procedures by Monte Carlo Methods," Journal of the American Statistical Association, 68, 1973, pp. 66-74.

Cochran, W. G., and G. M. Cox, Experimental Designs, 2nd edition, John Wiley & Sons, New York, 1957.

Montgomery, Douglas C., Design and Analysis of Experiments, 6th edition, John Wiley & Sons, New York, 2005.

Wonnacott, Thomas H., and Ronald J. Wonnacott, Introductory Statistics, John Wiley & Sons, New York, 1969.

http://courses.ncssm.edu/math/Stat_Inst/PDFS/RanDesgn.pdf

http://courses.ncssm.edu/math/Stat_Inst/PDFS/RanBlock.pdf

http://www.cs.wustl.edu/~jain/cse567-08/ftp/k_16ied.pdf

http://www.stat.yale.edu/Courses/1997-98/101/expdes.htm
