The functions in this chapter cover a wide variety of commonly used experimental designs. They can be categorized not only by the underlying experimental design that generated the user's data, but also by whether they provide support for missing values, factorial treatment structure, blocking and replication of the entire experiment, or multiple locations.
Typically, responses are stored in the input vector y. For a few functions, such as imsls_f_anova_oneway and imsls_f_anova_factorial, the full set of model subscripts is not needed to identify each response. These functions assume the usual pattern, in which the last model subscript changes most rapidly, followed by the model subscript next in line, and so forth, with the first subscript changing at the slowest rate. This pattern is referred to as lexicographical ordering.
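As an illustration of lexicographical ordering, the following sketch computes the position in y of a response from its model subscripts. The helper name lex_index, its argument list, and the use of 0-based subscripts are assumptions made for this example; they are not part of the library.

```c
#include <stddef.h>

/* Hypothetical helper (not an IMSL function): returns the 0-based position
 * in y of the response with 0-based model subscripts sub[], where subscript i
 * has n_levels[i] levels. The last subscript changes most rapidly and the
 * first subscript changes most slowly. */
size_t lex_index(size_t n_subscripts, const size_t n_levels[],
                 const size_t sub[])
{
    size_t index = 0;
    for (size_t i = 0; i < n_subscripts; i++) {
        /* Shift the running index by the size of this subscript's range. */
        index = index * n_levels[i] + sub[i];
    }
    return index;
}
```

For three subscripts with 2, 3 and 4 levels, the subscripts (0, 0, 1) map to position 1, (0, 1, 0) to position 4, and (1, 2, 3) to position 23, the last of the 24 responses.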
However, for most of the functions in this chapter, one or more arrays are used to describe the experimental conditions associated with each value in the response input vector y. The function imsls_f_split_plot, for example, requires three additional input arrays: split, whole and rep. They are used to identify the split-plot, whole-plot and replicate number associated with each value in y.
Many of the functions described in this chapter permit users to enter missing data values using NaN (Not a Number) as the missing value code. Use function imsls_f_machine (or imsls_d_machine for double precision) to retrieve NaN: any element of y that is missing must be set to imsls_f_machine(6), or to imsls_d_machine(6) for double precision. See imsls_f_machine in Chapter 15, Utilities for a description. Functions imsls_f_anova_factorial, imsls_f_anova_nested and imsls_f_anova_balanced require complete, balanced data and do not accept missing values.
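The effect of the missing value code on a response vector can be sketched in standard C. The example below uses the NAN macro from math.h only as a stand-in so that it compiles without the library; in a program linked against the library the code would be obtained from imsls_f_machine(6) as described above. The helper names count_missing and mean_non_missing are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: counts the entries of y that hold the missing value
 * code. NaN is the only value that isnan() reports as true. */
size_t count_missing(const float y[], size_t n)
{
    size_t n_missing = 0;
    for (size_t i = 0; i < n; i++) {
        if (isnan(y[i])) {
            n_missing++;
        }
    }
    return n_missing;
}

/* Hypothetical helper: mean of the non-missing entries of y.
 * Returns NaN when every entry is missing (or n is zero). */
float mean_non_missing(const float y[], size_t n)
{
    double sum = 0.0;
    size_t n_used = 0;
    for (size_t i = 0; i < n; i++) {
        if (!isnan(y[i])) {
            sum += y[i];
            n_used++;
        }
    }
    return n_used > 0 ? (float)(sum / (double)n_used) : NAN;
}
```

Note that a missing value cannot be detected with the == operator, because NaN compares unequal to every value, including itself.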
As a diagnostic tool for validating model assumptions, some functions in this chapter perform a test for lack of fit when replicates are available in each cell of the experimental design.
Completely randomized experiments are analyzed using some variation of the one-way analysis of variance (Anova). A completely randomized design (CRD) is the simplest and most common example of a statistically designed experiment. Researchers using a CRD are interested in comparing the average effect of two or more treatments. In agriculture, treatments might be different plant varieties or fertilizers. In industry, treatments might be different product designs, different manufacturing plants, different methods for delivering the product, etc. In business, different business processes, such as different shipping methods or alternate approaches to a product repair process, might be considered treatments. Regardless of the area, the one thing they have in common is that random errors in the observations cause variations in differences between treatment observations, making it difficult to confirm the effectiveness of one treatment relative to another.
If observations on these treatments are completely independent, then the design is referred to as a completely randomized design or CRD. The IMSL C Numerical Library has two routines for analysis of data from a CRD: imsls_f_anova_oneway and imsls_f_crd_factorial.
Both functions allow users to specify observations with missing values, accept unequal group sizes, and output treatment means and standard deviations. The primary differences between the functions are:
1. imsls_f_anova_oneway conducts multiple comparisons of treatment means; whereas imsls_f_crd_factorial requires users to make a call to imsls_f_multiple_comparisons to compare treatment means.
2. imsls_f_crd_factorial can analyze treatments with a factorial treatment structure; whereas imsls_f_anova_oneway does not analyze factorial structures.
3. imsls_f_crd_factorial can analyze data from CRD experiments that are replicated across several blocks or locations. This can happen when the same experiment is repeated at different times or different locations.
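The calculation behind a one-way analysis of a CRD can be sketched as follows. This is the textbook F statistic for complete data, not the algorithm used by imsls_f_anova_oneway; the function name oneway_f and its argument layout are hypothetical.

```c
#include <stddef.h>

/* Hypothetical illustration: F statistic of a one-way Anova.
 * y holds all observations, stored group by group; n[g] is the number of
 * observations in group g. Group sizes may be unequal, but missing values
 * are not handled here. Returns -1.0 if there are fewer than two groups or
 * no error degrees of freedom. Assumes the error mean square is nonzero. */
double oneway_f(size_t n_groups, const size_t n[], const double y[])
{
    size_t n_total = 0;
    double grand_sum = 0.0;
    size_t pos = 0;

    for (size_t g = 0; g < n_groups; g++) {
        for (size_t j = 0; j < n[g]; j++) {
            grand_sum += y[pos++];
        }
        n_total += n[g];
    }
    if (n_groups < 2 || n_total <= n_groups) {
        return -1.0;
    }
    double grand_mean = grand_sum / (double)n_total;

    double ss_treatments = 0.0; /* between-group sum of squares */
    double ss_error = 0.0;      /* within-group sum of squares  */
    pos = 0;
    for (size_t g = 0; g < n_groups; g++) {
        double sum = 0.0;
        for (size_t j = 0; j < n[g]; j++) {
            sum += y[pos + j];
        }
        double mean = n[g] > 0 ? sum / (double)n[g] : grand_mean;
        ss_treatments += (double)n[g] * (mean - grand_mean) * (mean - grand_mean);
        for (size_t j = 0; j < n[g]; j++) {
            ss_error += (y[pos + j] - mean) * (y[pos + j] - mean);
        }
        pos += n[g];
    }

    double ms_treatments = ss_treatments / (double)(n_groups - 1);
    double ms_error = ss_error / (double)(n_total - n_groups);
    return ms_treatments / ms_error;
}
```

With the made-up groups {1, 2, 3}, {2, 3, 4} and {6, 7, 8}, the treatment mean square is 21 and the error mean square is 1, so the function returns F = 21.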
In some cases, treatments are identified by a combination of experimental factors. For example, in an octane study comparing several different gasolines, each gasoline could be developed using a combination of two additives, denoted below in Table 1 as Additive A and Additive B.

| Treatment | Additive A | Additive B |
|---|---|---|
| 1 | No | No |
| 2 | Yes | No |
| 3 | No | Yes |
| 4 | Yes | Yes |

Table 1 - 2x2 Factorial Experiment
This is referred to as a 2x2 or $2^2$ factorial experiment. There are 4 treatments involved in this study. Treatment 1 contains no additives, Treatments 2 and 3 each contain only one of the additives, and Treatment 4 contains both. A one-way Anova, such as that provided by imsls_f_anova_oneway, can analyze these data as four different treatments. Three functions, imsls_f_crd_factorial, imsls_f_rcbd_factorial and imsls_f_anova_factorial, will analyze these data exploiting the factorial treatment structure. These functions allow users to answer structural questions about the treatments such as:
1. Are the average effects of the additives statistically significant? These are referred to as the factor main effects.
2. Is there an interaction effect between the additives? That is, is the effectiveness of an additive independent of the other?
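The two questions above correspond to simple contrasts among the four treatment means. The sketch below computes them with the usual textbook definitions for a 2x2 factorial; it estimates the effects only and does not test their significance. The type effects_2x2, the function factorial_2x2_effects, and the example means are hypothetical.

```c
/* Hypothetical result type: estimated effects in a 2x2 factorial. */
typedef struct {
    double main_a;      /* average change in response when Additive A is added */
    double main_b;      /* average change in response when Additive B is added */
    double interaction; /* half the change in the A effect when B is present   */
} effects_2x2;

/* Hypothetical helper. m[] holds the treatment means in the order of Table 1:
 * m[0] = no additive, m[1] = A only, m[2] = B only, m[3] = A and B. */
effects_2x2 factorial_2x2_effects(const double m[4])
{
    effects_2x2 e;
    e.main_a = 0.5 * ((m[1] + m[3]) - (m[0] + m[2]));
    e.main_b = 0.5 * ((m[2] + m[3]) - (m[0] + m[1]));
    /* Effect of A with B present minus effect of A with B absent, halved. */
    e.interaction = 0.5 * ((m[3] - m[2]) - (m[1] - m[0]));
    return e;
}
```

For made-up octane means of 90, 92, 91 and 96, the estimated main effects are 3.5 for Additive A and 2.5 for Additive B, and the interaction is 1.5. An interaction of zero would mean that the effect of one additive does not depend on the presence of the other.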
Both imsls_f_crd_factorial and imsls_f_rcbd_factorial support analysis of a factorial experiment with missing values and multiple locations. The function imsls_f_anova_factorial does not support analysis of experiments with missing values or experiments replicated over multiple locations. The main difference, as the names imply, between imsls_f_crd_factorial and imsls_f_rcbd_factorial is that imsls_f_crd_factorial assumes that treatments were completely randomized to experimental units. The imsls_f_rcbd_factorial routine assumes that treatments are blocked.
Blocking is an important technique for reducing the impact of experimental error on the ability of the researcher to evaluate treatment differences. Usually this experimental error is caused by differences in location (spatial differences), differences in time (temporal differences) or differences in experimental units. Researchers refer to these as blocking factors. They are identifiable sources of variation in observations between experimental units.
There are several functions that specifically support blocking in an experiment: imsls_f_rcbd_factorial, imsls_f_lattice, and imsls_f_latin_square. The first two functions, imsls_f_rcbd_factorial and imsls_f_lattice, support blocking on one factor.
A requirement of RCBD experiments is that every block must contain observations on every treatment. However, when the number of treatments ($t$) is greater than the block size ($k$), it is impossible to have every block contain observations on every treatment. In this case, when $t > k$, an incomplete block design must be used instead of a RCBD. Lattice designs are a type of incomplete block design in which the number of treatments is equal to the square of an integer, such as $t = 9$, 16, 25, etc. Lattice designs were originally described by Yates (1936). The function imsls_f_lattice supports analysis of data from lattice experiments.
Besides the requirement that $t = k^2$, another characteristic of lattice experiments is that blocks be grouped into replicates, where each replicate contains one observation for every treatment. This forces the number of blocks in each replicate to be equal to the number of observations per block. That is, the number of blocks per replicate and the number of observations per block are both equal to $k = \sqrt{t}$.
In addition, the number of replicate groups in lattice experiments is always less than or equal to $k+1$. If it is equal to $k+1$, then the design is referred to as a Balanced Lattice. If it is less than $k+1$, then the design is referred to as a Partially Balanced Lattice. These experiments and their analysis are tabulated in Cochran & Cox (1950).
Consider, for example, a 3x3 balanced lattice, i.e., $k=3$ and $t=9$. Notice that the number of replicates is $r = k+1 = 4$, and that the number of blocks per replicate and the block size are both $k = 3$. The total number of blocks is equal to $b = r \cdot k = 4 \cdot 3 = 12$. For a balanced lattice, $b = (k+1) \cdot k$.

| Replicate | Block | Treatments |
|---|---|---|
| 1 | Block 1 | T1, T2, T3 |
| 1 | Block 2 | T4, T5, T6 |
| 1 | Block 3 | T7, T8, T9 |
| 2 | Block 4 | T1, T4, T7 |
| 2 | Block 5 | T2, T5, T8 |
| 2 | Block 6 | T3, T6, T9 |
| 3 | Block 7 | T1, T5, T9 |
| 3 | Block 8 | T2, T6, T7 |
| 3 | Block 9 | T3, T4, T8 |
| 4 | Block 10 | T1, T6, T8 |
| 4 | Block 11 | T2, T4, T9 |
| 4 | Block 12 | T3, T5, T7 |

Table 2 - A 3x3 Balanced-Lattice for Nine Treatments in Four Replicates.
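In the balanced lattice of Table 2, every pair of treatments appears together in exactly one block. The following sketch checks that property for an arbitrary list of blocks. The function name pairs_balanced, its arguments, and the limit of 25 treatments are assumptions made for this example.

```c
#define MAX_T 25 /* assumed upper limit on the number of treatments */

/* Hypothetical helper: returns 1 if every pair of distinct treatments
 * (numbered 1..t) occurs together in exactly lambda blocks, otherwise 0.
 * blocks points to n_blocks rows of k treatment numbers each. */
int pairs_balanced(int t, int n_blocks, int k, const int *blocks, int lambda)
{
    int count[MAX_T + 1][MAX_T + 1] = {{0}};

    if (t < 2 || t > MAX_T || n_blocks < 0 || k < 0) {
        return 0;
    }
    /* Count how often each pair of treatments shares a block. */
    for (int b = 0; b < n_blocks; b++) {
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                int u = blocks[b * k + i];
                int v = blocks[b * k + j];
                if (u < 1 || u > t || v < 1 || v > t) {
                    return 0; /* treatment number out of range */
                }
                count[u][v]++;
                count[v][u]++;
            }
        }
    }
    for (int u = 1; u <= t; u++) {
        for (int v = u + 1; v <= t; v++) {
            if (count[u][v] != lambda) {
                return 0;
            }
        }
    }
    return 1;
}
```

Called with the twelve blocks of Table 2 and lambda equal to 1, the function returns 1. Dropping the last replicate (blocks 10 to 12) leaves some pairs that never share a block, so the function returns 0 for the first nine blocks.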
The Anova table for a balanced-lattice experiment takes the form shared with other balanced incomplete block experiments. In these experiments, the error term is divided into two components: the Inter-Block Error and the Intra-Block Error. For single and multiple locations, the general format of the Anova tables for Lattice experiments is illustrated in Table 3 and Table 4.

| Source | DF | Sum of Squares | Mean Squares |
|---|---|---|---|
| REPLICATES | | SSR | MSR |
| TREATMENTS(unadj) | | SST | MST |
| TREATMENTS(adj) | | SSTa | MSTa |
| BLOCKS(adj) | | SSBa | MSBa |
| INTRA-BLOCK ERROR | | SSE | MSE |
| TOTAL | | SSTot | |

Table 3 The Anova Table for a Lattice Experiment at One Location
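The degrees of freedom for the single-location table can be written out under the standard lattice analysis. The expressions below are the usual textbook values for a lattice with $r$ replicates, block size $k$ and $t = k^2$ treatments; they are supplied here as an assumption about what the DF column contains, not as a transcription of it.

```latex
\begin{array}{ll}
\text{Source} & \text{DF} \\
\hline
\text{REPLICATES} & r - 1 \\
\text{TREATMENTS(unadj)} & t - 1 \\
\text{TREATMENTS(adj)} & t - 1 \\
\text{BLOCKS(adj)} & r(k - 1) \\
\text{INTRA-BLOCK ERROR} & (k - 1)(rk - k - 1) \\
\text{TOTAL} & rt - 1
\end{array}
```

The replicate, unadjusted treatment, adjusted block and intra-block error lines add up to the total: $(r-1) + (t-1) + r(k-1) + (k-1)(rk-k-1) = rt - 1$ when $t = k^2$. For the 3x3 balanced lattice with $r = 4$ this gives $3 + 8 + 8 + 16 = 35$.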
Table 4 The Anova Table for a Lattice Experiment at Multiple Locations
Latin Square designs are very popular in cases where:
1. two blocking factors are involved
2. the two blocking factors do not interact with treatments, and
3. the number of blocks for each factor is equal to the number of treatments.
Consider an octane study involving 4 test vehicles tested in 4 bays with 4 test gasolines. This is a natural arrangement for a Latin square experiment. In this case there are 4 treatments, and two blocking factors, test vehicle and bay, each with 4 levels. The Latin Square for this example would look like the following arrangement.

| Test Bay | Test Vehicle 1 | Test Vehicle 2 | Test Vehicle 3 | Test Vehicle 4 |
|---|---|---|---|---|
| 1 | A | C | B | D |
| 2 | D | B | A | C |
| 3 | C | A | D | B |
| 4 | B | D | C | A |

Table 5. A Latin Square Design for t=4 Treatments
In Table 5, the letters A-D denote the four test gasolines, or treatments, and each cell gives the treatment assigned to a particular test vehicle and test bay. Gasoline A, for example, is tested in the following four vehicle/bay combinations: (1/1), (2/3), (3/2), and (4/4).
Notice that each treatment appears exactly once in every row and column. This balance, together with the assumed absence of interactions between treatments and the two blocking factors is characteristic of a Latin Square.
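The row and column balance described above is easy to verify programmatically before an analysis is run. The sketch below is one way to do so; the function name is_latin_square and the coding of treatments as the letters 'A', 'B', and so on are assumptions made for this example.

```c
/* Hypothetical helper: returns 1 if the t-by-t layout, stored row by row in
 * square[], contains each of the first t capital letters exactly once in
 * every row and every column; otherwise returns 0. Supports t from 1 to 26. */
int is_latin_square(int t, const char *square)
{
    if (t < 1 || t > 26) {
        return 0;
    }
    for (int i = 0; i < t; i++) {
        unsigned long row_seen = 0UL; /* bit n set: letter 'A'+n seen in row i    */
        unsigned long col_seen = 0UL; /* bit n set: letter 'A'+n seen in column i */
        for (int j = 0; j < t; j++) {
            int r = square[i * t + j] - 'A'; /* entry j of row i    */
            int c = square[j * t + i] - 'A'; /* entry j of column i */
            if (r < 0 || r >= t || c < 0 || c >= t) {
                return 0; /* not one of the t treatment letters */
            }
            if (((row_seen >> r) & 1UL) || ((col_seen >> c) & 1UL)) {
                return 0; /* treatment repeated in this row or column */
            }
            row_seen |= 1UL << r;
            col_seen |= 1UL << c;
        }
    }
    return 1;
}
```

Passing the rows of Table 5 as the string "ACBDDBACCADBBDCA" with t equal to 4 returns 1. Changing any single letter makes a treatment repeat within a row and a column, and the function then returns 0.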
The corresponding Anova table for these data (Table 6) contains information on the blocking factors as well as treatment differences. Notice that the F-test for one of the two blocking factors, test vehicle, is statistically significant (p = 0.048); whereas the other, test bay, is not statistically significant (p = 0.321).
Some researchers might use this as a basis to remove test bay as a blocking factor. In that case, the design can then be analyzed as a RCBD experiment since every treatment is repeated once and only once in every block, i.e., test vehicle.
Table 6 - Latin Square Anova Table for Octane Experiment
It is common for a researcher to repeat an experiment and then conduct an analysis of the data. In agricultural experiments, for example, it is common to repeat an experiment at several different farms. In other cases, a researcher may want to repeat an experiment at a specified frequency, such as weekly, monthly or yearly. If these repeated experiments are independent of one another, then they can be treated as multiple locations.
Several of the functions in this chapter allow for analysis of experiments replicated at multiple locations: imsls_f_crd_factorial, imsls_f_rcbd_factorial, imsls_f_lattice, imsls_f_latin_square, imsls_f_split_plot, imsls_f_split_split_plot, imsls_f_strip_plot, and imsls_f_strip_split_plot. By default they all treat locations as a random factor. Function imsls_f_split_plot also allows users to declare locations as a fixed effect.
Originally, split-plot designs were developed for testing agricultural treatments, such as varieties of wheat, different fertilizers or different insecticides. In these original experiments, growing areas were divided into plots. The major treatment factor, such as wheat variety, was randomly assigned to these plots. However, in addition to testing wheat varieties, researchers wanted to test another treatment factor such as fertilizer. This could have been done using a CRD or a RCBD. If a CRD were used, then treatment combinations would need to be randomly assigned to plots, such as shown below in Table 7.

| CRD | | | |
|---|---|---|---|
| W3F2 | W1F3 | W4F1 | W2F1 |
| W2F3 | W1F1 | W1F3 | W1F2 |
| W2F2 | W3F1 | W2F1 | W4F2 |
| W3F2 | W1F1 | W2F3 | W1F2 |
| W4F1 | W3F2 | W3F2 | W4F3 |
| W4F3 | W3F1 | W2F2 | W4F2 |

Table 7 Completely Randomized Experiments Both Factors Randomized
In the CRD illustration above, any plot could have any combination of wheat variety (W1, W2, W3 or W4) and fertilizer (F1, F2 or F3). There is no restriction on randomization in a CRD. Any of the t=12 treatments can appear in any of the 24 plots.
If a RCBD were used, all t=12 treatment combinations would need to be arranged in blocks similar to what is described in Table 8, which places one restriction on randomization.

| RCBD | | | | |
|---|---|---|---|---|
| Block 1 | W3F3 | W1F3 | W4F1 | W4F3 |
| | W2F3 | W1F1 | W3F2 | W1F2 |
| | W2F2 | W3F1 | W2F1 | W4F2 |
| Block 2 | W3F2 | W1F1 | W2F3 | W1F2 |
| | W4F1 | W1F3 | W3F2 | W4F3 |

Table 8 Randomized Complete Block Experiments Both Factors Randomized Within a Block
The RCBD arrangement is basically a replicated CRD with a randomization restriction: the treatments are divided into two replicate groups, each of which is assigned to a block of land. Randomization of treatments only occurs within each block.
At first glance, a split-plot experiment could be mistaken for a RCBD experiment since it is also blocked. The split-plot arrangement with only one replicate for this experiment is illustrated below in Table 9. Notice that it appears as if levels of the fertilizer factor (F1, F2, and F3) are nested within wheat variety (W1, W2, W3 and W4); however, that is not the case. Varieties were actually randomly assigned to one of four rows in the field. After randomizing wheat varieties, fertilizer was randomized within wheat variety.

| Split-Plot Design | | | | |
|---|---|---|---|---|
| Block 1 | W2 | W2F1 | W2F3 | W2F2 |
| | W1 | W1F3 | W1F1 | W1F2 |
| | W4 | W4F1 | W4F3 | W4F2 |
| | W3 | W3F2 | W3F1 | W3F3 |
| Block 2 | W3 | W3F2 | W3F1 | W3F3 |
| | W1 | W1F3 | W1F1 | W1F2 |
| | W4 | W4F1 | W4F3 | W4F2 |
| | W2 | W2F1 | W2F3 | W2F2 |

Table 9 A Split-Plot Experiment for Wheat (W) and Fertilizer (F)
The essential distinction between split-plot experiments and completely randomized or randomized complete block experiments is the presence of a second factor that is blocked, or nested, within each level of the first factor. This second factor is referred to as the split-plot factor, and the first is referred to as the whole-plot factor.
Both factors are randomized, but with a restriction on randomization of the second factor, the split-plot factor. Whole plots (wheat variety) are randomly assigned, without restriction, to plots, or rows in this example. However, the randomization of split-plots (fertilizer) is restricted to random assignment within whole-plots.
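The two-stage randomization just described can be sketched as follows for a single block. The function names, the level numbering from 1, and the use of the standard rand() generator are assumptions made for this example; the output arrays are named whole and split only to echo the input arrays of imsls_f_split_plot, and whether this coding matches what that function expects is not confirmed here.

```c
#include <stdlib.h>

/* Fisher-Yates shuffle of a[0..n-1] using rand(). The modulo step has a
 * slight bias, which is acceptable for an illustration. */
static void shuffle(int a[], int n)
{
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/* Hypothetical helper: randomizes one block of a split-plot experiment.
 * Stage 1: whole-plot levels 1..n_whole are assigned to rows at random.
 * Stage 2: split-plot levels 1..n_split are shuffled separately within each
 * row. whole[] and split[] each receive n_whole * n_split entries, row by
 * row. Returns 0 on success and -1 on invalid input or allocation failure. */
int randomize_split_plot_block(int n_whole, int n_split,
                               int whole[], int split[])
{
    if (n_whole <= 0 || n_split <= 0) {
        return -1;
    }
    int *w = malloc((size_t)n_whole * sizeof *w);
    int *s = malloc((size_t)n_split * sizeof *s);
    if (w == NULL || s == NULL) {
        free(w);
        free(s);
        return -1;
    }
    for (int i = 0; i < n_whole; i++) {
        w[i] = i + 1;
    }
    shuffle(w, n_whole); /* unrestricted randomization of whole plots */

    for (int r = 0; r < n_whole; r++) {
        for (int j = 0; j < n_split; j++) {
            s[j] = j + 1;
        }
        shuffle(s, n_split); /* randomization restricted to this whole plot */
        for (int j = 0; j < n_split; j++) {
            whole[r * n_split + j] = w[r];
            split[r * n_split + j] = s[j];
        }
    }
    free(w);
    free(s);
    return 0;
}
```

With n_whole equal to 4 and n_split equal to 3, each row of the result holds a single wheat variety together with one random ordering of the three fertilizers, as in each block of Table 9. Call srand() beforehand to vary the layout between runs.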
Strip-plot experiments look similar to split-plot experiments. In fact they are easily confused, resulting in incorrect statistical analyses. The essential distinction between strip-plot and split-plot experiments is the application of the second factor. In a split-plot experiment, levels of the second factor are nested within the whole-plot factor (see Table 11). In strip-plot experiments, the whole-plot factor is completely crossed with the second factor (see Table 10).
This occurs, for example, when an agricultural field is used as a block and the levels of the whole-plot factor are applied in vertical strips across the entire field. Levels of the second factor are assigned to horizontal strips across the same block.
|
|
|
Whole-Plot Factor | |||
|
|
|
| |||