Rawlings, John O. (1986). Design of Experiments for Controlled Exposure Studies.

DESIGN OF EXPERIMENTS FOR CONTROLLED EXPOSURE STUDIES
John O. Rawlings
North Carolina State University
Raleigh, N.C. 27695-8203
Presented at
Workshop on Controlled Exposure Techniques and the
Evaluation of Tree Responses to Airborne Chemicals
Atlanta, Georgia
February 4, 1986
INTRODUCTION
I want to thank the organizers for this opportunity to
discuss experimental design for the controlled exposure
studies of the effects of air pollutants on forests.
A
common complaint of statisticians is that they are called
upon for their expertise only after the "damage has been
done" so it is indeed a pleasure to participate in this
early planning of the exposure studies in forest species.
The air pollution studies are extremely important, of major
interest to the population as a whole, not just the
scientists, and they tend to be very expensive studies.
Every effort must be made to ensure that the results from
these studies are as accurate and precise as resources will
permit and that they provide a valid basis for the intended
inferences.
There are three major points I would like to address as
an introduction to the discussion which follows this paper.
First, I want to discuss the importance of clearly
specifying in some detail the objectives of the studies.
The purpose is not to constrain in any way the objectives of
the study but, rather, to encourage a clear definition of
the objectives whatever they might be.
In particular, I
want to concentrate on the importance of specifying the
reference population of environments to which the results of
the study are to be applied.
Secondly, I want to discuss
several key considerations in experimental design.
Finally,
I wish to make a suggestion on the use of dose response
modeling and make an appeal for an early commitment to a
coordinated analysis of all forestry exposure data.
SPECIFICATION OF OBJECTIVES:
On the surface, it seems unnecessary to even mention the
need to specify objectives.
We all have in mind general
objectives when we set out to run an experiment.
However,
very often the objectives are not stated with sufficient
detail to distinguish among several different objectives.
In many cases, the optimum design for one objective will not
be the optimum design for another.
It is possible that the
optimum design for one objective will not even provide
relevant information for the other objective.
This conflict of objectives is illustrated with the
recent National Lakes Survey.
It is my understanding that
the objectives of the study included both assessment of the
number of lakes and the surface area of the lakes that were
acidified.
The optimum sampling design for estimation of
"number of lakes" is not the optimum sampling design for
"surface area".
Therefore, the design used in the survey was a compromise
which provided reasonably good information
on both objectives but not maximum information, for the
resources, on either objective.
We can illustrate the need to be specific in the
statement of the objectives using the air pollution studies.
The general objective,
to study air pollution effects of
ozone, for example, does not tell the whole story.
Are we
interested in simply determining if ozone has an effect on
forest productivity or are we interested in assessing the
magnitude of the ozone effect on the forestry industry?
Is
it the quantity or the quality of forest products which is
important?
Are all species to be considered or only
loblolly pine?
Is it the Southeast Coastal Plain
environment, the Northeast, or the Northwest environments
that are of interest?
Even the two objectives of detecting
an effect versus assessing the magnitude of the effect
require very different kinds of studies if the final
inference is to be statistically defensible.
The key point I wish to make is that the statement of
objectives must include specification of the set of
environments, or the population of environments, to which
the final inference is to apply.
Regardless of how much we
know about the interaction of our biological material with
various environmental factors,
there will always be a major,
random, environmentally induced component of variance in
biological studies; both because we cannot identify and
understand all of the environmental factors which affect
plant growth and because we cannot control many of the
environmental factors we do understand.
Therefore,
if one
is to make an inference over a set of environmental
conditions, for example the environments of the SE Coastal
plain for the next 20 years, the experiments must be
designed so that a measure of how the plants will respond in
those environments is available.
Every factor that contributes to one environment being
different from another is involved in every experiment
regardless of whether that factor has been given any special
attention in the experimental design.
There are undoubtedly
many environmental variables that have not been identified
but which affect plant response.
These unidentified
variables, of necessity, are simply allowed to take whatever
values they happen to have in the particular test
environments.
They are responsible,
in part, for the
"random" environmental variation we observe both within and
between experiments.
The environmental variables we can
identify are handled, basically, in three ways:
(1)
The environmental factor is recognized as being
important and a decision is made to control that variable at
some constant level for all studies.
For example,
supplemental irrigation might be used to ensure that the
plants never suffer drought stress.
(2)
The environmental factor is recognized as being
important and the research is designed to include the study
of the effects of this variable and its interactions.
The
information obtained is then included in any predictions of
the behavior of the system; e.g., of the pollutant effects.
For example, levels of moisture stress might be included as
a treatment factor so that the effects of moisture stress on
the pollutant effects are understood.
(3)
The environmental variable is ignored, perhaps
because it cannot be controlled experimentally.
It plays
the same role in the experiment as the unidentified
environmental variables.
Its effects become part of the
random environmental "noise" in the system.
If moisture
level, for example, is simply ignored, the effects of the
"random" plot-to-plot differences in moisture level within
an experiment become part of experimental error.
The
"random" differences in moisture stress .Q..Y.ll the test
environments contribute to the "environment by pollutant"
interaction effects and become part of the error of
prediction of the effects of the pollutant.
When the intent of the study is to make an inference
over a population of environments that include variation in
that factor,
the first course of action can be recommended
only if the environmental factor is known to have no
influence on the effects of the pollutant; that is, if there
is no interaction between the factor and the pollutant.
This will be a rare situation and, in general, should not be
the chosen course of action.
An inference to the population
of environments implies that the level of this "controlled"
environmental factor will be allowed to vary according to
its rules of behavior in that population and, consequently,
any inference based on data from a "controlled" experiment
will be in error to the extent that there is interaction
between the factors.
Notice that this does not preclude controlling certain
environmental factors.
It is perfectly legitimate to
specify that certain factors are to be held constant if, in
fact, that is compatible with the intended population of
inference.
For example, if a crop is produced only under
greenhouse conditions of adequate moisture and nutrients
under a fixed temperature regime, it would be appropriate to
"fix" the levels of these factors.
Likewise, if the
objective is only to determine if there is an effect of the
pollutant,
it may be desirable to hold certain factors
constant in order to gain in precision for that specific
objective.
The second course of action is to be preferred to the
extent that resources permit.
It is the intent of science
to understand as thoroughly as possible the systems we
choose to study.
This includes how these systems are
influenced by their surroundings.
Certainly, it is
desirable to identify the key environmental variables, those
having major impact on the process, and incorporate their
effects in the dose response models.
This is an expensive
option and there are obviously limits to the number of such
factors that can be studied in depth.
From the perspective
of predicting the impact of pollutants, the gain from
understanding the environmental variables is greater
precision since the environmental variation caused by these
key variables can now be taken into account in the
prediction.
Regardless of the magnitude of the resources, however,
there will always be many environmental factors that remain
as part of the random environmental background, that fall in
category three.
To the extent that these "ignored"
environmental variables affect the plants' response to the
pollutant, an inadequate (nonrepresentative) sampling of the
environmental conditions for the test environments will
produce biased estimates of the effects of the pollutant.
Further, the variations in response attributable to these
"ignored" factors contribute to the error of prediction.
Consequently, any statistically valid inference of the
effects of air pollution, for example, must be based on test
data in which the "ignored" environmental factors were
represented much the same as they will be in the population
of environments to which the inference is intended.
This is
accomplished, conceptually, by using as test environments a
random sample of the reference set of environments.
To emphasize the role of these environmental factors in
research, I would like to show a diagram used by George Box
in a recent seminar at NC State University.
It takes the
form of a slight elaboration on our usual statistical
models:
Y = f(X; a) + ε(W)

where f(X; a) is the part of the model that incorporates the
factors being studied, X, and their parameters, a.
The change
from the usual model is in expressing the residuals, ε, as
functions of another set of variables represented by W.
This emphasizes the fact that what we usually regard as
random error, something mysterious and beyond our control,
is in fact a result of the ignored environmental variables.
They play a role in the research regardless of whether or
not they are explicitly considered.
Of course, the purpose
of research is to move as many of the variables as possible
from W to X and, as a result, become more and more precise
about what can be said about the process being studied.
Let me reiterate using moisture availability as an
environmental variable for illustration.
All research could
be conducted only under well-watered conditions.
The
predictions of the effect of the pollutant would be biased
unless moisture level had no effect on the impact of the
pollutant or the population of environments of interest
involved only "well-watered" conditions.
Alternatively, the
effects of moisture stress on pollutant effects could be
studied, incorporated into the model, and then any
predictions to specific environments would take into account
the levels of moisture stress.
Finally, moisture level
could be simply ignored in the experiments;
that is,
moisture variability would be one of the variables in W, a
component of random error.
Variability in moisture stress
over the set of environments to which the predictions are
applied would become part of the random errors of
prediction.
Of course, it is very difficult and expensive to do an
adequate sampling of environments.
This is particularly
difficult for the studies which are labor intensive and
require fairly sophisticated technology such as the ozone
studies on plants.
Nevertheless,
it is important to
recognize this problem from the very beginning so that
either (1) a reasonably adequate sampling of environments
can be planned, or (2) the limitations of the data can be
appropriately recognized when the regional inferences are
attempted or (3) the objective can be recognized as being
unreasonable with the resources in hand.
EXPERIMENTAL DESIGN
Experimental design includes the sampling of
environments discussed above.
It also includes the choice
of treatment factors and "levels" of each treatment factor,
the numbers of replications within each environment sampled,
the choice of field plot design, size and shape of plots and
blocks which will minimize the impact of local environmental
(soil) variations, and consideration of potential covariates
that might provide statistical control of some local
variations.
Of course, we cannot cover the entire field of
experimental design today.
I have chosen just a few points
I consider to be particularly relevant.
1. Factorial treatment combinations:
The advantages of factorial experiments should not be
overlooked.
This point is particularly relevant in view of
my previous comments on identifying key environmental
variables and building the dose response model to include
their effects.
Investigation of treatment factors one-at-a-
time is inefficient and provides no measure of interaction
between treatment factors.
The only way interaction effects
can be estimated is by using combinations of levels of
relevant factors in the same experiments.
There is no loss
of information in factorial experiments if a factor is
included which turns out to have no effect; the levels of
the factor simply serve as additional replicates of the
other treatments.
This extra replication is referred to as
"hidden replication".
The factorial studies should be considered in two
distinct stages.
An early objective in a relatively new
area would be to identify those key environmental factors
that influence the response of the plant to air pollution.
A preliminary study for this objective would involve many
environmental factors, all that the researcher thought to be
of primary importance,
in a large 2^n factorial study.
With limited resources, it might be necessary to use partial
replicates of 2^n factorials in these initial studies. That
is, a subset of all possible treatment combinations is
chosen such that the interaction effects of interest can be
estimated.
The purpose of these initial studies is simply
to identify key variables which then will be included in
further research.
The second series of studies would be
devoted to developing dose response functions which take
into account the effects of these key variables as well as
the pollutant.
These again would be factorial experiments
but with only the important factors involved and with enough
detail to characterize the dose response models.
A compromise design which allows several environmental
factors to be considered and at the same time provides some
information on the air pollutant dose response function
might use, for example, a half-replicate of a 2^4 factorial
in all combinations with four levels of the pollutant. The
half-replicate of the four environmental factors, eight
treatment combinations, would be chosen so as to sacrifice
the four-factor interaction.
In combination with the four
levels of the pollutant, this design would require 32
experimental units.
Five levels of the pollutant would
require 40 exposure chambers.
Such a design provides
estimates of the main effects of the environmental factors,
aliased with their three-factor interactions, and estimates
of the interaction effects of the air pollutant with the
environmental factors (Table 1).
The latter are aliased
with the four-factor interactions between the pollutant and
three environmental factors.
This design provides no direct
estimate of error from true replication.
The three-factor
interactions between the pollutant and two environmental
factors would be used as the estimate of experimental error.
Table 1. Analysis of variance for a half-replicate of a 2^4
factorial of four environmental factors with four levels of
the pollutant. The half-replicate is chosen so that the
four-factor interaction among the environmental factors is
sacrificed.

SOURCE OF VARIATION        D.F.    ALIASED WITH:
------------------------------------------------------------
Total                       31
Pollutant (P)                3
Factor A                     1     B x C x D
Factor B                     1     A x C x D
Factor C                     1     A x B x D
Factor D                     1     A x B x C
A x B                        1     C x D
A x C                        1     B x D
A x D                        1     B x C
Pollutant x A                3     Pollutant x B x C x D
Pollutant x B                3     Pollutant x A x C x D
Pollutant x C                3     Pollutant x A x B x D
Pollutant x D                3     Pollutant x A x B x C
Pollutant x A x B            3*    Pollutant x C x D
Pollutant x A x C            3*    Pollutant x B x D
Pollutant x A x D            3*    Pollutant x B x C
------------------------------------------------------------
* Three-factor interactions pooled for an estimate of
experimental error.
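As a rough illustration of the mechanics, the short program sketch
below (written in Python; the factor names and dose values are
hypothetical) constructs the 32 treatment combinations just described:
it selects the half-replicate of the 2^4 factorial by sacrificing the
four-factor interaction and crosses the resulting eight environmental
combinations with four pollutant levels.

    # Sketch: build the 32 treatment combinations for a half-replicate of a
    # 2^4 factorial (environmental factors A-D) crossed with four pollutant
    # levels.  Factor names and pollutant doses are illustrative only.
    from itertools import product

    env_factors = ["A", "B", "C", "D"]           # e.g., moisture, nutrients, ...
    pollutant_levels = [0.02, 0.05, 0.09, 0.15]  # hypothetical ozone doses (ppm)

    # Full 2^4 factorial in coded levels (-1 = low, +1 = high).
    full_factorial = list(product([-1, 1], repeat=4))

    # Keep the half-replicate that sacrifices the A x B x C x D interaction:
    # retain the runs whose four-way product of signs is +1.
    half_replicate = [run for run in full_factorial
                      if run[0] * run[1] * run[2] * run[3] == 1]
    assert len(half_replicate) == 8    # eight environmental combinations

    # Cross the eight combinations with the four pollutant levels.
    treatments = [(env, dose) for env in half_replicate
                  for dose in pollutant_levels]
    assert len(treatments) == 32       # 32 exposure chambers required

    for env, dose in treatments:
        labels = {f: lvl for f, lvl in zip(env_factors, env)}
        print(labels, "pollutant =", dose)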
Different levels of some treatment factors can be
compared within exposure chambers.
Different cultivars, for
example, can be grown in the same chamber.
The possibility
of including such a factor within the exposure chambers in a
conventional split-plot design, with very little additional
cost, should not be overlooked.
2. Numbers of levels of air pollutant treatment.
There is a common misconception that the more distinct
levels of a quantitative factor, the better. In general,
this is not true. Given fixed resources, that is, a fixed
number of exposure chambers, greater efficiency is obtained from a
relatively small number of treatment levels with a
corresponding higher number of replications at each level.
The optimum number of treatment levels depends on the
complexity of the response curve and the degree of faith one
has in the assumed response function.
For example, the
optimum strategy if the response curve is known to be linear
is to split the replicates equally between two extreme
exposures.
If the response curve is known to be quadratic,
the optimum number of distinct levels is three.
This
optimum number of levels, and their placement, depends on
the true dose response curve, which of course is not known.
To allow for a test of the adequacy of the response model
and to provide protection against misguessing the nature of
the response curve, a few additional points are needed.
The
extra design points become increasingly important for
protection as the response to the pollutant becomes more and
more concentrated in a narrow dose range.
Thus, one would
certainly use more levels than the optimum number for a
specific dose function but the philosophy of using as many
levels as possible should not be pursued.
The general rule
would be to use a few more points than required to
characterize the curve;
a few points more than the number of
parameters in the most complex model anticipated.
This, of
course, assumes that the remaining chambers are dedicated to
replication or the investigation of other factors.
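The trade-off can be illustrated with a small calculation. The sketch
below is purely illustrative and assumes a quadratic dose response on
a coded dose scale; it compares two ways of spending twelve chambers,
three levels with four replicates each versus twelve distinct, equally
spaced levels. Under the quadratic assumption the concentrated
three-level design yields the larger information determinant and the
smaller average prediction variance.

    # Sketch: compare two allocations of 12 chambers for fitting a quadratic
    # dose response y = b0 + b1*x + b2*x^2 on a coded dose scale [-1, 1].
    # The specific numbers are illustrative and not from the paper.
    import numpy as np

    def design_matrix(doses):
        x = np.asarray(doses, dtype=float)
        return np.column_stack([np.ones_like(x), x, x**2])

    # Design A: three levels (low, middle, high), four replicates each.
    design_a = [-1, 0, 1] * 4
    # Design B: twelve distinct, equally spaced levels, one chamber each.
    design_b = np.linspace(-1, 1, 12)

    grid = design_matrix(np.linspace(-1, 1, 201))  # doses where predictions are wanted

    for name, doses in [("3 levels x 4 reps", design_a),
                        ("12 distinct levels", design_b)]:
        X = design_matrix(doses)
        info = X.T @ X                    # information matrix (per unit sigma^2)
        cov = np.linalg.inv(info)         # covariance of the parameter estimates
        avg_pred_var = np.mean(np.sum(grid @ cov * grid, axis=1))
        print(f"{name}: det(information) = {np.linalg.det(info):8.1f}, "
              f"average prediction variance = {avg_pred_var:.3f} * sigma^2")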
3. Placement of design points.
One of my students, Karen Dassel, and I have spent
considerable time investigating the optimum design for the
Weibull response curve.
This is a three parameter nonlinear
model used in many of the NCLAN (National Crop Loss
Assessment Network) studies.
The optimum number of distinct
points for estimation of the model parameters is three if
the response is known to be Weibull.
The optimum placement
of these points can also be specified but, again, only if we
know the values of the parameters of the response curve.
This is, of course, an unrealistic situation for the
experimenter, for if he knew the curve the experiment would no
longer be needed.
Nevertheless, the study does point out some
general rules:
(1) When the "region of interest" covers only a part of
the dose response curve, there are sizeable gains in
precision to be realized from having the experimental dose
range go well beyond the region of interest.
With ozone,
for example, the region of primary interest is from
approximately 0.025 to 0.07 ppm.
For the Weibull response
model, the optimum minimum dose is as near zero as practical
and the optimum maximum dose is the dose that generates
approximately an 80% yield reduction; approximately 0.15 to
0.20 ppm depending on species.
Precision of predicted
losses drops rapidly as the minimum dose level rises or as
the maximum dose level falls.
THERE IS A STATISTICAL
ADVANTAGE OF HAVING DOSES ABOVE THE RANGE OF DOSES EXPECTED
IN NATURE.
(2) Given that the end points of the dose range have
been determined as above,
it is best to concentrate the
remaining dose levels in the "region of activity" of the
response curve.
This, again, depends on some prior idea of
where the response will occur and can only be applied
realistically after some information is obtained.
With poor
prior information on the nature of the response curve,
uniform distribution of the design points over the dose
scale appears to provide protection against misguessing.
Uniform distribution of the design points results in some
loss in precision, compared to concentrating the points in
the region of activity,
if the dose response curve has been guessed correctly.
But, on the other hand, the precision
with uniform distribution can be much better if the curve
has not been guessed correctly.
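For concreteness, the sketch below evaluates a three-parameter Weibull
yield model of the form y = alpha * exp(-(x/omega)^lambda), the form
used in the NCLAN work, and solves for the dose producing a specified
fractional yield reduction. The parameter values are invented for
illustration; the point is simply that the dose giving roughly an 80%
reduction, suggested above as the optimum maximum design point, falls
well above the ambient range of primary interest.

    # Sketch: the Weibull dose response model y = alpha * exp(-(x/omega)**lam)
    # and the dose producing a specified fractional yield reduction.
    # Parameter values below are invented purely for illustration.
    import numpy as np

    def weibull_yield(x, alpha, omega, lam):
        """Predicted yield at ozone dose x (alpha = yield at zero dose)."""
        return alpha * np.exp(-(x / omega) ** lam)

    def dose_for_reduction(reduction, omega, lam):
        """Dose at which yield is reduced by the given fraction (e.g., 0.80)."""
        return omega * (-np.log(1.0 - reduction)) ** (1.0 / lam)

    alpha, omega, lam = 100.0, 0.12, 2.5   # hypothetical parameter values

    for reduction in (0.10, 0.50, 0.80):
        d = dose_for_reduction(reduction, omega, lam)
        print(f"{int(reduction * 100):2d}% yield reduction at about {d:.3f} ppm "
              f"(predicted yield {weibull_yield(d, alpha, omega, lam):.1f})")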
4. Placement of experimental plots and supplemental
information.
Exposure studies with perennial crops are long-term
commitments.
Errors at the beginning of the study will be
around to cause trouble for a long time.
Therefore, any
extra effort in choice of experimental units is likely to
pay good dividends over the course of the study.
Blocking
of experimental units should be used to remove as much of
the test site variability as possible.
The experimental
units (exposure chambers) should be located on uniform
material within blocks.
A uniformity trial on the
experimental area before the experiment is started may be
worthwhile.
Exposure studies have the advantage that
growth of plants (at least initial growth, stand, etc.) can
be observed before the chambers are placed on the plots and
before treatments are assigned to plots.
Thus, plots can be
chosen for uniformity on the basis of the growth of the same
plants on which the study will be conducted.
In addition,
supplemental information can be obtained on the experimental
field (soil measurements, etc.) which can also be used to
define the experimental units and classify them according to
their similarities and differences.
5. Covariates and companion plots.
Use of initial plant and soil data for defining plots
and blocks has been mentioned.
The use of this kind of
information to remove field variability through covariance
analysis may be particularly useful in forestry studies
where selection of uniform field conditions may be
difficult.
Initial plant size, for example, has been shown
repeatedly to be very effective in improving the precision
of experiments.
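A minimal sketch of such a covariance analysis follows, assuming a
simple data layout with one record per plot and columns for dose
treatment, initial plant size, and final biomass; the file and column
names are hypothetical. The comparison of residual mean squares with
and without the covariate shows how much plot-to-plot variability the
initial size measurement removes.

    # Sketch: analysis of covariance with initial plant size as the covariate.
    # The file name and column names are hypothetical placeholders.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("exposure_plots.csv")  # columns: dose, initial_size, biomass

    # Model without the covariate: treatment means only.
    fit_plain = smf.ols("biomass ~ C(dose)", data=df).fit()

    # Covariance analysis: adjust treatment comparisons for initial plant size.
    fit_ancova = smf.ols("biomass ~ C(dose) + initial_size", data=df).fit()

    # A smaller residual mean square for the covariance model indicates the
    # covariate is removing plot-to-plot variability.
    print("residual MSE, treatments only:", fit_plain.mse_resid)
    print("residual MSE, with covariate: ", fit_ancova.mse_resid)
    print(fit_ancova.summary())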
NCLAN has also attempted to use supplemental information
in the form of "companion plots", plots immediately adjacent
to the exposure chambers.
This is essentially an entire
network of ambient air plots (control plots) interspersed
among the chambers.
For the annual plants, this has only
occasionally proven useful for improving the precision of
dose response models.
Companion plots may, however, be much
more useful in experiments with trees where effects are
allowed to accumulate over more time and where the
experimental areas are likely to be more heterogeneous
(because of larger plots and, therefore, a larger
experimental area and because of less mixing of the soils by
tillage).
Serious consideration should be given to the use
of tree growth data from companion plots to improve the
precision of the studies.
6. Power of test.
The objectives should include a statement of the desired
precision of the estimates and the desired power for
critical tests of significance.
These considerations
dictate the size of the experiment -- the number of
replications, the number of test environments and the number
of times the studies are repeated.
Frequently, precision
and power are not explicitly considered until after the
study has been completed, if then, and the data are being
analyzed.
Relative precision and power of different
experimental designs can be compared before the experiment
is run, and absolute power can be computed if an
estimate
of the coefficient of variation or the experimental error is
available.
With this information, experiments can be
designed with sufficient power to meet the objectives.
Much
of the controversy over scientific findings arises because
of inadequate power in the designed experiments to
consistently detect real effects, that is, an underdesigning of the
experiments.
As a consequence, the significant results in
some studies and insignificant results in others are
interpreted (incorrectly) as inconsistencies in the process;
not as inadequacies in the research.
Computation of power or, equivalently, the required
number of replicates, will quickly reveal whether the
objectives are reasonable with the available resources.
A
seemingly logical and economical study to assess the impact
of air pollution would use only ambient air and charcoal
filtered plots with several replicates of each.
How many
replicates are needed with this design to detect a
reasonably high 5% change in productivity between these two
levels of pollutant?
If we assume the coefficient of
variation is 10% and that any change in productivity will be
a decline, one needs approximately 22 replications to have
just a 50:50 chance of detecting the difference; fifty
replicates are needed to have a probability of .80 of
detecting the change.
If the change to be detected is
dropped to 2.5%, 200 replications are needed to have an 80%
chance of detecting the change.
Clearly, such a study
should not be undertaken unless the expected change is
considerably greater than 5% or the coefficient of variation
is lower than 10%.
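The replication figures quoted above can be reproduced, at least
approximately, with a normal-approximation power calculation for a
one-sided comparison of two group means. The sketch below is one way
to carry out that calculation.

    # Sketch: approximate replicates per group needed to detect a given
    # proportional decline between ambient and charcoal-filtered plots,
    # using a one-sided two-sample normal approximation at alpha = 0.05.
    from scipy.stats import norm

    def reps_per_group(decline, cv, power, alpha=0.05):
        """decline and cv as fractions (0.05 = 5%); returns replicates per group."""
        z = norm.ppf(1 - alpha) + norm.ppf(power)
        return 2 * (z * cv / decline) ** 2

    for decline, power in [(0.05, 0.50), (0.05, 0.80), (0.025, 0.80)]:
        n = reps_per_group(decline, cv=0.10, power=power)
        print(f"detect a {decline:.1%} decline with power {power:.0%}: "
              f"about {n:.0f} replicates per group")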
ANALYSIS OF DATA.
There are two points I wish to emphasize concerning the
analysis of data.
The first point is to encourage the use
of response models and regression analyses to characterize
the response of quantitative variables to pollutants rather
than using simple means comparison procedures.
A good
response model, one which adequately characterizes the plant
response, greatly increases the precision of the predicted
yield losses over using simple comparison of two treatment
means.
Again, I use the Weibull model from NCLAN for
illustration of this concept.
Suppose one had the
objective of predicting yield loss for one specific dose
interval, say between 0.025 and 0.06 ppm.
One simple
experimental design would place half of the experimental
units at each end of the dose range of interest, at .025 and
.06.
The predicted relative yield loss would be estimated
as the difference between the two means divided by the mean
for the lower dose.
Note that all of the experimental
effort is concentrated on this one interval; no information
is available on any other interval.
Dassel's study of
design of experiments for the Weibull response model has
shown, however, that much higher precision is obtained, for
this same interval, from fitting the Weibull model and
predicting the relative yield loss from the fitted Weibull.
In addition, the fitted Weibull dose response model provides
information for all intervals within the limits of the
study, not just the .025 to .06 interval.
Use of
appropriate dose response models can greatly increase the
amount of information realized from an experiment.
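A rough sketch of this analysis approach is given below: fit the
three-parameter Weibull to yield data from all of the doses and then
compute the predicted relative yield loss for the 0.025 to 0.06 ppm
interval from the fitted curve. The data are simulated and the
parameter values invented; only the mechanics are intended.

    # Sketch: fit a Weibull dose response model to (simulated) yield data and
    # predict the relative yield loss over the 0.025 to 0.06 ppm interval.
    # The "true" parameter values and the data are invented for illustration.
    import numpy as np
    from scipy.optimize import curve_fit

    def weibull_yield(x, alpha, omega, lam):
        return alpha * np.exp(-(x / omega) ** lam)

    rng = np.random.default_rng(0)
    doses = np.repeat([0.01, 0.04, 0.07, 0.10, 0.15], 6)   # design points
    true = dict(alpha=100.0, omega=0.12, lam=2.5)
    yields = weibull_yield(doses, **true) + rng.normal(0, 5, doses.size)

    # Fit the three-parameter model (starting values are a rough guess).
    params, _ = curve_fit(weibull_yield, doses, yields, p0=[90.0, 0.1, 2.0])
    alpha_hat, omega_hat, lam_hat = params

    # Predicted relative yield loss between 0.025 and 0.06 ppm from the fit.
    y_low = weibull_yield(0.025, *params)
    y_high = weibull_yield(0.06, *params)
    loss = (y_low - y_high) / y_low
    print(f"fitted parameters: alpha={alpha_hat:.1f}, omega={omega_hat:.3f}, "
          f"lam={lam_hat:.2f}")
    print(f"predicted relative yield loss, 0.025 to 0.06 ppm: {loss:.1%}")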
Secondly, and finally, I want to put in a plug for
statistics. Some of you probably think that most of this
talk has been a plug.
Here, however, I want to push for the
direct involvement of statisticians in the conduct of the
experiments as team members, not simply as the statistician
to whom the graduate student or technician is sent when a
problem arises.
Obviously,
this point is a little out of
sequence since the statistician as a team member is going to
be involved in all of the points we have discussed.
I
introduce it at this point because I want to encourage an
activity that falls primarily under the purview of the
statistician.
The program should be structured so that one individual
(a statistician directly involved in the program) is given
the responsibility of thoroughly and critically analyzing
all data.
If started early, this has the tremendous
advantage of feed-back to the researchers of additional
information needed for modeling, of inadequacies in the
experimental designs or conduct of the experiments, or of
inadequate power of experiments as designed.
It also has
the advantage that one individual has the responsibility of
asking questions which are frequently ignored: are the
are the
models adequate, are the basic assumptions of the analysis
satisfied, do the residuals behave properly, etc.
Such
questions frequently can be better answered from a
coordinated analysis of several data sets than from the
individual experiments.
For illustration of this point, I
have been concerned from the beginning of the NCLAN studies
that the variance associated with the yield responses might
not be homogeneous.
This concern arose from the often-observed
phenomenon of the variance increasing with the
level of performance. However, no solid indication of
heterogeneous variance was ever obtained from any one study.
Recent analysis by Virginia Lesser of eight sets of soybean
data from North Carolina (Heagle), with 74 degrees of
freedom for the pooled residuals, provides fairly clear
indication that there is a tendency for the variance to
increase with the mean and that a log transformation would
be helpful.
At the same time, this larger data set provided
no indication that the Weibull model was inadequate.
Thus,
analyses of the combined data sets have both highlighted a
heretofore undetected problem and provided support for the
response model being used.
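Such a coordinated check can be sketched quite simply: pool the
within-group means and variances from the several experiments, regress
the log of the group variance on the log of the group mean, and see
whether the slope is near two, the situation in which a log
transformation stabilizes the variance. The sketch below assumes a
hypothetical pooled data file with study, dose, and yield columns.

    # Sketch: check whether residual variance increases with the mean across
    # several pooled data sets, and whether a log transformation is indicated.
    # The file name and column names are hypothetical placeholders.
    import numpy as np
    import pandas as pd

    df = pd.read_csv("pooled_soybean.csv")   # columns: study, dose, yield_kg

    # Mean and variance of yield within each study-by-dose group.
    groups = df.groupby(["study", "dose"])["yield_kg"].agg(["mean", "var"]).dropna()

    # Regress log(variance) on log(mean); a slope near 2 implies a roughly
    # constant coefficient of variation, for which a log transform of the
    # yields stabilizes the variance.
    slope, intercept = np.polyfit(np.log(groups["mean"]), np.log(groups["var"]), 1)
    print(f"estimated slope of log(variance) on log(mean): {slope:.2f}")
    if 1.5 < slope < 2.5:
        print("variance increases with the mean; a log transformation looks helpful")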
SUMMARY
There are many aspects of experimental design which
could be discussed today.
I have tried to mention a few
which I have found to be particularly important and often
not given the attention they deserve.
I would like to
summarize this presentation by listing these key points.
1. Clear specification of the objectives is a requisite
for good experimental design.
This must include an
understanding of the population of environments to
which the inferences are to be made.
Decisions must
be made, often with limited information, as to which
environmental factors are most likely to influence
the response to the pollutant so that they might be
studied and their effects included in the response
models.
2. The greater efficiency of factorial experiments
should not be overlooked.
Partial replicates of 2^n
factorials serve as useful screening studies to
identify the important interacting environmental
factors.
The key environmental factors must then be
incorporated in the dose response studies if their
effects on the dose response model are to be
understood and taken into account.
3. There is no need to have an excessively large number
of distinct levels of the pollutant in order to
characterize the dose response curve.
More and
better information is obtained if a minimal number of
levels is used with a corresponding increase in
numbers of replications or numbers of environmental
factors included in the study.
4. The response curve is estimated best if the design
points cover a wide enough dose range to allow a
major part of the dose response curve to be realized.
This range is usually well beyond the levels of the
pollutant expected in nature.
The optimum placement
of these points is dependent on the nature of the
response curve.
A few additional points and equal
distribution of the design points over the interval
seem to protect against misguessing the response
curve.
5. Preliminary information on the soils and the plants
at the test sites can improve the precision of the
experiments by facilitating the placement of plots
and blocks and as covariate information in the
analyses of data.
This may be even more useful for
perennial crops than for annual crops.
6. The power of a study for specific hypotheses should
be considered before the study is run.
Computations
of power will frequently alert one to inadequate
experiments or overly ambitious objectives.
7. Dose response models and regression analyses should
be used to characterize the response to the pollutant
whenever possible.
In general, more precise
predictions of effects can be made from appropriate
response equations than from simple comparison of
means.
8. The planning of a major research effort should
include early and coordinated analyses of all
results.
The combined analyses provide opportunities
for finding inadequacies in the research and the
models which might otherwise go undetected.