6.1 Factorial experiment design

How To Do Research
Faster, Cheaper and Better:
Factorial experiment design
Michael G. Walker, PhD.
[email protected]
We join Gregor Mendel and James Watson debating a problem.
How do I increase
the yield of peas
from my pea
plants?
I think fertilizer is critical.
Seeds!
Everything important to life is in
the seeds.
The type of seeds you use is the
most important factor affecting
yield.
I wish I knew if watering once or twice
a week makes any difference to yield.
When I add fertilizer to my soil,
I think I get better yields.
But adding manure is more
work, and not very pleasant.
Watering twice a week is extra time
and effort.
I also don’t have much space. It’s winter; I
only have room for 4 plants in my
window.
Can I learn about seeds, fertilizer, and
watering with only 4 plants?
For the 1st crop, we should test seeds.
We’ll grow 4 plants:
• 2 plants with English seeds
• 2 plants with Austrian seeds
Then we’ll compare the yield.
Instead of seeds, for the 1st crop, I
think we should test soil manure:
• 2 plants with my regular soil
• 2 plants with manure added
That way we can see the effect
of manure.
• That’s a problem. You want to test manure
first, and I think we should test seeds first.
• Not only that, it’s going to take 2 months for
each crop.
Here’s the way we usually do
our experiments, one factor at
a time.
• 1st crop: Fertilizer (2 months)
• 2nd crop: Seed type (2 months)
• 3rd crop: Water frequency (2 months)
• It’s going to take us 6 months to learn about
these effects.
• Can we learn about the effects of seeds,
fertilizer, and watering in less than 6 months?
I’ve got an idea.
We’ll start with two factors: seed type and fertilizer.
For the 1st crop, we can have each factor at
two levels:
Seed type: English / Austrian
Fertilizer: Regular / Manure
We have 4 combinations and we can put 4
plants in the window.
Here’s a table of all 4 combinations.
Plant number   Seed       Soil
1              English    Regular
2              English    Manure
3              Austrian   Regular
4              Austrian   Manure
So we can test all 4 possible combinations
of seed type and soil type in a single crop.
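The table of combinations above can also be generated rather than written by hand. The following is a minimal sketch, assuming the two factors and levels named on the slide; the helper name `full_factorial` is hypothetical, chosen only for illustration.

```python
from itertools import product

# Factor levels taken from the slide.
factors = {
    "Seed": ["English", "Austrian"],
    "Soil": ["Regular", "Manure"],
}

def full_factorial(factors):
    """Return one dict per run, covering every combination of levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

design = full_factorial(factors)
for plant, run in enumerate(design, start=1):
    print(plant, run["Seed"], run["Soil"])
```

Adding a third factor (for example watering once or twice a week) would only require one more entry in `factors`; the number of runs then doubles to 8.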
• With one crop of 4 plants, we can test the effects
of both fertilizer and seeds.
• And we can see if there is an interaction: does the
effect of fertilizer depend on the type of seed?
• Plus, it only takes us 2 months, instead of 4.
• Using our standard one-factor-at-a-time
experiment design, we’d take 4 months to
learn about two factors, and wouldn’t know if
there was an interaction.
• With this new design we learn about 2 factors
plus interactions in only 2 months. So this
design is faster, cheaper and better.
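To make "effect" and "interaction" concrete, here is a small sketch of how they could be read off the four plants. The yields are invented purely for illustration (the slides give no data), and the interaction is expressed simply as the difference in the fertilizer effect between the two seed types; some textbooks halve this quantity.

```python
# Hypothetical yields (pods per plant), invented for illustration only.
yields = {
    ("English", "Regular"): 10,
    ("English", "Manure"): 12,
    ("Austrian", "Regular"): 11,
    ("Austrian", "Manure"): 19,
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Main effect of fertilizer: mean yield with manure minus mean yield without.
fert_effect = (mean(y for (s, f), y in yields.items() if f == "Manure")
               - mean(y for (s, f), y in yields.items() if f == "Regular"))

# Main effect of seed: mean Austrian yield minus mean English yield.
seed_effect = (mean(y for (s, f), y in yields.items() if s == "Austrian")
               - mean(y for (s, f), y in yields.items() if s == "English"))

# Interaction: does the fertilizer effect depend on the seed type?
fert_effect_english = yields["English", "Manure"] - yields["English", "Regular"]
fert_effect_austrian = yields["Austrian", "Manure"] - yields["Austrian", "Regular"]
interaction = fert_effect_austrian - fert_effect_english

print(fert_effect, seed_effect, interaction)
```

With these made-up numbers, manure helps Austrian seeds much more than English seeds, which is exactly the kind of pattern a one-factor-at-a-time experiment cannot reveal.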
I like this plan. We need a name for this new
experiment design.
Let’s call it a factorial experiment design.
The advantage of this design is that it lets
us look at many factors.
I’m concerned that we only have one
measurement of each combination in this
factorial design.
Could that hurt our statistical significance?
• How about if we run a replicate of the
factorial design for the 2nd crop?
Here’s the one factor at a time design for Crop 1
Plant number   Fertilizer
1              Regular
2              Regular
3              Manure added
4              Manure added
• Using the one-factor-at-a-time method, we
would need 8 plants and 4 months.
• 1st crop: Fertilizer, 4 plants
• 2nd crop: Seed type, 4 plants
Here’s the factorial design using 2 factors for Crop 1
Plant number   Seed       Fertilizer
1              English    Regular
2              English    Manure added
3              Austrian   Regular
4              Austrian   Manure added
We can replicate this for Crop 2
Replicating the factorial design gives two replicates
of each combination using 8 plants total:
Crop   Plant number   Seed       Fertilizer
1      1              English    Regular
1      2              English    Manure added
1      3              Austrian   Regular
1      4              Austrian   Manure added
2      5              English    Regular
2      6              English    Manure added
2      7              Austrian   Regular
2      8              Austrian   Manure added
How is that better than one-factor-at-a-time?
Here’s what our one-factor-at-a-time experiment
would look like.
Crop 1. Regular vs. Manure. Choose best, or cheapest if equivalent yield.
Suppose we choose Regular to use in Crop 2.
One factor at a time:
Crop   Plant number   Seed       Fertilizer
1      1              Austrian   Regular
1      2              Austrian   Regular
1      3              Austrian   Manure added
1      4              Austrian   Manure added
2      5              English    Regular
2      6              English    Regular
2      7              Austrian   Regular
2      8              Austrian   Regular
One factor at a time misses interactions
1. Add male rabbit. Yield = 0
Conclude males have no effect.
Remove male
2. Add female rabbit. Yield = 0
Conclude female has no effect.
Remove female
Conclude neither males nor females affect yield of babies.
Yield of baby rabbits
                     Male Rabbits = 0   Male Rabbits = 1
Female Rabbits = 0   Yield = 0          Yield = 0
Female Rabbits = 1   Yield = 0          Not tested
One factor at a time misses interactions
One factor at a time:
1. Crop 1. Regular vs. Manure. Choose best, or
cheapest if equivalent yield.
2. Crop 2. Austrian vs. English
           Regular                                  Manure
English    2 plants (Crop 2)                        Not tested
Austrian   2 plants (Crop 1) + 2 plants (Crop 2)    2 plants (Crop 1)
Never test Manure + English seed combination.
Test Austrian + regular soil 4 times.
• So testing multiple factors simultaneously in a
Factorial Design detects interactions that are
missed by one-factor-at-a-time.
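The rabbit example can be sketched in a few lines. The response function below is an assumption made for illustration: the slides leave the male-plus-female cell as "Not tested", and here it is simply taken to give a non-zero yield. The one-factor-at-a-time comparisons both come out as zero, while the full 2 × 2 grid exposes the interaction.

```python
from itertools import product

def baby_rabbits(males, females):
    """Toy response (assumed): babies appear only when both sexes are present."""
    return 1 if males and females else 0

# One factor at a time: change one factor, compare with the baseline, undo it.
baseline = baby_rabbits(0, 0)
ofat_male_effect = baby_rabbits(1, 0) - baseline
ofat_female_effect = baby_rabbits(0, 1) - baseline

# Factorial: measure every combination, including males and females together.
grid = {(m, f): baby_rabbits(m, f) for m, f in product((0, 1), repeat=2)}

# Interaction contrast: zero if the two factors simply add, non-zero otherwise.
interaction = grid[1, 1] - grid[1, 0] - grid[0, 1] + grid[0, 0]

print(ofat_male_effect, ofat_female_effect, interaction)
```

The same logic applies to the pea experiment: the Manure + English seed cell is never measured by the one-factor-at-a-time plan, so any effect that appears only in that combination goes unseen.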
There’s also a more subtle argument:
Factorial designs have hidden replicates
• Statistical power and significance increase
with the number of measurements, N.
• So which design gives us more
measurements for each factor, factorial or
one-factor-at-a-time?
• Let’s look at an example.
• Suppose you want to test 3 variables (such as
3 reagents) to determine if they affect a
response (such as cell culture yield).
Conventional one-factor-at-a-time design
Culture number   Reagent
1                control
2                control
3                control
4                control
5                1
6                1
7                1
8                1
9                2
10               2
11               2
12               2
13               3
14               3
15               3
16               3
Each reagent measured N = 4 times
16 cultures total
• Here’s the alternative Factorial design:
• Measure all possible combinations of 3 reagents:
2^3 = 8 combinations
• Including a replicate we still use only 2*8 = 16
cultures total
• Both methods use 16 cultures total
• One-factor-at-a-time measures each factor 4
times
Culture number   Reagent 1   Reagent 2   Reagent 3
1                -           -           -
2                -           -           -
3                +           -           -
4                +           -           -
5                -           +           -
6                -           +           -
7                -           -           +
8                -           -           +
9                +           +           -
10               +           +           -
11               +           -           +
12               +           -           +
13               -           +           +
14               -           +           +
15               +           +           +
16               +           +           +
Still 16 cultures total, but each reagent measured N = 8 times
• So for the same number of cultures (16), the
Factorial Design gives us twice as much
information (N=8 measurements of each
reagent) as we get using one-factor-at-a-time
(N=4)
• Factorial design measures each factor 8 times
• Plus, the Factorial Design tests interactions.
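The N = 4 versus N = 8 comparison can be checked by counting. The sketch below rebuilds both 16-culture layouts from the tables above, representing each culture as the set of reagents it contains (a representation chosen here for illustration), and counts how many cultures contain each reagent.

```python
from itertools import product

reagents = (1, 2, 3)

# One-factor-at-a-time: 4 control cultures, then 4 cultures per single reagent.
ofat = [set() for _ in range(4)] + [{r} for r in reagents for _ in range(4)]

# Factorial: all 2^3 present/absent combinations, each run twice (a replicate).
factorial = [
    {r for r, present in zip(reagents, combo) if present}
    for combo in product((False, True), repeat=3)
    for _ in range(2)
]

def times_measured(design, reagent):
    """Number of cultures in the design that contain the given reagent."""
    return sum(reagent in culture for culture in design)

for r in reagents:
    print(r, times_measured(ofat, r), times_measured(factorial, r))
```

Under the same assumptions (a control group, equal group sizes, two levels per factor), a one-factor-at-a-time layout with k factors puts 1/(k+1) of the cultures on each factor, while the factorial layout always puts half of them on each factor, so the advantage grows with k.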
• Relative efficiency (N per factor) increases
with the number of factors
• Hidden replicates give more power, better p-values
Fractional factorial designs
• If we have k factors, and k is large, we don’t
want to run all 2^k combinations.
• Instead, we run a carefully chosen fraction of
2^k.
• These are called fractional factorial designs.
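As an illustration of a "carefully chosen fraction", here is a sketch of one common textbook construction, a half fraction of a 2^3 design. The slides do not say which fraction to use, so this choice is an assumption: factors A and B take all four combinations of -1/+1, and the level of C is set to the product A × B.

```python
from itertools import product

def half_fraction_2_3():
    """Half fraction of a 2^3 design: choose A and B freely, set C = A * B."""
    runs = []
    for a, b in product((-1, +1), repeat=2):
        runs.append((a, b, a * b))
    return runs

design = half_fraction_2_3()
for a, b, c in design:
    print(a, b, c)
```

This uses 4 runs instead of 8, and each factor is still at its high level in half of the runs. The price is that, in this particular fraction, the main effect of C cannot be separated from the A × B interaction.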
Factorial designs are not
recommended when:
• the treatment labels are likely to get mixed up
(e.g., pipette errors)
• logistics make combinations hard to set up
(e.g., if a robot has to be reprogrammed to set
up a 96-well plate)
• interactions may be harmful or lethal
• combinations of multiple treatments may be
harmful or lethal
Advantages of factorial designs
• Learn about more factors (seed, fertilizer,
water)
• Learn about interactions
• Increased statistical power and significance
(hidden replication)
• More robust results (across multiple conditions)
• All with the same or fewer runs than one-factor-at-a-time
Factorial designs are
• Faster
• Cheaper
• More informative
Regression and ANOVA
• Useful tools for statistical analysis of factorial
designs are analysis of variance (ANOVA) and
regression.
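As a rough sketch of the regression approach mentioned above, the replicated seed-by-fertilizer design can be coded as -1/+1 and fitted with a model containing an intercept, two main effects, and an interaction. The yields are invented for illustration, and the coding (English = -1, Austrian = +1; Regular = -1, Manure = +1) is an assumption. Because the design is balanced, the coded columns are orthogonal and each least-squares coefficient reduces to a simple average, so no statistics library is needed here; a full ANOVA with p-values would normally be done with a statistics package.

```python
# (seed, fertilizer, yield) for the 8 plants of the replicated design.
# Yields are hypothetical, invented for illustration only.
data = [
    (-1, -1, 10), (-1, +1, 12), (+1, -1, 11), (+1, +1, 19),  # Crop 1
    (-1, -1, 12), (-1, +1, 12), (+1, -1, 13), (+1, +1, 19),  # Crop 2
]
n = len(data)

# For a balanced -1/+1 design, each coefficient is sum(column * yield) / n.
intercept = sum(y for _, _, y in data) / n
b_seed = sum(s * y for s, _, y in data) / n
b_fert = sum(f * y for _, f, y in data) / n
b_interaction = sum(s * f * y for s, f, y in data) / n

print(intercept, b_seed, b_fert, b_interaction)
```

Each coefficient is half of the corresponding effect, since moving a factor from -1 to +1 is a change of two coded units.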