How To Do Research Faster, Cheaper and Better: Factorial Experiment Design
Michael G. Walker, PhD. [email protected]

We join Gregor Mendel and James Watson debating a problem.

How do I increase the yield of peas from my pea plants?

I think fertilizer is critical. When I add fertilizer to my soil, I think I get better yields. But adding manure is more work, and not very pleasant.

Seeds! Everything important to life is in the seeds. The type of seeds you use is the most important factor affecting yield.

I wish I knew whether watering once or twice a week makes any difference to yield. Watering twice a week is extra time and effort.

I also don’t have much space. It’s winter; I only have room for 4 plants in my window. Can I learn about seeds, fertilizer, and watering with only 4 plants?

For the 1st crop, we should test seeds. We’ll grow 4 plants:
• 2 plants with English seeds
• 2 plants with Austrian seeds
Then we’ll compare the yield.

Instead of seeds, for the 1st crop I think we should test manure:
• 2 plants with my regular soil
• 2 plants with manure added
That way we can see the effect of manure.

• That’s a problem. You want to test manure first, and I think we should test seeds first.
• Not only that, it’s going to take 2 months for each crop.

Here’s the way we usually do our experiments, one factor at a time:
• 1st crop: Fertilizer (2 months)
• 2nd crop: Seed type (2 months)
• 3rd crop: Watering frequency (2 months)
• It’s going to take us 6 months to learn about these effects.
• Can we learn about the effects of seeds, fertilizer, and watering in less than 6 months?

I’ve got an idea. We’ll start with two factors. For the 1st crop, we can set each factor at two levels:
• Seed type: English / Austrian
• Fertilizer: Regular soil / Manure added
That gives 4 combinations, and we can put 4 plants in the window. Here’s a table of all 4 combinations.
Plant number   Seed       Soil
1              English    Regular
2              English    Manure
3              Austrian   Regular
4              Austrian   Manure

So we can test all 4 possible combinations of seed type and soil type in a single crop.
• With one crop of 4 plants, we can test the effects of both fertilizer and seeds.
• And we can see if there is an interaction: does the effect of fertilizer depend on the type of seed?
• Plus, it only takes us 2 months, instead of 4.
• Using our standard one-factor-at-a-time experiment design, we’d take 4 months to learn about two factors, and we wouldn’t know if there was an interaction.
• With this new design we learn about 2 factors plus their interaction in only 2 months. So this design is faster, cheaper and better.

I like this plan. We need a name for this new experiment design.
Let’s call it a factorial experiment design.
The advantage of this design is that it lets us look at many factors.
I’m concerned that we only have one measurement of each combination in this factorial design. Could that hurt our statistical significance?
• How about if we run a replicate of the factorial design for the 2nd crop?

Here’s the one-factor-at-a-time design for Crop 1:

Plant number   Fertilizer
1              Regular
2              Regular
3              Manure added
4              Manure added

Using the one-factor-at-a-time method, we would need 8 plants and 4 months:
• 1st crop: Fertilizer (4 plants)
• 2nd crop: Seed type (4 plants)

Here’s the factorial design using 2 factors for Crop 1:

Plant number   Seed       Fertilizer
1              English    Regular
2              English    Manure added
3              Austrian   Regular
4              Austrian   Manure added

We can replicate this for Crop 2. Replicating the factorial design gives two replicates of each combination using 8 plants total:

Crop   Plant number   Seed       Fertilizer
1      1              English    Regular
1      2              English    Manure added
1      3              Austrian   Regular
1      4              Austrian   Manure added
2      5              English    Regular
2      6              English    Manure added
2      7              Austrian   Regular
2      8              Austrian   Manure added

How is that better than one-factor-at-a-time?

One factor at a time misses interactions
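Generating a full factorial plan is mechanical: take every combination of factor levels, then repeat the whole set once per replicate. The sketch below reproduces the replicated 8-plant table for the pea example; the dictionary layout and the function names `all_combinations` and `replicated_plan` are illustrative choices, not part of the original method.

```python
from itertools import product

# Factor levels from the pea example (illustrative layout).
FACTORS = {
    "Seed": ["English", "Austrian"],
    "Fertilizer": ["Regular", "Manure added"],
}


def all_combinations(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def replicated_plan(factors, n_replicates):
    """Run the full factorial once per crop, numbering plants consecutively."""
    plan = []
    plant = 1
    for crop in range(1, n_replicates + 1):
        for run in all_combinations(factors):
            plan.append({"Crop": crop, "Plant": plant, **run})
            plant += 1
    return plan


if __name__ == "__main__":
    for row in replicated_plan(FACTORS, n_replicates=2):
        print(row)
```

Adding a third factor, such as watering frequency, is one more dictionary entry; the number of combinations doubles to 8.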
Here’s what our one-factor-at-a-time experiment would look like:
1. Crop 1: Regular vs. Manure. Choose the best, or the cheapest if yields are equivalent. Suppose we choose Regular to use in Crop 2.
2. Crop 2: Austrian vs. English.

One factor at a time:

Crop   Plant number   Seed       Fertilizer
1      1              Austrian   Regular
1      2              Austrian   Regular
1      3              Austrian   Manure added
1      4              Austrian   Manure added
2      5              English    Regular
2      6              English    Regular
2      7              Austrian   Regular
2      8              Austrian   Regular

Arranged by seed and fertilizer, the one-factor-at-a-time plan covers:

            Regular                                  Manure added
English     2 plants (Crop 2)                        Not tested
Austrian    2 plants (Crop 1) + 2 plants (Crop 2)    2 plants (Crop 1)

• We never test the Manure + English seed combination.
• We test Austrian + regular soil 4 times.

A second example, the yield of baby rabbits:
1. Add a male rabbit. Yield = 0. Conclude males have no effect. Remove the male.
2. Add a female rabbit. Yield = 0. Conclude females have no effect. Remove the female.
Conclusion: neither males nor females affect the yield of babies.

Yield of baby rabbits:

                  0 female rabbits   1 female rabbit
0 male rabbits    Yield = 0          Yield = 0
1 male rabbit     Yield = 0          Not tested

• So testing multiple factors simultaneously in a factorial design detects interactions that are missed by one-factor-at-a-time.

There’s also a more subtle argument: factorial designs have hidden replicates
• All else being equal, statistical power and significance increase with the number of measurements, N (big N).
• So which design gives us more measurements for each factor, factorial or one-factor-at-a-time?
• Let’s look at an example.
• Suppose you want to test 3 variables (such as 3 reagents) to determine whether they affect a response (such as cell culture yield).
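The missed-interaction argument can be checked by counting which seed/fertilizer cells each plan actually measures. The sketch below encodes the one-factor-at-a-time pea table as hypothetical `(crop, seed, fertilizer)` tuples and also defines the usual 2×2 interaction contrast: the effect of factor A when B is high, minus the effect of A when B is low. The function names are illustrative.

```python
from collections import Counter
from itertools import product

# One-factor-at-a-time pea plan from the table: (crop, seed, fertilizer).
OFAT_PEAS = [
    (1, "Austrian", "Regular"),
    (1, "Austrian", "Regular"),
    (1, "Austrian", "Manure added"),
    (1, "Austrian", "Manure added"),
    (2, "English", "Regular"),
    (2, "English", "Regular"),
    (2, "Austrian", "Regular"),
    (2, "Austrian", "Regular"),
]

SEEDS = ["English", "Austrian"]
FERTILIZERS = ["Regular", "Manure added"]


def cell_counts(plan):
    """Count plants in each (seed, fertilizer) cell, including empty cells."""
    counts = Counter((seed, fert) for _, seed, fert in plan)
    return {cell: counts[cell] for cell in product(SEEDS, FERTILIZERS)}


def untested_cells(plan):
    """Cells of the 2x2 grid that the plan never measures."""
    return [cell for cell, n in cell_counts(plan).items() if n == 0]


def interaction_contrast(y00, y10, y01, y11):
    """(Effect of A at high B) minus (effect of A at low B).

    yAB is the mean response with A at level A and B at level B (0 = low, 1 = high).
    A non-zero value means the effect of A depends on B. All four cell means
    are required, so the contrast cannot be computed if any cell is untested.
    """
    return (y11 - y01) - (y10 - y00)


if __name__ == "__main__":
    print(cell_counts(OFAT_PEAS))
    print("Never tested:", untested_cells(OFAT_PEAS))
```

For the rabbit grid, the three tested cells all yield 0, so the contrast reduces to the yield of the untested male + female cell: whatever happens there is entirely interaction, and a one-factor-at-a-time plan never looks.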
Conventional one-factor-at-a-time design:

Culture number   Reagent
1-4              control
5-8              1
9-12             2
13-16            3

• Each reagent is measured N = 4 times
• 16 cultures total

Here’s the alternative factorial design:
• Measure all possible combinations of the 3 reagents: $2^3 = 8$ combinations
• Including a replicate, we still use only 2 × 8 = 16 cultures total

Culture number   Reagent 1   Reagent 2   Reagent 3
1                -           -           -
2                -           -           -
3                +           -           -
4                +           -           -
5                -           +           -
6                -           +           -
7                -           -           +
8                -           -           +
9                +           +           -
10               +           +           -
11               +           -           +
12               +           -           +
13               -           +           +
14               -           +           +
15               +           +           +
16               +           +           +

Still 16 cultures total, but each reagent is measured N = 8 times.
• Both methods use 16 cultures total
• One-factor-at-a-time measures each factor 4 times
• The factorial design measures each factor 8 times
• So for the same number of cultures (16), the factorial design gives us twice as many measurements of each reagent: each reagent’s effect is estimated from 8 cultures with it versus 8 without it, instead of 4 cultures versus 4 controls
• Plus, the factorial design tests interactions
• Relative efficiency (N per factor) increases with the number of factors: with k factors and the same number of runs, the factorial design measures each factor (k + 1)/2 times as often as one-factor-at-a-time
• Hidden replicates give more power and better p-values

Fractional factorial designs
• If we have k factors, and k is large, we don’t want to run all $2^k$ combinations.
• Instead, we run a carefully chosen fraction of the $2^k$ combinations.
• These are called fractional factorial designs.
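The hidden-replicate count can be verified directly: build both 16-culture designs and count how many cultures receive each reagent. The sketch uses 0/1 for absent/present; its run order differs from the table above, which does not affect the counts. It also shows one standard fraction, the $2^{3-1}$ half fraction in -1/+1 coding with reagent 3 set to the product of reagents 1 and 2 (defining relation I = ABC). That generator is a textbook choice assumed here for illustration, not one taken from this document, and it aliases each main effect with a two-factor interaction. All function names are hypothetical.

```python
from itertools import product


def ofat_design(k, n_per_arm):
    """Control arm plus one arm per reagent; each run switches on at most one reagent."""
    runs = [tuple([0] * k)] * n_per_arm
    for i in range(k):
        run = tuple(1 if j == i else 0 for j in range(k))
        runs += [run] * n_per_arm
    return runs


def full_factorial(k, n_replicates):
    """All 2**k on/off combinations, each repeated n_replicates times."""
    runs = []
    for combo in product([0, 1], repeat=k):
        runs += [combo] * n_replicates
    return runs


def times_on(design):
    """How many runs include each reagent (the 'hidden replicates')."""
    k = len(design[0])
    return [sum(run[i] for run in design) for i in range(k)]


def half_fraction_3():
    """2**(3-1) half fraction in -1/+1 coding: 4 runs with C = A * B."""
    return [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]


if __name__ == "__main__":
    print("OFAT:", len(ofat_design(3, 4)), "runs, on-counts", times_on(ofat_design(3, 4)))
    print("Factorial:", len(full_factorial(3, 2)), "runs, on-counts", times_on(full_factorial(3, 2)))
    print("Half fraction:", half_fraction_3())
```

The half fraction keeps every column balanced (two runs at each level) while halving the run count; the price is that some effects can no longer be separated from each other.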
Factorial designs are not recommended when:
• the treatment labels are likely to get mixed up (e.g., pipetting errors)
• logistics make combinations hard to set up (e.g., if a robot has to be reprogrammed to set up a 96-well plate)
• interactions, or combinations of multiple treatments, may be harmful or lethal

Advantages of factorial designs
• Learn about more factors (seed, fertilizer, water)
• Learn about interactions
• Increased statistical power and significance (hidden replication)
• More robust results (across multiple conditions)
• All with the same or fewer runs than one-factor-at-a-time

Factorial designs are
• Faster
• Cheaper
• More informative

Regression and ANOVA
• Analysis of variance (ANOVA) and regression are useful tools for the statistical analysis of factorial designs.
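In a full two-level factorial coded as -1/+1, the columns for the main effects and interactions are mutually orthogonal, so the least-squares regression coefficients reduce to simple averages: each coefficient is sum(x·y)/N, and each term’s ANOVA sum of squares is N·b². The pure-Python sketch below illustrates that shortcut for main effects and two-factor interactions only; the function names are hypothetical, and a real analysis would normally use a statistics package that also reports residual error and p-values. The conventional "effect" (mean response at + minus mean at -) equals twice the coefficient.

```python
from itertools import combinations, product


def coded_design(k, n_replicates=1):
    """Full 2**k factorial in -1/+1 coding, each run repeated n_replicates times."""
    return [run for run in product([-1, 1], repeat=k) for _ in range(n_replicates)]


def effect_columns(k):
    """Model terms as index tuples: main effects, then two-factor interactions."""
    return [(i,) for i in range(k)] + list(combinations(range(k), 2))


def estimate_coefficients(design, y):
    """Least-squares coefficients for a balanced full 2**k factorial.

    Orthogonality of the -1/+1 columns means each coefficient is
    sum(x * y) / N, where x is the product of the coded levels in that term.
    The intercept is the grand mean.
    """
    n = len(y)
    k = len(design[0])
    coefs = {"intercept": sum(y) / n}
    for term in effect_columns(k):
        x = []
        for run in design:
            value = 1
            for i in term:
                value *= run[i]
            x.append(value)
        coefs[term] = sum(xi * yi for xi, yi in zip(x, y)) / n
    return coefs


def sums_of_squares(coefs, n):
    """ANOVA sum of squares for each term of an orthogonal -1/+1 design: N * b**2."""
    return {term: n * b ** 2 for term, b in coefs.items() if term != "intercept"}
```

A quick self-check is to generate responses from a known noise-free model, for example y = 10 + 2A + 3B + AB over a replicated 2×2 design, and confirm the function returns those coefficients.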