Experimental Design for Plant and Microbial Biology

James K. M. Brown
Disease and Stress Biology Dept, John Innes Centre
[email protected]
Phone: 2615 from JIC or IFR; 450615 from UEA
Ground floor of Biffen Building, end nearest the Library
Aims
1. Sources of variation in experiments
2. Principles of good experimental design
3. Some useful designs
4. Thinking about good experimental design in various kinds of experiment
Types of data
Describing a...    Qualitative    Quantitative
Car
House
Person
Plant
Bacterium
Lecture notes on Experimental Design for Plant and Microbial Biology by J.K.M. Brown. © Copyright James Kenneth
MacMyn Brown 2003. The right of the Author to be identified as the Author of this Work has been asserted in
accordance with the UK Copyright, Designs and Patents Act 1988.
How does variation arise? (1)
Response of plants to waterlogging
Investigating genetic variation in the response of several
plant varieties to waterlogging in a glasshouse experiment
Source of variation         Affect results?    Are you interested?
Plant genotype
Soil or compost used
Size of pot
Purity of seed
Watering regime
Position in glasshouse
Repetition of experiment
Person scoring symptoms
Anything else?
How does variation arise? (2)
Level of gene expression
Investigating variation in the expression of GUS under the
control of various promoters in transformed plants grown
in a controlled environment cabinet.
Source of variation                                Affect results?    Are you interested?
Promoter
Purity of seed
Soil & pots for growing plants
RNA extraction
Light and temperature in which plants are grown
Repetition of experiment
Spectrophotometer
Anything else?
How does variation arise? (3)
Plot yield in field trials
Investigating variation in yield of plant varieties grown in
field plots with about 1000 plants per plot
Source of variation         Affect results?    Are you interested?
Plant genotype
Purity of seed
Damage by animals
Position in the field
Site of field trial
Repetition of experiment
Person recording yield
Anything else?
Factors involved in experiments
- Things you are interested in
  - Treatment factors (≈ fixed effects)
- Things you aren’t interested in but can control
  - Type and amount of soil in pots, type of spectrophotometer
- Things you aren’t interested in and can’t control
  - Remove clearly odd results (seed contamination, bunny damage to plots, etc) → missing values
  - Take account of other, extraneous factors: time, space, subjective influence, etc (≈ random effects)
How?
Principles of good design: “3 R’s”
Replication
Randomisation
(Restriction) = blocking
Replication: why?
Randomisation: why?
Replication
- Understand the variability in the experiment
- Estimate the difference between mean effects of different treatments
- More replication increases the precision of estimates (but too much replication may be expensive)
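A rough illustration of why more replication gives more precise estimates (a sketch added for illustration, not part of the original notes): if units given the same treatment vary with standard deviation s, the standard error of a treatment mean from n replicates is s/√n, and that of a difference between two such means is s√(2/n). The value s = 1.0 and the function names below are arbitrary assumptions.

```python
import math


def se_mean(sd, n):
    """Standard error of a treatment mean estimated from n replicate units."""
    return sd / math.sqrt(n)


def se_difference(sd, n):
    """Standard error of the difference between two treatment means,
    each estimated from n replicates with a common standard deviation."""
    return sd * math.sqrt(2.0 / n)


sd = 1.0  # assumed unit-to-unit standard deviation (arbitrary value)
for n in (2, 4, 8, 16):
    # Quadrupling n only halves the standard error: diminishing returns.
    print(f"n={n:2d}  SE(mean)={se_mean(sd, n):.3f}  SE(diff)={se_difference(sd, n):.3f}")
```

Because the standard error falls only with the square root of n, each extra replicate buys less precision than the last, which is why too much replication may not be worth its cost.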
Randomisation
- Take account of extraneous variation
- Insure against “something strange” happening to an experimental unit
- Randomise:
  - in space (glasshouse / growth room / field)
  - in time (applying treatments to plants / other material)
- How to randomise experiments:
  - Random number tables
  - EDGAR: Experimental Design Generator And Randomiser
    www.jic.bbsrc.ac.uk/services/statistics/edgar.htm or follow links from JIC Guide (intranet) or from Science > Facilities (internet) or from my web page (www.jic.bbsrc.ac.uk/staff/james-brown)
Complete randomisation
All treatments have an equal chance of being assigned to
each unit in the experiment
e.g. four varieties of plant (A-D):
placed on glasshouse bench
D A C C A A D C B C A C D A B B B D D B
sequence of extracting RNA
time →
BBDDCCCBADAACABABDDC
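A layout like the ones above can be produced with any good random number source. The following is a minimal sketch using Python's standard library; it is not the method used by EDGAR, and the function name and seed are hypothetical.

```python
import random


def complete_randomisation(reps, seed=None):
    """Completely randomised design.

    reps maps each treatment label to its number of replicate units
    (the numbers may differ between treatments). Returns the treatment
    labels in a random order, one per experimental unit.
    """
    rng = random.Random(seed)  # a seed only makes the layout reproducible
    units = [label for label, n in reps.items() for _ in range(n)]
    rng.shuffle(units)  # every arrangement of the units is equally likely
    return units


# Four varieties, five units each, as in the example above.
layout = complete_randomisation({"A": 5, "B": 5, "C": 5, "D": 5}, seed=2003)
print(" ".join(layout))
```

The same list can be read as positions along a bench or as the order in which samples are processed.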
Advantages
- Flexible: can have different numbers of different treatments
- Higher precision than other designs if you know that there’s very little extraneous variation that you need to incorporate into block factors
Disadvantages
- Doesn’t allow for systematic extraneous variation
Analysis of variance
- Asks if variation between treatments (including genotypes) is significantly greater than random variation between units (plants, plots, etc)
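The comparison made by the analysis of variance can be sketched as an F ratio: the mean square between treatments divided by the mean square between units within treatments. The code below is an illustrative sketch only; the function name is hypothetical and the numbers are made up, not real data. In practice a stats package would do this and supply the significance level.

```python
def one_way_anova(groups):
    """One-way ANOVA for a completely randomised design.

    groups maps each treatment to a list of observations.
    Returns (F ratio, treatment df, residual df).
    """
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)

    # Variation between treatment means.
    ss_treat = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    # Random variation between units given the same treatment.
    ss_resid = sum(
        (y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys
    )

    df_treat = len(groups) - 1
    df_resid = len(all_values) - len(groups)
    f_ratio = (ss_treat / df_treat) / (ss_resid / df_resid)
    return f_ratio, df_treat, df_resid


# Made-up scores for three treatments, three units each.
scores = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [6, 7, 8]}
f_ratio, df_treat, df_resid = one_way_anova(scores)
print(f"F = {f_ratio:.2f} on {df_treat} and {df_resid} df")
```

A large F ratio, judged against the F distribution with the stated degrees of freedom, indicates that treatments differ by more than random variation between units would explain.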
The 3rd ‘R’: restriction (blocking)
Etymology (?): restricting the impact of extraneous variation on your experiment
- What is the main source of extraneous variation? (Think of factors in time as well as space)
- You now have two main sources of variation:
  - The treatment factor
  - The other factor = the block factor
- Divide your experiment into blocks in the direction of the block factor
- If you are blocking your experiment, try to get as much as possible of the non-treatment variation to go in the direction of the blocks
  - e.g. a field trial arranged in blocks going up a hill or along a gradient of soil type
  - then score the blocks one-by-one so any variation in your scoring over time is included in the block factor
What factors can be blocked in the 3 experiments we
considered earlier?
Randomised complete blocks
- Allocate (randomly) one unit of each treatment to each block
- Randomise different treatments within blocks
e.g. four varieties of plant (A, B, C, D):
placed on glasshouse bench
Blocks
1    D C A B
2    C B A D
3    D B C A
4    A B C D
5    C B D A
window
sequence of extracting RNA
Time →  D C B A : D B C A : D B A C : A C D B : B D C A
Block      1         2         3         4         5
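The blocked layouts above differ from complete randomisation in that the shuffle is done separately inside each block. A minimal sketch, with a hypothetical function name and seed:

```python
import random


def randomised_complete_blocks(treatments, n_blocks, seed=None):
    """Randomised complete block design.

    Each block receives every treatment exactly once, and the order of
    treatments is randomised independently within each block.
    Returns a list of blocks, each a list of treatment labels.
    """
    rng = random.Random(seed)  # a seed only makes the layout reproducible
    design = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)  # a fresh random order for this block
        design.append(block)
    return design


# Four varieties in five blocks, as in the example above.
for number, block in enumerate(randomised_complete_blocks("ABCD", 5, seed=2003), start=1):
    print(number, " ".join(block))
```

The blocks can be positions along the bench (for instance, distance from the window) or consecutive batches in time (for instance, RNA extractions done together).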
Advantages
- Easiest way of controlling extraneous variation
- Most commonly used design
Disadvantages
May still be substantial variation within blocks if there are
many treatments (NB glasshouses & growth rooms). Other
designs available if there are many treatments with few
reps (get expert advice).
Analysis of variance
- Asks if variation between treatments is significantly greater than random variation between units within a block → i.e. takes account of variation between blocks
- Blocking increases the power to detect differences between the mean effects of treatments
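To see how blocking can sharpen the comparison, the sketch below computes the treatment F ratio twice for the same table: once with variation between blocks removed from the residual, and once ignoring the blocks. The function name is hypothetical and the numbers are made up for illustration, not real data.

```python
def rcb_anova(table):
    """ANOVA for a randomised complete block design.

    table[b][t] is the observation for treatment t in block b; every
    block must contain each treatment exactly once.
    Returns the treatment F ratio with and without the block term.
    """
    n_blocks, n_treat = len(table), len(table[0])
    n = n_blocks * n_treat
    grand = sum(sum(row) for row in table) / n
    block_means = [sum(row) / n_treat for row in table]
    treat_means = [sum(row[t] for row in table) / n_blocks for t in range(n_treat)]

    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_resid = ss_total - ss_block - ss_treat  # variation within blocks

    ms_treat = ss_treat / (n_treat - 1)
    ms_resid = ss_resid / ((n_blocks - 1) * (n_treat - 1))
    # Ignoring blocks leaves the block variation in the residual.
    ms_resid_unblocked = (ss_total - ss_treat) / (n - n_treat)
    return {
        "F_blocked": ms_treat / ms_resid,
        "F_unblocked": ms_treat / ms_resid_unblocked,
    }


# Made-up table: rows are blocks, columns are treatments.
print(rcb_anova([[1, 2, 6], [3, 5, 7], [5, 5, 11]]))
```

The blocked analysis has fewer residual degrees of freedom, so the two F ratios are judged against different critical values; blocking pays off when blocks really do differ.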
Two treatment factors
1. Randomised complete blocks
- In each block, one unit per combination of two treatments
- Units with different treatment combinations randomised within each block
Examples
- Gene interactions: effect of alleles at two loci on plant phenotype (e.g. flower colour/shape; response to disease)
- Complex environmental effects: e.g. effect of light and temperature on flowering time
- Genotype-by-environment interaction: e.g. resistance gene + pathogen isolate; vernalisation gene + daylength
Questions
- Do the effects of different treatments interact with one another?
  - If so, it may not be helpful to consider the effect of each treatment separately
- If not, do the mean effects of each treatment differ?
Balance
- Easiest to analyse experiments with two treatment factors if all blocks have all combinations of treatments with one unit per combination. Seek advice if your experiment can’t be arranged like this.
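A balanced layout of this kind can be sketched by listing every combination of the two factors and shuffling that list within each block. The function name, seed and factor levels (two light levels, three temperatures) are hypothetical, chosen only to echo the light and temperature example.

```python
import itertools
import random


def factorial_rcb(levels_a, levels_b, n_blocks, seed=None):
    """Two treatment factors in randomised complete blocks.

    Each block holds one unit for every combination of a level of
    factor A with a level of factor B, in an independently random order.
    """
    rng = random.Random(seed)  # a seed only makes the layout reproducible
    combinations = list(itertools.product(levels_a, levels_b))
    design = []
    for _ in range(n_blocks):
        block = combinations[:]  # copy, so each block is shuffled separately
        rng.shuffle(block)
        design.append(block)
    return design


light = ["low", "high"]             # hypothetical levels
temperature = ["15C", "20C", "25C"]  # hypothetical levels
for number, block in enumerate(factorial_rcb(light, temperature, 3, seed=2003), start=1):
    print(number, block)
```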
2. Split plots
- Blocks divided into main plots (whole plots)
- Main plots divided into sub-plots
- For one treatment factor, each level is applied to main plots (assigned randomly) within each block
- For the other treatment factor, each level is applied to sub-plots (assigned randomly) within each main plot
- Useful when one treatment can only be applied to large amounts of material
Examples
Effect of environmental conditions on plant genotypes:
Growth rooms with different environments = main plots
Pots of plants of different genotypes = sub-plots
Disease trials:
Spray different isolates of fungus onto main plots
Plants of different varieties = sub-plots
Nutrient trials:
Hydroponics tanks with different solutions = main plots
Plants of different genotypes = sub-plots
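The two stages of randomisation in a split-plot design can be sketched as follows: the main-plot treatment is shuffled within each block, then the sub-plot treatment is shuffled within each main plot. The function name, seed and level names are hypothetical.

```python
import random


def split_plot(main_levels, sub_levels, n_blocks, seed=None):
    """Split-plot design.

    Returns a list of blocks; each block is a list of main plots; each
    main plot is a pair (main-plot level, list of sub-plot levels in
    their randomised order).
    """
    rng = random.Random(seed)  # a seed only makes the layout reproducible
    design = []
    for _ in range(n_blocks):
        mains = list(main_levels)
        rng.shuffle(mains)  # stage 1: main-plot levels within the block
        block = []
        for main in mains:
            subs = list(sub_levels)
            rng.shuffle(subs)  # stage 2: sub-plot levels within the main plot
            block.append((main, subs))
        design.append(block)
    return design


environments = ["cool", "warm"]    # main plots, e.g. growth rooms
genotypes = ["G1", "G2", "G3"]     # sub-plots, e.g. pots of plants
for number, block in enumerate(split_plot(environments, genotypes, 2, seed=2003), start=1):
    for main, subs in block:
        print(number, main, " ".join(subs))
```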
Analysis
- Compare variation between treatments applied to main plots to random variation between main plots
- Compare variation between treatments applied to sub-plots, and variation of the interaction, to random variation between sub-plots
Designing experiments
For each of the examples in the section on “How does
variation arise”, consider:
1. What are the treatment factors?
2. Should you apply blocking?
3. What design would be most appropriate?
4. What non-treatment variation would and would not be controlled by this design?
Confounding (aliasing)
- When levels of a treatment factor and levels of a design factor are applied to the same units
  - Must be avoided, because you can’t tell whether the effect relates to the treatment factor or the design factor
e.g. sequence of scoring five sets of microscope slides of material given three treatments, P, Q, R
OR sequence of RNA extraction from samples of three varieties, P, Q, R
PPPPPQQQQQRRRRR
→ treatment/variety and time are confounded
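One simple way to spot this kind of confounding in a planned run order is to look at the mean position of each treatment in the sequence. The sketch below is a rough diagnostic under that assumption, not a formal test; the function name and seed are hypothetical. It compares the confounded order above with an order randomised within five blocks of three.

```python
import random


def mean_positions(sequence):
    """Mean position (1 = first) of each treatment in a run order."""
    positions = {}
    for index, label in enumerate(sequence, start=1):
        positions.setdefault(label, []).append(index)
    return {label: sum(p) / len(p) for label, p in positions.items()}


# The confounded order: all P first, then all Q, then all R.
print(mean_positions("PPPPPQQQQQRRRRR"))

# Alternative: five blocks in time, each holding P, Q and R in random order.
rng = random.Random(2003)
blocked = []
for _ in range(5):
    block = list("PQR")
    rng.shuffle(block)
    blocked.extend(block)
print("".join(blocked), mean_positions(blocked))
```

In the confounded order the mean positions are far apart, so any drift over time looks like a treatment effect. With blocking in time, every treatment's mean position is forced to lie close to the middle of the sequence.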
Missing values
1. If only a few MVs, use a standard (balanced) design
2. If many MVs, can use more advanced techniques (get advice!)
Space for designing experiments
Stats packages available at JIC
Genstat: The best stats package for designed experiments.
Easy to use and very powerful. Easy to exchange data with
Excel or PowerPoint. Information and manuals on JIC
Statistics web page. IFR should have Genstat too.
SPSS: Considered as the standard stats package at UEA.
Technical leader for analysis of survey data but poor for
experimental data.
Minitab: Used to be much easier to use than Genstat but
not now. Less comprehensive than Genstat for analysing
experiments. (Many people start using Minitab but end up
using Genstat). Good graphics, good manual.
Excel: Not a stats package, but will do some simple
analyses. Very widely used but not recommended!
Many other packages on PCs, but be wary of reliability
Advice at JIC
Support service from statisticians at Reading University.
- Help desk one day a month on 2nd Thursday of each two month. Suitable for smallish problems or for a preliminary look at a bigger problem.
- For bigger problems, you can visit Reading.
- Booking form and further information on the intranet
- Seminars on statistical topics every other month (13th February: ‘Good graphics for data presentation’)
“To call in a statistician after the experiment is done may
be no more than asking him to perform a post-mortem
examination: he may be able to say what the experiment
died of”
Sir R. A. Fisher
Training courses
Basic Statistics and Design Principles
Appropriate for anyone who needs reminding of the basics
of statistical analysis
Two courses:
- Mon 10 Mar a.m. to Weds 12 Mar a.m. (FULL)
- Weds 12 Mar p.m. to Fri 14 Mar p.m.
Topics covered:
- Exploratory data analysis
- Confidence intervals and significance tests
- Simple linear regression
- Simple analysis of variance (one treatment factor)
- Use of Genstat
http://intranet/infoserv/cgi-bin/calendar/id.asp?ID=43438
Experimental Design and Analysis
Appropriate for anyone who
- has experiments which are more complicated than a randomised complete block design (e.g. >1 treatment factor or >1 block factor or a lack of balance), or
- has experiments involving many treatments (e.g. plant genotypes), or
- collects data in the form of counts or proportions (%), or
- wants to improve their ability in data analysis
Particularly recommended for those doing experiments in
controlled environment rooms or field trials
- Mon 28 Apr a.m. to Weds 30 Apr a.m.
http://intranet/infoserv/cgi-bin/calendar/id.asp?ID=43439