Structure in the Experimental Treatments

PGRM 11
Statistics
in
Science
Σ
Factors
Complex systems are affected by a wide range of
factors:
• Ploughing system: soil type, ploughing depth, number of
cultivations, type of plough, etc.
• Animal production system: management regime,
biological & environmental inputs
• Ecological habitat: available food, cover, light,
temperature
• Biochemical reaction: concentration of reagents,
temperature, light
Factor Levels
Enterprise type is a factor affecting farm outputs
The different enterprise types considered are the levels
of the factor:
e.g. beef, beef suckler, dairy, mixed
Levels may be categorical (as above) or quantitative, as
in the study of the effect of a washing solution on
retarding bacterial growth, where the levels were
2%, 4% or 6%
of an active ingredient.
With quantitative levels it makes sense to look for a
trend (increasing or decreasing) in the response as
the level increases.
Single factor experiments
• Compare the mean response for the different levels
of a single factor
• Other factors affecting the response must be kept as
constant as possible; any effect of these will
appear as random residual variation (due to the
random allocation of units to the different levels of
the factor)
• The result will be clear and valid, but of limited value
Ex: when comparing growth of lambs fed on 2 levels of
protein supplement, we must use the same
source of protein for both levels, so
we have no information on what the response to protein
level would be for other sources
(Multi)-factorial experiments
Examine the effect of 2 (or more) factors at the same
time
Treatments:
the various combinations of the levels of the different
factors
Ex:
protein supplement: factor B (levels B1, B2, B3)
protein source: factor A (levels A1, A2)
6 treatments
A1B1, A1B2, A1B3
A2B1, A2B2, A2B3
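The way treatments arise as combinations of factor levels can be sketched in a few lines of Python. This is only an illustration of the 2 × 3 example above; the variable names are chosen here and are not part of the course material.

```python
from itertools import product

# Factor levels from the example: protein source (A) and protein supplement (B).
levels_a = ["A1", "A2"]
levels_b = ["B1", "B2", "B3"]

# Each treatment is one combination of a level of A with a level of B.
treatments = [a + b for a, b in product(levels_a, levels_b)]

print(len(treatments))
print(treatments)
```

Adding a third factor would simply mean passing a third list of levels to `product`.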
Simple and Main effects
• Simple effects of source: Difference (in mean
growth) between source A1 & A2 can be considered
at each of the 3 levels of protein.
• Simple effects of protein: Differences between B1 &
B2, B2 & B3 and between B3 & B1 can be measured
for each source.
• Main effects are averages of simple effects, and are
not always meaningful
Example
Means      B1   B2   B3   Average
A1         10   18   11   13
A2         11   19   18   16
A effect    1    1    7    3
Note:
the main effect of A, (1 + 1 + 7)/3, is also the difference
between the MARGINAL means
Here the effect of A depends on the level of B
This is an INTERACTION between the factors A
and B
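The arithmetic behind the simple effects, the main effect and the marginal means in this example can be checked with a short Python sketch. The dictionary layout and names are assumptions made for illustration; only the six cell means come from the table.

```python
# Cell means from the example table: rows are levels of A, columns levels of B.
means = {
    "A1": {"B1": 10, "B2": 18, "B3": 11},
    "A2": {"B1": 11, "B2": 19, "B3": 18},
}

# Simple effect of A (A2 - A1) at each level of B.
simple_effects = {b: means["A2"][b] - means["A1"][b] for b in means["A1"]}

# Main effect of A: the average of the simple effects.
main_effect = sum(simple_effects.values()) / len(simple_effects)

# Marginal means: average each row over the levels of B.
marginal = {a: sum(row.values()) / len(row) for a, row in means.items()}

print(simple_effects)
print(main_effect, marginal["A2"] - marginal["A1"])
```

The simple effects (1, 1, 7) are far from equal, which is what the interaction means; their average equals the difference between the marginal means.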
Important Rule
With an AB interaction:
the effect of A changes as the level of B changes
Hence:
averaging the effects of A over the levels of B makes
no sense
1. The main effect of a factor cannot be uncritically
interpreted as the effect of the factor if there is
an interaction
2. In this case report the AB treatment means and
some meaningful comparisons, and not the
separate means for levels of A and B
Interaction plot
Why do factorial?
1. Factorial experiments compare a set of treatments
which have a certain structure:
the treatments simply consist of combinations of
levels of 2 (or more) factors
— so we already know how to do the analysis!
— the factorial treatment structure will dictate sensible
comparisons to make
2. The gain:
— knowing whether the effect of one factor varies with
the level of another
— saving resources when there is no interaction,
since a simple effect can be estimated at each level
of the other factor and the results combined
Why the gain (in absence of interaction)
Sample size   B1   B2   B3   Total
A1             6    6    6    18
A2             6    6    6    18
Total         12   12   12    36
A effect: since this is the same for all levels of B
it is measured by the difference in the marginal
means, each based on 18 observations.
B effects: each B effect (B1vB2, B2vB3, B3vB1)
is measured using means of 12 observations
Separate experiments (same resources)
Experiment on A:   A1   A2   Total
Sample size         9    9    18

Experiment on B:   B1   B2   B3   Total
Sample size         6    6    6    18
A effect: now measured by the difference
between means of 9 observations (was 18).
B effects: now measured by the difference
between means of 6 observations (was 12).
Also: we don’t know if the A effects depend on
the level of B – MORE LOSS OF INFORMATION!
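The loss of precision can be put in numbers with the usual formula SED = s√(2/n) for two means of n observations each. The sketch below assumes the same residual standard deviation s in both designs and sets s = 1, so results are in units of s; the function and variable names are illustrative only.

```python
import math

def sed(s, n):
    """SED between two means, each based on n observations, with residual SD s."""
    return s * math.sqrt(2.0 / n)

s = 1.0  # assumption: residual SD set to 1, so SEDs are multiples of s

# Replication behind each mean: factorial design vs two separate experiments.
designs = {
    "A effect": {"factorial": 18, "separate": 9},
    "B effect": {"factorial": 12, "separate": 6},
}

ratios = {}
for effect, n in designs.items():
    ratios[effect] = sed(s, n["separate"]) / sed(s, n["factorial"])
    print(effect, round(sed(s, n["factorial"]), 3), round(sed(s, n["separate"]), 3))
```

Halving the replication behind each mean inflates the SED by a factor of √2 (about 41%) for both factors.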
PGRM pg 11-6
The enormous benefits (of factorial designs)
arise through no extra cost but merely by
reorganising the work programme.
You can choose to get much more
information for the same money or reduce
the cost of achieving a given level of
information.
SAS OUTPUT
1. ANOVA table
2. Table of MEANS with SED
3. Writing a summary
ANOVA a×b factorial, replication r
• Treat this as a 1-way structure, with ab treatments
Source       SS    df        MS
Treatments   TSS   ab - 1    TSS/(ab-1)
Error        RSS   (r-1)ab   RSS/((r-1)ab)
Total              rab - 1
• Now partition the treatment SS, TSS
Source             SS     df           MS
A                  SSA    a-1          SSA/(a-1)
B                  SSB    b-1          SSB/(b-1)
AB (interaction)   SSAB   (a-1)(b-1)   SSAB/((a-1)(b-1))
Treatment          TSS    ab-1
Example: time to development of Fasciola hepatica eggs under 4
combinations of temperature (2 levels) and relative humidity (2 levels)
Temperature (°C)    16     16     22     22
Humidity level       1      2      1      2
                    27     34     13     17
                    26     37     17     15
                    29     33     16     18
Treatment means   27.3   34.7   15.3   16.7

Partition of TSS
Source          df   SS       MS       F
Treatments       3   758.33   252.78   75.83 ***
  Temp           1   675.00   675.00   202.7 ***
  Humidity       1    56.33    56.33   16.92 **
  Interaction    1    27.00    27.00   8.1 *
Residual         8    26.67     3.33
Total           11   785.00
*** p<0.001   ** p<0.01   * p<0.05
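The partition of the treatment SS for this example can be reproduced by hand. The Python sketch below is not the SAS analysis used in the course; it just applies the usual sums-of-squares formulas to the 12 observations, with names chosen here for illustration.

```python
# Development times from the example, keyed by (temperature, humidity level).
data = {
    (16, 1): [27, 26, 29],
    (16, 2): [34, 37, 33],
    (22, 1): [13, 17, 16],
    (22, 2): [17, 15, 18],
}

values = [y for cell in data.values() for y in cell]
n_total = len(values)
correction = sum(values) ** 2 / n_total  # (grand total)^2 / N

def ss_between(groups):
    """Sum of squares between groups, each given as a list of observations."""
    return sum(sum(g) ** 2 / len(g) for g in groups) - correction

def pooled(index):
    """Pool observations over the other factor (index 0 = temp, 1 = humidity)."""
    out = {}
    for key, cell in data.items():
        out.setdefault(key[index], []).extend(cell)
    return list(out.values())

ss_total = sum(y * y for y in values) - correction
ss_treat = ss_between(list(data.values()))
ss_temp = ss_between(pooled(0))
ss_hum = ss_between(pooled(1))
ss_inter = ss_treat - ss_temp - ss_hum   # what is left of the treatment SS
ss_resid = ss_total - ss_treat
ms_resid = ss_resid / (n_total - len(data))

rows = [("Treatments", 3, ss_treat), ("Temp", 1, ss_temp),
        ("Humidity", 1, ss_hum), ("Interaction", 1, ss_inter)]
for name, df, ss in rows:
    print(f"{name:12s} df = {df}  SS = {ss:7.2f}  F = {ss / df / ms_resid:6.2f}")
print(f"{'Residual':12s} df = {n_total - len(data)}  SS = {ss_resid:7.2f}")
```

Because the residual MS is not rounded to 3.33 here, the F values for Temp and Humidity come out as 202.5 and 16.9 rather than the 202.7 and 16.92 shown in the table.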
Tables of Means
Temperature (°C)    16     16     22     22
Humidity level       1      2      1      2
Treatment means   27.3   34.7   15.3   16.7
SED = 1.49

Humidity effect:
sig. when temp = 16 (7.4)
non-sig. when temp = 22 (1.4)
Temp. effect:
sig. (12.0 & 18.0) at both levels of humidity

Temperature   16     22     SED
Mean          31.0   16.0   1.06

Humidity      H1     H2     SED
Mean          21.3   25.7   1.06
Interpretation
Overall treatments differ: F = 75.83
Interaction is significant: F = 8.1, so we really should examine
the 4 means as above, and ignore the tests for main effects,
which e.g. compare levels of HUMIDITY averaged over levels of
TEMP
However, in this case the TEMP effect is much larger than the
interaction, so its averaged effect broadly reflects its effect at
each level of HUMIDITY
[Figure: interaction plot of time to development against humidity level (1, 2), with separate lines for temperatures T16 and T22]
SAS/GLM for 2-way analysis
/* Main effects & interaction */
proc glm data = fasciola;
  class temp humidity;
  model time = temp humidity temp*humidity;
  lsmeans temp;
  lsmeans humidity;
  lsmeans temp*humidity;
  estimate 'SED for temp' temp 1 -1;
  estimate 'SED for humidity' humidity 1 -1;
run;
quit;
/* One-way analysis: the 4 treatment combinations as a single factor */
proc glm data = fasciola;
  class temp humidity;
  model time = temp*humidity;
  estimate 'SED tment means' temp*humidity 1 -1;
run;
quit;
SAS demo!
Data must contain response values (time) in a single column,
identified by factor levels in 2 other columns.
This gives 3 variables (columns) for the SAS program faciola.sas
temp   humidity   time
16     1          27
16     2          34
22     1          13
22     2          17
16     1          26
16     2          37
22     1          17
22     2          15
16     1          29
16     2          33
22     1          16
22     2          18
What to present (again!)
• Since the interaction is significant, don’t report the main
effects.
• Present:
– the 2-way table (with SED):
Time          Temp 16°   Temp 22°
Humidity 1    27.3       15.3
Humidity 2    34.7       16.7
SED = 1.49
– a summary:
the temp/humidity interaction was significant (p = 0.02)
humidity effects were significant at temp = 16 (p = 0.0012)
but not at temp = 22 (p = 0.40)
temp effects were significant at both humidities (p < 0.0001),
and greater when humidity = 1
Factorial experiment laid out in blocks
• The analysis above laid out the ab treatments as a completely
randomised design using rab experimental units (r
for each treatment)
Think: how would this be done in practice?
• If we group the experimental units into blocks of size
ab and randomly allocate the ab treatments to the
units in each block, we can then remove the block SS (BSS) from RSS,
hopefully reducing it sufficiently to compensate for
the reduction in residual DF
• See example over …
2-way experiment laid out in blocks
• Factor A: 2 levels
Factor B: 3 levels
• 60 experimental units available (10 per treatment)
• Completely randomised design (CR): randomly allocate
treatments to units
Randomised blocks (RB): group units into blocks of size 6 (so
10 blocks) & randomise the 6 treatments in each block, which
may be much easier to do
ANOVA
Source     DF: CR   DF: RB
Block               9
A          1        1
B          2        2
AB         2        2
Residual   54       45
Total      59       59
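The degrees of freedom in the two layouts follow from simple bookkeeping, sketched below in Python. The function `anova_df` is a hypothetical helper written for this illustration, and it assumes complete blocks, each containing every treatment once.

```python
def anova_df(a, b, units, blocks=None):
    """Degrees of freedom for an a x b factorial on `units` experimental units.

    blocks=None gives the completely randomised design; otherwise the
    units are assumed to be grouped into `blocks` complete blocks.
    """
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Total": units - 1}
    if blocks is not None:
        df["Block"] = blocks - 1
    # Residual df is what remains after all other terms are removed from Total.
    df["Residual"] = df["Total"] - sum(v for k, v in df.items() if k != "Total")
    return df

cr = anova_df(2, 3, 60)
rb = anova_df(2, 3, 60, blocks=10)
print(cr["Residual"], rb["Residual"])
```

Blocking costs 9 residual degrees of freedom here (54 down to 45), which is worthwhile only if the block SS removed from the residual is large enough.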
Practical: 4.2 Two-Factor Factorial Example 2
Bacterial count in sausages
stored at 4 temperatures
using 3 types of preservative method
More than 2 factors!
3×4×5 experiment:
i.e. factors A, B, C with 3, 4 and 5 levels respectively,
giving 60 treatment combinations!
The 3-factor ABC interaction measures how the 2-factor
AB interaction changes over the levels of C
(see over)
Can get away with replication r = 1 provided the 3-factor
interaction can be assumed negligible
– not usually liked by journal editors!
With r > 1 we include:
main effects: A, B, C
2-factor interactions: BC, CA, AB
3-factor interaction: ABC
3-factor interaction for a 2×2×2 expt
[Figure (a): response (0 to 40) plotted against B1, B2, B3, with separate lines for A1C1, A1C2, A2C1 and A2C2]
With C1: A effect is least at B2
With C2: A effect is largest at B2
Direction of A effect is different for C1, C2
AB interaction is different at the two C levels
3-factor interaction arising naturally
See PGRM Fig 11.2.2 (b)
Examples – measuring the benefit
1. 2×2×2×2: artificial insemination involving 256 heifers (r = 16 per treatment)
2. 3×4×5: imaginary example to practice calculating sample sizes! 120 units (r = 2)
3. 2×2×2: machine tool lifetime, 24 units (r = 3)
Example 2x2x2x2 factorial
Artificial insemination
256 heifers (64 each week)
4 factors at 2 levels.
Choices
A) 4 experiments (r = 32)
B) 2 × 2 × 2 × 2 factorial (r = 16 per combination)
Compare precision
A) 32 animals per treatment.
$\mathrm{SED} = \sqrt{2s^2/32} = s/4$
B) 128 animals for each level of a factor
$\mathrm{SED} = \sqrt{2s^2/128} = s/8$
where $s^2$ = MSE.
Plus: with B all interactions can be estimated
Conclusion
Compare precision
A) 32 animals per treatment.
$\mathrm{SED} = \sqrt{2s^2/32} = s/4$
B) 128 animals for each level of a factor
$\mathrm{SED} = \sqrt{2s^2/128} = s/8$
where $s^2$ = MSE.
Summary - The factorial design
- Halves the SED and quarters the number of animals required for a given level of precision
- Allows more general interpretation of the factor effects since they are tested over a wide range of levels of the other factors
- Allows a test of whether the factors interact.
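The "halves the SED, quarters the animals" claim can be checked numerically. The sketch below expresses each SED as a multiple of s and then asks how many heifers the four separate experiments would need to match the factorial; the names are illustrative, and it assumes the same residual variance in both approaches.

```python
import math

def sed_in_units_of_s(n):
    """SED between two means of n observations each, as a multiple of s."""
    return math.sqrt(2.0 / n)

factors = 4
heifers = 256

# Factorial: every heifer contributes to one of the 2 levels of every factor.
n_factorial = heifers // 2             # 128 per level
# Separate experiments: the heifers are first split between the 4 experiments.
n_separate = heifers // factors // 2   # 32 per treatment

sed_factorial = sed_in_units_of_s(n_factorial)   # s/8
sed_separate = sed_in_units_of_s(n_separate)     # s/4

# Heifers needed by 4 separate 2-level experiments to reach the factorial SED:
# each experiment would need n_factorial animals per treatment.
heifers_needed = factors * 2 * n_factorial

print(sed_separate / sed_factorial, heifers_needed / heifers)
```

The ratio of SEDs is 2 and the ratio of animals required is 4, matching the summary.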
3×4×5 expt with factors A, B, C & replication 2 (120 units)

Replication of main effect means
(for any factor not involved in a significant interaction)
A    B    C
40   30   24

Replication of means in interaction table, e.g. BC
(for comparing BC effects if the only significant interaction is BC)
         B=1   B=2   B=3   B=4   Total
C=1      6     6     6     6     24
C=2      6     6     6     6     24
C=3      6     6     6     6     24
C=4      6     6     6     6     24
C=5      6     6     6     6     24
Total    30    30    30    30    120

Replication of means, all interactions
(all 2-factor interactions significant, 3-factor not)
AB   AC   BC
10   8    6
Treat Comb.: 2
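The replication figures above all come from dividing the 120 units by the number of cells in the relevant table of means. A minimal Python sketch, with a hypothetical helper `replication` introduced only for this illustration:

```python
from itertools import combinations
from math import prod

levels = {"A": 3, "B": 4, "C": 5}
r = 2
units = r * prod(levels.values())  # 120 experimental units

def replication(factors):
    """Number of observations behind each mean in the table for `factors`."""
    return units // prod(levels[f] for f in factors)

# Main effect tables, 2-factor tables, then the full treatment table.
for k in (1, 2, 3):
    for combo in combinations(levels, k):
        print("".join(combo), replication(combo))
```

The more factors a table of means involves, the fewer observations sit behind each mean, so SEDs are larger for interaction tables than for main effect tables.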
Example
An engineer is interested in the effects of
cutting speed (A),
tool geometry (B) and
cutting angle (C)
on the
life (in hours)
of a machine tool.
Two levels of each factor are chosen,
and three replicates of a $2^3$ factorial design are run.
Design: 2×2×2
No. treatments: 8
No. units: 24
Example: Data
LIFE (hr)
A   B   C   Rep 1   Rep 2   Rep 3
1   1   1   22      31      25
2   1   1   32      43      29
1   2   1   35      34      50
2   2   1   55      47      46
1   1   2   44      45      38
2   1   2   40      37      36
1   2   2   60      50      54
2   2   2   39      41      47
Example: ANOVA
Source     df   SS       MS      F       F pr.
A           1      0.7     0.7    0.02   0.884
B           1    770.7   770.7   25.55   <.001
C           1    280.2   280.2    9.29   0.008
A.B         1     16.7    16.7    0.55   0.468
A.C         1    468.2   468.2   15.52   0.001
B.C         1     48.2    48.2    1.60   0.224
A.B.C       1     28.2    28.2    0.93   0.348
Residual   16    482.7    30.2
Total      23   2095.3
Note:
1. ABC interaction non-significant
2. AC is the only significant 2-factor interaction
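For a design in which every factor has 2 levels, each sum of squares in the ANOVA can be obtained from a single ±1 contrast: SS = (contrast total)² / N. The Python sketch below applies this to the tool-life data; it is a hand calculation for checking, not the course's SAS analysis, and the function name is chosen here.

```python
# Tool life (hours): three replicates per treatment, keyed by levels of (A, B, C).
life = {
    (1, 1, 1): [22, 31, 25], (2, 1, 1): [32, 43, 29],
    (1, 2, 1): [35, 34, 50], (2, 2, 1): [55, 47, 46],
    (1, 1, 2): [44, 45, 38], (2, 1, 2): [40, 37, 36],
    (1, 2, 2): [60, 50, 54], (2, 2, 2): [39, 41, 47],
}
n_obs = sum(len(v) for v in life.values())  # 24

def contrast_ss(term):
    """SS for a main effect or interaction, e.g. term = "AC".

    Each level is coded -1 (level 1) or +1 (level 2); the sign for a cell is
    the product of the codes of the factors named in `term`.
    """
    total = 0
    for cell, ys in life.items():
        sign = 1
        for factor in term:
            sign *= 1 if cell["ABC".index(factor)] == 2 else -1
        total += sign * sum(ys)
    return total ** 2 / n_obs

terms = ["A", "B", "C", "AB", "AC", "BC", "ABC"]
ss = {t: contrast_ss(t) for t in terms}

values = [y for ys in life.values() for y in ys]
ss_total = sum(y * y for y in values) - sum(values) ** 2 / n_obs
ms_resid = (ss_total - sum(ss.values())) / (n_obs - len(life))  # 16 residual df

for t in terms:
    print(f"{t:4s} SS = {ss[t]:6.1f}  F = {ss[t] / ms_resid:5.2f}")
print(f"Residual MS = {ms_resid:.1f}")
```

The residual mean square works out as 482.7/16 = 30.2, which is the divisor behind every F ratio in the table.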
Tables of MEANS
Level   1      2      SED
A       40.7   41.0   2.24
B       35.2   46.5   2.24
C       37.4   44.2   2.24

        B1     B2     SED
A1      34.2   47.2   3.17
A2      36.2   45.8

        C1     C2     SED
A1      32.8   48.5   3.17
A2      42.0   40.0

        C1     C2     SED
B1      30.3   40.0   3.17
B2      44.5   48.5

        B1C1   B1C2   B2C1   B2C2
A1      26.0   42.3   39.7   54.7
A2      34.7   37.7   49.3   42.3
SED = 4.48
Help!
Making sense of tables
1. From this analysis, the only terms that are
significant are the B and C main effects and the AC
interaction.
2. Thus, the only tables that need to be presented are
the B main effect table and the AC tables of means.
– Geometry (B) has a large effect, increasing the life by
over 10 hours.
– Cutting angle (C) increases the life considerably at low
but not at high speed (A).
3. Another way of looking at the AC interaction is that
increased speed increases tool life for the first cutting
angle but reduces it for the second cutting angle.
How were the tables calculated?
SAS/GLM code
proc glm data = mydataset;
  class a b c;
  model response =
    a b c b*c c*a a*b a*b*c;
  lsmeans
    a b c b*c c*a a*b a*b*c;
run;
quit;
With one (AC) significant interaction
  lsmeans b a*c / stderr;
  estimate 'b SED' b 1 -1;
  /* ac SED = sqrt(2) x stderr */
Is this the best we can suggest?
Calculating SEDs
Recall (with equal replication):
SED = √2 × SEM
SED: standard error of a difference
SEM: standard error of a mean
SAS (program f3_toolLife.sas):
  lsmeans B / stderr;
  lsmeans A*C / stderr;
  lsmeans A*B*C / stderr;
will give the SEM, & a usually useless p-value testing
whether the mean is 0!
Calculating SED:
For the AC interaction:
SEM = 2.2422707
NB: usual SAS unhelpful precision!
so
SED = 1.414 × 2.2422707
= 3.17 (3 sig. figs.)
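The same SEDs can be obtained directly from the residual mean square, since SEM = √(MSE/n) and SED = √2 × SEM for means of n observations each. A short Python sketch, taking the residual SS (482.7) and df (16) from the ANOVA table and the replications (12, 6, 3) from the 2×2×2 design with r = 3; the function names are illustrative.

```python
import math

ms_resid = 482.7 / 16   # residual SS / residual df from the ANOVA table

def sem(n):
    """Standard error of a mean of n observations."""
    return math.sqrt(ms_resid / n)

def sed(n):
    """Standard error of a difference between two means of n observations each."""
    return math.sqrt(2) * sem(n)

# Observations per mean: main effects 12, 2-factor tables 6, treatment means 3.
for label, n in [("main effect", 12), ("2-factor table", 6), ("treatment means", 3)]:
    print(f"{label:16s} n = {n:2d}  SEM = {sem(n):.2f}  SED = {sed(n):.2f}")
```

With n = 6 this gives the AC SED of 3.17; the small difference between this SEM and the SAS value arises because the residual SS is rounded to 482.7 here.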
Transformations of data
Analysing log(response)
Interpreting the log scale
Linear relationship
$\log_2(y) = a + bx$ (here: log = log2)
$y = 2^{a + bx} = 2^a \, 2^{bx}$
Compare y-values for a unit increase in x,
i.e. $y_1$ at $x$ and $y_2$ at $x + 1$
$y_2 / y_1 = [2^a \, 2^{bx + b}] / [2^a \, 2^{bx}] = 2^b$
Increasing x by 1 multiplies y by $2^b$
e.g. if b = -1 this is a 50% decrease in y
Understanding the LOG scale
- where effects of a variate are proportional
Example:
1. uses log2 (logs to base 2)
2. slope b = -1
- giving a 50% decrease per unit increase in x
$\log_2(y) = 3 - x$
- a linear relationship between $\log_2(y)$ and x
x   y
0   8
1   4
2   2
3   1
4   0.5
Back transforming LOG
$y = \log_{10}(x)$, so $x = 10^y$
$y = \log_2(x)$, so $x = 2^y$
$y = \log(x)$, so $x = \exp(y) = e^y$
Dilution of drug in milk
Excretion of sodium penicillin
for five milkings for a cow.
Relationship is not linear.
Milking   Units excreted   Log(Units)
1         29547            10.29
2         1111             7.01
3         235              5.46
4         26               3.26
5         4.3              1.46
[Figure: Units vs Milkings, units (0 to 30000) plotted against milking number]
LOG-scale
[Figure: Log(units) vs # milkings, with fitted line]
log(U) = 11.9 - 2.14 M
Slope b = -2.14
exp(-2.14) = 0.12
Conclusion:
Each milking reduces the # units to 12% of previous milking
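The fitted line on the log scale, and its back-transformed interpretation, can be reproduced with an ordinary least-squares fit. The Python sketch below uses natural logs, which match the Log(Units) column of the penicillin data; variable names are chosen here for illustration.

```python
import math

milking = [1, 2, 3, 4, 5]
units = [29547, 1111, 235, 26, 4.3]
log_units = [math.log(u) for u in units]  # natural logs

n = len(milking)
mean_x = sum(milking) / n
mean_y = sum(log_units) / n

# Ordinary least-squares slope and intercept of log(units) on milking number.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(milking, log_units))
sxx = sum((x - mean_x) ** 2 for x in milking)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Back-transform the slope: multiplicative change in units per extra milking.
ratio = math.exp(slope)
print(round(intercept, 1), round(slope, 2), round(ratio, 2))
```

Because the analysis is on the log scale, the slope back-transforms to a ratio rather than a difference: each milking multiplies the units excreted by about 0.12.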
Revision:
t-test, p-value, significance level,
hypothesis testing, and much more
ALL IN ONE OVERHEAD!
t-test
H0: θ = 0
t = ESTIMATE/SE
e.g. θ = µ3 - µ1, µ1 - 2µ2 + µ3, or a regression slope
[Figure: t distribution with the tails shaded in blue]
When H0 is true:
5% of t-values fall on the axis below the blue shading;
for 11 df: beyond ±2.2
For given t and df, p is the proportion of more extreme values