Structure in the Experimental Treatments
PGRM 11
Statistics in Science Σ

Factors

Complex systems are affected by a wide range of factors:
• Ploughing system: soil type, ploughing depth, number of cultivations, type of plough, etc.
• Animal production system: management regime, biological & environmental inputs
• Ecological habitat: available food, cover, light, temperature
• Biochemical reaction: concentration of reagents, temperature, light

Factor Levels

Enterprise type is a factor affecting farm outputs.
The different enterprise types considered are the levels of the factor, eg beef, beef suckler, dairy, mixed.
Levels may be categorical (as above) or quantitative, as in the study of the effect of washing solution on retarding bacterial growth, where the levels were 2%, 4% or 6% of an active ingredient.
With quantitative levels it makes sense to look for a trend (increasing or decreasing) in the response as the level increases.

Single factor experiments

• Compare the mean response for the different levels of a single factor
• Other factors affecting the response must be kept as constant as possible; any remaining effect of these will appear as random residual variation (due to the random allocation of units to the different levels of the factor)
• The result will be clear and valid, but of limited value
Ex: comparing growth of lambs fed on 2 levels of protein supplement, we must use the same source of protein for the two levels: we have no information on what the response to protein level would be for other sources.

(Multi)-factorial experiments

Examine the effect of 2 (or more) factors at the same time.
Treatments: the various combinations of the levels of the different factors
Ex: protein supplement: factor B (levels B1, B2, B3)
    protein source: factor A (levels A1, A2)
    6 treatments: A1B1, A1B2, A1B3, A2B1, A2B2, A2B3

Simple and Main effects

• Simple effects of source: the difference (in mean growth) between source A1 & A2 can be considered at each of the 3 levels of protein.
• Simple effects of protein: differences between B1 & B2, B2 & B3 and between B3 & B1 can be measured for each source.
• Main effects are averages of simple effects, and are not always meaningful

Example

Means       B1   B2   B3   Average
A1          10   18   11   13
A2          11   19   18   16
A effect     1    1    7    3

Note: the main effect of A, (1 + 1 + 7)/3 = 3, is also the difference between the MARGINAL means (16 - 13).
Here the effect of A depends on the level of B.
This is an INTERACTION between the factors A and B.

Important Rule

With an AB interaction the effect of A changes as the level of B changes.
Hence averaging the effects of A over the levels of B makes no sense.
1. The main effect of a factor cannot be uncritically interpreted as the effect of the factor if there is an interaction
2. In this case report the ab treatment means and some meaningful comparisons, and not the separate means for levels of A and B

Interaction plot
[Figure: interaction plot]

Why do factorial?

1. Factorial experiments compare a set of treatments which have a certain structure: the treatments simply consist of combinations of levels of 2 (or more) factors
   – so we already know how to do the analysis!
   – the factorial treatment structure will dictate sensible comparisons to make
2. The gain:
   – knowing whether the effect of one factor varies with the level of another
   – saving resources when there is no interaction, since a simple effect can be estimated at each level of the other factor and the results combined

Why the gain (in absence of interaction)

Sample size   B1   B2   B3   Total
A1             6    6    6    18
A2             6    6    6    18
Total         12   12   12    36

A effect: since this is the same for all levels of B it is measured by the difference in the marginal means, each based on 18 observations.
B effects: each B effect (B1 v B2, B2 v B3, B3 v B1) is measured using means of 12 observations.

Separate experiments (same resources)

Experiment on A   A1   A2        Total
Sample size        9    9         18

Experiment on B   B1   B2   B3   Total
Sample size        6    6    6    18

A effect: now measured by the difference between means of 9 observations (was 18).
B effects: now measured by the difference between means of 6 observations (was 12).
Also: we don't know if the A effects depend on the level of B. MORE LOSS OF INFORMATION!

PGRM pg 11-6
The enormous benefits (of factorial designs) arise through no extra cost but merely by reorganising the work programme. You can choose to get much more information for the same money or reduce the cost of achieving a given level of information.

SAS OUTPUT
1. ANOVA table
2. Table of MEANS with SED
3. Writing a summary

ANOVA: a×b factorial, replication r

• Treat this as a 1-way structure, with ab treatments

Source       SS    df          MS
Treatments   TSS   ab - 1      TSS/(ab - 1)
Error        RSS   (r - 1)ab   RSS/((r - 1)ab)
Total              rab - 1

• Now partition the treatment SS, TSS

Source             SS     df               MS
A                  SSA    a - 1            SSA/(a - 1)
B                  SSB    b - 1            SSB/(b - 1)
AB (interaction)   SSAB   (a - 1)(b - 1)   SSAB/((a - 1)(b - 1))
Treatment          TSS    ab - 1

Example: time to development of Fasciola hepatica eggs under the 4 combinations of 2 temperatures and 2 relative humidity levels

Temperature (°C)    16     16     22     22
Humidity level       1      2      1      2
                    27     34     13     17
                    26     37     17     15
                    29     33     16     18
Treatment means   27.3   34.7   15.3   16.7

Source          df   SS       MS       F
Treatments       3   758.33   252.78    75.83 ***
  Temp           1   675.00   675.00   202.50 ***
  Humidity       1    56.33    56.33    16.90 **
  Interaction    1    27.00    27.00     8.10 *
Residual         8    26.67     3.33
Total           11   785.00

Temp, Humidity and Interaction are the partition of the treatment SS (TSS).
*** p<0.001   ** p<0.01   * p<0.05

Tables of Means

Temperature (°C)    16     16     22     22
Humidity level       1      2      1      2
Treatment means   27.3   34.7   15.3   16.7
SED = 1.49

Humidity effect: significant when temp = 16 (7.4), non-significant when temp = 22 (1.4)
Temp. effect: significant (12.0 & 18.0) at both levels of humidity

Temperature     16     22    SED
Mean          31.0   16.0   1.05

Humidity        H1     H2    SED
Mean          21.3   25.7   1.05

Interpretation
Overall treatments differ: F = 75.83
Interaction is significant: F = 8.1, so we really should examine the 4 means as above, and ignore the tests for main effects, which eg compare levels of HUMIDITY averaged over levels of TEMP.
However, in this case the TEMP effect is much larger than the interaction, so its averaged effect broadly reflects its effect at each level of HUMIDITY.

[Figure: interaction plot of time to development against humidity (1, 2), with separate lines for T16 and T22]

SAS/GLM for 2-way analysis

Main effects & interaction:
    proc glm data = fasciola;
      class temp humidity;
      model time = temp humidity temp*humidity;
      lsmeans temp;
      lsmeans humidity;
      lsmeans temp*humidity;
      /* the standard error of each estimate is the SED */
      estimate 'SED for temp' temp 1 -1;
      estimate 'SED for humidity' humidity 1 -1;
    run;
    quit;

One-way analysis:
    proc glm data = fasciola;
      class temp humidity;
      /* the 4 temp*humidity combinations treated as 4 treatments */
      model time = temp*humidity;
      estimate 'SED tment means' temp*humidity 1 -1;
    run;
    quit;

SAS demo!

temp   humidity   time
16     1          27
16     2          34
22     1          13
22     2          17
16     1          26
16     2          37
22     1          17
22     2          15
16     1          29
16     2          33
22     1          16
22     2          18

Data must contain response values (time) in a single column, identified by factor levels in 2 other columns.
This gives 3 variables (columns) for the SAS program faciola.sas

What to present (again!)
• Since the interaction is significant don't report the main effects.
• Present:
  – the 2-way table (with SED):

      Time          Temp 16°   Temp 22°
      Humidity 1    27.3       15.3
      Humidity 2    34.7       16.7
      SED = 1.49

  – a summary:
      the temp/humidity interaction was significant (p = 0.02)
      humidity effects were significant at temp = 16 (p = 0.0012) but not at temp = 22 (p = 0.40)
      temp effects were significant at both humidities (p < 0.0001), and greater when humidity = 2

Factorial experiment laid out in blocks

• The above laid out the ab treatments as a completely randomised design using rab experimental units (r for each treatment). Think: how would this be done in practice?
• If we group the experimental units into blocks of size ab and randomly allocate the ab treatments to the units in each block, we can then remove the block sum of squares (BSS) from RSS, hopefully reducing it sufficiently to compensate for the reduction in residual DF
• See example over …

2-way experiment laid out in blocks

• Factor A: 2 levels; Factor B: 3 levels
• 60 experimental units available (10 per treatment)
• Completely randomised design (CR): randomly allocate treatments to units
• Randomised blocks (RB): group units into blocks of size 6 (so 10 blocks) & randomise the 6 treatments in each block, which may be much easier to do

ANOVA
Source     DF: CR   DF: RB
Block               9
A          1        1
B          2        2
AB         2        2
Residual   54       45
Total      59       59

Practical: 4.2 Two-Factor Factorial Example 2
Bacterial count in sausages stored at 4 temperatures using 3 types of preservative method

More than 2 factors!

3×4×5 experiment: ie factors A, B, C with 3, 4 and 5 levels respectively, giving 60 treatment combinations!
The 3-factor ABC interaction measures how the 2-factor AB interaction changes over the levels of C (see over).
Can get away with replication r = 1 provided the 3-factor interaction can be assumed negligible – not usually liked by journal editors!
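The two situations (replication r = 1 and r > 1) differ only in which terms go into the model. A minimal PROC GLM sketch follows; the dataset name `expt`, the response `y` and the factor names `a`, `b`, `c` are hypothetical placeholders, not taken from the course files.

```sas
/* Hypothetical names: dataset expt, response y, factors a b c */

/* r > 1: full factorial model.
   a|b|c expands to a b a*b c a*c b*c a*b*c */
proc glm data = expt;
  class a b c;
  model y = a|b|c;
run;
quit;

/* r = 1: stop at the 2-factor interactions (@2), so the ABC term
   (2 x 3 x 4 = 24 df in a 3x4x5 design) serves as the residual.
   This is only reasonable if the 3-factor interaction is negligible. */
proc glm data = expt;
  class a b c;
  model y = a|b|c@2;
run;
quit;
```

The bar notation is shorthand only; listing the terms one by one in the model statement gives the same analysis.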
With r > 1 we include:
  main effects: A, B, C
  2-factor interactions: BC, CA, AB
  3-factor interaction: ABC

3-factor interaction for a 2×3×2 expt (a)
[Figure: response (0 to 40) against B (B1, B2, B3), with separate lines for A1C1, A1C2, A2C1, A2C2]
With C1: A effect is least at B2
With C2: A effect is largest at B2
Direction of A effect is different for C1, C2
AB interaction is different at the two C levels

3-factor interaction arising naturally
See PGRM Fig 11.2.2 (b)

Examples – measuring the benefit
1. 2×2×2×2: artificial insemination involving 256 heifers (r = 16 per treatment)
2. 3×4×5: imaginary example to practise calculating sample sizes! 120 units (r = 2)
3. 2×2×2: machine tool lifetime, 24 units (r = 3)

Example: 2×2×2×2 factorial
Artificial insemination: 256 heifers (64 each week), 4 factors at 2 levels.
Choices:
A) 4 experiments (r = 32)
B) 2×2×2×2 factorial (r = 16 per combination)

Compare precision
A) 32 animals per treatment: $\mathrm{SED} = \sqrt{2s^2/32} = s/4$, where $s^2$ = MSE.
B) 128 animals for each level of a factor: $\mathrm{SED} = \sqrt{2s^2/128} = s/8$.
Plus: with B all interactions can be estimated.

Summary - The factorial design
- Halves the SED and quarters the number of animals required for a given level of precision
- Allows more general interpretation of the factor effects since they are tested over a wide range of levels of the other factors
- Allows a test of whether the factors interact.
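The "halves the SED, quarters the animals" statement follows directly from the two SED formulas in the artificial insemination example. The steps are written out below as a worked sketch; the symbols $n_A$, $n_B$ and $n$ (number of animals behind each mean) are labels introduced here, and the factorial figure assumes the factor being compared is not involved in an interaction.

```latex
\begin{align*}
n_A &= \frac{256}{4 \times 2} = 32,
  & \mathrm{SED}_A &= \sqrt{\frac{2s^2}{32}} = \frac{s}{4}
  && \text{(4 separate experiments)} \\
n_B &= \frac{256}{2} = 128,
  & \mathrm{SED}_B &= \sqrt{\frac{2s^2}{128}} = \frac{s}{8}
  && \text{(one factorial experiment)} \\
\frac{\mathrm{SED}_B}{\mathrm{SED}_A} &= \sqrt{\frac{32}{128}} = \frac{1}{2}
\end{align*}

% Separate experiments matching the factorial precision would need
\begin{align*}
\sqrt{\frac{2s^2}{n}} = \frac{s}{8}
  \;\Longrightarrow\; n = 128 \text{ animals per treatment}
  = 4 \times 32 .
\end{align*}
```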
3×4×5 expt with factors A, B, C & replication 2 (120 units)

Replication of main effect means (for any factor not involved in a significant interaction):
  A    B    C
  40   30   24

Replication of means in an interaction table, eg BC (for comparing BC effects if the only significant interaction is BC):

        B1   B2   B3   B4   Total
C1       6    6    6    6    24
C2       6    6    6    6    24
C3       6    6    6    6    24
C4       6    6    6    6    24
C5       6    6    6    6    24
Total   30   30   30   30   120

All interactions:
  AB   AC   BC   Treat. comb.
  10    8    6    2
All 2-factor interactions significant, 3-factor not.

Example
An engineer is interested in the effects of cutting speed (A), tool geometry (B) and cutting angle (C) on the life (in hours) of a machine tool. Two levels of each factor are chosen, and three replicates of a $2^3$ factorial design are run.
Design: 2×2×2   No. treatments: 8   No. units: 24

Example: Data

              LIFE (hr) by replicate
A   B   C      1    2    3
1   1   1     22   31   25
2   1   1     32   43   29
1   2   1     35   34   50
2   2   1     55   47   46
1   1   2     44   45   38
2   1   2     40   37   36
1   2   2     60   50   54
2   2   2     39   41   47

Example: ANOVA

Source     df   SS       MS      F       F pr.
A           1      0.7     0.7    0.02   0.884
B           1    770.7   770.7   25.55   <.001
C           1    280.2   280.2    9.29   0.008
A.B         1     16.7    16.7    0.55   0.468
A.C         1    468.2   468.2   15.52   0.001
B.C         1     48.2    48.2    1.60   0.224
A.B.C       1     28.2    28.2    0.93   0.348
Residual   16    482.7    30.2
Total      23   2095.3

Note:
1. ABC interaction non-significant
2. AC is the only significant 2-factor interaction

Tables of MEANS

Level   A      B      C
1       40.7   35.2   37.4
2       41.0   46.5   44.2
SED     2.24   2.24   2.24

      B1     B2
A1    34.2   47.2
A2    36.2   45.8
SED = 3.17

      C1     C2
A1    32.8   48.5
A2    42.0   40.0
SED = 3.17

      C1     C2
B1    30.3   40.0
B2    44.5   48.5
SED = 3.17

      B1C1   B1C2   B2C1   B2C2
A1    26.0   42.3   39.7   54.7
A2    34.7   37.7   49.3   42.3
SED = 4.48

Help!

Making sense of tables
1. From this analysis, the only significant terms are the B and C main effects and the AC interaction.
2. Thus, the only tables that need to be presented are the B main effect table and the AC table of means.
   – Geometry (B) has a large effect, increasing the life by over 10 hours.
   – Cutting angle (C) increases the life considerably at low but not at high speed (A).
3. Another way of looking at the AC interaction is that increased speed increases tool life for the first cutting angle but reduces it for the second cutting angle.

How were the tables calculated?

SAS/GLM code
    proc glm data = mydataset;
      class a b c;   /* factors must be declared as class variables */
      model response = a b c b*c c*a a*b a*b*c;
      lsmeans a b c b*c c*a a*b a*b*c;
    run;
    quit;

With one (AC) significant interaction:
      lsmeans b a*c / stderr;
      estimate 'b SED' b 1 -1;
      /* ac SED = sqrt(2) x stderr */

Is this the best we can suggest?

Calculating SEDs
Recall (with equal replication): SED = √2 × SEM
SED: standard error of a difference
SEM: standard error of a mean
SAS (f3_toolLife.sas):
      lsmeans B / stderr;
      lsmeans A*C / stderr;
      lsmeans A*B*C / stderr;
will give SEM, & a usually useless p-value testing whether the mean is 0!

Calculating SED
For the AC interaction: SEM = 2.2422707 (NB: usual SAS unhelpful precision!)
so SED = 1.414 × 2.2422707 = 3.17 (3 sig. figs.)

Transformations of data
Analysing log(response)

Interpreting the log scale
Linear relationship: $\log(y) = a + bx$ (here: $\log = \log_2$)
$y = 2^{a + bx} = 2^a \, 2^{bx}$
Compare y-values for a unit increase in x, ie $y_1$ at $x$ and $y_2$ at $x + 1$:
$y_2 / y_1 = \dfrac{2^a \, 2^{bx + b}}{2^a \, 2^{bx}} = 2^b$
Increasing x by 1 multiplies y by $2^b$
eg if b = -1 this is a 50% decrease in y

Understanding the LOG scale
- where effects of a variate are proportional
Example:
1. uses log2 (logs to base 2)
2. slope b = -1, giving a 50% decrease per unit increase in x

$\log_2(y) = 3 - x$: a linear relationship between $\log_2(y)$ & x

x   y
0   8
1   4
2   2
3   1
4   0.5

Back transforming LOG
$y = \log_{10}(x)$    $x = 10^y$
$y = \log_2(x)$       $x = 2^y$
$y = \log(x)$         $x = \exp(y) = e^y$

Dilution of drug in milk
Excretion of sodium penicillin for five milkings for a cow. Relationship is not linear.

Milking   Units excreted   Log(Units)
1         29547            10.29
2          1111             7.01
3           235             5.46
4            26             3.26
5           4.3             1.46

[Figure: Units vs Milkings]

LOG-scale
[Figure: Log(units) vs # milkings]
Slope b = -2.14; exp(-2.14) = 0.12
Conclusion: log(U) = 11.9 - 2.14 M
Each milking reduces the # units to 12% of the previous milking

Revision: t-test, p-value, significance level, hypothesis testing, and much more
ALL IN ONE OVERHEAD!

t-test
H0: θ = 0    t = ESTIMATE/SE
eg θ = µ3 - µ1, µ1 - 2µ2 + µ3, regression slope
When H0 is true: 5% of t-values fall on the axis below the blue shading; for 11 df: beyond ±2.2
For given t, V, p is the proportion of more extreme values
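The critical value and p-value on the t-test overhead can be checked with a short DATA step. This is a minimal sketch rather than part of the course programs: it uses the SAS functions TINV and PROBT, and for a worked case it takes the humidity effect at temp = 16 from the Fasciola example (residual SS = 26.67 on 8 df, 3 replicates per treatment). The variable names are chosen here for illustration.

```sas
/* Sketch: critical t-value and two-sided p-value in a DATA step */
data _null_;
  /* 5% two-sided critical value for 11 df (the overhead quotes about 2.2) */
  crit = tinv(0.975, 11);

  /* Humidity effect at temp = 16, Fasciola example */
  estimate = (34 + 37 + 33)/3 - (27 + 26 + 29)/3;  /* difference of treatment means */
  mse      = 26.67 / 8;                            /* residual mean square          */
  sed      = sqrt(2 * mse / 3);                    /* SED with r = 3 per treatment  */
  df       = 8;                                    /* residual df                   */
  t        = estimate / sed;

  /* proportion of t-values more extreme than the observed one;
     the summary slide reports p = 0.0012 for this comparison */
  p = 2 * (1 - probt(abs(t), df));

  put crit=6.3 t=6.2 p=6.4;
run;
```

The same pattern applies to any contrast or regression slope: supply the estimate, its standard error and the residual degrees of freedom.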