Biostatistics Case Studies 2009
Session 3: Replicates and Clusters
Peter D. Christenson, Biostatistician
http://gcrc.labiomed.org/biostat

Question #1
Does this paper use individual offspring outcomes or litter means to compare treatments?

Question #1
Fig 6: Fat in Males
Strength of Treatment Effect: Signal:Noise Ratio
$t = \Delta / \left( SD \, (1/N_{Ctrl} + 1/N_{Ab})^{1/2} \right)$, with $\Delta = 19.0 - 15.5 = 3.5$ (Ctrl mean 19.0, Ab mean 15.5)
Are the Ns the # of dams or # of offspring?
What is the correct SD?

Question #1
Paper ignores dams. Uses #s of offspring.
Simulated data gives:

Group   N    Mean     SD
Ctrl    33   18.784   1.5148
Ab      13   15.643   1.3693
Diff         3.1411   1.4766

t-test: t Value 6.50, Pr > |t| <0.0001
$t = 6.50 = 3.14 / \left( 1.477 \, (1/33 + 1/13)^{1/2} \right)$
Analysis assumes all 46 offspring give independent information.
We explore the validity/necessity of that assumption.

Question #2
Does Fig 1 express biological differences or measurement error, or both?

Question #3
From Fig 1, is it possible (likely?) that littermates from a mother may respond more similarly than offspring from different mothers (who were treated the same)?

Question #4
Suppose littermates do respond almost identically. Would an analysis, say a t-test, using individual offspring that ignores the mothers give about the same treatment difference as an analysis (again, say a t-test) using the mothers' means of their offspring?

Question #5
Would the answer to question #4 change if some litters had 3 offspring and others had up to 8?

Question #6
Continuing question #4, would the analysis using individual offspring overstate or understate the evidence about the treatment difference (i.e., p-value too low or too high)?

Question #7
Suppose now that outcomes from littermates differ about the same as offspring from different mothers.
Would that justify using individual offspring, rather than mothers, in the analysis, and hence more power with the larger N?
This requires assuming that variability within litters equals variability between litters. That expert opinion may be valid, but the analysis could be faulty if the assumption is wrong. See the next question.

Question #8
Lastly, suppose that we don't want to make suppositions as in questions #4-7. Can we use the data itself to measure relative intra- and inter-litter differences, and incorporate that into the treatment comparison?
This is what hierarchical or mixed models accomplish. They estimate the correlations among the offspring so we do not have to make assumptions as in question #7. We now show how this is done.

Basic Issue for Using Offspring as Replicates
• Dams vary.
• Overall, offspring vary.
• Do offspring from a dam vary less than offspring from different dams (positive correlation)?
• Do offspring from a dam vary more than offspring from different dams (negative correlation)? What could cause this?

Intra-Dam Correlation Among Offspring
Example: Four dams - A,B,C,D - with 2 offspring each.
[Figure: offspring fat for each dam's offspring plotted around the overall mean, with dam means marked. Panels: Strong Negative Correlation, Strong Positive Correlation, No Correlation.]

Intra-Dam and Inter-Dam Variation
Example: Four dams - A,B,C,D - with 2 offspring each.
[Figure: offspring fat for each dam's offspring plotted around the overall mean, with VInter and VIntra marked.]
Correlation = Scaled VInter - VIntra
Can be calculated from the data. Denote correlation by r.

Correct SD Uses Both Variations
Table 6: Fat in Males
Strength of Treatment Effect: Signal/Noise Ratio
$t = \Delta / \left( SD \, (1/N_{Ctrl} + 1/N_{Ab})^{1/2} \right)$, with $\Delta = 19.0 - 15.5 = 3.5$ (Ctrl mean 19.0, Ab mean 15.5)
Are the Ns the # of dams or # of offspring?
What is the correct SD?
$SD^2 = V(1 + (n-1)r)$, where n = # offspring/dam

Correct Analysis
Signal/Noise Ratio
$t = \Delta / \left( SD \, (1/N_{Ctrl} + 1/N_{Ab})^{1/2} \right)$
Ns are #s of offspring.
Incorporate offspring correlation by using:
$SD^2 = V(1 + (n-1)r)$, where n = # offspring/dam
If r = 0, then $SD^2 = V$, the same as the t-test.
If r > 0, then $SD^2 > V$, so the t-test overstates the effect.
If r < 0, then $SD^2 < V$, so the t-test understates the effect.

Correct Analysis
Thus, the reasoning is that the dams are clusters of correlated outcomes (offspring).
If offspring were completely correlated (r=1), i.e., identical in a dam, then the correct analysis is the same as using dam means. [$SD^2 = nV$]
If there is no correlation (r=0), the analysis is the same as ignoring dams and using offspring results. [$SD^2 = V$]
If there is some correlation, then SD incorporates that correlation, i.e., the relative intra- and inter-dam variation.

Correct Analysis in Software
If we have the same # of offspring for every dam, we can use repeated measures ANOVA. Specify the dam as a "subject" and the offspring as the repeated values.
Otherwise, use a Mixed Model for Repeated Measures.
Both of these methods consider the dams as clusters of correlated outcomes (offspring).

Numerical Illustrations
1. All Offspring for a Dam Identical
2. All Offspring for a Dam are Unique
3. Offspring for a Dam are Negatively Correlated
We will generate data that has about the same means, but different correlations among littermates, for these 3 examples.

1. All Offspring for a Dam Identical

Recall Paper Uses Offspring
Paper ignores dams. Uses #s of offspring.
Simulated data with correlation=1 gives:

Group   N    Mean     SD
Ctrl    33   18.784   1.5148
Ab      13   15.643   1.3693
Diff         3.1411   1.4766

t-test: t Value 6.50, Pr > |t| <0.0001
$t = 6.50 = 3.14 / \left( 1.477 \, (1/33 + 1/13)^{1/2} \right)$
Analysis assumes all 46 offspring give independent information.
… which is wrong here.

Analysis on Dam Means
Same data using dam means gives:

Group   N    Mean     SD
Ctrl    9    19.000   1.500
Ab      9    15.494   1.500
Diff         3.5061   1.500

t-test: t Value 4.96, Pr > |t| <0.0001
$t = 4.96 = 3.51 / \left( 1.500 \, (1/9 + 1/9)^{1/2} \right)$
So the previous analysis gave a signal:noise ratio t that was 6.5/4.96=1.3 times too large.
It doesn't matter here, but if the previous t-test gave p=0.05, then the correct p here would be 0.13.

Analysis using Calculated Correlation
Same data using mixed model gives:

CovParm    Subject   Estimate
CS         id        2.2485
Residual             1.365E-6

Num DF   Den DF   F Value   Pr > F
1        16       24.61     0.0001

Effect   group   Estimate   Std Err   Lower     Upper
group    Ctrl    19.0006    0.4998    17.9410   20.0602
group    Ab      15.4940    0.4998    14.4344   16.5536

$R = 2.2485 / (2.2485 + 0) = 1$
Square root of 24.61 is t = 4.96, same as analysis on means.

2. All Offspring for a Dam are Unique

Second Set of Simulated Data
Paper ignores dams. Uses #s of offspring.
Simulated data with correlation≈0 gives:

Group   N    Mean     SD
Ctrl    33   19.000   1.5000
Ab      13   15.500   1.5000
Diff         3.5000   1.5000

t-test: t Value 7.13, Pr > |t| <0.0001
Analysis assumes all 46 offspring give independent information.
… which is correct here; I generated them to be so.

Analysis using Calculated Correlation
Same data using mixed model gives:

CovParm    Subject   Estimate
CS         id        -0.1860
Residual             2.4278

Num DF   Den DF   F Value   Pr > F
1        16       56.20     0.0001

Effect   group   Estimate   Std Err   Lower     Upper
group    Ctrl    18.9690    0.2225    18.4972   19.4407
group    Ab      15.5102    0.4042    14.6534   16.3670

$R = -0.186 / (-0.186 + 2.428) = -0.083$
Square root of 56.20 is t = 7.50, close to t-test ignoring dams.

3. Offspring for a Dam are Negatively Correlated

Third Set of Simulated Data
Use 2 offspring/dam; N=32 and 12 to be even.
Simulated data with correlation=-0.76 gives:

Group   N    Mean     SD
Ctrl    32   19.000   1.4756
Ab      12   15.500   1.4302
Diff         3.500    1.4639

t-test: t Value 7.06, Pr > |t| <0.0001
Analysis assumes all 44 offspring give independent information.
… which is wrong here.
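The naive t value for the third simulated data set can be checked from the summary statistics alone. The following is a minimal Python sketch, not the software used for the slides; the function name `pooled_t` is hypothetical, and it assumes the usual pooled-variance two-sample t-test, which is consistent with the pooled SD shown in the "Diff" row.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled-variance two-sample t statistic from group summaries.

    Returns (t, pooled_sd, df). Every observation is treated as
    independent, which is exactly the assumption under discussion.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se_diff, pooled_sd, df

# Third simulated data set: Ctrl (N=32) vs. Ab (N=12), offspring as units.
t, pooled_sd, df = pooled_t(32, 19.000, 1.4756, 12, 15.500, 1.4302)
print(f"t = {t:.2f}, pooled SD = {pooled_sd:.3f}, df = {df}")
```

The result should land close to the reported t of 7.06. Small differences in the last digit of the pooled SD are expected because the inputs are rounded summary values.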
Analysis using Calculated Correlation
Same data using mixed model gives:

CovParm    Subject   Estimate
CS         id        -1.5780
Residual             3.6458

Num DF   Den DF   F Value   Pr > F
1        20       218.33    0.0001

Effect   group   Estimate   Std Err   Lower     Upper
group    Ctrl    19.0000    0.1237    18.7419   19.2580
group    Ab      15.5000    0.2020    15.0786   15.9214

$R = -1.578 / (-1.578 + 3.646) = -0.76$
Square root of 218.33 is t = 14.8, twice the t-test.
But with negative correlation, we probably would not have a 3.5 difference.
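The link between the covariance parameter estimates and the adjustment $SD^2 = V(1 + (n-1)r)$ can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a replacement for the mixed model: the function names are hypothetical, the correlation is taken as CS / (CS + Residual) as on the slides, and the rescaling of the naive t assumes the same number n of offspring for every dam.

```python
import math

def intraclass_r(cs, residual):
    """Correlation among littermates from compound-symmetry estimates."""
    return cs / (cs + residual)

def design_effect(n, r):
    """Variance inflation factor 1 + (n - 1) r for n offspring per dam."""
    return 1 + (n - 1) * r

def adjusted_t(naive_t, n, r):
    """Rescale a naive t (offspring treated as independent) by the
    square root of the design effect. Assumes equal litter size n."""
    deff = design_effect(n, r)
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return naive_t / math.sqrt(deff)

# Third simulated data set: CS = -1.5780, Residual = 3.6458, n = 2.
r3 = intraclass_r(-1.5780, 3.6458)
print(f"r = {r3:.2f}")
print(f"design effect = {design_effect(2, r3):.3f}")
print(f"adjusted t = {adjusted_t(7.06, 2, r3):.1f}")
```

With r near -0.76 and n = 2 the design effect is well below 1, so the adjusted t comes out roughly twice the naive 7.06, in the neighborhood of the mixed model's 14.8. Exact agreement is not expected, since the mixed model estimates everything jointly. For the first two data sets the litter sizes differ across dams, so this equal-n shortcut does not apply directly.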