Wednesday Day 3
• Sampling theory
• Example dataset 1

Clusters and Strata
Tor Strand, 2006

Basic statistical rule for most common statistical tests
• Individual observations should be independent of each other.
• If they are not, we are likely to overestimate the precision of our effect measures.
• In other words, by disregarding dependencies we may underestimate the required sample size: variance estimates will be too low, confidence intervals too narrow, t-values too high, and P-values too low.

When observations are not independent
Analysis of cluster sampling, cluster randomized trials, and trials with repeated measurements and entries.

How can two observations be interdependent?
• Cluster sampling
• Cluster randomized trial
• Individuals participate in a trial more than once
• People from the same household participate in the same trial or survey
• An outcome is measured several times in an individual

How can we deal with clustering?
You need to calculate the inflation factor (= design effect). For this you need to know:
1) Intracluster correlation coefficient (ICC)
2) Cluster size

How can we deal with clustering?
Design effect = Deff = inflation factor (IF)
IF = 1 + (m - 1) × ICC
• ICC = intracluster correlation coefficient, a measure of how similar the individuals within a cluster are compared with everyone in the sample.
• Theoretical values: 0 to 1. If ICC is 0, the individuals within each cluster are no more similar to each other than to the rest of the sample.
• m = average number of individuals in each cluster

Design effect clusters
• Deft: the ratio of the standard error achieved by a study using this design to the standard error that would be achieved by a simple random sample (SRS) with the same sample size.
  – Adjusted SE = Deft × unadjusted SE
• Deff
  – Deff = Deft²

IF = 1 + (m - 1) × ICC
• “Typical” ICC = 0.04*
• “Usual values”: 0.01 to 0.1
• Obtain these values from
  – Published studies
  – Pilot studies
• Applicable for means, rates and proportions
* Eldridge SM (2003). Lessons for cluster randomised trials in the 21st century: a systematic review of trials in primary care. Clinical Trials.

Examples of published intracluster correlation coefficients

Outcome                                                     Intracluster correlation
EPDS (Q1)                                                   0.029
EPDS 13 (Q1)                                                0.0336
EPDS (Q2)                                                   0.0239
EPDS 13 (Q2)                                                0.0108
Mother smokes (Q1)                                          0.0267
Mother smokes (Q2)                                          0.088
Breast feeding if child under 6 months of age               0.0355
Breast fed at birth                                         0.0137
Child has sleeping problem (Q1)                             0.0 (0.0049)
Child has sleeping problem (Q2)                             0.0 (0.003)
Child's general health very healthy* (WCHMP)                0.0287
Minor illness more than other children (WCHMP)              0.0438
> 2 minor illnesses in past 3 months (WCHMP)                0.0479
Accident in past year (WCHMP)                               0.0 (0.0135)
Behaviour problem (WCHMP)                                   0.0206
Hospitalised in past 6 months (WCHMP)                       0.0 (0.0126)
Child attended hospital in last two months                  0.014
Child seen GP in past 2 weeks (Q1)                          0.0 (0.0196)
Child seen GP in past 2 weeks (Q2)                          0.0 (0.0161)
Non-routine visits to GP by mother standardised to 1 year   0.0277
Non-routine visits to GP for child standardised to 1 year   0.0274

EPDS, Edinburgh postnatal depression score; WCHMP, Warwick child health and morbidity profile.
Reading et al. Arch Dis Child 2000;82:79-83

How do we adjust?
• Adjusted variance = crude variance × IF
• Adjusted sample size (SS) = crude SS × IF
• Adjusted SE = SE × √IF
SE = standard error; IF = inflation factor

[Figures: 20 clusters in a population, sampled by (a) simple random sample, (b) one-stage cluster sample, (c) two-stage cluster sample]

Cluster effect
• Individuals within a cluster might be more similar to each other than to the rest of the population.
• In cluster sampling, the variance of the estimates might therefore be underestimated.
• Thus, the standard errors and the 95% confidence intervals of the estimates need to be adjusted for the cluster effect.

Many ways to adjust for cluster effect
• svy commands in Stata and similar commands in other programs
• Robust variance estimates
• Generalized estimating equations (GEEs)
• Multilevel modeling
• Others

Stratification
• Stratify: “to make layers”
• Reasons to stratify:
  – Representativeness (ensure no stratum is undersampled)
  – Subgroup analyses
  – Convenience (different sampling procedures in different strata)
  – Increase the overall precision

[Figures: 20 clusters in a population divided into Stratum 1 and Stratum 2, sampled by (a) stratified simple random sample, (b) stratified one-stage cluster sample, (c) stratified two-stage cluster sample]

Stratification increases precision
[Figure: stratified population. Stratum 1 observations: 7, 8, 10, 11; mean 9]
• With stratification, the variance is computed from the deviations of each observation from the mean of its own stratum rather than from the mean of the total sample, so the variance decreases (if stratification changes it at all).
• Reduces the required sample size
[Figure, continued: Stratum 2 observations: 9, 11, 13, 15; mean 12. Overall mean 10.5]

Calculating variance

Stratum 1
           Compared to the overall mean    Compared to the stratum mean
Observed   Mean    Diff    Diff²           Mean    Diff    Diff²
7          10.5     3.5    12.3             9.0     2.0     4.0
8          10.5     2.5     6.3             9.0     1.0     1.0
10         10.5     0.5     0.3             9.0    -1.0     1.0
11         10.5    -0.5     0.3             9.0    -2.0     4.0
Sum of squared diff        19.0                            10.0

Stratum 2
           Compared to the overall mean    Compared to the stratum mean
Observed   Mean    Diff    Diff²           Mean    Diff    Diff²
9          10.5     1.5     2.3            12.0     3.0     9.0
11         10.5    -0.5     0.3            12.0     1.0     1.0
13         10.5    -2.5     6.3            12.0    -1.0     1.0
15         10.5    -4.5    20.3            12.0    -3.0     9.0
Sum of squared diff        29.0                            20.0

Sum of squared diff,
both strata                48.0                            30.0

Summary - design effect
• Cluster sampling might
  – reduce the precision
  – increase the required sample size
• Stratification might
  – increase the precision
  – reduce the required sample size
• There are many ways to adjust for the design effect; all are technically complicated

[Figures: self-weighting sampling, 4 clusters in a population, showing probabilities of selection under (a) probability proportional to size and (b) equal probability of PSUs]
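
Worked sketch of the summary

The two effects summarised in these slides can be checked with a few lines of Python. This is only a minimal sketch, not part of the original lecture: the function names are made up for illustration, and the cluster size (20), crude standard error (0.5) and crude sample size (400) are hypothetical inputs. The ICC of 0.04 is the “typical” value quoted in the slides, and the eight observations are those of the stratification example.

```python
import math


def inflation_factor(m, icc):
    """Inflation factor (design effect): IF = 1 + (m - 1) * ICC."""
    return 1.0 + (m - 1.0) * icc


def adjusted_se(crude_se, m, icc):
    """Adjusted SE = crude SE * sqrt(IF)."""
    return crude_se * math.sqrt(inflation_factor(m, icc))


def adjusted_sample_size(crude_n, m, icc):
    """Adjusted sample size = crude sample size * IF (left unrounded)."""
    return crude_n * inflation_factor(m, icc)


def sum_sq_diff(values, mean):
    """Sum of squared differences between each observation and a given mean."""
    return sum((mean - v) ** 2 for v in values)


# Hypothetical inputs, for illustration only (not taken from the slides).
M = 20          # average number of individuals per cluster
CRUDE_SE = 0.5  # standard error computed as if observations were independent
CRUDE_N = 400   # sample size computed as if observations were independent
ICC = 0.04      # the "typical" ICC quoted in the slides

print("Inflation factor:", round(inflation_factor(M, ICC), 4))
print("Adjusted SE:", round(adjusted_se(CRUDE_SE, M, ICC), 4))
print("Adjusted sample size:", round(adjusted_sample_size(CRUDE_N, M, ICC), 1))

# Observations from the stratification example in the slides.
stratum1 = [7, 8, 10, 11]
stratum2 = [9, 11, 13, 15]
everyone = stratum1 + stratum2
overall_mean = sum(everyone) / len(everyone)

# Deviations measured from the overall mean (no stratification).
total_vs_overall = sum_sq_diff(everyone, overall_mean)

# Deviations measured from each stratum's own mean (stratified).
total_vs_stratum = sum(
    sum_sq_diff(s, sum(s) / len(s)) for s in (stratum1, stratum2)
)

print("Sum of squares vs overall mean:", total_vs_overall)
print("Sum of squares vs stratum means:", total_vs_stratum)
```

With these inputs the inflation factor is 1 + 19 × 0.04 = 1.76, so clustering inflates the required sample size by 76% and the standard error by √1.76. The two sums of squares reproduce the totals of the variance table (48.0 against the overall mean, 30.0 against the stratum means), which is the sense in which stratification increases precision.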