Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2006; all rights reserved. Advance Access publication 30 August 2006
International Journal of Epidemiology 2006;35:1292–1300 doi:10.1093/ije/dyl129

METHODOLOGY

Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method

Sandra M Eldridge,1* Deborah Ashby2 and Sally Kerry3

Accepted 25 May 2006

Background Cluster randomized trials are increasingly popular. In many of these trials, cluster sizes are unequal. This can affect trial power, but standard sample size formulae for these trials ignore this. Previous studies addressing this issue have mostly focused on continuous outcomes or methods that are sometimes difficult to use in practice.

Methods We show how a simple formula can be used to judge the possible effect of unequal cluster sizes for various types of analyses and both continuous and binary outcomes. We explore the practical estimation of the coefficient of variation of cluster size required in this formula and demonstrate the formula's performance for a hypothetical but typical trial randomizing UK general practices.

Results The simple formula provides a good estimate of sample size requirements for trials analysed using cluster-level analyses weighting by cluster size, and a conservative estimate for other types of analyses. For trials randomizing UK general practices the coefficient of variation of cluster size depends on variation in practice list size, variation in incidence or prevalence of the medical condition under examination, and practice and patient recruitment strategies, and for many trials is expected to be ~0.65. Individual-level analyses can be noticeably more efficient than some cluster-level analyses in this context.

Conclusions When the coefficient of variation is <0.23, the effect of adjustment for variable cluster size on sample size is negligible.
Most trials randomizing UK general practices and many other cluster randomized trials should account for variable cluster size in their sample size calculations.

Keywords Cluster randomized trial, unequal cluster sizes, sample size

1 Centre for Health Sciences, Barts and The London School of Medicine and Dentistry, Queen Mary, University of London, London, UK.
2 Wolfson Institute for Preventive Medicine, Queen Mary, University of London, London, UK.
3 Department of Community Health Sciences, St George's, University of London, London, UK.
* Corresponding author. Centre for Health Sciences, Barts and The London School of Medicine and Dentistry, Abernethy Building, 2 Newark Street, Queen Mary, University of London, London E1 2AT, UK. E-mail: [email protected]

Cluster randomized trials are trials in which groups or clusters of individuals, rather than individuals themselves, are randomized to intervention groups. This long-recognized trial design1 has gained popularity in recent years with the advent of health services research, where it is particularly appropriate for investigating organizational change or change in practitioner behaviour. Many authors have described the statistical consequences of adopting a clustered trial design,2–9 but most assume the same number of individual trial participants (cluster size) in each cluster7,8 or minimal variation in this number.9 Researchers calculating sample sizes for cluster randomized trials also generally ignore variability in cluster size, largely because there has been no appropriate, easily usable sample size formula, in contrast to the simple formula available for trials in which cluster sizes are identical.3 In practice, however, cluster sizes often vary. For example, in a recent review of 153 published and 47 unpublished trials, the recruitment strategies in two-thirds of the trials led inevitably to unequal sized clusters.10 Imbalance in cluster size affects trial power.
The way in which cluster size imbalance affects trial power is intuitively clear if we consider several trials with exactly the same number of clusters and exactly the same total number of trial participants. The most efficient design occurs when cluster sizes are all equal. If cluster sizes are slightly imbalanced, then estimates from the smaller clusters will be less precise and estimates from the larger clusters more precise. There are, however, diminishing gains in precision from the addition of an extra individual as cluster sizes increase. This means that the addition of individuals to larger clusters does not compensate for the loss of precision in smaller clusters. Thus, as the cluster sizes become more unbalanced, power decreases. Two recent studies report a simple formula for the sample size of a cluster randomized trial with variable cluster size and a continuous outcome,11,12 and a third study reports a similar result for binary outcomes.13 Kerry and Bland14 use a formula for binary and continuous outcomes which requires more advance knowledge about individual cluster sizes. These papers do not, however, discuss in detail the practical aspects of estimating the quantities these formulae require in advance of a trial. In addition, all the formulae strictly apply to analysis at the cluster level, whereas analysis options for cluster randomized trials have increased in recent years, including analysis at the level of the individual appropriately adjusted to account for clustering. In the present paper we (i) show how a simple formula can be used to judge the possible effect of variable cluster size for all types of analyses; (ii) explore the practical estimation of a key quantity required in this formula; and (iii) illustrate the performance of the formula in one particular context for individual-level and cluster-level analyses.
We articulate our methods using cluster randomized trials from UK primary health care, where these trials are particularly common.10

Method

Judging the possible effect of variable cluster size
Full formulae for sample size requirements for cluster randomized trials are given elsewhere.15 When all clusters are of equal size, m, sample size formulae for estimating the difference between means (or proportions) in intervention groups for a continuous (or binary) outcome differ from comparable formulae for individually randomized trials only by an inflation factor 1 + (m − 1)ρ, usually called the design effect (DE)3 or the variance inflation ratio, because it is the ratio of the variance of an estimate in a cluster trial to the variance in an equivalently sized individually randomized trial. The intra-cluster correlation coefficient (ICC), ρ, is usually defined as the proportion of variance accounted for by between-cluster variation.16 More generally, the design effect represents the amount by which the sample size required for an individually randomized trial needs to be multiplied to obtain the sample size required for a trial with a more complex design, such as a cluster randomized trial, and depends on both design and analysis. Here we assume unstratified designs and that clusters are assigned to each intervention group with equal probability. For continuous and binary outcomes, common appropriate analyses are: (i) cluster-level analyses weighting by cluster size, (ii) individual-level analyses using a random effect to represent variation between clusters, and (iii) individual-level marginal modelling (population averaged model) using generalized estimating equations (GEEs).3 When cluster sizes vary, the usual design effect formulae for these analyses require knowledge of the actual cluster sizes in a trial, alongside the value of the ICC (Table 1). This information is often not known in advance of a trial.
Assuming a cluster-level analysis on the linear scale weighting by cluster size (applicable to both continuous and binary outcomes), and defining the coefficient of variation of cluster size, cv, as the ratio of the standard deviation of cluster sizes, s_m, to the mean cluster size, m̄ [where s_m² = Σ(m_i − m̄)²/(k − 1), m_i is the size of cluster i, and k is the number of clusters], the appropriate design effect (Table 1) can be rewritten as:

DE = 1 + {[cv²(k − 1)/k + 1]m̄ − 1}ρ,   (1)

which does not depend on knowledge of individual cluster sizes. Ignoring (k − 1)/k, this becomes:

DE = 1 + [(cv² + 1)m̄ − 1]ρ.   (2)

This gives a slight overestimate of the design effect, which is more serious when k is small. In this paper we use Equation (2) as an estimate of the design effect for cluster-level analyses weighting by cluster size.

Table 1 Design effects for analyses commonly undertaken in cluster randomized trials (sums run over clusters i = 1, ..., k)

Outcomes and analysis | Source | Design effect
Continuous and binary: linear cluster-level analysis (difference between two means or two proportions) weighting by cluster size | Donner,3 Kerry and Bland14 | 1 + (Σm_i²/Σm_i − 1)ρ
Continuous and binary: linear cluster-level analysis weighting inversely proportional to variance | Kerry and Bland14 | k·m̄ / Σ[m_i/(1 + (m_i − 1)ρ)]
Continuous: random effects model using maximum likelihood estimation, or GEE with exchangeable correlation structure | Manatunga,12 Donner3 | k·m̄ / Σ[m_i/(1 + (m_i − 1)ρ)]
Binary: GEE, robust variance estimation, Wald test, exchangeable correlation structure assumed and true | Pan17 | k·m̄ / Σ[m_i/(1 + (m_i − 1)ρ)]
Binary: GEE, robust variance estimation, Wald test, independent correlation structure assumed, exchangeable structure true | Pan17 | k·m̄·Σm_i[1 + (m_i − 1)ρ] / (Σm_i)²

Note: For binary outcomes, simple expressions for the design effect for random effects models are not available.
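Equation (2), and the small-sample correction of Equation (1), are straightforward to compute directly. The sketch below is our own illustration in Python rather than the authors' code; the function name and argument defaults are assumptions.

```python
def design_effect(mean_size, icc, cv=0.0, k=None):
    """Design effect from Equation (2); if the number of clusters k is
    supplied, apply the (k - 1)/k correction of Equation (1). With cv = 0
    this reduces to the familiar 1 + (m - 1) * icc for equal clusters."""
    adj = (k - 1) / k if k is not None else 1.0
    return 1 + ((cv ** 2 * adj + 1) * mean_size - 1) * icc

# equal clusters of size 10, ICC 0.05: the usual inflation factor
print(round(design_effect(10, 0.05), 2))           # -> 1.45
# the same trial with cluster-size cv = 0.65 needs more inflation
print(round(design_effect(10, 0.05, cv=0.65), 3))  # -> 1.661
```

Because Equation (1) differs from Equation (2) only by the factor (k − 1)/k applied to cv², supplying k always gives a slightly smaller, less conservative design effect.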
Furthermore, when clustering is present, the design effect for these models is smaller than the variance inflation ratio because the effect size estimated is further from the null value than that from population averaged models or models not accounting for clustering.

Individual-level analyses are more efficient than cluster-level analyses weighting by cluster size,11 and Equation (2) is conservative [in addition to the overestimate produced by ignoring (k − 1)/k] for these analyses. In contrast, the commonly used design effect estimate for unequal cluster sizes,5,15

DE = 1 + (m̄ − 1)ρ,   (3)

underestimates the true design effect for these analyses. For all analyses, the true design effect lies between expressions (2) and (3). We explore the ratio of these two quantities, the Maximum possible Inflation in sample Size (MIS) required when cluster sizes are variable rather than equal, for various mean cluster sizes, cvs, and ICCs:

MIS = {1 + [(1 + cv²)m̄ − 1]ρ} / [1 + (m̄ − 1)ρ].   (4)

Estimating the coefficient of variation of cluster size
For investigators planning trials with unequal cluster sizes, an idea of the likely value of cv is therefore useful. Here we suggest various ways to estimate this coefficient.

Knowledge of coefficients of variation in similar previous studies (method 1)
We present actual cvs from six trials in which UK general practices were randomized.18–23 The trials were chosen because they vary in size and in method of cluster (practice) and individual recruitment (Table 2); these factors may affect cluster size variation. For example, in three trials (POST,18 ELECTRA,20 and HD21) all practices in a particular geographic area were invited to participate; in the other trials practice eligibility was further restricted.
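Equation (4) can be checked numerically. The sketch below (our own illustrative Python; the function name is an assumption) reproduces an entry of Table 3 and the large-m̄ ceiling of 1 + cv².

```python
def mis(mean_size, icc, cv):
    """Maximum possible inflation in sample size, Equation (4): the ratio
    of the design effect allowing for variable cluster size (Equation (2))
    to the design effect assuming equal clusters (Equation (3))."""
    de_variable = 1 + ((1 + cv ** 2) * mean_size - 1) * icc
    de_equal = 1 + (mean_size - 1) * icc
    return de_variable / de_equal

print(round(mis(50, 0.05, 0.6), 2))     # -> 1.26, a Table 3 entry
print(round(mis(10_000, 0.3, 0.6), 2))  # -> 1.36, near the ceiling 1 + 0.6**2
```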
In the AD19 trial, patients were recruited in proportion to number of practice partners, whereas in the LTMI23 trial patients were recruited from practice disease registers of those already identified with long-term mental illness, and cluster size variation depended on variation in disease register size. In the DD22 and POST trials patients were recruited via an event precipitating health service contact; the number of eligible patients was subject to random variability as well as varying with practice list size and practice incidence rate. Investigating and modelling sources of cluster size variation (method 2) In discussing sources of cluster size variation we distinguish between the number of individual trial participants in each cluster—‘cluster size’, and the size of the wider pool of individuals from which participants are drawn—‘whole cluster size’. For example, when clusters are general practices the whole cluster size is the number of individuals registered at a practice and the cluster size is the number of trial participants from the practice. Possible sources of variation in cluster size are then: (i) the underlying distribution of whole cluster sizes in the population of clusters, (ii) the sampling strategy for recruiting clusters from this population, (iii) patterns of cluster response and drop-out, (iv) the distribution of eligible individuals within clusters (for example, trials may include just those in particular age and sex categories such as the elderly or children, or, in health care, those with particular conditions such as diabetics or asthmatics), (v) the sampling strategy for recruiting individual participants from clusters, and (vi) patterns of response and drop-out amongst individuals. The importance of each source will vary according to the context. We include in a model what we judge are the major sources for trials randomizing general practices: (i), (ii), and (v). 
Ignoring the other sources, we implicitly assume that levels of non-response and drop-out are unrelated to cluster size and that the distribution of eligible patients is identical in every cluster. Although limited, data from our illustration trials support some of these assumptions; practice response rates, ranging between 55% (AD) and 100% (ELECTRA), had no discernible relationship with cv, and practice drop-out rates were minimal. We base our modelling of cluster size variation on common characteristics of trials randomizing UK general practices, some of which are exhibited by our illustration trials. We assume that practices are randomly selected from a population with list size distributed according to the list sizes of UK practices. Most trials in the UK (including three illustration trials) select from all practices in one or more primary care trust (PCT),24 and the coefficient of variation of practice size in most PCTs is similar to that for all UK practices (median 0.56, IQR 0.49–0.64).25 We then assume that patients are recruited following an event resulting in health service contact (a strategy employed in three illustration trials). For practice i, the expected number of trial participants is w_i = p × g_i, where g_i is the practice list size and p is the proportion of all patients experiencing this event in the trial period, generated from the assumed mean cluster size in the trial, m̄, as p = m̄/(mean UK practice size). To reflect random variation within practices, the actual number of patients, m_i, is assumed to follow a Poisson distribution with mean w_i. We estimate the expected cv from simulated m_i values, for trials with mean cluster sizes ranging from 2 to 100.
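The paper fits this model to the empirical UK list-size distribution using Markov chain Monte Carlo in WinBUGS (see the online supplement); the stripped-down sketch below conveys the idea only. The lognormal list-size distribution, the mean list size of 5500, the seed, and the function names are all our illustrative assumptions.

```python
import math
import random

def poisson_sample(lam, rng):
    """Sample from a Poisson distribution: Knuth's method for small means,
    a rounded normal approximation for large ones."""
    if lam > 30:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulated_cv(mean_cluster_size, n_clusters=1000, list_cv=0.63,
                 mean_list_size=5500.0, seed=1):
    """Expected cv of cluster size when participants arise from
    health-service contacts: list sizes vary between practices, patient
    numbers are Poisson within practices, and empty clusters are excluded
    because they drop out of the analysis."""
    rng = random.Random(seed)
    sd_log = math.sqrt(math.log(1 + list_cv ** 2))
    mu_log = math.log(mean_list_size) - sd_log ** 2 / 2
    p = mean_cluster_size / mean_list_size  # event rate in the trial period
    sizes = []
    for _ in range(n_clusters):
        g = rng.lognormvariate(mu_log, sd_log)  # practice list size
        m = poisson_sample(p * g, rng)          # trial participants
        if m > 0:
            sizes.append(m)
    mean = sum(sizes) / len(sizes)
    var = sum((s - mean) ** 2 for s in sizes) / (len(sizes) - 1)
    return math.sqrt(var) / mean
```

Under this model cv² ≈ cv_list² + 1/m̄ before the empty-cluster adjustment, so simulated values sit near 0.7 at a mean cluster size of 10 and fall towards the list-size cv of 0.63 as clusters grow, in line with Figure 2.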
Over 80% of primary health care trials in a recent review had mean cluster sizes within this range.10 The expected value of cv increases with increasing number of clusters, but changes very little for trials with more than 10 clusters. We simulate trials with 1000 clusters; although unrealistic in practice, this gives conservative estimates of cv for all trials with fewer clusters. For each mean cluster size we also estimate the expected proportion of empty clusters and adjust the cv, because these clusters will be excluded from trial analyses. We use Markov chain Monte Carlo modelling in WinBUGS.26 Full details are in the online supplement.

When investigators are able to estimate likely minimum and maximum cluster sizes (method 3)
This method may be useful when other methods are not feasible, for example when individual trial participants come from subgroups of all individuals in clusters, such as particular age or ethnic groups, that exhibit a high degree of clustering. Coefficients of variation of cluster size may then be considerably larger than the coefficients of variation of the underlying units.27 If, however, the likely range of cluster sizes can be estimated, the standard deviation of cluster size can be approximated by likely range/4 (based on the width of the central 95% range of a normal distribution). This approximation, together with an estimate of mean cluster size, will give an estimate of cv adequate for sample size calculations. A sensitivity analysis over the likely range may be appropriate.

When all individuals in each recruited cluster participate in a trial (method 4)
In this case, cv varies only according to the clusters sampled.
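Method 3 amounts to a one-line calculation. A sketch (the function name is ours; the LTMI figures are those reported in Table 4):

```python
def cv_from_range(min_size, max_size, mean_size):
    """Method 3: approximate the standard deviation of cluster size by
    (likely range)/4, since a roughly normal distribution falls within
    about 2 SD of its mean 95% of the time, then divide by the mean."""
    return (max_size - min_size) / 4 / mean_size

# LTMI illustration trial: cluster sizes 8 to 48, mean 23.31
print(round(cv_from_range(8, 48, 23.31), 2))  # -> 0.43, close to the observed 0.42
```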
As a ratio of two random quantities, its expectation can be expressed as a Taylor series expansion around σ_m/μ_m,28 where μ_m and σ_m² are the mean and variance of the whole cluster size for all clusters in the population. Simulations similar to those described above suggest that σ_m/μ_m is a very close, slightly anti-conservative, approximation to the expectation of cv as long as a trial includes >10 clusters. Even for a smaller number of clusters, using σ_m/μ_m may be adequate for sample size estimation, bearing in mind the usual uncertainty in sample size inputs.

When whole cluster sizes are identical (method 5)
This could be the case when, for example, clinicians with a more or less identical workload and case-mix are randomized. If individual trial participants are recruited from clusters by some random process, the expected distribution of cluster size can be approximated by a Poisson distribution, and cv becomes 1/√m̄, which tends to zero as m̄ increases.

When cluster size follows a roughly normal distribution (method 6)
This could be the case, for example, where patient caseloads within health organizations have a roughly normal distribution and cluster size is heavily dependent on caseload. Since the minimum possible cluster size is always one, the normal approximation must lie almost entirely above zero. The mean must, therefore, be at least 2 SD from zero (approximately), and consequently cv is at most 0.5.

Here we present results based on the first three methods only; these methods are more general in application.

Table 2 Recruitment strategies in six illustration trials

Trial | Intervention | Practice eligibility (by design) | Method of identifying eligible patients | Numbers of patients (design)
DD^b (Woodcock 1999) | Training of GPs in patient-centred care | Non-training practices | Incident cases of diabetes identified by practice nurse | All new cases in 12 months
AD (Feder 1995) | Clinical guidelines for asthma and diabetes | 4 or more partners; list size >7000; diabetic register >1% of practice population | Disease register (known asthmatics or diabetics) | 10/GP
LTMI^c (Burns 1997) | Structured care of patients with long-term mental illness | 3 or more partners, teaching practices | Disease register (known patients with long-term mental illness) | All patients on register
HDP^b (Thompson 2000) | Education package and guidelines to GPs to improve detection and management of depression | Not restricted | Detection: consecutive adult attendees in general practice; management: positive score on the Hospital Anxiety and Depression Scale (HADS) among those screened (all patients with depression, whether already known or not) | All attendees, screened until a minimum of 30/GP (40/GP in single-handed practices)
POST^a (Feder 1999) | Postal prompts to patients and GPs about lowering CHD risk after angina or MI | Not restricted | Hospital admissions (admission to hospital with angina or MI) | All admissions in specified period
ELECTRA^a (Griffiths 2004) | Asthma liaison nurse and education to primary care clinicians | Not restricted | Consultations (incident cases) and GP notes (already identified): patients consulting with asthma exacerbation or high-risk patients identified from notes | All patients with exacerbation either in study period or in previous 2 years

Note: Patients were required to give individual consent in five of the six trials.
a Data available from trial authors. b Data held by SE. c Data held by SK.

Estimating sample size requirements in practice
We consider four possible estimates of sample size for a hypothetical trial with mean cluster size 10, ICC 0.05, and 200 individuals required if there is no clustering. One estimate accounts for clustering but not variable cluster size [Equation (3)], and three account for variable cluster size using Equation (2) and the first three methods of estimating cv described above.
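Methods 4–6 above reduce to one-line calculations; as an illustrative sketch of method 5 (the function name is ours):

```python
import math

def cv_identical_whole_clusters(mean_size):
    """Method 5: with identical whole cluster sizes and random recruitment,
    cluster sizes are approximately Poisson, so cv = 1/sqrt(mean size)."""
    return 1 / math.sqrt(mean_size)

print(cv_identical_whole_clusters(25))  # -> 0.2, below the 0.23 threshold
```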
To assess the methods, we compare the four estimated design effects with the sampling distributions of actual design effects assuming (i) cluster-level analyses weighting by cluster size and (ii) GEE analyses assuming an exchangeable correlation structure (applicable for both continuous and binary outcomes), and that variation in cv results from practice and patient recruitment strategies as described for method 2 above (for details see IJE online).

Table 3 MIS values for a range of average cluster sizes, coefficients of variation of cluster size, and intra-cluster correlation coefficients

cv   Average        Intra-cluster correlation coefficient
     cluster size   0.001   0.01   0.05   0.1    0.2    0.3
0.4  5              1.00    1.01   1.03   1.06   1.09   1.11
     10             1.00    1.01   1.06   1.08   1.11   1.13
     50             1.01    1.05   1.12   1.14   1.15   1.15
     100            1.01    1.08   1.13   1.15   1.15   1.16
     500            1.05    1.13   1.15   1.16   1.16   1.16
     1000           1.08    1.15   1.16   1.16   1.16   1.16
0.5  5              1.00    1.01   1.05   1.09   1.14   1.17
     10             1.00    1.02   1.09   1.13   1.18   1.20
     50             1.01    1.08   1.18   1.21   1.23   1.24
     100            1.02    1.13   1.21   1.23   1.24   1.24
     500            1.08    1.21   1.24   1.25   1.25   1.25
     1000           1.13    1.23   1.25   1.25   1.25   1.25
0.6  5              1.00    1.02   1.08   1.13   1.20   1.25
     10             1.00    1.03   1.12   1.19   1.26   1.29
     50             1.02    1.12   1.26   1.31   1.33   1.34
     100            1.03    1.18   1.30   1.33   1.35   1.35
     500            1.12    1.30   1.35   1.35   1.36   1.36
     1000           1.18    1.33   1.35   1.36   1.36   1.36
0.7  5              1.00    1.02   1.10   1.18   1.27   1.33
     10             1.00    1.04   1.17   1.26   1.35   1.40
     50             1.02    1.16   1.36   1.42   1.45   1.47
     100            1.04    1.25   1.41   1.45   1.47   1.48
     500            1.16    1.41   1.47   1.48   1.49   1.49
     1000           1.25    1.45   1.48   1.49   1.49   1.49
0.8  5              1.00    1.03   1.13   1.23   1.36   1.44
     10             1.01    1.06   1.22   1.34   1.46   1.52
     50             1.03    1.21   1.46   1.54   1.59   1.61
     100            1.06    1.32   1.54   1.59   1.62   1.63
     500            1.21    1.53   1.62   1.63   1.63   1.64
     1000           1.32    1.58   1.63   1.63   1.64   1.64
0.9  5              1.00    1.04   1.17   1.29   1.45   1.55
     10             1.01    1.07   1.28   1.43   1.58   1.66
     50             1.04    1.27   1.59   1.69   1.75   1.77
     100            1.07    1.41   1.68   1.74   1.78   1.79
     500            1.27    1.68   1.78   1.80   1.80   1.81
     1000           1.41    1.74   1.79   1.80   1.81   1.81
1.0  5              1.00    1.05   1.21   1.36   1.56   1.68
     10             1.01    1.09   1.34   1.53   1.71   1.81
     50             1.05    1.34   1.72   1.85   1.93   1.96
     100            1.09    1.50   1.84   1.92   1.96   1.98
     500            1.33    1.83   1.96   1.98   1.99   2.00
     1000           1.50    1.91   1.98   1.99   2.00   2.00

Results

Judging the possible effect of variable cluster size
The maximum possible increase in sample size due to variable cluster size (MIS) increases with increasing cv, ICC, and mean cluster size (Table 3). For a given cv, MIS reaches a maximum of 1 + cv² as m̄ approaches infinity. When the effect of variable cluster size is ignored, the values of cv which ensure that the maximum underestimate in sample size is <10% and <5% are 0.33 and 0.23, respectively.

Estimation of the coefficient of variation of cluster size
Knowledge of coefficients of variation from previous similar trials
Our illustration trials have cvs between 0.42 and 0.75 (Figure 1, Table 4), the lowest occurring for the LTMI trial, in which only large training practices were eligible to participate. The other trials have cvs in a narrow range (0.61–0.75) in spite of varying design features. For similar trials, expected values of cv could be estimated from these values.
Figure 1 Distribution of cluster sizes for six illustration trials

Table 4 Distribution of cluster sizes and coefficient of variation for six illustration trials

Trial    Number of   Mean cluster   Standard    Minimum        Maximum        cv,        cv expected,   cv expected,
         clusters    size           deviation   cluster size   cluster size   observed   method 2       method 3
AD       24          16.25          11.73       10             60             0.72       0.68           0.77
DD       40          6.25           3.81        1              18             0.61       0.71           0.68
LTMI     16          23.31          9.91        8              48             0.42       0.67           0.43
HD       55          109.78         68.06       41             295            0.62       0.63           0.58
POST     52          6.31           4.75        1              23             0.75       0.71           0.72
ELECTRA  41          7.78           4.95        2              28             0.64       0.72           0.84

Investigating and modelling sources of variation
Using modelling, expected cvs for all practices agreeing to participate in a trial (Figure 2, top line) decrease from a maximum of 0.95 as mean cluster size increases. The more appropriate cv to use in a sample size calculation, that for all analysed practices (Figure 2, bottom line), is at a maximum of 0.71 at a mean cluster size of five and tends towards the underlying coefficient of variation of practice list sizes, 0.63, as mean cluster size increases. Using this method, expected cvs for our illustration trials range from 0.63 to 0.71 (Table 4).

Figure 2 Expected coefficient of variation of cluster size (top line: coefficient of variation for all practices randomized; bottom line: coefficient of variation for all practices contributing to analysis, excluding randomized practices providing no patients) by average cluster size for trials randomizing UK general practices

When investigators can estimate likely minimum and maximum cluster sizes
Using actual minimum and maximum cluster size values from our illustration trials, estimated cvs range from 0.43 to 0.84 and are close to actual cvs (Table 4). In practice, actual minimum and maximum cluster sizes are unlikely to be available in advance of a trial, but reasonably accurate estimates may provide an adequate estimate of cv.

Estimating sample size requirements in practice
For our hypothetical trial, sample size estimates range from 34 to 38 practices when variable cluster size is accounted for, and 29 practices when it is not (Figure 3). Sampling distributions of design effects for cluster-level and GEE analyses are virtually identical for trials containing 29, 33, 34, or 38 clusters; we show the distribution for 38 clusters (Figure 4). As expected, GEE analyses are more efficient than cluster-level analyses. The design effect calculated ignoring variable cluster size underestimates the sample size required in over 98% (99%) of trials using GEE (cluster-level) analyses.

Figure 3 The effect of variable cluster size on sample size requirements in a hypothetical trial.
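The 29- and 34-practice figures follow directly from Equations (3) and (2). The sketch below is our own illustration (function name assumed; cv = 0.65 is the typical value suggested for trials randomizing UK general practices, and the paper's three cv-based estimates give 34–38 practices).

```python
import math

def practices_needed(n_individuals, mean_size, icc, cv=0.0):
    """Clusters needed: inflate the unclustered sample size by the design
    effect of Equation (2) (Equation (3) when cv = 0) and divide by the
    mean cluster size, rounding up."""
    de = 1 + ((1 + cv ** 2) * mean_size - 1) * icc
    clusters = n_individuals * de / mean_size
    return math.ceil(round(clusters, 9))  # guard against floating-point noise

print(practices_needed(200, 10, 0.05))           # -> 29, ignoring variable size
print(practices_needed(200, 10, 0.05, cv=0.65))  # -> 34
```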
Note: The four design effects compared with sampling distributions of actual design effects are shown in bold in this figure.

Figure 4 Four advance estimates of design effect compared with the sampling distribution of actual design effects for our hypothetical trial. Legend: without = advance estimate without accounting for variable cluster size [Equation (3)]; previous = advance estimate based on cv from a similar previous trial [Equation (2)]; modelling = advance estimate based on modelling cv [Equation (2)]; range = advance estimate based on cv estimated from the likely range of cluster sizes [Equation (2)]; cluster_level = sampling distribution for cluster-level analyses (ICC and expected mean cluster size fixed); GEE = sampling distribution for GEE analyses (ICC and expected mean cluster size fixed)

Discussion
The design effect for a cluster randomized trial with unequal cluster sizes analysed using cluster-level analyses weighting by cluster size can be estimated from the coefficient of variation of cluster size, the mean cluster size, and the ICC expected in the trial. This design effect is conservative for individual-level analyses. For a specific cv, the maximum possible increase in sample size required to allow for variable cluster size is 1 + cv². Trials randomizing UK general practices commonly have cvs of ~0.65, which can result in sample size increases of up to 42%. The strength of the simple design effect formula used in this paper is its simplicity. In addition to the elements used in the sample size calculation assuming equal cluster sizes, our formula requires only an estimate of either the standard deviation of cluster size or cv. A potential weakness is that accurate estimates of these quantities may not always be easy to obtain. We have presented various methods of estimating cv.
The accuracy of any method will depend on the extent to which it incorporates the relevant important sources of cluster size variation. A further drawback of using any simple design effect formula, including the one presented here, is that its appropriateness depends on the accuracy of ICC predictions. The size of many cluster randomized trials can lead to considerable sampling error in an ICC estimate, and any simple formula may work less well for small trials. Analysis method can also affect the value of the ICC.29 A Bayesian approach to analysis, in which the ICC is determined by a prior distribution as well as the trial data itself, may alleviate the problem of uncertainty in predicting the ICC. Formulae similar to the one we present have been derived previously.11–13 Manatunga and Hudgens briefly discuss the implications of their formula for sample size calculations, assuming that cluster size variation can be completely determined by the distribution of whole cluster sizes in the population of clusters. Our discussion of the estimation of cluster size variation incorporates other important sources of variation. Our formula is not directly applicable to individual-level analyses. We show the relative efficiency of GEE analyses compared with cluster-level analyses weighting by cluster size in one particular setting. Previous work has shown the efficiency advantages of analyses other than this type of cluster-level analysis when cluster sizes are large.14 We plan to explore relative efficiency further in a future paper. One implication of our work is that investigators should consider the possible impact of variable cluster size on trial power, particularly when variation in cluster size, the ICC, or mean cluster size is expected to be large. Given the uncertain nature of sample size calculations, however, it is almost certainly not necessary to adjust the sample size to take account of this variation when cv is <0.23.
Our results also emphasize the importance of small cluster sizes in an efficient design. We focus here on applications in one specific context: primary health care. The basic techniques we present are completely general, but researchers working in other fields, with different types of clusters, will need to consider the methods in their own situations. We have considered only continuous and binary outcomes in this paper, and have not considered stratified designs. The comparison of design effects under different analyses needs further development.

Acknowledgements
We would like to thank Obi Ukoumunne, Mike Campbell, and Chris Frost for helpful comments on this work, and the authors of the Diabetes Care from Diagnosis trial and the Hampshire Depression Project for permission to use their trial data. Sandra Eldridge's work was funded by an NHS Executive Primary Care Researcher Development Award.

KEY MESSAGES
- Many cluster randomized trials have variable cluster sizes, which should be accounted for in sample size calculations.
- A simple formula provides a good estimate of sample size requirements for some types of analyses and a conservative estimate for other types of analyses.
- The coefficient of variation of cluster size needed for this formula can be estimated in several ways.
- For trials randomizing UK general practices, individual-level analyses can be noticeably more efficient than cluster-level analyses weighting by cluster size.
- When the coefficient of variation is <0.23, the effect of adjustment for variable cluster size on sample size is negligible.

References
1 Lindquist EF. Statistical Analysis in Educational Research. Boston: Houghton Mifflin, 1940.
2 Donner A, Klar N. Statistical considerations in the design and analysis of community intervention trials. J Clin Epidemiol 1996;49:435–39.
3 Donner A, Klar N. Design and Analysis of Cluster Randomised Trials in Health Research. London: Arnold, 2000.
4 Kerry SM, Bland JM.
Trials which randomize practices I: how should they be analysed? Fam Pract 1998;15:80–83. Kerry SM, Bland JM. Trials which randomize practices II: sample size. Fam Pract 1998;15:84–87. Donner A. Some aspects of the design and analysis of cluster randomised trials. Appl Stat 1998;47:95–113. Kerry SM, Bland JM. Sample size in cluster randomisation. BMJ 1998;316:549. 1300 8 9 INTERNATIONAL JOURNAL OF EPIDEMIOLOGY Hayes RJ, Bennett S. Simple sample size calculation for clusterrandomized trials. Int J Epidemiol 1999;28:319–26. Donner A. Sample size requirements for stratified cluster randomization designs. Stat Med 1992;11:743–50. 10 11 12 13 14 15 16 17 18 19 Eldridge S, Ashby D, Feder G, Rudnicka AR, Ukoumunne OC. Lessons for cluster randomised trials in the twenty-first century: a systematic review of trials in primary care. Clin Trials 2004;1:80–90. Lake S, Kammann E, Klar N, Betensky R. Sample size re-estimation in cluster randomization trials. Stat Med 2002;21:1337–50. Manatunga AK, Hudgens MG. Sample size estimation in cluster randomised studies with varying cluster size. Biom J 2001;43:75–86. Kang SH, Ahn CW, Jung SH. Sample size calculation for dichotomous outcomes in cluster randomization trials with varying cluster size. Drug Inf J 2003;37:109–14. Kerry SM, Bland JM. Unequal cluster sizes for trials in English and Welsh general practice: implications for sample size calculations. Stat Med 2001;20:377–90. Donner A, Birkett N, Buck C. Randomization by cluster. Sample size requirements and analysis. Am J Epidemiol 1981;114:906–14. Kerry SM, Bland JM. The intracluster correlation coefficient in cluster randomisation. BMJ 1998;316:1455. Pan W. Sample size and power calculations with correlated binary data. Control Clin Trials 2001;22:211–27. Feder G, Griffiths C, Eldridge S, Spence M. Effect of postal prompts to patients and general practitioners on the quality of primary care after a coronary event (POST): randomised controlled trial. BMJ 1999;318:1522–26. 
Feder G, Griffiths C, Highton C, Eldridge S, Spence M, Southgate L. Do clinical guidelines introduced with practice based education improve care of asthmatic and diabetic patients? A randomised controlled trial in general practices in east London. BMJ 1995;311:1473–78. 20 Griffiths C, Foster G, Barnes N et al. Specialist nurse intervention to reduce unscheduled asthma care in a deprived multiethnic area: the east London randomised controlled trial for high risk asthma (ELECTRA). BMJ 2004;328:144. 21 Thompson C, Kinmonth AL, Stevens L et al. Effects of a clinical-practice guideline and practice-based education on detection and outcome of depression in primary care: Hampshire Depression Project randomised controlled trial. Lancet 2000;355:185–91. 22 Woodcock AJ, Kinmonth AL, Campbell MJ, Griffin SJ, Spiegal NM. Diabetes care from diagnosis: effects of training in patient-centred care on beliefs, attitudes and behaviour of primary care professionals. Patient Educ Couns 1999;37:65–79. 23 Burns T, Kendrick T. Care of long-term mentally ill patients by British general practitioners. Psychiatr Serv 1997;48:1586–88. 24 Department of Health. Primary Care Trusts. 2002. Available at: http://www.dh.gov.uk/PolicyAndGuidance/OrganisationPolicy/ PrimaryCare/PrimaryCareTrusts/fs/en (Last Accessed May 2006). 25 Department of Health (2002). Average List Size of Unrestricted Principals and Equivalents by Partnership. [database] London: Department of Health General and Personal Medical Services Statistics. 26 Spiegelhalter D, Thomas A, Best N. WinBUGS Version 1.4. 2003 [software on the Internet] Available at: http://www.mrc-bsu.cam.ac. uk/bugs/ (Last Accessed May 2006) 27 Verma V, Le T. An analysis of sampling errors for the Demographic Health Surveys. Int Stat Rev 1996;64:265–94. 28 Mood AM, Graybill FA, Boes DC. Introduction to the Theory of Statistics. Kogakusha: 1974. 29 Evans BA, Feng Z, Peterson AV. 
A comparison of generalized linear mixed model procedures with estimating equations for variance and covariance parameter estimation in longitudinal studies and group randomized trials. Stat Med 2001;20:3353–73.