Heterogeneity
Chapters 15 and 16, Introduction to Meta-analysis
Borenstein, Hedges, Higgins & Rothstein
CAMARADES: Bringing evidence to translational medicine

Heterogeneity

Chapter 15: Overview
• The goal of a synthesis is not simply to compute a summary effect, but rather to make sense of the pattern of effects.
– If the effect size is consistent across studies we need to know that and to consider the implications; if it varies we need to know that and to consider the different implications.
• The problem: the observed variation in the estimated effect sizes is partly spurious, as it includes both true variation in effect sizes and random error.
• We use the following measures:
– Q statistic (a measure of weighted squared deviations)
– P value (the result of a statistical test based on Q)
– T² (the between-studies variance)
– T (the between-studies standard deviation)
– I² (the ratio of true heterogeneity to total observed variation)

Chapter 16: Identifying and Quantifying Heterogeneity
• Under the random effects model we allow that true effect sizes may vary from study to study. We discuss approaches to identify and then quantify this heterogeneity.
• When we discuss heterogeneity in effect sizes we mean the variation in true effect sizes.
• However, the variation that we actually observe is partly spurious, incorporating both (true) heterogeneity and random error.
• True effect size = the effect size in the underlying population, that is, the effect size we would observe if the study had an infinitely large sample size (and therefore no sampling error).

• Suppose all studies in an analysis share the same true effect size, so that true heterogeneity is zero.
We would not expect the observed effects to be identical to each other; rather, because of within-study error, we would expect each to fall within some range of the common effect.
• If the true effect sizes vary from one study to the next, the observed effects vary from one another for 2 reasons:
1. the real heterogeneity in effect size
2. the within-study error
• To quantify the heterogeneity we partition the observed variance into these 2 components and then focus on the real heterogeneity in effect sizes. How do we do this?

1. We compute the total amount of study-to-study variation actually observed.
2. We estimate how much the observed effects would be expected to vary from each other if the true effect was actually the same in all studies.
3. The excess variation (if any) is assumed to reflect real differences in effect sizes (that is, the heterogeneity).

Dispersion across studies relative to error within studies
• Observed effects are identical in A & B.
• CIs are relatively wide in A and relatively narrow in B.
• In A, all studies could share a common effect, with the observed dispersion falling within the umbrella of the CIs.
• The CIs for the B studies are quite narrow and cannot comfortably account for the observed dispersion.
• Similarly, the observed effects are identical in C & D, where C has wider CIs.
• In C, the effects can be fully explained by within-study error, while in D they cannot.

• From A to C both the within-study variance and the observed variance have been multiplied by 2. The same holds for B and D.
• The scale has increased but the ratio (observed/within) is unchanged.
• While the effects are more widely dispersed in the 2nd row than in the 1st, this is not relevant to the purpose of isolating the true dispersion.
• What matters is the ratio of observed to expected dispersion, which is the same in A and C (and the same in B and D).
• Q is a statistic that is sensitive to the ratio of the observed variation to the within-study error, rather than to their absolute values.

Computing Q
• The 1st step in partitioning the heterogeneity is to compute Q, defined as
$$Q = \sum_{i=1}^{k} W_i (Y_i - M)^2$$
where
– $W_i$ is the study weight ($1/V_i$)
– $Y_i$ is the study effect size
– $M$ is the summary effect
– $k$ is the number of studies
• In words, we compute the deviation of each effect size from the mean, square it, weight this by the inverse-variance for that study, and sum these values over all studies to yield the weighted sum of squares (WSS), or Q.

The expected value of Q based on within-study error
• Because Q is a standardised measure, the expected value does not depend on the metric of the effect size, but is simply the degrees of freedom,
$$df = k - 1$$
where k is the number of studies.

The excess variation
• Since Q is the observed WSS and df is the expected WSS (under the assumption that all studies share a common effect), the difference,
$$Q - df$$
reflects the excess variation, the part that will be attributed to differences in the true effects from study to study.
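The computation of Q, its expected value df, and the excess Q - df can be sketched in Python as follows. This is a minimal illustration rather than code from the slides: the function name `q_statistic` and the example numbers are hypothetical, and M is assumed to be the inverse-variance weighted mean of the observed effects.

```python
from typing import Sequence, Tuple


def q_statistic(effects: Sequence[float],
                variances: Sequence[float]) -> Tuple[float, int, float]:
    """Return (Q, df, Q - df) for a set of study effects.

    effects   -- observed effect sizes Y_i
    variances -- within-study variances V_i
    Assumption: M is the inverse-variance weighted mean of the effects.
    """
    weights = [1.0 / v for v in variances]            # W_i = 1 / V_i
    mean = sum(w * y for w, y in zip(weights, effects)) / sum(weights)  # M
    # Weighted sum of squared deviations from the summary effect
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1                             # expected Q if one common effect
    return q, df, q - df


# Hypothetical data: three studies with equal within-study variance.
q, df, excess = q_statistic([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])

# Doubling every effect quadruples both the observed and the within-study
# variance, so their ratio (and therefore Q) should be unchanged.
q_scaled, _, _ = q_statistic([0.2, 0.6, 1.0], [0.04, 0.04, 0.04])
```

With these made-up inputs Q is about 8 against df = 2, and rescaling the effects together with their variances leaves Q essentially unchanged, which is the ratio property described above.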
Ratio of observed to expected variation
A) Observed value of Q = 3; expected value = 5 (k-1). Observed variation is less than expected based on within-study error (Q is less than the degrees of freedom).
B) Observed variation is greater than we would expect based on within-study error (Q is greater than the degrees of freedom).
• Q reflects the total dispersion (WSS)
• Q - df reflects the excess dispersion

• Q = 3 in both A and C because these plots share the same ratio
• Q = 12 in both B and D because these plots share the same ratio
• (despite the fact that the absolute range of effects is higher in C and D)

How to derive Tau² and I² from Q and df
• To estimate what proportion of the observed variance reflects real differences among studies (rather than random error) we will start with Q, remove the dependence on the number of studies, and express the result as a ratio. This is I².
• We can use Q to estimate the variance (and standard deviation) of the true effects: start with Q, remove the dependence on the number of studies, and return to the original metric. These are Tau² and Tau.

• Researchers typically ask: is the heterogeneity statistically significant?
• We test the null hypothesis that all studies share a common effect size.
• Typically we set alpha at 0.10 or at 0.05, with a p-value less than alpha leading us to reject the null hypothesis and conclude that the studies do not share a common effect size.
• This test of significance is sensitive both to the magnitude of the effect (here, the excess dispersion) and to the precision with which this effect is estimated (here, based on the number of studies).

The impact of excess dispersion
Compare plots A vs B, which both have 6 studies.
As the excess dispersion increases (Q moves from 3.00 in A to 12.00 in B) the p-value moves from 0.70 to 0.035. The same pattern holds when we compare plots C vs D.

The impact of number of studies
Compare plots A vs C, identical except that A has 6 studies and C has 12, with the same estimated value of between-studies variation. With the additional precision the p-value moves away from zero, from 0.70 (for A) to 0.83 (for C).
Compare plots B vs D, identical except that B has 6 studies and D has 12, with the same estimated value of between-studies variation (Tau² = 0.037). With the added precision the p-value moves towards zero.

The p-value for the left-hand column moves towards 1.0 as we add studies, while the p-value for the right-hand column moves towards 0.0 as we add studies.
• At left, since Q is less than df, the additional evidence strengthens the case that the excess dispersion is zero, and moves the p-value towards 1.
• At right, since Q exceeds df, the additional evidence strengthens the case that the excess dispersion is not zero, and moves the p-value towards 0.

Q and its p-value
• While a significant p-value provides evidence that the true effects vary, the converse is not true.
• A nonsignificant p-value should not be taken as evidence that the effect sizes are consistent, since the lack of significance could be due to low power.
• The Q statistic and p-value address only the test of significance and should never be used as surrogates for the amount of true variance.
• A nonsignificant p-value could reflect a trivial amount of observed dispersion, but could also reflect a substantial amount of observed dispersion with imprecise studies.
• Similarly, a significant p-value could reflect a substantial amount of observed dispersion, but could also reflect a minor amount of observed dispersion with precise studies.
• The purpose of the test is to assess the viability of the null hypothesis, and not to estimate the magnitude of the true dispersion.

Estimating Tau²
• Tau² is defined as the variance of the true effect sizes.
• In other words, if we had an infinitely large sample of studies, each itself infinitely large (so that the estimate in each study was the true effect), and computed the variance of these effects, this variance would be Tau².
• Since we cannot observe the true effects we cannot compute this variance directly, but estimate it from the observed effects, with the estimate denoted T².
• To yield this estimate we start with the difference (Q - df), which represents the dispersion in true effects on a standardised scale. We divide by a quantity (C) which has the effect of putting the measure back into its original metric and also making it an average, rather than a sum, of squared deviations:
$$T^2 = \frac{Q - df}{C}$$

• This means that T² is in the same metric (squared) as the effect itself, and also reflects the absolute amount of variation on that scale.
• While the actual variance of the true effects can never be less than zero, our estimate of this value, T², can be less than zero if, because of sampling error, the observed variance is less than we would expect based on within-study error (in other words, if Q < df). In this case, T² is simply set to zero.
• If Q > df then T² will be positive, and it will be based on 2 factors. The first is the amount of excess variation (Q - df), and the second is the metric of the effect size index.

• The impact of the excess variation on our estimate of T² is evident if we compare A vs B.
– The within-study error is smaller in B. Therefore, while the observed variation is the same in both plots, a higher proportion of this variation is assumed to be real in B. As we move from A to B, Q moves from 12 to 48.01 and T² from 0.037 to 0.057.
• The impact of the scale on our estimate of T² is evident if we compare C vs D.
– Q and df are the same in the 2 plots, which means that the same proportion of the observed variance will be attributed to between-studies variance. However, the absolute amount of the variance is larger in D, so this proportion translates into a larger estimate of Tau². As we move from C to D, T² moves from 0.037 to 0.096.

T²
• T² (our estimate for the variance of the true effects) is used to assign weights under the random effects model, where the weight assigned to each study is
$$W_i^* = \frac{1}{V_{Y_i}^*}, \qquad V_{Y_i}^* = V_{Y_i} + T^2$$
• In words, the total variance for a study ($V_{Y_i}^*$) is the sum of the within-study variance ($V_{Y_i}$) and the between-studies variance (T²).
• This method of estimating the variance between studies is known as the method of moments or the DerSimonian and Laird method.

Tau
• Tau² refers to the actual variance and T² is our estimate of this parameter.
• Now we turn to the standard deviation of the true effect sizes.
• Here, Tau refers to the actual standard deviation and T is our estimate of this parameter.
• T, the estimate of the standard deviation, is simply the square root of T².

The expected distribution of true effects, based on T
For example, in plot A the summary effect is 0.41 and T is 0.193. We expect that some 95% of the true effects will fall in the range of 0.41 plus or minus 1.96 T, or 0.04 to 0.79, and this is reflected in the bell curve.
Plots A and B have the same observed variance, but differ in the proportion of this variance that is attributed to real differences in effect size.
In A, the bell curve is relatively narrow and captures only a fraction of the observed dispersion; the rest is assumed to reflect error.
In B, the bell curve is relatively wide and captures a larger fraction of the dispersion, since most of the dispersion is here assumed to be real.
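The method-of-moments estimate of T², the random-effects weights built from it, and the expected range of true effects based on T can be sketched in Python as follows. This is an illustrative sketch with hypothetical function names and data. The slides do not define the scaling quantity C, so the code assumes the usual DerSimonian and Laird form, C = ΣW - ΣW²/ΣW.

```python
import math
from typing import List, Sequence, Tuple


def dersimonian_laird_t2(effects: Sequence[float],
                         variances: Sequence[float]) -> float:
    """Method-of-moments estimate T^2 = (Q - df) / C, truncated at zero."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # Assumed form of C (not stated in the slides).
    c = sum_w - sum(w * w for w in weights) / sum_w
    return max(0.0, (q - df) / c)      # set to zero when Q < df


def random_effects_weights(variances: Sequence[float], t2: float) -> List[float]:
    """W*_i = 1 / (V_i + T^2): within-study plus between-studies variance."""
    return [1.0 / (v + t2) for v in variances]


def expected_range(summary: float, t: float, z: float = 1.96) -> Tuple[float, float]:
    """Approximate range holding about 95% of true effects: summary +/- z * T."""
    return summary - z * t, summary + z * t


# Hypothetical data: three studies with equal within-study variance.
t2 = dersimonian_laird_t2([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
t = math.sqrt(t2)                      # T is the square root of T^2
weights_star = random_effects_weights([0.01, 0.01, 0.01], t2)

# Plot A values quoted in the slides (rounded): summary 0.41, T 0.193.
low, high = expected_range(0.41, 0.193)
```

For the made-up data T² comes out at about 0.03. With the rounded plot A values the range is roughly 0.03 to 0.79; the lower limit of 0.04 quoted in the slides presumably reflects unrounded inputs.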
The expected distribution of true effects, based on T
Similarly, in plots C vs D the ratio of true to observed variance is the same, but the observed dispersion is larger in D. The bell curve is wider in D than in C, but in both cases a comparable proportion of the effects fall within the range of the curve (because the ratio is the same).

T
• T enables us to talk about the substantive importance of the dispersion.
• Consider an intervention with a summary effect size of 0.50.
– If T is 0.10, then most of the true effects (95%) fall in the approximate range of 0.30 to 0.70.
– If T is 0.20, then most of the true effects fall in the approximate range of 0.10 to 0.90.
– If T is 0.30, then most of the true effects fall in the approximate range of -0.10 to +1.10.

I²
• What proportion of the observed variance reflects real differences in effect size?
• The statistic I² reflects this proportion:
$$I^2 = \left(\frac{Q - df}{Q}\right) \times 100\%$$
• That is, the ratio of excess dispersion to total dispersion. The statistic I² can be viewed as a statistic of the form
$$I^2 = \left(\frac{\tau^2}{\tau^2 + V_Y}\right) \times 100\%$$
• That is, the ratio of true heterogeneity (Tau², written τ² here) to total variance across the observed effect estimates. However, this is not a true definition of I² because in reality there is not a single $V_Y$, since the within-study variances vary from study to study.
• The I² statistic is a descriptive statistic and not an estimate of any underlying quantity.

Impact of excess dispersion on I²
• For any df, I² moves in tandem with Q. As such, it is driven entirely by the ratio of observed dispersion to within-study dispersion.
• In the top row, both plots A and B have a Q value of 12.00 with 5 degrees of freedom. Therefore both have an I² of 58.34%.
• Similarly in plots C and D. The wider scale does not impact I².
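The I² computation above can be sketched in a few lines of Python. The function name `i_squared` is hypothetical, and the truncation at zero when Q is less than df is an assumption that mirrors the treatment of T².

```python
def i_squared(q: float, df: int) -> float:
    """I^2 as a percentage: excess dispersion (Q - df) over total dispersion Q."""
    if q <= 0:
        return 0.0                     # no observed dispersion at all
    # Assumption: negative values (Q < df) are set to zero, as for T^2.
    return max(0.0, (q - df) / q) * 100.0


# Values quoted in the slides: Q = 12.00 with 5 degrees of freedom.
i2_high = i_squared(12.0, 5)

# Q = 3 with df = 5: less dispersion than expected from within-study error.
i2_low = i_squared(3.0, 5)
```

With Q entered as exactly 12.00 the result is about 58.33%; the 58.34% quoted in the slides presumably comes from an unrounded Q. Because only Q and df enter the formula, rescaling the effect sizes cannot change I².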
Impact of excess dispersion on I²
I² reflects the extent of overlap of CIs, which is not dependent on the actual location or spread of the true effects. As such it is convenient to view I² as a measure of inconsistency across the findings of the studies, and not as a measure of the real variation across the underlying true effects.
The scale of I² has a range of 0-100%, regardless of the scale used for the meta-analysis itself. It can be interpreted as a ratio, and has the additional advantage of being analogous to indices used in psychometrics (where reliability is the ratio of true to total variance) or regression (where R² is the proportion of the total variance that can be explained by covariates).
Importantly, I² is not directly affected by the number of studies in the analysis.

Concluding remarks: I²
• I² allows us to discuss the amount of variance on a relative scale.
• We can use I² to determine what proportion of the observed variance is real.
• If I² is near zero, then almost all of the observed variance is spurious, which means there is nothing to explain.
• If I² is large, then it would make sense to speculate about reasons for the variance, and possibly to apply techniques such as subgroup analysis or meta-regression to try to explain it.
• Benchmarks for I²:
– Low: 25%
– Moderate: 50%
– High: 75%
• This indicates what proportion of the observed variation is real but does not address the dispersion.

Comparing the measures of heterogeneity
• The Q statistic and its p-value serve as a test of significance. Useful because they depend on the number of studies and are not sensitive to the metric of the effect size index.
• T² serves as the between-studies variance in the analysis, and T serves as our estimate of the standard deviation of the true effects.
Useful because they are sensitive to the metric of the effect size and not sensitive to the number of studies.
• I² is the ratio of true heterogeneity to total variation in observed effects, a kind of signal-to-noise ratio. Useful because it is not sensitive to the metric of the effect size and not sensitive to the number of studies.

• T² and T reflect the amount of true heterogeneity (the variance or the standard deviation).
• I² reflects the proportion of observed dispersion that is due to this heterogeneity.

I² reflects only the proportion of variance that is true, and says nothing about the absolute value of this variance. Above, I² is the same in both plots, but in A the true effects are clustered in a small range (T² = 0.006) while in B they are dispersed across a wider range (T² = 0.037).
T² reflects only the absolute value of the true variance and says nothing about the proportion of observed variance that is true. Above, T² is the same in both plots, but in A it is a large part (I² = 58.34%) of a small observed dispersion whereas in B it is a small part (I² = 16.01%) of a large observed dispersion.

Please note
• T² is tied to the effect size index while I² is not. For example, T² for a synthesis of risk ratios will be in the metric of log risk ratios, while T² for a synthesis of standardised mean differences will be in the metric of standardised mean differences. It would not be meaningful to compare the T² values for 2 syntheses unless they were in the same metric.
• By contrast, I² is on a ratio scale of 0% to 100% and it is possible to compare this value across different syntheses.

• The Q statistic and its p-value address only the viability of the null hypothesis and not the amount of excess dispersion.
• Q is sensitive to relative variance (the kind tracked by I²) and not absolute variance (the kind tracked by T² and T).
An informative presentation of heterogeneity indices requires both a measure of the magnitude and a measure of uncertainty.
Magnitude may be represented by the degree of true variation on the scale of the effect measure (T²) or the degree of inconsistency (I²) or both.
Uncertainty over whether apparent heterogeneity is genuine may be expressed using the p-value for Q or using confidence intervals for T² or I². Note that uncertainty around T² or I² is often very large.
If the studies themselves have poor precision (wide CIs), this could mask the presence of real heterogeneity, resulting in an estimate of zero for T² and I². Therefore, it would be a mistake to interpret a T² or I² of zero as meaning that the effect sizes are consistent unless this is justified by CIs for T² and I² that exclude large values.

Confidence intervals for T² and I²
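Of the uncertainty measures named above, the p-value for Q is the simplest to compute: under the null hypothesis of a common effect, Q is referred to a chi-squared distribution with k-1 degrees of freedom. The sketch below is an illustration in standard-library Python, not code from the slides; the helper name `chi2_sf` is hypothetical, and it handles integer degrees of freedom only.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for chi-squared with integer df >= 1.

    Uses the closed-form series for the regularised upper incomplete gamma
    function at half-integer arguments.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term = math.exp(-h)
        total = term
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


# Values quoted in the slides for 6 studies (df = 5).
p_low_dispersion = chi2_sf(3.0, 5)     # Q = 3.00
p_high_dispersion = chi2_sf(12.0, 5)   # Q = 12.00
```

These two calls give approximately 0.70 and 0.035, matching the p-values quoted for Q = 3.00 and Q = 12.00 with 6 studies. In practice a statistics library would normally be used for the chi-squared tail probability; the hand-written series is here only to keep the example free of dependencies.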