CHAPTER 21

Notes on Subgroup Analyses and Meta-Regression

Introduction
Computational model
Multiple comparisons
Software
Analyses of subgroups and regression analyses are observational
Statistical power for subgroup analyses and meta-regression

INTRODUCTION

In this chapter we address a number of issues that are relevant both to subgroup analyses (analogous to analysis of variance) and to meta-regression.

COMPUTATIONAL MODEL

The researcher must always choose between a fixed-effect model and a random-effects model. When we are working with a single set of studies, the fixed-effect analysis assumes that all studies share a common effect size. When we are working with subgroups, it assumes that all studies within a subgroup share a common effect size. When we are working with meta-regression, it assumes that all studies which have the same values on the covariates share a common effect size.

These kinds of assumptions can sometimes be justified, as in the pharmaceutical example that we used on pages 83, 161 and 195. In most cases, however, especially when the studies for the review have been culled from the literature, it is more plausible to assume that the subgroup membership or the covariates explain some, but not all, of the dispersion in effect sizes. Therefore, the random-effects model is more likely to fit the data, and is the model that should be selected.

Mistakes to avoid in selecting a model

When we introduced the idea of a random-effects model in Chapter 12 we noted that researchers sometimes start with a fixed-effect model and then move to a random-effects model if there is empirical evidence of heterogeneity (a statistically significant p-value). In the case of subgroup analysis this approach would suggest that we start by using the fixed-effect model within groups, and then move to the random-effects model only if Q within groups was statistically significant. In the case of meta-regression it would suggest that we start by using the fixed-effect model, and then move to the random-effects model only if the Q for the residual error was statistically significant.

We explained that this approach was problematic when working with a single set of studies, and it continues to be a bad idea here, for the same reasons. If substantive considerations suggest that the effect size is likely to vary (within the full set of studies, within subgroups, or for studies with a common set of covariate values), then we should use the corresponding model even if the test for heterogeneity fails to yield a significant p-value. A lack of significance means only that we have failed to meet a certain threshold of proof (possibly because of low statistical power); it does not prove that the studies share a common effect size.

Practical differences related to the model

Researchers often ask about the practical implications of using a random-effects model rather than a fixed-effect model. The random-effects model will apportion the study weights more evenly, so that a large study has less impact on the summary effect (or regression line) than it would have under the fixed-effect model, and a small study has more impact than it would have under the fixed-effect model. Also, confidence intervals will tend to be wider under the random-effects model than under the fixed-effect model.
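To make the weighting difference concrete, the sketch below (in Python) computes each study's relative weight and the summary effect under the two models. The effect sizes, within-study variances, and the value of τ² are hypothetical values chosen for illustration, and τ² is simply assumed rather than estimated; this is a minimal sketch, not a full meta-analysis routine.

```python
# A minimal sketch of how fixed-effect and random-effects analyses apportion
# weight. All numbers are hypothetical and chosen only for illustration.

effects = [0.40, 0.30, 0.55, 0.20]        # observed effect sizes
variances = [0.010, 0.090, 0.050, 0.120]  # within-study variances (study 1 is the largest study)
tau2 = 0.04                               # assumed between-studies variance

def summary(effects, variances, tau2=0.0):
    # Fixed-effect weights are 1/v_i; random-effects weights are 1/(v_i + tau2).
    w = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    se = (1.0 / sum(w)) ** 0.5            # standard error of the summary effect
    rel = [wi / sum(w) for wi in w]       # each study's share of the total weight
    return mean, se, rel

for label, t2 in [("fixed-effect", 0.0), ("random-effects", tau2)]:
    mean, se, rel = summary(effects, variances, t2)
    shares = ", ".join(f"{r:.1%}" for r in rel)
    print(f"{label}: summary = {mean:.3f}, SE = {se:.3f}, weights = {shares}")
```

Because the same τ² is added to every study's variance, the weights become more similar to one another: the largest study's share shrinks, the smallest study's share grows, and the standard error of the summary effect (and hence the width of the confidence interval) increases.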
While this tells us what the impact will be of using the fixed-effect or random-effects model, it says nothing about which model we should use. The only issue relevant to that decision is the question of which model fits the data.

The null hypothesis under the different models

Since the meaning of a summary effect size is different for fixed versus random effects, the null hypothesis being tested also differs under the two models.

Recall that when we are working with a single group, under the fixed-effect model the summary effect size represents the common effect size for all the studies. The null hypothesis is that the common effect size is equal to a nil value (0.00 for a difference, or 1.00 for a ratio). By contrast, under the random-effects model the summary effect size represents the mean of the distribution of effect sizes. The null hypothesis is that the mean of all possible studies is equal to the nil value.

In subgroup analyses, under the fixed-effect model the summary effect sizes for subgroups A and B each represent the common effect size for that group of studies. The null hypothesis is that the common effect for the A studies is the same as the common effect for the B studies. By contrast, under the random-effects model the effect sizes for subgroups A and B each represent the mean of a distribution of effect sizes. The null hypothesis is that the mean of all possible A studies is identical to the mean of all possible B studies.

In regression, under the fixed-effect model we assume that there is one true effect size for any given value of the covariate(s). The null hypothesis is that this effect size is the same for all values of the covariate(s). By contrast, under the random-effects model we assume that there is a distribution of effect sizes for any given value of the covariate(s). The null hypothesis is that the mean of this distribution is the same for all values of the covariate(s).

While the distinction between a common effect size and a mean effect size might sound like a semantic nuance, it actually reflects an important distinction between the models. In the case of the fixed-effect model, because we assume that we are dealing with a common effect size, we apply an error model which assumes that the between-studies variance is zero. In the case of the random-effects model, because we allow that the effect sizes may vary, we apply an error model which makes allowance for this additional source of uncertainty. This difference has an impact on the mean (the summary effect for a single group, the summary effect within subgroups, and the slope in a meta-regression). It also has an impact on the standard error, the tests of significance, and the confidence intervals.

Some technical considerations in random-effects meta-regression

As is the case for a standard random-effects meta-analysis in the absence of covariates, several methods are available for estimating τ² in meta-regression, including a method-of-moments estimator and a maximum likelihood estimator. In practice, any differences among methods will usually be small. The results we presented in this chapter used a moment estimate, which is the same as the method we used in Chapter 16 to estimate τ² for a single group. If we were to perform a meta-regression with no covariates, our estimate of τ² would be the same as the estimate we would obtain using the formulas in that chapter.
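For reference, the moment estimate for a single group of studies (often referred to as the DerSimonian and Laird estimator) can be computed in a few lines as T² = (Q − df)/C, truncated at zero. The sketch below is an illustration of that formula with hypothetical effect sizes and variances; it is not code from this book's accompanying software.

```python
# Minimal sketch of the moment (method-of-moments) estimate of tau^2 for a
# single group of studies, using hypothetical data. Q is computed from the
# fixed-effect weights, and the estimate is truncated at zero when Q < df.

effects = [0.40, 0.30, 0.55, 0.20, 0.65]
variances = [0.010, 0.090, 0.050, 0.120, 0.060]

w = [1.0 / v for v in variances]                                 # fixed-effect weights
mean_fe = sum(wi * y for wi, y in zip(w, effects)) / sum(w)      # fixed-effect summary

q = sum(wi * (y - mean_fe) ** 2 for wi, y in zip(w, effects))    # Q statistic
df = len(effects) - 1
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)                   # scaling factor C

tau2 = max(0.0, (q - df) / c)                                    # moment estimate of tau^2
print(f"Q = {q:.3f}, df = {df}, tau^2 = {tau2:.4f}")
```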
Whichever method is used to estimate τ², the use of a Z-test to assess the statistical significance of a covariate (or of the difference between two subgroups), while common, is not strictly appropriate. When dealing with simple numerical data, to compute a confidence interval or to test the significance of a difference we use Z if the variance is known. By contrast, we use t if the variance is being estimated from the dispersion observed in the sample (as it is, for example, when we compare means using a t-test). Similarly, in meta-analysis, the Z-distribution is appropriate only for the fixed-effect model, where the only source of error is within studies. By contrast, when we use a random-effects model we are estimating the dispersion across studies, and should account for this by using a t-distribution.

Several methods have been proposed to address this issue, including one by Knapp and Hartung (2003), which is outlined below. While these can be applied to any use of the random-effects model (for a single group of studies, for a subgroup analysis, and for meta-regression), they have to date been implemented in software only for meta-regression.

The Knapp-Hartung method involves two modifications to the standard random-effects analysis. First, the variance (and hence the standard error) of the estimate is multiplied by a correction factor computed from the dispersion of the observed effects about the regression line. Second, the test statistic is compared against the t-distribution rather than the Z-distribution. This has the effect of widening the confidence intervals and of moving the p-value away from zero.

Higgins and Thompson (2004) proposed an approach that bypasses these sampling distributions altogether and instead employs a permutation test to yield a p-value. Using this approach we first compute the Z-score corresponding to the observed covariate. Then we repeatedly redistribute the covariate values at random among the studies and note what proportion of these permutations yield a Z-score as extreme as the one actually obtained. This proportion may be used as the p-value.
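A minimal sketch of the permutation idea follows. For simplicity it uses fixed-effect (inverse-variance) weights in a weighted least-squares slope and a two-sided comparison; a fuller implementation would recompute the random-effects (or Knapp-Hartung) statistic on each permutation, as Higgins and Thompson describe. All of the data below are hypothetical.

```python
import random

# Minimal sketch of a permutation test for one covariate in meta-regression.
# Fixed-effect weights are used for simplicity; the data are hypothetical.

effects = [0.10, 0.25, 0.30, 0.45, 0.50, 0.65]    # observed effect sizes
variances = [0.04, 0.03, 0.05, 0.04, 0.06, 0.05]  # within-study variances
dose = [10, 10, 20, 20, 30, 30]                   # study-level covariate

def z_for_slope(x, y, v):
    # Weighted least-squares slope and its Z-score, with weights 1/v_i.
    w = [1.0 / vi for vi in v]
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    se = (1.0 / sxx) ** 0.5                       # SE of the slope with known variances
    return slope / se

z_obs = z_for_slope(dose, effects, variances)

random.seed(0)
n_perm = 5000
exceed = 0
for _ in range(n_perm):
    shuffled = random.sample(dose, k=len(dose))   # redistribute covariate values among studies
    if abs(z_for_slope(shuffled, effects, variances)) >= abs(z_obs):
        exceed += 1

print(f"observed Z = {z_obs:.2f}, permutation p = {exceed / n_perm:.4f}")
```

Note that with very few studies the number of distinct permutations, and hence the number of attainable p-values, is limited.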
MULTIPLE COMPARISONS

In primary studies researchers often need to address the issue of multiple comparisons. The basic problem is that if we conduct a series of analyses with alpha set at 0.05 for each, then the overall likelihood of a type I error (assuming that the null is actually true) will exceed 5%. This problem crops up when a study includes more than two groups and we compare more than one pair of means. It also arises when we perform an analysis on more than one outcome.

While there is consensus that conducting many comparisons can pose a problem, there is no consensus about how this problem should be handled. Some suggest conducting an omnibus test that asks whether there are any nonzero effects, and proceeding to look at pairwise comparisons only if the initial test meets the criterion for significance. Others suggest going straight to the pairwise tests but using a stricter criterion for significance (say 0.01 rather than 0.05 when five tests are performed). Hedges and Olkin (1985) discuss these and other methods to control the error rate when using multiple tests. Still others suggest that the researcher make no formal adjustment but evaluate the results in context; for example, one significant p-value in forty tests would not be seen as grounds for rejecting the null hypothesis.

Essentially the same issue exists in meta-analysis. In the case of subgroup analyses, if a meta-analysis includes a number of subgroups, the issue of multiple comparisons arises when we start to compare several pairs of subgroup means. In the case of meta-regression the issue arises when we include a number of covariates and want to test each one for significance. As with primary studies, while there is consensus that conducting many comparisons can pose a problem, there is no consensus about how this problem should be handled. The approaches generally used for primary studies can be applied to meta-analysis as well; the sketch below shows the simplest of them.
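As one concrete example of the stricter-criterion strategy, the sketch below applies a Bonferroni-style adjustment to a set of hypothetical p-values from pairwise subgroup comparisons, judging each against alpha divided by the number of tests (0.05/5 = 0.01, as in the example above). The subgroup labels and p-values are invented for illustration.

```python
# Minimal sketch of a Bonferroni-style adjustment for several subgroup
# comparisons in a meta-analysis. The p-values are hypothetical.

p_values = {"A vs B": 0.012, "A vs C": 0.200, "B vs C": 0.048,
            "A vs D": 0.003, "C vs D": 0.090}
alpha = 0.05
threshold = alpha / len(p_values)        # stricter per-test criterion: 0.05 / 5 = 0.01

for comparison, p in sorted(p_values.items(), key=lambda item: item[1]):
    verdict = "significant" if p <= threshold else "not significant"
    print(f"{comparison}: p = {p:.3f} (threshold {threshold:.3f}) -> {verdict}")
```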
SOFTWARE

Some of the programs developed for meta-analysis are able to perform subgroup analysis as well as meta-regression (see Chapter 44). Note that programs intended for the statistical analysis of primary studies should not be used to perform these procedures in meta-analysis, for two reasons. First, routines for analysis of variance or multiple regression intended for primary studies do not weight the studies as is needed for meta-analysis. While most programs do allow the user to assign weights, this becomes a difficult procedure when we move to random-effects weights (which are usually the ones we want to use). Second, the rules for assigning degrees of freedom in the analysis of variance and in meta-regression are different for meta-analysis than for primary studies, and so using the primary-study routines for a meta-analysis will yield incorrect standard errors and p-values.

ANALYSES OF SUBGROUPS AND REGRESSION ANALYSES ARE OBSERVATIONAL

In a randomized trial, participants are assigned at random to a condition (such as treatment versus placebo). Because the participants are assumed to be similar in all respects except for the treatment condition, differences that do emerge between conditions can be attributed to the treatment. By contrast, in an observational study we compare pre-existing groups, such as workers with a college education versus those who did not attend college. While we can report on differences in the wages of the two groups, we cannot attribute this outcome to the amount of schooling, because the groups differ in various ways. For example, we are likely to find that those with a college education are paid more, but we cannot attribute this to their schooling, since it could be due (at least in part) to other factors associated with higher socioeconomic status.

The issue of randomized versus observational studies as it relates to meta-analysis is discussed in Chapter 40. There, we discuss the fact that randomized studies and observational studies address different questions, and that for this reason it generally makes sense to include only one or the other in a given meta-analysis. However, one issue is directly relevant to the present discussion, as follows.

Assume we start with a set of randomized experiments that assess the impact of an intervention. The effect in any single experiment can serve to establish causality, and the summary effect can also serve to establish causality. This is because the relationship between treatment and outcome is protected by the randomization process (it must be due to treatment) in each study, and this protection carries over to the summary effect.

However, even if the individual studies are randomized trials, once we move beyond the goal of reporting a summary effect and proceed to perform a subgroup analysis or meta-regression, we have moved out of the domain of randomized experiments and into the domain of observational studies. For example, suppose that half the studies used a low dose of aspirin while half used a high dose, and that the impact of the treatment was significantly stronger in the high-dose studies. It is possible that the difference is due to the dose, but it is also possible that the studies that used a higher dose differed in some systematic way from the other studies. Perhaps these studies used patients who were in poorer health, or older, and therefore more likely to benefit from the treatment. Therefore, the difference between subgroups, or the relationship between a covariate and effect size, is observational. The same caveats that apply to any observational study, in particular the fact that an observed relationship does not imply causality, apply here too.

That said, in primary observational studies researchers sometimes use regression analysis to try to remove the impact of potential confounders. In the aspirin example they might enter covariates in the sequence of health, age, and dose, to assess the impact of dose with health and age held constant. This is not a perfect solution, since there may be other confounders of which we are not aware, but the approach can help to isolate the impact of specific factors and generate hypotheses to be tested in randomized trials. The same holds true for meta-regression (a minimal sketch of such an adjustment appears below). Of course, since covariate values are assigned at the study level, meta-regression can be used to adjust for potential confounders only for comparisons across studies, and not for potential confounders within studies.

There is one exception to the rule that subgroup analysis and regression cannot prove causality. This exception is the case where we know that the studies are identical in all respects except for the one captured by subgroup membership or by the covariate. The pharmaceutical example is a case in point. There, we enrolled 1000 patients and assigned some to studies that would test a low dose of the drug against placebo, and others to studies that would test a high dose of the drug against placebo, so the assignment to subgroups was itself random. The same would apply if the patients were assigned to ten studies in which the dose of the drug was varied on a continuous scale, and we used meta-regression to test the relationship between dose and effect size. This set of circumstances is rarely (if ever) found in practice.
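The sketch below makes the adjustment idea concrete. It fits a fixed-effect (inverse-variance weighted) meta-regression of effect size on mean age and dose, so that the dose coefficient estimates the dose-effect relationship with mean age held constant across studies. The data are entirely hypothetical and only loosely patterned on the aspirin example; a random-effects version would add an estimate of τ² to each study's variance before weighting, and the same observational caveats apply to the result.

```python
import numpy as np

# Minimal sketch: adjusting for a study-level confounder in a meta-regression.
# Fixed-effect (inverse-variance) weights are used for simplicity; all data
# are hypothetical and serve only to illustrate the mechanics.

y = np.array([0.20, 0.25, 0.35, 0.40, 0.55, 0.60])   # effect size per study
v = np.array([0.02, 0.03, 0.02, 0.04, 0.03, 0.02])   # within-study variances
age = np.array([55, 58, 63, 65, 70, 72])              # mean age per study (potential confounder)
dose = np.array([75, 75, 150, 150, 300, 300])         # dose per study (covariate of interest)

X = np.column_stack([np.ones_like(y), age, dose])     # intercept, age, dose
W = np.diag(1.0 / v)                                   # inverse-variance weights

XtWX = X.T @ W @ X
beta = np.linalg.solve(XtWX, X.T @ W @ y)              # weighted least-squares coefficients
cov = np.linalg.inv(XtWX)                              # covariance matrix of the coefficients
se = np.sqrt(np.diag(cov))

for name, b, s in zip(["intercept", "age", "dose"], beta, se):
    print(f"{name:9s}  b = {b: .4f}  se = {s:.4f}  z = {b / s: .2f}")

# The 'dose' coefficient is the dose-effect relationship with mean age held
# constant across studies -- an adjustment, not proof of causality.
```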
STATISTICAL POWER FOR SUBGROUP ANALYSES AND META-REGRESSION

Statistical power is the likelihood that a test of significance will reject the null hypothesis. In the case of subgroup analyses it is the likelihood that the Z-test to compare the effects in two groups, or the Q-test to compare the effects across a series of groups, will yield a statistically significant p-value. In the case of meta-regression it is the likelihood that the Z-test of a single covariate, or the Q-test of a set of covariates, will yield a statistically significant p-value.

Power depends on the size of the effect and the precision with which we measure the effect. For subgroup analysis this means that power will increase as the difference between (or among) the subgroup means increases, and/or as the standard error within subgroups decreases. For meta-regression it means that power will increase as the magnitude of the relationship between the covariate and effect size increases, and/or as the precision of the estimate increases. In both cases, a key factor driving the precision of the estimate will be the total number of individual subjects across all studies and (for random effects) the total number of studies.

While there is a general perception that power for testing the main effect is consistently high in meta-analysis, this perception is not correct (see Chapter 29), and it certainly does not extend to tests of subgroup differences or to meta-regression. The failure to find a statistically significant p-value when comparing subgroups or in meta-regression could mean that the effect (if any) is quite small, but it could also mean that the analysis had poor power to detect even a large effect. One should never use a nonsignificant finding to conclude that the true means in subgroups are the same, or that a covariate is not related to effect size.

SUMMARY POINTS

The selection of a computational model (fixed-effect or random-effects) should be based on our understanding of the underlying distribution of effects. In most cases, especially when the studies have been gathered from the published literature, the random-effects model (within subgroups) is more plausible than the fixed-effect model. The strategy of starting with the fixed-effect model and then moving to the random-effects (or mixed-effects) model if the test for heterogeneity is significant is a mistake, and should be strongly discouraged.

The problem of performing multiple tests (the concern that the actual alpha may exceed the nominal alpha) is essentially the same in meta-analysis as in primary studies, and similar strategies are suggested for dealing with it.

The relationship between effect size and subgroup membership, or between effect size and covariates, is observational, and cannot be used to prove causality. This holds true even if all studies in the analysis are randomized trials. The protection afforded by the study design carries over to the summary effect across all studies, but not to these other analyses.

Statistical power for detecting a difference among subgroups, or for detecting a relationship between a covariate and effect size, is often low, and the usual caveats apply. To wit, failure to obtain a statistically significant difference among subgroups should never be interpreted as evidence that the effect is the same across subgroups. Similarly, failure to obtain a statistically significant effect for a covariate should never be interpreted as evidence that there is no relationship between the covariate and the effect size.

Further Reading

Cohen, J., West, S.G., Cohen, P., & Aiken, L. (2002). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Higgins, J.P.T., & Thompson, S.G. (2004). Controlling the risk of spurious findings from meta-regression. Statistics in Medicine, 23, 1663–1682.
Knapp, G., & Hartung, J. (2003). Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine, 22, 2693–2710.