Human Reproduction, Vol.29, No.8 pp. 1622–1626, 2014
Advanced Access publication on June 4, 2014 doi:10.1093/humrep/deu127
INVITED COMMENTARY
The good, the bad and the ugly:
meta-analyses
Madelon van Wely*
Center for Reproductive Medicine, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
*Correspondence address. E-mail: [email protected]
Submitted on April 24, 2014; resubmitted on April 24, 2014; accepted on May 1, 2014
There seems to be a growing negativity toward meta-analyses. Two years
ago systematic reviews and meta-analyses, and more specifically
Cochrane reviews, were critiqued (Humaidan et al., 2012; Humaidan
and Polyzos, 2012). In the present issue of Human Reproduction
another example of negative publicity toward meta-analyses is published
in the form of an Opinion paper (Simón and Bellver, 2014). The authors
use the meta-analyses that have been published on the value of endometrial scratching in IVF as an example. Authors who attack meta-analyses in
essence argue that meta-analyses should be faultless, given that they
are considered to be at the top of the evidence pyramid. But, as the
critics rightly point out, the studies included in meta-analyses are often
not without biases. What is going on? Are meta-analyses not as useful
as we thought they would be? Are the included studies not good enough?
In this Editorial, we will discuss the use and pitfalls of aggregate
meta-analyses. Furthermore, the main points raised by the authors of
the Opinion paper (Simón and Bellver, 2014) will be addressed. What
should be done if only small trials of possibly low quality are available?
When should we perform meta-analysis? Should we accept a combination of observational and randomized studies in meta-analyses? What
can we do when multiple meta-analyses are published within a short
time-period?
A Short History
Meta-analysis has developed over time as a way to deal quantitatively with
varying study results (O’Rourke, 2007). Questions on the possibility of
summarized results from different studies were tackled in the 18th and
19th century by astronomers and mathematicians such as Gauss and
Laplace (Laplace, 1820).
In 1904 the British statistician Karl Pearson was the first to apply
methods to combine observations from different clinical studies. He
was asked to analyze data comparing infection and mortality among soldiers who had volunteered for inoculation against typhoid fever in various
places across the British Empire with that of other soldiers who had not
volunteered (Pearson, 1904). During the same period the British statistician Ronald Fisher and colleagues worked on the appropriate analysis of
multiple studies in agriculture (Fisher, 1935). Their methods formed a
basis for the meta-analytical methods as we know them. In medicine,
the first publication on the aggregation of findings from different
studies was in 1955 by Beecher who combined studies that compared
a placebo with a treatment (Beecher, 1955).
It was in 1976 that Gene Glass used the term ‘meta-analysis’ to refer to
‘the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings’ (Glass, 1976;
O’Rourke, 2007). Subsequently, many statisticians worked on further
improvements of the statistical methods behind the process of
meta-analyzing results.
What Is Meta-Analysis?
Conceptually, a meta-analysis uses a statistical approach to combine the
results from multiple studies. In practice the analysis has to be preceded
by a systematic review that starts off with a clearly formulated clinical
question. A systematic review means that the available literature is evaluated in a systematic way such that it is reproducible for others to prevent
author-induced selective bias in the inclusion of studies. After selecting
and describing studies, meta-analysis can be used to summarize the predefined outcome.
The major advantage of meta-analysis is that accumulation of evidence
can improve the precision and accuracy of effect estimates and increase
the statistical power to detect an effect. A further advantage of
meta-analysis is that it facilitates the generalization of results to a larger
population. With the help of cumulative meta-analysis a shift over time
can be visualized. In cumulative meta-analysis studies are added one at
a time, usually according to date of publication. The results are summarized as each new study is added. In a forest plot of a cumulative
meta-analysis only the first horizontal line represents the results of a
single study. Each following horizontal line shows the summary of the
results after inclusion of the next study. An example is provided
for all studies that compared urinary-derived gonadotrophins and recombinant FSH (Fig. 1). After the first six studies the difference in live
birth was in favor of recombinant FSH but after including many more
trials, there was no longer any evidence of a difference (van Wely and
van der Veen, 2011).
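The mechanics of a cumulative meta-analysis can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a fixed-effect inverse-variance model on the log risk ratio, the four trials and their counts are invented, and the function names (`log_risk_ratio`, `pool_fixed`, `cumulative_meta_analysis`) are hypothetical helpers rather than part of any published analysis.

```python
import math

def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Return (log RR, variance of log RR) for one two-arm trial."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, var

def pool_fixed(effects, variances):
    """Inverse-variance fixed-effect pooled estimate and its variance."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, 1 / sum(weights)

def cumulative_meta_analysis(trials):
    """Re-pool after each added trial, in the order given (e.g. by date)."""
    effects, variances, rows = [], [], []
    for label, et, nt, ec, nc in trials:
        y, v = log_risk_ratio(et, nt, ec, nc)
        effects.append(y)
        variances.append(v)
        pooled, pooled_var = pool_fixed(effects, variances)
        half_width = 1.96 * math.sqrt(pooled_var)
        rows.append((label,
                     math.exp(pooled),
                     math.exp(pooled - half_width),
                     math.exp(pooled + half_width)))
    return rows

# Hypothetical trials (assumption, not real data):
# (label, events treated, n treated, events control, n control)
trials = [
    ("Trial A 2001", 12, 40, 6, 40),
    ("Trial B 2003", 30, 100, 24, 100),
    ("Trial C 2006", 55, 200, 52, 200),
    ("Trial D 2009", 120, 500, 118, 500),
]

rows = cumulative_meta_analysis(trials)
for label, rr, lo, hi in rows:
    print(f"{label}: cumulative RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With these invented numbers the first line shows the large effect of a single small trial, and each later line pulls the summary estimate toward 1 while the confidence interval narrows, which is the kind of shift over time described above.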
© The Author 2014. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved.
For Permissions, please email: [email protected]
Figure 1 An example of a cumulative meta-analysis. Adapted from van Wely and van der Veen (2011).
Within a meta-analysis inconsistency of results across studies can
be quantified and analyzed. For instance, does inconsistency arise from
sampling error, or are study results influenced by between-study
heterogeneity?
Meta-analysis can be extended by meta-regression. Meta-regression
allows the evaluation of the effects of continuous and categorical variables and is in essence a more advanced way to do subgroup analysis.
Though meta-regression is a valuable method to assess differential
effects in subgroups, many studies will lack power to find a difference.
As a rule of thumb about 10 studies are required to evaluate differential
effects for one variable.
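To make the idea of meta-regression concrete, the sketch below regresses study effect sizes on a single study-level covariate using inverse-variance weights. It is a simplified fixed-effect version under stated assumptions: the ten studies, their log risk ratios, variances and the covariate (mean female age) are invented, and `meta_regression` is a hypothetical helper, not a function from an existing package.

```python
import math

def meta_regression(effects, variances, covariate):
    """Fixed-effect weighted least squares of effect size on one covariate.

    Returns (intercept, slope, standard error of the slope).
    """
    weights = [1 / v for v in variances]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, covariate)) / total
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, covariate))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    slope_se = math.sqrt(1 / sxx)  # valid when within-study variances are taken as known
    return intercept, slope, slope_se

# Hypothetical study-level data (assumption, not real trials)
mean_age = [28, 30, 31, 33, 34, 35, 36, 37, 38, 40]
log_rr = [0.35, 0.30, 0.28, 0.20, 0.22, 0.15, 0.10, 0.12, 0.05, 0.02]
var_log_rr = [0.04, 0.05, 0.03, 0.06, 0.04, 0.05, 0.03, 0.04, 0.06, 0.05]

b0, b1, se_all = meta_regression(log_rr, var_log_rr, mean_age)
_, _, se_few = meta_regression(log_rr[:4], var_log_rr[:4], mean_age[:4])

print(f"Slope per year of age: {b1:.3f} (SE {se_all:.3f}) using 10 studies")
print(f"SE of the slope using only the first 4 studies: {se_few:.3f}")
```

Comparing the two standard errors shows why a small number of studies rarely suffices: with only a few studies the slope is estimated so imprecisely that a real differential effect is easily missed.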
Popularity of Meta-Analyses
Today, meta-analysis has become a key component of evidence-based
medicine. Clearly there has been a tremendous rise in meta-analyses
over the last decade. Looking at the number of clinical meta-analyses
in PubMed, however, it seems that the peak has been reached (Fig. 2).
The rise in meta-analyses over the last decade is the result of the
growth in the number of clinical trials and of the desire to use accruing
evidence as early as possible to improve health care decisions.
Moher and Olkin (1995) described the ‘dramatic increase’ in the
number of published meta-analyses. They suggested that several
meta-analyses have improved medicine. As an example the authors use
the meta-analysis that described the efficacy of corticosteroids given to
mothers that were expected to deliver prematurely (Crowley et al.,
1990). The results of their meta-analysis did not only indicate that corticosteroids significantly reduced morbidity and mortality of these infants
but also showed that such evidence was available at least a decade earlier.
The authors stated that, had a meta-analysis been conducted when the
evidence became available, much unnecessary suffering might have been
avoided.
It is therefore understandable that policy makers use systematic
reviews and meta-analyses, in addition to randomized controlled trials
(RCTs), in their decision-making. Moreover, it has become standard
practice to ask for a systematic review and meta-analysis on what is
known on a certain subject in grant applications. Indeed an evidence-based overview is always helpful as long as the quality of the evidence
is acceptable.
Figure 2 Number of meta-analyses in PubMed after entering the
term ‘meta-analysis’ and limiting to ‘clinical’ and ‘human’.
Problems That May Arise in Meta-Analyses
No computer or statistical means can solve the problem that if the data
are poor, the product of the analysis will be poor as well. This is also
known as GIGO, or the ‘garbage in – garbage out’ principle. The best
way to deal with this is to assess the quality of the studies. In epidemiological evaluations we are more or less looking for trends. Accumulating
evidence from large cohorts is required to investigate relatively rare
safety issues like neurological sequelae in preterm born children. In clinical meta-analyses on the effectiveness of interventions only methodologically sound studies should be included in a meta-analysis, a practice
called ‘best evidence synthesis’. This means the inclusion of RCTs and
exclusion of non-randomized studies as these are more likely to find
large effects due to their non-random nature. Furthermore, underpowered studies that report large effects in assisted reproductive technology (ART)
should be considered with care. At the same time, we must ensure that the
‘garbage in – garbage out’ saying is not abused: it cannot be that every time
somebody dislikes a result, the included studies are declared to have invalid data. Including only randomized
trials will help. A further helpful tool is to look at the statistical
heterogeneity. Figure 3 shows two forest plots. The first one is a forest
plot of four studies that have quite similar or homogeneous results, also
expressed as an inconsistency measure or I² of 0%. The second
one is a forest plot of four studies with heterogeneous results; the corresponding I² here was 90%. With such large heterogeneity between
studies the pooled estimate does not represent a true difference.
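How an inconsistency measure of this kind is obtained can be outlined with Cochran's Q. The following is a minimal sketch and not the computation behind Figure 3: the effect sizes and variances are invented to mimic one homogeneous and one heterogeneous set of four studies, and `heterogeneity` is a hypothetical helper name.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q and I-squared (in percent) for a set of studies."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of the variation beyond what chance alone would give
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Hypothetical log risk ratios with equal variances (assumption, not real data)
variances = [0.04, 0.04, 0.04, 0.04]
homogeneous = [0.20, 0.22, 0.18, 0.21]
heterogeneous = [-0.60, 0.10, 0.70, 1.30]

q_hom, i2_hom = heterogeneity(homogeneous, variances)
q_het, i2_het = heterogeneity(heterogeneous, variances)

print(f"Homogeneous set:   Q = {q_hom:.2f}, I-squared = {i2_hom:.0f}%")
print(f"Heterogeneous set: Q = {q_het:.2f}, I-squared = {i2_het:.0f}%")
```

When Q is no larger than its degrees of freedom, I² is set to 0%; when the study results scatter far more than sampling error would explain, I² approaches 100%.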
Furthermore, it should be realized that a meta-analysis of several small
studies does not predict the results of a single large study. A well-powered RCT is what we really need (Lelorier et al., 1997). Only small
trials could be included in the meta-analyses on endometrial scratching.
Figure 3 Two forest plots of four fictitious studies. The left graph shows effect measures for the individual studies that all lie within each other’s boundaries, i.e. the data are homogeneous. The right graph shows large differences in effect measures between the individual studies, i.e. the data are heterogeneous.
Figure 4 The pooled and study-specific risk rate for clinical pregnancy following endometrial injury versus no endometrial injury in the previous cycle of couples that underwent IVF.
The Cochrane Review on the subject could include four trials that evaluated endometrial injury in the previous cycle in terms of pregnancy outcomes (Nastri et al., 2012). A quick update of that review resulted in the
inclusion of six trials, after including three other trials (Baum et al., 2012;
Shohayeb et al., 2012; Nastri et al., 2013) and removing the interim analysis of the Nastri trial. Publication of interim analyses is not advisable.
Such an interim analysis can affect the future conduct of a trial and
make interpretation of final results difficult. Looking at the forest plot
of the updated meta-analysis there was no statistical heterogeneity
between the studies, as can be seen in Fig. 4. A differential effect was
seen only in the smallest study with 36 women (Baum et al., 2012).
Still, in view of the concept that several small studies may not predict
the results of a single large study, we cannot be sure yet whether endometrial scratching leads to better results. The good news is that larger
trials are ongoing in different parts of the world (see http://www.clinicaltrials.gov/).
These studies should take into account safety issues
as well as patient burden, as the procedure has been reported
to be painful (Nastri et al., 2013).
Unit of randomization error is a commonly seen problem in
meta-analyses. When women or couples are randomized to the interventions of interest, then all outcomes in the meta-analyses should be
expressed per woman. Expressing the outcome per embryo would artificially increase the evidence. Mind you, it was not the embryo that was
randomized. As a result implantation rate following IVF is inappropriate in
meta-analyses, unless all women had a single embryo transferred.
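The effect of such a unit-of-analysis error on apparent precision can be shown with a small calculation. This is a deliberately simplified sketch with invented numbers: it assumes a trial with 100 women per arm and two embryos transferred per woman, and it compares the standard error of the log risk ratio when the same observed rates are counted per woman and, incorrectly, per embryo. The helper `log_rr_se` is hypothetical.

```python
import math

def log_rr_se(events_t, n_t, events_c, n_c):
    """Standard error of the log risk ratio for a two-arm comparison."""
    return math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)

# Correct unit: the woman, who was the unit of randomization.
# Hypothetical result: 40/100 versus 30/100 women with the outcome.
se_per_woman = log_rr_se(40, 100, 30, 100)

# Unit-of-analysis error: the same rates counted per embryo, as if 200
# independent units per arm had been randomized (80/200 versus 60/200).
se_per_embryo = log_rr_se(80, 200, 60, 200)

print(f"SE of log RR per woman:  {se_per_woman:.3f}")
print(f"SE of log RR per embryo: {se_per_embryo:.3f}")
print(f"Ratio of the two SEs:    {se_per_woman / se_per_embryo:.2f}")
```

The risk ratio itself is unchanged, but doubling the denominators halves the variance, so the confidence interval becomes narrower by a factor of about √2 without any additional randomized information. Embryos within one woman are not independent, which is why this extra precision is artificial.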
Changing primary outcomes or basing the conclusions on secondary
outcomes is another major problem that can partly be prevented by
registering the meta-analyses in Prospero (http://www.crd.york.ac.uk/PROSPERO/).
Even better would be the registration of a protocol,
which is mandatory for Cochrane reviews.
Another problem in meta-analysis is publication bias. Usually this is
due to the underreporting of trials that did not find a difference. Nowadays this form of bias is easier to detect as journals request randomized
trials to be registered at one of the trial registries (http://www.controlled-trials.com/isrctn/search.html,
http://www.clinicaltrials.gov/ct2/search/index). Checking trial registries for relevant trials should
therefore be part of the literature search.
Other publication issues are the existence of more meta-analyses
than actual trials in the literature and the publication of multiple
meta-analyses within a short time period on the same comparison. It is
the responsibility of the editors of journals to check what has been recently published in the field. Editors and reviewers should together aim
to prevent not only double publication but also over-publications. For
authors it can be wise to go to the Prospero website to check whether
another group is already doing the same thing as more and more non-Cochrane reviews will be registered here (http://www.crd.york.ac.uk/PROSPERO/).
When to Meta-Analyse and Update
In a critique on systematic reviews and meta-analyses it was recommended that systematic reviews should include at least three to four
trials with a total sample size of a minimum of 1000 patients (Humaidan and
Polyzos, 2012). There is no evidence at all for such a policy. We need to
know all evidence to evaluate what is done in daily practice and whether
this can be improved in a safe and effective way. In response to the critique well-known methodologists wrote that all clinical decisions
should be based on good-quality systematic reviews that provide a synthesis of the current best evidence, no matter how shaky or sparse the
evidence might be. When it is demonstrated that evidence is weak or inadequate this still adds value by revealing to knowledge users the true
nature of the evidence informing their decisions (Ansari and Moher,
2013).
Cochrane reviews have their own dynamics concerning when to do a
review and when to update. As written previously ‘Cochrane reviews
should not be postponed, waiting for more evidence. On the contrary,
they should be undertaken and published when an important clinical
question has been addressed by clinical trials’ (Hughes et al., 2012).
Cochrane reviews are updated on a regular basis. The increase in information may result in narrowing down the boundaries of the effect estimate. Sometimes, the update actually changes the conclusions. This
does not imply that previous reviews were wrong but does show that
the evidence up to that point had been inadequate and that the update
was a necessity.
In the Opinion paper in the present issue of Human Reproduction the
following statement was made: ‘The weakness of published
meta-analyses is so evident that some societies such as the Royal
College of Obstetricians and Gynaecologists have created guidelines
subdividing the level of evidence 1...’ (Simón and Bellver, 2014). Subdividing evidence does not relate to a weakness. Evidence reflects what we
know now at this moment. Due to changing policies, protocols, patient
populations, concomitant diseases, etc., we can never be 100% sure that
our effect measure reflects the truth. However, solid evidence proves
with a reasonable certainty that we do know the truth. Level 1 evidence
stands for likely reliable evidence for conclusions about interventions and
reflects the presence of RCTs with or without meta-analyses that
pooled the available evidence. The RCOG uses a subdivision of level 1
evidence that is actually really helpful in that respect. Evidence 1++
can be interpreted as it being highly likely that the evidence reflects the
truth. Evidence 1+ implies it is most likely that the observed effects
are true effects. With evidence 1− it seems that the effect is as observed
but we do need more evidence to be sure. All evidence graded level 2 and beyond
is not based upon truly randomized trials and is therefore more
prone to bias.
Summary
The intention of clinical systematic reviews and meta-analyses is to summarize all available good-quality evidence. Meta-analysis should be seen
as a helpful tool. It is not the tool that should be criticized but the people
using the tool in case the analysis was not done appropriately.
It is impossible to prevent all pitfalls in systematic reviews and
meta-analyses. But what we can do is to register all potential problems
and not jump to conclusions too quickly. Be careful with meta-analyses
that only include small underpowered trials and watch out for heterogeneous results.
Good meta-analyses are objective and take into account both effectiveness and safety and do not base their conclusions on a bunch of secondary outcomes. When used wisely meta-analyses will remain a helpful
friend.
References
Ansari MT, Moher D. Systematic reviews deserve more credit than they get. Nat Med 2013;19:395–396.
Baum M, Yerushalmi GM, Maman E, Kedem A, Machtinger R, Hourvitz A, Dor J. Does local injury to the endometrium before IVF cycle really affect treatment outcome? Results of a randomized placebo controlled trial. Gynecol Endocrinol 2012;28:933–936.
Beecher HK. The powerful placebo. JAMA 1955;159:1602–1606.
Crowley P, Chalmers I, Keirse MJ. The effects of corticosteroid administration before preterm delivery: an overview of the evidence from controlled trials. Br J Obstet Gynaecol 1990;97:11–25.
Fisher RA. The Design of Experiments. Edinburgh: Oliver and Boyd, 1935.
Glass GV. Primary, secondary and meta-analysis of research. Educ Res 1976;10:3–8.
Hughes EG, van Wely M, Farquhar CM. Cochrane reviews in perspective: the importance of appropriate conclusions and timing of publication. Hum Reprod 2012;27:3–5.
Humaidan P, Polyzos NP. (Meta)analyze this: systematic reviews might lose credibility. Nat Med 2012;18:1321.
Humaidan P, Kol S, Engmann L, Benadiva C, Papanikolaou EG, Andersen CY. Copenhagen GnRH Agonist Triggering Workshop Group. Should Cochrane reviews be performed during the development of new concepts? Hum Reprod 2012;27:6–8.
Laplace P-S. Théorie Analytique des Probabilités. Oeuvres Complètes 7, 3rd edn. Paris: Courcier, 1820: lxxvii.
Lelorier J, Grégoire GV, Benhaddad A, Lapierre J, Derderian FO. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 1997;337:536–542.
Moher D, Olkin I. Meta-analysis of randomized controlled trials. A concern for standards. JAMA 1995;274:1962–1964.
Nastri CO, Gibreel A, Raine-Fenning N, Maheshwari A, Ferriani RA, Bhattacharya S, Martins WP. Endometrial injury in women undergoing assisted reproductive techniques. Cochrane Database Syst Rev 2012;7:CD009517.
Nastri CO, Ferriani RA, Raine-Fenning N, Martins WP. Endometrial scratching performed in the non-transfer cycle and outcome of assisted reproduction: a randomized controlled trial. Ultrasound Obstet Gynecol 2013;42:375–382.
O’Rourke K. J R Soc Med 2007;100:579–582.
Pearson K. Report on certain enteric fever inoculation statistics. BMJ 1904;3:1243–1246.
Shohayeb A, El-Khayat W. Does a single endometrial biopsy regimen (S-EBR) improve ICSI outcome in patients with repeated implantation failure? A randomised controlled trial. Eur J Obstet Gynecol Reprod Biol 2012;164:176–179.
Simón C, Bellver J. Scratching beneath ‘The Scratching Case’: systematic reviews and meta-analyses, the back door for evidence-based medicine. Hum Reprod 2014;29:1618–1621.
van Wely M, van der Veen F. To assist or not to assist embryo hatching. Hum Reprod Update 2011;17:436–437.