
Publication bias in meta-analyses from psychology and medical research: A meta-meta-analysis
Robbie C. M. van Aert1, Jelte M. Wicherts1, and Marcel A. L. M. van Assen1,2
1 Tilburg University, 2 Utrecht University
Author Note
Funding for this research was provided by the Berkeley Initiative for Transparency in the Social
Sciences and the Laura and John Arnold Foundation.
Correspondence concerning the article should be addressed to Robbie C. M. van Aert,
Department of Methodology and Statistics, Tilburg University, PO Box 90153, 5000 LE Tilburg, the
Netherlands
Abstract
Publication bias is a substantial problem for the credibility of research in general and of meta-analyses in particular, as it yields overestimated effects and may suggest the existence of nonexistent effects. Although there is consensus that publication bias is widespread, how strongly it
affects different scientific literatures is currently less well-known. We examine evidence of
publication bias in a large-scale data set of meta-analyses published in Psychological Bulletin
(representing meta-analyses from psychology) and the Cochrane Database of Systematic
Reviews (representing meta-analyses from medical research). Psychology is compared to
medicine, because medicine has a longer history than psychology with respect to preregistration
of studies as an effort to counter publication bias. The severity of publication bias and its
inflating effects on effect size estimation were systematically studied by applying state-of-the-art
publication bias tests and the p-uniform method for estimating effect size corrected for
publication bias. Publication bias was detected in only a small amount of homogeneous subsets.
The lack of evidence of publication bias was probably caused by the small number of effect sizes
in most subsets resulting in low statistical power for publication bias tests. Regression analyses
showed that small-study effects were actually present in the meta-analyses. That is, smaller
studies were accompanied by larger effect sizes, possibly due to publication bias.
Overestimation of effect size due to publication bias was similar for meta-analyses published in
Psychological Bulletin and Cochrane Database of Systematic Reviews.
Keywords: publication bias, meta-analysis, p-uniform, small-study effects
Meta-analysis is the standard technique for synthesizing different studies on the same topic, and
is defined as “the statistical analysis of a large collection of analysis results from individual
studies for the purpose of integrating the findings” (Glass, 1976, p. 3). One of the greatest threats
to the validity of meta-analytic results is publication bias, meaning that the publication of studies
depends on the direction and statistical significance of the results (Rothstein, Sutton, &
Borenstein, 2005). Publication bias generally leads to effect sizes being overestimated and the
publication of false-positive results (e.g., Lane & Dunlap, 1978; Nuijten, van Assen, Veldkamp,
& Wicherts, 2015). Hence, publication bias results in false impressions about the magnitude and
existence of an effect (van Assen, van Aert, & Wicherts, 2015).
Evidence of publication bias exists in various research fields. The social sciences
literature consists of approximately 90% statistically significant results (Fanelli, 2012; Sterling,
Rosenbaum, & Weinkam, 1995), which is not in line with the low average statistical power of
about 50% or less in, for instance, psychology (Bakker, van Dijk, & Wicherts, 2012; Cohen,
1990). Franco, Malhotra, and Simonovits (2014) examined publication bias in studies that
received a grant within social sciences, and concluded that 64.6% of the studies with statistically
nonsignificant results were not written up (cf. Cooper, DeNeve, & Charlton, 1997). In a highly
similar project within the psychological literature, Franco, Simonovits, and Malhotra (2016)
showed that 70% of the included outcomes in a study were not reported, and that this selective
reporting depends on statistical significance of the outcomes. The Reproducibility Project
Psychology (Open Science Collaboration, 2015) also provided indirect evidence for the existence
of publication bias within psychology. In this project, a selection of 100 psychological studies
were replicated to examine their replicability and to identify characteristics that explain replication
success. Only 39 studies were deemed to have been successfully replicated, providing indirect evidence for the
presence of publication bias.
More attention has been paid to publication bias within medical research compared to the
social sciences (Hopewell, Clarke, & Mallet, 2005). Medical research has a longer history in
registering clinical trials before conducting the research (e.g., Dickersin, Chan, Chalmers, Sacks,
& Smith, 1987; Jones et al., 2013). As of 2007, the US Food and Drug Administration Amendments Act
(FDAAA) even requires US researchers to make the results of different types of clinical trials
publicly available independent of whether the results have been published or not. With registers
like clinicaltrials.gov, it is easier for meta-analysts to search for unpublished research, and to
include it in their meta-analysis. Furthermore, it is straightforward to study publication bias by
comparing the reported results in registers with the reported results in publications. Studies
comparing the reported results in registers and publications show that statistically significant
outcomes are more likely to be reported, and clinical trials with statistically significant results
have a higher probability of getting published (Dwan et al., 2008; Kirkham et al., 2010).
A number of methods exist to test for publication bias in a meta-analysis and to estimate
a meta-analytic effect size corrected for publication bias. However, publication bias is often not
routinely assessed in many meta-analyses (Aguinis, Dalton, Bosco, Pierce, & Dalton, 2010;
Aytug, Rothstein, Zhou, & Kern, 2012; Banks, Kepes, & Banks, 2012) or analyzed with
suboptimal methods that lack statistical power to detect it (e.g., Ioannidis, 2008; Ioannidis &
Trikalinos, 2007a). It has been suggested to reexamine publication bias in published meta-analyses (Banks, Kepes, & Banks, 2012; Banks, Kepes, & McDaniel, 2012) by applying recently
developed methods to better understand the severity and prevalence of publication bias in
different fields. These novel methods have better statistical properties than existing publication
bias tests and methods developed earlier to correct effect sizes for publication bias. Several
authors have recommended to not rely on a single method for examining publication bias in a
meta-analysis, but rather to use and report a set of different publication bias methods (Coburn &
Vevea, 2015; Kepes, Banks, McDaniel, & Whetzel, 2012). This so-called triangulation takes into
account that none of the publication bias methods outperforms all the other methods under each
and every condition; one method can signal publication bias in a meta-analysis whereas another
one does not. Using a set of methods to assess the prevalence and severity of publication bias
may yield a more balanced conclusion.
We will answer three research questions in this paper. The first research question is about
the prevalence of publication bias: "What is the prevalence of publication bias within meta-analyses about psychological and medical research?" (1a), and "Is publication bias more
prevalent in psychology than in medical research after controlling for the number of effect sizes
in a meta-analysis?” (1b). Medical research is selected to compare with psychology, because
more attention has been paid to publication bias in general (Hopewell et al., 2005) and study
registration in particular (e.g., Dickersin et al., 1987; Jones et al., 2013) within medicine
compared to psychology. We also evaluate the amount of agreement between different
publication bias methods. In the second research question, we examine whether effect size
estimates of traditional meta-analysis and estimates corrected for publication bias by the p-uniform method
can be predicted by characteristics of a meta-analysis: "What characteristics of a meta-analysis
are predictors of the estimates of traditional meta-analysis and p-uniform?”. The third research
question also consists of two parts and is about overestimation in effect size caused by
publication bias: “How much is effect size overestimated by publication bias?” (3a), and “What
characteristics of a meta-analysis are predictors of the overestimation?” (3b).
The paper continues by providing an overview of publication bias methods. Next, we
describe the criteria for a meta-analysis to be included in our study, and how the data of meta-analyses were extracted and analyzed. Subsequently, we provide the results of our analyses and
conclude with a discussion.
Publication bias methods
Methods for examining publication bias can be divided into two groups: methods that
assess or test the presence of publication bias and methods that estimate effect sizes corrected for
publication bias. Methods that correct effect sizes for publication bias usually also provide a
confidence interval and test the null hypothesis of no effect corrected for publication bias.
Assessing or testing publication bias
The most often used method for assessing publication bias is the fail-safe N method
(Banks, Kepes, & McDaniel, 2012; Ferguson & Brannick, 2012). This method computes how
many effect sizes of zero have to be added to a meta-analysis to change a
statistically significant summary effect size into a nonsignificant result
(Rosenthal, 1979). Publication bias is deemed to be absent if the method suggests that a large
number of effect sizes should be added to the meta-analysis to get a nonsignificant summary
effect size. Applying the method is discouraged (Becker, 2005; Borenstein, Hedges, Higgins, &
Rothstein, 2009, Chapter 30), because it computes the number of required effect sizes to get a
nonsignificant summary effect size assuming that all these effect sizes are zero. Hence, the
number of required effect sizes will be lower if the effect sizes are below zero and higher if the
effect sizes are above zero. Another drawback of the method is that it focuses on statistical
significance rather than on whether the magnitude of the effect is substantively important (Borenstein
et al., 2009, Chapter 30).
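To make the logic concrete, the sketch below computes Rosenthal's fail-safe N from a set of study z-values via Stouffer's combined z-test. The z-values are hypothetical, and the snippet only illustrates the formula, not the implementation used in any particular meta-analysis.

```python
import numpy as np
from scipy import stats

def failsafe_n(z_values, alpha=0.05):
    """Rosenthal's fail-safe N: the number of unpublished studies with an
    average z of zero needed to make the combined (Stouffer) z-test
    nonsignificant at a one-tailed alpha."""
    z = np.asarray(z_values, dtype=float)
    k = len(z)
    z_crit = stats.norm.ppf(1 - alpha)       # one-tailed critical value
    n_fs = (z.sum() / z_crit) ** 2 - k       # solve sum(z)/sqrt(k + N) = z_crit for N
    return max(0, int(np.ceil(n_fs)))

# Hypothetical z-values of eight primary studies
print(failsafe_n([2.1, 1.8, 2.5, 2.0, 1.7, 2.3, 1.9, 2.2]))
```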
Another popular method is the funnel plot (Light & Pillemer, 1984). In a funnel plot, the
effect size estimates of the included studies in a meta-analysis are presented on the x-axis and
some measure of the effect sizes’ precision is displayed on the y-axis. Figure 1 shows a funnel
plot for a meta-analysis in the systematic review by Jürgens and Graudal (2004) studying the
effect of sodium intake on different health outcomes. Solid circles indicate Hedges’ g effect sizes
included in this meta-analysis, and the standard error is presented on the y-axis of this funnel
plot. A funnel plot illustrates whether small-study effects are present. That is, whether there is a
relationship between effect size and its precision. The funnel plot should be symmetric and
resemble an inverted funnel in the absence of small-study effects whereas a gap in the funnel
indicates that small-study effects exist. Publication bias is one of the causes of small-study
effects, but funnel plot asymmetry is often interpreted as evidence for publication bias in a meta-analysis. Small-study effects can also be caused by, for instance, heterogeneity in true effect size
and poor methodological design of smaller studies (Egger, Smith, Schneider, & Minder, 1997).
Another cause of funnel plot asymmetry arises when a study's sample size is determined based on a
power analysis. A true effect size that is expected to be large results in a small sample size in a
power analysis. The observed effect size with an underlying large true effect size is most likely
large as well, but estimated based on a small sample size. The opposite holds for a small true
effect size; required sample size in a power analysis is large, but the observed effect size is most
likely small. Hence, a correlation between observed effect size and its precision is introduced by
determining sample size of a study based on power analysis.
Figure 1. Funnel plot showing the relationship between observed effect size (Hedges’ g) and its
standard error in a meta-analysis by Jürgens and Graudal (2004) on the effect of sodium intake
on Noradrenaline. Solid circles are Hedges’ g observed effect sizes and open circles are imputed
effect sizes by the trim and fill method.
Evaluating whether small-study effects exist by eyeballing a funnel plot is rather
subjective (Terrin, Schmid, & Lau, 2005). Hence, Egger’s regression test (Egger et al., 1997) and
the rank-correlation test (Begg & Mazumdar, 1994) were developed to test whether small-study
effects are present in a meta-analysis. Egger’s regression test fits a regression line through the
observed effect sizes in the funnel plot, and evidence for small-study effects is obtained if the
slope of this regression line is significantly different from zero. The rank-correlation test
computes the rank correlation (Kendall’s τ) between the study’s effect sizes and their precision to
test for small-study effects. A drawback of these funnel plot asymmetry tests is that their statistical
power to detect publication bias is low, especially if the number of effect sizes in a meta-analysis
is small (Begg & Mazumdar, 1994; Sterne, Gavaghan, & Egger, 2000). Hence, it is recommended
to apply these methods only to meta-analyses with ten or more effect sizes (Sterne et al., 2011).
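As an illustration, the sketch below implements the two asymmetry tests in Python for a handful of hypothetical effect sizes (y) and sampling variances (v); it is a minimal version of the standard formulations, not the exact code used in this study.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

def egger_test(y, v):
    """Egger's regression test: regress the standardized effect sizes on their
    precision and test whether the intercept differs from zero."""
    se = np.sqrt(v)
    fit = sm.OLS(y / se, sm.add_constant(1 / se)).fit()
    return fit.params[0], fit.pvalues[0]            # intercept and its p-value

def rank_correlation_test(y, v):
    """Begg and Mazumdar's test: Kendall's tau between the standardized
    deviations from the fixed-effect estimate and the sampling variances."""
    w = 1 / v
    theta = np.sum(w * y) / np.sum(w)               # fixed-effect estimate
    z = (y - theta) / np.sqrt(v - 1 / np.sum(w))    # standardized deviations
    return stats.kendalltau(z, v)                   # (tau, p-value)

y = np.array([0.51, 0.42, 0.30, 0.25, 0.18, 0.12])  # hypothetical effect sizes
v = np.array([0.09, 0.07, 0.05, 0.03, 0.02, 0.01])  # hypothetical variances
print(egger_test(y, v))
print(rank_correlation_test(y, v))
```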
The test of excess significance (TES) compares the number of statistically
significant effect sizes in a meta-analysis with the expected number of statistically significant
effect sizes (Ioannidis & Trikalinos, 2007b). More statistically significant results than expected
indicates that some effect sizes are missing from the meta-analysis possibly caused by
publication bias. The expected number of statistically significant effect sizes is computed based
on the statistical power of the effect sizes in a meta-analysis. This statistical power is calculated
for each effect size assuming that the true effect size is equal to the meta-analytic effect size
estimate. The expected number of statistically significant results is then computed by taking the
sum of the statistical power for each effect size. The observed and expected number of
statistically significant results are compared by means of a χ2 test. Ioannidis and Trikalinos
(2007b) recommend not applying the method if heterogeneity in true effect size is present.
Moreover, the TES is known to be conservative (Francis, 2013).
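The sketch below illustrates the mechanics of the TES under simplifying assumptions (normally distributed effect sizes, power evaluated at the fixed-effect summary estimate); the effect sizes and variances are hypothetical and the chi-square comparison is one common way to contrast observed and expected counts.

```python
import numpy as np
from scipy import stats

def test_excess_significance(y, v, alpha=0.05):
    """Compare the observed number of significant results with the number
    expected from each study's power at the fixed-effect summary estimate."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1 / v
    theta = np.sum(w * y) / np.sum(w)                 # fixed-effect estimate
    se = np.sqrt(v)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    mu = theta / se                                   # noncentrality of each z-test
    power = stats.norm.sf(z_crit - mu) + stats.norm.cdf(-z_crit - mu)
    expected = power.sum()
    observed = np.sum(np.abs(y / se) >= z_crit)
    k = len(y)
    chi2 = (observed - expected) ** 2 / expected + (observed - expected) ** 2 / (k - expected)
    return observed, expected, stats.chi2.sf(chi2, df=1)

y = [0.45, 0.38, 0.52, 0.29, 0.41]   # hypothetical effect sizes
v = [0.04, 0.05, 0.06, 0.03, 0.05]   # hypothetical sampling variances
print(test_excess_significance(y, v))
```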
Another more recently developed method for examining publication bias is the p-uniform
method (van Aert, Wicherts, & van Assen, 2016; van Assen et al., 2015). This method is based
on the statistical principle that the distribution of p-values at the true effect size is uniform. Since
in the presence of publication bias not all statistically nonsignificant effect sizes get published, p-uniform discards nonsignificant effect sizes and computes p-values conditional on being
statistically significant. These conditional p-values should be uniformly distributed at the (fixed-effect) meta-analytic effect size estimate based on the significant and nonsignificant effect sizes,
and deviations from the uniform distribution signal publication bias. P-uniform's publication
bias test was compared to the TES in a simulation study (van Assen et al., 2015), and statistical
power of p-uniform was in general larger than that of the TES except for conditions with a true effect
size of zero and no severe publication bias. The simulation study also showed that type-I error
rate of p-uniform’s publication bias test was too low if the true effect size was of medium size.
Limitations of p-uniform’s publication bias test are that it assumes that the true effect size is
homogeneous which is not very common (e.g., Higgins, 2008; Ioannidis, Trikalinos, &
Zintzaras, 2006; Stanley & Doucouliagos, 2014), and that the method inefficiently uses the
available information by discarding statistically nonsignificant effect sizes in a meta-analysis.
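The core computation behind p-uniform's bias test can be sketched as follows: conditional p-values of the statistically significant (positive) effect sizes are evaluated at the fixed-effect estimate based on all effect sizes and then checked for uniformity. The Kolmogorov-Smirnov check below is only an illustrative stand-in for p-uniform's Irwin-Hall based statistic, the data are hypothetical, and full implementations are available in R (e.g., the puniform package).

```python
import numpy as np
from scipy import stats

def conditional_p_values(y, v, alpha=0.05):
    """For each significant positive effect size, the probability of a larger
    effect conditional on significance, evaluated at the fixed-effect estimate
    based on all (significant and nonsignificant) effect sizes."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1 / v
    theta = np.sum(w * y) / np.sum(w)              # fixed-effect estimate
    se = np.sqrt(v)
    y_cv = stats.norm.ppf(1 - alpha / 2) * se      # critical effect size per study
    sig = y > y_cv                                 # significant positive effects only
    num = stats.norm.sf((y[sig] - theta) / se[sig])
    den = stats.norm.sf((y_cv[sig] - theta) / se[sig])
    return num / den

y = [0.55, 0.48, 0.62, 0.20, 0.10, 0.35]   # hypothetical effect sizes
v = [0.04, 0.03, 0.05, 0.04, 0.03, 0.06]   # hypothetical sampling variances
q = conditional_p_values(y, v)
print(q, stats.kstest(q, "uniform"))        # illustrative uniformity check
```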
Correcting effect sizes for publication bias
Publication bias tests provide evidence about whether publication bias is present in a
meta-analysis. However, from an applied perspective the magnitude of the effect after correcting
for publication bias is more of interest. Furthermore, methods that can estimate the effect size
corrected for publication bias are desirable, because publication bias tests in general have low
statistical power (Moreno et al., 2009).
The most popular method to correct for publication bias in a meta-analysis is the trim and
fill method (Duval & Tweedie, 2000a, 2000b). This method corrects for funnel plot asymmetry
by trimming the most extreme effect sizes from one side of the funnel plot and filling these effect
sizes in the other side of the funnel plot to obtain funnel plot symmetry. The corrected effect size
estimate is obtained by computing the meta-analytic estimate based on the observed and imputed
effect sizes. Trim and fill can also be used to create a confidence interval and test the null
hypothesis of no effect after adjusting for funnel plot asymmetry. The procedure of trim and fill
is illustrated in Figure 1. The most extreme effect sizes from the right-hand side of the funnel
plot are trimmed and imputed in the left-hand side of the funnel plot (open circles in Figure 1).
A drawback of the trim and fill method is, similar to the other methods based on the funnel plot,
that it corrects for small-study effects, which are not necessarily caused by
publication bias. Simulation studies have shown that results of trim and fill cannot be trusted
because it incorrectly adds studies when none are missing (Peters, Sutton, Jones, Abrams, &
Rushton, 2007; Rothstein & Bushman, 2012; Terrin, Schmid, Lau, & Olkin, 2003). Hence, trim
and fill is discouraged because of its misleading results (Moreno et al., 2009; Simonsohn,
Nelson, & Simmons, 2014; van Assen et al., 2015).
The PET-PEESE method (Stanley & Doucouliagos, 2014) is an extension of Egger’s
regression test to estimate an effect size in a meta-analysis corrected for small-study effects.
PET-PEESE is based on a regression analysis where the observed effect sizes are regressed on
their standard errors by means of a weighted least squares regression with the inverse of the
effect sizes’ sampling variances as weights. If the intercept is not significantly different from
zero, the estimate of the intercept is interpreted as effect size estimate corrected for publication
bias. The estimate of the intercept reflects the effect size estimate in a study with a standard error
of zero (i.e., a study with an infinite sample size). However, this intercept is biased if it
is significantly different from zero (Stanley & Doucouliagos, 2014). In that case, the observed
effect sizes are instead regressed on their sampling variances in a second weighted least squares
regression (again with the inverse sampling variances as weights), and the intercept of this
second regression is interpreted as the effect size estimate. Simulation studies
have shown that PET-PEESE substantially reduced the overestimation caused by small-study
effects, and that the method is unlikely to provide reliable results if it is based on less than 10
effect sizes (Stanley, Doucouliagos, & Ioannidis, 2017).
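A minimal sketch of the conditional PET-PEESE estimator is given below, using hypothetical effect sizes and variances; the exact decision rule for switching from PET to PEESE (one- versus two-tailed test of the intercept) varies across descriptions, so the snippet should be read as an illustration rather than as the canonical implementation.

```python
import numpy as np
import statsmodels.api as sm

def pet_peese(y, v, alpha=0.05):
    """Conditional PET-PEESE estimate of the effect size corrected for
    small-study effects (Stanley & Doucouliagos, 2014)."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w, se = 1 / v, np.sqrt(v)
    pet = sm.WLS(y, sm.add_constant(se), weights=w).fit()      # effects regressed on SE
    if pet.pvalues[0] < alpha:                                  # intercept differs from zero
        peese = sm.WLS(y, sm.add_constant(v), weights=w).fit()  # effects regressed on variance
        return peese.params[0]                                  # PEESE intercept
    return pet.params[0]                                        # PET intercept

# Hypothetical data with a small-study effect (larger effects in smaller studies)
y = np.array([0.60, 0.45, 0.35, 0.28, 0.22, 0.15, 0.12, 0.10, 0.09, 0.08])
v = np.array([0.10, 0.08, 0.06, 0.05, 0.04, 0.03, 0.02, 0.015, 0.012, 0.010])
print(pet_peese(y, v))
```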
The p-uniform method can also be used for estimating effect size (and a confidence
interval) and testing the null hypothesis of no effect corrected for publication bias. P-uniform’s
effect size estimate is equal to the effect size for which the p-values conditional on being
statistically significant are uniformly distributed. A similar method that uses the distribution of
conditional p-values for estimating effect size in the presence of publication bias is the p-curve
method (Simonsohn et al., 2014). This method is similar to the p-uniform method, but differs
slightly in implementation (for a description of the difference between the two methods see van
Aert et al., 2016). A limitation of p-uniform and p-curve is that effect sizes are overestimated in
the presence of heterogeneity in true effect size (van Aert et al., 2016). Especially if the
heterogeneity in true effect size is more than moderate (I2 > 50%; more than half of the total
variance in effect size is caused by heterogeneity) both methods overestimate the effect size and
their results should be interpreted as a sensitivity analysis. Another limitation of these methods is
that they are inefficient, because they estimate the effect size based on only the statistically significant
effect sizes. P-uniform and p-curve both outperformed trim and fill in simulation studies
(Simonsohn et al., 2014; van Assen et al., 2015).
A selection model approach (Hedges & Vevea, 2005) can also be used for estimating
effect size corrected for publication bias. A selection model makes assumptions on the
distribution of effect sizes (i.e., effect size model) and the mechanism that determines which
studies are observed (i.e., selection model). The selection model specifies which effect sizes
from the effect size model are observed. The effect size estimate (and confidence interval)
corrected for publication bias is obtained by combining the effect size and selection model.
Selection model approaches exist that estimate the selection model (Iyengar & Greenhouse,
1988; Vevea & Hedges, 1995) whereas others assume a known selection model (Vevea &
Woods, 2005). Selection models are hardly used in practice, because the user has to make
sophisticated assumptions and choices and a large number of effect sizes are needed to avoid
convergence problems (Field & Gillett, 2010; Terrin et al., 2003).
Methods
Data
A large-scale data set was created with meta-analyses published between 2004 and 2014
in Psychological Bulletin and in the Cochrane Library to study the extent and prevalence of
publication bias in psychology and medical research. Psychological Bulletin was selected to
represent psychological meta-analyses, because this journal publishes many meta-analyses on
topics from psychology. Meta-analyses published in the Cochrane Database of Systematic
Reviews (CDSR) of the Cochrane Library were used to represent medical research. This
database is a collection of peer-reviewed medical systematic reviews.
A requirement for the inclusion of a meta-analysis was that either fixed-effect or random-effects meta-analysis had to be used (i.e., no other meta-analytic methods such as
meta-analytic structural equation modelling or multilevel meta-analysis). Another
requirement was that sufficient information in the meta-analysis should be available to compute
the primary study’s standardized effect size and its sampling variance. The same effect size
measure (e.g., correlation and standardized mean difference) as in the meta-analysis was used to
compute the primary study’s effect size and its sampling variance. Formulas as described in
Borenstein (2009), Fleiss and Berlin (2009), and Viechtbauer (2007) were used for computing
the standardized effect sizes and their sampling variances. For each included primary study, we
extracted besides information on effect size and sampling variance also information on all
categorical moderator variables. Based on these moderators, we created homogeneous data sets
of effect sizes, i.e., a homogeneous data set consists of the effect sizes that have the same scores
on all the extracted moderators. Consequently, each meta-analysis could contain more than one
data set if multiple homogeneous data sets could be extracted based on the included moderators.
We only included datasets with less than moderate heterogeneity (I2<50%, i.e., less than
half of the total variance in effect sizes is caused by residual between-study heterogeneity)
(Higgins, Thompson, Deeks, & Altman, 2003), because none of the publication bias methods has
desirable statistical properties in case of extreme between-study heterogeneity in true effect size
(Ioannidis & Trikalinos, 2007a, 2007b; van Aert et al., 2016; van Assen et al., 2015). Different
effect size measures were sometimes used within a meta-analysis. This may cause heterogeneity
in a meta-analysis, so the type of effect size measure was also used for creating homogeneous
subgroups. Publication bias tests have low statistical power (e.g., Begg & Mazumdar, 1994;
Macaskill, Walter, & Irwig, 2001; van Assen et al., 2015) if the number of effect sizes in a meta-analysis is small. Hence, another criterion for including a data set in the analyses was that a data
set should contain at least five effect sizes.
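The inclusion rule can be expressed compactly; the sketch below computes I2 from Cochran's Q for a candidate subset and checks both criteria, with hypothetical input values (it mirrors the criteria described above, not the actual extraction pipeline).

```python
import numpy as np

def i_squared(y, v):
    """I2 from Cochran's Q: percentage of total variability in the observed
    effect sizes attributable to between-study heterogeneity."""
    y, w = np.asarray(y, float), 1 / np.asarray(v, float)
    theta = np.sum(w * y) / np.sum(w)          # fixed-effect estimate
    q = np.sum(w * (y - theta) ** 2)
    df = len(y) - 1
    return max(0.0, 100 * (q - df) / q) if q > 0 else 0.0

def eligible(y, v, max_i2=50.0, min_k=5):
    """Inclusion rule for a homogeneous subset: at least five effect sizes
    and less than moderate heterogeneity (I2 < 50%)."""
    return len(y) >= min_k and i_squared(y, v) < max_i2

y = [0.31, 0.28, 0.35, 0.25, 0.30, 0.27]   # hypothetical effect sizes in one subset
v = [0.04, 0.05, 0.03, 0.06, 0.04, 0.05]   # hypothetical sampling variances
print(round(i_squared(y, v), 1), eligible(y, v))
```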
A search in Psychological Bulletin1 resulted in 137 meta-analyses that were published
between 2004 and 2014 and that were eligible for inclusion. The corresponding author was
contacted and a data request was made if insufficient information was available in the meta-analysis. Data of the meta-analyses were extracted by hand from the papers.2 A total of
370 datasets from meta-analyses published in Psychological Bulletin with less than moderate
heterogeneity and five or more studies were eligible for inclusion in our study.
Data of all systematic reviews in the CDSR are stored online in a standardized format,
and can therefore be extracted by an automated procedure. We used the Cochrane scraper
developed by Springate and Kontopantelis (2014) to automatically extract data from systematic
reviews. The total number of meta-analyses in the CDSR is larger than in Psychological Bulletin,
so we drew a simple random sample without replacement of systematic reviews from the CDSR
to represent medical research. Each systematic review in the database has an identification
number. We sampled identification numbers, extracted datasets from the sampled systematic
review, and included a dataset in our study if I2<50% and the number of effect sizes in a dataset
was at least five. We continued sampling systematic reviews and extracting datasets until the same
number of eligible datasets was obtained as was extracted from Psychological Bulletin
(370). The next section describes how the research questions were answered, and how the
variables were measured.
Analysis
Research question 1. The prevalence of publication bias in data sets from psychology
and medical research was examined in research question 1 by using state-of-the-art publication
bias methods. The same effect size measure as in the original meta-analyses was used to answer
RQ 1. Different publication bias tests were applied to each data set to examine the prevalence of
publication bias. The following publication bias methods were included: p-uniform’s publication
bias test (van Aert et al., 2016; van Assen et al., 2015), Egger's test (Egger et al., 1997), rank-correlation test (Begg & Mazumdar, 1994), and the TES (Ioannidis & Trikalinos, 2007b). Since
publication bias tests have low statistical power we followed a recommendation by Egger et al.
(1997) to conduct two-tailed hypothesis tests with α=.1 for all methods. P-uniform’s publication
bias test can be applied to observed effect sizes in a data set that are either significantly smaller
or larger than zero. Hence, p-uniform was applied to negative or positive effect sizes in a data set
depending on where the majority of statistically significant results was observed based on a two-tailed hypothesis test with α=.05. The estimator based on the Irwin-Hall distribution was used for
p-uniform. RQ 1a was answered by counting how often each method rejects the null hypothesis
of no publication bias. Agreement among the publication bias tests was examined by computing
Loevinger’s H values (Loevinger, 1948) for each combination of two methods. Loevinger’s H is
a statistic to quantify the association between two dichotomous variables (i.e., statistically
significant or not). The maximum value of Loevinger H is 1 indicating a perfect association
where the minimum value depends on characteristics of the data. For data sets with no
statistically significant effect sizes, p-uniform could not be applied, and we computed the
association between the results of p-uniform and other methods only for data sets with
statistically significant effect sizes.
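For two dichotomous test outcomes, Loevinger's H can be written as the observed covariance divided by the maximum covariance attainable given the marginal proportions; the sketch below uses that formulation with made-up indicator vectors.

```python
import numpy as np

def loevinger_h(x, y):
    """Loevinger's H between two dichotomous variables (1 = test significant):
    observed covariance divided by its maximum given the marginals."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    px, py = x.mean(), y.mean()
    cov = np.mean(x * y) - px * py
    cov_max = min(px, py) - px * py
    return cov / cov_max

# Hypothetical outcomes of two publication bias tests over ten data sets
egger = np.array([1, 1, 0, 0, 1, 0, 0, 0, 1, 0])
rank  = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0])
print(loevinger_h(egger, rank))   # 1.0: every rank-test hit is also an Egger hit
```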
We studied whether publication bias was more prevalent in data sets from Psychological
Bulletin than in those from CDSR (research question 1b) by conducting, for each publication bias test, a logistic
regression with whether the publication bias test was statistically significant or not as dependent
variable and a dummy variable indicating whether a dataset was obtained
from Psychological Bulletin or CDSR (reference category) as independent variable. The number of effect sizes in a data
set (or statistically significant effect sizes for p-uniform) was included as control variable,
because statistical power of publication bias tests depends on the number of effect sizes in a data
set and the number of effect sizes in data sets from meta-analyses published in Psychological
Bulletin and CDSR were expected to differ. We hypothesized that publication bias is more
severe in data sets from Psychological Bulletin than CDSR after controlling for the number of
effect sizes in a data set (or number of statistically significant effect sizes for p-uniform). This
relationship was expected because medical researchers have been aware of the
consequences of publication bias for longer, whereas broad awareness of publication bias has only recently
emerged in psychology. Furthermore, psychology and medical research differ in characteristics
like sample sizes, cost of data collection, and flexibility in design that are considered relevant for
publication bias (Ioannidis, 2005). One-tailed hypothesis tests with α=.05 were used for
answering research question 1b.
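The sketch below shows the form of this logistic regression in Python with simulated placeholder data; the variable names (sig_test, psych_bull, k) are ours, and halving the two-sided p-value is only appropriate when the estimate lies in the hypothesized direction.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sig_test": rng.binomial(1, 0.1, 740),   # publication bias test significant (1) or not (0)
    "psych_bull": np.repeat([1, 0], 370),    # 1 = Psychological Bulletin, 0 = CDSR (reference)
    "k": rng.integers(5, 30, 740),           # number of effect sizes in the data set (control)
})

fit = smf.logit("sig_test ~ psych_bull + k", data=df).fit(disp=0)
print(np.exp(fit.params["psych_bull"]))      # odds ratio Psychological Bulletin vs. CDSR
print(fit.pvalues["psych_bull"] / 2)         # one-tailed p-value if effect is in the expected direction
```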
Research question 2. Characteristics of data sets were used to predict the
estimates of traditional meta-analysis and estimates corrected for publication bias in this research
question. All effect sizes and their variances were transformed to Cohen’s d by using the
formulas in Borenstein (2009, section 12.5). If Cohen’s d could not be computed based on the
available information, Hedges’ g was used as an approximation of Cohen’s d (52 data sets).
Random-effects meta-analysis was used to estimate the effect size rather than fixed-effect
meta-analysis. Random-effects meta-analysis assumes that there is no fixed true effect
underlying each effect size (Raudenbush, 2009), and was preferred over fixed-effect meta-analysis because a small amount of heterogeneity in true effect size could be present in the data
sets. The Paule-Mandel estimator (Paule & Mandel, 1982) was used in random-effects meta-analysis for estimating the amount of between-study heterogeneity since this estimator has the
best statistical properties in most situations (Langan, Higgins, & Simmonds, 2016; Veroniki et
al., 2016). Effect sizes corrected for publication bias were estimated with p-uniform, because the
number of effect sizes included in meta-analyses in medical research is often small. Hence,
selection models were not used due to convergence problems in meta-analyses with a small
number of effect sizes (e.g., Field & Gillett, 2010), and PET-PEESE is not recommended to be
used since it yields unreliable results if there are less than 10 observed effect sizes (Stanley et al.,
2017). P-uniform was preferred over trim and fill, because applying trim and fill is discouraged
(Moreno et al., 2009; Simonsohn et al., 2014; van Assen et al., 2015). For applying p-uniform,
the estimator based on the Irwin-Hall distribution was used, and two-tailed hypothesis tests in the
primary studies were assumed with α=.05. The underlying true effect size in a data set can be
either positive or negative. Hence, the dependent variables of these analyses were the absolute
values of the estimates of random-effects meta-analysis and p-uniform.
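A minimal sketch of the random-effects estimate with the Paule-Mandel heterogeneity estimator is shown below; it solves for the tau2 at which the generalized Q statistic equals its degrees of freedom, using hypothetical input values (dedicated packages such as metafor in R offer full implementations).

```python
import numpy as np
from scipy import optimize

def paule_mandel_tau2(y, v):
    """Paule-Mandel estimate of the between-study variance tau2: the value at
    which the generalized Q statistic equals its expectation, k - 1."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    k = len(y)

    def excess_q(tau2):
        w = 1 / (v + tau2)
        mu = np.sum(w * y) / np.sum(w)
        return np.sum(w * (y - mu) ** 2) - (k - 1)

    if excess_q(0.0) <= 0:                    # no excess heterogeneity
        return 0.0
    upper = 10 * np.var(y) + 1.0              # crude bracket for the root
    return optimize.brentq(excess_q, 0.0, upper)

def random_effects_estimate(y, v):
    """Random-effects summary estimate with Paule-Mandel tau2."""
    tau2 = paule_mandel_tau2(y, v)
    w = 1 / (np.asarray(v, float) + tau2)
    return np.sum(w * np.asarray(y, float)) / np.sum(w), tau2

y = [0.42, 0.30, 0.51, 0.25, 0.38, 0.45]   # hypothetical Cohen's d values
v = [0.05, 0.04, 0.06, 0.03, 0.05, 0.04]   # hypothetical sampling variances
print(random_effects_estimate(y, v))
```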
Two separate weighted least squares (WLS) regressions were used with the absolute
values of the effect size estimates of random-effects meta-analysis and p-uniform as dependent variables. The inverse of the meta-analytic variances was selected as weights in the WLS
regressions, because it is a function of both the sample size in the primary studies and the
number of effect sizes in a data set. For the analysis with the meta-analytic effect size estimate as
dependent variable, all data sets were used whereas only the data sets with statistically
significant effect sizes were used for the analyses with p-uniform’s estimate as the dependent
variable.
Five independent variables were included in the WLS regression with the random-effects
meta-analytic estimate per data set as dependent variable. The independent variables and the
hypothesized relationships are listed in Table 1. The meta-analytic effect size estimate was
expected to be larger in data sets from Psychological Bulletin, because publication bias was
expected to be more severe in psychology than medical research. No straightforward relationship
was hypothesized between the I2-statistic and the meta-analytic effect size estimate. The amount
of heterogeneity can be either over- or underestimated depending on the extent of publication
bias (Jackson, 2006). Hence, no main effect of I2 on the estimate of random-effects meta-analysis
was expected. Primary studies’ precision in a data set was operationalized by computing the
harmonic mean of the primary studies’ standard error. A negative relationship was expected
between primary studies’ precision and the meta-analytic estimate, because less precise effect
size estimates (i.e., larger standard errors) were expected to be accompanied with more bias and
hence larger meta-analytic effect size estimates. The proportion of statistically significant effect
sizes in a data set was expected to have a positive relationship with the meta-analytic effect size
estimate, because effect sizes with the same sample size that are statistically significant are larger
than statistically nonsignificant effect sizes. The independent variable indicating the number of
effect sizes in a data set was included to control for differences in the number of studies in a
meta-analysis.
Table 1. Hypotheses between independent variables and the effect size estimate of random-effects
meta-analysis.
Independent variable: Hypothesis
Discipline: Overestimation was expected to be more severe in data sets from Psychological Bulletin than CDSR
I2-statistic: No relationship was expected
Primary studies' precision: A negative relationship was expected between primary studies' precision and the meta-analytic estimate
Proportion of statistically significant effect sizes: A positive relationship was expected between the proportion of statistically significant effect sizes and the meta-analytic effect size estimate
Number of effect sizes in a data set: No relationship was expected
For the WLS regression with p-uniform’s estimate of meta-analyses as dependent
variable, the same independent variables were included as for the WLS regression with the meta-analytic estimate as dependent variable (see Table 2). No hypothesis was specified for the effect
of discipline. A relationship between discipline and the effect size estimate of p-uniform may
exist, but it is unclear whether estimates after correcting for publication bias are larger for data
sets from Psychological Bulletin or CDSR. We expected a positive relationship of the I2-statistic,
because p-uniform overestimates the true effect size in the presence of between-study
heterogeneity (van Aert et al., 2016; van Assen et al., 2015). The primary studies’ precision was
operationalized in the same way as for the WLS regression with the meta-analytic estimate as
dependent variable. A negative relationship between primary studies' precision and the meta-analytic effect size estimate is indicative of publication bias. Since p-uniform corrects for
publication bias, no relationship was expected between primary studies' precision and p-uniform's estimate. No specific direction of the effect of the proportion of statistically significant
effect sizes in a data set was hypothesized. Many statistically significant effect sizes in a data set
suggest that the studied effect size is large, the sample sizes of the primary studies are large, or there
was severe publication bias in combination with many conducted (but not published) primary
studies. These partly opposing effects may have canceled each other out or there can be a
positive or negative relationship. The number of effect sizes in a data set was again included as
control variable, and no relationship was expected with p-uniform’s effect size estimate.
Table 2. Hypotheses between independent variables and p-uniform's effect size estimate.
Independent variable: Hypothesis
Discipline: No directional hypothesis was specified, because the direction was unclear
I2-statistic: A positive relationship was expected between the I2-statistic and p-uniform's estimate
Primary studies' precision: No relationship was expected
Proportion of statistically significant effect sizes: No directional hypothesis was specified, because the direction was unclear
Number of effect sizes in a data set: No relationship was expected
The effect size estimate of p-uniform can become extremely positive or negative if there
are multiple p-values just below the α-level (van Aert et al., 2016; van Assen et al., 2015). These
outliers may affect the results of the regression analysis with p-uniform’s estimate as dependent
variable. Hence, we used quantile regression (Koenker, 2005) as a sensitivity analysis, because
this procedure is less influenced by outliers in the dependent variable. In the quantile regression,
the median of the estimates of p-uniform was regressed on the independent variables.
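A sketch of such a median regression with statsmodels is given below; the data frame contains simulated placeholder values and our own variable names, so it only illustrates the model form, not the actual analysis data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "est_puniform": rng.standard_t(3, 455) * 0.5 + 0.3,  # heavy-tailed placeholder estimates
    "psych_bull": rng.integers(0, 2, 455),
    "i2": rng.uniform(0, 50, 455),
    "se_harmonic": rng.uniform(0.05, 0.4, 455),
    "prop_sig": rng.uniform(0.1, 1.0, 455),
    "k": rng.integers(5, 30, 455),
})
df["abs_est"] = df["est_puniform"].abs()   # dependent variable: absolute p-uniform estimate

# Median (0.5 quantile) regression is less sensitive to extreme p-uniform estimates
fit = smf.quantreg("abs_est ~ psych_bull + i2 + se_harmonic + prop_sig + k", data=df).fit(q=0.5)
print(fit.params)
```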
Research question 3. Estimates of random-effects meta-analysis and p-uniform obtained
for answering research question 2 were used to examine the overestimation caused by
publication bias. It is possible that estimates of the meta-analysis and p-uniform have opposite
signs (i.e., negative estimate of p-uniform and positive meta-analytic estimate or the other way
around). An effect size estimate of p-uniform in the opposite direction than the meta-analytic
estimate is often unrealistic, because this suggests that, for instance, a negative true effect size
results in multiple positive observed effect sizes. Effect size estimates in opposing directions by
meta-analysis and p-uniform may be caused by many p-values just below the α-level. Hence, p-uniform's estimate was set equal to zero in these situations. Setting p-uniform's estimate to zero
when its sign is opposite to that of meta-analysis is in line with the recommendation in van Aert
et al. (2016).
A new variable Y was created to reflect the overestimation of random-effects meta-analysis when compared with p-uniform. If the meta-analytic estimate was larger than zero,
Y = MA - corrected, where "MA" is the meta-analytic estimate and "corrected" is the estimate of p-uniform. If the meta-analytic estimate was smaller than zero, Y = -MA + corrected. Variable Y was
zero if the estimates of p-uniform and meta-analysis were the same, positive if p-uniform’s effect
size estimate was closer to zero than the meta-analytic estimate (if they originally had the same
sign), and negative if p-uniform’s estimate was farther away from zero than the meta-analytic
estimate (if they originally had the same sign). The Y variable was computed for each dataset
with statistically significant effect sizes, and we computed the mean and median of Y for data
sets from Psychological Bulletin and CDSR in order to get an overall estimate of the amount of
overestimation in effect size (research question 3a).
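The construction of Y, including the truncation of opposite-signed p-uniform estimates, can be summarized in a few lines; the values below are hypothetical.

```python
import numpy as np

def overestimation_y(ma_est, pu_est):
    """Overestimation of the random-effects estimate relative to p-uniform:
    Y = MA - corrected if MA > 0 and Y = -MA + corrected if MA < 0, after
    truncating p-uniform estimates whose sign is opposite to the meta-analytic one."""
    ma, pu = np.asarray(ma_est, float), np.asarray(pu_est, float)
    pu = np.where(np.sign(pu) == -np.sign(ma), 0.0, pu)   # truncate opposite-signed estimates
    return np.where(ma > 0, ma - pu, -ma + pu)

ma = np.array([0.40, -0.35, 0.20])   # hypothetical random-effects estimates
pu = np.array([0.25, -0.10, -0.05])  # hypothetical p-uniform estimates
y = overestimation_y(ma, pu)
print(y, y.mean(), np.median(y))
```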
In research question 3b, WLS regression (with the inverse of the variance of the random-effects meta-analytic estimate as weights) was used to examine the relationship between the
differences in estimates of random-effects meta-analysis and p-uniform and the same
independent variables as in research question 2. The hypothesized relationships between the
independent variables and Y are summarized in Table 3. A larger value on Y was expected for
data sets from Psychological Bulletin than CDSR, because overestimation was expected to be
more severe in psychology than in medicine. We hypothesized a negative relation between the
I2-statistic and Y. P-uniform overestimates the effect size in the presence of between-study
heterogeneity (van Aert et al., 2016; van Assen et al., 2015) resulting in a smaller difference
between the estimates of p-uniform and random-effects meta-analysis. Primary studies’ precision
was hypothesized to be negatively related to Y, because overestimation of the meta-analytic
estimate was expected to decrease as a function of primary studies’ precision. No specific
directions of the relationships between the number of effect sizes in a data set and the proportion
of statistically significant effect sizes in a data set were expected. Although a positive effect of
this proportion on the meta-analytic effect size estimate was expected, the effect of the
proportion on p-uniform’s estimate was unclear. The number of effect sizes in a data set was
incorporated in the WLS regression as control variable.
Table 3. Hypotheses between independent variables and overestimation in effect size when
comparing random-effects meta-analysis with p-uniform (Y).
Independent variable: Hypothesis
Discipline: A larger value on Y was expected for data sets from Psychological Bulletin than CDSR
I2-statistic: A negative relationship was expected between the I2-statistic and Y
Primary studies' precision: A negative relationship was expected between primary studies' precision and Y
Proportion of statistically significant effect sizes: No directional hypothesis was specified, because the direction was unclear
Number of effect sizes in a data set: No relationship was expected
Estimates of p-uniform that are in the opposite direction than traditional meta-analysis
were set equal to zero before computing the Y variable. This affected the results of the WLS
regression. Hence, quantile regression (Koenker, 2005) was used as sensitivity analysis with the
median of Y as dependent variable instead of the mean of Y in WLS regression. Since quantile
regression uses the median of Y, its results were not affected by truncating estimates of p-uniform to zero.
Results
The total number of included data sets was 740 (370 representing Psychological Bulletin
and 370 representing CDSR). Figure 2 shows the number of effect sizes in a data set for included
data sets from Psychological Bulletin (a) and CDSR (b). Both figures show that no data sets were
included with less than five effect sizes. The number of effect sizes per data set was slightly
larger for Psychological Bulletin (mean 9.2, first quartile 5, third quartile 9) compared to CDSR
(mean 7.6, first quartile 5, third quartile 8). The mean number of statistically significant effect
sizes based on a two-tailed hypothesis test and α=.05 was 2.6 (30.1%) for data sets from
Psychological Bulletin and 1.2 (16%) for data sets from CDSR. A requirement for applying p-uniform is that statistically significant effect sizes should be present in a meta-analysis. This was
not the case for 38.5% of the data sets (27.3% of data sets from Psychological Bulletin and
49.7% of data sets from CDSR), and p-uniform could not be applied to these data sets.
Figure 2. Histograms of the number of effect sizes in data sets from Psychological Bulletin (a)
and CDSR (b).
Research question 1
Table 4 shows the results of applying Egger's regression test, the rank-correlation test, p-uniform's publication bias test, and the TES to examine the prevalence of publication bias in the
meta-analyses. The panels in Table 4 illustrate how often each publication bias test was
statistically significant (marginal frequencies and percentages) and also the agreement among the
methods (joint frequencies). Agreement among the methods was quantified by means of
Loevinger’s H (bottom-right cell of each panel).
Table 4. Results of applying Egger’s regression test, rank-correlation test, p-uniform’s
publication bias test, and test of excess significance (TES) to examine the prevalence of
publication bias in meta-analyses from Psychological Bulletin and Cochrane Database of
Systematic Reviews. H denotes Loevinger’s H to describe the association between two methods.
Egger's regression test by rank-correlation test
                            Rank-corr. not sig.   Rank-corr. sig.   Total
Egger not sig.              608                   33                641; 87.9%
Egger sig.                  51                    37                88; 12.1%
Total                       659; 90.4%            70; 9.6%          H = .464

Egger's regression test by p-uniform
                            p-uniform not sig.    p-uniform sig.    Total
Egger not sig.              324                   44                368; 83.6%
Egger sig.                  64                    8                 72; 16.4%
Total                       388; 88.2%            52; 11.8%         H = -.012

Egger's regression test by TES
                            TES not sig.          TES sig.          Total
Egger not sig.              625                   27                652; 88.1%
Egger sig.                  80                    8                 88; 11.9%
Total                       705; 95.3%            35; 4.7%          H = .124

Rank-correlation test by p-uniform
                            p-uniform not sig.    p-uniform sig.    Total
Rank-corr. not sig.         350                   44                394; 89.5%
Rank-corr. sig.             38                    8                 46; 10.5%
Total                       388; 88.2%            52; 11.8%         H = .063

Rank-correlation test by TES
                            TES not sig.          TES sig.          Total
Rank-corr. not sig.         630                   29                659; 90.4%
Rank-corr. sig.             64                    6                 70; 9.6%
Total                       694; 95.2%            35; 4.8%          H = .083

p-uniform by TES
                            TES not sig.          TES sig.          Total
p-uniform not sig.          361                   27                388; 88.2%
p-uniform sig.              44                    8                 52; 11.8%
Total                       405; 92%              35; 8%            H = .125
No strong evidence for publication bias was observed in the data sets. Publication bias
was detected in at most 88 data sets (11.9%) by Egger's regression test. The TES and rank-correlation test were statistically significant in 35 (4.7%) and 70 (9.6%) data sets. In the subset of
data sets with at least one statistically significant effect size, p-uniform’s publication bias test
detected publication bias in 52 data sets (11.8%). This was more than TES (35; 8%) and the
rank-correlation test (46; 10.5%) and less than Egger’s regression test (72; 16.4%) when these
methods were applied to this subset of data sets. Note that the rank-correlation test could not be
applied to all 740 data sets, because there was no variation in the observed effect size in 11 data
sets. Associations among the methods were low (H < .13) except for the association between
Egger’s regression test and the rank-correlation test (H = .464). This association was expected
since both methods examine the relationship between observed effect size and some measure of
its precision.
We studied in RQ 1b whether publication bias was more prevalent in data sets from
Psychological Bulletin than CDSR after controlling for the number of effect sizes (or, for p-uniform, the number of statistically significant effect sizes) in a meta-analysis. Publication bias was more
prevalent in data sets from Psychological Bulletin if the results of the rank-correlation test (odds
ratio=1.919, z=2.453, one-tailed p-value=.007) and TES (odds ratio=2.056, z=1.913, one-tailed
p-value=.028) were used as dependent variable, but not for Egger’s regression test (odds
ratio=1.298, z=1.128, one-tailed p-value=.259) and p-uniform (odds ratio=0.998, z=-0.007, one-tailed p-value=.503). Tables with the results of these logistic regression analyses are reported in
Appendix A. Note, however, that if we control for the number of tests performed (i.e., 4) by
means of the Bonferroni correction, only the result of the rank-correlation test is statistically
significant (p = .007 < .05/4 = .0125). All in all, we conclude that evidence on differences in the
prevalence of publication bias across the two disciplines is not convincing.
Research question 2
Absolute values of the effect size estimates of random-effects meta-analysis and p-uniform were predicted based on characteristics of the data sets. One-tailed hypothesis tests were
used in case of a directional hypothesis (see Table 2). Table 5 presents the results of the WLS
regression with the absolute values of the estimates of random-effects meta-analysis as
dependent variable. The variables in the model explain 64.5% of the variance in the estimates of
random-effects meta-analysis (R2=0.645; F(5,734)=266.5, p < .001). The absolute value of the
meta-analytic estimate was 0.016 larger for data sets from Psychological Bulletin compared to
CDSR (t(733)=1.714, one-tailed p-value =.043). The I2-statistic had an unexpected negative
effect on the absolute value of the meta-analytic estimate (B=-0.001, t(733)=-4.143, two-tailed p-value<.001). The harmonic mean of the standard error had a positive effect (B=1.095,
t(733)=23.512, one-tailed p-value<.001) suggesting that data sets with less precise estimates
resulted in more extreme meta-analytic estimates. As hypothesized, a positive effect was
observed for the proportion of statistically significant effect sizes on the absolute value of the
meta-analytic estimate (B=0.479, t(733)=32.021, two-tailed p-value<.001).
Table 5. Results of weighted least squares regression with the absolute value of the random-effects
meta-analysis effect size estimate as dependent variable and as independent variables
discipline, I2-statistic, harmonic mean of the standard error (standard error), proportion of
statistically significant effect sizes in a data set (Prop. sig. effect sizes), and number of effect
sizes in a data set.
                          B (SE)              t-value (p-value)   95% CI
Intercept                 -0.139 (0.013)      -10.650 (<.001)     -0.164; -0.113
Discipline                0.016 (.010)        1.714 (.043)        -0.002; 0.035
I2-statistic              -0.001 (.0003)      -4.143 (<.001)      -0.002; -0.001
Standard error            1.095 (.047)        23.512 (<.001)      1.003; 1.186
Prop. sig. effect sizes   0.479 (.015)        32.021 (<.001)      0.450; 0.509
Number of effect sizes    -0.0003 (.0003)     -1.057 (.291)       -0.001; 0.0002
Note. CDSR is the reference category for discipline. p-values for discipline, harmonic mean of
the standard error, and proportion of significant effect sizes in a data set are one-tailed whereas
the other p-values are two-tailed. CI = confidence interval.
Table 6 shows the results of WLS regression with the absolute value of p-uniform’s
estimate as dependent variable. P-uniform could only be applied to 455 data sets (61.5%),
because these data sets contained at least one statistically significant effect size. The variance of
the 455 estimates by p-uniform (73.070) was substantially larger than the variance of the
estimates of random-effects meta-analysis (0.204) when applied to these 455 data sets. This was
mainly caused by some extreme effect size estimates by p-uniform (i.e., four estimates were
larger than 10 and seven estimates were smaller than -10). The proportion of explained variance in
p-uniform's estimate was R2=.015 (F(5,449)=, p=.239). We hypothesized a positive relationship
of the I2-statistic, but such a relationship was not observed (B=0.012, t(448)=0.843, one-tailed p-value=.200). This may indicate that p-uniform's estimates were not severely overestimated
because of the heterogeneity in true effect size. No effect was observed between the other
independent variables and the absolute value of p-uniform’s estimate. Quantile regression was
used as sensitivity analysis to examine whether the results were distorted by extreme effect size
estimates of p-uniform (see Table B1). The effect of the harmonic mean of the standard error
was lower in the quantile regression but, in contrast to the WLS regression, statistically
significant (B=2.017, t(448)=3.660, two-tailed p-value<.001) and the effect of the proportion of
statistically significant effect sizes was substantially lower in the quantile regression (B=0.194,
t(448)=1.881, two-tailed p-value=.061).
Table 6. Results of weighted least squares regression with the absolute value of p-uniform’s
effect size estimate as dependent variable and as independent variables discipline, I2-statistic,
harmonic mean of the standard error (standard error), proportion of statistically significant effect
sizes in a data set (Prop. sig. effect sizes), and number of effect sizes in a data set.
                          B (SE)              t-value (p-value)   95% CI
Intercept                 -1.149 (0.794)      1.447 (.149)        -0.411; 2.709
Discipline                -0.185 (0.550)      -0.336 (.737)       -1.265; 0.896
I2-statistic              0.012 (0.014)       0.843 (.200)        -0.016; 0.040
Standard error            3.124 (2.780)       1.124 (.262)        -2.339; 8.587
Prop. sig. effect sizes   -1.454 (0.867)      -1.678 (.094)       -3.158; 0.249
Number of effect sizes    -0.021 (0.015)      -1.404 (.161)       -0.050; 0.008
Note. CDSR is the reference category for discipline. p-value for the I2-statistic is one-tailed
whereas the other p-values are two-tailed. CI = confidence interval.
Research question 3
We examined the overestimation in effect size by creating the Y variable that reflects the
overestimation in effect size of random-effects meta-analysis when compared with p-uniform’s
estimate. Results indicated that the overestimation was less than d=0.05 for data sets from
Psychological Bulletin (mean=0.012, median=0.048) and CDSR (mean=0.015, median=0.019),
and that differences between estimates of data sets from Psychological Bulletin and CDSR were
negligible (research question 3a).
Table 7 presents the results of WLS regression with Y as dependent variable and
characteristics of the data sets as independent variables. The independent variables explained
15.3% of the variance of Y (F(5,449)=16., p < .001). As the descriptive results above already
illustrate, the overestimation in data sets from Psychological Bulletin was not significantly larger
than in data sets from CDSR (B=-0.026, t(448)=-1.331, one-tailed p-value=.908). A negative effect of the I2-statistic was observed (B=-0.003, t(448)=-5.255, one-tailed p-value<.001). The relationship
between the harmonic mean of the standard error and Y was observed in the expected direction
(B=0.218, t(448)=2.172, one-tailed p-value=.015), and the proportion of statistically significant
effect sizes in a data set had a positive effect on Y (B=0.161, t(448)=5.140, two-tailed p-value<.001).
Table 7. Results of weighted least squares regression with the effect size overestimation in
random-effects meta-analysis when compared to p-uniform (Y) as dependent variable and as
independent variables discipline, I2-statistic, harmonic mean of the standard error (standard
error), proportion of statistically significant effect sizes in a data set (Prop. sig. effect sizes), and
number of effect sizes in a data set.
                          B (SE)              t-value (p-value)   95% CI
Intercept                 -0.026 (0.029)      -0.923 (.357)       -0.083; 0.030
Discipline                -0.026 (0.020)      -1.331 (.908)       -0.066; 0.013
I2-statistic              -0.003 (0.001)      -5.255 (<.001)      -0.004; -0.002
Standard error            0.218 (0.100)       2.172 (.015)        0.021; 0.416
Prop. sig. effect sizes   0.161 (0.031)       5.140 (<.001)       0.100; 0.223
Number of effect sizes    -0.002 (0.001)      -3.671 (<.001)      -0.003; -0.001
Note. CDSR is the reference category for discipline. p-values for discipline, the I2-statistic, and
the harmonic mean of the standard error are one-tailed whereas the other p-values are two-tailed.
CI = confidence interval.
The estimate of p-uniform was truncated to zero in 130 data sets before computing Y in
order to deal with unlikely cases where p-uniform's estimate would be in the opposite direction
from the estimate of random-effects meta-analysis. Hence, a quantile regression with the median of
Y as dependent variable was conducted to examine whether the results of the WLS regression were
affected by this truncation (see Table 8). The effects of the I2-statistic (B = -0.0003,
t(448) = -0.547, one-tailed p-value = .292) and the proportion of statistically significant effect sizes
in a data set (B = 0.021, t(448) = 0.463, two-tailed p-value = .644) were no longer statistically
significant. Although the effect of the harmonic mean of the standard error was larger in the
quantile regression, it was no longer statistically significant either (B = 0.346, t(448) = 1.511,
one-tailed p-value = .066). Consequently, using the median of Y as dependent variable in quantile
regression instead of the mean of Y in WLS regression yields different results and conclusions.
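A minimal sketch of this sensitivity analysis, again with hypothetical column names, is given below. Note that statsmodels' quantile regression does not provide the confidence intervals based on inverting a rank test that are reported in Table 8, so only the point estimates of the median regression are mirrored here.

import statsmodels.formula.api as smf

def fit_median_regression(df):
    """Quantile regression of Y at the median (q = 0.5)."""
    model = smf.quantreg(
        "Y ~ discipline_psychbull + i2 + harmonic_mean_se"
        " + prop_sig + n_effect_sizes",
        data=df,
    )
    return model.fit(q=0.5)  # q = 0.5 targets the conditional median of Y

# res = fit_median_regression(meta_meta_df)  # hypothetical data frame
# print(res.summary())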
Table 8. Results of quantile regression with the median of effect size overestimation in random-effects
meta-analysis when compared to p-uniform (Y) as dependent variable and as independent
variables discipline, I2-statistic, harmonic mean of the standard error (standard error), proportion
of statistically significant effect sizes in a data set (Prop. sig. effect sizes), and number of effect
sizes in a data set.
                          B (SE)              t-value (p-value)   95% CI
Intercept                 -0.009 (0.078)      -0.116 (.908)       -0.012; 0.083
Discipline                -0.005 (0.026)      -0.191 (.576)       -0.021; 0.021
I2-statistic              -0.0003 (0.0005)    -0.547 (.292)       -0.001; -0.0001
Standard error            0.346 (0.229)       1.511 (.066)        0.204; 0.476
Prop. sig. effect sizes   0.021 (0.045)       0.463 (.644)        -0.002; 0.062
Number of effect sizes    -0.003 (0.001)      -1.867 (.063)       -0.004; -0.002
Note. CDSR is the reference category for discipline. p-values for discipline, the I2-statistic, and
the harmonic mean of the standard error are one-tailed whereas the other p-values are two-tailed.
CI = confidence interval based on inverting a rank test.
Conclusion and discussion
Publication bias is a major threat to the validity of meta-analyses. It results in
overestimated effect sizes in primary studies, which in turn also biases the meta-analytic results
(e.g., Lane & Dunlap, 1978; van Assen et al., 2015). Evidence for publication bias has been
observed in many research fields (e.g., Fanelli, 2012; Franco et al., 2014; Franco et al., 2016;
Sterling et al., 1995), and different methods have been developed to examine publication bias in
meta-analyses (for an overview see Rothstein et al., 2005). We studied the prevalence of publication
bias and the overestimation caused by it in a large number of published meta-analyses within
psychology and medical research by applying publication bias methods to homogeneous data
sets of these meta-analyses. Characteristics of meta-analyses were also used to predict meta-analytic
estimates and overestimation of effect size.
No convincing evidence for the prevalence of publication bias was observed in our large-scale
data set. Publication bias was detected in at most 88 data sets (11.9%) by Egger's
regression test, 70 data sets (9.6%) by the rank-correlation test, 35 data sets (4.7%) by the TES,
and 52 data sets (11.8%) by p-uniform's publication bias test. Results also show that publication
bias tests often do not agree on the presence of publication bias in a data set. Furthermore, no
univocal evidence was obtained for publication bias being more severe in data sets representing
psychology compared to data sets representing medical research. A WLS regression with the
random-effects meta-analytic estimates as dependent variable revealed a negative effect of
primary studies' precision on the meta-analytic estimate. This suggests that small-study effects
were present, and one of the causes of small-study effects is publication bias. Because we only
analyzed homogeneous data sets, it is less likely that the small-study effects were caused by, for
instance, methodological differences between the effect sizes in a data set rather than by
publication bias. Small-study effects were also observed in other meta-meta-analyses by
Fanelli and Ioannidis (2013) and Fanelli, Costas, and Ioannidis (2017).
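To make the two funnel plot asymmetry tests mentioned above concrete, the sketch below gives simplified textbook implementations of Egger's regression test (Egger et al., 1997) and the rank-correlation test (Begg & Mazumdar, 1994). These are illustrative re-implementations written for this discussion, not necessarily the implementations used for the analyses reported in this paper.

import numpy as np
from scipy import stats

def egger_test(y, se):
    """Regress the standardized effect (y/se) on precision (1/se);
    an intercept far from zero signals funnel plot asymmetry."""
    y, se = np.asarray(y, float), np.asarray(se, float)
    X = np.column_stack([np.ones_like(se), 1.0 / se])
    coefs, *_ = np.linalg.lstsq(X, y / se, rcond=None)
    resid = y / se - X @ coefs
    df = len(y) - 2
    cov = (resid @ resid / df) * np.linalg.inv(X.T @ X)
    t_val = coefs[0] / np.sqrt(cov[0, 0])
    return coefs[0], 2 * stats.t.sf(abs(t_val), df)  # intercept, two-tailed p

def rank_correlation_test(y, se):
    """Kendall's tau between standardized deviates and sampling variances."""
    y, se = np.asarray(y, float), np.asarray(se, float)
    v = se ** 2
    w = 1.0 / v
    y_bar = np.sum(w * y) / np.sum(w)   # fixed-effect weighted mean
    v_star = v - 1.0 / np.sum(w)        # variance of y_i - y_bar
    z = (y - y_bar) / np.sqrt(v_star)
    return stats.kendalltau(z, v)       # tau and its p-value

# Toy example (illustrative numbers only):
y = [0.80, 0.50, 0.40, 0.30, 0.25, 0.10]
se = [0.40, 0.30, 0.25, 0.20, 0.15, 0.10]
print(egger_test(y, se))
print(rank_correlation_test(y, se))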
The same independent variables used for predicting the random-effects meta-analytic
effect size estimate were also used in a WLS regression with p-uniform's estimate as dependent
variable. None of the independent variables had a statistically significant effect on p-uniform's
effect size estimate. The explained variance in the WLS regression with p-uniform's estimate as
dependent variable (1.5%) was substantially lower than with the estimate of random-effects
meta-analysis as dependent variable (64.5%). This difference in explained variance was probably
caused by the large variance in p-uniform's effect size estimates across data sets. The variance in
these estimates was large because p-uniform's estimates were sometimes extremely positive or
negative, caused by observed effect sizes with p-values just below the α-level. We also studied
the overestimation in effect size due to publication bias by comparing the estimates of random-effects
meta-analysis and p-uniform. Primary studies' precision and the proportion of statistically
significant effect sizes in a data set both had an effect on the overestimation of effect size in a
WLS regression, but these effects were no longer statistically significant in a sensitivity analysis.
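The following sketch illustrates why these extreme estimates occur, for the simplest possible case of a single statistically significant effect size. Under our reading of p-uniform (van Assen et al., 2015), the effect size estimate is then the value of d at which the p-value conditional on statistical significance equals 0.5; when the observed p-value lies just below the α-level, that value of d is extremely negative.

import numpy as np
from scipy import stats, optimize

def conditional_p(d, y, se, alpha=0.05):
    """P(Y >= y | Y >= critical value) for Y ~ N(d, se^2), two-tailed alpha."""
    y_cv = stats.norm.ppf(1 - alpha / 2) * se  # critical value on the effect size scale
    # Work on the log scale to avoid underflow for very negative d.
    return np.exp(stats.norm.logsf((y - d) / se) - stats.norm.logsf((y_cv - d) / se))

def puniform_one_study(y, se, alpha=0.05):
    """Single-study sketch: solve conditional_p(d) = 0.5 for d."""
    return optimize.brentq(lambda d: conditional_p(d, y, se, alpha) - 0.5, -30.0, 30.0)

se = 0.2
print(puniform_one_study(1.97 * se, se))  # p barely below .05: strongly negative estimate
print(puniform_one_study(3.00 * se, se))  # p well below .05: estimate close to the observed effect

In the first call the observed effect barely exceeds the critical value and the resulting estimate lies far below zero, which is exactly the behavior that inflates the variance of p-uniform's estimates across data sets.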
The results of our paper are not in line with previous research showing that publication
bias is omnipresent in science (e.g., Fanelli, 2012; Franco et al., 2014; Franco et al., 2016;
Sterling et al., 1995). There are three indications for the absence of severe publication bias in the
meta-analyses included in our study. First, many statistically nonsignificant effect sizes were
included in the meta-analyses. Only 30.1% of the observed effect sizes were statistically
significant in the meta-analyses from Psychological Bulletin, and 16% of the observed effect
sizes were statistically significant in the meta-analyses from CDSR. This is, for instance,
substantially lower than the approximately 90% statistically significant results in the social
sciences literature (Fanelli, 2012; Sterling et al., 1995). Second, publication bias was often not
detected in the included meta-analyses. Finally, overestimation in effect size caused by
publication bias was hardly observed in the included meta-analyses.
Publication bias and the overestimation caused by it could also have gone undetected for a
variety of reasons. First, authors included many unpublished studies in their meta-analyses, which
decreased the detectability of publication bias. Second, publication bias methods were applied to
subsets of the primary studies included in a meta-analysis to create more homogeneous data sets.
Creating these smaller data sets was necessary because publication bias methods' statistical
properties deteriorate if heterogeneity in true effect size is moderate or large. However,
publication bias methods were consequently applied to data sets with a small number of effect
sizes. Publication bias tests suffer from low statistical power under these conditions (Begg &
Mazumdar, 1994; Sterne et al., 2000), and imprecise meta-analytic effect size estimates are
obtained (van Aert et al., 2016; van Assen et al., 2015). Third, the selection of homogeneous data
sets could have led to the exclusion of data sets with severe publication bias. Imagine a data set
with a number of statistically significant effect sizes that were published and only two statistically
nonsignificant effect sizes obtained from unpublished research. The inclusion of the effect sizes
from the unpublished research causes heterogeneity in the data set, and therefore a data set with
potentially severe publication bias would be excluded from our study. Finally, publication bias is
less of an issue if the relationship of interest in a meta-analysis does not concern the main outcome
of the primary studies; publication is probably determined by the significance of a study's main
outcome rather than by the significance of a secondary outcome.
Although no convincing evidence for publication bias was observed in our study, we
agree with others (e.g., Banks, Kepes, & McDaniel, 2012; Field & Gillett, 2010; Sutton, 2005)
that publication bias should be routinely assessed in every meta-analysis. Moreover, we
recommend applying and reporting a set of publication bias methods in each meta-analysis,
because each method assesses publication bias in a different way, and one method may detect
publication bias in a meta-analysis whereas another method does not (Coburn & Vevea, 2015;
Kepes et al., 2012). Future research should focus on developing publication bias methods that
can examine publication bias in meta-analyses with a small number of effect sizes and
heterogeneity in true effect size. We are currently working on extending p-uniform such that it
can also deal with heterogeneous true effect size. Other promising developments are the recently
increased attention for selection models (e.g., Citkowicz & Vevea, 2017) and the development of
a Bayesian method to correct for publication bias (Guan & Vandekerckhove, 2015). Although
meta-analysts will greatly benefit from improved methods to assess publication bias, attention
should also be paid to registering studies to make unpublished research readily accessible (e.g.,
Dickersin, 2005). Such a register enables meta-analysts to also include unpublished research in
their meta-analyses, and will improve the validity of meta-analytic results.
References
Aguinis, H., Dalton, D. R., Bosco, F. A., Pierce, C. A., & Dalton, C. M. (2010). Meta-analytic
choices and judgment calls: Implications for theory building and testing, obtained effect
sizes, and scholarly impact. Journal of Management, 37(1), 5-38.
doi:10.1177/0149206310377113
Aytug, Z. G., Rothstein, H. R., Zhou, W., & Kern, M. C. (2012). Revealed or concealed?
Transparency of procedures, decisions, and judgment calls in meta-analyses.
Organizational Research Methods, 15(1), 103-133. doi:10.1177/1094428111403495
Bakker, M., van Dijk, A., & Wicherts, J. M. (2012). The rules of the game called psychological
science. Perspectives on Psychological Science, 7(6), 543-554.
doi:10.1177/1745691612459060
Banks, G. C., Kepes, S., & Banks, K. P. (2012). Publication bias: The antagonist of meta-analytic
reviews and effective policymaking. Educational Evaluation and Policy
Analysis, 34(3), 259-277. doi:10.3102/0162373712446144
Banks, G. C., Kepes, S., & McDaniel, M. A. (2012). Publication bias: A call for improved meta-analytic
practice in the organizational sciences. International Journal of Selection and
Assessment, 20(2), 182-197. doi:10.1111/j.1468-2389.2012.00591
Becker, B. J. (2005). Failsafe N or file-drawer number. In H. R. Rothstein, A. J. Sutton, & M.
Borenstein (Eds.), Publication bias in meta-analysis: Prevention, assessment and
adjustments (pp. 111-125). Chichester, UK: Wiley.
Begg, C. B., & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for
publication bias. Biometrics, 50(4), 1088-1101.
Borenstein, M. (2009). Effect sizes for continuous data. In H. Cooper, L. V. Hedges, & J. C.
Valentine (Eds.), The Handbook of Research Synthesis and Meta-Analysis (pp. 221-236).
New York: Russell Sage Foundation.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. Chichester, UK: John Wiley & Sons, Ltd.
Citkowicz, M., & Vevea, J. L. (2017). A parsimonious weight function for modeling publication
bias. Psychological Methods, 22(1), 28-41. doi:http://dx.doi.org/10.1037/met0000119
Coburn, K. M., & Vevea, J. L. (2015). Publication bias as a function of study characteristics.
Psychological Methods, 20(3), 310-330. doi:http://dx.doi.org/10.1037/met0000046
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304-1312.
Cooper, H., DeNeve, K., & Charlton, K. (1997). Finding the missing science: The fate of studies
submitted for review by a human subjects committee. Psychological Methods, 2(4), 447-452. doi:10.1037/1082-989X.2.4.447
Dickersin, K. (2005). Publication bias: Recognizing the problem understanding its origins and
scope, and preventing harm. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.),
Publication bias in meta-analysis: Prevention, assessment and adjustments (pp. 11-33).
Chichester, England: Wiley.
Dickersin, K., Chan, S., Chalmers, T. C., Sacks, H. S., & Smith, H. (1987). Publication bias and
clinical trials. Controlled Clinical Trials, 8(4), 343-353.
Duval, S., & Tweedie, R. (2000a). A nonparametric "trim and fill" method of accounting for
publication bias in meta-analysis. Journal of the American Statistical Association,
95(449), 89-98. doi:10.1080/01621459.2000.10473905
Duval, S., & Tweedie, R. (2000b). Trim and fill: A simple funnel-plot-based method of testing
and adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455-463.
Dwan, K., Altman, D. G., Arnaiz, J. A., Bloom, J., Chan, A.-W., Cronin, E., . . . Siegfried, N.
(2008). Systematic review of the empirical evidence of study publication bias and
outcome reporting bias. PLoS ONE, 3(8), e3081. doi:10.1371/journal.pone.0003081
Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by
a simple, graphical test. British Medical Journal, 315, 629-634.
Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries.
Scientometrics, 90(3), 891-904. doi:10.1007/s11192-011-0494-7
Fanelli, D., Costas, R., & Ioannidis, J. P. A. (2017). Meta-assessment of bias in science. Proc
Natl Acad Sci USA. doi:10.1073/pnas.1618569114
Fanelli, D., & Ioannidis, J. P. (2013). US studies may overestimate effect sizes in softer research.
Proc Natl Acad Sci USA, 110(37), 15031-15036.
doi:10.1073/pnas.1302997110
Ferguson, C. J., & Brannick, M. T. (2012). Publication bias in psychological science: Prevalence,
methods for identifying and controlling, and implications for the use of meta-analyses.
Psychological Methods, 17(1), 120-128. doi:10.1037/a0024445
Field, A. P., & Gillett, R. (2010). How to do a meta-analysis. British Journal of Mathematical
and Statistical Psychology, 63(3), 665-694. doi:10.1348/000711010X502733
Fleiss, J. L., & Berlin, J. A. (2009). Effect sizes for dichotomous data. In H. Cooper, L. V.
Hedges, & J. C. Valentine (Eds.), The Handbook of Research Synthesis and Meta-Analysis (pp. 237-253). New York: Russell Sage Foundation.
Francis, G. (2013). Replication, statistical consistency, and publication bias. Journal of
Mathematical Psychology, 57(5), 153-169. doi:10.1016/j.jmp.2013.02.003
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences:
Unlocking the file drawer. Science, 345(6203), 1502-1505. doi:10.1126/science.1255484
Franco, A., Simonovits, G., & Malhotra, N. (2016). Underreporting in psychology experiments:
Evidence from a study registry. Social Psychological and Personality Science, 7(1), 8-12.
doi:10.1177/1948550615598377
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher,
5(10), 3-8.
Guan, M., & Vandekerckhove, J. (2015). A Bayesian approach to mitigation of publication bias.
Psychonomic Bulletin and Review, Advance online publication. doi:10.3758/s13423-015-0868-6
Hedges, L. V., & Vevea, J. L. (2005). Selection method approaches. In H. R. Rothstein, A. J.
Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis: Prevention,
assessment, and adjustments. Chichester, UK: Wiley.
Higgins, J. P. T. (2008). Commentary: Heterogeneity in meta-analysis should be expected and
appropriately quantified. International Journal of Epidemiology, 37(5), 1158-1160.
doi:10.1093/ije/dyn204
Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring
inconsistency in meta-analyses. British Medical Journal, 327(7414), 557-560.
doi:10.1136/bmj.327.7414.557
Hopewell, S., Clarke, M., & Mallet, S. (2005). Grey literature and systematic reviews. In H. R.
Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis:
Prevention, assessment and adjustments (pp. 99-110). Chichester, UK: Wiley.
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8),
e124. doi:10.1371/journal.pmed.0020124
Ioannidis, J. P. (2008). Interpretation of tests of heterogeneity and bias in meta-analysis. Journal
of Evaluation in Clinical Practice, 14(5), 951-957. doi:10.1111/j.1365-2753.2008.00986.x
Ioannidis, J. P., & Trikalinos, T. A. (2007a). The appropriateness of asymmetry tests for
publication bias in meta-analyses: A large survey. Canadian Medical Association
Journal, 176(8), 1091-1096. doi:10.1503/cmaj.060410
Ioannidis, J. P., & Trikalinos, T. A. (2007b). An exploratory test for an excess of significant
findings. Clinical Trials, 4(3), 245-253. doi:10.1177/1740774507079441
Ioannidis, J. P., Trikalinos, T. A., & Zintzaras, E. (2006). Extreme between-study homogeneity
in meta-analyses could offer useful insights. Journal of Clinical Epidemiology, 59(10),
1023-1032. doi:10.1016/j.jclinepi.2006.02.013
Iyengar, S., & Greenhouse, J. B. (1988). Selection models and the file drawer problem.
Statistical Science, 3(1), 109-117. doi:10.1214/ss/1177013012
Jackson, D. (2006). The implications of publication bias for meta-analysis' other parameter.
Statistics in Medicine, 25(17), 2911-2921. doi:10.1002/sim.2293
Jones, C. W., Handler, L., Crowell, K. E., Keil, L. G., Weaver, M. A., & Platts-Mills, T. F.
(2013). Non-publication of large randomized clinical trials: Cross sectional analysis.
British Medical Journal, 347. doi:10.1136/bmj.f6104
Jürgens, G., & Graudal, N. A. (2004). Effects of low sodium diet versus high sodium diet on
blood pressure, renin, aldosterone, catecholamines, cholesterols, and triglyceride.
Cochrane Database of Systematic Reviews, (1). doi:10.1002/14651858.CD004022.pub2
Kepes, S., Banks, G. C., McDaniel, M., & Whetzel, D. L. (2012). Publication bias in the
organizational sciences. Organizational Research Methods, 15(4), 624-662.
doi:10.1177/1094428112452760
Kirkham, J. J., Dwan, K. M., Altman, D. G., Gamble, C., Dodd, S., Smyth, R., & Williamson, P.
R. (2010). The impact of outcome reporting bias in randomised controlled trials on a
cohort of systematic reviews. British Medical Journal, 340, c365. doi:10.1136/bmj.c365
Koenker, R. (2005). Quantile regression. Cambridge: Cambridge University Press.
Lane, D. M., & Dunlap, W. P. (1978). Estimating effect size: Bias resulting from the significance
criterion in editorial decisions. British Journal of Mathematical & Statistical Psychology,
31, 107-112.
Langan, D., Higgins, J. P. T., & Simmonds, M. (2016). Comparative performance of
heterogeneity variance estimators in meta-analysis: A review of simulation studies.
Research Synthesis Methods. doi:10.1002/jrsm.1198
Light, R. J., & Pillemer, D. B. (1984). Summing up: The science of reviewing research.
Cambridge, MA: Harvard University Press.
Loevinger, J. (1948). The technic of homogeneous tests compared with some aspects of scale
analysis and factor analysis. Psychological Bulletin, 45(6), 507-529.
Macaskill, P., Walter, S. D., & Irwig, L. (2001). A comparison of methods to detect publication
bias in meta-analysis. Statistics in Medicine, 20(4), 641-654.
Moreno, S. G., Sutton, A. J., Ades, A. E., Stanley, T. D., Abrams, K. R., Peters, J. L., & Cooper,
N. J. (2009). Assessment of regression-based methods to adjust for publication bias
through a comprehensive simulation study. BMC Medical Research Methodology, 9(2).
doi:10.1186/1471-2288-9-2
Nuijten, M. B., van Assen, M. A. L. M., Veldkamp, C. L. S., & Wicherts, J. M. (2015). The
replication paradox: Combining studies can decrease accuracy of effect size estimates.
Review of General Psychology, 19(2), 172-182. doi:10.1037/gpr0000034
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science.
Science, 349(6251). doi:10.1126/science.aac4716
Paule, R. C., & Mandel, J. (1982). Consensus values and weighting factors. Journal of Research
of the National Bureau of Standards, 87(5), 377-385.
Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2007). Performance of
the trim and fill method in the presence of publication bias and between-study
heterogeneity. Statistics in Medicine, 26(25), 4544-4562. doi:10.1002/sim.2889
Raudenbush, S. W. (2009). Analyzing effect sizes: Random-effects models. In H. Cooper, L. V.
Hedges, & J. C. Valentine (Eds.), The Handbook of Research Synthesis and Meta-Analysis (pp. 295-315). New York: Russell Sage Foundation.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological
Bulletin, 86(3), 638-641.
Rothstein, H. R., & Bushman, B. J. (2012). Publication bias in psychological science: Comment
on Ferguson and Brannick (2012). Psychological Methods, 17(1), 129-136.
doi:10.1037/a0027128
Rothstein, H. R., Sutton, A. J., & Borenstein, M. (2005). Publication bias in meta-analysis. In H.
R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis:
Prevention, assessment and adjustments. Chichester, UK: Wiley.
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve and effect size: Correcting for
publication bias using only significant results. Perspectives on Psychological Science,
9(6), 666-681. doi:10.1177/1745691614553988
Springate, D. A., & Kontopantelis, E. (2014). Cochrane_scraper: tools for downloading data
from the Cochrane Library of Systematic Reviews (Version 1.1.0). Retrieved from
https://github.com/DASpringate/Cochrane_scraper
Stanley, T. D., & Doucouliagos, H. (2014). Meta-regression approximations to reduce
publication selection bias. Research Synthesis Methods, 5(1), 60-78.
Stanley, T. D., Doucouliagos, H., & Ioannidis, J. P. (2017). Finding the power to reduce
publication bias. Statistics in Medicine. doi:10.1002/sim.7228
Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The
effect of the outcome of statistical tests on the decision to publish and vice versa. The
American Statistician, 49(1), 108-112. doi:10.2307/2684823
Sterne, J. A. C., Gavaghan, D., & Egger, M. (2000). Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature. Journal of Clinical
Epidemiology, 53(11), 1119-1129. doi:10.1016/S0895-4356(00)00242-0
Sterne, J. A. C., Harbord, R. M., Sutton, A. J., Jones, D. R., Ioannidis, J. P., Terrin, N., . . .
Higgins, J. P. T. (2011). Recommendations for examining and interpreting funnel plot
asymmetry in meta-analyses of randomised controlled trials. British Medical Journal,
343(7818), 1-8. doi:http://dx.doi.org/10.1136/bmj.d4002
Sutton, A. J. (2005). Evidence concerning the consequences of publication and related biases. In
H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis:
Prevention, assessment and adjustments (pp. 175-192). Chichester, UK: Wiley.
Terrin, N., Schmid, C. H., & Lau, J. (2005). In an empirical evaluation of the funnel plot,
researchers could not visually identify publication bias. Journal of Clinical
Epidemiology, 58(9), 894-901. doi:10.1016/j.jclinepi.2005.01.006
Terrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting for publication bias in the
presence of heterogeneity. Statistics in Medicine, 22(13), 2113-2126.
doi:10.1002/sim.1461
United States Code (2007) US Public Law 110-85: Food and Drug Administration Amendments
Act 2007. http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=110_cong_public_laws&docid=f:publ085.110.pdf.
van Aert, R. C. M., Wicherts, J. M., & van Assen, M. A. L. M. (2016). Conducting meta-analyses on p-values: Reservations and recommendations for applying p-uniform and p-curve. Perspectives on Psychological Science, 11(5), 713-729.
doi:10.1177/1745691616650874
van Assen, M. A. L. M., van Aert, R. C. M., & Wicherts, J. M. (2015). Meta-analysis using
effect size distributions of only statistically significant studies. Psychological Methods,
20(3), 293-309. doi:http://dx.doi.org/10.1037/met0000025
Veroniki, A. A., Jackson, D., Viechtbauer, W., Bender, R., Bowden, J., Knapp, G., . . . Salanti,
G. (2016). Methods to estimate the between-study variance and its uncertainty in meta-analysis. Research Synthesis Methods, 7(1), 55-79. doi:10.1002/jrsm.1164
Vevea, J. L., & Hedges, L. V. (1995). A general linear model for estimating effect size in the
presence of publication bias. Psychometrika, 60(3), 419-435. doi:10.1007/bf02294384
Vevea, J. L., & Woods, C. M. (2005). Publication bias in research synthesis: Sensitivity analysis
using a priori weight functions. Psychological Methods, 10(4), 428-443.
doi:10.1037/1082-989X.10.4.428
Viechtbauer, W. (2007). Approximate confidence intervals for standardized effect sizes in the
two-independent and two-dependent samples design. Journal of Educational and
Behavioral Statistics, 32(1), 39-60. doi:10.3102/1076998606298034
Footnotes
1. The search was conducted within the journal Psychological Bulletin for articles published between 2004 and 2014, using the search term “meta-analy*” in the article’s title while excluding articles with “comment”, “note”, “correction”, or “reply” in the title.
2. In order to verify the data extraction procedure, the data of a random sample of 10% of the included meta-analyses were extracted a second time by a different researcher. No major discrepancies were observed between the two extractions.
Appendix A
Table A1. Results of logistic regression predicting statistical significance of Egger’s regression
test with discipline and control variable number of effect sizes in a data set.
                        B (SE)           z-value (p-value)   OR      95% CI for OR
Intercept               -2.372 (.193)    -12.270 (< .001)    0.093   0.063; 0.134
Discipline              0.261 (.231)     1.128 (.130)        1.298   0.826; 2.051
Number of effect sizes  0.025 (.011)     2.418 (.016)        1.026   1.004; 1.048
Note. CDSR is the reference category for discipline. p-values for the intercept and the number of
effect sizes are two-tailed whereas the p-value for discipline is one-tailed. OR = odds ratio. CI =
profile likelihood confidence interval.
Table A2. Results of logistic regression predicting statistical significance of rank-correlation test
with discipline and control variable number of effect sizes in a data set.
                        B (SE)           z-value (p-value)   OR      95% CI for OR
Intercept               -2.856 (.232)    -12.310 (< .001)    0.057   0.036; 0.089
Discipline              0.652 (.266)     2.453 (.007)        1.919   1.150; 3.273
Number of effect sizes  0.026 (.011)     2.375 (.018)        1.026   1.004; 1.049
Note. CDSR is the reference category for discipline. p-values for the intercept and the number of
effect sizes are two-tailed whereas the p-value for discipline is one-tailed. OR = odds ratio. CI =
profile likelihood confidence interval.
Table A3. Results of logistic regression predicting statistical significance of p-uniform’s
publication bias test with discipline and control variable number of statistically significant effect
sizes in a data set.
                        B (SE)           z-value (p-value)   OR      95% CI for OR
Intercept               -1.689 (.276)    -6.110 (< .001)     0.185   0.106; 0.314
Discipline              -0.002 (.305)    -0.007 (.503)       0.998   0.551; 1.830
Number of effect sizes  -0.112 (.072)    -1.558 (.119)       0.894   0.765; 1.012
Note. CDSR is the reference category for discipline. p-values for the intercept and the number of
statistically significant effect sizes are two-tailed whereas the p-value for discipline is one-tailed.
OR = odds ratio. CI = profile likelihood confidence interval.
Table A4. Results of logistic regression predicting statistical significance of test of excess
significance with discipline and control variable number of effect sizes in a data set.
                        B (SE)           z-value (p-value)   OR      95% CI for OR
Intercept               -3.815 (.330)    -11.564 (< .001)    0.022   0.011; 0.040
Discipline              0.721 (.377)     1.913 (.028)        2.056   1.003; 4.461
Number of effect sizes  0.038 (.012)     3.170 (.002)        1.039   1.015; 1.065
Note. CDSR is the reference category for discipline. p-values for the intercept and the number of
effect sizes are two-tailed whereas the p-value for discipline is one-tailed. OR = odds ratio. CI =
profile likelihood confidence interval.
Appendix B
Table B1. Results of quantile regression with the median of p-uniform’s effect size estimates as
dependent variable and as independent variables discipline, I2-statistic, harmonic mean of the
standard error (standard error), proportion of statistically significant effect sizes in a data set
(Prop. sig. effect sizes), and number of effect sizes in a data set.
                          B (SE)            t-value (p-value)   95% CI
Intercept                 -0.067 (0.193)    -0.347 (.729)       -0.315; -0.045
Discipline                0.009 (0.069)     0.131 (.896)        -0.016; 0.070
I2-statistic              -0.001 (0.002)    -0.673 (.749)       -0.002; 0.001
Standard error            2.017 (0.551)     3.660 (<.001)       1.715; 2.182
Prop. sig. effect sizes   0.194 (0.103)     1.881 (.060)        0.078; 0.330
Number of effect sizes    -0.002 (0.002)    -0.911 (.363)       -0.005; 0.0002
Note. CDSR is the reference category for discipline. p-value for the I2-statistic is one-tailed
whereas the other p-values are two-tailed. CI = confidence interval based on inverting a rank test.