Publication bias in meta-analyses from psychology and medical research: A meta-meta-analysis

Robbie C. M. van Aert1, Jelte M. Wicherts1, and Marcel A. L. M. van Assen1,2
1 Tilburg University
2 Utrecht University

Author Note
Funding for this research was provided by the Berkeley Initiative for Transparency in the Social Sciences and the Laura and John Arnold Foundation. Correspondence concerning the article should be addressed to Robbie C. M. van Aert, Department of Methodology and Statistics, Tilburg University, PO Box 90153, 5000 LE Tilburg, the Netherlands.

Abstract
Publication bias is a substantial problem for the credibility of research in general and of meta-analyses in particular, as it yields overestimated effects and may suggest the existence of non-existing effects. Although there is consensus that publication bias is widespread, how strongly it affects different scientific literatures is currently less well known. We examine evidence of publication bias in a large-scale data set of meta-analyses published in Psychological Bulletin (representing meta-analyses from psychology) and the Cochrane Database of Systematic Reviews (representing meta-analyses from medical research). Psychology is compared to medicine because medicine has a longer history than psychology with respect to preregistration of studies as an effort to counter publication bias. The severity of publication bias and its inflating effects on effect size estimation were systematically studied by applying state-of-the-art publication bias tests and the p-uniform method for estimating effect size corrected for publication bias. Publication bias was detected in only a small proportion of the homogeneous subsets. The lack of evidence of publication bias was probably caused by the small number of effect sizes in most subsets, resulting in low statistical power for publication bias tests. Regression analyses showed that small-study effects were nevertheless present in the meta-analyses; that is, smaller studies were accompanied by larger effect sizes, possibly due to publication bias. Overestimation of effect size due to publication bias was similar for meta-analyses published in Psychological Bulletin and the Cochrane Database of Systematic Reviews.

Keywords: publication bias, meta-analysis, p-uniform, small-study effects

Meta-analysis is the standard technique for synthesizing different studies on the same topic, and is defined as "the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings" (Glass, 1976, p. 3). One of the greatest threats to the validity of meta-analytic results is publication bias, meaning that the publication of studies depends on the direction and statistical significance of the results (Rothstein, Sutton, & Borenstein, 2005). Publication bias generally leads to effect sizes being overestimated and to the publication of false-positive results (e.g., Lane & Dunlap, 1978; Nuijten, van Assen, Veldkamp, & Wicherts, 2015). Hence, publication bias results in false impressions about the magnitude and existence of an effect (van Assen, van Aert, & Wicherts, 2015). Evidence of publication bias exists in various research fields.
The social sciences literature consists of approximately 90% statistically significant results (Fanelli, 2012; Sterling, Rosenbaum, & Weinkam, 1995), which is not in line with the on average low statistical power of about 50% or less in, for instance, psychology (Bakker, van Dijk, & Wicherts, 2012; Cohen, 1990). Franco, Malhotra, and Simonovits (2014) examined publication bias in studies that received a grant within the social sciences, and concluded that 64.6% of the studies with statistically nonsignificant results were not written up (cf. Cooper, DeNeve, & Charlton, 1997). In a highly similar project within the psychological literature, Franco, Simonovits, and Malhotra (2016) showed that roughly 70% of the studies did not report all included outcomes, and that this selective reporting depended on the statistical significance of the outcomes. The Reproducibility Project: Psychology (Open Science Collaboration, 2015) also provided indirect evidence for the existence of publication bias within psychology. In this project, a selection of 100 psychological studies was replicated to examine their replicability and to identify characteristics that explain replication success. Only 39 studies were deemed to have been replicated, providing indirect evidence for the presence of publication bias.

More attention has been paid to publication bias within medical research than within the social sciences (Hopewell, Clarke, & Mallet, 2005). Medical research has a longer history of registering clinical trials before the research is conducted (e.g., Dickersin, Chan, Chalmers, Sacks, & Smith, 1987; Jones et al., 2013). As of 2007, the US Food and Drug Administration Amendments Act even requires US researchers to make the results of different types of clinical trials publicly available, independent of whether the results have been published. With registers like clinicaltrials.gov, it is easier for meta-analysts to search for unpublished research and to include it in their meta-analysis. Furthermore, it is straightforward to study publication bias by comparing the results reported in registers with the results reported in publications. Studies making this comparison show that statistically significant outcomes are more likely to be reported, and that clinical trials with statistically significant results have a higher probability of getting published (Dwan et al., 2008; Kirkham et al., 2010).

A number of methods exist to test for publication bias in a meta-analysis and to estimate a meta-analytic effect size corrected for publication bias. However, publication bias is often not routinely assessed in meta-analyses (Aguinis, Dalton, Bosco, Pierce, & Dalton, 2010; Aytug, Rothstein, Zhou, & Kern, 2012; Banks, Kepes, & Banks, 2012) or is analyzed with suboptimal methods that lack statistical power to detect it (e.g., Ioannidis, 2008; Ioannidis & Trikalinos, 2007a). It has been suggested to reexamine publication bias in published meta-analyses (Banks, Kepes, & Banks, 2012; Banks, Kepes, & McDaniel, 2012) by applying recently developed methods to better understand the severity and prevalence of publication bias in different fields. These novel methods have better statistical properties than existing publication bias tests and methods developed earlier to correct effect sizes for publication bias.
Several authors have recommended not relying on a single method for examining publication bias in a meta-analysis, but rather using and reporting a set of different publication bias methods (Coburn & Vevea, 2015; Kepes, Banks, McDaniel, & Whetzel, 2012). This so-called triangulation takes into account that none of the publication bias methods outperforms all the other methods under each and every condition; one method can signal publication bias in a meta-analysis whereas another one does not. Using a set of methods to assess the prevalence and severity of publication bias may yield a more balanced conclusion.

We will answer three research questions in this paper. The first research question concerns the prevalence of publication bias: "What is the prevalence of publication bias within meta-analyses about psychological and medical research?" (1a), and "Is publication bias more prevalent in psychology than in medical research after controlling for the number of effect sizes in a meta-analysis?" (1b). Medical research is selected as the comparison for psychology, because more attention has been paid within medicine than within psychology to publication bias in general (Hopewell et al., 2005) and study registration in particular (e.g., Dickersin et al., 1987; Jones et al., 2013). We also evaluate the amount of agreement between different publication bias methods. In the second research question, we examine whether effect size estimates of traditional meta-analysis, and estimates corrected for publication bias by the p-uniform method, can be predicted by characteristics of a meta-analysis: "What characteristics of a meta-analysis are predictors of the estimates of traditional meta-analysis and p-uniform?". The third research question also consists of two parts and concerns the overestimation in effect size caused by publication bias: "How much is effect size overestimated by publication bias?" (3a), and "What characteristics of a meta-analysis are predictors of the overestimation?" (3b).

The paper continues by providing an overview of publication bias methods. Next, we describe the criteria for a meta-analysis to be included in our study, and how the data of meta-analyses were extracted and analyzed. Subsequently, we provide the results of our analyses and conclude with a discussion.

Publication bias methods

Methods for examining publication bias can be divided into two groups: methods that assess or test the presence of publication bias and methods that estimate effect sizes corrected for publication bias. Methods that correct effect sizes for publication bias usually also provide a confidence interval and test the null hypothesis of no effect corrected for publication bias.

Assessing or testing publication bias

The most often used method for assessing publication bias is the fail-safe N method (Banks, Kepes, & McDaniel, 2012; Ferguson & Brannick, 2012). This method computes how many studies with an effect size of zero would have to be added to a meta-analysis to change a statistically significant summary effect size into a nonsignificant result (Rosenthal, 1979). Publication bias is deemed to be absent if the method suggests that a large number of effect sizes would have to be added to the meta-analysis to obtain a nonsignificant summary effect size.
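Rosenthal's version of the fail-safe N can be written down in a few lines. The sketch below combines the primary studies' z-values with Stouffer's method and solves for the number of zero-effect studies needed to make the combined test nonsignificant; the data and function name are purely illustrative and not taken from this paper.

```python
# A minimal sketch of Rosenthal's (1979) fail-safe N, assuming one z-value
# per primary study and a one-tailed alpha of .05.
import numpy as np
from scipy.stats import norm

def fail_safe_n(z_values, alpha=0.05):
    """Number of zero-effect studies needed to push the Stouffer-combined
    z-value below the one-tailed critical value."""
    z = np.asarray(z_values, dtype=float)
    k = len(z)
    z_crit = norm.ppf(1 - alpha)                # 1.645 for alpha = .05, one-tailed
    n_fs = (z.sum() ** 2) / (z_crit ** 2) - k   # Rosenthal's formula
    return max(0.0, n_fs)

# Example: ten studies with z-values around 2 yield a fail-safe N of roughly 150,
# which would (perhaps misleadingly) suggest that publication bias is no concern.
print(round(fail_safe_n([2.1, 1.8, 2.5, 2.0, 1.7, 2.2, 1.9, 2.4, 2.3, 2.0])))
```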
Applying the method is discouraged (Becker, 2005; Borenstein, Hedges, Higgins, & Rothstein, 2009, Chapter 30), because it computes the number of required effect sizes to reach a nonsignificant summary effect size under the assumption that all these added effect sizes are zero. Hence, the number of required effect sizes will be lower if the unpublished effect sizes are below zero and higher if they are above zero. Another drawback of the method is that it focuses on statistical significance and not on the magnitude of an effect that is of substantive importance (Borenstein et al., 2009, Chapter 30).

Another popular method is the funnel plot (Light & Pillemer, 1984). In a funnel plot, the effect size estimates of the studies included in a meta-analysis are presented on the x-axis and some measure of the effect sizes' precision is displayed on the y-axis. Figure 1 shows a funnel plot for a meta-analysis in the systematic review by Jürgens and Graudal (2004) studying the effect of sodium intake on different health outcomes. Solid circles indicate Hedges' g effect sizes included in this meta-analysis, and the standard error is presented on the y-axis of this funnel plot.

A funnel plot illustrates whether small-study effects are present, that is, whether there is a relationship between effect size and its precision. The funnel plot should be symmetric and resemble an inverted funnel in the absence of small-study effects, whereas a gap in the funnel indicates that small-study effects exist. Publication bias is only one of the causes of small-study effects, yet funnel plot asymmetry is often interpreted as evidence for publication bias in a meta-analysis. Small-study effects can also be caused by, for instance, heterogeneity in true effect size and poor methodological design of smaller studies (Egger, Smith, Schneider, & Minder, 1997). Another cause of funnel plot asymmetry arises when studies' sample sizes are determined by means of power analyses. A true effect size that is expected to be large results in a small required sample size in a power analysis. The observed effect size with an underlying large true effect size is most likely large as well, but estimated based on a small sample size. The opposite holds for a small true effect size: the required sample size in a power analysis is large, but the observed effect size is most likely small. Hence, a correlation between observed effect size and its precision is introduced by determining the sample size of a study based on a power analysis.

Figure 1. Funnel plot showing the relationship between observed effect size (Hedges' g) and its standard error in a meta-analysis by Jürgens and Graudal (2004) on the effect of sodium intake on noradrenaline. Solid circles are Hedges' g observed effect sizes and open circles are effect sizes imputed by the trim and fill method.

Evaluating whether small-study effects exist by eyeballing a funnel plot is rather subjective (Terrin, Schmid, & Lau, 2005). Hence, Egger's regression test (Egger et al., 1997) and the rank-correlation test (Begg & Mazumdar, 1994) were developed to test whether small-study effects are present in a meta-analysis. Egger's regression test fits a regression line through the observed effect sizes in the funnel plot, and evidence for small-study effects is obtained if the slope of this regression line is significantly different from zero.
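In the funnel-plot parameterization described above, the test amounts to a weighted regression of the observed effect sizes on their standard errors. The following sketch is a minimal illustration with simulated data; it is not the implementation used in this paper, and the variable names are made up.

```python
# A sketch of a funnel-plot asymmetry (Egger-type) regression test: observed
# effect sizes are regressed on their standard errors with inverse-variance
# weights, and the slope is tested against zero.
import numpy as np
import statsmodels.api as sm

def egger_test(effect_sizes, standard_errors):
    y = np.asarray(effect_sizes, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    X = sm.add_constant(se)                       # intercept + standard error
    fit = sm.WLS(y, X, weights=1.0 / se**2).fit()
    return fit.params[1], fit.pvalues[1]          # slope and its p-value

# Example with made-up data in which larger standard errors go with larger effects.
rng = np.random.default_rng(1)
se = rng.uniform(0.05, 0.4, size=15)
d = 0.1 + 1.5 * se + rng.normal(0, se)            # built-in small-study effect
print(egger_test(d, se))
```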
The rank-correlation test computes the rank correlation (Kendall's τ) between the studies' effect sizes and their precision to test for small-study effects. A drawback of these funnel plot asymmetry tests is that their statistical power to detect publication bias is low, especially if the number of effect sizes in a meta-analysis is small (Begg & Mazumdar, 1994; Sterne, Gavaghan, & Egger, 2000). Hence, these methods are recommended to be applied only to meta-analyses with ten or more effect sizes (Sterne et al., 2011).

The test of excess significance (TES) compares the number of statistically significant effect sizes in a meta-analysis with the expected number of statistically significant effect sizes (Ioannidis & Trikalinos, 2007b). More statistically significant results than expected indicates that some effect sizes are missing from the meta-analysis, possibly because of publication bias. The expected number of statistically significant effect sizes is computed based on the statistical power of the effect sizes in a meta-analysis. This statistical power is calculated for each effect size assuming that the true effect size is equal to the meta-analytic effect size estimate. The expected number of statistically significant results is then computed by taking the sum of the statistical power of each effect size. The observed and expected numbers of statistically significant results are compared by means of a χ2 test. Ioannidis and Trikalinos (2007b) recommend not applying the method if heterogeneity in true effect size is present. Moreover, the TES is known to be conservative (Francis, 2013).

Another, more recently developed method for examining publication bias is the p-uniform method (van Aert, Wicherts, & van Assen, 2016; van Assen et al., 2015). This method is based on the statistical principle that the distribution of p-values at the true effect size is uniform. Since in the presence of publication bias not all statistically nonsignificant effect sizes get published, p-uniform discards nonsignificant effect sizes and computes p-values conditional on being statistically significant. These conditional p-values should be uniformly distributed at the (fixed-effect) meta-analytic effect size estimate based on the significant and nonsignificant effect sizes, and deviation from the uniform distribution signals publication bias. P-uniform's publication bias test was compared to the TES in a simulation study (van Assen et al., 2015), and the statistical power of p-uniform was in general larger than that of the TES, except for conditions with a true effect size of zero and no severe publication bias. The simulation study also showed that the Type I error rate of p-uniform's publication bias test was too low if the true effect size was of medium size. Limitations of p-uniform's publication bias test are that it assumes a homogeneous true effect size, which is not very common (e.g., Higgins, 2008; Ioannidis, Trikalinos, & Zintzaras, 2006; Stanley & Doucouliagos, 2014), and that the method uses the available information inefficiently by discarding statistically nonsignificant effect sizes in a meta-analysis.
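Of the tests described in this subsection, the TES is the most computational. As a concrete illustration, the sketch below computes each study's power at the fixed-effect estimate, sums these powers to obtain the expected number of significant results, and compares observed and expected counts with a one-degree-of-freedom χ2 statistic. This is one common way to implement the comparison, offered only as an illustration under the stated assumptions (two-tailed tests at α = .05), not necessarily the exact implementation used in this paper.

```python
# A sketch of the test of excess significance (TES).
import numpy as np
from scipy.stats import norm, chi2

def tes(effect_sizes, standard_errors, alpha=0.05):
    y = np.asarray(effect_sizes, float)
    se = np.asarray(standard_errors, float)
    w = 1.0 / se**2
    theta = np.sum(w * y) / np.sum(w)                  # fixed-effect estimate
    z_crit = norm.ppf(1 - alpha / 2)
    # Power of each two-tailed z-test assuming the true effect equals theta.
    power = norm.sf(z_crit - theta / se) + norm.cdf(-z_crit - theta / se)
    expected = power.sum()
    observed = np.sum(np.abs(y / se) > z_crit)
    k = len(y)
    stat = (observed - expected) ** 2 / expected + \
           (observed - expected) ** 2 / (k - expected)
    return observed, expected, chi2.sf(stat, df=1)     # small p: excess significance
```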
Correcting effect sizes for publication bias

Publication bias tests provide evidence about whether publication bias is present in a meta-analysis. However, from an applied perspective the magnitude of the effect after correcting for publication bias is of more interest. Furthermore, methods that can estimate the effect size corrected for publication bias are desirable because publication bias tests in general have low statistical power (Moreno et al., 2009).

The most popular method to correct for publication bias in a meta-analysis is the trim and fill method (Duval & Tweedie, 2000a, 2000b). This method corrects for funnel plot asymmetry by trimming the most extreme effect sizes from one side of the funnel plot and filling these effect sizes in on the other side of the funnel plot to obtain funnel plot symmetry. The corrected effect size estimate is obtained by computing the meta-analytic estimate based on the observed and imputed effect sizes. Trim and fill can also be used to create a confidence interval and test the null hypothesis of no effect after adjusting for funnel plot asymmetry. The procedure is illustrated in Figure 1: the most extreme effect sizes from the right-hand side of the funnel plot are trimmed and imputed in the left-hand side of the funnel plot (open circles in Figure 1). A drawback of the trim and fill method is, as for the other methods based on the funnel plot, that it corrects for small-study effects, which do not necessarily reflect publication bias. Simulation studies have shown that the results of trim and fill cannot be trusted, because it incorrectly adds studies when none are missing (Peters, Sutton, Jones, Abrams, & Rushton, 2007; Rothstein & Bushman, 2012; Terrin, Schmid, Lau, & Olkin, 2003). Hence, trim and fill is discouraged because of its misleading results (Moreno et al., 2009; Simonsohn, Nelson, & Simmons, 2014; van Assen et al., 2015).

The PET-PEESE method (Stanley & Doucouliagos, 2014) is an extension of Egger's regression test that estimates the effect size in a meta-analysis corrected for small-study effects. PET-PEESE is based on a regression analysis in which the observed effect sizes are regressed on their standard errors by means of a weighted least squares regression with the inverse of the effect sizes' sampling variances as weights. If the intercept is not significantly different from zero, the estimate of the intercept is interpreted as the effect size estimate corrected for publication bias; the intercept reflects the effect size estimate in a study with a standard error of zero (i.e., a study with an infinite sample size). However, the intercept is biased if it is significantly different from zero (Stanley & Doucouliagos, 2014). In that case, the intercept of another weighted least squares regression analysis (with the inverse sampling variances as weights), in which the observed effect sizes are regressed on their sampling variances, is interpreted as the effect size estimate. Simulation studies have shown that PET-PEESE substantially reduces the overestimation caused by small-study effects, and that the method is unlikely to provide reliable results if it is based on fewer than 10 effect sizes (Stanley, Doucouliagos, & Ioannidis, 2017).

The p-uniform method can also be used for estimating the effect size (and a confidence interval) and testing the null hypothesis of no effect corrected for publication bias. P-uniform's effect size estimate is equal to the effect size for which the p-values conditional on being statistically significant are uniformly distributed.
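The estimation principle just described can be sketched with a normal model for the observed effect sizes. Under the Irwin-Hall-based estimator, the estimate is the value of the effect size at which the conditional p-values of the significant effects sum to half their number. The code below is an illustrative reimplementation under these assumptions (positive effects, two-tailed significance at α = .05, at least one significant effect size); it is not the authors' software.

```python
# A sketch of p-uniform's effect size estimation (Irwin-Hall estimator).
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def p_uniform_est(effect_sizes, standard_errors, alpha=0.05):
    y = np.asarray(effect_sizes, float)
    se = np.asarray(standard_errors, float)
    z_crit = norm.ppf(1 - alpha / 2)
    sig = y / se > z_crit                       # keep only significant effects
    y, se = y[sig], se[sig]
    k = len(y)

    def cond_p_sum(delta):
        # P(Y >= y | Y significant) under a normal model with mean delta,
        # computed on the log scale for numerical stability.
        log_q = norm.logsf((y - delta) / se) - norm.logsf((z_crit * se - delta) / se)
        return np.exp(log_q).sum() - k / 2.0

    return brentq(cond_p_sum, -10, 10)          # widen the bracket if needed
```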
A similar method that uses the distribution of conditional p-values for estimating effect size in the presence of publication bias is the p-curve method (Simonsohn et al., 2014). This method is similar to the p-uniform method, but differs slightly in implementation (for a description of the differences between the two methods, see van Aert et al., 2016). A limitation of p-uniform and p-curve is that effect sizes are overestimated in the presence of heterogeneity in true effect size (van Aert et al., 2016). Especially if the heterogeneity in true effect size is more than moderate (I2 > 50%; more than half of the total variance in effect size is caused by heterogeneity), both methods overestimate the effect size and their results should be interpreted as a sensitivity analysis. Another limitation of the methods is that they are inefficient, because they estimate the effect size based on only the statistically significant effect sizes. P-uniform and p-curve both outperformed trim and fill in simulation studies (Simonsohn et al., 2014; van Assen et al., 2015).

A selection model approach (Hedges & Vevea, 2005) can also be used for estimating effect size corrected for publication bias. A selection model makes assumptions about the distribution of effect sizes (i.e., the effect size model) and about the mechanism that determines which studies are observed (i.e., the selection model). The selection model specifies which effect sizes from the effect size model are observed. The effect size estimate (and confidence interval) corrected for publication bias is obtained by combining the effect size and selection models. Some selection model approaches estimate the selection model (Iyengar & Greenhouse, 1988; Vevea & Hedges, 1995) whereas others assume a known selection model (Vevea & Woods, 2005). Selection models are hardly used in practice, because the user has to make sophisticated assumptions and choices and a large number of effect sizes is needed to avoid convergence problems (Field & Gillett, 2010; Terrin et al., 2003).

Methods

Data

A large-scale data set was created with meta-analyses published between 2004 and 2014 in Psychological Bulletin and in the Cochrane Library to study the extent and prevalence of publication bias in psychology and medical research. Psychological Bulletin was selected to represent psychological meta-analyses, because this journal publishes many meta-analyses on topics from psychology. Meta-analyses published in the Cochrane Database of Systematic Reviews (CDSR) of the Cochrane Library were used to represent medical research. This database is a collection of peer-reviewed medical systematic reviews.

A requirement for the inclusion of a meta-analysis was that either fixed-effect or random-effects meta-analysis had to be used (i.e., no other meta-analytic methods such as meta-analytic structural equation modeling or multilevel meta-analysis). Another requirement was that sufficient information had to be available in the meta-analysis to compute the primary studies' standardized effect sizes and their sampling variances. The same effect size measure (e.g., correlation or standardized mean difference) as in the meta-analysis was used to compute the primary studies' effect sizes and their sampling variances. Formulas as described in Borenstein (2009), Fleiss and Berlin (2009), and Viechtbauer (2007) were used for computing the standardized effect sizes and their sampling variances.
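For a standardized mean difference, these formulas take the following form. The sketch computes Cohen's d, its sampling variance, and the small-sample-corrected Hedges' g from group means, standard deviations, and sample sizes; the input values are made up for illustration.

```python
# A sketch of computing a standardized mean difference and its sampling
# variance, following the formulas given by Borenstein (2009).
import numpy as np

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    s_pooled = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled                        # Cohen's d
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    j = 1 - 3 / (4 * (n1 + n2) - 9)                 # small-sample correction
    return j * d, j**2 * var_d                      # Hedges' g and its variance

print(hedges_g(m1=10.2, sd1=2.1, n1=40, m2=9.1, sd2=2.3, n2=38))
```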
For each included primary study we extracted, besides information on the effect size and its sampling variance, information on all categorical moderator variables. Based on these moderators, we created homogeneous data sets of effect sizes; that is, a homogeneous data set consists of the effect sizes that have the same scores on all the extracted moderators. Consequently, a meta-analysis could contain more than one data set if multiple homogeneous data sets could be extracted based on the included moderators. We only included data sets with less than moderate heterogeneity (I2 < 50%, i.e., less than half of the total variance in effect sizes is caused by residual between-study heterogeneity) (Higgins, Thompson, Deeks, & Altman, 2003), because none of the publication bias methods has desirable statistical properties in the case of extreme between-study heterogeneity in true effect size (Ioannidis & Trikalinos, 2007a, 2007b; van Aert et al., 2016; van Assen et al., 2015). Different effect size measures were sometimes used within a meta-analysis. This may cause heterogeneity in a meta-analysis, so the type of effect size measure was also used for creating homogeneous subgroups.

Publication bias tests have low statistical power (e.g., Begg & Mazumdar, 1994; Macaskill, Walter, & Irwig, 2001; van Assen et al., 2015) if the number of effect sizes in a meta-analysis is small. Hence, another criterion for including a data set in the analyses was that a data set should contain at least five effect sizes.

A search in Psychological Bulletin resulted in 137 meta-analyses that were published between 2004 and 2014 and that were eligible for inclusion. The corresponding author was contacted and a data request was made if insufficient information was available in the meta-analysis. Data of the meta-analyses were extracted by hand from the papers. A total of 370 data sets from meta-analyses published in Psychological Bulletin with less than moderate heterogeneity and five or more studies were eligible for inclusion in our study.

Data of all systematic reviews in the CDSR are stored online in a standardized format, and can therefore be extracted by an automated procedure. We used the Cochrane scraper developed by Springate and Kontopantelis (2014) to automatically extract data from systematic reviews. The total number of meta-analyses in the CDSR is larger than in Psychological Bulletin, so we drew a simple random sample without replacement of systematic reviews from the CDSR to represent medical research. Each systematic review in the database has an identification number. We sampled identification numbers, extracted data sets from the sampled systematic reviews, and included a data set in our study if I2 < 50% and the number of effect sizes in the data set was at least five. We continued sampling systematic reviews and extracting data sets until the same number of eligible data sets was obtained as was extracted from Psychological Bulletin (370).
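The two inclusion criteria (at least five effect sizes and I2 below 50%) can be expressed as a small filter. The sketch below computes I2 from the fixed-effect Q statistic and applies the rule described above; the function and variable names are illustrative, and this is not the extraction pipeline used by the authors.

```python
# A sketch of the I2-based inclusion rule for a homogeneous data set.
import numpy as np

def i_squared(effect_sizes, sampling_variances):
    y = np.asarray(effect_sizes, float)
    w = 1.0 / np.asarray(sampling_variances, float)
    mu_fe = np.sum(w * y) / np.sum(w)                 # fixed-effect estimate
    q = np.sum(w * (y - mu_fe) ** 2)                  # Cochran's Q
    df = len(y) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

def include(effect_sizes, sampling_variances, min_k=5, max_i2=50):
    return (len(effect_sizes) >= min_k and
            i_squared(effect_sizes, sampling_variances) < max_i2)
```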
The next section describes how the research questions were answered and how the variables were measured.

Analysis

Research question 1. The prevalence of publication bias in data sets from psychology and medical research was examined in research question 1 by using state-of-the-art publication bias methods. The same effect size measure as in the original meta-analyses was used to answer research question 1. Different publication bias tests were applied to each data set to examine the prevalence of publication bias. The following publication bias methods were included: p-uniform's publication bias test (van Aert et al., 2016; van Assen et al., 2015), Egger's test (Egger et al., 1997), the rank-correlation test (Begg & Mazumdar, 1994), and the TES (Ioannidis & Trikalinos, 2007b). Since publication bias tests have low statistical power, we followed the recommendation by Egger et al. (1997) to conduct two-tailed hypothesis tests with α = .1 for all methods. P-uniform's publication bias test can be applied to observed effect sizes in a data set that are either significantly smaller or significantly larger than zero. Hence, p-uniform was applied to the negative or positive effect sizes in a data set, depending on where the majority of statistically significant results was observed based on a two-tailed hypothesis test with α = .05. The estimator based on the Irwin-Hall distribution was used for p-uniform.

Research question 1a was answered by counting how often each method rejected the null hypothesis of no publication bias. Agreement among the publication bias tests was examined by computing Loevinger's H (Loevinger, 1948) for each combination of two methods. Loevinger's H is a statistic that quantifies the association between two dichotomous variables (here, statistically significant or not). The maximum value of Loevinger's H is 1, indicating a perfect association, whereas the minimum value depends on characteristics of the data. P-uniform could not be applied to data sets without statistically significant effect sizes, so the association between the results of p-uniform and the other methods was computed only for data sets with statistically significant effect sizes.

We studied whether publication bias was more prevalent in data sets from Psychological Bulletin than in data sets from CDSR (research question 1b) by conducting, for each publication bias test, a logistic regression with as dependent variable whether the publication bias test was statistically significant or not, and as independent variable a dummy variable indicating whether a data set was obtained from Psychological Bulletin or CDSR (reference category). The number of effect sizes in a data set (or the number of statistically significant effect sizes for p-uniform) was included as a control variable, because the statistical power of publication bias tests depends on the number of effect sizes in a data set, and the number of effect sizes in data sets from meta-analyses published in Psychological Bulletin and CDSR was expected to differ. We hypothesized that publication bias is more severe in data sets from Psychological Bulletin than from CDSR after controlling for the number of effect sizes in a data set (or the number of statistically significant effect sizes for p-uniform). This relationship was expected because medical researchers have been aware of the consequences of publication bias for longer, whereas broad awareness of publication bias arose only recently in psychology. Furthermore, psychology and medical research differ in characteristics such as sample sizes, cost of data collection, and flexibility in design that are considered relevant for publication bias (Ioannidis, 2005). One-tailed hypothesis tests with α = .05 were used for answering research question 1b.
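The logistic regressions for research question 1b have a simple structure, sketched below. The column names in the hypothetical data frame ('test_significant', 'psych_bulletin', 'n_effect_sizes') are made up for illustration; the model itself is just a binary logistic regression with the discipline dummy and the number of effect sizes as predictors.

```python
# A sketch of one of the research question 1b logistic regressions.
import statsmodels.api as sm

def rq1b_logit(df):
    # df is a pandas DataFrame with one row per data set and columns
    # 'test_significant' (0/1 outcome of a given publication bias test),
    # 'psych_bulletin' (1 = Psychological Bulletin, 0 = CDSR), and
    # 'n_effect_sizes' (control variable).
    X = sm.add_constant(df[['psych_bulletin', 'n_effect_sizes']])
    fit = sm.Logit(df['test_significant'], X).fit(disp=False)
    # The coefficient of interest is the discipline dummy; exponentiating it
    # gives the odds ratio reported in the paper's analyses.
    return fit.params['psych_bulletin'], fit.pvalues['psych_bulletin']
```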
Research question 2. Characteristics of the data sets were used to predict the estimates of traditional meta-analysis and the estimates corrected for publication bias. All effect sizes and their variances were transformed to Cohen's d by using the formulas in Borenstein (2009, section 12.5). If Cohen's d could not be computed based on the available information, Hedges' g was used as an approximation of Cohen's d (52 data sets).

Random-effects meta-analysis was used to estimate the effect size rather than fixed-effect meta-analysis. Random-effects meta-analysis does not assume a single fixed true effect underlying all effect sizes (Raudenbush, 2009), and was preferred over fixed-effect meta-analysis because a small amount of heterogeneity in true effect size could be present in the data sets. The Paule-Mandel estimator (Paule & Mandel, 1982) was used in the random-effects meta-analyses for estimating the amount of between-study heterogeneity, since this estimator has the best statistical properties in most situations (Langan, Higgins, & Simmonds, 2016; Veroniki et al., 2016). Effect sizes corrected for publication bias were estimated with p-uniform, because the number of effect sizes included in meta-analyses in medical research is often small: selection models were not used due to convergence problems in meta-analyses with a small number of effect sizes (e.g., Field & Gillett, 2010), and PET-PEESE is not recommended since it yields unreliable results if there are fewer than 10 observed effect sizes (Stanley et al., 2017). P-uniform was preferred over trim and fill, because applying trim and fill is discouraged (Moreno et al., 2009; Simonsohn et al., 2014; van Assen et al., 2015). For applying p-uniform, the estimator based on the Irwin-Hall distribution was used, and two-tailed hypothesis tests with α = .05 were assumed in the primary studies. The underlying true effect size in a data set can be either positive or negative. Hence, the dependent variables of these analyses were the absolute values of the estimates of random-effects meta-analysis and p-uniform.
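The random-effects estimation step described above can be sketched as follows: tau^2 is chosen with the Paule-Mandel criterion (the generalized Q statistic equals its expected value, k - 1), after which the pooled estimate and its standard error follow from inverse-variance weighting. This is an illustrative reimplementation, not the authors' code.

```python
# A sketch of a random-effects meta-analysis with the Paule-Mandel estimator.
import numpy as np
from scipy.optimize import brentq

def paule_mandel(effect_sizes, sampling_variances):
    y = np.asarray(effect_sizes, float)
    v = np.asarray(sampling_variances, float)
    k = len(y)

    def gen_q(tau2):
        w = 1.0 / (v + tau2)
        mu = np.sum(w * y) / np.sum(w)
        return np.sum(w * (y - mu) ** 2) - (k - 1)

    # tau^2 = 0 if even without between-study variance Q is below its expectation;
    # otherwise solve gen_q(tau^2) = 0 (widen the upper bound if necessary).
    tau2 = 0.0 if gen_q(0.0) <= 0 else brentq(gen_q, 0.0, 100.0)
    w = 1.0 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    return mu, np.sqrt(1.0 / np.sum(w)), tau2   # estimate, its SE, and tau^2
```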
Two separate weighted least squares (WLS) regressions were used, with the absolute values of the effect size estimates of random-effects meta-analysis and p-uniform as dependent variables. The inverse of the meta-analytic variances was selected as weights in the WLS regressions, because it is a function of both the sample sizes in the primary studies and the number of effect sizes in a data set. For the analysis with the meta-analytic effect size estimate as dependent variable all data sets were used, whereas only the data sets with statistically significant effect sizes were used for the analysis with p-uniform's estimate as dependent variable.

Five independent variables were included in the WLS regression with the random-effects meta-analytic estimate per data set as dependent variable. The independent variables and the hypothesized relationships are listed in Table 1. The meta-analytic effect size estimate was expected to be larger in data sets from Psychological Bulletin, because publication bias was expected to be more severe in psychology than in medical research. No straightforward relationship was hypothesized between the I2-statistic and the meta-analytic effect size estimate: the amount of heterogeneity can be either over- or underestimated depending on the extent of publication bias (Jackson, 2006). Hence, no main effect of I2 on the estimate of random-effects meta-analysis was expected. Primary studies' precision in a data set was operationalized by computing the harmonic mean of the primary studies' standard errors. A negative relationship was expected between primary studies' precision and the meta-analytic estimate, because less precise effect size estimates (i.e., larger standard errors) were expected to be accompanied by more bias and hence larger meta-analytic effect size estimates. The proportion of statistically significant effect sizes in a data set was expected to have a positive relationship with the meta-analytic effect size estimate, because statistically significant effect sizes are larger than statistically nonsignificant effect sizes based on the same sample size. The independent variable indicating the number of effect sizes in a data set was included to control for differences in the number of studies in a meta-analysis.

Table 1. Hypothesized relationships between the independent variables and the effect size estimate of random-effects meta-analysis.
Discipline: overestimation was expected to be more severe in data sets from Psychological Bulletin than from CDSR.
I2-statistic: no relationship was expected.
Primary studies' precision: a negative relationship was expected between primary studies' precision and the meta-analytic estimate.
Proportion of statistically significant effect sizes: a positive relationship was expected between the proportion of statistically significant effect sizes and the meta-analytic effect size estimate.
Number of effect sizes in a data set: no relationship was expected.

For the WLS regression with p-uniform's estimate as dependent variable, the same independent variables were included as for the WLS regression with the meta-analytic estimate as dependent variable (see Table 2). No hypothesis was specified for the effect of discipline. A relationship between discipline and the effect size estimate of p-uniform may exist, but it is unclear whether estimates after correcting for publication bias are larger for data sets from Psychological Bulletin or from CDSR. We expected a positive relationship for the I2-statistic, because p-uniform overestimates the true effect size in the presence of between-study heterogeneity (van Aert et al., 2016; van Assen et al., 2015). The primary studies' precision was operationalized in the same way as for the WLS regression with the meta-analytic estimate as dependent variable. A negative relationship between primary studies' precision and the meta-analytic effect size estimate is indicative of publication bias; since p-uniform corrects for publication bias, no relationship was expected between primary studies' precision and p-uniform's estimate. No specific direction of the effect of the proportion of statistically significant effect sizes in a data set was hypothesized. Many statistically significant effect sizes in a data set suggest that the studied effect size is large, that the sample sizes of the primary studies are large, or that there was severe publication bias in combination with many conducted (but not published) primary studies. These partly opposing effects may cancel each other out, or there may be a positive or negative relationship. The number of effect sizes in a data set was again included as a control variable, and no relationship was expected with p-uniform's effect size estimate.
Table 2. Hypothesized relationships between the independent variables and p-uniform's effect size estimate.
Discipline: no directional hypothesis was specified, because the direction was unclear.
I2-statistic: a positive relationship was expected between the I2-statistic and p-uniform's estimate.
Primary studies' precision: no relationship was expected.
Proportion of statistically significant effect sizes: no directional hypothesis was specified, because the direction was unclear.
Number of effect sizes in a data set: no relationship was expected.

The effect size estimate of p-uniform can become extremely positive or negative if there are multiple p-values just below the α-level (van Aert et al., 2016; van Assen et al., 2015). These outliers may affect the results of the regression analysis with p-uniform's estimate as dependent variable. Hence, we used quantile regression (Koenker, 2005) as a sensitivity analysis, because this procedure is less influenced by outliers in the dependent variable. In the quantile regression, the median of p-uniform's estimates was regressed on the independent variables.

Research question 3. Estimates of random-effects meta-analysis and p-uniform obtained for answering research question 2 were used to examine the overestimation caused by publication bias. It is possible that the estimates of the meta-analysis and p-uniform have opposite signs (i.e., a negative estimate of p-uniform and a positive meta-analytic estimate, or the other way around). An effect size estimate of p-uniform in the opposite direction to the meta-analytic estimate is often unrealistic, because it suggests that, for instance, a negative true effect size results in multiple positive observed effect sizes. Effect size estimates in opposing directions by meta-analysis and p-uniform may be caused by many p-values just below the α-level. Hence, p-uniform's estimate was set equal to zero in these situations. Setting p-uniform's estimate to zero when its sign is opposite to that of meta-analysis is in line with the recommendation in van Aert et al. (2016).

A new variable Y was created to reflect the overestimation of random-effects meta-analysis when compared with p-uniform. If the meta-analytic estimate was larger than zero, Y = MA - corrected, where "MA" is the meta-analytic estimate and "corrected" is the estimate of p-uniform. If the meta-analytic estimate was smaller than zero, Y = -MA + corrected. Variable Y was zero if the estimates of p-uniform and meta-analysis were the same, positive if p-uniform's effect size estimate was closer to zero than the meta-analytic estimate (if they originally had the same sign), and negative if p-uniform's estimate was farther away from zero than the meta-analytic estimate (if they originally had the same sign). The Y variable was computed for each data set with statistically significant effect sizes, and we computed the mean and median of Y for data sets from Psychological Bulletin and CDSR in order to get an overall estimate of the amount of overestimation in effect size (research question 3a).
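The construction of Y described above amounts to a few lines of code. The sketch below first truncates p-uniform estimates with a sign opposite to the meta-analytic estimate and then computes the signed difference; the input values are made up.

```python
# A sketch of the overestimation variable Y.
import numpy as np

def overestimation_y(ma_est, puniform_est):
    ma = np.asarray(ma_est, float)
    corrected = np.asarray(puniform_est, float)
    # Truncate p-uniform estimates with a sign opposite to the meta-analytic estimate.
    corrected = np.where(np.sign(corrected) != np.sign(ma), 0.0, corrected)
    # Y = MA - corrected for positive MA, and -MA + corrected for negative MA.
    return np.where(ma > 0, ma - corrected, -ma + corrected)

y = overestimation_y([0.45, -0.30, 0.20], [0.20, -0.35, -0.10])
print(y.mean(), np.median(y))    # the research question 3a summaries
```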
In research question 3b, WLS regression (with the inverse of the variance of the random-effects meta-analytic estimate as weights) was used to examine the relationship between Y and the same independent variables as in research question 2. The hypothesized relationships between the independent variables and Y are summarized in Table 3. A larger value of Y was expected for data sets from Psychological Bulletin than from CDSR, because overestimation was expected to be more severe in psychology than in medicine. We hypothesized a negative relationship between the I2-statistic and Y: p-uniform overestimates the effect size in the presence of between-study heterogeneity (van Aert et al., 2016; van Assen et al., 2015), resulting in a smaller difference between the estimates of p-uniform and random-effects meta-analysis. Primary studies' precision was hypothesized to be negatively related to Y, because overestimation of the meta-analytic estimate was expected to decrease as a function of primary studies' precision. No specific directions were expected for the relationships of the number of effect sizes in a data set and the proportion of statistically significant effect sizes in a data set with Y. Although a positive effect of this proportion on the meta-analytic effect size estimate was expected, the effect of the proportion on p-uniform's estimate was unclear. The number of effect sizes in a data set was incorporated in the WLS regression as a control variable.

Table 3. Hypothesized relationships between the independent variables and the overestimation in effect size when comparing random-effects meta-analysis with p-uniform (Y).
Discipline: a larger value of Y was expected for data sets from Psychological Bulletin than from CDSR.
I2-statistic: a negative relationship was expected between the I2-statistic and Y.
Primary studies' precision: a negative relationship was expected between primary studies' precision and Y.
Proportion of statistically significant effect sizes: no directional hypothesis was specified, because the direction was unclear.
Number of effect sizes in a data set: no relationship was expected.

Estimates of p-uniform that were in the opposite direction to the estimate of traditional meta-analysis were set equal to zero before computing the Y variable. This truncation may have affected the results of the WLS regression. Hence, quantile regression (Koenker, 2005) was used as a sensitivity analysis, with the median of Y as dependent variable instead of the mean of Y as in the WLS regression. Since quantile regression uses the median of Y, its results were not affected by truncating estimates of p-uniform to zero.
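The sensitivity analysis just described is a median regression of Y on the same predictors. The sketch below uses statsmodels' quantile regression with hypothetical column names; it is an illustration of the analysis, not the authors' script.

```python
# A sketch of the median (quantile) regression used as a sensitivity analysis.
import statsmodels.api as sm

def median_regression(df):
    # df is a pandas DataFrame with one row per data set; column names are illustrative.
    predictors = ['psych_bulletin', 'i_squared', 'harmonic_mean_se',
                  'prop_significant', 'n_effect_sizes']
    X = sm.add_constant(df[predictors])
    # q=0.5 models the conditional median of Y, so extreme or truncated
    # p-uniform estimates have less influence than in the WLS means regression.
    return sm.QuantReg(df['y_overestimation'], X).fit(q=0.5)
```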
Results

The total number of included data sets was 740 (370 representing Psychological Bulletin and 370 representing CDSR). Figure 2 shows the number of effect sizes per data set for the included data sets from Psychological Bulletin (a) and CDSR (b). Both panels show that no data sets with fewer than five effect sizes were included. The number of effect sizes per data set was slightly larger for Psychological Bulletin (mean 9.2, first quartile 5, third quartile 9) than for CDSR (mean 7.6, first quartile 5, third quartile 8). The mean number of statistically significant effect sizes, based on a two-tailed hypothesis test with α = .05, was 2.6 (30.1%) for data sets from Psychological Bulletin and 1.2 (16%) for data sets from CDSR. A requirement for applying p-uniform is that statistically significant effect sizes are present in a meta-analysis. This was not the case for 38.5% of the data sets (27.3% of the data sets from Psychological Bulletin and 49.7% of the data sets from CDSR), and p-uniform could not be applied to these data sets.

Figure 2. Histograms of the number of effect sizes in data sets from Psychological Bulletin (a) and CDSR (b).

Research question 1

Table 4 shows the results of applying Egger's regression test, the rank-correlation test, p-uniform's publication bias test, and the TES to examine the prevalence of publication bias in the meta-analyses. The panels in Table 4 illustrate how often each publication bias test was statistically significant (marginal frequencies and percentages) and the agreement among the methods (joint frequencies). Agreement among the methods was quantified by means of Loevinger's H (bottom-right cell of each panel).

Table 4. Results of applying Egger's regression test, the rank-correlation test, p-uniform's publication bias test, and the test of excess significance (TES) to examine the prevalence of publication bias in meta-analyses from Psychological Bulletin and the Cochrane Database of Systematic Reviews. H denotes Loevinger's H describing the association between two methods. In each panel, rows refer to the first-mentioned method and columns to the second.

Egger x rank-correlation
            Not sig.       Sig.          Total
Not sig.    608            33            641; 87.9%
Sig.        51             37            88; 12.1%
Total       659; 90.4%     70; 9.6%      H = .464

Egger x p-uniform
            Not sig.       Sig.          Total
Not sig.    324            44            368; 83.6%
Sig.        64             8             72; 16.4%
Total       388; 88.2%     52; 11.8%     H = -.012

Egger x TES
            Not sig.       Sig.          Total
Not sig.    625            27            652; 88.1%
Sig.        80             8             88; 11.9%
Total       705; 95.3%     35; 4.7%      H = .124

Rank-correlation x p-uniform
            Not sig.       Sig.          Total
Not sig.    350            44            394; 89.5%
Sig.        38             8             46; 10.5%
Total       388; 88.2%     52; 11.8%     H = .063

Rank-correlation x TES
            Not sig.       Sig.          Total
Not sig.    630            29            659; 90.4%
Sig.        64             6             70; 9.6%
Total       694; 95.2%     35; 4.8%      H = .083

p-uniform x TES
            Not sig.       Sig.          Total
Not sig.    361            27            388; 88.2%
Sig.        44             8             52; 11.8%
Total       405; 92%       35; 8%        H = .125

No strong evidence for publication bias was observed in the data sets. Publication bias was detected in at most 88 data sets (11.9%) by Egger's regression test. The TES and the rank-correlation test were statistically significant in 35 (4.7%) and 70 (9.6%) data sets, respectively. In the subset of data sets with at least one statistically significant effect size, p-uniform's publication bias test detected publication bias in 52 data sets (11.8%). This was more than the TES (35; 8%) and the rank-correlation test (46; 10.5%) and less than Egger's regression test (72; 16.4%) when these methods were applied to this subset of data sets. Note that the rank-correlation test could not be applied to all 740 data sets, because there was no variation in the observed effect sizes in 11 data sets. Associations among the methods were low (H < .13), except for the association between Egger's regression test and the rank-correlation test (H = .464). This association was expected, since both methods examine the relationship between the observed effect size and some measure of its precision.
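For reference, the Loevinger's H values in Table 4 can be obtained from the 2x2 cross-tables as the covariance of the two significance indicators divided by its maximum given the marginal proportions. The sketch below is a minimal illustration of that computation under this assumption; it is not the authors' code.

```python
# A sketch of Loevinger's H for two dichotomous test outcomes.
import numpy as np

def loevinger_h(test_a_sig, test_b_sig):
    a = np.asarray(test_a_sig, float)   # 1 = significant, 0 = not significant
    b = np.asarray(test_b_sig, float)
    p_a, p_b, p_ab = a.mean(), b.mean(), (a * b).mean()
    cov = p_ab - p_a * p_b               # observed covariance
    cov_max = min(p_a, p_b) - p_a * p_b  # maximum covariance given the marginals
    return cov / cov_max                 # 1 indicates perfect association
```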
We studied in research question 1b whether publication bias was more prevalent in data sets from Psychological Bulletin than in data sets from CDSR after controlling for the number of effect sizes (or, for p-uniform, the number of statistically significant effect sizes) in a meta-analysis. Publication bias was more prevalent in data sets from Psychological Bulletin if the results of the rank-correlation test (odds ratio = 1.919, z = 2.453, one-tailed p-value = .007) and the TES (odds ratio = 2.056, z = 1.913, one-tailed p-value = .028) were used as dependent variable, but not for Egger's regression test (odds ratio = 1.298, z = 1.128, one-tailed p-value = .259) and p-uniform (odds ratio = 0.998, z = -0.007, one-tailed p-value = .503). Tables with the results of these logistic regression analyses are reported in Appendix A. Note, however, that if we control for the number of tests performed (i.e., 4) by means of the Bonferroni correction, only the result of the rank-correlation test is statistically significant (p = .007 < .05/4 = .0125). All in all, we conclude that the evidence on differences in the prevalence of publication bias across the two disciplines is not convincing.

Research question 2

Absolute values of the effect size estimates of random-effects meta-analysis and p-uniform were predicted based on characteristics of the data sets. One-tailed hypothesis tests were used in case of a directional hypothesis (see Table 2). Table 5 presents the results of the WLS regression with the absolute values of the estimates of random-effects meta-analysis as dependent variable. The variables in the model explain 64.5% of the variance in the estimates of random-effects meta-analysis (R2 = 0.645; F(5, 734) = 266.5, p < .001). The absolute value of the meta-analytic estimate was 0.016 larger for data sets from Psychological Bulletin compared to CDSR (t(733) = 1.714, one-tailed p-value = .043). The I2-statistic had an unexpected negative effect on the absolute value of the meta-analytic estimate (B = -0.001, t(733) = -4.143, two-tailed p-value < .001). The harmonic mean of the standard error had a positive effect (B = 1.095, t(733) = 23.512, one-tailed p-value < .001), suggesting that data sets with less precise estimates resulted in more extreme meta-analytic estimates. As hypothesized, a positive effect was observed for the proportion of statistically significant effect sizes on the absolute value of the meta-analytic estimate (B = 0.479, t(733) = 32.021, one-tailed p-value < .001).

Table 5. Results of weighted least squares regression with the absolute value of the random-effects meta-analysis effect size estimate as dependent variable and as independent variables discipline, the I2-statistic, the harmonic mean of the standard error (Standard error), the proportion of statistically significant effect sizes in a data set (Prop. sig. effect sizes), and the number of effect sizes in a data set.

                           B (SE)              t-value (p-value)    95% CI
Intercept                  -0.139 (0.013)      -10.650 (<.001)      -0.164; -0.113
Discipline                 0.016 (0.010)       1.714 (.043)         -0.002; 0.035
I2-statistic               -0.001 (0.0003)     -4.143 (<.001)       -0.002; -0.001
Standard error             1.095 (0.047)       23.512 (<.001)       1.003; 1.186
Prop. sig. effect sizes    0.479 (0.015)       32.021 (<.001)       0.450; 0.509
Number of effect sizes     -0.0003 (0.0003)    -1.057 (.291)        -0.001; 0.0002

Note. CDSR is the reference category for discipline. p-values for discipline, the harmonic mean of the standard error, and the proportion of significant effect sizes in a data set are one-tailed whereas the other p-values are two-tailed. CI = confidence interval.

Table 6 shows the results of the WLS regression with the absolute value of p-uniform's estimate as dependent variable.
P-uniform could only be applied to 455 data sets (61.5%), because only these data sets contained at least one statistically significant effect size. The variance of the 455 estimates of p-uniform (73.070) was substantially larger than the variance of the estimates of random-effects meta-analysis (0.204) when applied to these 455 data sets. This was mainly caused by some extreme effect size estimates of p-uniform (i.e., four estimates were larger than 10 and seven estimates were smaller than -10). The proportion of explained variance in p-uniform's estimate was R2 = .015 (F(5, 449), p = .239). We hypothesized a positive relationship for the I2-statistic, but such a relationship was not observed (B = 0.012, t(448) = 0.843, one-tailed p-value = .200). This may indicate that p-uniform's estimates were not severely overestimated because of heterogeneity in true effect size. No effects of the other independent variables on the absolute value of p-uniform's estimate were observed.

Quantile regression was used as a sensitivity analysis to examine whether the results were distorted by extreme effect size estimates of p-uniform (see Table B1). The effect of the harmonic mean of the standard error was lower in the quantile regression but, in contrast to the WLS regression, statistically significant (B = 2.017, t(448) = 3.660, two-tailed p-value < .001), and the effect of the proportion of statistically significant effect sizes was substantially lower in the quantile regression (B = 0.194, t(448) = 1.881, two-tailed p-value = .061).

Table 6. Results of weighted least squares regression with the absolute value of p-uniform's effect size estimate as dependent variable and as independent variables discipline, the I2-statistic, the harmonic mean of the standard error (Standard error), the proportion of statistically significant effect sizes in a data set (Prop. sig. effect sizes), and the number of effect sizes in a data set.

                           B (SE)             t-value (p-value)   95% CI
Intercept                  1.149 (0.794)      1.447 (.149)        -0.411; 2.709
Discipline                 -0.185 (0.550)     -0.336 (.737)       -1.265; 0.896
I2-statistic               0.012 (0.014)      0.843 (.200)        -0.016; 0.040
Standard error             3.124 (2.780)      1.124 (.262)        -2.339; 8.587
Prop. sig. effect sizes    -1.454 (0.867)     -1.678 (.094)       -3.158; 0.249
Number of effect sizes     -0.021 (0.015)     -1.404 (.161)       -0.050; 0.008

Note. CDSR is the reference category for discipline. The p-value for the I2-statistic is one-tailed whereas the other p-values are two-tailed. CI = confidence interval.

Research question 3

We examined the overestimation in effect size by creating the variable Y, which reflects the overestimation of random-effects meta-analysis when compared with p-uniform's estimate. Results indicated that the overestimation was less than d = 0.05 for data sets from Psychological Bulletin (mean = 0.012, median = 0.048) and CDSR (mean = 0.015, median = 0.019), and that differences between the estimates for data sets from Psychological Bulletin and CDSR were negligible (research question 3a).

Table 7 presents the results of the WLS regression with Y as dependent variable and characteristics of the data sets as independent variables. The independent variables explained 15.3% of the variance of Y (F(5, 449) ≈ 16, p < .001). As the descriptive results above already illustrate, the overestimation in data sets from Psychological Bulletin was not significantly larger than in data sets from CDSR (B = -0.026, t(448) = -1.331, one-tailed p-value = .908).
A negative effect of the I2-statistic was observed (B = -0.003, t(448) = -5.255, one-tailed p-value < .001). The relationship between the harmonic mean of the standard error and Y was in the expected direction (B = 0.218, t(448) = 2.172, one-tailed p-value = .015), and the proportion of statistically significant effect sizes in a data set had a positive effect on Y (B = 0.161, t(448) = 5.140, two-tailed p-value < .001).

Table 7. Results of weighted least squares regression with the effect size overestimation of random-effects meta-analysis when compared to p-uniform (Y) as dependent variable and as independent variables discipline, the I2-statistic, the harmonic mean of the standard error (Standard error), the proportion of statistically significant effect sizes in a data set (Prop. sig. effect sizes), and the number of effect sizes in a data set.

                           B (SE)             t-value (p-value)   95% CI
Intercept                  -0.026 (0.029)     -0.923 (.357)       -0.083; 0.030
Discipline                 -0.026 (0.020)     -1.331 (.908)       -0.066; 0.013
I2-statistic               -0.003 (0.001)     -5.255 (<.001)      -0.004; -0.002
Standard error             0.218 (0.100)      2.172 (.015)        0.021; 0.416
Prop. sig. effect sizes    0.161 (0.031)      5.140 (<.001)       0.100; 0.223
Number of effect sizes     -0.002 (0.001)     -3.671 (<.001)      -0.003; -0.001

Note. CDSR is the reference category for discipline. p-values for discipline, the I2-statistic, and the harmonic mean of the standard error are one-tailed whereas the other p-values are two-tailed. CI = confidence interval.

The estimate of p-uniform was truncated to zero in 130 data sets before computing Y, in order to deal with unlikely cases where p-uniform's estimate would be in the opposite direction to the estimate of random-effects meta-analysis. Hence, quantile regression with the median of Y as dependent variable was conducted to examine whether the results of the WLS regression were affected by this truncation (see Table 8). The effects of the I2-statistic (B = -0.0003, t(448) = -0.547, one-tailed p-value = .292) and the proportion of statistically significant effect sizes in a data set (B = 0.021, t(448) = 0.463, two-tailed p-value = .644) were no longer statistically significant. Although the effect of the harmonic mean of the standard error was larger in the quantile regression, it was no longer statistically significant (B = 0.346, t(448) = 1.511, one-tailed p-value = .066). Consequently, using the median of Y as dependent variable in quantile regression instead of the mean of Y in WLS regression yields different results and conclusions.

Table 8. Results of quantile regression with the median of the effect size overestimation of random-effects meta-analysis when compared to p-uniform (Y) as dependent variable and as independent variables discipline, the I2-statistic, the harmonic mean of the standard error (Standard error), the proportion of statistically significant effect sizes in a data set (Prop. sig. effect sizes), and the number of effect sizes in a data set.

                           B (SE)              t-value (p-value)   95% CI
Intercept                  -0.009 (0.078)      -0.116 (.908)       -0.012; 0.083
Discipline                 -0.005 (0.026)      -0.191 (.576)       -0.021; 0.021
I2-statistic               -0.0003 (0.0005)    -0.547 (.292)       -0.001; -0.0001
Standard error             0.346 (0.229)       1.511 (.066)        0.204; 0.476
Prop. sig. effect sizes    0.021 (0.045)       0.463 (.644)        -0.002; 0.062
Number of effect sizes     -0.003 (0.001)      -1.867 (.063)       -0.004; -0.002

Note. CDSR is the reference category for discipline. p-values for discipline, the I2-statistic, and the harmonic mean of the standard error are one-tailed whereas the other p-values are two-tailed.
Conclusion and discussion

Publication bias is a major threat to the validity of meta-analyses. It results in overestimated effect sizes in primary studies, which in turn also biases the meta-analytic results (e.g., Lane & Dunlap, 1978; van Assen et al., 2015). Evidence for publication bias has been observed in many research fields (e.g., Fanelli, 2012; Franco et al., 2014; Franco et al., 2016; Sterling et al., 1995), and different methods have been developed to examine publication bias in meta-analyses (for an overview see Rothstein et al., 2005). We studied the prevalence of publication bias and the overestimation caused by it in a large number of published meta-analyses within psychology and medical research by applying publication bias methods to homogeneous data sets drawn from these meta-analyses. Characteristics of the meta-analyses were also used to predict meta-analytic estimates and overestimation of effect size.

No convincing evidence for the prevalence of publication bias was observed in our large-scale data set. Publication bias was detected in at most 88 data sets (11.9%) by Egger's regression test, in 70 data sets (9.6%) by the rank-correlation test, in 35 data sets (4.7%) by the TES, and in 52 data sets (11.8%) by p-uniform's publication bias test. Results also show that publication bias tests often do not agree on the presence of publication bias in a data set. Furthermore, no univocal evidence was obtained for publication bias being more severe in data sets representing psychology than in data sets representing medical research. A WLS regression with the random-effects meta-analytic estimates as dependent variable revealed a negative effect of primary studies' precision on the meta-analytic estimate. This suggests that small-study effects were present, and one of the causes of small-study effects is publication bias. We only analyzed homogeneous data sets, which makes it less likely that the small-study effects are caused by, for instance, methodological differences between the effect sizes in a data set rather than by publication bias. Small-study effects were also observed in other meta-meta-analyses by Fanelli and Ioannidis (2013) and Fanelli, Costas, and Ioannidis (2017).

The same independent variables used for predicting the random-effects meta-analytic effect size estimate were also used in a WLS regression with p-uniform's estimate as dependent variable. None of the independent variables had a statistically significant effect on p-uniform's effect size estimate. The explained variance in the WLS regression with p-uniform's estimate as dependent variable (1.5%) was substantially lower than with the estimate of random-effects meta-analysis as dependent variable (64.5%). This difference in explained variance was probably caused by the large variance in p-uniform's effect size estimates across data sets. The variance in these estimates was large because p-uniform's estimates were sometimes extremely positive or negative, caused by observed effect sizes with p-values just below the α-level. We also studied the overestimation in effect size due to publication bias by comparing the estimates of random-effects meta-analysis and p-uniform.
Primary studies' precision and the proportion of statistically significant effect sizes in a data set both had an effect on the overestimation of effect size in a WLS regression, but these effects were no longer statistically significant in a sensitivity analysis.

The results of our paper are not in line with previous research showing that publication bias is omnipresent in science (e.g., Fanelli, 2012; Franco et al., 2014; Franco et al., 2016; Sterling et al., 1995). There are three indications for the absence of severe publication bias in the meta-analyses included in our study. First, many statistically nonsignificant effect sizes were included in the meta-analyses. Only 30.1% of the observed effect sizes were statistically significant in the meta-analyses from Psychological Bulletin, and 16% of the observed effect sizes were statistically significant in the meta-analyses from CDSR. This is, for instance, substantially lower than the approximately 90% statistically significant results in the social sciences literature (Fanelli, 2012; Sterling et al., 1995). Second, publication bias was often not detected in the included meta-analyses. Finally, overestimation in effect size caused by publication bias was hardly observed in the included meta-analyses.

Publication bias and the overestimation caused by it could also have gone undetected for a variety of reasons. First, authors included many unpublished studies in their meta-analyses, which decreased the detectability of publication bias. Second, publication bias methods were applied to subsets of the primary studies included in a meta-analysis in order to create more homogeneous data sets. Creating these smaller data sets was necessary because publication bias methods' statistical properties deteriorate if heterogeneity in true effect size is moderate or large. However, publication bias methods were consequently applied to data sets with a small number of effect sizes. Publication bias tests suffer from low statistical power in these conditions (Begg & Mazumdar, 1994; Sterne et al., 2000), and imprecise meta-analytic effect size estimates are obtained (van Aert et al., 2016; van Assen et al., 2015). Third, the selection of homogeneous data sets could have led to the exclusion of data sets with severe publication bias. Imagine a data set with a number of statistically significant effect sizes that were published, and only two statistically nonsignificant effect sizes that were obtained from unpublished research. The inclusion of the effect sizes from the unpublished research causes heterogeneity in the data set, and therefore a data set with potentially severe publication bias would have been excluded from our study. Finally, publication bias is less of an issue if the relationship of interest in a meta-analysis is not the main outcome of the primary studies. Publication probably depends on the significance of a primary study's main outcome rather than on the significance of a secondary outcome.

Although no convincing evidence for publication bias was observed in our study, we agree with others (e.g., Banks, Kepes, & McDaniel, 2012; Field & Gillett, 2010; Sutton, 2005) that publication bias should be routinely assessed in every meta-analysis.
Moreover, we recommend applying and reporting a set of publication bias methods in each meta-analysis, because each method assesses publication bias in a different way, and one method may detect publication bias in a meta-analysis whereas another method does not (Coburn & Vevea, 2015; Kepes et al., 2012). Future research should focus on developing publication bias methods that are able to examine publication bias in meta-analyses with a small number of effect sizes and heterogeneity in true effect size. We are currently working on extending p-uniform such that it can also deal with heterogeneous true effect size. Other promising developments are the recently increased attention for selection models (e.g., Citkowicz & Vevea, 2017) and the development of a Bayesian method to correct for publication bias (Guan & Vandekerckhove, 2015). Although meta-analysts will greatly benefit from improved methods to assess publication bias, attention should also be paid to registering studies to make unpublished research readily accessible (e.g., Dickersin, 2005). Such a register enables meta-analysts to also include unpublished research in their meta-analysis, and will improve the validity of meta-analytic results.

References

Aguinis, H., Dalton, D. R., Bosco, F. A., Pierce, C. A., & Dalton, C. M. (2010). Meta-analytic choices and judgment calls: Implications for theory building and testing, obtained effect sizes, and scholarly impact. Journal of Management, 37(1), 5-38. doi:10.1177/0149206310377113
Aytug, Z. G., Rothstein, H. R., Zhou, W., & Kern, M. C. (2012). Revealed or concealed? Transparency of procedures, decisions, and judgment calls in meta-analyses. Organizational Research Methods, 15(1), 103-133. doi:10.1177/1094428111403495
Bakker, M., van Dijk, A., & Wicherts, J. M. (2012). The rules of the game called psychological science. Perspectives on Psychological Science, 7(6), 543-554. doi:10.1177/1745691612459060
Banks, G. C., Kepes, S., & Banks, K. P. (2012). Publication bias: The antagonist of meta-analytic reviews and effective policymaking. Educational Evaluation and Policy Analysis, 34(3), 259-277. doi:10.3102/0162373712446144
Banks, G. C., Kepes, S., & McDaniel, M. A. (2012). Publication bias: A call for improved meta-analytic practice in the organizational sciences. International Journal of Selection and Assessment, 20(2), 182-197. doi:10.1111/j.1468-2389.2012.00591
Becker, B. J. (2005). Failsafe N or file-drawer number. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis: Prevention, assessment and adjustments (pp. 111-125). Chichester, UK: Wiley.
Begg, C. B., & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for publication bias. Biometrics, 50(4), 1088-1101.
Borenstein, M. (2009). Effect sizes for continuous data. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The Handbook of Research Synthesis and Meta-Analysis (pp. 221-236). New York: Russell Sage Foundation.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. Chichester, UK: John Wiley & Sons, Ltd.
Citkowicz, M., & Vevea, J. L. (2017). A parsimonious weight function for modeling publication bias. Psychological Methods, 22(1), 28-41. doi:10.1037/met0000119
Coburn, K. M., & Vevea, J. L. (2015). Publication bias as a function of study characteristics. Psychological Methods, 20(3), 310-330.
doi:10.1037/met0000046
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304-1312.
Cooper, H., DeNeve, K., & Charlton, K. (1997). Finding the missing science: The fate of studies submitted for review by a human subjects committee. Psychological Methods, 2(4), 447-452. doi:10.1037/1082-989X.2.4.447
Dickersin, K. (2005). Publication bias: Recognizing the problem, understanding its origins and scope, and preventing harm. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis: Prevention, assessment and adjustments (pp. 11-33). Chichester, UK: Wiley.
Dickersin, K., Chan, S., Chalmers, T. C., Sacks, H. S., & Smith, H. (1987). Publication bias and clinical trials. Controlled Clinical Trials, 8(4), 343-353.
Duval, S., & Tweedie, R. (2000a). A nonparametric "trim and fill" method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95(449), 89-98. doi:10.1080/01621459.2000.10473905
Duval, S., & Tweedie, R. (2000b). Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455-463.
Dwan, K., Altman, D. G., Arnaiz, J. A., Bloom, J., Chan, A.-W., Cronin, E., . . . Siegfried, N. (2008). Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE, 3(8), e3081. doi:10.1371/journal.pone.0003081
Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629-634.
Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90(3), 891-904. doi:10.1007/s11192-011-0494-7
Fanelli, D., Costas, R., & Ioannidis, J. P. A. (2017). Meta-assessment of bias in science. Proceedings of the National Academy of Sciences, USA. doi:10.1073/pnas.1618569114
Fanelli, D., & Ioannidis, J. P. (2013). US studies may overestimate effect sizes in softer research. Proceedings of the National Academy of Sciences, USA, 110(37), 15031-15036. doi:10.1073/pnas.1302997110
Ferguson, C. J., & Brannick, M. T. (2012). Publication bias in psychological science: Prevalence, methods for identifying and controlling, and implications for the use of meta-analyses. Psychological Methods, 17(1), 120-128. doi:10.1037/a0024445
Field, A. P., & Gillett, R. (2010). How to do a meta-analysis. British Journal of Mathematical and Statistical Psychology, 63(3), 665-694. doi:10.1348/000711010X502733
Fleiss, J. L., & Berlin, J. A. (2009). Effect sizes for dichotomous data. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The Handbook of Research Synthesis and Meta-Analysis (pp. 237-253). New York: Russell Sage Foundation.
Francis, G. (2013). Replication, statistical consistency, and publication bias. Journal of Mathematical Psychology, 57(5), 153-169. doi:10.1016/j.jmp.2013.02.003
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345(6203), 1502-1505. doi:10.1126/science.1255484
Franco, A., Simonovits, G., & Malhotra, N. (2016). Underreporting in psychology experiments: Evidence from a study registry. Social Psychological and Personality Science, 7(1), 8-12. doi:10.1177/1948550615598377
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3-8.
Guan, M., & Vandekerckhove, J. (2015).
A Bayesian approach to mitigation of publication bias. Psychonomic Bulletin & Review. Advance online publication. doi:10.3758/s13423-015-0868-6
Hedges, L. V., & Vevea, J. L. (2005). Selection method approaches. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis: Prevention, assessment, and adjustments. Chichester, UK: Wiley.
Higgins, J. P. T. (2008). Commentary: Heterogeneity in meta-analysis should be expected and appropriately quantified. International Journal of Epidemiology, 37(5), 1158-1160. doi:10.1093/ije/dyn204
Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. British Medical Journal, 327(7414), 557-560. doi:10.1136/bmj.327.7414.557
Hopewell, S., Clarke, M., & Mallet, S. (2005). Grey literature and systematic reviews. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis: Prevention, assessment and adjustments (pp. 99-110). Chichester, UK: Wiley.
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. doi:10.1371/journal.pmed.0020124
Ioannidis, J. P. (2008). Interpretation of tests of heterogeneity and bias in meta-analysis. Journal of Evaluation in Clinical Practice, 14(5), 951-957. doi:10.1111/j.1365-2753.2008.00986.x
Ioannidis, J. P., & Trikalinos, T. A. (2007a). The appropriateness of asymmetry tests for publication bias in meta-analyses: A large survey. Canadian Medical Association Journal, 176(8), 1091-1096. doi:10.1503/cmaj.060410
Ioannidis, J. P., & Trikalinos, T. A. (2007b). An exploratory test for an excess of significant findings. Clinical Trials, 4(3), 245-253. doi:10.1177/1740774507079441
Ioannidis, J. P., Trikalinos, T. A., & Zintzaras, E. (2006). Extreme between-study homogeneity in meta-analyses could offer useful insights. Journal of Clinical Epidemiology, 59(10), 1023-1032. doi:10.1016/j.jclinepi.2006.02.013
Iyengar, S., & Greenhouse, J. B. (1988). Selection models and the file drawer problem. Statistical Science, 3(1), 109-117. doi:10.1214/ss/1177013012
Jackson, D. (2006). The implications of publication bias for meta-analysis' other parameter. Statistics in Medicine, 25(17), 2911-2921. doi:10.1002/sim.2293
Jones, C. W., Handler, L., Crowell, K. E., Keil, L. G., Weaver, M. A., & Platts-Mills, T. F. (2013). Non-publication of large randomized clinical trials: Cross sectional analysis. British Medical Journal, 347. doi:10.1136/bmj.f6104
Jürgens, G., & Graudal, N. A. (2004). Effects of low sodium diet versus high sodium diet on blood pressure, renin, aldosterone, catecholamines, cholesterols, and triglyceride. Cochrane Database of Systematic Reviews, (1). doi:10.1002/14651858.CD004022.pub2
Kepes, S., Banks, G. C., McDaniel, M., & Whetzel, D. L. (2012). Publication bias in the organizational sciences. Organizational Research Methods, 15(4), 624-662. doi:10.1177/1094428112452760
Kirkham, J. J., Dwan, K. M., Altman, D. G., Gamble, C., Dodd, S., Smyth, R., & Williamson, P. R. (2010). The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ, 340, c365. doi:10.1136/bmj.c365
Koenker, R. (2005). Quantile regression. Cambridge: Cambridge University Press.
Lane, D. M., & Dunlap, W. P. (1978). Estimating effect size: Bias resulting from the significance criterion in editorial decisions. British Journal of Mathematical and Statistical Psychology, 31, 107-112.
Langan, D., Higgins, J. P. T., & Simmonds, M. (2016). Comparative performance of heterogeneity variance estimators in meta-analysis: A review of simulation studies. Research Synthesis Methods. doi:10.1002/jrsm.1198
Light, R. J., & Pillemer, D. B. (1984). Summing up: The science of reviewing research. Cambridge, MA: Harvard University Press.
Loevinger, J. (1948). The technic of homogeneous tests compared with some aspects of scale analysis and factor analysis. Psychological Bulletin, 45(6), 507-529.
Macaskill, P., Walter, S. D., & Irwig, L. (2001). A comparison of methods to detect publication bias in meta-analysis. Statistics in Medicine, 20(4), 641-654.
Moreno, S. G., Sutton, A. J., Ades, A. E., Stanley, T. D., Abrams, K. R., Peters, J. L., & Cooper, N. J. (2009). Assessment of regression-based methods to adjust for publication bias through a comprehensive simulation study. BMC Medical Research Methodology, 9(2). doi:10.1186/1471-2288-9-2
Nuijten, M. B., van Assen, M. A. L. M., Veldkamp, C. L. S., & Wicherts, J. M. (2015). The replication paradox: Combining studies can decrease accuracy of effect size estimates. Review of General Psychology, 19(2), 172-182. doi:10.1037/gpr0000034
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251). doi:10.1126/science.aac4716
Paule, R. C., & Mandel, J. (1982). Consensus values and weighting factors. Journal of Research of the National Bureau of Standards, 87(5), 377-385.
Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2007). Performance of the trim and fill method in the presence of publication bias and between-study heterogeneity. Statistics in Medicine, 26(25), 4544-4562. doi:10.1002/sim.2889
Raudenbush, S. W. (2009). Analyzing effect sizes: Random-effects models. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The Handbook of Research Synthesis and Meta-Analysis (pp. 295-315). New York: Russell Sage Foundation.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638-641.
Rothstein, H. R., & Bushman, B. J. (2012). Publication bias in psychological science: Comment on Ferguson and Brannick (2012). Psychological Methods, 17(1), 129-136. doi:10.1037/a0027128
Rothstein, H. R., Sutton, A. J., & Borenstein, M. (2005). Publication bias in meta-analysis. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis: Prevention, assessment and adjustments. Chichester, UK: Wiley.
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve and effect size: Correcting for publication bias using only significant results. Perspectives on Psychological Science, 9(6), 666-681. doi:10.1177/1745691614553988
Springate, D. A., & Kontopantelis, E. (2014). Cochrane_scraper: Tools for downloading data from the Cochrane Library of Systematic Reviews (Version 1.1.0). Retrieved from https://github.com/DASpringate/Cochrane_scraper
Stanley, T. D., & Doucouliagos, H. (2014). Meta-regression approximations to reduce publication selection bias. Research Synthesis Methods, 5(1), 60-78.
Stanley, T. D., Doucouliagos, H., & Ioannidis, J. P. (2017). Finding the power to reduce publication bias. Statistics in Medicine. doi:10.1002/sim.7228
Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa.
The American Statistician, 49(1), 108-112. doi:10.2307/2684823
Sterne, J. A. C., Gavaghan, D., & Egger, M. (2000). Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature. Journal of Clinical Epidemiology, 53(11), 1119-1129. doi:10.1016/S0895-4356(00)00242-0
Sterne, J. A. C., Harbord, R. M., Sutton, A. J., Jones, D. R., Ioannidis, J. P., Terrin, N., . . . Higgins, J. P. T. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. British Medical Journal, 343(7818), 1-8. doi:10.1136/bmj.d4002
Sutton, A. J. (2005). Evidence concerning the consequences of publication and related biases. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis: Prevention, assessment and adjustments (pp. 175-192). Chichester, UK: Wiley.
Terrin, N., Schmid, C. H., & Lau, J. (2005). In an empirical evaluation of the funnel plot, researchers could not visually identify publication bias. Journal of Clinical Epidemiology, 58(9), 894-901. doi:10.1016/j.jclinepi.2005.01.006
Terrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting for publication bias in the presence of heterogeneity. Statistics in Medicine, 22(13), 2113-2126. doi:10.1002/sim.1461
United States Code. (2007). US Public Law 110-85: Food and Drug Administration Amendments Act 2007. http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=110_cong_public_laws&docid=f:publ085.110.pdf
van Aert, R. C. M., Wicherts, J. M., & van Assen, M. A. L. M. (2016). Conducting meta-analyses on p-values: Reservations and recommendations for applying p-uniform and p-curve. Perspectives on Psychological Science, 11(5), 713-729. doi:10.1177/1745691616650874
van Assen, M. A. L. M., van Aert, R. C. M., & Wicherts, J. M. (2015). Meta-analysis using effect size distributions of only statistically significant studies. Psychological Methods, 20(3), 293-309. doi:10.1037/met0000025
Veroniki, A. A., Jackson, D., Viechtbauer, W., Bender, R., Bowden, J., Knapp, G., . . . Salanti, G. (2016). Methods to estimate the between-study variance and its uncertainty in meta-analysis. Research Synthesis Methods, 7(1), 55-79. doi:10.1002/jrsm.1164
Vevea, J. L., & Hedges, L. V. (1995). A general linear model for estimating effect size in the presence of publication bias. Psychometrika, 60(3), 419-435. doi:10.1007/bf02294384
Vevea, J. L., & Woods, C. M. (2005). Publication bias in research synthesis: Sensitivity analysis using a priori weight functions. Psychological Methods, 10(4), 428-443. doi:10.1037/1082-989X.10.4.428
Viechtbauer, W. (2007). Approximate confidence intervals for standardized effect sizes in the two-independent and two-dependent samples design. Journal of Educational and Behavioral Statistics, 32(1), 39-60. doi:10.3102/1076998606298034

Footnotes

1 The search was conducted within the journal Psychological Bulletin for articles published between 2004 and 2014, using the search terms "meta-analy*" and not "comment", "note", "correction", and "reply" in the article's title.
2 In order to verify the procedure of extracting data, data from a random sample of 10% of the included meta-analyses were extracted a second time by a different researcher. No major discrepancies were observed between the data that were extracted twice.
Appendix A

Table A1. Results of logistic regression predicting statistical significance of Egger's regression test with discipline and control variable number of effect sizes in a data set.

                            B (SE)           z-value (p-value)   OR      95% CI for OR
Intercept                   -2.372 (.193)    -12.270 (<.001)     0.093   0.063; 0.134
Discipline                   0.261 (.231)      1.128 (.130)      1.298   0.826; 2.051
Number of effect sizes       0.025 (.011)      2.418 (.016)      1.026   1.004; 1.048

Note. CDSR is the reference category for discipline. p-values for the intercept and number of effect sizes are two-tailed whereas the p-value for discipline is one-tailed. OR = odds ratio. CI = profile likelihood confidence interval.

Table A2. Results of logistic regression predicting statistical significance of the rank-correlation test with discipline and control variable number of effect sizes in a data set.

                            B (SE)           z-value (p-value)   OR      95% CI for OR
Intercept                   -2.856 (.232)    -12.310 (<.001)     0.057   0.036; 0.089
Discipline                   0.652 (.266)      2.453 (.007)      1.919   1.150; 3.273
Number of effect sizes       0.026 (.011)      2.375 (.018)      1.026   1.004; 1.049

Note. CDSR is the reference category for discipline. p-values for the intercept and number of effect sizes are two-tailed whereas the p-value for discipline is one-tailed. OR = odds ratio. CI = profile likelihood confidence interval.

Table A3. Results of logistic regression predicting statistical significance of p-uniform's publication bias test with discipline and control variable number of statistically significant effect sizes in a data set.

                            B (SE)           z-value (p-value)   OR      95% CI for OR
Intercept                   -1.689 (.276)    -6.110 (<.001)      0.185   0.106; 0.314
Discipline                  -0.002 (.305)    -0.007 (.503)       0.998   0.551; 1.830
Number of effect sizes      -0.112 (.072)    -1.558 (.119)       0.894   0.765; 1.012

Note. CDSR is the reference category for discipline. p-values for the intercept and number of effect sizes are two-tailed whereas the p-value for discipline is one-tailed. OR = odds ratio. CI = profile likelihood confidence interval.

Table A4. Results of logistic regression predicting statistical significance of the test of excess significance with discipline and control variable number of effect sizes in a data set.

                            B (SE)           z-value (p-value)   OR      95% CI for OR
Intercept                   -3.815 (.330)    -11.564 (<.001)     0.022   0.011; 0.040
Discipline                   0.721 (.377)      1.913 (.028)      2.056   1.003; 4.461
Number of effect sizes       0.038 (.012)      3.170 (.002)      1.039   1.015; 1.065

Note. CDSR is the reference category for discipline. p-values for the intercept and number of effect sizes are two-tailed whereas the p-value for discipline is one-tailed. OR = odds ratio. CI = profile likelihood confidence interval.

Appendix B

Table B1. Results of quantile regression with the median of p-uniform's effect size estimates as dependent variable and as independent variables discipline, I2-statistic, harmonic mean of the standard error (standard error), proportion of statistically significant effect sizes in a data set (Prop. sig. effect sizes), and number of effect sizes in a data set.

                            B (SE)            t-value (p-value)   95% CI
Intercept                   -0.067 (0.193)    -0.347 (.729)       -0.315; -0.045
Discipline                   0.009 (0.069)     0.131 (.896)       -0.016; 0.070
I2-statistic                -0.001 (0.002)    -0.673 (.749)       -0.002; 0.001
Standard error               2.017 (0.551)     3.660 (<.001)       1.715; 2.182
Prop. sig. effect sizes      0.194 (0.103)     1.881 (.060)        0.078; 0.330
Number of effect sizes      -0.002 (0.002)    -0.911 (.363)       -0.005; 0.0002

Note. CDSR is the reference category for discipline.
The p-value for the I2-statistic is one-tailed whereas the other p-values are two-tailed. CI = confidence interval based on inverting a rank test.
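As a concrete illustration of the logistic regression models reported in Appendix A (Tables A1-A4), the sketch below fits, on simulated data, a model predicting whether a publication bias test was statistically significant from discipline and the number of effect sizes in a data set, and converts the coefficients to odds ratios. Python with statsmodels is used purely as an example; the data and variable names are hypothetical and this is not the code used for the reported analyses.

```python
# Minimal sketch of an Appendix A-style logistic regression; simulated data,
# not the authors' original code.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500  # hypothetical number of data sets

df = pd.DataFrame({
    "test_sig": rng.integers(0, 2, n),    # 1 = publication bias test was significant
    "discipline": rng.integers(0, 2, n),  # 0 = CDSR (reference), 1 = Psychological Bulletin
    "k": rng.integers(5, 60, n),          # number of effect sizes in the data set
})

# Logistic regression of test significance on discipline and number of effect sizes.
logit_fit = smf.logit("test_sig ~ discipline + k", data=df).fit()
print(logit_fit.summary())
print(np.exp(logit_fit.params))  # odds ratios, as reported in Tables A1-A4
```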