Unbiased Publication Bias: Theory and Evidence

Chishio Furukawa∗
January 2017
Abstract
Publication bias compromises the truthfulness of scientific evidence that informs policy
decisions. While it is conventionally attributed to the research community’s bias, this
paper shows that, when the policymaker only counts positive vs negative results of null
hypothesis testing to decide whether to implement policies, even unbiased and rational
researchers’ reports will have such bias in the socially optimal communication equilibrium.
The information asymmetry among researchers and the coarseness of conclusions explain the
omission of intermediate estimates such that reported estimates have an upward bias. The
contrasting use of standard errors between Bayesian updating and null hypothesis testing
explains the inflation of some marginally insignificant results. Empirically, the model makes
a unique testable prediction: omission occurs among imprecisely estimated coefficients near
zero. Examples show they apply to and only to literature with aggregation through vote
counting. Practically, the paper proposes a simple and conservative bias correction method
that is robust under various models of publication bias commonly suggested.
∗
[email protected]. Department of Economics, Massachusetts Institute of Technology
I am indebted to Abhijit Banerjee for his guidance. I gained knowledge regarding publication bias from Toshi
A. Furukawa and the workshop at Berkeley Initiative for Transparency in Social Science in June, 2015. I
thank Harry Di Pei, Arda Gitmetz, Alan Olivi, Michael B. Wong for advice on theory, and Tetsuya Kaji, Yuya
Sasaki, and Isaiah Andrews on econometrics, and Esther Duflo, Daniel Prinz, Daniel Waldinger, Alonso Bucarey,
Mahvish Shaukat, Neil Thakral, and seminar participants at MIT Political Economy lunch for encouragement.
All errors are mine.
1 Introduction
Informed decisions on socioeconomic, medical, educational, and environmental policies need a
careful synthesis of the scientific literature full of subtle details and disagreements. In practice,
however, an influential summary of 12,000 published articles on climate change that reached
President Obama ran: “97% of climate papers stating a position on human-caused global
warming agree that global warming is happening – and we are the cause” (The Consensus
Project, 2014). This review was apparently a successful communication strategy for aggregating
contradictory evidence for citizens and policymakers even though it only referred to the abstract
of each study and did not assess credibility or reflect details of the estimates.
However transparent and effective, meta-analysts today would consider this approach to
be unsatisfactory for two reasons. First, publication bias due to omission of negative results
and inflation of marginal results is widely documented in many areas of scientific research.
As positive results are more likely to be published and disseminated than negative results,1
aggregation without proper consideration for selective reporting can be misleading. Second,
vote counting of positive vs negative results is so coarse that the conclusion can often be overturned with formal aggregation of coefficients weighted by their precision and methodological
reliability.
This paper shows, contrary to this common view, that the first deficiency of publication bias counter-balances the second inefficiency of vote counting; that is, while publication bias is often seen as being caused by interested researchers, it could arise even among fully disinterested
researchers who face the difficulty of simplifying nuanced results into yes-or-no conclusions.
Specifically, publication bias is the systematic gap between the published literature and the
subject it studies due to selective reporting across and within studies. Across studies, omission
of imprecise negative results collectively generates an upward bias in the reported estimates. Within published studies, inflation that turns marginally negative results into positive results exaggerates the reported estimates. This paper will demonstrate that coarse
aggregation, information asymmetry, and Bayes’ rule can explain both omission and inflation.
Further, given that the model makes a distinct testable prediction regarding the publication
selection process, it suggests a new, simple bias correction method that is conservative and
robust under commonly proposed models of publication bias.
The model’s first result is that unbiased researchers omit results with estimates of moderate
magnitude in such a way that the equilibrium has an upward bias in average underlying estimates of reported studies. This is driven by a combination of information asymmetry among
researchers, binary actions, and a message space that is coarser than a signal space. Let us
consider an example with two social-welfare-maximizing researchers, each of whom receives βi , an unbiased and noisy estimate of the policy benefit. If their signals could be directly communicated,
then by Bayes’ rule, it is optimal to implement the policy if βi + β−i ≥ 0 under a benchmark
1
By negative results, this paper will refer to any non-positive results, including both statistically insignificant
results and statistically significant negative results.
condition.2 If the policymaker only counts the binary conclusions and implements the policy if there are more positive results than negative results (Table 1), then a researcher draws a positive conclusion if his signal is sufficiently high ($\beta_i \ge \bar{\beta}$), a negative conclusion if his signal is sufficiently low ($\beta_i < \underline{\beta}$), or does not report the study if his signal has an intermediate value ($\bar{\beta} > \beta_i \ge \underline{\beta}$).
Table 1. Policy decision under vote counting
Since each researcher does not know what other studies will observe and report, he conditions his reporting decision on the state in which his own report will be pivotal and swing the
policy decision in the direction of its conclusion. A positive result switches the policymaker from canceling to implementing the policy only when another researcher did not report his result because his signal had an intermediate value. Thus, the optimal threshold $\bar{\beta}$ satisfies the indifference condition that implements the social-welfare-maximizing rule, $\beta_i + \beta_{-i} = 0$, in expectation conditional on pivotality:
\[ \bar{\beta} + E\left[\beta_{-i} \mid -i \text{ does not report},\ \beta_i = \bar{\beta}\right] = 0 \tag{1} \]
In contrast, a negative result leads the policymaker to cancel rather than to implement the policy only when the other researcher's report is positive. Thus, the optimal threshold $\underline{\beta}$ satisfies:
\[ \underline{\beta} + E\left[\beta_{-i} \mid -i \text{ reports a positive result},\ \beta_i = \underline{\beta}\right] = 0 \tag{2} \]
Because the binary conclusion can communicate only the sign, not the strength, of the continuous signal each researcher receives, omission of intermediate results can convey information better than always reporting either a positive or a negative result; omission is analogous to adding another message that the result was neither positive nor negative. In addition, by combining the conditions (1) and (2), researchers are more cautious about reporting negative results than positive results because a negative result changes the policy decision only when another researcher receives a very large signal ($\beta_{-i} \ge \bar{\beta}$), whereas a positive result changes it only when another researcher has a signal of moderate magnitude ($\bar{\beta} > \beta_{-i} \ge \underline{\beta}$). By this asymmetry of pivotality conditions, given that results are reported, the coefficients are on average
2
Suppose each signal βi is normally distributed with identical standard errors, and the two researchers’
signals are independent conditional on the true policy effect. Moreover, suppose the cost of implementing the
policy is zero, the objective is risk-neutral, and that their common prior is also normally distributed with mean
zero. This set-up is public information. This brief overview will focus on an equilibrium in which the thresholds $\bar{\beta}, \underline{\beta}$ are the same for the two researchers, and the main analysis will validate this assumption.
upward biased: $E[\beta_i \mid i \text{ reports either a positive or a negative result}] > E[\beta_i]$. This asymmetry between the willingness to report positive vs negative results is a key feature of the stable and optimal equilibrium: while equilibria without omission or with symmetric willingness to report the two results also exist, they are locally unstable and socially suboptimal.
The second result is numerical and shows that, when a constant p-value is used to determine
the binary conclusion across studies with different precisions, the social-welfare-maximizing
researchers may inflate some marginally insignificant results to still draw positive conclusions.
This is due to the contrast in the use of standard errors between Bayes’ rule and null hypothesis
testing: whereas Bayesian updating divides each coefficient by its variance because it is an
aggregation with weights proportional to sample size n, the t-statistic divides the coefficient
estimate by its standard error because it is a normalization for convergence that occurs at rate $\sqrt{n}$. Whereas the p-value focuses on how unlikely a given observation is under
the null hypothesis of zero effect, the optimal critical level given the coarse aggregation by
policymakers focuses on how well binary conclusions can approximate the Bayesian updating
formula. Thus, rational Bayesian researchers adopt a threshold of coefficient magnitude that
is strictly convex in its standard error to determine whether to draw a positive conclusion, and
therefore, may report a positive conclusion even if the null hypothesis testing does not reject
at a given conventional threshold.
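To see the contrast concretely, consider a minimal single-signal sketch (an illustrative simplification, not the paper's N-researcher pivotal computation): with a normal prior N(0, σb²) and cost c > 0, the smallest coefficient whose posterior mean reaches c grows quadratically, hence strictly convexly, in the standard error, while the significance cutoff 1.96σ grows only linearly. The parameter values below (σb = 1, c = 0.5) are assumptions for illustration.

```python
# Sketch: contrast the coefficient cutoff implied by single-signal Bayesian
# updating with the linear cutoff of null hypothesis testing.

def bayes_cutoff(sigma, sigma_b=1.0, c=0.5):
    """Smallest beta whose posterior mean reaches c under prior N(0, sigma_b^2):
    (beta/sigma^2) / (1/sigma_b^2 + 1/sigma^2) >= c
      <=>  beta >= c * (1 + sigma^2 / sigma_b^2)."""
    return c * (1.0 + sigma**2 / sigma_b**2)

def nhst_cutoff(sigma, z=1.96):
    """Coefficient needed for a statistically significant positive result."""
    return z * sigma

# The Bayesian cutoff is strictly convex (quadratic) in sigma: second
# differences over an evenly spaced grid of standard errors are positive.
grid = [0.5, 1.0, 1.5]
cuts = [bayes_cutoff(s) for s in grid]
assert cuts[2] - cuts[1] > cuts[1] - cuts[0]
```

For imprecise studies the quadratic cutoff eventually exceeds 1.96σ, so some marginally significant results are optimally withheld, while for intermediate precisions it can lie below 1.96σ, rationalizing the reporting of some marginally insignificant results as positive.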
Numerical illustrations demonstrate that these results apply to general environments with various parameter values. Quantitatively, the upward bias can be as large as one standard deviation
of the underlying distribution of policy effectiveness at least among the most imprecise studies. Further, the omission rate can be as large as 40 percent on average, and 70-80 percent
among the most imprecise studies. When the imprecise studies are reported, moreover, they
are positive results 90 percent of the time because the first result of biased omission and the
second result of non-linear thresholds reinforce each other; imprecise negative results are disproportionately less likely to be reported than positive results. Whereas full reporting of null
hypothesis testing implies that coarse aggregation leads to welfare losses as large as 30 percent
of the full information benchmark, reporting with publication bias can roughly halve the losses.
Finally, the strategic substitutability in the indifference conditions (1) and (2) shows that coarse communication amplifies small differences in beliefs or interests among researchers into large discrepancies in their decision rules for reporting yes-or-no conclusions. While
the allegiance bias, the association between authors and conclusions, is often attributed to
polarized views of researchers, the model suggests that their underlying disagreements may be
much smaller than the observed differences in their reporting patterns may suggest.
As a positive theory, the voting model of publication bias provides empirical predictions
on patterns of omission that are distinct from models of publication bias commonly used for
bias correction: omission occurs among the imprecisely estimated coefficients of intermediate
magnitude such that there is an upward bias overall. With two publicly available meta-analysis
datasets from economics, a comparison of observed vs predicted distributions of standard errors
and coefficients can test this hypothesis. The example with vote counting, the effect of labor
unions on productivity, supports the prediction: negative results tend to be more precisely
estimated than positive results and to have coefficients that are large in absolute value. On
the other hand, the literature on the elasticity of intertemporal substitution (EIS) that is
unlikely to be summarized by vote-counting does not support the prediction: negative results
are less likely to have negative point estimates and instead have bunching at the coefficient
zero because of the theoretical restriction of concave utility. These examples demonstrate how
coarse communication can explain patterns of meta-analysis data that other models cannot.
Assumptions on the publication selection process are fundamental for the choice of bias correction methods in meta-analyses, and the model suggests that the existing methods are likely
to suffer from mis-specification by imposing a uniform or monotone selection process. As an
alternative, this paper proposes a new, simple, and conservative bias correction method that is
robust in addressing all of the commonly proposed selection processes. While the communication model and other existing models make different predictions on the publication probability
of different coefficient magnitudes, they all make a shared prediction that precisely estimated
studies are less subject to publication bias than imprecisely estimated ones. Therefore, one
can obtain an efficient estimate from meta-analysis datasets by using only the studies that are estimated sufficiently precisely, where the inclusion of less precise studies can be determined by the bias-variance trade-off. This proposed correction methodology, called stem-based
bias correction because it uses the “stem” of the “funnel plot,” can be applied to over 4,000
meta-analyses conducted every year that carefully synthesize contradictory pieces of evidence.
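The core of the idea can be sketched in a few lines. The fixed keep-fraction below is a placeholder assumption, whereas the actual stem-based method chooses the inclusion cutoff by a bias-variance trade-off not reproduced here.

```python
# Minimal sketch of a precision-based ("stem") correction: keep only the
# most precisely estimated studies and aggregate them with inverse-variance
# weights. keep_fraction = 1/3 is an illustrative placeholder cutoff.

def stem_estimate(coefs, ses, keep_fraction=1/3):
    pairs = sorted(zip(ses, coefs))             # most precise studies first
    k = max(1, int(len(pairs) * keep_fraction)) # number of studies to keep
    stem = pairs[:k]
    w = [1.0 / s**2 for s, _ in stem]           # inverse-variance weights
    return sum(wi * b for wi, (_, b) in zip(w, stem)) / sum(w)
```

In a funnel plot, the retained studies form the narrow "stem" at the top; because every commonly proposed selection model implies that these precise studies are least affected by selective reporting, the estimate is conservative across those models.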
As a normative theory, the model provides an understanding of the publication selection
process that is not dismissive of researchers’ intent and rationality by clarifying that efficient
aggregation under coarse communication is fundamentally incompatible with unbiasedness of
estimates that underlie reported studies. To explain publication bias, the literature has assumed that the research community is either guided by its agendas, concerned primarily with careers,
or driven by its predisposition for significant results. Such widespread interpretation suggests
corrective and compulsory policies are socially desirable because consumers of knowledge may
be misled or may no longer trust scientific evidence. In contrast, this model suggests that
observed patterns of publication bias do not imply interests and irrationality among researchers;
conversely, the model predicts some degree of publication bias even from unbiased researchers.
Given the difficulties of scientific communication, this paper suggests that readers hold on to their willingness to learn from the literature even when there is evidence of publication bias. Moreover, researchers need not take the publication bias detected in meta-analyses as a warning that calls for full reporting of all evidence, because efficient communication within single papers is an objective distinct from the unbiasedness of estimates in systematic reviews.
The paper is organized as follows: Section 2 reviews the evidence on publication bias;
Section 3 presents the model with analytical and numerical results; Section 4 presents the
empirical tests, and proposes a new correction method; Section 5 discusses the model’s key
assumptions, relation to literature, and policy implications; and Section 6 concludes.
2 Motivating Evidence
Before presenting the main analyses, this section reviews evidence of publication bias to clarify
what data the model aims to explain, with reference to literature from various fields.
Figure 1: Funnel plot (Left) and histogram of t-statistics (Right)
Notes: The funnel plot (Left) is a plot of coefficient and standard errors of main estimates from each
study. The figure plots the reported study in filled dots and omitted study in empty circles. In the
presence of publication bias, there is a positive correlation between coefficients and standard errors
of reported studies, as depicted by the red line. The right figure is a histogram of reported studies’
t-statistics where the gray line describes the smooth underlying density. With publication bias, there
will be a mass of studies right above the conventional significance level. The figures are generated by the author for illustrative purposes and do not reflect actual data.
2.1 Omission
Consider a set of independent studies with estimated coefficients and standard errors {βi , σi }
that measure the identical average treatment benefit b so that Eβi = b. If there were no small
study effects or finite sample bias so that the treatment effect is not affected by sample size,
then in the absence of selective reporting, the coefficient estimate βi is uncorrelated with its
standard error σi . Moreover, if the underlying effect has a normal distribution, then there is
an abundance of intermediate coefficient values with only a few extreme values.
A number of meta-analysis data sets do not support these two predictions and suggest the
omission of some studies, as shown in Figure 1 Left. First, there is often a positive correlation
between the coefficient magnitude and the standard error; on average, imprecisely estimated
studies have higher coefficient values than precise studies. This could be due to researchers
omitting studies unless they are positive and statistically significant. If the study is imprecise,
then a large coefficient magnitude is needed whereas if the study is precise, then coefficient
magnitude can be modest (Hedges, 1992). Alternatively, this could also be due to researchers
omitting studies when the coefficient values are low (Duval and Tweedie, 2000; Copas and
Li, 1997). There are two formal tests that examine the presence of publication bias through
examining the correlation between coefficient magnitude and study precision: an ordinal test
that examines their rank correlations (Begg and Mazumdar, 1994) and a cardinal test that
examines the correlation by regression (Egger et al., 1997). Second, there is occasionally excess
variance in the estimates with an abundance of studies at the extreme values beyond significance
thresholds and a paucity of studies with intermediate coefficients (Stanley, 2005).
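The cardinal (regression) version of these asymmetry tests is easy to sketch with ordinary least squares; this illustrative version omits the heteroskedasticity adjustments and inference of the actual Egger et al. (1997) test.

```python
# Sketch of Egger's regression test for funnel-plot asymmetry: regress each
# study's t-statistic on its precision (1/se). Under no selective reporting
# the intercept should be near zero; a large intercept signals small-study
# effects consistent with omission.

def egger_intercept(coefs, ses):
    t = [b / s for b, s in zip(coefs, ses)]      # t-statistics
    prec = [1.0 / s for s in ses]                # precisions
    n = len(t)
    mp, mt = sum(prec) / n, sum(t) / n
    slope = sum((p - mp) * (ti - mt) for p, ti in zip(prec, t)) / \
            sum((p - mp) ** 2 for p in prec)
    return mt - slope * mp                       # intercept = asymmetry measure
```

For example, if coefficients were mechanically linear in standard errors, βi = a·σi + b0, then ti = a + b0/σi, and the regression recovers the asymmetry term a exactly as the intercept.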
While not universal, it is common to find such patterns of omission in meta-analyses. In
economics, important estimates such as the impact of minimum wage on employment (Card
and Krueger, 1995), the return to schooling (Ashenfelter et al., 1999), and the intertemporal
elasticity of substitution (Havránek, 2015), have evidence of a positive correlation. In medicine,
the effect of anti-depressant drugs was exaggerated by about 30 percent due to omission of
negative results (Turner et al., 2008), and this received much attention through a New York
Times article (Carey, 2008). In environmental studies, estimates of the social cost of carbon,
a key statistic for carbon tax policy, were shown to have the bias (Havránek et al., 2015).
2.2 Inflation
When the originally intended specification has a statistically insignificant result, researchers
may inflate the statistical significance through the choice of specifications for outcome and
control variables rather than omit the entire study (Leamer, 1978). If increasing the coefficient
requires search costs and it is sufficient to have statistically significant results, then there will
be an excess mass right above the statistical milestones, as depicted in Figure 1 Right.
Although empirically distinguishing inflation from omission is challenging, some evidence suggests that omission alone cannot explain the data, and thus inflation must exist. In economics,
under some parametric assumptions on the underlying distribution and a form of selection process as well as monotonicity of publication probability across t-statistics, Brodeur et al. (2016)
shows that about 8% of results reported as statistically significant may be due to inflation.
In sociology and political science, the Caliper test that examines excess mass right above the
cut-off suggests that some inflation may exist (Gerber and Malhotra, 2008). In psychology
with lab experiments, where sample size can be adjusted subsequently, Simonsohn et al. (2014) reported that the density of p-values among statistically significant tests was increasing in p-value and interpreted this as evidence of inflation. In medicine, where original specifications
are available for registered protocols from the Food and Drug Administration (FDA), Turner
et al. (2008) suggests that 12 out of 36 studies on anti-depressants whose original conclusions
were negative were published as positive in refereed journals.
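The logic of the Caliper test can be sketched directly; the window width and cut-off below are illustrative choices, and a real application would add a binomial test comparing the two counts.

```python
# Sketch of a Caliper test (Gerber and Malhotra, 2008): count reported
# t-statistics just above vs just below the significance cut-off. Under no
# inflation the two counts in a narrow window should be roughly equal; a
# large excess above the cut-off suggests inflation.

def caliper_counts(t_stats, cutoff=1.96, width=0.2):
    above = sum(1 for t in t_stats if cutoff <= t < cutoff + width)
    below = sum(1 for t in t_stats if cutoff - width <= t < cutoff)
    return above, below
```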
3 Model
3.1 Set-up
Consider a static communication model between N senders, called researchers, and 1 receiver,
called a policymaker. Researchers receive private unbiased signals of the treatment effect of the policy, $\beta_i \sim N(b, \sigma_i^2)$, which are independent conditional on the true treatment effect $b$ and have standard errors $\sigma_i > 0$ that are independently distributed according to some common distribution $G(\sigma)$. They send the message of a positive result ($m_i = 1$) or a negative result ($m_i = 0$), or do not report their study ($m_i = \varnothing$). The policymaker observes the number of positive results, $n_1 = \sum_{i=1}^{N} \mathbb{1}(m_i = 1)$, and negative results, $n_0 = \sum_{i=1}^{N} \mathbb{1}(m_i = 0)$, and decides whether to implement the policy $a \in \{0, 1\}$.
Both the researchers and the policymaker maximize the social welfare from the policy-implementation decision:
\[ a \left( Eb - c \right), \tag{3} \]
where b is the true treatment effect, and c is the cost of policy implementation.3 They have a
common prior over b that is distributed according to the normal distribution with mean 0 and
standard deviation σb. Assume c ≥ 0 so that the policymaker will not implement the policy under her prior belief.4
First, researchers receive their own signals, and simultaneously decide whether to publish
their binary conclusions. Then, the policymaker sees the reports and makes her policy decision
using her posterior. Finally, the pay-offs are realized. The number of players, their pay-offs,
priors, and the cost of policy implementation c are public information.
3.1.1 Equilibrium
This paper focuses on the Perfect Bayesian Nash Equilibrium, which is a set of strategies that
are the best responses to one another and beliefs that are consistent with Bayes' rule.5
The policymaker's strategy, $\pi(n) \equiv P(a = 1 \mid n)$, is a mapping from the number of positive and
negative results n ≡ {n1 , n0 } to the probability distribution over binary policy action a. The
strategy of a researcher is a mapping from his own signal βi and σi to probabilities over his
message mi ∈ {∅, 0, 1}.6
3
While this set-up may appear to suppose no uncertainty in welfare when the policy is not implemented since it is 0, it encompasses such a setting: suppose the welfare under the policy is $u_1 \sim N(\bar{u}_1, \sigma_{u_1}^2)$ and the welfare in the absence of the policy is $u_0 \sim N(\bar{u}_0, \sigma_{u_0}^2)$. Then, it is optimal to implement the policy if and only if $E[u_1 - u_0] \ge c$. Thus, defining $b \equiv u_1 - u_0$ and $\sigma_b^2 \equiv \sigma_{u_1}^2 + \sigma_{u_0}^2$ yields an equivalent set-up.
4
Eb = 0 is a normalization since only Eb − c affects the policy implementation decision. Therefore, prior
belief over b will be incorporated into c in the discussions. The model allows Eb > 0 and c > 0 and the results
on bias and omission remain unchanged.
5
For expositional ease, it also assumes that beliefs over off-path realizations are consistent with utility
maximization.
6
For expositional simplicity, note that the messages m ≡ {mi }i=1,...,N are anonymous in that the policymaker
does not observe who reported each result. Without loss of generality, the analysis restricts attention to the
case in which the meaning of the message is consistent across all senders.
Communication models always have multiple equilibria, including completely uninformative
babbling equilibria. To make meaningful empirical predictions, the analysis takes two steps
of refining the equilibrium. First, to restrict attention to equilibria that avoid implausible
coordination failures, it focuses on equilibria that are fully responsive and fully informative.
An equilibrium is fully responsive if π (n) is not constant across all values of n1 for some n0 and
across all values of n0 for some n1 . An equilibrium is fully informative if E [b|n] is not constant
analogously.7 Second, to select among multiple equilibria that are responsive and informative,
it focuses on equilibria that are optimal and locally stable. An equilibrium is optimal if no
other equilibria can attain strictly higher ex ante welfare. An equilibrium is locally stable if,
for every neighborhood of the equilibrium, there exists a sub-neighborhood of the equilibrium
from which the myopic and iterative adjustment of strategies stays in that neighborhood. It
is standard to focus on the most informative equilibrium in sender-receiver games, and on the
optimal equilibrium in common interest games. Local stability assures robustness to small
deviations from the equilibrium strategies.
3.2 Analytical characterizations with homogeneous precision
This subsection will characterize the equilibrium under the assumption that N = 2 and G (σ)
is a degenerate distribution at a particular σ. This set-up8 can illustrate the main mechanism
behind the optimality of omission that generates bias in the signals underlying reports with $m \ne \varnothing$.
3.2.1 Communication with estimates
It is helpful to first characterize the full information benchmark of directly communicating
βi rather than the restricted messages. Suppose the policymaker can update her expectation over the treatment effect directly using the signals available to researchers. Using Bayes' rule, her posterior belief over b is
\[ E[b \mid \beta_1, \ldots, \beta_N] = \frac{\frac{1}{\sigma^2} \sum_{i=1}^{N} \beta_i}{\frac{1}{\sigma_b^2} + \frac{N}{\sigma^2}} \tag{4} \]
Thus, it is ex ante efficient to implement the policy ($a = 1$) if (with strict inequality) and only if
\[ \frac{\sum_{i=1}^{N} \beta_i}{N} \ \ge\ c \left( 1 + \frac{\sigma^2}{N \sigma_b^2} \right) \tag{5} \]
7
Note that this definition differs from usual responsiveness and informativeness in that it requires both
m = 1, 0 to be informative and influential. For example, the policy rule a∗ = 1 ⇔ n1 = 2 is not fully responsive
in that m = 0 never alters the decision.
8
Given that meta-analyses often include more than 2 studies, this set-up with N = 2 may appear restrictive.
However, this simple setting is common in the committee decision-making literature to which this paper is most
closely related, such as Gilligan and Krehbiel, 1989; Austen-Smith, 1993; Krishna and Morgan, 2001; and Li, Rosen, and Suen, 2001. Analytical characterization for N ≥ 3 is infeasible due to the technical difficulty of analytically evaluating the multivariate normal's truncated mean, a necessary step to ensure the consistency of the posterior. Thus, the next subsection will demonstrate numerically that the main mechanism holds for N ≥ 3.
That is, the policy should be implemented if the average signal is larger than the threshold
plus the off-setting effect from the conservative prior.
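Under the stated assumptions (homogeneous σ, prior N(0, σb²)), the equivalence between the posterior-mean rule (4) and the rearranged threshold rule (5) can be checked directly; the code below is a sketch of that algebra, not code from the paper.

```python
# Sketch of the full-information benchmark: posterior_mean implements
# equation (4); implement_rule5 implements the threshold rule (5).
# The two decision rules coincide by construction.

def posterior_mean(signals, sigma, sigma_b):
    n = len(signals)
    return (sum(signals) / sigma**2) / (1.0 / sigma_b**2 + n / sigma**2)

def implement_bayes(signals, sigma, sigma_b, c):
    return posterior_mean(signals, sigma, sigma_b) >= c

def implement_rule5(signals, sigma, sigma_b, c):
    n = len(signals)
    return sum(signals) / n >= c * (1.0 + sigma**2 / (n * sigma_b**2))
```

For instance, with two unit-precision signals equal to 1 and σb = 1, the posterior mean is 2/3: the policy is implemented for c = 0.5 but not for c = 0.7 under either formulation.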
3.2.2 Communication with coarse messages
Under coarse messages, Bayes' rule suggests that plausible equilibria will have monotone strategies for both the researchers and the policymaker.
Lemma 1 (Monotonicity). For N=2, for any c, any fully responsive and fully informative
equilibrium has strategies that are monotone:
(i) each researcher takes a threshold strategy: there exist $\underline{\beta}_i, \bar{\beta}_i \in (-\infty, \infty)$ such that
\[ m_i^* = \begin{cases} 1 & \text{if } \beta_i \ge \bar{\beta}_i \\ \varnothing & \text{if } \beta_i \in [\underline{\beta}_i, \bar{\beta}_i) \\ 0 & \text{if } \beta_i < \underline{\beta}_i \end{cases} \]
(ii) the policymaker's decision $\pi^*(n)$ is increasing in the number of positive results ($n_1$) and weakly decreasing in the number of negative results ($n_0$) in the following sense. For any $n_1$, $n_0$, and $k > 0$,
\[ \pi^*(n_1, n_0) > 0 \Rightarrow \pi^*(n_1 + k, n_0) = 1 \]
\[ \pi^*(n_1, n_0) < 1 \Rightarrow \pi^*(n_1, n_0 + k) = 0 \]
Proof. Appendix A1.
The monotonicity restriction is due to Bayes’ rule (4), combined with the law of iterated
expectation and monotonicity of the mean of a truncated normal distribution with respect to
the mean of the underlying distribution. At the threshold, the following indifference conditions9
must be satisfied:
\[ E[b \mid \beta_i, m_{-i} \in Piv] = \frac{\frac{1}{\sigma^2}\left(\beta_i + E[\beta_{-i} \mid m_{-i} \in Piv, \beta_i]\right)}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}} = c, \tag{6} \]
where P iv denotes the message of another researcher such that changing one’s own message
is consequential for the policymaker’s decision.10 Given the researchers’ monotone strategies
with respect to their own signals βi , the policymaker also takes monotone strategies.
Using Lemma 1, Proposition 1 will first demonstrate that, even though the set-up is
entirely symmetric given c = 0, a stable and optimal equilibrium has omission that generates
9
These indifference conditions are equivalent to the first-order conditions of the social welfare function with
respect to the threshold choice. Because of the common interest assumption, it is an unconstrained optimization
problem with no incentive constraints.
10
It may appear at first glance that this monotonicity condition naturally extends to general set-up with
heterogeneous σ, G (σ), by the law of iterated expectation. However, the counterexample in Appendix A2 shows
that, for some opponent strategies, it need not hold. Therefore, characterization of the equilibrium for general G (σ)
requires some assumptions on endogenous decisions of other researchers’ strategies that are difficult to evaluate
analytically. As an alternative, this paper will analyze the equilibrium numerically and verify the monotonicity
as shown in Appendix Figure B1.
upward bias in the underlying estimates of reported studies. To highlight the contrast, Proposition 2 will then show that responsive and informative equilibria with either no or symmetric
omission also exist, but that they are sub-optimal and unstable.
Figure 2: Equilibrium thresholds and policy decisions for N = 2, c = 0
Notes: Panel 1 plots the benchmark first-best policy implementation rule (β1 + β2 ≥ 0) as given by
the Bayes’ rule and an example of the bivariate normal distribution of signal realizations {β1 , β2 }.
Panel 2, 3, and 4 illustrate the equilibrium thresholds and policy decisions under three responsive and
informative equilibria. The dotted line shows the thresholds for each equilibrium, and the policy is
implemented if {β1 , β2 } were in the region northwest of the solid line for Panels 1 and 2. For Panel 3, the policy is implemented with probability 1/2 in the region surrounded by the dotted line. “False negative”
(“False positive”) regions denote signal realizations such that the policy is (is not) implemented under
the full information but is not (is) implemented in equilibrium. The figures’ origin is {0, 0}.
Proposition 1 (Stable and optimal equilibrium with bias). Let N = 2 and c = 0.
(i) There exists an equilibrium with the following strategies. The policymaker’s strategy is
a supermajoritarian policy decision rule:
\[ \pi^* = \begin{cases} 1 & \text{if } n_1 > n_0 \\ 0 & \text{if } n_1 \le n_0 \end{cases} \tag{7} \]
The researchers’ strategies are common and characterized by the thresholds that satisfy:
\[ \bar{\beta} \ge 0 > \underline{\beta} \tag{8} \]
\[ \underline{\beta} < -\bar{\beta} \tag{9} \]
and thus generate an upward bias in the estimates of the reported studies:
\[ E[\beta_i \mid m_i \ne \varnothing] > 0 \tag{10} \]
Proof. Suppose that the policymaker follows the supermajoritarian rule as in (7). Assuming
that the two researchers adopt the common thresholds, they satisfy the indifference condition
(6) in expectation conditional on pivotality because they do not know the signal of another
researcher β−i :
\[ \bar{\beta} + E\left[\beta_{-i} \mid \bar{\beta} > \beta_{-i} \ge \underline{\beta},\ \beta_i = \bar{\beta}\right] = 0 \tag{11} \]
\[ \underline{\beta} + E\left[\beta_{-i} \mid \beta_{-i} \ge \bar{\beta},\ \beta_i = \underline{\beta}\right] = 0 \tag{12} \]
Since these thresholds are strategic substitutes of each other, the symmetric threshold
exists and is unique. Since the policymaker focuses on the average value of the signals whereas the
researchers’ indifference conditions are defined for the marginal value at the threshold, it is
strictly optimal to implement the policy following the supposed rule (7).
To confirm $\overline{\beta} \ge 0 > \underline{\beta}$, suppose towards contradiction that $\overline{\beta} \le 0$. By the indifference condition (11), $\overline{\beta} = -E[\beta_{-i} \mid \beta_{-i} \in [\underline{\beta}, \overline{\beta}),\, \beta_i = \overline{\beta}] > 0$ since $E[\beta_{-i} \mid \beta_{-i} < \overline{\beta}] < 0$ regardless of the other conditions. Because this contradicts the assumption, $\overline{\beta} \ge 0$. Substituting this into (12), $\underline{\beta} = -E[\beta_{-i} \mid \beta_{-i} > \overline{\beta},\, \beta_i = \underline{\beta}] < 0$.
The asymmetry $\overline{\beta} < -\underline{\beta}$ must hold for $\overline{\beta} \ge 0$. Towards contradiction, suppose $\overline{\beta} \ge -\underline{\beta}$. Then $\overline{\beta} = -E[\beta_{-i} \mid \beta_{-i} \in [\underline{\beta}, \overline{\beta}),\, \beta_i = \overline{\beta}] < 0$ because the combination of the conditions $\beta_{-i} \in [\underline{\beta}, \overline{\beta})$ and $\beta_i = \overline{\beta}$ implies $E[\beta_{-i} \mid \beta_{-i} \in [\underline{\beta}, \overline{\beta}),\, \beta_i = \overline{\beta}] > 0$. Since this contradicts the assumption, $\overline{\beta} < -\underline{\beta}$ must hold; that is, positive results are pivotal only when the other researcher omits his result, and negative results are pivotal only when the other researcher reports a positive result.
The bias as in (10) occurs due to this asymmetry of thresholds. Appendix A3 shows that the
symmetric equilibrium exists.
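For intuition, the fixed point of the indifference conditions (11)-(12) can be computed directly. The following sketch (not the paper's code) uses assumed parameter values σb = σ = 1 and a damped fixed-point iteration over truncated normal means; all names and constants are illustrative.

```python
from math import erf, exp, pi, sqrt

# Illustrative sketch: solve the two-researcher indifference conditions
# (11)-(12) by damped fixed-point iteration, assuming beta_i = b + eps_i
# with b ~ N(0, sb^2) and eps_i ~ N(0, s^2) (assumed values, not calibrated).
sb, s = 1.0, 1.0
rho = sb**2 / (sb**2 + s**2)                 # slope of E[beta_-i | beta_i]
cond_sd = sqrt(sb**2 + s**2 - rho * sb**2)   # sd of beta_-i given beta_i

def pdf(x):
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def trunc_mean(mu, sd, a, b):
    """Mean of N(mu, sd^2) truncated to [a, b]."""
    al, be = (a - mu) / sd, (b - mu) / sd
    return mu + sd * (pdf(al) - pdf(be)) / (cdf(be) - cdf(al))

hi, lo = 0.5, -1.0                           # initial guesses for the thresholds
for _ in range(500):
    # (11): a positive report is pivotal when the other researcher omits.
    hi_new = -trunc_mean(rho * hi, cond_sd, lo, hi)
    # (12): a negative report is pivotal when the other reports a positive result.
    lo_new = -trunc_mean(rho * lo, cond_sd, hi, float("inf"))
    hi, lo = 0.5 * hi + 0.5 * hi_new, 0.5 * lo + 0.5 * lo_new

print(round(hi, 3), round(lo, 3))            # asymmetric: hi > 0 > lo and hi < -lo
```

The resulting thresholds display the asymmetry of Proposition 1: the upper threshold is positive but smaller in absolute value than the lower one.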
To investigate the properties of this equilibrium, let us note the following numerical observation:

Observation 1 (Moderate strategic substitutability under normal distribution). Define the combined best response function $\overline{\beta}_i^{BR}(\overline{\beta}_{-i})$ as the fixed point of the indifference condition (11) in which $\underline{\beta}_{-i}$ is given by the condition (12).¹¹ Then

\[ \frac{\partial \overline{\beta}_i^{BR}(\overline{\beta}_{-i})}{\partial \overline{\beta}_{-i}} > -1 \quad \forall \overline{\beta}_{-i} \tag{13} \]

Although this condition may appear to be an endogenous assumption, it is a condition on the derivative of the mean of a truncated normal distribution, which is a primitive of the model. While it is difficult to verify analytically, Appendix Figure B2 shows that it holds for all plausible ranges of parameters.
Proposition 1 (ii). Under Observation 1, the equilibrium with asymmetric omission characterized in (i) is locally stable and optimal.
Proof. Appendix A3.
The equilibrium characterized in Proposition 1 differs from the conventional interpretation of publication bias in three ways. First, it shows that the bias can occur among unbiased researchers in a locally stable, and thus empirically plausible, equilibrium. Second, the biased reporting is socially optimal, and thus the policymaker also prefers such reporting practices. Third, the bias in the underlying estimates of reported studies does not imply a bias in policy decisions.
The numerical illustration in Appendix Figure B2 shows $P(a^*(m) = 1) < P(a^{FB}(\beta) = 1) = 1/2$ under various parameters; that is, the likelihood of policy implementation is not higher than that under the equilibrium with full information communication. In this way, scientific communication in this model is conservative because only the messages m, not the estimates β, are reported and interpreted through coarse aggregation. The equilibrium thus has an "unbiased" publication bias both because it is observed with unbiased researchers and because it does not lead to bias in decision-making.
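A short Monte Carlo sketch can illustrate the upward bias (10) in the estimates that underlie reported studies; the thresholds and parameters below are assumed for illustration, not taken from the paper's calibration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Monte Carlo sketch of the upward bias in (10) under illustrative
# asymmetric thresholds (hi, lo); sb, s, hi, lo are assumed values.
sb = s = 1.0
hi, lo = 0.3, -1.0
n = 200_000
b = rng.normal(0.0, sb, n)                     # true benefits
b1 = b + rng.normal(0.0, s, n)                 # researcher 1's estimate
b2 = b + rng.normal(0.0, s, n)                 # researcher 2's estimate

def message(beta):
    # +1: positive report, -1: negative report, 0: omission
    return np.where(beta >= hi, 1, np.where(beta < lo, -1, 0))

m1, m2 = message(b1), message(b2)
reported = np.concatenate([b1[m1 != 0], b2[m2 != 0]])
# Supermajoritarian rule (7): implement iff strictly more positives.
implement = ((m1 == 1).astype(int) + (m2 == 1)) > ((m1 == -1).astype(int) + (m2 == -1))
print(reported.mean(), implement.mean())
```

The mean of reported estimates is positive even though every researcher reports truthfully, while the implementation frequency stays moderate.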
Figure 2.2 is the graphical illustration of this equilibrium, and shows it is optimal because it
most efficiently approximates the full information threshold in Figure 2.1. The equilibria that
maximize social welfare are the ones that minimize the welfare loss due to suboptimal decisions:
because of coarse messages, the policymaker may implement the policy when she would not
¹¹ Note that, by Lemma 1, the combined best response function is unique.
under full communication, or may not implement when she would. This equilibrium has the lowest welfare loss due to such false positive and false negative errors because omission in this model is just like enriching the message space from mi ∈ {1, 0} to mi ∈ {1, 0, ∅}. Moreover, an iterative adjustment from a neighborhood converges to the equilibrium. This is because the policymaker's strategy does not rely on marginal conditions of the researchers' strategies, and the researchers' strategies are continuous and only moderately responsive to each other's. Proposition 2 will formally show that the other informative and responsive equilibria, as illustrated in Figures 2.3 and 2.4, exist but are not robust to small perturbations of strategies and attain only lower welfare than this equilibrium.
Proposition 2 (Unstable and sub-optimal equilibria with no bias). Let N = 2 and c = 0.
(i) There exist equilibria with the following strategies:
• No omission: There exists an equilibrium with strategies that satisfy the following conditions: the policymaker follows

\[ \pi^* = \begin{cases} 1 & \text{if } n_1 = 2 \\ \tilde{\pi} & \text{if } n_1 = 1,\ n_0 = 0 \\ 0 & \text{if } n_0 \ge n_1, \end{cases} \tag{14} \]

where $\tilde{\pi} \in (0, 1]$, and the researchers' thresholds satisfy $\overline{\beta} = \underline{\beta} < 0$ so that $E[\beta_i \mid m_i \neq \emptyset] = 0$.
• Symmetric omission: There exists a symmetric equilibrium with strategies that satisfy the following conditions: the policymaker follows

\[ \pi^* = \begin{cases} 1 & \text{if } n_1 > n_0 \\ \tfrac{1}{2} & \text{if } n_1 = n_0 \\ 0 & \text{if } n_1 < n_0 \end{cases} \tag{15} \]

and the researchers' thresholds satisfy $\overline{\beta} > 0 > \underline{\beta}$ and $\overline{\beta} = -\underline{\beta}$ so that $E[\beta_i \mid m_i \neq \emptyset] = 0$.
(ii) These equilibria with no or symmetric omission are sub-optimal and unstable.
Proof. Appendix A4.
In the environment identical to Proposition 1, Proposition 2 shows that other informative and responsive equilibria with no bias in the estimates that underlie the reported messages exist, and that they are nevertheless unstable and socially sub-optimal. The following heuristic arguments summarize the key logic of the proof contained in Appendix A4.
As illustrated in Figure 2.3, there exists an equilibrium with no omission because the benefit of enriching the message by omission exists only when the other researcher also omits. Nevertheless, the following perturbation argument shows this equilibrium is neither stable nor optimal. Modifying the strategy of one researcher slightly to $\overline{\beta}' = \overline{\beta} + \Delta$ for small Δ > 0 induces the other researcher to also omit some results, breaking stability. Moreover, shifting from (14) to the supermajoritarian rule (7) and modifying both researchers' strategies to $\overline{\beta}' = \overline{\beta} + \Delta$ strictly improves welfare because it reduces false positive errors: such a perturbation can better approximate the full information threshold.
Given the mixed strategy of the policymaker, there also exists an equilibrium with a symmetric omission of positive and negative results, as depicted in Figure 2.4. This is because the set-up is symmetric around 0, due to the bivariate normal distribution and Eb = c = 0. However, a perturbation of the researcher strategy to $\overline{\beta}' = \overline{\beta} - \Delta$ and $\underline{\beta}' = \underline{\beta} - \Delta$ demonstrates that it is also unstable and sub-optimal. As the policymaker's mixed strategy relies on tie-breaking and indifference, it is not robust to small perturbations of researcher strategies and turns into (7) under such a modification. Moreover, this shift increases welfare because it increases $E[b \mid n_1 > n_0]$ by a quantity proportional to Δ while decreasing welfare by a magnitude proportional to Δ². Since there is a first-order gain and a second-order loss¹², this perturbation improves welfare for small Δ > 0.
Relating the results to the seminal works in voting theory on information aggregation can clarify why the omission with bias is optimal. Both voting theory and this model share features such as private information, common interests, and simultaneous reporting. The result regarding omission resembles the finding that uninformed and unbiased voters abstain when there are other informed and independent voters (Feddersen and Pesendorfer, 1996). When voters lack information regarding which candidate is better, they prefer to abstain from voting and let others with reliable information cast decisive votes. The result regarding biased reporting is analogous to the finding that, under private information, the unanimity rule increases the probability of false conviction of the innocent if the jurors condition their votes on the state in which their votes are pivotal (Feddersen and Pesendorfer, 1998). Because the unanimity rule is a conservative voting rule, each unbiased juror's equilibrium strategy will be biased to counterbalance such an overly cautious rule by occasionally voting to convict even if his private information suggests the defendant is innocent. Feddersen and Pesendorfer therefore suggest that the unanimity rule is socially sub-optimal compared to other voting rules such as a majoritarian rule.
The key difference between this paper's model and these models is the information coarsening: while models in voting theory have commonly assumed that voters receive binary signals¹³, this paper considers researchers who receive continuous signals but can send only binary messages. The assumption of continuous signals is particularly appropriate for academic research, in which the private signals are the estimates of policy effectiveness. While this difference may appear subtle, a set-up with binary signals in Appendix A5 demonstrates that it is consequential: if the signals were binary so that the researchers only know the sign of the coefficient, 1(βi ≥ 0), then an optimal and stable equilibrium has neither omission nor biased reporting. That is, if signals were binary, reporting everything the researcher knows, as he knows it, is always optimal.
¹² Specifically, the welfare gain is $2\Delta \left\{ P(m_{-i} = 1 \mid \underline{\beta}) E[\beta_{-i} \mid m_{-i} = 1, \underline{\beta}] + P(m_{-i} = \emptyset \mid \underline{\beta}) E[\beta_{-i} \mid m_{-i} = \emptyset, \underline{\beta}] \right\}$ and the welfare loss is $2\Delta^2 P\left(\beta_i \in [\underline{\beta} - \Delta, \underline{\beta}] \wedge \beta_{-i} \in [\underline{\beta} - \Delta, \underline{\beta}]\right) (\underline{\beta} + \overline{\beta})$.
¹³ Feddersen and Pesendorfer (1999) consider an environment in which there are finitely many ordered signals. The bias in their paper arises due to the correlation between preference and a biased distribution of information. This paper differs from theirs in that the private signals have two dimensions and cannot be ordered, and in that bias arises due to the pivotality conditions.
Under coarse communication, researchers can only communicate the sign of their signals while they also know and wish to convey the strength of their signals. Thus, even though their signal precisions are the same so that all researchers are equally well-informed, those with coefficients near zero still want to omit their results to communicate that their signals had small magnitudes in absolute value. The set-up with information coarsening offers an explanation for the result-dependent reporting in which small, noisy studies are publishable only with significant results. In contrast, the set-up with binary signals cannot explain this because it implies that the reporting decision depends only on the precision of experiments, which depends on features such as sample size and study design. In addition, even though an equilibrium with symmetric omission exists, the optimal equilibrium has asymmetric omission with bias in the estimates that underlie reported studies. This is because thresholds $\overline{\beta}, \underline{\beta}$ with asymmetry around zero can closely approximate the first-best decision rule, β1 + β2 ≥ 0, in a multi-dimensional setting, as the comparison of Figures 1.2 and 1.4 shows. In contrast, the set-up with binary signals does not lead to optimality of biased reporting because garbling of information by researchers cannot improve the policymaker's decisions.
Full characterization of all fully informative and fully responsive equilibria will involve two additional equilibria that are analogous to those characterized in Propositions 1 and 2, without omission. Instead of the supermajoritarian rule (7), the policymaker adopts a submajoritarian decision rule, a∗ = 1 ⇔ n1 ≥ n0, in these equilibria. Proposition 1 focuses on the case of a∗ = 1 ⇔ n1 > n0 because it will be numerically shown to be the unique optimal equilibrium for c > 0, as the likelihood of incurring this positive cost is lower in this equilibrium (Appendix Figure B4). While it would be ideal to characterize the equilibrium analytically for c > 0, it is intractable to prove optimality because doing so involves solving a system of nonlinear conditions (11), (12) and evaluating a sum of truncated correlated multivariate normals. Instead, the next section will demonstrate this numerically under a range of plausible parameters.
Extension. While the above model has considered the noise in the estimates due to sampling errors, the publication decision also depends on study-specific noise that affects generalizability. For example, the underlying treatment effect in one particular population may differ from that in another. In observational studies, the estimation strategy may have biases in ambiguous directions. This kind of noise cannot be reduced by increasing the sample size, and thus researchers may not report even if the standard error of their estimates, σi, were small.

The model can incorporate such underlying heterogeneity by adding another error: $\beta_i = b + \epsilon_i + \zeta_i$, where $b \sim N(0, \sigma_b^2)$ is the true benefit, $\epsilon_i \sim N(0, \sigma_i^2)$ is the sampling error, and $\zeta_i \sim N(0, \sigma_0^2)$ is the study-specific error. Then, the posterior mean will be modified to

\[ \frac{\frac{1}{\sigma^2 + \sigma_0^2} (\beta_1 + \beta_2)}{\frac{2}{\sigma^2 + \sigma_0^2} + \frac{1}{\sigma_b^2}} \]

so that the above analyses will carry through even in this case.¹⁴
¹⁴ In practice, it may appear unreasonable to assume that σ0, the heterogeneity across settings, is known while the mean benefit b is unknown; it is more reasonable to take the expectation over σ0 to determine the thresholds instead. Moreover, when study designs differ in credibility, σ0 can be study-specific and corresponds to the weight on the credibility of the study.
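As a minimal sketch, the modified posterior mean can be computed by down-weighting each study by its total noise σi² + σ0²; the numbers below are illustrative, not the paper's calibration.

```python
# Minimal sketch of the modified posterior mean under study-specific noise
# sigma0: each study is weighted by 1 / (sigma_i^2 + sigma0^2), and the
# prior contributes 1 / sigma_b^2 to the denominator. Numbers are illustrative.
def posterior_mean(betas, sigmas, sigma_b, sigma0):
    w = [1.0 / (s**2 + sigma0**2) for s in sigmas]
    return sum(wi * bi for wi, bi in zip(w, betas)) / (sum(w) + 1.0 / sigma_b**2)

no_het = posterior_mean([0.8, -0.2], [0.3, 0.3], 1.0, 0.0)
het = posterior_mean([0.8, -0.2], [0.3, 0.3], 1.0, 1.0)
print(no_het, het)   # study-specific noise shrinks the posterior toward the prior mean
```

Raising σ0 caps the information that any single study can contribute, shrinking the posterior further toward the prior mean of zero.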
3.3
Numerical illustrations with heterogeneous precisions
The numerical analysis extends the equilibrium characterization in Proposition 1 to a more general setting. Computing the equilibrium takes two steps: first, assuming that the monotonicity as in Lemma 1 and the supermajoritarian rule (a∗ = 1 ⇔ n1 > n0) hold in a symmetric equilibrium, solve the system of indifference conditions; second, verify that the monotone strategies and the supermajoritarian rule are consistent with the players' optimization¹⁵.
The key indifference conditions extend (11) and (12) in Proposition 1 as follows: for every i, for every σi ∈ Supp(G), the strategies $\overline{\beta}_i(\sigma_i)$ and $\underline{\beta}_i(\sigma_i)$ satisfy

\[ E\left[ \frac{\frac{\overline{\beta}_i(\sigma_i)}{\sigma_i^2 + \sigma_0^2} + \sum_{-i} \frac{\beta_{-i}}{\sigma_{-i}^2 + \sigma_0^2}}{\sum_i \frac{1}{\sigma_i^2 + \sigma_0^2} + \frac{1}{\sigma_b^2}} \,\middle|\, n'_1 = n'_0 \right] = c \tag{16} \]

\[ E\left[ \frac{\frac{\underline{\beta}_i(\sigma_i)}{\sigma_i^2 + \sigma_0^2} + \sum_{-i} \frac{\beta_{-i}}{\sigma_{-i}^2 + \sigma_0^2}}{\sum_i \frac{1}{\sigma_i^2 + \sigma_0^2} + \frac{1}{\sigma_b^2}} \,\middle|\, n'_1 = n'_0 + 1 \right] = c, \tag{17} \]
where $n'_1, n'_0$ refer to the numbers of positive and negative results among the other reports, respectively. The left hand side is the posterior belief of the policy benefit conditional on being pivotal, and the right hand side is the threshold effectiveness for implementing the policy. It allows for uncertainty over not only the strength but also the precision of others' signals¹⁶ and over combinations of others' messages such that the message is pivotal when N ≥ 3, and it equates the posterior (4) to any c > 0.
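The conditional expectation on the left hand side of (16) can be approximated by Monte Carlo simulation. The sketch below uses illustrative thresholds and parameters (not the paper's calibration) and estimates the pivotal posterior for N = 3.

```python
import numpy as np

rng = np.random.default_rng(1)
# Monte Carlo sketch of the left-hand side of condition (16) for N = 3.
# Thresholds and parameters are illustrative, not the paper's calibration.
sb, s0 = 1.0, 0.0
sig = np.array([0.3, 0.6, 0.9])        # assumed standard errors of the three studies
hi, lo, x = 0.4, -1.2, 0.4             # others' thresholds; candidate threshold for study 1
n = 500_000
b = rng.normal(0.0, sb, n)
beta = b[:, None] + rng.normal(0.0, 1.0, (n, 3)) * sig
pos = (beta[:, 1:] >= hi).sum(axis=1)
neg = (beta[:, 1:] < lo).sum(axis=1)
# Researcher 1 is pivotal when n'_1 = n'_0 among the other reports; we also
# condition on beta_1 lying near the candidate threshold value x.
piv = (pos == neg) & (np.abs(beta[:, 0] - x) < 0.05)
w = 1.0 / (sig**2 + s0**2)             # per-study weights
post = (beta[piv] * w).sum(axis=1) / (w.sum() + 1.0 / sb**2)
print(piv.sum(), post.mean())          # the threshold is the x at which this equals c
```

In the full computation, an outer loop would adjust x for each σi until the pivotal posterior equals c.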
3.3.1
Sub-optimality of t-statistics rule
Figure 3 demonstrates the main numerical result that $\overline{\beta}(\sigma)$, the threshold between sending a positive message and not reporting the study, is strictly convex in σ.¹⁷ While the model does not
¹⁵ For example, Appendix Figure B4 verifies that the supermajoritarian rule attains higher social welfare than a submajoritarian rule (a∗ = 1 ⇔ n1 ≥ n0) when c > 0.
¹⁶ As shown in Appendix Figure B5, the simulations with heterogeneous G(σ) in this section discretize the values of standard deviations into 10 bins, σ ∈ {0.1, ..., 1}, with a χ² distribution of 4 degrees of freedom between 0 and 10. This smooth distribution approximates the imputed distribution of standard errors of all underlying studies in the labor union example discussed in the empirical section.
¹⁷ In Figure 4, $\underline{\beta}(\sigma)$, the threshold between sending a negative message and not reporting the study, is concave. While the convexity of $\overline{\beta}(\sigma)$ holds for any c > 0, such concavity of $\underline{\beta}(\sigma)$ holds only for small c > 0. As Appendix Figure B6 shows, the $\underline{\beta}(\sigma)$ threshold can be convex and increasing in some range of σ for sufficiently large c > 0. This is driven by the effect of the conservative prior as described in the Bayes' rule (4). The full information threshold increases as σi increases due to the term $\frac{c\sigma_i^2}{N\sigma_b^2}$. Thus, if c > 0 were large, even $\underline{\beta}(\sigma)$ would be convex and can be increasing in σ. This is counterintuitive for two reasons. First, it is a counterexample to one of the principal findings in the voting theory literature that those who are less informed abstain more (Feddersen and Pesendorfer, 1999; Feddersen, 2004). In the most stark example, the abstention probability appears to converge to 0 as the voter becomes completely uninformed (σi → ∞). This shows that abstention is not driven by uninformedness per se, but is instead driven by the weakness of the belief that supports either policy option. If the prior is strong because c > 0 is large, then having evidence can moderate the prior and lead to more abstention. This can offer an explanation for why "uninformed voters" may vote more, as we see in recent elections with extreme candidates in the real world. It is consistent with the model presented in Banerjee et
impose any rule based on t-statistics, actual scientific publication requires statistical significance to draw positive conclusions. Under a constant t-statistic threshold, researchers whose signals fall in the shaded region may inflate the t-statistic through specification searches to announce positive results. This occurs even when researchers are unbiased; moreover, the policymaker also prefers reports with inflation to those with strict compliance with a constant t-statistic threshold rule. As specification searches incur marginal costs, this mechanism can explain why there will be a mass of studies right above the significance threshold, as reviewed in Section 2.
Figure 3: $\overline{\beta}(\sigma)$ and $\underline{\beta}(\sigma)$ thresholds
Notes: Figure 3 plots the $\overline{\beta}(\sigma), \underline{\beta}(\sigma)$ thresholds under the prior standard deviation σb = 1, no study-specific effect σ0 = 0, and policy cost c = 0. The darker solid line is $\overline{\beta}(\sigma)$, the lighter solid line is $\underline{\beta}(\sigma)$, and the dashed line represents a linear t-statistic threshold. Studies in the shaded region draw positive conclusions even though they are marginally statistically insignificant.
al. (2011), although their model did not consider voting as a means of information aggregation. Second, this result suggests that researchers could make their estimates less precise in order to publish, if their objective were to publish. This prediction may be viewed as a weakness of the theory. Since in reality reporting binary decisions is not as flexible as the model allows, policymakers are likely to require a large number of positive results to decide a∗ = 1.

The non-linearity of the thresholds $\overline{\beta}(\sigma), \underline{\beta}(\sigma)$ is primarily due to the difference in the use of standard errors between null hypothesis testing and Bayesian updating: whereas the t-statistic divides coefficients by their standard errors, σ, the Bayes' rule divides coefficients by their variance, σ². The t-statistic normalizes the convergence of the distribution of βi, which occurs at rate √n. In contrast, Bayesian updating puts a weight on each study that is
proportional to its information, which increases at rate n in the absence of study-specific effects¹⁸ (σ0 = 0). The indifference condition (16) can be approximately rewritten as

\[ \overline{\beta}_i(\sigma_i) \propto \sigma_i^2 \left( c - E\left[ \sum_{-i} \frac{\beta_{-i}}{\sigma_{-i}^2} \,\middle|\, n'_1 = n'_0 \right] \right) \]

if no uncertainty in σ−i exists. Since $c > E[\sum_{-i} \beta_{-i}/\sigma_{-i}^2 \mid n'_1 = n'_0]$, so that the policy would not be implemented in the absence of an additional positive result, the threshold $\overline{\beta}_i(\sigma_i)$ is convex due to the term σi². Conversely, $\underline{\beta}_i(\sigma_i)$ can be concave because $\underline{\beta}_i(\sigma_i) \propto \sigma_i^2 ( c - E[\sum_{-i} \beta_{-i}/\sigma_{-i}^2 \mid n'_1 = n'_0 + 1] )$ and $c < E[\sum_{-i} \beta_{-i}/\sigma_{-i}^2 \mid n'_1 = n'_0 + 1]$.
The following heuristic argument demonstrates that the linear threshold rule $\overline{\beta}(\sigma) = t\sigma$ and $\underline{\beta}(\sigma) = -t\sigma$ for a constant t-statistic threshold t > 0 is generally sub-optimal. Consider the benchmark example in which c = 0 and σ0 = 0. If a linear threshold rule were applied, then the probability of omission would be constant across all values of the standard error σ. Recall, however, that the voting theory of abstention suggests it is optimal for the omission probability to decrease with the precision of studies when c = 0. Hence, the linear threshold rule contradicts the efficiency of aggregation through vote counting.
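Since the upper threshold is strictly convex in σ while the significance rule is linear, the two must cross. A stylized comparison, in which the upper threshold is approximated by κσ² (κ and t below are illustrative constants, not calibrated values):

```python
# Stylized comparison of a quadratic omission threshold kappa * sigma^2
# with a linear t-statistic rule t * sigma; kappa and t are illustrative.
t, kappa = 1.96, 3.0
for sigma in (0.1, 0.5, 1.0):
    quad, lin = kappa * sigma**2, t * sigma
    print(sigma, quad, lin, quad > lin)
```

For precise studies the quadratic threshold lies below the significance line, while for noisy studies it lies above it, so noisy, marginally significant results fall short of the model's reporting threshold.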
Figure 3 shows three additional qualitative results on the thresholds $\overline{\beta}(\sigma), \underline{\beta}(\sigma)$ that deviate from the t-statistics rule. First, the asymmetry as in (9), $\overline{\beta}(\sigma) < -\underline{\beta}(\sigma)$, continues to hold over a large range of σ given heterogeneous G(σ) and small c > 0, so that the coefficients that underlie reported studies have an upward bias.¹⁹ Second, very imprecise studies with marginally significant results are omitted. When the dataset is not large or the identification strategy is not fully reliable, it is plausible that researchers or referees do not publish results that are only marginally significant.²⁰ Third, as long as researchers can claim "negative results" when the coefficient value is small even if it is statistically significant, very precise studies may always be reported regardless of their signs. If there were no study-specific effects nor uncertainties due to study design (σ0 = 0)²¹, then as σ → 0 the desirability of omission goes to zero since the signal becomes noiseless. Since the impact of σ on the thresholds is primarily quadratic, the omission probability is very small in the neighborhood of σ = 0.
¹⁸ The Bayesian updating uses weights inversely proportional to variance when σ0² = 0 because it is the only weighting that respects invariance to re-grouping of experiments. To see this, consider two experiments that have sample sizes k1, k2 in both treatment and control groups. Denoting $y^T_{ij}, y^C_{ij}$ as the outcomes of those in the treatment and control groups, the treatment effect estimate from each study is

\[ \beta_j = \frac{\sum_{i=1}^{k_j} y^T_{ij} - \sum_{i=1}^{k_j} y^C_{ij}}{k_j} \quad \text{for } j = 1, 2. \]

The aggregate estimate is defined as $\beta_{1+2} = \frac{\sum_j \sum_{i=1}^{k_j} y^T_{ij} - \sum_j \sum_{i=1}^{k_j} y^C_{ij}}{k_1 + k_2}$. The only way to obtain this expression from the coefficients β1, β2 is to apply the formula $\beta_{1+2} = \frac{k_1 \beta_1 + k_2 \beta_2}{k_1 + k_2}$, which is equivalent to the Bayes rule since variance is inversely proportional to the sample size.
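The re-grouping invariance in this footnote can be checked directly: the pooled estimate from the merged samples coincides with the sample-size weighted average of the separate estimates. A small sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Check of the re-grouping invariance: two experiments with k1, k2
# observations per arm (illustrative data, not from the paper).
k1, k2 = 40, 60
yT1, yC1 = rng.normal(0.5, 1.0, k1), rng.normal(0.0, 1.0, k1)
yT2, yC2 = rng.normal(0.5, 1.0, k2), rng.normal(0.0, 1.0, k2)
b1 = yT1.mean() - yC1.mean()
b2 = yT2.mean() - yC2.mean()
# Pooled estimate from the merged samples ...
pooled = np.concatenate([yT1, yT2]).mean() - np.concatenate([yC1, yC2]).mean()
# ... equals the sample-size (inverse-variance) weighted average of b1, b2.
weighted = (k1 * b1 + k2 * b2) / (k1 + k2)
print(pooled, weighted)
```

The two quantities agree exactly because the estimator variance is inversely proportional to the sample size, so sample-size weights and inverse-variance weights coincide.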
¹⁹ As Figure 4(B) and Table 2, column (4), show, the bias becomes the opposite when c > 0 is very large relative to σb.
²⁰ While some claim that it is optimal not to publish any noisy results, the numerical analysis shows that it is socially optimal that noisy extreme results are reported.
²¹ As Figure 4(B) shows, when σ0 > 0, the omission probability of the most precise studies becomes clearly greater than zero because the information contained in one study is bounded above. Since in reality researchers do not know σ0, this depends on the researchers' belief over the parameter.
3.3.2
Increasing the number of researchers
Figure 4: $\overline{\beta}(\sigma), \underline{\beta}(\sigma)$ thresholds under various σb and σ0
Notes: Figure 4 plots the $\overline{\beta}(\sigma), \underline{\beta}(\sigma)$ thresholds under c = 1/3 and varying standard deviations σb and σ0. In each figure, the convex solid lines are $\overline{\beta}(\sigma)$ and the concave solid lines are $\underline{\beta}(\sigma)$. There is only one threshold for N = 1 since there is no benefit of omission when there is only one researcher.
There were two critical limitations in the set-up for the analytical characterization regarding the number of researchers. First, there were only two researchers, even though there are more than three researchers on most research topics. Second, it assumed that the policymaker knew the number of researchers, whereas the problem of omission is often attributed to the uncertainty regarding how many studies are unreported. What are the effects of increasing the number of researchers on the policymaker's aggregation strategy, the researchers' omission and bias, and the welfare attainment?
First, the policymaker's supermajoritarian rule (a∗ = 1 ⇔ n1 > n0) holds in the optimal communication equilibrium even for N = 3, ..., 10 (Appendix Figure B7)²². This suggests that even if the policymaker does not know the number of underlying studies, she can still compare the number of reported positive vs negative results to make the decisions she would have taken with knowledge of N. While this result may rely on risk neutrality, as will be discussed in the sensitivity analysis, it suggests that the assumption that the policymaker knows N is not critical.²³
Second, the researchers’ omission probability gradually increases as N increases. The Figure
4 depicts the example equilibrium thresholds for N = 1, ..., 4 keeping all other environment
constant when (A) σb is high and σ0 is high, (B) σb is low and σ0 is low, and (C) σb is high and
σ0 is low. In all cases, the probability of omission conditional on study precision, P (mi = ∅|σi ),
increases with N for any values of σi . This is because, as N increases, the total information
owned by other researchers rises and leaving the decisions to others’ papers becomes more
desirable.24
Nevertheless, the bias in the coefficients that underlie reported studies may increase or decrease, as an increase in N may shift the thresholds in either direction.²⁵ This is because there are three channels through which N alters the thresholds. As Figure 4(A) shows, the effect of the changing pivotality condition shifts $\overline{\beta}$ upwards and $\underline{\beta}$ in ambiguous directions, potentially mitigating the bias as N increases.²⁶ As Figure 4(B) shows, the decreasing effect of the conservative
²² This is verified for a degenerate value of σ. N = 10 is the highest number of researchers attempted due to computational feasibility.
²³ The set-up needs to maintain the assumption that the researchers know how many other researchers exist on the same subject. First, this appears to be closer to actual scientific practice than the assumption that the readers also know the number of researchers. Second, Figure 4 demonstrates that the thresholds do not change qualitatively with N and the changes are not quantitatively large: thus, even if there were uncertainty in N from the researchers' perspective, they may still be able to choose approximately optimal thresholds.
²⁴ This result is consistent with the literature on voting theory that shows that the abstention probability increases as the number of voters increases.
²⁵ This inquiry relates to the literature on media that explores the impact of market competition on media bias, and finds that a higher number of competing senders may increase the bias arising from taste and decrease the bias arising from reputational motives (Gentzkow et al., 2016).
²⁶ Consider, for example, the pivotality conditions for N = 2 and N = 3. $\overline{\beta}(N = 2)$ is lower than $\overline{\beta}(N = 3)$ because it satisfies the indifference condition (16) when one researcher receives a high signal, as opposed to when one receives a high signal and another receives an intermediate signal. $\underline{\beta}(N = 2)$ may be lower or higher than $\underline{\beta}(N = 3)$ because it satisfies the indifference condition (17) when only one other researcher receives an intermediate negative signal, as opposed to two researchers receiving intermediate negative signals or one receiving a positive and another
prior as in (5) shifts both $\overline{\beta}$ and $\underline{\beta}$ downwards. When there are more researchers, each researcher needs less extreme signals to overturn the default decision. As Figure 4(C) suggests, there is also the effect of equilibrium threshold adjustment that shifts both $\overline{\beta}$ and $\underline{\beta}$ in directions that offset the first two effects. For example, $\underline{\beta}$ may shift downwards due to an upward shift in $\overline{\beta}$, especially among noisy studies, increasing the bias. These considerations jointly determine the conditions under which an increase in the number of researchers, N, may increase or decrease the bias.
Finally, while a rigorous proof is beyond the scope of this paper²⁷, the paper conjectures that information aggregates fully in the sense that, as N goes to infinity, the probability that the equilibrium action is the best action converges to 1 (Feddersen and Pesendorfer, 1996). The incompatibility between informative and strategic voting occurred due to an inflexible voting rule (Austen-Smith and Banks, 1996). In this communication model, since the aggregation rule through voting can flexibly adjust, the analogous source of inefficiency is absent. Any thresholds will asymptotically fully convey the policy benefit b, since the cumulative distribution function of the normal distribution is strictly increasing everywhere and thus one-to-one. As Figure 4 shows that the thresholds will likely be finite at least for some range of σ, information is most likely to aggregate fully as N → ∞ in this model.
3.3.3
Quantitative characterization of bias, omission, and welfare
Could the model explain a substantial degree of upward bias due to omission²⁸ and frequent omission, and suggest that the welfare gain from the optimal threshold, as opposed to a linear threshold with full reporting, is sizable? Table 2 summarizes the quantitative exploration.
In many specifications, Panel (A) suggests a sizable upward bias in unweighted²⁹ average estimates. Consistent with Figures 3 and 4, these biases are driven by noisy studies. In contrast, the most precise studies have very small biases. Note that, for large c > 0, there can be a downward bias because studies whose coefficients are near c are omitted.³⁰ Quantitatively, despite its symmetric set-up, the model can explain a bias as large as one standard deviation of the underlying distribution of benefits, at least among the least precise studies.

Panel (B) shows that the average omission rate can be as high as roughly 30 to 50 percent. While the most precise studies have only a 1 to 3 percent omission rate in the absence of study-specific effects (σ0 = 0), the noisiest studies may be omitted roughly 70 to 80 percent of the time. On the other hand, a belief in high σ0 can raise the omission rate even among the most precise
receiving negative signals.
²⁷ Analytically, it is impossible to characterize the voting rule due to the difficulty of evaluating the sum of truncated normals. Numerically, it is impossible to investigate asymptotic properties because of computational feasibility. Moreover, unlike in voting theory, the inquiry into asymptotic properties is not central to the application of scientific communication because there are usually only a few researchers on one topic.
²⁸ Note that this is a lower bound of the actual bias under the t-statistic rule because inflation of coefficients leads to an even larger upward bias in reported estimates.
²⁹ Note that this is different from the meta-analysis estimate with the Bayes' rule, which puts higher weight on the more precise studies. Many meta-analysis studies discuss these unweighted estimates.
³⁰ This prediction of downward bias also occurs with the Hedges selection model discussed in Section 4.1.
Table 2: Bias, omission, and welfare

                                          (1)       (2)      (3)       (4)      (5)
                                       Baseline  High σb  High σ0   High c   High N
(A) Bias:              i. overall         0.34     0.20     0.26     -0.56     0.31
                       ii. σi = 0.1       0.00     0.00     0.01     -0.03     0.00
                       iii. σi = 1        1.01     0.96     0.60     -1.27     1.07
(B) Omission
    probability:       i. overall         0.43     0.32     0.44      0.42     0.48
                       ii. σi = 0.1       0.03     0.01     0.30      0.03     0.03
                       iii. σi = 1        0.70     0.67     0.60      0.77     0.80
(C) Welfare:           i. unrestricted    0.97     0.99     0.96      0.88     0.96
                       ii. restricted     0.90     0.97     0.89      0.67     0.91
Specification changes
from baseline                              --      σb = 1   σ0 = 1/2  c = 1/3  N = 3
Notes: Table 2 summarizes (A) the bias E[βi|mi ≠ ∅, σi], (B) the omission probability P(mi = ∅|σi), and (C) the ex-ante welfare E[a∗(m∗(βi, σi))[Eb(βi, σi) − c]] under various settings. In (A) and (B), i. overall values are expected values unconditional on the σi realization; ii. focuses on the most precise studies (σi = 0.1); and iii. focuses on the least precise studies (σi = 1). In (C), welfare is computed as a fraction of the full information welfare. i. unrestricted refers to the environment without a linear t-statistics rule, whereas ii. restricted refers to the hypothetical setting in which no omission is allowed and the threshold is restricted to follow $\overline{\beta}(\sigma) = t\sigma$. Here, t is computed to be the optimal value. Since there is no omission under the exogenous restriction, the bias and omission probability are both zero. The baseline specification applies σb = 1/3, σ0 = 0, c = 0, and N = 2. Columns (2)-(5) modify this environment for each specification as presented.
studies to around 30 percent. Combined with the result in (A), the baseline case demonstrates that, among the 30 percent of reported noisy studies (σi = 1), 28 percent are positive results and only 2 percent are negative results. Conversely, the most precise studies have mostly negative conclusions, roughly to the extent that full reporting suggests. This is consistent with researchers' expectation that, when their set-up is noisy, their studies are publishable only when they are significant and "right-signed."
Finally, the model can quantify the welfare gains from permitting publication bias relative to
imposing a restriction that every binary conclusion of null hypothesis testing be reported.
As discussed in the introduction, coarse communication can be largely welfare-reducing: even
with sophisticated readers who compute the posterior correctly and with flexible adjustment
of t-statistics, there is roughly a 3 to 30 percent welfare loss relative to the full information
benchmark. Allowing some omission and inflation roughly halves these welfare losses, to 1 to 12
percent. This analysis demonstrates that the welfare consequences can be quantitatively important.
3.3.4
Implications for empirical tests
The numerical analysis provides a qualitative prediction that, as Figure 3 shows, omission
occurs mainly among the imprecisely estimated studies with intermediate coefficient magnitude.
In particular, when the negative results are reported, they tend to be either precisely estimated
or have negative coefficients with large absolute values. This qualitative result is consistent
across all specifications.
While a positive correlation between coefficients and standard errors may arise when the underlying effectiveness, b, is large, the prediction on the publication selection process raises questions
about the validity of interpreting this correlation as evidence of "biased publication bias" (Begg
and Mazumdar, 1994; Egger et al., 1997). A strong correlation may be due to an unbiased motive to report only the very precise null results. Moreover, even when there is a large degree
of omission, the correlation may be weak because only a few results are both imprecise and very negative.
The analysis suggests that caution is warranted when relating the correlation tests to biased
motives of researchers.
3.4
Sensitivity Analysis
The main model made simplifying assumptions to focus on the key mechanism due to coarse
messages. While a full analysis is beyond the scope of this paper, the following examples
illustrate how the results may change under alternative assumptions involving conflict of interest,
sequential reporting, and risk aversion.31
3.4.1
Conflict of interest vs common interest
While the main model assumed perfect alignment of interests, public discussions frequently
suppose some misalignment of objectives due to private motives for policy, different prior
beliefs32 , or private benefits and costs of publication. Alternatively, the policymaker may have
bias due to political affiliations. Communication under conflict of interest depends critically
on assumptions about commitment ability. Throughout this discussion, assume that the
policymaker cannot commit to a particular decision rule π(n); if she could, then the equilibrium
under common interest would remain an equilibrium, because she can set policies so
that researchers find deviating from the common-interest strategy undesirable.
Small vs large misalignment. Consider the benchmark model environment with 2
researchers and cost of policy c = 0. For example, researchers may have biased motives
for promoting or discouraging policies, so that their payoff can be written as a(b + d), where
d ∈ R denotes a distortionary motive. While the main model abstracted away from the journal
referee process, it is an important process in reality, and a preference for publication can be
written as ab + d·1(mi ≠ ∅). For small d, the equilibrium policy rule a(n) remains the same,
and the message strategies (β̄, β̲) change only slightly. To see this, note that under common
31
There are other important considerations, such as irreversibility of decisions, endogenous precision and
information acquisition, uncertainty over N , heterogeneity in c after the reports, endogenous agenda-setting,
multi-dimensional policies, and multi-dimensional learning that involves reputational concerns of researchers.
However, they require involved modification to the set-up, and are thus left for future analysis.
32
Aumann (1976) suggested that, under a common prior, people cannot agree to disagree because they rationally
update their beliefs using others' beliefs. However, it is commonplace to observe researchers with different beliefs
in the real world, perhaps due to limited communication, and there are fruitful communication models that
investigate heterogeneous beliefs (see, for example, Banerjee and Somanathan, 2001).
interest, the policymaker strictly preferred to implement the policy when n1 > n0, because the
policymaker focuses on the average E[β|m], whereas indifference was satisfied at the thresholds
(β̄, β̲). Therefore, if the thresholds change continuously as d increases, then for sufficiently small
d, the supermajoritarian decision rule π(n) as in (7) remains optimal. As shown in the
proof in Appendix A6, (β̄, β̲) change continuously as d changes in both examples. Regardless of
whether senders can commit, a small conflict of interest changes the equilibrium only marginally;
thus, the main analysis and its empirical predictions remain qualitatively valid even without the
assumption of perfectly disinterested researchers.
Clearly, large biases in researcher motives can significantly alter the qualitative characterization of the equilibrium. If the researchers cannot commit to reporting strategies, then
the policymaker may never trust their reports, and thus no informative communication will
be feasible (Crawford and Sobel, 1982; Li, Rosen, and Suen, 2001). If the researchers can
commit to reporting rules, they may optimally persuade the policymaker to
implement the policies more often than the social optimum (Kamenica and Gentzkow, 2016).
Allegiance bias. The model can also explain allegiance bias, the correlation between the conclusions of studies and their authors, as vote counting considerations amplify originally
small disagreements among researchers. Consider the example in the introduction and suppose that researcher 1 hopes to promote the policy, so that his utility is a(b + d), d > 0,
whereas researcher 2 is unbiased, so that his payoff is ab. Recall that the unbiased researcher's indifference conditions are given by β̄2 = −E[β1 | β̄1 > β1 ≥ β̲1, β2 = β̄2] and
β̲2 = −E[β1 | β1 ≥ β̄1, β2 = β̲2], which are decreasing in the biased researcher's strategies (β̄1, β̲1).
Therefore, when the interested researcher lowers his thresholds (β̄1, β̲1) to induce more policies,
the unbiased researcher responds by raising his thresholds (β̄2, β̲2) and recommends fewer policies. This further feeds back into the interested researcher's strategies and induces him to
recommend policies even more frequently. In this way, even if the underlying differences in
preferences or beliefs are small, polarization in the standards applied to draw conclusions arises endogenously through the strategic substitutability between researchers. It is natural
to expect small differences in beliefs among researchers, and this explanation can shed light on
one aspect of the nature of academic debates.
To quantify the magnitude, consider an example with 2 researchers, c = 0, and σ0 = σ = 1.
If neither researcher has bias, then the equilibrium threshold is β1 = 0.19. If researcher 1 has
bias d = −0.1, so that he is biased towards policy implementation, and researcher 2 does not
have bias, then the thresholds become β1 = −0.25 and β2 = 0.5, as illustrated in Figure 5.
Note that, if there were only one researcher, the threshold for recommending the policy would change by
only −0.2 given d = −0.1 in this set-up. This example illustrates that, because of the strategic
substitutability, the difference in thresholds, β2 − β1 = 0.75, becomes more than three times
larger than that under N = 1.
Figure 5: Equilibrium responses to each other's β

Notes: To visualize the strategic substitutability, Figure 5 shows the best response function (11) after
substituting (12); that is, it plots β̄i = BRi(β̄−i, β̲−i) = BRi(β̄−i, BR−i(β̲i)) for i = 1, 2.
The solid lines show the best response functions when neither researcher has bias. The dashed lines
show the best response functions when researcher 1 has bias d = −0.1.
3.4.2
Sequential vs simultaneous report
The model focused on simultaneous reports, in which the researchers present their results
with no knowledge of others' results. While actual reporting is sequential, the simultaneity
assumption can be loosely interpreted as researchers facing uncertainty about the future
results of others. This simplification is common in both cheap talk and voting models,
even though actual conversations take place sequentially and many votes take place
with lags across constituencies, so that information is revealed gradually. The following
examples demonstrate how dynamic considerations maintain or alter the results.
Example 1. (Equivalence) Consider the main model with 2 researchers. Suppose, instead, that researcher i = 1 reports his result first, and researcher i = 2 observes only the message
m1 when he reports his own result. Then the optimal equilibrium is equivalent to that of the model
with simultaneous reports.
To see why, note that researcher 2 was already conditioning his report on the realization of
m1 at the pivotal case. Thus, maintaining the strategy from the simultaneous-report equilibrium
is optimal. Given that researcher 2's strategy is unaltered, it is also optimal for researcher 1 to
keep his strategy. This result is analogous to Dekel and Piccione (2000)'s result that the
simultaneous voting game's equilibria are also equilibria of the sequential game.33 The key is that
the information asymmetry exists between the researchers, not the exact timing assumption.
Example 2. (Reviewer) Consider the main model with 2 researchers. Suppose, instead,
that researcher i = 1 reports his result first, and researcher i = 2 observes {m1, β1, σ1}, but only when
m1 ≠ ∅, before he reports his result. Then, when the policymaker adopts the supermajoritarian
rule as in (7), m∗1 = 1 and the first best is always attained.
Example 1 assumed that researcher 2 does not read researcher 1's paper carefully enough to learn
the signals {β1, σ1}. In contrast, it is reasonable to suppose that researcher 2 can
obtain the full details whenever researcher 1 does not omit the result.
This example highlights a motive for publishing papers solely to communicate detailed
results to other researchers: to convey the signal to later researchers, the early researcher must
report either m1 = 1 or m1 = 0. If the early researcher reports m1 = 0, however, no
message from researcher 2 can lead to policy implementation under the supermajoritarian rule.
Because the early researcher is sure that the later researcher will aggregate the evidence and vote
for the socially optimal decision, it is strictly optimal to report m1 = 1, however negative β1
is, to maintain the option value. In this way, the expectation that a reviewer will directly summarize
the signals, or that a replicator will confirm the validity of the message, drastically alters the nature of the
equilibrium. First best welfare is attained because the later researcher synthesizes both
results to make the policy recommendation. This example captures the conventional wisdom to be
cautious of literature at an early stage of inquiry, because early researchers may expect others
to correct the policy in the right direction even if their reports turn out to be overly optimistic. That
early literature tends to have large coefficient magnitudes has been empirically confirmed in some
studies (Munafo et al., 2007).
3.4.3
Risk aversion vs risk neutrality
Even though the main set-up focused on the role of standard errors σ in the Bayes' rule,
the scientific requirement of low p-values may be primarily due to the risk aversion of a decision-maker who hopes to be "certain enough" of the positive effect. One can apply the Taylor
approximation to utility over policy benefit b with constant relative risk aversion γ ≥ 0 to
obtain the decision rule

(Eb)^(1−γ)/(1−γ) − γ Var(b)/(2(Eb)^(1+γ)) ≥ c,

confirming that uncertainty over the policy benefit deters policy. With a single agent and a flat prior,
the optimal threshold can be written as

t ≥ t̲ = ( (2/γ)(1/(1−γ) − c/β^(1−γ)) )^(−1/2)   if γ ≠ 1,
t ≥ t̲ = |2(log β − c)|^(−1/2)                   if γ = 1,    (18)

33
When there are more than 3 researchers, the final researcher can slightly improve the reporting strategy
with the knowledge of which pivotal message realization he faces. In this sense, the availability of others'
reports may improve welfare, unlike in the Dekel-Piccione model. Nevertheless, unlike in sequential cheap talk
models (Krishna and Morgan, 2004), the characteristics of the equilibrium do not change; their results were based
on risk averse preferences and equilibrium selection.
so that the threshold t-statistic is decreasing in the coefficient magnitude β. Thus,
risk aversion could mitigate the convexity of the t-statistics threshold for positive conclusions, as
shown by the numerical simulation in the risk-neutral model. With many researchers,
since the posterior variance is Var(b) = (Σ_{i=0}^{N} 1/σi²)^(−1), and Jensen's inequality implies
E[(Σ_{i=0}^{N} 1/σi²)^(−1)] > (1/σ0² + 1/σi² + E[Σ_{j≠i} 1/σj²])^(−1), the researchers become on average more cautious
about recommending the policy because they are uncertain about what other researchers know.
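The threshold in (18) for γ ≠ 1 can be recovered from the decision rule above by substituting the flat-prior posterior moments Eb = β and Var(b) = σ² = β²/t²; a sketch of the algebra (assuming β > 0 and a positive term in parentheses):

```latex
\frac{\beta^{1-\gamma}}{1-\gamma}-\frac{\gamma}{2}\,\frac{\beta^{2}/t^{2}}{\beta^{1+\gamma}}\geq c
\;\Longleftrightarrow\;
\frac{1}{1-\gamma}-\frac{\gamma}{2t^{2}}\geq \frac{c}{\beta^{1-\gamma}}
\;\Longleftrightarrow\;
t\geq\underline{t}=\left(\frac{2}{\gamma}\left(\frac{1}{1-\gamma}-\frac{c}{\beta^{1-\gamma}}\right)\right)^{-1/2}.
```

For γ = 1, the same substitution into log β − σ²/(2β²) ≥ c gives t ≥ |2(log β − c)|^(−1/2) whenever log β > c.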
4
Empirics
The model makes empirical predictions for meta-analysis data that are distinct from those of other
commonly used models of publication bias: omission34 occurs among imprecisely estimated
coefficients with intermediate magnitude. This section presents tests that distinguish between
the models using a widely available type of data35, and applies them to two examples in economics.
The assumptions on the publication selection process are important for choosing the appropriate bias
correction method in meta-analyses. Given the possible relevance of the communication model,
the section proposes a new bias correction method that is useful under commonly suggested
models of publication bias.
4.1
Predictions
Consider the following data generating process with heterogeneous effects, where {βi, σi} denote the main coefficient and standard error of each study i ∈ {1, ..., N} such that σ1 ≤ ... ≤ σN. The
data of published studies are generated in three steps. First, the study precision σi ∼ G and
the underlying effect bi ∼ N(b0, σ0²) are independently determined. Second, the random error
εi ∼ N(0, σi²) is independently drawn and the coefficient βi = bi + εi is determined. Third,
researcher i decides whether to report a binary message based on {βi, σi} or not.
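As a concrete illustration, the three-step process can be simulated directly. This is a minimal sketch: the lognormal choice for G, the parameter values, and the 5-percent reporting rule in step 3 are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
b0, sigma0 = 0.1, 0.3          # hypothetical values for b_0 and sigma_0

# Step 1: study precision sigma_i ~ G (lognormal here, an assumption)
#         and underlying effect b_i ~ N(b0, sigma0^2), drawn independently
sigma = rng.lognormal(mean=-1.0, sigma=0.5, size=N)
b = rng.normal(b0, sigma0, size=N)

# Step 2: random error eps_i ~ N(0, sigma_i^2); coefficient beta_i = b_i + eps_i
beta = b + rng.normal(0.0, 1.0, size=N) * sigma

# Step 3: researcher i decides whether to report based on (beta_i, sigma_i);
#         a placeholder rule: report only when |beta_i / sigma_i| >= 1.96
reported = np.abs(beta / sigma) >= 1.96
```

Under full reporting, the mean of `beta` is an unbiased estimate of b0; conditioning on `reported` already shifts it, which is exactly the selection that the tests below are designed to detect.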
Contrast with existing models. This paper's model predicts that, when
negative results are published, they have either high precision or a negative coefficient that is large
in magnitude. To the best of my knowledge, to date, there are three models in economics and
34
Although inflation may also be an important source of bias, it is beyond the scope of this paper to empirically
test its patterns, for two reasons. First, a test of inflation requires far more data points than a test of
omission, since it focuses on the local distribution around the cut-off as opposed to the entire distribution. Unlike
the existing evidence, which has pooled studies of various types to examine the excess mass around the cut-off,
this analysis of the heterogeneous extent of inflation across study precision requires a set of studies
that estimate one common parameter. Second, the theory does not provide parameter-free predictions on the
pattern of inflation across heterogeneous study precisions. If the target effectiveness c > 0, then there should
be little inflation among precise studies; but if c ≤ 0, then there should be more inflation among the precise studies
than among the noisy ones. If the distribution of standard errors G(σ) has a fat tail, then there should be
less inflation among the noisiest studies; if G has a thin tail, there will be more inflation among them. In this
way, the pattern of inflation depends on the specific model environment, which is difficult to infer solely from
meta-analysis data.
35
While it would be ideal to test the models with data that include observations of omitted studies or non-inflated specifications, such data are available only in the rare circumstances in which funding or regulatory agencies hold
them.
Table 3: Comparison between empirical distribution and counterfactual distribution

         This paper's model   Uniform selection   Extremum selection   Gradual selection
G∅(σ)    Ĝ∅ > G̃∅              Ĝ∅ = G̃∅             Ĝ∅ > G̃∅              Ĝ∅ > G̃∅
H0(β)    Ĥ0 > H̃0              Ĥ0 > H̃0             Ĥ0 < H̃0              Ĥ0 < H̃0

Notes: Table 3 summarizes the prediction of each model. Titles are given to highlight the contrast
between them. Let G̃∅(σ) denote the distribution of standard errors conditional on the study being
a negative result, absent any selection. Let H̃0(β) denote the distribution of coefficients conditional
on the study not rejecting the null hypothesis with threshold t̄, absent any selection. Ĝ∅ and Ĥ0 denote
the respective observed distributions.
biostatistics that describe other data selection processes and have been used for bias correction
methods.36 However, none of them makes the same set of empirical predictions as
the communication model. The distinction between the models arises in the precision and
coefficient magnitude of the published null and negative results.
First, the uniform selection model (Hedges, 1992) supposes that the publication probability is
a step function of the t-statistic,

P(study i is reported) = η1 if |βi|/σi ≥ t̄, and η0 if |βi|/σi < t̄,    (19)

where η1 > η0 > 0. This says that results that are not positively significant are
systematically less likely to be published. Thus, conditional on being null results, the distribution
is identical to the underlying distribution: Ĝ∅ = G̃∅. This model is commonly used in economics
with maximum likelihood estimation to correct for publication bias (Hedges, 1992; Ashenfelter
et al., 1999; McCrary et al., 2016). In this framework, the positive correlation test discussed
in Section 2 is a joint test of whether publication selection takes place and whether the underlying effect
is positive (b0 > 0), since, if b0 < 0, the correlation between coefficients and standard errors will
be negative.
Second, the extremum selection model (Duval and Tweedie, 2000) assumes that the most
extreme negative values are suppressed from publication:

P(study i is reported) = 1 if βi ≥ βmin, and 0 if βi < βmin,    (20)

where βmin is some threshold. In the common case where βmin < b0, there arises little selection
36
Fafchamps and Labonne (2016) informally propose an alternative rule of journal acceptance: papers can
be accepted when the result is statistically significant or when it can reject a modified null hypothesis that the
effect is as high as some target level c > 0. This model also predicts that the publication probability is
U-shaped in the magnitude of the coefficient. Unlike the communication model of this paper, however, their model
predicts that negative results require a smaller magnitude than positive results in absolute value. If the
underlying effect were zero, their model predicts deflation of the overall estimate when not adjusted for publication
bias. In this way, the two models can still be empirically distinguished.
among the most precise studies. Thus, null results are more likely to be reported when the
standard error is small, so that Ĝ∅ > G̃∅. At the same time, the model also suggests that
marginally insignificant negative results are not particularly more likely to be omitted, since
the selection is unrelated to statistical significance milestones: Ĥ0 < H̃0. This model is
most widely used as the basis for the trim-and-fill publication bias correction method in medicine:
the method removes the noisy estimates with large coefficients until the positive correlation
between coefficients and standard errors no longer exists, and imputes the missing studies using
the sample so that the bias no longer exists, as discussed in standard textbooks such as the Cochrane
Handbook (Higgins and Green, 2011) and Hunter and Schmidt (2004).
Third, the gradual selection model (Copas and Li, 1997) supposes the selection probability

P(study i is reported) = Φ(δ0 + δ1/σi + δ2βi),    (21)

where δ1, δ2 > 0, and uses two-step estimation for meta-analysis. It suggests that, as σi becomes
small, the publication probability approaches 1, so that Ĝ∅ > G̃∅. Nevertheless, the publication
probability is also increasing in βi, so that marginally insignificant negative results are more
likely to be reported than negative results with large magnitude: Ĥ0 < H̃0. The Copas
correction method iteratively fits δ2 values starting from zero, and stops when the residual
asymmetry unexplained by the model is sufficiently small.
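For comparison, the three selection rules above can be written side by side. The parameter values below (η1, η0, βmin, and the δ's) are placeholders chosen for illustration, not estimates from any dataset.

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

T_BAR = 1.96  # significance threshold t-bar

def p_uniform(beta, sigma, eta1=1.0, eta0=0.2):
    # Hedges (1992): step function of the t-statistic, as in (19)
    return eta1 if abs(beta) / sigma >= T_BAR else eta0

def p_extremum(beta, sigma, beta_min=-0.5):
    # Duval and Tweedie (2000): suppress the most negative coefficients, as in (20)
    return 1.0 if beta >= beta_min else 0.0

def p_gradual(beta, sigma, d0=0.0, d1=0.5, d2=1.0):
    # Copas and Li (1997): smooth probit in precision and coefficient, as in (21)
    return Phi(d0 + d1 / sigma + d2 * beta)
```

Note the contrast the text emphasizes: only the gradual model makes reporting smoothly increasing in βi, while the extremum model ignores the standard error entirely.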
4.2
Tests
To test these predictions, the paper develops a semi-parametric Kolmogorov-Smirnov-type
(henceforth, KS) test that compares the predicted and observed distributions among null and
negative results. While the meta-analysis literature has posited the aforementioned models of the
underlying selection process, to the best of my knowledge, it has not tested them
despite the feasibility of doing so under some common parametric assumptions. While the estimates have
a joint two-dimensional distribution on {β, σ}, the analysis focuses on each dimension separately,
since the KS test is well established for one-dimensional random variables, and since this allows for
interpretation with the predictions along each dimension as summarized in Table 3.
Assumptions. The test uses two assumptions that are common in meta-analysis
models:

A1. Underlying effects with independent and identical normal distribution: for all i,

bi ∼ N(b0, σ0²).

Since σ0 can take any value, this heterogeneous effects assumption accommodates, but does
not impose, commonalities across studies37. In addition, A1 supposes that the study precision
σi is independent of the true underlying policy effect, E[β|σ] = b0 for all σ, and that the underlying
37
This model is referred to as “random effects” model in biostatistics.
effects are homoskedastic.38

A2. No inflation within statistically significant results: if |βi|/σi ≥ t, then the data generating
process is

βi = bi + εi, where εi ∼ N(0, σi²).

This assumption says that bias due to inflation of marginal results is unimportant. With
the limited sample sizes relevant for meta-analysis, it is infeasible to distinguish inflation from omission; such a test requires a large sample size at the margin of statistical significance.
While restrictive, this assumption is applied in all other studies on bias correction.
Under A1 and A2, the likelihood of an observation {βi, σi} can be written semi-parametrically:
a parametric assumption along the βi dimension of the joint distribution is necessary
to extrapolate the counterfactual distribution, and it is sufficient to cover the σi dimension
flexibly; no parametric assumption on the distribution of σi is needed.
Note that it is not necessary to assume that all statistically significant results are reported:
even among statistically significant results, the publication probability could be higher among
precise studies. Publication bias occurs due to differential publication likelihood along the
coefficient values conditional on precision. Thus, the analysis uses the distribution of
statistically significant results, as reflecting both the underlying distribution and the selection inherent
in study precision, to construct a counterfactual distribution of negative results that reflects
both considerations.
Bootstrap-based two-step estimation and testing. The test involves two-step estimation of the counterfactual distribution and its KS test against the observed distribution: the test
first estimates the parameters {b0, σ0} to construct the counterfactual distributions, and then
estimates the KS statistics.

Estimation step 1. Estimate {b0, σ0}: use maximum likelihood estimation
with likelihood function Π_{i=1}^{n∗} φ((βi − b0)/√(σi² + σ0²)), where i = 1, ..., n∗ are the most precise studies.
As will be discussed in the bias correction section, n∗ is chosen to maximize the precision of the
estimate.39 The distribution of estimates {b̂0, σ̂0} is obtained by bootstrap with replacement,
such that the sampling probability is proportional to the informational value of study i:
1/(σi² + σ0²).
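Estimation step 1 can be sketched as follows. For simplicity, this sketch uses a grid search over (b0, σ0) with the full normal log-likelihood (the density φ evaluated with its scale term) on simulated data with hypothetical true values; the paper's actual implementation and the precision-weighted bootstrap are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated "most precise" studies with hypothetical true values
n_star, b0_true, s0_true = 200, 0.5, 0.3
sigma = rng.uniform(0.1, 1.0, size=n_star)
beta = rng.normal(b0_true, np.sqrt(s0_true**2 + sigma**2))

def neg_loglik(b0, s0, beta, sigma):
    # under A1, beta_i ~ N(b0, sigma_i^2 + s0^2)
    v = sigma**2 + s0**2
    return np.sum(0.5 * np.log(2 * np.pi * v) + (beta - b0) ** 2 / (2 * v))

# grid-search MLE (a numerical optimizer would normally be used instead)
b_grid = np.linspace(-1.0, 2.0, 301)
s_grid = np.linspace(0.01, 1.5, 150)
nll = np.array([[neg_loglik(b, s, beta, sigma) for s in s_grid] for b in b_grid])
ib, js = np.unravel_index(np.argmin(nll), nll.shape)
b0_hat, s0_hat = b_grid[ib], s_grid[js]
```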
Estimation step 2. Construct the counterfactual distributions G̃∅ and H̃0: for each realization of {b̂0, σ̂0}, estimate the counterfactual distribution G̃∅ using the standard errors
of the significant results, and H̃0 using the standard errors of the negative results:

G̃∅(σ | b̂0, σ̂0) = (1/C) Σ_{i: |βi| ≥ tσi} [Φ((tσi − b̂0)/√(σi² + σ̂0²)) − Φ((−tσi − b̂0)/√(σi² + σ̂0²))] / [(1 − Φ((tσi − b̂0)/√(σi² + σ̂0²))) 1(βi ≥ tσi) + Φ((−tσi − b̂0)/√(σi² + σ̂0²)) 1(βi < tσi)] · 1(σ ≥ σi)    (22)

H̃0(β | b̂0, σ̂0) = (1/n0) Σ_{i: mi = 0} min{ Φ((β − b̂0)/√(σi² + σ̂0²)) / Φ((tσi − b̂0)/√(σi² + σ̂0²)), 1 }    (23)

where C = Σ_{i: |βi| ≥ tσi} [Φ((tσi − b̂0)/√(σi² + σ̂0²)) − Φ((−tσi − b̂0)/√(σi² + σ̂0²))] / [(1 − Φ((tσi − b̂0)/√(σi² + σ̂0²))) 1(βi ≥ tσi) + Φ((−tσi − b̂0)/√(σi² + σ̂0²)) 1(βi < tσi)]. The algorithm uses a
sufficiently fine grid of σ and β for the numerical estimation of the distributions.

38
This condition is violated in the presence of small study effects: when σ is large due to small sample size, the
true effect b0 could also be larger because the experimenter can pay more attention per subject. Nevertheless,
it is a widely used simplification in meta-analysis.

39
Since this test is used to propose the bias correction approach, one may be concerned that the conclusion
and the method choice are mutually supporting. However, since this bias correction approach is
appropriate even under other models of publication bias, such circularity does not arise here.
Moreover, existing methods are misspecified if the predictions of this paper's model apply. In addition, the
estimation step of {b̂0, σ̂0} uses the entire sample without regard to the studies' conclusions because doing so
improves the estimate of {b̂0, σ̂0}.
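The counterfactual H̃0 in (23) can be sketched directly: average, over the null-result studies, the normal CDF renormalized by each study's probability of a null conclusion. The inputs below (σ's of the null studies, b̂0, σ̂0) are simulated placeholders.

```python
import math
import numpy as np

# vectorized standard normal CDF
Phi = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))

def H0_tilde(beta_grid, b0_hat, s0_hat, sigma_null, t=1.96):
    """Counterfactual CDF of coefficients among null results, as in (23)."""
    sigma_null = np.asarray(sigma_null, dtype=float)
    s = np.sqrt(sigma_null**2 + s0_hat**2)
    out = []
    for beta in np.asarray(beta_grid, dtype=float):
        # CDF of beta_i renormalized by P(beta_i < t * sigma_i), capped at 1
        ratio = Phi((beta - b0_hat) / s) / Phi((t * sigma_null - b0_hat) / s)
        out.append(np.minimum(ratio, 1.0).mean())
    return np.array(out)

# illustrative inputs (placeholders, not estimates from the paper)
sigma_null = np.array([0.2, 0.5, 1.0])   # standard errors of null-result studies
grid = np.linspace(-3.0, 3.0, 61)
H = H0_tilde(grid, b0_hat=0.05, s0_hat=0.1, sigma_null=sigma_null)
```

The returned array is a valid CDF on the grid: nondecreasing and reaching 1 at the upper end, which is what the KS comparison in the testing step requires.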
Testing step. Estimate the distribution of the KS statistics through bootstrap40, and compute the
p-value of the one-sided test: for each counterfactual distribution G̃∅(σ | b̂0, σ̂0) and H̃0(β | b̂0, σ̂0),
compute the one-sided KS statistic:

DG(b̂0, σ̂0) = sup_σ { Ĝ∅(σ) − G̃∅(σ | b̂0, σ̂0) }    (24)

DH(b̂0, σ̂0) = sup_β { Ĥ0(β) − H̃0(β | b̂0, σ̂0) }    (25)
For each counterfactual distribution, the test applies the inverse cumulative distribution function method to generate a simulated empirical distribution with sample size n0. The
p-value of each distribution given {b̂0, σ̂0} is the fraction of simulated random draws whose
KS statistic is at least as large as DG or DH, respectively. Since the p-value of the KS test
is defined as the probability that the maximum difference between the theoretical distribution
and an empirical distribution of the same sample size is at least as large as the statistic D, the average
p-value across the bootstrap estimates of {b̂0, σ̂0} is the p-value of the overall test. The test
repeats this simulation until the estimated p-value converges.

40
The analysis employs bootstrap methods to account for two difficulties in computing the p-value of this
test. First, unlike the standard KS test between a theoretical distribution and an empirical distribution, the
theoretical distribution here has its own uncertainty due to estimation with a finite sample. Since ignoring this
uncertainty in two-step estimations (Newey and McFadden, 1994) underestimates the p-value in this application,
I incorporate it by bootstrap sampling. The bootstrap is appropriate for the distribution of {b̂0, σ̂0} since
the parameters of interest are not extremum statistics of the parameter space. Second, unlike the standard
two-sample KS test with well-defined sample sizes, one sample is used to construct the counterfactual
distribution as a weighted sum. Since the proof of the KS statistic's rate of convergence uses
uniform weights across samples, it cannot be applied directly to this test. Bootstrap simulation that computes the
mapping between KS statistics and p-values using identical sample sizes and distributions is a straightforward
solution to this problem. The bootstrap with the inverse cumulative distribution function method incorporates the
high KS statistic thresholds due to small sample sizes: it makes multiple observations take the same value in the
distribution, thus making large KS statistics more likely. The KS approach is known to be conservative because
it applies to any distribution; for specific distributions, more efficient thresholds are often available. Since
meta-analyses have limited sample sizes, knowing the exact KS test statistic thresholds and increasing the
power of the test is valuable.
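The testing step can be sketched as follows: compute the one-sided KS statistic as in (24)-(25), then simulate same-size samples from the counterfactual CDF by the inverse-CDF method on a grid to obtain the p-value. The standard normal counterfactual at the bottom is only a stand-in for G̃∅ or H̃0, and the sample sizes are illustrative.

```python
import math
import numpy as np

def one_sided_ks(sample, cf_cdf):
    """D = sup_x [F_emp(x) - F_cf(x)], evaluated at the sample points."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    return float(np.max(np.arange(1, n + 1) / n - cf_cdf(x)))

def bootstrap_pvalue(D, cf_cdf, grid, n, sims=1000, seed=0):
    """Fraction of size-n samples drawn from cf_cdf whose statistic is >= D."""
    rng = np.random.default_rng(seed)
    F = cf_cdf(grid)               # counterfactual CDF evaluated on the grid
    hits = 0
    for _ in range(sims):
        u = rng.uniform(size=n)    # inverse-CDF draws from the counterfactual
        draw = grid[np.clip(np.searchsorted(F, u), 0, len(grid) - 1)]
        hits += one_sided_ks(draw, cf_cdf) >= D
    return hits / sims

# stand-in counterfactual: standard normal CDF
Phi = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
grid = np.linspace(-6.0, 6.0, 2001)
rng = np.random.default_rng(42)
sample = rng.normal(size=50)       # consistent with the counterfactual
p = bootstrap_pvalue(one_sided_ks(sample, Phi), Phi, grid, n=50)
# a sample shifted down has its empirical CDF above the counterfactual,
# so the one-sided statistic is large and the p-value small
p_shift = bootstrap_pvalue(one_sided_ks(sample - 1.5, Phi), Phi, grid, n=50)
```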
4.3
Applications
Two examples from the economics literature, the effect of labor unions on productivity and the Elasticity
of Intertemporal Substitution (EIS), are used to test the predictions. They were chosen because
their data were publicly available, either directly in the papers or on the authors' websites41,
and both had large sample sizes to ensure the power of the tests. These examples are used only
as exploratory analysis, not as a representative test that applies to other disciplines. Whereas
the choice of sample size is endogenous in lab experiments in psychology and medicine, it is largely
determined by surveys designed by others in many observational studies in economics. In
this way, the nature of selection may differ, and the economics examples are suitable for the
main mechanism of the model.
Sample restriction. The analysis takes the labor union effects as an
example of literature with vote counting and binary conclusions, and EIS as a counterexample
of such communication, for two reasons. First, whereas the former is an example with binary
conclusions and without one well-defined outcome variable, the latter has a continuous outcome
with a coherent definition to be used as a parameter value in models. Second, whereas in the
former literature 34 out of 73 studies highlight their binary conclusions in their abstracts, only
60 out of 169 studies in the latter mention their binary conclusions explicitly there.

To highlight the role of the dichotomization of conclusions, the analysis focuses on the papers
with binary conclusions stated upfront for labor union effects, and on papers without such conclusions for EIS. To classify whether a paper highlights a yes-or-no conclusion, the
analysis applied the following criterion: whether the binary conclusion of null hypothesis testing is stated in the abstract, introduction, or conclusion of the paper42. If it is not
mentioned in such easily accessible places of the article, vote-counting readers are unlikely
to learn from it (for example, Batt, 1999). Conversely, when authors want to communicate particular estimates that they do not feel are conclusive, they tend to contain only brief discussions
in the text.
Finally, the stark contrast between the two datasets is already clear from their funnel plots
41
To the best of my knowledge, there were only a few meta-analysis datasets that were readily available and had
sample sizes comparable enough to ensure power. Among them, I chose ones with different sets of authors. One
limitation of using those with large sample sizes is that they could be systematically different from meta-analyses with smaller sample sizes.
42
Since this decision involves some subjective judgements, I complement the analysis in two
ways: (i) the textual evidence for the binary classifications and their discussion are available online at
http://economics.mit.edu/grad/chishio/research, and (ii) Appendix C1 presents the analysis with the entire
data sets. While the pattern becomes weaker, the overall conclusions are identical. There were 7 book chapters
for labor union effects, for which, unlike journal articles, there is no requirement that the main conclusions
be summarized in the abstract or introduction. I checked all available books, and they mostly discussed the results.
Thus, the analysis included all the estimates from book chapters.
Table 4: Summary of Kolmogorov-Smirnov-type tests

                       Labor union effect          EIS
                       G∅          H0             G∅          H0
p-value                0.0036      0.0052         0.0287      1.000
D (KS statistic)       0.3872      0.2892         0.3052      0.0181
D (critical value)     0.2832      0.1977         0.2730      0.4341
arg sup                0.0799      -0.0632        0.1180      61.82
b̂0                    0.0075                     0.0505
σ̂0                    0.0642                     0.0956
n∗                     30                         18
Used Information       0.6692                     0.3767

Notes: Table 4 summarizes the stem-based estimates and the results of the KS-type tests described in
section 4.2. "D (KS statistic)" denotes the KS statistic against the predicted distribution as in (24) and
(25). "D (critical value)" denotes the critical value of the KS statistic for rejecting the null hypothesis at
the 95 percent confidence level; it is included for benchmark comparison although the p-value cannot be
computed by comparing KS statistics. "arg sup" denotes the parameter value (σ for G∅, β̂ for H0) at which
the KS statistic is attained. The p-value is the overall p-value computed by bootstrap. b̂0 and σ̂0 denote
the overall estimates and n∗ is the number of included studies. Used Information denotes the measure in (27).
contained in Appendix C2: the positive correlation test does not detect the omission in the
former example whereas it does in the latter. The tests herein illustrate how they may reflect
the vote counting concerns.
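The KS-type comparisons with bootstrap p-values used in these tests can be sketched as follows. Here a hypothetical half-normal distribution stands in for the model-implied predicted CDFs of (24) and (25), and all function names are illustrative; the sketch also uses the two-sided sup distance for simplicity, whereas the paper's tests are one-sided in a specific direction:

```python
import numpy as np
from math import erf, sqrt

def half_normal_cdf(x, scale):
    """CDF of |N(0, scale^2)|, a stand-in for a predicted distribution."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, np.vectorize(erf)(x / (scale * sqrt(2.0))), 0.0)

def ks_statistic(sample, cdf):
    """Sup distance between the empirical CDF of `sample` and `cdf`."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    f = cdf(x)
    ecdf_hi = np.arange(1, n + 1) / n
    ecdf_lo = np.arange(0, n) / n
    return max(np.max(ecdf_hi - f), np.max(f - ecdf_lo))

def bootstrap_pvalue(sample, cdf, sampler, n_boot=2000, seed=0):
    """Share of bootstrap samples drawn from the predicted distribution
    whose KS statistic is at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    d_obs = ks_statistic(sample, cdf)
    d_boot = [ks_statistic(sampler(rng, len(sample)), cdf)
              for _ in range(n_boot)]
    return d_obs, float(np.mean(np.asarray(d_boot) >= d_obs))

# Observed standard errors more concentrated near zero than predicted:
rng = np.random.default_rng(1)
sigmas = np.abs(rng.normal(0.0, 0.5, size=30))
d, p = bootstrap_pvalue(
    sigmas,
    cdf=lambda x: half_normal_cdf(x, scale=1.0),
    sampler=lambda r, n: np.abs(r.normal(0.0, 1.0, size=n)))
```

A small bootstrap p-value then indicates that the observed distribution is unlikely under the predicted one, as in the tests summarized in Table 4.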
4.3.1 Example: Effect of unions on productivity (Doucouliagos and Laroche, 2003)
There is a large theoretical controversy and empirical discrepancy regarding the effect of labor
unions on firm productivity. The question is relevant for policy decisions on labor market regulations that promote or prohibit union formation with the aim of increasing US firm competitiveness.
Neoclassical economics would suggest that unionization reduces firm productivity by raising
wages above competitive prices or by lowering the return to capital due to labor’s rent-seeking.
On the other hand, some scholars have suggested that labor unions can improve labor productivity
by improving communication between firms and workers, thus lowering inefficient turnover.
Both effects are theoretically plausible: this is an example in which a symmetric prior on both
directions is plausible. Since the positive effects are contrary to the prior given by economic
theory, positive coefficients are mapped to the “positive results” in the communication model.
Data. Doucouliagos and Laroche (2003) collected 73 studies that estimate the effect of
unionization on firm productivity. The average t-statistics and coefficient magnitudes of the
main specifications are reported in Table 1 of their paper43 , which is the primary source of
43
Since the outcome variable, firm productivity, needs not be scale-invariant, one may be naturally concerned
that the standard errors reflect arbitrary re-scaling of variables rather than precision of estimates. To examine
the extent to which the standard error reflects the estimate precision, I regressed the inverse of the variance
data for this analysis. A major drawback of the data is that, because they take the average t-statistic among multiple specifications, their headline conclusions may differ from the conclusions
inferred by conventional hypothesis testing. For example, Brown and Medoff (1978) is
a seminal study that claimed “unionization is found to have a substantial positive effect on
output per worker,” as stated upfront in its abstract. However, due to various attempts
to relax assumptions, the paper’s average t-statistic is 1.95 in the Doucouliagos-Laroche
meta-analysis data set, causing the study to be counted as statistically insignificant at the conventional
threshold of 1.96. To avoid erroneously counting these studies as claiming negative results,
5 studies are excluded in total. With these criteria, there were 54 included studies with 17
positive results and 37 negative results.
Results. The stem-based bias-corrected estimate is not statistically different from zero.
Note that the underlying heterogeneity measure σ̂0 is large, consistent with Freeman (1988)’s
claim that the effects can be very heterogeneous.
First, Figure 6 (Top) shows that null results have standard deviations that are first-order
stochastically dominated by what can be predicted using the statistically significant results
under the uniform omission model: Ĝ∅ > G̃∅ (p = 0.0035). The interpretation of the KS-test statistic is that, although the statistically significant estimates predict that only about
33 percent of null results have σi < 0.11, over 72 percent of studies have such σi.
That is, most published null results are precisely estimated,
contradicting the uniform selection model.
Second, Figure 6 (Bottom) shows that the distribution of coefficient magnitudes among negative results is first-order stochastically dominated by the predicted distribution: Ĥ0 > H̃0
(p = 0.0047). The interpretation is that, compared to the counterfactual of uniform omission
of negative results, which predicts that only about 25 percent of studies have β̂ < −0.1, the empirical
distribution has approximately 54 percent taking β̂ < −0.1. This contradicts the extremum
and gradual selection models, which predict that strictly less than 25 percent of studies have
β̂ < −0.1. In a review, Stanley (2005) applies a χ²-test to suggest that there are too many
statistically significant results. This test confirms that such biases arise not only from positive
results, but also from negative results.
Using the positive correlation tests, Doucouliagos and Laroche (2003) suggested that
publication bias is not an issue in this literature. However, this analysis highlights that
selective omission is most likely to be prevalent here, too: the lack of positive correlation could
be due to the near-zero underlying mean effect and the theoretical plausibility of effects in both
directions.
on the sample size. Among all 73 studies, the coefficient was 0.99 and the R² was 0.96. This indicates that the
standard error mostly reflects the estimate precision, which increases with sample size.
Figure 6: Distribution of negative studies’ standard errors and coefficients for labor union
effects
Notes: Figure 6 plots the cumulative distribution function of the standard errors of null studies (Top) and
the coefficients of negative studies (Bottom) for estimates of the labor union effects. The solid green line
represents the observed distribution and the dashed blue line represents the predicted distribution along
with its 95 percent confidence interval estimated by the bootstrap. The solid red vertical line quantifies the
Kolmogorov-Smirnov statistic.
4.3.2 Counterexample: EIS for consumption (Havránek, 2015)
To demonstrate that academic communication does not necessarily involve vote counting, this
subsection also discusses the EIS for consumption, the key economic primitive that links interest
rates and consumption growth. Due to its wide applicability, the EIS has not only been estimated
many times, but has also generated controversies, such as the disparity between estimates from
microdata and calibrations from macrodata. The goal of the meta-analysis is not to proclaim the average
estimate across studies as the value to be used for calibrations of representative agents invariant
of context or model; instead, it is to characterize robust heterogeneities across settings beyond
study-specific idiosyncratic variation. Havránek (2015) reviews 169 papers and demonstrates
that, for the EIS, there exist systematic differences for estimates for asset holders, incorporating
non-separability with durable goods, focusing on total consumption rather than only food,
and separating intertemporal substitution from risk aversion via Epstein-Zin preferences.
Such systematic reviews are useful for researchers since checking all papers on one’s own can
be prohibitively time-consuming for individual readers.44
Data. Among Havránek’s data sets, this analysis focuses on the 117 (or 118) papers that estimate the common categories among the key heterogeneities45 and takes the median estimate from
each paper, following Havránek’s discussion. Moreover, to highlight the pattern of publication
bias in the absence of vote-counting concerns, the testing uses the 68 papers out of 117
that do not mention their EIS estimates in their abstracts, introductions, or conclusions.
Results. While 53 out of 117 studies have positive and statistically significant results,
the bias-corrected estimate is not statistically different from zero. This is consistent with
the conventional understanding and Havránek’s conclusion that estimation using time-separable preferences and total consumption data with non-asset holders leads to a near-zero
EIS.
First, Appendix Figure C2 (Top) shows that the empirical distribution of null results Ĝ∅
is marginally more concentrated on precise estimates than what would be predicted under
the uniform selection model G̃∅ . The magnitude is similar to that of the union productivity
effect. Second, Appendix Figure C2 (Bottom) shows that, contrary to this paper’s model,
the empirical distribution of coefficients of negative results Ĥ0 is not more negative than the
prediction H̃0 ; instead, the one-sided test in the opposite direction is significant (p = 0.000).
As the Figure demonstrates, the largest discrepancy between the empirical distribution Ĥ0
and the counterfactual distribution H̃0 occurs precisely at ÊIS = 0, indicating that negative
estimates are systematically less likely to be published than positive estimates. This also
explains the first result that Ĝ∅ > G̃∅ . When the change in reporting probability at ÊIS = 0 is
taken into account, the two distributions Ĥ0 and H̃0 become very similar. This suggests that
the normality assumption was suitable in this example. Moreover, it is not entirely consistent
44
Important biases arising from aggregation, model mis-specification, small samples, seasonality, slow-moving
omitted variables, and weak instruments are not resolved by the meta-analysis.
45
Papers on those who do not hold assets, without durable goods, with variables not for total consumption,
and not using the Epstein-Zin preferences.
with the gradual or extremum selection models: although the predictions Ĝ∅ > G̃∅ and Ĥ0 < H̃0
match, these models predict different patterns of omission among null results.
As demonstrated by his discussion of p-values, Havránek (2015)’s critique was that the reporting bias is potentially caused by a preference for reporting statistically significant coefficients.
While this could be important, the above analysis showed that the positive correlation between
coefficients and standard errors is also largely due to the theoretical implausibility of models
with convex utility functions. My impression of the literature is that the authors are very
careful in raising caveats for proper interpretation of the parameters. Moreover, the literature
has a strong focus on mechanisms rather than particular estimate values, and is willing
to be upfront about the empirical limitations of the models. Since it is common knowledge
that the most parsimonious models may generate economically implausible parameters, the
effort focused on finding suitable alternatives with theoretically consistent estimates appears
constructive.
Lastly, this example highlights another important mechanism of omission that even the four
distinct models of publication bias do not capture: staying in the plausible range of values.
In this sense, it highlights the difficulty of proposing a model that applies universally to any
meta-analysis data.
4.3.3 Discussions
Alternative explanations. As discussed in the sensitivity analysis, since negative results
cannot be consequential in the absence of other positive results, the early literature tends
to have large coefficients. In addition, if the data improve over time, the standard errors
may become gradually smaller. While some of the earliest influential studies in both fields,
such as Brown and Medoff (1978) and Hansen and Singleton (1982), were positive results,
the regression of coefficients and standard errors on year of publication shows a correlation that is
both statistically and practically insignificant46 . Thus, these examples show that, even
without dynamic considerations, publication bias can arise and persist as the communication
model showed.
While the model focused on the tendency of communication through vote counting as a
reason for the difference between the two examples, the difference in prior beliefs could also be
important. While positive, negative, and null effects are all theoretically plausible for labor
union effects, only positive estimates are theoretically coherent for EIS. As the distribution of
coefficients demonstrates, economists’ thin prior on non-positive EIS may be an important
reason for the aforementioned results.
Qualitative examples. The empirical tests formalized a pattern of publication selection
that is qualitatively consistent with common observations. First, precisely estimated null
results are often published and disseminated widely. Some examples include the large-scale
46
The raw regressions βi = θ · (publication year)i + εi and σi = θ · (publication year)i + εi have an R² of 0 and
statistically insignificant coefficients θ in both examples.
clean cookstove study (Hanna et al., 2016), the air pollution regulation in Mexico (Davis,
2008), and the community-based development programs (Casey et al., 2012). It would be remiss
not to acknowledge a well-known counterexample: the DEVTA study, the largest randomized trial,
which showed null effects of deworming and vitamin A supplementation on child mortality and
health, was not published until 7 years after the data collection (Garner, 2013). While the delay
required by the authors’ careful analysis is extensive, the fact that it was published in the Lancet,
a top medical journal, is, in a way, reassuring of academic journals’ willingness to report
precise negative results.
Second, results with coefficients of large magnitude and negative sign also tend to be
published even if they are not so precisely estimated. Examples from economics include
the negative labor supply elasticities close to −1 among the taxi driver papers (Camerer, 1997)
and the positive impact of inequality on economic growth (Forbes, 2000). In medicine, the
unexpected harmful effect of a therapeutic strategy on cardiovascular events found in
one trial was widely reported (The Action to Control Cardiovascular Risk in Diabetes trial,
2008; Boussageon, 2011), while a subsequent meta-analysis found that the aggregate effect is not
statistically significant. Duval and Tweedie (2000) acknowledge such results as the major
drawback of their trim-and-fill method, which uses the extremum selection model, and
treat them as anomalies in their interpretation of publication bias. Because they do not
suit the common Begg’s and Egger’s positive correlation tests, Stanley (2005) called them
“type-II” publication bias. A contribution of the model is to rationalize data that were
considered outliers in existing interpretations.
4.4 Bias Correction Method
The assumptions on the data generating process of a meta-analysis data set are fundamental
to the choice of publication bias correction method, which can be applied to the over 4,000
meta-analyses conducted every year. These meta-analyses are considered the most credible
synthesis in the hierarchy of evidence. What could an alternative bias correction method be if
existing assumptions on the selection process suffer from a large degree of misspecification?
This paper proposes a new, simple, and fully data-dependent bias correction method that
is robust under all existing publication selection models. It uses the common implication
of these models that the bias of a reported estimate βi is increasing in its standard error σi, at
least when σi is small. It focuses on some of the most precisely estimated studies to maximize the
precision of the meta-analysis estimate. I suggest the name “stem-based method” because it
uses the “stem” of the funnel plot for the estimation.
4.4.1 Common Prediction
While the preceding discussions focused on the differences among the models of publication
selection, there is one useful implication that holds under all of the models:
Proposition 3 (minimal bias among most precise studies). Define Bias(σi) ≡ E[βi | σi, study i reported] − b0.

1. (Limit)
(i) Consider the uniform, gradual, and extremum selection models. Then

lim_{σi→0} Bias(σi)² = min_{σi} Bias(σi)².

For the uniform and gradual selection models, min_{σi} Bias(σi)² = 0.
(ii) Consider the extremum selection models. If σ0 = 0, then

lim_{σi→0} Bias(σi)² = 0.

2. (Monotonicity)
(i) Consider the uniform selection model. For any b0 ≠ 0, there exists σ̄ > 0 such that
Bias(σi)² is increasing in σi for σi ∈ [0, σ̄]. If b0 = 0, then Bias(σi) = 0 always.
(ii) Consider the extremum and gradual selection models. Bias(σi)² is increasing in σi for
all σi.
Proof. Appendix C4.
The first statement, on the limit of σi, says that, as the sample size increases to infinity,
the bias due to omission approaches its minimum.47 The reason is that there is no reason not to
report results whose binary conclusions are almost surely correct, unless there are study-specific
heterogeneities that increases in sample size cannot eliminate. The second statement
points out that the extent of bias is increasing in the standard error. For the extremum and gradual
selection models, where the direction of bias is monotone, this is because, as the sample size
drops, more estimates are likely to be affected by the selection. For the uniform selection model,
where the omission is assumed to occur on both sides, the bias could be decreasing among the
noisiest studies due to the symmetry of thresholds around zero. However, this is unlikely
to be an important concern for two reasons: first, as Appendix C4 shows, this case arises only
when the underlying b0, and hence the bias, is also small. Second, the method to be proposed
takes as the hypothesized true value the overall estimate that is closest to 0, so that, if the bias
were indeed decreasing among noisy studies, those values would be taken at
the pilot stage, leading the method to include them in the sample.
An analogous monotonicity arises under the plausible parameter values in this paper’s
model. As Table 2 has shown, the asymmetry between the lower and upper thresholds was increasing in
the standard error, leading to increasing bias. Moreover, the bias among the most precise
studies was small. In reality, to make the model’s prediction relevant, one has to assume that,
47
With extremum selection models and this paper’s communication model, the bias does not converge to zero
even when σi approaches zero. While this may appear concerning, it is not resolved by other methods of bias
correction either.
when the estimated coefficient β̂ is clearly too small for practical significance, the authors can claim
a negative result even if it is statistically significant.
4.4.2 Criterion
Let the objective of the estimator {b̂0, σ̂0} be the maximization of the precision of the estimates.
One of the most common approaches with this objective is to minimize the mean squared error
(MSE):

min MSE = E[(b̂0 − b0)²]   (26)

The MSE can be conveniently decomposed48 into the Variance = E[(b̂0 − E[b̂0])²] due to
idiosyncratic errors and the Bias = E[b̂0] − b0 due to systematic errors:

MSE = Bias² + Variance
While optimization of this objective is infeasible since the underlying b0 is unknown, this
paper proposes a criterion that approximates this bias-variance decomposition. This is a common approach, used in optimal bandwidth choice for kernel estimation as well as in sample choice
for machine learning. How could an estimator minimize the MSE so that it is efficient?49
The bias-variance decomposition suggests that the estimator is efficient when it focuses on
some of the most precise estimates i = 1, ..., n∗. First, when comparing two studies with different
precisions, it is always optimal to include the more precise study for σi < σ̄. This is because
more precise studies are less biased in expectation (a consequence of the selection models), and
because more precise studies provide lower variance (a consequence of the Bayes formula:
Variance = (Σi 1/(σ0² + σi²))⁻¹).
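The decomposition above can be checked numerically. A minimal sketch with an artificial biased estimator (the bias of 0.3 and noise level of 0.5 are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
b0 = 1.0  # true parameter (illustrative)
# Draws of a hypothetical estimator with bias 0.3 and noise sd 0.5.
draws = b0 + 0.3 + rng.normal(0.0, 0.5, size=200_000)

mse = np.mean((draws - b0) ** 2)
bias_sq = (np.mean(draws) - b0) ** 2
variance = np.var(draws)  # ddof=0 matches the decomposition exactly

# MSE = Bias^2 + Variance holds as an exact identity for these moments.
assert abs(mse - (bias_sq + variance)) < 1e-9
```

The identity is algebraic, so the assertion holds up to floating-point error regardless of the simulated bias or noise level.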
Second, the bias-variance trade-off provides useful guidance for n∗, the optimal number of
studies to include for maximizing efficiency. As more studies are included, the bias increases
while the variance decreases. Including more studies is beneficial, even though it may introduce
slight bias in the estimate, if it can largely decrease the variance by increasing the total number
of observations and experiments.
While the theory cannot show that the objective MSE(n) is necessarily
convex, Appendix Figures C3 show that its empirical approximation is convex in the
examples considered in this paper. If there were no selective reporting, all studies would be
48
E[(b̂0 − b0)²] = E[(b̂0 − E[b̂0] + E[b̂0] − b0)²] = E[(b̂0 − E[b̂0])²] + (E[b̂0] − b0)² + 2(E[b̂0] − b0) · E[b̂0 − E[b̂0]],
where the first term is the Variance, the second is the Bias², and the last term equals zero.
49
Note the difference between the t-statistic β/σ and the bias-variance trade-off. In the MSE criterion, a
higher t-statistic is desirable if and only if it is due to low variance. If a high t-statistic were due to a high β,
then it increases the bias and is thus undesirable. This logic suggests that, when the meta-analysis estimate
rejects the null hypothesis, it is not merely due to the algorithm allowing for some slight bias, since the
algorithm at the same time allowed slightly higher variance by not including many studies. This is confirmed
by the simulation with the proposed choice of initial b̂pilot.
included, since no bias arises from including more studies.
In contrast with the objective a(bi − c) in the communication model, the objective of maximizing the precision of the overall estimate is symmetric. The communication model focused on
making an efficient binary policy decision a ∈ {0, 1} given some target c, which introduced
asymmetry into the objective. Under this symmetric objective, full disclosure
of all estimates is always optimal.
4.4.3 Estimation
To operationalize the criterion, a pilot estimate {b̂0, σ̂0}pilot is necessary. Using the pilot
estimates, it becomes feasible to compute the MSE at each number of included studies n =
2, ..., N by suitable estimation methods such as maximum likelihood50 . Finally, the optimal
number of studies is determined and the efficient estimate {b̂0, σ̂0}n∗ can be obtained. I
make specific proposals for the pilot estimate with the priority of being conservative.
Concretely, the estimation takes the following two steps:
1. Estimate the random effects model parameters {b̂0, σ̂0}n of the normal distribution using the
most precise studies i = 1, ..., n for each n = 2, ..., N (where σ1 ≤ ... ≤ σN);
2. Choose the optimal number of studies n∗ by computing MSE(n).
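The two steps above can be sketched as follows, assuming a normal random-effects likelihood profiled over a grid for σ0 and the conservative pilot choices described in the text; function names and the grid-based MLE are illustrative, not from the paper's implementation:

```python
import numpy as np

def random_effects_mle(beta, sigma, grid_size=201):
    """Normal random-effects MLE of (b0, sigma0), profiling sigma0 over a grid."""
    tau_grid = np.linspace(0.0, 2.0 * max(beta.std(), 1e-8), grid_size)
    best_ll, best = -np.inf, (0.0, 0.0)
    for tau in tau_grid:
        w = 1.0 / (sigma ** 2 + tau ** 2)
        b0 = np.sum(w * beta) / np.sum(w)  # precision-weighted mean given tau
        ll = -0.5 * np.sum(np.log(sigma ** 2 + tau ** 2) + w * (beta - b0) ** 2)
        if ll > best_ll:
            best_ll, best = ll, (b0, tau)
    return best  # (b0_hat, sigma0_hat)

def stem_estimate(beta, sigma):
    """Stem-based method: sort by precision, choose n* minimizing approximate MSE."""
    order = np.argsort(sigma)
    beta, sigma = beta[order], sigma[order]
    N = len(beta)
    # Step 1: random-effects estimates on the n most precise studies.
    path = [random_effects_mle(beta[:n], sigma[:n]) for n in range(2, N + 1)]
    b_path = np.array([p[0] for p in path])
    tau_path = np.array([p[1] for p in path])
    # Conservative pilot values: estimate closest to zero, largest heterogeneity.
    b_pilot = b_path[np.argmin(b_path ** 2)]
    tau_pilot = tau_path.max()
    # Step 2: approximate MSE(n) = Bias(n)^2 + Variance(n), pick its minimizer.
    mse = np.empty(N - 1)
    for i, n in enumerate(range(2, N + 1)):
        variance = 1.0 / np.sum(1.0 / (sigma[:n] ** 2 + tau_pilot ** 2))
        mse[i] = (b_path[i] - b_pilot) ** 2 + variance
    n_star = int(np.argmin(mse)) + 2
    return b_path[n_star - 2], tau_path[n_star - 2], n_star
```

The returned n∗ marks the boundary of the funnel's "stem": studies with larger standard errors are excluded from the final estimate.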
The choice of pilot estimates is guided by the maximin model in decision theory: when one
does not have reliable information on the distribution, take the conservative choice. Given that
publication bias is primarily a concern for estimates away from the null hypothesis, a natural
conservative pilot value for the mean effect is the one closest to it: b̂0,pilot = b̂0,n for
n ∈ arg min_{ñ∈{1,...,N}} (b̂0,ñ)².51 This makes the estimator conservative: a small simulation study,
using the observed distribution of standard errors from the studies on unions and simulating
the same sample size with true effect b0 = 0, shows that the null hypothesis is rejected with
probability only 0.035, whereas the standard method yields 0.05. This demonstrates a moderate
degree of type II error. At the same time, if the meta-analysis method rejects the null hypothesis, we can be confident that it is not driven by this methodological choice. This also addresses
the concern raised for the uniform selection model: since that model is primarily concerned with bias away from zero, if the bias indeed decreases for large standard errors, the
pilot estimate will still be the one with the lowest bias level across n = 1, ..., N.
50
Like the uniform and gradual selection models, this method by default assumes normality of the distribution.
Nevertheless, it is straightforward to adjust it to other models, such as one with a higher omission probability
below zero. This extension is used for the example of EIS.
51
The precise estimation of bias for methods using the bias-variance trade-off is often difficult. For example,
the Imbens-Kalyanaraman optimal bandwidth applies regularization factors to avoid overestimation of the
bias (Imbens and Kalyanaraman, 2011). While this choice of initial value is not perfect, I argue that it is a fully
data-dependent and reasonable choice. This covers the standard case of null hypothesis testing. When the null
hypothesis (b̂null) differs from zero, one can write b̂0,pilot = b̂0,n for n ∈ arg min_ñ (b̂0,ñ − b̂null)². Although this
may not be the most conservative choice, since b̂0,pilot = b̂0,n for n ∈ arg max_ñ (b̂0,ñ − b̂0)² would give even larger
bias, such a choice may be too conservative since it exacerbates the type II error. Moreover, that specification
would require altering b̂0,pilot depending on the estimated b̂0.
Given that the MSE is increasing in variance, the conservative choice for the underlying standard
error is to take the largest value among all estimates: σ̂0,pilot = max_n σ̂0,n. This safeguards against
choosing an n∗ that is unreasonably low: estimates of σ̂0,n tend to be small for low n, so the
algorithm might fail to include further studies only because it ignored the potential underlying
heterogeneity. Since a high σ̂0,pilot estimate implies a larger value of including more studies to
reduce the overall variance, this algorithm leans towards including many studies and allowing for
slight bias in the point estimate.
It is helpful to quantify the fraction of information used for the meta-analysis estimate. Applying the notion of information as proportional to sample size52 , the fraction of information
used for the stem-based estimate is

Used Information(n∗) = [ Σ_{i=1}^{n∗} 1/(σi² + σ̂0²) ] / [ Σ_{i=1}^{N} 1/(σi² + σ̂0²) ]   (27)
This quantity can inform how much information the algorithm has thrown away to avoid
publication bias in the meta-analysis. A high measure of used information indicates that
there was little publication bias. In a simulation study where no publication bias exists, the
algorithm uses approximately 60-70 percent of the information with sample size 20, and
50-60 percent with sample size 100.
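Once the studies are sorted by precision, the measure in (27) is a ratio of inverse-variance weights; a short sketch (function and variable names are illustrative):

```python
import numpy as np

def used_information(sigma, sigma0, n_star):
    """Share of total inverse-variance weight carried by the n* most precise
    studies, as in equation (27)."""
    w = 1.0 / (np.sort(np.asarray(sigma, dtype=float)) ** 2 + sigma0 ** 2)
    return float(w[:n_star].sum() / w.sum())

# With sigma0 = 0 the measure reduces to the included studies' share of the
# total sample size, since 1/sigma_i^2 is proportional to study i's sample size.
print(round(used_information([0.1, 0.1, 0.2, 0.2], 0.0, 2), 6))  # → 0.8
```

Here the two precise studies carry 100 + 100 of the 250 total inverse-variance weight units each, hence the 0.8 share.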
4.4.4 Discussions
The stem-based bias correction method has its advantages in minimal assumptions and simplicity. First, while other correction methods rely on a particular form of publication selection, this
method only uses the bias monotonicity assumption, which is satisfied under all proposed models
of selection. In this sense, the conclusion will not be driven by the particular form of selection
the meta-analyst chooses. As the examples demonstrated, even four models of publication bias
are not sufficient to capture all relevant considerations, such as plausible values, the impact of inflation,
or genuine finite-sample or small-study bias. This method can suitably address these
concerns by incorporating them in the likelihood functions. Second, the estimation applies an
intuitive criterion defined by the minimization of MSE and allows for straightforward extension to
new methods such as network meta-analyses.53 Moreover, unlike the uniform or gradual bias
correction methods, it has a visual representation that can help with communication to readers,
as presented in Figure 7. While other methods posited particular publication selection patterns
52
To understand the intuition behind the expression Σ_{i=1}^{n∗} 1/(σi² + σ̂0²), set σ̂0² = 0 so that there is no
between-study heterogeneity. In this case, the measure is simply the included studies’ share of the total
sample size, because 1/σi² is proportional to study i’s sample size. When σ̂0 is high, the relative informational
value of noisy studies increases, leading to a higher optimal number of inclusions n∗.
53
For example, Barth et al. (2013) already used a similar methodology, although their decision to focus on
some of the most precise studies was based not on concern about publication bias but on study quality.
Moreover, Stanley et al. (2010) also suggested that focusing only on the most precise studies can alleviate
publication bias. Both papers’ decisions on the number of studies to include were made arbitrarily; in this
sense, the method provides a formal criterion for such analyses.
to correct the bias, this method allows for flexible patterns of selection and finds the optimal
way to mitigate the bias. The disadvantage of the method is that, because it does not impose
a particular structure on the selection process, it does not utilize all the information in the data
when some particular selection process is appropriate.
Figure 7: Example with labor union’s effect on productivity (above) and with EIS (below)
Notes: Figure 7 illustrates examples of the funnel plot-based visualization of the stem-based bias correction method. The diamond at the top of each figure indicates the estimate along with its 95 percent
confidence interval. The curved line indicates the initial MLE estimates b̂0,ñ for ñ ∈ {1, ..., N} as
less precise estimates are included. The diamond at the middle of the curve indicates the minimal level of
precision for inclusion. Therefore, the stem-based estimate is given by the studies, represented by
circles, whose precisions are above this diamond. It also corresponds to the “stem”, the thin part, of the
“funnel.” As the figures indicate, the diamond is at the position right before the bias becomes large.
Precision measure scales are adjusted for readability.
5 Discussions
The following discussion will first examine the validity of the key assumptions of the model; second,
it will explain how the model builds on the theoretical literature on information aggregation
and communication; third, it will clarify the model’s implications for policies to address
publication bias.
5.1 Key assumptions
5.1.1 Vote counting
The central friction of the set-up is the technological restriction that policymakers only count
the yes-or-no conclusions of studies. The general idea that scientific communication involves
simplification of researchers’ knowledge appears uncontroversial: whether due to limited expertise to understand subtleties, the time cost of processing information, or the paucity of memory
to recall details, the intended audience may not consult all information in papers to make their
decisions.54 This is part of our daily experience, and the demand for systematic reviews
demonstrates that fully sophisticated aggregation of evidence is costly for individual readers
to undertake. As long as the message space is smaller than the signal space, the addition of an
option to omit results is welfare-enhancing since it enriches the message space.
Specifically, though less universally applicable, many reviewers may simply compare the numbers
of positive vs negative results. First, this restriction implies that the readers’ interpretation
ignores coefficients’ magnitudes. Such simplification may occur despite the original papers’
nuanced conclusions, not only because binary conclusions are highlighted in titles and abstracts,
but also because it is convenient for summarizing a series of results succinctly in
review articles to convey an impression of the literature. The Cochrane Review also suggests that,
when statistical specifications or variable definitions are diverse across studies, simple
vote counting may be an efficient way of aggregation (Higgins and Green, 2011).55 Second, the
restriction also assumes that insignificant intermediate results and negative significant results are bundled
54 This approach relates to the literature on bounded rationality that incorporates cognitive
limitations of processing information. For example, see Mullainathan (2000) and Schwartzstein (2014).
55 Note that this result is robust to arbitrary weights readers may assign when they also consider the precision
of each study, so long as not only the readers but also the researchers themselves know each other’s precision.
To see this, consider a two-researcher model in which one researcher has very high precision and the other
very low precision. If the researchers know each other’s precision, then the high-precision researcher almost
always publishes his result whereas the low-precision researcher almost never does. When the low-precision
researcher does publish, it is because he received a signal of extreme magnitude. In this example, even though
the policymaker knows the precision of each study, she weighs each study equally: if she weighed low-precision
studies less, then a low-precision study claiming a negative result would never be consequential, and eliminating
that possibility cannot be optimal. In practice, applying weights can be difficult when the models are complex.
For example, the Intergovernmental Panel on Climate Change adopts “model democracy,” in which every model
in articles above the qualification bar receives equal weight when aggregating estimates from various climate
models (Knutti, 2010).
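The intuition of this footnote can be illustrated with a small simulation; the following is a hedged sketch rather than the formal equilibrium, assuming each researcher shrinks his signal toward the N(0, σb²) prior and publishes only when the magnitude of the posterior mean clears an illustrative threshold (the parameter values and the threshold 0.8 are assumptions for illustration only).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_b = 1.0                         # prior std of the true benefit b
b = rng.normal(0.0, sigma_b, 1_000_000)

def publish(sigma_i, threshold=0.8):
    """Simulate publication decisions of a researcher with noise std sigma_i."""
    beta = b + rng.normal(0.0, sigma_i, b.size)      # signal = truth + noise
    # normal-normal Bayesian shrinkage: posterior mean of b given beta
    shrink = sigma_b**2 / (sigma_b**2 + sigma_i**2)
    keep = np.abs(shrink * beta) > threshold          # publish if conclusive
    return keep.mean(), beta[keep]

rate_hi, _ = publish(sigma_i=0.2)      # high-precision researcher
rate_lo, pub_lo = publish(sigma_i=5.0) # low-precision researcher
print(rate_hi, rate_lo)
```

Consistent with the footnote, the high-precision researcher publishes a large fraction of the time, the low-precision researcher almost never does, and any signal the low-precision researcher does publish is of extreme magnitude (it must exceed threshold/shrink in absolute value).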
as non-positive results when vote counting. In the setting with binary actions, the papers’
conclusions can be perceived as recommendations and thus be reduced to two messages.56
Many researchers would agree that it is difficult to present a paper whose main claim is
“inconclusive,” and in some socio-economic settings significant negative results cannot be
coherent with theory. That the language of “negative” results and “null” results is used
almost interchangeably suggests that the comparison of positive vs non-positive results takes place
at least informally. Even influential regulatory agencies such as the FDA adopt a simple rule of
approving a drug if there are two positive results, illustrating how coarse aggregation is used
in practice and how non-positive results are treated equivalently. For example, to convey the
gravity of publication bias, Turner et al. (2008) showed that, in research on antidepressants,
37 out of 38 positive studies were published whereas only 3 out of 36 non-positive studies were
published as negative. Such comparison of binary messages, combined with binary actions, leads
to the bias in reported coefficients in the stable and optimal communication equilibrium.
5.1.2 Binary actions
The specification that policy decisions are binary leads to the endogenous bias of reported
βi through the asymmetry of the payoff function: the expected benefit of the policy matters if and
only if it surpasses the threshold for implementing the policy. If the action space were continuous
and the objective were to choose the action closest to the expected benefit under a symmetric
loss function, as discussed in the bias correction method (26), then the bias would no longer arise.
This assumption is suitable for the examples of medical researchers who inform regulatory
agencies and doctors deciding whether to adopt the drugs studied. Even when
the exact policy decision, such as the level of the minimum wage, can be continuously adjusted,
citizens or committee members may make binary voting decisions over specific parties or
proposals with particular values. While any decision can be modeled as continuous when mixed
strategies are possible, the binary action space is a reasonable approximation of many
policy decisions and has been adopted in a number of communication models (Dewatripont
and Tirole, 1999; Hao, Rosen, and Suen, 2001; Morris, 2001; Glazer and Rubinstein, 2004).
5.1.3 Conservative prior
In addition to vote counting and binary actions, the assumption that the prior expected benefit
of the policy is less than its cost of implementation (c ≥ 0 while the prior mean is 0) is necessary
and sufficient for the bias of reported coefficients to be upward in the optimal communication
equilibrium. This assumption may appear inapplicable to examples in which the goal is to
confirm whether existing policies are indeed effective. However, it is suitable for the main
application of the model, in which researchers test new drugs or programs that they believe could be
56 When included, “inconclusive” results are often not mentioned in the abstract but discussed briefly in the
paper. As the example of EIS shows, vote counting may not be the predominant mode of aggregation in some
literatures when the conclusion is not binary.
effective yet do not implement them without positive evidence. As only 10 percent of new drugs
are approved (Hay et al., 2014),57 it is plausible to consider that the average drug undergoing
rigorous testing is expected to be ineffective.
5.2 Relation to Literature
5.2.1 Information asymmetry in voting
This paper closely relates to the models of voting as a process of aggregating private
information (Austen-Smith and Banks, 1996; Feddersen and Pesendorfer, 1996, 1997, 1998, 1999).
As discussed in section 3.2, while both classes of models rely on a pivotality condition as the
key mechanism for explaining abstention from reporting, my set-up differs from theirs in two
important aspects and delivers new results. First, the researchers can only send binary messages
while their signals are continuous, whereas these voting models commonly assume that signals are
binary.58 Therefore, even when all researchers are equally informed, those who find coefficients
of intermediate magnitudes omit their studies to reflect the strength, rather than the sign, of
their signals. In addition, my result on allegiance bias relates to their result that an uninformed
independent voter may vote in the opposite direction of the partisan bias of other electorates. My
set-up with continuous signals extends this by demonstrating an amplification effect of strategic
substitutability: biased researchers’ reports have even larger bias in coarse communication
when unbiased researchers try to offset their bias. Second, the policymaker cannot commit to
a particular strategy, unlike in their voting setting with ex ante fixed voting rules.59 The model
demonstrates that biased reporting emerges even in the optimal equilibrium, whereas the voting
literature showed that bias occurs under sub-optimal voting rules such as the unanimity rule.
As their voting theory relates to auction theory, this model reveals a novel connection
between publication bias and the winner’s curse. The common interpretation focuses on the
coefficient magnitude, βi: viewing the published estimates as unbiased suffers from the
winner’s curse because low coefficient values tend to be suppressed (Young et al., 2008). This
model suggests a perspective that focuses on the conclusion message, mi, and shows that sending
binary messages from all studies suffers from the winner’s curse. To see this, recall the set-up
with multiple researchers in which the cost of policy, c, and the prior belief of the average effect,
b, are both zero. If a researcher treats his own signal as the only available information and
57 Some may critique that this is not an appropriate comparison because researchers use their private
information to select drugs into experiments. One can think of the model’s prior N(0, σb²) as the prior on
the selected studies, which this example also shows.
58 Although this point has not been a focus of discussion in the voting literature, whether the coarsening
occurs at the signal level or at the message level makes a difference in the nature of the optimal equilibrium in
terms of bias. To the extent that private signals are continuous in reality, my model offers a relevant novel
explanation for abstention in voting.
59 As Morgan and Stocken (2008) clarify, one major difference between voting models and communication
models is the commitment power of the receiver. Without commitment to a particular decision rule, this
communication model requires posterior consistency of the decision rule, which is infeasible to analyze
analytically with continuous signals.
recommends policy without omission so that mi = 1(βi ≥ 0), then he suffers from the winner’s
curse in that he makes policy recommendations too often: for signals βi ∈ [0, β̄], he naively
recommends the policy, yet the posterior combined with others’ signals is not sufficiently high;
and for signals βi ∈ [β̲, 0], he naively recommends not to implement the policy, whereas the
expected policy benefit is sufficiently high. In this way, omission for signals βi ∈ [β̲, β̄], by
conditioning on others’ signal realizations, is a solution to this winner’s curse. Null hypothesis
testing is often justified by the idea that scientific conclusions must be “certain enough,” as
demonstrated in the risk aversion examples. When considering the collective communication,
it can also be viewed as providing a solution to the winner’s curse by making researchers draw
policy recommendations less often than they would on their own.
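A minimal numerical sketch of this winner’s curse, under the assumed setting c = b = 0 with normal signals βi = b + εi: with a zero prior mean, the posterior mean of b given all signals is proportional to their average, so a researcher with a small positive own signal may, using everyone’s information, conclude that the expected benefit is in fact negative. The specific signal values and variances below are illustrative assumptions, not the paper’s calibration.

```python
import numpy as np

sigma_b, sigma = 1.0, 1.0              # prior and noise std (illustrative)
signals = np.array([0.3, -1.0, -0.8])  # three researchers' signals beta_i
n = signals.size

# Naive rule: researcher 1 recommends based on own signal only, m_i = 1(beta_i >= 0)
naive_recommend = int(signals[0] >= 0)

# Posterior mean of b given ALL signals (standard normal-normal updating)
post_mean = (n * sigma_b**2 / (n * sigma_b**2 + sigma**2)) * signals.mean()

print(naive_recommend, post_mean)
# researcher 1 naively recommends the policy (0.3 >= 0),
# yet the posterior combining all signals is negative
```

Researcher 1’s naive message is positive while the full-information posterior mean is negative, which is exactly the sense in which naive own-signal recommendations are made “too often.”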
5.2.2 Communication and media
My model also relates to the large theoretical literature on communication and media, which
has focused on biased motives, reputation motives, and costly communication in set-ups
where the message space is as large as the signal space. First, a key result of the costless
communication models is that, when there is conflict of interest between an informed expert
and an uninformed decision-maker, the only feasible form of communication becomes coarse so
as to limit manipulation (Crawford and Sobel, 1982; Sobel, 2011).60 In contrast with this
literature, which focused on the consequences of conflict of interest for the endogenous coarsening
of messages, this paper takes the coarse message space as an exogenous technological restriction
and derives the reporting bias endogenously with multiple senders that have no conflict of
interest. In this sense, it highlights a causal relationship between coarse communication and
reporting bias that runs in the opposite direction of the one the existing literature has focused on.
Second, the key result of the repeated communication models is that reputational concerns
can generate reporting bias even in the absence of conflict of interest (Gentzkow and Shapiro,
2006; Shapiro, 2016). This literature shows that biased reporting arises from the demand for
reporting consistent with receivers’ heterogeneous prior beliefs (Suen, 2004; Gentzkow
and Shapiro, 2006). Applied to academic communication, these models predict that senders
frequently report confirmatory results. In contrast, my model explains publication bias as
arising from the demand for easily aggregable pieces of evidence. It predicts that surprising
results, information that swings one’s action away from the action taken in the absence of
reports, are reported more often.61
Third, models with a cost of communication (Banerjee and Somanathan, 2001) also
predict that only extreme results are reported, because the cost may exceed the gain
60 This strategic consideration, derived in a one-sender setting, applies to settings with multiple senders
and one-dimensional noisy information (Hao, Rosen, and Suen, 2001), but to a lesser degree with
multi-dimensional information without uncertainty (Battaglini, 2002).
61 My result on allegiance bias is similar to these models in that it is the presence of other biased or inaccurate
information sources that generates reporting bias (e.g., political correctness in Morris, 2001). In both models, if
other senders are unbiased, the unbiased senders deliver messages in the way optimal for the receivers.
from reporting confirmative results that do not significantly alter policy decisions. The cost
of writing and refereeing an academic paper is realistic and important, and as section 3.4.1
shows, my model’s qualitative predictions are robust to a cost of communication. There are
two differences between the testable predictions of costly communication models and those of
my model with asymmetric information. First, the former suggests that inflation of coefficients
does not happen unless there is a conflict of interest that generates upward bias at the
conventional significance threshold t. Second, whereas the former predicts that removing the
cost of communication can eliminate publication bias, the latter suggests that the bias remains
so long as readers coarsely summarize multiple studies.
Finally, the paper relates to literature showing that decision-makers may value information
or take actions in biased ways under various communication frictions. Suen (2004)
demonstrates that news consumers may prefer like-minded sources because they can provide
coarse messages that are sufficiently informative to influence the reader’s decision. Fryer and
Jackson (2008) show that minority discrimination can occur due to coarse thinking that puts
minorities into one category, leading to biased decisions. Blume and Board (2013) show that, even
with common interests, uncertainty about how well the receiver can understand messages
inhibits communication, acting as a language barrier. This paper focuses on the interaction between
coarse messages and informational asymmetry among multiple ex ante identical senders, which
has not been explored in this class of models.
5.2.3 Existing theories of publication bias
Two sets of papers theoretically address the policies to counter publication bias and find
that such bias may be socially efficient. The first set considers models with endogenous
information acquisition, where the costs are incurred by biased researchers. In such settings,
rewarding researchers with the option to falsely report favorable results can raise experimental
effort closer to the social optimum (Glaeser, 2006). This incentive benefit may be greater than
the social cost of false reports (Libgober, 2015). Others focus on settings with verifiable
information and selective disclosure (Henry, 2009) as well as the optimal stopping problem of
evidence accumulation (Henry and Ottaviani, 2014). In contrast, my model focuses on collective
communication with exogenous information, an inner problem of the models with endogenous
information. An important difference is that the model assumes that the researchers do not have
biased preferences, and it focuses on the contrasting use of standard errors between null
hypothesis testing and Bayes’ rule to provide reasons to draw positive conclusions from
marginally insignificant results.
Second, de Winter and Happee (2013) compare the publication practices of no omission
and selective reporting of statistically significant results when the null hypothesis is the average
of existing evidence, and show that meta-analytic estimates are more accurate given a fixed
number of publications. While contradicted by van Assen et al. (2014)’s replication, my
approach is similar to this model in that it focuses on the cost of communication. Whereas de
Winter and Happee consider direct communication of estimates, this paper’s approach focuses
on the friction due to communicating only binary conclusions.
5.2.4 Existing empirics on information transmission
The empirical section of the paper relates to the literature that examines the empirical
predictions of the aforementioned models of information transmission. In real-world settings,
the voting literature shows that education is correlated with voter turnout, which is consistent
with the prediction of abstention (Feddersen, 2004). The literature on media uses measures of
bias, such as position relative to other media sources and the intensity and tone of news coverage,
to analyze the demand and supply of media bias (Puglisi and Snyder, 2016). In laboratory
settings, both the voting and communication literatures have varied payoff and information
structures to examine the validity of the models’ predictions (Battaglini et al., 2010; Crawford,
1998). This paper advances these empirical tests by obtaining a direct measurement of bias in a
real-world setting, because academic research offers a unique opportunity to recover the
distribution of underlying signals. In addition, to the best of my knowledge, this is the first
attempt to test the various models of the publication selection process that meta-analysts have
proposed.
5.3 Policy Implications
5.3.1 Common interpretations and recommendations
Because of the shared understanding that all results are needed to attain collectively unbiased
estimates, it is common to interpret publication bias as a failure of the incentive structure in
the production of knowledge (Ioannidis, 2005; Every-Palmer, 2014). Researchers may want
to support positive results due to their affiliation with industry funding sources and partner
organizations, or due to their belief in certain theories (Goldacre, 2010; Ziliak and McCloskey,
2008; Doucouliagos and Stanley, 2013b). Journals and researchers may perceive insignificant
results as less conclusive62 and less influential, and thus publish significant results to maintain
their reputation and to build their careers (Reinhart, 2015). Psychologically, everyone may
subconsciously hold a predisposition for significant results (Card and Krueger, 1995) that are
“entertaining.” A number of authors have expressed their concerns and consider the omission of
studies and inflation of results a serious scientific misconduct (Chalmers, 1990; Martinson,
2005) that may misinform policies and consequently compromise the reliability of knowledge
even more than explicit fabrication of data (John et al., 2012).
Authoritative organizations have responded to the accumulation of concerns and evidence by
making formal recommendations on publication practices. For example, the International
Committee of Medical Journal Editors (ICMJE) suggests that publication decisions should be
62 Fisher also suggested that one cannot draw a conclusion when the hypothesis is not rejected, and
Neyman-Pearson’s proposal to set a single alternative hypothesis is often considered arbitrary (Salsberg, 2002).
While the model’s explanation is similar to this argument, it further explains why upward bias emerges on
average, whereas merely explaining ‘conclusiveness’ based on p-values does not.
independent from the results of study: “Editorial decisions ... should not be influenced by
... findings that are negative... In addition, authors should submit for publication or otherwise
make publicly available, and editors should not exclude from consideration for publication, studies with findings that are not statistically significant or that have inconclusive findings. Such
studies may provide evidence that combined with that from other studies through meta-analysis
might still help answer important questions ...” (ICMJE, 2016). The American Statistical
Association also recommends that negative results should be reported to facilitate proper interpretation of the literature: “P-values and related analyses should not be reported selectively.
Conducting multiple analyses of the data and reporting only those with certain p-values (typically those passing a significance threshold) renders the reported p-values essentially uninterpretable. ... Whenever a researcher chooses what to present based on statistical results, valid
interpretation of those results is severely compromised if the reader is not informed of the choice
and its basis” (Wasserstein and Lazar, 2016).
5.3.2 Current Policies
Under the common interpretation that selective publication is due to biased motives, corrective
policies that adjust researcher incentives can enhance social welfare: even if experts are
sophisticated and aware of selective reporting, it can lead to low social welfare because it
reduces the information available to policymakers. Regulatory organizations and journals have
put forth efforts to reduce publication bias with the goal of comprehensive disclosure of
evidence. The FDA considers only studies that registered protocols and whose analyses strictly
followed them. The journal Basic and Applied Social Psychology banned the use of null
hypothesis testing (Woolston, 2015). In 2016, the journals of the American Economic Association
abandoned the use of eye-catchers for statistically significant results in regression tables.
Registration of pre-analysis plans has begun to “counter publication bias” (Siegfried, 2012)
and has become a norm in field experiments in economics (Miguel et al., 2014; Olken, 2015). The
Journal of Negative Results helps reduce the difficulty of disseminating studies with negative
results in fields such as medicine. The United Nations calls for governments across the
world to require registration of clinical trials for their approval (AllTrials, 2016).
Despite this progress, many suggest that the issue of publication bias is far from eliminated
due to the limits of enforceability (Hunter and Schmidt, 2004). While registration
is also necessary for observational studies to avoid any result-dependent reporting, the extension
of registries to observational studies has been controversial (Editors of Epidemiology, 2010),
and the attempts with clinicaltrials.gov have not been successful (Boccia et al., 2016).
5.3.3 Alternative interpretation: null hypothesis testing for communication
Instead of following these common interpretations, this model builds on the literature on the
history and sociology of statistics, which argues that null hypothesis testing became widely
accepted due to its advantage in communication (Shibamura, 2004). Despite its original
controversy, today the method is not only taught in classes and used in research, but also adopted
in judicial decisions on discrimination cases and in regulatory decisions on environmental
and medical policies. First, null hypothesis testing leads to binary conclusions, which are the
messages with the lowest burden as measured by informational entropy. Second, it is “objective”
in that the conclusions are invariant to who analyzes the data (Porter, 1996). The literature
argues that such simplicity and analyzer-independence made null hypothesis testing widely
appealing for communicating complex scientific inquiry to non-experts.63
To the best of my knowledge, the discussion of null hypothesis testing has not explored
its implications for aggregating multiple pieces of evidence through their conclusions. While
full reporting is ideal when readers are unboundedly sophisticated and have unlimited
computational capacity, it also yields an extremely conservative implication for readers
who only count binary conclusions: if readers are interested in whether there is any positive
effect, then they should still conclude that the null hypothesis is collectively rejected
even if 97 percent of studies report negative results, regardless of the studies’ precision. To see
why, recall that if the true effect were null, then exactly 97.5 percent of papers would report
negative results under conventional two-sided testing at the 95 percent confidence level. Thus,
if fewer than 97.5 percent of results are non-positive, the true effect must be positive. However,
common sense says that a general audience will interpret the message
“97% of climate papers cannot claim human-caused global warming is happening” as negation,
not confirmation, of anthropogenic climate change; unless a systematic meta-analysis exists to
aggregate results for the readers, it may be difficult to accurately convey the overall result
under both vote counting and full reporting.
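The 97.5 percent benchmark can be verified with a short simulation; the following sketch assumes each study’s t-statistic is standard normal under a true null effect (the sample size and seed are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(1)
n_studies = 1_000_000

# Under a true null effect, each study's t-statistic is standard normal
t = rng.standard_normal(n_studies)

# Two-sided test at the 95 percent level: a "positive" result means t > 1.96
positive = t > 1.96
frac_non_positive = 1.0 - positive.mean()
print(frac_non_positive)  # close to 0.975
```

Because only 2.5 percent of studies are significantly positive under the null, any literature with fewer than 97.5 percent non-positive results is collectively evidence for a positive effect, which is the conservative implication discussed above.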
This alternative understanding is a clarification, not necessarily a contradiction, of the common
interpretation that researchers are merely seeking the professional reward of publication.
The researchers in the model correspond, in actual research processes, to joint decision-makers
composed of authors, referees, and editors. Thus, even if authors are themselves career-seeking,
anonymous referees and editors may be social-welfare-maximizing. Nevertheless, the common
interpretation is incomplete without an explanation for the demand for “evidence with publication
bias” to which journal referees and editors respond. The findings of this paper suggest that
caution is warranted in unambiguously attributing this demand to biased beliefs or motives:
the coarse communication of null hypothesis testing is sufficient to explain it.
5.3.4 Suggestions
The model suggests ways to facilitate scientific communication between authors of articles,
meta-analysts, and readers. First, it suggests that authors use statistical significance
thresholds only as rules of thumb, attend to coefficient magnitudes when inferring yes-or-no
conclusions, and tolerate precisely estimated or well-identified null results. Second, it
63 In this sense, despite its name “hypothesis testing,” the method bears only a weak relation to the model of
science in which researchers set and adjust their own hypotheses and test them against data (Hampel).
suggests that reviewers discuss publication bias in their meta-analyses without concern that it
will be interpreted as criticism of the literature. Third, it suggests that readers need not
distrust the overall lessons of a scientific literature even when there is evidence of publication bias.
5.3.5 Caveats
The paper does not claim that publication bias is always socially acceptable; the consequences
of misinforming public policy through scientific evidence can be too grave for optimism. It
is plausible that, due to the competitive process of academic publication, the degree of
publication bias is sub-optimally high. The sensitivity analyses show that, if systematic reviews
are expected, if risk aversion is a key concern, or if the interests of experimenters and the
policymaker are in serious conflict, then allowing publication bias can be detrimental. In fields
of medicine in which meta-analyses are common, minimizing risk is critical, and the
funding industry’s profits at stake are sizable; mandatory reporting thus remains essential.
The paper’s goal is solely to clarify the implications of vote counting, not to recommend it
nor to exaggerate its prevalence. The Cochrane Review writes: “Occasionally meta-analyses use
‘vote counting’ to compare the number of positive studies with the number of negative studies.
... it should be avoided whenever possible. ... Vote counting might be considered as a last
resort in situations when standard meta-analytical methods cannot be applied (such as when
there is no consistent outcome measure)” (Higgins and Green, 2011). There are a number of
careful researchers who are forthright about their studies’ limitations in the abstracts, and
many readers who are mindful of the caveats and avoid oversimplification of evidence. The
ideal communication remains such thorough aggregation of all evidence.
6 Conclusion
Robert K. Merton64 suggested that disinterestedness, the pursuit of knowledge for the common
scientific enterprise, is a core ethos of scientific practice (Merton, 1942). If the objective is to
acquire accurate knowledge and all information can be fully and costlessly communicated, the
value of information depends not on its conclusion but only on its precision. Thus, publication
bias is often perceived to contradict the ideal vision of unbiased researchers equally willing to
report both positive and negative results.
Merton also proposed communalism, the collective learning process, as integral to scientific
practice. This paper shows that, when coupled with the difficulty of reducing complex and
nuanced results into simple and coarse conclusions, even disinterested researchers’ reports will
have both omissions and inflation. Casual language such as “exciting results” vs “no results”
appears to suggest irrationality and bias in the academic community. The model offers one
rational theory of interesting results: whether positive or negative, they are results sufficiently
64 An eminent sociologist of science and the father of Robert C. Merton, an economist with distinguished
contributions in finance.
“conclusive” to influence the decision in the direction of its yes-or-no result when other studies
are collectively inconclusive.
The contributions of the paper are twofold. First, it extends and combines the theoretical
literature on communication and the empirical literature on publication bias: it builds on the
former by identifying the implications of collective coarse information transmission for reporting
bias, and extends the latter by offering a rational foundation for the publication selection process
and directly testing its predictions with data. Second, given the alternative models of publication
selection the model offers, it proposes a novel, simple “stem-based” bias correction method
based on a prediction supported by various commonly suggested models of publication bias.
References
AllTrials. 2016. “UN Calls for Global Action on Clinical Trial Transparency.”
http://www.alltrials.net/news/un-calls-for-global-action-on-clinical-trial-transparency/.
Action to Control Cardiovascular Risk in Diabetes Study Group, Hertzel C. Gerstein, Michael
E. Miller, Robert P. Byington, David C. Goff, J. Thomas Bigger, John B. Buse, et al. 2008.
“Effects of Intensive Glucose Lowering in Type 2 Diabetes.” The New England Journal of
Medicine 358 (24): 2545–59.
Ashenfelter, Orley, Colm Harmon, and Hessel Oosterbeek. 1999. “A Review of Estimates of
the Schooling/Earnings Relationship, with Tests for Publication Bias.” Labour Economics
6 (4): 453–470.
Austen-Smith, David. 1993. “Interested Experts and Policy Advice: Multiple Referrals under
Open Rule.” Games and Economic Behavior 5 (1): 3–43.
Austen-Smith, David, and Jeffrey S. Banks. 1996. “Information Aggregation, Rationality, and
the Condorcet Jury Theorem.” American Political Science Review 90 (1): 34–45.
Banerjee, Abhijit, and Rohini Somanathan. 2001. “A Simple Model of Voice.” The Quarterly
Journal of Economics 116 (1): 189–227.
Banerjee, Abhijit, Selvan Kumar, Rohini Pande, and Felix Su. 2011. “Do Informed Voters
Make Better Choices? Experimental Evidence from Urban India.” Harvard University
mimeo.
Barth, Jürgen, Thomas Munder, Heike Gerger, Eveline Nüesch, Sven Trelle, Hansjörg Znoj,
Peter Jüni, and Pim Cuijpers. 2013. “Comparative Efficacy of Seven Psychotherapeutic
Interventions for Patients with Depression: A Network Meta-Analysis.” PLOS Med 10 (5):
e1001454.
Bastian, Hilda, Paul Glasziou, and Iain Chalmers. 2010. “Seventy-Five Trials and Eleven
Systematic Reviews a Day: How Will We Ever Keep Up?” PLoS Medicine 7 (9): e1000326.
Batt, Rosemary. 1999. “Work Organization, Technology, and Performance in Customer Service
and Sales.” Industrial & Labor Relations Review 52 (4): 539–564.
Battaglini, Marco. 2002. “Multiple Referrals and Multidimensional Cheap Talk.” Econometrica
70 (4): 1379–1401.
Battaglini, Marco, Rebecca B Morton, and Thomas R. Palfrey. 2010. “The Swing Voter’s
Curse in the Laboratory.” The Review of Economic Studies 77 (1): 61–89.
Battaglini, Marco, Ernest K. Lai, Wooyoung Lim, and Joseph Tao-yi Wang. 2016. “The
Informational Theory of Legislative Committees: An Experimental Analysis.” Working
Paper
Begg, Colin B., and Madhuchhanda Mazumdar. 1994. “Operating Characteristics of a Rank
Correlation Test for Publication Bias.” Biometrics 50 (4): 1088.
Blume, Andreas, and Oliver Board. 2013. “Language Barriers.” Econometrica 81 (2): 781–812.
Boccia, Stefania, Kenneth J. Rothman, Nikola Panic, Maria Elena Flacco, Annalisa Rosso,
Roberta Pastorino, Lamberto Manzoli, et al. 2016. “Registration Practices for Observational
Studies on ClinicalTrials.gov Indicated Low Adherence.” Journal of Clinical Epidemiology 70 (February): 176–82.
Boussageon, Rémy, Theodora Bejan-Angoulvant, Mitra Saadatian-Elahi, Sandrine Lafont,
Claire Bergeonneau, Behrouz Kassaï, Sylvie Erpeldinger, James M. Wright, François
Gueyffier, and Catherine Cornu. 2011. “Effect of Intensive Glucose Lowering Treatment on All Cause Mortality, Cardiovascular Death, and Microvascular Events in Type 2
Diabetes: Meta-Analysis of Randomised Controlled Trials.” BMJ 343 (July): d4169.
Brodeur, Abel, Mathias Lé, Marc Sangnier, and Yanos Zylberberg. 2016. “Star Wars: The
Empirics Strike Back.” American Economic Journal: Applied Economics 8 (1): 1–32.
Camerer, Colin, Linda Babcock, George Loewenstein, and Richard Thaler. 1997. “Labor
Supply of New York City Cabdrivers: One Day at a Time.” The Quarterly Journal of
Economics 112 (2): 407–41.
Card, David, and Alan B. Krueger. 1995. “Time-Series Minimum-Wage Studies: A Meta-Analysis.” The American Economic Review 85 (2): 238–243.
Carey, Benedict. 2008. “Antidepressant Studies Unpublished.” The New York Times, January
17. http://www.nytimes.com/2008/01/17/health/17depress.html.
Casey, Katherine, Rachel Glennerster, and Edward Miguel. 2012. “Reshaping Institutions:
Evidence on Aid Impacts Using a Preanalysis Plan*.” The Quarterly Journal of Economics
127 (4): 1755–1812.
Chalmers, Iain. 1990. “Underreporting Research Is Scientific Misconduct.” JAMA 263 (10): 1405–8.
Copas, J. B., and H. G. Li. 1997. “Inference for Non-Random Samples.” Journal of the Royal
Statistical Society: Series B (Statistical Methodology) 59 (1): 55–95.
Crawford, Vincent. 1998. “A Survey of Experiments on Communication via Cheap Talk.”
Journal of Economic Theory 78 (2): 286–98.
Crawford, Vincent P., and Joel Sobel. 1982. “Strategic Information Transmission.” Econometrica 50 (6): 1431–51.
Davis, Lucas W. 2008. “The Effect of Driving Restrictions on Air Quality in Mexico City.” Journal of Political Economy 116 (1): 38–81.
de Winter, Joost, and Riender Happee. 2013. “Why Selective Publication of Statistically Significant Results Can Be Effective.” PloS One 8 (6): e66463.
Dekel, Eddie, and Michele Piccione. 2000. “Sequential Voting Procedures in Symmetric Binary
Elections.” Journal of Political Economy 108 (1): 34–55.
Dewatripont, Mathias, and Jean Tirole. 1999. “Advocates.” Journal of Political Economy 107
(1): 1–39.
Dickhaut, John W., Kevin A. McCabe, and Arijit Mukherji. 1995. “An Experimental Study
of Strategic Information Transmission.” Economic Theory 6 (3): 389–403.
Doucouliagos, Chris, and T.D. Stanley. 2013. “Are All Economic Facts Greatly Exaggerated?
Theory Competition and Selectivity.” Journal of Economic Surveys 27 (2): 316–39.
Doucouliagos, Christos, and Patrice Laroche. 2003. “What Do Unions Do to Productivity? A
Meta-Analysis.” Industrial Relations: A Journal of Economy and Society 42 (4): 650–691.
Duval, Sue, and Richard Tweedie. 2000. “Trim and Fill: A Simple Funnel-Plot–based Method
of Testing and Adjusting for Publication Bias in Meta-Analysis.” Biometrics 56 (2):
455–463.
Egger, Matthias, George Davey Smith, Martin Schneider, and Christoph Minder. 1997. “Bias
in Meta-Analysis Detected by a Simple, Graphical Test.” BMJ 315 (7109): 629–634.
Every-Palmer, Susanna, and Jeremy Howick. 2014. “How Evidence-Based Medicine Is Failing
due to Biased Trials and Selective Publication: EBM Fails due to Biased Trials and
Selective Publication.” Journal of Evaluation in Clinical Practice 20 (6): 908–14.
Fafchamps, Marcel, and Julien Labonne. 2016. “Using Split Samples to Improve Inference
about Causal Effects.” National Bureau of Economic Research Working Paper.
Feddersen, Timothy J. 2004. “Rational Choice Theory and the Paradox of Not Voting.” Journal
of Economic Perspectives 18 (1): 99–112.
Feddersen, Timothy J., and Wolfgang Pesendorfer. 1996. “The Swing Voter’s Curse.” The
American Economic Review, 408–424.
———. 1997. “Voting Behavior and Information Aggregation in Elections With Private Information.” Econometrica 65 (5): 1029.
———. 1998. “Convicting the Innocent: The Inferiority of Unanimous Jury Verdicts under
Strategic Voting.” American Political Science Review 92 (1): 23–35.
———. 1999. “Abstention in Elections with Asymmetric Information and Diverse Preferences.”
American Political Science Review 93 (2): 381–98.
Forbes, Kristin J. 2000. “A Reassessment of the Relationship between Inequality and Growth.”
American Economic Review 90 (4): 869–87.
Franco, Annie, Neil Malhotra, and Gabor Simonovits. 2014. “Publication Bias in the Social
Sciences: Unlocking the File Drawer.” Science 345 (6203): 1502–5.
Freeman, Richard. 1988. “Union Density and Economic Performance: An Analysis of US
States.” European Economic Review 32 (2–3): 707–716.
Fryer, Roland, and Matthew O. Jackson. 2008. “A Categorical Model of Cognition and Biased
Decision Making.” The BE Journal of Theoretical Economics 8 (1).
Gabaix, Xavier. 2014. “A Sparsity-Based Model of Bounded Rationality.” The Quarterly
Journal of Economics 129 (4): 1661–1710.
Garner, Paul, David Taylor-Robinson, and Harshpal Singh Sachdev. 2013. “DEVTA: Results
from the Biggest Clinical Trial Ever.” The Lancet 381 (9876): 1439–41.
Gentzkow, Matthew, and Jesse Shapiro. 2006. “Media Bias and Reputation.” Journal of
Political Economy, 114(2).
Gentzkow, Matthew, Jesse Shapiro, and Daniel Stone. 2016. “Chapter 14 - Media Bias in the Marketplace: Theory.” In Handbook of Media Economics, edited by Simon P. Anderson, Joel Waldfogel, and David Strömberg, 1:647–67.
Gerber, Alan. 2008. “Do Statistical Reporting Standards Affect What Is Published? Publication Bias in Two Leading Political Science Journals.” Quarterly Journal of Political
Science 3 (3): 313–26.
Gerber, Alan S., and Neil Malhotra. 2008. “Publication Bias in Empirical Sociological Research: Do Arbitrary Significance Levels Distort Published Results?” Sociological Methods
& Research, June.
Gilligan, Thomas W., and Keith Krehbiel. 1987. “Collective Decisionmaking and Standing
Committees: An Informational Rationale for Restrictive Amendment Procedures.” Journal
of Law, Economics, & Organization 3 (2): 287–335.
Glaeser, Edward L. 2008. “Researcher Incentives and Empirical Methods.” In The Foundations of Positive and Normative Economics, edited by Andrew Caplin and Andrew Schotter.
Glazer, Jacob, and Ariel Rubinstein. 2004. “On Optimal Rules of Persuasion.” Econometrica
72 (6): 1715–36.
Goldacre, Ben. 2010. Bad Science: Quacks, Hacks, and Big Pharma Flacks. Reprint edition.
New York: Farrar, Straus and Giroux.
Hanna, Rema, Esther Duflo, and Michael Greenstone. 2016. “Up in Smoke: The Influence
of Household Behavior on the Long-Run Impact of Improved Cooking Stoves.” American
Economic Journal: Economic Policy 8 (1): 80–114.
Hansen, Lars Peter, and Kenneth J. Singleton. 1982. “Generalized Instrumental Variables
Estimation of Nonlinear Rational Expectations Models.” Econometrica 50 (5): 1269.
Havránek, Tomáš. 2015. “Measuring Intertemporal Substitution: The Importance of Method
Choices and Selective Reporting.” Journal of the European Economic Association 13 (6):
1180–1204.
Havranek, Tomas, Zuzana Irsova, Karel Janda, and David Zilberman. 2015. “Selective Reporting and the Social Cost of Carbon.” Energy Economics 51 (September): 394–406.
Hay, Michael, David W. Thomas, John L. Craighead, Celia Economides, and Jesse Rosenthal.
2014. “Clinical Development Success Rates for Investigational Drugs.” Nature Biotechnology 32 (1): 40–51.
Hedges, Larry V. 1992. “Modeling Publication Selection Effects in Meta-Analysis.” Statistical
Science, 246–255.
Henry, Emeric. 2009. “Strategic Disclosure of Research Results: The Cost of Proving Your
Honesty.” The Economic Journal 119 (539): 1036–64.
Henry, Emeric, and Marco Ottaviani. 2014. “Research and the Approval Process.” mimeo.
Higgins, Julian P. T., and Sally Green. 2011. Cochrane Handbook for Systematic Reviews of
Interventions. John Wiley & Sons.
Hunter, John E., and Frank L. Schmidt. 2004. Methods of Meta-Analysis: Correcting Error
and Bias in Research Findings. 2nd edition. Thousand Oaks, Calif: SAGE Publications,
Inc.
ICMJE. 2016. “Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals.” http://www.icmje.org/recommendations/.
Imbens, Guido, and Karthik Kalyanaraman. 2011. “Optimal Bandwidth Choice for the Regression Discontinuity Estimator.” The Review of Economic Studies
Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine
2 (8): e124.
John, K. Leslie, George Loewenstein, and Drazen Prelec. 2012. “Measuring the Prevalence of
Questionable Research Practices With Incentives for Truth Telling.” Psychological Science
23 (5): 524–32.
Kamenica, Emir, and Matthew Gentzkow. 2011. “Bayesian Persuasion.” American Economic
Review 101 (6): 2590–2615.
Knutti, Reto. 2010. “The End of Model Democracy?: An Editorial Comment.” Climatic
Change 102 (3–4): 395–404.
Krishna, Vijay, and John Morgan. 2001. “A Model of Expertise.” The Quarterly Journal of
Economics 116 (2): 747–75.
———. 2004. “The Art of Conversation: Eliciting Information from Experts through Multi-Stage Communication.” Journal of Economic Theory 117 (2): 147–79.
Leamer, Edward E. 1978. Specification Searches: Ad Hoc Inference with Nonexperimental
Data. 1 edition. New York: Wiley.
Li, Hao, Sherwin Rosen, and Wing Suen. 2001. “Conflicts and Common Interests in Committees.” American Economic Review 91 (5): 1478–97.
Libgober, Jonathan. 2015. “False Positives in Scientific Research.” Harvard University mimeo.
Martinson, Brian C., Melissa S. Anderson, and Raymond de Vries. 2005. “Scientists Behaving
Badly.” Nature 435 (7043): 737–38.
McCrary, Justin, Garret Christensen, and Daniele Fanelli. 2016. “Conservative Tests under
Satisficing Models of Publication Bias.” Edited by Daniele Marinazzo. PLOS ONE 11 (2):
e0149590.
Merton, Robert K. 1942. “The Normative Structure of Science.” In The Sociology of Science: Theoretical and Empirical Investigations. Chicago: University of Chicago Press.
Miguel, Edward, Colin Camerer, Katherine Casey, Joshua Cohen, Kevin M. Esterling, Alan
Gerber, Rachel Glennerster, et al. 2014. “Promoting Transparency in Social Science
Research.” Science 343 (6166): 30–31.
Morgan, John, and Phillip C. Stocken. 2008. “Information Aggregation in Polls.” American
Economic Review 98 (3): 864–96.
Morris, Stephen. 2001. “Political Correctness.” Journal of Political Economy 109 (2): 231–265.
Mullainathan, Sendhil. 2000. “Thinking Through Categories.” MIT mimeo
Munafo, M. R., I. J. Matheson, and J. Flint. 2007. “Association of the DRD2 Gene Taq1A
Polymorphism and Alcoholism: A Meta-Analysis of Case–control Studies and Evidence of
Publication Bias.” Molecular Psychiatry 12 (5): 454–461.
The Editors of Epidemiology. 2010. “The Registration of Observational Studies—When Metaphors Go Bad.” Epidemiology, July, 1.
Newey, Whitney K., and Daniel McFadden. 1994. “Chapter 36 Large Sample Estimation and
Hypothesis Testing.” in Handbook of Econometrics, 4:2111–2245.
Olken, Benjamin A. 2015. “Promises and Perils of Pre-Analysis Plans.” Journal of Economic
Perspectives 29 (3): 61–80.
Porter, Theodore M. 1996. Trust in Numbers. Reprint edition. Princeton, N.J.: Princeton
University Press.
Puglisi, Riccardo, and James M. Snyder Jr. 2016. “Chapter 15 - Empirical Studies of Media Bias.” In Handbook of Media Economics, edited by Simon P. Anderson, Joel Waldfogel, and David Strömberg, 1:647–67.
Reinhart, Alex. 2015. Statistics Done Wrong: The Woefully Complete Guide. 1 edition. San
Francisco: No Starch Press.
Rosenthal, Robert. 1979. “The File Drawer Problem and Tolerance for Null Results.” Psychological Bulletin 86 (3): 638.
Salsburg, David. 2002. The Lady Tasting Tea: How Statistics Revolutionized Science in the
Twentieth Century. 60717th edition. New York: Holt Paperbacks.
Sánchez-Pagés, Santiago, and Marc Vorsatz. 2007. “An Experimental Study of Truth-Telling
in a Sender–receiver Game.” Games and Economic Behavior 61 (1): 86–112.
Schwartzstein, Joshua. 2014. “Selective Attention and Learning.” Journal of the European
Economic Association 12 (6): 1423–52.
Shapiro, Jesse M. 2016. “Special Interests and the Media: Theory and an Application to
Climate Change.” National Bureau of Economic Research Working Paper
Shibamura, Ryo. 2004. Fisher no tokei riron: suisoku tokeigaku no keisei to sono shakaiteki haikei (Statistical Theory of Fisher: Evolution of Inferential Statistics and Its Social Background; in Japanese). Kyushu University Press.
Siegfried, John J. 2012. “Minutes of the Meeting of the Executive Committee Chicago, IL
January 5, 2012.” American Economic Review 102 (3): 645–52.
Simonsohn, Uri, Leif D. Nelson, and Joseph P. Simmons. 2014. “P-Curve: A Key to the
File-Drawer.” Journal of Experimental Psychology: General 143 (2): 534–47.
Sobel, Joel. 2010. “Giving and Receiving Advice.” In Econometric Society 10th World Congress.
Stanley, T.D. 2005. “Beyond Publication Bias.” In Meta-Regression Analysis: Issues of Publication Bias in Economics, edited by Colin Roberts and T.D. Stanley.
Stanley, T. D., Stephen B. Jarrell, and Hristos Doucouliagos. 2010. “Could It Be Better
to Discard 90% of the Data? A Statistical Paradox.” The American Statistician 64 (1):
70–77.
Suen, Wing. 2004. “The Self-Perpetuation of Biased Beliefs.” The Economic Journal 114 (495):
377–396.
The Consensus Project. 2014. “Why We Need to Talk about the Scientific Consensus on Climate Change.” http://www.skepticalscience.com/news.php?f=why-we-need-to-talk-about-climate-consensus
Turner, Erick H., Annette M. Matthews, Eftihia Linardatos, Robert A. Tell, and Robert Rosenthal. 2008. “Selective Publication of Antidepressant Trials and Its Influence on Apparent
Efficacy.” New England Journal of Medicine 358 (3): 252–60.
van Assen, Marcel A. L. M., Robbie C. M. van Aert, Michèle B. Nuijten, and Jelte M. Wicherts.
2014. “Why Publishing Everything Is More Effective than Selective Publishing of Statistically Significant Results.” Edited by K. Brad Wray. PLoS ONE 9 (1): e84896.
van Assen, Marcel A. L. M., Robbie C. M. van Aert, and Jelte M. Wicherts. 2015. “MetaAnalysis Using Effect Size Distributions of Only Statistically Significant Studies.” Psychological Methods 20 (3): 293–309.
Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA’s Statement on P-Values:
Context, Process, and Purpose.” The American Statistician 70 (2): 129–33.
Woolston, Chris. 2015. “Psychology Journal Bans P Values.” Nature 519 (7541): 9.
Young, Neal S., John PA Ioannidis, and Omar Al-Ubaydli. 2008. “Why Current Publication
Practices May Distort Science.” PLoS Medicine 5 (10): e201.
Ziliak, Stephen T., and Deirdre N. McCloskey. 2008. The Cult of Statistical Significance: How
the Standard Error Costs Us Jobs, Justice, and Lives. 1st edition. Ann Arbor: University
of Michigan Press.
Appendices
Appendix A presents analytical proofs and examples; Appendix B contains computational results; Appendix C collects the proofs and visualizations for the empirical section.
Appendix A.
A1 Lemma 1 Monotonicity
Step 1. Recall that, under full information, the posterior belief is given by Bayes' rule:
\[
E\left[b \mid \beta_i, \beta_{-i}\right] = \frac{\frac{1}{\sigma^2}\left(\beta_i + \beta_{-i}\right)}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}}
\]
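As a numerical sanity check of this precision-weighted posterior mean (illustrative only; the parameter values below are hypothetical), a short Monte Carlo simulation conditions on two noisy signals of $b$ and compares the empirical posterior mean to the formula:

```python
import random
import statistics

# Illustrative check (hypothetical parameters): with prior b ~ N(0, sigma_b^2)
# and signals beta_k = b + eps_k, eps_k ~ N(0, sigma^2), the posterior mean is
#   E[b | beta_1, beta_2] = (beta_1 + beta_2)/sigma^2 / (1/sigma_b^2 + 2/sigma^2).
random.seed(0)
sigma_b, sigma = 1.0, 0.5

def posterior_mean(b1, b2):
    return ((b1 + b2) / sigma ** 2) / (1 / sigma_b ** 2 + 2 / sigma ** 2)

# Monte Carlo: average the true b over draws whose two signals land near a target
target, tol, draws = (0.8, 0.4), 0.05, []
for _ in range(500_000):
    b = random.gauss(0, sigma_b)
    s1 = b + random.gauss(0, sigma)
    s2 = b + random.gauss(0, sigma)
    if abs(s1 - target[0]) < tol and abs(s2 - target[1]) < tol:
        draws.append(b)
mc_mean = statistics.mean(draws)
print(mc_mean, posterior_mean(*target))  # the two numbers should be close
```

The simulated conditional mean agrees with the closed form up to Monte Carlo error.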
Take an arbitrary decision rule of the policymaker $\pi(m_i, m_{-i})$, which need not be monotone in the messages. Take some decision rule of the other researcher
\[
m_{-i} = \begin{cases} 1 & \beta_{-i} \in B_1^1, B_2^1, \ldots \\ \varnothing & \beta_{-i} \in B_1^\varnothing, B_2^\varnothing, \ldots \\ 0 & \beta_{-i} \in B_1^0, B_2^0, \ldots \end{cases}
\]
where $B_k^m$ denotes the $k$th interval of $\beta_{-i}$ realizations such that message $m$ is sent. Note that any mixed strategy on some intervals can be approximated arbitrarily closely by pure strategies on more finely partitioned intervals.65
By researcher $i$'s utility maximization, the strategy of researcher $i$ is given by the (potentially multiple) thresholds $\bar\beta$ and $\underline\beta$ that satisfy the following indifference conditions, with $EW(\cdot)$ denoting expected welfare:
\[
EW\left(m_i = 1, \beta_i = \bar\beta\right) = EW\left(m_i = \varnothing, \beta_i = \bar\beta\right)
\]
\[
EW\left(m_i = 0, \beta_i = \underline\beta\right) = EW\left(m_i = \varnothing, \beta_i = \underline\beta\right),
\]
where by definition
\[
EW\left(m_i, \beta_i\right) = \sum_{m_{-i}} P\left(m_{-i} \mid \beta_i\right) \pi\left(m_i, m_{-i}\right) \left[\frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}} \left(E\left[\beta_{-i} \mid \beta_{-i} \in B_k^{m_{-i}}, \beta_i\right] + \beta_i\right) - c\right].
\]
Substituting and rearranging,
\[
\sum_{m_{-i}} P\left(m_{-i} \mid \bar\beta\right)\left[\pi\left(1, m_{-i}\right) - \pi\left(\varnothing, m_{-i}\right)\right]\left[\frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}}\left(E\left[\beta_{-i} \mid \beta_{-i} \in B_k^{m_{-i}}, \bar\beta\right] + \bar\beta\right) - c\right] = 0 \tag{28}
\]
\[
\sum_{m_{-i}} P\left(m_{-i} \mid \underline\beta\right)\left[\pi\left(\varnothing, m_{-i}\right) - \pi\left(0, m_{-i}\right)\right]\left[\frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}}\left(E\left[\beta_{-i} \mid \beta_{-i} \in B_k^{m_{-i}}, \underline\beta\right] + \underline\beta\right) - c\right] = 0 \tag{29}
\]
The remainder of the proof shows that these conditions are satisfied at unique values of $\bar\beta$ and $\underline\beta$ in any responsive and informative equilibrium, so that researcher $i$'s strategy can be described by the threshold rule as in Lemma 1.

65 This relates to the Riemann integral. Suppose the other researcher takes some mixed strategy $P(m_{-i}=1) \in (0,1)$ and $P(m_{-i}=0) \in (0,1)$ on some interval $B$. Then one can rewrite this strategy in terms of pure strategies that approximate the relevant conditional expectations and their derivatives arbitrarily closely.
Step 2. We use the conditions of responsiveness and informativeness of messages to ensure that these indifference conditions are well defined: they do not hold for all values of $\beta_i$, and they hold for at least some values of $\beta_i$.
First, responsiveness to both $n_0$ and $n_1$ ensures that $\pi(1, m_{-i}) \neq \pi(\varnothing, m_{-i})$ and $\pi(0, m_{-i}) \neq \pi(\varnothing, m_{-i})$ for some $m_{-i}$. This rules out cases in which, even though the messages are informative, the policymaker is not responsive because $E[b \mid m] < c$ for all messages $m$.
Second, informativeness of the equilibrium, combined with anonymity and the researchers' utility maximization, implies the monotonicity of $\pi$ in Lemma 1(ii). Towards a contradiction, suppose there exist messages $m_i, m_i'$ and $m_{-i}, m_{-i}'$ such that $\pi(m_i, m_{-i}) > \pi(m_i', m_{-i})$ and $\pi(m_i, m_{-i}') < \pi(m_i', m_{-i}')$. For this strategy to be consistent with the policymaker's utility maximization, her posterior beliefs must satisfy $E[b \mid m_i, m_{-i}] \geq E[b \mid m_i', m_{-i}]$ and $E[b \mid m_i, m_{-i}'] \leq E[b \mid m_i', m_{-i}']$ in equilibrium. For both to hold, the unconditional expectations must be equal ($E[\beta_i \mid m_i] = E[\beta_i \mid m_i']$) by Bayes' rule,
\[
E\left[b \mid m_i, m_{-i}\right] = \frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}}\left\{E\left[\beta_i \mid m_i, m_{-i}\right] + E\left[\beta_{-i} \mid m_i, m_{-i}\right]\right\},
\]
and the monotonicity of $E[\beta_{-i} \mid \beta_i, m_{-i}]$ in $\beta_i$, as will be shown in Step 3. However, this contradicts the informativeness of all messages. Thus $\pi(m_i, m_{-i}) > \pi(m_i', m_{-i})$ implies $\pi(m_i, m_{-i}') \geq \pi(m_i', m_{-i}')$, and we focus on the equilibrium in which the meaning of the messages is consistent:66 $\pi(1, m_{-i}) > \pi(\varnothing, m_{-i}) > \pi(0, m_{-i})$. This rules out a responsive yet uninformative equilibrium arising from coordination failure: suppose the other researcher randomizes over the messages with equal probabilities, so that $E\left[\beta_{-i} \mid \beta_{-i} \in B_k^{m_{-i}}, \bar\beta\right] = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}\bar\beta$ and $P\left(m_{-i} \mid \bar\beta\right) = \frac{1}{3}$. If, for example, $\pi(1, \varnothing) - \pi(\varnothing, \varnothing) = -\left(\pi(1, 1) - \pi(\varnothing, 1)\right)$ and $\pi(1, 0) = \pi(\varnothing, 0)$, then the first indifference condition is satisfied for any value of $\bar\beta$. Thus researcher $i$ would also be willing to randomize with $P\left(m_i \mid \bar\beta\right) = \frac{1}{3}$. Since both researchers randomize, the equilibrium is uninformative.
Note that informativeness of the equilibrium implies Lemma 1(ii) because the mixed strategy $\pi \in (0, 1)$ can occur only if some $E\left[b \mid m_i, m_{-i}\right] = c$. Since the ranking of posteriors is strict due to informativeness, the utility maximization of the receiver yields
\[
\pi^*\left(n_1, n_0\right) > 0 \Rightarrow \pi^*\left(n_1 + k, n_0\right) = 1
\]
\[
\pi^*\left(n_1, n_0\right) < 1 \Rightarrow \pi^*\left(n_1, n_0 + k\right) = 0
\]
for $k \geq 1$.
Step 3. We show that any conditional expectation of $\beta_{-i}$ is weakly increasing in $\beta_i$:
\[
0 \leq \frac{\partial E\left[\beta_{-i} \mid \beta_{-i} \in B_k^{m_{-i}}, \beta_i\right]}{\partial \beta_i}
\]
If $B_k^{m_{-i}} = \left\{\beta_k^{m_{-i}}\right\}$ is degenerate, then $E\left[\beta_{-i} \mid \beta_{-i} \in B_k^{m_{-i}}, \beta_i\right]$ is constant. Otherwise, write each interval as $B_k^{m_{-i}} = \left(\underline\beta_k^{m_{-i}}, \bar\beta_k^{m_{-i}}\right]$. Because of the bivariate normal distribution, the conditional distribution is given by
\[
\beta_{-i} \mid \beta_i \sim N\left(\rho \beta_i, \tilde\sigma^2\right),
\]
where the correlation coefficient is $\rho \equiv \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}$ and the variance is $\tilde\sigma^2 \equiv \left(1 - \rho^2\right)\left(\sigma^2 + \sigma_b^2\right)$. Using the formula for the truncated normal distribution,
\[
E\left[\beta_{-i} \mid \beta_{-i} \in Piv_j, \beta_i\right] = \frac{\sum_{k=1}^K \left[\Phi\left(\bar\beta_k\right) - \Phi\left(\underline\beta_k\right)\right]\left[\rho\beta_i - \tilde\sigma\,\frac{\phi\left(\bar\beta_k\right) - \phi\left(\underline\beta_k\right)}{\Phi\left(\bar\beta_k\right) - \Phi\left(\underline\beta_k\right)}\right]}{\sum_{k=1}^K \left[\Phi\left(\bar\beta_k\right) - \Phi\left(\underline\beta_k\right)\right]},
\]
where $\phi$ is the probability density function and $\Phi$ is the cumulative distribution function of $N\left(\rho\beta_i, \tilde\sigma^2\right)$.
First, note that the relative increase in the probability of $\beta_{-i} \in \left(\underline\beta_k, \bar\beta_k\right]$ is increasing in the value of $E\left[\beta_{-i} \mid \beta_{-i} \in B_k^{m_{-i}}, \beta_i\right]$. To see this, note that
\[
\frac{\partial\left[\Phi\left(\bar\beta_k\right) - \Phi\left(\underline\beta_k\right)\right]}{\partial \beta_i} \cdot \frac{1}{\Phi\left(\bar\beta_k\right) - \Phi\left(\underline\beta_k\right)} = -\frac{\rho}{\tilde\sigma}\,\frac{\phi\left(\bar\beta_k\right) - \phi\left(\underline\beta_k\right)}{\Phi\left(\bar\beta_k\right) - \Phi\left(\underline\beta_k\right)},
\]
and $-\frac{\phi\left(\bar\beta_k\right) - \phi\left(\underline\beta_k\right)}{\Phi\left(\bar\beta_k\right) - \Phi\left(\underline\beta_k\right)}$ is increasing in $E\left[\beta_{-i} \mid \beta_{-i} \in \left(\underline\beta_k, \bar\beta_k\right], \beta_i\right]$ because of the formula for the mean of a truncated normal distribution. Therefore, as $\beta_i$ increases, the relative likelihood of $\beta_{-i}$ occurring in intervals with high conditional means also increases.

66 That is, $m_i = 1$ means a high signal and $m_i = 0$ means a low signal. One could construct an equilibrium in which the meaning of the messages differs, but it would be qualitatively equivalent.
Second, the expected value of the truncated normal distribution is also increasing in $\beta_i$.67 For notational ease, denote the mean by $\mu = \rho\beta_i$. Then, by the chain rule,
\[
\frac{\partial E\left[\beta \mid \underline\beta \leq \beta \leq \bar\beta, \beta_i\right]}{\partial \beta_i} = \rho\,\frac{\partial E\left[\beta \mid \underline\beta \leq \beta \leq \bar\beta, \mu\right]}{\partial \mu}.
\]
Thus it is necessary and sufficient to show $\frac{\partial E\left[\beta \mid \underline\beta \leq \beta \leq \bar\beta, \mu\right]}{\partial \mu} \geq 0$. The mean of the truncated normal distribution is given by68
\[
E\left[\beta \mid \underline\beta \leq \beta \leq \bar\beta\right] = \mu - \tilde\sigma\,\frac{\bar\phi - \underline\phi}{\bar\Phi - \underline\Phi} = \int_{\underline\beta}^{\bar\beta} \beta\,\frac{1}{\tilde\sigma F\left(\mu\right)}\,\phi\!\left(\frac{\beta - \mu}{\tilde\sigma}\right) d\beta,
\]
where $F\left(\mu\right) \equiv \bar\Phi - \underline\Phi$. By the chain rule,
\[
\frac{\partial E_{tr}\left(\beta\right)}{\partial \mu} = \int_{\underline\beta}^{\bar\beta} \beta\,\frac{1}{\tilde\sigma F\left(\mu\right)}\,\phi'\!\left(\frac{\beta - \mu}{\tilde\sigma}\right)\left(-\frac{1}{\tilde\sigma}\right) d\beta - \frac{\partial \ln F\left(\mu\right)}{\partial \mu}\,E_{tr}\left(\beta\right).
\]
First, the normal density satisfies $\phi'\!\left(\frac{\beta - \mu}{\tilde\sigma}\right) = -\frac{\beta - \mu}{\tilde\sigma}\,\phi\!\left(\frac{\beta - \mu}{\tilde\sigma}\right)$. Second, focusing on the second term, note that $\frac{\bar\phi - \underline\phi}{\tilde\sigma F\left(\mu\right)} = -\frac{\partial \ln F\left(\mu\right)}{\partial \mu}$ and $\int_{\underline\beta}^{\bar\beta} \beta\,\frac{1}{\tilde\sigma F\left(\mu\right)}\,\phi\!\left(\frac{\beta - \mu}{\tilde\sigma}\right) d\beta = E_{tr}\left(\beta\right)$. Therefore,
\[
\begin{aligned}
\frac{\partial E_{tr}\left(\beta\right)}{\partial \mu} &= \frac{1}{\tilde\sigma^2}\int_{\underline\beta}^{\bar\beta} \left(\beta - \mu\right)\beta\,\frac{1}{\tilde\sigma F\left(\mu\right)}\,\phi\!\left(\frac{\beta - \mu}{\tilde\sigma}\right) d\beta - \frac{\partial \ln F\left(\mu\right)}{\partial \mu}\,E_{tr}\left(\beta\right)\\
&= \frac{1}{\tilde\sigma^2}\left[\int_{\underline\beta}^{\bar\beta} \beta^2\,\frac{1}{\tilde\sigma F\left(\mu\right)}\,\phi\!\left(\frac{\beta - \mu}{\tilde\sigma}\right) d\beta - \mu \int_{\underline\beta}^{\bar\beta} \beta\,\frac{1}{\tilde\sigma F\left(\mu\right)}\,\phi\!\left(\frac{\beta - \mu}{\tilde\sigma}\right) d\beta\right] - \frac{E_{tr}\left(\beta\right) - \mu}{\tilde\sigma^2}\,E_{tr}\left(\beta\right)\\
&= \frac{1}{\tilde\sigma^2}\left[E_{tr}\left(\beta^2\right) - \mu E_{tr}\left(\beta\right) - \left(E_{tr}\left(\beta\right) - \mu\right)E_{tr}\left(\beta\right)\right]\\
&= \frac{1}{\tilde\sigma^2}\left[E_{tr}\left(\beta^2\right) - E_{tr}\left(\beta\right)^2\right]\\
&= \frac{Var_{tr}\left(\beta\right)}{\tilde\sigma^2} > 0
\end{aligned}
\]
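The conclusion that the truncated mean has slope $Var_{tr}\left(\beta\right)/\tilde\sigma^2 \in \left(0, 1\right)$ in its location can be checked numerically. The sketch below (illustrative, with arbitrary hypothetical parameter values, not part of the proof) compares a finite-difference derivative of the truncated normal mean to the variance ratio:

```python
from math import erf, exp, pi, sqrt

# Illustrative numerical check (arbitrary parameter values): the derivative of a
# truncated normal mean with respect to the location mu equals Var_tr / s^2,
# which lies strictly between 0 and 1.
def phi(x):
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def trunc_mean_var(mu, s, lo, hi):
    a, b = (lo - mu) / s, (hi - mu) / s
    Z = Phi(b) - Phi(a)
    mean = mu - s * (phi(b) - phi(a)) / Z
    var = s * s * (1 - (b * phi(b) - a * phi(a)) / Z - ((phi(b) - phi(a)) / Z) ** 2)
    return mean, var

mu, s, lo, hi, h = 0.3, 1.2, -0.5, 1.0, 1e-5
m_minus, _ = trunc_mean_var(mu - h, s, lo, hi)
m_plus, var = trunc_mean_var(mu + h, s, lo, hi)
slope = (m_plus - m_minus) / (2 * h)   # numerical dE_tr/dmu
print(slope, var / s ** 2)             # should agree, and lie in (0, 1)
```

The finite-difference slope and the analytical ratio coincide up to discretization error.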
Step 4. Consider the two indifference conditions (28) and (29). Since $c$ is finite, $\lim_{\beta_i \to \infty}\left(\beta_i + E\left[\beta_{-i} \mid \beta_{-i} \in Piv_1, \beta_i\right]\right) = \infty$ as $\beta_i$ spans the entire real line due to the normal distribution, and $\left(\beta_i + E\left[\beta_{-i} \mid \beta_{-i} \in Piv_1, \beta_i\right]\right)$ is monotone in $\beta_i$, there exists a threshold $\bar\beta_i$ such that $m_i^* = 1$ if $\beta_i > \bar\beta_i$ and only if $\beta_i \geq \bar\beta_i$. Similarly, there exists $\underline\beta_i$ such that $m_i^* = 0$ if $\beta_i < \underline\beta_i$ and only if $\beta_i \leq \underline\beta_i$.

67 I thank Alecos Papadopoulos for this step of the proof.
68 $\bar\phi = \phi\!\left(\frac{\bar\beta - \mu}{\tilde\sigma}\right)$ and $\underline\phi = \phi\!\left(\frac{\underline\beta - \mu}{\tilde\sigma}\right)$ are normal probability density functions, and $\bar\Phi = \Phi\!\left(\frac{\bar\beta - \mu}{\tilde\sigma}\right)$ and $\underline\Phi = \Phi\!\left(\frac{\underline\beta - \mu}{\tilde\sigma}\right)$ are normal cumulative distribution functions.
A2 Counterexample to monotonicity
Note that monotonicity may not hold under a non-degenerate $G\left(\sigma\right)$, even conditional on the supermajoritarian rule of the policymaker and a monotone rule of the other researcher. To see this, consider the two-point distribution
\[
\sigma_{-i} = \begin{cases} \sigma_H & \text{with probability } g \\ \sigma_L & \text{with probability } 1 - g \end{cases}
\]
for some $g \in \left(0, 1\right)$.
For some realization of $\sigma_i$, the indifference condition can be written as
\[
P\left(\sigma_H \mid m_{-i} = \varnothing, \bar\beta\right)\frac{\frac{\bar\beta}{\sigma_i^2} + \frac{1}{\sigma_H^2}E\left[\beta_{-i} \mid m_{-i} = \varnothing, \sigma_H, \bar\beta\right]}{\frac{1}{\sigma_b^2} + \frac{1}{\sigma_i^2} + \frac{1}{\sigma_H^2}} + P\left(\sigma_L \mid m_{-i} = \varnothing, \bar\beta\right)\frac{\frac{\bar\beta}{\sigma_i^2} + \frac{1}{\sigma_L^2}E\left[\beta_{-i} \mid m_{-i} = \varnothing, \sigma_L, \bar\beta\right]}{\frac{1}{\sigma_b^2} + \frac{1}{\sigma_i^2} + \frac{1}{\sigma_L^2}} = 0
\]
by the law of iterated expectations. Even though both $\frac{\bar\beta}{\sigma_i^2} + \frac{1}{\sigma_H^2}E\left[\beta_{-i} \mid m_{-i} = \varnothing, \sigma_H, \bar\beta\right]$ and $\frac{\bar\beta}{\sigma_i^2} + \frac{1}{\sigma_L^2}E\left[\beta_{-i} \mid m_{-i} = \varnothing, \sigma_L, \bar\beta\right]$ are increasing in $\bar\beta$, the weights $P\left(\sigma_H \mid m_{-i} = \varnothing, \bar\beta\right)$ and $P\left(\sigma_L \mid m_{-i} = \varnothing, \bar\beta\right)$ could change in such a way that the entire left-hand side is decreasing at some $\bar\beta$.
By Bayes' rule,
\[
P\left(\sigma_H \mid m_{-i} = \varnothing, \bar\beta\right) = \frac{g\,P\left(m_{-i} = \varnothing \mid \sigma_H, \bar\beta\right)}{P\left(m_{-i} = \varnothing \mid \bar\beta\right)}; \qquad P\left(\sigma_L \mid m_{-i} = \varnothing, \bar\beta\right) = \frac{\left(1 - g\right)P\left(m_{-i} = \varnothing \mid \sigma_L, \bar\beta\right)}{P\left(m_{-i} = \varnothing \mid \bar\beta\right)}.
\]
Consider an example in which the other researcher $-i$ takes the strategies
\[
m_{-i}\left(\beta_{-i}, \sigma_H\right) = \varnothing \Leftrightarrow \beta_{-i} \in \left(\bar\beta_H, \bar\beta_H + \Delta\right]
\]
\[
m_{-i}\left(\beta_{-i}, \sigma_L\right) = \varnothing \Leftrightarrow \beta_{-i} \in \left(\bar\beta_L, \bar\beta_L + \sqrt{\tfrac{\sigma_b^2 + \sigma_L^2}{\sigma_b^2 + \sigma_H^2}}\,\Delta\right]
\]
for some small $\Delta$, where the factor $\sqrt{\tfrac{\sigma_b^2 + \sigma_L^2}{\sigma_b^2 + \sigma_H^2}}$ is a normalization.
Then,
\[
E\left[\beta_{-i} \mid m_{-i} = \varnothing, \sigma_H, \bar\beta\right] \simeq \bar\beta_H; \qquad E\left[\beta_{-i} \mid m_{-i} = \varnothing, \sigma_L, \bar\beta\right] \simeq \bar\beta_L
\]
\[
P\left(m_{-i} = \varnothing \mid \sigma_H, \bar\beta\right) = \phi\!\left(\frac{\bar\beta_H - \frac{\sigma_b^2}{\sigma_b^2 + \sigma_H^2}\frac{\sigma_H}{\sigma_i}\bar\beta}{\sqrt{\sigma_b^2 + \sigma_H^2}}\right); \qquad P\left(m_{-i} = \varnothing \mid \sigma_L, \bar\beta\right) = \phi\!\left(\frac{\bar\beta_L - \frac{\sigma_b^2}{\sigma_b^2 + \sigma_L^2}\frac{\sigma_L}{\sigma_i}\bar\beta}{\sqrt{\sigma_b^2 + \sigma_L^2}}\right),
\]
where $\phi\left(\cdot\right)$ is the density of the standard normal distribution.
By substitution, the indifference condition is approximately
\[
\tilde g_H\,\phi\!\left(\frac{\bar\beta_H - \frac{\sigma_b^2}{\sigma_b^2 + \sigma_H^2}\frac{\sigma_H}{\sigma_i}\bar\beta}{\sqrt{\sigma_b^2 + \sigma_H^2}}\right)\left[\frac{\bar\beta}{\sigma_i^2} + \frac{\bar\beta_H}{\sigma_H^2}\right] + \tilde g_L\,\phi\!\left(\frac{\bar\beta_L - \frac{\sigma_b^2}{\sigma_b^2 + \sigma_L^2}\frac{\sigma_L}{\sigma_i}\bar\beta}{\sqrt{\sigma_b^2 + \sigma_L^2}}\right)\left[\frac{\bar\beta}{\sigma_i^2} + \frac{\bar\beta_L}{\sigma_L^2}\right] = 0,
\]
where $\tilde g_H = \frac{g}{\frac{1}{\sigma_b^2} + \frac{1}{\sigma_i^2} + \frac{1}{\sigma_H^2}}$ and $\tilde g_L = \frac{1 - g}{\frac{1}{\sigma_b^2} + \frac{1}{\sigma_i^2} + \frac{1}{\sigma_L^2}}$. Rearranging the condition,
\[
-\frac{\tilde g_H}{\tilde g_L}\cdot\frac{\frac{\bar\beta}{\sigma_i^2} + \frac{\bar\beta_H}{\sigma_H^2}}{\frac{\bar\beta}{\sigma_i^2} + \frac{\bar\beta_L}{\sigma_L^2}} = \frac{\phi\!\left(\frac{\bar\beta_L - \frac{\sigma_b^2}{\sigma_b^2 + \sigma_L^2}\frac{\sigma_L}{\sigma_i}\bar\beta}{\sqrt{\sigma_b^2 + \sigma_L^2}}\right)}{\phi\!\left(\frac{\bar\beta_H - \frac{\sigma_b^2}{\sigma_b^2 + \sigma_H^2}\frac{\sigma_H}{\sigma_i}\bar\beta}{\sqrt{\sigma_b^2 + \sigma_H^2}}\right)}
\]
Suppose, for example, that $\bar\beta_L = 0$. Then
\[
-\frac{\tilde g_H}{\tilde g_L}\left(1 + \frac{\sigma_i^2\,\bar\beta_H}{\sigma_H^2\,\bar\beta}\right) = \exp\left\{\frac{1}{2}\left[\left(\frac{\bar\beta_H - \frac{\sigma_b^2}{\sigma_b^2 + \sigma_H^2}\frac{\sigma_H}{\sigma_i}\bar\beta}{\sqrt{\sigma_b^2 + \sigma_H^2}}\right)^2 - \left(\frac{\frac{\sigma_b^2}{\sigma_b^2 + \sigma_L^2}\frac{\sigma_L}{\sigma_i}\bar\beta}{\sqrt{\sigma_b^2 + \sigma_L^2}}\right)^2\right]\right\}
\]
Monotonicity would imply that this condition is satisfied at a unique value of $\bar\beta$. However, the left-hand side of the condition is increasing in $\bar\beta$, while the right-hand side is non-monotone in $\bar\beta$. Therefore, for some parameter values, this condition can be satisfied at multiple values of $\bar\beta$, contradicting the monotonicity assumption. That the conditional expectation may be decreasing in one's own signal is potentially important, because the threshold rule may then no longer be an equilibrium.
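The mechanics behind this multiplicity can be illustrated numerically. In the sketch below, the functions and parameter values are hypothetical stand-ins rather than the model's objects: a ratio of two normal densities whose arguments move linearly with $\bar\beta$ is non-monotone, so a monotone left-hand side can cross it more than once.

```python
from math import exp, pi, sqrt

# Hypothetical illustration: lhs is increasing, rhs is a ratio of two normal
# densities evaluated at points that move linearly with beta; the ratio is
# non-monotone, so the equality lhs = rhs can hold at several values of beta.
phi = lambda x: exp(-x * x / 2) / sqrt(2 * pi)

def rhs(b):
    return phi(1.5 - 0.2 * b) / phi(2.0 - 1.0 * b)

def lhs(b):
    return 1.0 + 0.2 * b

grid = [i / 100 for i in range(0, 501)]            # beta in [0, 5]
signs = [lhs(b) > rhs(b) for b in grid]
crossings = sum(s1 != s2 for s1, s2 in zip(signs, signs[1:]))
print(crossings)  # more than one sign change: multiple solutions
```

With these illustrative numbers the difference changes sign twice, so the indifference condition would pin down more than one threshold.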
A3 Proposition 1
(i) While the proofs of the omission (8) and asymmetry (9) are contained in the main text, two additional steps are needed to prove Proposition 1(i) fully. First, one has to verify that, given these thresholds, the supermajoritarian rule (7) is the policymaker's optimal strategy. This holds because the researchers' strategy must satisfy the indifference condition at the margin, whereas the policymaker assesses whether the condition holds on average. For example, note that
\[
\begin{aligned}
E\left[b \mid m_1 = 1, m_2 = \varnothing\right] &= \frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}}\,E\left[\beta_1 + E\left[\beta_2 \mid \bar\beta > \beta_2 \geq \underline\beta, \beta_1\right] \,\middle|\, \beta_1 \geq \bar\beta\right]\\
&> \frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}}\left\{\bar\beta + E\left[\beta_2 \mid \bar\beta > \beta_2 \geq \underline\beta, \beta_1 = \bar\beta\right]\right\} = 0,
\end{aligned}
\]
where the last equality holds due to the researchers' indifference condition. Analogous arguments hold for the other realizations of messages, showing that (7) is an equilibrium.
Second, the following algebraic argument formally shows that the asymmetry in (9) leads to the upward bias (10):
\[
\begin{aligned}
E\left[\beta_i \mid m_i \neq \varnothing\right] &= P\left[m_i = 1 \mid m_i \neq \varnothing\right]E\left[\beta_i \mid m_i = 1\right] + P\left[m_i = 0 \mid m_i \neq \varnothing\right]E\left[\beta_i \mid m_i = 0\right]\\
&= \frac{1 - \Phi\left(\bar\beta\right)}{1 - \Phi\left(\bar\beta\right) + \Phi\left(\underline\beta\right)}\sqrt{\sigma^2 + \sigma_b^2}\,\frac{\phi\left(\bar\beta\right)}{1 - \Phi\left(\bar\beta\right)} - \frac{\Phi\left(\underline\beta\right)}{1 - \Phi\left(\bar\beta\right) + \Phi\left(\underline\beta\right)}\sqrt{\sigma^2 + \sigma_b^2}\,\frac{\phi\left(\underline\beta\right)}{\Phi\left(\underline\beta\right)}\\
&= \sqrt{\sigma^2 + \sigma_b^2}\,\frac{\phi\left(\bar\beta\right) - \phi\left(\underline\beta\right)}{1 - \Phi\left(\bar\beta\right) + \Phi\left(\underline\beta\right)} > 0
\end{aligned}
\]
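This closed form can be verified by a quick simulation (illustrative only; the threshold values below are hypothetical and expressed in standardized units):

```python
import random
import statistics
from math import erf, exp, pi, sqrt

# Illustrative check (hypothetical thresholds): censoring intermediate draws of
# beta_i ~ N(0, s^2) leaves reported estimates with mean
#   s * (phi(b_hi) - phi(b_lo)) / (1 - Phi(b_hi) + Phi(b_lo)),
# which is positive when |b_hi| < |b_lo| (the asymmetry in (9)).
phi = lambda x: exp(-x * x / 2) / sqrt(2 * pi)
Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))

s, b_hi, b_lo = 1.5, 0.5, -1.0
closed_form = s * (phi(b_hi) - phi(b_lo)) / (1 - Phi(b_hi) + Phi(b_lo))

random.seed(1)
reported = [x for x in (random.gauss(0, s) for _ in range(500_000))
            if x >= s * b_hi or x < s * b_lo]
print(statistics.mean(reported), closed_form)  # close, and both positive
```

The simulated mean of the non-omitted estimates matches the closed form and is strictly positive, illustrating the upward bias.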
Finally, to confirm the existence and uniqueness of the symmetric equilibrium, one can check the slope of the combined best response functions. In particular, it is sufficient to show that $\frac{\partial \bar\beta_i^{BR}\left(\bar\beta_{-i}\right)}{\partial \bar\beta_{-i}} < 0$. To see this, given $\bar\beta_{-i}$, consider the system of best response functions
\[
\underline\beta_i\left(\beta_{-i}\right) = -E\left[\beta_{-i} \mid \bar\beta_{-i} > \beta_{-i} \geq \underline\beta_{-i}, \beta_i = \underline\beta_i\right]
\]
\[
\underline\beta_{-i}\left(\beta_i\right) = -E\left[\beta_i \mid \beta_i \geq \bar\beta_i, \beta_{-i} = \underline\beta_{-i}\right],
\]
which are the best response functions (11) and (12) with the thresholds not restricted to be symmetric between the two researchers. Note that, by the properties of truncated distributions,
\[
\frac{\partial \underline\beta_i\left(\bar\beta_{-i}, \underline\beta_{-i}\right)}{\partial \underline\beta_i} \in \left(-1, 0\right) \quad \text{and} \quad \frac{\partial \underline\beta_{-i}\left(\beta_i\right)}{\partial \beta_i} \in \left(-1, 0\right)
\]
everywhere in the range. Thus, a unique solution exists.
Consider a small increase in $\bar\beta_{-i}$, which shifts $\underline\beta_i\left(\beta_{-i}\right)$ lower. Then, tracing the curve $\underline\beta_{-i}\left(\beta_i\right)$, the equilibrium value of $\underline\beta_i$ also decreases because $\frac{\partial \underline\beta_i\left(\beta_{-i}\right)}{\partial \beta_i} < 0$. This shows $\frac{\partial \bar\beta_i^{BR}\left(\bar\beta_{-i}\right)}{\partial \bar\beta_{-i}} < 0$.
(ii) The following arguments show stability and optimality respectively.
Stability. This paper focuses on local stability among the various notions of stability in game theory. In particular, consider the equilibrium as a tuple of the policymaker's strategy and the researchers' monotone strategies, $E \equiv \left\{\pi\left(n\right), \bar\beta_i, \underline\beta_i, \bar\beta_j, \underline\beta_j\right\}$. Define the initial disturbance of strategies as $\vec\epsilon \in \mathbb{R}^{6+4}$, so that the initial position is given by the sum of the equilibrium strategies and the disturbances, respecting the range $\pi \in \left[0, 1\right]$. Consider the following non-equilibrium adjustment process of fictitious play repeated dynamically over $t = 1, 2, \ldots$ In each period, first, the policymaker adjusts to the best response to the researchers' strategies inherited from the previous period.69 Then, one researcher (say, $i$) adjusts his strategy given the policymaker's updated strategy and the other researcher $j$'s inherited strategy. Finally, researcher $j$ adjusts his strategy given the policymaker's and researcher $i$'s strategies. Let $E_t$ denote the strategy profile after the adjustment in period $t$. Adopting the formulation of Definition 6.1 in Chapter 1 of Fudenberg and Levine (1998), I say that the equilibrium $E$ is locally stable if for every $\hat d > 0$ there exists $d > 0$ such that, given any disturbance with Euclidean distance $d\left(\vec\epsilon\right) < d$, the play stays in the neighborhood of the equilibrium (i.e., $d\left(E - E_\infty\right) < \hat d$). I say that the equilibrium is locally unstable if it is not locally stable.
This definition is similar to Cournot stability, one of the oldest and simplest notions of stability, albeit in a context with private signals. The equilibrium characterized in Proposition 1 also satisfies stability in the forward induction sense (e.g., as formulated by Kohlberg and Mertens, and Cho and Kreps): because it is the optimal equilibrium, players have no incentive to deviate. The advantage of local stability over these concepts is that it ensures that highly sophisticated coordination is not necessary to sustain the equilibrium; even if play deviates slightly from the equilibrium, a reasonable dynamic adjustment will remain near it.
On the other hand, this definition has two limitations. First, to be fully formal, one may want to consider continuous perturbations of the strategies that map the signals $\left(\beta_i, \sigma_i\right) \in \mathbb{R}^2$ into probability distributions over messages. However, this makes the computation of distances analytically difficult; thus, this approach considers only deviations within the threshold rule.
Second, the fictitious play above is silent on the process by which players learn others' strategies and on the justification of myopic play. Immediate and complete learning of each other's strategies and a myopic adjustment process are assumptions that make the investigation of stability tractable; with high-dimensional strategies and more than two players, stability concepts become difficult to define and verify, and most of the literature on voting games has also remained silent on stability properties. Esponda and Pouzo (2012) showed that, when voters learn about others' strategies conditional on being pivotal, the mixed strategy Nash equilibrium is unstable. Their approach was tractable thanks to a setup with two states of the world; in the setup of this paper, the underlying state is continuous. While models with a finite number of signals typically have a symmetric equilibrium that is unstable, Section 7.1 of their paper adds that the symmetric equilibrium is stable when the signal is continuous. In this sense, their result on the lack of stability of symmetric equilibria in information aggregation models does not suggest a lack of stability in my setup. A full analysis with a specified feedback process is beyond the scope of this paper.

69 It is possible to allow the policymaker to move later because the researchers' strategies are continuous in the policymaker's strategy. This setup simplifies the steps of the proof.
the stability property. The contribution of Esponza and Pouzo (2012) showed, when the voters learn about others’
strategies when they are pivotal, the mixed strategy Nash equilibrium is unstable. Their approach was tractable
thanks to the set-up with two states of the world; in the set-up of this paper, the underlying state is continuous.
While the models with finite number of signals typically have symmetric equilibrium that is unstable, the section 7.1
of their paper adds that the symmetric equilibrium is stable when the signal is continuous. In this sense, their result
of lack of stability in symmetric equilibrium of information aggregation models does not suggest lack of stability in
my set-up. The full analysis with specifications on the feedback process is beyond the scope of this current paper.
The following argument shows that the equilibrium in Proposition 1 is locally stable. Since the policymaker adjusts first, it is sufficient to consider the noise $\epsilon = \left(\bar\epsilon_i, \underline\epsilon_i, \bar\epsilon_j, \underline\epsilon_j\right)$, so that the initial strategies are given by $\left\{\bar\beta_i + \bar\epsilon_i, \underline\beta_i + \underline\epsilon_i, \bar\beta_j + \bar\epsilon_j, \underline\beta_j + \underline\epsilon_j\right\}$. Note that we focus on the symmetric equilibrium, $\bar\beta_i = \bar\beta_j = \bar\beta$ and $\underline\beta_i = \underline\beta_j = \underline\beta$.
Step 1. Before considering the adjustment process, it is useful to make the following observations. Denote $B(\underline{\beta}, \bar{\beta}, \mu) = E[\beta \mid \underline{\beta} \le \beta \le \bar{\beta}, \mu]$ to be the expected value of the truncated normal distribution with mean $\mu$. Then, for small $\Delta$,

$$B(\underline{\beta}+\Delta, \bar{\beta}, \mu) \in \left(B(\underline{\beta}, \bar{\beta}, \mu),\; B(\underline{\beta}, \bar{\beta}, \mu+\Delta)\right) \qquad (30)$$

$$B(\underline{\beta}, \bar{\beta}+\Delta, \mu) \in \left(B(\underline{\beta}, \bar{\beta}, \mu),\; B(\underline{\beta}, \bar{\beta}, \mu+\Delta)\right) \qquad (31)$$

$$B(\underline{\beta}+\Delta, \bar{\beta}+\Delta, \mu) \in \left(B(\underline{\beta}, \bar{\beta}, \mu),\; B(\underline{\beta}, \bar{\beta}, \mu+\Delta)\right) \qquad (32)$$

(30) and (31) are due to the observation that the mean of a truncated distribution does not change as fast as the boundary changes. (32) is clear by taking the total derivative, combined with Step 3 in Appendix A1:

$$dB = \frac{\partial B}{\partial \underline{\beta}}\, d\underline{\beta} + \frac{\partial B}{\partial \bar{\beta}}\, d\bar{\beta} + \frac{\partial B}{\partial \mu}\, d\mu$$

Suppose we consider $d\underline{\beta} = d\bar{\beta} = d\mu = \Delta$. Then, since $B(\underline{\beta}+\Delta, \bar{\beta}+\Delta, \mu+\Delta) = B(\underline{\beta}, \bar{\beta}, \mu) + \Delta$ by definition,

$$\frac{\partial B}{\partial \underline{\beta}} + \frac{\partial B}{\partial \bar{\beta}} + \frac{\partial B}{\partial \mu} = 1.$$

Since $\frac{\partial B}{\partial \mu} \in (0,1)$,

$$\frac{\partial B}{\partial \underline{\beta}} + \frac{\partial B}{\partial \bar{\beta}} = 1 - \frac{\partial B}{\partial \mu} \in (0, 1).$$
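These bracketing observations can be verified numerically. The sketch below is a minimal check with illustrative bounds and an illustrative shift (the bracketing in (30)-(32) requires the truncation window to be wide relative to the perturbation); it implements the truncated normal mean directly from the standard formula:

```python
import math

def npdf(x):
    # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def ncdf(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def B(lo, hi, mu, s=1.0):
    # mean of N(mu, s^2) truncated to [lo, hi]
    a, b = (lo - mu) / s, (hi - mu) / s
    return mu + s * (npdf(a) - npdf(b)) / (ncdf(b) - ncdf(a))

# illustrative values: the bracketing needs the truncation window
# to be wide relative to the shift d
lo, hi, mu, d = -2.0, 1.5, 0.2, 0.01
base, shifted = B(lo, hi, mu), B(lo, hi, mu + d)

checks = [
    base < B(lo + d, hi, mu) < shifted,      # (30): shift the lower bound
    base < B(lo, hi + d, mu) < shifted,      # (31): shift the upper bound
    base < B(lo + d, hi + d, mu) < shifted,  # (32): shift both bounds
    # translation invariance: moving bounds and mean together moves B by d,
    # the identity behind dB/d(lower) + dB/d(upper) + dB/dmu = 1
    abs(B(lo + d, hi + d, mu + d) - (base + d)) < 1e-12,
]
print(all(checks))
```

The last check is the translation identity underlying the unit sum of the partial derivatives in the total-differential argument.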
Step 2. Consider the adjustment in t = 1. Note that the posterior belief E[b|n] is continuous in the researchers' thresholds. Since in equilibrium the posterior beliefs were either strictly above or strictly below 0, a sufficiently small perturbation does not alter the policymaker's strategy. Thus, the policymaker returns to playing the supermajoritarian policy rule (7).

Consider the researchers' adjustment. First, i's strategies will be given by

$$\bar{\beta}_{i,1}(\underline{\beta}_{j,0}) = \bar{\beta}_{i,1}(\underline{\beta} + \underline{\epsilon}_j) > \bar{\beta} - \underline{\epsilon}_j$$

by the strategic substitutability in (12) and (31);

$$\underline{\beta}_{i,1}(\underline{\beta}_{j,0}, \bar{\beta}_{j,0}) = \underline{\beta}_{i,1}(\underline{\beta} + \underline{\epsilon}_j, \bar{\beta} + \bar{\epsilon}_j) > \underline{\beta} - \underline{\epsilon}_j$$

by the strategic substitutability in (11) and (32) for all $\bar{\epsilon}_j > \hat{\epsilon}(\underline{\epsilon}_j)$ with $\hat{\epsilon}(\underline{\epsilon}_j) < 0$. That is, so long as $\bar{\epsilon}_j$ is not too low, the above inequality holds due to the continuity of $\underline{\beta}_{i,1}(\underline{\beta}, \bar{\beta})$ with respect to $\bar{\beta}$.
Then, j's strategies will be given by

$$\underline{\beta}_{j,1}(\underline{\beta}_{i,1}, \bar{\beta}_{i,1}) < \underline{\beta}_{j,1}(\underline{\beta} - \underline{\epsilon}_j, \bar{\beta} - \underline{\epsilon}_j) < \underline{\beta} + \underline{\epsilon}_j,$$

where the first inequality follows from the strategic substitutability in (11) and the second inequality is due to (32);

$$\bar{\beta}_{j,1}(\underline{\beta}_{i,1}) < \bar{\beta}_{j,1}(\underline{\beta} - \underline{\epsilon}_j) < \bar{\beta} + \underline{\epsilon}_j,$$

where the first inequality follows from the strategic substitutability in (12) and the second inequality is due to (31).
Step 3. Let us consider t = 2. Since the thresholds are still in the neighborhood of the equilibrium, for sufficiently small $\epsilon_j$, the policymaker's strategy remains the supermajoritarian rule. By an argument analogous to t = 1,

$$\bar{\beta}_{i,2}(\underline{\beta}_{j,1}) > \bar{\beta}_{i,1}(\underline{\beta} + \underline{\epsilon}_j) > \bar{\beta} - \underline{\epsilon}_j$$

$$\underline{\beta}_{i,2}(\underline{\beta}_{j,1}, \bar{\beta}_{j,1}) > \underline{\beta}_{i,1}(\underline{\beta} + \underline{\epsilon}_j, \bar{\beta} + \bar{\epsilon}_j) > \underline{\beta} - \underline{\epsilon}_j$$

Similarly, repeating the same adjustment,

$$\bar{\beta}_{j,2}(\underline{\beta}_{i,2}) < \bar{\beta}_{j,2}(\underline{\beta} - \underline{\epsilon}_j) < \bar{\beta} + \underline{\epsilon}_j$$

$$\underline{\beta}_{j,2}(\underline{\beta}_{i,2}, \bar{\beta}_{i,2}) < \underline{\beta}_{j,2}(\underline{\beta} - \underline{\epsilon}_j, \bar{\beta} - \underline{\epsilon}_j) < \underline{\beta} + \underline{\epsilon}_j$$

Therefore, iterating this process forward, we observe that, for all t ≥ 1, the policymaker's decision rule is the supermajoritarian rule (7) and

$$\bar{\beta}_{i,t} > \bar{\beta} - \underline{\epsilon}_j; \quad \underline{\beta}_{i,t} > \underline{\beta} - \underline{\epsilon}_j$$

$$\bar{\beta}_{j,t} < \bar{\beta} + \underline{\epsilon}_j; \quad \underline{\beta}_{j,t} < \underline{\beta} + \underline{\epsilon}_j$$
Step 4. Given $\hat{d} > 0$, we can now construct $d_\epsilon > 0$ from the above conditions. To be most conservative, suppose the policymaker's strategy and researcher i's strategies were not disturbed. Then, we can set $d_\epsilon = \min_j \sqrt{\epsilon_j^2 + \hat{\epsilon}(\epsilon_j)^2}$. If the initial perturbation is small enough that $d(\epsilon) < d_\epsilon$, then it is guaranteed that the equilibrium will have distance at most $d(E - E_\infty) < \hat{d} = \min_j \sqrt{2 \times \left[\epsilon_j^2 + \hat{\epsilon}(\epsilon_j)^2\right]}$. Thus, $d_\epsilon = 2^{-\frac{1}{2}}\hat{d}$ will satisfy the local stability condition.

Note that the proof of local stability here is more involved than in a setting with a 1-dimensional strategy because the strategy is 2-dimensional. Unlike in the 1-dimensional case, Observation 1 does not directly imply local stability.
Optimality. Consider the policymaker's strategy. The following argument shows that equilibria involving mixed strategies are sub-optimal. Suppose that π(mi, mj) ∈ (0, 1) for some mi, mj. Consider marginally raising the cut-off to send the message mi. Then, π(mi, mj) = 1 and the expected welfare from the realization mi, mj is positive. By Lemma 1, π(mi, mj − k) = 0 and π(mi − 1, mj) = 0 prior to the perturbation. Because the expected welfare from the message realizations mi, mj − k and mi − 1, mj is strictly negative, there exists some small perturbation such that π(mi, mj − k) = 0 and π(mi − 1, mj) = 0 even after the perturbation. Thus, the expected welfare conditional on the message realization mi, mj − k is unchanged. By Lemma 1, π(mi + k, mj) = 1 and, for the diagonal, either interior or corner solutions of π(mi + 1, mj − 1) are possible. If π(mi + 1, mj − 1) is 0 or 1 because the expected welfare is strictly greater or less than 0, then the small perturbation does not alter the decision. If π(mi + 1, mj − 1) ∈ (0, 1), then the perturbation turns this into π(mi + 1, mj − 1) = 1. Note nevertheless that, combining the message realizations mi + 1, mj and mi + 1, mj − 1, the expected welfare is unchanged because the increase in welfare in region mi + 1, mj − 1 is offset completely by the lower probability of the message realization mi + 1, mj. Since this perturbation improved welfare at mj and kept welfare constant at mj + k and mj − k, it must have improved overall welfare. Since welfare must be higher when welfare-maximizing researchers choose the equilibrium thresholds based on such an alternative pure-strategy π than under such a non-equilibrium deviation, the mixed-strategy equilibria must be sub-optimal.

Note that equilibria that are not fully informative or not fully responsive cannot be optimal because allowing responsiveness to additional messages can always improve welfare. There are two fully responsive and fully informative equilibria: one under the supermajoritarian rule a = 1 ⇔ n1 > n0 and one under the submajoritarian rule a = 1 ⇔ n1 ≥ n0. Since these equilibria are symmetric to each other around the origin due to c = 0, they attain the same level of welfare. Thus, the equilibrium under a = 1 ⇔ n1 > n0 is an optimal equilibrium. Note that, conditional on the supermajoritarian rule, the researchers' thresholds are unique due to Observation 1.
The full demonstration of optimality and stability for the case with N ≥ 3 is beyond the scope of this paper. Nonetheless, the logic for optimality and stability under the supermajoritarian voting rule applies to the case of N ≥ 3. First, the supermajoritarian rule partitions the space of signals $\mathbb{R}^N$ in a way that allows for a fine approximation of the first-best hyperplane. Therefore, no other equilibria can better convey the information. Second, the local stability will most likely carry over to the case of N ≥ 3 in the model. There is literature on Cournot stability arguing that the iterative myopic adjustment process diverges for N ≥ 4 (Theocharis, 1960). The subsequent literature (Fisher, 1961) has shown that the myopic non-equilibrium adjustment process converges when it is continuous for N ≥ 3, and that the divergence result occurred due to a particular adjustment process Theocharis assumed.
A4 Proposition 2
Equilibrium with no omission
The policymaker's strategy described in terms of the number of positive and negative results can be written in terms of the messages:

    π*         m−i = 0    m−i = ∅    m−i = 1
    mi = 1        0          π̃          1
    mi = ∅        0          0          π̃
    mi = 0        0          0          0

with π̃ ∈ (0, 1].
Equilibrium condition. First, let us verify that $\underline{\beta} = \bar{\beta}$ is an equilibrium under this policy rule. Recall that the thresholds, in general, must satisfy conditions (28) and (29). Suppose the other researcher plays $\underline{\beta} = \bar{\beta}$. Then, regardless of one's own signal, $P(m_{-i} = \emptyset \mid \beta_i) = 0$. Therefore, one can write the indifference conditions (28) and (29) as

$$P(m_{-i} = 1 \mid \underline{\beta}_i)\,(1 - \tilde{\pi})\left[\frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}}\, E\!\left[\beta_{-i} \mid \beta_{-i} \ge \underline{\beta}_{-i}, \underline{\beta}_i\right] + \underline{\beta}_i\right] = 0 \qquad (33)$$

$$P(m_{-i} = 1 \mid \bar{\beta}_i)\,\tilde{\pi}\left[\frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}}\, E\!\left[\beta_{-i} \mid \beta_{-i} \ge \underline{\beta}_{-i}, \bar{\beta}_i\right] + \bar{\beta}_i\right] = 0 \qquad (34)$$

Since these conditions are proportional to each other, researcher i's optimal strategy has $\underline{\beta}_i = \bar{\beta}_i$.
Second, given that the indifference conditions are satisfied for researchers when the thresholds equal $\underline{\beta}_i = \bar{\beta}_i$, the policymaker will also be indifferent between a = 1 and a = 0 when $n_1 = 1, n_0 = 0$. By Lemma 1, this implies that the conjectured policymaker's strategy is an equilibrium.

Finally, note that the best response function $\underline{\beta}_i(\underline{\beta}_{-i})$ has $\frac{d\underline{\beta}_i(\underline{\beta}_{-i})}{d\underline{\beta}_{-i}} \in (-1, 0)$ as noted above. Therefore, a unique symmetric equilibrium exists. Note that $\underline{\beta} = \bar{\beta} < 0$. Towards a contradiction, suppose $\underline{\beta} \ge 0$. Then, the left-hand side of condition (33) is strictly positive, and thus condition (33) cannot be satisfied.
Stability. Consider the perturbation of the strategies $\{\underline{\beta} + \underline{\epsilon}_i, \bar{\beta} + \bar{\epsilon}_i, \underline{\beta} + \underline{\epsilon}_j, \bar{\beta} + \bar{\epsilon}_j\}$ such that $\bar{\epsilon} > \underline{\epsilon}$, so that the ordering of the thresholds is preserved. Then, generically, $\pi^*(n_1 = 1, n_0 = 0)$ is either 1 or 0 because the indifference no longer holds. Given that the other researcher j has positive probability of omission, it is also optimal to omit some results. Therefore, $\bar{\beta}_{i,1} > \bar{\beta}$ and $\underline{\beta}_{i,1} < \underline{\beta}$. Thus, j's best response will also have $\bar{\beta}_{j,1} > \bar{\beta}$ and $\underline{\beta}_{j,1} < \underline{\beta}$. Thus, there does not exist any local perturbation from which the equilibrium does not diverge away. That is, the equilibrium is not locally stable.
Optimality. To see that the equilibrium with no omission attains strictly lower welfare than the equilibrium with asymmetric omission, consider the following non-equilibrium deviation: suppose both researchers shift their thresholds so that $\bar{\beta}' = \underline{\beta} + \Delta$ and the policymaker switches to the supermajoritarian rule (7). This attains strictly higher welfare than the equilibrium without omission because, for signal realizations such that $\beta_i, \beta_j \in (\underline{\beta}, \underline{\beta} + \Delta)$, the expected welfare from policy implementation is negative. Since this deviation allows the policymaker not to implement the policy in those cases, it improves welfare.

By Proposition 1, we know that the equilibrium given the policymaker's strategy (7) is unique. Moreover, since the researchers' indifference conditions are identical to the first-order conditions of maximizing social welfare, the equilibrium characterized by Proposition 1 attains strictly higher welfare than the non-equilibrium deviation above. Therefore, the equilibrium with no omission is dominated by the equilibrium in Proposition 1.
Equilibrium with symmetric omission.
Equilibrium condition. First, given the strategic symmetry $\underline{\beta} = -\bar{\beta}$, I demonstrate that the receiver's strategy is optimal. Denoting the joint distribution by Φ and the density by φ,
$$E[b \mid m_i = \emptyset, m_{-i} = \emptyset] = \frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}} \int_{\underline{\beta}}^{\bar{\beta}}\!\int_{\underline{\beta}}^{\bar{\beta}} (\beta_i + \beta_{-i})\, d\Phi(\beta_i, \beta_{-i}) = 0$$

by the symmetry of the bivariate normal distribution: with $\underline{\beta} = -\bar{\beta}$, the region of integration is symmetric around the origin while the integrand $\beta_i + \beta_{-i}$ is odd, so the positive and negative halves cancel.
$$E[b \mid m_i = 1, m_{-i} = 0] = \frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma_b^2} + \frac{2}{\sigma^2}} \int_{\bar{\beta}}^{\infty}\!\int_{-\infty}^{\underline{\beta}} (\beta_i + \beta_{-i})\, d\Phi(\beta_i, \beta_{-i}) = 0$$

again by symmetry: since $\underline{\beta} = -\bar{\beta}$, the map $(\beta_i, \beta_{-i}) \mapsto (-\beta_{-i}, -\beta_i)$ sends the region of integration onto itself and flips the sign of the integrand.
Since c = 0, the policymaker is indifferent between implementing and canceling the policy when (n0 , n1 ) ∈ {(0, 0) , (1, 1)}.
By Lemma 1 (ii), the strategy satisfies the policymaker's optimization. Given the mixed strategy, the researchers' indifference conditions (28) and (29) are given by

$$\frac{1}{2}\left\{\underline{\beta} + P\!\left(m_{-i} = 0 \mid \underline{\beta}, Piv\right) E\!\left[\beta_{-i} \mid m_{-i} = 0, \underline{\beta}\right] + P\!\left(m_{-i} = \emptyset \mid \underline{\beta}, Piv\right) E\!\left[\beta_{-i} \mid m_{-i} = \emptyset, \underline{\beta}\right]\right\} = 0$$

$$\frac{1}{2}\left\{\bar{\beta} + P\!\left(m_{-i} = \emptyset \mid \bar{\beta}, Piv\right) E\!\left[\beta_{-i} \mid m_{-i} = \emptyset, \bar{\beta}\right] + P\!\left(m_{-i} = 1 \mid \bar{\beta}, Piv\right) E\!\left[\beta_{-i} \mid m_{-i} = 1, \bar{\beta}\right]\right\} = 0$$

Substituting the formula of the truncated normal distribution, these conditions become

$$\frac{1}{2}\left[\underline{\beta}_i + E\!\left[\beta_{-i} \mid \beta_{-i} \le \bar{\beta}_{-i}, \underline{\beta}_i\right]\right] = 0$$

$$\frac{1}{2}\left[\bar{\beta}_i + E\!\left[\beta_{-i} \mid \beta_{-i} \ge \underline{\beta}_{-i}, \bar{\beta}_i\right]\right] = 0$$

Note that when $\underline{\beta}_i = -\bar{\beta}_i$ and $\underline{\beta}_{-i} = -\bar{\beta}_{-i}$, these conditions are equivalent to each other. Moreover, the solution $\bar{\beta}_i$ is strictly decreasing in $\underline{\beta}_{-i}$. Combining, there exists a solution $\bar{\beta}_i = -\underline{\beta}_i = \bar{\beta}_{-i} = -\underline{\beta}_{-i}$ that satisfies the researchers' indifference conditions.
Stability. Since the policymaker's strategy depends on the indifference, it is fragile to a small perturbation of a researcher's strategy.
Optimality. With an argument analogous to the equilibrium with no omission, a small perturbation can improve welfare. Moreover, such a non-equilibrium deviation is strictly dominated by the equilibrium characterized in Proposition 1. Thus, the equilibrium with symmetric omission cannot be an optimal equilibrium.
A5 Example with binary signals
Consider the modification of the set-up in Propositions 1 and 2 with 2 researchers, from continuous into binary signals. The state space, communication technology, payoffs, and information structure remain unchanged. Maintaining the symmetric structure while allowing c > 0, suppose the signals are

$$s_i = 1(\beta_i \ge 0)$$

Let $p \equiv P(\beta_i \ge 0 \mid b \ge 0) > \frac{1}{2}$ be the precision of the signal, which is decreasing in σ. Note that, due to the symmetry of the normal distribution, $P(\beta_i < 0 \mid b < 0) = p$ also holds. Denote the maximum cost that makes the policy implementation socially optimal given two positive signals as $\bar{c} \equiv E[b \mid s_1 = 1, s_2 = 1]$. If $c > \bar{c}$, then communication is never beneficial since the policy is never worthy of implementation.70
Proposition A5: Suppose $c \in [0, \bar{c}]$.

(i) An optimal and stable (if $c \in (0, \bar{c})$) equilibrium has

• the policymaker's strategy

    π*        m−i = 0    m−i = 1
    mi = 1       0          1
    mi = 0       0          0

• the researcher's strategy

$$m_i = \begin{cases} 1 & \text{if } s_i = 1 \\ 0 & \text{if } s_i = 0 \end{cases}$$

(ii) If c = 0, the following policymaker's strategy is also optimal with the identical researcher's strategy:

    π*        m−i = 0    m−i = 1
    mi = 1      1/2         1
    mi = 0       0         1/2
Proof. Equilibrium conditions. It is optimal to implement the policy if and only if the posterior belief E[b|m] > c. Assuming truth-telling, the posterior belief of the policymaker is given by Bayes' rule:

$$P(b \ge 0 \mid m_1 = 1, m_2 = 1) = \frac{P(b \ge 0)\, P(m_1 = 1, m_2 = 1 \mid b \ge 0)}{P(m_1 = 1, m_2 = 1)} = \frac{p^2}{p^2 + (1-p)^2} > \frac{1}{2}$$

70 Formally, $p = \int_0^\infty \left[1 - \Phi\!\left(-\frac{b}{\tilde{\sigma}}\right)\right] \frac{2\phi(b/\sigma_b)}{\sigma_b}\, db$, integrating over the prior of b conditional on b ≥ 0. Moreover, $E[b \mid s_1 = 1, s_2 = 1] = \sqrt{\frac{2}{\sigma_b^2 \pi}}\, \frac{p^2 - (1-p)^2}{p^2 + (1-p)^2}$, where π is the circular constant.
$$P(b \ge 0 \mid m_1 = 1, m_2 = 0 \,\vee\, m_1 = 0, m_2 = 1) = \frac{P(b \ge 0)\, P(m_1 = 1, m_2 = 0 \,\vee\, m_1 = 0, m_2 = 1 \mid b \ge 0)}{P(m_1 = 1, m_2 = 0 \,\vee\, m_1 = 0, m_2 = 1)} = \frac{p(1-p)}{2p(1-p)} = \frac{1}{2}$$

$$P(b \ge 0 \mid m_1 = 0, m_2 = 0) = \frac{(1-p)^2}{p^2 + (1-p)^2} < \frac{1}{2}$$
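These posteriors can be reproduced with exact rational arithmetic. The sketch below is illustrative (the function name and the example precision p = 7/10 are my assumptions, not values from the paper); it applies Bayes' rule with the symmetric prior P(b ≥ 0) = 1/2:

```python
from fractions import Fraction

def posteriors(p):
    # symmetric prior P(b >= 0) = P(b < 0) = 1/2; a researcher sends m_i = 1
    # with probability p when b >= 0 and probability 1 - p when b < 0
    half = Fraction(1, 2)
    both = half * p**2 / (half * p**2 + half * (1 - p)**2)
    split = (half * 2 * p * (1 - p)) / (half * 2 * p * (1 - p) + half * 2 * p * (1 - p))
    none = half * (1 - p)**2 / (half * (1 - p)**2 + half * p**2)
    return both, split, none

both, split, none = posteriors(Fraction(7, 10))
print(both, split, none)
```

As the closed forms show, a split vote leaves the posterior at exactly 1/2 for any precision p, while unanimous votes push it strictly above or below 1/2.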
Thus, the expected welfare when a = 1 given the messages is

$$EW(a=1 \mid m_1=1, m_2=1) = P(b \ge 0 \mid m_1=1, m_2=1)\, E[b \mid b \ge 0] + P(b < 0 \mid m_1=1, m_2=1)\, E[b \mid b < 0] - c = \sqrt{\frac{2}{\sigma_b^2\pi}}\,\frac{p^2-(1-p)^2}{p^2+(1-p)^2} - c \ge 0$$

$$EW(a=1 \mid m_1=1, m_2=0 \,\vee\, m_1=0, m_2=1) = \sqrt{\frac{2}{\sigma_b^2\pi}}\left(\frac{1}{2} - \frac{1}{2}\right) - c \le 0$$

$$EW(a=1 \mid m_1=0, m_2=0) = P(b \ge 0 \mid m_1=0, m_2=0)\, E[b \mid b \ge 0] + P(b < 0 \mid m_1=0, m_2=0)\, E[b \mid b < 0] - c = \sqrt{\frac{2}{\sigma_b^2\pi}}\,\frac{(1-p)^2-p^2}{p^2+(1-p)^2} - c < 0$$
Therefore, the policymaker's strategy is optimal given the sincere reporting of the researchers. To verify that the researchers' strategies are also best responses to the policymaker's strategy, let us consider the expected welfare from reports conditional on being pivotal (that is, $s_{-i} = 1$):

$$EW(m_i = 1 \mid s_i = 1, s_{-i} = 1) = \sqrt{\frac{2}{\sigma_b^2\pi}}\,\frac{p^2-(1-p)^2}{p^2+(1-p)^2} - c \ge 0$$

$$EW(m_i = 0 \mid s_i = 1, s_{-i} = 1) = 0$$

Therefore, $m_i^* = 1$ when $s_i = 1$.

$$EW(m_i = 1 \mid s_i = 0, s_{-i} = 1) = \sqrt{\frac{2}{\sigma_b^2\pi}}\left(\frac{1}{2} - \frac{1}{2}\right) - c \le 0$$

$$EW(m_i = 0 \mid s_i = 0, s_{-i} = 1) = 0$$

Therefore, $m_i^* = 0$ when $s_i = 0$. When c = 0, $\pi(m_1 = 1, m_2 = 0 \,\vee\, m_1 = 0, m_2 = 1) = \frac{1}{2}$ is also optimal since the policymaker is indifferent between implementing and not implementing the policy. It also maintains the researchers' sincere reporting as the optimal strategy.
Optimality. Since there is no garbling of signals, other equilibria cannot attain strictly higher welfare than this equilibrium by Blackwell's theorem.
Stability. To see that the equilibrium is stable as long as c > 0, note that the pure-strategy decisions are based on strict preferences over actions. First, observe that the posterior beliefs are continuous in the researchers' reporting strategies. Thus, for a small perturbation of the researchers' strategies, the supermajoritarian rule described in (i) is optimal. Second, observe that the difference in expected welfare $EW(m_i = 1 \mid s_i, m_{-i}) - EW(m_i = 0 \mid s_i, m_{-i})$ is also continuous in the other researcher's strategy. Therefore, even if the other researcher deviates from truth-telling slightly, it is still optimal to equate the message with the signal. Thus, for a small perturbation of the initial strategy, the iterative adjustment converges to the equilibrium.

Note that Proposition A5 can be extended to the case with N ≥ 3, unlike the model of Austen-Smith and Banks (1996). The difference is that, in this model, there is no exogenous requirement to apply a particular voting rule in aggregating evidence. Therefore, when c > 0 is large, the policymaker's strategy can adjust to be more conservative.
A6 Continuity under small conflict of interest
Consider an example such that the objective is given by a(b + d). By Lemma 1, the monotone strategy remains optimal since d is analogous to c. The indifference conditions are given by

$$\bar{\beta} + \rho\bar{\beta} - \tilde{\sigma} F'\!\left(\frac{\bar{\beta} - \rho\bar{\beta}}{\tilde{\sigma}}\right) = -d\left(2 + \frac{\sigma^2}{\sigma_b^2}\right)$$

$$\underline{\beta} + \rho\underline{\beta} - \tilde{\sigma} G'\!\left(\frac{\bar{\beta} - \rho\underline{\beta}}{\tilde{\sigma}}, \frac{\underline{\beta} - \rho\underline{\beta}}{\tilde{\sigma}}\right) = -d\left(2 + \frac{\sigma^2}{\sigma_b^2}\right)$$

where $F(x) = \ln[1 - \Phi(x)]$, $G(x, y) = \ln[\Phi(x) - \Phi(y)]$, and $\Phi(\cdot)$ is the cdf of the standard normal distribution. Rearranging,

$$\frac{2\sigma_b^2 + \sigma^2}{\sigma_b^2 + \sigma^2}\,\bar{\beta} = \tilde{\sigma} f(\bar{\beta}, \underline{\beta}) - d\left(2 + \frac{\sigma^2}{\sigma_b^2}\right)$$

$$\frac{2\sigma_b^2 + \sigma^2}{\sigma_b^2 + \sigma^2}\,\underline{\beta} = \tilde{\sigma} g(\bar{\beta}, \underline{\beta}) - d\left(2 + \frac{\sigma^2}{\sigma_b^2}\right)$$

where $f(\bar{\beta}, \underline{\beta}) = F'\!\left(\frac{\bar{\beta} - \rho\bar{\beta}}{\tilde{\sigma}}\right)$ and $g(\bar{\beta}, \underline{\beta}) = G'\!\left(\frac{\bar{\beta} - \rho\underline{\beta}}{\tilde{\sigma}}, \frac{\underline{\beta} - \rho\underline{\beta}}{\tilde{\sigma}}\right)$.
By the implicit function theorem for the system of equations,

$$\begin{bmatrix} \dfrac{\partial \bar{\beta}}{\partial d} \\[2mm] \dfrac{\partial \underline{\beta}}{\partial d} \end{bmatrix} = \begin{bmatrix} \tilde{\sigma}\dfrac{\partial f}{\partial \bar{\beta}} - \dfrac{2\sigma_b^2 + \sigma^2}{\sigma_b^2 + \sigma^2} & \tilde{\sigma}\dfrac{\partial f}{\partial \underline{\beta}} \\[2mm] \tilde{\sigma}\dfrac{\partial g}{\partial \bar{\beta}} & \tilde{\sigma}\dfrac{\partial g}{\partial \underline{\beta}} - \dfrac{2\sigma_b^2 + \sigma^2}{\sigma_b^2 + \sigma^2} \end{bmatrix}^{-1} \begin{bmatrix} -\left(2 + \dfrac{\sigma^2}{\sigma_b^2}\right) \\[2mm] -\left(2 + \dfrac{\sigma^2}{\sigma_b^2}\right) \end{bmatrix}$$

Since $\frac{\partial f}{\partial \underline{\beta}} < 0$ and $\frac{\partial g}{\partial \bar{\beta}} < 0$, the determinant of the matrix being inverted is strictly positive, and thus the derivatives $\frac{\partial \bar{\beta}}{\partial d}$ and $\frac{\partial \underline{\beta}}{\partial d}$ are defined everywhere. Therefore, the thresholds $\bar{\beta}, \underline{\beta}$ are continuous in d.
One can modify this approach for the preference for publication described by the utility function $ab + d \cdot 1(m \neq \emptyset)$. The preference for publication can be incorporated into the indifference conditions as the willingness to sacrifice the social welfare $b - c$ by the discrepancy d in the direction that allows for publication. The conditions are then given by

$$\bar{\beta} + \rho\bar{\beta} - \tilde{\sigma} F'\!\left(\frac{\bar{\beta} - \rho\bar{\beta}}{\tilde{\sigma}}\right) = d\left(2 + \frac{\sigma^2}{\sigma_0^2}\right)$$

$$\underline{\beta} + \rho\underline{\beta} - \tilde{\sigma} G'\!\left(\frac{\bar{\beta} - \rho\underline{\beta}}{\tilde{\sigma}}, \frac{\underline{\beta} - \rho\underline{\beta}}{\tilde{\sigma}}\right) = -d\left(2 + \frac{\sigma^2}{\sigma_0^2}\right)$$

where the only difference is the sign of d in the first condition. Therefore, the implicit function theorem still applies and the derivatives are still well-defined.
Note that a researcher's heterogeneous prior can be treated analogously. Suppose the prior of the researchers is given by $N(\hat{b}_0, \sigma_b^2)$ with $\hat{b}_0 \neq 0$, whereas the policymaker has the prior $N(0, \sigma_b^2)$. The researchers' indifference conditions are written as

$$\bar{\beta} + E\!\left[\beta_{-i} \mid \beta_{-i} \ge \underline{\beta}, \bar{\beta}_i\right] + \hat{b}_0 \frac{\sigma^2}{\sigma_0^2} = c\left(2 + \frac{\sigma^2}{\sigma_0^2}\right)$$

$$\underline{\beta} + E\!\left[\beta_{-i} \mid \bar{\beta} > \beta_{-i}, \underline{\beta}_i\right] + \hat{b}_0 \frac{\sigma^2}{\sigma_0^2} = c\left(2 + \frac{\sigma^2}{\sigma_0^2}\right)$$

In this way, one can observe that the difference in the indifference conditions due to $\hat{b}_0$ is almost entirely analogous to d, the conflict of interest.
Appendix B.
Figure B1: Monotonicity with heterogeneous G (σ)
Notes: Figure B1 plots the posterior belief of the policy benefit conditional on one's own signal and the other researcher's signal when it is pivotal: $E[b \mid \beta_i, \sigma_i, m_i \in Piv]$ for N = 2, c = 0, σb = 1/3, and heterogeneous G(σ). While only σi = 0.1, 0.5, 1 are plotted for visual ease, the posteriors for other values of σi have the same pattern. In equilibrium, the posteriors are monotone in βi even for heterogeneous G(σ), confirming that the threshold strategy is optimal.
Figure B2: Likelihood of policy implementation
Notes: Figure B2 plots the probability of policy implementation P(a = 1) for N = 2, c = 0, σb = 1/3, and various values of the standard error of the signal, σ, in the equilibrium characterized by Proposition 1. It shows that, relative to the policy implementation probability under communication of estimates, P(a = 1) = 1/2, the policy is slightly less likely to be implemented. This is primarily due to the conservative supermajoritarian voting rule $a^* = 1 \Leftrightarrow n_1 > n_0$, largely mitigated by the thresholds $\underline{\beta}, \bar{\beta}$ that lead to the upward bias of the estimates that underlie the reported studies.
Figure B3: Local stability and optimality for σ > σ
Notes: Figure B3 plots the combined best response functions that solve the following fixed point for each $\bar{\beta}_j$ for N = 2, c = 0, σb = 1/3, and various values of the standard error of the signal σ in the equilibrium characterized by Proposition 1: $\bar{\beta}_i$ solves the fixed point of the combined best response function $\bar{\beta}_i = BR_i\!\left(\bar{\beta}_j, \underline{\beta}_j\right)$, $\underline{\beta}_i = BR_i\!\left(\bar{\beta}_j, BR_j\!\left(\bar{\beta}_i, \underline{\beta}_i\right)\right)$. The condition for optimality and stability is that $\frac{\partial \bar{\beta}_i}{\partial \bar{\beta}_j}\big|_{\bar{\beta}_i = \bar{\beta}_j} > -1$, so that the strategic substitutability is not too strong; this in turn implies optimality and local contraction stability. The figure shows that this condition holds for various values of σ, from as small as 0.1σb to as large as 10σb; that is, the slope of $\bar{\beta}_i(\bar{\beta}_j)$ evaluated at the 45-degree line is less steep than the negative 45-degree line. While not plotted, the function $\bar{\beta}_i(\bar{\beta}_j)$ for σ much larger than 10σb is also not visually distinguishable from that of 10σb.
Figure B4: Optimality of supermajoritarian voting rule
Notes: Figure B4 plots the welfare (measured as a fraction of the benchmark case with communication of estimates) under the supermajoritarian rule ($a^* = 1 \Leftrightarrow n_1 > n_0$) and the submajoritarian rule ($a^* = 1 \Leftrightarrow n_1 \ge n_0$) for N = 2, σb = σ = 1, and various values of c ≥ 0. It shows that the supermajoritarian rule attains higher welfare than the submajoritarian rule for c > 0, and identical welfare for c = 0. The supermajoritarian rule is better than the submajoritarian rule especially when c is high and the relative welfare loss is thus large.
Figure B5: Approximation of empirical distribution
Notes: Figure B5 plots the imputed empirical distribution of the standard error σ against the χ² distribution with 4 degrees of freedom between 0 and 10. The distribution is imputed because the observed distribution may not represent the entire underlying distribution due to the omission of some studies. The empirical distribution is constructed from the positive significant results of Example 1 on labor unions:

$$G(\sigma) = \frac{1}{C} \sum_i \left[1 - \Phi\!\left(\frac{1.96\sigma_i - \hat{b}_0}{\sqrt{\sigma_i^2 + \hat{\sigma}_0^2}}\right)\right]^{-1} 1(\sigma \ge \sigma_i), \qquad C = \sum_i \left[1 - \Phi\!\left(\frac{1.96\sigma_i - \hat{b}_0}{\sqrt{\sigma_i^2 + \hat{\sigma}_0^2}}\right)\right]^{-1}$$

where C is the normalizing constant. The largest standard error is normalized to be 1. The figure shows that the χ² distribution with 4 degrees of freedom approximates the empirical distribution reasonably well, and it was used for the numerical illustration in Section 3.3.
Figure B6: Example of convex β (σ)
Notes: Figure B6 plots the thresholds β̲(σ) and β̄(σ) for c = 2/3, σb = 1/3, and heterogeneous G(σ). This example shows that β̲(σ) can be convex for large c > 0. This is because, when c is large, the effect of the conservative prior as expressed in (5) is also large, and it overturns the effect of σ through the weights in Bayesian updating. Even though the difference between the thresholds for N = 1 and N = 2 grows as σ increases, the convexity of the threshold for N = 1 may dominate.
Figure B7: Posterior consistency
Notes: Figure B7 plots the distribution of the posterior belief Eb(n) in the equilibrium of the supermajoritarian policy rule ($a^* = 1 \Leftrightarrow n_1 > n_0$) with σb = σ = 0 and c = 0 for N = 2, ..., 10. It shows that, when there are marginally more positive results than negative results (n1 = n0 + 1), the posterior belief is positive; conversely, when there are no more positive results than negative results (n1 = n0), the posterior belief is negative. Since the posterior beliefs are monotone in the numbers of positive and negative results, this confirms that the conjectured supermajoritarian policy rule is consistent with the belief and utility maximization of the policymaker. Note that, for N ≥ 3, an arbitrarily conjectured policy rule has no guarantee of being supported in equilibrium. While Proposition 1 characterizes the equilibrium only for N = 2, Figure B7 suggests that the result most likely extends to much larger N, as the posterior belief steadily approaches Eb(n) = 0. This consistency of the supermajoritarian policy rule holds for various values of c. Here, the analysis is restricted to the case of a degenerate distribution of σ due to computational feasibility.
Appendix C. Empirics
C1 Kolmogorov-Smirnov Test with entire data
Table C1: Summary of Kolmogorov-Smirnov-type tests with entire sample
                 Labor union effect            EIS
                 G∅          H0            G∅          H0
    p-value     0.0003      0.0800        0.0786      0.9113
    D           0.3740      0.1448        0.1933      0.0195
    D̄           0.2330      0.1606        0.1960      0.1671
    arg sup     0.0831     -0.0629        0.1594      9.1830

Notes: Table C1 summarizes the results of the KS-type tests applied to the samples not selected by whether the binary conclusions were highlighted in the papers' abstracts, introductions, or conclusions. D denotes the KS statistic with the predicted distribution as in (24) and (25). D̄ denotes the critical value of the KS statistic for rejecting the null hypothesis at the 95 percent confidence level; it is included for benchmark comparison, although the p-value cannot be computed by a comparison of KS statistics. arg sup denotes the parameter value {σ, β} at which the KS statistic is attained. The p-value is the overall p-value computed by bootstrap. The estimates use the stem-based estimates identical to the ones presented in Table 4.
In addition to Table 4, which focused on the papers with binary conclusions, Table C1 summarizes the test statistics for the entire sample to be fully transparent. First, it shows that the overall qualitative patterns remain even with the full sample: the p-value for G∅(σ) is highly statistically significant, and H0(β) is significant at the 10 percent level. Second, the overall result is nevertheless weaker than in the case that focused on the papers with binary conclusions. This is consistent with the explanation that the pattern was driven by the concern to determine yes-or-no conclusions.
C2 Funnel plot of examples
Figure C1: Funnel plot for labor union effects (Left) and EIS (Right)
Notes: Figure C1 shows the funnel plots of all data points for both the labor union effects and the EIS literature. The funnel plot for labor union effects visually shows the scarcity of null results among the imprecisely estimated studies. The KS-type test presented in the paper is a formal test of this pattern.
C3 Figures of Kolmogorov-Smirnov Test for EIS
Figure C2: Distribution of negative studies’ standard errors and coefficients for EIS
Notes: Figure C2 plots the cumulative distribution functions of the standard errors of null studies (Top) and the coefficients of negative studies (Bottom) for estimates of the EIS. The solid green line represents the observed distribution, and the dashed blue line represents the predicted distribution along with its 95 percent confidence interval estimated by bootstrap. The solid red vertical line quantifies the KS statistic for the distribution of standard errors, and the KS statistic of the hypothesis that Ĥ0 < H̃0 for the distribution of coefficients. The 95 percent confidence interval for H̃0 appears small primarily because of the scale of the axis along β.
C4 Proofs of Proposition 3
The following proofs show the limit and the monotonicity of the bias for each of the three models in turn.
Uniform selection model
Using the truncated normal distribution's formula, one can express the bias under the uniform selection model as

$$\text{Bias}(\sigma_i) \equiv E[\beta_i \mid \sigma_i, \text{study } i \text{ reported}] - b_0 = \sqrt{\sigma_0^2 + \sigma_i^2}\;\frac{\eta_1\left[\phi(\bar{\beta}) - \phi(\underline{\beta})\right] + \eta_0\left[\phi(\underline{\beta}) - \phi(\bar{\beta})\right]}{\eta_1\left[\Phi(\underline{\beta}) + 1 - \Phi(\bar{\beta})\right] + \eta_0\left[\Phi(\bar{\beta}) - \Phi(\underline{\beta})\right]}$$

where $\underline{\beta} = \frac{-t\sigma_i - b_0}{\sqrt{\sigma_0^2 + \sigma_i^2}}$ and $\bar{\beta} = \frac{t\sigma_i - b_0}{\sqrt{\sigma_0^2 + \sigma_i^2}}$ denote the standardized significance thresholds. This can be written as

$$\text{Bias}(\sigma_i) = -\left(\sigma_0^2 + \sigma_i^2\right)\frac{\partial \ln\left\{\eta_1\left[\Phi\!\left(\frac{-t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) + 1 - \Phi\!\left(\frac{t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right)\right] + \eta_0\left[\Phi\!\left(\frac{t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) - \Phi\!\left(\frac{-t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right)\right]\right\}}{\partial b_0}$$

$$= -\left(\sigma_0^2 + \sigma_i^2\right)\frac{\partial \ln\left\{\eta_1 + (\eta_1 - \eta_0)\left[\Phi\!\left(\frac{-t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) - \Phi\!\left(\frac{t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right)\right]\right\}}{\partial b_0}$$

where Φ in the final expression denotes the cdf of the standard normal distribution.
(1) Limit. By the continuity and finiteness of the limit, one can distribute the limit:

$$\lim_{\sigma_i \to 0}\text{Bias}(\sigma_i) = -\sigma_0^2 \times \frac{\partial \ln\left\{\eta_1 + (\eta_1 - \eta_0)\left[\Phi\!\left(\frac{-b_0}{\sigma_0}\right) - \Phi\!\left(\frac{-b_0}{\sigma_0}\right)\right]\right\}}{\partial b_0} = -\sigma_0^2 \times \frac{\partial \ln\{\eta_1\}}{\partial b_0} = 0$$

Therefore, $\lim_{\sigma_i \to 0}\text{Bias}(\sigma_i)^2 = 0$. Since Bias² cannot be less than 0, this is the minimum.
(2) Bias monotonicity. Denoting $Z(b_0, \sigma_i) = \Phi\!\left(\frac{t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) - \Phi\!\left(\frac{-t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right)$ for notational ease, one can rewrite the bias as

$$\text{Bias}(\sigma_i) = -\left(\sigma_0^2 + \sigma_i^2\right)\frac{\partial \ln\left[\eta_1 - (\eta_1 - \eta_0)Z\right]}{\partial Z}\,\frac{\partial Z(b_0, \sigma_i)}{\partial b_0} = \sqrt{\sigma_0^2 + \sigma_i^2}\;\frac{\partial \ln\left[\eta_1 - (\eta_1 - \eta_0)Z\right]}{\partial Z}\left[\phi\!\left(\frac{t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) - \phi\!\left(\frac{-t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right)\right]$$

It is sufficient to confirm that all three factors are increasing in σi when σi is near zero. This is clearly true for $\sqrt{\sigma_0^2 + \sigma_i^2}$. The factor $\frac{\partial \ln[\eta_1 - (\eta_1-\eta_0)Z]}{\partial Z}$ is increasing in Z since η1 > η0, and Z is increasing in σi for sufficiently small σi. By symmetry,

$$\phi\!\left(\frac{t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) - \phi\!\left(\frac{-t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) = \phi\!\left(\frac{t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) - \phi\!\left(\frac{t\sigma_i + b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right),$$

which is initially increasing in σi and decreasing in σi afterwards. To see this, notice that, as σi increases, the mean of the distribution $\frac{t\sigma_i}{\sqrt{\sigma_0^2+\sigma_i^2}}$ shifts while the truncation points $\left(\frac{b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}, -\frac{b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right)$ remain symmetric around 0. Thus, the bias initially increases. Nevertheless,

$$\lim_{\sigma_i \to \infty}\left[\phi\!\left(\frac{t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) - \phi\!\left(\frac{t\sigma_i + b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right)\right] = \phi(t) - \phi(t) = 0.$$

Since $\lim_{\sigma_i \to \infty}\frac{\partial \ln[\eta_1 - (\eta_1-\eta_0)Z]}{\partial Z}$ is finite if η0 > 0 and the divergence of $\lim_{\sigma_i \to \infty}\sqrt{\sigma_0^2 + \sigma_i^2} = \infty$ is not fast, by L'Hôpital's rule, the term converging to 0 dominates. Combining, Bias² will be decreasing towards zero for large σi. Note that Bias(σi)² could be constant, since it is 0 throughout σi if b0 = 0 by $\phi\!\left(\frac{-t\sigma_i}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) - \phi\!\left(\frac{t\sigma_i}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) = 0$.

The expression $\phi\!\left(\frac{t\sigma_i - b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right) - \phi\!\left(\frac{t\sigma_i + b_0}{\sqrt{\sigma_0^2+\sigma_i^2}}\right)$ provides some information about the value of σ, since this difference is approximately at its peak when one argument is near 0. If b0 > 0, this occurs when $\sigma_i = \frac{b_0}{t}$; that is, when σi is at the precision such that the null hypothesis test just rejects under the specified t-statistic threshold and the true value of b0. Since the other terms will still be increasing, this shows that when b0 is high, the bias monotonicity holds for the relevant range. Conversely, if b0 were low, σ could also be very low, so that the bias monotonicity may not hold for most of the observed values of σi. Nevertheless, this is unlikely to be an important concern for two reasons:

1. When b0 is low, the potential bias itself is very low. Therefore, it is highly unlikely that the rejection of the null hypothesis can be driven solely by the bias itself.

2. The pilot value of b0 is specified to be the one closest to 0 among all possible values of n. Therefore, if the bias indeed decreases as n becomes larger, the mean estimate that includes those with the lowest bias will be included.
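The limit behavior can be illustrated numerically. The sketch below (all parameter values are illustrative assumptions, not estimates from the paper) computes the reported-study mean by direct numerical integration under the uniform selection model and confirms that the bias vanishes as σi approaches 0 while being sizable at moderate σi:

```python
import math

def npdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bias_uniform(si, b0=0.5, s0=1.0, t=1.96, eta1=1.0, eta0=0.1):
    # E[beta_hat | reported] - b0 when beta_hat ~ N(b0, s0^2 + si^2) and a
    # study is reported w.p. eta1 if |beta_hat| > t * si and eta0 otherwise
    s = math.sqrt(s0**2 + si**2)
    n = 20001
    lo, hi = b0 - 8.0 * s, b0 + 8.0 * s
    step = (hi - lo) / (n - 1)
    num = den = 0.0
    for k in range(n):
        x = lo + k * step
        w = eta1 if abs(x) > t * si else eta0
        f = w * npdf((x - b0) / s) / s
        num += x * f
        den += f
    return num / den - b0

tiny, moderate = bias_uniform(0.01), bias_uniform(0.3)
print(tiny, moderate)
```

With these illustrative values, the bias is close to zero for a very precise study and clearly positive for an imprecise one, consistent with the limit result above.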
Gradual selection model
Consider the gradual selection model whose selection probability is given by the probit model. This is analogous to the Heckman selection model with a joint normality assumption on the error terms from the first and second stages. The publication probability in the text can be written as

$$P(m_i \neq \emptyset) = P\left(\delta_0 + \delta_1 \frac{1}{\sigma_i} + \varepsilon_i > 0\right)$$

where $\varepsilon_i \sim N(\delta_2 \beta_i, 1)$. Then, the bias can be written as

$$\text{Bias}(\sigma_i) \equiv E[\beta_i \mid \sigma_i, \text{study } i \text{ reported}] - b_0 = \delta_2 \sqrt{\sigma_0^2 + \sigma_i^2}\;\frac{\phi\!\left(\delta_0 + \delta_1 \frac{1}{\sigma_i}\right)}{\Phi\!\left(\delta_0 + \delta_1 \frac{1}{\sigma_i}\right)}$$

(1) Limit. By continuity and finiteness of the limit, and under the model's assumption δ1 > 0,

$$\lim_{\sigma_i \to 0}\text{Bias}(\sigma_i) = \delta_2 \lim_{\sigma_i \to 0}\sqrt{\sigma_0^2 + \sigma_i^2}\;\lim_{\sigma_i \to 0}\frac{\phi\!\left(\delta_0 + \delta_1 \frac{1}{\sigma_i}\right)}{\Phi\!\left(\delta_0 + \delta_1 \frac{1}{\sigma_i}\right)} = 0$$

(2) Bias monotonicity. Note that the inverse Mills ratio $\frac{\phi(\cdot)}{\Phi(\cdot)}$ is a decreasing function of its argument, so the term $\frac{\phi\left(\delta_0 + \delta_1 \frac{1}{\sigma_i}\right)}{\Phi\left(\delta_0 + \delta_1 \frac{1}{\sigma_i}\right)}$ increases as σi increases. Since $\sqrt{\sigma_0^2 + \sigma_i^2}$ is also increasing in σi, Bias(σi)² is increasing in σi.
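The two monotonicity ingredients, the decreasing inverse Mills ratio and the increasing $\sqrt{\sigma_0^2 + \sigma_i^2}$ factor, can be checked numerically. In the sketch below, the δ values and σ0 are illustrative assumptions:

```python
import math

def npdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def ncdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mills(x):
    # inverse Mills ratio phi(x) / Phi(x), a decreasing function of x
    return npdf(x) / ncdf(x)

def bias_gradual(si, d0=-1.0, d1=1.0, d2=0.5, s0=1.0):
    # Bias(si) = d2 * sqrt(s0^2 + si^2) * mills(d0 + d1 / si)
    return d2 * math.sqrt(s0**2 + si**2) * mills(d0 + d1 / si)

vals = [bias_gradual(si) for si in (0.05, 0.5, 1.0, 2.0)]
print(vals)
```

The bias is essentially zero for very precise studies (the Mills ratio at a large argument vanishes) and grows monotonically with σi, as the proof claims.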
Extremum selection model
Consider the extremum selection model in which, for expositional simplicity, the mean is normalized to b0 = 0. The bias can be treated analogously for b0 ≠ 0.

(1) Consider βmin ≥ 0. Note that the monotone likelihood ratio property (MLRP) is satisfied for the normal distribution: for any σ > σ′ and for any β > β′ > βmin,

$$\frac{f(\beta \mid \sigma)}{f(\beta \mid \sigma')} > \frac{f(\beta' \mid \sigma)}{f(\beta' \mid \sigma')}$$

because

$$\exp\left(-\frac{\beta^2 - \beta'^2}{2\left(\sigma_0^2 + \sigma_i^2\right)}\right) > \exp\left(-\frac{\beta^2 - \beta'^2}{2\left(\sigma_0^2 + \sigma_i'^2\right)}\right)$$

By the MLRP,

$$E[\beta \mid \sigma_i, \beta \ge \beta_{min}] > E[\beta \mid \sigma_i', \beta \ge \beta_{min}]$$

Thus, the bias is increasing in σi.

(2) Suppose βmin < 0. Then, the bias is

$$\text{Bias}(\sigma_i) \equiv E[\beta_i \mid \sigma_i, \text{study } i \text{ reported}] = \sqrt{\sigma_0^2 + \sigma_i^2}\;\frac{\phi\!\left(\frac{\beta_{min}}{\sqrt{\sigma_0^2 + \sigma_i^2}}\right)}{1 - \Phi\!\left(\frac{\beta_{min}}{\sqrt{\sigma_0^2 + \sigma_i^2}}\right)}$$

Both the limit and the bias monotonicity come from the observations that $\sqrt{\sigma_0^2 + \sigma_i^2}$ is increasing in σi and that the Mills ratio $\frac{\phi(\cdot)}{1 - \Phi(\cdot)}$ is an increasing function of its argument, which here rises toward 0 as σi grows. Therefore, the bias is increasing in σi. Since the bias is increasing everywhere on σi ≥ 0, it attains its minimum at the limit σi = 0.
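A quick numerical check of case (2), with βmin and the other values as illustrative assumptions, confirms that the closed form is increasing in σi and matches direct integration of the truncated normal mean:

```python
import math

def npdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def ncdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bias_extremum(si, bmin=-0.5, s0=1.0):
    # E[beta | beta >= bmin] for beta ~ N(0, s0^2 + si^2), via the closed
    # form s * phi(a) / (1 - Phi(a)) with a = bmin / s; the factor s and the
    # normal hazard phi / (1 - Phi) both rise as si grows
    s = math.sqrt(s0**2 + si**2)
    a = bmin / s
    return s * npdf(a) / (1.0 - ncdf(a))

vals = [bias_extremum(si) for si in (0.0, 0.5, 1.0, 2.0)]

# cross-check the closed form at si = 1 by direct numerical integration
s = math.sqrt(2.0)
n = 200001
lo, hi = -0.5, 8.0 * s
step = (hi - lo) / (n - 1)
num = den = 0.0
for k in range(n):
    x = lo + k * step
    f = npdf(x / s) / s
    num += x * f
    den += f
direct = num / den
print(vals, direct)
```

The direct integral and the closed form agree to numerical tolerance, and the bias is monotonically increasing in σi, with its minimum at σi = 0.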
C5 Mean Squared Errors
Figure C3: Bias-variance trade-off for labor union effects (top) and EIS (bottom)
Notes: Figure C3 plots how the mean squared error of the stem estimate changes as more studies are included. The mean squared error is decomposed into a bias term (Bias²), which is increasing in the number of included studies, and a variance term, which is decreasing in the number of included studies. Combined, the MSE is approximately convex in both datasets.
Appendix References
Esponda, Ignacio, and Demian Pouzo. 2012. "Learning Foundation and Equilibrium Selection in Voting Environments with Private Information." University of California, Berkeley mimeo.

Fisher, Franklin M. 1961. "The Stability of the Cournot Oligopoly Solution: The Effects of Speeds of Adjustment and Increasing Marginal Costs." The Review of Economic Studies 28 (2): 125-35. doi:10.2307/2295710.

Fudenberg, Drew, and David K. Levine. 1998. The Theory of Learning in Games. MIT Press.

Theocharis, R. D. 1960. "On the Stability of the Cournot Solution on the Oligopoly Problem." Review of Economic Studies 27 (2): 133-34.