Unbiased Publication Bias: Theory and Evidence

Chishio Furukawa*

January 2017

Abstract

Publication bias compromises the truthfulness of scientific evidence that informs policy decisions. While it is conventionally attributed to the research community's bias, this paper shows that, when the policymaker only counts positive vs negative results of null hypothesis testing to decide whether to implement policies, even unbiased and rational researchers' reports will have such bias in the socially optimal communication equilibrium. The information asymmetry among researchers and the coarseness of conclusions explain the omission of intermediate estimates such that reported estimates have an upward bias. The contrasting use of standard errors between Bayesian updating and null hypothesis testing explains the inflation of some marginally insignificant results. Empirically, the model makes a unique testable prediction: omission occurs among imprecisely estimated coefficients near zero. Examples show that these patterns apply to, and only to, literatures aggregated through vote counting. Practically, the paper proposes a simple and conservative bias correction method that is robust under various models of publication bias commonly suggested.

* [email protected]. Department of Economics, Massachusetts Institute of Technology. I am indebted to Abhijit Banerjee for his guidance. I gained knowledge regarding publication bias from Toshi A. Furukawa and the workshop at Berkeley Initiative for Transparency in Social Science in June 2015. I thank Harry Di Pei, Arda Gitmetz, Alan Olivi, and Michael B. Wong for advice on theory; Tetsuya Kaji, Yuya Sasaki, and Isaiah Andrews on econometrics; and Esther Duflo, Daniel Prinz, Daniel Waldinger, Alonso Bucarey, Mahvish Shaukat, Neil Thakral, and seminar participants at the MIT Political Economy lunch for encouragement. All errors are mine.

1 Introduction

Informed decisions on socioeconomic, medical, educational, and environmental policies require a careful synthesis of a scientific literature full of subtle details and disagreements. In practice, however, an influential summary of 12,000 published articles on climate change that reached President Obama ran: "97% of climate papers stating a position on human-caused global warming agree that global warming is happening – and we are the cause" (The Consensus Project, 2014). This review was apparently a successful communication strategy for aggregating contradictory evidence for citizens and policymakers even though it only referred to the abstract of each study and did not assess credibility or reflect the details of the estimates.

However transparent and effective, meta-analysts today would consider this approach unsatisfactory for two reasons. First, publication bias due to omission of negative results and inflation of marginal results is widely documented in many areas of scientific research. As positive results are more likely to be published and disseminated than negative results,[1] aggregation without proper consideration for selective reporting can be misleading. Second, vote counting of positive vs negative results is so coarse that the conclusion can often be overturned by formal aggregation of coefficients weighted by their precision and methodological reliability.
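To make the contrast between the two aggregation schemes concrete, the following minimal sketch (with made-up numbers, not data from any study discussed in this paper) shows how a vote count of positive vs negative results and an inverse-variance-weighted average can reach opposite conclusions.

```python
# Toy illustration (hypothetical numbers): two imprecise positive estimates can
# outvote one precise negative estimate, while inverse-variance weighting -- the
# formal aggregation referred to above -- reaches the opposite sign.
import numpy as np

coefs = np.array([0.8, 0.6, -0.3])   # hypothetical study estimates
ses   = np.array([0.5, 0.5, 0.1])    # hypothetical standard errors

vote_count = np.sign(coefs).sum()                  # vote counting: +1 here
weights = 1.0 / ses**2                             # precision (inverse-variance) weights
pooled = np.sum(weights * coefs) / np.sum(weights) # precision-weighted average

print(f"vote count: {vote_count:+.0f}, pooled estimate: {pooled:+.3f}")
# vote count: +1, pooled estimate: -0.226 -- the sign flips under precision weighting
```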
This paper shows, contrary to this common view, that the first deficiency of publication bias counter-balances the second inefficiency of vote counting; that is, while publication bias is often seen as being caused by interested researchers, it could be observed with fully disinterested researchers who face the difficulty of simplifying nuanced results into yes-or-no conclusions. Specifically, publication bias is the systematic gap between the published literature and the subject it studies due to selective reporting across and within studies. Across studies, omission of imprecise negative results collectively generates an upward bias in the reported estimates. Within published studies, inflation that turns marginally negative results into positive results exaggerates the reported estimates. This paper will demonstrate that coarse aggregation, information asymmetry, and Bayes' rule can explain both omission and inflation. Further, because the model makes a distinct testable prediction regarding the publication selection process, it suggests a new, simple bias correction method that is conservative and robust under commonly proposed models of publication bias.

The model's first result is that unbiased researchers omit results with estimates of moderate magnitude in such a way that the equilibrium has an upward bias in the average underlying estimates of reported studies. This is driven by a combination of information asymmetry among researchers, binary actions, and a message space that is coarser than the signal space. Let us consider an example with two social-welfare-maximizing researchers, who receive βi, an unbiased and noisy estimate of the policy benefit. If their signals can be directly communicated, then by Bayes' rule, it is optimal to implement the policy if βi + β−i ≥ 0 under a benchmark condition.[2]

[1] By negative results, this paper will refer to any non-positive results, including both statistically insignificant results and statistically significant negative results.

If the policymaker only counts the binary conclusions and implements the policy if there are more positive results than negative results (Table 1), then a researcher draws a positive conclusion if his signal is sufficiently high (βi ≥ β̄), draws a negative conclusion if his signal is sufficiently low (βi < β̲), or does not report the study if his signal has an intermediate value (β̄ > βi ≥ β̲).

Table 1. Policy decision under vote counting

Since each researcher does not know what other studies will observe and report, he conditions his reporting decision on the state in which his own report will be pivotal and swing the policy decision in the direction of its conclusion. A positive result switches the policymaker from canceling to implementing the policy only when another researcher did not report his result because his signal had an intermediate value. Thus, the optimal threshold β̄ satisfies the indifference condition that imposes the social-welfare-maximizing rule, βi + β−i = 0, in expectation conditional on pivotality:

β̄ + E[β−i | −i does not report, βi = β̄] = 0    (1)

In contrast, a negative result leads the policymaker to cancel rather than to implement the policy only when the other researcher's report is positive.
Thus, the optimal threshold β̲ satisfies:

β̲ + E[β−i | −i reports a positive result, βi = β̲] = 0    (2)

Because the binary conclusion cannot convey the strength and can only communicate the sign of the continuous signal each researcher receives, omission of intermediate results can convey information better than always reporting either positive or negative results; omission is analogous to adding another message that the result was neither positive nor negative. In addition, by combining conditions (1) and (2), researchers are more cautious about reporting negative results than positive results because negative results change the policy decision only when another researcher receives a very large signal (β−i ≥ β̄), whereas positive results change it only when another researcher has a signal of moderate magnitude (β̄ > β−i ≥ β̲). By this asymmetry of pivotality conditions, given that results are reported, the coefficients are on average upward biased: E[βi | i reports either a positive or negative result] > E[βi]. This asymmetry between the willingness to report positive vs negative results is a key feature of the stable and optimal equilibrium: while equilibria without omission or with symmetric willingness to report the two results also exist, they are locally unstable and socially suboptimal.

[2] Suppose each signal βi is normally distributed with identical standard errors, and the two researchers' signals are independent conditional on the true policy effect. Moreover, suppose the cost of implementing the policy is zero, the objective is risk-neutral, and their common prior is also normally distributed with mean zero. This set-up is public information. This brief overview will focus on an equilibrium in which the thresholds β̄, β̲ are the same for the two researchers, and the main analysis will validate this assumption.

The second result is numerical and shows that, when a constant p-value is used to determine the binary conclusion across studies with different precisions, the social-welfare-maximizing researchers may inflate some marginally insignificant results to still draw positive conclusions. This is due to the contrast in the use of standard errors between Bayes' rule and null hypothesis testing: whereas Bayesian updating divides each coefficient by its variance because it is an aggregation with weights proportional to sample size n, the t-statistic divides the coefficient estimate by its standard error because it is a normalization for convergence that occurs at rate √n. Whereas the p-value focuses on how unlikely a given observation is under the null hypothesis of zero effect, the optimal critical level given the coarse aggregation by policymakers focuses on how well binary conclusions can approximate the Bayesian updating formula. Thus, rational Bayesian researchers adopt a threshold of coefficient magnitude that is strictly convex in its standard error to determine whether to draw a positive conclusion, and therefore may report a positive conclusion even if the null hypothesis test does not reject at a given conventional threshold.

Numerical illustrations demonstrate that these results apply to general environments with various parameter values. Quantitatively, the upward bias can be as large as one standard deviation of the underlying distribution of policy effectiveness at least among the most imprecise studies. Further, the omission rate can be as large as 40 percent on average, and 70-80 percent among the most imprecise studies.
When the imprecise studies are reported, moreover, they are positive results 90 percent of the time because the first result of biased omission and the second result of non-linear thresholds reinforce each other; imprecise negative results are disproportionately less likely to be reported than positive results. Whereas full reporting of null hypothesis testing implies that coarse aggregation leads to welfare losses as large as 30 percent of the full information benchmark, reporting with publication bias can roughly halve the losses. Finally, the strategic substitutability in the indifference conditions (1) and (2) shows that coarse communication amplifies small differences in belief or interest among researchers to generate large discrepancies in their decision rules to report yes-or-no conclusions. While the allegiance bias, the association between authors and conclusions, is often attributed to polarized views of researchers, the model suggests that their underlying disagreements may be much smaller than the observed differences in their reporting patterns may suggest. As a positive theory, the voting model of publication bias provides empirical predictions on patterns of omission that are distinct from models of publication bias commonly used for bias correction: omission occurs among the imprecisely estimated coefficients of intermediate magnitude such that there is an upward bias overall. With two publicly available meta-analysis datasets from economics, a comparison of observed vs predicted distributions of standard errors 3 and coefficients can test this hypothesis. The example with vote counting, the effect of labor unions on productivity, supports the prediction: negative results tend to be more precisely estimated than positive results and to have coefficients that are large in absolute value. On the other hand, the literature on the elasticity of intertemporal substitutions (EIS) that is unlikely to be summarized by vote-counting does not support the prediction: negative results are less likely to have negative point estimates and instead have bunching at the coefficient zero because of the theoretical restriction of concave utility. These examples demonstrate how coarse communication can explain patterns of meta-analysis data that other models cannot. Assumptions on the publication selection process are fundamental for the choice of bias correction methods in meta-analyses, and the model suggests that the existing methods are likely to suffer from mis-specification by imposing a uniform or monotone selection process. As an alternative, this paper proposes a new, simple, and conservative bias correction method that is robust in addressing all of the commonly proposed selection processes. While the communication model and other existing models make different predictions on the publication probability of different coefficient magnitudes, they all make a shared prediction that precisely estimated studies are less subject to publication bias than imprecisely estimated ones. Therefore, one can obtain the efficient estimate from meta-analysis datasets by using only estimated studies that are sufficiently precise, where the inclusion criteria of less precise studies can be determined by the bias-variance trade-off. This proposed correction methodology, called stem-based bias correction because it uses the “stem” of the “funnel plot,” can be applied to over 4,000 meta-analyses conducted every year that carefully synthesize contradictory pieces of evidence. 
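The following sketch illustrates the stem-based idea in simplified form: pool only the most precise studies, and let a rough bias-variance criterion pick how far down the "stem" to go. The specific criterion used here (drift away from the most precise study as a bias proxy) is an assumption for illustration, not the paper's exact procedure.

```python
# Minimal sketch of a "stem-based" correction in the spirit described above:
# pool only studies whose standard errors are small enough, choosing the cutoff
# by an illustrative bias-variance criterion (an assumption, not the paper's rule).
import numpy as np

def stem_estimate(coefs, ses):
    order = np.argsort(ses)                     # most precise studies first
    coefs, ses = coefs[order], ses[order]
    best = None
    for k in range(1, len(coefs) + 1):          # candidate "stem": k most precise studies
        w = 1.0 / ses[:k] ** 2
        est = np.sum(w * coefs[:k]) / np.sum(w) # precision-weighted pooled estimate
        var = 1.0 / np.sum(w)                   # variance of the pooled estimate
        bias_proxy = abs(est - coefs[0])        # drift away from the most precise study
        mse_proxy = bias_proxy ** 2 + var
        if best is None or mse_proxy < best[0]:
            best = (mse_proxy, est, k)
    return best[1], best[2]                     # estimate and number of studies kept
```

Applied to a funnel plot, such a rule keeps only the "stem" of precise studies and discards the flared base, where all commonly proposed selection models agree that publication bias operates most strongly.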
As a normative theory, the model provides an understanding of the publication selection process that is not dismissive of researchers' intent and rationality, by clarifying that efficient aggregation under coarse communication is fundamentally incompatible with unbiasedness of the estimates that underlie reported studies. To explain publication bias, the literature has assumed that the research community is either guided by its agendas, concerned primarily with careers, or driven by its predisposition for significant results. Such a widespread interpretation suggests that corrective and compulsory policies are socially desirable because consumers of knowledge may be misled or may no longer trust scientific evidence. In contrast, this model suggests that observed patterns of publication bias do not imply interests and irrationality among researchers; conversely, the model predicts some degree of publication bias even from unbiased researchers. Given the difficulties of scientific communication, this paper suggests that readers hold on to their willingness to learn from the literature even when there is evidence of publication bias. Moreover, researchers need not take the publication bias detected in meta-analyses as a warning that calls for full reporting of all evidence, because efficient communication within single papers is an objective distinct from the unbiasedness of estimates in systematic reviews.

The paper is organized as follows: Section 2 reviews the evidence on publication bias; Section 3 presents the model with analytical and numerical results; Section 4 presents the empirical tests and proposes a new correction method; Section 5 discusses the model's key assumptions, relation to the literature, and policy implications; and Section 6 concludes.

2 Motivating Evidence

Before presenting the main analyses, this section reviews evidence of publication bias to clarify what data the model aims to explain, with reference to literature from various fields.

Figure 1: Funnel plot (Left) and histogram of t-statistics (Right)

Notes: The funnel plot (Left) is a plot of the coefficients and standard errors of the main estimates from each study. The figure plots reported studies as filled dots and omitted studies as empty circles. In the presence of publication bias, there is a positive correlation between the coefficients and standard errors of reported studies, as depicted by the red line. The right figure is a histogram of reported studies' t-statistics, where the gray line describes the smooth underlying density. With publication bias, there will be a mass of studies right above the conventional significance level. The figures are generated by the author for illustrative purposes and do not reflect actual data.

2.1 Omission

Consider a set of independent studies with estimated coefficients and standard errors {βi, σi} that measure the identical average treatment benefit b, so that Eβi = b. If there were no small study effects or finite sample bias, so that the treatment effect is not affected by sample size, then in the absence of selective reporting the coefficient estimate βi is uncorrelated with its standard error σi. Moreover, if the underlying effect has a normal distribution, then there is an abundance of intermediate coefficient values with only a few extreme values. A number of meta-analysis datasets do not support these two predictions and suggest the omission of some studies, as shown in Figure 1 Left; the simulation sketch below illustrates how such selective omission can generate these patterns.
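This is a minimal simulation, with assumed parameter values and an assumed reporting rule (neither taken from the paper or any dataset), of the mechanism behind Figure 1 Left.

```python
# Simulation sketch (illustrative parameters): selectively omitting imprecise,
# intermediate results induces a correlation between reported coefficients and
# standard errors and hollows out the middle of the reported distribution.
import numpy as np

rng = np.random.default_rng(0)
b, n = 0.1, 5000                               # assumed true effect and number of studies
se = rng.uniform(0.05, 1.0, n)                 # study standard errors
beta = rng.normal(b, se)                       # unbiased estimates

# assumed reporting rule: precise studies are always reported; imprecise studies
# are reported only when clearly positive or very clearly negative
reported = (se < 0.3) | (beta > 1.2 * se) | (beta < -2.0 * se)

corr_all = np.corrcoef(beta, se)[0, 1]
corr_rep = np.corrcoef(beta[reported], se[reported])[0, 1]
print(f"corr(beta, se): all studies {corr_all:.2f}, reported only {corr_rep:.2f}")
```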
First, there is often a positive correlation between the coefficient magnitude and the standard error; on average, imprecisely estimated studies have higher coefficient values than precise studies. This could be due to researchers 5 omitting studies unless they are positive statistically significant. If the study is imprecise, then a large coefficient magnitude is needed whereas if the study is precise, then coefficient magnitude can be modest (Hedges, 1992). Alternatively, this could also be due to researchers omitting studies when the coefficient values are low (Duval and Tweedie, 2000; Copas and Li, 1997). There are two formal tests that examine the presence of publication bias through examining the correlation between coefficient magnitude and study precision: an ordinal test that examines their rank correlations (Begg and Mezuemder, 1994) and a cardinal test that examines the correlation by regression (Egger et al., 1997). Second, there is occasionally excess variance in the estimates with an abundance of studies at the extreme values beyond significance thresholds and a paucity of studies with intermediate coefficients (Stanley, 2005). While not universal, it is common to find such patterns of omission in meta-analyses. In economics, important estimates such as the impact of minimum wage on employment (Card and Krueger, 1995), the return to schooling (Ashenfelter et al., 1999), and the intertemporal elasticity of substitution (Havránek, 2015), have evidence of a positive correlation. In medicine, the effect of anti-depressant drugs was exaggerated by about 30 percent due to omission of negative results (Turner et al., 2008), and this received much attention through a New York Times article (Carey, 2008). In environmental studies, estimates of the social cost of carbon, a key statistic for carbon tax policy, were shown to have the bias (Havránek et al., 2015). 2.2 Inflation When the originally intended specification has a statistically insignificant result, researchers may inflate the statistical significance through the choice of specifications for outcome and control variables rather than omit the entire study (Leamer, 1978). If increasing the coefficient requires search costs and it is sufficient to have statistically significant results, then there will be an excess mass right above the statistical milestones, as depicted in Figure 1 Right. Although empirically distinguishing inflation from omission is challenging, some evidence suggest that omission alone cannot explain data, and thus inflation must exist. In economics, under some parametric assumptions on the underlying distribution and a form of selection process as well as monotonicity of publication probability across t-statistics, Brodeur et al. (2016) shows that about 8% of results reported as statistically significant may be due to inflation. In sociology and political science, the Caliper test that examines excess mass right about the cut-off suggests that some inflation may exist (Gerber and Malhotra, 2008). In psychology with lab experiments such that sample size can be adjusted subsequently, Simonsohn et al. (2014) reported density of p-values among the statistically significant tests were increasing in p-values and interpreted this as evidence of inflation. In medicine where original specifications are available for registered protocols from the Food and Drug Administration (FDA), Turner et al. 
(2008) suggests that 12 out of 36 studies on anti-depressants whose original conclusions were negative were published as positive in refereed journals.

3 Model

3.1 Set-up

Consider a static communication model between N senders, called researchers, and 1 receiver, called a policymaker. Researchers receive private unbiased signals of the treatment effect of the policy, βi = b + εi with εi ∼ N(0, σi²), which are independent conditional on the true treatment effect and whose standard errors σi > 0 are independently distributed according to some common distribution G(σ). They send the message of a positive result, mi = 1, or a negative result, mi = 0, or do not report their study, mi = ∅. The policymaker observes the number of positive results, n1 = Σ_{i=1}^{N} 1(mi = 1), and negative results, n0 = Σ_{i=1}^{N} 1(mi = 0), and decides whether to implement the policy, a ∈ {0, 1}. Both the researchers and the policymaker maximize the social welfare from the policy-implementation decision:

a (Eb − c),    (3)

where b is the true treatment effect and c is the cost of policy implementation.[3] They have a common prior over b that is distributed according to the normal distribution with mean 0 and standard deviation σb. Assume c ≥ 0 so that the policymaker will not implement the policy under her prior belief.[4]

First, researchers receive their own signals and simultaneously decide whether to publish their binary conclusions. Then, the policymaker sees the reports and makes her policy decision using her posterior. Finally, the pay-offs are realized. The number of players, their pay-offs, priors, and the cost of policy implementation c are public information.

3.1.1 Equilibrium

This paper focuses on the Perfect Bayesian Nash Equilibrium, which is a set of strategies that are best responses to one another and beliefs that are consistent with Bayes' rule.[5] The strategy of the policymaker, π(n) ≡ P(a = 1 | n), is a mapping from the numbers of positive and negative results, n ≡ {n1, n0}, to a probability distribution over the binary policy action a. The strategy of a researcher is a mapping from his own signal βi and σi to probabilities over his message mi ∈ {∅, 0, 1}.[6]

[3] While this set-up may appear to suppose no uncertainty in welfare when the policy is not implemented, since that welfare is 0, it encompasses such a setting: suppose the welfare under the policy is u1 ∼ N(ū1, σu1²) and the welfare in the absence of the policy is u0 ∼ N(ū0, σu0²). Then, it is optimal to implement the policy if and only if E[u1 − u0] ≥ c. Thus, defining b ≡ u1 − u0 and σb² ≡ σu1² + σu0² above will be an equivalent set-up.

[4] Eb = 0 is a normalization since only Eb − c affects the policy implementation decision. Therefore, the prior belief over b will be incorporated into c in the discussions. The model allows Eb > 0 and c > 0, and the results on bias and omission remain unchanged.

[5] For expositional ease, it also assumes that beliefs over off-path realizations are consistent with utility maximization.

[6] For expositional simplicity, note that the messages m ≡ {mi}_{i=1,...,N} are anonymous in that the policymaker does not observe who reported each result. Without loss of generality, the analysis restricts attention to the case in which the meaning of the message is consistent across all senders.

Communication models always have multiple equilibria, including completely uninformative babbling equilibria. To make meaningful empirical predictions, the analysis takes two steps of refining the equilibrium.
First, to restrict attention to equilibria that avoid implausible coordination failures, it focuses on equilibria that are fully responsive and fully informative. An equilibrium is fully responsive if π(n) is not constant across all values of n1 for some n0 and across all values of n0 for some n1. An equilibrium is fully informative if E[b | n] is not constant analogously.[7] Second, to select among multiple equilibria that are responsive and informative, it focuses on equilibria that are optimal and locally stable. An equilibrium is optimal if no other equilibrium can attain strictly higher ex ante welfare. An equilibrium is locally stable if, for every neighborhood of the equilibrium, there exists a sub-neighborhood of the equilibrium from which the myopic and iterative adjustment of strategies stays in that neighborhood. It is standard to focus on the most informative equilibrium in sender-receiver games, and on the optimal equilibrium in common interest games. Local stability assures robustness to small deviations from the equilibrium strategies.

[7] Note that this definition differs from usual responsiveness and informativeness in that it requires both m = 1, 0 to be informative and influential. For example, the policy rule a* = 1 ⇔ n1 = 2 is not fully responsive in that m = 0 never alters the decision.

3.2 Analytical characterizations with homogeneous precision

This subsection will characterize the equilibrium under the assumption that N = 2 and G(σ) is a degenerate distribution at a particular σ. This set-up[8] can illustrate the main mechanism behind the optimality of omission that generates bias in the signals underlying reports with m ≠ ∅.

[8] Given that meta-analyses often include more than 2 studies, this set-up with N = 2 may appear restrictive. However, this simple setting is common in the committee decision-making literature to which this paper is most closely related, such as Gilligan and Krehbiel, 1989; Austen-Smith, 1993; Krishna and Morgan, 2001; and Hao, Rosen, Suen, 2001. Analytical characterization for N ≥ 3 is infeasible due to technical difficulties in analytically evaluating the truncated mean of a multivariate normal, a necessary step to ensure the consistency of the posterior. Thus, the next subsection will demonstrate numerically that the main mechanism holds for N ≥ 3.

3.2.1 Communication with estimates

It is helpful to first characterize the full information benchmark of directly communicating βi rather than the restricted messages. Suppose the policymaker can update her expectation over the treatment effect directly using the signals available to researchers. Using Bayes' rule, her posterior belief over b is

E[b | β1, ..., βN] = ( (1/σ²) Σ_{i=1}^{N} βi ) / ( 1/σb² + N/σ² )    (4)

Thus, it is ex ante efficient to implement the policy (a = 1) if (with strict inequality) and only if

(1/N) Σ_{i=1}^{N} βi ≥ c ( 1 + σ²/(N σb²) )    (5)

That is, the policy should be implemented if the average signal is larger than the threshold plus the off-setting effect from the conservative prior.

3.2.2 Communication with coarse messages

Under coarse messages, Bayes' rule suggests that plausible equilibria will have monotone strategies for both the researchers and the policymaker.

Lemma 1 (Monotonicity).
For N = 2 and for any c, any fully responsive and fully informative equilibrium has strategies that are monotone:

(i) each researcher takes a threshold strategy: there exist β̲i, β̄i ∈ (−∞, ∞) such that

m*i = 1 if βi ≥ β̄i,  ∅ if βi ∈ [β̲i, β̄i),  0 if βi < β̲i

(ii) the policymaker's decision π*(n) is increasing in the number of positive results (n1) and weakly decreasing in the number of negative results (n0) in the following sense. For any n1, n0, and k,

π*(n1, n0) > 0 ⇒ π*(n1 + k, n0) = 1
π*(n1, n0) < 1 ⇒ π*(n1, n0 + k) = 0

Proof. Appendix A1.

The monotonicity restriction is due to Bayes' rule (4), combined with the law of iterated expectations and the monotonicity of the mean of a truncated normal distribution with respect to the mean of the underlying distribution. At the threshold, the following indifference conditions[9] must be satisfied:

E[b | βi, m−i ∈ Piv] = (1/σ²) (βi + E[β−i | m−i ∈ Piv, βi]) / ( 1/σb² + 2/σ² ) = c,    (6)

where Piv denotes the messages of the other researcher such that changing one's own message is consequential for the policymaker's decision.[10] Given the researchers' monotone strategies with respect to their own signals βi, the policymaker also takes monotone strategies.

[9] These indifference conditions are equivalent to the first-order conditions of the social welfare function with respect to the threshold choice. Because of the common interest assumption, it is an unconstrained optimization problem with no incentive constraints.

[10] It may appear at first glance that this monotonicity condition naturally extends to the general set-up with heterogeneous σ, G(σ), by the law of iterated expectations. However, the counterexample in Appendix A2 shows that, for some opponent strategies, it need not hold. Therefore, characterization of the equilibrium for general G(σ) requires some assumptions on the endogenous decisions of other researchers' strategies that are difficult to evaluate analytically. As an alternative, this paper will analyze the equilibrium numerically and verify the monotonicity as shown in Appendix Figure B1.

Using Lemma 1, Proposition 1 will first demonstrate that, even though the set-up is entirely symmetric given c = 0, a stable and optimal equilibrium has omission that generates upward bias in the underlying estimates of reported studies. To highlight the contrast, Proposition 2 will then show that responsive and informative equilibria with either no or symmetric omission also exist, but that they are sub-optimal and unstable.

Figure 2: Equilibrium thresholds and policy decisions for N = 2, c = 0

Notes: Panel 1 plots the benchmark first-best policy implementation rule (β1 + β2 ≥ 0) as given by Bayes' rule and an example of the bivariate normal distribution of signal realizations {β1, β2}. Panels 2, 3, and 4 illustrate the equilibrium thresholds and policy decisions under three responsive and informative equilibria. The dotted line shows the thresholds for each equilibrium, and the policy is implemented if {β1, β2} is in the region northwest of the solid line in Panels 1 and 2. For Panel 3, the policy is implemented with probability 1/2 in the region surrounded by the dotted line. "False negative" ("False positive") regions denote signal realizations such that the policy is (is not) implemented under full information but is not (is) implemented in equilibrium. The figures' origin is {0, 0}.

Proposition 1 (Stable and optimal equilibrium with bias). Let N = 2 and c = 0. (i) There exists an equilibrium with the following strategies.
The policymaker's strategy is a supermajoritarian policy decision rule:

π* = 1 if n1 > n0,  0 if n1 ≤ n0    (7)

The researchers' strategies are common and characterized by thresholds that satisfy

β̄ ≥ 0 > β̲    (8)

β̲ < −β̄    (9)

and thus generate an upward bias in the estimates of the reported studies:

E[βi | mi ≠ ∅] > 0    (10)

Proof. Suppose that the policymaker follows the supermajoritarian rule as in (7). Assuming that the two researchers adopt common thresholds, the thresholds satisfy the indifference condition (6) in expectation conditional on pivotality, because each researcher does not know the signal of the other researcher, β−i:

β̄ + E[β−i | β̄ > β−i ≥ β̲, βi = β̄] = 0    (11)

β̲ + E[β−i | β−i ≥ β̄, βi = β̲] = 0    (12)

Since these thresholds are strategic substitutes of each other, the symmetric threshold exists and is unique. Since the policymaker focuses on the average value of the signals whereas the researchers' indifference conditions are defined for the marginal value at the threshold, it is strictly optimal to implement the policy following the supposed rule (7).

To confirm β̄ ≥ 0 > β̲, towards contradiction, suppose β̄ ≤ 0. By the indifference condition (11), β̄ = −E[β−i | β−i ∈ [β̲, β̄), βi = β̄] > 0, since E[β−i | β−i < β̄] < 0 when β̄ ≤ 0 regardless of the other conditions. Because this contradicts the assumption, β̄ ≥ 0. Substituting this into (12), β̲ = −E[β−i | β−i ≥ β̄, βi = β̲] < 0.

The asymmetry β̲ < −β̄ must hold for β̄ ≥ 0. Towards contradiction, suppose β̲ ≥ −β̄. Then β̄ = −E[β−i | β−i ∈ [β̲, β̄), βi = β̄] < 0 because the combination of the conditions β−i ∈ [β̲, β̄) and βi = β̄ implies E[β−i | β−i ∈ [β̲, β̄), βi = β̄] > 0. Since this contradicts the assumption, β̲ < −β̄ must hold; that is, positive results are pivotal only when another researcher omits his result, and negative results are pivotal only when another researcher reports a positive result. The bias as in (10) occurs due to this asymmetry of the thresholds. Appendix A3 shows that the symmetric equilibrium exists.

To investigate the properties of this equilibrium, let us note the following numerical observation:

Observation 1 (Moderate strategic substitutability under the normal distribution). Define the combined best response function β̄i^BR(β̄−i) as the fixed point of the indifference condition (11) in which β̲−i is given by the condition (12).[11]

∂β̄i^BR(β̄−i) / ∂β̄−i > −1  for all β̄−i    (13)

Although this condition may appear to be an endogenous assumption, it is a condition defined on the derivative of the mean of a truncated normal distribution, which is a primitive of the model. While it is difficult to verify analytically, Appendix Figure B2 shows that it holds over the plausible range of parameters.

[11] Note that, by Lemma 1, the combined best response function is unique.

Proposition 1 (ii). Under Observation 1, the equilibrium with asymmetric omission characterized in (i) is locally stable and optimal.

Proof. Appendix A3.

The equilibrium characterized in Proposition 1 differs from the conventional interpretation of publication bias in three ways. First, it shows that the bias can occur among unbiased researchers in a locally stable, and thus empirically plausible, equilibrium. Second, the biased reporting is socially optimal, and thus the policymaker also prefers such reporting practices. Third, the bias in the underlying estimates of reported studies does not imply a bias in policy decisions. The numerical illustration shown in Appendix Figure B2 shows P(a*(m) = 1) < P(a^FB(β) = 1) = 1/2 under various parameters; that is, the likelihood of policy implementation is not higher than that under the equilibrium with full information communication.
In this way, scientific communication in this model is conservative because only the messages m, not the estimates β, are reported and interpreted through coarse aggregation. The equilibrium thus has an "unbiased" publication bias both because it is observed with unbiased researchers and because it does not lead to bias in decision-making.

Figure 2.2 is the graphical illustration of this equilibrium, and shows that it is optimal because it most efficiently approximates the full information threshold in Figure 2.1. The equilibria that maximize social welfare are the ones that minimize the welfare loss due to suboptimal decisions: because of coarse messages, the policymaker may implement the policy when she would not under full communication, or may not implement it when she would. This equilibrium has the lowest welfare loss due to such false positive and false negative errors because omission in this model is just like enriching the message space from mi ∈ {1, 0} to mi ∈ {1, 0, ∅}. Moreover, an iterative adjustment from a neighborhood converges to the equilibrium. This is because the policymaker's strategy does not rely on marginal conditions of the researchers' strategies, and the researchers' strategies are continuous and only moderately responsive to each other. Proposition 2 will formally show that other informative and responsive equilibria, as illustrated in Figures 2.3 and 2.4, exist but are not robust to small perturbations of strategies and attain lower welfare than this equilibrium.

Proposition 2 (Unstable and sub-optimal equilibria with no bias). Let N = 2 and c = 0. (i) There exist equilibria with the following strategies:

• No omission: There exists an equilibrium with strategies that satisfy the following conditions: the policymaker follows

π* = 1 if n1 = 2,  π̃ if n1 = 1 and n0 = 0,  0 if n0 ≥ n1,    (14)

where π̃ ∈ (0, 1], and the researchers' thresholds satisfy β̄ = β̲ < 0 so that E[βi | mi ≠ ∅] = 0.

• Symmetric omission: There exists a symmetric equilibrium with strategies that satisfy the following conditions: the policymaker follows

π* = 1 if n1 > n0,  1/2 if n1 = n0,  0 if n1 < n0,    (15)

and the researchers' thresholds satisfy β̄ > 0 > β̲ and β̲ = −β̄ so that E[βi | mi ≠ ∅] = 0.

(ii) These equilibria with no or symmetric omission are sub-optimal and unstable.

Proof. Appendix A4.

In the environment identical to Proposition 1, Proposition 2 shows that other informative and responsive equilibria with no bias in the estimates that underlie the reported messages exist, and nevertheless that they are unstable and socially sub-optimal. The following heuristic arguments summarize the key logic of the proof contained in Appendix A3.

As illustrated in Figure 2.3, there exists an equilibrium with no omission because the benefit of enriching the message by omission exists only when another researcher also omits. Nevertheless, the following perturbation argument shows that this equilibrium is neither stable nor optimal. Modifying the strategy of one researcher slightly, by β̄′ = β̄ + ∆ for small ∆ > 0, induces the other researcher to also omit some results, breaking stability. Moreover, shifting from (14) to the supermajoritarian rule (7) and modifying both researchers' strategies to β̄′ = β̄ + ∆ strictly improves welfare because it reduces the false positive errors: such a perturbation can better approximate the full information threshold.
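As a numerical companion to Proposition 1, the following sketch iterates the indifference conditions (11) and (12) directly, using the conditional expectations of the bivariate normal signals. The parameter values (σb = σ = 1), the starting guesses, and the damping scheme are assumptions for illustration, not part of the paper's analysis.

```python
# Numerical sketch (assumed parameters) of the indifference conditions (11)-(12)
# for N = 2, c = 0: solve for the thresholds by a damped fixed-point iteration
# built on truncated-normal conditional expectations.
import numpy as np
from scipy.stats import norm

sigma_b, sigma = 1.0, 1.0                 # prior and signal standard deviations (assumed)
v = sigma_b**2 + sigma**2                 # Var(beta_i) = sigma_b^2 + sigma^2
rho = sigma_b**2 / v                      # Corr(beta_1, beta_2), induced by the common b

def cond_mean(x, lo, hi):
    """E[beta_-i | lo <= beta_-i < hi, beta_i = x] for the bivariate normal signals."""
    m = rho * x                           # conditional mean given beta_i = x
    s = np.sqrt(v * (1.0 - rho**2))       # conditional standard deviation
    a, b = (lo - m) / s, (hi - m) / s
    return m + s * (norm.pdf(a) - norm.pdf(b)) / (norm.cdf(b) - norm.cdf(a))

beta_hi, beta_lo = 0.5, -0.5              # initial guesses for (beta_bar, beta_underbar)
for _ in range(200):                      # damped fixed-point iteration on (11)-(12)
    new_hi = -cond_mean(beta_hi, beta_lo, beta_hi)   # (11): pivotal against an omission
    new_lo = -cond_mean(beta_lo, beta_hi, np.inf)    # (12): pivotal against a positive report
    beta_hi = 0.5 * beta_hi + 0.5 * new_hi
    beta_lo = 0.5 * beta_lo + 0.5 * new_lo

print(f"beta_bar = {beta_hi:.3f}, beta_underbar = {beta_lo:.3f}")
print("beta_bar >= 0 > beta_underbar:", beta_hi >= 0 > beta_lo)
print("asymmetry beta_underbar < -beta_bar:", beta_lo < -beta_hi)
```

The point of the sketch is only the structure of the computation: each threshold equals the negative of a truncated conditional mean evaluated at the event in which the corresponding message is pivotal, which is exactly how the asymmetry in (8)-(9) arises.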
Given the mixed strategy of the policymaker, there also exists an equilibrium with a symmetric omission of positive and negative results as depicted in Figure 2.4. This is because of the set-up that is symmetric between 0 due to the bivariate normal distribution and Eb = c = 0. 0 However, a perturbation of the researcher strategy by β = β − ∆ and β 0 = β − ∆ can demonstrate it is also unstable and sub-optimal. As the policymaker’s mixed strategy relies on tie-breaking and indifference, it is not robust to small perturbations of researcher strategies and turns to (7) under such modification. Moreover, this shift increases welfare because it increases E [b|n1 > n0 ] by quantity proportional to ∆ while decreases welfare by magnitude proportional to ∆2 . Since there is a first-order gain and second-order loss12 , this perturbation improves welfare for small ∆ > 0. Relating the results to the seminal works in voting theory regarding information aggregation can clarify why the omission with bias is optimal. Both voting theory and this model have set-up such as private information, common interests, and simultaneous reporting. The result regarding omission resembles the result that uninformed and unbiased voters abstain when there are other informed and independent voters (Feddersen and Pesendorfer, 1996). When voters lack information regarding which candidate is better, they prefer to abstain from voting to let others with reliable information cast decisive votes. The result regarding biased reporting is analogous to the result that, under private information, unanimity rule increases the probability of false conviction of innocent if the jurors condition their votes on the state in which their votes are pivotal (Feddersen and Pesendorfer, 1998). Because unanimity rule is a conservative voting rule, each unbiased juror’s equilibrium strategy will be biased to counterbalance such overly cautious rule by occasionally voting to convict even if their private information suggests defendant is innocent. They then suggest that, therefore, the unanimity rule is socially sub-optimal compared to other voting rules such as a majoritarian rule. The key difference of this paper’s model from these models is the information coarsening: while the models in voting theory have commonly assumed that voters receive binary signals13 , this paper considers researchers who receive continuous signals but can send only binary messages. This assumption of continuous signals is particularly appropriate for academic research in which the private signals are the estimates of policy effectiveness. While this difference may appear subtle, a set-up with binary signals in Appendix A5 demonstrates it is consequential: if the signals were binary so that the researchers only know the sign of coefficient, 1(βi ≥ 0), then an optimal and stable equilibrium has no omission nor biased reporting. That is, if signals were binary, reporting everything that the researcher knows as he knows is always optimal. 12 Specifically, the welfare gain is 2∆ P m−i = 1|β E β−i |m−i = 1, β + P m−i = ∅|β E β−i |m−i = ∅, β } and the welfare loss is 2∆2 P βi ∈ β − ∆, β ∧ β−i ∈ β − ∆, β β+β . 13 Feddersen and Pesendorfer (1999) considers an environment in which there are finitely many ordered signals. The bias in their paper arises due to the correlation between preference and biased distribution of information. 
This paper differs from theirs in that the private signals have two dimensions and cannot be ordered, and in that the bias arises due to the pivotality conditions.

Under coarse communication, researchers can only communicate the sign of their signals while they also know and wish to convey the strength of their signals. Thus, even though their signal precisions are the same so that all researchers are equally well-informed, those with coefficients near zero still want to omit their results to communicate that their signals had small magnitudes in absolute value. The set-up with information coarsening offers an explanation for result-dependent reporting in which small, noisy studies are publishable only with significant results. In contrast, the set-up with binary signals cannot explain this because it suggests that the reporting decision depends only on the precision of experiments, which is determined by processes such as sample size and study design.

In addition, even though an equilibrium with symmetric omission exists, the optimal equilibrium has asymmetric omission with bias in the estimates that underlie reported studies. This is because thresholds β̄, β̲ with asymmetry around zero can closely approximate the first-best decision rule, β1 + β2 ≥ 0, in a multi-dimensional setting, as the comparison of Figures 2.2 and 2.4 shows. In contrast, the set-up with binary signals does not lead to the optimality of biased reporting because garbling of information by researchers cannot improve the policymaker's decisions.

Full characterization of all fully informative and fully responsive equilibria will involve two additional equilibria that are analogous to those characterized in Proposition 1 and Proposition 2 without omission. Instead of the supermajoritarian rule (7), the policymaker adopts a submajoritarian decision rule, a* = 1 ⇔ n1 ≥ n0, in these equilibria. Proposition 1 focuses on the case of a* = 1 ⇔ n1 > n0 because it will be numerically shown to be the unique optimal equilibrium for c > 0, as the likelihood of incurring this positive cost is lower in this equilibrium (Appendix Figure B4). While it would be ideal to analytically characterize the equilibrium for c > 0, it is intractable to prove optimality because it involves solving a system of nonlinear conditions (11), (12) and evaluating sums of truncated correlated multivariate normals. Instead, the next section will demonstrate this numerically under a range of plausible parameters.

Extension. While the above model has considered the noise in the estimates due to sampling errors, the publication decision also depends on study-specific noise that affects generalizability. For example, the underlying treatment effect in one particular population may differ from that in another. In observational studies, the estimation strategy may have biases in ambiguous directions. This kind of noise cannot be reduced by increasing the sample size, and thus researchers may not report their studies even if the standard errors of their estimates, σi, are small. The model can incorporate such underlying heterogeneity by adding another error: βi = b + εi + ζi, where b ∼ N(0, σb²) is the true benefit, εi ∼ N(0, σi²) is the sampling error, and ζi ∼ N(0, σ0²) is the study-specific error.
Then, the posterior will be modified to

E[b | β1, β2] = ( (β1 + β2)/(σ² + σ0²) ) / ( 1/σb² + 2/(σ² + σ0²) ),

so that the above analyses carry through even in this case.[14]

[14] In practice, it may appear unreasonable to assume that σ0, the heterogeneity across settings, is known while the mean benefit b is unknown; it is more reasonable to take the expectation over σ0 to determine the thresholds instead. Moreover, when the credibility of study designs differs, σ0 can be specific to studies and corresponds to the weight on the credibility of a study.

3.3 Numerical illustrations with heterogeneous precisions

The numerical analysis can extend the equilibrium characterization in Proposition 1 to a more general setting. Computing the equilibrium takes two steps: first, assuming that the monotonicity as in Lemma 1 and the supermajoritarian rule (a* = 1 ⇔ n1 > n0) hold in a symmetric equilibrium, solve the system of indifference conditions; second, verify that the monotone strategies and the supermajoritarian rule are consistent with the players' optimization.[15] The key indifference conditions extend (11) and (12) in Proposition 1 as follows: for every i and every σi ∈ Supp(G), the strategies β̄i(σi) and β̲i(σi) satisfy

E[ ( β̄i(σi)/(σi² + σ0²) + Σ−i β−i/(σ−i² + σ0²) ) / ( 1/σb² + Σi 1/(σi² + σ0²) ) | n1′ = n0′ ] = c    (16)

E[ ( β̲i(σi)/(σi² + σ0²) + Σ−i β−i/(σ−i² + σ0²) ) / ( 1/σb² + Σi 1/(σi² + σ0²) ) | n1′ = n0′ + 1 ] = c,    (17)

where n1′ and n0′ refer to the numbers of positive and negative results among the other reports, respectively. The left-hand side is the posterior belief of the policy benefit conditional on being pivotal, and the right-hand side is the threshold effectiveness for implementing the policy. This formulation allows for uncertainty over not only the strength but also the precision of others' signals,[16] and over the combinations of others' messages that make one's own message pivotal when N ≥ 3, and it equates the posterior (4) to any c > 0.

[15] For example, Appendix Figure B4 verifies that the supermajoritarian rule attains higher social welfare than a submajoritarian rule (a* = 1 ⇔ n1 ≥ n0) when c > 0.

[16] As shown in Appendix Figure B5, the simulations with heterogeneous G(σ) in this section discretize the values of the standard deviations into 10 bins, σ ∈ {0.1, ..., 1}, with a χ² distribution with 4 degrees of freedom between 0 and 10. This smooth distribution approximates the imputed distribution of standard errors of all underlying studies in the labor union example discussed in the empirical section.

3.3.1 Sub-optimality of t-statistics rule

Figure 3 demonstrates the main numerical result that β̄(σ), the threshold between sending a positive message and not reporting the study, is strictly convex in σ.[17] While the model does not impose any rule based on the t-statistic, actual scientific publication requires statistical significance to draw positive conclusions. Under some constant t-statistic threshold, researchers whose signals fall in the shaded region may inflate the t-statistic through specification searches to announce positive results. This occurs even when researchers are unbiased; moreover, the policymaker prefers reports with such inflation to reports with strict compliance with a constant t-statistic threshold rule. As specification searches incur marginal costs, this mechanism can explain why there will be a mass of studies right above the significance threshold, as reviewed in Section 2.

[17] In Figure 4, β̲(σ), the threshold between sending a negative message and not reporting the study, is concave. While the convexity of β̄(σ) holds for any c > 0, such concavity of β̲(σ) holds only for small c > 0. As Appendix Figure B6 shows, the β̲(σ) threshold can be convex and increasing in some range of σ for sufficiently large c > 0. This is driven by the effect of the conservative prior as described in Bayes' rule (4): the full information threshold increases as σi increases due to the term cσi²/(Nσb²). Thus, if c > 0 were large, even β̲(σ) would be convex and could be increasing in σ. This is counterintuitive for two reasons. First, it is a counterexample to one of the principal findings in the voting theory literature that those who are less informed abstain more (Feddersen and Pesendorfer, 1999; Feddersen, 2004). In the most stark example, the abstention probability appears to converge to 0 as the voter becomes completely uninformed (σi → ∞). This shows that abstention is not driven by uninformedness per se, but is instead driven by the weakness of the belief that supports either policy option. If the prior is strong because c > 0 is large, then having evidence can moderate the prior and lead to more abstention. This can offer an explanation for why "uninformed voters" may vote more, as we see in recent elections with extreme candidates in the real world. It is consistent with the model presented in Banerjee et al. (2011), although their model did not consider voting as a means of information aggregation. Second, this result suggests that researchers could make their estimates less precise and publish, if their objective were simply to publish. This prediction may be viewed as a weakness of the theory. Since in reality reporting binary decisions is not as flexible as the model allows, policymakers are likely to require a large number of positive results to decide a* = 1.

Figure 3: β̄(σ) and β̲(σ) thresholds

Notes: Figure 3 plots the β̄(σ), β̲(σ) thresholds under the prior standard deviation σb = 1, no study-specific effect (σ0 = 0), and policy cost c = 0. The darker solid line is β̄(σ), the lighter solid line is β̲(σ), and the dashed line represents a linear t-statistic threshold. Studies in the shaded region draw positive conclusions even though they are marginally statistically insignificant.

The non-linearity of the thresholds β̄(σ), β̲(σ) is primarily due to the difference in the use of standard errors between null hypothesis testing and Bayesian updating: whereas the t-statistic divides coefficients by their standard errors, σ, Bayes' rule divides coefficients by their variances, σ². The t-statistic normalizes the convergence of the distribution of βi, which occurs at rate √n. In contrast, Bayesian updating puts a weight on each study that is proportional to its information, which increases at rate n in the absence of study-specific effects (σ0 = 0).[18] The indifference condition (16) can be approximately rewritten as

β̄i(σi) ∝ σi² ( c − E[ Σ−i β−i/σ−i² | n1′ = n0′ ] )

if no uncertainty in σ−i exists. Since c > E[ Σ−i β−i/σ−i² | n1′ = n0′ ], so that the policy would not be implemented in the absence of an additional positive result, the threshold β̄i(σi) is convex due to the term σi². Conversely, β̲i(σi) can be concave because β̲i(σi) ∝ σi² ( c − E[ Σ−i β−i/σ−i² | n1′ = n0′ + 1 ] ) and c < E[ Σ−i β−i/σ−i² | n1′ = n0′ + 1 ].

[18] Bayesian updating uses weights inversely proportional to the variance when σ0² = 0 because it is the only weighting that respects invariance to re-grouping of experiments. To see this, consider two experiments that have sample sizes k1, k2 in both treatment and control groups. Denoting y^T_ij, y^C_ij as the outcomes of those in the treatment and control groups, the treatment effect estimate from each study is βj = (1/kj)( Σ_{i=1}^{kj} y^T_ij − Σ_{i=1}^{kj} y^C_ij ) for j = 1, 2, and the aggregate estimate is defined as β_{1+2} = (1/(k1 + k2))( Σ_j Σ_{i=1}^{kj} y^T_ij − Σ_j Σ_{i=1}^{kj} y^C_ij ). The only way to obtain this expression from the coefficients β1, β2 is to apply the formula β_{1+2} = (k1 β1 + k2 β2)/(k1 + k2), which is equivalent to Bayes' rule since the variance is inversely proportional to the sample size.

The following heuristic argument demonstrates that the linear threshold rule β̄(σ) = tσ and β̲(σ) = −tσ for a constant t-statistic threshold t > 0 is generally sub-optimal. Consider the benchmark example in which c = 0 and σ0 = 0. If a linear threshold rule were applied, then the probability of omission would be constant across all values of the standard error σ. Recall, however, that the voting theory of abstention suggests that it is optimal for the omission probability to decrease with the precision of studies when c = 0. Hence, the linear threshold rule contradicts the efficiency of aggregation through vote counting.

Figure 3 shows three additional qualitative results on the thresholds β̄(σ), β̲(σ) that deviate from a t-statistics rule. First, the asymmetry as in (9), β̲(σ) < −β̄(σ), continues to hold over a large range of σ given heterogeneous G(σ) and small c > 0, so that the coefficients that underlie reported studies have an upward bias.[19] Second, very imprecise studies with marginally significant results are omitted. When the dataset is not large or the identification strategy is not fully reliable, it is plausible that researchers or referees do not publish results that are only marginally significant.[20] Third, as long as researchers can claim "negative results" when the coefficient value is small even if it is statistically significant, the very precise studies may always be reported regardless of their signs. If there were no study-specific effects or uncertainties due to study design (σ0 = 0),[21] then as σ → 0 the desirability of omission goes to zero since the signal becomes noiseless. Since the impact of σ on the thresholds is primarily quadratic, the omission probability is very small in the neighborhood of σ = 0.

[19] As Figure 4(B) and Table 2 column (4) show, the bias becomes the opposite when c > 0 is very large relative to σb.

[20] While some claim that it is optimal not to publish any noisy results, the numerical analysis shows that it is socially optimal that noisy extreme results are reported.

[21] As Figure 4(B) shows, when σ0 > 0, the omission probability of the most precise studies becomes clearly greater than zero because the information contained in one study is bounded above. Since in reality researchers do not know σ0, this relies on the researchers' belief over the parameter.

3.3.2 Increasing the number of researchers

Figure 4: β̄(σ), β̲(σ) thresholds under various σb and σ0

Notes: Figure 4 plots the β̄(σ), β̲(σ) thresholds under c = 1/3 and varying standard deviations σb and σ0. In each figure, the convex solid lines are β̄(σ) and the concave solid lines are β̲(σ). There is only one threshold for N = 1 since there is no benefit of omission when there is only one researcher.

There were two critical limitations in the set-up for the analytical characterization regarding the number of researchers. First, there were only two researchers even though there are more than three researchers on most research topics.
Second, it assumed that the policymaker knew the number of researchers whereas the problem of omission is often attributed to the uncertainty regarding how many studies are unreported. What are the effects of increasing the number of researchers on the policymaker’s aggregation strategy, researchers’ omission and bias, and the welfare attainment? First, the policymaker’s supermajoritarian rule (a∗ = 1 ⇔ n1 > n0 ) holds in the optimal communication equilibrium even for N = 3, ..., 10 (Appendix Figure B7)22 . This suggests that even if the policymaker does not know the number of underlying studies, they can still compare the number of reported positive vs negative results to make the decisions they would have taken with knowledge of N . While this result may rely on risk neutrality as will be discussed in sensitivity analysis, it suggests that the assumption that the policymaker knows N is not critical.23 Second, the researchers’ omission probability gradually increases as N increases. The Figure 4 depicts the example equilibrium thresholds for N = 1, ..., 4 keeping all other environment constant when (A) σb is high and σ0 is high, (B) σb is low and σ0 is low, and (C) σb is high and σ0 is low. In all cases, the probability of omission conditional on study precision, P (mi = ∅|σi ), increases with N for any values of σi . This is because, as N increases, the total information owned by other researchers rises and leaving the decisions to others’ papers becomes more desirable.24 Nevertheless, the bias on the coefficients that underly reported studies may increase or decrease as increase in N may shift the thresholds in either directions.25 This is because there are three channels through which N alters the thresholds. As Figure 4(A) shows, the effect of changing pivotality condition shifts β upwards and β in ambiguous directions, potentially mitigating the bias as N increases.26 As Figure 4(B) shows, the decreasing effect of conservative 22 This is verified for degenerate value of σ. N = 10 is the highest number of researchers attempted due to computational feasibility. 23 The set-up needs to maintain the assumption that the researchers know how many other researchers exist on the same subject. First, this appears to be closer to actual scientific practice than the assumption that the readers also know number of researchers. Second, the Figure 4 demonstrates that the thresholds do not change qualitatively with N and the changes are not quantitatively large: thus, even if there were uncertainty in N from researchers’ perspective, they may still be able to choose approximately optimal thresholds. 24 This result is consistent with the literature on voting theory that shows that the abstention probability increases as the number of voters increases. 25 This inquiry relates to the literature on media that explores the impact of market competition on media bias, and finds that the higher number of competing senders may increase the bias arising from taste and decrease bias arising from reputational motives (Gentzkow et al., 2016). 26 Consider, for example, the pivotality conditions for N = 2 and N = 3. β (N = 2) is lower than β (N = 3) because it satisfies the indifference condition (16) when one researcher receives high signal as opposed to when one receives high signal and another receives intermediate signal. 
β (N = 2) may be lower or higher than β (N = 3) because it satisfies the indifference condition (17) when only one another researcher receives intermediate negative signal as opposed to two researchers receiving intermediate negative signals or one receiving positive and another 20 prior as in (5) shifts both β and β downwards. When there are more researchers, each researcher needs less extreme signals to overturn the default decision. As Figure 4(C) suggests, there is also the effect of equilibrium thresholds adjustment that shifts both β and β in directions that offset the effect of the first two effects. For example, β may shift downwards due to upward shift in β especially among noisy studies, increasing the bias. These considerations jointly determine the conditions under which the increase in number of researchers, N , may increase or decrease the bias. Finally, while rigorous proof is beyond the scope of this paper27 , it conjectures that the information aggregates fully in a sense that, as N goes to infinity, the probability that the equilibrium action is the best action converges to 1 (Feddersen and Pesendorfer, 1996). The incompatibility between informative and strategic voting occurred due to the inflexible voting rule (Austen-Smith and Banks, 1996). In this communication model, since the aggregation rule through voting can flexibly adjust, the analogous source of inefficiency is absent. Any thresholds will asymptotically fully convey the policy benefit b since the cumulative distribution function of normal distribution is strictly increasing everywhere and thus one-to-one. As Figure 4 shows that the thresholds will likely be finite at least for some range of σ, information is most likely to aggregate fully as N → ∞ in this model. 3.3.3 Quantitative characterization of bias, omission, and welfare Could a model explain substantial degree of upward bias due to omission28 , frequent omission, and suggest that the welfare gain from the optimal threshold as opposed to linear threshold with full reporting is sizable? Table 2 summarizes the quantitative exploration. In many specifications, Panel (A) suggests sizable upward bias in unweighted29 average estimates. Consistent with Figures 3 and 4, these biases are driven by noisy studies. In contrast, most precise studies have very small biases. Note, for large c > 0, there can be downward bias because the studies whose coefficients near c are omitted.30 Quantitatively, despite its symmetric set-up, the model can explain the bias as large as one standard deviation of the underlying distribution of benefits at least among the lease precise studies. Panel (B) shows that the average omission rate can be as high as roughly 30 to 50 percents. While most precise studies have only 1 to 3 percent omission rate in the absence of studyspecific effects (σ0 = 0), most noisy studies may be omitted roughly 70 to 80 percents. On the other hand, belief regarding high σ0 can raise the omission rate even among the most precise receiving negative signals. 27 Analytically, it is impossible to characterize the voting rule due to difficulties of evaluating the sum of truncated normals. Numerically, it is impossible to investigate asymptotic properties because of computational feasibility. Moreover, unlike in voting theory, the inquiry on asymptotic property is not central to the application of scientific communication because there are usually often only a few researchers on one topic. 
28 Note that this is a lower bound on the actual bias under the t-statistic rule because inflation of coefficients leads to an even larger upward bias in reported estimates.
29 Note that this is different from the meta-analysis estimate based on Bayes' rule, which puts higher weight on more precise studies. Many meta-analyses nonetheless discuss these unweighted estimates.
30 This prediction of downward bias also occurs with the Hedges selection model discussed in Section 4.1.

Table 2: Bias, omission, and welfare

                                 (1)        (2)        (3)        (4)        (5)
                              Baseline    High σb    High σ0    High c     High N
(A) Bias:
    i. overall                  0.34       0.20       0.26      -0.56       0.31
    ii. σi = 0.1                0.00       0.00       0.01      -0.03       0.00
    iii. σi = 1                 1.01       0.96       0.60      -1.27       1.07
(B) Omission probability:
    i. overall                  0.43       0.32       0.44       0.42       0.48
    ii. σi = 0.1                0.03       0.01       0.30       0.03       0.03
    iii. σi = 1                 0.70       0.67       0.60       0.77       0.80
(C) Welfare:
    i. unrestricted             0.97       0.99       0.96       0.88       0.96
    ii. restricted              0.90       0.97       0.89       0.67       0.91
Specification change
from baseline                     —        σb = 1    σ0 = 1/2   c = 1/3     N = 3

Notes: Table 2 summarizes (A) the bias E[βi | mi ≠ ∅, σi], (B) the omission probability P(mi = ∅ | σi), and (C) the ex-ante welfare E[a*(m*(βi, σi))[Eb(βi, σi) − c]] under various settings. In (A) and (B), row i reports expected values unconditional on the realization of σi; row ii focuses on the most precise studies (σi = 0.1); and row iii focuses on the least precise studies (σi = 1). In (C), welfare is computed as a fraction of the full-information welfare: i. unrestricted refers to the environment without the linear t-statistics rule, whereas ii. restricted refers to the hypothetical setting in which no omission is allowed and the threshold is restricted to follow β̄(σ) = tσ, where t is set to its optimal value. Since the exogenous restriction allows no omission, bias and omission probability are both zero in that setting. The baseline specification applies σb = 1/3, σ0 = 0, c = 0, and N = 2; columns (2)-(5) modify this environment as indicated.

studies to around 30 percent. Combined with the results in Panel (A), the baseline case demonstrates that, among the 30 percent of noisy studies (σi = 1) that are reported, 28 percentage points are positive results and only 2 percentage points are negative results. Conversely, the most precise studies reach mostly negative conclusions, roughly to the extent that full reporting would suggest. This is consistent with researchers' expectation that, when their set-up is noisy, their studies are publishable only when they are significant and "right-signed."

Finally, the model can quantify the welfare gain from permitting publication bias relative to imposing the restriction that every binary conclusion of null hypothesis testing must be reported. As discussed in the introduction, coarse communication can be substantially welfare-reducing: even with sophisticated readers who compute the posterior correctly and a flexible adjustment of the t-statistic threshold, there is roughly a 3 to 30 percent welfare loss relative to the full-information benchmark. Allowing for some omission and inflation roughly halves these welfare losses, to 1 to 12 percent. This analysis demonstrates that the welfare consequences can be quantitatively important.

3.3.4 Implications for empirical tests

The numerical analysis provides a qualitative prediction that, as Figure 3 shows, omission occurs mainly among imprecisely estimated studies with intermediate coefficient magnitude, as the short simulation below illustrates.
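The sketch below is not part of the paper's numerical routine; the omission thresholds and σ values are hypothetical choices. It simulates the reporting rule for one precise and one noisy study and reports the omission rate, the bias among reported estimates, and the share of reported results that are negative, mirroring the objects in Table 2, Panels (A) and (B).

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA_B = 1 / 3          # s.d. of the underlying policy benefit b (the baseline value)

def simulate_reporting(sigma_i, beta_lo, beta_hi, n_sim=500_000):
    """Simulate b ~ N(0, SIGMA_B^2), beta_i = b + eps_i with eps_i ~ N(0, sigma_i^2),
    and a reporting rule that omits estimates falling in [beta_lo, beta_hi)."""
    b = rng.normal(0.0, SIGMA_B, n_sim)
    beta = b + rng.normal(0.0, sigma_i, n_sim)
    reported = (beta < beta_lo) | (beta >= beta_hi)
    return {
        "omission rate": 1 - reported.mean(),
        "bias of reported beta": beta[reported].mean(),   # E[b] = 0, so this mean is the bias
        "share of reported results that are negative": (beta[reported] < beta_lo).mean(),
    }

# Hypothetical thresholds: the noisy study omits a wide, asymmetric band,
# the precise study omits almost nothing.
print("noisy   (sigma_i = 1):  ", simulate_reporting(1.0, beta_lo=-0.9, beta_hi=0.3))
print("precise (sigma_i = 0.1):", simulate_reporting(0.1, beta_lo=-0.05, beta_hi=0.05))
```

With these illustrative thresholds the noisy study is frequently omitted and its reported estimates are biased upward, while the precise study is essentially unbiased, which is the qualitative pattern in Table 2.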
In particular, when the negative results are reported, they tend to be either precisely estimated 22 or have the negative coefficients with large absolute values. This qualitative result is coherent across all specifications. While the positive correlation between coefficients and standard errors may occur when underlying effectiveness, b, is large, the prediction on publication selection process raises question on the validity of interpreting the correlation as evidence of “biased publication bias” (Begg and Mezuemder, 1994; Egger et al., 1997). A strong correlation may be due to unbiased motive to report only the very precise null results. Moreover, even when there is a large degree of omission, correlation may be weak due to only a few imprecise and very negative results. The analysis suggests that caution is warranted when relating the correlation tests with biased motives of researchers. 3.4 Sensitivity Analysis The main model made simplifying assumptions to focus on the key mechanism due to coarse messages. While the full analysis is beyond the scope of this paper, the following examples illustrate how results may alter under different assumptions with conflict of interest, sequential reporting, and risk aversion.31 3.4.1 Conflict of interest vs common interest While the main model assumed perfect alignment of interests, public discussions frequently suppose some misalignment of objectives due to private motives for policy, different prior beliefs32 , or private benefits and costs of publication. Alternatively, the policymaker may have bias due to political affiliations. Communication under conflict of interest depends critically on the assumptions of commitment abilities. Throughout this discussion, assume that the policymaker cannot commit to a particular decision rule π (n); if she could, then the equilibrium under common interest would still be an equilibrium because she can make policies in ways that researchers find deviation from the strategy under common interest undesirable. Small vs large misalignment. Consider the benchmark model environment with 2 researchers and the cost of policy c = 0. For example, researchers may have biased motives for promoting or discouraging policies so that their payoff can be written as a (b + d), where d ∈ R denotes a distortionary motive. While the main model abstracted away from the journal referee process, it is an important process in reality, and preference for publication can be written as ab + d1(mi 6= 0). For small d, the equilibrium policy rule a (n) remains the same, and the message strategies β, β change only slightly. To see this, note that under common 31 There are other important considerations, such as irreversibility of decisions, endogenous precision and information acquisition, uncertainty over N , heterogeneity in c after the reports, endogenous agenda-setting, multi-dimensional policies, and multi-dimensional learning that involves reputational concerns of researchers. However, they require involved modification to the set-up, and are thus left for future analysis. 32 Aumann (1976) suggested that, under common prior, people cannot agree to disagree because they rationally update their belief using others’ priors. However, it is commonplace to observe researchers with different beliefs in the real world perhaps due to limited communication, and there are fruitful communication models that investigate heterogeneous beliefs (See, for example, Banerjee and Somanathan, 2001). 
interest, the policymaker strictly prefers to implement the policy when n1 > n0 because the policymaker focuses on the average E[β | m], whereas indifference holds exactly at the thresholds β̄, β̲. Therefore, if the thresholds change continuously as d increases, then for sufficiently small d the supermajoritarian decision rule π(n) as in (7) remains optimal. As shown in the proof in Appendix A6, β̄, β̲ change continuously as d changes in both examples. Regardless of whether senders can commit, a small conflict of interest changes the equilibrium only marginally; thus, the main analysis and its empirical predictions are qualitatively valid even without the assumption of perfectly disinterested researchers.

Clearly, large biases in researcher motives can significantly alter the qualitative characterization of the equilibrium. If the researchers cannot commit to reporting strategies, then the policymaker may never trust their reports, and no informative communication will be feasible (Crawford and Sobel, 1982; Hao, Rosen, and Suen, 2001). If the researchers can commit to reporting rules, they may optimally persuade the policymaker to implement policies more often than the social optimum (Kamenica and Gentzkow, 2016).

Allegiance bias. The model can also explain allegiance bias, the correlation between the conclusions of studies and their authors, because vote-counting considerations amplify originally small disagreements among researchers. Consider the example in the introduction and suppose researcher 1 hopes to promote the policy, so that his utility is a(b + d) with d > 0, whereas researcher 2 is unbiased, so that his payoff is ab. Recall that the unbiased researcher's indifference conditions are given by β̄2 = −E[β1 | β̄1 > β1 ≥ β̲1, β2 = β̄2] and β̲2 = −E[β1 | β1 ≥ β̄1, β2 = β̲2], which are decreasing in the biased researcher's thresholds β̄1, β̲1. Therefore, when the interested researcher lowers his thresholds β̄1, β̲1 to induce more policies, the unbiased researcher responds by raising his thresholds β̄2, β̲2 and recommends fewer policies. This in turn feeds back into the interested researcher's strategies and induces him to recommend policies even more frequently. In this way, even if the underlying differences in preference or belief are small, polarization in the standards applied to draw conclusions arises endogenously through the strategic substitutability between researchers. It is natural to expect small differences in beliefs among researchers, and this explanation can shed light on one aspect of the nature of academic debates.

To quantify the magnitude, consider an example with 2 researchers, c = 0, and σ0 = σ = 1. If neither researcher has bias, then the equilibrium threshold is β1 = 0.19. If researcher 1 has bias d = −0.1, so that he is biased towards policy implementation, and researcher 2 has no bias, then the thresholds become β1 = −0.25 and β2 = 0.5, as illustrated in Figure 5. Note that, if there were only one researcher, then the threshold for recommending the policy would change by only −0.2 due to d = −0.1 in this set-up. This example illustrates that, because of the strategic substitutability, the difference in thresholds, β2 − β1 = 0.75, becomes more than three times larger than under N = 1.

Figure 5: Equilibrium responses to each other's β̄. Notes: To visualize the strategic substitutability, Figure 5 plots each researcher's best-response function (11) with (12) substituted in, that is, each researcher's threshold as a function of the other researcher's threshold alone, for i = 1, 2.
The solid lines show the best response functions when neither researchers have bias. The dashed lines show the best response functions when researcher 1 has bias d = −0.1. 3.4.2 Sequential vs simultaneous report The model focused on the simultaneous report in which the researchers present their results with no knowledge of others’ results. While actual reporting is sequential, such simultaneity assumption could be loosely interpreted as researchers having uncertainties about the future results of others. This simplification is common both in cheap talk models and voting models although actual conversations take place sequentially, and many examples of voting take place with lag across constituencies so that there is gradual revelation of information. Following examples demonstrate how dynamic considerations maintain or alter the results. Example 1. (Equivalence) Consider the main model with 2 researchers. Suppose, instead, researcher i = 1 reports his result first, and researcher i = 2 observes only the result m1 when he reports his result. Then, the optimal equilibrium is equivalent to the model with simultaneous report. To see why, note that researcher 2 was already conditioning his report on the realization of m1 at the pivotal case. Thus, maintaining the strategy equivalent to simultaneous reporting is optimal. Given the researcher 2’s strategy is unaltered, it is also optimal for researcher 1 to keep his same strategy. This result is analogous to Dekel and Piccione (2000)’s result that the 25 simultaneous voting game’s equilibria are also sequential game’s equilibria.33 The key is that the information asymmetry exists between the researchers, not the exact timing assumption. Example 2. (Reviewer) Consider the main model with 2 researchers. Suppose, instead, researcher i = 1 reports his result first, and researcher i = 2 observes {m1 , β1 , σ1 } only when m1 6= ∅ before he reports his result. Then, when the policymaker adopts the supermajoritarian rule as in (7), m∗1 = 1 and the first best is attained always. Example 1 assumed that researcher 2 does not read researcher 1’s paper carefully to understand the signals {β1 , σ1 }. In contrast, it is reasonable to suppose that the researcher 2 can obtain full details only when researcher 1 does not omit the result. This example highlights the motive of publishing papers only to communicate the detailed results to other researchers: to convey the signal to later researchers, early researchers must report either m1 = 1 or m1 = 0. If the early researcher report m1 = 0, however, there will not be any message for researcher 2 to lead to policy implementation under supermajoritarian rule. Because the early researcher is sure that the later researcher will aggregate evidence and vote for the socially optimal decision, it is strictly optimal to report m1 = 1 however negative β1 is to maintain the option value. In this way, expectation to let a reviewer directly summarize the signals, or a replicator confirm the validity of message, drastically alters the nature of equilibrium. The first best welfare is attained because the later researcher synthesizes both results to make policy recommendation. This example captures the conventional wisdom to be cautious of literature at an early stage of inquiry because early researchers may expect others will correct the policy to the right direction even if they turn out to be overly optimistic. 
That early literature tends to have large coefficient magnitudes has been empirically confirmed in some studies (Munafo et al., 2007).

3.4.3 Risk aversion vs risk neutrality

Even though the main set-up focused on the role of standard errors σ in Bayes' rule, the scientific requirement of low p-values may be primarily due to risk aversion of a decision-maker who hopes to be "certain enough" of a positive effect. One can apply a Taylor approximation to utility over the policy benefit b with constant relative risk aversion γ ≥ 0 to obtain the decision rule

(Eb)^(1−γ)/(1−γ) − (γ/2)·Var(b)/(Eb)^(1+γ) ≥ c,

confirming that uncertainty over the policy benefit deters the policy. With a single agent and a flat prior, the optimal threshold can be written as

t ≥ t̄ = [ (2/γ)( 1/(1−γ) − c·β^(−(1−γ)) ) ]^(−1/2)  if γ ≠ 1,   t̄ = |2(log β − c)|^(−1/2)  if γ = 1,   (18)

so that the threshold t-statistic is decreasing in the coefficient magnitude β. Thus, risk aversion could mitigate the convexity of the t-statistic threshold for positive conclusions that the numerical simulation found in the model with risk neutrality. With many researchers, since the posterior variance is Var(b) = E[(Σ_{j=0}^N 1/σj²)^(−1)], which by Jensen's inequality exceeds (1/σ0² + 1/σi² + E[Σ_{j≠0,i} 1/σj²])^(−1), the researchers become on average more cautious about recommending the policy because they are uncertain about what the other researchers know.

33 When there are more than 3 researchers, the final researcher can improve the reporting strategy slightly with the knowledge of which pivotal message realization he faces. In this sense, the availability of others' reports may improve welfare, unlike in the Dekel-Piccione model. Nevertheless, unlike in the sequential cheap talk models (Krishna and Morgan, 2004), the characteristics of the equilibrium do not change; their results were based on risk-averse preferences and equilibrium selection.

4 Empirics

The model makes empirical predictions on meta-analysis data that are distinct from other commonly used models of publication bias: omission34 occurs among the imprecisely estimated coefficients with intermediate magnitude. This section presents tests that distinguish between the models using a widely available type of data35, and applies them to two examples in economics. The assumption on the publication selection process is important for choosing the appropriate bias correction method in meta-analyses. Given the possible relevance of the communication model, the section proposes a new bias correction method that is useful under commonly suggested models of publication bias.

4.1 Predictions

Consider the following data generating process with heterogeneous effects for the main coefficients and standard errors {βi, σi} of all studies i ∈ {1, ..., N} such that σ1 ≤ ... ≤ σN. The data of published studies are generated in three steps: first, the study precision σi ∼ G and the underlying effect bi ∼ N(b0, σ0²) are independently determined. Second, the random error εi ∼ N(0, σi²) is independently drawn and the coefficient βi = bi + εi is determined. Third, researcher i decides whether or not to report a binary message based on {βi, σi}.

Contrast with existing models. This paper's model makes the prediction that, when negative results are published, they have either high precision or a negative coefficient that is large in magnitude.
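Before turning to the existing models, the following sketch simulates the three-step data generating process above under arbitrary parameter values. The selection step here is a simple placeholder — report significant results always and insignificant ones with probability 0.3, which coincides with the uniform selection rule (19) introduced below with η1 = 1 and η0 = 0.3 — since the selection step is exactly what the competing models disagree about.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_meta_data(n_studies=200, b0=0.0, sigma0=0.1, t_bar=1.96):
    """Three-step DGP: sigma_i ~ G, b_i ~ N(b0, sigma0^2), beta_i = b_i + eps_i,
    followed by a reporting decision based on (beta_i, sigma_i)."""
    sigma = rng.uniform(0.05, 1.0, n_studies)        # step 1: G taken to be uniform (an assumption)
    b = rng.normal(b0, sigma0, n_studies)            # step 1: underlying effects
    beta = b + rng.normal(0.0, sigma, n_studies)     # step 2: estimated coefficients
    # step 3: placeholder selection rule (uniform selection with eta1 = 1, eta0 = 0.3)
    significant = np.abs(beta) >= t_bar * sigma
    reported = significant | (rng.random(n_studies) < 0.3)
    return beta[reported], sigma[reported]

beta_rep, sigma_rep = simulate_meta_data()
print(len(beta_rep), beta_rep.mean())
```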
To the best of my knowledge, to date, there are three models in economics and biostatistics that describe other data selection processes and have been used as the basis for bias correction methods.36 However, none of them makes the same set of empirical predictions as the communication model. The distinction between the models arises in the precision and coefficient magnitude of published null and negative results.

34 Although inflation may also be an important source of bias, it is beyond the scope of this paper to test its patterns empirically, for two reasons. First, a test of inflation requires far more data points than a test of omission, since it focuses on the local distribution around the cut-off as opposed to the entire distribution. Unlike the existing evidence, which pools studies of various types to examine the excess mass around the cut-off, an analysis of the heterogeneous extent of inflation across study precision requires a set of studies that estimate one common parameter. Second, the theory does not provide parameter-free predictions on the pattern of inflation across heterogeneous study precision. If the target effectiveness c > 0, then there should be little inflation at precise values; but if c ≤ 0, then there should be more inflation around the precise values than around the noisy studies. If the distribution of standard errors G(σ) has a fat tail, then there should be less inflation among the noisiest studies; if G has a thin tail, there will be more inflation among them. In this way, the pattern of inflation depends on the specific model environment, which is difficult to infer solely from the meta-analysis data.
35 While it is ideal to test the models with data that include observations of omitted studies or non-inflated specifications, such data are available only in the rare circumstances in which funding or regulatory agencies held them.

Table 3: Comparison between empirical distribution and counterfactual distribution

           This paper's model   Uniform selection   Extremum selection   Gradual selection
G∅(σ)      Ĝ∅ > G̃∅              Ĝ∅ = G̃∅              Ĝ∅ > G̃∅               Ĝ∅ > G̃∅
H0(β)      Ĥ0 > H̃0              Ĥ0 > H̃0              Ĥ0 < H̃0               Ĥ0 < H̃0

Notes: Table 3 summarizes the prediction of each model. Titles are given to highlight the contrast between them. Let G̃∅(σ) denote the distribution of all standard errors conditional on the study being a negative result, without any selection. Let H̃0(β) denote the distribution of coefficients conditional on the study not rejecting the null hypothesis with threshold t̄, without any selection. Ĝ∅ and Ĥ0 denote the respective observed distributions.

First, the uniform selection model (Hedges, 1992) models the publication probability as a step function of the study's t-statistic,

P(study i is reported) = η1 if |βi|/σi ≥ t̄,  and  η0 if |βi|/σi < t̄,   (19)

where η1 > η0 > 0. This suggests that results that are not statistically significant are systematically less likely to be published. Thus, conditional on being null results, the distribution is identical to the underlying distribution: Ĝ∅ = G̃∅. This model is commonly used in economics, with maximum likelihood estimation, to correct for publication bias (Hedges, 1992; Ashenfelter et al., 1999; McCrary et al., 2016). In this framework, the positive correlation test discussed in Section 2 is a joint test of whether publication selection takes place and whether the underlying effect is positive (b0 > 0), since, if b0 < 0, the correlation between coefficient and standard error will be negative.
Second, the extremum selection model (Duval and Tweedie, 2000) assumes that the most extreme negative values are suppressed from publication: P (study i is reported) = 1 if βi ≥ βmin 0 if βi < βmin , (20) where βmin is some threshold. In common case where βmin < b0 , there arises little selection 36 Fafchamps and Labonne (2016) informally proposes an alternative rule of journal acceptance: papers can be accepted when the result is statistically significant or when it can reject the modified the null hypothesis: the effect is as high as some target level c > 0. This model also has a prediction that the publication probability is U-shaped in the magnitude of coefficient. Unlike in the communication model of this paper, however, their model predicts that the negative results requires smaller magnitude than the positive results in absolute value. If the underlying effect were zero, their model predicts deflation of overall estimate when not adjusted for publication bias. In this way, the two models can still be empirically distinguished. 28 among the most precise studies. Thus, null results are more likely to be reported when the standard error is small so that G∅ > G̃∅ . At the same time, the model also suggests that marginally insignificant negative results are not particularly more likely to be omitted since the selection is unrelated to the statistical significance milestones: H0 < H̃0 . This model is most widely used as basis for in trim-and-fill publication bias correction method in medicine: the method removes the noisy estimates with large coefficients until the positive correlation between coefficient and standard errors no longer exists, and imputes the missing studies using the samples such that the bias no longer exists, as discussed in main textbooks such as Cochrane Handbook (Julian and Green, 2011) and Hunter and Schmidt (2004). Third, the gradual selection model (Copas and Li, 1997) supposes the selection probability 1 + δ2 βi , σi P (study i is reported) = Φ δ0 + δ1 (21) where δ1 , δ2 > 0, and uses two-step estimation for meta-analysis. It suggests that as σi becomes small, the publication probability becomes close to 1 so that G∅ > G̃∅ . Nevertheless, publication probability is also increasing in βi so that marginally insignificant negative results are more likely to be reported than negative results with large magnitude: H0 < H̃0 . The Copas correction method iteratively fit δ2 values starting from zero, and stop when the residual asymmetry unexplained by the model is sufficiently small. 4.2 Tests To test these predictions, the paper develops a semi-parametric Kolmogorov-Smirnov-type (henceforth, KS) test that compare the predicted and observed distributions among null and negative results. While the meta-analysis literature posited the aforementioned models of underlying selection process, to the best of my knowledge, they have not tested their models despite its feasibility under some common parametric assumptions. While the estimates have joint two-dimensional distribution on {β, σ}, the analysis focuses on each dimension separately since KS test is well established for one-dimensional random variable, and since it allows for interpretation with prediction along each dimension as summarized in Table 2. Assumptions. The test uses two assumptions that are commonly used in meta-analysis models: A1. 
Underlying effects with independent and identical normal distribution: for all i bi ∼ N b0 , σ02 Since σ0 can take any value, this heterogeneous effects assumption accommodates, but does not impose, commonalities across studies37 . In addition, A1 supposes that the study precision σi is independent of the true underlying policy effect: E [β|σ] = b0 ∀σ, and that the underlying 37 This model is referred to as “random effects” model in biostatistics. 29 effects are homoskedastic.38 A2. No inflation within statistically significant results: if |βi | σi ≥ t, then the data generating process is βi = bi + i , where i ∼ N 0, σi2 . This assumption says that the bias due to inflation of marginal results is unimportant. With the limited sample sizes that is relevant for meta-analysis, it is infeasible to distinguish the inflation and omission; such test requires large sample size at the margin of statistical significance. While restrictive, this is an assumption applied in all other studies on bias correction. Under A1 and A2, the likelihood of an observation {βi , σi } can be written semi-parametrically: a parametric assumption on one dimension along βi of joint distribution {βk , σk } is necessary to extrapolate counterfactual distribution, and sufficient to flexibly cover another dimension across σi ; it is not necessary to make parametric assumptions on distribution along σi . Note that it is not necessary to assume that all statistically significant results are reported: even among statistically significant results, publication probability could be higher among precise studies. Publication bias occurs due to differential publication likelihood along the coefficient values conditional on the precision. Thus, the analysis uses the distribution of statistically significant results as reflecting both underlying distribution and selection inherent in study precision to construct the counterfactual distribution of negative results that reflects both considerations. Bootstrap-based two-step estimation and testing. The test involves two-step estimation of counterfactual distribution and its KS test with the observed distribution: the test first estimates the parameters {b0 , σ0 } to construct the counterfactual distributions, and then estimates the KS statistics. Estimation step 1. estimate distribution of {b0 , σ0 }: use the maximum likelihood estimation ∗ β −b n 0 i with likelihood function Πi=1 φ √ 2 2 , where i = 1, ..., n∗ are some most precise studies. σi +σ0 As will be discussed in the bias correction section, n∗ is chosen to maximize the precision of the n o estimate.39 Distribution of estimates b̂0 , σ̂0 is obtained by the bootstrap with replacement such that the sampling probability is proportional to the informational value of study i: 1 . σi2 +σ02 Estimation stepo2. construct the counterfactual distributions G̃∅ and H̃0 : for each ren alization of b̂0 , σ̂0 , estimate the counterfactual distribution G̃∅ using the standard error distributions of significant results, and H̃0 using the standard error distributions of negative 38 This condition is violated in the presence of small study effects: when σ is large due to small sample size, the true effect b0 could also be larger because the experimenter can pay more attention per subject. Nevertheless, it is also a widely used simplication in meta-analysis. 39 Since this test is used to propose the bias correction approach, one may be concerned that the conclusion and the method choice are mutually supporting each other. 
However, since this bias correction approach is appropriate even under other models of publication bias, such circularity of argument does not arise here. Moreover, existing methods are misspecified if the predictions of this paper’s model applies. In addition, the estimation step of b̂0 , σ̂0 uses the entire sample without regard to the studies conclusions because it can improve the estimate of b̂0 , σ̂0 . 30 results: G̃∅ σ|b̂0 , σ̂0 = 1 C Φ √tσi2−b̂0 2 −tσi −b̂0 −Φ √ 2 2 σi +σ̂0 X 1−Φ i||βi |≥tσi √tσi2−b̂0 2 σi +σ̂0 σ +σ̂0 i 1 (βi ≥ tσi ) + Φ √tσi2−b̂0 2 σi +σ̂0 1 (σ ≥ σi ) 1 (βi < tσi ) (22) β−b̂0 √ Φ 1 X σi2 +σ̂02 , 1 , min H̃0 β|b̂0 , σ̂0 = n0 i|m =0 Φ √tσi2−b̂0 2 i (23) σi +σ̂0 Φ where C = P √tσi2−b̂0 2 σ +σ̂ i||βi |≥tσi 1−Φ tσi −b̂0 σ 2 +σ̂ 2 0 i √ i −Φ −tσi −b̂0 √ 2 2 σ +σ̂ 0 i 0 1(βi ≥tσi )+Φ √tσi2−b̂0 2 σ +σ̂ 0 i . The algorithm uses 1(βi <tσi ) sufficiently fine grid of σ and β for the numerical estimation of the distributions. 40 , and compute the Testing step. estimate the KS statistics distribution through bootstrap p-value for one-sided test: for each counterfactual distributions G̃∅ σ|b̂0 , σ̂0 and H̃0 σ|b̂0 , σ̂0 , compute the one-sided KS statistics: n o (24) n o (25) DG b̂0 , σ̂0 = sup Ĝ∅ (σ) − G̃∅ σ|b̂0 , σ̂0 DH b̂0 , σ̂0 = sup Ĥ0 (β) − H̃0 β|b̂0 , σ̂0 For each counterfactual distribution, the test applies the inverse cumulative distribution function method to generate the simulated empirical distribution with sample size of n0 . The o n p-value of each distribution given b̂0 , σ̂0 is the fraction of simulated random draws such that their KS statistic is at least as large as DG or DH respectively. Since the p-value of the KS test is defined as the probability that the maximum difference between the theoretical distribution 40 The analysis employs the bootstrap methods to account for two difficulties in computing the p-value of this test. First, unlike the standard KS test between a theoretical distribution and an empirical distribution, the theoretical distribution has its own uncertainty due to the estimation with a finite sample. Since ignoring this uncertainty in two-step estimations (Newey and McFadden, 1994) underestimates the p-value in this application, I incorporate them by bootstrap sampling. The bootstrap method is appropriate for distribution of b̂0 , σ̂0 since the parameters of interest are not the extremum statistics of the parameter space. Second, unlike the standard two-sample KS test with well-defined sample sizes, one sample size is used to construct the counterfactual distribution as their weighted sum. Since the proof for KS statistics that describes rate of convergence uses the uniform weights across samples, it cannot be directly applied to this test. Bootstrap simulation to compute the mapping between KS statistics and p-values using the identical sample sizes and distributions is a straightforward solution to this trouble. The bootstrap with inverse cumulative distribution function method incorporates the high KS statistics threshold due to small sample size; it makes multiple observations take the same value in the distribution, thus making it more likely to have large KS statistics. KS approach is known to be conservative as it applies to any distributions; for specific distribution, more efficient thresholds are often available. Since the meta-analysis is limited for its sample size, knowing the exact KS test statistics thresholds and increasing the power of test is valuable. 
31 and empirical distribution of same sample size is at least as large as the statistic D, the average n o p-value across the bootstrap estimates of b̂0 , σ̂0 is the p-value of this overall test. The test repeats this simulation until the estimated p-value converges. 4.3 Applications Two examples, effect of labor unions on productivity and Elasticity of Intertemporal Substitution (EIS), from economics literature are used to test the predictions, chosen on the basis that their data were publicly available either directly in their papers or the author’s websites41 , and both had large sample size to ensure power of the tests. These examples are used only as exploratory analysis, not as a representative test that applies to other disciplines. Whereas choice of sample size is endogenous in lab experiments in psychology and medicine, it is largely determined by the survey designed by others in many observational studies in economics. In this way, the nature of selection may be different and economics example is suitable for the main mechanism of the model. Sample restriction. This analysis takes the example of the labor union effects as an example of literature with vote counting and binary conclusions, and EIS as a counterexample of such communication for two reasons. First, whereas the former is an example with binary conclusions without one well-defined outcome variable, the latter has a continuous outcome to be used as parameter values of models with a coherent definition. Second, whereas in the former literature 34 out of 73 studies highlight their binary conclusions in their abstracts, only 60 out of 169 studies mention their binary conclusions explicitly there. To highlight the role of dichotomization of conclusions, the analysis focuses on the papers with binary conclusions stated upfront for labor union effects, and papers without such conclusions for EIS. To classify the papers whether they highlight the yes-or-no conclusion, the analysis applied the following criteria: whether the binary conclusions of null hypothesis testing are stated in either abstract, introduction, or conclusion of the paper42 . If they were not mentioned in such easily accessible places of the article, the vote-counting readers are unlikely to learn from them (for example, Batt, 1999). Conversely, when the author wants to communicate particular estimates they do not feel conclusive, they tend to only contain brief discussions in the text. Finally, the stark contrast between the two datasets is already clear from their funnel plots 41 To the best of my knowledge, there were only a few meta-analysis datasets readily available and had comparable sample size that ensures power. Among them, I chose ones with different sources of authors. One limitation of using the ones with large sample size is that it could be systematically different from the metaanalyses with smaller sample size. 42 Since this decision involves some subjective judgements, I complement the analysis in two ways: (i) the textual evidence for binary classifications and their discussions are available online at http://economics.mit.edu/grad/chishio/research, and (ii) Appendix C1 presents the analysis with entire data sets. While the pattern becomes weaker, the overall conclusions are identical. There were 7 book chapters for labor union effects, where there is no requirements that the main conclusions are summarized in abstracts or introduction unlike journal articles. I have checked all available books, and they mostly discussed the results. 
Thus, the analysis included all the estimates from book chapters.

Table 4: Summary of Kolmogorov-Smirnov-type tests

                           Labor union effect            EIS
                             G∅         H0            G∅         H0
p-value                    0.0036     0.0052        0.0287      1.000
D                          0.3872     0.2892        0.3052      0.0181
D̄ (95% critical value)     0.2832     0.1977        0.2730      0.4341
arg sup                    0.0799    -0.0632        0.1180      61.82
b̂0                               0.0075                  0.0505
σ̂0                               0.0642                  0.0956
n*                                   30                      18
Used Information                 0.6692                  0.3767

Notes: Table 4 summarizes the stem-based estimates and the results of the KS-type tests described in Section 4.2. D denotes the KS statistic relative to the predicted distribution as in (24) and (25). D̄ denotes the critical value of the KS statistic for rejecting the null hypothesis at the 95 percent confidence level; it is included for benchmark comparison, although the p-value cannot be computed by comparing the KS statistics alone. arg sup denotes the smallest parameter value (σ or β, respectively) at which the supremum defining the KS statistic is attained. The p-value is the overall p-value computed by the bootstrap. {b̂0, σ̂0} denote the overall estimates and n* is the number of included studies. Used Information denotes the measure in (27).

contained in Appendix C2: the positive correlation test does not detect the omission in the former example whereas it does in the latter. The tests herein illustrate how these patterns may reflect the vote-counting concerns.

4.3.1 Example: Effect of unions on productivity (Doucouliagos and Laroche, 2003)

There is a large theoretical controversy and empirical discrepancy regarding the effect of labor unions on firm productivity. The question is relevant for policy decisions on labor market regulations that promote or prohibit union formation with the aim of increasing US firm competitiveness. Neoclassical economics would suggest that unionization reduces firm productivity by raising wages above competitive prices or by lowering the return to capital due to labor's rent-seeking. On the other hand, some scholars have suggested that labor unions can improve labor productivity by improving communication between firms and workers, thus lowering inefficient turnover. Both effects are theoretically plausible: this is an example in which a symmetric prior over both directions is plausible. Since positive effects are contrary to the prior given by standard economic theory, positive coefficients are mapped to the "positive results" of the communication model.

Data. Doucouliagos and Laroche (2003) collected 73 studies that estimate the effect of unionization on firm productivity. The average t-statistics and coefficient magnitudes of the main specifications are reported in Table 1 of their paper43, which is the primary source of data for this analysis.

43 Since the outcome variable, firm productivity, need not be scale-invariant, one may naturally be concerned that the standard errors reflect arbitrary re-scaling of variables rather than the precision of the estimates. To examine the extent to which the standard error reflects the estimate precision, I regressed the inverse of the variance

A major drawback of the data is that, because they take the average t-statistic among multiple specifications, the studies' headline conclusions may differ from the conclusions inferred by conventional hypothesis testing. For example, Brown and Medoff (1978) is a seminal study that claimed "unionization is found to have a substantial positive effect on output per worker" as stated upfront in their abstract.
However, due to various attempts to relax assumptions, the average t-statistics of the paper is 1.95 in Doucouliagos-Laroche meta-analysis data set, making the study be taken as statistically insignificant at conventional threshold of 1.96. To avoid erroneously counting these studies as claiming negative results, 5 studies are excluded in total. With these criteria, there were 54 included studies with 17 positive results and 37 negative results. Results. The stem-based bias corrected estimate is not statistically different from zero. Note that the underlying heterogeneity measure σ̂0 is large, consistent with Freeman (1988)’s claim that the effects can be very heterogeneous. First, Figure 6 Top shows that null results have standard deviations that are first-order stochastically dominated by what can be predicted using the statistically significant results under the uniform omission model: Ĝ∅ > G̃∅ (p = 0.0035.) The interpretation of the KStest statistic is that, although the statistically significant estimates predict that only about 33 percent of studies among null results have σi < 0.11, there are over 72 percent of studies with such σi . That is, most standard errors of published null results are precisely estimated, contradicting the uniform selection model. Second, Figure 6 Bottom shows that the distribution of coefficient magnitudes among negative results is first-order stochastically dominated by the predicted distribution: Ĥ0 > H̃0 (p = 0.0047.) The interpretation is that, compared to the counterfactual of uniform omission of negative results that predict only about 25 percent of studies have β̂ < −0.1, the empirical distribution has approximately 54 percent taking β̂ < −0.1. This contradicts the extremum and gradual selection models, which predict that strictly less than 25 percent of studies have β̂ < −0.1. In a review, Stanley (2005) applies a χ2 -test to suggest that there are too many statistically significant results. This test confirms that such biases arises not only from positive results, but also from negative results. Using the positive correlation tests, Doucouliagos and Laroche (2003) suggested that the publication bias is not an issue in this literature. However, this analysis highlights that the selective omission is most likely to be prevalent here, too: the lack of positive correlation could be due to the near-zero underlying mean effect and theoretical plausibility of effects on both directions. on the sample size. Among all 73 studies, the coefficient was 0.99 and the R2 was 0.96. This indicates that standard error mostly reflects the estimate precision that increases with its sample size. 34 Figure 6: Distribution of negative studies’ standard errors and coefficients for labor union effects Notes: Figure 6 plots the cumulative distribution function of standard errors of null studies (Top) and coefficients of negative studies (Bottom) for estimates of the labor union effects. The solid green line represents the observed distribution and the dashed blue line represents the predicted distribution along with its 95 confidence interval estimated by the bootstrap. The solid red and vertical line quantifies the Kolmogorov-Smirnov statistic. 35 4.3.2 Counterexample: EIS for consumption (Havránek, 2015) To demonstrate that academic communications does not necessarily involve vote-counting, this subsection also discusses the EIS for consumption, the key economic primitive that links interest rates and consumption growth. 
Due to their wide applicability, they are not only estimated a number of times, but also have controversies such as disparities in estimates from microdata and calibration from macrodata. The goal of meta-analysis is not to proclaim the average estimates across studies as that to be used for calibrations of representative agents invariant of context or model; instead, it is to characterize robust heterogeneities across settings beyond study-specific idiosyncratic variations. Havránek (2015) reviews 169 papers and demonstrates that, for EIS, there exist systematic differences for estimates of asset holders, incorporating non-separability with durable goods, focusing on total consumptions rather than only food, and separating the intertemporal substitution and risk aversion by Epstein-Zin preferences. Such systematic reviews are useful for researchers since checking all papers on one’s own can be prohibitively time-consuming for individual readers.44 Data. Among Havranek’s data sets, this analysis focuses on 117 (or 118) papers that estimate the common categories among key heterogeneities45 and takes the median estimates from each paper following Havránek’s discussions. Moreover, to highlight the pattern of publication bias in the absence of vote-counting concerns for the testing, it takes the 68 papers out of 117 that do not mention their EIS estimates in their abstracts, introductions, nor conclusions. Results. While 53 out of 117 studies have positive and statistically significant results, the bias-corrected estimate is not statistically different from zero. This is consistent with the conventional understanding and Havránek’s conclusion that the estimation using timeseparable preferences and total consumption data with non-asset holders leads to near-zero IES. First, Appendix Figure C2 (Top) shows that the empirical distribution of null results Ĝ∅ are marginally more concentrated on precise estimates than what would be predicted under the uniform selection model G̃∅ . The magnitude is similar to that of the union productivity effect. Second, Appendix Figure C2 (Bottom) shows that, contrary to this paper’s model, the empirical distribution of coefficients of negative results Ĥ0 is not more negative than the prediction H̃0 ; instead, the one-sided test on the opposite direction is significant (p = 0.000). As the Figure demonstrates, the largest discrepancy between the empirical distribution Ĥ0 ˆ = 0, indicating that the negative and counterfactual distribution H̃0 occurs precisely at EIS estimates are systematically less likely to be published than positive estimates. This also ˆ = 0 is explains the first result that Ĝ∅ > G̃∅ . When the change in reporting probability at EIS taken into account, the two distributions Ĥ0 and H̃0 becomes very similar. This suggests that the normality assumption was suitable in this example. Moreover, it is not entirely consistent 44 Important biases arising from aggregation, model mis-specification, small samples, seasonality, slow-moving omitted variables, and weak instruments are not resolved by the meta-analysis. 45 Those who do not hold assets, without durable goods, have variables not for total consumption, and not use the Epstein-Zin preferences. 36 with gradual or extremum selection models although the predictions Ĝ∅ > G̃∅ and Ĥ0 < H̃0 match because these models predict different patterns of omission among null results. 
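For reference, the one-sided comparisons above (and the D statistics in Table 4) reduce to the statistic in (24)-(25): the supremum of the gap between an empirical CDF and a counterfactual CDF. A minimal, self-contained version of that computation is sketched below; the coefficient values and the counterfactual distribution standing in for H̃0 are hypothetical, and the bootstrap calibration of the p-value described in Section 4.2 is a separate step not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def one_sided_ks(observed, counterfactual_cdf):
    """One-sided KS statistic sup_x [F_hat(x) - F_tilde(x)], where F_hat is the
    empirical CDF of `observed`; the supremum is attained at the observed points."""
    x = np.sort(observed)
    emp_cdf = np.arange(1, len(x) + 1) / len(x)
    return float(np.max(emp_cdf - counterfactual_cdf(x)))

# Hypothetical negative-result coefficients and a hypothetical counterfactual CDF:
betas = np.array([-0.30, -0.22, -0.15, -0.08, -0.03])
print(one_sided_ks(betas, lambda b: norm.cdf(b, loc=-0.05, scale=0.15)))
```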
As demonstrated by his discussion of p-values, Havránek (2015)’s critique was that the reporting bias is potentially caused by the preference to report statistically significant coefficients. While this could be important, the above analysis showed that the positive correlation between coefficients and standard errors is also largely due to the theoretical implausibility of models with convex utility functions. My impression of the literature is that the authors are very careful in raising caveats for proper interpretations of the parameters. Moreover, the literature has strong focus on the mechanisms rather than the particular estimate values, and is willing to be upfront about empirical limitations of the models. Since there is common knowledge that the most parsimonious models may generate economically implausible parameters, that efforts focused on finding suitable alternatives with theoretically consistent estimates appears constructive. Lastly, this example highlights another important mechanism of omission that even four distinct models of publication bias does not capture: to be in the plausible range of values. In this sense, it highlights the difficulties of proposing a model that applies universally to any meta-analysis data. 4.3.3 Discussions Alternative explanations. As discussed in the sensitivity analysis, since negative results cannot be consequential in the absence of other positive results, the early literature tends to have large coefficients. In addition, if the data improve over time, the standard errors may become gradually smaller. While some of the earliest influential studies in both fields, such as Brown and Medoff (1978) and Hansen and Singleton (1982) were positive results, the regression of coefficients and standard error on years of publication shows both statistically and practically insignificant correlation between them46 . Thus, these examples show that, even without dynamic considerations, publication bias can arise and persist as the communication model showed. While the model focused on the tendency of communication through vote-counting as a reason for the difference between two examples, the difference in prior belief could also be important. While all positive, negative, or no effects are theoretically plausible for labor union effects, only positive estimates are theoretically coherent for EIS. As the distribution of coefficient demonstrates, the economists’ thin prior on non-positive EIS may be an important reason for the aforementioned results. Qualitative examples. The empirical tests formalized the pattern of publication selection that is qualitatively consistent with common observations. First, precisely estimated null results are often published and disseminated widely. Some examples include the large-scale 46 The raw regression of βi = θpublication yeari + i and σi = θpublication yeari + i has R2 of 0 and statistically insignificant coefficients θ in both examples. 37 clean cookstove study (Hanna et al., 2016) and the air pollution regulation in Mexico (Davis, 2008), and the community-based development programs (Casey et al., 2012). It would be remiss not to acknowledge a well-known counterexample: DEVTA study, the largest randomized trial that showed null effects of deworming and vitamin A supplementation on child mortality and health, was not published until 7 years after the data collection (Garner, 2013). 
While the delay required by the authors' careful analysis was extensive, the fact that it was published in the Lancet, a top medical journal, is, in a way, reassuring about academic journals' willingness to report precise negative results. Second, results with negative coefficients of large magnitude also tend to be published even if they are not precisely estimated. Examples from economics include the negative labor supply elasticities close to −1 in the taxi driver papers (Camerer, 1997) and the positive impact of inequality on economic growth (Forbes, 2000). In medicine, the unexpected harmful effect of a therapeutic strategy on cardiovascular events found in one trial was widely reported (The Action to Control Cardiovascular Risk in Diabetes trial, 2008; Boussageon, 2011) even though subsequent meta-analysis found that the aggregate effect is not statistically significant. The trim-and-fill method, which builds on the extremum selection model, acknowledges such results as a major drawback of the method (Duval and Tweedie, 2000) and treats them as anomalies in its interpretation of publication bias. Because this pattern does not fit the common Begg and Egger positive correlation tests, Stanley (2005) called it "type-II" publication bias. A contribution of the model is to rationalize data that were treated as outliers under existing interpretations.

4.4 Bias Correction Method

The assumptions on the data generating process of a meta-analysis data set are fundamental to the choice of publication bias correction method, which can be applied to the over 4,000 meta-analyses conducted every year. These meta-analyses are considered the most credible form of synthesis in the hierarchy of evidence. What could an alternative bias correction method be if the existing assumptions on the selection process suffer from a large degree of misspecification? This paper proposes a new, simple, and fully data-dependent bias correction method that is robust under all of the existing publication selection models. It uses the common implication of these models that the bias of a reported estimate βi is increasing in its standard error σi, at least when σi is small, and it focuses on the most precisely estimated studies to maximize the precision of the meta-analysis estimate. I suggest the name "stem-based method" because it uses the "stem" of the funnel plot for the estimation.

4.4.1 Common Prediction

While the preceding discussion focused on the differences among the models of publication selection, there is one useful implication that holds under all of the models:

Proposition 3 (minimal bias among the most precise studies). Define Bias(σi) ≡ E[βi | σi, study i reported] − b0.
1. (Limit) (i) Consider the uniform, gradual, and extremum selection models. Then lim_{σi→0} Bias(σi)² = min_{σi} Bias(σi)². For the uniform and gradual selection models, min_{σi} Bias(σi)² = 0. (ii) Consider the extremum selection model. If σ0 = 0, then lim_{σi→0} Bias(σi)² = 0.
2. (Monotonicity) (i) Consider the uniform selection model. For any b0 ≠ 0, there exists σ̄ > 0 such that Bias(σi)² is increasing in σi for σi ∈ [0, σ̄]. If b0 = 0, then Bias(σi) = 0 always. (ii) Consider the extremum and gradual selection models. Bias(σi)² is increasing in σi for all σi.

Proof. Appendix C4.
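As an informal illustration of the monotonicity statement, the following sketch (not from the paper; the parameter values b0 = 0.2, η1 = 1, η0 = 0.1, σ0 = 0 and the Monte Carlo design are arbitrary choices) simulates the uniform selection model (19) and traces Bias(σi) across study precision.

```python
import numpy as np

rng = np.random.default_rng(0)

def bias_under_uniform_selection(sigma_i, b0=0.2, eta1=1.0, eta0=0.1,
                                 t_bar=1.96, n_sim=2_000_000):
    """Bias(sigma_i) = E[beta_i | reported] - b0 under the uniform (step-function)
    selection rule: significant results kept with prob. eta1, others with eta0."""
    beta = rng.normal(b0, sigma_i, n_sim)
    keep_prob = np.where(np.abs(beta) >= t_bar * sigma_i, eta1, eta0)
    reported = rng.random(n_sim) < keep_prob
    return beta[reported].mean() - b0

for s in [0.05, 0.2, 0.5, 1.0]:
    print(f"sigma_i = {s:4.2f}  Bias = {bias_under_uniform_selection(s):.3f}")
```

In this parameterization the bias is close to zero for the most precise studies and rises with σi, which is the property the stem-based method exploits.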
The first statements on the limit of σi states that, when the sample size increases to infinity, the bias due to omission approaches its minimum.47 The reason is that there is no reason not to report the results whose binary conclusions are almost sure to be correct unless there are studyspecific heterogeneities that increase in sample size cannot eliminate. The second statement points out that the extent of bias is increasing in standard errors. For the extreme and gradual selection models where the direction of bias is monotone, this is because, as the sample size drops, more estimates are likely to be affected by the selection. For the uniform selection model where the omission is assumed to occur in both sides, the bias could be decreasing among the most noisy studies due to the symmetry of thresholds around zero. However, this is unlikely to be an important concern for two reasons: first, as Appendix C4 shows, low σ occurs only when the underlying b0 as well as the bias is also small. Second, the method to be proposed will take into account the overall estimate that is closest to 0 so that, if the bias were indeed decreasing among noisy studies, those values will be taken as the hypothesized true value at the pilot stage, leading the method to include them in samples. The analogous monotonicity arises under the plausible parameter values in this paper’s model. As Table 2 has shown, the asymmetry between thresholds β and β was increasing in the standard error, leading to the increasing bias. Moreover, the bias among the most precise studies was small. In reality, to make the model’s prediction relevant, one has to assume that, 47 With extremum selection models and this paper’s communication model, the bias does not converge to zero even when σi approaches zero. While this may appear concerning, it is not resolved by other methods of bias correction either. 39 when the estimated coefficient β is clearly small for practical significance, the authors can claim negative result even if it were statistically significant. 4.4.2 Criterion n Let the objective of the estimator b̂0 , σ̂0 o be maximization of the precision of the estimates. One of the most common approach with this objectives is to minimize the mean squared error (MSE): min MSE = E b̂0 − b0 2 (26) The MSE can be conveniently decomposed48 into the Variance= E b̂0 − Eb̂0 2 due to idiosyncratic errors, and the Bias= E bi − Eb̂0 due to systematic errors: MSE = Bias2 + Variance While optimization of this objective is infeasible since the underlying b0 is unknown, this paper proposes a criterion that approximates this bias-variance decomposition. This is a common approach used in optimal bandwidth choice for kernel estimations as well as sample choice of machine learning. How could an estimator minimize MSE so that it is efficient?49 The bias-variance decomposition suggests that the estimator is efficient when it focuses on some most precise estimates i = 1, ..., n∗ . First, when comparing two studies with different precisions, it is always optimal to include more precise studies for σi < σ. This because more precise studies are less biased in expectation (consequence of selection models), and because more precise studies provide lower variance (consequence of Bayes formula: Variance= P 1 σ02 +σi2 −1 ). Second, the bias-variance trade-off provides useful guidance for n∗ , the optimal number of studies to include for maximizing efficiency. 
As more studies are included, the bias increases while the variance decreases with the number of total observations and experiments. Including more studies is therefore beneficial, even though it may introduce a slight bias in the estimate, if it substantially decreases the variance by increasing the total observations and experiments. While the theory cannot show that the objective of minimizing MSE(n) is necessarily convex, Appendix Figure C3 shows that its empirical approximation is convex in the examples considered in this paper. If there were no selective reporting, all studies would be included, since no bias arises from including more studies. In contrast with the objective in the communication model, $a(b_i - c)$, the objective of maximizing the precision of the overall estimate is symmetric. The communication model focused on making the efficient binary policy decision $a \in \{0, 1\}$ given some target $c$, which introduced asymmetry in the objective. Under this objective, in the absence of asymmetry, full disclosure of all estimates is always optimal.

48 $E[(\hat{b}_0 - b_0)^2] = E[(\hat{b}_0 - E\hat{b}_0 + E\hat{b}_0 - b_0)^2] = \underbrace{(E\hat{b}_0 - b_0)^2}_{\text{Bias}^2} + \underbrace{E[(\hat{b}_0 - E\hat{b}_0)^2]}_{\text{Variance}} + \underbrace{2E[\hat{b}_0 - E\hat{b}_0](E\hat{b}_0 - b_0)}_{=0}$

49 Note the difference between the t-statistic $\beta/\sigma$ and the bias-variance trade-off. In the MSE criterion, a higher t-statistic is desirable if and only if it is due to low variance. If a high t-statistic were due to a high $\beta$, then it would increase the bias and thus be undesirable. This logic suggests that, when the meta-analysis estimate rejects the null hypothesis, it is not merely due to the algorithm allowing for some slight bias, since the algorithm at the same time allowed slightly high variance by not including many studies. This is confirmed by the simulation with the proposed choice of the initial $\hat{b}_{\text{pilot}}$.

4.4.3 Estimation

To operationalize the criterion, the pilot estimate $\{\hat{b}_0, \hat{\sigma}_0\}_{\text{pilot}}$ is necessary. Using the pilot estimates, it becomes feasible to compute the MSE at each number of included studies $n = 2, ..., N$ by suitable estimation methods such as maximum likelihood.50 Finally, the optimal number of studies is determined and the efficient estimate $\{\hat{b}_0, \hat{\sigma}_0\}_{n^*}$ can be obtained. I make specific proposals for the pilot estimate with the priority of being conservative. Concretely, the estimation takes the following two steps:

1. Estimate the random effects model parameters $\{\hat{b}_0, \hat{\sigma}_0\}_n$ of the normal distribution using the most precise studies $i = 1, ..., n$ for $n = 2, ..., N$ ($\sigma_1 \leq ... \leq \sigma_N$);

2. Choose the optimal number of studies $n^*$ by computing MSE(n).

The choice of pilot estimates will be guided by the maximin criterion in decision theory: when one does not have reliable information on the distribution, take the conservative choice. Given that publication bias is primarily a concern for estimates away from the null hypothesis, a natural conservative pilot value for the mean effect is the one closest to it: $\hat{b}_{0,\text{pilot}} = \hat{b}_{0,n}$ with $n \in \arg\min_{\tilde{n} \in \{1,...,N\}} \hat{b}_{0,\tilde{n}}^2$.51 This makes the estimator conservative: a small simulation study, using the observed distribution of standard errors from the studies on unions and simulating the same sample size with true effect $b_0 = 0$, shows that the null hypothesis is rejected with probability only 0.035, whereas the standard method yields 0.05. This demonstrates a moderate degree of type II error. At the same time, if the meta-analysis method rejects the null hypothesis, we can be confident that it is not driven by this methodological choice.
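To make the two-step procedure and the conservative pilot choices concrete, the following is a minimal sketch in Python of one way the stem-based estimator could be implemented; it is my own illustrative implementation, not the author's code. The grid-profiled random-effects MLE, the plug-in MSE(n) approximation (squared distance from the pilot mean plus an inverse-variance term), and the use of the pilot heterogeneity in the used-information measure of equation (27) below are all simplifying assumptions.

```python
# Minimal sketch of the stem-based procedure described above (illustrative, not the
# author's code). It fits a normal random-effects model by maximum likelihood on the
# n most precise studies for each n, chooses conservative pilot values (mean estimate
# closest to zero, largest heterogeneity estimate), and picks n* by an approximate
# bias-variance (MSE) criterion.
import numpy as np

def re_mle(beta, se, tau_grid=np.linspace(0.0, 2.0, 201)):
    """Random-effects MLE (b0_hat, sigma0_hat) by profiling b0 over a tau grid."""
    beta, se = np.asarray(beta, float), np.asarray(se, float)
    best = (None, None, -np.inf)
    for tau in tau_grid:
        v = se**2 + tau**2
        b0 = np.sum(beta / v) / np.sum(1.0 / v)          # weighted mean given tau
        ll = -0.5 * np.sum(np.log(v) + (beta - b0)**2 / v)
        if ll > best[2]:
            best = (b0, tau, ll)
    return best[0], best[1]

def stem_estimate(beta, se):
    order = np.argsort(se)                                # sort by precision
    beta, se = np.asarray(beta, float)[order], np.asarray(se, float)[order]
    N = len(beta)
    fits = [re_mle(beta[:n], se[:n]) for n in range(2, N + 1)]
    b0s = np.array([f[0] for f in fits])
    taus = np.array([f[1] for f in fits])
    b_pilot = b0s[np.argmin(b0s**2)]                      # conservative pilot mean (closest to 0)
    tau_pilot = taus.max()                                # conservative pilot heterogeneity
    mse = []
    for k, n in enumerate(range(2, N + 1)):
        var = 1.0 / np.sum(1.0 / (se[:n]**2 + tau_pilot**2))
        mse.append((b0s[k] - b_pilot)**2 + var)           # approximate Bias^2 + Variance
    k_star = int(np.argmin(mse))
    n_star = k_star + 2
    info = 1.0 / (se**2 + tau_pilot**2)
    used_info = info[:n_star].sum() / info.sum()          # eq. (27), with pilot heterogeneity plugged in
    return {"b0": b0s[k_star], "sigma0": taus[k_star],
            "n_star": n_star, "used_information": used_info}

# toy usage with simulated studies (for illustration only)
rng = np.random.default_rng(1)
se = np.sort(rng.uniform(0.05, 0.5, 30))
beta = rng.normal(0.1, np.sqrt(se**2 + 0.05**2))
print(stem_estimate(beta, se))
```

In this sketch, the pilot mean is the fit closest to zero and the pilot heterogeneity is the largest $\hat{\sigma}_{0,n}$ across fits, mirroring the conservative choices described in the text; the toy data at the end are simulated purely for illustration.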
This choice of pilot estimate also addresses the concerns raised for the uniform selection model: since that model is primarily concerned with bias away from zero, if the bias indeed decreases for large standard errors, the pilot estimate will still be the one with the lowest bias level across $n = 1, ..., N$.

50 Like the uniform and gradual selection models, this method will by default assume normality of the distribution. Nevertheless, it is straightforward to adjust it to other models, such as allowing a higher omission probability below zero. This extension is used for the example of the EIS.

51 The precise estimation of bias for methods using the bias-variance trade-off is often difficult. For example, the Imbens-Kalyanaraman optimal bandwidth applies regularization factors to avoid overestimating the bias (Imbens and Kalyanaraman, 2011). While this choice of initial value is not perfect, I argue that it is a fully data-dependent and reasonable choice. This covers the standard case of null hypothesis testing. When the null hypothesis ($\hat{b}_{\text{null}}$) is different from zero, one can write $\hat{b}_{0,\text{pilot}} = \hat{b}_{0,n}$ with $n \in \arg\min_{\tilde{n}} (\hat{b}_{0,\tilde{n}} - \hat{b}_{\text{null}})^2$. Although this may not be the most conservative choice, since $\hat{b}_{0,\text{pilot}} = \hat{b}_{0,n}$ with $n \in \arg\max_{\tilde{n}} (\hat{b}_{0,\tilde{n}} - \hat{b}_0)^2$ would give an even larger bias, such a choice may be too conservative since it exacerbates the type II error. Moreover, that specification would require altering $\hat{b}_{0,\text{pilot}}$ depending on the estimated $\hat{b}_0$.

Given that the MSE is increasing in the variance, the conservative choice of the underlying standard error is to take the largest value among all estimates: $\hat{\sigma}_{0,\text{pilot}} = \max_n \hat{\sigma}_{0,n}$. It safeguards against choosing an $n^*$ that is unreasonably low: estimates of $\hat{\sigma}_{0,n}$ tend to be small for low $n$, so the algorithm might otherwise stop including further studies only because it ignored potential underlying heterogeneity. Since a high $\hat{\sigma}_{0,\text{pilot}}$ estimate implies a larger value of including more studies to reduce the overall variance, this algorithm leans towards including many studies and allowing for a slight bias in the point estimate.

It is helpful to quantify the fraction of information used for the meta-analysis estimate. Applying the notion of information as proportional to sample size,52 the fraction of information used for the stem-based estimate is
$$\text{Used Information}(n^*) = \frac{\sum_{i=1}^{n^*} \frac{1}{\sigma_i^2 + \hat{\sigma}_0^2}}{\sum_{i=1}^{N} \frac{1}{\sigma_i^2 + \hat{\sigma}_0^2}} \qquad (27)$$
This quantity indicates how much information the algorithm has thrown away to avoid publication bias in the meta-analysis. A high measure of used information indicates that there was little publication bias. In the simulation study where no publication bias exists, the algorithm uses approximately 60-70 percent of the information with sample size 20, and 50-60 percent with sample size 100.

4.4.4 Discussions

The stem-based bias correction method has its advantages in minimal assumptions and simplicity. First, while other correction methods rely on a particular form of publication selection, this method only uses the bias monotonicity assumption, which is satisfied under all proposed models of selection. In this sense, the conclusion will not be driven by the particular form of selection the meta-analysts choose. As the examples demonstrated, even four models of publication bias are not sufficient to capture all relevant considerations, such as plausible values, the impact of inflation, or genuine finite-sample or small-study bias. This method can suitably address these concerns by incorporating them in the likelihood functions.
Second, the estimation applies an intuitive criterion defined by the minimization of MSE and allows for straightforward extension to new methods such as network meta-analyses.53 Moreover, unlike the uniform or gradual bias correction methods, it has a visual representation that can help with communication to readers, as presented in Figure 7. While other methods posited particular publication selection patterns to correct the bias, this method allows for flexible patterns of selection and finds the optimal way to mitigate the bias. The disadvantage of the method is that, because it does not impose a particular structure on the selection process, it does not utilize all the information in the data when some particular selection process is appropriate.

52 To understand the intuition behind the expression $\sum_{i=1}^{n^*} \frac{1}{\sigma_i^2 + \hat{\sigma}_0^2}$, set $\hat{\sigma}_0^2 = 0$ so that there is no between-study heterogeneity. In this case, the measure is simply the share of the total sample size accounted for by the included studies, because $\frac{1}{\sigma_i^2}$ is proportional to study $i$'s sample size. When $\hat{\sigma}_0$ is high, the relative informational value of noisy studies increases, leading to a higher optimal number of inclusions $n^*$.

53 For example, Barth et al. (2013) have already used a similar methodology, although their decision to focus on the most precise studies was based not on concern for publication bias but on study quality. Moreover, Stanley et al. (2010) also suggested that focusing only on the most precise studies can alleviate publication bias. Both papers' decisions about the number of studies to include were made arbitrarily; in this sense, the method provides a formal criterion for such analyses.

Figure 7: Example with labor unions' effect on productivity (above) and with the EIS (below)

Notes: Figure 7 illustrates examples of the funnel plot-based visualization of the stem-based bias correction method. The diamond at the top of each figure indicates the estimate along with its 95 percent confidence interval. The curved line indicates the initial MLE estimates $\hat{b}_{0,\tilde{n}}$ for $\tilde{n} \in \{1, ..., N\}$ as increasingly less precise estimates are included. The diamond at the middle of the curve indicates the minimal level of precision for inclusion. The stem-based estimate is therefore given by the studies, represented by circles, whose precisions are above this diamond. These correspond to the "stem", the thin part, of the "funnel." As the figures indicate, the diamond is positioned right before the bias becomes large. Precision measure scales are adjusted for readability.

5 Discussions

The following discussions will first examine the validity of the key assumptions of the model; second, they will explain how the model builds on the theoretical literature on information aggregation and communication; third, they will clarify the model's implications for policies to address publication bias.

5.1 Key assumptions

5.1.1 Vote counting

The central friction of the set-up is the technological restriction that policymakers only count the yes-or-no conclusion of studies.
The general idea that scientific communication involves simplification of researchers' knowledge appears uncontroversial: whether due to limited expertise to understand subtleties, the cost of time to process information, or the paucity of memory to recall details, the intended audience may not consult all the information in papers to make their decisions.54 This is a part of our daily experience, and the demand for systematic reviews demonstrates that fully sophisticated aggregation of evidence is costly for individual readers to undertake. As long as the message space is smaller than the signal space, the addition of an option to omit results is welfare-enhancing, as it enriches the message space. Specifically, while less universally applicable, many reviews may simply compare the numbers of positive vs negative results. First, this restriction implies that the readers' interpretation ignores the coefficients' magnitude. Such simplification may occur despite the original papers' nuanced conclusions, not only because of the highlighted binary conclusions in the titles and abstracts of papers, but also for convenience when summarizing a series of results succinctly in review articles to convey an impression of the results. The Cochrane Review also suggests that, when the statistical specifications or variable definitions are diverse across studies, simple vote counting may be an efficient way of aggregation (Higgins and Green, 2011).55 Second, it also assumes that insignificant intermediate results and significant negative results are bundled as non-positive results when vote counting. In the setting with binary actions, the papers' conclusions can be perceived as recommendations and thus be reduced to two messages.56 Many researchers would agree that it is difficult to present a paper whose main claim is "inconclusive," and sometimes significant negative results cannot be made coherent with theory in some socio-economic settings. That the language of "negative" results and "null" results is used almost interchangeably suggests that the comparison of positive vs non-positive results takes place at least informally.

54 This approach relates to the literature on bounded rationality that incorporates the cognitive limitations of processing information. For example, see Mullainathan (2000) and Schwartzstein (2014).

55 Note that this result is robust to arbitrary weights readers may assign when they also consider the precision of each study, so long as not only the readers but also the researchers themselves know each other's precision. To see this, consider an example of the two-researcher model in which one researcher has very high precision and the other has very low precision. If the researchers know each other's precision, then the one with high precision almost always publishes his result whereas the one with low precision almost never does. When the low-precision researcher publishes, it is because he received a signal of extreme magnitude. In this example, even though the policymaker knows the precision of each study, he weighs each study equally: if he weighed low-precision studies less, then a low-precision study claiming a negative result would never be consequential, and eliminating such a possibility cannot be optimal. In practice, applying weights can be difficult when the models are complex. For example, the Intergovernmental Panel on Climate Change adopts "model democracy," in which every model in articles above its qualification threshold receives equal weight when aggregating estimates from various climate models (Knutti, 2010).

56 When included, "inconclusive" results are often not mentioned in the abstract but discussed briefly in the paper. As the example of the EIS shows, vote counting may not be the predominant way of aggregation in some literatures when the conclusion is not binary.
Even influential regulatory agencies such as the FDA adopt a simple rule of approving a drug if there are two positive results, illustrating how coarse aggregation is used in practice and how non-positive results are treated equivalently. For example, to convey the gravity of publication bias, Turner et al. (2008) showed that, in research on antidepressants, 37 out of 38 positive studies were published whereas only 3 out of 36 non-positive studies were published as negative. Such comparison of binary messages, combined with binary actions, leads to the bias in reported coefficients in the stable and optimal communication equilibrium.

5.1.2 Binary actions

The specification that the policy decisions are binary leads to the endogenous bias of the reported $\beta_i$ through the asymmetry of the payoff function: the expected benefit of the policy matters if and only if it surpasses some threshold to implement the policy. If the action space were continuous and the objective were to choose the action closest to the expected benefit under a symmetric loss function, as discussed for the bias correction criterion (26), then the bias would no longer arise. This assumption is suitable for the examples of medical researchers who inform regulatory agencies and doctors in their decisions on whether to adopt the drugs studied. Even when the exact policy decision, such as the level of the minimum wage, can be continuously adjusted, citizens or committee members may make binary voting decisions on specific parties or proposals with particular values. While any decision can be modeled as continuous when mixed strategies are possible, the binary action space assumption is a reasonable approximation of many policy decisions and has been adopted in a number of communication models (Dewatripont and Tirole, 1999; Li, Rosen, and Suen, 2001; Morris, 2001; Glazer and Rubinstein, 2004).

5.1.3 Conservative prior

In addition to vote counting and binary actions, the assumption that the prior expected benefit of the policy is less than its cost of implementation ($c \geq 0$ while the prior mean is 0) is necessary and sufficient for the bias of reported coefficients to be upward in the optimal communication equilibrium. This assumption may appear inapplicable to examples in which the goal is to confirm whether existing policies are indeed effective. However, it is suitable for the main application of the model, in which researchers are testing new drugs or programs that they believe could be effective yet do not implement without positive evidence. As only 10 percent of new drugs are approved (Hay et al., 2014),57 it is plausible to consider that the average drug that is rigorously tested is expected to be ineffective.

5.2 Relation to Literature

5.2.1 Information asymmetry in voting

This paper closely relates to the models of voting as a process of aggregating private information (Austen-Smith and Banks, 1996; Feddersen and Pesendorfer, 1996, 1997, 1998, 1999). As discussed in section 3.2, while both models have the pivotality condition as the key mechanism for explaining abstention from reporting, my set-up differs from theirs in two important aspects and delivers new results.
First, the researchers can only send binary messages while their signals are continuous, whereas these voting models commonly assume that signals are binary.58 Therefore, even when all researchers are equally informed, those who find coefficients of intermediate magnitude omit their studies to reflect the strength, rather than the sign, of their signal. In addition, my result on allegiance bias is related to their result that uninformed independent voters may vote in the opposite direction of the partisan bias of other electors. My set-up with continuous signals extends this by demonstrating an amplification effect of strategic substitutability: biased researchers' reports have an even larger bias in coarse communication when unbiased researchers try to offset their bias. Second, the policymaker cannot commit to a particular strategy, unlike in their voting setting with ex ante fixed voting rules.59 The model demonstrates that the biased reporting emerges even in the optimal equilibrium, whereas the voting literature showed that bias occurs under sub-optimal voting rules such as the unanimity rule.

As their voting theory relates to auction theory, this model reveals a novel connection of publication bias to the winner's curse. The common interpretation focuses on the coefficient magnitude, $\beta_i$, and viewing the published estimates as unbiased estimates suffers from the winner's curse because low coefficient values tend to be suppressed (Young et al., 2008). This model suggests a perspective that focuses on the conclusion message, $m_i$, and shows that sending binary messages from all studies suffers from the winner's curse. To see this, recall the set-up with multiple researchers such that the cost of the policy, $c$, and the prior belief of the average effect, $b$, are both zero. If a researcher treats his own signal as the only available information and recommends the policy without omission so that $m_i = 1(\beta_i \geq 0)$, then he suffers from the winner's curse in that he makes policy recommendations too often: for signals $\beta_i \in [0, \overline{\beta})$, he naively recommends the policy yet the posterior combined with others' signals is not sufficiently high, and for signals $\beta_i \in [\underline{\beta}, 0)$, he naively recommends not to implement the policy whereas the expected policy benefit is sufficiently high. In this way, omission for signals $\beta_i \in [\underline{\beta}, \overline{\beta})$, by conditioning on others' signal realizations, is a solution to this winner's curse. Null hypothesis testing is often justified by the idea that scientific conclusions must be "certain enough," as demonstrated in the risk aversion examples.

57 Some may critique that this is not an appropriate comparison because researchers use their private information to select the drugs into the experiment. One can think of the model's prior $N(0, \sigma_b^2)$ as the prior on the selected studies, which this example also illustrates.

58 Although this point has not been a focus of discussion in the voting literature, whether the coarsening occurs at the signal level or at the message level makes a difference in the nature of the optimal equilibrium in terms of bias. To the extent that private signals are continuous in reality, my model offers a relevant novel explanation for abstention in voting.

59 As Morgan and Stocken (2008) clarify, one major difference between voting models and communication models is the commitment power of the receiver. Without commitment to a particular decision rule, this communication model requires posterior consistency of the decision rule, which is infeasible to analyze analytically with continuous signals.
When considering the collective communication, it could also be viewed as providing a solution to the winner's curse by making researchers draw policy recommendations less often than they would on their own.

5.2.2 Communication and media

My model also relates to the large theoretical literature on communication and media, which has focused on biased motives, reputation motives, and costly communication in set-ups where the message space is as large as the signal space. First, a key result of the costless communication models is that, when there is a conflict of interest between an informed expert and an uninformed decision-maker, the only feasible form of communication becomes coarse in order to limit manipulation (Crawford and Sobel, 1982; Sobel, 2010).60 In contrast with this literature, which focused on the consequences of conflict of interest for the endogenous coarsening of messages, this paper takes the coarse message space as an exogenous technological restriction and derives the reporting bias endogenously with multiple senders who have no conflict of interest. In this sense, it highlights a causal relationship between coarse communication and reporting bias that runs in the opposite direction from the one the existing literature has focused on. Second, the key result in the repeated communication models is that reputational concerns can generate reporting bias even in the absence of conflict of interest (Gentzkow and Shapiro, 2006; Shapiro, 2016). The literature shows that the biased reporting arises from the demand for reporting consistent with the heterogeneous prior beliefs that receivers have (Suen, 2004; Gentzkow and Shapiro, 2006). Applied to academic communication, these models predict that senders frequently report confirmative results. In contrast, my model explains publication bias as arising from the demand for easily aggregable pieces of evidence. It predicts that surprising results, i.e., information that swings one's action away from the action taken in the absence of reports, are reported more.61 Third, models with a cost of communication (Banerjee and Somanathan, 2001) also predict that only extreme results are reported, because the cost may be higher than the gain from reporting confirmative results that do not significantly alter policy decisions. The cost of writing and refereeing an academic paper is realistic and important, and, as section 3.4.1 shows, my model's qualitative predictions are robust to the cost of communication. There are two differences between the testable predictions of costly communication models and my model with asymmetric information. First, the former suggest that inflation of coefficients does not happen unless there is a conflict of interest that generates upward bias at the conventional significance threshold $t$. Second, whereas the former predict that removing the cost of communication can eliminate publication bias, the latter suggests that the bias remains so long as readers coarsely summarize multiple studies.

60 This strategic consideration, derived in the one-sender setting, applies to settings with multiple senders and one-dimensional noisy information (Li, Rosen, and Suen, 2001), but to a lesser degree with multi-dimensional information without uncertainty (Battaglini, 2002).

61 My result on allegiance bias is similar to these models in that it is the presence of other biased or inaccurate information sources that generates reporting bias (e.g., political correctness in Morris, 2001). In both models, if other senders are unbiased, the unbiased senders deliver messages in the way that is optimal for the receivers.
Finally, the paper relates to the literature showing that decision-makers may value information or take actions in biased ways under various communication frictions. Suen (2004) demonstrates that news consumers may prefer like-minded sources because they can provide coarse messages that are sufficiently informative to influence the reader's decision. Fryer and Jackson (2008) show that discrimination against minorities could occur due to coarse thinking that puts them into one category, leading to biased decisions. Blume and Board (2013) show that, even with common interests, uncertainty about how well the receiver can understand messages inhibits communication, acting as a language barrier. This paper focuses on the interaction between coarse messages and informational asymmetry among multiple ex ante identical senders, which has not been explored in this class of models.

5.2.3 Existing theories of publication bias

There are two sets of papers that theoretically address the policies to counter publication bias and find that such bias may be socially efficient. The first set of papers considers models with endogenous information acquisition, where the costs are incurred by biased researchers. In such settings, rewarding the researchers with the option to falsely report favorable results can raise the experimental effort closer to the social optimum (Glaeser, 2008). This incentive benefit may be greater than the social cost of false reports (Libgober, 2015). Others focus on settings with verifiable information and selective disclosure (Henry, 2009) as well as the optimal stopping problem of evidence accumulation (Henry and Ottaviani, 2014). In contrast, my model focuses on the collective communication with exogenous information, an inner problem of the models with endogenous information. An important difference is that the model assumes that the researchers do not have biased preferences, and it focuses on the contrasting use of standard errors between null hypothesis testing and Bayes' rule to provide reasons to draw positive conclusions from marginally insignificant results. Second, de Winter and Happee (2013) compare the publication practices of no omission and selective reporting of statistically significant results when the null hypothesis is the average of existing evidence, and show that meta-analytic estimates are more accurate under the latter given a fixed number of publications. While contradicted by van Assen et al. (2014)'s replication, my approach is similar to this model in that it focuses on the cost of communication. Whereas de Winter and Happee consider direct communication of estimates, this paper's approach focuses on the friction due to communicating only binary conclusions.

5.2.4 Existing empirics on information transmission

The empirical section of the paper relates to the literature that examines the empirical predictions of the aforementioned models of information transmission. In real-world settings, the voting literature shows that education is correlated with voter turnout, which is consistent with the prediction of abstention (Feddersen, 2004). The literature on media uses measures of bias, such as the relative position to other media sources and the intensity and tone of news coverage, to analyze the demand and supply of media bias (Puglisi and Snyder, 2016). In laboratory settings, both the voting literature and the communication literature have varied payoffs and information structures to examine the validity of the models' predictions (Battaglini et al., 2010; Crawford, 1998).
This paper advances these empirical tests by obtaining a direct measurement of bias in a real-world setting, because academic research offers a unique opportunity to recover the distribution of underlying signals. In addition, to the best of my knowledge, this is the first attempt to test the various models of the publication selection process that meta-analysts have proposed.

5.3 Policy Implications

5.3.1 Common interpretations and recommendations

Because of the shared understanding that all results are needed to attain collectively unbiased estimates, it is common to interpret publication bias as a failure of the incentive structure in the production of knowledge (Ioannidis, 2005; Every-Palmer and Howick, 2014). Researchers may want to support positive results due to their affiliation with industry funding sources and partner organizations, or due to their beliefs in some theories (Goldacre, 2010; Ziliak and McCloskey, 2008; Doucouliagos and Stanley, 2013). Journals and researchers may perceive insignificant results as less conclusive62 and influential, and thus publish significant results to maintain their reputation and to build their careers (Reinhart, 2015). Psychologically, everyone may subconsciously hold a predisposition for significant results (Card and Krueger, 1995) that are "entertaining." A number of authors have expressed their concerns and consider the omission of studies and the inflation of results serious scientific misconduct (Chalmers, 1990; Martinson et al., 2005) that may misinform policies and consequently compromise the reliability of knowledge even more than explicit fabrication of data (John et al., 2012).

Authoritative organizations have responded to the accumulation of concerns and evidence by making formal recommendations on publication practices. For example, the International Committee of Medical Journal Editors (ICMJE) suggests that publication decisions should be independent of the results of a study: "Editorial decisions ... should not be influenced by ... findings that are negative... In addition, authors should submit for publication or otherwise make publicly available, and editors should not exclude from consideration for publication, studies with findings that are not statistically significant or that have inconclusive findings. Such studies may provide evidence that combined with that from other studies through meta-analysis might still help answer important questions ..." (ICMJE, 2016). The American Statistical Association also recommends that negative results should be reported to facilitate proper interpretation of the literature: "P-values and related analyses should not be reported selectively. Conducting multiple analyses of the data and reporting only those with certain p-values (typically those passing a significance threshold) renders the reported p-values essentially uninterpretable. ... Whenever a researcher chooses what to present based on statistical results, valid interpretation of those results is severely compromised if the reader is not informed of the choice and its basis" (Wasserstein and Lazar, 2016).

62 Fisher also suggested that one cannot infer a conclusion when the hypothesis is not rejected, and Neyman-Pearson's proposal to set a single alternative hypothesis is often considered to be arbitrary (Salsburg, 2002). While the model's explanation is similar to this argument, it further explains why upward bias emerges on average, whereas merely explaining 'conclusiveness' based on p-values does not.
5.3.2 Current Policies

Under the common interpretation that selective publication is due to biased motives, some corrective policies to adjust researcher incentives can enhance social welfare: even if experts are sophisticated and aware of selective reporting, selective reporting can lead to low social welfare because it reduces the information available to policymakers. Regulatory organizations and journals have put forth efforts to reduce publication bias with the goal of comprehensive disclosure of evidence. The FDA only considers studies that have registered protocols and whose analyses strictly followed them. The journal Basic and Applied Social Psychology banned the use of null hypothesis testing (Woolston, 2015). In 2016, the journals of the American Economic Association abandoned the use of eye-catchers for statistically significant results in regression tables. Registration of pre-analysis plans has begun to "counter publication bias" (Siegfried, 2012) and become a norm in field experiments in economics (Miguel et al., 2014; Olken, 2015). The Journal of Negative Results helps reduce the difficulty of disseminating studies with negative results in various fields such as medicine. The United Nations calls for governments across the world to require that clinical trials be registered for their approval (AllTrials, 2016). Despite this progress, many suggest that the issue of publication bias is far from being eliminated due to the limits of enforceability (Hunter and Schmidt, 2004). While registration is also necessary for observational studies to avoid any result-dependent reporting, the extension of registries to observational studies has been controversial (Editors of Epidemiology, 2010) and the attempts with ClinicalTrials.gov have not been successful (Boccia et al., 2016).

5.3.3 Alternative interpretation: null hypothesis testing for communication

Instead of following these common interpretations, this model builds on the literature on the history and sociology of statistics, which argues that null hypothesis testing became widely accepted due to its advantage in communication (Shibamura, 2004). Despite its original controversy, today the method is not only taught in classes and used in research, but also adopted in judicial decisions on discrimination cases and in regulatory decisions on environmental and medical policies. First, null hypothesis testing leads to binary conclusions, which are the messages with the lowest burden as measured by informational entropy. Second, it is "objective" in that the conclusions are invariant to who analyzes the data (Porter, 1996). The literature argues that such simplicity and analyzer-independence of null hypothesis testing were widely appealing for communicating complex scientific inquiry to non-experts.63 To the best of my knowledge, the discussion of null hypothesis testing has not explored its implications for aggregating multiple pieces of evidence through their conclusions. While full reporting is ideal when readers are unboundedly sophisticated and have unlimited computational capacity, it also leads to an extremely conservative implication for readers who only count binary conclusions: if readers are interested in whether there is any positive effect, then they should still conclude that the null hypothesis is collectively rejected even if 97 percent of studies report negative results, regardless of the studies' precision.

63 In this sense, despite its name "hypothesis testing," it has only a weak relation to the model of science in which researchers set and adjust their own hypotheses and test them against data (Hampel).
To see why, recall that if the true effect were null, then 97.5 percent of papers would report negative results under conventional two-sided testing at the 95 percent confidence level. Thus, if fewer than 97.5 percent of the results were non-positive, the true effect would have to be positive. However, common sense would say that a general audience will interpret the message "97% of climate papers cannot claim human-caused global warming is happening" as a negation, not a confirmation, of anthropogenic climate change; unless a systematic meta-analysis exists to aggregate results for the readers, it may be difficult to accurately convey the overall result under both vote counting and full reporting.

This alternative understanding is a clarification, not necessarily a contradiction, of the common interpretation that researchers are merely seeking the professional reward of publication. The researchers in the model are, in actual research processes, the joint decision-makers composed of authors, referees, and editors. Thus, even if authors are themselves career-seeking, anonymous referees and editors may be social-welfare-maximizing. Nevertheless, this common interpretation is incomplete without an explanation for the demand for "evidence with publication bias" to which journal referees and editors respond. The findings of this paper suggest that caution is warranted in unambiguously attributing this demand to biased beliefs or motives: the coarse communication of null hypothesis testing is sufficient to explain it.

5.3.4 Suggestions

The model suggests ways to facilitate scientific communication between authors of articles, meta-analysts, and readers. First, it suggests that authors use statistical significance thresholds only as rules of thumb, attend to coefficient magnitudes when drawing yes-or-no conclusions, and tolerate precisely estimated or well-identified null results. Second, it suggests that reviewers discuss publication bias in their meta-analyses without concern about being interpreted as criticizing the literature. Third, it suggests that readers need not distrust the overall lessons from a scientific literature even when there is evidence of publication bias.

5.3.5 Caveats

The paper does not claim that publication bias is always socially acceptable; the consequences of misinforming public policy through scientific evidence can be too grave to be optimistic. It is plausible that, due to the competitive process of academic publication, the degree of publication bias is sub-optimally high. The sensitivity analyses show that, if systematic reviews are expected, if risk aversion is a key concern, or if the interests of experimenters and the policymaker are in serious conflict, then allowing publication bias can be detrimental. In some fields of medicine, meta-analyses are common, minimizing risk is critical, and the funding industry's profits at stake are sizable; thus, mandatory reporting remains critical. The paper's goal is solely to clarify the implications of vote counting, not to recommend it nor to exaggerate its prevalence. The Cochrane Review writes: "Occasionally meta-analyses use 'vote counting' to compare the number of positive studies with the number of negative studies. ... it should be avoided whenever possible. ...
Vote counting might be considered as a last resort in situations when standard meta-analytical methods cannot be applied (such as when there is no consistent outcome measure)" (Higgins and Green, 2011). There are a number of careful researchers who are forthright about their studies' limitations in the abstracts, and many readers who are mindful of the caveats and avoid oversimplification of evidence. The ideal form of communication remains such thorough aggregation of all evidence.

6 Conclusion

Robert K. Merton64 suggested that disinterestedness, the pursuit of knowledge for the common scientific enterprise, is a core ethos of scientific practice (Merton, 1942). If the objective is to acquire accurate knowledge and all information can be fully and costlessly communicated, the value of information depends not on its conclusion but only on its precision. Thus, publication bias is often perceived to contradict the ideal vision of unbiased researchers equally willing to report both positive and negative results. Merton also proposed communalism, the collective learning process, as integral to scientific practice. This paper shows that, when coupled with the difficulty of reducing complex and nuanced results into simple and coarse conclusions, even disinterested researchers' reports will have both omission and inflation. Casual language such as "exciting results" vs "no results" appears to suggest irrationality and bias in the academic community. The model offers one rational theory of interesting results: whether positive or negative, they are results sufficiently "conclusive" to influence the decision in the direction of their yes-or-no result when other studies are collectively inconclusive.

64 An eminent sociologist of science who is the father of Robert C. Merton, an economist with distinguished contributions in finance.

The contributions of the paper are twofold. First, it extends and combines the theoretical literature on communication and the empirical literature on publication bias: it builds on the former by identifying the implications of collective coarse information transmission for reporting bias, and extends the latter by offering a rational foundation for the publication selection process and directly testing its predictions with data. Second, given that the model offers alternative models of publication selection, it proposes a novel, simple "stem-based" bias correction method based on a prediction supported by the various models of publication bias commonly used.

References

AllTrials. 2016. "UN Calls for Global Action on Clinical Trial Transparency." http://www.alltrials.net/news/un-calls-for-global-action-on-clinical-trial-transparency/. Action to Control Cardiovascular Risk in Diabetes Study Group, Hertzel C. Gerstein, Michael E. Miller, Robert P. Byington, David C. Goff, J. Thomas Bigger, John B. Buse, et al. 2008. "Effects of Intensive Glucose Lowering in Type 2 Diabetes." The New England Journal of Medicine 358 (24): 2545–59. Ashenfelter, Orley, Colm Harmon, and Hessel Oosterbeek. 1999. "A Review of Estimates of the Schooling/Earnings Relationship, with Tests for Publication Bias." Labour Economics 6 (4): 453–470. Austen-Smith, David. 1993. "Interested Experts and Policy Advice: Multiple Referrals under Open Rule." Games and Economic Behavior 5 (1): 3–43. Austen-Smith, David, and Jeffrey S. Banks. 1996. "Information Aggregation, Rationality, and the Condorcet Jury Theorem." American Political Science Review 90 (1): 34–45. Banerjee, Abhijit, and Rohini Somanathan. 2001.
“A Simple Model of Voice.” The Quarterly Journal of Economics 116 (1): 189–227. Banerjee, Abhijit, Selvan Kumar, Rohini Pande, and Felix Su. 2011. “Do Informed Voters Make Better Choices? Experimental Evidence from Urban India.” Harvard University mimeo. Barth, Jürgen, Thomas Munder, Heike Gerger, Eveline Nüesch, Sven Trelle, Hansjörg Znoj, Peter Jüni, and Pim Cuijpers. 2013. “Comparative Efficacy of Seven Psychotherapeutic Interventions for Patients with Depression: A Network Meta-Analysis.” PLOS Med 10 (5): e1001454. Bastian, Hilda, Paul Glasziou, and Iain Chalmers. 2010. “Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up?” PLoS Medicine 7 (9): e1000326. Batt, Rosemary. 1999. “Work Organization, Technology, and Performance in Customer Service and Sales.” Industrial & Labor Relations Review 52 (4): 539–564. Battaglini, Marco. 2002. “Multiple Referrals and Multidimensional Cheap Talk.” Econometrica 70 (4): 1379–1401. Battaglini, Marco, Rebecca B Morton, and Thomas R. Palfrey. 2010. “The Swing Voter’s Curse in the Laboratory.” The Review of Economic Studies 77 (1): 61–89. Battaglini, Marco, Ernest K. Lai, Wooyoung Lim, and Joseph Tao-yi Wang. 2016. “The Informational Theory of Legislative Committees: An Experimental Analysis.” Working Paper Begg, Colin B., and Madhuchhanda Mazumdar. 1994. “Operating Characteristics of a Rank Correlation Test for Publication Bias.” Biometrics 50 (4): 1088. Blume, Andreas, and Oliver Board. 2013. “Language Barriers.” Econometrica 81 (2): 781–812. Boccia, Stefania, Kenneth J. Rothman, Nikola Panic, Maria Elena Flacco, Annalisa Rosso, Roberta Pastorino, Lamberto Manzoli, et al. 2016. “Registration Practices for Observa54 tional Studies on ClinicalTrials.gov Indicated Low Adherence.” Journal of Clinical Epidemiology 70 (February): 176–82. Boussageon, Rémy, Theodora Bejan-Angoulvant, Mitra Saadatian-Elahi, Sandrine Lafont, Claire Bergeonneau, Behrouz Kassaï, Sylvie Erpeldinger, James M. Wright, François Gueyffier, and Catherine Cornu. 2011. “Effect of Intensive Glucose Lowering Treatment on All Cause Mortality, Cardiovascular Death, and Microvascular Events in Type 2 Diabetes: Meta-Analysis of Randomised Controlled Trials.” BMJ 343 (July): d4169. Brodeur, Abel, Mathias Lé, Marc Sangnier, and Yanos Zylberberg. 2016. “Star Wars: The Empirics Strike Back.” American Economic Journal: Applied Economics 8 (1): 1–32. Camerer, Colin, Linda Babcock, George Loewenstein, and Richard Thaler. 1997. “Labor Supply of New York City Cabdrivers: One Day at a Time.” The Quarterly Journal of Economics 112 (2): 407–41. Card, David, and Alan B. Krueger. 1995. “Time-Series Minimum-Wage Studies: A MetaAnalysis.” The American Economic Review 85 (2): 238–243. Carey, Benedict. 2008. “Antidepressant Studies Unpublished.” The New York Times, January 17. http://www.nytimes.com/2008/01/17/health/17depress.html. Casey, Katherine, Rachel Glennerster, and Edward Miguel. 2012. “Reshaping Institutions: Evidence on Aid Impacts Using a Preanalysis Plan*.” The Quarterly Journal of Economics 127 (4): 1755–1812. Chalmers l. 1990. “Underreporting Research Is Scientific Misconduct.” JAMA 263 (10): 1405–8. Copas, J. B., and H. G. Li. 1997. “Inference for Non-Random Samples.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 59 (1): 55–95. Crawford, Vincent. 1998. “A Survey of Experiments on Communication via Cheap Talk.” Journal of Economic Theory 78 (2): 286–98. Crawford, Vincent P., and Joel Sobel. 1982. 
“Strategic Information Transmission.” Econometrica 50 (6): 1431–51. de Winter, Joost, and Riender Happee. 2013. “Why Selective Publication of Statistically Significant Results Can Be Effective.” PloS One 8 (6): e66463. Davis, Lucas W. 2008. “The Effect of Driving Restrictions on Air Quality in Mexico City.” Journal of Political Economy 116 (1): 38–81. Dekel, Eddie, and Michele Piccione. 2000. “Sequential Voting Procedures in Symmetric Binary Elections.” Journal of Political Economy 108 (1): 34–55. Dewatripont, Mathias, and Jean Tirole. 1999. “Advocates.” Journal of Political Economy 107 (1): 1–39. Dickhaut, John W., Kevin A. McCabe, and Arijit Mukherji. 1995. “An Experimental Study of Strategic Information Transmission.” Economic Theory 6 (3): 389–403. Doucouliagos, Chris, and T.d. Stanley. 2013. “Are All Economic Facts Greatly Exaggerated? Theory Competition and Selectivity.” Journal of Economic Surveys 27 (2): 316–39. 55 Doucouliagos, Christos, and Patrice Laroche. 2003. “What Do Unions Do to Productivity? A Meta-Analysis.” Industrial Relations: A Journal of Economy and Society 42 (4): 650–691. Duval, Sue, and Richard Tweedie. 2000. “Trim and Fill: A Simple Funnel-Plot–based Method of Testing and Adjusting for Publication Bias in Meta-Analysis.” Biometrics 56 (2): 455–463. Egger, Matthias, George Davey Smith, Martin Schneider, and Christoph Minder. 1997. “Bias in Meta-Analysis Detected by a Simple, Graphical Test.” BMJ 315 (7109): 629–634. Every-Palmer, Susanna, and Jeremy Howick. 2014. “How Evidence-Based Medicine Is Failing due to Biased Trials and Selective Publication: EBM Fails due to Biased Trials and Selective Publication.” Journal of Evaluation in Clinical Practice 20 (6): 908–14. Fafchamps, Marcel, and Julien Labonne. 2016. “Using Split Samples to Improve Inference about Causal Effects.” National Bureau of Economic Research Working Paper. Feddersen, Timothy J. 2004. “Rational Choice Theory and the Paradox of Not Voting.” Journal of Economic Perspectives 18 (1): 99–112. Feddersen, Timothy J., and Wolfgang Pesendorfer. 1996. “The Swing Voter’s Curse.” The American Economic Review, 408–424. ———. 1997. “Voting Behavior and Information Aggregation in Elections With Private Information.” Econometrica 65 (5): 1029. ———. 1998. “Convicting the Innocent: The Inferiority of Unanimous Jury Verdicts under Strategic Voting.” American Political Science Review 92 (1): 23–35. ———. 1999. “Abstention in Elections with Asymmetric Information and Diverse Preferences.” American Political Science Review 93 (2): 381–98. Forbes, Kristin J. 2000. “A Reassessment of the Relationship between Inequality and Growth.” American Economic Review 90 (4): 869–87. Franco, Annie, Neil Malhotra, and Gabor Simonovits. 2014. “Publication Bias in the Social Sciences: Unlocking the File Drawer.” Science 345 (6203): 1502–5. Freeman, Richard. 1988. “Union Density and Economic Performance: An Analysis of US States.” European Economic Review 32 (2–3): 707–716. Fryer, Roland, and Matthew O. Jackson. 2008. “A Categorical Model of Cognition and Biased Decision Making.” The BE Journal of Theoretical Economics 8 (1). Gabaix, Xavier. 2014. “A Sparsity-Based Model of Bounded Rationality.” The Quarterly Journal of Economics 129 (4): 1661–1710. Garner, Paul, David Taylor-Robinson, and Harshpal Singh Sachdev. 2013. “DEVTA: Results from the Biggest Clinical Trial Ever.” The Lancet 381 (9876): 1439–41. Gentzkow, Matthew, and Jesse Shapiro. 2006. “Media Bias and Reputation.” Journal of Political Economy, 114(2). 
Gentzkow, Matthew, Jesse Shapiro, and Daniel Stone 2016. “Chapter 14 - Media Bias in the Marketplace: Theory.” In Handbook of Media Economics, edited by Joel Waldfogel and David Strömberg Simon P. Anderson, 1:647–67. 56 Gerber, Alan. 2008. “Do Statistical Reporting Standards Affect What Is Published? Publication Bias in Two Leading Political Science Journals.” Quarterly Journal of Political Science 3 (3): 313–26. Gerber, Alan S., and Neil Malhotra. 2008. “Publication Bias in Empirical Sociological Research: Do Arbitrary Significance Levels Distort Published Results?” Sociological Methods & Research, June. Gilligan, Thomas W., and Keith Krehbiel. 1987. “Collective Decisionmaking and Standing Committees: An Informational Rationale for Restrictive Amendment Procedures.” Journal of Law, Economics, & Organization 3 (2): 287–335. Glaeser, Edward L. 2008. “Researcher Incentives and Empirical Methods.” in The Foundations of Positive and Normative Economics, Andrew Caplin and Andrew Schotter Glazer, Jacob, and Ariel Rubinstein. 2004. “On Optimal Rules of Persuasion.” Econometrica 72 (6): 1715–36. Goldacre, Ben. 2010. Bad Science: Quacks, Hacks, and Big Pharma Flacks. Reprint edition. New York: Farrar, Straus and Giroux. Hanna, Rema, Esther Duflo, and Michael Greenstone. 2016. “Up in Smoke: The Influence of Household Behavior on the Long-Run Impact of Improved Cooking Stoves.” American Economic Journal: Economic Policy 8 (1): 80–114. Hansen, Lars Peter, and Kenneth J. Singleton. 1982. “Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models.” Econometrica 50 (5): 1269. Havránek, Tomáš. 2015. “Measuring Intertemporal Substitution: The Importance of Method Choices and Selective Reporting.” Journal of the European Economic Association 13 (6): 1180–1204. Havranek, Tomas, Zuzana Irsova, Karel Janda, and David Zilberman. 2015. “Selective Reporting and the Social Cost of Carbon.” Energy Economics 51 (September): 394–406. Hay, Michael, David W. Thomas, John L. Craighead, Celia Economides, and Jesse Rosenthal. 2014. “Clinical Development Success Rates for Investigational Drugs.” Nature Biotechnology 32 (1): 40–51. Hedges, Larry V. 1992. “Modeling Publication Selection Effects in Meta-Analysis.” Statistical Science, 246–255. Henry, Emeric. 2009. “Strategic Disclosure of Research Results: The Cost of Proving Your Honesty.” The Economic Journal 119 (539): 1036–64. Henry, Emeric, and Marco Ottaviani. 2014. “Research and the Approval Process.” mimeo. Higgins, Julian P. T., and Sally Green. 2011. Cochrane Handbook for Systematic Reviews of Interventions. John Wiley & Sons. Hunter, John E., and Frank L. Schmidt. 2004. Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. 2nd edition. Thousand Oaks, Calif: SAGE Publications, Inc. ICMJE. 2016. “Recommendations for the Conduct, Reporting, Editing, and Publication of 57 Scholarly Work in Medical Journals.” http://www.icmje.org/recommendations/. Imbens, Guido, and Karthik Kalyanaraman. 2011. “Optimal Bandwidth Choice for the Regression Discontinuity Estimator.” The Review of Economic Studies Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): e124. John, K. Leslie, George Loewenstein, and Drazen Prelec. 2012. “Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling.” Psychological Science 23 (5): 524–32. Kamenica, Emir, and Matthew Gentzkow. 2011. “Bayesian Persuasion.” American Economic Review 101 (6): 2590–2615. Knutti, Reto. 2010. 
“The End of Model Democracy?: An Editorial Comment.” Climatic Change 102 (3–4): 395–404. Krishna, Vijay, and John Morgan. 2001. “A Model of Expertise.” The Quarterly Journal of Economics 116 (2): 747–75. ———. 2004. “The Art of Conversation: Eliciting Information from Experts through MultiStage Communication.” Journal of Economic Theory 117 (2): 147–79. Leamer, Edward E. 1978. Specification Searches: Ad Hoc Inference with Nonexperimental Data. 1 edition. New York: Wiley. Li, Hao, Sherwin Rosen, and Wing Suen. 2001. “Conflicts and Common Interests in Committees.” American Economic Review 91 (5): 1478–97. Libgober, Jonathan. 2015. “False Positives in Scientific Research.” Harvard University mimeo. Martinson, Brian C., Melissa S. Anderson, and Raymond de Vries. 2005. “Scientists Behaving Badly.” Nature 435 (7043): 737–38. McCrary, Justin, Garret Christensen, and Daniele Fanelli. 2016. “Conservative Tests under Satisficing Models of Publication Bias.” Edited by Daniele Marinazzo. PLOS ONE 11 (2): e0149590. Merton, Robert K. 1942 “The Normative Structure of Science", in the Sociology of Science: Theoretical and Empirical Investigations, Chicago: University of Chicago Press Miguel, Edward, Colin Camerer, Katherine Casey, Joshua Cohen, Kevin M. Esterling, Alan Gerber, Rachel Glennerster, et al. 2014. “Promoting Transparency in Social Science Research.” Science 343 (6166): 30–31. Morgan, John, and Phillip C. Stocken. 2008. “Information Aggregation in Polls.” American Economic Review 98 (3): 864–96. Morris, Stephen. 2001. “Political Correctness.” Journal of Political Economy 109 (2): 231–265. Mullainathan, Sendhil. 2000. “Thinking Through Categories.” MIT mimeo Munafo, M. R., I. J. Matheson, and J. Flint. 2007. “Association of the DRD2 Gene Taq1A Polymorphism and Alcoholism: A Meta-Analysis of Case–control Studies and Evidence of Publication Bias.” Molecular Psychiatry 12 (5): 454–461. The Editors of Epidemiology, 2010. “The Registration of Observational Studies—When Metaphors 58 Go Bad:” Epidemiology, July, 1. Newey, Whitney K., and Daniel McFadden. 1994. “Chapter 36 Large Sample Estimation and Hypothesis Testing.” in Handbook of Econometrics, 4:2111–2245. Olken, Benjamin A. 2015. “Promises and Perils of Pre-Analysis Plans.” Journal of Economic Perspectives 29 (3): 61–80. Porter, Theodore M. 1996. Trust in Numbers. Reprint edition. Princeton, N.J.: Princeton University Press. Puglisi, Riccardo, and James M. Snyder Jr. 2016. “Chapter 15 - Empirical Studies of Media Bias.” In Handbook of Media Economics, edited by Joel Waldfogel and David Strömberg Simon P. Anderson, 1:647–67. Reinhart, Alex. 2015. Statistics Done Wrong: The Woefully Complete Guide. 1 edition. San Francisco: No Starch Press. Rosenthal, Robert. 1979. “The File Drawer Problem and Tolerance for Null Results.” Psychological Bulletin 86 (3): 638. Salsburg, David. 2002. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. 60717th edition. New York: Holt Paperbacks. Sánchez-Pagés, Santiago, and Marc Vorsatz. 2007. “An Experimental Study of Truth-Telling in a Sender–receiver Game.” Games and Economic Behavior 61 (1): 86–112. Schwartzstein, Joshua. 2014. “Selective Attention and Learning.” Journal of the European Economic Association 12 (6): 1423–52. Shapiro, Jesse M. 2016. “Special Interests and the Media: Theory and an Application to Climate Change.” National Bureau of Economic Research Working Paper Shibamura, Ryo. 
2004 “Fisher no tokei riron – suisoku tokeigaku no keisei to sono shakaiteki haikei” (Statistical Theory of Fisher: Evolution of Inferential Statistics and its Social Background, in Japanese), Kyushu University Press Siegfried, John J. 2012. “Minutes of the Meeting of the Executive Committee Chicago, IL January 5, 2012.” American Economic Review 102 (3): 645–52. Simonsohn, Uri, Leif D. Nelson, and Joseph P. Simmons. 2014. “P-Curve: A Key to the File-Drawer.” Journal of Experimental Psychology: General 143 (2): 534–47. Sobel, Joel. 2010. “Giving and Receiving Advice.” In Econometric Society 10th World Congress. Stanley T.D. 2005 “Beyond Publication Bias” in “Meta-Regression Analysis: Issues of Publication Bias in Economics edited” by Collin Roberts and T. D. Stanley Stanley, T. D., Stephen B. Jarrell, and Hristos Doucouliagos. 2010. “Could It Be Better to Discard 90% of the Data? A Statistical Paradox.” The American Statistician 64 (1): 70–77. Suen, Wing. 2004. “The Self-Perpetuation of Biased Beliefs.” The Economic Journal 114 (495): 377–396. The Consensus Project. 2014. “Why we need to talk about the scientific consensus on climate change” 59 http://www.skepticalscience.com/news.php?f=why-we-need-to-talk-about-climate-consensus Turner, Erick H., Annette M. Matthews, Eftihia Linardatos, Robert A. Tell, and Robert Rosenthal. 2008. “Selective Publication of Antidepressant Trials and Its Influence on Apparent Efficacy.” New England Journal of Medicine 358 (3): 252–60. van Assen, Marcel A. L. M., Robbie C. M. van Aert, Michèle B. Nuijten, and Jelte M. Wicherts. 2014. “Why Publishing Everything Is More Effective than Selective Publishing of Statistically Significant Results.” Edited by K. Brad Wray. PLoS ONE 9 (1): e84896. van Assen, Marcel A. L. M., Robbie C. M. van Aert, and Jelte M. Wicherts. 2015. “MetaAnalysis Using Effect Size Distributions of Only Statistically Significant Studies.” Psychological Methods 20 (3): 293–309. Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA’s Statement on P-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. Woolston, Chris. 2015. “Psychology Journal Bans P Values.” Nature 519 (7541): 9–9. Young, Neal S., John PA Ioannidis, and Omar Al-Ubaydli. 2008. “Why Current Publication Practices May Distort Science.” PLoS Medicine 5 (10): e201. Ziliak, Stephen T., and Deirdre N. McCloskey. 2008. The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. 1st edition. Ann Arbor: University of Michigan Press. 60 Appendices Appendix A will present analytical proofs and examples; Appendix B will contain computational results; and Appendix C relates to the proofs and visualizations in empirics section. Appendix A. A1 Lemma 1 Monotonicity Step 1. Recall that, under the full information, the posterior belief is given by the Bayes rule: E [b | βi , β−i ] = 1 σ2 (βi + β−i ) 1 + σ22 σ2 b Take some arbitrary decision rules of the policymaker π (mi , m−i ), which may not be monotone in the messages. Take some decision rule of another researcher m−i = 1 β−i ∈ B11 , B21 , ... β−i ∈ B1∅ , B2∅ , ... β−i ∈ B10 , B20 , ... ∅ 0 , where Bkm denote the kth interval of β−i realization such that message m is sent. 
Note that any mixed strategies on some intervals can be arbitrarily closely approximated by taking the pure strategies on more finely partitioned intervals.65 By researcher i’s utility maximization, the strategy of the researcher i is given by the potentially multiple thresholds β and β that satisfy the following indifference conditions with EW (·) denoting the expected welfare: EW mi = 1, βi = β = EW mi = ∅, βi = β EW mi = 0, βi = β = EW mi = ∅, βi = β , where by definition EW (mi , βi ) = X P (m−i |βi ) π (mi , m−i ) m−i 1 σ2 m−i E β−i |β−i ∈ Bk 12 + 22 σ σb , βi + βi − c Substituting and rearranging, X 1 σ2 h h m−i i i E β−i |β−i ∈ Bk ,β + β − c = 0 12 + 22 σ σb 1 h h i i X 2 m P m−i |β [π (∅, m−i ) − π (0, m−i )] 1 σ 2 E β−i |β−i ∈ Bk −i , β + β − c = 0 2 + 2 m P m−i |β [π (1, m−i ) − π (∅, m−i )] (28) m−i −i (29) σ σb The remainder of the proof will show that these conditions are satisfied for some unique values of β and β in any responsive and informative equilibria so that the researcher i’s strategy can be described by the threshold rule as in 65 This relates to the Riemann integral. Suppose another researcher takes some mixed strategy P (m−i = 1) ∈ (0, 1) and P (m−i = 0) ∈ (0, 1) on some interval B. Then, one can rewrite this strategy in terms of pure strategies that maintains the relevant conditional expectations the same and its derivative arbitrarily closely. 61 Lemma 1. Step 2. Let us use the conditions of responsiveness and informativeness of messages to ensure that these indifference conditions are well-defined so that it does not hold for all values of βi and it holds for at least some values of βi . First, responsiveness to both n0 and n1 ensures that π (1, m−i ) 6= π (∅, m−i ) and yet π (0, m−i ) 6= π (∅, m−i ) for some m−i . This avoids cases in which, even if the messages are informative, it is not responsive because E [b|m] < c for all messages m. Second, informativeness of equilibrium, combined with anonymity and researchers’ utility maximization implies Lemma 1.(ii) monotonicity of π. Towards contradiction, suppose there exist messages mi , m0i and m−i , m0−i such that π (mi , m−i ) > π (m0i , m−i ) and π mi , m0−i > π m0i , m0−i . For this strategy to be consistent with utility maximization of policymaker, her posterior belief must satisfy E [b|mi , m−i ] ≥ E [b|m0i , m−i ] and E b|mi , m0−i ≤ E b|m0i , m0−i in equilibrium. For this to hold at the same time, the unconditional expectations must equal (E [βi |mi ] = 1 σ2 1 + 22 σ σ2 b E [βi |m0i ]) due to the Bayes’ rule E [b|mi , m−i ] = {E [βi |mi , m−i ] + E [β−i |mi , m−i ]} and the monotonicity of E [β−i |βi , m−i ] in βi as will be shown in Step 3. However, this contradicts the informativeness of all messages. Thus, π (mi , m−i ) > π (m0i , m−i ) implies π mi , m0−i ≥ π m0i , m0−i and we focus on the equilibrium in which the meaning of message is consistent66 : π (1, m−i ) > π (∅, m−i ) > π (0, m−i ). This avoids a responsive yet uninformative equilibrium due to coordination failure: suppose another researcher randomizes the messages with equal probabilities so that i h m−i E β−i |β−i ∈ Bk ,β = σ02 2 σb +σ 2 β and P m−i |β = 13 . If for example π (1, ∅) − π (∅, ∅) = − (π (1, 1) − π (∅, 1)) and π (1, 0) = π (∅, 0), then the first condition can be satisfied for any value of β. Thus, researcher i would also be willing to randomize with P m−i |β = 13 , too. Since both researchers randomize, the equilibrium is uninformative. 
Note that the informativeness of equilibrium implies Lemma 1 (ii) because the mixed strategy π ∈ (0, 1) can occur only if some E [b | mi , m−i ] = c. Since the ranking of posterior is the strict inequality due to informativeness, and by the utility maximization of the receiver, π ∗ (n1 , n0 ) > 0 ⇒ π ∗ (n1 + k, n0 ) = 1 π ∗ (n1 , n0 ) < 1 ⇒ π ∗ (n1 , n0 + k) = 0 Step 3. We show that any conditional expectation of β−i is weakly increasing in βi : m−i ∂E β−i |β−i ∈ Bk 0≤ ∂βi m−i If Bk m−i Bk m−i = βk m−i is degenerate, then E β−i |β−i ∈ Bk , βi , βi is constant. Otherwise, let us write each interval as: m , β k −i . Because of bivariate normal distribution, conditional distribution is given by β−i ∼ N ρβi , σ̃ 2 , where correlation coefficient ρ ≡ σb2 , σb2 +σ 2 and variance σ̃ 2 ≡ 1 − ρ2 σ 2 + σb2 . Using the formula of truncated normal distribution, PK k=1 E [β−i |β−i ∈ P ivj , βi ] = i PK h h Φ β k − Φ βk k=1 66 φ(β k )−φ(βk ) ρβi − σ̃ Φ β −Φ β ( k) ( k) Φ β k − Φ βk i That is, mi = 1 means high signal and mi = 0 means low signal. One could construct an equilibrium in which the meaning of messages differs, but this will be qualitatively equivalent. 62 , where φ is the probability density function and Φ is the cumulative density function of N ρβi , σ̃ 2 . m First, note that the relative increase in probability of β−i ∈ βk , β k is increasing in the value of E β−i |β−i ∈ Bk −i , βi . To see this, note that h ∂ Φ β k − Φ βk i ∂βi 1 ρ φ β k − φ βk h i = − σ̃ Φ β − Φ βk Φ β k − Φ βk k h i φ(β )−φ(βk ) − Φ β k −Φ β is increasing in E β−i |β−i ∈ βk , β k , βi because of the formula for mean of truncated normal ( k) ( k) distribution. Therefore, as βi increases, the relative likelihood of taking β−i occuring in intervals with high values of βi also increases. Second, the expected value of the truncated normal distribution is also increasing in βi .67 For notational ease, denote the mean by µ = ρβi . Then, by chain rule h ∂E β|β ≤ β ≤ β, βi i h ∂E β|β ≤ β ≤ β, µ = ∂βi distribution is given ρ ∂µ ∂E[β|β≤β≤β,βi ] ∂µ Thus, it is necessary and sufficient to show i ≥ 0. The expression of the mean of truncated normal by68 i h E β|β ≤ β ≤ β = µ − σ̃ φ−φ β Φ−Φ 1 β−µ φ dβ σ̃F (µ) σ̃ β = −β , where F (µ) ≡ Φ − Φ. By chain rule ∂Etr (β) = ∂µ β 1 β−µ φ0 σ̃F (µ) σ̃ β −β First, the normal density satisfies φ0 φ−φ σ̃F (µ) F (µ) = − ∂ ln∂µ and ∂Etr (β) = ∂µ β 1 σ̃ 2 1 σ̃ 2 = −β β σ̃F1(µ) φ β−µ σ̃ β−µ σ̃ =− φ−φ 1 1 dβ − − σ̃ F (µ) σ̃ − β−µ σ̃ φ β−µ σ̃ β 1 β−µ φ dβ σ̃F (µ) σ̃ β −β . Second, focus on the second term and see that dβ = Etr (β). Therefore, β 1 β−µ ∂ ln F (µ) β (β − µ) φ dβ − Etr (β) σ̃F (µ) σ̃ ∂µ −β " β 1 β−µ β2 φ dβ − µ σ̃F (µ) σ̃ −β β # 1 β−µ Etr (β) − µ β φ dβ − Etr (β) σ̃ σ̃ 2 −β σ̃F (µ) i 1 h 2 β − µE (β) − (E (β) − µ) E (β) = E tr tr tr tr σ̃ 2 h i 1 2 2 = E β − E (β) tr tr σ̃ 2 V artr (β) = >0 σ̃ 2 Step 4. Consider the two indifference conditions in (28, 29). Since c is finite and limβi →∞ (βi +E [β−i |β−i ∈ P iv1 , βi ]) = ∞ as βi spans the entire real line due to normal distribution, and (βi + E [β−i |β−i ∈ P iv1 , βi ]) is monotone in βi , there exists a threshold β i such that m∗i = 1 if βi > β i and only if βi ≥ β i . Similarly, there exists βi such that m∗i = 0 if βi < βi and only if βi ≤ βi . 67 68 I thankAlecos Papadopoulos for this step of proof. φ = φ β−µ σ̃ , φ = φ β−µ σ̃ are normal probability density functions, and Φ = Φ distribution functions. 
63 β−µ σ̃ , Φ = Φ β−µ σ̃ are normal cumulative A2 Counterexample to monotonicity Note that monotonicity may not hold under any non-degenerate G (σ) even conditional on the supermajoritarian rule of the policymaker and monotone rule of another researcher. To see this, consider a bivariate distribution ( σ−i = σH σL g 1 − g, for some g ∈ (0, 1). For some realization of σi , the indifference condition can be written as P σH |m−i = ∅, β h i β σ2 + σ12 E β−i |m−i = ∅, σH , β i H 1 σb2 1 σi2 + + 1 2 σH + P σL |m−i = ∅, β by the law of iterated expectations. Even though both β σi2 h i β σ2 + σ12 E β−i |m−i = ∅, σL , β i L 1 σb2 h i + + σ12 E β−i |m−i = ∅, β and H 1 σi2 β σi2 + 1 2 σL h =0 + σ12 E β−i |m−i = ∅, β i H are increasing in β, P σH |m−i = ∅, β and P σH |m−i = ∅, β could be changing in such a way that the entire left hand side may be decreasing in some β. By Bayes’ rule, P σH |m−i = ∅, β = gP m−i = ∅|σH , β P m−i = ∅, β ; P σL |m−i = ∅, β = (1 − g) P m−i = ∅|σL , β P m−i = ∅, β Consider an example in which another researcher −i takes the strategies m−i (β−i , σH ) = ∅ ⇔ β−i ∈ (β H , β H + ∆] s m−i (β−i , σL ) = ∅ ⇔ β−i ∈ (β L , β L + r for some small ∆ and 2 σb2 +σL 2 2 σb +σH σ02 + σL2 2 ∆] σ02 + σH is a normalization. Then, h i h i E β−i |m−i = ∅, σH , β ' β H ; E β−i |m−i = ∅, σL , β ' β L β H − ρ σσHi β β L − ρ σσLi β ; P m−i = ∅|σL , β = φ q P m−i = ∅|σH , β = φ q 2 2 2 2 σb + σH σb + σL , where φ (·) is the density of standard normal distribution. By substitution, the indifference condition is approximately g̃H φ , where g̃H = βH − g 1 + 12 + 12 σ2 σ σ 0 i H q σb2 2 2 σb +σH " σH σi β 2 σb2 + σH and g̃L = # β β + H + g̃L φ 2 σi2 σH 1−g 1 + 12 + 12 σ2 σ σ 0 i L βL − q σb2 2 2 σb +σL σb2 + σL2 . Rearranging the condition, 64 " σL σi β # β β + L =0 σi2 σL2 − β g̃H σi2 + βH 2 σH g̃L + βL 2 σL β σi2 βL− √ φ = σ2 σL b β σ 2 +σ 2 σi L b 2 2 σb +σL βH − σ2 σH b σ 2 +σ 2 σi H b 2 σb2 +σH √ φ β Suppose β L = 0 for example. Then, − g̃H g̃L 1+ σi2 β H 2 β σH s ! = 2 σb2 + σH σb2 + σL2 2 2 2 σb2 σL β − 2 σb 2 σH β − β σb +σH σi σb2 +σL2 σi H exp q − q 2 σb2 + σH σb2 + σL2 The monotonicity implies that this condition is satisfied only at a unique value of β. However, the left hand side of the condition is increasing in β, and on the other hand, the right hand side is non-monotone in β. Therefore, for some values of parameters, this condition can be satisfied at multiple values of β, contradicting the monotonicity assumption. That the conditional expectation may be decreasing in one’s own signal could be potentially important so that the threshold rule may no longer be an equilibrium. A3 Proposition 1 (i) While the proof of the omission (8) and asymmetry (9) is contained in the main text, there are two additional steps to proving the Proposition 1 (i) fully. First, one has to verify that, given these thresholds, the supermajoritarian rule (7) is the policymaker’s optimal strategy. This holds because the researchers’ strategy must satisfy the indifference condition at the margin whereas the policymaker assess whether the condition holds on average. For example, note that E [b | m1 = 1, m2 = ∅] = > 1 σ2 1 σb2 h 2 σ2 + 1 σ2 1 σb2 h n 2 σ2 + i E β1 + E β2 | β > β2 ≥ β, β1 |β1 ≥ β h β + E β2 | β > β2 ≥ β, β1 = β io i = 0, where the last equality holds due to the researchers’ indifference condition. Analogous arguments also hold for other realizations of messages, showing (7) is an equilibrium. 
Second, the following algebraic argument formally shows that the asymmetry in (9) leads to the upward bias (10): E [βi |mi 6= ∅] = P [mi = 1|mi 6= ∅] E [βi |mi = 1] + P [mi = 0|mi 6= ∅] E [βi |mi = 0] 1−Φ β = q 1−Φ β +Φ β = q σ 2 + σb2 σ 2 + σb2 φ β − 1−Φ β Φ β q 1−Φ β +Φ β σ 2 + σb2 φ β Φ β φ β −φ β >0 1−Φ β +Φ β Finally, to confirm the existence and uniqueness of the symmetric equilibrium, one can check the slope of the combined best response functions using the best response functions. In particular, it is sufficient to show that 65 BR ∂β i (β −i ) ∂β −i < 0. To see this, given β −i , consider the system of best response functions h βi β−i = −E β−i |β −i > β−i ≥ β −i , βi = βi h i i β−i βi = −E βi |βi ≥ β i , β−i = β−i , which are the best response functions (11) and (12) where the thresholds are not restricted to be symmetric between the two researchers. Note that, by the property of truncated distribution, ∂βi β−i , β−i ∂βi ∈ (−1, 0) and ∂β−i βi ∂βi ∈ (−1, 0) everywhere in the range. Thus, the unique solution exists. Consider a small increase in β −i , which shifts βi β−i to be lower. Thus, tracing the curve β−i βi , the equilibrium value of βi also decreases because ∂βi (β−i ) ∂βi BR < 0. This shows ∂β i (β −i ) ∂β −i < 0. (ii) Following arguments show the stability and optimality respectively: Stability. This paper focuses on local stability among a number of different notions of stability in game theory. In particular, here, consider the equilibrium as a tuple of policymaker strategy and researcher’s monotone strategies n o − ∈ R6+4 so that the initial position E ≡ π (n) , β , β , β , β . Define the initial disturbance of strategies as → i i j j is given by the sum of equilibrium strategies and disturbances that respects the range π ∈ [0, 1]. Consider the following non-equilibrium adjustment process of fictitious play repeated dynamically t = 1, 2, ... In each period, first, the policymaker adjusts to the best response to the researcher’s best response inherited from the previous period.69 Then, one researcher (say, i) adjusts his strategy given the policymaker’s updated strategy and another researcher j’s inherited strategy. Finally, the researcher j adjusts his strategy given policymaker and researcher i’s strategy. Let Et denote the strategy after the adjustment in period t. Adopting the formulation seen in Defition 6.1 in Chapter 1 of Fudenberg and Levine (1998), I say that the equilibrium E is locally stable if for every dˆ > 0, there exists d > 0 such that, given any disturbance such that Euclidian distance d () < d, the equilibrium stays in the neighborhood of the ˆ I say that the equilibrium is locally unstable if it is not locally stable. equilibrium (i.e. d (E − E∞ ) < d). This definition is similar to the Cournot stability, one of the oldest and simplest notion of stability, albeit in the context with private signals. The equilibrium characterized in Proposition 1 also satisfies the stability in the foreward induction sense (e.g. those given by Kohlberg and Merten, and Kreps and Cho). Because it is the optimal equilibrium, players have no incentive to deviate. The advantage of the local stability over these concepts is that it ensures that the hyper sophisticated coordination is not necessary to sustain the equilibrium; even if the play may deviate from the equilibrium slightly, the reasonable dynamic adjustment will remain near the equilibrium. On the other hand, there are two limitations of the definition above. 
First, to be fully formal, one may want to consider a continuous perturbation of the strategies that map the signals βi × σi ∈ R2 into probability distribution over messages. However, this makes the computation of distance to be analytically difficult; thus, this approach focuses only on the deviation within the threshold rule. Second, the above fictitious play is silent on the learning process of others’ strategies and the justification of myopic plays. Immediate and complete learning of each other’s strategy and myopic adjustment process are the assumptions that help make the investigation of stability tractable; with high-dimensional strategies and more than two players, stability concept becomes difficult to define and verify; most literature on voting games also have remained silent on 69 It is possible to allow for the policymaker to move later because the researchers’ strategies are continuous in policymaker’s strategy. This set-up simplifies the steps of proof. 66 the stability property. The contribution of Esponza and Pouzo (2012) showed, when the voters learn about others’ strategies when they are pivotal, the mixed strategy Nash equilibrium is unstable. Their approach was tractable thanks to the set-up with two states of the world; in the set-up of this paper, the underlying state is continuous. While the models with finite number of signals typically have symmetric equilibrium that is unstable, the section 7.1 of their paper adds that the symmetric equilibrium is stable when the signal is continuous. In this sense, their result of lack of stability in symmetric equilibrium of information aggregation models does not suggest lack of stability in my set-up. The full analysis with specifications on the feedback process is beyond the scope of this current paper. The following argument shows that the equilibrium in Proposition 1 is locally stable. Since the policymaker adjusts first, it is sufficient to consider the noise = i , i , j , j so that the initial strategies are given by {β i + i , βi + i ,β j + j , βj + j }. Note that we focus on the symmetric equilibrium β i = β j = β and βi = βj = β. Step 1. Before consideringi the adjustment process, it is useful to make the following observations. Denote h B β, β, µ = E β|β ≤ β ≤ β, µ to be the expected value of truncated normal distribution with mean µ. Then, for small ∆, (30) (31) (32) B β + ∆, β, µ ∈ B β, β, µ , B β, β, µ + ∆ B β, β + ∆, µ ∈ B β, β, µ , B β, β, µ + ∆ B β + ∆, β + ∆, µ ∈ B β, β, µ , B β, β, µ + ∆ (30) and (31) are due to the observation that the mean of truncated distribution does not change as fast as the boundary changes. (32) is clear by taking the total derivative, combined with the Step 3 in Appendix A1: dB = ∂B ∂B ∂B dβ + dβ + dµ ∂β ∂µ ∂β Suppose we consider dβ = dβ = dµ = ∆. Then, since B β + ∆, β + ∆, µ + ∆ ∈ B β, β, µ + ∆ by definition, 1− Since ∂B ∂µ ∈ (0, 1), ∂B ∂β + ∂B ∂β ∂B ∂B ∂B = + ∂µ ∂β ∂β ∈ (0, 1). Step 2. Consider the adjustment in t = 1. Note that the posterior belief E [b|n] is continuous in the researchers’ thresholds. Since in equilibrium the posterior beliefs were either strictly above or below 0, sufficiently small perturbation does not alter the policymaker’s strategy. Thus, the policymaker returns to playing the supermajoritarian policy rule (7). Consider the researchers’ adjustment. 
First, i’s strategies will be given by βi,1 β j,0 = βi,1 β + j > β − j by the strategic substitutability in (12) and (31); β i,1 β j,0 , βj,0 = β i,1 β + j , β + j > β − j by the strategic substitutability in (11) and (32) for all j > ˆ (j ) with ˆ (j ) < 0. That is, so long as j is not too low, then the above inequality holds due to the continuity of β i,1 β, β with respect to β. 67 Then, j’s strategies will be given by β j,1 β i,1 , βi,1 < β j,1 β − j , β − j < β + j , where the first inequality followed from the strategic substitutability in (11) and the second inequality is due to (32). βj,1 β i,1 < βj,1 β − j < β + j , where the first inequality followed fromthe strategic substitutability in (12) and the second inequality by (31). Step 3. Let us consider t = 2. Since the thresholds are still in the neighrborhood of the equilibrium, for sufficiently small j , the policymaker’s strategy remains to be the supermajoritarian rule. By the argument analogous to t = 1, βi,2 β j,1 > βi,1 β + j > β − j β i,2 β j,1 , βj,1 > β i,1 β + j , β + j > β − j Similarly, repeating the same adjustment, βj,2 β i,2 < βj,2 β − j < β + j β j,2 β i,2 , βi,2 < β j,2 β − j , β − j < β + j Therefore, iterating this process forward, we observe that, for all t ≥ 1, the policymaker’s decision rule is supermajoritarian (7) and βi,t > β − j ; β i,t > β − j βj,t < β + j ; β i,t < β + j Step 4. Given dˆ > 0, we can now construct d > 0 from the above conditions. To be most conservative, suppose q the policymaker strategy and researcher i’s strategies were not disturbed. Then, we can set d = minj 2j + ˆ (j )2 . If the initial perturbation were small enough so that d () < d, then it is guaranteed that the equilibrium will have 68 the distance at most d (E − E∞ ) < dˆ = minj r i h 1 2 × 2j + ˆ (j )2 ). Thus, d = 2− 2 dˆ will satisfy the local stability condition. Note that the proof for the local stability here was more involved than the setting with only 1 dimensional strategy because the strategy was 2-dimensional. Unlike in 1-dimensional case, the Observation 1 does not directly imply the local stability. Optimality. Consider the policymaker’s strategy. Following argument shows that the equilibrium that involves mixed strategies are sub-optimal. Suppose that π (mi , mj ) ∈ (0, 1) for some mi , mj . Consider raising the cut-off to send the message mi marginally. Then, π (mi , mj ) = 1 and expected welfare from the realization mi , mj is positive. By Lamma 1, π (mi , mj − k) = 0 and π (mi − 1, mj ) = 0 prior to perturbation. Because the expected welfare from the message realization mi , mj − k and mi − 1, mj is strictly negative, there exists some small perturbation such that π (mi , mj − k) = 0 and π (mi − 1, mj ) = 0 even after the perturbation. Thus, expected welfare conditional on the message realization mj , mj − k are unchanged. By Lamma 1, π (mi + k, mj ) = 1 and for the diagonal, either interior or corner solutions of π (mi + 1, mj − 1) are possible. If π (mi + 1, mj − 1) is 0 or 1 due to expected welfare strictly greater or less than 0, then the small perturbation does not alter the decision. If π (mi + 1, mj − 1) ∈ (0, 1) then the perturbation turns this into π (mi + 1, mj − 1) = 1. Note nevertheless that, combining the message realizations mi +1, mj and mi +1, mj −1 the expected welfare is unchanged because the increase in welfare in region mi +1, mj −1 is offset completely by the lower probability of message mi + 1, mj realization. 
Since this perturbation improved welfare when mj and kept welfare constant at mj + k and mj − k, it must have improved the overall welfare. Since the welfare must be higher when welfare-maximizing researchers choose the equilibrium thresholds based on such atlernative pure-strategy π than such non-equilibrium deviation, the mixed-strategy equilibria must have been sub-optimal. Note that the equilibria that are not fully informative or not fully responsive cannot be optimal because allowing for responsiveness to additional messages can always improve welfare. There are two fully responsive and fully informative equilibria: one under the supermajoritarian rule a = 1 ⇔ n1 > n0 or the submajoritarian rule a = 1 ⇔ n1 ≥ n0 . Since these equilibria are symmetric with each other around the origin due to c = 0, they attain the same level of welfare. Thus, the equilibrium under a = 1 ⇔ n1 > n0 is an optimal equilibrium. Note that, conditional on the supermajoritarian rule, the researchers’ thresholds are unique due to the Observation 1. The full demonstration of the optimality and stability for the case with N ≥ 3 is beyond the scope of this paper. Nonetheless, the logic for optimality and stability under the supermajoritarian voting rule applies to the case of N ≥ 3. First, the supermajoritarian rule partitions the space of signals RN in a way that allows for fine approximation of the first-best hyperplane. Therefore, no other equilibria can better convey the information. Second, the local stability will most likely carry over to the case of N ≥ 3 in the model. There is literature on Cournot stability that argues that the iterative myopic adjustment process diverges for N ≥ 4 (Theochardis, 1960). The subsequent literature (Fisher, 1961) has shown that the myopic non-equilibrium adjustment process converges when it is continuous for N ≥ 3, and that the divergence result occured due to a particular adjustment Theochardis assumed. A4 Proposition 2 Equilibrium with no omission The policymaker’s strategy described in terms of the number of positive and negative results can be written in terms of the messages: 69 mi m−i 0 ∅ 0 π̃ 0 0 0 0 π∗ 1 ∅ 0 1 1 π̃ 0 with π̃ ∈ (0, 1]. Equilibrium condition. First, let us verify that β = β is an equilibrium under this policy rule. Recall that the thresholds, in general, must satisfy the conditions (28) and (29). Suppose another researcher plays β = β. Then, regardless of one’s own signal, P (m−i = ∅|βi ) = 0. Therefore, one can write the indifference conditions (28) and (29) as 1 σ2 P m−i = 1|β i (1 − π̃) + 1 σ2 P m−i = 1|βi π̃ 1 σb2 1 σb2 + h h i 2 σ2 h h 2 σ2 i E β−i |β−i ≥ β −i , β i + β i = 0 i i E β−i |β−i ≥ β −i , βi + βi = 0 (33) (34) Since these conditions are proportional to each other, researcher i’s optimal strategy has β i = βi . Second, given that the indifference condition is satisfied for researchers when signals equal βi = β = β, the policymaker will also be indifferent between a = 1 and a = 0 when n1 = 1, n0 = 0. By Lemma 1, this implies that the conjectured policymaker’s strategy is an equilibrium. Finally, note that the best response function β i β −i has dβ i (β −i ) dβ −i ∈ (−1, 0) as noted above. Therefore, the unique symmetric equilibrium exists. Note that β = β < 0. Towards contradiction, suppose β ≥ 0. Then, the right-hand-side of the condition (33) is strictly positive, and thus, the condition (33) cannot be satisfied. o n Stability. 
Consider the perturbation of the strategies β + i , β + i , β + j , β + j ordering of the threshold is preserved. Then, generically, π ∗ (n 1 such that > so that the = 1, n0 = 0) is either 1 or 0 because the indifference no longer holds. Given that another researcher j has positive probability of omission, it is also optimal to omit some results. Therefore, β i,1 > β and βi,1 < β. Thus, j’s best response will also have β j,1 > β and βj,1 < β. Thus, there does not exist any local perturbation such that the equilibrium does not diverge away. That is, the equilibrium is not locally stable. Optimality. To see that the equilibrium with no omission attains strictly lower welfare than the equilibrium with asymmetric omission, consider the following non-equilibrium deviation: suppose both researchers shift their 0 thresholds so that β = β + ∆ and the policymaker switches to the supermajoritarian rule (7). This attains strictly higher welfare than the equilibrium without omission because, for signal realizations such that βi , βj ∈ β, β + ∆ , the expected welfare from policy implementation is negative. Since this deviation can allow the policymaker not to implement the policy then, it improves the welfare. By Proposition 1, we know that the equilibrium given the policymaker’s strategy (7) is unique. Moreover, since the researchers’ indifference conditions are identical to the first order conditions of maximizing the social weflare, the equilibrium characterized by Proposition 1 attains strictly higher welfare than the non-equilibrium deviation above. Therefore, the equilibrium with no omission is dominated by the equilibrium as in Proposition 1. Equilibrium with symmetric omission. Equilibrium condition. First, given the strategic symmetry β = −β, I demonstrate that the receiver’s strategy 70 is optimal. Denoting the density by Φ and φ, E [b | mi = ∅, m−i = ∅] = = = 1 σ2 1 σb2 + 1 σb2 + 1 σb2 + −β −β β " βi −β −β # β [βi + β−i ] dφ (β−i ) + 2 σ2 1 σ2 β (βi + β−i ) dΦ (βi , β−i ) 2 σ2 1 σ2 β β " βi −β −β [βi + β−i ] dφ (β−i ) dφ (βi ) βi [βi + β−i ] dφ (β−i ) − 2 σ2 # βi [βi + β−i ] dφ (β−i ) dφ (βi ) −β =0 by the symmetry of bivariate normal distribution. E [b | mi = 1, m−i = 0] = = = = 1 σ2 1 σb2 + 2 σ2 1 σ2 1 σb2 + + + −∞ ( ∞ βi ∞ −β −β β βi ) (βi + β−i ) dΦ (βi , β−i ) (βi + β−i ) dΦ (βi , β−i ) + βi ∞ −β (βi + β−i ) dΦ (βi , β−i ) + 2 σ2 −∞ ( 2 σ2 −∞ β ( 1 σ2 1 σb2 (βi + β−i ) dΦ (βi , β−i ) β 2 σ2 1 σ2 1 σb2 ∞ −β β (βi + β−i ) dΦ (βi , β−i ) β ∞ −β − βi ∞ −β (βi + β−i ) dΦ (βi , β−i ) + β ) (βi + β−i ) dΦ (βi , β−i ) β βi ) βi =0 Since c = 0, the policymaker is indifferent between implementing and canceling the policy when (n0 , n1 ) ∈ {(0, 0) , (1, 1)}. By Lemma 1 (ii), the strategy satisfies the policymaker’s optimization Given the mixed strategy, the researchers’ indifference conditions (28) and (29) are given by h i h io 1n P m−i = 0|β, P iv E β−i |m−i = 0, β + P m−i = ∅|β, P iv E β−i |m−i = ∅, β = 0 2 h i h io 1n β+ P m−i = ∅|β, P iv E β−i |m−i = ∅, β + P m−i = 1|β, P iv E β−i |m−i = 1, β = 0 2 β+ Substituting the formula of truncated normal distribution, these conditions are i 1 h β i + E β−i |β−i ≤ β −i , β i = 0 2 i 1 h βi + E β−i |β−i ≥ β−i , βi = 0 2 Note that when β i = −βi and β −i = −β−i , these conditions are equivalent to each other. Moreover, the solution β i is strictly decreasing in β −i . Combining, there exists a solution β i = −βi = β −i = −β−i that satisfies the researchers’ indifference conditions. Stability. 
Since the policymaker’s strategy depends on the indifference, it is fragile to a small perturbation of researcher’s strategy. Optimality. With an argument analogous to the equilibrium with no omission, a small perturbation can improve 71 the welfare. Moreover, such non-equilibrium deviation is strictly dominated by the equilibrium characterized in Proposition 1. Thus, the equilibrium with symmetric omission cannot be an optimal equilibrium. A5 Example with binary signals Consider the modification of the set-up in Propositions 1 and 2 with 2 researchers from continuous into binary signals. The state space, communication technology, payoff, and information structure remain unchanged. Maintaining the symmetric structure while allowing c > 0, suppose the signals are si = 1 (βi ≥ 0) Let p ≡ P (βi ≥ 0|b ≥ 0) > 1 2, to be the precision of the signal which is decreasing in σ. Note that, due to symmetry of normal distribution, P (βi < 0|b < 0) = p also holds. Denote the maximum cost that makes the policy implementation socially optimal given two positive signals as c ≡ E [b|s1 = 1, s2 = 1]. If c > c, then communication is never beneficial since the policy is never worthy of implementation.70 Proposition A5: Suppose c ∈ [0, c]. (i) An optimal and stable (if c ∈ (0, c)) equilibrium has • the policymaker’s strategy mi π∗ 1 0 m−i 0 1 0 1 0 0 • researcher’s strategy ( mi = 1 0 if si = 1 if si = 0 (ii) If c = 0, the following policymaker’s strategy is also optimal with identical researcher’s strategy: mi π∗ 1 0 m−i 0 1 1 1 2 0 12 Proof. Equilibrium conditions. It is optimal to implement the policy if and only if posterior belief E [b|m] > c. Assuming the truth-telling, the posterior belief of policymaker is given by the Bayes rule: P (b ≥ 0) P (m1 = 1, m2 = 1|b ≥ 0) P (m1 = 1, m2 = 1) 2 p 1 > = 2 2 p2 + (1 − p) P (b ≥ 0|m1 = 1, m2 = 1) = 70 Formally, p = ∞h 0 φ( − b ) b + σ̃ 1−Φ −σ̃ b φ ( σ̃ ) i b σb db. Moreover, E [b|s1 = 1, s2 = 1] = 72 q 2 2 2 p −(1−p) , σ 2 π p2 +(1−p)2 b where π is the circular constant. P (b ≥ 0) P (m1 = 1, m2 = 0 ∨ m1 = 0, m2 = 1|b ≥ 0) P (m1 = 1, m2 = 0 ∨ m1 = 0, m2 = 1) 1 p (1 − p) 1 = 2 = p (1 − p) 2 P (b ≥ 0|m1 = 1, m2 = 0 ∨ m1 = 0, m2 = 1) = P (b ≥ 0|m1 = 0, m2 = 0) = (1 − p)2 1 < 2 2 p2 + (1 − p) Thus, the expected welfare when a = 1 given messages is EW (a = 1|m1 = 1, m2 = 1) = P (b ≥ 0|m1 = 1, m2 = 1) E [b|b ≥ 0] − P (b < 0|m1 = 1, m2 = 1) E [b|b < 0] − c s = 2 p2 − (1 − p)2 −c≥0 σb2 π p2 + (1 − p)2 s EW (a = 1|m1 = 1, m2 = 0 ∨ m1 = 0, m2 = 1) = 2 σb2 π 1 1 − 2 2 −c≤0 EW (a = 1|m1 = 0, m2 = 0) = P (b ≥ 0|m1 = 0, m2 = 0) E [b|b ≥ 0] − P (b < 0|m1 = 0, m2 = 0) E [b|b < 0] − c s = 2 (1 − p)2 − p2 −c<0 σb2 π p2 + (1 − p)2 Therefore, the policymaker’s strategy is optimal given the sincere reporting of researchers. To verify that the researchers’ strategies are also the best response to the policymaker’s strategy, let us consider the expected welfare from reports conditional on being pivotal (that is, s−i = 1): s EW (mi = 1|si = 1, s−i = 1) = 2 p2 − (1 − p)2 −c≥0 σb2 π p2 + (1 − p)2 EW (mi = 0|si = 1, s−i = 1) = 0 Therefore, m∗i = 1 when si = 1. s EW (mi = 1|si = 0, s−i = 1) = 2 σb2 π 1 1 − 2 2 −c≤0 EW (mi = 0|si = 0, s−i = 1) = 0 Therefore, m∗i = 0 when si = 0. When c = 0, π (m1 = 1, m2 = 0 ∨ m1 = 0, m2 = 1) = 1 2 is also optimal since the policymaker is indifferent between implementing and not implementing the policy. It also maintains the sincere reporting by researchers to be the optimal strategy. Optimality. 
Since there is no garbling of signals, other equilibria cannot attain strictly higher welfare than this equilibrium by the Blackwell’s theorem. 73 Stability. To see that the equilibrium is stable as long as c > 0, note that the pure-strategy decisions are based on strict preference over actions. First, observe that the posterior beliefs are continuous in the researchers’ reporting strategies. Thus, for a small perturbation on the researchers’ strategy, the supermajoritarian rule described in (i) is optimal. Second, observe that the difference in expected welfare EW (mi = 1|si , m−i ) − EW (mi = 0|si , m−i ) is also continuous in another researchers’ strategy. Therefore, even if another researcher deviates from truth-telling slightly, it is still optimal to equate the message with the signal. Thus, for a small perturbation of initial strategy, the iterative adjustment converges to the equilibrium. Note that the Proposition A1 can be extended to the case with N ≥ 3 unlike in the model of Austen-Smith and Banks (1996). The difference is that, in this model, there is no exogenous requirement to apply a particular voting rule in aggregating evidence. Therefore, when c > 0 is large, the policymaker strategy can adjust to be more conservative. A6 Continuity under small conflict of interest Consider an example such that the objective is given by a (b + d). By Lamma 1, the monotone strategy remains to be optimal since d is analogous to c. The indifference conditions are given by β−ρβ 0 β + ρβ − σ̃F σ̃ β−ρβ β−ρβ 0 β + ρβ − σ̃G σ̃ , σ̃ = −d 2 + = −d 2 + σ2 σb2 σ2 σb2 , where F (x) = ln [1 − Φ (x)] and G (x, y) = ln [Φ (x) − Φ (y)] and Φ (·) is a cdf of standard normal distribution. Rearranging, 2σ 2 +σ 2 2b 2 β = σ̃f β, β − d 2 + σb +σ 2σ 2 +σ 2 σ2b+σ2 β = σ̃g β, β − d 2 + b , where f β, β = F0 β−ρβ σ̃ and g β, β = G0 β−ρβ β−ρβ σ̃ , σ̃ σ2 σb2 σ2 σb2 . By the implicit function theorem for system of equations, " ∂β # ∂d ∂β ∂d ∂f σ̃ ∂β = 2σ 2 +σ 2 − σ2b+σ2 b ∂g σ̃ ∂β ∂f σ̃ ∂β 2σ 2 +σ 2 ∂g σ̃ ∂β − σ2b+σ2 b − 2+ = Since ∂f ∂β < 0 and ∂g ∂β ∂f σ̃ ∂β − 2σb2 +σ 2 σb2 +σ 2 < 0, ∂f σ̃ ∂β − σ2 σb2 2σb2 +σ 2 σb2 +σ 2 ∂g σ̃ ∂β − ∂g σ̃ ∂β −1 σ2 − 2 + σb2 2 σ − 2+ 2σb2 +σ 2 σb2 +σ 2 − σb2 σ̃ + 2σb2 +σ 2 σb2 +σ 2 ∂g ∂f σ̃ 2 ∂β ∂β σ̃ ∂g ∂β − ∂g − ∂β ∂f ∂β + ∂f ∂β 2σb2 +σ 2 σb2 +σ 2 2σ 2 +σ 2 + σ2b+σ2 b + ∂g ∂f + σ̃ 2 ∂β ∂β > 0 and thus the derivatives ∂β ∂d and ∂β ∂d are defined everywhere. Therefore, the thresholds β, β are continuous in d. One can modify this approach for the preference for publication described by utility function ab + d1 (m 6= 0). The preference for publication can be incorporated into the indifference conditions as willingness to sacrifice the social welfare b − c by the discrepancy d in the direction that allows for publication. Thus, the conditions are then given by 74 β−ρβ β + ρβ − σ̃F 0 σ̃ β−ρβ β−ρβ , β + ρβ − σ̃G0 σ̃ σ̃ =d 2+ σ2 σ02 = −d 2 + σ2 σ02 , where the only difference is the sign of d for the first condition. Therefore, the implicit function theorem still applies and the derivatives are still well-defined. Note that the researcher’s heterogeneous prior could be thought of analogously. Suppose the prior of researchers 2 is given by N b̂0 , σb with b̂0 6= 0, whereas the policymaker has the prior N 0, σb2 . The researchers’ indifference conditions are written as h β + E β−i |β−i h β + E β−i |β > β−i σ2 σ2 ≥ β, βi = β + b̂0 2 = c 2 + 2 σ0 σ0 ! σ2 σ2 ≥ β, βi = β + b̂0 2 = c 2 + 2 σ0 σ0 ! 
In this way, one can observe that the shift in the indifference conditions due to $\hat{b}_0$ is almost entirely analogous to that due to $d$, the conflict of interest.

Appendix B.

Figure B1: Monotonicity with heterogeneous G(σ)
Notes: Figure B1 plots the posterior belief of the policy benefit conditional on one's own signal and on another researcher's signal when it is pivotal, $E[b \mid \beta_i, \sigma_i, m_i \in \mathrm{Piv}]$, for $N = 2$, $c = 0$, $\sigma_b = 1/3$, and heterogeneous $G(\sigma)$. While only $\sigma_i = 0.1, 0.5, 1$ are plotted for visual ease, posteriors for other values of $\sigma_i$ show the same pattern. In equilibrium, the posteriors are monotone in $\beta_i$ even for heterogeneous $G(\sigma)$, confirming that the threshold strategy is optimal.

Figure B2: Likelihood of policy implementation
Notes: Figure B2 plots the probability of policy implementation $P(a = 1)$ for $N = 2$, $c = 0$, $\sigma_b = 1/3$, and various values of the standard error of the signal, $\sigma$, in the equilibrium characterized by Proposition 1. It shows that, relative to the implementation probability under communication of estimates, $P(a = 1) = 1/2$, the policy is slightly less likely to be implemented. This is primarily due to the conservative supermajoritarian voting rule $a^* = 1 \Leftrightarrow n_1 > n_0$, largely mitigated by the thresholds $\overline{\beta}, \underline{\beta}$ that lead to the upward bias of the estimates underlying reported studies.

Figure B3: Local stability and optimality for σ > σ
Notes: Figure B3 plots the combined best response functions that solve the following fixed point for each $\overline{\beta}_j$, for $N = 2$, $c = 0$, $\sigma_b = 1/3$, and various values of the standard error of the signal $\sigma$, in the equilibrium characterized by Proposition 1: $\overline{\beta}_i$ solves the fixed point of the combined best response function $\overline{\beta}_i = BR_i\big(\overline{\beta}_j, \underline{\beta}_j\big) = BR_i\big(\overline{\beta}_j, BR_j(\overline{\beta}_i, \underline{\beta}_i)\big)$. The condition for optimality and stability is that $\partial \overline{\beta}_i / \partial \overline{\beta}_j \big|_{\overline{\beta}_i = \overline{\beta}_j} > -1$, so that the strategic substitutability is not too strong; this in turn implies optimality and local contraction stability. The figure shows that this condition holds for values of σ from as small as $0.1\sigma_b$ to as large as $10\sigma_b$; that is, the slope of $\overline{\beta}_i(\overline{\beta}_j)$ evaluated at the 45 degree line is less steep than the negative 45 degree line. While not plotted, the function $\overline{\beta}_i(\overline{\beta}_j)$ for σ much larger than $10\sigma_b$ is not visually distinguishable from that for $10\sigma_b$.

Figure B4: Optimality of supermajoritarian voting rule
Notes: Figure B4 plots the welfare (measured as a fraction of the benchmark case with communication of estimates) under a supermajoritarian rule ($a^* = 1 \Leftrightarrow n_1 > n_0$) and a submajoritarian rule ($a^* = 1 \Leftrightarrow n_1 \ge n_0$) for $N = 2$, $\sigma_b = \sigma = 1$, and various values of $c \ge 0$. It shows that the supermajoritarian rule attains higher welfare than the submajoritarian rule for $c > 0$, and identical welfare for $c = 0$. The supermajoritarian rule does especially better when $c$ is high and the relative welfare loss is therefore large.

Figure B5: Approximation of empirical distribution
Notes: Figure B5 plots the imputed empirical distribution of the standard error σ against the $\chi^2$ distribution with 4 degrees of freedom between 0 and 10. The distribution is imputed because the observed distribution may not represent the entire underlying distribution, owing to the omission of some studies. The empirical distribution is constructed from the positive significant results of Example 1 on the labor union effect:

$$G(\sigma) = \frac{1}{C} \sum_i \left[1 - \Phi\!\left(\frac{1.96\sigma_i - \hat{b}_0}{\sqrt{\sigma_i^2 - \hat{\sigma}_0^2}}\right)\right]^{-1} \mathbf{1}(\sigma \ge \sigma_i), \qquad \text{where } C = \sum_i \left[1 - \Phi\!\left(\frac{1.96\sigma_i - \hat{b}_0}{\sqrt{\sigma_i^2 - \hat{\sigma}_0^2}}\right)\right]^{-1}$$

is the normalizing constant. The largest standard error is normalized to be 1.
The figure shows that the $\chi^2$ distribution with 4 degrees of freedom approximates the empirical distribution reasonably well, and it was used for the numerical illustration in Section 3.3.

Figure B6: Example of convex β(σ)
Notes: Figure B6 plots the thresholds $\overline{\beta}(\sigma)$ and $\underline{\beta}(\sigma)$ for $c = 2/3$, $\sigma_b = 1/3$, and heterogeneous $G(\sigma)$. This example shows that β(σ) can be convex for large $c > 0$. This is because, when $c$ is large, the effect of the conservative prior as expressed in (5) is also large and overturns the effect of σ working through the weights in Bayesian updating. Even though the difference between the thresholds for $N = 1$ and $N = 2$ grows as σ increases, the convexity of the threshold for $N = 1$ may dominate.

Figure B7: Posterior consistency
Notes: Figure B7 plots the distribution of the posterior belief $E[b \mid n]$ in the equilibrium under the supermajoritarian policy rule ($a^* = 1 \Leftrightarrow n_1 > n_0$) with $\sigma_b = \sigma = 0$ and $c = 0$ for $N = 2, \ldots, 10$. It shows that, when there are marginally more positive results than negative results ($n_1 = n_0 + 1$), the posterior belief is positive; conversely, when there are no more positive results than negative results ($n_1 = n_0$), the posterior belief is negative. Since the posterior beliefs are monotone in the number of positive and negative results, this confirms that the conjectured supermajoritarian policy rule is consistent with the belief and utility maximization of the policymaker. Note that, for $N \ge 3$, an arbitrarily conjectured policy rule has no guarantee of being supported in equilibrium. While Proposition 1 characterizes the equilibrium only for $N = 2$, Figure B7 suggests that the result extends to much larger $N$, as the posterior belief steadily approaches $E[b \mid n] = 0$ while retaining the correct sign. This consistency of the supermajoritarian policy rule holds for various values of $c$. Here, the analysis restricts attention to the case of a degenerate distribution of σ for computational feasibility.

Appendix C. Empirics

C1 Kolmogorov-Smirnov Test with entire data

Table C1: Summary of Kolmogorov-Smirnov-type tests with entire sample

                        Labor union effect        EIS
                        G∅         H0            G∅         H0
p-value                 0.0003     0.0800        0.0786     0.9113
D (KS statistic)        0.3740     0.1448        0.1933     0.0195
D (critical value)      0.2330     0.1606        0.1960     0.1671
arg sup                 0.0831    -0.0629        0.1594     9.1830

Notes: Table C1 summarizes the results of the KS-type tests applied to the samples not restricted by whether binary conclusions were highlighted in the papers' abstracts, introductions, or conclusions. "D (KS statistic)" denotes the KS statistic relative to the predicted distribution as in (24) and (25). "D (critical value)" denotes the critical value of the KS statistic for rejecting the null hypothesis at the 95 percent confidence level; it is included as a benchmark, although the p-value cannot be computed from a comparison of KS statistics alone. "arg sup" denotes the parameter values {σ, β} at which the KS statistic attains its supremum. The p-value is the overall p-value computed by bootstrap. The stem-based estimates are identical to those presented in Table 4.

Whereas Table 4 focused on the estimates from papers with binary conclusions, Table C1 reports, for full transparency, the test statistics for the entire sample. First, it shows that the overall qualitative patterns remain even with the full sample: the p-value for G∅(σ) is highly statistically significant, and that for H0(β) is significant at the 10 percent level.
Second, nevertheless, the overall result is weaker compared to the case that focused on the papers with binary conclusions. This is consistent with the explanation that the pattern was driven by the concern to determine yes-or-no conclusions. C2 Funnel plot of examples Figure C1: Funnel plot for labor union effects (Left) and EIS (Right) Notes: Figures C1 are the funnel plots of all data points for both the labor union effects and EIS literature. The funnel plot for labor union effects visually shows that there is scarcity of null results among the imprecisely estimates studies. The KS-type test presented in the paper is a formal test of this pattern. 79 C3 Figures of Kolmogorov-Smirnov Test for EIS Figure C2: Distribution of negative studies’ standard errors and coefficients for EIS Notes: Figure C2 plots the cumulative distribution function of standard errors of null studies (Top) and coefficients of negative studies (Bottom) for estimates of EIS. The solid green line represents the observed distribution and the dashed blue line represents the predicted distribution along with its 95 confidence interval estimated by the bootstrap. The solid red and vertical line quantifies the KS statistic for distribution of standard errors, and the KS statistic of the hypothesis that Ĥ0 < H̃0 for distribution of coefficients. The 95 confidence interval for H̃0 appears small primarily because of the scale of axies along β. C4 Proofs of Proposition 3 The following proofs show the limit and monotonicity of bias for each three model one by one. 80 Uniform selection model Using the truncated normal distribution’s formula, one can express the bias under uniform selection model as Bias (σi ) ≡ E [βi |σi , study i reported] − b0 h = b0 + σ02 + σi2 = − σ02 + σi2 i η1 −φ β + φ β q h i η1 Φ β + 1 − Φ β −tσi −b0 ∂ ln η1 Φ √ 2 2 σ0 +σi i i − b0 h + η0 Φ β − Φ β + 1 − Φ √tσi2−b0 2 + η0 Φ √tσi2−b0 2 σ0 +σi σ0 +σi −tσi −b0 −Φ √ 2 2 σ0 +σi ∂b0 = − σ02 + σi2 h − η0 φ β − φ β −tσi −b0 ∂ ln η1 + (η1 − η0 ) Φ √ 2 2 σ0 +σi − Φ √tσi2−b0 2 σ0 +σi ∂b0 , where Φ in the final expression denotes the cdf of standard normal distribution. (1) Limit. By the continuity and finiteness of limit, one can distribute the limit: lim Bias (σi ) = lim − σ02 + σi2 σi →0 ∂ ln η1 + (η1 − η0 ) limσi →0 σ0 +σi − Φ √tσi2−b0 2 σ0 +σi ∂b σi →0 n = −σ02 × −tσi −b0 Φ √ 2 2 h ∂ ln η1 + (η1 − η0 ) Φ −b0 σ0 −Φ −b0 σ0 io0 ∂b0 ∂ ln {η1 } = −σ02 × =0 ∂b0 Therefore, limσi →0 Bias (σi )2 = 0. Since Bias2 cannot be than 0, this less is the minimum. −tσ −b tσ −b (2) Bias monotonicity. Denoting Z (b0 , σi ) = Φ √ i2 0 2 − Φ √ 2i 02 for notational ease, one can rewrite the σ0 +σi σ0 +σi bias as Bias (σi ) = − σ02 + σi2 ∂ ln Z ∂Z (b , σ ) 0 i ∂Z ∂b0 q = 0 for ∂ ln [η1 − (η1 − η0 ) Z] tσi − b0 −tσi − b0 σ02 + σi2 φ q − φ q ∂Z σ2 + σ2 σ2 + σ2 i 0 i It q is sufficient to confirm that all three factors are increasing in σi when σi is near zero. This is clearly true σ02 + σi2 . ∂ ln[η1 −(η1 −η0 )Z] ∂Z By symmetry, φ √tσi2−b0 2 is increasing in Z since η1 > η0 , and Z is increasing in σi for sufficiently small σi . −tσi −b0 −φ √ 2 2 σ0 +σi σ0 +σi = φ √tσi2−b0 2 σ0 +σi − φ √tσi2+b0 2 , which is initially increasing in σi and σ0 +σi decreasing in σi afterwords. To see this, notice that, as σi increases the mean of the distribution, √ tσ2 i σ0 +σi2 the truncation points √ limσi →∞ φ √tσi2−b0 2 σ0 +σi b0 , − √ b20 2 σ02 +σi2 σ0 +σi − φ √tσi2+b0 2 σ0 +σi divergence of limσi →∞ , shifts while remain symmetric around 0. Thus, the bias initially increases. 
Nevertheless, = φ (t) − φ (t) = 0. Since limσi →∞ ∂ ln[η1 −(η1 −η0 )Z] ∂Z is finite if η0 > 0 and the q σ02 + σi2 = ∞ is not fast, by L’Hopital’s rule, the term limσi →∞ φ √tσi2−b0 2 −φ √tσi2+b0 2 σ0 +σi σ0 +σi = 0 dominates. Combining, the Bias2 will be zerofor large σi . Note that Bias(σi )2 could be constant decreasing towards i since it is 0 throughout σi if b0 = 0 by Φ √−tσ 2 σ0 +σi2 The expression φ √tσi2−b0 2 σ0 +σi − φ √tσi2+b0 2 σ0 +σi − Φ √ tσ2 i σ0 +σi2 = 0. provides some information about the value of σ since this difference 81 is approximately at its peak when one argument is near 0. If b0 > 0, this occurs when σi = b0 t ; that is, when σi is at the precision such that the null hypothesis test just rejects under the specified t-statistic threshold and true value of b0 . Since other terms will still be increasing, it shows that when b0 high, the bias monotonicity holds for the relevant range. Thus, if b0 were low, σ could also be very low so that the bias monotonicity may not hold for most of the observed values of σi . Nevertheless, this is unlikely to be an important concern for two reasons: 1. When b0 is low, the potential bias itself is very low. Therefore, it is highly unlikely that the rejection of null hypothesis can be driven solely by the bias itself. 2. The pilot value of b0 is specified to be the one closest to 0 among all possible values of n. Therefore, if the bias indeed decreases as n becomes larger, the mean estimate that includes those with lowest bias will be included. Gradual selection model Consider the gradual selection model whose selection probability is given by the probit model. This is analogous to the Heckman selection model with joint normality assumption on the error terms from the first and second stages. The publication probability in the text can be written as 1 P (mi = 6 ∅) = P δ0 + δ1 + εi > 0 σi , where εi ∼ N (δ2 βi , 1). Then, the bias can be written as Bias (σi ) ≡ E [βi |σi , study i reported] − b0 q = δ2 σ02 + σi2 φ δ0 + δ1 σ1i Φ δ0 + δ1 σ1i (1) Limit. By continuity and finiteness of limit, and under the model’s assumption of δ1 > 0, lim Bias (σi ) = δ2 lim σi →0 σ02 + σi2 lim σi →0 σi →0 (2) Bias monotonicity. Note that the inverse Mills ratio φ increases the term δ0 +δ1 σ1 i Φ δ0 +δ1 σ1 i φ(·) Φ(·) =0 φ δ0 + δ1 σ1i q Φ δ0 + δ1 σ1i is a decreasing function of the argument, as σi increases. Since q σ02 + σi2 is also increasing in σi , Bias (σi )2 is increasing in σi . Extremum selection model Consider the extremum selection model in which, for expositional simplicity, the mean is normalized to b0 = 0. The bias can be thought of analogously for b0 6= 0. (1) Consider βmin ≥ 0. Note that the monotone likelihood ratio property (MLRP) is satisfied for the normal distribution: for any σ > σ 0 and for any β > β 0 > βmin , f (β|σ) f (β|σ 0 ) > f (β 0 |σ) f (β 0 |σ 0 ) because β 2 − β 02 exp − 2 σ0 + σi2 ! β 2 − β 02 > exp − 2 σ0 + σi02 82 ! By MLRP, E [β|σi , β ≥ βmin ] > E β|σi0 , β ≥ βmin Thus, the bias is also increasing in σi . (2) Suppose βmin < 0. Then, the bias is Bias (σi ) ≡ E [βi |σi , study i reported] = φ q σ02 + σi2 √βmin σ02 +σi2 1 − Φ √βmin 2 σ0 +σi2 Both limit and bias monotonicity comes from the observation that Mill’s ratio φ(·) 1−Φ(·) q σ02 + σi2 is increasing in σi , and the inverse is a decreasing function of the argument. Therefore, bias is increasing in σi . Since the bias is increasing everywhere on σi ≥ 0, it approximates its minimum at the limit σi = 0. 
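Before turning to the mean squared errors, a short numerical sketch can illustrate these comparative statics. The Python code below evaluates the bias term for the three selection models over a grid of standard errors: the uniform selection model by numerically integrating the selection-weighted normal density, and the gradual (probit) and extremum models via the closed forms derived above. All parameter values here are my own illustrative assumptions, not estimates from the paper.

import numpy as np
from scipy.stats import norm

# Illustrative parameters (assumptions for this sketch, not calibrated values from the paper).
b0, sigma0, t = 0.5, 0.3, 1.96            # mean true effect, dispersion of true effects, significance threshold
eta1, eta0 = 1.0, 0.3                     # uniform model: reporting prob. for significant / insignificant results
delta0, delta1, delta2 = -1.0, 0.5, 1.0   # gradual (probit) selection parameters, with delta1 > 0
beta_min = -0.2                           # extremum model: lowest estimate ever reported (b0 normalized to 0 there)

grid = np.linspace(-10.0, 10.0, 20001)    # integration grid for the estimate beta_i

def bias_uniform(s_i):
    # E[beta_i | reported] - b0 when |beta_i| >= t * s_i is reported with prob. eta1, otherwise eta0.
    s = np.sqrt(sigma0**2 + s_i**2)
    dens = norm.pdf(grid, loc=b0, scale=s)
    w = np.where(np.abs(grid) >= t * s_i, eta1, eta0)
    return (grid * w * dens).sum() / (w * dens).sum() - b0

def bias_gradual(s_i):
    # Closed form from the probit (Heckman-type) selection model.
    z = delta0 + delta1 / s_i
    return delta2 * np.sqrt(sigma0**2 + s_i**2) * norm.pdf(z) / norm.cdf(z)

def bias_extremum(s_i):
    # Closed form when only estimates above beta_min (< 0) are reported, with b0 normalized to 0.
    s = np.sqrt(sigma0**2 + s_i**2)
    z = beta_min / s
    return s * norm.pdf(z) / (1.0 - norm.cdf(z))

for s_i in [0.05, 0.2, 0.5, 1.0, 2.0]:
    print(f"sigma_i = {s_i:4.2f}: uniform = {bias_uniform(s_i):+.3f}, "
          f"gradual = {bias_gradual(s_i):+.3f}, extremum = {bias_extremum(s_i):+.3f}")

Under these assumed values, the uniform and gradual biases shrink toward zero as $\sigma_i \to 0$ and the extremum bias reaches its minimum there, consistent with the limits above; the eventual decline of the uniform-model bias for very large $\sigma_i$ (when $\eta_0 > 0$) appears once the range of $\sigma_i$ is extended.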
C5 Mean Squared Errors

Figure C3: Bias-variance trade-off for labor union effects (top) and EIS (bottom)
Notes: Figure C3 plots how the mean squared error of the stem estimate changes as more studies are included. The mean squared error is decomposed into a bias term (Bias²), which is increasing in the number of included studies, and a variance term, which is decreasing in the number of included studies. The panels show that, when the two are combined, the MSE is approximately convex in the number of included studies for both datasets.

Appendix References

Esponda, Ignacio, and Demian Pouzo. 2012. “Learning Foundation and Equilibrium Selection in Voting Environments with Private Information.” University of California, Berkeley mimeo.
Fisher, Franklin M. 1961. “The Stability of the Cournot Oligopoly Solution: The Effects of Speeds of Adjustment and Increasing Marginal Costs.” The Review of Economic Studies 28 (2): 125–35. doi:10.2307/2295710.
Fudenberg, Drew, and David K. Levine. 1998. The Theory of Learning in Games. Cambridge, MA: MIT Press.
Theocharis, R. D. 1960. “On the Stability of the Cournot Solution on the Oligopoly Problem.” Review of Economic Studies 27 (2): 133–34.