A General Framework for Combining Information
& A Frequentist Approach to Incorporate Expert Opinions
Minge Xie
Department of Statistics
Rutgers, The State University of New Jersey
Supported in part by grants from NSF, NSA and OMJSA
Outline
Motivating example
Clinical trial on Migraine Pain (Johnson & Johnson Inc.)
Potential problems of Bayesian solutions
Statistical inference using confidence distributions (CDs)
CD concept: a “distribution estimator” of a parameter
General framework for combining information
Incorporating expert opinions into clinical trials
Modeling of expert opinions
Combining information via confidence distributions
Motivating Example:
Migraine Pain Clinical Trial (J&J)
Binomial clinical trial on migraine headache
Randomized trial of two treatment groups:
n1 subjects in A group; n0 subjects in B group (control)
Binary responses: X1 ~ Binomial(n1, p1) in group A and X0 ~ Binomial(n0, p0) in group B, with response probabilities p1 and p0
Expert opinions on the improvement (δ = p1 – p0) were solicited from 11 experts prior to the clinical trial.
Following the design of Parmar et al. (1994), Spiegelhalter et al. (1994)
Goal: Incorporate expert opinions with the information in the clinical trial data
[Figure: expert opinions on the improvement δ = p1 – p0, ranging from “Worse” to “Better” (Courtesy of J&J)]
Generating hypothesis testing questions for future studies
Univariate Bayesian Approach
Naïve Bayesian: directly apply Spiegelhalter et al. (1994)’s
approach for normal clinical trials
Normal prior (fit a normal curve to the histogram): δ ~ N(µ0, σ0²)
Likelihood (estimated): δ̂ | δ ~ N(δ, s²), with δ̂ and s estimated from the trial data
Posterior (explicit): δ | data ~ N(µ*, σ*²),
where µ* = (µ0/σ0² + δ̂/s²) / (1/σ0² + 1/s²) and σ*² = (1/σ0² + 1/s²)⁻¹
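A minimal numerical sketch of this normal-normal update; the prior fit (µ0, σ0) and the estimated likelihood (δ̂, s) below are hypothetical values, not the J&J numbers.
# Naive (univariate) Bayesian update for delta = p1 - p0, using the standard
# normal-normal conjugate formulas (following Spiegelhalter et al., 1994).
# The numbers below are hypothetical, not the J&J values.
from scipy.stats import norm

mu0, sigma0 = 0.10, 0.15    # normal prior fitted to the expert histogram (hypothetical)
delta_hat, s = 0.18, 0.06   # estimated improvement and its standard error (hypothetical)

post_var = 1.0 / (1.0 / sigma0**2 + 1.0 / s**2)              # sigma*^2
post_mean = post_var * (mu0 / sigma0**2 + delta_hat / s**2)  # mu*

print(f"posterior: N({post_mean:.3f}, sd = {post_var ** 0.5:.3f})")
print("95% credible interval:", norm.ppf([0.025, 0.975], loc=post_mean, scale=post_var ** 0.5))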
Issues in the
Naïve Bayesian Approach
Theoretical Issue: Strictly speaking, Bayesian theory does not apply in the univariate Bayesian approach.
The Bayesian method is not good at “division of labor” (Efron 1986, Wasserman 2007)
Need to know the conditional density f(data | δ) to apply the Bayesian formula, which is NOT possible in the binomial trial!
The normal prior could also be an issue
Bounded range of δ = p1 – p0
Possible skewed distribution of expert opinions
Bi-variate Bayesian Approach
Full Bayesian solution: use “a comprehensive Bayesian model” to jointly model p0 and p1.
Additional information (or assumption) on p0 is required
Bi-Beta prior (Olkin and Liu, 2003): marginals are beta-distributed; correlation ranges from 0 to 1.
Likelihood: two independent binomial distributions (or normal approximations)
Posterior: no explicit formula; an MCMC algorithm is needed (a simplified Monte Carlo sketch follows)
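A simplified Monte Carlo sketch of the bi-variate idea: for illustration it uses independent Beta priors on (p0, p1) instead of the correlated Olkin-Liu bi-beta prior, which would call for MCMC; the counts and prior parameters are hypothetical, not the J&J data.
# Bi-variate Bayesian sketch: jointly model (p0, p1) and examine delta = p1 - p0.
# For simplicity, independent Beta priors replace the correlated Olkin-Liu
# bi-beta prior described on the slide. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

x1, n1 = 35, 100    # responders / subjects, treatment group (hypothetical)
x0, n0 = 20, 100    # responders / subjects, control group (hypothetical)
a1, b1 = 2, 8       # Beta prior for p1 (hypothetical)
a0, b0 = 2, 8       # Beta prior for p0 (hypothetical)

# With independent Beta priors, p1 and p0 have Beta posteriors, but the posterior
# of delta = p1 - p0 has no closed form, so it is computed by simulation.
p1 = rng.beta(a1 + x1, b1 + n1 - x1, size=100_000)
p0 = rng.beta(a0 + x0, b0 + n0 - x0, size=100_000)
delta = p1 - p0

print("posterior mean of delta:", round(delta.mean(), 3))
print("95% credible interval:", np.percentile(delta, [2.5, 97.5]))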
Skewed Histogram of Prior Opinions
[Histograms of the expert opinions on the improvement p1 – p0]
Bi-variate Bayesian Analysis
Frequentist Alternative
Feature:
Directly model the parameter of interest δ, instead of both p0 & p1
Counter-intuitive! (A Paradox?)
This is the so-called “edge of division of labor” of a frequentist approach (Efron, 1986; Wasserman, 2007).
No prior information on p0 and p1 is collected or available in the J&J project.
Problem:
Only expert opinions, no actual data, are available from the prior experiments.
Not possible in conventional frequentist inference.
Solution: a frequentist-Bayesian compromise approach utilizing confidence distributions (CDs)!
Background on Confidence
Distribution (CD)
Parameter estimation:
Point estimation
Confidence intervals
Confidence distributions (CDs)
Loaded with information (comparable to Bayesian posterior distributions)
A useful device for constructing all types of frequentist statistical inferences, e.g., point estimators, confidence intervals, p-values, etc.
Long history but little attention on CDs in the past
Historic connection to the fiducial distribution, which is largely considered “Fisher's biggest blunder” (see Efron, 1998)
The possible utility of CDs, especially in statistical applications, has not been seriously examined
Renewed interest in CDs
The renewed interest starts from Efron (1998)
Efron (1998) predicted that “something like fiducial inference” may well “become a big hit in the 21st century”.
Bootstrap distributions are “distribution estimators” and CDs.
Some recent articles: Efron (1993, 1998), Fraser (1991, 1996), Lehmann (1993), Singh, Xie and Strawderman (2005, 2007), Schweder and Hjort (2002, 2003, 2008), Schweder (2003), Parzen (2005), Lawless and Fredette (2005), Bickel (2007), Xie, Singh and Strawderman (2008), Tian, Wang, Cai and Wei (2008), Xie, Liu, Damaraju and Olson (2009), etc.
Highlights: new definition (pure frequentist/non-fiducial); likelihood connection; CD combination; predictive distribution; HCDR, etc.
Classical Notion of CD
Classical CD function (Efron 1993): Let (−∞, ξn(α)] be a 100α% lower-side confidence interval for θ, at every level α ∈ (0,1). Assume that ξn(α) = ξn(Xn, α) is continuous and increasing in α.
⇒ The inverse function Hn(·) = ξn⁻¹(·) is a CD in the usual Fisherian sense.
[Figure: plot of ξn(α) against α ∈ (0,1), with Hn(θ) as its inverse; adopted from Singh et al., 2005]
A Formal Definition
A function Hn(θ) = Hn(Xn, θ) on X × Θ → (0,1) is called a confidence distribution (CD) for a parameter θ, if
(i): For each given sample set Xn ∈ X, Hn(·) is a continuous cumulative distribution function in the parameter space Θ;
[Translation: Hn(θ), as a function of θ, should be a distribution function on the parameter space]
(ii): At the true value θ = θ0, Hn(θ0) = Hn(Xn, θ0), as a function of the sample set Xn, has the uniform distribution U(0,1).
[Translation: Hn(θ0), at the true θ0, should carry the “right” information]
A Formal Definition (cont.)
The function Hn(·) is called an asymptotic confidence distribution (aCD), if (ii) is replaced by
(ii)': At θ = θ0, Hn(θ0) → U(0,1), as n → ∞, and the continuity requirement on Hn(·) is dropped.
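To make requirement (ii) concrete, a small simulation sketch using the normal-mean CD Hn(µ) = Φ(√n (µ − X̄)/σ) (introduced as Example 1 later in the deck): it checks that Hn(µ0), evaluated at the true µ0 over repeated samples, behaves like U(0,1). The parameter values are arbitrary.
# Simulation check of requirement (ii): at the true parameter value theta0,
# Hn(theta0) should be U(0,1) over repeated samples.
# Here Hn is the normal-mean CD, Hn(mu) = Phi(sqrt(n) * (mu - xbar) / sigma).
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(1)
mu0, sigma, n, reps = 0.5, 1.0, 30, 5000   # arbitrary illustrative values

u = np.empty(reps)
for r in range(reps):
    x = rng.normal(mu0, sigma, size=n)
    u[r] = norm.cdf(np.sqrt(n) * (mu0 - x.mean()) / sigma)   # Hn evaluated at mu0

# Kolmogorov-Smirnov test against U(0,1); a large p-value is consistent with (ii)
print(kstest(u, "uniform"))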
Connection to Classical
Notion
In the classical example where Hn(θ) = ξn⁻¹(θ), with ξn(α) being the upper end of the lower-side confidence interval:
For any α ∈ (0,1) and at θ = θ0,
P{Hn(θ0) ≤ α} = P{θ0 ≤ ξn(α)} = α.
⇒ Hn(θ0) = Hn(Xn, θ0), as a function of the sample set Xn, is U(0,1) distributed!
Depiction of the 2nd Requirement
Hn(θ) ≤st 1 − Hn(θ) when θ < θ0
Hn(θ0) =st 1 − Hn(θ0), i.e., Hn(θ0) ~ U(0,1), at θ = θ0
Hn(θ) ≥st 1 − Hn(θ) when θ > θ0
[Balance-scale depiction: the right amount of information, Hn(θ0) vs. 1 − Hn(θ0) balanced at θ = θ0; the wrong amount of information, Hn(θ) vs. 1 − Hn(θ) not balanced at θ ≠ θ0]
“≤st” denotes a stochastic comparison between two random variables; for example, X ≤st Y ⇔ P(X ≤ t) ≥ P(Y ≤ t) for all t, and X =st Y ⇔ X ≤st Y and Y ≤st X.
CD Subsumes a Broad Range of Examples
Regular parametric cases
Including classical examples of fiducial distributions, plus more
Bootstrap distributions
Likelihood functions (normalized as functions of the parameter)
Including the likelihood function, profile likelihood, implied likelihood, etc. (e.g., Singh, Xie and Strawderman, 2005, 2007; Efron, 1993; Schweder and Hjort, 2008)
The significance (p-value) functions (Fraser, 1991)
Predictive distributions (Lawless and Fredette, 2005)
Some Bayesian priors and posteriors
Among others
CD Example 1: Normal Mean
Assume X1, …, Xn is a sample from N(µ, σ²).
A CD for µ (σ known): Hn(µ) = Φ(√n (µ − X̄)/σ)
⇔ µ is “estimated” by N(X̄, σ²/n)!
A CD for µ (σ unknown): Hn(µ) = F_{tn−1}(√n (µ − X̄)/s), where F_{tn−1} is the cumulative distribution function of the tn−1 distribution and s is the sample standard deviation.
CD Example 2: Bootstrap Distribution
Let θ̂ be a consistent estimator of θ and θ̂* be the corresponding bootstrap estimator.
An aCD for θ is Hn(θ) = 1 − P*(θ̂* − θ̂ ≤ θ̂ − θ), where P* denotes the bootstrap probability given the data.
As n → ∞, the limiting distribution of the centered θ̂ is often symmetric. In this case, the raw bootstrap distribution P*(θ̂* ≤ θ) is also an aCD for θ.
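A short numerical sketch of Example 2, taking the sample mean as θ̂ on simulated data: it evaluates the bootstrap aCD Hn(θ) = 1 − P*(θ̂* − θ̂ ≤ θ̂ − θ) and, for comparison, the raw bootstrap distribution. The data are purely illustrative.
# Bootstrap aCD sketch (Example 2): theta = population mean, theta_hat = sample mean.
# Data are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=80)     # hypothetical sample, true mean = 2
theta_hat = x.mean()

B = 4000
boot = np.array([rng.choice(x, size=x.size, replace=True).mean() for _ in range(B)])

def acd(theta):
    # Hn(theta) = 1 - P*(theta_hat* - theta_hat <= theta_hat - theta)
    return 1.0 - np.mean(boot - theta_hat <= theta_hat - theta)

def raw_bootstrap(theta):
    # the raw bootstrap distribution, also an aCD when the limit is symmetric
    return np.mean(boot <= theta)

for t in theta_hat + np.array([-0.5, -0.25, 0.0, 0.25, 0.5]):
    print(f"theta={t:.2f}  aCD={acd(t):.3f}  raw bootstrap={raw_bootstrap(t):.3f}")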
CD Example 3: Informative Bayesian Prior
An informative prior distribution can be viewed as a CD.
An informative prior distribution of the mean parameter θ: π(θ) ~ N(µ0, σ0²)
Past experiment: a summary statistic X0 with variance var(X0) = σ0² and realization (observation) X0 = µ0.
Denote by X the sample space of the past experiment and by Θ the parameter space of the mean θ.
⇒ H0(θ) = Φ((θ − X0)/σ0) is a CD function on X × Θ → (0,1)!
We call H0(θ) = Φ((θ − X0)/σ0) = Φ((θ − µ0)/σ0) a CD from the (hypothetical) prior experiment.
A General Framework of Combining Information
Assume H1(θ), …, Hk(θ) are CDs from k independent studies.
Let gc(u1, …, uk) be a continuous function that is monotonic in each coordinate, and define the combined CD function
Hc(θ) = Gc(gc(H1(θ), …, Hk(θ))),
where Gc(t) = P{gc(U1, …, Uk) ≤ t} and U1, …, Uk are i.i.d. U(0,1) random variables.
⇒ Hc(θ) is a CD function for the parameter θ.
Convenient Choices
A convenient choice: gc(u1, …, uk) = F0⁻¹(u1) + ⋯ + F0⁻¹(uk), for any continuous cumulative distribution function F0
⇒ Hc(θ) = Gc(F0⁻¹(H1(θ)) + ⋯ + F0⁻¹(Hk(θ))),
where Gc = F0 ⊛ ⋯ ⊛ F0 (k-fold) and ⊛ means convolution.
Weighted approach: gc(u1, …, uk) = w1 F0⁻¹(u1) + ⋯ + wk F0⁻¹(uk), with weights w1, …, wk
⇒ Hc(θ) = Gc(w1 F0⁻¹(H1(θ)) + ⋯ + wk F0⁻¹(Hk(θ))),
where Gc is the distribution function of w1 Y1 + ⋯ + wk Yk for independent Yi ~ F0.
A Special Case F0 = Φ
With F0 = Φ, the combined CD has an explicit form:
No-weights combination: Hc(θ) = Φ({Φ⁻¹(H1(θ)) + ⋯ + Φ⁻¹(Hk(θ))}/√k)
⇒ Classical meta-analysis of p-value combination using the normal distribution function
Weighted combination: Hc(θ) = Φ({w1 Φ⁻¹(H1(θ)) + ⋯ + wk Φ⁻¹(Hk(θ))}/√(w1² + ⋯ + wk²))
⇒ Normal-model-based meta-analysis methods
Unification of Meta-analysis Approaches
Classical p-value combination approaches are from Marden (1991); model-based meta-analysis approaches are from Table IV of Normand (1999). From Xie et al. (2008); a numerical sketch follows the table.
Meta-analysis method | Can it be derived under the CD combination framework?
P-values comb.: Fisher method | Yes
P-values comb.: Normal method | Yes
P-values comb.: Tippett method | Yes
P-values comb.: Max method | Yes
P-values comb.: Sum method | Yes
Fixed effects model: MLE method | Yes
Fixed effects model: Bayesian method | Yes
Random effects model: Method of moments | Yes
Random effects model: REML method | Yes
Random effects model: Bayesian method (normal prior and fixed τ) | Yes
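A numerical sketch of the combination with F0 = Φ, using three hypothetical normal-type CDs Hi(θ) = Φ((θ − esti)/sei); with the inverse-standard-error weights chosen here, the combined CD coincides with the fixed-effects (MLE) meta-analysis CD, in line with the table above.
# CD combination with the convenient choice F0 = Phi:
#   Hc(theta) = Phi( sum_i w_i * Phi^{-1}(H_i(theta)) / sqrt(sum_i w_i^2) )
# The input CDs are hypothetical normal-type CDs H_i(theta) = Phi((theta - est_i)/se_i),
# as if from three independent studies.
import numpy as np
from scipy.stats import norm

estimates = np.array([0.12, 0.20, 0.15])   # hypothetical study estimates
ses = np.array([0.08, 0.05, 0.10])         # hypothetical standard errors

def H(theta, i):
    return norm.cdf((theta - estimates[i]) / ses[i])

def combined_cd(theta, w):
    z = np.array([norm.ppf(H(theta, i)) for i in range(len(w))])
    return norm.cdf(np.dot(w, z) / np.sqrt(np.sum(w ** 2)))

w = 1.0 / ses   # inverse-standard-error weights (an illustrative choice)

# With these weights the combination reproduces the fixed-effects (MLE) CD
prec = 1.0 / ses ** 2
xbar_w = np.sum(prec * estimates) / np.sum(prec)
se_w = 1.0 / np.sqrt(np.sum(prec))

for t in np.linspace(0.0, 0.3, 7):
    print(f"theta={t:.2f}  combined={combined_cd(t, w):.3f}  "
          f"fixed-effects={norm.cdf((t - xbar_w) / se_w):.3f}")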
Error Bounds on Combining
Lemma (Xie et al., 2008). Let … for all …. Then ….
Theoretical: If each individual CD Hi is accurate up to …, the combined CD function Hc is accurate up to …, where ….
Practical: The CD combination approach still works, even if the model assumptions are slightly off or approximations of the CD functions are used.
Summary on CDs
Two levels of unification:
CD, as a pure frequentist concept, covers a wide range of examples
A useful device for statistical inference
The proposed CD combination methods subsume important existing information-combination approaches
Flexible, efficient and broad
Offers new methods beyond conventional approaches
A frequentist's Bayesian compromise to incorporate expert opinions
A robust meta-analysis approach
CD to Summarize Expert Opinions
Question: How to use a CD to summarize the expert opinions?
[Histogram of the experts' survey inputs on the improvement δ = p1 – p0 (Courtesy of J&J)]
Answer: The distribution underlying the histogram is the CD!
Model Assumptions
Experts are randomly sampled from a large pool of experts and they are subject to errors.
The average of their errors tends to zero as the number of experts participating in the survey goes to infinity.
There are past experiments and pre-existing knowledge on the improvement δ = p1 – p0, to which the experts are exposed.
These experiments could be some informal, similar, or even virtual experiments, but we assume no actual data are available.
Model/CD of Past Information
Let … be a “sufficient statistic” (sample realization …) that summarizes the information on the improvement δ from the past experiments or knowledge.
It is monotonically increasing/decreasing in δ and its distribution is free of δ.
Example: …
The pre-existing knowledge is the realization …, not δ0.
⇒ A CD function for δ from the past experiments is …, where ….
Problem: … is unknown/unobserved.
Rely on expert opinions to recover the prior information!
Model/CD of Expert Opinion
Let … be the summary statistic of the past experiments or knowledge to which the ith expert has been exposed.
(Assumption) There exists a function ….
(Error model) Experts are subject to errors: ….
Key connection: Treat the inputs in the survey table as the experts' recollections of their ….
CD to Summarize Expert Opinions
… is modeled as a bootstrap copy of …: ….
Let … be the CD function for δ constructed from the bootstrapped statistic.
Assume …, as ….
The CD from the past experiments/knowledge: ….
The histogram is a distribution estimate of δ!!
When a normal curve is fit to the histogram, the CD density is the same as the prior of the naïve Bayesian approach.
Similar to the Bayesian modeling approach of Genest and Zidek (1986).
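A small sketch of the key idea that the survey histogram itself estimates the CD: it forms the empirical distribution of hypothetical expert inputs on δ and, alternatively, fits a normal curve (which corresponds to the naïve Bayesian prior). The 11 values are made up for illustration, not the actual J&J survey inputs.
# CD for the expert opinions: the empirical distribution of the experts' survey
# inputs on delta (i.e., the histogram), or a normal curve fitted to it.
# The 11 inputs below are hypothetical, not the actual J&J survey values.
import numpy as np
from scipy.stats import norm

expert_inputs = np.array([-0.05, 0.00, 0.05, 0.05, 0.10, 0.10,
                          0.15, 0.15, 0.20, 0.25, 0.30])   # hypothetical

def expert_cd_empirical(delta):
    # empirical CDF underlying the histogram
    return np.mean(expert_inputs <= delta)

mu0, sigma0 = expert_inputs.mean(), expert_inputs.std(ddof=1)

def expert_cd_normal(delta):
    # normal curve fitted to the histogram (same as the naive Bayesian prior)
    return norm.cdf((delta - mu0) / sigma0)

for d in (0.0, 0.1, 0.2):
    print(f"delta={d:.2f}  empirical={expert_cd_empirical(d):.3f}  normal fit={expert_cd_normal(d):.3f}")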
CD to Summarize Clinical Trial Data
An aCD from the MLE estimator: …, with ….
A CD from the density/profile likelihood function:
Let … be the log profile likelihood function of δ, and denote ….
⇒ A CD of δ is ….
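A sketch of the aCD for δ from the trial data, assuming the usual normal approximation of the MLE δ̂ = p̂1 − p̂0; the counts are hypothetical, not the actual trial results.
# aCD for delta = p1 - p0 from the clinical trial data, using the normal
# approximation of the MLE: H(delta) = Phi((delta - delta_hat) / se_hat).
# The counts below are hypothetical, not the actual trial results.
import numpy as np
from scipy.stats import norm

x1, n1 = 35, 100    # responders / subjects, treatment (group A), hypothetical
x0, n0 = 20, 100    # responders / subjects, control (group B), hypothetical

p1_hat, p0_hat = x1 / n1, x0 / n0
delta_hat = p1_hat - p0_hat
se_hat = np.sqrt(p1_hat * (1 - p1_hat) / n1 + p0_hat * (1 - p0_hat) / n0)

def data_cd(delta):
    return norm.cdf((delta - delta_hat) / se_hat)

print("delta_hat =", round(delta_hat, 3), " se =", round(se_hat, 3))
print("one-sided p-value for H0: delta <= 0:", round(data_cd(0.0), 4))
print("95% interval from the CD:", norm.ppf([0.025, 0.975], loc=delta_hat, scale=se_hat))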
CD Combination
H1(δ): a CD for the expert opinions (histogram or fitted curve)
H2(δ): a CD from the clinical data
Combined CD using the normal function (explicit):
Hc(δ) = Φ({w1 Φ⁻¹(H1(δ)) + w2 Φ⁻¹(H2(δ))}/√(w1² + w2²))
An example: weights ….
When we use the normal approximation and fit a normal curve to the histogram, the combined CD matches the naïve posterior.
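An end-to-end sketch combining a normal-type expert-opinion CD and the clinical-data aCD with the F0 = Φ combining function; all inputs and the inverse-standard-error weights are hypothetical choices for illustration (the talk's actual weights are not shown on the slide). With these particular weights and two normal-type CDs, the combined CD reduces to the naïve normal-normal posterior, consistent with the remark above.
# Combine the expert-opinion CD and the clinical-data CD with F0 = Phi:
#   Hc(delta) = Phi((w1*Phi^{-1}(H1(delta)) + w2*Phi^{-1}(H2(delta))) / sqrt(w1^2 + w2^2))
# All numbers and the inverse-standard-error weights are hypothetical choices.
import numpy as np
from scipy.stats import norm

mu0, sigma0 = 0.12, 0.10        # normal curve fitted to the expert histogram (hypothetical)
delta_hat, se_hat = 0.15, 0.06  # MLE and standard error from the trial (hypothetical)

def H_expert(d):
    return norm.cdf((d - mu0) / sigma0)

def H_data(d):
    return norm.cdf((d - delta_hat) / se_hat)

w = np.array([1 / sigma0, 1 / se_hat])

def H_combined(d):
    z = np.array([norm.ppf(H_expert(d)), norm.ppf(H_data(d))])
    return norm.cdf(np.dot(w, z) / np.sqrt(np.sum(w ** 2)))

for d in np.linspace(0.0, 0.3, 7):
    print(f"delta={d:.2f}  expert={H_expert(d):.3f}  data={H_data(d):.3f}  combined={H_combined(d):.3f}")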
Summary & Discussion
Theoretical issues:
A fundamental flaw exists in the naïve Bayesian approach: it is not possible to find a conditional density function f(data | δ).
A full Bayesian solution is to jointly model (p0, p1): it requires additional prior information on p0 or p1, and (a bigger problem) it could generate a “paradox” in the skewed case.
Empirical results:
When the histogram is close to bell-shaped, all approaches agree with each other.
When the histogram is skewed, the naïve Bayesian approach cannot capture the skewness, and the full Bayesian approach produces a counter-intuitive solution.
Computational effort and implementation:
The CD approach is explicit and computationally simple.
No need for simulation or an expensive algorithm.
No need to assume the form of the prior distribution or extra computing.
The histogram of expert opinions can be directly used in the CD approach.
Relevant papers
• Singh, K., Xie, M. and Strawderman, W.E. (2005). Combining Information from Independent Sources Through Confidence Distributions. The Annals of Statistics, 33, 159-183.
• Singh, K., Xie, M. and Strawderman, W.E. (2007). Confidence Distribution (CD) - Distribution Estimator of a Parameter. In Complex Datasets and Inverse Problems, IMS Lecture Notes-Monograph Series, 54 (R. Liu et al., Eds.), 132-150.
• Xie, M., Singh, K. and Strawderman, W.E. (2008). Confidence Distributions and a Unifying Framework for Meta-analysis. Technical report, Department of Statistics, Rutgers University.
• Xie, M., Singh, K. and Zhang, C.-H. (2009). Confidence Intervals for Population Ranks in the Presence of Ties or Near Ties. JASA. In press.
• Xie, M., Liu, R., Damaraju, C.V. and Olson, W.H. (2009). Incorporating Expert Opinions with Information from Binomial Clinical Trials. Technical report, Department of Statistics, Rutgers University.
Thank You!