Bayesian Tests and Model Diagnostics in Conditionally Independent Hierarchical Models
Author(s): Jim Albert and Siddhartha Chib
Source: Journal of the American Statistical Association, Vol. 92, No. 439 (Sep., 1997), pp. 916-925
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2965555
Consider the conditionally independent hierarchical model (CIHM) in which observations $y_i$ are independently distributed from $f(y_i \mid \theta_i)$, the parameters $\theta_i$ are independently distributed from distributions $g(\theta \mid \lambda)$, and the hyperparameters $\lambda$ are distributed according to a distribution $h(\lambda)$. The posterior distribution of all parameters of the CIHM can be efficiently simulated by Markov chain Monte Carlo (MCMC) algorithms. Although these simulation algorithms have facilitated the application of CIHMs, they generally have not addressed the problem of computing quantities useful in model selection. This article explores how MCMC simulation algorithms and other related computational algorithms can be used to compute Bayes factors that are useful in criticizing a particular CIHM. In the case where the CIHM models a belief that the parameters are exchangeable or lie on a regression surface, the Bayes factor can measure the consistency of the data with the structural prior belief. Bayes factors can also be used to judge the suitability of particular assumptions in CIHMs, including the choice of link function, the nonexistence or existence of outliers, and the prior belief in exchangeability. The methods are illustrated in the situation in which a CIHM is used to model structural prior information about a set of binomial probabilities.
KEY WORDS: Bayes factor; Binomial data; Exchangeability; Exponential family; Generalized linear model; Gibbs sampling; Hierarchical model; Link estimation; Markov chain Monte Carlo; Metropolis-Hastings algorithm; Outliers; Partial exchangeability; Random effects.
Jim Albert is Professor, Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403. Siddhartha Chib is Professor, John M. Olin School of Business, Washington University, St. Louis, MO 63130.

© 1997 American Statistical Association. Journal of the American Statistical Association, September 1997, Vol. 92, No. 439, Theory and Methods.

1. INTRODUCTION

1.1 The Conditionally Independent Hierarchical Model

This article considers the general problem of model criticism and testing within the context of a conditionally independent hierarchical model (CIHM). Suppose that independent observations $y = (y_1, \ldots, y_n)$ are observed, where $y_i$ is distributed according to the density $f(y_i \mid \theta_i)$ with unknown parameter $\theta_i$. Suppose that the set of $n$ parameters $\theta = \{\theta_1, \ldots, \theta_n\}$ are believed a priori to satisfy some structural relationship, such as exchangeability or, more generally, a belief that they lie on a particular lower-dimensional regression surface. One can model this belief by assuming that $\{\theta_i\}$ are independently distributed, where $\theta_i$ is distributed from a prior density $p(\theta_i \mid \lambda)$, where $\lambda$ is a vector of hyperparameters common to all of the $n$ prior distributions. The model specification is completed by assigning a prior distribution $p(\lambda)$ to the unknown set of hyperparameters. (General discussions of hierarchical models have been presented in Berger 1985, chap. 4, Lindley and Smith 1972, and Morris 1983a.)

One particular class of CIHMs can be defined by taking the sampling density $f(y \mid \theta)$ to be a member of an exponential family (Albert 1988; Kass and Steffey 1989; Lu 1992; Morris 1983b). In particular, suppose that the $y_i$ are independently distributed from an exponential family density $f(y_i \mid \eta_i)$ with mean $\eta_i$. Suppose that the $\eta_i$ are believed a priori to satisfy the generalized linear model $g(\eta_i) = \theta_i = x_i^T \beta$, where $g$ is a known link function mapping the mean $\eta_i$ to the real-valued parameter $\theta_i$ and $\beta$ is an unknown regression parameter vector. One can model a belief in this regression structure by taking $\theta_i$ independently distributed from $\mathcal{N}(x_i^T \beta, \tau^2)$ distributions and then assigning the hyperparameters $(\beta, \tau^2)$ a known distribution $p(\beta, \tau^2)$.

This article focuses on the following exchangeable binomial model. Suppose that the $y_i$ are independently distributed from binomial$(n_i, p_i)$ distributions with sample sizes $n_i$ and probabilities of success $p_i$. Reparameterize the probabilities $p_i$ to the logits $\theta_i = \log(p_i/(1 - p_i))$. To model the belief that the $p_i$ are exchangeable, the $\theta_i$ are assumed to be a random sample from a $\mathcal{N}(\mu, \tau^2)$ distribution, and $(\mu, \tau^2)$ is assigned a prior at the second stage.

In this article this exchangeable model is used in two applications. In an example from Tsutakawa et al. (1985), one is interested in simultaneously estimating cancer mortality rates from a number of cities in Missouri. The incidence of cancer is not believed to vary significantly between cities, and the use of the exchangeable model appears plausible. In a second example, one wishes to estimate the proportions of correct answers on different items of a mathematics placement test. Since all of the questions cover basic algebra and trigonometry skills, one might expect the probabilities of correct response on different items to be similar, and therefore the exchangeable assumption again seems reasonable.

In the twenty years since the introduction of the normal hierarchical model by Lindley and Smith (1972), CIHMs have been shown to be a useful model structure for combining information from related experiments. (See Gaver et al. 1992 for a recent review of applications.) In addition, a number of new computational techniques have been developed for summarizing posterior distributions in this class of models. For example, Albert (1988) and Kass and Steffey (1989) illustrated computations using the Laplace method and, more recently, Gelfand and Smith (1990) and Gelfand, Hills, Racine-Poon, and Smith (1990) illustrated the use of Markov chain Monte Carlo (MCMC) algorithms in obtaining samples from the joint posterior distribution of CIHMs. Albert and Chib (1993) and Dellaportas and Smith (1993) have developed MCMC algorithms for particular exponential family distributions that can be generalized in a straightforward way to the CIHM.
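As a rough illustration of the exchangeable binomial model described in this section, the following sketch simulates it forward: logits are drawn from a normal second stage and then binomial counts are drawn given those logits. The function names, the sample sizes, and the values of $\mu$ and $\tau^2$ are illustrative assumptions, not values taken from the article.

```python
import math
import random


def inv_logit(theta):
    """Map a logit theta back to a probability p."""
    return 1.0 / (1.0 + math.exp(-theta))


def simulate_exchangeable_binomial(n_trials, mu, tau2, rng):
    """Simulate the two lower stages of the exchangeable binomial model.

    theta_i ~ N(mu, tau2) are the logits; y_i ~ binomial(n_i, p_i) with
    p_i = inv_logit(theta_i). All argument values are hypothetical.
    """
    thetas = [rng.gauss(mu, math.sqrt(tau2)) for _ in n_trials]
    probs = [inv_logit(t) for t in thetas]
    # A binomial count is a sum of n_i independent Bernoulli(p_i) draws.
    ys = [sum(rng.random() < p for _ in range(n)) for n, p in zip(n_trials, probs)]
    return thetas, probs, ys


rng = random.Random(0)
n_trials = [50, 200, 1000]          # hypothetical sample sizes n_i
thetas, probs, ys = simulate_exchangeable_binomial(n_trials, mu=-2.0, tau2=0.25, rng=rng)
```

Setting `tau2=0.0` makes all logits equal to `mu`, which corresponds to the case in which every unit shares a single success probability.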
1.2 Model Criticism

Despite the attractiveness of the CIHM, it should be recognized that any posterior inference is made conditional on the collection of assumptions implicit in the modeling. These assumptions are typically made for convenience. For example, parametric forms may be selected on the basis of conjugacy or because the chosen forms simplify the fitting procedure. A link function, such as the logit, may be chosen so that the transformed parameters are easy to interpret.

These modeling assumptions, however, should be made with caution. It is possible that the patterns of the observed data may be inconsistent with the model, and one must think of alternative assumptions that provide a better fit. Note that the parameters $\{\theta_i\}$ in the basic exchangeable CIHM are assumed to be conditionally independent and identically distributed. This assumption does not appear to be tenable from a preliminary analysis of the cancer mortality rates, because a few of the observed mortality rates have unusually large values. As a result, it is of interest to examine an alternative model for the rates that permits the occurrence of a few outliers. Another concern relates to the choice of the link function and whether the logit link is appropriate for these data. In the context of the placement dataset, the observed proportions of correct answers appear to cluster into two groups, corresponding to the easy and difficult questions. In this case it may be preferable to fit a "two-group" exchangeable model in which questions within a particular group are believed exchangeable. In both applications, it is difficult to specify the second-stage prior for $(\mu, \tau^2)$. Thus any particular choice of prior is at best a rough match to one's prior beliefs. Therefore, it is of interest to investigate the sensitivity of the inference with respect to the choice of functional form or hyperparameters for the second-stage prior.

In situations with plausible alternative models, the Bayesian paradigm provides a useful methodology for assessing the sensitivity to model assumptions and for criticizing different assumptions. Suppose that there exist $K$ possible models, $M_1, \ldots, M_K$, for the observed data. Associated with $M_k$, let $\psi_k$ denote the parameter vector and let $f_k(y \mid \psi_k)$ and $\pi_k(\psi_k)$ denote the sampling density and prior density. If the prior density is given by $\sum_{k=1}^{K} \gamma_k \pi_k(\psi_k)$, where $\gamma_k$ is the prior probability of model $M_k$, then by Bayes' rule, the posterior density is given by $\sum_{k=1}^{K} \gamma_k(y) \pi_k(\psi_k \mid y)$, where $\pi_k(\psi_k \mid y) = c_k \pi_k(\psi_k) f_k(y \mid \psi_k)$ is the posterior density of the unknown parameter conditional on model $M_k$ and $\gamma_k(y)$ is the posterior probability of model $M_k$. The collection of posterior densities $\{\pi_k(\psi_k \mid y), k = 1, \ldots, K\}$ is useful in investigating the sensitivity of inferences with respect to changes in the model assumptions, and the posterior model probabilities $\{\gamma_k(y)\}$ are helpful in criticizing particular sets of assumptions. (See Kass and Raftery 1995, for a recent review of the use of this method in comparing models, and Carlin and Chib 1995, for a MCMC model indicator algorithm to estimate $\gamma_k(y)$.)

1.3 Comparing Fixed- and Random-Effects Models

Given a particular CIHM, the focus of the foregoing is to criticize the assumptions by means of mixture models. A perhaps more fundamental issue is whether the observed data are consistent with the prior belief in the regression structure. In the exchangeable case, one wishes to decide between the fixed-effects model, where the $\theta_i$ are all equal, and the random-effects model, where the $\theta_i$ are different. In the cancer mortality dataset, one questions whether the mortality rates differ between cities. To construct a Bayesian test of the fixed-effects model, note that as the hyperparameter $\tau^2$ approaches 0, the prior distribution concentrates on the regression surface $g(\eta_i) = x_i^T \beta$. The fixed-effects hypothesis is equivalent to the hypothesis $H$ that $\tau^2 = 0$. One can test $H$ against the alternative hypothesis $K$ that $\tau^2$ takes some positive value $\tau_0^2$ by means of the Bayes factor $B_{KH} = m(y \mid \tau_0^2)/m(y \mid 0)$, where $m(y \mid \tau^2)$ is the marginal probability of the data for a fixed value of the hyperparameter $\tau^2$. One can assess the goodness of fit of the reduced model $\theta_i = x_i^T \beta$ by computing the Bayes factor for a sequence of values $\tau_0$. (See Kass and Raftery 1995 for a review of the use of Bayes factors.)

1.4 Outline of the Article

Section 2 illustrates a MCMC scheme for comparing models for the CIHM. Using the conditional independence structure of the model, a simple MCMC algorithm is proposed to estimate a regression CIHM with particular emphasis on the exchangeable situation. With the exchangeable model as the base, three types of model perturbations (scale-inflated mixture distributions, partial exchangeability, and link mixture models) are introduced, and in each case a MCMC algorithm is outlined to perform model diagnostics with respect to the perturbation. A model indicator algorithm is also developed for the case where one is uncertain about the specification of the second-stage prior distribution of the hierarchical exchangeable model.

Section 3 discusses the use of simulation methods in deriving a Bayesian test for a reduced regression model. A MCMC model indicator algorithm is developed that permits movement between the fixed- and random-effects models. The Bayes factor is obtained from the simulated posterior distribution of $\tau^2$. Section 4 contains illustrations, and Section 5 presents some brief concluding remarks.

2. MODEL DIAGNOSTICS USING MARKOV CHAIN MONTE CARLO IN AN EXCHANGEABLE MODEL

2.1 A Markov Chain Monte Carlo Algorithm to Fit a Hierarchical Regression Model

Consider the following three-stage CIHM, which is useful in modeling the belief that a set of $n$ parameters from an exponential family distribution satisfies a regression structure:

1. $y_i \mid \eta_i$ independent from $f(y_i \mid \eta_i)$, where $\eta_i = E(y_i)$
2. $\theta_i = g(\eta_i)$ are independent from $\mathcal{N}(x_i^T \beta, \tau^2)$
3. $(\beta, \tau^2)$ are independent with $\beta$ distributed $\mathcal{N}_k(\beta_0, B_0^{-1})$ and $\tau^2$ distributed $\mathcal{IG}(a, b)$.

In this model description, $f$ is a particular member of the exponential family, such as the binomial, Poisson, or gamma, $\eta_i$ is the mean of the $i$th observation, and $g$ is a known link function that maps the support of the mean parameter into the real line. To model the belief that the $\eta_i$ satisfy a regression structure, the transformed parameters $\theta_i$ are taken to be independent from normal distributions with mean parameters $x_i^T \beta$ and common variance parameter $\tau^2$. In the final stage of the CIHM, the unknown hyperparameters are assigned independent multivariate normal and inverse gamma distributions with known hyperparameter values.

Due to the conditional independence structure of this model, it is straightforward to construct a MCMC simulation algorithm to generate variates from the joint posterior distribution of $(\{\theta_i\}, \beta, \tau^2)$. Conditional on the parameters $\beta$ and $\tau^2$, $\theta_1, \ldots, \theta_n$ have independent posterior distributions with $\theta_i$ distributed according to the density proportional to

$$f(y_i \mid h(\theta_i))\,\phi(\theta_i; x_i^T \beta, \tau^2),$$

where $\eta = h(\theta)$ is the inverse link function and $\phi(\cdot\,; \mu, \tau^2)$ is the normal density function. Let $\theta$ and $X$ denote the vector of real-valued parameters and regression matrix. Then by combining stages 2 and 3 of the model, one sees that the hyperparameters $\beta$ and $\tau^2$ have the following conditional distributions:

$$\beta \mid \{\theta_i\}, \tau^2, y \ \text{distributed} \ \mathcal{N}_k(\hat{\beta}, B_1^{-1})$$

and

$$\tau^2 \mid \{\theta_i\}, \beta, y \ \text{distributed} \ \mathcal{IG}\left(a + n/2,\ b + \tfrac{1}{2}\sum_{i=1}^{n} (\theta_i - x_i^T \beta)^2\right),$$

where $\hat{\beta} = B_1^{-1}(B_0 \beta_0 + X^T \theta \tau^{-2})$ and $B_1 = B_0 + X^T X \tau^{-2}$.

The structure of the foregoing conditional posterior distributions suggests a simple MCMC scheme. Given an initial guess at the hyperparameters $\beta$ and $\tau^2$, use $n$ independent Metropolis steps (Chib and Greenberg 1995; Tierney 1994) to simulate values of the transformed means $\theta_i$. In the applications discussed here, we used a random walk chain with the increment density $\mathcal{N}(0, c^2)$, where the standard deviation $c$ is approximately twice the standard deviation of the density being simulated. (From empirical work, if the density is approximately normally distributed, then it appears that this method will accept candidates with probability .5.) Next, given the simulated values of the $\theta_i$, simulate $\beta$ and $\tau^2$ from the foregoing normal and inverse gamma distributions, where one is conditioning on the most recently simulated values. This cycle simulates a Markov chain that converges to the joint posterior distribution of $\{\theta_i\}, \beta, \tau^2$. After discarding a set of burn-in iterations, the remaining simulated sample is taken as a sample from the joint posterior distribution. For the examples of this article, the burn-in time appeared to be relatively short, and the posterior calculations were based on a simulated sample of 5,000 iterations.

In the special case in which one is modeling a belief in exchangeability, stages 2 and 3 of the hierarchical model are replaced by the assumptions that at the second stage, the $\theta_i$ are a random sample from a $\mathcal{N}(\mu, \tau^2)$ distribution, and at the third stage, $\mu$ and $\tau^2$ have independent $\mathcal{N}(\mu_0, \sigma_0^2)$ and $\mathcal{IG}(a, b)$ distributions. In the following, this is referred to as the "basic" exchangeable model. In this case the hyperparameters $\mu$ and $\tau^2$ have the following conditional distributions:

$$\mu \mid \{\theta_i\}, \tau^2, y \ \text{distributed} \ \mathcal{N}\left(\frac{\sum_{i=1}^{n} \theta_i/\tau^2 + \mu_0/\sigma_0^2}{n/\tau^2 + 1/\sigma_0^2},\ \frac{1}{n/\tau^2 + 1/\sigma_0^2}\right)$$

and

$$\tau^2 \mid \{\theta_i\}, \mu, y \ \text{distributed} \ \mathcal{IG}\left(a + n/2,\ b + \tfrac{1}{2}\sum_{i=1}^{n} (\theta_i - \mu)^2\right).$$

2.2 Outlier and Partial Exchangeable Models

As described in Section 1, suppose that one wishes to perturb the basic exchangeable model by replacing particular components by discrete mixtures that reflect alternative models or directions in which the model can fail. First, suppose that one or more of the $\theta_i$ are outlying in the sense that their locations are located far from the locations of the main group of parameters. A simple scale-multiple outlier model (Gastwirth and Cohen 1970) is obtained by assuming that the $\theta_i$ are a priori independently distributed from the mixture density

$$\pi(\theta_i \mid \mu, \tau^2) = (1 - p_i)\,\phi(\theta_i; \mu, \tau^2) + p_i\,\phi(\theta_i; \mu, K\tau^2), \qquad K > 1.$$

This density reflects the prior belief that with probability $p_i$, the $i$th component $\theta_i$ is outlying. Typically, one chooses $K$ to be equal to 2 or 3 and sets $p_i$ equal to a small number, reflecting the prior belief that outliers are unlikely to occur.

The foregoing MCMC scheme can be modified in a straightforward manner to incorporate this outlier model. Let $\lambda_i$ denote the scalar multiple of the variance corresponding to the component $\theta_i$. A priori, $\lambda_i$ is equal to the values 1 and $K$ with known probabilities $1 - p_i$ and $p_i$. The posterior distributions of $\theta_i$, $\mu$, and $\tau^2$, conditional on all remaining parameters including $\lambda_i$, follow the expressions given earlier with the variance $\tau^2$ of $\theta_i$ replaced by $\lambda_i \tau^2$. Finally, the full conditional posterior distribution of $\lambda_i$ is discrete on the values 1 and $K$ with probabilities proportional to $(1 - p_i)\,\phi(\theta_i; \mu, \tau^2)$ and $p_i\,\phi(\theta_i; \mu, K\tau^2)$ (see also Guttman and Schollnik 1994).

The foregoing outlier model is one particular deviation from the assumption that the $\theta_i$ are exchangeable. Another deviation from exchangeability is a belief in partial exchangeability, which states that the $\theta_i$ can be partitioned into a number of groups such that within each group the
parameters are exchangeable. With this assumption, one is interested in shrinking the observations toward a central value in each group. (This behavior was termed "multiple shrinkage" by George [1986] and O'Hagan [1988]. Consonni and Veronese [1995] described an alternative method of implementing partial exchangeability for binomial data.)

Specifically, suppose that the partitioning is into a known number $G$ of groups and let the random variable $g_i$ denote the unknown group membership of $\theta_i$ ($g_i = 1, \ldots, G$). Describe the partition as $\theta = (\theta_{(1)}, \ldots, \theta_{(G)})$, where $\theta_{(j)} = \{\theta_i \text{ such that } g_i = j\}$. Suppose that a priori $\theta_i$ belongs to group $j$ with probability $p_{ij}$, where $\sum_j p_{ij} = 1$.

Write the elements of the $j$th group as $\theta_{(j)} = (\theta_{j1}, \ldots, \theta_{jn_j})$. Assume that a priori the $\theta_{(j)}$ are independent and, within the $j$th group, the $\theta_{jh}$ are exchangeable. This exchangeability assumption can be modeled as earlier by letting $\theta_{j1}, \ldots, \theta_{jn_j}$ be a random sample from a $\mathcal{N}(\mu_j, \tau_j^2)$ distribution and taking $\mu_j, \tau_j^2$ independent from known $\mathcal{N}(m_j, \gamma_j)$ and $\mathcal{IG}(a_j, b_j)$ distributions.

In this problem the unknown parameters consist of the $\theta_i$, the group memberships $g_i$, and the second-stage parameters $(\mu_j, \tau_j^2)$, which describe the locations of the $G$ groups. A MCMC sampling scheme is constructed by alternately simulating from the $\theta_i$ and the second-stage parameters conditional on the group memberships, and the group memberships conditional on all remaining parameters. If the group memberships are known, then $\theta$ has been partitioned into the $G$ groups. The $\theta_{jh}$, $\mu_j$, and $\tau_j^2$ have the following conditional distributions:

1. $\theta_{jh}$ distributed from the density proportional to $f(y_{jh} \mid h(\theta_{jh}))\,\phi(\theta_{jh}; \mu_j, \tau_j^2)$
2. $\mu_j$ distributed $\mathcal{N}\left[\left(\sum_{h=1}^{n_j} \theta_{jh}/\tau_j^2 + m_j/\gamma_j\right)\big/\left(n_j/\tau_j^2 + 1/\gamma_j\right),\ \left(n_j/\tau_j^2 + 1/\gamma_j\right)^{-1}\right]$
3. $\tau_j^2$ distributed $\mathcal{IG}\left(a_j + n_j/2,\ b_j + \tfrac{1}{2}\sum_{h=1}^{n_j} (\theta_{jh} - \mu_j)^2\right)$.

To complete one cycle of the Gibbs sampler, one needs to simulate the group memberships. Conditional on all remaining parameters, the group membership $g_i$ of $\theta_i$ is discrete on the integers $1, 2, \ldots, G$ with probabilities proportional to $\{p_{ij}\,\phi(\theta_i; \mu_j, \tau_j^2),\ j = 1, \ldots, G\}$.

2.3 Different Link Functions

Another assumption of the basic exchangeable model is that the mean parameters $\eta_i$ are connected to the real-valued parameters $\theta_i$ by the known link function $g$. To investigate the appropriateness of the choice of $g$, one can imbed this choice of link within a family of links $\theta_i = g_\lambda(\eta_i)$, where $\lambda$ is an element of a discrete set $\{\lambda_1, \ldots, \lambda_L\}$. For example, in the binomial case where $\eta_i$ is a proportion, one can consider the class of symmetric link functions (Aranda-Ordaz 1981)

$$g_\lambda(\eta) = \frac{2}{\lambda}\,\frac{\eta^\lambda - (1 - \eta)^\lambda}{\eta^\lambda + (1 - \eta)^\lambda},$$

where the choices $\lambda = 0, 1$ correspond to the use of logistic and linear links, and a probit link corresponds approximately to the value $\lambda = .39$. The choice of this family of links cannot be made arbitrarily, because an exchangeable prior is placed on the $\theta_i$, and so the family needs to be defined so that the function $g_\lambda(\eta_i)$ has approximately the same location for different values of $\lambda$. The choice of link family for the binomial problem has this characteristic. Values of $\eta$ close to .5 are mapped to $\theta$ values about 0, and the different values of $\lambda$ reflect differences in the function for values of $\eta$ close to 0 or 1.

A MCMC scheme is derivable by a simple modification of the scheme for the basic exchangeable model. (See also Carlin and Polson 1991, for a similar method in regression models.) The values of $\theta_i$ are generated independently from densities proportional to $f(y_i \mid h_\lambda(\theta_i))\,\phi(\theta_i; \mu, \tau^2)$, where $h_\lambda(\theta_i)$ is the inverse link transformation. Values of the second-stage parameters $\mu$ and $\tau^2$ are generated just as in the basic exchangeable case. Finally, one generates a value of the link function parameter $\lambda$ by simulating from the discrete distribution on $\{\lambda_j\}$ with probabilities proportional to $\{p_j \prod_{i=1}^{n} f(y_i \mid h_{\lambda_j}(\theta_i))\}$.

2.4 Alternative Second-Stage Priors

In the basic exchangeable model, one is also concerned about the assessment of the priors for the second-stage parameters $\mu$ and $\tau^2$. The assumptions of independence of the two parameters and assignments of normal and inverse gamma functional forms are made primarily for convenience. A user may wish to use alternative priors for $(\mu, \tau^2)$. One may wish to investigate the sensitivity of the posterior analysis across changes in the prior for the second-stage parameters. In addition, one is interested in model criticism. Are there particular choices of the prior distribution that are more consistent with the observed data?

In general, suppose that the user has $L$ alternative priors for $(\mu, \tau^2)$. The prior distribution can be represented as the discrete mixture $\pi(\mu, \tau^2) = \sum_{j=1}^{L} p_j \pi_j(\mu, \tau^2)$, where the $\pi_j(\cdot)$ represent the different priors and the $p_j$ are the prior probabilities of the $L$ models. If one were interested solely in comparing different choices for the prior, then the probabilities $p_j = 1/L$ could be used.

To implement a MCMC algorithm, introduce the model indicator $I \in \{1, \ldots, L\}$, which indicates which of the $L$ priors is true. One wishes to simulate from the joint posterior distribution of $(\{\theta_i\}, \mu, \tau^2, I)$. As before, the $\theta_i$ are simulated independently from the densities proportional to $f(y_i \mid h(\theta_i))\,\phi(\theta_i; \mu, \tau^2)$. Given that the model indicator $I = j$, one simulates $(\mu, \tau^2)$ from the bivariate density proportional to

$$\prod_{i=1}^{n} \phi(\theta_i; \mu, \tau^2)\,\pi_j(\mu, \tau^2).$$

One cycle of the MCMC iterations is completed by simulating $I$ from $\{1, \ldots, L\}$ with probabilities proportional to $\{p_j \pi_j(\mu, \tau^2)\}$.

3. A MARKOV CHAIN MONTE CARLO ALGORITHM TO OBTAIN A BAYESIAN TEST OF A REDUCED MODEL

As in the previous section, consider the basic exchangeable model

1. $y_i \mid \eta_i$ independent from $f(y_i \mid \eta_i)$, where $\eta_i = E(y_i)$
2. $\theta_i = g(\eta_i)$ are independent from $\mathcal{N}(\mu, \tau^2)$
3. $(\mu, \tau^2)$ are independent with $\mu$ distributed $\mathcal{N}(\mu_0, \sigma_0^2)$ and $\tau^2$ distributed from $\pi(\tau^2)$, where $\pi(\tau^2)$ is an arbitrary distribution placed on the variance hyperparameter.

To construct a test of the hypotheses $H$: $\tau^2 = 0$, $K$: $\tau^2 > 0$, a discrete/continuous prior distribution needs to be constructed for $\tau^2$. Suppose that with probability $p_0$, $\tau^2 = 0$, and, with probability $1 - p_0$, $\tau^2$ is assigned a density $p(\tau^2)$. Then the Bayes factor (BF) in support of the alternative hypothesis $K$ over $H$ is given by

$$\mathrm{BF} = \frac{P(y \mid K)}{P(y \mid H)} = \frac{\int_0^\infty m(y \mid \tau^2)\,p(\tau^2)\,d\tau^2}{m(y \mid \tau^2 = 0)},$$

where $m(y \mid \tau^2)$ is the marginal density of the observed data $y$ for a fixed value of $\tau^2$. The posterior probability of the alternative hypothesis is given by $P(K \mid y) = (1 + [p_0/(1 - p_0)]\,\mathrm{BF}^{-1})^{-1}$. (See Berger and Delampady 1987 for a general discussion of this method in testing point null hypotheses.)

Values of the Bayes factor can be computed from a simulated sample of values of the marginal posterior density of $\tau^2$. A priori, the hypotheses $\tau^2 = 0$ and $\tau^2 > 0$ have respective prior probabilities $p_0$ and $1 - p_0$. From the simulated sample, one can estimate the posterior probabilities of the hypotheses $p_1 = P(\tau^2 = 0 \mid y)$ and $1 - p_1 = P(\tau^2 > 0 \mid y)$. The Bayes factor is computed as the ratio of the posterior odds to the prior odds, $\mathrm{BF} = [(1 - p_1)/p_1]/[(1 - p_0)/p_0]$.

To simulate from the joint posterior distribution, the following MCMC scheme is used. First, rewrite the first stage of the exchangeable prior model as $\theta_i = \mu + \gamma_i$, where $\gamma_i$ is distributed $\mathcal{N}(0, \tau^2)$. To simulate from the posterior density of $(\{\gamma_i\}, \mu, \tau)$, we block the parameters into the subsets $(\{\gamma_i\}, \tau)$ and $\mu$, and simulate variates in turn from the distribution of $(\{\gamma_i\}, \tau)$ conditional on $\mu$ and the distribution of $\mu$ conditional on $(\{\gamma_i\}, \tau)$.

To simulate from the conditional distribution of $(\{\gamma_i\}, \tau)$, a Metropolis-Hastings-type algorithm is used. Suppose that $(\tau^{(g)}, \{\gamma_i^{(g)}\})$ represent the current value of the parameters. Generate a candidate value from the proposal distribution $q(\tau, \{\gamma_i\})$. This proposal distribution first generates a value of $\tau$, $\tau^{(c)}$, and then generates a vector $\{\gamma_i^{(c)}\}$ conditional on the simulated value $\tau^{(c)}$. Compute the acceptance probability $\mathrm{PROB} = \min\{1, \rho\}$, where

$$\rho = \frac{\prod_{i=1}^{n}\left[f(y_i \mid h(\theta_i^{(c)}))\,\phi(\gamma_i^{(c)}; 0, \tau^{(c)2})\right]\,\pi(\tau^{(c)2})\,q(\tau^{(g)}, \{\gamma_i^{(g)}\})}{\prod_{i=1}^{n}\left[f(y_i \mid h(\theta_i^{(g)}))\,\phi(\gamma_i^{(g)}; 0, \tau^{(g)2})\right]\,\pi(\tau^{(g)2})\,q(\tau^{(c)}, \{\gamma_i^{(c)}\})}$$

and $\theta_i^{(c)} = \mu + \gamma_i^{(c)}$, $\theta_i^{(g)} = \mu + \gamma_i^{(g)}$. If a uniform random variate, $U$, is smaller than PROB, then the candidate vector $(\tau^{(c)}, \{\gamma_i^{(c)}\})$ is accepted; otherwise, the algorithm remains at the current vector value $(\tau^{(g)}, \{\gamma_i^{(g)}\})$. A Metropolis algorithm is also used to simulate from the conditional distribution of $\mu$.

The success of this algorithm depends on a suitable proposal density $q(\tau, \{\gamma_i\})$. This proposal density can be constructed using an approximation to the posterior density. This is described here for the binomial/logit model; a similar approximation can be developed for exponential family densities. The empirical logit $z_i = \log((y_i + .5)/(n_i - y_i + .5))$ is approximately normal with mean $\mu + \gamma_i$ and variance $\sigma_i^2 = 1/(y_i + .5) + 1/(n_i - y_i + .5)$. Conditional on $\mu$ and $\tau$, the posterior densities of the $\gamma_i$ are approximately independent normal with mean and variance

$$\hat{\gamma}_i = \frac{(z_i - \mu)/\sigma_i^2}{1/\sigma_i^2 + 1/\tau^2}$$

and

$$v_i = \frac{1}{1/\sigma_i^2 + 1/\tau^2}.$$

Using this approximation, the proposal density $q$ is defined as follows:

1. Simulate a value of $\tau$ from the mixed prior density $p_0\{\tau^2 = 0\} + (1 - p_0)\{\tau^2 > 0\}\,p(\tau^2)$. Call the simulated value $\tau^{(c)}$.
2. If $\tau^{(c)} = 0$, then the proposed values of $\gamma_i$ are all identically 0. If $\tau^{(c)} > 0$, then simulate $\gamma_i$ from independent normal distributions with means and variances $\hat{\gamma}_i$ and $v_i$.

In the cancer mortality example discussed in the next section, the foregoing algorithm is run using a discrete distribution $p(\tau^2)$ over a set of plausible $\tau^2$ values, say $\tau_1^2, \ldots, \tau_k^2$. A large prior probability $p_0$ is chosen for $\tau^2 = 0$, so that the algorithm will visit this value a sufficient number of times to obtain an accurate approximation to its posterior probability. From the empirical frequencies of the values 0, $\{\tau_h^2\}$ from the simulation run, the posterior probabilities are calculated, and these are converted to values of the Bayes factor. Specific comments about the choice of grid of $\tau^2$ values and the accuracy of this simulation algorithm are made in the example of Section 4.1.2.

4. EXAMPLES

4.1 Cancer Mortality Data

4.1.1 Introduction. To illustrate the methods described in Sections 2 and 3, consider the cancer mortality data analyzed by Tsutakawa, Shoop, and Marienfeld (1985). One is interested in simultaneously estimating the rates of death from stomach cancer for males "at risk" in the age bracket 45-64 for the 84 largest cities in Missouri. For the $i$th city ($i = 1, \ldots, 84$), one observes the number $n_i$ at risk and the number of cancer deaths $y_i$. Suppose that the $\{y_i\}$ are independent binomial with respective probabilities of death $\{p_i\}$. If the cancer death rates are not believed to vary significantly across cities (small or large) and across males in this age range, then it may be reasonable to believe a priori that the rates are exchangeable. This prior belief can be modeled (as in Sec. 2.1) by letting the logits $\theta_i = \log(p_i/(1 - p_i))$ be independent $\mathcal{N}(\mu, \tau^2)$, and then assigning $\mu$ and $\tau^2$ independent $\mathcal{N}(\mu_0, \sigma_0^2)$ and $\mathcal{IG}(a, b)$ distributions.
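A minimal sketch of how the basic exchangeable binomial/logit model might be fit with the scheme of Section 2.1 is given here: one random-walk Metropolis step per logit, followed by draws of $\mu$ and $\tau^2$ from their normal and inverse gamma conditionals. The function names, the fixed step size, the starting values, and the default hyperparameters are illustrative assumptions rather than the authors' implementation, and the inverse gamma is taken to have density proportional to $(\tau^2)^{-(a+1)} e^{-b/\tau^2}$.

```python
import math
import random


def log_target_theta(theta, y, n, mu, tau2):
    """Log of f(y | h(theta)) * phi(theta; mu, tau2), up to a constant (logit link)."""
    log1pexp = theta if theta > 30 else math.log1p(math.exp(theta))
    return y * theta - n * log1pexp - (theta - mu) ** 2 / (2.0 * tau2)


def fit_exchangeable(y, n, mu0=0.0, s0sq=1000.0, a=5.0, b=1.0,
                     iters=2000, burn=500, step=0.3, seed=0):
    """Metropolis-within-Gibbs sketch; returns (mu, tau2, theta) draws after burn-in."""
    rng = random.Random(seed)
    k = len(y)
    # Start the logits at the empirical logits (an assumed starting rule).
    theta = [math.log((yi + 0.5) / (ni - yi + 0.5)) for yi, ni in zip(y, n)]
    mu, tau2 = sum(theta) / k, 1.0
    draws = []
    for t in range(iters):
        # 1. One random-walk Metropolis step for each theta_i.
        for i in range(k):
            cand = theta[i] + rng.gauss(0.0, step)
            log_ratio = (log_target_theta(cand, y[i], n[i], mu, tau2)
                         - log_target_theta(theta[i], y[i], n[i], mu, tau2))
            if math.log(1.0 - rng.random()) < log_ratio:
                theta[i] = cand
        # 2. mu | theta, tau2 is normal.
        prec = k / tau2 + 1.0 / s0sq
        mean = (sum(theta) / tau2 + mu0 / s0sq) / prec
        mu = rng.gauss(mean, math.sqrt(1.0 / prec))
        # 3. tau2 | theta, mu is inverse gamma: invert a gamma draw with rate b_post.
        a_post = a + k / 2.0
        b_post = b + 0.5 * sum((th - mu) ** 2 for th in theta)
        tau2 = 1.0 / rng.gammavariate(a_post, 1.0 / b_post)
        if t >= burn:
            draws.append((mu, tau2, list(theta)))
    return draws
```

The article tunes the increment standard deviation to roughly twice the standard deviation of the density being simulated; the fixed `step` above is a simplification, and in practice it would be adjusted per unit.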
Table 1. Bayes Factors to Test the Fixed-Effects Model for the Cancer Mortality Dataset

$\tau$               0       .041    .061    .091    .135    .202    .301    .449    .670
Prior                .5      .0625   .0625   .0625   .0625   .0625   .0625   .0625   .0625
Posterior            .0689   .0151   .0278   .0697   .1762   .2823   .2614   .0932   .0053
Bayes factor         1.00    1.75    3.23    8.09    20.4    32.4    30.2    10.7    .62
log(Bayes factor)    0       .242    .509    .908    1.31    1.51    1.48    1.03    -.21
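The Bayes factor row of Table 1 can be recomputed, approximately, from the prior and posterior rows as posterior odds divided by prior odds, each taken against $\tau = 0$. The sketch below does this with the tabulated values; the function names are illustrative, the base-10 logarithm is an inference from the tabulated values rather than something stated in the text, and the recomputed factors agree with the table only to within about 1 to 2 percent, presumably because the tabulated probabilities are rounded.

```python
import math

# Rows of Table 1; index 0 is the point tau = 0.
tau = [0.0, 0.041, 0.061, 0.091, 0.135, 0.202, 0.301, 0.449, 0.670]
prior = [0.5] + [0.0625] * 8
posterior = [0.0689, 0.0151, 0.0278, 0.0697, 0.1762, 0.2823, 0.2614, 0.0932, 0.0053]


def bayes_factors(prior, posterior):
    """Bayes factor of each grid value against tau = 0: posterior odds / prior odds."""
    return [(posterior[k] / posterior[0]) / (prior[k] / prior[0])
            for k in range(len(prior))]


def overall_bf(p0, p1):
    """BF of K: tau^2 > 0 over H: tau^2 = 0 from prior (p0) and posterior (p1) mass at 0."""
    return ((1.0 - p1) / p1) / ((1.0 - p0) / p0)


def posterior_prob_alternative(bf, p0):
    """P(K | y) = (1 + [p0 / (1 - p0)] / BF)^(-1)."""
    return 1.0 / (1.0 + (p0 / (1.0 - p0)) / bf)


bf = bayes_factors(prior, posterior)
log10_bf = [math.log10(value) for value in bf]
```

The same functions apply to any discrete grid of $\tau$ values, provided the first entry is the point mass at 0.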
4.1.2
is evaluatedby the
integral
Each (univariate)
Model? First,one quadrature.
Fixed- or Random-Effects
of Naylorand Smith(1982). Figure1 contains
modelis appro- algorithm
a random-effects
mayquestionwhether
The "exact"valuesof the
modelwouldbe a fixed- resultsfromthesecalculations.
priateforthisdataset.A simpler
by
ofdeathis thesame log Bayesfactorforvariousvaluesof logT (computed
modelin whichtheprobability
effects
curve.
as a smooth
areplotted
method)
integration
evidencethatthedeathrates thedirect
acrosscities.Is theresufficient
valuesthatappearinTable1 (computed
that Thecorresponding
{Pi} varybetweencities?One can testthehypothesis
arealso showninthe
method)
a Bayesfactorfortheequiva- fromtheMCMC simulation
P1 = .= P bycomputing
lie close
hypothe-figure.
estimates
thatT2 = 0 againstthealternative
Notethatall thesimulation-based
lenthypothesis
thattheMCMC algorithm
valueTO.Table1 illustratesto thesmoothcurve,indicating
sisthatT2 equalssomepositive
accurateestimates.
usingtheMCMC algorithmhas provided
computations
theBayesfactor
in Section3. First,a gridof equallyspacedvaldescribed
It shouldbe also notedfromTable1 andFigure1 thatas
first
increase
ues oflog(T2) waschosen,anda priorwasusedthatplaced T increases,
thevaluesofthelogBayesfactor
.5 andthensharply
- .6.
.5 on thevalue T2 = 0 and theremaining
probability
decreasewhenlogT is approximately
overthepositiveT2 valueson the In addition,
uniformly
probability
areinthe1-1.5rangefor
thelogBayesfactors
was cho- aninterval
grid.In thisexample,a vaguepriordistribution
ofT values.(A gridofT valuesshouldbe chosen
,. Next,theMCMC algorithmtocoverthevalueatwhichtheBayesfactor
senforthehyperparameter
is maximized.)
prob- This analysisprovidessomesupport
andtheposterior
wasrunform = 100,000iterations,
fora random-effects
of the modelwitha positivevalueof T.
by therelativefrequencies
abilitieswereestimated
valueson thegrid.
simulated
To gauge the accuracy of these computed Bayes factors, an alternative computational method can be used. The marginal density of $y$ conditional on $\tau^2$ can be written as the $(n + 1)$-dimensional integral

$$m(y \mid \tau^2) = \int \left\{ \prod_{i=1}^{n} \int f(y_i \mid h(\theta_i))\, \phi(\theta_i; \mu, \tau^2)\, d\theta_i \right\} d\mu.$$

For a fixed value of $\mu$, the integrand in $\{\theta_i\}$ is a product of $n$ one-dimensional integrals, each of which can be computed by quadrature. Then the outer integral in $\mu$ is computed by

Figure 1. Values of the Log Bayes Factors Computed Using Adaptive Quadrature (Smooth Curve) and MCMC (Plotted Points) for the Cancer Mortality Dataset. (Horizontal axis: log tau.)

4.1.3 Fitting the Exchangeable Model. As a first look at the mortality data, Figure 2 plots the empirical logits, $\{\log[(y_i + 1/2)/(n_i - y_i + 1/2)]\}$, against the logarithms of the sample sizes, $\{n_i\}$, of each city. The two parallel lines of points in the plot correspond to observations that have death counts of 0 and 1. Generally, the variability of observed logits is relatively large for the smaller cities. More accurate estimates for these cities can be obtained by borrowing information across all cities.

Consider the use of the basic MCMC algorithm described in Section 2.1 with the hyperparameter values set at $\mu_0 = 0$, $\sigma^2 = 1{,}000$, $a = 5$, and $b = 1$. These values reflect relatively vague prior beliefs about the location of the second-stage parameters. Figure 3 contains the plot of the posterior means of the logits $\{\theta_i\}$. On comparing the plots in Figures 2 and 3, it can be noted that the observed logits of the smaller cities are shrunk substantially toward an average value. This model appears to have a much smaller effect on the mortality rates for the larger cities, because their respective posterior means are close to the observed logits.

Figure 2. Plot of Empirical Logits Against Logarithms of the Sample Sizes for the Cancer Mortality Dataset. (Horizontal axis: log N.)

Figure 3. Posterior Means of the Logits for the Cancer Mortality Dataset. (Horizontal axis: log N.)

Next, we examine the merits of the initial modeling assumptions. Note from Figure 3 that several of the posterior means appear to be somewhat distinct from the main group of observations. We thus consider an alternative outlier model that can accommodate unusual values of the $\theta_i$. We also examine whether an alternative link function can provide a better fit.

4.1.4 Fitting an Outlier Model. Consider the scale-inflated mixture model described in Section 2.2 and suppose that, a priori, each logit $\theta_i$ is distributed $N(\mu, \tau^2)$ with probability .95 and distributed $N(\mu, 3\tau^2)$ with probability .05. Equivalently, the unknown scale multiple $\lambda_i$ is equal to 1 and 3 with prior probabilities .95 and .05. Suppose that the second-stage hyperparameters take the same values as in the basic model. The posterior probability that $\lambda_i = 3$ for each city is obtained from the MCMC algorithm described earlier. The results are plotted in Figure 4. Note that one city has a large posterior outlying probability, suggesting that this city may have an unusually high mortality rate.

Although this new model has identified a possible outlier, it is not clear whether the outlier model is a significant improvement over the basic exchangeable model. To check this, a new model was run with an indicator variable that selects the basic model or the outlier model. With the prior probability of each model set at .5, the posterior probability of the basic model was estimated to be .88. Thus, although the outlier model has identified one possible outlier, it has not led to a significantly better fit than that obtained with the basic exchangeable model.

Figure 4. Posterior Outlying Probabilities Using the Outlier Exchangeable Model for the Cancer Mortality Dataset. (Horizontal axis: log N.)

The MCMC algorithm with model indicators can also be used to investigate the suitability of the choice of a logistic link for this dataset. The main problem is to define some plausible alternative link functions for this dataset. In Section 2.3, a class of symmetric link functions mapping the real-valued $\theta$ to the probability $p$ was proposed. But these functions are "matched" only for values of $p$ in an interval about .5, and this will be unsuitable for this example, where the mortality rates are close to 0. A closely related family of transformations is the power family (Tukey 1977)

$$g_v(p) = a_v \left( \frac{p}{1 - p} \right)^{v} + b_v.$$

The choices $v = 0$, $a_v = 1$, and $b_v = 0$ give the logistic link (the $v = 0$ member of the power family being read, by the usual convention, as $\log[p/(1 - p)]$), and for alternative power values $v$, the constants $a_v$ and $b_v$ can be chosen to match the logistic link for a particular interval of $p$ values (Emerson and Stoto 1983). For this dataset, an average mortality rate is $p = .001$, and for the choices $v = -.5$ and $v = .5$, values of the linear constants can be found so that the transformation $g_v(p)$ has the same value and the same derivative as the logit at $p = .001$. Figure 5 plots three inverse link functions, $h_v(\theta)$: the logistic cdf function and the two alternative link functions for the range of $\theta$ values in this example. Using the terminology of Tukey (1977), the power value $v = .5$ is one step away from the logit toward a linear link function, and the value $v = -.5$ has one step greater curvature than the logit.

For this example, the simulation was performed using a mixture model with equal prior probabilities for the three link functions and the second-stage hyperparameters set as earlier. After $m = 2{,}000$ cycles of the algorithm, the posterior relative frequencies of the links $v = -.5, 0, .5$ were estimated to be .18, .52, and .30. Thus for this dataset, the use of the logistic link is preferred, and the "more linear" link $v = .5$ is preferable to the "more curved" link $v = -.5$. This analysis is not intended to be a thorough study. However, it gives support to the use of the logit link in combining proportions from related experiments.
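The matching of the power links to the logit can be checked numerically. Assuming the power family has the form $g_v(p) = a_v\,[p/(1-p)]^v + b_v$, equating its value and first derivative with those of the logit at a reference probability $p_0$ gives $a_v = 1/\{v\,[p_0/(1-p_0)]^v\}$ and $b_v = \log[p_0/(1-p_0)] - 1/v$. The sketch below evaluates these at $p_0 = .001$; the function names are illustrative and do not come from the article.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def matched_power_link_constants(v, p0):
    """Constants (a_v, b_v) making g_v agree with the logit at p0
    in both value and first derivative (requires v != 0)."""
    if v == 0:
        raise ValueError("v = 0 is the logit itself, with (a, b) = (1, 0)")
    odds_pow = (p0 / (1.0 - p0)) ** v
    a = 1.0 / (v * odds_pow)        # from matching the derivatives
    b = logit(p0) - 1.0 / v         # from matching the values
    return a, b


def power_link(p, v, a, b):
    """Power-family link g_v(p); v = 0 is taken to be the log-odds member."""
    if v == 0:
        return a * logit(p) + b
    return a * (p / (1.0 - p)) ** v + b


for v in (-0.5, 0.5):
    a, b = matched_power_link_constants(v, p0=0.001)
    print(v, round(a, 4), round(b, 4))
# v = -0.5 gives (-0.0633, -4.9068); v = 0.5 gives (63.2139, -8.9068)
```

These values agree with the constants $(v, a, b)$ reported in the caption of Figure 5, which supports reading the power family in the form assumed here.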
Figure 5. Three Inverse Link Functions $h_v(\theta)$. Values of $(v, a, b)$ are $(-.5, -.0633, -4.9068)$, $(0, 1, 0)$, and $(.5, 63.2139, -8.9068)$. Separate line styles mark $v = -.5$, $v = 0$, and $v = .5$.
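The inverse links in Figure 5 can be evaluated by inverting the power transformation. The sketch below assumes the link has the form $g_v(p) = a_v\,[p/(1-p)]^v + b_v$ (with $v = 0$ read as the logit), so that for $v \neq 0$ the odds are $[(\theta - b_v)/a_v]^{1/v}$. The function name is illustrative, and returning NaN where the power link has no inverse is a choice made for this example, not something stated in the article.

```python
import math

# (a, b) constants for each power v, as listed in the Figure 5 caption
LINK_CONSTANTS = {-0.5: (-0.0633, -4.9068), 0.0: (1.0, 0.0), 0.5: (63.2139, -8.9068)}


def inverse_power_link(theta, v, a, b):
    """Probability p = h_v(theta) under the assumed power-family link."""
    if v == 0:
        z = (theta - b) / a
        return 1.0 / (1.0 + math.exp(-z))      # logistic cdf
    base = (theta - b) / a
    if base <= 0:
        return float("nan")                    # theta outside the link's range
    odds = base ** (1.0 / v)
    return odds / (1.0 + odds)


# All three inverse links should return roughly .001 at the logit of .001
theta0 = math.log(0.001 / 0.999)
for v, (a, b) in LINK_CONSTANTS.items():
    print(v, inverse_power_link(theta0, v, a, b))
```

With the four-decimal constants from the caption, the three curves pass through the common point only approximately, and the $v = .5$ link is undefined for $\theta$ below $b_v = -8.9068$ under this form.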
Figure 6. Plots of Observed Logits Against Question Number for the Math Placement Dataset. (Horizontal axis: question number.)
Figure 7. Plots of Posterior Means of the Logits (a) and Ratios of the Posterior Standard Deviations to the Classical Standard Errors (b) for the Math Placement Dataset. (Horizontal axis in both panels: observed logit.)

4.2 Placement Test Data

4.2.1 Introduction. As a second example, consider an item analysis from a mathematics placement test. At Bowling Green State University, freshmen are given a 32-item multiple choice test to aid in mathematics class placement. For a sample of 200 students, the number of correct answers is recorded for each question. Figure 6 plots the observed logits log(number correct/number incorrect) against the question number.

Let $p_i$ denote the probability that a student answers question $i$ correctly, $i = 1, \ldots, 32$. It is clear from the figure that a fixed-effects model in which the $p_i$ are equal is not reasonable; the problems appear to differ in terms of the level of difficulty. However, it is not clear from this figure whether the alternative random-effects/exchangeable model is suitable. Many of the observed logits are in the $-1.5$ to $.5$ range. But a few questions appear to be relatively easy (logit values from 1 to 2), and there appears to be a slight downward trend in the graph. These observations are consistent with knowledge of the test items. The questions that test basic algebra skills are relatively easy compared to questions on trigonometry or questions on simplifying algebraic expressions containing fractions.

4.2.2 Fitting a Partial Exchangeable Model. The foregoing comments motivate the consideration of a partial exchangeable model as described in Section 2.2. Consider the simplest model of this type, in which there are two classes of questions and, within a class, the questions are believed to be exchangeable. This model requires the specification of second-stage hyperparameters that specify the locations of the two groupings of logits $\{\theta_i\}$. For this example, the values $(m_1, \gamma_1) = (-1, .01)$, $(m_2, \gamma_2) = (1.4, .01)$, and $(a_i, b_i) = (10, 1)$ are used. These values of the hyperparameters reflect the prior belief that the two groups of logits will cluster about the values $-1$ and $1.4$. This prior belief is made strong by giving small standard deviations to the parameters $\mu_1$ and $\mu_2$. In addition, the relatively large shape parameter value $a_i = 10$ is chosen for $\tau_i^2$. This reflects the prior belief that the logits will cluster toward these two groups.

As described in Section 2.2, one introduces new parameters $\{g_i\}$, which indicate the group membership for each of the 32 questions, and then uses the MCMC algorithm to simulate from the joint posterior distribution of the entire set of parameters. Figure 7a plots the posterior means of the logits $\{\theta_i\}$ against the observed logits for all of the questions. Figure 7b plots the ratios of the posterior standard deviations to the classical standard errors of the logits. With this selection of hyperparameter values, this model classifies most of the questions into one of the two categories. The posterior probabilities of the $\{g_i\}$ were close to 0 or 1 for 28 of the 32 questions. For a particular question, the posterior mean shrinks the observed logit toward
the mean logit for the group for which that question has been classified. The most interesting aspect of this figure is the behavior of the posterior standard deviations in the bottom graph. The usual standard error of the observed logit is given by $\sqrt{y_i^{-1} + (n_i - y_i)^{-1}}$, where $y_i$ and $n_i$ are the number correct and sample size for the $i$th question. Generally, the posterior standard deviations are smaller than these standard errors, reflecting the added precision from combining the data from related questions. However, the posterior standard deviations of the questions with group membership probabilities not close to 0 or 1 are significantly larger than the classical standard errors. These particular questions are not easily classified into one of the two groups, and the relatively large posterior standard deviations reflect the conflict in this classification.

4.2.3 Checking the Second-Stage Prior. One way to investigate the suitability of the "two-groups" assumption in the partial exchangeable model is to consider a mixture prior for the values of the hyperparameters $\{m_i\}$. The values of $m_1$ and $m_2$ give the prior locations for the two groups, and as the distance $|m_1 - m_2|$ approaches 0, the two-groups model reduces to the basic exchangeable model. A prior distribution was chosen that placed equal probabilities on the seven pairs of values $(m_1, m_2) = (-1, 1.4), (-.8, 1.2), (-.6, 1.0), (-.4, .8), (-.2, .6), (0, .4), (.2, .2)$. The values of the hyperparameters $(\gamma_1, \gamma_2, a_1, a_2, b_1, b_2)$ were set to the same values $(.01, .01, 10, 10, 1, 1)$ that were used in the initial partial exchangeable fit. The simulation algorithm was rerun. The pair $(m_1, m_2) = (-1, 1.4)$ received a posterior probability of .31 and the pair $(-.8, 1.2)$ a probability of .62, and no other pair had a significant posterior probability. Thus it appears that the data support the partial exchangeable model over the basic model. In other words, there is some evidence for the data clustering into two groups.

Another concern in the foregoing fitting procedure is the specification of the second-stage prior on the variance parameters $\tau_1^2$ and $\tau_2^2$. The hyperparameters $a_1$ and $a_2$ affect the general location of these priors. As the value of $a_i$ increases, the prior for $\tau_i^2$ places more mass toward 0. A large value of $a_i$ reflects the prior belief that the logits in that particular group will cluster. To assess the choice of different values for $a_1$ and $a_2$, a prior was selected that placed equal probabilities on the hyperparameter values in the set $\{(a_1, a_2), a_i = 1, 2, 4, 8, 16\}$. The remaining hyperparameter values were set to $(m_1, m_2, \gamma_1, \gamma_2, b_1, b_2) = (-1, 1.4, .01, .01, 1, 1)$. The posterior probabilities obtained by a simulation run are displayed in Table 2. The marginal posterior probabilities are also shown. We see that there is support from the data for hyperparameter values of $a_1$ between 2 and 4 and values of $a_2$ between 1 and 2. Because there is a larger group of difficult questions, there is more support (a larger value of $a_i$) for shrinkage of the logits in this group toward an average value.

Table 2. Posterior Probabilities of Variance Hyperparameters $a_1$ and $a_2$ for the Partial Exchangeable Model

                       a2
a1           1      2      4      8     16    Marginal
1          .04    .04    .02    .01    .01      .11
2          .10    .09    .05    .02    .02      .28
4          .18    .16    .08    .03    .01      .46
8          .06    .05    .03    .01    .00      .15
16         .00    .00    .00    .00    .00      .00
Marginal   .38    .34    .17    .07    .04     1.00

5. CONCLUSIONS

This article has presented a general method of robustifying a Bayesian hierarchical model. By the use of discrete mixtures, one can criticize the choice of prior and link function in a hierarchical generalized linear model. We have shown how the mixture approach can also be used to build outlier and partial exchangeable models. These models are particularly relevant when a large number of parameters are simultaneously being estimated and the prior belief of exchangeability may have to be questioned.

One attractive feature of the proposed mixture model perturbation approach is that it requires straightforward elaborations of standard MCMC methods for fitting exchangeable models. Indeed, this methodology can be applied to other models that share the same basic hierarchical structure, and the choice must be made among a set of contending models. In future research, we plan to explore the usefulness of this approach in the context of alternative classical and Bayesian approaches for selecting models.

[Received February 1995. Revised March 1996.]

REFERENCES

Albert, J. H. (1988), "Computational Methods Using a Bayesian Hierarchical Generalized Linear Model," Journal of the American Statistical Association, 83, 1037-1045.
Albert, J. H., and Chib, S. (1993), "Bayesian Analysis of Binary and Polychotomous Response Data," Journal of the American Statistical Association, 88, 669-679.
Aranda-Ordaz, F. J. (1981), "On Two Families of Transformations to Additivity for Binary Response Data," Biometrika, 68, 357-363.
Berger, J. O. (1985), Statistical Decision Theory and Bayesian Analysis, New York: Springer-Verlag.
Berger, J. O., and Delampady, M. (1987), "Testing Precise Hypotheses," Statistical Science, 3, 317-352.
Carlin, B. P., and Chib, S. (1995), "Bayesian Model Choice via Markov Chain Monte Carlo," Journal of the Royal Statistical Society, Ser. B, 57, 473-484.
Carlin, B. P., and Polson, N. G. (1991), "Inference for Nonconjugate Bayesian Methods Using the Gibbs Sampler," Canadian Journal of Statistics, 4, 399-405.
Chib, S., and Greenberg, E. (1995), "Understanding the Metropolis-Hastings Algorithm," The American Statistician, 49, 327-335.
Consonni, G., and Veronese, P. (1995), "A Bayesian Method for Combining Results From Several Binomial Experiments," Journal of the American Statistical Association, 90, 935-944.
Dellaportas, P., and Smith, A. F. M. (1993), "Bayesian Inference for Generalized Linear and Proportional Hazards Models via Gibbs Sampling," Applied Statistics, 42, 443-459.
Emerson, J. D., and Stoto, M. A. (1983), "Transforming Data," in Understanding Robust and Exploratory Data Analysis, eds. D. C. Hoaglin, F. Mosteller and J. W. Tukey, New York: Wiley.
Gastwirth, J. L., and Cohen, M. L. (1970), "Small-Sample Behavior of Some Robust Linear Estimators of Location," Journal of the American Statistical Association, 65, 946-973.
Gaver, D. P., Draper, D., Goel, P. K., Greenhouse, J. C., Hedges, L. V., Morris, C. N., and Waternaux, C. (1992), Combining Information: Statistical Issues and Opportunities for Research, Washington: National Academy Press.
Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M. (1990), "Illustration of Bayesian Inference in Normal Models Using Gibbs Sampling," Journal of the American Statistical Association, 85, 972-985.
Gelfand, A. E., and Smith, A. F. M. (1990), "Sampling-Based Approaches to Calculating Marginal Densities," Journal of the American Statistical Association, 85, 398-409.
George, E. I. (1986), "Minimax Multiple Shrinkage Estimators," The Annals of Statistics, 14, 188-205.
Guttman, I., and Scollnik, D. P. M. (1994), "An Index Sampling Algorithm of a Class of Model Selection Problems," Communications in Statistics, 23, 323-339.
Kass, R. E., and Raftery, A. E. (1995), "Bayes Factors," Journal of the American Statistical Association, 90, 773-795.
Kass, R. E., and Steffey, D. (1989), "Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models)," Journal of the American Statistical Association, 84, 717-726.
Kass, R. E., Tierney, L., and Kadane, J. B. (1988), "Asymptotics in Bayesian Computation," in Bayesian Statistics 3, eds. J. Bernardo, M. DeGroot, D. V. Lindley, and A. F. M. Smith, Oxford, U.K.: Oxford University Press.
Lindley, D. V., and Smith, A. F. M. (1972), "Bayes Estimates for the Linear Model," Journal of the Royal Statistical Society, Ser. B, 34, 1-41.
Lu, W. (1992), "Empirical and Hierarchical Estimation of Several Means in the Natural Exponential Family," Technical report, The University of Texas at Austin, Center for Statistical Sciences.
Morris, C. (1983a), "Parametric Empirical Bayes Inference: Theory and Methods," Journal of the American Statistical Association, 78, 47-65.
Morris, C. (1983b), "Natural Exponential Families With Quadratic Variance Functions: Statistical Theory," The Annals of Statistics, 11, 515-529.
O'Hagan, A. (1988), "Modelling with Heavy Tails," in Bayesian Statistics 3, eds. J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, Oxford, U.K.: Oxford University Press.
Raftery, A. E. (1993), "Approximate Bayes Factors and Accounting for Model Uncertainty in Generalized Linear Models," Technical Report 255, University of Washington, Dept. of Statistics.
Tierney, L. (1994), "Markov Chains for Exploring Posterior Distributions," The Annals of Statistics, 22, 1701-1762.
Tierney, L., and Kadane, J. B. (1986), "Accurate Approximations for Posterior Moments and Marginal Densities," Journal of the American Statistical Association, 81, 82-86.
Tsutakawa, R. K., Shoop, G. L., and Marienfeld, C. J. (1985), "Empirical Bayes Estimation of Cancer Mortality Rates," Statistics in Medicine, 4, 201-212.
Tukey, J. W. (1977), Exploratory Data Analysis, New York: Addison-Wesley.