3/2

1


Dirichlet process mixtures are active research areas
Dirichlet mixtures are it!

The flexibility of DPM models supported its huge popularity in wide variety
of areas of application.

DPM models are general and can be argued to have less structure.

Double Dirichlet Process Mixtures add a degree of structure, possibly at
the expense of some degree of flexibility, but possibly with better
interpretability in some cases

We discuss applications (and limitations) of these semiparametric double
mixtures

We compare fit-prediction duality with competing models
2
Other DP extensions



Double Dirichlet process mixtures are a subclass of
dependent Dirichlet Process mixtures (MacEachern 1999,……)
Double DP mixture are different from Hierarchical Dirichlet
Processes (The et al. 2006 )
Double DPM is simply independent DPMS
3
Motivating Example 1


Luminex measurements on two biomarker proteins from
n=156 Patients
 IL-1β protein
 C-reactive protein
The biological effects of these two proteins are thought to
be not (totally) overlapping.
  1i  
 y1i 
  ~ N 2   ,  i , i  1,..., n
 y2i 
   2i  
4
Two Biomarkers (y1 and y2)
  1i  
 y1i 
  ~ N 2   ,  i , i  1,..., n
 y2i 
   2i  

Usual DP mixture of normals (Ferguson 1983,…..)
( i ,  i ) i.i.d. ~ G, i  1,..., n
G ~ DP ( , G0 )
G0   ~ N 2 ( 0 ,  0 )   ( ~ Inv. Wishart )
Questions
 Should we model the two biomarkers jointly?

Should we cluster the patients based on both biomarkers jointly?

The biomarkers may operate somewhat independently.
5
Double DP mixtures
  1i 
 12i
 y1i 
  ~ N 2   ,  i  
   2i 
 y2i 


 1i 2i  
 , i  1,..., n
2
 2i  
( 1i ,  12i ) i.i.d. ~ G1 , i  1,..., n
G1 ~ DP (1 , G01 )
2
G01  1 ~ N ( 01,  01
)  ( 12 ~ Inv. Gamma )
(  2i ,  22i ) i.i.d. ~ G2 , i  1,..., n
G2 ~ DP ( 2 , G0 2 )
2
G02   2 ~ N ( 02 ,  02
)   ( 22 ~ Inv. Gamma )

Equicorrelation – corr(y1i, y2i) are assumed to be the same for all i=1,…,n

Clustering based on biomarker 1 and based on biomarker 2 can be
different
6
Motivating Example 2: Interrater Agreement


Agreement between 2 Raters (Melia and Diener-West 1994)
Each rater provides an ordinal rating on a scale of 1-5 (lowest to
highest invasion)of the extent to which tumor has invaded the
eye,n=885
Rater 1
R
a
t
e
r
2
1
2
3
4
5
1
291
74
1
1
1
2
186
256
7
7
3
3
2
4
0
2
0
4
3
10
1
14
2
5
1
7
1
8
3
7
Interrater agreement

Kottas, Muller, Quintana (2005) analyzed these data using a
flexible DP mixture of Bivariate probit ordinal model which
modeled the unstructured joint probabilities
prob(Rater 1=i and Rater2 = j), i=1,…,5, j=1,…,5


One way to quantify interrrater agrrement is to measure
departure from the structured model of independence
We consider a (mixture of) Double DP mixtures model here
which provides separate DP structures for the two raters. We
then measure ``agreement’’ from this model.
8
Motivating Example 3
Mixed model for longitudinal data

yij  X ij   Z ijbi  ij , or
yi  ( yi1 ,..., yip ) ~ N ( X i   Z i bi , i )

It is common to assume (Bush and MacEachern 1996)
bi ~ G, G ~ DP

Modeling the error covariance i or the error variance
(if i =diag(2i)) extends the normal distribution
assumption to normal scale mixtures (t, Logistic,…)
 i2 ~ G, G ~ DP
9
Putting the two together

One way to combine these two structures is
(bi ,  i2 ) ~ G, G ~ DP


Do we expect the random effects bi appearing in the
modeling the mean and the error variances to cluster
similarly?
The error variance model often is used to extend the
distributional assumption.
bi ~ G1 , G1 ~ DP (1 , G01 )
 i2 ~ G2 , G2 ~ DP ( 2 , G02 )
10
Double DPM

I will discuss




Fitting
Applicability
Flexibility
Limitations
of such double semiparametric mixtures

I will also compare these models with usual DP
models via predictive model comparison criteria
11
Dirichlet process

Dirichlet Process is a probability measure on the space of
distributions (probability measures) G.

G ~ Dirichlet Process (G0), where G0 is a probability


Dirichlet Process assigns positive mass to every open set
of probabilities on support(G0)
Conjugacy: Y1,…., Yn ~ i.i.d. G, (G) = DP( G0)
Then Posterior (G|Y) ~ DP( G0 + nFn) where Fn is the
empirical distn.

Polya Urn Scheme
12
Stick breaking and discreteness

G~ DP( G0) implies G is almost surely discrete

G   q j { j } with prob. 1 (Tiwari and Sethuraman 1982)
j 1
1 , 2 ,.... ~ i..i.d . G0
Stick - breaking represntat ion
q1  w1 ,
w1 ~ Beta (1,  
q2  w2 (1  w1 ),
w2 ~ Beta (1,  
q3  w3 (1  w2 )(1  w1 ), w3 ~ Beta (1,  
13
Bayes estimate from DP



The discrete nature of a random G from a DP leads to some
disturbing features, such as this result from Diaconis and
Freedman (1986)
Location model
yi = + i, i=1,…n
  has prior (), such as a normal prior
 1,…, n ~ i.i.d. G
G ~ DP(G0) - symmetrized
G0 = Cauchy or t-distn
Then the posterior mean is an inconsistent estimate of 
14
Dirichlet process mixtures (DPM)
Yi ~ f ( yi | i , , xi ), such as N (i , ), i  1,...n
f is a known parametric distn.
xi are possible covariates .  other parms
1 ,..., n ~ i.id . G
G ~ DP (G0 )

If we marginalize over i, we obtain a semiparametric mixture
Yi ~
 f (y
i
|  , , xi )dG ( )
where the mixing distribution G is random and follows DP(G0)
15
DPM - clusters
Yi ~ f ( yi | i , , xi )
1 ,..., n ~ i.id . G, G ~ DP (G0 )

Since G is almost surely discrete, 1,…,n form clusters


1= 5 = 8
2= 3 = 4= 6= 7
 1unique
 2unique
etc.

The number of clusters, and the clusters themselves, are
random.
16
DPM – MCMC

The Polya urn/marginalized sampler (Escobar 1994, Escobar &
West 1995) samples i one-at-a-time from
(i | -i, data)


Improvements, known as collapsed samplers, are proposed in
MacEachern (1994, 1998) where, instead of sampling i , only
the cluster membership of i are sampled.
For non-conjugate DPM (sampling density f(yi |i ) and base
measure G0 are not conjugate), various algorithms have been
proposed.
17
Finite truncation and Blocked Gibbs
Yi ~ f ( yi | i , , xi )

M
1 ,..., n ~ G   q j{ } instead of DP - G   q j {
j 1


j
j 1
j}
With this finite truncation, it is now a finite
mixture model with stick-breaking structure on qj
(1,....,n) and (q1,....,qM) can be updated in
blocks (instead of one-at-time as in Polya Urn
sampler) which may provide better mixing
18
Comments



In each iteration, the Polya urn/marginal sampler cycles thru
each observation, and for each, assigns its membership
among a new and existing clusters.
The Poly urn sampler is also not straightforward to implement
in non-linear (non-conjugate) problems or when the sample
size n may not be fixed.
For the blocked sampler, on the other hand, the choice of the
truncation M is not well understood.
19
Model comparison in DPM models


Basu and Chib (2003)
developed Bayes factor/
marginal likelihood
computation method for
DPM.
This provided a framework
for quantitative comparison
of DPM with competing
parametric and
semi/nonparametric models.
Log-Marginal and log-Bayes
factor for Longitudinal Aids trial
(n=467) with random coeffs
having distn G
DPM
Student-t
Normal
DPM
-3477
Student-t
(76)
-3553
Normal
(62)
(-14)
-3539
20
Marginal likelihood of DPM

Based on the Basic marginal identity (Chib 1995)
log-posterior()=log-likelihood() + log-prior() - log-marginal
log-marginal = log-likelihood(*) + log-prior(*) – log-posterior(*)

The posterior ordinate of DPM is evaluated via prequential
conditioning as in Chib (1995)
Log-Marginal and log-Bayes factor

The likelihood ordinate
of DPM is evaluated from a
DPM Student-t Normal
(collapsed) sequential
importance
sampler.
DPM
-3477
(76)
(62)
Student-t
Normal
-3553
(-14)
-3539
21
22
23
24
Double Dirichlet process mixtures (DDPM)
Yi ~ f ( yi | i , i , , xi ), such as N (i , i ), i  1,...n
f is a known parametric distn.
xi are possible covariates .   other parms
1 ,..., n ~ i.id . G
G ~ DP (  G 0 )

 1 ,..., n ~ i.id . G
G ~ DP ( G 0 )
Marginalization obtains a double semiparametric mixture
Yi ~   f ( yi |  , , , xi )dG ( ) dG ( )
where the mixing distributions G and G are random
25
Two Biomarkers case: y1 and y2
2

y


 1i
 1i 
 1i 

  ~ N 2  ,  i  
   2i 
 y2i 


 1i 2i  
 , i  1,..., n
2
 2i  
( 1i ,  12i ) i.i.d. ~ G1 , i  1,..., n
G1 ~ DP (1 , G01 )
2
G01  1 ~ N ( 01,  01
)  ( 12 ~ Inv. Gamma )
(  2i ,  22i ) i.i.d. ~ G2 , i  1,..., n
G2 ~ DP ( 2 , G0 2 )
2
G02   2 ~ N ( 02 ,  02
)   ( 22 ~ Inv. Gamma )
26
A simpler model: normal means only
  i  
 yi1 
  ~ N 2   ,  , i  1,..n
 yi 2 
  i  
1 ,....., n ~ i.i.d . G ,
 1 ,....., n ~ i.i.d . G ,  has a
G ~ DP (  G 0 )
G ~ DP ( G 0 )

Wishart prior
We generate n=50 (i,i) means and then (yi1,yi2)
observations from this Double-DPM model
27
Double DPM
mu
-4
-10
-10
-8
-8
-6
-6
-4
psi
y[,2]
-2
-2
0
0
2
2
Observations y
-4
-2
0
2
phi
4
6
-5
0
5
y[,1]
28
2
Single
bivariate
meansymbols
vector
Plot of DPM
y and in
mu:the
Clusters
in different
Double
DPM
mean in
components
Plot
of y and
mu: in
Clusters
different symbols
y
2
y
mu
-2
-6
-4
y[,2]
-4
-6
-10
-8
-8
-10
y[,2]
-2
0
0
mu
-5
0
5
y[,1]
10
-5
0
5
y[,1]
29
Model fitting



We fitted the Double DPM and the Bivariate DPM models to
these data.
The Double DPM model can be fit by a two-stage Polya urn
sampler or a two-stage blocked Gibbs sampler.
“Collapsing” can become more difficult.
1 n
Average  MSEd   E post ( id  idtrue ) 2 , d  1,2
n i 1
1 n
Average  MADd   E post | id  idtrue |, d  1,2
n i 1
30
MSE
Double DP
Bivariate DP in
(y1,y2)
Double DP
Bivariate DP in
(y1,y2)
MAD
y1
y2
y1
y2
Data generated from
Double DP
0.99 1.32 0.72 0.92
1.02
4.29
0.83
1.73
Data generated from
Bivariate DP
0.98 1.29 0.81 0.94
0.98
1.08
0.75
0.81
31
Wallace (asymmetric) criterion for comparing
two clusters/partitions

Let S be the number of mean pairs which are in the same cluster
in a MCMC posterior draw and also in the true clustering.

Let nk, k=1,..K be the number of means in cluster Ck in the MCMC
draw.

Then the Wallace asymmetric criterion for comparing these two
clusters is
S
k nk (nk  1) / 2
Average Wallace
Data generated from
Double DP
Bivariate DP
Double DP
y1
0.89
y2
0.42
y1
0.66
Bivariate DP in (y1,y2)
0.72
0.24
0.62
y2
0.48
0.62
32
Measurements on two biomarker proteins by Luminex
panels
22000
• Frozen parafin embedded tissues, pre and post surgery
20000
• Luminex panel
18000
16000
14000
CRP
• Nodal involvement
10
20
30
40
50
60
70
80
33
Two biomarker proteins
The bivariate DPM

   
 y1i 
  ~ N 2   1i ,  i , i  1,..., n
 y2i 
   2i  

( i ,  i ) i.i.d. ~ G, i  1,..., n
G ~ DP ( , G0 )
G0   ~ N 2 ( 0 ,  0 )   ( ~ Inv. Wishart )
vs the Double DPM
  1i 
 12i
 y1i 
  ~ N 2   ,  i  
   2i 
 y2i 


 1i 2i  
 , i  1,..., n
2
 2i  
( 1i ,  12i ) i.i.d. ~ G1 , i  1,..., n
G1 ~ DP (1 , G01 )
2
G01  1 ~ N ( 01,  01
)  ( 12 ~ Inv. Gamma )
(  2i ,  22i ) i.i.d. ~ G2 , i  1,..., n
G2 ~ DP ( 2 , G0 2 )
2
G02   2 ~ N ( 02 ,  02
)   ( 22 ~ Inv. Gamm
34
µpred
Double DP
14000
18000
mu.pred[2]
50
0
-50
mu.pred[1]
100
22000
Double DP
0
2000
6000
10000
0
10000
Bivariate DP
14000
18000
mu.pred[2]
100
50
0
-50
mu.pred[1]
6000
22000
Bivariate DP
2000
0
2000
6000
10000
0
2000
6000
10000
35
0.00000
0.00
0.00030
0.08
0
0
10
10
20
20
30
30
40
0.00015
y.pred[1]
0.04
y.pred[1]
0.00000
0.00015
y.pred[1]
0.00 0.04 0.08
y.pred[1]
0.00030
0.12
ypred
Double DP
Double DP
40
14000
Bivariate DP
14000
18000
18000
22000
Bivariate DP
22000
36
ypred
Double DP
Bivariate DP
0.030
0.025
0.010
0.020
densit y
0.015
25000
densit y
25000
0.005
r ed
[ 2]
0.000
10000
0
20000
15000
y.p
15000
y.p
0.005
20000
r ed
[ 2]
0.010
10
20
30
y.pred[1]
40
50
60
0.000
10000
0
10
20
30
40
50
60
y.pred[1]
37
-12
LPML = log f(yi| y-i)
Double DP = -1498.67
Bivariate DP= -1533.01
-14
Log CPO
-10
-8
log CPO = log f(yi| y-i)
Double DP
Bivariate DP
0
50
100
Observation
150
38
Model comparison



I prefer to use marginal likelihood/ Bayes factor for model
comparison.
The DIC (Deviance Information Criterion) , as proposed in
Spiegelhalter et al. (2002) can be problematic for missing
data/random-effects/mixture models.
Celeux et al. (2006) proposed many different DICs for missing
data models
39
DIC3

I have earlier considered DIC3 (Celeux et al. 2006,
Richardson 2002) in missing data and random effects
models which is based on the observed likelihood
DIC 3  4 E log p( yobs |  ) | yobs   2 log E  p( yobs |  ) | yobs ,
where p( yobs |  )   p( yobs ,  ,  |  ) d (   )


The integration over the latent parameters often has to be
obtained numerically.
This is difficult in the present problem
40
DIC9

I am proposing to use DIC9 which is similar to DIC3
but is based on the conditional likelihood
p( yobs |  ,  , )
DIC 9  4 E ,  , log p( yobs |  ,  , ) | yobs   2 log E ,  ,  p( yobs |  ,  , ) | yobs ,
DIC9
Double DP
3018.3
Bivariate DP 3050.3
41
Convergence rate results: Ghosal and Van Der Vaart (2001)

Normal location mixtures
Model: Yi ~ i.i.d. p(y) = (y-)dG(), i=1,…,n
G ~ DP(G0), G0 is Normal
Truth:

p0(y) = (y-)dF()
Ghosal and Van Der Vaart (2001): Under some regularity
conditions,
Hellinger distance (p, p0)  0 “almost surely”
at the rate of (log n)3/2/n
42
Ghosal and Van Der Vaart (2001): results contd.

Bivariate DP location-scale mixture of normals
Yi ~ i.i.d. p(y) = (y-)dH(,),


i=1,…,n
Ghosal and Van Der Vaart (2001): If H0 is Normal {a compactly supported distn},
then the convergence rate is
(log n)7/2/n
Double DP location-scale mixture of normals
Yi ~ i.i.d. p(y) = (y-)dG() dG(),
G ~DP(G0),

H ~DP(H0)
i=1,…,n
G ~DP(G0)
Ghosal and Van Der Vaart (2001): If G0 is Normal, G0 is compactly supported
and the true density
p0(y) = (y-)dF1() dF2() is also a double mixture, then
Hellinger distance (p, p0)  0 at the rate of (log n)3/2/n
43
Interrater data


Agreement between 2 Raters (Melia and Diener-West 1994)
Each rater provides an ordinal rating on a scale of 1-5 (lowest to
highest invasion)of the extent to which tumor has invaded the
eye,n=885
Rater 1
R
a
t
e
r
2
1
2
3
4
5
1
291
74
1
1
1
2
186
256
7
7
3
3
2
4
0
2
0
4
3
10
1
14
2
5
1
7
1
8
3
44
DPM multivariate ordinal model

Kottas, Muller and Quintana (2005)
P (Rater1  j and Rater2  k for subject i)
 P( 1 j-1  Z i1   1 j ,  2 k -1  Z i 2   2 k )
 Z i1 
Z i    i.i.d. ~  N 2 ( z |  , ) dG (  , )
 Zi2 
G ~ DP ( G0 )
45
Interrater agreement




The objective is to measure agreement
between raters beyond what is
possible by chance.
This is often measured by departure
from independence, often specifically
in the diagonals
Polychoric correlation of the latent
bivariate normal Z has been used as a
measure of association.
Rater 1
R
a
t
e
r
2
1
2
3
4
5
1
291
74
1
1
1
2
186
256
7
7
3
3
2
4
0
2
0
4
3
10
1
14
2
5
1
7
1
8
3
………………… of the latent bivariate
normal mixtures???
46
Latent class model (Agresti & Lang 1993)

C latent classes
pijk  P(Rater1  j and Rater2  k for subject i)
 pcjk if subject i belongs to class c, c  1,..., C

Ratings of the two raters within a class are independent
pcjk  pc1 j pc 2 k
C
pijk   I (  i  c) pc1 j pc 2 k
c 1
Rater 1
R
a
t
e
r
2
1
2
3
4
5
1
291
74
1
1
1
2
186
256
7
7
3
3
2
4
0
2
0
4
3
10
1
14
2
5
1
7
1
8
3
47
Mixtures of Double DPMs
C
pijk   I (  i  c) pc1 j pc 2 k
c 1

For each latent class, we model pc1j and pc2k
by two separate univariate ordinal probit
DPM models
p clj  P ( cl , j 1  Z cl   cl , j ), j  1,..5, l  1,2, c  1,.., C
Z cli i.i.d. ~  N1 ( z |  ,  2 ) dGcl (  ,  2 )
Gcl ~ DP ( cl G0 cl )
48
Computational issue
C
pijk   I (  i  c) pc1 j pc 2 k
c 1


The ``sample size’’ nc in latent group c is not fixed. This
causes problem for the polya-urn/marginal sampler which
works with fixed sample size
Do, Muller, Tang (2005) suggested a solution to this
problem by jointly sampling the latent il =(il,il2) and the
latent rating class membership i.
49
1
2
3
4
.3288
.0836
.0011
.0011
Estimated
cell
probabilities
1
.3261
.0869
.0013
.0020
5
.0011
.0007
(.2945,.3590) (.0696,.1078) (.0001,.0044) (.0003,.0056)
(0,.0027)
.3286
.0821
.003
.0022
.0009
(.3060,,.3524) (.0701,.09510) (.0008,.007) (.0007,.0047) (.0001,.0023)
.2102
.2893
.0079
.0079
.0034
2
.2135
.2826
.0082
.0069
.0031
(.1858,.2428) (.2521,.3141) (.0031,.0152) (.0022,.0143) (.0007,.0075)
.2098
.2853
.0076
0103
.0033
(.1856,.2334) (.2700,.2997) (.0036,.0129) (..0074,.0138) (.0017,.0053)
.0023
.0045
0
.0023
0
3
.0023
.0055
.0016
.0022
.0008
(.004,.0065)
(.0017,.0107) (.0003,.0038) (.004,.0062)
(.0,.0032)
.0029
.006
.0003
.0015
.0004
(.0009,.0068) (.0025,.0114)
(0,.0009)
(.0002,.0039)
(0,.0011)
.0034
.0113
.0011
.0158
.0023
4
.0042
.0102
.0023
..0143
.0028
(.0012,.0094) (.0042,.0187) (.0004,.006)
(.0066,.024) (.0006,.0069)
.0031
.0124
.0016
.0127
.0033
(.001,.0059)
(.0092,.0157) (.0002,.0044) (.0089,.0168) (.0014,.0057)
.0011
.0079
.0011
.009
.0034
5
.0012
.0071
.0019
.0083
.0039
(.0001,.0041) (.0026,.0140) (.0003,.0054) (.0034,.0153) (.0009,.009)
.0019
.0086
.0011
.0086
.0026
(.0006,.004)
(.0055,.0129) (.0002,.0032) (.0053,.0118) (.0011,.0045)
50
Marginal probability estimates

Latent Group 1
Rater A
1
2
3
4
5
0.6691 .3246 .0036 .0014 .0012
Rater B
1
2
3
4
5
0.9374 .0559 .0049 .0011 .0008
1
2
3
4
5
1
2
3
4
0.6261 0.03859 0.003273 0.00068
0.3058 0.01675 0.001511 0.000326
0.003405 0.000194 2.38E-05 6.81E-06
0.001237 0.000121 4.37E-05 1.86E-05
0.000882 0.000212 7.31E-05 3.12E-05
5
0.000473
0.000225
5.06E-06
1.79E-05
3.10E-05
51
Marginal probability estimates

Rater A
1
2
3
4
5
0.1562 .7139 .0188 .0661 .0451
Latent Group 2
Rater B
1
2
3
4
5
0.1432 .7432 .0226 .0706 .0205
1
2
3
4
5
1
0.02196
0.1107
0.002445
0.005113
0.002968
2
0.1264
0.5625
0.01194
0.02517
0.01718
3
4
5
0.002716 0.003709 0.001382
0.0138 0.02051 0.006414
0.000555 0.003084 0.000764
0.003294 0.02575 0.006747
0.002219 0.01753 0.005212
52
Joint mean covariance modeling



Trial with n=200 patients who had acute MI within 28 days of
baseline and are depressed/low social support
Underwent 6 months of usual care (control) or individual
and/or group-based cognitive behavioral counseling
(treatment).
Response y = depression (Beck Depression Inventory)
measured at 0,182,365,548, 913, 1278 days (but actually at
irregular intervals)

Covariate: Treatment, Family history, Age, Sex, BMI,……

Intermittent missing response, missing covariate….
53
0
200
400
600
800
Visit Days
1000
1200
54
0
10
20
30
Beck depression Inventory
40
50
Model
Model for the mean

E ( yij )  ij  X ij    Z ij bi

Z ij bi  b01
 b1i tij  b2i (tij  ti 2 ) 

Model for the covariance
yi  ( yi1 ,.., yi 6 ) ~ N 6 (i , i )

Pourahmadi (1999), Pourahmadi and Daniels (2002) use
a Cholesky decomposition of the covariance which allows
one to use log-linear model for the variances and “linear
regression” for the off-diagonal terms
55
Modeling the covariance
80
We assume
 i  diag ( i21 ,..,  i26 )
65
60
55
Z ij bi  boi  b1i tij  b2i tij2
Variance
log  ij2  X ij    Z ij bi
70
75

1
2
3
4
5
6
visits
56
Mean and variance level random effects
bi  (boi , b1i , b2i ) and bi  (boi , b1i , b2i ), i  1,..n

? A joint DPM for the two random effects together which allows
clustering at the patient level
(bi , bi ) i.i.d ~ G  , , G  , ~ DP

? Or Double DPM, that is, independent DPM separately for the each
of the two random effects which allows separate clustering at the
mean and variance level
bi i.i.d ~ G  , G  ~ DP 
bi i.i.d ~ G , G ~ DP 

Most frequentist and parametric Bayesian analyses use the latter
independence among the mean and variance level random effects.
57
-2400
-2300
log-likeliood
-2300
-2400
-7000
-5000
-3000
-7000
iterations
-5000
-3000
iterations
-20
-15
-10
-5
Log CPO
Double DP
Bivariate DP
-25
Log-CPO
log-likeliood
-2200
Bivariate DP
-2200
Double DP
0
50
100
Observations
150
200
58
Fixed effects estimates
.slope
.change
Double DP
-1.12 (-2.39,-0.16)
-3.29 (-4.36,-1.77)
Bivariate DP
Normal
-0.89 (-1.97,0.24) -0.26
(-.65,.28)
-2.98 (-4.4,-1.11) -4.057 (-4.72,-3.43)
.slope
.quad
0.18
(-.006,.38)
0.19 (.014,.382)
-0.005 (-0.027,0.021) -.006 (-.027,.013)
-0.16
0.033
(-0.32,0.05)
(0.009,0.051)
59
Pseudo marginal likelihood
log  f(yi| y-i)
Double DP
-2495.290
Bivariate DP -2503.916
Normal
-2530.92
60
Summary



Double DP mixtures may add a level of structure to mixture
modeling with DP.
They produce interesting “product-clustering”
They are applicable to specific problems that may benefit
from this structure
61
[email protected]
62