Constructing dependent random probability measures from

Constructing dependent random probability measures
from completely random measures
Changyou Chen1 , Vinayak Rao2 , Wray Buntine1 , Yee Whye Teh3
presented by Sinead Williamson4
1
NICTA, 2 Duke University, 3 University of Oxford, 4 UT Austin
ICML 2013
Introduction
Two strands of research in NPBayes modelling of random probability measures
(RPMs):
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
2 / 23
Introduction
Two strands of research in NPBayes modelling of random probability measures
(RPMs):
priors that are more expressive than the Dirichlet Process
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
2 / 23
Introduction
Two strands of research in NPBayes modelling of random probability measures
(RPMs):
priors that are more expressive than the Dirichlet Process
e.g. power-law behaviour or more uncertainty on number of clusters:
Normalized Random Measures [James et al., 2005, Kingman, 1975]
(e.g. normalized generalized Gamma process)
Poisson-Kingman processes [Pitman, 2003]
(e.g. Pitman-Yor process [Pitman and Yor, 1997])
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
2 / 23
Introduction
Two strands of research in NPBayes modelling of random probability measures
(RPMs):
priors that are more expressive than the Dirichlet Process
priors that model more structured data
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
2 / 23
Introduction
Two strands of research in NPBayes modelling of random probability measures
(RPMs):
priors that are more expressive than the Dirichlet Process
priors that model more structured data
for data violating the assumption of exchangeability:
Time-series, spatial data, conditional density modelling
Research traces back to work of [MacEachern, 1999] on dependent RPMs
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
2 / 23
Introduction
Two strands of research in NPBayes modelling of random probability measures
(RPMs):
priors that are more expressive than the Dirichlet Process
priors that model more structured data
This talk:
Flexible constructions for dependent RPMs with flexible marginals
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
2 / 23
Relevant work
There is a rich literature on dependent RPMs
the seminal work of [MacEachern, 1999] on dependent DPs
Existing work that is directly relevant [Rao and Teh, 2009, Nipoti, 2010,
Lijoi et al., 2012, Foti et al., 2012, Lin and Fisher, 2012, Griffin et al., 2013]
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
3 / 23
Relevant work
There is a rich literature on dependent RPMs
the seminal work of [MacEachern, 1999] on dependent DPs
Existing work that is directly relevant [Rao and Teh, 2009, Nipoti, 2010,
Lijoi et al., 2012, Foti et al., 2012, Lin and Fisher, 2012, Griffin et al., 2013]
C. Chen, V. Rao, W. Buntine and Y.W. Teh (2013)
Dependent Normalized Random Measures
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
3 / 23
Completely random measures (CRMs)
A random measure µ on some space (X, ΣX ) such that
µ(A) ⊥
⊥ µ(B) if A and B are disjoint
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
4 / 23
Completely random measures (CRMs)
A random measure µ on some space (X, ΣX ) such that
µ(A) ⊥
⊥ µ(B) if A and B are disjoint
The measure µ is atomic:
µ=
X
wi δxi
i
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
4 / 23
Completely random measures (CRMs)
A random measure µ on some space (X, ΣX ) such that
µ(A) ⊥
⊥ µ(B) if A and B are disjoint
The measure µ is atomic:
µ=
X
wi δxi
i
(xi , wi ) : events of a Poisson process on the space X × W, where W = [0, ∞).
The Poisson process has intensity ν(w , x) = ρ(w )h(x), where
ρ(w ) is the Lévy intensity of the CRM, and
h(x) is the base probability density.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
4 / 23
Normalized random measures
Poisson process {wi , xi }
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
5 / 23
Normalized random measures
Poisson process {wi , xi }
CRM µ ≡ {wi , xi }
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
5 / 23
Normalized random measures
Poisson process {wi , xi }
CRM µ ≡ {wi , xi }
Normalize to construct a random probability measure G : G (·) =
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
µ(·)
µ(X)
June, 2013
5 / 23
Normalized random measures
Poisson process {wi , xi }
CRM µ ≡ {wi , xi }
Normalize to construct a random probability measure G : G (·) =
µ(·)
µ(X)
In the following, we set ρ(w ) = αw −σ−1 exp(−τ w ) corresponding to the
generalized Gamma process.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
5 / 23
Normalized random measures
Poisson process {wi , xi }
CRM µ ≡ {wi , xi }
Normalize to construct a random probability measure G : G (·) =
µ(·)
µ(X)
In the following, we set ρ(w ) = αw −σ−1 exp(−τ w ) corresponding to the
generalized Gamma process.
We want: Dependent normalized random measures, Gt
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
5 / 23
Dependent normalized random measures
Define a common latent CRM/Poisson process.
Define dependent measures via transformations of this process.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
6 / 23
Dependent normalized random measures
Define a common latent CRM/Poisson process.
Define dependent measures via transformations of this process.
I
I
I
Superposition [Rao and Teh, 2009, Griffin et al., 2013]
Rescaling
Thinning [Lin et al., 2010, Lin and Fisher, 2012]
Normalize these dependent CRMs to produce dependent NRMs.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
6 / 23
Superposition theorem
The superposition of two independent Poisson processes with intensity
νi (·), i = 1, 2 is a Poisson process with intensity ν1 (·) + ν2 (·)
The resulting CRM has Lévy measure ρ = ρ1 + ρ2 .
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
7 / 23
Superposition theorem
The superposition of two independent Poisson processes with intensity
νi (·), i = 1, 2 is a Poisson process with intensity ν1 (·) + ν2 (·)
The resulting CRM has Lévy measure ρ = ρ1 + ρ2 .
The projection of a Poisson
process from X × W × A to X × W is a Poisson
R
process with intensity A ν(dx, dw , da)
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
7 / 23
Superposition theorem
The superposition of two independent Poisson processes with intensity
νi (·), i = 1, 2 is a Poisson process with intensity ν1 (·) + ν2 (·)
The resulting CRM has Lévy measure ρ = ρ1 + ρ2 .
The projection of a Poisson
process from X × W × A to X × W is a Poisson
R
process with intensity A ν(dx, dw , da)
IfRν(·) factors
as ρ(w )h(x)νa (a), then the resulting CRM has Lévy intensity
ρ(w )h(x).
ν
(a)da
a
A
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
7 / 23
Spatial Normalized Gamma processes [Rao and Teh, 2009]
A measure-valued stochastic process Gt , t ∈ T is an arbitrary space
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
8 / 23
Spatial Normalized Gamma processes [Rao and Teh, 2009]
A measure-valued stochastic process Gt , t ∈ T is an arbitrary space
Instantiate a Poisson process on some augmented space
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
8 / 23
Spatial Normalized Gamma processes [Rao and Teh, 2009]
A measure-valued stochastic process Gt , t ∈ T is an arbitrary space
Instantiate a Poisson process on some augmented space
Associate each t with a subset X × W × At
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
8 / 23
Spatial Normalized Gamma processes [Rao and Teh, 2009]
A measure-valued stochastic process Gt , t ∈ T is an arbitrary space
Instantiate a Poisson process on some augmented space
Associate each t with a subset X × W × At
Restrict to At , and project onto the original space, defining an NRM
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
8 / 23
Spatial Normalized Gamma processes [Rao and Teh, 2009]
A measure-valued stochastic process Gt , t ∈ T is an arbitrary space
Instantiate a Poisson process on some augmented space
Associate each t with a subset X × W × At
Restrict to At , and project onto the original space, defining an NRM
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
8 / 23
Spatial Normalized Gamma processes [Rao and Teh, 2009]
A measure-valued stochastic process Gt , t ∈ T is an arbitrary space
Instantiate a Poisson process on some augmented space
Associate each t with a subset X × W × At
Restrict to At , and project onto the original space, defining an NRM
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
8 / 23
Spatial Normalized Gamma processes [Rao and Teh, 2009]
A measure-valued stochastic process Gt , t ∈ T is an arbitrary space
Instantiate a Poisson process on some augmented space
Associate each t with a subset X × W × At
Restrict to At , and project onto the original space, defining an NRM
Dependency across NRMs is controlled by amount of overlap of At ’s
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
8 / 23
Dependent normalized random measures
([Griffin et al., 2013])
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
9 / 23
Dependent normalized random measures
([Griffin et al., 2013])
Gt ∝
R
X
ztr µr ,
ztr ∈ {0, 1}
(1)
r =1
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
9 / 23
Mixed normalized random measures
A simple generalization: allow ztr ∈ R+
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
10 / 23
Mixed normalized random measures
A simple generalization: allow ztr ∈ R+
zµ(·) belongs to the same class of CRMs as µ(·).
Poisson mapping theorem:
I
If {wi } is a sample from a Poisson process with intensity ν(w ), then {zwi } is
a Poisson process with intensity z−1 ν(w /z).
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
10 / 23
Mixed normalized random measures
A simple generalization: allow ztr ∈ R+
zµ(·) belongs to the same class of CRMs as µ(·).
Poisson mapping theorem:
I
If {wi } is a sample from a Poisson process with intensity ν(w ), then {zwi } is
a Poisson process with intensity z−1 ν(w /z).
ztr governs how strongly atoms of CRM r contribute to covariate t.
Given a set {ztr }, for each t, Gt is an NRM.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
10 / 23
Thinning theorem
If {wi } is a sample from a Poisson process with intensity ν(w ), then
i.i.d.
{zi wi }, where zi ∼ Bernoulli(p) is Poisson with intensity pν(w ).
Suggests independently thinning atoms of a CRM to form a new CRM
([Lin et al., 2010]).
Corresponds to SNRM with an exponential number of CRM.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
11 / 23
Thinned Normalized Random Measures
Spatial NRM characterized by a set {ztr ∈ {0, 1}
∀t ∈ T , r ∈ R}
ztr specifies whether or not all atoms of CRM r are present at covariate t.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
12 / 23
Thinned Normalized Random Measures
Spatial NRM characterized by a set {ztr ∈ {0, 1}
∀t ∈ T , r ∈ R}
ztr specifies whether or not all atoms of CRM r are present at covariate t.
Thinned NRM: introduce indicator variables ztrw ∈ {0, 1} for each atom.
zrtk ∼ Bernoulli(qrt ) k = 1, 2, · · ·
Then, the probability measure at covariate t is given by
R
µt (dθ) =
∞
XX
1
zrtk wrk
µ̂t (dθ), where µ̂t (dθ) =
µ̂t (Θ)
r =1
(2)
k=1
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
12 / 23
Thinned Normalized Random Measures
Spatial NRM characterized by a set {ztr ∈ {0, 1}
∀t ∈ T , r ∈ R}
ztr specifies whether or not all atoms of CRM r are present at covariate t.
Thinned NRM: introduce indicator variables ztrw ∈ {0, 1} for each atom.
zrtk ∼ Bernoulli(qrt ) k = 1, 2, · · ·
Then, the probability measure at covariate t is given by
R
µt (dθ) =
∞
XX
1
zrtk wrk
µ̂t (dθ), where µ̂t (dθ) =
µ̂t (Θ)
r =1
(2)
k=1
Proposition
Conditioned on the set of qrt ’s, each random probability measure µt defined in (2)
is
Pmarginally distributed as a normalized random measure with Lévy measure
r zrt νr (dw , dθ).
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
12 / 23
Dependent normalized random measures
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
13 / 23
Inference
[Lin et al., 2010] have proposed a similar model at NIPS this year.
They provide a marginal sampler for posterior inference.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
14 / 23
Inference
[Lin et al., 2010] have proposed a similar model at NIPS this year.
They provide a marginal sampler for posterior inference.
Unfortunately, this sampler is incorrect.
A similar error exists in [Lin and Fisher, 2012]
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
14 / 23
Inference
[Lin et al., 2010] have proposed a similar model at NIPS this year.
They provide a marginal sampler for posterior inference.
Unfortunately, this sampler is incorrect.
A similar error exists in [Lin and Fisher, 2012]
At a high level:
I
I
I
One can superimpose 3 CRMs to construct 2 dependent RPMs, each
marginally an NRM.
However, given observations from one , the other is no longer an NRM.
It becomes a mixture of NRMs.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
14 / 23
Inference (a marginal sampling approach)
Following [James et al., 2005], introduce auxiliary variables ut ∀t ∈ T
Conditionally marginalize the CRMs µr to obtain a generalized CRP
Alternately resample partition given {ut }, and then {ut } given partition.
Works for SNRM and MNRM, but impractical for TNRM.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
15 / 23
Inference (a slice sampling approach)
Following [Walker, 2007], introduce auxiliary variables sr ∀r ∈ R
Instantiate atoms of the CRM µr larger than sr
Conditionally sample partition of observations, and associated z0 s
Alternately resample µr given {sr }, and then {sr } given µr .
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
16 / 23
Application: Document modelling
Four corpora of documents (ICML,
Person, TPAMI, NIPS)
Documents organized by year.
Largest corpus: NIPS
I
17 years, 2483 documents, 3.28M
words and a vocabulary of 14K
Use a nonparametric topic model.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
17 / 23
Application to document modelling
Test perplexity on four different corpora (small is good)
ICML
1100
Person
8500
TPAMI
NIPS
1300
8000
1250
1050
1200
3000
HDP
HNGG
TNGG
MNGG
HSNGG
HTNGG
HMNGG
HMNGP
7500
7000
2500
1000
1150
6500
1100
6000
2000
950
900
5500
1500
1050
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
18 / 23
Application to document modelling
120
ICML
ESS/1000 samples
200
150
80
100
300
500
NIPS
MNGG (marg)
MNGG (slice)
TNGG (slice)
50
400
300
200
150
20
Chen, Rao, Buntine, and Teh (Duke)
TPAMI
250
40
0
400
350
100
60
Person
100
200
100
50
0
0
Dependent RPMs from CRMs
0
June, 2013
19 / 23
Application to document modelling
140
ICML (s)
120
Time/1000 samples
350
300
Person (s)
4.5
TPAMI (h)
12
NIPS (h)
MNGG (marg)
MNGG (slice)
TNGG (slice)
4.0
10
3.5
100
250
80
200
2.5
60
150
2.0
40
100
3.0
8
6
1.5
4
1.0
20
50
0
0
Chen, Rao, Buntine, and Teh (Duke)
2
0.5
0.0
Dependent RPMs from CRMs
0
June, 2013
20 / 23
Thank you!
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
21 / 23
Bibliography I
Foti, N. J., Futoma, J., Rockmore, D., and Williamson, S. A. (2012).
A unifying representation for a class of dependent random measures.
Technical Report arXiv:1211.4753, Dartmouth College and CMU, USA.
Griffin, J. E., Kolossiatis, M., and Steel, M. F. J. (2013).
Comparing distributions by using dependent normalized random-measure mixtures.
Journal of the Royal Statistical Society: Series B (Statistical Methodology), pages n/a–n/a.
James, L. F., Lijoi, A., and Pruenster, I. (2005).
Bayesian inference via classes of normalized random measures.
ICER Working Papers - Applied Mathematics Series 5-2005, ICER - International Centre for Economic Research.
Kingman, J. F. C. (1975).
Random discrete distributions.
Journal of the Royal Statistical Society, 37:1–22.
Lijoi, A., Nipoti, B., and Pruenster, I. (2012).
Bayesian inference with dependent normalized completely random measures.
Technical report.
Lin, D., Grimson, E., and Fisher, J. (2010).
Construction of dependent Dirichlet processes based on Poisson processes.
In NIPS.
Lin, D. H. and Fisher, J. (2012).
Coupling nonparametric mixtures via latent Dirichlet processes.
In NIPS.
MacEachern, S. (1999).
Dependent nonparametric processes.
In Proceedings of the Section on Bayesian Statistical Science. American Statistical Association.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
22 / 23
Bibliography II
Nipoti, B. (2010).
Transformations of dependent completely random measures.
Technical report.
Pitman, J. (2003).
Poisson-kingman partitions.
In of Lecture Notes-Monograph Series, pages 1–34.
Pitman, J. and Yor, M. (1997).
The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator.
Annals of Probability, 25:855–900.
Rao, V. and Teh, Y. W. (2009).
Spatial normalized gamma processes.
In Advances in Neural Information Processing Systems.
Walker, S. G. (2007).
Sampling the Dirichlet mixture model with slices.
Communications in Statistics - Simulation and Computation, 36:45.
Chen, Rao, Buntine, and Teh (Duke)
Dependent RPMs from CRMs
June, 2013
23 / 23