Exact and Approximate Sum
Representations for the Dirichlet Process
Hemant Ishwaran and Mahmoud Zarepour
Presented by: John Paisley
Paper objective
• This paper is concerned with an analytical measure of how
close the infinite-dimensional Dirichlet process (DP) is to the
finite-dimensional Dirichlet distribution (DD), as seen through
the Gamma method for drawing from the DD (normalizing
independent Gamma random variables).
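The Gamma method mentioned above can be sketched as follows. This is a minimal illustration, not code from the paper: it relies on the standard fact that normalizing independent Gamma(a_i, 1) variables gives a Dirichlet(a_1, ..., a_K) draw. The function names `draw_dirichlet_gamma` and `symmetric_dd_draw`, and the choice of a symmetric DD with parameters $\alpha / K$, are assumptions made for illustration.

```python
import random


def draw_dirichlet_gamma(alphas, rng=None):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma variates.

    Hypothetical helper for illustration; uses only the standard library.
    """
    rng = rng or random.Random()
    # One independent Gamma(shape=a, scale=1) variate per component.
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    # Dividing by the sum yields a point on the probability simplex.
    return [g / total for g in gammas]


def symmetric_dd_draw(alpha, K, rng=None):
    """Draw from the symmetric K-dimensional DD with parameters alpha / K."""
    return draw_dirichlet_gamma([alpha / K] * K, rng)


weights = symmetric_dd_draw(alpha=2.0, K=10, rng=random.Random(0))
print(weights)
```

As K grows with $\alpha$ fixed, each parameter $\alpha / K$ shrinks, and most of the mass in a draw tends to concentrate on a few components.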
Result
• I didn’t fully understand how this result was derived
or whether it carries any important meaning.
Interesting result they mention (but taken from elsewhere)
Interesting trick for speeding up VB inference
• They represented the DP in this paper in an interesting way.
• This is the Dirichlet process after N draws. Because $\alpha / K$
goes to zero as $K \to \infty$ for the DP, the posterior parameter of
each selected component is simply its count. The $\alpha$ on the
right represents the total weight of all remaining components
(which never changes).
• We’ve fixed the truncation for VB mixture modeling for theoretical
reasons. Also, when using the DP, we use stick-breaking to add and
remove components, because adding components to or removing them
from a finite DD is ad hoc. The above representation provides a
theoretically justifiable way to add and remove components in the
DD. (continued)
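The representation described in the bullets above can be sketched numerically. This is a hedged reconstruction from the text, not the slide's own formula: it assumes the weights of the k occupied components plus the lumped remainder follow a Dirichlet distribution with parameters $(n_1, \ldots, n_k, \alpha)$, where $n_j$ is the count for component j. All function names are hypothetical.

```python
import random


def posterior_weight_params(counts, alpha):
    """Assumed Dirichlet parameters: one count per occupied component,
    followed by alpha for all remaining (unoccupied) components."""
    return list(counts) + [alpha]


def expected_weights(counts, alpha):
    """Posterior mean weights: n_j / (N + alpha), then alpha / (N + alpha)."""
    params = posterior_weight_params(counts, alpha)
    total = sum(params)  # equals N + alpha
    return [p / total for p in params]


def sample_weights(counts, alpha, rng=None):
    """One draw of the weights, via normalized Gamma variates.

    Every count must be strictly positive (occupied components only).
    """
    rng = rng or random.Random()
    params = posterior_weight_params(counts, alpha)
    gammas = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(gammas)
    return [g / total for g in gammas]


print(expected_weights([3, 1], alpha=1.0))
print(sample_weights([3, 1], alpha=1.0, rng=random.Random(0)))
```

The last entry of each vector is the weight reserved for components that have not been selected yet; its parameter stays at $\alpha$ however many draws are observed.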
Continued…
• I think we can use the “DD” from the previous page (actually
a DP) in a VB setting. We can remove unused components at
every iteration without violating any DP rules or being called
ad hoc. I think we can also show that the lower-bound
guarantee in VB is not violated.
• Why this is good: The stick-breaking prior for the DP is a biased
prior (Qi presented a way to address this for VB, but it could
be called ad hoc). This prior is symmetric (very important) and
is still fully DP. Also, computation time increases linearly with
the truncation level. Therefore, there has been a trade-off:
increase the truncation for better results but longer run time, or
vice versa. Now, because we can theoretically justify pruning at
every iteration (IF I’m right), we can literally have both.
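The pruning idea above might look like the following sketch. It is only an illustration of the presenter's conjecture, not a verified VB algorithm: it assumes each component carries an (expected) count, that components with a count below a small tolerance are dropped, and that the remainder weight parameter stays at $\alpha$. The helper `prune_components` and the tolerance `tol` are hypothetical, and the sketch does not check the VB lower bound.

```python
def prune_components(counts, alpha, tol=1e-8):
    """Drop components whose count is at most tol.

    Returns the kept counts and the mean weights under the assumed
    Dirichlet(n_1, ..., n_k, alpha) representation, where the final
    entry is the weight of all components not currently in use.
    """
    kept = [n for n in counts if n > tol]
    params = kept + [alpha]  # alpha is unchanged by pruning
    total = sum(params)
    expected = [p / total for p in params]
    return kept, expected


kept, expected = prune_components([5.0, 0.0, 3.0, 0.0], alpha=2.0)
print(kept)
print(expected)
```

Because the cost per iteration grows with the number of components kept, dropping empty ones at each pass is where the hoped-for speed-up would come from.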