
Modeling Syntactic Structures of Topics with a Nested HMM‐LDA
Jing Jiang
Singapore Management University
Topic Models
• A generative model for discovering hidden topics from documents
• Topics are represented as word distributions
[Example document excerpt shown on the slide: "This makes full synchrony of activated units the default condition in the model cortex, as in Brown's model [Brown and Cooke, 1996], so that the background activation is coherent, and can be read into high order cortical levels which synchronize with it."]
How to Interpret Topics?
• List the top‐K frequent words
– Not easy to interpret
– How are the top words related to each other?
• Our method: model the syntactic structures of topics using a combination of hidden Markov models (HMM) and topic models (LDA)
• A preliminary solution towards meaningful representations of topics
Related Work on Syntactic LDA
• Similar to / based on [Griffiths et al. 05]
– More general, with multiple “semantic classes”
• [Boyd‐Graber & Blei 09]
– Combines parse trees with LDA
– Expensive to obtain parse trees for large text collections
• [Gruber et al. 07]
– Combines HMM with LDA
– Does not model syntax
HMM to Model Syntax
• In natural language sentences, the syntactic class of a word occurrence (noun, verb, adjective, adverb, preposition, etc.) depends on its context
• Transitions between syntactic classes follow some structure
• HMMs can be used to model these transitions
– HMM‐based part‐of‐speech tagger
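To make this concrete, below is a minimal illustrative sketch of a Markov chain over syntactic classes. The class set and transition probabilities are made-up values for demonstration, not anything learned in the paper.

```python
import numpy as np

# Illustrative syntactic classes and hand-picked transition probabilities;
# these values are assumptions for demonstration, not learned from data.
classes = ["DET", "ADJ", "NOUN", "VERB", "PREP"]
trans = np.array([
    [0.00, 0.40, 0.60, 0.00, 0.00],  # DET  -> ADJ or NOUN
    [0.00, 0.20, 0.80, 0.00, 0.00],  # ADJ  -> ADJ or NOUN
    [0.05, 0.00, 0.10, 0.55, 0.30],  # NOUN -> mostly VERB or PREP
    [0.45, 0.10, 0.15, 0.00, 0.30],  # VERB -> DET, ADJ, NOUN, or PREP
    [0.60, 0.10, 0.30, 0.00, 0.00],  # PREP -> DET, ADJ, or NOUN
])

def sample_class_sequence(length, start=0, seed=0):
    """Sample a sequence of syntactic classes from the Markov chain."""
    rng = np.random.default_rng(seed)
    seq = [start]
    for _ in range(length - 1):
        seq.append(rng.choice(len(classes), p=trans[seq[-1]]))
    return [classes[i] for i in seq]

print(sample_class_sequence(8))  # e.g. ['DET', 'ADJ', 'NOUN', 'VERB', ...]
```

In an HMM-based part-of-speech tagger these classes are hidden states that additionally emit words, which is the role the states play in the model introduced next.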
Overview of Our Model
• Assumptions
– A topic is represented as an HMM
– Content states: convey the semantic meanings of topics (likely to be nouns, verbs, adjectives, etc.)
– Functional states: serve linguistic functions (e.g. prepositions and articles)
• Word distributions of these functional states are shared among topics
– Each document has a mixture of topics
– Each sentence is generated from a single topic
Overview of Our Model
[Model diagram, built up over several slides: topics and states]
The n‐HMM‐LDA Model
Document Generation Process
Sample topics and transition probabilities
Document Generation Process
Sample a topic distribution for the document (same as in LDA)
Document Generation Process
Sample a topic for a sentence
Document Generation Process
Generate the words in the sentence using the HMM corresponding to this topic
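Putting these steps together, here is a minimal generative sketch of the process just described. The sizes, Dirichlet hyperparameters, and variable names (V, K, S_c, S_f, alpha, beta, gamma) are illustrative assumptions, and the sketch uses topic-specific transition matrices (one of the two variants discussed on the next slide); it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K = 1000, 5          # vocabulary size, number of topics (illustrative)
S_c, S_f = 3, 2         # content states per topic, shared functional states
S = S_c + S_f           # total states per HMM
alpha, beta, gamma = 0.1, 0.01, 1.0   # assumed symmetric Dirichlet priors

# Global parameters: shared functional-state emissions, topic-specific
# content-state emissions, and topic-specific transition matrices.
phi_func = rng.dirichlet(beta * np.ones(V), size=S_f)          # shared across topics
phi_cont = rng.dirichlet(beta * np.ones(V), size=(K, S_c))     # per topic
trans = rng.dirichlet(gamma * np.ones(S), size=(K, S))         # per topic

def generate_document(num_sentences=4, sent_len=8):
    """Generate one document: a topic per sentence, words via that topic's HMM."""
    theta = rng.dirichlet(alpha * np.ones(K))        # document topic mixture (as in LDA)
    doc = []
    for _ in range(num_sentences):
        z = rng.choice(K, p=theta)                   # one topic per sentence
        state = rng.choice(S)                        # assumed uniform initial state
        words = []
        for _ in range(sent_len):
            if state < S_c:
                w = rng.choice(V, p=phi_cont[z, state])      # content state: topic-specific
            else:
                w = rng.choice(V, p=phi_func[state - S_c])   # functional state: shared
            words.append(w)
            state = rng.choice(S, p=trans[z, state])         # topic-specific transition
        doc.append((z, words))
    return doc

print(generate_document())
```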
Variations
• Transition probabilities between states can be either topic‐specific (left) or shared by all topics (right)
Model Inference: Gibbs Sampling
• Sample a topic for a sentence
• Sample a state for a word
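The slides do not give the sampling equations, so the following is only a structural skeleton of one Gibbs sweep over these two steps; `sentence_topic_conditional` and `word_state_conditional` are hypothetical placeholders for the conditional distributions derived in the paper.

```python
import numpy as np

def gibbs_sweep(corpus, topic_of_sentence, state_of_word, K, S,
                sentence_topic_conditional, word_state_conditional,
                rng=np.random.default_rng(0)):
    """One Gibbs sweep: resample each sentence's topic, then each word's state.

    The two *_conditional arguments are hypothetical callbacks returning
    unnormalized conditional probabilities given the current assignments;
    the actual equations are derived in the paper.
    """
    for d, doc in enumerate(corpus):            # corpus: list of documents; doc: list of sentences
        for s, sentence in enumerate(doc):
            # Step 1: resample the topic of this sentence given everything else.
            p = sentence_topic_conditional(d, s, topic_of_sentence, state_of_word)
            topic_of_sentence[d][s] = rng.choice(K, p=p / p.sum())

            # Step 2: resample the hidden state of each word in the sentence.
            for i, word in enumerate(sentence):
                q = word_state_conditional(d, s, i, topic_of_sentence, state_of_word)
                state_of_word[d][s][i] = rng.choice(S, p=q / q.sum())
```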
Experiments – Data Sets
• NIPS publications (downloaded from http://nips.djvuzone.org/txt.html)
• Reuters‐21578
Data set                  NIPS Publications*    Reuters-21578
Vocabulary                18,864                10,739
Words                     5,305,230             1,460,666
Documents for training    1,314                 8,052
Documents for testing     618                   2,665
Quantitative Evaluation
• Perplexity: a commonly used metric for the generalization power of language models
• For a test document, observe the first K sentences and predict the remaining sentences
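For reference, perplexity can be computed from the per-word log-likelihood of the held-out sentences as in the small sketch below (the standard definition, not code from the paper):

```python
import math

def perplexity(total_log_likelihood, num_words):
    """Perplexity = exp(-(log-likelihood of held-out words) / (number of held-out words))."""
    return math.exp(-total_log_likelihood / num_words)

# Example: if a model assigns total log-likelihood -70000 (natural log)
# to 10,000 held-out words, its perplexity is about 1097.
print(perplexity(-70000.0, 10000))
```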
LDA vs. LDA‐s
• LDA‐s: n‐HMM‐LDA with a single state for each HMM.
– Same as standard LDA with each sentence having a single topic
[Perplexity curves on Reuters and NIPS]
The one-topic-per-sentence assumption helps.
HMM
• Achieves much lower perplexity, but cannot be used to discover topics
[Perplexity curve on NIPS]
Increase Number of Functional States
• Fixing the number of content states to 1 and the number of topics to 40
[Perplexity curves on Reuters and NIPS]
Using more functional states decreases perplexity.
Qualitative Evaluation
• Use the top frequent words to represent a topic/state
Sample Topics/States from LDA/HMM
[Top-word tables for sample topics/states, NIPS data]
Sample States from n‐HMM‐LDA‐d
[Top-word tables for sample states, NIPS data]
Different Content States
Case Study (LDA)
Example document excerpt (shown on the slide): "This makes full synchrony of activated units the default condition in the model cortex, as in Brown's model [Brown and Cooke, 1996], so that the background activation is coherent, and can be read into high order cortical levels which synchronize with it."

Top-word lists shown alongside the excerpt:
– is, the, that, be, are, can, one, it, to, for
– the, off, a, signal, and, to, in, frequency, is, *
– of, the, in, and, cells, to, cell, model, a, response
Case Study (n‐HMM‐LDA)
[The same example document excerpt as on the previous slide]

Top-word lists shown alongside the excerpt:
– *, receptive, synaptic, inhibitory, head, excitatory, direction, cell, visual, pyramidal
– cells, cell, *, neurons, field, input, response, model, activity, synapses
Case Study (Comparison)
[The same example excerpt shown side by side for LDA and n-HMM-LDA]
Conclusion
• We proposed a nested‐HMM‐LDA to model the syntactic structures of topics
– Extension of [Griffiths et al. 05]
• Experiments on two data sets show that
– The model achieves perplexity between LDA and HMM
– The model can provide more insights into the structures of topics than standard LDA
Thank You!
• Questions?