Topic models

KDD 2011
Summary of Text Mining sessions
Hongbo Deng
3 Text Mining Sessions, 9 Papers (the sessions included Recommendation and Topic Model)
• Collaborative Topic Modeling for Recommending Scientific Articles
– Chong Wang (Princeton University), David M. Blei
• Latent Topic Feedback for Information Retrieval
– David Andrzejewski (Lawrence Livermore National Laboratory), David Buttler
• Beyond Keyword Search: Discovering Relevant Scientific Literature
– Khalid El-Arini (Carnegie Mellon University), Carlos Guestrin
• Partially Labeled Topic Models for Interpretable Text Mining
– Daniel Ramage (Stanford University), Christopher D. Manning, Susan Dumais
• Latent Aspect Rating Analysis without Aspect Keyword Supervision
– Hongning Wang (University of Illinois at Urbana-Champaign), Yue Lu, ChengXiang Zhai
• Localized Factor Models for Multi-Context Recommendation
– Deepak Agarwal (Yahoo! Labs), Bee-Chung Chen, Bo Long
• Tracking Trends: Incorporating Term Volume into Temporal Topic Models
– Liangjie Hong (Lehigh University), Dawei Yin, Jian Guo, Brian D. Davison
• Conditional Topical Coding: An Efficient Topic Model Conditioned on Rich Features
– Jun Zhu (Carnegie Mellon University), Ni Lao, Ning Chen, Eric P. Xing
• Refining Causality: Who Copied from Whom?
– Tristan Snowsill (University of Bristol), Nick Fyson, Tijl De Bie, Nello Cristianini
Topic models are widely used in other sessions, e.g., user modeling, query log analysis, ad …
Collaborative Topic Modeling for Recommending Scientific Articles
• Problem:
– Recommend scientific articles to users of an online community
• Input:
– Users' libraries from CiteULike
– The content of the articles
• Output:
– Articles relevant to each user's interests
• Three traditional ways
– Follow citations in other articles they are interested in
– Keyword search
– Use existing recommendation methods (e.g., CiteULike's)
• Several criteria
– Recommending older articles is important
– Recommending new articles is also important
– Exploratory variables can be valuable in online scientific archives and communities
• Approach: Collaborative Filtering + Topic Modeling
Collaborative Topic Modeling for Recommending Scientific Articles
• Two types of data
– The other users' libraries [collaborative filtering]
• Like latent factor models, use information from other users' libraries
• For a particular user, recommend articles from other users who liked similar articles
• Latent factor models work well for recommending known articles, but cannot generalize to previously unseen articles
– The content of the articles [topic modeling]
• To generalize to unseen articles, the authors use topic modeling
• Can recommend articles whose content is similar to other articles that a user likes
Collaborative Topic Modeling for Recommending Scientific Articles
• Intuition: Combine collaborative filtering and probabilistic topic modeling for recommending scientific articles
• The key property of CTR lies in how the item latent vector $v_j$ is generated: $v_j$ is assumed to be close to the topic proportion $\theta_j$, but can diverge from it if it has to
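Concretely, the generative step for items in CTR can be sketched as follows (a paraphrase of Wang and Blei's model; $u_i$ denotes the user latent vector, $c_{ij}$ a per-observation confidence, $\lambda_v$ a precision hyperparameter, $K$ the number of topics):

$$
\theta_j \sim \mathrm{Dirichlet}(\alpha), \qquad
\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K), \qquad
v_j = \theta_j + \epsilon_j, \qquad
r_{ij} \sim \mathcal{N}(u_i^{\top} v_j,\; c_{ij}^{-1})
$$

When $\lambda_v$ is large, $v_j$ stays near $\theta_j$ and content dominates; when it is small, the collaborative signal can pull $v_j$ away from the topic proportions.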
Latent Topic Feedback for Information Retrieval
• Problem: a user navigating an unfamiliar corpus of text documents where document metadata is limited or unavailable
• Intuition: augment keyword search with user feedback on latent topics
• Key point: a new method for obtaining and exploiting user feedback at the latent topic level
Latent Topic Feedback for Information Retrieval
• Method:
– Learn latent topics from the corpus and construct meaningful representations of these topics
– At query time, decide which latent topics are potentially relevant and present the appropriate topic representations alongside keyword search results
– When a user selects a latent topic, the vocabulary terms most strongly associated with that topic are used to augment the original query (see the sketch below)
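A minimal sketch of that augmentation step, assuming a topic-word probability matrix from any fitted topic model (the function names, toy data, and fixed number of expansion terms are illustrative, not the paper's implementation):

```python
import numpy as np

def top_terms(topic_word, vocab, topic_id, k=10):
    """Return the k vocabulary terms with highest probability under a topic."""
    order = np.argsort(topic_word[topic_id])[::-1][:k]
    return [vocab[i] for i in order]

def augment_query(query, topic_word, vocab, selected_topic, k=10):
    """Append a selected topic's strongest terms to the original keyword query."""
    expansion = top_terms(topic_word, vocab, selected_topic, k)
    # Avoid duplicating terms already present in the query.
    extra = [t for t in expansion if t not in query.split()]
    return query + " " + " ".join(extra)

# Toy usage: 2 topics over a 5-word vocabulary.
vocab = ["gene", "protein", "court", "law", "trial"]
topic_word = np.array([[0.45, 0.40, 0.05, 0.05, 0.05],   # biology-ish topic
                       [0.05, 0.05, 0.30, 0.30, 0.30]])  # legal-ish topic
print(augment_query("law", topic_word, vocab, selected_topic=1, k=3))
# e.g. "law court trial"
```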
Beyond Keyword Search: Discovering Relevant Scientific Literature
• Problem: as the number of publications has grown, it has become difficult for scientists to find relevant prior work for their particular research
• Input: a set of papers as a query
• Output: a set of highly relevant articles
• Method:
– Model scientific influence between documents by optimizing an objective function
– Select a set of papers A with maximum influence to/from the query set Q (a greedy sketch follows this list)
– Incorporate trust and personalization: since scientists trust some authors more than others, results can be personalized to individual preferences
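The selection step can be read as maximizing a set-valued influence objective. Assuming, as is typical for such objectives, that it is monotone, a greedy selector is the standard approximation; this is a generic sketch in which `influence` is a placeholder for the paper's document-influence model, not its actual definition:

```python
def greedy_select(candidates, influence, budget):
    """Greedily pick `budget` papers maximizing a set-valued influence score."""
    selected = []
    for _ in range(budget):
        remaining = [p for p in candidates if p not in selected]
        if not remaining:
            break
        # Pick the candidate with the largest marginal gain.
        best = max(remaining,
                   key=lambda p: influence(selected + [p]) - influence(selected))
        selected.append(best)
    return selected

# Toy usage: "influence" = number of distinct concepts covered by the selection.
papers = {"p1": {"lda"}, "p2": {"lda", "hmm"}, "p3": {"crf"}}

def coverage(selection):
    covered = set()
    for p in selection:
        covered |= papers[p]
    return len(covered)

print(greedy_select(list(papers), coverage, budget=2))  # -> ['p2', 'p3']
```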
Partially Labeled Topic Models for Interpretable Text Mining
• Problem: make use of the unsupervised learning of topic modeling, with constraints that align some learned topics with human-provided labels
• Input: a collection of documents, partial labels
[Graphical model for PLDA, with its components:
– $\Phi$: per-topic word distribution, a multinomial distribution over words $V$ that tend to co-occur with each other and some label
– $\theta$: per-document, per-label topic distribution
– $\psi$: per-document label distribution
– Observed: each document's words $w$ and labels $\Lambda$]
• Output: $\theta$, $\Phi$, $\psi$
• Key idea: extend the generative story of LDA to incorporate labels, and of Labeled LDA to incorporate per-label latent topics (a sketch of the generative story follows)
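A sketch of that generative story, reconstructed from the components above (the hyperparameter names $\alpha$, $\beta$, $\gamma$ are assumed, and PLDA additionally allows a shared latent label; treat this as a paraphrase rather than the paper's exact notation):

$$
\begin{aligned}
&\Phi_z \sim \mathrm{Dirichlet}(\beta) && \text{for each topic } z \text{ (each label owns a set of topics)}\\
&\psi_d \sim \mathrm{Dirichlet}(\gamma), \quad \theta_{d,l} \sim \mathrm{Dirichlet}(\alpha) && \text{for each document } d \text{ and label } l \in \Lambda_d\\
&l \sim \psi_d, \quad z \sim \theta_{d,l}, \quad w \sim \Phi_z && \text{for each word position in } d
\end{aligned}
$$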
Latent Aspect Rating Analysis without Aspect Keyword Supervision
[Figure: the existing two-step pipeline. Step 1, Aspect Segmentation: reviews plus overall ratings are split into aspect segments, shown as per-segment word counts (e.g., {location:1, amazing:1, walk:1, anywhere:1}, {room:1, nicely:1, appointed:1, comfortable:1}, {nice:1, accommodating:1, smile:1, friendliness:1, attentiveness:1}). Step 2, Latent Rating Regression: from the segments, term weights, aspect ratings, and aspect weights are estimated (the figure shows example numeric values). A "Gap ???" annotation marks the disconnect between the two steps.]
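For reference, the latent rating regression step can be written as follows (a sketch of the formulation in Wang et al., 2010, not taken from this talk; $d$ indexes reviews, $i$ aspects, $w$ words):

$$
s_{d,i} = \sum_{w} \beta_{i,w}\, c(w, d_i), \qquad
r_d \sim \mathcal{N}\!\left(\textstyle\sum_i \alpha_{d,i}\, s_{d,i},\; \delta^2\right)
$$

Here $c(w, d_i)$ counts word $w$ in the segment of review $d$ assigned to aspect $i$; $\beta$ are the term weights, $s_{d,i}$ the latent aspect ratings, and $\alpha_d$ the aspect weights, i.e., the three quantities in the figure above.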
Latent Aspect Rating Analysis without Aspect Keyword Supervision
• LARAM vs. LRR (Wang et al., 2010)
– LRR relies on aspect segments produced by the previous, keyword-supervised step
– LARAM jointly models the aspects and the aspect ratings/weights
Some Observations
• Text mining is very hot
• Topic modeling has been widely used in text analysis and many other applications, e.g., query understanding, advertisement …
– Combine topic modeling with other models, e.g., collaborative filtering
– Integrate more information into topic modeling, e.g., labeled and unlabeled information (partially labeled)
– Move from two-step solutions to unified models
Thanks!