KDD 2011: Summary of Text Mining Sessions
Hongbo Deng

3 Text Mining Sessions, 9 Papers
Session labels: Recommendation, Topic Model
• Beyond Keyword Search: Discovering Relevant Scientific Literature
  – Khalid El-Arini (Carnegie Mellon University), Carlos Guestrin
• Collaborative Topic Modeling for Recommending Scientific Articles
  – Chong Wang (Princeton University), David M. Blei
• Localized Factor Models for Multi-Context Recommendation
  – Deepak Agarwal (Yahoo! Labs), Bee-Chung Chen, Bo Long
• Latent Topic Feedback for Information Retrieval
  – David Andrzejewski (Lawrence Livermore National Laboratory), David Buttler
• Tracking Trends: Incorporating Term Volume into Temporal Topic Models
  – Liangjie Hong (Lehigh University), Dawei Yin, Jian Guo, Brian D. Davison
• Conditional Topical Coding: An Efficient Topic Model Conditioned on Rich Features
  – Jun Zhu (Carnegie Mellon University), Ni Lao, Ning Chen, Eric P. Xing
• Refining Causality: Who Copied from Whom?
  – Tristan Snowsill (University of Bristol), Nick Fyson, Tijl De Bie, Nello Cristianini
• Partially Labeled Topic Models for Interpretable Text Mining
  – Daniel Ramage (Stanford University), Christopher D. Manning, Susan Dumais
• Latent Aspect Rating Analysis without Aspect Keyword Supervision
  – Hongning Wang (University of Illinois at Urbana-Champaign), Yue Lu, ChengXiang Zhai
Topic models are widely used in other sessions, e.g., user modeling, query log analysis, ad …

Collaborative Topic Modeling for Recommending Scientific Articles
• Problem:
  – To recommend scientific articles to users of an online community
• Input:
  – Users' libraries from CiteULike
  – The content of the articles
• Output:
  – Articles relevant to each user's interests
• Three traditional ways of finding articles
  – Follow citations in other articles they are interested in
  – Keyword search
  – Using recommendation methods (CiteULike)
• Several criteria
  – Recommending older articles is important
  – Recommending new articles is also important
  – Exploratory variables can be valuable in online scientific archives and communities
Collaborative Filtering + Topic Modeling

Collaborative Topic Modeling for Recommending Scientific Articles
• Two types of data
  – The other users' libraries [collaborative filtering]
    • Like latent factor models, use information from other users' libraries
    • For a particular user, it can recommend articles from other users who liked similar articles
    • Latent factor models work well for recommending known articles, but cannot generalize to previously unseen articles
  – The content of the articles [topic modeling]
    • To generalize to unseen articles, the authors use topic modeling
    • Can recommend articles that have similar content to other articles that a user likes

Collaborative Topic Modeling for Recommending Scientific Articles
• Intuition: Combine collaborative filtering and probabilistic topic modeling for recommending scientific articles
• The key property of CTR (collaborative topic regression) lies in how the item latent vector $v_j$ is generated
• The item latent vector $v_j$ is assumed to be close to the topic proportion $\theta_j$, but can diverge from it if it has to

Latent Topic Feedback for Information Retrieval
• Problem: a user navigating an unfamiliar corpus of text documents where document metadata is limited or unavailable
• Intuition: To augment keyword search with user feedback on latent topics
• Key point: A new method for obtaining and exploiting user feedback at the latent topic level

Latent Topic Feedback for Information Retrieval
• Method:
  – Learn latent topics from the corpus and construct meaningful representations of these topics
  – At query time, decide which latent topics are potentially relevant and present the appropriate topic representations alongside keyword search results
  – When a user selects a latent topic, the vocabulary terms most strongly associated with that topic are then used to augment the original query

Beyond Keyword Search: Discovering Relevant Scientific Literature
• Problem: As the number of publications has grown, it has become difficult for scientists to find relevant prior work for their particular research
• Input: a set of papers as a query
• Output: a set of highly relevant articles
• Method:
  – Modeling scientific influence between documents: optimize an objective function
  – Select a set of papers A with maximum influence to/from the query set Q
  – Incorporate trust and personalization: as scientists trust some authors more than others, results can be personalized to individual preferences

Partially Labeled Topic Models for Interpretable Text Mining
• Problem: make use of the unsupervised learning of topic modeling, with constraints that align some learned topics with a human-provided label
• Input: a collection of documents, partial labels
  – Observed: each document's words w and labels Λ
• Output: θ, Φ, ψ
  – θ: per-doc-label topic distribution
  – Φ: per-topic word distribution
  – ψ: per-doc label distribution
• Topic: a multinomial distribution over words $V$ that tend to co-occur with each other and some label
• [Figure: Graphical model for PLDA]
• Extend the generative story of LDA to incorporate labels, and of Labeled LDA to incorporate per-label latent topics

Latent Aspect Rating Analysis without Aspect Keyword Supervision
[Figure: two-step pipeline. Aspect Segmentation: reviews + overall ratings are split into aspect segments of term counts (location:1 amazing:1 walk:1 anywhere:1 room:1 nicely:1 appointed:1 comfortable:1 nice:1 accommodating:1 smile:1 friendliness:1 attentiveness:1). Latent Rating Regression: term weights, aspect rating, aspect weight. Gap between the two steps: ???]

Latent Aspect Rating Analysis without Aspect Keyword Supervision
• LARAM
  – Jointly model aspects and aspect ratings/weights
• LRR (Wang et al., 2010)
  – Segmented aspects from previous step

Some Observations
• Text mining is very hot
• Topic modeling has been widely used in text analysis and many other applications, e.g., query understanding, advertisement …
  – Combine topic modeling with other models, e.g., collaborative filtering
  – Integrate more information into topic modeling, e.g., labeled and unlabeled information (partially labeled)
  – Two-step solution -> unified way

Thanks!
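
As a rough illustration of the Collaborative Topic Modeling slides, the idea that the item vector $v_j$ stays close to the topic proportion $\theta_j$ can be written as $v_j = \theta_j + \text{offset}_j$, with a user's predicted preference taken as the inner product of the user vector and $v_j$. The additive offset, the inner-product scoring, the function names, and all numbers below are assumptions made for illustration; the slides do not spell them out. This is a minimal sketch, not the authors' implementation.

```python
import numpy as np


def item_vector(theta_j, offset_j=None):
    """Return v_j = theta_j + offset_j.

    For a previously unseen article there is no library data to learn an
    offset from, so the offset is taken as zero and v_j falls back to theta_j.
    """
    theta_j = np.asarray(theta_j, dtype=float)
    if offset_j is None:
        return theta_j
    return theta_j + np.asarray(offset_j, dtype=float)


def predict(u_i, v_j):
    """Predicted preference of user i for article j (inner product)."""
    return float(np.dot(u_i, v_j))


# Hypothetical 3-topic example.
u = np.array([1.0, 0.0, 2.0])               # user latent vector
theta_known = np.array([0.7, 0.2, 0.1])     # topics of an article already in libraries
offset_known = np.array([0.1, -0.1, 0.0])   # learned deviation from its topics
theta_new = np.array([0.1, 0.1, 0.8])       # topics of an unseen article

score_known = predict(u, item_vector(theta_known, offset_known))
score_new = predict(u, item_vector(theta_new))
print(score_known, score_new)
```

The unseen article can still be scored because its content alone determines $v_j$, which is the generalization that plain latent factor models lack.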
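
The last step of the Latent Topic Feedback method (augmenting the query with the terms most strongly associated with a selected topic) might look like the sketch below. The topic is represented as a plain dictionary of term weights, and the function names, the cutoff `n_terms`, and the example terms are hypothetical; how topics are learned and which ones are shown to the user is outside this sketch.

```python
def top_terms(topic_word_weights, n_terms=3):
    """Return the n_terms vocabulary terms with the highest weight in a topic."""
    ranked = sorted(topic_word_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:n_terms]]


def augment_query(query_terms, topic_word_weights, n_terms=3):
    """Append the selected topic's top terms to the query, skipping duplicates."""
    expanded = list(query_terms)
    for term in top_terms(topic_word_weights, n_terms):
        if term not in expanded:
            expanded.append(term)
    return expanded


# Hypothetical topic chosen by the user from the ones shown next to the results.
selected_topic = {"laser": 0.09, "fusion": 0.07, "plasma": 0.05, "beam": 0.02}
expanded_query = augment_query(["fusion", "energy"], selected_topic)
print(expanded_query)
```

The expanded query would then be passed to the same keyword search engine as before, so the feedback step needs no change to the underlying retrieval system.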
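
The second step of the two-step LRR pipeline relates term weights, aspect ratings, and aspect weights. One simplified reading, assumed here, is that each aspect rating is a weighted sum of the term counts in that aspect's segment, and the overall rating is a weighted sum of the aspect ratings. All weights and counts below are made up for illustration, and the sketch only computes the forward prediction; it does not estimate any parameters.

```python
def aspect_rating(term_counts, term_weights):
    """Weighted sum of term counts within one aspect segment."""
    return sum(term_weights.get(term, 0.0) * count
               for term, count in term_counts.items())


def overall_rating(aspect_segments, term_weights, aspect_weights):
    """Combine per-aspect ratings using per-aspect weights."""
    return sum(
        aspect_weights[aspect] * aspect_rating(counts, term_weights[aspect])
        for aspect, counts in aspect_segments.items()
    )


# Hypothetical segments (as produced by an aspect segmentation step).
segments = {
    "location": {"amazing": 1, "walk": 1},
    "room": {"comfortable": 1, "nicely": 1},
}
# Hypothetical term weights per aspect and aspect weights for one review.
weights = {
    "location": {"amazing": 2.0, "walk": 1.0},
    "room": {"comfortable": 1.5, "nicely": 0.5},
}
emphasis = {"location": 0.25, "room": 0.75}

predicted = overall_rating(segments, weights, emphasis)
print(predicted)
```

Because the segments are fixed before the regression runs, errors in segmentation cannot be corrected by the rating model, which is the gap that a jointly modeled approach is meant to close.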