HawkesTopic: A Joint Model for Network Inference and Topic Modeling from Text-Based Cascades

Xinran He (1), Theodoros Rekatsinas (2), James Foulds (3), Lise Getoor (3) and Yan Liu (1)
(1) University of Southern California
(2) University of Maryland, College Park
(3) University of California, Santa Cruz
07/08/2015

He et al., HawkesTopic, ICML 2015

Introduction

• Diffusion is an important and fundamental phenomenon: viral marketing, detection of rumors, modeling news dynamics, ...
• Text-based cascades are abundant on a variety of social platforms.

[Figure: an example cascade over nodes A to G, with posting times from t=0 to t=3.5]

Traditional vs Text-based Cascades

Traditional cascades:
- Temporal information

Text-based cascades:
- Temporal information
- Content information

Incorporate content information => better model of diffusion
Incorporate temporal information => better model of documents

Network Inference

[Figure: a text-based cascade over nodes A to G with three topics, next to the inferred diffusion network with weighted edges]

Network inference focuses on inferring a hidden diffusion network.

Related work:
- NetInf, NetRate [Gomez et al. 11, 12], MMHP [Yang and Zha 13], KernelCascades [Du et al. 12]
- TopicCascades [Du et al. 13]

Topic Modeling

[Figure: the documents of the cascade treated as a corpus and assigned to three topics]

Topic modeling aims to discover the latent thematic topics.

Related work:
- LDA [Blei et al. 03], CTM [Blei and Lafferty 06]
- Citation Influence model [Dietz et al. 07], TIR model [Foulds et al. 13]

Our Contribution

[Figure: topic modeling and network inference carried out jointly on the same text-based cascade]

HawkesTopic: a joint model for simultaneous network inference and topic modeling from text-based cascades.

HawkesTopic: Intuition

[Figure: documents posted over time by users $v_1$ and $v_2$]

- Mutually exciting nature: a posting event can trigger future events.
- Content cascades: the content of a document should be similar to that of the document that triggers its publication.

Modeling Posting Times

The mutually exciting nature is captured via a Multivariate Hawkes Process (MHP) [Liniger 09]. For an MHP, the intensity process $\lambda_v(t)$ takes the form:

Rate = Base intensity + Influence from previous events

$\lambda_v(t) = \mu_v + \sum_{e:\, t_e < t} A_{v_e, v}\, f_\Delta(t - t_e)$

- $A_{u,v}$: influence strength from $u$ to $v$
- $f_\Delta(\cdot)$: probability density function of the delay distribution

Generating Posting Times

[Figure: events of users $v_1$ and $v_2$ arranged by level: Level 0, Level 1, Level 2]

- Generate events and their posting times in breadth-first order by interpreting the MHP as a clustered Poisson process [Simma 10].
- This provides an explicit parent relationship for the evolution of the content information.

Modeling Documents

[Figure: user topic priors $\alpha_1$ and $\alpha_2$, documents of users $v_1$ and $v_2$, and topics $\beta_{1:K}$]

Step 1: Generate the topics $\beta_{1:K}$: $\beta_k \sim \mathrm{Dir}(\alpha)$
Step 2: For spontaneous events (level = 0): $\eta_e \sim \mathcal{N}(\alpha_v, \sigma^2 I)$
Step 3: For triggered events (level > 0): $\eta_e \sim \mathcal{N}(\eta_{\mathrm{parent}[e]}, \sigma^2 I)$
Step 4: For each word in each document: $z_{e,n} \sim \mathrm{Discrete}(\pi(\eta_e))$, $x_{e,n} \sim \mathrm{Discrete}(\beta_{z_{e,n}})$

Inference

Joint variational inference based on a full mean-field approximation:

$q(\eta, z, P) = \prod_{e \in E} q(\eta_e \mid \hat{\eta}_e)\, q(P_e \mid r_e) \prod_{n=1}^{N_e} q(z_{e,n} \mid \phi_{e,n})$

-- Laplace approximation for the non-conjugate variable: $\eta_e \sim \mathcal{N}(\hat{\eta}_e, \sigma^2 I)$
-- Other variables: $P_e \sim \mathrm{Discrete}(r_e)$, $z_{e,n} \sim \mathrm{Discrete}(\phi_{e,n})$

Update for $q(P_e \mid r_e)$:

$r_{e,e'} \propto \mathcal{N}(\hat{\eta}_e \mid \hat{\eta}_{e'}, \sigma^2 I) \times A_{v_{e'}, v_e} \times f_\Delta(t_e - t_{e'})$

- $\mathcal{N}(\hat{\eta}_e \mid \hat{\eta}_{e'}, \sigma^2 I)$: similarity between document topics
- $A_{v_{e'}, v_e}$: influence between users (Hawkes process)
- $f_\Delta(t_e - t_{e'})$: proximity of events in time (Hawkes process)

Experiments: Setting

EventRegistry:
- "Ebola" news articles
- ~4 months
- ~9k articles, 330 news media sites
- Copying information as ground truth

ArXiv:
- High-energy physics theory papers
- ~12 years
- Top 50/100/200 researchers
- Citation network as ground truth

Evaluation metrics:
-- Topic modeling: document completion likelihood [Wallach et al. 09]
-- Network inference: AUC against the ground-truth network

Experiments: Algorithms

Algorithm     Description
HTM           Our method, with topic number K=50, and K=100 for ArXiv with 200 authors
LDA           Latent Dirichlet Allocation with collapsed Gibbs sampling
CTM           Correlated topic modeling with variational inference
Hawkes        Hawkes process considering only event posting time
Hawkes-LDA    Two-step approach that first infers topics with LDA
Hawkes-CTM    Two-step approach that first infers topics with CTM

Compared on topic modeling: LDA, CTM, HTM
Compared on network inference: Hawkes, Hawkes-LDA, Hawkes-CTM, HTM

Result: EventRegistry

Network inference accuracy (10% improvement):

              Hawkes   Hawkes-LDA   Hawkes-CTM   HTM
Component 1   0.622    0.669        0.673        0.697
Component 2   0.670    0.704        0.716        0.730
Component 3   0.666    0.665        0.669        0.700

Topic modeling accuracy:

              LDA      CTM      HTM
Component 1   -42945   -42458   -42325
Component 2   -22558   -22181   -22164
Component 3   -17574   -17574   -17571

Result: EventRegistry

[Figure slide]
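
The network inference scores in these results are AUC values against a ground-truth network. The slides do not say how the AUC was computed, so the following is only a minimal sketch under stated assumptions: the inferred influence strengths are given as a square matrix, the ground truth is a set of directed edges, self-edges are ignored, and AUC is taken as the probability that a true edge receives a higher score than a non-edge (ties count one half). The function name `edge_auc` and all numbers are hypothetical.

```python
from itertools import product


def edge_auc(scores, true_edges):
    """Rank-based AUC: chance that a true edge outscores a non-edge (ties = 1/2)."""
    n = len(scores)
    pos, neg = [], []
    for u, v in product(range(n), repeat=2):
        if u == v:
            continue  # assumption: self-influence is not evaluated
        (pos if (u, v) in true_edges else neg).append(scores[u][v])
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up influence strengths for three users.
A_hat = [
    [0.0, 0.6, 0.1],
    [0.2, 0.0, 0.5],
    [0.1, 0.3, 0.0],
]
truth = {(0, 1), (1, 2)}  # hypothetical ground-truth edges
auc = edge_auc(A_hat, truth)
print(auc)
```

In this toy case both true edges score above every non-edge, so the ranking is perfect. A score of 0.5 would correspond to a ranking no better than chance.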

Result: ArXiv

Network inference accuracy (40% improvement):

         Hawkes   Hawkes-LDA   Hawkes-CTM   HTM
Top50    0.594    0.656        0.645        0.807
Top100   0.588    0.589        0.614        0.687
Top200   0.618    0.630        0.629        0.659

Topic modeling accuracy:

         LDA      CTM      HTM
Top50    -11074   -10769   -10708
Top100   -15711   -15477   -15252
Top200   -27758   -27630   -27443

Result: ArXiv

[Figure slide]

Conclusion

The HawkesTopic model unifies the Correlated Topic Model and the Hawkes process:
- infers the hidden diffusion network
- discovers the thematic topics of documents

Jointly modeling temporal information and content information in text-based cascades gives the best results.

Experiments on the ArXiv and EventRegistry datasets:
- EventRegistry: 10% improvement in AUC
- ArXiv: 40% improvement in AUC
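
To make the Hawkes intensity from the Modeling Posting Times slide and the parent update from the Inference slide concrete, here is a small illustrative sketch. It is not the authors' implementation. It assumes an exponential delay density (the slides only say that the delay has some probability density function), represents each document by a plain list standing in for its topic vector, and normalizes the parent weights over earlier events only, leaving out the spontaneous (no parent) case. All function names, parameter names and numbers are hypothetical.

```python
import math


def delay_pdf(dt, omega=1.0):
    """Assumed exponential delay density: omega * exp(-omega * dt) for dt > 0."""
    return omega * math.exp(-omega * dt) if dt > 0 else 0.0


def intensity(v, t, events, mu, A, omega=1.0):
    """Base rate mu[v] plus influence A[v_e][v] * f(t - t_e) of each earlier event."""
    rate = mu[v]
    for t_e, v_e in events:
        if t_e < t:
            rate += A[v_e][v] * delay_pdf(t - t_e, omega)
    return rate


def parent_weights(e, events, etas, A, sigma=1.0, omega=1.0):
    """Normalized weight of each earlier event being the parent of event e."""
    t_e, v_e = events[e]
    raw = {}
    for j, (t_j, v_j) in enumerate(events):
        if t_j >= t_e:
            continue  # only earlier events can be parents
        sq_dist = sum((a - b) ** 2 for a, b in zip(etas[e], etas[j]))
        # Isotropic Gaussian similarity; its constant cancels when normalizing.
        similarity = math.exp(-sq_dist / (2.0 * sigma ** 2))
        raw[j] = similarity * A[v_j][v_e] * delay_pdf(t_e - t_j, omega)
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()} if total > 0 else {}


# Toy data: (time, user) events, base rates, influence matrix, topic vectors.
events = [(0.0, 0), (1.0, 1), (1.5, 0)]
mu = [0.1, 0.2]
A = [[0.0, 0.5], [0.3, 0.0]]  # A[u][v]: influence strength from u to v
etas = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

rate = intensity(1, 1.0, events, mu, A)
weights = parent_weights(2, events, etas, A)
print(rate, weights)
```

In the toy data, user 0 has no influence on itself, so the only candidate parent with nonzero weight for the third event is the second event, posted by user 1.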