HawkesTopic: A Joint Model for Network Inference and Topic
Modeling from Text-Based Cascades
Xinran He¹, Theodoros Rekatsinas², James Foulds³, Lise Getoor³ and Yan Liu¹
07/08/2015
¹University of Southern California
²University of Maryland, College Park
³University of California, Santa Cruz
Introduction
• Diffusion is an important and fundamental phenomenon:
  viral marketing, detection of rumors, modeling news dynamics …
• Text-based cascades are abundant on a variety of social platforms
[Figure: an example cascade over nodes A-G, with posts at t=0, 1, 1.5, 2 and 3.5]
He et al.
HawkesTopic
ICML 2015
01/17
Traditional vs Text-based Cascades
Traditional cascades
- Temporal information
Text-based cascades
- Temporal information
- Content information
[Figure: example cascades over nodes A-G with posting times t=0 to t=3.5, one per cascade type]
Incorporate content information => better model of diffusion
Incorporate temporal information => better model of documents
Network Inference
[Figure: text-based cascades over nodes A-G, with documents drawn from Topics 1-3, and the inferred diffusion network with edge weights]
Network Inference focuses on inferring a hidden diffusion network
Related work:
- NetInf, NetRate [Gomez et al. 11,12], MMHP [Yang and Zha 13], KernelCascades [Du et al. 12]
- TopicCascades [Du et al. 13]
Topic Modeling
[Figure: a corpus of documents posted by nodes A-G, and the discovered Topics 1-3]
Topic modeling aims to discover the latent thematic topics
Related work:
- LDA [Blei et al. 03], CTM [Blei and Lafferty 06]
- Citation Influence model [Dietz et al. 07], TIR model [Foulds et al. 13]
Our Contribution
[Figure: from text-based cascades over nodes A-G, Topic Modeling recovers Topics 1-3 and Network Inference recovers the weighted diffusion network]
HawkesTopic: a joint model for simultaneous Network Inference
and Topic Modeling from text-based cascades
HawkesTopic: Intuition
[Figure: timelines of documents posted by users $v_1$ and $v_2$]
Mutually exciting nature: a posting event can trigger future events
Content cascades: the content of a document should be similar to that of the
document that triggers its publication
Modeling Posting Times
Mutually exciting nature captured via Multivariate Hawkes
Process (MHP) [Liniger 09].
For an MHP, the intensity process $\lambda_v(t)$ takes the form:
$$\lambda_v(t) = \mu_v + \sum_{e:\, t_e < t} A_{v_e, v}\, f_\Delta(t - t_e)$$
Rate = base intensity + influence from previous events
$\mu_v$: base intensity
$A_{u,v}$: influence strength from $u$ to $v$
$f_\Delta(\cdot)$: probability density function of the delay distribution
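The intensity formula can be evaluated directly from a list of past events. The following is a minimal sketch, not the authors' implementation: the exponential form of the delay density and the names `exp_delay_pdf`, `intensity`, `mu`, `A` and `events` are assumptions made for illustration, and the numbers are made up.

```python
import math

def exp_delay_pdf(dt, rate=1.0):
    """Delay density f_Delta(dt); the exponential form is an assumption."""
    return rate * math.exp(-rate * dt) if dt > 0 else 0.0

def intensity(v, t, events, mu, A, delay_pdf=exp_delay_pdf):
    """lambda_v(t) = mu[v] + sum over events e with t_e < t of A[v_e][v] * f_Delta(t - t_e).

    events: list of (t_e, v_e) pairs; mu[v]: base intensity; A[u][v]: influence of u on v.
    """
    total = mu[v]
    for t_e, v_e in events:
        if t_e < t:
            total += A[v_e][v] * delay_pdf(t - t_e)
    return total

# Toy example with two users (all values are illustrative).
mu = [0.1, 0.2]
A = [[0.0, 0.5],
     [0.3, 0.0]]
events = [(0.0, 0), (1.0, 1)]
lam = intensity(1, 2.0, events, mu, A)  # user 1 is excited only by user 0's post at t=0
```

Here user 1's own earlier post contributes nothing because `A[1][1]` is zero in this toy matrix.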
Generating Posting Times
[Figure: events of users $v_1$ and $v_2$ on a timeline, grouped into Level 0, Level 1 and Level 2]
Generate events and their posting times in breadth-first order by interpreting the MHP
as a clustered Poisson process [Simma 10]
This provides an explicit parent relationship for the evolution of the content information
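The breadth-first generation can be sketched as a branching simulation. This is an illustrative sketch under stated assumptions rather than the procedure from [Simma 10]: level-0 events come from a homogeneous Poisson process with rate `mu[v]`, each event of user u triggers Poisson(`A[u][v]`) children at user v, and delays are exponential. The function name `simulate_cascade` and all parameter names are hypothetical.

```python
from collections import deque
import numpy as np

def simulate_cascade(mu, A, horizon, delay_scale=1.0, seed=0):
    """Simulate events level by level; each event records its parent and level.

    Returns a list of dicts with keys 'time', 'user', 'parent' (index or None), 'level'.
    """
    rng = np.random.default_rng(seed)
    events, queue = [], deque()
    # Level 0: spontaneous events, Poisson with rate mu[v] on [0, horizon).
    for v, rate in enumerate(mu):
        for t in rng.uniform(0.0, horizon, size=rng.poisson(rate * horizon)):
            events.append({"time": float(t), "user": v, "parent": None, "level": 0})
            queue.append(len(events) - 1)
    # Level k+1: children of level-k events, processed in FIFO (breadth-first) order.
    while queue:
        idx = queue.popleft()
        parent = events[idx]
        for v in range(len(mu)):
            n_children = rng.poisson(A[parent["user"]][v])
            for delay in rng.exponential(delay_scale, size=n_children):
                t = parent["time"] + float(delay)
                if t < horizon:  # discard children that fall outside the window
                    events.append({"time": t, "user": v, "parent": idx,
                                   "level": parent["level"] + 1})
                    queue.append(len(events) - 1)
    return events

events = simulate_cascade(mu=[0.5, 0.5], A=[[0.0, 0.4], [0.3, 0.0]], horizon=10.0, seed=1)
```

Because every event stores the index of the event that triggered it, the content model can later look up the parent document of any triggered event.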
Modeling Documents
[Figure: documents of users $v_1$ and $v_2$ on a timeline, with $\alpha_1$, $\alpha_2$ and the topics $\beta_{1:K}$ (Topic 1, Topic 2, …)]
Step 1: Generate the topics $\beta_{1:K}$: $\beta_k \sim \mathrm{Dir}(\alpha)$
Step 2: For spontaneous events (level = 0): $\eta_e \sim N(\alpha_v, \sigma^2 I)$
Step 3: For triggered events (level > 0): $\eta_e \sim N(\eta_{\mathrm{parent}[e]}, \sigma^2 I)$
Step 4: For each word in each document: $z_{e,n} \sim \mathrm{Discrete}(\pi(\eta_e))$, $x_{e,n} \sim \mathrm{Discrete}(\beta_{z_{e,n}})$
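The four steps can be sketched with NumPy as follows. This is a hedged illustration, not the authors' code: $\pi(\cdot)$ is assumed to be a softmax (as in the Correlated Topic Model), and the names `generate_documents`, `alpha_user`, `dirichlet_alpha` and the toy sizes are hypothetical.

```python
import numpy as np

def softmax(eta):
    """Assumed form of pi(eta): map topic parameters onto the probability simplex."""
    w = np.exp(eta - np.max(eta))
    return w / w.sum()

def generate_documents(parents, users, alpha_user, n_topics, vocab_size,
                       n_words=20, sigma=0.5, dirichlet_alpha=0.5, seed=0):
    """parents[e] is the index of the triggering event, or None if spontaneous.

    Events must be ordered so that every parent precedes its children.
    """
    rng = np.random.default_rng(seed)
    # Step 1: topics beta_k ~ Dir(alpha), one distribution over words per topic.
    beta = rng.dirichlet(np.full(vocab_size, dirichlet_alpha), size=n_topics)
    etas, docs = [], []
    for e, parent in enumerate(parents):
        # Step 2 (spontaneous): centre on the user's alpha_v.
        # Step 3 (triggered): centre on the parent's eta.
        mean = alpha_user[users[e]] if parent is None else etas[parent]
        eta = rng.normal(mean, sigma)
        etas.append(eta)
        # Step 4: draw a topic z for each word position, then a word x from that topic.
        z = rng.choice(n_topics, size=n_words, p=softmax(eta))
        x = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        docs.append(x)
    return beta, np.array(etas), docs

# Toy cascade: event 0 is spontaneous, 1 is triggered by 0, 2 by 1, 3 is spontaneous.
parents = [None, 0, 1, None]
users = [0, 1, 0, 1]
alpha_user = np.zeros((2, 3))
beta, etas, docs = generate_documents(parents, users, alpha_user, n_topics=3, vocab_size=8)
```

Triggered documents inherit their topic parameters from the parent with Gaussian noise, which is what makes content drift gradually along a cascade.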
Inference
Joint variational inference based on a full mean-field approximation:
$$Q(\boldsymbol{\eta}, \boldsymbol{z}, \boldsymbol{P}) = \prod_{e \in E} \left[ q(\eta_e \mid \hat{\eta}_e)\, q(P_e \mid r_e) \prod_{n=1}^{N_e} q(z_{e,n} \mid \phi_{e,n}) \right]$$
-- Laplace approximation for the non-conjugate variable: $\eta_e \sim N(\hat{\eta}_e, \sigma^2 I)$
-- Other variables: $P_e \sim \mathrm{Discrete}(r_e)$, $z_{e,n} \sim \mathrm{Discrete}(\phi_{e,n})$
Update for $q(P_e \mid r_e)$:
$$r_{e,e'} \propto N(\eta_e \mid \eta_{e'}, \sigma^2 I) \times A_{v_{e'}, v_e} \times f_\Delta(t_e - t_{e'})$$
-- $N(\eta_e \mid \eta_{e'}, \sigma^2 I)$: similarity between document topics
-- $A_{v_{e'}, v_e}$: influence between users (Hawkes process)
-- $f_\Delta(t_e - t_{e'})$: proximity of events in time (Hawkes process)
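The parent update multiplies three factors and normalizes over candidate parents. The sketch below shows that computation for a single event, under assumptions: an exponential delay density, only earlier events as candidates (the spontaneous, no-parent option is left out), and hypothetical names such as `parent_responsibilities`. It is not the full variational algorithm.

```python
import numpy as np

def parent_responsibilities(e, times, users, etas, A, sigma=1.0, delay_rate=1.0):
    """Return (candidates, r), where r[i] is the normalized weight of candidates[i]
    being the parent of event e."""
    candidates = [c for c in range(len(times)) if times[c] < times[e]]
    weights = []
    for c in candidates:
        # Similarity between document topics: N(eta_e | eta_c, sigma^2 I) up to a constant.
        similarity = np.exp(-np.sum((etas[e] - etas[c]) ** 2) / (2.0 * sigma ** 2))
        # Influence between users: from the candidate's user to the event's user.
        influence = A[users[c]][users[e]]
        # Proximity in time: exponential delay density (an assumed form of f_Delta).
        dt = times[e] - times[c]
        proximity = delay_rate * np.exp(-delay_rate * dt)
        weights.append(similarity * influence * proximity)
    weights = np.array(weights, dtype=float)
    total = weights.sum()
    return candidates, (weights / total if total > 0 else weights)

# Toy example: which earlier event most plausibly triggered event 2?
times = [0.0, 1.0, 2.0]
users = [0, 1, 0]
etas = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])
A = [[0.2, 0.5], [0.5, 0.2]]
candidates, r = parent_responsibilities(2, times, users, etas, A)
```

In this toy example event 0 has the more similar topics, but event 1 is closer in time and comes from a more influential user, so it receives the larger weight.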
Experiments: setting
EventRegistry:
"Ebola" news articles, ~4 months
~9k articles, 330 news media sites
Copying information as ground truth
ArXiv:
High-energy physics theory papers, ~12 years
Top 50/100/200 researchers
Citation network as ground truth
Evaluation metrics:
-- Topic modeling: document completion likelihood [Wallach et al. 09]
-- Network Inference: AUC against the ground truth network
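As an illustration of the network inference metric, AUC can be computed by comparing the scores of true edges with those of non-edges. This is a sketch under assumptions, not the evaluation script used in the experiments: the function name `edge_auc` is hypothetical, self-loops are ignored, and the toy matrices are made up.

```python
import numpy as np

def edge_auc(scores, truth):
    """AUC of inferred edge scores against a binary ground-truth adjacency matrix.

    Equals the probability that a random true edge scores higher than a random
    non-edge, with ties counted as 1/2. The diagonal is ignored (an assumption).
    """
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    off_diag = ~np.eye(scores.shape[0], dtype=bool)
    pos = scores[off_diag & truth]
    neg = scores[off_diag & ~truth]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy 3-node network: true edges are 0->1 and 1->2.
scores = [[0.0, 0.6, 0.1],
          [0.2, 0.0, 0.5],
          [0.1, 0.3, 0.0]]
truth = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
auc = edge_auc(scores, truth)
```

An AUC of 0.5 corresponds to random edge scores, so values closer to 1 indicate a better ranking of true edges.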
Experiments: algorithms
Algorithm     Task                 Description
HTM           Both                 Our method, with K=50 topics (K=100 for ArXiv with 200 authors)
LDA           Topic Modeling       Latent Dirichlet Allocation with collapsed Gibbs sampling
CTM           Topic Modeling       Correlated Topic Model with variational inference
Hawkes        Network Inference    Hawkes process considering only event posting times
Hawkes-LDA    Network Inference    Two-step approach that first infers topics with LDA
Hawkes-CTM    Network Inference    Two-step approach that first infers topics with CTM
Result: EventRegistry
Network Inference accuracy (AUC): 10% improvement

              Hawkes   Hawkes-LDA   Hawkes-CTM   HTM
Component 1   0.622    0.669        0.673        0.697
Component 2   0.670    0.704        0.716        0.730
Component 3   0.666    0.665        0.669        0.700

Topic modeling accuracy:

              LDA      CTM      HTM
Component 1   -42945   -42458   -42325
Component 2   -22558   -22181   -22164
Component 3   -17574   -17574   -17571
Result: ArXiv
Network Inference accuracy (AUC): 40% improvement

           Hawkes   Hawkes-LDA   Hawkes-CTM   HTM
Top50      0.594    0.656        0.645        0.807
Top100     0.588    0.589        0.614        0.687
Top200     0.618    0.630        0.629        0.659

Topic modeling accuracy:

           LDA      CTM      HTM
Top50      -11074   -10769   -10708
Top100     -15711   -15477   -15252
Top200     -27758   -27630   -27443
Conclusion
The HawkesTopic model unifies the Correlated Topic Model and the Hawkes process:
=> infers the hidden diffusion network
=> discovers the thematic topics of documents
Jointly modeling temporal and content information in text-based
cascades gives the best results
Experiments on the ArXiv and EventRegistry datasets:
=> EventRegistry: 10% improvement in AUC
=> ArXiv: 40% improvement in AUC