Dynamic Egocentric Models for Citation Networks

Duy Vu, Arthur Asuncion, David Hunter, Padhraic Smyth

To appear in Proceedings of the 28th International Conference on Machine Learning, 2011

MURI meeting, June 3, 2011
Outline

- Egocentric Modeling Framework
- Inference for the Models
- Application to Citation Network Datasets
Egocentric Counting Processes

- Goal: Model a dynamically evolving network.
- Following standard recurrent event theory, place a counting process N_i(t) on node i, for i = 1, ..., n.
- N_i(t) counts the number of "events" involving the ith node.
- Combining the N_i(t) gives a multivariate counting process N(t) = (N_1(t), ..., N_n(t)).
- The process is genuinely multivariate; we make no assumption of independence among the N_i(t).
- The model is "egocentric" in Carter Butts's terminology because the indices i are nodes, not node pairs.
Modeling of Citation Networks

- New papers join the network over time.
- At arrival, a paper cites others that are already in the network.
- The main dynamic development is the number of citations received.
- Thus, N_i(t) equals the cumulative number of citations to paper i at time t (see the sketch below).
- "Egocentric" means N_i(t) is ascribed to nodes. The alternative "relational" framework, using N_(i,j)(t), is not appropriate here: the relationship (i, j) is at risk of an event (a citation) only at a single instant in time.
- Further discussion of general time-varying network modeling ideas is given by Butts (2008) and Brandes et al. (2009).
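To make this concrete, here is a minimal Python sketch (ours, not the paper's code) that builds the jump points of each N_i(t) from a stream of citation events; the (time, citing, cited) tuple format is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical event stream: (time, citing paper, cited paper).
# All of a paper's outgoing citations occur at its arrival time.
events = [
    (1.0, "B", "A"),
    (2.0, "C", "A"),
    (2.0, "C", "B"),
]

# jumps[i] lists the jump points (t, N_i(t)) of the counting process:
# the cumulative number of citations received by paper i through time t.
jumps = defaultdict(list)
counts = defaultdict(int)
for t, citing, cited in sorted(events):
    counts[cited] += 1
    jumps[cited].append((t, counts[cited]))

print(jumps["A"])  # [(1.0, 1), (2.0, 2)]: paper A has N_A(2.0) = 2
```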
The Doob-Meyer Decomposition

Each N_i(t) is nondecreasing in time, so N(t) may be considered a submartingale; i.e., it satisfies

    E[N(t) | past up to time s] ≥ N(s)    for all t > s.

Any submartingale may be uniquely decomposed as

    N(t) = ∫_0^t λ(s) ds + M(t),

where

- λ(t) is the "signal" at time t (this intensity function is what we will model), and
- M(t) is a continuous-time martingale.
Modeling the Intensity Process

The intensity process for node i is given by

    λ_i(t | H_{t^-}) = Y_i(t) α_0(t) exp(β^⊤ s_i(t)),

where

- Y_i(t) = I(t > t_i^arr) is the "at-risk indicator",
- H_{t^-} is the past of the network up to but not including time t,
- α_0(t) is the baseline hazard function,
- β is the vector of coefficients to estimate, and
- s_i(t) = (s_i1(t), ..., s_ip(t)) is a p-vector of statistics for paper i (a toy evaluation follows below).
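A toy evaluation of this intensity, with a constant baseline α_0 and the function name `intensity` as our own illustrative assumptions:

```python
import numpy as np

def intensity(t, t_arr, s_i, beta, alpha0=1.0):
    """lambda_i(t | H_t-) = Y_i(t) * alpha_0(t) * exp(beta' s_i(t)).

    t_arr  : arrival time of paper i, so Y_i(t) = I(t > t_arr)
    s_i    : p-vector of statistics s_i(t)
    beta   : p-vector of coefficients
    alpha0 : baseline hazard, taken constant here purely for illustration
    """
    y_i = 1.0 if t > t_arr else 0.0  # at-risk indicator Y_i(t)
    return y_i * alpha0 * np.exp(beta @ s_i)

# A paper with statistics s_i(t) = (3, 1) and coefficients beta = (0.2, -0.1):
print(intensity(t=5.0, t_arr=1.0, s_i=np.array([3.0, 1.0]),
                beta=np.array([0.2, -0.1])))  # exp(0.5) ~ 1.6487
```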
Preferential Attachment Statistics

For each cited paper j already in the network (with y_ij(t) = 1 if paper i has cited paper j by time t):

- First-order PA: s_j1(t) = Σ_{i=1}^N y_ij(t). The "rich get richer" effect.
- Second-order PA: s_j2(t) = Σ_{i≠k} y_ki(t) y_ij(t). An effect due to being cited by well-cited papers.
- Recency-based first-order PA (we take T_w = 180 days): s_j3(t) = Σ_{i=1}^N y_ij(t) I(t − t_i^arr < T_w). A temporary elevation of citation intensity after recent citations (sketched below).

Statistics shown in red on the original slides are time-dependent; the others are fixed once j joins the network.
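A minimal sketch of these three statistics on a toy adjacency matrix (hypothetical data; the zero diagonal makes the i ≠ k restriction automatic):

```python
import numpy as np

# Snapshot of a tiny citation graph at time t:
# y[i, j] = 1 if paper i has cited paper j; the diagonal is zero.
y = np.array([[0, 0, 0],
              [1, 0, 0],
              [1, 1, 0]])
t_arr = np.array([10.0, 50.0, 190.0])  # arrival times of papers 0, 1, 2
t, T_w = 200.0, 180.0
j = 0  # paper of interest

s1 = y[:, j].sum()                        # first-order PA: in-degree of j
s2 = (y @ y)[:, j].sum()                  # two-step paths k -> i -> j
s3 = (y[:, j] * (t - t_arr < T_w)).sum()  # citations from recent arrivals only
print(s1, s2, s3)  # 2, 1, 2
```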
Triangle Statistics

For each cited paper j already in the network:

- "Seller" statistic: s_j4(t) = Σ_{i≠k} y_ki(t) y_ij(t) y_kj(t).
- "Broker" statistic: s_j5(t) = Σ_{i≠k} y_kj(t) y_ji(t) y_ki(t).
- "Buyer" statistic: s_j6(t) = Σ_{i≠k} y_jk(t) y_ji(t) y_ki(t).

[Diagram: triangle configurations in which node A plays the seller role, B the broker role, and C the buyer role.]

Statistics shown in red on the original slides are time-dependent; the others are fixed once j joins the network.
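These sums vectorize neatly. A sketch on a toy graph, with our own variable names (the zero diagonal again handles i ≠ k):

```python
import numpy as np

# y[i, j] = 1 if paper i cites paper j (zero diagonal).
y = np.array([[0, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 1, 0]])
j = 2
two_step = y @ y  # two_step[a, b] = number of paths a -> . -> b

seller = (y[:, j] * two_step[:, j]).sum()   # k -> i -> j, closed by k -> j
broker = (y[j, :] * (y.T @ y)[j, :]).sum()  # k -> j and j -> i, closed by k -> i
buyer  = (y[j, :] * two_step[j, :]).sum()   # j -> k -> i, closed by j -> i
print(seller, broker, buyer)  # 0, 2, 1
```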
Out-Path Statistics

For each cited paper j already in the network:

- First-order out-degree (OD): s_j7(t) = Σ_{i=1}^N y_ji(t).
- Second-order OD: s_j8(t) = Σ_{i≠k} y_jk(t) y_ki(t).

Statistics shown in red on the original slides are time-dependent; the others are fixed once j joins the network.
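A short sketch of both out-path statistics under the same toy representation:

```python
import numpy as np

# y[i, j] = 1 if paper i cites paper j (hypothetical snapshot).
y = np.array([[0, 0, 0],
              [1, 0, 0],
              [1, 1, 0]])
j = 2

s7 = y[j, :].sum()        # first-order out-degree: papers cited by j
s8 = (y @ y)[j, :].sum()  # two-step out-paths j -> k -> i
print(s7, s8)             # 2, 1
```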
Topic Modeling Statistics

Additional statistics, using abstract text when available, are constructed as follows:

- An LDA model (Blei et al., 2003) is learned on the training set.
- Topic proportions θ are generated for each training node.
- The LDA model is also used to estimate topic proportions θ for each node in the test set.
- We construct a vector of similarity statistics

      s_j^LDA(t_i^arr) = θ_i ∘ θ_j,

  where ∘ denotes the element-wise product of two vectors.
- We use 50 topics; each component of s_j^LDA has a corresponding β coefficient (see the sketch below).
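A minimal sketch of the similarity statistic, with hypothetical topic proportions and 4 topics instead of 50 for brevity:

```python
import numpy as np

# Hypothetical topic proportions from a fitted LDA model.
# Each vector sums to one.
theta_i = np.array([0.7, 0.1, 0.1, 0.1])  # arriving (citing) paper i
theta_j = np.array([0.6, 0.2, 0.1, 0.1])  # candidate cited paper j

# Element-wise product: one similarity statistic per topic, each paired
# with its own coefficient in beta.
s_lda = theta_i * theta_j
print(s_lda)  # [0.42 0.02 0.01 0.01]
```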
Partial Likelihood

Recall: The intensity process for node i is

    λ_i(t | H_{t^-}) = Y_i(t) α_0(t) exp(β^⊤ s_i(t)).

If α_0(t) ≡ α_0(t, γ), we may use the "local Poisson-ness" of the multivariate counting process to obtain (and maximize) a likelihood function (details omitted).

However, we treat α_0 as a nuisance parameter and take a partial likelihood approach as in Cox (1972): Maximize

    L(β) = ∏_{e=1}^m [ exp(β^⊤ s_{i_e}(t_e)) / Σ_{i=1}^n Y_i(t_e) exp(β^⊤ s_i(t_e)) ]
         = ∏_{e=1}^m [ exp(β^⊤ s_{i_e}(t_e)) / κ(t_e) ],

where i_e denotes the paper cited at the e-th event time t_e. Trick: Write κ(t_e) = κ(t_{e−1}) + Δκ(t_e), then compute only the increment Δκ(t_e) at each event (sketched below).
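A sketch of the partial log-likelihood computation using the incremental trick; the event interface (each event carries only the papers whose statistics changed) is a simplifying assumption, not the paper's implementation:

```python
import numpy as np

def partial_loglik(beta, events):
    """Partial log-likelihood with the incremental-normalizer trick.

    events : list of (i_e, changes) pairs, one per citation event, where
             i_e is the paper cited at time t_e and changes maps each paper
             whose statistics changed (or that newly became at risk) to its
             updated vector s_i(t_e). A simplified, hypothetical interface.
    """
    w = {}        # current value of exp(beta' s_i(t)) for each at-risk paper
    kappa = 0.0   # running normalizer kappa(t_e)
    loglik = 0.0
    for i_e, changes in events:
        for i, s_i in changes.items():
            w_new = np.exp(beta @ s_i)
            kappa += w_new - w.get(i, 0.0)  # Delta kappa(t_e)
            w[i] = w_new
        loglik += np.log(w[i_e]) - np.log(kappa)
    return loglik

beta = np.array([0.5])
events = [
    ("A", {"A": np.array([0.0]), "B": np.array([0.0])}),  # both enter at risk
    ("A", {"A": np.array([1.0])}),  # only A's statistics changed
]
print(partial_loglik(beta, events))
```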
Data Sets We Analyzed

Three citation network datasets from the physics literature:

1. APS: Articles in Physical Review Letters, Physical Review, and Reviews of Modern Physics from 1893 through 2009. Timestamps are monthly for older articles and daily for more recent ones.
2. arXiv-PH: arXiv high-energy physics phenomenology articles from January 1993 to March 2002. Timestamps are daily.
3. arXiv-TH: arXiv high-energy physics theory articles from January 1993 to April 2003. Timestamps are continuous-time (millisecond resolution). This dataset also includes the text of paper abstracts.

                  APS         arXiv-PH    arXiv-TH
    Papers        463,348     38,557      29,557
    Citations     4,708,819   345,603     352,807
    Unique Times  5,134       3,209       25,004
Three Phases

1. Statistics-building phase: Construct the network history and build up the network statistics.
2. Training phase: Construct the partial likelihood and estimate the model coefficients.
3. Test phase: Evaluate the predictive capability of the learned model.

Statistics-building is ongoing even through the training and test phases. The phases are split along citation event times. Number of unique citation event times in the three phases:

                  APS      arXiv-PH   arXiv-TH
    Building      4,934    2,209      19,004
    Training      100      500        1,000
    Test          100      500        5,000
Average Normalized Ranks

- Compute a "rank" for each true citation among the sorted likelihoods of every possible citation.
- Normalize by dividing by the number of possible citations.
- Average the normalized ranks over all observed citations.
- A lower average rank indicates better predictive performance (see the sketch below).
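A minimal sketch of this evaluation metric, under an assumed interface where each event supplies scores for all at-risk papers:

```python
import numpy as np

def avg_normalized_rank(scores_per_event, cited_idx):
    """Average normalized rank of the true citations (lower is better).

    scores_per_event[e] : predicted intensities over the papers at risk
                          at citation event e (hypothetical interface)
    cited_idx[e]        : index of the paper that was actually cited
    """
    ranks = []
    for scores, i in zip(scores_per_event, cited_idx):
        rank = (scores > scores[i]).sum() + 1  # 1 = highest-scored paper
        ranks.append(rank / len(scores))       # normalize by # candidates
    return float(np.mean(ranks))

scores = [np.array([0.9, 0.4, 0.7]), np.array([0.1, 0.8, 0.3])]
print(avg_normalized_rank(scores, cited_idx=[2, 1]))  # (2/3 + 1/3) / 2 = 0.5
```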
[Figure: average normalized rank versus paper batches for the arXiv-PH, arXiv-TH, and APS datasets, comparing the PA, P2PT, and P2PTR180 models (plus LDA and LDA+P2PTR180 where abstract text is available). Lower is better.]
- Batch sizes are 3000, 500, and 500, respectively.
- PA: preferential attachment only (s_1(t)); P2PT: s_1, ..., s_8 except s_3;
- P2PTR180: s_1, ..., s_8; LDA: LDA statistics only.
Recall Performance

Recall: the proportion of true citations among the K largest likelihoods (see the sketch below).

[Figure: recall versus cut-point K (0 to 15,000) for the PA, P2PT, P2PTR180, LDA, and LDA+P2PTR180 models.]

- PA: preferential attachment only (s_1(t)); P2PT: s_1, ..., s_8 except s_3;
- P2PTR180: s_1, ..., s_8; LDA: LDA statistics only.
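A corresponding sketch of recall at cut-point K, under the same assumed scoring interface as the normalized-rank sketch above:

```python
import numpy as np

def recall_at_k(scores_per_event, cited_idx, K):
    """Proportion of true citations that fall in the model's top K."""
    hits = 0
    for scores, i in zip(scores_per_event, cited_idx):
        top_k = np.argsort(scores)[::-1][:K]  # indices of the K largest scores
        hits += int(i in top_k)
    return hits / len(cited_idx)

scores = [np.array([0.9, 0.4, 0.7]), np.array([0.1, 0.8, 0.3])]
print(recall_at_k(scores, cited_idx=[2, 1], K=2))  # 1.0: both true cites in top 2
```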
Coefficient Estimates for LDA + P2PTR180 Model

    Statistic      Coefficient (β)
    s_1 (PA)        0.01362
    s_2 (2nd PA)    0.00012
    s_3 (PA-180)    0.02052
    s_4 (Seller)   -0.00126
    s_5 (Broker)   -0.00066
    s_6 (Buyer)    -0.00387
    s_7 (1st OD)    0.00090
    s_8 (2nd OD)    0.02052

[Diagrams: seller, broker, and buyer configurations involving nodes A through E.]

- Diverse seller effect: D is more likely to be cited than A.
- Diverse buyer effect: E is more likely to be cited than C.
References

Blei, D.M., Ng, A.Y., and Jordan, M.I. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

Brandes, U., Lerner, J., and Snijders, T.A.B. Networks evolving step by step: Statistical analysis of dyadic event data. In Advances in Social Network Analysis and Mining, pp. 200–205. IEEE, 2009.

Butts, C.T. A relational event framework for social action. Sociological Methodology, 38(1):155–200, 2008.

Cox, D.R. Regression models and life-tables. Journal of the Royal Statistical Society, Series B, 34:187–220, 1972.
Why Such Long Building Phases?

- The lengthy building phase mitigates truncation effects at the beginning of network formation, as well as the effects of severely grouped event times.
- The training and test windows still cover a substantial period of time (e.g., 2.5 years for APS).
- Performance is relatively invariant to the size of the training window: we achieved essentially the same results using windows of size 2000 and 5000 for arXiv-TH.

Number of unique citation event times in the three phases:

                  APS      arXiv-PH   arXiv-TH
    Building      4,934    2,209      19,004
    Training      100      500        1,000
    Test          100      500        5,000
Average Partial Log-Likelihood

- Compute the average of the partial log-likelihoods over the citation events (higher values indicate better fit).

[Figure: average partial log-likelihood versus paper batches for the arXiv-PH, arXiv-TH, and APS datasets, comparing the PA, P2PT, and P2PTR180 models (plus LDA and LDA+P2PTR180 where abstract text is available).]

- Batch sizes are 3000, 500, and 500, respectively.
- PA: preferential attachment only (s_1(t)); P2PT: s_1, ..., s_8 except s_3;
- P2PTR180: s_1, ..., s_8; LDA: LDA statistics only.