Slide 1 - GDM@Fudan

GDM@FUDAN Graph Data Management Lab, School of Computer Science http://gdm.fudan.edu.cn
Fudan University, Shanghai, China
Which Topic will You Follow?
Deqing Yang, Yanghua Xiao, Bo Xu, Hanghang Tong, Wei Wang and Sheng Huang
Introduction
•Motivations
Empirical Study
•Driving Forces of Topic-Following
•Who are the most appropriate candidates to receive a
call-for-paper or call-for-participation?
•What session topics should we propose for a
conference of next year?
•To address these questions, we study authors’ topic-following
behavior in a Scientific Collaboration Network (SCN), i.e., an author
follows others in publishing papers on a given topic
•U1: users affected by both social influence and homophily
•U2: users affected only by social influence
•U3: users affected only by homophily
•U4: users without any impact
•Results:
•The two forces act together on topic-following
•Their impacts are time-sensitive and decay exponentially
•Basic Idea
•Scientific Collaboration Network
•It is represented as a graph where vertices represent authors and
edges represent coauthor relationships extracted from the DBLP dataset
•It is a temporal graph G_t, in which vertices and edges increase as
time t elapses
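A minimal sketch of how such a temporal coauthor graph could be assembled, assuming the input is a list of `(year, [authors])` paper records. The record layout, the function name `build_snapshot`, and the toy author names are illustrative assumptions, not the DBLP schema or the authors' code.

```python
from collections import defaultdict
from itertools import combinations

def build_snapshot(papers, t):
    """Return the coauthor graph G_t as {author: {coauthor: n_joint_papers}}.

    Only papers published in or before year t contribute vertices and
    edges, so snapshots can only grow as t increases.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for year, authors in papers:
        if year > t:
            continue
        for a in authors:
            graph[a]  # touch the key so single-author papers add a vertex
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a][b] += 1  # edge weight = number of coauthored papers
            graph[b][a] += 1
    return {a: dict(nbrs) for a, nbrs in graph.items()}

# Hypothetical toy records (year, authors).
papers = [
    (2008, ["alice", "bob"]),
    (2009, ["alice", "bob", "carol"]),
    (2010, ["dave"]),
]
g2008 = build_snapshot(papers, 2008)
g2010 = build_snapshot(papers, 2010)
```

The edge weight kept here (number of joint papers) is one simple way to record coauthor strength for later use.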
•Authors’ topic-following behavior is a process of topic diffusion
in social networks, driven by two typical factors: social influence
and homophily
•We look for variables that precisely capture social influence and
homophily in our scenario and use them to predict an author’s
topic-following behavior in the future
•Challenges
•How to distinguish social influence and homophily?
•Topic definition and identification
•Sample sparseness
•Social Influence
•An author adopts a topic with higher probability when
more of his neighbors have already followed the topic
•x is the number (or proportion) of neighbors who have already
followed the topic
•p(x) is the probability that the author follows the topic
•An author is more likely to follow topics adopted by
neighbors (direct propagation) with whom he has coauthored
more papers
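The relationship between x and p(x) described above can be estimated empirically by grouping observations by x and taking the fraction of authors who followed the topic. The sketch below assumes observations are available as `(x, followed)` pairs; the function name and the toy data are hypothetical and are not results from the poster.

```python
from collections import Counter

def adoption_curve(observations):
    """Estimate p(x) from (x, followed) pairs.

    x is the number of neighbors who had already followed the topic;
    followed is True if the author then followed it too.
    """
    totals, hits = Counter(), Counter()
    for x, followed in observations:
        totals[x] += 1
        hits[x] += int(followed)
    return {x: hits[x] / totals[x] for x in sorted(totals)}

# Made-up observations, for illustration only.
obs = [
    (0, False), (0, False), (0, True), (0, False),
    (1, True), (1, False),
    (2, True), (2, True), (2, False), (2, True),
]
curve = adoption_curve(obs)
print(curve)  # {0: 0.25, 1: 0.5, 2: 0.75}
```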
•Contributions
•Uncover the effects of social influence and homophily
on topic diffusion
•Propose a Multiple Logistic Regression (MLR) model to
predict author’s topic-following behavior
•Extensive experiments show that the model outperforms the
baseline
Modeling Topic Diffusion in Scientific Collaboration Networks
•Model
•The task is a binary classification: predict whether an
author will follow a given topic
•A Multiple Logistic Regression (MLR) model fits this
scenario; the probability of topic-following takes the standard
logistic form:
π(x) = exp(α + Σ_i β_i x_i) / (1 + exp(α + Σ_i β_i x_i))
where x_i is an explanatory variable, and α and β_i are parameters
estimated by training
•Baseline model
where a is the number of neighbors who have followed
the topic
•Explanatory Variables
•Social Influence
•An author u’s tendency to follow topic s in year t is
aggregated from the tendencies of all his neighbors v toward this
topic, taking their coauthor strengths into account
•Parameter Estimation
•By maximum likelihood estimation on the training set
•β2 has a larger Wald value than β1, indicating that FTS
(homophily) has a stronger impact on topic-following
behavior than FSI (social influence)
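A minimal sketch of maximum-likelihood fitting for a logistic regression with two explanatory variables, standing in for FSI and FTS. It uses plain batch gradient ascent on the log-likelihood; the optimizer, learning rate, function names, and toy data are all assumptions made for illustration, and Wald statistics are not computed here.

```python
import math

def predict(alpha, betas, x):
    """pi(x) = 1 / (1 + exp(-(alpha + sum_i beta_i * x_i)))."""
    z = alpha + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(alpha, betas, X, y):
    """Sum of log pi(x) for positives and log(1 - pi(x)) for negatives."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict(alpha, betas, xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def fit(X, y, lr=0.5, epochs=5000):
    """Maximize the mean log-likelihood by batch gradient ascent."""
    alpha, betas = 0.0, [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        errs = [yi - predict(alpha, betas, xi) for xi, yi in zip(X, y)]
        alpha += lr * sum(errs) / n
        for j in range(len(betas)):
            betas[j] += lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n
    return alpha, betas

# Made-up (F_SI, F_TS) feature pairs and follow/no-follow labels.
X = [[0.1, 0.1], [0.2, 0.3], [0.3, 0.2], [0.4, 0.6], [0.6, 0.4], [0.5, 0.7],
     [0.7, 0.8], [0.9, 0.9], [0.3, 0.8], [0.8, 0.3], [0.2, 0.2], [0.8, 0.8]]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0]

alpha, betas = fit(X, y)
ll_fit = log_likelihood(alpha, betas, X, y)
ll_zero = log_likelihood(0.0, [0.0, 0.0], X, y)
```

In practice a statistics package would normally be used instead, since it also reports standard errors and Wald statistics for each coefficient.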
•Model Evaluation
•Metrics
•Recall/sensitivity, specificity, precision, accuracy, AUC
•Fβ, we set β=1.1 to favor recall a little
•For topic XML
•Area under the ROC curve (AUC) is 0.743 for MLR vs. 0.638 for the baseline
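The metrics listed above follow from the confusion-matrix counts. The sketch below uses the standard definitions, with Fβ defaulting to β=1.1 as on the poster; the function names and the example counts are hypothetical.

```python
def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def f_beta(precision, recall, beta=1.1):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta * beta
    return _ratio((1 + b2) * precision * recall, b2 * precision + recall)

def confusion_metrics(tp, fp, tn, fn, beta=1.1):
    """Compute the evaluation metrics from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # also called sensitivity
    return {
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "f_beta": f_beta(precision, recall, beta),
    }

# Made-up counts, for illustration only.
m = confusion_metrics(tp=30, fp=10, tn=50, fn=10)
```

With β greater than 1, a classifier with high recall and lower precision scores better than one with the two values swapped, which is the slight preference for recall mentioned above.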
•Homophily
•We use topic similarity to depict the homophily among users
in the context of topic-following
•A 25-dimensional vector u represents an author’s topic history; each
dimension counts his papers on one topic
•Then, topic similarity between users u and v can be defined
as
•For the other 4 representative topics, MLR outperforms the
baseline in both accuracy and Fβ
•W.r.t. those users who have followed topic s before t, i.e.,
we measure u’s homophily as
•Then, the whole MLR model is
•Y=π(x)=1, if u follows s or its related topics
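As an illustration of the topic-history vectors used for homophily, the sketch below compares two 25-dimensional paper-count vectors with cosine similarity. Cosine similarity is an assumption chosen here because it is a common measure for count vectors; it is not confirmed to be the definition used on the poster, and the function name and example vectors are hypothetical.

```python
import math

N_TOPICS = 25  # one dimension per topic, as described above

def topic_similarity(u, v):
    """Cosine similarity between two topic-count vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an author with no papers is similar to no one
    return dot / (norm_u * norm_v)

def make_history(counts):
    """Build a 25-dim vector from a {topic_index: paper_count} mapping."""
    vec = [0] * N_TOPICS
    for topic, n in counts.items():
        vec[topic] = n
    return vec

# Made-up topic histories.
u = make_history({0: 3, 1: 4})
v = make_history({0: 6, 1: 8})   # same topic mix as u, twice the volume
w = make_history({2: 1})         # no topics in common with u

sim_uv = topic_similarity(u, v)
sim_uw = topic_similarity(u, w)
```

Because cosine similarity ignores vector length, two authors with the same topic mix are treated as fully similar even if one has published far more papers.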
yangdeqing, [email protected]
ECML/PKDD2012, Bristol, UK