
Attention and Event Detection
• Identifying, attributing and describing spatial bursts
• Early online identification of attention items in social media
Louis Gong
[email protected]
www.louisgong.com
Identifying, attributing and describing
spatial bursts
Michael Mathioudakis, Nilesh Bansal, and Nick Koudas. 2010.
Identifying, attributing and describing spatial bursts. Proc. VLDB Endow. 3, 1-2
(September 2010), 1091-1102.
• Problem Description
• Related Works
• Solution
• Experiment & Result
• Q&A
BlogScope
• Automatically collects information from the blogosphere,
news sources, social networks and online forums.
• Supports advanced information retrieval tasks with data
mining and language processing.
• Warehouses metadata about the content (time
of creation, demographic profile of the author).
Problem Description
• User-generated content that appears on blogs,
microblogging websites, wikis and social networks
proliferates at a rapid rate.
• Goal: automate the process of information discovery over
this vast collection of information.
• Example:
Barack Obama (2008),
Bin Laden (recently)
Related Works
• 1. J. Kleinberg. Bursty and hierarchical structure in streams. In KDD, 2002.
Proposes a model for burst identification over document streams.
• 2. J. M. Kleinberg and E. Tardos. Approximation algorithms for
classification problems with pairwise relationships: metric labeling and
Markov random fields. J. ACM, 2002.
Provides a 2-approximation linear programming algorithm for the spatial burst
detection problem.
• 3. Statistical discrepancy functions quantify the difference between
distributions and are commonly used to identify regions where two spatial
distributions differ significantly. Such regions can be interpreted as areas where
one spatial distribution exhibits a burst in comparison with the other.
Solution
• Identify spatial burst
• Burst attribution
• Keywords based description
Spatial Bursts
• G: grid; for a suitable choice of granularity, geographical
entities of interest (cities) correspond to a cell.
• Rs: the spatial distribution of related documents
published within t.
• Ds: the spatial distribution of all the documents published
within t.
• Spatial bursts are identified as cells for
which the value of Rs is large in comparison
with Ds.
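
To make the comparison of Rs with Ds concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a plain ratio test on normalised per-cell counts, and the names `bursty_cells`, `ratio` and `min_related` are hypothetical.

```python
def bursty_cells(related, all_docs, ratio=2.0, min_related=5):
    """Return grid cells whose share of related documents (Rs) is at least
    `ratio` times their share of all documents (Ds).

    `related` and `all_docs` map a grid cell (row, col) to a document count.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    total_r = sum(related.values())
    total_d = sum(all_docs.values())
    if total_r == 0 or total_d == 0:
        return []
    bursty = []
    for cell, r_count in related.items():
        d_count = all_docs.get(cell, 0)
        # Skip cells with too little evidence or no background documents.
        if r_count < min_related or d_count == 0:
            continue
        rs = r_count / total_r   # value of Rs in this cell
        ds = d_count / total_d   # value of Ds in this cell
        if rs / ds >= ratio:
            bursty.append(cell)
    return sorted(bursty)


related = {(0, 0): 40, (0, 1): 5, (1, 0): 5}
all_docs = {(0, 0): 100, (0, 1): 400, (1, 0): 500}
print(bursty_cells(related, all_docs))  # [(0, 0)]
```

Cell (0, 0) holds 80% of the related documents but only 10% of all documents, so it is the only cell reported.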
Burst Attribution
• Attribute the burst to profile features.
• 1. Focus on a specific set of bursty cells and ask what
are the demographic factors in the absence of which no
burst would have been detected (e.g. “Toronto Film
Festival”).
• 2. Compare a bursty region with a non-bursty region
and get the demographic factors that make
the difference.
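
The first attribution question (which factors, if absent, would leave no burst) can be sketched as a leave-one-out check. This is an illustration under assumptions: the burst test is a simple ratio of shares, and `is_bursty`, `explaining_factors`, `make` and the sample data are all hypothetical.

```python
def is_bursty(docs, cell, ratio=2.0):
    """Assumed burst test: the cell's share of related documents is at
    least `ratio` times its share of all documents."""
    related = [d for d in docs if d["related"]]
    if not related:
        return False
    rs = sum(d["cell"] == cell for d in related) / len(related)
    ds = sum(d["cell"] == cell for d in docs) / len(docs)
    return ds > 0 and rs / ds >= ratio


def explaining_factors(docs, cell, feature):
    """Values of a profile feature whose removal makes the burst vanish."""
    if not is_bursty(docs, cell):
        return []
    result = []
    for value in sorted({d[feature] for d in docs}):
        remaining = [d for d in docs if d[feature] != value]
        if not is_bursty(remaining, cell):
            result.append(value)
    return result


def make(n, cell, related, age):
    # Helper that builds n identical toy documents.
    return [{"cell": cell, "related": related, "age": age} for _ in range(n)]


docs = (
    make(8, "toronto", True, "18-25") + make(1, "toronto", True, "26-40")
    + make(1, "toronto", False, "18-25") + make(10, "toronto", False, "26-40")
    + make(1, "other", True, "18-25") + make(4, "other", True, "26-40")
    + make(19, "other", False, "18-25") + make(56, "other", False, "26-40")
)
print(explaining_factors(docs, "toronto", "age"))  # ['18-25']
```

In this toy data the burst in the "toronto" cell disappears only when the documents written by the 18-25 group are removed.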
Keyword based description of bursts
• Query Expansion:
Identify the keywords w_i highly related to q (bursts for a query q)
and form the expanded queries q ∪ w_i.
• Curve Estimation:
The keywords w that occur frequently together with q often exhibit
a burst themselves over the same interval.
q′[t]_est = (1 + ) min{ b(q)[t], b(w_i)[t] }
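
The curve estimate takes, at each time step, the smaller of the two burst curves and scales it by a factor of the form (1 + something). The quantity added to 1 is not legible on the slide, so the sketch below uses a hypothetical parameter `slack` in its place; the function name is also an assumption.

```python
def estimate_joint_curve(b_q, b_w, slack=0.0):
    """Estimate the burst curve of the expanded query from the curves of
    q and of a keyword w_i: a scaled pointwise minimum.

    `slack` stands in for the term added to 1 in the estimate; its true
    definition is not given here, so treat it as a placeholder.
    """
    if len(b_q) != len(b_w):
        raise ValueError("burst curves must cover the same time steps")
    return [(1 + slack) * min(x, y) for x, y in zip(b_q, b_w)]


b_q = [1, 4, 9, 3]   # b(q)[t], toy values
b_w = [0, 5, 6, 2]   # b(w_i)[t], toy values
print(estimate_joint_curve(b_q, b_w, slack=0.5))  # [0.0, 6.0, 9.0, 3.0]
```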
Experiment & Result
• Average running time of the algorithms
Experiment & Result
• Queries q were submitted to BlogScope, with temporal interval qt
set as the first 10 days of March 2009.
• Retrieving distributions Rs and Ds for a query.
Experiment & Result
• Parameter Sensitivity
Summary
• Scalable method to identify spatial information bursts.
• Efficient techniques to attribute bursts to specific
demographic factors.
• Techniques to analyze bursts and effectively identify
sets of keywords that describe the burst.
Early online identification of attention
items in social media
Michael Mathioudakis, Nick Koudas, and Peter Marbach. 2010.
In Proceedings of the third ACM international conference on Web
search and data mining (WSDM '10). ACM, New York, NY, USA, 301-310.
• Problem Description
• ISIS Model
• Experiment
• Result
• Q&A
Problem Description
• Activity in social media is manifested via interactions that
involve text, images, links and other information items.
• Naturally, some items attract more attention than
others, expressed through large volumes of linking,
commenting or tagging activity.
• Identifying the information items that gather
much attention in such a real-time information collective
is a challenging task.
Comparison (traditional & social media)
• Traditional webpages – Graph Model (PageRank)
• Differences:
• 1. Social media activity is associated with individual items
(documents, pictures, news articles), so it is reasonable to
measure the importance or attention-gathering potential of
each item separately.
• 2. Linking activity in social media is the product of
continuous interaction between participating individuals.
Dynamic aspects of this process are not captured by
the graph model.
Comparison (traditional & social media)
• 3. Linking is not the only action by which structure
arises in social media, as individuals also interact by
commenting, sharing, recommending or rating.
Subject
• Proposes the first formal definition and analysis of such
a model and uses it as a basis to identify attention-gathering
items in an online fashion.
• Identifies individual items that attract a significant number
of actions; the main focus is ‘early identification’ of
such items.
ISIS Model
• An abstraction of social media activity.
• Information units (units) – items such as blog posts,
status messages, photos, etc. in the social media stream.
• Information sources (sources) – individuals contributing
information.
• A source participates in two sets of stochastic processes:
• 1. The process of emitting information units in a streaming
fashion.
• 2. Processes of interaction with other sources.
ISIS Model
• Each unit is associated with a timestamp tp and a validity period dp.
• The validity periods of units emitted by the same source might
overlap.
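
A unit with its timestamp tp and validity period dp can be sketched as a small record. The half-open interval [tp, tp + dp) is an assumption made for illustration, and the names `Unit`, `valid_at` and `overlaps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    source: str   # the source that emitted the unit
    tp: float     # timestamp of the unit
    dp: float     # length of the validity period

    def valid_at(self, t):
        # Assumed convention: valid on the half-open interval [tp, tp + dp).
        return self.tp <= t < self.tp + self.dp


def overlaps(a, b):
    """True if the validity periods of two units intersect."""
    return a.tp < b.tp + b.dp and b.tp < a.tp + a.dp


u1 = Unit("blog-a", tp=0, dp=10)
u2 = Unit("blog-a", tp=5, dp=10)   # same source, overlapping validity
u3 = Unit("blog-a", tp=20, dp=5)
print(overlaps(u1, u2), overlaps(u1, u3))  # True False
```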
ISIS Model
• Source interaction
Experiment Setting
Result
• Interaction weights of posts in
(a) engadget.com and (b) techcrunch.com
Result
• Attention Gathering Posts
Result
• Quality vs Efficiency Trade-offs
Summary
• ISIS Model : a general stochastic model for interacting
streaming information sources.
• Measure for the attention gathering potential of
information units.
• Experimental results on real data collected from a
period of blogging activity.
Q&A
Thank You
Louis Gong
[email protected]
www.louisgong.com