Event Identification in Social Media

Hila Becker, Luis Gravano
Columbia University
Mor Naaman
Rutgers University
Social Media Sites Host Many “Event” Documents
“Event” = something that occurs at a certain time in a certain place [Yang et al. ’99]

Popular, widely known events
Presidential Inauguration, Thanksgiving Day Parade

Smaller events, without traditional news coverage
Local food drive, street fair

…
Photo-sharing: Flickr
Video-sharing: YouTube
Social networking: Facebook
[Figure: social media documents for the “All Points West” festival, Liberty State Park, New Jersey, 8/8/08]
Identifying Events and Associated Social Media Documents

Applications
  - Event search and browsing
  - Local search
  - …
General approach: group similar documents via clustering
  - Each cluster corresponds to one event and its associated social media documents
Event Identification: Challenges

Uneven data quality
  - Missing, short, or uninformative text
  - … but revealing structured context is available: tags, date/time, geo-coordinates

Scalability
Dynamic data stream of event information
Unknown number of events
  - The number of events is a required input for many clustering algorithms
  - Difficult to estimate
Clustering Social Media Documents
Social media document representation
Social media document similarity
Social media document clustering
  - Clustering task: definition
  - Ensemble algorithm: combining multiple clustering results
Preliminary evaluation

Social Media Document Representation
Title
Description
Tags
Date/Time
Location
All-Text
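As a minimal sketch, the six features above can be held in a simple record. The class and field names are hypothetical, and treating All-Text as the concatenation of title, description, and tags is an assumption. The optional fields reflect that any feature may be missing for a given document.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class SocialMediaDocument:
    """A photo, video, or post described by the features above (hypothetical names)."""
    title: str = ""
    description: str = ""
    tags: List[str] = field(default_factory=list)
    taken_at: Optional[datetime] = None             # Date/Time feature; may be missing
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude); may be missing

    @property
    def all_text(self) -> str:
        # Assumption: All-Text joins the non-empty textual features with spaces.
        parts = [self.title, self.description, " ".join(self.tags)]
        return " ".join(part for part in parts if part)
```

For example, `SocialMediaDocument(title="All Points West", tags=["apw", "festival"]).all_text` yields `"All Points West apw festival"`.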
Social Media Document Similarity
Text features (Title, Description, Tags, All-Text): tf-idf weights, cosine similarity
Date/Time: proximity in minutes
Location: geo-coordinate proximity
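A minimal sketch of the three kinds of similarity, assuming pre-tokenized text. The tf-idf variant (raw term frequency times log(N/df)), the one-year time window, and normalizing Haversine distance by half the Earth's circumference are illustrative assumptions, not necessarily the choices made in this work. Each function returns a score in [0, 1].

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse term -> weight


def tfidf_vectors(token_lists: List[List[str]]) -> List[Vector]:
    """Weight each term by raw tf * log(N / df), one common tf-idf variant."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty or all-zero vector shares nothing
    return dot / (norm_u * norm_v)


# Hypothetical window: time differences of a year or more map to similarity 0.
MINUTES_PER_YEAR = 365 * 24 * 60


def time_similarity(t1_minutes: float, t2_minutes: float,
                    window: float = MINUTES_PER_YEAR) -> float:
    return max(0.0, 1.0 - abs(t1_minutes - t2_minutes) / window)


EARTH_RADIUS_KM = 6371.0


def haversine_km(p1: Tuple[float, float], p2: Tuple[float, float]) -> float:
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def location_similarity(p1: Tuple[float, float], p2: Tuple[float, float]) -> float:
    # Normalize by half the Earth's circumference, the largest possible distance.
    return 1.0 - haversine_km(p1, p2) / (math.pi * EARTH_RADIUS_KM)
```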
Social Media Document Clustering Framework
Social media documents → Document feature representation → Event clusters
Clustering: Ensemble Algorithm
Individual clusterers C_title, C_tags, …, C_time, one per document feature
Clusterer weights W_title, W_tags, …, W_time, learned in a training step
Consensus function f(C, W): combines the ensemble similarities
Output: the ensemble clustering solution
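The consensus function can be sketched as follows, assuming, as the experimental setup describes, that f is a weighted average of each clusterer's binary "same cluster" prediction for a pair of documents. The dictionary layout and document IDs are hypothetical; the weights stand in for the learned W values.

```python
from typing import Dict, Hashable

Assignment = Dict[Hashable, int]  # document id -> cluster id, for one clusterer


def consensus_similarity(assignments: Dict[str, Assignment],
                         weights: Dict[str, float],
                         doc_a: Hashable, doc_b: Hashable) -> float:
    """Weighted average of the clusterers' binary same-cluster predictions."""
    total_weight = sum(weights[name] for name in assignments)
    score = 0.0
    for name, assignment in assignments.items():
        same_cluster = assignment[doc_a] == assignment[doc_b]  # binary prediction
        score += weights[name] * (1.0 if same_cluster else 0.0)
    return score / total_weight if total_weight > 0 else 0.0
```

With made-up weights `{"title": 0.2, "tags": 0.5, "time": 0.3}`, a pair that the title and time clusterers group together but the tags clusterer separates gets a consensus similarity of 0.5.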
Clustering: Measuring Quality
Homogeneous clusters ✔ (each cluster contains documents of only one event)
Complete clusters ✔ (all documents of an event fall in the same cluster)

Metric: Normalized Mutual Information (NMI)
Shared information between the clustering solution and the “ground truth”
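A minimal NMI sketch in pure Python. NMI has several normalizations; dividing the mutual information by the arithmetic mean of the two entropies, as done here, is an assumption about the variant used. Natural logarithms are used; the base cancels in the ratio.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    n = len(pred)
    joint = Counter(zip(pred, truth))
    count_pred, count_truth = Counter(pred), Counter(truth)
    return sum((c / n) * math.log(c * n / (count_pred[p] * count_truth[t]))
               for (p, t), c in joint.items())


def nmi(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """NMI = I(pred; truth) / ((H(pred) + H(truth)) / 2)."""
    h_pred, h_truth = entropy(pred), entropy(truth)
    if h_pred + h_truth == 0:
        return 1.0  # both are a single cluster, so the partitions are identical
    return 2 * mutual_information(pred, truth) / (h_pred + h_truth)
```

Cluster IDs need not match event labels: `nmi([0, 0, 1, 1], [5, 5, 7, 7])` is 1.0 because the two partitions are the same up to renaming.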
Experimental Setup
Data: >270K Flickr photos
Event labels from Yahoo!’s “upcoming” event database
  - Split into 3 parts for training/validation/testing

Clusterers: single pass algorithm with centroid similarity
Weighting scheme: each clusterer is weighted by its Normalized Mutual Information (NMI) score on the validation set
Consensus function: weighted average of the clusterers’ binary predictions
Final prediction step: single pass clustering algorithm
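A minimal sketch of single-pass clustering with centroid similarity over sparse vectors: each document joins the most similar existing cluster if its cosine similarity to that cluster's centroid reaches a threshold, and otherwise starts a new cluster. The threshold is a hypothetical parameter, and cosine similarity stands in for whichever similarity a given clusterer uses.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass_cluster(vectors: List[Vector], threshold: float) -> List[int]:
    """Assign each vector, in order, to a cluster id; clusters are created on demand."""
    sums: List[Vector] = []  # running sum of member vectors per cluster
    sizes: List[int] = []    # number of members per cluster
    labels: List[int] = []
    for vec in vectors:
        best, best_sim = -1, -1.0
        for k, total in enumerate(sums):
            centroid = {term: w / sizes[k] for term, w in total.items()}
            sim = cosine_similarity(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            for term, w in vec.items():  # fold the document into the chosen cluster
                sums[best][term] = sums[best].get(term, 0.0) + w
            sizes[best] += 1
            labels.append(best)
        else:
            sums.append(dict(vec))
            sizes.append(1)
            labels.append(len(sums) - 1)
    return labels
```

Because clusters are created on demand, this algorithm needs no advance estimate of the number of events and reads each document once, which suits a data stream; the trade-off is that its output depends on document order.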
Preliminary Evaluation Results
Individual clusterer performance
  - Highest NMI: Tags, All-Text
  - Lowest NMI: Description, Title

Ensemble performance, compared against all individual clusterers
  - Highest overall performance in terms of NMI
  - More complete clusters: each event is spread over fewer clusters
Details in paper
Future Work: Alternative Choices
Document similarity metric
  - Ensemble approach
    - Weight assignment
    - Choice of clusterers
  - Train a classifier to predict document similarity
    - Features correspond to similarity scores
      - All-text, title, tags, time, location, etc.
      - Numeric values in [0,1]
    - State-of-the-art classifiers: SVM, Logistic Regression, …
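As one illustration of this direction, a pairwise classifier can take a vector of per-feature similarity scores in [0, 1] and predict whether two documents belong to the same event. The sketch trains logistic regression, one of the listed classifiers, with plain batch gradient descent; the feature order, learning rate, and epoch count are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical order of the similarity scores in each document pair's feature vector.
FEATURES = ("all_text", "title", "tags", "time", "location")


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X: List[Sequence[float]], y: List[int],
                              lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; y[i] is 1 for same-event pairs."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_same_event(w: List[float], b: float, x: Sequence[float]) -> float:
    """Probability that two documents with similarity vector x belong to the same event."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

In this formulation the learned per-feature weights play a role roughly similar to the hand-assigned clusterer weights of the ensemble; an SVM or a library implementation could be swapped in.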
Future Work: Alternative Choices

Final clustering step
  - Apply graph partitioning algorithms
    - Requires estimating the number of clusters

Evaluation metrics: beyond NMI
Datasets
  - Beyond Flickr: LastFM, YouTube
  - Exploit social network connections
Conclusions
Identified events and their corresponding social media documents
Proposed a clustering solution
  - Leveraged different representations of social media documents
  - Employed various social media document similarity metrics
Developed a weighted ensemble clustering approach
Reported preliminary results of our event identification approach on a large-scale dataset of Flickr photographs