Analysing Tweets During Natural Disasters in Europe and Asia
Presenter: Kiran Zahra
VGI Analytics - AGILE 2017
Central
Italy
Strong
Sudden
3 Killed
170 Temples
damaged
Earthquake
Aim
• How geographic features are reported from two
different continents during disasters
• The credibility of users reporting from two different
continents
• Performance of a machine learning algorithm in filtering
noise
Data Collection
Case-study
• Italy (Europe) 6.2 Magnitude
• Myanmar (Asia) 6.8 Magnitude
Wed 24 Aug, 488,238
16
Tweets
Research Question 1
• How does the granularity of reported geographic features
vary between the two continents?
Method
• 500 Tweets (Italy and Myanmar)
• Geographic feature extraction
• Geonames gazetteer
• Count number of occurrences
• Hierarchy of geographic features
• Geocode
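The extraction and counting steps of the method can be sketched roughly as follows. This is only an illustration: the small `GAZETTEER` dictionary, its feature levels, and the sample tweets are made up for the example and stand in for a real lookup against the Geonames gazetteer.

```python
from collections import Counter

# Hypothetical mini-gazetteer (place name -> feature level).
# A real study would look names up in the Geonames gazetteer instead.
GAZETTEER = {
    "italy": "country",
    "myanmar": "country",
    "amatrice": "town",
    "bagan": "town",
}

def extract_features(tweet):
    """Return the gazetteer place names found in a tweet (case-insensitive)."""
    tokens = [t.strip(".,!?#@:;\"'()").lower() for t in tweet.split()]
    return [t for t in tokens if t in GAZETTEER]

def count_by_level(tweets):
    """Count place-name occurrences and aggregate them per feature level."""
    names = Counter()
    levels = Counter()
    for tweet in tweets:
        for name in extract_features(tweet):
            names[name] += 1
            levels[GAZETTEER[name]] += 1
    return names, levels

# Invented sample tweets, for illustration only.
sample = [
    "Strong earthquake hits central Italy, Amatrice badly damaged",
    "Earthquake in Myanmar damages temples in Bagan",
    "Praying for Italy",
]
names, levels = count_by_level(sample)
```

Comparing the per-level counts for the two events gives a simple view of how coarse or fine the reported locations are.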
Research Question 2
• What is the quality of tweets in terms of credibility?
Twitter Metadata
• Retweet count
• Friends count
• Location
• Verified
• Screen name etc.
User-based Features
• Account age (AG): The time passed since the author registered their account
• Statuses count (SC): The number of tweets sent by the user
• Followers count (FoC): Number of people following this user
• Friends count (FrC): Number of people the user is following
• Verified (V): If the account has been verified
• Has description (D): A non-empty bio
• Has URL (U): A non-empty homepage URL
(Castillo, Mendoza & Poblete 2011)
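As a rough sketch, the seven user-based features could be derived from the user object attached to each tweet. The field names below (`created_at`, `statuses_count`, and so on) follow the Twitter REST API v1.1 user object. Measuring account age in days relative to a chosen reference time is an assumption of this example, and the sample user is invented.

```python
from datetime import datetime, timezone

def user_features(user, now):
    """Derive the seven user-based features from a Twitter user object (dict)."""
    # v1.1 timestamps look like "Wed Aug 24 00:00:00 +0000 2011".
    created = datetime.strptime(user["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "AG": (now - created).days,          # account age in days (assumed unit)
        "SC": user["statuses_count"],        # tweets sent by the user
        "FoC": user["followers_count"],      # people following this user
        "FrC": user["friends_count"],        # people this user follows
        "V": bool(user["verified"]),         # verified account
        "D": bool((user.get("description") or "").strip()),  # non-empty bio
        "U": bool((user.get("url") or "").strip()),          # non-empty homepage URL
    }

# Invented example user, for illustration only.
example_user = {
    "created_at": "Wed Aug 24 00:00:00 +0000 2011",
    "statuses_count": 1200,
    "followers_count": 350,
    "friends_count": 180,
    "verified": False,
    "description": "Reporter",
    "url": None,
}
features = user_features(example_user, datetime(2016, 8, 24, tzinfo=timezone.utc))
```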
Credibility
• Credibility is a function of the user-based features:
C = f(FrC, SC, FoC, AG, U, D, V)
Credibility
[Charts: Friends Count and Status Count, each comparing Italy and Myanmar]
Credibility
[Charts: Followers Count and Account Age, each comparing Italy and Myanmar]
Credibility
[Charts: URL and Description, each comparing Italy and Myanmar]
Classifiers trained on one event work well
for other events of the same kind
without new training data
(Verma et al. 2011)
Research Question 3
• How well does Naïve Bayes perform when the classifier
is trained on one event and tested on another
event of the same nature?
Tweet Classification
• Information: Tweet text about disaster event and
its location
• Not Information: Everything else falls in this
category
Information
Earthquake
Italy
Not Information
Naïve Bayes Workflow
• Reads Test and Training Data
• Creates a Corpus
• Classify based on Frequency of Terms
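The workflow above can be illustrated with a small multinomial Naïve Bayes classifier written in plain Python. This is a minimal sketch, not the implementation used in the study: the whitespace tokenizer, the add-one (Laplace) smoothing, and the tiny invented training set are all assumptions of the example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()

def train(labelled):
    """Build per-class term frequencies from (text, label) pairs."""
    term_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labelled:
        class_counts[label] += 1
        term_counts[label].update(tokenize(text))
    return term_counts, class_counts

def classify(text, term_counts, class_counts):
    """Return the class with the highest log posterior for the text."""
    vocab = set()
    for counts in term_counts.values():
        vocab.update(counts)
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        total_terms = sum(term_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for term in tokenize(text):
            # Add-one smoothing so unseen terms do not zero out the score.
            score += math.log(
                (term_counts[label][term] + 1) / (total_terms + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training tweets for one event, for illustration only.
italy_tweets = [
    ("strong earthquake hits central italy", "Information"),
    ("earthquake damage reported in italy", "Information"),
    ("good morning everyone", "Not Information"),
    ("watching a movie tonight", "Not Information"),
]
terms, classes = train(italy_tweets)

# Classify text from a different event without retraining.
pred_event = classify("earthquake in myanmar", terms, classes)
pred_noise = classify("good morning", terms, classes)
```

Because the decision rests on term frequencies, words shared across events of the same kind (such as "earthquake") carry over to a new event even when the place names differ.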
Precision & Recall for Italy
• Information: 98% Precision, 93% Recall
• Not Information: 92% Precision, 97% Recall
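For reference, per-class precision and recall can be computed from true and predicted labels as in the sketch below. The label lists are invented for the example and do not reproduce the figures reported on the slides.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented labels: I = Information, N = Not Information.
y_true = ["I", "I", "I", "I", "N", "N"]
y_pred = ["I", "I", "I", "N", "N", "I"]

info_p, info_r = precision_recall(y_true, y_pred, "I")
noise_p, noise_r = precision_recall(y_true, y_pred, "N")
```

Precision is the share of tweets assigned to a class that truly belong to it, and recall is the share of tweets truly in the class that the classifier found.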
Naïve Bayes Workflow
• Reads Test Data
• Creates a Corpus
• Classify based on Frequency of Terms
Precision & Recall for Myanmar
• Information: 88% Precision, 91% Recall
• Not Information: 92% Precision, 89% Recall
Key Points
• Granularity of geographic location is very important
when reporting disasters
• Geographic locations can be reported differently
from different regions of the world
• User-based credibility features reflect different user
characteristics
Key Points
• Difficult to judge credibility based only on one set
of features
• Naïve Bayes trained on one event performed well
on the same kind of event in a different geographic
region
Limitations
• Local administrative divisions do not match the Geonames
gazetteer
• Use of the English language to collect data in countries
where English is not a native language
• Twitter streaming API
• Sampling
Understanding the properties of social
media is important before using this
data during disasters