Analysing Tweets: During natural disasters in Europe and Asia Presenter: Kiran Zahra VGI Analytics - AGILE 2017 2 Central Italy 3 4 5 Strong 6 Strong Sudden 7 8 3 Killed 9 3 Killed 170 Temples damaged 10 11 Earthquake 12 Aim • How geographic features are reported from two different continents during disasters • The credibility of users reporting from two different continents • Performance of machine learning algorithm to filter noise 13 Data Collection 14 Case-study • Italy (Europe) 6.2 Magnitude • Myanmar (Asia) 6.8 Magnitude Wed 24 Aug, 488,238 16 Tweets 15 Research Question 1 • How reported geographic feature granularity vary from two continents? 16 Method • 500 Tweets (Italy and Myanmar) Geographic feature extraction Geonames gazetteer • Count number of occurrences • Hierarchy of geographic features Geocode 17 18 19 Research Question 2 • What is the quality of tweets in terms of credibility? 20 Twitter Metadata • Retweet count • Friends count • Location • Vrified • Screen name etc. 21 User-based Features Account age (AG) Statuses count (SC) The time passed since the author registered their account The number of tweet sent by the user Followers count (FoC) Number of people following this user Friends count (FrC) Verified (V) Has description (D) Has URL (U) Number of people user is following If the account has been verified A non-empty bio A non-empty homepage URL (Castillo, Mendoza & Poblete 2011) 22 Credibility • Credibility is the function of C= f (FrC, SC , FoC , AG, U, D, V) 23 Credibility Friends Count Italy Status Count Myanmar Italy Myanmar 700000 30000000 600000 25000000 500000 20000000 400000 300000 15000000 200000 10000000 100000 5000000 0 0 Friends Count Status Count 24 Credibility Followers Count Italy Account Age Myanmar Italy Myanmar 2500000 3000 2000000 2500 2000 1500000 1500 1000000 1000 500000 500 0 0 Followers Count Account Age 25 Credibility URL Italy Description Myanmar Italy Myanmar 300 460 250 455 200 450 150 100 445 50 440 0 435 URL Description 26 27 Classifiers trained for one event, work good for the same kind of event without training data (Verma et al. 2011) 28 Research Question 3 • How well Naïve Bayes perform when the classifier is trained for one event and tested on another event of same nature? 29 Tweet Classification • Information: Tweet text about disaster event and its location • Not Information: Everything else falls in this category 30 Information 31 Information Earthquake Italy 32 Information 33 Information Earthquake Italy 34 Not Information 35 Not Information 36 Naïve Bayes Workflow Reads Test and Training Data Creates a Corpus Classify based on Frequency of Terms 37 Precision & Recall for Italy • Information: 98% Precision , 93% Recall • Not Information: 92% Precision, 97% Recall 38 Naïve Bayes Workflow Reads Test Data Creates a Corpus Classify based on Frequency of Terms 39 Precision & Recall for Myanmar • Information: 88% Precision , 91% Recall • Not Information: 92% Precision, 89% Recall 40 Key Points • Granularity of geographic location is very important when reporting disasters • Geographic locations can be differently reported from different regions of the world • User-based credibility features reflect different user characteristics 41 Key Points • Difficult to judge credibility based only on one set of features • Naïve Bayes trained on one event performed well on the same kind of event in a different geographic region 42 Limitations • Local administrative divisions do not match Geonames gazetteer • Use of English language to collect data in non-native English speaking countries • Twitter streaming API • Sampling 43 Understanding the properties of social media are important before using this data during disasters 44
© Copyright 2026 Paperzz