Linguistic and Pragmatic Patterns in Understatement Tweets
Gwen Wijman
ANR: 620029
Bachelor Thesis
Communication and Information Sciences
Specialization Human Aspects of Information Technology
Faculty of Humanities
Tilburg University, Tilburg
Supervisor: dr. G.A. Chrupala
Second Reader: dr. ir. P.H.M. Spronck
January 2015
Acknowledgement
This thesis is the final product of my Bachelor's program Communication and Information
Sciences at Tilburg University. I have learned a great deal over the last three and a half years and
even though I could not use all that I have learned in this particular thesis project, I am happy
that I took on this challenge.
First and foremost, I want to thank my thesis supervisor, Grzegorz Chrupala, for his support
and advice. Thanks for being very patient and answering all my (sometimes foolish) questions
with care.
Thanks to my second reader, Pieter Spronck, for assessing my thesis in its final stages.
Thanks to Bram Jansen for your help and words of encouragement. You were the light when
all I could see was darkness. I literally would not have completed my thesis without your
support.
Last but not least, I want to thank everyone who ever helped me to complete the journey of
my Bachelor's program. I hope to have you along for another year as I take on the challenge
of obtaining a Master's degree in Tilburg.
Abstract
In this study, I investigate whether linguistic and pragmatic features that have previously been
found in sarcastic tweets are present in understatement tweets. Two binary logistic regression
models were used to find character and word n-grams that predict whether a tweet is an
understatement or not. Patterns that had been found in sarcastic tweets also appeared in the
understatement tweets; these features included laughter expressions, emoticons, repeated
punctuation, question marks, exclamations, intensifiers, emotion words and explicit markers
for sarcasm.
The results were not completely in line with research previously conducted by De
Freitas et al. (2014), Liebrecht et al. (2013) and González-Ibáñez et al. (2011), because the
features present in my dataset differed in polarity and strength. Approximately half of the
features found were positive predictors for understatements and the other half were
negative predictors for understatements. Not all features relating to sarcastic tweets were
strong predictors for understatements or non-understatements.
Table of contents
1. Introduction
   1.1 Problem statement
   1.2 Research question
2. Theoretical Background
   2.1 Sarcasm and irony
   2.2 Understatements
   2.3 Patterns in sarcastic tweets
3. Methods
   3.1 Data collection
   3.2 Software
   3.3 Analysis
      3.3.1 N-grams
      3.3.2 Logistic regression
4. Results
   4.1 The classification models
   4.2 Character n-grams
   4.3 Word n-grams
   4.4 New data
5. Discussion
   5.1 Confusion matrices
   5.2 Error analysis
   5.3 Research question
6. Conclusion
   6.1 Conclusion
   6.2 Limitations and future work
References
1. Introduction
1.1 Problem statement
Understatements are often used in many settings, including movies, books, social media
and real life. One example of understatement use in movies comes from Monty Python
and the Holy Grail by Gilliam and Jones (1975): both arms of the Black Knight are cut off, but
the knight refers to it as 'just a flesh wound'. Another example is from the social media
network Twitter: googlemaps (2014): "Today's @letour stage is a bit on the twisty side.
#understatement http://goo.gl/8l9nMe". A final example of using an understatement in real
life would be saying in a conversation that 'it is raining a bit' while it is in fact pouring.
Because utterances on social media contain valuable information, Justo, Corcoran,
Lukin, Walker, and Torres (2014) note that companies may be interested in opinion mining
and sentiment analysis on social media platforms. They also state that language on social
media is challenging to examine through Natural Language Processing (NLP) because of its
informal and unstructured nature. That makes language on social media different from
language as written in books and language used in face-to-face conversations.
Research has been conducted on finding patterns and features of sarcastic or
ironic tweets, but no such research has been done concerning one specific type of irony:
understatements.
1.2 Research Question
Because a lot of research has already been conducted on finding linguistic and pragmatic
patterns in ironic and sarcastic tweets, I am interested in a smaller part of the whole. I want to
know whether patterns that have already been found for sarcastic and ironic tweets can be
found in tweets containing understatements as well.
This leads to the following research question:
Research question (RQ): Are there linguistic and/or pragmatic patterns to be found in tweets
containing understatements?
The results of this study indicate that patterns relating to linguistic and
pragmatic features from sarcasm research can be found in understatement tweets. However, these
patterns are not all positive predictors for understatements. Approximately half of the features
were negative predictors for understatements, meaning that they predict that an
utterance is not an understatement. The features previously found in sarcasm research are not
as strong in understatement tweets as they were for sarcastic tweets.
The remainder of this thesis is structured as follows: Chapter two focuses on the
theoretical background that underlies this research. Chapter three discusses the methods used
for our experiments. Chapter four presents the results of our
experiments. Chapter five gives a detailed error analysis and discusses the results found.
Finally, chapter six concludes this thesis and discusses limitations and implications
for future work.
2. Theoretical Background
In this chapter we will focus on the theoretical background which is the foundation for this
thesis. We will start by explaining sarcasm and irony in section 2.1. Section 2.2 focuses on
understatements and the concluding section 2.3 gives an overview of features found in
previous research on sarcastic tweets.
2.1 Sarcasm and irony
Researchers do not agree about the use of the terms "irony" and "sarcasm". The terms
are sometimes used interchangeably in the research literature (Kreuz & Caucci, 2007; Liebrecht,
Kunneman & Van Den Bosch, 2013), while other researchers choose to make a distinction
between (verbal) irony and sarcasm, for example Caucci and Kreuz (2012) and Colston and
O'Brien (2000). The Oxford English Dictionary defines irony as "the expression of meaning
through the use of words which normally mean the opposite in order to be humorous or to
emphasize a point" (Soanes et al., 2006, p.384), while sarcasm is characterized as "a way of
using words which say the opposite of what you mean, in order to upset or mock someone"
(Soanes et al., 2006, p.642). These definitions look very similar, but according to Nunberg (as
cited in Caucci & Kreuz, 2012) sarcasm is "just one of many types of verbal irony" (p.1).
Sarcasm and irony are treated as equal in this research. We therefore use 'sarcasm' where irony
might be meant, but keep in mind that sarcasm is a type of verbal irony.
In a conversation, speakers normally comply with certain rules or "maxims" (Grice, as
cited in Colston, 1997). When a speaker violates a maxim intentionally, conveying a meaning
beyond the literal one, this is called a conversational implicature. According to Grice (as
cited in Colston, 1997), the maxim of quality is violated when using sarcasm. When a sarcastic
utterance is taken literally, it has a different meaning than when the receiver understands the
speaker's intent or the implicature of the message. An example is a child who tells his mother that
the meal she prepared is really fantastic, yet refuses to eat any of it. The mother might think
that there is something wrong with his appetite when she takes his statement literally. The
child, however, does not like the food at all, and uses a sarcastic utterance to implicate this.
According to Maynard and Greenwood (2014), sarcasm often occurs in user-generated
content and is challenging to investigate. Background knowledge such as context and culture
is needed to identify whether an utterance is sarcastic or not. Because it is difficult for
computers to interpret this context and culture, it is challenging to detect sarcasm through
machines.
In spoken language, sarcasm is identifiable through non-verbal cues such as facial
expressions and the use of voice inflections (Lee & Narayanan, 2005). In written text, Lee and
Narayanan (2005) claim, there are no standard cues that signal the writer's sarcastic intent
to the reader. However, Liebrecht et al. (2013) show that hashtags are used as extra linguistic
elements. The hashtag #sarcasm, for example, can be seen as "the social media equivalent"
(p.35) of the non-verbal cues that people transmit when using sarcasm while communicating face
to face.
In their research, González-Ibáñez, Muresan, and Wacholder (2011) used sarcastic
utterances which were explicitly identified as sarcastic by the writer. They
assume that the best judge is the author of the tweet, because other human judges do not have
enough context to successfully judge whether a tweet is sarcastic or not. Other research has
also focused on author identification (the usage of a hashtag) of sarcastic tweets (Maynard &
Greenwood, 2014; Liebrecht, Kunneman & Van Den Bosch, 2013).
The use of hashtags has to be handled with care. One has to take into account that
hashtags might not always be used reliably, because users assign them to their own writings.
A wrong understanding of the concept of sarcasm might lead to the hashtag being assigned incorrectly.
Justo et al. (2014) note that not every sarcastic utterance is tweeted with the hashtag #sarcasm,
and that the hashtag may only be used for the most obvious forms of sarcasm.
2.2 Understatements
An understatement is an expression that is intentionally put more weakly than the actual situation warrants.
According to the Oxford English Dictionary an understatement (noun) or to understate (verb)
is to “describe or represent something as being smaller or less important than it really is”
(Soanes, Hawker, & Elliott, 2006, p.800). An understatement can be seen as a form of verbal
irony, such as sarcasm.
As in irony, the maxim of quality is broken when using understatements (Grice, in
Colston, 1997). A soccer supporter (A) could be watching a game of his favorite team, which
at that moment is losing because it is not playing very well. That supporter might say to
a fellow supporter (B) sitting next to him that 'they are messing things up a bit'. Supporter A
intentionally weakens his utterance towards supporter B for ironic purposes. Supporter A
knows that supporter B can see that their favorite team is messing things up very badly.
In addition to the violation of the maxim of quality, the maxim of quantity is broken
in understatements. The maxim of quantity entails that one should give just enough
information to allow the receiver to understand the message (Martin, 1991). The implicature
of an understatement is therefore transmitting a message without literally saying it. Supporter
A says that his team is 'messing things up a bit' while in fact his team is playing very badly.
2.3 Patterns in sarcastic tweets
Different patterns can be distinguished in sarcastic utterances on Twitter. De Freitas,
Vanin, Hogetop, Bochernitsan and Vieira (2014) identified several linguistic patterns for irony
in Portuguese tweets related to the topic 'Fim do mundo', or the end of the world in English.
Patterns they found include laughter expressions (for example hahaha in English), emoticons,
hashtags (#ironia, #joking, #kidding) and specific utterances such as "só
que", which translates to "NOT" in English. They also looked for the use of repeated
punctuation (???, !!!!) and quotation marks. The patterns that occurred most in their corpus
were laughter expressions, emoticons, the use of repeated punctuation and quotation marks.
Liebrecht et al. (2013) collected a corpus of Dutch tweets labeled with the hashtag
#sarcasm (#sarcasme in Dutch). They trained a classifier to predict sarcasm in tweets with the
hashtag #sarcasm removed. Their analysis using a Balanced Winnow classifier shows that
sarcastic tweets often contain hyperboles (in 60% of the tweets), which are formed with the
use of intensifiers (positive words including 'awesome', 'lovely', 'fantastic') and positive
exclamations ('wow', 'yay', 'yes'). The sarcastic tweets that did not use a hyperbole (34%)
often contained an explicit marker (words that are possible synonyms for #sarcasm such as
#LOL, #joke and #NOT; the use of the hashtag is not required).
González-Ibáñez et al. (2011) collected sarcastic, positive and negative tweets based
on the hashtags that users had assigned to their tweets. They described features which could
occur in sarcastic, positive and negative tweets and investigated whether these factors could
be used to identify sarcastic, positive or negative tweets with the help of a χ2 analysis. They made
a clear distinction between lexical and pragmatic factors. Lexical factors include emotion
words (positive and negative), negations (unbreakable, misunderstood) and punctuation
(exclamation marks, question marks). Emoticons and mentions (@google) were identified as
pragmatic factors. González-Ibáñez et al. (2011) found that positive emotion words (for example
happy, excited), negations and mentions are important patterns for sarcasm detection. In this
research we chose not to consider mentions as a marker for sarcastic tweets, because
mentions are very common in overall Twitter usage.
Table 1 summarizes the linguistic and pragmatic patterns found
in sarcastic and ironic tweets in the studies conducted by De Freitas et al. (2014), Liebrecht et al.
(2013) and González-Ibáñez et al. (2011). The examples used are real tweets collected from
the Twitter API.
Table 1.
Linguistic and pragmatic patterns in sarcastic tweets.

Pattern                 Example
Laughter expressions    Haha the joys of coming home from work at 10.30 and having to study. I have such an amazing life. #Sarcasm.
Emoticons               Mid life crisis is what makes some men popular on twitter :) #FACT #outrage #sarcasm
Repeated punctuation    NO!! I'm soooo surprised!! #Sarcasm
Question mark           This is what the 10 Year Treasury has done so far this morning. Totally rational, right? #sarcasm
Intensifiers            Well, this is an awesome day. #sarcasm
Exclamations            off to work…..YAY!!!! #sarcasm – Thankfully I only have to do my old job for 2 nights
Explicit markers        This has been de best day ever #not #lies #sarcasm
Emotion words           I start my day reading class tomorrow and I am SO excited. #sarcasm
3. Methods
In this chapter the methodologies that underlie the experiments are discussed. In Section 3.1
information about data collection and the dataset is given. Section 3.2 discusses the software
used for the experiments and in 3.3 the analysis of our data is explained.
3.1 Data collection
Data used for this research consists of English tweets which contain the hashtag
#understatement. Tweets were collected using a script in Python. The script was used twice
daily to collect as many tweets as possible. In the period between September 28th and October
19th 2958 tweets were gathered.
The tweets collected then needed to be manually filtered for retweets. Retweets are
tweets that have been reposted by a user other than the author. It was then needed to filter the
tweets for true understatements. In this research an understatement is considered a true
understatement when the utterance is intentionally weakened for sarcastic purposes. 656
tweets were used for the final analysis. All hashtags #understatement were removed before
analyzing the data.
To compare the understatement tweets to non-understatement tweets, tweets from the
same users as in the understatement dataset were gathered. A Python script was used to gather
the tweets in the timeline of every user, and for each user one non-understatement tweet was
collected. Because it was not possible to retrieve tweets of every user in the understatement
dataset, 646 non-understatement tweets were collected. Some users had protected their accounts,
making their tweets accessible only to their followers, and other users had deleted their
profiles.
The total dataset consisting of understatements and non-understatements was divided
into three parts: one part for training the classifier and two parts for testing. The training set
consists of 60 percent of the tweets (782 tweets), the validation set of 20 percent (260
tweets) and the final test set of the remaining 20 percent (260 tweets).
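As an illustration of how such a 60/20/20 division could be produced, the following sketch uses scikit-learn's train_test_split applied twice; the variable names and the label convention (1 = understatement, 0 = non-understatement) are assumptions for illustration and do not reproduce the exact script used in this study.

    from sklearn.model_selection import train_test_split

    def split_dataset(tweets, labels, seed=42):
        # First split off 60 percent of the tweets for training
        X_train, X_rest, y_train, y_rest = train_test_split(
            tweets, labels, train_size=0.6, stratify=labels, random_state=seed)
        # Split the remaining 40 percent evenly into validation and test sets (20 percent each)
        X_val, X_test, y_val, y_test = train_test_split(
            X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
        return X_train, y_train, X_val, y_val, X_test, y_test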
3.2 Software
The software used in this research includes the Python(x,y) distribution and the Python GUI and editor
IDLE. For collecting tweets the Twitter API was used in combination with a Python script. For
the analysis of the data the scikit-learn Python module was used.
3.3 Analysis
Using Python and the scikit-learn module, binary logistic regression was used to predict the
dependent variable on the basis of the independent variables. The dependent variable in this
research is whether a tweet is an understatement or not. The independent variables are n-grams
extracted from the tweets. We compare the extracted n-grams to the features listed in
Table 1 (in section 2.3).
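A minimal sketch of this setup with scikit-learn is shown below. It illustrates the character-level variant; a word-level model only changes the analyzer argument. The variable names and the specific n-gram range are assumptions for illustration, not the exact code used in this study.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Character n-grams extracted from the tweets serve as independent variables
    vectorizer = CountVectorizer(analyzer='char', ngram_range=(2, 4))
    X_train = vectorizer.fit_transform(train_tweets)   # train_tweets: list of tweet texts
    X_val = vectorizer.transform(val_tweets)

    # Binary logistic regression: 1 = understatement, 0 = non-understatement
    classifier = LogisticRegression()
    classifier.fit(X_train, train_labels)
    print(accuracy_score(val_labels, classifier.predict(X_val)))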
3.3.1 N-grams
N-grams were used to find out whether sequences of characters and words were predictors for
tweets containing understatements. We experimented with different n-gram ranges to see which
range gave the best accuracy score, and the best model was used.
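The range search can be expressed as a simple grid over minimum and maximum n-gram lengths, as in the sketch below; the candidate ranges and variable names are assumptions for illustration, and each candidate is scored on the validation set.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    def best_ngram_range(train_tweets, train_labels, val_tweets, val_labels,
                         analyzer='char', min_sizes=range(1, 7), max_sizes=range(1, 11)):
        scores = {}
        for low in min_sizes:
            for high in max_sizes:
                if high < low:
                    continue
                vectorizer = CountVectorizer(analyzer=analyzer, ngram_range=(low, high))
                classifier = LogisticRegression()
                classifier.fit(vectorizer.fit_transform(train_tweets), train_labels)
                predictions = classifier.predict(vectorizer.transform(val_tweets))
                scores[(low, high)] = accuracy_score(val_labels, predictions)
        # Return the range with the highest validation accuracy, plus all scores
        return max(scores, key=scores.get), scores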
3.3.2 Logistic regression
Logistic regression is a linear model for classification. A logistic regression model estimates
the probability that an event will occur. When an event does not occur it is labeled 0
and when it does occur it is labeled 1.
With binary logistic regression we want to predict whether a tweet contains an
understatement or not. In this research, tweets without an understatement belong to category 0
and tweets with an understatement belong to category 1. The model predicts whether a
tweet belongs to category 0 or 1 on the basis of other information. The information on which
the classification is based in this research consists of character or word n-grams extracted
from the tweets in our dataset. The model that is built using the n-grams and the logistic
regression helps us to predict whether new tweets are understatements or non-understatements
(Field, 2009). Because there are several predictors in our logistic regression, our equation is as
follows:
P(y = 1 | X) = 1 / (1 + e^(-(β0 + Σn βn Xn)))
With β0 as the bias, Xn as the predictor variable and βn as the weight or coefficient of the
corresponding predictor variable.
Our model has a value that regulates the strength of the regularization in our
analysis: the C-value. Regularization is the process of penalizing extreme parameter
values. The smaller the C-value, the stronger the regularization in our models. Trying
different values for the regularization variable is helpful, because the model could become
more accurate for a certain value of C. When the best value of C is found, the model neither
overfits nor underfits the training set.
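A small grid of C values can be evaluated on the validation set as sketched below; the list of candidate values and the variable names are assumptions for illustration.

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # X_train, y_train, X_val, y_val: vectorized tweets and their labels (assumed to exist)
    for C in [0.1, 0.5, 1, 1.1, 1.2, 2, 4, 5, 10, 100]:
        classifier = LogisticRegression(C=C)   # smaller C means stronger L2 regularization
        classifier.fit(X_train, y_train)
        accuracy = accuracy_score(y_val, classifier.predict(X_val))
        print("C=%s: validation accuracy %.3f" % (C, accuracy))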
4. Results
This chapter will show the results of our experiments. First, the experiments concerning the
character n-grams and word n-grams classification models are discussed in section 4.1.
Subsequently, the results relating to character n-grams and word n-grams are given in sections
4.2 and 4.3. Finally, we present the results of the classification models when tested with new
data in section 4.4.
4.1 The classification models
The experiments for this study commenced with finding the right classification models for
analyzing the data. In order to find the best model I experimented with different ranges for
character and word n-grams and chose the range with the highest accuracy score. Then I
tested whether different C-values had an effect on the accuracy scores for the n-gram ranges.
The combination of the best n-gram range and the C-value with the highest accuracy score
was used as our classification model for further experiments. When accuracy scores for
different ranges and C-values were identical, the simplest model was used. This section
will give an overview of the results of the experiments conducted to find the best
classification models. We start by presenting the results of the character n-gram classification
models.
Figure 1 shows the different character n-gram ranges that were tested to find the right
model and their corresponding accuracy scores. The different lines correspond to the
minimum value of the n-gram range, and the numbers on the x-axis correspond to the maximum value of
the n-gram range. The lowest accuracy score found was 0.569 for n-gram range [6,10], the
highest accuracy score was 0.673 for n-gram ranges [2,4] and [2,5].
What attracts attention is that the ranges [2,4] and [2,5] share the same accuracy
score (0.673). Because of this tie, I
decided to test both ranges with different C-values. In this model the C-value is a
variable that regulates the regularization strength. A smaller value of C imposes a stronger penalty on the
coefficients. In this research the default L2-regularization is used. The default value of C is 1.
In Figure 2 an overview is given of the different C-values used with the n-gram ranges
[2,4] and [2,5] and their corresponding accuracy scores.
Figure 1. Character n-gram ranges and corresponding accuracy scores, C=1.
The lowest accuracy score found for the different C-values in n-gram range [2,4] was
0.669 for C=4, C=5, C=10 and C=100. The highest accuracy score was 0.677 and corresponds
with C=1.1. The lowest accuracy score found for the different C-values in n-gram range [2,5]
was 0.661 for C=10 and C=100. The highest accuracy score was 0.673 and corresponds with
C=1 and C=1.2.
The best character model with which further experiments were carried out thus is the
model with a character n-gram range of [2,4] and a C-value of C=1.1. The corresponding
accuracy score of the model is 0.677.
Figure 2. C-values and accuracy scores for character n-gram range [2,4] and [2,5].
I now continue with discussing the results of the word n-gram classification model.
Figure 3 gives an overview of the different word n-gram ranges tested and the corresponding
accuracy scores. The lowest accuracy score found was 0.553 for n-gram ranges [2,5] and
[2,7]. The highest accuracy score of 0.631 corresponds to the n-gram ranges [1,5] and [1,6].
Because two ranges achieved this equally high accuracy score, we decided to
conduct further experiments on both ranges.
Figure 4 shows the different C-values experimented with for word n-gram range [1,5]
and [1,6] and their corresponding accuracy scores. For [1,5] the lowest accuracy score found
was 0.619 for C=0.1 and C=100. The highest accuracy score was 0.631 for C=1. For n-gram
range [1,6] the lowest accuracy score found was 0.612 for C=100. The highest accuracy score
encountered was 0.631 for C=0.5, C=1 and C=2.
Figure 3. Word n-gram ranges and corresponding accuracy scores, C=1.
Figure 4. C-values and accuracy scores for word n-gram range [1,5] and [1,6].
Figure 4 shows that four models shared the highest accuracy score
(0.631) and therefore performed best. The word model used for further
experiments was the simplest of these models: the model with a word
n-gram range of [1,5] and a C-value of C=1.
4.2 Character n-grams
In the following sections I will discuss the n-grams that have been found in our analysis. I
look at the best predictors for understatement tweets and also try to detect the features
previously found in sarcasm research (Table 1, section 2.3) in our dataset. This section
(section 4.2) will involve the results concerning the character n-gram model. Section 4.3
focuses on the results found with the word n-gram model. As discussed in section 4.1, the best
character and word n-gram models were used for this analysis. The coefficients given are
relative values; this implies that the values are meaningful in relation to the model and each
other, but not in comparison to other models.
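One way to inspect the strongest predictors is to sort the model's coefficients and map them back to their n-grams, as in the sketch below. It assumes a fitted CountVectorizer and LogisticRegression are available; the function name is hypothetical.

    import numpy as np

    def top_features(vectorizer, classifier, n=20):
        # Map each coefficient back to the n-gram it belongs to
        # (older scikit-learn versions use get_feature_names() instead)
        names = np.asarray(vectorizer.get_feature_names_out())
        coefficients = classifier.coef_[0]        # one weight per n-gram feature
        order = np.argsort(coefficients)
        # Largest coefficients predict understatements, smallest predict non-understatements
        positive = list(zip(names[order[-n:]][::-1], coefficients[order[-n:]][::-1]))
        negative = list(zip(names[order[:n]], coefficients[order[:n]]))
        return positive, negative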
Figure 5 gives an overview of the 20 character n-grams that are the best predictors for
tweets containing understatements and non-understatement tweets. It is noticeable that the
first and second strongest predictors for understatements are 0.764 apart. What is also
remarkable is that the strongest predictor, with a value of 1.303, is a dot followed by a blank
space ('. '). This punctuation sequence is used extensively in language and is therefore a
questionable predictor for understatements. The combination of a dot and a blank space
appeared 158 times in the validation set. The combination appeared mostly when tweets were
composed of multiple sentences and these sentences were separated by a dot and a
blank space. Other occurrences of the character n-gram are due to the manual labeling of the
tweets. A label was added at the end of every tweet, but in some tweets a blank space was
present between the last character of the tweet, in this case a dot, and the label.
The predictor 'ha' could be an expression of laughter, but this is also questionable. Most
laughter expressions are a repetition of 'ha'. In the following tweet from the validation
set, a repetition of 'ha' is used as a laughter expression: "Haha the guys at work just are
calling me a shopaholic how rude !!". In the validation set the character combination 'ha'
occurred 147 times. In none of these occurrences was 'ha' a laughter
expression; it was rather part of a word, for example 'thanks' and 'happy', as shown in the
following two tweets from the validation set: "@LobsterLiveMixx Thanks for following! via
http://t.co/ctftxCN6yq" and "@brihtarchik happy birthday". Other positive character n-grams
found in this experiment show no clear relation to the features previously found in sarcastic
tweets.
As with the majority of the positive understatement predictors, no clear relation
between most negative character n-grams and the features found in previous sarcasm research can be
distinguished. We did, however, find a 2-gram consisting of two exclamation marks ('!!'),
which is related to the feature 'repeated punctuation' in sarcasm research. The repetition of
punctuation is supposed to be a predictor for sarcastic tweets, but these results suggest that '!!'
is a negative predictor for understatement tweets.
Figure 5. Top 20 character n-grams of best character model.
Note. _ = blank space.
Figure 6 shows the features previously found in sarcasm research and the related
character n-grams extracted from our data. With our character n-gram model we found four
laughter expressions, five emoticons, five different cases of repeated punctuation, one
question mark, four exclamations and five explicit markers.
Figure 6. Features and corresponding character n-grams of best character model.
Note. _ = blank space.
The laughter expressions found include 'haha' with a coefficient of -0.122, 'ha' with a
value of -0.225, 'lol' with a coefficient of 0.013 and 'lol_' with a score of 0.032. It is
noticeable that the laughter expressions 'haha' and 'ha' are negative predictors for
understatement tweets. This entails that they do not predict an understatement tweet but
rather a non-understatement tweet. 'lol' and 'lol_' are positive predictors for understatement
tweets, but their coefficient values are rather low.
Emoticons found in our character n-grams are ':)' with a coefficient of -0.010, ':)_'
with a score of 0.014, '=)' with a coefficient of 0.005, ':-D' with a coefficient of -0.009 and
':o)' with a value of 0.015. Two of the emoticons were negative predictors and three were positive
predictors. All of their coefficient scores are low; therefore they are not meaningful predictors
for either understatements or non-understatements.
Repeated punctuation appeared five times in our n-gram list, namely '!!' with a score
of -0.366, '!!!' with a coefficient of -0.226, '!!!!' with a value of -0.136, '??' with a coefficient
of -0.121 and '???' with a coefficient of -0.062. It is noteworthy that all five repeated
punctuation n-grams are negative predictors for understatements. They thus do not predict
understatements but rather non-understatements. Like the repeated punctuation marks, the
single question mark is a negative predictor for understatements, with a value of -0.125.
Exclamations are also represented in our extracted n-grams; four variants were found:
'Wow' with a coefficient of 0.004, 'wow' with a coefficient of 0.024, 'Yay' with a value of
-0.004 and 'yay' with a coefficient of -0.046. 'Wow' and 'wow' are positive predictors for
understatements while 'Yay' and 'yay' are negative predictors for understatements. All four
exclamation n-grams are rather weak predictors for understatements or non-understatements.
Lastly, explicit markers for sarcasm were encountered. We found 'NOT' with a
coefficient of -0.023, 'Not' with a value of 0.026, 'not' with a score of 0.103, '#Not' with a
coefficient of 0.010 and '#not' with a value of 0.041. All but 'NOT' were positive predictors
for understatements. What is remarkable is that only 'not' has a relatively high coefficient
(0.103) in comparison with the other explicit markers. It could be argued that only 'not' is a
plausible predictor for understatements and that the other coefficients are too low to be of
significance.
Intensifiers and emotion words were not found in the n-gram list. This can be
explained by the number of characters that were extracted from the tweets. Strings of
two, three or four characters could be too short to capture intensifiers or emotion words.
4.3 Word n-grams
We now continue with discussing the results concerning the word n-gram model. Figure 7
gives an overview of the 20 word n-grams that are the best predictors for understatement and
non-understatement tweets.
One positive predictor for understatement tweets is related to the features found
in sarcastic tweets: 'not' could be an explicit marker for sarcasm, for instance used in the
following tweet: "@FoxNews breaking: climate change robs walruses of ability to swim.
#NotReally am I the only one who remembers haulouts on @NatGeo ?". Three other positive
predictors that were found could be used for weakening an utterance: 'little', 'pretty' and 'bit'
are words that could be used in an understatement context, for example "Ok that last 3
minutes was pretty fun #DallasStars" or "Got a little wet walking home from the grocery
store. #stormTO".
The negative predictors for understatement tweets have no clear relation to the
features previously found in sarcasm research.
Figure 7. Top 20 word n-grams of best word model.
Figure 8 shows the features previously found in sarcasm research and the related
word n-grams extracted from our data. With our best word n-gram model we found three
laughter expressions, nine intensifiers, six emotion words, seven exclamations and two
different explicit markers.
Figure 8. Features and corresponding word n-grams of best word model.
The laughter expressions found include 'LOL' with a coefficient of -0.062, 'lol' with a
value of 0.038 and 'hahaha' with a coefficient of -0.087. It is notable that 'LOL' and 'hahaha'
are negative predictors for understatements and thus predict non-understatements, while 'lol'
is a positive predictor for understatements. All three laughter expressions have a rather low
coefficient value. They are not important predictors for understatements or non-understatements.
The intensifiers found are 'Really' with a coefficient of 0.055, 'really' with a coefficient of
-0.201, 'So' with a value of -0.193, 'so' with a coefficient of -0.248, 'Super' with a score of
-0.094, 'super' with a coefficient of 0.056, 'Too' with a coefficient of -0.125, 'too' with a
value of 0.064 and 'very' with a coefficient of 0.222. It is notable that 'really', 'So', 'so',
'Super' and 'Too' are negative predictors for understatement tweets. This entails that they
are predictors for non-understatements. Of these five negative predictors, 'really', 'So', 'so'
and 'Too' have rather high coefficient values. They could be seen as plausible predictors for
non-understatements. Of the four positive predictors, 'Really', 'super', 'too' and 'very',
only 'very' has a rather high coefficient value. 'Really', 'super' and 'too' have such low
coefficients that it could be argued that they are not meaningful predictors for understatements.
The six emotion words found in our word n-gram experiment are 'not happy' with a
coefficient of 0.196, 'happy' with a coefficient of 0.399, 'sad' with a score of 0.020, 'Excited'
with a coefficient of -0.118, 'excited' with a value of 0.181 and 'disappointed' with a
coefficient of 0.079. It is noteworthy that there is only one negative predictor for
understatements amongst the six emotion words, namely 'Excited'. This is also the
only word that starts with a capital letter. Another remarkable observation is that the negative
emotion words 'sad' and 'disappointed' have rather low coefficient values. 'Not happy' is a
combination of 'not' and 'happy', and that could be why the value of that negative emotion
word is higher than that of the other two negative emotion words. The two positive emotion words in
lowercase letters, 'happy' and 'excited', have relatively high coefficient values and could
therefore be seen as plausible predictors for understatements.
I found seven exclamations: 'YAY' with a coefficient of -0.072, 'Yay' with a
coefficient of -0.027, 'yay' with a score of -0.073, 'Wow' with a coefficient of 0.131, 'wow'
with a coefficient of -0.125, 'Yes' with a value of -0.100 and 'yes' with a coefficient of
-0.199. It is notable that only one exclamation, 'Wow', is a positive predictor for
understatements. All other exclamations are negative predictors for understatements and thus
predict non-understatements. 'YAY', 'Yay' and 'yay' have relatively low coefficient values
and are therefore not strong predictors for non-understatements. 'wow', 'Yes' and 'yes' have
coefficient values that are rather high. It could be said that they are plausible predictors for
non-understatements.
Finally, two explicit markers for sarcasm were found: 'Not' with a coefficient of
0.438 and 'not' with a value of 0.829. The coefficient values of these explicit markers are
high compared to the coefficients of the other features found in the word n-gram experiment.
'Not' and 'not' are plausible predictors for understatements.
4.4 New data
In order to test whether the two best classification models performed better, equally well or worse
when given new data, I used my test set for a final accuracy measurement. Table 2 gives an
overview of the classifiers, their performance accuracy on the validation set and the
performance accuracy on the test set.
The best character n-gram model (n-gram range [2,4], C=1.1) had an accuracy of 67.7%
on the validation data. When this model was applied to the test set, an accuracy of 73.8% was
measured. The best word n-gram model (n-gram range [1,5], C=1) had an accuracy of 63.1%
on the validation dataset. When the model was applied to the test set, an accuracy of
66.9% was measured. These results show that the test set was easier to process for both
models than the validation set. A short sketch of this test-set evaluation is given after Table 2.
Table 2.
Comparing accuracy scores of validation set and test set.

Classifier                               Validation set accuracy   Test set accuracy
Character n-grams (range 2,4, C = 1.1)   0.677                     0.738
Word n-grams (range 1,5, C = 1)          0.631                     0.669
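Re-scoring a fitted model on the held-out test set takes only a few lines; the sketch below assumes the fitted vectorizer and classifier from the earlier experiments and hypothetical names for the test data.

    from sklearn.metrics import accuracy_score

    # test_tweets, test_labels: the 260 held-out tweets and their labels (assumed names)
    test_predictions = classifier.predict(vectorizer.transform(test_tweets))
    print("test accuracy: %.3f" % accuracy_score(test_labels, test_predictions))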
5. Discussion
In this chapter the results of our experiments will be interpreted. We start by presenting the
confusion matrices for both the best character and best word models in section 5.1. In section
5.2 we will have a detailed look at the type of errors that the models made. Finally, our
research question will be answered in section 5.3.
5.1 Confusion matrices
Confusion matrices were created to give an overview of the mistakes that the classification
models make. Mistakes include false positives (the assignment of a positive label to a negative
tweet) and false negatives (the assignment of a negative label to a positive tweet). The
validation set consisted of 260 tweets, of which 130 tweets were labeled positively and 130
tweets negatively.
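Such matrices can be produced directly with scikit-learn, as in the sketch below; the variable names are assumed, and the label order [1, 0] puts understatements in the first row and column to match Figures 9 and 10.

    from sklearn.metrics import confusion_matrix

    # Rows are true labels, columns are predicted labels
    matrix = confusion_matrix(y_val, classifier.predict(X_val), labels=[1, 0])
    true_positives, false_negatives = matrix[0]
    false_positives, true_negatives = matrix[1]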
Figure 9 shows the correct and incorrect assignments of the best character model on
the validation dataset. The model assigned the correct label to 176 tweets and
made 84 mistakes. This leads to an accuracy score of 67.7%. In total, 80 tweets were correctly labeled
as understatements and 96 tweets were correctly labeled as non-understatement tweets. The
model assigned a positive label to a negative tweet 35 times and false negatives occurred
49 times.
Figure 10 shows the mistakes made by the best word classification model on the
validation dataset. The model assigned the correct label to 164 tweets and made
96 mistakes. This leads to an accuracy score of 63.1%. In total, 83 tweets were correctly labeled as
positive and 81 tweets were correctly labeled as negative. Of the 96 mistakes,
50 were false positives and 46 were false negatives.
                        Classifier positive labels   Classifier negative labels
True positive labels    80                           49
True negative labels    35                           96

Figure 9. Confusion matrix of best character model on validation data.

                        Classifier positive labels   Classifier negative labels
True positive labels    83                           46
True negative labels    50                           81

Figure 10. Confusion matrix of best word model on validation data.
A paired samples t-test was conducted to compare the number of correct label
assignments of the best character n-gram classifier and the best word n-gram classifier. There
was no significant difference in the number of correct label assignments between the
best character n-gram classifier (M=0.68, SD=0.469) and the best word n-gram classifier
(M=0.63, SD=0.484); t = 1.398, p = .163.
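A paired test of this kind compares, per validation tweet, whether each classifier assigned the correct label; a minimal sketch with SciPy is given below, with assumed variable names for the two sets of predictions.

    import numpy as np
    from scipy.stats import ttest_rel

    # 1 if the classifier labeled the tweet correctly, 0 otherwise, per validation tweet
    char_correct = (np.asarray(char_predictions) == np.asarray(y_val)).astype(int)
    word_correct = (np.asarray(word_predictions) == np.asarray(y_val)).astype(int)

    t_statistic, p_value = ttest_rel(char_correct, word_correct)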
5.2 Error analysis
In this section the errors that our models made on the validation dataset are discussed. I start
by discussing the average numbers of words and characters in correctly and wrongly classified tweets
for the best character n-gram model in section 5.2.1 and continue with the best word n-gram
model in section 5.2.2. We then look at the type of mistakes that both models made in section
5.2.3.
5.2.1 Character n-gram model
As discussed in section 5.1, our best character n-gram model had an accuracy score of 67.7%.
This entails that the model was right in 67.7% of the label assignments and was wrong in
32.3% of the label assignments. In this section we talk about the latter, the 32.3% in which the
model was wrong.
Table 3 shows the mean numbers of words and characters, with blank spaces either excluded
or included, for the correct and wrong assignments. The difference in average
word count between wrong and correct assignments is 0.78. The differences in the average
number of characters without and with blank spaces included are 0.20 and 1.29, respectively.
These scores show that the average correctly assigned tweet is shorter than the average wrongly
assigned tweet. The largest difference is found in the number of characters with blank
spaces included.
Table 3.
Average numbers of words and characters in wrong and correct assignments by the best character n-gram model.

                                          Wrong assignments   Correct assignments
Average words                             11.71               10.93
Average characters (no blank spaces)      60.46               60.26
Average characters (with blank spaces)    73.70               72.41
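The averages in Table 3 (and Table 4 below) can be computed with a few lines of Python; the sketch assumes a list of tweet texts, for example all wrongly classified validation tweets.

    def length_statistics(tweets):
        # tweets: list of tweet texts
        n = len(tweets)
        average_words = sum(len(t.split()) for t in tweets) / n
        average_chars_no_spaces = sum(len(t.replace(" ", "")) for t in tweets) / n
        average_chars_with_spaces = sum(len(t) for t in tweets) / n
        return average_words, average_chars_no_spaces, average_chars_with_spaces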
5.2.2 Word n-gram model
We now continue with the error analysis of the best word n-gram model. As stated in section
5.1, the best word n-gram model had an accuracy score of 63.1%. This means that the model
made correct predictions in 63.1% of the cases and false predictions in 36.9% of the
cases.
Table 4 shows the average numbers of words and characters in wrong and correct assignments by
the best word n-gram model. The difference in the number of words between wrong and correct
assignments is -1.42. The differences in the number of characters without and with blank spaces
taken into account are -10.37 and -11.79, respectively. These results indicate that correctly
assigned tweets consist of more words and more characters on average.
Table 4.
Average numbers of words and characters in wrong and correct assignments by the best word n-gram model.

                                          Wrong assignments   Correct assignments
Average words                             10.24               11.66
Average characters (no blank spaces)      53.69               64.06
Average characters (with blank spaces)    65.23               77.02
5.2.3 Mistakes by both models
To complete the error analysis, I look at the mistakes that both models made. What attracted
my attention was that almost all false negatives occurred because of labeling errors. Two
examples of false negatives are as follows: "@akilpin: There's about a million fireworks and
drums going off outside. The Spanish love their fiestas!" and "@ESPNChiHawks: Richards
still adjusting with Blackhawks". These examples were labeled as understatements, but the
models assigned them as non-understatement tweets. These errors were made due to
imperfect manual labeling of the dataset, and the models were in fact correct in assigning a negative
label to these tweets.
The models also made errors in the form of false positives. The nature of these
mistakes was harder to determine than that of the false negatives. Two examples of false positives are
as follows: "@francaiskitty Actually, you can pay someone to do that. lol." and "10 Writing
Tips To Connect With Your Readers". It is not clear to me why both models assign a positive
label to these (and similar) negative tweets. The first example does contain a positive
predictor for understatement tweets ('lol'), but that predictor was rather weak, with a
coefficient of 0.013 for the best character n-gram model and a coefficient of 0.038 for the best
word n-gram model. Both models thus classify some negative tweets as positive for unclear
reasons.
5.3 Research question
This section will answer the overarching research question. The research question was
formulated as follows: RQ: Are there linguistic and/or pragmatic patterns to be found in
tweets containing understatements?
In this thesis I attempted to find linguistic and pragmatic patterns in tweets containing
understatements in order to be able to answer the research question. I found laughter
expressions, emoticons, repeated punctuation, question marks, exclamations and explicit
markers for sarcasm with our best character n-gram model, and laughter expressions,
intensifiers, emotion words, exclamations and explicit markers for sarcasm with our best word
n-gram model. I thus did find linguistic and pragmatic markers that had previously been
encountered in sarcastic tweets, but not all of these features were positive predictors for
understatements. Half of the character n-gram features found were negative predictors for
understatements. Among the word n-gram features, positive and negative predictors for
understatements were almost equally divided.
6. Conclusion
In this final chapter I will give a conclusion of this study (section 6.1) and discuss the
limitations that arose during this study (section 6.2). Implications for future work are also
given in section 6.2.
6.1 Conclusion
This study focused on finding linguistic and pragmatic features in understatement tweets. Two
binary logistic regression models were used to find character and word n-grams to predict
whether a tweet would be an understatement or not.
Linguistic and pragmatic features were found, but not all of these features were
positive predictors for understatements. The best character n-gram model found laughter
expressions, emoticons, repeated punctuation, question marks, exclamations and explicit
markers for sarcasm. The best word n-gram model found laughter expressions, intensifiers,
emotion words, exclamations and explicit markers for sarcasm. There was an equal number of
positive and negative predictors among the character n-grams and an almost equal number of
positive and negative predictors among the word n-grams.
My findings were not completely in line with research previously conducted by De
Freitas et al. (2014), Liebrecht et al. (2013) and González-Ibáñez et al. (2011). The findings
were similar in that the understatement tweets contained the same kinds of predictors as sarcastic
tweets, but they differed in the polarity and strength of these predictors. In previous work the
pragmatic and linguistic markers found in sarcastic tweets were all positive predictors for
sarcasm. In my results, these markers were either positive or negative predictors for
understatements. Approximately half of the markers encountered in my dataset were positive
predictors; the other half were negative predictors for understatements.
6.2 Limitations and future work
In this section we will discuss the limitations that arose while conducting this research and
give implications for future work. We start with the limitations.
Because of the limited time in which the research had to be done, the number of tweets
collected was rather small. To get better results it is recommended to use a larger dataset
collected over a longer timespan for this type of research. The filtering of the tweets was
done by one untrained person who is not a native speaker of English. A more reliable dataset
could be assembled if multiple trained judges assigned the tweets to the understatement or
non-understatement conditions. By comparing their assignments, an even better dataset
could emerge.
More research is needed to get a clear view of the linguistic and pragmatic markers
that are predictors for understatements. Future studies could look at features that are
characteristic of understatements instead of using the features that have been found in past
sarcasm research. One such feature could be words that weaken a statement, for example 'little',
'mildly' and 'a bit'. These weakening words occurred in my dataset and it could be important
to look at them further. Future work could also look at understatements in languages
other than English, for example Dutch or German. These two languages both use
understatements as a stylistic device. When other languages have been studied, the results
could be compared to investigate whether the linguistic and pragmatic markers that predict
understatement tweets are alike or different. To take this research to another level, one could
also investigate whether people with different cultural backgrounds use different pragmatic or
linguistic markers in their understatements.
In short, not a lot of research has been done concerning understatements and linguistic
and pragmatic markers as predictors. In order to get a better understanding of the patterns that
are the foundation of understatements more research is needed.
References
Caucci, G. M., & Kreuz, R. J. (2012). Social and paralinguistic cues to sarcasm. Humor,
	25(1), 1-22.
Colston, H. L. (1997). "I've never seen anything like it": Overstatement, understatement,
	and irony. Metaphor and Symbol, 12(1), 43-58.
Colston, H. L., & O'Brien, J. (2000). Contrast and pragmatics in figurative language:
	Anything understatement can do, irony can do better. Journal of Pragmatics, 32(11),
	1557-1583.
de Freitas, L. A., Vanin, A. A., Hogetop, D. N., Bochernitsan, M. N., & Vieira, R. (2014).
	Pathways for irony detection in tweets. In Proceedings of the 29th Annual ACM
	Symposium on Applied Computing (pp. 628-633). ACM.
Gilliam, T. (Director), & Jones, T. (Director). (1975). Monty Python and the Holy Grail
	[Motion picture]. United Kingdom: EMI Films.
González-Ibáñez, R., Muresan, S., & Wacholder, N. (2011). Identifying sarcasm in
	Twitter: A closer look. In Proceedings of the 49th Annual Meeting of the Association
	for Computational Linguistics: Human Language Technologies: Short papers - Volume
	2 (pp. 581-586). Association for Computational Linguistics.
googlemaps (2014, July 18). Today's @letour stage is a bit on the twisty side.
	#understatement http://goo.gl/8l9nMe [Twitter post]. Retrieved from
	https://twitter.com/googlemaps/status/490149030781919232
Justo, R., Corcoran, T., Lukin, S. M., Walker, M., & Torres, M. I. (2014). Extracting relevant
	knowledge for the detection of sarcasm and nastiness in the social web. Knowledge-
	Based Systems, 69, 124-133.
Lee, C. M., & Narayanan, S. S. (2005). Toward detecting emotions in spoken dialogs. IEEE
	Transactions on Speech and Audio Processing, 13(2), 293-303.
Liebrecht, C. C., Kunneman, F. A., & Van den Bosch, A. P. J. (2013). The perfect solution for
	detecting sarcasm in tweets #not. In A. Balahur, E. van der Goot, & A. Montoyo (Eds.),
	Proceedings of the 4th Workshop on Computational Approaches to Subjectivity,
	Sentiment and Social Media Analysis (pp. 29-37). New Brunswick, NJ: ACL.
Martin, N. D. (1990). Understatement and overstatement in closing arguments. Louisiana
	Law Review, 51, 651-666.
Maynard, D., & Greenwood, M. A. (2014). Who cares about sarcastic tweets? Investigating
	the impact of sarcasm on sentiment analysis. In Proceedings of LREC.
Soanes, C., Hawker, S., & Elliott, J. (Eds.). (2006). Paperback Oxford English
	Dictionary (Vol. 10). Oxford University Press.