The role of pragmatic markers in the
recognition of sarcasm in tweets
Shannon Bakker
ANR: 811118
Bachelor’s Thesis
Communication and Information Sciences
Specialization Text and Communication
Faculty of Humanities
Tilburg University, Tilburg
Supervisor: Dr. G.A. Chrupala
Second Reader: Dr. P.H.M. Spronck
January, 2015
Acknowledgements
When I started at Tilburg University two and a half years ago, I could not have predicted that at this moment in time I would already be finishing my thesis to mark the end of my Bachelor's in Communication and Information Sciences. Nor could I have predicted that I, a girl with a love for writing and journalism, would write my thesis on a topic like this: the role of pragmatic markers in the automatic detection of sarcasm in tweets. It is a topic in which my love for how language works finds expression, but also one for which I had to learn many new things. When I started this thesis I had virtually no programming experience and had never run a logistic regression, or any regression whatsoever. That I managed to write this thesis at all is thanks to a few people, to whom I would like to express my appreciation.
First of all, my supervisor Dr. G.A. Chrupala. I want to thank him for his incredible patience when he explained concepts related to the subject to me for the second or sometimes even the third time. I also want to thank him for his comments on the pieces I wrote, and I want to thank my second reader, Dr. P.H.M. Spronck, for his efforts as well. Next I want to thank Gwen Wijman, a student in the same thesis group as me. She helped me immensely with my first baby steps in Python, and whenever I had a question she was the first person I asked whether she had already found a solution to my problem. Finally I want to thank my family and friends, and especially Annemieke Salentijn, who helped me tremendously with my English. Her comments on my writing were direct, sometimes a little strict, but most of all very helpful.
Abstract
This thesis investigates the role of pragmatic markers in the automatic detection of sarcasm in tweets. It examines the effect of pragmatic markers on the performance of models that identify sarcastic tweets, and which pragmatic markers are most useful for detecting sarcasm. Previous research has shown that pragmatic markers are important markers of sarcasm in computer-mediated communication. The dependent variable in this study was whether a tweet is sarcastic or not; the independent variables were character- or word-n-grams. I used logistic regression to build models that predict whether tweets are sarcastic. The logistic regression was performed on a dataset of 2000 English tweets, consisting of a thousand sarcastic tweets containing #sarcasm and a thousand non-sarcastic tweets. The dataset was divided into a training set, a validation set and a test set. The training set was used to build the models, the validation set to determine the optimal n-gram range and regularization parameter, and the test set to assess the predictive value of the models. In total four models were made: a model with character-n-grams; a model with word-n-grams; a model with character-n-grams and added pragmatic markers; and a model with word-n-grams and added pragmatic markers. Adding pragmatic markers did not raise the accuracy scores of the models, but all models scored above chance level. After the logistic regression was performed, the word- and character-n-grams in the models were sorted by their coefficients. It turned out that pragmatic markers were not among the best markers of sarcasm, but content words like 'love', 'great' and 'Monday' were.
Contents
1. Introduction
   1.1 Motivation
   1.2 Practical contribution
   1.3 Academic contribution
   1.4 Research questions
   1.5 Outline
2. Background
   2.1 Twitter
   2.2 Sentiment analysis
   2.3 Sarcasm detection
   2.4 Pragmatic markers in non-literal utterances
3. Methodology
   3.1 Dataset
   3.2 Variable definitions
   3.3 Descriptive Statistics
   3.4 Method of analysis
4. Results
   4.1 Determining the best parameters
   4.2 Comparing the four models
   4.3 The best markers
   4.4 Error analysis
   4.5 Summary
5. Discussion
References
1. Introduction
1.1 Motivation
The topic of my thesis is the role of pragmatic markers in the recognition of sarcasm in tweets. Sarcasm detection in general is an interesting and relevant research field because sarcasm is a very common phenomenon: roughly one out of ten things we say to our friends is not meant literally (Gibbs, 2000). Moreover, the detection of sarcasm in tweets is useful because there are various kinds of applications in which it can be a valuable addition.
The reason that I chose to focus on the role of pragmatic markers in sarcasm detection has to do with how sarcasm works in spoken language. In speech, different kinds of subtle clues give information about whether a sentence is sarcastic or not. An example of such a clue is pitch of voice, as shown by Rakov and Rosenberg (2013). They developed an algorithm based on pitch of voice which can identify in roughly eight out of ten utterances whether the utterance is meant ironically. Pitch of voice and other clues like facial expression are called pragmatic markers. Pragmatic markers give information that helps the recipient of an utterance understand what the sentence means and what the speaker wants to communicate. This information is important because, for example, a sentence like 'It is really cold here' can have multiple meanings: it could be a request for the receiver to turn up the heating, or just an observation that one wants to share (Renkema, 2004). Pragmatic markers show which meaning the speaker intends.
Pragmatic information is not limited to speech; it is possible to have pragmatic markers in written text as well. For example, on social media and in text-message services, features like emoticons are used quite often to give clues about how an utterance should be understood (Dresner and Herring, 2010). Twitter is a social medium in which pragmatic markers might be especially important: writers have only 140 characters to express what they want to say, so short pragmatic markers can come in useful for expressing the intended meaning. At the moment, to the best of my knowledge, the role of pragmatic markers in the detection of sarcasm has not yet been the focus of research on sarcasm in Twitter messages. Given the great significance of pragmatic markers in speech, it is expected that pragmatic markers are important in the recognition of sarcasm in tweets as well.
1.2 Practical contribution
As said in section 1.1, sarcasm detection in tweets can be of great use in multiple applications. One important example is the automatic translation of tweets. Currently, most machine translation systems do not take into account the possibility that an utterance might be sarcastic, while the subtle clues that reveal sarcasm can get lost in translation. Hence, it is very hard for the reader of a translated sentence to grasp what is meant by the original utterance. Sarcasm detection in Twitter could also be of great value for marketing companies: when one can detect sarcasm, companies are able to interpret more precisely what people think about their product. Sarcasm quite often takes the form of a positive literal meaning while the 'real' meaning is negative. The sentiment of tweets about a certain product can be measured by looking at positive words like 'love' and 'happy' and negative words like 'hate' and 'stupid'. With this method alone, however, one cannot notice that a tweet is sarcastic and that its real meaning is the opposite of what it literally says.
1.3 Academic contribution
Section 1.1 noted that pragmatic markers have not yet been the focus of research on sarcasm in tweets, even though they are very important in recognizing sarcasm in speech. This makes it valuable to understand how pragmatic markers work in written texts. With respect to Twitter, pragmatic information is possibly even more important because of the short messages and the informal nature of the medium. In tweets, unlike in formal texts, it is acceptable to use all kinds of markers like emoticons and multiple exclamation marks. Therefore I expect a large number of pragmatic markers to be used in Twitter data. Furthermore, current research on sarcasm in Twitter has not yet produced a sarcasm detector with high accuracy (González-Ibáñez, Muresan & Wacholder, 2011; Liebrecht, Kunneman & van den Bosch, 2013). Hopefully a better understanding of how pragmatic markers work will also lead to better classifiers of sarcasm.
1.4 Research questions
The research questions in this thesis are the following:
1. Which pragmatic markers can mark sarcasm?
2. Which pragmatic markers are the best predictors for the occurrence of sarcasm in tweets?
3. How well do pragmatic markers predict whether a tweet is sarcastic or not?
The first question is answered in chapter 2 by a review of the literature on sarcasm in Twitter and on the role of pragmatic markers in other kinds of written texts, such as e-mail and comments on websites. The pragmatic markers that might mark sarcasm are, for example, exclamation marks and emoticons. The second question is answered by building two models for identifying sarcasm on a dataset of sarcastic and non-sarcastic tweets. The models were made with logistic regression. The first model had character-n-grams as independent variables and the other had word-n-grams as independent variables. After the logistic regression was performed, the character- and word-n-grams of the models were sorted by their coefficients. It turned out that pragmatic markers were not among the ten best markers of sarcasm; content words like 'great', 'love' and 'Monday' were more important. Finally, the third question was answered by investigating the effect of adding pragmatic markers to the models on the overall accuracy score of the logistic regression. This effect was very small and not significant. Nevertheless, all models, with or without added pragmatic markers, performed significantly above chance level in identifying sarcastic tweets.
1.5 Outline
The outline of this thesis is as follows. Chapter 1 explains the motivation for this thesis and defines the exact research questions. Chapter 2 describes background information on concepts relevant to this study. The dataset and the method of analysis are discussed in chapter 3, and the results of the analysis in chapter 4. In chapter 5 conclusions are drawn from the results and compared with previous research.
2. Background
2.1 Twitter
Twitter is a social medium which allows users to send messages of 140 characters about various topics to the world. Twitter started in 2007, and nowadays it has 248 million monthly active users, with more than 500 million tweets sent every day (Twitter, 2014). An important feature of Twitter messages is the hashtag. Hashtags give information about a tweet without really being part of the message. They mostly indicate the topic of the tweet, but they can also indicate its tone. For example, in sarcastic tweets it is common to use the hashtag #sarcasm. This is very useful for this study, because it helps me find a set of tweets that are almost certainly all sarcastic. Hashtags are easily detectable, and because #sarcasm is commonly used in sarcastic tweets, it is quite easy to build a database of such tweets. Another important feature of Twitter messages is the @user sign. People can direct tweets to specific people by typing '@' followed by the username of the person one wants to address. The @user sign makes it possible not only to send general tweets to all followers, but also to have conversations on Twitter.
2.2 Sentiment analysis
The study of sarcasm detection and sarcasm on Twitter is part of the research field of sentiment
analysis. Sentiment analysis can be defined as the task of the detection and classification of opinions
and sentiments in written input (Montoyo, Martínez-Barco & Balahur, 2012).
This written input can vary from large text documents, such as opinion articles, to utterances of just one sentence, like tweets. In the last few years sentiment analysis has been applied to Twitter in various studies. Tweets are written on many subjects, but in sentiment analysis the most important topics are events, products and services (de Freitas, Vanin, Hogetop, Bochernitsan & Vieira, 2014).
An example of a study investigating sentiment is the one conducted by Tumasjan, Sprenger, Sandner and Welpe (2010). This study shows that the number of times a political party was mentioned on Twitter was a good predictor of the results of a German federal election. In addition to counting the number of tweets naming a political party, political sentiment was measured as well. This was accomplished with text analysis software that uses a dictionary to determine whether a tweet is positive or negative, and that also measures the tentativeness of a tweet by looking at words such as 'guess', 'maybe' and 'perhaps'. The sentiment embedded in tweets reflected the role of the politicians studied: tweets mentioning extreme right- or left-wing politicians were more aggressive than tweets mentioning politicians of the more moderate parties. A different example of applied sentiment analysis is the study by Chamlertwat, Bhattarakosol, Rungkasiri and Haruechaiyasak (2012) investigating brand sentiment. They proposed a system based on sentiment analysis to measure consumer opinions on smartphones. This system collects tweets about a certain smartphone, filters out the opinionated tweets, determines their polarity, determines which features of the smartphone the opinionated tweets are about, and gives a readable overview of the results. The system showed that it is possible to find important information about consumer sentiment for smartphone companies in a short period of time using sentiment in tweets.
2.3 Sarcasm detection
Up till now, sarcasm has not received much attention in sentiment analysis, even though it is an important and common phenomenon in social media. Maynard and Greenwood (2014) studied the effect of sarcasm on the polarity of tweets detected in sentiment analysis. They showed that by simply reversing the polarity of tweets containing #sarcasm they could already achieve better sentiment analysis than traditional tools that do not take sarcasm into account. This demonstrates that even a very simple view of sarcasm can improve sentiment analysis. Results could improve even more if sarcastic utterances were identified more precisely (there are also sarcastic tweets without a #sarcasm tag) and if the subtleties of sarcasm were better understood.
The most common definition of sarcasm is that it is a form of irony in which the literal meaning of an utterance is reversed (Gibbs, 2000). Despite this quite clear definition of sarcasm,
there is a lot of dispute about when an utterance is sarcastic and when it is not (Liebrecht et al.,
2013). Therefore in various studies researching sarcasm in written text the authors of the texts
explicitly mark the utterances as sarcastic. This is the method I will use as well.
To date there have been several studies devoted to detecting sarcasm in tweets. An example is the study by González-Ibáñez et al. (2011). Using lexical and pragmatic features, they found that pragmatic markers were among the most important discriminating features in the classification of sarcastic tweets. The presence and frequency of the observed features were used to determine the impact of a feature on the discrimination between sarcastic and non-sarcastic tweets, and a chi-square test identified the ten most discriminating features. The conclusion of the article was that the techniques used were not yet very good at identifying sarcastic tweets. Other studies of sarcasm on Twitter showed similar results. Liebrecht et al. (2013) used a classification algorithm called Balanced Winnow to extract features and weighed them with a chi-square test. One result of this study was that exclamation marks are important signifiers of sarcasm in Dutch tweets. An exclamation mark can be a marker of sarcasm because it can be seen as a marker of hyperbole (Liebrecht et al., 2013). In hyperbolic sentences the positive tone of a sentence is exaggerated to signal that the sentence should be interpreted sarcastically. An example of such a sentence is: 'The weather is fantastic today!!!'
Although research on sarcasm in Twitter shows that pragmatic markers are important in detecting sarcasm, it was difficult to find research on sarcasm in tweets devoted solely to pragmatic features. Such research could give more insight into the way pragmatic markers are used and lead to better classifiers of sarcastic utterances.
2.4 Pragmatic markers in non-literal utterances
As said before, pragmatic markers can be defined as linguistic signs that do not give information about the literal meaning of a text but about the way an utterance should be understood. Hancock (2004) identified three categories of pragmatic markers that signify irony in computer-mediated communication: punctuation, emoticons and adapted vocalization signals (e.g. 'haha'). Whalen, Pexman and Gill (2009) elaborated on these findings by examining the use of these kinds of pragmatic markers in non-literal language in e-mail. They chose more specific categories: exclamation marks, ellipses ('…'), question marks, hyphens, parentheses, quotation marks, emoticons, vocalization signals, and words written entirely in capital letters. Like Liebrecht et al. (2013), they found that the exclamation mark was the most common pragmatic marker in ironic utterances. Other common pragmatic markers were hyphens, ellipses and parentheses.
To understand how and why the pragmatic markers mentioned can mark sarcasm, I will discuss the categories of Whalen et al. (2009) with some further explanation and examples. First is the exclamation mark, which was often found to be an important marker of sarcastic tweets. As discussed in section 2.3, an exclamation mark marks an exaggeration, which in the case of a hyperbole means that an utterance is meant sarcastically. An example is: 'I love school!!!' Words written entirely in capital letters also mark a hyperbole. A different category of pragmatic markers consists of hyphens, ellipses and parentheses. These three markers have in common that they physically separate a part of the text, which is probably used to mark the part that is not supposed to be taken literally (Whalen et al., 2009). An illustration is the following tweet: 'Dear couple that decided to fight outside my window at 3 AM, in a 30 minute tirade of expletives...Thank you. I didn't need sleep. #sarcasm' (jnoahwatt, 2014). A question mark can mark a sarcastic utterance because it can signal a rhetorical question, as in the tweet: 'You look like shit. Is that the style now? #Sarcasm' (conconarbonida2, 2014). Furthermore, quotation marks can be markers of sarcasm because they indicate that the writer would not use the word to describe the situation himself. For example: 'Those people really view themselves as "normal"'. An emoticon can also mark sarcasm because it signals that the utterance should not be taken too seriously; the ';)' emoticon especially does this. Finally, vocalization markers, especially 'haha', mark sarcasm in the same way as emoticons: they signal that something is not meant seriously or is a joke. An example can be seen in the following tweet: 'I love when people think they have to be better than me or have to copy me haha #sarcasm' (pure_dreaming, 2014).
3. Methodology
3.1 Dataset
The dataset consists of 2000 English tweets collected with the Twitter API. One thousand tweets in this dataset are sarcastic tweets that contain #sarcasm. I chose to collect tweets with #sarcasm because, as mentioned before, there is no clear definition of what sarcasm actually is (Filatova, 2012); the writer of an utterance is probably the best person to determine whether it is sarcastic. The other thousand tweets are non-sarcastic tweets written by the same users as the sarcastic tweets. I chose to collect tweets from the same users to control for variation in language use. For the same reason, and to be sure that there are no duplicates in the dataset, retweets were filtered out. Each tweet is given a label: a zero when a tweet is not sarcastic and a one when it is.
3.2 Variable definitions
As described in chapter 2, the dependent variable is whether the tweet is sarcastic or not. Sarcasm is defined by the writers of the utterances themselves. Two models, each with different independent variables, are used to detect sarcasm: one has character-n-grams as independent variables and the other word-n-grams. An n-gram is a string of length n, consisting of n characters or n words. With the CountVectorizer in the Python module Scikit-learn it is possible to collect a range of n-grams, which can later be used to detect which features are important in a text. For example, one can configure the CountVectorizer to collect all the 2-3 character-n-grams in a text, i.e. all combinations of two and three characters. Possible n-grams in this case are 'the' and 'and', but also 'I .' and '..'. The CountVectorizer also assigns a value to each n-gram it generates based on how often that n-gram occurs: if 'the' occurs twenty times in a text it receives a value of twenty, and if 'book' occurs five times it receives a value of five. The logistic regression discussed in section 3.4 is executed on the output of the CountVectorizer.
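To make this concrete, the following is a minimal sketch of extracting character-n-gram counts with Scikit-learn. The example tweets and settings are illustrative, not the exact preprocessing used in this study.

    from sklearn.feature_extraction.text import CountVectorizer

    # Two illustrative tweets (not from the actual dataset).
    tweets = ["I love Monday...", "The weather is fantastic today!!!"]

    # Collect all 2- and 3-character n-grams, as in the example above.
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
    counts = vectorizer.fit_transform(tweets)

    # Each column of `counts` is one n-gram; each cell is how often it occurs in a tweet.
    # (In older Scikit-learn versions the method is called get_feature_names().)
    print(vectorizer.get_feature_names_out()[:10])
    print(counts.toarray())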
After the logistic regression is performed, the list of n-grams is sorted by coefficient. This way the most influential n-grams can be determined. I then determine to what extent the important n-grams correspond to the pragmatic markers in table 1. These pragmatic markers are derived from the research of Whalen et al. (2009).
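As a sketch of how this sorting by coefficient might look in code, continuing the example above and assuming clf is a fitted LogisticRegression object (see section 3.4):

    import numpy as np

    # Pair every n-gram with its learned coefficient and sort ascending.
    names = vectorizer.get_feature_names_out()
    order = np.argsort(clf.coef_[0])

    print([names[i] for i in order[-10:]])  # highest coefficients: markers for sarcasm
    print([names[i] for i in order[:10]])   # lowest coefficients: markers for non-sarcasm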
Table 1: Pragmatic markers that could be signs of sarcasm

Category               Markers
Ellipses               … \u2026
Quotation marks        ‘ ’
Exclamation points     !
Question marks         ?
Emoticons              :) ;) :P -_- :D :( :o =) xD :-) \u0001f…
Caps                   A B C etc.
Parentheses            ( ) { } [ ]
Hyphens                - _
Vocalization signals   hahaha mmmmm
3.3 Descriptive Statistics
Table 2 shows the average number of characters and the average number of words of the tweets in the dataset. The sarcastic tweets with #sarcasm were longer than the non-sarcastic tweets (sarcastic: 94.39 characters, non-sarcastic: 84.50 characters, t(2000) = 6.04, p < .001). This is not a surprise, because #sarcasm was present in all sarcastic tweets, which probably made them longer than the non-sarcastic tweets. What is interesting is that even with #sarcasm removed, the sarcastic tweets are still significantly longer than the non-sarcastic tweets (sarcastic: 86.38 characters, non-sarcastic: 84.50 characters, t(2000) = 1.14, p < .001). The same pattern can be seen for the number of words: the sarcastic tweets with #sarcasm have more words than the non-sarcastic tweets (sarcastic: 14.06 words, non-sarcastic: 12.65 words, t(2000) = 4.93, p < .001), and without #sarcasm the sarcastic tweets still have more words (sarcastic: 13.77 words, non-sarcastic: 12.65 words, t(2000) = 3.85, p < .001). Before the tweets are analysed for the prediction task, #sarcasm is removed from them; otherwise the prediction task would be very easy.
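A minimal sketch of how such a length comparison might be computed; the variable names and the small example lists are hypothetical stand-ins for the per-tweet character counts of the two groups:

    from scipy import stats

    # Hypothetical character counts per tweet for the two groups.
    lengths_sarcastic = [94, 101, 88, 97, 90]
    lengths_non_sarcastic = [80, 85, 79, 92, 84]

    # Independent-samples t-test, as reported in table 2 and the text above.
    t, p = stats.ttest_ind(lengths_sarcastic, lengths_non_sarcastic)
    print(t, p)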
Table 2: Length of the tweets in the dataset

                      Sarcastic tweets   Sarcastic tweets with   Non-sarcastic
                      with #sarcasm      #sarcasm removed        tweets
Number of characters  94.39              86.38                   84.50
Number of words       14.06              13.77                   12.65
3.4 Method of analysis
The technique used to predict whether a tweet is sarcastic or not is logistic regression. This is a model in which a prediction about one categorical dependent variable is made on the basis of multiple independent variables (Pallant, 2010). In logistic regression the dependent variable is viewed as a probability P, a number between zero and one. If P is one, a case has a certain property; in our case, the tweet is sarcastic. If P is zero, a case does not have that property. The chance of the tweet being sarcastic is P, so the chance of the tweet not being sarcastic is 1 - P. The odds ratio is the chance of category 1 divided by the chance of category 2: odds = P / (1 - P). The logit is the natural logarithm of the odds ratio: logit = ln(P / (1 - P)). The logit is then modelled as a linear function of the predictors, logit = B0 + B1x1 + B2x2 + … + Bnxn, where the coefficients Bi weight the predictor variables xi. The predictor variables in this study are the character-n-grams or the word-n-grams. The final output of the logistic regression is the chance of a tweet being sarcastic, found by calculating the probability P back from the logit: P = 1 / (1 + e^(-logit)).
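To make the link between logit and probability concrete, a small worked sketch:

    import math

    def logit_to_probability(logit):
        # The inverse of logit = ln(P / (1 - P)): the logistic (sigmoid) function.
        return 1 / (1 + math.exp(-logit))

    print(logit_to_probability(0.0))  # 0.5: the model is undecided
    print(logit_to_probability(2.0))  # about 0.88: probably sarcastic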
To prevent overfitting, the dataset was split into a training set containing sixty percent of the tweets, a validation set containing twenty percent, and a test set containing the remaining twenty percent. The training set was used to find the n-grams and build up the model. The validation set was used to explore the effect of different parameters, such as the n-gram range and the regularization parameter, on the accuracy score of the logistic regression, and to determine the best parameters. The regularization parameter is part of the logistic regression module and controls how much the model tries to fit the training data. Finally, the test set was used to test the best-performing model on new data and to report how well the model worked and what its predictive value was.
To execute the analysis, the logistic regression in the Scikit-learn module in Python was used. Scikit-learn can be used for statistical data analysis on several sorts of datasets, including text data (Pedregosa et al., 2011).
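The following sketch shows how this setup might look in Scikit-learn. The 60/20/20 split and the parameter values come from this thesis, but the stand-in data, the variable names, and the assumption that the regularization parameter corresponds to Scikit-learn's C (the inverse regularization strength) are mine.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Hypothetical stand-in data; the study used 2000 tweets labelled 1 (sarcastic) or 0.
    tweets = ["Wow what a great Monday!!!"] * 5 + ["Looking forward to the conference :)"] * 5
    labels = [1] * 5 + [0] * 5

    # 60/20/20 split: first hold out 40 percent, then cut that part in half.
    X_train, X_rest, y_train, y_rest = train_test_split(
        tweets, labels, test_size=0.4, random_state=0, stratify=labels)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest)

    # Character 5-8-grams with regularization parameter 2.5 (assumed to map to C).
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(5, 8))
    clf = LogisticRegression(C=2.5)
    clf.fit(vectorizer.fit_transform(X_train), y_train)

    # Validation accuracy, used to choose the n-gram range and regularization parameter.
    print(clf.score(vectorizer.transform(X_val), y_val))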
4. Results
4.1 Determining the best parameters
As said in section 3.4, there are two parameters that can be adjusted to improve the accuracy score: the n-gram range and the regularization parameter of the logistic regression. The effects of the n-gram range and the regularization parameter on the accuracy score of the character-n-gram model can be found in figures 1 and 2. The n-gram range with the best accuracy score was 5-8 and the best regularization parameter was 2.5. The best overall accuracy score on the validation set was 0.648.
Figure 1. The effect of the n-gram range on the character-n-gram model with a regularization parameter of 2.5 (accuracy score plotted against the high end of the n-gram range, with one line per low end of the range; figure not reproduced).
Figure 2. The effect of the regularization parameter on the character-n-gram model with an n-gram range of 5-8 (accuracy score plotted against the regularization parameter; figure not reproduced).
As said, the best n-gram range has a minimum value of five. This has implications for the pragmatic markers that can be discovered: most pragmatic markers have between one and three characters. Therefore the short pragmatic markers were added to the model with the 5-8 character-n-grams. These markers are listed in table 3. The regularization parameter was also optimized for this model; the distribution of the accuracy scores over the different values of the regularization parameter is shown in figure 3. The best accuracy score on the validation set was 0.660, with a regularization parameter of 0.1.
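One way to combine the 5-8 character-n-grams with a fixed list of short markers is sketched below. The use of FeatureUnion and the small marker subset are my illustration (the full list is in table 3), not necessarily how the features were combined in this study.

    from sklearn.pipeline import FeatureUnion
    from sklearn.feature_extraction.text import CountVectorizer

    # An illustrative subset of the short markers from table 3.
    markers = ["!", "!!", "!!!", "?", "??", ":)", ";)", "..", "...", "@", "#"]

    features = FeatureUnion([
        # The regular 5-8 character-n-grams.
        ("char_ngrams", CountVectorizer(analyzer="char", ngram_range=(5, 8))),
        # Counts of the short markers only, via a fixed vocabulary of 1-3 character grams.
        ("markers", CountVectorizer(analyzer="char", ngram_range=(1, 3), vocabulary=markers)),
    ])
    X = features.fit_transform(["I love school!!!", "Great, another Monday..."])
    print(X.shape)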
Table 3: The pragmatic markers that were added to the character-n-gram model

Category             Markers
Ellipses             .. … ….
Quotation marks      ‘ ’
Exclamation points   ! !! !!!
Question marks       ? ?? ???
Emoticons            :) ;) :P -_- :D :( :o =) xD :-)
Caps                 A B C etc.
Parentheses          ( ) { } [ ]
Hyphens              - _
Other                @ # .
Figure 3. The effect of the regularization parameter on the character-n-gram model with added pragmatic markers (accuracy score plotted against the regularization parameter; figure not reproduced).
The n-gram range and the regularization parameter were also optimized for the word-n-gram model. The best n-gram range for this model was 1-3 and the best regularization parameter was 60. The best accuracy score was 0.638. The overall effects of the n-gram range and the regularization parameter on accuracy are displayed in figures 4 and 5.
Figure 4. The effect of the n-gram range on the word-n-gram model with a regularization parameter of 60.0 (accuracy score plotted against the high end of the n-gram range, with one line per low end of the range; figure not reproduced).
Figure 5. The effect of the regularization parameter on the word-n-gram model with an n-gram range of 1-3 (accuracy score plotted against the regularization parameter; figure not reproduced).
The word-n-gram model does not take punctuation into account, which makes it possible to add the short pragmatic markers of table 3 to this model as well. The best accuracy score of this model was 0.640, with a regularization parameter of 2.0. The overall effect of the regularization parameter on accuracy is shown in figure 6.
Figure 6. The effect of the regularization parameter on the word-n-gram model with added pragmatic markers (accuracy score plotted against the regularization parameter; figure not reproduced).
4.2 Comparing the four models
Table 4 gives the accuracy scores of the four models on the test set. The accuracy score of the character-n-gram model without added pragmatic markers is 0.630 and that of the character-n-gram model with added pragmatic markers is 0.635. The word-n-gram model without added pragmatic markers has an accuracy score of 0.635 and the word-n-gram model with added pragmatic markers has an accuracy score of 0.615.
Table 4: Accuracy scores of the models on the test set

Model type                                         Accuracy score
Character model with added pragmatic markers       0.635
Character model without added pragmatic markers    0.630
Word model with added pragmatic markers            0.615
Word model without added pragmatic markers         0.635
A one-way ANOVA was performed to investigate whether the accuracy scores of the four models differed from each other. The models did not differ from each other (F < 1), but they all differed from chance level (character with markers: t(400) = 3.81, p < .001; character without: t(400) = 3.56, p < .001; word with: t(400) = 3.40, p < .001; word without: t(400) = 3.70, p < .001).
A chi-square test was performed to investigate how the correct and wrong predictions were distributed. Table 5 shows the distribution of the predictions of the character model with added pragmatic markers. Overall the predictions were not equally distributed: there were more correct than false predictions (χ²(3) = 29.80). The two kinds of false predictions were equally distributed (χ²(1) = 0.44, p = .51), as were the two kinds of correct predictions (χ²(1) = 0.25, p = .62).
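The counts in table 5 form a confusion matrix. A minimal sketch of computing one, continuing the code from section 3.4 (so clf, vectorizer, X_test and y_test are the fitted model, vectorizer and held-out test data from that sketch):

    from sklearn.metrics import confusion_matrix

    # Predictions on the held-out test set.
    y_pred = clf.predict(vectorizer.transform(X_test))

    # Rows are the true classes (0 = not sarcastic, 1 = sarcastic), columns the
    # predicted classes; the diagonal holds the correct predictions.
    print(confusion_matrix(y_test, y_pred))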
Table 5: The distribution of the false and correct predictions of the character model with added pragmatic markers

False predictions
  Data is sarcastic, prediction is not sarcastic      77
  Data is not sarcastic, prediction is sarcastic      69
Correct predictions
  Data and prediction are sarcastic                  123
  Data and prediction are not sarcastic              116
4.3 The best markers
Table 6 displays the ten markers with the highest and the lowest coefficients of the character-n-gram model without added pragmatic markers. Markers that were obviously part of a character-n-gram that was also in the top ten were filtered out. For example, 'onday' was the third most influential sarcastic marker, but 'Monday' was also in the list, and 'onday' is obviously a 5-gram of 'Monday'. The n-grams with the highest coefficients stayed, and the n-grams that signified the same word with a lower coefficient were removed from the list. Table 6 also shows the words the n-grams were probably derived from. As can be seen, 'love' is the best marker for sarcasm, and the rest of the top ten consists of the adjectives 'fantastic' and 'great', the noun 'Monday', the verb 'are', the quantifier 'all' and some function words. No pragmatic markers were present in the top ten. In the top fifty of best n-grams, the pragmatic markers found were a smiley in Unicode (U0001f60*) and an ellipsis of five dots (……). The Unicode code U0001f60* can be followed by one character to form a smiley; examples are U0001f602 (face with tears of joy) and U0001f60A (smiling face). Interestingly, a smiley was also found among the best markers for non-sarcasm (U0001f62*). This code can be followed by a character to form a smiley as well; examples from this range are U0001f621 (pouting face) and U0001f627 (anguished face). Other markers that signified a non-sarcastic tweet were function words like 'this', 'but' and 'what'.
Table 6: The best markers of the character-n-gram model

Markers for sarcasm          Markers for non-sarcasm
Marker      Word             Marker       Word
'love '     love             ' this'      this
' all '     all              '0001f62'    U0001f62*
'onday'     Monday           ' but '      but
'today'     today            'what '      what
' are '     are              ' will'      will
' they'     they             ' done'      done
' that'     that             ' is a'      is a
' day '     day              'ing th'     *ing the
'reat '     great            ' FAME'      FAME
'astic'     fantastic        " done'"     done
Table 7 displays the ten markers with the highest and lowest coefficients of the word-n-gram model without added pragmatic markers. There are some similarities between the words in table 6 and table 7. For example, the marker of non-sarcasm 'U0001f621' (pouting face) in table 7 is probably the whole-word form of '0001f62', which was an important marker of non-sarcasm in the character model. Another similarity is the occurrence of the words 'Monday', 'love' and 'great' among the best markers for sarcasm in the word model; these words were also important markers for sarcasm in the character model. An example of a tweet in which 'Monday' marked sarcasm is: 'Is there anything better than waking up with a migraine.....on MONDAY! #sarcasm #fml' (RyanRich80, 2014). A good example of a sarcastic tweet with 'love' is: 'Stairs are my absolute favorite right now! Man, I love stairs. #sarcasm #marathon #runnerproblems #ow http://t.co/VBcRyTlrnS' (ETriyonis, 2014). Finally, 'great' can be found in the following sarcastic tweet: 'Wow what a great thanksgiving. #sarcasm' (Sandycheeeeekz, 2014). The most important markers of the models with added pragmatic markers are not displayed, since adding these markers did not result in better models.
Table 7: The best markers of the word-n-gram model

Markers for sarcasm: 'state_chumps', 'jon_rubin', 'Oh', 'Yay', 'Monday', 'yay', 'love', 'great', 'fun', 'really'

Markers for non-sarcasm: 'Undefeated', 'U0001f621', 'coming U0001f621', 'that coming U0001f621', 'this', 'will', 'what', 'Vao_Goldheart', 'it', 'but'
4.4 Error analysis
To find out what factors make it difficult to detect sarcasm, the various predictions of the models were examined. It turned out that 158 of the 400 tweets were predicted correctly by all four models. There were 34 tweets that were mistakenly predicted as sarcastic by all four models, and 36 tweets that all models predicted as non-sarcastic while they were sarcastic.
There were some similarities between the 34 tweets that were mistakenly predicted as sarcastic. For example, these tweets sometimes contained markers for sarcasm such as 'love'. An example is the following tweet: 'Congrats to The Flash and Jane the Virgin on their full-season pick-ups! That's why I love Mark Pedowitz. When he moves, he MOVES' (julieplec, 2014). Moreover, important themes in the incorrectly predicted tweets were school ('I have school at 9 today. \U0001f44c', reinalynsuyom, 2014) and getting up in the morning ('No one told me that going to uni would mean getting up at 6am #uniproblems', sophielmrobson, 2014), themes that are often seen in sarcastic tweets as well. Furthermore, there were often tweets that were probably sarcastic but not explicitly tagged as such. Examples are 'Oh, and they vote for health care, too. Because health care is this really horrible thing everyone should be wary of' (AbsurdlyJames, 2014) and 'Another fun commute to work. Lets see if it takes less then 2 hours' (ElAmazo, 2014).
The 36 tweets that were mistakenly predicted as non-sarcastic also showed some similarities. Most of these tweets were not obviously sarcastic, and a human judge with little background knowledge would also have a lot of trouble identifying them as sarcastic. Examples are 'Ahhhh man, seeing that Steve Nash is out for the season means the Lakers aren't going to win the NBA title.' (BlaydeEickhoff, 2014) and '@AntDeRosa coincidence? I think not' (notRhysFletcher, 2014). What is also striking is that many of these tweets, like the last example, are directed to specific people with the @user sign. Another example of a tweet directed to specific persons is '@snidelewhiplash @LadyErvona I actually thing those things are swell' (SaviCharlotte, 2014).
4.5 Summary
On the basis of the results presented above, one can say that a model based on n-grams does not provide a very good classification of sarcastic tweets. The character-n-gram model correctly predicted in 63.0 percent of cases whether a tweet was sarcastic, and the word-n-gram model showed an accuracy of 63.5 percent, which is 13 and 13.5 percentage points above chance level. In section 4.2 four models for detecting sarcasm were compared: a character-n-gram model without added pragmatic markers; a character-n-gram model with added pragmatic markers; a word-n-gram model without added pragmatic markers; and a word-n-gram model with added pragmatic markers. The models were not statistically significantly different from each other, but all performed above chance level. For the character-n-gram model with added pragmatic markers, more correct than wrong predictions were made, but the types of correct and wrong predictions were equally distributed: for example, there was the same number of correct predictions for sarcastic tweets as for non-sarcastic tweets. Section 4.3 displayed the most important markers of the word and character models. No pragmatic marker was important enough to be in the top ten of markers for sarcasm. Words among the best markers for sarcasm in both models were the noun 'Monday', the adjective 'great' and the verb 'love'; words among the best markers for non-sarcasm in both models were 'this', 'but', 'what' and the U0001f621 smiley. In section 4.4 the tweets that were predicted wrongly by all four models were analysed. Tweets that were mistakenly predicted as sarcastic often had themes common in sarcastic tweets, like school and getting up in the morning, and sometimes these tweets were very likely sarcastic but just not tagged explicitly with #sarcasm. The tweets that were mistakenly predicted as non-sarcastic were rarely very clear cases of sarcasm; moreover, these tweets were often directed to specific people.
5. Discussion
This study shows that detecting sarcasm is difficult, but that with a small dataset and a simple supervised classifier it is possible to make a sarcasm classifier that performs above chance level. This result suggests that this way of classifying tweets is a promising path for further research, and the models could probably be improved further simply by increasing the size of the dataset. Another result of this study is that content words seem to be more important for detecting sarcasm than pragmatic markers. Content words like 'great', 'love' and 'Monday' were among the ten best markers for sarcasm, while no pragmatic markers were in the top ten. Moreover, adding pragmatic markers to the predictive models did not result in a significantly better classification of sarcastic tweets. These last two results were not as hypothesized, but there are some factors that can partially explain the limited importance of pragmatic markers.
First, the way the dataset was collected, selecting sarcastic tweets with #sarcasm, may have influenced the results. According to Davidov, Tsur and Rappoport (2010), #sarcasm is used in several ways. The first reason people use #sarcasm is that it makes tweets searchable: writers add #sarcasm so that their tweet can be found when people search for it. Secondly, people use #sarcasm not only in sarcastic tweets but also in tweets following a sarcastic tweet, to make clear that the previous tweet was sarcastic. The third reason to use #sarcasm is to make sure that readers understand the sometimes very subtle forms of sarcasm in a tweet. Sometimes the lack of context and the small amount of text in a tweet make it impossible for a reader to detect sarcasm; #sarcasm ensures that such tweets are not misunderstood.
A different theory about why people use #sarcasm is given by Liebrecht et al. (2013). They claimed that in hyperbolic sarcastic tweets pragmatic markers are used as signifiers of sarcasm, while in non-hyperbolic sentences #sarcasm itself is used as a pragmatic marker. Exclamation marks and other intensifiers enable the writer to write a hyperbolic sentence. For non-hyperbolic sarcastic sentences, users do not use pragmatic markers but #sarcasm, which is itself a pragmatic marker because it replaces the linguistic cues needed to detect sarcasm. If this were true, most tweets in the sarcastic dataset would have been non-hyperbolic sentences. This does not seem very likely, because the top ten n-grams marking sarcasm in both the character and the word model included the words 'love' and 'great', which denote a highly positive sentiment and so mark a hyperbolic sarcastic sentence.
The third use of #sarcasm from the study of Davidov, Tsur and Rappoport (2010), its use when the sarcasm is subtle, seems more common. Especially in the tweets that none of the four models could identify as sarcastic, the sarcasm was very subtle and the tweets contained very few pragmatic markers to denote it. In these tweets #sarcasm was necessary to make sure that readers understood that the tweet was sarcastic. That some tweets with #sarcasm display very subtle sarcasm makes it hard to detect sarcasm with pragmatic markers.
Another possible reason for the relatively small importance of pragmatic markers in detecting sarcasm is the manner in which pragmatic markers occur. In section 1.3 it was argued that, due to the informal nature of Twitter, pragmatic markers would often be present in tweets. This has not been shown to be untrue, but it is probable that these pragmatic markers are very common not only in sarcastic tweets but also in non-sarcastic tweets. A question mark can mark a sarcastic sentence, but of course it can also simply indicate a question. An emoticon like :) can be a signal for sarcasm, but it can also be a marker of positivity, as in the tweet: 'Looking forward to my first #ploneconf 2014 at Bristol! Have to do some tweaking to my "Plone + 10 years of happiness" -presentation :)' (rikupekka, 2014). It is therefore possible that the markers for sarcasm are just as common in the non-sarcastic dataset as in the sarcastic dataset, only with different meanings in the two datasets.
Interestingly, the study by González-Ibáñez et al. (2011) discussed in section 2.3 used the same method of collecting tweets as this study, and those researchers did find emoticons and punctuation markers among the top ten markers for sarcasm, which contradicts the results of this study. Since González-Ibáñez et al. (2011) used a dataset almost as large as the one used here and collected their tweets in the same way, further research is necessary to find out what role pragmatic markers play in sarcastic tweets.
Sarcasm detection in general turned out to be difficult. A factor that complicated sarcasm detection in this study was that some tweets were sarcastic but not explicitly tagged with #sarcasm; these tweets were placed in the non-sarcastic dataset. Such sarcastic tweets in the non-sarcastic dataset could have contained pragmatic markers and content words marking sarcasm, which decreases the value of these markers for distinguishing between the set with #sarcasm and the set without. In further research this might be somewhat prevented by using human judges to identify whether a tweet is sarcastic. Although this might help a little, González-Ibáñez et al. (2011) showed that human judges could identify sarcastic utterances correctly with an accuracy of only 63%, roughly the same accuracy as the models discussed in section 4.1. This shows that the value of human judges for further research is limited, and that sarcasm detection in tweets is difficult even for humans, largely due to the lack of context in tweets and their short length.
In further research the performance of sarcasm detection models could, as said, probably be improved considerably by using a much larger dataset, which would allow the markers that characterize sarcastic tweets to be identified far more precisely. A bigger dataset is therefore the most promising avenue for further research. Besides this, it is also possible to focus not only on detecting sarcasm in general, but also on detecting its various forms. The set of sarcastic tweets is very heterogeneous: sarcastic tweets can be hyperboles, rhetorical questions, understatements and various other types of utterances that should not be taken literally. It is important to focus on these separate kinds of sarcasm because the different forms have very different characteristics. A hyperbole typically has a very strong positive sentiment, which implies words like 'great' and 'fantastic' and multiple exclamation marks. An understatement is a statement in which something is said in relatively careful words while the writer means something stronger; characteristics of an understatement could be words like 'maybe' and 'might' and a question mark. Because of the specific characteristics of the various forms of sarcasm, it might be easier to detect one form of sarcasm than all forms at once. Therefore I expect that better research into the various forms of sarcasm will lead to better detection of sarcasm in general.
References

AbsurdlyJames. (2014, 23 October). Oh, and they vote for health care, too. Because health care is this really horrible thing everyone should be wary of. [Twitter post]. Retrieved from https://twitter.com/AbsurdlyJames/status/525133368447741952

BlaydeEickhoff. (2014, 24 October). Ahhhh man, seeing that Steve Nash is out for the season means the Lakers aren't going to win the NBA title. [Twitter post]. Retrieved from https://twitter.com/BlaydeEickhoff/status/525524555482136576

Chamlertwat, W., Bhattarakosol, P., Rungkasiri, T., & Haruechaiyasak, C. (2012). Discovering consumer insight from Twitter via sentiment analysis. Journal of Universal Computer Science, 18(8), 973-992.

conconarbonida2. (2014, 24 October). You look like shit. Is that the style now? #Sarcasm [Twitter post]. Retrieved from https://twitter.com/conconarbonida2/status/525564683084783616

Davidov, D., Tsur, O., & Rappoport, A. (2010). Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (pp. 107-116). Stroudsburg, PA: Association for Computational Linguistics.

de Freitas, L. A., Vanin, A. A., Hogetop, D. N., Bochernitsan, M. N., & Vieira, R. (2014). Pathways for irony detection in tweets. In Proceedings of the 29th Annual ACM Symposium on Applied Computing (pp. 628-633). doi: 10.1145/2554850.2555048

Dresner, E., & Herring, S. C. (2010). Functions of the nonverbal in CMC: Emoticons and illocutionary force. Communication Theory, 20(3), 249-268.

ElAmazo. (2014, 22 October). Another fun commute to work. Lets see if it takes less then 2 hours. [Twitter post]. Retrieved from https://twitter.com/ElAmazo/status/524941694500741121

ETriyonis. (2014, 13 October). Stairs are my absolute favorite right now! Man, I love stairs. #sarcasm #marathon #runnerproblems #ow http://t.co/VBcRyTlrnS [Twitter post]. Retrieved from https://twitter.com/ETriyonis/status/521696722326540288

Filatova, E. (2012). Irony and sarcasm: Corpus generation and analysis using crowdsourcing. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (pp. 392-398). Istanbul, Turkey: European Language Resources Association.

Gibbs, R. W. (2000). Irony in talk among friends. Metaphor and Symbol, 15, 5-27.

González-Ibáñez, R., Muresan, S., & Wacholder, N. (2011). Identifying sarcasm in Twitter: A closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (pp. 581-586). Portland, OR: Association for Computational Linguistics.

Hancock, J. T. (2004). Verbal irony use in face-to-face and computer-mediated conversations. Journal of Language and Social Psychology, 23(4), 447-463.

jnoahwatt. (2014, 6 October). Dear couple that decided to fight outside my window at 3 AM, in a 30 minute tirade of expletives...Thank you. I didn't need sleep. #sarcasm [Twitter post]. Retrieved from https://twitter.com/jnoahwatt/status/519113411137773569

julieplec. (2014, 22 October). Congrats to The Flash and Jane the Virgin on their full-season pick-ups! That's why I love Mark Pedowitz. When he moves, he MOVES. [Twitter post]. Retrieved from https://twitter.com/julieplec/status/524638072160272384

Liebrecht, C. C., Kunneman, F. A., & van den Bosch, A. P. J. (2013). The perfect solution for detecting sarcasm in tweets #not. In A. Balahur, E. van der Goot, & A. Montoyo (Eds.), Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (pp. 29-37). Retrieved from http://hdl.handle.net/2066/112949

Maynard, D., & Greenwood, M. A. (2014). Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (pp. 4238-4243).

Montoyo, A., Martínez-Barco, P., & Balahur, A. (2012). Subjectivity and sentiment analysis: An overview of the current state of the area and envisaged developments. Decision Support Systems, 53(4), 675-679.

notRhysFletcher. (2014, 24 October). @AntDeRosa coincidence? I think not [Twitter post]. Retrieved from https://twitter.com/notRhysFletcher/status/525524440101048320

Pallant, J. (2010). SPSS survival manual: A step by step guide to data analysis using SPSS. Maidenhead, England: McGraw-Hill.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825-2830.

pure_dreaming. (2014, 12 October). I love when people think they have to be better than me or have to copy me haha #sarcasm [Twitter post]. Retrieved from https://twitter.com/pure_dreaming/status/521395513946411008

Rakov, R., & Rosenberg, A. (2013). "Sure, I did the right thing": A system for sarcasm detection in speech. In Proceedings of the 14th Annual Conference of the International Speech Communication Association (pp. 842-846).

reinalynsuyom. (2014, 22 October). I have school at 9 today. \U0001f44c [Twitter post]. Retrieved from https://twitter.com/reinalynsuyom/status/525146962640007168

Renkema, J. (2004). Introduction to discourse studies. Amsterdam, the Netherlands: John Benjamins Publishing.

RyanRich80. (2014, 13 October). Is there anything better than waking up with a migraine.....on MONDAY! #sarcasm #fml [Twitter post]. Retrieved from https://twitter.com/RyanRich80/status/521682205891117057

Sandycheeeeekz. (2014, 13 October). Wow what a great thanksgiving. #sarcasm [Twitter post]. Retrieved from https://twitter.com/Sandycheeeeekz/status/521732102535274496

SaviCharlotte. (2014, 24 October). @snidelewhiplash @LadyErvona I actually thing those things are swell #sarcasm [Twitter post]. Retrieved from https://twitter.com/SaviCharlotte/status/525511773579792384

sophielmrobson. (2014, 22 October). No one told me that going to uni would mean getting up at 6am #uniproblems [Twitter post]. Retrieved from https://twitter.com/sophielmrobson/status/524799839151079425

Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media (pp. 178-185). Menlo Park, CA: The AAAI Press.

Twitter. (2014). About Twitter: Company. Retrieved from https://about.twitter.com/company/

Whalen, J. M., Pexman, P. M., & Gill, A. J. (2009). "Should be fun—not!": Incidence and marking of nonliteral language in e-mail. Journal of Language and Social Psychology, 28(3), 263-280.