Calculating Emotional Score of Words
for User Emotion Detection in Messenger Logs
Lun-Wei Ku and Cheng-Wei Sun
National Yunlin University of Science and Technology, Taiwan
[email protected]; [email protected]
Abstract
This paper utilized sentences containing emoticons from the
articles in Yahoo! blogs to automatically detect users' emotions
from messenger logs. Four approaches were proposed: the topical
approach, the emotional approach, the retrieval approach, and the
lexicon approach. Forty emoticon classes found in Yahoo! blog
articles were used for experiments. Two experiments were
performed. The first experiment classified sentences into 40
emoticon classes by calculating emotional scores of words. The
second experiment took the Yahoo! and MSN messenger logs
collected from users as the experimental materials, classified
them into 40 emoticon classes by the proposed approaches, and
mapped the 40 emoticon classes to 6 emotion classes to tell the
user's emotion. The best performance of the proposed approaches
for user emotion detection was achieved by the topical approach,
and its micro-average precision of 0.48 was satisfactory.
Keywords: Sentiment analysis, Emoticon generation,
Emotion detection, Messenger log
1. Introduction
Emotion analysis has become one of the major research topics in
subjective information processing. Related approaches as well as
applications have been proposed [1][2]. There are many ways to
categorize emotions. Different emotion states were used for
experiments in previous research [3]. To find suitable categories
of emotions, we adopted the three-layered emotion hierarchy
proposed by Parrott [4] (see footnote 1). In this hierarchy, 6
emotions are in the first layer, including love, joy, surprise,
anger, sadness and fear. The second layer includes 25 emotions,
and the third layer includes 135 emotions. Using this
hierarchical classification of emotions, we can categorize them
from rough to fine granularities and degrade to the upper level
when the experimental materials are insufficient. With the
hierarchy, mapping categories used in other studies to ours
becomes clearer and easier, and when generating the testing
instances, annotators have more information about how to label
the current emotion. In this paper, the six emotion classes in
the first layer were used in experiments.
Subjective information is easily found in web-based chatting
environments such as blogs, Twitter, and instant messaging
programs. Among them, the webpage-based information, like blog
articles, is more public and easily reachable, while the
program-based information, like messenger logs, is difficult to
access due to privacy issues. However, private messages could
tell more about the emotion of users. Therefore, detecting and
analyzing emotions from them is indispensable for further
applications.
In the past, emotion analysis was usually performed from the
reader's perspective. Therefore, the commonly seen process to
acquire the experimental corpus was first collecting materials
and then asking annotators to label the emotions on them.
However, we aim to detect emotions from the author's perspective
in this paper; therefore, a different process to collect
experimental materials is proposed.
As far as we know, there are no materials containing MSN logs and
the corresponding user's emotions so far. We collected sentences
containing emoticons from blogs, and viewed the emoticon within
the sentence as the blogger's emotion to construct the
experimental materials, from which we learned how to detect
emotion automatically. Sentences containing emoticons are called
emoticon sentences hereafter. Then we utilized the learned
information to detect users' emotions from messenger logs. To
evaluate the performance, we recorded the messenger logs and
asked annotators to label their emotions each hour.
In this paper, we proposed four approaches to calculate the
emotion score of a word for each emotion class. The experiments
were designed from the author's perspective in both the material
preparation and evaluation phases. The emotional scores were used
to detect the user's emotion from messenger logs. Using the
proposed approaches, the emotional scores could be generated even
when emotion categories were different.
1 http://changingminds.org/explanations/emotions/basic%20emotions.htm
IEEE IRI 2012, August 8-10, 2012, Las Vegas, Nevada, USA
978-1-4673-2284-3/12/$31.00 ©2012 IEEE
2. Experimental materials
Emoticon sentences and messenger logs were adopted for
experiments in this paper. Emoticon sentences were used to learn
the emotional scores for detecting emoticons in sentences and for
testing the performance. The learned emotional scores were then
utilized to predict the emoticon in the messenger logs and
further to find the emotion. Though the emoticon of sentences in
messenger logs was predicted, messenger logs were used only to
test the performance of emotion detection.
2.1. Emoticon sentences from Yahoo! blogs
Because multiple emoticons in one sentence will increase the
degree of confusion, only sentences containing exactly one
emoticon were collected for experiments. A total of 1,540,163
sentences were collected from articles in Yahoo! blog from July
2006 to June 2007. The forty emoticons in the Yahoo! blog
platform, their meanings, and the number of the corresponding
collected sentences are listed in Table 1.
2.2. Messenger logs
Messenger logs were used as the materials to detect emotions.
They were originally used for creating intelligent ambience
according to the emotions of users [5]. We collected texts from
the Yahoo! Messenger and MSN Messenger logs of 8 annotators.
Once an hour, whenever there was at least one new message, the
collecting program would pop up a menu and ask them to annotate
their current emotion. A total of 150 records were annotated for
experiments, and the statistics are shown in Table 2. The
quantity was not large because we needed to wait for the
annotators' chat, and at most one record would be generated each
hour by one annotator.
Emo      1    2    3    4    5    6
Log #    11   80   1    15   39   4
Table 2. Annotated messenger logs
(Emo: Emotions, 1=Love, 2=Joy, 3=Surprise, 4=Angry,
5=Sad, 6=Fear; Log #: number of messenger logs)
3. Learning emotional scores of words
The major aim of the proposed approaches was to detect user's
emotions in messenger logs. As mentioned, the emoticon sentences
were treated as the learning materials and from them the
emotional score of each word was calculated. The learned
emotional scores of the words in the messenger log were
accumulated to determine the emotion class of the log. Four
approaches were proposed: the topical approach, the emotional
approach, the retrieval approach and the lexicon approach. The
topical approach and the retrieval approach considered the
importance of words in the query sentence and the emoticon
sentences, while the emotional approach calculated the
“emoticonal” tendency of the words in the query sentence. The
lexicon approach looked up words of different emoticon classes in
an emotion dictionary.
3.1. Topical Approach
In this approach, the concept of tf-idf was used to calculate the
emotional score of words in the query sentence. Each emoticon
sentence was treated as a document, and then the idf of each word
was first calculated by formula (1):
\[ \mathrm{idf}(w_i) = \log\left(\frac{N}{N_{w_i}}\right) \tag{1} \]
where tf denoted term frequency in one document, idf inverse
document frequency, w_i the current word, N the total number of
documents (here, emoticon sentences), and N_{w_i} the number of
documents containing w_i. Then the idf score of the word was
distributed to the 40 emoticon classes by the probability of
observing the emoticon sentences of the emoticon class c_j
containing w_i over all emoticon sentences. Each word w_i has
forty emoticon scores and each score, denoted as
emoticon(w_i, c_j), corresponded to an emoticon class c_j.
Formulae (2) and (3) show how to calculate these scores.
\[ \mathrm{emoticon}(w_i, c_j) = \mathrm{prob}(s_{w_i,c_j}) \times \mathrm{idf}(w_i) \tag{2} \]
\[ \mathrm{prob}(s_{w_i,c_j}) = \frac{N(s_{w_i,c_j})}{N} \tag{3} \]
where s_{w_i,c_j} denoted those sentences of emoticon class c_j
containing word w_i. After that, the emotional score of w_i in
emotion class e_k was calculated by summing up the scores of the
emoticons mapped to e_k as shown in formula (4). The mapping of
the emoticon classes and the emotion class is listed in Table 3.
The emotion class emo_class of the query sentence s_q was
determined by formula (5).
\[ \mathrm{emotion}(w_i, e_k) = \sum_{c_j \in e_k} \mathrm{emoticon}(w_i, c_j) \tag{4} \]
\[ \mathrm{emo\_class}(s_q) = \arg\max_{e_k} \sum_{w_i \in s_q} \mathrm{emotion}(w_i, e_k) \tag{5} \]
Because an emoticon sentence was treated as a document, if a word
appeared multiple times in one emoticon sentence, its idf would
be accumulated for each observation. Therefore, the term
frequency (tf) was implicitly calculated in formula (5),
following the definition of tf-idf that tf was the frequency of a
term in the current document.
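The topical approach of Section 3.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`learn_emoticon_scores`, `emotion_scores`, `classify`) are hypothetical, sentences are assumed to be already segmented into word tokens, and a query with no known words returns `None`, a case the paper does not specify. The mapping dictionary follows Table 3.

```python
import math
from collections import Counter, defaultdict

# Emoticon class id -> emotion class, following Table 3.
EMOTION_OF_EMOTICON = {
    7: "Love", 8: "Love", 10: "Love",
    1: "Joy", 4: "Joy", 13: "Joy", 18: "Joy",
    11: "Surprise", 12: "Angry",
    17: "Sadness", 37: "Sadness", 15: "Fear",
}

def learn_emoticon_scores(sentences):
    """sentences: list of (tokens, emoticon_id) pairs, one per emoticon sentence.
    Returns {word: {emoticon_id: score}} following formulas (1)-(3)."""
    n_total = len(sentences)                  # N
    doc_freq = Counter()                      # N_{w_i}
    class_doc_freq = defaultdict(Counter)     # N(s_{w_i, c_j})
    for tokens, emoticon in sentences:
        for word in set(tokens):              # count each sentence once per word
            doc_freq[word] += 1
            class_doc_freq[word][emoticon] += 1
    scores = {}
    for word, df in doc_freq.items():
        idf = math.log(n_total / df)          # formula (1)
        scores[word] = {c: (n / n_total) * idf   # formulas (2) and (3)
                        for c, n in class_doc_freq[word].items()}
    return scores

def emotion_scores(emoticon_scores):
    """Sum emoticon scores into emotion classes, formula (4)."""
    result = {}
    for word, per_class in emoticon_scores.items():
        totals = defaultdict(float)
        for emoticon, score in per_class.items():
            emotion = EMOTION_OF_EMOTICON.get(emoticon)
            if emotion is not None:           # emoticons outside Table 3 are ignored
                totals[emotion] += score
        result[word] = dict(totals)
    return result

def classify(tokens, word_emotion):
    """Formula (5): repeated tokens are added each time (implicit tf)."""
    totals = defaultdict(float)
    for word in tokens:
        for emotion, score in word_emotion.get(word, {}).items():
            totals[emotion] += score
    if not totals:
        return None                           # assumption: no known word, no decision
    return max(totals, key=totals.get)
```

With a toy corpus such as `[(["happy", "day"], 18), (["rain", "day"], 17)]`, the learned scores are passed through `emotion_scores` once and then reused by `classify` for every query sentence or messenger log.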
1. smile
75,650
11.
surprise
58,175
2. sad
41,118
3. blink
42,334
4. happy
114,969
5. blink2
44,328
6. bother
43,413
12. angry
13. smug
14. cool
15. worried
16. evil
46,618
22.
stubborn
13,050
20,796
28,957
44,390
23. same
24. sleepy
25. eh…
6,938
33. let me
31. yeh? 32. slobber
think
21,039
37,101
14,149
10,011
15,461
21.angel
6,617
34. is it?
28,560
35. clap
17. cry
19,873
26. sick
36. pray
44,342
8. shy
48,406
9. tongue
71,402
18. laugh
19. honest
29. clowd
5,855
131,148
28. won’t
tell
32,563
37. sign
38. humph
39. flower
19,783
55,859
92,635
27. secret
17,577
52,880
7. love
106,274
17,480
10. kiss
29,414
20. so
stuck
17,170
30. don’t
be stupid
35,024
11,710
3,256
40. pig
13,838
Table 1. Statistics of emoticons and emoticon sentences
e_k (emotion)    c_j (emoticon)
Love             7(love), 8(shy), 10(kiss)
Joy              1(smile), 4(happy), 13(smug), 18(laugh)
Surprise         11(surprise)
Angry            12(angry)
Sadness          17(cry), 37(sign)
Fear             15(worried)
Table 3. The mapping of emoticon classes and the emotion class
3.2. Emotional Approach
The concept of using tf-idf is to find words observed more often
(tf) in fewer documents (idf). These terms can be viewed as the
representatives of a document. In the emotional approach, we
tried to find the representative words for each emoticon class
and calculate the score to denote the degree of representation.
Therefore, in the emotional approach, the emoticon sentences of
the same emoticon class were concatenated into one document. A
total of 40 documents were generated and the corresponding tf-idf
scores were calculated for each word to be the
emoticon(w_i, c_j). Then the emo_class(s_q) was determined by
formulae (4) and (5), too.
3.3. Retrieval Approach
The retrieval approach treated the emoticon detection problem as
an information retrieval (IR) problem. The query sentence s_q was
the query in the IR system, and the emoticon sentences which
contained the most important words in s_q were retrieved to vote
for its emoticon class. To evaluate IR approaches, P@10 is a
common measure which calculates the precision of the ten most
highly ranked results. P@10 was used because users usually care
about the highly ranked documents and hence most IR systems try
to improve the precision of retrieving them. Following the
concept of information retrieval and P@10, the 10 most highly
ranked emoticon sentences were retrieved and the majority of
their emoticons was treated as the emoticon class of the query
sentence. The retrieving of the emoticon sentences was based on
the tf-idf model, too. The ranking score rank_score(s_n) of the
emoticon sentence s_n was calculated by formula (6), and then the
emotion class of s_q was determined by formula (7).
\[ \mathrm{rank\_score}(s_n) = \sum_{w_i \in s_n \cap s_q} \mathrm{idf}(w_i) \tag{6} \]
\[ \mathrm{emo\_class}(s_q) = \arg\max_{e_k} \left| \{ s_{e_k} : \mathrm{rank}(s_{e_k}) \le 10 \} \right| \tag{7} \]
3.4. Lexicon Approach
The lexicon approach was performed by looking up words in an
emotion dictionary and calculating their emotional scores. The
emoticon sentences were not used in this approach; instead, the
Chinese emotion dictionary of Lin et al. [6] was adopted. In this
dictionary, lexicons were categorized into eight emotion types:
awesome, heartwarming, surprising, sad, useful, happy, boring,
and angry. These eight emotion types appeared in Yahoo! News
Taiwan in the year 2008 but, as we can see, not all of them were
general emotion states. Therefore, we tried to find Lin's emotion
categories in Parrott's emotion hierarchy before using the
dictionary. Those that could not be found were categorized into
the Other class.
Ku and Chen's approach [7] for calculating sentiment scores
senti_score(w_i) was then adopted to give scores to these
lexicons, as the emoticon sentences were no longer used for
learning scores. The scores of lexicons of the same emotion class
were summed up and the emotion category with the highest total
score was selected as the detected emotion, as shown in formula
(8).
\[ \mathrm{emo\_class}(s_q) = \arg\max_{e_k} \sum_{w_i \in s_q,\, lex_k} \mathrm{senti\_score}(w_i, e_k) \tag{8} \]
where lex_k denoted the lexicon set which belonged to emotion
class e_k.
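The retrieval approach (Section 3.3, formulas (6) and (7)) can be illustrated with the following Python sketch. It is a simplified stand-in written under assumptions rather than the authors' system: the names `build_idf` and `retrieve_and_vote` are hypothetical, sentences are assumed to be tokenized, sentences sharing no word with the query are skipped, and ties in the vote are broken by first occurrence.

```python
import math
from collections import Counter

def build_idf(sentences):
    """idf per word, treating each emoticon sentence as one document (formula (1))."""
    n_total = len(sentences)
    doc_freq = Counter()
    for tokens, _label in sentences:
        doc_freq.update(set(tokens))
    return {word: math.log(n_total / df) for word, df in doc_freq.items()}

def retrieve_and_vote(query_tokens, sentences, idf, top_k=10):
    """Rank emoticon sentences by formula (6) and let the top_k vote (formula (7)).
    sentences: list of (tokens, label); label is the class attached to the sentence."""
    query = set(query_tokens)
    ranked = []
    for tokens, label in sentences:
        shared = query & set(tokens)          # w_i in s_n ∩ s_q
        if not shared:
            continue                          # assumption: unrelated sentences never vote
        score = sum(idf.get(word, 0.0) for word in shared)
        ranked.append((score, label))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _score, label in ranked[:top_k])
    if not votes:
        return None                           # no overlap with any emoticon sentence
    return votes.most_common(1)[0][0]
```

Passing emotion labels (after the Table 3 mapping) as `label` yields an emotion class directly, while passing emoticon ids reproduces the voting over emoticon classes described in the text.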
Class  Prec.    Class  Prec.    Class  Prec.    Class  Prec.
1.     0.042    11.    0.108    21.    0.002    31.    0.034
2.     0.007    12.    0.247    22.    0.001    32.    0.167
3.     0.007    13.    0.003    23.    0.007    33.    0.014
4.     0.153    14.    0.010    24.    0.181    34.    0.001
5.     0.008    15.    0.025    25.    0.000    35.    0.153
6.     0.142    16.    0.008    26.    0.031    36.    0.255
7.     0.377    17.    0.454    27.    0.004    37.    0.002
8.     0.078    18.    0.646    28.    0.005    38.    0.003
9.     0.033    19.    0.001    29.    0.001    39.    0.063
10.    0.033    20.    0.001    30.    0.025    40.    0.011
Table 4. Precision of emoticon detection (topical approach)
4. Experimental results and discussions
To evaluate the performance of the emoticon detection in emoticon
sentences and the emotion detection in messenger logs, 10-fold
experiments were performed. The best results of emoticon
detection among the four approaches, generated by the topical
approach, are listed in Table 4, and the results of the four
approaches for emotion detection are listed in Table 5.
Table 4 showed that the performance of emoticon detection varied
across emoticon classes. The best performances were found in
three classes: Laugh (class 18, 0.646), Cry (class 17, 0.454) and
Love (class 7, 0.377), and Table 1 showed that the number of
emoticon sentences of these classes far exceeded those of the
other classes. Figure 1 further showed the relation between the
number of emoticon sentences and the performance of emoticon
detection. From Figure 1 we could see that the curve of the
number conformed to the curve of the performances of three
approaches. That is, collecting more emoticon sentences would
help improve the performance.
Table 5 showed that on average the best performance of the user
emotion detection was achieved by using the emotion scores
generated by the topical approach, while the emotional approach
performed the worst. After looking over the emotional scores, we
found that the unsatisfactory performance was caused by the
concatenation of the emoticon sentences of the same class. This
process made forty very large documents, where the term frequency
might become the dominant factor and deteriorate the performance.
The retrieval approach was better than the emotional approach,
but worse than the topical approach. Both the topical approach
and the retrieval approach determined the results according to
the tf-idf score. However, the topical approach distributed it to
emotion classes, while the retrieval approach utilized it to rank
sentences for voting the result. As a result, we can say that
considering the important words of the query sentence to find the
emotion class performs better than letting similar sentences
determine it.
The lexicon approach was different from the other three in that
it did not calculate scores based on emoticon sentences. Its
performance was the second among all. The advantage of using
lexicons was that we could find words not appearing in the
emoticon sentences and hence would still be able to know the
emoticon class of sentences, even though there were no previously
seen words in them. The performance of the emotion class Surprise
(1.000, 58,175 emoticon sentences) in Table 5 showed this
phenomenon. However, having a fixed lexicon set was also its
disadvantage. When there were many emoticon sentences so that
scores of various words were learned in the topical approach, the
lexicon approach would suffer from limited lexicons when
determining the emotion class.
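To make the concatenation effect of the emotional approach concrete, the sketch below builds one document per emoticon class and scores words with a plain tf-idf. The exact tf-idf weighting used in the experiments is not stated in the paper, so raw term counts multiplied by log(N/df) are an assumption here, and `class_level_tfidf` is a hypothetical name.

```python
import math
from collections import Counter, defaultdict

def class_level_tfidf(sentences):
    """sentences: list of (tokens, emoticon_id).
    All sentences of one emoticon class are concatenated into one document,
    then each word gets tf * idf within that document."""
    class_tf = defaultdict(Counter)
    for tokens, emoticon in sentences:
        class_tf[emoticon].update(tokens)     # concatenation = summing term counts
    n_docs = len(class_tf)                    # 40 in the paper's setting
    doc_freq = Counter()
    for term_counts in class_tf.values():
        doc_freq.update(term_counts.keys())
    return {
        emoticon: {word: count * math.log(n_docs / doc_freq[word])
                   for word, count in term_counts.items()}
        for emoticon, term_counts in class_tf.items()
    }
```

Because the term count of a word grows with the number of sentences in the class while idf can take at most 40 distinct document-frequency values, a frequent word in a large class can outweigh every other word in a query, which is consistent with the explanation given above for the weak results of this approach.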
Figure 1. Number of emoticon sentences and the
performances of 40 emoticon classes
As to the performance of user emotion detection shown in Table 5,
all approaches tended to perform unsatisfactorily for the emotion
classes Love, Angry and Fear. For Angry and Fear, Figure 1 has
shown that the insufficiency of emoticon sentences was one cause
of the low performance. Moreover, Table 2 showed that these
emotion classes were seldom selected by annotators. Logs of these
classes might be related to specific events represented by
special word compositions instead of certain subjective words.
Table 6 showed the confusion matrix of user emotion detection. It
showed that messenger logs of Sadness were often classified as
Joy. Two major characteristics were found in Sadness logs: either
they lacked words, or only commonly used words were found. The
former characteristic made it difficult for us to determine the
emotion class because there was no information; for sentences
with the latter characteristic, we would need additional context,
obtained by using n-grams instead of individual words, to get
more information.
Bellegarda [3] reported that his best F-measure was 0.340, also
for 6 categories. Notice that his work analyzed emotions from the
reader's perspective, while ours analyzed them from the author's
perspective. Emotion analysis from the author's perspective is
generally considered more difficult than from the reader's
perspective, as what a user felt might not be consistent with
what he/she wrote in messengers. Therefore, although Bellegarda's
experiments and the experiments in this paper were done on
different datasets and evaluated by different metrics, the
micro-average precision of the topical approach, 0.480, reported
in this paper was considered satisfactory.
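As a worked check of the averages, the Python sketch below recomputes the macro- and micro-averages of the topical approach from the per-class figures in Table 5 and the log counts in Table 2. It rests on one assumption that the paper does not state explicitly: each per-class figure is read as the fraction of that class's annotated logs that were labeled correctly, so that multiplying by the class size and rounding recovers the number of correct logs. The helper names are hypothetical.

```python
# Number of annotated messenger logs per emotion class (Table 2).
LOG_COUNTS = {"Love": 11, "Joy": 80, "Surprise": 1,
              "Angry": 15, "Sadness": 39, "Fear": 4}

# Per-class results of the topical approach (Table 5).
TOPICAL = {"Love": 0.000, "Joy": 0.850, "Surprise": 0.000,
           "Angry": 0.000, "Sadness": 0.103, "Fear": 0.000}

def macro_average(per_class):
    """Unweighted mean over classes: every class counts the same."""
    return sum(per_class.values()) / len(per_class)

def micro_average(per_class, counts):
    """Pool all logs: correct logs over all logs, so large classes dominate."""
    correct = sum(round(per_class[c] * counts[c]) for c in counts)
    return correct / sum(counts.values())

macro = macro_average(TOPICAL)                # (0.850 + 0.103) / 6
micro = micro_average(TOPICAL, LOG_COUNTS)    # (68 + 4) / 150
```

Under this reading the topical approach gets 68 Joy logs and 4 Sadness logs right out of 150, which gives the reported micro-average of 0.480, while the macro-average of 0.159 is pulled down by the four classes with no correct log.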
Class        Topical   Emotional   Retrieval   Lexicon
Love         0.000     0.000       0.000       0.000
Joy          0.850     0.238       0.438       0.325
Surprise     0.000     0.000       0.000       1.000
Angry        0.000     0.000       0.000       0.000
Sadness      0.103     0.000       0.103       0.026
Fear         0.000     0.000       0.000       0.000
Macro-Avg    0.159     0.040       0.090       0.225
Micro-Avg    0.480     0.127       0.260       0.187
Table 5. Precision of user emotion detection in messenger logs
S \ A    1     2     3     4     5     6
1        0     3     0     0     0     0
2        9     68    1     13    33    4
3        1     0     0     0     0     0
4        0     1     0     1     0     0
5        0     4     0     1     4     0
6        0     0     0     0     0     0
Table 6. Confusion matrix of emotion detection
(topical approach; S: system; A: answer)
5. Related work
The experiments in this paper were related to several research
problems, including emoticon generation, emotional lexicon
collection, and emotion detection.
The emoticon detection experiments in this paper can be viewed as
an emoticon generation process. Our approaches predicted whether
an emoticon should be found in the current sentence, in other
words, the suitable emoticon for the current sentence. Suzuki and
Tsuda [8] analyzed texts and generated emoticons for sentences
automatically for cell-phone messages, which was similar to what
we did. Emoticons have been used to reduce dependency in machine
learning techniques for sentiment classification [9]. Therefore,
emoticon sentences were also used for subjective sentence
classification. However, sentences in those studies were
classified into only positive and negative classes [10], possibly
with an additional neutral class, instead of forty emoticon
classes, which was different from our experiments.
Various emotion dictionaries have been proposed. Some of them
cover emotions but are not limited to them, like General Inquirer
(footnote 2) and SentiWordNet [11]. Some researchers created
dictionaries or lexicons specifically for emotion analysis [12].
Researchers utilized the existing emotion dictionaries to
classify emotions in texts [13]. However, Osherenko and André
studied the necessity of affect dictionaries and found that they
did not provide much additional information [14], which conformed
to our results.
Knowing emotions is a good way to predict the further actions or
demands of users. Many researchers found experimental materials
on Internet chatting platforms [15]. Matsumoto, Fuji, and Kuroiwa
developed a system utilizing sentence structures to estimate the
emotion from a text paragraph [16], which resembled the system we
would like to build. However, they analyzed the texts from the
reader's perspective, which was different from our aim and future
applications.
2 http://www.wjh.harvard.edu/~inquirer/
6. Conclusion and future work
This paper aimed to detect users' emotions from messenger logs,
which is an emotion analysis research problem from the author's
perspective. Forty emoticon classes and six emotion classes were
adopted for the experiments. The emoticon sentences from Yahoo!
blogs were utilized to learn the emotional scores of words. Four
approaches, namely the topical approach, the emotional approach,
the retrieval approach, and the lexicon approach, were proposed.
The topical approach performed the best, while the lexicon
approach did better in minority classes. The micro-average
precision of the topical approach reached 0.48, which was
satisfactory compared to other studies.
Several improvements can be implemented in the future. To improve
the topical approach, different scoring functions will be tested.
To improve the emotional approach, a sentence classification can
be performed and sentences of the same class can be concatenated
to generate short documents for each emoticon class. To improve
the retrieval approach, real information retrieval systems such
as Lucene (footnote 3) will be adopted. To improve the lexicon
approach, we will try to use more emotion dictionaries in the
experiments. In this paper, we suffered from the unbalanced
materials among classes. We plan to collect more materials for
small classes to further examine the effect of the quantity
factor.
3 http://lucene.apache.org/core/
Acknowledgements
Research of this paper was partially supported by National
Science Council, Taiwan, under the contract
NSC100-2218-E-224-013-.
References
[1] Dipankar Das, “Analysis and Tracking of Emotions in English
and Bengali Texts: A Computational Approach”, Proceedings of the
International World Wide Web Conference (WWW 2011), 2011,
pp. 343-347.
[2] Elena Frantova and Sabine Bergler, “Automatic Emotion
Annotation of Dream Diaries”, K-CAP Workshop on Analyzing Social
Media to Represent Collective Knowledge, California, USA, 2009.
[3] Jerome R. Bellegarda, “Emotion Analysis Using Latent
Affective Folding and Embedding”, Proceedings of the NAACL HLT
2010 Workshop on Computational Approaches to Analysis and
Generation of Emotion in Text, Los Angeles, 2010, pp. 1-9.
[4] W. Parrott, “Emotions in Social Psychology”, Psychology
Press, Philadelphia, 2001.
[5] Lun-Wei Ku, Cheng-Wei Sun, and Ya-Hsin Hsueh, “Demonstration
of IlluMe: Creating Ambient According to Instant Message Logs”,
Proceedings of 50th Annual Meeting of Association for
Computational Linguistics, demo paper, Jeju, Republic of Korea,
July 8-14, 2012.
[6] Kevin Hsin-Yih Lin, Changhua Yang, and Hsin-Hsi Chen,
“Emotion Classification of Online News Articles from the Reader’s
Perspective”, Proceedings of the 2008 IEEE/WIC/ACM International
Conference on Web Intelligence, 2008, pp. 220-226.
[7] Lun-Wei Ku and Hsin-Hsi Chen, “Mining Opinions from the Web:
Beyond Relevance Retrieval”, Journal of American Society for
Information Science and Technology, Special Issue on Mining Web
Resources for Enhancing Information Retrieval, 58(12), 2007,
pp. 1838-1850.
[8] Nobuo Suzuki and Kazuhiko Tsuda, “Automatic Emoticons
Generation Method For Web Community”, IADIS International
Conference on Web Based Communities, San Sebastian, Spain, 2006,
pp. 331-334.
[9] Jonathon Read, “Using emoticons to reduce dependency in
machine learning techniques for sentiment classification”,
Proceedings of the ACL Student Research Workshop, 2005,
pp. 43-48.
[10] Ying-Tse Sun, Chien-Liang Chen, Chun-Chieh Liu, Chao-Lin
Liu, and Von-Wun Soo, “Sentiment Classification of Short Chinese
Sentences”, Proceedings of the 22nd Conference on Computational
Linguistics and Speech Processing (ROCLING 2010), Nantou, Taiwan,
2010, pp. 184-198.
[11] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani,
“SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment
Analysis and Opinion Mining”, Proceedings of the Seventh
International Conference on Language Resources and Evaluation
(LREC 2010), Malta, May 17-23, 2010, pp. 2200-2204.
[12] Ge Xu, Xinfan Meng, and Houfeng Wang, “Build Chinese Emotion
Lexicons Using A Graph-based Algorithm and Multiple Resources”,
Proceedings of the 23rd International Conference on Computational
Linguistics (Coling 2010), Beijing, 2010, pp. 1209-1217.
[13] Futoshi Sugimoto and Yoneyama Masahide, “A Method for
Classifying Emotion of Text Based on Emotional Dictionaries for
Emotional Reading”, Second International Symposium on
Communications, Control, and Signal Processing, Marrakech,
Morocco, 2006.
[14] Alexander Osherenko and Elisabeth André, “Lexical Affect
Sensing: Are Affect Dictionaries Necessary to Analyze Affect?”,
Affective Computing and Intelligent Interaction (ACII2007), LNCS
4738, Lisbon, Portugal, 2007, pp. 232-243.
[15] Shashank and Pushpak Bhattacharyya, “Emotion Analysis of
Internet Chat”, Proceedings of ICON-2008: 6th International
Conference on Natural Language Processing, Macmillan Publishers,
India.
[16] Kazuyuki Matsumoto, Ren Fuji, and Shingo Kuroiwa, “Emotion
Estimation System Based on Emotion Occurrence Sentence Pattern”,
International Conference on Intelligent Computing (ICIC 2006),
LNAI 4114, 2006, pp. 902-911.