International Journal of Advanced Intelligence
Volume 3, Number 1, pp.1-24, March, 2011.
© AIA International Advanced Information Institute
Analysis of Wakamono Kotoba Emotion Corpus and
Its Application in Emotion Estimation
Kazuyuki Matsumoto
Yusuke Konishi
Hidemichi Sayama
Fuji Ren
Faculty of Engineering, University of Tokushima
2-1 Minami-josanjima, Tokushima, 770-8506, Japan
{ matumoto;ren} @is.tokushima-u.ac.jp
Received (November 10, 2010)
Revised (January 18, 2011)
Recently, much research has aimed to estimate emotion from text. The meanings
of linguistic expressions used in daily life vary depending on the context in which they
are used; that is, the information they carry is ambiguous. In particular,
the so-called “Wakamono Kotoba,” the Japanese language used by young people, contains
semantic ambiguities. Such words are usually not included in existing dictionaries,
which makes their meanings difficult to recognize. In this research project
we proposed a method to estimate emotion from sentences that include Wakamono Kotoba by using statistical learning methods such as the Naïve Bayes method and the Accumulation
method. Existing research has usually focused on learning methods that use words or word
N-grams as features. However, such word-based features are insufficient for processing Wakamono Kotoba, because Wakamono Kotoba often cannot be recognized as a single semantic
word by morphological analysis. In this paper we describe how we constructed a linguistic resource, the Wakamono Kotoba emotion corpus, for use in emotion recognition,
and we introduce the features we obtained from statistical analysis. Our Wakamono Kotoba
emotion corpus includes Japanese words used by young people to express emotion. These
were mainly gathered from Weblogs written by young people from their teens
to their twenties.
Keywords: Emotion Recognition, Wakamono Kotoba, Emotion Corpus.
1. Introduction
With the recent development of information and network technology, we are becoming better able to communicate with each other remotely by using electronic tools
such as e-mail or web chat. In these text-based communications, users cannot observe the
other party’s facial expressions or tone of voice; therefore, they often fail to understand
the communication partner’s emotions. Combined with the anonymity afforded by
these technologies, this lack of emotional transparency sometimes results in misunderstandings or in personal attacks being exchanged
between people.
Emoticons and pictographs are often used to carry out text-based communication smoothly and convey the writer’s emotion precisely. However, such emoticons
K. Matsumoto, Y. Konishi, H. Sayama and F. Ren
and pictographs are only supplementary. Some newly created words can express complex and rich emotions without them. For example, “Uza-kawaii”
conveys negative impressions such as “annoying” or “foolish”; however, it also conveys
positive ones such as “cute” [4].
Such words vividly express complex emotions. They are generally called “Wakamono Kotoba” and are typically used among young Japanese people [5,6]. Against the
background of increasing Internet terminology, which is used only in Web communication, Wakamono Kotoba is thought to be used mostly by the young
generation, usually people under twenty years old. They use Web communication tools
such as Weblogs on a daily basis, and use Wakamono Kotoba there [2,3]. Many
Wakamono Kotoba are abbreviations of other words and are very difficult for
older generations to understand. For example, it is relatively easy to guess that
“Me-ad” is an abbreviation of “mail-address”; however, “Damu-i” and “Chikiru”
are incomprehensible because they do not retain the original form of
the words.
Some Wakamono Kotoba consist of existing words; however, they appear semantically unnatural because they are used with meanings different from the original
ones. For example, “Ton-Katsu” is the name of a popular Japanese dish;
however, young people sometimes use “Ton-Katsu” to mean “to win from an outrageous
situation,” as an abbreviation of the following phrase:
• Ton demonai joukyo kara Katsu. (To win from an outrageous situation.)
The abbreviations used in Wakamono Kotoba are of several types.
Some shorten a word without changing its part of speech: a noun becomes an abbreviated noun, an adjective an abbreviated adjective, and a verb an
abbreviated verb. Others shorten whole sentences into jargon whose meaning
is especially difficult to decipher. Our research group has been working on estimating the emotion of a writer from Weblogs. As an approach to this aim, we
focused on the wording used on the Web. We considered that Wakamono Kotoba could not be
ignored in emotion estimation from Weblogs, because many emotional expressions
on Weblogs are Wakamono Kotoba. There is no clear line
between the words used by the young generation in daily life and the words frequently used
on the Internet, and the definition of Wakamono Kotoba is ambiguous [3,8]. Therefore, in our study we focused on words newly coined in the last few years and
designated them as Wakamono Kotoba based on our own judgment. The chosen words
were set as keywords, and sentences including the keywords were automatically collected from Weblogs. We then annotated the collected sentences
with emotion tags to create a corpus. Using this corpus, we conducted
experiments on emotion estimation from sentences including Wakamono
Kotoba.
Section 2 describes a method for estimating the emotion of the writer based on
Wakamono Kotoba and describes how we constructed a Wakamono Kotoba emotion
corpus. Section 3 compares the proposed method based on the emotion estimation
model and the existing method by conducting experiments to evaluate the effectiveness of the proposed method. A discussion of the obtained results will be found
in Section 4. Finally, we conclude and outline future work in Section 5.
2. Proposed Method
2.1. Emotion Estimation Method for Sentences Including
Wakamono Kotoba
Wakamono Kotoba can be broadly classified into two groups: words that express or evoke
emotion and words that do not. However, this classification is not always
semantically clear. For example, the word “Yabai” originally means “dangerous,”
but the young generation uses it to mean “great.” “Yabai” is a direct example in which
the original negative emotion is changed to a positive emotion in its usage as
Wakamono Kotoba.
Most Wakamono Kotoba are coined words; therefore, they are not registered in
dictionaries. It is thus difficult to segment them properly into words by morphological analysis, and they are sometimes judged to be unknown words. Table 1 shows
statistics on the parts of speech assigned to Wakamono Kotoba that were recognized
as one word. We used MeCab ver. 0.98 (footnote a). MeCab has an unknown-word processing
capability, and in the default setting it does not assign “unknown” as a
part-of-speech tag.
Table 1. POS analysis of Wakamono Kotoba by the POS tagger.
Part of Speech   Freq.
Noun             95
Verb             7
Adjective        4
Adverb           3
Prefix           1
Table 2 shows examples of the words that were recognized as “unknown” and
as “noun” when unknown words were set to be tagged “UNKNOWN.” The table shows
that the Wakamono Kotoba recognized as a single unknown word mostly consisted of
only Hiragana or Katakana characters.
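The observation about character types can be checked mechanically. The following is a minimal sketch and not part of the original system: the function name `is_kana_only` is hypothetical, and it assumes that “Hiragana or Katakana” corresponds to the Unicode blocks U+3040 to U+309F and U+30A0 to U+30FF (the Katakana block also contains the prolonged sound mark ー).

```python
def is_kana_only(word):
    """Return True if every character of `word` lies in the Unicode
    Hiragana (U+3040..U+309F) or Katakana (U+30A0..U+30FF) block.

    Assumption: these two blocks are taken as the definition of
    "only Hiragana or Katakana characters".
    """
    kana_range = range(0x3040, 0x3100)  # both blocks are contiguous
    return bool(word) and all(ord(ch) in kana_range for ch in word)


# Words taken from Table 2 of the text.
for w in ["ギガンティック", "ぼふぼふ", "度肝", "鬼太郎"]:
    print(w, is_kana_only(w))
```

Applied to a whole word list, such a check would give the share of kana-only entries among the words tagged “UNKNOWN”.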
Many Wakamono Kotoba are compound words that consist of combinations of
existing words. Therefore, if a Wakamono Kotoba is not recognized as one
word, it is difficult to tell whether a component word is part of the Wakamono Kotoba or
is used with its traditional meaning. This might decrease the accuracy of emotion
estimation. We thought that it would be a problem for estimating the emotion of
a http://mecab.sourceforge.net/
the writer if word segmentation in pre-processing did not work correctly, given that
most methods for emotion estimation from text use words as features. Harada [10,11]
and Kubomura [12] attempted to convert Wakamono Kotoba into known words. Their
research focused on Wakamono Kotoba of the transformation type and tried to convert
them into existing known words based on their features, with the
aim of improving the accuracy of analysing Weblog sentences that include Wakamono Kotoba. However, if the conversion of a Wakamono Kotoba into known words
fails, the sentence will be interpreted as having a completely different meaning.
In addition, a subtle nuance of emotion might be lost by converting Wakamono Kotoba into existing known words. Therefore, we thought that a method for emotion
estimation that does not convert Wakamono Kotoba would be necessary. Of course,
sentences also include emotional words other than Wakamono Kotoba, and those
should be used as features as well.
In this paper, we extended the existing emotion estimation method based on
word features, and proposed a new method specialized for emotion estimation from
the sentences including Wakamono Kotoba.
Table 2. Example of unknown word and noun.
POS       Freq.   Example
UNKNOWN   69      ギガンティック, ぼふぼふ, モゲモゲ, カルフール, ぐわんぐわん, チュープリ, チャイチャイ, ドカドカ, ラブラブ, etc
Noun      33      オルソン, ゼニガメ, 友希, 鬼太郎, エレキ, 度肝, シリコン, etc
Table 3. Typical example of Wakamono Kotoba.
イタメシ,渋カジ,きよぶた,ウルウル,KY,うざい,キモい,
けばい,セレブ, ぱくる,すっぴん,タカビー,どたキャン,
オタッキー,カンペ,卒アル,脳内ミュージック,しゃばい
2.2. Proposed System
In this subsection, we describe the overview of a system to estimate emotion from
sentences including Wakamono Kotoba. This paper does not discuss extracting
Wakamono Kotoba from sentences because we target only the sentences including
Table 4. Basic emotion category.
Anger, Love, Surprise, Anxiety, Respect, Hate, Joy, Sorrow, Hope
Wakamono Kotoba. First, the input text, which consists of several sentences, is split into
sentence units at sentence-final punctuation marks
such as ‘。’ or ‘.’. Next, we match each word in
the sentences against the Wakamono Kotoba dictionary. If a Wakamono Kotoba is matched
in a sentence, the sentence is split before and after the Wakamono Kotoba, and
word N-grams are produced for each resulting part. Then, the emotion estimation result
is obtained from these features, based on the Wakamono Kotoba emotion
dictionary and the emotion estimation model. We used Naïve Bayes classification
or the accumulation method to learn the emotion estimation model. Finally, the
emotion category that obtained the maximum evaluation score is output.
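The first two steps of this flow can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the sentence is assumed to be already tokenized (the paper uses MeCab, which is not called here), dictionary matching is simplified to single-token lookup, and keeping the matched Wakamono Kotoba as a segment of its own is an assumption.

```python
import re


def split_sentences(text):
    """Split text into sentence units at sentence-final '。' or '.'."""
    units = re.findall(r"[^。.]+[。.]?", text)
    return [u.strip() for u in units if u.strip()]


def split_around_wakamono(tokens, dictionary):
    """Split a token list before and after each dictionary entry.

    `tokens` is a list of words; `dictionary` is a set of Wakamono
    Kotoba. Returns a list of segments (each a list of tokens).
    Assumption: a matched entry forms a segment by itself.
    """
    segments, current = [], []
    for tok in tokens:
        if tok in dictionary:
            if current:
                segments.append(current)
            segments.append([tok])
            current = []
        else:
            current.append(tok)
    if current:
        segments.append(current)
    return segments


# Hypothetical tokenized sentence containing a word from Table 10.
tokens = ["この", "服", "ウザ可愛い", "よ", "ね"]
print(split_sentences("今日はやばい。明日も行く。"))
print(split_around_wakamono(tokens, {"ウザ可愛い"}))
```

Word N-grams would then be generated separately within each returned segment, so that no N-gram spans the boundary of a Wakamono Kotoba.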
2.2.1. Wakamono Kotoba Emotion Corpus
The Wakamono Kotoba emotion corpus consists of the sentences including Wakamono Kotoba. Each sentence is annotated with tags to indicate the type of emotion.
2.2.2. Wakamono Kotoba
Wakamono Kotoba is defined as slang or jargon used by people from junior high
school age to around their thirties. It is typically used to promote communication,
amusement or solidarity, to convey an ambiguous image, or to hide, soften or clarify
something. It is also considered to include specific words or phrases that convey freedom from traditional rules or a sense of amusement [3]. However, Wakamono Kotoba
differs from slang in general in that only young people use it frequently. For English,
The Online Slang Dictionary [1] introduces many slang words.
Table 3 shows typical Wakamono Kotoba that have appeared from the late
1990s to the present. These words usually come and go with the times; therefore,
existing language processing tools such as morphological analysis systems cannot
handle most of them.
2.2.3. Selection of Wakamono Kotoba
Because there are many words that can be applied to the definition of Wakamono
Kotoba, we focused on words fitting the following conditions:
1) Words frequently used on the Web
2) Words that can express or evoke any emotion by themselves
Existing known words are sometimes used with a different meaning as Wakamono
Kotoba. Such ambiguous words were also considered in our study. The collected
Wakamono Kotoba were registered in the Wakamono Kotoba dictionary. For most
Wakamono Kotoba, the part of speech and the semantic attributes are unclear and
very difficult to define; therefore, we did not annotate such attributes.
2.2.4. Construction of Wakamono Kotoba Emotion Corpus
We constructed the Wakamono Kotoba emotion corpus as follows:
1. Search Weblogs (footnote b).
2. From the retrieved articles that include Wakamono Kotoba, choose for the corpus the sentences in which the word is genuinely used as Wakamono Kotoba, considering
that some existing words have the same spelling as Wakamono Kotoba.
3. Add an emotion tag to each sentence in the corpus. The kinds of
emotion tags are listed in Table 4. The distribution of the annotated tags
is shown in Table 5.
If the sentences obtained from Weblogs expressed an emotion without using Wakamono Kotoba, they were registered in another emotion corpus, but only when the expressed emotion was Anger, Hate, Joy or Hope, the four most frequent emotion
categories in the Wakamono Kotoba emotion corpus.
Table 6 shows the statistics of each emotion. Table 9 shows the example sentences
included in the corpus. Table 10 shows the example of the registered words.
Table 7 shows the part-of-speech distribution of Wakamono
Kotoba when they were recognized as one morpheme. For morphological analysis
we used three dictionaries: the IPA dictionary (footnote c), UniDic (footnote d) and the Naist dictionary (footnote e). The
part-of-speech distributions obtained with the IPA dictionary and the Naist dictionary were
similar because both dictionaries are based on the same POS tag set.
When morphological analysis was conducted with unknown words set to be
tagged “UNKNOWN”, the POS was distributed as in Table 8. Among the three dictionaries, UniDic most frequently recognized Wakamono Kotoba as nouns.
Our research aims to estimate emotion based on words; however, Wakamono Kotoba do not always have to be analyzed as one word. As a feature we can also use
a string of multiple words that expresses a more complex meaning. Therefore, we
counted how many Wakamono Kotoba failed to be recognized as a sequence of N
consecutive morphemes. As Fig. 1 shows, by shifting a window over the string of words one word at a time, we checked
whether the string of words matched a Wakamono Kotoba.
Fig. 2 shows the morphological analysis errors for Wakamono
Kotoba with the three morphological analysis dictionaries. As this result shows, when
b http://blog-search.yahoo.co.jp
c http://sourceforge.jp/projects/ipadic/
d http://www.tokuteicorpus.jp/dist/
e http://sourceforge.jp/projects/naist-jdic/wiki/FrontPage
Input:
懐メロ・・・・そんなに好きじゃないハズなのに時々脳内ミュージックになるよね(笑)
Split into morphemes:
懐メロ / ・ / ・ / ・ / ・ / そんなに / 好き / じゃ / ない / ハズ / な / のに / 時々 / 脳 / 内 / ミュージック / に / なる / よ / ね / ( / 笑 / )
Extract feature:
N=1: 懐メロ / ・ / ・ / ・ / ・ / そんなに / 好き / じゃ / ない / ハズ / な / のに / 時々 / 脳 / 内 / ミュージック / に / なる / よ / ね / ( / 笑 / )
N=2: 懐メロ・ / ・・ / ・・ / ・・ / ・そんなに / そんなに好き / 好きじゃ / じゃない / ないハズ / ハズな / なのに / のに時々 / 時々脳 / 脳内 / 内ミュージック / ミュージックに / になる / なるよ / よね / ね( / (笑 / 笑) / )
N=3: 懐メロ・・ / ・・・ / ・・・ / ・・そんなに / ・そんなに好き / そんなに好きじゃ / 好きじゃない / じゃないハズ / ないハズな / ハズなのに / なのに時々 / のに時々脳 / 時々脳内 / 脳内ミュージック / 内ミュージックに / ミュージックになる / になるよ / なるよね / よね( / ね(笑 / (笑)
Fig. 1. Example of checking morphological analysis result.
the maximum number of morphemes (N) was over 4, there was little further reduction
in analysis errors.
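The window-shifting check illustrated in Fig. 1 can be sketched as below. This is a minimal sketch with a hypothetical function name; it assumes the morphemes are given as a list of strings and that a match means the concatenation of consecutive morphemes equals the Wakamono Kotoba exactly.

```python
def smallest_matching_n(morphemes, target, max_n):
    """Slide a window of 1..max_n consecutive morphemes over the
    sentence and return the smallest window size whose concatenation
    equals `target`, or None if no window of size up to max_n matches.
    """
    for n in range(1, max_n + 1):
        for i in range(len(morphemes) - n + 1):
            if "".join(morphemes[i:i + n]) == target:
                return n
    return None


# Part of the morpheme sequence from Fig. 1.
morphemes = ["時々", "脳", "内", "ミュージック", "に", "なる"]
print(smallest_matching_n(morphemes, "脳内ミュージック", 2))
print(smallest_matching_n(morphemes, "脳内ミュージック", 3))
```

In this example the word is split into three morphemes, so it is counted as an analysis error for a maximum N of 1 or 2 and is recovered once N reaches 3.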
Table 5. Distribution of tag annotation to the corpus (Wakamono Kotoba emotion corpus).
Tag        Freq.   Percentage(%)
Hate       1,273   26.67
Joy        1,029   21.56
Hope       960     20.11
Anger      687     14.39
Anxiety    264     5.53
Love       188     3.94
Sorrow     130     2.72
Respect    127     2.66
Surprise   115     2.41
Table 6. Distribution of tag annotation of the corpus (regular emotion corpus).
Tag     Freq.
Anger   2,269
Joy     1,236
Hate    1,103
Hope    742
Table 7. POS distribution of Wakamono Kotoba by POS tagger based on the three dictionaries (default setting).
Dic.     Noun          Verb        Adjective    Adverb     Other
IPA      1309 (88.0)   83 (5.6)    82 (5.5)     10 (0.7)   4 (0.3)
UniDic   1050 (70.6)   114 (7.7)   155 (10.4)   84 (5.6)   84 (5.6)
Naist    1288 (87.1)   92 (6.2)    77 (5.2)     9 (0.6)    12 (0.8)
Table 8. POS distribution of Wakamono Kotoba by POS tagger based on three dictionaries (unknown setting).
Dic.     UNKNOWN      Noun         Verb        Adjective    Adverb     Other
IPA      911 (61.2)   399 (26.8)   83 (5.6)    82 (5.5)     10 (0.7)   3 (0.2)
UniDic   224 (15.1)   826 (55.5)   114 (7.7)   155 (10.4)   84 (5.6)   84 (5.6)
Naist    863 (58.4)   434 (29.4)   92 (6.2)    77 (5.2)     9 (0.6)    3 (0.2)
Table 9. Sample sentences in the corpus.
目がドロンとしていてキモイ!   Hate
体によさそうなので、カニカマとローテがいいかもしれませんね(笑)久々のガングロちゃん。さらにパワーアップ?!   Hope
むかつくというか、いらつくというか・・・・   Anger
[Figure: “Error of Morphological Analysis”; vertical axis: Num. of Error (0 to 3000); horizontal axis: Max of N (0 to 7); series: IPA, UniDic, Naist.]
Fig. 2. Morphological analysis error of Wakamono Kotoba.
Table 10. Example of registering words.
Word                          Gloss                                               Emotion
Pasokin ( パソ禁 )             being prohibited from using personal computer       Hate
Karouji ( 過労児 )             child who is very busy and never catches a break    Hate
Tekuhara ( テクハラ )          technology harassment                               Hate
Oyaji gyag ( オヤジギャグ )    old man jokes                                       Hate
Uzakawaii ( ウザ可愛い )       annoying but cute                                   Joy
Ekokawa ( エコかわ )           cheap and cute item                                 Joy
Konotai ( 好タイ )             one’s type                                          Hope
2.2.5. Wakamono Kotoba Emotion Estimation Model
Sentences that include Wakamono Kotoba are usually colloquial or casual. Emotion
estimation methods based on grammatical rules or sentence patterns cannot be applied to such sentences.
Matsumoto et al. [16] proposed a method that estimates emotion by fitting
sentences to prepared sentence patterns. When a sentence did not fit
any sentence pattern, they estimated its emotion from the words in
the sentences. Aman et al. [17] estimated sentence emotion based on the words in the
sentences, referring to Roget’s Thesaurus and WordNet-Affect. They used the six
basic emotion categories proposed by Ekman for estimation.
Changqin et al. [18] constructed a large Weblog emotion corpus and proposed an emotion estimation method based on keywords expressing emotions.
Mishina et al. [19] proposed an emotion estimation method based on the similarity of
word N-grams.
They used an emotion-tagged corpus as training data and obtained the appearance frequencies of word N-grams for emotion estimation. However, these researchers
did not discuss applicability to sentences that include words such as
Wakamono Kotoba, which are usually not included in dictionaries. Therefore,
we used the Naïve Bayes method, a traditional text classification algorithm that is known to be effective for spam mail classification. For comparison we also used
the “accumulation method,” which uses a simpler algorithm than the Naïve Bayes
method.
Yamamoto et al. [9] used Naïve Bayes classifiers for emotion estimation and reported that the method could estimate emotion with approximately 76.8% accuracy
by using words as features. When a sentence without Wakamono Kotoba was input, the Naïve Bayes classification model constructed from the sentences
without Wakamono Kotoba was used for emotion estimation.
1) Naïve Bayes Classification: Naïve Bayes classification is based on the
idea that the words included in a sentence should greatly contribute to deciding the
sentence category. When a set of words included in sentence $S$ is defined as $W$, the
category that maximizes the posterior probability $P(c \mid S)$ is calculated to decide the
category of sentence $S$.
Equation 1 shows the Naïve Bayes algorithm.
$$c = \operatorname*{argmax}_{c_k \in C} P(c_k) \prod_{i=1}^{|W|} P(w_i \mid c_k) \tag{1}$$
If even one feature word in the target sentence does not appear in the training
data, the evaluated value for the target sentence becomes 0; this is the
data sparseness problem. Therefore, we use the Laplace method as a
smoothing method. The Laplace method adds 1 to the frequency
of each feature word in the training data so that the frequencies of all feature words
are at least 1. Equation 2 shows how the probability of occurrence of feature
word $w_i$ in category $c_k \in C$ is corrected by the Laplace method. In the
equation, $\mathit{freq}_{i,k}$ is the frequency of occurrence of word $w_i$ in category $c_k$, $\mathit{freq}_k$
is the total number of feature words in category $c_k$, and $|C|$ is the total
number of categories.
$$P(w_i \mid c_k) = \frac{\mathit{freq}_{i,k} + 1}{\mathit{freq}_k + |C|} \tag{2}$$
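Equations 1 and 2 can be sketched in code as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the smoothing denominator adds the number of categories exactly as Equation 2 is printed, and summing logarithms instead of multiplying probabilities is an added assumption made to avoid numerical underflow (it does not change the argmax).

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """samples: list of (feature_list, category) pairs.
    Returns per-category sentence counts and feature counts."""
    cat_count = Counter()
    feat_count = defaultdict(Counter)
    for features, cat in samples:
        cat_count[cat] += 1
        feat_count[cat].update(features)
    return cat_count, feat_count


def classify_naive_bayes(features, cat_count, feat_count):
    """Return the category maximizing log P(c) + sum_i log P(w_i | c),
    with P(w_i | c) smoothed as in Equation 2."""
    n_samples = sum(cat_count.values())
    n_cats = len(cat_count)  # |C| in Equation 2
    scores = {}
    for cat in sorted(cat_count):
        total = sum(feat_count[cat].values())  # freq_k
        score = math.log(cat_count[cat] / n_samples)
        for f in features:
            # An unseen feature has count 0, so the ratio stays positive.
            score += math.log((feat_count[cat][f] + 1) / (total + n_cats))
        scores[cat] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data (romanized features).
samples = [
    (["uzai", "mou"], "Hate"),
    (["tanoshimi", "komike"], "Hope"),
    (["kimoi"], "Hate"),
]
cat_count, feat_count = train_naive_bayes(samples)
print(classify_naive_bayes(["komike"], cat_count, feat_count))
```

Because of the added 1 in the numerator, a feature that never occurred in a category lowers that category's score without forcing it to zero.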
2) Accumulation Method: The accumulation method is a text classification method proposed by Suzuki [13,14]. This method uses the frequency of each
feature word in each category and assigns the feature word to the category in which
it appears most frequently. The evaluation value of the feature word for that
category is decided by calculating the difference of its frequencies. Equation 3
shows how to calculate this difference.
$$NDF(w_m, C) = \frac{f_1(w_m) - f_2(w_m)}{f(w_m)} \tag{3}$$
$C$ indicates the set of all categories. $NDF(w_m, C)$ is calculated by subtracting
the appearance frequency $f_2(w_m)$ of the feature word $w_m$ in the second most frequently
appearing category from its appearance frequency $f_1(w_m)$ in the most frequently appearing category, then dividing the result by $f(w_m)$, the sum of the appearance
frequencies of the feature word $w_m$ in all categories. One of the merits of this method
is that, by deciding a single category for each feature word, it can reduce the amount of
calculation, especially when dealing with an enormous number of features.
The difference of the frequencies was calculated for all feature words in a sentence, and the sum of these values was set as the category evaluation score. The category
evaluation score is calculated by Equation 4.
$$E_{s_x}(c_k) = \sum_{w_m \in W} f_{s_x}(w_m) \, NDF(w_m, C) \tag{4}$$
In the equation, $E_{s_x}(c_k)$ is the category evaluation value of sentence $s_x$ for
category $c_k$, and $c_k(w_m)$ indicates the category to which feature word $w_m$ belongs.
$W$ indicates the set of feature words $w_m$ in sentence $s_x$ that belong to category $c_k$.
$f_{s_x}(w_m)$ indicates the appearance frequency of feature word $w_m$ in sentence $s_x$.
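A compact sketch of Equations 3 and 4 follows. It is an illustration under stated assumptions rather than Suzuki's or the authors' implementation: the function names are hypothetical, a word seen in only one category is given a second-category frequency of 0, and ties between categories are broken by the order in which categories were first seen.

```python
from collections import Counter, defaultdict


def train_accumulation(samples):
    """samples: list of (feature_list, category) pairs.
    Returns {word: (assigned_category, NDF)} following Equation 3."""
    freq = defaultdict(Counter)  # word -> Counter of category frequencies
    for features, cat in samples:
        for word in features:
            freq[word][cat] += 1
    model = {}
    for word, by_cat in freq.items():
        ranked = by_cat.most_common()
        top_cat, f1 = ranked[0]
        # Assumption: f2 is 0 when the word occurs in a single category.
        f2 = ranked[1][1] if len(ranked) >= 2 else 0
        model[word] = (top_cat, (f1 - f2) / sum(by_cat.values()))
    return model


def classify_accumulation(features, model):
    """Accumulate f_sx(w) * NDF(w) per category (Equation 4) and
    return (best_category, scores); (None, {}) if nothing matches."""
    scores = defaultdict(float)
    for word, count in Counter(features).items():
        if word in model:
            cat, ndf = model[word]
            scores[cat] += count * ndf
    if not scores:
        return None, {}
    return max(scores, key=scores.get), dict(scores)


# Hypothetical toy training data (romanized features).
samples = [
    (["yabai", "saiko"], "Joy"),
    (["yabai"], "Joy"),
    (["yabai", "uzai"], "Hate"),
    (["uzai"], "Hate"),
]
model = train_accumulation(samples)
print(classify_accumulation(["yabai", "uzai", "uzai"], model))
```

In this toy data “yabai” occurs twice under Joy and once under Hate, so it is assigned to Joy with a small NDF, whereas “uzai” occurs only under Hate and receives the maximum NDF of 1.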
2.2.6. Features Used in Emotion Estimation
In this research, we used word N-grams as features and specified the range of N. For
example, when N = [1, 5], we used 1-grams to 5-grams as features. As the value of N
increases, the appearance frequency of an N-gram becomes low and it becomes difficult to use in matching.
Therefore, we treated word N-grams with larger N as more important features
than those with smaller N. Specifically, we multiplied the appearance frequency
directly by the value of N.
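The weighting can be illustrated with a short sketch. The function name is hypothetical, and how the weighted counts enter each classifier is not specified in the text, so the sketch only produces the weighted feature counts for one token sequence.

```python
from collections import Counter


def weighted_ngram_features(tokens, n_min=1, n_max=3):
    """Return {ngram: N * frequency} for all word N-grams with
    n_min <= N <= n_max. Each N-gram is represented as a tuple of
    words; adding N per occurrence yields N times the frequency."""
    weights = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            weights[tuple(tokens[i:i + n])] += n
    return weights


# Hypothetical token sequence with a repeated bigram.
features = weighted_ngram_features(["a", "b", "a", "b"], n_min=1, n_max=2)
for gram, weight in features.items():
    print(gram, weight)
```

With this scheme a bigram seen twice receives the same weight as a unigram seen four times, which compensates for the lower frequency of longer N-grams.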
3. Evaluation Experiment
3.1. Overview of Experiment
For the experiments, we did not consider the emoticons that were used for emotional expression. However, the effect of the emoticons on emotion estimation
should not be ignored. Therefore, we constructed the emotion model by using the
corpus with emoticons removed. To create a Wakamono Kotoba emotion model we
used two methods: the Naïve Bayes method and the Accumulation method. The
corpus used in the experiment is described below. As the test corpus, we used 3,919
sentences including 199 kinds of Wakamono Kotoba; all sentences were annotated with an emotion tag of Anger, Joy, Hope or Hate. We call this collection of
sentences the “Wakamono Corpus.” Considering the influence of emoticons
on emotion estimation, we also used a version of the corpus with emoticons removed for the
experiment. The 5,148 sentences without Wakamono Kotoba but with
emotion tags were also subjected to the experiment under the same conditions.
The experiment used leave-one-out cross-validation, and the accuracy rate was
calculated by Equation 5.
$$\mathrm{Accuracy}(\%) = \frac{\text{correct sentence number}}{\text{total sentence number}} \times 100 \tag{5}$$
Fig. 3 shows the flow of the experiment.
Fig. 3. Construction of the experiment.
The 957 sentences in the Wakamono Kotoba Emotion Corpus included emoticons, which made up 24.4% of the corpus. We also removed Wakamono Kotoba in
the sentences, then created N-grams from the sentences for another experiment to
consider the influence of Wakamono Kotoba on emotion estimation.
The conditions of the experiment are as follows:
• A: use the corpus as is
• B: use the corpus with emoticons removed
• C: use the corpus with Wakamono Kotoba removed
• D: use the corpus with both Wakamono Kotoba and emoticons removed
Under condition C, the sentences are split before and after the Wakamono Kotoba
before N-grams are extracted from them. Under condition D, the sentences in
the corpus with emoticons removed are split before and after the Wakamono Kotoba to extract
N-grams.
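The evaluation protocol, leave-one-out cross-validation scored by the accuracy of Equation 5, can be sketched generically as follows. The names are hypothetical, and the majority-class classifier is only a placeholder standing in for the Naïve Bayes or Accumulation model; any pair of training and classification functions with the same shape could be passed in for each of the conditions A to D.

```python
from collections import Counter


def leave_one_out_accuracy(samples, train, classify):
    """samples: list of (features, gold_category) pairs.
    For each sentence, train on all other sentences, classify the
    held-out one, and return correct / total * 100 (Equation 5)."""
    correct = 0
    for i, (features, gold) in enumerate(samples):
        model = train(samples[:i] + samples[i + 1:])
        if classify(features, model) == gold:
            correct += 1
    return 100.0 * correct / len(samples)


def train_majority(samples):
    """Placeholder model: remember the most frequent category."""
    return Counter(cat for _, cat in samples).most_common(1)[0][0]


def classify_majority(features, model):
    """Placeholder classifier: always predict the stored category."""
    return model


# Hypothetical toy corpus with three Hate sentences and one Joy sentence.
toy = [(["w1"], "Hate"), (["w2"], "Hate"), (["w3"], "Hate"), (["w4"], "Joy")]
print(leave_one_out_accuracy(toy, train_majority, classify_majority))
```

Retraining once per held-out sentence is costly for a corpus of several thousand sentences; for count-based models the same result can be obtained by subtracting the held-out sentence's counts instead, which is an optimization not described in the text.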
3.2. Result of Experiment
The results of the experiment using the Naïve Bayes method (NB) and the Accumulation
method (AC) are shown in Table 11. Fig. 4 shows the result of the Naïve Bayes
method applied to the Wakamono Kotoba Emotion Corpus. For each emotion, the range N =
[1, 3] obtained the highest accuracy.
Fig. 4 also shows the accuracy rate for each maximum value of N of the N-gram
for each emotion. Fig. 5 shows the accuracy rate for each emotion when UniDic [20]
was used as the MeCab dictionary instead of the IPA dictionary. The result shows
that the accuracy rate with UniDic was higher than with the IPA
dictionary for each emotion. The reason might be that UniDic is
a word dictionary for morphological analysis specialized for emoticons or for
modern language. However, when UniDic was used with the Accumulation method,
the accuracy rate decreased overall. In the Accumulation method, the
difference between the appearance frequencies of a feature word in the most and
the second most frequently appearing emotion categories greatly affects the score.
It therefore seems that the more Wakamono Kotoba or emoticons were recognized as
one word by UniDic, the more the scores decreased. Comparing the results
before and after removing the emoticons from the corpus, there was no
pronounced difference in the accuracy rate. On the Wakamono Kotoba corpus, the accuracy
with the Naïve Bayes method was 8.8% higher than with the Accumulation
method.
Fig. 8 compares the highest accuracies
obtained with the IPA dictionary and with UniDic in the Accumulation method. The
figure shows a difference in accuracy between the IPA dictionary and
UniDic for the emotions Joy and Hope. This indicates that, when the
Accumulation method is used, the accuracy for each emotion category is susceptible to the
choice of dictionary.
From these results, it appears that the accumulation method was not effective for
emotion estimation in sentences including Wakamono Kotoba.
The corpus consisting of the sentences including Wakamono Kotoba (Wakamono Corpus) obtained higher accuracy overall than the corpus without Wakamono Kotoba (Normal Corpus).
As the accuracy of emotion estimation is also affected by
the length of the sentences in the corpus, the reasons cannot easily be identified.
However, most sentences using Wakamono Kotoba included many casual expressions
that are distinctive of spoken language. Such diversity might have
dispersed the appearance frequencies of the feature words and made it difficult to produce differences in likelihood. When Wakamono
Kotoba were removed from the Wakamono Kotoba corpus, the accuracy decreased for all ranges of N
compared with the accuracy before removal. This suggests
that Wakamono Kotoba can be an important feature for estimating emotion.
As we used an emotion estimation model based on the appearance frequency
of the feature words (word N-grams), the appearance frequencies of the words were
biased in the Wakamono Corpus and the Normal Corpus.
Table 11. Result of Naïve Bayes method (NB) and Accumulation method (AC): Accuracy(%) under conditions A to D.
Corpus     Range of N   A: NB   A: AC   B: NB   B: AC   C: NB   C: AC   D: NB   D: AC
Wakamono   1            78.9    58.0    78.0    59.0    62.1    55.3    61.4    55.9
Wakamono   1–2          87.7    57.7    88.0    58.7    72.5    55      72.8    55.6
Wakamono   1–3          84.7    54.9    85.6    56.1    67.6    52.7    60.9    53.4
Normal     1            82.8    56.7    82.7    57.2    -       -       -       -
Normal     1–2          88.6    55.5    88.4    55.8    -       -       -       -
Normal     1–3          91.0    53.5    90.9    53.8    -       -       -       -
Table 12. Example of top 4 Wakamono Kotoba with high error rate.
Wakamono Kotoba                       Error Rate(%)
Haburare (はぶられ)                    41.2
Jimikon (地味婚)                       33.3
Itaden (イタ電)                        33.3
Fast Fashion (ファストファッション)     16.7
Table 13. Example of top 4 Wakamono Kotoba with high success rate.
Wakamono Kotoba          Success Rate(%)
Komike (コミケ)           100
Furima (フリマ)           100
UmaUma (ウマウマ)         100
GyaruMama (ギャルママ)    100
[Figure: “Result of Naive Bayes: Wakamono Kotoba Corpus”; vertical axis 0 to 100; categories Anger, Hate, Hope, Joy; series N1, N2, N3.]
Fig. 4. Result of experiment (Naïve Bayes) use IPA dictionary.
[Figure: “Result of Naive Bayes: Wakamono Kotoba Corpus (Use UniDic)”; vertical axis 0 to 100; categories Anger, Hate, Hope, Joy; series N1, N2, N3.]
Fig. 5. Result of experiment (Naïve Bayes) use UniDic.
[Figure: “Result of Accumulation Method: Wakamono Kotoba Corpus”; vertical axis 0 to 100; categories Anger, Hate, Hope, Joy; series N1, N2, N3.]
Fig. 6. Result of experiment (Accumulation Method) use IPA dictionary.
[Figure: “Result of Accumulation Method: Wakamono Kotoba Corpus”; vertical axis 0 to 100; categories Anger, Hate, Hope, Joy; series N1, N2, N3.]
Fig. 7. Result of experiment (Accumulation method) use UniDic.
[Figure: vertical axis 0 to 90; categories Anger, Hate, Hope, Joy; series IPA dic, UniDic.]
Fig. 8. Comparison of IPA dictionary and UniDic at Accumulation method.
4. Discussion
4.1. Failure Examples
Table 12 indicates the top four Wakamono Kotoba that resulted in high error rates.
In this section we would like to focus on these four words and analyze the results of
the experiments. In four out of seven sentences, “Haburare,” which means “to be ostracized,”
was annotated with the wrong emotion tag. The high error rate of “Haburare” is
due to these annotation errors by the annotator. For example, it is highly unlikely that the sentences
below express the emotion of Joy.
• Shusho nanoni nani Haburare tenno? ( Is he being ostracized although he
is the Prime Minister?)
• Haburare terunokana? ( Am I being ostracized ? )
The other three failure sentences are as follows:
• Haburare temo... ( Although being ostracized, .... )
• Haburare teruyouna... ( feel like being ostracized .... )
• Barance chousei ga mendosou dakaratte Haburare soude. ( I might be ostracized because adjusting the balance looks difficult. )
“Haburare” expressed the emotions Joy: 30%, Hope: 0%, Hate: 35%, Anger:
35%; however, it is hard to estimate the correct emotion from sentences as short
as the above examples. This is probably why the system wrongly estimated
“Anger” although the correct emotion of the three sentences was “Hate.” Therefore,
some additional consideration will be necessary for estimating emotion from extremely short sentences. “Jimikon,” which means a no-frills or small wedding,
showed the emotions Joy: 11%, Hope: 56%, Hate: 33%, Anger: 0%. The sentences
including “Jimikon” were wrongly estimated as “Joy,” although the correct emotion was “Hate,” because they also
included words related to “Joy” such as “iina!” (‘sure would be nice!’),
“Hanayome ishou” (bridal dress) and “Shiawase” (happy).
We grouped the successfully estimated sentences by
Wakamono Kotoba. For 39 out of 197 Wakamono Kotoba, the accuracy
of emotion estimation was 100%. Table 13 shows the top four Wakamono Kotoba
with high accuracy. “Komike” had the highest accuracy, and the distribution of the
emotion tags annotated to the sentences including “Komike” was Joy: 2%, Hope:
68%, Hate: 24%, Anger: 0%. “Komike” tended to express the emotion of Hope by
itself. Example sentences including “Komike” are as follows:
• コミケ 終わった後の今年最後の楽しみになりそうですね。
( It will be the last fun after the “Komike” this year. )
• ちなみに今度の地元 コミケ で被っていようかなと思ってな。
( I am thinking of wearing this at the local “Komike” next time. )
• コミケ、楽しみにしてますよっ!
( I am looking forward to the “Komike”! )
Many sentences expressed enthusiasm, expectation or Hope for the
event, because “Komike” refers to an event or place. These sentences therefore also included expressions of expectation or plans, such as “Tanoshimi” and “Shiyoukana to omou,” which seems to have raised the accuracy rate. 24% of the sentences
including “Komike” were annotated with the emotion tag “Hate,” as in the examples below:
• コミケ 後も起きてなきゃいけないんですね
(Do I have to be awake after the “Komike”? )
• 今の コミケ の理念的に難しくないか
(Isn’t it too difficult from the ideology of the “Komike”? )
As these examples show, the sentences expressing Hate describe an action that is forced on the writer. They therefore include the sentence form “–nai,” which expresses strong Hate. “Furima” had
high accuracy like “Komike” because the word also refers to an event or place. The sentences
including “UmaUma” were annotated with the emotion tags Joy: 15, Hope: 1, Hate: 0,
Anger: 0, so this Wakamono Kotoba itself strongly expressed the emotion of Joy.
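The per-word analysis in this section amounts to grouping the estimation results by Wakamono Kotoba and computing an accuracy for each word. The following is a minimal sketch of that bookkeeping; the function name `accuracy_by_word` and the toy records are hypothetical and are not taken from the corpus.

```python
from collections import defaultdict


def accuracy_by_word(records):
    """Group (word, gold_tag, predicted_tag) records by word.

    Returns {word: (correct, total, accuracy)}.
    """
    counts = defaultdict(lambda: [0, 0])  # word -> [correct, total]
    for word, gold, predicted in records:
        counts[word][1] += 1
        if gold == predicted:
            counts[word][0] += 1
    return {w: (c, t, c / t) for w, (c, t) in counts.items()}


# Hypothetical toy records (illustration only, not corpus data).
records = [
    ("Komike", "Hope", "Hope"),
    ("Komike", "Hope", "Hope"),
    ("Komike", "Hate", "Hope"),
    ("Haburare", "Hate", "Anger"),
    ("Haburare", "Hate", "Anger"),
    ("UmaUma", "Joy", "Joy"),
]

stats = accuracy_by_word(records)
# Words whose sentences were all estimated correctly.
perfect = sorted(w for w, (_, _, acc) in stats.items() if acc == 1.0)
# Words ordered from lowest to highest accuracy.
by_error = sorted(stats, key=lambda w: stats[w][2])
```

Sorting the result by accuracy gives lists of the kind reported in Tables 12 and 13.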
4.2. Effect of sentence length
We analyzed the relation between sentence length and emotion estimation error
rate on the assumption that sentence length affects emotion estimation. The result
is shown in Fig. 9.
Fig. 9. Estimation error rate and sentence length. (Horizontal axis: sentence length; vertical axis: emotion estimation error rate (%).)
As the figure shows, the emotion estimation error rate becomes high when the
sentence is too long or too short. An emotion estimation mechanism that selects
appropriate features according to the length of the sentence is therefore
needed.
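A relation of this kind can be examined by binning sentences by length and computing the error rate per bin. The sketch below assumes each result is a pair of sentence length and a correctness flag; the bin width of 5, the unit of length and the sample values are illustrative assumptions, not values from the experiments.

```python
def error_rate_by_length(samples, bin_width=5):
    """Compute the estimation error rate (%) per sentence-length bin.

    samples: iterable of (sentence_length, is_correct) pairs.
    Returns {bin_start: error_rate_percent}, ordered by bin_start.
    """
    totals, errors = {}, {}
    for length, is_correct in samples:
        start = (length // bin_width) * bin_width
        totals[start] = totals.get(start, 0) + 1
        if not is_correct:
            errors[start] = errors.get(start, 0) + 1
    return {s: 100.0 * errors.get(s, 0) / totals[s] for s in sorted(totals)}


# Hypothetical results: short and long sentences fail more often.
samples = [
    (2, False), (3, False), (4, True),
    (12, True), (13, True), (14, False),
    (38, False), (39, False),
]
rates = error_rate_by_length(samples)
```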
4.3. Effect of emoticons
To examine the influence of emoticons, we compared the accuracy rate on the
Wakamono Kotoba emotion corpus with the accuracy rate on the same corpus
after the emoticons were removed. The accuracy rate was slightly higher on
the corpus without emoticons, although the difference
was within the margin of error (0.816%).
4.4. Effect of Wakamono Kotoba
Because most Wakamono Kotoba consist of multiple morphemes and carry emotion, the evaluation scores with 2-gram and 3-gram features tended to be high. In some cases,
a sentence included more than one Wakamono Kotoba.
• ただ、ちょっと 自己中 で...KY で...(汗)
Tada, chotto jikochu de … KY de… (Ase)
( He is just self-centered....and cannot sense the atmosphere. )
Both “Jikochu” and “KY” are Wakamono Kotoba expressing negative emotion
(Hate or Anger) in this example.
When Wakamono Kotoba were not used as features, the accuracy decreased. As
shown in the above example, a Wakamono Kotoba often carries emotion by itself and consists of multiple morphemes. Therefore, the number of words in a sentence
decreases significantly after the Wakamono Kotoba are removed. This tends to make the
content of the sentence more difficult to grasp and the emotion tag
more difficult to judge, as in the examples below.
• Including Wakamono Kotoba : 材料、がっつり 1人分 (Ingredients for a
person to smash)
• Not including Wakamono Kotoba : 材料、一人分 (Ingredients for a person)
In the sentences including Wakamono Kotoba, sentence form and grammar are
sometimes incorrect, making it difficult to judge emotion.
• Including Wakamono Kotoba : “Akeome! Kotoyoro!” : あけおめ!ことよろ!
(Happy New Year!)
• Not including Wakamono Kotoba : “!!” : (!!)
• Including Wakamono Kotoba : “Kimoi desho..” : きもい でしょー Kimoidesho (Yuck)
• Not including Wakamono Kotoba : “desho.” : でしょー (Desho)
These problems happen because emotion was estimated from a single sentence.
It will be necessary to consider the context for better emotion estimation.
Problems due to the wavering standard for tag annotation
When a single person annotates the tags, the annotation standard varies depending on the annotator’s feelings. Here is an example:
• ここんとこ売り上げが どじょうのぼり だな。
Kokontoko uriage ga dojonobori dana.
(Recently sales numbers have been rising reasonably.)
The annotator tagged “Hope”; however, in this sentence both “Joy” and “Hope”
are acceptable. The annotation should be conducted by multiple annotators and the
annotation standard should be set clearly.
One reason why the accuracy of emotion estimation decreased when
Wakamono Kotoba were removed might be that the Wakamono Kotoba itself
often represents the overall emotion of the sentence. We analyzed the difference in the results of emotion estimation before and after removing the Wakamono
Kotoba when the Naïve Bayes method was used with N = 3. Table 14 shows that 1,114
sentences that were estimated successfully before the Wakamono Kotoba were removed
failed in emotion estimation after the removal.
On the other hand, the table shows that 158 sentences that failed in emotion
estimation before the Wakamono Kotoba were removed were estimated successfully
after the removal.
Table 14. Analysis of effect of Wakamono Kotoba.
Before    After     Sentence Number   Percentage (%)
Success   Failure   1,114             28.42
Failure   Success   158               4.03
Success   Success   1,980             50.51
Failure   Failure   668               17.04
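The percentages in Table 14 follow directly from the four sentence counts. As a minimal sketch, the before/after comparison can be tabulated as below; the function name `transition_table` is hypothetical, and the boolean lists are rebuilt from the counts in Table 14 rather than from the actual per-sentence results.

```python
from collections import Counter


def transition_table(before, after):
    """Cross-tabulate estimation success before and after removal.

    before, after: equal-length lists of booleans (True = success).
    Returns {(before, after): (count, percent)}.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    total = len(before)
    counts = Counter(zip(before, after))
    return {k: (n, round(100.0 * n / total, 2)) for k, n in counts.items()}


# Per-sentence outcomes reconstructed from the counts in Table 14.
before = [True] * 1114 + [False] * 158 + [True] * 1980 + [False] * 668
after = [False] * 1114 + [True] * 158 + [True] * 1980 + [False] * 668

table = transition_table(before, after)
# Overall accuracy (%) with and without Wakamono Kotoba as features.
accuracy_before = round(100.0 * sum(before) / len(before), 2)
accuracy_after = round(100.0 * sum(after) / len(after), 2)
```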
This result made it clear that the removal of Wakamono Kotoba negatively affected emotion estimation. However, some sentences
were estimated successfully only after the Wakamono Kotoba were removed. To analyze these
sentences, we counted the Wakamono Kotoba that they had contained
before the removal and obtained the result in Table 15.
Table 15. Frequency of Wakamono Kotoba in sentences successfully estimated after Wakamono Kotoba were removed.
Freq.   Example of Wakamono Kotoba
6       ギャル
4       ぐわんぐわん, etc.
3       自己中, アゲアゲ, ラブラブ, むかつく, ギャル男, ちゅどーん, etc.
2       フリマ, ウザい, 卒サラ, プチる, プチプラ, ゆるキャラ, ポジる, ハブられ, ガビガビ, メガネ顔, ガングロ, きょどる, etc.
1       タオラー, キモい, インパーク, セレブ, アウトオブ眼中, チャリ, 過疎る, 派手婚, ネガキャン, イケメン, 公園デビュー, うぜぇ, ktkr, 女子力, 中二病, アド変, コクる, しゃばい, etc.
We analyzed the sentences including “Gyaru,” which appeared most frequently
among the sentences that were estimated successfully after Wakamono Kotoba were removed.
None of the six sentences used “Gyaru” with
its original meaning. It was always used as part of other words such as “Gyaru-mama” or “Gyaru-o.” In all probability the estimation became successful because
removing the word “Gyaru” eliminated the effect of the emotion associated with “Gyaru.”
The distribution of the emotion tags annotated to the sentences including “Gyaru”
is shown in Table 16. In these sentences the distribution of emotion tags
varied widely. Because there were few cases in which the word itself expressed emotion,
the word might have become noise in emotion estimation.
Table 16. Distribution of emotion tags to sentences including “Gyaru”.
Tag      Freq.   Percentage (%)
Joy      23      31.5
Hate     23      31.5
Anger    7       9.6
Hope     20      27.4
We analyzed the Wakamono Kotoba whose failure rate was 100% and found many
words that themselves expressed strong emotion and appeared in almost the
same context. Table 17 shows examples of Wakamono Kotoba that appeared frequently in the corpus and whose failure rate was
100% after Wakamono Kotoba were removed.
Table 17. Example of Wakamono Kotoba in error rate: 100%.
ズル込み (Zurukomi), マジ (Maji), ガッツリ (Gattsuri), ウマウマ (Umauma), ヤバイ (Yabai), あけおめ (Akeome)
To observe the effect of words that were not split correctly in morphological analysis, we collected statistics on Wakamono Kotoba that did not match
any string of one to six consecutive morphemes in the morphological analysis result. We
listed the Wakamono Kotoba whose morphological analysis error rate was over 0.5 and
that appeared frequently in the corpus. Table 18 shows the result. The table contains many short
words, perhaps because many Wakamono Kotoba are abbreviations
or clipped forms of existing words. Because these short words tend to be
recognized as part of an existing word, their analysis error rate
might become high. Many of the long Wakamono Kotoba are compound words
combining existing words. Such words are difficult to distinguish from
existing words in emotion estimation. The ambiguity of the word sense might be a
reason why the emotion estimation failed.
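The matching criterion described above can be sketched as follows: a Wakamono Kotoba counts as correctly segmented in a sentence if it equals the concatenation of one to six consecutive morphemes. The function names and the example segmentations are hypothetical; a real run would take the morpheme lists from a morphological analyzer with the IPA, UniDic or Naist dictionary.

```python
def matches_morpheme_span(word, morphemes, max_len=6):
    """Return True if `word` equals the concatenation of 1..max_len
    consecutive morphemes in `morphemes`."""
    for start in range(len(morphemes)):
        joined = ""
        for end in range(start, min(start + max_len, len(morphemes))):
            joined += morphemes[end]
            if joined == word:
                return True
            if len(joined) >= len(word):
                break  # a longer span cannot match any more
    return False


def morphological_error_rate(word, analyzed_sentences):
    """Fraction of sentences in which `word` matches no morpheme span."""
    misses = sum(
        1 for morphemes in analyzed_sentences
        if not matches_morpheme_span(word, morphemes)
    )
    return misses / len(analyzed_sentences)


# Hypothetical segmentations of two sentences containing "きょどる".
aligned = ["彼", "は", "きょ", "どる"]     # word boundaries are preserved
misaligned = ["彼", "はきょ", "どる"]      # word start is merged with "は"
rate = morphological_error_rate("きょどる", [aligned, misaligned])
```

Under this criterion a word that is split into several morphemes still counts as a match as long as the outer boundaries agree, which is why only boundary errors contribute to the error rate.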
Table 18. Example of Wakamono Kotoba in error rate : over 0.5.
Word
えぐい (Egui )
メガネ顔 (Meganegao )
ガチンコ (Gachinko )
しぱしぱ (Shipashipa )
ギャル (Gyaru )
ありござ (Arigoza )
盛り (Mori )
ぶっち (Bucchi )
うにょうにょ (Unyounyo )
キショ (Kisho )
かりもふ (Karimofu )
ちみちみ (Chimichimi )
ぐわんぐわん (Guwanguwan )
きょどる (Kyodoru )
IPA
1.00
1.00
1.00
0.90
0.89
0.87
0.86
0.80
0.64
0.63
0.60
0.56
0.50
0.50
UniDic
1.00
1.00
1.00
0.77
0.92
0.80
0.60
-
Naist
1.00
1.00
1.00
0.90
0.81
0.87
0.86
0.80
0.64
0.63
0.60
0.50
0.58
Fig. 10 plots the emotion estimation error rate of the Naïve Bayes method with N ranging from 1 to 3 against the morphological analysis error rate of
Wakamono Kotoba. The figure does not
indicate a clear correlation. The correlation coefficients were 0.03431596
(Pearson), -0.02513911 (Kendall) and -0.03378695 (Spearman). Factors other than the morphological analysis error of Wakamono Kotoba
seem to have a larger effect on emotion estimation.
Fig. 10. Correlation between estimation error and morphological analysis error. (Horizontal axis: morphological analysis error rate; vertical axis: emotion estimation error rate.)
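The three coefficients reported above can be computed from paired per-word error rates. The following is a minimal pure-Python sketch; it assumes average ranks for ties in Spearman and the tau-b variant for Kendall, neither of which is stated in the paper, and the sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b, computed over all pairs (quadratic time)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom


# Hypothetical per-word error rates (illustration only).
morph_error = [1.0, 2.0, 3.0, 4.0]
emotion_error = [1.0, 3.0, 2.0, 4.0]
coefficients = (
    pearson(morph_error, emotion_error),
    kendall_tau_b(morph_error, emotion_error),
    spearman(morph_error, emotion_error),
)
```

Values close to zero for all three coefficients, as in the reported result, indicate neither a linear nor a monotonic relation between the two error rates.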
5. Conclusion
In this paper we proposed a method for estimating emotion from sentences that include
Wakamono Kotoba and conducted evaluation experiments based on the proposed method.
As a result, the proposed method estimated the sentence emotion with an accuracy of
80% for sentences including Wakamono Kotoba.
In the experiments we used simple classifiers, namely the Naïve Bayes method and the Accumulation method. The weight of each feature was determined by the length of the string
of morphemes. In general, however, text classification methods weight and select
features by using TF-IDF or self-mutual information. If
a sufficiently large corpus is available, TF-IDF or self-mutual information can be used effectively. In the future we would like to introduce these methods to select the
features.
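As an illustration of the TF-IDF weighting mentioned above, the following sketch weights the terms of tokenized sentences. It uses one common variant (term frequency normalized by sentence length, and the logarithm of the inverse document frequency); the paper does not specify a variant, and the romanized tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of token lists.

    tf = count / len(document), idf = log(N / document_frequency).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_features(weight, k):
    """Select the k highest-weighted terms of one document."""
    return sorted(weight, key=weight.get, reverse=True)[:k]


# Hypothetical tokenized sentences (illustration only).
docs = [
    ["komike", "tanoshimi", "desu"],
    ["komike", "mendou", "desu"],
    ["umauma", "tanoshimi", "desu"],
]
weights = tf_idf(docs)
best = top_features(weights[2], 1)
```

A term that occurs in every sentence, such as the hypothetical "desu" here, receives a weight of zero and would be dropped by feature selection.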
In future work, we would like to compare these methods with other text classification methods. There are also several issues to be solved, such as how to extract
stop words or important words from sentences and how to detect Wakamono Kotoba
or emoticons.
We would also like to add these functions to realize an emotion estimation
method with higher accuracy, and finally to apply our method to emotion
estimation from blog articles.
Acknowledgments
This research has been partially supported by Ministry of Education, Science, Sports and
Culture, Grant-in-Aid for Challenging Exploratory Research, 22240021.
References
1. The Online Slang Dictionary. http://onlineslangdictionary.com/
2. A. Yonekawa. Younger’s Word Dictionary, Tokyodo Publisher, (in Japanese), 1997.
3. A. Yonekawa. Study on Young Word, Meiji Shoin, (in Japanese), 1998.
4. Y. Kitahara. Afureru Shingo, Taishukan Shoten, (in Japanese), 2009.
5. K. Noguchi. Kanari Kigakarina Nihongo, Shueisha, (in Japanese), 2004.
6. N. Yamaguchi. Wakamono Kotobani Mimiwosumaseba, Koudansha, (in Japanese), 2007.
7. Y. Kuwamoto. Production and stability of Wakamono Kotoba, Akita technical college bulletin, 38, pp. 113–120, 2003.
8. Y. Ding and F. Ren. Constructing Chinese Internet terminology corpus, IPSJ SIG Technical Report, 2009-NL-193(4), pp. 1–7, 2009.
9. M. Yamamoto, S. Tsuchiya, S. Kuroiwa and F. Ren. Emotion classification for emotion corpus construction, IEICE Technical Report 2007(76), NLC2007-6, pp. 31–35, 2007.
10. T. Harada and H. Kameda. Younger’s Word Processing Methods and their Evaluation, Technical report of IEICE. Thought and language 102(491), pp. 1–6, 2002.
11. T. Harada, Y. Yamamoto, C. Kubomura, Y. Sasaki and H. Kameda. Evaluation of Younger’s Word Processing System, Proceedings of the IEICE General Conference, S37–38, 2006.
12. C. Kubomura, T. Harada, Y. Sasaki, Y. Yamamoto and H. Kameda. An evaluation method of a younger’s word processing system with use of blog articles, Technical report of IEICE. Thought and language 105(613), pp. 165–169, 2006.
13. M. Suzuki. Text classification using the difference of term frequency between categories, Journal of Japanese Industrial Management Association 59(4), pp. 355–363, 2008.
14. M. Suzuki and S. Hirasawa. Text classification using the sum of frequency ratios of word and n-gram over categories, The transactions of the Institute of Electrical Engineers of Japan. C, A publication of Electronics, Information and System Society 129(1), pp. 118–124, 2009.
15. K. Kita. Probabilistic Language Modeling, University of Tokyo Press, Tokyo, 1999.
16. K. Matsumoto, K. Mishina, F. Ren and S. Kuroiwa. Emotion Estimation Algorithm based on emotion occurrence sentence pattern, Natural Language Processing, 14(3), pp. 239–271, 2007.
17. S. Aman and S. Szpakowicz. Using roget’s thesaurus for fine-grained emotion recognition, Proceedings of the Third International Joint Conference on Natural Language Processing, pp. 312–318, 2008.
18. C. Quan and F. Ren. A blog emotion corpus for emotional expression analysis in Chinese, Computer Speech and Language, 24(4), pp. 726–749, 2010.
19. K. Mishina, K. Tsuchiya, S. Kuroiwa and F. Ren. Comparison between the human emotion transfer ratio and the similarities of emotion, ICAI 2008, pp. 126–129, 2008.
20. Japanese Morphological Analysis Dictionary: UniDic, National Institute for Japanese Language and Linguistics.
Kazuyuki Matsumoto
(Member)
He received the Ph.D degree in 2008 from Faculty of
Engineering, the University of Tokushima. He is currently
an assistant professor at the University of Tokushima. His
research interests include Affective Computing, Emotion
Recognition and Natural Language Processing. He is a
member of IPSJ, IEICE and NLP.
Yusuke Konishi
He received the M.E. degree in 2005 from Faculty of
Engineering, the University of Tokushima. His research
interests include Natural Language Processing and Educational Engineering.
Hidemichi Sayama
He graduated from the Faculty of Engineering, the University of Tokushima. He is currently a student in the master's course
of the University of Tokushima. His research interests include Natural Language Processing, Affective Computing
and Music Information Processing.
Fuji Ren
(Member)
He received the Ph.D. degree in 1991 from Faculty of
Engineering, Hokkaido University, Japan. He worked at
CSK, Japan, where he was a chief researcher of NLP.
From 1994 to 2000, he was an associate professor in the
Faculty of Information Sciences. From 2001 he joined
the faculty of engineering, the University of Tokushima
as a professor. His research interests include Affective
Computing, Artificial Intelligence, Language Understanding and Communication. He is a member of the IEICE,
CAAI, IEEJ, IPSJ, JSAI, AAMT, a senior member of
IEEE and a Fellow of JFES.