
Affect Analysis of Text Using Fuzzy Semantic Typing
Pero Subasic and Alison Huettner
CLARITECH Corporation
Justsystem Group
5301 Fifth Avenue, Pittsburgh, PA 15232, USA
Abstract- We propose a novel, convenient fusion of
natural-language processing and fuzzy logic techniques
for analyzing affect content in free text; our main goals
are fast analysis and visualization of affect content for
decision-making. The primary linguistic resource for
fuzzy semantic typing is the fuzzy affect lexicon, from
which other important resources are generated, notably
the fuzzy thesaurus and affect category groups. Free text
is tagged with affect categories from the lexicon, and the
affect categories’ centralities and intensities are combined
using techniques from fuzzy logic to produce affect sets –
fuzzy sets that represent the affect quality of a document.
We show different aspects of affect analysis using news
stories and movie reviews. Our experiments show a very
good correspondence of affect sets with human judgments
of affect content. We ascribe this to the effective
representation of ambiguity in our fuzzy affect lexicon,
and the ability of fuzzy logic to deal successfully with the
ambiguity of words in natural language. Planned
extensions of the system include personalized profiles for
Web-based content dissemination, fuzzy retrieval,
clustering and classification.
I. INTRODUCTION
The huge amount of text stored on computer systems is
getting larger every day. Moving beyond the basic
assumption that a given piece of text should be easily located,
the next generation of systems aims toward integrated,
personalized services and decision support. In these areas, a
quick analysis of particular qualities in the text and an
intuitive presentation to the user become increasingly
important. To match an individual user’s profile on the World
Wide Web, for example, it is necessary to introduce a human
dimension into text understanding and representation. Affect
analysis of textual content can be of particular importance
here. The recently proposed area of affective computing ([1])
supports this conclusion with convincing evidence from a
multitude of sources in psychology, cognitive science, social
sciences, and decision-making.
Analyzing affect in a text, however, presents us with two
obvious sources of ambiguity and imprecision: the first being
emotions themselves, and the second, words in a natural
language ([2]). Rather than attempting to constrain and limit
this ambiguity, we have taken the opposite approach: we
explicitly represent and process ambiguity by introducing
fuzzy logic into the picture. Specifically, we integrate basic
techniques from fuzzy logic and from computing with words
([2]) with techniques from natural language processing
(NLP). Since the central technique we use from NLP is
semantic typing ([3]), we refer to this approach as fuzzy
semantic typing. We illustrate here its use in analyzing affect.
At the most basic level, it involves:
• Isolating a vocabulary of words belonging to a
  metalinguistic domain (here, affect or emotion)
• Using multiple categorizations and scalar metrics to
  represent the meaning of each word in that domain
• Computing profiles for texts based on the categorizations
  and scores of their component domain words
• Manipulating the profiles to visualize the texts
We deal with lexical ambiguity by allowing a single lexicon
entry (domain word) to belong to multiple semantic
categories. Imprecision is handled, not only via multiple
category assignments, but also by allowing degrees of
relatedness (centrality) between lexicon entries and their
various categories. In addition to centralities, lexicon entries
are also assigned numerical intensities, which represent the
strength of the affect level described by that word.
After the affect words in a document are tagged, the fuzzy
logic part of the system handles them by using fuzzy
combination operators, set extension operators and a fuzzy
thesaurus to analyze fuzzy sets representing affects.
Apart from the fuzzy affect lexicon, we generate additional
resources for enhanced functionality. The fuzzy thesaurus is
generated from the affect lexicon and is used to expand affect
sets; the affect category groups are generated by clustering
the fuzzy thesaurus for easier visualization, navigation and
browsing. We provide a detailed explanation of these
resources, with several practical examples, in section II.
A main representation vehicle in our system is a set of fuzzy
semantic categories (affect categories) followed by their
respective centralities and intensities, called an affect set. An
affect set with attached centralities is always treated as a pure
fuzzy set, and all fuzzy techniques applicable to fuzzy sets
are applied to affect sets. Intensities are handled differently,
in a more statistical way, since they involve less ambiguity
and imprecision and more quantitative aspects of the text.
Affect sets are presented in section III.
Finally, visualization is a very important issue in our
system. It demonstrates the real power of fuzzy semantic
typing in presenting a concise, to-the-point qualitative
representation of affects in texts. We show some very
interesting visualization samples of affect sets for movies and
news articles in section IV.
In section V we discuss further development of the system
and present our conclusions.
II. LINGUISTIC RESOURCES
This section describes the linguistic resources of the fuzzy
typing system: the affect lexicon, the fuzzy thesaurus and
affect category groups.
A. Affect Lexicon
The affect lexicon contains entries of the form:
<lexical_entry> <part_of_speech_tag>
<affect_category> <centrality> <intensity>
as in
"arrogance" sn superiority 0.7 0.9.
Lexical entry. A lexical entry is a word that has an
affectual connotation or denotes affect directly. At present,
our fuzzy affect lexicon contains 3876 lexical entries, about
half of what is planned.
Part of speech. Since ambiguity sometimes depends on a
word's part of speech (POS) – and since NLP processing
allows us to differentiate parts of speech in documents – we
have included POS information for lexicon entries. For
example, the word alert has different category assignments
associated with different POS values:
"alert" adj intelligence
"alert" vb warning
That is, the adjective alert means quick to perceive and act –
a kind of intelligence – while the verb alert means to call to a
state of readiness – a kind of warning.
A word's POS can affect its centrality or intensity values as
well as its category assignment. For example, lexicon entries
with POS, categories, and centrality degrees for the word
craze include:
"craze" vb insanity 0.8
"craze" sn insanity 0.5
That is, the verb craze belongs to affect category insanity
with a degree of 0.8; the singular noun craze belongs to the
same category with a degree of 0.5. This reflects the fact that
the verb craze means to make insane or as if insane – very
central to the insanity category! – while the noun craze is an
exaggerated and often transient enthusiasm – i.e., it belongs
to insanity only in a less central, more metaphorical sense.
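As a minimal sketch of how entries in this format might be held in memory, the following Python fragment parses the example lines quoted so far and indexes them by word and part of speech. The names LexiconEntry, parse_entry and build_lexicon are hypothetical, not part of the system described here. Entries that the text quotes without a centrality or intensity keep those fields empty rather than having values guessed for them.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexiconEntry:
    word: str
    pos: str
    category: str
    centrality: Optional[float] = None  # None when the text does not quote it
    intensity: Optional[float] = None   # None when the text does not quote it


def parse_entry(line: str) -> LexiconEntry:
    """Parse '"word" pos category [centrality [intensity]]'."""
    fields = shlex.split(line)  # strips the quotes around the lexical entry
    if not 3 <= len(fields) <= 5:
        raise ValueError(f"unexpected entry format: {line!r}")
    word, pos, category = fields[:3]
    scores = [float(value) for value in fields[3:]]
    centrality = scores[0] if len(scores) > 0 else None
    intensity = scores[1] if len(scores) > 1 else None
    return LexiconEntry(word, pos, category, centrality, intensity)


def build_lexicon(lines):
    """Index entries by (word, part of speech); one key may hold several categories."""
    lexicon = {}
    for line in lines:
        entry = parse_entry(line)
        lexicon.setdefault((entry.word, entry.pos), []).append(entry)
    return lexicon


# Example lines taken from the text above.
SAMPLE_LINES = [
    '"arrogance" sn superiority 0.7 0.9',
    '"alert" adj intelligence',
    '"alert" vb warning',
    '"craze" vb insanity 0.8',
    '"craze" sn insanity 0.5',
]
lexicon = build_lexicon(SAMPLE_LINES)
```

Keying on the (word, POS) pair is one simple way to let the adjective alert and the verb alert resolve to different categories, as the examples require.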
Category. Many of our categories have strayed somewhat
from the strictly affect domain: for example, deprivation,
health, and intelligence are only marginally affects, and
death, destruction and justice are not affects at all. Such
categories have been created in cases where (a) some
significant portion of an affect word's meaning cannot be
captured using pure affect categories; and (b) the same
meaning component recurred again and again in the
vocabulary we were trying to handle. For example, a word
like corpse certainly entails some affect, and can plausibly be
assigned to categories sadness and horror; at the same time,
a part of its meaning is obviously being missed by those
categorizations. Moreover, words like assassination, cyanide,
execute, funeral, genocide, and homicidal share this missing
meaning component. On this first pass, we have gone ahead
and created extra, not-strictly-affect categories to handle such
words; in the future, when we review and revise the category
inventory, we may rethink this decision.
At present, there are 83 affect categories. Each affect
category has an explicit opposite, with three exceptions.
Centrality. Centrality degrees range from 0 to 1 by
increments of 0.1. A word which belongs to several affect
categories will generally have different centralities from
category to category, as in this example:
"emasculate" vb weakness 0.7
"emasculate" vb lack 0.4
"emasculate" vb violence 0.3
That is, the element of weakness is fairly central in the word
emasculate (a rating of 0.7); the notion of a specific lack is
also present but less central (rating of 0.4); and an additional
element of violence is possible but not really necessary
(rating of 0.3).
In assigning centrality, typical questions the developer
should answer for each entry/affect category include: To what
extent is affect X related to category C? To what extent does
affect X co-occur with category C? To what extent can affect
X be replaced with category C in the text, without changing
the meaning?
Since centralities indicate the presence of a certain quality
(represented by the appropriate affect category) for a given
lexicon entry, centralities are handled as fuzzy membership
degrees.
Intensity. In addition to centralities, lexicon entries are also
assigned numerical intensities, which represent the strength
of the affect level described by that entry. Intensity degrees,
like centrality degrees, range from 0 to 1 by increments of
0.1. Here are some examples (the second number represents
the intensity):
"abhor" vb repulsion 1.0 1.0
"contempt" sn repulsion 0.6 0.7
"aversion" sn repulsion 0.9 0.5
"displeasure" sn repulsion 0.3 0.3
"fat" adj repulsion 0.2 0.1
All of these words have some element or connotation of
repulsion. A word like abhor expresses very intense
repulsion (as well as being very central to the concept of
repulsion); contempt, aversion, and displeasure are
progressively less intense on the repulsion scale. A word
like fat – which is not at all central to the repulsion concept,
as expressed by its low centrality of 0.2, but which has some
slight overtones of repulsion to many Americans – is an
objective description, hence hardly an affect word at all. This
is reflected in its low intensity score of 0.1. (In general,
scores below 0.4 on both scales tend to be the most subjective
and notional.)
A word that belongs to several affect categories will
generally have different intensities from category to category,
as in this example:
"avenge" vb conflict 0.1 0.8
"avenge" vb violence 0.8 0.5
"avenge" vb justice 0.4 0.7
That is, avenge is a high-intensity conflict word, but only a
moderate-intensity word with respect to violence; its intensity
rating for justice is somewhere in between.
Assigning category labels and membership degrees to
lexicon entries is a very subjective process. During the
present proof-of-concept phase, the assignments have been
made by a single linguist. They are obviously influenced by
the linguist’s own experience, reading background, and (since
affects are in question) personal/emotional background and
prejudices. Though subjective, the process is not completely
idiosyncratic – the assignments will be general enough in the
main to yield useful results. Ideally, however, we would like
to involve additional linguists, to review and refine the
inventory of atomic categories and to ensure some consensus
on the representation of difficult items. In a finished system,
repeated iterations and use of additional profiles or personal
lexicons will allow the individual user to fine-tune
membership degrees and accommodate his or her own
subjective criteria.
B. Fuzzy Thesaurus
The fuzzy thesaurus is generated by the system from the
affect lexicon. It is generated using max-min combination:
$$R(AC_i, AC_j) = \bigvee_{A \in \mathrm{AffectLexicon}} \{\, C_A(AC_i) \wedge C_A(AC_j) \,\}$$
where $AC_i$, $AC_j$ are affect categories whose relationship
degree $R(AC_i, AC_j)$ we want to compute, and
$C_A(AC_i)$, $C_A(AC_j)$ are the centralities of affect categories
$AC_i$, $AC_j$ with respect to affect $A$. $C_A(AC_i)$ and
$C_A(AC_j)$ are taken directly from the affect lexicon.
The fuzzy thesaurus establishes relationships between pairs
of affect categories, based on the centralities of lexical items
assigned to both categories in the lexicon. It contains entries
of the form:
<affect_category_1>, <affect_category_2>,
<relationship_degree>
as in
attraction, love, 0.8
arranged in a matrix. When the relationship degree is equal to
0, no entry is recorded in the fuzzy thesaurus. When the
relationship degree is equal to 1.0, we say that we have
discovered affectual synonyms, as in
conflict, violence, 1.0
pain, harm, 1.0
Non-synonymous pairs having entries in the matrix are
related to some specified degree.
The fuzzy thesaurus is primarily used for expansion of
affect sets. For example, if an affect set consists of love/0.7,
and the user opts to expand it using the fuzzy thesaurus,
related categories such as attraction will be added to the set
automatically.
C. Affect Category Groups
Affect category groups are generated automatically by
clustering the fuzzy thesaurus. In this process, affect
categories with high similarity degrees (as defined in the
fuzzy thesaurus) are grouped together. For example, we
might find that love, attraction, happiness, desire and
pleasure formed one affect category group, while repulsion,
horror, inferiority and pain formed another. If the
automatically-created groups are not so intuitively natural as
this example, the user can edit them.
Affect category groups can be used for more efficient
grouping of affect categories in visualization charts. One
example is shown in section IV.
III. FUZZY SEMANTIC TYPING
A. Affect Sets
A central construct in our affect analysis is the affect set. It
comprises the set of unique affect categories from a given
text, with attached centralities and intensities. The following
sections discuss the generation of an affect set for a general
document.
B. Tagging of Free Text
The algorithm for tagging a document with an affect set is
given in Fig. 1. It comprises the following steps.
B.1. Normalization and POS Tagging
The document is tagged with the appropriate affect
categories, as follows:
1. The document is parsed and tokens (individual words)
   are generated one at a time.
2. Each token is normalized using normalization rules for
   the English language, shown as "Grammar" in Figure 1.
3. The normalized tokens are looked up in the affect
   lexicon. If a token has a lexicon entry, we retrieve all
   affect categories with their associated centrality and
   intensity scores.
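The three tagging steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the system's implementation: the two-word lexicon holds only entries quoted in this paper, the normalize function is a toy stand-in for the normalization grammar (it merely lowercases and strips "-ed" or "-s"), and part-of-speech disambiguation is omitted. The names Tag, LEXICON, normalize and tag_document are hypothetical.

```python
import re
from typing import List, NamedTuple


class Tag(NamedTuple):
    word: str
    pos: str
    category: str
    centrality: float
    intensity: float


# Only entries quoted in the paper; a real affect lexicon is far larger.
LEXICON = {
    "uproar": [Tag("uproar", "sn", "violence", 0.6, 0.6)],
    "attack": [
        Tag("attack", "vb", "violence", 0.9, 0.8),
        Tag("attack", "vb", "conflict", 0.8, 0.7),
    ],
}


def normalize(token: str) -> str:
    """Toy normalization (assumption): lowercase, then strip '-ed' or '-s'."""
    token = token.lower()
    for suffix in ("ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def tag_document(text: str) -> List[Tag]:
    tags = []
    for token in re.findall(r"[A-Za-z]+", text):   # step 1: generate tokens
        base = normalize(token)                    # step 2: normalize
        tags.extend(LEXICON.get(base, []))         # step 3: lexicon lookup
    return tags


document = (
    "His film, Un Chien Andalou, co-directed by Dali, caused an uproar "
    "(he filled his pockets with stones so he would have something to "
    "throw if the audience attacked him)."
)
initial_affect_set = tag_document(document)
```

With this toy lexicon, the sentence yields three tags: one for uproar and two for attacked, since attack belongs to both violence and conflict.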
Fig. 1. Generation of the document affect set, a fuzzy set
representing the affective content of a document.
Using this algorithm, we generate the initial affect set for
each document.
As an example, consider a simple document consisting of
the sentence:
His film, Un Chien Andalou, co-directed by
Dali, caused an uproar (he filled his pockets
with stones so he would have something to
throw if the audience attacked him).
This document is tagged with:
"uproar" sn violence 0.6 0.6
"attack" vb violence 0.9 0.8
"attack" vb conflict 0.8 0.7
because of the words uproar and attacked. Note that,
since the word attacked belongs to both of the affect
categories violence and conflict, both categories are included
as document tags.
B.2. Combination of Centralities and Intensities; Document
Affect Set
The following algorithm describes how to reduce the initial
affect set by combining the centralities and intensities of
recurring categories.
1. For each affect category that appears in the tagging set:
   1.1 Compute the maximal centrality (fuzzy union) of all
       centralities attached to that affect category in the
       tagged document. The result is the centrality of that
       category for the document as a whole.
   1.2 Compute the average intensity of all intensities
       attached to that affect category in the tagged
       document. The result is the intensity of that category
       for the document as a whole.
2. Combine the counts of each affect category with its
   intensities using simple averaging, to yield the overall
   intensity score for the document.
As an example, consider the following document:
Luis Bunuel’s The Exterminating Angel (1962)
is a macabre comedy, a mordant view of human
nature that suggests we harbor savage
instincts and unspeakable secrets. Take a
group of prosperous dinner guests and pen
them up long enough, he suggests, and they’ll
turn on one another like rats. Bunuel begins
with small, alarming portents. The cook and
the servants suddenly escape, just as the
guests are arriving. The hostess is furious;
she planned an after-dinner entertainment
involving a bear and two sheep. Now it will
have to be canceled. It is typical that such
surrealistic touches are dropped in without
comment. The dinner party is a success. The
guests whisper slanders about each other,
their eyes playing across the faces of their
fellow guests with greed, lust and envy.
After dinner, we glimpse a woman’s purse,
filled with chicken feathers and rooster
claws.
After fuzzy semantic tagging, the following output is
produced:
macabre,adj,death,0.50,0.60
macabre,adj,horror,0.90,0.60
comedy,sn,humor,1.00,0.60
mordant,adj,pain,0.3,0.5
mordant,adj,clarity,0.4,0.8
savage,adj,violence,1.00,1.00
...
envy,sn,inferiority,0.4,0.4
envy,sn,lack,0.5,0.5
envy,sn,slyness,0.5,0.6
fill,sn,surfeit,0.70,0.40
We need to combine recurring affect categories into a set of
unique tags, with centralities and intensities that accurately
reflect the overall document content. For that purpose, we
discard the original affect words and the POS information,
and combine the intensities and centralities of the remaining
affect categories. Intensities and centralities are handled
differently. Centrality indicates the purity of a quality
represented by an affect category. Intensity indicates the
strength of that quality. Thus the number of occurrences of a
particular affect category in a document does not affect its
centrality, but does affect its intensity. Centrality, as the
purity of a quality, depends on the maximal centrality over all
instances of that affect category in a particular document.
That is, the maximal purity of the quality in the document
already implies more diluted degrees of that quality, and is
therefore appropriate as the combined centrality/purity for
that category. The appropriate operation here is thus fuzzy
union. On the other hand, the more times an affect category is
present in the document, and the higher the intensities of its
instances, the higher will be the combined intensity/strength
attached to it. We compute the intensity attached to an affect
category as a simple average of all the intensities attached to
the affect category’s instances.
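The combination just described can be sketched in a few lines of Python. The function names combine and overall_intensity are hypothetical, and the input is the three-tag example from section B.1. The overall intensity is read here as the plain mean over all tagged instances, which is one interpretation of the "simple averaging" step rather than a confirmed detail of the system.

```python
from collections import defaultdict


def combine(tags):
    """Reduce (category, centrality, intensity) tags to one entry per category.

    Centrality: fuzzy union, i.e. the maximum over all instances.
    Intensity:  simple average over all instances.
    """
    centralities = defaultdict(list)
    intensities = defaultdict(list)
    for category, centrality, intensity in tags:
        centralities[category].append(centrality)
        intensities[category].append(intensity)
    return {
        category: (
            max(centralities[category]),
            sum(intensities[category]) / len(intensities[category]),
        )
        for category in centralities
    }


def overall_intensity(tags):
    """Mean intensity over all tagged instances (assumed interpretation)."""
    values = [intensity for _, _, intensity in tags]
    return sum(values) / len(values) if values else 0.0


# Tags of the Un Chien Andalou sentence: (category, centrality, intensity).
tags = [
    ("violence", 0.6, 0.6),
    ("violence", 0.9, 0.8),
    ("conflict", 0.8, 0.7),
]
affect_set = combine(tags)
document_intensity = overall_intensity(tags)
```

For this input, violence receives centrality 0.9 (the larger of 0.6 and 0.9) and an intensity of about 0.7 (the mean of 0.6 and 0.8), while conflict keeps its single values.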
After computing centralities using fuzzy union, and
arranging elements so that the elements with higher
membership degrees (centralities) are at the front of the fuzzy
set, we have:
violence/1.0 + humor/1.0 + warning/1.0 + anger/1.0 +
success/1.0 + slander/1.0 + greed/1.0 + horror/0.90 +
aversion/0.90 + absurdity/0.80 + excitement/0.80 +
desire/0.80 + pleasure/0.70 + promise/0.70 + surfeit/0.70 +
repulsion/0.60 + fear/0.60 + lack/0.50 + death/0.50 +
slyness/0.50 + intelligence/0.50 + deception/0.50 +
insanity/0.50 + clarity/0.40 + innocence/0.40 +
inferiority/0.40 + pain/0.30 + disloyalty/0.30 +
failure/0.30 + creation/0.30 + surprise/0.30
This form of representation for the fuzzy set of affect
categories enables us to spot predominant affects in the
document. The meaning of this affect category set is that the
document has a high degree of violence, humor, warning,
anger, success, slander, greed, horror, aversion, absurdity,
excitement, desire, pleasure, promise and surfeit; a medium
degree of repulsion, fear, lack, death, slyness, intelligence,
deception, insanity, clarity, innocence and inferiority; and a
low degree of pain, disloyalty, failure, creation and surprise.
To compute the overall intensity we use a simple weighted
average over all affect category instances and their respective
intensities. For the given document, overall intensity is 0.597.
Overall intensity is used to detect documents with offensive
content. For example, high overall intensity (over 0.7) in
combination with a specific centrality profile
(distaste/0.8 + violence/0.9 + pain/0.8) may indicate
offensive and undesirable content.
IV. AFFECT SET VISUALIZATION
An interesting and important area related to the fuzzy typing
work is visualization of the results. Each affect category’s
centralities and intensities can be represented as a point on
the perimeter of a unit circle. Then, centralities and intensities
can be visualized on separate charts, as shown in Figures 2-5.
In order to demonstrate various ways of organizing these
charts, we show different charts for different information
objects.
[Figure 2: two circular charts, "Exterminating Angel: Negative
Centralities" and "Exterminating Angel: Positive Centralities",
each with a radial scale from 0 to 1.]
Fig. 2. Centralities for positive and negative affects are
shown separately. Affect categories are arranged
into similar groups around the circle.
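One way to lay out such a circular chart is sketched below in Python. The paper does not specify the geometry, so this sketch assumes that each category gets an evenly spaced direction around the circle and that its centrality (or intensity) is plotted as the distance from the center, so that a value of 1 lies on the unit circle. The function name radar_points is hypothetical. The sample values are four centralities from the Exterminating Angel affect set.

```python
import math


def radar_points(affect_values):
    """Map {category: value in [0, 1]} to {category: (x, y)} chart coordinates.

    Assumption: categories are spaced evenly around the circle in the order
    given, and the value is used as the radius.
    """
    count = len(affect_values)
    points = {}
    for index, (category, value) in enumerate(affect_values.items()):
        angle = 2 * math.pi * index / count
        points[category] = (value * math.cos(angle), value * math.sin(angle))
    return points


# Four centralities from the Exterminating Angel affect set.
sample = {"violence": 1.0, "humor": 1.0, "horror": 0.9, "fear": 0.6}
points = radar_points(sample)
```

Because the order of the input determines the position on the circle, grouping similar categories (or placing opposites half a turn apart) only requires reordering the dictionary before calling the function.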
[Figure 3: circular chart, "Exterminating Angel: Expanded
Centralities".]
Fig. 3. With all centralities from Figure 2 expanded by the
fuzzy thesaurus, we see a greater level of detail. Note that
additional affect categories exist in the new chart.
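The thesaurus-based expansion behind this chart can be illustrated with a short Python sketch. The function build_thesaurus follows the max-min combination of section II: minimum of the two centralities within one lexicon word, maximum across words. The function expand is an assumption, since the paper does not state how the degree of an added category is computed. Here an added category receives the minimum of the source category's centrality and the relationship degree, and the maximum is kept when several sources contribute. Both function names are hypothetical, and part of speech is ignored. The sample lexicon holds only centralities quoted in this paper, so its degrees differ from those of the full thesaurus.

```python
from collections import defaultdict
from itertools import combinations


def build_thesaurus(lexicon):
    """lexicon: {word: {category: centrality}} -> {(cat_i, cat_j): degree}.

    Max-min combination: min within one word, max over all words.
    Pairs that never share a word get no entry (degree 0).
    """
    thesaurus = defaultdict(float)
    for categories in lexicon.values():
        for cat_i, cat_j in combinations(sorted(categories), 2):
            degree = min(categories[cat_i], categories[cat_j])
            thesaurus[(cat_i, cat_j)] = max(thesaurus[(cat_i, cat_j)], degree)
    return dict(thesaurus)


def expand(affect_set, thesaurus):
    """Add related categories to {category: centrality} (assumed max-min rule)."""
    expanded = dict(affect_set)
    for (cat_i, cat_j), degree in thesaurus.items():
        for source, target in ((cat_i, cat_j), (cat_j, cat_i)):
            if source in affect_set:
                candidate = min(affect_set[source], degree)
                expanded[target] = max(expanded.get(target, 0.0), candidate)
    return expanded


# Centralities quoted in the paper for three words.
sample_lexicon = {
    "emasculate": {"weakness": 0.7, "lack": 0.4, "violence": 0.3},
    "avenge": {"conflict": 0.1, "violence": 0.8, "justice": 0.4},
    "attack": {"violence": 0.9, "conflict": 0.8},
}
thesaurus = build_thesaurus(sample_lexicon)

# The love/attraction example, using the thesaurus entry quoted in section II.
expanded = expand({"love": 0.7}, {("attraction", "love"): 0.8})
```

In the sample, conflict and violence are related through both avenge (min 0.1) and attack (min 0.8), so their relationship degree is the larger value, 0.8. Under the assumed expansion rule, the set love/0.7 gains attraction with degree 0.7.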
In Figure 2 we show centralities with positive affects
separated from centralities with negative affects. In this way,
the positive vs. the negative side of the document can be
easily analyzed. In Figure 3, we show an application of the
fuzzy thesaurus. Specifically, when all affect categories are
expanded using the fuzzy thesaurus, we obtain the chart in
Figure 3. It contains both positive and negative affects, with a
higher level of detail, since new affect categories have been
added to the chart through expansion. In Figure 4 we show
the affect structure of a news report – a train crash in London.
We show both centralities and intensities, which are highest
for those categories typical of news on accidents. Opposite
affect categories can be placed on opposite sides of the circle
with respect to the center point. This is illustrated in Figure 5
for affect centralities in a Matrix review. With this
arrangement, we can easily spot which part of the circle is
better developed and understand its overall affect content.
Affect categories can be generated for groups of movies as
well: we carried out an analysis of romance, action, science
fiction, comedy and family movies.
[Figure 4: two circular charts, "Train Crash: Centralities" and
"Train Crash: Intensities"; overall intensity = 0.53.]
Fig. 4. News report on train crash in London, October 1999.
Centralities describe the quality, and intensities the quantity
(count and strength) of affects in a document. As expected,
affects of fear, harm, pain and surprise are central. The most
intense affects are conflict, confusion, disadvantage and pain.
[Figure 5: circular chart, "Matrix: Centralities".]
Fig. 5. Profile of a recent cult movie Matrix, generated from a
movie review. Opposite affect categories are on opposite
sides of the circle – the left side shows negative affects, the
right side positive. With a well-developed negative side, this
movie is justly rated "R" in the US.
V. CONCLUSION
The fuzzy semantic typing approach deals very well with
ambiguity and imprecision in free text. It can be efficiently
combined with a set of visualization tools, for easy, accurate
analysis of affect content in a document. Although these
conclusions are definite, we feel that we have only just started
exploring this uncharted territory. In that sense, the effort
reported here is just a beginning. Our plans in the immediate
future include:
• Management of linguistic resources. We will
  implement user-initiated updating of affect lexicons:
  modification of centralities and intensities and definition
  of complex affect categories.
• Other domains. Fuzzy typing is a general framework
  and can be adapted to many different application areas:
  business, food/cooking, fashion, architecture,
  cultural/artistic and psychology-related material, liquor,
  perfumes.
• Different points of view. Existing affect analysis can be
  combined with a lexicon containing expressions that
  describe intentions (e.g., would like to, will, is
  considering, is thinking of) to give us more insight into
  the document content.
Fuzzy typing represents an innovative way to capture
metalinguistic facts about a text while allowing for linguistic
ambiguity and vagueness. The metalinguistic representation
can be easily utilized in retrieval, clustering, and
classification. The approach is useful in an indefinite number
of domains, and lends itself to customization for a particular
user or task. We look forward to continuing our research in
these directions.
REFERENCES
1. Rosalind W. Picard, Affective Computing, MIT Press,
   1997.
2. Lotfi A. Zadeh, Fuzzy Logic = Computing with Words,
   IEEE Transactions on Fuzzy Systems, 2, 103-111, 1996.
3. George Miller and Walter Charles, Contextual
   correlates of semantic similarity, Language and
   Cognitive Processes, 6:1-28, 1991.