FAKULTÄT FÜR INFORMATIK

Affective space interfaces

DIPLOMARBEIT
Submitted in partial fulfilment of the requirements for the academic degree of Diplom-Ingenieur in the programme Medieninformatik

by
Oliver Spindler
Matriculation number 0100611

at the Fakultät für Informatik der Technischen Universität Wien

Supervisor: Ao.Univ.Prof. Dr. Peter Purgathofer

Vienna, 20 March 2009
I hereby declare that I have written this thesis independently, that I have fully cited all sources and aids used, and that I have marked all passages of this work – including tables and figures – which were taken from other works or from the Internet, whether verbatim or in substance, as borrowings by indicating the source.

Vienna, 20 March 2009
Kurzfassung

The meaning we ascribe to information entities such as words, pictures, music and films lies not only in their rational value but just as much in the feelings these things evoke in us. So far, interactive systems have largely restricted themselves to describing the actual content, the denotation, while the emotions we associate with information entities, their connotation, are mostly ignored. Taking emotional meaning into account would open up new possibilities for structuring and discovering content. Users could find content that matches their mood, independent of media type.

This thesis examines various ways in which emotional meaning can be represented in interactive systems. The approaches presented include the use of language, colours and facial expressions. The theoretical basis for this investigation is formed by work from the disciplines of semiology, experimental psychology, art theory and interaction design.

A comic-like face model was developed which visualises emotions through facial expressions. The resulting software component can be deployed in web browsers and follows the behaviour of real faces by simulating facial muscles. Six basic emotions can be blended to express specific and subtle emotional states.
Abstract
The meaning we ascribe to information entities – words, pictures, music, video,
etc. – not only lies in their rational value but as much in the feelings they evoke
in us. Currently, interactive systems focus on describing the actual content, or denotation, while the associated emotions, the affective connotation of content, are usually neglected. Embracing the affective meaning of media entities would make it possible to structure content in novel ways, emphasising similarities and differences across media
types not covered by textual content descriptions, and would allow users to find
content to match an intended mood.
To this end, I examine work from diverse fields such as semiology, experimental psychology, art theory and human-computer interaction to form the theoretical
basis for the use of this aspect of meaning in interactive systems. Building on this
theoretical basis, several ways to visualise and describe affective connotation are
compared, which include the use of language, colours and facial expressions.
Additionally, a software component has been developed, which visualises affective connotation through facial expressions of a comic-like face. The face model
uses web technology for easy deployment in browser environments and simulates
the behaviour of biological faces by following a muscle-based approach. It is capable of expressing subtle emotional changes through arbitrary blending of six basic
emotions.
Thank you. . .
to everyone who made this work possible, especially. . .
to Scott McCloud, for inspiring work and support
to Tony Bryant, for supervision during my exchange semester
to the people at igw/hci at Vienna UT
to Peter Purgathofer, for years of inspiration and guidance
to my friends and family, especially. . .
to Thomas, for accommodating me and great cooperation
to my parents, for a lifetime of understanding and support
Contents
1 Introduction
  1.1 Research question and chapter outline

2 Meaning
  2.1 Meaning in linguistics and semiology
  2.2 Meaning in experimental psychology
  2.3 Meaning in information resources

3 Affect and emotion theories
  3.1 Emotion, mood or affect?
  3.2 Induction and communication of emotion
  3.3 Dimensional vs. categorical approach
  3.4 Discussion

4 Affect in art and media
  4.1 General theories
  4.2 Music
  4.3 Discussion

5 Metalanguages for affective space
  5.1 Affective scales
  5.2 Semantic scales
  5.3 Affect words in natural language
  5.4 Colours
  5.5 Facial expressions
  5.6 Discussion

6 Grimace
  6.1 Related work
  6.2 Design
  6.3 Development
  6.4 Technical details
  6.5 Results
  6.6 Discussion and future directions

7 Summary and conclusion

Bibliography
Chapter 1
Introduction
We do not just think, we feel. In fact, one cannot be without the other. As sentient
beings, our cognition always involves both reason and emotion working together
hand in hand. The meaning we ascribe to all things lies as much in their rational
value as in the feelings we associate with them.
Computers are neither sentient nor capable of thinking in the true sense of the
word. The artificial intelligence community is working hard to change this, but
at their core, computers remain machines, calculators, which describe the world
through mathematics and logic. The web has made it possible to find comprehensive information about just any topic; all it takes is to enter one or a few keywords
into a search engine. In this way, computers help us to overcome our limited processing and memory capabilities and support reasoning.
Affective experiences – feelings, moods and emotions – are much more difficult
to grasp. Being internal sensations, they are not ‘there’, not tangible. And yet, they
accompany us in everything we do and think, serving as implicit yet ubiquitous
guides in our lives. Emotions do not follow mathematical laws, nor can they be
predicted by logic. Perhaps this is the reason why there have been so few attempts
to acknowledge this part of meaning in computer science.
However, emotions present a big opportunity if we strive to increase the usefulness of computers. If software not only supported what things mean on a rational
level, but also how we feel about these things, this should greatly improve the ways
in which content can be structured, queried and presented. This thesis is an attempt to show how this goal might be achieved.
To this end, I build on work from a diverse range of scientific fields, primarily semiotics and linguistics, (experimental) psychology, art theory and, of course,
human-computer interaction (HCI). As is the case with probably most interdisciplinary work, a main challenge was to reconcile different traditions and specialised
terminology to form a coherent argument. While I have taken care to do justice to
all theories consulted, it seems quite inevitable that I have oversimplified matters at
one point or another.
Affective computing is a recent yet already thriving research area in the HCI community. It can be seen as an effort to recognise the importance of affect and emotions
in interaction with computers. Affective computing strives to create interactive systems which register the user’s emotional state and adapt their behaviour and output in a way which is appropriate for the user. When computers behave like human
conversational partners, both exhibiting and reacting to emotions, interaction experience should become more pleasant and effective for users. This goal has been
summarised as “making machines less frustrating to interact with” (Picard, 1997, p.
214).
This thesis, too, is concerned with the role of affect in human-computer interaction. As such, it may be categorised as being an effort in affective computing.
However, both the starting point of this investigation as well as its goals differ
considerably from those of affective computing. Rather than seeing affect as an atmospheric facilitator of interaction, this thesis tries to examine how affect can be a
focal point of discourse when we interact with computers.
I build on the hypothesis that the way we feel about things is as important
as what these things mean on a rational level. Thus, this thesis treats affect as
an intrinsic property of any meaningful entity. Entities which seem to be entirely
unrelated at first glance may evoke very similar affective responses in humans.
The term meaningful entity is deliberately general. It encompasses words and larger
linguistic units, various kinds of art and media and any other form human creativity
takes.
1.1 Research question and chapter outline
This thesis addresses the question, “What is the nature of the affective experiences we
associate with meaningful entities, and how can these experiences be described and utilised
in interactive systems?”. The chapters of this thesis deal with various
issues on the way to this goal. Chapters 2–4 deal with the first part of the research
question, while chapters 5 and 6 focus on the second part of the research question.
• Chapter 2 examines how affect is related to meaning. This brief initial
analysis introduces the concepts of affective connotation and affective space. It
serves as necessary theoretical groundwork for the subsequent chapters.
• Chapter 3 introduces theories of affect and emotion. Understanding the psychological and biological nature of affect is needed to inform the design of
affect-aware interactive systems.
• Chapter 4 examines the special relation of affect and arts and tries to explain
how different media types manage to express and arouse emotions.
• Chapter 5 compares several ways in which affect can be described and utilised
in interactive systems. Examples include affect words, facial expressions and
colours.
• Chapter 6 describes Grimace, an experimental affective space interface, which
visualises emotions through facial expressions of a comic-like face.
Chapter 2
Meaning
This thesis builds on the proposition that affect is an intrinsic property of meaning.
This chapter demarcates the kind of meaning I refer to, how its relationship with
affect might come about, and why I consider it to be of particular importance for
the design of interactive systems.
To this end, theories about meaning from two different scientific fields are compared. Theories from the domains of linguistics and semiology are consulted first,
laying the theoretical groundwork for work from the area of experimental psychology. Finally, I outline what implications these theories have in the domain of interactive systems and information resources.
2.1 Meaning in linguistics and semiology
The task of defining the meaning of ‘meaning’ is a difficult one and spans many
centuries of scholarly debate. In their classic treatise The Meaning of Meaning, Ogden et al. (1923/1969) identified no fewer than 16 different groups of definitions for
‘meaning’ which have been put forward by various authors. I see manifold reasons
for such disagreement. First of all, different disciplines ask for different working
definitions. Model-like explanations reduce the complexity of a problem in order
to illustrate the issue at hand. Most importantly, however, discussion of meaning
leads to discussion of the very nature of sense-making and is thus shaped by our
view of the world. In this way, the actual meaning of ‘meaning’ depends on the
context the term is used in.
The Oxford English Dictionary supplies us with a valuable starting point for
finding a working definition for this thesis. One of the several definitions the OED
offers for ‘meaning’ is, “That which is intended to be or actually is expressed or
indicated.” (OED: meaning) This definition can be analysed with the use of semiotic
theory.
A linguistic sign was defined by de Saussure (1916/1959) as a dyadic relationship,
in which a signifier signifies a signified. He uses this relation signifier > signified
as the basis of his investigation. A signifier is the “sound image”, the form a word
takes, while the signified is the “concept”, referent, or, we might say, “that which is
expressed or indicated”. Thus the definition above from the OED clearly focuses
on the signified. There is significant disagreement between authors whether ‘meaning’ lies in the relation signifier > signified, or in the signified itself (Ogden et al.,
1923/1969, p.185).
Homonyms and synonyms
It is easy to show that the relationship between signifiers and signifieds is less than
clear in most cases. For instance, one signifier can point to multiple signifieds. The
signifier fluke can refer to a type of fish, a type of flatworm, the hooks of an anchor,
a manufacturer of electronic test equipment or a novel of this name. In linguistics,
a sign relation in which one signifier points to multiple signifieds simultaneously is
called a homonym. On the other hand, completely different signifiers can refer to the
very same signified. movie and film, or baby and infant are but two of countless
examples. Signifiers which point to identical signifieds are synonyms.
The same model signifier > signified can be applied equally well to non-linguistic
signs. The word hammer refers to an object with a heavy, sturdy top part and a
handle, which is designed to apply accumulated manual force onto another object. Now consider a pictographic icon of a hammer, a non-linguistic signifier, which arguably signifies the same signified as the word hammer.
Whether we use linguistic or non-linguistic signs, we use them in a world already filled with signs. Usually, we have many alternative signs to choose from.
Synonymous and homonymous signs are examples where sign meanings overlap.
When a sign has another meaning, we are or can be made aware of this fact. Consider puns, which are but one example where this ambiguity is being used for
humorous effect. Therefore, we need to extend our working definition of ‘meaning’
to “that which is intended to be or actually is expressed or indicated”. Also note that
our definition of meaning does not imply the use of language for the process of
signification.
Our definition now includes two different kinds of ‘meaning’; the intended meaning and the actually expressed meaning. If signs were fully explained by a simple
signifier > signified relation, no such ambiguity would be possible, and intended
meaning would always equal actually expressed meaning.
Historical view: two aspects of meaning
Garza-Cuarón (1991) gives a comprehensive account of the history of the concept
of ‘meaning’. Throughout history, there has been a clear tendency by scholars to
distinguish between a first and a second meaning. The basis for this distinction,
however, has changed many times.
Since mediaeval times, adjectives, or, “connotative terms”, were said to have
two meanings. Firstly, an adjective refers to the subject which possesses the quality indicated by the adjective. Secondly, an adjective indicates a quality which a
subject possesses. James Mill (1829/1878, cited in Garza-Cuarón, 1991) reverses the
mediaeval distinction, extends the definition to verbs and introduces new names.
Notation refers to the quality indicated by an adjective or the action indicated by a
verb. Connotation refers to the subject which possesses the indicated quality or performs the action. John Stuart Mill (1843/1973, cited in Garza-Cuarón, 1991) finally
introduces the terms denotation and connotation. Denotation refers to all subjects a
word applies to, connotation refers to the attribute which is implied through a word.
For instance, “the word white denotes all white things, as snow, paper, the foam of
the sea, etc., and implies, or in the language of the schoolmen, connotes, the attribute
whiteness” (Mill 1843/1973, cited in Garza-Cuarón, 1991). The distinction between
‘connotation’ and ‘denotation’ as the two primary aspects of ‘meaning’ was very
influential and has been employed ever since.
Defining denotation and connotation
The Oxford English Dictionary offers, among several others, this definition for denotation: “A term employed to denote or describe a thing; a designation.” (OED:
denotation). This definition can be seen as a valid description of the simple sign
relation (signifier > signified, or “sound image” references “concept”).
In the traditional view of ‘connotation’, introduced by John Stuart Mill, the term
refers to the attributes that are implied when we refer to a specific signified. For
instance, reference to the mythical figure Hercules (denotation) implies features like
strong, male, mythical (connotation). Urban (1939) suggests the name conceptual
connotation for this tradition.
In the first half of the 20th century, a new usage of connotation clearly emerges.
Through the contributions of Ogden et al. (1923/1969), Erdmann (1925/1966, cited
in Garza-Cuarón 1991) and Urban (1939), ‘connotation’ now refers to something less
clearly defined than implied attributes. Urban speaks of “the feeling or emotion
with which the word is bound up as an expression” (Urban, 1939, p. 141), and Osgood deals with “connotative, emotive, or metaphorical ‘meaning’” (Osgood et al.,
1957, p. 321). According to Garza-Cuarón (1991), connotation has always had this
meaning in layman's English, but the 20th century saw this tradition embraced in scientific debate. In this context, it is common to refer to
‘emotive meaning’ or ‘emotive tradition’. However, we will see in the next chapter
that this leads to a quite ambiguous view of the word ‘emotion’. Hence I refer to
this aspect of connotation via the more general term affective connotation.
Thus, there are two very different views of what ‘connotation’ is. The term
either refers to implied properties of a sign (conceptual connotation) or to the feelings which are aroused or somehow related with a sign (affective connotation). The
Oxford English Dictionary defines ‘connotation’ in the following way: “The signifying in addition; inclusion of something in the meaning of a word besides what
it primarily denotes; implication.” (OED: connotation) Arguably, both traditions,
‘conceptual connotation’ and ‘affective connotation’, are covered by this definition.
Let us return to the working definition of meaning from before; “that which is
intended to be or actually is expressed or indicated”. Denotation refers to the intended
meaning of a sign. However, something else is actually expressed along the way too,
which can be summed up as connotation.
Semiotic view: Denotation, connotation and metalanguage
Barthes (1973/1996) gives an explanation of how denotation and connotation are
related. The sign model used by him is very similar to the one introduced above. A
system of signification (or sign) consists of a plane of expression (or signifier), which
references a plane of content (or signified). For reasons of consistency, I will stick
with De Saussure’s terminology.
He maintains that connotation occurs when one sign (the denotation) becomes
the signifier of a new sign (the connotation). Figure 2.1 illustrates this principle.
Figure 2.1: Denotation and connotation. The denotative sign (signifier Sr and signified Sd) becomes, as a whole, the signifier of the connotative sign. Adapted from Barthes (1973/1996).
For example, consider the sign ‘machine’. Generally, the signifier machine
refers to devices which convert energy into some form of activity. This is the denotative level.
However, there is more to ‘machine’ than this plain statement. Machines can
symbolise industrialisation and thus inhumanity, loss of jobs, generally unpleasant
notions. However, they can also evoke positive associations with, say, increased
living standard through automation. It is not possible to discern if the signifier
or the signified provokes these associations; thinking of or seeing a machine can
have the same effect as hearing machine. For now, we can only say that the sign
‘machine’ evokes the associations. (Chapter 4 elaborates on this question.)
When this sign evokes something else, it actually must have become the signifier of a new system. The new signifier references ‘something else’, which must
therefore be another signified. This new system may be called the connotative level.
Therefore, “a connotated system is a system whose plane of expression is itself
constituted by a signifying system” (Barthes, 1973/1996, p.129).
Before we can examine what this something else, the connotated signified, actually is, we need the semiotic groundwork for this analysis, for it is not obvious how
to describe the signified of a connotative system. On the denotative level, we can
find actual entities in our surrounding world which we can later reference through
the use of words. However, this is not possible on the connotative level, for this
system’s signified is intangible, with no manifest object to refer to.
So, the best we can do is to refer to a connotated signified by proxy; we need
a new sign which reliably stands for the sought after signified. An obvious choice
is the denotative sign which revealed the connotation’s existence in the first place.
However, this does not allow us to further investigate connotation, we can still only
refer to a ‘something else’ which we can feel. What we therefore need is a new
system which substitutes the connotated signified we seek to describe with a new
sign of which we know both signifier and signified. Barthes calls this substitution
metalanguage. The principle is illustrated in figure 2.2. Hence, “a metalanguage is a
system whose plane of content is itself constituted by a signifying system” (Barthes,
1973/1996, p.130).
Figure 2.2: Connotation and metalanguage. In a metalanguage, the plane of content (Sd) is itself constituted by a signifying system. Adapted from Barthes (1973/1996).
In the course of this thesis, we will encounter various metalanguages which have
been put forward by different authors to describe connotation.
2.2 Meaning in experimental psychology
In 1957, Osgood et al. were the first to attempt quantitative measurement of ‘meaning’. The task of measuring meaning is a bold one, for, as the authors admit, “[t]here
are at least as many meanings of ‘meaning’ as there are disciplines which deal with
language” (Osgood et al., 1957, p.2). The use of the very general term ‘meaning’ for
their efforts proved to be controversial, and it is important to examine what kind of
‘meaning’ Osgood et al. were dealing with.
Their own definition is based on “the distinction between what has variously
been called denotative, designative, or referential ‘meaning’ and what has been
called connotative, emotive, or metaphorical ‘meaning’.” (Osgood et al., 1957, p.321)
Once again, we see the distinction between denotation and connotation. Osgood et
al. are not concerned with denotation: “[We] are not providing an index of what
signs refer to, and if reference or designation is the sine qua non of meaning, as
some readers will insist, then they will conclude that this book is badly mistitled.”
(Osgood et al., 1957, p.321, italics theirs)
Instead, their focus lies on “connotative, emotive, or metaphorical ‘meaning’”.
Apart from this statement, no reference is given about which literature their understanding is based on. However, they clearly subscribe to the new current of affective
connotation introduced before, the psychological association of ideas to linguistic
or non-linguistic stimuli. They do not mention the traditional use of the term ‘connotation’, which I refer to as ‘conceptual connotation’, at any point in the book.
The general term ‘meaning’ is used in the specific sense of affective connotation
throughout the book. This “seemingly peculiar use of connotation” (Garza-Cuarón,
1991, p.106, italics hers) triggered heavy criticism. “[The linguist Uriel] Weinreich
is criticising Osgood for his ignorance of the studies on the subject of meaning.”
(Garza-Cuarón, 1991, p.108) Osgood admits not to be “as sophisticated as I probably should be with respect to philosophical and linguistic semantics” (Osgood
1959, cited in (Garza-Cuarón, 1991, p.107)), but also reminds his critics of the longstanding debate about the definition of meaning and connotation. In my opinion,
giving the book a more specific title would have avoided a lot of unnecessary controversy, which the authors actually anticipated (Osgood et al., 1957, p. 320).
The semantic differential
Osgood et al. (1957) introduce the semantic differential, a tool devised for quantitative
measurement of affective connotation. Test subjects are supplied with a linguistic
stimulus and with a number of seven-step scales. On every scale, the two extremes
are marked with bipolar adjectives, i.e. pairs of adjectives with antonymous or
opposite meaning. Test subjects are asked to rate the presented stimulus along the
scales by ticking each of them at a position which feels appropriate. If the left end
of the scale is marked with A and the right end is marked with B, the seven steps
mean ‘extremely A’, ‘quite A’, ‘slightly A’, ‘neutral’, ‘slightly B’, ‘quite B’, ‘extremely
B’.
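To make the format of such an item concrete, the following sketch shows one way a single semantic-differential scale could be represented in software. It is purely illustrative and not taken from Osgood et al.; the −3 to +3 integer coding and all names are assumptions made here.

```python
# A minimal sketch (illustrative only) of one semantic-differential item.
# The -3..+3 coding places the neutral step at 0, with the sign indicating
# which adjective pole the rating leans towards.

from dataclasses import dataclass

STEPS = {
    -3: ("extremely", "A"), -2: ("quite", "A"), -1: ("slightly", "A"),
     0: ("neutral", None),
     1: ("slightly", "B"),  2: ("quite", "B"),  3: ("extremely", "B"),
}

@dataclass
class DifferentialItem:
    left: str   # adjective at pole A, e.g. "good"
    right: str  # adjective at pole B, e.g. "bad"

    def describe(self, rating: int) -> str:
        """Render a rating such as -2 as 'quite good'."""
        degree, pole = STEPS[rating]
        if pole is None:
            return degree
        return f"{degree} {self.left if pole == 'A' else self.right}"

item = DifferentialItem(left="good", right="bad")
print(item.describe(-2))  # -> quite good
print(item.describe(3))   # -> extremely bad
```

Coding the neutral midpoint as 0 also keeps later averaging and factor analysis straightforward.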
In a way, semantic differentials are similar to Likert scales, which are still popular in questionnaires. On a Likert scale, the extremes are not marked with bipolar adjectives. Instead, subjects rate their level of agreement with the stimulus; the scale ranges from ‘totally disagree’ to ‘completely agree’. Although it has been noted that semantic differentials and Likert scales do not deliver equal results (Friborg et al.,
2006), the concept is very similar.
The semantic differential has proven to be widely influential and has become a
standard method in psychological questionnaires. Initially devised by Osgood et al.
for use with linguistic stimuli (i.e. words), semantic differentials can also be used
for non-linguistic stimuli. The basic referential function of signs, i.e. ‘denotation’, is
not necessarily dependent on the mode of the sign, and, as shown before, two completely different signifiers can refer to the very same signified. For instance, French
(1977) and Espe (1985) suggest a graphic differential for cross-cultural research, which
uses non-linguistic icons instead of antonymous adjectives to describe the scales. In
my undergraduate thesis (Spindler, 2006), I used semantic differentials for musical
stimuli in a web-based quantitative study.
Major factors
Osgood et al. (1957) used the semantic differential in several studies with a large
number of test subjects. With their test results, they performed a factor analysis.
When different scales are rated similarly to other scales, this indicates similarity or
overlap in the connotation of the scales. Factor analysis finds these similarities and
shows underlying dependencies between scales. It reduces the number of dimensions by extracting the statistical variance.
Three factors or dimensions were identified to be of greatest importance. They
recurred in every study and were subsequently named (Osgood et al., 1957, pp. 62-63).
Evaluation is the most important factor. It “accounts for approximately half to
three-quarters of the extractable variance” (Osgood et al., 1957, p.72). The adjective pair that received the purest results for this factor was good-bad. This means
that almost all of this pair's affective association is covered by evaluation. Other examples named are optimistic-pessimistic, positive-negative and complete-incomplete. Therefore, more than half of a word's 'affective connotation' is determined by how favourable or unfavourable we perceive its denoted signified. They
conclude that “the attitudinal variable in human thinking [. . . ] appears to be primary – when asked if she’d like to see the Dinosaur in the museum, the young lady
from Brooklyn first wanted to know, ‘Is it good or is it bad?’” (Osgood et al., 1957,
p. 72)
Potency is connected with power and related concepts like size or weight. The
pivot pair (the purest scale) named by Osgood et al. is hard-soft. Other examples
are heavy-light, masculine-feminine or strong-weak.
Activity is the third factor, “concerned with quickness, excitement, warmth, agitation and the like.” (Osgood et al., 1957, p.73) active-passive was determined as
the pivot pair, other examples are excitable-calm or hot-cold.
Potency and activity are of similar importance to the affective connotation of a
word, each of them accounting for approximately half the variance of evaluation.
Osgood et al. also found a slight correlation between the two factors. For that
reason, they also suggest that one might combine the two factors under the name
dynamism. As can be seen in figure 2.3, many other factors have been extracted.
Osgood et al. suggest names for some of these factors but could not identify them
as stable across all studies. Though the first three factors account for much of a term's affective connotation, one needs to bear in mind that a three-dimensional description cannot be exhaustive (p. 323). The authors do not believe, however,
that a very high number of dimensions would finally lead to a description of a
word’s denotation.
Figure 2.3: “Relative importance of semantic space dimensions”. Adapted from Osgood et al. (1957, p. 73). The chart plots per cent of total variance against the factors in order of extraction; Evaluation, Potency and Activity come first, with Potency and Activity grouped as Dynamism.
However, the factors uncovered by Osgood et al. show something fundamental
about human sense-making. Valdez and Mehrabian (1994) note that the same factors were also replicated for other kinds of stimuli like paintings and sonar signals.
Consequently, the factors play an important part in many theories of affect and will
reoccur on many occasions throughout this thesis.
Personal and cultural subjectivity
Affective experiences are inherently subjective. The statements obtained by the
semantic differential vary from person to person and across cultures, depending
on the personal views and values of the person asked. Osgood (1964) conducted
a cross-cultural study to address these issues, focusing on 12 to 16 year old males
from 16 different countries.
First of all, concepts differ considerably in their polarisation, which is calculated
as the average distance of ratings from the centre of the scales. Some concepts evoke
much stronger affective reactions than others. For instance, mother evokes strong
affective reactions in all cultures, while wednesday has low affective intensity everywhere. Furthermore, the level of polarisation can be different across cultures.
For instance, the concept of guilt did not evoke strong affective reactions in US-Americans and Indians, but very strong reactions in three other cultures.
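A minimal sketch of the polarisation measure described above, with hypothetical ratings on a −3 to +3 scale; the function name and example values are assumptions made here.

```python
# Polarisation as the average distance of ratings from the scale centre (0).
def polarisation(ratings):
    """Mean absolute deviation from the neutral midpoint."""
    return sum(abs(r) for r in ratings) / len(ratings)

print(polarisation([3, 2, 3, -3, 2]))   # high: strong affective reactions (e.g. mother)
print(polarisation([0, 1, -1, 0, 0]))   # low: weak affective reactions (e.g. wednesday)
```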
His analysis further shows that the level of subjectivity varies strongly across
tested concepts, represented by different values for standard deviation. Osgood
sees this as an indicator of how much a concept is stereotyped. For some words,
considerable agreement is reached, which Osgood calls “culturally stereotyped concepts”. On the other hand, concepts for which very different answers were given
are “culturally amorphous”. Furthermore, the level of stereotyping varies cross-culturally. Some of the tested concepts achieved quite consistent ratings in some
cultures and highly diverse ones in other cases.
Despite all the cultural differences, the notion of three basic factors – evaluation, potency and activity (EPA) – could
be reproduced in all cases. While actual affective connotation varies interpersonally and cross-culturally, the ways in which it can be described seem to be very
consistent.
Affective space
Evaluation, Potency and Activity (EPA) have proven to be stable factors in the description of affective connotation. They are defined as independent, orthogonal
dimensions of a Euclidean space. Thus, they can be seen to span a Cartesian coordinate system. Osgood et al. call this coordinate system 'semantic space'. However,
‘semantics’ is, like ‘meaning’, a very general term which can be seen to cover both
denotation and affective connotation.
Osgood et al. note, as said before, that their work does not capture the denotative function of signification, which can be illustrated by an example. The signs
success, nurse and sincere refer to different, not necessarily related things (denotation). However, they yield very similar results for the factors EPA (Osgood et al.,
1957, p.323), which means that our affective reaction towards these concepts is very
similar (affective connotation). In this way, they show a relation in affect between
these concepts, a similarity not covered by their lexical definition. Dimensional
theories of affect and emotion (see chapter 3) replicate these factors. In his cross-cultural study, Osgood (1964) himself picks up the term affective meaning to describe
what is measured by the semantic differential. For these reasons, the term affective
space seems to be a more appropriate name for the coordinate system spanned by
these general affective factors.
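The geometric reading of affective space can be made concrete with a small sketch: treating EPA values as coordinates, affective similarity becomes distance. The coordinates below are invented for illustration and are not Osgood's measurements.

```python
import math

# Hypothetical EPA coordinates (evaluation, potency, activity) on a -3..+3 scale.
epa = {
    "success": (2.5, 1.2, 1.0),
    "nurse":   (2.3, 1.0, 0.9),
    "sincere": (2.4, 0.9, 0.8),
    "danger":  (-2.2, 1.5, 1.8),  # invented contrast case
}

def affective_distance(a: str, b: str) -> float:
    """Euclidean distance between two entities in affective space."""
    return math.dist(epa[a], epa[b])

print(affective_distance("success", "nurse"))   # small: similar affective connotation
print(affective_distance("success", "danger"))  # large: very different connotation
```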
This hypothesised affective space describes affective connotation. As Barthes
(1973/1996) has shown, connoted systems cannot be described directly. Instead,
we need to use some form of metalanguage, which replaces the connotated signified
with a sign of which we know both signifier and signified. Evaluation, potency and
activity fulfil this requirement. Thus, they constitute a very direct and low-level
metalanguage for affective connotation.
2.3 Meaning in information resources
The described aspects of meaning – denotation, conceptual connotation and affective connotation – apply to any form of content, anything we ascribe meaning to. In
the following, I try to outline – in admittedly simplified terms – how these semiotic
principles can be applied to the world wide web.
We can look at the web as a huge collection of content. In itself, this collection is unstructured; there is no global hierarchy or taxonomy of meaning. The
content is made up of different media types; text, pictures, sound and video. On
the technical level, these entities are represented as files or objects; a more general
term is information entity. Each and every information entity carries meaning for
humans, which can be divided up further into denotation, conceptual connotation
and affective connotation.
Consider the situation for text, the most common media type on the web. On
the denotative level, text consists of a signifier – the digitally represented words
– and a signified – the lexical meaning of the text, the actual content. This is the
level of meaning we usually engage with. It is also a form of meaning well handled
by software. Search engine robots can parse the text. At a first level, the robot can
make an index of words which occur in the text. This already proves very helpful for
human users to find content. Much like a dictionary, we can retrieve the content if
we query the search engine index with words that occur in the text. For this process,
the software does not need to have any idea about what is actually signified in the
text; the whole process relies on the human capability of sense-making.
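The word-index idea can be sketched in a few lines. This is an illustrative toy, not a description of any real search engine, and all names are assumptions made here.

```python
from collections import defaultdict

docs = {
    "doc1": "the hammer is a tool with a heavy head and a handle",
    "doc2": "machines convert energy into some form of activity",
}

# Build an index from words to the documents they occur in.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

# Retrieval works purely on the signifier level: the word itself.
print(index["hammer"])   # {'doc1'}
```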
The search engine robot can be assisted by explicit metadata. HTML <meta>
tags describe content in a structured manner. The author can enter a number of
keywords which hint towards what is denoted in the text and can enter a language
code. Thus, metadata categorises the content. In our semiotic model, metadata
corresponds to conceptual connotation.
What remains to be covered is the affective connotation of the text. Generally,
this aspect of meaning is not present in the words (signifier) of the text; it is something which we can feel if we are aware of the content of the text (signified). A naïve
search engine robot has no awareness of the signified and is not sentient. Thus, a
simple word parsing approach does not work. Likewise, it is unclear how the user
is supposed to express a query for content with a certain affective value. As we have
seen, connotation can only be described through the detour of a metalanguage.
The task of an affective space interface is thus twofold. It must present the affective connotation of information entities to the user in an understandable manner
(visualisation), and users need to be able to express what kind of affect a sought-after
information entity needs to express (query). Of course, these are just two ways of
looking at the same problem; ideally, users can query the system in the same way
the system visualises affective connotation. For these tasks, effective metalanguages
are necessary, both for internal representation and for presentation in ways which are comprehensible for humans.
Data generation methods
Quite recently, the information retrieval community has begun efforts to automatically extract the affective value of text (e.g. Kamps et al., 2004; Esuli and Sebastiani,
2006; Bestgen, 2008) or other media types (e.g. Chen et al., 2008 deal with music). These efforts are summarised under the term opinion mining. This is a young
research area, aiming to solve the problem of annotating existing content with affective metadata.
This thesis is not about algorithmic feature extraction. Instead, my focus lies on
human interfaces which allow users to interact with the affective value of content.
Generally, I am making no assumptions as to how this data came about. Algorithmic extraction may certainly be helpful for large-scale affective annotation of source
material. However, current efforts mostly concern text and are only approaching
extraction of the first dimension, evaluation. Furthermore, the analysis of Osgood
(1964) has shown the strong subjectivity and cultural differences of affective statements, which seems to contradict the underlying assumption of opinion mining
that there is an inherent affective meaning which can be extracted algorithmically.
The interfaces I will describe allow manual annotation of content with affective
information. Affective connotation is an internal and subjective process. Therefore,
the only method that seems to be both valid and feasible is introspection, the reporting of internal experiences by the subjects themselves. The idea is to facilitate
introspection by giving the user an interface which allows him or her to express
affective states in an intuitive way, and which visualises affect in some form on a
computer screen.
The other method that has been employed in experimental psychology to attain
data about internal experiences is the use of physiological measurements, like the
measurement of heart rate and blood pressure. In a laboratory setting for experiments, this method may be feasible. For users of interactive systems, however, it
is not. We need a method which gets by with the output and input options of a
standard computer setup. One possible exception might be the use of automatic
facial expression recognition, which will be discussed briefly in chapter 5.5.
A manual, introspective annotation process could be facilitated by social collaboration. This process can be compared, for example, with social bookmarking
services like delicious (http://www.delicious.com, last accessed 2009-03-18). There, too, content is not categorised algorithmically. Instead, users describe their bookmarks with short descriptions and categorise them with tags. Because this information is available to other users, they benefit too. Content annotation thus becomes a collaborative effort. In the case of affective connotation, social collaboration would increase the validity of statements, resulting in
a kind of voting process. Each user gives an opinion, and from several statements,
statistical data can be derived. A mean or median value could indicate an overall
direction of an entity’s affective value, while standard deviation indicates to which
extent users agree in their opinions (stereotyped vs. amorphous in Osgood’s terms).
Human statements about affect are all equally valid, and this process offers a way
to achieve sensible overall statements.
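A minimal sketch of this voting idea, with hypothetical ratings; the −3 to +3 coding and the choice of median and standard deviation as summary statistics follow the description above, while the variable names are assumptions made here.

```python
import statistics

# Hypothetical valence ratings (-3..+3) given by different users for one item.
ratings = [2, 3, 2, 1, 3, 2, -1, 2]

overall = statistics.median(ratings)   # overall direction of the affective value
spread = statistics.stdev(ratings)     # low = 'stereotyped', high = 'amorphous'

print(f"overall: {overall}, spread: {spread:.2f}")
```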
Discussion
Affective connotation has been recognised as an intrinsic part of meaning. While denotation and conceptual connotation are covered well by interactive systems, there
are only a few attempts to represent the affective value of information entities. One
possible reason for this is the difficulty of describing and visualising this kind of
information. Since affect is only connoted, it cannot be described directly but only
via metalanguages. I believe that affective connotation represents a big yet virtually
untapped potential for use in information systems.
In the next chapter, I examine the nature of affect and emotions.
Chapter 3
Affect and emotion theories
The authors who noted the existence of emotional or affective connotation (e.g.
Urban, 1939, Ogden et al., 1923/1969, Osgood et al., 1957) say little about the reasons
for its existence, apart from reference to emotions. In this chapter, I will undertake
a brief examination of theories about the nature of affect and emotions. As the
examination will show and was also noted by Osgood (1964), theories of emotion
and affect resonate well with evaluation, potency and activity, the major dimensions
of affective space as put forward by Osgood et al. (1957). Therefore, it seems to be
a justifiable inference to see emotion as an important part of affective connotation.
Emotions are an important aspect of life for humans and animals. Their existence is undisputed; their basis and function, however, are disputed. As Fehr and
Russell (1984) put it, “Everyone knows what an emotion is, until asked to give
a definition.” Most research about emotion has been undertaken in the field of
psychology. However, there are also theories about emotion which come from a
wide range of other disciplines – e.g. biology, philosophy, anthropology, musicology. Strongman (1996) gives a comprehensive account of more than 150 theories of
emotion. However, he admits that having an informed view of emotion does not
necessarily mean that one might be able to define what emotions really are. The brief survey which follows only provides a glimpse into emotion research and is in no way exhaustive. Emphasis has been put on theories which could prove valuable
in informing interface design.
Research and scholarly debate about emotions has its roots in philosophy (Strongman, 1996, Solomon, 1993). Plato disregarded emotions and saw them as something that hinders and detracts from reason. Strongman (1996, p.5) argues that this view
is still common in “folk theory”; outbursts of emotion are still frowned upon, one
is always expected to contain one’s emotions and to act rationally. Aristotle, on the
other hand, had a much more positive view of emotions. He realised that our perception of what happens around us influences our emotions. He was able to name
and analyse specific emotions like anger, pity and fear, and he also saw a connection of emotion with pleasure and pain, thus anticipating an evaluative dimension
of affect.
After that, emotion research was neglected for a long time. Finally, in the late
nineteenth century, Darwin’s contribution to the field, The Expression of the Emotions
in Man and Animals (1872), pioneered the view that emotions are innate and occur
in animals and humans alike. His work is the source of the view that emotions
have biological reasons, rather than being social constructs which are learned in
the course of a lifetime. Darwin noted the intrinsic relationship between emotions
and facial expressions, the latter being seen as the primary way in which emotions are
communicated. Facial expressions will be discussed in detail in chapter 5. Darwin’s
work has influenced many researchers.
3.1 Emotion, mood or affect?
Different terms with overlapping meaning are in use to refer to affective phenomena, even within scientific disciplines. The most common terms, which at times are
being used interchangeably, are ‘affect’, ‘emotion’ and ‘mood’. Sloboda and Juslin
(2001) see inaccuracy in the choice of terms as a major source of disagreement between researchers. So, before affect theories can be examined, these terms need to
be disambiguated.
Affect is seen as the most general of the three terms and includes other affective phenomena like emotions or moods. Figure 3.1 gives a graphical overview of
affective concepts and how they can be distinguished through their respective duration. The shortest affective phenomena are “facial expressions and most bodily
responses” (Oatley and Jenkins, 1996, p.29), which typically last in the range of a
few seconds. Emotions are seen to last for a period of time in the range of minutes
to hours, though others (e.g. Schubert, 2001) might place emotions in the range of
seconds. Moods can last from hours to months. At the right end of the spectrum,
emotional disorders and personality traits are long-term affective phenomena, which
can stay with human beings for many years.
Davidson (1994, cited in Sloboda and Juslin, 2001) has a similar view. He says
that moods provide a longer lasting “affective background”, which makes it more
likely for some emotions to occur and less likely for others. Ekman (1999a) emphasises that emotions can begin very quickly, due to their adaptive function.
For Solomon (1993), the difference between ‘emotion’ and ‘mood’ lies in that the
former is directed at something, while the latter is not. He writes that “emotions
are always ‘about’ something or other. One is always angry about something; one
is always in love with someone or something . . . ; one is always afraid of something
(even if one doesn’t know what it is)” (Solomon, 1993, p.12). This he sees in contrast
to moods, which do not have a determinable object (Solomon, 1993, p.11).
Figure 3.1: “A spectrum of affective phenomena in terms of the time course of
each”. Adapted from Oatley and Jenkins, 1996, p.30.
Another distinction that has been put forward is that emotions are said to result
in distinct facial expressions, while moods do not (e.g. Ekman, 1999a). The intrinsic
relationship between emotions and facial expressions will be examined in detail in
chapter 5.5.
Emotions therefore have a quite specific meaning. This is the reason why I
avoid the commonplace terms ‘emotional connotation’ (e.g. Urban, 1939) or ‘emotive connotation’ (e.g. Osgood et al., 1957), but refer to the concept as ‘affective
connotation’. Use of the term ‘emotional connotation’ might imply that the concept
only applies to full-blown, short-lived emotions. However, the factors evaluation,
potency and activity apply to other affective phenomena as well, as will be shown
in the following.
3.2 Induction and communication of emotion
Scherer and Zentner (2001) describe a basic and easily understandable model of
how emotion can be induced in humans and communicated to others. The model
is reproduced in figure 3.2.
The upper part of the diagram describes how emotions are induced in humans.
In order for emotions to occur, there must be some kind of event. This event causes
an appraisal process in a person, which evaluates the implications this event has for
him or her. Several factors may be taken into account here. A person may evaluate
the event’s implications concerning his or her needs and goals, and whether the
person is capable of dealing with the consequences of the event. The outcome
of this appraisal process determines how the person feels about this event. For
instance, if the event blocks the way towards a goal, one might feel angry. If one
feels to be in danger, this would cause fear. An unexpected event which results in
a pleasant situation for one would cause surprise and joy. Each of these emotions
then results in expressive behaviour, the symptom. Possible symptoms include facial
expressions, gestures and change in posture.
Figure 3.2: Emotion induction and mediated commotion. An event is appraised by a person and the resulting emotion is expressed as a symptom; emotion may reach an observer via induction, empathy or contagion. Adapted from Scherer and Zentner (2001, p. 366).
The diagram’s lower part illustrates commotion, which is how emotion might
be communicated to an observer along the induction process. The first possibility
is that an observer goes through a similar induction process. The observer does
not need to be directly affected by the event for this to happen. An example is
when somebody sees injustice being done. Although not being the person suffering,
appraisal of the event would induce an emotion in the observer, which can be very
different from the emotion induced in the concerned person (e.g. anger in the
observer, fear or sadness in the concerned person). Another possibility is empathy,
in which the observer identifies with the person. Scherer and Zentner note that
empathy requires sympathy for the person. If the observer likes the person, the
emotional state of this person might cause emotions in the observer. An example
would be to feel sorry about the illness of somebody. Finally, the authors note a
third path of commotion, contagion. In the case of commotion through contagion,
emotion is induced simply by observing the expressive behaviour of somebody,
without the need of knowing about the reason for an emotion. The observer may
then mimic this expressive behaviour. A possible example is to smile back at a
stranger who gave you a smile.
3.3 Dimensional vs. categorical approach
The theory outlined above is but one of literally hundreds of emotion theories.
These theories are quite diverse and are rooted in the various scientific fields that
have contributed to our understanding of emotion. I will focus on two common
approaches, which seem to be the most promising ones for use in affective space
interfaces. For the most part, I will focus on the results of these theories, not getting
into details about the biological or psychological explanations for the existence of
emotions.

Figure 3.3: “A circumplex model of affect” (Russell, 1980, p. 1167, redrawn). Affect words such as afraid, angry, excited, happy, content, relaxed, bored and sad are arranged roughly on a circle in a space spanned by pleasure and arousal.

The first group of theories follows the dimensional approach, in which it
is maintained that emotions can be described accurately enough through a number
of independent factors. The other group subscribes to the categorical approach, built
on the notion of distinct emotions. Finally, while advocates of either side usually
consider the two concepts to be mutually exclusive, some researchers have tried to
reconcile the two currents.
Dimensional approach
Theories of emotion which take the dimensional approach identify a very small
number of factors which together describe an emotional state, thus spanning an
emotional or affective space as introduced in the previous chapter. The identified psychological or biological reasons for emotion vary considerably. However,
the identified dimensions tend to be very similar. A common debate between advocates of these theories is whether two dimensions are sufficient to describe emotions
accurately enough or if three dimensions are necessary.
The idea of underlying dimensions of emotion goes back to the late nineteenth
century (Sloboda and Juslin, 2001) but receives more attention about 40 years later,
beginning with a contribution by Woodworth (1938; cited in Sloboda and Juslin,
2001). Schlosberg is an important early proponent of a dimensional approach. At
first he identified two dimensions (Schlosberg, 1952), but later added a third dimension (Schlosberg, 1954).
An influential dimensional theory was Russell’s circumplex model (1980) in
which emotions are roughly distributed on a circle in a two-dimensional space. The
dimensions used are valence and arousal, which are reminiscent of Osgood's dimensions evaluation and activity. First, Russell divided this space up into 8 sections. Then he selected 8 terms for affective categories and let subjects order them on this circle. In the next study, he let subjects place a number of words describing affective states into one of the 8 affect categories. Subjects largely agreed on the categories in which the terms fit. This resulted in a good distribution of the terms
around the circle. His results are reproduced in figure 3.3.
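The circular arrangement can be illustrated with a small sketch that converts a valence/arousal pair into an angle on the circumplex. This only illustrates the geometry and is not Russell's procedure; the example values are assumptions made here.

```python
import math

def circumplex_angle(valence: float, arousal: float) -> float:
    """Angle in degrees: 0 = maximally pleasant, 90 = maximally aroused."""
    return math.degrees(math.atan2(arousal, valence)) % 360

print(circumplex_angle(0.9, 0.3))    # near 0 degrees: pleasant region (e.g. happy, pleased)
print(circumplex_angle(-0.5, 0.8))   # upper-left quadrant (e.g. distressed)
```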
Sloboda and Juslin say the following about Russell’s theory, “The circumplex
model captures two important aspects of emotions: that they vary in their degree
of similarity and that certain emotions (e.g. happy, sad) are often thought of as
bipolar. About the same circular structure has been found in a large number of different domains . . . suggesting that the circumplex model really captures something
fundamental about emotional responses.” (Sloboda and Juslin, 2001, p.77)
However, important emotional distinctions are blurred in a two-dimensional
model. For instance, fear and anger are very different in their implications for the
body. In the circumplex model, the two emotions occupy very similar positions,
because they both are unpleasant and have high arousal (Sloboda and Juslin, 2001).
This problem is commonly tackled with the inclusion of a third dimension. “Although use of only two of these factors has been tempting because of greater simplicity, adequate characterization of important distinctions among certain clusters
of affect (e.g., fear, sadness, anger) has necessitated a three-dimensional representation.” (Mehrabian, 1996, p.3)
Mehrabian thus suggests a more sophisticated model, postulating the dimensions pleasure, arousal and dominance. He notices the similarity to Osgood’s dimensions and acknowledges them to be affective dimensions in essence. Pleasure,
arousal and dominance are suggested by Mehrabian as emotional equivalents to
Osgood’s general dimensions. He gives a detailed list of emotion examples for each
of the octants of this three-dimensional coordinate system, which is reproduced in
table 3.1.
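As a small illustration of how the octant labels in table 3.1 relate to PAD values (not Mehrabian's code; the zero thresholds and example values are assumptions made here):

```python
def pad_octant(pleasure: float, arousal: float, dominance: float) -> str:
    """Return the octant label, e.g. '+P +A -D', for a PAD triple."""
    def signed(value: float, name: str) -> str:
        return ("+" if value >= 0 else "-") + name
    return " ".join([signed(pleasure, "P"), signed(arousal, "A"), signed(dominance, "D")])

print(pad_octant(0.8, 0.6, 0.7))    # +P +A +D (e.g. bold, vigorous)
print(pad_octant(0.4, -0.7, -0.5))  # +P -A -D (e.g. sleepy, protected)
```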
In table 3.2, I give a comparison of the various dimensions which have been
suggested by proponents of dimensional theories of affect. Despite the use of different terms, there is a striking similarity between the dimensions which have been
identified in theories of emotion, and those which have been identified by Osgood
et al. (1957) as dimensions of meaning. First, there is general consensus about an
evaluative factor, variously called ‘pleasure’, ‘pleasantness’ or ‘evaluation’. ‘activation’ and ‘arousal’ correspond well to ‘activity’; ‘dominance’ and ‘attention’ may be
seen as similar to ‘potency’.
Octant      Examples
+P +A +D    admired, bold, creative, powerful, vigorous
+P +A −D    amazed, awed, fascinated, impressed, infatuated
+P −A +D    comfortable, leisurely, relaxed, satisfied, unperturbed
+P −A −D    consoled, docile, protected, sleepy, tranquilised
−P +A +D    antagonistic, belligerent, cruel, hateful, hostile
−P +A −D    bewildered, distressed, humiliated, in pain, upset
−P −A +D    disdainful, indifferent, selfish-uninterested, uncaring, unconcerned
−P −A −D    bored, depressed, dull, lonely, sad

Table 3.1: Pleasure, arousal and dominance model (Mehrabian, 1996)

Author                  Dimensions
Schlosberg (1952)       Pleasantness, Attention
Schlosberg (1954)       Pleasantness, Activation, Attention
Osgood et al. (1957)    Evaluation, Activity, Potency
Osgood et al. (1957)    Evaluation, Dynamism
Ekman (1957)            Pleasantness, Activity
Osgood (1976)           Pleasantness, Activation
Russell (1980)          Valence, Arousal
Mehrabian (1996)        Pleasure, Arousal, Control/Dominance

Table 3.2: Comparison of dimensional theories of affect

The dimensional theories have been criticised by Paul Ekman, a strong advocate of the categorical approach, arguing that “the evidence suggested at least four or five dimensions” (Ekman et al., 1972, pp. 73-74) and consensus only covers the most
basic factors of evaluation and intensity. Osgood remarks that “of course, there
must be many dimensions, so the real question is how many are needed to account
for the lion’s share of the variance” (Osgood, 1976, p.126). This is in line with his
analysis of ‘meaning’ (Osgood et al., 1957), in which many factors were identified
but three of them proved to be the most stable and important.
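Mehrabian's octant scheme translates almost directly into a data representation: an affective state is a point in three dimensions, and its octant follows from the signs of the coordinates. The following Python sketch is purely illustrative; only the example terms per octant are taken from table 3.1, everything else is an assumption made for the example.

from dataclasses import dataclass

@dataclass
class PAD:
    """An affective state as pleasure, arousal and dominance values in [-1, 1]."""
    pleasure: float
    arousal: float
    dominance: float

    def octant(self) -> str:
        """Signs of the three coordinates, e.g. '+P +A -D'."""
        sign = lambda x: "+" if x >= 0 else "-"
        return f"{sign(self.pleasure)}P {sign(self.arousal)}A {sign(self.dominance)}D"

# Example terms per octant, after table 3.1 (Mehrabian, 1996).
OCTANT_EXAMPLES = {
    "+P +A +D": "admired, bold, creative",
    "+P +A -D": "amazed, awed, fascinated",
    "+P -A +D": "comfortable, relaxed, satisfied",
    "+P -A -D": "consoled, protected, sleepy",
    "-P +A +D": "antagonistic, cruel, hostile",
    "-P +A -D": "bewildered, distressed, upset",
    "-P -A +D": "disdainful, indifferent, uncaring",
    "-P -A -D": "bored, depressed, sad",
}

state = PAD(pleasure=-0.4, arousal=0.7, dominance=-0.2)
print(state.octant(), "->", OCTANT_EXAMPLES[state.octant()])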
Categorical approach
The categorical approach to emotion is built on the notion of a small set of basic
emotions (e.g. Ekman et al., 1972, Ekman, 1994, Plutchik, 1980, Izard, 1977), which
are distinct from each other. These emotions are seen to have evolved to serve specific functions. Commonly, emotions are seen to be connected with
the pursuit of goals (e.g. Ekman, 1999a). For instance, anger occurs when a plan to
reach a goal does not work out, while we feel happiness when we have achieved
our goals (Oatley, 1992). Through emotions, “our appraisal of a current event is
influenced by our ancestral past.” (Ekman, 1992a, p.171) In this view, emotions help
to guide our behaviour in a world of unexpected events. Thus, basic emotions are
not rational and “solve problems with speed rather than precision” (Sloboda and
Juslin, 2001, p.77).
There is general agreement about the fact that a list of basic emotions does not
cover the whole range of emotional states human beings can experience. The explanations for this discrepancy, however, differ. Plutchik (1980) postulates 8 basic
emotions, which can occur simultaneously to produce non-basic emotions. Ekman
(1999a) has also considered this possibility, but then introduced his theory of emotion families, which is explained below.
The idea of ‘basic emotions’ has been criticised on various occasions, mostly by
proponents of a dimensional approach (e.g. Russell, 1994). A point of criticism is
the unclear distinction between which emotions are basic and which ones are not.
The number of identified basic emotions varies greatly between authors. Ortony
and Turner (1990) compared categorical theories of emotion and found as few as 2 and as many as 18 postulated basic emotions in different theories. However, this discrepancy most likely comes about through different definitions of what an
emotion really is (Sloboda and Juslin, 2001). Accounting for that, there seems to be
considerable consensus among most advocates of a categorical approach in regard
to those emotions which have been postulated by Ekman.
Paul Ekman
Paul Ekman is the best known proponent of a categorical approach. He started out
using a two-dimensional approach (Ekman, 1957), but soon revised his theory. He
became most famous for his cross-cultural studies of emotion (Ekman et al., 1972),
in which he used photographs of facial expressions from different cultures. He
could show that emotions could be accurately judged cross-culturally from looking
at facial expressions. Facial expressions are crucial to his theories of emotion, so much so that at times he considered only those affective states which are accompanied by distinct facial expressions to be proper emotions (Ekman, 1999a, Ortony and Turner, 1990, Sloboda and Juslin, 2001). He inferred six basic emotions which could
be judged cross-culturally: ‘anger’, ‘joy’, ‘surprise’, ‘fear’, ‘sadness’, ‘disgust’.
Later, he refined his theory. While he maintains that basic emotions are always
accompanied by a bodily signal, it does not need to be a facial expression (Ekman,
1992a). He introduces the notion of emotion families, which are similar emotions
and variations of one basic emotion (Ekman 1975; cited in Ekman, 1992a). He
also acknowledges that many other affective states are candidates for basic emotion
status, namely ‘interest’, ‘contempt’, ‘guilt’ and ‘shame’ (Ekman, 1992a). He also
addressed the problem that his list only includes one positive emotion, ‘joy’, but
five negative ones. He maintains that there are as many positive emotions as there
are negative emotions. Unlike negative emotions, however, positive emotions do
not have a distinct bodily signal, but share the facial expression of a smile.
His theories will be further explored in chapter 5.5. Since his theories are closely
tied to expressions, they provide a promising framework for the visualisation of
affect in interactive systems.
3.4 Discussion
Both approaches, dimensional and categorical, have their merits and receive encouraging experimental results. In fact, the two approaches seem to describe different
aspects of the same phenomenon.
Christie and Friedman note in an emotion judgement study which used films
as stimuli that “a valence – activation circumplex was found in experienced emotion despite that the films were selected on discrete emotion criteria” (Christie and
Friedman, 2004). Ekman, a strong advocate of basic emotions, acknowledges the factors evaluation and intensity (Ekman et al., 1972). Two dimensions of emotion are
seen as a simple explanation but are continually regarded to be insufficient to accurately describe emotions (Young et al., 1997, Ekman, 1994, Izard, 1997), while
three-dimensional models are seen to describe emotion accurately enough for practical purposes (Mehrabian, 1996).
On the other hand, the notion of basic emotions which serve biological functions is compelling. A large number of cross-cultural experiments in which test
subjects were able to accurately judge emotions shows that human beings actually
do think in categories of emotion (Etcoff and Magee, 1992) and are able to name
them correctly.
It is not a new idea that these two approaches are actually not so different. Figure 3.4 shows an early attempt to map categories of emotion onto a two-dimensional
affective space. In their comparison of the two approaches, Young et al. conclude:
“Dimensions such as pleasant–unpleasant thus correspond to intellectual, not perceptual constructs” (Young et al., 1997). Dimensional approaches constitute a model
to efficiently describe emotions. However, evidence suggests that humans do actually think in emotion categories.
In chapter 2, I introduced the work of Osgood et al. (1957). In semantical studies,
they examined the nature of the affective connotation of words and inferred the
existence of an affective space, constituted by the dimensions evaluation, activity
and potency. Their studies show that any stimulus occupies a point in affective
space. The words used for their studies did not specifically denote affect or emotion
but still had an affective connotation. The semantic differential then is but one way to make affective connotation visible.

Figure 3.4: Emotion categories on a two-dimensional affective space, with axes pleasant-unpleasant and attention-rejection and categories such as love/mirth/happiness, surprise, fear/suffering, anger/determination, disgust and contempt arranged around them. Adapted from Woodworth and Schlosberg (1954).
Dimensional theories of emotion start out from different propositions but arrive at very similar conclusions. This is hardly surprising; emotions are one of
several kinds of affective phenomena and can thus be described in the same way.
Therefore, each emotion naturally occupies a position in affective space. Mehrabian (1996) explicitly references evaluation, activity and potency as a framework for
emotion description. Since emotions are affective phenomena, they not only have
an affective connotation but actually denote affect. Consequently, their positions in
affective space are quite easily determined and are likely to be more extreme than
the position of concepts which do not denote affect.
However, this approach does not fully explain emotions. Darwin (1872) showed
the existence of emotions in other primates, a view which was confirmed by the
studies of Ekman and others. Emotions are ancient mechanisms which predate
language and have developed through evolution to automatically adjust our body to
external influences. They also serve a communicative purpose; through facial expressions, surrounding beings are informed about one’s emotional state, in order to
adjust their behaviour accordingly.
While emotions are not the only affective phenomena, they certainly are very
important ones. Their close ties with expressive behaviour, first and foremost facial
expressions, as well as their cross-cultural universality, make them a very interesting candidate for a task that is necessary on the way to affective space interfaces: the expression or visualisation of affective space on a computer screen. Chapter 6 describes
such an attempt.
In summary, I believe that dimensional models are a general description of affect. These dimensions, however they might be called, apply to any concept which
carries meaning and thus can be evaluated by humans. Emotions are not equal
to affect, but rather are a specialised form of affect which fulfils several biological
functions. Categorical approaches describe specific emotions, not affect. However,
since any emotion can be located in a general affective space, they describe affect
via this proxy.
Computational models of affect
Peter and Herbon (2006) give a rare overview of how the various emotion and affect
theories could be best transferred into the domain of human-computer interaction.
They observe a current lack of solid theoretical models of emotion in most affect-related software efforts. Software designers should look into the field of psychology
to select those theories best suited for use in software. This decision should be based
on the system’s requirements.
They favour dimensional theories and disapprove of categorical theories for use
in HCI. They criticise that categorical theories label emotions with words, a process
which they deem not only unnecessary but also counter-productive. They see emotion categories as artificial and specific to the English language. Dimensional theories, on the other hand, do not need verbal descriptions. Dimensional descriptions are easily transferable into software, requiring only two or three stored values to fully describe an emotion.
I agree with the notion of Peter and Herbon that software designers need to pick
those theories which are best suited for a specific task at hand. However, I do not
share their strong disregard for the categorical approach. Research has shown that
humans do think in categories of emotion. The large body of evidence from cross-cultural
research, in which emotions were not named but only shown as photographs, compellingly shows that emotional categories are neither artificial nor language-specific.
The strong connection with facial expressions is a major advantage of emotion categories. Facial expressions are a promising way to visualise at least part of affective
space in a universally understandable manner.
As I have explained before, the two approaches most likely describe different
things. A dimensional model seems to be ideal for general purposes in which
affect should be described. For the description of emotions, however, the categorical
approach seems to be more expressive.
One might argue that categories could simply be expressed by locations in affective space. A dimensional model at the core would then render actual categories
obsolete. However, I do not believe that an actual emotion can be fully expressed
by two or even three affective dimensions. Their meaning is quite specific and most
likely includes other dimensions. Still, a three-dimensional model seems to be able
to capture the differences between many of the proposed emotion categories. Conversion of values between the models should then be possible, though at a loss of
accuracy in either direction.
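A hedged sketch of what such a conversion might look like: each basic emotion is assigned a prototypical point in pleasure-arousal-dominance space, a category maps to its prototype, and a point maps back to the nearest prototype, with the residual distance indicating the loss of accuracy. The prototype coordinates below are invented placeholders, not empirically determined values.

import math

# Illustrative prototype positions for Ekman's six basic emotions in PAD space.
# The coordinates are placeholders for the sake of the example, not measured values.
PROTOTYPES = {
    "joy":      ( 0.8,  0.5,  0.4),
    "surprise": ( 0.2,  0.8,  0.0),
    "fear":     (-0.6,  0.7, -0.6),
    "anger":    (-0.5,  0.8,  0.4),
    "sadness":  (-0.7, -0.4, -0.4),
    "disgust":  (-0.6,  0.2,  0.2),
}

def category_to_pad(category):
    """Category -> dimensional values: simply return the prototype point."""
    return PROTOTYPES[category]

def pad_to_category(pleasure, arousal, dominance):
    """Dimensional values -> nearest category, plus the residual distance."""
    best, best_dist = None, float("inf")
    for name, point in PROTOTYPES.items():
        dist = math.dist((pleasure, arousal, dominance), point)
        if dist < best_dist:
            best, best_dist = name, dist
    return best, best_dist

print(category_to_pad("fear"))
print(pad_to_category(-0.55, 0.75, 0.35))  # closest to 'anger', with some residual error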
Chapter 4
Affect in art and media
The notion of affective space was inferred empirically from language stimuli (see
chapter 2). However, there is a considerable body of evidence that this concept is
ubiquitous in human creations and can be observed (and reliably judged) in any
form human creativity takes. Art has a very special relationship with emotions.
Some forms of art seem to be created specifically for affective reasons, but the
principle can be observed in any piece. Furthermore, everyday media produce
affective connotations as well.
I begin with general theories which outline how it is possible that art and media
are related with emotions. Afterwards, I will examine the special relationship of
emotions and music. The findings of this chapter lay the final groundwork for the
use of affective information in interactive systems.
4.1 General theories
Emotions are a defining feature of sentient beings, which include humans and animals. As explained in chapter 3, emotions can be thought of as processes, which
begin with appraisal and may result in expressive behaviour. Expressive behaviour,
like facial expressions (see chapter 5), is therefore not the cause of emotions, but an
accompanying signal of an ongoing emotional process.
Works of art are not sentient. This leads to two questions I will try to address:
How is it possible that we think that art can express emotions? Furthermore, emotions always have an object towards which they are directed (Solomon, 1993). Why, then,
are we actually moved emotionally by a work of art if our appraisal of the situation
tells us that there is no actual reason to feel in such a way?
Sign models
In chapter 2, I made frequent use of the simple sign model (signifier signifies signified). The concept is well suited for language, since it captures the usually arbitrary
relationship between words and their denoted concept. It is not possible to discern
what a word means from just reading or hearing it, and different languages use
different words to reference identical concepts.
Davies (2001), who writes about emotions in music, mentions this as one possible way of thinking about how art can express emotions. In this model, works of
art merely serve a sign function like words do. The relationship, then, is arbitrary
and works on the basis of syntactic rules. He counters, though, that such a relationship is not possible because music lacks semantics and other defining features of
language.
In linguistics, onomatopoeia is the exception to the rule that words are arbitrary
signs. Onomatopoeia describes words whose spelling and pronunciation imitate the
sound of their signified. Well-known English examples include hiccup and cuckoo.
Onomatopoeic signs are not arbitrary, but conventionalised and still differ between
languages. This class of signs may be called iconic or representational signs (e.g.
Davies, 2001, Mikunda, 2002). Iconicity generally refers to signs – linguistic and
non-linguistic – whose signifier is similar to, or somehow resembles its signified.
This does not, however, take away from their sign function. Cuckoo still is, though
not completely arbitrary, a linguistic signifier for a specific kind of bird. Iconic
signs refer to something else – their denotation – and can give rise to feelings –
their affective connotation.
In the context of music, Davies compares this to certain sounds. For instance,
while a trumpet can produce high, and therefore bright sounds, the lowest sounds
of a clarinet would be dark. However, he raises the concern that this model does
not reflect how we feel about music; reading the word sad does not give rise to the
same feelings as listening to sad music. Emotion is actually expressed by works of
art, not just signified. This leads to the contour theory.
Contour theory
A stronger form of connection between art and emotion can be achieved if a work of
art does not merely function as a signifier for a certain emotional state but actually
expresses this state. This becomes possible if a work of art bears a great deal of
resemblance to an emotional state.
Davies (2001) is a proponent of this theory. He argues that when something
closely resembles an emotion, it is plausible that this something should be perceived
to express an emotion. This solves the problem that a piece of art is seen to express
emotions, even though it is not sentient and thus cannot ‘have’ them.
This resemblance comprises any form through which emotions are expressed
by sentient beings, like facial expressions, gestures, postures and other forms of
behaviour. Davies argues that these expressions do not operate as a signifier for an
emotional state like iconic signs do, but ‘are’ this state in itself. Some behaviours
are always perceived as expressive of emotions, even though they are not actually
expressing a present emotion.
He gives the example of the weeping willow and St Bernard’s dogs (see figure
4.1). Both shapes appear to be sad-looking. However, plants are not sentient and
thus cannot possibly experience this emotion. Dogs are sentient, but there is no
reason to assume that St Bernard's dogs are sad, even though their faces look as if
they were. Davies attributes this phenomenon to the human trait of anthropomorphising our environment, which means that we ascribe human characteristics to
non-human beings and even non-living things.
Figure 4.1: Plants and animals can be perceived as sad-looking without feeling this
emotion. Taken from Davies (2001, p. 36).
Pieces of art can then mimic these expressive behaviours in their own ways.
Paintings might imitate shapes that are found in facial expressions or postures,
while music’s dynamic character could imitate the movements that characterise
emotional states (like the slow movement of a sad person). In the contour theory,
only those emotions that result in expressive behaviour can thus be expressed in
works of art.
A question that remains is why recipients would react emotionally to works of
art which only resemble an emotional expression. For instance, listening to sad
music actually invokes physiological changes similar to actually experiencing this
emotion; in this way, listeners ‘mirror’ the emotion expressed in a song. If we are
aware that an entity only appears to be expressing sadness but in fact is not feeling
this way, it is counter-intuitive that we should mirror the emotion.
Davies responds that such an emotional reaction is not an emotion in the narrow
sense, which requires that an emotion is always directed at or is about something
(Solomon, 1993). The work of art does not become the object of the emotion we
experience. Instead, he compares the phenomenon to situations in which a certain
mood – say, sadness – is prevalent. Even though one might not have any personal
reason to feel sad, the mood affects one to experience this emotion. This idea conforms with contagion, the third type of emotion communication noted by Scherer
and Zentner (2001) (see chapter 3).
Expression theory
Robinson (2005) is a proponent of the expression theory. This theory maintains that
the emotions in a piece of art are expressed by a persona. This persona might be
identical to the author¹, in which case she speaks of an implied author. In other
cases, it is a completely fictional character. The introduction of a persona solves the
dilemma that works of art are not sentient and thus cannot express emotions. Instead, it is a character, embodied in the piece, who lives through emotion processes.
This persona is shaped by the work of art, because it might give hints about
this persona’s character, appearance, behaviour and beliefs. Though certainly influenced by the work, the actual form this persona takes is constructed by the recipient2 .
This is a very similar view to Iser (1978), who examines the role of the reader in
literature. He maintains that a text can only come to life if it is “konkretisiert”, or realised, by a reader. In the reader’s mind, the stimuli a text provides are augmented
to draw a certain ‘picture’ (“Konkretisierung”), one that differs between readers and even when one person reads a text multiple times. Only through this process can a story achieve what he calls “life-likeness”.
Once the persona is established, it is subject to and interacts with the fictional
world described. Thus, Robinson offers two interrelated ways in which emotion
can be expressed in art. On the one hand, the piece can focus on the environment
the person is exposed to. Emotion is then expressed by showing how the world
appears to the person when he or she feels a certain emotion. For instance, if the
person is angry, the world would be described as offending; if the person is sad,
the world would be a place that is empty and devoid of meaning. On the other
hand, the focus can be on the person. Then, emotion can be expressed through the
character itself, by describing his or her thoughts and beliefs. An angry character
would be offended, a frightened character would believe themselves to be threatened. Since the
character interacts with the world, this is a matter of focus, and a successful work
of art would likely exhibit both possibilities.
When a piece of art expresses a certain emotion, this is usually done intentionally by the author. However, the recipient is the one who finally realises the work.
Thus, it is also possible that the stimuli provided by the work lead to an emotional
response not intended by the author. In this way, a piece of art comes to occupy its
place in affective space. The character's thoughts that are expressed occupy a point in affective space, which results in affective connotation of the piece.

¹ The author, in this context, refers to the originator of a work of art. Depending on its mode, this might be the composer, the director, etc.
² Depending on the media type, recipient refers to reader, listener, observer, etc.
In parts, Robinson picks up ideas from the contour theory; she acknowledges
the role of resemblance and that some shapes, sounds or movements naturally correspond with emotions, just the way facial expressions tell us something about the
emotions of other humans. These characteristics are being distorted, simplified
or exaggerated by artists to show the emotion more clearly, “as if to abstract the
essence of the expressive gesture in a purified form” (Robinson, 2005, p. 288). She
believes, however, that art can go much further in its expressive powers.
After we have been affected by an emotion process in real life, we can look back on
our experience and label the process with a term that summarises the feeling, like
sad or happy. This is very difficult to achieve while we are experiencing an emotion.
For Robinson, this is the point which art excels at.
Works of art do not just show emotions like facial expressions but let the recipient actually participate in what it is like to go through an emotion process. Art
which expresses emotions reflects on an emotion process and offers the recipient a
summary of the process. If this is done successfully, the recipient as well as the author learn something about the emotion and afterwards understand it a little better.
And while emotion terms can only give a rough indicator, emotion expression in
art can be incredibly precise and subtle.
As its name implies, the expression theory focuses on how works of art are
capable of expressing emotions. The other question, how emotions can be aroused
in the recipient, is not touched upon, but does not pose a problem in this framework.
The emotions are expressed through a persona, which the recipient accepts as a
sentient being. Thus, the same ways in which emotion is communicated between
humans – induction, empathy and contagion – can apply (see chapter 3).
Discussion
A simple sign model as an explanation of how emotions are expressed in art is generally rejected. Even when this relationship is not arbitrary but somehow natural
through resemblance, it does not capture how we feel about art. Such a model
would suggest that art can only express emotions if it denotes affect in the same
way the word sad denotes an emotion. Davies (2001) argues that this would reduce
art to “brute naming” of emotions. Instead, emotions and affect are perceived as
being inherent to works of art.
The second chapter has shown that affect does not need to be denoted to be
present. Whenever we use signs, they are always bound up with affective connotation, even if they are denoting concepts that are not affect-related. I maintain that
the situation is no different for art. Again, the actual content does not need to
denote emotions to be expressive of affect. Instead, the impressive emotional expressiveness of art lies in the intentional and skilful control of its affective connotation. If affect is connoted, it fulfils the immediate character of emotional experience Davies is asking for.

Figure 4.2: Denotation and connotation (signifier Sr and signified Sd at the levels of denotation and connotation).
In figure 4.2, this function would be located in the curly brace that controls
which associations are evoked. Again, we can see that neither the signifier nor
the signified alone is responsible for affective connotation, but the sign as a whole.
What is denoted (signified) is as important as how it is denoted (signifier). However,
some forms of art may put emphasis on the one or the other. Realistic depictions
(like photographs or detailed landscape paintings) accurately depict the signified,
while abstract depictions (like abstract paintings or instrumental music) at best hint
at a signified but achieve their expressiveness through their signifier.
This is the point where the contour and expression theory come into play. They
are general attempts to explain how such control can be achieved, which can be
further detailed for any media type.
One way to achieve this effect of regulation is captured by the contour theory. When images or pieces of music bear resemblance to typical ways in which emotions are expressed by humans, it seems only natural that such associations are aroused in the recipient. This effect can operate independently of an artwork's denotation. In Davies' example of St Bernard's dogs, nothing sad is denoted, yet the
expression is perceived as such because the facial features of this type of dog resemble the facial expressions of sad humans. In this context, Robinson (2005) speaks
unfavourably of the “doggy theory” and is convinced that this cannot possibly explain the emotional expressiveness of art.
The expression theory with its focus on persona in art covers another possibility.
The theory is most plausible when there actually is a character in a work of art who
can express his or her thoughts. Davies (2001) denies the proposition of expression
theorists that recipients always construct a persona in works of art which expresses
his or her emotions. He gives the example of instrumental music, for which he
does not believe listeners construct a persona in the absence of a character. The same
argument could be given for any form of abstract art. Robinson (2005) responds
that it is not necessary to think of a persona to appreciate an instrumental piece of
music, but it is possible and makes it easier to understand its emotional expression.
Another possibility is the use of symbols. Artworks cannot be fully explained by
a simple sign model, but the signifying character of symbols can play an important
role. Some forms of art – surrealist paintings, for example – frequently depict
objects or shapes which are not to be taken literally but actually denote something
else. In this context, a symbol can be seen as a conventionalised sign relation, in
which a certain signifier reliably stands for a signified. Such symbols, in turn, have
their own affective connotation, thus contributing to the general affective value of
a work of art.
The routes outlined above, through which art can achieve its affective expressiveness, are only simple explanations, and doubtlessly, there are countless other
possibilities to consider. The real value, however, lies in the interplay between these
factors, a complex set of denotations and connotations working together to express
the artist’s vision. The explanations offered here may be simplistic; yet they present
possible starting points to represent affective connotation in interactive systems. To
some degree, these general models should be valid for any media type. In the
following, specificities of an important media type, music, will be pointed out.
4.2 Music
Music is the art form for which the most compelling evidence of a strong connection with affect and emotions exists. Scherer (2001) notes, "It has often been claimed
that music is the language of the emotions”, and Juslin and Sloboda (2001, p. 3)
believe that “emotional experience is probably the main reason behind most people’s engagement with music.” Interactive systems which deal with music should
therefore benefit from awareness of its affective value.
The relationship, however, is not easily explained. If we engage with music
because of emotions, what are these emotions? Are these emotions expressed by
music, or does music arouse emotions in the listener? And if music arouses emotions, how is it possible that so many of us listen to music that would be described
as sad, when we usually avoid negative emotions?
These questions are even more difficult to answer if the focus is on instrumental
music. In absence of lyrics, it is hard to imagine a persona expressing his or her
thoughts, thus more or less ruling out the expression theory for instrumental music
(though Robinson (2005) maintains this is still possible).
Influence of musical structure
Gabrielsson and Lindstrom (2001) examined how emotional expression is influenced by musical structure, such as tempo, pitch, melody and rhythm. To this end,
they surveyed an impressive number of studies. They subscribe to the idea that
composers do not express the emotions they are currently feeling, but are aware of
the effects of musical structure and use them to achieve intended emotional expressions, quite in
line with contour theory.
The consulted studies can be divided into those that use real music stimuli
and those that use short, isolated sequences. The advantage of using real music
is that it provides a realistic setting. On the downside, results gathered in such
manner are inherently ambiguous because there is no controlled variation of single
factors. Studies which use short sound sequences are easier to interpret in absence
of possible interactions. However, they might lack validity because they are not
part of a complex musical structure. As a compromise, some studies used real
music which was manipulated to vary in one specific factor. A difficulty of this
approach is to achieve natural-sounding compositions for every varying factor.
When isolated, tempo is strongly positively correlated with valence and arousal.
These tendencies, however, can be overruled by other factors, like mode. Minor
mode is often associated with sadness, while major mode is more likely to express happiness. These tendencies were even confirmed in studies with children.
However, happiness can also be expressed in minor mode under the influence of
other factors, like pitch and loudness. Loudness is strongly correlated with arousal,
though big changes in loudness seem to express fear. The influence of pitch is not
as clear, though in many cases it seems to be positively correlated with valence.
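Read as a rule of thumb, these tendencies could be encoded as a crude mapping from structural features to a valence-arousal estimate, as in the following illustrative sketch. The weights are arbitrary, and, as Gabrielsson and Lindstrom stress, any serious model would have to account for the interplay of factors rather than simply add them up.

def rough_affect_estimate(tempo_bpm, is_major, loudness_db):
    """Very crude valence/arousal guess from three structural features.
    The weights are arbitrary illustration values, not empirical results."""
    # Normalise inputs to roughly [-1, 1].
    tempo = max(-1.0, min(1.0, (tempo_bpm - 100) / 60))      # 100 bpm as neutral point
    loud  = max(-1.0, min(1.0, (loudness_db - (-20)) / 15))  # -20 dBFS as neutral point
    mode  = 1.0 if is_major else -1.0

    valence = 0.4 * tempo + 0.5 * mode + 0.1 * loud
    arousal = 0.5 * tempo + 0.5 * loud
    return valence, arousal

print(rough_affect_estimate(tempo_bpm=150, is_major=True,  loudness_db=-8))   # upbeat
print(rough_affect_estimate(tempo_bpm=60,  is_major=False, loudness_db=-25))  # subdued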
In their conclusion, Gabrielsson and Lindstrom stress the importance of context
when examining the affective meaning of musical structure. The expressive power
of music does not lie in isolated factors but in the complex interplay between them.
In my view, this can be compared to language; single words may denote concepts
and connote affect, but explaining the meaning of each word can never explain
the full message of a text, which only comes to life because of the way words are
arranged.
Another differentiating feature between the surveyed studies is the way emotional reactions were recorded. The oldest studies asked their subjects to give free
descriptions of their emotional experience. Some studies used a list of affective
terms from which subjects had to pick the most appropriate term. Other studies
tried to get non-verbal responses from subjects, which were measured continuously.
These methods are most appropriate to measure dynamic, time-dependent effects
of music.
A final group of studies asked subjects to rate their experiences along semantic
differentials. The results were then analysed for correlations between factors, usually through factor analysis, to determine underlying dimensions. The cited studies
obtained very similar results. In line with studies of other types of stimuli that used
semantic differentials and factor analysis, two factors were consistently identified:
valence and arousal. These results are yet another indicator of the ubiquity of these
factors in affective responses.
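For illustration, the core of such an analysis can be sketched as a principal component decomposition of a ratings matrix (subjects by scales). The data below are random stand-ins, and a proper factor analysis would additionally involve rotation and interpretation of the loadings; the sketch only shows the mechanical step of extracting dominant dimensions.

import numpy as np

rng = np.random.default_rng(0)

# Stand-in ratings: 40 subjects rate 8 semantic differentials on a 7-point scale.
ratings = rng.integers(1, 8, size=(40, 8)).astype(float)

# Centre each scale and decompose; with real data, the leading components would
# correspond to the underlying affective factors (e.g. valence and arousal).
centred = ratings - ratings.mean(axis=0)
_, singular_values, components = np.linalg.svd(centred, full_matrices=False)

explained = singular_values**2 / np.sum(singular_values**2)
print("variance explained by the first two components:", explained[:2].round(3))
print("loadings of the scales on component 1:", components[0].round(2))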
Production of emotions in the listener
Scherer and Zentner (2001) examined if and how emotions can be ‘produced’ by
music in the listener. They note that listeners usually have little trouble judging
what kind of emotion is expressed in a piece of music. However, this does not
automatically mean that the listener feels such an emotion.
The authors note a study in which they instructed subjects to describe both
which emotions they believe the music stimuli are supposed to express (perception),
and what they actually felt when listening to the music (production). The results
gathered differed considerably. In most cases, perceived emotions were reported as
much stronger than felt emotions, though in some cases, the opposite was reported.
Scherer and Zentner survey various efforts to measure emotional responses to
music. Some studies used a dimensional model, and some followed a categorical
approach. Followers of the categorical approach maintain that presence of basic
emotions always results in specific facial expressions. Thus, if music produces these
basic emotions in the listener, facial activity should be present. One study notes an
effect of valence on facial expression. Dislike of music (negative valence) tends
to result in contraction of the corrugator (the ‘frowning’ muscle), while favourable
music (positive valence) activates zygomatic major (the ‘smiling’ muscle). Generally,
though, the results are less significant than what would be expected. On the other
hand, the dimensional model is seen as valid, but at the same time not being capable
of capturing the subtle differences between emotions. This is a view in line with
other critics of the dimensional approach (see chapter 2).
They conclude that the results of the studies are inconclusive, and they believe that
neither of the approaches is appropriate to measure which emotions are aroused in
listeners. They doubt that basic emotion terms like anger, fear or disgust are the
emotions we are likely to feel when listening to music, for they describe emotions
much stronger than how we feel about music. They encourage the development
of a taxonomy of terms that are appropriate for the measurement of the subtle
emotional responses to music, like longing, awe, solemnity or tenderness.
Time dependency
At this point, it is important to remember the difference between moods and emotions. While moods are seen as longer-lasting states, emotions tend to be short experiences which necessitate the presence of an object or event towards the emotion
is directed. Schubert (2001) recognises this and reports on the continuous measurement of emotions in music listening. One way to achieve this is to let listeners
record their affective experience on a valence-arousal affective space.
Results showed that the expressed emotions vary considerably over the course
of whole songs. The overall feeling a piece of music evokes seems to be different
to the single emotions evoked in the course of the song. The average of continuous responses is less pronounced than overall responses. This aligns well with
intuition; emotional responses may be strong, but are also short-lived. They may
be followed by less emotional passages. They may even cancel each other out on
a two-dimensional affective space when a song contains both positively and negatively valenced passages. As with all statistics, calculating an average is not very
meaningful without considering variance, too.
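A small numerical illustration of this point: the mean of a continuously recorded valence trace can sit close to neutral even though individual passages are strongly valenced, which only the variance reveals. The sample values are invented.

import statistics

# Invented continuous valence samples for one song, in [-1, 1], one per second.
valence_trace = [0.1, 0.4, 0.8, 0.7, 0.2, -0.3, -0.8, -0.6, -0.1, 0.3, 0.6, 0.1]

mean = statistics.fmean(valence_trace)
variance = statistics.pvariance(valence_trace)

print(f"mean valence:     {mean:+.2f}")    # fairly close to neutral
print(f"valence variance: {variance:.2f}")  # reveals the emotional swings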
As we have seen before, there are many ways in which emotions can be expressed by musical structure. This expressed emotion can vary considerably across
different parts of the song. Thus, in order to capture the emotional expression of
a song, it is necessary to measure the listeners’ responses continuously. However,
listeners can readily give statements about the overall affect expressed in a song,
which seems counter-intuitive considering the changes in emotion recorded in continuous measurement. Thus, I believe that the overall affective response to a song
most likely refers to its mood, not emotion.
Evaluation vs. pleasure
Davies (2001) noted the seemingly paradoxical situation that many people enjoy
music which expresses emotions with negative valence (such as sadness). This
phenomenon becomes even more puzzling when it is taken into account that music
can produce emotions in the listener. According to Davies, the sadness expressed in
music is sometimes mirrored by listeners, who thus feel sad themselves. In everyday
life, we tend to avoid negative emotions. How, then, is it possible that we willingly
engage in music which makes us feel sad?
Davies illustrates several ways to address this question. One view puts emphasis on the educating effect of emotions in art (as noted by Robinson, see above).
When we experience negative emotions through music, there are no negative consequences, as opposed to situations in real life. Thus, the induced emotions are
muted in comparison and make us accustomed to these feelings. Davies counters
that this might be true for representational art for which a persona can be constructed, but instrumental music and other non-representational art cannot achieve
this educating effect.
Another possible answer is that we recognise the negative expression in music,
yet value the way this is achieved. The negative emotions we experience are then
just something we need to face to see the true value of a piece of music. Davies
compares this to endurance races and other challenging activities, in which the
negative aspects are to be overcome on the way to the achievement of finishing.
This shows that in real life negative emotions are not avoided under all circumstances, as one might intuitively assume.
Schubert (2007) conducted a study about the influence of perceived and felt
emotion on preference in music. Emotion perceived to be expressed in music is
called external locus (EL), and emotion felt by the listener is called internal locus (IL).
Subjects rated music along valence-arousal dimensions (defined with the words
happy-sad and aroused-sleepy). In addition, they rated ‘emotional strength’, familiarity and preference (defined as hate it-love it), each on a 7-step scale.
Schubert maintains that preference for the expression of negative emotions in
music implies a certain level of dissociation between these experiences. Either there is partial dissociation, which would mean that the negative emotion aroused by the
stimulus is overruled by the positive emotion of preference. Full dissociation, on
the other hand, would mean that the listener feels the negative emotion which is
expressed, yet enjoys this feeling.
The results showed that the emotion subjects felt while listening to music had a
greater influence on preference than what was perceived in music, though recorded
absolute values for felt emotion were lower than expressed emotion. Schubert notices the effect of a “locus gap”; music for which felt emotion and perceived emotion
were rated similarly tended to achieve higher ratings for preference. He suspects that
pieces of music which try to express an emotion which is not aroused in the listener are seen as failing to achieve their intended effect. The biggest influence on
preference is exerted by ‘emotional strength’. What kind of emotion is expressed
or felt is less important for preference than the intensity with which an emotion is
expressed. Another strongly influencing factor is familiarity; pieces which listeners
know well tend to be preferred over previously unknown pieces.
Overall, the study shows that emotions expressed in music can actually produce emotions in listeners, but expressed and felt emotions are not necessarily the
same. Negative emotions may be enjoyed as much as positive emotions, which hints
towards full dissociation between emotion and preference; it seems that listeners
enjoy the emotions (even the negative ones) that are aroused through music.
Another way to look at the phenomenon is offered by Norman (2004). He distinguishes between the visceral and reflective level of emotional experience. Visceral
refers to hardwired reactions of our nervous system, while reflective is a cognitive
evaluation of a stimulus. Seen in this way, music which is evaluated negatively on
the visceral level may still be valued positively on the reflective level if we are familiar with it. Norman calls this an “acquired taste”. This distinction aligns well with
the strong influence of familiarity on music preference noted by Schubert. Only
when we are accustomed to a style of music can the reflective level overrule the
negative first response on the visceral level.
It seems like we are dealing with two independent forms of valence. When Osgood et al. (1957) examined the affective connotation of words, they named the most
important factor evaluation. It describes our attitude towards a stimulus, whether
this thing is good or bad. Mehrabian (1996) adopted the model for emotions, and
named the first factor pleasure, which he sees as “cognitive judgements of evaluation”. Affective states which are evaluated positively are those that are pleasurable
for us, while unpleasant emotions are evaluated in a negative way. Thus, when our
personal emotions are concerned, our evaluation (good-bad) is identical with the
way it affects us (pleasant-unpleasant).
Preference in music is not just about its affective value, it is a judgement about
our general evaluation of a piece. Unpleasant emotions may be aroused in the
listener, and the emotion in itself would be evaluated negatively. However, this
emotion is embedded in the complex context of a piece of music (or any other
form of art) and may be an integral part of the whole. As the study of Schubert
has shown, these judgements are independent of each other. Preference and the
valence of expressed or felt emotion are both evaluative dimensions. Henceforth, I
will try to distinguish between these two forms of valence. I will use pleasure for
the valence of emotional states, evaluation for preference judgements and valence as
a more general term which encompasses both concepts.
4.3 Discussion
The discussion above has focused on the expression of affect in music for several
reasons. Music seems to have an even stronger relationship with affective experiences than most other forms of art. Furthermore, the question about how it is
possible for works of art to express emotions is most difficult to answer when the
focus is on instrumental music. I have outlined only a few ways in which this seems
to be possible; undoubtedly, there are many other possibilities to consider.
The analysis has uncovered that appraisal in art seems to consist of two layers.
These two layers become apparent if one examines how it is possible that we enjoy
works of art which express negative emotions. To distinguish between the two
layers, I recommend the term pleasure to describe the emotion which is expressed
in a piece, and evaluation for the appraisal process of a complete piece.
The general dimensions of affect – evaluation, activity and potency – seem to be
valid for works of art, regardless of their form. However, it seems likely that specific
forms of art are more suitable to express some forms of affect than others. In the
case of music, it seems unlikely that some of Ekman’s basic emotions are expressed.
Therefore, if the goal is to describe affect, it is important to remember that expressiveness can be optimised if the explanatory model respects the specificities
of the art form. The general affective dimensions then provide a common ground,
give a first indication of a piece’s affective connotation and enable comparisons between different forms of art. This leads to the next chapter, in which I examine
different ways in which affect might be described.
Chapter 5
Metalanguages for affective space
How do we gain access to the information in affective space? Chapter 2 introduced the notion of Barthes (1973/1996) that by definition, connotation cannot be
described directly but only through the use of metalanguage. If affective space is
made up of affective connotation, we need to find metalanguages which reliably
reference this kind of information.
Figure 5.1: Metasign. The actual sign R (signifier Rsr, signified Rsd) carries a connotation R′; the metasign M (signifier Msr, signified Msd) replaces the connoted signified and introduces its own connotation M′.
Consider a sign R, whose affective connotation R′ we wish to describe. A metalanguage replaces the connoted system's signified (R′sd) with a new sign M, which will be called metasign. To be well suited for our task, the meaning of the metasign – denotation and connotation – needs to be well known. For it to be part of an effective metalanguage, the metasign's denotation Msd needs to match the connoted system's signified R′sd as closely as possible. In addition, since we are introducing a new sign, we cannot avoid introducing new connotations (M′) at the same time.
This leads to the requirement that the connotations of the metalanguage need to fit
with its denotation. If denotation and connotation of the metasign do not match,
its meaning becomes ambiguous, resulting in a loss of expressiveness.
The term metalanguage is used quite liberally. Metalanguage, in the sense used
here, is a system of symbols for communication, not restricted to the use of linguistic entities such as words or sentences. While natural language does constitute a
powerful metalanguage, there are many other representations that have their own
advantages. In the following, I will give an overview of metalanguages for affective
connotation which appear to be feasible for use in interactive systems. Of course,
this list is in no way exhaustive.
5.1 Affective scales
Osgood et al. (1957) have shown that affective space can be represented well through
a three-dimensional model. Though the actual space must be much more complex
and involves an unknown number of dimensions, rating a concept along these dimensions gives a good indicator of its affective value. Therefore, visual representations of these factors are a very direct metalanguage for affective connotation.
Evaluative scales
The research of Osgood et al. has shown that the evaluative dimension carries the
greatest importance. It is thus no coincidence that evaluative ratings of content are
very common. Indeed, evaluative scales are ubiquitous and can be found in many
forms.
In its most popular form, the evaluative dimension is represented by ‘star-ratings’. Star classification systems have a long tradition. Hotels are given official
stars to indicate their quality standard. Nowadays, countless websites, e.g. media
stores, encourage their users to rate content on such scales. Another common form
of evaluative scales are scores. The scale may range from 0 (worst) to 100 (best) (e.g.
game reviews), 1 to 10 (e.g. performance ratings) or 5 to 1 (e.g. Austrian school and university grades), without changing the aspect of meaning covered.
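Because all of these scales cover the same evaluative dimension, they can be mapped onto a common range. A minimal sketch of such a normalisation (note that the Austrian grade scale, where 1 is best and 5 is worst, is handled simply by swapping the endpoints):

def normalise(value, worst, best):
    """Map an evaluative rating onto [0, 1], where 1 is the best possible rating.
    Works for ascending scales (0-100) and descending ones (grades 5-1) alike."""
    return (value - worst) / (best - worst)

print(normalise(85, worst=0, best=100))  # game review score  -> 0.85
print(normalise(7,  worst=1, best=10))   # performance rating -> 0.67
print(normalise(2,  worst=5, best=1))    # Austrian grade 2   -> 0.75
print(normalise(4,  worst=0, best=5))    # four of five stars -> 0.8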
The concept is easily understood and successful. Evaluative scales reduce the
complexity of meaning down to a single dimension and thus give the roughest of
indicators of whether a concept is worthwhile to consider. Used in this way, star
qualifications and scores express the same meaning as a semantic differential for
the pairing good-bad, which was determined by Osgood et al. as the pivot pair for
the evaluative factor. However, the simplicity of evaluative scales also means that
they are very limited in their reach.
Furthermore, the analysis of music in chapter 4.2 has shown that one needs
to distinguish between two forms of valence, evaluation and pleasure. Evaluative
scales do not describe the affect expressed by an information entity, of which pleasure is the most important dimension. In section 5.2, I describe a study about semantic scales and music, which shows that listeners can easily distinguish between
these two forms of valence.
While affective connotation is inherently subjective, evaluative scales are especially susceptible to subjectivity. Good and bad are not objective criteria and are
always dependent on a person’s taste, for one man’s meat may be another man’s
poison. Professional evaluative ratings, as in the example of hotel ratings, are therefore based on a common catalogue of criteria on which the ultimate decision is based. Teachers mark students on the basis of a list of course requirements. In the
case of user-provided evaluative data, meaningful results can only be expected if a
large number of ratings is compared statistically.
Evaluative scales are probably the most widely used affective space interface, but
are very limited in their expressiveness. They are purely reflective, not describing
the affect being expressed by an information entity but whether we like or dislike it.
General scales: Musicovery
The concept of scales can be applied to purely affective factors, pleasure, arousal
and dominance (PAD). Combined with evaluation, this gives a good indication of
an entity’s affective connotation. Since PAD are defined as orthogonal dimensions,
they can serve as a very direct visualisation of affective space.
There are not many cases in which the concept is being employed. One example
is the interactive webradio Musicovery¹. In addition to year of release and genre,
users can also select the desired mood of the music the webradio should play. This
is done via a two-dimensional representation of affective space; pleasure (defined
as dark-positive) and arousal (defined as calm-energetic). A screenshot is reproduced in figure 5.2.
Figure 5.2: Musicovery. Two-dimensional pleasure-arousal affective space.
The concept is not difficult to understand for users. Even if a user does not
immediately understand what the scales mean, one click automatically delivers
results which approximately match the selected position in affective space. From this result, users can determine if the mood being expressed ‘feels’ like what they were looking for or if the selected position needs to be adjusted.

¹ http://www.musicovery.com/, last accessed 2009-03-18.
Musicovery only allows browsing of content. Users cannot express whether the results
match what they associate with a selected position in affective space. The affective
meaning of songs has been determined beforehand. According to a press release²,
this is done via "40 musical descriptors", each of which can take one of 10 values. From these data, a two-dimensional position is calculated. It is not stated what the nature of these descriptors is – semantic differentials are most likely – or how
these descriptors are being rated – manually or algorithmically. As we have seen
in section 2.2, affective experience is subjective and differs between cultures and
even between single persons. A useful addition to the concept may thus be to allow
users to express their own opinion about the music they listen to, thus fine-tuning
the available data.
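Functionally, this kind of browsing amounts to a nearest-neighbour query in the two-dimensional space. The sketch below is an assumption about how such a lookup could work, with invented song positions; how Musicovery actually implements it is not public.

import math

# Invented (pleasure, arousal) positions in [-1, 1]; not Musicovery's data.
songs = {
    "Song A": ( 0.8,  0.7),
    "Song B": ( 0.6, -0.5),
    "Song C": (-0.7, -0.6),
    "Song D": (-0.5,  0.8),
}

def songs_near(pleasure, arousal, k=2):
    """Return the k songs closest to the clicked point in affective space."""
    ranked = sorted(songs, key=lambda s: math.dist(songs[s], (pleasure, arousal)))
    return ranked[:k]

# A click in the calm, positive quadrant should return calm, positive songs first.
print(songs_near(pleasure=0.5, arousal=-0.4))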
Though very helpful, there is also a downside to the concept of direct visual
representation of affective space. A selected point in this space is not expressive
on its own but only relative to the whole space. There is no actual visual
representation for affect; instead, the position of the selected point needs to be
shown in comparison to the whole space to become meaningful for users. This
is not a big problem for a browsing interface like the one of Musicovery, since the
space needs to be represented only once. The concept is unfeasible, however, for the
description of content where space is limited. A search for keywords or browsing
through categories delivers results which are usually heterogeneous in respect to
their affective connotation. For these situations, straightforward representations of
affective space cannot be used.
In Musicovery, the affective factors are defined by bipolar adjectives, which in
fact turns them into semantic differentials. The next section elaborates on the general use of semantic differentials.
5.2 Semantic scales
Osgood et al. devised the semantic differential as an instrument for the measurement of the affective connotation of words. However, the principle can also be
applied to other forms of stimuli.
For example, TV guides commonly rate films along several dimensions that
typically vary across films and film genres. The scales are provided in addition
to the textual description (denotation) and give viewers a quick indication of the
mood to be expected from the film (affective connotation). The example in figure
5.3 shows the five scales that are used by an Austrian TV guide. In the bottom row,
a 5-step evaluative scale gives a general indicator. In the upper row, four scales with
4 steps each – 0 to 3 dots – represent film-specific semantic dimensions. The scales are not indicated by antonymous adjectives but by nouns which denote the concept.

² http://musicovery.com/pressRelease/PressKitMUSICOVERY.doc, last accessed 2009-03-15.
Translated into English, they might be called: thrill, humour, challenge (in the
sense of challenging to the mind) and eroticism. These scales can be seen to be
equivalent to semantic differentials, the only difference being the way in which the
scale is defined. Instead of antonymous adjectives, a noun implies the semantic dimension. The scales could be easily translated into semantic differentials, while still
representing the same concepts: calm-thrilling, humourless-hilarious, simple-challenging and chaste-erotic.
Figure 5.3: A typical film rating box in a TV guide. Taken from tv-media.at
Four scales may not seem to be very specific in their meaning – after all, affective
space is made up of an unknown number of dimensions. However, 4 scales with
four steps each (as depicted in figure 5.3) already divide affective space up into
4⁴ = 256 subdivisions, which are already tailored towards the media type – being
vague in most regions, but specific where differences are important and most likely
to occur. When the vocabulary of the scales is selected well for the media type,
such an indication can easily be superior to, say, genre classifications. To give a
practical example, the work of both the Farrelly brothers and of Woody Allen is
frequently classified as “Comedy” and “Romance”, but whoever is familiar with
their films is aware of how different they are. Thus, on a humourless-hilarious
scale, they may achieve similar ratings, while scores on simple-challenging are
likely to differ considerably.
The affective picture drawn can be seen as a by-product of the scales. Since
every word has its place in affective space, ratings along specific semantic dimensions automatically express something about the affective value of the entity being described. This actually is an advantage rather than a disadvantage of scales.
The vocabulary can be tailored towards the media type to be most expressive and
discriminating between entities. The affective value of the scales, in turn, can be
determined beforehand. This can be achieved either by way of semantic differentials which have the defining nouns or adjectives as the object of investigation, or
automatically through opinion mining, as explained in chapter 2.3. Locating the
scales in affective space before actually using them solves the requirement that the
meaning of metasigns must be well-known and thus turns scales into an effective
metalanguage. The media-specific statements gathered from the scales can then be
used to locate entities in general affective dimensions – EPA, for example. It makes
scales for different media types compatible and comparable, thus making it possible to find affective similarities between entities from different media types.
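A minimal sketch of this projection step, assuming each scale has already been located in affective space: the EPA anchor coordinates below are hypothetical placeholders, and the rating-weighted average is just one possible way to combine them.

```python
# Sketch: projecting media-specific scale ratings onto general EPA dimensions.
# The EPA anchors for each scale are hypothetical; in practice they would be
# determined beforehand via semantic differentials or opinion mining.
SCALE_EPA = {
    "calm-thrilling":       (0.2, 0.9, 0.4),  # (evaluation, potency, activity)
    "humourless-hilarious": (0.8, 0.1, 0.6),
    "simple-challenging":   (0.3, 0.7, 0.1),
    "chaste-erotic":        (0.5, 0.4, 0.5),
}

def locate_in_epa(ratings):
    """Combine scale ratings (0..1) into one EPA coordinate.

    Each rating weights the EPA anchor of its scale; the result is the
    rating-weighted mean of the anchors.
    """
    total = sum(ratings.values()) or 1.0
    epa = [0.0, 0.0, 0.0]
    for scale, rating in ratings.items():
        anchor = SCALE_EPA[scale]
        for i in range(3):
            epa[i] += rating * anchor[i]
    return tuple(v / total for v in epa)

# A hypothetical film rated high on humour, low on thrill:
print(locate_in_epa({"calm-thrilling": 0.1, "humourless-hilarious": 0.9,
                     "simple-challenging": 0.7, "chaste-erotic": 0.2}))
```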
The translation problem
Speakers of German will likely agree that the presented translations for the film
scales from the TV guide do not exactly match the overall tone of what the original
scales express; their denotations might be the same, but their affective connotations
are not. This leads to one of the major problems which scales as metalanguage
exhibit, the problem of translation. Speakers of multiple languages are very aware
of how difficult it can be to catch the right tone in a translation. While it is possible
in many cases to find a direct denotative equivalent across languages, an affective
equivalent is much harder to achieve. For more abstract concepts, even the denotative equivalent might be missing. Instead, when there is no perfect one-to-one
translation, one can employ a phrase or sentence to achieve a good overall match.
The translation problem shows how important it is to be aware of the affective
connotation which is introduced alongside a metasign. It influences the overall
meaning of a concept, and thus, it cannot be assured that speakers of different
languages are rating the same affective dimension. This makes it very hard to compare results acquired from scales across languages; it is far from certain whether observed differences are due to cultural differences or simply because subjects were rating different concepts.
Study: scales and music
In an undergraduate study of mine (Spindler, 2006), scales were applied to music.
30 pieces of western popular music were selected to cover a wide range of musical styles. 15 scales were selected to be well suited to draw an affective picture
of music. Scales were designed as semantic differentials, using German-language
antonymous terms at both ends, and further explained by brief descriptions. In
addition, a familiar star-rating was supplied to draw comparisons with evaluation.
The experiment was designed as a website, with test subjects registering to allow
identification across multiple sessions. The music player was embedded in the
website, which assured that subjects were only rating the song they were currently
listening to. At the end of the rating process, subjects filled in a questionnaire about
their experiences. For each scale, they stated how well they understood the meaning
of the scale.
There was a slightly positive correlation between evaluation and ratings for
other scales; subjects were likely to give a higher rating on specific scales when
they liked the song. Interestingly, there was almost no correlation between the scale
desperate-happy, which tested for pleasure, and the evaluative scale (Pearson correlation of r = 0.1). This result is in line with the dissociation theory put forward by
Schubert (see section 4.2) and implies two forms of valence. The only scale which
showed a slightly negative correlation with evaluation was the scale unsensibel-schnulzig (which roughly translates into insensitive-soppy). A possible explanation would be that the defining words for this scale themselves generally score
low on evaluation. If one favours a piece of music, one is unlikely to link it with
words that describe an undesired concept. Vice versa, a piece of music really disliked may also be described with unfavourable words. This is another indicator that
the affective connotation of metasigns needs to be taken into account.
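The reported relationship is an ordinary Pearson correlation between two rating series; a sketch with invented ratings, not the study data:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two rating series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented example: star ratings vs. desperate-happy ratings for a few songs.
evaluation      = [5, 2, 4, 3, 1, 4]
desperate_happy = [2, 4, 3, 5, 3, 2]
print(round(pearson_r(evaluation, desperate_happy), 2))
```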
Subjects achieved reasonable agreement on most scales. In most cases, there was
a distinct tendency towards one value on each scale, the distribution of ratings often
approximating a Gaussian distribution. Taking the data from the questionnaire into
account, subjects reached better agreement on scales whose meaning was very clear.
5.3 Affect words in natural language
Another choice for a metalanguage is natural language. In this case, we use language – words, sentences or longer passages of text – to denote the affective value
which is connoted by an entity.
The most expressive option is free text. Here, the message can be as subtle as
our writing capabilities are. Writers – especially poets – excel here; the imagery of
poetry is very expressive of affect. However, this option is not feasible as a helpful
metalanguage. Free text is probably as complicated to analyse as the information
entity we wish to describe. In this case, we just replace an image, a piece of music,
or an essay with another piece of text, without getting closer to our goal. Thus,
free text does not fulfil the requirement that the metalanguage must employ signs
whose full meaning is clear and unambiguous.
More suitable are affect words, i.e. words which denote affect. For instance,
this includes the names most frequently employed for basic emotions: anger, fear,
surprise, disgust, joy. Of course, there are countless other words which would fit
in this list equally well. In this case, the complexity of text is reduced to single signs
with well-known meaning, which fulfils the requirements laid out. An advantage of
affect words is that denotation and affective connotation match by definition; they
are, in a way, affect in a very pure form.
To become a feasible metalanguage for use in interactive systems, affect-laden
words must be used in an implementation which allows us to describe entities with
affect words. On the web, this implementation can be found in tags.
Tags
Tags and keywords are a very direct form of natural language descriptions. Libraries have used keywords for a long time to classify content without having to
rely on the title of a work. This classification would be made according to a controlled vocabulary. Clearly disambiguating the meaning of the keywords ensured
that keywords applied by different librarians would be comparable.
Tags are keywords too, but they neither follow a controlled vocabulary nor are they created by professionals. Usually applied on the web, tags let users pick arbitrary words to describe content. This approach is particularly well suited for the web. Inherently
open-ended, tags introduce a flat, non-hierarchical form of structure not captured
by planned approaches.
Their biggest strength might lie in their ability to adapt. Language continually
changes, inventing and coining new words for concepts which cannot be described
yet. And as soon as there is a word to denote a concept, it can become a tag. In
comparison, controlled vocabularies are slow, and, while being well-designed for a
particular task, do not adapt well to changing requirements. This phenomenon has
been compared to desire lines (Mathes, 2004), a concept from city planning. When
people frequently diverge from paved walking paths and use shortcuts through
grass, trails become visible over time. Desire lines thus indicate the routes people
want to take, which is something tags excel at.
At the same time, the ability to adapt is also a problem of tags. Words frequently are homonymous, so one tag might reference very different concepts. Also, the choice between synonyms is down to taste; some users prefer one word over another for the same concept. Spelling variants and mistakes, as well as the decision between singular and plural form, introduce distinct tags which should actually refer to the same concept. And, in many cases, tags are not applied thoughtfully by users. For instance, title words are often just reproduced as tags word by word, leading to not very helpful tags like “the”. In the face of so many arbitrary
choices, librarians may shudder and praise controlled vocabularies. To counter
these problems, input dialogs for tags usually support the user in their choice of
words by suggesting the choices of other users. Suggesting a few tags from which
the user can pick conforms with the principle “answers first, then questions”.
Even if this principle is not applied, when tags are used for a while, a relatively
stable vocabulary of frequently used tags emerges. This concept is sometimes referred to as folksonomy, a portmanteau of folk and taxonomy (Mathes, 2004, Spiteri,
2007). Unlike taxonomies, tags speak, in a way, the language of people themselves
– their desire lines – and are therefore more likely to be accepted and understood
by fellow users and can prove helpful in search of information.
Examples of how tags can be used successfully are social bookmarking services like delicious3, where users store their bookmarks online, along with tags to
describe the bookmarked website. This allows the user to find the bookmark again
without having to remember the title of the website, only the concept it represents.
Additionally, tags are aggregated across users, making it possible to find websites about a
particular topic which other users have deemed worthwhile. In this way, tags also
function as a recommender service.
3 http://www.delicious.com, last accessed 2009-03-18.
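A sketch of the aggregation step such services perform, under the simplifying assumption that each user's tags for an item are available as plain lists; the user names and tags are invented:

```python
from collections import Counter

# Invented example data: each user's tags for the same bookmarked website.
tags_by_user = {
    "alice": ["python", "tutorial", "programming"],
    "bob":   ["python", "howto"],
    "carol": ["programming", "python", "reference"],
}

# Aggregate across users; the most frequent tags describe the item, and summed
# over many items they form a folksonomy of top tags.
counts = Counter(tag for tags in tags_by_user.values() for tag in tags)
print(counts.most_common(3))   # [('python', 3), ('programming', 2), ...]
```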
Another example is the plot keywords feature on IMDB4. In this case, users name elements
which occur in the plot of a film.
In tags, the various aspects of meaning may all be covered, for tags are as expressive as natural language is. In the case of IMDB’s plot keywords, they usually
provide information about an item’s denotation, the film itself. Tags on delicious,
on the other hand, mostly group content by topics and thus target conceptual connotation. And, through the use of affect words, tags may also describe affective
connotation.
Case study: tags as affect metalanguage for music
Chapter 4 gave an overview of the intrinsic relationship between music and
emotion. A recurring notion throughout the literature about this topic is that emotional experience and emotional needs are important reasons for humans to engage
in music, be it creating, playing or listening to music (e.g. Scherer and Zentner,
2001). This is also confirmed in studies where people are asked about the reasons
for engaging in music (e.g. Boal-Palheiros and Hargreaves, 2001, Tarrant et al., 2000).
Since tags seemingly represent, as explained before, the desire lines of users, a look at folksonomies for music should mirror the importance of emotions in music.
Since August 2005, the social music platform last.fm5 has allowed users to describe
music through tags. As is the case with delicious, the individual tags are aggregated. The “top tags” page6 lists the 150 most frequently applied tags in a “tag
cloud”, which constitute a folksonomy created by the last.fm users. In line with
folksonomy theory, the vocabulary is quite stable; in the period from October 2007
to January 2009, only 7 out of the 150 top tags have made way for new ones, and
those that have vanished or have been added are only found among those less frequently used (indicated by the smallest font size). The list of top tags from January
20, 2009 (see figure 5.4) was analysed in detail.
Out of the 150 tags, about two thirds clearly represent a musical genre. Genres
are a common way to classify music. According to Moore (2001), the term is similar,
but not identical, to style and has no definition universally agreed upon. It might
refer to conventions in instrumentation, composition, the way of playing, or the effect to be achieved by the music. Genre labels are usually applied by media or listeners, and it is not uncommon for musicians to be unhappy with being confined to a genre (e.g. Zorn, 1999). However, the popularity of genre names as tags might suggest that the concept is really helpful to listeners and reflects how we classify music. An advantage of
using tags instead of a controlled vocabulary for genre descriptions lies, again, in
4 http://www.imdb.com, last accessed 2009-03-18.
5 http://last.fm, last accessed 2009-03-18.
6 http://last.fm/charts/toptags
Figure 5.4: last.fm top tags from January 20, 2009
their adaptability. Pieces of music are not restricted to one category since multiple
tags can be applied, and new musical currents may be embraced quickly.
It is possible to infer affective information from some genres. For instance,
‘death metal’ or ‘punk’ may imply that the music expresses some form of ‘anger’.
Generally, though, genre classifications seem to cover conceptual connotation, with
some hints towards denotation and affective connotation. The ongoing work of
the music information retrieval community towards automatic genre classification
(e.g. Scaringella et al., 2006, McKay and Fujinaga, 2004) suggests that genres imply
enough distinguishing musical features to be captured algorithmically.
conceptual connotation
  instrumentation: acoustic, female, female vocalist, female vocalists, guitar, male vocalists, piano
  origin: american, canadian, finnish, french, german, latin, polish, russian, swedish, uk
  time: 00s, 60s, 70s, 90s
  personal: albums i own, seen live
  miscellaneous: atmospheric, classic, cover, experimental
affective connotation
  evaluative: awesome, cool, favorite, favorites, favorite songs, favourite, favourites
  affective: beautiful, fun, love, melancholy, mellow, sad, sexy

Table 5.1: Classification of last.fm top tags which do not describe genres
The remaining 43 tags may be classified according to table 5.1. I have tried to
categorise the tags according to the kind of meaning covered. One can see that tags
are used for a wide range of concepts; they may describe a song’s instrumentation,
its origin and release date. ‘albums i own’ and ‘seen live’ are simply personal
reminders, which do not describe the song in any way. All the classes of tags
named so far describe attributes that are implied through the song, thus covering
conceptual connotation.
For some tags, my classification is definitely debatable. For instance, classic seems to imply positive evaluation, experimental may be seen by some as a genre
(in the sense that it is defining for experimental music to cross traditional genre
boundaries), and love is also denotative, as the tag might refer to the lyrics of a
song. So for some, this list might look a little different.
The general notion is, however, that tags which describe affective connotation
are not very common. 14 out of 150 tags, or around 9%, carry significant affective meaning, half of which are evaluative terms which seem to be used
as personal reminders (“favourite” in 5 different forms) and may thus belong to
the same category of conceptual connotation as ‘albums i own’ and ‘seen live’.
The remaining 7 affective tags which are not strictly evaluative still are strongly
valenced; 5 are positive, 2 are negative. As suggested in chapter 4.2, sad music
can be favourable. Sadness is a very negatively evaluated emotion if genuinely
experienced, but music which expresses negative emotions can still be positively
evaluated. Full-fledged emotions like ‘anger’ or ‘fear’ are not present in this list at
all.
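The proportions reported above follow from a simple count over the classified tags; a sketch restating the calculation, with the tag lists taken from the classification in table 5.1:

```python
# Sketch of the share computation: 150 top tags, of which the following were
# classified as carrying significant affective meaning (cf. table 5.1).
TOP_TAG_COUNT = 150
evaluative = ["awesome", "cool", "favorite", "favorites",
              "favorite songs", "favourite", "favourites"]
affective  = ["beautiful", "fun", "love", "mellow", "sexy",   # positive
              "melancholy", "sad"]                            # negative

affect_laden = len(evaluative) + len(affective)
print(affect_laden, f"{affect_laden / TOP_TAG_COUNT:.0%}")    # 14, ~9%
```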
Returning to the initial premise of this analysis, there seems to be a striking gap
between what musicologists and listeners alike believe is important about music,
and the ways in which listeners express their musical needs in verbal terms.
Discussion
I can only address the discrepancy uncovered by this analysis with a hypothesis.
Although language offers the words we might be looking for, it may not be very
intuitive to express affect through words. The relation between emotions and music
is intrinsic, but also complicated (see 4.2). While moods and emotions are expressed
by music and listeners pick music for its affective value, full-fledged emotions are
not to be expected. On the contrary, music can be incredibly subtle in its affective
meaning, and statements like fearful or disgusting are not the words that spring
to mind when we think about music. Moreover, as suggested by Schubert (2001),
actual emotions tend to be short experiences and are unlikely to be constant over
the duration of a song. A tag may only summarise the mood of a song, not the
emotions expressed.
My results are in line with the study of Bainbridge et al. (2003), who analysed
music-related queries posted to Google Answers. In their study, only 2.4% of music
queries dealt with affective terms. Also, the phenomenon is not restricted to music.
For example, Osgood et al. (1957, p. 19) note a study in which subjects were asked
to rate ice cream. Presented with a preselected vocabulary on semantic differentials,
subjects could give a much wider range of confident judgements than when they
were asked to describe the experience in their own words.
Affect is all about personal experience. Poets are experts in producing words
which capture an emotional state, but for most of us, it is not easy to find the right
words. One way to address this would be to follow the “answers first, then questions” principle. An interactive system could suggest a number of tags that describe
moods which are likely to be communicated by music. Language is unmatched in
its power to express thoughts and to describe actual content, or denotation. However, the inherently non-verbal character of affect suggests to me that there may be
other ways to describe the affective value of entities, which do not rely on words
and are more suitable for the task.
5.4 Colours
There is a considerable body of evidence that colours have a strong connection with
emotions. If this relationship can be formalised in a consistent and predictable
manner, colours would be a promising candidate for an affect metalanguage.
Colours can be fully described in three-dimensional colour space models, which
are tailored towards different purposes and can be translated into each other. Commonly, studies about colour and emotion employ the device-independent CIELAB space, which is designed to be perceptually uniform. Human colour perception is nonlinear; CIELAB is compressed so that equal distances in the model correspond to equal visually observable differences, regardless of the colour region. The three dimensions in this
model are L* or lightness, and two chromatic dimensions a* and b*. The chromatic
dimensions can be reformulated as C* (chromaticity, similar to saturation) and h
(hue angle). Some studies are directly based on an HSB space, in which colours are
described by hue, saturation and brightness. While such a model is not corrected
for human perception, it is very intuitive to understand for humans.
The ability to describe colours through three independent factors makes them
well suited for dimensional emotion theories. Nearly all studies consulted are
searching for formulaic mappings between colour space dimensions and hypothesised affective dimensions.
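For illustration, the reformulation of the chromatic dimensions mentioned above is a plain coordinate change from a* and b* to chroma C* and hue angle h; a minimal sketch:

```python
import math

def lab_to_lch(L, a, b):
    """Convert CIELAB (L*, a*, b*) to lightness, chroma C* and hue angle h."""
    C = math.hypot(a, b)                      # chroma, similar to saturation
    h = math.degrees(math.atan2(b, a)) % 360  # hue angle in degrees
    return L, C, h

# A saturated orange-red, roughly:
print(lab_to_lch(54.0, 57.0, 47.0))
```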
Valdez and Mehrabian (1994) based their study on the PAD (Pleasure-Arousal-Dominance) emotion model, which is declared to be equivalent to EPA (Evaluation-Activity-Potency) (see section 3.3). They used semantic differentials for adjective
pairs, whose positions in affective space had been determined beforehand and
which were recognised to be stable on two factors and heavily dependent on the
remaining factor. 76 colours were selected from the Munsell colour system and described in the HSB colour space. Their study was conducted on American university
students.
They found that all three PAD factors are strongly influenced by brightness
and saturation. Pleasure increases with both saturation and brightness. Arousal
was strongly linearly dependent on saturation, but also influenced by brightness –
high for low values, low for intermediate values and medium for high brightness.
Dominance also increased linearly with saturation, but less strongly than arousal,
and decreased with brightness. They also found a relationship between hue and
pleasure. Blue was found to be most pleasant, and yellow to be least pleasant,
and other colours resulted in intermediate levels of pleasure. However, compared
to brightness and saturation, changes in hue did not result in such clear affective
differences. For arousal and dominance, the effects of hue were reported to be weak.
Ou et al. (2004) did not operate from a predetermined emotion model. Instead,
they applied their own factor analysis to derive the major factors underlying colour
emotions. However, they used 10 antonymous adjective pairs like in semantic differentials, which were picked to be highly sensitive to one of the major affective
dimensions. Subjects were only given binary choices between the adjectives with
no intermediate values, and for each pair, answers were summed up to derive an
overall score. Their study used 20 colours picked from the CIELAB colour space
and was conducted with British and Chinese subjects. While cross-cultural agreement was very high overall, there were significant differences on two scales, tense-relaxed and like-dislike, so these scales were excluded from the factor analysis.
Another scale, warm-cool, was recognised as independent from other scales and
also excluded. Although factor analysis would have proven the independence of
the scale, Ou et al. opted to anticipate this result.
From the data of the remaining seven scales, they extracted two major factors,
colour activity and colour weight. A third factor, colour heat, was added from the
data of the warm-cool scale. They arrive at similar conclusions as Valdez and
Mehrabian; activity and weight depend on lightness and chromaticity. The only
factor influenced by hue was heat, with a hue of 50º (red to red-orange) being the
warmest. Ou et al. did not include a factor for valence, which will be discussed
below. They also compare their results with other Asian studies, which derived
the same factors activity, weight and heat, indicating that these factors capture something basic about colour and emotions.
Suk (2006) used the SAM (Self Assessment Manikin), a graphical representation
of the general affective factors PAD, to measure affective responses to colours. Experiments were conducted with German and South Korean subjects. Again, the
results were similar. Chromaticity was positively correlated with all three dimensions. Lightness did not have such a strong influence. The effects of changes in hue
were weak, but again, blue was rated as the most pleasant colour. Cross-cultural
reactions were similar, with South Korean subjects rating blue as more pleasant and
red as more arousing than German subjects.
Discussion
In summary, all of the consulted studies come to very similar conclusions. Colour
saturation or chromaticity has the strongest influence on the affective perception of
colours; while all factors depend on chroma to some extent, the connection with
activity is strongest. Valdez and Mehrabian (1994) hypothesise that a physiological
reason for this influence might be that photoreceptors are stimulated more strongly
by highly saturated colours.
Lightness (or brightness) also influences all factors. The strongest influence is
an inverse correlation with the perception of dominance, which seems to be in line
with the intuitive notion that bright colours are also perceived as light, and vice
versa.
In comparison, the influence of hue is much weaker, at least in regard to activity
and dominance. Several studies note that the common perception of red as being
highly arousing actually stems from the high chromaticity of common examples
of red, while green is commonly seen as soothing because the selected sample is
usually not very saturated (D’Andrade and Egan, 1974).
However, hue fully describes if a colour is perceived as warm or cool. Ou et al.
(2004) regard warm-cool as a separate dimension of colour perception. It is a
pervasive idea that colours can be warm and cool, both among artists and other
people. This is manifested in the conventionalised labels of water-taps; hot water is
red, and cold water is blue. The study confirms this intuitive notion. This suggests
to me a possible application of colours in affective space interfaces; colours can
reliably represent the special semantic dimension warm-cool or hot-cold.
Hue also influences whether a colour is liked or disliked by subjects. In all studies, blue was perceived as the most pleasant hue, while yellow was consistently regarded as least pleasant. However, valence ratings seem to be culturally dependent;
in the study of Ou et al. (2004), ratings differed so much between cultures that they
decided to exclude like-dislike from the factor analysis.
Colour symbolism
One potential problem with the use of colours as a metalanguage is their cultural dependency. Colour symbolism refers to the use of colours as symbols in
culture. Kaya and Epps (2004) note that the emotions we associate with colours are
highly dependent on personal preferences and previous experiences.
Suk (2006) speaks of colour semantics and apparently refers to the denotative
function which colours can perform. She gives several examples. For instance,
green has a special meaning in Islam, being “associated with luxurious green mead-
ows and trees, and it symbolizes paradise for those who live in barren land” (p.
33), an importance which is mirrored by the use of green in the national flags of
many Islamic countries. In traditional Chinese architecture, specific colours are associated with each of the four directions of the compass. In Western cultures, death
is symbolised by black, while in some Asian cultures, it is associated with white.
Suk tries to avoid such associations by showing just the colours and measuring the
immediate reaction of subjects.
Emotionally}Vague7 was a web-based study about colours and emotions. While
the other studies tried to determine the underlying dimensions of colour affect, this
study simply asked subjects which colours they associate with basic emotions. 250
subjects from many different countries participated in the study. The results are
reproduced in figure 5.5.
Figure 5.5: Colour association with basic emotions (joy, love, anger, sadness, fear). Taken from Emotionally}Vague.
The results mirror some of the notions of the dimensional studies. Highly arousing or active emotions are associated with saturated colours and brighter shades.
Apparently, lightness is also connected with valence; the two positive emotions are
associated with brighter colours than the negative ones. Anger, being negatively
valenced but highly arousing, consequently is associated with highly saturated red
but also black. Also note that hue has an influence; both sadness and fear are, if
at all, associated with cool colours, while the positive emotions are represented by
warm colours. Though there are parallels, the proposed formulas of the dimensional studies would not predict these results.
One possible explanation for these deviations is the influence of colour symbolism. Emotions are very frequent experiences in our daily lives, and specific colours
have become associated with these states. As figure 5.5 shows, bright yellow stands
7 http://www.emotionallyvague.com, last accessed 2009-03-18.
for joy, possibly because it is how we perceive the sun. Red and pink stand for love,
but red also symbolises anger.
The most striking deviation concerns the influence of hue. In all dimensional
studies, blue was perceived as the most pleasant colour, while yellow was consistently regarded as the least pleasant. However, as figure 5.5 clearly shows,
yellow is by far the most common association for joy. Blue, on the other hand, most
commonly occurs with sadness and only comes in sixth for joy. In the English language, to feel blue is synonymous with being depressed or sad. According to the
OED, this use of blue came into common use in the 19th century. It is not clear if
the association of blue and sadness is a result of this meaning of the word blue,
or if the meaning came into use because the colour does reference sadness in some
biological way that predates language.
To hypothesise, this seems to be another case where two layers of valence become apparent. If asked which colours we prefer, blue is the most frequent answer.
Thus, the affective connotation of the colour blue has positive valence. If used as a
signifier, however, blue can denote an emotional state with very negative valence.
Yellow, on the other hand, can be a signifier for the emotional state of joy in the
same way that the word joy is. If people are to choose their favourite colour among
a selection, yellow is not preferred; thus, the affective connotation of the colour yellow itself has negative valence. However, yellow is associated with joy, which is a
very positive emotion. The principle is illustrated in figure 5.6.
Figure 5.6: Hypothesised denotative functions for yellow and blue.
Since experimental results for the valence of blue and yellow differ clearly from
the symbolic use of these colours, it seems plausible that the results obtained in the dimensional studies actually measured the emotions of colours themselves, and not their
symbolic meaning, as it was intended by Suk. In the case of arousal, there seems
to be no difference between how people feel about colours on their own (colour
emotions), and when they are asked about the feelings they associate with colours
(colour symbolism).
Examples
There are a few examples in which the connection between colours and emotion is
employed.
Chen and Yen (2008) describe a video player which allows manual annotation of
videos with emotions. Passages of a video can be tagged with an emotion which a
user believes to be expressed. The emotions are represented by colours: angry (red),
fear (green), sad (blue), happy (yellow). Passages can also be tagged as neutral
(beige). The selection of colours mostly conforms with colour symbolism.
de Melo and Paiva (2007) describe an embodied conversational agent. They
want to support the agent’s expressiveness by adding colourful light and shadows
to its environment. The colours used are not described in detail, but they use red
light to convey anger and a black & white filter to express despair.
Cymbolism8 is an interesting project which aims to create a dictionary of word-colour associations. Visitors are presented with a random word along with a description of its denotation. The visitor can then pick one out of 19 colours to express
which colour he or she associates with this word. New words can be suggested by
visitors. The website is designed as a tool for designers to support them in the task
of picking a colour which goes well with a certain word.
Moody
Music is strongly tied to emotions, and emotions are somehow connected with
colours. The software Moody9 builds upon this mediated relation of music and
colours. It is a plugin for the iTunes media player and lets users tag their personal
music collections with colours. Afterwards, the user can select a desired range of
moods (colours), and the software only plays pieces of music tagged accordingly.
Users can also share their colour tags online.
Figure 5.7 shows a screenshot of the interface. The colours are mapped according to a two-dimensional representation of affective space. Each dimension is
represented as a 4-step scale, resulting in 16 possible colours. The horizontal axis
is defined as sad-happy, representing pleasure, and the vertical axis is defined as
calm-intense, representing arousal (see figure 5.7 on the right).
The colours for the four extremes of this affective space are: blue for negative
pleasure and low arousal (−P − A), red for negative pleasure and high arousal
(−P + A), green (+P − A) and yellow (+P + A). This selection has been clearly made
for the symbolic meaning of these colours; yellow for joy, red for anger, and blue
for sadness. The emotions associated with these colours occupy the same positions
in a two-dimensional affective space: sadness (−P − A), anger (−P + A) and joy
(+P + A). The intermediate colours are blendings between the corner colours. In a
L*a*b* colour space, yellow-blue constitute axis b*, and red-green constitute axis a*.
This makes green a logical choice for the fourth colour.
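The 4×4 grid can be thought of as a bilinear blend of the four corner colours along the two affective axes; the sketch below uses nominal RGB stand-ins for the corners, since Moody's exact colour values are not reproduced here:

```python
# Sketch: a Moody-like 4x4 colour grid by bilinear blending of corner colours.
# Corners are nominal RGB stand-ins, not Moody's actual palette.
BLUE   = (0,   0, 255)   # -pleasure, -arousal
RED    = (255, 0,   0)   # -pleasure, +arousal
GREEN  = (0, 200,   0)   # +pleasure, -arousal
YELLOW = (255, 220, 0)   # +pleasure, +arousal

def lerp(c1, c2, t):
    """Linear interpolation between two RGB colours."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))

def moody_colour(pleasure, arousal, steps=4):
    """Return the blended colour for a grid cell (0..steps-1 on each axis)."""
    px, ax = pleasure / (steps - 1), arousal / (steps - 1)
    bottom = lerp(BLUE, GREEN, px)    # low-arousal edge
    top    = lerp(RED, YELLOW, px)    # high-arousal edge
    return lerp(bottom, top, ax)

grid = [[moody_colour(p, a) for p in range(4)] for a in range(4)]
print(grid[0][0], grid[3][3])   # blue corner, yellow corner
```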
Figure 5.8 shows the 16 colours of Moody in a chromaticity diagram (a two-dimensional diagram with a* as abscissa and b* as ordinate). The numbers in the
8 http://www.cymbolism.com/, last accessed 2009-03-18.
9 http://www.crayonroom.com/moody.php, last accessed 2009-03-18.
Figure 5.7: Moody: Two-dimensional pleasure-arousal space using a colour metaphor
colour spots represent the colour’s lightness value. The colours are distributed well
in a*b* space. The colours represent two affective dimensions, valence and arousal;
their orientations in a*b* space are also indicated. In this implementation, the affective dimensions lie diagonally in a*b* space.
There are some parallels that can be drawn between the empirically derived
formulas from the cited studies and the implementation in Moody. For the most
part, chromaticity is positively correlated with valence and arousal. The only exception is blue, which represents lowest valence and lowest arousal, but is not the
colour with lowest saturation. Apart from that, there is a clear tendency that colours
farther from the centre represent higher valence or arousal. Like in the empirical
studies, lightness is also positively correlated with both valence and arousal. The
biggest difference to pure colour emotions as inferred in the empirical studies is
the influence of hue. In Moody, hue differences are an important differentiation
between represented affective states, resulting in almost equidistant distribution of
the colours in a*b* space.
Moody is an interesting example of an affective space interface. Affective space
is reduced to two dimensions, with only four steps on each dimension, resulting in
16 affective states that can be expressed. Arguably, this leads to only a rough indicator. On the other hand, affective experiences are subjective and fuzzy, and giving
only few options makes it easier for users to choose between these alternatives. The
use of colours as metaphor also befits the fuzzy character of the affective value of
music. Colours are not very specific, and are, possibly, more easily associated with
affective experiences induced or expressed by music than other descriptors. Furthermore, colours have the advantage that an affective state can be expressed on
very little space; it is not necessary to reproduce the coordinate system to be able to
locate a colour’s position.
Colour choice in Moody is symbolic. Colour symbolism fits well with the use of colours as signifiers for the affective connotation of media. As explained before, for
Figure 5.8: Moody: a*b* chromaticity diagram showing the colours’ positions.
arousal, there seems to be no difference between colour symbolism and colour emotions;
chromaticity and arousal are positively correlated. In the case of valence, it is not
intended to describe how favourable the song is, but how favourable the emotions
are that are expressed in the song; valence is understood as a happy-sad scale. The
situation might be different if the other form of valence, like-dislike, were to be
expressed. In this case, colours which are favourable and unfavourable should be
picked. Of course, star ratings provide a much more familiar and unambiguous
scale for this type of valence.
5.5 Facial expressions
Research on human emotions is closely tied to the study of facial expressions, which are frequently seen as the primary way for humans to communicate their emotional state
(Ekman, 1999b, Buck, 1984). The relationship seems to be so intrinsic that it is hard
to write about emotions without explaining facial expressions, and vice versa. In my
case, the topics were divided up to emphasise the role facial expressions can play in
affective space interfaces, being one of several metalanguages to choose from. However, because emotion theories commonly explain both phenomena, several notions
from chapter 3 will reoccur here.
Darwin (1872) pioneered the view that facial expressions have developed through evolution. He was the first to undertake cross-cultural research in this area and
concluded that certain facial expressions are universal, i.e. exhibited and
understood by humans regardless of cultural background. He also showed that
some emotion-related facial expressions are also exhibited by animals, and that even
infants have the capability of recognising emotional states from facial expressions
in other humans.
Facial expression research only became relevant again in the second half of the twentieth century. Since then, many of his findings have been confirmed by newer
research. Buck (1984) explains the communicative function of emotions through
a communication metaphor: Emotional states are encoded in a facial expression.
Such facial expressions are recognised by other humans via sensory input and are
subsequently decoded to infer the emotional state of the person who exhibited the
facial expression.
Perception: categorical vs. dimensional
Being intrinsically tied to emotions, facial expressions are described with the
same models; there are proponents of a dimensional approach and a categorical
approach. For instance, Schlosberg’s early dimensional theories of emotion (1952,
1954) were actually attempts to describe facial expressions. In fact, much of what is
known about emotions has been derived from studies of facial expressions.
Perhaps the most prominent proponent of the connection between emotions
and universal facial expressions is Paul Ekman. His cross-cultural research (Ekman
et al., 1972), from which he inferred the universality of six basic emotions, shaped
the idea that emotions are perceived categorically. His views have been challenged
numerous times. Ortony and Turner (1990, p. 320) state that for Ekman, states
which do not result in distinct facial expressions are not actual emotions. This
notion is not entirely true, though. Ekman and Friesen (1975) acknowledge the idea
that basic emotions are in fact emotion families, which are emotions that are similar
in regard to physiological activity and expressive behaviour. This notion renders
the distinction between categorical and dimensional models somewhat obsolete; the
useful observation of dimensional models that emotions vary in their similarity is
acknowledged by proponents of categorical explanations.
Moreover, several studies have confirmed Ekman’s notions of universality and
categorical perception of emotions. For instance, Etcoff and Magee (1992) conducted
a study for which they produced series of drawings which blend from one emotion
to another. Their results showed clear perception boundaries around the pictures
which expressed two emotions in equal amounts, and subjects hardly ever used
two emotion terms to describe a face although encouraged to do so. The only
exception which did not seem to be perceived as a separate category was surprise.
This leads them to suggest that surprise may actually be a cognitive state which can
co-occur with true emotions.
Young et al. (1997) conducted a study which tried to determine whether facial expression perception is better explained by a dimensional or a categorical model. They too
prepared images which show mixtures of two emotions; in this case, morphs of real
photographs were used. Again, subjects picked one of the basic emotions in most
cases. In a second experiment, they produced blends with a neutral expression
and again noticed a perception boundary. They also note that distance from pure
emotion had an influence on reaction time, and in some cases, subjects accurately
judged the second emotion which had been blended in to produce the morph.
They draw the conclusion that facial expression perception is categorical, not
dimensional. In two-dimensional models like the circumplex model (Russell, 1980),
emotions build antonymous pairs. For instance, happiness is seen as the opposite of
sadness. Therefore, a mixture of these emotions should give a neutral expression.
Since their subjects did not judge such mixtures as neutral, they conclude that a
two-dimensional model cannot be accurate. Instead, emotions displayed by facial
expressions must lie in a higher-dimensional space, in which distinct emotions are
connected without passing through the neutral state. “Dimensions such as pleasant-unpleasant thus correspond to intellectual, not perceptual constructs.” (Young et al.,
1997, p. 309) It should be remembered, however, that a three-dimensional model
(e.g. Mehrabian, 1996, Osgood, 1976) can reliably distinguish basic emotions.
This result is very much in line with the notion that affective space is indeed
multidimensional. Basic human emotions occupy distinct points in this space, but
this is not an exhaustive description. However, given the importance emotions play
in our lives, facial expressions capture very important affective dimensions, even
in their most basic form. Of course, the message conveyed by facial expressions
is usually much more complex. Actors are capable of consciously communicating
much more through their facial expressions, while in our daily lives, we do so
without actually noticing.
Display rules
Though the biological basis for emotions and facial expressions is most likely identical all over the world, actually exhibited facial expressions differ considerably due
to cultural differences. Ekman and Friesen (1975) coined the term display rules,
which influence how we express our emotional states. These rules are specific to
a culture and are internalised over a lifetime. Humans learn these rules either by
observation of other people in the same situation, or by being told what is appropriate and what is not in the face of a situation. A good example is a funeral (noted
by Buck, 1984 and Ekman and Friesen, 1975), where it would be inappropriate to
show happiness even if one feels so inside. Also, if the secretary of a deceased businessman were to show more sadness than the widow, it might suggest that their
relationship was more than only work-related.
For Buck (1984), it is important to distinguish between spontaneous communication and symbolic communication. Spontaneous communication via facial expressions
is a result of the emotional state we are experiencing. To employ his example, it is
a sign for one’s emotional state in the same way that dark clouds signal impending
rain. This is the kind of facial expression which can also be observed in animals.
In humans, spontaneous facial expressions are still regulated by display rules. The
facial expressions which result when we feel basic emotions fall into this category.
On the other hand, symbolic communication is unique to humans and culturally
dependent. We can choose to display a very wide array of facial expressions. This
may be done for humorous reasons to mark an utterance as a joke, or for deceit
to make others believe that we feel in a different way than we really do. Symbolic
facial expressions may be so frequently used that they become habits, at which
point we are hardly aware anymore that we display them.
The difference becomes clearly visible when we compare genuine, spontaneous
smiles and smiles which we choose to display. Both involve contraction of zygomaticus major, which is the facial muscle that pulls the lips upwards and apart in
the way characteristic of smiles. We are aware that the smile is the facial movement which signals happiness and friendliness and can smile voluntarily. However,
genuine smiles also involve contraction of orbicularis oculi, a muscle surrounding
the eye. Contraction of this muscle results in slightly closed eyes and wrinkles next
to the eyes. While it is possible to make this movement voluntarily, this is usually
not done in a deliberate smile. This phenomenon was already noted in the 19th
century by the neurologist Duchenne de Boulogne, who conducted experiments in
which he electrically stimulated facial muscles to produce facial expressions. The
genuine smile is now also known as the Duchenne smile. Ekman (1992b) gives a
more detailed account of what distinguishes different kinds of smiles.
Facial expressions as affect metalanguage
As noted in chapter 3, emotions are an important aspect of affect, and emotional
categories all have their place in affective space. Because of their close ties with
emotions, facial expressions seem to be a very promising metalanguage for affective connotation. In their basic form, they are universally understood, regardless of
culture. Because they are purely visual, non-verbal signals, they solve the translation and comprehensibility problems inherent to all language-based approaches.
Though basically universal, there are cultural dependencies which need to be
considered carefully. Symbolic facial expressions cannot be assumed to be understood correctly across cultural boundaries, and display rules should certainly
influence facial expression displays to be acceptable.
Basic emotions are a helpful framework to explain facial expressions, and vice
versa. It is important to remember, however, that a face can communicate much
more than that, just as human emotions are much more complicated. When facial
expressions express basic emotions, they represent just a certain region of affective
space. The discussion of the relationship of affect and specific media in chapter 4
has shown, however, that music, photographs or films only seldom communicate
these basic emotions; their messages are much more subtle.
This can be addressed in two ways. One possibility is to tailor the facial display for the purpose at hand. This can be achieved by the use of symbols. The
advantage of this approach is that the model can stay quite simple – only those affective messages that are common for the targeted media type need to be depicted.
The downside of this approach is a loss in universality. A specialised facial display
is less useful for media types it was not designed for. Moreover, since a simple
model makes it inevitable to rely on additional symbols to make up for a lack of
expressiveness, the display becomes even more culturally dependent. The best examples of simplistic, tailored facial expression interfaces are emoticons, which are
discussed below.
The other solution is facial expression displays whose emotional expressiveness approaches that of real human faces. Computer-generated characters in animation or live-action films have become so convincing that this task seems not
completely unfeasible. When this level of expressiveness is achieved, there is no
need for the addition of symbols nor specialisation towards a specific application.
Then, a face can serve as a highly accurate yet very subtle representative of a wide
region of affective space. The development of such a display would be challenging, in both technical and artistic ways and would require artists and developers to
work hand-in-hand. It would also ask for a computational model of affect that surpasses basic emotions or three-dimensional models in its expressiveness. Chapter 6
describes a first attempt at developing such a display.
Abstraction and exaggeration
A related question is how far a face should be abstracted to be best suited for an
affective space interface. It is without question that some level of abstraction is
necessary; an artificial display is an approximation towards reality no matter how
sophisticated. It can be argued, though, that it is not even desirable to imitate a
natural face in the most detailed way.
Figure 5.9 is taken from McCloud (2006), an excellent resource for artists on
how to draw comics. It depicts the same meaning – the affective state of anxiety –
in five different levels of abstraction along a resemblance-meaning dimension. At
the left end of this spectrum, a highly detailed drawing of a face conveys anxiety.
Figure 5.9: Anxiety in five levels of abstraction. Taken from McCloud (2006, p. 96)
Towards the right side, the level of detail is gradually reduced; with increasing
abstraction the level of resemblance decreases. The third picture only contains a
minimal number of facial features but still expresses anxiety as accurately as the
most detailed one.
When the facial expression is abstracted even more, facial features are lost which
are necessary to convey the emotion; in this case, the wrinkles are missing. To make
up for the loss of expressiveness, the face is augmented with sweat beads, a symbol
whose effectiveness depends on the viewer’s knowledge of its meaning. The word
anxious constitutes the right end of the spectrum; it is pure meaning and depends
entirely on the knowledge of the viewer to imagine the facial expression which this
word denotes. A detailed discussion of McCloud (2006) and its implications for the
design of facial expression interfaces is given in chapter 6.
The goal of high expressiveness does not mean that it is necessary to depict a
face in great detail. As we have seen, a simpler drawing can be equally
effective. Indeed, realism can prove counterproductive for this task. Our faces are
guided by display rules, and it is only seldom that we actually exhibit full-blown
emotions in our face. Also, high realism specifically depicts a person. The more
abstract the drawing, the more general it becomes, thus making it easier for users
to identify or engage with the face.
Facial expressions can be captured by a number of key features. These features
are the eyes, eyebrows and the mouth. Wrinkles are a result of skin pushed together
and are thus a by-product of the actual facial expression. In computer-generated
faces, they are frequently ignored. However, they too are necessary for accurate
facial expressions. Concentrating on the key features of facial expressions reduces
the number of distracting elements and can help to convey the intended emotion
more clearly.
A final, pragmatic argument in favour of a less detailed approach is that of
expectations. When a face looks very real, it might lead users to expect the same
capabilities as a real person. As Picard notes, “The more complex the system,
and the higher the user’s expectations, the harder it also becomes for the system’s
designer to craft the appearance of natural, believable emotions.” (Picard, 1997, p.
221)
The expressiveness of a face can be increased further if the key features of an
expression are exaggerated. This is precisely the device employed in caricatures. Brennan (1985) made an early and impressive attempt at a system which
automatically creates caricature drawings. The system represents drawings by a set
of curves. The drawing which should be turned into a caricature is compared with
a reference drawing, and the differences between the drawings are stored as vectors.
The user can then choose a factor by which the calculated vectors are multiplied.
The differences from the reference picture are the drawing’s key features. Increasing the distance of these key features in the direction of their respective vectors results in
an exaggerated version of the first picture.
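The exaggeration step amounts to scaling the stored difference vectors by the chosen factor; a sketch on two-dimensional points with invented coordinates:

```python
# Sketch of Brennan-style exaggeration: move each point of a face drawing
# further away from the corresponding point of a reference (average) face.
def exaggerate(face_points, reference_points, factor):
    """Scale the difference vectors face - reference by the given factor."""
    return [
        (rx + factor * (fx - rx), ry + factor * (fy - ry))
        for (fx, fy), (rx, ry) in zip(face_points, reference_points)
    ]

# Invented example: two mouth-corner points, slightly higher than the reference.
face      = [(10.0, 21.0), (30.0, 21.5)]
reference = [(10.0, 20.0), (30.0, 20.0)]
print(exaggerate(face, reference, factor=2.0))  # corners pushed twice as far up
```

A factor of 1 reproduces the original drawing; factors greater than 1 exaggerate its key features.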
Calder et al. (2000) conducted a study on the effects of caricaturisation. They
produced caricatures of real photographs which depict basic emotions. They employed a method similar to Brennan’s caricature generator. A computer system
calculates the key features of an emotional expression by comparing it with the
photograph of a neutral expression. With morphing software, these differences are
then exaggerated to produce caricatures. They tested for variables like intensity
of depicted emotion and face-likeness of the caricature. Their results showed that
face-likeness of exaggerated photos was perceived as decreasing. However, emotion
intensity was rated as increasing linearly with the level of caricaturisation. Exaggeration thus seems to be a valid method to increase the range of emotions that can be
depicted by facial expression displays.
Emoticons
In interactive systems, the most common application of facial expressions as a metalanguage for affect is emoticons, which are highly abstracted facial expressions,
usually represented by conventionalised character sequences. In face-to-face communication, bodily expressions and prosody – rhythm, stress and intonation – communicate the speaker’s intention to the listener. Text-based computer-mediated
communication makes it very hard for people to convey the affective value of messages, because neither of these devices is available. Emoticons are used to solve
this shortcoming, communicating affect through the only available device: textual
characters.
According to the Wikipedia entry on emoticons10, while the principle can be traced back to the 19th century, emoticons came into use in 1982. The first
emoticons were :-), symbolising a smiling face to mark a message as a joke, and
:-(, a sad face to flag something as serious. The concept was picked up quickly as
intended and has become a ubiquitous part of online communication. The symbol
:-) seems to be influenced by the iconographic smiley, which became popular
in the 1970s11 . Indeed, sometimes emoticons are called smileys, and in more recent
10 http://en.wikipedia.org/wiki/Emoticon, last accessed 2009-02-10.
11 http://en.wikipedia.org/wiki/Smiley, last accessed 2009-02-10.
years, many software applications have started to automatically replace text-based
emoticons with graphical emoticons, which are usually based on this design.
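At its core, this automatic substitution is a lookup-and-replace over conventionalised character sequences; a minimal sketch, where the mapping to image file names is hypothetical:

```python
import re

# Hypothetical mapping from textual emoticons to graphical resources.
EMOTICONS = {
    ":-)": "smile.png",
    ":-(": "sad.png",
    ";-)": "wink.png",
}

def replace_emoticons(text):
    """Replace known textual emoticons with placeholders for graphical ones."""
    pattern = re.compile("|".join(re.escape(e) for e in EMOTICONS))
    return pattern.sub(lambda m: f"[{EMOTICONS[m.group(0)]}]", text)

print(replace_emoticons("that was a joke :-)"))  # -> "that was a joke [smile.png]"
```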
In Asian countries, a different style of emoticons has evolved. The basic smiling emoticon is produced as (^_^). Eastern style emoticons are not read sideways.
While Western emoticons emphasise the mouth, Eastern emoticons focus on the
eyes as their primary communicative signal. This may be a result of cultural differences in display rules, but I could not find a study which tries to answer this
question. Another influencing factor might be the drawing style of manga, which
commonly features oversized eyes. According to the Wikipedia entry on emoticons,
Eastern style emoticons have been becoming more popular in the Western world
lately. While text-based emoticons are very different, I could not find an instant
messenger which transforms these into graphical ones. For instance, the Japanese
version of Yahoo! Instant Messenger, a separate product rather than a localisation,
uses the same graphical emoticons as the Western version.
There are many conventionalised emoticons to express a wide range of affective
states that are hard to express verbally. Derks et al. (2008) conclude in their study
about emoticons that they are primarily used to express emotions, to put emphasis
on the verbal message and to express humour. In this way, they argue, emoticons
express what usually is expressed non-verbally. In an earlier study, Derks et al.
(2007) noted the influence of context and valence on the use of emoticons; in a
positively evaluated situation and in a social context, emoticons are more common
than in task-oriented or negatively evaluated contexts. Interestingly, Walther and
D’Addario (2001) conclude in their study that they could not find any influence of
emoticons on the interpretation of text messages. They suggest that emoticons
complement verbal messages but are not used to change their meaning. However,
they have become so frequently used that they lead Azuma et al. (2008) to imagine
emoticons as a possible universal visual language of the future.
Emoticons do not occur automatically as a direct reflection of our emotional
state but have to be applied manually by the communicating user. Because of this
voluntary nature, the use of emoticons can only be symbolic communication. The
symbols used are mostly based on spontaneous facial expressions but are highly
exaggerated in these cases. For example, it is a spontaneous emotional reaction of
humans to blush, possibly reflecting embarrassment or modesty. The blush emoticon in the instant messenger Skype12 exaggerates the characteristic red cheeks to a symbolic level. Abstraction to the symbolic level means that the correct interpretation
depends on the receiver’s familiarity with the meaning of the symbol. The symbols chosen, of course, are ones that are established and well-known, at least in the
targeted culture. This abstraction is made for good reasons. Textual emoticons use
only a few characters to display an affective state. Graphical emoticons are very
small too, with common sizes around 20×20 pixels. Only the symbolic, simplistic
12 http://www.skype.com, last accessed 2009-03-18.
and exaggerated character of emoticons makes it possible to depict affective states
in so little space.
Textual emoticons do not need a separate interface; they are simply entered as
characters. Graphical emoticons may be placed by producing the equivalent textual
emoticon, which is automatically converted into the graphical representation by
the software. Alternatively, users pick the desired emoticon from a list of available
choices. This list is usually laid out without an underlying structure. As explained
before, the affective states covered are highly specialised and may not be equally
distributed in major affective dimensions.
A notable exception is Papermint13 , an online community in which users are
represented by comic avatars. Avatars can display facial expressions, which are
selected by the user through emoticons that symbolise emotions. The available
choices are arranged on a valence-arousal affective space (see figure 5.10). When
the user hovers over an emoticon, a tooltip gives a one-word description of the
depicted state. This example illustrates that lower-dimensional mappings onto affective space can be useful to inform the layout of interface elements, even though
these elements may represent specialised affective dimensions.
Figure 5.10: Papermint. Emoticons arranged on a valence-arousal affective space.
I believe that emoticons are an excellent example of affective connotation made
explicit. Text-based online conversations have much in common with spoken language, full of colloquialisms and frequently containing onomatopoeic expressions.
As explained before, text-based communication is well suited for denotative discourse, while affect is difficult to express adequately. Emoticons augment a conversation thus limited by its mode with the affective connotation intended by
the speaker.
13 http://www.papermint.com, last accessed 2009-03-18.
The ‘affective vocabulary’ of textual emoticons is not limited; new character
combinations can be invented which draw on established symbols or introduce
novel ones. Graphical emoticons, by contrast, are limited to a preselected number of
available states. In both cases, however, the vocabulary is highly specialised towards
the target domain, the augmentation of a conversation with its intended affective
connotation. Emoticons are thus an example of an affective space interface in which
the affective dimensions covered are tailored for the purpose.
Example: LittleBigPlanet
In the platformer game LittleBigPlanet for the Playstation 3, players control avatars
in pre-built or user-contributed levels. These avatars are called Sackboy and Sackgirl
and can be customised in the game. The game is designed for online playing and
lets multiple users play platform scenarios together while being physically apart.
The game recognises the importance of affect in the communication between
players. Being console-based, the game is controlled via gamepads, which makes it
difficult to send text-based messages to other players (though this is possible). This
makes sending emoticons to others infeasible.
Instead, the game allows players to express affect through their avatars. Via the gamepad's left-hand d-pad, a player can select an emotion to be expressed by his or her avatar. Four basic emotions can be expressed, each in three intensities: one vertical direction stands for joy, the opposite one for sadness, so the vertical axis represents valence. The two horizontal directions stand for fear and anger respectively, which turns the horizontal axis into the dominance dimension. Four basic emotions are thus mapped onto a two-dimensional valence-dominance affective space, which is illustrated in figure 5.11.
The emotional states are expressed by the avatar in two ways, facial expressions
and posture. Facial expressions are exaggerated, which communicates the emotion
in a clear way and fits well with the visual comic-like style of the game. Facial
expressions and postures are animated, which adds greatly to their expressiveness.
Posture is mainly employed to communicate sadness and fear. In the presence of fear, Sackboy raises his hands defensively and leans his head back. The expression of sadness is intensified by Sackboy bending forwards and letting his arms hang down.
The addition of posture is an interesting possibility to be considered in the
search for ways to express affect. In the affective computing community, there is
now ongoing work to infer the emotional state of people from video recordings of
their posture (e.g. Kleinsmith and Bianchi-Berthouze, 2007, Castellano et al., 2007).
Facial expression recognition
There are attempts at automatic recognition of facial expressions (e.g. Bartlett et al.,
2003), which promise to be an interesting input method for affective information.
Figure 5.11: LittleBigPlanet: Facial expressions in a valence-dominance affective
space.
On the one hand, visual source material – photos and video – could be analysed
algorithmically to extract some indication of expressed affect when human faces are
present.
On the other hand, with webcams being commonplace nowadays, facial expression recognition might become a feasible input method, which does not rely on
introspection, but rather uses a physiological measure to obtain information about
the affective meaning of information entities. In their current state, such algorithms require expressions to be posed and can only classify them into a few categories. However, to be used as a physiological input method, algorithms need to be capable of
detecting subtle changes in the spontaneous and weak facial expressions exhibited
by users when exposed to information entities. Taking into account the inconclusive results of studies which measured facial expressions in reaction to music (see
4.2: Production of emotions in the listener), it seems doubtful that such a method will become feasible in the foreseeable future.
5.6 Discussion
I have described several metalanguages for affective space. Again, this is only a
small selection out of the many possibilities worth considering. Each of the metalanguages described here seems to have its merits, so the question of which metalanguage to pick depends on the purpose.
Language-based approaches have the advantage of being very specific in their
message. One possibility is to use tags which describe affective states. As the
analysis has shown, users make use of this possibility to a lesser extent than might be expected. This could be countered with an interface which recommends well-suited tags to the user.
Semantic scales also have the advantage of being very specific in their meaning.
Such scales are usually preselected, thus avoiding the problem with tags that users have to find the right words themselves. The selection of defining words for the scales can
be adjusted for the type of media being described. From these specific statements,
positions on general affective dimensions can be calculated. Scales further allow different levels of intensity to be expressed, something which cannot be achieved with tags.
On the downside, both scales and tags exhibit the problem of translating into other
languages. Translations can shift a term's meaning, which makes results difficult to
compare.
Affective space can be represented directly through its most important dimensions. While the concept seems to be easily understood, expressiveness is only
achieved when a given point in affective space is shown in context of the range of
possible values in this space, which limits the concept to cases in which enough
screen real estate is available.
Colours, on the other hand, can be represented well in spatially constrained
situations. Another advantage of colours is the possibility to describe any colour
with three dimensions, which seems to make them well suited for use with a general
dimensional model of affect. When selecting colour as metalanguage, however,
it is important to remember the difference between colour emotions and colour
symbolism. All of the examples consulted focus on colour symbolism, suggesting
that this kind of association is stronger than the other.
Facial expressions seem to be the best choice if actual emotions are to be expressed. They are – to a certain degree – understood world-wide. Being entirely
visual, there is no translation problem. If abstracted and exaggerated to a certain
level, their expressiveness can be increased while the required space is decreased. In the next chapter, I describe a project which aims at the development of a software component which expresses emotions through facial expressions, taking into account
the findings of this chapter.
Chapter 6
Grimace
Facial expressions are, as we have seen, closely tied to emotions. Because they
are universally understandable, they constitute an excellent metalanguage for this
important aspect of affect. In this chapter, I describe Grimace, a web component
which displays emotions through facial expressions.
Reproducing facial expressions through computer graphics necessitates artistic
skills. For this reason, the component was developed in co-operation with fellow
student Thomas Fadrus. While I developed the concept, software architecture and
the bulk of the actual implementation, my colleague focused on the visual aspects
of our project.
Emotion model
Artists who want to draw convincing portraits of humans need to be expert observers of facial expression. Grimace was based on the book Making Comics by Scott
McCloud (2006), a manual for artists on how to draw comics. Chapter 2 of the
book deals with how to convincingly draw facial expressions. McCloud develops
an emotion model which follows the categorical approach, in which he postulates
6 basic emotions:
Anger    Joy    Surprise    Disgust    Sadness    Fear
Table 6.1: The basic emotions defined by McCloud (2006)
This list is, in turn, based on the research of Paul Ekman. In a series of cross-cultural experiments, Ekman et al. (1972) showed photographs of facial expressions
to members of different cultures. Since the posed expressions could be judged accurately, he inferred the universality of several specific emotions. While the existence
and the number of basic emotions is still debated, his results show that these 6
emotions can be judged correctly.
The Artist's Complete Guide to Facial Expression by
Gary Faigin (1990), an excellent guide to drawing detailed facial expressions, is
based on a categorical model. McCloud (2006) takes visual cues for his model from
Faigin’s work and shows how to depict emotion through facial expressions in the
world of comics, offering the ideal framework for our work.
In McCloud’s model, these basic emotions can be blended to achieve complex
emotions. He compares this process to the way arbitrary colours can be mixed
from three primary colours. Accordingly, he calls the 6 basic emotions primaries,
and blendings of two basic emotions secondaries. He gives example depictions and
names for all primary and secondary emotions, which we used as the basis for our
work. He asserts that mixtures can occur in arbitrary intensity and might even
include three emotions.
Goals
The goal of the project Grimace was to build a facial expression display using web
technology which effectively conveys emotions as depicted by McCloud (2006, p. 83–85). This includes any primary (or basic) emotion and any secondary emotion
(blendings of two basic emotions) in arbitrary intensity. The result should be a free
component which can be easily integrated into other projects.
6.1 Related work
Realistic approaches
The development of dynamic face models is an important research area. Most of this
work is being undertaken in the field of affective computing, which aims to enhance
interactive systems with affective awareness and expressiveness. In a principal work
for this research area, the goal was defined as “making machines less frustrating
to interact with.” (Picard, 1997, p. 214) Interactive systems should recognise the
affective state of the user and adapt their output and behaviour accordingly.
One commonly proposed solution is the use of embodied conversational agents
(ECA). In such an environment, the system interacts with the user through an
avatar with human-like appearance and expressiveness. ECA strive to be believable
conversation partners. Therefore, they need to be able to express their emotions
through facial expressions. Since an ECA also speaks, another requirement is appropriate facial animation which supports the impression that the ECA is actually
speaking. Usually, an ECA is modelled as a 3D head.
Wang (1993) undertook an early yet impressive attempt at building a three-dimensional face model, which predates affective computing. The face consists
of a set of hierarchical b-spline patches, which can be manipulated by simulated
muscles.
Pan et al. (2007) focus on the notion that the basic emotions as postulated by Ekman are not the ones which are actually needed for conversations with believable
agents. They developed a 3D character which expresses affective states like agreement, interest, thinking and uncertainty through facial expressions and animated
gestures. Ochs et al. (2005), on the other hand, base their three-dimensional ECA
on basic emotions, but allow them to be blended to achieve more subtle affective
states.
Albrecht et al. (2005) depart from the concept of basic emotions and base their
agent on a dimensional approach. Their agent augments a text-to-speech system.
The system analyses a text for certain terms whose coordinates for three affective dimensions are stored in a table. With these values, the system augments the
spoken text with appropriate facial expressions. Zhang et al. (2007) follow a similar approach. In their system, high-level affective dimensions are translated into
MPEG 4 Facial Animation instructions. This standard is described by Pandzic and
Forchheimer (2002) in detail.
There are publicly available ECA frameworks. The CSLU Toolkit1 is one example. It is a comprehensive framework which facilitates application development
with the included ECA. Another example is Xface2 , which achieves an impressive
level of realism. It makes use of the MPEG 4 Facial Animation standard.
Comic-like approaches
The approaches described above usually employ characters which are designed for
a high level of realism. Such a level of realism, however, is not necessary for this
project. We aim to unambiguously express emotions through facial expressions.
McCloud (2006) shows that a certain level of abstraction is possible without any
loss of expressiveness (see section 6.2). There are a few attempts to develop face
models which aim to achieve such abstracted comic-like appearance.
Bartneck (2001) conducted a study on the suitability of different representations of facial expressions. Three representations at different abstraction levels were compared: photographs of a real face, an embodied conversational agent from
the CSLU Toolkit, and highly abstracted emoticons (black & white, 10 × 9 pixels in
size). Subjects rated facial expressions of these representations for convincingness
and distinctness. Results showed that the very abstract emoticon was perceived to
1 http://cslu.cse.ogi.edu/toolkit/, last accessed 2009-02-20.
2 http://xface.itc.it/, last accessed 2009-02-20.
be as convincing as the photograph, while distinctness was rated as decreasing with
increasing abstraction levels.
Tanahashi and Kim (1999) developed a comic-like face model. Their model is
designed to express four out of six basic emotions as defined by Ekman. They also
experiment with exaggeration of the features and with the addition of symbols to
achieve a higher level of expressiveness. The symbols employed are highly culturally dependent. Iwashita et al. (1999) also follow a comic approach in their system
for caricature development. An interesting point is the use of a survey to improve
the validity of the system, in which they asked subjects to pick the most expressive
and convincing representation out of a few alternatives.
Schubert (2004) describes a very simple comic-like face model, which only consists of eyes and mouth. These features are represented by parabolic functions. His
model is based on a dimensional approach. The shape of the mouth indicates valence, the degree to which eyes are opened represents arousal. The model is used to
visualise emotions expressed by music. Emotions are very short experiences which
are not constant over the duration of a song. The model expresses these changes in
affect.
Discussion
Our goal is not to build a conversational agent; we do not include animation or a
speech component. The realistic three-dimensional approaches are designed for a
purpose quite different from our goals. Although embodied conversational agents
can express emotions through facial expressions, we believe that expressiveness can
be increased further. For instance, the ECA cited above do not include wrinkles in
their design. However, wrinkles are an essential aspect of some facial expressions
(disgust, for example).
The comic-like models we have found, on the other hand, reduce the facial
complexity to an extent which results in a loss of expressiveness. Tanahashi and
Kim try to counter this with the addition of symbols. This is something we wish to
avoid, because symbols are culturally dependent, while cross-cultural research has
shown that facial expressions on their own are universally understood.
This brief survey leads to the principles on which we built our system.
6.2 Design
Grimace is based on two principles: simplicity and accuracy. Our face is simple
by design, a two-dimensional comic-like face which only consists of those facial
features which are relevant to the display of emotions. In order to achieve credible
facial expressions, however, the few facial features available must be flexible and
accurate.
In 3D computer graphics, face models are represented by a polygon mesh. The
mesh can be manipulated to show facial expressions. However, our goal of clearly
displaying emotions does not call for a detailed 3D model. McCloud has shown
that a certain level of abstraction is possible without any loss of expressiveness.
Consider, again, figure 6.1, in which anxiety is depicted in five levels of abstraction
– from a very realistic drawing to full abstraction (a word). The face in the middle
uses only a minimal number of facial features while being as expressive of this
emotion as the most realistic depiction. Reduced detail and complexity allows to
focus on those features which are necessary to effectively display a specific facial
expression. This is the level of abstraction we want to achieve. We believe that
detailed reproduction of faces is not necessary to convey emotions unambiguously
and a certain level of abstraction can actually convey emotions more clearly than
a very realistic approach. Calder et al. (2000) show that comprehensibility can be
increased even further if characteristic features of facial expressions are exaggerated.
Figure 6.1: McCloud (2006, p.96)
The Uncanny Valley
Another reason why we opted for a comic-like, abstracted approach is the notion
of The Uncanny Valley. The concept was introduced in a short essay by robot researcher Masahiro Mori (1970). He asserts that representations of humans at
different levels of human-likeness are not linearly correlated with the level of familiarity. While abstract depictions – like cartoon characters – are readily accepted as
quite familiar and healthy persons naturally achieve the greatest level of familiarity,
there is a gap in between for almost human-like depictions. These almost human-like appearances – corpses and zombies are examples given – are perceived to be
creepy or uncanny. Figure 6.2 shows Mori’s graphical representation of the concept.
The drop in familiarity for almost human-like characters is visible as a valley, hence
the name Uncanny Valley.
Bartneck et al. (2007) conducted a study in which they tried to find out if Mori’s
predictions are accurate. They used pictures of robots at different levels of human-likeness, as well as pictures of real humans. It has to be noted that Bartneck et al.
measured the level of likeability in their study, not the level of familiarity which Mori
Figure 6.2: Graphical representation of The Uncanny Valley. Taken from Geller
(2008), based on Mori (1970).
described. Their findings did not confirm the predicted rise in likeability. More
abstract depictions were perceived as more likeable than pictures of real humans.
They note that knowledge about whether the photo showed a real human or a robot
did not have an influence on likeability. Instead, results were highly dependent on
the perceived beauty of the stimulus. From their results, they infer that it might be
more accurate to speak of an Uncanny Cliff.
Geller (2008) gives an up-to-date examination of the concept. He notes that
there are a number of examples which contradict Mori's predictions. One important
factor which influences familiarity is the depiction of the eyes. There might be
an uncanny threshold; the eyes need to be very well done to be acceptable. He
closes with a recent quote from Mori, in which he, too, questions the predicted rise in familiarity. Mori now believes that the human ideal is best expressed by the calm
facial expressions of Buddha statues.
If there is an Uncanny Valley, it needs to be carefully avoided for our face to be
perceived as acceptable and not creepy. This we hope to achieve by choosing an
abstracted, comic-like graphical approach.
Functional principles
We believe that closely following the biological process of how emotions result in
facial expressions increases the credibility and clarity of the displayed emotion. In
humans and animals alike, facial expressions result from the contraction of facial
muscles. Usually, muscles are attached to bones at both ends. With facial muscles,
however, only one end is anchored to a bone, while the other one is attached to
the skin. When facial muscles contract, they move parts of our face. Each muscle
affects the facial features in a certain way, and each emotion affects a different set
of muscles. When several muscles are contracted in a special way, familiar facial
expressions arise which clearly convey emotions. Therefore, our facial model is
muscle-based.
To achieve the desired level of expressiveness and abstraction, we had to find
the facial features which are necessary to convey emotions. As it turned out, there
are only a few facial features needed: eyebrows, eyes and the mouth. On top of
that, some emotions only become distinguishable from each other when wrinkles
are taken into account. McCloud's drawings of the 6 basic emotions were further
reduced to the essential features and wrinkles of each emotion (figure 6.3).
Figure 6.3: Essential features for 6 basic emotions.
These features had to be translated into graphical elements that could be transformed algorithmically. We found the necessary combination of accuracy and flexibility in Bézier splines. All facial features and muscles are represented by one or
more splines. Figure 6.4 shows the first attempt to represent the facial features with
a minimal number of Bézier splines. This early model proved to be not capable of
expressing all necessary facial expressions and was augmented in later iterations.
The shape of Bézier splines is determined by a very small number of control
points. The idea was to connect virtual muscles to these control points in such a
manner that contraction of the muscles would result in natural looking transformations of the splines. Emotions would, in turn, affect multiple muscles in a concerted
way.
McCloud notes that the facial expressions which result from emotions are fully
symmetrical. Therefore, we decided that it would be enough to model one half of
the face and mirror the result to the other half.
6.3 Development
Selected technology
Grimace has been developed in Actionscript 3 and is being deployed as a Flash /
SWF file. Actionscript 3 is the scripting language used by Adobe Flash and Adobe
Figure 6.4: First attempt to represent facial features via Bézier splines.
Flex. Though not advertised, the language can be used on its own. The technology
was selected for several reasons:
• Free: The Flex 3 SDK is available open source under the Mozilla Public License.
It contains MXMLC, an Actionscript 3 compiler, which allows generation of
SWF files entirely through Actionscript 3 code, without the need for an IDE
like Adobe Flash IDE or Adobe Flex Builder.
• Optimised for dynamic vector graphics: Flash originated as a vector-based animation tool and offers comprehensive vector-drawing functions.
• Optimised for the web: Flash is a web-centric technology which delivers small
file sizes and can be conveniently integrated in web projects.
• Ubiquity and consistency: The Flash player is available for all major operating
systems and has an extremely high install base. SWF files are displayed in
exactly the same way across platforms and browsers.
Iterative development
We have adopted an iterative approach to development. Instead of defining the
complete architecture beforehand, we started by implementing only the most basic functions. First, we implemented the necessary classes to define facial features.
Afterwards, we added the capability for muscles to move control points of features.
In many iteration steps, muscles were optimised. Features were added over time;
starting with the eyes (the simplest feature), later adding eyebrows and the
mouth and finally wrinkles. The goal was to find muscle definitions which would
Figure 6.5: Muscle tensions were plotted and interpolated for each emotion.
allow the features to be transformed in such a way that they would be capable of
expressing each of the 6 basic emotions. We defined this as our facial expression
gamut.
Afterwards, emotions were added. Each emotion should be able to influence
any number of control points in a very flexible manner. Again, we started with the
emotion easiest to implement – surprise – and gradually added the other ones. The
relationships between emotions and muscles were derived from McCloud's drawings. We had representations of each emotion at 4 levels of intensity. For each
level, muscles were adjusted to match the expression. Thus, we achieved an indication of the tensions for each emotion and each muscle at 5 points (neutral and
4 intensity levels). To enable arbitrary intensity levels, we plotted muscle tensions
over emotion intensity and applied curve-fitting. Thus, we achieved mathematical
representations for each emotion-muscle pair. For example, figure 6.5 shows the
relationship of two muscles with anger. The indicated forms are approximated by
two mathematical functions.
Then, we repeated the complete process for wrinkles. Here, an additional challenge was the introduction of opacity. Wrinkles are not visible all the time, but
only when certain muscles are contracted. The relationship between muscles and
wrinkle opacity was implemented in the same way.
At this point, the face could convincingly express every basic emotion. What
remained to be done was to allow blendings of two emotions. For each muscle-emotion pair, we introduced an additional parameter, which controls how muscles
should be contracted in the presence of multiple emotions. This process involved a
great deal of optimisation work.
When the face was capable of expressing any primary or secondary emotion, the
component was adapted for deployment. This included the addition of a JavaScript
API, which allows full control over the face’s capabilities, and the construction of a
project website.
6.4 Technical details
Grimace follows a muscle-based approach and thus mimics the way biological faces
operate. In human and animal faces, facial expressions result from contraction of
facial muscles. Facial muscles are, unlike muscles in any other part of the body,
only fixed to bones at one end, while the other end is attached directly to the facial skin. This unique property allows the wide range of facial expressions humans
are capable of displaying.
Our face model consists of three major components: emotions, muscles and
features.
• Features, which are the visible elements of a face. Features can be transformed
by muscles. Typically, this includes dynamic features like eyes, eyebrows,
mouth and wrinkles, as well as static features like nose and head outline.
• Muscles, which are the link between emotions and features. The shape of a
muscle is defined by a spline and when contracted can move an arbitrary
number of control points along its path.
• Emotions, which are the high-level concept which influence a number of muscles in an arbitrary fashion. Each emotion affects specific regions of the face
and results in familiar facial expressions.
Figure 6.6 illustrates how these three components work together to achieve a facial
expression. It shows the mouth and its surrounding muscles for a neutral expression and four states of anger.
The shape of the mouth is represented by two features, upper lip and lower lip. A feature consists of several control points. The mouth is surrounded by several muscles. A muscle has a defined path and a current tension, indicated in the figure by a dot along its path. Each control point can be influenced by multiple muscles. When a muscle is contracted, it moves its tension dot along its path. Any control point which is influenced by the muscle is then moved, which results in a change of the feature's shape. Now, when an emotion is present – the example shows the influence of anger – several muscles contract simultaneously.
In the following, each of these components and their underlying technologies
are described. This is followed by a brief description of how Grimace can be
Figure 6.6: Influence of anger (at intensities 0.0, 0.25, 0.5, 0.75 and 1.0) on muscles surrounding the mouth.
put to use in other projects. A complete overview of all the classes is given in a
UML-style class diagram.
6.4.1 Muscles
As explained before, facial muscles are fixed to a bone at one end, and attached to
skin at the other end. When muscles contract, they shorten and thus pull the skin
towards the point where they are attached to the bone.
We simulate this behaviour. The shape of a muscle is defined by a spline (see
6.4.4). However, unlike real muscles, muscles in Grimace have no width. The
tension parameter of a muscle corresponds to the position t ∈ [0, 1] along the
spline. Thus, t = 0 is a completely relaxed muscle, while t = 1 represents maximum
tension. A muscle can be defined with the parameter initTension, which defines
the neutral state for this muscle. This defaults to 0, but in some cases, a neutral face
– i.e. no emotion is active – results in contracted muscles. An example is Levator
palpebrae, which controls the upper eye lid. Since the eyes are halfway open in
neutral state, this muscle is defined with initTension.
The tension of a muscle, or rather, the distance between the points Q(t =
initTension) and Q(t = tension) influences the position of feature nodes (see
6.4.2: Node influences). In turn, the tension of a muscle is calculated from the
emotions which exhibit an influence on the muscle (see 6.4.3).
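As an illustration of this mechanism, the following ActionScript 3 sketch derives a muscle's pull from its spline, tension and initTension. It is a simplified example using a quadratic Bézier path and generic function names, not the component's actual Muscle class.

import flash.geom.Point;

// Sketch: the pull of a muscle is the vector between the point on its path
// at initTension and the point at the current tension. The path here is a
// quadratic Bézier through p0 and p2 with control point p1 (simplified; the
// component itself uses the ISpline types described in section 6.4.4).
function musclePoint(p0:Point, p1:Point, p2:Point, t:Number):Point {
    var u:Number = 1 - t;
    return new Point(u * u * p0.x + 2 * u * t * p1.x + t * t * p2.x,
                     u * u * p0.y + 2 * u * t * p1.y + t * t * p2.y);
}

function muscleDisplacement(p0:Point, p1:Point, p2:Point,
                            initTension:Number, tension:Number):Point {
    // Q(tension) - Q(initTension): how far the muscle pulls the attached skin
    return musclePoint(p0, p1, p2, tension)
           .subtract(musclePoint(p0, p1, p2, initTension));
}

// Example: a muscle that, like Levator palpebrae, is already contracted in
// the neutral face (initTension = 0.4) and is now tensed further to 0.7.
var pull:Point = muscleDisplacement(new Point(0, 0), new Point(10, -20),
                                    new Point(20, -30), 0.4, 0.7);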
Finally, Muscles are grouped into instances of MuscleGroup. This grouping is
optional, but currently muscles are divided into feature muscles and wrinkle muscles, the latter being additional muscles which simulate the wrinkles that appear when feature muscles contract. This is a point where we had to depart from an accurate biological representation to achieve the desired facial expression gamut.
6.4.2 Features
Features, the visible parts of a face, can be transformed by muscles. The Feature
class encapsulates distinct facial features, e.g. the upper lip, an eyebrow or a wrinkle.
A feature is comprised of one or more segments, which are instances of the
FeatureSegment class. The shape of a segment is defined by a spline. Thus, a
feature can take an arbitrary shape by connecting several segments.
A spline has two endpoints and 0 or more control points, referred to as nodes.
Every point is represented by the FeatureNode class and can be influenced by an
arbitrary number of muscles. For every node-muscle influence, a weight parameter
is stored.
For n registered muscles, the position of node N is evaluated in the following
way: For each registered muscle M we calculate the distance between the muscle’s
position resulting from its current tension v, and the position resulting from its
initial tension t. The distance is scaled by the respective weight factor w. The
node’s initial position N0 is then translated by the resulting vector.
N = N0 + ∑ wi · (Mi(v) − Mi(t)),  where the sum runs over the n registered muscles i = 1, …, n
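A minimal ActionScript 3 sketch of this evaluation is shown below; the weights and displacement vectors are passed in directly, so the snippet only illustrates the sum above and does not reproduce the FeatureNode class itself.

import flash.geom.Point;

// Sketch: position of a node under the influence of n muscles.
// weights[i] is the weight w_i of muscle i on this node, displacements[i]
// is the vector M_i(v) - M_i(t) currently produced by that muscle.
function nodePosition(initial:Point, weights:Array, displacements:Array):Point {
    var result:Point = initial.clone();
    for (var i:int = 0; i < weights.length; i++) {
        result.x += weights[i] * Point(displacements[i]).x;
        result.y += weights[i] * Point(displacements[i]).y;
    }
    return result;
}

// Example: a node at (100, 50) pulled by two muscles with different weights.
var moved:Point = nodePosition(new Point(100, 50), [0.8, 0.3],
                               [new Point(4, -2), new Point(-1, 3)]);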
Features can also be filled arbitrarily, represented by the FeatureFill class.
Fills can also be influenced by muscles, which adds another way of animating the face. For
every fill, a FeatureNode represents a pivot point, which can then be influenced by
muscles like any other node and moves the whole fill when translated.
Not every feature is constantly visible. Wrinkles result from tightening of facial
skin and thus only become visible when certain muscles are contracted. To simulate
this behaviour, the visibility of features can be mapped to the tension of a muscle.
The relation is not direct but mediated through mappings (see 6.4.5); therefore, the
feature opacity can be controlled flexibly.
6.4.3 Emotions
Emotions are the high-level concept which we aim to display via facial expressions.
In real faces as well as in our implementation, the 6 basic emotions result in distinct facial expressions, which have been said to be recognisable
cross-culturally.
The presence of an emotion is represented by a parameter value ∈ [0, 1], where
value = 0 means the emotion is not present, and value = 1 represents maximum
influence of an emotion. When an emotional state is present, it results in simultaneous contraction of a set of muscles. This contraction does not follow value
linearly. For instance, some features only start to be influenced when an emotion
is strongly present, while others are continuously influenced, but more strongly in
early than in later stages. Therefore, for every emotion-muscle influence, we have
defined a mapping which allows flexible control over how a muscle is contracted
for an emotion state (see 6.4.5).
Our emotion model subscribes to the idea that complex emotions are in fact
mixtures of basic emotions. If two or more emotions are present simultaneously,
more than one set of muscles is affected. However, since different emotions sometimes influence the same muscles, influences have a priority parameter. When
a muscle is influenced by more than one emotion simultaneously, priority defines
the influence of each emotion on the final tension of a muscle. For instance, a genuine smile not only influences the shape of the mouth, but also results in squinting
of the eyes. However, feeling surprised results in widely-opened eyes. If joy
and surprise are experienced together, the eyes remain open, because surprise has a
stronger influence on the eyes than joy. This is represented by different priorities.
Given an influence of n emotions, with emotion values vi , influence priorities pi
and raw emotion tensions ti , the final tension of a muscle is calculated as:
t = ( ∑ vi · pi · ti ) / ( ∑ vi · pi ),  where both sums run over the influencing emotions i = 1, …, n
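The following ActionScript 3 sketch evaluates this weighted average for one muscle. The zero fallback for the case of no active emotion is an assumption made for the example, not necessarily how the component handles a neutral face.

// Sketch: final tension of a muscle influenced by n emotions.
// values[i]     - emotion value v_i in [0, 1]
// priorities[i] - priority p_i of this emotion's influence on the muscle
// tensions[i]   - raw tension t_i this emotion alone would give the muscle
function combinedTension(values:Array, priorities:Array, tensions:Array):Number {
    var weighted:Number = 0;  // sum of v_i * p_i * t_i
    var norm:Number = 0;      // sum of v_i * p_i
    for (var i:int = 0; i < values.length; i++) {
        weighted += values[i] * priorities[i] * tensions[i];
        norm     += values[i] * priorities[i];
    }
    return norm > 0 ? weighted / norm : 0; // assumed: relaxed if no emotion is active
}

// Example: surprise (higher priority) keeps an eye muscle open against joy's
// squint; the numbers are purely illustrative.
var t:Number = combinedTension([0.7, 0.6], [1.0, 2.0], [0.8, 0.1]);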
6.4.4 Splines
Spline is the common term for the use of piecewise parametric curves in computer
graphics. A spline has two endpoints and may have control points in between. With
splines, complex shapes can easily be described or approximated by very few points.
Bézier curves are a form of parametric curves which are commonly used in vector-drawing and animation software. It is a notable property of Bézier curves that the
curve does not run through the control points but is merely pulled towards them.
All shapes in Grimace – facial features and muscles – are based on straight lines
and Bézier curves. They offer an easily understandable way to model the face,
and the selected technology offers native support for these types of splines. Facial
features are all visible components of the face, e.g. the eyes, the mouth or wrinkles.
Each feature consists of one or more segments, and the shape of each segment is
defined by exactly one spline. In addition, muscles are also based on Bézier curves;
the shape of each muscle is defined by exactly one spline.
Splines implement the ISpline interface. The interface defines the getPoint(t)
method, which calculates the location of a point along the spline given the position
t ∈ [0, 1], where t = 0 is the starting point of the spline, and t = 1 is the endpoint.
The following splines are available for muscles and facial features:

Figure 6.7: Spline types. (a) Line; (b) Quadratic Bézier; (c) Cubic Bézier.

Line

A spline which connects two endpoints with a straight line. Flash offers the native drawing method lineTo for this spline type.

Quadratic Bézier

A Quadratic Bézier curve has one control point. Flash offers the native drawing method curveTo for this spline type. The parametric form of a Quadratic Bézier curve is:

Q(t) = P0 · (1 − t)² + P1 · 2t(1 − t) + P2 · t²,  t ∈ [0, 1]

Cubic Bézier

A Cubic Bézier spline has two control points and offers great control over the curve form. If two or more Cubic Bézier splines are concatenated, they offer enough flexibility to draw all necessary facial features, including the mouth, which demands the greatest flexibility. The parametric form of a Cubic Bézier curve is:

Q(t) = P0 · (1 − t)³ + P1 · 3(1 − t)²t + P2 · 3(1 − t)t² + P3 · t³,  t ∈ [0, 1]

Flash does not offer a native drawing method for Cubic Béziers. However, the form can be approximated by lower-complexity curves like Quadratic Bézier splines. The more lower-complexity curves are used, the more accurate the form of the approximated curve becomes.

We have selected the Fixed Midpoint approximation method described by Groleau (2002). It approximates a Cubic Bézier with four Quadratic Béziers and offers a good trade-off between accuracy and calculation complexity. The approach is illustrated in figure 6.8. Given the four points of a Cubic Bézier C to be approximated, the endpoints and control points for Quadratic Béziers Q, R, S and T are calculated in the following way:
Figure 6.8: Fixed Midpoint approximation
H0 = ((C0 + C1)/2 + (C1 + C2)/2) / 2 = (C0 + C2)/4 + C1/2
H1 = ((C1 + C2)/2 + (C2 + C3)/2) / 2 = (C1 + C3)/4 + C2/2
Q0 = C0;   Q1 = (5·C0 + 3·C1)/8
R1 = (7·H0 + H1)/8;   R2 = (H0 + H1)/2
Q2 = R0 = (Q1 + R1)/2
S0 = R2;   S1 = (H0 + 7·H1)/8
T1 = (3·C2 + 5·C3)/8;   T2 = C3
S2 = T0 = (S1 + T1)/2
In our implementation, the spline can be used like a regular Cubic Bézier with
two endpoints and two control points, while the approximation is handled internally by the class.
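To illustrate how the resulting quadratic segments can be drawn with Flash's native curveTo, the sketch below applies the formulas above to a Graphics object; it mirrors the structure of the method, not the component's actual spline class.

import flash.display.Graphics;
import flash.geom.Point;

// Sketch of the Fixed Midpoint approximation: draw a cubic Bézier C0..C3 as
// four quadratic Béziers Q, R, S, T using Graphics.curveTo().
function drawCubic(g:Graphics, c0:Point, c1:Point, c2:Point, c3:Point):void {
    // weighted mix of two points: a * fa + b * fb
    function mix(a:Point, fa:Number, b:Point, fb:Number):Point {
        return new Point(a.x * fa + b.x * fb, a.y * fa + b.y * fb);
    }
    var h0:Point = mix(mix(c0, 0.5, c1, 0.5), 0.5, mix(c1, 0.5, c2, 0.5), 0.5);
    var h1:Point = mix(mix(c1, 0.5, c2, 0.5), 0.5, mix(c2, 0.5, c3, 0.5), 0.5);
    var q1:Point = mix(c0, 5 / 8, c1, 3 / 8);   // Q1 = (5*C0 + 3*C1) / 8
    var r1:Point = mix(h0, 7 / 8, h1, 1 / 8);   // R1 = (7*H0 + H1) / 8
    var r2:Point = mix(h0, 0.5, h1, 0.5);       // R2 = (H0 + H1) / 2
    var q2:Point = mix(q1, 0.5, r1, 0.5);       // Q2 = R0
    var s1:Point = mix(h0, 1 / 8, h1, 7 / 8);   // S1 = (H0 + 7*H1) / 8
    var t1:Point = mix(c2, 3 / 8, c3, 5 / 8);   // T1 = (3*C2 + 5*C3) / 8
    var s2:Point = mix(s1, 0.5, t1, 0.5);       // S2 = T0

    g.moveTo(c0.x, c0.y);                       // Q0 = C0
    g.curveTo(q1.x, q1.y, q2.x, q2.y);          // segment Q
    g.curveTo(r1.x, r1.y, r2.x, r2.y);          // segment R (ends at S0 = R2)
    g.curveTo(s1.x, s1.y, s2.x, s2.y);          // segment S
    g.curveTo(t1.x, t1.y, c3.x, c3.y);          // segment T (T2 = C3)
}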
Joiner

For some facial features, e.g. the shape of the mouth or several wrinkles, a single Cubic Bézier curve does not suffice. In these cases, two or more curves may be joined together to form a curve with additional flexibility. Such a feature then consists of more than one segment.
Parametric continuity Cⁿ is a description of the smoothness of concatenated parametric curves:
• C0: curves are joined.
• C1: first derivatives are equal.
• C2: first and second derivatives are equal.
Without additional measures, connected Bézier curves only offer C0 continuity. If
two connected splines are to appear as a single and coherent curve, however, at
least C1 continuity is necessary.
Figure 6.9: Joiner
The Joiner spline is a Cubic Bézier spline whose control points are calculated from the control points of adjacent splines to achieve C1 continuity. The concept is illustrated in figure 6.9.
A Joiner spline is constructed from two endpoints R0, R3 and two additional points RA, RB. These additional points are used to calculate the necessary control points R1, R2 to achieve C1 continuity in both endpoints. R1 and R2 lie on the lines formed by R0RA and R3RB. The distance of the control points from the respective endpoints on their respective axis is derived from the distance between the endpoints.

Typically, RA and RB are set to the nearest control points of adjacent splines. For instance, if Cubic Bézier Q ends in R0, then RA would be set to Q2. Likewise, if Cubic Bézier S starts in R3, then RB would be set to S1.
The Joiner class is also used for mirroring. Assume a mirror through the vertical
axis at position x = 0, which results in horizontal mirroring. To ensure a smooth
curve, the slope R′(x = 0) must be 0. If R0 = (x = 0, y = y0), this can be achieved by setting RA = (x < 0, y = y0). Then, R0 and RA form a horizontal line, which places R1 at (x > 0, y = y0) and results in zero slope at x = 0. When the curve is now
horizontally mirrored at this point, C1 continuity is achieved.
6.4.5 Mappings
Each emotion influences a different set of muscles. McCloud offers drawings for
each basic emotion in 4 intensity levels. These drawings were used as references,
which we wanted to match. For each emotion and each intensity level, muscles were
adjusted to match the reference drawing. The values of the muscle tensions were
saved for each intensity level. Plots of the muscle tensions showed that the relationship is a different one for each combination of muscle and emotion. In some cases,
the relation is a linear one – heightening the level of an emotion increases a muscle’s tension. More often than not, however, the relation is much more complicated.
In order to achieve credible muscle tensions, this relationship, only indicated by 5
points (neutral and 4 intensity levels for each emotion), needs to be interpolated.
We represent the relationships by a number of mathematical functions, which
we call Mappings. A Mapping takes a few parameters which influence the resulting
function in a flexible way to approximate the form of the underlying relationship.
The IMapping interface is merely a wrapper for a low-level mathematical function
and only has one method:
function y(x:Number):Number;
Every registered emotion-muscle influence is represented by a Mapping. The
y-method takes the current value of an emotion as parameter x and returns the
current tension for the muscle.
Another relation represented by Mappings is the visibility of Features. Some
features – wrinkles – only become visible when a muscle is contracted. Representing
this relation through Mappings allows fine-grained control over the opacity.
Three mapping types are currently available:
SineMapping
This form of Mapping is defined by four parameters. The function
returns y0 for x < x0 , and y1 for x ≥ x1 . For x0 ≤ x < x1 , the curve interpolates
between y0 and y1 , following the form of a sine function. This results in a smooth
transition between the two states.
Figure 6.10: SineMapping

For x < x0: y(x) = y0. For x ≥ x1: y(x) = y1. In between, for x0 ≤ x < x1:

y(x) = (0.5 · sin(π · ((x − x0)/(x1 − x0) + 1.5)) + 0.5) · (y1 − y0) + y0
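A minimal ActionScript 3 sketch of such a mapping is given below; the free function and parameter names are illustrative and do not claim to match the component's SineMapping class.

// Sketch: y0 below x0, y1 above x1, smooth sine-shaped transition in between.
function sineMapping(x:Number, x0:Number, x1:Number, y0:Number, y1:Number):Number {
    if (x < x0)  return y0;
    if (x >= x1) return y1;
    var s:Number = (x - x0) / (x1 - x0);   // 0..1 within the transition interval
    return (0.5 * Math.sin(Math.PI * (s + 1.5)) + 0.5) * (y1 - y0) + y0;
}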
GaussMapping

This mapping represents the Gaussian function and is used in
cases where a muscle is only contracted for intermediate values of an emotion, but
not for low or high values.
Figure 6.11: GaussMapping. (a) Influence of scale factor a; (b) influence of variance σ².

y(x) = a · (1 / (σ · √(2π))) · e^(−(x − µ)² / (2σ²))

The mapping takes three parameters: value = a, mean = µ and variance = σ².
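Again as an illustrative sketch (not the component's own class):

// Sketch: Gaussian mapping, contracting the muscle only for intermediate
// emotion values around the mean.
function gaussMapping(x:Number, value:Number, mean:Number, variance:Number):Number {
    var d:Number = x - mean;
    return value * (1 / Math.sqrt(2 * Math.PI * variance)) *
           Math.exp(-(d * d) / (2 * variance));
}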
PolynomialMapping

This is a direct representation of a polynomial function. It
can approximate any necessary form by increasing the order of the polynomial.
However, the function is hard to configure manually. In practice, we used the
curve-fitting methods of Grapher.app, which calculates a polynomial interpolation
of desired order for a given point set.
y(x) = aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₂x² + a₁x + a₀
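A corresponding sketch, evaluating the polynomial from a coefficient list with Horner's scheme, could look like this (again only an illustration):

// Sketch: polynomial mapping with coefficients given from lowest to highest
// order, i.e. [a0, a1, a2, ...], evaluated with Horner's scheme.
function polynomialMapping(x:Number, coefficients:Array):Number {
    var y:Number = 0;
    for (var i:int = coefficients.length - 1; i >= 0; i--) {
        y = y * x + coefficients[i];
    }
    return y;
}

// Example: y(x) = 0.1 + 0.4x + 1.2x^2; real coefficients would come from curve-fitting.
var tension:Number = polynomialMapping(0.75, [0.1, 0.4, 1.2]);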
6.4.6 Stroke styles
The shape of features is represented by splines. Stroke styles determine how the
splines are visually represented.
If no stroke style is set, the spline is simply stroked by a constant width brush.
However, in many cases, this does not deliver favourable results. Stroke styles
implement the IStrokeStyle interface. The interface’s draw method supplies the
style with the spline to be drawn.
BrushStyle
Currently, BrushStyle is the only stroke style available. It simulates
the characteristic form of a brush; thin lines at the start, getting thicker towards
the centre, and again thinner towards the end. This corresponds to the parameters
startWidth, maxWidth and endWidth.
From the spline to be stroked, two splines are derived which define the shape
of the stroke. One spline defines the upper edge, the other one defines the lower
edge. In every point of the base spline, a normal is drawn. On each normal, the
positions of the points of upper and lower are shifted; points of upper spline to the
left, points of lower spline to the right. Thus, maxWidth does not directly represent
the actual thickness of the resulting stroke, but the distance of the control points.
The concept is illustrated in Figure 6.12.
Figure 6.12: BrushStyle applied to a Cubic Bézier spline
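The idea can be sketched in ActionScript 3 as follows: sample the base spline, offset each sample along the local normal, and interpolate the width between startWidth, maxWidth and endWidth. This is a simplified, sampling-based illustration with an assumed width profile; the component's BrushStyle instead derives two edge splines by shifting spline points, as described above.

import flash.geom.Point;

// Rough sketch: points of the upper and lower stroke edge, obtained by
// offsetting samples of the base curve along its normal.
// 'curve' is a function t -> Point on the base spline, t in [0, 1].
function brushEdges(curve:Function, startWidth:Number, maxWidth:Number,
                    endWidth:Number, samples:int):Array {
    var upper:Array = [];
    var lower:Array = [];
    for (var i:int = 0; i <= samples; i++) {
        var t:Number = i / samples;
        var p:Point = curve(t);
        // approximate the tangent with a small finite difference
        var t0:Number = Math.min(t, 1 - 0.001);
        var tangent:Point = curve(t0 + 0.001).subtract(curve(t0));
        tangent.normalize(1);
        var normal:Point = new Point(-tangent.y, tangent.x);
        // assumed width profile: startWidth -> maxWidth (at the middle) -> endWidth
        var w:Number = (t < 0.5)
            ? startWidth + (maxWidth - startWidth) * (t / 0.5)
            : maxWidth + (endWidth - maxWidth) * ((t - 0.5) / 0.5);
        upper.push(new Point(p.x + normal.x * w / 2, p.y + normal.y * w / 2));
        lower.push(new Point(p.x - normal.x * w / 2, p.y - normal.y * w / 2));
    }
    return [upper, lower];
}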
6.4.7 Facedata file format
Faces are entirely defined through external files which are loaded at runtime. This
allows the development of faces which look entirely different to the standard face
we developed. Additional emotions can also be implemented.
A complete set of Facedata defines the following:
• Features, muscles and emotions, as described in the previous sections (6.4.1–6.4.3).
• Overlays, which are optional graphical elements added on top of the face to
add additional personality to the face. In the standard model, the hairdo is an
overlaid vector graphic. Pixel-based graphics can also be included.
Facedata is an XML-based file format. Currently, no graphical editor is available;
Facedata has to be edited manually. A corresponding DTD is kept up-to-date3 with
3 The latest version of the DTD can be found at http://grimace-project.net/dtd/latest.dtd
the current capabilities of Grimace and allows face developers to validate their files
through an XML validation service4 .
Since the definitions can become quite large and data have to be edited manually,
Facedata definitions can be spread across files. The loadFacedata API method takes
an array of URLs as parameter, loading the files in the supplied order.
Deployment and use
Grimace is a self-contained component which enables the addition of facial expressions to software projects. Being written in Actionscript 3, the component is
deployed as a SWF file and can be opened by Adobe Flash Player 9 and upwards.
The component can be downloaded from the project website and includes detailed
instructions and demo files.
Control of the component is offered by an API, which is compatible with JavaScript
and Actionscript 3. The recommended method is to embed Grimace into web pages
and control it through JavaScript via the API. Through embedding, Grimace can
also be controlled via Actionscript 3. Apart from pure AS3, this includes Flex
and Flash (from version CS4 upwards). The AS3 API is basically identical to the
JavaScript API but less tested.
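A hedged sketch of embedding the component from another ActionScript 3 project is shown below. Only loadFacedata is taken from the description above; the file names and the setEmotions call are hypothetical placeholders, so the API documentation in the download package should be consulted for the actual names.

package {
    import flash.display.Loader;
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.net.URLRequest;

    // Hypothetical sketch of embedding Grimace from an AS3 project.
    public class GrimaceDemo extends Sprite {
        private var loader:Loader = new Loader();

        public function GrimaceDemo() {
            loader.contentLoaderInfo.addEventListener(Event.INIT, onLoaded);
            loader.load(new URLRequest("grimace.swf")); // assumed file name
            addChild(loader);
        }

        private function onLoaded(e:Event):void {
            var face:Object = loader.content;           // the Grimace component
            // documented call: load Facedata files in the given order
            face.loadFacedata(["face.xml", "emotions.xml"]);
            // hypothetical call: blend two primaries at 75% intensity each
            face.setEmotions({joy: 0.75, surprise: 0.75});
        }
    }
}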
Customisation The download package includes a complete face in the form of
a set of Facedata XML files. We encourage the development of new faces based
on these definitions. Currently, no graphical editor is available; values need to be edited manually. However, the package also includes Facemap.swf, the tool we used to develop the face definition. The tool can show muscles and their current tension, can include underlaid pictures as reference, and can output the current state of all components.
4 e.g. http://www.validome.org/xml/
6.4.8 Class diagram
[Class diagram: the external API layer (ExternalCommands, JSHandler, ASHandler) and the central Grimace class; EmotionCore, MuscleController and FeatureController; the data input classes FacedataLoader, XMLFactory and XMLDraw; the model classes Emotion, MuscleGroup, Muscle, Feature, FeatureSegment, FeatureFill and FeatureNode; the IMapping implementations SineMapping, GaussMapping and PolynomialMapping; the ISpline implementations AbstractSpline, Line, QuadraticBezier, CubicBezier and Joiner; the IStrokeStyle implementation BrushStyle; and Geom.]
Figure 6.13: Class diagram
6.5 Results
We believe that the goal of this project has been achieved. Our component can
display all primary and secondary emotions which were depicted by McCloud.
Furthermore, primaries can be blended in arbitrary intensities, thus covering states
not covered before.
The component has been released to the public under a Creative Commons
licence. A project website, http://grimace-project.net, has been implemented.
The website features a demo application that allows visitors to express arbitrary
blendings of any two emotions. A download package is available, which includes
the component, demo applications for all supported programming environments
and comprehensive documentation on how to use the component.
Public reactions to the project were notably positive, as shown in a large number of approving comments. Scott McCloud kindly featured our project on his
blog on February 25, 2009, emphasising that facial expressions should be taught in
school, for which our project might be very useful. We are also very thankful to
Mr McCloud for his encouraging words and useful comments about our work at an
intermediate stage of the project.
The resulting face is shown in figure 6.14 with a neutral expression. Figure
6.15 shows the 6 emotional primaries at four intensity levels. In figure 6.16, any
combination of two primaries (both at 75% intensity level) is shown.
Figure 6.14: Neutral expression
Figure 6.15: Primary emotions (joy, surprise, fear, sadness, disgust, anger) in 4 intensity levels
Figure 6.16: Secondary emotion blendings of intensity level 3
6.6 Discussion and future directions
The described software component Grimace displays emotions through a comic-like
face. We believe that the display of emotional information is a valuable addition to
information resources, and facial expressions are a natural way of expressing this
kind of information. The work of McCloud (2006) was used as a guide and visual reference throughout the design and development process. We believe we have found a useful compromise between simplicity and necessary detail. We include all facial
features which are necessary to convey an emotion while omitting the rest.
The component was developed using web technology, which allows easy deployment. We have defined an API which allows convenient integration into other
projects without the need for knowledge about technical details. All configuration
data is loaded from external files, which use an XML-based file format. The file
format is fully documented and allows full customisation of all aspects – features,
muscles and emotions. The component is stable and ready for use for the intended
purpose. A project website was implemented, from which the component and documentation can be downloaded.
The expressiveness of the face model was tested in an informal study by my
colleague. Subjects were presented with basic emotions and any combination of
two emotions, resulting in 22 test faces. Subjects had to select one or two emotions
which they believed were expressed by the face. Answers were only counted as
correct when, in the case of blendings, both emotions had been selected correctly,
or, for pure emotions, only the correct emotion had been selected. The lowest
success rate was reported for disgust, only being judged correctly in 30% of the
cases on its own, and an average of 46% in blendings with other emotions. Pure
joy was identified most reliably with 93% correct answers. On average, correct
answers were given in 53% of the cases. This number may not seem to be very
high. However, even the lowest success rate is significantly higher than chance rate, which would be 1/22 ≈ 4.5%. It also disagrees with the conclusions of Etcoff and
Magee (1992) that we perceive facial expressions to belong to one category. Our
results show that humans are, to a certain extent, capable of judging blendings of
basic emotions in facial expressions.
However, the results of the study suggest there is much room for improvement in the expressiveness of the face. On the whole, we believe the goal of the project has been achieved in a satisfactory manner. Still, many areas remain to be addressed, a few of which are outlined in the following.
First of all, the current face model can be further optimised. We had to add
additional muscles to the principal facial muscles in a few cases to achieve the
desired expressiveness. However, it might be possible to reduce the number of
necessary muscles by optimising the definition of the actual muscles. Calder et al.
(2000) show that comprehensibility of facial expressions can be increased further if
the characteristic features of an expression are exaggerated. Our model has comic-like appearance, and it might be possible to make our model even more expressive
if we allow a certain level of unrealistic, cartoon-like exaggerated expressions.
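As an illustration of that idea, the following sketch (not part of Grimace; the data structures are assumed) exaggerates an expression by scaling the displacement of each feature control point away from its position in the neutral face, in the spirit of the caricaturing procedure described by Calder et al. (2000).

// Illustrative sketch: caricature an expression by amplifying the offset of
// every control point from its neutral position. A gain of 1.0 reproduces the
// expression; values above 1.0 exaggerate it. Points are assumed to be {x, y}.
function exaggerate(neutralPoints, expressionPoints, gain) {
  return expressionPoints.map(function (p, i) {
    var n = neutralPoints[i];
    return {
      x: n.x + gain * (p.x - n.x),
      y: n.y + gain * (p.y - n.y)
    };
  });
}

var neutralFace = [{ x: 10, y: 20 }, { x: 30, y: 20 }]; // toy control points
var fearFace    = [{ x: 10, y: 16 }, { x: 30, y: 24 }];
var caricature  = exaggerate(neutralFace, fearFace, 1.5);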
Customisation and extension of the current face model would become much easier if a graphical editor were available. First and foremost, such an editor should facilitate customisation of the visible features of a face. Currently, the control points for features need to be entered manually in XML files. Features are the parts which can be exchanged easily; the relationships between muscles and emotions, however, need considerable attention and are quite tedious to change.
So far, the system can only display facial expressions which represent emotional states. Of course, humans can communicate much more through their faces, which can easily be observed by studying the wide range of facial expressions which actors can display. Expressions which cannot currently be displayed include doubt or agreement. The Facial Action Coding System, or FACS, by Ekman et al. (1978) describes a comprehensive framework of all possible facial movements. If the range of possible facial expressions is to be extended, this framework would offer a good basis.
This would also mean a departure from the mirroring of facial features. Right now, facial features are completely symmetrical. In FACS, asymmetrical movement of features is possible. Ideally, the system would still mirror those parts that are symmetrical and only represent deviations from the symmetrical state where necessary; a possible approach is sketched below.
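A minimal sketch of such a scheme, under the assumption that control points are stored for the left half of the face only: the right half is obtained by reflecting each point across the vertical centre line, and an optional per-point override carries any asymmetric deviation. Names and data structures are hypothetical.

// Illustrative sketch: derive the right-hand side of a feature by mirroring the
// left-hand side, with optional asymmetric overrides for FACS-style movements.
function mirrorFeature(leftPoints, centreX, overrides) {
  return leftPoints.map(function (p, i) {
    var mirrored = { x: 2 * centreX - p.x, y: p.y }; // reflect across centre line
    var delta = overrides && overrides[i];           // asymmetric deviation, if any
    if (delta) {
      mirrored.x += delta.x;
      mirrored.y += delta.y;
    }
    return mirrored;
  });
}

var left  = [{ x: 40, y: 55 }];                            // toy eyebrow point
var right = mirrorFeature(left, 50, { 0: { x: 0, y: -2 } }); // raised slightly higher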
Chapter 7
Summary and conclusion
Summary
This thesis has examined the nature of the affective experiences that we associate
with information entities like words, pictures and music, and how these experiences
can be put to use in interactive systems.
Chapter 1 provided an introduction, outlining the motivation for this thesis and
putting it into the research context.
Chapter 2 was concerned with the nature of meaning. Sign theory, in which a signifier references a signified, formed the starting point of the analysis. Three types of meaning were distinguished: denotation, conceptual connotation and affective connotation. Denotation is the actual content, while conceptual connotation covers any objective property which is implied by an entity. Affective connotation refers to the emotions or feelings which are expressed by an entity or aroused in a recipient. Connotation cannot be described directly but only through the use of metalanguages. Work from experimental psychology shows that affective connotation can be represented well through three factors: evaluation, potency and activity.
The last part of the chapter applied the distinguished aspects of meaning to current
web technology.
Chapter 3 addressed the nature of affective connotation, comparing two theoretical approaches towards affect and emotions. Dimensional theories correspond
to the factors derived for the description of affective connotation. Categorical approaches focus on the notion of a set of basic emotions, which fulfil specific functions in human behaviour. Affect and emotion were distinguished as distinct but related phenomena, the latter being a specialised form of the former. In light of this analysis, I came to
the conclusion that general affect is best described through dimensional approaches,
while actual emotions are better captured through categorical theories.
Chapter 4 elaborated briefly on the question of how art is related to affect by presenting two theories. The contour theory states that art can express emotions because works of art resemble the expressive behaviour which results from
emotions. The expression theory focuses on the notion that recipients construct a persona, which expresses emotions as a person would. Music seems to have a special relationship with affect and was analysed in greater detail. Several ways in which music achieves its emotional expressiveness were considered, which showed that emotional expressiveness in music is more subtle than what can be captured by basic emotions. The observation that music which produces negative emotions is sometimes seen as favourable led to the differentiation of two forms of valence: evaluation and pleasure.
Chapter 5 integrated the findings of the preceding chapters, considering various
metalanguages for the description of affective connotation. Affective factors can be
simply represented as scales, which constitutes a very direct metalanguage. The most common form consists of scales which represent the evaluative factor as star ratings.
Defining scales with words turns them into semantic scales, which can be very
specific in their meaning while still being expressive of general affective factors.
Another language-based metalanguage is the use of affect words applied as tags.
An informal analysis showed that this alternative, though possible with current
implementations, is not very popular. Colours were considered as a metaphor for
affective states. Studies were compared which looked for formulas through which
affective factors and colours are related. Colour symbolism was recognised to have
a strong influence which in some cases contradicts the findings of the cited studies.
Finally, facial expressions were considered as a universal visual language for the
description of emotions. Facial expressions seem to be best explained by a categorical approach. Abstraction and exaggeration were proposed as ways to enhance the
emotional expressiveness of facial expression displays. Emoticons were analysed
and presented as a very popular form of affective space interfaces.
Chapter 6 introduced Grimace, an experimental facial expression display for emotions. The face model has a comic-like appearance and is abstracted to a level just below the need for symbols. Facial features are represented by splines, which are influenced by simulated muscles. Each of the six basic emotions contracts a different set of muscles in non-linear ways. Basic emotions can be blended, resulting in an arbitrary number of expressible emotions. The face model was implemented in ActionScript, and all configuration data are read from external XML-based files. It is deployed as a component, which can be integrated into other projects and controlled from JavaScript via an API.
Conclusion
The present work can only serve as a starting point for the development of interactive systems which target the affective experience any information entity arouses in
humans. Though many questions were left unaddressed, some useful conclusions
can be drawn.
First, there seems to be no single best solution to the question of how affect can best be captured in interactive systems. Rather, which option to pick depends on several factors. The context of the work needs to be considered: whether there are tight spatial constraints, or whether users need to input their opinion through the interface. Also, the nature of the entities to be described has strong
implications. Solutions which can be adapted for each media type, like semantic
scales, seem like a good compromise between expressiveness and simplicity.
Across all these differences, a recurrent theme was the notion of three factors which
capture much of what differentiates affective states. The ubiquity of these orthogonal factors leads to the notion that there really exists some form of affective space
in which affective experiences can be located. These three factors – let them be
called pleasure, arousal and dominance – may constitute a general metalanguage
for affect-aware systems. The simplicity of such a dimensional model limits the
expressiveness that can be achieved to some extent. However, these factors can
constitute a common ground, which enables comparisons across media types and
specific implementations.
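As an illustration of what such a common ground could look like in practice, the following sketch represents items of different media types as points in pleasure-arousal-dominance space and compares them by their distance. The items, values and function name are made up for this example and are not part of any system described in this thesis.

// Illustrative sketch only: a PAD triple as a media-independent description of
// affective connotation, compared by Euclidean distance in affective space.
function padDistance(a, b) {
  var dp = a.pleasure - b.pleasure;
  var da = a.arousal - b.arousal;
  var dd = a.dominance - b.dominance;
  return Math.sqrt(dp * dp + da * da + dd * dd);
}

var song  = { pleasure: 0.8, arousal: 0.6, dominance: 0.5 }; // an energetic track
var photo = { pleasure: 0.7, arousal: 0.5, dominance: 0.4 }; // a sunny landscape
// A small distance suggests a similar mood despite the different media types.
console.log(padDistance(song, photo));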
Emotions have a special status. As affective experiences, they can be described
through three dimensions. However, the specific meaning which every emotion
has for humans is captured better if we think in emotional categories. Just like
words have a specific meaning (denotation) and an accompanying affective experience (affective connotation), so do emotions. The functions that emotions serve
are described well if we name them specifically – this is emotion’s denotative level.
Since emotions denote affective experiences, their affective connotation is quite pronounced and easily determined. This leads me to the conclusion that dimensional
descriptions of emotions are, in fact, descriptions of the affective connotation of
emotions. This notion does not lessen the value of dimensional descriptions of
emotions; it merely puts them into an appropriate context.
Since emotions result in specific facial expressions, this intrinsic relationship can
be used to visualise emotions in interactive systems. Facial expressions can be seen
to denote specific emotions, just like the names of emotions do, but in a visual
and universally understandable way. The presented component, Grimace, serves
as a starting point for the visualisation of emotions in interactive systems. If my
conclusions are correct, all emotions which can be expressed by Grimace should
occupy a point in affective space on which a majority of people can agree. The
reverse, however, might not be true; not every point in affective space needs to result
in a distinct facial expression, because emotions are only one of several affective
phenomena. A test of this hypothesis remains to be conducted in the future.
I hope to have presented convincing evidence that affect does play an important
role in human experience and is worth recognising in the design of interactive
systems. In doing so, we would pay respect to human nature, the inseparable
duality of reason and emotion.
Bibliography
Albrecht, I., Schröder, M., Haber, J. and Seidel, H. 2005. Mixed feelings: expression of
non-basic emotions in a muscle-based talking head. Virtual Reality, 8(4):201–212.
Azuma, J. and Ebner, M. 2008. A Stylistic Analysis of Graphic
Emoticons: Can they be Candidates for a Universal Visual Language of the Future?
In: Proceedings of World Conference on Educational Media, Hypermedia and
Telecommunications (ED-Media), pp. 972–977.
Bainbridge, D., Cunningham, S. and Downie, J. 2003. How people describe their music
information needs: A grounded theory analysis of music queries. In: Proceedings of the
International Symposium on Music Information Retrieval, pp. 221–222.
Barthes, R. 1973/1996. Denotation and Connotation. The Communication Theory
Reader, pp. 129–133.
Bartlett, M., Littlewort, G., Fasel, I. and Movellan, J. 2003. Real time face detection
and facial expression recognition: Development and applications to human-computer interaction. In: CVPR Workshop on Computer Vision and Pattern Recognition for
Human-Computer Interaction.
Bartneck, C. 2001. How Convincing is Mr. Data’s Smile: Affective Expressions of Machines. User Modeling and User-Adapted Interaction, 11(4):279–295.
Bartneck, C., Kanda, T., Ishiguro, H. and Hagita, N. 2007. Is The Uncanny Valley
An Uncanny Cliff? In: Proceedings of the 16th IEEE International Symposium on
Robot and Human Interactive Communication, RO-MAN, pp. 368–373.
Bestgen, Y. 2008. Building Affective Lexicons from Specific Corpora for Automatic Sentiment Analysis. Proceedings of Language Resources and Evaluation Conference
LREC 2008.
Boal-Palheiros, G. and Hargreaves, D. 2001. Listening to music at home and at school.
British Journal of Music Education, 18(02):103–118.
Brennan, S. 1985. Caricature Generator: The Dynamic Exaggeration of Faces by Computer.
Leonardo, 18(3):170–178.
Buck, R. 1984. The communication of emotion. Guilford Press.
Calder, A., Rowland, D., Young, A., Nimmo-Smith, I., Keane, J. and Perrett, D. 2000.
Caricaturing facial expressions. Cognition, 76(2):105–146.
Castellano, G., Villalba, S. and Camurri, A. 2007. Recognising Human Emotions
from Body Movement and Gesture Dynamics. Lecture Notes in Computer Science,
4738:71.
Chen, L., Chen, G., Xu, C., March, J. and Benford, S. 2008. EmoPlayer: A media player
for video clips with affective annotations. Interacting with Computers, 20(1):17–28.
Chen, P. and Yen, J. 2008. A Color Set Web-Based Agent System for 2-Dimension Emotion
Image Space. Lecture Notes in Computer Science, 4953:113.
Christie, I. and Friedman, B. 2004. Autonomic specificity of discrete emotion and dimensions of affective space: a multivariate approach. International Journal of Psychophysiology, 51(2):143–153.
Condon, C., Perry, M. and O’Keefe, R. 2004. Denotation and connotation in the human-computer interface: The ’Save as’ command. Behaviour and Information Technology,
23(1):21–31.
D’Andrade, R. and Egan, M. 1974. The Colors of Emotion. American Ethnologist,
1(1):49–63.
Darwin, C. 1872. The Expression of the Emotions in Man and Animals. J. Murray.
Davies, S. 2001. Philosophical perspectives on music’s expressiveness. Music and emotion: Theory and research, pp. 23–44.
Derks, D., Bos, A. and Grumbkow, J. 2007. Emoticons and social interaction on the
Internet: the importance of social context. Computers in Human Behavior, 23(1):842–
849.
Derks, D., Bos, A. and von Grumbkow, J. 2008. Emoticons in Computer-Mediated
Communication: Social Motives and Social Context. CyberPsychology & Behavior,
11(1):99–101.
Ekman, P. 1957. A methodological discussion of nonverbal behavior. Journal of Psychology, 43:141–149.
Ekman, P. 1992a. An argument for basic emotions. Cognition & Emotion, 6(3 & 4):169–
200.
Ekman, P. 1992b. Facial Expressions of Emotion: New Findings, New Questions. Psychological Science, 3(1):34–38.
Ekman, P. 1994. Strong evidence for universals in facial expressions: a reply to Russell’s
mistaken critique. Psychological Bulletin, 115(2):268–287.
Ekman, P. 1999a. Basic emotions. Handbook of cognition and emotion, pp. 45–60.
Ekman, P. 1999b. Facial expressions. Handbook of cognition and emotion, pp. 301–
320.
Ekman, P. and Friesen, W. 1975. Unmasking the Face: A Guide to Recognizing
Emotions from Facial Clues. Prentice Hall.
Ekman, P., Friesen, W. and Ellsworth, P. 1972. Emotion in the Human Face: Guidelines for Research and an Integration of Findings. Pergamon.
Ekman, P., Friesen, W. and Hager, J. 1978. Facial Action Coding System.
Consulting Psychologists Press.
Erdmann, K. 1966. Die Bedeutung des Wortes: Aufsätze aus dem Grenzgebiet der
Sprachpsychologie und Logik. Wissenschaftliche Buchgesellschaft.
Espe, H. 1985. A cross-cultural investigation of the graphic differential. Journal of Psycholinguistic Research, 14(1):97–111.
Esuli, A. and Sebastiani, F. 2006. SentiWordNet: A publicly available lexical resource for
opinion mining. In: Proceedings of LREC, pp. 417–422.
Etcoff, N. and Magee, J. 1992. Categorical perception of facial expressions. Cognition,
44(3):227–40.
Faigin, G. 1990. The Artist’s Complete Guide to Facial Expression. Watson-Guptill.
Fehr, B. and Russell, J. 1984. Concept of emotion viewed from a prototype perspective.
Journal of experimental psychology. General, 113(3):464–486.
French, P. 1977. Nonverbal measurement of affect: The graphic differential. Journal of
Psycholinguistic Research, 6(4):337–347.
Friborg, O., Martinussen, M. and Rosenvinge, J. 2006. Likert-based vs. semantic
differential-based scorings of positive psychological constructs: A psychometric comparison of two versions of a scale measuring resilience. Personality and Individual Differences, 40(5):873–884.
Gabrielsson, A. and Lindstrom, E. 2001. The influence of musical structure on emotional
expression. Music and emotion: Theory and research, pp. 223–248.
Garza-Cuarón, B. 1991. Connotation and Meaning. Mouton De Gruyter.
Geller, T. 2008. Overcoming the uncanny valley. IEEE Computer Graphics and Applications, 28(4):11–17.
Groleau, T. 2002. Approximating Cubic Bezier Curves in Flash MX.
URL http://timotheegroleau.com/Flash/articles/cubic_bezier_in_flash.htm
Iser, W. 1978. The Implied Reader: Patterns of Communication in Prose Fiction
from Bunyan to Beckett. Johns Hopkins University Press.
Iwashita, S., Takeda, Y. and Onisawa, T. 1999. Expressive facial caricature drawing. In:
Fuzzy Systems Conference Proceedings, 1999. FUZZ-IEEE’99. 1999 IEEE International, volume 3.
Izard, C. 1977. Human Emotions. Plenum Pub Corp.
Izard, C. 1997. Emotions and facial expressions: A perspective from Differential
Emotions Theory. Cambridge University Press.
Juslin, P. and Sloboda, J. 2001. Music and emotion: Introduction. Music and Emotion:
Theory and Research, pp. 3–20.
Kamps, J., Marx, M., Mokken, R. and de Rijke, M. 2004. Using WordNet to measure semantic orientation of adjectives. In: Proceedings of the 4th International Conference
on Language Resources and Evaluation (LREC 2004), volume 4, pp. 1115–1118.
Kaya, N. and Epps, H. 2004. Color-emotion associations: Past experience and personal
preference. Proceedings of the AIC 2004 Color and Paints, Interim Meeting of the
International Color Association.
Kleinsmith, A. and Bianchi-Berthouze, N. 2007. Recognizing Affective Dimensions
from Body Posture. Lecture Notes in Computer Science, 4738:48.
Lowe, W. 2001. What is the Dimensionality of Human Semantic Space? In: Connectionist Models of Learning, Development and Evolution: Proceedings of the Sixth
Neural Computation and Psychology Workshop, Liege, Belgium, 16-18 September 2000. Springer.
Mathes, A. 2004. Folksonomies-Cooperative Classification and Communication Through
Shared Metadata. In: Computer Mediated Communication, LIS590CMC (Doctoral
Seminar), Graduate School of Library and Information Science, University of Illinois Urbana-Champaign, December.
URL http://adammathes.com/academic/computer-mediated-communication/folksonomies.pdf
McCloud, S. 2006. Making Comics: Storytelling Secrets of Comics, Manga and
Graphic Novels. HarperPerennial.
McKay, C. and Fujinaga, I. 2004. Automatic genre classification using large high-level
musical feature sets. In: Proceedings of the International Conference on Music
Information Retrieval, volume 525, p. 30.
Mehrabian, A. 1996. Pleasure-arousal-dominance: A general framework for describing and
measuring individual differences in Temperament. Current Psychology, 14(4):261–292.
de Melo, C. and Paiva, A. 2007. Expression of Emotions in Virtual Humans Using Lights,
Shadows, Composition and Filters. Lecture Notes in Computer Science, 4738:546.
Mikunda, C. 2002. Kino spüren: Strategien der emotionalen Filmgestaltung. WUV-Universitätsverlag.
Moore, A. 2001. Categorical conventions in music discourse: style and genre. Music and
Letters, 82(3):432–442.
Mori, M. 1970. The Uncanny Valley. Energy, 7(4):33–35.
URL http://graphics.cs.ucdavis.edu/~staadt/ECS280/Mori1970OTU.pdf
Norman, D. 2004. Emotional Design: Why We Love (or Hate) Everyday Things.
Basic Books.
Oatley, K. 1992. Best Laid Schemes: The Psychology of Emotions. Cambridge
University Press; Paris: Editions de la Maison des sciences de l’homme.
Oatley, K. and Jenkins, J. 1996. Understanding Emotions. Blackwell Publishers.
Ochs, M., Niewiadomski, R., Pelachaud, C. and Sadek, D. 2005. Intelligent expressions of emotions. In: 1st International Conference on Affective Computing and
Intelligent Interaction ACII. Springer.
Ogden, C., Richards, I. and Eco, U. 1923/1969. The Meaning of Meaning: A Study
of the Influence of Language Upon Thought and of the Science of Symbolism.
Routledge & Kegan Paul, Ltd.
Ortony, A. and Turner, T. 1990. What’s basic about basic emotions. Psychological
Review, 97(3):315–331.
Osgood, C. 1964. Semantic differential technique in the comparative study of cultures.
American Anthropologist, pp. 171–200.
Osgood, C. 1976. Focus on Meaning. Mouton.
Osgood, C., Suci, G. and Tannenbaum, P. 1957. The Measurement of Meaning.
University of Illinois Press.
Ou, L., Luo, M., Woodcock, A. and Wright, A. 2004. A Study of Colour Emotion
and Colour Preference. Part I: Colour Emotions for Single Colours. Color Research &
Application, 29(3):232–240.
Pan, X., Gillies, M., Sezgin, T. and Loscos, C. 2007. Expressing Complex Mental States
Through Facial Expressions. Lecture Notes in Computer Science, 4738:745.
Pandzic, I. and Forchheimer, R. 2002. MPEG-4 Facial Animation: The Standard,
Implementation and Applications. Wiley.
Peter, C. and Herbon, A. 2006. Emotion representation and physiology assignments in
digital systems. Interacting with Computers, 18(2):139–170.
Picard, R. 1997. Affective Computing. MIT Press.
Plutchik, R. 1980. Emotion: A Psychoevolutionary Synthesis. Harper & Row, New
York.
Robinson, J. 2005. Deeper Than Reason: Emotion and Its Role in Literature, Music,
and Art. Oxford University Press.
Russell, J. 1980. A circumplex model of affect. Journal of Personality and Social Psychology, 39(6):1161–1178.
Russell, J. 1994. Is There Universal Recognition of Emotion From Facial Expression? A
Review of the Cross-Cultural Studies. Psychological Bulletin, 115:102–102.
de Saussure, F. 1916/1959. Course in general linguistics. Philos. Library, New York.
Scaringella, N., Zoia, G. and Mlynek, D. 2006. Automatic genre classification of music
content. Signal Processing Magazine, IEEE, 23(2):133–141.
Scherer, K. 2001. Foreword. Music and emotion: Theory and research.
Scherer, K. and Zentner, M. 2001. Emotional effects of music: Production rules. Music
and emotion: Theory and research, pp. 361–392.
Schlosberg, H. 1952. The description of facial expressions in terms of two dimensions.
Journal of experimental psychology, 44(4):229–37.
Schlosberg, H. 1954. Three dimensions of emotion. Psychological Review, 61(2):81–8.
Schubert, E. 2001. Continuous measurement of self-report emotional response to music.
Music and emotion: Theory and research, pp. 393–414.
Schubert, E. 2004. EmotionFace: Prototype facial expression display of emotion in music.
In: Proc. Int. Conf. On Auditory Displays (ICAD).
Schubert, E. 2007. The influence of emotion, locus of emotion and familiarity upon preference in music. Psychology of Music, 35(3):499.
Schubert, E. and Fabian, D. 2006. The dimensions of baroque music performance: a
semantic differential study. Psychology of Music, 34(4):573.
Sloboda, J. and Juslin, P. 2001. Psychological perspectives on music and emotion. Music
and emotion: Theory and research, pp. 71–104.
Solomon, R. 1993. The philosophy of emotions. Handbook of emotions, pp. 3–15.
Spindler, O. 2006. Managing Music. Unpublished undergraduate thesis.
Spiteri, L. 2007. Structure and form of folksonomy tags: The road to the public library
catalogue. Information Technology and Libraries, 26(3):13–25.
Strongman, K. 1996. The Psychology of Emotion: Theories of Emotion in Perspective. John Wiley & Sons Ltd.
Suk, H. 2006. Color and emotion. Ph.D. thesis, Universität Mannheim, Allgemeine
Psychologie.
URL http://madoc.bib.uni-mannheim.de/madoc/volltexte/2006/1336/pdf/version_11.0.pdf
Tanahashi, S. and Kim, Y. 1999. A comic emotional expression method and its applications. In: TENCON 99. Proceedings of the IEEE Region 10 Conference, volume 1.
Tarrant, M., North, A. and Hargreaves, D. 2000. English and American Adolescents’
Reasons for Listening to Music. Psychology of Music, 28(2):166.
Urban, W. 1939. Language and reality. Allen & Unwin London.
Valdez, P. and Mehrabian, A. 1994. Effects of color on emotions. Journal of experimental psychology. General, 123(4):394–409.
Walther, J. and D’Addario, K. 2001. The Impacts of Emoticons on Message Interpretation
in Computer-Mediated Communication. Social Science Computer Review, 19(3):324.
Wang, C. 1993. Langwidere: A Hierarchical Spline Based Facial Animation System
with Simulated Muscles. Ph.D. thesis, University of Calgary.
Woodworth, R. and Schlosberg, H. 1954. Experimental Psychology. Methuen.
Young, A., Rowland, D., Calder, A., Etcoff, N., Seth, A. and Perrett, D. 1997. Facial
expression megamix: Tests of dimensional and category accounts of emotion recognition.
Cognition, 63(3):271–313.
Zhang, S., Wu, Z., Meng, H. and Cai, L. 2007. Facial Expression Synthesis Using PAD
Emotional Parameters for a Chinese Expressive Avatar. Lecture Notes in Computer
Science, 4738:24.
Zorn, J. 1999. Arcana: Musicians on Music. Granary Books.