Faculty of Informatics

Affective space interfaces

Diploma thesis (Diplomarbeit) submitted in partial fulfilment of the requirements for the academic degree of Diplom-Ingenieur in the programme Media Informatics (Medieninformatik) by Oliver Spindler, matriculation number 0100611, at the Faculty of Informatics of the Technische Universität Wien.
Supervisor: Ao.Univ.Prof. Dr. Peter Purgathofer
Vienna, 20 March 2009

(Signature of author) (Signature of supervisor)

I hereby declare that I have written this thesis independently, that I have fully listed all sources and aids used, and that I have clearly marked as borrowings, with the source given, all passages of this work – including tables and figures – that have been taken verbatim or in substance from other works or from the Internet.

Vienna, 20 March 2009

Kurzfassung

The meaning we ascribe to information entities such as words, pictures, music and films lies not only in their rational value but just as much in the feelings these things evoke in us. So far, interactive systems have largely confined themselves to describing the actual content, the denotation, while the emotions we associate with information entities, their connotation, are usually ignored. Taking emotional meaning into account would open up new ways of structuring and discovering content. Users could find content that matches their mood, regardless of media type. This thesis examines various ways in which emotional meaning can be represented in interactive systems. The approaches presented include the use of language, colours and facial expressions. Work from the disciplines of semiology, experimental psychology, art theory and interaction design forms the theoretical basis for this investigation. A comic-like face model has been developed which visualises emotions through facial expressions. The resulting software component can be deployed in web browsers and follows the behaviour of real faces by simulating facial muscles. Six basic emotions can be blended to express specific and subtle emotional states.

Abstract

The meaning we ascribe to information entities – words, pictures, music, video, etc. – lies not only in their rational value but just as much in the feelings they evoke in us. Currently, interactive systems focus on the description of the actual content, or denotation, while the associated emotions, the affective connotation of content, are usually neglected. Embracing the affective meaning of media entities would make it possible to structure content in novel ways, emphasising similarities and differences across media types which are not covered by textual content descriptions, and would allow users to find content that matches an intended mood. To this end, I examine work from diverse fields such as semiology, experimental psychology, art theory and human-computer interaction to form the theoretical basis for the use of this aspect of meaning in interactive systems. Building on this theoretical basis, several ways to visualise and describe affective connotation are compared, including the use of language, colours and facial expressions.
Additionally, a software component has been developed which visualises affective connotation through facial expressions of a comic-like face. The face model uses web technology for easy deployment in browser environments and simulates the behaviour of biological faces by following a muscle-based approach. It is capable of expressing subtle emotional changes through arbitrary blending of six basic emotions.

Thank you...
to everyone who made this work possible, especially...
to Scott McCloud, for inspiring work and support
to Tony Bryant, for supervision during my exchange semester
to the people at igw/hci at Vienna UT
to Peter Purgathofer, for years of inspiration and guidance
to my friends and family, especially...
to Thomas, for accommodating me and great cooperation
to my parents, for a lifetime of understanding and support

Contents
1 Introduction . . . 8
1.1 Research question and chapter outline . . . 10
2 Meaning . . . 11
2.1 Meaning in linguistics and semiology . . . 11
2.2 Meaning in experimental psychology . . . 15
2.3 Meaning in information resources . . . 20
3 Affect and emotion theories . . . 23
3.1 Emotion, mood or affect? . . . 24
3.2 Induction and communication of emotion . . . 25
3.3 Dimensional vs. categorical approach . . . 26
3.4 Discussion . . . 31
4 Affect in art and media . . . 34
4.1 General theories . . . 34
4.2 Music . . . 40
4.3 Discussion . . . 45
5 Metalanguages for affective space . . . 46
5.1 Affective scales . . . 47
5.2 Semantic scales . . . 49
5.3 Affect words in natural language . . . 52
5.4 Colours . . . 57
5.5 Facial expressions . . . 64
5.6 Discussion . . . 75
6 Grimace . . . 76
6.1 Related work . . . 77
6.2 Design . . . 79
6.3 Development . . . 82
6.4 Technical details . . . 85
6.5 Results . . . 98
6.6 Discussion and future directions . . . 101
7 Summary and conclusion . . . 103
Bibliography . . . 106

Chapter 1
Introduction

We do not just think, we feel. In fact, one cannot be without the other. As sentient beings, our cognition always involves both reason and emotion, working hand in hand. The meaning we ascribe to all things lies as much in their rational value as in the feelings we associate with them.

Computers are neither sentient nor capable of thinking in the true sense of the word.
The artificial intelligence community is working hard to change this, but at their core, computers remain machines, calculators, which describe the world through mathematics and logic. The web has made it possible to find comprehensive information about virtually any topic; all it takes is to enter one or a few keywords into a search engine. In this way, computers help us to overcome our limited processing and memory capabilities and support reasoning.

Affective experiences – feelings, moods and emotions – are much more difficult to grasp. Being internal sensations, they are not ‘there’, not tangible. And yet, they accompany us in everything we do and think, serving as implicit yet ubiquitous guides in our lives. Emotions do not follow mathematical laws, nor can they be predicted by logic. Perhaps this is the reason why there have been so few attempts to acknowledge this part of meaning in computer science.

However, emotions present a big opportunity if we strive to increase the usefulness of computers. If software supported not only what things mean on a rational level, but also how we feel about these things, this should greatly improve the ways in which content can be structured, queried and presented.

This thesis is an attempt to show how this goal might be achieved. To this end, I build on work from a diverse range of scientific fields, primarily semiotics and linguistics, (experimental) psychology, art theory and, of course, human-computer interaction (HCI). As is the case with most interdisciplinary work, a main challenge was to reconcile different traditions and specialised terminology to form a coherent argument. While I have taken care to do justice to all theories consulted, it seems quite inevitable that I have oversimplified matters at one point or another.

Affective computing is a recent yet already thriving research area in the HCI community. It can be seen as an effort to recognise the importance of affect and emotions in interaction with computers. Affective computing strives to create interactive systems which register the user’s emotional state and adapt their behaviour and output in a way which is appropriate for the user. When computers behave like human conversational partners, both exhibiting and reacting to emotions, the interaction experience should become more pleasant and effective for users. This goal has been summarised as “making machines less frustrating to interact with” (Picard, 1997, p. 214).

This thesis, too, is concerned with the role of affect in human-computer interaction. As such, it may be categorised as an effort in affective computing. However, both the starting point of this investigation and its goals differ considerably from those of affective computing. Rather than seeing affect as an atmospheric facilitator of interaction, this thesis examines how affect can be a focal point of discourse when we interact with computers. I build on the hypothesis that the way we feel about things is as important as what these things mean on a rational level. Thus, this thesis treats affect as an intrinsic property of any meaningful entity. Entities which seem to be entirely unrelated at first glance may evoke very similar affective responses in humans. The term meaningful entity is deliberately general. It encompasses words and larger linguistic units, various kinds of art and media and any other form human creativity takes.
1.1 Research question and chapter outline

This thesis addresses the question, “What is the nature of the affective experiences we associate with meaningful entities, and how can these experiences be described and utilised in interactive systems?” The chapters of this thesis deal with the various issues on the way to this goal. Chapters 2–4 deal with the first part of the research question, while chapters 5 and 6 focus on the second part.

• Chapter 2 examines how affect is related to meaning. This brief initial analysis introduces the concepts of affective connotation and affective space. It serves as necessary theoretical groundwork for the subsequent chapters.
• Chapter 3 introduces theories of affect and emotion. Understanding the psychological and biological nature of affect is needed to inform the design of affect-aware interactive systems.
• Chapter 4 examines the special relation of affect and the arts and tries to explain how different media types manage to express and arouse emotions.
• Chapter 5 compares several ways in which affect can be described and utilised in interactive systems. Examples include affect words, facial expressions and colours.
• Chapter 6 describes Grimace, an experimental affective space interface, which visualises emotions through facial expressions of a comic-like face.

Chapter 2
Meaning

This thesis builds on the proposition that affect is an intrinsic property of meaning. This chapter demarcates the kind of meaning I refer to, how its relationship with affect might come about, and why I consider it to be of particular importance for the design of interactive systems. To this end, theories about meaning from two different scientific fields are compared. Theories from the domains of linguistics and semiology are consulted first, laying the theoretical groundwork for work from the area of experimental psychology. Finally, I outline what implications these theories have in the domain of interactive systems and information resources.

2.1 Meaning in linguistics and semiology

The task of defining the meaning of ‘meaning’ is a difficult one and spans many centuries of scholarly debate. In their classical treatise The Meaning of Meaning, Ogden et al. (1923/1969) identified no fewer than 16 different groups of definitions for ‘meaning’ which have been put forward by various authors. I see manifold reasons for such disagreement. First of all, different disciplines ask for different working definitions. Model-like explanations reduce the complexity of a problem in order to illustrate the issue at hand. Most importantly, however, discussion of meaning leads to discussion of the very nature of sense-making and is thus shaped by our view of the world. In this way, the actual meaning of ‘meaning’ depends on the context the term is used in.

The Oxford English Dictionary supplies us with a valuable starting point for finding a working definition for this thesis. One of the several definitions the OED offers for ‘meaning’ is, “That which is intended to be or actually is expressed or indicated.” (OED: meaning) This definition can be analysed with the use of semiotic theory.

A linguistic sign was defined by de Saussure (1916/1959) as a dyadic relationship, in which a signifier signifies a signified. He uses this relation signifier > signified as the basis of his investigation.
A signifier is the “sound image”, the form a word takes, while the signified is the “concept”, referent, or, we might say, “that which is expressed or indicated”. Thus the definition above from the OED clearly focuses on the signified. There is significant disagreement between authors as to whether ‘meaning’ lies in the relation signifier > signified, or in the signified itself (Ogden et al., 1923/1969, p.185).

Homonyms and synonyms

It is easy to show that the relationship between signifiers and signifieds is less than clear in most cases. For instance, one signifier can point to multiple signifieds. The signifier fluke can refer to a type of fish, a type of flatworm, the hooks of an anchor, a manufacturer of electronic test equipment or a novel of this name. In linguistics, a sign relation in which one signifier points to multiple signifieds simultaneously is called a homonym. On the other hand, completely different signifiers can refer to the very same signified. movie and film, or baby and infant, are but two of countless examples. Signifiers which point to identical signifieds are synonyms.

The same model signifier > signified can be applied equally well to non-linguistic signs. The word hammer refers to an object with a heavy, sturdy top part and a handle, which is designed to apply accumulated manual force onto another object. Now consider a hammer icon, a non-linguistic signifier, which arguably signifies the same signified as the word hammer.

Whether we use linguistic or non-linguistic signs, we use them in a world already filled with signs. Usually, we have many alternative signs to choose from. Synonymous and homonymous signs are examples where sign meanings overlap. When a sign has another meaning, we are or can be made aware of this fact. Consider puns, which are but one example where this ambiguity is used for humorous effect. Therefore, our working definition of ‘meaning’ must cover both readings of “that which is intended to be or actually is expressed or indicated”. Also note that our definition of meaning does not imply the use of language for the process of signification. Our definition now includes two different kinds of ‘meaning’: the intended meaning and the actually expressed meaning. If signs were fully explained by a simple signifier > signified relation, no such ambiguity would be possible, and intended meaning would always equal actually expressed meaning.

Historical view: two aspects of meaning

Garza-Cuarón (1991) gives a comprehensive account of the history of the concept of ‘meaning’. Throughout history, there has been a clear tendency by scholars to distinguish between a first and a second meaning. The basis for this distinction, however, has changed many times. Since mediaeval times, adjectives, or “connotative terms”, were said to have two meanings. Firstly, an adjective refers to the subject which possesses the quality indicated by the adjective. Secondly, an adjective indicates a quality which a subject possesses. James Mill (1829/1878, cited in Garza-Cuarón, 1991) reverses the mediaeval distinction, extends the definition to verbs and introduces new names. Notation refers to the quality indicated by an adjective or the action indicated by a verb. Connotation refers to the subject which possesses the indicated quality or performs the action. John Stuart Mill (1843/1973, cited in Garza-Cuarón, 1991) finally introduces the terms denotation and connotation.
Denotation refers to all subjects a word applies to; connotation refers to the attribute which is implied through a word. For instance, “the word white denotes all white things, as snow, paper, the foam of the sea, etc., and implies, or in the language of the schoolmen, connotes, the attribute whiteness” (Mill 1843/1973, cited in Garza-Cuarón, 1991). The distinction between ‘connotation’ and ‘denotation’ as the two primary aspects of ‘meaning’ was very influential and has been employed ever since.

Defining denotation and connotation

The Oxford English Dictionary offers, among several others, this definition for denotation: “A term employed to denote or describe a thing; a designation.” (OED: denotation) This definition can be seen as a valid description of the simple sign relation (signifier > signified, or “sound image” references “concept”).

In the traditional view of ‘connotation’, introduced by John Stuart Mill, the term refers to the attributes that are implied when we refer to a specific signified. For instance, reference to the mythical figure Hercules (denotation) implies features like strong, male, mythical (connotation). Urban (1939) suggests the name conceptual connotation for this tradition.

In the first half of the 20th century, a new usage of connotation clearly emerges. Through the contributions of Ogden et al. (1923/1969), Erdmann (1925/1966, cited in Garza-Cuarón 1991) and Urban (1939), ‘connotation’ now refers to something less clearly defined than implied attributes. Urban speaks of “the feeling or emotion with which the word is bound up as an expression” (Urban, 1939, p. 141), and Osgood deals with “connotative, emotive, or metaphorical ‘meaning’” (Osgood et al., 1957, p. 321). According to Garza-Cuarón (1991), connotation has always had this meaning in everyday English, but the 20th century saw the embrace of this tradition in scientific debate. In this context, it is common to refer to ‘emotive meaning’ or the ‘emotive tradition’. However, we will see in the next chapter that this leads to a quite ambiguous view of the word ‘emotion’. Hence I refer to this aspect of connotation via the more general term affective connotation.

Thus, there are two very different views of what ‘connotation’ is. The term either refers to implied properties of a sign (conceptual connotation) or to the feelings which are aroused by or somehow related with a sign (affective connotation). The Oxford English Dictionary defines ‘connotation’ in the following way: “The signifying in addition; inclusion of something in the meaning of a word besides what it primarily denotes; implication.” (OED: connotation) Arguably, both traditions, ‘conceptual connotation’ and ‘affective connotation’, are covered by this definition.

Let us return to the working definition of meaning from before: “that which is intended to be or actually is expressed or indicated”. Denotation refers to the intended meaning of a sign. However, something else is actually expressed along the way too, which can be summed up as connotation.

Semiotic view: Denotation, connotation and metalanguage

Barthes (1973/1996) gives an explanation of how denotation and connotation are related. The sign model used by him is very similar to the one introduced above. A system of signification (or sign) consists of a plane of expression (or signifier), which references a plane of content (or signified). For reasons of consistency, I will stick with de Saussure’s terminology.
He maintains that connotation occurs when one sign (the denotation) becomes the signifier of a new sign (the connotation). Figure 2.1 illustrates this principle.

Figure 2.1: Denotation and connotation. Adapted from Barthes (1973/1996).

For example, consider the sign ‘machine’. Generally, the signifier machine refers to devices which convert energy into some form of activity. This is the denotative level. However, there is more to ‘machine’ than this plain statement. Machines can symbolise industrialisation and thus inhumanity, loss of jobs, generally unpleasant notions. However, they can also evoke positive associations with, say, an increased living standard through automation. It is not possible to discern whether the signifier or the signified provokes these associations; thinking of or seeing a machine can have the same effect as hearing machine. For now, we can only say that the sign ‘machine’ evokes the associations. (Chapter 4 elaborates on this question.)

When this sign evokes something else, it actually must have become the signifier of a new system. The new signifier references ‘something else’, which must therefore be another signified. This new system may be called the connotative level. Therefore, “a connotated system is a system whose plane of expression is itself constituted by a signifying system” (Barthes, 1973/1996, p.129).

Before we can examine what this something else, the connotated signified, actually is, we need the semiotic groundwork for this analysis, for it is not obvious how to describe the signified of a connotative system. On the denotative level, we can find actual entities in our surrounding world which we can later reference through the use of words. However, this is not possible on the connotative level, for this system’s signified is intangible, with no manifest object to refer to. So, the best we can do is to refer to a connotated signified by proxy; we need a new sign which reliably stands for the sought-after signified. An obvious choice is the denotative sign which revealed the connotation’s existence in the first place. However, this does not allow us to investigate connotation any further; we can still only refer to a ‘something else’ which we can feel. What we therefore need is a new system which substitutes the connotated signified we seek to describe with a new sign of which we know both signifier and signified. Barthes calls this substitution metalanguage. The principle is illustrated in figure 2.2. Hence, “a metalanguage is a system whose plane of content is itself constituted by a signifying system” (Barthes, 1973/1996, p.130).

Figure 2.2: Connotation and metalanguage. Adapted from Barthes (1973/1996).

In the course of this thesis, we will encounter various metalanguages which have been put forward by different authors to describe connotation.
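For readers who think in terms of data structures, Barthes’ layered view can be sketched as a recursive signifier/signified pair. The following Python fragment is only an illustration of the structure shown in figures 2.1 and 2.2; the class name and the example strings are my own assumptions, not part of Barthes’ account.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Sign:
    signifier: Union[str, "Sign"]  # plane of expression
    signified: Union[str, "Sign"]  # plane of content

# Denotative level: the word 'machine' refers to energy-converting devices.
denotation = Sign(signifier="machine",
                  signified="device that converts energy into activity")

# Connotation: the whole denotative sign becomes the signifier of a new sign,
# whose signified is the intangible 'something else' that is evoked.
connotation = Sign(signifier=denotation,
                   signified="felt associations (e.g. inhumanity, progress)")

# Metalanguage: a new sign whose plane of content is itself a signifying
# system; it stands in, by proxy, for the connotated signified.
metalanguage = Sign(signifier="a known descriptive sign (e.g. a rating scale)",
                    signified=connotation)
```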
2.2 Meaning in experimental psychology

In 1957, Osgood et al. were the first to attempt quantitative measurement of ‘meaning’. The task of measuring meaning is a bold one, for, as the authors admit, “[t]here are at least as many meanings of ‘meaning’ as there are disciplines which deal with language” (Osgood et al., 1957, p.2). The use of the very general term ‘meaning’ for their efforts proved to be controversial, and it is important to examine what kind of ‘meaning’ Osgood et al. were dealing with.

Their own definition is based on “the distinction between what has variously been called denotative, designative, or referential ‘meaning’ and what has been called connotative, emotive, or metaphorical ‘meaning’.” (Osgood et al., 1957, p.321) Once again, we see the distinction between denotation and connotation. Osgood et al. are not concerned with denotation: “[We] are not providing an index of what signs refer to, and if reference or designation is the sine qua non of meaning, as some readers will insist, then they will conclude that this book is badly mistitled.” (Osgood et al., 1957, p.321, italics theirs) Instead, their focus lies on “connotative, emotive, or metaphorical ‘meaning’”. Apart from this statement, no reference is given about which literature their understanding is based on. However, they clearly subscribe to the new current of affective connotation introduced before, the psychological association of ideas with linguistic or non-linguistic stimuli. They do not mention the traditional use of the term ‘connotation’, which I refer to as ‘conceptual connotation’, at any point in the book. The general term ‘meaning’ is used in the specific sense of affective connotation throughout the book.

This “seemingly peculiar use of connotation” (Garza-Cuarón, 1991, p.106, italics hers) triggered heavy criticism. “[The linguist Uriel] Weinreich is criticising Osgood for his ignorance of the studies on the subject of meaning.” (Garza-Cuarón, 1991, p.108) Osgood admits not to be “as sophisticated as I probably should be with respect to philosophical and linguistic semantics” (Osgood 1959, cited in Garza-Cuarón, 1991, p.107), but also reminds his critics of the longstanding debate about the definition of meaning and connotation. In my opinion, giving the book a more specific title would have avoided a lot of unnecessary controversy, which actually was anticipated by the authors (Osgood et al., 1957, p.320).

The semantic differential

Osgood et al. (1957) introduce the semantic differential, a tool devised for the quantitative measurement of affective connotation. Test subjects are supplied with a linguistic stimulus and with a number of seven-step scales. On every scale, the two extremes are marked with bipolar adjectives, i.e. pairs of adjectives with antonymous or opposite meaning. Test subjects are asked to rate the presented stimulus along the scales by ticking each of them at a position which feels appropriate. If the left end of the scale is marked with A and the right end is marked with B, the seven steps mean ‘extremely A’, ‘quite A’, ‘slightly A’, ‘neutral’, ‘slightly B’, ‘quite B’, ‘extremely B’.

In a way, semantic differentials are similar to Likert scales, which are still popular in questionnaires. On a Likert scale, the extremes are not defined by bipolar adjectives. Instead, subjects rate their level of agreement with the presented statement; the scale ranges from ‘totally disagree’ to ‘completely agree’. Although it has been noted that semantic differentials and Likert scales do not deliver equal results (Friborg et al., 2006), the concept is very similar. The semantic differential has proven to be widely influential and has become a standard method in psychological questionnaires.
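To make the rating procedure concrete, the following minimal Python sketch codes the seven steps of a scale as integers from -3 to +3, which is one common convention; the adjective pairs and the sample answer are invented for illustration and are not data from Osgood et al. (1957).

```python
# A minimal sketch of numerically coding semantic-differential ratings.
# Assumptions: steps are coded -3..+3; the scales and the answer are invented.
STEPS = {
    "extremely A": -3, "quite A": -2, "slightly A": -1, "neutral": 0,
    "slightly B": 1, "quite B": 2, "extremely B": 3,
}

SCALES = [("good", "bad"), ("hard", "soft"), ("active", "passive")]

def code_rating(ticks):
    """Convert one subject's ticked step labels (one per scale) into numbers."""
    return {f"{a}-{b}": STEPS[tick] for (a, b), tick in zip(SCALES, ticks)}

# One hypothetical subject rating the stimulus word 'machine':
print(code_rating(["quite B", "slightly A", "quite A"]))
# -> {'good-bad': 2, 'hard-soft': -1, 'active-passive': -2}
```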
Initially devised by Osgood et al. for use with linguistic stimuli (i.e. words), semantic differentials can also be used for non-linguistic stimuli. The basic referential function of signs, i.e. ‘denotation’, is not necessarily dependent on the mode of the sign, and, as shown before, two completely different signifiers can refer to the very same signified. For instance, French (1977) and Espe (1985) suggest a graphic differential for cross-cultural research, which uses non-linguistic icons instead of antonymous adjectives to describe the scales. In my undergraduate thesis (Spindler, 2006), I used semantic differentials for musical stimuli in a web-based quantitative study.

Major factors

Osgood et al. (1957) used the semantic differential in several studies with a large number of test subjects. With their test results, they performed a factor analysis. When different scales are rated similarly to other scales, this indicates similarity or overlap in the connotation of the scales. Factor analysis finds these similarities and shows underlying dependencies between scales. It reduces the number of dimensions by extracting factors which account for the statistical variance. Three factors or dimensions were identified to be of greatest importance. They recurred in every study and were subsequently named (Osgood et al., 1957, pp.62-63).

Evaluation is the most important factor. It “accounts for approximately half to three-quarters of the extractable variance” (Osgood et al., 1957, p.72). The adjective pair that received the purest results for this factor was good-bad. This means that almost all of this pair’s affective association is covered by evaluation. Other examples named are optimistic-pessimistic, positive-negative and complete-incomplete. Therefore, more than half of a word’s ‘affective connotation’ is determined by how favourable or unfavourable we perceive its denoted signified to be. They conclude that “the attitudinal variable in human thinking [...] appears to be primary – when asked if she’d like to see the Dinosaur in the museum, the young lady from Brooklyn first wanted to know, ‘Is it good or is it bad?’” (Osgood et al., 1957, p. 72)

Potency is connected with power and related concepts like size or weight. The pivot pair (the purest scale) named by Osgood et al. is hard-soft. Other examples are heavy-light, masculine-feminine or strong-weak.

Activity is the third factor, “concerned with quickness, excitement, warmth, agitation and the like” (Osgood et al., 1957, p.73). The pair active-passive was determined as the pivot pair; other examples are excitable-calm and hot-cold.

Potency and activity are of similar importance to the affective connotation of a word, each of them accounting for approximately half the variance of evaluation. Osgood et al. also found a slight correlation between the two factors. For that reason, they also suggest that one might combine the two factors under the name dynamism. As can be seen in figure 2.3, many other factors have been extracted. Osgood et al. suggest names for some of these factors but could not identify them as stable across all studies. Though the first three factors account for much of a term’s affective connotation, one needs to bear in mind that a three-dimensional description cannot be exhaustive (p. 323). The authors do not believe, however, that a very high number of dimensions would finally lead to a description of a word’s denotation.

Figure 2.3: “Relative importance of semantic space dimensions” (per cent of total variance per factor in order of extraction: Evaluation, Potency, Activity, further factors, Dynamism). Adapted from Osgood et al. (1957, p.73).
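The following sketch illustrates the kind of dimensionality reduction involved. It uses a plain principal component analysis via singular value decomposition as a stand-in for the factor-analytic procedure of Osgood et al.; the rating matrix is random, so the extracted components carry no substantive meaning and merely show the mechanics.

```python
import numpy as np

# Illustration only: rows = stimuli, columns = bipolar scales. The data are
# random placeholders for real semantic-differential ratings, and PCA is used
# as a simple stand-in for factor analysis.
rng = np.random.default_rng(0)
ratings = rng.integers(-3, 4, size=(50, 20)).astype(float)  # 50 stimuli, 20 scales

centred = ratings - ratings.mean(axis=0)
_, singular_values, components = np.linalg.svd(centred, full_matrices=False)

variance = singular_values ** 2
share = variance / variance.sum()

# With real data, the first components would play the role of Evaluation,
# Potency and Activity: scales that load strongly on the same component
# were rated similarly across stimuli.
for i in range(3):
    print(f"factor {i + 1}: {share[i]:.1%} of variance, "
          f"highest-loading scale: column {abs(components[i]).argmax()}")
```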
However, the factors uncovered by Osgood et al. show something fundamental about human sense-making. Valdez and Mehrabian (1994) note that the same factors were also replicated for other kinds of stimuli like paintings and sonar signals. Consequently, the factors play an important part in many theories of affect and will recur on many occasions throughout this thesis.

Personal and cultural subjectivity

Affective experiences are inherently subjective. The statements obtained by the semantic differential vary from person to person and across cultures, depending on the personal views and values of the person asked. Osgood (1964) conducted a cross-cultural study to address these issues, focusing on 12- to 16-year-old males from 16 different countries. First of all, concepts differ considerably in their polarisation, which is calculated as the average distance of ratings from the centre of the scales. Some concepts evoke much stronger affective reactions than others. For instance, mother evokes strong affective reactions in all cultures, while wednesday has low affective intensity everywhere. Furthermore, the level of polarisation can be different across cultures. For instance, the concept of guilt did not evoke strong affective reactions in US-Americans and Indians, but very strong reactions in three other cultures.

His analysis further shows that the level of subjectivity varies strongly across tested concepts, represented by different values for standard deviation. Osgood sees this as an indicator of how much a concept is stereotyped. For some words, considerable agreement is reached, which Osgood calls “culturally stereotyped concepts”. On the other hand, concepts for which very different answers were given are “culturally amorphous”. Furthermore, the level of stereotyping varies cross-culturally. Some of the tested concepts achieved quite consistent ratings in some cultures and highly diverse ones in other cases. Despite all the cultural differences, the notion of the three basic factors EPA could be reproduced in all cases. While actual affective connotation varies interpersonally and cross-culturally, the ways in which it can be described seem to be very consistent.

Affective space

Evaluation, Potency and Activity (EPA) have proven to be stable factors in the description of affective connotation. They are defined as independent, orthogonal dimensions of a Euclidean space. Thus, they can be seen to span a Cartesian coordinate system. Osgood et al. call this coordinate system ‘semantic space’. However, ‘semantics’ is, like ‘meaning’, a very general term which can be seen to cover both denotation and affective connotation. Osgood et al. note, as said before, that their work does not capture the denotative function of signification, which can be illustrated by an example. The signs success, nurse and sincere refer to different, not necessarily related things (denotation). However, they yield very similar results for the factors EPA (Osgood et al., 1957, p.323), which means that our affective reaction towards these concepts is very similar (affective connotation). In this way, they show a relation in affect between these concepts, a similarity not covered by their lexical definition. Dimensional theories of affect and emotion (see chapter 3) replicate these factors. In his cross-cultural study, Osgood (1964) himself picks up the term affective meaning to describe what is measured by the semantic differential.

For these reasons, the term affective space seems to be a more appropriate name for the coordinate system spanned by these general affective factors. This hypothesised affective space describes affective connotation. As Barthes (1973/1996) has shown, connoted systems cannot be described directly. Instead, we need to use some form of metalanguage, which replaces the connotated signified with a sign of which we know both signifier and signified. Evaluation, potency and activity fulfil this requirement. Thus, they constitute a very direct and low-level metalanguage for affective connotation.
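A small numerical illustration of this point: if concepts are represented as points in EPA space, affective similarity becomes a simple distance. The coordinates below are invented placeholders; Osgood et al. (1957) report only that success, nurse and sincere receive very similar EPA profiles.

```python
import math

# Invented EPA coordinates (evaluation, potency, activity) for illustration.
epa = {
    "success":   (2.5, 1.5, 1.0),
    "nurse":     (2.4, 1.2, 0.9),
    "sincere":   (2.6, 1.3, 0.8),
    "quicksand": (-2.0, 1.0, 0.3),
}

def affective_distance(a, b):
    """Euclidean distance between two concepts in affective space."""
    return math.dist(epa[a], epa[b])

print(affective_distance("success", "nurse"))      # small: similar connotation
print(affective_distance("success", "quicksand"))  # large: dissimilar connotation
```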
2.3 Meaning in information resources

The described aspects of meaning – denotation, conceptual connotation and affective connotation – apply to any form of content, anything we ascribe meaning to. In the following, I try to outline – in admittedly simplified terms – how these semiotic principles can be applied to the world wide web.

We can look at the web as a huge collection of content. In itself, this collection is unstructured; there is no global hierarchy or taxonomy of meaning. The content is made up of different media types: text, pictures, sound and video. On the technical level, these entities are represented as files or objects; a more general term is information entity. Each and every information entity carries meaning for humans, which can be divided up further into denotation, conceptual connotation and affective connotation.

Consider the situation for text, the most common media type on the web. On the denotative level, text consists of a signifier – the digitally represented words – and a signified – the lexical meaning of the text, the actual content. This is the level of meaning we usually engage with. It is also a form of meaning well handled by software. Search engine robots can parse the text. At a first level, the robot can make an index of words which occur in the text. This already proves very helpful for human users to find content. Much like a dictionary, we can retrieve the content if we query the search engine index with words that occur in the text. For this process, the software does not need to have any idea about what is actually signified in the text; the whole process relies on the human capability of sense-making.

The search engine robot can be assisted by explicit metadata. HTML <meta> tags describe content in a structured manner. The author can enter a number of keywords which hint towards what is denoted in the text and can enter a language code. Thus, metadata categorises the content. In our semiotic model, metadata corresponds to conceptual connotation.

What remains to be covered is the affective connotation of the text. Generally, this aspect of meaning is not present in the words (signifier) of the text; it is something which we can feel if we are aware of the content of the text (signified). A naïve search engine robot has no awareness of the signified and is not sentient. Thus, a simple word parsing approach does not work. Likewise, it is unclear how the user is supposed to express a query for content with a certain affective value. As we have seen, connotation can only be described through the detour of a metalanguage. The task of an affective space interface is thus twofold. It must present the affective connotation of information entities to the user in an understandable manner (visualisation), and users need to be able to express what kind of affect a sought-after information entity should express (query).
Of course, these are just two ways of looking at the same problem; ideally, users can query the system in the same way the system visualises affective connotation. For these tasks, effective metalanguages are necessary, both for internal representation and in forms which are comprehensible for humans.

Data generation methods

Quite recently, the information retrieval community has begun efforts to automatically extract the affective value of text (e.g. Kamps et al., 2004; Esuli and Sebastiani, 2006; Bestgen, 2008) or other media types (e.g. Chen et al., 2008 deal with music). These efforts are summarised under the term opinion mining. This is a young research area, aiming to solve the problem of annotating existing content with affective metadata.

This thesis is not about algorithmic feature extraction. Instead, my focus lies on human interfaces which allow users to interact with the affective value of content. Generally, I am making no assumptions as to how this data came about. Algorithmic extraction may certainly be helpful for large-scale affective annotation of source material. However, current efforts mostly concern text and are only approaching extraction of the first dimension, evaluation. Furthermore, the analysis of Osgood (1964) has shown the strong subjectivity and cultural differences of affective statements, which seems to contradict the underlying assumption of opinion mining that there is an inherent affective meaning which can be extracted algorithmically.

The interfaces I will describe allow manual annotation of content with affective information. Affective connotation is an internal and subjective process. Therefore, the only method that seems to be both valid and feasible is introspection, the reporting of internal experiences by the subjects themselves. The idea is to facilitate introspection by giving the user an interface which allows him or her to express affective states in an intuitive way, and which visualises affect in some form on a computer screen. The other method that has been employed in experimental psychology to attain data about internal experiences is the use of physiological measurements, like the measurement of heart rate and blood pressure. In a laboratory setting for experiments, this method may be feasible. For users of interactive systems, however, it is not. We need a method which gets by with the output and input options of a standard computer setup. One possible exception might be the use of automatic facial expression recognition, which will be discussed briefly in chapter 5.5.

A manual, introspective annotation process could be facilitated by social collaboration. This process can be compared, for example, with social bookmarking services like delicious (http://www.delicious.com, last accessed 2009-03-18). There, too, content is not categorised algorithmically. Instead, users describe their bookmarks with short descriptions and categorise them with tags. Because this information is available to other users, they benefit too. Content annotation thus becomes a collaborative effort. In the case of affective connotation, social collaboration would increase the validity of statements, resulting in a kind of voting process. Each user gives an opinion, and from several statements, statistical data can be derived. A mean or median value could indicate an overall direction of an entity’s affective value, while the standard deviation indicates to what extent users agree in their opinions (stereotyped vs. amorphous in Osgood’s terms). Human statements about affect are all equally valid, and this process offers a way to achieve sensible overall statements.
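A minimal sketch of this voting idea, assuming ratings on a single dimension: the mean gives the overall direction, the standard deviation the level of agreement. The concepts, ratings and the threshold separating ‘stereotyped’ from ‘amorphous’ are invented for illustration.

```python
from statistics import mean, stdev

# Invented user ratings on the Evaluation dimension (-3..+3). The threshold
# of 1.0 separating stereotyped from amorphous concepts is an arbitrary choice.
votes = {
    "mother":    [3, 3, 2, 3, 3, 2],     # strong agreement
    "wednesday": [0, 1, -1, 0, 0, 1],    # weak affect, fairly consistent
    "guilt":     [-3, 1, -2, 3, -1, 2],  # strong disagreement
}

for concept, ratings in votes.items():
    direction = mean(ratings)   # overall affective direction
    agreement = stdev(ratings)  # spread of opinions
    label = "stereotyped" if agreement < 1.0 else "amorphous"
    print(f"{concept:10s} mean={direction:+.2f} sd={agreement:.2f} -> {label}")
```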
Discussion

Affective connotation has been recognised as an intrinsic part of meaning. While denotation and conceptual connotation are covered well by interactive systems, there are only a few attempts to represent the affective value of information entities. One possible reason for this is the difficulty of describing and visualising this kind of information. Since affect is only connoted, it cannot be described directly but only via metalanguages. I believe that affective connotation represents a big yet virtually untapped potential for use in information systems. In the next chapter, I examine the nature of affect and emotions.

Chapter 3
Affect and emotion theories

The authors who noted the existence of emotional or affective connotation (e.g. Urban, 1939, Ogden et al., 1923/1969, Osgood et al., 1957) say little about the reasons for its existence, apart from reference to emotions. In this chapter, I will undertake a brief examination of theories about the nature of affect and emotions. As the examination will show and as was also noted by Osgood (1964), theories of emotion and affect resonate well with evaluation, potency and activity, the major dimensions of affective space as put forward by Osgood et al. (1957). Therefore, it seems to be a justifiable inference to see emotion as an important part of affective connotation.

Emotions are an important aspect of life for humans and animals. Their existence is undisputed; their basis and function, however, are disputed. As Fehr and Russell (1984) put it, “Everyone knows what an emotion is, until asked to give a definition.” Most research about emotion has been undertaken in the field of psychology. However, there are also theories about emotion which come from a wide range of other disciplines – e.g. biology, philosophy, anthropology, musicology. Strongman (1996) gives a comprehensive account of more than 150 theories of emotion. However, he admits that having an informed view of emotion does not necessarily mean that one is able to define what emotions really are. The brief survey which follows only provides a glimpse into emotion research and is in no way exhaustive. Emphasis has been put on theories which could prove valuable in informing interface design.

Research and scholarly debate about emotions have their roots in philosophy (Strongman, 1996, Solomon, 1993). Plato disregarded emotions and saw them as something that hinders and detracts from reason. Strongman (1996, p.5) argues that this view is still common in “folk theory”; outbursts of emotion are still frowned upon, and one is always expected to contain one’s emotions and to act rationally. Aristotle, on the other hand, had a much more positive view of emotions. He realised that our perception of what happens around us influences our emotions. He was able to name and analyse specific emotions like anger, pity and fear, and he also saw a connection of emotion with pleasure and pain, thus anticipating an evaluative dimension of affect. After that, emotion research was neglected for a long time. Finally, in the late nineteenth century, Darwin’s contribution to the field, The Expression of the Emotions in Man and Animals (1872), pioneered the view that emotions are innate and occur in animals and humans alike.
His work is the source of the view that emotions have a biological basis, rather than being social constructs which are learned in the course of a lifetime. Darwin noted the intrinsic relationship between emotions and facial expressions, the latter being seen as the primary way in which emotions are communicated. Facial expressions will be discussed in detail in chapter 5. Darwin’s work has influenced many researchers.

3.1 Emotion, mood or affect?

Different terms with overlapping meaning are in use to refer to affective phenomena, even within scientific disciplines. The most common terms, which at times are used interchangeably, are ‘affect’, ‘emotion’ and ‘mood’. Sloboda and Juslin (2001) see inaccuracy in the choice of terms as a major source of disagreement between researchers. So, before affect theories can be examined, these terms need to be disambiguated.

Affect is seen as the most general of the three terms and includes other affective phenomena like emotions or moods. Figure 3.1 gives a graphical overview of affective concepts and how they can be distinguished through their respective duration. The shortest affective phenomena are “facial expressions and most bodily responses” (Oatley and Jenkins, 1996, p.29), which typically last in the range of a few seconds. Emotions are seen to last for a period of time in the range of minutes to hours, though others (e.g. Schubert, 2001) might place emotions in the range of seconds. Moods can last from hours to months. At the right end of the spectrum, emotional disorders and personality traits are long-term affective phenomena, which can stay with human beings for many years. Davidson (1994, cited in Sloboda and Juslin, 2001) has a similar view. He says that moods provide a longer-lasting “affective background”, which makes it more likely for some emotions to occur and less likely for others. Ekman (1999a) emphasises that emotions can begin very quickly, due to their adaptive function.

Figure 3.1: “A spectrum of affective phenomena in terms of the time course of each”. Adapted from Oatley and Jenkins (1996, p.30).

For Solomon (1993), the difference between ‘emotion’ and ‘mood’ lies in that the former is directed at something, while the latter is not. He writes that “emotions are always ‘about’ something or other. One is always angry about something; one is always in love with someone or something ...; one is always afraid of something (even if one doesn’t know what it is)” (Solomon, 1993, p.12). This he sees in contrast to moods, which do not have a determinable object (Solomon, 1993, p.11).

Another distinction that has been put forward is that emotions are said to result in distinct facial expressions, while moods do not (e.g. Ekman, 1999a). The intrinsic relationship between emotions and facial expressions will be examined in detail in chapter 5.5.

The term ‘emotion’ therefore has a quite specific meaning. This is the reason why I avoid the commonplace terms ‘emotional connotation’ (e.g. Urban, 1939) or ‘emotive connotation’ (e.g. Osgood et al., 1957), but refer to the concept as ‘affective connotation’. Use of the term ‘emotional connotation’ might imply that the concept only applies to full-blown, short-lived emotions. However, the factors evaluation, potency and activity apply to other affective phenomena as well, as will be shown in the following.
3.2 Induction and communication of emotion

Scherer and Zentner (2001) describe a basic and easily understandable model of how emotion can be induced in humans and communicated to others. The model is reproduced in figure 3.2. The upper part of the diagram describes how emotions are induced in humans. In order for emotions to occur, there must be some kind of event. This event causes an appraisal process in a person, which evaluates the implications this event has for him or her. Several factors may be taken into account here. A person may evaluate the event’s implications concerning his or her needs and goals, and whether the person is capable of dealing with the consequences of the event. The outcome of this appraisal process determines how the person feels about this event. For instance, if the event blocks the way towards a goal, one might feel angry. If one feels in danger, this would cause fear. An unexpected event which results in a pleasant situation would cause surprise and joy. Each of these emotions then results in expressive behaviour, the symptom. Possible symptoms include facial expressions, gestures and changes in posture.

Figure 3.2: Emotion induction and mediated commotion. Adapted from Scherer and Zentner (2001, p. 366).

The diagram’s lower part illustrates commotion, which is how emotion might be communicated to an observer along the induction process. The first possibility is that an observer goes through a similar induction process. The observer does not need to be directly affected by the event for this to happen. An example is when somebody sees injustice being done. Although the observer is not the person suffering, appraisal of the event would induce an emotion in him or her, which can be very different from the emotion induced in the concerned person (e.g. anger in the observer, fear or sadness in the concerned person). Another possibility is empathy, in which the observer identifies with the person. Scherer and Zentner note that empathy requires sympathy for the person. If the observer likes the person, the emotional state of this person might cause emotions in the observer. An example would be to feel sorry about the illness of somebody. Finally, the authors note a third path of commotion, contagion. In the case of commotion through contagion, emotion is induced simply by observing the expressive behaviour of somebody, without the need to know the reason for the emotion. The observer may then mimic this expressive behaviour. A possible example is to smile back at a stranger who gave you a smile.

3.3 Dimensional vs. categorical approach

The theory outlined above is but one of literally hundreds of emotion theories. These theories are quite diverse and are rooted in the various scientific fields that have contributed to our understanding of emotion. I will focus on two common approaches, which seem to be the most promising ones for use in affective space interfaces. For the most part, I will focus on the results of these theories, not getting into details about the biological or psychological explanations for the existence of emotions.
The first group of theories follows the dimensional approach, in which it is maintained that emotions can be described accurately enough through a number of independent factors. The other group subscribes to the categorical approach, built on the notion of distinct emotions. Finally, while advocates of either side usually consider the two concepts to be mutually exclusive, some researchers have tried to reconcile the two currents.

Dimensional approach

Theories of emotion which take the dimensional approach identify a very small number of factors which together describe an emotional state, thus spanning an emotional or affective space as introduced in the previous chapter. The identified psychological or biological reasons for emotion vary considerably. However, the identified dimensions tend to be very similar. A common debate between advocates of these theories is whether two dimensions are sufficient to describe emotions accurately enough or if three dimensions are necessary. The idea of underlying dimensions of emotion goes back to the late nineteenth century (Sloboda and Juslin, 2001) but receives more attention about 40 years later, beginning with a contribution by Woodworth (1938; cited in Sloboda and Juslin, 2001). Schlosberg is an important early proponent of a dimensional approach. At first he identified two dimensions (Schlosberg, 1952), but later added a third dimension (Schlosberg, 1954).

An influential dimensional theory was Russell’s circumplex model (1980), in which emotions are roughly distributed on a circle in a two-dimensional space. The dimensions used are valence and arousal, which are reminiscent of Osgood’s dimensions evaluation and activity. First, Russell divided this space up into 8 sections. Then he selected 8 terms for affective categories and let subjects order them on this circle. In the next study, he let subjects place a number of words which describe affective states in one of the 8 affect categories. Subjects largely agreed on the categories the terms fit in. This resulted in a good distribution of the terms around the circle. His results are reproduced in figure 3.3.

Figure 3.3: “A circumplex model of affect” (axes: pleasure and arousal). Redrawn from Russell (1980, p.1167).

Sloboda and Juslin say the following about Russell’s theory: “The circumplex model captures two important aspects of emotions: that they vary in their degree of similarity and that certain emotions (e.g. happy, sad) are often thought of as bipolar. About the same circular structure has been found in a large number of different domains ... suggesting that the circumplex model really captures something fundamental about emotional responses.” (Sloboda and Juslin, 2001, p.77)

However, important emotional distinctions are blurred in a two-dimensional model. For instance, fear and anger are very different in their implications for the body. In the circumplex model, the two emotions occupy very similar positions, because they both are unpleasant and have high arousal (Sloboda and Juslin, 2001). This problem is commonly tackled with the inclusion of a third dimension.
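The quoted passage below makes the same argument; as a rough numerical illustration, the invented coordinates in the following sketch place fear and anger almost on top of each other in a valence-arousal plane, while an additional dominance (potency) axis separates them clearly.

```python
import math

# Invented coordinates chosen only to illustrate the argument.
valence_arousal = {"fear": (-2.0, 2.0), "anger": (-1.8, 2.1)}
dominance       = {"fear": -2.0, "anger": 2.0}

def dist_2d(a, b):
    return math.dist(valence_arousal[a], valence_arousal[b])

def dist_3d(a, b):
    return math.dist(valence_arousal[a] + (dominance[a],),
                     valence_arousal[b] + (dominance[b],))

print(dist_2d("fear", "anger"))  # ~0.2: nearly indistinguishable in two dimensions
print(dist_3d("fear", "anger"))  # ~4.0: clearly separated once dominance is added
```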
“Although use of only two of these factors has been tempting because of greater simplicity, adequate characterization of important distinctions among certain clusters of affect (e.g., fear, sadness, anger) has necessitated a three-dimensional representation.” (Mehrabian, 1996, p.3)

Mehrabian thus suggests a more sophisticated model, postulating the dimensions pleasure, arousal and dominance. He notices the similarity to Osgood’s dimensions and acknowledges them to be affective dimensions in essence. Pleasure, arousal and dominance are suggested by Mehrabian as emotional equivalents to Osgood’s general dimensions. He gives a detailed list of emotion examples for each of the octants of this three-dimensional coordinate system, which is reproduced in table 3.1.

Table 3.1: Pleasure, arousal and dominance model (Mehrabian, 1996)
Octant: Examples
+P +A +D: admired, bold, creative, powerful, vigorous
+P +A −D: amazed, awed, fascinated, impressed, infatuated
+P −A +D: comfortable, leisurely, relaxed, satisfied, unperturbed
+P −A −D: consoled, docile, protected, sleepy, tranquilised
−P +A +D: antagonistic, belligerent, cruel, hateful, hostile
−P +A −D: bewildered, distressed, humiliated, in pain, upset
−P −A +D: disdainful, indifferent, selfish-uninterested, uncaring, unconcerned
−P −A −D: bored, depressed, dull, lonely, sad

In table 3.2, I give a comparison of the various dimensions which have been suggested by proponents of dimensional theories of affect.

Table 3.2: Comparison of dimensional theories of affect
Author: Dimensions
Schlosberg (1952): Pleasantness, Attention
Schlosberg (1954): Pleasantness, Activation, Attention
Osgood et al. (1957): Evaluation, Activity, Potency
Osgood et al. (1957): Evaluation, Dynamism
Ekman (1957): Pleasantness, Activity
Osgood (1976): Pleasantness, Activation
Russell (1980): Valence, Arousal
Mehrabian (1996): Pleasure, Arousal, Control/Dominance

Despite the use of different terms, there is a striking similarity between the dimensions which have been identified in theories of emotion and those which have been identified by Osgood et al. (1957) as dimensions of meaning. First, there is general consensus about an evaluative factor, variously called ‘pleasure’, ‘pleasantness’ or ‘evaluation’. ‘Activation’ and ‘arousal’ correspond well to ‘activity’; ‘dominance’ and ‘attention’ may be seen as similar to ‘potency’.

The dimensional theories have been criticised by Paul Ekman, a strong advocate of the categorical approach, who argues that “the evidence suggested at least four or five dimensions” (Ekman et al., 1972, pp.73-74) and that consensus only covers the most basic factors of evaluation and intensity. Osgood remarks that “of course, there must be many dimensions, so the real question is how many are needed to account for the lion’s share of the variance” (Osgood, 1976, p.126). This is in line with his analysis of ‘meaning’ (Osgood et al., 1957), in which many factors were identified but three of them proved to be the most stable and important.

Categorical approach

The categorical approach to emotion is built on the notion of a small set of basic emotions (e.g. Ekman et al., 1972, Ekman, 1994, Plutchik, 1980, Izard, 1977), which are distinct from each other. These emotions are seen to have emerged during evolution, serving specific functions. Commonly, emotions are seen to be connected with the pursuit of goals (e.g. Ekman, 1999a).
For instance, anger occurs when a plan to reach a goal does not work out, while we feel happiness when we have achieved our goals (Oatley, 1992). Through emotions, "our appraisal of a current event is influenced by our ancestral past." (Ekman, 1992a, p.171) In this view, emotions help to guide our behaviour in a world of unexpected events. Thus, basic emotions are not rational and "solve problems with speed rather than precision" (Sloboda and Juslin, 2001, p.77).

There is general agreement about the fact that a list of basic emotions does not cover the whole range of emotional states human beings can experience. The explanations for this discrepancy, however, differ. Plutchik (1980) postulates 8 basic emotions, which can occur simultaneously to produce non-basic emotions. Ekman (1999a) has also considered this possibility, but then introduced his theory of emotion families, which is explained below.

The idea of 'basic emotions' has been criticised on various occasions, mostly by proponents of a dimensional approach (e.g. Russell, 1994). A point of criticism is the unclear distinction between which emotions are basic and which ones are not. The number of identified basic emotions varies greatly between authors. Ortony and Turner (1990) compared categorical theories of emotion and found as few as 2 and as many as 18 postulated basic emotions in different theories. However, this discrepancy most likely comes about through different definitions of what an emotion really is (Sloboda and Juslin, 2001). Accounting for that, there seems to be considerable consensus among most advocates of a categorical approach in regard to those emotions which have been postulated by Ekman.

Paul Ekman

Paul Ekman is the best known proponent of a categorical approach. He started out using a two-dimensional approach (Ekman, 1957), but soon revised his theory. He became most famous for his cross-cultural studies of emotion (Ekman et al., 1972), in which he used photographs of facial expressions from different cultures. He could show that emotions could be accurately judged cross-culturally from looking at facial expressions. Facial expressions are crucial to his theories of emotion, so much so that at times he considered only those affective states which are accompanied by distinct facial expressions to be proper emotions (Ekman, 1999a, Ortony and Turner, 1990, Sloboda and Juslin, 2001). He inferred six basic emotions which could be judged cross-culturally: 'anger', 'joy', 'surprise', 'fear', 'sadness', 'disgust'.

Later, he refined his theory. While he maintains that basic emotions are always accompanied by a bodily signal, it does not need to be a facial expression (Ekman, 1992a). He introduces the notion of emotion families, which are similar emotions and variations of one basic emotion (Ekman 1975; cited in Ekman, 1992a). He also acknowledges that many other affective states are candidates for basic emotion status, namely 'interest', 'contempt', 'guilt' and 'shame' (Ekman, 1992a). He also addressed the problem that his list only includes one positive emotion, 'joy', but five negative ones. He maintains that there are as many positive emotions as there are negative emotions. Unlike negative emotions, however, positive emotions do not have a distinct bodily signal, but share the facial expression of a smile. His theories will be further explored in chapter 5.5.
Since his theories are closely tied to expressions, they provide a promising framework for the visualisation of affect in interactive systems.

3.4 Discussion

Both approaches, dimensional and categorical, have their merits and receive encouraging experimental results. In fact, the two approaches seem to describe different aspects of the same phenomenon. Christie and Friedman note in an emotion judgement study which used films as stimuli that "a valence – activation circumplex was found in experienced emotion despite that the films were selected on discrete emotion criteria" (Christie and Friedman, 2004). Even Ekman, a strong advocate of basic emotions, acknowledges the factors evaluation and intensity (Ekman et al., 1972). Two dimensions of emotion are seen as a simple explanation but are continually regarded as insufficient to describe emotions accurately (Young et al., 1997, Ekman, 1994, Izard, 1997), while three-dimensional models are seen to describe emotion accurately enough for practical purposes (Mehrabian, 1996).

On the other hand, the notion of basic emotions which serve biological functions is compelling. A large number of cross-cultural experiments in which test subjects were able to accurately judge emotions shows that human beings actually do think in categories of emotion (Etcoff and Magee, 1992) and are able to name them correctly.

It is not a new idea that these two approaches are actually not so different. Figure 3.4 shows an early attempt to map categories of emotion onto a two-dimensional affective space. In their comparison of the two approaches, Young et al. conclude: "Dimensions such as pleasant–unpleasant thus correspond to intellectual, not perceptual constructs" (Young et al., 1997). Dimensional approaches constitute a model to efficiently describe emotions. However, evidence suggests that humans do actually think in emotion categories.

Figure 3.4: Emotion categories on a two-dimensional affective space, spanned by the axes pleasant-unpleasant and attention-rejection, with categories such as 'love, mirth, happiness', 'surprise', 'contempt', 'disgust', 'fear, suffering' and 'anger, determination' placed around it. Adapted from Woodworth and Schlosberg (1954).

In chapter 2, I introduced the work of Osgood et al. (1957). In semantic studies, they examined the nature of the affective connotation of words and inferred the existence of an affective space, constituted by the dimensions evaluation, activity and potency. Their studies show that any stimulus occupies a point in affective space. The words used for their studies did not specifically denote affect or emotion but still had an affective connotation. The semantic differential then is but one way to make affective connotation visible.

Dimensional theories of emotion start out from different propositions but arrive at very similar conclusions. This is hardly surprising; emotions are one of several kinds of affective phenomena and can thus be described in the same way. Therefore, each emotion naturally occupies a position in affective space. Mehrabian (1996) explicitly references evaluation, activity and potency as a framework for emotion description. Since emotions are affective phenomena, they not only have an affective connotation but actually denote affect. Consequently, their positions in affective space are quite easily determined and are likely to be more extreme than the positions of concepts which do not denote affect. However, this approach does not fully explain emotions.
Darwin (1872) showed the existence of emotions in other primates, a view which was confirmed by the studies of Ekman and others. Emotions are ancient mechanisms which predate language and have developed through evolution to automatically adjust our bodies to external influences. They also serve a communicative purpose; through facial expressions, surrounding beings are informed about one's emotional state and can adjust their behaviour accordingly.

While emotions are not the only affective phenomena, they certainly are very important ones. Their close ties with expressive behaviour, first and foremost facial expressions, as well as their cross-cultural universality, make them a very interesting candidate for a necessary task on the way to affective space interfaces: the expression or visualisation of affective space on a computer screen. Chapter 6 describes such an attempt.

In summary, I believe that dimensional models are a general description of affect. These dimensions, however they might be called, apply to any concept which carries meaning and thus can be evaluated by humans. Emotions are not equal to affect, but rather are a specialised form of affect which fulfils several biological functions. Categorical approaches describe specific emotions, not affect. However, since any emotion can be located in a general affective space, they describe affect via this proxy.

Computational models of affect

Peter and Herbon (2006) give a rare overview of how the various emotion and affect theories could best be transferred into the domain of human-computer interaction. They observe a current lack of solid theoretical models of emotion in most affect-related software efforts. Software designers should look into the field of psychology to select those theories best suited for use in software; this decision should be based on the system's requirements. They favour dimensional theories and disapprove of categorical theories for use in HCI. They criticise that categorical theories label emotions with words, a process which they not only deem unnecessary but also counter-productive. They consider emotion categories to be artificial and specific to the English language. Dimensional theories, on the other hand, do not need verbal descriptions. Dimensional descriptions are easily transferable into software, requiring only the storage of two or three values which fully describe an emotion.

I agree with the notion of Peter and Herbon that software designers need to pick those theories which are best suited for a specific task at hand. However, I do not share their strong disregard for the categorical approach. Research has shown that humans do think in categories of emotion. The substantial evidence from cross-cultural research, in which emotions were not named but only shown as photographs, compellingly shows that emotional categories are neither artificial nor language-specific. The strong connection with facial expressions is a major advantage of emotion categories. Facial expressions are a promising way to visualise at least part of affective space in a universally understandable manner.

As I have explained before, the two approaches most likely describe different things. A dimensional model seems to be ideal for general purposes in which affect should be described. For the description of emotions, however, the categorical approach seems to be more expressive. One might argue that categories could simply be expressed by locations in affective space; a dimensional model at the core would then render actual categories obsolete. However, I do not believe that an actual emotion can be fully expressed by two or even three affective dimensions. Their meaning is quite specific and most likely includes other dimensions. Still, a three-dimensional model seems to be able to capture the differences between many of the proposed emotion categories. Conversion of values between the models should then be possible, though at a loss of accuracy in either direction.
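To make the point about storage concrete, and to illustrate what such a lossy conversion between the two kinds of model might look like, the following minimal sketch (in TypeScript; the choice of language is incidental) stores an emotional state as a pleasure-arousal-dominance triple, derives its Mehrabian octant, and maps it to the nearest of Ekman's six basic emotions. The anchor coordinates and helper names are assumptions made up for this illustration; they are not taken from Mehrabian, Russell or Ekman.

```typescript
// A minimal sketch of a dimensional (PAD) representation of affect and a lossy
// conversion to emotion categories. The numeric coordinates are illustrative
// assumptions for demonstration only; they are not taken from the cited studies.

interface PAD {
  pleasure: number;   // -1 (unpleasant) .. +1 (pleasant)
  arousal: number;    // -1 (calm)       .. +1 (aroused)
  dominance: number;  // -1 (submissive) .. +1 (dominant)
}

// Octant label in Mehrabian's +P/−P, +A/−A, +D/−D notation (cf. table 3.1).
function octant(state: PAD): string {
  const sign = (v: number, label: string) => (v >= 0 ? "+" : "−") + label;
  return [sign(state.pleasure, "P"), sign(state.arousal, "A"), sign(state.dominance, "D")].join(" ");
}

// Hypothetical anchor points for Ekman's six basic emotions in PAD space.
const basicEmotions: Record<string, PAD> = {
  anger:    { pleasure: -0.6, arousal:  0.7, dominance:  0.4 },
  fear:     { pleasure: -0.7, arousal:  0.7, dominance: -0.6 },
  sadness:  { pleasure: -0.7, arousal: -0.4, dominance: -0.4 },
  disgust:  { pleasure: -0.5, arousal:  0.2, dominance:  0.1 },
  surprise: { pleasure:  0.2, arousal:  0.8, dominance: -0.1 },
  joy:      { pleasure:  0.8, arousal:  0.5, dominance:  0.3 },
};

// Nearest category by Euclidean distance: a lossy mapping from dimensions to labels.
function nearestCategory(state: PAD): string {
  let best = "";
  let bestDist = Infinity;
  for (const [name, anchor] of Object.entries(basicEmotions)) {
    const d = Math.hypot(
      state.pleasure - anchor.pleasure,
      state.arousal - anchor.arousal,
      state.dominance - anchor.dominance,
    );
    if (d < bestDist) { bestDist = d; best = name; }
  }
  return best;
}

const annoyed: PAD = { pleasure: -0.4, arousal: 0.6, dominance: 0.2 };
console.log(octant(annoyed));          // "−P +A +D"
console.log(nearestCategory(annoyed)); // "anger" (given the anchors assumed above)
```

Whatever representation is chosen, the relevant point for the argument above is that the mapping from a dimensional description back to a category label is only a nearest-neighbour guess, and therefore necessarily lossy.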
Chapter 4

Affect in art and media

The notion of affective space was inferred empirically from language stimuli (see chapter 2). However, there is a considerable body of evidence that this concept is ubiquitous in human creations and can be observed (and reliably judged) in any form human creativity takes. Art has a very special relationship with emotions. Some forms of art seem to be created specifically for affective reasons, but the principle can be observed in any piece. Furthermore, everyday media produce affective connotations as well. I begin with general theories which outline how it is possible that art and media are related to emotions. Afterwards, I will examine the special relationship of emotions and music. The findings of this chapter lay the final groundwork for the use of affective information in interactive systems.

4.1 General theories

Emotions are a defining feature of sentient beings, which include humans and animals. As explained in chapter 3, emotions can be thought of as processes, which begin with appraisal and may result in expressive behaviour. Expressive behaviour, like facial expressions (see chapter 5), is therefore not the cause of emotions, but an accompanying signal of an ongoing emotional process. Works of art are not sentient. This leads to two questions I will try to address: How is it possible that we think that art can express emotions? Furthermore, emotions always have an object towards which they are directed (Solomon, 1993). Why, then, are we actually moved emotionally by a work of art if our appraisal of the situation tells us that there is no actual reason to feel in such a way?

Sign models

In chapter 2, I made frequent use of the simple sign model (signifier signifies signified). The concept is well suited for language, since it captures the usually arbitrary relationship between words and their denoted concept. It is not possible to discern what a word means from just reading or hearing it, and different languages use different words to reference identical concepts. Davies (2001), who writes about emotions in music, mentions this as one possible way of thinking about how art can express emotions. In this model, works of art merely serve a sign function like words do. The relationship, then, is arbitrary and works on the basis of syntactic rules. He counters, though, that such a relationship is not possible because music lacks semantics and other defining features of language.

In linguistics, onomatopoeia is the exception to the rule that words are arbitrary signs. Onomatopoeia describes words whose spelling and pronunciation imitate the sound of their signified. Well-known English examples include hiccup and cuckoo. Onomatopoeic signs are not arbitrary, but conventionalised and still differ between languages. This class of signs may be called iconic or representational signs (e.g. Davies, 2001, Mikunda, 2002).
Iconicity generally refers to signs – linguistic and non-linguistic – whose signifier is similar to, or somehow resembles, its signified. This does not, however, take away from their sign function. Cuckoo, though not completely arbitrary, still is a linguistic signifier for a specific kind of bird. Iconic signs refer to something else – their denotation – and can give rise to feelings – their affective connotation. In the context of music, Davies compares this to certain sounds. For instance, while a trumpet can produce high, and therefore bright sounds, the lowest sounds of a clarinet would be dark. However, he raises the concern that this model does not reflect how we feel about music; reading the word sad does not give rise to the same feelings as listening to sad music. Emotion is actually expressed by works of art, not just signified. This leads to the contour theory.

Contour theory

A stronger form of connection between art and emotion can be achieved if a work of art does not merely function as a signifier for a certain emotional state but actually expresses this state. This becomes possible if a work of art bears a great deal of resemblance to an emotional state. Davies (2001) is a proponent of this theory. He argues that when something closely resembles an emotion, it is plausible that this something should be perceived to express an emotion. This solves the problem that a piece of art is seen to express emotions, even though it is not sentient and thus cannot 'have' them.

This resemblance comprises any form through which emotions are expressed by sentient beings, like facial expressions, gestures, postures and other forms of behaviour. Davies argues that these expressions do not operate as a signifier for an emotional state like iconic signs do, but 'are' this state themselves. Some behaviours are always perceived as expressive of emotions, even though they are not actually expressing a present emotion. He gives the example of the weeping willow and St Bernard's dogs (see figure 4.1). Both shapes appear to be sad-looking. However, plants are not sentient and thus cannot possibly experience this emotion. Dogs are sentient, but there is no reason to assume that St Bernard's dogs are sad, even though their faces look as if they were. Davies attributes this phenomenon to the human trait of anthropomorphising our environment, which means that we ascribe human characteristics to non-human beings and even non-living things.

Figure 4.1: Plants and animals can be perceived as sad-looking without feeling this emotion. Taken from Davies (2001, p. 36).

Pieces of art can then mimic these expressive behaviours in their own ways. Paintings might imitate shapes that are found in facial expressions or postures, while music's dynamic character could imitate the movements that characterise emotional states (like the slow movement of a sad person). In the contour theory, only those emotions that result in expressive behaviour can thus be expressed in works of art.

A question that remains is why recipients would react emotionally to works of art which only resemble an emotional expression. For instance, listening to sad music invokes physiological changes similar to actually experiencing this emotion; in this way, listeners 'mirror' the emotion expressed in a song. If we are aware that an entity only appears to be expressing sadness but in fact is not feeling this way, it is counter-intuitive that we should mirror the emotion.
Davies responds that such an emotional reaction is not an emotion in the narrow sense, which requires that an emotion is always directed at or is about something (Solomon, 1993). The work of art does not become the object of the emotion we experience. Instead, he compares the phenomenon to situations in which a certain mood – say, sadness – is prevalent. Even though one might not have any personal reason to feel sad, the mood leads one to experience this emotion. This idea conforms with contagion, the third type of emotion communication noted by Scherer and Zentner (2001) (see chapter 3).

Expression theory

Robinson (2005) is a proponent of the expression theory. This theory maintains that the emotions in a piece of art are expressed by a persona. This persona might be identical to the author1, in which case she speaks of an implied author. In other cases, it is a completely fictional character. The introduction of a persona solves the dilemma that works of art are not sentient and thus cannot express emotions. Instead, it is a character, embodied in the piece, who lives through emotion processes. This persona is shaped by the work of art, which might give hints about the persona's character, appearance, behaviour and beliefs. Though certainly influenced by the work, the actual form this persona takes is constructed by the recipient2.

1 The author, in this context, refers to the originator of a work of art. Depending on its mode, this might be the composer, the director, etc.
2 Depending on the media type, recipient refers to reader, listener, observer, etc.

This is a very similar view to Iser (1978), who examines the role of the reader in literature. He maintains that a text can only come to life if it is "konkretisiert", or realised, by a reader. In the reader's mind, the stimuli a text provides are augmented to draw a certain 'picture' ("Konkretisierung"), one that differs between readers, and even between multiple readings by the same person. Only through this process can a story achieve what he calls "life-likeness".

Once the persona is established, it is subject to and interacts with the fictional world described. Thus, Robinson offers two interrelated ways in which emotion can be expressed in art. On the one hand, the piece can focus on the environment the persona is exposed to. Emotion is then expressed by showing how the world appears to the persona when he or she feels a certain emotion. For instance, if the persona is angry, the world would be described as offending; if the persona is sad, the world would be a place that is empty and devoid of meaning. On the other hand, the focus can be on the persona. Then, emotion can be expressed through the character itself, by describing his or her thoughts and beliefs. An angry character would be offended, a frightened character would believe itself to be threatened. Since the character interacts with the world, this is a matter of focus, and a successful work of art would likely exhibit both possibilities.

When a piece of art expresses a certain emotion, this is usually done intentionally by the author. However, the recipient is the one who finally realises the work. Thus, it is also possible that the stimuli provided by the work lead to an emotional response not intended by the author. In this way, a piece of art comes to occupy its place in affective space.
The character's thoughts that are expressed occupy a point in affective space, which results in the affective connotation of the piece.

In parts, Robinson picks up ideas from the contour theory; she acknowledges the role of resemblance and that some shapes, sounds or movements naturally correspond with emotions, just the way facial expressions tell us something about the emotions of other humans. These characteristics are distorted, simplified or exaggerated by artists to show the emotion more clearly, "as if to abstract the essence of the expressive gesture in a purified form" (Robinson, 2005, p. 288). She believes, however, that art can go much further in its expressive powers. After we have been affected by an emotion process in real life, we can look back on our experience and label the process with a term that summarises the feeling, like sad or happy. This is very difficult to achieve while we are experiencing an emotion. For Robinson, this is the point which art excels at. Works of art do not just show emotions the way facial expressions do but let the recipient actually participate in what it is like to go through an emotion process. Art which expresses emotions reflects on an emotion process and offers the recipient a summary of the process. If this is done successfully, the recipient as well as the author learn something about the emotion and afterwards understand it a little better. And while emotion terms can only give a rough indicator, emotion expression in art can be incredibly precise and subtle.

As its name implies, the expression theory focuses on how works of art are capable of expressing emotions. The other question, how emotions can be aroused in the recipient, is not touched upon, but does not pose a problem in this framework. The emotions are expressed through a persona, which the recipient accepts as a sentient being. Thus, the same ways in which emotion is communicated between humans – induction, empathy and contagion – can apply (see chapter 3).

Discussion

A simple sign model as an explanation of how emotions are expressed in art is generally rejected. Even when this relationship is not arbitrary but somehow natural through resemblance, it does not capture how we feel about art. Such a model would suggest that art can only express emotions if it denotes affect in the same way the word sad denotes an emotion. Davies (2001) argues that this would reduce art to "brute naming" of emotions. Instead, emotions and affect are perceived as being inherent to works of art.

The second chapter has shown that affect does not need to be denoted to be present. Whenever we use signs, they are always bound up with affective connotation, even if they are denoting concepts that are not affect-related. I maintain that the situation is not different for art. Again, the actual content does not need to denote emotions to be expressive of affect. Instead, the impressive emotional expressiveness of art lies in the intentional and skilful control of its affective connotation.

Figure 4.2: Denotation and connotation (two staggered sign levels, each with a signifier Sr and a signified Sd).

If affect is connoted, it fulfils the immediate character of emotional experience Davies is asking for. In figure 4.2, this function would be located in the curly brace that controls which associations are evoked. Again, we can see that neither the signifier nor the signified alone is responsible for affective connotation, but the sign as a whole.
What is denoted (signified) is as important as how it is denoted (signifier). However, some forms of art may put emphasis on one or the other. Realistic depictions (like photographs or detailed landscape paintings) accurately depict the signified, while abstract depictions (like abstract paintings or instrumental music) at best hint at a signified but achieve their expressiveness through their signifier.

This is the point where the contour and expression theories come into play. They are general attempts to explain how such control can be achieved, which can be further detailed for any media type. One way to achieve this effect of regulation is captured by the contour theory. When images or pieces of music bear resemblance to typical ways in which emotions are expressed by humans, it seems only natural that such associations are aroused in recipients. This effect can operate independently of an artwork's denotation. In Davies' example of St Bernard's dogs, nothing sad is denoted, yet the expression is perceived as such because the facial features of this type of dog resemble the facial expressions of sad humans. In this context, Robinson (2005) speaks unfavourably of the "doggy theory" and is convinced that this cannot possibly explain the emotional expressiveness of art.

The expression theory with its focus on a persona in art covers another possibility. The theory is most plausible when there actually is a character in a work of art who can express his or her thoughts. Davies (2001) denies the proposition of expression theorists that recipients always construct a persona in works of art which expresses its emotions. He gives the example of instrumental music, for which he does not believe listeners construct a persona in the absence of a character. The same argument could be given for any form of abstract art. Robinson (2005) responds that it is not necessary to think of a persona to appreciate an instrumental piece of music, but it is possible and makes it easier to understand its emotional expression.

Another possibility is the use of symbols. Artworks cannot be fully explained by a simple sign model, but the signifying character of symbols can play an important role. Some forms of art – surrealist paintings, for example – frequently depict objects or shapes which are not to be taken literally but actually denote something else. In this context, a symbol can be seen as a conventionalised sign relation, in which a certain signifier reliably stands for a signified. Such symbols, in turn, have their own affective connotation, thus contributing to the general affective value of a work of art.

The routes outlined above, through which art can achieve its affective expressiveness, are only simple explanations, and doubtlessly, there are countless other possibilities to consider. The real value, however, lies in the interplay between these factors, a complex set of denotations and connotations working together to express the artist's vision. The explanations offered here may be simplistic; yet they present possible starting points to represent affective connotation in interactive systems. To some degree, these general models should be valid for any media type. In the following, specificities of an important media type, music, will be pointed out.

4.2 Music

The most compelling evidence for an art form that has strong connections with affect and emotions exists for music.
Scherer (2001) notes, "It has often been claimed that music is the language of the emotions", and Juslin and Sloboda (2001, p. 3) believe that "emotional experience is probably the main reason behind most people's engagement with music." Interactive systems which deal with music should therefore benefit from awareness of its affective value.

The relationship, however, is not easily explained. If we engage with music because of emotions, what are these emotions? Are these emotions expressed by music, or does music arouse emotions in the listener? And if music arouses emotions, how is it possible that so many of us listen to music that would be described as sad, when we usually avoid negative emotions? These questions are even more difficult to answer if the focus is on instrumental music. In the absence of lyrics, it is hard to imagine a persona expressing his or her thoughts, thus more or less ruling out the expression theory for instrumental music (though Robinson (2005) maintains this is still possible).

Influence of musical structure

Gabrielsson and Lindstrom (2001) examined how emotional expression is influenced by musical structure, such as tempo, pitch, melody and rhythm. To this end, they surveyed an impressive number of studies. They subscribe to the idea that composers do not express the emotions they are currently feeling, but are aware of the effect of musical structure and use it to achieve intended emotional expressions, quite in line with contour theory.

The consulted studies can be divided into those that use real music stimuli and those that use short, isolated sequences. The advantage of using real music is that it provides a realistic setting. On the downside, results gathered in such a manner are inherently ambiguous because there is no controlled variation of single factors. Studies which use short sound sequences are easier to interpret in the absence of possible interactions. However, they might lack validity because they are not part of a complex musical structure. As a compromise, some studies used real music which was manipulated to vary in one specific factor. A difficulty of this approach is to achieve natural-sounding compositions for every varying factor.

When isolated, tempo is strongly positively correlated with valence and arousal. These tendencies, however, can be overruled by other factors, like mode. Minor mode is often associated with sadness, while major mode is more likely to express happiness. These tendencies were even confirmed in studies with children. However, happiness can also be expressed in minor mode under the influence of other factors, like pitch and loudness. Loudness is strongly correlated with arousal, though big changes in loudness seem to express fear. The influence of pitch is not as clear, though in many cases it seems to be positively correlated with valence.

In their conclusion, Gabrielsson and Lindstrom stress the importance of context when examining the affective meaning of musical structure. The expressive power of music does not lie in isolated factors but in the complex interplay between them. In my view, this can be compared to language; single words may denote concepts and connote affect, but explaining the meaning of each word can never explain the full message of a text, which only comes to life because of the way words are arranged.
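Purely as an illustration of how such tendencies might be operationalised in software – and bearing in mind the warning above that isolated factors interact and can be overruled by context – the following sketch derives a rough valence-arousal estimate from three structural features. The weights and feature names are invented for this example and carry no empirical weight.

```typescript
// A deliberately naive sketch of how the surveyed tendencies could be turned into
// a first guess at a piece's position in valence-arousal space. The weights are
// invented for illustration; the surveyed studies stress that the factors interact
// and that context can overrule any single tendency.

interface MusicalFeatures {
  tempoBpm: number;          // beats per minute
  mode: "major" | "minor";
  loudnessDb: number;        // average loudness relative to some reference level
}

interface ValenceArousal {
  valence: number; // -1 .. +1
  arousal: number; // -1 .. +1
}

const clamp = (v: number) => Math.max(-1, Math.min(1, v));

function roughAffectEstimate(f: MusicalFeatures): ValenceArousal {
  // Tempo: positively correlated with both valence and arousal (when isolated).
  const tempoTerm = (f.tempoBpm - 100) / 100;
  // Mode: minor tends towards sadness, major towards happiness.
  const modeTerm = f.mode === "major" ? 0.3 : -0.3;
  // Loudness: strongly correlated with arousal.
  const loudnessTerm = f.loudnessDb / 20;

  return {
    valence: clamp(0.5 * tempoTerm + modeTerm),
    arousal: clamp(0.6 * tempoTerm + 0.4 * loudnessTerm),
  };
}

// Example: a slow, quiet piece in minor mode lands in the low-arousal,
// negative-valence region (roughly where 'sad' sits on the circumplex).
console.log(roughAffectEstimate({ tempoBpm: 60, mode: "minor", loudnessDb: -10 }));
```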
Another differentiating feature between the surveyed studies is the way in which emotional reactions were recorded. The oldest studies asked their subjects to give free descriptions of their emotional experience. Some studies used a list of affective terms from which subjects had to pick the most appropriate term. Other studies tried to get non-verbal responses from subjects, which were measured continuously. These methods are most appropriate to measure dynamic, time-dependent effects of music. A final group of studies asked subjects to rate their experiences along semantic differentials. The results were then analysed for correlations between factors, usually through factor analysis, to determine underlying dimensions. The cited studies obtained very similar results. In line with studies of other types of stimuli that used semantic differentials and factor analysis, two factors were consistently identified: valence and arousal. These results are yet another indicator of the ubiquity of these factors in affective responses.

Production of emotions in the listener

Scherer and Zentner (2001) examined if and how emotions can be 'produced' by music in the listener. They note that listeners usually have little trouble judging what kind of emotion is expressed in a piece of music. However, this does not automatically mean that the listener feels such an emotion. The authors note a study in which they instructed subjects to describe both which emotions they believed the music stimuli were supposed to express (perception) and what they actually felt when listening to the music (production). The results gathered differed considerably. In most cases, perceived emotions were reported as much stronger than felt emotions, though in some cases, the opposite was reported.

Scherer and Zentner survey various efforts to measure emotional responses to music. Some studies used a dimensional model, and some followed a categorical approach. Followers of the categorical approach maintain that the presence of basic emotions always results in specific facial expressions. Thus, if music produces these basic emotions in the listener, facial activity should be present. One study notes an effect of valence on facial expression. Dislike of music (negative valence) tends to result in contraction of the corrugator (the 'frowning' muscle), while favourable music (positive valence) activates the zygomatic major (the 'smiling' muscle). Generally, though, the results are less significant than would be expected. On the other hand, the dimensional model is seen as valid, but at the same time as incapable of capturing the subtle differences between emotions. This is a view in line with other critics of the dimensional approach (see chapter 2).

They conclude that the results of the studies are indecisive, and they believe that neither of the approaches is appropriate to measure which emotions are aroused in listeners. They doubt that basic emotion terms like anger, fear or disgust describe the emotions we are likely to feel when listening to music, for these terms denote emotions much stronger than those music typically evokes. They encourage the development of a taxonomy of terms that are appropriate for the measurement of the subtle emotional responses to music, like longing, awe, solemnity or tenderness.

Time dependency

At this point, it is important to remember the difference between moods and emotions. While moods are seen as longer-lasting states, emotions tend to be short experiences which necessitate the presence of an object or event towards which the emotion is directed.
Schubert (2001) recognises this and reports on the continuous measurement of emotions in music listening. One way to achieve this is to let listeners record their affective experience on a valence-arousal affective space. Results showed that the expressed emotions vary considerably over the course of whole songs. The overall feeling a piece of music evokes seems to be different to the single emotions evoked in the course of the song. The average of continuous responses is less pronounced than overall responses. This aligns well with intuition; emotional responses may be strong, but are also short-lived. They may be followed by less emotional passages. They may even cancel each other out on a two-dimensional affective space when a song contains both positively and negatively valenced passages. As with all statistics, calculating an average is not very meaningful without considering variance, too.

As we have seen before, there are many ways in which emotions can be expressed by musical structure. This expressed emotion can vary considerably across different parts of the song. Thus, in order to capture the emotional expression of a song, it is necessary to measure the listeners' responses continuously. However, listeners can readily give statements about the overall affect expressed in a song, which seems counter-intuitive considering the changes in emotion recorded in continuous measurement. Thus, I believe that the overall affective response to a song most likely refers to its mood, not emotion.
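The following short sketch illustrates the statistical point made above: for an invented valence-arousal trajectory that contains both negative and positive passages, the mean washes out while the variance still records the emotional movement.

```typescript
// A minimal sketch of why averaging continuous valence-arousal responses can be
// misleading: positive and negative passages cancel out in the mean, while the
// variance still reveals that the piece moved listeners strongly. The sample
// trajectory is invented for illustration.

interface Sample { valence: number; arousal: number; }

function mean(values: number[]): number {
  return values.reduce((s, v) => s + v, 0) / values.length;
}

function variance(values: number[]): number {
  const m = mean(values);
  return mean(values.map(v => (v - m) ** 2));
}

function summarise(trajectory: Sample[]) {
  const valences = trajectory.map(s => s.valence);
  const arousals = trajectory.map(s => s.arousal);
  return {
    meanValence: mean(valences), varianceValence: variance(valences),
    meanArousal: mean(arousals), varianceArousal: variance(arousals),
  };
}

// A song with a mournful verse and a euphoric chorus: the mean valence is close
// to zero, but the variance shows the strong emotional movement.
const trajectory: Sample[] = [
  { valence: -0.8, arousal: -0.3 }, { valence: -0.7, arousal: -0.2 },
  { valence:  0.7, arousal:  0.8 }, { valence:  0.8, arousal:  0.9 },
];
console.log(summarise(trajectory));
```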
Evaluation vs. pleasure

Davies (2001) noted the seemingly paradoxical situation that many people enjoy music which expresses emotions with negative valence (such as sadness). This phenomenon becomes even more puzzling when it is taken into account that music can produce emotions in the listener. According to Davies, the sadness expressed in music is sometimes mirrored by listeners, who thus feel sad themselves. In everyday life, we tend to avoid negative emotions. How, then, is it possible that we willingly engage in music which makes us feel sad?

Davies illustrates several ways to address this question. One view puts emphasis on the educating effect of emotions in art (as noted by Robinson, see above). When we experience negative emotions through music, there are no negative consequences, as opposed to situations in real life. Thus, the induced emotions are muted in comparison and make us accustomed to these feelings. Davies counters that this might be true for representational art for which a persona can be constructed, but instrumental music and other non-representational art cannot achieve this educating effect. Another possible answer is that we recognise the negative expression in music, yet value the way this is achieved. The negative emotions we experience are then just something we need to face to see the true value of a piece of music. Davies compares this to endurance races and other challenging activities, in which the negative aspects are to be overcome on the way to the achievement of finishing. This shows that in real life negative emotions are not avoided under all circumstances, as one might spontaneously think.

Schubert (2007) conducted a study about the influence of perceived and felt emotion on preference in music. Emotion perceived to be expressed in music is called external locus (EL), and emotion felt by the listener is called internal locus (IL). Subjects rated music along valence-arousal dimensions (defined with the words happy-sad and aroused-sleepy). In addition, they rated 'emotional strength', familiarity and preference (defined as hate it-love it), each on a 7-step scale. Schubert maintains that preference for the expression of negative emotions in music implies a certain level of dissociation between these experiences. Either there is partial dissociation, which would mean that the negative emotion aroused by the stimulus is overruled by the positive emotion of preference. Or there is full dissociation, which would mean that the listener feels the negative emotion which is expressed, yet enjoys this feeling.

The results showed that the emotion subjects felt while listening to music had a greater influence on preference than what was perceived in music, though the recorded absolute values for felt emotion were lower than those for expressed emotion. Schubert notices the effect of a "locus gap"; music for which felt emotion and perceived emotion were rated similarly tended to achieve higher ratings for preference. He suspects that pieces of music which try to express an emotion which is not aroused in the listener are seen as failing to achieve their intended effect. The biggest influence on preference is exerted by 'emotional strength'. What kind of emotion is expressed or felt is less important for preference than the intensity with which an emotion is expressed. Another strongly influencing factor is familiarity; pieces which listeners know well tend to be preferred over previously unknown pieces. Overall, the study shows that emotions expressed in music can actually produce emotions in listeners, but expressed and felt emotions are not necessarily the same. Negative emotions may be enjoyed as much as positive emotions, which hints towards full dissociation between emotion and preference; it seems that listeners enjoy the emotions (even the negative ones) which are aroused through music.

Another way to look at the phenomenon is offered by Norman (2004). He distinguishes between the visceral and the reflective level of emotional experience. Visceral refers to hardwired reactions of our nervous system, while reflective is a cognitive evaluation of a stimulus. Seen in this way, music which is evaluated negatively on the visceral level may still be valued positively on the reflective level if we are familiar with it. Norman calls this an "acquired taste". This distinction aligns well with the strong influence of familiarity on music preference noted by Schubert. Only when we are accustomed to a style of music can the reflective level overrule the negative first response on the visceral level.

It seems like we are dealing with two independent forms of valence. When Osgood et al. (1957) examined the affective connotation of words, they named the most important factor evaluation. It describes our attitude towards a stimulus, whether this thing is good or bad. Mehrabian (1996) adopted the model for emotions and named the first factor pleasure, which he sees as "cognitive judgements of evaluation". Affective states which are evaluated positively are those that are pleasurable for us, while unpleasant emotions are evaluated in a negative way. Thus, when our personal emotions are concerned, our evaluation (good-bad) is identical with the way they affect us (pleasant-unpleasant).
Preference in music is not just about its affective value; it is a judgement about our general evaluation of a piece. Unpleasant emotions may be aroused in the listener, and the emotion in itself would be evaluated negatively. However, this emotion is embedded in the complex context of a piece of music (or any other form of art) and may be an integral part of the whole. As the study of Schubert has shown, these judgements are independent of each other. Preference and the valence of expressed or felt emotion are both evaluative dimensions. Henceforth, I will try to distinguish between these two forms of valence. I will use pleasure for the valence of emotional states, evaluation for preference judgements and valence as a more general term which encompasses both concepts.

4.3 Discussion

The discussion above has focused on the expression of affect in music for several reasons. Music seems to have an even stronger relationship with affective experiences than most other forms of art. Furthermore, the question about how it is possible for works of art to express emotions is most difficult to answer when the focus is on instrumental music. I have outlined only a few ways in which this seems to be possible; undoubtedly, there are many other possibilities to consider.

The analysis has uncovered that appraisal in art seems to consist of two layers. These two layers become apparent if one examines how it is possible that we enjoy works of art which express negative emotions. To distinguish between the two layers, I recommend the term pleasure to describe the emotion which is expressed in a piece, and evaluation for the appraisal process of a complete piece.

The general dimensions of affect – evaluation, activity and potency – seem to be valid for works of art, regardless of their form. However, it seems likely that specific forms of art are more suitable to express some forms of affect than others. In the case of music, it seems unlikely that some of Ekman's basic emotions are expressed. Therefore, if the goal is to describe affect, it is important to remember that expressiveness can be optimised if the explanatory model pays respect to the specificities of the art form. The general affective dimensions then provide a common ground, give a first indication of a piece's affective connotation and enable comparisons between different forms of art. This leads to the next chapter, in which I examine different ways in which affect might be described.

Chapter 5

Metalanguages for affective space

How do we gain access to the information in affective space? Chapter 2 introduced the notion of Barthes (1973/1996) that, by definition, connotation cannot be described directly but only through the use of a metalanguage. If affective space is made up of affective connotation, we need to find metalanguages which reliably reference this kind of information.

Figure 5.1: Metasign. The metasign M (signifier Msr, signified Msd) replaces the signified R'sd of the connoted system, which is built upon the actual sign R (Rsr, Rsd); the metasign carries its own connotation M' (M'sr, M'sd).

Consider a sign R, whose affective connotation R' we wish to describe. A metalanguage replaces the connoted system's signified (R'sd) with a new sign M, which will be called the metasign. To be well suited for our task, the meaning of the metasign – denotation and connotation – needs to be well known. For it to be part of an effective metalanguage, the metasign's denotation Msd needs to match the connoted system's signified R'sd as closely as possible.
In addition, since we are introducing a new sign, we cannot avoid introducing new connotations (M') at the same time. This leads to the requirement that the connotations of the metalanguage need to fit with its denotation. If denotation and connotation of the metasign do not match, its meaning becomes ambiguous, resulting in a loss of expressiveness.

The term metalanguage is used quite liberally here. Metalanguage, in the sense used here, is a system of symbols for communication, not restricted to the use of linguistic entities such as words or sentences. While natural language does constitute a powerful metalanguage, there are many other representations that have their own advantages. In the following, I will give an overview of metalanguages for affective connotation which appear to be feasible for use in interactive systems. Of course, this list is in no way exhaustive.

5.1 Affective scales

Osgood et al. (1957) have shown that affective space can be represented well through a three-dimensional model. Though the actual space must be much more complex and involves an unknown number of dimensions, rating a concept along these dimensions gives a good indicator of its affective value. Therefore, visual representations of these factors are a very direct metalanguage for affective connotation.

Evaluative scales

The research of Osgood et al. has shown that the evaluative dimension carries the greatest importance. So it is no coincidence that evaluative ratings of content are very common. Indeed, evaluative scales are ubiquitous and can be found in many forms. In their most popular form, the evaluative dimension is represented by 'star ratings'. Star classification systems have a long tradition. Hotels are given official stars to indicate their quality standard. Nowadays, countless websites, e.g. media stores, encourage their users to rate content on such scales. Another common form of evaluative scale is the score. The scale may range from 0 (worst) to 100 (best) (e.g. game reviews), 1 to 10 (e.g. performance ratings), or 5 to 1 (e.g. Austrian school and university grades), without changing the aspect of meaning covered.

The concept is easily understood and successful. Evaluative scales reduce the complexity of meaning down to a single dimension and thus give the roughest of indicators of whether a concept is worthwhile to consider. Used in this way, star qualifications and scores express the same meaning as a semantic differential for the pairing good-bad, which was determined by Osgood et al. as the pivot pair for the evaluative factor. However, the simplicity of evaluative scales also means that they are very limited in their reach. Furthermore, the analysis of music in chapter 4.2 has shown that one needs to distinguish between two forms of valence, evaluation and pleasure. Evaluative scales do not describe the affect expressed by an information entity, of which pleasure is the most important dimension. In section 5.2, I describe a study about semantic scales and music, which shows that listeners can easily distinguish between these two forms of valence.

While affective connotation is inherently subjective, evaluative scales are especially susceptible to subjectivity. Good and bad are not objective criteria and are always dependent on a person's taste, for one man's meat may be another man's poison.
Professional evaluative ratings, as in the example of hotel ratings, are therefore based on a common catalogue of criteria on which to base the ultimate decision. Teachers mark students on the basis of a list of course requirements. In the case of user-provided evaluative data, meaningful results can only be expected if a large number of ratings is compared statistically. Evaluative scales are probably the most widely used affective space interface, but are very limited in their expressiveness. They are purely reflective, describing not the affect being expressed by an information entity but whether we like or dislike that entity.

General scales: Musicovery

The concept of scales can be applied to the purely affective factors pleasure, arousal and dominance (PAD). Combined with evaluation, this gives a good indication of an entity's affective connotation. Since PAD are defined as orthogonal dimensions, they can serve as a very direct visualisation of affective space. There are not many cases in which the concept is being employed. One example is the interactive webradio Musicovery1. In addition to year of release and genre, users can also select the desired mood of the music the webradio should play. This is done via a two-dimensional representation of affective space: pleasure (defined as dark-positive) and arousal (defined as calm-energetic). A screenshot is reproduced in figure 5.2.

Figure 5.2: Musicovery. Two-dimensional pleasure-arousal affective space.

The concept is not difficult to understand for users. Even if a user does not immediately understand what the scales mean, one click automatically delivers results which approximately match the selected position in affective space. From this result, users can determine if the mood being expressed 'feels' like what they were looking for or if the selected position needs to be adjusted.

1 http://www.musicovery.com/, last accessed 2009-03-18.
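A minimal sketch of such position-based retrieval is given below: the user's click yields a pleasure-arousal point, and songs with pre-determined coordinates are ranked by their distance to it. The song entries and coordinates are invented examples; nothing here is based on Musicovery's actual implementation.

```typescript
// A minimal sketch of Musicovery-style retrieval: the user selects a point in
// pleasure-arousal space and the system returns the songs whose (pre-determined)
// positions lie closest to it. The song coordinates are invented examples.

interface Song { title: string; pleasure: number; arousal: number; }

const catalogue: Song[] = [
  { title: "Energetic dance track", pleasure:  0.7, arousal:  0.9 },
  { title: "Melancholy ballad",     pleasure: -0.6, arousal: -0.5 },
  { title: "Laid-back lounge tune", pleasure:  0.5, arousal: -0.4 },
  { title: "Aggressive rock song",  pleasure: -0.3, arousal:  0.8 },
];

function nearestSongs(pleasure: number, arousal: number, count: number): Song[] {
  return [...catalogue]
    .sort((a, b) =>
      Math.hypot(a.pleasure - pleasure, a.arousal - arousal) -
      Math.hypot(b.pleasure - pleasure, b.arousal - arousal))
    .slice(0, count);
}

// A click in the calm/positive quadrant returns the lounge tune first.
console.log(nearestSongs(0.4, -0.5, 2).map(s => s.title));
```

In an interface of this kind, the ranking would simply be recomputed for every click, so the user can iteratively adjust the selected position until the returned mood feels right.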
Musicovery only allows browsing of content. Users cannot express whether the results match what they associate with a selected position in affective space. The affective meaning of songs has been determined beforehand. According to a press release2, this is done via "40 musical descriptors", each of which can take one of 10 values. From these data, a two-dimensional position is calculated. It is not stated what the nature of these descriptors is – semantic differentials are most likely – or how they are rated – manually or algorithmically. As we have seen in section 2.2, affective experience is subjective and differs between cultures and even between single persons. A useful addition to the concept may thus be to allow users to express their own opinion about the music they listen to, fine-tuning the available data.

2 http://musicovery.com/pressRelease/PressKitMUSICOVERY.doc, last access 2009-03-15.

Though very helpful, there is also a downside to the concept of direct visual representation of affective space. A selected point in this space is not expressive on its own but only relative to the whole space. There is no actual visual representation for affect; instead, the position of the selected point needs to be shown in comparison to the whole space to become meaningful for users. This is not a big problem for a browsing interface like the one of Musicovery, since the space needs to be represented only once. The concept is unfeasible, however, for the description of content where space is limited. A search for keywords or browsing through categories delivers results which are usually heterogeneous with respect to their affective connotation. For these situations, straightforward representations of affective space cannot be used. In Musicovery, the affective factors are defined by bipolar adjectives, which in fact turns them into semantic differentials. The next section elaborates on the general use of semantic differentials.

5.2 Semantic scales

Osgood et al. devised the semantic differential as an instrument for the measurement of the affective connotation of words. However, the principle can also be applied to other forms of stimuli. For example, TV guides commonly rate films along several dimensions that typically vary across films and film genres. The scales are provided in addition to the textual description (denotation) and give viewers a quick indication of the mood to be expected from the film (affective connotation).

The example in figure 5.3 shows the five scales that are used by an Austrian TV guide. In the bottom row, a 5-step evaluative scale gives a general indicator. In the upper row, four scales with 4 steps each – 0 to 3 dots – represent film-specific semantic dimensions. The scales are not indicated by antonymous adjectives but by nouns which denote the concept. Translated into English, they might be called: thrill, humour, challenge (in the sense of challenging to the mind) and eroticism. These scales can be seen as equivalent to semantic differentials, the only difference being the way in which the scale is defined. Instead of antonymous adjectives, a noun implies the semantic dimension. The scales could be easily translated into semantic differentials, while still representing the same concepts: calm-thrilling, humourless-hilarious, simple-challenging and chaste-erotic.

Figure 5.3: A typical film rating box in a TV guide. Taken from tv-media.at

Four scales may not seem to be very specific in their meaning – after all, affective space is made up of an unknown number of dimensions. However, 4 scales with four steps each (as depicted in figure 5.3) already divide affective space up into 4⁴ = 256 subdivisions, which are tailored towards the media type – being vague in most regions, but specific where differences are important and most likely to occur. When the vocabulary of the scales is selected well for the media type, such an indication can easily be superior to, say, genre classifications. To give a practical example, the work of both the Farrelly brothers and of Woody Allen is frequently classified as "Comedy" and "Romance", but whoever is familiar with their films is aware of how different they are. Thus, on a humourless-hilarious scale, they may achieve similar ratings, while scores on simple-challenging are likely to differ considerably.

The affective picture drawn can be seen as a by-product of the scales. Since every word has its place in affective space, ratings along specific semantic dimensions automatically express something about the affective value of the entity being described. This actually is an advantage rather than a disadvantage of scales. The vocabulary can be tailored towards the media type to be most expressive and discriminating between entities. The affective value of the scales, in turn, can be determined beforehand. This can be achieved either by way of semantic differentials which have the defining nouns or adjectives as the object of investigation, or automatically through opinion mining, as explained in chapter 2.3. Locating the scales in affective space before actually using them solves the requirement that the meaning of metasigns must be well known and thus turns scales into an effective metalanguage. The media-specific statements gathered from the scales can then be used to locate entities in general affective dimensions – EPA, for example. This makes scales for different media types compatible and comparable, thus making it possible to find affective similarities between entities from different media types.
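As an illustration of the idea just outlined – locating the scales in affective space beforehand and then projecting an entity's ratings onto general dimensions – the following sketch maps the four film scales onto EPA coordinates by a weighted average. The loadings and ratings are invented; determining realistic values would itself require the semantic-differential or opinion-mining step described above.

```typescript
// A minimal sketch of how media-specific scale ratings could be projected into the
// general EPA dimensions, assuming each scale has been located in affective space
// beforehand. All loadings and ratings below are invented for illustration.

type EPA = { evaluation: number; activity: number; potency: number };

// Hypothetical locations of the four film scales in EPA space, normalised to -1..1.
const scaleLoadings: Record<string, EPA> = {
  thrill:    { evaluation:  0.1, activity:  0.9, potency:  0.5 },
  humour:    { evaluation:  0.8, activity:  0.4, potency: -0.1 },
  challenge: { evaluation:  0.2, activity: -0.2, potency:  0.7 },
  eroticism: { evaluation:  0.5, activity:  0.3, potency:  0.2 },
};

// Project ratings (0..3 dots per scale) onto EPA by a weighted average of loadings.
function toEPA(ratings: Record<string, number>): EPA {
  let total = 0;
  const sum: EPA = { evaluation: 0, activity: 0, potency: 0 };
  for (const [scale, dots] of Object.entries(ratings)) {
    const loading = scaleLoadings[scale];
    if (!loading) continue;
    const weight = dots / 3; // normalise 0..3 dots to 0..1
    sum.evaluation += weight * loading.evaluation;
    sum.activity   += weight * loading.activity;
    sum.potency    += weight * loading.potency;
    total += weight;
  }
  if (total === 0) return sum;
  return {
    evaluation: sum.evaluation / total,
    activity: sum.activity / total,
    potency: sum.potency / total,
  };
}

// A fast-paced comedy: high thrill and humour, some challenge, no eroticism.
console.log(toEPA({ thrill: 3, humour: 3, challenge: 1, eroticism: 0 }));
```

Once two entities – even from different media types – are expressed in the same EPA coordinates, their affective similarity can be compared by a simple distance measure, as in the retrieval sketch above.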
The translation problem

Speakers of German will likely agree that the presented translations for the film scales from the TV guide do not exactly match the overall tone of what the original scales express; their denotations might be the same, but their affective connotations are not. This leads to one of the major problems which scales as a metalanguage exhibit, the problem of translation. Speakers of multiple languages are very aware of how difficult it can be to catch the right tone in a translation. While it is possible in many cases to find a direct denotative equivalent across languages, an affective equivalent is much harder to achieve. For more abstract concepts, even the denotative equivalent might be missing. Instead, when there is no perfect one-to-one translation, one can employ a phrase or sentence to achieve a good overall match.

The translation problem shows how important it is to be aware of the affective connotation which is introduced alongside a metasign. It influences the overall meaning of a concept, and thus, it cannot be assured that speakers of different languages are rating the same affective dimension. This makes it very hard to compare results acquired from scales across languages; it is far from certain whether observed differences are due to cultural differences or simply arise because subjects were rating different concepts.

Study: scales and music

In an undergraduate study of mine (Spindler, 2006), scales were applied to music. 30 pieces of western popular music were selected to cover a wide range of musical styles. 15 scales were selected to be well suited to draw an affective picture of music. Scales were designed as semantic differentials, using German-language antonymous terms at both ends, and further explained by brief descriptions. In addition, a familiar star-rating was supplied to draw comparisons with evaluation. The experiment was designed as a website, with test subjects registering to allow identification across multiple sessions. The music player was embedded in the website, which ensured that subjects were only rating the song they were currently listening to. At the end of the rating process, subjects filled in a questionnaire about their experiences. For each scale, they stated how well they understood the meaning of the scale.

There was a slightly positive correlation between evaluation and ratings for other scales; subjects were likely to give a higher rating on specific scales when they liked the song. Interestingly, there was almost no correlation between the scale desperate-happy, which tested for pleasure, and the evaluative scale (Pearson r correlation of 0.1). This result is in line with the dissociation theory put forward by Schubert (see section 4.2) and implies two forms of valence.
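For readers unfamiliar with the measure, the sketch below shows the computation behind the reported coefficient: Pearson's r between per-song ratings on two scales. The rating vectors are invented toy data, not values from the study.

```typescript
// A small sketch of the correlation computation used above: Pearson's r between
// two rating scales across songs. In the study, a near-zero r between the pleasure
// scale and the evaluative scale was taken as evidence for two forms of valence.

function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let cov = 0, varX = 0, varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Per-song mean ratings on the desperate-happy scale and the evaluative star scale
// (invented toy data).
const pleasureRatings = [1.2, 4.5, 3.0, 2.2, 4.8];
const starRatings     = [3.0, 4.0, 2.0, 4.5, 3.5];
console.log(pearson(pleasureRatings, starRatings)); // ≈ 0.16: a weak correlation
```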
The only scale which showed a slightly negative correlation with evaluation was unsensibel-schnulzig (which roughly translates into insensitive-soppy). A possible explanation would be that the defining words for this scale themselves generally score low on evaluation. If one favours a piece of music, one is unlikely to link it with words that describe an undesired concept. Vice versa, a piece of music that is really disliked may also be described with unfavourable words. This is another indicator that the affective connotation of metasigns needs to be taken into account.

Subjects achieved reasonable agreement on most scales. In most cases, there was a distinct tendency towards one value on each scale, with the distribution of ratings often approximating a Gaussian distribution. Taking the data from the questionnaire into account, subjects reached better agreement on scales whose meaning was very clear.

5.3 Affect words in natural language

Another choice for a metalanguage is natural language. In this case, we use language – words, sentences or longer passages of text – to denote the affective value which is connoted by an entity.

The most expressive option is free text. Here, the message can be as subtle as our writing capabilities allow. Writers – especially poets – excel here; the imagery of poetry is very expressive of affect. However, this option is not feasible as a helpful metalanguage. Free text is probably as complicated to analyse as the information entity we wish to describe. In this case, we just replace an image, a piece of music or an essay with another piece of text, without getting closer to our goal. Thus, free text does not fulfil the requirement that the metalanguage must employ signs whose full meaning is clear and unambiguous.

More suitable are affect words, i.e. words which denote affect. For instance, this includes the names most frequently employed for basic emotions: anger, fear, surprise, disgust, joy. Of course, there are countless other words which would fit this list equally well. In this case, the complexity of text is reduced to single signs with well-known meaning, which fulfils the requirements laid out. An advantage of affect words is that denotation and affective connotation match by definition; they are, in a way, affect in a very pure form. To become a feasible metalanguage for use in interactive systems, affect-laden words must be used in an implementation which allows us to describe entities with affect words. On the web, this implementation can be found in tags.

Tags

Tags and keywords are a very direct form of natural language description. Libraries have used keywords for a long time to classify content without having to rely on the title of a work. This classification would be made according to a controlled vocabulary. Clearly disambiguating the meaning of the keywords ensured that keywords applied by different librarians would be comparable.

Tags are keywords too, but they do not follow a controlled vocabulary, nor are they created by professionals. Tags are usually applied on the web, and users can pick arbitrary words to describe content. This approach is particularly well suited to the web. Inherently open-ended, tags introduce a flat, non-hierarchical form of structure not captured by planned approaches. Their biggest strength might lie in their ability to adapt. Language continually changes, inventing and coining new words for concepts which cannot be described yet.
And as soon as there is a word to denote a concept, it can become a tag. In comparison, controlled vocabularies are slow, and, while being well-designed for a particular task, do not adapt well to changing requirements. This phenomenon has been compared to desire lines (Mathes, 2004), a concept from city planning. When people frequently diverge from paved walking paths and use shortcuts through grass, trails become visible over time. Desire lines thus indicate the routes people want to take, which is something tags excel at. At the same time, the ability to adapt is also a problem of tags. Words frequently are synonymous, so one tag might reference very different concepts. Also, the choice of words is down to taste, some users prefer one word over another in face of homonyms. Spelling variants and mistakes, as well as the decision between singular and plural form, introduce distinct tags which should actually refer to the same concept. And, in many cases, the concept is not applied correctly by users. For instance, in many cases title words are just reproduced as tags word by word, leading to not very helpful tags like “the”. In face of so many arbitrary choices, librarians may shudder and praise controlled vocabularies. To counter these problems, input dialogs for tags usually support the user in their choice of words by suggesting the choices of other users. Suggesting a few tags from which the user can pick conforms with the principle “answers first, then questions”. Even if this principle is not applied, when tags are used for a while, a relatively stable vocabulary of frequently used tags emerges. This concept is sometimes referred to as folksonomy, a portmanteau of folk and taxonomy (Mathes, 2004, Spiteri, 2007). Unlike taxonomies, tags speak, in a way, the language of people themselves – their desire lines – and are therefore more likely to be accepted and understood by fellow users and can prove helpful in search of information. An example of how tags can be used successfully are social bookmarking services like delicious3 , where users store their bookmarks online, along with tags to describe the bookmarked website. This allows the user to find the bookmark again without having to remember the title of the website, only the concept it represents. Additionally, tags are aggregated across users, allowing to find websites about a particular topic which other users have deemed worthwhile. In this way, tags also function as a recommender service. 3 http://www.delicious.com, last accessed 2009-03-18. CHAPTER 5. METALANGUAGES FOR AFFECTIVE SPACE 54 Another example are plot keywords on IMDB4 . In this case, users name elements which occur in the plot of a film. In tags, the various aspects of meaning may all be covered, for tags are as expressive as natural language is. In the case of IMDB’s plot keywords, they usually provide information about an item’s denotation, the film itself. Tags on delicious, on the other hand, mostly group content by topics and thus target conceptual connotation. And, through the use of affect words, tags may also describe affective connotation. Case study: tags as affect metalanguage for music Chapter 4 gave an overview about the intrinsic relationship between music and emotion. A recurring notion throughout the literature about this topic is that emotional experience and emotional needs are important reasons for humans to engage in music, be it creating, playing or listening to music (e.g. Scherer and Zentner, 2001). 
This is also confirmed in studies where people are asked about their reasons for engaging in music (e.g. Boal-Palheiros and Hargreaves, 2001, Tarrant et al., 2000). Since tags seemingly represent, as explained before, the desire lines of users, a look at folksonomies for music should therefore mirror the importance of emotions in music.

Since August 2005, the social music platform last.fm5 has allowed users to describe music through tags. As is the case with delicious, the individual tags are aggregated. The “top tags” page6 lists the 150 most frequently applied tags in a “tag cloud”, which constitute a folksonomy created by the last.fm users. In line with folksonomy theory, the vocabulary is quite stable; in the period from October 2007 to January 2009, only 7 out of the 150 top tags made way for new ones, and those that vanished or were added are only found among the less frequently used tags (indicated by the smallest font size).

Figure 5.4: last.fm top tags from January 20, 2009

The list of top tags from January 20, 2009 (see figure 5.4) was analysed in detail. Out of the 150 tags, about two thirds clearly represent a musical genre. Genres are a common way to classify music. According to Moore (2001), the term is similar, but not identical, to style and has no universally agreed upon definition. It might refer to conventions in instrumentation, composition, the way of playing, or the effect to be achieved by the music. Applied by media or listeners, it is not uncommon for musicians to be unhappy with being confined to a genre (e.g. Zorn, 1999). However, the popularity of genre names as tags might suggest that the concept is genuinely helpful to listeners and reflects a way in which we classify music. An advantage of using tags instead of a controlled vocabulary for genre descriptions lies, again, in their adaptability. Pieces of music are not restricted to one category since multiple tags can be applied, and new musical currents may be embraced quickly.

It is possible to infer affective information from some genres. For instance, ‘death metal’ or ‘punk’ may imply that the music expresses some form of ‘anger’. Generally, though, genre classifications seem to cover conceptual connotation, with some hints towards denotation and affective connotation. The ongoing work of the music information retrieval community towards automatic genre classification (e.g. Scaringella et al., 2006, McKay and Fujinaga, 2004) suggests that genres imply enough distinguishing musical features to be captured algorithmically.

Table 5.1: Classification of last.fm top tags which do not describe genres

Conceptual connotation
  instrumentation: acoustic, female, female vocalist, female vocalists, guitar, male vocalists, piano
  origin: american, canadian, finnish, french, german, latin, polish, russian, swedish, uk
  time: 00s, 60s, 70s, 90s
  personal: albums i own, seen live
  miscellaneous: atmospheric, classic, cover, experimental

Affective connotation
  evaluative: awesome, cool, favorite, favorites, favorite songs, favourite, favourites
  affective: beautiful, fun, love, melancholy, mellow, sad, sexy

The remaining 43 tags may be classified according to table 5.1. I have tried to categorise the tags according to the kind of meaning covered.

4 http://www.imdb.com, last accessed 2009-03-18.
5 http://last.fm, last accessed 2009-03-18.
6 http://last.fm/charts/toptags
One can see that tags are used for a wide range of concepts; they may describe a song’s instrumentation, its origin and release date. ‘albums i own’ and ‘seen live’ are simply personal reminders, which do not describe the song in any way. All the classes of tags named so far describe attributes that are implied through the song, thus covering conceptual connotation. For some tags, my classification definitely is debatable. For instance, classic seems to imply positive evaluation, experimental may be seen by some as genre (in the sense that it is defining for experimental music to cross traditional genre boundaries), and love is also denotative, as the tag might refer to the lyrics of a song. So for some, this list might look a little different. The general notion is, however, that tags which describe affective connotation are not very common. 14 out of 150, or around 9% of the tags carry significant affective meaning, half of which being evaluative terms which seem to be used as personal reminders (“favourite” in 5 different forms) and may thus belong to the same category of conceptual connotation as ‘albums i own’ and ‘seen live’. The remaining 7 affective tags which are not strictly evaluative still are strongly valenced; 5 are positive, 2 are negative. As suggested in chapter 4.2, sad music can be favourable. Sadness is a very negatively evaluated emotion if genuinely experienced, but music which expresses negative emotions can still be positively evaluated. Full fledged emotions like ‘anger’ or ‘fear’ are not present in this list at all. Returning to the initial premise of this analysis, there seems to be a striking gap between what musicologists and listeners alike believe is important about music, and the ways in which listeners express their musical needs in verbal terms. Discussion I can only address the discrepancy uncovered by this analysis with a hypothesis. Although language offers the words we might be looking for, it may not be very intuitive to express affect through words. The relation between emotions and music is intrinsic, but also complicated (see 4.2). While moods and emotions are expressed by music and listeners pick music for their affective value, full-fledged emotions are not to be expected. On the contrary, music can be incredibly subtle in its affective meaning, and statements like fearful or disgusting are not the words that spring to mind when we think about music. Moreover, as suggested by Schubert (2001), actual emotions tend to be short experiences and are unlikely to be constant over the duration of a song. A tag may only summarise the mood of a song, not the emotions expressed. My results are in line with the study of Bainbridge et al. (2003), who analysed music related queries posted to Google Answers. In their study, only 2.4 % of music CHAPTER 5. METALANGUAGES FOR AFFECTIVE SPACE 57 queries dealt with affective terms. Also, the phenomenon is not restricted to music. For example, Osgood et al. (1957, p. 19) note a study in which subjects were asked to rate ice cream. Presented with a preselected vocabulary on semantic differentials, subjects could give a much wider range of confident judgements than when they were asked to describe the experience in their own words. Affect is all about personal experience. Poets are experts in producing words which capture an emotional state, but for most of us, it is not easy to find the right words. One way to address this would be to follow the “answers first, then questions” principle. 
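A minimal sketch of such a suggestion mechanism is given below. The affect vocabulary, the tag counts and the function name are hypothetical; they only illustrate the idea of offering answers before asking the question.

```python
# Hypothetical data: aggregate counts of tags applied to one track by earlier
# listeners (tag names and numbers are invented for illustration).
affect_vocabulary = ["happy", "sad", "melancholy", "mellow", "angry",
                     "tender", "energetic", "dark"]

track_tag_counts = {"mellow": 41, "melancholy": 23, "chillout": 19,
                    "sad": 7, "acoustic": 55}

def suggest_mood_tags(tag_counts, vocabulary, n=3):
    """Offer the n affect words most often chosen by other listeners,
    so the user can answer by picking rather than by finding words."""
    candidates = [(tag_counts.get(word, 0), word) for word in vocabulary]
    candidates.sort(reverse=True)
    return [word for count, word in candidates[:n] if count > 0] or vocabulary[:n]

print(suggest_mood_tags(track_tag_counts, affect_vocabulary))
# ['mellow', 'melancholy', 'sad']
```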
An interactive system could suggest a number of tags that describe moods which are likely to be communicated by music. Language is unmatched in its power to express thoughts and to describe actual content, or denotation. However, the inherently non-verbal character of affect suggests to me that there may be other ways to describe the affective value of entities, which do not rely on words and are more suitable for the task. 5.4 Colours There is a considerable body of evidence that colours have a strong connection with emotions. If this relationship can be formalised in a consistent and predictable matter, colours would be a promising candidate for an affect metalanguage. Colours can be fully described in three-dimensional colour space models, which are tailored towards different purposes and can be translated into each other. Commonly, studies about colour and emotion employ the device independent CIELAB space, which is designed to be perceptually uniform. Human colour perception is nonlinear; CIELAB is compressed so that distances in the model result in the same visually observable differences regardless of colour. The three dimensions in this model are L* or lightness, and two chromatic dimensions a* and b*. The chromatic dimensions can be reformulated as C* (chromaticity, similar to saturation) and h (hue angle). Some studies are directly based on a HSB space, in which colours are described by hue, saturation and brightness. While such a model is not corrected for human perception, it is very intuitive to understand for humans. The ability to describe colours through three independent factors makes them well suited for dimensional emotion theories. Nearly all studies consulted are searching for formulaic mappings between colour space dimensions and hypothesised affective dimensions. Valdez and Mehrabian (1994) based their study on the PAD (Pleasure-ArousalDominance) emotion model, which is declared to be equivalent to EPA (EvaluationActivity-Potency) (see section 3.3). They used semantic differentials for adjective pairs, whose positions in affective space had been determined beforehand and which were recognised to be stable on two factors and heavily dependent on the remaining factor. 76 colours were selected from the Munsell colour system and de- CHAPTER 5. METALANGUAGES FOR AFFECTIVE SPACE 58 scribed in the HSB colour space. Their study was conducted on American university students. They found that all three factors PAD are strongly influenced by brightness and saturation. Pleasure increases with both saturation and brightness. Arousal was strongly linearly dependent on saturation, but also influenced by brightness – high for low values, low for intermediate values and medium for high brightness. Dominance also increased linearly with saturation, but less strongly than arousal, and decreased with brightness. They also found a relationship between hue and pleasure. Blue was found to be most pleasant, and yellow to be least pleasant, and other colours resulted in intermediate levels of pleasure. However, compared to brightness and saturation, changes in hue did not result in such clear affective differences. For arousal and dominance, the effects of hue were reported to be weak. Ou et al. (2004) did not operate from a predetermined emotion model. Instead, they applied their own factor analysis to derive the major factors underlying colour emotions. 
However, they used 10 antonymous adjective pairs like in semantic differentials, which were picked to be highly sensitive to one of the major affective dimensions. Subjects were only given binary choices between the adjectives with no intermediate values, and for each pair, answers were summed up to derive an overall score. Their study used 20 colours picked from the CIELAB colour space and was conducted with British and Chinese subjects. While cross-cultural agreement was very high overall, there were significant differences on two scales, tenserelaxed and like-dislike, so these scales were excluded from the factor analysis. Another scale, warm-cool, was recognised as independent from other scales and also excluded. Although factor analysis would have proven the independence of the scale, Ou et al. opted to anticipate this result. From the data of the remaining seven scales, they extracted two major factors, colour activity and colour weight. A third factor, colour heat, was added from the data of the warm-cool scale. They arrive at similar conclusions as Valdez and Mehrabian; activity and weight depend on lightness and chromaticity. The only factor influenced by hue was heat, with a hue of 50º (red to red-orange) being the warmest. Ou et al. did not include a factor for valence, which will be discussed below. They also compare their results with other Asian studies, which derived the same factors activity, weight and heat, indicating that these factors really tell something basic about colour and emotions. Suk (2006) used the SAM (Self Assessment Manikin), a graphical representation of the general affective factors PAD, to measure affective responses to colours. Experiments were conducted with German and South Korean subjects. Again, the results were similar. Chromaticity was positively correlated with all three dimensions. Lightness had not such a strong influence. Differences in changes to hue were weak, but again, blue was rated as the most pleasant colour. Cross-cultural CHAPTER 5. METALANGUAGES FOR AFFECTIVE SPACE 59 reactions were similar, with South Korean subjects rating blue as more pleasant and red as more arousing than German subjects. Discussion In summary, all of the consulted studies come to very similar conclusions. Colour saturation or chromaticity has the strongest influence on the affective perception of colours; while all factors depend on chroma to some extent, the connection with activity is strongest. Valdez and Mehrabian (1994) hypothesise that a physiological reason for this influence might be that photoreceptors are stimulated more strongly by highly saturated colours. Lightness (or brightness) also influences all factors. The strongest influence is an inverse correlation with the perception of dominance, which seems to be in line with the intuitive notion that bright colours are also perceived as light, and vice versa. In comparison, the influence of hue is much weaker, at least in regard to activity and dominance. Several studies note that the common perception of red as being highly arousing actually stems from the high chromaticity of common examples of red, while green is commonly seen as soothing because the selected sample is usually not very saturated (D’Andrade and Egan, 1974). However, hue fully describes if a colour is perceived as warm or cool. Ou et al. (2004) regard warm-cool as a separate dimension of colour perception. It is a pervasive idea that colours can be warm and cool, both among artists and other people. 
This is manifested in the conventionalised labels of water-taps; hot water is red, and cold water is blue. The study confirms this intuitive notion. This suggests to me a possible application of colours in affective space interfaces; colours can reliably represent the special semantic dimension warm-cool or hot-cold.

Hue also influences if a colour was liked or disliked by subjects. In all studies, blue was perceived as the most pleasant hue, while yellow was continuously regarded as least pleasant. However, valence ratings seem to be culturally dependent; in the study of Ou et al. (2004), ratings differed so much between cultures that they decided to exclude like-dislike from the factor analysis.

Colour symbolism

One potential problem with the use of colours as metalanguage is their potential cultural dependency. Colour symbolism refers to the use of colours as symbols in culture. Kaya and Epps (2004) note that the emotions we associate with colours are highly dependent on personal preferences and previous experiences. Suk (2006) speaks of colour semantics and apparently refers to the denotative function which colours can perform. She gives several examples. For instance, green has a special meaning in Islam, being “associated with luxurious green meadows and trees, and it symbolizes paradise for those who liven in barren land” (p. 33), an importance which is mirrored by the use of green in the national flags of many Islamic countries. In traditional Chinese architecture, specific colours are associated with each of the four directions of the compass. In Western cultures, death is symbolised by black, while in some Asian cultures, it is associated with white. Suk tries to avoid such associations by showing just the colours and measuring the immediate reaction of subjects.

Emotionally}Vague7 was a web-based study about colours and emotions. While the other studies tried to determine the underlying dimensions of colour affect, this study simply asked subjects which colours they associate with basic emotions. 250 subjects from many different countries participated in the study. The results are reproduced in figure 5.5.

7 http://www.emotionallyvague.com, last accessed 2009-03-18.

Figure 5.5: Colour associations with the basic emotions joy, love, anger, sadness and fear. Taken from Emotionally}Vague.

The results mirror some of the notions of the dimensional studies. Highly arousing or active emotions are associated with saturated colours and brighter shades. Apparently, lightness is also connected with valence; the two positive emotions are associated with brighter colours than the negative ones. Anger, being negatively valenced but highly arousing, consequently is associated with highly saturated red but also black. Also note that hue has an influence; both sadness and fear are, if at all, associated with cool colours, while the positive emotions are represented by warm colours. Though there are parallels, the proposed formulas of the dimensional studies would not predict these results.

One possible explanation for these deviations is the influence of colour symbolism. Emotions are very frequent experiences in our daily lives, and specific colours have become associated with these states. As figure 5.5 shows, bright yellow stands for joy, possibly because it is how we perceive the sun. Red and pink stand for love, but red also symbolises anger. The most striking deviation concerns the influence of hue.
In all dimensional studies, blue was perceived as the most pleasant colour, while yellow was continuously regarded as the least pleasant example. However, as figure 5.5 clearly shows, yellow is by far the most common association for joy. Blue, on the other hand, most commonly occurs with sadness and only comes in sixth for joy. In the English language, to feel blue is synonymous with being depressed or sad. According to the OED, this use of blue came into common use in the 19th century. It is not clear if the association of blue and sadness is a result of this meaning of the word blue, or if the meaning came into use because the colour does reference sadness in some biological way that predates language.

To hypothesise, this seems to be another case where two layers of valence become apparent. If asked which colours we prefer, blue is the most frequent answer. Thus, the affective connotation of the colour blue has positive valence. If used as a signifier, however, blue can denote an emotional state with very negative valence. Yellow, on the other hand, can be a signifier for the emotional state of joy in the same way that the word joy is. If people are to choose their favourite colour among a selection, yellow is not preferred; thus, the affective connotation of the colour yellow itself has negative valence. However, yellow is associated with joy, which is a very positive emotion. The principle is illustrated in figure 5.6.

Figure 5.6: Hypothesised denotative functions for yellow and blue: as colour emotions, blue carries positive and yellow negative valence; as colour symbols, yellow stands for joy (positive valence) and blue for sadness (negative valence).

Since experimental results for the valence of blue and yellow differ clearly from the symbolic use of these colours, it seems believable that the dimensional studies actually measured the emotion of colours itself, and not their symbolic meaning, as intended by Suk. In the case of arousal, there seems to be no difference between how people feel about colours on their own (colour emotions) and the feelings they associate with colours when asked (colour symbolism).

Examples

There are a few examples in which the connection between colours and emotion is being employed.

Chen and Yen (2008) describe a video player which allows manual annotation of videos with emotions. Passages of a video can be tagged with an emotion which a user believes to be expressed. The emotions are represented by colours: angry (red), fear (green), sad (blue), happy (yellow). Passages can also be tagged as neutral (beige). The selection of colours mostly conforms with colour symbolism.

de Melo and Paiva (2007) describe an embodied conversational agent. They want to support the agent's expressiveness by adding colourful light and shadows to its environment. The colours used are not described in detail, but they use red light to convey anger and a black & white filter to express despair.

Cymbolism8 is an interesting project which aims to create a dictionary of word-colour associations. Visitors are presented with a random word along with a description of its denotation. The visitor can then pick one out of 19 colours to express which colour he or she associates with this word. New words can be suggested by visitors. The website is designed as a tool for designers to support them in the task of picking a colour which goes well with a certain word.

8 http://www.cymbolism.com/, last accessed 2009-03-18.

Moody

Music is strongly tied to emotions, and emotions are somehow connected with colours.
The software Moody9 builds upon this mediated relation of music and colours. It is a plugin for the iTunes media player and lets users tag their personal music collections with colours. Afterwards, the user can select a desired range of moods (colours), and the software only plays pieces of music tagged accordingly. Users can also share their colour tags online. Figure 5.7 shows a screenshot of the interface.

9 http://www.crayonroom.com/moody.php, last accessed 2009-03-18.

Figure 5.7: Moody: two-dimensional pleasure-arousal space using a colour metaphor (axes: valence and arousal)

The colours are mapped according to a two-dimensional representation of affective space. Each dimension is represented as a 4-step scale, resulting in 16 possible colours. The horizontal axis is defined as sad-happy, representing pleasure, and the vertical axis is defined as calm-intense, representing arousal (see figure 5.7 on the right). The colours for the four extremes of this affective space are: blue for negative pleasure and low arousal (−P −A), red for negative pleasure and high arousal (−P +A), green (+P −A) and yellow (+P +A). This selection has clearly been made for the symbolic meaning of these colours: yellow for joy, red for anger, and blue for sadness. The emotions associated with these colours occupy the same positions in a two-dimensional affective space: sadness (−P −A), anger (−P +A) and joy (+P +A). The intermediate colours are blends of the corner colours. In an L*a*b* colour space, yellow and blue constitute axis b*, and red and green constitute axis a*. This makes green a logical choice for the fourth colour.

Figure 5.8 shows the 16 colours of Moody in a chromaticity diagram (a two-dimensional diagram with a* as abscissa and b* as ordinate). The numbers in the colour spots represent each colour's lightness value. The colours are distributed well in a*b* space. The colours represent two affective dimensions, valence and arousal; their orientations in a*b* space are also indicated. In this implementation, the affective dimensions lie diagonally in a*b* space.

There are some parallels that can be drawn between the empirically derived formulas from the cited studies and the implementation in Moody. For the most part, chromaticity is positively correlated with valence and arousal. The only exception is blue, which represents lowest valence and lowest arousal but is not the colour with the lowest saturation. Apart from that, there is a clear tendency for colours farther from the centre to represent higher valence or arousal. As in the empirical studies, lightness is also positively correlated with both valence and arousal. The biggest difference to pure colour emotions as inferred in the empirical studies is the influence of hue. In Moody, hue differences are an important differentiation between represented affective states, resulting in an almost equidistant distribution of the colours in a*b* space.

Moody is an interesting example of an affective space interface. Affective space is reduced to two dimensions, with only four steps on each dimension, resulting in 16 affective states that can be expressed. Arguably, this provides only a rough indicator. On the other hand, affective experiences are subjective and fuzzy, and offering only a few options makes it easier for users to choose between these alternatives. The use of colours as a metaphor also befits the fuzzy character of the affective value of music.
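The 4-by-4 grid of colours just described can be approximated by bilinear interpolation between the four corner colours. The sketch below illustrates the principle; the RGB values of the corners are rough approximations chosen for illustration and are not the colours actually used by Moody.

```python
# Corner colours of the valence-arousal grid (RGB approximations, not Moody's
# actual values): columns run from sad to happy, rows from calm to intense.
CORNERS = {
    ("sad", "calm"):      (40, 80, 200),    # blue:   -P -A
    ("sad", "intense"):   (220, 30, 30),    # red:    -P +A
    ("happy", "calm"):    (40, 180, 70),    # green:  +P -A
    ("happy", "intense"): (240, 220, 40),   # yellow: +P +A
}

def lerp(a, b, t):
    return a + (b - a) * t

def moody_colour(valence, arousal, steps=4):
    """Blend the corner colours for one grid cell; valence and arousal are 0..steps-1."""
    tv = valence / (steps - 1)   # 0 = sad,  1 = happy
    ta = arousal / (steps - 1)   # 0 = calm, 1 = intense
    colour = []
    for channel in range(3):
        calm = lerp(CORNERS[("sad", "calm")][channel],
                    CORNERS[("happy", "calm")][channel], tv)
        intense = lerp(CORNERS[("sad", "intense")][channel],
                       CORNERS[("happy", "intense")][channel], tv)
        colour.append(round(lerp(calm, intense, ta)))
    return tuple(colour)

# All 16 colours of the hypothetical grid:
grid = [[moody_colour(v, a) for v in range(4)] for a in range(4)]
```

Blending in RGB is the simplest choice; interpolating in a perceptually uniform space such as L*a*b* would keep the perceived differences between neighbouring grid cells more even.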
Colours are not very specific, and are possibly more easily associated with affective experiences induced or expressed by music than other descriptors. Furthermore, colours have the advantage that an affective state can be expressed in very little space; it is not necessary to reproduce the coordinate system to be able to locate a colour's position.

Figure 5.8: Moody: a*b* chromaticity diagram showing the positions of the 16 colours, annotated with their L* (lightness) values and the orientations of the valence and arousal dimensions.

Colour choice in Moody is symbolic. Colour symbolism fits well with their use as signifiers for the affective connotation of media. As explained before, for arousal there seems to be no difference between colour symbolism and colour emotions; chromaticity and arousal are positively correlated. In the case of valence, the intention is not to describe how favourable the song is, but how favourable the emotions are that are expressed in the song; valence is understood as a happy-sad scale. The situation might be different if the other form of valence, like-dislike, were to be expressed. In this case, colours which are favourable and unfavourable should be picked. Of course, star ratings provide a much more familiar and unambiguous scale for this type of valence.

5.5 Facial expressions

Research of human emotions is closely tied to the study of facial expressions, frequently seen as the primary way for humans to communicate their emotional state (Ekman, 1999b, Buck, 1984). The relationship seems to be so intrinsic that it is hard to write about emotions without explaining facial expressions, and vice versa. In my case, the topics were divided up to emphasise the role facial expressions can play in affective space interfaces, being one of several metalanguages to choose from. However, because emotion theories commonly explain both phenomena, several notions from chapter 3 will reoccur here.

Darwin (1872) pioneered the view that facial expressions have developed evolutionarily. He was the first to undertake cross-cultural research in this area and concluded that certain facial expressions are universal, i.e. exhibited and understood by humans regardless of cultural background. He also showed that some emotion-related facial expressions are exhibited by animals as well, and that even infants have the capability of recognising emotional states from facial expressions in other humans. Facial expression research only became relevant again in the second half of the twentieth century. Since then, many of his findings have been confirmed by newer research.

Buck (1984) explains the communicative function of emotions through a communication metaphor: emotional states are encoded in a facial expression. Such facial expressions are recognised by other humans via sensory input and are subsequently decoded to infer the emotional state of the person who exhibited the facial expression.

Perception: categorical vs. dimensional

Being intrinsically tied to emotions, facial expressions are described with the same models; there are proponents of a dimensional approach and a categorical approach. For instance, Schlosberg's early dimensional theories of emotion (1952, 1954) were actually attempts to describe facial expressions. In fact, much of what is known about emotions has been derived from studies of facial expressions.
Perhaps the most prominent proponent of the connection between emotions and universal facial expressions is Paul Ekman. His cross-cultural research (Ekman et al., 1972), from which he inferred the universality of six basic emotions, shaped the idea that emotions are perceived categorically. His views have been challenged numerous times. Ortony and Turner (1990, p. 320) state that for Ekman, states which do not result in distinct facial expressions are not actual emotions. This notion is not entirely true, though. Ekman and Friesen (1975) acknowledge the idea that basic emotions are in fact emotion families, which are emotions that are similar in regard to physiological activity and expressive behaviour. This notion renders the distinction between categorical and dimensional models somewhat obsolete; the useful observation of dimensional models that emotions vary in their similarity is acknowledged by proponents of categorical explanations. Moreover, several studies have confirmed Ekman’s notions of universality and categorical perception of emotions. For instance, Etcoff and Magee (1992) conducted a study for which they produced series of drawings which blend from one emotion CHAPTER 5. METALANGUAGES FOR AFFECTIVE SPACE 66 to another. Their results showed clear perception boundaries around the pictures which expressed two emotions in the same amount, and subjects hardly ever used two emotion terms to describe a face although encouraged to do so. The only exception which did not seem be perceived as a separate category was surprise. This leads them to suggest that surprise may actually be a cognitive state which can co-occur with true emotions. Young et al. (1997) conducted a study which tries to determine if facial expression perception is better explained by a dimensional or categorical model. They too prepared images which show mixtures of two emotions; in this case, morphs of real photographs were used. Again, subjects picked one of the basic emotions in most cases. In a second experiment, they produced blends with a neutral expression and again noticed a perception boundary. They also note that distance from pure emotion had an influence on reaction time, and in some cases, subjects accurately judged the second emotion which had been blended in to produce the morph. They draw the conclusion that facial expression perception is categorical, not dimensional. In two-dimensional models like the circumplex model (Russell, 1980), emotions build antonymous pairs. For instance, happiness is seen as the opposite of sadness. Therefore, a mixture of these emotions should give a neutral expression. Since their subjects did not judge such mixtures as neutral, they conclude that a two-dimensional model can not be accurate. Instead, emotions displayed by facial expressions must lie in a higher-dimensional space, in which distinct emotions are connected without passing through the neutral state. “Dimensions such as pleasantunpleasant thus correspond to intellectual, not perceptual constructs.” (Young et al., 1997, p. 309) It should be remembered, however, that a three-dimensional model (e.g. Mehrabian, 1996, Osgood, 1976) can reliably distinguish basic emotions. This result is very much in line with the notion that affective space is indeed multidimensional. Basic human emotions occupy distinct points in this space, but this is not an exhaustive description. 
However, given the importance emotions play in our lives, facial expressions capture very important affective dimensions, even in their most basic form. Of course, the message conveyed by facial expressions is usually much more complex. Actors are capable of consciously communicating much more through their facial expressions, while in our daily lives, we do so without actually noticing. Display rules Though the biological basis for emotions and facial expressions is most likely identical all over the world, actually exhibited facial expressions differ considerably due to cultural differences. Ekman and Friesen (1975) coined the term display rules, which influence how we express our emotional states. These rules are specific to culture and become inhabited over a lifetime. Humans learn these rules either by observation of other people in the same situation, or by being told what is appro- CHAPTER 5. METALANGUAGES FOR AFFECTIVE SPACE 67 priate and what is not in face of a situation. A good example is a funeral (noted by Buck, 1984 and Ekman and Friesen, 1975), where it would be inappropriate to show happiness even if one feels so inside. Also, if the secretary of a deceased businessman were to show more sadness than the widow, it might suggest that their relationship was more than only work-related. For Buck (1984), it is important to distinguish between spontaneous communication and symbolic communication. Spontaneous communication via facial expressions is a result of the emotional state we are experiencing. To employ his example, it is a sign for one’s emotional state in the same way that dark clouds signal impending rain. This is the kind of facial expression which can also be observed in animals. In humans, spontaneous facial expressions are still regulated by display rules. The facial expressions which result when we feel basic emotions fall into this category. On the other hand, symbolic communication is unique to humans and culturally dependent. We can choose to display a very wide array of facial expressions. This may be done for humorous reasons to mark an utterance as a joke, or for deceit to make others believe that we feel in a different way than we really do. Symbolic facial expressions may be so frequently used that they become habits, at which point we are hardly aware anymore that we display them. The difference becomes well visible when we compare genuine, spontaneous smiles and smiles which we choose to display. Both involve contraction of zygomaticus major, which is the facial muscle that pulls the lips upwards and apart in the way characteristic for smiles. We are aware that the smile is the facial movement which signals happiness and friendliness and can smile voluntarily. However, genuine smiles also involve contraction of orbicularis oculi, a muscle surrounding the eye. Contraction of this muscle results in slightly closed eyes and wrinkles next to the eyes. While it is possible to make this movement voluntarily, this is usually not done in a deliberate smile. This phenomenon was already noted in the 19th century by the neurologist Duchenne de Boulogne, who conducted experiments in which he electrically stimulated facial muscles to produce facial expressions. The genuine smile is now also known as the Duchenne smile. Ekman (1992b) gives a more detailed account of what distinguishes different kinds of smiles. 
Facial expressions as affect metalanguage As noticed in chapter 3, emotions are an important aspect of affect, and emotional categories all have their place in affective space. Because of their close ties with emotions, facial expressions seem to be a very promising metalanguage for affective connotation. In their basic form, they are universally understood, regardless of culture. Because they are purely visual, non-verbal signals, they solve the translation and comprehensibility problems inherent to all language-based approaches. Though basically universal, there are cultural dependencies which need to be considered carefully. Symbolic facial expressions cannot be assumed to be un- CHAPTER 5. METALANGUAGES FOR AFFECTIVE SPACE 68 derstood correctly across cultural boundaries, and display rules should certainly influence facial expression displays to be acceptable. Basic emotions are a helpful framework to explain facial expressions, and vice versa. It is important to remember, however, that a face can communicate much more than that, just as human emotions are much more complicated. When facial expressions express basic emotions, they represent just a certain region of affective space. The discussion of the relationship of affect and specific media in chapter 4 has shown, however, that music, photographs or films only seldom communicate these basic emotions; their messages are much more subtle. This can be addressed in two ways. One possibility is to tailor the facial display for the purpose at hand. This can be achieved by the use of symbols. The advantage of this approach is that the model can stay quite simple – only those affective messages that are common for the targeted media type need to be depicted. The downside of this approach is a loss in universality. A specialised facial display is less useful for media types it was not designed for. Moreover, since a simple model makes it inevitable to rely on additional symbols to make up for a lack of expressiveness, the display becomes even more culturally dependent. The best example for simplistic, tailored facial expression interfaces are emoticons, which are discussed below. The other solution are facial expression displays whose emotional expressiveness approaches that of real human faces. Computer-generated characters in animation or live-action films have become so convincing that this task seems not completely unfeasible. When this level of expressiveness is achieved, there is no need for the addition of symbols nor specialisation towards a specific application. Then, a face can serve as a highly accurate yet very subtle representative of a wide region of affective space. The development of such a display would be challenging, in both technical and artistic ways and would require artists and developers to work hand-in-hand. It would also ask for a computational model of affect that surpasses basic emotions or three-dimensional models in its expressiveness. Chapter 6 describes a first attempt at developing such a display. Abstraction and exaggeration A related question is how far a face should be abstracted to be best suited for an affective space interface. It is without question that some level of abstraction is necessary; an artificial display is an approximation towards reality no matter how sophisticated. It can be argued, though, that it is not even desirable to imitate a natural face in the most detailed way. Figure 5.9 is taken from McCloud (2006), an excellent resource for artists on how to draw comics. 
It depicts the same meaning – the affective state of anxiety – in five different levels of abstraction along a resemblance-meaning dimension. At the left end of this spectrum, a highly detailed drawing of a face conveys anxiety. CHAPTER 5. METALANGUAGES FOR AFFECTIVE SPACE 69 Figure 5.9: Anxiety in five levels of abstraction. Taken from McCloud (2006, p. 96) Towards the right side, the level of detail is gradually reduced; with increasing abstraction the level of resemblance decreases. The third picture only contains a minimal number of facial features but still expresses anxiety as accurately as the most detailed one. When the facial expression is abstracted even more, facial features are lost which are necessary to convey the emotion; in this case, the wrinkles are missing. To make up for the loss of expressiveness, the face is augmented with sweat beads, a symbol whose effectiveness depends on the viewer’s knowledge of its meaning. The word anxious constitutes the right end of the spectrum; it is pure meaning and depends entirely on the knowledge of the viewer to imagine the facial expression which this word denotes. A detailed discussion of McCloud (2006) and its implications for the design of facial expression interfaces is given in chapter 6. The goal of high expressiveness does not mean that it is necessary to depict a face in great level of detail. As we have seen, a more simple drawing can be equally effective. Indeed, realism can prove counterproductive for this task. Our faces are guided by display rules, and it is only seldom that we actually exhibit full-blown emotions in our face. Also, high realism specifically depicts a person. The more abstract the drawing, the more general it becomes, thus making it easier for users to identify or engage with the face. Facial expressions can be captured by a number of key features. These features are the eyes, eyebrows and the mouth. Wrinkles are a result of skin pushed together and are thus a by-product of the actual facial expression. In computer-generated faces, they are frequently ignored. However, they too are necessary for accurate facial expressions. Concentrating on the key features of facial expressions reduces the number of distracting elements and can help to convey the intended emotion more clearly. A final, pragmatic argument in favour of a less detailed approach is that of expectations. When a face looks very real, it might lead users to expect the same capabilities as a real person. As Picard notes, “The more complex the system, and the higher the user’s expectations, the harder it also becomes for the system’s designer to craft the appearance of natural, believable emotions.” (Picard, 1997, p. 221) CHAPTER 5. METALANGUAGES FOR AFFECTIVE SPACE 70 The expressiveness of a face can be increased further if the key features of an expression are exaggerated. This is just the device which is employed in caricatures. Brennan (1985) made an early and impressive attempt at a system which automatically creates caricature drawings. The system represents drawings by a set of curves. The drawing which should be turned into a caricature is compared with a reference drawing, and the differences between the drawings are stored as vectors. The user can then choose a factor by which the calculated vectors are multiplied. The differences to the reference picture are the drawing’s key features. Increasing the distance of this key features in the direction of their respective vectors results in an exaggerated version of the first picture. 
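The exaggeration principle described by Brennan can be stated in a few lines: every control point of the drawing is pushed away from its counterpart in the reference drawing by a chosen factor. The following sketch is a minimal illustration of that idea, not a reconstruction of Brennan's system; the point lists and the factor k are hypothetical.

```python
# Faces as lists of (x, y) control points; index i in both lists refers to
# the same facial feature (assumed, illustrative data).
Point = tuple[float, float]

def exaggerate(face: list[Point], reference: list[Point], k: float) -> list[Point]:
    """Scale the difference vectors between a face and a reference face by k.
    k = 1 reproduces the input; k > 1 yields a caricature."""
    return [(rx + k * (x - rx), ry + k * (y - ry))
            for (x, y), (rx, ry) in zip(face, reference)]

# Hypothetical mouth-corner and eyebrow points of a smiling face vs. a neutral one.
smiling = [(30.0, 52.0), (70.0, 52.0), (35.0, 30.0), (65.0, 30.0)]
neutral = [(32.0, 55.0), (68.0, 55.0), (35.0, 32.0), (65.0, 32.0)]

caricature = exaggerate(smiling, neutral, k=1.8)
```

The same formula with a factor between 0 and 1 would instead move the expression towards the reference face.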
Calder et al. (2000) conducted a study on the effects of caricaturisation. They produced caricatures of real photographs which depict basic emotions. They employed a method similar to Brennan’s caricature generator. A computer system calculates the key features of an emotional expression by comparing it with the photograph of a neutral expression. With morphing software, these differences are then exaggerated to produce caricatures. They tested for variables like intensity of depicted emotion and face-likeness of the caricature. Their results showed that face-likeness of exaggerated photos was perceived as decreasing. However, emotion intensity was rated linearly increasing with the level of caricaturisation. Exaggeration thus seems to be a valid method to increase the range of emotions that can be depicted by facial expression displays. Emoticons In interactive systems, the most common application of facial expressions as metalanguage for affect are emoticons, which are highly abstracted facial expressions, usually represented by conventionalised character sequences. In face-to-face communication, bodily expressions and prosody – rhythm, stress and intonation – communicate the speaker’s intention to the listener. Text-based computer-mediated communication makes it very hard for people to convey the affective value of messages, because neither of these devices is available. Emoticons are used to solve this shortcoming, communicating affect through the only available device: textual characters. According to the Wikipedia entry on emoticons10 , while the principle can be traced back to the 19th century, emoticons were started to be used in 1982. The first emoticons were :-), symbolising a smiling face to mark a message as a joke, and :-(, a sad face to flag something as serious. The concept was picked up quickly as intended and has become a ubiquitous part of online communication. The symbol :-) seems to be influenced by the iconographic smiley , which became popular in the 1970s11 . Indeed, sometimes emoticons are called smileys, and in more recent 10 http://en.wikipedia.org/wiki/Emoticon, 11 http://en.wikipedia.org/wiki/Smiley, last accessed 2009-02-10. last accessed 2009-02-10. CHAPTER 5. METALANGUAGES FOR AFFECTIVE SPACE 71 years, many software applications have started to automatically replace text-based emoticons with graphical emoticons, which are usually based on this design. In Asian countries, a different style of emoticons has evolved. The basic smiling emoticon is produced as (^_^). Eastern style emoticons are not read sideways. While Western emoticons emphasise the mouth, Eastern emoticons focus on the eyes as their primary communicative signal. This may be a result of cultural differences in display rules, but I could not find a study which tries to answer this question. Another influencing factor might be the drawing style of manga, which commonly features oversized eyes. According to the Wikipedia entry on emoticons, Eastern style emoticons have been becoming more popular in the Western world lately. While text-based emoticons are very different, I could not find an instant messenger which transforms these into graphical ones. For instance, the Japanese version of Yahoo! Instant Messenger, a separate product rather than a localisation, uses the same graphical emoticons as the Western version. There are many conventionalised emoticons to express a wide range of affective states that are hard to express verbally. Derks et al. 
(2008) conclude in their study about emoticons that they are primarily used to express emotions, to put emphasis on the verbal message and to express humour. In this way, they argue, emoticons express what usually is expressed non-verbally. In an earlier study, Derks et al. (2007) noted the influence of context and valence on the use of emoticons; in a positively evaluated situation and in a social context, emoticons are more common than in task-oriented or negatively evaluated contexts. Interestingly, Walther and D'Addario (2001) conclude in their study that they could not find any influence of emoticons on the interpretation of text messages. They conclude that emoticons complement verbal messages but are not used to change their meaning. However, emoticons have become so frequently used that they lead Azuma et al. (2008) to imagine them as a possible universal visual language of the future.

Emoticons do not occur automatically as a direct reflection of our emotional state but have to be applied manually by the communicating user. Because of this voluntary nature, the use of emoticons can only be symbolic communication. The symbols used are mostly based on spontaneous facial expressions but are highly exaggerated. For example, it is a spontaneous emotional reaction of humans to blush, possibly reflecting embarrassment or modesty. The blush emoticon in the instant messenger Skype12 exaggerates the characteristic red cheeks to a symbolic level. Abstraction to a symbolic level means that the correct interpretation depends on the receiver's familiarity with the meaning of the symbol. The symbols chosen, of course, are ones that are established and well known, at least in the targeted culture. This abstraction is made for good reasons. Textual emoticons use only a few characters to display an affective state. Graphical emoticons are very small too, with common sizes around 20×20 pixels. Only the symbolic, simplistic and exaggerated character of emoticons makes it possible to depict affective states in so little space.

12 http://www.skype.com, last accessed 2009-03-18.

Textual emoticons do not need a separate interface; they are simply entered as characters. Graphical emoticons may be placed by producing the equivalent textual emoticon, which is automatically converted into the graphical representation by the software. Alternatively, users pick the desired emoticon from a list of available choices. This list is usually laid out without an underlying structure. As explained before, the affective states covered are highly specialised and may not be equally distributed in major affective dimensions.

A notable exception is Papermint13, an online community in which users are represented by comic avatars. Avatars can display facial expressions, which are selected by the user through emoticons that symbolise emotions. The available choices are arranged on a valence-arousal affective space (see figure 5.10). When the user hovers over an emoticon, a tooltip gives a one-word description of the depicted state. This example illustrates that lower-dimensional mappings onto affective space can be useful to inform the layout of interface elements, even though these elements may represent specialised affective dimensions.

13 http://www.papermint.com, last accessed 2009-03-18.

Figure 5.10: Papermint: emoticons arranged on a valence-arousal affective space.

I believe that emoticons are an excellent example of affective connotation made explicit.
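The automatic substitution of textual emoticons mentioned above amounts to a simple lookup. The sketch below maps a handful of well-known character sequences to the affective states they symbolise; the mapping is illustrative and far from exhaustive, and the labels are my own.

```python
import re

# Illustrative mapping from conventionalised character sequences to the
# affective states they symbolise (far from exhaustive).
EMOTICONS = {
    ":-)": "joy",
    ":)":  "joy",
    ":-(": "sadness",
    ":(":  "sadness",
    ":-D": "joy (intense)",
    ";-)": "irony / playfulness",
    "(^_^)": "joy (Eastern style)",
}

# Longer sequences first, so ":-)" is matched before ":)".
_pattern = re.compile("|".join(re.escape(e) for e in
                               sorted(EMOTICONS, key=len, reverse=True)))

def annotate(message):
    """Return the affect labels symbolised by emoticons found in a message."""
    return [EMOTICONS[m] for m in _pattern.findall(message)]

print(annotate("great gig last night :-D see you soon :-)"))
# ['joy (intense)', 'joy']
```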
Text-based online conversations have much in common with spoken language, being full of colloquialisms and frequently containing onomatopoeic expressions. As explained before, text-based communication is well suited for denotative discourse, while affect is difficult to express adequately. Emoticons augment a conversation that is limited in this way by its mode with the affective connotation intended by the speaker.

The ‘affective vocabulary’ of textual emoticons is not limited; new character combinations can be invented which draw on established symbols or introduce novel ones. Graphical emoticons are much more limited, being restricted to a preselected number of available states. In both cases, however, the vocabulary is highly specialised towards the target domain, the augmentation of a conversation with its intended affective connotation. Emoticons are thus an example of an affective space interface in which the affective dimensions covered are tailored to the purpose.

Example: LittleBigPlanet

In the platformer game LittleBigPlanet for the Playstation 3, players control avatars in pre-built or user-contributed levels. These avatars are called Sackboy and Sackgirl and can be customised in the game. The game is designed for online playing and lets multiple users play platform scenarios together while being physically apart.

The game recognises the importance of affect in the communication between players. Being console-based, the game is controlled via gamepads, which makes it difficult to send text-based messages to other players (though this is possible). This makes sending emoticons to others infeasible. Instead, the game allows players to express affect through their avatars. Via the gamepad's left-hand d-pad, a player can select an emotion to be expressed by his or her avatar. Four basic emotions can be expressed, each in three intensities. Joy and sadness occupy opposite directions on the vertical axis of the d-pad, which thus represents valence; fear and anger occupy the horizontal axis, which turns it into the dominance dimension. Four basic emotions are thus mapped onto a two-dimensional valence-dominance affective space, which is illustrated in figure 5.11.

Figure 5.11: LittleBigPlanet: facial expressions in a valence-dominance affective space.

The emotional states are expressed by the avatar in two ways, facial expressions and posture. Facial expressions are exaggerated, which communicates the emotion in a clear way and fits well with the comic-like visual style of the game. Facial expressions and postures are animated, which adds greatly to their expressiveness. Posture is mainly employed to communicate sadness and fear. In the presence of fear, Sackboy raises his hands defensively and leans his head back. The expression of sadness is intensified because Sackboy bends forwards and lets his arms hang down. The addition of posture is an interesting possibility to be considered in the search for ways to express affect. In the affective computing community, there is ongoing work to infer the emotional state of people from video recordings of their posture (e.g. Kleinsmith and Bianchi-Berthouze, 2007, Castellano et al., 2007).

Facial expression recognition

There are attempts at automatic recognition of facial expressions (e.g. Bartlett et al., 2003), which promise to be an interesting input method for affective information.
On the one hand, visual source material – photos and video – could be analysed algorithmically to extract some indication of expressed affect when human faces are present. On the other hand, with webcams being commonplace nowadays, facial expression recognition might become a feasible input method, which does not rely on introspection but rather uses a physiological measure to obtain information about the affective meaning of information entities. In their current state, such algorithms require expressions to be posed and can only sort them into a few categories. To be used as a physiological input method, however, algorithms would need to be capable of detecting the subtle changes in the spontaneous and weak facial expressions exhibited by users when exposed to information entities. Taking into account the inconclusive results of studies which measured facial expressions in reaction to music (see 4.2: Production of emotions in the listener), it seems doubtful that such a method will become feasible in the foreseeable future.

5.6 Discussion

I have described several metalanguages for affective space. Again, this is only a small selection out of the many possibilities worth considering. Each of the metalanguages described here seems to have its merits, so the question of which metalanguage to pick depends on the purpose.

Language-based approaches have the advantage of being very specific in their message. One possibility is to use tags which describe affective states. As the analysis has shown, users make use of this possibility to a lesser extent than might be expected. This could be countered with an interface which recommends well-suited tags to the user. Semantic scales also have the advantage of being very specific in their meaning. Such scales are usually preselected, thus avoiding the problem with tags that users need to find the right words themselves. The selection of defining words for the scales can be adjusted for the type of media being described. From these specific statements, positions on general affective dimensions can be calculated. Scales further make it possible to express different levels of intensity, something which cannot be achieved with tags. On the downside, both scales and tags exhibit the problem of translation into other languages. Translations can shift a term's meaning, which makes results difficult to compare.

Affective space can be represented directly through its most important dimensions. While the concept seems to be easily understood, expressiveness is only achieved when a given point in affective space is shown in the context of the range of possible values in this space, which limits the concept to cases in which enough screen real estate is available. Colours, on the other hand, can be represented well in spatially constrained situations. Another advantage of colours is the possibility to describe any colour with three dimensions, which seems to make them well suited for use with a general dimensional model of affect. When selecting colour as a metalanguage, however, it is important to remember the difference between colour emotions and colour symbolism. All of the examples consulted focus on colour symbolism, suggesting that this kind of association is stronger than the other.

Facial expressions seem to be the best choice if actual emotions are to be expressed. They are – to a certain degree – understood world-wide. Being entirely visual, there is no translation problem.
If abstracted and exaggerated to a certain level, their expressiveness can be increased while the required space is decreased. In the final chapter, I describe a project which aims at the development of a software component which expresses emotions through facial expressions, taking into account the findings of this chapter.

Chapter 6

Grimace

Facial expressions are, as we have seen, closely tied to emotions. Because they are universally understandable, they constitute an excellent metalanguage for this important aspect of affect. In this chapter, I describe Grimace, a web component which displays emotions through facial expressions. Reproducing facial expressions through computer graphics necessitates artistic skills. For this reason, the component was developed in co-operation with fellow student Thomas Fadrus. While I developed the concept, software architecture and the bulk of the actual implementation, my colleague focused on the visual aspects of our project.

Emotion model

Artists who want to draw convincing portraits of humans need to be expert observers of facial expression. Grimace was based on the book Making Comics by Scott McCloud (2006), a manual for artists on how to draw comics. Chapter 2 of the book deals with how to convincingly draw facial expressions. McCloud develops an emotion model which follows the categorical approach, in which he postulates 6 basic emotions:

Anger, Joy, Surprise, Disgust, Sadness, Fear

Table 6.1: The basic emotions defined by McCloud (2006)

This list is, in turn, based on the research of Paul Ekman. In a series of cross-cultural experiments, Ekman et al. (1972) showed photographs of facial expressions to members of different cultures. Since the posed expressions could be judged accurately, he inferred the universality of several specific emotions. While the existence and the number of basic emotions are still debated, his results show that these 6 emotions can be judged correctly.

The Artist's Complete Guide to Facial Expression by Gary Faigin (1990), an excellent guide to drawing detailed facial expressions, is likewise based on a categorical model. McCloud (2006) takes visual cues for his model from Faigin's work and shows how to depict emotion through facial expressions in the world of comics, offering the ideal framework for our work.

In McCloud's model, these basic emotions can be blended to achieve complex emotions. He compares this process to the way arbitrary colours can be mixed from three primary colours. Accordingly, he calls the 6 basic emotions primaries, and blendings of two basic emotions secondaries. He gives example depictions and names for all primary and secondary emotions, which we used as the basis for our work. He asserts that mixtures can occur in arbitrary intensity and might even include three emotions.

Goals

The goal of the project Grimace was to build a facial expression display using web technology which effectively conveys emotions as depicted by McCloud (2006, p. 83–85). This includes any primary (or basic) emotion and any secondary emotion (blendings of two basic emotions) in arbitrary intensity. The result should be a free component which can be easily integrated into other projects.

6.1 Related work

Realistic approaches

The development of dynamic face models is an important research area.
Most of this work is being undertaken in the field of affective computing, which aims to enhance interactive systems with affective awareness and expressiveness. In a seminal work for this research area, the goal was defined as "making machines less frustrating to interact with" (Picard, 1997, p. 214). Interactive systems should recognise the affective state of the user and adapt their output and behaviour accordingly. One commonly proposed solution is the use of embodied conversational agents (ECA). In such an environment, the system interacts with the user through an avatar with human-like appearance and expressiveness. ECAs strive to be believable conversation partners. Therefore, they need to be able to express their emotions through facial expressions. Since an ECA also speaks, another requirement is appropriate facial animation which supports the impression that the ECA is actually speaking. Usually, an ECA is modelled as a 3D head.

Wang (1993) undertook an early yet impressive attempt at building a three-dimensional face model, which predates affective computing. The face consists of a set of hierarchical b-spline patches, which can be manipulated by simulated muscles. Pan et al. (2007) focus on the notion that the basic emotions as postulated by Ekman are not the ones which are actually needed for conversations with believable agents. They developed a 3D character which expresses affective states like agreement, interest, thinking and uncertainty through facial expressions and animated gestures. Ochs et al. (2005), on the other hand, base their three-dimensional ECA on basic emotions, but allow them to be blended to achieve more subtle affective states. Albrecht et al. (2005) depart from the concept of basic emotions and base their agent on a dimensional approach. Their agent augments a text-to-speech system. The system analyses a text for certain terms whose coordinates for three affective dimensions are stored in a table. With these values, the system augments the spoken text with appropriate facial expressions. Zhang et al. (2007) follow a similar approach. In their system, high-level affective dimensions are translated into MPEG-4 Facial Animation instructions. This standard is described in detail by Pandzic and Forchheimer (2002).

There are publicly available ECA frameworks. The CSLU Toolkit (http://cslu.cse.ogi.edu/toolkit/, last accessed 2009-02-20) is one example. It is a comprehensive framework which facilitates application development with the included ECA. Another example is Xface (http://xface.itc.it/, last accessed 2009-02-20), which achieves an impressive level of realism. It makes use of the MPEG-4 Facial Animation standard.

Comic-like approaches

The approaches described above usually employ characters which are designed for a high level of realism. Such a level of realism, however, is not necessary for this project. We aim to unambiguously express emotions through facial expressions. McCloud (2006) shows that a certain level of abstraction is possible without any loss of expressiveness (see section 6.2). There are a few attempts to develop face models which aim to achieve such an abstracted, comic-like appearance.

Bartneck (2001) conducted a study on the suitability of different representations of facial expressions. Three representations at different abstraction levels were compared: photographs of a real face, an embodied conversational agent from the CSLU Toolkit, and highly abstracted emoticons (black & white, 10 × 9 pixels in size). Subjects rated facial expressions of these representations for convincingness and distinctness.
Results showed that the very abstract emoticon was perceived to be as convincing as the photograph, while distinctness was rated as decreasing with increasing abstraction levels.

Tanahashi and Kim (1999) developed a comic-like face model. Their model is designed to express four out of the six basic emotions as defined by Ekman. They also experiment with exaggeration of the features and with the addition of symbols to achieve a higher level of expressiveness. The symbols employed are highly culturally dependent. Iwashita et al. (1999) also follow a comic approach in their system for caricature development. An interesting point is the use of a survey to improve the validity of the system, in which they asked subjects to pick the most expressive and convincing representation out of a few alternatives.

Schubert (2004) describes a very simple comic-like face model, which only consists of eyes and mouth. These features are represented by parabolic functions. His model is based on a dimensional approach. The shape of the mouth indicates valence, the degree to which the eyes are opened represents arousal. The model is used to visualise emotions expressed by music. Emotions are very short experiences which are not constant over the duration of a song. The model expresses these changes in affect.

Discussion

Our goal is not to build a conversational agent; we do not include animation or a speech component. The realistic three-dimensional approaches are designed for a purpose quite different from our goals. Although embodied conversational agents can express emotions through facial expressions, we believe that expressiveness can be increased further. For instance, the ECAs cited above do not include wrinkles in their design. However, wrinkles are an essential aspect of some facial expressions (disgust, for example). The comic-like models we have found, on the other hand, reduce the facial complexity to an extent which results in a loss of expressiveness. Tanahashi and Kim try to counter this with the addition of symbols. This is something we wish to avoid, because symbols are culturally dependent, while cross-cultural research has shown that facial expressions on their own are universally understood. This brief survey leads to the principles on which we built our system.

6.2 Design

Grimace is based on two principles: simplicity and accuracy. Our face is simple by design, a two-dimensional comic-like face which only consists of those facial features which are relevant to the display of emotions. In order to achieve credible facial expressions, however, the few facial features available must be flexible and accurate.

In 3D computer graphics, face models are represented by a polygon mesh. The mesh can be manipulated to show facial expressions. However, our goal of clearly displaying emotions does not call for a detailed 3D model. McCloud has shown that a certain level of abstraction is possible without any loss of expressiveness. Consider, again, figure 6.1, in which anxiety is depicted in five levels of abstraction – from a very realistic drawing to full abstraction (a word). The face in the middle uses only a minimal number of facial features while being as expressive of this emotion as the most realistic depiction. Reduced detail and complexity allows us to focus on those features which are necessary to effectively display a specific facial expression.
This is the level of abstraction we want to achieve. We believe that detailed reproduction of faces is not necessary to convey emotions unambiguously and that a certain level of abstraction can actually convey emotions more clearly than a very realistic approach. Calder et al. (2000) show that comprehensibility can be increased even further if characteristic features of facial expressions are exaggerated.

Figure 6.1: McCloud (2006, p. 96)

The Uncanny Valley

Another reason why we opted for a comic-like, abstracted approach is the notion of The Uncanny Valley. The concept was introduced in a short essay by robot researcher Masahiro Mori (1970). He asserts that representations of humans at different levels of human-likeness are not linearly correlated with the level of familiarity. While abstract depictions – like cartoon characters – are readily accepted as quite familiar and healthy persons naturally achieve the greatest level of familiarity, there is a gap in between for almost human-like depictions. These almost human-like appearances – corpses and zombies are examples given – are perceived to be creepy or uncanny. Figure 6.2 shows Mori's graphical representation of the concept. The drop in familiarity for almost human-like characters is visible as a valley, hence the name Uncanny Valley.

Figure 6.2: Graphical representation of The Uncanny Valley, plotting familiarity against human likeness for still and moving examples (industrial robot, stuffed animal, humanoid robot, Bunraku puppet, prosthetic hand, corpse, zombie, healthy person). Taken from Geller (2008), based on Mori (1970).

Bartneck et al. (2007) conducted a study in which they tried to find out if Mori's predictions are accurate. They used pictures of robots at different levels of human-likeness, as well as pictures of real humans. It has to be noted that Bartneck et al. measured the level of likeability in their study, not the level of familiarity which Mori described. Their findings did not confirm the predicted rise in likeability. More abstract depictions were perceived as more likeable than pictures of real humans. They note that knowledge about whether the photo showed a real human or a robot did not have an influence on likeability. Instead, results were highly dependent on the perceived beauty of the stimulus. From their results, they infer that it might be more accurate to speak of an Uncanny Cliff.

Geller (2008) gives an up-to-date examination of the concept. He notes that there are a number of examples which contradict Mori's predictions. One important factor which influences familiarity is the depiction of the eyes. There might be an uncanny threshold; the eyes need to be very well done to be acceptable. He closes with a recent quote of Mori, in which he, too, questions the predicted rise in familiarity. Mori now believes that the human ideal is best expressed by the calm facial expressions of Buddha statues.

If there is an Uncanny Valley, it needs to be carefully avoided for our face to be perceived as acceptable and not creepy. This we hope to achieve by choosing an abstracted, comic-like graphical approach.

Functional principles

We believe that closely following the biological process of how emotions result in facial expressions increases the credibility and clarity of the displayed emotion. In humans and animals alike, facial expressions result from the contraction of facial muscles. Usually, muscles are attached to bones at both ends. With facial muscles,
however, only one end is anchored to a bone, while the other is attached to the skin. When facial muscles contract, they move parts of our face. Each muscle affects the facial features in a certain way, and each emotion affects a different set of muscles. When several muscles are contracted in a particular way, familiar facial expressions arise which clearly convey emotions. Therefore, our facial model is muscle-based.

To achieve the desired level of expressiveness and abstraction, we had to find the facial features which are necessary to convey emotions. As it turned out, only a few facial features are needed: eyebrows, eyes and the mouth. On top of that, some emotions only become distinguishable from each other when wrinkles are taken into account. McCloud's drawings of the six basic emotions were further reduced to the essential features and wrinkles of each emotion (figure 6.3).

Figure 6.3: Essential features for 6 basic emotions.

These features had to be translated into graphical elements that could be transformed algorithmically. We found the necessary combination of accuracy and flexibility in Bézier splines. All facial features and muscles are represented by one or more splines. Figure 6.4 shows the first attempt to represent the facial features with a minimal number of Bézier splines. This early model proved incapable of expressing all necessary facial expressions and was augmented in later iterations.

Figure 6.4: First attempt to represent facial features via Bézier splines.

The shape of Bézier splines is determined by a very small number of control points. The idea was to connect virtual muscles to these control points in such a manner that contraction of the muscles would result in natural-looking transformations of the splines. Emotions would, in turn, affect multiple muscles in a concerted way. McCloud notes that the facial expressions which result from emotions are fully symmetrical. Therefore, we decided that it would be enough to model one half of the face and mirror the result to the other half.

6.3 Development

Selected technology

Grimace has been developed in Actionscript 3 and is deployed as a Flash / SWF file. Actionscript 3 is the scripting language used by Adobe Flash and Adobe Flex. Though not advertised, the language can be used on its own. The technology was selected for several reasons:

• Free: The Flex 3 SDK is available as open source under the Mozilla Public License. It contains MXMLC, an Actionscript 3 compiler, which allows generation of SWF files entirely through Actionscript 3 code, without the need for an IDE like Adobe Flash IDE or Adobe Flex Builder.

• Optimised for dynamic vector graphics: Flash originated as a vector-based animation tool and offers comprehensive vector-drawing functions.

• Optimised for the web: Flash is a web-centric technology which delivers small file sizes and can be conveniently integrated in web projects.

• Ubiquity and consistency: The Flash player is available for all major operating systems and has an extremely high install base. SWF files are displayed in exactly the same way across platforms and browsers.

Iterative development

We have adopted an iterative approach to development. Instead of defining the complete architecture beforehand, we started with only implementing the most basic functions. First, we implemented the necessary classes to define facial features. Afterwards, we added the capability for muscles to move control points of features.
In many iteration steps, muscles were optimised. Features were added over time, starting with the eyes (the simplest feature), later adding eyebrows and the mouth and finally wrinkles. The goal was to find muscle definitions which would allow the features to be transformed in such a way that they would be capable of expressing each of the 6 basic emotions. We defined this as our facial expression gamut.

Afterwards, emotions were added. Each emotion should be able to influence any number of control points in a very flexible manner. Again, we started with the emotion easiest to implement – surprise – and gradually added the other ones. The relationships between emotions and muscles were derived from McCloud's drawings. We had representations of each emotion at 4 levels of intensity. For each level, muscles were adjusted to match the expression. Thus, we achieved an indication of the tensions for each emotion and each muscle at 5 points (neutral and 4 intensity levels). To enable arbitrary intensity levels, we plotted muscle tensions over emotion intensity and applied curve-fitting. Thus, we achieved mathematical representations for each emotion-muscle pair. For example, figure 6.5 shows the relationship of two muscles with anger. The indicated forms are approximated by two mathematical functions.

Figure 6.5: Muscle tensions were plotted and interpolated for each emotion (here: the tensions of two muscles over anger intensity, one following a polynomial mapping and one a sine mapping).

Then, we repeated the complete process for wrinkles. Here, an additional challenge was the introduction of opacity. Wrinkles are not visible all the time, but only when certain muscles are contracted. The relationship between muscles and wrinkle opacity was implemented in the same way. At this point, the face could convincingly express every basic emotion. What remained to be done was to allow blendings of two emotions. For each muscle-emotion pair, we introduced an additional parameter, which controls how muscles should be contracted in the presence of multiple emotions. This process involved a great deal of optimisation work. When the face was capable of expressing any primary or secondary emotion, the component was adapted for deployment. This included the addition of a JavaScript API, which allows full control over the face's capabilities, and the construction of a project website.

6.4 Technical details

Grimace follows a muscle-based approach and thus mimics the way biological faces operate. In human and animal faces, facial expressions result from contraction of facial muscles. Facial muscles are, unlike muscles in any other part of the body, only fixed to bones at one end, while the other end is attached directly to the facial skin. This unique property allows the wide range of facial expressions humans are capable of displaying. Our face model consists of three major components: emotions, muscles and features.

• Features, which are the visible elements of a face. Features can be transformed by muscles. Typically, this includes dynamic features like eyes, eyebrows, mouth and wrinkles, as well as static features like nose and head outline.

• Muscles, which are the link between emotions and features. The shape of a muscle is defined by a spline which, when the muscle contracts, can move an arbitrary number of control points along its path.
• Emotions, which are the high-level concept which influences a number of muscles in an arbitrary fashion. Each emotion affects specific regions of the face and results in familiar facial expressions.

Figure 6.6 illustrates how these three components work together to achieve a facial expression. It shows the mouth and its surrounding muscles for a neutral expression and four states of anger. The shape of the mouth is represented by two features, upper lip and lower lip. A feature consists of several control points. The mouth is surrounded by several muscles. A muscle has a defined path and a current tension, drawn in the figure as a dot along its path. Each control point can be influenced by multiple muscles. When a muscle is contracted, it moves its tension dot along its path. Any control point which is influenced by the muscle is then moved, which results in a change of the feature's shape. Now, when an emotion is present – the example shows the influence of anger – several muscles contract simultaneously.

Figure 6.6: Influence of anger on the muscles surrounding the mouth (anger values 0.0, 0.25, 0.5, 0.75 and 1.0).

In the following, each of these components and their underlying technologies are described. This is followed by a brief description of how Grimace can be put to use in other projects. A complete overview of all the classes is given in a UML-style class diagram.

6.4.1 Muscles

As explained before, facial muscles are fixed to a bone at one end, and attached to skin at the other end. When muscles contract, they shorten and thus pull the skin towards the point where they are attached to the bone. We simulate this behaviour. The shape of a muscle is defined by a spline (see 6.4.4). However, unlike real muscles, muscles in Grimace have no width. The tension parameter of a muscle corresponds to the position t ∈ [0, 1] along the spline. Thus, t = 0 is a completely relaxed muscle, while t = 1 represents maximum tension. A muscle can be defined with the parameter initTension, which defines the neutral state for this muscle. This defaults to 0, but in some cases, a neutral face – i.e. no emotion is active – results in contracted muscles. An example is Levator palpebrae, which controls the upper eyelid. Since the eyes are halfway open in the neutral state, this muscle is defined with a non-zero initTension.

The tension of a muscle, or rather the distance between the points Q(t = initTension) and Q(t = tension), influences the position of feature nodes (see 6.4.2: Node influences). In turn, the tension of a muscle is calculated from the emotions which exhibit an influence on the muscle (see 6.4.3). Finally, muscles are grouped into instances of MuscleGroup. This grouping is optional, but currently muscles are divided into feature muscles and wrinkle muscles, the latter being additional muscles which simulate the wrinkles that result when feature muscles contract. This is a point where we had to leave an accurate biological representation to achieve the desired facial expression gamut.
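To make the muscle model more concrete, the following Actionscript 3 sketch shows a reduced muscle class along the lines just described. It is an illustration only, not the actual Grimace implementation: the real class and method names differ, and the spline is supplied here as a plain getPoint function, in the spirit of the ISpline interface described in section 6.4.4.

    package {
        import flash.geom.Point;

        // Illustrative sketch of a muscle: a spline path, a current tension and
        // an initial tension. Not the actual Grimace Muscle class.
        public class MuscleSketch {
            public var tension:Number;      // current tension t in [0, 1]
            public var initTension:Number;  // tension of the neutral face, defaults to 0
            private var getPoint:Function;  // t -> Point along the muscle's spline

            public function MuscleSketch(getPoint:Function, initTension:Number = 0) {
                this.getPoint = getPoint;
                this.initTension = initTension;
                this.tension = initTension;
            }

            // Q(tension) - Q(initTension): the displacement that, scaled by a
            // weight, is later applied to every feature node this muscle influences.
            public function displacement():Point {
                var current:Point = getPoint(tension);
                var neutral:Point = getPoint(initTension);
                return new Point(current.x - neutral.x, current.y - neutral.y);
            }
        }
    }

The essential idea is that a muscle never stores positions of its own; it only knows how far along its spline it has been pulled, and everything visible follows from that displacement.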
6.4.2 Features

Features, the visible parts of a face, can be transformed by muscles. The Feature class encapsulates distinct facial features, e.g. the upper lip, an eyebrow or a wrinkle. A feature is comprised of one or more segments, which are instances of the FeatureSegment class. The shape of a segment is defined by a spline. Thus, a feature can take an arbitrary shape by connecting several segments. A spline has two endpoints and 0 or more control points, referred to as nodes. Every point is represented by the FeatureNode class and can be influenced by an arbitrary number of muscles.

For every node-muscle influence, a weight parameter is stored. For n registered muscles, the position of node N is evaluated in the following way: for each registered muscle M, we calculate the distance between the muscle's position resulting from its current tension v and the position resulting from its initial tension t. The distance is scaled by the respective weight factor w. The node's initial position N0 is then translated by the resulting vector:

N = N0 + Σ wi · (Mi(v) − Mi(t)),  summed over the registered muscles i = 1, …, n

Features can also be filled arbitrarily, represented by the FeatureFill class. Fills can also be influenced by muscles, thus providing a way to add animation. For every fill, a FeatureNode represents a pivot point, which can then be influenced by muscles like any other node and moves the whole fill when translated.

Not every feature is constantly visible. Wrinkles result from tightening of facial skin and thus only become visible when certain muscles are contracted. To simulate this behaviour, the visibility of features can be mapped to the tension of a muscle. The relation is not direct but mediated through mappings (see 6.4.5); therefore, the feature opacity can be controlled flexibly.

6.4.3 Emotions

Emotions are the high-level concept which we aim to display via facial expressions. In real faces as well as in our implementation, the 6 basic emotions we have implemented result in distinct facial expressions, which have been said to be recognisable cross-culturally. The presence of an emotion is represented by a parameter value ∈ [0, 1], where value = 0 means the emotion is not present, and value = 1 represents maximum influence of the emotion. When an emotional state is present, it results in simultaneous contraction of a set of muscles. This contraction does not follow value linearly. For instance, some features only start to be influenced when an emotion is strongly present, while others are continuously influenced, but more strongly in early than in later stages. Therefore, for every emotion-muscle influence, we have defined a mapping which allows flexible control over how a muscle is contracted for an emotion state (see 6.4.5).

Our emotion model subscribes to the idea that complex emotions are in fact mixtures of basic emotions. If two or more emotions are present simultaneously, more than one set of muscles is affected. However, since different emotions sometimes influence the same muscles, influences have a priority parameter. When a muscle is influenced by more than one emotion simultaneously, priority defines the influence of each emotion on the final tension of the muscle. For instance, a genuine smile not only influences the shape of the mouth, but also results in squinting of the eyes. Feeling surprised, however, results in widely opened eyes. If joy and surprise are experienced together, the eyes remain open, because surprise has a stronger influence on the eyes than joy. This is represented by different priorities. Given an influence of n emotions, with emotion values vi, influence priorities pi and raw emotion tensions ti, the final tension of a muscle is calculated as:

t = Σ (vi · pi · ti) / Σ (vi · pi),  with both sums running over i = 1, …, n
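The two formulas above translate almost directly into code. The following Actionscript 3 sketch (with simplified names; not the actual Grimace classes) computes the blended tension of a muscle from several emotion influences, and the resulting position of a feature node from weighted muscle displacements.

    package {
        import flash.geom.Point;

        // Illustrative sketch of the tension and node calculations described above,
        // using plain arrays instead of the real Emotion/Muscle/FeatureNode classes.
        public class InfluenceSketch {

            // Final tension of a muscle influenced by n emotions:
            // t = sum(v_i * p_i * t_i) / sum(v_i * p_i)
            public static function blendedTension(values:Array,      // emotion values v_i
                                                  priorities:Array,  // influence priorities p_i
                                                  tensions:Array     // raw emotion tensions t_i
                                                 ):Number {
                var weighted:Number = 0;
                var norm:Number = 0;
                for (var i:int = 0; i < values.length; i++) {
                    weighted += values[i] * priorities[i] * tensions[i];
                    norm     += values[i] * priorities[i];
                }
                return (norm > 0) ? weighted / norm : 0;
            }

            // Node position: N = N0 + sum(w_i * (M_i(v) - M_i(t))), where each
            // displacement is a muscle's current point minus its neutral point.
            public static function nodePosition(n0:Point,
                                                displacements:Array,  // Points M_i(v) - M_i(t)
                                                weights:Array         // weights w_i
                                               ):Point {
                var result:Point = n0.clone();
                for (var i:int = 0; i < displacements.length; i++) {
                    result.x += weights[i] * displacements[i].x;
                    result.y += weights[i] * displacements[i].y;
                }
                return result;
            }
        }
    }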
6.4.4 Splines

Spline is the common term for the use of piecewise parametric curves in computer graphics. A spline has two endpoints and may have control points in between. With splines, complex shapes can easily be described or approximated by very few points.

Bézier curves are a form of parametric curve which is commonly used in vector-drawing and animation software. It is a notable property of Bézier curves that the curve does not run through the control points but is merely pulled towards them. All shapes in Grimace – facial features and muscles – are based on straight lines and Bézier curves. They offer an easily understandable way to model the face, and the selected technology offers native support for these types of splines.

Facial features are all visible components of the face, e.g. the eyes, the mouth or wrinkles. Each feature consists of one or more segments, and the shape of each segment is defined by exactly one spline. In addition, muscles are also based on Bézier curves; the shape of each muscle is defined by exactly one spline.

Splines implement the ISpline interface. The interface defines the getPoint(t) method, which calculates the location of a point along the spline given the position t ∈ [0, 1], where t = 0 is the starting point of the spline, and t = 1 is the endpoint.

The following splines are available for muscles and facial features (see figure 6.7):

Figure 6.7: Spline types – (a) Line, (b) Quadratic Bézier, (c) Cubic Bézier

Line: A spline which connects two endpoints with a straight line. Flash offers the native drawing method lineTo for this spline type.

Quadratic Bézier: A Quadratic Bézier curve has one control point. Flash offers the native drawing method curveTo for this spline type. The parametric form of a Quadratic Bézier curve is:

Q(t) = P0 · (1 − t)² + P1 · 2t(1 − t) + P2 · t²,  t ∈ [0, 1]

Cubic Bézier: A Cubic Bézier spline has two control points and offers great control over the curve form. If two or more Cubic Bézier splines are concatenated, they offer enough flexibility to draw all necessary facial features, including the mouth, which demands the greatest flexibility. The parametric form of a Cubic Bézier curve is:

Q(t) = P0 · (1 − t)³ + P1 · 3(1 − t)²t + P2 · 3(1 − t)t² + P3 · t³,  t ∈ [0, 1]

Flash does not offer a native drawing method for Cubic Béziers. However, the form can be approximated by lower-complexity curves like Quadratic Bézier splines. The more lower-complexity curves are used, the more accurate the form of the approximated curve becomes.
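The parametric forms above can be evaluated directly; the sketch below shows a possible getPoint(t)-style evaluation for the quadratic and cubic case (illustrative only, not the actual ISpline implementations). The approximation actually used for drawing cubic curves is described next.

    package {
        import flash.geom.Point;

        // Direct evaluation of the Bézier parametric forms given above.
        // Control points follow the notation P0..P3 used in the text; t lies in [0, 1].
        public class BezierEvalSketch {

            // Quadratic Bézier: Q(t) = P0*(1-t)^2 + P1*2t(1-t) + P2*t^2
            public static function quadraticPoint(p0:Point, p1:Point, p2:Point, t:Number):Point {
                var u:Number = 1 - t;
                return new Point(
                    p0.x * u * u + p1.x * 2 * t * u + p2.x * t * t,
                    p0.y * u * u + p1.y * 2 * t * u + p2.y * t * t);
            }

            // Cubic Bézier: Q(t) = P0*(1-t)^3 + P1*3(1-t)^2*t + P2*3(1-t)*t^2 + P3*t^3
            public static function cubicPoint(p0:Point, p1:Point, p2:Point, p3:Point, t:Number):Point {
                var u:Number = 1 - t;
                return new Point(
                    p0.x * u * u * u + p1.x * 3 * u * u * t + p2.x * 3 * u * t * t + p3.x * t * t * t,
                    p0.y * u * u * u + p1.y * 3 * u * u * t + p2.y * 3 * u * t * t + p3.y * t * t * t);
            }
        }
    }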
We have selected the Fixed Midpoint approximation method described by Groleau (2002). It approximates a Cubic Bézier with four Quadratic Béziers and offers a good trade-off between accuracy and calculation complexity. The approach is illustrated in figure 6.8. Given the four points of a Cubic Bézier C to be approximated, endpoints and control points for Quadratic Béziers Q, R, S and T are calculated in the following way:

H0 = ((C0 + C1)/2 + (C1 + C2)/2) / 2 = (C0 + C2)/4 + C1/2
H1 = ((C1 + C2)/2 + (C2 + C3)/2) / 2 = (C1 + C3)/4 + C2/2
Q0 = C0;  Q1 = (5·C0 + 3·C1)/8
R1 = (7·H0 + H1)/8;  R2 = (H0 + H1)/2
Q2 = R0 = (Q1 + R1)/2
S0 = R2;  S1 = (H0 + 7·H1)/8
T1 = (3·C2 + 5·C3)/8;  T2 = C3
S2 = T0 = (S1 + T1)/2

Figure 6.8: Fixed Midpoint approximation

In our implementation, the spline can be used like a regular Cubic Bézier with two endpoints and two control points, while the approximation is handled internally by the class.

Joiner

For some facial features, i.e. the shape of the mouth or several wrinkles, a single Cubic Bézier curve does not suffice. In these cases, two or more curves may be joined together to form a curve with additional flexibility; one feature then consists of more than one segment.

Parametric continuity Cⁿ is a description of the smoothness of concatenated parametric curves:

• C0: curves are joined.
• C1: first derivatives are equal.
• C2: first and second derivatives are equal.

Without additional measures, connected Bézier curves only offer C0 continuity. If two connected splines are to appear as a single and coherent curve, however, at least C1 continuity is necessary.

Figure 6.9: Joiner (a Joiner spline R connecting two Cubic Béziers Q and S with C1 continuity)

The Joiner spline is a Cubic Bézier spline whose control points are calculated from the control points of adjacent splines to achieve C1 continuity. The concept is illustrated in figure 6.9. A Joiner spline is constructed from two endpoints R0, R3 and two additional points RA, RB. These additional points are used to calculate the necessary control points R1, R2 to achieve C1 continuity in both endpoints. R1 and R2 lie on the lines formed by R0–RA and R3–RB. The distance of the control points from the respective endpoints on their respective axis is derived from the distance between the endpoints.

Typically, RA and RB are set to the nearest control points of adjacent splines. For instance, if Cubic Bézier Q ends in R0, then RA would be set to Q2. Likewise, if Cubic Bézier S starts in R3, then RB would be set to S1.

The Joiner class is also used for mirroring. Assume a mirror through the vertical axis at position x = 0, which results in horizontal mirroring. To ensure a smooth curve, the slope R′(x = 0) must be 0. If R0 = (x = 0, y = y0), this can be achieved by setting RA = (x < 0, y = y0). Then, R0 and RA form a horizontal line, which places R1 at (x > 0, y = y0) and results in zero slope at x = 0. When the curve is now horizontally mirrored at this point, C1 continuity is achieved.
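As a worked illustration of the Fixed Midpoint construction above, the following Actionscript 3 sketch computes the four quadratic segments from the cubic control points C0–C3 (a sketch only; in Grimace the CubicBezier class performs this approximation internally).

    package {
        import flash.geom.Point;

        // Fixed Midpoint approximation (Groleau, 2002) as given above: a cubic
        // Bézier C0..C3 is split into the quadratic Béziers Q, R, S and T.
        public class FixedMidpointSketch {

            // Weighted combination wa*a + wb*b of two points.
            private static function mix(a:Point, b:Point, wa:Number, wb:Number):Point {
                return new Point(wa * a.x + wb * b.x, wa * a.y + wb * b.y);
            }

            // Returns an Array of four [start, control, end] triples.
            public static function approximate(c0:Point, c1:Point, c2:Point, c3:Point):Array {
                var h0:Point = new Point((c0.x + c2.x) / 4 + c1.x / 2, (c0.y + c2.y) / 4 + c1.y / 2);
                var h1:Point = new Point((c1.x + c3.x) / 4 + c2.x / 2, (c1.y + c3.y) / 4 + c2.y / 2);

                var q0:Point = c0;
                var q1:Point = mix(c0, c1, 5 / 8, 3 / 8);  // (5*C0 + 3*C1) / 8
                var r1:Point = mix(h0, h1, 7 / 8, 1 / 8);  // (7*H0 + H1) / 8
                var r2:Point = mix(h0, h1, 1 / 2, 1 / 2);  // (H0 + H1) / 2
                var q2:Point = mix(q1, r1, 1 / 2, 1 / 2);  // Q2 = R0 = (Q1 + R1) / 2
                var s1:Point = mix(h0, h1, 1 / 8, 7 / 8);  // (H0 + 7*H1) / 8
                var t1:Point = mix(c2, c3, 3 / 8, 5 / 8);  // (3*C2 + 5*C3) / 8
                var t2:Point = c3;
                var s2:Point = mix(s1, t1, 1 / 2, 1 / 2);  // S2 = T0 = (S1 + T1) / 2

                return [[q0, q1, q2], [q2, r1, r2], [r2, s1, s2], [s2, t1, t2]];
            }
        }
    }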
6.4.5 Mappings

Each emotion influences a different set of muscles. McCloud offers drawings for each basic emotion in 4 intensity levels. These drawings were used as references, which we wanted to match. For each emotion and each intensity level, muscles were adjusted to match the reference drawing. The values of the muscle tensions were saved for each intensity level. Plots of the muscle tensions showed that the relationship is a different one for each combination of muscle and emotion. In some cases, the relation is a linear one – heightening the level of an emotion increases a muscle's tension. More often than not, however, the relation is much more complicated.

In order to achieve credible muscle tensions, this relationship, only indicated by 5 points (neutral and 4 intensity levels for each emotion), needs to be interpolated. We represent the relationships by a number of mathematical functions, which we call Mappings. A Mapping takes a few parameters which influence the resulting function in a flexible way to approximate the form of the underlying relationship. The IMapping interface is merely a wrapper for a low-level mathematical function and only has one method:

function y(x:Number):Number;

Every registered emotion-muscle influence is represented by a Mapping. The y method takes the current value of an emotion as parameter x and returns the current tension for the muscle. Another relation represented by Mappings is the visibility of Features. Some features – wrinkles – only become visible when a muscle is contracted. Representing this relation through Mappings allows fine-grained control over the opacity. Three mapping types are currently available:

SineMapping: This form of Mapping is defined by four parameters. The function returns y0 for x < x0, and y1 for x ≥ x1. For x0 ≤ x < x1, the curve interpolates between y0 and y1, following the form of a sine function. This results in a smooth transition between the two states.

Figure 6.10: SineMapping

y(x) = y0  for x < x0
y(x) = (0.5 · sin(π · (x − x0)/(x1 − x0) + 1.5π) + 0.5) · (y1 − y0) + y0  for x0 ≤ x < x1
y(x) = y1  for x ≥ x1

GaussMapping: This mapping represents the Gaussian function and is used in cases where a muscle is only contracted for intermediate values of an emotion, but not for low or high values.

Figure 6.11: GaussMapping – (a) influence of the scale factor a, (b) influence of the variance σ²

y(x) = a · 1/(σ·√(2π)) · e^(−(x − µ)² / (2σ²))

The mapping takes three parameters: value = a; mean = µ; variance = σ².

PolynomialMapping: This is a direct representation of a polynomial function. It can approximate any necessary form by increasing the order of the polynomial. However, the function is hard to configure manually. In practice, we used the curve-fitting methods of Grapher.app, which calculates a polynomial interpolation of desired order for a given point set.

y(x) = an·xⁿ + an−1·xⁿ⁻¹ + … + a2·x² + a1·x + a0
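The y(x) method of the IMapping interface is small enough to sketch directly. The following Actionscript 3 class shows how a SineMapping could provide it; the constructor signature and parameter names are assumptions, since only the interface method is documented here.

    package {
        // Illustrative SineMapping providing the documented IMapping method
        // y(x). Constructor parameters x0, x1, y0, y1 are assumed names.
        public class SineMappingSketch {
            private var x0:Number;
            private var x1:Number;
            private var y0:Number;
            private var y1:Number;

            public function SineMappingSketch(x0:Number, x1:Number, y0:Number, y1:Number) {
                this.x0 = x0; this.x1 = x1;
                this.y0 = y0; this.y1 = y1;
            }

            // Returns y0 below x0, y1 above x1, and a smooth sine-shaped
            // transition in between, matching the piecewise definition above.
            public function y(x:Number):Number {
                if (x < x0)  { return y0; }
                if (x >= x1) { return y1; }
                var s:Number = (x - x0) / (x1 - x0);  // normalised position in [0, 1)
                return (0.5 * Math.sin(Math.PI * s + 1.5 * Math.PI) + 0.5) * (y1 - y0) + y0;
            }
        }
    }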
6.4.6 Stroke styles

The shape of features is represented by splines. Stroke styles determine how the splines are visually represented. If no stroke style is set, the spline is simply stroked with a constant-width brush. However, in many cases, this does not deliver favourable results. Stroke styles implement the IStrokeStyle interface. The interface's draw method supplies the style with the spline to be drawn.

BrushStyle: Currently, BrushStyle is the only stroke style available. It simulates the characteristic form of a brush stroke: thin at the start, getting thicker towards the centre, and again thinner towards the end. This corresponds to the parameters startWidth, maxWidth and endWidth. From the spline to be stroked, two splines are derived which define the shape of the stroke. One spline defines the upper edge, the other one defines the lower edge. At every point of the base spline, a normal is drawn. On each normal, the positions of the points of the upper and lower splines are shifted: points of the upper spline to the left, points of the lower spline to the right. Thus, maxWidth does not directly represent the actual thickness of the resulting stroke, but the distance of the control points. The concept is illustrated in figure 6.12.

Figure 6.12: BrushStyle applied to a Cubic Bézier spline (startWidth, maxWidth at t = 1/3 and t = 2/3, and endWidth marked along the stroke).

6.4.7 Facedata file format

Faces are entirely defined through external files which are loaded at runtime. This allows the development of faces which look entirely different from the standard face we developed. Additional emotions can also be implemented. A complete set of Facedata defines the following:

• Features, which are the visible elements of a face. Features can be transformed by muscles. Typically, this includes dynamic features like eyes, eyebrows, mouth and wrinkles, as well as static features like nose and head outline.

• Muscles, which are the link between emotions and features. The shape of a muscle is defined by a spline which, when the muscle contracts, can move an arbitrary number of control points along its path.

• Emotions, which are the high-level concept which influences a number of muscles in an arbitrary fashion. Each emotion affects specific regions of the face and results in familiar facial expressions.

• Overlays, which are optional graphical elements added on top of the face to give it additional personality. In the standard model, the hairdo is an overlaid vector graphic. Pixel-based graphics can also be included.

Facedata is an XML-based file format. Currently, no graphical editor is available; Facedata has to be edited manually. A corresponding DTD is kept up-to-date with the current capabilities of Grimace (the latest version can be found at http://grimace-project.net/dtd/latest.dtd) and allows face developers to validate their files through an XML validation service (e.g. http://www.validome.org/xml/). Since the definitions can become quite large and data have to be edited manually, Facedata definitions can be spread across files. The loadFacedata API method takes an array of URLs as parameter, loading the files in the supplied order.

Deployment and use

Grimace is a self-contained component which enables the addition of facial expressions to software projects. Being written in Actionscript 3, the component is deployed as an SWF file and can be opened by Adobe Flash Player 9 and upwards. The component can be downloaded from the project website and includes detailed instructions and demo files. Control of the component is offered by an API, which is compatible with JavaScript and Actionscript 3. The recommended method is to embed Grimace into web pages and control it through JavaScript via the API. Through embedding, Grimace can also be controlled via Actionscript 3. Apart from pure AS3, this includes Flex and Flash (from version CS4 upwards). The AS3 API is basically identical to the JavaScript API but less tested.

Customisation

The download package includes a complete face in the form of a set of Facedata XML files. We encourage the development of new faces based on these definitions. Currently, no graphical editor is available; values need to be edited manually. However, the package also includes Facemap.swf, the tool we used to develop the face definition. The tool makes it possible to show muscles and their current tension, to include underlaid pictures as a reference, and to output the current state of all components.
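To illustrate the embedding route from Actionscript 3, here is a hedged sketch. Only the loadFacedata call is documented above; how the loaded component is referenced, the file names and the commented-out expression call are assumptions, so the instructions in the download package should be consulted for the actual integration steps.

    package {
        import flash.display.Loader;
        import flash.display.Sprite;
        import flash.events.Event;
        import flash.net.URLRequest;

        // Hedged sketch of embedding Grimace in a pure Actionscript 3 project.
        public class GrimaceHostSketch extends Sprite {
            private var loader:Loader = new Loader();

            public function GrimaceHostSketch() {
                loader.contentLoaderInfo.addEventListener(Event.INIT, onInit);
                loader.load(new URLRequest("grimace.swf"));
                addChild(loader);
            }

            private function onInit(e:Event):void {
                var grimace:Object = loader.content;  // untyped reference to the component
                // Documented API method: load Facedata files in the supplied order.
                grimace.loadFacedata(["face-features.xml", "face-emotions.xml"]);
                // Hypothetical call to display a blended expression; the real
                // method name and signature may differ:
                // grimace.setEmotions({joy: 0.75, surprise: 0.75});
            }
        }
    }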
6.4.8 Class diagram

Figure 6.13 gives a UML-style class diagram of the complete component. It covers the external API (ExternalCommands, JSHandler, ASHandler), the central controllers (Grimace, EmotionCore, MuscleController, FeatureController, Geom), data input (FacedataLoader, XMLFactory, XMLDraw), the emotion and muscle classes (Emotion, MuscleGroup, Muscle), the feature classes (Feature, FeatureSegment, FeatureFill, FeatureNode), the mapping classes (IMapping with SineMapping, GaussMapping and PolynomialMapping), the spline classes (ISpline, AbstractSpline, Line, QuadraticBezier, CubicBezier, Joiner) and the stroke styles (IStrokeStyle, BrushStyle).

Figure 6.13: Class diagram

6.5 Results

We believe that the goal of this project has been achieved. Our component can display all primary and secondary emotions which were depicted by McCloud. Furthermore, primaries can be blended in arbitrary intensities, thus covering states not covered before. The component has been released to the public under a Creative Commons licence. A project website, http://grimace-project.net, has been implemented. The website features a demo application that allows visitors to express arbitrary blendings of any two emotions. A download package is available, which includes the component, demo applications for all supported programming environments and comprehensive documentation on how to use the component.

Public reactions to the project were notably positive, shown in a large number of approving comments. Scott McCloud kindly featured our project on his blog on February 25, 2009, emphasising that facial expressions should be taught in school, for which our project might be very useful. We are also very thankful to Mr McCloud for his encouraging words and useful comments about our work at an intermediate stage of the project.

The resulting face is shown in figure 6.14 with a neutral expression. Figure 6.15 shows the 6 emotional primaries at four intensity levels. In figure 6.16, any combination of two primaries (both at 75% intensity level) is shown.

Figure 6.14: Neutral expression

Figure 6.15: Primary emotions (joy, surprise, fear, sadness, disgust, anger) in 4 intensity levels

Figure 6.16: Secondary emotion blendings of intensity level 3 (all pairwise combinations of the six primaries)

6.6 Discussion and future directions

The described software component Grimace displays emotions through a comic-like face. We believe that the display of emotional information is a valuable addition to information resources, and facial expressions are a natural way of expressing this kind of information. The work of McCloud (2006) was used as guide and visual reference throughout the design and development process. We believe we have found a useful compromise between simplicity and necessary detail. We include all facial features which are necessary to convey an emotion while omitting the rest. The component was developed using web technology, which allows easy deployment. We have defined an API which allows convenient integration into other projects without the need for knowledge about technical details. All configuration data are loaded from external files, which use an XML-based file format. The file format is fully documented and allows full customisation of all aspects – features, muscles and emotions.
The component is stable and ready for use for the intended purpose. A project website was implemented, from which the component and documentation can be downloaded.

The expressiveness of the face model was tested in an informal study by my colleague. Subjects were presented with basic emotions and any combination of two emotions, resulting in 22 test faces. Subjects had to select one or two emotions which they believed were expressed by the face. Answers were only counted as correct when, in the case of blendings, both emotions had been selected correctly, or, for pure emotions, only the correct emotion had been selected. The lowest success rate was reported for disgust, which was judged correctly in only 30% of the cases on its own and in an average of 46% of its blendings with other emotions. Pure joy was identified most reliably, with 93% correct answers. On average, correct answers were given in 53% of the cases. This number may not seem very high. However, even the lowest success rate is significantly higher than the chance rate, which would be 1/22 ≈ 4.5%. It also disagrees with the conclusion of Etcoff and Magee (1992) that we perceive facial expressions as belonging to one category. Our results show that humans are, to a certain extent, capable of judging blendings of basic emotions in facial expressions. However, the results of the study suggest much room for improvement of the expressiveness of the face. On the whole, however, we believe the goal of the project has been achieved in a satisfactory manner.

Still, many areas remain to be addressed, a few of which will be outlined in the following. First of all, the current face model can be further optimised. We had to add additional muscles to the principal facial muscles in a few cases to achieve the desired expressiveness. However, it might be possible to reduce the number of necessary muscles by optimising the definition of the actual muscles. Calder et al. (2000) show that the comprehensibility of facial expressions can be increased further if the characteristic features of an expression are exaggerated. Our model has a comic-like appearance, and it might be possible to make it even more expressive if we allow a certain level of unrealistic, cartoon-like exaggeration of expressions.

Customisation and extension of the current face model would become much easier if a graphical editor were available. First and foremost, such an editor should facilitate customisation of the visible features of a face. Currently, the control points for features need to be entered manually in XML files. These are the parts which can be exchanged easily. The relationships between muscles and emotions, however, need considerable attention and are quite tedious to change.

So far, the system can only display facial expressions which represent emotional states. Of course, humans can communicate much more through their faces, which can easily be observed by studying the wide range of facial expressions which actors can display. Facial expressions which cannot be expressed currently include doubt or agreement. The Facial Action Coding System, or FACS, by Ekman et al. (1978) describes a comprehensive framework of all possible facial movements. If the range of possible facial expressions is to be extended, this framework would offer a good basis. This would also mean a departure from the mirroring of facial features. Right now, facial features are completely symmetrical. In FACS, asymmetrical movement of features is possible.
Ideally, the system would still mirror those parts that are symmetrical and only consider the differences to the symmetrical state when necessary.

Chapter 7

Summary and conclusion

Summary

This thesis has examined the nature of the affective experiences that we associate with information entities like words, pictures and music, and how these experiences can be put to use in interactive systems.

Chapter 1 provided an introduction, outlining the motivation for this thesis and putting it into the research context. Chapter 2 was concerned with the nature of meaning. The sign theory, in which a signifier references a signified, formed the starting point of the analysis. Three types of meaning were distinguished: denotation, conceptual connotation and affective connotation. Denotation is the actual content, while conceptual connotation covers any objective property which is implied by an entity. Affective connotation refers to the emotions or feelings which are expressed by an entity or aroused in a recipient. Connotation cannot be described directly but only through the use of metalanguages. Work from experimental psychology shows that affective connotation can be represented well through three factors: evaluation, potency and activity. The last part of the chapter applied the distinguished aspects of meaning to current web technology.

Chapter 3 addressed the nature of affective connotation, comparing two theoretical approaches towards affect and emotions. Dimensional theories correspond to the factors derived for the description of affective connotation. Categorical approaches focus on the notion of a set of basic emotions, which fulfil specific functions in human behaviour. Affect and emotion were distinguished as being unequal, the latter being a specialised form of the former. In light of this analysis, I came to the conclusion that general affect is best described through dimensional approaches, while actual emotions are better captured through categorical theories.

Chapter 4 elaborated briefly on the question of how art is related to affect by presenting two theories. The contour theory states that art can express emotions because works of art resemble the expressive behaviour which results from emotions. The expression theory focuses on the notion that recipients construct a persona, which expresses emotions as a person would. Music seems to have a special relationship with affect and was analysed in greater detail. Several ways in which music achieves its emotional expressiveness were considered, which showed that emotional expressiveness in music is more subtle than what can be captured by basic emotions. The observation that music which produces negative emotions is sometimes seen as favourable led to the differentiation of two forms of valence, evaluation and pleasure.

Chapter 5 integrated the findings of the preceding chapters, considering various metalanguages for the description of affective connotation. Affective factors can be simply represented as scales, which constitutes a very direct metalanguage. The most common form are scales which represent the evaluative factor as star ratings. Defining scales with words turns them into semantic scales, which can be very specific in their meaning while still being expressive of general affective factors. Another language-based metalanguage is affect words applied as tags.
An informal analysis showed that this alternative, though possible with current implementations, is not very popular. Colours were considered as a metaphor for affective states. Studies were compared which looked for formulas through which affective factors and colours are related. Colour symbolism was recognised to have a strong influence which in some cases contradicts the findings of the cited studies. Finally, facial expressions were considered as a universal visual language for the description of emotions. Facial expressions seem to be best explained by a categorical approach. Abstraction and exaggeration were proposed as ways to enhance the emotional expressiveness of facial expression displays. Emoticons were analysed and presented as a very popular form of affective space interfaces.

Chapter 6 introduced Grimace, an experimental facial expression display for emotions. The face model has a comic-like appearance and is abstracted to a level just below the need for symbols. Facial features are represented by splines, which are influenced by simulated muscles. Six basic emotions each contract a different set of muscles in non-linear ways. Basic emotions can be blended, resulting in an arbitrary number of expressible emotions. The face model was implemented in Actionscript, and all configuration data are read from external XML-based files. It is deployed as a component, which can be integrated in other projects and controlled from JavaScript via an API.

Conclusion

The present work can only serve as a starting point for the development of interactive systems which target the affective experience any information entity arouses in humans. Though many questions were left unaddressed, some useful conclusions can be drawn.

First, there seems to be no single best solution to the question of how affect can best be captured in interactive systems. Rather, the decision of which option to pick depends on several factors. Here, the context of the work needs to be considered: whether there are tight spatial constraints, or whether users need to input their opinion through the interface. Also, the nature of the entities to be described has strong implications. Solutions which can be adapted for each media type, like semantic scales, seem like a good compromise between expressiveness and simplicity.

Across all these differences, a recurrent theme was the notion of three factors which capture much of what differentiates affective states. The ubiquity of these orthogonal factors leads to the notion that there really exists some form of affective space in which affective experiences can be located. These three factors – let them be called pleasure, arousal and dominance – may constitute a general metalanguage for affect-aware systems. The simplicity of such a dimensional model limits the expressiveness that can be achieved to some extent. However, these factors can constitute a common ground, which enables comparisons across media types and specific implementations.

Emotions have a special status. As affective experiences, they can be described through three dimensions. However, the specific meaning which every emotion has for humans is captured better if we think in emotional categories. Just like words have a specific meaning (denotation) and an accompanying affective experience (affective connotation), so do emotions. The functions that emotions serve are described well if we name them specifically – this is emotion's denotative level.
Since emotions denote affective experiences, their affective connotation is quite pronounced and easily determined. This leads me to the conclusion that dimensional descriptions of emotions are, in fact, descriptions of the affective connotation of emotions. This does not lessen the value of dimensional descriptions; it merely places them in an appropriate context.

Since emotions result in specific facial expressions, this intrinsic relationship can be used to visualise emotions in interactive systems. Facial expressions can be seen to denote specific emotions, just as the names of emotions do, but in a visual and universally understandable way. The presented component, Grimace, serves as a starting point for the visualisation of emotions in interactive systems. If my conclusions are correct, every emotion which can be expressed by Grimace should occupy a point in affective space on which a majority of people can agree. The reverse, however, might not be true: not every point in affective space needs to result in a distinct facial expression, because emotions are only one of several affective phenomena. A test of this hypothesis remains to be conducted.

I hope to have presented convincing evidence that affect plays an important role in human experience and is worth recognising in the design of interactive systems. In doing so, we would pay respect to human nature: the inseparable duality of reason and emotion.