Fabri, M., Moore, D.J., Hobbs, D.J. (2004) Mediating the Expression of Emotion in Educational Collaborative Virtual Environments: An Experimental Study, International Journal of Virtual Reality, Springer Verlag, London
Received: 3 September 2002 / Accepted: 2 October 2003 / Published online: 5 February 2004
http://dx.doi.org/10.1007/s10055-003-0116-7

Mediating the Expression of Emotion in Educational Collaborative Virtual Environments: An Experimental Study

MARC FABRI, DAVID MOORE
ISLE Research Group, Leeds Metropolitan University
[email protected], [email protected]
tel +44-113 283 2600, fax +44-113 283 3182

DAVE HOBBS
School of Informatics, University of Bradford
[email protected]
tel +44-1274 236 135, fax +44-1274 233 727

Abstract: The use of avatars with emotionally expressive faces is potentially highly beneficial to communication in collaborative virtual environments (CVEs), especially when used in a distance learning context. However, little is known about how, or indeed whether, emotions can effectively be transmitted through the medium of CVE. Given this, an avatar head model with limited but human-like expressive abilities was built, designed to enrich CVE communication. Based on the Facial Action Coding System (FACS), the head was designed to express, in a readily recognisable manner, the six universal emotions. An experiment was conducted to investigate the efficacy of the model. Results indicate that the approach of applying the FACS model to virtual face representations is not guaranteed to work for all expressions of a particular emotion category. However, given appropriate use of the model, emotions can effectively be visualised with a limited number of facial features. A set of exemplar facial expressions is presented.

Keywords: avatar, collaborative virtual environment, emotion, facial expression

1. Introduction

This paper outlines an experimental study to investigate the use of facial expressions for humanoid user representations as a means of non-verbal communication in CVEs. The intention is to establish detailed knowledge about how facial expressions can be effectively and efficiently visualised in CVEs.

We start by arguing for the insufficiency of existing distance communication media in terms of emotional context and means for emotional expression, and propose that this problem could be overcome by enabling people to meet virtually in a CVE and engage in quasi face-to-face communication via their avatars. We further argue that the use of avatars with emotionally expressive faces is potentially highly beneficial to communication in CVEs. However, although research in the field of CVEs has been proceeding for some time now, the representation of user embodiments, or avatars, in most systems is still relatively simple and rudimentary [1]. In particular, virtual environments are often poor in terms of the emotional cues that they convey [2]. Accordingly, the need for sophisticated ways to reflect emotions in virtual embodiments has been pointed out repeatedly in recent investigations [3,4].

In the light of this, a controlled experiment was conducted to investigate the applicability of non-verbal means of expression, particularly the use of facial expressions, via avatars in CVE systems. It is the purpose of the experiment to establish whether and how emotions can effectively be transmitted through the medium of CVE.
2. A Case for CVE for Education

Today's information society provides us with numerous technological options to facilitate human interaction over a distance, in real time or asynchronously: telephony, electronic mail, text-based chat, video-conferencing systems. These tools are useful, and indeed crucial, for people who cannot come together physically but need to discuss, collaborate on, or even dispute certain matters. Distance learning programmes make extensive use of such technologies to enable communication between spatially separated tutors and learners, and between learners and fellow learners [5]. Extensive research [6,7,8] has shown that such interaction is crucial for the learning process, for the purpose of mutual reflection on actions and problem solutions, for motivation and stimulation as well as assessment and control of progress. It has given rise to a growing body of literature in computer supported collaborative learning, cf. [9,10].

However, when communicating over a distance through media tools, the emotional context is often lost, as is the ability to express emotional states in the way one is accustomed to in face-to-face conversations. When using text-based tools, important indicators like accentuation, emotion, and change of emotion or intonation are difficult to mediate [11]. Audio conferencing tools can alleviate some of these difficulties but lack ways to mediate non-verbal means of communication such as facial expressions, posture or gesture. These channels, however, play an important role in human interaction and it has been argued that the socio-emotional content they convey is vital for building relationships that need to go beyond purely factual and task-oriented communication [11].

Video conferencing can alleviate some of the shortcomings concerning body language and visual expression of a participant's emotional state. Daly-Jones et al [12] identify several advantages of video conferencing over high quality audio conferencing, in particular an increased awareness of an interlocutor's attentional focus. However, because of the non-immersive character of typical video-based interfaces, conversational threads during meetings can easily break down when people are distracted by external influences or have to change the active window, for example to handle electronically shared data [13].

CVEs are a potential alternative to these communication tools, aiming to overcome the lack of emotional and social context whilst at the same time offering a stimulating and integrated framework for conversation and collaboration. Indeed, it can be argued that CVEs represent a communication technology in their own right due to the highly visual and interactive character of the interface that allows communication and the representation of information in new, innovative ways. Users are likely to be actively engaged in interaction with the virtual world and with other inhabitants. In the distance learning discipline in particular, this high-level interactivity, where the users' senses are engaged in the action and they 'feel' they are participating in it, is seen as an essential factor for effective and efficient learning [14].

3. The need for emotionally expressive avatars

The term "non-verbal communication" is commonly used to describe all human communication events which transcend the spoken or written word [15]. It plays a substantial role in human interpersonal behaviour.
Social psychologists argue that more than 65% of the information exchanged during a person-to-person conversation is carried on the non-verbal band [16]. Argyle [17] sees non-verbal behaviour taking place whenever one person influences another by means of facial expressions, gestures, body posture, bodily contact, gaze and pupil dilation, spatial behaviour, clothes, appearance, or non-verbal vocalisation (e.g. murmur). A particularly important aspect of non-verbal communication is its use to convey information concerning the emotional state of interlocutors. Wherever one interacts with another person, that other person's emotional expressions are monitored and interpreted – and the other person is doing the same [18]. Indeed, the ability to judge the emotional state of others is considered an important goal in human perception [19], and it is argued that from an evolutionary point of view, it is probably the most significant function of interpersonal perception. Since different emotional states are likely to lead to different courses of action, it can be crucial for survival to be able to recognise emotional states, in particular anger or fear in another person. Similarly, Argyle [17] argues that that the expression of emotion, in the face or through the body, is part of a wider system of natural human communication that has evolved to facilitate social life. Keltner [20] showed that for example embarrassment is an appeasement signal that helps reconcile relations when they have gone awry, a way of apologising for making a social faux-pas. Again, recent findings in psychology and neurology suggest that emotions are also an important factor in decision-making, problem solving, cognition and intelligence in general [see 19,21,22,23]. 4 Of particular importance from the point of view of education, it has been argued that the ability to show emotions, empathy and understanding through facial expressions and body language is central to ensuring the quality of tutor-learner and learner-learner interaction [24]. Acceptance and understanding of ideas and feelings, encouraging and criticising, silence, questioning – all involve non-verbal elements of interaction [15,24]. And given this, it can be argued that CSCL technologies ought to provide for at least some degree of non-verbal, and in particular emotional, communication. For instance, the pedagogical agent STEVE [25] is used in a virtual training environment for control panel operation. STEVE has the ability to give instant praise or express criticism via hand and head gestures depending on a student's performance. Concerning CVE technology in particular, McGrath and Prinz [26] call for appropriate ways to express presence and awareness in order to aid communication between inhabitants, be it full verbal communication or nonverbal presence in silence. Thalmann [1] sees a direct relation between the quality of a user’s representation and their ability to interact with the environment and with each other. Even avatars with rather primitive expressive abilities can potentially cause strong emotional responses in people using a CVE system [27]. It appears, then, that the avatar can readily take on a personal role, thereby increasing the sense of togetherness, the community feeling. It potentially becomes a genuine representation of the underlying individual, not only visually, but also within a social context. 
It is argued, then, that people’s naturally developed skill to “read” emotional expressions is potentially highly beneficial to communication in CVEs in general, and educational CVEs in particular. The emotionally expressive nature of an interlocutor's avatar may be able to aid the communication process and provide information that would otherwise be difficult to mediate. 5 4. Modelling an emotionally expressive avatar Given that emotional expressiveness would be a desirable attribute of CVE, the issue becomes one of how such emotional expressions can be mediated. Whilst all of the different channels for non-verbal communication - face, gaze, gesture, posture - can in principle be mediated in CVEs to a certain degree, our current work focuses on the face. For in the real world it is the face that is the most immediate indicator of the emotional state of a person [28]. While physiology looks beneath the skin, physiognomy stays on the surface studying facial features and lineaments. It is the art of judging character or the emotional state of an individual from the features of the face [29]. The face reflects interpersonal attitudes, provides feedback on the comments of others, and is regarded as the primary source of information after human speech [15]. Production (encoding) and recognition (decoding) of distinct facial expressions constitute a signalling system between humans [30]. Surakka and Hietanen [31] see facial expressions of emotion clearly dominating over vocal expressions of emotion; Knapp [15] generally considers facial expressions as the primary site for communication of emotional states. Indeed, most researchers even suggest that the ability to classify facial expressions of an interlocutor is a necessary pre-requisite for the inference of emotion. It appears that there are certain key stimuli in the human face that support cognition. Zebrowitz [32] found that, for example, in the case of an infant's appearance these key stimuli can by themselves trigger favourable emotional responses. Strongman [18] points out that humans make such responses not only to the expression but also to what is believed to be the “meaning” behind the expression. Our work therefore concentrates on the face. To model an emotionally expressive avatar face, the work of [33] is followed. It was found that there are six universal facial expressions, corresponding to the following emotions: Surprise, Anger, Fear, Happiness, Disgust/Contempt, and Sadness. This categorisation is widely accepted, and considerable research has shown that these basic emotions can be accurately communicated by facial expressions [32,34]. Indeed, it is held that expression and, to an extent, recognition, of these six emotions has an innate 6 basis. They can be found in all cultures, and correspond to distinctive patterns of physiognomic arousal. Figure 1 shows sample photographs depicting the six universal emotions, together with the neutral expression (from [35], used with permission). Figure 1: The six universal emotions and neutral expression 4.1 Describing facial expression Great effort has gone into the development of scoring systems for facial movements. These systems attempt to objectively describe and quantify all visually discriminable units of facial action seen in adults. For the purpose of analysis, the face is typically broken down into three areas: 1. brows and forehead 2. eyes, eyelids, and root of the nose 3. 
lower face with mouth, nose, cheeks, and chin These are the areas which appear to be capable of independent movement. In order to describe the visible muscle activity in the face comprehensively, the “Facial Action Coding System” (FACS) was developed [36]. FACS is based on 7 highly detailed anatomical studies of human faces and results from a major body of work. It has formed the basis for numerous series of experiments in social psychology, computer vision and computer animation [cf. 37,38,39,40]. A facial expression is a high level description of facial motions, which can be decomposed into certain muscular activities, i.e. relaxation or contraction, called “Action Units” (AUs). FACS identifies 58 action units, which separately or in various combinations are capable of characterising any human expression. An AU corresponds to an action produced by one or a group of related muscles. Action Unit 1, for example, is the inner-brow-raiser, a contraction of the central frontalis muscle. Action Unit 7 is the lid-tightener, tightening the eyelids and thereby narrowing the eye opening. FACS is usually coded from video or photographs, and a trained human FACS coder decomposes an observed expression into the specific AUs that occurred, their duration, onset, and offset time [37]. From this system, some very specific details can be learnt about facial movement for different emotional expressions of humans in the real world. For instance, the brow seems capable of the fewest positions and the lower face the most [15]. Certain emotions also seem to manifest themselves in particular areas of the face. The best predictors for anger for example are the lower face and the brows/forehead area, whereas sadness is most revealed in the area around the eyes [15]. For our current modelling work, then, FACS is adapted to generate the expression of emotions in the virtual face, by applying a limited number of relevant action units to the animated head. Figure 2 shows photographs of some alternative expressions for the anger emotion category, together with the corresponding virtual head expressions as modelled by our avatar. Equivalent representations exist for all remaining universal emotions (and the neutral expression). All photographs are taken from the Pictures of Facial Affect databank [35]. 8 Figure 2: Photographs showing variations of Anger, with corresponding virtual heads 4.2 Keeping it simple Interest in modelling the human face has been strong in the computer graphics community since the 1980s. The first muscle-based model of an animated face, using geometric deformation operators to control a large number of muscle units, was developed by Platt and Badler [41]. This was developed further by modelling the anatomical nature of facial muscles and the elastic nature of human skin, resulting in a dynamic muscle model [40,42]. The approach adopted in this study, however, is feature-based and therefore less complex than a realistic simulation of real-life physiology. It is argued that it is not necessary, and indeed may be counter-productive, to assume that a “good” avatar has to be a realistic and very accurate representation of the real world physiognomy. We argue this partly on the ground that early evidence suggested that approaches aiming to reproduce the human physics in detail may in fact be wasteful [43]. Indeed, this has been described as the Uncanny Valley [44], originally created to predict human psychological reaction to humanoid robots (see figure 3, adapted from [45]). 
When plotting human reaction against robot movement, the curve initially shows a steady upward trend. That trend continues until the robot reaches reasonably human quality. It then plunges down dramatically, even evoking a negative emotional response. A nearly human robot is considered irritating and repulsive. The curve only rises again once the robot eventually reaches complete resemblance with humans.

Figure 3: The "Uncanny Valley"

It is postulated that human reaction to avatars is similarly characterised by an uncanny valley. An avatar designed to suspend disbelief, but which is only nearly realistic, may be equally confusing and not be accepted, even considered repulsive. In any event, Hindmarsh et al [46] suggest that even with full realism and full perceptual capabilities of physical human bodies in virtual space, opportunities for employing more inventive and evocative ways of expression would probably be lost if the focus is merely on simulating the real world - with its rules, habits and limitations. It may be more appropriate, and indeed more supportive to perception and cognition, to represent issues in simple or unusual ways. Godenschweger et al [47] found that minimalist drawings of body parts, showing gestures, were generally easier to recognise than more complex representations. Further, Donath [48] warns that because the face is so highly expressive and humans are so adept in reading (into) it, any level of detail in 3D facial rendering could potentially provoke the interpretation of various social messages. If these messages are unintentional, the face will arguably be hindering rather than helping communication.

Again, there is evidence that particularly distinctive faces can convey emotions more efficiently than normal faces [32,49,50], a detail regularly employed by caricaturists. The human perception system can recognise physiognomic clues, in particular facial expressions, from very few visual stimuli [51]. To summarise, rather than simulating the real world accurately, we aim to take advantage of humans' innate cognitive abilities to perceive, recognise and interpret distinctive physiognomic clues. With regard to avatar expressiveness and the uncanny valley, we are targeting the first summit of the curve (see figure 3), where human emotional response is maximised while employing a relatively simple avatar model.

4.3 Modelling facial expression

In order to realise such an approach in our avatar work, we developed an animated virtual head with a limited number of controllable features. It is loosely based on the H-Anim specification [52] developed by the international panel that develops the Virtual Reality Modeling Language (VRML). H-Anim specifies seven control parameters:

1. left eyeball
2. right eyeball
3. left eyebrow
4. right eyebrow
5. left upper eyelid
6. right upper eyelid
7. temporomandibular (for moving the jaw)

Early in the investigation it became evident, however, that eyeball movement was not necessary as the virtual head was always in direct eye contact with the observer. We also found that although we were aiming at a simple model, a single parameter for moving and animating the mouth area (temporomandibular) was insufficient for the variety of expressions required in the lower face area. Consequently, the H-Anim basis was developed further and additional features were derived from, and closely mapped to, FACS action units. This allowed for greater freedom, especially in the mouth area.
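To make the resulting parameter set concrete, one possible in-code representation of the head's controllable features is sketched below. This is an illustrative sketch only, not the authors' implementation; the class and field names, and the normalised value range, are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class HeadControls:
    """Hypothetical control set for the animated head.

    Starts from the H-Anim head parameters, drops the two eyeball
    rotations (unnecessary here, since the head always faces the
    observer) and replaces the single temporomandibular parameter
    with several mouth-area controls derived from FACS action units.
    Values are assumed to be normalised displacements in [0.0, 1.0],
    where 0.0 is the neutral position.
    """
    left_eyebrow_raise: float = 0.0
    right_eyebrow_raise: float = 0.0
    brow_lower: float = 0.0          # brows drawn down and together
    left_upper_eyelid: float = 0.0   # raising the upper lids
    right_upper_eyelid: float = 0.0
    lid_tighten: float = 0.0         # narrowing the eye opening
    upper_lip_raise: float = 0.0     # the mouth-area controls that replace
    lip_corner_pull: float = 0.0     # the single temporomandibular
    lip_corner_depress: float = 0.0  # parameter of H-Anim
    chin_raise: float = 0.0
    mouth_open: float = 0.0          # lips part / jaw drop
```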
It has to be noted that while FACS describes muscle movement, our animated head was not designed to emulate such muscle movement faithfully, but rather to achieve a visual effect very similar to the result of muscle activity in the human face. It turned out that it is not necessary for the entire set of action units to be reproduced in order to achieve the level of detail envisaged for the current face model. In fact, reducing the number of relevant action units is not uncommon practice for simple facial animation models [see 53,54], and this study used a subset of 11 action units (see table 1).

Table 1: Reduced set of Action Units

AU   Facial Action Code      Muscular Basis
1    Inner Brow Raiser       Frontalis, Pars Medialis
2    Outer Brow Raiser       Frontalis, Pars Lateralis
4    Brow Lowerer            Depressor Glabellae, Depressor Supercilii, Corrugator
5    Upper Lid Raiser        Levator Palpebrae Superioris
7    Lid Tightener           Orbicularis Oculi, Pars Palpebralis
10   Upper Lip Raiser        Levator Labii Superioris, Caput Infraorbitalis
12   Lip Corner Puller       Zygomatic Major
15   Lip Corner Depressor    Triangularis
17   Chin Raiser             Mentalis
25   Lips Part               Depressor Labii, Relaxation of Mentalis or Orbicularis Oris
26   Jaw Drop (mouth only)   Masseter, Relaxation of Temporal and Internal Pterygoids

The relevant animation control parameters required to model facial features that correspond to these 11 action units are illustrated in figure 4.

Figure 4: Controllable features of the virtual head

As an example, figure 5 shows four variations of the Sadness emotion, as used in the experiment. Note the wider eye opening in variation 1, and the change of angle and position of the eyebrows.

Figure 5: Variations within emotion category Sadness

Certain facial features have deliberately been omitted to keep the number of control parameters, and action units, low. For example, AU12 (lip corner puller) normally involves a change in cheek appearance. The virtual head, however, shows AU12 only in the mouth corners. Also, the virtual head showing AU26 (jaw drop) does not involve jawbone movement but is characterised solely by the relaxation of the mentalis muscle, resulting in a characteristic opening of the mouth. These omissions were considered tolerable, as they did not appear to change the visual appearance of the expression significantly. Accordingly, neither the statistical analysis nor feedback from participants indicated a disadvantage of doing so.

In summary, then, we argue that the virtual face model introduced above is a potentially effective and efficient means for conveying emotion in CVEs. By reducing the facial animation to a minimal set of features believed to display the most distinctive area segments of the six universal expressions of emotion (according to [28]), we take into account findings from cognitive and social psychology. These findings suggest that there are internal, probably innate, physiognomic schemata that support face perception and emotion recognition in the face [55]. This recognition process works with even a very limited set of simple but distinctive visual clues [17,51].

5. Experimental investigation

We argue, then, that there is a strong prima facie case that the proposed virtual head, with its limited, but human-like expressive abilities, is a potentially effective and efficient means to convey emotions in virtual environments, and that the reduced set of action units and the resulting facial animation control parameters are sufficient to express, in a readily recognisable manner, the six universal emotions.
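Before turning to the experiment itself, the following sketch illustrates how such a reduced action unit set might drive the head, building on the hypothetical HeadControls structure sketched in section 4.3. An expression is given as a set of action units with FACS-style intensities (A, weakest, to E, strongest, as used in the results section); the AU-to-control mapping and the linear intensity scale are assumptions made purely for this example, not part of the published model.

```python
# Letter intensities A (weakest) to E (strongest), mapped linearly onto
# [0, 1] purely for illustration; FACS itself defines no numeric scale.
INTENSITY = {"A": 0.2, "B": 0.4, "C": 0.6, "D": 0.8, "E": 1.0}

# Assumed mapping from the 11 action units of Table 1 onto the
# hypothetical controls; a real model would tune this per face mesh.
AU_TO_CONTROLS = {
    1:  ["left_eyebrow_raise", "right_eyebrow_raise"],   # inner brow raiser
    2:  ["left_eyebrow_raise", "right_eyebrow_raise"],   # outer brow raiser
    4:  ["brow_lower"],                                   # brow lowerer
    5:  ["left_upper_eyelid", "right_upper_eyelid"],      # upper lid raiser
    7:  ["lid_tighten"],                                  # lid tightener
    10: ["upper_lip_raise"],                              # upper lip raiser
    12: ["lip_corner_pull"],                              # lip corner puller
    15: ["lip_corner_depress"],                           # lip corner depressor
    17: ["chin_raise"],                                   # chin raiser
    25: ["mouth_open"],                                   # lips part
    26: ["mouth_open"],                                   # jaw drop (mouth only)
}

def apply_expression(aus: dict) -> HeadControls:
    """Build a head pose from {action unit: intensity letter or None} pairs;
    None marks a binary action unit that is simply applied in full."""
    pose = HeadControls()
    for au, letter in aus.items():
        value = INTENSITY[letter] if letter else 1.0
        for control in AU_TO_CONTROLS[au]:
            # keep the strongest activation where two AUs share a control
            setattr(pose, control, max(getattr(pose, control), value))
    return pose

# e.g. the surprise exemplar reported in section 6 (AUs 1C 2C 5C 26):
surprise = apply_expression({1: "C", 2: "C", 5: "C", 26: None})
```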
We have experimentally investigated this prima facie argument, comparing recognition rates of virtual head expressions with recognition rates based on photographs of faces for which FACS action unit coding, as well as recognition rates from human participants, was available. These photographs were taken from [35]. A detailed description of the experimental setup is presented in this section. The aims of the experiment were (a) to investigate the use of simple but distinctive visual clues to mediate the emotional and social state of a CVE user, and (b) to establish the most distinctive and essential features of an avatar facial expression. Given these aims, the experiment was designed to address the following working hypothesis: "For a well-defined subset that includes at least one expression per emotion category, recognition rates of the virtual head model and of the corresponding photographs are comparable".

5.1 Design

The independent variable (IV) in this study is the stimulus material presented to the participants. The facial expressions of emotion are presented in two different ways, as FACS training photographs or displayed by the animated virtual head. Within each of these two levels, there are seven sub-levels (the six universal expressions of emotion and neutral). The dependent variable (DV) is the success rate achieved when assigning the presented expressions of emotion to their respective categories.

Two control variables (CVs) can be identified: the cultural background of participants and their previous experience in similar psychological experiments. Since the cultural background of participants may potentially affect their ability to recognise certain emotions in the face [32], this factor was neutralised by ensuring that all participants had broadly the same ability concerning the recognition of emotion. In the same manner, it was checked that none of the participants had previous experience with FACS coding or related psychological experiments, as this may influence perception abilities due to specifically developed skills.

We adopted a one-factor, within-subjects design (also known as a repeated measures design) for the experiment. The factor comprises two levels, photograph or virtual face, and each participant performs under both conditions:

Condition A: emotions depicted by the virtual head
Condition B: emotions shown by persons on FACS photographs

Twenty-nine participants took part in the experiment, 17 female and 12 male, with an age range from 22 to 51 years. All participants were volunteers. None had classified facial expressions or used FACS before. None of the participants worked in facial animation, although some were familiar with 3D modelling techniques in general.

5.2 Procedure

The experiment involved three phases: a pre-test questionnaire, a recognition exercise and a post-test questionnaire. Each participant was welcomed by the researcher and seated at the workstation where the experiment would be conducted. The researcher then gave the participant an overview of what was expected of him/her and what to expect during the experiment. Care was taken not to give out information that might bias the user. The participants were assured that they themselves were not under evaluation and that they could leave the experiment at any point if they felt uncomfortable. Participants were then presented with the pre-test questionnaire, which led into the recognition exercise.
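The recognition exercise, described in detail below, presented every participant with the same fixed, randomly generated sequence of 56 stimuli (2 conditions x 7 emotion categories x 4 variations). Purely as an illustration, a shared order of this kind could be generated as sketched here; the names and the fixed seed are assumptions, not details of the authors' bespoke software.

```python
import random

CATEGORIES = ["surprise", "fear", "disgust", "anger",
              "happiness", "sadness", "neutral"]
CONDITIONS = ["virtual_head", "photograph"]   # conditions A and B
VARIATIONS = [1, 2, 3, 4]                     # four variations per category

def build_trial_sequence(seed: int = 0) -> list[tuple[str, str, int]]:
    """Return the 56 (condition, category, variation) trials in a randomised
    order that is identical for every participant, because the random
    generator is seeded with a fixed value."""
    trials = [(condition, category, variation)
              for condition in CONDITIONS
              for category in CATEGORIES
              for variation in VARIATIONS]
    rng = random.Random(seed)   # fixed seed, hence the same order for everyone
    rng.shuffle(trials)
    return trials

sequence = build_trial_sequence()
assert len(sequence) == 56      # 2 conditions x 7 categories x 4 variations
```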
From this moment, the experiment ran automatically, via a bespoke software application, and no further experimenter intervention was required. The actual experiment was preceded by a pilot test with a single participant. This participant was not part of the participant group in the later experiment. The pilot run confirmed that the software designed to present the stimulus material and collect the data was functioning correctly, and also that a duration of 20 minutes per participant was realistic. Further, it gave indications that the questionnaire items possessed the desired qualities of measurement and discriminability.

The pre-test questionnaire (figure 6) collected information about the participant in relation to their suitability for the experiment.

Figure 6: Pre-test questionnaire

The Cancel button allowed the experiment to be aborted at any stage, in which case all data collected so far was deleted. Back and Next buttons were displayed depending on the current context. A screen collecting further data about the participant's background on FACS, as well as possible involvement in similar experiments, followed the pre-test questionnaire. Before the recognition task started, a "practice" screen illustrating the actual recognition screen and giving information about the choice of emotion categories and the functionality of buttons and screen elements was shown to the participant.

During the recognition task, each participant was shown 28 photographs and 28 corresponding virtual head images, mixed together in a randomly generated order that was the same for all participants. Each of the six emotion categories was represented in 4 variations, and 4 variations of the neutral face were also shown. The variations were defined not by intensity, but by differences in expression of the same emotion. The controllable parameters of the virtual head were adjusted so that they corresponded with the photographs. All material was presented in digitised form, i.e. as virtual head screenshots and scanned photographs, respectively. Each participant was therefore asked to classify 56 expressions (2 conditions x 7 emotion categories x 4 variations per category). All virtual head images depicted the same male model throughout, whereas the photographs showed several people, expressing a varying number of emotions (21 images showing male persons, 8 female). The order of expressions in terms of categories and variations was randomised but the same for all participants. Where the facial atlas did not provide 4 distinctive variations of a particular emotion category, or the virtual head could not show the variation because of the limited set of animation parameters, a similar face was repeated.

The face images used in the task were cropped to display the full face, including hair. Photographs were scaled to 320 x 480 pixels, whereas virtual head images were slightly smaller at 320 x 440 pixels. The data collected for each facial expression of emotion consisted of:

• the type of stimulus material
• the expression depicted by each of the facial areas
• the emotion category expected
• the emotion category picked by the participant

A "recognition screen" (figure 7) displayed the images and provided buttons for participants to select an emotion category. In addition to the aforementioned seven categories, two further response choices were offered (described below).
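The four data items listed above could, for illustration, be captured per trial in a small record of the following kind. This is a hypothetical sketch of the logging format, not a description of the actual bespoke application.

```python
from dataclasses import dataclass

@dataclass
class TrialResponse:
    """One logged answer from the recognition task (hypothetical format)."""
    stimulus_type: str      # "photograph" or "virtual_head"
    area_expressions: dict  # expression depicted per facial area, e.g.
                            # {"brows_forehead": ..., "eyes": ..., "lower_face": ...}
    expected_category: str  # the emotion category the stimulus was meant to show
    chosen_category: str    # the category picked, "other:<term>" or "dont_know"
```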
The "Other…" choice allowed entry of a term that, according to the participant, described the shown emotion best but was not part of the categories offered. If none of the emotions offered appeared to apply, and no other emotion could be named, the participant was able to choose "Don't know".

Figure 7: Recognition screen

On completion of the recognition task, the software presented the post-test questionnaire (figure 8) to the participant. This collected various quantitative and qualitative data, with a view to complementing the data collected during the recognition task. The Next button was enabled only on completion of all rows.

Figure 8: Post-test questionnaire

6. Results

The overall number of pictures shown was 1624 (29 participants x 56 pictures per participant). On average, a participant took 11 minutes to complete the experiment, including the pre-test and post-test questionnaires. Results show that recognition rates vary across emotion categories, as well as between the two conditions. Figure 9 summarises this:

Figure 9: Summary of recognition rates

Surprise, Fear, Happiness and Neutral show slightly higher recognition rates for the photographs, while in the categories Anger and Sadness the virtual faces are more easily recognised than their counterparts. Disgust stands out as it shows a very low score for virtual faces (around 20%), in contrast to the result for photographs of disgust, which is over 70%. Overall, results clearly suggest that recognition rates for photographs (78.6% overall) are significantly higher than those for virtual heads (62.2% overall). The Mann-Whitney test confirms this, even at a significance level of 1%. However, a closer look at the recognition rates of particular emotions reveals that all but one emotion category have at least one Photograph-Virtual head pair with comparable results, demonstrating that recognition was as successful with the virtual head as it was with the directly corresponding photographs. Figure 10 shows recognition rates for these "top" virtual heads in each category. Disgust still stands out as a category with "poor" results for the virtual head.

Figure 10: Summary of recognition rates for selected images

Results also indicate that recognition rates vary significantly between participants. The lowest scoring individual recognised 30 out of 56 emotions correctly (54%); the highest score was 48 (86%). Those who achieved better results did so homogeneously between virtual heads and photographs. Lower scoring participants were more likely to fail on the virtual heads than on the photographs.

The expressions of emotion identified as being most distinctive are shown below. Each expression is coded according to FACS with its corresponding action units. Some action units are binary, i.e. they are applied or not, while other action units have an associated intensity scoring. Intensity can vary from A (weakest) to E (strongest). The study results would recommend use of these particular expressions, or "exemplars", for models with a similarly limited number of animation control parameters:

Figure 11: Most distinctive expressions

Surprise (AUs 1C 2C 5C 26) is a very brief emotion, shown mostly around the eyes. Our "exemplary surprise face" features high raised eyebrows and raised upper lids. The lower eyelids remain in the relaxed position. The open mouth is relaxed, not tense. Unlike the typical human surprise expression, the virtual head does not actually drop the jaw bone.
The evidence is that this does not have an adverse effect, however, considering that 80% of all participants classified the expression correctly.

Fear (AUs 1B 5C L10A 15A 25) usually has a distinctive appearance in all three areas of the face [28]. The variation which proved to be most successful in our study is characterised by raised, slightly arched eyebrows. The eyes are wide open as in surprise and the lips are parted and tense. This is in contrast to the open but relaxed "surprise" mouth. There is an asymmetry in that the left upper lip is slightly raised.

Disgust (AUs 4C 7C 10A) is typically shown in the mouth and nose area [28]. The variation with the best results is characterised mainly by the raised upper lip (AU10) together with tightened eyelids. It has to be stressed that disgust was the least successful category, with only 30% of the participants assigning this expression correctly.

Our Anger face (AUs 2A 4B 7C 17B) features lowered brows that are drawn together. Accordingly, the eyelids are tightened, which makes the eyes appear to be staring out in a penetrating fashion. Lips are pressed firmly together with the corners straight, a result of the chin raiser AU17.

Happiness (AUs 12C 25) turned out to be easy to recognise - in most cases a lip corner puller (AU12) is sufficient. In our exemplary face, the eyes are relaxed and the mouth corners are pulled up. The virtual head does not allow change to the cheek appearance, neither does it allow for wrinkles to appear underneath the eyes. Such smiles without cheek or eye involvement are sometimes referred to as non-enjoyment, or non-Duchénne, smiles, in contrast to genuine Duchénne smiles, which do involve the muscles around the eyes and are named after the 19th century French neurologist Duchénne de Boulogne [cf. 31].

The Sadness expression (AUs 1D 4D 15A 25) that was most successful has characteristic brow and eye features. The brows are raised in the middle while the outer corners are lowered. This affects the eyes, which are triangulated with the inner corner of the upper lids raised. The slightly raised lower eyelid is not necessarily typical [33] but, in this case, increases the sadness expression. The corners of the lips are down.

6.1 Recognition errors

The errors made by participants when assigning expressions to categories are presented in Table 2. The matrix shows which categories have been confused, and compares virtual heads with photographs. Rows give the per cent occurrence of each response; each cell shows the value for the virtual head followed by the value for the corresponding photographs. In the original layout, confusion values above 10% are shaded light grey, above 20% dark grey, and above 30% black.

Table 2: Error matrix for emotion categorisation (Response: Virtual / Photograph)

Category     Surprise     Fear         Disgust      Anger        Happiness    Sadness      Neutral      Other/Don't know
Surprise     .67 / .85    .06 / .07    .00 / .00    .00 / .01    .23 / .00    .00 / .00    .01 / .00    .03 / .08
Fear         .15 / .19    .41 / .73    .00 / .04    .30 / .00    .03 / .00    .03 / .00    .02 / .00    .06 / .03
Disgust      .01 / .02    .02 / .00    .22 / .77    .39 / .14    .01 / .00    .04 / .00    .10 / .01    .21 / .07
Anger        .03 / .04    .00 / .04    .00 / .03    .77 / .72    .02 / .00    .03 / .03    .11 / .05    .05 / .09
Happiness    .01 / .00    .01 / .00    .01 / .00    .01 / .00    .64 / .84    .03 / .00    .26 / .15    .04 / .02
Sadness      .06 / .00    .09 / .10    .00 / .00    .00 / .01    .01 / .01    .85 / .66    .03 / .09    .01 / .07
Neutral      .03 / .00    .03 / .00    .01 / .00    .00 / .01    .00 / .02    .11 / .01    .78 / .94    .04 / .02

Disgust and Anger

Table 2 shows that the majority of confusion errors were made in the category Disgust, an emotion frequently confused with Anger. When examining results for virtual heads only, anger (39%) was picked almost twice as often as disgust (22%).
Further, with faces showing disgust, participants often felt unable to select any given category and instead picked "Don't know", or suggested an alternative emotion. These alternatives were, for example, aggressiveness, hatred, irritation, or self-righteousness. Ekman and Friesen [28] describe disgust (or contempt) as an emotion that often carries an element of condescension toward the object of contempt. People feeling disgusted by other people, or their behaviour, tend to feel morally superior to them. Our observations confirm this tendency, for where "other" was selected instead of the expected "disgust", the suggested alternative was often in line with Ekman and Friesen's interpretation.

Fear and Surprise

The error matrix (Table 2) further reveals that Fear was often mistaken for Surprise, a tendency that was also observed in several other studies (see [34]). It is stated that a distinction between the two emotions can be observed with high certainty only in "literate" cultures, but not in "pre-literate", visually isolated cultures. Social psychology states that the experience, and therefore the expression, of fear and surprise often happen simultaneously, such as when fear is felt suddenly due to an unexpected threat [28]. The appearance of fear and surprise is also similar, with fear generally producing a more tense facial expression. However, fear differs from surprise in three ways:

1. Whilst surprise is not necessarily pleasant or unpleasant, even mild fear is unpleasant.
2. One can be afraid of something familiar that is certainly going to happen (for example a visit to the dentist), whereas something familiar or expected can hardly be surprising.
3. Whilst surprise usually disappears as soon as it is clear what the surprising event was, fear can last much longer, even when the nature of the event is fully known.

These indicators make it possible to differentiate whether a person is afraid or surprised. All three have to do with the context and timing of the fear-inspiring event – factors that are not perceivable from a still image. In accordance with this, Poggi and Pélachaud [56] found that emotional information is not only contained in the facial expression itself, but also in the performatives of a communicative act: suggesting, warning, ordering, imploring, approving and praising. Similarly, Bartneck [49] observed significantly higher recognition rates when still images of facial expressions were shown in a dice game context, compared to display without any context. In other words, the meaning and interpretation of an emotional expression can depend on the situation in which it is shown. This strongly suggests that in situations where the facial expression is animated or displayed in context, recognition rates can be expected to be higher.

Fear and Anger

The relationship between Fear and Anger is similar to that between fear and surprise. Both can occur simultaneously, and their appearance often blends. What is striking is that all confusions were made with virtual faces, whilst not even one of the fear photographs was categorised as anger. This may suggest that the fear category contained some relatively unsuitable examples of modelled facial expressions. An examination of the results shows that there was one artefact in particular that was regularly mistaken for anger. Figure 12 shows an expression with the appearance of the eyes being characteristic of fear. The lower eyelid is visibly drawn up and appears very tense. Both eyebrows are slightly raised and drawn together.
The lower area of the face also shows clear characteristics of fear, such as the slightly opened mouth with stretched lips that are drawn together. In contrast, an angry mouth has the lips either pressed firmly together or open in a "squarish" shape, as if to shout.

Figure 12: Fear expression, variation A

However, 18 out of 29 times this expression was categorised as anger. In anger, as in fear, eyebrows can be drawn together. But unlike the fearful face, which shows raised eyebrows, the angry face features a lowered brow. Generally, we have found that subtle changes to upper eyelids and brows had a significant effect on the expression overall, which is in line with findings for real life photographs [28]. The eyebrows in Figure 12 are only slightly raised from the relaxed position, but perhaps not enough to give the desired impression. Another confusing indicator is the furrowed shape of the eyebrows, since a straight line or arched brows are more typical for fear.

Figure 13: Fear expression, variation B

In contrast, in Figure 13 the expression is identical to the expression in Figure 12 apart from the eyebrows, which are now raised and arched, thereby changing the facial expression significantly and making it less ambiguous and distinctively "fearful".

6.2 Post-experiment questionnaire results

After completing the recognition task, participants were asked to complete a questionnaire and were invited to comment on any aspect of the experiment. Responses to the latter are discussed in the next section of this paper. The questionnaire comprised eleven questions, each one answered on a scale from 0 to 4, with 0 being total disagreement and 4 being total agreement. Table 3 below shows average values per question.

Table 3: Post-experiment questionnaire results

No.  Statement                                               Score (0=disagree, 4=agree)
1.   The interface was easy to use.                          3.8
2.   More emotion categories would have been better.         2.3
3.   Emotions were easy to recognise.                        2.2
4.   The real people showed natural emotions.                2.6
5.   I responded emotionally to the pictures.                2.0
6.   It was difficult to find the right category.            1.9
7.   The "recognisability" of the emotions varied a lot.     2.5
8.   The real-life photographs looked posed.                 2.2
9.   The choice of emotions was sufficient.                  2.9
10.  Virtual faces showed easily recognisable emotions.      2.7
11.  The virtual head looked alienating.                     2.9

7. Discussion and Conclusions

The experiment followed standard practice for expression recognition experiments by preparing the six universal emotions as pictures of avatar faces or photographs of real human faces and showing these to participants, who were asked to say what emotion they think each photograph or picture portrays [57]. Photographs were selected from the databank "Pictures of Facial Affect" solely on the basis of their high recognition rates. This was believed to be the most appropriate method, aiming to avoid the introduction of factors that would potentially disturb results, such as gender, age or ethnicity. Further, the photographs are considered standardised facial expressions of emotions and exact AU coding is available for them. This ensures concurrent validity, since performance in one test (virtual head) is related to another, well reputed test (FACS coding and recognition). Potential order effects induced by the study's repeated measures design were neutralised by presenting the artefacts of the two conditions in a mixed random order.
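As an aside, the headline comparison reported in section 6 (78.6% correct for photographs against 62.2% for virtual heads, significant at the 1% level by a Mann-Whitney test) could be reproduced from per-participant recognition counts along the lines sketched below. This is an illustrative sketch only, not the authors' analysis script; the function name is hypothetical and SciPy is assumed to be available.

```python
from scipy.stats import mannwhitneyu

def compare_conditions(photo_correct: list[int], virtual_correct: list[int]) -> float:
    """Compare per-participant counts of correctly recognised expressions
    (out of 28 per condition) between the photograph and virtual head
    conditions, using the Mann-Whitney test reported in the paper.
    Returns the p-value; p < 0.01 corresponds to the 1% level."""
    statistic, p_value = mannwhitneyu(photo_correct, virtual_correct,
                                      alternative="two-sided")
    return p_value

# photo_correct and virtual_correct would each hold one value per
# participant (29 values), taken from the logged recognition responses.
```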
27 Further confidence in the results derives from the fact that participants found the interface easy to use (table 3 statement 1), implying that results were not distorted by extraneous user interface factors. Similarly, although participants tended to feel that the photographs looked posed (table 3 statement 8), they nevertheless tended to see them as showing real emotion (table 3 statement 4). Again, despite some ambivalence in the matter (table 3 statements 2 and 9), participants were on the whole happy with the number of categories of emotion offered in the experiment. This is not unexpected since the facial expressions were showing merely the offered range of emotions, and it supports the validity of our results. However, the slight agreement indicates more categories could potentially have produced more satisfaction in participants when making their choice. Two participants noted in their comments, explicitly, that they would have preferred a wider choice of categories. Having established the validity of the experimental procedure and results, an important conclusion to be drawn is that the approach of applying the reduced FACS model to virtual face representations is not guaranteed to work for all expressions, or all variations of a particular emotion category. This is implied by the finding that recognition rates for the photographs were significantly higher than those for the virtual heads (section 6.1). Further evidence is supplied in the post-experiment questionnaire data. Two participants, for example, noted that on several occasions the virtual face expression was not distinctive enough, and two that the virtual head showed no lines or wrinkles and that recognition might have been easier with these visual cues. Nevertheless, our data also suggests that, when applying the FACS model to virtual face representations, emotions can effectively be visualised with a very limited number of facial features and action units. For example, in respect of the “top scoring” virtual heads, emotion recognition rates are, with the exception of the “disgust” emotion, comparable to those of their corresponding real-life photographs. These top-scoring expressions are exemplar models for which detailed AU scoring is available. They therefore potentially build a basis for emotionally expressive avatars in collaborative virtual environments and hence for the advantages of emotionally enriched CVEs argued for earlier. 28 No categorisation system can ever be complete. Although accepted categories exist, emotions can vary in intensity and inevitably there is a subjective element to recognition. When modelling and animating facial features, however, our results suggest that such ambiguity in interpretation can be minimised by focussing on, and emphasising, those visual clues that are particularly distinctive. Although it remains to be corroborated through further studies, it is believed that such simple, pure emotional expressions could fulfil a useful role in displaying explicit, intended communicative acts which can therefore help interaction in a CVE. They can provide a basis for emotionally enriched CVEs, and hence for the benefits of such technology being used, for example, within distance learning as argued for earlier. It should perhaps be noted, however, that such pure forms of emotion are not generally seen in real life, as many expressions occurring in face-to-face communication between humans are unintended or automatic reactions. 
They are often caused by a complex interaction of several simultaneous emotions, vividly illustrated in Picard's example of a Marathon runner who, after winning a race, experiences a range of emotions: “tremendously happy for winning the race, surprised because she believed she would not win, sad that the race was over, and a bit fearful because during the race she had acute abdominal pain” [21]. With regards to our own work, such instinctive reactions could be captured and used to control an avatar directly, potentially allowing varying intensities and blends of facial expressions to be recognised and modelled onto avatar faces. However, this study has deliberately opted for an avatar that can express clearly, and unambiguously, what the controlling individual exactly wants it to express, since this is one way in which people may want to use CVE technology. Another issue concerns consistency. Social psychology suggests, as do our own findings, that an emotion’s recognisability depends on how consistently it is shown on a face. Further, most emotions, with the exception of sadness, become clearer and more distinctive when their intensity increases. There are indications that in cases where the emotion appeared to be ambiguous at first, the photographs 29 contained subtle clues as to what emotion is displayed, enabling the viewer to assign it after closer inspection. These clues appear to be missing in the virtual head artefacts, suggesting the need to either emphasise distinctive and unambiguous features, or to enhance the model by adding visual cues that help identify variations of emotion more clearly. For further work on emotions in realtime virtual environment interactions the authors aim to concentrate on the former. Overall, it should be noted that many of the artefacts classified by participants as the “Other…” choice are actually close to the emotion category expected, confirming that the facial expressions in those cases were not necessarily badly depicted. This highlights the importance of having a well-defined vocabulary when investigating emotions - a problem that is not new to the research community and that has been discussed at length over the years (see [33] for an early comparison of emotion dimensions vs. categories, also [32,58]) The experimental work discussed in this paper provides strong evidence that creating avatar representations based on the FACS model, but using only a limited number of facial features, allows emotions to be effectively conveyed, giving rise to recognition rates that are comparable with those of the corresponding real-life photographs. Effectiveness has been demonstrated through good recognition rates for all but one of the emotion categories, and efficiency has been established since a reduced feature set was found to be sufficient to build a successfully recognised core set of avatar facial expressions. In consequence, the top-scoring expressions illustrated earlier may be taken to provide a sound basis for building emotionally expressive avatars to represent users (which may in fact be agents), in CVEs. When modelling and animating facial features, potential ambiguity in interpretation can be minimised by focussing on, and emphasising particularly distinctive visual clues of a particular emotion. We have proposed a set of expressions that fulfil this. These are not necessarily the most distinctive clues for a particular emotion as a whole, but those that we found to be very distinctive for that emotion category. 30 8. 
Further work It is planned to extend the work in a variety of ways. The data reveals that certain emotions were confused more often than others, most notably Disgust and Anger. This was particularly the case for the virtual head expressions. Markham and Wang [59] observed a similar link between these two emotions when showing photographs of faces to children. Younger children (aged 4-6) in particular tended to group certain emotions together, while older children (aged 10+) were found to typically have the ability to differentiate correctly. In view of the findings from the current study, this may indicate that although adults can differentiate emotions well in day-to-day social interaction, the limited clues provided by the virtual head make observers revert back to a less experience-based, but more instinctbased manner when categorising them. However, more work will be necessary to investigate this possibility. Two other studies also found Disgust often confused with Anger [49,60] and concluded that the lack of morph targets, or visual clues, around the nose was a likely cause. In humans, Disgust is typically shown around the mouth and nose [28] and although our model features a slightly raised lip (AU10), there is no movement of the nose. This strongly suggests that to improve distinctiveness of the Disgust expression in a real-time animated model, the nose should be included in the animation, as should the relevant action unit AU9 which is responsible for “nose wrinkling”. Given this, we have now developed an animated model of the virtual head that is capable of lifting and wrinkling the nose to express Disgust. The experimental results, in particular the relatively high number of “Other” and “Don’t know” responses, indicate that limiting the number of categories of emotion might have had a negative effect on the recognition success rates. It might be that allowing more categories, and/or offering a range of suitable descriptions for an emotion category (such as Joy, Cheerfulness and Delight, to complement Happiness), would yield still higher recognition rates, and future experiments will address this. Similarly, although concentrating on the face as the primary channel for conveying emotions, the work must be seen in a wider context in which the entire 31 humanoid representation of a user can in principle act as the communication device in CVEs. The experiments discussed here set the foundation for further work on emotional postures and the expression of attitude through such a virtual embodiment, drawing for example on the work of [61] on posture, [62] on gestures, or [4] on spatial behaviour and gestures. A further contextual aspect of emotional recognition concerns the conversational milieu within which emotions are expressed and recognised. Context plays a crucial role in emotion expression and recognition - effective, accurate mediation of emotion is closely linked with the situation and other, related, communicative signals. A reliable interpretation of facial expressions, which fails to take cognisance of the context in which they are displayed, is often not possible. One would expect, therefore, that recognition of avatar representations of emotion will be higher when contextualised. This assumption requires empirical investigation, however, and future experiments are planned to address this. 
Bartneck [49] distinguishes between the recognisability of a facial expression of emotion, and its "convincingness", seeing the latter as more important, and further experimental work will enable study of how this distinction plays itself out in a virtual world. It is predicted that timing will affect "convincingness" in a virtual world. For example, showing surprise over a period of, say, a minute would - at the very least - send confusing or contradictory signals. It will also be possible to investigate this and, more generally, what impact the mediation of emotions has on the conversational interchanges. A further contextual issue concerns culture. Although emotions exist universally, there can be cultural differences concerning when emotions are displayed [32]. It appears that people in various cultures differ in what they have been taught about managing or controlling their facial expression of emotion. Ekman and Friesen [28] call these cultural norms “display rules”. Display rules prescribe whether, and if so when, an emotion is supposed to be fully expressed, masked, lowered or intensified. For instance, it has been observed that male Japanese are often reluctant to show unpleasant emotions in the physical presence of others. Interestingly, these cultural differences can also affect the recognition of 32 emotions. In particular, Japanese people reportedly have more difficulty than others recognising negative expressions of emotions, an effect that may reflect a lack of perceptual experience with such expression because of the cultural proscriptions against displaying them [32]. How such cultural differences might play themselves out in a virtual world is an important open question. Finally, the authors wish to explore how the results concerning the mediation of emotions via avatars might be beneficially used to help people with autism. A commonly, if not universally, held view of the nature of autism is that it involves a “triad of impairments” [63]. There is a social impairment: the person with autism finds it hard to relate to, and empathise with, other people. Secondly, there is a communication impairment: the person with autism finds it hard to understand and use verbal and non-verbal communication. Finally, there is a tendency to rigidity and inflexibility in thinking, language and behaviour. Much current thinking is that this triad is underpinned by a “theory of mind deficit” - people with autism may have a difficulty in understanding mental states and in ascribing them to themselves or to others. CVE technology of the sort discussed in this paper could potentially provide a means by which people with autism might communicate with others (autistic or non-autistic) and thus circumvent their social and communication impairment and sense of isolation. Further, as well as this prosthetic role, the technology can also be used for purposes of practice and rehearsal. For this to help combat any theory of mind problem, users would need to be able to recognise the emotions being displayed via the avatars. The findings reported in the current paper give grounds for confidence that the technology will be useful in such a role, but this needs to be investigated in practice [cf. 64,65]. Much remains to be investigated, therefore, concerning the educational use of the emerging CVE technology. It is hoped that the work reported in this paper will help set the foundation for further work on the mediation of emotions in virtual worlds. 
Acknowledgements

Photographs from the CD-Rom Pictures of Facial Affect [35] used with permission. Original virtual head geometry by Geometrek. Detailed results of this study, as well as the virtual head prototypes, are available online at http://www.leedsmet.ac.uk/ies/comp/staff/mfabri/emotion

References

1. Thalmann D. The Role of Virtual Humans in Virtual Environment Technology and Interfaces. In: Frontiers of Human-Centred Computing, Online Communities and Virtual Environments. Earnshaw R, Guedj R, Vince J eds. London: Springer Verlag. 2001
2. Fleming B, Dobbs D. Animating Facial Features and Expressions. Charles River Media: Boston 1999
3. Dumas C, Saugis G, Chaillou C, Degrande S, Viaud M. A 3-D Interface for Cooperative Work. In: Collaborative Virtual Environments 1998 Proceedings. Manchester. 1998
4. Manninen T, Kujanpää T. Non-Verbal Communication Forms in Multi-player Game Sessions. In: People and Computers XVI – Memorable Yet Invisible. Faulkner X, Finlay J, Détienne F eds. London: BCS Press. ISBN 1852336595. 2002
5. Atkins H, Moore D, Hobbs D, Sharpe S. Learning Style Theory and Computer Mediated Communication. In: ED-Media 2001 Proceedings. 2001
6. Laurillard D. Rethinking University Teaching. Routledge: London 1993
7. Moore M. Three Types of Interaction. In: Distance Education: New Perspectives. Harry K, John M, Keegan D eds. London: Routledge. 1993
8. Johnson D, Johnson R. Cooperative learning in the culturally diverse classroom. In: Cultural Diversity in Schools. DeVillar, Faltis, Cummins eds. Albany: State University of New York Press. 1994
9. Webb N. Constructive Activity and Learning in Collaborative Small Groups. Educational Psychology 1995; 87(3); 406-423
10. Wu A, Farrell R, Singley M. Scaffolding Group Learning in a Collaborative Networked Environment. In: CSCL 2002 Proceedings. Boulder, Colorado. 2002
11. Lisetti C, Douglas M, LeRouge C. Intelligent Affective Interfaces: A User-Modeling Approach for Telemedicine. In: Proceedings of International Conference on Universal Access in HCI. New Orleans, LA. Elsevier Science Publishers. 2002
12. Daly-Jones O, Monk A, Watts L. Some advantages of video conferencing over high-quality audio conferencing: fluency and awareness of attentional focus. Int. Journal of Human-Computer Studies 1998; 49(1); 21-58
13. McShea J, Jennings S, McShea H. Characterising User Control of Video Conferencing in Distance Education. In: CAL-97 Proceedings. Exeter University. 1997
14. Fabri M, Gerhard M. The Virtual Student: User Embodiment in Virtual Learning Environments. In: International Perspectives on Tele-Education and Virtual Learning Environments. Orange G, Hobbs D eds. Ashgate 2000
15. Knapp M. Nonverbal Communication in Human Interaction. Holt Rinehart Winston: New York 1978
16. Morris D, Collett P, Marsh P, O'Shaughnessy M. Gestures, their Origin and Distribution. Jonathan Cape: London 1979
17. Argyle M. Bodily Communication (second edition). Methuen: New York 1988
18. Strongman K. The Psychology of Emotion (fourth edition). Wiley & Sons: New York 1996
19. Dittrich W, Troscianko T, Lea S, Morgan D. Perception of emotion from dynamic point-light displays presented in dance. Perception 1996; 25; 727-738
20. Keltner D. Signs of appeasement: evidence for the distinct displays of embarrassment, amusement and shame. Personality and Social Psychology 1995; 68(3); 441-454
21. Picard R. Affective Computing. MIT Press 1997
22. Lisetti C, Schiano D. Facial Expression Recognition: Where Human-Computer Interaction, Artificial Intelligence and Cognitive Science Intersect. Pragmatics and Cognition 2000; 8(1); 185-235
23. Damásio A. Descartes' Error: Emotion, Reason and the Human Brain. Avon: New York 1994
24. Cooper B, Brna P, Martins A. Effective Affective in Intelligent Systems – Building on Evidence of Empathy in Teaching and Learning. In: Affective Interactions: Towards a New Generation of Computer Interfaces. Paiva A ed. London: Springer Verlag. 2000
25. Johnson W. Pedagogical Agents. In: Computers in Education Proceedings. Beijing, China. 1998
26. McGrath A, Prinz W. All that Is Solid Melts Into Software. In: Collaborative Virtual Environments: Digital Places and Spaces for Interaction. Churchill, Snowdon, Munro eds. London: Springer. 2001
27. Durlach N, Slater M. Meeting People Virtually: Experiments in Shared Virtual Environments. In: The Social Life of Avatars. Schroeder R ed. London: Springer Verlag. 2002
28. Ekman P, Friesen W. Unmasking the Face. Prentice Hall: New Jersey 1975
29. New Oxford Dictionary of English. Oxford University Press 2001
30. Russell J, Fernández-Dols J. The Psychology of Facial Expression. Cambridge University Press 1997
31. Surakka V, Hietanen J. Facial and emotional reactions to Duchenne and non-Duchenne smiles. International Journal of Psychophysiology 1998; 29(1); 23-33
32. Zebrowitz L. Reading Faces: Window to the Soul? Westview Press: Boulder, Colorado 1997
33. Ekman P, Friesen W, Ellsworth P. Emotion in the Human Face: Guidelines for Research and an Integration of Findings. Pergamon Press: New York 1972
34. Ekman P. Facial Expressions. In: Handbook of Cognition and Emotion. Dalgleish T, Power M eds. New York: Wiley & Sons. 1999
35. Ekman P, Friesen W. Pictures of Facial Affect CD-Rom. University of California, San Francisco. 1975
36. Ekman P, Friesen W. Facial Action Coding System. Consulting Psychologists Press 1978
37. Bartlett M. Face Image Analysis by Unsupervised Learning and Redundancy Reduction (Ph.D. Thesis). University of California, San Diego. 1998
38. Pélachaud C, Badler N, Steedman M. Generating Facial Expressions for Speech. Cognitive Science 1996; 20(1); 1-46
39. Ekman P, Rosenzweig L eds. What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System. Oxford University Press. 1998
40. Terzopoulos D, Waters K. Analysis and synthesis of facial image sequences using physical and anatomical models. Pattern Analysis and Machine Intelligence 1993; 15(6); 569-579
41. Platt S, Badler N. Animating facial expression. ACM SIGGRAPH 1981; 15(3); 245-252
42. Parke F. Parameterized modeling for facial animation. IEEE Computer Graphics and Applications 1982; 2(9); 61-68
43. Benford S, Bowers J, Fahlén L, Greenhalgh C, Snowdon D. User Embodiment in Collaborative Virtual Environments. In: CHI 1995 Proceedings. Denver, Colorado: ACM Press 1995
44. Mori M. The Buddha in the Robot. Tuttle Publishing 1982
45. Reichardt J. Robots: Fact, Fiction and Prediction. London: Thames & Hudson 1978
46. Hindmarsh J, Fraser M, Heath C, Benford S. Virtually Missing the Point: Configuring CVEs for Object-Focused Interaction. In: Collaborative Virtual Environments: Digital Places and Spaces for Interaction. Churchill E, Snowdon D, Munro A eds. London: Springer Verlag. 2001
47. Godenschweger F, Strothotte T, Wagener H. Rendering Gestures as Line Drawings. In: Gesture Workshop 1997. Bielefeld, Germany. Springer Verlag 1997
48. Donath J. Mediated Faces. In: Cognitive Technology: Instruments of Mind. Beynon M, Nehaniv C, Dautenhahn K eds. Warwick, UK. 2001
49. Bartneck C. Affective Expressions of Machines. In: CHI 2001 Proceedings. Seattle, USA. 2001
50. Ellis H. Developmental trends in face recognition. The Psychologist: Bulletin of the British Psychological Society 1990; 3; 114-119
51. Dittrich W. Facial motion and the recognition of emotions. Psychologische Beiträge 1991; 33(3/4); 366-377
52. H-Anim Working Group. Specification for a Standard VRML Humanoid. http://www.h-anim.org
53. Yacoob Y, Davis L. Computing spatio-temporal representations of human faces. In: Computer Vision and Pattern Recognition Proceedings. IEEE Computer Society 1994
54. Essa I, Pentland A. Coding, Analysis, Interpretation, and Recognition of Facial Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1995
55. Neisser U. Cognition and Reality. Freeman: San Francisco 1976
56. Poggi I, Pélachaud C. Emotional Meaning and Expression in Animated Faces. In: Affective Interactions: Towards a New Generation of Computer Interfaces. Paiva A ed. London: Springer Verlag. 2000
57. Rutter D. Non-verbal communication. In: The Blackwell Dictionary of Cognitive Psychology. Eysenck M ed. Blackwell Publishers: Oxford. 1990
58. Wehrle T, Kaiser S. Emotion and Facial Expression. In: Affective Interactions: Towards a New Generation of Computer Interfaces. Paiva A ed. London: Springer Verlag. 2000
59. Markham R, Wang L. Recognition of emotion by Chinese and Australian children. Cross-Cultural Psychology 1996; 27(5); 616-643
60. Spencer-Smith J, Innes-Ker A, Wild H, Townsend J. Making Faces with Action Unit Morph Targets. In: AISB'02 Symposium on Animating Expressive Characters for Social Interactions. ISBN 1902956256. London. 2002
61. Coulson M. Expressing emotion through body movement: A component process approach. In: Artificial Intelligence and Simulated Behaviour Proceedings. Imperial College, London. 2002
62. Capin T, Pandzic I, Thalmann N, Thalmann D. Realistic Avatars and Autonomous Virtual Humans in VLNET Networked Virtual Environments. In: Virtual Worlds on the Internet. Earnshaw R, Vince J eds. IEEE Computer Science Press. 1999
63. Wing L. Autism Spectrum Disorders. Constable 1996
64. Moore D, McGrath P, Thorpe J. Computer Aided Learning for People with Autism – A Framework for Research and Development. Innovations in Education and Training International 2000; 37(3); 218-228
65. Moore D, Taylor J. Interactive multimedia systems for people with autism. Educational Media 2001; 25(3); 169-177