COGNITIVE SCIENCE 17, 589-622 (1993)

Machine Interpretation of Emotion: Design of a Memory-Based Expert System for Interpreting Facial Expressions in Terms of Signaled Emotions

GARRETT D. KEARNEY
SATI MCKENZIE
University of Greenwich

As a first step in involving user emotion in human-computer interaction, a memory-based expert system (JANUS; Kearney, 1991) was designed to interpret facial expression in terms of the signaled emotion. Anticipating that a VDU-mounted camera will eventually supply face parameters automatically, JANUS now accepts manually made measurements of face geometry on a digitized full-face photograph and returns emotion labels used by college students. An intermediate representation in terms of face actions (e.g., mouth open) is also used. Production rules convert the geometry into these. A dynamic memory (Kolodner, 1984; Schank, 1982) interprets the face actions in terms of emotion labels. The memory is dynamic in the sense that new emotion labels can be learned with experience. A prototype has been implemented on a Sun 2020 system using POPLOG. Validation studies on the prototype suggest that the interpretations achieved are generally consistent with those of college students without formal instruction in emotion signals.

We thank Professor M. Bramer, presently with the University of Portsmouth, and Geoffrey D. A. Sullivan of the University of Reading for several useful discussions, and the reviewers of this article for their valuable criticism. Garrett Kearney acknowledges financial support from the Science and Engineering Research Council. Correspondence and requests for reprints should be sent to Garrett Kearney, Department of Computing and Information Technology, University of Greenwich, Wellington Street, Woolwich, London, SE18 6PF, England.

1. INTRODUCTION

JANUS¹ is a memory-based expert system capable of interpreting facial expressions in terms of the emotions signaled. It was developed as an experiment in making computers sensitive to the body language of users. The possibility of using nonverbal communication as a means of human-computer interaction has attracted some attention recently, and several systems have been reported (Mase, Suenaga, & Akimoto, 1987; Sheehy, 1989).

¹ At an advanced stage in this project, we came across references to other systems called JANUS: Day (1987) conceived a hybrid system of neural networks and a production system concerned with integrating automatic and controlled problem solving, and Fischer, Lemke, Mastaglio, and Morch (1991) described an integration of hypertext with a knowledge-based design environment. There is also the CAO software package for electrotechnical systems (Colombani, Sabonnadiere, Auriol, & Pardo-Gibson, 1988), the Sydney University Library researchers' facility (Brodie, 1989), the decision support system (Raghavan & Chand, 1989), and the BBN & ISI NL system (Hinrichs, 1988). None of these have any bearing on the research reported in this article.
The problem of recognition and recall of facial features is also of interest to psychologists and raises a number of fundamental questions relating to the structure, organization, and functioning of human memory (see Bruce, 1988, for a critical overview; also, among others, Baddeley, 1979; Bower, Gilligan, & Monteiro, 1981; Bower & Karlin, 1974; Patterson & Baddeley, 1977; Strnad & Mueller, 1977; Wells & Hryciw, 1984; Winograd, 1976), bearing on the way faces are perceived (Courtois & Mueller, 1979; Ellis, Jeeves, Newcombe, & Young, 1986; Galper & Hochberg, 1971; Jensen, 1986; Sergent, 1984) and on the importance of context effects (Bower & Karlin, 1974; Watkins, Ho, & Tulving, 1976). Although there has been a wealth of research over the past century in specifying the facial actions signaling emotions, the problem of how these are represented in memory and the strategies enabling their recognition and recall have received less attention than the related question of face recognition. Despite the considerable theorizing linking the role of emotions to goals, motives, and plans in humans (Izard, 1971; Izard & Tomkins, cited in Izard, 1971; Oatley & Johnson-Laird, 1985; Sloman, 1986), only Sloman and Croucher (1981a, 1981b) appeared to accept that robots, too, will have emotions. So, also, with the obverse: There have been few attempts to equip computers with the means of recognizing and acting upon the signaled emotions of their users. Sheehy (1989) planned to detect a user's eyebrow lift in surprise as a telling communication in computer-user dialogue.

As a first step in computer recognition of user emotion, the JANUS system converts face geometry into a static face-action format and classifies an expression by matching it to the typical expressions of six universal emotions. Use of the word "static" makes explicit that only the end state of the movement, in comparison to the neutral position, is measured, and not the movement itself. Atypicalities are further labeled by analogy to expressions on which the system has already been trained. The output is one or more emotion labels. Such labels were acquired from college students without formal training in face perception. Future work will attempt to make these labels meaningful in the context of the goals pursued in the user-machine interaction.

JANUS lacks a vision "front end" and does not attempt a solution to the automatic measurement of face emotion parameters; it is designed to accept a facial description from a human source and return an emotion label. The input description may be geometric (coordinate positions of 34 selected landmarks, currently obtained from manual measurements on a digitized full-face photograph) or syntactic (a list of verbal face actions, e.g., "mouth open," "nose flared").

Figure 1. Basic components of JANUS (geometric face description converted to face actions; interpret and learn modes; interpretation).

The geometric description, if used, is converted into syntactic form prior to interpretation. The conversion is done by a rule base. The interpretation is in the form of an emotion label, such as "happy" or "angry," and is accomplished by a dynamic memory based on Schank's (1982) memory organization packets and his theory of reminding and learning, and on Kolodner's (1984) computer implementation.
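A minimal sketch of this two-stage flow follows, written in Python rather than the POPLOG of the prototype; every name and threshold in it is hypothetical and merely stands in for the rule base of Section 2 and the dynamic memory of Section 3.

    # Illustrative stand-ins for the two JANUS components (assumed names only).
    def rule_base_to_actions(landmarks):
        # Production rules of Section 2: normalized geometry -> face actions.
        actions = []
        if landmarks.get("mouth_opening", 0.0) > 0.1:    # assumed threshold
            actions.append("mouth open")
        if landmarks.get("nostril_width", 0.0) > 0.15:   # assumed threshold
            actions.append("nose flared")
        return actions

    def memory_interpret(face_actions):
        # Dynamic memory of Section 3: face actions -> emotion label(s).
        return ["surprised"] if "mouth open" in face_actions else ["happy"]

    def interpret_face(landmarks=None, face_actions=None):
        # Accept either a geometric or a syntactic description, as in Figure 1.
        if face_actions is None:
            face_actions = rule_base_to_actions(landmarks)
        return memory_interpret(face_actions)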
In addition to offering interpretations by analogy to those accompanying similar expressions experienced in the past, JANUS is capable of learning new emotion labels and associated face actions, thereby increasing its expertise with use. This allows memory to be trained before use in accordance with the intended purpose. The basic components of JANUS are shown in Figure 1.

JANUS differs from conventional expert systems in incorporating a dynamic memory. The advantage of memory-based systems is that, like human beings, they develop their expertise through experience. They also offer the possibility of successfully tackling a problem at a more generalized level if no specific rules apply. Human beings do this when faced with new situations (Schank, 1984).

Validation and evaluation studies play an important role in the development of expert systems. Validation studies on JANUS have been aimed at testing both the rule-base and the dynamic memory components. Both the interpretation and learning functions have been considered. The conclusions of JANUS were compared with those of human "lay experts" (i.e., people without formal training in emotion signals) drawn from college personnel. An additional gold standard used to assess the capability of these personnel was provided by the descriptions given in Ekman and Friesen (1976b, 1984).

Figure 2. Facial "landmarks."

Both informal qualitative assessments and quantitative comparisons using standard statistical techniques were carried out. The results of these studies appear to support the claim that JANUS performs at least as well as the lay experts. The level of expertise also appears acceptable, though more extensive field trials will be necessary to confirm this.

The design and operation of the basic components of JANUS, namely, the rule base and the dynamic memory, are discussed in Sections 2 and 3. Section 4 covers the validation studies on JANUS. A discussion of related theoretical issues forms the subject of Section 5. Our general conclusions are presented in Section 6.

2. THE RULE BASE

The rule base performs the task of converting the geometric description of a face into a list of static "face actions." The geometric description consists of the positions of 34 "landmarks" (Figure 2), measured manually with respect to the tip of the nose. These measurements are made on a digitized full-face photograph. The measured distances are normalized to take into account differences in size and scale: horizontal distances are divided by the distance between the inner angles of the eyes, and vertical distances are divided by the length of the nose. The verbal description is in the form of a list of feature-action pairs such as "eyes wide" or "nose flared." Actions have been defined for six features, namely, brows, eyes, nose, mouth, cheeks, and jaw. A full list of face actions is given in Table 1. The choice of features and actions was influenced by the work of Ekman and Friesen (1984), who published a comprehensive account of such facial cues and associated emotions, together with illustrative photographs. The landmarks were chosen with regard to their potential in registering change in the position of these features.
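A minimal sketch of this normalization step, in Python, is given below; the landmark names and the use of the nose bridge to measure nose length are assumptions made for illustration, and the code is not the POPLOG of the prototype.

    def normalize_landmarks(landmarks):
        # landmarks: dict of name -> (x, y), measured relative to the nose tip,
        # which therefore sits at (0, 0).
        left_eye = landmarks["inner_angle_left_eye"]
        right_eye = landmarks["inner_angle_right_eye"]
        nose_bridge = landmarks["nose_bridge"]        # assumed upper end of the nose

        eye_span = abs(right_eye[0] - left_eye[0])    # horizontal unit of scale
        nose_length = abs(nose_bridge[1])             # vertical unit of scale

        # Horizontal distances are divided by the inner-eye separation and
        # vertical distances by the nose length, as described above.
        return {name: (x / eye_span, y / nose_length)
                for name, (x, y) in landmarks.items()}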
The representation adopted is more in line with their earlier Facial Affect Scoring Technique (FAST; Ekman, Friesen, & Tomkins, 1971) than with their more precise, anatomically based Facial Action Coding System (FACS; Ekman & Friesen, 1976a, 1978), which, in terms of muscle "action units," is capable of distinguishing among all visible facial behavior and is free of any theoretical bias about the possible meaning of facial behaviors. Their analysis of the muscles underlying all facial movements allowed them to define 44 action units in terms of which they attained their target. These have been used, among other uses, to define the groupings underlying the expressions of facial emotion, which can be done very precisely. Training is required to define face expressions correctly in terms of their causative muscle actions; "experienced" scorers in Ekman, Friesen, and O'Sullivan's (1988) report had from 1 to 4 years' experience. In Ekman, Davidson, and Friesen (1990), the scorers had more than 1 year of experience using FACS, and their reliability had proven satisfactory against a standard. However, the earlier syntactic approach was considered more suitable both for training memory by lay experts and as a first attempt at a computer facility that draws on the expertise of everyday college students. Admittedly, this decision arose because of the conditions specific to the project. There were no resources to train subjects in the FACS technique, and the paradigm followed was that fundamental to many expert systems. In these computer systems the expert knowledge, which is applied when the expert in a domain tackles a representative set of problems, is extracted from his or her self-report. Another expert, this time in knowledge engineering, traditionally represents the expert's reasoning in the form of a rule base and creates an inference engine to access the pertinent knowledge for an input query. This is the paradigm for JANUS. Assuming that most adults have amassed some expertise in "reading" faces according to their experience, I interviewed groups of college students, showing face photographs and recording their interpretations and reasons. No subject gave reasons other than in descriptive terms. None were aware of the scientific research in the domain. In all cases their reasons were in terms used in FAST. Ekman and Friesen (1984) supplied the typical expressions and descriptions relevant to the templates with which such acquired knowledge could be compared. For these reasons JANUS is more influenced by FAST.
It must not, however, be thought that a similar system could not be constructed using FACS if the experts could be trained in the technique. Other workers have utilized FACS to good effect, for example, Mase (1991).

JANUS allows all input face actions to be stored. Memory is organized around typicalities and anomalies, with the result that its interpretations are very much a function of its experience in training. In a solely interpretative use, for example, when fed by a video camera (not implemented at present), its output would draw on its experience.

A total of 38 rules were defined. Most of these (26) are independent of other face actions (apart from comparison with a neutral face of the same person), whereas the rest (12) use the context of other actions on the same face. For instance, the rule for "brows contracted" is context-free and uses only the distance between the inner ends of the eyebrows. Tension in the lower eyelid, however, cannot be defined directly and is inferred from a combination of "lower eyelid raised," "cheeks not raised," and "mouth turned down." An example rule is given in Figure 3.

Table 1: Face actions used in JANUS (grouped under brows, eyes, nose, mouth, cheeks, and jaw; entries include, for example: brows raised; eyes wide, slightly open, upper or lower lid raised, lowered, or tensed, inner lid raised; nose flared, screwed; mouth open, widely open, square, up, down, pulled, compressed, upper or lower lip raised, lowered, tensed, or everted, teeth bared; cheeks raised, vertical nose-mouth grooves; jaw drop).

Figure 3. An example rule in POP11.

    define eyes_l_lid_raised(mug) -> i;
        ;;; The difference between the y-values of the internal angle of the
        ;;; right eye (eria(2)) and the lower lid below the pupil (erll(2))
        ;;; is less than that of the neutral face: neria(2), nerll(2).
        ;;; The value of "i" is true or false.
        ((erll(2) - eria(2)) < (nerll(2) - neria(2))) -> i;
    enddefine;

    Natural language equivalent: If the vertical distance between the centre of
    the lower lid margin and the level of the inner angle of the right eye is
    less than that of the neutral face, then the lower eyelid is "raised."

3. THE DYNAMIC MEMORY

The dynamic memory performs two functions. In interpret mode, it accepts a syntactic description of the face and returns the appropriate emotion label. In learn mode, it accepts both a syntactic description and the attributed emotion label and adds them to its repertoire for future use. How it performs these functions is discussed in this section.

The organization of the dynamic memory brings together Schank's (1982) theorizing on the functional organization of human autobiographical memories, Kolodner's (1984) computer representations of some of these conceptual structures in the design of a fact retrieval system, and the theory of universal face expressions of emotion (e.g., Ekman & Friesen, 1971; Ekman, Sorenson, & Friesen, 1969).

Schank's (1982) model of human memory is an attempt to explain how memories of autobiographical social events are stored, organized, and remembered. For one event to remind one spontaneously of another, both must be represented within the same dynamic chunking memory structure, which organizes episodes according to their contextual or thematic similarities and differences.
Both must be indexed by a similar explanatory theme that has sufficient salience in the individual's experience to have merited such atypical indexing in the past. His memory organization packets (MOPs) organize sets of scenes (general actions) related by a shared goal. These organize memories in terms of typicalities and atypicalities.

Schank's (1982) ideas were applied to the domain of diplomatic events by Kolodner (1984). She developed a system (CYRUS) that organized on-line news reports concerning the diplomatic activities of two U.S. Secretaries of State within an incrementally self-organizing, event-content-addressable computer memory. This allowed fact retrieval and elaboration of incomplete events, making use of generalizations formed from prior input. Some of her ideas and representations have been used in the design of JANUS, for example, representing the typical generalities in the frame of the MOP, indexing differences below the frame by their atypical features, and the methods of promotion and demotion of these.

In JANUS the events are the micro-events of a set of co-existing static face actions displayed in the service of the goal of communicating the emotion of the person. The dynamic memory is initially endowed with six basic expression pools housed in the frames of six FACE-MOPs. Each is composed of a set of face actions from which the different typical expressions of a basic emotion (e.g., happiness, sadness, anger, disgust, fear, and surprise) can be composed. The choice of these was influenced by Ekman and Friesen's (1971; Ekman, Sorenson, & Friesen, 1969) theory of the face actions of those emotions which are universal. The associated face actions were adapted from Ekman and Friesen (1984) with such modifications as were necessitated
by design constraints, which were discussed in Section 2. A list of the basic emotions and face actions is given in Table 2.

Table 2: Face actions associated with basic emotions

Happy: lower eyelids are upturned; corners of the mouth are raised; mouth is open at all; mouth is widely open; the teeth are showing; cheeks are raised, making a fullness below the eyes.

Sad: medial ends of the brows are drawn closer to one another; medial aspect of the brows is raised, straightening the usual downcurve; brows are raised inwards towards the mid-line, producing brow corrugations mostly in the centre of the forehead; lower eyelids are raised; inner parts of the eyelids are raised higher than the outer parts; corners of the mouth are down-turned; upper eyelids are lowered.

Angry: brows are lowered onto the eyes; medial ends of the brows are drawn closer to one another; eyes narrowed; inner parts of the eyelids are raised higher than the outer parts; lower eyelids are raised; lower eyelids are tensed; upper eyelids are lowered; upper eyelids are tensed; nostrils are flared; lips are compressed together; lower lips are tensed; upper lips are tensed; mouth is open at all; teeth are showing; mouth is widely open; mouth is widely open and assumes a squarish aperture; grooves between the wings of the nose and the mouth corners are deep and more vertical.

Afraid: both brows raised, corrugating the forehead all across; medial aspect of the brows is raised, straightening the usual downcurve; medial ends of the brows are drawn closer to one another; lower eyelids are tensed; eyes widely open; upper eyelids are raised; lower eyelids are raised; inner parts of the eyelids are raised higher than the outer parts; mouth is widely open; mouth is open at all; mouth pulled back, widening it horizontally; upper lips are tensed; teeth are showing.

Disgusted: brows are lowered onto the eyes; lower eyelids are raised; eyes narrowed; nose screwed up, producing transverse wrinkles at the bridge; lower lip is turned out; teeth are showing; upper lip is turned out; upper lip raised; lower lip is lowered; nostrils are flared; lower lips raised; grooves between the wings of the nose and the mouth corners are deep and more vertical; mouth open at all; cheeks are raised, making a fullness below the eyes; lips are compressed together; lower lip is tensed; upper lip is tensed.

Surprised: both brows raised, corrugating the forehead all across; eyes widely open; upper eyelids raised; lower eyelids are lowered; mouth is open at all; mouth is slightly open; mouth widely open; jaw is dropped.

JANUS organizes facial expressions of emotion. Each of the six FACE-MOPs is essentially a tree, with typical universal expressions stored at the root (frame) and related, but atypical, face actions forming subtrees below the frame. Any recurring expression is channeled down the tree until it reaches an identical event previously encountered. This results in "reminding," whereby the emotion attributed to the previous expression is made available. Expressions that have not been encountered before are automatically incorporated into new branches of the tree. Frequently occurring events are recognized as being "typical," and the memory is restructured to reflect this.

The computer organization of JANUS's dynamic memory is a tree of nodes and links (represented as recordtypes). The nodes contain in their information field a variety of input components [a single face feature: "brows"; a binary face action: "brows raised"; or an input event identification: "ev0"]; or they may reference by name an object that may contain: (a) the typical face actions of an emotion (FACE-MOP content frame); (b) a typical face feature abstracted from experience (fsub-MOP); (c) a typical face action abstracted from experience (sub-MOP); or (d) in the case of leaf nodes, identifiers of complete input events. An input event is a list of syntactic face-action pairs and perhaps an interpretation. These objects (a-d) are also represented as instances of POPLOG's Flavours object classes. Details of (a-d), such as lists of face actions, related interpretations, and references to other objects, are stored in these data structures. Attached procedures (demons) are used to carry out the dynamic restructuring of memory to accommodate new input events. The links may be of three types (a feature, an action, or an "event"), depending on the type of object pointed to. Each link is composed of two items: the type of link and a node pointer.

Figure 4. Basic tree of predefined action components; only a few links are shown.

The initial state of the tree is shown in Figure 4, where some of the links have been omitted for clarity. The root node (m0) is linked by six "feature" links to the first-level nodes (m1-m6). These are in turn connected by action links to Level 2 nodes (m7-m12), each of which contains the typical face-action pool of one of the six basic emotions in its content frame. New events are incorporated as shown in Figure 5. An event, ev0, related to m12 but differing from it in two respects ("mouth pulled," "cheeks raised") is entered. The differences are indexed below m12 and two branches are created, each with: feature - <feature> - action - <feature-action> - event - <ev0>, where <...> represents nodes in the tree.
All subtrees below second-rank nodes (FACE-MOP frames) have this sequence. The same event is indexed twice (at nodes m15 and m18) and could be accessed (remembered) if either of the two actions occurs in a subsequent event traversing the same path.

Figure 5. A new event, ev0, differs in two actions from those in m12; each difference indexes ev0 below m12.

The dynamic reorganization is further illustrated in Figures 6a, 6b, and 6c. Identical events, ev1 and ev2, which differ from the typical (FACE-MOP) in having "cheeks raised," are indexed below that node (Figure 6a). The two are then collapsed into a single branch (sub-MOP) in Figure 6b. After six occurrences of the same event (an arbitrary number adopted from Kolodner, 1984), JANUS decides that this is a "typical" situation, promotes the action "cheeks raised" to FACE-MOP level, and indexes the events directly off that node (Figure 6c).

Figure 6a. Identical events differing from the typical FACE-MOP are indexed below "cheeks raised." Figure 6b. Formation of the fsub-MOP and sub-MOP for "cheeks raised." Figure 6c. Promotion of the sub-MOP "cheeks raised."

Table 3: Heuristic rules for FACE-MOP selection. Selection depends on specific face actions in the input face-expression event, viz.:

    IF eyes lower-lid-lowered and brows raised
    THEN MOP_surprised_gen_Face
    ELSE IF eyes lower-lid-tensed and eyes lower-lid-raised and brows raised
    THEN MOP_afraid_gen_Face
    ELSE IF mouth up and not (nose screwed)
    THEN MOP_happy_gen_Face
    ELSE IF brows lowered and brows contracted and eyes inlid-raised
        and (mouth compressed or mouth wide) and not (mouth upper-lip-raised)
    THEN MOP_angry_gen_Face
    ELSE IF (mouth upper-lip-raised and mouth upper-lip-tensed
            and (mouth lower-lip-raised or mouth lower-lip-lowered))
        or (nose screwed and cheeks raised and eyes lower-lid-raised and brows lowered)
        or (nose screwed and cheeks raised and eyes lower-lid-raised and mouth upper-lip-raised)
        or (mouth upper-lip-raised and mouth lower-lip-everted and cheeks raised and nose screwed)
        or (mouth upper-lip-raised and (mouth lower-lip-raised or mouth lower-lip-lowered)
            and (nose screwed or cheeks n-m-vert))
    THEN MOP_disgusted_gen_Face
    ELSE IF (brows centre-raised and eyes inlid-raised) or mouth down
        or (brows centre-raised and eyes lower-lid-raised)
    THEN MOP_sad_gen_Face

A face-action list entered for interpretation is first assigned to one of the six basic emotions. The face actions in the frame of a FACE-MOP are not definitional. Together, they are a pool drawn from typical expressions of that emotion. The face actions of an input event are matched against each of these pools to decide automatically which one includes most of it. The largest ratio of matched face actions in the input to the number of face actions in each of the six FACE-MOP pools decides the issue. If a tie results, a heuristic of salient features (see Table 3) is applied. If still tied, both competing emotions are output on the assumption that the expression shows both emotions. If all the input actions are consumed by the chosen emotion, then that emotion label is returned. If some of the input actions are atypical, the subtree is traversed with these in search of similar learned events and, if any are found, the corresponding leaf interpretations along with the FACE-MOP emotion are returned. If none are found, the FACE-MOP emotion is returned.
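A minimal sketch of this selection quotient, in Python, is given below; it is illustrative only (not the authors' POPLOG implementation), the pool contents shown are abbreviated placeholders rather than the full pools of Table 2, and the salient-feature heuristic of Table 3 is omitted.

    FACE_MOP_POOLS = {
        "happy":     {"mouth up", "mouth open", "teeth showing", "cheeks raised"},
        "surprised": {"brows raised", "eyes wide", "upper lid raised",
                      "lower lid lowered", "mouth open", "jaw drop"},
        # ...pools for sad, angry, afraid, and disgusted would follow Table 2
    }

    def select_face_mop(input_actions, pools=FACE_MOP_POOLS):
        # Quotient for each pool: matched input actions divided by pool size;
        # the largest quotient selects the FACE-MOP.  Ties would first go to
        # the Table 3 heuristic and, failing that, both labels would be
        # returned as a blend; that step is omitted here.
        actions = set(input_actions)
        quotients = {label: len(actions & pool) / len(pool)
                     for label, pool in pools.items()}
        best = max(quotients.values())
        winners = [label for label, q in quotients.items() if q == best]
        # Actions not covered by the winning pool(s) are the atypical actions
        # that drive the subtree search described above.
        atypical = actions - set().union(*(pools[w] for w in winners))
        return winners, atypical

    # Example: raised brows and wide eyes with an open mouth select "surprised".
    labels, leftover = select_face_mop({"brows raised", "eyes wide", "mouth open"})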
When a face-action list is entered to be learned, it is accompanied by an interpretation, and the complete event is listed in a leaf. If no atypical face actions are present, this leaf is indexed directly off the FACE-MOP frame; if there are atypical face actions, subtrees are traversed or forged with each of these as before, but any atypical actions not already present in the tree are now included as separate branches and a new event leaf node is added. A new instance of the event object class is also created. Thus, an input event may be referenced several times within the tree, once for each atypical face action that is present.

4. VALIDATION

Validation studies on JANUS addressed the question of whether its interpretations were acceptable to human beings judging the same photographs. Other than that describing the basic emotions, the knowledge JANUS acquires rests on the discriminations made by human beings in response to series of face expressions that have not been systematically validated. In many cases the judges' competence was tested using "gold standards" of systematically validated expressions obtained from two sources. Ekman and Friesen (1984) described in detail the face actions associated with basic emotion classes. They also published a set of validated photographic slides (Ekman & Friesen, 1976b) of faces exhibiting these basic emotions. The set includes neutral faces for comparison. With the permission of the publishers, a selection of these slides was digitized and used for purposes of validation.

College personnel and other lay experts were presented with the same photographs and asked to identify face actions and emotional state. These were then compared with the interpretations JANUS obtained from the same faces by passing the 34 face coordinates of each digitized image through the rule base, thereby deriving the set of face actions to be entered into memory. Lay experts were also used to teach JANUS about new emotional states in order to test the learning function. Human experts also played other roles, as judges or arbiters rating the interpretations of JANUS against those of other humans in a blind comparison.

4.1 Quantitative Validation of the Rule Base

The aim here was to obtain a precise estimate of the measure of agreement between the conclusions of JANUS and those of human beings. The rule base was tested using four experts (different from those used for the preliminary investigation) and 17 photographs. The questionnaire for eliciting face actions was divided into six sections corresponding to the six features: brows (6), eyes (7), nose (4), mouth (14), cheeks (2), and jaw (1). The numbers in parentheses represent the number of face actions for each feature. The numbers of agreements and disagreements for each feature over 17 non-gold-standard, unvalidated faces of six basic emotions in varied intensity, posed by one of the authors, were computed for all possible pairs involving the experts A-D and JANUS (J) and tested for significance using the chi-square test. The scores for pairs without JANUS were also tested for significance using the chi-square test. The comparisons for "brows" took the form shown in Table 4 and were of two classes: with and without JANUS.

Table 4. Pairwise agreement and disagreement counts for "brows" (example: chi-square = 4.72; d.f. = 9; p > 0.95). *17 photos x 5 choices per photo = 85 possible agreements (A) and disagreements (85 - A) per judge. Key: A = agree, D-A = disagree, exp. = expected frequencies.

Table 4a. With- and without-JANUS pairwise comparisons over six features.
The results are given in Table 4a. It is apparent that the near-significant result involving the eyes is not part of a general trend in the direction predicted by the alternative hypothesis. A breakdown of the disagreements in the eye section from the point of view of JANUS incriminates, in particular, judgments of "eyes narrowly open," "upper eyelid raised," and "lower eyelid raised," in that order, as producing most dissent. All of these reached full agreement in some faces but not in certain faces (Numbers 1, 4, 5, and 12). Differences of 1 pixel can be decisive, so that clear delineation of the face points is essential. Definition is not good in the eye region because of the natural shadows cast therein. Considering the quality of the images available, JANUS's agreement, or lack of it, might be considered a lower bound on the potential of this approach using state-of-the-art imaging techniques.

Although the rule base is thus in fair agreement with lay experts' judgments of the face actions present, it would be gratifying to find that both agreed when standard photographs were used. It was decided, therefore, to test the rule base using expressions that had been well validated as showing basic emotions. The pictures used for this purpose were taken from Ekman and Friesen (1976b; Pictures of Facial Affect, PFA): 84 (happy), 91 (disgusted), 90 (surprise), 92 (neutral), 41 (neutral), 38 (angry), and 37 (afraid). No conscious bias dictated this choice except that the expressions seemed very well defined. Such photographs would undoubtedly depict the typical face actions for these emotions, although these are, unfortunately, not detailed for each photograph in the published material. Agreement as to the depicted emotion of these expressions was very high among the human judges who made the original validation of this published source. The test of the rule base proceeded along the same lines as before, namely, comparing human judgments of these photographs with those produced by the rule base. The experts in this case were five clinical psychologists. The results, enabling a with- and without-JANUS comparison, are displayed in Table 5.

Table 5: With- and without-JANUS pairwise comparisons over six features (chi-square values).

The results do not reveal significant differences in the six feature areas, although the comparison of the mouth with JANUS approaches significance (.05). The without-JANUS counterpart is a little less (.12). The mouth area is therefore judged with some disparity, and the striking characteristic of JANUS's performance in this respect is that fewer face actions in this area are judged present in many cases.

Comparing JANUS's performance with that of human judges is not enough. It is necessary to compare the human judges' performance with a validated standard. A better test of the rule base might then be a comparison of the results of the rule base on the five preceding photos with descriptions of very similar expressions of the same models, pictured and described in some detail in "Unmasking the Face" (UTF; Ekman & Friesen, 1984). PFA (Ekman & Friesen, 1976b) is a collection of face transparencies without details of face actions; UTF is a book tutor that describes photographed expressions in detail. Some of the models in the two sources are the same.
The results for the five PFA faces are shown in Table 6: JANUS performed slightly better overall than the human judges, as reflected in the total face actions in agreement with the UTF descriptions, but varied from face to face, as was true of the human judges.

4.2 Validation of the Dynamic Memory

This involves testing that the FACE-MOP basic emotions are accessed correctly by their component face actions when these are used as input to the system and, also, that the basic emotions output by an untrained JANUS in response to a test set of digitized images do not, as a block, differ significantly from those adjudged by human judges to be present. A blind study, in which the emotional labels output by a trained JANUS were analyzed with regard to their acceptability to human observers, was made.

4.2.1 Interpretation of Basic Emotion Category: A Qualitative Study. JANUS has "given" knowledge about what face actions may be commonly associated with each basic emotion. These expressions are held in the frame of the FACE-MOP and, because there can be many typical expressions for a basic emotion, the knowledge is represented as a pool of face actions to which a particular user-input list of face actions can be compared. The number of matches in each of the six FACE-MOP pools divided by the number of face actions in the respective pool gives in each case a quotient; the greatest of these is used to select the FACE-MOP under which the input will be classified. To validate this function, that is, whether the "correct" FACE-MOP is selected, the frame face actions of the basic emotion under investigation were input in increasing random combinations, starting with singles, then pairs, then triples, then quadruples, and so on. Because of the potential combinatorial explosion, just seven random combinations were tried at each level. The level at which 100% success for these seven was achieved was used as an estimate of the sensitivity of the ability of frame face actions to select the "right" FACE-MOP. In Ekman and Friesen (1984) the separate combinations of face actions and appearances typifying each basic emotion are described, but what is being validated here is that a purely numerical measure (a quotient: input matches/tally of pool) will select that pool rather than any other FACE-MOP pool.

The results suggest that an input of a single given face action varies in its ability to access the FACE-MOP under test. Singles from the "surprised" pool are exceptional (all correct). Pairs and triples are all correct also. Thus, one of the correct emotions ("surprised") achieved a 100% hit rate (over seven consecutive randomly selected inputs) at the one-grouping level, "happy" and "sad" at the two-groupings level, "afraid" and "disgusted" at the four-grouping level, and "angry" at the five-grouping level. Bearing in mind that the number of face actions in the associated pool was 8, 9, 7, 14, 17, and 17, respectively, we concluded that the sensitivity of the FACE-MOP selection function was acceptable.

The learning capability was investigated to ensure that new input face actions and emotion labels were learned and correctly retrieved in subsequent interpretations. Two experts were asked to view six photographs, one for each basic category, and supply lists of face actions together with their own interpretations. These were entered into JANUS. Subsequent input of the same face actions in retrieve mode, that is, without an accompanying emotion, did retrieve the correct interpretations.
Thus, the learning function appeared satisfactory.

4.2.2 Validation of the Basic Emotion Output by JANUS by Comparison with Human Judges: A Quantitative Study. A more stringent test would be whether JANUS returns the same basic emotions as do human experts when presented with an arbitrary set of face photographs. Four college computer staff (Experts A-D), aged 21 (male), 24 (female), 38 (male), and 38 (female) years, without formal training in facial expressions, were presented with 17 full-face, black-and-white, A6-sized photographs of one of the authors posing various intensities of each of the emotions happy, sad, angry, disgusted, afraid, and surprised, and were asked to select, in each case, from the emotion terms "happy," "sad," "disgusted," "afraid," "angry," and "surprised," the term that described the emotion signaled. The face actions obtained by passing the geometric descriptions of these same photographs through the rule base were input to JANUS and the returned emotion was noted. All attributions are presented in Table 7. Photograph 2 was used as the neutral expression for comparison and was omitted from the analysis.

Table 7: Interpretation of basic emotion category (the emotion label assigned to each of the 17 photographs by JANUS and by Experts A-D).

In order to assess the capability of these judges, the four judges, in a separate trial, interpreted 24 PFA pictures randomly selected over the six basic emotion labels. Their performance in this task (79.2%, 95.8%, 91.6%, and 91.6% correct compared to the published emotion labels for the same faces) indicates that they are in no way a deviant group when it comes to judging faces. However, there is clearly a spread of agreement among the five sources in the table, and statistical tests were applied to test for the significance of these differences overall.

The kappa statistic (Cohen, 1960, 1968) was used to test for agreement among the five raters on the results in Table 7, with the following results: κ = .467, var(κ) = .0013, Z = 13.13. This value of Z exceeds the .01 significance level, and we concluded that the five raters, including JANUS, exhibit significant agreement. Additional cases gave the following results:

    With JANUS omitted, κ = .45, var(κ) = .0093, Z = 9.46;
    With Expert A omitted, κ = .45, var(κ) = .00212, Z = 9.83;
    With Expert B omitted, κ = .44, var(κ) = .00254, Z = 8.8;
    With Expert C omitted, κ = .46, var(κ) = .00228, Z = 9.63;
    With Expert D omitted, κ = .469, var(κ) = .00139, Z = 12.56;
    With Experts C and D omitted, κ = .59, var(κ) = .00043, Z = 9.1.

The κ values suggest moderate agreement in all of these cases, and this agreement does not vary to the extent that would suggest any particular expert was markedly deviant. All these Z values exceed the .01 significance level (Z = 2.32). This would argue for rejecting the hypothesis that the agreement is due to chance, suggesting, instead, significant agreement between the judgments. A better test of the ratings of JANUS vis-a-vis the human experts may be obtained by using the Williams (1976) "In" statistic on the results of Table 7.
This test is specially designed to compare the joint agreement of several raters (human experts) with another rater (say, JANUS). For purposes of this discussion, we merely state that a statistic In (where n is the number of reference raters, not including JANUS) can be derived as

    In = P0 / Pn,

where P0 represents the overall agreement of the isolated rater with the reference raters and Pn represents the overall group agreement among raters 1 to n. The results are given in Table 8. Five sets of calculations were done, with JANUS and the four human experts being selected in turn for scrutiny. Two additional cases were considered:

1. The ratings of JANUS were all replaced by the fixed emotion "disgusted," and
2. The ratings of JANUS were all replaced by randomly selected emotions.

These contrived situations were used to assess the sensitivity of the test. In addition to calculating In for each case, it is important to be able to estimate the error limits for In. Following Williams (1976), upper bounds were calculated for the population In at the 5% significance level. The results are presented in Table 8.

Table 8: I4 comparisons for the test cases of Section 4.2 (I4, and the upper bound on I4 at the 5% significance level, with JANUS and each of Experts A-D taken in turn as the focused expert, and for the two contrived JANUS cases of fixed and random responses).

A value of In close to 1 would suggest that the ratings of the test judge are as consistent with those of the reference judges as the ratings of the latter are mutually consistent. An upper bound of 1 or more would confirm this at the .05 confidence level. An In significantly less than 1 (and an upper bound of less than 1 at the chosen confidence level) would imply that the ratings of the test judge are not consistent with those of the reference judges. The results suggest that there is practically no difference between the joint ratings of the test judge and the reference judges. Thus, Experts A-D and JANUS agree, although Expert C is slightly anomalous. This is in marked contrast with the last two (contrived) cases, where JANUS (with tailored ratings) is clearly inconsistent with the human experts.

Another approach is to use "meta-judges" to rate the interpretations in Table 7 in a blind comparison. The meta-judges (aged 30-35 years) had no formal training in recognizing facial expressions. Two were brothers who had lived apart for many years and one was married to the third rater, so this group could have developed some perceptual commonality. The photographs that they were to judge were those upon which the judgments of Table 7 were made. They were not aware that one of the sources of those judgments was a computer. As an index of their prowess in such a task, they scored, respectively, 40 (80%), 42 (84%), and 45 (90%) out of 50 (100%) in agreement with a gold standard set of expressions (PFA; Ekman & Friesen, 1976b). It was felt that their ratings of the judgments in Table 7 would have credibility. Each meta-judge was asked to indicate whether each of the lay experts' interpretations was (a) good, (b) fair, or (c) poor. These were compared using the Friedman (1937) ANOVA (analysis of variance) test.
This is appropriate where the same group of subjects is studied under different treatments and the outcomes are to be compared. In this case the outcomes are the numbers of "a" and "b" grades given to the interpretations of Table 7 by meta-judges observing the same set of photographs. We wished to find out whether or not the grades obtained by the five differed significantly. The data are prepared as a two-way table of five columns and 17 rows, in which each row contains the rank positions, across the row, of the number of "a"s and "b"s accredited to each expert (including JANUS). A column tabulates these over the 17 face photographs. The test statistic, χ²r, is distributed approximately as chi-square with degrees of freedom equal to the number of columns minus 1. A value equal to or greater than that at the .05 level of significance (9.49) implies that the hypothesis that all the samples came from the same population may be rejected. The results of this test argue for accepting the five samples as coming from the same population (χ²r = 2.5694, df = 4, p > .5). The ratings are shown in Table 9.

Table 9: Three meta-judges' ratings (a = good, b = fair, c = poor) of the interpretations of each lay expert and of JANUS for the 17 posed photographs, compared with the meta-judge's own interpretation of the same photograph; the rating shown is the better of the first and second (if any) interpretations.

Although it is desirable that posed expressions should be recognizable as the intended emotion, it is possible that we do not always signal our true feelings, or present them in an idiosyncratic way.
The accuracy problem, whether the face expression is recognized as signaling the intended emotion, cannot be addressed because we do not know what emotion was felt; but we do know what was desired to be signaled, and the model's success in signaling this can be calculated for the photographs on which Table 7 rests. The intended emotions can be compared with the interpretations made by the experts and JANUS to see whether the intended message got through. The numbers of judgments in accord with the intended emotion over the 17 non-gold-standard photographs were as follows: JANUS, 13; Expert A, 12; Expert B, 14; Expert C, 8; and Expert D, 11.

4.2.3 Validation of the Learning and Recall Functions. The learning and recall functions of the dynamic memory were tackled next. A set of face photographs different from those used in previous tests was presented to a group of 30 people, and a total of 50 event descriptions (face actions and nonstandard emotion labels) were obtained from them. These were entered into JANUS in the learn mode. This resulted in an experienced ("trained") memory. The question addressed was: How acceptable are these labels when output again in response to a face description input in interpret mode? The following procedure was designed to discover this. Geometric descriptions of the 17 photographs used in the "untrained" validation (but not in the training session) were converted to face actions by the rule base and input to the dynamic memory. The interpretations returned are given in Table 10.

Table 10: Interpretations of JANUS after training (for each photograph, the basic label, e.g., angry, disgusted, afraid, surprised, happy, or sad, together with alternative learned labels such as "cheerful, anticipating pleasure," "puzzled, fearful, uncomprehending," "depressed, unhappy, having distaste," "receptive to argument, interested," and "disliking, displeased").

Fifty-five independent judges (untrained people and college students) were asked to judge the same faces, but their judgments were discarded. They were told that "other people" had interpreted them as showing this or that emotion and were asked to rate these interpretations as good, fair, or poor. Each judge rated up to three photographs, both for the basic emotion category and for the new "learned" emotion labels, yielding 149 basic and 336 learned ratings. The results are given in Table 11.

Table 11: Ratings of learned and basic emotion labels

    rating    Learned emotion (no., %)    Basic emotion label (no., %)
    good      103   30.66                 78    52.35
    fair      117   34.82                 47    31.54
    poor      116   34.52                 24    16.11
    total     336   100.0                 149   100.0
    Note: There are several learned emotions but only one basic emotion.

The results show a clear preference for the basic (FACE-MOP) emotion label. This is in line with the view of emotions from a prototype perspective rather than as classical concepts (Fehr & Russell, 1984), if we view the FACE-MOP labels as the more typical "core" and the learned labels as the fuzzier perimeter.

A different analysis from that shown in Table 11 provides a practical validation of JANUS's interpretative power if the choice given to the user is addressed directly. Remembering that the FACE-MOP basic emotion is output as well as the learned interpretations for each face description entered in search of an interpretation, validation may proceed with reference to the highest grade obtained per face, whether "given" or "learned." This analysis showed that 94% of the interpretations were approved in some measure: good, 105 (70.5%); fair, 35 (23.5%); poor, 9 (6%).

An attempt was made to assess the meta-judges' prowess in interpreting such photographs by asking them to interpret 24 gold standard PFAs (Ekman & Friesen, 1976b). The 24 pictures were randomly selected within the six basic emotions, and each had been well validated by Ekman and Friesen (1976b) for the emotion signaled.
Unfortunately, there was a considerable drop-out in this endeavor, with 39 of 55 (70.9%) being ultimately tested. The task was to judge which basic emotion (happy, sad, angry, afraid, surprised, or disgusted) each picture depicted. Their responses were compared with the published validation of these pictures. On average, the 39 meta-judges scored 20.5 correct out of 24 (85.42%; range = 13-24). Their accuracy appears satisfactory in comparison with this published classification.

4.3 Discussion of System Validation

The basis of the validation was comparison of JANUS with humans studying the same face photographs, either directly or through the agency of meta-judges. The criterion underlying this approach is that one cannot expect JANUS to agree with the humans to a greater extent than the latter agree among themselves. The validation is only as sound as the capabilities of the human lay experts, and although, a priori, we assume that any adult is an expert at identifying emotions, it turns out that the consensus of the adults used is only moderate. However, as a criterial policy, this is still not enough: It is necessary to have some measure of the humans' prowess in "reading" faces. That the particular quartets of lay experts were somewhat varied, both in their interpretations of the face features present and in the emotion signaled when judging face photographs, may suggest that these are not tasks on which there is close agreement between humans generally, or that this happened to be a discordant group. The problem is that there is no normative test of capability for recognizing individual face actions.

One measure available for objectively assessing prowess in detecting the emotion signaled was the gold standard set of transparencies, PFA (Ekman & Friesen, 1976b). Each transparency features a face on which the expression has been classified as characteristic of one of the basic emotions (in contradistinction to the other basic emotions) by a high level of consensus of observers in the original validation of the set. The consensus classification for each transparency is taken to be the "correct" basic emotion for validating JANUS. Where possible, the judges involved in the validation of JANUS were asked to judge the emotion signaled by a randomly chosen set of these transparencies in order to give some inkling as to their capabilities.

In relation to this gold standard, JANUS and the meta-judges perform well enough, but what is an acceptable passing mark? The four interpreters of the basic emotions depicted in the 17 photographs (posed by one of the authors) on which JANUS was validated scored 79.2%, 95.8%, 91.6%, and 91.6% correct on 24 PFA transparencies that had produced, in the canonical validation, an average consensus agreement of 91.21% for the emotion depicted in each case (range = 71-100%); the 3 persons who meta-judged the interpretations of JANUS achieved 80%, 84%, and 90% "correct" PFA interpretations over 50 transparencies (with an average canonical consensus of 91.78%); and the 38 persons meta-judging the aptness of the JANUS learned interpretations averaged 85.42% correct over 24 transparencies (with an average canonical consensus of 91.21%).
If one accepts that this degree of capability on the gold standard commands respect, one will have faith in their assessment of the system's output emotions (trained and basic emotions together): Only 7 of the 55 judges judging the JANUS learned interpretations decided that a face was not acceptably described by either the basic or the learned interpretations. Their adverse judgments involved 5 different faces of the total shown.

There is no comparable gold standard for recognizing face actions, but use was made of UTF (Ekman & Friesen, 1984), as reported in Table 6. Of 29 face actions described in detail in UTF with comparable expressions on the same model in PFA, JANUS derived 26 from geometrical measures.

The validation of the rule base showed that JANUS often disagreed with the human experts as to which eye actions were present. This was particularly evident in assessing the degree of eye opening. JANUS decides this on arbitrary numerical intervals, and more work is needed if these are to capture human perceptual distinctions. The relatively greater rise of the inner part of the upper eyelid was also a feature that caused disagreement. Although indicated plainly by a diagram on the validation questionnaire, the comparison with the neutral face seemed to have been overlooked by the experts.

There are two aspects of JANUS that would be improved upon in a field model; because both affect validation, they may, with advantage, be mentioned here. First, a well-validated set of face expressions should form the basis on which the measurements are undertaken for refining the rule base and for validating system function. Acquiring models, training them, producing images with good definition and without shadow, and procuring representative samples of people to judge these is a major project. With the constraints in time, equipment, and budget available to the JANUS project, we would consider the results achieved to be a lower bound on what can be achieved with this methodology given ample resources. Second, there are trained experts in the facial recognition of emotional expressions, but they are few and far between. A definitive system could only gain in interpretative capability by drawing its expert knowledge from such sources. The problem with using everyday people as knowledge sources is the uncertainty about their capability.

5. DISCUSSION

JANUS brings together two diverse psychological theories: Schank's (1982) theory of reminding and Ekman and Friesen's (1976b, 1984) explicit theory of which face expressions signal which basic emotions. The product is an emotion retrieval system that learns from its experience of input face-expression events and applies this learning, as an individual view of its world, to subsequent input. What has been achieved is a small research prototype that produces an emotion label, or a choice of several of these, from an input description of a facial expression. Because the intended applications of this technological approach would involve data extractable from video camera frames, the input face description is in the form of the x- and y-coordinates of 34 standard face locations. Several assumptions are fundamental to the approach, and are discussed briefly.

- An event approach has been adopted: face expressions of emotion are treated as autobiographical events, although JANUS uses events from many people. Events are constrained to a list of face actions with or without an emotion label.
Several assumptions are fundamental to the approach, and are discussed briefly:

• An event approach has been adopted: Face expressions of emotion are treated as autobiographical events, although JANUS uses events from many people. Events are constrained to a list of face actions with or without an emotion label. This is a very constrained type of event, with context limited to the accompanying face actions in the expression.
• It is assumed that the FACE-MOP frame pool forms part of the context-free autobiographical knowledge (cf. Conway & Bekerian, 1987a) about emotions, which is updated by abstraction from the input over time. However, the co-existence of a set of these face actions overlapping on a face at the one time implies a mutual context.
• Because there are only six FACE-MOPS and the input is constrained only by the necessity of having one face action from the union of these pools, there is the implicit assumption that all emotions that can be registered on a face can be classified under these six basic-emotion FACE-MOPS, which compete for the input event (see the sketch later in this section).
• It is not clear whether JANUS models human neurological behavior. It is not at all certain that all facial expressions and, by implication, all emotions able to be displayed on the face can be classified under the six basic emotions.
• It is not intuitive to us that we have abstractive conceptual structures for typical face actions, or that atypicalities from them are abstracted from experience and that perception is organized around them. Does one smile that does not reach the eyes remind one of another? Faced with an expression, do alternative emotion labels come to mind?

Oatley and Johnson-Laird (1985), Sloman (1986), and Sloman and Croucher (1981a, 1981b) emphasized the crucial role that emotions may play in intelligent systems. Thus, within a changing world, they may act as essential interrupts in a system with multiple motives but limited resources or, released at particular junctures of multigoal planning sequences, act as global communicators and coordinators maintaining transition modes and preparing for action by focusing attention on certain goals. The emphasis in JANUS is rather the obverse function of giving computers an awareness of the emotion that the user might be feeling, so that inferences may be drawn about the motivations implied by it. Perceived emotions, however, are open to more than one interpretation. Human expressions should be of some communicatory value to computers and robots, but their interpretation in terms of motives and plans would require severely restricted contexts and evidence from other sources. To rely on facial expression alone to communicate junctures of plans would produce only generalities. Speech is precise, but the face may convey discordant information that casts doubt on the veracity of the words, for example, in sarcasm. Humans make use of multiple channels of communication. Body language is one channel only. Petajan (1985) combined acoustic recognition with automatic lip-reading and found that the latter always improves the recognition rate of digits, letters, and words compared to acoustic recognition alone. Happily, within human-computer interaction the user can be prompted to confirm the body language verbally or in terms of keystrokes in response to screen queries. JANUS was conceived as having a potential role to play within human-computer interaction: as a step in the direction of making the computer more sensitive to the cognitive states of the user.
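Returning to the assumption flagged above that the six basic-emotion FACE-MOPs compete for each input event, the sketch below illustrates such a competition in schematic form; the MOP contents and the overlap scoring are invented for illustration and are not the norms actually held in JANUS's dynamic memory.

```python
# Schematic only: an input event is a set of face actions, and each of the
# six FACE-MOPs is scored by how many of those actions it accounts for.

FACE_MOPS = {
    "happy":     {"lip corners up", "cheeks raised", "mouth open"},
    "sad":       {"lip corners down", "inner brows raised"},
    "angry":     {"brows lowered", "lips pressed", "eyes narrowed"},
    "afraid":    {"brows raised", "eyes wide open", "lips stretched"},
    "surprised": {"brows raised", "eyes wide open", "mouth open"},
    "disgusted": {"nose wrinkled", "upper lip raised"},
}

def compete(event):
    """Rank the FACE-MOPs by how many of the event's face actions each accounts for."""
    scores = {emotion: len(event & actions) for emotion, actions in FACE_MOPS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# An ambiguous event: "surprised" wins, but "afraid" is a close competitor;
# atypicalities and learned labels are what would discriminate further.
print(compete({"brows raised", "eyes wide open", "mouth open"}))
```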
Frames grabbed from a video camera scanning the user would be processed by an automatic feature-finding algorithm [not implemented, but Bromley (1977, cited in Laughery, Rhodes, & Batten, 1981), Craw, Ellis, and Lishman (1987), Petajan (1985), and Sakai, Nagao, and Kanade (1972), among others, have done work along these lines] into the required x- and y-coordinates, which, input to JANUS in turn, would produce the emotion. The emotion, along with other information, would be used to infer the user's motives and plans. These would be interpreted within the context of the user's coding goals and plans already communicated to the dialogue coordinator's user model, and might prompt automated messages to the screen requesting further amplification. Useful information from other monitoring sources (e.g., natural language input, keyboard posture, and lip-reading) would need to be coordinated in the dialogue. For a system that learns its expertise incrementally from the same user with whom it has to interact in daily use, the user's repertoire of facial expressions and validated associations could represent a much more informed personal knowledge source, with the present expression interpreted in reference to the events that caused it in the past. To focus exclusively on the coding task, the context needs to be constrained so that the cognitions associated with the facial expression are interpreted in terms of the coding only, to the exclusion of incidental causes, for example, indigestion. But this is for the future. Without such constraints, the cognitive associations can only be indicated in far more general terms (e.g., see Roseman, 1982). In practice, none of these steps would be without great problems. Automatic feature-measurement algorithms are far from perfect and have not, to our knowledge, been proved capable of detecting the fine distances required for JANUS's needs. Such algorithms, if developed, would need to be hardwired to effect real-time processing. Contour tracing, which requires processing of frames buffered in memory, is too time consuming (Petajan, 1985). The need for real-time video analysis can be relaxed, and measurement can be carried out on random or sampled grabbed frames. Much of the behavioral measurement on people is carried out in this way. The measurement is indirect insofar as it is made by observers or some technical device (Wallbott, 1980, discussed the various techniques and their advantages and disadvantages). Digital time codes are required for computer analysis (Ekman, Friesen, & Taussig, 1969). One problem with this approach, for the use referred to before, is not knowing at what stage of the expression the grab has occurred and, conversely, how to locate the beginning frame of a movement. Expressions have an onset, a peak, and a decline, and serial frames in succession would be required to differentiate these. The problem would be compounded for blends. In view of these difficulties, direct measurement might appear a useful alternative, at least in experimental situations. Johansson (1973) attached small lights in strategic positions on the body and filmed the subject in motion without any other source of light. Observers viewed the film, which depicted moving spots of light, and interpreted them correctly. Bassili (1975) used a related technique on the face to show that emotional expressions could be interpreted.
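If direct measurements of this kind were available frame by frame, the problem noted above of locating the onset and peak of a movement reduces to a simple search over the measured signal. The following is only a sketch under that assumption; neither the measurement nor this analysis is part of JANUS, and the trace, threshold, and frame spacing are invented.

```python
# Sketch: given a per-frame distance measure (e.g., lip-corner displacement
# from rest), report the onset frame and the peak frame of the movement.

def onset_and_peak(signal, neutral, threshold=0.1):
    """Return (onset_index, peak_index) for a per-frame distance measure.

    Onset is the first frame departing from the neutral value by more than
    `threshold`; the peak is the frame of maximum departure.
    """
    onset = next((i for i, v in enumerate(signal) if abs(v - neutral) > threshold), None)
    peak = max(range(len(signal)), key=lambda i: abs(signal[i] - neutral)) if signal else None
    return onset, peak

# Hypothetical trace over nine sampled frames: onset at frame 2, peak at frame 5.
trace = [0.0, 0.02, 0.15, 0.4, 0.7, 0.9, 0.6, 0.3, 0.05]
print(onset_and_peak(trace, neutral=0.0))   # (2, 5)
```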
There are several techniques described by Mitchelson (1975) for measuring body motion in real time automatically by fixing miniature radiation emitters or transducers to parts in motion. These include sources of infrared and polarized light. It would seem probable that a safe, unobtrusive technique will be forthcoming for facial measurement. Even so, this would be no solution for real-world automatic monitoring of expressions. Our methodology may be criticized on the grounds that a connectionist approach would provide a higher recognition rate. We have not come across any published evidence supporting such a claim. The single-layered, Perceptron-like Wisard (Aleksander & Burnett, 1983; Aleksander, Thomas, & Bowden, 1984; Stonham, 1986) can classify smiles from frowns, but it is uncertain whether it would be able to generalize this learning to all comers over all the basic emotions. Although faces have been used as patterns in connectionist network models (Kohonen, 1977; Kohonen, Oja, & Lehtio, 1981; McClelland & Rumelhart, 1985) and the emergent properties of such networks can simulate an approximation of the function ascribed to face recognition units (see Bruce, 1988), it remains uncertain how they would perform in this domain. MOPS were chosen to represent JANUS's memory because of their crucial role in Schank's (1982) theory of dynamic memory and reminding. There is no separate concept of "working memory" in the system. No new FACE-MOPS are formed in the course of classifying input, but the six FACE-MOPS are not static structures in memory: They are dynamic, constantly monitored, and liable to change in content. In the course of the system's use, "ad hoc categories" (see Barsalou, 1990) are formed within the organization of the FACE-MOPS. In JANUS, these are called sub-MOPS. Sub-MOPS are provisional categories on formation because they may be the result of some temporary regularity in the environment that is not maintained over ensuing experience. FACE-MOPS characteristically group together, in their content frames, face actions with a shared goal: that of signaling an emotion at the one time. The conception of MOPS as dynamic memory structures in human memory may have its proponents and critics, but MOPS deserve and receive consideration in explaining experimental findings (cf. Conway & Bekerian's A-MOPS, 1987b). They are useful as knowledge representation structures in knowledge-based systems (cf. Kolodner's E-MOPS, 1984; Lebowitz's Spec-MOPs, 1980). JANUS demonstrates their use in this respect at a much lower level of specialized knowledge than is usually met with. MOPS, as originally conceived, have social, personal, and physical aspects, and these aspects are distinguishable in FACE-MOPS. There are a number of ways in which the scope of JANUS could be improved. JANUS might be fooled by false and masked emotions because one of the discriminators between these is not measured. Smiling is not a unitary class of behavior (Ekman et al., 1988; Ekman et al., 1990; Ekman & Friesen, 1984). One would have to represent the action of orbicularis oculi pars lateralis (which raises the cheek, tightens the periorbital ring muscle, and draws in the periocular skin) by some linear vertical distance. Although there is a heuristic for raised cheeks, the dropping of skin under the brow (which is a telling sign of genuinely happy eyes) has a fullness that cannot be conveyed by a linear measure.
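As an illustration of the point, a linear cheek-raise heuristic of the kind referred to might look like the following sketch; the landmark, coordinate convention, and threshold are assumptions, not the rule actually used in JANUS. It registers the raised cheek but is blind to the change in fullness under the brow.

```python
# Illustration of the limitation just described: a landmark-based linear rule
# can detect a raised cheek, but a posed (non-Duchenne) smile can raise the
# cheeks too, so the rule alone cannot separate felt from false smiles.

def cheeks_raised(infraorbital_y: float, neutral_infraorbital_y: float,
                  tolerance: float = 0.05) -> bool:
    """True if the cheek landmark has moved upward (smaller image y) by more
    than `tolerance` of its neutral value."""
    return infraorbital_y < (1.0 - tolerance) * neutral_infraorbital_y

# A posed smile that raises the cheeks passes this test just as a felt one does,
# which is one reason the system might be fooled by false and masked emotions.
print(cheeks_raised(infraorbital_y=92.0, neutral_infraorbital_y=100.0))   # True
```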
One would expect more sophisticated systems to make use of brightness-intensity data analysis to supplement distances in representing fullness. As a heuristic, the distance from below the eyebrow to the upper eyelid might be considered, but this will also vary with movements of the brow, say in "surprise," and so lacks specificity. A further limitation is evident in the exclusion from JANUS of face signals that control, emphasize, punctuate, and give shades of meaning to speech. These would need to be allowed for, depending on the use to which they are put. Intensity of emotion has not been implemented. The capability to rate the intensity of facial actions would be desirable in JANUS. However, the level of precision that could be achieved in measuring distances on digitized photographs was not sufficient to put this into effect.

6. CONCLUSIONS

The methodology described is, on the whole, capable of mapping face geometry to emotion labels, although the correspondence with some human judgments may be less than perfect and the rule base may benefit from further refinements, perhaps from a statistical analysis of the most efficient parameters for representing a face action (Pilowski, Thornton, & Stokes, 1985, 1986), and might also be improved to take account of blends and intensities of emotion. It is envisaged that such a system could form the basis of a perceptual front end, providing input to the computer's user model. It is possible that the classificatory and learning tasks required to monitor human facial expressions will be easier with a connectionist approach. We are not aware of any system that achieves this at present, but we are looking at the possibility. In this task we seek a macrostructural approximation to model such mappings.

REFERENCES

Aleksander, I., & Burnett, P. (1983). Reinventing man. London: Kogan Page.
Aleksander, I., Thomas, W.V., & Bowden, P.A. (1984). Wisard: A radical step forward in image recognition. Sensor Review, 4, 120-124.
Baddeley, A. (1979). Applied cognitive and cognitive applied psychology: The case of face recognition. In L. Nilsson (Ed.), Perspectives on memory research. Hillsdale, NJ: Erlbaum.
Barsalou, L.W. (1990). Are there static category representations in long-term memory? Behavioral and Brain Sciences, 9, 6X-652.
Bassili, J.N. (1975). Facial motion in the perception of faces and of emotional expression. Journal of Experimental Psychology: Human Perception and Performance, 4, 373-379.
Bower, G., Gilligan, S., & Monteiro, K. (1981). Selectivity of learning caused by affective states. Journal of Experimental Psychology: General, 110, 451-473.
Bower, G., & Karlin, M. (1974). Depth of processing pictures of faces and recognition memory. Journal of Experimental Psychology, 103, 751-757.
Brodie, M. (1989). Making it work: An overview of the Janus Project. LASIE, 19(5), 104-112.
Bruce, V. (1988). Recognising faces. London: Erlbaum.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220.
Colombani, D., Sabonnadiere, E., Auriol, P., & Pardo-Gibson, O. (1988). Janus: A CAO software package for calculating the electromagnetic susceptibility of industrial electrotechnical systems. Actes du Colloque International 'Les RFI et EMI en Electronique de Puissance', 71-78. Paris, France: Electron Puissance.
Conway, M.A., & Bekerian, D.A. (1987a). Situational knowledge and emotion. Cognition and Emotion, 1(2), 145-191.
Conway, M.A., & Bekerian, D.A. (1987b). Organization in autobiographical memory. Memory & Cognition, 15(2), 119-132.
Courtois, M.R., & Mueller, J.H. (1979). Processing multiple physical features in facial recognition. Bulletin of the Psychonomic Society, 14, 74-76.
Craw, I., Ellis, H., & Lishman, J.R. (1987). Automatic extraction of face features. Pattern Recognition Letters, 5(2), 183-187.
Ekman, P., Davidson, R.J., & Friesen, W.V. (1990). The Duchenne smile: Emotional expression and brain physiology. Journal of Personality and Social Psychology, 58, 342-352.
Ekman, P., & Friesen, W. (1971). Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17(2), 124-129.
Ekman, P., & Friesen, W. (1976a). Measuring facial movement. Journal of Environmental Psychology and Nonverbal Behavior, 1, 56-57.
Ekman, P., & Friesen, W. (1976b). Pictures of facial affect. Palo Alto, CA: Consulting Psychologists Press.
Ekman, P., & Friesen, W. (1978). The facial action coding system: A technique for the measurement of facial movement. Palo Alto, CA: Consulting Psychologists Press.
Ekman, P., & Friesen, W. (1984). Unmasking the face: A guide to recognizing emotions from facial cues. Englewood Cliffs, NJ: Prentice Hall.
Ekman, P., Friesen, W.V., & O'Sullivan, M. (1988). Smiles when lying. Journal of Personality and Social Psychology, 54, 414-420.
Ekman, P., Friesen, W., & Taussig, T.G. (1969). VID-R and SCAN: Tools and methods for the automated analysis of visual records. In G. Gerbner, D.R. Holsti, K. Krippendorf, W.J. Paisley, & P.J. Stone (Eds.), The analysis of communication content. New York: Wiley.
Ekman, P., Friesen, W., & Tomkins, S. (1971). Facial affect scoring technique: A first validity study. Semiotica, 3, 37-58.
Ekman, P., Sorenson, E., & Friesen, W. (1969). Pan-cultural elements in facial displays of emotion. Science, 164, 86-88.
Ellis, H.D., Jeeves, M.A., Newcombe, F., & Young, A. (Eds.). (1986). Aspects of face processing. Dordrecht, Netherlands: Nijhoff.
Fehr, B., & Russell, J.A. (1984). Concept of emotion viewed from a prototype perspective. Journal of Experimental Psychology: General, 113, 464-486.
Fischer, G., Lemke, A.C., Mastaglio, T., & March, A.I. (1991). Critics: An emerging approach to knowledge-based human-computer interaction. International Journal of Man-Machine Studies, 35(5), 695-721.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32, 675-701.
Galper, R.E., & Hochberg, J. (1971). Recognition memory for photographs of faces. American Journal of Psychology, 84, 351-359.
Hinrichs, E.W. (1988). Tense, quantifiers, and contexts. Computational Linguistics, 14(2), 3-14.
Izard, C.E. (1971). The face of emotion. New York: Appleton-Century-Crofts.
Jensen, D. (1986). Facial perception: Holistic or feature analytic? Proceedings of the Human Factors Society, 30 (Pt. 1), 729-733.
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception and Psychophysics, 14, 201-211.
Kearney, G.D. (1991). Design of a memory based expert system for interpreting facial expressions in terms of signalled emotions. Unpublished doctoral dissertation, Thames Polytechnic, London, England.
Kohonen, T. (1977). Associative memory: A system-theoretical approach. Berlin: Springer-Verlag.
Kohonen, T., Oja, E., & Lehtio, P. (1981). Storage and processing of information in distributed associative memory systems. In G. Hinton & J.A. Anderson (Eds.), Parallel models of associative memory. Hillsdale, NJ: Erlbaum.
Kolodner, J.L. (1984). Retrieval and organizational strategies in conceptual memory: A computer model. Hillsdale, NJ: Erlbaum.
Laughery, K., Rhodes, B., Jr., & Batten, G.W., Jr. (1981). Computer-guided recognition and retrieval of facial images. In G.M. Davies, H.D. Ellis, & J.W. Shepherd (Eds.), Perceiving and remembering faces. London: Academic Press.
Lebowitz, M. (1980). Generalization and memory in an integrated understanding system (Tech. Rep. No. 186). New Haven, CT: Yale University, Department of Computer Science.
Mase, K. (1991). Recognition of facial expression from optical flow. IEICE Transactions, E74(10), 3474-3483.
Mase, K., Suenaga, Y., & Akimoto, T. (1987). Head Reader: A head motion understanding system for better man-machine interaction. Proceedings of the 1987 IEEE International Conference on Systems, Man, and Cybernetics, 3, 970-974.
McClelland, J.L., & Rumelhart, D.E. (1985). Distributed memory and the representation of general and specific information. Journal of Experimental Psychology: General, 114, 159-188.
Mitchelson, D.L. (1975). Recording of movement without photography. In D.W. Grieve (Ed.), Techniques for the analysis of human movement. London: Lepus (an imprint of A. & C. Black).
Oatley, K., & Johnson-Laird, P.N. (1985). Sketch for a cognitive theory of the emotions (Cognitive Science Research Paper CSRP.045). Falmer, England: University of Sussex.
Patterson, K.E., & Baddeley, A. (1977). When face recognition fails. Journal of Experimental Psychology: Human Learning and Memory, 3, 406-417.
Petajan, E. (1985). Automatic lipreading to enhance speech recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 40-47. ISBN 0818606339.
Pilowski, I., Thornton, M., & Stokes, B. (1985). A microcomputer-based approach to the quantification of facial expressions. Australasian Physical & Engineering Sciences in Medicine, 8, 70-75.
Pilowski, I., Thornton, M., & Stokes, B. (1986). Towards the quantification of facial expressions with the use of a mathematical model of the face. In H. Ellis, M.A. Jeeves, F. Newcombe, & A. Young (Eds.), Aspects of face processing. Lancaster, England: Martinus Nijhoff.
Raghavan, S.A., & Chand, D.R. (1989). Exploring active decision support: The JANUS project. In R. Blanning & D. King (Eds.), Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences, Volume III: Decision Support and Knowledge Based Systems Track (Cat. No. 89TH0244-4), 33-35. Washington, DC: IEEE Computer Society Press.
Roseman, I. (1982). Cognitive aspects of discrete emotions. Unpublished doctoral dissertation, Yale University, New Haven, CT.
Sakai, T., Nagao, M., & Kanade, T. (1972). Computer analysis and classification of photographs of human faces. Proceedings of the First USA-Japan Computer Conference, Tokyo, 1, 55-62. Montvale, NJ: AFIPS and Information Processing Society.
Schank, R.C. (1982). Dynamic memory: A theory of reminding and learning in computers and people. Cambridge, England: Cambridge University Press.
Schank, R.C. (1984). Memory-based expert systems (Interim Report AFOSR-TR-84-0814). New Haven, CT: Yale University, Computer Science Department.
Sergent, J. (1984). An investigation into component and configural processes underlying face perception. British Journal of Psychology, 75, 221-242.
Sheehy, N.P. (1989). Non-verbal behaviour in the demonstrator. In Communication failure in dialogue: Techniques for detection and repair. Deliverable 9: Implementation of dialogue system (Esprit Project 527, Ref. CFID.Dg.2). Leeds, England: University of Leeds, Department of Psychology.
Sloman, A. (1986). Motives, mechanisms and emotions (Cognitive Science Research Reports, Serial No. CSRP 0620). Falmer, Brighton, England: University of Sussex, School of Social Studies.
Sloman, A., & Croucher, M. (1981a). Why robots will have emotions (Cognitive Science Research Paper No. 176). Falmer, England: University of Sussex, School of Social Sciences.
Sloman, A., & Croucher, M. (1981b). You don't need a soft skin to have a warm heart (Cognitive Science Research Paper, Serial No. CSRP 004). Falmer, Brighton, England: University of Sussex, School of Social Sciences.
Stonham, T.J. (1986). Practical face recognition and verification with Wisard. In H. Ellis, M.A. Jeeves, F. Newcombe, & A. Young (Eds.), Aspects of face processing. Lancaster, England: Martinus Nijhoff.
Strnad, B., & Mueller, J.H. (1977). Levels of processing in facial recognition memory. Bulletin of the Psychonomic Society, 9, 17-18.
Wallbott, H.G. (1980). The measurement of human expression. In W. von Raffler-Engel (Ed.), Aspects of nonverbal communication. Lisse: Swets & Zeitlinger.
Watkins, M.J., Ho, E., & Tulving, E. (1976). Context effects in recognition memory for faces. Journal of Verbal Learning and Verbal Behavior, 15, 505-517.
Wells, G.L., & Hryciw, B.A. (1984). Memory for faces: Encoding and retrieval operations. Memory and Cognition, 12, 338-344.
Williams, G.W. (1976). Comparing the joint agreement of several raters with another rater. Biometrics, 32, 619-627.
Winograd, E. (1976). Recognition memory for faces following nine different judgements. Bulletin of the Psychonomic Society, 8, 419-421.