A Linguistically Motivated Model for Speed and Pausing in Animations of American Sign Language

MATT HUENERFAUTH
The City University of New York, Queens College

Many deaf adults in the United States have difficulty reading written English text; computer animations of American Sign Language (ASL) can improve these individuals' access to information, communication, and services. Planning and scripting the movements of a virtual character's arms and body to perform a grammatically correct and understandable ASL sentence is a difficult task, and the timing subtleties of the animation can be particularly challenging. After examining the psycholinguistics literature on the speed and timing of ASL, we have designed software to calculate realistic timing of the movements in ASL animations. We have built algorithms to calculate the time-duration of signs and the location/length of pauses during an ASL animation. To determine whether our software can improve the quality of ASL animations, we conducted a study in which native ASL signers evaluated the ASL animations processed by our algorithms. We have found that: (1) adding linguistically motivated pauses and variations in sign-durations improved signers' performance on a comprehension task and (2) these animations were rated as more understandable by ASL signers.

Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing—Language generation, Machine translation; K.4.2 [Computers and Society]: Social Issues—Assistive technologies for persons with disabilities

This research was supported by grants from The City University of New York PSC-CUNY Research Award Program ("Evaluating Parameters for American Sign Language Animations", 2007), from the National Science Foundation ("CAREER: Learning to Generate American Sign Language Animation through Motion-Capture and Participation of Native ASL Signers", Award #0746556, 2008), and from Siemens A&D UGS PLM Software ("Generating Animations of American Sign Language", Go PLM Grant Program, 2007). This article is an extended version of a paper presented at the ACM SIGACCESS conference on Computers and Accessibility [Huenerfauth 2008b]. This article includes additional detail about experimental design, new examples of the ASL passages included in the study, results from an additional round of evaluation with additional subjects, and a discussion of practical issues for researchers making use of the results of this study.

Author's address: M. Huenerfauth, Computer Science Department, CUNY Queens College, City University of New York, 65-30 Kissena Blvd, Flushing, NY 11375 USA; email: [email protected].
General Terms: Design, Experimentation

Additional Key Words and Phrases: American Sign Language, animation, natural language generation, evaluation, accessibility technology for the deaf

ACM Reference Format: Huenerfauth, M. 2009. A linguistically motivated model for speed and pausing in animations of American Sign Language. ACM Trans. Access. Comput. 2, 2, Article 9 (June 2009), 31 pages. DOI = 10.1145/1530064.1530067.

1. MOTIVATIONS AND BACKGROUND

American Sign Language (ASL) is a natural language, which is used as a primary means of communication for about one half million people in the United States [Mitchell et al. 2006]. During an ASL sentence, signers use their hands, facial expression, eye gaze, head tilt, and body tilt to convey linguistic meaning [Liddell 2003; Neidle et al. 2000; Sandler and Lillo-Martin 2006]. ASL is not just a manual presentation of an English sentence; it has its own word order, syntactic constructions, and vocabulary (which may not have one-to-one equivalence with English words). Because of the differences between English and ASL, it is possible to be fluent in ASL yet have difficulty reading English text. In fact, a majority of deaf 18-year-olds in the United States have an English reading level below that of an average 10-year-old hearing student [Holt 1993]. Unfortunately, Web sites and other written-English information sources can also pose a challenge for deaf adults with low literacy skills. One way of combating this accessibility challenge is to use software that displays computer-generated animations of ASL to make more information, communication, and services accessible to these users. These ASL animations may be scripted by a content developer [Kennaway et al. 2007] or generated by English-to-ASL automatic machine translation software [Chiu et al. 2007; Huenerfauth 2006; Marshall and Sáfár 2005; Stein et al. 2006].

We have conducted research on generation of computer animations of ASL [Huenerfauth 2008a]. During an earlier study in which human signers evaluated ASL animations, several participants wrote feedback comments requesting changes to the animation's speed [Huenerfauth et al. 2008]. The animation speed in that study had been chosen based on the typical signs-per-minute of ASL reported in the linguistics literature. Because of these comments, we decided to conduct the present study to investigate how to set time-durations of signs and placement of pauses in ASL animations.

1.1 Speed of ASL Signing

For users who are fluent in ASL but may have low literacy in English, information is most accessible if presented in ASL. Linguistic researchers have established that ASL signing conveys information at the same rate as spoken English [Bellugi and Fischer 1972], but the average speed at which most literate adults can read English text is much faster than the speed of spoken English audio. So, a deaf user who is relying on an ASL animation to receive information would receive that information at a lower sentence-per-minute rate than adults who are reading English text.
This creates a disparity in the speed of access to information between users of ASL animations and users who are reading written English text. For deaf adults with low literacy to access information at a speed comparable to English-reading adults, ASL animations must be displayed quickly (while maintaining their understandability). The linguistics literature contains a range of values for "normal" signing speed: from 1.5 to 2.37 signs per second [Bellugi and Fischer 1972; Grosjean 1979]. It is known that when ASL videos are played faster than 2.5 times normal speed, viewers' comprehension of the video drops significantly [Fischer et al. 1999].

To find ways to present ASL computer animations both quickly and understandably, we have sought inspiration from research on English speech audio. Studies of English speech have shown that inserting pauses at linguistically determined locations in high-speed audio allowed it to be more easily understood. One study increased the speed of an English speech recording, and researchers later inserted pauses at linguistically appropriate locations (between sentences, clauses, phrases) [Wingfield et al. 1999]. The pauses improved listeners' comprehension. This benefit arose only if pauses were at linguistically appropriate locations (not at random or uniform locations), and the benefit leveled off once the pauses had increased the duration of the recording back to its original length before the artificial speeding [Wingfield et al. 1999]. Two explanations for this link between comprehension and linguistically placed pauses were proposed: (1) pauses may help listeners more easily determine sentence/clause boundaries in the performance or (2) pauses at appropriate phrase boundaries give the listener some additional time to mentally "chunk" units of information and process/remember them more effectively.

In one study on ASL, black-screen segments of video were inserted into a double-time video recording of a human performing ASL. The black-screen segments were added between "semantically unitary statements" [Heiman and Tweney 1981]. (We understand this to mean that black frames were added between sentences/clauses.) Blank segments were of uniform duration, and enough were added to return the ASL video to its original time. No significant improvement in viewers' comprehension resulted. It has not been examined whether inserting pauses (in which the signer remains on the screen but the hands stop moving) at linguistically appropriate locations would impact viewers' comprehension of videos of human signers. Our work examines whether inserting pauses into computer-generated animations of ASL improves viewers' comprehension of the animation or makes the resulting animations appear more natural-looking to viewers.

1.2 Related Work on ASL Animation

Several research projects have investigated the synthesis of computer animations of virtual humans performing sign language [Elliot et al. 2008; Fotinea et al. 2008; Huenerfauth 2006; Sheard et al. 2004] (and surveys in Huenerfauth [2006] and Kennaway et al. [2007]). For example, several years of European research projects have contributed to the eSIGN project, which creates technologies for content developers to build sign databases in a symbolic notation, assemble scripts of signing performance for use on accessible web pages, and allow viewers to see animations on their Web browser [Kennaway et al. 2007].
SignSmith Studio (http://www.vcom3d.com), discussed in Section 3.1, is a commercial tool for scripting ASL animations. Other computer science research has examined automatic generation or machine translation (MT) of sign language [Chiu et al. 2007; Huenerfauth 2006; Karpouzis et al. 2007; Marshall and Sáfár 2005; Morrissey and Way 2005; Shionome et al. 2005; Stein et al. 2006; van Zijl and Barker 2003].

The computer science literature on sign language animations has not focused on the timing of these animations. Content developers may script individual signs while observing the timing of human signers in video [Kennaway et al. 2007] or use motion-capture technology to record individual signs from humans directly [Elliot et al. 2008], but this addresses the timing of isolated signs (not sentences). Also, many sign language animation systems give the viewer the ability to adjust a dial that modifies the speed of the performance [Elliot et al. 2008; Kennaway et al. 2007]; however, Section 2 will discuss how the speed of an ASL performance is more complex than a single speed value. Previous sign language animation research has not examined how the time-duration of a sign is affected by its surrounding linguistic context (what other signs occur in a sentence or in a performance) nor how pauses should be placed in an animation to mimic how human signers tend to pause at natural linguistic boundaries. SignSmith allows the content developer to manually specify pauses to occur, and content-scripting tools from the eSIGN project give similar control over speed, timing, and pauses [Kennaway et al. 2007]. Animations from generation or MT projects do tend to include pauses between sentences [Elliot et al. 2008; Fotinea et al. 2008; Huenerfauth 2006], but a principled linguistic way to select where to insert pauses into a sign language animation has not been described previously in the literature.

2. LINGUISTIC BASIS FOR THE DESIGN

The timing of an ASL performance is actually more complex than a single "speed" variable (representing the number of signs per second). In fact, many parameters are needed to specify the speed of an ASL performance: the speed of fingerspelling relative to the speed of other signs, the time spent in transition movements between signs, the time spent pausing during signing, etc. Further, ASL signs are not all performed in the same amount of time: each sign has a standard time duration at which it is performed. (For example, the ASL sign "SEA" involves both hands making a wavelike motion in space, but the sign "HATE" involves the middle finger of both hands being flicked out from behind the thumbs. Generally, because of the complexity of the movement path of the signer's hands, the sign SEA requires more time to perform than the sign HATE.) Thus, the final timing of ASL is a complex interaction between several speed parameters and the lexical durations of the specific signs performed.

Although several psycholinguistic experiments have examined human ASL speed and pausing [Bellugi and Fischer 1972; Fischer et al. 1999], the most detailed analysis was conducted by Grosjean and Lane [Grosjean 1979; Grosjean and Lane 1979; Grosjean et al. 1981] in the late 1970s.
During several years of research, Grosjean and Lane studied the interaction of three component variables of signing rate: (1) the articulation rate at which the hands move through space, (2) the number and location of pauses in a sentence, and (3) the length of each pause. They defined a "pause" as a period of time between two signs when the hands are not moving—there is not a pause between every pair of signs [Grosjean et al. 1981]. In their view, signs consist of an "in-transition time" to get the hands in the proper position, the main movement of the sign, and an optional "hold" at the end of a sign where a pause could occur [Grosjean 1979].

Grosjean and Lane observed that ASL signers in recorded videotapes perform signs before sentence boundaries more slowly (12% longer) than their normal duration [Grosjean 1979]. Also, they observed that when a sign occurs more than once during a performance, the durations of the later occurrences differ from the typical duration for that sign. If later occurrences of the sign appear in a syntactic position where the sign has appeared before (e.g., as the subject or as the direct object of the sentence), then the later occurrences are 12% shorter. If the later occurrence of a sign appears in a new syntactic position where it had not previously appeared, then the later occurrence of the sign is 12% longer [Grosjean 1979]. For example, if a sign appears early in a performance in the subject position of a sentence, and the same sign is used as a direct object in a later sentence in the performance, then the second occurrence is longer.

3. TWO ALGORITHMS FOR ASL SPEED AND TIMING

While there has been psycholinguistic research on ASL pauses and sign durations, it has not been previously applied to ASL animations. Based on these ASL linguistics studies, we have built algorithms for calculating the duration of signs and the location/length of pauses during an ASL animation. Our two algorithms thus attempt to set timing values for an ASL animation so that it mimics the behavior of human signers. Our goal is to improve the understandability and perceived naturalness of ASL animations.

3.1 Using SignSmith Studio for Prototyping

To build a prototype of our algorithms and to evaluate whether they produce ASL animations that signers find more understandable and natural, we had to create animations for signers to examine. We had to select a method for producing computer animation of ASL. In earlier work, we had built a system for generating animations of a character performing ASL sentences (containing constructions called "classifier predicates," which describe 3D spatial concepts) [Huenerfauth et al. 2008]. Our generator was designed with a primary focus on these constructions, and it had a small ad hoc vocabulary of signs in its repertoire, which had been expanded as needed to construct full sentences for past evaluation studies. To conduct tests of algorithms that operate on sign-duration and pause-insertion on a wide variety of ASL sentences (not just classifier predicates), we needed an animation platform with a larger vocabulary of signs. We have decided to use SignSmith Studio, a commercial system from VCom3D.
This product allows users to script an ASL performance (using a dictionary of signs, a fingerspelling generator, limited eye gaze and head tilt, limited shoulder tilt, a list of around 50 facial expressions, optional speech audio and mouth movements, and other ASL features). The user is presented with an interface that looks like a set of parallel tracks (like a musical score), and the user arranges signs and facial expressions on these parallel timelines. When a "play" button is pressed, an animation is generated in which a virtual human character (there are several to choose from) performs the script.

The advantage of using a commercial system that we did not develop is that the movements and speed/durations of the signs in its dictionary were developed external to our research project. The signs were not built solely for this study, and so each sign's duration in the dictionary was set independently of our experiments. Another advantage of SignSmith is that its representation of the timing of signs is compatible with that of our generation software [Huenerfauth 2006]; so, progress made running experiments on their animations can translate into later improvements for our ASL generation system under development. Specifically, SignSmith stores three timing values for each sign: (1) transition time during which the hands get to the starting position, (2) main movement of each sign, and (3) hold time at the end of the sign when the hands remain motionless. In SignSmith, the user can manually override the default values for these three sub-times for each sign in the animation. Signs have a basic "duration" value for their main movement, and this is multiplied by a "multiplier" factor (that the user may optionally specify) to vary the time of the sign's main movement in the resulting animation.

To produce ASL animations, users of SignSmith are expected to be knowledgeable of the language; however, even fluent signers may not have intuitions about how speed and pauses should be numerically specified to create a natural result. The documentation for SignSmith mentions that users may want to insert longer transition times between two signs at a sentence boundary. (So this appears to be the recommended approach for users to manually add pauses between sentences.) For this project, we created several multisentence ASL performances using SignSmith—leaving the default timing values for each sign in the script (more details in Section 4.2). The script for these animations was used as input to the sign-duration and pause-insertion algorithms discussed in the following.
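To make this timing representation concrete, the sketch below models the per-sign timing values described above as a small Python record. The class and field names are our own illustration (used in the algorithm sketches that follow), not SignSmith's actual XML script format.

```python
from dataclasses import dataclass

@dataclass
class SignTiming:
    """One sign in an animation script, with the three timing values above."""
    gloss: str               # e.g., "SEA" or "HATE"
    transition: float        # seconds moving the hands to the sign's starting position
    duration: float          # base duration of the sign's main movement (seconds)
    multiplier: float = 1.0  # optional factor that scales the main movement time
    hold: float = 0.0        # motionless time at the end of the sign (a pause, if nonzero)

    def total_time(self) -> float:
        """Total time this sign occupies in the resulting animation."""
        return self.transition + self.duration * self.multiplier + self.hold
```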
3.2 Sign-Duration Algorithm

We have implemented an algorithm for calculating linguistically motivated durations for the signs in an ASL animation—based on the standard duration for each sign and its surrounding linguistic context. The input to our algorithm is an initial script of the ASL performance from SignSmith (an XML file representing the sequence of signs with their standard time durations) and some linguistic information: (1) part-of-speech for each sign (with nouns subcategorized according to their syntactic role as described in the following) and (2) sentence/clause boundaries identified. The output of the algorithm is a script in which the durations of some signs are modified. Our two-phase algorithm builds upon the linguistic research in Section 2.

In phase 1, signs occurring more than once in the script are detected. Only content signs like nouns, verbs, adjectives, and adverbs are modified in this sign-duration algorithm—function signs like prepositions, pronouns, or other grammatical markers are not. If the repeated sign is a verb, adjective, or adverb, then later occurrences of the sign are shortened by 12% in duration. If the repeated sign is a noun, then changes to later occurrences depend on the syntactic role of each occurrence (it may lengthen or shorten by 12%). Nouns are categorized as being in: topic position, when/conditional-clause position, subject position, direct/indirect object position, object-of-preposition, etc. (Topic and when/conditional clauses occur at the beginning of ASL sentences and are accompanied by special facial expressions.) In phase 2 of the algorithm, signs that appear just before sentence or clause boundaries are lengthened (by 12% or 8%, respectively). Section 7.2 will discuss how future work may examine alternative implementations of this sign-duration algorithm in which some of these numerical parameters (e.g., the 8% or 12% modifications in sign durations) are set differently.
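A minimal sketch of the two phases just described appears below. It assumes each sign record carries the linguistic annotations named above (a part of speech, a syntactic role, and flags marking clause-final and sentence-final position); these field names extend the hypothetical SignTiming record from Section 3.1, and the 8%/12% factors are the values cited from the linguistics literature.

```python
CONTENT_SIGNS = {"noun", "verb", "adjective", "adverb"}

def adjust_sign_durations(script):
    """Phase 1: adjust repeated content signs; Phase 2: lengthen signs before
    clause/sentence boundaries. `script` is a list of sign records with fields
    gloss, multiplier, pos, role, clause_final, and sentence_final."""
    seen_roles = {}  # gloss -> set of syntactic roles in which the sign has appeared
    for sign in script:
        # Phase 1: later occurrences of repeated content signs.
        if sign.pos in CONTENT_SIGNS:
            roles = seen_roles.setdefault(sign.gloss, set())
            if roles:  # the sign occurred earlier in this performance
                if sign.pos != "noun" or sign.role in roles:
                    sign.multiplier *= 0.88  # 12% shorter: repeated verb/adj/adv, or noun in a familiar role
                else:
                    sign.multiplier *= 1.12  # 12% longer: noun in a new syntactic role
            roles.add(sign.role)
        # Phase 2: lengthening before linguistic boundaries.
        if sign.sentence_final:
            sign.multiplier *= 1.12          # 12% longer before a sentence boundary
        elif sign.clause_final:
            sign.multiplier *= 1.08          # 8% longer before a clause boundary
    return script
```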
3.3 Pause-Insertion Algorithm

We have also implemented an algorithm for determining where to insert pauses (and how long they should be) during a multisentence ASL animation. The input to this algorithm is a script of an ASL performance that includes the sequence of signs with the time duration of each (the XML output of the previous algorithm can serve as the input to this algorithm). Our pause-insertion algorithm also requires some linguistic data about the ASL sentences: (1) location of sentence boundaries and (2) a syntactic parse tree for each sentence. (This data is again supplied manually to the algorithm for this study; tools for automatically parsing ASL sentences are a focus of future work, as discussed in Section 7.2.) The algorithm's output is a script of the ASL performance in which the "hold" times at the end of signs have been modified to produce pauses during the performance at linguistically appropriate locations.

3.3.1 The Original Grosjean and Lane Model. Our algorithm builds upon ASL psycholinguistic research on speed and timing. When analyzing video recordings of human ASL signers, Grosjean and Lane proposed a model to account for where in a performance signers would insert pauses (and how long pauses would be) [Grosjean and Lane 1979; Grosjean et al. 1981]. Their model assumed that a syntactic parse tree for the sentence was known, and it predicted the percentage share of total pause time that should be inserted between adjacent pairs of signs. Their model only accounted for pauses in a single sentence—not for a multisentential passage. We have used their model as a basis for our multisentential ASL pause-prediction algorithm.

The key idea of Grosjean and Lane's model was that two factors determined how much pause time occurs between two signs: (1) the syntactic significance of that boundary and (2) whether the boundary is near the midpoint between the nearest pauses. Thus, boundaries between sentences, clauses, and major phrases in a sentence were preferred locations for pauses, but this preference was balanced with a "bisection tendency" [Grosjean and Lane 1979] to pause near the middle of long constituents.

Grosjean and Lane [1979] describe an iterative method for calculating the percentage share of total pause time that should be inserted between each pair of adjacent signs in a performance. They first assign a syntactic "complexity index" value (CI) to each sign boundary in the ASL sentence; this value is calculated using the syntactic parse tree for the sentence. In a parse tree, there exists some lowest node that spans the boundary between each pair of adjacent signs; the syntactic importance of that node determines the CI value of that boundary. Specifically, the total number of descendant nodes below that node in the parse tree is the CI value for the boundary. Thus, the boundary between two clauses would have a larger CI value than the boundary between two signs inside a noun phrase.

Grosjean and Lane's method iterates until all boundaries between signs have been assigned a percentage share of pause time. One pause is added during each iteration. An iteration of the algorithm begins by selecting the longest span of signs not yet broken by a pause (based on number of signs, not on sign durations). For each boundary between adjacent signs within that span, the relative proximity (RP) of the boundary to the midpoint of the span is calculated (RP = 100% at the midpoint, RP = 0% at the ends of the span, etc.). For each boundary inside the span, the CI value is multiplied by the current RP value. The boundary with the greatest (CI*RP) product inside the span is chosen as a location for a pause; the percentage share of pause time assigned to that boundary is calculated based on the product. The algorithm then iterates (selecting the longest remaining unbroken span of signs in the whole performance and then calculating fresh RP values within that span) [Grosjean and Lane 1979].

3.3.2 Our Pause Insertion Algorithm. Our algorithm implements and extends the Grosjean and Lane method in several ways. First, it takes into account the results of our sign-duration algorithm. When calculating the RP values, the Grosjean and Lane model operates on signs as if they were all of uniform unit duration. Our algorithm uses the actual timing values (in-transition time + sign-duration) of the signs in a span to calculate the RP values. Second, we had to extend the Grosjean and Lane model to account for multisentential performances. Syntactic parse trees span single sentences, so the CI for a boundary between sentences is undefined in the Grosjean and Lane model. Our algorithm sets the CI value of the sentence boundary between any two sentences (S1 and S2) as equal to max(18, length(S1)+length(S2)-2), where max is the maximum function and length is the number of signs in each sentence. The logic behind this approach is that if the two sentences had been joined by a conjunction, then the root of the parse tree that joins them would dominate all of the nodes in the tree for S1 and all of the nodes in the tree for S2. Assuming binary-branching syntactic parse trees, the number of internal nodes in a tree for a sentence S would be length(S)-1. To ensure that sentence boundaries adjacent to short sentences still receive sufficient syntactic heft, our algorithm will assign a CI value of at least 18 to any sentence boundary.
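The sketch below is our illustrative Python version of this iterative computation, assuming a CI value has already been computed for every boundary (from the parse trees within sentences, and from the max(18, ...) rule at sentence boundaries); it is a reconstruction from the description above rather than the published pseudocode, and the exact bookkeeping of the implementation may differ.

```python
def sentence_boundary_ci(len_s1, len_s2):
    """CI for the boundary between adjacent sentences S1 and S2 (our extension):
    max(18, length(S1) + length(S2) - 2)."""
    return max(18, len_s1 + len_s2 - 2)

def assign_pause_shares(ci, durations):
    """Assign a pause-share weight (the CI * RP product) to selected boundaries.
    ci[i] is the complexity index of the boundary after sign i; durations[i] is
    sign i's timing value (in-transition time + sign duration), since our
    extension computes RP from actual timing rather than uniform sign units."""
    n = len(durations)
    shares = {}
    spans = [(0, n - 1)]  # inclusive spans of signs not yet broken by a pause
    while spans:
        # Select the longest remaining unbroken span (by total sign time).
        spans.sort(key=lambda s: sum(durations[s[0]:s[1] + 1]))
        lo, hi = spans.pop()
        if lo == hi:
            continue  # a single sign contains no internal boundary
        total = sum(durations[lo:hi + 1])
        best, best_score, elapsed = None, -1.0, 0.0
        for i in range(lo, hi):  # boundary i lies after sign i
            elapsed += durations[i]
            rp = 1.0 - abs(elapsed - total / 2.0) / (total / 2.0)  # 1 at midpoint, ~0 at ends
            if ci[i] * rp > best_score:
                best, best_score = i, ci[i] * rp
        shares[best] = best_score  # the pause share is proportional to the CI * RP product
        spans += [(lo, best), (best + 1, hi)]  # continue on the two resulting sub-spans
    return shares
```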
Our algorithm uses other results from the psycholinguistic literature. ASL signers insert a pause at 25% of the boundaries between signs [Grosjean 1979], so our algorithm adds pauses at boundaries that have been assigned the top 25% of pause-percentage weight. ASL signers spend about 12% of their time pausing during rehearsed sentences [Grosjean 1979] and 10%–35% during spontaneous signing [Bellugi and Fischer 1972]. So, we insert pause-time into the animation such that 17% of the final animation time is spent in pauses (a middle ground between the published percentages). Pause time is added to the "hold" time of the sign before the boundary. Figure 1 shows pseudocode for our pause-insertion algorithm. Boundaries with the top 25% Pause-Share (PS) values in Figure 1 receive a share of the pause time proportional to their PS value.

Fig. 1. Pseudocode for pause-insertion algorithm.
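As a companion to Figure 1, the following sketch shows how the pause-share weights might be converted into actual hold times using the figures quoted above; details such as exactly how the top 25% of boundaries are counted are our assumptions.

```python
def insert_pauses(script, shares, pause_fraction=0.17):
    """Turn pause-share weights into hold times. `script` is a list of sign
    records (see the SignTiming sketch); `shares` maps boundary index -> weight,
    as returned by assign_pause_shares."""
    # Keep the top 25% of boundaries, ranked by pause-share weight.
    n_boundaries = len(script) - 1
    n_pauses = max(1, n_boundaries // 4)
    top = dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:n_pauses])

    # Spend 17% of the *final* animation time pausing: if T is the signing time
    # and P the total pause time, then P = 0.17 * (T + P), so P = 0.17 * T / 0.83.
    signing_time = sum(s.total_time() for s in script)
    total_pause = pause_fraction * signing_time / (1.0 - pause_fraction)

    weight_sum = sum(top.values())
    for i, weight in top.items():
        # Each chosen boundary gets a share proportional to its weight, added
        # as extra "hold" time on the sign just before the boundary.
        script[i].hold += total_pause * (weight / weight_sum)
    return script
```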
4. DESIGN OF THE FIRST EVALUATION STUDY

While linguistic results for ASL timing have guided our algorithm design, we had to perform our own evaluation studies of ASL animations for the algorithms we implemented. Limitations of the animated character's appearance, expressiveness, and movement subtleties, combined with the expectations human viewers have of computer animations, could lead to different results when evaluating the speed and timing of a computer-generated ASL animation. We conducted an evaluation study in which native ASL signers evaluated the animations processed by our timing algorithms to test two hypotheses: (H1) adding linguistically motivated pauses and variations in sign durations will help ASL signers understand and remember information from these animations and (H2) these new ASL animations will appear more natural-looking to signers.

4.1 Participant Recruitment and Interaction

In an earlier evaluation study [Huenerfauth et al. 2008], we outlined a set of best practices for the conduct of studies involving ASL signers that are designed to evaluate ASL animations. We described how it was important that the signers recruited for the study be native signers, how to best ask questions to screen for such signers, and how the experimental environment around signers should be ASL-focused (with as little English or English-like signing as possible). All of these factors help to ensure that the responses given by participants about the correctness of the animations are as ASL-accurate as possible. Nonnative signers who learned ASL later in life may be more lenient when judging ASL animations, and signers subjected to an English environment may switch their own signing to a more English-like form. This can also result in their being overly tolerant of animations that are too English-like [Huenerfauth et al. 2008].

For the current study, all instructions and interactions were conducted in ASL, and 8 of the 12 participants arrived accompanied by another ASL signer (thereby producing an ASL conversational environment immediately prior to the study). Advertisements posted on Deaf community websites in New York City asked whether potential participants had grown up using ASL at home or attended an ASL-based school as a young child. Of the 14 people who came to the laboratory, 2 answered prescreening questions in such a way that they did not meet the screening criteria. Their data was excluded from the study. Of the 12 participants whose data was included, nine grew up with parents who used ASL at home. Of the remaining three, two began attending a school using primarily ASL before the age of 7, and the final participant began using ASL before the age of 7 through another circumstance. Of our 12 participants, 5 had a significant other who is deaf/Deaf, 9 used ASL as the primary language in their home, 11 used ASL at work, and 11 had attended a college where instruction was primarily in ASL. There were 7 men and 5 women of ages 25–58 (median age 37).

4.2 Animations Shown in the Study

Twelve ASL passages of length 48–80 signs (median 69 signs) were created in SignSmith on a variety of topics: four short news stories, two adaptations of encyclopedia articles, four fictional narratives, and two personal introductions. Passages contained sentences of a variety of lengths and complexity; some included topicalized noun phrases, condition/when clauses before a main clause, rhetorical questions, contrastive role-shift (signers may tilt their shoulders to the left or right as they contrast two concepts), or association of entities with locations in space (for use by later pronouns during which the signer points to those locations). Table I contains a listing of all 12 passages used in the study with a brief summary of each. Figure 2 contains a transcript (in the form of ASL glosses and English translation) for one of the passages used in the study. See Figures 3 and 4 for a screenshot and timeline of animations from our study. Sample videos from the study are available on the website of the Linguistic and Assistive Technologies Laboratory at Queens College of the City University of New York [Huenerfauth 2008c].

Table I. List of the Twelve ASL Passages Used in the Study

Passage Name       Summary
Bear Spray         News story: scientists make a bear repellent spray.
Big Interview      Narrative: Martin's daughter has a big job interview.
Chess Program      Encyclopedia entry: computer software that plays chess.
Cost of Rice       News story: the price of rice rises internationally.
Garage Sale        Narrative: Sally's aunt holds a garage sale.
Jose & Family      Personal introduction: Jose's family history and heritage.
President Eugene   Narrative: seven-year-old child wants to be president.
Student Protest    Narrative: students protest animal use in laboratories.
Reporter Albert    Personal introduction: college student studying journalism.
Obituary           News story: famous classics professor dies.
Sports Injury      News story: student athlete is injured and cannot play.
Hurricane Damage   Encyclopedia entry: storms causing expensive damage.

Fig. 2. Transcript of sample passage from the study (first in the form of ASL glosses and then in the form of an English translation). The passages were displayed in the form of computer animations of ASL during the experimental study—as shown in Figure 3.

An ASL interpreter verified the accuracy of the twelve animations—to a degree. While SignSmith gives the user many options in controlling the animated character, there are many phenomena in fluent ASL signing that are beyond the capabilities of the system: inflection of ASL verbs for locations in 3D space, separate use of head tilt and eye gaze during verb signs to indicate subject/object agreement, association of entities under discussion with more than two locations around the signer, etc. SignSmith's dictionary also does not contain all possible ASL signs.
Therefore, the animations produced for this study are not perfectly fluent ASL. To evaluate our timing algorithms, we feel that "rather fluent" ASL animations are acceptable. State-of-the-art ASL generation technology will not be able to produce "fully fluent" ASL animations for many years in the future. So the degree to which the timing algorithms can improve the understandability and naturalness of semi-fluent ASL animations is still an important research question (and is perhaps a more realistic evaluation of our algorithms).

Fig. 3. Screenshot from an animation.

Fig. 4. Illustration of sequence of signs in an original animation (top row) and after being processed by our timing algorithms (bottom row). Additional space between signs represents pauses added to the animation script by our algorithms.

4.3 Responses and Experimental Design

Participants viewed animations of six types: 2 groups (no-pauses vs. pauses) × 3 speeds (normal, fast, very fast). No-pauses animations were not processed by our timing algorithms; pauses animations had been processed by our sign-duration and pause-insertion algorithms. We also examined how quickly animations can be played while remaining understandable. Normal-speed animations are at a rate of 1.5 signs per second, fast-speed animations are 2.25 signs per second, and very fast animations are 3 signs per second. (Values for average signing speed in the linguistics literature vary from 1.5 to 2.37 signs per second [Bellugi and Fischer 1972; Grosjean 1979].) Another reason to study both pause insertion and speed in one study is that we can determine whether any effect from pause insertion is from (1) pauses being placed at linguistically appropriate locations or (2) simply from the additional time added to the animation.

Each of the 12 ASL passages was generated in all six combinations of group × speed, producing 72 animation files. A fully factorial within-subjects design was used to assign 12 files to each participant in the study such that: (1) no participant saw the same passage twice, (2) the order of presentation was randomized, and (3) each participant saw two animations of each of the six combinations of group × speed.
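For illustration, one way to construct an assignment satisfying these three constraints is sketched below; the rotation scheme is our own example of a counterbalanced design, not necessarily the exact procedure used in the study.

```python
import random

GROUPS = ["no-pauses", "pauses"]
SPEEDS = ["normal", "fast", "very fast"]
CONDITIONS = [(g, s) for g in GROUPS for s in SPEEDS]  # the six group x speed cells

def assign_animations(participant_index, n_passages=12):
    """Assign one condition to each of the 12 passages for one participant:
    each passage is seen once, each condition appears exactly twice (12 / 6),
    and the presentation order is randomized. Rotating by participant index
    counterbalances passage/condition pairings across participants."""
    assignment = [(p, CONDITIONS[(p + participant_index) % len(CONDITIONS)])
                  for p in range(n_passages)]
    random.shuffle(assignment)  # randomized presentation order
    return assignment
```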
Animations were viewed on a 17” LCD screen at a distance of less than one meter. The animations occupied a 10cm × 10cm region of the screen, approximately one-ninth of the screen area. We selected this animation size for the experiment based on the assumption that many future applications that include sign language animations may display the animated character in a window that occupies a fraction of the screen (with the remainder of the screen used for the rest of the application's user interface).

Several formal evaluation studies of ASL animations have been conducted that involve native signers [Huenerfauth et al. 2008; Huenerfauth 2006; Kennaway et al. 2007; Sheard et al. 2004; Vink and Schermer 2005]. In our earlier study [Huenerfauth et al. 2008], participants saw animations and were asked to circle numbers on 10-point Likert scales to indicate how Grammatical, Understandable, and Natural-moving the animations were. Participants in that study appeared comfortable with the instructions and giving their opinions about these aspects of the ASL animations. We have used the same three subjective criteria in this study. For this study, we have added an additional Likert scale to enable participants to indicate whether the ASL animation is too-slow, perfect, too-fast, or somewhere in-between. As in our previous study, instructions were given to participants in ASL to explain the meaning of each of these Likert scales.

Earlier experiments evaluating the understandability of sign language animations have given viewers various tasks to demonstrate their comprehension: decide if two animations say the same thing [Vink and Schermer 2005], match a signing animation to a movie of what was described [Huenerfauth et al. 2008; Huenerfauth 2006], summarize the animation [Sheard et al. 2004], or answer comprehension questions about the animation's content [Sheard et al. 2004]. Studies of ASL videos have also pioneered techniques useful for evaluation of animation [Bellugi and Fischer 1972; Fischer et al. 1999; Grosjean 1979; Grosjean and Lane 1979; Grosjean et al. 1981; Heiman and Tweney 1981; Tartter and Fischer 1983]. Heiman and Tweney [1981] showed signers multisentence ASL videos followed by ASL videos of comprehension questions. Their participants wrote down English answers to the questions. Tartter and Fischer [1983] asked signers to select a cartoon drawing that corresponded to the content of a video of an ASL sentence they had seen.

In our study, after viewing each ASL passage, participants answer the four Likert-scale questions discussed above. Then, participants are shown a set of four comprehension questions about the information in that ASL passage. The same animated signing character used during the passage performs these four questions; after each question, the animated character gives a list of multiple-choice answers. These answers correspond to a list of cartoon clip-art pictures on a paper survey form, and the participant circles the correct answer(s) to each question. We have adapted the design of Heiman and Tweney so that our participants do not need to write any English text during the experiment—they circle a picture as their answer. We had planned to omit labels below the cartoons to avoid English influence in the environment; however, during pilot tests, participants requested we add short English captions (Figure 5).

Fig. 5. Sample set of answer choices for one comprehension task question.

Fig. 6. Transcript of comprehension questions used in the study for the passage shown in Figure 2 (each question is first presented in the form of ASL glosses then as an English translation). Questions were displayed as ASL computer animations during the study.

We included an example story with comprehension questions prior to the data collection process. This allowed participants to become comfortable seeing the animated signing character, and it ensured that the instructions for the study were clearly conveyed. Most questions ask basic who/what/when/where facts from the passage, and about 10% are less direct. For an example of a less direct question, in one passage, a person is said to be vegetarian, and a later question asks what foods this person eats—choices include: hamburgers, hot dogs, salad, etc.
Figure 6 contains a set of questions used in the study; Figure 7 shows the set of answer choices shown to the participant as clip-art images for these questions.

Fig. 7. Clip art answer choices corresponding to the questions in Figure 6.

To minimize the effect of variation in skills across participants, questions focused on shallow comprehension of basic facts in the passages. Also, our "comprehension" questions actually measure a mixture of the participant's recall and comprehension of the passage. Participants were not allowed to watch the passage more than once; so, they could not replay it to look for an answer. (Participants could replay the questions.) Since ASL animations should both be comprehensible and convey information memorably, we decided it was acceptable for our questions to measure both recall and comprehension. In future work, we may study these two phenomena more independently.

Statistical tests to be performed on the data were planned prior to data collection. To look for significant differences between scores for Grammaticality, Understandability, Naturalness, Speed, and Comprehension-task performance, a Kruskal-Wallis test was performed. (Note: capitalized terms—Grammaticality, Understandability, Naturalness, Speed, and Comprehension—refer to response values collected in our experiment; lower-case terms refer to the general meaning of these words. This distinction is particularly important for speed/Speed: the capitalized term refers to the 21-point Likert-scale value collected in the study, but the lower-case term refers to the signs-per-minute speed of the animations.) Nonparametric significance tests were used because the Likert-scale response data was not known to be normally distributed. In addition, tests for correlation were planned between the major dependent variables (G, U, N, S, and C), and between various demographic variables (participant's age, gender, the presentation order in which individual animations were seen by that participant, etc.).

After viewing 12 ASL passages and answering the corresponding Likert-scale and comprehension questions, participants gave open-ended feedback about the animations. They were shown two pauses animations (that had been processed by our two algorithms) while they gave their feedback to give them something specific to comment about if they preferred. Participants were given the option to sign their comments in ASL or to write them down in English themselves.

5. RESULTS OF THE FIRST EVALUATION STUDY

To score the Comprehension task for each passage, we took the number of correctly circled pictures minus 25% of the number of incorrectly circled pictures. This difference was then divided by the highest possible score for that question (to enable comparisons across questions with different numbers of correct answers).
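Written out explicitly, the scoring rule is as follows (the 25% guessing penalty and the normalization are as just described; treating the number of correct pictures as the highest possible score is our reading).

```python
def comprehension_score(n_correct_circled, n_incorrect_circled, max_score):
    """Score one question: correct picks minus a 25% penalty per incorrect
    pick, divided by the highest possible score for that question so that
    questions with different numbers of correct answers are comparable."""
    return (n_correct_circled - 0.25 * n_incorrect_circled) / max_score

# Example: circling 2 of the 3 correct pictures plus 1 incorrect picture on a
# question whose maximum score is 3 yields (2 - 0.25 * 1) / 3, about 0.583.
```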
Figure 8 shows the average Comprehension task score for the no-pauses vs. pauses groups; the white bar represents the average across all responses for that group, and the shaded bars represent the values for each speed subgroup (normal, fast, very fast). Tests for statistical significance were performed between the pauses and no-pauses groups for each speed level and for the combination of responses across all speeds. Statistically significant differences between pairs of compared values are marked with an asterisk.

Fig. 8. Graph of comprehension task scores.

Figures 9 and 10 show average Likert-scale responses for each of the six animation types for all three speeds. The graphs include white bars that indicate the average value across 72 responses, while the shaded bars for each speed level indicate the average of 24 responses for animations of that type (group × speed). The values shown in Figure 9 (G, U, and N) are reported on a 1-to-10 Likert scale. The perceived Speed value (S) shown in Figure 10 was reported on a 21-point scale: from 1 ("too fast") to 10 ("perfect") to 21 ("too slow"). Out of the 144 S scores collected, only six scores were above 10 (range 11–13, median 11.5), all for normal-speed animations (3 for pauses animations, 3 for no-pauses animations). Another important note about Figures 9 and 10 is that significant differences are indicated only between pauses vs. no-pauses pairs of animations.

Fig. 9. Graph of Likert-Scale scores for subjective responses from the first evaluation study.

Fig. 10. Graph of the Likert-Scale score for perceived Speed from the first evaluation study.

This evaluation study was conducted to evaluate two hypotheses: (H1) adding linguistically motivated pauses and variations in sign durations will help ASL signers understand and remember information from these animations and (H2) these new ASL animations will appear more natural-looking to signers. The use of our timing algorithms has led to a significant increase in Comprehension task performance. This was true in the overall case (all speeds combined), and it was also significant when only considering the normal-speed data. We also see significant differences for the scores participants gave to the animations for Understandability. Since the normal-speed animations received perceived Speed scores closest to 10 (perfect), it is important that our Comprehension and Understandability trends hold true in the normal-speed case. Our second hypothesis was not supported by the data; we did not measure a significant difference in the Naturalness scores of animations in the pauses vs. no-pauses groups.

Table II shows the correlations calculated between scores for G, U, N, S, and C. All Pearson's R-values in the table are statistically significant (p<0.05).

Table II. Pearson's R-Values between Compared Values

           Under.   Natural.   Speed   Compre.
Gram.      0.687    0.537      0.506   0.406
Under.              0.673      0.761   0.473
Natural.                       0.473   0.317
Speed                                  0.413

Understandability was the value best correlated with a participant's success on the Comprehension task for an animation, but it was not a strong correlation (R=0.473). In an earlier study with ASL animations [Huenerfauth et al. 2008], we also saw a low R-value between Understandability and Comprehension task success. The low correlation between U and C suggests that asking participants to subjectively rate the understandability of animations is no substitute for directly measuring Comprehension task performance.

Correlations were also examined between the order in which each animation was shown to a participant and the evaluation scores.
There was a slight correlation between presentation order and reported Speed Likert-scale score (R = 0.11) and between presentation order and Comprehension task success (R = 0.15); however, neither was statistically significant. Since we presented the passages in this study in randomized order for each participant, the practice effect would have minimal impact on the results. There were weak but significant negative correlations between a participant's age and their scores for Grammaticality (R = −0.26), Understandability (R = −0.22), and Naturalness (R = −0.29). There was no significant correlation between age and perceived Speed nor between age and Comprehension-task success. So, older participants rated animations more critically, but this did not lead to differences in their ratings of the animations' perceived Speed nor their success at the Comprehension task. No age-related differences were observed in Comprehension scores for pauses vs. no-pauses.

Most feedback comments from participants were on aspects of the animation inherent to SignSmith or specific passages (quality of facial expression, movement of individual signs, geographic or regional specificity of some signs, "stiffness" of the character's torso, or limited association of entities with points in 3D space for pronouns). For example, eight of the twelve participants mentioned that the animated character should have more facial expressions during the signing performance. Three participants felt that the animated character should have more eye gaze movements, while one commented that the eye gaze movements were already quite good. One participant commented that the transition movements between pairs of signs should be smoother looking. Interestingly, none of the participants commented on the presence or absence of pauses. This feature of the animations did not seem to draw their overt attention. Some participants' comments were relevant to the timing. For example, nine participants mentioned that the animations at the normal speed were still too fast, but three felt the speed was OK. Three participants felt that the fingerspelling was relatively too fast compared to other signs. In the normal-speed animations, fingerspelling occurred at a rate of 4.1 letters per second with a 0.8 second hold at the end.

6. SECOND EVALUATION STUDY

We conducted a follow-up evaluation study to address some additional research questions raised by the results above. Specifically, since all three speeds of animations in the first study were judged to be too fast by participants, we wanted to ask participants to evaluate animations at slower speeds—with a goal of identifying an ideal speed at which the animations should be displayed. We not only wanted to find out how participants' subjective Likert-scale ratings of the perceived Speed of the animations would vary, but we also wanted to see what levels of Comprehension scores participants would achieve on these slower animations (since speed of presentation had a significant impact on Comprehension task scores in the first study). In addition to displaying animations at slower speeds, we wanted to address a limitation in the design of our first study.
Specifically, there are two ways to define the speed of an animation with pauses: the signs per minute of the original animation, or the recalculated signs per minute taking into account the additional time from pauses. In the first evaluation study, we used the first definition to label animations as normal, fast, or very fast; so a normal-speed animation without pauses was shorter than the corresponding normal-speed animation with pauses. This means that the pause-insertion process added additional time to an animation. Because we observed a connection between the speed of an animation and Comprehension scores in our first evaluation study, this makes it difficult to determine the true benefit of inserting pauses into animations. It was not clear whether the improvement in Comprehension scores that we observed in the first study was due to the fact that the pauses were in linguistically appropriate locations. Alternatively, the increased Comprehension scores could have merely arisen from the additional time added to the animation through the pause-insertion process.

To better understand this issue, we plotted the Comprehension scores collected during our first evaluation study according to the total number of signs divided by total time (including pauses)—see Figure 11. If the only benefit of adding pauses had been from the change in total animation time alone, then we would expect these two lines to coincide. Because they do not, it appears that placement of pauses at linguistically appropriate locations has a benefit beyond just an overall increase in the animation time.

Fig. 11. Comprehension scores plotted by signs/second.

While Figure 11 suggests that pause-insertion has a benefit beyond merely slowing down the animation, to truly understand whether the location of pauses was significant, we decided to compare the following types of animations in our second evaluation study: (1) animations without pauses and (2) animations with pauses that have later been sped up so that they have the same total time as the original no-pauses animation.

6.1 Differences between First and Second Evaluation Studies

There are three major ways in which our second evaluation study differed from the first. The second study included 18 new participants (as opposed to only 12 participants in the initial study). In the data from our first study, we noticed many trends in the values for Comprehension, Grammaticality, Naturalness, and perceived Speed that did not reach statistical significance. For this reason, we wanted to include a larger number of participants in this second study.

In our first study, we also noticed that participants rated all three speeds of animations as being too fast (on the Likert scale). In the second study, we decided to evaluate animations at slower speeds—with a goal of identifying an optimal speed for displaying animations of ASL. In the first study, we tested animations at speeds of 3.0, 2.25, and 1.5 signs per second (before inserting pauses into the animations). In our second study, we evaluated animations at 1.5, 1.2, and 0.9 signs per second. We will refer to these three speeds as: normal, slow, and very slow. We decided to use the slowest speed from the first study (the normal-speed animations) as the fastest speed in the second study to more easily compare the results of the two studies.
The third difference between our first and second evaluation studies is that after running our sign-duration and pause-insertion algorithms on the animations, we "normalized" the speed of the modified animations. Specifically, we increased the speed of the animations into which pauses had been added so that the animation with pauses had the same total time duration as the animation without pauses. Because we sped up animations after inserting pauses, we expected to see a smaller difference in Comprehension scores between the pauses and no-pauses animations in our second study. (While adding pauses appeared to increase Comprehension in the first study, increasing speed seemed to decrease Comprehension—thus, these two factors may somewhat counteract each other in the second study.) Thus, we anticipated that the design of our second evaluation study would be a more rigorous test of the benefit of our pause-insertion and sign-duration algorithms.
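A sketch of this normalization step follows, reusing the hypothetical SignTiming record from Section 3.1; the uniform scaling of all three timing components is our assumption about how the speed-up could be applied.

```python
def normalize_total_time(script, target_time):
    """Uniformly speed up an animation (after pause insertion) so that its
    total duration equals `target_time`, the duration of the corresponding
    no-pauses animation. `script` is a list of SignTiming-style records."""
    current_time = sum(s.total_time() for s in script)
    factor = target_time / current_time  # < 1 when pause insertion lengthened the animation
    for s in script:
        s.transition *= factor
        s.multiplier *= factor  # scales the sign's main movement time
        s.hold *= factor        # inserted pauses are compressed by the same factor
    return script
```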
6.2 Design of the Second Evaluation Study

Aside from the differences identified above, the design of our second evaluation study was identical to the first: the same set of ASL passages and comprehension questions was used, the same pause-insertion and sign-duration algorithms were applied to the animations, and a similar recruitment procedure was followed. There were 18 participants in the study (12 men and 6 women) of ages 21 to 39 (median age 29.5). Of the 18 participants, 8 grew up with parents who used ASL at home, 9 began using ASL in a school setting before the age of 8, and 1 began using ASL through another circumstance. Of our 18 participants in the second study, 8 had a significant other who is deaf/Deaf, 15 used ASL as the primary language in their home, 17 used ASL at work, and all 18 had attended a college where instruction was primarily in ASL.

We formulated two hypotheses prior to this second evaluation study:

(H3) Despite speeding up the animations processed by our sign-duration and pause-insertion algorithms, we will still see a significant improvement in the Comprehension task scores—as compared to the Comprehension task scores for the animations that had not been processed by our algorithms.

(H4) Based on the trend in the perceived Speed Likert-scale scores from the first study, we would expect that participants will prefer animations that are displayed slower than normal speed—perhaps approximately 1.2 signs per second.

6.3 Results of the Second Evaluation Study

Fig. 12. Percentage of correct responses to the comprehension questions about the animations in the second evaluation study.

Figure 12 shows the Comprehension task scores for the animations in the second evaluation study. Considering the combined data from all three speeds, we see a statistically significant difference between the Comprehension scores on the pauses animations and the no-pauses animations (p < 0.05). This result is exciting because it demonstrates the usefulness of the sign-duration and pause-insertion algorithms in a more rigorous test than the first evaluation study. Thus, hypothesis H3 was verified. In this case, we increased the speed of animations after running our sign-duration and pause-insertion algorithms on them; so, we expected the Comprehension scores on the unprocessed and processed animations to be more similar than in the first study.

The results of this second study indicate that there is a benefit to processing ASL animations with our pause-insertion and sign-duration algorithms—a benefit that goes beyond merely inserting additional time into the animations. Thus, given a fixed amount of time to display an ASL animation, it is actually beneficial to increase the overall movement speed of the virtual character so that time can be allocated for pauses during the performance. The positive effect of pause-insertion outweighs the negative effect of speeding up the animation to normalize the total time duration. This result also suggests that our sign-duration and pause-insertion algorithms may allow us to display ASL animations at a faster overall speed (while still preserving their understandability)—thereby addressing the speed bottleneck to information access discussed in Section 1.1.

Comparing the Comprehension task scores for the animations at each speed level individually (normal, slow, very slow), we did not observe statistically significant differences in Comprehension scores. While not significant, the Comprehension task scores for the pauses animations were higher than those for the no-pauses animations. We also observed a trend in the Comprehension task results that was present in the first evaluation study: animations displayed at slower speeds had higher Comprehension task scores.

Fig. 13. Graph of Likert-scale scores for grammaticality, understandability, and naturalness-of-movement from the second study.

Figure 13 displays the Grammaticality, Understandability, and Naturalness values for the animations in our second study. The results for Understandability show a somewhat similar trend to those in the first study (Figure 9): the Understandability scores for the pauses animations appeared higher than those for the no-pauses animations, and the slower animations received higher Understandability scores. However, the differences in values between groups are very small. In fact, the differences between the values for the pauses and no-pauses animations in Figure 13 were not statistically significant at the p < 0.05 level. The results for Grammaticality and Naturalness in this second evaluation study were too similar for us to identify trends. It is possible that the trends identified in the Grammaticality and Naturalness scores in the first evaluation study arose largely from the differences in the signs-per-minute of the animations between groups. In the second study, in which we normalized the time duration of the animations after running our sign-duration and pause-insertion algorithms, the Grammaticality and Naturalness effects seen in the first study may therefore have disappeared.

Fig. 14. Graph of the Likert-scale score for perceived speed from the second evaluation study.

Figure 14 displays the perceived Speed values for the animations in the second study. The trend is similar to that in the results of the first evaluation study (see Figure 10). However, the differences between the values for the pauses and no-pauses animations in Figure 14 are not statistically significant at the p < 0.05 level.

Table III. Results for the Normal-Speed Animations with "No-Pauses" (i.e., not Processed by Our Two Algorithms) from the First and Second Studies.
Evaluation Criterion                     1st Study (Normal, No-Pauses)   2nd Study (Normal, No-Pauses)
Comprehension Task Score                 0.20                            0.34
Grammaticality Likert-Scale Score        6.85                            7.00
Understandability Likert-Scale Score     5.81                            5.78
Naturalness Likert-Scale Score           5.31                            4.81
Perceived Speed Likert-Scale Score       7.44                            7.17

To understand how well the results of our first and second studies can be compared, we can examine the results for the normal-speed animations without pauses. (These animations were the only ones shown in both the first and the second studies.) If we see similar results for Comprehension, Grammaticality, Understandability, Naturalness, and perceived Speed, then we can conclude that the comprehension skill and subjective judgments of participants in the two studies were similar. Table III compares the scores for the normal-speed no-pauses animations in the first and second studies. While the Comprehension score in the second study is somewhat higher, none of the differences in the table are statistically significant at the p < 0.05 level (Mann-Whitney U test).

6.4 Determining Optimal Speed for ASL Animations

An additional motivation for conducting this second evaluation study was to help identify an optimal speed for displaying animations of American Sign Language. In our first study, participants reported that all three speeds of animations (very fast, fast, and normal) were too fast. Participants' responses on the 21-point Likert scale for perceived Speed are our primary source of data about their opinion of the speed of the animation displayed; however, we may also want to consider how their Comprehension task scores vary according to the speed of the animation. Thus, even if participants report that they prefer a certain speed of animation, it may be important to consider how well they seem to understand animations at that speed.

Table IV. Perceived Speed Likert-Scale Scores and Comprehension Task Scores for Animations from Both Evaluation Studies, Sorted According to Signs-per-Second.

Type of Animation: Speed, Pauses/No-Pauses (Study)   Avg. Signs-per-Second   Perceived Speed (10 = perfect)   Comprehension Task Score
Very Fast, No-Pauses (1st study)                     3.0                     2.60                             0.01
Very Fast, Pauses (1st study)                        2.63                    3.80                             0.17
Fast, No-Pauses (1st study)                          2.25                    5.21                             0.13
Fast, Pauses (1st study)                             1.98                    6.56                             0.35
Normal, No-Pauses (1st study)                        1.5                     7.44                             0.20
Normal, No-Pauses (2nd study)                        1.5                     7.17                             0.34
Normal, Pauses, Sped-Up (2nd study)                  1.5                     7.88                             0.44
Normal, Pauses (1st study)                           1.32                    8.04                             0.52
Slow, No-Pauses (2nd study)                          1.2                     9.83                             0.36
Slow, Pauses, Sped-Up (2nd study)                    1.2                     9.51                             0.52
Very Slow, No-Pauses (2nd study)                     0.9                     11.4                             0.48
Very Slow, Pauses, Sped-Up (2nd study)               0.9                     11.0                             0.56

Table IV shows the perceived Speed Likert-scale scores and Comprehension task scores for the animations, arranged according to the signs per second of the animations. If we focus on the pauses animations (those processed by our sign-duration and pause-insertion algorithms), then we see that signers prefer animations displayed at a rate between 1.2 and 0.9 signs per second. (Considering that the perceived Speed Likert-scale score is 9.51 for the 1.2 signs-per-second animations and 11.0 for the 0.9 signs-per-second animations, it is likely that the ideal value is around 1.1 signs per second.)
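The 1.1 signs-per-second estimate follows from simple linear interpolation between these two conditions, treating 10 as the "perfect speed" point on the 21-point scale (this assumes, of course, that the score varies roughly linearly with rate over this narrow range). The arithmetic is sketched below in Python:

```python
# Interpolate between the two sped-up pauses conditions in Table IV to
# find the rate at which the perceived-Speed score would equal 10
# ("perfect speed" on the 21-point Likert scale).
r1, s1 = 1.2, 9.51    # Slow, Pauses, Sped-Up (2nd study)
r2, s2 = 0.9, 11.0    # Very Slow, Pauses, Sped-Up (2nd study)
ideal_rate = r1 + (10 - s1) * (r2 - r1) / (s2 - s1)
print(round(ideal_rate, 2))   # ~1.1 signs per second
```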
Thus, in future work, we may want to repeat this study with animations at speeds between 1.2 and 0.9 signs per second to more precisely determine the ideal speed for displaying ASL animations. While participants seemed to prefer the speed of the 1.2 signs-per-second animations over the 0.9 signs-per-second animations, they actually had better Comprehension scores for the animations at the slower speed. Thus, there may be a trade-off between participants' satisfaction with the speed of an ASL animation and their Comprehension success. Depending on the length of the message being conveyed and the importance of accurate understanding, a specific speed can be selected for a particular application of ASL animations. Thus, our hypothesis H4 was supported: signers preferred animations displayed more slowly than the normal-speed animations.

7. CONCLUSIONS AND FUTURE WORK

This project has identified a way to improve the comprehension of ASL animations by modifying two parameters that have received little systematic research attention before: insertion of linguistically appropriate pauses and modification of the duration of signs based on surrounding linguistic factors. This is the first study to show that signers perceive a difference in the quality of ASL animations based on these speed and timing factors. While this study focused on ASL, these techniques should be applicable to animations of other sign languages around the world. Other contributions of this work include: (1) a prototyping/experimental framework for measuring comprehension of ASL sentences (so that additional factors can be explored in later work), (2) collection of empirical data on speed preferences and comprehension rates of ASL animations by native ASL signers, and (3) motivation for future computational linguistic work on ASL.

7.1 Practical Issues for Making Use of Our Results

The results of our two evaluation studies indicate an optimal speed for displaying animations of ASL, and this result may be of interest to other accessibility researchers who wish to use ASL animations in various applications. However, it is important to consider that the perceived Speed Likert-scale preferences of signers may depend on a wide range of factors specific to our experimental set-up:

1. The apparent size of the animation: the size of the computer screen, the portion of the screen occupied by the animation, and the distance of the human participant from the computer screen.

2. The "camera angle" of the animation: the 3D perspective from which the animation of the virtual human is rendered. For example, it may be important for the animation to display the virtual human character at "eye level" as opposed to a camera angle that displays the virtual human from above or below.

3. The task that the participant is asked to perform when viewing the animation: whether the participant needs to remember detailed information from the animation, categorize the overall topic under discussion in an animation, or merely recognize the presence of specific words in the animation, etc.

4. The domain and genre of the information conveyed by the ASL animation: whether the animation conveys technical content, news stories, encyclopedic information, conversation, etc. Also, the familiarity of the specific participant with the topic of discussion may affect their comprehension of the content.
5. The visual appearance of the virtual human character in the animation: the facial features, the skin coloring, the background color, the clothing color and pattern, and various other specific visual aspects of the virtual human's appearance may affect how easy it is for someone to see its movements.

6. The geographic- or culture-specific dialect of sign language used in the animation: if the variety of sign language performance includes vocabulary or colloquial phrases that are unfamiliar to the participant, then it may be more difficult to understand.

7. The "register" of the virtual human character's sign language performance: whether the character is making large signing movements (as if giving a speech in front of a large audience) or making small movements (as if "whispering" a sign language sentence so that it is discreetly conveyed).

8. The context and environment in which the sign language animation is being viewed: whether the environment is distracting, whether there are other activities occurring around the participant, whether the participant's attention is being diverted while watching the sign language animation, etc.

9. The length of the animation being conveyed: for short messages in which comprehension is very important, it may be better to display animations more slowly. For longer messages, participants may not tolerate slower animations.

We would expect that differences in the factors above could affect the perceived Speed Likert-scale preferences and Comprehension scores of participants in the study. Thus, we present our speed-related results in this article not as a one-size-fits-all "ideal speed" but as a useful starting point for other researchers to consider the optimal speed for displaying sign language animations in their own applications. As the state of the art in generating computer animations of ASL improves over time, participants may prefer animations at faster speeds. We expect future improvements in the quality of ASL facial expressions, the use of eye gaze during signing, and the use of 3D space around the signer to represent entities under discussion; these improvements may lead to computer animations of ASL that more closely resemble the movements of human ASL signers. As the quality of computer animations of ASL improves, participants may be more willing to watch animations at faster, more human-like speeds.

A further practical consideration for future researchers who wish to make use of our sign-duration and pause-insertion algorithms is that the algorithms require some linguistic information about the sentence in order to operate. Currently, for our algorithms to be used in a real-world setting as part of a tool like SignSmith, the user (who is scripting an ASL animation) must supply some information: (1) clause/sentence boundaries, (2) the part of speech of each sign, (3) the syntactic role of each noun, and (4) a syntactic parse tree for each sentence (an illustrative sketch of this information appears below). Section 7.2 discusses how in future work we may explore computational linguistic techniques for automatically identifying this information in ASL animations.
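As an illustration of what a scripter would need to supply, the sketch below shows one possible encoding of this per-sign information in Python. The field names are hypothetical (they do not correspond to any actual SignSmith data format); they simply mirror the four inputs listed above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SignAnnotation:
    """Hypothetical per-sign linguistic annotation; field names are
    illustrative, not an actual SignSmith or study data format."""
    gloss: str                           # the sign itself, e.g., "STATE"
    part_of_speech: str                  # input (2): "noun", "verb", etc.
    syntactic_role: Optional[str]        # input (3): e.g., "subject" (nouns)
    clause_final: bool                   # input (1): clause/sentence boundary
    parse_node_id: Optional[int] = None  # input (4): link into a parse tree

script = [
    SignAnnotation("STUDENT", "noun", "subject", clause_final=False),
    SignAnnotation("LEARN", "verb", None, clause_final=True),
]
```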
While we were generally pleased with our experience using the SignSmith software in our experiments, we did notice one limitation that caused some difficulty when inserting pauses between signs in the ASL animations. An ASL performance consists of periods of time during which the signer is doing one of the following: (1) performing the basic movement of an individual sign, (2) making a transitional movement from one sign to the next, (3) pausing his/her hands in between signs, or (4) holding his/her hands in a resting position. (Of course, the boundaries between these time periods can become somewhat blurred because there are ways in which the motion of one ASL sign has an impact on the performance of an adjacent sign.) We would prefer for the dictionary entries of a sign language scripting system to store the animation performance of "(1)" only—the basic movement of each individual sign. However, some of the signs in the SignSmith dictionary included an additional movement of the signer's hands at the end of the sign that would normally be considered part of "(2)"—the transitional movement of the hands between signs. Specifically, the animation-movement information for some signs in SignSmith's dictionary included a movement of the hands back to a more neutral position after the main portion of the sign was completed.

Fig. 15. Three still images from the animation of the sign STATE produced using SignSmith. During this sign, the character moves the right hand in a closed-fist handshape down across the open palm of the left hand. In image (a) on the left and image (b) in the center, the beginning and end of the main movement of the sign are shown. Image (c) on the right shows how the signer's hand moves after completing the performance of the sign. If a pause is inserted after the sign STATE using the SignSmith scripting software, then the character pauses in the configuration shown in image (c). A human ASL signer would have paused in the configuration shown in image (b).

For example, during the sign STATE, the signer moves the right hand in a closed-fist handshape downward across the forward-facing open palm of the other hand; see Figure 15(a), (b). However, the SignSmith dictionary entry for STATE includes the movement of the hands shown in Figure 15(a), (b), and (c). The likely reason why the creators of the SignSmith system included this extra movement information at the end of the sign is that it tends to produce more natural-looking transitional movements of the hands between signs. By specifying how the animated character's hands should move "out of" the final position of a sign, the system can produce a more graceful and natural-looking performance when this sign blends into a following sign. If we were using sign language scripting software without inserting pauses into the animation, then there would be no problem with having this extra movement data at the end of the sign. However, if we try to insert a pause after the sign STATE, then the animated character will perform the sign, begin to retract the hands toward a more neutral position, pause midway through this transitional movement, and then resume the transitional movement leading to the next sign. Thus, the pause occurs at an odd location in the performance, midway through the transitional movement between the end of STATE and the beginning of the next sign; the character will pause at the configuration shown in Figure 15(c). Ideally, the pause should occur at the end of the main movement of STATE; the animated character should pause at the configuration shown in Figure 15(b).
While we happened to notice extra movement data at the end of over a dozen signs in the SignSmith system's dictionary, this issue is certainly not specific to this system. To improve the naturalness of the performance, designers of sign dictionaries for any sign language scripting system may decide to include some transitional movement data at the end of some signs. For future researchers who wish to make use of our pause-insertion software, we see two solutions to this concern: (1) the dictionary of the sign language scripting system could be edited so that extra "back to neutral position" movement data is not part of the dictionary entry for a sign, or (2) the dictionary entry for each sign could specify the timestamp during the movement data at which the main performance of the sign is finished and the transitional movement begins (this is where a pause could be inserted at the end of the sign). While less ideal, a quick fix would be to identify the subset of signs that include this extra movement data at the end of their dictionary entries and then modify the pause-insertion software so that it does not attempt to insert pauses after such signs in the animation.
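To illustrate the second solution, each dictionary entry could record the timestamp at which the sign's main movement ends, so that the pause-insertion software knows where to hold the character's pose. The sketch below is hypothetical Python (the record fields are illustrative, not SignSmith's actual dictionary format):

```python
from dataclasses import dataclass

@dataclass
class SignDictionaryEntry:
    """Hypothetical sign-dictionary record; fields are illustrative only."""
    gloss: str
    total_duration: float      # seconds of stored animation data
    main_movement_end: float   # when the sign proper ends and the
                               # back-to-neutral transition begins

def pause_timestamp(entry: SignDictionaryEntry) -> float:
    # Pause where the main movement finishes (Figure 15(b)), not after
    # the trailing transitional movement (Figure 15(c)).
    return entry.main_movement_end

state = SignDictionaryEntry("STATE", total_duration=0.9,
                            main_movement_end=0.6)
print(pause_timestamp(state))   # 0.6 -> hold the pose of Figure 15(b)
```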
7.2 Future Work

This study has shown that ASL animations can be made more understandable if we apply sign-duration and pause-insertion algorithms to the animations. However, these algorithms require some linguistic information: a syntactic parse of the sentence and a part of speech for each sign. Therefore, there is motivation for future research on the design of tools to calculate this linguistic information automatically. It may be possible to develop automatic natural language processing tools that assign part-of-speech tags to the signs in a sentence and provide some form of syntactic parse of the scripted sentence, automating this process. If our sign-duration and pause-insertion algorithms were used as part of an ASL animation generator (one that automatically plans an ASL sentence, as in the final step of an English-to-ASL machine translation system), then the linguistic data required by our algorithms would already have been calculated by the generator as part of its planning work.

In order to build the sign-duration and pause-insertion algorithms for this study, we had to select values for some numerical parameters based on numbers obtained from the linguistics literature: the percentage of time that a signer should spend pausing (17%), the percentage of boundaries between signs at which a pause should occur (25%), the amount by which to lengthen the duration of signs before sentence boundaries (12%), etc. In future work, it would be useful to empirically verify that the values we selected for these parameters are actually the ideal settings for ASL computer animations. While it is a useful starting point to produce ASL computer animations whose speed and pausing are based on the behavior of human ASL signers, it may be the case that computer animations are even more understandable with different settings of these parameters.

In future work, we also plan to use the experimental techniques developed in this study for measuring sentence comprehension to evaluate animations with different variations in timing variables—for example, we may modify the relative speed of fingerspelling vs. other signs. ASL psycholinguistic studies have suggested that the overall speed of a performance may affect where pauses are inserted (and their length); these studies have also identified syntactic, discourse, and emotional features [Grosjean 1979] that affect the speed and timing of ASL. We may modify our sign-duration and pause-insertion algorithms to incorporate these other factors. Finally, it was an earlier study of ours on ASL classifier-predicate sentences that first prompted us to explore the issue of ASL speed and timing [Huenerfauth et al. 2008]; so, we also plan to develop timing algorithms for those ASL constructions.

When researchers like Grosjean and Lane described patterns in ASL sign-durations almost thirty years ago [Grosjean et al. 1981], they were analyzing the data as a matter of linguistic interest. It would have been impossible to predict the development of computer and animation technology that has made the generation of ASL animations possible today—opening up a new application for their research. The promising initial results of this study raise the question of whether there may be additional published linguistic research on ASL that can be applied to current research on ASL animation. This investigation paradigm (seeking inspiration from the ASL linguistics literature, creating prototype systems to explore the potential benefits of new algorithms, and conducting controlled user experiments to measure any benefit) may lead to additional advancements in the state of the art of ASL animation.

ACKNOWLEDGMENTS

Allen Harper and Julian Bayona assisted with data analysis and the preparation of experiment materials for the first evaluation study. Jonathan Lamberton recruited participants and organized experimental sessions for the second evaluation study.

REFERENCES

Bellugi, U. and Fischer, S. 1972. A comparison of sign language and spoken language. Cognition 1, 2–3, 173–200.

Chiu, Y.-H., Wu, C.-H., Su, H.-Y., and Cheng, C.-J. 2007. Joint optimization of word alignment and epenthesis generation for Chinese to Taiwanese sign synthesis. IEEE Trans. Patt. Anal. Mach. Intell. 29, 1, 28–39.

Elliot, R., Glauert, J., Kennaway, J., Marshall, I., and Sáfár, E. 2008. Linguistic modeling and language-processing technologies for avatar-based sign language presentation. Univ. Acc. Inform. Soc. 6, 4, 375–391.

Fischer, S., Delhourne, L., and Reed, C. 1999. Effects of rate of presentation on the reception of American Sign Language. J. Speech Lang. Hear. Resear. 42, 568–582.

Fotinea, S.-E., Efthimiou, E., Caridakis, G., and Karpouzis, K. 2008. A knowledge-based sign synthesis architecture. Univ. Acc. Inform. Soc. 6, 4, 405–418.

Grosjean, F. 1979. A study of timing in a manual and a spoken language: American Sign Language and English. J. Psycholinguist. Res. 8, 4, 379–405.

Grosjean, F., Grosjean, L., and Lane, H. 1979. The patterns of silence: Performance structures in sentence production. Cogn. Psychol. 11, 58–81.

Grosjean, F., Lane, H., Teuber, H., and Battison, R. 1981. The invariance of sentence performance structures across language modality. J. Exper. Psych. Hum. Percep. Perform. 7, 216–230.
Heiman, G. and Tweney, R. 1981. Intelligibility and comprehension of time compressed sign language narratives. J. Psycholinguist. Res. 10, 1, 3–16.

Holt, J. A. 1993. Stanford Achievement Test—8th Edition: Reading comprehension subgroup results. Amer. Ann. Deaf 138, 172–175.

Huenerfauth, M. 2006. Generating American Sign Language classifier predicates for English-to-ASL machine translation. Dissertation, Computer and Information Science, University of Pennsylvania.

Huenerfauth, M. 2008a. Misconceptions, technical challenges, and new technologies for generating American Sign Language animation. Univ. Acc. Inform. Soc. 6, 4, 419–434.

Huenerfauth, M. 2008b. Evaluation of a psycholinguistically motivated timing model for animations of American Sign Language. In Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS'08), 129–136.

Huenerfauth, M. 2008c. A linguistically motivated model for speed and pausing in animations of American Sign Language. Linguistic and Assistive Technologies Laboratory website. http://latlab.cs.qc.cuny.edu/taccess2009/. (Accessed 11/30/08).

Huenerfauth, M., Zhou, L., Gu, E., and Allbeck, J. 2008. Evaluation of American Sign Language generation by native ASL signers. ACM Trans. Access. Comput. 1, 1, Article 3.

Karpouzis, K., Caridakis, G., Fotinea, S.-E., and Efthimiou, E. 2007. Educational resources and implementation of a Greek sign language synthesis architecture. Comput. Educ. 49, 1, 54–74.

Kennaway, J., Glauert, J., and Zwitserlood, I. 2007. Providing signed content on the Internet by synthesized animation. ACM Trans. Comput.-Hum. Interact. 14, 3, Article 15.

Liddell, S. 2003. Grammar, Gesture, and Meaning in American Sign Language. Cambridge University Press, Cambridge, UK.

Marshall, I. and Sáfár, E. 2005. Grammar development for sign language avatar-based synthesis. In Proceedings of the 11th International Conference on Human-Computer Interaction, C. Stephanidis, Ed. Lawrence Erlbaum Associates, Mahwah, NJ.

Mitchell, R., Young, T., Bachleda, B., and Karchmer, M. 2006. How many people use ASL in the United States? Why estimates need updating. Sign Lang. Stud. 6, 3, 306–335.

Morrissey, S. and Way, A. 2005. An example-based approach to translating sign language. In Proceedings of the Workshop on Example-Based Machine Translation, 109–116.

Neidle, C., Kegl, J., MacLaughlin, D., Bahan, B., and Lee, R. G. 2000. The Syntax of American Sign Language: Functional Categories and Hierarchical Structure. The MIT Press, Cambridge, MA.

Sandler, W. and Lillo-Martin, D. 2006. Sign Language and Linguistic Universals. Cambridge University Press, Cambridge, UK.

Sheard, M., van der Schoot, S., Zwitserlood, I., Verlinden, M., and Weber, I. 2004. Evaluation reports 1 & 2, European Union project Essential Sign Language Information on Government Networks.

Shionome, T., Kamata, K., Yamamoto, H., and Fischer, S. 2005. Effects of display size on perception of Japanese sign language—Mobile access in signed language. In Proceedings of the 11th International Conference on Human-Computer Interaction, C. Stephanidis, Ed. Lawrence Erlbaum Associates, Mahwah, NJ.

Stein, D., Bungeroth, J., and Ney, H. 2006. Morpho-syntax based statistical methods for sign language translation. In Proceedings of the Conference of the European Association for Machine Translation, 169–177.

Tartter, V. and Fischer, S. 1983. Perceptual confusion in ASL under normal and reduced (point-light display) conditions. In Language in Sign: An International Perspective on Sign Language, J. Kyle and B. Woll, Eds. Croom Helm, London, 215–224.
van Zijl, L. and Barker, D. 2003. South African sign language machine translation system. In Proceedings of the 2nd International Conference on Computer Graphics, Virtual Reality, Visualisation, and Interaction in Africa (Afrigraph'03). ACM Press, 49–52.

Vink, M. and Schermer, T. 2005. Report of research on use of an avatar compared with drawings or films of gestures. Tech. rep., Dutch Sign Language Center.

Wingfield, A., Tun, P., Koh, C., and Rosen, M. 1999. Regaining lost time: Adult aging and the effect of time restoration on recall of time-compressed speech. Psych. Aging 14, 3, 380–389.

Received November 2008; revised January 2009; accepted January 2009.