A Linguistically Motivated Model for Speed
and Pausing in Animations of American
Sign Language
MATT HUENERFAUTH
The City University of New York, Queens College
Many deaf adults in the United States have difficulty reading written English text; computer
animations of American Sign Language (ASL) can improve these individuals’ access to information, communication, and services. Planning and scripting the movements of a virtual character’s
arms and body to perform a grammatically correct and understandable ASL sentence is a difficult
task, and the timing subtleties of the animation can be particularly challenging. After examining the psycholinguistics literature on the speed and timing of ASL, we have designed software
to calculate realistic timing of the movements in ASL animations. We have built algorithms to
calculate the time-duration of signs and the location/length of pauses during an ASL animation.
To determine whether our software can improve the quality of ASL animations, we conducted a
study in which native ASL signers evaluated the ASL animations processed by our algorithms.
We have found that: (1) adding linguistically motivated pauses and variations in sign-durations
improved signers’ performance on a comprehension task and (2) these animations were rated as
more understandable by ASL signers.
Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing—Language generation, Machine translation; K.4.2 [Computers and Society]: Social
Issues—Assistive technologies for persons with disabilities
This research was supported by grants from The City University of New York PSC-CUNY
Research Award Program (“Evaluating Parameters for American Sign Language Animations”,
2007), from the National Science Foundation (“CAREER: Learning to Generate American Sign
Language Animation through Motion-Capture and Participation of Native ASL Signers”, Award
#0746556, 2008), and from Siemens A&D UGS PLM Software (“Generating Animations of
American Sign Language”, Go PLM Grant Program, 2007).
This article is an extended version of a paper presented at the ACM SIGACCESS conference on
Computers and Accessibility [Huenerfauth 2008b]. This article includes additional detail about
experimental design, new examples of the ASL passages included in the study, results from an
additional round of evaluation with additional subjects, and a discussion of practical issues for
researchers making use of the results of this study.
Author’s address: M. Huenerfauth, Computer Science Department, CUNY Queens College, City University of New York, 65-30 Kissena Blvd, Flushing, NY 11375 USA, email:
[email protected].
General Terms: Design, Experimentation
Additional Key Words and Phrases: American Sign Language, animation, natural language
generation, evaluation, accessibility technology for the deaf
ACM Reference Format:
Huenerfauth, M. 2009. A linguistically motivated model for speed and pausing in animations of
American Sign Language. ACM Trans. Access. Comput. 2, 2, Article 9 (June 2009), 31 pages.
DOI = 10.1145/1530064.1530067. http://doi.acm.org/10.1145/1530064.1530067.
1. MOTIVATIONS AND BACKGROUND
American Sign Language (ASL) is a natural language, which is used as a primary means of communication for about one half million people in the United
States [Mitchell et al. 2006]. During an ASL sentence, signers use their hands,
facial expression, eye gaze, head tilt, and body tilt to convey linguistic meaning
[Liddell 2003; Neidle et al. 2000; Sandler and Lillo-Martin 2006]. ASL is not
just a manual presentation of an English sentence; it has its own word order,
syntactic constructions, and vocabulary (which may not have one-to-one equivalence with English words). Because of the differences between English and
ASL, it is possible to be fluent in ASL yet have difficulty reading English text.
In fact, a majority of deaf 18-year-olds in the United States read English at a
level below that of an average 10-year-old hearing student [Holt 1993].
Unfortunately, Web sites and other written-English information sources can
also pose a challenge for deaf adults with low literacy skills. One way of combating this accessibility challenge is to use software that displays computer-generated animations of ASL to make more information, communication, and
services accessible to these users. These ASL animations may be scripted by
a content developer [Kennaway et al. 2007] or generated by English-to-ASL
automatic machine translation software [Chiu et al. 2007; Huenerfauth 2006;
Marshall and Sáfár 2005; Stein et al. 2006].
We have conducted research on generation of computer animations of ASL
[Huenerfauth 2008a]. During an earlier study in which human signers evaluated ASL animations, several participants wrote feedback comments requesting changes to the animation’s speed [Huenerfauth et al. 2008]. The animation
speed in that study had been chosen based on the typical signs-per-minute of
ASL reported in the linguistics literature. Because of these comments, we decided to conduct the present study to investigate how to set time-durations of
signs and placement of pauses in ASL animations.
1.1 Speed of ASL Signing
For users who are fluent in ASL but may have low literacy in English, information is most accessible if presented in ASL. Linguistic researchers have
established that ASL signing conveys information at the same rate as spoken
English [Bellugi and Fischer 1972], but the average speed at which most literate adults can read English text is much faster than the speed of spoken
English audio. So, a deaf user who is relying on an ASL animation to receive
information would receive that information at a lower sentence-per-minute
rate than adults who are reading English text. This creates a disparity in the
speed of access to information between users of ASL animations and users who
are reading written English text. For deaf adults with low literacy to access
information at a speed comparable to English-reading adults, ASL animations
must be displayed quickly (while maintaining their understandability). The
linguistics literature contains a range of values for “normal” signing speed:
from 1.5 to 2.37 signs per second [Bellugi and Fischer 1972; Grosjean 1979].
It is known that when ASL videos are played faster than 2.5 times normal,
viewers’ comprehension of the video drops significantly [Fischer et al. 1999].
To find ways to present ASL computer animations both quickly and understandably, we have sought inspiration from research on English speech audio.
Studies of English speech have shown that inserting pauses at linguistically
determined locations in high-speed audio allows it to be more easily understood. In one study, researchers artificially increased the speed of an English speech recording and then inserted pauses at linguistically appropriate locations (between
sentences, clauses, phrases) [Wingfield et al. 1999]. The pauses improved listeners’ comprehension. This benefit arose only if pauses were at linguistically
appropriate locations (not at random or uniform locations), and the benefit
leveled off once the pauses had increased the duration of the recording back to
its original length before the artificial speeding [Wingfield et al. 1999]. Two
explanations for this link between comprehension and linguistically placed
pauses were proposed: (1) pauses may help listeners more easily determine
sentence/clause boundaries in the performance or (2) pauses at appropriate
phrase boundaries give the listener some additional time to mentally “chunk”
units of information and process/remember them more effectively.
In one study on ASL, black-screen segments of video were inserted into a
double-time video recording of a human performing ASL. The black-screen segments were added between “semantically unitary statements” [Heiman and
Tweney 1981]. (We understand this to mean that black frames were added
between sentences/clauses.) Blank segments were of uniform duration, and
enough were added to return the ASL video to its original time. No significant
improvement in viewers’ comprehension resulted. It has not been examined
whether inserting pauses (in which the signer remains on the screen but the
hands stop moving) at linguistically appropriate locations would impact viewers’ comprehension of videos of human signers. Our work examines whether
inserting pauses into computer-generated animations of ASL improves viewers’ comprehension of the animation or makes the resulting animations appear
more natural-looking to viewers.
1.2 Related Work on ASL Animation
Several research projects have investigated the synthesis of computer animations of virtual humans performing sign language [Elliot et al. 2008;
Fotinea et al. 2008; Huenerfauth 2006; Sheard et al. 2004] (and surveys in
Huenerfauth [2006] and Kennaway et al. [2007]). For example, several years
of European research projects have contributed to the eSIGN project, which
creates technologies for content developers to build sign databases in a symbolic notation, to assemble scripts of signing performances for use on accessible Web pages, and to let viewers see the animations in their Web browser
[Kennaway et al. 2007]. SignSmith Studio,1 discussed in Section 3.1, is a commercial tool for scripting ASL animations. Other computer science research
has examined automatic generation or machine translation (MT) of sign language [Chiu et al. 2007; Huenerfauth 2006; Karpouzis et al. 2007; Marshall
and Sáfár 2005; Morrissey and Way 2005; Shionome et al. 2005; Stein et al.
2006; van Zijl and Barker 2003].
The computer science literature on sign language animations has not focused on the timing of these animations. Content developers may script individual signs while observing the timing of human signers in video [Kennaway
et al. 2007] or use motion-capture technology to record individual signs from
humans directly [Elliot et al. 2008], but this addresses the timing of isolated
signs (not sentences). Also, many sign language animation systems give the
viewer the ability to adjust a dial that modifies the speed of the performance
[Elliot et al. 2008; Kennaway et al. 2007]; however, Section 2 will discuss how
the speed of an ASL performance is more complex than a single speed value.
Previous sign language animation research has not examined how the time-duration of a sign is affected by its surrounding linguistic context (what other
signs occur in a sentence or in a performance) nor how pauses should be placed
in an animation to mimic how human signers tend to pause at natural linguistic boundaries. SignSmith allows the content developer to manually specify
pauses to occur, and content-scripting tools from the eSIGN project give similar control over speed, timing, and pauses [Kennaway et al. 2007]. Animations
from generation or MT projects do tend to include pauses between sentences
[Elliot et al. 2008; Fotinea et al. 2008; Huenerfauth 2006], but a principled
linguistic way to select where to insert pauses into a sign language animation
has not been described previously in the literature.
2. LINGUISTIC BASIS FOR THE DESIGN
The timing of an ASL performance is actually more complex than a single
“speed” variable (representing the number of signs per second). In fact, many
parameters are needed to specify the speed of an ASL performance: the speed
of fingerspelling relative to speed of other signs, the time spent in transition
movements between signs, the time spent pausing during signing, etc. Further,
ASL signs are not all performed in the same amount of time: each sign has a
standard time duration at which it is performed. (For example, the ASL sign
“SEA” involves both hands making a wavelike motion in space, but the sign
“HATE” involves the middle finger of both hands being flicked out from behind
the thumbs. Generally, because of the complexity of the movement path of
the signer’s hands, the sign SEA requires more time to perform than the sign
HATE.) Thus, the final timing of ASL is a complex interaction between several
speed parameters and the lexical durations of the specific signs performed.
1 http://www.vcom3d.com
Although several psycholinguistic experiments have examined human ASL
speed and pausing [Bellugi and Fischer 1972; Fischer et al. 1999], the most detailed analysis was conducted by Grosjean and Lane [Grosjean 1979; Grosjean
and Lane 1979; Grosjean et al. 1981] in the late 1970s. During several years of
research, Grosjean and Lane studied the interaction of three component variables of signing rate: (1) the articulation rate at which the hands move through
space, (2) the number and location of pauses in a sentence, and (3) the length
of each pause. They defined a “pause” as a period of time between two signs
when the hands are not moving—there is not a pause between every pair of
signs [Grosjean et al. 1981]. In their view, signs consist of an “in-transition
time” to get the hands in the proper position, the main movement of the sign,
and an optional “hold” at the end of a sign where a pause could occur [Grosjean
1979].
Grosjean and Lane observed that ASL signers in recorded videotapes perform signs before sentence boundaries more slowly (12% longer) than their
normal duration [Grosjean 1979]. Also, they observed that when a sign occurs
more than once during a performance, then the durations of the later occurrences differ from the typical duration for that sign. If later occurrences of the
sign appear in a syntactic position where the sign has appeared before (e.g., as
the subject or as the direct object of the sentence), then the later occurrences
are 12% shorter. If the later occurrence of a sign appears in a new syntactic
position where it had not previously appeared, then the later occurrence of
the sign is 12% longer [Grosjean 1979]. For example, if a sign appears early
in a performance in the subject position of a sentence, and the same sign is
used as a direct object in a later sentence in the performance, then the second
occurrence is longer.
3. TWO ALGORITHMS FOR ASL SPEED AND TIMING
While there has been psycholinguistic research on ASL pauses and sign durations, it has not been previously applied to ASL animations. Based on these
ASL linguistics studies, we have built algorithms for calculating the duration
of signs and the location/length of pauses during an ASL animation. Our two
algorithms thus attempt to set timing values for an ASL animation so that it
mimics the behavior of human signers. Our goal is to improve the understandability and perceived naturalness of ASL animations.
3.1 Using SignSmith Studio for Prototyping
To build a prototype of our algorithms and to evaluate whether they produce
ASL animations that signers find more understandable and natural, we had
to create animations for signers to examine. We had to select a method for
producing computer animation of ASL. In earlier work, we had built a system
for generating animations of a character performing ASL sentences (containing constructions called “classifier predicates,” which describe 3D spatial concepts) [Huenerfauth et al. 2008]. Our generator was designed with a primary
focus on these constructions, and it had a small ad hoc vocabulary of signs in
its repertoire, which had been expanded as needed to construct full sentences
for past evaluation studies. To conduct tests of algorithms that operate on
sign-duration and pause-insertion on a wide variety of ASL sentences (not just
classifier predicates), we needed an animation platform with a larger vocabulary of signs. We have decided to use SignSmith Studio, a commercial system from VCom3D. This product allows users to script an ASL performance
(using a dictionary of signs, a fingerspelling generator, limited eye gaze and
head tilt, limited shoulder tilt, a list of around 50 facial expressions, optional
speech audio and mouth movements, and other ASL features). The user is presented with an interface that looks like a set of parallel tracks (like a musical
score), and the user arranges signs and facial expressions on these parallel
timelines. When a “play” button is pressed, then an animation is generated in
which a virtual human character (there are several to choose from) performs
the script.
The advantage of using a commercial system that we did not develop is
that the movements and speed/durations of the signs in its dictionary were
developed external to our research project. The signs were not built solely for
this study, and so each sign’s duration in the dictionary was set independently
of our experiments. Another advantage of SignSmith is that its representation of the timing of signs is compatible with that of our generation software
[Huenerfauth 2006]; so, progress made running experiments on their animations can translate into later improvements for our ASL generation system under development. Specifically, SignSmith stores three timing values for each
sign: (1) transition time during which the hands get to the starting position, (2)
main movement of each sign, and (3) hold time at the end of the sign when the
hands remain motionless. In SignSmith, the user can manually override the
default values for these three sub-times for each sign in the animation. Signs
have a basic “duration” value for their main movement, and this is multiplied
by a “multiplier” factor (that the user may optionally specify) to vary the time
of the sign’s main movement in the resulting animation.
To produce ASL animations, users of SignSmith are expected to be knowledgeable of the language; however, even fluent signers may not have intuitions
about how speed and pauses should be numerically specified to create a natural
result. The documentation for SignSmith mentions that users may want to insert longer transition times between two signs at a sentence boundary. (So this
appears to be the recommended approach for users to manually add pauses between sentences.) For this project, we created several multisentence ASL performances using SignSmith—leaving the default timing values for each sign
in the script (more details in Section 4.2). The script for these animations was
used as input to the sign-duration and pause-insertion algorithms discussed
in the following.
3.2 Sign-Duration Algorithm
We have implemented an algorithm for calculating linguistically motivated
durations for the signs in an ASL animation—based on the standard duration
for each sign and its surrounding linguistic context. The input to our algorithm is an initial script of the ASL performance from SignSmith (an XML file
representing the sequence of signs with their standard time durations) and
some linguistic information: (1) part-of-speech for each sign (with nouns subcategorized according to their syntactic role as described in the following) and
(2) sentence/clause boundaries identified. The output of the algorithm is a
script in which the durations of some signs are modified.
Our two-phase algorithm builds upon linguistic research in Section 2. In
phase 1, signs occurring more than once in the script are detected. Only content signs like nouns, verbs, adjectives, and adverbs are modified in this sign
duration algorithm—function signs like prepositions, pronouns, or other grammatical markers are not. If the repeated sign is a verb, adjective, or adverb,
then later occurrences of the sign are shortened by 12% in duration. If the
repeated sign is a noun, then changes to later occurrences depend on the syntactic role of each occurrence (it may lengthen or shorten by 12%). Nouns are
categorized as being in: topic position, when/conditional-clause position, subject position, direct/indirect object position, object-of-preposition, etc. (Topic
and when/conditional clauses occur at the beginning of ASL sentences and are
accompanied by special facial expressions.) In phase 2 of the algorithm, signs
that appear just before sentence or clause boundaries are lengthened (by 12%
or 8%, respectively). Section 7.2 will discuss how future work may examine
alternative implementations of this sign-duration algorithm in which some of
these numerical parameters (e.g., the 8% or 12% modifications in sign durations) are set differently.
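A minimal sketch of this two-phase procedure appears below; the dictionary-based record format and field names are our own assumptions, while the 8% and 12% adjustments are the values described above.

    CONTENT_POS = {"noun", "verb", "adjective", "adverb"}

    def adjust_sign_durations(script):
        # Each entry of `script` is assumed to be a dict with keys "gloss", "pos",
        # "role" (the noun's syntactic position), "duration" (seconds), and booleans
        # "pre_clause" / "pre_sentence" marking signs just before those boundaries.
        seen = {}  # gloss -> set of syntactic roles in which the sign has appeared
        for sign in script:
            if sign["pos"] in CONTENT_POS:         # function signs are left unchanged
                roles = seen.setdefault(sign["gloss"], set())
                if roles:                          # a later occurrence of this sign
                    if sign["pos"] != "noun":
                        sign["duration"] *= 0.88   # verbs/adjectives/adverbs: 12% shorter
                    elif sign["role"] in roles:
                        sign["duration"] *= 0.88   # noun repeated in a familiar position
                    else:
                        sign["duration"] *= 1.12   # noun in a new syntactic position
                roles.add(sign["role"])
        for sign in script:                        # phase 2: pre-boundary lengthening
            if sign["pre_sentence"]:
                sign["duration"] *= 1.12
            elif sign["pre_clause"]:
                sign["duration"] *= 1.08
        return script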
3.3 Pause-Insertion Algorithm
We have also implemented an algorithm for determining where to insert
pauses (and how long they should be) during a multisentence ASL animation.
The input to this algorithm is a script of an ASL performance that includes the
sequence of signs with the time duration of each (the XML output of the previous algorithm can serve as the input to this algorithm). Our pause-insertion
algorithm also requires some linguistic data about the ASL sentences:
(1) location of sentence boundaries and (2) a syntactic parse tree for each sentence. (This data is again supplied manually to the algorithm for this study;
tools for automatically parsing ASL sentences are a focus of future work, as
discussed in Section 7.2.) The algorithm’s output is a script of the ASL performance in which the “hold” times at the end of signs have been modified to
produce pauses during the performance at linguistically appropriate locations.
3.3.1 The Original Grosjean and Lane Model. Our algorithm builds upon
ASL psycholinguistic research on speed and timing. When analyzing video
recordings of human ASL signers, Grosjean and Lane proposed a model to account for where in a performance signers would insert pauses (and how long
pauses would be) [Grosjean and Lane 1979; Grosjean et al. 1981]. Their model
assumed that a syntactic parse tree for the sentence was known, and it predicted the percentage share of total pause time that should be inserted between adjacent pairs of signs. Their model only accounted for pauses in a
single sentence—not for a multisentential passage. We have used their model
as a basis for our multisentential ASL pause-prediction algorithm.
The key idea of Grosjean and Lane’s model was that two factors determined
how much pause time occurs between two signs: (1) the syntactic significance
of that boundary and (2) whether the boundary is near the midpoint between
the nearest pauses. Thus, boundaries between sentences, clauses, and major
phrases in a sentence were preferred locations for pauses, but this preference
was balanced with a “bisection tendency” [Grosjean and Lane 1979] to pause
near the middle of long constituents.
Grosjean and Lane [1979] describe an iterative method for calculating the
percentage share of total pause time that should be inserted between each pair
of adjacent signs in a performance. They first assign a syntactic “complexity
index” value (CI) to each sign boundary in the ASL sentence; this value is
calculated using the syntactic parse tree for the sentence. In a parse tree,
there exists some lowest node that spans the boundary between each pair of
adjacent signs; the syntactic importance of that node determines the CI value
of that boundary. Specifically, the total number of descendant nodes below that
node in the parse tree is the CI value for the boundary. Thus, the boundary
between two clauses would have a larger CI value than the boundary between
two signs inside a noun phrase.
Grosjean and Lane’s method iterates until all boundaries between signs
have been assigned a percentage share of pause time. One pause is added
during each iteration. An iteration of the algorithm begins by selecting the
longest span of signs not yet broken by a pause (based on number of signs,
not on sign durations). For each boundary between adjacent signs within that
span, the relative proximity (RP) of the boundary to the midpoint of the span
is calculated (RP = 100% at the midpoint, RP = 0% at the ends of the span,
etc.). For each boundary inside the span, the CI value is multiplied by the
current RP value. The boundary with the greatest (CI*RP) product inside the
span is chosen as a location for a pause; the percentage share of pause time
assigned to that boundary is calculated based on the product. The algorithm
then iterates (selecting the longest remaining unbroken span of signs in the
whole performance and then calculating fresh RP values for that span under
construction [Grosjean and Lane 1979].
3.3.2 Our Pause Insertion Algorithm. Our algorithm implements and extends the Grosjean and Lane method in several ways. First, it takes into account the results of our sign-duration algorithm. When calculating the RP
values, the Grosjean and Lane model operates on signs as if they all had a uniform unit duration. Our algorithm uses the actual timing values (in-transition
time + sign-duration) of the signs in a span to calculate the RP values.
Second, we had to extend the Grosjean and Lane model to account for multisentential performances. Syntactic parse trees span over single sentences, so
the CI for a boundary between sentences is undefined in the Grosjean and Lane
model. Our algorithm sets the CI value of the sentence boundary between
any two sentences (S1 and S2) as equal to max(18, length(S1)+length(S2)-2),
where max is the maximum function and length is the number of signs in
each sentence. The logic behind this approach is that if the two sentences had
been joined by a conjunction, then the root of the parse tree that joins them
Fig. 1. Pseudocode for pause-insertion algorithm.
would dominate all of the nodes in the tree for S1 and all nodes in the tree for
S2. Assuming binary-branching syntactic parse trees, the number of internal
nodes in a tree for a sentence S would be length(S)-1. To ensure sentence
boundaries adjacent to short sentences still receive sufficient syntactic heft,
our algorithm will assign a CI value of at least 18 to any sentence boundary.
Our algorithm uses other results from the psycholinguistic literature. ASL
signers insert a pause at 25% of the boundaries between signs [Grosjean 1979],
so our algorithm adds pauses at boundaries that have been assigned the top
25% of pause-percentage weight. ASL signers spend about 12% of their time
pausing during rehearsed sentences [Grosjean 1979] and 10%–35% during
spontaneous signing [Bellugi and Fischer 1972]. So, we insert pause-time into
the animation such that 17% of the final animation time is spent in pauses
(a middle-ground between published percentages). Pause time is added to the
“hold” time of the sign before the boundary. Figure 1 shows pseudocode for
our pause-insertion algorithm. Boundaries with the top 25% Pause-Share (PS)
values in Figure 1 receive a share of the pause time proportional to their PS
value.
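As a companion to the pseudocode in Figure 1, the sketch below shows how these constants fit together with the span-splitting procedure sketched earlier; the record format matches the previous sketches, and details such as how ties at the 25% cutoff are broken are our own assumptions.

    def sentence_boundary_ci(len_s1, len_s2):
        # CI for the boundary between adjacent sentences S1 and S2 (lengths in signs),
        # applied before running allocate_pause_shares() over the whole performance.
        return max(18, len_s1 + len_s2 - 2)

    def insert_pauses(script, shares):
        # `shares` maps a boundary index i (the boundary after sign i) to its
        # normalized pause share, as produced by allocate_pause_shares() above.
        signing_time = sum(s["transition"] + s["duration"] + s["hold"] for s in script)
        # 17% of the *final* animation time is pausing: p = 0.17 * (signing_time + p).
        total_pause = 0.17 * signing_time / (1.0 - 0.17)
        k = max(1, len(shares) // 4)                  # top 25% of boundaries
        top = dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:k])
        norm = sum(top.values())
        for i, share in top.items():                  # pause becomes "hold" time on the
            script[i]["hold"] += total_pause * share / norm  # sign before the boundary
        return script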
4. DESIGN OF THE FIRST EVALUATION STUDY
While linguistic results for ASL timing have guided our algorithm design, we
had to perform our own evaluation studies of ASL animations for the algorithms we implemented. Limitations of the animated character’s appearance,
expressiveness, and movement subtleties, combined with the expectations human viewers have of computer animations, could lead to different results when
evaluating the speed and timing of a computer-generated ASL animation. We
conducted an evaluation study in which native ASL signers evaluated the
animations processed by our timing algorithms to test two hypotheses: (H1)
adding linguistically motivated pauses and variations in sign durations will
help ASL signers understand and remember information from these animations and (H2) these new ASL animations will appear more natural-looking to
signers.
4.1 Participant Recruitment and Interaction
In an earlier evaluation study [Huenerfauth et al. 2008], we outlined a set of
best practices for the conduct of studies involving ASL signers that are designed to evaluate ASL animations. We described how it was important that
the signers recruited for the study be native signers, how to best ask questions to screen for such signers, and how the experimental environment around
signers should be ASL-focused (with as little English or English-like signing
as possible). All of these factors help to ensure that the responses given by
participants about the correctness of the animations are as ASL-accurate as
possible. Nonnative signers who learned ASL later in life may be more lenient
when judging ASL animations, and signers subjected to an English environment may switch their own signing to a more English-like form. This can also
result in their being overly tolerant of animations that are too English-like
[Huenerfauth et al. 2008]. For the current study, all instructions and interactions were conducted in ASL, and 8 of the 12 participants arrived accompanied
by another ASL signer (thereby producing an ASL conversational environment
immediately prior to the study).
Advertisements posted on Deaf community websites in New York City asked
whether potential participants had grown up using ASL at home or attended
an ASL-based school as a young child. Of the 14 people who came to the laboratory, 2 answered prescreening questions in such a way that they did not
meet the screening criteria. Their data was excluded from the study. Of the
12 participants whose data was included, nine grew up with parents who used
ASL at home. Of the remaining three, two began attending a school using
primarily ASL before the age of 7, and the final participant began using ASL
before the age of 7 through another circumstance. Of our 12 participants, 5
had a significant other who is deaf/Deaf, 9 used ASL as the primary language
in their home, 11 used ASL at work, and 11 had attended a college where instruction was primarily in ASL. There were 7 men and 5 women of ages 25–58
(median age 37).
4.2 Animations Shown in the Study
Twelve ASL passages of length 48–80 signs (median 69 signs) were created in
SignSmith on a variety of topics: four short news stories, two adaptations of
encyclopedia articles, four fictional narratives, and two personal introductions.
Passages contained sentences of a variety of lengths and complexity; some included topicalized noun phrases, condition/when clauses before a main clause,
rhetorical questions, contrastive role-shift (signers may tilt their shoulders to
the left or right as they contrast two concepts), or association of entities with
locations in space (for use by later pronouns during which the signer points
to those locations). Table I contains a listing of all 12 passages used in the
study with a brief summary of each. Figure 2 contains a transcript (in the
form of ASL glosses and English translation) for one of the passages used in
the study. See Figures 3 and 4 for a screenshot and timeline of animations
from our study. Sample videos from the study are available on the website of
Table I. List of the Twelve ASL Passages Used in the Study

Passage Name        Summary
Bear Spray          News story: scientists make a bear repellent spray.
Big Interview       Narrative: Martin's daughter has a big job interview.
Chess Program       Encyclopedia entry: computer software that plays chess.
Cost of Rice        News story: the price of rice rises internationally.
Garage Sale         Narrative: Sally's aunt holds a garage sale.
Jose & Family       Personal Introduction: Jose's family history and heritage.
President Eugene    Narrative: seven-year-old child wants to be president.
Student Protest     Narrative: students protest animal use in laboratories.
Reporter Albert     Personal Introduction: college student studying journalism.
Obituary            News story: famous classics professor dies.
Sports Injury       News story: student athlete is injured and cannot play.
Hurricane Damage    Encyclopedia entry: storms causing expensive damage.
Fig. 2. Transcript of sample passage from the study (first in the form of ASL glosses and then
in the form of an English translation). The passages were displayed in the form of computer
animations of ASL during the experimental study—as shown in Figure 3.
the Linguistic and Assistive Technologies Laboratory at Queens College of the
City University of New York [Huenerfauth 2008c].
An ASL interpreter verified the accuracy of the twelve animations—to a
degree. While SignSmith gives the user many options in controlling the animated character, there are many phenomena in fluent ASL signing that are
beyond the capabilities of the system: inflection of ASL verbs for locations in
3D space, separate use of head tilt and eye gaze during verb signs to indicate
subject/object agreement, association of entities under discussion with more
than two locations around the signer, etc. SignSmith’s dictionary also does not
contain all possible ASL signs. Therefore, the animations produced for this
study are not perfectly fluent ASL. To evaluate our timing algorithms, we feel
that “rather fluent” ASL animations are acceptable. State-of-the-art ASL generation technology will not be able to produce “fully fluent” ASL animations
for many years in the future. So the degree to which the timing algorithms
can improve the understandability and naturalness of semi-fluent ASL animations is still an important research question (and is perhaps a more realistic
evaluation of our algorithms).
Fig. 3. Screenshot from an animation.
Fig. 4. Illustration of sequence of signs in an original animation (top row) and after being processed by our timing algorithms (bottom row). Additional space between signs represents pauses
added to the animation script by our algorithms.
4.3 Responses and Experimental Design
Participants viewed animations of six types: 2 groups (no-pauses vs. pauses) ×
3 speeds (normal, fast, very fast). No-pauses animations were not processed
by our timing algorithms; pauses animations were processed by our
sign-duration and pause-insertion algorithms. We also examined how quickly
animations can be played while remaining understandable. Normal-speed animations are at a rate of 1.5 signs per second, fast-speed animations are 2.25
signs per second, and very fast animations are 3 signs per second. (Values
for average signing speed in the linguistics literature vary from 1.5 to 2.37
signs per second [Bellugi and Fischer 1972; Grosjean 1979].) Another reason
to study both pause insertion and speed in one study is that we can determine
whether any effect from pause insertion is from (1) pauses being placed at linguistically appropriate locations or (2) simply from the additional time added
to the animation.
Each of the 12 ASL passages was generated in all six combinations of
group × speed, producing 72 animation files. A fully factorial within-subjects
design was used to assign 12 files to each participant in the study such that:
(1) no participant saw the same passage twice, (2) the order of presentation
was randomized, and (3) each participant saw two animations of each of the six
combinations of group × speed. Animations were viewed on a 17” LCD screen
at a distance of less than one meter. The animations occupied a 10cm × 10cm
region of the screen. We selected this animation size for the experiment based
on the assumption that many future applications that include sign language
animations may display the animated character in a window that occupies a
fraction of the screen (with the remainder of the screen used for the rest of the
application’s user interface). For this reason, we conducted our experiments
with the animations at the size of 10cm × 10cm, which was approximately
one-ninth of the screen area.
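One simple way to realize this fully factorial within-subjects assignment is a Latin-square-style rotation. The sketch below is our own illustration rather than the study's actual assignment script, but it satisfies all three constraints listed above.

    import random

    GROUPS = ["no-pauses", "pauses"]
    SPEEDS = ["normal", "fast", "very fast"]
    CONDITIONS = [(g, s) for g in GROUPS for s in SPEEDS]  # the 6 group x speed types

    def assign_stimuli(participant_id, passages):
        # With 12 passages and 6 conditions, rotating the condition list by the
        # participant id gives every condition exactly twice per participant, and
        # running ids 0..11 balances passage/condition pairings across participants.
        pairs = [(passage, CONDITIONS[(i + participant_id) % len(CONDITIONS)])
                 for i, passage in enumerate(passages)]
        random.shuffle(pairs)                              # randomize presentation order
        return pairs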
Several formal evaluation studies of ASL animations have been conducted
that involve native signers [Huenerfauth et al. 2008; Huenerfauth 2006;
Kennaway et al. 2007; Sheard et al. 2004; Vink and Schermer 2005]. In our
earlier study [Huenerfauth et al. 2008], participants saw animations and were
asked to circle numbers on 10-point Likert scales to indicate how Grammatical, Understandable, and Natural-moving the animations were. Participants
in that study appeared comfortable with the instructions and giving their opinions about these aspects of the ASL animations. We have used the same three
subjective criteria in this study. For this study, we have added an additional
Likert scale to enable participants to indicate whether the ASL animation is
too-slow, perfect, too-fast, or somewhere in-between. As in our previous study,
instructions were given to participants in ASL to explain the meaning of each
of these Likert scales.
Earlier experiments evaluating the understandability of sign language animations have given viewers various tasks to demonstrate their comprehension:
decide if two animations say the same thing [Vink and Schermer 2005], match
a signing animation to a movie of what was described [Huenerfauth et al.
2008; Huenerfauth 2006], summarize the animation [Sheard et al. 2004], or
answer comprehension questions about the animation’s content [Sheard et al.
2004]. Studies of ASL videos have also pioneered techniques useful for evaluation of animation [Bellugi and Fischer 1972; Fischer et al. 1999; Grosjean 1979;
Grosjean and Lane 1979; Grosjean et al. 1981; Heiman and Tweney 1981;
Tartter and Fischer 1983]. Heiman and Tweney [1981] showed signers multisentence ASL videos followed by ASL videos of comprehension questions. Their
participants wrote down English answers to the questions. Tartter and Fischer
[1983] asked signers to select a cartoon drawing that corresponded to the content of video of an ASL sentence they had seen.
In our study, after viewing each ASL passage, participants answer the
four Likert-scale questions discussed above. Then, participants are shown a
set of four comprehension questions about the information in that ASL passage. The same animated signing character used during the passage performs
these four questions; after each question, the animated character gives a list
of multiple-choice answers. These answers correspond to a list of cartoon
clip-art pictures on a paper survey form, and the participant circles the correct answer(s) to each question. We have adapted the design of Heiman and
Tweney so that our participants do not need to write any English text during
the experiment—they circle a picture as their answer. We had planned to omit
labels below the cartoons to avoid English influence in the environment; however, during pilot tests, participants requested we add short English captions
(Figure 5). We included an example story with comprehension questions prior
to the data collection process. This allowed participants to become comfortable
Fig. 5. Sample set of answer choices for one comprehension task question.
Fig. 6. Transcript of comprehension questions used in the study for the passage shown in
Figure 2 (each question is first presented in the form of ASL glosses then as an English translation). Questions were displayed as ASL computer animations during the study.
seeing the animated signing character, and it ensured that the instructions for
the study were clearly conveyed.
Most questions ask basic who/what/when/where facts from the passage, and
about 10% are less direct. For an example of a less direct question, in one
passage, a person is said to be vegetarian, and a later question asks what foods
this person eats—choices include: hamburgers, hot dogs, salad, etc. Figure 6
contains a set of questions used in the study; Figure 7 shows the set of answer
choices shown to the participant as clip-art images for these questions.
To minimize the effect of variation in skills across participants, questions
focused on shallow comprehension of basic facts in the passages. Also, our
“comprehension” questions actually measure a mixture of the participant’s recall and comprehension of the passage. Participants were not allowed to watch
the passage more than once; so, they could not replay it to look for an answer.
(Participants could replay the questions.) Since ASL animations should both
be comprehensible and convey information memorably, we decided it was acceptable for our questions to measure both recall and comprehension. In future
work, we may study these two phenomena more independently.
Fig. 7. Clip art answer choices corresponding to the questions in Figure 6.

Statistical tests to be performed on the data were planned prior to data collection. To look for significant differences between scores for Grammaticality, Understandability, Naturalness, Speed, and Comprehension-task2 performance, a Kruskal-Wallis test was performed. Nonparametric significance
tests were used because the Likert-Scale response data was not known to be
normally distributed. In addition, tests for correlation were planned between
the major dependent variables (G, U, N, S, and C), and between these and various demographic variables (participant’s age, gender, the presentation order in which
individual animations were seen by that participant, etc.).
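These planned analyses map directly onto standard statistical routines; the SciPy sketch below illustrates them on invented score lists (these numbers are not data from the study).

    from scipy.stats import kruskal, pearsonr

    # Hypothetical per-response scores for two animation groups.
    understandability_pauses = [7, 8, 6, 9, 7, 8]
    understandability_no_pauses = [5, 6, 7, 5, 6, 4]

    h_stat, p_value = kruskal(understandability_pauses, understandability_no_pauses)
    print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_value:.3f}")

    # Correlation between two dependent variables measured on the same responses:
    comprehension = [0.9, 1.0, 0.6, 0.8, 0.7, 0.9]
    r_value, p_value = pearsonr(understandability_pauses, comprehension)
    print(f"Pearson's R = {r_value:.2f}, p = {p_value:.3f}")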
After viewing 12 ASL passages and answering the corresponding Likertscale and comprehension questions, participants gave open-ended feedback
about the animations. They were shown two pauses animations (that had been
processed by our two algorithms) while they gave their feedback to give them
something specific to comment about if they preferred. Participants were given
the option to sign their comments in ASL or to write them down in English
themselves.
5. RESULTS OF THE FIRST EVALUATION STUDY
To score the Comprehension task for each passage, we took the number of correctly circled pictures and subtracted 25% of the number of incorrectly circled pictures. This difference was then divided by the highest possible score for that question (to enable comparisons across questions with different numbers of correct answers).
2 Capitalized terms (Grammaticality, Understandability, Naturalness, Speed, and Comprehension) refer to response values collected in our experiment; lower-case terms refer to the general meaning of these words. This distinction is particularly important for speed/Speed: the capitalized term refers to the 21-point Likert-scale value collected in the study, but the lower-case term refers to the signs-per-minute speed of the animations.
Fig. 8. Graph of comprehension task scores.
Fig. 9. Graph of Likert-Scale scores for subjective responses from the first evaluation study.
Figure 8 shows the average Comprehension task score for the
no-pauses vs. pauses groups; the white bar represents the average across all
responses for that group, and the shaded bars represent the values for each
speed subgroup (normal, fast, very fast). Tests for statistical significance were
performed between the pause and no-pause groups for each speed level and for
the combination of responses across all speeds. Statistically significant differences between pairs of compared values are marked with an asterisk.
Figures 9 and 10 show average Likert-scale responses for each of the six animation types for all three speeds. The graphs include white bars that indicate
the average value across 72 responses, while the shaded bars for each speed
level indicate the average of 24 responses for animations of that type (group ×
speed).
The values shown in Figure 9 (G, U, and N) are reported on a 1-to-10 Likert
scale. The perceived Speed value (S) shown in Figure 10 was reported on a
Fig. 10. Graph of the Likert-Scale score for perceived Speed from the first evaluation study.
Table II. Pearson's R-Values between Compared Values

              Under.   Natural.   Speed    Compre.
Gram.         0.687    0.537      0.506    0.406
Under.                 0.673      0.761    0.473
Natural.                          0.473    0.317
Speed                                      0.413
21-point scale: from 1 (“too fast”) to 10 (“perfect”) to 21 (“too slow”). Out of
the 144 S scores collected, only six scores were above 10 (range 11–13, median 11.5), all for normal-speed animations (3 for pauses animations, 3 for
no-pauses animations). Another important note about Figures 9 and 10 is that
significant differences are indicated only between pauses vs. no-pauses pairs
of animations.
This evaluation study was conducted to evaluate two hypotheses: (H1)
adding linguistically motivated pauses and variations in sign durations will
help ASL signers understand and remember information from these animations and (H2) these new ASL animations will appear more natural-looking to
signers. The use of our timing algorithms has led to a significant increase in
Comprehension task performance. This was true in the overall case (all speeds
combined), and it was also significant when only considering the normal-speed
data. We also see significant differences for the scores participants gave to
the animations for Understandability. Since the normal-speed animations
received perceived Speed scores closest to 10 (perfect), it is important that
our Comprehension and Understandability trends hold true in the normal-speed case. Our second hypothesis was not supported by the data; we did not
measure a significant difference in the Naturalness scores of animations in the
pauses vs. no-pauses groups.
Table II shows the correlations calculated between scores for G, U, N, S,
and C. All Pearson’s R-values in the table are statistically significant (p<0.05).
Understandability was the value best correlated with a participant’s success on
the Comprehension task for an animation, but it was not a strong correlation
(R=0.473). In an earlier study with ASL animations [Huenerfauth et al. 2008],
we also saw a low R-value between Understandability and Comprehension task
success. Low correlation between U and C suggests that asking participants
to subjectively rate the understandability of animations is no substitute for
directly measuring Comprehension task performance.
Correlations were also examined between the order in which each animation was shown to a participant and the evaluation scores. There was a slight
correlation between presentation order and reported Speed Likert-scale score
(R = 0.11) and between presentation order and Comprehension task success
(R = 0.15); however, neither was statistically significant. Since we presented
the passages in this study in randomized order for each participant, the practice effect would have minimal impact on the results.
There were weak but significant negative correlations between a participant’s age and their scores for Grammaticality (R = −0.26), Understandability
(R = −0.22), and Naturalness (R = −0.29). There was no significant correlation
between age and perceived Speed nor between age and Comprehension-task
success. So, older participants rated animations more critically but this did
not lead to differences in their rating of the animation’s perceived Speed nor
success at the Comprehension task. No age-related differences were observed
in Comprehension scores for pauses vs. no-pauses.
Most feedback comments from participants were on aspects of the animation inherent to SignSmith or specific passages (quality of facial expression,
movement of individual signs, geographic or regional specificity of some signs,
“stiffness” of the character’s torso, or limited association of entities with points
in 3D space for pronouns). For example, eight of the twelve participants mentioned that the animated character should have more facial expressions during
the signing performance. Three participants felt that the animated character
should have more eye gaze movements, while one commented that the eye
gaze movements were already quite good. One participant commented that
the transition movements between pairs of signs should be smoother looking.
Interestingly, none of the participants commented on the presence or absence of pauses. This feature of the animations did not seem to draw their
overt attention. Some participants’ comments were relevant to the timing.
For example, nine participants mentioned that the animations at the normal speed were still too fast, but three felt the speed was OK. Three participants
felt that the fingerspelling was relatively too fast compared to other signs. In
the normal-speed animations, fingerspelling occurred at a rate of 4.1 letters
per second with a 0.8 second hold at the end.
6. SECOND EVALUATION STUDY
We conducted a follow-up evaluation study to address some additional research
questions raised by the results above. Specifically, since all three speeds of
animations in the first study were judged to be too fast by participants, we
wanted to ask participants to evaluate animations at slower speeds—with a
goal of identifying an ideal speed at which the animations should be displayed.
We not only wanted to find out how participants’ subjective Likert-scale ratings
of the perceived Speed of the animations would vary, but we also wanted to
Fig. 11. Comprehension scores plotted by signs/second.
see what levels of Comprehension scores participants would achieve on these
slower animations (since speed of presentation had a significant impact on
Comprehension task scores in the first study).
In addition to displaying animations at slower speeds, we wanted to address
a limitation in the design of our first study. Specifically, there are two ways to
define the speed of an animation with pauses: the signs per minute of the original animation or the recalculated signs per minute taking into account the
additional time from pauses. In the first evaluation study, we used the first
definition to label animations as normal, fast, or very fast; so a normal-speed
animation without pauses was shorter than the corresponding normal-speed
animation with pauses. This means that the pause-insertion process added
additional time to an animation. Because we observed a connection between
speed of animation and Comprehension scores in our first evaluation study,
this makes it difficult to determine the true benefit of inserting pauses into animations. It was not clear whether the improvement in Comprehension scores
that we observed in the first study was due to the fact that the pauses were
in linguistically appropriate locations. Alternatively, the increased Comprehension scores could have merely arisen from the additional time added to the
animation through the pause-insertion process.
To better understand this issue, we plotted the Comprehension scores collected during our first evaluation study according to the total number of signs
divided by total time (including pauses)—see Figure 11. If the only benefit of
adding pauses had been from the change in total animation time alone, then we
would expect these two lines to coincide. Because they do not, it appears that
placement of pauses at linguistically appropriate locations has a benefit beyond just an overall increase in the animation time. While Figure 11 suggests
that pause-insertion has a benefit beyond merely slowing down the animation,
to truly understand whether the location of pauses was significant, we decided
to compare the following types of animations in our second evaluation study:
(1) animations without pauses and (2) animations with pauses that have later
been sped up so that they have the same total time as the original no-pauses
animation.
6.1 Differences between First and Second Evaluation Studies
There are three major ways in which our second evaluation study differed from
the first. The second study included 18 new participants (as opposed to only 12
participants in the initial study). In the data from our first study, we noticed
many trends in the values for Comprehension, Grammaticality, Naturalness,
and perceived Speed that did not reach statistical significance. For this reason,
we wanted to include a larger number of participants in this second study.
In our first study, we also noticed that participants rated all three speeds
of animations as being too fast (on the Likert scale). In the second study, we
decided to evaluate animations at slower speeds—with a goal of identifying an
optimal speed for displaying animations of ASL. In the first study, we tested
animations at speeds of 3.0, 2.25, and 1.5 signs per second (before inserting
pauses into the animations). In our second study, we evaluated animations
at: 1.5, 1.2, and 0.9 signs per second. We will refer to these three speeds as:
normal, slow, and very slow. We decided to use the slowest speed from the first
study (the normal-speed animations) as the fastest speed in the second study
to more easily compare the results of the two studies.
The third difference between our first and second evaluation studies is that
after running our sign-duration and pause-insertion algorithms on the animations, we “normalized” the speed of the modified animations. Specifically,
we increased the speed of the animations into which pauses had been added
so that the animation with pauses had the same total time duration as the
animation without pauses. Because we sped up the animations after inserting pauses, we expected to see a smaller difference in Comprehension scores
between the pauses and no-pauses animations in our second study. (While
adding pauses appeared to increase Comprehension in the first study, increasing speed seemed to decrease Comprehension—thus, these two factors may
somewhat counteract each other in the second study.) Thus, we anticipated
that the design of our second evaluation study would be a more rigorous test
of the benefit of our pause-insertion and sign-duration algorithms.
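A sketch of this normalization step, using the same hypothetical per-sign record format as the earlier sketches: every timing value in the pauses version is scaled by a single factor so that the two versions end up with equal total duration.

    def normalize_total_time(pauses_script, no_pauses_total):
        # Uniformly speed up a pause-inserted script so its total duration
        # matches that of the corresponding no-pauses animation.
        total = sum(s["transition"] + s["duration"] + s["hold"] for s in pauses_script)
        factor = no_pauses_total / total  # < 1.0, since pauses lengthened the script
        for s in pauses_script:
            s["transition"] *= factor
            s["duration"] *= factor
            s["hold"] *= factor
        return pauses_script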
6.2 Design of the Second Evaluation Study
Aside from the differences identified above, the design of our second evaluation study was identical to the first. The same ASL passages and comprehension questions were used, the same pause-insertion and sign-duration algorithms were applied to the animations, and a similar recruitment procedure was undertaken. There were 18 participants in the study (12 men and 6 women), aged 21 to 39 (median age 29.5). Of the 18 participants, 8 grew
up with parents who used ASL at home, 9 began using ASL in a school setting before the age of 8, and 1 began using ASL through another circumstance.
Of our 18 participants in the second study, 8 had a significant other who is
deaf/Deaf, 15 used ASL as the primary language in their home, 17 used ASL
at work, and all 18 had attended a college where instruction was primarily
in ASL. We formulated two hypotheses prior to this second evaluation study: (H3) Despite speeding up the animations processed by our sign-duration and pause-insertion algorithms, we will still see a significant improvement in the Comprehension task scores, as compared to the Comprehension task scores for the animations that had not been processed by our algorithms. (H4) Based on the trend in the perceived Speed Likert-scale scores from the first study, we expect that participants will prefer animations displayed more slowly than normal speed, perhaps at approximately 1.2 signs per second.

Fig. 12. Percentage of correct responses to the comprehension questions about the animations in the second evaluation study.
6.3 Results of the Second Evaluation Study
Figure 12 shows the Comprehension task scores for the animations in the second evaluation study. Considering the combined data from all three speeds,
we see a statistically significant difference between the Comprehension scores
on the pauses animations and the no-pauses animations (p<0.05). This result is noteworthy because, having increased the speed of the animations after running our sign-duration and pause-insertion algorithms on them, we had expected the Comprehension scores for the unprocessed and processed animations to be more similar than in the first study; the second study was thus a more rigorous test of the two algorithms. Hypothesis H3 was therefore verified.
The results of this second study indicate that there is a benefit to processing ASL animations with our pause-insertion and sign-duration algorithms—a
benefit that goes beyond merely inserting additional time into the animations.
Thus, given a fixed amount of time to display an ASL animation, it is actually beneficial to increase the overall movement speed of the virtual character so that time can be allocated for pauses during the performance. The positive
effect of pause-insertion outweighs the negative effect of speeding up the animation to normalize the total time duration. This result also suggests that
our sign-duration and pause-insertion algorithms may allow us to display
ASL animations at a faster overall speed (while still preserving their understandability), thereby addressing the speed-bottleneck to information access discussed in Section 1.1.

Fig. 13. Graph of Likert scale scores for grammaticality, understandability, and naturalness-of-movement from the second study.
Comparing the Comprehension task scores for the animations at each speed
level individually (normal, slow, very slow), we did not observe statistically
significant differences in Comprehension scores. While not significant, the Comprehension task scores for the pauses animations were higher than those for the no-pauses animations. We also observed a trend in the Comprehension task
results that was present in the first evaluation study: animations displayed at
slower speeds had higher Comprehension task scores.
Figure 13 displays the Grammaticality, Understandability, and Naturalness
values for the animations in our second study. The results for Understandability show a somewhat similar trend to those in the first study (Figure 9). That
is, the Understandability scores for the pauses animations appeared higher than those for the no-pauses animations, and the slower animations received higher Understandability scores. However, the differences between groups are very small; in fact, the differences between the pauses and no-pauses animations in Figure 13 were not statistically significant at the p < 0.05 level. The results for Grammaticality and Naturalness in this second evaluation study were too similar for us to identify trends. It is possible that the trends identified in the Grammaticality and Naturalness scores in the first evaluation study arose largely from the differences in signing rate between the groups of animations. Because we normalized the total time duration of the animations after running our sign-duration and pause-insertion algorithms in this study, the Grammaticality and Naturalness effects seen in the first study may have disappeared.
Figure 14 displays the perceived Speed values for the animations in the second study. The trend is similar to that in the first evaluation study (see Figure 10). However, the differences between the pauses and no-pauses animations in Figure 14 are not statistically significant at the p < 0.05 level.
Fig. 14. Graph of the Likert-Scale score for perceived speed from the second evaluation study.
Table III. Results for the Normal-Speed Animations with “No-Pauses” (i.e., not Processed by Our Two Algorithms) from the First and Second Study.

Evaluation Criterion                     Normal No-Pauses     Normal No-Pauses
                                         (1st study)          (2nd study)
Comprehension Task Score                 0.20                 0.34
Grammaticality Likert-Scale Score        6.85                 7.00
Understandability Likert-Scale Score     5.81                 5.78
Naturalness Likert-Scale Score           5.31                 4.81
Perceived Speed Likert-Scale Score       7.44                 7.17
To understand how well the results of our first and second study can be
compared, we can examine the results for normal-speed animations without
pauses. (These animations were the only ones shown in both the first and the
second studies.) If we see similar results for Comprehension, Grammaticality,
Understandability, Naturalness, and perceived Speed, then we can conclude
that the comprehension skill and subjective judgments of participants in the
two studies were similar. Table III compares the scores for the normal-speed
no-pauses animations in the first and second study. While the comprehension score is somewhat higher in the second study, none of the differences in the table are statistically significant at the p < 0.05 level (Mann-Whitney U-test).
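For readers who wish to replicate such comparisons, the sketch below shows how this kind of test can be run with the SciPy implementation of the Mann-Whitney U-test; the score lists are hypothetical placeholders, not the raw per-participant data from our studies.

from scipy.stats import mannwhitneyu

# Hypothetical per-participant Comprehension scores for the two studies.
study1_scores = [0.1, 0.2, 0.2, 0.3, 0.1, 0.3]
study2_scores = [0.2, 0.4, 0.3, 0.5, 0.3, 0.4]

# Two-sided test of whether the two samples differ in distribution.
u_stat, p_value = mannwhitneyu(study1_scores, study2_scores, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")  # a difference is significant if p < 0.05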
6.4 Determining Optimal Speed for ASL Animations
An additional motivation for conducting this second evaluation study was to
help identify an optimal speed for displaying animations of American Sign
Language. In our first study, participants reported that all three speeds of
animations (very fast, fast, and normal) were too fast. Participants’ responses
on the 21-point Likert scale for perceived Speed are our primary source of data
about their opinion of the speed of the displayed animations; however, even if participants report that they prefer a certain speed of animation, it is also important to consider how well they understand animations at that speed. Table IV shows the perceived Speed Likert-scale scores and Comprehension task scores, arranged according to the signs per second of each animation.

Table IV. Perceived Speed Likert-Scale Scores and Comprehension Task Scores for Animations from Both Evaluation Studies, Sorted According to Signs-per-Second.

Type of Animations:                      Average       Perceived Speed      Comprehension
Speed, Pauses/No-Pauses (Study)          Signs-Per-    Likert-Scale Score   Task Score
                                         Second        (10 = perfect)
Very Fast, No-Pauses (1st study)         3.0           2.60                 0.01
Very Fast, Pauses (1st study)            2.63          3.80                 0.17
Fast, No-Pauses (1st study)              2.25          5.21                 0.13
Fast, Pauses (1st study)                 1.98          6.56                 0.35
Normal, No-Pauses (1st study)            1.5           7.44                 0.20
Normal, No-Pauses (2nd study)            1.5           7.17                 0.34
Normal, Pauses, Sped-Up (2nd study)      1.5           7.88                 0.44
Normal, Pauses (1st study)               1.32          8.04                 0.52
Slow, No-Pauses (2nd study)              1.2           9.83                 0.36
Slow, Pauses, Sped-Up (2nd study)        1.2           9.51                 0.52
Very Slow, No-Pauses (2nd study)         0.9           11.4                 0.48
Very Slow, Pauses, Sped-Up (2nd study)   0.9           11.0                 0.56
If we focus on the pauses animations (those processed by our sign-duration and pause-insertion algorithms), we see that signers prefer animations displayed at a rate between 1.2 and 0.9 signs per second. (Given that the Speed Likert-scale score is 9.51 for the 1.2 signs per second animations and 11.0 for the 0.9 signs per second animations, the ideal value is likely around 1.1 signs per second.) In future work, we may want to repeat this study with animations at speeds between 1.2 and 0.9 signs per second to determine the ideal display speed more precisely. While participants seemed to prefer the speed of the 1.2 signs per second animations over the 0.9 signs per second animations, they actually had better Comprehension scores at the slower speed. There may therefore be a trade-off between participants' satisfaction with the speed of an ASL animation and their Comprehension success. Depending on the length of the message being conveyed and the importance of accurate understanding, a specific speed value can be selected for a given application of ASL animations. Thus, our hypothesis H4 was supported: signers preferred animations displayed more slowly than the normal-speed animations.
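The approximate 1.1 signs-per-second estimate can be reproduced with a simple linear interpolation between the two sped-up pauses data points in Table IV; this is only a rough sketch, assuming the perceived Speed score varies linearly with signing rate between the two measured points.

def speed_at_perfect_score(pt_a, pt_b, target=10.0):
    """Linearly interpolate between two (signs_per_second, speed_score)
    points to find the rate at which the score would equal `target`."""
    (s1, r1), (s2, r2) = pt_a, pt_b
    slope = (r2 - r1) / (s2 - s1)      # score change per sign-per-second
    return s1 + (target - r1) / slope  # rate where the line crosses `target`

# Data points from Table IV: (1.2 sps, 9.51) and (0.9 sps, 11.0).
print(round(speed_at_perfect_score((1.2, 9.51), (0.9, 11.0)), 2))  # -> 1.1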
7. CONCLUSIONS AND FUTURE WORK
This project has identified a way to improve the comprehension of ASL animations by modifying two parameters that have previously received little systematic research attention: the insertion of linguistically appropriate pauses and
the modification of sign durations based on surrounding linguistic factors. This
is the first study to show that signers perceive a difference in the quality of
ASL animations based on these speed and timing factors. While this study
focused on ASL, these techniques should be applicable to animations of other
sign languages around the world. Other contributions of this work include:
(1) a prototyping/experimental framework for measuring comprehension of
ASL sentences (so that additional factors can be explored in later work),
(2) collection of empirical data on speed preferences and comprehension rates
of ASL animations by native ASL signers, and (3) motivation for future computational linguistic work on ASL.
7.1 Practical Issues for Making Use of Our Results
The results of our two evaluation studies indicate an optimal speed for displaying animations of ASL, and this result may be of interest to other accessibility
researchers who wish to use ASL animations in various applications. However,
it is important to consider that the perceived Speed Likert-scale preferences of
signers may depend on a wide range of factors specific to our experimental
set-up:
1. The apparent size of the animation: the size of the computer screen, the
portion of the screen occupied by the animation, and the distance of the
human participant from the computer screen.
2. The “camera angle” of the animation: the 3D perspective from which the animation of the virtual human is rendered. For example, it may be important for the animation to display the virtual human character at “eye level”
as opposed to a camera angle that displays the virtual human from above
or below.
3. The task that the participant is asked to perform when viewing the animation: whether the participant needs to remember detailed information from the animation, categorize the overall topic of the animation, or merely recognize the presence of specific words in the animation, etc.
4. The domain and genre of the information conveyed by the ASL animation:
whether the animation conveys technical content, news stories, encyclopedic information, conversation, etc. Also, the familiarity of the specific participant
with the topic of discussion may affect their comprehension of the content.
5. The visual appearance of the virtual human character in the animation: the
facial features, the skin coloring, the background color, the clothing color
and pattern, and various other specific visual aspects of the virtual human’s
appearance may affect how easy it is for someone to see its movements.
6. The geographic- or culture-specific dialect of sign language used in the animation: if the variety of sign language performance includes vocabulary
or colloquial phrases that are unfamiliar to the participant, then it may be
more difficult to understand.
7. The “register” of the virtual human character’s sign language performance:
whether the character is making large signing movements (as if giving a
speech in front of a large audience) or making small movements (as if “whispering” a sign language sentence so that it is discreetly conveyed).
8. The context and environment in which the sign language animation is being viewed: whether the environment is distracting, whether there are other
activities occurring around the participant, whether the participant’s attention is being diverted while watching the sign language animation, etc.
9. The length of the animation being conveyed: For short messages in which
comprehension is very important, it may be better to display animations
more slowly. For longer messages, participants may not tolerate slower
animations.
We would expect that differences in the factors above could affect the perceived Speed Likert-scale preferences and Comprehension scores of participants. Thus, we present our speed-related results in this article
not as a one-size-fits-all “ideal speed” but as a useful starting point for other
researchers to consider the optimal speed for displaying sign language animations in their own applications.
As the state of the art in generating computer animations of ASL improves over time, participants may come to prefer animations at faster speeds. We expect future improvements in the quality of ASL facial expressions, use of eye-gaze during signing, and use of 3D space around the signer to represent entities under discussion; such improvements may lead to computer animations of ASL that more closely resemble the movements of human ASL signers, and viewers may then be more willing to watch animations at faster, more human-like speeds.
A further practical consideration for future researchers who wish to make
use of our sign-duration and pause-insertion algorithms is that the algorithms
require some linguistic information about the sentence in order to operate.
Currently, for our algorithms to be used in a real-world setting as part of a
tool like SignSmith, the user (who is scripting an ASL animation) must supply
some information: (1) clause/sentence boundaries, (2) part of speech of each
sign, (3) syntactic role of each noun, and (4) a syntactic parse tree for each
sentence. Section 7.2 discusses how in future work we may explore computational linguistic techniques for automatically identifying this information in
ASL animations.
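As one illustration, the sketch below shows how this user-supplied linguistic information could be represented per sign and per sentence; the class and field names are hypothetical and do not reflect SignSmith's actual data format.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SignAnnotation:
    gloss: str                     # e.g., "STATE"
    pos: str                       # part-of-speech tag for the sign
    syntactic_role: Optional[str]  # e.g., "subject" or "object" for nouns; None otherwise
    clause_final: bool = False     # True if this sign ends a clause/sentence

@dataclass
class SentenceAnnotation:
    signs: List[SignAnnotation]          # signs in performance order
    parse_tree: Optional[object] = None  # syntactic parse tree for the sentence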
While we were generally pleased with our experience using the SignSmith
software in our experiments, we did notice one limitation that caused some
difficulty when inserting pauses between signs in the ASL animations. An
ASL performance consists of periods of time during which the signer is doing
one of the following actions: (1) performing the basic movement of an individual sign, (2) making a transitional movement from one sign to the next,
(3) pausing his/her hands in between signs, or (4) holding his/her hands in a
resting position. (Of course, the boundary between these time periods can become somewhat blurred because the motion of one ASL sign can affect the performance of an adjacent sign.) We would prefer for the dictionary entries of a sign language scripting system to store the animation performance of “(1)” only: the basic movement of each individual sign.
Fig. 15. Three still images from the animation of the sign STATE from the animations produced
using SignSmith. During this sign, the character moves the right hand in a closed-fist handshape
down across the open palm of the left hand. In image (a) on the left and image (b) in the center,
the beginning and end of the main movement of the sign is shown. Image (c) on the right shows
how the signer’s hand moves after completing the performance of the sign. If a pause is inserted
after the sign STATE using the SignSmith scripting software, then the character pauses in the
configuration shown in image (c). A human ASL signer would have paused in the configuration
shown in image (b).
However, some of the signs in the SignSmith dictionary included an
additional movement of the signer’s hands at the end of the sign that would
normally be considered part of “(2)”—the transitional movement of the hands
between signs. Specifically, the animation-movement information for some
signs in SignSmith’s dictionary included a movement of the hands back to a
more neutral position after the main portion of the sign was completed.
For example, during the sign STATE, the signer moves the right hand in
a closed-fist handshape downward across the forward-facing open palm of the
other hand. See Figure 15(a), (b). However, the SignSmith dictionary entry
for STATE includes the movement of the hands shown in Figure 15(a), (b), (c).
The likely reason why the creators of the SignSmith system have included this
extra movement information at the end of the sign is that it tends to produce
more natural-looking transitional movements of the hands between signs. By
specifying how the animated character’s hands should move “out of” the final
position of a sign, the system can produce a more graceful and natural-looking
performance when this sign blends into a following sign. If we were using sign
language scripting software without inserting pauses into the animation, then
there is no problem with having this extra movement data at the end of the
sign. However, if we try to insert a pause after the sign STATE, then the animated character will perform the sign, begin to retract the hands somewhat toward a more neutral position, pause midway through this transitional movement, and then resume the transitional movement toward the next sign. Thus, the pause occurs at an odd location in the performance, midway through the transition between the end of STATE and the beginning of the next sign; the character pauses in the configuration of Figure 15(c). Ideally, the pause should occur at the end of the main movement of STATE, with the character pausing as in Figure 15(b).
While we happened to notice extra movement data at the end of over a dozen
signs in the SignSmith system’s dictionary, this issue is certainly not specific
to this system. To improve the naturalness of the performance, designers of
dictionaries of signs for any sign language scripting system may decide to include some transitional movement data at the end of some signs. For future
researchers who wish to make use of our pause-insertion software, we see two
solutions to this concern: (1) the dictionary of the sign language scripting system should be edited so that extra “back to neutral position” movement data
is not part of the dictionary entry for a sign, or (2) the dictionary entry for
each sign must specify the timestamp during the movement data when the
main performance of the sign is finished and the transitional movement begins (this is where a pause could be inserted at the end of the sign). While
less ideal, a quick fix would be to identify the subset of signs that include this
extra movement data at the end of their dictionary entries and then modify
the pause-insertion software so that it does not attempt to insert pauses after
such signs in the animation.
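The sketch below illustrates solution (2) under a hypothetical dictionary format (SignSmith's actual entries are structured differently): each entry records the timestamp at which the sign's main performance ends, so pause-insertion software can freeze the character in the configuration of Figure 15(b) rather than mid-retraction.

from dataclasses import dataclass

@dataclass
class DictionaryEntry:
    gloss: str
    total_duration: float  # length of the stored movement data, in seconds
    main_end_time: float   # when the main movement ends and retraction begins

def pause_insertion_time(entry: DictionaryEntry) -> float:
    """Return the timestamp at which a post-sign pause should freeze the character."""
    # Pause at the end of the main movement, before any stored retraction data.
    return min(entry.main_end_time, entry.total_duration)

state = DictionaryEntry(gloss="STATE", total_duration=0.9, main_end_time=0.7)
print(pause_insertion_time(state))  # 0.7: pause in the Figure 15(b) configuration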
7.2 Future Work
This study has shown that ASL animations can be made more understandable
if we apply sign-duration and pause-insertion algorithms to the animations.
However, these algorithms require some linguistic information: a syntactic
parse of the sentence and a part of speech for each sign. Therefore, there is
motivation for future research in the design of tools to calculate this linguistic information automatically. It may be possible to develop automatic natural
language processing tools to assign part of speech tags to signs in a sentence
and provide some form of syntactic parse of the sentence that has been scripted. If our sign-duration and pause-insertion algorithms
were used as part of an ASL animation generator (that automatically plans
an ASL sentence—as in the final step of an English-to-ASL machine translation system), then the linguistic data required by our algorithms would have
already been calculated by the generator as part of its planning work.
In order to build the sign-duration and pause-insertion algorithms for this
study, we had to select values for some numerical parameters based on numbers obtained from the linguistics literature: the percentage of time that a
signer should spend pausing (17%), the percentage of sign boundaries at which a pause should occur (25%), the amount to lengthen the duration
of signs before sentence boundaries (12%), etc. In future work, it would be useful to empirically verify that the values we selected for these parameters are
actually the ideal settings for ASL computer animations. While it is a useful
starting point to produce ASL computer animations whose speed and pausing are based on the behavior of human ASL signers, it may be the case that
computer animations are even more understandable with different settings of
these parameters.
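For convenience, these literature-derived settings can be collected in a single configuration structure so that follow-up experiments can vary them; the parameter names below are our own invention, while the values are those used in this study.

# Parameter settings drawn from the ASL linguistics literature, as used in this study.
TIMING_PARAMETERS = {
    "pause_time_fraction": 0.17,         # fraction of total time a signer spends pausing
    "pause_boundary_fraction": 0.25,     # fraction of sign boundaries that receive a pause
    "sentence_final_lengthening": 0.12,  # lengthening of signs before sentence boundaries
}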
In future work, we also plan to use the experimental techniques developed
in this study for measuring sentence comprehension to evaluate animations
with different variations in timing variables—for example, we may modify the
relative speed of fingerspelling vs. other signs. ASL psycholinguistic studies
have also suggested that the overall speed of a performance may affect where
pauses are inserted (and their length); these studies have also identified syntactic, discourse, and emotional features [Grosjean 1979] that affect the speed and timing of ASL. We may modify our sign-duration and pause-insertion algorithms to incorporate these other factors. Finally, it was an earlier study of
ours on ASL classifier-predicate sentences that first prompted us to explore
the issue of ASL speed and timing [Huenerfauth et al. 2008]; so, we also plan
to develop timing algorithms for those ASL constructions.
When researchers like Grosjean and Lane described patterns in ASL sign-durations almost thirty years ago [Grosjean et al. 1981], they were analyzing
the data as a matter of linguistic interest. It would have been impossible to
predict the development of computer and animation technology that has made
the generation of ASL animations possible today—opening up a new application for their research. The promising initial results of this study open the
question of whether there may be additional published linguistic research on
ASL that can be applied to current research on ASL animation. This investigation paradigm (of seeking inspiration from the ASL linguistics literature,
creating prototype systems to explore the potential benefits of new algorithms,
and conducting controlled user experiments to measure any benefit) may lead
to additional advancements in the state-of-the-art of ASL animation.
ACKNOWLEDGMENTS
Allen Harper and Julian Bayona assisted with data analysis and the
preparation of experiment materials for the first evaluation study. Jonathan
Lamberton recruited participants and organized experimental sessions for the
second evaluation study.
REFERENCES
Bellugi, U. and Fischer, S. 1972. A comparison of sign language and spoken language. Cognition 1, 2–3, 173–200.
Chiu, Y.-H., Wu, C.-H., Su, H.-Y., and Cheng, C.-J. 2007. Joint optimization of word alignment and epenthesis generation for Chinese to Taiwanese sign synthesis. IEEE Trans. Patt. Anal. Mach. Intell. 29, 1, 28–39.
Elliott, R., Glauert, J., Kennaway, J., Marshall, I., and Sáfár, E. 2008. Linguistic modeling and language-processing technologies for avatar-based sign language presentation. Univ. Acc. Inform. Soc. 6, 4, 375–391.
Fischer, S., Delhorne, L., and Reed, C. 1999. Effects of rate of presentation on the reception of American Sign Language. J. Speech Lang. Hear. Res. 42, 568–582.
Fotinea, S.-E., Efthimiou, E., Caridakis, G., and Karpouzis, K. 2008. A knowledge-based sign synthesis architecture. Univ. Acc. Inform. Soc. 6, 4, 405–418.
Grosjean, F. 1979. A study of timing in a manual and a spoken language: American Sign Language and English. J. Psycholinguist. Res. 8, 4, 379–405.
Grosjean, F., Grosjean, L., and Lane, H. 1979. The patterns of silence: Performance structures in sentence production. Cogn. Psychol. 11, 58–81.
Grosjean, F., Lane, H., Teuber, H., and Battison, R. 1981. The invariance of sentence performance structures across language modality. J. Exper. Psych. Hum. Percep. Perform. 7, 216–230.
Heiman, G. and Tweney, R. 1981. Intelligibility and comprehension of time compressed sign language narratives. J. Psycholinguist. Res. 10, 1, 3–16.
Holt, J. A. 1993. Stanford Achievement Test, 8th Edition: Reading comprehension subgroup results. Amer. Ann. Deaf 138, 172–175.
Huenerfauth, M. 2006. Generating American Sign Language classifier predicates for English-to-ASL machine translation. Dissertation, Computer and Information Science, University of Pennsylvania.
Huenerfauth, M. 2008a. Misconceptions, technical challenges, and new technologies for generating American Sign Language animation. Univ. Acc. Inform. Soc. 6, 4, 419–434.
Huenerfauth, M. 2008b. Evaluation of a psycholinguistically motivated timing model for animations of American Sign Language. In Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS'08), 129–136.
Huenerfauth, M. 2008c. A linguistically motivated model for speed and pausing in animations of American Sign Language. Linguistic and Assistive Technologies Laboratory website. http://latlab.cs.qc.cuny.edu/taccess2009/. (Accessed 11/30/08.)
Huenerfauth, M., Zhou, L., Gu, E., and Allbeck, J. 2008. Evaluation of American Sign Language generation by native ASL signers. ACM Trans. Access. Comput. 1, 1, Article 3.
Karpouzis, K., Caridakis, G., Fotinea, S.-E., and Efthimiou, E. 2007. Educational resources and implementation of a Greek sign language synthesis architecture. Comput. Educ. 49, 1, 54–74.
Kennaway, J., Glauert, J., and Zwitserlood, I. 2007. Providing signed content on the Internet by synthesized animation. ACM Trans. Comput.-Hum. Interact. 14, 3, Article 15.
Liddell, S. 2003. Grammar, Gesture, and Meaning in American Sign Language. Cambridge University Press, Cambridge, UK.
Marshall, I. and Sáfár, E. 2005. Grammar development for sign language avatar-based synthesis. In Proceedings of the 11th International Conference on Human-Computer Interaction, C. Stephanidis, Ed. Lawrence Erlbaum Associates, Mahwah, NJ.
Mitchell, R., Young, T., Bachleda, B., and Karchmer, M. 2006. How many people use ASL in the United States? Why estimates need updating. Sign Lang. Stud. 6, 3, 306–335.
Morrissey, S. and Way, A. 2005. An example-based approach to translating sign language. In Proceedings of the Workshop on Example-Based Machine Translation, 109–116.
Neidle, C., Kegl, J., MacLaughlin, D., Bahan, B., and Lee, R. G. 2000. The Syntax of American Sign Language: Functional Categories and Hierarchical Structure. The MIT Press, Cambridge, MA.
Sandler, W. and Lillo-Martin, D. 2006. Sign Language and Linguistic Universals. Cambridge University Press, Cambridge, UK.
Sheard, M., van der Schoot, S., Zwitserlood, I., Verlinden, M., and Weber, I. 2004. Evaluation reports 1 & 2, European Union project Essential Sign Language Information on Government Networks.
Shionome, T., Kamata, K., Yamamoto, H., and Fischer, S. 2005. Effects of display size on perception of Japanese sign language: Mobile access in signed language. In Proceedings of the 11th International Conference on Human-Computer Interaction, C. Stephanidis, Ed. Lawrence Erlbaum Associates, Mahwah, NJ.
Stein, D., Bungeroth, J., and Ney, H. 2006. Morpho-syntax based statistical methods for sign language translation. In Proceedings of the Conference of the European Association for Machine Translation, 169–177.
Tartter, V. and Fischer, S. 1983. Perceptual confusion in ASL under normal and reduced (point-light display) conditions. In Language in Sign: An International Perspective on Sign Language, J. Kyle and B. Woll, Eds. Croom Helm, London, 215–224.
van Zijl, L. and Barker, D. 2003. South African sign language machine translation system. In Proceedings of the 2nd International Conference on Computer Graphics, Virtual Reality, Visualisation, and Interaction in Africa (Afrigraph'03). ACM Press, 49–52.
Vink, M. and Schermer, T. 2005. Report of research on use of an avatar compared with drawings or films of gestures. Tech. rep., Dutch Sign Language Center.
Wingfield, A., Tun, P., Koh, C., and Rosen, M. 1999. Regaining lost time: Adult aging and the effect of time restoration on recall of time-compressed speech. Psych. Aging 14, 3, 380–389.
Received November 2008; revised January 2009; accepted January 2009.