respoken - Andrew Duchowski

Evaluating the Effects of Text Chunking on Subtitling:
A Quantitative and Qualitative Examination
Dhevi R. Rajendran
Andrew T. Duchowski
COMPUTER SCIENCE
COMPUTER SCIENCE
RICE UNIVERSITY
CLEMSON UNIVERSITY
[email protected]
[email protected]
Juan Martínez
SPEECH RECOGNITION AND RESPEAKING
ZURICH UNIV. OF APP. SCIENCES WINTERTHUR
[email protected]
Abstract
Our work serves as an assay of the impact of text chunking on
subtitling for the deaf and hard-of-hearing (SDH). Specifically, we aim
to evaluate subtitles constructed with different chunking methods to
determine whether different text groupings influence comprehension or
affect the viewing experience as a whole. The results show that there
are indeed disparities in the eye tracking data among the four
subtitling methods tested.
Figure 4: Graphs showing means and standard errors for within-subject saccadic crossovers
(left), mean durations of fixations (middle), and percentage of gazepoints in the subtitles (right).
Discussion
Introduction & Background
This study aims to improve subtitles for the deaf and hard-of-hearing
(SDH) with principles of language processing. Text chunking, or the
grouping of a block of text into coherent segments (see Figure 1), has
been shown in other contexts, such as studies of short-term memory
(Baddeley 1993) to increase short-term information retention and
speed of information processing. We examine whether these findings
also apply to subtitles.
Though eye tracking over video clips is still a nascent technology,
much research has been done to improve subtitling. Most deals with
the styles, speed of display, and positioning of the subtitles. Matamala
and Orero (2010) explored preferences in colors, positions, and
display speeds of subtitles. This study tests an aspect of subtitles that,
to our knowledge, is as yet unexplored.
Figure 2: Screenshots of heatmaps overlaid on a single frame of the video clip with no
segmentation subtitles (top left), word by word subtitles (top right), subtitles chunked by
phrase (bottom left), and subtitles chunked by sentence (bottom right).
Results
• The results were analyzed along four metrics, both within-subjects
(comparing the four runs shown to each subject) and between-subjects
(considering only the data from the first video clip each subject
viewed). The metrics were mean durations of fixations, percentage of
fixations in the subtitles, percentage of gazepoints in the subtitles, and
number of saccadic crossovers (see Figure 3).
Figure 1: Subtitle chunked by phrase, as described in methodology.
Methodology
• 24 university students viewed four video clips in a unique order. The
clips had the same video, but different subtitles.
• The four subtitling methods were: no segmentation (the area for
subtitles was filled with as much text as possible); word by word
(words showed up one by one); chunked by phrase (phrases showed
up one by one); and chunked by sentence (sentences showed up
one by one). All clips had no audio to ensure information was taken
in visually (see Figure 2).
• Participants were instructed to watch the clip for content in both the
subtitles and videos. After viewing the first clip, they were given a
multiple choice quiz on the content.
• Participants filled out preference questionnaires after viewing each
clip and after they had viewed all four.
Figure 3: Example of a saccadic crossover. A saccadic crossover occurs when a gazepoint
in the subtitles is immediately followed by a gazepoint in the scene, or vice versa.
• Within subjects, ANOVA showed a significant difference (p < 0.01) in
saccadic crossovers. Pairwise t-tests (no correction) revealed a
marginally significant (p < 0.05) difference between the word by word
and chunked by phrase clips. There was also a marginally significant
difference (p < 0.05) between the word by word and chunked by
sentence clips (see Figure 4).
• Between subjects, a marginally significant difference (p < 0.05) was
found between the word for word and chunked by phrase in terms of
durations of fixations. There was a nearly significant difference (p =
0.06) between the word for word and chunked by phrase in terms of
percentage of gazepoints in the subtitles (see Figure 4).
• There were no significant differences in preference or comprehension.
School of Computing, Clemson University
The chunked by phrase subtitles seemed to provide the relative best
viewing experience, as they had the least number of crossovers and the
smallest percentage of gazepoints in the subtitles (meaning viewers spent
more time on the video itself), with no difference in comprehension from the
other four clips (and no significant difference in preference). Likewise, the
word by word subtitles seemed to provide the relative worst quality viewing
experience, as it had the most number of crossovers and the largest
percentage of gazepoints in the subtitles. The mean fixation duration is
indicative of the speed of processing information; therefore, the data imply
that the chunked by phrase style was the easiest to take in, and the word
by word was the most difficult.
Study Limitations
• Because the same clip was shown four times, fatigue or boredom cannot
be excluded as a confound of within-subject analysis.
• The comprehension quiz may have caused participants to view the
remaining three clips differently than they would have normally.
• The questionnaires asking for preferences may not have been reliable, as
some questions were slightly ambiguous, and the participants seemed to
interpret the questions differently.
• No subjects were deaf or hard-of-hearing, so some results may not fully
generalize to the deaf and hard-of-hearing populations.
Conclusion
• The study showed that different types of text groupings in subtitles
elicit different viewing behaviors.
• The significance of these differences would be of interest in future
research, to truly understand the effects of text chunking.
• A replication of this study with more participants may produce greater
significance by reducing the variance.
• A replication of this study with deaf and hard-of-hearing participants
would be a good way to confirm the results.
References
Matamala, Anna, and Pilar Orero. Listening to Subtitles. 1st ed. Bern, Switzerland:
International Academic Publishers, 2010.
Baddeley, Alan. The Psychology of Memory. 1st ed. New York, NY: New York
University Press, 1993.