Evaluating the Effectiveness of Height Visualizations for
Improving Gestural Communication at Distributed Tables
Aaron Genest
University of Saskatchewan
[email protected]
Carl Gutwin
University of Saskatchewan
[email protected]
ABSTRACT
In co-located collaboration, people use the space above the
table for deictic gestures, and height is an important part of
these gestures. However, when collaborators work at
distributed tables, we know little about how to convey
information about gesture height. A few visualizations have
been proposed, but these have not been evaluated in detail.
To better understand how remote embodiments can show
gesture height, we developed several visualizations and
evaluated them in three studies. First, we show that touch
visualizations significantly improve people’s accuracy in
identifying the type and target of a gesture. Second, we
show that visualizations of height above the table help to
convey gesture qualities such as confidence, emphasis, and
specificity. Third, we show that people quickly make use of
height visualizations in realistic collaborative tasks, and that
height-enhanced embodiments are strongly preferred. Our
work illustrates several designs for effective visualization
of height, and provides the first comprehensive evidence of
the value of height information as a way to improve
gestural communication in distributed tabletop groupware.
Author Keywords
Digital tables, deixis, gesture, gesture height, pointing
ACM Classification Keywords
H.5.3 Group and Organization Interfaces: CSCW.
INTRODUCTION
Gestural communication – particularly pointing as a deictic
reference – is ubiquitous when people work together on
tables in the real world. Pointing reduces the complexity of
verbal communication by allowing people to easily indicate
objects, areas, or paths through simple gestures [2]. When
collaboration happens on distributed tables, however, it is
much more difficult to remotely convey natural pointing
gestures. Although digital embodiments can be employed
(e.g., telepointers or video images), current techniques are
inadequate for conveying the subtleties of deictic gesture.
In particular, typical embodiments do not show the height
of a gesture above the table. People use height for several
things when they construct a pointing gesture: to make clear
the target of the gesture (e.g., putting a finger on the object
of interest); to indicate different types of gesture (e.g.,
indications of paths versus general areas); to show different
qualities in the gesture (e.g., a high gesture might be used to
indicate less specificity); or to indicate that the gesture
points to a location that is out of reach or off the table [6].
Without a representation of gesture height, pointing-based
gestural communication on distributed tables is prone to
errors, misunderstandings, and interpretation difficulty.
Researchers in tabletop groupware have recognized some of
these problems, and have suggested a few embodiment
designs that show at least some aspects of height (e.g.,
[12,19]). However, these designs are primarily concerned
with showing surface touches, rather than the full range of
height above the table, and their effectiveness has not been
evaluated in detail. In this paper, we investigate three
questions that need to be answered in order to better
understand the design of height visualizations:
• Accuracy: does height information improve people’s
ability to determine the type or target of a gesture?
• Expressiveness: can height visualizations reliably convey
qualities such as specificity, confidence, and emphasis?
• Usability: can people make use of height visualizations in
realistic work, and do they prefer these representations?
We answered these design questions with three studies.
The first experiment showed that representing touch and
hover significantly improves people’s ability to determine
both the target of the gesture and the type of gesture. The
second study showed that people use height visualizations
to interpret a gesture’s specificity, confidence, and
emphasis, and showed that these interpretations are
consistent with the ways that people see real-world
gestures. The third study looked at the ways that people use
height representations in realistic collaboration, and showed
that people quickly make use of the additional height
information in their deictic gestures, that the height-augmented embodiments caused no new usability problems,
and that the augmented versions were strongly preferred by
users.
This work makes two main contributions. First, we
demonstrate several visualizations for showing key
elements of gesture height, designs that refine and improve
on previous solutions. Second, we provide a wide variety of
evidence for the value of representing height in remote
tabletop embodiments: we are the first to show that height
visualizations can improve interpretation accuracy, can
improve the expressiveness of remote gestures, and can be
quickly learned and used in realistic collaboration. The
design information and empirical evidence that this work
provides can greatly help to improve the subtlety and
precision of gestural communication on remote tables.
PREVIOUS WORK
Gestures and Deixis
Researchers have identified gestural interaction, specifically
deixis or pointing gestures, as critical to the success of
collaborative environments (e.g. [2], [10], [13]). It follows
that distributed collaboration is hampered when users work
in environments where they are unable to see the gestures
made by remote collaborators. This can result in higher
levels of frustration in communication and lower-quality
results from collaboration [2]. Thus, supporting gestural
communication between collaborators is an important goal
for distributed groupware technologies. One way of
providing support for gestures is through embodiments –
representations of the participants in a distributed
collaboration. There has been a great deal of work done on
embodiments (e.g., [16,18,20]), but we are particularly
interested in embodiments for distributed tables.
Remote Embodiments on Tables and Surfaces
Several systems have provided visual representations of
remote users’ arms and bodies, from early video-based
tools such as VideoDraw [20], VideoWhiteBoard [21], and
ClearBoard [11] to more recent tabletop and large-display
groupware (e.g., [17], [18]). Embodiments in these systems
can be divided into two categories: abstract embodiments
such as telepointers [8], which represent the user with lines
and shapes, and realistic embodiments, which often employ
video techniques (e.g., [21], [19]). Each of these approaches
has particular strengths and weaknesses [6]. Although
realistic embodiments can be richer in their representation
of natural deictic gestures, abstract embodiments are better
able to indicate a specific touch point, or provide extra
information such as additional visualizations, that are not
available in 2D video (see Table 1). The question of
whether video or artificial representations are best has
received some attention, with some evidence for both sides:
a study by Kirk et al. [14] found that realistic video of
hands and arms was better than a representation of a stylus
drawing tool; but several projects have also shown or
implied that abstract visualizations are also useful [4,19].
In a comparison of abstract and realistic techniques, Genest
and Gutwin [6] identified the height of a gesture as one
component that video embodiments do not communicate
well. This finding is supported by Fussell et al., who found
that lack of information about gesture height was a barrier
to communication in collaborative environments [5]. These
problems exist partly because of the 2D representation of a
realistic embodiment, but there are also several practical
problems in representing height with video – for example,
higher gestures become much larger (since they get closer
to a top-mounted camera), and higher gestures are much
more likely to leave the camera’s view frustum.
Table 1. Representational capabilities of abstract and realistic
embodiments (adapted from [6]).

Component of Deictic Gesture        Abstract    Realistic
Differentiate similar gestures      No          Possible
Represent stroke gestures           Yes         No
Express full range of morphology    No          Possible
Represent height                    Possible    No
Genest and Gutwin found that height differences were
closely associated with variations in confidence, specificity,
and emphasis. They also observed height used to indicate
secondary references, to point at distant objects, and to
mirror the real-world height variations of an object. They
identified four important levels of height: on the surface,
moving between on and just off the surface, hovering off
the surface, and more than 5 cm above the surface [6].
Showing Gesture Height in Remote Embodiments
Previous work on embodiments has considered height in
two ways: as an aspect of gestural communication, or as a
way to provide feedthrough about others’ actions.
Height in Existing Embodiment Techniques
One of the earliest video embodiments, VideoWhiteBoard,
used real shadows cast on the display surface [21]. This had
the side effect of showing diffusion in the shadows as
people moved away from the surface, a cue that could be
used by remote participants to better understand people’s
locations. Later implementations of the idea used digital
rather than analogue shadows, and did not convey the same
kind of distance information [1].
Distance information can also indicate intention to make a
deictic gesture. Fraser et al. found that explicitly visualizing
the approach of participants’ pens toward a wallboard
significantly improved coordination in a video-annotation
task, and reduced conversational latency [4]. Both Fraser’s
work and VideoWhiteBoard are effective in showing
heights or distances larger than 5cm, but neither of these
projects explored a more complete design space for height.
Hilliges et al. [9] used the height from a surface as a
method of providing more sophisticated interaction with 3D
objects. They provided hand shadows as a feedback
technique, inverted so as to appear smaller when hands
were farther from the surface. As an interaction technique,
distance from the surface can be mapped, as they suggest,
to less engagement, and therefore less of a user footprint on
the surface. We consider this in our designs, detailed below.
More recently, Tang et al. enhanced the VideoArms
technique to show ‘touch pearls’ [19] or contact traces
when participants were touching the table surface, using an
effect similar to telepointer traces [7]. They mention that
proximity information was not well modeled in their
design, despite the improvement provided by the traces;
however, they note that the traces did provide a level of
awareness not available through the original VideoArms
design. No evaluation of the contact traces was carried out.
Height as Feedthrough
Height or distance from the surface can also provide
information about user activity, since surface touches are
often used to invoke commands in the system.
Visualizations of height are therefore used to give feedback
to the local user, and can also provide feedthrough to
remote participants [3]. The visualizations generally show
the difference between hover and touching states on touch
or pen interfaces (e.g., with the C-Slate interface [12], or
with Wigdor et al.’s ripple visualizations [22]). Although
work on feedthrough that incorporates height information
provides valuable insight into embodiment design, this
research has not investigated how including height affects
communication in distributed collaborations.
DESIGNING AN ENHANCED EMBODIMENT
Our review of related work identified three important
design factors in height representation for embodiments: the
difference between touch and hover, the relative height of
gestures above the surface, and the short-term history of
movement. To better evaluate the usefulness of height in
embodiments, we designed two components: one to
represent surface touches and the short-term history of
contact with the surface (contact traces) and one to
represent the relative height of a gesture above the table.
Since height is difficult to see in video-based embodiments,
our height visualizations are abstract rather than realistic;
they could, however, be added to a video-based
embodiment. There are also reasons for lower-fidelity or
abstract embodiments to stand alone, such as situations of
limited bandwidth or low computational power, or systems
that support high numbers of collaborators. We anticipate
that the designs below, supported by the research presented
in this paper, can be used in either context – as additions to
high-fidelity, video-based embodiments or as components
of high-efficiency, abstract embodiments.
Our designs build on previous work: traces and ripples are
based on work by Tang [19] and by Wigdor [22], and the
inspiration for our above-the-surface ellipses comes from
work by Tang on VideoWhiteboard [21]. However, our
designs are the first to comprehensively describe where
gestures are situated in the space above the table.
Showing Touch
We designed embodiments to convey the location of a touch, and the recent history of contact with the table.
Representing Touches on the Table
The change in gesture height between just off the surface and touching is an important state change [6]. To emphasize this difference, we dual-encoded touch state using both shape and colour. When the gesture is above the surface we represent the user’s pointer with an arrow (red and black for improved visibility on a variety of backgrounds). When a touch is detected, we add a crosshair shape under the arrow, and highlight the crosshair in green (see Figure 1). We used arrows and crosshairs instead of wider blob shapes (like those extracted from FTIR tables), since we were interested in comparing them to traditional telepointers. Differences in accuracy between the mouse arrow and a fingerprint-sized blob would be a confounding factor in evaluations.
Figure 1: Our table-touching designs: on the surface, colour and shape change (left); above the surface, plain (centre); and combined with contact traces as ripples (right).
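A minimal sketch of this dual encoding follows (all names and constants here are illustrative assumptions; we do not publish our implementation code). The point is simply that the touch state selects both the pointer’s shape and its highlight colour, so the state change is redundantly encoded:

```python
from dataclasses import dataclass

@dataclass
class PointerGlyph:
    shape: str       # "arrow" above the surface; crosshair added on touch
    highlight: str   # crosshair highlight colour when touching

def touch_glyph(is_touching: bool) -> PointerGlyph:
    """Dual-encode the touch/hover state change with shape AND colour,
    so the transition stays noticeable on busy backgrounds."""
    if is_touching:
        # On contact: a crosshair appears under the arrow, highlighted green.
        return PointerGlyph(shape="arrow+crosshair", highlight="green")
    # Above the surface: a red-and-black arrow, no crosshair.
    return PointerGlyph(shape="arrow", highlight="none")
```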
Showing the Touch Path with Contact Traces
Traces of interactions improve the interpretation of
distributed gestures [19], and paths are common elements
of gestures over tables [6], so we added a visualization of
the path of a touch gesture, or contact traces.
We designed two variants of the trace visualization. The
first showed contact traces in the form of ripples (Figure 2,
top), based on the previous success of ripples as touch
feedback [22]. The second version shows simple sketch
lines along the path of the gesture (Figure 2, middle); this
design more clearly indicates a path (ripples can
occasionally be mistaken for a multiple point gesture). In
this version, ripples are still shown at the point of initial and
final contact with the table, which permitted tapping
gestures and gestures that touched the surface at a single
point to still show a contact trace.
Figure 2. Top: ripple contact traces. Middle: line contact
traces (slow movement with initial ripple at touch point).
Bottom: line contact trace ‘colouring in’ an area.
The trace lines also showed the speed of the gesture: fast
touching gestures resulted in straight and parallel sketch
lines, and slower gestures showed lines that were slightly
rotated and slightly further apart (Figure 2). This
visualization was added since our early observations
showed that people ‘colour in’ space with quick path
gestures to indicate an area.
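One way to realize this speed encoding is sketched below; the thresholds and the spacing/rotation functions are illustrative assumptions, not the values used in our implementation:

```python
def trace_line_style(speed_px_per_s: float,
                     slow: float = 100.0, fast: float = 800.0):
    """Map gesture speed to sketch-line styling: fast touch gestures
    yield straight, parallel, tightly packed lines; slower gestures
    yield lines that are slightly rotated and slightly further apart
    (which reads as 'colouring in' an area). Constants are illustrative."""
    # Normalize speed to [0, 1]: 0 = slow gesture, 1 = fast gesture.
    t = min(max((speed_px_per_s - slow) / (fast - slow), 0.0), 1.0)
    spacing_px = 8.0 - 5.0 * t       # slower -> lines further apart
    rotation_deg = 15.0 * (1.0 - t)  # slower -> lines slightly rotated
    return spacing_px, rotation_deg
```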
Designing to Show the Height Above the Table
Beyond the table surface, height is an ordinal variable.
Although people use different layers in the above-the-table
space, these differences are not uniform [6], so we represent
height with a continuous visualization, but with added
elements that allow people to make use of different layers.
The ordinal component of height above the surface is
represented as an ellipse that increases in size and becomes
more transparent as the gesture moves upwards (Figure 3).
These visual cues are compatible with the idea that higher
gestures are less specific [6] and that higher gestures might
need to appear less engaged [9]. Our ellipse was multicoloured, increasing its visibility on a variety of
backgrounds (Figures 3 and 4). We also included a
movement trace (to emphasize high-level path gestures) by
fading out each ellipse after one second; the effect is one of
a series of shapes left behind in the trail of the cursor.
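A minimal sketch of the full height mapping is shown below; it also anticipates the layer change described next (sphere below 5cm, ray-cast ellipse above it). All constants and names are illustrative assumptions rather than the exact values used in our system:

```python
from dataclasses import dataclass

LAYER_BOUNDARY_CM = 5.0   # boundary between the lower and upper layers [6]

@dataclass
class HeightGlyph:
    width: float      # glyph width in pixels
    length: float     # glyph length in pixels
    opacity: float    # 1.0 = solid, 0.0 = invisible
    offset: float     # centroid shift away from the user, in pixels

def height_glyph(height_cm: float, max_cm: float = 40.0) -> HeightGlyph:
    """Map gesture height to the above-the-surface visualization:
    the glyph grows and becomes more transparent with height; below
    5 cm it is a perfect sphere, and above 5 cm it stretches into an
    increasingly narrow, long ellipse whose centroid shifts away from
    the user (simulated ray-casting). Constants are illustrative."""
    h = min(max(height_cm, 0.0), max_cm) / max_cm   # normalize to [0, 1]
    size = 20.0 + 180.0 * h       # glyph grows with height...
    opacity = 1.0 - 0.8 * h       # ...and becomes more transparent
    if height_cm <= LAYER_BOUNDARY_CM:
        # Lower layer: a perfect sphere (circle), no ray-cast offset.
        return HeightGlyph(width=size, length=size,
                           opacity=opacity, offset=0.0)
    # Upper layer: narrower and longer with height, cast away from user.
    stretch = (min(height_cm, max_cm) - LAYER_BOUNDARY_CM) \
              / (max_cm - LAYER_BOUNDARY_CM)
    return HeightGlyph(width=size * (1.0 - 0.6 * stretch),
                       length=size * (1.0 + 1.5 * stretch),
                       opacity=opacity,
                       offset=120.0 * stretch)
```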
Figure 3: Above-the-surface visualization. A high gesture (A) with the large, transparent, ray-cast ellipsoid and a low gesture (B) with the small, solid sphere.
Our design subtly reflected the change in height layer (between the 0-5cm layer and the above-5cm layer) by changing from a perfect sphere (in the lower layer) to an increasingly narrow and long ellipse (in the higher layer) (see Figure 3). This visualization technique helps to show gestures that point to objects far away or off the table with a simulated ray-casting technique: as gestures move higher, the ellipse becomes narrower and longer, and the centroid of the ellipse moves away from the user (i.e., as a shadow would when cast from a light behind the user; see Figure 4). These effects were shown only when the gesture rose 5cm above the table surface.
Figure 4: Simulated ray-casting with the ellipsoid embodiment. N.B., the shadow is an artifact of the overhead projection and is not part of the embodiment.
EVALUATION OF HEIGHT VISUALIZATIONS: 3 STUDIES
Although a few designs for touching and above-the-surface interaction have been seen in prior work, there are no formal evaluations of the effectiveness of height visualizations for remote collaborators. We carried out three studies to assess three important design issues for height visualizations: accuracy (i.e., do they improve people’s ability to determine the type or target of a gesture); expressiveness (i.e., can they convey the kinds of subjective qualities that are common in real-world gestures); and usability (i.e., can people make use of them in realistic work, and do they prefer them over basic representations).
The three studies are described below. The first two experiments are more controlled, and therefore investigate simple gestures – pointing at objects, paths, and areas. These gestures comprise a substantial proportion of simple deixis (i.e., many indicative gestures in the real world involve these simple actions), and are also components of more complex interactions (previous work has shown that complex gestures can be broken into smaller common components [13]). In the third study, participants carried out open-ended collaboration, and so the tasks and gestures were not constrained.
STUDY 1: INTERPRETATION ACCURACY
This study looked at the effects of touch and hover visualizations on observers’ ability to identify the target of the pointing gesture (what was being pointed at), and on people’s ability to identify the type of pointing gesture (i.e., whether the gesturer’s intent was to point to an object, a path, or an area).
Height Visualizations and Tasks
We compared four of the visualizations described above in two tasks. Our visualizations (see Figures 1 and 2) were:
• Arrow: a red-outlined, standard arrow pointer (Fig. 1);
• Touch: a shape- and colour-changing pointer (Fig. 1);
• Trace: a red-outlined, standard arrow pointer (Fig. 1) with ripples as a contact trace (Fig. 2);
• Touch+Trace: a combination of Touch and Trace.
We used Arrow as a control condition because this representation provides the highest gesture specificity, and because arrows are a standard embodiment in groupware.
The visualizations were tested on two tasks, both of which used aerial photographs as background data. Participants were seated at a large table, on which was projected a satellite photograph of New York (see Figure 5). The image was selected on the basis of its high number of closely situated potential targets.
Task 1: identifying the target of the gesture. We pre-recorded twelve gestures that touched the surface of the
table 3-5 times. Touches occurred at realistic targets (such
as street intersections) and the appropriate point of each
visualization was used for the touch (the tip of the Arrow
representation, and the center of the Crosshair). After
viewing the gesture (there was no accompanying speech),
the participant used the mouse to click on each location
where the gesture touched the surface. Participants were
asked to be as accurate as possible (but were also aware that
the task was timed). Participants could select the points in
any order (our analysis used a best-fit technique that
calculated the best possible accuracy value).
Task 2: identifying the type of the gesture. We pre-recorded
three kinds of gestures that can look similar [6], in order to
determine whether height information helps people
differentiate between them. The three types (see Figure 5)
were paths, which indicated a straight or zigzagging path on
the map; single points, which indicated single targets on the
map; and multiple points, which indicated several points
scattered across the map. Gestures were carried out at a
variety of heights: paths could be on or above the table, and
point gestures either touched the table or paused at each
target in the gesture. Participants viewed 28 pre-recorded
gestures (8 single-point, 8 multiple-point, 12 paths) in
random order, and selected the type from a list.
Study Design and Procedure
The study used a within-participants design and RM-ANOVA to look for effects of visualization type on accuracy of target identification and type identification. Visualizations were shown in blocks (order balanced using a Latin square). Main tests used α=.05, and post-hoc comparisons used t-tests with the Bonferroni correction.
Participants completed a demographic survey and an
orientation session that included previews of each visual
condition. The type identification task was presented first, then the target identification (accuracy) task. Sixteen
participants (12 male, 4 female, 21-45 years old, mean
27.6) were recruited from a local university. Computer logs
were used to track the participant’s completion time,
selected points for the accuracy task, and answers in the
type-of-gesture task.
Study 1 Results
Accuracy in Identifying Target Locations
Since there were several targets in a gesture, and since
participants were not required to specify the points in order,
we calculated error using a best-fit technique that used the
lowest overall error value (in pixels) for any possible
mapping of participant selections to actual targets.
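For concreteness, the sketch below shows the kind of best-fit computation we mean; with only 3-5 targets per gesture, checking every one-to-one mapping is inexpensive (function and variable names are illustrative):

```python
from itertools import permutations
import math

def best_fit_error(selections, targets):
    """Return the lowest total pixel error over every one-to-one
    mapping of participant selections to actual targets, so the
    order in which points were selected does not matter."""
    assert len(selections) == len(targets)
    return min(
        sum(math.dist(s, t) for s, t in zip(selections, perm))
        for perm in permutations(targets)
    )

# e.g., best_fit_error([(10, 12), (200, 80)], [(198, 83), (11, 10)])
# pairs each selection with its nearest consistent target.
```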
Figure 7: Mean error per target (pixels) by visualization type (accuracy task).
Accuracy of Gesture Type Identification
RM-ANOVA showed a main effect of visualization type on
identification accuracy (F3,45=154.85, p<0.001). As Figure 8
shows, all of the visualizations that represented touch
(either with a visual state change or with a contact trace)
had significantly lower error rates than the standard arrow
(p<0.001 for all post-hoc pairwise comparisons involving
the arrow; no other differences found). The differences
were substantial: the nearly 50% error rate of the arrow
cursor was reduced to less than 5% for all of the touch
visualizations. This finding suggests that gesture
identification is improved with any touch information,
which helps to disambiguate between simple movement
above the table (e.g., moving to the start position of the
gesture) and the gesture itself.
Figure 5: Gesture types used in Study 1: left (blue), a scattered multiple-points gesture; middle (red), a path gesture; right (green), a point gesture. An example of the background used in the studies.
RM-ANOVA showed a main effect of visualization type on
location accuracy (F3,45=15.76, p<0.001). Figure 7 displays
average error amount per target, and shows that participants
were approximately 12 and 16 pixels closer to the targets
with the two trace visualizations (Trace and Touch+Trace)
than with the other techniques. Pairwise comparisons
showed that both trace visualizations had significantly
lower error amounts than Arrow and Touch (all p<0.01, no
other differences found). RM-ANOVA of the completion-time data found no effects of visualization type (F3,45=0.252, p=0.859).
Figure 8: Error rate on the identification task.
STUDY 2: EXPRESSIVENESS
The second study investigated the question of whether
height visualization aids in conveying subjective qualities
that are common in real-world gestures, such as the degree
of confidence, specificity, and emphasis in the gesture. We
explored this question in three parts. First, we looked at whether people interpret height information in a remote gesture as having an effect on the confidence, specificity, and emphasis of the gesture. Second, we looked at whether people associate visualizations of tapping with emphasis. Third, we looked at whether height information helps people to interpret gesture type. These explorations are motivated by prior work showing that in co-located situations, people do interpret gestures in these ways [6].
Gesture height was shown using a combination of two
designs described above: height above the table was shown
using the ellipse and sphere visualizations (Figure 3), but
the moment the gesture touched the table, the sphere
disappeared and the line-based contact traces, state-change
cursor and starting and ending ripples were used (Figure 2,
middle). As discussed above, we used line-based traces
rather than ripples because our design work suggested that
lines are less likely to indicate multiple points to viewers.
Tasks, Design, and Procedure
The study used similar tasks to those described for the first
experiment. Participants stated whether the gesture was a
path, point, or area gesture, and also answered three
questions related to gesture qualities: how specific was the
gesture, how emphatic was the gesture, and how confident
was the gesturer in making the gesture. These questions
were answered on a seven-point scale. In addition, we
included two distractor questions to reduce the chance of
participants guessing the hypothesized link between height
and specificity, confidence, and emphasis. These were: did
the gesture indicate something off the edge of the table, and
did the gesture indicate an out-of-reach object. Sixteen
participants (11 male, 4 female, 19-33 years old, mean
23.9) were recruited from a local university. All watched
several previews of the gesture visualization.
We considered two kinds of gesture height in the study: global height, indicating the overall height of the entire gesture above the table, and local height variation, which involved hand-bouncing or finger-tapping motions but did not change the global height of the gesture. We recorded gestures at three global heights: high (more than 20cm above the surface), low (between 5 and 20cm above the surface), and surface (in contact with the table). We recorded half of the gestures with local height variation within the global height (i.e., a tapping or waving movement that remained within the bounds of a specific global height), and half where the hand and fingers maintained a constant global height (i.e., no tapping). Each gesture set (high, high with local variation, low, low with local variation, surface, and surface with local variation) included two kinds of path gestures, two kinds of point gestures, and two kinds of area gestures:
• Pointing gestures indicating a single target;
• Pointing gestures indicating multiple scattered points;
• A simple path between two points;
• Paths between multiple points;
• An area delineated with a contour gesture;
• An area that is ‘coloured in’ with the gesture.
The study used a within-participants design to examine the effects of two factors: the global height of the gesture and the presence of local height variation in the gesture. These factors were fully crossed and contained either three or four trials for each of the gesture types (path, point, area). The difference in number results from the fact that some gestures would not realistically be used in some situations (e.g., hovering over an area works only above the surface). There were a total of 63 randomly-ordered trials.
Study 2 Results
We had two groups of data: participant measures of gesture
qualities and an accuracy test. We examined the participant
data to see how height variation (global and local) affected
how participants interpreted gesture qualities.
Interpretation of Gesture Qualities: Global Height
All participants interpreted the ellipse height visualization
as expected, inferring that lower height implied greater
confidence, specificity, and emphasis. We performed an
ANOVA to determine whether height visualizations at the
three global heights led to differences in participant ratings
of these three qualities. In all cases, the ANOVA showed a
significant effect of height on the perceived level of the
quality (confidence, F2,30=24.84; specificity, F2,30=30.27;
emphasis, F2,30=15.64; all p<0.001), see Fig. 9.
Figure 9: Interpretation of gesture qualities, by global height.
Interpretation of Gesture Qualities: Local Height Variation
RM-ANOVA also showed a main effect of local height
variation on interpretation of qualities (F1,15=6.39,
p<0.001). As shown in Figure 10, people were more likely
to interpret a gesture as emphatic if there was a bouncing or
tapping motion in the gesture. There was also an interaction
between global height and local height variation (F2,30=4.68,
p<0.001). Post hoc pairwise comparisons found significant differences between height variation and no height variation on specificity, emphasis, and confidence for globally high gestures (p<0.001) and globally low gestures (p<0.01), but only on emphasis for gestures on the surface (p<0.01).
Figure 10: Interpretation of emphasis, by local height
variations. Global heights are collapsed.
As the figure shows, when people saw only the global
height, they were much more likely to reduce their
estimation of emphasis with high gestures, but when there
was local height variation, this information was consistently
interpreted as implying that the gesture was emphatic.
(Results were similar for specificity and confidence).
Accuracy of Identifying Gesture Type
RM-ANOVA showed a main effect of global height on
interpretation accuracy (F2,30=9.027, p<0.001). We
investigated, post hoc, the high error rates we observed in
some conditions. We discovered that the exceptional error
rates were specific to particular gesture types: paths with
points had high error rates above the surface and both
multiple-point paths and area contours had high error rates
on the surface. Participants largely misclassified the
multiple-point paths and area contours surface gestures as
path gestures, and both of these gesture types do share
considerable information with paths. Part of this result is
attributable to the complicated nature of gestures. The
points-on-a-path gesture could also have been classified as
a pointing gesture – participants interpreted it as a path on
the surface and as pointing when in the air.
Despite these subtleties in measures of accuracy, Study 2
provides us with three clear findings. First, and most
importantly, representations of gesture height allow people
to interpret several subjective qualities of a remote gesture.
Second, the global height of a gesture is consistently
interpreted as being inversely proportional to the gesture’s
emphasis, confidence, and specificity. Third, local height
variations are consistently interpreted as increasing a
gesture’s emphasis, confidence, and specificity. Overall,
these results show that visualization of a remote gesture’s
height can substantially increase gesture expressiveness.
STUDY 3: USABILITY IN REALISTIC COLLABORATION
The third study investigated the use of height-augmented
embodiments in a realistic collaborative situation, and
looked at three main issues:
• Usage: do people make use of the height visualization
when making gestures, and for what do they use them?
• Interpretation: are gestures interpreted as intended, and
do the visuals of the augmented embodiments cause any
communication difficulty or confusion?
• Preference: do people prefer augmented embodiments
over standard versions?
The study involved a realistic collaborative task between
two participants: the remote participant was seated in one
room at a (non-interactive) tabletop display, and the local
participant was in another room at the table used in the
earlier experiments. Local participants had a Polhemus
Liberty 240/80 sensor attached to their primary pointing
finger, which was used to capture the participant’s gestures
in 3D for display on the remote participant’s table.
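As a rough sketch of this capture pipeline (the sensor API, wire format, and update rate shown here are assumptions for illustration, not our actual implementation), each sensed finger position is forwarded to the remote table, where its z value selects touch, sphere, or ellipse rendering:

```python
import json
import socket
import time

def stream_finger_positions(read_sample,
                            host="remote-table.local", port=9999):
    """Forward sensed 3D finger positions to the remote table.
    read_sample() stands in for the Polhemus driver and returns
    (x, y, z): x, y in table coordinates, z = height above the
    surface in cm, which drives the embodiment renderer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        x, y, z = read_sample()
        packet = json.dumps({"x": x, "y": y, "z": z}).encode()
        sock.sendto(packet, (host, port))
        time.sleep(1 / 60)  # assumed ~60 Hz update rate
```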
The height-augmented embodiments used in the study (and shown equally to both participants) were the same as those
used for the second study (Figure 2, middle; Figure 3;
Figure 4). The ‘standard’ embodiment was a standard arrow
telepointer (Figure 1, middle). The background data on the
tables was a satellite image of the participants’ local city.
The remote participant was given a list of 22 questions
about locations in the city (similar to those used by
Kettebekov and Sharma [13]). The remote participant asked
each question to the local participant, who responded by
speaking and gesturing over the map. Participants were
asked to continue discussing the question until both were
satisfied that the answer was understood and correct.
The questions were selected to explore a wide range of
open-ended interactions. Some questions involved areas
that were familiar to one or both participants, and some
questions were designed to involve unfamiliar areas of the
city (e.g., safe boating areas or optimal pathways through
neighbourhoods). Between questions, participants filled out
a questionnaire which asked them to identify different
qualities of the gestures that had been used in the exchange,
and also asked how quickly they had achieved
understanding, how well that understanding was achieved,
and whether any locations had been out of reach or off the
table. After each session, an interview was conducted to
discuss the participants’ experiences and to ask about
preferences. We did not gather performance data since the
nature of the tasks and the differences in familiarity with
the locations led to high variance in completion times.
Half of the groups began with the height-enabled
embodiment, and the other half with the telepointer. After
11 questions, they switched to the other visualization.
Before each condition, participants were given as much
time as needed to familiarize themselves with the
embodiments. Eight pairs of participants were recruited
from the local university (5 female, 11 male, ages 21-33,
mean 24.4). Sessions were videotaped for later analysis.
Study 3 Results
Usage of height-augmented embodiments
Observations during sessions and during later video
analysis showed that the local participant (the person
constructing gestures) adapted quickly to the additional
capabilities provided by the enhanced embodiment. All
participants clearly understood what the remote participant
would see of the visualization, and all participants showed
evidence that some of their gestures were constructed to
make use of the height visualization.
Indicating areas. Local participants regularly used the ellipse to indicate areas, hovering or moving slightly at a height that made the ellipse an appropriate size for the target under discussion (e.g., Figure 13), and also used it to convey emphasis. For example, the following exchange occurred
during discussion of shopping areas (LP: Local Participant):
LP: It's on this road. [leaves hand on surface while
tracing along road, scanning ahead to find target]
LP: [raises hand slightly to create ellipse] And here's the
mall. [taps finger in the air, varying the size of the ellipse]
Improved gesture visibility. When local participants
constructed a path or contour gesture with the augmented
embodiment, it was clear that they knew the other person
would see the path (because of the contact traces). As a
result, we observed no instances where the local participant
repeated the gesture. When people constructed gestures
with the plain telepointer, however, it was common for the
participant to repeat the gesture several times, even without
prompting from the remote participant.
Indicating out-of-reach targets. When tasks required the
identification of far-away targets, local participants often
used the augmented embodiment’s simulated ray-casting
ability (see Figure 13). The following exchanges illustrate
the difference between the two embodiment conditions;
participants were able to come to a shared understanding
with much less effort using the augmented embodiment.
Telepointer condition:
LP: Somewhere up there... I can't reach very far but...
[stretches out of chair to reach as far as possible]
RP: Yeah…
LP: [sits down and retracts hand] Up along the river.
Enhanced embodiment condition:
LP: Up along the river, in the middle, there. [at limit of
reach, raises hand to use raycasting, stays seated]
RP: OK
Figure 13: Left, reaching for a distant target in the plain
condition; centre, reaching in the enhanced condition; right,
hovering over an area in the enhanced condition.
Controlling specificity. Participants using the enhanced
embodiment were clearly aware of the way that height was
shown to the other person, and exercised much more
control over gesture height than in the telepointer condition.
For example, in several sessions, the local participant
avoided touching the table unless they intended a high level
of specificity. In the telepointer condition, however,
participants were indiscriminate about the height of their
gestures, since there was only one level of specificity in the
embodiment. This did not cause a problem for the
telepointer condition (although we wonder what would
happen in mixed-presence settings), but indicates that
participants made use of the additional expressive
capabilities of the embodiment when these were available.
Interpretation of height-augmented embodiments
We analysed the video data looking for examples in which
the additional visual information of the augmented
embodiments caused clutter, confusion, or errors. In no
cases did the extra information cause any noticeable problem, nor did any participant report such problems
in the post-session interview. We paid particular attention
to issues of distraction and occlusion for the ellipse
visualization, but the transparency of this effect appeared to
successfully remove any difficulties for the participants.
It was also clear that remote participants made use of the
features of the augmented embodiment in their
interpretations. Evidence for the value of the visualizations
came out in the interviews, where it was clear that remote
participants understood the visual effects and wanted the
local participant to make use of them. For example, one
remote participant remarked, “I was hoping that she [the
local participant] would use the difference between the ball
[ellipsoid] and the pointer more than she did.”
We note that gestures in this study were accompanied with
speech, which likely limited interpretation problems (for
either condition), in that accompanying speech could often
be used to disambiguate a gesture. This indicates that (as
expected) the information conveyed by height visualization
may not be critical to the successful completion of the task;
rather, the augmentations add subtlety and expressiveness
to the overall interaction. (The fact that people did notice
and appreciate these changes is shown in preference data).
Preferences and Participant Comments
When asked about their preferences after the session, 14 of
the 16 participants stated that they preferred the height-augmented embodiments. In addition, remote participants
universally preferred the height-augmented version. In one
typical response a participant described the visualization as,
“really nice, especially the trail left behind.” The touch
traces were appreciated by both participants: one remote
participant said that the “pathways were a lot easier [to
see]” with touch traces; a local user agreed, stating: “I liked
the afterwards line. That was useful for generating paths.”
The two participants who preferred the plain telepointer
were local participants; both indicated that the plain
telepointer was more familiar and therefore more desirable.
Other analyses
Analysis of the post-conversation questionnaires (using
RM-ANOVA) found no significant differences between the two embodiment conditions (all p>0.05). We did not expect major
differences in these measures because of the high variance
introduced by the unconstrained task.
DISCUSSION
Our evaluations show four main results:
• Visualizing touch, hover, and contact traces significantly
improved people’s ability to interpret the type of gesture,
and the targets of multi-point gestures;
• Visualizing height above the table allowed gestures to
convey subjective qualities (e.g., specificity or emphasis),
and these qualities were interpreted consistently;
• Touch, trace, and height visualizations were all easy to
learn and use in realistic collaboration, and people made
frequent use of the visualizations when gesturing.
• People strongly preferred the augmented embodiments.
Explanation and interpretation of main findings
Why height visualizations worked. There are three main
reasons why the height-augmented embodiments were
successful in our studies. First, and most obvious, there are
several situations where useful and important information is
encoded in a gesture’s height – different types of gestures
use different heights near the surface, specific actions such
as tapping have characteristic height variations, and
subjective qualities are strongly associated with height
above the table. When height information is available in the
embodiment, people get a richer representation of what is
intended in a gesture. Although verbal communication can
sometimes make up for gaps in this visual information (see
below), augmented embodiments can allow distributed
communication to be simpler and more subtle.
Second, our studies made it clear that remote embodiments
on large tables and on complex workspaces are much less
obvious and much less visible than real arms in co-located
settings. This problem has been noted for other kinds of
distributed groupware, but becomes more acute when
tabletops are large (as in our studies). The lack of basic
visibility makes it more difficult to determine when certain
events (such as touching the table) occur, even with
accompanying speech. The additional visual information of
the augmented embodiments helped with this visibility
problem – the change in representation when touching, the
contact traces, and the ellipse all provided more noticeable
representations that aided interpretation.
Contact traces were found to be particularly valuable in our
studies, suggesting that it is difficult for observers to see
and remember the details of a remote gesture. Traces of
past activity have previously been shown to be valuable for
other reasons (e.g., they are an ‘awareness buffer’ when
attention is divided among several tasks [1,23]), but here
we show that traces can also be useful when they are
specific to a particular height – people in our studies knew
that traces applied only to surface gestures. In future, we
will explore the value of providing different traces at
different heights, to determine whether we can obtain a
broader benefit from this general approach.
Does talk remove the need for height information? Our
third study showed that even when people used a simple
telepointer, they were still able to carry out their tasks and
come to collaborative understanding, primarily through
their accompanying verbal interaction. However, this does
not mean that height visualizations are not valuable, for two
reasons. First, there are many situations where verbal
interaction is constrained – e.g., a gesture may be produced
by a third person outside of the main conversation,
participants may be trying to work on a task as quickly as
possible, or a gesture may be intended as a subtle side-channel communication that should not be noticed by other participants. In these situations, height visualizations can allow people to convey far more information than what can be produced with a simple telepointer.
Second, even when speech is available, the added information of the augmented embodiment can help to reduce communicative effort. For example, the tapping-in-the-air gesture made by a participant in the third study (see description above) provided a clear indication of a location; if this were to be established through speech, the participant would have had to be more careful (e.g., more accurately coordinate the timing of the word ‘here’ with the location of the telepointer). It is almost always true that verbal interaction can suffice [15]; but the added capabilities of the augmented embodiment provide more possibilities for participants to make conversation simpler and faster. A detailed explication of these improvements is left to future work, but it is indicative that participants’ preferences were so strongly in favour of having height information.
Natural gestures and gesture languages. Although we
intended to evaluate how successfully an embodiment can
represent height to remote collaborators, it is almost
impossible to fully duplicate natural gestures for a remote
participant, and so gesture visualizations will always to
some degree be new visual languages rather than simply a
representation of the natural gesture itself. In our third
study, although we observed generally more natural
pointing behaviour with our enhanced telepointer, we also
saw people use new behaviours to take advantage of the
facilities provided by the enhancements – for example,
when they used simulated ray-casting to identify out-of-reach targets or the expanding ellipse to indicate areas. This
raises the question of whether it is better to try and replicate
the visual appearance of real-world gestures (which can
never be exact), or to produce an artificial visual language
that people will have to learn (but that will cause less
negative transfer from expectations of the real world).
Future work will explore whether hybrid embodiments
(video plus abstract representations) can provide a middle
ground in this regard.
Lessons for Designers and Future Work
Although more research needs to be done in this area, two
main principles from our work can already be used by
designers of tabletop embodiments. First, designers should
consider representing height information in remote
embodiments, since height can improve interpretation and
accuracy, can allow gestures to express a much wider range
of subjective qualities, and is strongly preferred by users. In
the near future, height information will be simple to obtain
(e.g., using depth-sensing cameras), making these changes a
feasible option. Second, touch visualizations and traces are
useful for a variety of tasks (involving both awareness and
height issues), and should be easy to add to groupware
systems since touch is already sensed on many tables.
Our work identified the role of height visualization as a
stand-alone feature of embodiments, rather than as
incorporated in already existing, high-fidelity, video-based
embodiments. This has the added benefit of allowing our
work to be used in the design of high-efficiency, abstract
embodiments intended for deployment where bandwidth or
computational power are limiting factors. In future work,
we will examine how incorporating our height visualization
affects communication with both abstract embodiments and
hybrid embodiments [6] (combined video and abstract
visualizations). We are also curious to see if significant
behavioural changes occur when just the contact traces and
touching information are present in an embodiment, or
whether the height visualization is also needed.
Finally, our work currently only applies to 2D display
surfaces. There are a number of tabletop interaction designs
that use false 3D or incorporate tangible components. We
are interested to see how we can represent height when the
display surface is no longer a uniform height.
CONCLUSION
The height of a pointing gesture above a table is an
important part of collaborative communication. We
evaluated how adding abstract height information to
embodiments can improve the use and understanding of
gestures in distributed collaboration. We found that these embodiments can improve the accuracy of gesture and
target identification, and that information about the height
above the table allows users to consistently interpret
subjective qualities of the gesture. Our evidence suggests
that more information about the height of the gesture, both
where it is and where it has been, can improve distributed
communication and collaboration. We found that in realistic
collaboration, users adapted quickly to the embodiments,
and took advantage of the opportunities afforded by our
designs when constructing their gestures. The augmented
embodiments were strongly preferred. This work shows that
height information can have important effects on the
usability of distributed tabletop groupware.
REFERENCES
1. Apperley, M., McLeod, L., Masoodian, M., et al. Use of video shadow for small group interaction awareness on a large interactive display surface. Proc. AUIC 2003, Volume 18, 81-90.
2. Bekker, M., Olson, J., and Olson, G. Analysis of gestures in face-to-face design teams provides guidance for how to use groupware in design. Proc. DIS 1995, 157-166.
3. Dix, A., Finlay, J., Abowd, G., and Beale, R. Human-Computer Interaction. New York: Prentice Hall, 1993.
4. Fraser, M., McCarthy, M., Shaukat, M., and Smith, P. Seconds matter: improving distributed coordination by tracking and visualizing display trajectories. Proc. CHI 2007, 1303-1312.
5. Fussell, S., Setlock, L., Yang, J., Ou, J., Mauer, E., and Kramer, A. Gestures over video streams to support remote collaboration on physical tasks. HCI, 19, 3, 2004, 273-309.
6. Genest, A. and Gutwin, C. Characterizing deixis over surfaces to improve remote embodiments. Proc. ECSCW 2011, in press. (hci.usask.ca/publications/)
7. Gutwin, C. and Penner, R. Improving interpretation of remote gestures with telepointer traces. Proc. CSCW 2002, 57-67.
8. Hayne, S., Pendergast, M., and Greenberg, S. Implementing gesturing with cursors in group support systems. J. MIS, 10, 3, 1993, 43-61.
9. Hilliges, O., Izadi, S., Wilson, A., Hodges, S., Garcia-Mendoza, A., and Butz, A. Interactions in the air: adding further depth to interactive tabletops. Proc. UIST 2009, 139-148.
10. Hindmarsh, J. and Heath, C. Embodied reference: a study of deixis in workplace interaction. J. Pragmatics, 32, 12, 2000, 1855-1878.
11. Ishii, H. and Kobayashi, M. ClearBoard: a seamless medium for shared drawing and conversation with eye contact. Proc. CHI 1992, 525-532.
12. Izadi, S., Agarwal, A., Criminisi, A., Winn, J., Blake, A., and Fitzgibbon, A. C-Slate: a multi-touch and object recognition system for remote collaboration using horizontal surfaces. Proc. Tabletop 2007, 3-10.
13. Kettebekov, S. and Sharma, R. Understanding gestures in multimodal human computer interaction. J. AI Tools, 9, 2, 2000, 205-223.
14. Kirk, D. and Stanton Fraser, D. Comparing remote gesture technologies for supporting collaborative physical tasks. Proc. CHI 2006, 1191-1200.
15. Kirk, D., Rodden, T., and Stanton Fraser, D. Turn it this way: grounding collaborative action with remote gestures. Proc. CHI 2007, 1048-1058.
16. Li, J., Wessels, A., Alem, L., and Stitzlein, C. Exploring interface with representation of gesture for remote collaboration. Proc. OZCHI 2007, 179-182.
17. Ou, J., Chen, X., Fussell, S., and Yang, J. DOVE: drawing over video environment. Proc. Multimedia, 100-101.
18. Tang, A., Neustaedter, C., and Greenberg, S. VideoArms: embodiments for mixed presence groupware. People and Computers 20, 2007, 85-102.
19. Tang, A., Pahud, M., Inkpen, K., Benko, H., Tang, J., and Buxton, B. Three's company: understanding communication channels in three-way distributed collaboration. Proc. CSCW 2010, 271-280.
20. Tang, J. and Minneman, S. VideoDraw: a video interface for collaborative drawing. ToIS, 9, 2, 1991, 170-184.
21. Tang, J. and Minneman, S. VideoWhiteboard: video shadows to support remote collaboration. Proc. CHI 1991, 315-322.
22. Wigdor, D., Williams, S., Cronin, M., et al. Ripples: utilizing per-contact visualizations to improve user interaction with touch displays. Proc. UIST 2009, 3-12.
23. Yamashita, N., Kaji, K., Kuzuoka, H., and Hirata, K. Improving visibility of remote gestures in distributed tabletop collaboration. Proc. CSCW 2011, 98-104.