Evaluating the Effectiveness of Height Visualizations for Improving Gestural Communication at Distributed Tables

Aaron Genest, University of Saskatchewan, [email protected]
Carl Gutwin, University of Saskatchewan, [email protected]

ABSTRACT
In co-located collaboration, people use the space above the table for deictic gestures, and height is an important part of these gestures. However, when collaborators work at distributed tables, we know little about how to convey information about gesture height. A few visualizations have been proposed, but these have not been evaluated in detail. To better understand how remote embodiments can show gesture height, we developed several visualizations and evaluated them in three studies. First, we show that touch visualizations significantly improve people's accuracy in identifying the type and target of a gesture. Second, we show that visualizations of height above the table help to convey gesture qualities such as confidence, emphasis, and specificity. Third, we show that people quickly make use of height visualizations in realistic collaborative tasks, and that height-enhanced embodiments are strongly preferred. Our work illustrates several designs for effective visualization of height, and provides the first comprehensive evidence of the value of height information as a way to improve gestural communication in distributed tabletop groupware.

Author Keywords
Digital tables, deixis, gesture, gesture height, pointing

ACM Classification Keywords
H.5.3 Group and Organization Interfaces: CSCW.

INTRODUCTION
Gestural communication – particularly pointing as a deictic reference – is ubiquitous when people work together on tables in the real world. Pointing reduces the complexity of verbal communication by allowing people to easily indicate objects, areas, or paths through simple gestures [2]. When collaboration happens on distributed tables, however, it is much more difficult to remotely convey natural pointing gestures. Although digital embodiments can be employed (e.g., telepointers or video images), current techniques are inadequate for conveying the subtleties of deictic gesture. In particular, typical embodiments do not show the height of a gesture above the table. People use height for several things when they construct a pointing gesture: to make clear the target of the gesture (e.g., putting a finger on the object of interest); to indicate different types of gesture (e.g., indications of paths versus general areas); to show different qualities in the gesture (e.g., a high gesture might be used to indicate less specificity); or to indicate that the gesture points to a location that is out of reach or off the table [6]. Without a representation of gesture height, pointing-based gestural communication on distributed tables is prone to errors, misunderstandings, and interpretation difficulty.
Researchers in tabletop groupware have recognized some of these problems, and have suggested a few embodiment designs that show at least some aspects of height (e.g., [12,19]). However, these designs are primarily concerned with showing surface touches, rather than the full range of height above the table, and their effectiveness has not been evaluated in detail. In this paper, we investigate three questions that need to be answered in order to better understand the design of height visualizations:
• Accuracy: does height information improve people's ability to determine the type or target of a gesture?
• Expressiveness: can height visualizations reliably convey qualities such as specificity, confidence, and emphasis?
• Usability: can people make use of height visualizations in realistic work, and do they prefer these representations?

We answered these design questions with three studies. The first experiment showed that representing touch and hover significantly improves people's ability to determine both the target of the gesture and the type of gesture. The second study showed that people use height visualizations to interpret a gesture's specificity, confidence, and emphasis, and showed that these interpretations are consistent with the ways that people see real-world gestures. The third study looked at the ways that people use height representations in realistic collaboration, and showed that people quickly make use of the additional height information in their deictic gestures, that the height-augmented embodiments caused no new usability problems, and that the augmented versions were strongly preferred by users.

This work makes two main contributions. First, we demonstrate several visualizations for showing key elements of gesture height, designs that refine and improve on previous solutions. Second, we provide a wide variety of evidence for the value of representing height in remote tabletop embodiments: we are the first to show that height visualizations can improve interpretation accuracy, can improve the expressiveness of remote gestures, and can be quickly learned and used in realistic collaboration. The design information and empirical evidence that this work provides can greatly help to improve the subtlety and precision of gestural communication on remote tables.

PREVIOUS WORK
Gestures and Deixis
Researchers have identified gestural interaction, specifically deixis or pointing gestures, as critical to the success of collaborative environments (e.g., [2], [10], [13]). It follows that distributed collaboration is hampered when users work in environments where they are unable to see the gestures made by remote collaborators. This can result in higher levels of frustration in communication and lower-quality results from collaboration [2]. Thus, supporting gestural communication between collaborators is an important goal for distributed groupware technologies. One way of providing support for gestures is through embodiments – representations of the participants in a distributed collaboration. There has been a great deal of work done on embodiments (e.g., [16,18,20]), but we are particularly interested in embodiments for distributed tables.

Remote Embodiments on Tables and Surfaces
Several systems have provided visual representations of remote users' arms and bodies, from early video-based tools such as VideoDraw [20], VideoWhiteBoard [21], and ClearBoard [11] to more recent tabletop and large-display groupware (e.g., [17], [18]).
Embodiments in these systems can be divided into two categories: abstract embodiments such as telepointers [9], which represent the user with lines and shapes, and realistic embodiments, which often employ video techniques (e.g., [21], [19]). Each of these approaches has particular strengths and weaknesses [6]. Although realistic embodiments can be richer in their representation of natural deictic gestures, abstract embodiments are better able to indicate a specific touch point, or provide extra information such as additional visualizations, that are not available in 2D video (see Table 1).

The question of whether video or artificial representations are best has received some attention, with some evidence for both sides: a study by Kirk et al. [14] found that realistic video of hands and arms was better than a representation of a stylus drawing tool; but several projects have also shown or implied that abstract visualizations are also useful [4,19]. In a comparison of abstract and realistic techniques, Genest and Gutwin [6] identified the height of a gesture as one component that video embodiments do not communicate well. This finding is supported by Fussell et al., who found that lack of information about gesture height was a barrier to communication in collaborative environments [5]. These problems exist partly because of the 2D representation of a realistic embodiment, but there are also several practical problems in representing height with video – for example, higher gestures become much larger (since they get closer to a top-mounted camera), and higher gestures are much more likely to leave the camera's view frustum.

Table 1. Representational capabilities of abstract and realistic embodiments (adapted from [7]).

Component of Deictic Gesture          Abstract    Realistic
Differentiate similar gestures        No          Possible
Represent stroke gestures             Yes         No
Express full range of morphology      No          Possible
Represent height                      Possible    No

Genest and Gutwin found that height differences were closely associated with variations in confidence, specificity, and emphasis. They also observed height used to indicate secondary references, to point at distant objects, and to mirror the real-world height variations of an object. They identified four important levels of height: on the surface, moving between on and just off the surface, hovering off the surface, and more than 5 cm above the surface [6].

Showing Gesture Height in Remote Embodiments
Previous work on embodiments has considered height in two ways: as an aspect of gestural communication, or as a way to provide feedthrough about others' actions.

Height in Existing Embodiment Techniques
One of the earliest video embodiments, VideoWhiteBoard, used real shadows cast on the display surface [21]. This had the side effect of showing diffusion in the shadows as people moved away from the surface, a cue that could be used by remote participants to better understand people's locations. Later implementations of the idea used digital rather than analogue shadows, and did not convey the same kind of distance information [1]. Distance information can also indicate intention to make a deictic gesture. Fraser et al. found that explicitly visualizing the approach of participants' pens toward a wallboard significantly improved coordination in a video-annotation task, and reduced conversational latency [4]. Both Fraser's work and VideoWhiteBoard are effective in showing heights or distances larger than 5 cm, but neither of these projects explored a more complete design space for height.
Hilliges et al. [9] used the height from a surface as a method of providing more sophisticated interaction with 3D objects. They provided hand shadows as a feedback technique, inverted so as to appear smaller when hands were farther from the surface. As an interaction technique, distance from the surface can be mapped, as they suggest, to less engagement, and therefore less of a user footprint on the surface. We consider this in our designs, detailed below. More recently, Tang et al. enhanced the VideoArms technique to show 'touch pearls' [19] or contact traces when participants were touching the table surface, using an effect similar to telepointer traces [7]. They mention that proximity information was not well modeled in their design, despite the improvement provided by the traces; however, they note that the traces did provide a level of awareness not available through the original VideoArms design. No evaluation of the contact traces was carried out.

Height as Feedthrough
Height or distance from the surface can also provide information about user activity, since surface touches are often used to invoke commands in the system. Visualizations of height are therefore used to give feedback to the local user, and can also provide feedthrough to remote participants [3]. The visualizations generally show the difference between hover and touching states on touch or pen interfaces (e.g., with the C-Slate interface [12], or with Wigdor et al.'s ripple visualizations [22]). Although work on feedthrough that incorporates height information provides valuable insight into embodiment design, this research has not investigated how including height affects communication in distributed collaborations.

DESIGNING AN ENHANCED EMBODIMENT
Our review of related work identified three important design factors in height representation for embodiments: the difference between touch and hover, the relative height of gestures above the surface, and the short-term history of movement. To better evaluate the usefulness of height in embodiments, we designed two components: one to represent surface touches and the short-term history of contact with the surface (contact traces) and one to represent the relative height of a gesture above the table. Since height is difficult to see in video-based embodiments, our height visualizations are abstract rather than realistic; they could, however, be added to a video-based embodiment. There are also reasons for lower-fidelity or abstract embodiments to stand alone, such as situations of limited bandwidth or low computational power, or systems that support high numbers of collaborators. We anticipate that the designs below, supported by the research presented in this paper, can be used in either context – as additions to high-fidelity, video-based embodiments or as components of high-efficiency, abstract embodiments. Our designs build on previous work: traces and ripples are based on work by Tang [19] and by Wigdor [22], and the inspiration for our above-the-surface ellipses comes from work by Tang on VideoWhiteboard [21]. However, our designs are the first to comprehensively describe where gestures are situated in the space above the table.

Showing Touch
We designed embodiments to convey the location of a touch, and the recent history of contact with the table.

Representing Touches on the Table
The change in gesture height between just off the surface and touching is an important state change [6]. To emphasize this difference, we dual-encoded touch state using both shape and colour. When the gesture is above the surface we represent the user's pointer with an arrow (red and black for improved visibility on a variety of backgrounds). When a touch is detected, we add a crosshair shape under the arrow, and highlight the crosshair in green (see Figure 1). We used arrows and crosshairs instead of wider blob shapes (like those extracted from FTIR tables), since we were interested in comparing them to traditional telepointers. Differences in accuracy between the mouse arrow and a fingerprint-sized blob would be a confounding factor in evaluations.

Figure 1: Our table-touching designs: on the surface, colour and shape change (left); above the surface, plain (centre); and combined with contact traces as ripples (right).
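To make the dual encoding concrete, the following minimal sketch shows the state change; draw_arrow and draw_crosshair are hypothetical stand-ins for the drawing calls of whatever toolkit renders the embodiment.

```python
# Minimal sketch of the dual-encoded touch state (shape plus colour).
# The helper functions and colour names are assumptions, not the
# actual implementation described in this paper.

def render_pointer(x, y, touching, draw_arrow, draw_crosshair):
    """Render the remote pointer, dual-encoding touch with shape and colour."""
    if touching:
        # On contact, a green crosshair appears under the arrow.
        draw_crosshair(x, y, colour="green")
    # The arrow is always drawn; a red outline over black improves
    # visibility on varied backgrounds such as satellite imagery.
    draw_arrow(x, y, fill="black", outline="red")
```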
Showing the Touch Path with Contact Traces
Traces of interactions improve the interpretation of distributed gestures [19], and paths are common elements of gestures over tables [6], so we added a visualization of the path of a touch gesture, or contact traces. We designed two variants of the trace visualization. The first showed contact traces in the form of ripples (Figure 2, top), based on the previous success of ripples as touch feedback [22]. The second version shows simple sketch lines along the path of the gesture (Figure 2, middle); this design more clearly indicates a path (ripples can occasionally be mistaken for a multiple-point gesture). In this version, ripples are still shown at the point of initial and final contact with the table, which permitted tapping gestures and gestures that touched the surface at a single point to still show a contact trace.

Figure 2. Top: ripple contact traces. Middle: line contact traces (slow movement with initial ripple at touch point). Bottom: line contact trace 'colouring in' an area.

The trace lines also showed the speed of the gesture: fast touching gestures resulted in straight and parallel sketch lines, and slower gestures showed lines that were slightly rotated and slightly further apart (Figure 2). This visualization was added since our early observations showed that people 'colour in' space with quick path gestures to indicate an area.
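One possible implementation of this speed encoding, assuming sampled touch points and illustrative constants (the specific angles, spacings, and speed scale are not specified above), is sketched below.

```python
import math

# A sketch of line-based contact traces with speed encoding. Constants are
# illustrative assumptions; the design above specifies only the qualitative
# behaviour (slow movement -> rotated, more widely spaced sketch lines).

def trace_segments(path):
    """Yield sketch-line segments along a touch path.

    `path` is a list of (x, y, t) samples recorded while the finger is on
    the table. Fast movement yields straight, parallel, closely spaced
    lines; slow movement yields lines slightly rotated and further apart.
    """
    for (x0, y0, t0), (x1, y1, t1) in zip(path, path[1:]):
        speed = math.hypot(x1 - x0, y1 - y0) / max(t1 - t0, 1e-6)   # px/s
        slowness = max(0.0, 1.0 - speed / 500.0)  # 0 = fast, 1 = slow (assumed scale)
        yield {
            "start": (x0, y0),
            "end": (x1, y1),
            "rotation": 0.3 * slowness,           # radians off the path direction
            "spacing": 2.0 + 4.0 * slowness,      # px between sketch lines
        }

def ripple_points(path):
    """Ripples appear at initial and final contact, so single-point taps
    still leave a visible trace."""
    return [path[0][:2], path[-1][:2]]
```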
Designing to Show the Height Above the Table
Beyond the table surface, height is an ordinal variable. Although people use different layers in the above-the-table space, these differences are not uniform [6], so we represent height with a continuous visualization, but with added elements that allow people to make use of different layers. The ordinal component of height above the surface is represented as an ellipse that increases in size and becomes more transparent as the gesture moves upwards (Figure 3). These visual cues are compatible with the idea that higher gestures are less specific [6] and that higher gestures might need to appear less engaged [9]. Our ellipse was multicoloured, increasing its visibility on a variety of backgrounds (Figures 3 and 4). We also included a movement trace (to emphasize high-level path gestures) by fading out each ellipse after one second; the effect is one of a series of shapes left behind in the trail of the cursor.

Figure 3: Above-the-surface visualization. A high gesture (A) with the large, transparent, ray-cast ellipsoid, and a low gesture (B) with the small, solid sphere.

Our design subtly reflected the change in height layer (between the 0-5 cm layer and the above-5 cm layer) by changing from a perfect sphere (in the lower layer) to an increasingly narrow and long ellipse (in the higher layer) (see Figure 3). This visualization technique helps to show gestures that point to objects far away or off the table with a simulated ray-casting technique: as gestures move higher, the ellipse becomes narrower and longer, and the centroid of the ellipse moves away from the user (i.e., as a shadow would when cast from a light behind the user; see Figure 4). These effects were shown only when the gesture rose 5 cm above the table surface.

Figure 4: Simulated ray-casting with the ellipsoid embodiment. N.B., the shadow is an artifact of the overhead projection and is not part of the embodiment.
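The mapping from height to ellipse parameters can be summarized in a short sketch. The constants below are illustrative assumptions; only the qualitative mapping and the 5 cm layer boundary come from the design above, and the one-second fading trail is omitted.

```python
from dataclasses import dataclass

# Sketch of the above-the-surface mapping: the ellipse grows and fades with
# height, and above the 5 cm layer boundary it narrows, elongates, and
# shifts away from the user (simulated ray-casting).

LAYER_BOUNDARY_CM = 5.0

@dataclass
class Ellipse:
    cx: float      # centroid x (px)
    cy: float      # centroid y (px)
    along: float   # radius along the pointing direction (px)
    across: float  # radius across the pointing direction (px)
    alpha: float   # opacity; 1.0 = solid

def ellipse_for_height(x, y, height_cm, away_x, away_y):
    """Map finger position and height to ellipse parameters.

    (away_x, away_y) is a unit vector pointing away from the user, used to
    offset the centroid like a shadow cast from a light behind the user.
    """
    size = 10.0 + 4.0 * height_cm                 # larger when higher
    alpha = max(0.15, 1.0 - height_cm / 40.0)     # more transparent when higher
    if height_cm <= LAYER_BOUNDARY_CM:
        # Lower layer: a solid-looking sphere (circle) centred on the finger.
        return Ellipse(x, y, size, size, alpha)
    # Upper layer: narrow, elongate, and push the centroid away from the
    # user, so the shape acts like a ray cast onto the table.
    excess = height_cm - LAYER_BOUNDARY_CM
    stretch = 1.0 + 0.2 * excess
    return Ellipse(x + away_x * 6.0 * excess, y + away_y * 6.0 * excess,
                   size * stretch, size / stretch, alpha)
```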
EVALUATION OF HEIGHT VISUALIZATIONS: 3 STUDIES
Although a few designs for touching and above-the-surface interaction have been seen in prior work, there are no formal evaluations of the effectiveness of height visualizations for remote collaborators. We carried out three studies to assess three important design issues for height visualizations: accuracy (i.e., do they improve people's ability to determine the type or target of a gesture); expressiveness (i.e., can they convey the kinds of subjective qualities that are common in real-world gestures); and usability (i.e., can people make use of them in realistic work, and do they prefer them over basic representations). The three studies are described below. The first two experiments are more controlled, and therefore investigate simple gestures – pointing at objects, paths, and areas. These gestures comprise a substantial proportion of simple deixis (i.e., many indicative gestures in the real world involve these simple actions), and are also components of more complex interactions (previous work has shown that complex gestures can be broken into smaller common components [13]). In the third study, participants carried out open-ended collaboration, and so the tasks and gestures were not constrained.

STUDY 1: INTERPRETATION ACCURACY
This study looked at the effects of touch and hover visualizations on observers' ability to identify the target of the pointing gesture (what was being pointed at), and on people's ability to identify the type of pointing gesture (i.e., whether the gesturer's intent was to point to an object, a path, or an area).

Height Visualizations and Tasks
We compared four of the visualizations described above in two tasks. Our visualizations (see Figures 1 and 2) were:
• Arrow: a red-outlined, standard arrow pointer (Fig. 1);
• Touch: a shape- and colour-changing pointer (Fig. 1);
• Trace: a red-outlined, standard arrow pointer (Fig. 1) with ripples as a contact trace (Fig. 2);
• Touch+Trace: a combination of Touch and Trace.

We used Arrow as a control condition because this representation provides the highest gesture specificity, and because arrows are a standard embodiment in groupware. The visualizations were tested on two tasks, both of which used aerial photographs as background data. Participants were seated at a large table, on which was projected a satellite photograph of New York (see Figure 5). The image was selected on the basis of its high number of closely situated potential targets.

Figure 5: Gesture types used in Study 1: left (blue), a scattered multiple-points gesture; middle (red), a path gesture; right (green), a point gesture. An example of the background used in the studies.

Task 1: identifying the target of the gesture. We pre-recorded twelve gestures that touched the surface of the table 3-5 times. Touches occurred at realistic targets (such as street intersections) and the appropriate point of each visualization was used for the touch (the tip of the Arrow representation, and the center of the Crosshair). After viewing the gesture (there was no accompanying speech), the participant used the mouse to click on each location where the gesture touched the surface. Participants were asked to be as accurate as possible (but were also aware that the task was timed). Participants could select the points in any order (our analysis used a best-fit technique that calculated the best possible accuracy value).

Task 2: identifying the type of the gesture. We pre-recorded three kinds of gestures that can look similar [6], in order to determine whether height information helps people differentiate between them. The three types (see Figure 5) were paths, which indicated a straight or zigzagging path on the map; single points, which indicated single targets on the map; and multiple points, which indicated several points scattered across the map. Gestures were carried out at a variety of heights: paths could be on or above the table, and point gestures either touched the table or paused at each target in the gesture. Participants viewed 28 pre-recorded gestures (8 single-point, 8 multiple-point, 12 paths) in random order, and selected the type from a list.

Study Design and Procedure
The study used a within-participants design and RM-ANOVA to look for effects of visualization type on accuracy of target identification and type identification. Visualizations were shown in blocks (order balanced using a Latin square). Main tests used α=.05, and post-hoc comparisons used t-tests with the Bonferroni correction. Participants completed a demographic survey and an orientation session that included previews of each visual condition. The type identification task was presented first, then the target identification task. Sixteen participants (12 male, 4 female, 21-45 years old, mean 27.6) were recruited from a local university. Computer logs were used to track the participant's completion time, selected points for the accuracy task, and answers in the type-of-gesture task.

Study 1 Results
Accuracy in Identifying Target Locations
Since there were several targets in a gesture, and since participants were not required to specify the points in order, we calculated error using a best-fit technique that used the lowest overall error value (in pixels) for any possible mapping of participant selections to actual targets.
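A minimal sketch of this best-fit computation, assuming Euclidean pixel distance (which the description above leaves unspecified):

```python
from itertools import permutations
from math import hypot

# Sketch of the best-fit error: try every one-to-one mapping of the
# participant's selections onto the actual targets and keep the mapping
# with the lowest total error. With 3-5 targets per gesture, brute force
# over permutations is cheap.

def best_fit_error(selections, targets):
    """Return the minimum mean pixel error over all selection-to-target mappings.

    `selections` and `targets` are equal-length lists of (x, y) points.
    """
    best_total = min(
        sum(hypot(sx - tx, sy - ty)
            for (sx, sy), (tx, ty) in zip(selections, perm))
        for perm in permutations(targets)
    )
    return best_total / len(targets)
```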
RM-ANOVA showed a main effect of visualization type on location accuracy (F3,45=15.76, p<0.001). Figure 7 displays average error per target, and shows that participants were approximately 12 and 16 pixels closer to the targets with the two trace visualizations (Trace and Touch+Trace) than with the other techniques. Pairwise comparisons showed that both trace visualizations had significantly lower error amounts than Arrow and Touch (all p<0.01, no other differences found). RM-ANOVA of the completion-time data found no effects of visualization type (F3,45=0.252, p=0.859).

Figure 7: Error in pixels by visualization (accuracy task). Y axis: mean error per target (pixels).

Accuracy of Gesture Type Identification
RM-ANOVA showed a main effect of visualization type on identification accuracy (F3,45=154.85, p<0.001). As Figure 8 shows, all of the visualizations that represented touch (either with a visual state change or with a contact trace) had significantly lower error rates than the standard arrow (p<0.001 for all post-hoc pairwise comparisons involving the arrow; no other differences found). The differences were substantial: the nearly 50% error rate of the arrow cursor was reduced to less than 5% for all of the touch visualizations. This finding suggests that gesture identification is improved with any touch information, which helps to disambiguate between simple movement above the table (e.g., moving to the start position of the gesture) and the gesture itself.

Figure 8: Error rate on the identification task.

STUDY 2: EXPRESSIVENESS
The second study investigated the question of whether height visualization aids in conveying subjective qualities that are common in real-world gestures, such as the degree of confidence, specificity, and emphasis in the gesture. We explored this question in three parts. First, we looked at whether people interpret height information in a remote gesture as having an effect on the confidence, specificity, and emphasis of the gesture. Second, we looked at whether people associate visualizations of tapping with emphasis. Third, we looked at whether height information helps people to interpret gesture type. These explorations are motivated by prior work showing that in co-located situations, people do interpret gestures in these ways [6].

Gesture height was shown using a combination of two designs described above: height above the table was shown using the ellipse and sphere visualizations (Figure 3), but the moment the gesture touched the table, the sphere disappeared and the line-based contact traces, state-change cursor, and starting and ending ripples were used (Figure 2, middle). As discussed above, we used line-based traces rather than ripples because our design work suggested that lines are less likely to indicate multiple points to viewers.

Tasks, Design, and Procedure
The study used similar tasks to those described for the first experiment. Participants stated whether the gesture was a path, point, or area gesture, and also answered three questions related to gesture qualities: how specific was the gesture, how emphatic was the gesture, and how confident was the gesturer in making the gesture. These questions were answered on a seven-point scale. In addition, we included two distractor questions to reduce the chance of participants guessing the hypothesized link between height and specificity, confidence, and emphasis. These were: did the gesture indicate something off the edge of the table, and did the gesture indicate an out-of-reach object.

We considered two kinds of gesture height in the study: global height, indicating the overall height of the entire gesture above the table, and local height variation, which involved hand-bouncing or finger-tapping motions but did not change the global height of the gesture. We recorded gestures at three global heights: high (more than 20 cm above the surface), low (between 5 and 20 cm above the surface), and surface (in contact with the table). We recorded half of the gestures with local height variation within the global height (i.e., a tapping or waving movement that remained within the bounds of a specific global height), and half where the hand and fingers maintained a constant global height (i.e., no tapping). Each gesture set (high, high with local variation, low, low with local variation, surface, and surface with local variation) included two kinds of path gestures, two kinds of point gestures, and two kinds of area gestures:
• Pointing gestures indicating a single target;
• Pointing gestures indicating multiple scattered points;
• A simple path between two points;
• Paths between multiple points;
• An area delineated with a contour gesture;
• An area that is 'coloured in' with the gesture.

The study used a within-participants design to examine the effects of two factors: the global height of the gesture, and the presence of local height variation in the gesture. These factors were fully crossed and contained either three or four trials for each of the gesture types (path, point, area). The difference in number results from the fact that some gestures would not realistically be used in some situations (e.g., hovering over an area works only above the surface). There were a total of 63 randomly-ordered trials. Sixteen participants (11 male, 4 female, 19-33 years old, mean 23.9) were recruited from a local university. All watched several previews of the gesture visualization.

Study 2 Results
We had two groups of data: participant measures of gesture qualities and an accuracy test. We examined the participant data to see how height variation (global and local) affected how participants interpreted gesture qualities.

Interpretation of Gesture Qualities: Global Height
All participants interpreted the ellipse height visualization as expected, inferring that lower height implied greater confidence, specificity, and emphasis. We performed an ANOVA to determine whether height visualizations at the three global heights led to differences in participant ratings of these three qualities.
In all cases, the ANOVA showed a significant effect of height on the perceived level of the quality (confidence, F2,30=24.84; specificity, F2,30=30.27; emphasis, F2,30=15.64; all p<0.001); see Figure 9.

Figure 9: Interpretation of gesture qualities, by global height. Y axis: participant rating (1-7).

Interpretation of Gesture Qualities: Local Height Variation
RM-ANOVA also showed a main effect of local height variation on interpretation of qualities (F1,15=6.39, p<0.001). As shown in Figure 10, people were more likely to interpret a gesture as emphatic if there was a bouncing or tapping motion in the gesture. There was also an interaction between global height and local height variation (F2,30=4.68, p<0.001). Post hoc pairwise comparisons found significant differences between height variation and no height variation on specificity, emphasis, and confidence for globally high gestures (p<0.001) and globally low gestures (p<0.01), but only on emphasis for gestures on the surface (p<0.01).

Figure 10: Interpretation of emphasis, by local height variations. Global heights are collapsed. Y axis: rating of emphasis (1-7).

As the figure shows, when people saw only the global height, they were much more likely to reduce their estimation of emphasis with high gestures, but when there was local height variation, this information was consistently interpreted as implying that the gesture was emphatic. (Results were similar for specificity and confidence.)

Accuracy of Identifying Gesture Type
RM-ANOVA showed a main effect of global height on interpretation accuracy (F2,30=9.027, p<0.001).
We investigated, post hoc, the high error rates we observed in some conditions. We discovered that the exceptional error rates were specific to particular gesture types: paths with points had high error rates above the surface, and both multiple-point paths and area contours had high error rates on the surface. Participants largely misclassified the multiple-point path and area-contour surface gestures as path gestures, and both of these gesture types do share considerable information with paths. Part of this result is attributable to the complicated nature of the gestures: the points-on-a-path gesture could also have been classified as a pointing gesture – participants interpreted it as a path on the surface and as pointing when in the air.

Despite these subtleties in measures of accuracy, Study 2 provides us with three clear findings. First, and most importantly, representations of gesture height allow people to interpret several subjective qualities of a remote gesture. Second, the global height of a gesture is consistently interpreted as being inversely proportional to the gesture's emphasis, confidence, and specificity. Third, local height variations are consistently interpreted as increasing a gesture's emphasis, confidence, and specificity. Overall, these results show that visualization of a remote gesture's height can substantially increase gesture expressiveness.

STUDY 3: USABILITY IN REALISTIC COLLABORATION
The third study investigated the use of height-augmented embodiments in a realistic collaborative situation, and looked at three main issues:
• Usage: do people make use of the height visualizations when making gestures, and for what do they use them?
• Interpretation: are gestures interpreted as intended, and do the visuals of the augmented embodiments cause any communication difficulty or confusion?
• Preference: do people prefer augmented embodiments over standard versions?

The study involved a realistic collaborative task between two participants: the remote participant was seated in one room at a (non-interactive) tabletop display, and the local participant was in another room at the table used in the earlier experiments. Local participants had a Polhemus Liberty 240/80 sensor attached to their primary pointing finger, which was used to capture the participant's gestures in 3D for display on the remote participant's table. The height-augmented embodiments used in the study (and shown equally to both participants) were the same as those used for the second study (Figure 2, middle; Figure 3; Figure 4). The 'standard' embodiment was a standard arrow telepointer (Figure 1, middle).
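As an illustration of this capture pipeline, a minimal sketch of streaming finger samples to the remote table is shown below. The paper does not describe its networking layer, so the message format, host name, and sensor wrapper are all assumptions; each sample carries position plus height, enough to drive the touch, trace, and ellipse visualizations on the remote side.

```python
import json, socket, time

# Hypothetical streaming sketch: read (x_px, y_px, height_cm) samples from
# a wrapper around the finger-mounted tracker and send them as JSON
# datagrams to the remote table's renderer.

def stream_gesture(sensor_read, host="remote-table.local", port=9999, hz=60):
    """Sample the tracked finger and forward each sample to the remote table."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        x, y, height_cm = sensor_read()        # hypothetical tracker wrapper
        sample = {"x": x, "y": y, "height_cm": height_cm,
                  "touching": height_cm <= 0.0,   # surface contact (assumed)
                  "t": time.time()}
        sock.sendto(json.dumps(sample).encode(), (host, port))
        time.sleep(1.0 / hz)
```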
The background data on the tables was a satellite image of the participants' local city. The remote participant was given a list of 22 questions about locations in the city (similar to those used by Kettebekov and Sharma [13]). The remote participant asked each question to the local participant, who responded by speaking and gesturing over the map. Participants were asked to continue discussing the question until both were satisfied that the answer was understood and correct. The questions were selected to explore a wide range of open-ended interactions. Some questions involved areas that were familiar to one or both participants, and some questions were designed to involve unfamiliar areas of the city (e.g., safe boating areas or optimal pathways through neighbourhoods).

Between questions, participants filled out a questionnaire which asked them to identify different qualities of the gestures that had been used in the exchange, and also asked how quickly they had achieved understanding, how well that understanding was achieved, and whether any locations had been out of reach or off the table. After each session, an interview was conducted to discuss the participants' experiences and to ask about preferences. We did not gather performance data since the nature of the tasks and the differences in familiarity with the locations led to high variance in completion times. Half of the groups began with the height-enabled embodiment, and the other half with the telepointer. After 11 questions, they switched to the other visualization. Before each condition, participants were given as much time as needed to familiarize themselves with the embodiments. Eight pairs of participants were recruited from the local university (5 female, 11 male, ages 21-33, mean 24.4). Sessions were videotaped for later analysis.

Study 3 Results
Usage of height-augmented embodiments
Observations during sessions and during later video analysis showed that the local participant (the person constructing gestures) adapted quickly to the additional capabilities provided by the enhanced embodiment. All participants clearly understood what the remote participant would see of the visualization, and all participants showed evidence that some of their gestures were constructed to make use of the height visualization.

Indicating areas. Local participants regularly used the ellipse both to indicate areas – hovering or moving slightly at a height that made the ellipse an appropriate size for the target under discussion (e.g., Figure 13) – and to indicate emphasis. For example, the following exchange occurred during discussion of shopping areas (LP: Local Participant):

LP: It's on this road. [leaves hand on surface while tracing along road, scanning ahead to find target]
LP: [raises hand slightly to create ellipse] And here's the mall. [taps finger in the air, varying the size of the ellipse]

Improved gesture visibility. When local participants constructed a path or contour gesture with the augmented embodiment, it was clear that they knew the other person would see the path (because of the contact traces). As a result, we observed no instances where the local participant repeated the gesture. When people constructed gestures with the plain telepointer, however, it was common for the participant to repeat the gesture several times, even without prompting from the remote participant.

Indicating out-of-reach targets. When tasks required the identification of far-away targets, local participants often used the augmented embodiment's simulated ray-casting ability (see Figure 13). The following exchanges illustrate the difference between the two embodiment conditions; participants were able to come to a shared understanding with much less effort using the augmented embodiment.

Telepointer condition:
LP: Somewhere up there... I can't reach very far but... [stretches out of chair to reach as far as possible]
RP: Yeah…
LP: [sits down and retracts hand] Up along the river.

Enhanced embodiment condition:
LP: Up along the river, in the middle, there. [at limit of reach, raises hand to use ray-casting, stays seated]
RP: OK

Figure 13: Left, reaching for a distant target in the plain condition; centre, reaching in the enhanced condition; right, hovering over an area in the enhanced condition.

Controlling specificity.
Participants using the enhanced embodiment were clearly aware of the way that height was shown to the other person, and exercised much more control over gesture height than in the telepointer condition. For example, in several sessions, the local participant avoided touching the table unless they intended a high level of specificity. In the telepointer condition, however, participants were indiscriminate about the height of their gestures, since there was only one level of specificity in the embodiment. This did not cause a problem for the telepointer condition (although we wonder what would happen in mixed-presence settings), but indicates that participants made use of the additional expressive capabilities of the embodiment when these were available.

Interpretation of height-augmented embodiments
We analysed the video data looking for examples in which the additional visual information of the augmented embodiments caused clutter, confusion, or errors. In no case did the extra information cause any noticeable problem, nor did any participant report such problems in the post-session interview. We paid particular attention to issues of distraction and occlusion for the ellipse visualization, but the transparency of this effect appeared to successfully remove any difficulties for the participants. It was also clear that remote participants made use of the features of the augmented embodiment in their interpretations. Evidence for the value of the visualizations came out in the interviews, where it was clear that remote participants understood the visual effects and wanted the local participant to make use of them. For example, one remote participant remarked, "I was hoping that she [the local participant] would use the difference between the ball [ellipsoid] and the pointer more than she did."

We note that gestures in this study were accompanied with speech, which likely limited interpretation problems (for either condition), in that accompanying speech could often be used to disambiguate a gesture. This indicates that (as expected) the information conveyed by height visualization may not be critical to the successful completion of the task; rather, the augmentations add subtlety and expressiveness to the overall interaction. (The fact that people did notice and appreciate these changes is shown in the preference data.)

Preferences and Participant Comments
When asked about their preferences after the session, 14 of the 16 participants stated that they preferred the height-augmented embodiments. In addition, remote participants universally preferred the height-augmented version. In one typical response a participant described the visualization as "really nice, especially the trail left behind." The touch traces were appreciated by both participants: one remote participant said that the "pathways were a lot easier [to see]" with touch traces; a local user agreed, stating: "I liked the afterwards line. That was useful for generating paths." The two participants who preferred the plain telepointer were local participants; both indicated that the plain telepointer was more familiar and therefore more desirable.

Other analyses
Analysis of the post-conversation questionnaires (using RM-ANOVA) found no significant differences between the two embodiment conditions (all p>0.05). We did not expect major differences in these measures because of the high variance introduced by the unconstrained task.
DISCUSSION
Our evaluations show four main results:
• Visualizing touch, hover, and contact traces significantly improved people's ability to interpret the type of gesture, and the targets of multi-point gestures;
• Visualizing height above the table allowed gestures to convey subjective qualities (e.g., specificity or emphasis), and these qualities were interpreted consistently;
• Touch, trace, and height visualizations were all easy to learn and use in realistic collaboration, and people made frequent use of the visualizations when gesturing;
• People strongly preferred the augmented embodiments.

Explanation and interpretation of main findings
Why height visualizations worked. There are three main reasons why the height-augmented embodiments were successful in our studies. First, and most obviously, there are several situations where useful and important information is encoded in a gesture's height – different types of gestures use different heights near the surface, specific actions such as tapping have characteristic height variations, and subjective qualities are strongly associated with height above the table. When height information is available in the embodiment, people get a richer representation of what is intended in a gesture. Although verbal communication can sometimes make up for gaps in this visual information (see below), augmented embodiments can allow distributed communication to be simpler and more subtle.

Second, our studies made it clear that remote embodiments on large tables and on complex workspaces are much less obvious and much less visible than real arms in co-located settings. This problem has been noted for other kinds of distributed groupware, but becomes more acute when tabletops are large (as in our studies). The lack of basic visibility makes it more difficult to determine when certain events (such as touching the table) occur, even with accompanying speech. The additional visual information of the augmented embodiments helped with this visibility problem – the change in representation when touching, the contact traces, and the ellipse all provided more noticeable representations that aided interpretation. Contact traces were found to be particularly valuable in our studies, suggesting that it is difficult for observers to see and remember the details of a remote gesture.
Traces of past activity have previously been shown to be valuable for other reasons (e.g., they are an 'awareness buffer' when attention is divided among several tasks [1,23]), but here we show that traces can also be useful when they are specific to a particular height – people in our studies knew that traces applied only to surface gestures. In future, we will explore the value of providing different traces at different heights, to determine whether we can obtain a broader benefit from this general approach.

Does talk remove the need for height information? Our third study showed that even when people used a simple telepointer, they were still able to carry out their tasks and come to collaborative understanding, primarily through their accompanying verbal interaction. However, this does not mean that height visualizations are not valuable, for two reasons. First, there are many situations where verbal interaction is constrained – e.g., a gesture may be produced by a third person outside of the main conversation, participants may be trying to work on a task as quickly as possible, or a gesture may be intended as a subtle side-channel communication that should not be noticed by other participants. In these situations, height visualizations can allow people to convey far more information than what can be produced with a simple telepointer.

Second, even when speech is available, the added information of the augmented embodiment can help to reduce communicative effort. For example, the tapping-in-the-air gesture made by a participant in the third study (see description above) provided a clear indication of a location; if this were to be established through speech, the participant would have had to be more careful (e.g., more accurately coordinate the timing of the word here with the location of the telepointer). It is almost always true that verbal interaction can suffice [15]; but the added capabilities of the augmented embodiment provide more possibilities for participants to make conversation simpler and faster. A detailed explication of these improvements is left to future work, but it is indicative that participants' preferences were so strongly in favour of having height information.

Natural gestures and gesture languages. Although we intended to evaluate how successfully an embodiment can represent height to remote collaborators, it is almost impossible to fully duplicate natural gestures for a remote participant, and so gesture visualizations will always to some degree be new visual languages rather than simply a representation of the natural gesture itself. In our third study, although we observed generally more natural pointing behaviour with our enhanced telepointer, we also saw people use new behaviours to take advantage of the facilities provided by the enhancements – for example, when they used simulated ray-casting to identify out-of-reach targets or the expanding ellipse to indicate areas. This raises the question of whether it is better to try and replicate the visual appearance of real-world gestures (which can never be exact), or to produce an artificial visual language that people will have to learn (but that will cause less negative transfer from expectations of the real world). Future work will explore whether hybrid embodiments (video plus abstract representations) can provide a middle ground in this regard.

Lessons for Designers and Future Work
Although more research needs to be done in this area, two main principles from our work can already be used by designers of tabletop embodiments. First, designers should consider representing height information in remote embodiments, since height can improve interpretation and accuracy, can allow gestures to express a much wider range of subjective qualities, and is strongly preferred by users. In the near future, height information will be simple to obtain (e.g., using depth-sensing cameras), making these changes a feasible option. Second, touch visualizations and traces are useful for a variety of tasks (involving both awareness and height issues), and should be easy to add to groupware systems since touch is already sensed on many tables.
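As a sketch of the depth-camera idea mentioned above (an assumed setup, not one of the systems evaluated in this paper), fingertip height can be estimated from a calibrated top-down depth frame:

```python
import numpy as np

# Hypothetical sketch: with a depth camera mounted above the table and a
# calibrated table-plane depth, fingertip height is the difference between
# the table depth and the nearest point over the surface.

def finger_height_cm(depth_mm, table_depth_mm):
    """Estimate fingertip height from a top-down depth frame (values in mm).

    `depth_mm` is a 2D array of depth readings. The fingertip is taken to
    be the point closest to the camera; a real system would first segment
    the hand and filter sensor noise.
    """
    nearest = float(np.min(depth_mm))
    return max(0.0, (table_depth_mm - nearest) / 10.0)   # mm -> cm
```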
Our work identified the role of height visualization as a stand-alone feature of embodiments, rather than as incorporated in already-existing, high-fidelity, video-based embodiments. This has the added benefit of allowing our work to be used in the design of high-efficiency, abstract embodiments intended for deployment where bandwidth or computational power are limiting factors.

In future work, we will examine how incorporating our height visualization affects communication with both abstract embodiments and hybrid embodiments [6] (combined video and abstract visualizations). We are also curious to see whether significant behavioural changes occur when just the contact traces and touching information are present in an embodiment, or whether the height visualization is also needed. Finally, our work currently applies only to 2D display surfaces. There are a number of tabletop interaction designs that use false 3D or incorporate tangible components. We are interested to see how we can represent height when the display surface is no longer a uniform height.

CONCLUSION
The height of a pointing gesture above a table is an important part of collaborative communication. We evaluated how adding abstract height information to embodiments can improve the use and understanding of gestures in distributed collaboration. We found that embodiments can improve the accuracy of gesture and target identification, and that information about the height above the table allows users to consistently interpret subjective qualities of the gesture. Our evidence suggests that more information about the height of the gesture, both where it is and where it has been, can improve distributed communication and collaboration. We found that in realistic collaboration, users adapted quickly to the embodiments, and took advantage of the opportunities afforded by our designs when constructing their gestures. The augmented embodiments were strongly preferred. This work shows that height information can have important effects on the usability of distributed tabletop groupware.

REFERENCES
1. Apperley, M., McLeod, L., Masoodian, M., et al. Use of video shadow for small group interaction awareness on a large interactive display surface. Proc. AUIC 2003, Volume 18, 81–90.
2. Bekker, M., Olson, J., and Olson, G. Analysis of gestures in face-to-face design teams provides guidance for how to use groupware in design. Proc. DIS 1995, 157–166.
3. Dix, A., Finlay, J., Abowd, G., and Beale, R. Human-Computer Interaction. New York: Prentice Hall, 1993.
4. Fraser, M., McCarthy, M., Shaukat, M., and Smith, P. Seconds matter: improving distributed coordination by tracking and visualizing display trajectories. Proc. CHI 2007, 1303–1312.
5. Fussell, S., Setlock, L., Yang, J., Ou, J., Mauer, E., and Kramer, A. Gestures over video streams to support remote collaboration on physical tasks. HCI, 19, 3, 2004, 273–309.
6. Genest, A. and Gutwin, C. Characterizing deixis over surfaces to improve remote embodiments. Proc. ECSCW 2011, in press. (hci.usask.ca/publications/).
7. Gutwin, C. and Penner, R. Improving interpretation of remote gestures with telepointer traces. Proc. CSCW 2002, 57–67.
8. Hayne, S., Pendergast, M., and Greenberg, S. Implementing gesturing with cursors in group support systems. J. MIS, 10, 3, 1993, 43–61.
9. Hilliges, O., Izadi, S., Wilson, A., Hodges, S., Garcia-Mendoza, A., and Butz, A. Interactions in the air: adding further depth to interactive tabletops. Proc. UIST 2009, 139–148.
10. Hindmarsh, J. and Heath, C. Embodied reference: a study of deixis in workplace interaction. J. Pragmatics, 32, 12, 2000, 1855–1878.
11. Ishii, H. and Kobayashi, M. ClearBoard: a seamless medium for shared drawing and conversation with eye contact. Proc. CHI 1992, 525–532.
12. Izadi, S., Agarwal, A., Criminisi, A., Winn, J., Blake, A., and Fitzgibbon, A. C-Slate: a multi-touch and object recognition system for remote collaboration using horizontal surfaces. Proc. Tabletop 2007, 3–10.
13. Kettebekov, S. and Sharma, R. Understanding gestures in multimodal human computer interaction. J. AI Tools, 9, 2, 2000, 205–223.
14. Kirk, D. and Stanton Fraser, D. Comparing remote gesture technologies for supporting collaborative physical tasks. Proc. CHI 2006, 1191–1200.
15. Kirk, D., Rodden, T., and Stanton Fraser, D. Turn it this way: grounding collaborative action with remote gestures. Proc. CHI 2007, 1048–1058.
16. Li, J., Wessels, A., Alem, L., and Stitzlein, C. Exploring interface with representation of gesture for remote collaboration. Proc. OZCHI 2007, 179–182.
17. Ou, J., Chen, X., Fussell, S., and Yang, J. DOVE: drawing over video environment. Proc. Multimedia, 100–101.
18. Tang, A., Neustaedter, C., and Greenberg, S. VideoArms: embodiments for mixed presence groupware. People and Computers 20, 2007, 85–102.
19. Tang, A., Pahud, M., Inkpen, K., Benko, H., Tang, J., and Buxton, B. Three's company: understanding communication channels in three-way distributed collaboration. Proc. CSCW 2010, 271–280.
20. Tang, J. and Minneman, S. VideoDraw: a video interface for collaborative drawing. ToIS, 9, 2, 1991, 170–184.
21. Tang, J. and Minneman, S. VideoWhiteboard: video shadows to support remote collaboration. Proc. CHI 1991, 315–322.
22. Wigdor, D., Williams, S., Cronin, M., et al. Ripples: utilizing per-contact visualizations to improve user interaction with touch displays. Proc. UIST 2009, 3–12.
23. Yamashita, N., Kaji, K., Kuzuoka, H., and Hirata, K. Improving visibility of remote gestures in distributed tabletop collaboration. Proc. CSCW 2011, 98–104.