Coordinating Communication
in Human-Robot Task Collaborations
Aaron St. Clair and Maja Matarić
Interaction Lab, Computer Science Department, Viterbi School of Engineering
University of Southern California
Los Angeles, California, USA
{astclair, mataric}@usc.edu
Abstract—In this paper we study how coordinating
communication during human-robot task collaboration can be
used to improve in situ decision-making and team performance.
The problem of generating communication actions is formulated
as a planning problem compatible with a class of pair-wise
loosely-coupled tasks and uses the notion of role assignment to
guide the robot's communication actions. We developed an
approach for planning three different types of verbal feedback
using a Markov decision process representation of the task
environment and a set of policies for representing agent roles. We
conducted a pilot experiment that compared human-human and
human-robot collaborative task performance. A study designed
to experimentally validate the approach is currently being
conducted with n=20 users to determine if the generated
communication can quantitatively improve team performance
and qualitatively improve user experience.
Keywords- human-robot collaboration; communication; human-robot teamwork
I. INTRODUCTION
We consider the human-robot interaction (HRI) problem of
a robot generating useful communicative feedback in the
course of human-robot task collaboration. As personal service
robots become increasingly competent, capable of
performing broader varieties of tasks, and working in human-centered environments, there will be an expanded need for
effective methods for collaboration. Rather than requiring users
to gain competence in operating autonomous robots via a
screen-based interface, we seek to enable robots to use human-like coordination mechanisms, specifically embodied
communication. We hypothesize that a robot teammate will
make a more effective work partner and improve quantitative
measures of team performance when supporting the natural
coordination mechanisms people use in working with each
other. Further, allowing users to interact with the robot as a
teammate rather than an operator will partially offload the
burden of coordination from the user, hopefully resulting in
quantitative improvements in performance.
We consider the problem of the robot producing
communication actions to a person with whom it is interacting,
in order to provide feedback and support in situ decision-making. First, we describe design goals for the robot's
communication actions and the formulation of the problem.
Next, we summarize implementation details and describe the
pseudo-herding task used to validate the system. We present
results from a pilot data collection involving person-person and
person-robot teams that provided domain knowledge and were
used to motivate the types of communication actions the robot
uses in the task. Finally, we discuss initial results from an ongoing user study validating the approach with human-robot
teams.
This work was supported in part by National Science Foundation (NSF)
grants CNS-0709296 and IIS-1117279, and the ONR MURI program
(N00014-09-1-1031).
II. PROBLEM
Social science literature indicates many types of human
communication behavior used during collaborative tasks,
including attentional cues to indicate an area of focus, staging
actions to maximize shared visual information, gestural and
speech cues indicating intentional goals or instructions, and
coaching actions such as feedback, encouragement, and
empathetic displays to build team rapport. Effectively
producing all of these communication actions on robots, in
real-world task environments, is not currently feasible and
would be difficult to generalize across different robot
embodiments; for example, indicating attentional focus is
different with humanoid and non-humanoid robots. Based on a
review of the relevant social science literature covering human-human collaborations, and on observations of person-person
task collaboration in our experimental setting, we focus in this
work on using speech. Speech works well across different
embodiments and is effective in communicating intent. On the
other hand, speech has obvious limitations in noisy
environments, with users with hearing or linguistic limitations,
and in certain scenarios, such as disambiguating many similar
objects. Nonetheless, speech is a natural human communication
modality that addresses a range of use cases in home and work
environments.
Our approach involves the combination of three robot
communication components: 1) the robot’s self-narration of its
activities, 2) role allocation suggestions for the user, and 3)
empathetic displays when positive and negative events occur.
The combination provides a balance of information aimed at
improving the human collaborator's situational awareness. The
second component, offering suggestions to the user, is
particularly interesting because it informs the user that the
robot is monitoring their progress and evaluating the world
from their perspective and allows the robot to potentially
influence the joint decision making of the team.
A. Related Work
Prior work has demonstrated that robots that account for the
actions of their collaborators when deciding what to do are
preferred and perceived as more intelligent [6] and that
anticipatory action can play an important role in increasing
team fluency [3]. Other work has treated the collaborative
process as a dialog, supporting verbal turn-taking and sub-task
assignment [1], [8]. Speech has been demonstrated to be an
effective input modality for commanding robots [4], [6]. Existing
work in assistive robotics on robot speech production has been
limited to specific conversational structures, such as turn-taking
[1]. Our aim is to develop a methodology allowing the robot to
issue task-relevant speech-based communication to a teammate
during a dynamic collaboration.
III. APPROACH
The task control and communication approach consists of
three constituent components: 1) the task control system, 2)
the human activity model and recognizer, and 3) the
communication planner and executive. Planning with a human
partner in the environment is distinct from planning in multi-robot scenarios mainly in how communication takes place
between teammates. Similarly, there are existing approaches
to segmenting, recognizing, and modeling human activity in
various contexts. We have developed a simplified
methodology to serve these purposes in our experimental task,
although this work could be integrated with and perhaps
improved by state-of-the-art recognition and task planning
systems. The main contribution of this paper is in the area of
planning and execution of communication actions during
collaborative task execution in HRI.
We formulate the coordination problem in two parts, as
follows. First, the robot represents the task as a Markov
decision process (MDP) in order to plan its actions in the
presence of noise. We define the MDP for the task as
$M = \{S, A, T, R\}$, where $S$ is the finite set of environment
states, $A$ is the set of task actions the robot can execute, $T$ is
a function giving a probability distribution over next states for
executing a given action in a given state, and $R$ is a function
giving the reward for each state. We assume that the robot has
a policy $\pi(s) = a$ that allows it to perform the task. In this
case, the transition
function captures the robot's uncertainty due to environmental
sources, such as sensor or motor noise, and also due to
changes the human collaborator might make in the
environment. This formulation has been successfully used
previously in human-robot collaboration [5] to coordinate the
task-actions of the robot without any communicative feedback.
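As a minimal sketch of this formulation, the Python fragment below encodes a toy MDP with explicit S, A, T, and R and derives a policy by value iteration. The three-state corridor, the slip probability of 0.2, the discount factor, and the use of value iteration itself are illustrative assumptions; the paper does not specify how the robot's task policy is obtained.

```python
from typing import Dict, List, Tuple

State = int
Action = str
# T[(s, a)] is a list of (next_state, probability) pairs.
Transitions = Dict[Tuple[State, Action], List[Tuple[State, float]]]


def expected_value(T: Transitions, V: Dict[State, float], s: State, a: Action) -> float:
    """Expected value of the successor state after taking action a in state s."""
    return sum(p * V[s2] for s2, p in T[(s, a)])


def value_iteration(S, A, T, R, gamma=0.95, tol=1e-8):
    """Return (V, policy) for a small tabular MDP with state-based rewards."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            new_v = R[s] + gamma * max(expected_value(T, V, s, a) for a in A)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    # Greedy policy pi(s) = a with respect to the converged values.
    policy = {s: max(A, key=lambda a: expected_value(T, V, s, a)) for s in S}
    return V, policy


# Hypothetical toy task: a corridor of three cells with the goal at cell 2.
S = [0, 1, 2]
A = ["left", "right"]
SLIP = 0.2  # assumed probability that a move fails (noise or partner interference)
T = {
    (0, "left"): [(0, 1.0)],
    (0, "right"): [(1, 1 - SLIP), (0, SLIP)],
    (1, "left"): [(0, 1 - SLIP), (1, SLIP)],
    (1, "right"): [(2, 1 - SLIP), (1, SLIP)],
    (2, "left"): [(2, 1.0)],
    (2, "right"): [(2, 1.0)],
}
R = {0: 0.0, 1: 0.0, 2: 1.0}

V, policy = value_iteration(S, A, T, R)
print(policy[0], policy[1])
```

In this sketch the slip probability plays the role described in the text: it folds both environmental noise and changes made by the human partner into the transition function, so the robot's policy needs no explicit model of the partner.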
In addition to ensuring that the robot can jointly perform the
task in the presence of a human partner, the robot needs to
provide verbal feedback to its partner. We focus on
communicating intended action via the mechanism of roles.
People use explicit and implicit roles in team activity and
other organized behavior. Roles have been used previously to
inform human-computer interfaces for collaboration of
multiple users [7] and extensively studied in multi-agent
systems [9]. Our pilot human-human experiments
demonstrated that people tended to assign roles to others
relative to common task-related objects and locations. Some
roles consisted of multiple discrete activities, such as
navigating to an object, picking it up, and moving it to a target
destination. To capture users’ preferential action selection, we
model roles using a set of assignable policies,
$\pi_{roles} = \{\pi_1, \pi_2, \ldots, \pi_n\}$. This set of policies is
domain-dependent and not assumed to be optimal or otherwise
sufficient for solving the task when executed individually.
Rather, the policies in
the set are a means of quantifying patterns of user behavior
over the course of the task, and of grouping similar actions
according to the roles people typically assign.
To track role use over time, we assume the robot receives a
stream of recognized actions and the agent (human or robot)
responsible for performing them. To accomplish this in our
test task, we developed a heuristic action recognition system
that monitors state transitions and agent positions. The system
maintains a multinomial distribution over the set of user roles
based on the likelihood that the user is executing each policy.
On each update, policies are reweighted based on their
agreement or disagreement with the recognized action. Based
on this recognized action and the robot’s own policy, three
types of verbal feedback are generated: self-narrative, role-allocative, and empathetic. To generate narrative feedback,
such as “I'll go take care of this,” the robot monitors its
planned action and issues verbal feedback when the selected
action changes. To generate role allocation suggestions for the
user, such as “can you take care of the painting?”, the inferred
policy of the user is queried to retrieve the user’s likely next
action given the model. Since we are assuming the robot has a
single-agent model of the task, it does not know the best pair-wise allocation of actions between itself and the user. Instead, we provide the
system with a list of action pairs that would conflict if
performed by the person and robot at the same time. If the
robot and user are expected to take conflicting actions, the
robot suggests a different role for the user, one that would not
result in a conflict. Finally, to generate empathetic feedback,
the algorithm monitors for specific state transitions that are
associated with especially good and bad outcomes. When
these state transitions occur, the robot expresses empathy
using a domain-independent positive or negative phrase
(e.g., “Great” or “Oh no”), as appropriate.
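The role tracking and feedback logic described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the multiplicative reweighting with an agreement likelihood of 0.8, the choice of the most probable non-conflicting role as the suggestion, and the role and action names (`herd_sheep`, `tend_lock`, `tend_light`) are all hypothetical.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Policy = Callable[[str], str]  # maps a state label to an action label


class RoleTracker:
    """Belief over which role policy the user is currently following."""

    def __init__(self, roles: Dict[str, Policy], agree: float = 0.8):
        self.roles = roles
        self.agree = agree  # assumed likelihood that a user acts per their role
        self.belief = {name: 1.0 / len(roles) for name in roles}

    def update(self, state: str, observed_action: str) -> None:
        """Reweight each role by agreement with the recognized user action."""
        for name, pi in self.roles.items():
            matches = pi(state) == observed_action
            self.belief[name] *= self.agree if matches else 1.0 - self.agree
        total = sum(self.belief.values())
        for name in self.belief:
            self.belief[name] /= total

    def likely_action(self, state: str) -> str:
        """Next user action predicted by the most probable role."""
        best = max(self.belief, key=self.belief.get)
        return self.roles[best](state)


def suggest_role(tracker: RoleTracker, state: str, robot_action: str,
                 conflicts: Set[Tuple[str, str]]) -> Optional[str]:
    """Return a non-conflicting role to suggest, or None if no conflict is expected."""
    if (robot_action, tracker.likely_action(state)) not in conflicts:
        return None
    ranked = sorted(tracker.belief, key=tracker.belief.get, reverse=True)
    for name in ranked:
        if (robot_action, tracker.roles[name](state)) not in conflicts:
            return name
    return None


def narrate(previous_action: Optional[str], planned_action: str) -> Optional[str]:
    """Self-narration is issued only when the robot's selected action changes."""
    if planned_action != previous_action:
        return "I'll go take care of this"
    return None


def empathize(transition: Tuple[str, str], good: Set[Tuple[str, str]],
              bad: Set[Tuple[str, str]]) -> Optional[str]:
    """Empathetic feedback for state transitions flagged as good or bad."""
    if transition in good:
        return "Great"
    if transition in bad:
        return "Oh no"
    return None


# Hypothetical roles: each is a constant policy for brevity.
roles = {
    "herd_sheep": lambda s: "herd",
    "tend_lock": lambda s: "lock",
    "tend_light": lambda s: "light",
}
# (robot_action, user_action) pairs that conflict if performed at the same time.
conflicts = {("lock", "lock"), ("light", "light")}

tracker = RoleTracker(roles)
tracker.update("s0", "lock")
tracker.update("s0", "lock")
suggestion = suggest_role(tracker, "s0", "lock", conflicts)
print(tracker.belief, suggestion)
```

Because the robot holds only a single-agent model, the conflict list is the only joint knowledge used here; the suggestion is simply some role whose predicted action avoids a listed conflict, with no claim that the resulting allocation is optimal.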
Appropriate phrases for each type of verbal feedback were
determined by having people perform the task in a small pilot
experiment. For each action, we stored a set of phrases and
randomly selected one such phrase when executing the
communication. This avoided repetitive speech, which has
been shown both to annoy users and to make the robot seem
less smart [10]. The phrases for role-allocative feedback can be
readily adjusted to be more or less
polite using methods from Politeness Theory [2]. We used
baseline phrases collected from the pilot study, with no
attempt to make them more polite. At this step there is also the
possibility for the robot to proactively perform the better
action instead of recommending that the user do it. For our
initial experiments, we assume that the robot's task control
policy is static.
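The phrase selection step can be sketched in a few lines of Python. The table below is hypothetical: only "I'll go take care of this", "Can you take care of the painting?", "Great", and "Oh no" come from the text, the remaining phrases are placeholders, and the keys `narrate`, `suggest`, `positive`, and `negative` are assumed names for the feedback types.

```python
import random
from typing import Dict, List

# Hypothetical phrase table; the second entry of each list is a placeholder.
PHRASES: Dict[str, List[str]] = {
    "narrate": ["I'll go take care of this", "I've got this one"],
    "suggest": ["Can you take care of the painting?", "Could you handle that one?"],
    "positive": ["Great", "Nice"],
    "negative": ["Oh no", "Uh oh"],
}


def pick_phrase(kind: str, rng: random.Random) -> str:
    """Choose uniformly at random among the stored phrases for one feedback type."""
    return rng.choice(PHRASES[kind])


rng = random.Random(0)  # seeded only so the sketch is reproducible
utterances = [pick_phrase("narrate", rng) for _ in range(50)]
print(utterances[:3])
```

Uniform random choice can still repeat a phrase on consecutive utterances; whether the original system guarded against that is not stated in the text.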
Figure 1. The experimental setup with a person and a Pioneer 2-AT robot.
IV. EVALUATION AND RESULTS
To validate the approach, we developed an augmented
reality environment that allows for testing a variety of tasks.
The environment consists of a set of overhead projectors that
project a merged display on the floor. Users' positions in the
room are tracked with a pair of Microsoft Kinects. We tested
our framework with a Pioneer 2-AT mobile robot and a person.
The virtual display data are provided by a task simulator that
updates the positions of virtual objects in response to the
actions of the physical agents over time. This environment
allows for simplification of agent-environment interaction
dynamics and rapid prototyping of the task experience, as well
as specifying the difficulty of the task by varying environment
parameters. We implemented a pseudo-herding task in which
many virtual sheep appear and roam around the environment.
The collaborative team's goal is to herd all the sheep one by one
and bring them to a centralized collection area as quickly
as possible. In addition, there are two timed objects, a lock and
a light, that must be activated periodically to avoid penalties.
These timing elements were incorporated to encourage
teamwork and make the task and interaction more engaging.
The set of policies representing users’ roles and the sets of
specific phrases were obtained through an initial data collection
with 6 person-person teams.
We are currently conducting a full experimental validation
of the communication system with human-robot teams in a
pseudo-herding task. The experiment is a within-subject
design, with each participant seeing both the silent robot and
the communicating robot, with order counterbalanced. The robot’s
task controller is the same in both conditions. Participants are
first introduced to the task by the experimenter and then asked
to do two trials with a robot teammate with the goal of
finishing as quickly as possible. We collect audio, video,
tracking data, and the simulated state information, and administer a
post-experiment survey asking about the robot’s value as a
teammate and about demographic information. Preliminary
results from 8 (2 female, 6 male) of n=20 participants
indicate that the mean total duration to complete the task is
lower (i.e., faster) when users collaborate with the talking
robot, with a mean time to completion of 151 seconds (SD=88)
compared to the silent robot with a mean of 169 seconds
(SD=100), although this difference is not significant, likely due
to an entrainment effect in which most users’ times improve
markedly during their second performance of the task. The
survey responses are promising, with all but one user
specifying strong agreement that the talking robot is a better
teammate than the silent robot, among other positive attributes
(see Table I).
In addition to completing the full evaluation of the task
communication system, we plan to apply and evaluate the
approach on a real-world assistive task involving physical
object manipulation with older adults. We also plan to address
methods for learning the set of role policies and verbal
references from a guided interaction and to develop methods to
adapt the robot's communication policy based on factors such
as the user’s compliance and preference for different amounts
of communication.
TABLE I. MEAN SURVEY RESULTS
Question                                                                 Mean, 0 (disagree) - 6 (agree)
The things the robot said made sense.                                    6.0
The talking robot was a better teammate than the silent robot.           5.88
The robot’s talking helped me understand what it was going to do next.   5.43
I tried to do what the robot told me to do.                              5.25
The talking robot was more fun.                                          5.0
The things the robot said helped me decide what to do.                   4.75
REFERENCES
[1] Cynthia Breazeal, Cory D Kidd, Andrea Lockerd Thomaz, Guy
Hoffman, and Matt Berlin. Effects of nonverbal communication on
efficiency and robustness in human-robot teamwork. In Intelligent
Robots and Systems, 2005 (IROS 2005). 2005 IEEE/RSJ International
Conference on, pages 708–713. IEEE, 2005.
[2] Penelope Brown. Politeness: Some universals in language usage,
volume 4. Cambridge University Press, 1987.
[3] Guy Hoffman and Cynthia Breazeal. Cost-based anticipatory action
selection for human–robot fluency. Robotics, IEEE Transactions on,
23(5):952–961, 2007.
[4] Thomas Kollar, Stefanie Tellex, Deb Roy, and Nicholas Roy. Toward
understanding natural language directions. In Human-Robot Interaction
(HRI), 2010 5th ACM/IEEE International Conference on, pages 259–
266. IEEE, 2010.
[5] Stefanos Nikolaidis and Julie Shah. Human-robot cross-training:
computational formulation, modeling and evaluation of a human team
training strategy. In Proceedings of the 8th ACM/IEEE international
conference on Human-robot interaction, pages 33–40. IEEE Press,
2013.
[6] Julie Shah, James Wiken, Brian Williams, and Cynthia Breazeal.
Improved human-robot team performance using Chaski, a human-inspired plan execution system. In Proceedings of the 6th international
conference on Human-robot interaction, pages 29–36. ACM, 2011.
[7] Randall B Smith, Ronald Hixon, and Bernard Horan.. In Collaborative
Virtual Environments, pages 160–176. Springer, 2001.
[8] J Gregory Trafton, Nicholas L Cassimatis, Magdalena D Bugajska,
Derek P Brock, Farilee E Mintz, and Alan C Schultz. Enabling effective
human-robot interaction using perspective-taking in robots. Systems,
Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions
on, 35(4):460–470, 2005.
[9] P. J. Gmytrasiewicz and P. Doshi. A framework for sequential planning
in multi-agent settings. Journal of Artificial Intelligence Research
(JAIR), 24:49–79, 2005.
[10] C. Torrey, S. R. Fussell, and S. Kiesler. How a robot should give
advice. In Human-Robot Interaction (HRI), 2013 8th ACM/IEEE
International Conference on, pages 275–282. IEEE, 2013.