Online Development of Assistive Robot Behaviors for
Collaborative Manipulation and Human-Robot Teamwork
Bradley Hayes and Brian Scassellati
Dept. of Computer Science, Yale University
Human-robot teaming has the potential to enable robots to perform well beyond their current limited and isolated roles. Many modern robotics advances remain inapplicable in domains where tasks are too complex to properly encode, beyond current hardware capabilities, too sensitive for non-human completion, or too flexible for static automation practices.

In these situations, human-robot teaming can be leveraged to improve the efficiency, quality of life, and safety of human workers. We aim to create collaborative robots that provide assistance when useful, relieve workers of dull or undesirable responsibilities when possible, and assist with dangerous tasks when feasible.
Task Comprehension
Tasks are learned by observing action sequences and building SMDPs (semi-Markov decision processes) from recorded environment states and their associated action-based transitions. These SMDPs are converted into goal-centric Hierarchical Task Networks (HTNs), whose vertices indicate intermediate goals to be achieved during the task's execution.
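As a rough illustration of this pipeline (a sketch under assumed data structures, not the published implementation), recorded state, action, next-state, and duration tuples can be accumulated into an SMDP-style transition table, and states that recur across demonstrations can be promoted to intermediate-goal vertices of an HTN:

```python
from collections import defaultdict

class SMDPModel:
    """Minimal SMDP-style transition store built from observed demonstrations.
    Hypothetical sketch; a 'state' is any hashable summary of the environment."""

    def __init__(self):
        # (state, action) -> list of (next_state, duration) samples
        self.transitions = defaultdict(list)
        self.visit_counts = defaultdict(int)

    def record(self, state, action, next_state, duration):
        self.transitions[(state, action)].append((next_state, duration))
        self.visit_counts[next_state] += 1

    def goal_candidates(self, min_visits=3):
        """States reached in many demonstrations become candidate
        intermediate-goal vertices for a goal-centric HTN."""
        return [s for s, n in self.visit_counts.items() if n >= min_visits]

def build_htn(demonstrations, min_visits=3):
    """demonstrations: iterable of (state, action, next_state, duration) tuples."""
    model = SMDPModel()
    for state, action, next_state, duration in demonstrations:
        model.record(state, action, next_state, duration)
    goals = model.goal_candidates(min_visits)
    # Edges connect goal vertices linked by at least one observed transition.
    edges = {(s, ns) for (s, _a), samples in model.transitions.items()
             for ns, _d in samples if s in goals and ns in goals}
    return goals, edges
```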
Each goal state represents a collection of possible actions that may be taken from a valid previous task state to progress the activity.
Task Execution

Task progress is measured by determining which task goal the agent(s) are attempting to satisfy. We use active tools, occupied workspace areas, and work-piece-specific features, alongside the task network, to determine action intent and to identify any unexpected deviations.
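One way this kind of progress tracking could be realized is sketched below: each HTN goal is annotated with the tools, workspace regions, and work-piece features it implies, and incoming observations are scored against the goals reachable from the current task state, with a poor best match flagged as an unexpected deviation. The goal annotations, threshold, and function names are illustrative assumptions rather than the authors' implementation.

```python
def identify_active_goal(observation, reachable_goals, deviation_threshold=0.5):
    """observation: dict with sets under 'tools', 'regions', 'features'.
    reachable_goals: mapping goal_id -> expected dict with the same keys.
    Returns (best_goal_id, deviated_flag).  Hypothetical sketch."""
    def overlap(expected, observed):
        keys = ('tools', 'regions', 'features')
        expected_all = set().union(*(expected[k] for k in keys))
        observed_all = set().union(*(observed[k] for k in keys))
        if not expected_all:
            return 0.0
        return len(expected_all & observed_all) / len(expected_all)

    scores = {g: overlap(exp, observation) for g, exp in reachable_goals.items()}
    best_goal = max(scores, key=scores.get)
    deviated = scores[best_goal] < deviation_threshold
    return best_goal, deviated

# Example: the co-worker has picked up the screwdriver near Board A.
obs = {'tools': {'screwdriver'}, 'regions': {'board_a_area'}, 'features': {'fastener'}}
goals = {
    'secure_board_a': {'tools': {'screwdriver'}, 'regions': {'board_a_area'},
                       'features': {'fastener'}},
    'place_seat':     {'tools': set(), 'regions': {'seat_area'}, 'features': {'seat'}},
}
print(identify_active_goal(obs, goals))   # ('secure_board_a', False)
```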
[Figure: partial view of the goal-directed POMDP for assembling an IKEA chair. Speech bubbles denote observations from state transitions; transitions are labeled Robot Only, Human Only, Mixed, Either, or Requires Both.]
To facilitate the proper application of assistive behaviors, task execution is modeled as a multi-agent, goal-directed POMDP.

Special measures must be taken in multi-agent scenarios, particularly those involving human-in-the-loop coordination. In these situations, standard DEC-POMDP state estimation techniques do not apply to most practical problems, as exactly solving the resulting human-robot collaboration models is NEXP-complete.
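One practical alternative, sketched here purely as an assumption rather than the authors' algorithm, is to avoid solving the joint DEC-POMDP and instead maintain a belief over which task policy the human collaborator is executing, updating it from observations; such a belief can later play the role of P in the policy-selection objective. The observation and likelihood models below are placeholders.

```python
def update_policy_belief(belief, observation, likelihood):
    """Bayesian update of P(collaborator is executing task policy Tp).

    belief:      dict policy_id -> prior probability (sums to 1)
    likelihood:  function (observation, policy_id) -> P(observation | policy)
    Hypothetical sketch; the observation and likelihood models are assumptions."""
    posterior = {p: prior * likelihood(observation, p) for p, prior in belief.items()}
    total = sum(posterior.values())
    if total == 0.0:
        return belief                      # uninformative observation; keep the prior
    return {p: w / total for p, w in posterior.items()}

# Toy usage: two candidate orderings of the chair-assembly subtasks.
belief = {'board_a_first': 0.5, 'seat_first': 0.5}
def likelihood(obs, policy):
    # Seeing Board A being handled is more likely under the 'board_a_first' policy.
    return 0.8 if (obs == 'board_a_in_use') == (policy == 'board_a_first') else 0.2
belief = update_policy_belief(belief, 'board_a_in_use', likelihood)
print(belief)   # roughly {'board_a_first': 0.8, 'seat_first': 0.2}
```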
Assistive Behaviors
As with task learning, once an HTN is known we can learn SMDPs by kinesthetic demonstration for many types of assistive behaviors. These learned behaviors may then be associated with the HTN goal state that is active at the time of training.
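A minimal sketch (assumed structure, not the published system) of how a kinesthetically demonstrated trajectory could be stored against whichever HTN goal is active when the demonstration is given:

```python
from collections import defaultdict

class AssistiveBehaviorLibrary:
    """Maps HTN goal ids to assistive behaviors learned by demonstration.
    Hypothetical sketch; a 'behavior' here is simply a recorded trajectory."""

    def __init__(self):
        self.behaviors = defaultdict(list)

    def record_demonstration(self, active_goal, trajectory, label):
        """active_goal: goal id currently active in the HTN tracker.
        trajectory:  sequence of end-effector poses captured during kinesthetic guidance."""
        self.behaviors[active_goal].append({'label': label, 'trajectory': trajectory})

    def candidates_for(self, active_goal):
        """Assistive behaviors applicable to the collaborator's current goal."""
        return self.behaviors.get(active_goal, [])

library = AssistiveBehaviorLibrary()
library.record_demonstration('secure_board_a', trajectory=[(0.1, 0.2, 0.3)],
                             label='stabilize_board_a')
print([b['label'] for b in library.candidates_for('secure_board_a')])
```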
Materials Retrieval

Retrieval behaviors can be autonomously generated using HTN state information to identify required materials. Once identified, a robot assistant may perform a pick-and-place action to retrieve items for co-workers. Individual user preference dictates the level of invasiveness of this behavior, ranging from placing the item nearby to a direct handover.
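For instance (an illustrative sketch, with the goal metadata and request format assumed), the materials required by the collaborator's next goal can be compared against what is already within reach, and any missing items queued for pick-and-place retrieval at the preferred level of invasiveness:

```python
def plan_retrievals(next_goal, workspace_items, preference='place_nearby'):
    """next_goal: dict with a 'requires' set, e.g. {'board_b', 'fastener', 'screwdriver'}.
    workspace_items: set of items already within the co-worker's reach.
    preference: 'place_nearby' or 'handover', per the user's invasiveness setting.
    Returns a list of hypothetical pick-and-place requests."""
    missing = next_goal['requires'] - workspace_items
    return [{'item': item, 'delivery': preference} for item in sorted(missing)]

requests = plan_retrievals({'requires': {'board_b', 'fastener', 'screwdriver'}},
                           workspace_items={'screwdriver'})
print(requests)  # board_b and fastener are queued; the screwdriver is already in reach
```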
Materials Stabilization

Stabilization behaviors are synthesized from a combination of the geometric properties of the held object and pose examples provided by the trainer.
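One way such a synthesis might look, offered only as a sketch under assumed inputs (a point cloud of the work piece and a handful of demonstrated hold positions, all in the object frame), is to combine the object's principal axis with the consensus of the trainer's examples:

```python
import numpy as np

def synthesize_hold_pose(object_points, demo_positions):
    """Illustrative sketch: propose a stabilization point and approach direction.
    object_points: (N, 3) vertices or point cloud of the held object (object frame).
    demo_positions: (K, 3) hold positions demonstrated kinesthetically (object frame)."""
    object_points = np.asarray(object_points, dtype=float)
    demo_positions = np.asarray(demo_positions, dtype=float)

    # Longest extent of the object via PCA on its geometry.
    centered = object_points - object_points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    principal_axis = vt[0]

    # Hold position: consensus of the trainer's examples.
    hold_position = demo_positions.mean(axis=0)

    # Approach roughly perpendicular to the long axis so the hold resists rotation;
    # fall back to +z if the object is nearly vertical (degenerate cross product).
    approach = np.cross(principal_axis, np.array([0.0, 0.0, 1.0]))
    norm = np.linalg.norm(approach)
    approach = approach / norm if norm > 1e-6 else np.array([0.0, 0.0, 1.0])
    return hold_position, approach
```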
Joint Object Manipulation

Tandem object manipulation is useful when handling unwieldy or cumbersome materials. Torque sensing can be used to detect failures (e.g., loss of tension or an abrupt unexpected force) or collaborator-initiated motion. Kinesthetic guidance can provide a corrective signal for over-permissive bounds on acceptable workspace position, relative angle to the co-worker, and proximity to other work pieces. Joint manipulation can also be used to enhance a co-worker's capabilities, for example by enforcing planar or axial constraints during tool usage to improve performance.

Enhancing Awareness

Materials retrieval and stabilization actions can be augmented with awareness-enhancing behaviors that do not directly interact with the environment, such as positioning a camera or mirror to provide a better view. In common assembly domains, this may include shining a laser to indicate screw placements.

[Figure: a goal-centric Hierarchical Task Network describing the goal steps for the assembly of an IKEA chair, with an attached assistive-action SMDP.]
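The torque-based monitoring described under Joint Object Manipulation above might be realized roughly as follows; the thresholds, units, and sensor interface are assumptions for illustration, not values from the work itself:

```python
def classify_wrench_event(torque_history, slack_threshold=0.3, spike_threshold=4.0):
    """Classify the latest external torque reading during joint manipulation.

    torque_history: recent external torque magnitudes (Nm) at the wrist, oldest first.
    Returns one of 'nominal', 'lost_tension', 'unexpected_force', 'partner_motion'.
    Hypothetical thresholds for illustration only."""
    if len(torque_history) < 2:
        return 'nominal'
    current, previous = torque_history[-1], torque_history[-2]
    if current < slack_threshold:
        return 'lost_tension'            # object dropped or released by the partner
    if current - previous > spike_threshold:
        return 'unexpected_force'        # abrupt jolt: treat as a failure, stop safely
    if abs(current - previous) > 0.5 * spike_threshold:
        return 'partner_motion'          # gradual change: partner is repositioning
    return 'nominal'

print(classify_wrench_event([2.0, 2.1, 0.1]))   # 'lost_tension'
print(classify_wrench_event([2.0, 2.1, 7.0]))   # 'unexpected_force'
```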
The result of this behavior-learning process is a system capable of following a co-worker's progress through a task, with the ability to render assistance or guidance when necessary.

Policy Optimization

A robot assistant must optimize its action policy with respect to both high- and low-level goals. At a high level, it chooses assistive behaviors to optimize its partner's HTN execution path (subtask ordering), taking other agents and available resources into account. At a low level, it should minimize the time spent achieving each particular goal.
Given a task POMDP T, possible POMDP policies Tp ∈ TP, assistive behavior execution policies Ap ∈ AP, a state transition duration function d, and observations defining the probability function P describing the likelihood of a collaborator executing a given task policy, an optimal collaborator seeks to choose a set of assistive policies that maximizes the following objective.
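One plausible form of this objective, stated here only as a sketch consistent with the quantities above rather than as the authors' exact formulation, selects the assistive policy set that minimizes the collaborator's expected task duration:

\[
A_p^{*} \;=\; \arg\max_{A_p \subseteq AP} \; \sum_{T_p \in TP} P(T_p) \, \Bigl[ \, -\sum_{(s \to s') \in T_p} d\bigl(s, s' \mid A_p\bigr) \Bigr]
\]

Under this reading, the high-level goal corresponds to steering the collaborator toward efficient execution paths Tp, and the low-level goal to reducing each transition duration d.

Funded by the Office of Naval Research Grant #N00014-12-1-0822, “Social Hierarchical Learning”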