Online Development of Assistive Robot Behaviors for Collaborative Manipulation and Human-Robot Teamwork
Bradley Hayes and Brian Scassellati
Dept. of Computer Science, Yale University

Human-robot teaming has the potential to enable robots to perform well beyond their current limited and isolated roles. Many modern robotics advances remain inapplicable in domains where tasks are too complex to encode properly, beyond current hardware limitations, too sensitive for non-human completion, or too flexible for static automation practices. In these situations, human-robot teaming can be leveraged to improve the efficiency, quality of life, and safety of human workers. We aim to create collaborative robots that provide assistance when useful, take over dull or undesirable responsibilities when possible, and help with dangerous tasks when feasible.

Task Comprehension

Tasks are learned by observing action sequences and building SMDPs from recorded environment states and their associated action-based transitions. These SMDPs are converted into goal-centric Hierarchical Task Networks (HTNs), whose vertices indicate intermediate goals to be achieved during the task's execution. Each goal state represents the collection of possible actions that may be taken from a valid previous task state to progress the activity.

Task Execution

Task progress is measured by determining which task goal the agent(s) are attempting to satisfy. We use active tools, occupied workspace areas, and workpiece-specific features alongside the task network to determine action intent and to identify any unexpected deviations.

To facilitate the proper application of assistive behaviors, task execution is modeled as a multi-agent, goal-directed POMDP. Special measures must be taken in multi-agent scenarios, particularly when human-in-the-loop coordination is involved. In these situations, standard DEC-POMDP state estimation techniques do not apply to most practical problems, as solving within the human-robot collaboration domain is NEXP-complete.

Figure: Partial view of the goal-directed POMDP for assembling an IKEA chair. Speech bubbles denote observations from state transitions.

Assistive Behaviors

As with task learning, once an HTN is known we can learn SMDPs for many types of assistive behaviors through kinesthetic demonstration. These learned behaviors may then be associated with the HTN goal state that is active at the time of training. The result of this behavior learning process is a system capable of following a co-worker's progress through a task and rendering assistance or guidance when necessary.
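As a concrete illustration of this association step, the sketch below indexes learned behaviors by the HTN goal state active during training and retrieves candidates once task tracking identifies the co-worker's current goal. The class, method, goal, and behavior names are illustrative assumptions, not part of the described system.

    from collections import defaultdict

    class AssistiveBehaviorLibrary:
        """Index learned assistive behaviors by the HTN goal active at training time.

        Illustrative sketch: behaviors are treated as opaque objects (here,
        strings); in the described system they would be learned SMDPs.
        """

        def __init__(self):
            # goal_id -> behaviors demonstrated while that goal was active
            self._behaviors = defaultdict(list)

        def record_demonstration(self, active_goal_id, learned_behavior):
            """Associate a newly learned behavior with the currently active HTN goal."""
            self._behaviors[active_goal_id].append(learned_behavior)

        def candidates_for(self, current_goal_id):
            """Behaviors applicable to the goal the co-worker is currently pursuing."""
            return list(self._behaviors.get(current_goal_id, []))

    # Hypothetical goal and behavior names from the chair-assembly example:
    library = AssistiveBehaviorLibrary()
    library.record_demonstration("secure_board_a", "hold_chair_frame")
    library.record_demonstration("place_seat", "retrieve_seat")
    print(library.candidates_for("secure_board_a"))  # ['hold_chair_frame']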
Figure: A goal-centric Hierarchical Task Network describing the goal steps for assembling an IKEA chair (Place Board A, Secure Board A, Place Seat, Place Board B, Secure Board B), with each goal labeled Robot Only, Human Only, Mixed, Either, or Requires Both, and an attached assistive-action SMDP (Get Frame, Give Frame, Hold Chair, spanning states such as Chair Face-up, Frame Taken, Frame Placed, and Seat Placed).

Materials Retrieval

Retrieval behaviors can be generated autonomously by using HTN state information to identify required materials. Once these are identified, a robot assistant may perform a pick-and-place action to retrieve items for co-workers. Individual user preference dictates how invasive this behavior is, ranging from placing the item nearby to a direct handover.

Materials Stabilization

Stabilization behaviors are synthesized by combining geometric properties of the held object with pose examples provided by the trainer. Kinesthetic guidance can provide a corrective signal for over-permissive bounds on acceptable workspace position, relative angle to the co-worker, and proximity to other workpieces.

Joint Object Manipulation

Tandem object manipulation is useful when working with unwieldy or cumbersome materials. Torque sensing can be used to detect failures (e.g., loss of tension or an abrupt unexpected force) or collaborator-initiated motion. Joint manipulation can also be used to enhance a co-worker's capabilities, for example by enforcing planar or axial constraints during tool usage to improve performance.

Enhancing Awareness

Materials retrieval and stabilization actions can be augmented with awareness-enhancing behaviors that do not directly interact with the environment, such as positioning a camera or mirror to provide a better view. In common assembly domains, this may include shining a laser to indicate screw placements.

Policy Optimization

A robot assistant must optimize its action policy against both high- and low-level goals. At a high level, it chooses assistive behaviors that optimize its partner's HTN execution path (subtask ordering) given the other agents and resources available. At a low level, it should minimize the time spent achieving each particular goal.

Given a task POMDP T, possible POMDP policies Tp ∈ TP, assistive behavior execution policies Ap ∈ AP, a state transition duration function d, and observations defining the probability function P describing the likelihood of a collaborator executing a given task policy, an optimal collaborator seeks to choose a set of assistive policies that maximizes:
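A minimal sketch of one such objective, assuming the assistant is scored by the expected reduction in task completion time over the collaborator's likely task policies (D is an introduced helper denoting expected completion time, obtained by summing d over the transitions of an execution):

    Ap* = argmax over Ap ∈ AP of  Σ_{Tp ∈ TP} P(Tp) · [ D(Tp) − D(Tp, Ap) ]

where D(Tp) is the expected completion time when the collaborator executes Tp unassisted, and D(Tp, Ap) is the expected completion time when the robot additionally executes the assistive policy Ap.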
Funded by the Office of Naval Research Grant #N00014-12-1-0822, "Social Hierarchical Learning".