Action Inference: The Next Best Touch

The Next Best Touch for Model-Based Localization

Paul Hebert, Thomas Howard, Nicolas Hudson, Jeremy Ma, Joel W. Burdick
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, USA
California Institute of Technology, Pasadena, CA 91124, USA
Abstract— This paper introduces a tactile or contact method
whereby an autonomous robot equipped with suitable sensors
can choose the next sensing action involving touch in order to
accurately localize an object in its environment. The method
uses an information gain metric based on the uncertainty of the
object’s pose to determine the next best touching action. Intuitively, the optimal action is the one that is the most informative.
The action is then carried out and the state of the object’s pose
is updated using an estimator. The method is further extended
to choose the most informative action to simultaneously localize
and estimate the object’s model parameter or model class.
Results are presented both in simulation and in experiment
on the DARPA Autonomous Robotic Manipulation Software
(ARM-S) robot.
I. INTRODUCTION
We would like autonomous robots to perform dexterous
tasks such as grasping screwdrivers and household objects
such as cups, inserting cables into power outlets, and using
keys to open and unlock doors [1]–[3]. All of these tasks
require precise localization of key objects in the robot’s
frame, with tasks such as key insertion requiring localization
on the order of a millimeter. While many of these tasks could
be done with highly precise robotic systems, there is a broad
interest in performing these tasks with inexpensive systems,
which may lack the required precision. Even in known
environments, with precise visual sensors, the precision of
a closed chain robotic mechanism may be insufficient to
execute many of these tasks open loop [4]. More generally,
errors typically introduced by visual pose estimation and/or
internal kinematic errors are often too large for precise
grasping and manipulation. Instead, estimation algorithms
can refine the local pose estimate during task execution,
when physical interaction with the object occurs [5]–[7]. This
paper builds upon these estimation algorithms, and provides
an automated method to select robotic manipulation actions
which will provide increased knowledge about the system
state.
In the recent DARPA Autonomous Robotic Manipulation
Software (ARM-S) competition, teams were required to
create autonomous manipulation systems to execute various
tasks. In particular, teams were required to unlock and open a
door autonomously. An interesting observation is that almost
all teams, regardless of the underlying approach, created
actions that first re-localized the door handle (by either
touching the door or the handle) before proceeding with
key insertion. In this paper, the action selection algorithm
automatically decides which probing motions to make, as
opposed to using predefined probes, motions, or policies.

Fig. 1: Example door handle touching actions.
There are situations in which the robot may be presented
with a novel version of a known object class (e.g., imagine
a robot tasked with picking up a previously unseen screwdriver:
the robot's perception system may be able to easily recognize
the screwdriver class, but the precise dimensions will not
be known). As a consequence, it would be advantageous to
select interacting actions that would simultaneously localize
the object and estimate characteristic parameters of the
model that may impact localization. Similarly, a robot may
be required to manipulate an object of which it is unable
to discern the object’s class or category. This may occur
from visual occlusion or viewpoint dependency. This paper
presents a method to select actions that aims to both localize
and determine the object model.
Here, an action is defined as the combination of a trajectory
and a set of end conditions. The trajectory encodes the
planned open loop execution of the robot motion, and the
end conditions specify when an action terminates (e.g. upon
a certain force threshold, or when contact has occurred).

Our action selection theory chooses actions which are
expected to minimize the entropy of the relative robot-environment
belief. Accurate localization requires minimal
uncertainty about the pose. In order to minimize uncertainty,
the entropy of the state distribution should be minimized,
as distributions with low entropy are peaked and certain.
Equivalently, the information gain can be viewed as the
change in relative entropy caused by undergoing an action
and updating the belief by acquiring a new measurement.
A critical aspect of action selection for manipulation in a
poorly known environment is that the resulting manipulation
trajectory cannot be predicted deterministically. As the robot
touches and interacts with objects, manipulation feedback
controllers will cause the manipulator to deviate from the
specified trajectories depending on the object position. As
opposed to previous work on probabilistic path planning,
where planning can be executed open loop [8], the problem of action selection must also consider the closed loop
response of the system in predicting what information a
specified action will yield.
To form a complete system, the described action selection
algorithm requires both a manipulation planner to generate
feasible trajectories, and an estimation algorithm which defines
how the system belief changes with new information. Details
of these algorithms can be found in previous papers [9]. In
this paper, we consider touching actions where, after the
robot manipulator contacts an object, compliance control is
used to moderate the resulting forces.
II. LITERATURE REVIEW
Our action selection approach is based on expected information gain of each action. Generally, information gain has
been used extensively in the next best view problem, in which
a robot actively chooses where to view or look next in order
to gain the most knowledge of the problem at hand. In visual
active search, Davison [10] used information theory with
Gaussian uncertainty models for guided image processing,
such as feature tracking. In the domain of active recognition,
early work by Arbel [11] on gaze planning used an entropy
map to create a trajectory that maximizes disambiguation
of object recognition. In recent work, [12] optimizes a cost
metric based on information gain to determine the best action
for model identification and pose estimation of a certain
object. In terms of model generation, Krainin [13] uses the
change in entropy to determine the next best view in order
to build a 3D surface model of a grasped object.
Information gain has been extensively used in robotic
motion planning and exploration. Thrun, Fox and Burgard
[8] proposed to actively sense and navigate in a localization
context of occupancy grids by minimizing entropy to determine
the next best exploring actions. A major difference from
our work is that, for each choice of action, we make
no assumption that the action will complete. More recently,
in the domain of exploration, Stachniss et al. [14] also use
the expected information gain to determine the best action
to explore. A Rao-Blackwellized particle filter is used to
localize and map the environment. The novelty of that work
is computing the expected gain on two random variables: the
pose of the robot and the map of the environment. However,
they too assume that each action will complete.
There is a vast literature in kinesthetic and tactile estimation. Early works include Grimson [15], which used
tactile sensors to identify and localize polyhedral objects,
and Allen [16], which combined vision with tactile sensors
to identify and localize objects. Gadeyne [17] uses a Markov-based
approach for force-controlled robots and demonstrated
localization on a box. Gorges [18] used tactile and force
sensors to explore an object using skills of continuous and
discrete movements (for example, following an edge). Recently, tactile localization has become increasingly prevalent
with the advent of better sensors. Corcoran and Platt [6],
[19] introduce a particle filter and a novel measurement
model for localizing an object in hand. Lastly, Petrovskaya
[5] localizes a box based on tactile sensors and a novel
particle filter; our work adopts their measurement model.
These works did not present an automated way to interact
with the object. However, Hsiao’s [20] tactile exploration of
objects implemented a decision-theoretic approach and an
approximate POMDP, instead of information theory, to select
actions for exploration. Schneider [21] uses a bag-of-features
approach that combines tactile sensors with an information
gain to determine a grasping strategy for object identification.
Our contribution is automating touch-based localization
by exploiting information theory to determine the next
best touching action. Also, unlike previous work with
information-gain-based action selection, we consider actions
as nondeterministic, since the interaction with the object is
stochastic due to the uncertainty in the state of the object
and robot.
III. NEXT BEST TOUCH
The next best touch algorithm requires a method to estimate
the state of the robot and object and their uncertainties.
While our approach is general, our implemented
state representation x consists of both the state of the robot
x_R and the state of the object x_o: x = {x_R, x_o}. The
state uncertainty is compared against a threshold to determine
whether it is acceptable for accurate grasping or manipulation. If not,
candidate actions are generated, which specify how the
robot should interact with the object to gain information.
From these candidate actions, action constraints are created
in order to generate trajectories of the manipulator and
grasp mechanisms. Trajectories are broken into freespace
trajectories, which bring touching surfaces into the vicinity
of the object, and interacting action trajectories, which aim
to contact the object or nearby fiducial surfaces, such as the
table top upon which the object may rest. Each interacting
action trajectory is costed using an information gain metric.
The action that maximizes the information gain metric is
selected and executed. As the action executes, acquired
measurements are used to update the state belief. Once the
action completes, the uncertainty is then checked against a
user-defined threshold to determine if additional actions are
necessary.
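The loop sketched above — check the uncertainty against a threshold, score candidate actions, execute the best one, and update the belief — can be illustrated with a minimal sketch. All function names and the use of Shannon entropy as the uncertainty measure are our own illustrative choices, not the paper's implementation:

```python
import math

def entropy(belief):
    """Shannon entropy of a discrete belief (a list of probabilities)."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)

def next_best_touch_loop(belief, candidates, threshold,
                         expected_ig, execute, update, max_iters=10):
    """While the belief entropy exceeds a user-set threshold, pick the
    candidate action with the highest expected information gain,
    execute it, and fold the resulting measurement into the belief."""
    for _ in range(max_iters):
        if entropy(belief) < threshold:          # localized well enough
            break
        action = max(candidates, key=lambda a: expected_ig(belief, a))
        z = execute(action)                       # run action, observe
        belief = update(belief, action, z)        # Bayes update
    return belief
```

The `expected_ig`, `execute`, and `update` callables stand in for the information-gain metric, the robot's compliant touching action, and the estimator developed in the sections below.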
A. Estimation
Localization requires a state estimation framework which
forward predicts the state of the world, x_t, based on a possible
input action a_i, and updates the belief based on new
measurements, z. We assume a general Bayes' filter, which
operates in two steps: dynamic prediction and update. The
prediction step uses a generative dynamic model to propagate
the state:

  bel(x_{t+1} | a_i) = \int_{x_t} p(x_{t+1} | x_t, a_i) bel(x_t) dx_t ,   (1)

where bel(x_t) = p(x_t | z_{1:t}, u_{1:t-1}) denotes the belief of state x
given measurements z and actions u up to time t.

The update step uses the measurement models described
in section III-D, consisting of both tactile and contact
measurements, which take positive and negative information
(positive information indicates contact occurs and negative
information indicates lack of contact) into account. After a
measurement, the state is then updated in the usual manner:

  bel(x_{t+1} | a_i) = p(z | x_{t+1}) bel(x_{t+1} | a_i)
                       / \int_{x_{t+1}} p(z | x_{t+1}, a_i) bel(x_{t+1} | a_i) dx_{t+1} .   (2)

While our formulation does not require a specific type of
filter, a discrete histogram filter [22] is chosen for our
implementation. In this histogram filter, the continuous state
is binned into cells/divisions which form a partition of the
state space and which are used in a discrete Bayes' filter.
Naturally, the histogram filter grows exponentially with state
size and bin size. A bin size is chosen according to the
desired accuracy of localization.

B. Object and Robot Models

The generation of candidate actions requires models of the
object o and the robot R. Both are modeled using a standard
polygonal mesh model M consisting of a fixed number of
faces {F_i} (i = 1, . . . , n_F), edges {E_j} (j = 1, . . . , n_E),
and vertices {V_k} (k = 1, . . . , n_V). A reference frame F_o is
rigidly attached to the object model.

C. Generate Candidate Actions

Candidate touching actions are chosen in the following
manner and are comprised of a motion direction and a hand
pose. First, surfaces on the object model M_o are chosen
(which may also be selected by the user) and the normals
of each surface are chosen as the touching directions d_i
(i = 1, . . . , n_d). The hand pose during touching
actions is chosen by specifying surfaces of the robot's
model M_R (which may be the palm and/or the surfaces of
the fingers), which are associated with a contact sensor. The
normals of these surfaces are then aligned with the touching
direction (shown in figure 3). Since this construction does not
lead to a unique hand pose, the minimum rotation from the
hand's initial pose to the aligned action direction is chosen.
To provide a range of hand poses, since some may not be
feasible, the rotation along the aligned axis is sampled to
provide a number of possible orientations about the touching
action direction. Figure 2 illustrates some possible ways the
hand may contact the object. Actions in which the hand
collides with other objects are pruned out of the candidate
touching actions.

In addition, human users may add extra information by
specifying which surfaces/edges of the object and hand
are advantageous to contact; similarly, surfaces that are
undesirable to contact may be specified. These surfaces may
be added to or removed from the candidate action generation.

Fig. 2: Candidate touching action directions and surfaces to
contact.

Fig. 3: Touching action directions. Ellipses represent pose
uncertainty and black dots represent the hypothetical points
of contact with the object.
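On a histogram belief, the prediction and update steps of Equations (1)–(2) reduce to a matrix-vector product and a pointwise reweighting. A minimal discrete sketch (variable names and the transition-matrix layout are ours, for illustration only):

```python
def predict(belief, transition):
    """Prediction step of Eq. (1) on a discrete (histogram) belief:
    bel(x') = sum_x p(x' | x, a) bel(x).  Here transition[x][x2] plays
    the role of p(x_{t+1} | x_t, a_i) for the chosen action."""
    n = len(belief)
    return [sum(transition[x][x2] * belief[x] for x in range(n))
            for x2 in range(n)]

def update(belief, likelihood):
    """Measurement update of Eq. (2): multiply each cell by the
    measurement likelihood p(z | x) and renormalize."""
    post = [likelihood[x] * belief[x] for x in range(len(belief))]
    s = sum(post)
    return [p / s for p in post]
```

Note that, as the text observes, the number of cells (and hence the cost of both steps) grows exponentially with the state dimension.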
D. Information Gain
The relative entropy of the state belief at time t is the
Kullback-Leibler divergence of the posterior belief bel(x_{t+1})
and the prior belief bel(x_t):

  IG = \int_x bel(x_{t+1}) log [ bel(x_{t+1}) / bel(x_t) ] dx ,   (3)

i.e., IG measures how much information is gained by taking
a specific measurement. In order to calculate the posterior
belief, a new measurement is used to update the prior belief.
However, in the context of finding the next best action
a*, these future measurements ẑ are always hypothetical.
Since the specific value of these measurements cannot be
known, the expectation of the relative entropy with respect
to hypothetical measurements ẑ yields the expected gain in
information due to a candidate action:

  E_ẑ[IG(ẑ, a_i)]                                                        (4)
    = \int_ẑ \int_x p(ẑ) bel(x_{t+1} | a_i) log [ bel(x_{t+1} | a_i) / bel(x_t) ] dx dẑ
    = \int_ẑ \int_x p(ẑ) ( p(ẑ | x_{t+1}) bel(x_{t+1} | a_i) / p(ẑ) ) log [ bel(x_{t+1} | a_i) / bel(x_t) ] dx dẑ
    = \int_ẑ \int_x p(ẑ | x_{t+1}) bel(x_{t+1} | a_i) log [ bel(x_{t+1} | a_i) / bel(x_t) ] dx dẑ ,

where p(ẑ | x_{t+1}) is the likelihood of the corresponding
measurement model, bel(x_{t+1} | a_i) = p(x_{t+1} | x_t, a_i) bel(x_t)
is the intermediate belief after the state prediction of action
a_i, and p(x_{t+1} | x_t, a_i) is the prediction likelihood relating to
the dynamics governing the robot's and object's movement.
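One common discrete reading of the expectation in Equation (4) is the expected KL divergence between the measurement-updated belief and the prior, averaged over hypothetical measurements ẑ (i.e., the mutual information between ẑ and x). A sketch under that reading, with illustrative names and assuming the predict step has already produced the intermediate belief:

```python
import math

def expected_info_gain(prior, likelihood):
    """Discrete analogue of Eq. (4): prior[x] is the intermediate
    belief bel(x_{t+1} | a_i); likelihood[z][x] is p(ẑ = z | x).
    Returns sum_z p(z) * KL(posterior_z || prior)."""
    n = len(prior)
    gain = 0.0
    for z in range(len(likelihood)):             # hypothetical measurements ẑ
        # joint p(ẑ, x) = p(ẑ | x) bel(x | a_i)
        joint = [likelihood[z][x] * prior[x] for x in range(n)]
        pz = sum(joint)                          # marginal p(ẑ)
        if pz == 0.0:
            continue
        post = [j / pz for j in joint]           # belief after observing ẑ
        gain += pz * sum(p * math.log(p / prior[x])
                         for x, p in enumerate(post) if p > 0.0)
    return gain
```

A perfectly informative measurement over a uniform two-cell belief yields log 2 nats; an uninformative one yields zero.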
Similarly, the time and location of the end of a probing
action (the moment of contact) is also unknown, as the object's
location is uncertain. Hence the information gain must take
into account the uncertainty in the time of contact. We let τ
denote the time/location at which collision occurs along the
action direction. τ may simply be sampled uniformly along
the action direction.

  E_{τ,ẑ}[IG(ẑ, τ, a_i)]                                                 (5)
    = Σ_τ p(τ) \int_ẑ \int_x p(ẑ | x_{t+τ}) bel(x_{t+τ} | a_i) log [ bel(x_{t+τ} | a_i) / bel(x_t) ] dx dẑ ,

where p(τ) is the probability density of ending the action at
time τ, and may be found using:

  p(τ) = Σ_{x_{t+τ}} C(τ) bel(x_{t+τ})
         / Σ_τ Σ_{x_{t+τ}} C(τ) bel(x_{t+τ}) .                           (6)

The binary function C(τ) simply indicates the presence or
absence of collision between the two bodies (represented
by polygonal mesh models, M) at time τ, namely the
manipulator at state x_R and the object to touch at state
x_o. The state of collision can be determined by simulating
contact using a contact detection algorithm:

  c = C(x, τ) = C( M_o(x^o_{t+τ}), M_R(x^R_{t+τ}) ) .                    (7)
A world model with accompanying computational mechanics
algorithms is used to support the collision detection
algorithm. For this work, we interfaced with the third-party
Bullet¹ software. Polygonal mesh models of the objects
and robot are used with Bullet to detect collisions.
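Three small pieces of this machinery — the contact-time prior of Equation (6), the binary contact model that section III-D introduces below as Equation (9), and the action-selection rule of Equation (12) — can be sketched in a few lines. The function names and the data layout (collision flags and beliefs indexed by step, then by state cell) are ours, not the paper's:

```python
def contact_time_prior(collide, beliefs):
    """Eq. (6): p(tau) is proportional to the belief mass in collision
    at step tau.  collide[t][x] is the binary C of Eq. (7) from a
    collision checker; beliefs[t][x] is the predicted belief at step t."""
    mass = [sum(c * b for c, b in zip(collide[t], beliefs[t]))
            for t in range(len(collide))]
    total = sum(mass)
    return [m / total for m in mass]

def contact_likelihood(z1, in_collision, alpha, beta):
    """Binary detection model of Eq. (9): alpha is the false-positive
    rate, beta the false-negative rate."""
    if in_collision:
        return 1.0 - beta if z1 else beta
    return alpha if z1 else 1.0 - alpha

def best_action(actions, info_gain, cost, gamma):
    """Selection rule of Eq. (12): maximize IG(a) - gamma * C(a)."""
    return max(actions, key=lambda a: info_gain(a) - gamma * cost(a))
```

Raising `gamma` trades information gain against action cost: with a small `gamma` the most informative action wins, while a large `gamma` favors cheap actions.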
The measurement ẑ is comprised of contact (ẑ_1 ∈ R^1) and
tactile (ẑ_2 ∈ R^3) measurements which are typically used in
touching. While a contact measurement could be determined
solely from tactile measurements, force-torque sensors in the
manipulator can be used instead to infer contact. Therefore,
the measurement likelihood, which combines tactile and
other force sensors, can be described as:

  p(ẑ | x_{t+τ}) = p(ẑ_1, ẑ_2 | x_{t+τ})
                 = p(ẑ_2 | ẑ_1, x_{t+τ}) p(ẑ_1 | x_{t+τ}) .              (8)

The contact measurement likelihood can be described using
the following binary detection model of an imperfect
measurement process:

  p(ẑ_1 | x_{t+τ}) = { P(ẑ_1 = 0 | c = 1) = β
                       P(ẑ_1 = 1 | c = 1) = 1 − β
                       P(ẑ_1 = 0 | c = 0) = 1 − α
                       P(ẑ_1 = 1 | c = 0) = α ,                          (9)

where α and β are the false positive and false negative error
rates, respectively, and c is a binary variable that indicates
an intersection/collision of the object and robot hand; this
can be simulated using the contact detection method, C.

Similarly, the tactile measurement likelihood is adopted
from [5] and can be decomposed into the contact position
ẑ_{2p} and the contact normal ẑ_{2n} by independence:
p(ẑ_2 | ẑ_1, x_{t+τ}) = p(ẑ_{2p} | ẑ_1, x_{t+τ}) p(ẑ_{2n} | ẑ_1, x_{t+τ}), with

  p(ẑ_{2p} | ẑ_1 = 1, x_{t+τ})
    = (1 / √(2π) σ_p) exp( −(1/2) dist(ẑ_p, F_i(M_o(x^o_{t+τ})))² / σ_p² )   (10)

  p(ẑ_{2n} | ẑ_1 = 1, x_{t+τ})
    = (1 / √(2π) σ_n) exp( −(1/2) ‖ẑ_n − n_i(M_o(x^o_{t+τ}))‖² / σ_n² ) ,    (11)

where F_i(M_o(x^o_{t+τ})) and n_i(M_o(x^o_{t+τ})) are the face and
normal most likely to cause the measurements at x^o_{t+τ} given
the mesh M_o.

The best action a* may be selected as the one that causes the
highest expected information gain IG while minimizing some
cost C(a) accrued by taking that action:

  a* = arg max_a  IG(a) − γ C(a) ,                                       (12)

where γ is a constant to specify the relative importance of
action costs to information gain.

¹ Bullet Physics Engine (http://bulletphysics.org/wordpress/)

IV. NEXT BEST TOUCH WITH POORLY KNOWN OR UNKNOWN MODELS

In addition to touch-based localization, model class or
model parameter estimation may be necessary. For example,
from certain views the robot may be incapable of determining
whether a particular object might be of a particular subclass
of similar objects. A simple example is a mug/cup that may
or may not have a handle, as shown in Figure 4. As such,
without knowing whether the object has a handle, or if it is not
possible to view the handle, it might be particularly beneficial
to touch the object before a grasp is selected and executed
in order to detect the presence of the handle.

Fig. 4: Two similar objects except for color and a discernible
feature, such as a handle. (a) Mug with a handle. (b) Cup
without handle.

Instead of exploiting the relative entropy of the belief
bel(x_t) of the object state for the next best touch in object
localization, we exploit the relative entropy of the joint belief
of the object state and model parameter m or model class M.

A. NBT with Unknown Model Parameter

Similar mesh models M within an object model class M
have been shown to be parametrizable using the Active Shape
Model (ASM) developed by Cootes [23]. An example of
a parametrizable object, a screwdriver, is shown in
figure 5, where the parameters are shown. These parameters
may be unknown a priori and must be estimated during
touch exploration. Therefore, a joint belief bel(x, m) should be
computed to estimate both the object state and the model
parameter. This may be accomplished using a dual estimation
technique or jointly by augmenting the state vector to include
the model parameters.

Fig. 5: Left image: screwdriver that may be parametrized by
multiple parameters. Right image: mesh of screwdriver with
radius as parameter m1.

The information gain of Equation (4) is modified to use
a joint belief, and an analogous derivation is carried out:

  E_ẑ[IG(ẑ, a_i, m)]                                                     (13)
    = \int_ẑ \int_x p(ẑ) bel(x_{t+1}, m | a_i) log [ bel(x_{t+1}, m | a_i) / bel(x_t, m) ] dx dẑ
    = \int_ẑ \int_x p(ẑ | x_{t+1}, m) bel(x_{t+1}, m | a_i) log [ bel(x_{t+1}, m | a_i) / bel(x_t, m) ] dx dẑ .

As before, the end of the action is not known and the
expectation must be taken:

  E_{τ,ẑ}[IG(ẑ, τ, a_i, m)]                                              (14)
    = Σ_τ p(τ) \int_ẑ \int_x p(ẑ | x_{t+τ}, m) bel(x_{t+τ}, m | a_i) log [ bel(x_{t+τ}, m | a_i) / bel(x_t, m) ] dx dẑ .

B. NBT with Unknown Model Class

As before in section IV-A, a joint belief must be computed,
consisting of the state x and the model class M, bel(x, M).
The joint information gain may then be computed as:

  E_τ[IG(τ, a_i, M)]                                                     (15)
    = Σ_τ p(τ) Σ_M \int_{x_{t+τ}} bel(x_{t+τ}, M | a_i) log [ bel(x_{t+τ}, M | a_i) / bel(x_t, M) ] dx .

The joint belief may be broken into two probabilities using
the product rule. The first is the state probability and the
second is the probability of the model:

  bel(x_{t+τ}, M) = p(x_{t+τ} | x_t, M, z_{1:τ}, u_{1:t}) p(M | z_{1:τ}, u_{1:t})
                  = bel(x_{t+τ} | M) p(M | z_{1:τ}, u_{1:t}) .           (16)

The first term is simply the belief of the state x under
model M, bel(x_t | M). The second term is the model probability
and can be found using Bayes' rule:

  p(M | z_{1:τ}, u_{1:t}) = p(z_τ | z_{1:t}, u_{1:t}, M) p(M | z_{1:t}, u_{1:t})
                            / Σ_M p(z_τ | z_{1:t}, u_{1:t}, M) p(M | z_{1:t}, u_{1:t}) .   (17)

The first term in Equation (17) is the data likelihood and is
the normalizer in the measurement update Equation (2) of the
Bayes' filter, but under model class M. The data likelihood
may be found by marginalizing over the state given model M:

  p(z_τ | z_{1:t}, u_{1:t}, M) = \int_{x_{t+τ}} p(z_τ | x_{t+τ}, z_{1:t}, u_{1:t}, M)
                                 p(x_{t+τ} | z_{1:t}, u_{1:t}, M) dx_{t+τ} .   (18)

The first term of Equation (18) is simply the measurement
model likelihood, by making the Markov assumption that past
and future data are independent if the state is known. The
second term is simply the intermediate belief, bel(x_{t+τ}).
The second term in Equation (17) is the model probability
from the previous time step and hence can be computed
recursively. Lastly, the denominator of Equation (17) is a
normalizer (p(z_τ)) and need not be computed explicitly for
updating the model probability.

The expected information gain Equation (15) can be
simplified into a form which can be readily computed.
Substituting bel(x_{t+τ}, M) from Equation (16) into Equation (15)
allows simplifications similar to previous derivations. First, the
conditional belief on the model class can be substituted with
the prediction and update steps of the Bayes' filter. Next, since
the measurement z_τ is not known, the hypothetical measurements
ẑ must be integrated out in the expected information gain. Lastly,
the variables (z_{1:t} and u_{1:t}) are omitted for clarity in the
model probability p(M | z_{1:τ}, u_{1:t}):

  E_{τ,ẑ}[IG(ẑ, τ, a_i, M)]                                              (19)
    = Σ_τ p(τ) Σ_M \int_ẑ \int_{x_{t+τ}} p(ẑ | x_{t+τ}, M) bel(x_{t+τ} | M, a_i) p(M)
      log [ p(ẑ | x_{t+τ}, M) bel(x_{t+τ} | M, a_i) p(M) / ( bel(x_t, M) p(ẑ) ) ] .

Equation (19) can be computed using the measurement
model p(ẑ | x_{t+τ}, M), which may use the binary contact
model in Equation (9). The discrete prior model probability is
denoted by p(M) (and may be chosen as uniform) and the prior
belief conditioned on the model is denoted by bel(x_t, M).

V. SIMULATED AND EXPERIMENTAL RESULTS

The method is demonstrated in simulation and via experiment
on the DARPA ARM-S robot system with the task
of localizing a door handle and consequently opening the
door. The state is chosen to be the spatial location of the
door handle (x, y, z). The extension to unknown models is
validated on the problem of determining the class of an object
when the object's discernible feature is out of view.

A. Simulation

To calculate the information gain of each action, each
action must be simulated. For each action, the belief is
propagated using a series of null contact measurements
(ẑ_{1:(τ−1)} = 0) just prior to the time/location (τ) at which
the information gain is to be calculated. At contact time τ,
all possible measurements are used in Equation (14). Figure 6
illustrates the belief distribution changing as a result of null
contact measurements, and finally a positive contact. The
change in belief is most evident in panel 5. When a positive
contact is made (panel 6), the states in contact increase in
belief and the states not in contact decrease in belief.

Fig. 6: The belief distribution after a series of null contact
measurements (panels 1-5) and a positive contact at panel 6.
Note the disappearance of belief states as the hand descends
in 1-5, the change in belief below the hand in panel 5 and
the peaked distribution in panel 6. Blue indicates low belief
and red indicates high belief.

Figure 7 shows the best action determined by the algorithm
given various initial prior distributions. Given diffuse beliefs,
respectively in the z and y axis, Figures 7a and 7b show, as
one might expect, the action which provides the most
information gain about the door handle pose. However, Figures 7c
and 7d illustrate that the greatest information gain does not
necessarily accrue in the direction of the largest uncertainty.
As a result, both the shape of the object and the hand pose
play a role in the information gain calculation, as shown by
the action chosen in Figure 7d.

Fig. 7: Four example choices of prior belief (left) and the
corresponding best action (right) chosen by the method. The
posterior belief (right) after a positive contact is also shown.
Blue indicates low belief and red indicates high belief.
(a) Diffuse prior along z axis. (b) Diffuse prior along y axis.
(c) Diffuse prior along diagonal z-y axis. (d) Diffuse prior
along diagonal z-y axis.

Next best touch for an unknown model is validated in
simulation using the example of determining the handle radius
parameter of a screwdriver (m1 in Figure 5) while localizing in
x, y, z. Three actions are chosen: (1) touching the handle
from above, (2) pinching the handle, and (3) caging and
touching the fiducial surface, the table. Figure 8c shows
the information gain for the 3 actions. Figure 8d shows
the covariance evolution for each action. After the three
actions, the method discovers the correct radius and localizes
the screwdriver.

Fig. 8: 3 feasible candidate actions to determine location and
model parameter. The blue cloud indicates the discrete belief
of the position and the handle radius of the screwdriver.
(a) Screwdriver touching action a1 - descending down in z.
(b) Table touching action a3 and pinching action a2.
(c) Information gain values for actions 1, 2, and 3.
(d) Covariance evolution of screwdriver belief after each action.

Next best touch with an unknown model class was validated
in simulation using the example problem of discerning
between the mug and cup shown in Figure 4. The main
problem occurs when the discernible handle feature is out
of view, as shown in Figure 9. Three candidate actions are
chosen to illustrate and validate this method: (1) touching
the handle, (2) touching the side, and (3) touching the top
of the mug. The first action is touching the discernible
feature, the handle. These actions are shown in Figure 10. As
expected, the action aiming to touch the handle has a much
larger information gain than the other actions, as shown in
Figure 10d. The model class probabilities for these actions
are shown in Figure 11. Action 1 quickly differentiates the
model class probability, while the second and third actions
do not impact the model class probability since neither
interacts with the distinguishing feature.

Fig. 9: Two similar objects (mug and cup) undiscernable
based on shape from this viewpoint. (a) Mug with a handle.
(b) Cup without handle.

Fig. 10: Three chosen actions for model class determination
and the associated information gains. (a) Action 1 touching
the handle of the mug. (b) Action 2 touching the side of the
mug. (c) Action 3 touching the top of the mug.
(d) Information gain for 3 actions.

Fig. 11: Model class probability evolution for the mug and
cup along each action.

B. Experiment

The algorithm was tested on the DARPA ARM-S robot
with the task of opening a door. The door and door handle
were both initially detected and their poses registered using
vision. The door was then displaced manually by the
experimenter to add additional error. The prior distribution
in this experiment was chosen as uniform to add additional
uncertainty in the location of the door handle. Note that this
choice of prior does not preclude using uncertainty
models relating to the vision system used. The hand motion
was planned to a starting location near the initially estimated
door handle location. From there, action constraints and
subsequent candidate actions were generated using the method
described above. These actions were pruned for collisions
with other objects in the scene. A video of an experiment
can be viewed at: http://robotics.caltech.edu/~hebert/nbt.mpg.

Figure 12 illustrates the sequence of touching actions and
the corresponding updated beliefs. A series of three touching
actions was determined by the algorithm. The first action
touched from the side. The subsequent two touching actions
were initiated from above the door handle but in different
locations. During each action, the belief is continually
updated with contact measurements. The evolution of the belief
after each action is also shown in the upper right column.
The belief was marginalized over the state x for plotting
clarity. As the belief peaked, the uncertainty decreased past
the set threshold, at which point the door was then opened. The
bottom right of Figure 12 shows the decrease in uncertainty
(the sum of the eigenvalues of the covariance matrix) at each
touching action and the uncertainty threshold.
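The model-class probability evolution of Figure 11 comes from the recursive Bayes-rule update of Equation (17): each model's previous probability is multiplied by its data likelihood (the Bayes-filter normalizer under that model) and the result is renormalized. A minimal discrete sketch, with hypothetical names:

```python
def update_model_probability(prior_m, data_likelihood):
    """Recursive model-class update of Eq. (17): prior_m maps each
    model M to its previous p(M); data_likelihood maps M to the data
    likelihood p(z | M) under that model's Bayes filter."""
    post = {m: data_likelihood[m] * p for m, p in prior_m.items()}
    s = sum(post.values())          # normalizer p(z); need not be explicit
    return {m: p / s for m, p in post.items()}
```

For instance, starting from a uniform prior over mug and cup, a measurement nine times more likely under the mug model shifts p(mug) from 0.5 to 0.9 in a single update, mirroring how Action 1 quickly differentiates the model class.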
VI. CONCLUSION

The paper presents an automated approach to determine
the next best action to localize an object by touch. Candidate
touching actions are first generated and are comprised of a
motion direction and a hand posture. The motion directions
are chosen based on the normals of the model faces. The
motion directions are then paired with desired hand surfaces
to contact. Multiple feasible hand postures are coupled with
a manipulator trajectory generator. These trajectories are
sorted and then costed using an information gain metric.
After the selection of the best action, that action is executed
and the belief is updated accordingly with the measurements
acquired during the action. This process is repeated until the
covariance of the object's pose is below a threshold set by the
user. Both simulation and experiments on the DARPA ARM-S
robot show that the method produces reasonable plans and
does indeed reduce uncertainty. The method is also extended
to poorly known and unknown models, where simulation has
validated the approach.

Fig. 12: Belief distribution and covariance evolution of
door handle localization. The initial distribution is chosen
as uniform. The first action touches the side of the door
handle; the next two actions touch from the top at different
locations. The belief map in the upper right column was
marginalized over x for clarity. The sum of the eigenvalues
of the covariance matrix is plotted at each action update.
The uncertainty decreases until the threshold (blue horizontal
line) is reached, at which point the door is then opened.

ACKNOWLEDGMENTS

The authors wish to acknowledge the support of the work
by the Natural Sciences and Engineering Research Council
of Canada (NSERC) and DARPA's Autonomous Robotic
Manipulation Software Track (ARM-S) program through an
agreement with NASA. © 2013. All rights reserved.

REFERENCES
[1] A. Saxena, J. Driemeyer, and A. Y. Ng, “Robotic grasping of novel
objects using vision,” IJRR, 2008.
[2] W. Meeussen, M. Wise, S. Glaser, S. Chitta, C. McGann, P. Mihelich,
E. Marder-Eppstein, M. Muja, V. Eruhimov, T. Foote, J. Hsu, R. Rusu,
B. Marthi, G. Bradski, K. Konolige, B. Gerkey, and E. Berger,
“Autonomous door opening and plugging in with a personal robot,”
in ICRA, May 2010.
[3] A. Petrovskaya and A. Y. Ng, “Probabilistic mobile manipulation in
dynamic environments, with application to opening doors,” in IJCAI,
Hyderabad, India, 2007.
[4] P. Hebert, N. Hudson, J. Ma, T. Howard, T. Fuchs, M. Bajracharya,
and J. Burdick, "Combined shape, appearance and silhouette for
simultaneous manipulator and object tracking," in ICRA, May 2012.
[5] A. Petrovskaya, O. Khatib, S. Thrun, and A. Ng, “Bayesian estimation
for autonomous object manipulation based on tactile sensors,” in ICRA,
May 2006.
[6] C. Corcoran and R. Platt, “Tracking object pose and shape during
robot manipulation based on tactile information,” in ICRA, Jan 2010.
[7] P. Hebert, N. Hudson, J. Ma, and J. Burdick, “Fusion of stereo
vision, force-torque, and joint sensors for estimation of in-hand object
location,” in ICRA, May 2011.
[8] D. Fox, W. Burgard, and S. Thrun, "Active Markov localization for
mobile robots," JRAS, Jan 1998.
[9] N. Hudson, T. Howard, J. Ma, A. Jain, M. Bajracharya, S. Myint,
C. Kuo, L. Matthies, P. Backes, P. Hebert, T. Fuchs, and J. Burdick,
“End-to-end dexterous manipulation with deliberate interactive estimation,” in ICRA, May 2012.
[10] A. Davison, “Active search for real-time vision,” in ICCV, October
2005.
[11] T. Arbel and F. P. Ferrie, “Entropy-based gaze planning,” in JIVC,
1999, pp. 779–786.
[12] J. Ma and J. Burdick, “Dynamic sensor planning with stereo for model
identification on a mobile platform,” ICRA, Jan 2010.
[13] M. Krainin, B. Curless, and D. Fox, "Autonomous generation of
complete 3D object models using next best view manipulation planning,"
in ICRA, Jan 2011.
[14] C. Stachniss, G. Grisetti, and W. Burgard, “Information gain-based
exploration using Rao-Blackwellized particle filters,” in Proceedings
of Robotics: Science and Systems, Cambridge, USA, June 2005.
[15] W. E. L. Grimson and T. Lozano-Perez, “Model-based recognition and
localization from sparse range or tactile data,” IJRR, vol. 3, no. 3, pp.
3–35, 1984.
[16] P. Allen and R. Bajcsy, “Object recognition using vision and touch,”
in IJCAI, 1985.
[17] K. Gadeyne, "Markov techniques for object localization with force-controlled robots," ICAR, 2004.
[18] N. Gorges, S. Gaa, and H. Worn, “Object exploration with a humanoid
robot using tactile and kinesthetic feedback,” in ICINCO, 2008.
[19] C. Corcoran and R. Platt, "A measurement model for tracking hand-object state during dexterous manipulation," in ICRA, Jan 2010.
[20] K. Hsiao, L. Kaelbling, and T. Lozano-Perez, “Task-driven tactile exploration,” in Proceedings of Robotics: Science and Systems, Zaragoza,
Spain, June 2010.
[21] A. Schneider, J. Sturm, C. Stachniss, M. Reisert, H. Burkhardt, and
W. Burgard, "Object identification with tactile sensors using bag-of-features," in IROS, 2009.
[22] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics (Intelligent
Robotics and Autonomous Agents series), 2005.
[23] T. Cootes, C. Taylor, D. Cooper, and J. Graham, “Active shape
models-their training and application,” Computer vision and image
understanding, vol. 61, no. 1, pp. 38–59, 1995.