Learning Occupancy Grids of Non-Stationary
Objects with Mobile Robots
Benson Limketkai¹, Rahul Biswas¹, and Sebastian Thrun²

¹ Stanford University, Stanford CA 94305, USA
² Carnegie Mellon University, Pittsburgh PA 15213, USA
Abstract. We propose an occupancy grid mapping algorithm for mobile robots
operating in environments where objects may change their locations over time.
Most mapping algorithms rely on a static world assumption, which cannot model
non-stationary objects (chairs, desks, . . . ). This paper describes an extension to
the well-known occupancy grid mapping technique [5,10] for learning models of
non-stationary objects. Our approach uses a map differencing technique to extract
snapshots of non-stationary objects. It then employs the expectation maximization
(EM) algorithm to learn models of these objects, and to solve the data association
problem that arises when objects are seen at different places at different points
in time. A Bayesian version of Occam’s razor is applied to determine the number
of different objects in the model. Experimental results obtained in two different
indoor environments illustrate that our approach robustly solves the data association problem, and generates accurate models of environments with non-stationary
objects.
1 Introduction
The field of robotic mapping is among the most active areas of mobile robotics research [7,15]. Virtually all existing robotic mapping algorithms rely on a static
world assumption, that is, things are not allowed to move when the robot is
acquiring a map. Dynamic effects, such as people, that might briefly obstruct
the robot’s sensors, are filtered away at best [6,17], and lead to catastrophic
mapping failures at worst. The static world assumption in robotic mapping
is motivated by the fact that even for static worlds, the mapping problem
is computationally very hard [14]. However, most natural environments possess non-stationary objects. For example, office environments contain objects
such as chairs, desks, recycling bins, and people, which frequently change
their location.
This paper describes an occupancy grid mapping algorithm capable of
modeling non-stationary objects in otherwise static environments, previously
described in [3]. Our approach assumes that objects in the environment move
sufficiently slowly that they can safely be assumed to be static for the time it
takes to build an occupancy grid map. However, their locations change over
longer time periods (e.g., from one day to another). An example of such a
situation involves an office delivery robot, which may enter offices at regular
time intervals. From one visit to the next, the configuration of the environment may have changed in unpredictable ways (e.g., chairs moved around
and in or out of a room). Since the robot may not witness the motion directly, conventional tracking techniques [2,8] are inapplicable. Instead, the
robot faces a challenging data association problem of determining the correspondence between sightings of possible objects at different locations, and at
different points in time.
Our approach is capable of learning models of such non-stationary objects from multiple observations acquired at different points in time. At regular time intervals, the robot constructs static occupancy grid maps [16].
Each map captures a “snapshot” of the environment at a specific point in
time. Changes in the environment are detected using a map differencing technique, adopted from the computer vision literature [18]. Our approach learns
models of non-stationary objects using a modified version of the expectation
maximization (EM) algorithm [9]. Since the total number of movable objects
may be unknown, our approach employs a model selection technique for determining the most plausible number of objects. In our empirical evaluation,
we found our approach to be reliable in identifying and localizing objects,
and in learning high fidelity object models. The paper provides experimental
results for two room-style environments, where a collection of natural objects
is moved over time.
2 Probabilistic Framework

2.1 Identifying Snapshots of Non-Stationary Objects
Our approach requires as input a set of occupancy grid maps of the same
environment, acquired at different points in time t. The identification (but
not the modeling!) of non-stationary objects is quite straightforward: If the
occupancy varies across maps, it is potentially part of a movable object in
those maps where the grid cell is occupied. As a first preprocessing step, our
approach pursues exactly this idea.
Our approach assumes that all non-stationary objects are located sufficiently far apart from each other. To identify object snapshots, our approach
decomposes the environmental model into a static occupancy grid map, and a
collection of smaller occupancy grid maps, one for each non-stationary object.
Non-stationary objects are identified by a relatively straightforward map differencing technique. Our approach identifies objects by finding regions that in
some of the maps are occupied, and that are free in others. If the occupancy
of a grid cell is the same in all maps, it does not belong to a non-stationary
object; instead, it is either part of a permanent free region or part of a static
object such as a wall. If the occupancy varies across maps, it is potentially
part of a non-stationary object in those maps where the grid cell is occupied.
This map differencing technique yields a set of candidate objects. A standard
low-pass computer vision filter [18] is then employed to remove noise, which is
usually found on the border of free and occupied space. The result is a list of
non-stationary “snapshots” of objects, each represented by a local occupancy
grid map. The underlying assumptions of our approach are that objects are
rigid, and that they do not touch each other when building a map.
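The differencing step described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' implementation: the function name `extract_snapshots`, the binary grids, and the 4-connected flood fill are our own simplifying choices, and the low-pass noise filter mentioned above is omitted.

```python
# Sketch of map differencing for snapshot extraction (illustrative only).
# Maps are binary occupancy grids: 1 = occupied, 0 = free.

def extract_snapshots(maps):
    """Return, for each map, the connected regions of cells that are
    occupied in that map but free in at least one other map."""
    T = len(maps)
    H, W = len(maps[0]), len(maps[0][0])
    snapshots = []
    for t in range(T):
        # A cell is a candidate object cell if it is occupied in map t
        # and its occupancy varies across the collection of maps.
        candidate = [[maps[t][r][c] == 1 and
                      any(maps[s][r][c] == 0 for s in range(T))
                      for c in range(W)] for r in range(H)]
        # Group candidate cells into snapshots via 4-connected flood fill.
        seen = [[False] * W for _ in range(H)]
        objects = []
        for r in range(H):
            for c in range(W):
                if candidate[r][c] and not seen[r][c]:
                    stack, region = [(r, c)], []
                    seen[r][c] = True
                    while stack:
                        y, x = stack.pop()
                        region.append((y, x))
                        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < H and 0 <= nx < W and
                                    candidate[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                stack.append((ny, nx))
                    objects.append(region)
        snapshots.append(objects)
    return snapshots
```

Cells that are occupied in every map (walls) or free in every map never become candidates, which is exactly the static/non-stationary split described above.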
Let us denote the total number of non-stationary objects (snapshots)
found in the t-th map by Kt , and the individual object snapshots by
\mu_t = \{\mu_{1,t}, \mu_{2,t}, \ldots, \mu_{K_t,t}\}    (1)

Here µ_{k,t} is the k-th snapshot extracted from the t-th map, where extracted
objects are arranged in no specific order. The set of all sets of object snapshots
µt will be denoted
\mu = \{\mu_1, \mu_2, \ldots, \mu_T\},    (2)
where T is the total number of maps.
2.2 Models of Non-Stationary Objects
Objects are represented by local occupancy grid maps [5,10]. Each object
is modeled by a grid θn , where 1 ≤ n ≤ N . Here N is the total number of
objects. For the time being, we assume that N is known. Below, in Section 3.2,
we address the problem of determining N from data.
The set of all object models will be denoted
\theta = \{\theta_1, \ldots, \theta_N\}    (3)
Snapshots µ are assumed to be generated from objects θ in the following way.
Suppose snapshot µk,t corresponds to object θn . The probability of observing
µk,t is given by the following Gaussian distribution:
p(\mu_{k,t} \mid \theta_n, \delta_{k,t}) \propto \exp\Big( -\frac{1}{2\sigma^2} \sum_j \big( f(\mu_{k,t}, \delta_{k,t})[j] - \theta_n[j] \big)^2 \Big)    (4)
Here δk,t is an alignment parameter, which determines the relative orientation
of the snapshot µk,t to the corresponding object θn . The aligned (rotated)
snapshot is simply denoted f (µk,t , δk,t ), where f is the alignment function.
The variable j is the index of individual grid cells in each map. The distribution (4), thus, is a Gaussian over the distance between the object model θ_n
and the aligned snapshot, using the variance parameter σ².
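Numerically, (4) is just a Gaussian in the summed squared per-cell difference. A minimal sketch (our own illustrative names; grids flattened to lists, with the alignment f assumed to be already applied):

```python
import math

def snapshot_likelihood(aligned_snapshot, theta_n, sigma=0.5):
    """Unnormalized likelihood of eq. (4). `aligned_snapshot` plays the
    role of f(mu_{k,t}, delta_{k,t}); both grids are flat lists of
    occupancy values over the same cells j."""
    sq = sum((a - b) ** 2 for a, b in zip(aligned_snapshot, theta_n))
    return math.exp(-sq / (2.0 * sigma ** 2))
```

A perfect match gives the maximal value 1, and for binary grids each additional mismatched cell multiplies the likelihood by exp(−1/(2σ²)).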
Unfortunately, we are not told which object snapshot corresponds to
which object. It will therefore prove useful to make explicit the correspondence between objects θn and object snapshots µk,t . This will be established
by the following correspondence variables
\alpha = \alpha_1, \alpha_2, \ldots, \alpha_T    (5)
Since each µt is an entire set of snapshots, each αt is in fact a function:
\alpha_t : \{1, \ldots, K_t\} \longrightarrow \{1, \ldots, N\}    (6)
A key property of each correspondence function αt is that no two object
snapshots observed at the same point in time may correspond to the same
physical object. This induces a mutual exclusivity constraint on the set of
valid correspondences:
k \neq k' \;\Longrightarrow\; \alpha_t(k) \neq \alpha_t(k')    (7)
Equation (4) can now be re-written as follows:
p(\mu_{k,t} \mid \theta, \delta_{k,t}, \alpha_t) \propto \exp\Big( -\frac{1}{2\sigma^2} \sum_j \big( f(\mu_{k,t}, \delta_{k,t})[j] - \theta_{\alpha_t(k)}[j] \big)^2 \Big)    (8)

2.3 Log-Likelihood Function
Our probabilistic object model allows us to derive a joint likelihood function
of all essential parameters, that is, the data µ, the model θ, the alignment
parameters δ, and the correspondences α. As is common in the literature, we
assume independent and identically distributed noise.
p(\mu \mid \theta, \delta, \alpha) = \prod_{t=1}^{T} \prod_{k=1}^{K_t} p(\mu_{k,t} \mid \theta, \delta_{k,t}, \alpha_t)
    \propto \prod_{t=1}^{T} \prod_{k=1}^{K_t} \exp\Big( -\frac{1}{2\sigma^2} \sum_j \big( f(\mu_{k,t}, \delta_{k,t})[j] - \theta_{\alpha_t(k)}[j] \big)^2 \Big)    (9)
The joint likelihood is then attained by observing that
p(\mu, \theta, \delta, \alpha) = p(\mu \mid \theta, \delta, \alpha)\, p(\theta, \delta, \alpha)
                               = p(\mu \mid \theta, \delta, \alpha)\, p(\theta)\, p(\delta)\, p(\alpha)    (10)
where p(θ), p(δ), and p(α) are (in the absence of data) uniformly distributed,
hence can be subsumed into the constant factor in (9). Taking the logarithm
of the result gives us
\log p(\mu, \theta, \delta, \alpha) = \text{const.} - \frac{1}{2\sigma^2} \sum_{t=1}^{T} \sum_{k=1}^{K_t} \sum_j \big( f(\mu_{k,t}, \delta_{k,t})[j] - \theta_{\alpha_t(k)}[j] \big)^2    (11)
3 Learning Object Models as Optimization

3.1 Expectation Maximization
One approach to optimize (11) would be to estimate the model θ, the alignment parameters δ, and the correspondences α all at the same time, by maximizing the joint log likelihood function. Unfortunately, all those parameters
interact, making it difficult to learn object models efficiently. Following the
standard optimization literature [9,11], our problem can be decomposed by
instead optimizing a related function, in which the correspondences are integrated out (via an expectation operator):
E_\alpha[\log p(\mu, \theta, \delta, \alpha) \mid \mu, \theta, \delta]
  = \text{const.} - \frac{1}{2\sigma^2} \sum_{t=1}^{T} \sum_{k=1}^{K_t} \sum_{n=1}^{N} p(\alpha_t(k) = n \mid \mu, \theta, \delta) \sum_j \big( f(\mu_{k,t}, \delta_{k,t})[j] - \theta_n[j] \big)^2    (12)
This function is conveniently optimized by generating a sequence of models
and alignment parameters
\langle \theta^{[1]}, \delta^{[1]} \rangle,\; \langle \theta^{[2]}, \delta^{[2]} \rangle,\; \langle \theta^{[3]}, \delta^{[3]} \rangle,\; \ldots    (13)
which gradually increase (12) until a local maximum is reached. This procedure is known as the expectation maximization algorithm [9,11], which has
been shown to converge to a local maximum of the joint likelihood (11).
Our implementation of EM involves the alternation of three steps, the
first of which implements the common expectation step in [9,11], and the
latter of which implement two separate maximization steps for the model θ,
and the alignment parameters δ, respectively.
1. Calculating expectations over correspondences. In the i-th iteration, expectations p(αt (k) = n | µ, θ, δ) for the latent correspondences are
calculated using the previously calculated model θ^{[i−1]} and alignment parameters δ^{[i−1]}. These expectations are obtained by a ratio of Gaussians,
where I is an indicator variable:
a_{k,t,n}^{[i]} = p(\alpha_t(k) = n \mid \mu, \theta^{[i-1]}, \delta_{k,t}^{[i-1]})
  = \frac{\sum_{\alpha_t} I(\alpha_t(k) = n)\, \exp\Big( -\frac{1}{2\sigma^2} \sum_j \sum_{k'} \big( f(\mu_{k',t}, \delta_{k',t}^{[i-1]})[j] - \theta_{\alpha_t(k')}^{[i-1]}[j] \big)^2 \Big)}{\sum_{\alpha_t} \exp\Big( -\frac{1}{2\sigma^2} \sum_j \sum_{k'} \big( f(\mu_{k',t}, \delta_{k',t}^{[i-1]})[j] - \theta_{\alpha_t(k')}^{[i-1]}[j] \big)^2 \Big)}    (14)
This expression is simply the probability of a specific correspondence, as
attained from our Gaussian noise model (8). For fixed object models θ^{[i−1]}
and fixed alignments δ^{[i−1]}, calculating (14) is straightforward, though
with a complexity that is exponential in the number of observed object
snapshots Kt in an individual map. See [4,12] for efficient polynomial
approximations.
2. Calculating new object models. Subsequently, the correspondences
are used to compute a new object model θ. The pleasing implication of
our iterative optimization technique is that this problem decomposes: It
can be solved separately for each pixel j of each object model θn :
\theta_n^{[i]}[j] = \frac{\sum_{t=1}^{T} \sum_{k=1}^{K_t} a_{k,t,n}^{[i]}\, f(\mu_{k,t}, \delta_{k,t}^{[i-1]})[j]}{\sum_{t=1}^{T} \sum_{k=1}^{K_t} a_{k,t,n}^{[i]}}    (15)
Thus, each occupancy value of each of the model grid cells is simply
obtained as a weighted average of the corresponding pixels of the individual
object snapshots.
3. Calculating new alignment parameters. Finally, a new set of alignment parameters δ is computed. Again, this problem decomposes into a
set of individual optimization problems, one for each object snapshot
\delta_{k,t}^{[i]} = \operatorname*{argmin}_{\delta_{k,t}} \sum_{n=1}^{N} a_{k,t,n}^{[i]} \sum_j \big( f(\mu_{k,t}, \delta_{k,t})[j] - \theta_n^{[i]}[j] \big)^2
Since the projection f is non-linear, these optimization problems are
solved using hill climbing.
The iteration of all three steps leads to a local maximum in the space of all
models θ and alignment parameters δ. This iterative optimization is the very
core of our algorithm for learning models of non-stationary objects.
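To make the three steps concrete, the loop below is a deliberately small Python sketch under simplifying assumptions: snapshots are taken as pre-aligned flat tuples (so f is the identity and the δ step is omitted), correspondences are enumerated exhaustively as in (14) — exponential in K_t, as noted above — and all names are our own, not the authors' implementation.

```python
import itertools
import math

def sq_dist(a, b):
    """Summed squared per-cell difference between two flat grids."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def em_object_models(snaps, N, sigma=0.5, iters=20):
    """EM sketch: snaps[t] is the list of snapshots from map t; returns
    N object models as flat lists of occupancy values."""
    cells = len(snaps[0][0])
    # Initialize models from the snapshots themselves (deterministic
    # here; the paper uses multiple random restarts).
    init = [s for per_map in snaps for s in per_map]
    theta = [list(init[n % len(init)]) for n in range(N)]
    for _ in range(iters):
        # E-step: responsibilities a[t][k][n] by enumerating all
        # injective correspondences alpha_t, cf. eq. (14). The mutual
        # exclusivity constraint (7) is enforced by using permutations.
        resp = []
        for per_map in snaps:
            K = len(per_map)
            weights = {}
            for alpha in itertools.permutations(range(N), K):
                e = sum(sq_dist(per_map[k], theta[alpha[k]])
                        for k in range(K))
                weights[alpha] = math.exp(-e / (2 * sigma ** 2))
            Z = sum(weights.values()) or 1.0
            resp.append([[sum(w for al, w in weights.items()
                              if al[k] == n) / Z
                          for n in range(N)] for k in range(K)])
        # M-step for theta: per-cell weighted average of the assigned
        # snapshots, cf. eq. (15).
        for n in range(N):
            num, den = [0.0] * cells, 0.0
            for t, per_map in enumerate(snaps):
                for k, s in enumerate(per_map):
                    w = resp[t][k][n]
                    den += w
                    for j in range(cells):
                        num[j] += w * s[j]
            if den > 0:
                theta[n] = [v / den for v in num]
    return theta
```

Even this toy version exhibits the behavior described above: snapshots of the same object seen in different maps (and in different list positions) are pooled into one model per object.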
In our experiments, we found that a straightforward implementation of
EM frequently led to suboptimal maps. Our algorithm therefore employs
deterministic annealing [13] to smooth the likelihood function and improve
convergence. In our case, we anneal by varying the noise variance σ in the
sensor noise model.
\sigma^{[i]} = \sigma + \gamma^i \sigma_0    (16)

with a large variance σ_0 and annealing factor γ < 1. The intuition behind this
approach is that larger variances induce a smoother likelihood function, but
ultimately result in fuzzier shape models. Smaller variances lead to crisper
maps, but at the expense of an increased number of sub-optimal local maxima. The value σ^{[i]} is used in the i-th iteration of EM. A second strategy for
avoiding local minima is multiple restarts with different random initial object
models.
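The schedule in (16) is simple enough to state directly; the parameter values below are illustrative, not taken from the paper:

```python
def annealed_sigma(i, sigma=0.1, sigma0=2.0, gamma=0.8):
    """Eq. (16): effective noise level for EM iteration i, decaying
    geometrically from roughly sigma + sigma0 toward the target sigma."""
    return sigma + (gamma ** i) * sigma0
```

Early iterations thus use a large σ^{[i]} (a smooth likelihood that is easy to climb), while later iterations sharpen toward the target σ.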
3.2 Determining the Number of Objects
A final and important component of our mapping algorithm determines the
number of objects N . So far, we have silently assumed that N is given. In
practice, we usually do not know the number of objects in advance, and have
to determine it from our data.
Clearly, the number of objects is bounded below by the number of objects
seen in each individual map, and above by the sum of all objects ever seen:
\max_t K_t \;\le\; N \;\le\; \sum_t K_t    (17)
Our approach applies a Bayesian prior for selecting the right N, effectively
transforming the learning problem into a maximum a posteriori (MAP) estimation problem. We use an exponential prior, which in log-form penalizes
Fig. 1. (a) The Pioneer robot used to collect laser range data. (b) The robotics lab
where the second data set was collected. (c) Actual images of dynamic objects used
in the second data set.
the log-likelihood in proportion to the number of objects N , with appropriate
constant penalty c. This leads to the following log posterior:
E_\alpha[\log p(\mu, \alpha \mid \theta) \mid \mu, \theta] - cN    (18)
Our approach selects N by maximizing the log posterior (18), through a
separate EM optimization for each candidate value of N . At first glance this
exhaustive search procedure might appear computationally wasteful, but in
practice N is usually small, so the optimal values can be found quickly.
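The exhaustive search over N can be sketched as follows. Here `fit_and_score` is a hypothetical placeholder for a full EM run returning the expected log-likelihood of (18); in the usage below it is replaced by a toy surrogate whose improvement saturates at four objects, which is exactly the behavior the penalty cN is designed to detect.

```python
def select_num_objects(n_min, n_max, fit_and_score, c=5.0):
    """Pick N maximizing the penalized score of eq. (18), with n_min and
    n_max given by the bounds of eq. (17)."""
    best_n, best_score = None, float("-inf")
    for n in range(n_min, n_max + 1):
        score = fit_and_score(n) - c * n  # log posterior of eq. (18)
        if score > best_score:
            best_n, best_score = n, score
    return best_n

# Toy surrogate: likelihood improves with more objects, saturating at 4.
surrogate = lambda n: -100.0 / min(n, 4)
```

With the surrogate above, `select_num_objects(1, 8, surrogate)` returns 4: beyond the true number of objects, the likelihood stops improving while the penalty keeps growing.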
4 Experimental Results
Our approach was extensively tested in simulation and physical environments.
Figure 1 shows the robot, one of the testing environments, and four non-stationary objects whose locations changed over the course of time.
Maps for our two real-world data sets are shown in Figures 2a and 3a.
Figures 2b and 3b show an overlay of the original maps, as would be obtained
using standard static-world occupancy grid mapping. Especially in the second
data set, the environment (Stanford’s robot lab) is quite cluttered. Nevertheless, our segmentation algorithm succeeded in identifying and extracting all
object snapshots correctly, in all maps. Figures 2c and 3c provide examples
of the result of map differencing, before applying our low-pass filter for removing the small areas of erroneous pixels near the boundary of occupied
and unoccupied terrain. These images illustrate that object snapshots can be
found reliably.
Our experimental results are described in more detail in [3]. In a nutshell,
our approach reliably identified individual objects, along with estimating the
‘right’ number of objects—given sufficiently many random restarts (typically
less than 5). Figure 4a shows a progression of object models for N = 4 objects, extracted from the data set shown in Figure 3a. The final set of objects
Fig. 2. (a) Four maps used for learning models of dynamic objects using a fixed
number of objects per map. (b) Overlay of optimally aligned maps. (c) Difference
map before low-pass filtering.
Fig. 3. (a) Nine maps used for learning models of dynamic objects using a variable
number of objects per map. (b) Overlay of optimally aligned maps. (c) Difference
map before low-pass filtering. In our somewhat simplified setup, the non-stationary
objects are clearly identifiable.
accurately describes the individual objects in the world, where the resulting
models are more accurate than any of the individual object snapshots used
for their identification. Figure 4b shows the log posterior (18) as a function of
N. As this graph illustrates, the ‘right’ number of objects N = 4 indeed maximizes the penalized log likelihood, hence our approach identifies the correct
Fig. 4. (a) Seven iterations of EM: The bottom row shows the learned object shape
models. (b) Graph of model score versus number of objects, used for identifying
the number of objects in the environment.
number of objects. In summary, we found our algorithm to be applicable to
the problem of learning shape models of non-stationary environments with
mobile robots.
5 Conclusion
The paper described an occupancy grid mapping algorithm for non-stationary
environments, where objects may change their locations over time. In a preprocessing stage, the algorithm extracts sets of non-stationary object snapshots from a collection of occupancy grid maps, recorded at different points
in time. The EM algorithm is applied to learn object models of the individual non-stationary objects in the world, represented as local occupancy
grid maps. The number of objects is estimated as well. Experimental results
presented in this paper demonstrate the robustness of the approach. We consistently found that high-fidelity object models were learned from multiple
sightings of the same object at different locations.
This work differs from the rich literature on robotic mapping in that it
addresses non-stationary environments. However, in its present state, our approach possesses a range of limitations which warrant future research. Our
segmentation techniques make highly restrictive assumptions on the nature
and configuration of non-stationary objects. In particular, objects must be
located sufficiently far apart to be identified as individual objects. Furthermore, objects have to move slowly enough that they are captured as static
objects in each occupancy grid map. This precludes the inclusion of fast-moving people in the map. We also believe that the same techniques can be
applied to more advanced representations than just occupancy grid maps (e.g.,
integrating multi-modal sensor input from camera images). Finally,
we note that we have recently extended this approach to include a hierarchy
of objects, where individual objects are inherited from generic shape templates of objects [1]. This hierarchical representation was found to improve
the convergence properties of the algorithm, and to yield more accurate object models.
In many ways, we believe that the issue of learning maps of dynamic
environments has received too little attention when compared with the rich
literature on mapping static environments. We believe the time is ripe to
extend our algorithms to handle dynamic effects. The present paper is a step
in this direction, into this largely unexplored problem.
References
1. D. Anguelov, R. Biswas, D. Koller, B. Limketkai, S. Sanner, and S. Thrun.
Learning hierarchical object maps of non-stationary environments with mobile
robots. In UAI-02.
2. Y. Bar-Shalom and T. E. Fortmann. Tracking and Data Association. Academic
Press, 1988.
3. R. Biswas, B. Limketkai, S. Sanner, and S. Thrun. Towards object mapping in
dynamic environments with mobile robots. In IROS-02.
4. F. Dellaert, S.M. Seitz, C. Thorpe, and S. Thrun. EM, MCMC, and chain
flipping for structure from motion with unknown correspondence. Machine
Learning, to appear.
5. A. Elfes. Sonar-based real-world mapping and navigation. IEEE Journal of
Robotics and Automation, 3(3):249–265, 1987.
6. D. Hähnel, D. Schulz, and W. Burgard. Map building with mobile robots in
populated environments. Submitted.
7. D. Kortenkamp, R.P. Bonasso, and R. Murphy, editors. AI-based Mobile Robots:
Case studies of successful robot systems. MIT Press, 1998.
8. J. MacCormick and A. Blake. A probabilistic exclusion principle for tracking
multiple objects. In ICCV-99.
9. G.J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley,
New York, 1997.
10. H. P. Moravec. Sensor fusion in certainty grids for mobile robots. AI Magazine,
9(2):61–74, 1988.
11. R.M. Neal and G.E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models. Kluwer,
1998.
12. H. Pasula, S. Russell, M. Ostland, and Y. Ritov. Tracking many objects with
many sensors. In IJCAI-99.
13. K. Rose. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of the IEEE, November
1998.
14. R. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in robotics. In Autonomous Robot Vehicles, Springer, 1990.
15. C. Thorpe and H. Durrant-Whyte. Field robots. In ISRR-01.
16. S. Thrun. A probabilistic online mapping algorithm for teams of mobile robots.
International Journal of Robotics Research, 20(5):335–363, 2001.
17. B. Yamauchi, P. Langley, A.C. Schultz, J. Grefenstette, and W. Adams. Magellan: An integrated adaptive architecture for mobile robots. TR 98-2, ISLE,
Palo Alto, CA, 1998.
18. S.W. Zucker. Region growing: Childhood and adolescence. Comput. Graphics
Image Processing, 5:382–399, 1976.