Computing Compact Policies for Fully Observable Non-Deterministic Planning Problems

Alberto Camacho Martinez
MASTER THESIS UPF / 2013
SUPERVISED BY:
Prof. Hector Geffner
Department of Information and Communications Technologies
Acknowledgements
There are a number of people without whom this thesis might not have been possible, and to whom I am greatly indebted. It is not possible to mention all of them, nor every individual contribution, so I will use these lines to express my gratitude to everyone who has been involved in some way with my work during the preparation of this Master Thesis.
First, I would like to thank my MSc advisor, Hector Geffner. Many things could be said here, but I will try to be crisp and get to the point. There are many talented people in the scientific community, but only a few of them complement their talent for research with an enjoyable and pleasant manner. Hector is one of these lucky few. He has always had time to discuss new ideas and the progress of this work with me, not to mention his role in the friendly atmosphere at the department. I am grateful to him for trusting me and for offering me such an interesting line of research.
Second, I would like to thank Nir Lipovetzky, former PhD candidate, whose contributions during his predoctoral studies made my work here possible. I had the pleasure of spending some months with him at UPF, discussing his work and, most valuably, receiving advice on how to do research in planning.
Continuing with the Artificial Intelligence group at UPF, I would like to thank the rest of the team for their insights and support: Alexandre Albore, Hector Palacios, Anders Jonsson, and my two office mates Damir Lotinac and Filippos Kominis. The master would not have been such an amazing experience without the good atmosphere created by my classmates, whom I want to thank for the fun moments we shared. Last, but not least, I want to mention the support of the DTIC administration department, which saved my neck more than once with the administrative paperwork.
Abstract
Fully Observable Non-Deterministic (FOND) planning is the problem of finding action strategies for solving a planning problem assuming that the actions may have non-deterministic effects, and the states are fully observable.
In this thesis, we adapt the approach of [Geffner and Lipovetzky, 2012] to study the complexity of FOND problems: we define a parameter, the width, that captures the complexity of a problem, and we develop an algorithm that computes compact policies with complexity that is exponential in the problem width. We also show that many of the benchmark domains with atomic goals can be solved in low polynomial time with this method, and that some problems with conjunctive goals can be solved with low complexity and with efficiency comparable to that of state-of-the-art FOND planners.
Resumen
Fully observable non-deterministic (FOND) planning problems consist of finding action policies that solve a problem, assuming that actions may have non-deterministic effects and that states are fully observable. In this thesis we adapt the method used in [Geffner and Lipovetzky, 2012] to study the complexity of FOND problems: we define a parameter, the width, that captures the complexity of the problem, and we develop an algorithm that computes compact policies in time and complexity exponential in the problem width. We also show that many of the standard domains with atomic goals can be solved with this method in low polynomial time, and that some problems with non-atomic goals can be solved with low complexity, comparable to that of current planners.
Contents
List of Figures
List of Tables
I Background
1 PROBLEM STATEMENT
  1.1 Introduction
  1.2 Classical Planning
  1.3 Fully Observable Non-Deterministic Planning
  1.4 Thesis Outline
II Structure
2 A WIDTH NOTION FOR FOND PLANNING
  2.1 Motivation
  2.2 Tuple Graph
  2.3 Low Width Benchmarks
  2.4 Conclusion
3 NON-DETERMINISTIC ITERATED WIDTH
  3.1 Tuple Graph G^i
  3.2 Algorithm
  3.3 Empirical Evaluation
  3.4 Conclusion
4 SERIALIZED NDIW
  4.1 Serialized Width
  4.2 Offline S-NDIW Algorithm
  4.3 Online S-NDIW Algorithm
  4.4 Empirical Evaluation
    4.4.1 Serialized Width in Classical Problems
    4.4.2 Serialized Width in FOND Problems
  4.5 Conclusion
5 POLICY EXTRACTION
  5.1 Solution Extraction Graph
  5.2 Iterated Prune Algorithm
    5.2.1 Policy Definition
  5.3 Conclusion
III Conclusions and Future Work
6 DISCUSSION AND CONCLUSION
  6.1 Contributions
  6.2 Future Work
List of Figures
2.1 Blocks World problem
3.1 NDIW execution stats
3.2 NDIW cumulative time distribution
3.3 NDIW time to solve the triangle-tireworld problem
4.1 S-NDIW execution stats
4.2 Online S-NDIW execution stats, selected domains
List of Tables
1.1 SIW performance
1.2 State-of-the-art FOND and MDP planners
3.1 NDIW vs. IW performance in deterministic problems
3.2 NDIW performance in FOND problems with atomic goals
3.3 NDIW vs. PRP vs. FF-H+ performance
4.1 Offline S-NDIW vs. Online S-NDIW vs. SIW performance
4.2 Offline S-NDIW vs. FIP vs. PRP performance
4.3 Online S-NDIW vs. FF-H+ vs. PRP performance
Part I
Background
Chapter 1
PROBLEM STATEMENT
In this chapter we introduce the general problem of planning in artificial intelligence. We then focus on two forms of planning: classical planning and fully observable non-deterministic planning. We conclude with an explanation of the scope of this thesis.
1.1 Introduction
Planning is the problem of finding action strategies for a given problem, such that the initial state of the problem is mapped into a desired goal state [Ghallab et al., 2004], [Russell and Norvig, 2002], [Bonet and Geffner, 2001]. The model for classical planning, formulated in definition 1.1.1, is the simplest model for planning problems. The other models arise from relaxing this definition, e.g., by allowing the effects of the actions to be non-deterministic (see section 1.3).
Definition 1.1.1. A classical planning model is a tuple $\Pi = \langle S, s_0, S_G, A, F \rangle$ where:
(i) $S$ is a finite and discrete set of states (the state space).
(ii) $s_0 \in S$ is the initial state of the problem.
(iii) $S_G \subseteq S$ is a set of goal states.
(iv) $A$ is a set of actions. The set of actions applicable in a state $s$ is denoted by $A(s)$, with $A(s) \subseteq A$.
(v) $F$ is a transition function that gives the states $s'$ that may result from applying an action $a$ in a state $s$: $s' \in F(s, a)$ for $a \in A(s)$.
Planning is computationally intractable [Bylander, 1994], and the tradeoff between the quality of the solutions and the time and memory needed to find them makes planning a challenging task. During the last two decades we have seen significant improvements in the performance of planners, as witnessed by the International Planning Competition¹. Planning problems can be classified into different types according to the action effects (deterministic vs. stochastic) and the level of observability of the world (fully observable vs. partially or non-observable). While classical planning has been studied in depth and state-of-the-art planners demonstrate high performance, other types of planning problems involving non-determinism and/or partial observability still seem to have room for notable improvements [Levesque, 2005], [Kissmann and Edelkamp, 2009].
Planning Models
Planning problems define a huge state space, and enumerating all the states in the problem definition is not feasible. In order to provide a compact, computationally tractable problem definition, factored representations describe every state of the problem not as a whole entity, but as the set of predicate variables that are true in that state. That is, a state space that is intractable in size can be described with a set of variables that is bounded in size. The most used representation in the planning literature is STRIPS [Fikes and Nilsson, 1972], which represents the states of a classical planning problem through boolean variables (a.k.a. fluents, facts, or atoms) stating whether a proposition that describes the world is true or false in a given state. An action in STRIPS consists of three sets of variables: the precondition set Pre, the Add set (containing the atoms that the action makes true), and the Del set (containing the atoms that the action makes false). Definition 1.1.2 formalizes the STRIPS representation.
Definition 1.1.2. A planning problem in STRIPS is a 4-tuple $\Pi = \langle F, O, I, G \rangle$ where:
(i) $F$ is a set of boolean variables.
(ii) $O$ is a set of operators, where each $o \in O$ has the form $o = \langle Pre(o), Add(o), Del(o) \rangle$, with $Pre(o), Add(o), Del(o) \subseteq F$.
(iii) $I$ is a subset of $F$, describing the initial state.
(iv) $G$ is a subset of $F$, describing the set of goal states.

¹ The International Planning Competition (IPC) is a biennial event organized in the context of the International Conference on Automated Planning and Scheduling (ICAPS).
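As an illustration of definition 1.1.2, the following minimal Python sketch encodes STRIPS operators as sets of atoms and applies one of them. The class `Operator`, the helper names, and the one-operator `move-a-b` instance are assumptions introduced here for illustration only; they are not taken from any planner.

```python
from dataclasses import dataclass
from typing import FrozenSet

State = FrozenSet[str]  # a state is the set of atoms that are true in it


@dataclass(frozen=True)
class Operator:
    """Hypothetical STRIPS operator <Pre, Add, Del> over atoms named by strings."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def applicable(state: State, op: Operator) -> bool:
    # An operator is applicable when all its preconditions hold in the state.
    return op.pre <= state


def apply_operator(state: State, op: Operator) -> State:
    # Remove the Del atoms, then add the Add atoms.
    if not applicable(state, op):
        raise ValueError(f"{op.name} is not applicable")
    return (state - op.delete) | op.add


def satisfies(state: State, goal: FrozenSet[str]) -> bool:
    # G describes a set of goal states: every state that contains all atoms of G.
    return goal <= state


# Illustrative instance: one agent moving from location a to location b.
move = Operator("move-a-b", frozenset({"at-a"}), frozenset({"at-b"}), frozenset({"at-a"}))
init = frozenset({"at-a"})
goal = frozenset({"at-b"})
successor = apply_operator(init, move)
```

Representing states as frozen sets keeps them hashable, which is convenient when states have to be stored in the visited sets of a search.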
In the planning literature it is common to formulate problems in STRIPS form. However, other popular languages exist, such as the Planning Domain Definition Language (PDDL) [McDermott et al., 1998] and other PDDL-like languages that have been standard in the International Planning Competitions [Gerevini et al., 2009]. In PDDL, a planning problem is described by a set of predicates, variables, and constants (see listing 1.1), and this description is the input to planners. The definition of a planning problem in PDDL is divided into two descriptions, usually kept in two different files: the domain and the instance. The domain describes the types of variables, the constants, and the actions of the problem. The instance lists the variables and constants of the problem, as well as the initial state and the goal. In that manner, different configurations of the same planning problem share the same domain but have different instance files. For that reason, we will usually refer to different configurations of a planning domain as different instances of that domain.
Listing 1.1: PDDL description of the River FOND problem, consisting of the domain description and an instance description

    (define (domain river)
      (:requirements :typing :strips :non-deterministic)
      (:predicates (on-near-bank) (on-far-bank)
                   (on-island) (alive))
      (:action traverse-rocks :parameters ()
        :precondition (and (on-near-bank))
        :effect (and (not (on-near-bank))
                     (oneof
                       (on-far-bank)
                       (not (alive))
                       (on-island)
                       (on-island))))
      (:action swim-river :parameters ()
        :precondition (and (on-near-bank))
        :effect (and (not (on-near-bank))
                     (oneof
                       (and)
                       (on-far-bank))))
      (:action swim-island :parameters ()
        :precondition (and (on-island))
        :effect (and (not (on-island))
                     (oneof
                       (on-far-bank)
                       (on-far-bank)
                       (on-far-bank)
                       (on-far-bank)
                       (not (alive))))))

    (define (problem river-problem)
      (:domain river)
      (:init (on-near-bank) (alive))
      (:goal (and (on-far-bank))))
1.2 Classical Planning
Classical planning is the problem of finding a sequence of actions that maps a given initial state into a goal state, assuming that the environment and the actions are deterministic. This is the simplest model in planning, and corresponds to the case in which the transition function of definition 1.1.1 is deterministic ($F(s, a)$ is a singleton).
The solutions of a classical planning problem $P$ are plans, i.e., sequences of actions that map the initial state $I$ into a goal state satisfying $G$. The determinism of the actions makes it possible to compute a complete plan in advance (i.e., prior to execution) using only the information in the initial state, without any knowledge acquired in the intermediate states reached during the execution of the plan.
Width in Classical Planning
The structure inherent to classical planning problems has been the subject of several studies, with the aim of characterizing their complexity and finding methods for solving them. For instance, the theoretical study in [Chen and Giménez, 2007] proposes several width notions that measure the complexity of planning problems according to the number of atoms that change value during the execution of a plan. A more recent study, [Geffner and Lipovetzky, 2012], introduces a novel width concept that bounds the complexity of classical planning problems by studying the reachability of new tuples of atoms during plan execution. The same study introduces an algorithm that solves planning problems with complexity that is exponential in the problem width.
Tuple Graph
The tuple graph (in classical planning) encodes the reachability relations over tuples of atoms $t$. From now on, we use the term tuple to refer to a set of atoms of a planning problem $P$, and we write $P(t)$ for the problem that is like $P$ but with goal $t$. We assume that every action has the same cost, so optimal plans are the shortest ones.
In classical planning, every optimal plan $\pi$ has the property that every action $a$ in $\pi$ achieves a new tuple of atoms optimally. In other words, every partial plan $\pi_i$ (a prefix of $\pi$) is an optimal plan for some tuple of atoms $t_i$. The execution of $\pi$ is then a chain of state-action pairs that optimally achieves a tuple of atoms $t_i$ at every step. Inspired by this notion, we define the concept of a chain of tuples (see definition 1.2.1).
Definition 1.2.1. A chain of tuples is an ordered sequence $C : t_0 \to t_1 \to \cdots \to t_n$, where each $t_i$ is a tuple of atoms from a classical planning problem $P$. The size of $C$ is the size of the largest tuple $t_i$ in the chain. A chain $C$ is valid if $t_0$ is true in $I$ and every optimal plan for $t_i$ can be extended into an optimal plan for $t_{i+1}$ by adding a single action. A valid chain $C$ implies $G$ if all optimal plans for $t_n$ are also optimal plans for $G$.
Definition 1.2.2. We denote by $T^i$ the set of tuples $t$ from $P$ of size no greater than the integer $i$.
Definition 1.2.3. Let $P = \langle F, I, O, G \rangle$ be a classical planning problem. We define the tuple graph $\mathcal{G}^i$ as the graph with vertices from $T^i$ defined inductively as follows:
1. $t$ is a root vertex in $\mathcal{G}^i$ iff $t$ is true in $I$.
2. $t \to t'$ is a directed edge in $\mathcal{G}^i$ iff $t$ is in $\mathcal{G}^i$ and for every optimal plan $\pi$ for $P(t)$ there is an action $a \in O$ such that $\pi$ followed by $a$ is an optimal plan for $P(t')$.
The construction of the tuple graph $\mathcal{G}^i$, formalized in definition 1.2.3, connects tuples $t \in T^i$ when they form a valid chain. The paths in $\mathcal{G}^i$ are valid chains $t_0 \to \cdots \to t_m$ that imply $t_m$ and, thus, encode plans for $t_m$. Note that these valid chains are optimal in the state space of $\mathcal{G}^i$, but may encode plans that are not optimal in the state space of $P$. The key step is to identify which of the valid chains in $\mathcal{G}^i$ imply the goal $G$. When $\mathcal{G}^i$ contains a valid chain $C$ that implies $G$, a plan for $G$ exists and can be inferred from $C$, so a solution for $P$ can be found.
Definition 1.2.4. The width of a planning problem $P$, $w(P)$, is the minimum $w$ such that there exists a valid chain $C$ of size $w$ that implies the goal $G$. If $G$ is true in $I$, the width of the problem is 0.
Theorem 1.2.1. If $w(P) = w$, the tuple graph $\mathcal{G}^w$ contains a valid chain that optimally implies the goal $G$.
Proof. By definition of $w(P)$, there exists a valid chain $C$ of size $w$ that implies $G$. The tuple graph $\mathcal{G}^w$ contains all the valid chains of size no greater than $w$ and, in particular, also contains $C$.
The Iterated Width ($IW$) algorithm consists of a sequence of calls $IW(i)$, for $i = 0, 1, \ldots$, until the problem $P$ is solved or $i$ exceeds the number of variables in $P$. Each call $IW(i)$ expands the tuple graph $\mathcal{G}^i$ and checks for a valid chain that implies $G$. The $IW$ algorithm is complete, but it is not guaranteed to return an optimal solution for $P$, since $\mathcal{G}^i$ may contain a valid chain that implies $G$ for $i < w(P)$. The minimum parameter $i$ for which $IW(i)$ finds a plan for $G$ is called the effective width of the problem.
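In [Geffner and Lipovetzky, 2012], $IW(i)$ is realized as a breadth-first search that prunes every newly generated state that does not make true, for the first time, some tuple of at most $i$ atoms. The following Python sketch follows that formulation under simplifying assumptions: operators are plain `(name, pre, add, delete)` tuples of frozensets, and the function names and the four-cell corridor instance are hypothetical, introduced only for illustration.

```python
from collections import deque
from itertools import combinations


def iw(i, init, goal, operators):
    """IW(i) sketch: breadth-first search with novelty pruning.

    operators is a list of (name, pre, add, delete) with frozenset components.
    Returns a list of operator names, or None if IW(i) fails.
    """
    seen = set()  # tuples of at most i atoms made true by some generated state

    def is_novel(state):
        atoms = sorted(state)
        new = {t for k in range(1, i + 1) for t in combinations(atoms, k)} - seen
        seen.update(new)
        return bool(new)

    is_novel(init)  # record the tuples of the initial state
    queue = deque([(init, [])])
    while queue:
        state, plan = queue.popleft()
        if goal <= state:
            return plan
        for name, pre, add, delete in operators:
            if pre <= state:
                succ = (state - delete) | add
                if is_novel(succ):  # prune states that bring no new tuple
                    queue.append((succ, plan + [name]))
    return None


def iterated_width(init, goal, operators, num_atoms):
    """Call IW(0), IW(1), ... and return (i, plan) for the first call that succeeds."""
    for i in range(num_atoms + 1):
        plan = iw(i, init, goal, operators)
        if plan is not None:
            return i, plan
    return None


# Hypothetical corridor with four cells; the agent can only move right.
ops = [(f"move-{k}-{k + 1}", frozenset({f"at-{k}"}),
        frozenset({f"at-{k + 1}"}), frozenset({f"at-{k}"})) for k in range(3)]
result = iterated_width(frozenset({"at-0"}), frozenset({"at-3"}), ops, num_atoms=4)
```

In this instance $IW(0)$ fails because the goal is not true in the initial state, and $IW(1)$ succeeds because every move makes a new atom true, so the effective width reported by the sketch is 1.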
Theorem 1.2.2. A classical planning problem $P$ can be solved optimally in time that is exponential in the problem width.
Proof. The Iterated Width algorithm [Geffner and Lipovetzky, 2012] solves a classical planning problem $P$ with time complexity that is exponential in the width of the problem.
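To give a feeling for this bound, note that the number of non-empty tuples of at most $i$ atoms over $n$ atoms is $\sum_{k=1}^{i} \binom{n}{k}$, which is $O(n^i)$ for fixed $i$. The short Python sketch below computes this count; the function name `num_tuples` and the choice of 100 atoms are illustrative assumptions.

```python
from math import comb


def num_tuples(num_atoms: int, i: int) -> int:
    """Number of non-empty tuples (sets) of at most i atoms out of num_atoms."""
    return sum(comb(num_atoms, k) for k in range(1, i + 1))


# For a fixed i the count grows polynomially with the number of atoms.
counts = [num_tuples(100, i) for i in (1, 2, 3)]
```

With 100 atoms the counts for $i = 1, 2, 3$ are 100, 5050 and 166750, which is why small widths keep the tuple graph manageable even for large problems.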
The empirical tests reported in [Geffner and Lipovetzky, 2012] show that most benchmark domains with atomic goals have a small effective width, independent of the size of the problem. Unfortunately, when the goals are not atomic the width of the problem increases considerably, which suggests that the complexity of a problem is related to the number of atoms in the goal. In order to take advantage of the low width of problems with atomic goals, a serialization of the problem is proposed to tackle goals that are conjunctions of atoms.
Serialization
Several approaches exist to tackle problems whose goals are conjunctions of atoms [Chapman, 1987], [Richter and Westphal, 2010], [Hoffmann et al., 2004]. The approach presented in [Geffner and Lipovetzky, 2012] decomposes the problem into a series of subproblems that are solved sequentially. A serialization $d$ of a problem $\Pi$ with goal $G$ is a sequence of formulas $G_1, \ldots, G_m$ such that $G_0 = \emptyset$, $G_m = G$ and, in simple serializations, $G_{i+1}$ extends $G_i$ with an additional atom from $G$. A serialization $d$ defines a family of planning problems $P_d = P_1, \ldots, P_m$ such that $P_i$ is like $\Pi$ but with goal $G_i$ and initial state $s_i$, the state that results from solving the problem $P_{i-1}$ optimally.
Definition 1.2.5. Serialized Iterated Width (SIW) over a classical planning problem $\Pi = \langle F, I, O, G \rangle$ consists of a sequence of calls to IW over the subproblems $\Pi_k = \langle F, I_k, O, G_k \rangle$, $k = 1, \ldots, |G|$, where:
1. $I_1 = I$.
2. $G_k$ is the first consistent set of atoms achieved from $I_k$ such that $G_{k-1} \subset G_k \subseteq G$ and $|G_k| = k$; $G_0 = \emptyset$.
3. $I_{k+1}$ represents the state where $G_k$ is achieved, $1 < k < |G|$.
The Serialized Iterated Width ($SIW$) algorithm (formalized in definition 1.2.5) achieves the atomic goals of $P$ sequentially. In other words, the $k$-th subcall of $SIW$ tries to achieve $k$ goals from $G$: the $k - 1$ goals already achieved by the previous subcall, and one extra goal. Notice that $SIW$ does not use heuristic estimators to reach the goal, and is blind with respect to which goal $G_k$ is achieved in each subcall. The $SIW$ algorithm is sound, and the solution to $\Pi$ is obtained by concatenating the solutions of the subproblems $\Pi_1, \ldots, \Pi_m$. As with $IW$, the solution of a subproblem may be suboptimal, so the solutions of $SIW$ may not be optimal either. On the other hand, while the $IW$ algorithm is complete, $SIW$ is not, because the order in which the subgoals are serialized may lead to dead-end states. In order to reduce this risk and improve the performance of $SIW$, the atomic goals are required to be achieved, if possible, consistently: the state $s_k$ consistently achieves $G_k \subseteq G$ if $s_k$ achieves $G_k$, and $G_k$ does not need to be undone in order to achieve $G$.
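The serialization loop can be sketched in Python as follows. This is only an approximation of definition 1.2.5 under stated assumptions: a plain breadth-first search stands in for each $IW$ call, the consistency test on the achieved goals is omitted, and the function names and the two-switch instance are hypothetical.

```python
from collections import deque


def search(init, operators, test):
    """Breadth-first search for the first state that satisfies test.

    Stand-in for the IW call of each subproblem (assumption of this sketch).
    """
    queue, visited = deque([(init, [])]), {init}
    while queue:
        state, plan = queue.popleft()
        if test(state):
            return state, plan
        for name, pre, add, delete in operators:
            if pre <= state:
                succ = (state - delete) | add
                if succ not in visited:
                    visited.add(succ)
                    queue.append((succ, plan + [name]))
    return None


def serialized_search(init, goal, operators):
    """Achieve the atoms of goal one at a time, keeping those already achieved."""
    state, plan = init, []
    while not goal <= state:
        achieved = goal & state
        # Next subgoal: any strict superset of the goal atoms achieved so far.
        result = search(state, operators, lambda s, g=achieved: g < (goal & s))
        if result is None:
            return None  # the serialization reached a dead end
        state, subplan = result
        plan = plan + subplan
    return plan


# Hypothetical instance: two independent switches that must both be on.
ops = [("turn-on-a", frozenset(), frozenset({"a-on"}), frozenset()),
       ("turn-on-b", frozenset(), frozenset({"b-on"}), frozenset())]
plan = serialized_search(frozenset(), frozenset({"a-on", "b-on"}), ops)
```

Each iteration restarts the search from the state reached by the previous one, so the final plan is the concatenation of the subplans, and a failed subproblem makes the whole procedure fail, which mirrors the incompleteness discussed above.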
The complexity of a serialization $d$ is bounded by the maximum complexity among all the subproblems in $P_d$. Similarly, the complexity of the $SIW$ algorithm is bounded by the maximum complexity among all the subcalls to $IW$. The width and the effective width of a serialization follow directly from the previous definitions of the width of a problem.
The advantage of serializing a problem is that the width of a serialization (i.e., the maximum width over all subproblems) is significantly lower than the width of the problem without serialization, and this is directly related to the complexity of computing solutions for the problem. Table 1.1, extracted from [Geffner and Lipovetzky, 2012], summarizes the performance of the $SIW$ algorithm on the typical benchmark domains. Its columns show, respectively, the number of instances tested, the number of instances solved, the average time needed by $SIW$ to find a plan, the average over domains of the maximum width, and the average over domains of the average width. As shown in table 1.1, many instances can be solved with an expected complexity that is exponential in a number between 1.6 and 2.5.
Instances   Solved   Time    avg(max_w)   avg(avg_w)
1150        819      55.01   2.5          1.6

Table 1.1: SIW performance in a collection of instances of the typical benchmark domains from the IPC.
1.3 Fully Observable Non-Deterministic Planning
Fully Observable Non-Deterministic planning (FOND planning) is the problem of finding an action strategy that maps a given initial state into a goal
state, assuming that the effects of the actions are not deterministic, and the
states are fully observable.
The solutions of a FOND planning problem are policies, i.e., functions that map states into actions. Depending on the guarantees that the solutions provide, we distinguish three types of plans:
(i) weak plans may reach the goal, i.e., the goal is reached for at least one possible sequence of action outcomes.
(ii) strong plans are guaranteed to reach the goal in a bounded number of steps.
(iii) strong cyclic plans are guaranteed to reach the goal, assuming fair non-determinism but, in this case, not necessarily in a bounded number of steps. Loops may appear in strong cyclic solutions.
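The difference between a weak plan and a strong cyclic plan can be made concrete with a small Python sketch. It assumes an explicit model in which `outcomes[(s, a)]` is the set of possible successors of action `a` in state `s`, and it tests that every state reachable under the policy can still reach a goal state by following the policy. The state names, the actions `try` and `risky`, and the function names are hypothetical.

```python
def reachable(init, policy, outcomes, goals):
    """States that can be visited when following the policy from init."""
    seen, stack = {init}, [init]
    while stack:
        s = stack.pop()
        if s in goals or s not in policy:
            continue  # execution stops at goals and at states without an action
        for t in outcomes[(s, policy[s])]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def is_strong_cyclic(init, policy, outcomes, goals):
    """True iff every reachable state can reach a goal by following the policy."""
    states = reachable(init, policy, outcomes, goals)
    can_reach = {s for s in states if s in goals}
    changed = True
    while changed:  # backward fixpoint over the reachable states
        changed = False
        for s in states - can_reach:
            if s in policy and any(t in can_reach for t in outcomes[(s, policy[s])]):
                can_reach.add(s)
                changed = True
    return can_reach == states


# Hypothetical model: "try" may loop, "risky" may fall into a dead end.
outcomes = {("start", "try"): {"start", "goal"},
            ("start", "risky"): {"goal", "dead"}}
safe = is_strong_cyclic("start", {"start": "try"}, outcomes, {"goal"})
unsafe = is_strong_cyclic("start", {"start": "risky"}, outcomes, {"goal"})
```

The policy that selects `try` is strong cyclic although it may loop in `start`, whereas the policy that selects `risky` is only a weak plan, because one of its outcomes is a dead end.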
FOND Planners
Compared to classical planning, there has been much less research in FOND planning. Some of the most recent FOND planners, e.g., FIP [Fu et al., 2008], FF-H+ [Yoon et al., 2010], or PRP [Muise et al., 2012], improve on the performance of older FOND planners, but still do not seem to exploit the structure of the problems as deeply as has been done in classical planning.
Table 1.2 shows the most relevant recent FOND and MDP² planners. There is little variety among them, and several are not FOND-specific planners but MDP solvers. The fact that only one planner competed in the FOND track of the IPPC 2008 suggests that, at least at that time, there had not been many advances specific to FOND planning.
² The main difference between a Markov Decision Process (MDP) model and a FOND model is that in an MDP the transition probabilities are specified, while in a FOND model they are not. A FOND problem is easily translated into an equivalent MDP problem, so an MDP planner can be used to solve FOND problems.
Planner    Published     Track   Institution
PRP        ICAPS 2012    FOND    University of Toronto
Beaver     IPPC 2011     MDP     Oregon State University
MIT-ACL    IPPC 2011     MDP     MIT
SPUDD      IPPC 2011     MDP     University of Waterloo
Glutton    IPPC 2011     MDP     University of Washington
PROST      IPPC 2011     MDP     University of Freiburg
FF-H+      ICAPS 2010    MDP     Embedded Reasoning Area
Gamer      IPPC 2008     FOND    TU Dortmund
FIP        ICTAI 2008    FOND    University of Texas

Table 1.2: Most relevant FOND and MDP planners, from 2008 to 2012.
1.4 Thesis Outline
Various approaches have been taken to explore the structure of classical planning problems. We propose to take the approach presented in [Geffner and Lipovetzky, 2012] for classical planning and adapt it to study FOND problems.
In chapter 2 we introduce a width notion that bounds the complexity of FOND planning problems, and show theoretically that some typical benchmark domains have a low bounded width independently of the initial state and problem size, provided that the goals are restricted to single atoms.
In chapter 3 we propose an algorithm for finding solutions to FOND problems whose complexity is exponential in the problem width. We show empirically that many of the typical benchmark domains have small bounded width provided that the goals are restricted to single atoms.
In chapter 4 we define a serialization model for FOND planning to tackle FOND problems whose goals are conjunctions of atoms. Based on it, we propose two algorithms, an online planner and an offline planner, that construct a serialization of the goals of the problem. We show empirically that some benchmark domains can be solved by a suitable serialization.
Finally, in chapter 5 we propose a method for checking the existence of solutions in a graph representation of a planning problem. This method can be applied in the algorithm introduced in chapter 3 with negligible time overhead.
Part II
Structure
Chapter 2
A WIDTH NOTION FOR FOND PLANNING
In this chapter we define a width notion, together with a graph representation, that explains the complexity of Fully Observable Non-Deterministic (FOND) problems. This width notion, inspired by the work in classical planning by [Geffner and Lipovetzky, 2012], provides a new approach to understanding the complexity of FOND planning problems. Finally, we prove that some of the typical benchmark domains have low width provided that the goals of the problems are restricted to single atoms.
2.1 Motivation
The solutions of a classical planning problem are plans that map the initial state of the problem into a goal state through a sequence of intermediate states $I = s_0, s_1, \ldots, s_m = G$. None of the intermediate states $s_i$ is a dead end of the problem, because every $s_i$ makes true a certain set of atoms that makes it possible to keep advancing towards the goal.
In FOND planning, the effects of the actions are not deterministic and, given a policy $\pi$ that solves a FOND problem and guarantees that the goal is reached, no state reachable by $\pi$ is a dead end. Of course, given an action $a = \pi(s)$, some effects of $a$ may be desired because they advance towards the goal, whereas other effects of $a$ may be undesired because they move away from it. In all cases, however, every reachable state makes true a set of atoms that makes it possible to advance in the problem.
We focus on these sets of relevant atoms that, when made true, make it feasible to advance towards the goal regardless of the value of the other atoms. Each such set of atoms defines a class of states $S$: the set of states in which the atoms of the set are true. In any policy $\pi$, each reachable state belongs to a class of this form, so the transitions among states can be mapped into transitions among classes.
Inspired by this notion, we will define a class of states rt , together with a graph that encodes the transitions between the aforementioned classes. The policies of the original problem become transitions in the graph, and the structure of the subgraph defined by a policy determines which sets of atoms (and their values) make it feasible to advance in the problem until the goal state is reached.
2.2 Tuple Graph
There exist different ways to study the reachability relations between states in a planning problem, e.g., making use of heuristics [Bonet and Geffner, 2001] or of a graph representation of the problem [McDermott, 1999]. Planning graphs were first introduced by the GraphPlan planner [Blum and Furst, 1997], and are a computationally cheap way to obtain information about the structure of a problem. Every planning problem admits a trivial graph representation, in which each state is mapped into a different node of the graph, and the transitions between states are represented by directed edges between nodes. However, more abstract graph representations of a planning problem have been developed that are more convenient for studying its complexity. Based on the approach taken in [Geffner and Lipovetzky, 2012] for classical planning, in this section we define the tuple graph for FOND planning problems.
We consider FOND problems expressed in the standard form of definition 2.2.1. We assume that all action costs are 1, so the optimal plans are the shortest ones. In order to guarantee the consistency of the definitions, as well as the completeness of the algorithm that we will present, we require the representation of a FOND problem $P$ to contain the negated fluents. That is, for every literal $p$ of $P$, there also exists a literal $q$ whose value is the negation of $p$. In terms of notation, we write $P(t)$ for the planning problem that is like $P$ but with goal $t$ and, given an action $a$, we write $a_j$ for the $j$-th effect of $a$.
Definition 2.2.1 (FOND Problem in Standard Form). A FOND problem $P = \langle F, I, A, G \rangle$ is in standard form when all action costs are 1 and, for every propositional fluent $f \in F$, $F$ also contains a fluent $\bar{f}$ corresponding to the negation of $f$.
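One possible way to bring a problem into this form is sketched below in Python. The sketch is an assumption of ours rather than the procedure used in this thesis: it handles a single deterministic effect given as an `(name, pre, add, delete)` tuple (for a FOND action the same transformation would be applied to each effect $a_j$), and the `not-` naming convention and the function names are hypothetical.

```python
def neg(fluent: str) -> str:
    """Name of the fluent that stands for the negation of `fluent` (assumed naming)."""
    return "not-" + fluent


def add_negated_fluents(fluents, init, operators):
    """Return (fluents, init, operators) extended with explicit negated fluents."""
    all_fluents = set(fluents) | {neg(f) for f in fluents}
    # A negated fluent is initially true exactly when the fluent is initially false.
    new_init = set(init) | {neg(f) for f in fluents if f not in init}
    new_ops = []
    for name, pre, add, delete in operators:
        true_del = set(delete) - set(add)  # an atom both added and deleted ends up true
        new_add = set(add) | {neg(f) for f in true_del}
        new_del = true_del | {neg(f) for f in add}
        new_ops.append((name, frozenset(pre), frozenset(new_add), frozenset(new_del)))
    return frozenset(all_fluents), frozenset(new_init), new_ops


# Hypothetical instance with two fluents and one operator.
fluents, init, ops = add_negated_fluents(
    {"p", "q"}, {"p"},
    [("swap", frozenset({"p"}), frozenset({"q"}), frozenset({"p"}))])
```

After the transformation, exactly one of `f` and `not-f` is true in every reachable state, so tuples can refer to the falsity of an atom as well as to its truth.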
The nodes of the tuple graph $\mathcal{G}$ are sets of states of the form $S^\star(t)$, where $t$ is a tuple of atoms from $P$. For every tuple $t$, the states that belong to $S^\star(t)$ are the states that achieve $t$ through an optimal weak plan. The edges in $\mathcal{G}$ are of the form $S^\star(t_1) \xrightarrow{a_j} S^\star(t_2)$, and exist when for every state $s_1 \in S^\star(t_1)$ there exists $s_2 \in S^\star(t_2)$ such that $a_j(s_1)$ models $s_2$.
Finite-State Controller
Finite-state controllers are a compact action selection mechanism [Geffner and Bonet, 2013]. A finite-state controller $C_N$ with $N$ controller states $q_0, \ldots, q_{N-1}$ for a fully observable non-deterministic problem $P$ can be fully characterized by a set of tuples $(q, a, o, q')$, where $q$ and $q'$ are controller states, $a$ is an action, and $o$ is an observation. Such a tuple says to perform the action $a$ when the controller is in state $q$, and then to switch to state $q'$ if the observation is $o$.
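The following Python sketch shows how such a set of tuples can be executed. It assumes that each controller state prescribes a single action, that the observations are supplied as a list, and that a controller state with no tuples is terminal; the function names and the two-state controller are hypothetical.

```python
def compile_controller(tuples):
    """Split tuples (q, a, o, q') into an action map and a successor map."""
    action, successor = {}, {}
    for q, a, o, q_next in tuples:
        if action.setdefault(q, a) != a:
            raise ValueError(f"controller state {q!r} prescribes two actions")
        successor[(q, o)] = q_next
    return action, successor


def run_controller(tuples, q0, observations):
    """Return the final controller state and the actions executed."""
    action, successor = compile_controller(tuples)
    q, executed = q0, []
    for o in observations:
        if q not in action:  # no tuple leaves q: the controller has stopped
            break
        executed.append(action[q])
        q = successor[(q, o)]
    return q, executed


# Hypothetical controller: repeat "try" in q0 until the observation is "done".
controller = [("q0", "try", "failed", "q0"),
              ("q0", "try", "done", "q1")]
final_state, executed = run_controller(controller, "q0", ["failed", "failed", "done"])
```

The controller stays in `q0` while the action fails and moves to `q1` once it succeeds, which is the kind of loop that a strong cyclic policy encodes.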
A subgraph $\mathcal{G}'$ of the tuple graph $\mathcal{G}$ defines a finite-state controller $C_{\mathcal{G}'}$ on the problem $P$. The states of $C_{\mathcal{G}'}$ are the nodes $S^\star(t)$ of $\mathcal{G}'$, and $(S^\star(t), a, o_j, S^\star(t'))$ is a tuple in $C_{\mathcal{G}'}$ when there exists an edge $S^\star(t) \xrightarrow{a_j} S^\star(t')$ in $\mathcal{G}'$ for some effect $a_j$. Such a tuple says to perform the action $a$ when the controller is in state $S^\star(t)$, and to switch to state $S^\star(t')$ when the effect of the action $a$ is $a_j$. The resulting world state then models a state that belongs to $S^\star(t')$. Note that, for the finite-state controller to be well defined, the node $S^\star(t)$ needs to have an outgoing edge labeled with $a_i$ for every effect $a_i$ of the action $a$.
Finally, let $\Pi$ be a strong cyclic policy for $P$. We say that $\Pi$ is equivalent
to $\Pi_{\mathcal{G}'}$, and we write $\Pi \sim \Pi_{\mathcal{G}'}$, when for every state $s$ reachable by $\Pi$ and
every realization of the problem, the action given by the finite-state controller
defined by $\mathcal{G}'$ equals $\Pi(s)$.
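As an illustration of how a set of tuples $(q, a, o, q')$ drives action selection, the following is a minimal sketch rather than the implementation used in this thesis. It assumes that the observation is the index $j$ of the effect that occurred, and the example controller is a hypothetical one modelled on the Bus Fare domain of section 2.3. The function names and the `choose_effect` callback, which plays the role of the environment, are illustrative.

```python
def build_controller(tuples):
    """Split (q, a, o, q') tuples into an action table and a transition table."""
    action_of, next_state = {}, {}
    for q, a, o, q_next in tuples:
        # A well-defined controller selects a single action per controller state.
        if action_of.setdefault(q, a) != a:
            raise ValueError(f"two actions for controller state {q!r}")
        next_state[(q, o)] = q_next
    return action_of, next_state


def run_controller(tuples, q0, goal_states, choose_effect, max_steps=100):
    """Follow the controller from q0; return the trace, or None if no goal is reached."""
    action_of, next_state = build_controller(tuples)
    q, trace = q0, []
    for _ in range(max_steps):
        if q in goal_states:
            return trace
        a = action_of[q]
        o = choose_effect(q, a)  # the environment decides which effect a^j occurs
        trace.append((q, a, o))
        q = next_state[(q, o)]
    return None


# Hypothetical controller; states are named after the atom of a size-1 tuple.
BUS_FARE_TUPLES = [
    ("have-1-coin", "wash-car-1", 1, "have-1-coin"),
    ("have-1-coin", "wash-car-1", 2, "have-2-coin"),
    ("have-2-coin", "bet-coin-2", 1, "have-3-coin"),
    ("have-2-coin", "bet-coin-2", 2, "have-1-coin"),
    ("have-3-coin", "buy-fare", 1, "have-fare"),
]
```

Note that every controller state above has one outgoing tuple per effect of its action, which is the well-definedness condition stated in the text.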
Notion of Width
The world state $s$ corresponding to the realization of the actions given by $C_{\mathcal{G}'}$
is such that, when $C_{\mathcal{G}'}$ is in a state $S^*(t)$, the world state $s$ models a certain
$s' \in S^*(t)$, but not necessarily all $s'' \in S^*(t)$. However, the atoms that
are common to all the states of $S^*(t)$ are true in $s$. This set, which is a partial
state, is not empty and contains at least the atoms that are present in the
tuple $t$. We call this partial state the side effects of the tuple $t$.
Definition 2.2.2. The side effects of a tuple of atoms t is the partial state
that makes true the atoms that are necessarily true when t is achieved
through an optimal weak plan.
The action given by $C_{\mathcal{G}'}$ in a controller state $S^*(t)$ is applicable in every
state $s$ that models any $s' \in S^*(t)$ and, thus, is applicable in every state
that models the side effects of the tuple $t$. The transitions among controller
states can then be seen as transitions among states that make true the side
effects of certain tuples. That is, the side effects of such tuples make it feasible
to advance towards the goal of the problem.
Since the complexity of a finite-state controller is related to its number of
states, it makes sense to characterize the complexity of planning problems
in terms of the minimum size of the subgraphs $\mathcal{G}'$ such that the finite-state
controller $C_{\mathcal{G}'}$ solves the problem. For this purpose, we define the width
of a subgraph $\mathcal{G}'$ as the maximum size of the tuples $t$ such that $S^*(t)$ is a node
of $\mathcal{G}'$. Since the nodes in $\mathcal{G}'$ are of the form $S^*(t)$ with $|t| \leq w(\mathcal{G}')$, the
number of controller states necessary to solve the problem is, at
most, exponential in the width of the subgraph. Finally, we define the width
of a planning problem as the minimum width of the subgraphs that define
a strong cyclic policy that solves the problem. The width of a problem $P$
bounds the minimum number of states in a finite-state controller necessary
to solve $P$.
Definition 2.2.3. Let $P$ be a FOND planning problem, and let $\mathcal{G}'$ be a
subgraph of the tuple graph $\mathcal{G}$. The width of $\mathcal{G}'$ is defined as
$w(\mathcal{G}') := \max\{|t| \text{ s.t. } S^*(t) \in \mathcal{G}'\}$.
Definition 2.2.4. Let $P$ be a FOND planning problem. The width of $P$ is
defined as $w(P) := \min\{w(\mathcal{G}') \text{ s.t. } C_{\mathcal{G}'} \text{ solves } P\}$.
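Definitions 2.2.3 and 2.2.4 translate directly into code. The sketch below assumes that a subgraph is given simply as the collection of tuples labelling its nodes, and that the candidate subgraphs passed to `problem_width` are already known to define controllers that solve the problem. Finding those subgraphs is the hard part and is not addressed here. The function names are illustrative.

```python
def subgraph_width(node_tuples):
    """Width of a subgraph: size of the largest tuple labelling one of its nodes."""
    return max(len(t) for t in node_tuples)


def problem_width(solving_subgraphs):
    """Width of a problem: minimum width over the subgraphs whose controller solves it."""
    return min(subgraph_width(nodes) for nodes in solving_subgraphs)
```

For instance, a subgraph whose nodes are labelled by the pairs (clear b1, emptyhand) and (holding b1, not-emptyhand) has width 2.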
Theorem 2.2.1. Let P be a FOND planning problem. The minimum size of
a finite-state controller that solves P is, at most, exponential in the problem
width.
Proof. If $w(P) = w$, there exists a subgraph $\mathcal{G}'$ of the tuple graph $\mathcal{G}$ that
defines a finite-state controller that solves $P$, and such that the nodes in $\mathcal{G}'$
are of the form $S^*(t)$, $|t| \leq w$. The number of tuples with cardinality not
greater than $w$ is exponential in $w$, and the number of nodes in $\mathcal{G}'$ is bounded
by the same number.
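The counting step of the proof can be made explicit. The bound below is a sketch under two assumptions that are not stated in the text: a tuple is treated as a set of distinct atoms, and $n = |F|$ denotes the number of fluents of the problem in standard form.

```latex
\[
  \bigl|\{\, t \;:\; 1 \le |t| \le w \,\}\bigr|
  \;=\; \sum_{k=1}^{w} \binom{n}{k}
  \;\le\; n^{w},
  \qquad n = |F|,\quad 1 \le w \le n .
\]
```

For a fixed width $w$ the number of candidate controller states is therefore polynomial in the number of fluents, and it grows exponentially only with $w$.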
Theorem 2.2.2. Let $P$ be a FOND planning problem. Given a strong cyclic
policy $\Pi$ that solves $P$, there exists a subgraph of the tuple graph $\mathcal{G}$ that defines
a policy equivalent to $\Pi$.
Proof. It is sufficient to consider the subgraph made of the nodes $S^*(s) = \{s\}$, for every state $s$ reachable by $\Pi$, and the transitions given by the
policy $\Pi$. Here, the fact that $P$ is in standard form and contains the negated
fluents is crucial.
2.3 Low Width Benchmarks
Since the width of a planning problem gives a bound on its complexity,
we are interested in knowing whether the width of the benchmark planning problems
is small. As an illustration, in this section we sketch the proof that
the width of a selection of domains (Blocks World, Climber, and Bus Fare)
is bounded when their goals are restricted to single atoms, no matter
the initial state or the size of the problem. While the Blocks World is a well-known domain in planning competitions, the other two domains have been
taken from [Little and Thiebaux, 2007] and, according to the authors, are
probabilistically interesting domains with potential for deadends that may be
difficult to solve for state-of-the-art planners [Muise et al., 2012].
Blocks World
The Blocks World is a typical benchmark domain in classical planning. In
this problem, the agent can pick up a block and move it either onto the table
or on top of another block. The difference between the Blocks World domain
for classical planning and its adaptation to FOND planning is that, in
the latter, the agent can also move a tower of 2 blocks at once. In
addition, the effects of the actions are not deterministic, and the blocks may
fall down on the table when the agent tries to pick them up or move them. The
complete description of the Blocks World domain in PDDL is provided in
listing 2.1. In this language, the non-deterministic effects of an action are
represented by the oneof statement, which lists the possible outcomes of the
action.
Figure 2.1: Illustration of the Blocks World domain, in which an agent has to
arrange a set of blocks in a specific order.
Listing 2.1: PDDL domain description of the Blocks World FOND
problem
(define (domain blocks-domain)
  (:requirements :non-deterministic :equality :typing)
  (:types block)
  (:predicates (holding ?b - block) (emptyhand) (on-table ?b - block)
               (on ?b1 ?b2 - block) (clear ?b - block))
  (:action pick-up
    :parameters (?b1 ?b2 - block)
    :precondition (and (not (= ?b1 ?b2))
                       (emptyhand) (clear ?b1) (on ?b1 ?b2))
    :effect (oneof
      (and (holding ?b1) (clear ?b2) (not (emptyhand))
           (not (clear ?b1)) (not (on ?b1 ?b2)))
      (and (clear ?b2) (on-table ?b1) (not (on ?b1 ?b2))))
  )
  (:action pick-up-from-table
    :parameters (?b - block)
    :precondition (and (emptyhand) (clear ?b) (on-table ?b))
    :effect (oneof
      (and)
      (and (holding ?b) (not (emptyhand)) (not (on-table ?b))))
  )
  (:action put-on-block
    :parameters (?b1 ?b2 - block)
    :precondition (and (holding ?b1) (clear ?b2))
    :effect (oneof
      (and (on ?b1 ?b2) (emptyhand) (clear ?b1)
           (not (holding ?b1)) (not (clear ?b2)))
      (and (on-table ?b1) (emptyhand) (clear ?b1)
           (not (holding ?b1))))
  )
  (:action put-down
    :parameters (?b - block)
    :precondition (and (holding ?b) (clear ?b))
    :effect (and (on-table ?b) (emptyhand) (clear ?b) (not (holding ?b)))
  )
  (:action pick-tower
    :parameters (?b1 ?b2 ?b3 - block)
    :precondition (and (emptyhand) (on ?b1 ?b2) (on ?b2 ?b3) (clear ?b1))
    :effect (oneof
      (and)
      (and (holding ?b2) (clear ?b3)
           (not (emptyhand)) (not (on ?b2 ?b3))))
  )
  (:action put-tower-on-block
    :parameters (?b1 ?b2 ?b3 - block)
    :precondition (and (holding ?b2) (on ?b1 ?b2) (clear ?b3))
    :effect (oneof
      (and (on ?b2 ?b3) (emptyhand)
           (not (holding ?b2)) (not (clear ?b3)))
      (and (on-table ?b2) (emptyhand) (not (holding ?b2))))
  )
  (:action put-tower-down
    :parameters (?b1 ?b2 - block)
    :precondition (and (holding ?b2) (on ?b1 ?b2))
    :effect (and (on-table ?b2) (emptyhand) (not (holding ?b2)))
  )
)
Listing 2.2: PDDL instance description of the Blocks World
FOND problem
(define (problem example)
  (:domain blocks-domain)
  (:objects b1 b2 b3 - block)
  (:init (emptyhand)
         (on b3 b1)
         (on-table b1) (on-table b2)
         (clear b3) (clear b2))
  (:goal (and (emptyhand)
              (on b1 b2) (on b2 b3)
              (on-table b3)
              (clear b1)))
)
The width of the Blocks World problem is bounded by 3, provided that
the goals are single atoms. To prove it, it is sufficient to prove that the problem
has bounded width when the goal is an instance of one of the predicates of the problem:
(holding ?b - block), (emptyhand), (on-table ?b - block), (on ?b1 ?b2 - block),
or (clear ?b - block). We then divide the proof into five cases, depending on
the type of atomic goal, to see that in every case the width is bounded by, or
equal to, 3.
Case A: holding ?b
Let $b_1, \dots, b_m$ be the blocks that belong to the same tower $T$ as $b$ and are on
top of $b$. We consider the policy $\pi$ that picks up a pair of blocks at once from
the top of $T$, and puts them down on the table. Depending on the parity
of the number of blocks on top of $b$, the agent may need to pick up a
single block at the end of the plan. In addition, if the block $b$ is on the table,
the agent cannot hold $b$ with the action pick-tower, but with the action
pick-up-from-table. Eventually, the agent will hold the block $b$.
\[
\pi(s) =
\begin{cases}
\text{pick-tower}(b_i, b_j, b_k) & \text{if (emptyhand) (on ?$b_i$ ?$b_j$) (on ?$b_j$ ?$b_k$) (clear ?$b_i$)} \\
\text{put-tower-down}(b_i, b_j) & \text{if (holding ?$b_j$) (on ?$b_i$ ?$b_j$)} \\
\text{pick-up}(b_i, b) & \text{if (emptyhand) (on ?$b_i$ ?$b$) (clear ?$b_i$)} \\
\text{pick-up}(b, b_i) & \text{if (emptyhand) (on ?$b$ ?$b_i$) (clear ?$b$)} \\
\text{put-down}(b_i) & \text{if (holding ?$b_i$) (clear ?$b_i$)} \\
\text{pick-up-from-table}(b) & \text{if (emptyhand) (clear ?$b$) (on-table ?$b$)}
\end{cases}
\]
When the number of blocks on top of $b$ is odd, and there exist blocks
below $b$, the policy $\pi$ is equivalent to the policy $\pi_{\mathcal{G}'}$, where $\mathcal{G}'$ is the subgraph
made of nodes of the form $S^*(t)$ for the tuples $t_{c,i}$ = (clear ?$b_i$) (emptyhand),
$t_{h,i}$ = (holding ?$b_i$) (not-emptyhand), $t_{h,0}$ = (holding ?$b$) (not-emptyhand).
It is not difficult to see that the sets $S^*(t)$ are singletons, and the edges of $\mathcal{G}'$
are of the form
\[
S^*(t_{c,i}) \xrightarrow{\text{pick-tower}(b_i,b_j,b_k)^1} S^*(t_{h,j}), \quad
S^*(t_{c,i}) \xrightarrow{\text{pick-tower}(b_i,b_j,b_k)^2} S^*(t_{c,i}), \quad
S^*(t_{h,i}) \xrightarrow{\text{put-tower-down}(b_i,b_j)} S^*(t_{c,k}).
\]
Therefore, the width of the
problem in this case is bounded by 2.
When the number of blocks on top of $b$ is even, and/or when $b$ is on the table, the finite-state controller is slightly more complex, and we need to consider additional controller states $S^*(t)$ with tuples $t_{o,1}$ = (clear ?$b$) (on-table ?$b$),
$t_{o,2}$ = (clear ?$b$) (not-on-table ?$b$), $t_{m,1}$ = (emptyhand) (clear ?$b_m$), $t_{m,2}$ =
(not-emptyhand) (clear ?$b_m$). The edges of the subgraph are those given by
the policy $\pi$ above, and the width of the problem is still bounded by 2.
Case B: emptyhand
In this case, the trivial policy $\pi(s)$ = put-down($b$) if (holding ?$b$) is guaranteed to solve the problem for any initial state.
The subgraph $\mathcal{G}'$ with the edge $S^*(t_1) \xrightarrow{\text{put-down}(b)} S^*(t_2)$, $t_1$ = (not-emptyhand),
$t_2$ = (emptyhand), is such that $\pi \sim \pi_{\mathcal{G}'}$, so the width of this problem is 1.
Case C: on-table ?b
The policy shown in Case A is also a policy for this case.
Case D: on ?b1 ?b2
The trick here is to take advantage of the existence of a finite-state controller that places the blocks $b_1$ and $b_2$ on the table. It is not difficult to
see, from the previous cases, that this controller can be defined with a subgraph of width 2. We then add two extra controller states $S^*(t)$ for the tuples $t_1$ = (clear ?$b_1$) (clear ?$b_2$) (emptyhand), $t_2$ = (holding ?$b_1$) (clear ?$b_2$).
The actions performed in these states are, respectively, pick-up($b_1$) and
put-on-block($b_1$, $b_2$). In this case, then, the width of the problem is bounded
by 3.
Case E: clear ?b
The policy shown in Case A is also a policy for this case.
Climber
The Climber problem scenario consists of an agent that is initially placed
on the roof, and whose goal is to descend to the ground without dying in
the attempt. The agent can climb down without a ladder, and descend to the
ground with a chance of dying. If the ladder is raised, the agent can descend
safely to the ground. In case the ladder is on the ground, the agent can call
for help, and the ladder is raised. The detailed description in PDDL of the
Climber problem is specified in listings 2.3 and 2.4.
Listing 2.3: PDDL domain description of the Climber FOND
problem
(define (domain climber)
  (:requirements :typing :strips :non-deterministic)
  (:predicates (on-roof) (on-ground)
               (ladder-raised) (ladder-on-ground) (alive))
  (:action climb-without-ladder :parameters ()
    :precondition (and (on-roof) (alive))
    :effect (and (not (on-roof)) (on-ground)
                 (oneof (and)
                        (not (alive)))))
  (:action climb-with-ladder :parameters ()
    :precondition (and (on-roof) (alive) (ladder-raised))
    :effect (and (not (on-roof)) (on-ground)))
  (:action call-for-help :parameters ()
    :precondition (and (on-roof) (alive) (ladder-on-ground))
    :effect (and (not (ladder-on-ground)) (ladder-raised))))
Listing 2.4: PDDL instance description of the Climber FOND
problem
(define (problem climber-problem)
  (:domain climber)
  (:init (on-roof) (alive) (ladder-on-ground))
  (:goal (and (on-ground) (alive))))
The width of the Climber problem is 1 for any possible initial state and
size of the problem, provided that the goals are single atoms. To prove it, it
is sufficient to prove that the width of the problem is bounded by 1 when the
goal is one of the predicates of the problem: (on-roof), (on-ground), (ladder-raised), (ladder-on-ground), or (alive). We then divide the proof into
five cases, depending on the type of atomic goal, to see that in every case the
width is bounded by, or equal to, 1. As a consequence, the Climber domain
can be solved with linear complexity.
We assume that, in the initial state, the agent is alive and placed on the
roof, and the ladder is placed on the ground.
Case A: on-roof
In the climber domain, the agent is supposed to be on the roof in the initial
state. No action is needed to achieve the goal, so the width in this case is 0.
Case B: on-ground
In this case, the trivial policy $\pi(s)$ = climb-without-ladder is guaranteed
to solve the problem for any initial state.
We can consider the subgraph $\mathcal{G}'$ whose nodes are of the form $S^*(t)$ for
tuples $t_0$ = (on-roof), $t_1$ = (on-ground), and edge
$S^*(t_0) \xrightarrow{\text{climb-without-ladder}} S^*(t_1)$. Therefore, the width of the problem in this case is 1.
Case C: ladder-raised
In this case, the trivial policy $\pi(s)$ = call-for-help is guaranteed to solve
the problem for any initial state.
We can consider the subgraph $\mathcal{G}'$ whose nodes are of the form $S^*(t)$ for tuples
$t_0$ = (not-ladder-raised), $t_1$ = (ladder-raised), and edge
$S^*(t_0) \xrightarrow{\text{call-for-help}} S^*(t_1)$. Therefore, the width of the problem in this case is 1.
Case D: ladder-on-ground
There is no action that places the ladder on the ground, so we need to assume
that it is already placed on the ground in the initial state. No action is needed
to achieve the goal, so the width in this case is 0.
Case E: alive
There is no action that turns the agent back to life, so we need to assume
that the agent is alive in the initial state. No action is needed to achieve the
goal, so the width in this case is 0.
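To make the semantics of the non-deterministic effects concrete, the Climber domain of listing 2.3 can be encoded in a few lines. This is a minimal sketch and not the planner's internal representation: it assumes states are sets of true atoms (negated fluents are left implicit), and it encodes each action as a precondition plus a list of (add, delete) pairs, one per outcome of the oneof statement. The names `CLIMBER` and `successors` are hypothetical.

```python
# Each action: (precondition atoms, list of effects); an effect is (add set, delete set).
CLIMBER = {
    "climb-without-ladder": (
        {"on-roof", "alive"},
        [({"on-ground"}, {"on-roof"}),             # oneof branch (and): the agent survives
         ({"on-ground"}, {"on-roof", "alive"})],   # oneof branch (not (alive))
    ),
    "climb-with-ladder": (
        {"on-roof", "alive", "ladder-raised"},
        [({"on-ground"}, {"on-roof"})],
    ),
    "call-for-help": (
        {"on-roof", "alive", "ladder-on-ground"},
        [({"ladder-raised"}, {"ladder-on-ground"})],
    ),
}


def successors(state, action, domain=CLIMBER):
    """Return the states a^j(s), one per effect a^j; empty list if a is not applicable."""
    pre, effects = domain[action]
    if not pre <= state:
        return []
    return [frozenset((state - delete) | add) for add, delete in effects]
```

From the initial state of listing 2.4, climb-without-ladder has two outcomes, one of which makes (alive) false, whereas call-for-help followed by climb-with-ladder has a single outcome at each step and reaches a state where both (on-ground) and (alive) hold.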
Bus Fare
The Bus Fare problem scenario consists of an agent that wants to buy a bus
fare. There exist three types of coins. The bus fare can be bought with a
type-3 coin. Additionally, there exist different tasks that allow the agent to
change coins. The agent can bet a type-1 coin, either getting a type-3 coin as a
reward or losing it. The agent can bet a type-2 coin, getting either a type-1
or a type-3 coin as a reward. The agent can invest a type-1 coin in washing a
car, either getting a type-2 coin as a reward or keeping the type-1 coin. The agent
can invest a type-2 coin in washing a car, either getting a type-1 coin as a reward
or keeping the type-2 coin.
The detailed description in PDDL of the Bus Fare problem is specified in
listings 2.5 and 2.6.
Listing 2.5: PDDL domain description of the Bus Fare FOND
problem
(define (domain bus-fare)
  (:requirements :typing :strips :equality :non-deterministic)
  (:types coin)
  (:predicates (have-1-coin) (have-2-coin) (have-3-coin) (have-fare))
  (:action bet-coin-1 :parameters ()
    :precondition (have-1-coin)
    :effect (and (not (have-1-coin))
                 (oneof
                   (and)
                   (have-3-coin))))
  (:action bet-coin-2 :parameters ()
    :precondition (have-2-coin)
    :effect (and (not (have-2-coin))
                 (oneof
                   (have-3-coin)
                   (have-1-coin))))
  (:action wash-car-1 :parameters ()
    :precondition (have-1-coin)
    :effect (oneof
              (and)
              (and (not (have-1-coin)) (have-2-coin))))
  (:action wash-car-2 :parameters ()
    :precondition (have-2-coin)
    :effect (oneof
              (and)
              (and (not (have-2-coin)) (have-1-coin))))
  (:action buy-fare :parameters ()
    :precondition (have-3-coin)
    :effect (and (not (have-3-coin)) (have-fare))))
Listing 2.6: PDDL instance description of the Bus Fare FOND
problem
(define (problem bus-fare-problem)
  (:domain bus-fare)
  (:init (have-1-coin))
  (:goal (have-fare)))
The width of the Bus Fare problem is 1 for any possible initial state and
size of the problem, provided that the goals are single atoms. To prove it,
it is sufficient to prove that the width of the problem is bounded by 1 when
the goal is one of the predicates of the problem: (have-1-coin), (have-2-coin),
(have-3-coin), or (have-fare). We then divide the proof into four cases,
depending on the type of atomic goal, to see that in every case the width is
bounded by, or equal to, 1. As a consequence, the Bus Fare domain can be
solved with linear complexity.
We assume that, in the initial state, the agent has at least one coin.
Case A: have-1-coin
In this problem there is no way to get a type-1 coin from a type-3 coin, so we
assume that, in this case, not all the coins of the agent are of type 3. We consider the trivial policy $\pi(s)$ = wash-car-2. Assuming fair non-determinism
of the action effects, this policy is guaranteed to solve the problem for any
initial state.
We can consider the subgraph $\mathcal{G}'$ whose nodes are of the form $S^*(t)$ for tuples
$t_0$ = (non-have-1-coin), $t_1$ = (have-1-coin), and edges
\[
S^*(t_0) \xrightarrow{\text{wash-car-2}^1} S^*(t_0), \quad
S^*(t_0) \xrightarrow{\text{wash-car-2}^2} S^*(t_1).
\]
Therefore, the width of the problem in this case is 1.
Case B: have-2-coin
In this problem there is no way to get a type-2 coin from a type-3 coin, so we
assume that, in this case, not all the coins of the agent are of type 3. We consider the trivial policy $\pi(s)$ = wash-car-1. Assuming fair non-determinism
of the action effects, this policy is guaranteed to solve the problem for any
initial state.
We can consider the subgraph $\mathcal{G}'$ whose nodes are of the form $S^*(t)$ for tuples $t_0$ = (non-have-2-coin), $t_1$ = (have-2-coin), and edges
\[
S^*(t_0) \xrightarrow{\text{wash-car-1}^1} S^*(t_0), \quad
S^*(t_0) \xrightarrow{\text{wash-car-1}^2} S^*(t_1).
\]
Therefore, the width of the problem in this case is 1.
Case C: have-3-coin
We consider the policy in which, when the agent has a type-1 coin, it invests it
in washing a car, so eventually it will get a type-2 coin. When the agent has a
type-2 coin, it bets it, so eventually it will get a type-3 coin. Assuming fair
non-determinism of the action effects, $\pi$ defines a valid strong cyclic policy
for the problem.
\[
\pi(s) =
\begin{cases}
\text{wash-car-1} & \text{if (have-1-coin)} \\
\text{bet-coin-2} & \text{if (have-2-coin)}
\end{cases}
\]
We can consider the subgraph $\mathcal{G}'$ whose nodes are of the form $S^*(t)$
for tuples $t_1$ = (have-1-coin), $t_2$ = (have-2-coin), $t_3$ = (have-3-coin), and
edges
\[
S^*(t_1) \xrightarrow{\text{wash-car-1}^1} S^*(t_1), \quad
S^*(t_1) \xrightarrow{\text{wash-car-1}^2} S^*(t_2), \quad
S^*(t_2) \xrightarrow{\text{bet-coin-2}^1} S^*(t_3), \quad
S^*(t_2) \xrightarrow{\text{bet-coin-2}^2} S^*(t_1).
\]
Therefore, the width of the problem in this case is 1.
Case D: have-fare
We consider a policy similar to the one presented for the goal (have-3-coin), with the difference that here an additional action is considered when
the agent has a type-3 coin.
\[
\pi(s) =
\begin{cases}
\text{wash-car-1} & \text{if (have-1-coin)} \\
\text{bet-coin-2} & \text{if (have-2-coin)} \\
\text{buy-fare} & \text{if (have-3-coin)}
\end{cases}
\]
We can consider the subgraph $\mathcal{G}'$ of the previous case, with the
extra node $S^*(t_g)$, $t_g$ = (have-fare), and the extra edge $S^*(t_3) \xrightarrow{\text{buy-fare}} S^*(t_g)$.
Therefore, the width of the problem in this case is 1.
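The claim that the policy of Case D is strong cyclic can be checked mechanically on the instance of listing 2.6. The sketch below is one possible check and not the algorithm of chapter 5: it assumes states are sets of true atoms and encodes each action as a precondition plus one (add, delete) pair per outcome. It first collects the states reachable under the policy, and then verifies that a goal state remains reachable from each of them. All names are illustrative.

```python
# Each action: (precondition atoms, list of (add, delete) effects), following listing 2.5.
BUS_FARE = {
    "bet-coin-1": ({"have-1-coin"}, [(set(), {"have-1-coin"}),
                                     ({"have-3-coin"}, {"have-1-coin"})]),
    "bet-coin-2": ({"have-2-coin"}, [({"have-3-coin"}, {"have-2-coin"}),
                                     ({"have-1-coin"}, {"have-2-coin"})]),
    "wash-car-1": ({"have-1-coin"}, [(set(), set()),
                                     ({"have-2-coin"}, {"have-1-coin"})]),
    "buy-fare": ({"have-3-coin"}, [({"have-fare"}, {"have-3-coin"})]),
}


def policy(state):
    """The policy of Case D; returns None where it is undefined."""
    if "have-1-coin" in state:
        return "wash-car-1"
    if "have-2-coin" in state:
        return "bet-coin-2"
    if "have-3-coin" in state:
        return "buy-fare"
    return None


def is_strong_cyclic(init, goal, policy, domain):
    # Forward pass: states reachable from init when following the policy.
    edges, frontier = {}, [frozenset(init)]
    while frontier:
        s = frontier.pop()
        if s in edges:
            continue
        if goal <= s:
            edges[s] = []
            continue
        a = policy(s)
        if a is None or not domain[a][0] <= s:
            return False  # policy undefined or inapplicable in a reachable state
        edges[s] = [frozenset((s - delete) | add) for add, delete in domain[a][1]]
        frontier.extend(edges[s])
    # Backward pass: grow the set of states from which some goal state is reachable.
    good = {s for s in edges if goal <= s}
    changed = True
    while changed:
        changed = False
        for s, succ in edges.items():
            if s not in good and any(t in good for t in succ):
                good.add(s)
                changed = True
    return good == set(edges)
```

A policy that bets the type-1 coin instead fails the same check, because one outcome of bet-coin-1 leaves the agent with no coins, which is a deadend.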
2.4 Conclusion
In FOND problems, the non-determinism of the actions may result in undesired action effects that move the agent away from the optimal path towards
the goal state. Computing strong cyclic policies that solve this type of
problem is NP-hard. We exploit the idea that every state reachable by a
strong cyclic policy $\pi$ is not a deadend of the problem, because it makes
true a certain tuple of atoms that allows the agent to advance in the problem towards the goal. We have introduced a width notion that measures the
complexity of FOND planning problems based on the reachability over these
tuples of atoms. Finally, we have proven theoretically that some benchmark
domains have small bounded width no matter the initial state or the size of
the problem, provided that the goals are restricted to single atoms. Most notably, some of these domains are tricky and have a considerable number
of deadends, while their width is considerably low.
Chapter 3
NON-DETERMINISTIC ITERATED WIDTH
In this chapter we introduce the Non-Deterministic Iterated Width (NDIW)
algorithm. The NDIW algorithm is able to find strong cyclic policies for
FOND problems, and runs in time and space that are exponential
in the problem width. We then show experimentally that many of the typical
benchmark domains have low width, provided that goals are restricted to
single atoms.
3.1 Tuple Graph $\mathcal{G}^i$
The tuple graph $\mathcal{G}$ introduced in chapter 2 is made of nodes $S^*(t)$ that are the
sets of states that achieve a tuple of atoms $t$ following an optimal weak plan,
and its computation is intractable (PSPACE-complete). Therefore, instead
of computing the sets $S^*(t)$, we are interested in computing a tuple graph $\mathcal{G}^i$
whose nodes are an estimation of the side effects of the tuples $t$ with size no
greater than the parameter $i$, and whose complexity is polynomial.
Definition 3.1.1. $T^i$ is the set of tuples $t$ from the problem $P$ that have
size no greater than the parameter $i$.
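The set $T^i$ can be enumerated directly. The following sketch assumes that a tuple is an unordered collection of distinct atoms, represented here as a sorted Python tuple, and that tuples have at least one atom; whether the empty tuple belongs to $T^0$ is not stated in the definition. The function names are illustrative.

```python
from itertools import combinations


def tuples_up_to(atoms, i):
    """All tuples of distinct atoms of size 1..i, each as a sorted Python tuple."""
    atoms = sorted(atoms)
    return [t for k in range(1, i + 1) for t in combinations(atoms, k)]


def tuples_true_in(state, i):
    """Tuples of T^i that are true in a state: those whose atoms all belong to it."""
    return tuples_up_to(state, i)
```

For the initial state of the Climber instance, which makes three atoms true, there are 3 tuples of size 1 and 6 tuples of size at most 2.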
Definition 3.1.2. Let $P$ be a FOND problem. The tuple graph $\mathcal{G}^i$ is the
graph with nodes of the type $r_t$, $t \in T^i$, defined inductively as follows:
1. $r_t$ is a root in $\mathcal{G}^i$ if $t$ is true in $I$.
2. $R^*(t')$ is the set of states $s$ such that there exist $r_t \in \mathcal{G}^i$, an $i$-compact
action $a$ and one effect $a^j$ such that $s = a^j(r_t)$, $t'$ is true in $s$, and $s$
achieves $t'$ through a minimal-length path in $\mathcal{G}^i$.
3. $r_t$ is the partial state that contains the literals that are true in all the
states in $R^*(t)$.
4. The edge $r_t \xrightarrow{a^j} r_{t'}$ is in $\mathcal{G}^i$ if $a$ is $i$-compact in $r_t$, and $r_{t'}$ is true in
$a^j(r_t)$.
Definition 3.1.3. An action $a$ is $i$-compact in a partial state $r_t$ if, for every
effect $a^j$, the state $s = a^j(r_t)$ either makes true a tuple $t \in T^i$ through a
minimal-length path in $\mathcal{G}^i$, or $s$ models a partial state $r_{t'}$ in $\mathcal{G}^i$.
An edge $r_t \xrightarrow{a^j} r_{t'}$ in $\mathcal{G}^i$ indicates that, for every state $s$ that models $r_t$,
there exists an action $a$ and an effect $a^j$ that maps $s$ into a state that models
$r_{t'}$. In addition, the $i$-compactness criterion guarantees that any other effect
of $a$ maps $s$ into a state that models some partial state $r_{t''}$. In other words,
the tuple graph $\mathcal{G}^i$ defines a finite-state controller whose states $r_t$ are partial
states, and whose transitions correspond to actions. We say that $\mathcal{G}^i$ solves $P$ if
there exists a subgraph of $\mathcal{G}^i$ that defines a strong cyclic policy in the form of a
finite-state controller.
Theorem 3.1.1. Let $P$ be a FOND planning problem. If $w = w(P)$, the
tuple graph $\mathcal{G}^w$ solves $P$.
Proof sketch. Consider a subgraph $\mathcal{G}'$ of $\mathcal{G}$ such that $w(P) = w(\mathcal{G}') = w$. We
want to see that there exists a mapping from $\mathcal{G}'$ into a subgraph of $\mathcal{G}^w$ that
defines a strong cyclic policy in the form of a finite-state controller.
The mapping between nodes is injective, and is defined inductively as
follows: Let $r_0 = I$ be an initial state of $\mathcal{G}^w$, and let $S^0 = \{I\}$ be an initial state
of $\mathcal{G}'$. Let $a_k$ be the action given by the policy $\Pi_{\mathcal{G}'}$ in the state $S^k$, $0 \leq k < |\mathcal{G}'| - 1$.
Then, the action $a_k$ is $w$-compact in $r_k$. In addition, for every effect $a_k^j$, there
exists a tuple $t$, an edge $S^k \xrightarrow{a_k^j} S^*(t)$ in $\mathcal{G}'$ and a state $r_t$ in $\mathcal{G}^w$ such that $r_k$
models the side effects of $t$.
This mapping defines a subgraph of $\mathcal{G}^w$, such that the policies defined by
both subgraphs are equivalent. As $\Pi_{\mathcal{G}'}$ is a policy for $P$, the sets $S^*(t)$ that
are goal states are such that the side effects of $t$ model the goal $G$. As a
consequence, the partial state $r_t$ corresponding to the mapping of $S^*(t)$ in
$\mathcal{G}^w$ also models $G$, so it is a goal state of the problem.
3.2 Algorithm
In this section we introduce the Non-Deterministic Iterated Width (NDIW)
algorithm. Inspired by the Iterated Width (IW) algorithm used in classical
planning [Geffner and Lipovetzky, 2012], the NDIW algorithm consists of
a sequence of calls NDIW(i) for $i = 0, 1, 2, \dots$ over a problem $P$ until
the problem is solved or $i$ exceeds the number of problem variables. Each
iteration NDIW(i) is an $i$-width search that expands the graph $\mathcal{G}^i$ and checks
for the existence of a subgraph that defines a strong cyclic policy for $P$ in
the form of a finite-state controller.
NDIW(i) is a forward-state breadth-first search over partial states $r_t$ with
two variations: the partial states $r_t$ are only expanded by $i$-compact
actions into states $s$, and those states $s$ are used to compute new partial
states $r_t$ in the graph. As shown in lemma 3.2.1, the partial states $r_t$ are
computed sequentially along with the expansion of $\mathcal{G}^i$, and without the need
for the set $R^*(t)$. Each effect of an $i$-compact action $a$ in $r_t$ either leads to
a state $s'$ through an optimal weak plan for a tuple $t' \in T^i$, or leads to a
state $s'$ that makes true a state of the form $r_{t'}$, $t' \in T^i$. In the first case,
$r_{t'}$ is updated with its intersection with $s'$, as suggested in lemma 3.2.1. In the
second case, it is sufficient to test whether $s'$ makes true all the atoms of some state
$r_{t'}$.
Lemma 3.2.1. Let $P$ be a FOND planning problem, and let $\pi^1(t), \dots, \pi^m(t)$
be the collection of optimal weak plans for a tuple $t$, in an arbitrary order.
The partial state $r_t$ is the limit of the sequence $r_t^1, \dots, r_t^m$, where:
(i) $r_t^1 = \pi^1(t)$
(ii) $r_t^i = r_t^{i-1} \cap \pi^i(t)$, $i = 2, \dots, m$
Proof. The atoms that are true in $r_t$ are those that are true in all states in
$R^*(t)$ so, as a set of atoms, $r_t$ is the intersection of all states $s \in R^*(t)$, i.e.,
the intersection of $\pi^k$, $1 \leq k \leq m$.
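The incremental computation described by the lemma amounts to folding a set intersection over the states that achieve the tuple optimally. The sketch below assumes states are given as sets of true atoms; the function name is illustrative.

```python
def partial_state(optimal_states):
    """Fold the intersection r_t^1, r_t^2, ... over the given states, in order."""
    r_t = None
    for s in optimal_states:
        r_t = frozenset(s) if r_t is None else r_t & frozenset(s)
    return r_t
```

In the Climber domain, for example, the tuple (on-ground) is achieved in one step by both outcomes of climb-without-ladder. Intersecting the two resulting states drops (alive), so (alive) is not a side effect of (on-ground). Since intersection is commutative and associative, the result does not depend on the order in which the states are processed.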
Definition 3.2.1. NDIW(i) is a breadth-first search that keeps partial
states $r_t$, $t \in T^i$, and expands states $r_t$ by $i$-compact actions.
Listing 3.1: NDIW(i) pseudocode.
NDIW_i(Init):
  for tuple t in Init:
    r_t <- I
    Stack <- r_t
  while not Stack.empty():
    r <- Stack.pop()
    for i-compact action a in r:
      for s, a^j in F(r, a):
        for new tuple t in s:
          r_t <- s
          Add_Edge(r, r_t, a^j)
        for tuple t in s achieved optimally:
          r_t <- r_t \cap s
          Add_Edge(r, r_t, a^j)
According to theorem 3.1.1, the NDIW(w) search with $w = w(P)$ is guaranteed to
find a strong cyclic policy if one exists. However, the width of a problem is not
known in general, so the NDIW algorithm needs to iterate through searches
NDIW(i) for $i = 0, 1, \dots$ until a solution is found (possibly with $i < w(P)$)
or $i$ exceeds the number of variables in $P$. Note that the NDIW(i) algorithm
expands the search graph in a blind manner, i.e., the goal $G$ of the problem
$P$ plays no role in this process. Upon expansion of the search graph $\mathcal{G}^i$,
the NDIW algorithm checks for the existence of a subgraph that defines a
policy that solves $P$. This step is independent of the construction of the
graph, and can be performed using different techniques (see chapter 5). The
complexity of the complete process is still exponential in the problem width.
Definition 3.2.2. The NDIW(i) algorithm solves $P$ if the tuple graph $\mathcal{G}^i$
contains a subgraph $\mathcal{G}'$ that defines a policy for $P$ in the form of a finite-state
controller.
Definition 3.2.3. NDIW calls NDIW(i) sequentially for $i = 0, 1, \dots$ until
the problem is solved or $i$ exceeds the number of problem variables.
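The outer loop of Definition 3.2.3 can be sketched as follows. The callback `ndiw_i` is a hypothetical stand-in for a single NDIW(i) search: it is assumed to return a policy when the tuple graph for parameter i contains a solving subgraph, and None otherwise. Neither the callback nor its signature comes from the thesis.

```python
def ndiw(problem, num_variables, ndiw_i):
    """Call ndiw_i(problem, i) for i = 0, 1, ... until it returns a policy.

    Returns the pair (i, policy) for the first successful i, or None when i
    exceeds the number of problem variables without finding a policy.
    """
    for i in range(num_variables + 1):
        policy = ndiw_i(problem, i)  # hypothetical single-width search
        if policy is not None:
            return i, policy
    return None
```

The first value of i for which a policy is returned is the effective width reported in the experiments of section 3.3.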
Theorem 3.2.2. For solvable problems $P$, the time and space complexity of
NDIW are exponential in $2w(P)$.
Proof. The NDIW algorithm performs the calls NDIW(i), $i = 1, \dots, w(P)$.
In each NDIW(i) subcall, the tuple graph $\mathcal{G}^i$ is expanded. The number of
nodes in $\mathcal{G}^i$ is exponential in the parameter $i$, so the space complexity of
NDIW is exponential in $w(P)$. Although the tuple graph is usually sparse,
the worst-case number of edges in $\mathcal{G}^i$ is exponential in $2w(P)$.
The partial states $r_t$ are computed dynamically along with the expansion
of the tuple graph $\mathcal{G}^i$. In order to check whether an action $a$ is $i$-compact in a partial
state $r_t$, NDIW(i) checks whether, for each effect $a^j$, there exists a tuple $t' \in T^i$,
$t' \in s = F(r_t, a^j)$, such that either $t'$ is generated optimally, or $s \models r_{t'}$ and
the partial state $r_{t'}$ is already generated. Checking whether an action is $i$-compact
in a partial state has a complexity that is exponential in $i$. Performing these
tests during the expansion of $\mathcal{G}^i$ has a complexity that is exponential in $2i$
so, again, the worst-case complexity is exponential in $2w(P)$.
When expanding a partial state $r_t$ by an $i$-compact action $a$, NDIW(i)
adds the outgoing edges connecting $r_t$ with the corresponding nodes in the graph.
An edge $r_t \xrightarrow{a^j} r_{t'}$ exists when either the tuple $t'$ has been generated optimally,
or the state $s = F(r_t, a^j)$ models $r_{t'}$. As with the check of $i$-compact
actions, this process is exponential in the parameter $i$, so adding the edges
of all the nodes in $\mathcal{G}^i$ has a complexity that is exponential in $2i$.
Finally, the time needed to check whether $\mathcal{G}^i$ encodes a policy for $P$, according
to the algorithm presented in chapter 5, is proportional to $N^2$.
3.3 Empirical Evaluation
In this section we analyze the performance of the NDIW algorithm through
a series of tests on a selection of domains used in previous planning competitions, restricting the problems to have atomic goals. First, we compare
the width reported by the NDIW algorithm against the width reported by
the IW algorithm in classical planning problems. Then, we move to FOND
domains, for which we compute the width, and compare the performance of
the NDIW planner against the state-of-the-art FOND planners.
Width in Classical Problems
Classical planning problems can be considered a subclass of FOND
planning problems (those in which every set of effects is a singleton, i.e.,
the actions are deterministic), so, in particular, the NDIW algorithm is also
able to solve classical planning problems. Since the width notion and the
NDIW algorithm introduced in this manuscript are based on the width notion
and the IW algorithm introduced in [Geffner and Lipovetzky, 2012] for classical
planning, it is interesting to compare the performance and complexity of the
NDIW and IW algorithms on classical planning problems.
The NDIW algorithm has been tested with a selection of classical planning problems used in previous editions of the IPC. The experiments have
been run on Xeon Woodcrest computers with clock speeds of 2.33 GHz, using
a 2 GB memory limit. The results of the tests are presented in table 3.1, and
are compared with those of IW, run on the same experimental setup, and
extracted from [Geffner and Lipovetzky, 2012]. For every domain, a number
of different instances with atomic goals have been tested, and classified according to the (effective) width of the problem. The number of instances
that can be solved by NDIW within a time limit of 30 minutes is indicated in
parentheses (see also figure 3.2).
Most of the tested domains appear to have a low width, bounded
by 2, no matter the initial state or the size of the problem. That is, as
long as the goal of the problem is atomic, it can most probably be solved in
linear or quadratic time and complexity. In addition, the width distribution
among the problems is very similar for both algorithms. In other words, the
complexity of the tuple graph of the problems using NDIW is very similar to
that of IW. This is reasonable, since the width parameter defined in FOND
planning is based on the width notion of classical planning and, restricted
to classical planning problems, the NDIW algorithm is quite similar to IW.

Domain      | # inst. NDIW (solved) | # inst. IW | w = 1 NDIW | w = 1 IW | w = 2 NDIW | w = 2 IW | w ≥ 3 NDIW | w ≥ 3 IW
8puzzle     | 408 (408)             | 400        | 53%        | 55%      | 47%        | 45%      | 0%         | 0%
Blocksworld | 266 (245)             | 598        | 22%        | 26%      | 71%        | 74%      | 7%         | 0%
ferry       | 650 (379)             | 650        | 36%        | 36%      | 46%        | 64%      | 19%        | 0%
floortile   | 321 (321)             | 538        | 98%        | 96%      | 2%         | 4%       | 0%         | 0%
Grid        | 3 (3)                 | 19         | 33%        | 5%       | 67%        | 84%      | 0%         | 11%
gripper     | 379 (376)             | 1275       | 0%         | 0%       | 100%       | 100%     | 0%         | 0%
logistics   | 214 (208)             | 249        | 37%        | 18%      | 62%        | 82%      | 1%         | 0%
Mystery     | 14 (14)               | 30         | 14%        | 7%       | 86%        | 93%      | 0%         | 0%
parking     | 31 (31)               | 540        | 100%       | 77%      | 0%         | 23%      | 0%         | 0%
pegsol      | 660 (660)             | 964        | 90%        | 92%      | 10%        | 8%       | 0%         | 0%
satellite   | 42 (26)               | 308        | 0.15       | 11%      | 0.85       | 89%      | 0.0        | 0%
scanalyzer  | 419 (419)             | 624        | 95%        | 100%     | 5%         | 0%       | 0%         | 0%
sokoban     | 99 (34)               | 153        | 27%        | 37%      | 33%        | 26%      | 39%        | 27%
tidybot     | 92 (7)                | 84         | 14%        | 12%      | 0%         | 39%      | 86%        | 49%
tower       | 232 (232)             | –          | 100%       | –        | 0%         | –        | 0%         | –
visitall    | 2811 (1486)           | 21859      | 100%       | 100%     | 0%         | 0%       | 0%         | 0%
woodworking | 1486 (1435)           | 1659       | 100%       | 100%     | 0%         | 0%       | 0%         | 0%
zeno        | 95 (87)               | 219        | 0.37       | 21%      | 0.63       | 79%      | 0.0        | 0%

Table 3.1: NDIW vs. IW width distribution in typical benchmark domains. For
NDIW, the number of instances solved within the time limit is given in
parentheses.

When restricted to deterministic problems, both algorithms are based on a
breadth-first search with pruning that keeps the states that introduce new
tuples of atoms optimally. The difference is that in the NDIW algorithm
the nodes are the intersection of all states that reach a tuple t optimally,
whereas the IW algorithm operates with the first state that optimally reaches
t. The nodes in the tuple graph expanded by NDIW are more restrictive
than in the IW algorithm and, as a consequence, if a solution exists for
NDIW(i), the same solution must exist in IW(i), but not the converse.
This is apparent in the results shown in table 3.1, where the
width of the problems using NDIW tends to be slightly higher than the width
of the problems using IW.
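The pruning rule shared by both algorithms in the deterministic case can be sketched as follows. This is only a minimal illustration of an IW(i)-style breadth-first search, not the implementation used in the experiments, and it does not include the intersection of states that NDIW stores in its nodes. States are assumed to be frozensets of atoms, and `successors` and `is_goal` are hypothetical callables supplied by the caller.

```python
from collections import deque
from itertools import combinations


def new_tuples(state, seen, i):
    """Return the tuples of at most i atoms of `state` that are not in `seen`."""
    atoms = sorted(state)
    fresh = set()
    for size in range(1, i + 1):
        for t in combinations(atoms, size):
            if t not in seen:
                fresh.add(t)
    return fresh


def iw_search(initial, successors, is_goal, i):
    """Breadth-first search that prunes states adding no new tuple of size <= i.

    `successors(state)` is assumed to return (action, next_state) pairs.
    Returns a list of actions, or None if no goal state survives the pruning.
    """
    seen = set()
    seen |= new_tuples(initial, seen, i)
    queue = deque([(initial, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            fresh = new_tuples(nxt, seen, i)
            if not fresh:
                continue  # pruned: the state introduces no new tuple of atoms
            seen |= fresh
            queue.append((nxt, plan + [action]))
    return None
```

Because every state that is kept must contribute at least one new tuple of at most i atoms, the number of expanded states in this sketch is bounded by the number of such tuples.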
Width in FOND problems
The NDIW algorithm has been tested on a selection of FOND problems
used in previous planning competitions. The results of the tests are presented
in table 3.2, and reveal that the majority of the problems have small width
when the goals are restricted to single atoms. For every domain, a series
of N different instances with atomic goals have been tested and classified
according to the width of the problem. The number of instances that NDIW
solves within a time limit of 30 minutes is indicated in parentheses. As
in the case of deterministic problems, most of the tested domains appear to
have a low width, bounded by 2, no matter the initial state or the size of the
problem. That is, in these domains the NDIW algorithm can solve any atomic
goal in linear or quadratic time and complexity. However, a significant number
of domains appear to have higher width.
problem name        | N (Nsolved) | w = 1 | w = 2 | w ≥ 3
blocksworld         | 252 (199)   | 76%   | 18%   | 6%
blocksworld-2       | 126 (77)    | 63%   | 19%   | 18%
busfare             | 1 (1)       | 100%  | 0%    | 0%
climber             | 2 (2)       | 100%  | 0%    | 0%
elevators           | 90 (90)     | 33%   | 67%   | 0%
first-responders    | 399 (387)   | 86%   | 14%   | 0%
forest-new          | 180 (20)    | 0%    | 0%    | 100%
rectangle-tireworld | 16 (9)      | 0%    | 25%   | 75%
river               | 1 (1)       | 100%  | 0%    | 0%
tireworld           | 19 (10)     | 21%   | 0%    | 79%
triangle-tireworld  | 40 (31)     | 100%  | 0%    | 0%
zenotravel          | 13 (15)     | 31%   | 69%   | 0%

Table 3.2: Width reported by NDIW in a selection of FOND benchmark problems
with atomic goals.
Performance Analysis
In this section we look in detail at the execution stats of the NDIW planner
over the deterministic and FOND problems listed, respectively, in tables 3.1
and 3.2. The tests were run on Xeon Woodcrest computers with
clock speeds of 2.33 GHz, using 2 GB of memory and a 30-minute time limit on
execution. Figure 3.1 shows the execution time of each problem according
to the size of the problem. We distinguish, for every problem instance, whether
NDIW ends the process with success, a memory limit exception (MLE), or
a time limit exception (TLE).
(a) Execution stats of NDIW over deterministic problems.
(b) Execution stats of NDIW over FOND problems.
Figure 3.1: Execution stats of NDIW over deterministic and FOND problems. We
distinguish among successful processes, time limit exceptions, and memory limit
exceptions.
The NDIW planner is able to solve problems of small size,
but when the size of the problem exceeds a certain threshold, the planner
will most likely run out of memory. Likewise, some problems seem to be
too complex, and the execution of NDIW is likely to exceed the time limit even
when they are small. The execution success rate of NDIW in FOND
problems (with atomic goals) is quite high, as can be seen in table 3.2.
However, the constraints in memory and time seem to be more problematic
in classical planning problems. That is, the classical planning benchmark
problems are more complex than those in FOND planning. This makes
sense, since the state of the art in classical planning is more advanced than
in FOND planning.
Figure 3.2 shows the distribution of the time needed by NDIW to
solve the deterministic and FOND problems, where we have omitted those
problems that need more than 2 GB of memory or take more than 30 minutes
to execute. Roughly half of all the instances are solvable in less than
1 second, and more than 90% are solvable in less than 100 seconds. This
fact can be useful to establish a threshold on the complexity of a problem
according to the execution time: if the execution time exceeds the order
of $10^2$ seconds, the problem will likely be too complex for NDIW, and will
exceed the memory and/or time limit constraints.
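The quantity read off the cumulative distribution can be computed as in the following minimal sketch. The function name `solved_fraction` and the runtimes in `example_times` are illustrative placeholders only; they are not measurements from the experiments reported here.

```python
def solved_fraction(times, limit):
    """Fraction of runs whose solving time is at most `limit` seconds."""
    if not times:
        raise ValueError("no runs given")
    return sum(1 for t in times if t <= limit) / len(times)


# Placeholder runtimes in seconds (illustrative only, not experimental data).
example_times = [0.02, 0.4, 0.9, 3.0, 45.0, 120.0]

for limit in (1, 100):
    print(limit, solved_fraction(example_times, limit))
```

Evaluating this fraction at 1 second and at 100 seconds over the real runtimes is what yields the two figures quoted above.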
Figure 3.2: Time to find a solution by NDIW for the FOND and classical planning
domains restricted to atomic goals. Both panels, titled "Empirical Time
Cumulative Distribution", plot the Problems Solved Frequency, one against
time/s and the other against log(time/s).
One of the complex FOND benchmark problems is the triangle-tireworld,
used in the IPC'2006 and IPC'2008 competitions. There is an inherent
complexity in this domain that makes it difficult to solve, not only because
of the huge state space, but also because of the likelihood of entering a
deadend state. At the time they were published, the FF-Hindsight+ planner
[Yoon et al., 2010] was recognized for solving the triangle-tireworld-17
instance of the problem and, later on, the PRP planner [Muise et al., 2012]
was reported to be capable of solving up to the triangle-tireworld-35 instance,
both with similar experimental setup and the same restrictions on time and
memory. The execution stats for this problem are shown in figure 3.3. The
NDIW planner is able to solve up to the triangle-tireworld-31 instance,
thanks to the apparently low width of the problem (see table 3.2). In addition,
most of the execution time is due to preprocessing, while the proper execution
time of the NDIW algorithm is really short. The failure of the algorithm is
due to the memory constraints, and the biggest instance that can be solved
(triangle-tireworld-31) is still far from the time limit.
Figure 3.3: Time to find a strong cyclic plan in the triangle-tireworld problem,
with respect to the problem size.
In order to evaluate the performance of the NDIW algorithm in FOND
problems, we have compared it against two state-of-the-art FOND planners:
FF-H+ [Yoon et al., 2010] and PRP [Muise et al., 2012]. We have chosen
three different domains: the triangle-tireworld, the bus-fare, and the river.
The triangle-tireworld is a FOND benchmark domain used in the IPC
competition. Likewise, the bus-fare and river domains are two “probabilistically
interesting” domains that have potential deadends [Little and Thiebaux, 2007].
All three domains have atomic goals, so NDIW can be used to completely
solve them. A performance comparison between NDIW and the other
two FOND planners is shown in table 3.3, where the stats of FF-H+ and PRP
have been extracted from their respective papers. Both FF-H+ and PRP act
as online planners, so the success rate and execution time are reported
over 30 runs of each problem. Since NDIW is an offline
planner, its stats have been scaled to 30 runs.
problem name     | Success rate NDIW | Success rate PRP | Success rate FF-H+ | time(s) NDIW | time(s) PRP | time(s) FF-H+
busfare          | 100%              | 100%             | 100%               | 0            | 0           | –
river            | 100%              | 67%              | 100%               | 0 (0)        | 0           | –
triangle-tire-1  | 100%              | 100%             | 100%               | 0 (0)        | 0           | –
triangle-tire-17 | 100%              | 100%             | 100%               | 29           | 19          | –
triangle-tire-31 | 100%              | 100%             | –                  | 723          | –           | –
triangle-tire-35 | –                 | 100%             | –                  | –            | 1520        | –

Table 3.3: NDIW vs. PRP vs. FF-H+ performance. Time results for 30 runs.
The NDIW planner solves the river domain in almost negligible time,
while the FF-H+ planner only solves 67% of the instances. In addition, the
NDIW planner outperforms FF-H+ in the triangle-tireworld
domain, solving up to the triangle-tireworld-31 instance while the FF-H+
planner is only capable of solving up to the triangle-tireworld-17 instance. The
PRP planner performs slightly better than NDIW in the triangle-tireworld
domain, solving up to the triangle-tireworld-35 instance, while the rest of
the statistics are quite similar.
3.4 Conclusion
We have introduced NDIW, a FOND algorithm that is sound and complete,
and is able to solve FOND problems in time and complexity that is
exponential in the problem width.
We have shown empirically that, in deterministic problems with atomic
goals, the width distribution reported by NDIW is similar to the
width distribution reported by IW. However, while the complexity of IW
is exponential in the problem width, the complexity of NDIW is exponential
in twice the width parameter.
A series of tests over FOND benchmark domains have revealed that these
problems have low width when the goals are restricted to single atoms.
However, some problems are more complex, and their width is higher than 2.
Finally, we have looked in detail into one domain, the triangle-tireworld,
and we have seen that NDIW can solve it with an efficiency that is
comparable with the latest state-of-the-art planners.
Chapter 4
SERIALIZED NDIW
In this chapter we introduce the Serialized Non-Deterministic Iterated Width
(S-NDIW) algorithm, an extension of NDIW that is able to solve FOND
problems whose goals are conjunctions of atoms through a simple form of
decomposition. We define two versions of S-NDIW: an offline planner and an online
planner. Finally, we present an experimental evaluation of the performance
of these algorithms.
4.1 Serialized Width
The empirical results of the NDIW algorithm in the typical benchmark
domains reveal that most of the problems have small width provided that
the goals are restricted to single atoms. However, the width of the same
problems increases significantly when the goals are conjunctions of atoms.
Several approaches exist to tackle problems whose goal is a conjunction
of atoms ([Chapman, 1987], [Richter and Westphal, 2010], [Hoffmann et al., 2004]).
One of the simplest methods is to use landmarks, so that the different
goals of the problem are achieved sequentially. In this section we present a
method that divides the problem into a series of subproblems that are solved
sequentially. The advantage of this procedure is that, by exploiting
the structure of the subproblems, the order of complexity of the overall
process can be comparable to that of problems with single goals.
As introduced in [Geffner and Lipovetzky, 2012], a serialization for a
problem $\Pi$ with goal $G$ is a sequence $G_0, G_1, \dots, G_m$, $m = |G|$, such
that $G_0 = \emptyset$, $G_m = G$, and $G_{i+1}$ extends $G_i$ by one additional
atom from $G$. A serialization $d$ defines a series of planning problems
$P_d = P_1, \dots, P_m$, where each $P_i$ is like $\Pi$ but with goal $G_i$. The
difference between the serialization proposed here and the one that applies to
classical planning lies in the initial state $s_i$ of the subproblems $P_i$:
these states depend on the landmark states in $P_{i-1}$, and are defined
differently in the offline and online
serialized FOND planners that we present in this chapter.
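The conditions on the sequence of subgoals can be made concrete with a small sketch. It only covers the goal sets of a serialization, not the initial states of the subproblems, and the function names `serialization_from_order` and `is_serialization` are hypothetical helpers introduced for illustration.

```python
def serialization_from_order(order):
    """Build G_0, ..., G_m by adding the atoms of `order` one at a time."""
    sequence = [frozenset()]
    for atom in order:
        sequence.append(sequence[-1] | {atom})
    return sequence


def is_serialization(goal, sequence):
    """Check that `sequence` = [G_0, ..., G_m] is a serialization of `goal`.

    Requires G_0 to be empty, G_m to equal the goal, m = |goal|, and each
    G_{i+1} to extend G_i by exactly one atom of the goal.
    """
    goal = frozenset(goal)
    sets = [frozenset(g) for g in sequence]
    if len(sets) != len(goal) + 1:
        return False
    if sets[0] or sets[-1] != goal:
        return False
    for prev, nxt in zip(sets, sets[1:]):
        added = nxt - prev
        if not prev <= nxt or len(added) != 1 or not added <= goal:
            return False
    return True
```

Any ordering of the goal atoms yields a valid sequence of goal sets in this sense; whether the resulting subproblems are solvable is a separate question.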
The Serialized Non-Deterministic Iterated Width (S-NDIW) is a search
algorithm that uses NDIW searches both for constructing a serialization of the
problem P and for solving the resulting subproblems. While NDIW is a sequence
of i-width searches NDIW(i), i = 0, 1, ..., over the same problem, S-NDIW is
a sequence of NDIW calls over $|G|$ subproblems $P_k$, $k = 1, \dots, |G|$. The
solution of P is then the sequential execution of the policies that solve
the subproblems of the serialization. The order in which the goals are
serialized affects the overall solution of the problem and, concretely, an
inadequate serialization may determine the success or failure of the algorithm.
Thus, the S-NDIW algorithm is not complete. In order to increase the
chance of constructing a successful serialization, a consistency check (from
[Geffner and Lipovetzky, 2012]) is applied to the nodes that are candidates to
be landmark states.
Definition 4.1.1 (Consistency of a Landmark). Let $G_i$ be the $i$-th term of
a serialization $d$. A state $s$ consistently achieves a subset of goals
$G_{i+1}$ when $G_{i+1}$ is true in $s$, $G_{i+1} \supset G_i$ extends $G_i$ with
one atom from $G$, and there exists a weak plan from $s$ that achieves the goal
$G$ such that $G_{i+1}$ is true in every intermediate state.
The series of subproblems $P_d = P_1, \dots, P_m$ defined by the serialization
$d$ are solved by a sequence of calls to NDIW, which solves every subproblem
$P_i$ in time and complexity that is exponential in $w(P_i)$. As the number of
subproblems is finite, the overall time and complexity of S-NDIW is exponential
in $w_s = \max_{1 \le i \le m} w(P_i)$. Based on this intuition, we define the
serialized width of a serialization $P_d$ as the maximum width $w(P_i)$ of the
subproblems.
In the next two sections we present two versions of S-NDIW, which differ
in the definition of the sequence of subproblems $P_d$ given a serialization $d$
of the goals in P. The Offline S-NDIW is intended for an offline
planner, since it computes a complete solution that is valid for any possible
realization of P. The Online S-NDIW is intended for an online
planner, since it takes advantage of the knowledge acquired during the
realization of the problem.
4.2 Offline S-NDIW Algorithm
The Offline Serialized Non-Deterministic Iterated Width (Offline S-NDIW)
algorithm uses NDIW to compute a solution for P without any information other
than the definition of P. The solution given by Offline S-NDIW is a policy
$\pi$ that is able to solve the problem for every possible realization of P.
The Offline S-NDIW algorithm, described in definition 4.2.1, constructs
a serialization of the problem P, together with a series of policies
$\pi_1, \dots, \pi_m$ that achieve the goals of the serialization sequentially.
Each subpolicy $\pi_i$ takes the agent from a state that makes $I_i$ true into a
state that makes $I_{i+1}$ true and achieves a new atom from $G$. Although this
state is not known in advance, the next subpolicy $\pi_{i+1}$ is guaranteed to
lead the agent into a state that achieves another atom of $G$, and so on.
Definition 4.2.1. Offline Serialized NDIW (Offline S-NDIW) over
$P = \langle F, I, A, G \rangle$ consists of a sequence of calls to NDIW over the
problems $P_k = \langle F, I_k, A, G_k \rangle$, $k = 1, \dots, |G|$, where:
1. $I_1 = I$
2. $G_k$ is a set of atoms achieved from $I_k$ such that
$G_{k-1} \subset G_k \subseteq G$ and $|G_k| = k$; $G_0 = \emptyset$
3. $I_{k+1}$ represents the partial state that contains the atoms that are true
in all the landmark states where $G_k$ is achieved through $\pi_k$, required
to be a consistent set.
In other words, the $k$-th subcall of S-NDIW stops when NDIW finds a
landmark goal $G_k \subseteq G$ and a policy $\pi_k$ for $G_k$ such that the
intersection of all landmark states reachable through $\pi_k$ achieves $G_k$
consistently. The next subcall starts with an initial state that is set to this
intersection of landmark states, so $\pi_{k+1}$ is guaranteed to be a policy for
every landmark state of $\pi_k$. The consistency requirement is set to increase
the probability of obtaining a successful order of the serialization.
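The construction of the initial state of the next subproblem can be illustrated with a minimal sketch, assuming that states are represented as sets of atoms. The function name `next_initial_state` is a hypothetical helper, and the consistency check on the landmark states is deliberately left out.

```python
def next_initial_state(landmark_states, subgoal):
    """Partial state made of the atoms that are true in every landmark state.

    `landmark_states` is assumed to be a non-empty collection of states (sets
    of atoms) in which the current subgoal has been achieved.
    """
    states = [frozenset(s) for s in landmark_states]
    if not states:
        raise ValueError("at least one landmark state is required")
    partial = frozenset.intersection(*states)
    if not frozenset(subgoal) <= partial:
        raise ValueError("subgoal does not hold in every landmark state")
    return partial
```

Since the partial state only keeps atoms shared by all landmark states, a policy computed from it does not rely on facts that hold in only some of them.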
4.3 Online S-NDIW Algorithm
The Online Serialized Non-Deterministic Iterated Width (Online S-NDIW)
algorithm uses NDIW to compute a solution for P using the knowledge of the
state in which the agent achieves every subgoal of the serialization.
The Online S-NDIW algorithm, described in definition 4.3.1, constructs a
serialization of the problem P, together with a series of policies
$\pi_1, \dots, \pi_m$ such that, prior to the computation of $\pi_{k+1}$, the
realization of $\pi_k$ is known. By knowing the state in which the $k$-th
subgoal is achieved, the policy $\pi_{k+1}$ has more information about the real
initial state of its subproblem than in the offline version, so more useful
policies can be computed.
Definition 4.3.1. Online Serialized NDIW (Online S-NDIW) over
$P = \langle F, I, A, G \rangle$ consists of a sequence of calls to NDIW over the
problems $P_k = \langle F, I_k, A, \hat{G}_k \rangle$, $k = 1, \dots, |G|$, where:
1. $I_1 = I$
2. $\hat{G}_k = G_{k-1} \cup \mathrm{oneof}\{g \in G \setminus G_{k-1}\}$, $G_0 = \emptyset$
3. The landmark states are required to be consistent states.
4. $I_{k+1}$ is the state that the agent reaches after the realization of the
$k$-th subproblem.
In other words, the $k$-th subcall of S-NDIW stops when NDIW finds a
policy in which every terminal state is a consistent state that achieves the
goal (perhaps not the same goal in each terminal state). The next subcall
starts with an initial state that models one of these terminal landmark
states, corresponding to the state of the agent after the realization of the
subproblem.
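The interleaving of planning and execution described above can be sketched as a simple loop. This is a hedged illustration only: `solve_subproblem` stands in for a call to NDIW on the current subproblem and `execute` stands in for running the resulting policy in the environment. Both are hypothetical callables supplied by the caller, and their signatures are assumptions made for this sketch.

```python
def online_s_ndiw(initial, goal, solve_subproblem, execute):
    """Sketch of the online serialization loop.

    solve_subproblem(state, achieved, remaining) -> policy, or None on failure
    execute(state, policy) -> (new_state, atom), where `atom` is the goal atom
        that the realization of the policy has achieved
    Returns (final_state, policies), or None if some subproblem is not solved.
    """
    state = frozenset(initial)
    goal = frozenset(goal)
    achieved = frozenset()
    policies = []
    while achieved != goal:
        remaining = goal - achieved
        policy = solve_subproblem(state, achieved, remaining)
        if policy is None:
            return None  # the chosen serialization led to an unsolved subproblem
        new_state, atom = execute(state, policy)
        if atom not in remaining:
            raise ValueError("execution must achieve one of the remaining atoms")
        state = frozenset(new_state)  # observed state becomes the next initial state
        achieved = achieved | {atom}
        policies.append(policy)
    return state, policies
```

The loop performs one solver call per goal atom, and each call starts from the state actually observed after the previous realization rather than from an intersection of landmark states.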
4.4 Empirical Evaluation
In this section we analyze the capabilities of S-NDIW to solve problems when
the goals are conjunctions of atoms. First, we compare the performance of
S-NDIW versus SIW on a selection of deterministic benchmark problems.
Then, we move to FOND domains, for which we compare the performance
of S-NDIW against some of the state-of-the-art FOND planners. All the
experiments with S-NDIW were run on Xeon Woodcrest computers
with clock speeds of 2.33 GHz, with a 2 GB memory limit.
4.4.1 Serialized Width in Classical Problems
The online and offline versions of the S-NDIW algorithm have been tested
on a selection of classical planning problems used in previous editions of
the IPC. The results of the tests
are presented in table 4.1 and compared with those of SIW, run on the
same experimental setup and extracted from [Geffner and Lipovetzky, 2012].
For every domain, a series of N different instances have been tested with the
offline version of S-NDIW. A total of 5 runs per instance have been tested
with the online version of S-NDIW. The table shows the number of successful
runs on each domain (normalized over 5 runs, in the case of Online S-NDIW).
Domain      | N  | Offline S-NDIW | Online S-NDIW | SIW
8puzzle     | 50 | 5              | 2.4           | 50
Blocksworld | 50 | 8              | 9             | 50
ferry       | 50 | 27             | 26            | 50
floortile   | 20 | 16             | 0             | –
Grid        | 5  | 1              | 0             | 5
gripper     | 50 | 25             | 20            | 50
logistics   | 28 | 9              | 16            | 28
Mystery     | 30 | 18             | 15            | 27
parking     | 20 | 0              | 0             | 17
pegsol      | 30 | 0              | 0.8           | 6
satellite   | 20 | 3              | 4             | 19
scanalyzer  | 30 | 0              | 1             | 26
sokoban     | 30 | 1              | 1             | 3
tidybot     | 20 | 0              | 0             | 7
tower       | 22 | 22             | 22            | –
visitall    | 20 | 3              | 2             | 19
woodworking | 30 | 1              | 1             | 30
zeno        | 20 | 7              | 8.2           | 19

Table 4.1: Offline S-NDIW vs. Online S-NDIW vs. SIW: number of instances solved
in typical benchmark domains. The Online S-NDIW results are normalized over 5
runs per problem instance.
The performance of the Offline S-NDIW and Online S-NDIW algorithms
is, apparently, very similar. However, that performance is quite low when
compared with SIW. Because of the similarities between the S-NDIW and SIW
algorithms when the problems are deterministic, we need to examine the
reason for the difference in performance between S-NDIW and SIW. Figure
4.1 shows the execution results of the offline and online versions of S-NDIW.
We observe a high percentage of successful runs when the problem size is
small. In contrast, the success rate is quite small when the size of the problem
exceeds 300 atoms. In that case, only the really simple problems are solved
(in short time), while the rest of the problems exceed the time or memory
resources during execution. There is an interval of problems (those between
3000 and 6000 atoms) in which the limiting resource for Offline S-NDIW
is memory, while for Online S-NDIW it is the execution time. This situation
may be explained by the fact that the online version of S-NDIW has
richer information about the state of the world in the landmark states than
the offline version, so the subproblems may become less complex (also in
terms of memory), even though the complexity of the whole problem makes it
impossible for Online S-NDIW to find a complete solution in the required time.
(a) Execution stats of the Offline version of S-NDIW.
(b) Execution stats of the Online version of S-NDIW.
Figure 4.1: Execution results of S-NDIW in deterministic problems. TLE = Time
Limit 30min Exceeded; MLE = Memory Limit 2GB Exceeded.
Figure 4.2 shows detailed execution stats for three selected domains:
gripper, ferry, and zeno. We see that, as stated in theorem ??,
the time needed to find a solution is, for each problem,
exponential in the problem width. The amount of memory demanded by
S-NDIW is also exponential in the problem size, and the figure clearly
shows that, when the problem size exceeds a certain threshold, the S-NDIW
algorithm is likely to run out of memory. Thus, with limited resources in
memory and time, S-NDIW is able to tackle problems whose size does
not exceed certain limits, which are different in each domain and depend on
the branching factor of the tuple graph.
Figure 4.2: Execution results of Online S-NDIW in a selection of deterministic
domains.
4.4.2 Serialized Width in FOND Problems
In this section we evaluate the performance of the S-NDIW algorithm by
comparing the online and offline versions of the planner against some of the
state-of-the-art FOND planners.
A selection of FOND benchmark domains with conjunctive goals has
been tested with the offline version of S-NDIW. The performance results are
compared with those of the state-of-the-art planners FIP and PRP, extracted
from [Muise et al., 2012]. S-NDIW demonstrates lower performance than
the other two planners.
Domain (#instances)    | Solved (unsat) Off S-NDIW | Solved (unsat) FIP | Solved (unsat) PRP
blocksworld (30)       | 7 (0)                     | 30 (0)             | 30 (0)
faults (55)            | –                         | 55 (0)             | 55 (0)
first-responders (100) | 58 (25)                   | 100 (25)           | 100 (25)
forest (90)            | –                         | 20 (11)            | 66 (48)
blocksworld-new (50)   | 4 (0)                     | 33 (0)             | 46 (0)
forest-new (90)        | 10 (0)                    | 51 (0)             | 81 (0)

Table 4.2: Offline S-NDIW vs. FIP vs. PRP performance. In parentheses, the
number of instances detected as unsolvable.
A selection of FOND benchmark domains has
been tested with the online version of S-NDIW, and the performance results are
compared with those of the state-of-the-art planners FF-Hindsight+
[Yoon et al., 2010] and PRP. The blocksworld-2, elevators, zenotravel, and
climber problems have conjunctive goals, while the river, busfare, and
triangle-tireworld problems have atomic goals. In this case, S-NDIW shows a
performance comparable with the state-of-the-art FOND planners, which is due to
the fact that these FOND problems have small bounded width. Although some of
these problems have a significant number of deadends that may increase the
complexity for other planners, the S-NDIW algorithm solves them with a low order
of complexity.
Domain (#instances)  | Success rate On S-NDIW | Success rate FF-H+ | Success rate PRP | time(s) On S-NDIW | time(s) FF-H+ | time(s) PRP
blocksworld-2 (15)   | 29.1%                  | 74.4%              | 100%             |                   | 900           | 8.4
elevators (15)       | 91.1%                  | 64.9%              | 100%             |                   | 1620          | 1.7
zenotravel (15)      | 31.8%                  | 68.9%              | 100%             |                   | 1620          | 98.7
climber (1)          | 100%                   | 100%               | 100%             | 0                 | –             | 0
river (1)            | 100%                   | 66.7%              | 66.7%            | 0                 | –             | 0
busfare (1)          | 100%                   | 100%               | 100%             | 0                 | –             | 0
triangle-tire-1 (1)  | 100%                   | 100%               | 100%             | 0 (0)             | –             | 0
triangle-tire-17 (1) | 100%                   | 100%               | 100%             | 124 (29)          | –             | 19
triangle-tire-31 (1) | 100%                   | –                  | 100%             | 2736 (723)        | –             | –
triangle-tire-35 (1) | –                      | –                  | 100%             | –                 | –             | 1520

Table 4.3: Online S-NDIW vs. FF-H+ vs. PRP performance. Results for 30
runs.
4.5 Conclusion
We have introduced a serialization method for addressing FOND problems
with goals that are conjunctions of atoms. We have proposed two different
versions of the planner: an offline planner, which computes a complete
solution valid for any possible realization of the problem; and an online
planner, which computes a solution of the problem given one realization.
We note that, although the NDIW algorithm is complete, S-NDIW is not.
The S-NDIW algorithm does not guarantee finding a solution when one exists.
The reason is that the order of the serialization affects the
success of the algorithm, and some consistent orderings may lead to landmark
states that are deadends of the problem.
Finally, the S-NDIW algorithm, through its two versions, is able to solve
some of the FOND benchmark domains with a performance that is
comparable with the state-of-the-art FOND planners, provided that the problem
size is sufficiently small. However, the performance of S-NDIW compared
with that of SIW in deterministic domains is low, and our implementation
of S-NDIW is only able to solve problems with small width and size.
Chapter 5
POLICY EXTRACTION
This chapter formalizes the extraction of policies for a FOND planning problem
P from a tuple graph $G$ of P. The Iterated Prune algorithm is proposed to
extract a family of policies from the tuple graph, with a complexity that is
comparable with that of the expansion of the tuple graph.
5.1 Solution Extraction Graph
As defined in chapter 3, the tuple graph is a graph whose nodes are of the
form $r_t$, the side-effects of a tuple $t \in T^i$, and that encodes the
reachability relations among the states represented by the nodes. These
relations are represented by directed edges of the form
$r_t \xrightarrow{a_j} r_{t'}$, meaning that for every
optimal weak plan for $r_t$ there exist an action $a$ and an effect of $a$
(say, $a_j$) such that applying the effect $a_j$ to $r_t$ yields a weak plan
for $r_{t'}$. In every state $r_t$ more than one action may be applicable,
meaning that the node $r_t$ in the tuple graph $G^i$ may have outgoing edges
labeled with effects of different actions. Similarly, an effect $a_j$ of an
action $a$ may map $r_t$ into a state that models more than one node in $G^i$,
meaning that the node $r_t$ in the tuple graph $G^i$ may have more than one
outgoing edge labeled with the same effect $a_j$.
A policy $\pi$ for a problem P is well defined when $\pi$ is defined in every
reachable state, and $\pi$ eventually maps the initial state into a goal state
of the problem. In order to identify when the tuple graph encodes a policy for
the problem P, we need to check whether $G^i$ contains a subgraph $G^{i\prime}$
that satisfies certain properties. The following definitions formalize the
properties that we will require of $G^{i\prime}$ to define a policy for P.
Definition 5.1.1. Let $G^i$ be a tuple graph of a FOND planning problem P.
We denote by $A_{G^i}(r_t)$ the set of actions $a$ for which the node $r_t$ has
an outgoing edge labeled with an effect of $a$.
Definition 5.1.2. Let $G^i$ be a tuple graph of a FOND planning problem P.
We say that a node $r_t$ is complete if, for every $a \in A_{G^i}(r_t)$ and for
every effect $a_j$ of $a$, there is an outgoing edge of $r_t$ labeled with
$a_j$. We say that a node $r_t$ is unitary if, for every effect $a_j$, there is
one, and only one, outgoing edge of $r_t$ labeled with $a_j$.
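The two node properties can be checked mechanically. The sketch below assumes a representation that is not part of the original text: the graph is a dictionary mapping each node to a list of `(action, effect_index, target)` edges, and `num_effects[action]` gives the number of non-deterministic effects of each action. The unitary condition is read here as ranging over the effects of the actions that label the node.

```python
from collections import Counter


def actions_at(edges, node):
    """Actions with at least one outgoing edge of `node` labeled with their effects."""
    return {action for (action, _effect, _target) in edges.get(node, [])}


def is_complete(edges, num_effects, node):
    """True if every effect of every action at `node` labels some outgoing edge."""
    labels = {(a, j) for (a, j, _t) in edges.get(node, [])}
    return all((a, j) in labels
               for a in actions_at(edges, node)
               for j in range(num_effects[a]))


def is_unitary(edges, num_effects, node):
    """True if every effect of every action at `node` labels exactly one edge."""
    counts = Counter((a, j) for (a, j, _t) in edges.get(node, []))
    return all(counts[(a, j)] == 1
               for a in actions_at(edges, node)
               for j in range(num_effects[a]))
```

Under this reading, a unitary node is always complete, while a complete node may have several edges for the same effect.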
Definition 5.1.3. Let $G^i$ be a tuple graph of a FOND planning problem P,
and let $G^{i\prime}$ be a subgraph of $G^i$. A state $r_t$ is a deadend in
$G^{i\prime}$ when there does not exist a chain in $G^{i\prime}$ starting in
$r_t$ that implies the goal $G$.
A policy $\pi$ can be inferred from a subgraph $G^{i\prime}$ of $G^i$ with no
deadends when, for every state $r_t$, we consider $\pi_{G^{i\prime}}(r_t)$ to be
a random choice over the actions in $A_{G^{i\prime}}(r_t)$, which we denote as
$\pi_{G^{i\prime}}(r_t) = \mathrm{oneof}\{a \mid a \in A_{G^{i\prime}}(r_t)\}$. The
policy $\pi$ for an arbitrary (reachable) state $s$ will be
$\pi(s) = \mathrm{oneof}\{\pi_{G^{i\prime}}(r_t) \mid s \models r_t\}$. It is not
difficult to see that all the nodes in $G^{i\prime}$ need to be complete to
define a policy. However, they do not need to be unitary, so the same effect
of an action $a = \pi_{G^{i\prime}}(r_t)$ may map the state $r_t$ into more than
one possible node in $G^{i\prime}$. For that reason, we choose the policy
$\pi(s)$ to be one of the
actions in $A_{G^{i\prime}}(r_t)$, for any $r_t$ true in $s$.
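The randomized choice of an action can be sketched as follows. This is an illustration under assumptions that are not in the original text: nodes are frozensets of atoms, a state models a node when the node is a subset of the state, and the subgraph is a dictionary mapping each node to a list of `(action, effect_index, target)` edges. The name `choose_action` is a hypothetical helper.

```python
import random


def applicable_actions(edges, node):
    """Sorted list of the actions that label outgoing edges of `node`."""
    return sorted({a for (a, _j, _t) in edges.get(node, [])})


def choose_action(edges, state, rng=random):
    """Pick one action for `state` among the nodes of the subgraph it models.

    One action is first drawn for each modeled node that has outgoing edges,
    and then one of those candidates is drawn as the action for the state.
    Returns None when no modeled node offers an action (e.g. a goal node).
    """
    state = frozenset(state)
    candidates = []
    for node in edges:
        if frozenset(node) <= state:
            actions = applicable_actions(edges, node)
            if actions:
                candidates.append(rng.choice(actions))
    if not candidates:
        return None
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which can help when comparing runs.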
Theorem 5.1.1. Let $G^i$ be a tuple graph of a FOND planning problem P.
Let $G^{i\prime}$ be a subgraph of $G^i$. The subgraph $G^{i\prime}$ defines a
strong cyclic policy $\pi$ for P if $G^{i\prime}$ has no deadends, there exists
$r_t \in G^{i\prime}$ such that $r_t$ is true in $I$, and
every node in $G^{i\prime}$ is complete.
Proof. The direct implication is trivial. Let us consider a subgraph
$G^{i\prime}$ with the stated properties; we will see that it defines a strong
cyclic policy. First, there exists a node $r_t$ modeled by the initial state
$I$. For any $r_{t'}$ in $G^{i\prime}$, either $r_{t'}$ is a goal state, or there
exists an applicable action in $A_{G^{i\prime}}(r_{t'})$
(because $G^{i\prime}$ has no deadends). As every node in $G^{i\prime}$ is
complete, the policy that picks an action from $A_{G^{i\prime}}(r_{t'})$ is well
defined. As there are no deadends
in $G^{i\prime}$, the policy is guaranteed to (eventually) reach the goal.
5.2 Iterated Prune Algorithm
We propose an algorithm, Iterated Prune (IP), that checks the existence of
a subgraph $G^{i\prime} \subseteq G^i$ that defines a policy for P, and that is
based on the algorithm to extract strong cyclic solutions presented in
[Geffner and Bonet, 2013].
The IP algorithm is constructive, so a family of policies is obtained at the
end of the process. The time and complexity of the proposed method is
comparable with the complexity of the expansion of the tuple graph. Although
more efficient algorithms exist, they are out of the scope of this work, and
do not result in a major improvement of the overall planning algorithm.
The Iterated Prune algorithm consists of a series of pruning steps on the
tuple graph $G^i$ until a subgraph with the properties of theorem 5.1.1 is
found, or all the nodes in the graph are pruned. In order to guarantee the
absence of deadend nodes, the algorithm computes the value $V_{\min}(r_t)$ of
every node in $G^i$, which measures the minimum distance from $r_t$ to a goal
state, that is, the length of a minimum chain from $r_t$ that implies the goal
$G$. If $G^i$ contains a node with $V_{\min}(r_t) = \infty$, the algorithm
prunes this node from the graph.
Pruning a node implies not only removing the node and all its incoming and
outgoing edges, but also preserving the guarantee that the resulting graph is
complete. That is, if an edge $r_{t'} \xrightarrow{a_j} r_t$ existed prior to
pruning the node $r_t$, and $r_{t'}$ does not have any more outgoing edges
labeled with the effect $a_j$, then all
the outgoing edges of $r_{t'}$ labeled with effects of the action $a$ need to
be pruned as well.
The pruning process is guaranteed to end in a finite number of iterations
(at most, in $|G^i|$ iterations), and its complexity is comparable to
the complexity needed to expand the tuple graph. If P admits a solution,
the Iterated Prune algorithm returns a subgraph $G^{i\prime}$ that defines a
family of strong cyclic policies that solve P.
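The pruning loop can be sketched as follows. This is a minimal illustration and not the implementation used in the experiments. It assumes that the graph is a dictionary mapping every node (including goal nodes and edge targets) to a list of `(action, effect_index, target)` edges, that the input graph is complete, and that `goals` and `root` are given explicitly. As a simplification, it prunes all nodes with infinite $V_{\min}$ in each pass rather than one node per iteration.

```python
import math
from collections import deque


def v_min(graph, goals):
    """Length of a shortest chain from each node to a goal node (inf if none)."""
    dist = {node: math.inf for node in graph}
    preds = {node: set() for node in graph}
    for node, out in graph.items():
        for (_a, _j, target) in out:
            if target in preds:
                preds[target].add(node)
    queue = deque()
    for g in goals:
        if g in graph:
            dist[g] = 0
            queue.append(g)
    while queue:  # backward breadth-first search from the goal nodes
        node = queue.popleft()
        for p in preds[node]:
            if dist[p] == math.inf:
                dist[p] = dist[node] + 1
                queue.append(p)
    return dist


def iterated_prune(edges, goals, root):
    """Return a deadend-free subgraph containing `root`, or None if root is pruned."""
    graph = {node: list(out) for node, out in edges.items()}
    while True:
        dist = v_min(graph, goals)
        dead = {node for node, d in dist.items() if d == math.inf}
        if not dead:
            return graph
        if root in dead:
            return None
        for node in dead:
            del graph[node]
        for node in list(graph):
            out = graph[node]
            kept = [e for e in out if e[2] not in dead]
            kept_labels = {(a, j) for (a, j, _t) in kept}
            # An action is dropped when one of its effects lost all its edges,
            # so that the remaining nodes stay complete.
            broken = {a for (a, j, t) in out
                      if t in dead and (a, j) not in kept_labels}
            graph[node] = [e for e in kept if e[0] not in broken]
```

Each pass removes at least one node, so the loop ends after at most as many passes as there are nodes in the graph.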
Theorem 5.2.1. The Iterated Prune algorithm is sound and, if P admits a
solution, returns a family of strong cyclic policies for P.
Proof. In each iteration, the Iterated Prune algorithm prunes a node $r_t$ with
$V_{\min}(r_t) = \infty$, i.e., a node that is a deadend of the graph. When
pruning a node $r_t$, its incoming and outgoing edges are removed while
preserving the completeness of the remaining nodes. At the end of the process,
the algorithm returns a graph that satisfies the conditions of theorem 5.1.1.
When the nodes are not unitary, the graph does not define a unique policy, but
a family of strong cyclic policies for P.
Similarly, when the Iterated Prune algorithm does not find a policy for
P, the root of $G^{i\prime}$ is a deadend of the problem (because it is pruned).
Therefore, there does not exist a strong cyclic policy for P.
5.2.1 Policy Definition
The graph returned by the Iterated Prune algorithm is a graph such that,
for every state rt, a set of actions A(rt) is available. Every action a ∈ A(rt)
maps the state rt into a state that depends on the (non-deterministic) effect
of a. Concretely, this graph is an AND/OR graph [Russell and Norvig, 2002].
In the tests described in chapters 3 and 4, we simply selected an action a
from the set A(rt) at random. The fact that the nodes of the subgraph are
complete, and that the subgraph has no deadends, guarantees that, assuming
fair non-determinism, our (randomized) policy eventually solves the problem.
There exist algorithms that select optimal solutions according to the cost
of the actions, or the presence or absence of loops [Dechter and Pearl, 1985]
[Hansen and Zilberstein, 1998] [Bonet and Geffner, 2005]. Once the Iterated
Prune algorithm returns the subgraph that defines a family of strong cyclic
policies for P, we can use an off-the-shelf algorithm (e.g., AO*
[Geffner and Bonet, 2013]) to select the optimal policy according to our
preferences.
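The randomized policy described in this section can be illustrated with a short simulation. The sketch below is only an assumption-laden example: the encoding of the pruned graph (each state maps to its available actions, and each action to a list with one successor state per non-deterministic effect) and the name `run_random_policy` are hypothetical, and fair non-determinism is modeled by drawing the effect uniformly at random.

```python
import random


def run_random_policy(policy_graph, root, goals, rng, max_steps=10_000):
    """Follow a uniformly random action from A(state) until a goal is reached.

    Assumed encoding: policy_graph[state][action] = list of successor states,
    one per non-deterministic effect. Returns the visited states, or None if
    no goal was reached within max_steps.
    """
    state = root
    trace = [state]
    for _ in range(max_steps):
        if state in goals:
            return trace
        action = rng.choice(sorted(policy_graph[state]))  # random a in A(state)
        state = rng.choice(policy_graph[state][action])   # fair random effect
        trace.append(state)
    return trace if state in goals else None


# Toy pruned graph: complete nodes, no deadends, and one cycle (s1 -> s0).
pruned_graph = {
    "s0": {"b": ["s1"]},
    "s1": {"c": ["g", "s0"]},
    "g": {},
}
trace = run_random_policy(pruned_graph, "s0", {"g"}, random.Random(0))
```

Because every state in the toy graph keeps a path to the goal, each pass through `s1` has a fixed positive chance of reaching `g`, so the run ends at the goal with probability one; the `max_steps` bound only guards against pathological inputs.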
5.3 Conclusion
The tuple graph G defined in chapter 2 may contain a subgraph that defines a strong cyclic policy for a FOND planning problem P. In that case,
the methods defined in chapters 3 and 4 build a tuple graph G i that,
for i = w(P), contains a subgraph that defines a strong cyclic policy. Such
policies can be extracted with the Iterated Prune algorithm, a method that is sound and complete, and that returns a family of strong
cyclic policies for P in time comparable to that of
expanding the tuple graph. Finally, we can use an off-the-shelf algorithm to
select the optimal policy according to our preferences.
Part III
Conclusions and Future Work
Chapter 6
DISCUSSION AND CONCLUSION
In this chapter we summarize the contributions of this manuscript, and discuss future lines of research based on the work developed so far.
6.1 Contributions
This manuscript contributes to the study of the structure of FOND planning
problems, and proposes an abstraction of the complexity of FOND problems
based on reachability over classes that are the side-effects of tuples of
atoms. This abstraction appears to capture the complexity of the typical
benchmark problems in classical and FOND planning. In more detail, the
contributions of this manuscript are:
1. A new width notion for fully observable, non-deterministic planning
that encodes the reachability among states, and explains the complexity in planning problems. We prove theoretically that some typical
benchmark domains have a bounded and low width when goals are
restricted to single atoms.
2. A simple, blind-search planning algorithm (NDIW) that runs in time
exponential in the width of the problem. We show empirically that
many of the existing benchmark domains have a bounded and low width
when goals are restricted to single atoms.
3. An online version, and an offline version of a simple, blind-search planning algorithm (S-NDIW ) that uses NDIW for serializing a problem
into subproblems and for solving such subproblems.
4. A simple, effective method to extract strong cyclic policies from the
tuple graph that runs in time that is comparable with the time needed
to expand the tuple graph.
6.2 Future Work
This thesis defines a serialization of the problem that is based on the approach
in [Geffner and Lipovetzky, 2012] for classical planning. Intuition suggests
that the performance of S-NDIW should be comparable to that
of SIW in classical planning problems. However, our tests reveal that S-NDIW
does not perform as well, due to memory and time constraints. The
low performance of S-NDIW may be due to the implementation, which should
be revised.
Serializing the problem leads to an algorithm that is not complete, since the order of the serialization affects the success of the planner.
In order to improve the performance, and to minimize the chances of entering
a deadend of the problem, a consistency check over the states is employed.
This check tests for the existence of weak plans that satisfy certain
constraints, and it does improve the performance of the planner. However, since we pursue strong and strong cyclic plans, it may be useful
to use a more constrained consistency check that increases the likelihood
that the desired policies exist.
The tests reported in this manuscript check the existence of strong cyclic
policies. However, the same tests can be performed to check for the existence
of strong policies, or even weak plans. It may be interesting to compare
which of these kinds of solutions exist in the existing benchmark domains.
There is a strong connection between Markov Decision Problems
(MDPs) and FOND planning. The difference is that the MDP model attaches
probabilities to the transition function, while the FOND model does not. Any
MDP can be translated into a FOND problem by mapping each outcome that has
non-zero probability under an action to a distinct non-deterministic effect of that action. This correspondence between FOND and MDP problems suggests that the width notion
introduced in this manuscript can be adapted to describe the MDP model
accurately.
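As a small illustration of this translation, the sketch below drops the probabilities of an MDP transition function and keeps only the set of possible outcomes. The dictionary encoding and the name `mdp_to_fond` are assumptions made for the example, not part of the thesis.

```python
def mdp_to_fond(transitions):
    """Turn MDP transitions into non-deterministic ones.

    Assumed encoding: transitions[(state, action)] = {next_state: probability}.
    The result maps (state, action) to the set of next states that have
    non-zero probability, i.e. the non-deterministic effects of the action.
    """
    fond = {}
    for (state, action), distribution in transitions.items():
        outcomes = frozenset(s for s, p in distribution.items() if p > 0)
        if outcomes:  # actions with no possible outcome are dropped
            fond[(state, action)] = outcomes
    return fond


mdp = {
    ("s0", "move"): {"s1": 0.9, "s0": 0.1, "s2": 0.0},
    ("s1", "move"): {"g": 1.0},
}
fond = mdp_to_fond(mdp)
```

Note that the translation is lossy: two MDPs that differ only in their probability values map to the same FOND problem.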
The tuple graph exploits the fact that, in a FOND problem, not all atoms
are relevant in every state in order to achieve the goal and, similarly, only a
few atoms are relevant in each state in order to advance towards the goal of
the problem. It may be interesting to check the existence of compact policies
that only take into account a selection of atoms of the problem, leaving the
remaining atoms unattended.
Bibliography
[Blum and Furst, 1997] Blum, A. L. and Furst, M. L. (1997). Fast planning
through planning graph analysis. Artificial Intelligence, 90(1-2):281–300.
[Bonet and Geffner, 2001] Bonet, B. and Geffner, H. (2001). Planning as
heuristic search. Artificial Intelligence, 129(1-2):5–33.
[Bonet and Geffner, 2005] Bonet, B. and Geffner, H. (2005). An algorithm
better than AO*? In Proceedings of the 20th National Conference on Artificial Intelligence - Volume 3, AAAI’05, pages 1343–1347. AAAI Press.
[Bylander, 1994] Bylander, T. (1994). The Computational Complexity of
Propositional STRIPS Planning. Artificial Intelligence, 69(1-2):165–204.
[Chapman, 1987] Chapman, D. (1987). Planning for conjunctive goals. Artificial Intelligence, 32(3):333–377.
[Chen and Giménez, 2007] Chen, H. and Giménez, O. (2007). Act Local,
Think Global: Width Notions for Tractable Planning. ICAPS.
[Dechter and Pearl, 1985] Dechter, R. and Pearl, J. (1985). Generalized best-first search strategies and the optimality of A*. J. ACM, 32(3):505–536.
[Fikes and Nilsson, 1972] Fikes, R. and Nilsson, N. (1972). STRIPS: A new
approach to the application of theorem proving to problem solving. Artificial Intelligence, 2.
[Fu et al., 2008] Fu, J., Bastani, F., Ng, V., Yen, I.-L., and Zhang, Y. (2008).
FIP: A Fast Planning-Graph-Based Iterative Planner. 20th IEEE ICTAI,
pages 419–426.
[Geffner and Bonet, 2013] Geffner, H. and Bonet, B. (2013). A Concise Introduction to Models and Methods for Automated Planning: Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool
Publishers.
[Geffner and Lipovetzky, 2012] Geffner, H. and Lipovetzky, N. (2012).
Width and serialization of classical planning problems. ECAI.
[Gerevini et al., 2009] Gerevini, A. E., Haslum, P., Long, D., Saetti, A., and
Dimopoulos, Y. (2009). Deterministic planning in the fifth international
planning competition: PDDL3 and experimental evaluation of the planners. Artificial Intelligence, 173(5-6):619–668.
[Ghallab et al., 2004] Ghallab, M., Nau, D., and Traverso, P. (2004). Automated Planning: Theory & Practice. Morgan Kaufmann.
[Hansen and Zilberstein, 1998] Hansen, E. and Zilberstein, S. (1998).
Heuristic search in cyclic AND/OR graphs. AAAI/IAAI, pages 412–418.
[Hoffmann et al., 2004] Hoffmann, J., Porteous, J., and Sebastia, L. (2004).
Ordered landmarks in planning. J. Artif. Int. Res., 22(1):215–278.
[Kissmann and Edelkamp, 2009] Kissmann, P. and Edelkamp, S. (2009).
Solving fully-observable non-deterministic planning problems via translation into a general game. In KI 2009, KI’09, pages 1–8, Berlin, Heidelberg.
Springer-Verlag.
[Levesque, 2005] Levesque, H. (2005). Planning with loops. IJCAI,
20(1):249–254.
[Little and Thiebaux, 2007] Little, I. and Thiebaux, S. (2007). Probabilistic
planning vs. replanning. ICAPS Workshop on IPC: Past, Present and
Future.
[McDermott, 1999] McDermott, D. (1999). Using regression-match graphs
to control search in planning. Artificial Intelligence, pages 1–42.
[McDermott et al., 1998] McDermott, D., Ghallab, M., Howe, A., and
Knoblock, C. (1998). PDDL-the planning domain definition language.
[Muise et al., 2012] Muise, C., McIlraith, S., and Beck, J. (2012). Improved
Non-Deterministic Planning by Exploiting State Relevance. ICAPS, pages
172–180.
[Richter and Westphal, 2010] Richter, S. and Westphal, M. (2010). The
LAMA planner: guiding cost-based anytime planning with landmarks. J.
Artif. Int. Res., 39(1):127–177.
[Russell and Norvig, 2002] Russell, S. J. and Norvig, P. (2002). Artificial Intelligence: A Modern Approach, 2nd Ed. Prentice Hall, Englewood Cliffs,
NJ.
[Yoon et al., 2010] Yoon, S., Ruml, W., Benton, J., and Do, M. (2010).
Improving determinization in hindsight for online probabilistic planning.
ICAPS.
