Computing Compact Policies for Fully Observable Non-Deterministic Planning Problems

Alberto Camacho Martinez
MASTER THESIS UPF / 2013
SUPERVISED BY: Prof. Hector Geffner
Department of Information and Communications Technologies

Acknowledgements

There are a number of people without whom this thesis might not have been possible, and to whom I am greatly indebted. It is not possible to mention all of them, nor every individual contribution, so I will use these lines to express my gratitude to everyone who has been involved in some way with my work during the preparation of this Master Thesis.

First, I would like to thank my MSc advisor, Hector Geffner. Many things could be said here, but I will try to be brief. There are many talented people in the scientific community, but only a few of them combine their talent for research with a pleasant and approachable manner. Hector is one of them. He has always had time to discuss new ideas and the progress of this work, not to mention his role in the friendly atmosphere at the department. I am grateful to him for trusting me and for offering me such an interesting line of research.

Second, I would like to thank Nir Lipovetzky, a former PhD candidate, whose contributions during his predoctoral studies made my work here possible. I had the pleasure of spending some months with him at UPF, discussing his work and, most valuably, receiving advice on how to do research in planning.

Continuing with the Artificial Intelligence group at UPF, I would like to thank the rest of the team for their insights and support: Alexandre Albore, Hector Palacios, Anders Jonsson, and my two office mates, Damir Lotinac and Filippos Kominis.
Completing the master's program would not have been such an amazing experience without the good atmosphere created by my classmates, whom I want to thank for the fun moments we shared. Last, but not least, I want to mention the support of the DTIC administration department, which saved my neck more than once with the administrative paperwork.

Abstract

Fully Observable Non-Deterministic (FOND) planning is the problem of finding action strategies for solving a planning problem, assuming that the actions may have non-deterministic effects and that the states are fully observable. In this thesis, we adapt the approach of [Geffner and Lipovetzky, 2012] to study the complexity of FOND problems: we define a parameter, the width, that captures the complexity of the problem, and we develop an algorithm that computes compact policies with a complexity that is exponential in the problem width. We also show that many of the benchmark domains with atomic goals can be solved in low polynomial time with this method, and that some problems with conjunctive goals can be solved with low complexity and with an efficiency comparable to that of state-of-the-art FOND planners.

Resumen

Fully observable non-deterministic (FOND) planning problems consist of finding action policies that solve a problem, assuming that the actions may have non-deterministic effects and that the states are fully observable. In this thesis we adapt the method used in [Geffner and Lipovetzky, 2012] to study the complexity of FOND problems: we define a parameter, the width, that captures the complexity of the problem, and we develop an algorithm that computes compact policies with a complexity that is exponential in the problem width. We also show that many of the standard domains with atomic goals can be solved with this method in low polynomial time, and that some of the problems with non-atomic goals can be solved with low complexity, comparable to that of current planners.

Contents

List of Figures
List of Tables

I Background
1 PROBLEM STATEMENT
  1.1 Introduction
  1.2 Classical Planning
  1.3 Fully Observable Non-Deterministic Planning
  1.4 Thesis Outline

II Structure
2 A WIDTH NOTION FOR FOND PLANNING
  2.1 Motivation
  2.2 Tuple Graph
  2.3 Low Width Benchmarks
  2.4 Conclusion
3 NON-DETERMINISTIC ITERATED WIDTH
  3.1 Tuple Graph G^i
  3.2 Algorithm
  3.3 Empirical Evaluation
  3.4 Conclusion
4 SERIALIZED NDIW
  4.1 Serialized Width
  4.2 Offline S-NDIW Algorithm
  4.3 Online S-NDIW Algorithm
  4.4 Empirical Evaluation
    4.4.1 Serialized Width in Classical Problems
    4.4.2 Serialized Width in FOND Problems
  4.5 Conclusion
5 POLICY EXTRACTION
  5.1 Solution Extraction Graph
  5.2 Iterated Prune Algorithm
    5.2.1 Policy Definition
  5.3 Conclusion

III Conclusions and Future Work
6 DISCUSSION AND CONCLUSION
  6.1 Contributions
  6.2 Future Work

List of Figures

2.1 Blocks World problem
3.1 NDIW execution stats
3.2 NDIW cumulative time distribution
3.3 NDIW time to solve the triangle-tireworld problem
4.1 S-NDIW execution stats
4.2 Online S-NDIW execution stats, selected domains

List of Tables

1.1 SIW performance
1.2 State-of-the-art FOND and MDP planners
3.1 NDIW vs. IW performance in deterministic problems
3.2 NDIW performance in FOND problems with atomic goals
3.3 NDIW vs. PRP vs. FF-H+ performance
4.1 Offline S-NDIW vs. Online S-NDIW vs. SIW performance
4.2 Offline S-NDIW vs. FIP vs. PRP performance
4.3 Online S-NDIW vs. FF-H+ vs. PRP performance

Part I
Background

Chapter 1
PROBLEM STATEMENT

In this chapter we introduce the general problem of planning in artificial intelligence. We then focus on two forms of planning: classical planning and fully observable non-deterministic planning. We conclude with an explanation of the scope of this thesis.

1.1 Introduction

Planning is the problem of finding action strategies for a given problem such that the initial state of the problem is mapped into a desired goal state [Ghallab et al., 2004], [Russell and Norvig, 2002], [Bonet and Geffner, 2001].
The model for classical planning is formulated in Definition 1.1.1, and is the simplest model for planning problems. The other models arise from relaxing this definition, e.g., when the effects of the actions are not deterministic (see Section 1.3).

Definition 1.1.1. A classical planning model is a tuple $\Pi = \langle S, s_0, S_G, A, F \rangle$ where:
(i) $S$ is a finite and discrete set of states (the state space).
(ii) $s_0 \in S$ is the initial state of the problem.
(iii) $S_G \subseteq S$ is a set of goal states.
(iv) $A$ is a set of actions. The set of actions applicable in a state $s$ is denoted by $A(s)$, with $A(s) \subseteq A$.
(v) $F$ is a transition function that gives the states $s'$ that result from applying an action $a$ in a state $s$: $s' \in F(s, a)$ for $a \in A(s)$.

The problem of planning is computationally intractable [Bylander, 1994], and the tradeoff between the quality of the solutions and the time and memory needed to find them makes planning a challenging task. During the last two decades we have seen significant improvements in the performance of planners, as witnessed by the International Planning Competition (see footnote 1). Planning problems can be classified into different types according to the action effects (deterministic vs. stochastic) and the level of observability of the world (fully observable vs. partially or non-observable). While classical planning has been studied in depth and state-of-the-art planners demonstrate high performance, other types of planning problems involving non-determinism and/or partial observability still seem to have room for notable improvements [Levesque, 2005], [Kissmann and Edelkamp, 2009].

Planning Models

Planning problems define a huge state space, and enumerating all the states in the problem definition is not feasible. In order to provide a compact, computationally tractable problem definition, factored representations describe every state of the problem not as a whole entity, but as the set of predicate variables that are true in that state.
That is, a state space that is intractable in size can be described with a set of variables that is bounded in size. The most widely used representation in the planning literature is STRIPS [Fikes and Nilsson, 1972], which represents the states of a classical planning problem through boolean variables (a.k.a. fluents, facts, or atoms) stating whether a proposition that describes the world is true or false in a given state. An action in STRIPS consists of three sets of variables: the precondition set Pre, the Add set (containing the atoms that the action makes true), and the Del set (containing the atoms that the action makes false). Definition 1.1.2 formalizes the STRIPS representation.

Definition 1.1.2. A planning problem in STRIPS is a 4-tuple $\Pi = \langle F, O, I, G \rangle$ where:
(i) $F$ is a set of boolean variables.
(ii) $O$ is a set of operators, where each $o \in O$ has the form $o = \langle Pre(o), Add(o), Del(o) \rangle$, with $Pre(o), Add(o), Del(o) \subseteq F$.
(iii) $I$ is a subset of $F$, describing the initial state.
(iv) $G$ is a subset of $F$, describing the set of goal states.

Footnote 1: The International Planning Competition (IPC) is a biennial event organized in the context of the International Conference on Planning and Scheduling.

In the planning literature it is common to formulate problems in STRIPS form. However, other popular languages exist, such as the Planning Domain Definition Language (PDDL) [McDermott et al., 1998] and other PDDL-like languages that have been standard in the International Planning Competitions [Gerevini et al., 2009]. In PDDL, a planning problem is defined by a set of predicates, variables, and constants (see Listing 1.1) that are used as the input to planners. The definition of a planning problem in PDDL is divided into two descriptors, usually stored in two different files: the domain and the instance. The domain of a problem describes the types of variables, constants, and actions in the problem.
The instance of a problem lists the variables and constants in the problem, as well as the initial state and the goal state. In this way, different configurations of the same planning problem share the same domain file but have different instance files. For that reason, we will usually refer to different configurations of a planning domain as different instances of that domain.

Listing 1.1: PDDL description of the River FOND problem, consisting of the domain description and an instance description.

    (define (domain river)
      (:requirements :typing :strips :non-deterministic)
      (:predicates (on-near-bank) (on-far-bank) (on-island) (alive))
      (:action traverse-rocks
        :parameters ()
        :precondition (and (on-near-bank))
        :effect (and (not (on-near-bank))
                     (oneof (on-far-bank) (not (alive)) (on-island) (on-island))))
      (:action swim-river
        :parameters ()
        :precondition (and (on-near-bank))
        :effect (and (not (on-near-bank))
                     (oneof (and) (on-far-bank))))
      (:action swim-island
        :parameters ()
        :precondition (and (on-island))
        :effect (and (not (on-island))
                     (oneof (on-far-bank) (on-far-bank) (on-far-bank) (on-far-bank) (not (alive))))))

    (define (problem river-problem)
      (:domain river)
      (:init (on-near-bank) (alive))
      (:goal (and (on-far-bank))))

1.2 Classical Planning

Classical planning is the problem of finding a sequence of actions that maps a given initial state into a goal state, assuming that the environment and the actions are deterministic. This is the simplest model in planning, and corresponds to the case in which the transition function of Definition 1.1.1 is deterministic (i.e., $F(s, a)$ is a singleton).
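To make the STRIPS representation of Definition 1.1.2 and the deterministic transition function more concrete, the following is a minimal Python sketch, not the implementation used in this thesis. States are sets of true atoms, and an operator is applied by removing its Del set and adding its Add set. The names `Operator`, `applicable`, `progress`, and `is_plan`, as well as the two-location toy problem, are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, NamedTuple, Sequence

State = FrozenSet[str]  # a state is the set of atoms that are true in it


class Operator(NamedTuple):
    """A STRIPS operator <Pre, Add, Del> (hypothetical helper type)."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def applicable(state: State, op: Operator) -> bool:
    # An operator is applicable when all of its preconditions hold.
    return op.pre <= state


def progress(state: State, op: Operator) -> State:
    # Deterministic transition: F(s, a) is the single state (s \ Del) U Add.
    if not applicable(state, op):
        raise ValueError(f"{op.name} is not applicable")
    return (state - op.delete) | op.add


def is_plan(init: State, goal: FrozenSet[str], plan: Sequence[Operator]) -> bool:
    # A plan is valid if every operator is applicable in turn
    # and the final state makes all goal atoms true.
    state = init
    for op in plan:
        if not applicable(state, op):
            return False
        state = progress(state, op)
    return goal <= state


# Toy problem (an assumption for illustration): move from location a to b.
move_ab = Operator("move-a-b", frozenset({"at-a"}),
                   frozenset({"at-b"}), frozenset({"at-a"}))
init = frozenset({"at-a"})
goal = frozenset({"at-b"})
```

In this sketch, `is_plan(init, goal, [move_ab])` holds while the empty plan is rejected, which mirrors the fact that a classical plan can be validated in advance using only the initial state.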
The solutions of a classical planning problem are plans, i.e., sequences of actions that map the initial state $I$ into the goal state $G$ of the problem $P$. The determinism of the actions makes it possible to compute a complete plan in advance (i.e., prior to execution) using only the information in the initial state, without any knowledge acquired in the intermediate states reached during the execution of the plan.

Width in Classical Planning

The structure inherent to classical planning problems has been the subject of several studies, with the aim of characterizing their complexity and finding methods for solving them. For instance, the theoretical study in [Chen and Giménez, 2007] suggests several width notions that measure the complexity of planning problems according to the number of atoms that change their value during a plan execution. A more recent study, [Geffner and Lipovetzky, 2012], introduces a novel width concept that bounds the complexity of classical planning problems by studying the reachability of new tuples of atoms during the plan execution. The same study introduces an algorithm that solves planning problems with a complexity that is exponential in the problem width.

Tuple Graph

The tuple graph (in classical planning) encodes the reachability relations over tuples of atoms $t$. From now on, we will use the term tuple to refer to a set of atoms of a planning problem $P$, and we will write $P(t)$ for the problem that is like $P$ but with goal $t$. We will assume that every action has the same cost, so optimal plans are the shortest ones. In classical planning, every optimal plan $\pi$ has the property that every action $a$ in $\pi$ introduces a new tuple of atoms optimally. In other words, every partial plan $\pi_i$ of $\pi$ is an optimal plan for a certain tuple of atoms $t_i$. The execution of $\pi$ is then a chain of state-action pairs that optimally achieves a tuple of atoms $t_i$ at every step. Inspired by this notion, we define the concept of a chain of tuples (see Definition 1.2.1).

Definition 1.2.1.
A chain of tuples is an ordered sequence $C : t_0 \to t_1 \to \cdots \to t_n$, where each $t_i$ is a tuple of atoms from a classical planning problem $P$. The size of $C$ is the size of the largest tuple $t$ in the chain. A chain $C$ is valid if $t_0$ is true in $I$ and every optimal plan for $t_i$ can be extended into an optimal plan for $t_{i+1}$ by adding a single action. A valid chain $C$ implies $G$ if all optimal plans for $t_n$ are also optimal plans for $G$.

Definition 1.2.2. We denote by $T^i$ the set of tuples $t$ from $P$ with size no greater than the integer $i$.

Definition 1.2.3. Let $P = \langle F, I, O, G \rangle$ be a classical planning problem. We define the tuple graph $\mathcal{G}^i$ as the graph with vertices from $T^i$ defined inductively as follows:
1. $t$ is a root vertex in $\mathcal{G}^i$ iff $t$ is true in $I$.
2. $t \to t'$ is a directed edge in $\mathcal{G}^i$ iff $t$ is in $\mathcal{G}^i$ and for every optimal plan $\pi$ for $P(t)$ there is an action $a \in O$ such that $\pi$ followed by $a$ is an optimal plan for $P(t')$.

The construction of the tuple graph $\mathcal{G}^i$, formalized in Definition 1.2.3, connects tuples $t \in T^i$ when they form a valid chain. The paths in $\mathcal{G}^i$ are valid chains $t_0 \to \cdots \to t_m$ that imply $t_m$ and, thus, encode plans for $t_m$. Note that these valid chains are optimal in the state space of $\mathcal{G}^i$, but they may encode plans that are not optimal in the state space of $P$. The key part now is to identify which of the valid chains in $\mathcal{G}^i$ imply the goal $G$. When $\mathcal{G}^i$ contains a valid chain $C$ that implies $G$, a plan for $G$ exists and can be inferred from $C$, so a solution for $P$ can be found.

Definition 1.2.4. The width of a planning problem $P$, $w(P)$, is the minimum $w$ such that there exists a valid chain $C$ of size $w$ that implies the goal $G$. If $G$ is true in $I$, the width of the problem is 0.

Theorem 1.2.1. If $w(P) = w$, the tuple graph $\mathcal{G}^w$ contains a valid chain that optimally implies the goal $G$.

Proof. By definition of $w(P)$, there exists a valid chain $C$ of size $w$ that implies $G$. The tuple graph $\mathcal{G}^w$ contains all the valid chains of size no greater than $w$ and, in particular, also contains $C$.
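As an illustration of why the size of the tuple graph is exponential in $i$ but polynomial in the number of atoms for a fixed $i$, the following Python sketch enumerates the vertex candidates $T^i$ of Definition 1.2.2 and the root vertices of Definition 1.2.3. It is a sketch under stated assumptions: tuples are modeled as non-empty sets of atoms, and the function names `tuples_up_to`, `root_tuples`, and `count_tuples` are hypothetical. Computing the edges of the graph requires reasoning about optimal plans and is not attempted here.

```python
from itertools import combinations
from math import comb
from typing import FrozenSet, Iterable, List


def tuples_up_to(atoms: Iterable[str], i: int) -> List[FrozenSet[str]]:
    """All non-empty tuples (sets) of atoms with size no greater than i."""
    ordered = sorted(atoms)
    return [frozenset(c)
            for k in range(1, i + 1)
            for c in combinations(ordered, k)]


def root_tuples(atoms: Iterable[str], init: FrozenSet[str], i: int) -> List[FrozenSet[str]]:
    """Tuples of size at most i that are true in the initial state."""
    return [t for t in tuples_up_to(atoms, i) if t <= init]


def count_tuples(n_atoms: int, i: int) -> int:
    """Closed form sum_{k=1..i} C(n, k), which grows as O(n^i)."""
    return sum(comb(n_atoms, k) for k in range(1, i + 1))


# Hypothetical problem with four atoms, two of them true initially.
atoms = {"p", "q", "r", "s"}
init = frozenset({"p", "q"})
```

With four atoms and $i = 2$ there are $4 + 6 = 10$ candidate tuples, three of which (the sets $\{p\}$, $\{q\}$, and $\{p, q\}$) are roots for the initial state above.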
The Iterated Width (IW) algorithm consists of a sequence of calls IW(i), for $i = 0, 1, \ldots$, until the problem $P$ is solved or $i$ exceeds the number of variables in $P$. Each IW(i) call expands the tuple graph $\mathcal{G}^i$ and checks for a valid chain that implies $G$. The IW algorithm is complete, but it is not guaranteed to return an optimal solution for $P$, since a call IW(i) with $i < w(P)$ may already find a plan for $G$ that is not optimal. The minimum parameter $i$ for which IW(i) finds a plan for $G$ is called the effective width of the problem.

Theorem 1.2.2. A classical planning problem $P$ can be solved optimally in time that is exponential in the problem width.

Proof. The Iterated Width algorithm [Geffner and Lipovetzky, 2012] is able to solve a classical planning problem $P$ with a time complexity that is exponential in the width of the problem.

The empirical tests reported in [Geffner and Lipovetzky, 2012] show that most benchmark domains with atomic goals have a small effective width that is independent of the size of the problem. Unfortunately, when the goals are not atomic the width of the problem increases considerably, so it seems that the complexity of a problem is somehow related to the number of atoms in the goal. In order to take advantage of the low width of the problems when the goals are atomic, a serialization of the problem is proposed to tackle goals that are conjunctions of atoms.

Serialization

Several approaches exist to tackle problems whose goals are conjunctions of atoms [Chapman, 1987], [Richter and Westphal, 2010], [Hoffmann et al., 2004]. The approach presented in [Geffner and Lipovetzky, 2012] proposes to decompose the problem into a series of subproblems that are solved sequentially. A serialization of a problem $\Pi$ with goal $G$ is a sequence of formulas $G_1, \ldots, G_m$ such that $G_m = G$ (with $G_0 = \emptyset$) and, in simple serializations, $G_{i+1}$ extends $G_i$ with an additional atom from $G$. A serialization $d$ defines a family of planning problems $P_d = P_1, \ldots, P_m$ such that $P_i$ is like $\Pi$ but with goal $G_i$ and initial state $s_i$, which is the state resulting from solving the problem $P_{i-1}$ optimally.

Definition 1.2.5. Serialized Iterated Width (SIW) over a classical planning problem $\Pi = \langle F, I, O, G \rangle$ consists of a sequence of calls to IW over the subproblems $\Pi_k = \langle F, I_k, O, G_k \rangle$, $k = 1, \ldots, |G|$, where:
1. $I_1 = I$.
2. $G_k$ is the first consistent set of atoms achieved from $I_k$ such that $G_{k-1} \subset G_k \subseteq G$ and $|G_k| = k$; $G_0 = \emptyset$.
3. $I_{k+1}$ represents the state where $G_k$ is achieved, $1 < k < |G|$.

The Serialized Iterated Width (SIW) algorithm (formalized in Definition 1.2.5) achieves the atomic goals of $P$ sequentially. In other words, the $k$-th subcall of SIW tries to achieve $k$ goals from $G$: the $k - 1$ goals already achieved by the previous subcall, and one extra goal. Notice that SIW does not use heuristic estimators to reach the goal, and is blind with respect to which goal $G_k$ is achieved in each subcall. The SIW algorithm is sound, and the solution to $\Pi$ can be obtained by concatenating the solutions of the subproblems $\Pi_1, \ldots, \Pi_m$. As in IW, the solution of a subproblem may be suboptimal, so the solutions of SIW may not be optimal either. On the other hand, while the IW algorithm is complete, SIW is not. The reason is that the order in which the subgoals are serialized may lead to dead-end states. In order to make this situation less likely and improve the performance of SIW, the atomic goals are required to be achieved, if possible, consistently: the state $s_k$ consistently achieves $G_k \subseteq G$ if $s_k$ achieves $G_k$, and $G_k$ does not need to be undone in order to achieve $G$.

The complexity of a serialization $d$ is bounded by the maximum order of complexity among all the subproblems in $P_d$. Similarly, the complexity of the SIW algorithm is bounded by the maximum order of complexity among all the subcalls to IW.
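The iteration over IW(i) calls and the serialization of goals can be sketched in Python as follows. This is a rough sketch under explicit assumptions, not the implementation evaluated in this thesis: IW(i) is written as the novelty-pruned breadth-first search formulation of [Geffner and Lipovetzky, 2012], in which a generated state is kept only if it makes true some tuple of at most $i$ atoms for the first time (a detail not spelled out in the text above); the consistency check on achieved goals is omitted; and each subcall simply asks for a strict superset of the goal atoms already true. All names (`Operator`, `iw`, `iterated_width`, `siw`) and the toy two-switch problem are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Callable, FrozenSet, List, NamedTuple, Optional, Tuple

State = FrozenSet[str]


class Operator(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def iw(i: int, init: State, is_goal: Callable[[State], bool],
       ops: List[Operator]) -> Optional[Tuple[List[str], State]]:
    """Breadth-first search pruning states that add no new tuple of size <= i."""
    seen = set()

    def is_novel(state: State) -> bool:
        new = {frozenset(c)
               for k in range(1, i + 1)
               for c in combinations(sorted(state), k)} - seen
        seen.update(new)
        return bool(new)

    is_novel(init)  # record the tuples of the initial state
    queue = deque([(init, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan, state
        for op in ops:
            if op.pre <= state:
                successor = (state - op.delete) | op.add
                if is_novel(successor):
                    queue.append((successor, plan + [op.name]))
    return None


def iterated_width(init, is_goal, ops, n_atoms):
    """Call IW(0), IW(1), ... and return (effective width, plan, final state)."""
    for i in range(n_atoms + 1):
        result = iw(i, init, is_goal, ops)
        if result is not None:
            return i, result[0], result[1]
    return None


def siw(init: State, goal: FrozenSet[str], ops: List[Operator], n_atoms: int):
    """Achieve one more goal atom per subcall; None signals a dead end."""
    state, plan, widths = init, [], []
    while not goal <= state:
        achieved = goal & state
        result = iterated_width(
            state, lambda s, achieved=achieved: achieved < (goal & s), ops, n_atoms)
        if result is None:
            return None
        width, subplan, state = result
        plan += subplan
        widths.append(width)
    return plan, max(widths, default=0)


# Toy problem (an assumption): switch b can only be turned on after switch a.
ops = [Operator("on_a", frozenset(), frozenset({"a"}), frozenset()),
       Operator("on_b", frozenset({"a"}), frozenset({"b"}), frozenset())]
result = siw(frozenset(), frozenset({"a", "b"}), ops, n_atoms=2)
```

On this toy problem each subcall succeeds with IW(1), so the serialization has an effective width of 1 even though the goal is a conjunction of two atoms.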
The width of a serialization and the effective width of a serialization follow directly from the previous definitions of the width and the effective width of a problem. The advantage of serializing a problem is that the width of a serialization (i.e., the maximum width over all subproblems) is significantly lower than the width of the problem without serialization, and this is directly related to the complexity of computing solutions for the problem.

Table 1.1, extracted from [Geffner and Lipovetzky, 2012], summarizes the performance of the SIW algorithm in the typical benchmark domains. Its columns show, respectively, the number of tested instances, the number of solved instances, the average time needed by SIW to find a plan, the average of the maximum width of each domain, and the average of the expected width over all domains. As shown in Table 1.1, many instances can be solved with an expected complexity that is exponential in a number between 1.6 and 2.5.

    Instances   Solved   Time    avg(max_w)   avg(avg_w)
    1150        819      55.01   2.5          1.6

Table 1.1: SIW performance in a collection of instances of the typical benchmark domains from the IPC.

1.3 Fully Observable Non-Deterministic Planning

Fully Observable Non-Deterministic planning (FOND planning) is the problem of finding an action strategy that maps a given initial state into a goal state, assuming that the effects of the actions are not deterministic and that the states are fully observable. The solutions of a FOND planning problem are policies, i.e., functions that map states into actions. Depending on the level of confidence of the solutions, we distinguish among three types of plans:
(i) weak plans are sequences of actions and action effects that possibly reach the goal.
(ii) strong plans are guaranteed to reach the goal in a bounded number of steps.
(iii) strong cyclic plans are guaranteed to reach the goal, assuming fair non-determinism but, in this case, not necessarily in a bounded number of steps.
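The three types of plans listed above can be told apart by inspecting the graph of states that a policy can reach. The following Python sketch does so for a policy given directly as a map from each state to the set of possible successor states of the action the policy selects there. It is only an illustration under stated assumptions: goal states are treated as terminal, the function names (`closure`, `has_cycle`, `classify`) and the three toy policies are hypothetical, and fairness is not modeled explicitly (a policy is taken to be strong cyclic when the goal remains reachable from every reachable state).

```python
from typing import Dict, Hashable, Set

Outcomes = Dict[Hashable, Set[Hashable]]  # state -> possible successors under the policy


def closure(start: Hashable, goals: Set[Hashable], outcomes: Outcomes) -> Set[Hashable]:
    """States reachable from start when following the policy (goals are terminal)."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        if state in goals:
            continue
        for nxt in outcomes.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def has_cycle(init: Hashable, goals: Set[Hashable], outcomes: Outcomes) -> bool:
    """Depth-first search for a cycle among the states reachable from init."""
    color: Dict[Hashable, str] = {}

    def visit(state: Hashable) -> bool:
        color[state] = "grey"
        if state not in goals:
            for nxt in outcomes.get(state, ()):
                if color.get(nxt) == "grey":
                    return True
                if nxt not in color and visit(nxt):
                    return True
        color[state] = "black"
        return False

    return visit(init)


def classify(init: Hashable, goals: Set[Hashable], outcomes: Outcomes) -> str:
    reachable = closure(init, goals, outcomes)
    if not reachable & goals:
        return "none"            # the goal is never reached
    if not all(closure(s, goals, outcomes) & goals for s in reachable):
        return "weak"            # some reachable state is a dead end
    if has_cycle(init, goals, outcomes):
        return "strong cyclic"   # goal always reachable, but loops are possible
    return "strong"              # acyclic, so the number of steps is bounded


# Hypothetical policies over abstract states s0, s1, dead, and goal g.
goals = {"g"}
strong = {"s0": {"s1", "g"}, "s1": {"g"}}
cyclic = {"s0": {"s0", "g"}}
weak = {"s0": {"g", "dead"}}
```

For these toy policies, `classify` reports "strong", "strong cyclic", and "weak" respectively.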
Loops may appear in strong cyclic solutions.

FOND Planners

Compared to classical planning, there has been much less research in FOND planning. Some of the most recent FOND planners, e.g., FIP [Fu et al., 2008], FF-H+ [Yoon et al., 2010], or PRP [Muise et al., 2012], improve on the performance of older FOND planners, but still do not seem to exploit the structure of the problems as deeply as has been done in classical planning. Table 1.2 shows the most relevant recent FOND and MDP (see footnote 2) planners. There is little variety among FOND-specific planners, and several of the planners listed are not FOND-specific, but MDP solvers. The fact that there was only one competitor in the FOND track of the IPPC 2008 suggests that, at least at that time, there had not been many advances specific to FOND planning.

Footnote 2: The main difference between a Markov Decision Process (MDP) model and a FOND model is that in an MDP the transition probabilities are defined, while in a FOND model they are not. A FOND problem is easily translated into an equivalent MDP problem, so an MDP planner can be used to solve FOND problems.

    Planner    Published    Track   Institution
    PRP        ICAPS 2012   FOND    University of Toronto
    Beaver     IPPC 2011    MDP     Oregon State University
    MIT-ACL    IPPC 2011    MDP     MIT
    SPUDD      IPPC 2011    MDP     University of Waterloo
    Glutton    IPPC 2011    MDP     University of Washington
    PROST      IPPC 2011    MDP     University of Freiburg
    FF-H+      ICAPS 2010   MDP     Embedded Reasoning Area
    Gamer      IPPC 2008    FOND    TU Dortmund
    FIP        ICTAI 2008   FOND    University of Texas

Table 1.2: Most relevant FOND and MDP planners, from 2008 to 2012.

1.4 Thesis Outline

Various approaches have been taken to explore the structure of classical planning problems. We propose to take the approach presented in [Geffner and Lipovetzky, 2012] for classical planning and adapt it to study FOND problems.
In Chapter 2 we introduce a width notion that bounds the complexity of FOND planning problems, and show theoretically that some typical benchmark domains have a low, bounded width independently of the initial state and the problem size, provided that the goals are restricted to single atoms. In Chapter 3 we propose an algorithm for finding solutions to FOND problems whose complexity is exponential in the problem width. We show empirically that many of the typical benchmark domains have a small, bounded width provided that the goals are restricted to single atoms. In Chapter 4 we define a serialization model for FOND planning to tackle FOND problems whose goals are conjunctions of atoms. Based on that, we propose two different algorithms, an online planner and an offline planner, that construct a serialization of the goals of the problem. We show empirically that some benchmark domains can be solved by a suitable serialization. Finally, in Chapter 5 we propose a method for checking the existence of solutions in a graph representation of a planning problem. This method can be applied in the algorithm introduced in Chapter 3 with negligible time complexity.

Part II
Structure

Chapter 2
A WIDTH NOTION FOR FOND PLANNING

In this chapter we define a width notion, together with a graph representation, that explains the complexity of Fully Observable Non-Deterministic (FOND) problems. This width notion, inspired by the work in classical planning of [Geffner and Lipovetzky, 2012], provides a new approach to understanding the complexity of FOND planning problems. Finally, we prove that some of the typical benchmark domains have low width provided that the goals of the problems are restricted to single atoms.

2.1 Motivation

The solutions of a classical planning problem are plans that map the initial state of the problem into the goal state through a sequence of intermediate states $I = s_0, s_1, \ldots, s_m = G$.
None of the intermediate states $s_i$ is a dead end of the problem, because every $s_i$ makes true a certain set of atoms that makes it possible to keep advancing towards the goal. In FOND planning, the effects of the actions are not deterministic and, given a policy $\pi$ that solves a FOND problem and guarantees the reachability of the goal, no state reachable by $\pi$ can be a dead end. Of course, given an action $a = \pi(s)$, some effects of $a$ may be desired because they advance towards the goal state, whereas other effects of $a$ may be undesired because they move away from the goal state. However, in all cases every reachable state makes true a set of atoms that allows progress in the problem. We focus on these sets of relevant atoms that, when made true, make it feasible to advance towards the goal no matter the value of the other atoms.

Each of these sets of atoms defines a class of states: the set of states that make the atoms in the set true. In any policy $\pi$, each reachable state belongs to a class of this form, so the transitions among states can be mapped into transitions among classes. Inspired by this notion, we will define a class of states associated with each tuple of atoms, together with a graph that encodes the transitions between these classes. The policies of the original problem become transitions in the graph, and the structure of the subgraph defined by a policy determines which sets of atoms (and their values) make it feasible to advance in the problem until the goal state is reached.

2.2 Tuple Graph

There exist different ways to study the reachability relations between states in a planning problem, e.g., making use of heuristics [Bonet and Geffner, 2001] or of a graph representation of the problem [McDermott, 1999]. Planning graphs were first introduced by the GraphPlan planner [Blum and Furst, 1997], and are a computationally cheap way to obtain information about the structure of a problem.
Every planning problem admits a trivial graph representation, in which each state is mapped into a different node of the graph, and the transitions between states are represented by directed edges between nodes. However, more abstract graph representations of a planning problem have been developed that are more convenient for studying its complexity. Based on the approach taken in [Geffner and Lipovetzky, 2012] for classical planning, in this section we define the tuple graph for FOND planning problems.

We consider FOND problems expressed in the standard form of Definition 2.2.1. We assume that all action costs are 1, so the optimal plans are the shortest ones. In order to guarantee the consistency of the definitions, as well as the completeness of the algorithm that we will present, we need the representation of a FOND problem $P$ to contain the negated fluents. That is, for every literal $p$ of $P$, there also exists a literal $q$ whose value corresponds to the negation of $p$. In terms of notation, we write $P(t)$ to refer to the planning problem that is like $P$ but with goal $t$ and, given an action $a$, we write $a_j$ to denote the $j$-th effect of $a$.

Definition 2.2.1 (FOND Problem in Standard Form). A FOND problem $P = \langle F, I, A, G \rangle$ is in standard form when all action costs are 1 and, for every propositional fluent $f \in F$, $F$ also contains a fluent $\bar{f}$ corresponding to the negation of $f$.

The nodes of the tuple graph $\mathcal{G}$ are sets of states of the form $S^\star(t)$, where $t$ is a tuple of atoms from $P$. For every tuple $t$, the states that belong to $S^\star(t)$ are the states that achieve $t$ through an optimal weak plan. The edges in $\mathcal{G}$ are of the form $S^\star(t_1) \xrightarrow{a_j} S^\star(t_2)$, and exist when for every state $s_1 \in S^\star(t_1)$ there exists $s_2 \in S^\star(t_2)$ such that $a_j(s_1)$ models $s_2$.

Finite-State Controller

Finite-state controllers are a compact action selection mechanism [Geffner and Bonet, 2013]. A finite-state controller $C_N$ with $N$ controller states $q_0, \ldots, q_{N-1}$ for a fully observable non-deterministic problem $P$ can be fully characterized by a set of tuples $(q, a, o, q')$, where $q$ and $q'$ are controller states, $a$ is an action, and $o$ is an observation. Such a tuple says to perform the action $a$ when the controller is in state $q$, and then to switch to state $q'$ if the observation is $o$.

A subgraph $\mathcal{G}'$ of the tuple graph $\mathcal{G}$ defines a finite-state controller $C_{\mathcal{G}'}$ on the problem $P$. The states of $C_{\mathcal{G}'}$ are the nodes $S^\star(t)$ of $\mathcal{G}'$, and $(S^\star(t), a, o_j, S^\star(t'))$ is a tuple in $C_{\mathcal{G}'}$ when there exists an edge $S^\star(t) \xrightarrow{a_j} S^\star(t')$ in $\mathcal{G}'$ for some effect $a_j$. Such a tuple says to perform the action $a$ when the controller is in state $S^\star(t)$, and to switch to state $S^\star(t')$ when the effect of the action $a$ is $a_j$. The resulting world state then models a state that belongs to $S^\star(t')$. Note that, for the finite-state controller to be well defined, the node $S^\star(t)$ needs to have an outgoing edge labeled with $a_i$ for every effect $a_i$ of the action $a$. Finally, let $\Pi$ be a strong cyclic policy for $P$. We say that $\Pi$ is equivalent to $\Pi_{\mathcal{G}'}$, and we write $\Pi \sim \Pi_{\mathcal{G}'}$, when for every state $s$ reachable by $\Pi$ and every realization of the problem, the action given by the finite-state controller defined by $\mathcal{G}'$ equals $\Pi(s)$.

Notion of Width

The world state $s$ that results from executing the actions given by $C_{\mathcal{G}'}$ is such that, when $C_{\mathcal{G}'}$ is in a state $S^\star(t)$, the world state $s$ models a certain $s' \in S^\star(t)$, but not necessarily every $s'' \in S^\star(t)$. However, the atoms that are common to all the states of $S^\star(t)$ are true in $s$. This set of atoms, which is a partial state, is not empty and contains at least the atoms that are present in the tuple $t$. We call this partial state the side effects of the tuple $t$.

Definition 2.2.2. The side effects of a tuple of atoms $t$ is the partial state that makes true the atoms that are necessarily true when $t$ is achieved through an optimal weak plan.

The action given by $C_{\mathcal{G}'}$ in a controller state $S^\star(t)$ is applicable in every state $s$ that models any $s' \in S^\star(t)$ and, thus, is applicable in every state that models the side effects of the tuple $t$. The transitions among controller states can then be seen as transitions among states that make true the side effects of certain tuples. That is, the side effects of such tuples make it feasible to advance towards the goal of the problem.

Since the complexity of a finite-state controller is related to its number of states, it makes sense to characterize the complexity of planning problems in terms of the minimum size of the subgraphs $\mathcal{G}'$ such that the finite-state controller $C_{\mathcal{G}'}$ solves the problem. For this purpose, we define the width of a subgraph $\mathcal{G}'$ as the maximum size of the tuples $t$ such that $S^\star(t)$ is a node of $\mathcal{G}'$. Since the nodes in $\mathcal{G}'$ are of the form $S^\star(t)$ with $|t| \le w(\mathcal{G}')$, the maximum number of controller states necessary to solve the problem is, at most, exponential in the width of the subgraph. Finally, we define the width of a planning problem as the minimum width of the subgraphs that define a strong cyclic policy that solves the problem. The width of a problem $P$ bounds the minimum number of states in a finite-state controller necessary to solve $P$.

Definition 2.2.3. Let $P$ be a FOND planning problem, and let $\mathcal{G}'$ be a subgraph of the tuple graph $\mathcal{G}$. The width of $\mathcal{G}'$ is defined as $w(\mathcal{G}') := \max\{|t| \text{ s.t. } S^\star(t) \in \mathcal{G}'\}$.

Definition 2.2.4. Let $P$ be a FOND planning problem. The width of $P$ is defined as $w(P) := \min\{w(\mathcal{G}') \text{ s.t. } C_{\mathcal{G}'} \text{ solves } P\}$.

Theorem 2.2.1. Let $P$ be a FOND planning problem. The minimum size of a finite-state controller that solves $P$ is, at most, exponential in the problem width.

Proof. If $w(P) = w$, there exists a subgraph $\mathcal{G}'$ of the tuple graph $\mathcal{G}$ that defines a finite-state controller that solves $P$, and such that the nodes in $\mathcal{G}'$ are of the form $S^\star(t)$, $|t| \le w$.
The number of tuples with cardinality not greater than $w$ is exponential in $w$, and the number of nodes in $\mathcal{G}'$ is bounded by the same number.

Theorem 2.2.2. Let $P$ be a FOND planning problem. Given a strong cyclic policy $\Pi$ that solves $P$, there exists a subgraph of the tuple graph $\mathcal{G}$ that defines a policy equivalent to $\Pi$.

Proof. It is sufficient to consider, for every state $s$ reachable by $\Pi$, the subgraph made of the reachable nodes $S^*(s) = s$ and the transitions given by the policy $\Pi$. Here, the fact that $P$ is in standard form and contains the negated fluents is crucial.

2.3 Low Width Benchmarks

Since the width of a planning problem gives a bound on its complexity, we are interested in knowing whether the width of the benchmark planning problems is small. As an illustration, in this section we sketch the proof that the width of the Blocks World, Climber, and Bus Fare domains is bounded when their goals are restricted to single atoms, no matter the initial state or the size of the problem. While the Blocks World is a well-known domain in planning competitions, the other two domains have been taken from [Little and Thiebaux, 2007] and, according to the authors, are probabilistically interesting domains with potential for dead ends that may be difficult to solve for state-of-the-art planners [Muise et al., 2012].

Blocks World

The Blocks World is a typical benchmark domain in classical planning. In this problem, the agent can pick up a block and move it either onto the table or on top of another block. The difference between the Blocks World domain for classical planning and its adaptation to FOND planning is that, in the latter, the agent can also move a tower of 2 blocks at once. In addition, the effects of the actions are not deterministic, and the blocks may fall down on the table when the agent tries to pick them up or move them. The complete description of the Blocks World domain in PDDL is provided in listing 2.1.
In this language, the non-deterministic effects of an action are represented by the oneof statement, which lists the possible outcomes of the action.

Figure 2.1: Illustration of the Blocks World domain, in which an agent has to arrange a set of blocks in a specific order.

Listing 2.1: PDDL domain description of the Blocks World FOND problem

    (define (domain blocks-domain)
      (:requirements :non-deterministic :equality :typing)
      (:types block)
      (:predicates (holding ?b - block) (emptyhand) (on-table ?b - block)
                   (on ?b1 ?b2 - block) (clear ?b - block))
      (:action pick-up
        :parameters (?b1 ?b2 - block)
        :precondition (and (not (= ?b1 ?b2)) (emptyhand) (clear ?b1) (on ?b1 ?b2))
        :effect (oneof
          (and (holding ?b1) (clear ?b2) (not (emptyhand)) (not (clear ?b1)) (not (on ?b1 ?b2)))
          (and (clear ?b2) (on-table ?b1) (not (on ?b1 ?b2)))))
      (:action pick-up-from-table
        :parameters (?b - block)
        :precondition (and (emptyhand) (clear ?b) (on-table ?b))
        :effect (oneof
          (and)
          (and (holding ?b) (not (emptyhand)) (not (on-table ?b)))))
      (:action put-on-block
        :parameters (?b1 ?b2 - block)
        :precondition (and (holding ?b1) (clear ?b2))
        :effect (oneof
          (and (on ?b1 ?b2) (emptyhand) (clear ?b1) (not (holding ?b1)) (not (clear ?b2)))
          (and (on-table ?b1) (emptyhand) (clear ?b1) (not (holding ?b1)))))
      (:action put-down
        :parameters (?b - block)
        :precondition (and (holding ?b) (clear ?b))
        :effect (and (on-table ?b) (emptyhand) (clear ?b) (not (holding ?b))))
      (:action pick-tower
        :parameters (?b1 ?b2 ?b3 - block)
        :precondition (and (emptyhand) (on ?b1 ?b2) (on ?b2 ?b3) (clear ?b1))
        :effect (oneof
          (and)
          (and (holding ?b2) (clear ?b3) (not (emptyhand)) (not (on ?b2 ?b3)))))
      (:action put-tower-on-block
        :parameters (?b1 ?b2 ?b3 - block)
        :precondition (and (holding ?b2) (on ?b1 ?b2) (clear ?b3))
        :effect (oneof
          (and (on ?b2 ?b3) (emptyhand) (not (holding ?b2)) (not (clear ?b3)))
          (and (on-table ?b2) (emptyhand) (not (holding ?b2)))))
      (:action put-tower-down
        :parameters (?b1 ?b2 - block)
        :precondition (and (holding ?b2) (on ?b1 ?b2))
        :effect (and (on-table ?b2) (emptyhand) (not (holding ?b2)))))

Listing 2.2: PDDL instance description of the Blocks World FOND problem

    (define (problem example)
      (:domain blocks-domain)
      (:objects b1 b2 b3 - block)
      (:init (emptyhand) (on b3 b1) (on-table b1) (on-table b2) (clear b3) (clear b2))
      (:goal (and (emptyhand) (on b1 b2) (on b2 b3) (on-table b3) (clear b1))))

The width of the Blocks World problem is bounded by 3, provided that the goals are single atoms. To prove it, it is sufficient to prove that the problem has bounded width when the goal is an instance of one of the predicates of the problem: (holding ?b - block), (emptyhand), (on-table ?b - block), (on ?b1 ?b2 - block), or (clear ?b - block). We therefore divide the proof into five cases, depending on the type of atomic goal, and show that in every case the width is at most 3.

Case A: holding ?b

Let $b_1, \ldots, b_m$ be the blocks that belong to the same tower $T$ as $b$ and are on top of $b$. We consider the policy $\pi$ that picks up a pair of blocks at once from the top of $T$, and puts them down on the table.
Depending on the parity of the number of blocks on top of $b$, the agent may need to pick up a single block at the end of the plan. In addition, if the block $b$ is on the table, the agent cannot hold $b$ with the action pick-tower, but with the action pick-up. Eventually, the agent will hold the block $b$.

\[
\pi(s) =
\begin{cases}
\text{pick-tower}(b_i, b_j, b_k) & \text{if (emptyhand) (on ?bi ?bj) (on ?bj ?bk) (clear ?bi)} \\
\text{put-tower-down}(b_i, b_j) & \text{if (holding ?bj) (on ?bi ?bj)} \\
\text{pick-up}(b_i, b) & \text{if (emptyhand) (on ?bi ?b) (clear ?bi)} \\
\text{pick-up}(b, b_i) & \text{if (emptyhand) (on ?b ?bi) (clear ?b)} \\
\text{put-down}(b_i) & \text{if (holding ?bi) (clear ?bi)} \\
\text{pick-up-from-table}(b) & \text{if (emptyhand) (clear ?b) (on-table ?b)}
\end{cases}
\]

When the number of blocks on top of $b$ is odd, and there exist blocks below $b$, the policy $\pi$ is equivalent to the policy $\pi_{\mathcal{G}'}$, where $\mathcal{G}'$ is the subgraph made of nodes of the form $S^*(t)$ for the tuples $t_{c,i}$ = (clear ?bi) (emptyhand), $t_{h,i}$ = (holding ?bi) (not-emptyhand), and $t_{h,0}$ = (holding ?b) (not-emptyhand). It is not difficult to see that the sets $S^*(t)$ are singletons, and that the edges of $\mathcal{G}'$ are of the form

\[
S^*(t_{c,i}) \xrightarrow{\text{pick-tower}(b_i, b_j, b_k)^1} S^*(t_{h,j}), \qquad
S^*(t_{c,i}) \xrightarrow{\text{pick-tower}(b_i, b_j, b_k)^2} S^*(t_{c,i}), \qquad
S^*(t_{h,j}) \xrightarrow{\text{put-tower-down}(b_i, b_j)} S^*(t_{c,k}).
\]

Therefore, the width of the problem in this case is bounded by 2.

When the number of blocks on top of $b$ is even, and/or when $b$ is on the table, the finite-state controller is slightly more complex, and we need to consider additional controller states $S^*(t)$ with tuples $t_{o,1}$ = (clear ?b) (on-table ?b), $t_{o,2}$ = (clear ?b) (not-on-table ?b), $t_{m,1}$ = (emptyhand) (clear ?bm), and $t_{m,2}$ = (not-emptyhand) (clear ?bm). The edges of the subgraph are those given by the policy $\pi$ above, and the width of the problem is still bounded by 2.

Case B: emptyhand

In this case, the trivial policy $\pi(s)$ = put-down(b) if (holding ?b) is guaranteed to solve the problem for any initial state. The subgraph $\mathcal{G}'$ with the edge $S^*(t_1) \xrightarrow{\text{put-down}(b)} S^*(t_2)$, $t_1$ = (not-emptyhand), $t_2$ = (emptyhand), is such that $\pi \sim \pi_{\mathcal{G}'}$, so the width of this problem is 1.

Case C: on-table ?b

The policy shown in Case A is also a policy for this case.

Case D: on ?b1 ?b2

The trick here is to take advantage of the existence of a finite-state controller that places the blocks $b_1$ and $b_2$ on the table. It is not difficult to see, from the previous cases, that this controller can be defined with a subgraph of width 2. We then add two extra controller states $S^*(t)$ for the tuples $t_1$ = (clear ?b1) (clear ?b2) (emptyhand) and $t_2$ = (holding ?b1) (clear ?b2). The actions performed in these states are, respectively, pick-up(b1) and put-on-block(b1, b2). In this case, then, the width of the problem is bounded by 3.

Case E: clear ?b

The policy shown in Case A is also a policy for this case.

Climber

The Climber scenario consists of an agent that is initially placed on the roof, and whose goal is to descend to the ground without dying in the attempt. The agent can climb down without the ladder, and descend to the ground with a chance of dying. If the ladder is raised, the agent can descend safely to the ground. If the ladder is on the ground, the agent can call for help, and the ladder is raised. The detailed description in PDDL of the Climber problem is given in listings 2.3 and 2.4.
Listing 2.3: PDDL domain description of the Climber FOND problem

    (define (domain climber)
      (:requirements :typing :strips :non-deterministic)
      (:predicates (on-roof) (on-ground) (ladder-raised) (ladder-on-ground) (alive))
      (:action climb-without-ladder
        :parameters ()
        :precondition (and (on-roof) (alive))
        :effect (and (not (on-roof)) (on-ground) (oneof (and) (not (alive)))))
      (:action climb-with-ladder
        :parameters ()
        :precondition (and (on-roof) (alive) (ladder-raised))
        :effect (and (not (on-roof)) (on-ground)))
      (:action call-for-help
        :parameters ()
        :precondition (and (on-roof) (alive) (ladder-on-ground))
        :effect (and (not (ladder-on-ground)) (ladder-raised))))

Listing 2.4: PDDL instance description of the Climber FOND problem

    (define (problem climber-problem)
      (:domain climber)
      (:init (on-roof) (alive) (ladder-on-ground))
      (:goal (and (on-ground) (alive))))

The width of the Climber problem is 1 for any possible initial state and size of the problem, provided that the goals are single atoms. To prove it, it is sufficient to prove that the width of the problem is bounded by 1 when the goal is one of the predicates of the problem: (on-roof), (on-ground), (ladder-raised), (ladder-on-ground), or (alive). We therefore divide the proof into five cases, depending on the type of atomic goal, and show that in every case the width is at most 1. As a consequence, the Climber domain can be solved in linear complexity. We assume that, in the initial state, the agent is alive and placed on the roof, and the ladder is placed on the ground.
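The Climber domain is small enough to be simulated directly. The following Python sketch is only an illustration, not the thesis implementation: it encodes the three actions of listing 2.3 as precondition/outcome tables over sets of atoms (the names ACTIONS and outcomes, and the state encoding, are assumptions made for this example) and enumerates the successor states of an action.

```python
# Illustrative encoding of the Climber domain (listing 2.3); all names are hypothetical.
# Each action maps to (precondition, outcomes); an outcome is (added atoms, deleted atoms).
ACTIONS = {
    "climb-without-ladder": (
        {"on-roof", "alive"},
        [({"on-ground"}, {"on-roof"}),            # lands safely
         ({"on-ground"}, {"on-roof", "alive"})],  # dies in the attempt
    ),
    "climb-with-ladder": (
        {"on-roof", "alive", "ladder-raised"},
        [({"on-ground"}, {"on-roof"})],
    ),
    "call-for-help": (
        {"on-roof", "alive", "ladder-on-ground"},
        [({"ladder-raised"}, {"ladder-on-ground"})],
    ),
}


def outcomes(state, action):
    """Return every successor state of `action` in `state` (empty list if not applicable)."""
    pre, effects = ACTIONS[action]
    if not pre <= state:
        return []
    return [frozenset((state - dele) | add) for add, dele in effects]


init = frozenset({"on-roof", "alive", "ladder-on-ground"})

# Jumping reaches the ground in every outcome, but the agent may die.
risky = outcomes(init, "climb-without-ladder")

# Calling for help and then using the ladder is safe in every outcome.
safe = [s2 for s1 in outcomes(init, "call-for-help")
        for s2 in outcomes(s1, "climb-with-ladder")]

print(all("on-ground" in s for s in risky))            # True
print(all("alive" in s for s in risky))                # False
print(all({"on-ground", "alive"} <= s for s in safe))  # True
```

The sketch makes the difference between atomic and conjunctive goals concrete: climb-without-ladder achieves the single atom (on-ground) in every outcome, whereas the conjunctive goal of listing 2.4 is only guaranteed by the two-step sequence.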
Case A: on-roof

In the Climber domain, the agent is assumed to be on the roof in the initial state. No action is needed to achieve the goal, so the width in this case is 0.

Case B: on-ground

In this case, the trivial policy $\pi(s)$ = climb-without-ladder is guaranteed to solve the problem for any initial state. We can consider the subgraph $\mathcal{G}'$ whose nodes are of the form $S^*(t)$ for the tuples $t_0$ = (on-roof), $t_1$ = (on-ground), with the edge $S^*(t_0) \xrightarrow{\text{climb-without-ladder}} S^*(t_1)$. Therefore, the width of the problem in this case is 1.

Case C: ladder-raised

In this case, the trivial policy $\pi(s)$ = call-for-help is guaranteed to solve the problem for any initial state. We can consider the subgraph $\mathcal{G}'$ whose nodes are of the form $S^*(t)$ for the tuples $t_0$ = (not-ladder-raised), $t_1$ = (ladder-raised), with the edge $S^*(t_0) \xrightarrow{\text{call-for-help}} S^*(t_1)$. Therefore, the width of the problem in this case is 1.

Case D: ladder-on-ground

There is no action that places the ladder on the ground, so we need to assume that it is already on the ground in the initial state. No action is needed to achieve the goal, so the width in this case is 0.

Case E: alive

There is no action that brings the agent back to life, so we need to assume that the agent is alive in the initial state. No action is needed to achieve the goal, so the width in this case is 0.

Bus Fare

The Bus Fare scenario consists of an agent that wants to buy a bus fare. There exist three types of coins. The bus fare can be bought with a type-3 coin. Additionally, there exist different tasks that allow the agent to change coins. The agent can bet a type-1 coin, either getting a type-3 coin as a reward or losing the coin. The agent can bet a type-2 coin, getting either a type-1 or a type-3 coin as a reward. The agent can invest a type-1 coin in washing a car, either getting a type-2 coin as a reward or keeping the type-1 coin. The agent can invest a type-2 coin in washing a car, either getting a type-1 coin as a reward or keeping the type-2 coin.
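To make the role of fair non-determinism in this domain concrete, the sketch below models the five Bus Fare actions described above and checks whether a given policy is strong cyclic, that is, whether the goal remains reachable from every state the policy can reach. It is a minimal illustration under stated assumptions, not the algorithm of the thesis: the names ACTIONS, successors, is_strong_cyclic, cautious and gambler are hypothetical, states are plain sets of atoms, and the two policies are examples chosen for this sketch.

```python
from collections import deque

# Illustrative model of the Bus Fare actions; all names are hypothetical.
# action -> (precondition atom, outcomes), each outcome = (added atoms, deleted atoms)
ACTIONS = {
    "bet-coin-1": ("have-1-coin", [(set(), {"have-1-coin"}),
                                   ({"have-3-coin"}, {"have-1-coin"})]),
    "bet-coin-2": ("have-2-coin", [({"have-3-coin"}, {"have-2-coin"}),
                                   ({"have-1-coin"}, {"have-2-coin"})]),
    "wash-car-1": ("have-1-coin", [(set(), set()),
                                   ({"have-2-coin"}, {"have-1-coin"})]),
    "wash-car-2": ("have-2-coin", [(set(), set()),
                                   ({"have-1-coin"}, {"have-2-coin"})]),
    "buy-fare":   ("have-3-coin", [({"have-fare"}, {"have-3-coin"})]),
}


def successors(state, action):
    """All states that `action` may produce when applied in `state`."""
    pre, effects = ACTIONS[action]
    assert pre in state, f"{action} is not applicable"
    return {frozenset((state - dele) | add) for add, dele in effects}


def is_strong_cyclic(policy, init, goal):
    """`policy` maps a state to an action name, or to None where it is undefined."""
    # 1. Forward pass: collect the states reachable under the policy.
    succ, queue = {}, deque([init])
    while queue:
        s = queue.popleft()
        if s in succ:
            continue
        a = None if goal in s else policy(s)
        succ[s] = successors(s, a) if a else set()
        queue.extend(succ[s])
    # 2. Backward fixpoint: states from which some execution reaches the goal.
    good = {s for s in succ if goal in s}
    changed = True
    while changed:
        changed = False
        for s, nxt in succ.items():
            if s not in good and nxt & good:
                good.add(s)
                changed = True
    return good == set(succ)


def cautious(s):
    if "have-1-coin" in s:
        return "wash-car-1"
    if "have-2-coin" in s:
        return "bet-coin-2"
    if "have-3-coin" in s:
        return "buy-fare"
    return None


def gambler(s):
    if "have-1-coin" in s:
        return "bet-coin-1"  # may lose the only coin: a dead end
    if "have-3-coin" in s:
        return "buy-fare"
    return None


init = frozenset({"have-1-coin"})
print(is_strong_cyclic(cautious, init, "have-fare"))  # True
print(is_strong_cyclic(gambler, init, "have-fare"))   # False
```

The cautious policy may loop (washing a car can leave the coin unchanged), but no reachable state is a dead end, so under fairness it eventually reaches the goal. The gambler policy can lose its only coin, which is exactly the kind of dead end the domain was designed to contain.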
The detailed description in PDDL of the Bus Fare problem is given in listings 2.5 and 2.6.

Listing 2.5: PDDL domain description of the Bus Fare FOND problem

    (define (domain bus-fare)
      (:requirements :typing :strips :equality :non-deterministic)
      (:types coin)
      (:predicates (have-1-coin) (have-2-coin) (have-3-coin) (have-fare))
      (:action bet-coin-1
        :parameters ()
        :precondition (have-1-coin)
        :effect (and (not (have-1-coin)) (oneof (and) (have-3-coin))))
      (:action bet-coin-2
        :parameters ()
        :precondition (have-2-coin)
        :effect (and (not (have-2-coin)) (oneof (have-3-coin) (have-1-coin))))
      (:action wash-car-1
        :parameters ()
        :precondition (have-1-coin)
        :effect (oneof (and) (and (not (have-1-coin)) (have-2-coin))))
      (:action wash-car-2
        :parameters ()
        :precondition (have-2-coin)
        :effect (oneof (and) (and (not (have-2-coin)) (have-1-coin))))
      (:action buy-fare
        :parameters ()
        :precondition (have-3-coin)
        :effect (and (not (have-3-coin)) (have-fare))))

Listing 2.6: PDDL instance description of the Bus Fare FOND problem

    (define (problem bus-fare-problem)
      (:domain bus-fare)
      (:init (have-1-coin))
      (:goal (have-fare)))

The width of the Bus Fare problem is 1 for any possible initial state and size of the problem, provided that the goals are single atoms. To prove it, it is sufficient to prove that the width of the problem is bounded by 1 when the goal is one of the predicates of the problem: (have-1-coin), (have-2-coin), (have-3-coin), or (have-fare).
We therefore divide the proof into four cases, depending on the type of atomic goal, and show that in every case the width is at most 1. As a consequence, the Bus Fare domain can be solved in linear complexity. We assume that, in the initial state, the agent has at least one coin.

Case A: have-1-coin

In this problem there is no way to get a type-1 coin from a type-3 coin, so we assume that, in this case, not all the coins of the agent are of type 3. We consider the trivial policy $\pi(s)$ = wash-car-2. Assuming fair non-determinism of the action effects, this policy is guaranteed to solve the problem for any initial state. We can consider the subgraph $\mathcal{G}'$ whose nodes are of the form $S^*(t)$ for the tuples $t_0$ = (not-have-1-coin), $t_1$ = (have-1-coin), with the edges $S^*(t_0) \xrightarrow{\text{wash-car-2}^1} S^*(t_0)$ and $S^*(t_0) \xrightarrow{\text{wash-car-2}^2} S^*(t_1)$. Therefore, the width of the problem in this case is 1.

Case B: have-2-coin

In this problem there is no way to get a type-2 coin from a type-3 coin, so we assume that, in this case, not all the coins of the agent are of type 3. We consider the trivial policy $\pi(s)$ = wash-car-1. Assuming fair non-determinism of the action effects, this policy is guaranteed to solve the problem for any initial state. We can consider the subgraph $\mathcal{G}'$ whose nodes are of the form $S^*(t)$ for the tuples $t_0$ = (not-have-2-coin), $t_1$ = (have-2-coin), with the edges $S^*(t_0) \xrightarrow{\text{wash-car-1}^1} S^*(t_0)$ and $S^*(t_0) \xrightarrow{\text{wash-car-1}^2} S^*(t_1)$. Therefore, the width of the problem in this case is 1.

Case C: have-3-coin

We consider the policy in which the agent, when it has a type-1 coin, invests it in washing a car, so that it will eventually get a type-2 coin. When the agent has a type-2 coin, it bets it, so that it will eventually get a type-3 coin. Assuming fair non-determinism of the action effects, $\pi$ defines a valid strong cyclic policy for the problem.

\[
\pi(s) =
\begin{cases}
\text{wash-car-1} & \text{if (have-1-coin)} \\
\text{bet-coin-2} & \text{if (have-2-coin)}
\end{cases}
\]

We can consider the subgraph $\mathcal{G}'$ whose nodes are of the form $S^*(t)$ for the tuples $t_1$ = (have-1-coin), $t_2$ = (have-2-coin), $t_3$ = (have-3-coin), with the edges

\[
S^*(t_1) \xrightarrow{\text{wash-car-1}^1} S^*(t_1), \qquad
S^*(t_1) \xrightarrow{\text{wash-car-1}^2} S^*(t_2), \qquad
S^*(t_2) \xrightarrow{\text{bet-coin-2}^1} S^*(t_3), \qquad
S^*(t_2) \xrightarrow{\text{bet-coin-2}^2} S^*(t_1).
\]

Therefore, the width of the problem in this case is 1.

Case D: have-fare

We consider a policy similar to the one presented for the goal (have-3-coin), with the difference that an additional action is considered when the agent has a type-3 coin.

\[
\pi(s) =
\begin{cases}
\text{wash-car-1} & \text{if (have-1-coin)} \\
\text{bet-coin-2} & \text{if (have-2-coin)} \\
\text{buy-fare} & \text{if (have-3-coin)}
\end{cases}
\]

We can consider the subgraph $\mathcal{G}'$ of the previous case, with the extra node $S^*(t_g)$, $t_g$ = (have-fare), and the extra edge $S^*(t_3) \xrightarrow{\text{buy-fare}} S^*(t_g)$. Therefore, the width of the problem in this case is 1.

2.4 Conclusion

In FOND problems, the non-determinism of the actions may result in undesired action effects that move the agent away from the optimal path towards the goal state. Computing strong cyclic policies that solve this type of problem is NP-hard. We exploit the idea that no state reachable by a strong cyclic policy $\pi$ is a dead end of the problem, because each such state makes true a certain tuple of atoms that allows the agent to advance towards the goal. We have introduced a notion of width that measures the complexity of FOND planning problems based on the reachability over these tuples of atoms. Finally, we have proven theoretically that some benchmark domains have small bounded width no matter the initial state or the size of the problem, provided that the goals are restricted to single atoms. Most notably, some of these domains are tricky and have a considerable number of dead ends, while their width is considerably low.

Chapter 3

NON-DETERMINISTIC ITERATED WIDTH

In this chapter we introduce the Non-Deterministic Iterated Width (NDIW) algorithm.
The NDIW algorithm is able to find strong cyclic policies for FOND problems, and runs in time and space that are exponential in the problem width. We then show experimentally that many of the typical benchmark domains have low width, provided that goals are restricted to single atoms.

3.1 Tuple Graph $\mathcal{G}^i$

The tuple graph $\mathcal{G}$ introduced in chapter 2 is made of nodes $S^*(t)$, which are the sets of states that achieve a tuple of atoms $t$ following an optimal weak plan, and its computation is intractable (PSPACE-complete). Therefore, instead of computing the sets $S^*(t)$, we are interested in computing a tuple graph $\mathcal{G}^i$ whose nodes are an estimation of the side effects of the tuples $t$ of size no greater than the parameter $i$, and whose complexity is polynomial.

Definition 3.1.1. $T^i$ is the set of tuples $t$ from the problem $P$ that have size no greater than the parameter $i$.

Definition 3.1.2. Let $P$ be a FOND problem. The tuple graph $\mathcal{G}^i$ is the graph with nodes of the type $r_t$, $t \in T^i$, defined inductively as follows:

1. $r_t$ is a root in $\mathcal{G}^i$ if $t$ is true in $I$.
2. $R^*(t')$ is the set of states $s$ such that there exist $r_t \in \mathcal{G}^i$, an $i$-compact action $a$, and one effect $a^j$ such that $s = a^j(r_t)$, $t'$ is true in $s$, and $s$ achieves $t'$ through a minimal length path in $\mathcal{G}^i$.
3. $r_t$ is the partial state that contains the literals that are true in all the states in $R^*(t)$.
4. The edge $r_t \xrightarrow{a^j} r_{t'}$ is in $\mathcal{G}^i$ if $a$ is $i$-compact in $r_t$, and $r_{t'}$ is true in $a^j(r_t)$.

Definition 3.1.3. An action $a$ is $i$-compact in a partial state $r_t$ if, for every effect $a^j$, the state $s = a^j(r_t)$ either makes true a tuple $t \in T^i$ through a minimal length path in $\mathcal{G}^i$, or $s$ models a partial state $r_{t'}$ in $\mathcal{G}^i$.

An edge $r_t \xrightarrow{a^j} r_{t'}$ in $\mathcal{G}^i$ indicates that, for every state $s$ that models $r_t$, there exist an action $a$ and an effect $a^j$ that map $s$ into a state that models $r_{t'}$. In addition, the $i$-compactness criterion guarantees that any other effect of $a$ maps $s$ into a state that models some partial state $r_{t''}$.
In other words, the tuple graph $\mathcal{G}^i$ defines a finite-state controller whose states $r_t$ are partial states, and whose transitions correspond to actions. We say that $\mathcal{G}^i$ solves $P$ if there exists a subgraph of $\mathcal{G}^i$ that defines a strong cyclic policy in the form of a finite-state controller.

Theorem 3.1.1. Let $P$ be a FOND planning problem. If $w = w(P)$, the tuple graph $\mathcal{G}^w$ solves $P$.

Proof sketch. Consider a subgraph $\mathcal{G}'$ of $\mathcal{G}$ such that $w(P) = w(\mathcal{G}') = w$. We want to see that there exists a mapping from $\mathcal{G}'$ into a subgraph of $\mathcal{G}^w$ that defines a strong cyclic policy in the form of a finite-state controller. The mapping between nodes is injective, and is defined inductively as follows. Let $r_0 = I$ be an initial state of $\mathcal{G}^w$, and let $S^0 = \{I\}$ be an initial state of $\mathcal{G}'$. Let $a_k$ be the action given by the policy $\Pi_{\mathcal{G}'}$ in the state $S^k$, $0 \le k < |\mathcal{G}'| - 1$. Then, the action $a_k$ is $w$-compact in $r_k$. In addition, for every effect $a_k^j$, there exist a tuple $t$, an edge $S^k \xrightarrow{a_k^j} S^*(t)$ in $\mathcal{G}'$, and a state $r_t$ in $\mathcal{G}^w$ such that $r_k$ models the side effects of $t$. This mapping defines a subgraph of $\mathcal{G}^w$, such that the policies defined by both subgraphs are equivalent. As $\Pi_{\mathcal{G}'}$ is a policy for $P$, the sets $S^*(t)$ that are goal states are such that the side effects of $t$ model the goal $G$. In consequence, the partial state $r_t$ corresponding to the mapping of $S^*(t)$ in $\mathcal{G}^w$ also models $G$, so it is a goal state of the problem.

3.2 Algorithm

In this section we introduce the Non-Deterministic Iterated Width (NDIW) algorithm. Inspired by the Iterated Width (IW) algorithm used in classical planning [Geffner and Lipovetzky, 2012], the NDIW algorithm consists of a sequence of calls NDIW($i$) for $i = 0, 1, 2, \ldots$ over a problem $P$ until the problem is solved or $i$ exceeds the number of problem variables. Each iteration NDIW($i$) is an $i$-width search that expands the graph $\mathcal{G}^i$ and checks for the existence of a subgraph that defines a strong cyclic policy for $P$ in the form of a finite-state controller.
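The iteration scheme just described can be sketched in a few lines of Python. This is a hedged outline rather than the planner itself: tuples_up_to enumerates the set $T^i$ of Definition 3.1.1 (assuming, for this example, that only non-empty tuples are listed), and search_i is a hypothetical placeholder for the $i$-width search NDIW($i$), assumed to return a policy or None.

```python
from itertools import combinations


def tuples_up_to(atoms, i):
    """Enumerate T^i: every non-empty tuple of at most i atoms (order is irrelevant)."""
    atoms = sorted(atoms)
    for k in range(1, i + 1):
        yield from combinations(atoms, k)


def ndiw(atoms, search_i):
    """Call search_i(i) for i = 0, 1, 2, ... until it returns a policy.

    `search_i` stands in for NDIW(i); it is assumed to return a policy
    (any non-None object) or None when the graph for width i has no solution.
    The loop stops once i exceeds the number of problem variables.
    """
    for i in range(len(atoms) + 1):
        policy = search_i(i)
        if policy is not None:
            return i, policy
    return None


# Atoms of the Climber domain, used only to illustrate the size of T^i.
CLIMBER_ATOMS = ["on-roof", "on-ground", "ladder-raised", "ladder-on-ground", "alive"]

print(len(list(tuples_up_to(CLIMBER_ATOMS, 1))))  # 5 single atoms
print(len(list(tuples_up_to(CLIMBER_ATOMS, 2))))  # 5 + 10 = 15 tuples

# A stub search that "succeeds" from width 1 onwards, to show the control flow.
print(ndiw(CLIMBER_ATOMS, lambda i: "policy" if i >= 1 else None))  # (1, 'policy')
```

The number of tuples grows as the sum of the binomial coefficients up to $i$, which is the source of the exponential dependence on the width discussed in this chapter.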
NDIW($i$) is a forward-state breadth-first search over partial states $r_t$ with a couple of variations: the partial states $r_t$ are only expanded by $i$-compact actions into states $s$, and those states $s$ are used to compute new partial states $r_t$ in the graph. As shown in Lemma 3.2.1, the partial states $r_t$ are computed sequentially along with the expansion of $\mathcal{G}^i$, without the need for the set $R^*(t)$. Each effect of an $i$-compact action $a$ in $r_t$ either leads to a state $s'$ through an optimal weak plan for a tuple $t' \in T^i$, or leads to a state $s'$ that makes true a state of the form $r_{t'}$, $t' \in T^i$. In the first case, $r_{t'}$ is updated with its intersection with $s'$, as suggested by Lemma 3.2.1. In the second case, it is sufficient to test whether $s'$ makes true all the atoms of some state $r_{t'}$.

Lemma 3.2.1. Let $P$ be a FOND planning problem, and let $\pi^1(t), \ldots, \pi^m(t)$ be the collection of optimal weak plans for a tuple $t$, in an arbitrary order. The partial state $r_t$ is the limit of the sequence $r_t^1, \ldots, r_t^m$, where:

(i) $r_t^1 = \pi^1(t)$
(ii) $r_t^i = r_t^{i-1} \cap \pi^i(t)$, $i = 2, \ldots, m$

Proof. The atoms that are true in $r_t$ are those that are true in all states in $R^*(t)$ so, as a set of atoms, $r_t$ is the intersection of all states $s \in R^*(t)$, i.e., the intersection of $\pi^k(t)$, $1 \le k \le m$.

Definition 3.2.1. NDIW($i$) is a breadth-first search that keeps partial states $r_t$, $t \in T^i$, and expands states $r_t$ by $i$-compact actions.

Listing 3.1: NDIW(i) pseudoalgorithm.

    NDIW_i(Init):
        for tuple t in Init:
            r_t <- I
            Queue <- r_t
        while not Queue.empty():
            r <- Queue.pop()
            for i-compact action a in r:
                for s, a^j in F(a, r):
                    for new tuple t in s:
                        r_t <- s
                        Queue <- r_t            # assumed: new nodes are queued for expansion
                        Add_Edge(r, r_t, a^j)
                    for tuple t in s achieved optimally:
                        r_t <- r_t \cap s       # intersection, as in Lemma 3.2.1
                        Add_Edge(r, r_t, a^j)

According to Theorem 3.1.1, the NDIW($w$) algorithm is guaranteed to find a strong cyclic policy if it exists.
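The incremental update of Lemma 3.2.1 amounts to folding a set intersection over the states that achieve a tuple optimally. A minimal Python sketch, assuming states are sets of atom names and using a hypothetical helper update_partial_state with made-up Blocks World atoms:

```python
def update_partial_state(partial, t, state):
    """Refine r_t with a state that achieves the tuple t optimally.

    `partial` maps each tuple t to its current partial state r_t (a frozenset
    of atoms). The first state seen initialises r_t; later ones are intersected.
    """
    state = frozenset(state)
    partial[t] = state if t not in partial else partial[t] & state
    return partial[t]


# Two hypothetical states that both achieve the tuple (holding-b) optimally.
s1 = {"holding-b", "clear-c", "on-table-c"}
s2 = {"holding-b", "clear-c", "on-c-d"}

partial = {}
t = ("holding-b",)
update_partial_state(partial, t, s1)
update_partial_state(partial, t, s2)
print(sorted(partial[t]))  # ['clear-c', 'holding-b']

# The order in which the states are processed does not matter.
other = {}
update_partial_state(other, t, s2)
update_partial_state(other, t, s1)
print(other[t] == partial[t])  # True
```

Because intersection is commutative and associative, the resulting partial state does not depend on the order in which the search generates the states, which is why the set $R^*(t)$ never needs to be stored explicitly.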
The width of a problem is not known in general, however, so the NDIW algorithm needs to iterate through the searches NDIW($i$) for $i = 0, 1, \ldots$ until a solution is found (possibly with $i < w(P)$) or $i$ exceeds the number of variables in $P$. Note that the NDIW($i$) algorithm expands the search graph in a blind manner, i.e., the goal $G$ of the problem $P$ plays no role in this process. Upon expansion of the search graph $\mathcal{G}^i$, the NDIW algorithm checks for the existence of a subgraph that defines a policy that solves $P$. This step is independent of the construction of the graph, and can be performed using different techniques (see chapter 5). The complexity of the complete process is still exponential in the problem width.

Definition 3.2.2. The NDIW($i$) algorithm solves $P$ if the tuple graph $\mathcal{G}^i$ contains a subgraph $\mathcal{G}'$ that defines a policy for $P$ in the form of a finite-state controller.

Definition 3.2.3. NDIW calls NDIW($i$) sequentially for $i = 0, 1, \ldots$ until the problem is solved or $i$ exceeds the number of problem variables.

Theorem 3.2.2. For solvable problems $P$, the time and space complexity of NDIW are exponential in $2w(P)$.

Proof. The NDIW algorithm performs the calls NDIW($i$), $i = 0, \ldots, w(P)$. In each NDIW($i$) call, the tuple graph $\mathcal{G}^i$ is expanded. The number of nodes in $\mathcal{G}^i$ is exponential in the parameter $i$, so the number of nodes stored by NDIW is exponential in $w(P)$. Although the tuple graph is usually sparse, the worst-case number of edges in $\mathcal{G}^i$ is exponential in $2w(P)$. The partial states $r_t$ are computed dynamically along with the expansion of the tuple graph $\mathcal{G}^i$. In order to check whether an action $a$ is $i$-compact in a partial state $r_t$, NDIW($i$) checks whether, for each effect $a^j$, there exists a tuple $t' \in T^i$, $t' \in s = F(r_t, a^j)$, such that either $t'$ is generated optimally, or $s \models r_{t'}$ and the partial state $r_{t'}$ is already generated. Checking whether an action is $i$-compact in a partial state has a complexity that is exponential in $i$.
Performing these tests during the expansion of $\mathcal{G}^i$ has a complexity that is exponential in $2i$ so, again, the worst-case complexity is exponential in $2w(P)$. When expanding a partial state $r_t$ by an $i$-compact action $a$, NDIW($i$) adds the outgoing edges that connect $r_t$ with the corresponding nodes in the graph. An edge $r_t \xrightarrow{a^j} r_{t'}$ exists when either the tuple $t'$ has been generated optimally, or the state $s = F(r_t, a^j)$ models $r_{t'}$. As with the check of $i$-compact actions, this process is exponential in the parameter $i$, so adding the edges of all the nodes in $\mathcal{G}^i$ has a complexity that is exponential in $2i$. Finally, the time needed to check whether $\mathcal{G}^i$ encodes a policy for $P$, according to the algorithm presented in chapter 5, is proportional to $N^2$.

3.3 Empirical Evaluation

In this section we analyze the performance of the NDIW algorithm through a series of tests on a selection of domains used in previous planning competitions, restricting the problems to have atomic goals. First, we compare the width reported by the NDIW algorithm against the width reported by the IW algorithm in classical planning problems. Then, we move to FOND domains, for which we compute the width, and compare the performance of the NDIW planner against the state-of-the-art FOND planners.

Width in Classical Problems

Classical planning problems can be considered a subclass of FOND planning problems (those in which every set of effects is a singleton, i.e., the actions are deterministic), so, in particular, the NDIW algorithm is also able to solve classical planning problems. Since the width notion and the NDIW algorithm introduced in this manuscript are based on the width notion and the IW algorithm introduced in [Geffner and Lipovetzky, 2012] for classical planning, it is interesting to compare the performance and complexity of the NDIW and IW algorithms in classical planning problems.
The NDIW algorithm has been tested with a selection of classical planning problems used in previous editions of the IPC. The experiments have been run on Xeon Woodcrest computers with clock speeds of 2.33 GHz, using a 2 GB memory limit. The results of the tests are presented in table 3.1, and are compared with those of IW, run on the same experimental setup and taken from [Geffner and Lipovetzky, 2012]. For every domain, a number of different instances with atomic goals have been tested, and classified according to the (effective) width of the problem. The number of instances that can be solved by NDIW within a time limit of 30 minutes is indicated in parentheses (see also figure 3.2).

Most of the tested domains appear to have a low width, bounded by 2, no matter the initial state or the size of the problem. That is, as long as the goal of the problem is atomic, it can most probably be solved in linear or quadratic time and complexity. In addition, the width distribution among the problems is very similar for both algorithms. In other words, the complexity of the tuple graph of the problems using NDIW is very similar to that of IW.

                 # instances             w = 1          w = 2          w ≥ 3
    Domain       NDIW (solved)   IW      NDIW   IW      NDIW   IW      NDIW   IW
    8puzzle      408 (408)       400     53%    55%     47%    45%     0%     0%
    Blocksworld  266 (245)       598     22%    26%     71%    74%     7%     0%
    ferry        650 (379)       650     36%    36%     46%    64%     19%    0%
    floortile    321 (321)       538     98%    96%     2%     4%      0%     0%
    Grid         3 (3)           19      33%    5%      67%    84%     0%     11%
    gripper      379 (376)       1275    0%     0%      100%   100%    0%     0%
    logistics    214 (208)       249     37%    18%     62%    82%     1%     0%
    Mystery      14 (14)         30      14%    7%      86%    93%     0%     0%
    parking      31 (31)         540     100%   77%     0%     23%     0%     0%
    pegsol       660 (660)       964     90%    92%     10%    8%      0%     0%
    satellite    42 (26)         308     15%    11%     85%    89%     0%     0%
    scanalyzer   419 (419)       624     95%    100%    5%     0%      0%     0%
    sokoban      99 (34)         153     27%    37%     33%    26%     39%    27%
    tidybot      92 (7)          84      14%    12%     0%     39%     86%    49%
    tower        232 (232)       -       100%   -       0%     -       0%     -
    visitall     2811 (1486)     21859   100%   100%    0%     0%      0%     0%
    woodworking  1486 (1435)     1659    100%   100%    0%     0%      0%     0%
    zeno         95 (87)         219     37%    21%     63%    79%     0%     0%

Table 3.1: NDIW vs. IW width distribution in typical benchmark domains.

This is reasonable, since the width parameter defined in FOND planning is based on the width notion of classical planning and, restricted to classical planning problems, the NDIW algorithm is quite similar to IW. When restricted to deterministic problems, both algorithms are based on a breadth-first search with pruning that keeps the states that introduce new tuples of atoms optimally. The difference is that in the NDIW algorithm the nodes are the intersection of all the states that reach a tuple $t$ optimally, whereas the IW algorithm operates with the first state that optimally reaches $t$. The nodes in the tuple graph expanded by NDIW are more restrictive than in the IW algorithm and, as a consequence, if a solution exists for NDIW($i$), the same solution must exist in IW($i$), but not the opposite. This observation is apparent in the results shown in table 3.1, where the width of the problems using NDIW is always slightly higher than the width of the problems using IW.
Width in FOND problems

The NDIW algorithm has been tested on a selection of FOND problems used in previous planning competitions. The results are presented in table 3.2, and reveal that the majority of the problems have small width when the goals are restricted to single atoms. For every domain, a series of N different instances with atomic goals have been tested and classified according to the width of the problem. The number of instances that NDIW solves within a time limit of 30 minutes is indicated in parentheses. As in the case of deterministic problems, most of the tested domains appear to have a low width, bounded by 2, regardless of the initial state or the size of the problem. That is, in these domains the NDIW algorithm can solve any atomic goal in linear or quadratic time and complexity. However, a significant number of domains appear to have higher width.

problem name          N (N solved)   w=1     w=2     w ≥ 3
blocksworld           252 (199)      76%     18%     6%
blocksworld-2         126 (77)       63%     19%     18%
busfare               1 (1)          100%    0%      0%
climber               2 (2)          100%    0%      0%
elevators             90 (90)        33%     67%     0%
first-responders      399 (387)      86%     14%     0%
forest-new            180 (20)       0%      0%      100%
rectangle-tireworld   16 (9)         0%      25%     75%
river                 1 (1)          100%    0%      0%
tireworld             19 (10)        21%     0%      79%
triangle-tireworld    40 (31)        100%    0%      0%
zenotravel            13 (15)        31%     69%     0%

Table 3.2: Width reported by NDIW in a selection of FOND benchmark problems with atomic goals.

Performance Analysis

In this section we look in detail at the execution statistics of the NDIW planner over the deterministic and FOND problems listed, respectively, in tables 3.1 and 3.2. The tests were run on Xeon Woodcrest computers with a clock speed of 2.33 GHz, using 2 GB of memory and a 30-minute limit on execution time. Figure 3.1 shows the execution time of each problem according to the size of the problem. We distinguish, for every problem instance, whether NDIW ends the process with success, a memory limit exception (MLE), or a time limit exception (TLE).
The NDIW planner is able to solve problems of small size, but when the size of the problem exceeds a certain threshold, the planner will most likely run out of memory. Likewise, some problems seem to be too complex, and the execution of NDIW is likely to exceed the time limit even when they are small. The execution success rate of NDIW in FOND problems (with atomic goals) is quite high, as can be seen in table 3.2. However, the constraints on memory and time seem to be more problematic in classical planning problems. That is, the classical planning benchmark problems are more complex than those in FOND planning. This makes sense, since the state of the art in classical planning is more advanced than in FOND planning.

(a) Execution stats of NDIW over deterministic problems. (b) Execution stats of NDIW over FOND problems.
Figure 3.1: Execution stats of NDIW over deterministic and FOND problems. We distinguish among successful processes, time limit exceptions, and memory limit exceptions.

Figure 3.2 shows the distribution of the time needed by NDIW to solve the deterministic and FOND problems, where we have omitted those problems that need more than 2 GB of memory or take more than 30 minutes to execute. Roughly half of all the instances are solvable in less than 1 second, and more than 90% are solvable in less than 100 seconds. This fact can be used to establish a threshold on the complexity of a problem according to the execution time: if the execution time exceeds the order of $10^2$ seconds, the problem will likely be too complex for NDIW, and will exceed the memory and/or time limits.
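The empirical cumulative distribution behind this threshold can be sketched as follows. This is only an illustration: the solving times below are made-up values, not measurements from the experiments, and the function name is hypothetical.

```python
from bisect import bisect_right
from typing import List

def solved_fraction(times: List[float], t: float) -> float:
    """Fraction of solved instances whose solving time is at most t seconds."""
    ordered = sorted(times)
    return bisect_right(ordered, t) / len(ordered)

# Hypothetical solving times in seconds (illustrative only).
times = [0.01, 0.2, 0.5, 0.9, 3.0, 12.0, 40.0, 75.0, 95.0, 800.0]

within_one_second = solved_fraction(times, 1.0)
within_hundred_seconds = solved_fraction(times, 100.0)
```

Evaluating the function at a candidate threshold such as 100 seconds gives the share of solved instances that finish within that time.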
One of the complex FOND benchmark problems is the triangle-tireworld, used in the IPC'2006 and IPC'2008 competitions.

[Figure 3.2: Time to find a solution by NDIW for the FOND and classical planning domains restricted to atomic goals. Two panels titled "Empirical Time Cumulative Distribution"; vertical axis: Problems Solved Frequency; horizontal axes: time/s and log(time/s).]

There is an inherent complexity in this domain that makes it difficult to solve, not only because of the huge state space, but also because of the likelihood of entering a deadend state. At the time they were published, the FF-Hindsight+ planner [Yoon et al., 2010] was recognized for solving the triangle-tireworld-17 instance of the problem and, later on, the PRP planner [Muise et al., 2012] was reported to solve up to the triangle-tireworld-35 instance, both with a similar experimental setup and the same restrictions on time and memory. The execution statistics for this problem are shown in figure 3.3. The NDIW planner is able to solve up to the triangle-tireworld-31 instance, thanks to the apparently low width of the problem (see table 3.2). In addition, most of the execution time is due to preprocessing, while the execution time of the NDIW algorithm proper is very short. The failure of the algorithm is due to the memory constraints, and the biggest instance that can be solved (triangle-tireworld-31) is still far from the time limit.

In order to evaluate the performance of the NDIW algorithm in FOND problems, we have compared it against two state-of-the-art FOND planners: FF-H+ [Yoon et al., 2010] and PRP [Muise et al., 2012]. We have chosen three different domains: the triangle-tireworld, the bus-fare, and the river. The triangle-tireworld is a FOND benchmark domain used in the IPC competition.
Likewise, the bus-fare and river domains are two “probabilistically interesting” domains that have potential deadends [Little and Thiebaux, 2007]. All three domains have atomic goals, so NDIW can be used to solve them completely. A performance comparison between NDIW and the other two FOND planners is shown in table 3.3, where the statistics of FF-H+ and PRP have been extracted from their respective papers. Both FF-H+ and PRP act as online planners, so the success rate and execution time are reported over 30 runs of each problem. Since NDIW is an offline planner, its statistics have been scaled to 30 runs.

Figure 3.3: Time to find a strong cyclic plan in the triangle-tireworld problem, with respect to the problem size.

problem name busfare river triangle-tire-1 triangle-tire-17 triangle-tire-31 triangle-tire-35 Success Rate NDIW PRP FF-H+ 100% 100 100% 100% 100% – 100% 67% 100% 100% 100% 100% 100% 100% 100% 100% – – NDIW 0 0(0) 0 (0) 29 723 – time(s) PRP FF-H+ 0 0 0 19 – 1520 – – – – – –

Table 3.3: NDIW vs. PRP vs. FF-H+ performance. Time results for 30 runs.

The NDIW planner solves the river domain in almost negligible time, while the FF-H+ planner only solves 67% of the instances. In addition, the NDIW planner outperforms FF-H+ in the triangle-tireworld domain, solving up to the triangle-tireworld-31 instance while the FF-H+ planner only solves up to the triangle-tireworld-17 instance. The PRP planner performs slightly better than NDIW in the triangle-tireworld domain, solving up to the triangle-tireworld-35 instance, while the rest of the statistics are quite similar.

3.4 Conclusion

We have introduced NDIW, a FOND algorithm that is sound and complete, and is able to solve FOND problems in time and complexity that is exponential in the problem width.
We have shown empirically that, in deterministic problems with atomic goals, the width distribution reported by NDIW is similar to the width distribution reported by IW. However, while the complexity of IW is exponential in the problem width, the complexity of NDIW is exponential in twice the width parameter. A series of tests over FOND benchmark domains has revealed that these problems have low width when the goals are restricted to single atoms. However, some problems are more complex, and their width is higher than 2. Finally, we have looked in detail into one domain, the triangle-tireworld, and we have seen that NDIW can solve it with an efficiency that is comparable with the latest state-of-the-art planners.

Chapter 4

SERIALIZED NDIW

In this chapter we introduce the Serialized Non-Deterministic Iterated Width (S-NDIW) algorithm, an extension of NDIW that is able to solve FOND problems whose goals are conjunctions of atoms through a simple form of decomposition. We define two versions of S-NDIW: an offline planner and an online planner. Finally, we present an experimental evaluation of the performance of these algorithms.

4.1 Serialized Width

The empirical results of the NDIW algorithm in the typical benchmark domains reveal that most of the problems have small width provided that the goals are restricted to single atoms. However, the width of the same problems increases significantly when the goals are conjunctions of atoms. Several approaches exist to tackle problems whose goal is a conjunction of atoms ([Chapman, 1987], [Richter and Westphal, 2010], [Hoffmann et al., 2004]). One of the simplest methods is to use landmarks, so that the different goals of the problem are achieved sequentially. In this section we present a method that divides the problem into a series of subproblems that are solved sequentially.
The advantage of this procedure is that, by exploiting the structure of the subproblems, the order of complexity of the overall process can be comparable to that of problems with single goals. As introduced in [Geffner and Lipovetzky, 2012], a serialization for a problem $\Pi$ with goal $G$ is a sequence $G_1, \ldots, G_m$, $m = |G|$, such that $G_0 = \emptyset$, $G_m = G$, and $G_{i+1}$ extends $G_i$ by one additional atom from $G$. A serialization $d$ defines a series of planning problems $P_d = P_1, \ldots, P_m$, where each $P_i$ is like $\Pi$ but with goal $G_i$. The difference between the serialization proposed here and the one that applies to classical planning lies in the initial state $s_i$ of the subproblems $P_i$: these states depend on the landmark states in $P_{i-1}$, and are defined differently in the offline and online serialized FOND planners that we present in this chapter.

The Serialized Iterated Width (S-NDIW) is a search algorithm that uses NDIW searches both for constructing a serialization of the problem P and for solving the resulting subproblems. While NDIW is a sequence of i-width searches NDIW(i), $i = 0, 1, \ldots$ over the same problem, S-NDIW is a sequence of NDIW calls over $|G|$ subproblems $P_k$, $k = 1, \ldots, |G|$. The solution of P is then the sequential execution of the policies that solve the subproblems of the serialization. The order in which the goals are serialized affects the overall solution of the problem and, in particular, an inadequate serialization may determine the success or failure of the algorithm. Thus, the S-NDIW algorithm is not complete. In order to increase the chance of constructing a successful serialization, a consistency check (from [Geffner and Lipovetzky, 2012]) is applied to the nodes that are candidates to be landmark states.

Definition 4.1.1 (Consistency of a Landmark). Let $G_i$ be the i-th term of a serialization $d$.
A state $s$ consistently achieves a subset of goals $G_{i+1}$ when $G_{i+1}$ is true in $s$, $G_{i+1} \supset G_i$ extends $G_i$ with one atom from $G$, and there exists a weak plan from $s$ that achieves the goal $G$ such that $G_{i+1}$ is true in every intermediate state.

The series of subproblems $P_d = P_1, \ldots, P_m$ defined by the serialization $d$ is solved by a sequence of calls to NDIW, which solves every subproblem $P_i$ in time and complexity that is exponential in $w(P_i)$. As the number of subproblems is finite, the overall time and complexity of S-NDIW is exponential in $w_s = \max_{1 \le i \le m} w(P_i)$. Based on this intuition, we define the serialized width of a serialization $P_d$ as the maximum width $w(P_i)$ of the subproblems.

In the next two sections we present two versions of S-NDIW, which differ in the definition of the sequence of subproblems $P_d$ given a serialization $d$ of the goals in P. The Offline S-NDIW is suited to an offline planner, since it computes a complete solution that is valid for any possible realization of P. The Online S-NDIW is suited to an online planner, since it exploits the knowledge acquired during the realization of the problem.

4.2 Offline S-NDIW Algorithm

The Offline Serialized Iterated Width (Offline S-NDIW) algorithm uses NDIW to compute a solution for P without any information other than the definition of P. The solution given by Offline S-NDIW is a policy $\pi$ that is able to solve the problem for every possible realization of P. The Offline S-NDIW algorithm, described in definition 4.2.1, constructs a serialization of the problem P, together with a series of policies $\pi^1, \ldots, \pi^m$ that achieve the goals of the serialization sequentially. Each subpolicy $\pi^i$ takes the agent from a state that makes $I_i$ true into a state that makes $I_{i+1}$ true, and achieves a new atom from $G$. Although this state is not known in advance, the next subpolicy $\pi^{i+1}$ is guaranteed to lead the agent into a state that achieves another atom of $G$, and so on.
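The serialization conditions of section 4.1 and the serialized width can be checked with a short sketch. It assumes that goals are represented as sets of atoms and that the widths of the subproblems are already known; the function names and example atoms are hypothetical.

```python
from typing import FrozenSet, List

Goal = FrozenSet[str]

def is_serialization(seq: List[Goal], goal: Goal) -> bool:
    """Check that G_1..G_m grows one atom of G at a time, from the empty set up to G."""
    prev: Goal = frozenset()  # G_0 is the empty set
    for g in seq:
        extends = prev.issubset(g) and g.issubset(goal)
        if not (extends and len(g) == len(prev) + 1):
            return False
        prev = g
    return prev == goal  # G_m must be the full goal

def serialized_width(subproblem_widths: List[int]) -> int:
    """Serialized width: the maximum width among the subproblems P_1..P_m."""
    return max(subproblem_widths)

# Hypothetical goal with three atoms and one possible goal ordering.
goal = frozenset({"at_a", "at_b", "at_c"})
seq = [frozenset({"at_a"}),
       frozenset({"at_a", "at_b"}),
       frozenset({"at_a", "at_b", "at_c"})]
valid = is_serialization(seq, goal)
ws = serialized_width([1, 2, 1])
```

The sketch only validates the shape of a serialization; choosing a good ordering and checking consistency of the landmark states are separate steps.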
Definition 4.2.1. Offline Serialized NDIW (Offline S-NDIW) over $P = \langle F, I, A, G \rangle$ consists of a sequence of calls to NDIW over the problems $P_k = \langle F, I_k, A, G_k \rangle$, $k = 1, \ldots, |G|$, where:
1. $I_1 = I$
2. $G_k$ is a set of atoms achieved from $I_k$ such that $G_{k-1} \subset G_k \subseteq G$ and $|G_k| = k$; $G_0 = \emptyset$
3. $I_{k+1}$ represents the partial state that contains the atoms that are true in all the landmark states where $G_k$ is achieved through $\pi^k$, required to be a consistent set.

In other words, the k-th subcall of S-NDIW stops when NDIW finds a landmark goal $G_k \subset G$ and a policy $\pi^k$ for $G_k$ such that the intersection of all landmark states reachable through $\pi^k$ achieves $G_k$ consistently. The next subcall starts with an initial state that is set to this intersection of landmark states, so $\pi^{k+1}$ is guaranteed to be a policy for every landmark state of $\pi^k$. The consistency requirement is intended to increase the probability of obtaining a successful ordering of the serialization.

4.3 Online S-NDIW Algorithm

The Online Serialized Iterated Width (Online S-NDIW) algorithm uses NDIW to compute a solution for P using the knowledge of the state in which the agent achieves every subgoal of the serialization. The Online S-NDIW algorithm, described in definition 4.3.1, constructs a serialization of the problem P, together with a series of policies $\pi^1, \ldots, \pi^m$ such that, prior to the computation of $\pi^{k+1}$, the realization of $\pi^k$ is known. By knowing the state in which the k-th subgoal is achieved, the policy $\pi^{k+1}$ has more information about the real initial state of the subproblem $P_k$ than in the offline version, so more useful policies can be computed.

Definition 4.3.1. Online Serialized NDIW (Online S-NDIW) over $P = \langle F, I, A, G \rangle$ consists of a sequence of calls to NDIW over the problems $P_k = \langle F, I_k, A, \hat{G}_k \rangle$, $k = 1, \ldots, |G|$, where:
1. $I_1 = I$
2. $\hat{G}_k = G_{k-1} \cup \mathrm{oneof}\{g \in G \setminus G_{k-1}\}$, $G_0 = \emptyset$
3. The landmark states are required to be consistent states.
4. $I_{k+1}$ is the state that the agent reaches after the realization of the $(k-1)$-th subproblem.

In other words, the k-th subcall of S-NDIW stops when NDIW finds a policy in which every terminal state is a consistent state that achieves the goal (perhaps not the same goal in each terminal state). The next subcall starts with an initial state that models one of these terminal landmark states, corresponding to the state of the agent after the realization of the subproblem.

4.4 Empirical Evaluation

In this section we analyze the capabilities of S-NDIW to solve problems whose goals are conjunctions of atoms. First, we compare the performance of S-NDIW against SIW on a selection of deterministic benchmark problems. Then, we move to FOND domains, for which we compare the performance of S-NDIW against some of the state-of-the-art FOND planners. All the experiments with S-NDIW were run on Xeon Woodcrest computers with a clock speed of 2.33 GHz and a 2 GB memory limit.

4.4.1 Serialized Width in Classical Problems

The online and offline versions of the S-NDIW algorithm have been tested on a selection of classical planning problems used in previous editions of the IPC. The results are presented in table 4.1 and compared with those of SIW, run on the same experimental setup and extracted from [Geffner and Lipovetzky, 2012]. For every domain, a series of N different instances have been tested with the offline version of S-NDIW. A total of 5 runs per instance have been tested with the online version of S-NDIW. The table shows the number of successful runs on each domain (normalized over 5 runs, in the case of Online S-NDIW).
Domain        N     Offline S-NDIW   Online S-NDIW   SIW
8puzzle       50    5                2.4             50
Blocksworld   50    8                9               50
ferry         50    27               26              50
floortile     20    16               0               –
Grid          5     1                0               5
gripper       50    25               20              50
logistics     28    9                16              28
Mystery       30    18               15              27
parking       20    0                0               17
pegsol        30    0                0.8             6
satellite     20    3                4               19
scanalyzer    30    0                1               26
sokoban       30    1                1               3
tidybot       20    0                0               7
tower         22    22               22              –
visitall      20    3                2               19
woodworking   30    1                1               30
zeno          20    7                8.2             19

Table 4.1: Offline S-NDIW vs. Online S-NDIW vs. SIW: number of solved instances in typical benchmark domains. The Online S-NDIW results are normalized over 5 runs per problem instance.

The performance of the Offline S-NDIW and Online S-NDIW algorithms is, apparently, very similar. However, that performance is quite low when compared with SIW. Because of the similarities between the S-NDIW and SIW algorithms when the problems are deterministic, we need to examine the reason for the difference in performance between S-NDIW and SIW. Figure 4.1 shows the execution results of the offline and online versions of S-NDIW. We observe a high percentage of successful runs when the problem size is small. In contrast, the success rate is quite small when the size of the problem exceeds 300 atoms. In that case, only the really simple problems are solved (in a short time), while the rest of the problems exceed the time or memory resources during execution. There is an interval of problems (those between 3000 and 6000 atoms) in which the limiting resource for Offline S-NDIW is memory, while for Online S-NDIW it is the execution time. This situation may be explained by the fact that, in the landmark states, the online version of S-NDIW has richer information about the state of the world than the offline version, so the subproblems may become less complex (also in terms of memory), even though the complexity of the whole problem makes it impossible for Online S-NDIW to find a complete solution in the required time.

(a) Execution stats of the Offline version of S-NDIW.
(b) Execution stats of the Online version of S-NDIW.
Figure 4.1: Execution results of S-NDIW in deterministic problems. TLE = Time Limit (30 min) Exceeded; MLE = Memory Limit (2 GB) Exceeded.

Figure 4.2 shows detailed execution statistics for three selected domains: gripper, ferry, and zeno. We see that, as stated in theorem ??, the time needed to find a solution is, for each different problem, exponential in the problem width. The amount of memory demanded by S-NDIW is also exponential in the problem size, and the figure clearly shows that, when the problem size exceeds a certain threshold, the S-NDIW algorithm is likely to run out of memory. Thus, with limited memory and time, S-NDIW is able to tackle problems whose size does not exceed certain limits, which are different in each domain and depend on the branching factor of the tuple graph.

Figure 4.2: Execution results of Online S-NDIW in a selection of deterministic domains.

4.4.2 Serialized Width in FOND Problems

In this section we evaluate the performance of the S-NDIW algorithm by comparing the online and offline versions of the planner against some of the state-of-the-art FOND planners. A selection of FOND benchmark domains with conjunctive goals has been tested with the offline version of S-NDIW. The performance results are compared with those of the state-of-the-art planners FIP and PRP, extracted from [Muise et al., 2012]. S-NDIW shows lower performance than the other two planners.

                         Solved (unsat)
Domain (#instances)      Off S-NDIW   FIP        PRP
blocksworld (30)         7 (0)        30 (0)     30 (0)
faults (55)              –            55 (0)     55 (0)
first-responders (100)   58 (25)      100 (25)   100 (25)
forest (90)              –            20 (11)    66 (48)
blocksworld-new (50)     4 (0)        33 (0)     46 (0)
forest-new (90)          10 (0)       51 (0)     81 (0)

Table 4.2: Offline S-NDIW vs. FIP vs. PRP performance. In parentheses, the number of unsolvable instances detected.
A selection of FOND benchmark domains with conjunctive goals has been tested with the online version of S-NDIW, and the performance results are compared with those of the state-of-the-art planners FF-Hindsight+ [Yoon et al., 2010] and PRP. The blocksworld-2, elevators, zenotravel, and climber problems have conjunctive goals, while the river, busfare, and triangle-tireworld problems have atomic goals. In this case, S-NDIW shows a performance comparable with that of the state-of-the-art FOND planners, which is due to the fact that these FOND problems have small bounded width. Although some of these problems have a significant number of deadends that may increase the complexity for other planners, the S-NDIW algorithm solves them with a low order of complexity.

Domain (#instances) blocksworld-2 (15) elevators (15) zenotravel (15) climber (1) river (1) busfare (1) triangle-tire-1 (1) triangle-tire-17 (1) triangle-tire-31 (1) triangle-tire-35 (1) Success Rate On S-NDIW FF-H+ 29.1% 91.1% 31.8% 100% 100% 100% 100% 100% 100% – 74.4% 64.9% 68.9% 100% 66.7% 100% 100% 100% – – PRP time(s) On S-NDIW FF-H+ PRP 100% 100% 100% 100% 66.7% 100% 100% 100% 100% 100% 900 1620 1620 – – – – – – – 8.4 1.7 98.7 0 0 0 0 19 – 1520 0 0 0 0 (0) 124 (29) 2736 (723) –

Table 4.3: Online S-NDIW vs. FF-H+ vs. PRP performance. Results for 30 runs.

4.5 Conclusion

We have introduced a serialization method for addressing FOND problems whose goals are conjunctions of atoms. We have proposed two different versions of the planner: an offline planner, which computes a complete solution valid for any possible realization of the problem; and an online planner, which computes a solution of the problem given one realization. We note that, although the NDIW algorithm is complete, S-NDIW is not.
The S-NDIW algorithm does not guarantee finding a solution, even if one exists. The reason is that the order of the serialization affects the success of the algorithm, and some consistent orderings may lead to landmark states that are deadends of the problem. Finally, the S-NDIW algorithm, through its two versions, is able to solve some of the FOND benchmark domains with a performance that is comparable with the state-of-the-art FOND planners, provided that the problem size is sufficiently small. However, the performance of S-NDIW compared with that of SIW in deterministic domains is low, and our implementation of S-NDIW is only able to solve problems with small width and size.

Chapter 5

POLICY EXTRACTION

This chapter formalizes the extraction of policies for a FOND planning problem P from a tuple graph $\mathcal{G}$ of P. The Iterated Prune algorithm is proposed to extract a family of policies from the tuple graph, with a complexity that is comparable with that of the expansion of the tuple graph.

5.1 Solution Extraction Graph

As defined in chapter 3, the tuple graph is a graph whose nodes are of the form $r_t$, the side-effects of a tuple $t \in T^i$, and that encodes the reachability relations among the states represented by the nodes. These relations are represented by directed edges of the form $r_t \xrightarrow{a_j} r_{t'}$, meaning that for every optimal weak plan for $r_t$ there exist an action $a$ and an effect of $a$ (say, $a_j$) such that the state resulting from applying the effect $a_j$ to $r_t$ yields a weak plan for $r_{t'}$. More than one action may be applicable in a state $r_t$, meaning that the node $r_t$ in the tuple graph $\mathcal{G}^i$ may contain outer edges labeled with effects of different actions. Similarly, an effect $a_j$ of an action $a$ may map $r_t$ into a state that models more than one node in $\mathcal{G}^i$, meaning that the node $r_t$ in the tuple graph $\mathcal{G}^i$ may contain more than one edge labeled with the same effect $a_j$.
A policy $\pi$ for a problem P is well defined when $\pi$ is defined in every reachable state, and $\pi$ eventually maps the initial state into the goal state of the problem. In order to identify when the tuple graph encodes a policy for the problem P, we need to check whether $\mathcal{G}^i$ contains a subgraph $\mathcal{G}^{i\prime}$ that satisfies certain properties. The following definitions formalize the properties that we require of $\mathcal{G}^{i\prime}$ for it to define a policy for P.

Definition 5.1.1. Let $\mathcal{G}^i$ be a tuple graph of a FOND planning problem P. We denote by $A_{\mathcal{G}^i}(r_t)$ the set of actions $a$ for which the node $r_t$ has an outer edge labeled with an effect of $a$.

Definition 5.1.2. Let $\mathcal{G}^i$ be a tuple graph of a FOND planning problem P. We say that a node $r_t$ is complete if, for every $a \in A_{\mathcal{G}^i}(r_t)$ and for every effect $a_j$ of $a$, there is an outer edge of $r_t$ labeled with $a_j$. We say that a node $r_t$ is unitary if, for every effect $a_j$, there is one, and only one, outer edge of $r_t$ labeled with $a_j$.

Definition 5.1.3. Let $\mathcal{G}^i$ be a tuple graph of a FOND planning problem P, and let $\mathcal{G}^{i\prime}$ be a subgraph of $\mathcal{G}^i$. A state $r_t$ is a deadend in $\mathcal{G}^{i\prime}$ when there is no chain in $\mathcal{G}^{i\prime}$ starting in $r_t$ that implies the goal $G$.

A policy $\pi$ can be inferred from a subgraph $\mathcal{G}^{i\prime}$ of $\mathcal{G}^i$ with no deadends when, for every state $r_t$, we take $\pi_{\mathcal{G}^{i\prime}}(r_t)$ to be a random choice over the actions in $A_{\mathcal{G}^{i\prime}}(r_t)$, which we write as $\pi_{\mathcal{G}^{i\prime}}(r_t) = \mathrm{oneof}\{a \mid a \in A_{\mathcal{G}^{i\prime}}(r_t)\}$. The policy $\pi$ of an arbitrary (reachable) state $s$ is $\pi(s) = \mathrm{oneof}\{\pi_{\mathcal{G}^{i\prime}}(r_t) \mid s \models r_t\}$. It is not difficult to see that all the nodes in $\mathcal{G}^{i\prime}$ need to be complete to define a policy. However, they do not need to be unitary, so the same effect of an action $a = \pi_{\mathcal{G}^{i\prime}}(r_t)$ may map the state $r_t$ into more than one possible node in $\mathcal{G}^{i\prime}$. For that reason, we choose the policy $\pi(s)$ to be one of the actions in $A_{\mathcal{G}^{i\prime}}(r_t)$, for any $r_t$ true in $s$.

Theorem 5.1.1. Let $\mathcal{G}^i$ be a tuple graph of a FOND planning problem P. Let $\mathcal{G}^{i\prime}$ be a subgraph of $\mathcal{G}^i$.
The subgraph $\mathcal{G}^{i\prime}$ defines a strong cyclic policy $\pi$ for P if $\mathcal{G}^{i\prime}$ has no deadends, there exists $r_t \in \mathcal{G}^{i\prime}$ such that $I$ is true in $r_t$, and every node in $\mathcal{G}^{i\prime}$ is complete.

Proof. The direct implication is trivial. Let us consider a subgraph $\mathcal{G}^{i\prime}$ with the stated properties; we will see that it defines a strong cyclic policy. First, there exists a node $r_t$ modeled by the initial state $I$. For any $r_{t'}$ in $\mathcal{G}^{i\prime}$, either $r_{t'}$ is a goal state, or there exists an applicable action in $A_{\mathcal{G}^{i\prime}}(r_{t'})$ (because $\mathcal{G}^{i\prime}$ has no deadends). As every node in $\mathcal{G}^{i\prime}$ is complete, the policy that picks an action from $A_{\mathcal{G}^{i\prime}}(r_{t'})$ is well defined. As there are no deadends in $\mathcal{G}^{i\prime}$, the policy is guaranteed to (eventually) reach the goal.

5.2 Iterated Prune Algorithm

We propose an algorithm, Iterated Prune (IP), that checks the existence of a subgraph $\mathcal{G}^{i\prime} \subset \mathcal{G}^i$ that defines a policy for P, and that is based on the algorithm to extract strong cyclic solutions presented in [Geffner and Bonet, 2013]. The IP algorithm is constructive, so a family of policies is obtained at the end of the process. The time and complexity of the proposed method is comparable with the complexity of the expansion of the tuple graph. Although more efficient algorithms exist, they are out of the scope of this work and would not result in a major improvement of the overall planning algorithm.

The Iterated Prune algorithm consists of a series of pruning actions on the tuple graph $\mathcal{G}^i$ until a subgraph with the properties of theorem 5.1.1 is found, or all the nodes in the graph are pruned. In order to guarantee the absence of deadend nodes, the algorithm computes the value $V_{min}(r_t)$ of every node in $\mathcal{G}^i$, which measures the minimum distance from $r_t$ to a goal state, that is, the length of the shortest chain from $r_t$ that implies the goal $G$. If $\mathcal{G}^i$ contains a node with $V_{min}(r_t) = \infty$, the algorithm prunes this node from the graph.
The pruning action implies not only removing the node and all its inner and outer edges, but also preserving the guarantee that the nodes of the resulting graph are complete. That is, if an edge $r_{t'} \xrightarrow{a_j} r_t$ existed prior to pruning the node $r_t$, and $r_{t'}$ no longer has outer edges labeled with the effect $a_j$, then all the outer edges of $r_{t'}$ labeled with effects of the action $a$ need to be pruned as well. The pruning process is guaranteed to end in a finite number of iterations (at most $|\mathcal{G}^i|$ iterations), and its complexity is comparable to the complexity needed to expand the tuple graph. If P admits a solution, the Iterated Prune algorithm returns a subgraph $\mathcal{G}^{i\prime}$ that defines a family of strong cyclic policies that solve P.

Theorem 5.2.1. The Iterated Prune algorithm is sound and, if P admits a solution, returns a family of strong cyclic policies for P.

Proof. In each iteration, the Iterated Prune algorithm prunes a node $r_t$ with $V_{min}(r_t) = \infty$, i.e., a node that is a deadend of the graph. When pruning a node $r_t$, the inner and outer edges are removed while preserving the completeness of the remaining nodes. At the end of the process, the algorithm returns a graph that satisfies the conditions of theorem 5.1.1. When the nodes are not unitary, the graph does not define a unique policy, but a family of strong cyclic policies for P. Similarly, when the Iterated Prune algorithm does not find a policy for P, the root of $\mathcal{G}^{i\prime}$ is a deadend of the problem (because it has been pruned). Therefore, there does not exist a strong cyclic policy for P.

5.2.1 Policy Definition

The graph returned by the Iterated Prune algorithm is a graph such that, for every state $r_t$, a set of actions $A(r_t)$ is available. Every action $a \in A(r_t)$ maps the state $r_t$ into a state that depends on the (non-deterministic) effect of $a$. Specifically, this graph is an AND/OR graph [Russell and Norvig, 2002].
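The pruning loop of section 5.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the nested-dictionary encoding of the tuple graph, the function name `iterated_prune`, and the toy graph are all hypothetical, and every action listed at a node is assumed to list all of its effects (that is, the input nodes are complete).

```python
from collections import deque
from typing import Dict, Optional, Set

Node = str
# edges[node][action][effect] = set of successor nodes (hypothetical encoding)
Edges = Dict[Node, Dict[str, Dict[str, Set[Node]]]]

def iterated_prune(edges: Edges, goals: Set[Node], root: Node) -> Optional[Edges]:
    # Work on a deep copy so the input graph is left untouched.
    graph: Edges = {
        n: {a: {e: set(succ) for e, succ in effs.items()} for a, effs in acts.items()}
        for n, acts in edges.items()
    }
    nodes: Set[Node] = set(graph) | set(goals)
    for acts in graph.values():
        for effs in acts.values():
            for succ in effs.values():
                nodes |= succ

    while True:
        # Nodes with finite V_min: backward breadth-first search from the goals.
        preds: Dict[Node, Set[Node]] = {n: set() for n in nodes}
        for n, acts in graph.items():
            for effs in acts.values():
                for succ in effs.values():
                    for s in succ:
                        preds[s].add(n)
        finite = set(goals) & nodes
        queue = deque(finite)
        while queue:
            for p in preds[queue.popleft()] - finite:
                finite.add(p)
                queue.append(p)

        dead = nodes - finite  # nodes with V_min = infinity
        if not dead:
            break
        nodes -= dead
        for n in dead:
            graph.pop(n, None)  # remove the node and its outer edges
        for acts in graph.values():
            for a in list(acts):
                for succ in acts[a].values():
                    succ -= dead  # remove inner edges of pruned nodes
                # An action that lost every edge of some effect is dropped,
                # which keeps the remaining nodes complete.
                if any(not succ for succ in acts[a].values()):
                    del acts[a]

    return graph if root in nodes else None

# Toy graph: "move" may slip into the deadend node "d", "walk" is safe.
toy: Edges = {
    "r0": {"move": {"ok": {"g"}, "slip": {"d"}}, "walk": {"ok": {"r1"}}},
    "r1": {"step": {"ok": {"g"}, "back": {"r0"}}},
}
pruned = iterated_prune(toy, {"g"}, "r0")
```

In the toy graph the deadend node "d" is pruned first, which removes the action "move" from "r0" because one of its effects no longer has an edge. A randomized policy can then pick any action that remains in `pruned[node]`.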
In the tests described in chapters 3 and 4, we have simply selected an action $a$ at random from the set $A(r_t)$. The fact that the nodes of the subgraph are complete and that the subgraph has no deadends guarantees that, assuming fair non-determinism, our (randomized) policy eventually solves the problem. There exist algorithms to select optimal solutions according to the cost of the actions, or to the presence or absence of loops [Dechter and Pearl, 1985] [Hansen and Zilberstein, 1998] [Bonet and Geffner, 2005]. Once the Iterated Prune algorithm returns the subgraph that defines a family of strong cyclic policies for P, we can use an off-the-shelf algorithm (e.g., AO*) [Geffner and Bonet, 2013] to select the optimal policy according to our preferences.

5.3 Conclusion

The tuple graph $\mathcal{G}$ defined in chapter 2 may contain a subgraph that defines a strong cyclic policy for a FOND planning problem P. In that case, the methods defined in chapters 3 and 4 build a tuple graph $\mathcal{G}^i$ that, for $i = w(P)$, contains a subgraph that defines a strong cyclic policy. The extraction of such policies can be performed with the Iterated Prune algorithm, a method that is sound and complete, and returns a family of strong cyclic policies for P in time and complexity that is comparable with that of expanding the tuple graph. Finally, we can use an off-the-shelf algorithm to select the optimal policy according to our preferences.

Part III

Conclusions and Future Work

Chapter 6

DISCUSSION AND CONCLUSION

In this chapter we summarize the contributions of this manuscript, and discuss future lines of research based on the work developed so far.

6.1 Contributions

This manuscript contributes to the study of the structure of FOND planning problems, and proposes an abstraction of the complexity of FOND problems based on the reachability over classes that are the side-effects of tuples of atoms.
This abstraction appears to capture the complexity of the typical benchmark problems in classical and FOND planning. In more detail, the contributions of this manuscript are:

1. A new width notion for fully observable, non-deterministic planning that encodes the reachability among states, and explains the complexity of planning problems. We prove theoretically that some typical benchmark domains have a bounded and low width when goals are restricted to single atoms.
2. A simple, blind-search planning algorithm (NDIW) that runs in time exponential in the width of the problem. We show empirically that many of the existing benchmark domains have a bounded and low width when goals are restricted to single atoms.
3. An online version and an offline version of a simple, blind-search planning algorithm (S-NDIW) that uses NDIW both for serializing a problem into subproblems and for solving such subproblems.
4. A simple, effective method to extract strong cyclic policies from the tuple graph that runs in time comparable to the time needed to expand the tuple graph.

6.2 Future Work

This thesis defines a serialization of the problem based on the approach in [Geffner and Lipovetzky, 2012] for classical planning. Intuition suggests that the performance of S-NDIW should be comparable to that of SIW in classical planning problems. However, our tests reveal that S-NDIW does not perform as well, due to memory and time constraints. The low performance of S-NDIW may be due to the implementation, which should be revised. The serialization of the problems leads to an algorithm that is not complete, since the order of the serialization affects the success of the planner. In order to improve the performance, and to minimize the chances of entering a deadend of the problem, a consistency check over the states is employed.
This consistency check tests the existence of weak plans that satisfy certain constraints, and it does improve the performance of the planner. However, since we pursue strong and strong cyclic plans, it may be useful to use a more constrained consistency check that increases the likelihood that the desired policies exist.

The tests reported in this manuscript check the existence of strong cyclic policies. However, the same tests can be performed to check for the existence of strong policies, or even weak plans. It may be interesting to compare the existence of each of these kinds of solutions in the existing benchmark domains.

There is a strong connection between Markov Decision Problems (MDPs) and FOND planning. The difference is that the MDP model considers probabilities in the transition function, while the FOND model does not. Any MDP can be translated into a FOND problem by mapping the probability distribution over the effects of an action into a set of distinct non-deterministic effects. This correspondence between FOND and MDP problems suggests that the width notion introduced in this manuscript can be adapted to describe the MDP model accurately.

The tuple graph exploits the fact that, in a FOND problem, not all atoms are relevant in every state in order to achieve the goal and, similarly, only a few atoms are relevant in each state in order to advance towards the goal of the problem. It may be interesting to check the existence of compact policies that only take into account a selection of the atoms of the problem, leaving the remaining atoms unattended.

Bibliography

[Blum and Furst, 1997] Blum, A. L. and Furst, M. L. (1997). Fast planning through planning graph analysis. Artificial Intelligence, 90(1-2):281–300.
[Bonet and Geffner, 2001] Bonet, B. and Geffner, H. (2001). Planning as heuristic search. Artificial Intelligence, 129(1-2):5–33.
[Bonet and Geffner, 2005] Bonet, B. and Geffner, H. (2005). An algorithm better than AO*?
In Proceedings of the 20th National Conference on Artificial Intelligence - Volume 3, AAAI'05, pages 1343–1347. AAAI Press.
[Bylander, 1994] Bylander, T. (1994). The Computational Complexity of Propositional STRIPS Planning. Artificial Intelligence, 69(1-2):165–204.
[Chapman, 1987] Chapman, D. (1987). Planning for conjunctive goals. Artificial Intelligence, 32(3):333–377.
[Chen and Giménez, 2007] Chen, H. and Giménez, O. (2007). Act Local, Think Global: Width Notions for Tractable Planning. ICAPS.
[Dechter and Pearl, 1985] Dechter, R. and Pearl, J. (1985). Generalized best-first search strategies and the optimality of A*. J. ACM, 32(3):505–536.
[Fikes and Nilsson, 1972] Fikes, R. and Nilsson, N. (1972). STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2.
[Fu et al., 2008] Fu, J., Bastani, F., Ng, V., Yen, I.-L., and Zhang, Y. (2008). FIP: A Fast Planning-Graph-Based Iterative Planner. 20th IEEE ICTAI, pages 419–426.
[Geffner and Bonet, 2013] Geffner, H. and Bonet, B. (2013). A Concise Introduction to Models and Methods for Automated Planning: Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers.
[Geffner and Lipovetzky, 2012] Geffner, H. and Lipovetzky, N. (2012). Width and serialization of classical planning problems. ECAI.
[Gerevini et al., 2009] Gerevini, A. E., Haslum, P., Long, D., Saetti, A., and Dimopoulos, Y. (2009). Deterministic planning in the fifth international planning competition: PDDL3 and experimental evaluation of the planners. Artificial Intelligence, 173(5-6):619–668.
[Ghallab et al., 2004] Ghallab, M., Nau, D., and Traverso, P. (2004). Automated Planning: Theory & Practice. Morgan Kaufmann.
[Hansen and Zilberstein, 1998] Hansen, E. and Zilberstein, S. (1998). Heuristic search in cyclic AND/OR graphs. AAAI/IAAI, pages 412–418.
[Hoffmann et al., 2004] Hoffmann, J., Porteous, J., and Sebastia, L. (2004). Ordered landmarks in planning. J. Artif.
Int. Res., 22(1):215–278.
[Kissmann and Edelkamp, 2009] Kissmann, P. and Edelkamp, S. (2009). Solving fully-observable non-deterministic planning problems via translation into a general game. In KI 2009, KI'09, pages 1–8, Berlin, Heidelberg. Springer-Verlag.
[Levesque, 2005] Levesque, H. (2005). Planning with loops. IJCAI, 20(1):249–254.
[Little and Thiebaux, 2007] Little, I. and Thiebaux, S. (2007). Probabilistic planning vs. replanning. ICAPS Workshop on IPC: Past, Present and Future.
[McDermott, 1999] McDermott, D. (1999). Using regression-match graphs to control search in planning. Artificial Intelligence, pages 1–42.
[McDermott et al., 1998] McDermott, D., Ghallab, M., Howe, A., and Knoblock, C. (1998). PDDL - the planning domain definition language.
[Muise et al., 2012] Muise, C., McIlraith, S., and Beck, J. (2012). Improved Non-Deterministic Planning by Exploiting State Relevance. ICAPS, pages 172–180.
[Richter and Westphal, 2010] Richter, S. and Westphal, M. (2010). The LAMA planner: guiding cost-based anytime planning with landmarks. J. Artif. Int. Res., 39(1):127–177.
[Russell and Norvig, 2002] Russell, S. J. and Norvig, P. (2002). Artificial Intelligence: A Modern Approach, 2nd Ed. Prentice Hall, Englewood Cliffs, NJ.
[Yoon et al., 2010] Yoon, S., Ruml, W., Benton, J., and Do, M. (2010). Improving determinization in hindsight for online probabilistic planning. ICAPS.