Probabilistic Planning via Determinization in Hindsight (FF-Hindsight)
Sungwook Yoon
Joint work with Alan Fern, Bob Givan, and Rao Kambhampati

Probabilistic Planning Competition
• Client: the participants, who send actions
• Server: the competition host, which simulates the actions

The Winner Was…
• FF-Replan – a replanner
  – Uses FF; the probabilistic domain is determinized
• An interesting contrast
  – Many probabilistic planning techniques work in theory but not in practice
  – FF-Replan has no theory, yet works in practice

The Paper's Objective
• A better determinization approach (determinization in hindsight)
• Theoretical consideration of the new determinization
• A new view of FF-Replan
• Experimental studies with determinization in hindsight (FF-Hindsight)

Probabilistic Planning (goal-oriented)
[Figure: a two-step search tree from initial state I; actions A1 and A2 have probabilistic outcomes, with left outcomes more likely; leaves include dead ends and goal states. Objective: maximize goal achievement.]

All-Outcome Replanning (FFRA) [ICAPS-07]
[Figure: an action with Effect 1 (Probability 1) and Effect 2 (Probability 2) is split into deterministic Action 1 with Effect 1 and Action 2 with Effect 2.]

Probabilistic Planning: All-Outcome Determinization
[Figure: the same search tree with each probabilistic action Ai replaced by deterministic actions Ai-1 and Ai-2, one per outcome; the task becomes finding the goal.]
Problem of FF-Replan, and a Better Alternative: Sampling
• FF-Replan's static determinizations don't respect probabilities
• We need probabilistic and dynamic determinization: sample future outcomes and determinize in hindsight
• Each future sample becomes a known-future deterministic problem

Probabilistic Planning (goal-oriented)
[Figure: the same two-step search tree from state I, with left outcomes more likely.]

Start Sampling
• Note: sampling will reveal which action is better at state I, A1 or A2

Hindsight Samples 1–4
[Figures: four sampled futures of the tree, each resolving every probabilistic outcome. Running tallies of sampled futures in which each action reaches the goal: after sample 1, A1: 1, A2: 0; after sample 2, A1: 2, A2: 1; after sample 3, A1: 2, A2: 1; after sample 4, A1: 3, A2: 1.]

Summary of the Idea: The Decision Process (Estimating the Q-Value Q(s,a))
S: the current state; an action maps A(S) → S'
1.
For each action A, draw future samples; each sample is a deterministic planning problem.
2. Solve the deterministic problems; for goal-oriented problems, the solution length is used as Q(s,A).
3. Aggregate the solutions for each action.
4. Select the action with the best aggregation: argmax_A Q(s,A).

Mathematical Summary of the Algorithm
• An H-horizon future F_H for M = [S, A, T, R]
  – A mapping of state, action, and time h (h < H) to a state: S × A × h → S
  – Each future is a deterministic problem
• Value of a policy π for F_H: R(s, F_H, π)
• V_HS(s,H) = E_{F_H}[ max_π R(s, F_H, π) ]; the inner maximization is done by FF
• Compare this with the real value: V*(s,H) = max_π E_{F_H}[ R(s, F_H, π) ]
• V_FFRa(s) = max_F V(s,F) ≥ V_HS(s,H) ≥ V*(s,H)
• Q(s,a,H) = R(a) + E_{F_{H-1}}[ max_π R(a(s), F_{H-1}, π) ]
  – In our proposal, the computation of max_π R(a(s), F_{H-1}, π) is done approximately by FF [Hoffmann and Nebel '01]

Key Technical Results
• The importance of independent sampling across states, actions, and time
• The necessity of random tie-breaking in decision making
• We characterize FF-Replan in terms of hindsight decision making: V_FFRa(s) = max_F V(s,F)
• Theorem 1: when there is a policy that achieves the goal with probability 1 within the horizon, the hindsight decision-making algorithm reaches the goal with probability 1.
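The four-step decision process above can be sketched in Python. Everything concrete here is an illustrative assumption, not the paper's implementation: a toy chain MDP stands in for the planning domain, a BFS over the time-expanded determinized problem stands in for FF, and negated plan length is used as the aggregated Q-value.

```python
import random
from collections import deque

# Toy goal-oriented MDP (hypothetical, for illustration only): states 0..4,
# goal state 4, horizon 6.
#   'safe'  always moves one step toward the goal.
#   'risky' moves two steps with probability 0.6, otherwise stays put.
GOAL, HORIZON = 4, 6
ACTIONS = ("safe", "risky")

def sample_outcome(s, a, rng):
    if a == "safe":
        return min(s + 1, GOAL)
    return min(s + 2, GOAL) if rng.random() < 0.6 else s

def sample_future(rng):
    # A future fixes the outcome of every (state, action, time) triple,
    # turning the MDP into a known-future deterministic problem (step 1).
    return {(s, a, t): sample_outcome(s, a, rng)
            for s in range(GOAL + 1) for a in ACTIONS for t in range(HORIZON)}

def plan_length(s0, t0, future):
    # BFS over the time-expanded deterministic problem, a stand-in for the
    # FF planner (step 2). Returns the arrival time at the goal, or None
    # if the goal is unreachable within the horizon.
    frontier, seen = deque([(s0, t0)]), {(s0, t0)}
    while frontier:
        s, t = frontier.popleft()
        if s == GOAL:
            return t
        if t == HORIZON:
            continue
        for a in ACTIONS:
            nxt = (future[(s, a, t)], t + 1)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None

def hindsight_action(s, width, rng):
    # Estimate Q(s, a) per action from `width` independently sampled futures
    # (steps 1-3), then pick the best action with random tie-breaking (step 4).
    q = {}
    for a in ACTIONS:
        total = 0.0
        for _ in range(width):
            f = sample_future(rng)
            length = plan_length(f[(s, a, 0)], 1, f)
            # Shorter plans are better; unsolved futures get the worst score.
            total += -length if length is not None else -(HORIZON + 1)
        q[a] = total / width
    best = max(q.values())
    return rng.choice([a for a in ACTIONS if q[a] == best])
```

Note the two details the slides emphasize: futures are sampled independently for each action, and ties between equally scored actions are broken at random rather than by a fixed order.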
• Theorem 2: a polynomial number of samples suffices, with respect to the horizon, the number of actions, and the minimum Q-value advantage.

Empirical Results: IPPC-04 Problems (numbers are solved trials)

Problem         FFRa  FF-Hindsight
Blocksworld      270           158
Boxworld         150           100
Fileworld         29            14
R-Tireworld       30            30
ZenoTravel        30             0
Exploding BW       5            28
G-Tireworld        7            18
Tower of Hanoi    11            17

For ZenoTravel, using importance sampling improved the solved trials to 26.

Empirical Results (percentage of successful runs)

Planner    Climber  River  Bus-Fare  Tire1  Tire2  Tire3  Tire4  Tire5  Tire6
FFRa           60%    65%        1%    50%     0%     0%     0%     0%     0%
Paragraph     100%    65%      100%   100%   100%   100%     3%     1%     0%
FPG           100%    65%       22%   100%    92%    60%    35%    19%    13%
FF-HS         100%    65%      100%   100%   100%   100%   100%   100%   100%

These domains were developed specifically to beat FF-Replan. As expected, FF-Replan did not do well, but FF-Hindsight did very well, showing probabilistic reasoning ability while retaining scalability.

Conclusion
[Figure: determinization carries the scalability of deterministic planning over to probabilistic planning: classical planning to Markov decision processes, machine learning for planning to machine learning for MDPs, and net-benefit optimization and temporal planning to temporal MDPs.]

Conclusion
• Devised an algorithm that exploits the significant advances in deterministic planning in the context of probabilistic planning
• Made many deterministic planning techniques available to probabilistic planning
  – Most learning-to-plan techniques were developed solely for deterministic planning; now these techniques are relevant to probabilistic planning too
  – Advanced net-benefit planners can be used for reward-maximization-style probabilistic planning problems

Discussion
Mercier and Van Hentenryck analyzed the difference between
  – V*(s,H) = max_π E_{F_H}[ R(s, F_H, π) ]
  – V_HS(s,H) = E_{F_H}[ max_π R(s, F_H, π) ]
• Ng and Jordan analyzed the difference between
  – V*(s,H) = max_π E_{F_H}[ R(s, F_H, π) ]
  – V^(s,H) = max_π (1/m) Σ_i R(s, F_H^(i), π), where m is the number of samples

IPPC-2004 Results (numbers are successful runs; FFRS was the runner-up of IPPC-04; the first three planners used human-written or learned control knowledge)

Problem    NMRC   J1  Classy  NMR  mGPT    C  FFRS  FFRA
BW          252  270     255   30   120   30   210   270
Box         134  150     100    0    30    0   150   150
File          -    -       -    3    30    3    14    29
Zeno          -    -       -   30    30   30     0    30
Tire-r        -    -       -   30    30   30    30    30
Tire-g        -    -       -    9    16   30     7     7
TOH           -    -       -   15     0    0     0    11
Exploding     -    -       -    0     0    0     3     5

NMR: non-Markovian reward decision process planner
Classy: approximate policy iteration with a policy-language bias
mGPT: heuristic-search probabilistic planning
C: symbolic heuristic search

IPPC-2006 Results (numbers are percentages of successful runs; FFRA was the unofficial winner of IPPC-06)

Domain      Paragraph  FFRS  FFRA  FPG  FOALP  sfDP
BW                 86    63   100   29      0    77
Zenotravel        100    27     0    7      7     7
Random            100    65     0    0      5    73
Elevator           93    76   100    0      0    93
Exploding          52    43    24   31     31    52
Drive              71    56     0    0      9     0
Schedule           51    54     0    0      1     0
PitchCatch         54    23     0    0      0     0
Tire               82    75    82    0     91    69

FPG: Factored Policy Gradient planner
FOALP: first-order approximate linear programming
sfDP: symbolic stochastic focused dynamic programming with decision diagrams
Paragraph: a Graphplan-based probabilistic planner

Sampling Problem: Time-Dependency Issue
[Figure: a small MDP with states Start, S1, S2, S3, a Goal, and a dead end. In one branch, action C reaches the goal with probability p and D with probability 1-p; in another, the probabilities are reversed, so an outcome's value depends on when and where it is realized.]

Sampling Problem: Time-Dependency Issue (continued)
[Figure: the same MDP.] S3 is a worse state than
S1, but it looks as if there is always a path to the goal.
• We need to sample independently across actions.

Action Selection Problem: Random Tie-Breaking Is Essential
[Figure: from Start, action A always stays in Start, action B reaches S1 with probability 1-p and the Goal with probability p, and action C reaches the Goal with probability p and S1 with probability 1-p.]
• In the Start state, action C is definitely better, but A can be used to wait until C's goal-reaching effect is realized.

Sampling Problem: Importance Sampling (IS)
[Figure: from Start, action B reaches S1 with very high probability and the Goal with extremely low probability.]
• Sampling uniformly would find the problem unsolvable.
• Use importance sampling.
• Identifying the regions that need importance sampling is left for further study.
• In the benchmarks, ZenoTravel needs the IS idea.

Theoretical Results
• Theorem 1
  – For goal-achieving probabilistic planning problems, if there is a policy that solves the problem with probability 1 within a bounded horizon, then hindsight planning solves the problem with probability 1. If there is no such policy, hindsight planning returns a success ratio of less than 1.
  – If there is a future in which no plan can achieve the goal, that future can be sampled.
• Theorem 2
  – The number of future samples w needed to correctly identify the best action satisfies w > 4Δ^(-2) T ln(|A|H / δ)
  – Δ: the minimum Q-advantage of the best action over the other actions; δ: the confidence parameter
  – This follows from the Chernoff bound.

Probabilistic Planning: Expectimax Solution
[Figure: the two-step tree viewed as an expectimax tree, with Max nodes at action choices and Exp nodes at probabilistic outcomes, and goal states at the leaves. Objective: maximize goal achievement.]
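Theorem 2's sample width can be evaluated directly. The helper below is hypothetical, and the reading of the slide's garbled bound as w > 4·Δ^(-2)·T·ln(|A|H/δ), including the role of the factor T, is an assumption rather than a statement of the paper's exact theorem.

```python
import math

def hindsight_sample_width(gap, horizon, n_actions, delta, T):
    """Futures per action sufficient to identify the best action, per a
    reconstructed (assumed) reading of the slide's Chernoff-style bound:
        w > 4 * gap**(-2) * T * ln(n_actions * horizon / delta)
    gap is the minimum Q-advantage; delta is the confidence parameter."""
    return math.ceil(4.0 * T / gap ** 2 * math.log(n_actions * horizon / delta))
```

Evaluating the bound shows why the theorem calls the sample count polynomial: it grows quadratically in 1/Δ, linearly in T, and only logarithmically in the number of actions, the horizon, and 1/δ.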