
Traveling Salesman Problems
Motivated by Robot Navigation
Maria Minkoff
MIT
With Avrim Blum, Shuchi Chawla,
David Karger, Terran Lane,
Adam Meyerson
A Robot Navigation Problem
• Robot delivering packages in a building
• Goal: deliver them as quickly as possible
• Classic model: Traveling Salesman Problem
• Find a tour of minimum length
• Additional constraints:
• some packages have higher priority
• uncertainty in robot’s behavior
• battery failure
• sensor error, motor control error
Markov Decision Process Model
• State space S
• Choice of actions a ∈ A at each state s
• Transition function T(s’|s,a)
• action determines probability distribution on next state
• a sequence of actions produces a random path
through the graph
• Rewards R(s) on states
• If we arrive in state s at time t,
we receive discounted reward γ^t R(s), for γ ∈ (0,1)
• MDP Goal: policy for picking an action from any state
that maximizes total discounted reward
Exponential Discounting
• Motivates getting to desired states quickly
• Inflation: reward collected in distant future
decreases in value due to uncertainty
• at time t robot loses power with fixed probability
• the probability of still being alive at time t
decays exponentially
• discounting reflects value of reward in expectation
Solving MDP
• Fixing the action at each state produces a
Markov chain with transition probabilities p_vw
• Can compute the expected discounted reward r_v
of starting at state v (see the sketch below):
r_v = R(v) + Σ_w p_vw · γ^{t(v,w)} · r_w
• Choosing actions to optimize this recurrence
is polynomial time solvable
• Linear programming
• Dynamic programming (like shortest paths)
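As a concrete illustration, here is a minimal value-iteration sketch in Python for the recurrence above. It is an assumption-laden sketch, not the talk's implementation: it takes unit-time transitions (so the per-step discount is simply γ), and the toy instance and all names in it are hypothetical.

def value_iteration(states, actions, T, R, gamma, iters=100):
    """T[s][a]: list of (next_state, prob); R[s]: reward on state s."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        # Bellman update: V(s) = R(s) + gamma * max over a of E[V(next state)]
        V = {s: R[s] + gamma * max(sum(p * V[s2] for s2, p in T[s][a])
                                   for a in actions(s))
             for s in states}
    return V

# Hypothetical toy instance: reward only at 'goal'; action 'go' sometimes fails.
T = {'start': {'go': [('goal', 0.9), ('start', 0.1)]},
     'goal':  {'stay': [('goal', 1.0)]}}
R = {'start': 0.0, 'goal': 1.0}
print(value_iteration(['start', 'goal'], lambda s: T[s], T, R, 0.5))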
Solving the wrong problem
• Package can only be delivered once
• So should not get reward each time reach target
• One solution: expand state space
• New state = current location × set of past locations
(packages already delivered)
• Reward nonzero only on states whose current
location has not been visited before
• Now apply the MDP algorithm
• Problem: the new state space has exponential size
(made concrete in the sketch below)
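To make the blowup concrete, here is a hedged sketch of the expansion; the names and encoding are illustrative, not from the talk. With n reward locations it enumerates |V| · 2^n expanded states.

from itertools import chain, combinations

def expanded_states(locations, reward_locations):
    # every subset of reward locations, paired with every current location
    subsets = list(chain.from_iterable(
        combinations(reward_locations, k)
        for k in range(len(reward_locations) + 1)))
    return [(loc, frozenset(sub)) for sub in subsets for loc in locations]

def expanded_reward(state, R):
    # reward is nonzero only if the current location was not visited before
    loc, visited = state
    return 0.0 if loc in visited else R.get(loc, 0.0)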
Tackle an easier problem
• Problem has two novel elements for “theory”
• Discounting of reward based on arrival time
• Probability distribution on outcome of actions
• We will set aside second issue for now
• In practice, robot can control errors
• Even first issue by itself is hard and interesting
• First step towards solving whole problem
Discounted-Reward TSP
Given
• undirected graph G=(V,E)
• edge weights (travel times) d_e ≥ 0
• weights on nodes (rewards) r_v ≥ 0
• discount factor γ ∈ (0,1)
• root node s
Goal
find a path P starting at s that maximizes the
total discounted reward r(P) = Σ_{v∈P} r_v · γ^{d_P(v)},
where d_P(v) is the length of P from s to v
(computed in the sketch below)
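A small sketch of this objective in Python, collecting each vertex's reward only on its first arrival; the edge-dictionary encoding is an illustrative assumption.

def discounted_reward(path, length, reward, gamma):
    """path: list of vertices; length[(u, v)]: travel time of edge (u, v)."""
    total, t, seen = 0.0, 0.0, set()
    for i, v in enumerate(path):
        if i > 0:
            t += length[(path[i - 1], v)]
        if v not in seen:                       # each reward counts once
            total += reward.get(v, 0.0) * gamma ** t
            seen.add(v)
    return total

# e.g. with gamma = 0.5, reward 5 at distance 1 and 10 at distance 3:
# discounted_reward(['s','a','b'], {('s','a'): 1, ('a','b'): 2},
#                   {'a': 5, 'b': 10}, 0.5)  ->  5*0.5 + 10*0.125 = 3.75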
Approximation Algorithms
• Discounted-Reward TSP is NP-hard
(and so is the more general MDP-type problem)
• reduction from minimum latency TSP
• So intractable to solve exactly
• Goal: approximation algorithm
that is guaranteed to collect at least some constant
fraction of the best possible discounted reward
Related Problems
Goal of Discounted-Reward TSP seems to be
to find a “short” path that collects “lots” of
reward
• Prize-Collecting TSP
• Given a root vertex v, find a tour containing v that
minimizes total length + foregone reward
(undiscounted)
• Primal-dual 2-approximation algorithm [GW 95]
k-TSP
• Find a tour of minimum length that visits
at least k vertices
• 2-approximation algorithm known for
undirected graphs based on algorithm for
PC-TSP [Garg 99]
• Can be extended to handle node-weighted
version
Mismatch
Constant factor approximation on length
doesn’t exponentiate well
• Suppose the optimum solution reaches some
vertex v at time t, for reward γ^t·r
• A constant-factor approximation on length might
reach v only within time 2t, for reward γ^{2t}·r
• Result: we get only a γ^t fraction of the optimum
discounted reward, not a constant fraction.
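For instance, with γ = ½ and t = 10, the optimum collects r/1024 at v, while a factor-2 approximation on length collects only r/1048576: a 1/1024 fraction of the optimum, and the gap grows with t.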
Orienteering Problem
Find a path of length at most D that maximizes
net reward collected
• Complement of k-TSP
• approximates reward collected instead of length
• avoids changing length, so exponentiation
doesn’t hurt
• unrooted case can be solved via k-TSP
• Drawback: no constant factor approximation for
rooted non-geometric version previously known
• Our techniques also give a constant factor
approximation for Orienteering problem
Our Results
Using an α-approximation for k-TSP as a subroutine:
• (3α/2 + 2)-approximation for Orienteering
• e(3α/2 + 2)-approximation for Discounted-Reward Collection
• constant-factor approximations for tree- and
multiple-path versions of the problems
Our Results
Using an α-approximation for k-TSP as a subroutine;
substituting α = 2, announced by Garg in 1999:
• (3·2/2 + 2) = 5-approximation for Orienteering
• e·5 ≈ 13.6-approximation for Discounted-Reward Collection
• constant-factor approximations for tree- and
multiple-path versions of the problems
Eliminating Exponentiation
• Let d_v = shortest-path distance (time) from s to v
• Define the prize at v as p_v = γ^{d_v} r_v
• the max discounted reward possibly collectable at v
• If a given path reaches v at time t_v,
define its excess e_v = t_v − d_v
• the difference between the chosen path and the shortest one
• Then the discounted reward at v is γ^{e_v} p_v
• Idea: if excess small, prize ~ discounted reward
• Fact: excess only increases as traverse path
• excess reflects lost time; can’t make it up
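A sketch of these quantities in Python, assuming an adjacency-list encoding adj[u] = [(v, w), ...] (an illustrative choice, not from the talk): distances d_v via Dijkstra, prizes p_v = γ^{d_v} r_v, and the excess of a given path at its endpoint.

import heapq

def dijkstra(adj, s):
    dist = {s: 0.0}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

def prizes(adj, s, reward, gamma):
    d = dijkstra(adj, s)
    return {v: gamma ** dv * reward.get(v, 0.0) for v, dv in d.items()}

def excess(path, length, dist):
    """e_v = t_v - d_v at the last vertex of `path` (arrival time minus d_v)."""
    t = sum(length[(path[i], path[i + 1])] for i in range(len(path) - 1))
    return t - dist[path[-1]]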
Optimum path
• assume γ = ½
(can scale edge lengths)
Claim: at least ½ of optimum path’s
discounted reward R is collected
before path’s excess reaches 1
Proof by contradiction:
• Let u be the first vertex with e_u ≥ 1
• Suppose more than R/2 of the reward follows u
• Can shortcut directly to u, then traverse
the rest of the optimum path
• reduces all excesses after u by at least 1
• so “undiscounts” those rewards by a factor γ^{−1} = 2
• so at least doubles the discounted reward collected after u
• but that was already more than R/2, so the new path
collects more than R: contradiction
[Figure: an example path from s with vertices annotated by arrival time and excess; u is the first vertex with excess ≥ 1]
New problem:
Approximate Min-Excess Path
• Suppose there exists an s-t path P* with
prize value Π and length ℓ(P*) = d_t + ε
• Optimization: find an s-t path P with prize value ≥ Π that
minimizes the excess ℓ(P) − d_t over the shortest path to t
• equivalent to minimizing total length, as in k-TSP
• Approximation: find an s-t path P with prize value ≥ Π that
approximates the optimum excess over the shortest path to t,
i.e. has length ℓ(P) = d_t + c·ε
• better than approximating entire path length
Using Min-Excess Path
• Recall that the discounted reward at v is γ^{e_v} p_v
• The prefix of the optimum discounted-reward path:
• collects discounted reward Σ γ^{e_v} p_v ≥ R/2
⇒ spans prize Σ p_v ≥ R/2
• and has no vertex with excess over 1
• Guess t = the last node on the optimum path with excess e_t ≤ 1
• Find a path to t of approximately (4 times) minimum
excess that spans ≥ R/2 prize
(we can guess R/2)
• Excesses are at most 4, so γ^{e_v} p_v ≥ p_v/16
⇒ discounted reward on the found path ≥ R/32
(glued together in the sketch below)
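Schematically, the reduction can be glued together as below. This is a hedged sketch, not the talk's exact procedure: min_excess_path is a hypothetical stand-in for the 4-approximation subroutine developed on the next slides, and value would be a discounted-reward evaluator such as the earlier sketch.

def best_discounted_path(nodes, s, prize_targets, min_excess_path, value):
    """min_excess_path(s, t, q): path with prize >= q, excess <= 4x optimum."""
    candidates = [min_excess_path(s, t, q)
                  for t in nodes for q in prize_targets]
    candidates = [P for P in candidates if P is not None]
    # by the analysis above, some candidate has discounted reward >= R/32
    return max(candidates, key=value, default=None)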
Solving Min-Excess Path problem
Exactly solvable case: monotonic paths
• Suppose optimum path goes through vertices
in strictly increasing distance from root
• Then can find the optimum by dynamic programming
• just as one can solve longest path in an acyclic graph
• Build a table (sketched below)
• For each vertex v: is there a monotonic path from
s to v with length ℓ collecting prize p?
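A sketch of this dynamic program, under the assumption that prizes have been discretized to small integers (the polynomial-time claim relies on such a discretization; all names are illustrative). Monotone edges go from smaller to larger d_v, so processing vertices in increasing distance gives a DAG-style shortest-path table.

def monotonic_min_length(adj, d, prize, s, max_prize):
    """best[v][p] = min length of a monotone s-v path with (capped) prize p."""
    order = sorted(d, key=d.get)              # increasing distance from s
    INF = float('inf')
    best = {v: [INF] * (max_prize + 1) for v in d}
    best[s][min(prize.get(s, 0), max_prize)] = 0.0
    for u in order:
        for v, w in adj[u]:
            if d[v] <= d[u]:                  # keep only monotone edges
                continue
            for p in range(max_prize + 1):
                if best[u][p] == INF:
                    continue
                q = min(p + prize.get(v, 0), max_prize)
                best[v][q] = min(best[v][q], best[u][p] + w)
    return best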
Solving Min-Excess Path problem
Approximable case: wiggly paths
• Length of the path to v is ℓ_v = d_v + e_v
• If e_v > d_v then ℓ_v > e_v > ℓ_v/2
• i.e., the path takes more than twice as long as necessary to reach v
• So approximating ℓ_v to a constant factor also
approximates e_v to twice that constant factor
Approximating path length
• Can use k-TSP algorithm to find approximately
shortest s-t path with specified prize
• merge s and t into vertex r
• opt path becomes a tour
• solve k-TSP with root r
• “unmerge”: can get one
or more cycles
• connect s and t by shortest
path
[Figure: s and t merged into a root r, turning the optimum s-t path into a tour]
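A sketch of just the contraction step, with an edge-dictionary encoding assumed for illustration; the k-TSP call itself is the cited subroutine [Garg 99] and is not shown. Undirected edges are assumed stored once per pair.

def contract(edges, s, t, r='r'):
    """edges: dict {(u, v): length}. Returns the graph with s and t merged into r."""
    merged = {}
    for (u, v), w in edges.items():
        u2 = r if u in (s, t) else u
        v2 = r if v in (s, t) else v
        if u2 != v2:                          # drop self-loops created by merging
            merged[(u2, v2)] = min(w, merged.get((u2, v2), float('inf')))
    return merged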
Decompose optimum path
[Figure: the optimum path split into alternating monotone and wiggly segments]
Divides into independent problems
> 2/3 of each wiggly path is excess
Decomposition Analysis
• 2/3 of each wiggly segment is excess
• That excess accumulates into whole path
• total excess of the wiggly segments ≤ excess of the whole path
⇒ total length of wiggly segments ≤ 3/2 × the path's excess
• Use dynamic program to find shortest (min-excess)
monotonic segments collecting target prize
• Use k-TSP to find approximately shortest wiggles
collecting target prize
• Approximates length, so approximates excess
• Over all monotonic and wiggly segments,
approximates total excess
Dynamic program
for Min-Excess Path
• For each pair of vertices and each
(discretized) prize value, find
• Shortest monotonic path collecting desired prize
• Approximately shortest wiggly path collecting
desired prize
• Note: polynomially many subproblems
• Use dynamic programming to find optimum
pasting together of segments
Solving Orienteering Problem:
special case
• Given a path from s that
• collects prize Π
• has length ≤ D
• ends at t, the farthest point from s
• For any constant integer r ≥ 1, there
exists a path from s to some v with
• prize ≥ Π/r
• excess ≤ (D − d_v)/r
[Figure: a path from s to t with an intermediate vertex v, annotated with distances]
Solving Orienteering Problem
General case: path ends at an arbitrary t
• Let u be the farthest point from s
• Connect t to s via a shortest path
• One of the two resulting segments ending at u
• has prize ≥ Π/2
• has length ≤ D
⇒ Reduced to the special case
• Using the 4-approximation for
Min-Excess Path, get an
8-approximation for Orienteering
[Figure: path from s to t closed off by a shortest path back to s; u is the farthest point]
Budget Prize-Collecting Steiner
Tree problem
Find a rooted tree of edge cost at most D that spans
maximum amount of prize
• Complement of k-MST
• Create an Euler tour of the optimum tree T* of cost ≤ 2D
• Divide this tour into two paths starting at the root, each
of length ≤ D
• One of them contains at least ½ of total prize
• Path is a type of tree
• Use c-approximation algorithm for Orienteering to
obtain 2c-approximation for Budget PCST
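A sketch of the tour-splitting step, assuming the Euler tour is given as a vertex list starting and ending at the root and that no single edge exceeds the budget (in general one splits mid-edge and shortcuts); the encoding is an illustrative assumption.

def split_tour(tour, length, D):
    """tour: list of vertices starting and ending at the root."""
    t, j = 0.0, 0
    # walk the tour until adding the next edge would exceed the budget D
    while j + 1 < len(tour) and t + length[(tour[j], tour[j + 1])] <= D:
        t += length[(tour[j], tour[j + 1])]
        j += 1
    first = tour[:j + 1]          # path from the root of length <= D
    second = tour[j:][::-1]       # remainder, walked backwards from the root
    return first, second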
Summary
• Showed maximum discounted reward can be
approximated using min-excess path
• Showed how to approximate min-excess path
using k-TSP
• Min-excess path can also be used to solve the
rooted Orienteering problem (previously an open question)
• Also solves “tree” and “cycle” versions of
Orienteering
Open Questions
• Non-uniform discount factors
• each vertex v has its own gv
• Non-uniform deadlines
• each vertex specifies its own deadline by which it
has to be visited in order to collect reward
• Directed graphs
• We used k-TSP, which is only solved for undirected graphs
• For directed, even standard TSP has no known
constant factor approximation
• We only use k-TSP/undirectedness in wiggly parts
Future directions
• Stochastic actions
• Stochastic seems to imply directed
• Special case: forget rewards.
• Given choice of actions, choose to minimize cover
time of graph
• Applying the discounting framework to other
problems:
• Scheduling
• Exponential penalty in place of hard deadlines