
Asynchronous Control for Coupled
Markov Decision Systems
[Figure: timelines for Device 1, Device 2, and Device 3, each with its own frame indices along a shared time axis with tick marks t0, t1, and so on. Block labels: Transmit, Receive, Camera Mode, Image Processing.]
Michael J. Neely
University of Southern California
Information Theory Workshop (ITW)
Lausanne, Sept. 2012
Example: Network of Smart Devices
Each device m has a Processing Chip
and a Wireless Communication Chip.
[Figure: Processing Chip (device m), with States 1 to 4, energy labels, generated bits, and Frames 1 to 3 on a time axis; Wireless Comms Chip (device m), with arriving bits entering a Queue and channel quality plotted against time.]
There are many such devices sharing wireless
resources. Can do opportunistic scheduling.
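Heterogeneous timelines → we must solve
a time averaged fractional optimization:
Minimize:
$$\sum_m \overline{(\text{transmit energy})}_m + \sum_m \frac{\overline{(\text{processing energy})}_m}{\overline{(\text{frame size})}_m}$$
Subject to:
$$\sum_m \frac{\overline{(\text{bits generated for link } i)}_m}{\overline{(\text{frame size})}_m} \le \overline{(\text{transmission})}_i \quad \text{for all links } i \in \{1, \dots, L\}$$
(Overbars denote time averages.)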
General Model
• S separate embedded Markov systems.
• Each system s in {1, …, S} has state space K(s).
• Each system has its own (variable length) frames.
• On frame r for system s:
  - Observe Random Event ω(s)[r].
  - Observe Current State k(s)[r].
  - Choose Control Action α(s)[r].
• The 3-tuple (k(s)[r], ω(s)[r], α(s)[r]) determines:
  - Frame size T(s)[r].
  - Penalty vector (x(s)[r], y1(s)[r], …, yL(s)[r]).
  - Transition Probabilities Pij(s)[r].
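The per-frame model above can be sketched as a small data structure. This is only an illustration: the names `FrameOutcome` and `toy_outcome`, the two-state system, the action labels, and all numbers are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FrameOutcome:
    """What the 3-tuple (state, random event, action) determines on one frame."""
    frame_size: float                      # T(s)[r]
    x: float                               # objective penalty x(s)[r]
    y: Tuple[float, ...]                   # constraint penalties y1(s)[r], ..., yL(s)[r]
    next_state_probs: Tuple[float, ...]    # transition probabilities out of the current state


def toy_outcome(k: int, omega: float, alpha: str) -> FrameOutcome:
    """Hypothetical 2-state system (0 = idle, 1 = active) with L = 1 penalty."""
    if alpha == "work":
        # Assumed: starting work from the idle state costs extra energy.
        energy = 1.5 if k == 0 else 1.0
        # Assumed: the random event omega scales the bits generated on this frame.
        return FrameOutcome(frame_size=2.0, x=energy, y=(omega,),
                            next_state_probs=(0.2, 0.8))
    # Any other action is treated as "rest": short frame, little energy, no bits.
    return FrameOutcome(frame_size=1.0, x=0.1, y=(0.0,),
                        next_state_probs=(0.9, 0.1))


outcome = toy_outcome(k=0, omega=3.0, alpha="work")
print(outcome)
```

Each system would carry its own such mapping and its own frame index r, which is what makes the timelines asynchronous.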
Generalized Goal:
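Minimize:
$$\sum_s \frac{\overline{x}^{(s)}}{\overline{T}^{(s)}}$$
Subject to:
$$\sum_s \frac{\overline{y}_i^{(s)}}{\overline{T}^{(s)}} \le d_i \quad \text{for all penalties } i \in \{1, \dots, L\}$$
(Overbars denote time averages.)
• Fractional terms with different denominators.
• General problems of this type are intractable.
• This has special structure that admits an optimal solution.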
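To see why each term is a ratio of averages rather than an average of ratios, here is a minimal sketch for a single system with variable-length frames. The function name and the sample numbers are made up for illustration.

```python
def time_average_rate(penalties, frame_sizes):
    """Penalty per unit time over a run of frames: total penalty / total time."""
    return sum(penalties) / sum(frame_sizes)


# Two frames of one hypothetical system: sizes 1 and 3, penalties 1 and 6.
x = [1.0, 6.0]
T = [1.0, 3.0]

ratio_of_sums = time_average_rate(x, T)                    # 7 / 4
mean_of_ratios = sum(xi / Ti for xi, Ti in zip(x, T)) / len(x)  # (1 + 2) / 2

print(ratio_of_sums, mean_of_ratios)
```

The two quantities differ because long frames occupy more of the timeline. With several systems, each term has its own denominator, which is the source of the difficulty noted above.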
Theorem 1:
Consider the special case with no random event processes
ω(s)[r]. Then:
1. The problem can be transformed into a linear
program via a nonlinear change of variables.
2. The total complexity is linear in the number of
systems S.
Translation: Total complexity is essentially the same as
having each system solve its own MDP over its own
state space. There is no curse of dimensionality as the
number of systems S grows large!
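The slides do not spell out the change of variables, so the following is only a sketch of one standard substitution of this kind, and it may differ from the construction used in the talk. It assumes finite state and action sets, strictly positive frame sizes, and stationary randomized policies. The symbols $\pi$, $z$, and $\mathcal{A}^{(s)}$ are introduced here for illustration.

```latex
% Assumed: \pi^{(s)}(k,\alpha) is the steady-state fraction of frames on which
% system s is in state k and takes action \alpha, with transition
% probabilities P^{(s)}_{kj}(\alpha) and no random events.
\[
\frac{\overline{x}^{(s)}}{\overline{T}^{(s)}}
  = \frac{\sum_{k,\alpha} \pi^{(s)}(k,\alpha)\, x^{(s)}(k,\alpha)}
         {\sum_{k,\alpha} \pi^{(s)}(k,\alpha)\, T^{(s)}(k,\alpha)}
\]
% Nonlinear change of variables: divide by the average frame size.
\[
z^{(s)}(k,\alpha)
  = \frac{\pi^{(s)}(k,\alpha)}
         {\sum_{k',\alpha'} \pi^{(s)}(k',\alpha')\, T^{(s)}(k',\alpha')}
\]
% The problem is then linear in z:
\begin{align*}
\text{Minimize:}   \quad & \sum_s \sum_{k,\alpha} z^{(s)}(k,\alpha)\, x^{(s)}(k,\alpha) \\
\text{Subject to:} \quad & \sum_s \sum_{k,\alpha} z^{(s)}(k,\alpha)\, y_i^{(s)}(k,\alpha) \le d_i
                           \quad \text{for all } i \in \{1,\dots,L\} \\
 & \sum_{\alpha} z^{(s)}(j,\alpha)
   = \sum_{k,\alpha} z^{(s)}(k,\alpha)\, P^{(s)}_{kj}(\alpha)
   \quad \text{for all } s,\ j \\
 & \sum_{k,\alpha} z^{(s)}(k,\alpha)\, T^{(s)}(k,\alpha) = 1,
   \quad z^{(s)}(k,\alpha) \ge 0 \quad \text{for all } s
\end{align*}
```

Under these assumptions the number of variables is $\sum_s |K^{(s)}|\,|\mathcal{A}^{(s)}|$, which grows linearly in S when the per-system sizes are bounded. The size of the product state space never appears.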
Now Treat Random Events
Example:
• L channels, each with 10000 quality levels.
• 10000^L probabilities for the quality vector ω(s)[r]
(cannot estimate this huge number of statistics).
• Even single “standard” MDPs do not have such
random event processes ω(s)[r].
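A quick arithmetic check of the count in the example above, assuming each of the L channels independently takes one of 10000 quality levels (the function name is illustrative):

```python
def num_joint_probabilities(levels_per_channel: int, num_channels: int) -> int:
    """Number of joint outcomes of the quality vector, one probability each."""
    return levels_per_channel ** num_channels


for L in (1, 2, 3):
    print(L, num_joint_probabilities(10_000, L))
```

Even L = 3 gives 10^12 joint outcomes, so the full distribution cannot be estimated from samples.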
Idea:
• Use Lyapunov Optimization and Virtual Queues to
estimate appropriate scalar max-weight functionals.
• Theorem 2: This is a computational tool for total
optimality with no curse of dimensionality.
Overview of Algorithm
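Virtual Queue Update (for system s, penalty i, state k):
$$Z_i[r+1] = \max\Big[ Z_i[r] + \sum_s \theta^{(s)}[r]\, y_i^{(s)}[r] - d_i,\; 0 \Big]$$
$$H_k^{(s)}[r+1] = H_k^{(s)}[r] + \theta^{(s)}[r]\, 1_k^{(s)}[r] - \sum_i \theta^{(s)}[r]\, q_{ik}^{(s)}[r]$$
$$J^{(s)}[r+1] = J^{(s)}[r] + \theta^{(s)}[r]\, T^{(s)}[r] - 1$$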
θ(s)[r] is an auxiliary variable related to 1/(frame size).
Use a drift-plus-penalty (or “max-weight”) decision
to choose actions on each frame based on virtual
queue values and observed random events ω(s)[r].
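The Z and J updates can be sketched in a few lines of Python. This sketch makes a simplifying assumption: all systems share one update index r, whereas the talk's systems run on asynchronous frames. The function names and sample numbers are hypothetical. The H update is left out because the slide does not define q, and the choice of actions and of θ by the drift-plus-penalty rule is not shown.

```python
def update_Z(Z, theta, y, d):
    """Z[i] <- max(Z[i] + sum_s theta[s] * y[s][i] - d[i], 0).

    Z:     one virtual queue value per penalty i
    theta: auxiliary variable theta(s)[r], one per system s
    y:     y[s][i] is penalty i incurred by system s on this update
    d:     constraint bound d_i, one per penalty i
    """
    num_systems = len(theta)
    return [
        max(Z[i] + sum(theta[s] * y[s][i] for s in range(num_systems)) - d[i], 0.0)
        for i in range(len(Z))
    ]


def update_J(J, theta, T):
    """J[s] <- J[s] + theta[s] * T[s] - 1, one queue per system s."""
    return [J[s] + theta[s] * T[s] - 1.0 for s in range(len(J))]


# Two systems, two penalties; all values are made up for illustration.
theta = [0.5, 0.25]
T = [2.0, 4.0]
y = [[2.0, 0.0], [4.0, 0.0]]
d = [1.0, 3.0]

Z_next = update_Z([0.0, 1.0], theta, y, d)
J_next = update_J([0.0, 0.0], theta, T)
print(Z_next, J_next)
```

In this toy run θ(s) equals 1/T(s) exactly, so J stays at zero, which matches the slide's remark that θ(s)[r] is related to 1/(frame size).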