Markov decision processes with threshold-based
piecewise-linear optimal policies
T. Erseghe, A. Zanella, C. Codemo
Dept. of Information Engineering, University of Padova, Italy
Padova, June 20, 2013
Outline
1. Problem statement
2. Piecewise linear policies
3. Power control example
4. Smart grid example
Reference problem (in the infinite horizon)
Notation
Consider a (stationary) Markov chain with
1. state x
2. action a to map into next state y = f(a, x)
3. admissible action space a ∈ A(x)
4. cost γ(a, x) associated with action a
5. policy π(x) providing an action a = π(x)
Problem
We want to identify an optimal policy π∗ that minimizes the long term average cost, i.e.,
\[ \lambda = \min_{\pi} \lim_{T \to \infty} \left\{ \frac{1}{T} \sum_{t=0}^{T-1} E\big[ \gamma(\pi(x(t)), x(t)) \big] \right\} \]
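The long term average cost of a given policy can be estimated by simulation. The sketch below is only an illustration: the function names (average_cost, policy, step, cost) and the toy chain in the usage note are assumptions, not part of the talk, and a finite horizon replaces the limit T → ∞.

```python
import random

def average_cost(policy, step, cost, x0, horizon, seed=0):
    """Empirical average of cost(policy(x), x) along one simulated trajectory.

    policy(x) -> a, step(a, x, rng) -> next state, cost(a, x) -> float.
    All three callables are hypothetical placeholders for pi, f and gamma.
    """
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(horizon):
        a = policy(x)
        total += cost(a, x)
        x = step(a, x, rng)
    return total / horizon

# Toy deterministic chain (assumed for illustration): one unit arrives per
# slot and the policy always empties the buffer, paying a quadratic cost.
toy_policy = lambda x: -(x + 1)
toy_step = lambda a, x, rng: x + a + 1
toy_cost = lambda a, x: float(a * a)
estimate = average_cost(toy_policy, toy_step, toy_cost, x0=0, horizon=100)
```

Comparing such estimates for different policies gives a quick sanity check of an optimizer's output, although it does not replace the dynamic programming solution.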
Example: Power control under buffer constraints
Sensor model
The sensor has a queue of maximum length Q.
At every time slot T:
1. L (fixed) bits are added to the queue
2. d bits are sent to the channel
3. the (normalized) power required for transmission is
\[ \frac{P}{N_0 W} = \frac{2^{d/(WT)} - 1}{g} \]
where g is channel attenuation
4. channel is (time-varying) Rayleigh with Jakes correlation
Target action
We want to minimize the (long term) average power P
Example (cont’d)
Equivalent model
1. state is x = [xs, xv] with xs the number of queued bits, and xv = g the attenuation of the channel
2. action is a = −d, minus the number of transmitted bits
3. admissible action space is
A(x) = [−xs − L, Q − xs − L] ∩ (−∞, 0]
to guarantee that no bits are lost, and that d ≥ 0
4. cost is
\[ \gamma(a, x) = \frac{2^{-a/(WT)} - 1}{x_v} \]
5. policy π(xs, xv) assumes that the channel state is known at transmission
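The equivalent model can be written down directly. The following is a minimal sketch, assuming scalar inputs; the function names are hypothetical and the numerical values in the usage lines are chosen only for illustration.

```python
def admissible_actions(xs, L, Q):
    """Interval [lo, hi] for a = -d such that the next queue xs + a + L
    stays in [0, Q] and the number of transmitted bits d is non-negative."""
    lo = -xs - L
    hi = min(Q - xs - L, 0.0)
    return lo, hi

def power_cost(a, xv, W, T):
    """Normalized power (2^(-a/(W T)) - 1) / xv spent when d = -a bits are sent."""
    return (2.0 ** (-a / (W * T)) - 1.0) / xv

# Illustrative values: 1280 queued bits, L = 640, Q = 6400, W T = 1560.
lo, hi = admissible_actions(1280.0, 640.0, 6400.0)
idle_cost = power_cost(0.0, 1.0, 156e3, 0.01)
```

Sending nothing (a = 0) costs nothing, and the cost grows exponentially with the number of transmitted bits, which is why the piecewise linear approximation is needed later on.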
Off-the-shelf solution
Dynamic programming (Bellman's equation)
\[ \lambda + g(x) = \min_{a \in A(x)} \left\{ \gamma(a, x) + \int p(y|a, x)\, g(y)\, dy \right\} \]
where g(x) is some function satisfying g(0) = 0
Numerical solution (value iteration)
Start from g0(x) = 0, and iteratively apply
\[ U_t(a, x) = \int p(y|a, x)\, g_t(y)\, dy \]
\[ \pi_t^*(x) = \operatorname*{argmin}_{a \in A(x)} \; \gamma(a, x) + U_t(a, x) \]
\[ \tilde{g}_{t+1}(x) = \gamma(\pi_t^*(x), x) + U_t(\pi_t^*(x), x) \]
\[ g_{t+1}(x) = \tilde{g}_{t+1}(x) - \tilde{g}_{t+1}(0) \]
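On a finite grid the integral becomes a sum and the iteration above can be sketched in a few lines. This is an illustration under stated assumptions: states and actions are discretized, P[a, x, y] and cost[a, x] are hypothetical array names, inadmissible actions carry an infinite cost, and state index 0 plays the role of the reference state x = 0.

```python
import numpy as np

def relative_value_iteration(P, cost, n_iter=1000, tol=1e-9):
    """Relative value iteration for the average-cost problem.

    P[a, x, y] : transition probability from x to y under action a
    cost[a, x] : stage cost (np.inf where a is not admissible in x)
    Returns (lam, g, policy) with g[0] = 0.
    """
    g = np.zeros(P.shape[1])
    lam, policy = 0.0, np.zeros(P.shape[1], dtype=int)
    for _ in range(n_iter):
        U = P @ g                       # U[a, x] = sum_y P[a, x, y] g[y]
        q = cost + U
        policy = np.argmin(q, axis=0)   # search over actions for every state
        g_tilde = np.min(q, axis=0)
        lam = g_tilde[0]                # estimate of the average cost
        g_new = g_tilde - g_tilde[0]    # normalization g(0) = 0
        converged = np.max(np.abs(g_new - g)) < tol
        g = g_new
        if converged:
            break
    return lam, g, policy

# Toy example (assumed): action 0 stays, action 1 switches state;
# state 0 costs 1 per slot and state 1 costs 2 per slot.
P_toy = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.0, 1.0], [1.0, 0.0]]])
cost_toy = np.array([[1.0, 2.0], [1.0, 2.0]])
lam_toy, g_toy, policy_toy = relative_value_iteration(P_toy, cost_toy)
```

The argmin over the action axis is the per-state search whose cost motivates the sub-gradient approach of the following slides.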
Limits of value iteration
Computational load :-(
\[ \pi_t^*(x) = \operatorname*{argmin}_{a \in A(x)} \; \gamma(a, x) + U_t(a, x) \]
involves a search on A(x) for each value of x
Storage requirements :-(
The policy π(xs, xv) may require large storage space
We look for algorithm counterparts requiring lower computational demand and allowing for compact expression of policies (e.g., piecewise linear) !!!
How the idea originated
Citation
P. Van de Ven, N. Hegde, L. Massoulie, and T. Salonidis, Optimal control of residential energy storage under price fluctuations, in Proc. of IARIA ENERGY 2011, Venice (I), May 2011, pp. 159-162.
Contribution
A piecewise-linear threshold-based optimal policy was identified for linear costs
Idea
Can this be generalized to piecewise linear costs?
Requirements for piecewise linear policies
Assumption 1
The state should have the form x = [xs, xv] with xs deterministically controlled by action a, and xv independent of both xs and action a, that is,
p(y|a, x) = δ(ys − f(a, x)) p(yv|xv)
Assumption 2
The deterministic function f(a, x) is linear with respect to xs and a, that is,
f(a, x) = c1(xv) · xs + c2(xv) · a + c3(xv)
with positive c1 and c2
Requirements for piecewise linear policies (cont’d)
Assumption 3
The action space
R(xv) = { (a, xs) | a ∈ A(xs, xv), xs ∈ Xs }
is closed and convex
Assumption 4
The cost function γ only depends on xv and a, i.e.,
γ(a, x) = γ(a, xv).
Moreover, γ is (or can be approximated as) convex and piecewise linear in a
Requirements for piecewise linear policies (cont’d)
[Figure: convex piecewise linear cost γ(a, xv) versus a, and its derivative ∂γ(a, xv)/∂a versus a, a staircase taking values d1(xv), ..., d5(xv) with breakpoints b1(xv), ..., b4(xv).]
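A convex cost can be brought into the piecewise linear form by interpolating it between a few knots. The sketch below is an assumption-laden illustration: the function names are hypothetical, the knots include the two outer endpoints, and the chord slopes play the role of the values dn while the interior knots play the role of the breakpoints bn.

```python
import numpy as np

def chord_slopes(cost, knots):
    """Slopes of the chords of `cost` between consecutive knots."""
    knots = np.asarray(knots, dtype=float)
    return np.diff(cost(knots)) / np.diff(knots)

def piecewise_linear(a, cost, knots):
    """Chord interpolant of `cost` evaluated at a."""
    knots = np.asarray(knots, dtype=float)
    return np.interp(a, knots, cost(knots))

# Normalized power cost 2^(-a) - 1 on a in [-3, 0], with unit-spaced knots
# chosen only for illustration.
cost_fn = lambda a: 2.0 ** (-a) - 1.0
knots = [-3.0, -2.0, -1.0, 0.0]
slopes = chord_slopes(cost_fn, knots)
midpoint_value = piecewise_linear(-1.5, cost_fn, knots)
```

For a convex cost the chord slopes come out non-decreasing, which is the property that Assumption 4 relies on.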
Fields of application
General
Any buffered system with Markovian inputs and parameters
Power control under buffer constraints
Where we need to set f(a, x) = xs + a + L, and where γ needs to be approximated with a piecewise linear function
Energy storage (battery) management in the Smart Grid
Will see later on! ;-)
Main result
Piecewise linear policies
Under the above assumptions optimal policies assume the tilted staircase form
\[ \pi_t^*(x) = \max_{n=1,\ldots,B+1} \; \min\left\{ \beta_n(x_v) - \frac{c_1(x_v)}{c_2(x_v)}\, x_s ,\; b_n(x_v) \right\} \]
[Figure: πt∗(xs, xv) versus xs, a tilted staircase with flat segments at levels bn−1(xv) and bn(xv) joined by sloped segments βn−1(xv) − (c1(xv)/c2(xv)) xs and βn(xv) − (c1(xv)/c2(xv)) xs.]
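The max-min expression is cheap to evaluate once the constants are known. A minimal sketch for one fixed value of xv follows; the function name and the numerical values of βn and bn are hypothetical and serve only to show the staircase shape.

```python
import numpy as np

def tilted_staircase(xs, beta, b, c1=1.0, c2=1.0):
    """Evaluate max_n min(beta_n - (c1/c2) xs, b_n) for a fixed xv.

    beta, b : sequences of equal length holding beta_n(xv) and b_n(xv).
    """
    beta = np.asarray(beta, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.max(np.minimum(beta - (c1 / c2) * xs, b)))

# Illustrative constants (assumed): three pieces, unit c1 and c2.
beta_demo = [10.0, 6.0, 2.0]
b_demo = [-4.0, -2.0, 0.0]
samples = [tilted_staircase(x, beta_demo, b_demo) for x in (0.0, 3.0, 5.0, 8.5, 12.0)]
```

With these values the policy is flat at 0, then slopes down to −2, stays flat, slopes down to −4 and stays flat again, so only the pairs (βn, bn) need to be stored for each xv.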
Main result (cont’d)
Identifying constants βn
The constants
\[ \beta_n(x_v) = \frac{w_n(x_v) - c_3(x_v)}{c_2(x_v)} \]
are defined via
\[ w_n(x_v) = \operatorname*{argmin}_{x_s \in X_s} \; d_n(x_v) + c_2(x_v)\, \partial_{x_s} G_t(x_s, x_v) \qquad (1) \]
where ∂xs Gt expresses the sub-differential of convex function
\[ G_t(x) = \int p(y_v|x_v)\, g_t(x_s, y_v)\, dy_v \]
Remark
The search in (1) is a search for level −dn(xv)/c2(xv) of non-decreasing function ∂xs Gt(xs, xv) !!!
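Because ∂xs Gt is non-decreasing in xs, the level search of the remark can be done by bisection on a sorted array. The sketch below assumes the sub-differential is sampled on a grid of xs values; the function names, the clamping rule at the grid edge and the sample data are assumptions for illustration.

```python
import numpy as np

def level_crossing(xs_grid, dG, level):
    """First grid point where the non-decreasing samples dG reach `level`.

    If the level is never reached, the last grid point is returned
    (an assumed convention, not taken from the talk).
    """
    idx = int(np.searchsorted(dG, level, side="left"))
    idx = min(idx, len(xs_grid) - 1)
    return float(xs_grid[idx])

def beta_from_w(w, c2, c3):
    """beta_n = (w_n - c3) / c2."""
    return (w - c3) / c2

# Illustrative data: sampled sub-differential and one slope d_n = 1, c2 = 1.
xs_grid = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
dG = np.array([-3.0, -1.0, 0.0, 2.0, 5.0])
w_demo = level_crossing(xs_grid, dG, level=-1.0 / 1.0)
beta_demo_value = beta_from_w(w_demo, c2=1.0, c3=0.5)
```

Each search costs a logarithmic number of comparisons in the grid size, in contrast with the per-state search over A(x) in value iteration.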
Alternative to standard value iteration
We track the sub-differential ∂xs Gt(x), that is,
1: Set ∂xs G0(x) = 0
2: for t = 0, 1, . . . do
3:   Evaluate constants wn(xv)
4:   Synthesize the optimal policy πt∗(x)
5:   Evaluate the sub-differential of gt+1(x) using
     \[ \partial_{x_s} g_{t+1}(x) = \partial_a \gamma\big(\pi_t^*(x), x_v\big)\, \tilde{\partial}_{x_s} \pi_t^*(x) + \partial_{x_s} G_t\big(f(\pi_t^*(x), x), x_v\big) \Big[ c_1(x_v) + c_2(x_v)\, \tilde{\partial}_{x_s} \pi_t^*(x) \Big] \]
6:   Update the sub-differential ∂xs Gt+1(x) using
     \[ \partial_{x_s} G_{t+1}(x) = \int p(y_v|x_v)\, \partial_{x_s} g_{t+1}(x_s, y_v)\, dy_v \]
7: end for
Computational load
Standard value iteration
\[ \pi_t^*(x) = \operatorname*{argmin}_{a \in A(x)} \; \gamma(a, x) + U_t(a, x) \]
involves a search on A(x) for each value of x
Sub-gradient tracking
\[ w_n(x_v) = \operatorname*{argmin}_{x_s \in X_s} \; d_n(x_v) + c_2(x_v)\, \partial_{x_s} G_t(x_s, x_v) \]
involves a search on Xs for any value of xv (independently of the number of breakpoints B)
Overall gain
\[ \frac{\sum_x |A(x)|}{\sum_x 1} \gg 1 \]
Settings
Data source and queue settings
We assume constant bit rate data transmission at 64 kbit/s:
1. slot period T = 10 ms
2. arrival rate of L = 640 bit per slot
3. queue length Q = 10L
Transmission settings
We adopt an 802.15.4-like scenario:
1. equivalent bandwidth W = 156 kHz
2. Rayleigh channel with average gain of −25 dB
3. Jakes correlation with Doppler frequency fd = 11 Hz (5 km/h)
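The derived quantities follow from the listed settings by simple arithmetic. The short sketch below just recomputes them; the function and constant names are hypothetical.

```python
RATE_BPS = 64_000      # constant bit rate source, bit/s
SLOT_S = 10e-3         # slot period T, seconds
BANDWIDTH_HZ = 156e3   # equivalent bandwidth W, Hz

def slot_parameters(rate_bps=RATE_BPS, slot_s=SLOT_S, bandwidth_hz=BANDWIDTH_HZ):
    """Return (L, Q, W*T): bits per slot, queue length in bits, and the
    time-bandwidth product that scales the exponent of the power cost."""
    bits_per_slot = rate_bps * slot_s
    queue_bits = 10 * bits_per_slot
    time_bandwidth = bandwidth_hz * slot_s
    return bits_per_slot, queue_bits, time_bandwidth

L_bits, Q_bits, WT = slot_parameters()
```

This gives L = 640 bit, Q = 6400 bit and WT = 1560, so sending exactly L bits in a slot corresponds to an exponent L/(WT) of about 0.41 in the power formula.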
Simulation results
Chosen approaches
1. CP (continuous policy) = standard policy iteration
2. TP (threshold based policy) = sub-gradient approach with a four-piece linear approximation of the cost function
Cost approximation and parameters βn
[Figure: (a) cost 2^x − 1 versus x together with the TP piecewise linear approximation; (b) TP parameters βn∗ [kbit], from β1∗ to β4∗, versus xv [dB].]
Simulation results (cont’d)
TP versus CP
[Figure: (a) CP and (b) TP policies π∗ [kbit] versus xs [kbit]; (c) CP and (d) TP normalized power P/(N0 W) versus time [s].]
System model
[Figure: an Energy Storage Unit with N batteries (charge levels E1(t), ..., EN(t), variations δ1(t), ..., δN(t), functions f1, ..., fN) is connected to an energy splitter/concentrator, which also connects to the Grid and the User; the flows are labelled S(t), X(t) and L(t).]
1. overall power request L(t) (from loads)
2. energy drawn from the utility X(t)
3. energy taken from local storage S(t)
4. cost is C(X(t)) with C(·) convex and non-decreasing
Battery model
Charge/discharge constraints
1. storage capacity limits \( 0 \le E_i(t) \le \overline{E}_i \)
2. one-slot variation of charge δi(t) = Ei(t + 1) − Ei(t)
3. charge/discharge limits \( \underline{\delta}_i \le \delta_i(t) \le \overline{\delta}_i \)
Dissipation
\[ S(t) = \sum_{i=1}^{N} f_i(\delta_i(t)), \qquad f_i(\delta) = \begin{cases} \delta / \eta_{ci}, & \delta \ge 0 \text{ (charge)} \\ \delta\, \eta_{di}, & \delta < 0 \text{ (discharge)} \end{cases} \]
where ηci, ηdi ∈ (0, 1] are charge and discharge efficiency coefficients
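The dissipation model translates directly into code. A minimal sketch follows, with hypothetical function names and efficiency values chosen only for illustration.

```python
def dissipation(delta, eta_c, eta_d):
    """f_i(delta): energy exchanged for a one-slot charge variation delta.

    Charging by delta >= 0 requires delta / eta_c; discharging by delta < 0
    returns only delta * eta_d.
    """
    if delta >= 0:
        return delta / eta_c
    return delta * eta_d

def storage_exchange(deltas, eta_c, eta_d):
    """S(t): sum of f_i(delta_i) over all batteries."""
    return sum(dissipation(d, c, e) for d, c, e in zip(deltas, eta_c, eta_d))

# Two batteries with assumed efficiencies: one charging, one discharging.
s_demo = storage_exchange([1.0, -1.0], [0.8, 0.8], [0.5, 0.5])
```

Each fi is piecewise linear with a single kink at δ = 0, and it is convex because the charging slope 1/ηci is never smaller than the discharging slope ηdi.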
Mapping into reference model
1. state is x = [xs, xv] with xs = e, charge levels, and xv = ℓ, load request
2. action is a = δ, one-slot variations
3. state update is ys = xs + a = e + δ
4. admissible action space is
\[ A(x) = A(x_s) = \left\{ \delta : \delta_i \in [\underline{\delta}_i, \overline{\delta}_i],\; e_i + \delta_i \in [0, \overline{E}_i] \right\} \]
5. cost is
\[ \gamma(a, x_v) = \gamma(\delta, \ell) = C\Big( \ell + \sum_{i=1}^{N} f_i(\delta_i) \Big) \]
and is piecewise linear whenever C is piecewise linear
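For a single battery the admissible set is an interval and the cost is a composition of two piecewise linear maps. The sketch below assumes one battery; the function names and the tariff used in the usage lines are hypothetical.

```python
def admissible_interval(e, delta_min, delta_max, capacity):
    """Interval of delta with delta in [delta_min, delta_max]
    and e + delta in [0, capacity]."""
    lo = max(delta_min, -e)
    hi = min(delta_max, capacity - e)
    return lo, hi

def stage_cost(delta, load, C, eta_c, eta_d):
    """gamma(delta, load) = C(load + f(delta)) for one battery."""
    f = delta / eta_c if delta >= 0 else delta * eta_d
    return C(load + f)

# Assumed convex, non-decreasing, piecewise linear tariff for illustration.
tariff = lambda x: max(x, 2.0 * x - 10.0)
interval_demo = admissible_interval(2.0, -6.0, 6.0, 12.0)
cost_discharge = stage_cost(-1.0, 5.0, tariff, 1.0, 1.0)
cost_charge = stage_cost(4.0, 8.0, tariff, 1.0, 1.0)
```

Discharging lowers the energy drawn from the grid and therefore the cost, while charging at a high load pushes the tariff onto its steeper piece.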
Simulations (with one battery)
[Figure: (top) power requests, PDF versus ℓ [kW], with marker mL; (bottom) costs, cost functions C1(ℓ), C2(ℓ), C3(ℓ) in cost units versus ℓ [kW].]
Simulations (cont’d)
Chosen approaches
1. DPM = sub-gradient based/policy iteration approach with Markovian load request
2. DPI = DPM assuming i.i.d. load requests
3. ST = threshold based simple policy [Tassiulas 2011]
4. AOS = optimum offline solution (upper bound)
Realistic power requests
Power request process L(t) is generated using the sophisticated and realistic model proposed in [Richardson 2010]. A group of 20 users over the 31 days of January was selected.
Simulations (cont’d)
Cost improvement (% wrt no battery)
[Figure: cost improvement Γ versus δ [kW] for AOS, DPM, DPI and ST, in three panels (a)-(c), one for each of E = 6 kWh, E = 12 kWh and E = 24 kWh.]
Simulations (cont’d)
Dependence on cost function (δ = ½ E)
[Figure: cost improvement Γ versus E [kWh] in three panels (a)-(c), with labels C1, C2 and C3 for the cost functions.]
Simulations (cont’d)
Policy parameters βn
[Figure: (a) β2(ℓ) [kWh] versus ℓ [kW]; (b) β3(ℓ) [kWh] versus ℓ [kW].]
For cost function C3, E = 24 kWh and δ = 24 kW
Simulations (cont’d)
Even more compact policy parameters βn
1. β1(ℓ) = E
2. \[ \beta_2(\ell) = \begin{cases} E, & \ell \le L_1 \\ \max(0, -7.1 + 0.275\, \ell), & \ell > L_1 \end{cases} \]
3. \[ \beta_3(\ell) = \begin{cases} \max(0, 5.87 + 0.186\, \ell), & \ell \le L_2 \\ \max(0, -1.65 + 0.0347\, \ell), & \ell > L_2 \end{cases} \]
4. β4(ℓ) = 0
which can be used for any E ≤ 24 kWh and δ ≤ 24 kW at absolutely no loss in performance !!!!
Simulations (cont’d)
Contour plot of DPM cost improvement
[Figure: contour lines of the DPM cost improvement in the plane of δ [kW] and E [kWh], with the points (δL, EL) and (δU, EU) and lines labelled with values of R marked.]
Thanks for your attention!
Papers
1. Erseghe, Zanella, Codemo, Markov Decision Processes with Threshold Based Piecewise Linear Optimal Policies, IEEE Wireless Comm. Letters, early access
2. Codemo, Erseghe, Zanella, Energy Storage Optimization Strategies for Smart Grids, ICC 2013
3. Erseghe, Zanella, Codemo, Optimal and Compact Control Policies for Energy Storage Units with Single and Multiple Batteries, submitted IEEE Trans. on Smart Grid