Quadratic Approximation of Cost Functions in Lost
Sales and Perishable Inventory Control Problems
Peng Sun, Kai Wang, Paul Zipkin
Fuqua School of Business, Duke University, Durham, NC 27708, USA psun, kai.wang, [email protected]
We propose an approximation scheme for two important but difficult single-product inventory systems,
specifically, the lost-sales system and the perishable product system. The approach is based on quadratic
approximations of cost-to-go functions within the framework of Approximate Dynamic Programming via
Linear Programming. We propose heuristics based on the approximation and evaluate their performance.
Numerical results show the promise of our approach.
History: This version: March 22, 2016
1. Introduction
1.1. Overview
Periodic-review inventory models with positive lead times can be hard to solve. The difficulty
arises from the high number of state variables needed to represent the pipeline inventories of
different ages. Dynamic programming for such a model therefore suffers from the so-called “curse
of dimensionality.” An exception is the basic model with back orders. In this case, it is well-known
that we can reduce the state to a single variable, the inventory position, and the optimal policy
takes the simple form of a base-stock policy. This lovely reduction does not work for many other
systems, however, including systems with lost sales or a perishable product. This paper proposes
an approximate solution framework for these two problems.
In a system with lost sales, the state of the system is a vector of dimension equal to the lead time.
The vector contains the on-hand inventory and also all the outstanding replenishment orders in the
pipeline. The structure of the optimal policy has been partially characterized in Karlin and Scarf
(1958), Morton (1969), and Zipkin (2008b). A numerical evaluation of existing plausible heuristics
by Zipkin (2008a) has found room for improvement even for fairly short lead times. For example,
one of the best-performing policies, namely the myopic2 policy, is computationally expensive, and
its performance appears to deteriorate as the lead time increases.
A perishable inventory system has a multidimensional state even with zero lead time. The state
of the system contains information on the age distribution of the inventory. Most existing works on
perishable inventory focus on this setting, including the analysis of the optimal policy structure and
development of heuristics. Even more challenging is a system with positive lead time. Even a partial
characterization of the optimal policy was developed only recently, by Chen et al. (2014b),
and a heuristic policy that works well is yet to be proposed.
These two inventory systems share an interesting structure: after a certain transformation of
the state variables, the cost-to-go functions are L♮-convex (Zipkin 2008b, Chen et al. 2014b).
This property is a concept from discrete convex analysis, related to convexity, submodularity, and
diagonal dominance. The property reveals structural characteristics of the optimal policy. In this
paper, we propose an approximation scheme for these systems that exploits this structure.
The overall framework of the approach is Approximate Dynamic Programming via Linear Programming (ADP-LP) (de Farias and Van Roy 2003b). This technique requires us to specify a convex
cone of functions, and it approximates the cost-to-go function by an element of this cone. Here, we
specify the cone so that all its functions are themselves L♮-convex. Thus, we aim to preserve this
structural property of the true cost-to-go function.
In particular, we focus on quadratic L♮-convex functions. There are several reasons for this
restriction. First, as shown in Morton (1969) and Zipkin (2008b), the effective state space is a fairly
small compact set, and casual examination of the exact cost-to-go function in several cases suggests
that it is fairly smooth. Any smooth function is nearly quadratic over a small set. Second, this cone
has a fairly small number of extreme rays, and so the resulting optimization model has a small
number of variables. Third, the simplest policy obtained from such a function (called LP-greedy)
is essentially a variant of a base-stock policy, where the inventory position is replaced by a (nearly)
linear function of the state variables. This form is easy to understand and implement.
For the lost-sales system, we propose several heuristic policies. A direct greedy policy (LP-greedy) derives the order quantity using the approximate cost-to-go function from the ADP-LP as the
continuation function. It is essentially a generalized base-stock policy, which depends on a "twisted"
inventory position that is linear in all outstanding orders, plus a separable nonlinear function of
the current inventory. There is also a "second-degree" greedy policy (LP-greedy-2), where we use
the optimal cost of the greedy policy as the next period's cost-to-go. The linear-quadratic structure
can be further exploited by applying value iterations to the quadratic approximate cost function
obtained from the ADP-LP. This creates an entire class of order-up-to policies, ranging from LP-greedy, which is nearly linear in the state variables, to a policy we call T^L that is nonlinear (but
still fairly simple) in the state variables. We also have a one-step-further greedy policy, which we
call T^{L+1}, that searches for the best order quantity using the objective function from T^L as the
cost-to-go function.
We then develop similar heuristics for the perishable inventory problem. Specifically, we have
a class of policies ranging from LP-greedy to what we call T^Λ, which can also be interpreted as
base-stock policies, using a "twisted" inventory position. We also have a perishable version of the
T^{L+1} policy. We also introduce a myopic policy, which is a natural extension of the myopic heuristic
introduced by Nahmias (1976a) to positive lead times. To the best of our knowledge, these are the
first heuristics proposed for the perishable inventory model with positive lead times.
In the numerical study, we evaluate these heuristics against benchmark heuristics, namely, myopic
policies. The results show that the heuristics perform better than the benchmarks in nearly all
cases. The more complex heuristics are somewhat better than the simpler ones, but the differences
are modest.
For the lost-sales system we also test the modified myopic policy introduced recently in Brown
and Smith (2014), which we call the B-S myopic policy. They find that this policy performs quite
well in a few instances. We consider a larger range of problems. We observe that the policy performs
as well as or better than the other heuristics.
1.2. Literature Review
The lost-sales model with positive lead time was first formulated by Karlin and Scarf (1958) and
further explored by Morton (1969). The base-stock policy was found not to be optimal for such systems,
and the optimal order quantity is partially characterized as decreasing in all pipeline inventories,
with increasing but limited sensitivity from the most dated to the most recent order. Morton (1969)
also derives bounds for the optimal policy and suggests such bounds could be used as a simple
heuristic, later called the standard vector base-stock policy. Various other heuristics have been
proposed. Morton (1971) studies a single-period myopic policy based on a modified accounting
scheme, which is extended by Nahmias (1976b) to more general settings. Levi et al. (2008) introduce a dual-balancing policy that is guaranteed to yield costs no greater than twice the optimal.
Asymptotic analysis has been done in both directions of lead time and penalty cost. The base-stock
policies were found by Huh et al. (2009) to be asymptotically optimal as the penalty cost increases,
while Goldberg et al. (2012) show asymptotic optimality of constant-order policies proposed by
Reiman (2004) for large lead times. Zipkin (2008a) evaluates the performance of various heuristics
and compares them with the optimal policy for lead times up to four. He finds that base-stock
policies perform poorly, and a two-period horizon version of Morton (1971)’s myopic policy, namely
myopic2, generally performs best among the heuristics studied. In a recent development, Brown
and Smith (2014) propose a modified myopic policy with an adjustable parameter, the terminal
cost. They demonstrate that, with a good choice of the terminal cost (found by simulation), this
heuristic performs well. We contribute to the body of work on heuristics for the lost-sales model
by introducing a class of well-performing policies that preserve the structural characterization of
the optimal policy.
Most literature on perishable inventory control focuses on systems with zero lead time. Nahmias
and Pierskalla (1973) study a system with a single product with two periods of lifetime. Nahmias
(1982) provides a review of early literature. More recent works are summarized in Nahmias (2011)
and Karaesmen et al. (2011). Most recently, Chen et al. (2014b) partially characterize the optimal
policies in a more general setting that allows positive lead times, either backlogging or lost-sales
for unmet demand, and joint inventory and pricing decisions. As for heuristics, Nahmias (1976a)
proposes a myopic base-stock policy for the zero lead-time case. It appears that the policies we
propose here are the first for positive lead times. More recently, Chao et al. (2015) propose another
heuristic, similar in spirit to the dual-balancing policy for the lost-sales model.
The concept of L♮-convexity, developed by Murota (1998, 2003, 2005) in the area of discrete
convex analysis, was first introduced to inventory management by Lu and Song (2005). Since
then, Zipkin (2008b) reveals the L♮-convexity structure for lost-sales systems, which helps recover
and extend earlier results by Karlin and Scarf (1958) and Morton (1969). Huh and Janakiraman
(2010) extend the analysis to serial inventory systems. Pang et al. (2012) apply the approach
to the joint inventory-pricing problem for back-order systems with lead times, and Chen et al. (2014b)
further extend it to perishable inventories for both the backorder and lost-sales cases. For computation, Chen et al. (2014a) propose a scheme for finite-horizon problems with L♮-convex cost-to-go
functions, based on recursively applying a technique for extending an L♮-convex function to a multi-dimensional domain from a finite number of points. Our work is tailored for infinite-horizon
problems. The approach is based on a parametric (specifically, quadratic) approximation of L♮-convex cost functions. We also derive intuitive and easy-to-implement heuristics based on the
approximation. Also, we point out a simple but useful fact about L♮-convex functions under certain
variable transformations.
Our overall framework is Approximate Dynamic Programming using Linear Programming (ADP-LP). The method is studied in de Farias and Van Roy (2003b) for discounted costs and in de Farias
and Van Roy (2003a, 2006) for the average-cost problem.
1.3. Organization and Notation
The remainder of the paper is organized as follows. Section 2 lays out the model setup of the
lost-sales system with positive lead time. We then introduce the approximation scheme in Section 3.
Section 4 presents heuristics. In Section 5, we extend the approach to the perishable inventory control problem. Numerical studies for both problems are presented in Section 6. Finally, we
summarize and give concluding remarks in Section 7.
Throughout the paper, for vectors a and b, we use a ∨ b and a ∧ b to represent the element-wise maximum and minimum of a and b, respectively. When a and b are scalars, these are simply
a ∨ b = max(a, b) and a ∧ b = min(a, b). Also, a^+ = max(a, 0) and a^− = −min(a, 0).
2. The Lost Sales Inventory Model
2.1. Formulation
The model setup and notation closely follow Zipkin (2008b). Consider the standard, single-item
inventory system in discrete time with lost sales. Denote

L = order lead time,
d_t = demand in period t,
z_t = order placed at time t,
x_t = (x_{0,t}, x_{1,t}, ..., x_{L−1,t}), where x_{0,t} is the inventory level at time t and x_{1,t} = z_{t+1−L}, ..., x_{L−1,t} = z_{t−1},
ρ_t = x_{0,t} − d_t.

The demands d_t are independent, identically distributed random variables. The state of the system
is represented by the L-vector x_t, which includes the inventory on hand as well as the orders of the
past L − 1 periods. Its dynamics follow

x_{t+1} = ( [x_{0,t} − d_t]^+ + x_{1,t}, x_{2,t}, ..., x_{L−1,t}, z_t ).

For generic state variables stripped of time indices, we have

x_+ = ( [x_0 − d]^+ + x_1, x_2, ..., x_{L−1}, z ).

We shall use such generic variables unless time indices are unavoidable. By default we treat such
state vectors as column vectors in matrix operations.
Next we introduce two different linear state transformations. First, we can represent the state
by a vector s = (s_0, ..., s_{L−1}), where s_l = Σ_{τ=l}^{L−1} x_τ, 0 ≤ l < L, are the partial sums of pipeline
inventories. The dynamics of this new state vector are

s_+ = ( [s_0 − s_1 − d]^+ + s_1, s_2, ..., s_{L−1}, 0 ) + z e,

where e is the L-dimensional vector of 1s. Alternatively, the state can be represented by the vector
v = (v_0, ..., v_{L−1}), where v_l = Σ_{τ=0}^{l} x_τ, 0 ≤ l < L. The dynamics of v are

v_+ = ( v_1, ..., v_{L−1}, v_{L−1} + z ) − (v_0 ∧ d) e.

We shall mainly work with the v state, but refer to x and s at various points.
Let

c = unit cost of procurement,
h^{(+)} = unit cost of holding inventory,
h^{(−)} = unit penalty cost of lost sales,
γ = discount factor.

We assume that the procurement cost is paid only when the order arrives. There is no fixed order
cost. Let q^0(x_0) denote the expected holding-penalty cost at the end of the current period, starting
with inventory x_0. Then, q^0(x_0) = E[ h^{(+)} ρ^+ + h^{(−)} ρ^− ], recalling that ρ = x_0 − d.
Let f(v) be the optimal cost-to-go, as a function of the variables v. It satisfies the following
Bellman equation:

f(v) = min_{z ≥ 0} { γ^L c z + q^0(v_0) + γ E[f(v_+)] }.

Let

f̄(s) = f( s_0 − s_1, s_0 − s_2, ..., s_0 − s_{L−1}, s_0 ),   (2.1)

that is, the optimal cost with respect to the s variables.

Definition 1 (L♮-Convexity). A function f : R^L → R ∪ {∞} is called L♮-convex if the function
ψ(v, ζ) = f(v − ζe) is submodular on R^{L+1}.

(This is one of several equivalent definitions. The same definition works for functions of integer
variables.) Zipkin (2008b) shows that the function f̄(s) is L♮-convex. The following result implies
that the function f(v) also enjoys this property.

Proposition 1 (Preservation of L♮-Convexity by State Transformation). Let f be any function on R^L. Define f̄ in terms of f, as in (2.1). Then f is L♮-convex if and only if f̄ is.
Our approximation procedure can be based on either the s state or the v state. In order to be
consistent with later developments, we focus on the v state.
We shall also need the following quantities:

ĥ^{(+)} = h^{(+)} − γc,
q̂^0(v_0) = c v_0 + E[ ĥ^{(+)} ρ^+ + h^{(−)} ρ^− ],
q̂(v, z) = γ^L E[ q̂^0(v_{0,+L}) ].

Thus, q̂ is the discounted expected purchase plus holding-penalty cost in period t + L (the first
period where the current order affects such costs), as viewed from period t, assuming that the
starting inventory in period t + L + 1 is valued at rate c.
2.2. Existing Heuristics
Here we present some existing heuristics, namely, the standard vector base-stock (SVBS) policy, the
myopic and myopic2 policies, and the Brown-Smith modified myopic policy (B-S myopic). These
are either used as benchmarks later on or employed as part of our approximation scheme.
SVBS is arguably the easiest heuristic to implement, and it provides an upper bound on the
optimal order quantity according to Morton (1969). Define s̄ = (s, 0) = (s_0, s_1, ..., s_{L−1}, 0),
ϑ = (c + ĥ^{(+)})/(ĥ^{(+)} + h^{(−)}), and d_{[l,L]} = Σ_{k=l}^{L} d_{+k} for l = 0, ..., L. Now set

s̄_l^{SVBS} = min{ s : P[d_{[l,L]} > s] ≤ ϑ },  l = 0, ..., L.

At state s, the order quantity is

z(s) = [ min{ s̄_l^{SVBS} − s̄_l, l = 0, ..., L } ]^+.
The myopic policy selects the order quantity z to minimize the q̂(v, z) above, and myopic2 is
defined by

z(v) = arg min_{z ≥ 0} { q̂(v, z) + γ E[q̂*(v_+)] },

where q̂*(v) = min_{z ≥ 0} q̂(v, z) is the minimal cost from myopic. Myopic and myopic2 are quite
intuitive, and arguably the best-performing heuristics studied in Zipkin (2008a). Although myopic2
always outperforms myopic, it is computationally more cumbersome and can be impractical for
large L, when the set of states becomes large.
Brown and Smith (2014) suggest a modified version of the myopic policy. The objective function
of the original myopic policy is

q̂(v, z) = E[ c v_{0,+L} + ĥ^{(+)} ρ^+_{+L} + h^{(−)} ρ^−_{+L} ]
        = E[ c v_{0,+L} + h^{(+)} ρ^+_{+L} + h^{(−)} ρ^−_{+L} + γ(−c ρ^+_{+L}) ].
Brown and Smith (2014) propose adjusting the parameter c in the salvage value c ρ^+_{+L} above. Their
method conducts a line search to find the best such value, where each value is evaluated by direct
simulation. In the few examples they present, the approach works quite well, though it is not clear
why.
3. Approximation
We now develop the Linear Programming approach to Approximate Dynamic Programming (ADP-LP), based on de Farias and Van Roy (2003b).
3.1. Linear Program for Approximate Dynamic Programming
First we briefly describe the ADP-LP in a general framework. Consider a discrete-time, infinite-horizon dynamic program (DP) with state space V ⊆ R^L. For each state vector v ∈ V, an action
vector a is chosen from the set of available actions A_v ⊆ R^K, which incurs a single-stage cost of
g(v, a). Let v_+ denote the next period's state, which follows the state transition v_+ = v_+(v, a, ω),
where ω is a random variable whose distribution depends only on v and a. Letting γ denote the
discount factor, the optimal discounted expected cost function f(v) satisfies the Bellman equation

f(v) = min_{a ∈ A_v} { g(v, a) + γ E[f(v_+)] }.

When the sets V and A_v are finite, f(v) is the solution to the following linear program (LP):

maximize_{f(v): v ∈ V}   Σ_{v∈V} w_v f(v)
subject to   f(v) ≤ g(v, a) + γ E[ f(v_+(v, a, ω)) ],   v ∈ V, a ∈ A_v,

where {w_v}_{v∈V} are some state-relevance weights. In this exact formulation, these weights can be any
positive numbers. This LP has as many decision variables as the number of states in the original
DP problem, and there is a constraint for each state-action pair. For a problem with a large state
space, solving this exact LP is impractical.
The idea of ADP-LP is to approximate f by a function in a relatively small set of functions.
Specifically, consider a linearly parameterized class of functions f̃(v) = Σ_{m=1}^{M} r_m f_m(v). The f_m are
pre-defined "basis functions", and the r_m are their variable linear coefficients. This yields a Linear
Program for Approximate Dynamic Programming (ADP-LP):

maximize_{ {r_m}_{m=1}^{M} }   Σ_{m=1}^{M} r_m Σ_{v∈V} w_v f_m(v)
subject to   Σ_{m=1}^{M} r_m f_m(v) ≤ g(v, a) + γ E[ Σ_{m=1}^{M} r_m f_m(v_+(v, a, ω)) ],   ∀ v ∈ V, a ∈ A_v,

which has only M decision variables. Some or all of the r_m may also be required to be nonnegative. The number of constraints, however, remains large. Therefore, it is necessary to sample the
constraints. Unlike the exact formulation, here the choice of state-relevance weights can affect the
quality of the solution.
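To make this concrete, the following is a minimal sketch of how the ADP-LP could be set up and solved for a small problem with off-the-shelf tools. It is only an illustration; our actual implementation, described in Section 6.3, is in MATLAB with CPLEX. The names basis, cost, step, and actions are placeholders of our own for f_m, g, v_+, and A_v, and the expectation is replaced by a Monte Carlo average over sampled ω's.

```python
import numpy as np
from scipy.optimize import linprog

def solve_adp_lp(basis, cost, step, states, actions, omegas, weights, gamma):
    """Fit the coefficients r_m in f(v) ~ sum_m r_m basis[m](v) via the ADP-LP."""
    M = len(basis)
    # Objective: maximize sum_v w_v sum_m r_m f_m(v); linprog minimizes, so negate.
    obj = -np.array([sum(w * f(v) for v, w in zip(states, weights)) for f in basis])
    A_ub, b_ub = [], []
    for v in states:
        fv = np.array([f(v) for f in basis])
        for a in actions(v):
            # Monte Carlo estimate of E[f_m(v_+(v, a, omega))].
            Ef = np.mean([[f(step(v, a, w)) for f in basis] for w in omegas], axis=0)
            # Constraint: sum_m r_m (f_m(v) - gamma * E f_m(v_+)) <= g(v, a).
            A_ub.append(fv - gamma * Ef)
            b_ub.append(cost(v, a))
    # Coefficients left free here; in practice some r_m are required nonnegative.
    res = linprog(obj, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * M)
    return res.x
```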
3.2. Quadratic Approximation for L♮-Convex Cost-to-Go Functions
There are three key components in the ADP-LP approach that we need to specify. First, we need
a class of basis functions whose span is rich enough to mimic the shape of the cost-to-go function,
but small enough that the number of decision variables is manageable. Second, the state-relevance
weights need to be determined. Last, we need to decide how to sample the constraints, to keep the
linear program tractable.
3.2.1. Basis functions. For the first issue, we propose to approximate the cost-to-go function
by an L♮-convex quadratic function. Such a function has the following form:

Proposition 2 (Murota (2003)). A quadratic function f̃(v) is L♮-convex if and only if

f̃(v) = Σ_{l=0}^{L−1} μ_l v_l² + Σ_{l,k: l≠k} μ_{lk} (v_l − v_k)² + Σ_{l=0}^{L−1} λ_l v_l + ν

for some {μ_l}_{l=0}^{L−1} ≥ 0, {μ_{lk} | l ≠ k} ≥ 0, {λ_l}_{l=0}^{L−1}, and ν, where μ_{lk} = μ_{kl}, l ≠ k.
In matrix notation, define the vector λ and the matrix Q to be

λ = (λ_0, λ_1, ..., λ_{L−1}),  and  Q = Σ_l μ_l E^{(l)} + Σ_{k,l: k≠l} μ_{kl} E^{(kl)}.

Here, the matrix E^{(l)} is an L × L matrix with zero components except the lth diagonal component,
which is 1, while E^{(kl)} is a zero matrix except the kth and lth diagonal components, which take value 1,
and the (k, l)th and (l, k)th components, which take value −1. According to Proposition 2, a quadratic
L♮-convex function can be expressed as v^⊤ Q v + λ^⊤ v + ν.
Applying this approximation in the original DP yields the following ADP-LP formulation:

maximize_{ {μ_l}_{l=0}^{L−1}, {μ_{lk} | l<k}, {λ_l}_{l=0}^{L−1}, ν }   Σ_{v∈V} w_v ( Σ_{l=0}^{L−1} μ_l v_l² + Σ_{l,k: l<k} μ_{lk} (v_l − v_k)² + Σ_{l=0}^{L−1} λ_l v_l + ν )

subject to   Σ_{l=0}^{L−1} μ_l v_l² + Σ_{l,k: l<k} μ_{lk} (v_l − v_k)² + Σ_{l=0}^{L−1} λ_l v_l + ν
               ≤ γ^L c z + q̂^0(v_0) + γ E[ Σ_{l=0}^{L−1} μ_l v_{l,+}² + Σ_{l,k: l<k} μ_{lk} (v_{l,+} − v_{k,+})² + Σ_{l=0}^{L−1} λ_l v_{l,+} + ν ],   ∀ v ∈ V, 0 ≤ z ≤ z^{SVBS}(v),
             μ_l ≥ 0,   ∀ l = 0, ..., L − 1,
             μ_{lk} ≥ 0,   ∀ l, k : l < k.
The number of decision variables is only (L + 1)(L + 2)/2.
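As an illustration, the matrices E^{(l)} and E^{(kl)} and the resulting quadratic form are easy to assemble; the sketch below (with helper names of our own choosing) builds Q from given coefficients and evaluates f̃(v) = v^⊤ Q v + λ^⊤ v + ν.

```python
import numpy as np

def E_l(L, l):
    """L x L matrix: all zeros except a 1 in the l-th diagonal entry."""
    A = np.zeros((L, L))
    A[l, l] = 1.0
    return A

def E_kl(L, k, l):
    """L x L matrix: 1 at the k-th and l-th diagonal entries, -1 off-diagonal."""
    A = np.zeros((L, L))
    A[k, k] = A[l, l] = 1.0
    A[k, l] = A[l, k] = -1.0
    return A

def build_Q(mu, mu_pair):
    """mu: array of mu_l >= 0; mu_pair: {(k, l): mu_kl >= 0 for k < l}.
    The factor 2 reflects mu_kl = mu_lk (each unordered pair is counted twice)."""
    L = len(mu)
    Q = sum(mu[l] * E_l(L, l) for l in range(L))
    for (k, l), m in mu_pair.items():
        Q = Q + 2.0 * m * E_kl(L, k, l)
    return Q

def f_quad(v, Q, lam, nu):
    """The L-natural-convex quadratic v'Qv + lam'v + nu of Proposition 2."""
    return v @ Q @ v + lam @ v + nu
```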
3.2.2. State-relevance weights and constraint sampling. Let u* denote the optimal policy of the exact DP and π_{u*} the steady-state distribution of states under u*. de Farias and Van
Roy (2003b) suggest that w_v be as close to π_{u*} as possible for good approximations. Although π_{u*} is not
available, we often can find a good heuristic policy û and approximate its steady-state distribution
π_û. The hope is that the steady-state distributions π_û and π_{u*} are close to each other. We propose
to generate a sample of states V^û by simulating a long trajectory of states under some easy-to-implement and fairly well-performing heuristic policy û. We then use the sample state distribution
π̂_û, an estimate of π_û, as the state-relevance weights w_v. In the lost-sales problem, for example, we
can use an existing heuristic, such as SVBS or myopic.
de Farias and Van Roy (2004) also discuss imposing only a sampled subset of the constraints,
and conclude that a good sampling distribution should be close to π_{u*} as well. Thus a sample of
states generated by a well-performing policy, call it V^û, is a reasonable choice for the constraints.
This is what we do.
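A sketch of this sampling step for the lost-sales system follows. The generating heuristic order_qty(x) is a placeholder (we use myopic or SVBS); the Poisson(5) demand and the 6000/1000 trajectory lengths match the setup of Section 6.1.

```python
import numpy as np

def sample_states(order_qty, L, horizon=6000, burn_in=1000, seed=0):
    """Simulate the lost-sales dynamics under a heuristic and return the
    visited states with their empirical frequencies (the weights w_v)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(L)                          # x = (x_0, ..., x_{L-1}), L >= 2
    visits = {}
    for t in range(horizon):
        z = order_qty(x)
        d = rng.poisson(5)
        # x_+ = ([x_0 - d]^+ + x_1, x_2, ..., x_{L-1}, z)
        x = np.append(np.concatenate(([max(x[0] - d, 0.0) + x[1]], x[2:])), z)
        if t >= burn_in:                     # discard the initial transient
            key = tuple(x)
            visits[key] = visits.get(key, 0) + 1
    total = sum(visits.values())
    return [(np.array(s), n / total) for s, n in visits.items()]
```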
To summarize, the linear programming problem we ultimately solve is

maximize_{ {μ_l}_{l=0}^{L−1}, {μ_{lk} | l<k}, {λ_l}_{l=0}^{L−1}, ν }   Σ_{v∈V^û} w_v ( Σ_{l=0}^{L−1} μ_l v_l² + Σ_{l,k: l<k} μ_{lk} (v_l − v_k)² + Σ_{l=0}^{L−1} λ_l v_l + ν )

subject to   Σ_{l=0}^{L−1} μ_l v_l² + Σ_{l,k: l<k} μ_{lk} (v_l − v_k)² + Σ_{l=0}^{L−1} λ_l v_l + ν
               ≤ γ^L c z + q̂^0(v_0) + γ E[ Σ_{l=0}^{L−1} μ_l v_{l,+}² + Σ_{l,k: l<k} μ_{lk} (v_{l,+} − v_{k,+})² + Σ_{l=0}^{L−1} λ_l v_{l,+} + ν ],   ∀ v ∈ V^û, 0 ≤ z ≤ z^{SVBS}(v),   (3.1)
             μ_l ≥ 0,   ∀ l = 0, ..., L − 1,
             μ_{lk} ≥ 0,   ∀ l, k : l < k.
Let {μ̃_l}_{l=0}^{L−1}, {μ̃_{lk} | l ≠ k}, {λ̃_l}_{l=0}^{L−1}, and ν̃ be the solution to the linear program (3.1). We thus
obtain the following quadratic approximation of the cost-to-go function f(v), represented in a more
compact matrix form:

f̃^{ADP}(v) = v^⊤ Q̃ v + λ̃^⊤ v + ν̃,

where the vector λ̃ and the matrix Q̃ are defined as

λ̃ = (λ̃_0, λ̃_1, ..., λ̃_{L−1}),  Q̃ = Σ_l μ̃_l E^{(l)} + Σ_{k,l: k≠l} μ̃_{kl} E^{(kl)}.
4. Heuristics for the Lost Sales System
In this section, we develop several heuristics for the lost-sales model, based on the approximation
f̃^{ADP} obtained as above. Since the SVBS policy is easy to compute and bounds the optimal order
quantity from above, we use it as an upper bound for all of the following policies.
4.1. LP-Greedy Policy
The approximate cost-to-go function f̃^{ADP} immediately yields the following heuristic, in which the
order quantity is determined by

z^{LPg}(v) = arg min_{0 ≤ z ≤ z^{SVBS}} { γ^L c z + q^0(v_0) + γ E[f̃^{ADP}(v_+)] }.

We call it the LP-greedy policy. Due to the linearity of both the single-period cost and the state
transition function in the decision variable z, the policy has a closed-form expression. For notational
convenience, for a vector v, we denote by v_{m:n} the subvector (v_m, v_{m+1}, ..., v_n). Similarly,
for a matrix Q, the notation Q_{m:n,s:t} represents its submatrix with rows m, m + 1, ..., n and columns
s, s + 1, ..., t.
We further define the vector κ = (κ_0, ..., κ_{L−1}) and the scalar ς such that

κ_0 = κ_1 = μ̃_{L−1}/μ̄,  κ_l = ( μ̃_{L−1} + 2 Σ_{τ=0}^{l−2} μ̃_{τ,L−1} )/μ̄, ∀ l = 2, ..., L − 1,  and  ς = ( −λ̃_{L−1} − γ^{L−1} c )/(2μ̄),   (4.1)

in which the denominator

μ̄ = μ̃_{L−1} + 2 Σ_{l=0}^{L−2} μ̃_{l,L−1}.   (4.2)
Proposition 3. The order quantity following the LP-greedy policy is

z^{LPg} = 0 ∨ ẑ^{LPg} ∧ z^{SVBS},   (4.3)

in which

ẑ^{LPg}(x) = ς − ( κ_0 E[x_0 − d]^+ + κ_{1:L−1}^⊤ x_{1:L−1} ).

Note that the order quantity ẑ^{LPg}, without the upper and lower bounds, is almost linear in the
state variables. More specifically, let ṽ = κ_0 E[x_0 − d]^+ + κ_{1:L−1}^⊤ x_{1:L−1}, which (since κ_0 = κ_1) is a κ_{1:L−1}-weighted
sum of the vector ( E[x_0 − d]^+ + x_1, x_2, ..., x_{L−1} ). This vector, in turn, is the expected next period's
vector of pipeline inventories, excluding the current period's order quantity. If we consider this ṽ a "twisted"
inventory position, the LP-greedy policy can be interpreted as a generalized base-stock policy where
the constant term ς serves as the order-up-to level.
Since 0 ≤ κ_0 = κ_1 ≤ ··· ≤ κ_{L−1} ≤ 1 and 0 ≤ dE[x_0 − d]^+/dx_0 ≤ 1, we have

−1 ≤ dẑ^{LPg}/dx_{L−1} ≤ ··· ≤ dẑ^{LPg}/dx_0 ≤ 0,

which echoes the monotone sensitivity results of Karlin and Scarf (1958), Morton (1969), and Zipkin
(2008b).
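Given the coefficients in (4.1)-(4.2), the rule (4.3) takes only a few lines to evaluate. The sketch below assumes a discrete demand pmf and an available SVBS bound; kappa and sigma stand in for κ and ς.

```python
def expected_leftover(x0, demand_pmf):
    """E[x_0 - d]^+ for a discrete demand distribution {d: P(d)}."""
    return sum(p * max(x0 - d, 0.0) for d, p in demand_pmf.items())

def z_lp_greedy(x, kappa, sigma, demand_pmf, z_svbs):
    """LP-greedy order quantity (4.3): 0 v (sigma - twisted position) ^ z_SVBS."""
    twisted = kappa[0] * expected_leftover(x[0], demand_pmf) \
              + sum(k * xi for k, xi in zip(kappa[1:], x[1:]))
    return min(max(sigma - twisted, 0.0), z_svbs(x))
```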
4.2. LP-Greedy-2 Policy
Following the LP-greedy policy, we define

f^{LPg}(v) = γ^L c z^{LPg}(v) + q^0(v_0) + γ E[ f̃^{ADP}( v_+(v, z^{LPg}, d) ) ].

Given that z^{LPg} has a closed-form expression, it is easy to compute f^{LPg}(v). This suggests an
extension of the LP-greedy policy, obtained by solving the following one-dimensional optimization problem:

min_{0 ≤ z ≤ z^{SVBS}} { γ^L c z + q^0(v_0) + γ E[f^{LPg}(v_+)] }.

We call the solution to this problem the LP-greedy-2 policy, since it essentially solves a two-period DP
with f̃^{ADP} as the terminal cost.
4.3. T^L Policy
The quadratic form of f̃^{ADP} can be further exploited if we temporarily put aside the nonnegativity
constraint and the upper bound z^{SVBS} for the order quantity z, and recursively define functions
f̃^{(l)} for l = 0, ..., L − 1 as

f̃^{(0)}(v) = f̃^{ADP}(v),
f̃^{(l)}(v) = min_z { γ^L c z + q^0(v_0) + γ E[f̃^{(l−1)}(v_+)] },  l = 1, ..., L − 1.   (4.4)

The objective functions in these minimizations are all quadratic in z. As a result, we can obtain an expression for z iteratively by
computing the functions f̃^{(1)} to f̃^{(L−1)}. Formally, we have the following result.
Proposition 4. The functions f̃^{(0)}, ..., f̃^{(L−1)}, as recursively defined in (4.4), can be expressed in the
following manner:

f̃^{(l)}(v) = Σ_{τ=0}^{l} E[ E[v_{0:L−(l+1),+l} | v_{+τ}]^⊤ Q_τ^{(l)} E[v_{0:L−(l+1),+l} | v_{+τ}] ] + λ^{(l)} E[v_{0:L−(l+1),+l}] + Σ_{τ=0}^{l−1} γ^τ E[q̂^0(v_{0,+τ})] + ν^{(l)},   (4.5)

where the (L−l) × (L−l) matrices Q_τ^{(l)}, the 1 × (L−l) vectors λ^{(l)}, and the scalars ν^{(l)} are recursively defined
as follows: Q_0^{(0)} = Q̃, λ^{(0)} = λ̃, ν^{(0)} = ν̃, and, for all l = 0, ..., L − 2 and τ = 1, ..., l + 1,

Q_0^{(l+1)} = −γ [ J_{L−l−1}^⊤ ( Σ_{τ=0}^{l} Q_{τ,(1:L−l,L−l)}^{(l)} )( Σ_{τ=0}^{l} Q_{τ,(L−l,1:L−l)}^{(l)} ) J_{L−l−1} ] / [ Σ_{τ=0}^{l} Q_{τ,(L−l,L−l)}^{(l)} ],
Q_τ^{(l+1)} = γ J_{L−l−1}^⊤ Q_{τ−1}^{(l)} J_{L−l−1},
λ^{(l+1)} = γ [ λ^{(l)} − ( λ_{(L−l)}^{(l)} + γ^{L−1} c )( Σ_{τ=0}^{l} Q_{τ,(L−l,1:L−l)}^{(l)} ) / ( Σ_{τ=0}^{l} Q_{τ,(L−l,L−l)}^{(l)} ) ] J_{L−l−1},
ν^{(l+1)} = γ [ ν^{(l)} − ( λ_{(L−l)}^{(l)} + γ^{L−1} c )² / ( 4 Σ_{τ=0}^{l} Q_{τ,(L−l,L−l)}^{(l)} ) ],   (4.6)

in which the matrix J_l is the (l+1) × l matrix consisting of the l × l identity matrix with its last row,
(0, ..., 0, 1), repeated once at the bottom, and the scalars ν^{(l)} are constants that do not depend on the state v.
Furthermore, the function f̃^{(L−1)} takes the following form:

f̃^{(L−1)}(v) = Σ_{τ=0}^{L−1} Q_τ^{(L−1)} E[ E[v_{0,+L−1} | v_{+τ}]² ] + λ^{(L−1)} E[v_{0,+L−1}] + Σ_{τ=0}^{L−2} γ^τ E[q^0(v_{0,+τ})] + ν^{(L−1)}.

Expression (4.6) provides a simple procedure to obtain the Q_τ^{(L−1)} and λ^{(L−1)} that are needed to
compute f̃^{(L−1)}. It is worth noting that this procedure is essentially the process of solving an (L − 1)-period linear-quadratic control system.
Now we employ f̃^{(L−1)} as the approximate cost-to-go function, and solve

minimize_{0 ≤ z ≤ z^{SVBS}}   γ^L c z + q^0(v_0) + γ E[f̃^{(L−1)}(v_+)].

We call this the T^L policy, where T refers to the dynamic programming operator defined in the
usual sense. Our procedure corresponds to applying the operator T L times to f̃^{ADP}, with the first
L − 1 applications unconstrained and the last one subject to the constraint 0 ≤ z ≤ z^{SVBS}. The order quantity,
denoted by z^{T^L}, is the optimizer in the last iteration of the operator T.
Similar to LP-greedy, the T^L policy is also quite simple. Note that z^{T^L} = 0 ∨ ẑ^{T^L} ∧ z^{SVBS}, where

ẑ^{T^L}(v) = arg min_z { γ^L c z + γ Σ_{τ=1}^{L} Q_{τ−1}^{(L−1)} E[ E[ρ^+_{+L−1} + z | v_{+τ}]² ] + γ λ^{(L−1)} E[ρ^+_{+L−1} + z] }
          = ς^{T^L} − E[ρ^+_{+L−1}],

in which ς^{T^L} = −( λ^{(L−1)} + γ^{L−1} c )/( 2 Σ_{τ=0}^{L−1} Q_τ^{(L−1)} ), and

E[ρ^+_{+L−1}] = E[ [ ··· [[x_0 − d]^+ + x_1 − d_{+1}]^+ + ··· + x_{L−1} − d_{+L−1} ]^+ ].

Note that E[ρ^+_{+L−1}] is the expected inventory level after the lead time of L periods, right before
the order quantity z arrives, and it can be computed backward iteratively for l = L − 1, ..., 1:

E[ρ^+_{+L−1} | x_{0,+L−1}] = E[ (x_{0,+L−1} − d_{+L−1})^+ | x_{0,+L−1} ],
E[ρ^+_{+L−1} | x_{0,+l−1}, x_l, ..., x_{L−1}] = E[ E[ ρ^+_{+L−1} | [x_{0,+l−1} − d_{+l−1}]^+ + x_l, x_{l+1}, ..., x_{L−1} ] ].

Therefore this heuristic can be interpreted as another type of generalized base-stock policy, one that
places an order to raise the expected inventory level after the lead time, E[x_{0,+L}], to ς^{T^L} whenever
possible.
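Since the recursion above simply propagates the scalar on-hand level through the pipeline, E[ρ^+_{+L−1}] can also be estimated by straightforward Monte Carlo, which is what the sketch below does; sigma_TL stands in for ς^{T^L}, and demand_sampler is an assumed demand generator.

```python
import numpy as np

def exp_stock_before_arrival(x, demand_sampler, n=10000, seed=0):
    """Monte Carlo estimate of E[rho^+_{+L-1}]: run the pipeline forward
    L - 1 periods (without the current order) and average the leftover stock
    net of the demand in period t + L - 1."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n):
        on_hand = x[0]
        for l in range(1, len(x)):           # arrivals x_1, ..., x_{L-1}
            on_hand = max(on_hand - demand_sampler(rng), 0.0) + x[l]
        total += max(on_hand - demand_sampler(rng), 0.0)
    return total / n

def z_TL(x, sigma_TL, demand_sampler, z_svbs):
    """T^L order quantity: 0 v (sigma_TL - E[rho^+_{+L-1}]) ^ z_SVBS."""
    z_hat = sigma_TL - exp_stock_before_arrival(x, demand_sampler)
    return min(max(z_hat, 0.0), z_svbs(x))
```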
The LP-greedy and T^L heuristics form an interesting contrast. Both can be interpreted as order-up-to policies with respect to some "twisted" inventory positions. However, the sensitivity of the
LP-greedy order quantity to the inventory state is governed by the L − 1 coefficients {κ_l}_{l=1}^{L−1} (note that
κ_0 = κ_1), while the sensitivity of the T^L order quantity is endogenously determined by the expression
E[ρ^+_{+L−1}].
4.4. T^{L+1} Policy
The quadratic structure of the approximate cost function can be further exploited one step beyond
the T^L policy. Since the objective function of T^L is still quadratic in z, we further define
f̃^{(L)}(v) = min_z { γ^L c z + q^0(v_0) + γ E[f̃^{(L−1)}(v_+)] }
           = Σ_{τ=0}^{L} Q_τ^{(L)} E[ E[ρ^+_{+L−1} | v_{+τ}]² ] − γ^L c E[ρ^+_{+L−1}] + Σ_{τ=0}^{L−1} γ^τ E[q^0(x_{0,+τ})] + ν^{(L)},

where

Q_0^{(L)} = −γ Σ_{τ=0}^{L−1} Q_τ^{(L−1)},  Q_τ^{(L)} = γ Q_{τ−1}^{(L−1)}, ∀ τ = 1, ..., L,  and
ν^{(L)} = γ ( ν^{(L−1)} − ( λ^{(L−1)} + γ^{L−1} c )² / ( 4 Σ_{τ=0}^{L−1} Q_τ^{(L−1)} ) ).
We define the T^{L+1} policy by the order quantity

z^{T^{L+1}} = arg min_{0 ≤ z ≤ z^{SVBS}} { γ^L c z + q^0(v_0) + γ E[f̃^{(L)}(v_+)] }.

Note that E[q^0(x_{0,+L})], E[ρ^+_{+L}], and E[ (ρ^+_{+L})² | v_{+τ} ] in f̃^{(L)}(v_+) can all be evaluated through recursions
similar to that for E[ρ^+_{+L−1} | x_{0,+L−1}]. The only difference is that they are now functions of z.
Also, we no longer have a closed-form expression for z^{T^{L+1}} and need to conduct a line search for
the solution. The computation cost is of the same magnitude as that of myopic.
5. The Perishable Product Model
5.1. Formulation
Consider a perishable product inventory system. It takes Λ periods for an order to arrive. Upon
arrival the product has M periods of lifetime, after which it must be discarded. The product
is issued to satisfy demand according to a "First In, First Out" (FIFO) policy. Unmet demand
is backlogged. The state of this system is described by an L = Λ + M − 1 dimensional vector
x = (x_0, ..., x_{L−1}). Here, x_{M−1+l} for 1 ≤ l ≤ Λ − 1 is the order that will arrive in l periods. The x_l
for 0 ≤ l ≤ M − 1 represent information related to the inventories that have already arrived on hand
and also the current backorders. These are nonnegative, except for x_0, which may be negative. If
x_0 ≥ 0, then there are no backorders, and x_l is the inventory with l periods of lifetime remaining.
If x_0 ≤ 0, then the actual backorders are [ Σ_{l=0}^{M−1} x_l ]^−, and the inventory with k periods or less of
lifetime remaining is [ Σ_{l=0}^{k} x_l ]^+. This state definition makes the dynamics relatively simple:

x_+ = ( x_1 − [d − x_0]^+, x_2, ..., x_{L−1}, z ).

Similar to the lost-sales system, the state can alternatively be represented by the vector of partial
inventory positions v, with v_l = Σ_{τ=0}^{l} x_τ for 0 ≤ l < L. Letting y denote the order-up-to level
v_{L−1} + z, the dynamics of the state v are

v_+ = ( v_1, ..., v_{L−1}, y ) − (v_0 ∨ d) e.
The disposal cost for expired inventory is θ per unit. The unit procurement cost and inventory holding cost are again denoted by c and h^{(+)}, respectively, and h^{(−)} now represents the unit
backorder cost. Again, there is no fixed order cost. Let

χ^0(v) = E[ h^{(+)} [v_{M−1} − v_0 ∨ d]^+ + h^{(−)} [v_{M−1} − d]^− + θ [v_0 − d]^+ ]
       = E[ h^{(+)} [v_{M−1} − d]^+ + h^{(−)} [v_{M−1} − d]^− + (θ − h^{(+)}) [v_0 − d]^+ ]

be the expected inventory and disposal cost. The optimal cost-to-go function f(v) satisfies:

f(v) = min_{y ≥ v_{L−1}} { γ^Λ c (y − v_{L−1}) + χ^0(v) + γ E[f(v_+)] }.

Chen et al. (2014b) show that f(v) is L♮-convex.
Even with zero lead time (Λ = 0), the state space includes inventory levels of different remaining lifetimes, and therefore is multi-dimensional. A system with positive lead time is still more
challenging.
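For concreteness, one transition of this system, together with the realized one-period cost underlying χ^0(v), can be simulated as follows (the function and argument names are ours).

```python
import numpy as np

def step_perishable(x, z, d, h_plus, h_minus, theta, M):
    """One period of the perishable system: FIFO issuing, backlogging, and
    outdating. Returns the next state x_+ and the realized cost."""
    v = np.cumsum(x)                         # partial inventory positions v_l
    cost = (h_plus * max(v[M - 1] - max(v[0], d), 0.0)   # holding on survivors
            + h_minus * max(d - v[M - 1], 0.0)           # backorders
            + theta * max(v[0] - d, 0.0))                # units outdating now
    # x_+ = (x_1 - [d - x_0]^+, x_2, ..., x_{L-1}, z)
    x_new = np.append(x[1:], float(z))
    x_new[0] = x[1] - max(d - x[0], 0.0)
    return x_new, cost
```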
5.2. Benchmark Heuristics
As Karaesmen et al. (2011) point out, there has not been much work on heuristics for perishable
inventory systems with lead time. Here we propose a myopic policy inspired by Nahmias (1976a)
for zero lead time problems. This serves two purposes. First, this is a benchmark to compare
to our ADP-LP based heuristics. Second, as discussed earlier, our approximation approach needs
some easy-to-implement policy to generate relevance weights and a sample of states for the linear
program.
Define

χ̂(v, y) = γ^Λ E[ h^{(+)} [v_{M−1,+Λ} − d_{+Λ}]^+ + h^{(−)} [v_{M−1,+Λ} − d_{+Λ}]^− ] + γ^L θ̂ E[ [v_{0,+L} − d_{+L}]^+ ].

This is the discounted expected purchase, holding-backorder, and disposal cost, in the first periods,
respectively, in which the current order affects these quantities. A simple recursion can be used to
compute the distributions of v_{M−1,+Λ} and v_{0,+L}. The myopic policy sets the order-up-to level to

y^{myopic}(v) = arg min_{y ≥ v_{L−1}} { (1 − γ) γ^Λ c y + χ̂(v, y) }.
5.3. ADP-LP Based Heuristics
Now we present our ADP-LP based heuristics, many of which share similar ideas with the heuristics
for the lost sales problem.
5.3.1. LP-greedy policy. The LP-greedy policy for the perishable inventory model is very
similar to the one for the lost-sales model. Define the vector ξ = (ξ_0, ..., ξ_{L−1}) and the scalar ς such that

ξ_0 = μ̃_{L−1}/μ̄,  ξ_l = 2 μ̃_{l−1,L−1}/μ̄, for l = 1, ..., L − 1,  and  ς = −( γ^{L−M} c + λ̃_{L−1} )/(2μ̄),

in which μ̄ is defined in (4.2). Based on the approximation result f̃^{ADP}(v) = v^⊤ Q̃ v + λ̃^⊤ v + ν̃, the
LP-greedy policy sets the order-up-to level to

y^{LPg}(v) = max{ v_{L−1}, ŷ^{LPg}(v) },

where

ŷ^{LPg}(v) = arg min_y { γ^Λ c (y − v_{L−1}) + χ^0(v) + γ E[ v_+^⊤ Q̃ v_+ + λ̃^⊤ v_+ + ν̃ ] }
          = arg min_y { γ Q̃_{L,L} y² + 2γ ( Q̃_{L,1:L−1} v_{1:L−1} − e^⊤ Q̃_{1:L,L} E[v_0 ∨ d] ) y + ( γ^Λ c + γ λ̃_{L−1} ) y }
          = ξ_0 E[v_0 ∨ d] + ξ_{1:L−1}^⊤ v_{1:L−1} + ς.

Note that ξ ≥ 0 and Σ_{l=0}^{L−1} ξ_l = 1. Thus the order-up-to level under the LP-greedy policy is
a weighted sum of ( E[v_0 ∨ d], v_1, ..., v_{L−1} ) plus a constant.
Again we look at the policy from the perspective of the order quantity, z, as a function of the
original state x. Using κ = (κ_0, ..., κ_{L−1}) as defined in (4.1), we have

z^{LPg}(x) = max{ 0, ẑ^{LPg}(x) },
ẑ^{LPg}(x) = ς − ( κ_0 (x_0 − E[x_0 ∨ d]) + κ_{1:L−1}^⊤ x_{1:L−1} ).

Similar to the lost-sales model, we have 0 ≤ κ_0 ≤ κ_1 ≤ ··· ≤ κ_{L−1} ≤ 1 and 0 ≤ dE[x_0 ∨ d]/dx_0 ≤ 1.
Therefore,

−1 ≤ dẑ^{LPg}/dx_{L−1} ≤ ··· ≤ dẑ^{LPg}/dx_0 ≤ 0,

which is consistent with the monotone sensitivity property of the optimal order quantity identified in Chen et al. (2014b).
5.3.2. T^Λ policy. With the quadratic approximation, we can apply the same operations used
to compute the T^L heuristic for the lost-sales system. In this problem, we cannot apply the
unconstrained dynamic programming operator in a linear-quadratic fashion L times, but only
Λ = L − M + 1 times. We take the solution of the Λth iteration, subject it to the additional
nonnegativity constraint, and call the result the T^Λ policy.
Recursively define the functions f̃^{(l)} for l = 0, ..., L − M as

f̃^{(0)}(v) = f̃^{ADP}(v),
f̃^{(l)}(v) = min_z { γ^Λ c z + χ^0(v) + γ E[f̃^{(l−1)}(v_+)] },  l = 1, ..., L − M.   (5.1)

The T^Λ order quantity for the perishable inventory system is defined as

z^{T^Λ} = arg min_{z ≥ 0} { γ^Λ c z + χ^0(v) + γ E[f̃^{(L−M)}(v_+)] }.
Similar to Proposition 4, the order quantity has a closed-form expression.
Proposition 5. The order quantity following the T^Λ policy is

z^{T^Λ} = [ ς^{T^Λ} − ( κ_1^{T^Λ} E[x_{0,+Λ}] + κ_{2:M−1}^{T^Λ ⊤} x_{Λ+1:L−1} ) ]^+,

for some appropriately defined scalar ς^{T^Λ} and vector κ_{1:M−1}^{T^Λ} when M ≥ 2. When M = 1,

z^{T^Λ} = ς^{T^Λ} + E[(x_{0,+Λ−1} − d)^−].
The general form shares some similarity with the LP-greedy order quantity. Note that the term
κ_1^{T^Λ} E[x_{0,+Λ}] in z^{T^Λ} is similar to x_0 − E[x_0 ∨ d] in z^{LPg}. The difference is that, in T^Λ, the first Λ elements
of the state vector are combined in κ_1^{T^Λ} E[x_{0,+Λ}] in a nonlinear fashion. The heuristic T^Λ can still be
interpreted as an order-up-to policy, with base-stock level ς^{T^Λ} and an inventory position that is
nonlinear in the first Λ elements and linear in the last M − 1 elements of the state vector x. In
the case of M = 1, it is a special base-stock policy. The order quantity is the base-stock level ς^{T^Λ}
plus a nonnegative term E[(x_{0,+Λ−1} − d)^−], which is essentially the forecast of the backlog Λ − 1 periods
later. Since there is only one period of lifetime, any excess inventory is outdated at the end of the
period and does not contribute to future inventory levels. Therefore, the order quantity does not
depend on the forecast of (x_{0,+Λ−1} − d)^+.
5.3.3. T^{Λ+1} policy. Similar to the T^{L+1} policy for the lost-sales model, we have the following
T^{Λ+1} policy for the perishable product problem. It does not have a closed-form expression and is
solved by a one-dimensional search. The order quantity is

z^{T^{Λ+1}} = arg min_{z ≥ 0} { γ^Λ c z + χ^0(v) + γ E[f̃^{(Λ)}(v_+)] }.

We provide some further derivations for this policy in the Appendix.
5.4. Observations
Before we present computational results, it is worth summarizing some general insights on our
ADP-LP based heuristics for both inventory systems. The LP-greedy and T L /T Λ policies can be
perceived as two extremes of a more general class of policies. For the lost sales system, the general
rule takes the following form:
h
i+
>
.
z = ς − κ0 E[ρ+
]
+
κ
x
1:L−l l:L−1
+l−1
for some positive base-stock level ς, and coefficients 0 ≤ κ0 ≤ κ1 ≤ · · · ≤ κL−l ≤ 1 with l = 1, · · · , L.
+
>
A twisted inventory position x̄ = κ0 E[ρ+
+l−1 ] + κ1:L−l xl:L−1 consists of E[ρ+l−1 ], the forecast of
inventory holding l − 1 periods later, which is a nonlinear function of x0:l−1 , and a linear combination
of xl:L−1 . In this framework, LP-greedy corresponds to the case l = 1. The T L policy, on the other
hand, corresponds to the case l = L, with κ0 = 1.
Similarly, for the perishable inventory system, the policy is

z = [ ς − ( κ_0 E[−ρ^−_{+l−1}] + κ_{1:L−l}^⊤ x_{l:L−1} ) ]^+,

for l = 1, ..., Λ. LP-greedy corresponds to the case l = 1, and T^Λ corresponds to the case l = Λ. The
term E[ρ^−_{+l−1}] is the forecast of the backlog on the immediately outdating inventory l − 1 periods later,
which needs to be subtracted from the linear combination of the rest of the pipeline inventory.
This view of our heuristic policies may suggest good policy structures for other systems with
L♮-convex structure. We leave such explorations to future research.
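In code, the whole family reduces to one parameterized rule; a sketch for the lost-sales form follows, where rho_plus_forecast(x, l) is an assumed forecaster for E[ρ^+_{+l−1}] (e.g., the Monte Carlo routine sketched in Section 4.3).

```python
def general_order_rule(x, l, kappa, sigma, rho_plus_forecast):
    """z = [sigma - (kappa_0 E[rho^+_{+l-1}] + kappa_{1:L-l}^T x_{l:L-1})]^+."""
    position = kappa[0] * rho_plus_forecast(x, l) \
               + sum(k * xi for k, xi in zip(kappa[1:], x[l:]))
    return max(sigma - position, 0.0)
```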
6. Computational Study
Now we present computational results for the lost sales and perishable product problems.
6.1. Lost Sales Problem
We take model parameters from Zipkin (2008a) by letting c = 0, h^{(+)} = 1, and h^{(−)} take values
from {4, 9, 19, 39}, and using Poisson and geometric demand distributions, both with mean 5.
Zipkin (2008a) studied lead times from 1 to 4. To test how well our approach handles long lead
times, we set L equal to 4, 6, 8, and 10.
We use the myopic policy to generate a long trajectory of states for the state-relevance weights
and the set of constraints. In particular, we simulate the system for 6000 periods under this policy,
discarding the first 1000 periods to mitigate the initial transient effect. The remaining 5000 time
periods’ states are used as our sample state set V û for the ADP-LP. (We also tried the SVBS and
myopic2 policies as generating policies. We found that myopic produces better results than SVBS,
and about the same as myopic2, which is far harder to compute.)
Then we solve the ADP-LP. Using the solution, we determine each ADP-LP-based heuristic
policy (LP-greedy, LP-greedy-2, T^L, and T^{L+1}). We evaluate each policy by simulating the system
for 50000 periods. Finally, we evaluate the myopic, myopic2, and B-S myopic policies, again by
simulation. (These evaluation simulations are of course independent of those used to set up the
ADP-LP.)
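The evaluation step is a plain simulation; a sketch for the lost-sales case (with c = 0 as in this study, so only holding and penalty costs accrue) is below.

```python
import numpy as np

def average_cost(policy, L, h_plus, h_minus, horizon=50000, seed=1):
    """Estimate the long-run average cost of a lost-sales policy(x) -> z."""
    rng = np.random.default_rng(seed)
    x = np.zeros(L)
    total = 0.0
    for _ in range(horizon):
        z = policy(x)
        d = rng.poisson(5)
        leftover = max(x[0] - d, 0.0)
        total += h_plus * leftover + h_minus * max(d - x[0], 0.0)
        x = np.append(np.concatenate(([leftover + x[1]], x[2:])), z)
    return total / horizon
```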
Tables 1 and 2 present the average costs of the ADP-LP-based policies and the benchmark
policies. We mark numbers with an asterisk (*) to highlight cases where an ADP-LP-based policy
outperforms the myopic2 policy.
Table 1    Average Cost with Poisson Demand (Lost-Sales)

 L   h(−)    LPg      LPg2     T^L      T^{L+1}    M1      M2      B-S
 4    4      4.82     4.77*    4.75*    4.78*      5.06    4.82    4.72
 4    9      6.94     6.90*    6.92     6.91*      7.20    6.92    6.83
 4   19      9.09     9.11     9.33     8.93*      9.18    8.96    8.88
 4   39     11.68    11.90    12.39    10.81*     11.04   10.84   10.79
 6    4      5.01*    4.96*    4.89*    4.93*      5.42    5.05    4.87
 6    9      7.36*    7.33*    7.33*    7.33*      7.91    7.43    7.22
 6   19      9.77*    9.887   10.16     9.75*     10.28    9.85    9.63
 6   39     12.63    13.43    14.23    11.99*     12.47   12.11   11.94
 8    4      5.13*    5.07*    4.96*    5.00*      5.70    5.20    4.95
 8    9      7.68*    7.61*    7.56*    7.62*      8.43    7.77    7.47
 8   19     10.26*   10.38*   10.73    10.34*     11.11   10.53   10.15
 8   39     13.21    15.64    19.09    12.86*     13.59   13.09   12.79
10    4      5.24*    5.15*    5.00*    5.03*      5.90    5.31    4.99
10    9      7.94*    7.82*    7.72*    7.82*      8.86    8.08    7.63
10   19     10.68*   10.78*   11.37    10.76*     11.82   11.09   10.53
10   39     13.75*   18.85    45.70    13.53*     14.59   13.93   13.46

Table 2    Average Cost with Geometric Demand (Lost-Sales)

 L   h(−)    LPg      LPg2     T^L      T^{L+1}    M1      M2      B-S
 4    4     10.72*   10.66*   10.64*   10.65*     11.32   10.80   10.61
 4    9     16.74*   16.74*   16.76*   16.66*     17.62   16.89   16.58
 4   19     23.46    23.49    23.67    23.02*     24.04   23.29   22.96
 4   39     30.88    31.02    31.27    29.40*     30.33   29.64   29.37
 6    4     10.89*   10.83*   10.79*   10.81*     11.75   11.08   10.76
 6    9     17.35*   17.28*   17.31*   17.29*     18.78   17.75   17.15
 6   19     24.39*   24.54*   24.90*   24.38*     26.04   24.93   24.23
 6   39     32.36    32.78    33.75    31.58*     33.16   32.12   31.50
 8    4     11.08*   10.93*   10.86*   10.87*     12.04   11.27   10.84
 8    9     17.80*   17.65*   17.63*   17.68*     19.66   18.39   17.49
 8   19     25.25*   25.28*   25.74*   25.36*     27.63   26.21   25.11
 8   39     33.37*   34.14    35.69    33.23*     35.48   34.12   33.09
10    4     14.21    10.99*   11.18*   11.18*     12.23   11.40   10.88
10    9     18.28*   17.94*   17.86*   17.91*     20.36   18.89   17.72
10   19     26.02*   25.88*   26.44*   26.03*     28.97   27.27   25.76
10   39     36.03    35.19*   39.60    36.10      37.46   35.82   34.34
We observe that all the ADP-LP heuristics give reasonably good results, compared to the benchmarks. The T^{L+1} policy consistently outperforms myopic2. The LP-greedy policy has good and
stable performance. Although it does not yield the best results when L = 4, it does manage to
outperform myopic2 in the majority of cases as the lead time increases. The performance of T^L,
however, can be poor for high penalty costs. (Wang (2014) contains a diagnosis of this erratic
behavior.)
It is worth noting that there are more highlighted numbers in Table 2 than in Table 1, indicating
that more variants of our ADP-LP-based heuristics outperform myopic2 when the demand distribution is geometric than when it is Poisson. Now, Zipkin (2008a) observed that, in almost all cases
studied in that paper, the existing heuristics generally perform closer to optimality for Poisson
demand than for geometric demand. Our result, on the other hand, suggests that our heuristics
have better potential for cases that challenge traditional heuristics more. (We are not sure why
this is so.)
Observe also that the B-S myopic policy performs best among all policies in most cases. It
remains a mystery why this should be so. (Recall that this method requires a line search for the
best terminal value, with each step requiring a simulation.)
The approximation scheme relies on linear programs generated from randomly simulated trajectories of states. It is therefore important to check whether the performance comparisons in the study are
robust. For each parameter setting, we repeat the same procedure to generate the ADP-LP and
evaluate the heuristics 100 times, with 100 independently sampled demand trajectories. We pair
up policies for comparison, take the difference in average costs for each demand trajectory, and
compute the t-statistic of the paired performance differences over the 100 repetitions.
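The paired statistic itself is standard; for completeness, a sketch:

```python
import numpy as np

def paired_t(costs_a, costs_b):
    """One-tailed paired t-statistic; positive when policy A is cheaper."""
    diff = np.asarray(costs_b) - np.asarray(costs_a)
    return diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))
```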
Table 3 presents the paired t-statistics computed for one-tailed tests of the performance differences between the ADP-LP-based heuristics and myopic2. The t-statistics show that most
of the corresponding performance orderings in Tables 1 and 2 are statistically significant. In
fact, 156 out of 160 instances are at least 90% significant (critical value of the t-statistic 1.29),
among which 150 instances are at least 99.95% significant (critical value of the t-statistic 3.4).
It is also worth investigating the stability of the ADP-LP-based policies, given the randomness in
the generated trajectories upon which the linear programs are based. First, among the 10 out of 160
instances in Table 3 that are less than 99.95% significant, 5 correspond to L = 10 and 4 correspond
to h(−) = 39. Second, the performance of T^L becomes significantly inferior to the other policies for
larger h(−). In some cases with both L and h(−) high, average costs can be extraordinarily
high. Generally speaking, higher L and h(−) imply a larger state space. Fixing the sample size,
we expect higher variation in the trajectories generated to construct the constraints and state-relevance weights for the linear program. Indeed, among the 100 repetitions for each parameter
setting, we observe such a trend: using sample trajectories of 5000 periods to generate the
ADP-LPs for all instances, the standard deviation of the average costs from any ADP-LP-based policy
Table 3    Performance t-statistics for Lost-Sales Heuristics

                    Poisson                              Geometric
 L  h(−)    LPg     LPg2     T^L    T^{L+1}     LPg     LPg2     T^L     T^{L+1}
 4   4      0.7     36.6    132.1    46.6      19.4     40.7    241.8   118.8
 4   9     -6.2      6.5     -5.9    10.5      23.4     18.2    143.4   137.3
 4  19    -22.8    -24.7    -53.1    30.0     -20.7    -19.8   -139.5   179.0
 4  39    -91.5    -83.4    -60.6    55.9     -96.8    -81.7   -295.5   255.8
 6   4     19.9     36.0    279.5    73.0     108.0    125.7    386.5   168.8
 6   9     32.1     31.5     54.2    35.5      84.3     87.1    348.7   117.6
 6  19     21.0     -3.3    -45.3    33.7      45.2     22.9      1.3   115.5
 6  39    -54.3    -28.6    -44.9    61.5      -7.9    -23.4    -77.1   144.2
 8   4     29.9     32.8    258.4    94.8       2.6    108.0    469.1   197.1
 8   9     38.3     38.2    141.1    44.0     176.4    157.6    389.2   101.7
 8  19    127.4     12.3    -13.0    39.8     302.2     78.2     37.5    66.7
 8  39     -9.4    -14.7    -17.1    53.5      53.5     -1.1    -30.0    89.6
10   4     22.8     33.2    129.2   176.6      -6.9     69.2      1.3     1.3
10   9     40.4     62.7    155.2    58.8       6.1    157.3    120.3   119.1
10  19    179.1      9.1     -6.6    41.1     278.1    162.8     42.0    79.6
10  39      3.6     -9.3     -7.2    72.6      -0.7     10.8     -2.4    -0.2
almost always increases in L or h(−). Furthermore, the standard deviation of the average costs from
T^L increases much faster in h(−) than that of any other policy. In Table 4, we compare the LP-greedy
and T^L policies for some selected cases. In the case (L = 10, h(−) = 39), the average costs of T^L become
highly unstable, compared with LPg. That is, when h(−) is high, the same variation in the
linear-program solutions yields much higher variation in the average costs of T^L than of LP-greedy or the
other policies. If we increase the length of the LP-generating sample trajectories from 5000 to 100000,
we are able to shrink the standard deviation of T^L from 43.92 to 0.127, except for two outliers. The
clear conclusion is that more samples are needed to generate the LP for higher L and h(−) values,
especially for the T^L heuristic.
Table 4    Standard Deviation of Long-Run Average Costs, LP-greedy vs. T^L, Poisson Demand

 L   h(−)    LPg       T^L
 4    4      0.0191    0.0062
 4    9      0.0331    0.0089
 4   19      0.0580    0.0715
 4   39      0.0929    0.2563
10    4      0.0313    0.0241
10    9      0.0336    0.0248
10   19      0.0270    0.4345
10   39      0.4885   43.9260
There are asymptotic results for the lost-sales system in both the lead time L and the lost-sales penalty h(−). On one hand, Goldberg et al. (2012) establish that a constant-order policy is
asymptotically optimal as the lead time grows large. On the other hand, Huh et al. (2009) show
that, as the lost-sales penalty increases, an order-up-to policy is asymptotically optimal. Although
these asymptotic results do not necessarily imply that the optimal policy converges to a constant-order policy or an order-up-to policy when the corresponding parameter grows large, it is still of
interest to check whether well-performing heuristic policies exhibit such tendencies.
Under the LP-greedy policy, for example, such tendencies, if they exist, can be observed in the
linear coefficients κ of the LP-greedy order quantity. Since the order quantity is almost affine
in the state vector x, if all {κ_l}_{l=0}^{L−1} approach 0, the LP-greedy policy has a tendency towards a
constant-order quantity; if all {κ_l}_{l=0}^{L−1} approach 1, on the other hand, the LP-greedy policy tends
to behave like an order-up-to policy. Since increases in L and h(−) have competing impacts
on the behavior of the order quantities, we choose (L = 4, h(−) = 4) as the base case and increase
one parameter, L or h(−), while keeping the other fixed. Tables 5 and 6 report the coefficients with
increasing L and h(−), respectively. To ease the comparison, we also present the average magnitude
of the LP-greedy coefficients, κ̄ = ( Σ_{l=0}^{L−1} κ_l )/L, for each case. From the tables, we do observe a decrease
of {κ_l}_{l=0}^{L−1} and κ̄ in L and an increase of {κ_l}_{l=0}^{L−1} and κ̄ in h(−).
Table 5    Lost-Sales LP-greedy Coefficients with Increasing L

Poisson Demand, h(−) = 4
 L    κ̄      κ0     κ1     κ2     κ3     κ4     κ5     κ6     κ7     κ8     κ9
 4   0.764  0.691  0.691  0.787  0.885
 6   0.628  0.530  0.530  0.586  0.643  0.707  0.774
 8   0.504  0.414  0.414  0.422  0.449  0.490  0.538  0.605  0.703
10   0.359  0.287  0.287  0.288  0.293  0.305  0.324  0.357  0.409  0.473  0.564

Geometric Demand, h(−) = 4
 L    κ̄      κ0     κ1     κ2     κ3     κ4     κ5     κ6     κ7     κ8     κ9
 4   0.648  0.566  0.566  0.675  0.786
 6   0.527  0.378  0.378  0.511  0.570  0.622  0.701
 8   0.412  0.281  0.281  0.345  0.395  0.428  0.459  0.511  0.599
10   0.330  0.228  0.228  0.250  0.270  0.289  0.310  0.335  0.384  0.447  0.558

Table 6    Lost-Sales LP-greedy Coefficients with Increasing h(−)

Poisson Demand, L = 4
 h(−)   κ̄      κ0     κ1     κ2     κ3
  4    0.764  0.691  0.691  0.787  0.885
  9    0.812  0.752  0.752  0.836  0.909
 19    0.849  0.794  0.794  0.875  0.932
 39    0.851  0.785  0.785  0.890  0.943

Geometric Demand, L = 4
 h(−)   κ̄      κ0     κ1     κ2     κ3
  4    0.648  0.566  0.566  0.675  0.786
  9    0.781  0.705  0.705  0.815  0.900
 19    0.847  0.816  0.816  0.859  0.897
 39    0.901  0.881  0.881  0.911  0.932
6.2. Perishable Inventory
In the numerical study of the perishable inventory model, the cost parameters are set to be consistent with Chen et al. (2014b). The order cost c, holding cost h+ , and disposal cost θ are set as
22.15, 0.22 and 5 respectively. The backorder cost h− is varied among {1.98, 4.18, 10.78, 21.78}. The
state dimension L (the total lifetime of a newly placed order) takes value in {4, 6, 8, 10}. We set
the on-hand lifetime M equal to 2. Note that since Chen et al. (2014b)’s study is about the joint
inventory-pricing problem for the perishable system, the demand model in that paper is therefore
not stationary. Instead, we assume the distribution of demand d to be truncated Normal on [0, 20]
with expected demand equal to 10. The coefficient of variation (c.v.) is used to control the ratio
between standard deviation and the expectation of d. We experiment with c.v. ∈ {0.2, 0.6, 1.0},
corresponding to demand standard deviations in {2, 6, 10}. We use the myopic policy as both the
generating policy for the approximate linear program and the benchmark.
Table 7    Average Cost (Perishable Inventory)

             c.v. = 0.2                c.v. = 0.6                c.v. = 1.0
 L   h(−)    LPg     T^Λ    Myop      LPg     T^Λ    Myop      LPg     T^Λ    Myop
 4   1.98   224.9   224.9  224.7     241.1   242.0  241.5     245.6   248.8  246.3
 4   4.18   226.0   226.0  226.1     250.2   250.1  252.4     257.5   257.1  260.3
 4  10.78   228.0   228.0  228.4     265.5   264.4  268.9     276.3   274.8  280.7
 4  21.78   230.3   230.2  230.5     277.4   275.8  283.3     290.7   288.6  298.2
 6   1.98   226.0   226.2  226.1     244.2   246.9  246.3     249.4   261.4  251.8
 6   4.18   227.7   228.0  228.2     255.5   254.5  258.8     263.7   262.3  267.1
 6  10.78   230.9   230.7  231.6     271.8   269.6  278.6     283.6   280.4  292.6
 6  21.78   234.5   234.0  234.5     285.2   281.5  293.2     298.5   294.3  309.1
 8   1.98   226.9   227.1  227.3     246.9   254.8  249.7     252.4   273.7  255.9
 8   4.18   229.1   229.1  230.0     259.4   257.3  263.7     268.1   265.4  273.5
 8  10.78   233.1   232.7  234.1     276.2   272.5  284.7     288.7   283.4  298.9
 8  21.78   238.1   236.9  238.1     291.0   284.6  301.5     304.5   297.5  319.0
10   1.98   227.7   227.9  228.6     249.2   269.3  253.0     255.1   280.4  259.5
10   4.18   230.3   230.2  231.4     262.5   259.3  267.7     271.4   267.2  277.6
10  10.78   236.5   234.2  236.3     279.9   274.5  290.2     293.6   285.4  304.2
10  21.78   243.9   239.3  240.7     300.6   286.8  306.7     313.3   299.6  324.6
Table 7 lists the long-run average costs of LP-greedy and T^Λ against myopic for each instance. We
also conducted the same robustness check as in the lost-sales model by repeating the simulation for
100 independent demand sample trajectories; Table 8 lists the t-statistics for the performance advantage
of LP-greedy and T^Λ over myopic. We notice that the ADP-LP-based policies outperform
myopic for the majority of instances, with a few exceptions. When the backorder cost h(−) = 1.98,
T^Λ has higher average costs than both myopic and LP-greedy. However, we do observe that, when
h(−) is high, T^Λ outperforms both myopic and LP-greedy by significant margins. Interestingly, this
is quite the opposite of the behavior of the T^L policy in the lost-sales model. Recall that, with
lost sales, T^L tends to perform better with low h(−) and can be unstable with high h(−). There is
however one common feature between low h(−) for the perishable inventory system and high h(−) for the
lost-sales system. In both instances, the system tends to stay in states where it behaves more
like a back-order system with non-perishable goods.
Table 8    Performance t-statistics for Perishable Inventory Heuristics

             c.v. = 0.2         c.v. = 0.6         c.v. = 1.0
 L   h(−)    LPg     T^Λ       LPg     T^Λ       LPg     T^Λ
 4   1.98   -4.76   -6.33      4.59   -2.17      6.32   -4.17
 4   4.18    2.31    1.15     10.02   10.54     10.93   12.28
 4  10.78    7.78    7.42     10.56   14.17     11.35   15.40
 4  21.78    2.26    3.40     13.05   16.57     13.95   17.39
 6   1.98    3.67   -0.71     12.92   -0.76     12.29   -3.00
 6   4.18    7.67    6.17     11.63   15.35     10.78   12.55
 6  10.78    8.21   10.41     15.34   20.16     13.82   19.02
 6  21.78   -0.31    4.54     11.61   17.02     12.58   17.95
 8   1.98    8.63    3.36     14.80   -2.48     12.71   -3.19
 8   4.18   10.90   10.76     11.08   16.60     11.48   18.70
 8  10.78   10.30   14.83     14.27   21.55     14.07   22.43
 8  21.78    0.21    7.09     11.86   19.29     13.66   20.92
10   1.98   10.28    8.16     11.90   -3.09     11.30   -2.73
10   4.18   11.76   13.70     13.56   22.53     11.99   22.23
10  10.78   -0.19   15.00     17.28   27.99     13.44   26.46
10  21.78   -1.93    5.83      2.72   21.33      5.45   22.71
6.3. Computation Times
We report the running times of our approach in Table 9. We separate the time
needed to generate and solve the linear program from the time needed to simulate the various policies. For
each model, we report the parameter settings corresponding to the smallest and largest problem
sizes. All programs are written in MATLAB on a Linux server with Intel Xeon X5460 processors,
calling CPLEX 12.5 as the linear programming solver.
Table 9    Running Time of ADP-LP and Heuristics (Seconds)

Lost-Sales                      L=4, h(−)=4, Poisson    L=10, h(−)=39, Geometric
ADP-LP      Time                        2.46                    15.05
Policy      LPg                         3.14                     9.46
            LPg2                       10.03                   293.67
            T^L                         2.78                     7.53
            T^{L+1}                    30.19                   333.97
            M1                         16.34                   119.16
            M2                        166.56                  2926.77

Perishable                      M=2, L=4,               M=2, L=10,
                                h(−)=1.98, c.v.=0.2     h(−)=21.78, c.v.=1.0
ADP-LP      Time                        0.72                    23.15
Policy      LPg                         1.21                     1.12
            T^Λ                         1.28                     2.08
7. Concluding Remarks
We have tried a number of alternative techniques, not reported here; see Wang (2014). In particular, he considers the Smoothed ADP-LP approach recently proposed by Desai et al. (2012), which does not appear to significantly outperform the standard approach used in this paper.
Several variants of the approach are worth investigating. First, it would be interesting to try families of basis functions other than the quadratics. Second, there are other approaches to ADP, such as regression. Third, several alternative sampling methods are available, such as quasi-Monte Carlo.
Finally, the potential contribution of this work is not limited to the two problems we have studied. Several other models have been found to enjoy the L♮-convexity structure, and the same idea could be applied to any of them. For example, Chen et al. (2014b) address a general problem of joint inventory-pricing for perishables with lost sales and positive lead times. Our methods, especially the LP-greedy heuristic, can be applied directly to the positive lead-time cases of this general problem.
Appendix
A. Proof of Proposition 1
We give the argument for integer variables; the proof for real variables is essentially the same. Consider $\bar{\psi}(s, \xi) = \bar{f}(s - \xi e)$. We want to show that $\bar{\psi}$ is submodular if and only if $\psi$ is. Compute the second cross partial differences of $\bar{\psi}$: each of these is equal to one of the second cross partial differences of $\psi$, and vice versa. Thus, all the second cross partial differences of $\bar{\psi}$ are nonpositive if and only if all those of $\psi$ are.
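Although the proof is complete as stated, the equivalence is easy to check numerically. The Python sketch below tests all second cross partial differences on a small integer grid, using $f(s) = \max(s)$ as an arbitrary submodular test function of our own choosing (it is not from the paper), together with its transform $\psi(s, \xi) = f(s - \xi e)$.

```python
import itertools
import numpy as np

def submodular_on_grid(g, dim, lo=0, hi=3):
    """Verify g(x+e_i+e_j) - g(x+e_i) - g(x+e_j) + g(x) <= 0 for all i < j
    at every integer point of a small grid, i.e., that all second cross
    partial differences of g are nonpositive (submodularity)."""
    I = np.eye(dim, dtype=int)
    for x in itertools.product(range(lo, hi + 1), repeat=dim):
        x = np.array(x, dtype=int)
        for i in range(dim):
            for j in range(i + 1, dim):
                if g(x + I[i] + I[j]) - g(x + I[i]) - g(x + I[j]) + g(x) > 1e-9:
                    return False
    return True

# Toy submodular test function f(s) = max(s); its transform
# psi(s, xi) = f(s - xi * e) should also pass the cross-difference test.
f = lambda s: float(np.max(s))
psi = lambda y: f(y[:-1] - y[-1])   # last coordinate plays the role of xi
print(submodular_on_grid(f, dim=3), submodular_on_grid(psi, dim=4))
```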
B. Proof of Proposition 3
Let $\hat{y}^{LPg}(v) = \arg\min_y \big\{ \gamma^L c(y - v_{L-1}) + q^0(v_0) + \gamma E\big[\tilde{f}^{ADP}(v_+)\big] \big\}$; then
\[
\begin{aligned}
\hat{y}^{LPg}(v) &= \arg\min_y \Big\{ \gamma^L c(y - v_{L-1}) + q^0(v_0) + \gamma E\big[ v_+^\top \tilde{Q} v_+ + \tilde{\lambda}^\top v_+ + \tilde{\nu} \big] \Big\} \\
&= \arg\min_y \Big\{ \gamma \tilde{Q}_{L,L}\, y^2 + \big[ 2\gamma \big( \tilde{Q}_{L,1:L-1} v_{1:L-1} - e^\top \tilde{Q}_{1:L,L}\, E[v_0 \wedge d] \big) + \gamma^L c + \gamma \tilde{\lambda}_{L-1} \big] y \Big\} \\
&= \xi_0 E[v_0 \wedge d] + \xi_{1:L-1}^\top v_{1:L-1} + \varsigma
 = \xi_0 E[x_0 \wedge d] + \Big( \sum_{l=1}^{L-1} \xi_l \Big) x_1 + \cdots + \xi_{L-1} x_{L-1} + \varsigma .
\end{aligned}
\]
Then
\[
\begin{aligned}
\hat{z}^{LPg} = \hat{y}^{LPg} - v_{L-1}
&= \varsigma + \xi_0 E[x_0 \wedge d] - \Big( 1 - \sum_{l=1}^{L-1} \xi_l \Big) x_1 - \cdots - (1 - \xi_{L-1}) x_{L-1} \\
&= \varsigma - \kappa_0 E[x_0 - d]^+ + \kappa_{1:L-1}^\top x_{1:L-1} .
\end{aligned}
\]
Since $\gamma^L c z + \hat{q}^0(v_0) + \gamma E\big[\tilde{f}^{ADP}(v_+)\big]$ is convex in $z$,
\[
\begin{aligned}
z^{LPg} &= \arg\min_{0 \le z \le z^{SVBS}} \Big\{ \gamma^L c z + \hat{q}^0(v_0) + \gamma E\big[\tilde{f}^{ADP}(v_+)\big] \Big\} \\
&= 0 \vee \arg\min_z \Big\{ \gamma^L c z + \hat{q}^0(v_0) + \gamma E\big[\tilde{f}^{ADP}(v_+)\big] \Big\} \wedge z^{SVBS}
 = 0 \vee \hat{z}^{LPg} \wedge z^{SVBS} .
\end{aligned}
\]
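Operationally, the proposition says the LP-greedy order quantity is the unconstrained minimizer of a scalar convex quadratic, clipped to $[0, z^{SVBS}]$. A minimal Python sketch under that reading; the coefficients a and b are stand-ins for the quadratic and linear coefficients assembled in the display above, not values computed by the paper's procedure:

```python
import numpy as np

def lp_greedy_order(a, b, z_svbs):
    """LP-greedy order quantity as a clipped scalar-quadratic minimizer:
    minimize a*z**2 + b*z with a > 0, then clip to [0, z_svbs].  Here a
    and b stand in for the coefficients built from (Q_tilde, lambda_tilde)
    and the expectation terms above."""
    z_hat = -b / (2.0 * a)               # unconstrained minimizer
    return float(np.clip(z_hat, 0.0, z_svbs))

print(lp_greedy_order(a=0.8, b=-3.0, z_svbs=10.0))   # -> 1.875
```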
C. Proof of Proposition 4
Base Case: With
\[
v_+ = (v_{0,+}, \cdots, v_{L-1,+}, v_{L-1,+} + z),
\]
\[
\begin{aligned}
\tilde{f}^{(0)}(v_+) &= v_+^\top Q^{(0)} v_+ + \lambda^{(0)} v_+ + \nu^{(0)} \\
&= Q^{(0)}_{0,(L,L)} z^2 + 2\big( Q^{(0)}_{0,(L,1:L)} J_{L-1} v_{0:L-2,+} \big) z + v_{0:L-2,+}^\top \big( J_{L-1}^\top Q^{(0)}_0 J_{L-1} \big) v_{0:L-2,+} \\
&\quad + \lambda^{(0)}_{(L)} z + \lambda^{(0)} J_{L-1} v_{0:L-2,+} + \nu^{(0)} .
\end{aligned}
\]
Then
\[
\begin{aligned}
\tilde{f}^{(1)}(v) &= \min_z \big\{ \gamma^L c z + \hat{q}^0(v_0) + \gamma E[\tilde{f}^{(0)}(v_+)] \big\} \\
&= \min_z \Big\{ \gamma Q^{(0)}_{0,(L,L)} z^2 + \big( 2\gamma Q^{(0)}_{0,(L,1:L)} J_{L-1} E[v_{0:L-2,+1}] + \gamma \lambda^{(0)}_{(L)} + \gamma^L c \big) z \Big\} \\
&\quad + \gamma E\Big[ v_{0:L-2,+}^\top \big( J_{L-1}^\top Q^{(0)}_0 J_{L-1} \big) v_{0:L-2,+} \Big] + \gamma \lambda^{(0)} J_{L-1} E[v_{0:L-2,+}] + \hat{q}^0(v_0) + \gamma \nu^{(0)} \\
&= E[v_{0:L-2,+1}]^\top \Bigg( - \frac{ \gamma\, J_{L-1}^\top Q^{(0)}_{0,(1:L,L)} Q^{(0)}_{0,(L,1:L)} J_{L-1} }{ Q^{(0)}_{0,(L,L)} } \Bigg) E[v_{0:L-2,+1}] \\
&\quad + E\Big[ v_{0:L-2,+}^\top \big( \gamma J_{L-1}^\top Q^{(0)}_0 J_{L-1} \big) v_{0:L-2,+} \Big] + \hat{q}^0(v_0) + \gamma \Bigg( \nu^{(0)} - \frac{ \big( \lambda^{(0)}_{(L)} + \gamma^{L-1} c \big)^2 }{ 4 Q^{(0)}_{0,(L,L)} } \Bigg) \\
&\quad + \gamma \Bigg( \lambda^{(0)} - \frac{ \big( \lambda^{(0)}_{(L)} + \gamma^{L-1} c \big) Q^{(0)}_{0,(L,1:L)} }{ Q^{(0)}_{0,(L,L)} } \Bigg) J_{L-1} E[v_{0:L-2,+1}] \\
&= E\Big[ E[v_{0:L-2,+1} | v]^\top Q^{(1)}_0 E[v_{0:L-2,+1} | v] + E[v_{0:L-2,+1} | v_{+1}]^\top Q^{(1)}_1 E[v_{0:L-2,+1} | v_{+1}] \Big] \\
&\quad + \lambda^{(1)} E[v_{0:L-2,+1}] + \hat{q}^0(v_0) + \nu^{(1)} .
\end{aligned}
\]
It is also easy to see that the matrix $Q^{(1)}_0 + Q^{(1)}_1$ is positive semidefinite.
Induction Step: Assume the result holds for $\tilde{f}^{(l)}(v)$ and the matrix $\sum_{\tau=0}^{l} Q^{(l)}_\tau$ is positive semidefinite. Then we have
\[
\begin{aligned}
\tilde{f}^{(l+1)}(v) &= \min_z \big\{ \gamma^L c z + \hat{q}^0(v_0) + \gamma E[\tilde{f}^{(l)}(v_+)] \big\} \\
&= \min_z \Big\{ \gamma^L c z + \hat{q}^0(v_0) + \gamma \sum_{\tau=0}^{l-1} \gamma^\tau E[\hat{q}^0(v_{0,+\tau+1})] + \gamma \lambda^{(l)} E[v_{0:L-(l+1),+l+1}] + \gamma \nu^{(l)} \\
&\qquad\quad + \gamma E\Big[ \sum_{\tau=0}^{l} E[v_{0:L-(l+1),+l+1} | v_{+\tau+1}]^\top Q^{(l)}_\tau E[v_{0:L-(l+1),+l+1} | v_{+\tau+1}] \Big] \Big\} \\
&= \gamma \min_z \Big\{ \sum_{\tau=0}^{l} Q^{(l)}_{\tau,(L-l,L-l)} z^2 + \Big( 2 \sum_{\tau=0}^{l} Q^{(l)}_{\tau,(L-l,1:L-l)} J_{L-l-1} E[v_{0:L-(l+2),+l+1}] + \lambda^{(l)}_{(L-l)} + \gamma^{L-1} c \Big) z \Big\} \\
&\quad + \gamma E\Big[ \sum_{\tau=0}^{l} E[v_{0:L-(l+2),+l+1} | v_{+\tau+1}]^\top J_{L-l-1}^\top Q^{(l)}_\tau J_{L-l-1} E[v_{0:L-(l+2),+l+1} | v_{+\tau+1}] \Big] \\
&\quad + \gamma \lambda^{(l)} J_{L-l-1} E[v_{0:L-(l+2),+l+1}] + \sum_{\tau=0}^{l-1} \gamma^{\tau+1} E[\hat{q}^0(v_{0,+\tau+1})] + \hat{q}^0(v_0) + \gamma \nu^{(l)} \\
&= \sum_{\tau=0}^{l+1} E\Big[ E[v_{0:L-(l+2),+l+1} | v_{+\tau}]^\top Q^{(l+1)}_\tau E[v_{0:L-(l+2),+l+1} | v_{+\tau}] \Big] + \lambda^{(l+1)} E[v_{0:L-(l+2),+l+1}] \\
&\quad + \sum_{\tau=0}^{l} \gamma^\tau E[\hat{q}^0(v_{0,+\tau})] + \nu^{(l+1)} .
\end{aligned}
\]
Moreover, the matrix $\sum_{\tau=0}^{l+1} Q^{(l+1)}_\tau$ is positive semidefinite.
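The heart of both the base case and the induction step is the same algebraic move: minimizing a convex quadratic over the scalar z, which replaces the quadratic coefficient matrix by a Schur complement. A generic numerical sketch of that single step follows; the variable names are generic and this is not the paper's recursion for $Q^{(l)}_\tau$, $\lambda^{(l)}$, $\nu^{(l)}$ itself, only the completion-of-squares primitive it repeats.

```python
import numpy as np

def minimize_out_z(Q, lam, nu):
    """Given g(w, z) = [w; z]^T Q [w; z] + lam @ [w; z] + nu, with Q
    symmetric and Q[-1, -1] > 0, return (Qp, lp, nup) such that
    min_z g(w, z) = w^T Qp w + lp @ w + nup  (complete the square in z)."""
    n = Q.shape[0] - 1
    A, b, c = Q[:n, :n], Q[:n, -1], Q[-1, -1]
    Qp = A - np.outer(b, b) / c            # Schur complement of c in Q
    lp = lam[:n] - b * (lam[-1] / c)
    nup = nu - lam[-1] ** 2 / (4.0 * c)
    return Qp, lp, nup

# Verify against a brute-force scan over z at a random w:
rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
Q = M @ M.T + np.eye(3)                    # symmetric positive definite
lam, nu = rng.normal(size=3), 0.5
Qp, lp, nup = minimize_out_z(Q, lam, nu)
w = rng.normal(size=2)
zs = np.linspace(-40.0, 40.0, 80001)
vals = [np.r_[w, z] @ Q @ np.r_[w, z] + lam @ np.r_[w, z] + nu for z in zs]
print(np.isclose(min(vals), w @ Qp @ w + lp @ w + nup, atol=1e-4))
```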
D. Proof of Proposition 5
Functions $\tilde{f}^{(0)}, \ldots, \tilde{f}^{(\Lambda)}$ can be expressed as
\[
\tilde{f}^{(l)}(v) = \sum_{\tau=0}^{l} E\Big[ E[v_{0:L-(l+1),+l} | v_{+\tau}]^\top Q^{(l)}_\tau E[v_{0:L-(l+1),+l} | v_{+\tau}] \Big] + \lambda^{(l)} E[v_{0:L-(l+1),+l}] + \sum_{\tau=0}^{l-1} \gamma^\tau E[\chi^0(v_{+\tau})] + \nu^{(l)} , \tag{D.1}
\]
where the $(L-l) \times (L-l)$ matrices $Q^{(l)}_\tau$, the $1 \times (L-l)$ vectors $\lambda^{(l)}$, and the scalars $\nu^{(l)}$ are defined according to Equation (4.6) for $l = 0, \ldots, \Lambda$, with the only change that $\gamma^{L-1}$ in (4.6) is replaced by $\gamma^{L-M}$ in the expressions for $\lambda^{(l+1)}$ and $\nu^{(l+1)}$.

Following (D.1) for $l = L - M$, we have the expression for the function $\tilde{f}^{(L-M)}(v_+)$, which yields closed-form expressions for the order quantity $z^{T^\Lambda}$. In particular, when $M = 1$, the matrices $Q^{(L-1)}_\tau$ and the vector $\lambda^{(L-1)}$ reduce to scalars. Let $\rho = x_0 - d$; then $v_{0,+\Lambda} = -\rho^-_{+L-M} + z$ and
\[
\begin{aligned}
z^{T^\Lambda} &= \arg\min_{z \ge 0} \Big\{ \gamma \sum_{\tau=0}^{L-1} Q^{(L-1)}_\tau z^2 - 2\gamma \sum_{\tau=0}^{L-1} Q^{(L-1)}_\tau E[\rho^-_{+L-1}] z + \gamma \big( \lambda^{(L-1)} + \gamma^{L-1} c \big) z \Big\} \\
&= \Big[ \varsigma^{T^\Lambda} + E[\rho^-_{+L-1}] \Big]^+ ,
\end{aligned}
\]
in which $\varsigma^{T^\Lambda} = -\big( \lambda^{(L-1)} + \gamma^{L-1} c \big) \big/ \big( 2 \sum_{\tau=0}^{L-1} Q^{(L-1)}_\tau \big)$, similar to the $\varsigma^{T^L}$ in the lost-sales model.

If $M \ge 2$, we use $\hat{J}_l$ to denote the $l \times l$ lower-triangular matrix of ones,
\[
\hat{J}_l = \begin{pmatrix} 1 & & & \\ 1 & 1 & & \\ \vdots & & \ddots & \\ 1 & 1 & \cdots & 1 \end{pmatrix} ,
\]
then
\[
v_{0:M-1,+\Lambda} = \hat{J}_M x_{0:M-1,+\Lambda} = \hat{J}_M ( x_{0,+\Lambda}, x_{\Lambda+1}, \cdots, x_{L-1}, z )^\top .
\]
Let $\hat{Q}^{(L-M)}_\tau = \hat{J}_M^\top Q^{(L-M)}_\tau \hat{J}_M$ and $\hat{\lambda}^{(L-M)} = \lambda^{(L-M)} \hat{J}_M$; then
\[
\begin{aligned}
z^{T^\Lambda} &= \arg\min_{z \ge 0} \Big\{ \gamma \sum_{\tau=0}^{L-M} \hat{Q}^{(L-M)}_{\tau,(M,M)} z^2 + 2\gamma \sum_{\tau=0}^{L-M} \hat{Q}^{(L-M)}_{\tau,(M,1:M-1)} E[x_{0:M-2,+\Lambda}] z + \gamma \big( \hat{\lambda}^{(L-M)}_{(M)} + \gamma^{L-M} c \big) z \Big\} \\
&= \Big[ \varsigma^{T^\Lambda} - \big( \kappa^{T^\Lambda}_1 E[x_{0,+\Lambda}] + \kappa^{T^\Lambda}_{2:M-1} x_{\Lambda+1:L-1} \big) \Big]^+ ,
\end{aligned}
\]
in which
\[
\varsigma^{T^\Lambda} = \frac{ -\big( \hat{\lambda}^{(L-M)}_{(M)} + \gamma^{L-M} c \big) }{ 2 \sum_{\tau=0}^{L-M} \hat{Q}^{(L-M)}_{\tau,(M,M)} } , \qquad
\kappa^{T^\Lambda}_l = \frac{ \sum_{\tau=0}^{L-M} \hat{Q}^{(L-M)}_{\tau,(M,l)} }{ \sum_{\tau=0}^{L-M} \hat{Q}^{(L-M)}_{\tau,(M,M)} } , \quad l = 1, \cdots, M-1 .
\]
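For $M = 1$, then, the $T^\Lambda$ rule is base-stock-like: a constant $\varsigma^{T^\Lambda}$ plus an expected-shortfall correction, truncated at zero. A small Python sketch under that reading; the value of $\varsigma$ and the distribution of $\rho^-$ below are illustrative placeholders, not quantities computed from the paper's instances.

```python
import numpy as np

def t_lambda_order_m1(varsigma, rho_minus_samples):
    """z^{T^Lambda} = [varsigma + E[rho^-]]^+ for the M = 1 case, with
    the expectation estimated by Monte Carlo.  varsigma plays the role
    of the constant defined in the display above."""
    return max(0.0, varsigma + float(np.mean(rho_minus_samples)))

# Illustrative inputs: rho = x0 - d, so rho^- = (d - x0)^+ with, say,
# x0 = 4 units on hand and d ~ Poisson(5):
rng = np.random.default_rng(2)
d = rng.poisson(5.0, size=100_000)
print(t_lambda_order_m1(varsigma=1.5, rho_minus_samples=np.maximum(d - 4, 0)))
```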
E. More on the $T^{\Lambda+1}$ Policy
It can be verified that
\[
\begin{aligned}
\tilde{f}^{(\Lambda)}(v) &= \min_z \big\{ \gamma^\Lambda c z + \gamma E[\tilde{f}^{(L-M)}(v_+)] \big\} \\
&= \sum_{\tau=0}^{\Lambda} Q^{(\Lambda)}_\tau E\Big[ \big( E[\rho^-_{+L-1} | v_{+\tau}] \big)^2 \Big] + \gamma^{L-M} c\, E[\rho^-_{+L-1}] + \sum_{\tau=0}^{\Lambda} \gamma^\tau E[\chi^0(v_{+\tau})] + \nu^{(L)} ,
\end{aligned}
\]
with
\[
Q^{(\Lambda)}_0 = -\gamma \sum_{\tau=1}^{\Lambda} Q^{(L-M)}_\tau , \qquad Q^{(\Lambda)}_\tau = \gamma Q^{(L-M)}_{\tau-1} \quad \forall \tau = 1, \cdots, \Lambda .
\]
Therefore, for $M = 1$, we have
\[
z^{T^{\Lambda+1}} = \arg\min_{z \ge 0} \Big\{ \gamma^\Lambda c z + \gamma^\Lambda E\big[ \theta \rho^+_{+\Lambda} + h^{(-)} \rho^-_{+\Lambda} \big] + \gamma \sum_{\tau=1}^{\Lambda} Q^{(\Lambda)}_\tau E\Big[ \big( E[\rho^-_{+\Lambda} | v_{+\tau}] \big)^2 \Big] + \gamma^L c\, E[\rho^-_{+\Lambda}] \Big\} .
\]
For $M \ge 2$, the $\hat{Q}^{(\Lambda)}_\tau$ and $\hat{\lambda}^{(\Lambda)}$ can be defined in the same way as $\hat{Q}^{(L-M)}_\tau$ and $\hat{\lambda}^{(L-M)}$; note that they are scalars when $M = 2$. Therefore, if $M = 2$,
\[
\begin{aligned}
z^{T^{\Lambda+1}} = \arg\min_{z \ge 0} \Big\{ & \gamma \sum_{\tau=0}^{\Lambda} \hat{Q}^{(\Lambda)}_\tau z^2 - 2\gamma \sum_{\tau=0}^{\Lambda} \hat{Q}^{(\Lambda)}_\tau E[\rho^-_{+\Lambda+1}] z + \big( \gamma \hat{\lambda}^{(\Lambda)} + \gamma^\Lambda c \big) z \\
& + \gamma^\Lambda E\Big[ h^{(+)} [v_{M-1,+\Lambda} - d_{+\Lambda}]^+ + h^{(-)} [v_{M-1,+\Lambda} - d_{+\Lambda}]^- \Big] \Big\} ,
\end{aligned}
\]
and if $M > 2$,
\[
\begin{aligned}
z^{T^{\Lambda+1}} = \arg\min_{z \ge 0} \Big\{ & \gamma \sum_{\tau=0}^{\Lambda} \hat{Q}^{(\Lambda)}_{\tau,(M-1,M-1)} z^2 + 2\gamma \sum_{\tau=0}^{\Lambda} \hat{Q}^{(\Lambda)}_{\tau,(M-1,1:M-2)} E[x_{0:M-3,+\Lambda+1}] z + \big( \gamma \hat{\lambda}^{(\Lambda)}_{(M-1)} + \gamma^\Lambda c \big) z \\
& + \gamma^\Lambda E\Big[ h^{(+)} [v_{M-1,+\Lambda} - d_{+\Lambda}]^+ + h^{(-)} [v_{M-1,+\Lambda} - d_{+\Lambda}]^- \Big] \Big\} .
\end{aligned}
\]
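Each of these displays is a one-dimensional convex minimization over $z \ge 0$, so in an implementation one can estimate the expectation terms by simulation and apply any bounded scalar solver. A generic Python sketch of this pattern follows; the quadratic coefficients, the cost parameters, and the demand model are placeholders of our own choosing, not calibrated values from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
d = rng.poisson(5.0, size=50_000)          # sampled lead-time demand
a, b = 0.05, 0.8                           # placeholder quadratic terms
h_plus, h_minus, v = 1.0, 9.0, 3.0         # placeholder costs and state

def objective(z):
    """Convex in z: quadratic part plus expected holding/shortage cost,
    the expectation estimated by Monte Carlo over the demand samples."""
    return (a * z**2 + b * z
            + np.mean(h_plus * np.maximum(v + z - d, 0.0)
                      + h_minus * np.maximum(d - v - z, 0.0)))

res = minimize_scalar(objective, bounds=(0.0, 200.0), method="bounded")
print(res.x)                               # the minimizing order quantity
```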
References
Brown, D. B. and Smith, J. E. (2014). Information Relaxations, Duality, and Convex Stochastic Dynamic
Programs. to appear in Operations Research.
Chao, X., Gong, X., Shi, C., Yang, C., Zhang, H., and Zhou, S. X. (2015). Approximation Algorithms for
Capacitated Perishable Inventory Systems with Positive Lead Times. working paper, University of
Michigan.
Chen, W., Dawande, M., and Janakiraman, G. (2014a). Fixed-Dimensional Stochastic Dynamic Programs:
An Approximation Scheme and an Inventory Application. Operations Research, 62(1):81–103.
Chen, X., Pang, Z., and Pan, L. (2014b). Coordinating Inventory Control and Pricing Strategies for Perishable
Products. Operations Research, 62(2):284–300.
de Farias, D. P. and Van Roy, B. (2003a). Approximate Linear Programming for Average-Cost Dynamic
Programming. In Advances in Neural Information Processing Systems 15. MIT Press.
de Farias, D. P. and Van Roy, B. (2003b). The Linear Programming Approach to Approximate Dynamic
Programming. Operations Research, 51(6):850–865.
de Farias, D. P. and Van Roy, B. (2004). On Constraint Sampling in the Linear Programming Approach to
Approximate Dynamic Programming. Mathematics of Operations Research, 29(3):462–478.
de Farias, D. P. and Van Roy, B. (2006). A Cost-Shaping Linear Program for Average-Cost Approximate
Dynamic Programming with Performance Guarantees. Mathematics of Operations Research, 31(3):597–
620.
Desai, V. V., Farias, V. F., and Moallemi, C. C. (2012). Approximate Dynamic Programming via a Smoothed
Linear Program. Operations Research, 60(3):655–674.
Goldberg, D. A., Katz-Rogozhnikov, D. A., Lu, Y., Sharma, M., and Squillante, M. S. (2012). Asymptotic
Optimality of Constant-Order Policies for Lost Sales Inventory Models with Large Lead Times. working
paper, Georgia Institute of Technology.
Huh, W. T. and Janakiraman, G. (2010). On the Optimal Policy Structure in Serial Inventory Systems with
Lost Sales. Operations Research, 58(2):486–491.
Huh, W. T., Janakiraman, G., Muckstadt, J. A., and Rusmevichientong, P. (2009). Asymptotic Optimality
of Order-Up-To Policies in Lost Sales Inventory Systems. Management Science, 55(3):404–420.
Karaesmen, I. Z., Scheller-Wolf, A., and Deniz, B. (2011). Managing Perishable and Aging Inventories:
Review and Future Research Directions. In Kempf, K. G., Keskinocak, P., and Uzsoy, R., editors,
Planning Production and Inventories in the Extended Enterprise, volume 151 of International Series
in Operations Research & Management Science, chapter 15, pages 393–436. Springer US, Boston, MA.
Karlin, S. and Scarf, H. (1958). Inventory Models of the Arrow-Harris-Marschak Type with Time Lag. In
Arrow, K. J., Karlin, S., and Scarf, H., editors, Studies in the Mathematical Theory of Inventory and
Production, chapter 10. Stanford University Press, Stanford, CA.
Levi, R., Janakiraman, G., and Nagarajan, M. (2008). A 2-Approximation Algorithm for Stochastic Inventory
Control Models with Lost Sales. Mathematics of Operations Research, 33(2):351–374.
Lu, Y. and Song, J.-S. (2005). Order-Based Cost Optimization in Assemble-to-Order Systems. Operations
Research, 53(1):151–169.
Morton, T. E. (1969). Bounds on the Solution of the Lagged Optimal Inventory Equation with No Demand
Backlogging and Proportional Costs. SIAM Review, 11(4):572–597.
Morton, T. E. (1971). The Near-Myopic Nature of the Lagged-Proportional-Cost Inventory Problem with
Lost Sales. Operations Research, 19(7):1708–1716.
Murota, K. (1998). Discrete Convex Analysis. Mathematical Programming, 83(1-3):313–371.
Murota, K. (2003). Discrete Convex Analysis. SIAM, Philadelphia.
Murota, K. (2005). Note on Multimodularity and L-Convexity. Mathematics of Operations Research,
30(3):658–661.
Nahmias, S. (1976a). Myopic Approximations for the Perishable Inventory Problem. Management Science,
22(9):1002–1008.
Nahmias, S. (1976b). Simple Approximations for a Variety of Dynamic Leadtime Lost-Sales Inventory
Models. Operations Research, 27(5):904–924.
Nahmias, S. (1982). Perishable Inventory Theory: A Review. Operations Research, 30(4):680–708.
Nahmias, S. (2011). Perishable Inventory Systems, volume 160 of International Series in Operations Research
& Management Science. Springer US, Boston, MA.
Nahmias, S. and Pierskalla, W. P. (1973). Optimal Ordering Policies for a Product That Perishes in Two
Periods Subject to Stochastic Demand. Naval Research Logistics Quarterly, 20(2):207–229.
Pang, Z., Chen, F. Y., and Feng, Y. (2012). Technical Note–A Note on the Structure of Joint Inventory-Pricing Control with Leadtimes. Operations Research, 60(3):581–587.
Reiman, M. I. (2004). A New and Simple Policy for the Continuous Review Lost Sales Inventory Model.
working paper, Bell Labs.
Wang, K. (2014). Heuristics for Inventory Systems Based on Quadratic Approximation of L\ -Convex Value
Functions. PhD thesis, Duke University.
Zipkin, P. (2008a). Old and New Methods for Lost-Sales Inventory Systems. Operations Research, 56(5):1256–
1263.
Zipkin, P. (2008b). On the Structure of Lost-Sales Inventory Models. Operations Research, 56(4):937–944.