IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 61, NO. 12, DECEMBER 2016
Optimal Control of an Inventory System With
Joint Production and Pricing Decisions
Ping Cao and Jingui Xie
Abstract—In this study, we consider a stochastic inventory system in which the objective of the manufacturer is to maximize
the long-run average profit by dynamically offering the selling
price and switching the production on or off. The demand process
is non-homogeneous Poisson with a price-dependent arrival rate.
System costs consist of switching costs, inventory holding and
backlogging costs. We show that an (s, S, p) policy is optimal.
Moreover, we characterize the structural properties of the optimal
profit function and pricing strategy, and show that the optimal
price is a quasiconcave function of the inventory level when the
production is off and is a quasiconvex function when the production is on.
Index Terms—Average criterion, limited capacity, optimal
(s, S, p) policy, production/inventory systems.
I. INTRODUCTION
With the rapid development of e-commerce and new technologies
(e.g., big data, cloud computing, and electronic funds transfer), a
number of industries have begun to adopt dynamic pricing strategies
and sell products directly to customers through their websites. For
example, Dell sells its products through its website, offering promotions every week and even changing prices daily. The Alibaba Group
operates leading online and mobile marketplaces in retail and wholesale trades and provides technology and services to enable buyers
and suppliers to conduct commerce more easily. These developments
have resulted in an increased interest in the construction of models to integrate production decisions, inventory control, and pricing
strategies to improve the profitability of companies. In this study,
we consider an inventory model with joint production and pricing
decisions, in which the demand process is non-homogeneous Poisson
with a price-dependent arrival rate, the production rate is constant and
the production can be switched on or off with a fixed cost.
When the production capacity is unlimited, this problem becomes
an inventory control problem with pricing and replenishment decisions, which has been well studied in the literature. For example,
Chen and Simchi-Levi [3] consider a periodic-review model with
price-dependent demand and a fixed ordering cost, and they show
under some conditions that the optimal policy for the infinite-horizon
problem is an (s, S, p) policy. In [4], Chen and Simchi-Levi consider
a continuous-review model in which the inter-arrival time and the
demand size depend on the selling price. They show that optimal
stationary (s, S, p) inventory policies exist for both discounted and average profit problems and prove that higher inventory holding and shortage costs result in a lower selling price for the average profit problem. Chao and Zhou [2] consider an infinite-horizon, continuous-review stochastic inventory system in which the demand process is Poisson with a price-dependent arrival rate. They show that the optimal price is a unimodal function of the inventory level. However, none of these studies considers the case in which production capacity is limited.

Manuscript received May 6, 2015; revised September 6, 2015 and December 7, 2015; accepted December 13, 2015. Date of publication December 17, 2015; date of current version December 2, 2016. This work was supported by NSFC under Grants 71201154, 71401159, and 71571176, and by the Fundamental Research Funds for the Central Universities under Grants WK2040160009 and WK2040160011. Recommended by Associate Editor L. H. Lee.
The authors are with the School of Management, University of Science and Technology of China, Hefei 230026, China (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TAC.2015.2510162

Several papers also consider a limited production capacity in inventory models. Federgruen and Zipkin [5] consider a periodic-review
inventory model with a finite production capacity and show that under
a few additional unrestrictive assumptions, a modified base-stock policy is optimal under the average-cost criterion. Foreest and Wijngaard
[6] consider a continuous-review production-inventory system with
compound Poisson demand and fixed switching costs and show that
an (s, S) policy is optimal if the holding cost is convex. However, they
do not incorporate pricing decisions into their models.
In this study, we consider a stochastic inventory system in which
the manufacturer can dynamically adjust the selling price and switch
the production on or off. To the best of our knowledge, this study
is the first to incorporate pricing and production decisions into the
inventory control. We assume that the production rate is large. Under
this assumption, we show that an (s, S, p) policy is optimal for a
discretized model. Under an (s, S, p) policy, the production is switched on
if the inventory level falls to s or below and switched off if the
inventory level reaches S or above; if the inventory level is x
(s < x < S), the production status remains unchanged. The price p depends on
the current inventory level. Moreover, we characterize the optimal
inventory control and pricing strategy. We show that the optimal price
is a quasiconcave function of the inventory level when the production
is off and is a quasiconvex function when the production is on.
The rest of the technical note is organized as follows. In Section II,
we describe the inventory model and apply a discretization technique.
In Section III, we show that the (s, S, p) policy is optimal and
characterize the structural properties of the optimal profit function and
pricing strategy. In Section IV, we conclude the technical note with
several possible extensions.
II. MODEL
Consider a manufacturer who can control its inventory level by
production and pricing jointly. The selling price of each product is
set to p_t at time t. Demand arrives according to a non-homogeneous
Poisson process with rate Λ_t(p_t), which depends on the price p_t.
Demand is met immediately if inventory is available; otherwise,
it is backlogged. At any time t, the production can be switched
on (off) at cost K1 (K0). The production rate is a constant r, i.e., the
manufacturer can produce r products in one unit of time if production
is on.
0018-9286 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This dynamic inventory control problem is analytically challenging. In this technical note, we apply a discretization technique and solve the corresponding discretized problem. Let Δt = 1/r. Assume r is large enough so that Δt is very small. Due to industrialization and mass production, this assumption is not unrealistic. By the properties of a non-homogeneous Poisson process, there is one arrival in the time interval (t, t + Δt] with probability (w.p.) Λ_t Δt + o(Δt), more than one arrival w.p. o(Δt), and no arrival w.p. 1 − Λ_t Δt + o(Δt). Hence, as Δt → 0, there is approximately one arrival in (t, t + Δt] w.p. Λ_t Δt, more than one arrival w.p. 0, and no arrival w.p. 1 − Λ_t Δt.
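The quality of this Bernoulli approximation can be checked numerically. The sketch below is illustrative only: it treats the arrival rate as constant over one short interval, and the rate value 0.5 and the function names are assumptions introduced here, not part of the model.

```python
import math


def exact_arrival_probs(rate, dt):
    """Exact P(0), P(1), P(>=2) arrivals in an interval of length dt,
    treating the rate as constant over that short interval."""
    mean = rate * dt
    p_zero = math.exp(-mean)
    p_one = mean * math.exp(-mean)
    return p_zero, p_one, 1.0 - p_zero - p_one


def bernoulli_approximation(rate, dt):
    """Discretized model: at most one arrival per period."""
    mean = rate * dt
    return 1.0 - mean, mean, 0.0


if __name__ == "__main__":
    arrival_rate = 0.5  # hypothetical value of the arrival rate
    for production_rate in (10, 100, 1000):
        dt = 1.0 / production_rate  # period length, dt = 1/r
        exact = exact_arrival_probs(arrival_rate, dt)
        approx = bernoulli_approximation(arrival_rate, dt)
        gap = max(abs(e - a) for e, a in zip(exact, approx))
        print(f"r={production_rate:5d}  largest probability gap={gap:.2e}")
```

The largest gap between the exact and approximate probabilities is of order (Λ_t Δt)^2, which is why a large production rate r makes the discretized model a reasonable stand-in.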
Now, we can define the discretized problem. Let the length of each
period be Δt. Suppose that price p ∈ P is chosen at the beginning
of a period, where P is assumed to be a compact set. Let λ(p) :=
Λ_t(p)Δt. Then a demand arrives with probability λ(p), and no demand
arrives with probability 1 − λ(p). Demand is met by end-of-period inventory, and unsatisfied demand is backlogged. The production
rate is now 1, i.e., the manufacturer can produce one product in one
period if the production is on. At the beginning of each period the state
of production is chosen to be on or off. If production is off (on) at the
end of a period, it can be switched on (off) at the beginning of the next
period at the expense of a setup cost K1 (K0 ). When production is on
(off) in a period, keeping it on (off) is free.
We suppose that the production cost per item is c. Inventory costs
are accrued at the end of a period according to the function h(·). A
typical form for h(·) is h(x) = hx^+ + bx^−, where x^+ = max{x, 0},
x^− = max{−x, 0}, and h and b are the holding cost and shortage cost
per item per period, respectively. In the remainder of the technical note,
we make the following assumption on h(x).
Assumption 2.1: h(x) is quasiconvex in x with its minimum point
x = 0, h(0) = 0, and lim_{|x|→∞} h(x) = ∞.
A function f(x) (x ∈ Z) is called quasiconvex if, for
some value m, it is decreasing for x ≤ m and increasing for x ≥ m.
A function f(x) is quasiconcave (unimodal) if −f(x) is quasiconvex.
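As a small illustration of these definitions on the integers, the following sketch checks quasiconvexity of a finite sequence and applies it to the piecewise-linear cost h(x) = hx^+ + bx^−. The function names are hypothetical, and the values h = 0.1 and b = 0.15 are borrowed from the numerical example later in the note.

```python
def is_quasiconvex(values):
    """True if the sequence is nonincreasing up to some index and
    nondecreasing from that index on."""
    n = len(values)
    i = 0
    while i + 1 < n and values[i + 1] <= values[i]:
        i += 1  # walk down the decreasing part
    while i + 1 < n and values[i + 1] >= values[i]:
        i += 1  # walk up the increasing part
    return n == 0 or i == n - 1


def is_quasiconcave(values):
    """A sequence is quasiconcave if its negation is quasiconvex."""
    return is_quasiconvex([-v for v in values])


def holding_cost(x, h=0.1, b=0.15):
    """h(x) = h * x^+ + b * x^-."""
    return h * max(x, 0) + b * max(-x, 0)


if __name__ == "__main__":
    costs = [holding_cost(x) for x in range(-5, 6)]
    print(is_quasiconvex(costs), is_quasiconcave(costs))
```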
The system state can be described by its current inventory level
x, and its production status y, thereby denoted by (x, y), where
y = 1 indicates that the production is on and y = 0 indicates that the
production is off.
Assumption 2.2: λ(p) is strictly decreasing and continuous in p.
Moreover, λ̄ ≡ max_{p∈P} λ(p) < 1.
Assumption 2.3: There exists a positive number M such that
max_{p∈P} pλ(p) = M < ∞.
Assumption 2.4: K0 + K1 > 0.
Given a pricing and production policy u and the initial system state
(x, y), the long-run average profit can be formulated as
\[
V_c(x, y, u) = \liminf_{T \to \infty} \frac{1}{T}\, E\left[ \sum_{t=1}^{T} \Big( p_t D_t(p_t) - h(x_{t+1}) - c\,\delta(y_t = 1) - K_1\,\delta(y_t = 1, y_{t-1} = 0) - K_0\,\delta(y_t = 0, y_{t-1} = 1) \Big) \right]
\]
where x_t, y_t, and p_t are the inventory level, production state, and
selling price at the beginning of period t, respectively, and D_t(p_t) is the
demand in period t, which is 1 w.p. λ(p_t) and 0 w.p. 1 − λ(p_t). δ(A)
is an indicator function, which is 1 if the event A is true and 0 if A
is false.
The manufacturer’s objective is to maximize its long-run average
profit by dynamically adjusting its price and switching the production
on or off, i.e., find an optimal policy u∗ such that
\[
V_c(x, y) \equiv V_c(x, y, u^*) = \sup_{u \in U} V_c(x, y, u)
\]
where U is the set of all admissible policies.
III. MAIN RESULTS
A. Existence of Optimal Stationary Policies
Assumption 2.4 implies that it is optimal to switch the production
at most once at each state. Thus, the problem can be cast as a
discrete-time infinite-horizon Markov decision problem under average
criterion with state space S = {(x, y)|x ∈ Z, y = 0, 1}, action space
A = {(p, z)|p ∈ P, z = 0, 1}, one-step reward function at state (x, y)
under action (p, z)
\[
R(x, y, p, z) = -K_z\,\delta(z \neq y) + p\lambda(p) - c\,\delta(z = 1) - \lambda(p)\,h(x + z - 1) - (1 - \lambda(p))\,h(x + z) \tag{1}
\]
and one-step transition probability from state (x, y) to (x', z) under
action (p, z)
\[
P(x, y, p, x', z) =
\begin{cases}
\lambda(p), & \text{if } x' = x + \delta(z = 1) - 1 \\
1 - \lambda(p), & \text{if } x' = x + \delta(z = 1) \\
0, & \text{otherwise.}
\end{cases}
\]
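The reward (1) and the transition law translate directly into code. The following is a minimal sketch, assuming the demand function, cost function, and cost parameters are passed in by the caller; the function names `one_step_reward` and `transition_probs` are hypothetical and not taken from the note.

```python
import math


def one_step_reward(x, y, p, z, lam, h, c, K1, K0):
    """Expected one-period reward R(x, y, p, z) of equation (1).

    lam: callable p -> arrival probability, h: callable x -> inventory cost.
    """
    switch_cost = 0.0
    if z != y:
        # K_z: K1 is paid to switch on (z = 1), K0 to switch off (z = 0).
        switch_cost = K1 if z == 1 else K0
    arrival = lam(p)
    return (-switch_cost + p * arrival - c * (z == 1)
            - arrival * h(x + z - 1) - (1.0 - arrival) * h(x + z))


def transition_probs(x, p, z, lam):
    """Map each reachable next state (x', z) to its probability."""
    arrival = lam(p)
    return {(x + z - 1, z): arrival, (x + z, z): 1.0 - arrival}


if __name__ == "__main__":
    demand = lambda p: math.exp(-0.2 * p)                    # demand form of the numerical example
    cost = lambda x: 0.1 * max(x, 0) + 0.15 * max(-x, 0)
    # c = 1.0 is an assumed production cost; the note does not report one.
    print(one_step_reward(0, 0, 5.0, 1, demand, cost, c=1.0, K1=100.0, K0=50.0))
    print(transition_probs(0, 5.0, 1, demand))
```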
Before we show the existence of an optimal stationary policy, the
following lemma is required, which comes from Corollary 7.5.9 and
Theorem 7.5.6 in [9].
Lemma 3.1: Consider a discrete-time Markov decision process
{x(t) : t ≥ 0} consisting of the four-element tuple {S, (A(i), i ∈ S),
p(j|i, a), c(i, a)} under the average cost criterion: the state space S is
denumerable; each action space A(i) is a subset of the finite action
space A; the transition probability p(j|i, a) satisfies p(j|i, a) ≥ 0,
i, j ∈ S, a ∈ A(i), and Σ_{j∈S} p(j|i, a) = 1, ∀ i ∈ S, a ∈ A(i); the
one-step cost function satisfies c(i, a) ≥ 0, ∀ i ∈ S, a ∈ A(i).
There exists an optimal stationary policy if the following set (CAV)
of assumptions holds:
(CAV1) There exists a z standard policy d with positive recurrent
class R_d. Here d is defined to be an i_0 standard policy if the Markov
process induced by d, {x_d(t) : t ≥ 0}, satisfies that for any i ∈ S,
the expected time m_{i,i_0}(d) of a first passage from i to i_0 (during
which at least one transition occurs) is finite and the expected
cost c_{i,i_0}(d) of a first passage from i to i_0 (during which at least
one transition occurs) is finite.
(CAV2) Given U > 0, the set D_U = {i | c(i, a) ≤ U for some a}
is finite.
(CAV3) Given i ∈ S − R_d, there exists a policy θ_i ∈ R*(z, i),
where R*(z, i) is the class of policies d such that the expected
cost c_{z,i}(d) of a first passage from z to i is finite.
Since P is a compact set, it follows from (1) that there exists a positive number M such that R(x, y, p, z) ≤ M. Thus, the average profit
problem can be cast as a discrete-time Markov decision process
under the average cost criterion with one-step cost M − R(x, y, p, z).
Now we are ready to show the existence of an optimal stationary policy
by applying Lemma 3.1, i.e., by verifying that (CAV1–3) hold in our model.
Proposition 3.1: There exists an optimal stationary policy for the
average profit problem.
Proof: Let policy d be an (s, S, p) policy, as described in
the introduction, such that λ(p(x, 0)) > 0 for x > S, λ(p(x, 1)) < 1
for x < s, and 0 < λ(p(x, y)) < 1 for s ≤ x ≤ S and y = 0, 1. It is
straightforward to see that for any inventory level z ∈ [s, S] ∩ Z, the
Markov chain induced by d is (z, 0) standard with positive recurrent
class R_d = {(x, y) ∈ S : s ≤ x ≤ S}. Thus, (CAV1) holds.
Given U, it is straightforward to verify that D_U = {(x, y) ∈
S | R(x, y, p, z) ≥ −U for some (p, z) ∈ A} is finite by
Assumption 2.1, and thus (CAV2) holds.
Given any system state (x0, y0) such that (x0, y0) ∉ R_d, define an
(s', S', p') policy, denoted by d', such that λ(p'(x, 0)) > 0 for x > S',
λ(p'(x, 1)) < 1 for x < s', 0 < λ(p'(x, y)) < 1 for s' ≤ x ≤ S' and
y = 0, 1, and s' ≤ min(x0, s) and S' ≥ max(x0, S). Thus, it is clear
that d' ∈ R*((z, 0), (x0, y0)), and thus (CAV3) holds.
B. Optimality of (s, S, p) Policy
Based on the results in Section III-A, we only consider stationary
policies, i.e., the decision depends deterministically on the system’s
current state. Under a stationary policy u, if the system's current state
is (x, y), then the price is a function p^u(x, y), and the production
decision is y^u(x, y) = 1+, 0−, or y, where y^u(x, y) = 1+ implies
that y = 0 and the production is switched on, y^u(x, y) = 0− implies
that y = 1 and the production is switched off, and y^u(x, y) = y
implies keeping the production status unchanged.
Note that the inventory process is two-sided skip free (i.e., the
inventory level x cannot jump to x + z, |z| ≥ 2 in one period). It
follows from [10] that the following result holds.
Proposition 3.2: There exists an (s, S, p) policy which is optimal in
the class of all stationary policies.
Proof: Suppose that a stationary policy u is given and it
is optimal. If the initial state is (x0, 1), define S^u = inf{x ∈ Z :
x ≥ x0, y^u(x, 1) = 0−} and s^u = sup{x ∈ Z : x < S^u, y^u(x, 0) =
1+}. First, it is easy to see that S^u < ∞. Otherwise, the inventory
level will go to ∞ and thus the policy u cannot be optimal. Moreover,
we have s^u > −∞. Otherwise, the inventory level will go to −∞ and
thus the policy u cannot be optimal. Since the inventory process is two-sided skip free, we know that the inventory level is always between
s^u and S^u, and thus the long-run average profit under the policy u is
equal to that under an (s^u, S^u, p) policy. If the initial state is (x0, 0),
then we define s^u = sup{x ∈ Z : x ≤ x0, y^u(x, 0) = 1+} and S^u =
inf{x ∈ Z : x > s^u, y^u(x, 1) = 0−}. Using a similar argument, we
know that the long-run average profit under the policy u is equal to that
under an (s^u, S^u, p) policy. Therefore, there exists an (s, S, p) policy
which is optimal in the class of all stationary policies.
Note that the resulting s^u and S^u depend on the system's initial
state. This proposition tells us that we can restrict our discussion to
the (s, S, p) policies. We mention that the above argument does not
work for the discounted case.
C. Average Profit Under an (s, S, p) Policy
To compute the average profit, we use renewal reward theory. A
cycle starts from (S, 1). The manufacturer switches off the production
and the inventory level falls from S to s. Afterward, the production is
switched on and the inventory level rises from s to S, which ends a
cycle. Next, we compute the expected cycle length and the expected
profit per cycle.
Under an (s, S, p) policy, if the production is off, the length of
the time in which the inventory level falls from x to x − 1, denoted by τ0(x), is geometrically distributed with mean 1/λ(p(x, 0)),
i.e., P(τ0(x) = k) = λ(p(x, 0))(1 − λ(p(x, 0)))^{k−1}, k ≥ 1. The expected profit earned while the inventory level falls from x to x − 1 is
\[
\begin{aligned}
R_0(x) &= E\left[ p(x, 0) - h(x)\,(\tau_0(x) - 1) - h(x - 1) \right] \\
&= p(x, 0) - \frac{1 - \lambda(p(x, 0))}{\lambda(p(x, 0))}\, h(x) - h(x - 1).
\end{aligned}
\]
If the production is on, the length of the time in which the inventory
level rises from x to x + 1, denoted by τ1(x), is geometrically distributed with mean 1/(1 − λ(p(x, 1))). The expected profit earned while the
inventory level rises from x to x + 1 is
\[
\begin{aligned}
R_1(x) &= E\left[ p(x, 1)\,(\tau_1(x) - 1) - h(x)\,(\tau_1(x) - 1) - h(x + 1) - c\,\tau_1(x) \right] \\
&= p(x, 1)\,\frac{\lambda(p(x, 1))}{1 - \lambda(p(x, 1))} - h(x)\,\frac{\lambda(p(x, 1))}{1 - \lambda(p(x, 1))} - h(x + 1) - \frac{c}{1 - \lambda(p(x, 1))}.
\end{aligned}
\]
It follows from renewal reward theory that the average profit under
an (s, S, p) policy is:
\[
R(s, S, p) = \frac{-K + \sum_{i=s+1}^{S} R_0(i) + \sum_{i=s}^{S-1} R_1(i)}{\sum_{i=s+1}^{S} 1/\lambda(p(i, 0)) + \sum_{i=s}^{S-1} 1/\left(1 - \lambda(p(i, 1))\right)}
\]
where K = K1 + K0.
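The renewal-reward ratio above can be evaluated for any given thresholds and price rule. The sketch below assumes the caller supplies the price rule `price(i, y)`, the demand function, and the cost function; `average_profit`, the flat price of 5.0, and the production cost c = 1.0 are hypothetical choices made only for illustration.

```python
import math


def average_profit(s, S, price, lam, h, c, K):
    """Average profit R(s, S, p) of an (s, S, p) policy via renewal-reward.

    price: callable (i, y) -> price at inventory level i and production state y.
    """
    cycle_profit = -K
    cycle_length = 0.0
    # Production off: inventory falls from S to s, one term R_0(i) per level.
    for i in range(s + 1, S + 1):
        arrival = lam(price(i, 0))
        cycle_profit += price(i, 0) - (1.0 - arrival) / arrival * h(i) - h(i - 1)
        cycle_length += 1.0 / arrival
    # Production on: inventory rises from s to S, one term R_1(i) per level.
    for i in range(s, S):
        arrival = lam(price(i, 1))
        cycle_profit += ((price(i, 1) * arrival - h(i) * arrival - c)
                         / (1.0 - arrival) - h(i + 1))
        cycle_length += 1.0 / (1.0 - arrival)
    return cycle_profit / cycle_length


if __name__ == "__main__":
    demand = lambda p: math.exp(-0.2 * p)
    cost = lambda x: 0.1 * max(x, 0) + 0.15 * max(-x, 0)
    flat_price = lambda i, y: 5.0  # hypothetical constant price rule
    print(average_profit(-12, 17, flat_price, demand, cost, c=1.0, K=150.0))
```

With zero holding, production, and switching costs and a flat price, the ratio collapses to pλ(p), the expected revenue per period, which gives a quick sanity check on the two sums.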
By subtracting a dummy profit γ from original profit in each period,
we define the following auxiliary total profit function in a cycle
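\[
\begin{aligned}
L(s, S, p, \gamma) &= -K + \sum_{i=s+1}^{S} R_0(i) + \sum_{i=s}^{S-1} R_1(i) - \gamma \left( \sum_{i=s+1}^{S} \frac{1}{\lambda(p(i, 0))} + \sum_{i=s}^{S-1} \frac{1}{1 - \lambda(p(i, 1))} \right) \\
&= -K + \sum_{i=s+1}^{S} \left( p(i, 0) - \frac{h(i) + \gamma}{\lambda(p(i, 0))} + h(i) - h(i - 1) \right) \\
&\quad + \sum_{i=s}^{S-1} \left( \frac{p(i, 1)\lambda(p(i, 1)) - c - h(i) - \gamma}{1 - \lambda(p(i, 1))} + h(i) - h(i + 1) \right) \\
&= -K + \sum_{i=s+1}^{S} \left( p(i, 0) - \frac{h(i) + \gamma}{\lambda(p(i, 0))} \right) + \sum_{i=s}^{S-1} \frac{p(i, 1)\lambda(p(i, 1)) - c - h(i) - \gamma}{1 - \lambda(p(i, 1))}.
\end{aligned}
\tag{2}
\]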
Define L(γ) = max_{s,S,p} L(s, S, p, γ). Then it follows from the basic
theory of fractional programming [8] that γ is the maximal average
profit if and only if L(γ) = 0. It follows from the first equality
of (2) that L(s, S, p, γ) is a strictly decreasing, convex function of
γ. Since both monotonicity and convexity are preserved under
maximization, we know that L(γ) is also a strictly decreasing, convex
function of γ. It follows from the first equality of (2) and the definition
of L(γ) that L(γ) < 0 if γ is sufficiently large and L(γ) > 0 if γ is
sufficiently small. In fact, in Proposition 3.3 we show that L(γ) = +∞
if γ < 0. Since L(γ) is strictly decreasing in γ, there exists a unique
γ* such that L(γ*) = 0, and γ* is the optimal average profit.
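The link between the root of L(γ) and the average profit can be spelled out in two lines. In the sketch below, N and T are shorthand introduced here (not notation from the note) for the expected profit and the expected length of one cycle under a fixed policy (s, S, p).

```latex
\[
N = -K + \sum_{i=s+1}^{S} R_0(i) + \sum_{i=s}^{S-1} R_1(i), \qquad
T = \sum_{i=s+1}^{S} \frac{1}{\lambda(p(i,0))} + \sum_{i=s}^{S-1} \frac{1}{1 - \lambda(p(i,1))} > 0
\]
\[
L(s, S, p, \gamma) = N - \gamma T, \qquad
L(s, S, p, \gamma) = 0 \iff \gamma = \frac{N}{T} = R(s, S, p)
\]
\[
L(\gamma^*) = 0 \;\Longrightarrow\; N - \gamma^* T \le 0 \text{ for every } (s, S, p)
\;\Longrightarrow\; R(s, S, p) \le \gamma^*, \text{ with equality at a maximizer of } L(s, S, p, \gamma^*).
\]
```

Because each L(s, S, p, γ) is linear in γ with slope −T < 0, the maximum over policies is strictly decreasing, so a root can in principle be located by bisection on γ.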
D. Analysis of the Optimal Price Given s, S, and γ
We analyze the optimal price given s, S, and γ, which is the optimal
solution to the problem
\[
L(s, S, \gamma) := \max_{p} L(s, S, p, \gamma).
\]
It follows from Assumption 2.2 that λ(p) has a strictly decreasing,
continuous inverse p(λ). Therefore, we can take arrival probability λ
as the decision variable instead of p.
Define f0(γ, x, λ) = p(λ) − (h(x) + γ)/λ and f1(γ, x, λ) = (λp(λ) −
c − h(x) − γ)/(1 − λ). It follows from the last equality of (2) that:
\[
L(s, S, p, \gamma) = -K + \sum_{i=s+1}^{S} f_0\big(\gamma, i, \lambda(p(i, 0))\big) + \sum_{i=s}^{S-1} f_1\big(\gamma, i, \lambda(p(i, 1))\big).
\]
We first compute the optimal arrival probability λ∗ (x, 0) =
λ(p∗ (x, 0)) when the inventory level is x and the production is off.
The following assumption is required for our discussion.
Assumption 3.1: For any 0 < λ ≤ λ̄, λp''(λ) + 2p'(λ) < 0.
Note that the condition that p(λ) is concave decreasing in λ is
sufficient for λp''(λ) + 2p'(λ) < 0. Moreover, it follows from Assumption 3.1 that the feasible region of λ is the interval [0, λ̄].
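It follows from:
\[
\frac{d}{d\mu}\, p\!\left(\frac{1}{\mu}\right) = -\frac{1}{\mu^{2}}\, p'\!\left(\frac{1}{\mu}\right), \qquad
\frac{d^{2}}{d\mu^{2}}\, p\!\left(\frac{1}{\mu}\right) = \frac{1}{\mu^{4}}\, p''\!\left(\frac{1}{\mu}\right) + \frac{2}{\mu^{3}}\, p'\!\left(\frac{1}{\mu}\right)
\]
that Assumption 3.1 implies that p(1/μ) is strictly concave in μ
for μ > 1. Thus, by letting μ = 1/λ it is known that f0(γ, x, λ) =
p(1/μ) − (h(x) + γ)μ is strictly concave in μ and its first-order
derivative is
\[
\frac{\partial f_0}{\partial \mu} = -\lambda^{2} p'(\lambda) - (h(x) + \gamma). \tag{3}
\]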
Now we can claim that it suffices to consider the case γ ≥ 0 in our
technical note.
Proposition 3.3: If γ < 0, then L(γ) = +∞.
Proof: Note that for admissible λ, −λ²p'(λ) ≥ 0. Thus, if
γ < 0, then f0 is increasing in μ and thus decreasing in λ at x =
0. Therefore, max_{λ∈[0,λ̄]} f0(γ, 0, λ) = f0(γ, 0, 0) = +∞ and thus
L(s, S, γ) = +∞, which implies L(γ) = +∞.
Let g0(λ) = −λ²p'(λ); then g0 is strictly increasing in λ for λ > 0,
as g0'(λ) = −λ(λp''(λ) + 2p'(λ)) > 0 according to Assumption 3.1.
Thus, g0 has a strictly increasing inverse, denoted by g0^{-1}. Moreover,
g0(0) = 0 and g0(λ̄) = −λ̄²p'(λ̄). It follows from (3) that the optimal
arrival probability at state (x, 0) is:
\[
\lambda^*(x, 0) =
\begin{cases}
g_0^{-1}\big(h(x) + \gamma\big), & \text{if } h(x) + \gamma \le g_0(\bar{\lambda}) \\
\bar{\lambda}, & \text{otherwise.}
\end{cases}
\tag{4}
\]
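For a concrete demand function, equation (4) has a closed form. The sketch below uses the exponential demand λ(p) = e^{−θp} with θ = 0.2 and P = [1, 10] from the numerical example of this note, for which p(λ) = −ln(λ)/θ and g0(λ) = λ/θ. The lower clamp at e^{−10θ} is an assumption added here to respect the price ceiling p ≤ 10, and all function names are hypothetical.

```python
import math

THETA = 0.2                              # demand parameter of the numerical example
PRICE_MIN, PRICE_MAX = 1.0, 10.0         # allowable price set P = [1, 10]
LAM_MAX = math.exp(-THETA * PRICE_MIN)   # lambda-bar, attained at the lowest price
LAM_MIN = math.exp(-THETA * PRICE_MAX)   # assumption: bound implied by p <= 10


def holding_cost(x, h=0.1, b=0.15):
    return h * max(x, 0) + b * max(-x, 0)


def price_of(lam):
    """Inverse demand p(lambda) for lambda(p) = exp(-THETA * p)."""
    return -math.log(lam) / THETA


def f0(gamma, x, lam):
    """f0(gamma, x, lambda) = p(lambda) - (h(x) + gamma) / lambda."""
    return price_of(lam) - (holding_cost(x) + gamma) / lam


def optimal_lambda_off(x, gamma):
    """Equation (4) for exponential demand, clamped to the feasible interval."""
    # g0(lam) = -lam^2 * p'(lam) = lam / THETA, so g0^{-1}(v) = THETA * v.
    unconstrained = THETA * (holding_cost(x) + gamma)
    return min(max(unconstrained, LAM_MIN), LAM_MAX)


if __name__ == "__main__":
    for x in (-20, -5, 0, 5, 20):
        lam = optimal_lambda_off(x, gamma=0.13)
        print(x, round(lam, 4), round(price_of(lam), 3))
```

With γ = 0.13, the unconstrained solution at x = 0 is 0.026, below the feasible range, so the price sits at its ceiling p = 10 near zero inventory when production is off.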
Next we compute the optimal arrival probability λ*(x, 1) when the
inventory level is x and the production is on.
Let μ = 1/(1 − λ). It follows from:
\[
\frac{d}{d\mu}\left[ (\mu - 1)\, p\!\left(\frac{\mu - 1}{\mu}\right) \right] = p\!\left(\frac{\mu - 1}{\mu}\right) + \frac{\mu - 1}{\mu^{2}}\, p'\!\left(\frac{\mu - 1}{\mu}\right)
\]
\[
\frac{d^{2}}{d\mu^{2}}\left[ (\mu - 1)\, p\!\left(\frac{\mu - 1}{\mu}\right) \right] = \frac{\mu - 1}{\mu^{4}}\, p''\!\left(\frac{\mu - 1}{\mu}\right) + \frac{2}{\mu^{3}}\, p'\!\left(\frac{\mu - 1}{\mu}\right)
\]
and Assumption 3.1 that f1(γ, x, λ) = (μ − 1)p((μ − 1)/μ) − (c +
h(x) + γ)μ is strictly concave in μ for μ > 1, and its first-order
derivative is
\[
\frac{\partial f_1}{\partial \mu} = \lambda(1 - \lambda)\, p'(\lambda) + p(\lambda) - (c + h(x) + \gamma). \tag{5}
\]
Let g1(λ) = λ(1 − λ)p'(λ) + p(λ); then g1 is strictly decreasing in
λ for λ > 0, as g1'(λ) = (1 − λ)(λp''(λ) + 2p'(λ)) < 0 according to
Assumption 3.1. Thus, g1 has a strictly decreasing inverse, denoted by
g1^{-1}. Moreover, g1(0) = lim_{λ→0} λp'(λ) + p(0) and g1(λ̄) = λ̄(1 −
λ̄)p'(λ̄) + p(λ̄). It follows from (5) that the optimal arrival probability at state
(x, 1) is:
\[
\lambda^*(x, 1) =
\begin{cases}
g_1^{-1}\big(c + h(x) + \gamma\big), & \text{if } c + h(x) + \gamma \le g_1(0) \\
0, & \text{otherwise.}
\end{cases}
\tag{6}
\]
Define r0(γ, x) = max_λ f0(γ, x, λ) = f0(γ, x, λ*(x, 0)) and
r1(γ, x) = max_λ f1(γ, x, λ) = f1(γ, x, λ*(x, 1)). Obviously, both
r0(γ, x) and r1(γ, x) are strictly decreasing, convex functions of γ
as both f0(γ, x, λ) and f1(γ, x, λ) are strictly decreasing, convex
functions of γ. Moreover, we have
\[
L(s, S, \gamma) = -K + \sum_{i=s+1}^{S} r_0(\gamma, i) + \sum_{i=s}^{S-1} r_1(\gamma, i)
= -K + \sum_{i=s+1}^{S} \big( r_0(\gamma, i) + r_1(\gamma, i - 1) \big). \tag{7}
\]
The following two propositions characterize the relationship of
r0(γ, x), r1(γ, x) with x, which is crucial in determining the optimal
s(γ) and S(γ).
Proposition 3.4: For any given γ ≥ 0, both r0(γ, x) and r1(γ, x)
(x ∈ R) are quasiconcave functions of x, and 0 is the maximum point.
Proof: It follows from the envelope theorem [7] that
dr0(γ, x)/dx = −h'(x)/λ*(x, 0) and dr1(γ, x)/dx = −h'(x)/
(1 − λ*(x, 1)). Since h(x) is quasiconvex in x, we have h'(x) ≤ 0
when x ≤ 0 and h'(x) ≥ 0 when x ≥ 0. Therefore, both r0(γ, x)
and r1(γ, x) are quasiconcave functions of x, and 0 is the
maximum point.
Proposition 3.5: For any given γ ≥ 0, r0(γ, x) + r1(γ, x − 1) (x ∈
Z) is a quasiconcave function of x, and 0 or 1 is the maximum point.
Proof: It follows from Proposition 3.4 that r0(γ, x) + r1(γ, x −
1) is increasing on x ≤ 0 and decreasing on x ≥ 1. Since x ∈ Z,
r0(γ, x) + r1(γ, x − 1) takes its maximum at point 0 or 1 and is
quasiconcave in x ∈ Z.
We mention that the constraint x ∈ Z is necessary for Proposition
3.5 to hold, as r0(γ, x) + r1(γ, x − 1) might not be quasiconcave for
0 < x < 1.
E. Analysis of the Optimal s and S Given γ
Now we want to find the optimal s and S given γ, which is the
solution of
\[
\max_{s < S,\ s, S \in \mathbb{Z}} L(s, S, \gamma).
\]
Proposition 3.6: Given γ ≥ 0, if the set {x ∈ Z : r0(γ, x) +
r1(γ, x − 1) ≥ 0, x ≥ 0} is non-empty, then the optimal s(γ) and
S(γ) are given by
\[
s(\gamma) = \min\{x \in \mathbb{Z} : r_0(\gamma, x) + r_1(\gamma, x - 1) \ge 0\} - 1 \tag{8}
\]
\[
S(\gamma) = \max\{x \in \mathbb{Z} : r_0(\gamma, x) + r_1(\gamma, x - 1) \ge 0\}. \tag{9}
\]
Proof: Since r0(γ, x) + r1(γ, x − 1) is a quasiconcave function of x ∈ Z, it follows from (7) that the optimal solution for
max_{s,S} L(s, S, γ) should be the s and S such that r0(γ, i) + r1(γ, i −
1) ≥ 0 for all s + 1 ≤ i ≤ S. Thus, the result holds.
The set {x ∈ Z : r0(γ, x) + r1(γ, x − 1) ≥ 0, x ≥ 0} might be
empty if γ is quite large. In this case, r0(γ, x) + r1(γ, x − 1) < 0
for all x ∈ Z. Then, if r0(γ, 0) + r1(γ, −1) ≥ r0(γ, 1) +
r1(γ, 0), the optimal (s(γ), S(γ)) = (−1, 0); otherwise, the
optimal (s(γ), S(γ)) = (0, 1).
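The threshold rule in (8) and (9), together with the empty-set case, can be sketched as a short routine. It takes the summand q(x) = r0(γ, x) + r1(γ, x − 1) as a callable and assumes q is quasiconcave on the integers with its maximum at 0 or 1; the name `optimal_thresholds` and the finite `search_radius` are assumptions introduced for this illustration.

```python
def optimal_thresholds(q, search_radius=1000):
    """Return (s, S) maximizing -K + sum_{i=s+1}^{S} q(i) over integers s < S.

    q: callable x -> r0(gamma, x) + r1(gamma, x - 1), assumed quasiconcave
    with maximum at 0 or 1 and negative outside [-search_radius, search_radius].
    """
    nonnegative = [x for x in range(-search_radius, search_radius + 1) if q(x) >= 0]
    if nonnegative:
        # Equations (8) and (9): keep exactly the nonnegative summands.
        return min(nonnegative) - 1, max(nonnegative)
    # Empty set: a cycle needs at least one summand, so keep the least negative one.
    return (-1, 0) if q(0) >= q(1) else (0, 1)


if __name__ == "__main__":
    tent = lambda x: 5 - abs(x)  # hypothetical quasiconcave summand
    print(optimal_thresholds(tent))
```

Lowering q by a constant, which is what a larger γ does, shrinks the set of nonnegative summands, so s moves up and S moves down, in line with Proposition 3.7.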
Proposition 3.7: Consider the domain of γ such that the set {x ∈ Z :
r0 (γ, x) + r1 (γ, x − 1) ≥ 0, x ≥ 0} is non-empty. s(γ) is increasing
in γ and S(γ) is decreasing in γ.
Proof: Note that r0 (γ, x) + r1 (γ, x − 1) is decreasing in γ. For
γ1 < γ2 , it follows from (8) that r0 (γ1 , s(γ1 )) + r1 (γ1 , s(γ1 )−1) < 0
and thus r0 (γ2 , s(γ1 ))+r1 (γ2 , s(γ1 )−1) < 0. Again, it follows from
(8) that s(γ1 ) < s(γ2 ) + 1, or s(γ1 ) ≤ s(γ2 ). Thus, s(γ) is increasing
in γ. Similarly, it follows from (9) that S(γ) is decreasing in γ.
Proposition 3.8: λ∗ (x, 0) is decreasing on x ≤ 0, and increasing
on x ≥ 0; λ∗ (x, 1) is increasing on x ≤ 0, and decreasing on x ≥ 0.
Proof: It follows from the definition of λ*(x, 0) that:
\[
f_0(\gamma, x, \lambda^*(x, 0)) \ge f_0(\gamma, x, \lambda^*(x - 1, 0))
\]
\[
f_0(\gamma, x - 1, \lambda^*(x - 1, 0)) \ge f_0(\gamma, x - 1, \lambda^*(x, 0)).
\]
Summing the above two inequalities, we obtain
\[
\left( \frac{1}{\lambda^*(x, 0)} - \frac{1}{\lambda^*(x - 1, 0)} \right) \big( h(x) - h(x - 1) \big) \le 0.
\]
It follows from the quasiconvexity of h(x) that λ*(x, 0) is decreasing
on x ≤ 0, and increasing on x ≥ 0. Similarly, we have:
\[
\left( \frac{1}{1 - \lambda^*(x, 1)} - \frac{1}{1 - \lambda^*(x - 1, 1)} \right) \big( h(x) - h(x - 1) \big) \le 0.
\]
It follows from the quasiconvexity of h(x) that λ*(x, 1) is increasing
on x ≤ 0, and decreasing on x ≥ 0.
Note that p(λ) is strictly decreasing in λ and the optimal price
p∗ (x, y) = p(λ∗ (x, y)). Thus, the interpretation of Proposition 3.8 is
as follows: When the production is off (i.e., y = 0), the optimal price
increases in the inventory level x if x ≤ 0, and decreases in x if x ≥ 0.
When the production is on (i.e., y = 1), the optimal price decreases in
the inventory level x if x ≤ 0, and increases in x if x ≥ 0. Therefore,
the optimal price is a quasiconcave function of the inventory level x
when the production is off and is a quasiconvex function of x when
the production is on.
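The production-on half of this interpretation can be illustrated numerically. The sketch below solves g1(λ) = c + h(x) + γ by bisection for the exponential demand λ(p) = e^{−θp} of the numerical example, for which g1(λ) = (λ − 1 − ln λ)/θ. The production cost c = 1.0 is an assumed value (the note does not report one), clamping λ to [e^{−10θ}, e^{−θ}] reflects the price set P = [1, 10] rather than the interval [0, λ̄] used in (6), and all function names are hypothetical.

```python
import math

THETA = 0.2
LAM_MAX = math.exp(-THETA * 1.0)    # arrival probability at the lowest price p = 1
LAM_MIN = math.exp(-THETA * 10.0)   # arrival probability at the highest price p = 10


def holding_cost(x, h=0.1, b=0.15):
    return h * max(x, 0) + b * max(-x, 0)


def price_of(lam):
    return -math.log(lam) / THETA


def g1(lam):
    """g1(lam) = lam (1 - lam) p'(lam) + p(lam) with p(lam) = -ln(lam) / THETA."""
    return (lam - 1.0 - math.log(lam)) / THETA


def optimal_lambda_on(x, gamma, c):
    """Maximizer of f1 over [LAM_MIN, LAM_MAX]; g1 is strictly decreasing."""
    target = c + holding_cost(x) + gamma
    if target >= g1(LAM_MIN):
        return LAM_MIN          # derivative in (5) is nonpositive everywhere
    if target <= g1(LAM_MAX):
        return LAM_MAX          # derivative in (5) is nonnegative everywhere
    lo, hi = LAM_MIN, LAM_MAX
    for _ in range(100):        # bisection on the decreasing function g1
        mid = 0.5 * (lo + hi)
        if g1(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for x in (-20, -5, 0, 5, 20):
        lam = optimal_lambda_on(x, gamma=0.13, c=1.0)
        print(x, round(lam, 4), round(price_of(lam), 3))
```

Under these assumptions the computed price is lowest at x = 0 and rises as the inventory level moves away from zero in either direction, which is the quasiconvex shape stated for the production-on state.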
The result is consistent with intuition. Because
of inventory holding costs and backlogging costs, the manufacturer
attempts to keep the inventory level around 0 for as long as possible
and to move the inventory
level away from S and s as fast as possible. Thus, the manufacturer adopts different
pricing strategies according to the current production state.
Suppose that the production is off (on): if the inventory level is high,
the manufacturer should set the selling price low (high) to attract more
demand to sell the product in stock (to make the inventory level quickly
reach S to switch the production off); if the backlogging level is high,
the manufacturer should also set the selling price low (high) to make
the inventory level quickly reach s to switch the production on (to meet
more backlogged demand).
F. Comparative Statics
We discuss the effect of switching cost. Let γ ∗ be the optimal
average profit and s∗ = s(γ ∗ ), S ∗ = S(γ ∗ ).
Proposition 3.9: γ ∗ is strictly decreasing in K = K0 + K1 ; s∗ is
decreasing in K and S ∗ is increasing in K. Moreover, p∗ (x, 0) is
increasing in K and p∗ (x, 1) is decreasing in K. Both the expected
production-on time length and production-off time length of a cycle
are increasing in K.
Proof: It follows from:
\[
0 = L(\gamma^*) = \max_{s < S,\ s, S \in \mathbb{Z}} L(s, S, \gamma^*)
= \max_{s < S,\ s, S \in \mathbb{Z}} \left\{ -K + \sum_{i=s+1}^{S} \big( r_0(\gamma^*, i) + r_1(\gamma^*, i - 1) \big) \right\} \tag{10}
\]
and the fact that both r0(γ, x) and r1(γ, x) are strictly decreasing in γ
that γ* is strictly decreasing in K. Thus, it follows from Proposition 3.7
that s* is decreasing in K and S* is increasing in K.
It follows from (4) and (6) that λ*(x, 0) is decreasing in K and
λ*(x, 1) is increasing in K. Thus, p*(x, 0) is increasing in K and
p*(x, 1) is decreasing in K.
The expected production-on time length of a cycle is Σ_{i=s*}^{S*−1} 1/
(1 − λ*(i, 1)), which is increasing in K as λ*(x, 1) is increasing in
K, s* is decreasing in K, S* is increasing in K and 1 − λ*(i, 1) > 0
for all i ∈ Z. Similarly, the expected production-off time length of a
cycle, Σ_{i=s*+1}^{S*} 1/λ*(i, 0), is also increasing in K.

Fig. 1. Sample trajectory of inventory level and price under optimal policy.

The above result indicates that when the switching cost K increases,
switching less frequently is preferred. The impact of holding cost on
the optimal policy is unclear, although γ ∗ decreases as holding costs
increase. Consider a special case in which the holding cost h(x) at each
state x is increased by Δh (i.e., h(x) + Δh). In this case, the optimal
average profit is decreased by Δh, but the optimal production-pricing
policy (i.e., s∗ , S ∗ and the optimal price) remains unchanged.
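The special case can be verified directly from the definitions of f0 and f1, which depend on h and γ only through the sum h(x) + γ. In the sketch below, the tilde notation is shorthand introduced here for the shifted problem.

```latex
\[
\tilde{h}(x) = h(x) + \Delta h, \qquad \tilde{\gamma} = \gamma - \Delta h
\quad\Longrightarrow\quad \tilde{h}(x) + \tilde{\gamma} = h(x) + \gamma
\]
\[
\tilde{f}_0(\tilde{\gamma}, x, \lambda) = p(\lambda) - \frac{\tilde{h}(x) + \tilde{\gamma}}{\lambda} = f_0(\gamma, x, \lambda), \qquad
\tilde{f}_1(\tilde{\gamma}, x, \lambda) = \frac{\lambda p(\lambda) - c - \tilde{h}(x) - \tilde{\gamma}}{1 - \lambda} = f_1(\gamma, x, \lambda)
\]
\[
\tilde{L}(\gamma - \Delta h) = L(\gamma) \quad\Longrightarrow\quad \tilde{\gamma}^* = \gamma^* - \Delta h,
\]
```

with the same maximizing λ*(x, y), s, and S in both problems, since every term of (7) is unchanged.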
G. A Numerical Example
In this part, we provide a simple numerical example to illustrate the
properties of the optimal pricing strategies. In this example, the production rate is r = 1, and λ(p) = e^{−θp} with θ = 0.2. The allowable price
set is P = [1, 10]. The switching costs are K0 = 50 and K1 = 100. The
holding cost function is h(x) = hx^+ + bx^− with h = 0.1 and b = 0.15.
Using the value iteration algorithm, we find that the optimal average profit
is γ* = 0.13, with s* = −12 and S* = 17. Fig. 1 shows a sample trajectory
of inventory level and price under the optimal policy, starting from level
0 with production on, which exhibits asymptotically cyclic behavior.
When the production is on, the price first decreases and then increases
as the inventory level increases from s to S. The price reaches its
minimum when the inventory level is close to zero, so that demand
and production can be balanced while keeping inventory low. When
the production is off, the price first increases and then decreases as the
inventory level decreases from S to s. The price reaches its maximum
(p = 10) and stays there for a while when the inventory level is close to zero.
IV. CONCLUSION
In this technical note, we consider a stochastic inventory system
where the manufacturer can dynamically adjust the selling price and
switch the production on or off to maximize the long-run average
profit. We show that an (s, S, p) policy is optimal. Moreover, the structural properties of the optimal profit function and pricing strategies are
presented. There are a few topics worthy of further discussion. One can
consider a compound Poisson process or batch production, or formulate
the problem directly as a continuous-time model with a Poisson process. The
problem under a discounted criterion, in finite horizon, or with lead
times can also be studied.
ACKNOWLEDGMENT
The authors thank Prof. X. Chen for proposing this research
problem.
REFERENCES
[1] D. P. Bertsekas, Dynamic Programming and Optimal Control. Belmont,
MA, USA: Athena Scientific, 2001, vol. II.
[2] X. Chao and S. X. Zhou, “Joint inventory-and-pricing strategy for
a stochastic continuous-review system,” IIE Trans., vol. 38, no. 5,
pp. 401–408, 2006.
[3] X. Chen and D. Simchi-Levi, “Coordinating inventory control and
pricing strategies with random demand and fixed ordering cost: The
infinite horizon case,” Mathemat. Oper. Res., vol. 29, no. 3, pp. 698–723,
2004.
[4] X. Chen and D. Simchi-Levi, “Coordinating inventory control and pricing strategies: The continuous review models,” Oper. Res. Lett., vol. 34,
pp. 323–332, 2006.
[5] A. Federgruen and P. H. Zipkin, “An inventory model with limited
production capacity and uncertain demands I. The average-cost criterion,”
Mathemat. Oper. Res., vol. 11, no. 2, pp. 193–207, 1986.
[6] N. D. V. Foreest and J. Wijngaard, “On optimal policies for production-inventory systems with compound Poisson demand and setup costs,”
Mathemat. Oper. Res., vol. 39, no. 2, pp. 517–532, 2014.
[7] P. Milgrom and I. Segal, “Envelope theorems for arbitrary choice sets,”
Econometrica, vol. 70, no. 2, pp. 583–601, 2002.
[8] S. Schaible and T. Ibaraki, “Fractional programming,” Eur. J. Oper. Res.,
vol. 12, pp. 325–338, 1983.
[9] L. I. Sennott, Stochastic Dynamic Programming and the Control of
Queueing Systems. New York, NY, USA: Wiley, 1999.
[10] M. J. Sobel, “Optimal average-cost policy for a queue with start-up and
shut-down costs,” Oper. Res., vol. 17, no. 1, pp. 145–162, 1969.