IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 61, NO. 12, DECEMBER 2016

Optimal Control of an Inventory System With Joint Production and Pricing Decisions

Ping Cao and Jingui Xie

Abstract: In this study, we consider a stochastic inventory system in which the objective of the manufacturer is to maximize the long-run average profit by dynamically setting the selling price and switching the production on or off. The demand process is non-homogeneous Poisson with a price-dependent arrival rate. System costs consist of switching costs and inventory holding and backlogging costs. We show that an (s, S, p) policy is optimal. Moreover, we characterize the structural properties of the optimal profit function and pricing strategy, and show that the optimal price is a quasiconcave function of the inventory level when the production is off and a quasiconvex function when the production is on.

Index Terms: Average criterion, limited capacity, optimal (s, S, p) policy, production/inventory systems.

I. INTRODUCTION

With the rapid development of e-commerce and new technologies (e.g., big data, cloud computing, and electronic funds transfer), a number of industries have begun to adopt dynamic pricing strategies and sell products directly to customers through their websites. For example, Dell sells its products through its website, offering promotions every week and even changing prices daily. The Alibaba Group operates leading online and mobile marketplaces in retail and wholesale trades and provides technology and services to enable buyers and suppliers to conduct commerce more easily. These developments have resulted in an increased interest in the construction of models that integrate production decisions, inventory control, and pricing strategies to improve the profitability of companies.
Manuscript received May 6, 2015; revised September 6, 2015 and December 7, 2015; accepted December 13, 2015. Date of publication December 17, 2015; date of current version December 2, 2016. This work was supported by NSFC under Grants 71201154, 71401159, and 71571176, and by the Fundamental Research Funds for the Central Universities under Grants WK2040160009 and WK2040160011. Recommended by Associate Editor L. H. Lee. The authors are with the School of Management, University of Science and Technology of China, Hefei 230026, China (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TAC.2015.2510162

In this study, we consider an inventory model with joint production and pricing decisions, in which the demand process is non-homogeneous Poisson with a price-dependent arrival rate, the production rate is constant, and the production can be switched on or off with a fixed cost. When the production capacity is unlimited, this problem becomes an inventory control problem with pricing and replenishment decisions, which has been well studied in the literature. For example, Chen and Simchi-Levi [3] consider a periodic-review model with price-dependent demand and a fixed ordering cost, and they show under some conditions that the optimal policy for the infinite-horizon problem is an (s, S, p) policy. In [4], Chen and Simchi-Levi consider a continuous-review model in which the inter-arrival time and the demand size depend on the selling price. They show that optimal stationary (s, S, p) inventory policies exist for both discounted and average profit problems and prove that higher inventory holding and shortage costs result in a smaller selling price for the average profit problem. Chao and Zhou [2] consider an infinite-horizon, continuous-review stochastic inventory system in which the demand process is Poisson with a price-dependent arrival rate.
They show that the optimal price is a unimodal function of the inventory level. However, none of these papers consider the case in which production capacity is limited.

Several papers also consider a limited production capacity in inventory models. Federgruen and Zipkin [5] consider a periodic-review inventory model with a finite production capacity and show that, under a few additional unrestrictive assumptions, a modified base-stock policy is optimal under the average-cost criterion. Foreest and Wijngaard [6] consider a continuous-review production-inventory system with compound Poisson demand and fixed switching costs and show that an (s, S) policy is optimal if the holding cost is convex. However, they do not incorporate pricing decisions into their models.

In this study, we consider a stochastic inventory system in which the manufacturer can dynamically adjust the selling price and switch the production on or off. To the best of our knowledge, this study is the first to incorporate both pricing and production decisions into inventory control. We assume that the production rate is large. Under this assumption, we show that an (s, S, p) policy is optimal for a discretized model. An (s, S, p) policy says that if the inventory level falls to s or below, then the production is switched on; if the inventory level reaches S or above, then the production is switched off; and if the inventory level is x (s < x < S), then the status of production remains unchanged. The price p depends on the current inventory level. Moreover, we characterize the optimal inventory control and pricing strategy. We show that the optimal price is a quasiconcave function of the inventory level when the production is off and a quasiconvex function when the production is on.

The rest of the technical note is organized as follows. In Section II, we describe the inventory model and apply a discretization technique.
In Section III, we show that the (s, S, p) policy is optimal and characterize the structural properties of the optimal profit function and pricing strategy. In Section IV, we conclude the technical note with several possible extensions.

0018-9286 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

II. MODEL

Consider a manufacturer who can control its inventory level by production and pricing jointly. The selling price of each product is set as p_t at time t. Demand arrives according to a non-homogeneous Poisson process with rate Λt(p_t), which depends on the price p_t. Demand is met immediately if inventory is available. Otherwise, it is backlogged. At any time t, the production can be switched on (off) at cost K1 (K0). The production rate is a constant r, i.e., the manufacturer can produce r products in one unit of time if production is on.

This dynamic inventory control problem is analytically challenging. In this technical note, we apply a discretization technique and solve the corresponding discretized problem. Let Δt = 1/r. Assume r is large enough so that Δt is very small. Due to industrialization and mass production, this assumption is not unrealistic. According to the properties of a non-homogeneous Poisson process, there is one arrival in the time interval (t, t + Δt] with probability (w.p.) Λt Δt + o(Δt), more than one arrival w.p. o(Δt), and no arrival w.p. 1 − Λt Δt + o(Δt). If Δt → 0, then approximately there is one arrival in (t, t + Δt] w.p. Λt Δt, more than one arrival w.p. 0, and no arrival w.p. 1 − Λt Δt.

Now, we can define the discretized problem. Let the length of each period be Δt. Suppose that a price p ∈ P is chosen at the beginning of a period, where P is assumed to be a compact set. Let λ(p) := Λt Δt.
Then a demand arrives with probability λ(p), and no demand arrives with probability 1 − λ(p). Demand is met by end-of-period inventory, and unsatisfied demand is backlogged. The production rate is now 1, i.e., the manufacturer can produce one product in one period if the production is on. At the beginning of each period the state of production is chosen to be on or off. If production is off (on) at the end of a period, it can be switched on (off) at the beginning of the next period at the expense of a setup cost K1 (K0). When production is on (off) in a period, keeping it on (off) is free. We suppose that the production cost per item is c.

Inventory costs are accrued at the end of a period according to the function h(·). A typical form for h(·) is h(x) = hx⁺ + bx⁻, where x⁺ = max{x, 0}, x⁻ = max{−x, 0}, and h and b are the holding cost and shortage cost per item per period, respectively. In the remainder of the technical note, we make the following assumption on h(x).

Assumption 2.1: h(x) is quasiconvex in x with its minimum point x = 0, h(0) = 0, and lim_{|x|→∞} h(x) = ∞.

A function f(x) (x ∈ Z) is called quasiconvex (unimodal) if, for some value m, it is decreasing for x ≤ m and increasing for x ≥ m. A function f(x) is quasiconcave if −f(x) is quasiconvex.

The system state is described by the current inventory level x and the production status y, and is denoted by (x, y), where y = 1 indicates that the production is on and y = 0 indicates that the production is off.

Assumption 2.2: λ(p) is strictly decreasing and continuous in p. Moreover, λ̄ ≡ max_{p∈P} λ(p) < 1.

Assumption 2.3: There exists a positive number M such that max_{p∈P} pλ(p) = M < ∞.

Assumption 2.4: K0 + K1 > 0.
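The discretized dynamics described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the exponential form λ(p) = e^(−θp) and the cost rates are borrowed from the numerical example in Section III-G, and the names `arrival_prob`, `inventory_cost`, `step`, and `simulate`, as well as the `price(x, y)` callback, are hypothetical.

```python
import math
import random

THETA = 0.2          # demand sensitivity, borrowed from the numerical example
H, B = 0.1, 0.15     # holding / shortage cost per item per period (same example)


def arrival_prob(p):
    """lambda(p): probability that one demand arrives in a period at price p."""
    return math.exp(-THETA * p)


def inventory_cost(x):
    """h(x) = h*x^+ + b*x^-, the typical form quoted in the text."""
    return H * max(x, 0) + B * max(-x, 0)


def step(x, z, p, rng):
    """Advance one period.

    x: inventory at the start of the period (negative means backlog)
    z: 1 if production is on in this period, 0 if it is off
    p: price posted for this period
    Returns (next_inventory, demand).
    """
    demand = 1 if rng.random() < arrival_prob(p) else 0
    return x + z - demand, demand


def simulate(x0, s, S, price, periods, seed=0):
    """Run a fixed (s, S, p) rule; `price(x, y)` is a hypothetical price map."""
    rng = random.Random(seed)
    x, y, path = x0, 0, []
    for _ in range(periods):
        if x <= s:
            y = 1            # switch production on at or below s
        elif x >= S:
            y = 0            # switch production off at or above S
        x, _ = step(x, y, price(x, y), rng)
        path.append(x)
    return path
```

For instance, `simulate(0, -3, 4, lambda x, y: 5.0, 2000)` runs a constant-price rule; because the process moves by at most one unit per period, the path started inside [s, S] stays inside that band.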
Given a pricing and production policy u and the initial system state (x, y), the long-run average profit can be formulated as

$$V_c(x, y, u) = \liminf_{T \to \infty} \frac{1}{T}\, E\left[\sum_{t=1}^{T} \Big( p_t D_t(p_t) - h(x_{t+1}) - c\,\delta(y_t = 1) - K_1\,\delta(y_t = 1, y_{t-1} = 0) - K_0\,\delta(y_t = 0, y_{t-1} = 1) \Big)\right]$$

where x_t, y_t, and p_t are the inventory level, production state, and selling price at the beginning of period t, respectively, and D_t(p_t) is the demand in period t, which is 1 w.p. λ(p_t) and 0 w.p. 1 − λ(p_t). δ(A) is an indicator function, which is 1 if the event A is true and 0 if A is false. The manufacturer's objective is to maximize its long-run average profit by dynamically adjusting its price and switching the production on or off, i.e., to find an optimal policy u* such that

$$V_c(x, y) \equiv V_c(x, y, u^*) = \sup_{u \in U} V_c(x, y, u)$$

where U is the set of all admissible policies.

III. MAIN RESULTS

A. Existence of Optimal Stationary Policies

Assumption 2.4 implies that it is optimal to switch the production at most once at each state. Thus, the problem can be cast as a discrete-time infinite-horizon Markov decision problem under the average criterion with state space S = {(x, y) | x ∈ Z, y = 0, 1}, action space A = {(p, z) | p ∈ P, z = 0, 1}, one-step reward function at state (x, y) under action (p, z)

$$R(x, y, p, z) = -K_z\,\delta(z \neq y) + p\lambda(p) - c\,\delta(z = 1) - \lambda(p)\,h(x + z - 1) - (1 - \lambda(p))\,h(x + z) \tag{1}$$

and one-step transition probability from state (x, y) to (x′, z) under action (p, z)

$$P(x, y, p, x', z) = \begin{cases} \lambda(p), & \text{if } x' = x + \delta(z = 1) - 1 \\ 1 - \lambda(p), & \text{if } x' = x + \delta(z = 1) \\ 0, & \text{otherwise.} \end{cases}$$

Before we show the existence of an optimal stationary policy, the following lemma is required, which comes from Corollary 7.5.9 and Theorem 7.5.6 in [9].
Lemma 3.1: Consider a discrete-time Markov decision process {x(t) : t ≥ 0} consisting of the four-element tuple {S, (A(i), i ∈ S), p(j|i, a), c(i, a)} under the average cost criterion: the state space S is denumerable; each action space A(i) is a subset of the finite action space A; the transition probability p(j|i, a) satisfies p(j|i, a) ≥ 0 for i, j ∈ S, a ∈ A(i), and Σ_{j∈S} p(j|i, a) = 1 for all i ∈ S, a ∈ A(i); and the one-step cost function satisfies c(i, a) ≥ 0 for all i ∈ S, a ∈ A(i). There exists an optimal stationary policy if the following set (CAV) of assumptions holds:

(CAV1) There exists a z-standard policy d with positive recurrent class R_d. Here d is defined to be an i0-standard policy if the Markov process induced by d, {x_d(t) : t ≥ 0}, satisfies that for any i ∈ S, the expected time m_{i,i0}(d) of a first passage from i to i0 (during which at least one transition occurs) is finite and the expected cost c_{i,i0}(d) of a first passage from i to i0 (during which at least one transition occurs) is finite.

(CAV2) Given U > 0, the set D_U = {i | c(i, a) ≤ U for some a} is finite.

(CAV3) Given i ∈ S − R_d, there exists a policy θ_i ∈ R*(z, i), where R*(z, i) is the class of policies d such that the expected cost c_{z,i}(d) of a first passage from z to i is finite.

Since P is a compact set, it follows from (1) that there exists a positive number M such that R(x, y, p, z) ≤ M. Thus, the average profit problem can be cast as a discrete-time Markov decision process under the average cost criterion with one-step cost M − R(x, y, p, z). Now we are ready to show the existence of an optimal stationary policy by applying Lemma 3.1, verifying that (CAV1–3) hold in our model.

Proposition 3.1: There exists an optimal stationary policy for the average profit problem.

Proof: Let policy d be an (s, S, p) policy, as described in the introduction, such that λ(p(x, 0)) > 0 for x > S, λ(p(x, 1)) < 1 for x < s, and 0 < λ(p(x, y)) < 1 for s ≤ x ≤ S and y = 0, 1.
It is straightforward to see that for any inventory level z ∈ [s, S] ∩ Z, the Markov chain induced by d is (z, 0)-standard with its positive recurrent class R_d = {(x, y) ∈ S : s ≤ x ≤ S}. Thus, (CAV1) holds.

Given U, it is straightforward to verify that D_U = {(x, y) ∈ S | R(x, y, p, z) ≥ −U for some (p, z) ∈ A} is finite by Assumption 2.1, and thus (CAV2) holds.

Given any system state (x0, y0) such that (x0, y0) ∉ R_d, define an (s′, S′, p′) policy, denoted by d′, such that λ(p′(x, 0)) > 0 for x > S′, λ(p′(x, 1)) < 1 for x < s′, 0 < λ(p′(x, y)) < 1 for s′ ≤ x ≤ S′ and y = 0, 1, and s′ ≤ min(x0, s) and S′ ≥ max(x0, S). Thus, it is clear that d′ ∈ R*((z, 0), (x0, y0)), and thus (CAV3) holds.

B. Optimality of (s, S, p) Policy

Based on the results in Section III-A, we only consider stationary policies, i.e., policies under which the decision depends deterministically on the system's current state. Under a stationary policy u, if the system's current state is (x, y), then the price is a function p^u(x, y), and the production decision is y^u(x, y) = 1+, 0−, or y, where y^u(x, y) = 1+ implies that y = 0 and the production is switched on, y^u(x, y) = 0− implies that y = 1 and the production is switched off, and y^u(x, y) = y implies keeping the production status unchanged.

Note that the inventory process is two-sided skip free (i.e., the inventory level x cannot jump to x + z, |z| ≥ 2, in one period). It follows from [10] that the following result holds.

Proposition 3.2: There exists an (s, S, p) policy which is optimal in the class of all stationary policies.

Proof: Suppose that a stationary policy u is given and it is optimal. If the initial state is (x0, 1), define S^u = inf{x ∈ Z : x ≥ x0, y^u(x, 1) = 0−} and s^u = sup{x ∈ Z : x < S^u, y^u(x, 0) = 1+}. First, it is easy to see that S^u < ∞. Otherwise, the inventory level will go to ∞ and thus the policy u cannot be optimal. Moreover, we have s^u > −∞.
Otherwise, the inventory level will go to −∞ and thus the policy u cannot be optimal. Since the inventory process is two-sided skip free, the inventory level is always between s^u and S^u, and thus the long-run average profit under the policy u is equal to that under an (s^u, S^u, p) policy.

If the initial state is (x0, 0), then we define s^u = sup{x ∈ Z : x ≤ x0, y^u(x, 0) = 1+} and S^u = inf{x ∈ Z : x > s^u, y^u(x, 1) = 0−}. Using a similar argument, we know that the long-run average profit under the policy u is equal to that under an (s^u, S^u, p) policy. Therefore, there exists an (s, S, p) policy which is optimal in the class of all stationary policies.

Note that the resulting s^u and S^u depend on the system's initial state. This proposition tells us that we can restrict our discussion to (s, S, p) policies. We mention that the above argument does not work for the discounted case.

C. Average Profit Under an (s, S, p) Policy

To compute the average profit, we use renewal reward theory. A cycle starts from (S, 1). The manufacturer switches off the production and the inventory level falls from S to s. Afterward, the production is switched on and the inventory level rises from s to S, which ends a cycle. Next, we compute the expected cycle length and the expected profit per cycle.

Under an (s, S, p) policy, if the production is off, the length of time in which the inventory level falls from x to x − 1, denoted by τ0(x), is geometrically distributed with mean 1/λ(p(x, 0)), i.e.,

$$P(\tau_0(x) = k) = \lambda(p(x,0))\,\big(1 - \lambda(p(x,0))\big)^{k-1}, \quad k \ge 1.$$

The expected profit earned while the inventory level falls from x to x − 1 is

$$R_0(x) = E\big[p(x,0) - h(x)(\tau_0(x) - 1) - h(x-1)\big] = p(x,0) - \frac{1 - \lambda(p(x,0))}{\lambda(p(x,0))}\,h(x) - h(x-1).$$

If the production is on, the length of time in which the inventory level rises from x to x + 1, denoted by τ1(x), is geometrically distributed with mean 1/(1 − λ(p(x, 1))).
The expected profit earned while the inventory level rises from x to x + 1 is

$$R_1(x) = E\big[p(x,1)(\tau_1(x) - 1) - h(x)(\tau_1(x) - 1) - h(x+1) - c\,\tau_1(x)\big] = p(x,1)\,\frac{\lambda(p(x,1))}{1 - \lambda(p(x,1))} - h(x)\,\frac{\lambda(p(x,1))}{1 - \lambda(p(x,1))} - h(x+1) - \frac{c}{1 - \lambda(p(x,1))}.$$

It follows from renewal reward theory that the average profit under an (s, S, p) policy is

$$R(s, S, p) = \frac{-K + \sum_{i=s+1}^{S} R_0(i) + \sum_{i=s}^{S-1} R_1(i)}{\sum_{i=s+1}^{S} 1/\lambda(p(i,0)) + \sum_{i=s}^{S-1} 1/\big(1 - \lambda(p(i,1))\big)}$$

where K = K1 + K0. By subtracting a dummy profit γ from the original profit in each period, we define the following auxiliary total profit function in a cycle:

$$\begin{aligned}
L(s, S, p, \gamma) &= -K + \sum_{i=s+1}^{S} R_0(i) + \sum_{i=s}^{S-1} R_1(i) - \gamma\left(\sum_{i=s+1}^{S} \frac{1}{\lambda(p(i,0))} + \sum_{i=s}^{S-1} \frac{1}{1 - \lambda(p(i,1))}\right) \\
&= -K + \sum_{i=s+1}^{S} \left(p(i,0) - \frac{h(i) + \gamma}{\lambda(p(i,0))} + h(i) - h(i-1)\right) + \sum_{i=s}^{S-1} \left(\frac{p(i,1)\lambda(p(i,1)) - c - h(i) - \gamma}{1 - \lambda(p(i,1))} + h(i) - h(i+1)\right) \\
&= -K + \sum_{i=s+1}^{S} \left(p(i,0) - \frac{h(i) + \gamma}{\lambda(p(i,0))}\right) + \sum_{i=s}^{S-1} \frac{p(i,1)\lambda(p(i,1)) - c - h(i) - \gamma}{1 - \lambda(p(i,1))}.
\end{aligned} \tag{2}$$

Define L(γ) = max_{s,S,p} L(s, S, p, γ). Then it follows from the basic theory of fractional programming [8] that γ is the maximal average profit if and only if L(γ) = 0. It follows from the first equality of (2) that L(s, S, p, γ) is a strictly decreasing, convex function of γ. Since both monotonicity and convexity are preserved under maximization, L(γ) is also a strictly decreasing, convex function of γ. It follows from the first equality of (2) and the definition of L(γ) that if γ is sufficiently large, L(γ) < 0, and if γ is sufficiently small, L(γ) > 0. In fact, in Proposition 3.3 we show that L(γ) = +∞ if γ < 0. Since L(γ) is strictly decreasing in γ, there exists a unique γ* such that L(γ*) = 0, and γ* is the optimal average profit.

D. Analysis of the Optimal Price Given s, S, and γ

We analyze the optimal price given s, S, and γ, which is the optimal solution to the problem

$$L(s, S, \gamma) := \max_{p} L(s, S, p, \gamma).$$

It follows from Assumption 2.2 that λ(p) has a strictly decreasing, continuous inverse p(λ).
Therefore, we can take the arrival probability λ as the decision variable instead of p.

Define f0(γ, x, λ) = p(λ) − (h(x) + γ)/λ and f1(γ, x, λ) = (λp(λ) − c − h(x) − γ)/(1 − λ). It follows from the last equality of (2) that

$$L(s, S, p, \gamma) = -K + \sum_{i=s+1}^{S} f_0\big(\gamma, i, \lambda(p(i,0))\big) + \sum_{i=s}^{S-1} f_1\big(\gamma, i, \lambda(p(i,1))\big).$$

We first compute the optimal arrival probability λ*(x, 0) = λ(p*(x, 0)) when the inventory level is x and the production is off. The following assumption is required for our discussion.

Assumption 3.1: For any 0 < λ ≤ λ̄, λp″(λ) + 2p′(λ) < 0.

Note that the condition that p(λ) is concave decreasing in λ is sufficient for λp″(λ) + 2p′(λ) < 0. Moreover, it follows from Assumption 3.1 that the feasible region of λ is the interval [0, λ̄]. It follows from

$$\frac{d}{d\mu}\,p\!\left(\frac{1}{\mu}\right) = -\frac{1}{\mu^2}\,p'\!\left(\frac{1}{\mu}\right), \qquad \frac{d^2}{d\mu^2}\,p\!\left(\frac{1}{\mu}\right) = \frac{1}{\mu^4}\,p''\!\left(\frac{1}{\mu}\right) + \frac{2}{\mu^3}\,p'\!\left(\frac{1}{\mu}\right)$$

that Assumption 3.1 implies that p(1/μ) is strictly concave in μ for μ > 1. Thus, by letting μ = 1/λ, we see that f0(γ, x, λ) = p(1/μ) − (h(x) + γ)μ is strictly concave in μ and its first-order derivative is

$$\frac{\partial f_0}{\partial \mu} = -\lambda^2 p'(\lambda) - (h(x) + \gamma). \tag{3}$$

Now we can claim that it suffices to consider the case γ ≥ 0 in our technical note.

Proposition 3.3: If γ < 0, then L(γ) = +∞.

Proof: Note that for admissible λ, −λ²p′(λ) ≥ 0. Thus, if γ < 0, then f0 is increasing in μ and thus decreasing in λ at x = 0. Therefore, max_{λ∈[0,λ̄]} f0(γ, 0, λ) = f0(γ, 0, 0) = +∞, and thus L(s, S, γ) = +∞, which implies L(γ) = +∞.

Let g0(λ) = −λ²p′(λ); then g0 is strictly increasing in λ for λ > 0, as g0′(λ) = −λ(λp″(λ) + 2p′(λ)) > 0 according to Assumption 3.1. Thus, g0 has a strictly increasing inverse, denoted by g0⁻¹. Moreover, g0(0) = 0 and g0(λ̄) = −λ̄²p′(λ̄). It follows from (3) that the optimal arrival probability at state (x, 0) is

$$\lambda^*(x, 0) = \begin{cases} g_0^{-1}\big(h(x) + \gamma\big), & \text{if } h(x) + \gamma \le g_0(\bar\lambda) \\ \bar\lambda, & \text{otherwise.} \end{cases} \tag{4}$$

Next we compute the optimal arrival probability λ*(x, 1) when the inventory level is x and the production is on. Let μ = 1/(1 − λ). It follows from

$$\frac{d}{d\mu}\left[(\mu - 1)\,p\!\left(\frac{\mu - 1}{\mu}\right)\right] = \frac{\mu - 1}{\mu^2}\,p'\!\left(\frac{\mu - 1}{\mu}\right) + p\!\left(\frac{\mu - 1}{\mu}\right)$$

$$\frac{d^2}{d\mu^2}\left[(\mu - 1)\,p\!\left(\frac{\mu - 1}{\mu}\right)\right] = \frac{\mu - 1}{\mu^4}\,p''\!\left(\frac{\mu - 1}{\mu}\right) + \frac{2}{\mu^3}\,p'\!\left(\frac{\mu - 1}{\mu}\right)$$

and Assumption 3.1 that f1(γ, x, λ) = (μ − 1)p((μ − 1)/μ) − (c + h(x) + γ)μ is strictly concave in μ for μ > 1, and its first-order derivative is

$$\frac{\partial f_1}{\partial \mu} = \lambda(1 - \lambda)\,p'(\lambda) + p(\lambda) - \big(c + h(x) + \gamma\big). \tag{5}$$

Let g1(λ) = λ(1 − λ)p′(λ) + p(λ); then g1 is strictly decreasing in λ for λ > 0, as g1′(λ) = (1 − λ)(λp″(λ) + 2p′(λ)) < 0 according to Assumption 3.1. Thus, g1 has a strictly decreasing inverse, denoted by g1⁻¹. Moreover, g1(0) = lim_{λ→0} λp′(λ) + p(0) and g1(λ̄) = λ̄(1 − λ̄)p′(λ̄) + p(λ̄). It follows from (5) that the optimal arrival probability at state (x, 1) is

$$\lambda^*(x, 1) = \begin{cases} g_1^{-1}\big(c + h(x) + \gamma\big), & \text{if } c + h(x) + \gamma \le g_1(0) \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

Define r0(γ, x) = max_λ f0(γ, x, λ) = f0(γ, x, λ*(x, 0)) and r1(γ, x) = max_λ f1(γ, x, λ) = f1(γ, x, λ*(x, 1)). Obviously, both r0(γ, x) and r1(γ, x) are strictly decreasing, convex functions of γ, as both f0(γ, x, λ) and f1(γ, x, λ) are strictly decreasing, convex functions of γ. Moreover, we have

$$L(s, S, \gamma) = -K + \sum_{i=s+1}^{S} r_0(\gamma, i) + \sum_{i=s}^{S-1} r_1(\gamma, i) = -K + \sum_{i=s+1}^{S} \big(r_0(\gamma, i) + r_1(\gamma, i-1)\big). \tag{7}$$

The following two propositions characterize how r0(γ, x) and r1(γ, x) depend on x, which is crucial in determining the optimal s(γ) and S(γ).

Proposition 3.4: For any given γ ≥ 0, both r0(γ, x) and r1(γ, x) (x ∈ R) are quasiconcave functions of x, and 0 is the maximum point.

Proof: It follows from the envelope theorem [7] that dr0(γ, x)/dx = −h′(x)/λ*(x, 0) and dr1(γ, x)/dx = −h′(x)/(1 − λ*(x, 1)). Since h(x) is quasiconvex in x, we have h′(x) ≤ 0 when x ≤ 0 and h′(x) ≥ 0 when x ≥ 0. Therefore, both r0(γ, x) and r1(γ, x) are quasiconcave functions of x, and 0 is the maximum point.

Proposition 3.5: For any given γ ≥ 0, r0(γ, x) + r1(γ, x − 1) (x ∈ Z) is a quasiconcave function of x, and 0 or 1 is the maximum point.

Proof: It follows from Proposition 3.4 that r0(γ, x) + r1(γ, x − 1) is increasing on x ≤ 0 and decreasing on x ≥ 1. Since x ∈ Z, r0(γ, x) + r1(γ, x − 1) takes its maximum at point 0 or 1 and is quasiconcave in x ∈ Z.

We mention that the constraint x ∈ Z is necessary for Proposition 3.5 to hold, as r0(γ, x) + r1(γ, x − 1) might not be quasiconcave for 0 < x < 1.

E. Analysis of the Optimal s and S Given γ

Now we want to find the optimal s and S given γ, which is the solution of

$$\max_{s < S,\; s, S \in \mathbb{Z}} L(s, S, \gamma).$$

Proposition 3.6: Given γ ≥ 0, if the set {x ∈ Z : r0(γ, x) + r1(γ, x − 1) ≥ 0, x ≥ 0} is non-empty, then the optimal s(γ) and S(γ) are given by

$$s(\gamma) = \min\{x \in \mathbb{Z} : r_0(\gamma, x) + r_1(\gamma, x - 1) \ge 0\} - 1 \tag{8}$$

$$S(\gamma) = \max\{x \in \mathbb{Z} : r_0(\gamma, x) + r_1(\gamma, x - 1) \ge 0\}. \tag{9}$$

Proof: Since r0(γ, x) + r1(γ, x − 1) is a quasiconcave function of x ∈ Z, it follows from (7) that the optimal solution for max_{s,S} L(s, S, γ) should be the s and S such that r0(γ, i) + r1(γ, i − 1) ≥ 0 for all s + 1 ≤ i ≤ S. Thus, the result holds.

The set {x ∈ Z : r0(γ, x) + r1(γ, x − 1) ≥ 0, x ≥ 0} might be empty if γ is quite large. In this case, r0(γ, x) + r1(γ, x − 1) < 0 for all x ∈ Z. Then, if r0(γ, 0) + r1(γ, −1) ≥ r0(γ, 1) + r1(γ, 0), the optimal (s(γ), S(γ)) = (−1, 0). Otherwise, the optimal (s(γ), S(γ)) = (0, 1).

Proposition 3.7: Consider the domain of γ such that the set {x ∈ Z : r0(γ, x) + r1(γ, x − 1) ≥ 0, x ≥ 0} is non-empty. Then s(γ) is increasing in γ and S(γ) is decreasing in γ.

Proof: Note that r0(γ, x) + r1(γ, x − 1) is decreasing in γ. For γ1 < γ2, it follows from (8) that r0(γ1, s(γ1)) + r1(γ1, s(γ1) − 1) < 0 and thus r0(γ2, s(γ1)) + r1(γ2, s(γ1) − 1) < 0. Again, it follows from (8) that s(γ1) < s(γ2) + 1, or s(γ1) ≤ s(γ2). Thus, s(γ) is increasing in γ. Similarly, it follows from (9) that S(γ) is decreasing in γ.
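The computation behind (7)–(9) and the root condition L(γ*) = 0 can be sketched numerically. The following Python sketch is illustrative only and makes several assumptions that are not in the text: it uses the parameters of the numerical example in Section III-G, it assumes a production cost c = 0 because the example does not state one, and it replaces the closed forms (4) and (6) with a plain grid search over the price set P = [1, 10]. The function names `r0`, `r1`, `thresholds`, and `solve_gamma` are hypothetical.

```python
import math

THETA, P_MIN, P_MAX = 0.2, 1.0, 10.0   # lambda(p) = exp(-THETA * p), p in [P_MIN, P_MAX]
K = 150.0                               # K0 + K1 from the numerical example
H, B = 0.1, 0.15                        # holding / shortage cost rates
C = 0.0                                 # ASSUMPTION: production cost is not stated in the example

PRICES = [P_MIN + 0.05 * k for k in range(181)]   # price grid used instead of (4) and (6)
LAMS = [math.exp(-THETA * p) for p in PRICES]


def h(x):
    return H * max(x, 0) + B * max(-x, 0)


def r0(gamma, x):
    """max over p of f0 = p - (h(x) + gamma) / lambda(p)."""
    a = h(x) + gamma
    return max(p - a / lam for p, lam in zip(PRICES, LAMS))


def r1(gamma, x):
    """max over p of f1 = (lambda(p) * p - c - h(x) - gamma) / (1 - lambda(p))."""
    b = C + h(x) + gamma
    return max((lam * p - b) / (1.0 - lam) for p, lam in zip(PRICES, LAMS))


def thresholds(gamma, lo=-40, hi=40):
    """Return (s, S, L) from (7)-(9); uses the two-point rule if no term is >= 0."""
    g = {x: r0(gamma, x) + r1(gamma, x - 1) for x in range(lo, hi + 1)}
    good = [x for x in g if g[x] >= 0]
    if good:
        s, S = min(good) - 1, max(good)
    else:
        s, S = (-1, 0) if g[0] >= g[1] else (0, 1)
    return s, S, -K + sum(g[x] for x in range(s + 1, S + 1))


def solve_gamma(lo=0.0, hi=5.0, iters=50):
    """Bisection on the decreasing function L(gamma) to locate L(gamma*) = 0."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if thresholds(mid)[2] > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `gamma_star = solve_gamma()` and then `thresholds(gamma_star)` gives the thresholds at the root. Bisection is chosen here only because L(γ) is strictly decreasing; the paper itself reports using value iteration for its example, so this is a different route to the same quantities.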
Proposition 3.8: λ*(x, 0) is decreasing on x ≤ 0 and increasing on x ≥ 0; λ*(x, 1) is increasing on x ≤ 0 and decreasing on x ≥ 0.

Proof: It follows from the definition of λ*(x, 0) that

$$f_0\big(\gamma, x, \lambda^*(x, 0)\big) \ge f_0\big(\gamma, x, \lambda^*(x-1, 0)\big)$$

$$f_0\big(\gamma, x-1, \lambda^*(x-1, 0)\big) \ge f_0\big(\gamma, x-1, \lambda^*(x, 0)\big).$$

Summing the above two inequalities, we obtain

$$\left(\frac{1}{\lambda^*(x, 0)} - \frac{1}{\lambda^*(x-1, 0)}\right)\big(h(x) - h(x-1)\big) \le 0.$$

It follows from the quasiconvexity of h(x) that λ*(x, 0) is decreasing on x ≤ 0 and increasing on x ≥ 0. Similarly, we have

$$\left(\frac{1}{1 - \lambda^*(x, 1)} - \frac{1}{1 - \lambda^*(x-1, 1)}\right)\big(h(x) - h(x-1)\big) \le 0.$$

It follows from the quasiconvexity of h(x) that λ*(x, 1) is increasing on x ≤ 0 and decreasing on x ≥ 0.

Note that p(λ) is strictly decreasing in λ and the optimal price is p*(x, y) = p(λ*(x, y)). Thus, the interpretation of Proposition 3.8 is as follows: When the production is off (i.e., y = 0), the optimal price increases in the inventory level x if x ≤ 0 and decreases in x if x ≥ 0. When the production is on (i.e., y = 1), the optimal price decreases in the inventory level x if x ≤ 0 and increases in x if x ≥ 0. Therefore, the optimal price is a quasiconcave function of the inventory level x when the production is off and a quasiconvex function of x when the production is on.

The result is consistent with our intuition. Because of inventory holding costs and backlogging costs, the manufacturer attempts to keep the inventory level around 0 as long as possible and to move the inventory level away from S and s as fast as possible. Thus, the manufacturer adopts different pricing strategies according to the current production state.
Suppose that the production is off (on): if the inventory level is high, the manufacturer should set the selling price low (high) to attract more demand and sell the product in stock (to make the inventory level quickly reach S and switch the production off); if the backlogging level is high, the manufacturer should also set the selling price low (high) to make the inventory level quickly reach s and switch the production on (to meet more backlogged demand).

F. Comparative Statics

We discuss the effect of the switching cost. Let γ* be the optimal average profit and s* = s(γ*), S* = S(γ*).

Proposition 3.9: γ* is strictly decreasing in K = K0 + K1; s* is decreasing in K and S* is increasing in K. Moreover, p*(x, 0) is increasing in K and p*(x, 1) is decreasing in K. Both the expected production-on time length and the expected production-off time length of a cycle are increasing in K.

Proof: It follows from

$$0 = L(\gamma^*) = \max_{s < S,\; s, S \in \mathbb{Z}} L(s, S, \gamma^*) = \max_{s < S,\; s, S \in \mathbb{Z}} \left\{ -K + \sum_{i=s+1}^{S} \big(r_0(\gamma^*, i) + r_1(\gamma^*, i-1)\big) \right\} \tag{10}$$

and the fact that both r0(γ, x) and r1(γ, x) are strictly decreasing in γ that γ* is strictly decreasing in K. Thus, it follows from Proposition 3.7 that s* is decreasing in K and S* is increasing in K.

It follows from (4) and (6) that λ*(x, 0) is decreasing in K and λ*(x, 1) is increasing in K. Thus, p*(x, 0) is increasing in K and p*(x, 1) is decreasing in K.

The expected production-on time length of a cycle is Σ_{i=s*}^{S*−1} 1/(1 − λ*(i, 1)), which is increasing in K, as λ*(x, 1) is increasing in K, s* is decreasing in K, S* is increasing in K, and 1 − λ*(i, 1) > 0 for all i ∈ Z. Similarly, the expected production-off time length of a cycle, Σ_{i=s*+1}^{S*} 1/λ*(i, 0), is also increasing in K.

Fig. 1. Sample trajectory of inventory level and price under optimal policy.

The above result indicates that when the switching cost K increases, switching less frequently is preferred.
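The strict monotonicity of γ* in K can also be read off a short sensitivity calculation. The derivation below is a sketch under two assumptions that the text does not state: that L is differentiable in γ at γ*, and that the maximizer (s*, S*, p*) is unique there, so that an envelope argument applies. The symbol T*, denoting the expected cycle length under the optimal policy, is introduced here only for illustration.

```latex
% From the first equality of (2), for a fixed policy (s, S, p):
%   L(s,S,p,\gamma) = -K + \sum_{i=s+1}^{S} R_0(i) + \sum_{i=s}^{S-1} R_1(i) - \gamma\, T(s,S,p),
% where T(s,S,p) is the expected cycle length (the denominator of R(s,S,p)).
\begin{align*}
  0 &= L\big(\gamma^*(K); K\big)
  && \text{(optimality condition)} \\
  0 &= \frac{\partial L}{\partial K} + \frac{\partial L}{\partial \gamma}\,\frac{d\gamma^*}{dK}
     = -1 - T^*\,\frac{d\gamma^*}{dK}
  && \text{(envelope argument at the maximizer)} \\
  \frac{d\gamma^*}{dK} &= -\frac{1}{T^*} < 0 .
\end{align*}
```

Under these assumptions, a unit increase in the total switching cost lowers the optimal average profit by the reciprocal of the expected cycle length, which matches the intuition that K is paid once per cycle.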
The impact of the holding cost on the optimal policy is unclear, although γ* decreases as holding costs increase. Consider a special case in which the holding cost h(x) at each state x is increased by Δh (i.e., it becomes h(x) + Δh). In this case, the optimal average profit decreases by Δh, but the optimal production-pricing policy (i.e., s*, S*, and the optimal price) remains unchanged.

G. A Numerical Example

In this part, we provide a simple numerical example to illustrate the properties of the optimal pricing strategies. In this example, the production rate is r = 1, and λ(p) = e^{−θp} with θ = 0.2. The allowable price set is P = [1, 10]. The switching costs are K0 = 50 and K1 = 100. The holding cost function is h(x) = hx⁺ + bx⁻ with h = 0.1 and b = 0.15. Using the value iteration algorithm, we find that the optimal average profit is γ* = 0.13, with s* = −12 and S* = 17.

Fig. 1 shows a sample trajectory of the inventory level and price under the optimal policy, starting from level 0 with production on, which reflects an asymptotic cyclic behavior. When the production is on, the price first decreases and then increases, and the inventory level increases from s to S. The price reaches its minimum when the inventory level is close to zero, so that demand and production can be balanced while keeping inventory low. When the production is off, the price first increases and then decreases, and the inventory level decreases from S to s. The price reaches its maximum (p = 10) and stays there for a while when the inventory level is close to zero.

IV. CONCLUSION

In this technical note, we consider a stochastic inventory system in which the manufacturer can dynamically adjust the selling price and switch the production on or off to maximize the long-run average profit. We show that an (s, S, p) policy is optimal. Moreover, the structural properties of the optimal profit function and pricing strategies are presented. There are a few topics worthy of further discussion.
One can consider a compound Poisson demand process or batch production, or formulate the problem directly as a continuous model with a Poisson process. The problem under a discounted criterion, over a finite horizon, or with lead times can also be studied.

ACKNOWLEDGMENT

The authors thank Prof. X. Chen for proposing this research problem.

REFERENCES

[1] D. P. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA, USA: Athena Scientific, 2001, vol. II.
[2] X. Chao and S. X. Zhou, "Joint inventory-and-pricing strategy for a stochastic continuous-review system," IIE Trans., vol. 38, no. 5, pp. 401–408, 2006.
[3] X. Chen and D. Simchi-Levi, "Coordinating inventory control and pricing strategies with random demand and fixed ordering cost: The infinite horizon case," Mathemat. Oper. Res., vol. 29, no. 3, pp. 698–723, 2004.
[4] X. Chen and D. Simchi-Levi, "Coordinating inventory control and pricing strategies: The continuous review models," Oper. Res. Lett., vol. 34, pp. 323–332, 2006.
[5] A. Federgruen and P. H. Zipkin, "An inventory model with limited production capacity and uncertain demands I. The average-cost criterion," Mathemat. Oper. Res., vol. 11, no. 2, pp. 193–207, 1986.
[6] N. D. V. Foreest and J. Wijngaard, "On optimal policies for production-inventory systems with compound Poisson demand and setup costs," Mathemat. Oper. Res., vol. 39, no. 2, pp. 517–532, 2014.
[7] P. Milgrom and I. Segal, "Envelope theorems for arbitrary choice sets," Econometrica, vol. 70, no. 2, pp. 583–601, 2002.
[8] S. Schaible and T. Ibaraki, "Fractional programming," Eur. J. Oper. Res., vol. 12, pp. 325–338, 1983.
[9] L. I. Sennott, Stochastic Dynamic Programming and the Control of Queueing Systems. New York, NY, USA: Wiley, 1999.
[10] M. J. Sobel, "Optimal average-cost policy for a queue with start-up and shut-down costs," Oper. Res., vol. 17, no. 1, pp. 145–162, 1969.