EFFECTS OF SYSTEM PARAMETERS ON THE OPTIMAL POLICY

EFFECTS OF SYSTEM PARAMETERS ON THE OPTIMAL POLICY STRUCTURE
IN A CLASS OF QUEUEING CONTROL PROBLEMS
Eren Başar Çil∗ , E. Lerzan Örmeci∗∗ and Fikri Karaesmen∗∗
∗
Kellog School of Management
Northwestern University
2001 Sheridan Rd
Evanston, IL 60208, USA
[email protected]
∗∗ Department
of Industrial Engineering
Koç University
34450, Sarıyer, İstanbul, TURKEY
[email protected], [email protected],
Effects of System Parameters on the Optimal Policy
Structure in a Class of Queueing Control Problems
Eren Başar Çil∗ , E. Lerzan Örmeci∗∗ , Fikri Karaesmen∗∗
∗
Kellog School of Management, Northwestern University, Evanston, USA
[email protected]
∗∗
Department of Industrial Engineering, Koç University, İstanbul, TURKEY
[email protected], [email protected]
Abstract
This paper studies a class of queueing control problems involving commonly used control
mechanisms such as admission control, pricing, and service rate control. Such controls are
frequently employed in modeling service systems, telecommunications, and also in production/inventory systems modeled as make-to-stock queues. It is by now well established that
in a number of such important problems, there is an optimal policy with a certain structure
that can be described by a few parameters. From a design point of view, it is useful to understand how such an optimal policy changes when the system parameters such as arrival or service
rates, or the number of servers vary. We present a general framework to investigate the policy
implications of the changes in system parameters by using event-based dynamic programming.
In this framework, the value function of the control problem is first represented by a number
of common operators, and the effect of system parameters on the structured optimal policy
is analyzed for each individual operator. We present this analysis for a set of frequently used
operators. Whenever a queueing control problem can be modeled by these operators, the effects
of system parameters on the optimal policy follows trivially from this analysis. This enables us
to easily retrieve and generalize some of the existing results in the literature. We also present
examples in a number of challenging problems.
1
Introduction
This paper focuses on a class of queueing control problems that frequently arise in modelling service
systems, telecommunications systems, or production and inventory systems. The objective in such
systems is, in general, managing the queue length or the inventory level in order to maximize a
reward function. This can be achieved by controlling the arrival rates using admission control or
dynamic pricing, or by controlling the service rate by slowing down or speeding up service. A
significant literature exists on the investigation of such problems. One of the common interesting
findings of this literature is that in a number of situations, the optimal control policy can be shown
to have a simple structure. A frequent case is the optimality of threshold policies where the optimal
1
action only depends on a threshold value of the state variable which may be the queue length or the
inventory level. Such structural results are important, because based on them simple operational
policies can be designed. In this paper, we pursue this line of exploration further and address the
following question: assume that the system parameters (i.e., arrival rates, service rates, number of
servers, buffer size) change, how would the structured optimal policy be impacted by this change?
The main purpose of this paper is to develop a general approach to analyze this issue.
The optimality of structured policies is essential for our investigation. Unless the optimal policy
is structured, it is difficult and not very meaningful to make statements on how it would change
when the parameters vary. Fortunately, optimality of structured policies was addressed on an
individual problem basis in a large number of papers over the past thirty years. In addition, there
has been some work on developing general approaches for obtaining results on the structure of
optimal policies that apply to a class of problems. For instance, Veatch and Wein [25] investigate
a class of queueing problems with this perspective. Smith and McCardle [10] present some results
for general Markov Decision Processes (MDPs). In particular, a fruitful general approach for
queueing problems is proposed by Koole [11] which introduces the so-called event-based dynamic
programming framework. Event-based dynamic programming expresses the value function of a
given control problem in terms of the composition of different operators corresponding to individual
events. Establishing the structure of an optimal policy can then be performed by verifying that the
different operators constituting the problem satisfy certain properties. This decomposition of the
problem is extremely useful, since the properties of commonly used operators in queueing control
problems can be investigated individually for once and collected in a library as in Koole [11] and [12].
The end result is that, if a queueing control problem is composed of known operators, establishing
the structure of the optimal policy is usually trivial by checking the operator library. Our approach
for the investigation of the effects of system parameters on the optimal policy is in a similar spirit.
We employ the event-based dynamic programming framework and focus on a number of frequently
employed operators. We investigate the properties of these operators in detail with a view towards
the effects of system parameters. This has two purposes: first it gives a clearer view of how and
why the optimal policy is influenced by system parameters. Second, as in Koole [11], whenever a
queueing control problem can be expressed as a composition of these operators, understanding the
effects of a given system parameter on the optimal policy becomes an easy problem.
In order to investigate the influence of system parameters on the optimal policy, we first need to
establish certain properties of the value functions with respect to these system parameters. Koole
[12], independently, has considered these properties, but with a focus on optimal design issues to
support strategical decisions. We, on the other hand, concentrate on the effects of these properties
on optimal control policies, which correspond to operational decisions. To our knowledge, there
are only a few specific studies which concentrate on this issue. For instance, Ku and Jordan [13]
study an admission control problem in a two-stage multi-server loss system. They characterize
2
the structure of the optimal policy, and then establish the effects of the system parameters on
this policy. Gans and Savin [6] consider a joint admission control and dynamic pricing problem in
a multi-server loss system and analyze the effects of the parameters on the optimal policy. The
optimal admission policy is shown to have a threshold structure, and they prove the monotonicity
of the optimal thresholds and the optimal prices in the system parameters. Aktaran-Kalaycı and
Ayhan [1] and Çil, Karaesmen and Örmeci [4] examine the effects of the parameters on the optimal
decisions in a multi-server queueing system with limited capacity. They show the monotonicity
of the optimal prices as in [6]. Most of these papers and a number of other important models in
the literature fit into our framework, and the influence of the parameters on these models can be
determined easily by our results. In order to demonstrate the application of the general approach,
we also investigate two models in some detail.
Our main contribution is providing a general framework to investigate the effects of system
parameters on the optimal policy in queueing control problems. To our knowledge, this problem
has not been considered in this generality before. Müller [17] proposes an approach to investigate
the effects of transition probabilities on the value function of a Markov Decision Process in a general
setting. However, his focus is not on policy implications. A fairly rich class of problems, including
several earlier studied ones, fall within the framework developed here. There are, of course, certain
limitations to the generality. First a given problem must be within the scope of event-based
programming and the value iteration principle must hold. Moreover, for infinite-horizon problems
the existence of optimal value functions and optimal policies should be checked separately. We
do not address these issues here (see [11] and [12] for more on these issues). These issues are
sometimes easy to verify and follow by known general results such as the ones in Puterman [21]
but in some cases an analysis of the individual problem may be necessary. Second, we investigate
a certain number of operators which are frequently used and cover a significant scope, but other
problem-specific operators should be investigated individually. Using the results on the operators
here, we think it should not be difficult to extend the investigation to other operators.
The other contributions of this paper are as follows: we investigate a subset of the dynamic
programming operators from Koole [11] but in certain cases we modify or generalize these operators.
In addition, we also present a number of new operators. Some of these are relevant for recent
applications of interest such as dynamic pricing, others are introduced since they are used in the
control of make-to-stock queues that model production/inventory systems. We present structural
properties for these parameters as well as investigating the effects of system parameters. The full
strength of event-based dynamic programming appears when the structure of the optimal policy
and the effects of the parameters have to be determined for a new problem. To this end, we explore
an inventory rationing problem for a make-to-stock queue with multiple classes of demand arriving
in batches. Using the operators and the framework, we present a characterization of the optimal
policy and determine the effects of the system parameters.
3
The rest of the paper is organized as follows. Section 2 introduces a simple model to demonstrate
how the event-based dynamic programming approach can be employed to determine the structure of
optimal policies and the effects of parameter changes in these structures. We present the operators
modelling the events in stochastic control problems, as well as their relations to certain parameters
in Section 3. Then, in Section 4, we work on certain properties of the operators to guarantee the
existence of optimal monotone policies and to characterize their behavior with respect to parameter
changes. Section 5 presents several models that can be generated by combining the operators we
have defined, in order to illustrate how to apply our results regarding to the effects of system
parameters. In particular, we discuss two models in detail. Finally, we conclude the study and
mention our future research objectives in section 6.
2
Illustration of the framework
In this section, our goal is three-fold: First, we motivate our main tool, the event-based dynamic
programming technique (introduced in [11]). For this purpose, we represent a typical Markovian
control problem as an event-based dynamic program, from which the structure of the optimal
control policy can easily be deduced. Second, we illustrate that it is not straightforward to predict
how the policy parameters would change when the system parameters are varied. The rest of
the paper will, then, show the effectiveness of the event-based dynamic programming technique in
investigating the influence of parameter changes on optimal control policies. Finally, with the aid
of the model, we classify different kinds of effects that different parameters may impose.
We consider an admission control problem in a simple queueing system with m identical parallel
servers and infinite waiting room. The system receives N different types of customers, where class-i
customers arrive at the system according to a Poisson process with a rate λi bringing a reward of
Ri , and demand an exponential service time with rate µ. Then, the state of the system can be
defined as the current number of customers in system x, where x ∈ Z+ , Z+ = {0, 1, . . . }. A holding
cost of h(x) is incurred per unit time, where h(x) is a convex and non-decreasing function of x. At
each arrival epoch, the decision maker either accepts the incoming customer or rejects her in order
to maximize the expected profit over an infinite horizon. We assume that a rejected customer is lost
forever. We can consider both the total discounted and long-run average criteria. However, for now
we concentrate on the total discounted profit with a discount rate of β. We employ uniformization
to form the equivalent discrete time Markov decision model of the system (see [16]). Hence, we
assume that
N
i=1 λi + M µ + θ + β
= 1 without loss of generality, where M ≥ m + 1 and θ > 0. We
note that θ + µ(M − m) corresponds to the rate of fictitious events, which are introduced to ensure
that the time scale will not be affected when one of the parameters, µ or λ or m, increases. We
define vn (x) as the total expected β-discounted profit of such a system when there are n remaining
transitions in the horizon. Then the standard arguments of Markov decision processes can be used
4
to analyze the finite horizon problems, and the infinite horizon problems as their limits, see e.g.,
[21]. The value functions vn (x) satisfy the following optimality equations:
N
vn+1 (x) =
i=1
λi max {vn (x), vn (x + 1) + Ri } + µ min{m, x} vn ((x − 1)+ )
+(µ (M − min{m, x}) + θ) vn (x) − h(x),
where a+ = max{0, a}. For this particular model, we can show that both the discounted infinite
horizon problem and the long-run average problem have a solution, which can be found by the
value iteration algorithm. Hence, we can define v(x) as the total expected β-discounted profit over
an infinite horizon, and compute it by v(x) = limn→∞ vn (x). Finally, taking β → 0 in v(x) converts
the problem to maximizing the long-run average profit in the usual way, see e.g., [21]. In the rest of
this section we will concentrate on the total expected β-discounted profit over an infinite horizon,
v(x).
The traditional approach is to examine the structure of an optimal policy using the optimality
equations in order to prove some structural properties (monotonicity, concavity, etc.) of the optimal
policies. For example, one can prove that if v0 (x) is non-increasing and concave in x, then vn (x) will
also be non-increasing and concave in x for all n > 0 by induction. However, proving the desired
properties may not be easy by working on the entire value function in some cases, as Koole [11]
points out. Therefore, Koole [11] suggests defining the value function as a combination of distinct
event operators as an alternative to the traditional approach.
Now, we demonstrate how one can use the event-based dynamic programming technique by
redefining the value function of the above model as a combination of certain event operators. Let
us first introduce the following event operators:
TCOST v(x) = v(x) − h(x),
TU N IF ({fj (x)}; {pj }) =
pj fj (x),
j
TDEP v(x) = b(x)v((x − 1)+ ) + [1 − b(x)]v(x), with b(x) =
min{x, m}
,
M
(1)
TADMi v(x) = max {v(x), v(x + 1) + Ri } ,
TF IC v(x) = v(x).
Using the above operators, the value function for the total expected β-discounted profit over
an infinite horizon, v(x) is:
v(x) = TCOST TU N IF (TDEP v(x), {TADMi v(x)}i , TF IC v(x); M µ, {λi }, θ) ,
In order to investigate the structure of the value function, we first show that TADMi , TDEP ,
TCOST , TF IC preserve some properties, say P rop1 ,. . . , P ropk , of any function, f (x), on which they
5
are applied, whereas we prove that TU N IF preserves these properties whenever all functions entering
the operator satisfy them. Then, the value function, vn+1 (x), has P rop1 ,. . . , P ropk , if vn (x) has
these properties since it is the combination of the event operators: TU N IF , TADMi , TDEP , TCOST ,
TF IC . This implies, by induction, that vn (x) has these properties for all n if v0 (x) has them. As
v(x) = limn→∞ vn (x), v(x) also inherits these properties, as well as the relative value functions for
the long-run average criterion (see e.g., [21]). Finally, the properties of the value functions should
be related to the optimal decisions, which will characterize the optimal policy structure for this
model.
In this specific model, it can be shown that all the operators TU N IF , TADMi , TDEP , TCOST ,
TF IC preserve monotonicity (f (x + 1) ≤ f (x)) and concavity (f (x − 1) − f (x) ≤ f (x) − f (x + 1)),
which then guarantees the existence of an optimal threshold policy:
Proposition 1 There exists an optimal policy of threshold type, i.e., there exist numbers li ≥ 0 for
i = 1, · · · , N , such that: If x ≥ li , it is optimal to reject an incoming class-i customer; otherwise it
is optimal to admit her. Moreover, if the rewards R1 > R2 > · · · > RN , then l1 ≥ l2 ≥ · · · ≥ lN .
Now we arrive at the question which is the main subject of this paper: How will these thresholds
change if one of the parameters, more explicitly one of the arrival rates λi , or the service rate µ,
or the number of servers m increase, by some ε > 0? In general, the answer to this question is not
obvious. Let us consider a specific version of the above model, where the system has one server
and three different customer types with rewards (R1 , R2 , R3 ) = (50, 30, 10). A holding cost of 1
unit is incurred for each waiting customer per unit time, so h(x) = x. Finally, the service rate
is µ = 2.5, and the arrival rates are (λ1 , λ2 , λ3 ) = (0.5, 0.7, 0.9), before uniformization. Optimal
thresholds, which maximize the long-run average profit of the system, are (l1 , l2 , l3 ) = (72, 32, 6).
Will these optimal thresholds increase or decrease when one of the arrival rates of different classes
increases? The most puzzling case is when the arrival rate of class 2, λ2 , increases, where one
can see two different lines of intuition: (1) We may expect the threshold of class 3, l3 , to decrease
to have more room for class-2 customers, and the threshold of class-1, l1 , to increase in order to
protect the system from the growing number of class-2 customers. In this line of of thinking, the
behavior of l2 is not clear, it may decrease to open room for class 1, or increase to limit the access
of class 3. (2) Another line of thought may take an increase in any one of the arrival rates as an
increase in the overall demand, so that it expects all thresholds to decrease (which includes staying
constant). Both arguments are intuitively convincing, but also contradictory, and so it appears
that the intuition does not always provide satisfactory answers to these types of questions. Hence,
our aim in this paper is to provide a general framework, which, completely and automatically,
characterizes the behavior of optimal control policies when one of the parameters changes.
Finally, we would like to point out the different types of effects that different parameters impose
on the operators. An increase in λi affects the admission control operator, TADMi , directly, since
6
the probability of an arrival, i.e., the probability of the event which triggers the operator TADMi ,
is increased. Such an increase in λi decreases the probability of the fictitious event, θ. We will
see, later on, that this simultaneous change in the probability of these two events will call for a
comparison of their consequences. Finally, notice that the definition of the operator TADMi is still
the same, only the value of its multiplier has increased. Now assume that we increase m by 1,
which affects the departure operator, TDEP , directly. But now the definition of the operator TDEP
is altered, while the probabilities of all events remain the same, in particular the probability of
observing TDEP is still M µ.
In all the operators we will define in the next section, the direct effects of parameters will be
either a change in the probability of the event that the operator corresponds to (as in λi above)
or a change in the definition of the operator itself (as in m above) or both. Finally, we note that,
obviously, the value functions will be influenced by a change in one of the parameters, which will
affect all other events and so corresponding operators. However, these effects are indirect effects,
which do not need to be addressed explicitly. Now we are ready to introduce our framework,
starting with the operators.
3
The Operators
The important issues addressed in this paper are to introduce the appropriate event operators
that are commonly used, and to prove that they preserve certain properties of a function f . This
section introduces the operators that correspond to certain events occurring in Markovian queuing
and inventory systems. By using these operators, various Markovian systems can be constructed.
The next section will examine certain properties of these operators in order to characterize the
structure of control policies.
Our eventual goal is to characterize the effects of four parameters on the structure of optimal
policies, namely the customer or demand arrival rate λ, the service or production rate µ, the number
of servers m, and the capacity of the system K. While defining the operators, we will keep the
notation as general as possible. Therefore, in certain cases the relation of the operators with these
parameters will not be obvious from the definitions only. Hence, we will comment on the potential
direct influence of the parameters on the given operators, when applicable. As we mentioned in the
previous section, these effects can be of two types, changing the probability of the corresponding
event or changing the definition of the operator.
In the rest of our paper, x will denote the state of the system (the number of customers in the
queue or the inventory level of an item). We consider only the objective of maximizing, but all our
results are valid for minimization problems when the properties are re-defined properly. Moreover,
we define v(x) as a generic value function of an infinite horizon problem with discounting. In other
words, v(x) is the total maximal expected discounted profit of the system over an infinite horizon.
7
However, we note that, as mentioned earlier, the existence of the infinite-horizon value functions
should be shown for each different model, even if the model is constructed as a combination of the
operators defined below.
We define our operators mainly for one-dimensional systems. However, our framework can
accommodate Markov modulation by defining an operator to represent the changes in the environment state, namely TEN V (see below for the precise definition). Since the state of the environment
is exogeneous, i.e., we do not control the environment, it can even be multidimensional. We denote
the environment state by e, and change the state of the system x to (x, e). Then the operators will
be modified as in the following example: If a certain operator T is defined as T v(x) = v(x + 1), we
re-define it as T v(x, e) = v(x + 1, e), where e is the state of the exogenous environment.
Our framework can model systems with finite or infinite buffer capacity. We will define the
operators mainly for systems with infinite capacity. When stating and proving our results, we will
always address the finite-capacity systems explicitly.
In principle, we can classify the operators into two: the controlling operators are those which
involve a maximization, e.g., TADMi in the example, while the un-controlling operators are all the
others such as TDEP . Now we are ready to define the operators to be considered:
The cost operator represents the systems incurring a holding cost, h(x), which is a function of
the state of the system:
TCOST v(x) = v(x) − h(x),
where the holding cost function h(x) is non-decreasing and convex in the inventory level x.
We will define the arrival operator, TARR , to represent the arrival process to a queueing system:
TARR v(x) = a(x)v(x + 1) + [1 − a(x)]v(x).
The function a(x) is, in fact, the probability that an arriving customer joins the system when there
are x customers, which we refer to as the joining probability. We assume that a(x) is non-increasing
in x. When a(x) is constant, TARR models a system where customers enter the system with a fixed
probability, say a, independent of the state, or choose not to enter the system with probability
1 − a. We will call this type of arrivals as regular arrivals, since they do not depend on the state of
the system. Moreover, we can model finite systems by setting a(x) = 0 for all x ≥ K. Our results
will require certain properties on a(x), which will be explicitly mentioned.
A change in the arrival rate λ will change the probability of observing an arrival, and so this
operator is directly related with λ. Whenever the function a(x) does not depend on λ, the operators
themselves do not change with λ, only their multipliers in the optimality equations will change.
The function a(x), on the other hand, may depend on one of the parameters λ, µ, or m. In this
case, the definition of the operator TARR changes with the parameter under consideration, so that
we need to consider explicitly how a(x) varies with it. Finally, an increase in the system capacity
K always changes the definition of this arrival operator.
8
The departure operator, TDEP , represents the departure of an existing customer from the system,
where the service rate may depend on the state of the system. TDEP is defined as in (1), where
the function b(x) corresponds to the probability of a service completion when the system has x
customers. In general, b(x) can be any non-decreasing and concave function of x. Whenever we
need to model m identical parallel servers, b(x) is specified as in (1). We note that the function
b(x) may depend on one of the parameters λ, µ, or m in a more complex way. Then, as with the
function a(x) in TARR , we need to consider explicitly how b(x) varies with the parameter under
consideration.
The controlled departure and the controlled production operators, TCD and TC
P RD ,
represent
the choice of the best service rate in queuing and inventory systems, respectively. π is the used
portion of the service rate and cπ is the non-negative cost of using this portion, i.e., cπ ≥ 0. We
assume that c0 = minπ∈[0,1] cπ . Then:
⎧
⎨max
π∈[0,1] {−cπ + πv(x − 1) + (1 − π)v(x)} if x > 0
TCD v(x) =
⎩−c + v(x)
if x = 0,
0
TC
TCD (TC
P RD )
P RD v(x)
= max {−cπ + πv(x + 1) + (1 − π)v(x)} .
π∈[0,1]
is affected by a change in the service rate (production rate) µ, where µ changes
the probability of finishing the service of a customer (of producing an item). For a capacitated
system, an increase in the parameter K will change the definition of TC
to emphasize at this point that TC
P RD
P RD .
It is important
is the main production control operator employed in the
literature on make-to-stock queue.
The queue pricing and the inventory pricing operators, TQ P RC and TI
P RC
represent the opti-
mal price to be charged for the arriving customers in queuing and inventory systems, respectively.
F (.) is the cumulative distribution function of the reservation price of an arriving customer, R,
where R is the maximum price a customer is willing to pay. Hence:
TQ P RC v(x) = max F̄ (p)[v(x + 1) + p] + F (p)v(x) ,
p
⎧
⎨maxp F̄ (p)[v(x − 1) + p] + F (p)v(x)
TI P RC v(x) =
⎩v(x)
if x > 0
if x = 0,
where F̄ (p) = 1 − F (p).
Both of these pricing operators are affected by an increase in the arrival rate λ, through an
increase in the probability of observing the event of pricing due to an arrival. The operator TQ P RC
is additionally affected by an increase in the capacity of the system K, which changes the definition
of the operator.
The batch admission and the batch rationing operators, TB
ADM i
and TB
RT i ,
represent the
choice of the number of the customers to be admitted from an arriving batch of class-i customers
9
in queuing and inventory systems, respectively. We assume that some of the customers in a batch
can be admitted while the remaining ones are rejected, which is defined as partial acceptance in
[18]. Bi is the size of an arriving batch, κi is the number of class-i customers admitted from this
batch, and Ri is the reward obtained by admitting one class-i customer. Therefore:
TB
ADM i v(x)
=
RT i v(x)
=
TB
max {κi Ri + v(x + κi )} ,
κi ≤Bi
max
κi ≤min{x,Bi }
{κi Ri + v(x − κi )} .
We note that TADMi defined in the above example is a special case of TB
ADM i ,
where only single
customers arrive at the system, i.e., Bi = 1 for all i.
The batch operators have a similar spirit to pricing operators, with respect to the effects of
parameters. Hence, they are both affected by an increase in the arrival rate λ, which increases the
probability of observing the event of batch admission or rationing, while the operator TB
ADM i
is
additionally affected by an increase in the capacity of the system K, which changes the definition
of the operator.
The uniformization operator puts all events together with their corresponding probabilities:
l
TU N IF ({fi }lj=1 ; {pi }lj=1 )(x) =
l
pj fj (x), with
j=1
j=1
pj ≤ 1.
TU N IF is a convex combination of functions fj , and it only reflects the changes in the parameters
so that it is not affected by any parameter directly.
whereas
l
j=1 pj
l
j=1 pj
< 1 models discounting criterion,
= 1 refers to long-run average criterion.
The fictitious, TF IC , operator represents all the fictitious events, which affect neither the state
nor the reward of the system:
TF IC v(x) = v(x).
This operator is affected by both λ and µ, since any change in these parameters increase
the probability of an arrival or a departure (or production), which decreases the probability of a
fictitious event.
The environment, TEN V (j) , operator represents the transition from the exogenous environment
e to the exogenous environment j in the Markov Modulated models. Then:
TEN V (j) v(x, e) = v(x, j).
TEN V (j) is not affected by a change in any of the parameters since it represents an exogeneous
process.
4
Properties of the Operators
In this section, we first prove that the operators introduced in Section 3 preserve some structural
properties such as monotonicity and concavity under certain assumptions. This will enable us to
10
conclude that the models constructed by using these operators have the same properties. Some of
these operators, especially the ones applying to queueing systems, are similar to the operators in
Koole [11] and [12], and Koole has already shown some structural properties of these operators.
However, Section 4.1 discusses and proves these properties in some detail for a number of reasons:
The operators defined here and in [11] and [12] do not coincide, as we present some extensions on
queueing operators, either to incorporate certain generalities, or to facilitate representing certain
events, whereas operators related to inventory systems have not been considered. Moreover, we
consider some additional structural properties in terms of bounding certain quantities. Finally,
it is essential to understand the main structural properties of the operators, which guarantee the
existence of a monotone optimal policy, in order to consider the effects of system parameters on
the monotonicity of optimal policies.
4.1
Structural Properties Preserved by the Operators
The first property on which we focus is the monotonicity. We define the monotonicity such that
the value function is non-increasing in x for all states x, i.e., v(x) ≥ v(x + 1). This property implies
a positive burden of an additional customer or an additional inventory in the system. We refer to
this burden as the opportunity cost of an additional customer or an additional inventory in state
x, and denote it by ∆v(x) with ∆v(x) = v(x) − v(x + 1). For most plausible queueing systems, this
burden is indeed positive. However, in inventory models, opportunity costs are not always positive
or always negative, so that the monotonicity of the value functions does not hold in general.
In many queueing and inventory problems, the opportunity costs affect optimal decisions, and
monotonicity of the opportunity costs (e.g., concavity of the value function in a maximization
problem), implies the monotonicity of optimal policies. For instance, threshold policies are optimal
in admission control problems due to the monotonicity of opportunity costs. Therefore, we also
investigate the monotonicity of opportunity costs.
In addition to the above properties, we consider upper and lower bounds on the opportunity
costs, ∆v(x), in queueing and inventory problems, respectively. The existence of upper bounds in
queueing systems implies that the opportunity cost of a new customer, ∆v(x), may be lower than
the reward of one or more demand classes for all states x, and thus these classes are admitted to the
system whenever it is possible. Similarly, the existence of lower bounds in inventory systems implies
that opportunity cost of an additional inventory, ∆v(x), may be higher than the reward of one or
more demand classes for all states x, and thus the demands from these classes are satisfied whenever
it is possible. Such classes are defined to be preferred classes in [19] and [22], a terminology we
adopt in this paper. Hence, these bounds imply the existence of preferred class(es). Because of
these relationships, we investigate the upper and lower bounds on the opportunity costs as well
as the monotonicity and concavity of the value function. To denote the bounds, we define new
11
properties Lower-Bounded Differences (LBD) and Upper-Bounded Differences (U BD) such that f
is an LBD function if there exists L ∈ R such that f (x) − f (x + 1) ≥ L for all x, and it is a U BD
function if there exists 0 < U ∈ R with f (x) − f (x + 1) ≤ U for all x.
The results on the monotonicity, concavity and bounds of the operators are presented in the fol-
lowing lemma. In Lemma 1, we use a similar notation to the one in [11]: For a certain event operator,
T , P rop1 , . . . , P ropk → P rop denotes that if the function f has the properties P rop1 , . . . , P ropk
then T f has the property P rop. The explicit definitions of the properties we consider are as follows:
N on − Inc(x) : f (x) ≥ f (x + 1),
Conc(x) : ∆f (x) ≤ ∆f (x + 1),
LBD(L) : f (x) − f (x + 1) ≥ L
U BD(U ) : f (x) − f (x + 1) ≤ U,
for some L ∈ R and U ≥ 0. We do not have any restriction on L because in the inventory models
the opportunity cost of an additional inventory may be less than 0.
Lemma 1
Conc(x).
• TU N IF : Non − Inc(x) → N on − Inc(x) ; LBD(L) → LBD(L); Conc(x) →
• TCOST : N on − Inc(x) → N on − Inc(x) ; LBD(L) → LBD(L); Conc(x) → Conc(x).
• TARR : N on − Inc(x) → N on − Inc(x) ; U BD(U ) → U BD(U ); Conc(x) → Conc(x), given
that a(x) is non-increasing and convex in x.
• TDEP : N on − Inc(x) → Non − Inc(x) ; U BD(U ) → U BD(U ); Conc(x) → Conc(x).
• TCD : N on − Inc(x) → Non − Inc(x) ; U BD(U ) → U BD(U ); Conc(x) → Conc(x).
• TC
P RD
: LBD(L) → LBD(L); Conc(x) → Conc(x).
• TQ P RC : N on − Inc(x) → N on − Inc(x) ; U BD(U ), Conc(x) → U BD(U ); Conc(x) →
Conc(x).
• TI−P RC : LBD(L), Conc(x) → LBD(L); Conc(x) → Conc(x).
• TB
ADM i
• TB
RT i
Conc(x).
: N on − Inc(x) → N on − Inc(x) ; U BD(U ), Conc(x) → U BD(U ); Conc(x) →
: LBD(L), Conc(x) → LBD(L); Conc(x) → Conc(x).
• TEN V (j) : N on − Inc(x) → N on − Inc(x); U BD(U ) → U BD(U ); LBD(L) → LBD(L);
Conc(x) → Conc(x).
• TF IC : N on − Inc(x) → N on − Inc(x) ; U BD(U ) → U BD(U ); LBD(L) → LBD(L); Conc(x)
→ Conc(x).
We note that whenever a finite system receives regular arrivals, so that a(x) = a for all x < K
12
and a(x) = 0 for all x ≥ K, TARR does not preserve concavity since a(x) is not convex in x. Hence,
regular arrivals preserve concavity only in infinite systems.
The proof of the lemma is trivial for TF IC since it is the same as f (x), and the proofs for
TU N IF and TEN V (j) are obvious since TU N IF is a convex combination of all functions entering
the operator, and since we assume that f (x, e) has the same structural properties in all demand
environments, respectively. The proofs of the monotonicity, LBD property and concavity for
TCOST are obvious when the holding cost h(x) is assumed to be non-decreasing and convex. For
the remaining operators, the proofs can be found in the Appendix.
In the following remark, we discuss why TCOST does not preserve the U BD property and its
implication on optimal policies.
Remark 1 Unlike monotonicity, LBD and concavity, TCOST does not preserve the U BD property
of the function on which it is applied. To illustrate, we let f (x) be a U BD function, and then
try to prove that TCOST f (x) is also a U BD function. To achieve this aim, we have to prove that
∆f (x) − ∆h(x) ≤ R is true. We know that ∆f (x) ≤ R by our assumption, so that we need to
show ∆h(x) ≥ 0, i.e., the holding cost is non-increasing in x. However, in any realistic queueing
system, this is not possible for obvious reasons. Therefore, the cost operator does not preserve the
U BD property of f (x), which implies that there does not exist a preferred class in queueing systems
incurring a holding cost.
4.2
Effects of Parameters on Structural Properties
In this subsection, we focus on the behavior of the operators when the parameters change. Now
that the system parameters are allowed to change, we will be considering two systems with two
different parameters simultaneously. Hence, we extend our notation, so that vα (x) denotes the
value function when the parameter under consideration has the value of α. However, if it is clear
that we consider only one system, we will drop the subscript α.
In order to observe the effects of parameter changes on the structure of optimal policies, we
increase α, the parameter whose effects we would like to observe, by ε > 0, and compare the
opportunity costs in the systems with parameters α and α + ε. For this purpose, we define the
supermodularity and submodularity of the value function with respect to the state (x) and the
parameter under consideration (α) as follows:
∆vα (x) ≥ ∆vα+ε (x), and
(2)
∆vα (x) ≤ ∆vα+ε (x),
(3)
respectively. Intuitively, the value function, vα (x), is supermodular with respect to α and x if the
opportunity cost, ∆vα (x), is non-increasing in α, and it is submodular if ∆vα (x) is non-decreasing
in α.
13
To show the supermodularity (submodularity) of the value function, vα (x), with respect to x
and α, we need to prove that all operators which constitute the model preserve the supermodularity
(submodularity) with respect to x and α. However, this is not always sufficient to guarantee the
desired result.
Let us illustrate this with the example we introduced earlier. Assume that we are interested in
the effect of an ε increase in the service rate µ. Unlike the arrival rates λi , such an increase does
not introduce any trade-off between customers, so the intuition points out only one direction: Since
the service will be faster for all customers, the expected burden of an extra customer in the system
should decrease. Hence, we expect the value function, vµ (x), to be supermodular with respect to
x and µ. Then, we need to show:
⎡
N
λ ∆TADMi vµ (x)
⎢ i=1 i
⎢ +M µ ∆T
DEP vµ (x)
⎢
∆vµ (x) = ⎢
⎢
+θ ∆TF IC vµ (x)
⎣
with:
⎤
⎡
⎥ ⎢
⎥ ⎢
⎥ ⎢
⎥≥⎢
⎥ ⎢
⎦ ⎣
N
i=1 λi ∆TADMi vµ+ε (x)
+M µ ∆TDEP vµ+ε (x)
+θ ∆TF IC vµ+ε (x)
+M ε ∆[TDEP vµ+ε (x) − vµ+ε (x)]
⎤
⎥
⎥
⎥
⎥ = ∆vµ+ε (x), (4)
⎥
⎦
N
vµ+ε (x) = TCOST
i=1
λi TADMi vµ+ε (x) + M (µ + ε)TDEP vµ+ε (x) + (θ − M ε)TF IC vµ+ε (x) ,
N
vµ (x) = TCOST
λi TADMi vµ (x) + M µTDEP vµ (x) + θTF IC vµ (x) ).
i=1
When supermodularity is preserved by all operators of this model, the first three lines in (4) satisfy
the desired inequality. However, the last line involves with the quantity TDEP vµ+ε (x) − vµ+ε (x):
In the altered system, the service rate is increased to µ + ε for each server, which decreases the rate
of the fictitious event to θ − M ε to ensure that uniformization still holds with the same time scale.
Hence, we need to compare the consequences of having a service completion instead of having
no event in states x and x + 1. Notice that the remaining term is observed only in one of the
systems, so it depends only on the state of the system. In this example, this extra term should be
non-decreasing in x to preserve the supermodularity of the value functions:
∆[TDEP vµ+ε (x) − vµ+ε (x)] = TDEP vµ+ε (x) − vµ+ε (x) − (TDEP vµ+ε (x + 1) − vµ+ε (x + 1)) ≤ 0.
This monotonicity implies that service completions are more valuable when the number of customers
is higher.
Now we consider the general framework: Let T be the operator related with the parameter α.
Then, an increase in parameter α generates an extra term T vα+ε (x) − vα+ε (x) in the optimality
equation, whenever this increase affects the probability of observing the event represented by T . We
call this term “the marginal benefit of operator T ”, since it compares two systems, one observing
14
the event generated by the operator T , and the other remaining in its current state. Here, we
note that the marginal benefit of an operator T is generated only by a change in µ and λ, since
neither K nor m changes the probability of events, as discussed in Section 3. When the marginal
benefit of an operator has to be acknowledged separately, its monotonicity will be a determining
factor in showing the supermodularity or submodularity of the value functions with respect to the
corresponding parameter.
The next subsection is devoted to the supermodularity and submodularity properties preserved
by the operators. Section 4.2.2 concentrates on the monotonicity of the marginal benefit of operators. Finally, in section 4.2.3, we present a short discussion on the consequences of all our results
in the queueing and inventory systems.
4.2.1
Supermodularity and Submodularity
In this section, we present a systematic analysis of supermodularity and submodularity of the value
functions with respect to the state of the system, x, and the parameter under consideration, α,
where α can be µ, λ, m or K.
We first consider the effects of the service (production) rate, µ, and the arrival rate, λ, so let
α ∈ {µ, λ}. Then, a change in α does not change the state space of the system, meaning that
both systems under consideration will live in the same state space. Moreover, α does not change
the definitions of operators, unless the functions a(x) and b(x) in operators TARR and TDEP ,
respectively, are allowed to vary by α. We show that when the definitions of the operators are
not changed by α, all operators preserve supermodularity and submodularity of the value function,
vα (x).
If a(x) and b(x) depend on α, then we need additional conditions on how α affects these
functions. For example, assume that we are interested in the supermodularity of the value functions
with respect to α, i.e., we expect the opportunity costs to decrease in α. We assume, furthermore,
that a(x), the joining probability, in the operator TARR is a function of α. Now, we need to check
whether TARR preserves supermodularity. If a(x) increases with α in all states x, then the overall
load of the system may increase, which may also increase the opportunity costs, to the contrary
of the initial expectation of supermodularity. However, under certain restrictions on how a(x)
changes with α, we can ensure that TARR preserves supermodularity. Lemmas 2 and 3 specify
conditions on how a(x) and b(x) can depend on α for TARR and TDEP to preserve supermodularity
and submodularity, respectively.
Now we consider the parameter K. An increase in K changes the state space, which has a greater
impact on the directly related operators. In particular, K affects the arrival-related operators in
queueing and TC
P RD
in inventory systems in a special way, whereas its impacts on other operators
are similar to those discussed above. At this point, we group the operators affected by K strongly,
15
as controlling operators, i.e., TQ P RC , TB
ADM i
and TC
P RD ,
and as un-controlling operator, i.e.,
TARR . A change in K affects these two groups of operators in opposite ways.
We first consider the un-controlling operator, TARR , defined for queueing systems: When K
is increased to K + 1, a(K) increases from 0 to some strictly positive value, so that an arrival
which would never join the queue in state K before the change, will join the queue with a positive
probability. Therefore, the un-controlling operator, TARR , leads to an increase in the overall load
of the system when K is increased. Moreover, we cannot derive sufficient conditions to bound
this increase as in the case when λ or µ affects a(x) or b(x), since the state space of the system
is changed. Hence, we cannot guarantee that this operator will preserve supermodularity, which
needs the opportunity costs to decrease with an increase in K. As a result, we show that TARR
preserves only submodularity with respect to K.
For the controlling operators, on the other hand, increasing K brings an extra opportunity to
the decision maker: In a queueing system with waiting room capacity K, new arrivals are not
allowed to enter the system when there are K customers in the system whether it is optimal or not.
If it is optimal to reject the arrivals, the optimal policy does not change when the capacity increases.
On the other hand, if the arrivals are rejected only because of the capacity, a new customer may
be accepted when there are K customers after an increase in the capacity. Obviously, an increase
in the capacity to K + 1 cannot lead to rejecting the customers who were accepted in the system
with capacity K. In other words, the opportunity cost of a new customer cannot increase by an
increase in the capacity. A similar argument can be given for the controlled production operator
TC
P RD .
Thus, we only show that the controlling operators preserve supermodularity with respect
to K. Section 3 of the Appendix presents explicit examples, where submodularity is not preserved
with respect to K by TB
ADM i
and supermodularity is not preserved by TARR for regular arrivals.
All other operators preserve both supermodularity and submodularity of the value function with
respect to K and x.
Finally, we consider the effects of the number of servers, m. When the system has m identical
parallel servers, the function b(x) depends on m in a specific way given by formula (1). Hence, it
is non-decreasing in m and supermodular with respect to m and x. Then, b(x) does not satisfy the
condition given for the operator TDEP in Lemma 3 when α = m, so that TDEP does not preserve
submodularity with respect to m and x. This is very intuitive, since increasing the number of
servers, in other words, service capacity, cannot increase the opportunity costs. At this point, a
natural question comes up regarding the service rate µ, since Lemma 3 will ensure that TDEP
preserves submodularity with respect to µ and x. In fact, increasing m changes the definition of
the operator TDEP . Therefore, the effect of a change in m involves the marginal benefit generated
by TDEP implicitly. As we will see in the next subsection, the marginal benefit of TDEP is nondecreasing in x, so that whenever µ is increased the opportunity costs will always decrease. Hence,
the effects of increasing µ and m on the opportunity costs are the same, and they agree with our
16
intuition. For all other operators, the effects of m are similar to those of λ and µ when m < K,
while they are similar to those of K for m = K.
We summarize our results about the effects of the parameters on the operators in the following lemmas. We use the same notation in Lemma 1. In addition, we define SuperM (α, x) and
SubM (α, x) to denote the supermodularity and submodularity with respect to α and x. A detailed
proof of Lemma 2 can be found in the Appendix, while the proof of Lemma 3 is omitted since it is
very similar to that of Lemma 2.
The properties of the operators related to preserving supermodularity is summarized by the
following lemma:
• TU N IF : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, m, K}.
Lemma 2
• TCOST : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, m, K}.
• TARR : In systems with infinite capacity: N on−Inc(x), Conc(x), SuperM (α, x) → SuperM (α, x),
given that a(x) is non-increasing and convex in x, non-increasing in α, and submodular with
respect to α and x, for all α ∈ {µ, λ, m}.
TARR : In systems with finite capacity: N on−Inc(x), Conc(x), SuperM (α, x) → SuperM (α, x),
given that a(x) is non-increasing and convex in x and it does not depend on α, for all α ∈
{µ, λ, m} if m < K, and for all α ∈ {µ, λ} if m = K.
• TDEP : N on − Inc(x), Conc(x), SuperM (α, x) → SuperM (α, x), given that b(x) is nondecreasing in α and supermodular with respect to α and x, for all α ∈ {µ, λ, m, K}.
• TCD : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, K}.
• TC
P RD
: SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, K}.
• TQ P RC : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, m, K}.
• TI
P RC
: SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, K}.
• TB
ADM i
• TB
RT i
: Conc(x), SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, m, K}.
: Conc(x), SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, K}.
• TEN V (j) : SuperM (α, x) for all demand environments → SuperM (α, x) for all demand environments, for all α ∈ {µ, λ, m, K}.
• TF IC : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, m, K}.
We note that systems receiving regular arrivals, which join the queue with a constant probability,
do not preserve supermodularity (or submodularity) in finite systems, since concavity cannot be
guaranteed in these systems due to the fact that a(x) is not convex in x. The following lemma
presents the properties of the operators related to preserving submodularity:
Lemma 3
• TU N IF : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, m, K}.
17
• TCOST : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, m, K}.
• TARR : N on − Inc(x), Conc(x), SubM (α, x) → SubM (α, x), given that a(x) is non-increasing
and convex in x, non-increasing in α, and submodular with respect to α and x, for all α ∈
{µ, λ, m, K}.
• TDEP : N on − Inc(x), Conc(x), SubM (α, x) → SubM (α, x), given that b(x) is non-increasing
in α and submodular with respect to α and x, for all α ∈ {µ, λ, K}.
• TCD : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, K}.
• TC
: SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ}.
P RD
• TQ P RC : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, m} if m < K, and for all α ∈ {µ, λ} if
m = K.
• TI
P RC
: SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, K}.
• TB
ADM i
• TB
RT i
: Conc(x), SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, m} if m < K, and for all
α ∈ {µ, λ} if m = K.
: Conc(x), SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, K}.
• TEN V (j) : SubM (α, x) for all demand environments → SubM (α, x) for all demand environments, for all α ∈ {µ, λ, m, K}.
• TF IC : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, m, K}.
As we can observe from the above lemmas, most operators preserve both supermodularity
and submodularity. In particular, if the waiting room capacity is not limited to a finite K, all
operators preserve both properties with respect to µ and λ. This is one of the reasons that the
monotonicity of the marginal benefit of operators is the key factor to guarantee the supermodularity
and submodularity of the value functions.
4.2.2
The Marginal Benefit of Operator T : T v(x) − v(x)
In this section, we will not indicate the dependence of the value functions on α, since the marginal
benefit of an operator is defined for systems having the same set of parameters. In fact, the
marginal benefit of an operator, T , compares two systems both in state x, where one observes the
event caused by an increase in the parameter, α, which affects the operator, T , directly, while the
other one remains in its current state. Now consider the case when the marginal benefit is nondecreasing in x, so T v(x) − v(x) ≤ T v(x + 1) − v(x + 1), which can be written as T ∆v(x) ≤ ∆v(x).
Then observing the event associated with the operator T decreases the opportunity costs, so that
increasing the parameter α will also decrease the opportunity costs. Hence, supermodularity of the
value functions in parameter α and x will always need the marginal benefit of operator T associated
with parameter α to be non-decreasing in x. Similarly, whenever we need submodularity of the
18
value functions, we expect to have the marginal benefit of T to be non-increasing in x.
The supermodularity and submodularity of the value functions depend on the characteristics
of the event and the context of the problem. In the context of queueing control, an increase in
the service capacity will decrease the opportunity costs, which calls for the supermodularity of the
value functions. In terms of the marginal benefit, this intuition requires the departure of an existing
customer to be more valuable when there is one more customer in the system. Similarly, an increase
in the arrival rate of a queueing system should increase the opportunity costs, so that the arrival
of a new customer is expected to be more valuable when there is one less customer. Therefore,
T v(x) − v(x) is non-decreasing in x for departure-related operators and non-increasing in x for
arrival-related operators. However, this intuition may not apply to the un-controlled operator
TARR in all circumstances: If the waiting room capacity is K, then the arrivals to a system in
state K do not bring any burden, and so a high level of x may, in fact, decrease the load of the
system. Moreover, since the joining probability function, a(x), is assumed to be non-increasing in
x, the number of customers in the system, the system faces two opposing effects: On one hand,
as the load of the system becomes higher, the arrival of a new customer has a higher burden on
the system. On the other hand, a new customer also relieves this burden because s/he induces a
decrease in the joining probability. Hence, we consider the marginal benefit of the arrival operator
TARR only in systems receiving regular arrivals to an unlimited waiting room, so that a(x) = a for
all x ≥ 0 for some 0 < a ≤ 1.
In the inventory control context, if the level of on-hand inventory is high, then the arrival of
a new customer will be more valuable, while producing a new item will bring little benefit. More
precisely, an increase in the production capacity will increase the opportunity costs. Hence, the
value functions are expected to be submodular in µ and x, while the marginal benefit of production
operators should be non-increasing in x, so that producing a new item is more valuable if the
inventory is lower. Similarly, an increase in the demand arrival rate will decrease the opportunity
costs, and the marginal benefit of demand arrival operators should be non-decreasing in x. To
summarize, in inventory systems T v(x) − v(x) is non-decreasing in x for arrival-related operators
and non-increasing in x for the production operator.
We state the results on the marginal benefit in Lemma 4. The notation is similar to Lemma 1,
where we say that f has N on − Dec − M B(x) property if the marginal benefit generated by the
operator T is non-decreasing in x, i.e., T f (x) − f (x) ≤ T f (x + 1) − f (x + 1). Similarly, if a function
f has the property N on − Inc − M B − T (x), then T f (x) − f (x) ≥ T f (x + 1) − f (x + 1). The proof
of Lemma 4 is presented in the Appendix.
Lemma 4
• TARR : Conc(x) → N on − Inc − M B(x), in systems with infinite buffer, given that
a(x) = a for all x ≥ 0 for some 0 < a ≤ 1.
• TDEP : N on − Inc(x), Conc(x) → N on − Dec − M B(x).
19
• TCD : Conc(x) → N on − Dec − M B(x).
• TC
P RD
: Conc(x) → N on − Inc − M B(x).
• TQ P RC : Conc(x) → N on − Inc − M B(x).
• TI−P RC : Conc(x) → N on − Dec − M B(x).
• TB
ADM i
• TB
RT i
4.2.3
: Conc(x) → N on − Inc − M B(x).
: Conc(x) → N on − Dec − M B(x).
Discussion of the results
The supermodularity and submodularity of the value functions with respect to µ and λ requires
the conclusions of both sections 4.2.1 and 4.2.2, since the monotonicity of the marginal benefits
is necessary in addition to supermodularity or submodularity of the value functions with respect
to the parameter under consideration. In fact, the determining factor for the value functions to
preserve supermodularity or submodularity will always be Lemma 4. For the parameters m and
K, on the other hand, the marginal benefits of the affected operators do not come into the picture,
and so the supermodularity or submodularity of the value functions with respect to m and K will
follow from section 4.2.1 immediately.
Now consider systems with infinite capacity. All queueing models that can be represented as
a combination of the operators described here satisfy supermodularity with respect to µ and m.
However, the queueing models satisfying submodularity property with respect to λ can be built as
a combination of all operators, except for the operator TARR . By Lemma 4, TARR preserves the
property of non-increasing marginal benefits only if a(x) is constant for all x. Hence, we cannot
guarantee submodularity with respect to λ in the presence of the operator TARR when a(x) is not
constant.
In inventory models with infinite capacity, the value functions satisfy supermodularity with
respect to λ and submodularity with respect to µ for all possible combinations of the operators
considered here.
When the waiting room is finite, all queueing models represented by any combination of the
operators still satisfy supermodularity with respect to µ and m for m < K. For these systems,
we cannot guarantee that operator TARR preserves the submodularity of the value functions with
respect to λ even when a(x) is constant (see Lemma 4). However, any combination of all other
operators will satisfy submodularity with respect to λ. The effect of a change in the size of the
waiting room K is more complex: Any queueing system which represents the arrivals by only uncontrolling operator TARR , satisfies submodularity with respect to K, whereas any system with
only controlling arrival operators TQ P RC and TB
ADMi
will have supermodularity with respect to
K. We, finally, consider the case with m = K: Then all combinations of operators which exclude
TARR will model queueing systems that satisfy supermodularity with respect to m (or K), whereas
20
the value functions of the systems with TARR are not submodular in general due to the preserved
supermodularity property with respect to m.
In inventory systems with finite capacity, the value functions will still satisfy supermodularity
with respect to λ and submodularity with respect to µ for all possible combinations of the operators
considered here. Now consider the effect of changing the finite capacity K: Since production can
always be controlled in inventory systems, and so TC
P RD
is the only production operator considered
here, all combinations of inventory operators representing a meaningful system will always include
the operator TC
P RD .
Then, the value functions of all possible inventory models are supermodular
with respect to K, due to the supermodularity preserved by TC
P RD .
All the results of this section agree with the basic intuition regarding to these properties. In
fact, the results are restricted to some parameters and/or by certain conditions whenever the
corresponding intuition is not clear. For example, when the waiting room capacity is finite in
queueing systems, the intuition regarding to the parameter changes blurs due to the boundary
effect of K, which is reflected in our results.
5
Illustration of Results and Examples
In this section, we illustrate how our results can establish the effects of parameter changes on the
structure of optimal policies in a number of models. We will first consider two admission control
models, where the first model is the one introduced in section 2, and the other one considers a loss
system receiving batch arrivals. Section 5.2 will sweep through the literature to point out several
interesting models, which fall into our framework. Finally, we show a full application in a stock
rationing problem of an inventory system with batch arrivals, for which we not only establish the
structure of the optimal policy but also establish the effects of the parameters on the optimal policy.
5.1
Illustration of Results on Admission Control Problems
Let us first illustrate the approach in detail using the admission control model of Section 2. Similar
admission control problems for queueing systems have been studied by for instance Stidham [23]
and Blanc et. al. [3].
Recall that the model outlined in Section 2 uses the main operators TADMi and TDEP , in
addition to TF IC , TU N IF and TCOST . As mentioned before, TADMi is, of course, a special case
of TB
ADMi
with a batch size of 1. The main structural results announced in Section 2 can now
be easily verified. Using Lemma 1, it is seen that all operators propagate monotonicity (in the
non-increasing sense) and concavity. Therefore, the value function is non-increasing and concave in
the queue length and threshold-type policies are optimal, as stated in Proposition 1. In addition,
class 1 would be a preferred class if ∆v(x) ≤ R1 , that is if the U BD property holds with parameter
R1 . From Lemma 1, it can be seen that along with concavity, U BD(R1 ) propagates through all
21
operators but TCOST . This implies that unless h(x) = 0 for all x, there cannot be a preferred class.
On the other hand if h(x) = 0 for all x, class 1 is preferred and will always be accepted.
Let us now turn our attention to the effects of system parameters. As explained before, changes
in the arrival or service rates in this model change the event occurence probabilities and must be
treated differently. In particular, for these parameters, the results of Lemma 4 are critical. Let
us start by investigating the effect of an increase in the service rate µ. This has a direct effect on
TDEP and indirect effects on all other operators. By Lemma 4, TDEP preserves the non-decreasing
marginal benefits property since the value function is non-increasing and concave in x. This implies
that if supermodularity in µ and x also holds, optimal thresholds will be non-decreasing in µ. To
check supermodularity, we go back to Lemma 2 which reveals that supermodularity propagates for
all operators when the value function is non-increasing and concave. This shows that all optimal
thresholds are non-decreasing in the service rate, µ.
Now comes the more interesting case, where we increase the arrival rates, λi . In fact, we will
check the effect of an increase in λi similarly to above. By Lemma 4, TADMi propagates the property
of non-increasing marginal benefits since the value function is concave in x. Lemma 3 then shows
that all operators support submodularity in λi and x when the value function is non-increasing
and concave in x. This establishes that all optimal thresholds are non-increasing in arrival rates
λi , which agrees with the second line of thought proposed in Section 2. In terms of the numerical
example introduced, Table 1 gives the values of optimal thresholds, when each λi is increased by
0.1
Finally, let us investigate the effect of increasing the number of servers m. This is relatively
easier to check since Lemma 4 is not needed. b(x) is non-decreasing in m and supermodular in
m and x. This implies by Lemma 2 that TDEP preserves supermodularity in m and x since the
value function is non-increasing and concave in x. In addition, the lemma also shows that all other
operators preserve supermodularity. This ensures that all optimal thresholds are non-decreasing in
the number of servers, m.
In order to emphasize the strength of the framework in its generality, let us now extend the
basic model of Section 2. Most of the studies on admission control focus on loss systems with no
waiting room. Lippman and Ross [15], Altman et. al. [2], Örmeci et. al. [19], Savin et. al. [22],
Örmeci and van der Wal [20] and Gans and Savin [6] consider the admission control problem in
a loss system, which receives single arrivals. As a relatively complicated example, let us take the
model in Örmeci and Burnetas [18] where a loss system with batch arrivals is considered.
In particular, Örmeci and Burnetas [18] consider a partial acceptance problem, where some of
the customers in a batch can be admitted while the remaining ones are rejected, in a loss system
which consists of m identical parallel servers with no waiting room and N classes of customers. In
the original model, it is allowed to have a batch containing different classes of customers. However,
here we assume that each arriving batch consists of only one class of customers, so that arrivals
22
(λ1 , λ2 , λ3 )
(0.5, 0.7, 0.9)
(0.6, 0.7, 0.9)
(0.5, 0.8, 0.9)
(0.5, 0.7, 1.0)
Thresholds (l1 , l2 , l3 )
(72, 32, 6)
(68, 30, 5)
(70, 30, 6)
(72, 32, 5)
Table 1: The values of optimal thresholds in systems with different arrival rates
from class i occur according to a Poisson process with rate λi , and a class-i arrival consists of B
customers with probability piB . At each arrival epoch, the decision maker has to decide the number
of customers to be admitted from an arriving batch. Each admitted class-i customer brings a reward
of Ri ≥ 0 and requires an exponential service time with rate µ, where the classes are ordered with
respect to their rewards, i.e., R1 ≥ · · · ≥ RN .
We model this system as an event-based dynamic program by using the batch admission opera-
tor, TB
ADM i
(modified for the finite buffer size with K = m), the departure operator TDEP , where
the probability of service completion is b(x) = min{x, m}/M with M > m. The value function for
infinite-horizon discounted reward problem for this model is then given by:
N
v(x) = M µTDEP v(x) +
λi
i=1
piB TB
ADM i v(x)
+ θTF IC v(x),
(5)
B
It is seen that most arguments used in the analysis from the simpler version of the model carry
over directly. Hence, v(x) is non-increasing and concave in x, and it has the U BD property since
no holding costs are incurred. These properties, then, imply the existence of an optimal threshold
policy as in [18]:
Proposition 2 There exists an optimal policy of threshold type, i.e., there exist numbers li ≥ 0 for
i = 1, · · · , N , such that it is optimal to admit min{B, (li − x)+ } of incoming B class-i customers
in state x. Moreover, all class-1 customers are always admitted, i.e., m = l1 ≥ l2 ≥ · · · ≥ lN .
Now Lemmas 2 and 4 guarantee that the value functions are supermodular with respect to µ
and x, as well as with respect to m and x, whereas Lemmas 3 and 4 ensure the submodularity of
the value functions with respect to λi and x for all i. These results immediately translate to the
following theorem, which summarizes the effects of the parameters on optimal thresholds.
Theorem 1 In this partial acceptance problem, the optimal thresholds li are non-decreasing in the
service rate, µ, and the number of servers, m, and non-increasing in the arrival rates λi .
Finally we note that the first model studied in Lippman [16] is a special case of this model,
where all batches consist of only single arrivals.
23
5.2
Other Examples from the Literature
Several known models from the literature can easily be analyzed for the effects of changes in the
parameters using the framework provided.
We continue to consider the basic control problems studied in Lippman [16]. The second model
of Lippman is an M/M/1 queue with a controllable service rate consisting of the operators TCD ,
TARR and TF IC . It is shown that the optimal service rate is non-decreasing in the queue length.
We can complement these results by establishing, for example, that the optimal service rate is
increasing in the arrival rate. The third model of Lippman considers a dynamic pricing problem in
an M/M/m queue. This problem employs the operators TQ P RC , TDEP and TF IC . The optimal
price to be charged to an arriving customer is shown to be increasing in the queue length. The
effects of parameters on this problem have recently been analyzed independently by AktaranKalaycı and Ayhan [1] and Çil, Karaesmen and Örmeci [4]. Our framework also easily yields that
the optimal prices are non-increasing in the service rate and in the number of servers, whereas they
are non-decreasing in the arrival rate.
Most basic control problems for make-to-stock queues can also be analyzed within this framework. Consider the standard problem of production control of a single-processor with exponential
processing times and Poisson demand arrivals with linear holding costs and lost sales (see Veatch
and Wein [26]). The optimal production policy for this problem is a threshold policy called a base
stock policy: it is optimal to produce whenever the inventory level is below the base stock level
and not to produce otherwise. The optimality equation is composed of the operators TC
P RD ,
and TDEP , in addition to TCOST and TF IC . It then follows by Lemmas 3 and 4 that the optimal
base stock level is decreasing in µ. Li [14] considers a make-to-stock queue with dynamic pricing
of inventories. In this model, the Poisson demand rate depends on the price chosen. Li shows
that the optimal production policy is of base-stock type, and optimal prices are decreasing in the
inventory level. The optimality equation of this model is composed of the operators TC
TI
P RC ,
P RD ,
and
in addition to TCOST and TF IC . Using the lemmas, we can establish that the base stock
level is non-increasing and the optimal prices are non-decreasing in the processing rate. Even an
extension of Li, when the demand arrivals occur according to a Markov modulated Poisson process
(MMPP), studied by Gayon et. al. [7], falls into our framework by adding the operator TEN V (j) .
For instance, an increased arrival rate in any one of the environment states leads to non-decreasing
optimal prices.
5.3
A Stock Rationing Problem for a Make-to-Stock Queue with Batch Arrivals
Our final example investigates a stock rationing problem with batch demand arrivals. To our
knowledge, this problem has not been analyzed in such generality before and the framework is
employed for both obtaining structural results and investigating the effects of parameters.
24
The stock rationing problem appears when different customer classes who bring different rewards
have to be satisfied from a common stock. Since demand classes have different values, it is more
important to satisfy one class rather than the others. Therefore, it is even possible not to fulfill
certain demand types in anticipation of future demand from the more valuable classes. The basic
stock rationing problem has been studied widely in the inventory literature going back to Topkis [24]
who studies the optimal ordering and rationing policies for a single-product inventory system with
several demand classes under periodic review. Our focus here is on a recent stream of research that
uses the make-to-stock queueing framework to model limited production capacity. For instance, Ha
[8], [9], and de Véricourt et. al. [5] examine such inventory systems under different assumptions.
In particular, Ha [8] considers the stock rationing problem in a make-to-stock production system
with several demand classes and lost sales and characterizes the structure of the optimal policy.
To illustrate our framework in a rationing problem, we choose an extension of the model introduced by Ha [8], but modify it so that we consider profit maximization, as opposed to cost
minimization in the original model. Ha considers a make-to-stock production system that produces
a single product with N demand classes and lost sales. Demand arrivals of class i occur according
to a Poisson process with rate λi , and each customer requests one unit of product. Let us consider the following plausible extension of this model: customers may demand more than one unit
of product at a time. Therefore, we extend the model by considering batch arrivals. We assume
that at an arrival epoch of class i, piB is the probability that there are B customers in the batch,
where i = 1, · · · , N and B = 1, 2, · · · . We assume lost sales as in Ha, so that when a demand
occurs, it is either satisfied from on-hand inventory or completely lost. Whenever a class-i demand
is satisfied, a reward of Ri > 0 per unit is obtained, and without loss of generality we assume that
R1 ≥ · · · ≥ RN . The production time is assumed to be exponential with mean 1/µ. Moreover,
inventory holding cost per unit time, h(x), is a non-decreasing and convex function of the on-hand
inventory.
At any time, the decision maker has to decide whether to produce or not, and the number of
class-i customers to be satisfied from an arriving batch, κi . We assume that the variable production
cost is c, which is set to 0 in [8]. Then, we denote the arrivals and rationing policy by TB
production control by TC
P RD ,
RT i ,
the
where π ∈ {0, 1} and cπ = πc, and the fictitious events by TF IC .
The objective of the problem is to maximize the expected total β-discounted reward over an
infinite horizon. To achieve this aim, we build the corresponding discrete-time MDP of the model
by using uniformization and normalization since the potential transition rate is finite. Then, we
assume that the time between two consecutive transitions is exponentially distributed with rate
µ+
N
i=1 λi
+ β + θ and using the appropriate time scale, assume that µ +
N
i=1 λi
+ β + θ = 1.
The optimality equation can be expressed as:
N
v(x) = TCOST (µTC
P RD v(x) +
λi
i=1
25
piB TB
B
RT i v(x)
+ θTF IC v).
(6)
5.3.1
Structure of the Optimal Policy
In the original model [8], Ha shows that the optimal production control and rationing policies are
of threshold type, and class 1 is preferred, i.e., it is always optimal to satisfy class-1 demands
whenever it is possible. We investigate the existence of similar structures in the extended model.
We first concentrate on the concavity of the value function to ensure that there is an optimal
policy which is of threshold type. To establish concavity we need to show the following inequality:
µ∆TC
+
N
i=1 λi
B
P RD v(x)
piB ∆TB
+θ∆TF IC v(x)
µ∆TC
RT i v(x)
≤
+
−∆h(x)
N
i=1 λi
B
P RD v(x +
piB ∆TB
1)
RT i v(x
+θ∆TF IC v(x + 1)
+ 1)
(7)
−∆h(x + 1).
The first three lines are directly verified by Lemma 1 and the last line is true by the convexity of
h(x). Hence, v(x) is concave in x.
Concavity of the value function v(x) means that the value of an additional item in inventory is
increasing with the current inventory level, x. Hence, as x increases, the system will be more willing
to sell the items, while being less willing to produce them. To state the effects of the concavity of
v(x) on optimal production decisions, we define:
S ∗ = min{x : v(x) − v(x + 1) > −c},
(8)
so that when x = S ∗ , it is optimal not to produce more items. Due to the concavity of v(x), for
all states x ≥ S ∗ , it is not worth producing a new unit; while for all x < S ∗ it is always optimal to
produce.
Now we consider the effect of concavity on the rationing policy. Let:
li∗ = max{x : v(x − 1) − v(x) < −Ri },
(9)
with li∗ = 0 if there is no such x. Therefore, when x = li∗ , it is optimal to reject the whole class-i
batch since v(li∗ − 1) + Ri < v(li∗ ), while for x = li∗ + 1 it is optimal to satisfy one unit from a class-i
batch because v(li∗ )+Ri ≥ v(li∗ +1), by the definition of li∗ . Then for all x ≤ li∗ , v(x−1)+Ri < v(x)
due to the concavity of v(x), so that it will be always optimal to reject the whole batch in all states
x ≤ li∗ , i.e., κ∗i (x) = 0. Similarly, for all x ≥ li∗ + 1, v(x − 1) + Ri ≥ v(x) by concavity, so that it
will be optimal to satisfy class-i demand until either the inventory level drops down to li∗ or the
whole batch is satisfied. Hence, κ∗i (x) = min{x − li∗ , B} for all x ≥ li∗ + 1. In other words, the
optimal rationing policy will reject the whole class-i batch if x ≤ li∗ , partially satisfy the demand if
li∗ < x < li∗ + B, and satisfy the entire batch if x ≥ li∗ + B. Therefore, a low threshold li∗ leads to
a higher fulfillment rate of class-i demand. In fact, if the reward obtained by admitting a class-i
customer is higher than the reward of a class-j customer, then the optimal threshold of class-i
26
will be lower than that of class-j as a result of the definition of li∗ , i.e., when Ri ≥ Rj , li∗ ≤ lj∗ .
Moreover, it is always optimal to satisfy class-1 customers whenever it is possible, i.e., class 1 is
preferred. This result is easily established by Lemma 1, since all operators of the model preserve
the LBD(R1 ) property (see the proof of Lemma 1 for details). The following theorem establishes
the structure of an optimal policy for this model.
Theorem 2 In this stock rationing model:
• A base-stock policy is an optimal production control policy. S ∗ (given by (8)) is the the base-stock
level so that it is optimal to produce only if x < S ∗ .
• The optimal rationing policy is a sequential threshold policy for each demand class, where optimal
thresholds are given by (9). Then, optimal number of class-i customers to be satisfied from an
arriving batch in state x, κ∗i (x), is given by:
⎧
⎨min{B, x − l∗ }
i
κ∗i (x) =
⎩0
if x > li∗
if x ≤ li∗ .
Moreover, li∗ ’s are monotone in i and class-1 is preferred, i.e., lN ∗ ≥ · · · ≥ l1 ∗ = 0
5.3.2
Effects of the Parameters: µ, λi
Intuitively, if the production rate, µ, increases, then the system will process faster and the opportunity cost of an additional inventory will increase because it is not necessary to keep more inventory
when the system processes faster. On the other hand, the opportunity cost of an additional inventory decreases by an increase in the arrival rate of class-i customers, λi , because the higher the
arrival rate, the faster the inventory is consumed. Inequalities (2) and (3) reflect the intuitions
about the effects of the arrival rate and the production rate on the opportunity cost, respectively,
and thus, we may anticipate supermodularity of the value function, vλi (x), with respect to λi and
x, and submodularity of vµ (x) with respect to µ and x. We only show the proof of supermodularity
of vλi (x) with respect to λi and x (submodularity is shown similarly). We need to show that:
µ∆TC
P RD vλi (x)
µ∆TC
N
λj
+
j=1
P RD vλi +ε (x)
N
pjB ∆TB
RT j vλi (x)
B
≥
+
λj
j=1
pjB ∆TB
RT j vλi +ε (x)
B
+θ∆TF IC vλi (x)
+θ∆TF IC vλi +ε (x)
−∆h(x)
−∆h(x)
piB (∆TB
+ε
RT i vλi +ε (x)
B
(10)
− ∆vλi +ε (x)) ,
where the first three lines satisfy the inequality due to Lemma 2. Hence, we consider only the last
line: By Lemma 4, TB
RT i f (x)
− f (x) is non-decreasing in x, i.e., ∆TB
RT i f (x)
− ∆f (x) ≤ 0, so
that the last line is also true. Hence, vλi (x) is supermodular with respect to λi and x.
27
As to the effects of supermodularity on optimal actions, from (8) we see that as ∆vλi (x) decreases
in λi , S ∗ increases. Similarly, li∗ is also increasing by a decrease in ∆vλi (x) by (9).
The proof of submodularity of vµ (x) with respect to µ and x, and of its effects on the optimal
actions are similar to the above, and so omitted. The following theorem summarizes our results on
the effects of the parameters on the optimal policy:
Theorem 3 In this stock rationing model, the optimal base-stock level, S ∗ , and the optimal rationing thresholds, li∗ , are non-increasing in µ and non-decreasing in λi .
6
Conclusion
We presented a general framework for investigating the effects of system parameters on the optimal policy for a class of queueing and inventory control problems. Our approach is based on an
exploration of the properties of the operators that constitute the value function of the given control
problem. The approach gives a clear guideline of how system parameters affect the structure of the
operators and the consequent effects on the optimal policy. In addition, for a rich class of problems,
the framework enables an automatic assessment how changes in different system parameters affect
the optimal policy.
There are several interesting research directions. Extending the scope of application of the
general approach is one potential direction. Additional control operators could be investigated
and possibly a more general class of problems could be addressed. Exploring multi-dimensional
problems is also a major challenge. As always these problems pose additional difficulties and an
immediate general extension seems very hard but it would be interesting to understand to what
extent the results here could be generalized to multi-dimensional problems.
Acknowledgements: We thank Yalçın Akçay, Refik Güllü and Evrim Güneş for their helpful
comments, and Ger Koole for sharing an unpublished manuscript. This work has been partially
supported by a TÜBA Grant.
References
[1] T. Aktaran-Kalaycı and H. Ayhan. Sensitivity of optimal prices to system parameters in a
service facility. Submitted to European Journal of Operations Research, 2005.
[2] E. Altman, T. Jimenez, and G. M. Koole. On optimal call admission control in a resourcesharing system. IEEE Transactions on Communications, 49:1659— 1668, 2001.
[3] J. P. C. Blanc, P. R. de Waal, P. Nain, and D. Towsley. Optimal control of admission to a
multiserver queue with two arrival streams. IEEE Transactions on Automatic Control, 37:785—
797, 1992.
28
[4] E. B. Çil, F. Karaesmen, and E. L. Örmeci. Sensitivity analysis on a dynamic pricing problem
of an m/m/c queueing system. Proceeedings of the 12th IFAC Semposium on Information
Control Problems in Manufacturing, 2006.
[5] F. de Véricourt, F. Karaesmen, and Y. Dallery. Optimal stock allocation for a capacitated
supply system. Management Science, 48:1486—1501, 2002.
[6] N. Gans and S. Savin. Pricing and capacity rationing for rentals with uncertain durations.
Management Science, 53:390—407, 2007.
[7] J.P. Gayon, I.T. Değirmenci, F. Karaesmen, and E.L. Örmeci. Dynamic pricing and replenishment in a production-inventory system with markov modulated demand. Working Paper,
Department of Industrial Engineering, Koç University, 2004.
[8] A. Y. Ha. Inventory rationing in a make-to-stock production system with several demand
classes and lost sales. Management Science, 43:1093— 1103, 1997.
[9] A. Y. Ha. Stock-rationing policy for a make-to-stock production system with two priority
classes and backordering. Naval Research Logistics, 44:458—472, 1997.
[10] Smith J.E. and K.F. McCardle. Structural properties of stochastic dynamic programs. Operations Research, 20:796—809, 2002.
[11] G. Koole. Structural results for the control of queueing systems using event-based dynamic
programming. Queueing Systems, 30:323— 339, 1998.
[12] G. Koole. Monotonicity in markov reward and decision chains: Theory and applications.
Working Paper, Vrije University, The Netherlands, 2007.
[13] C. Y. Ku and S. Jordan. Access control to two multiserver loss queues in series. IEEE
Transactions on Automatic Control, 42:1017— 1023, 1997.
[14] L. Li. A stochastic theory of the firm. Mathematics of Operations Research, 13(3):447—466,
1988.
[15] S. A. Lippman and S. M. Ross. The streetwalker’s dilemma: A job shop model. SIAM Journal
of Applied Mathematics, 20:336— 342, 1971.
[16] S.A. Lippman. Applying a new device in the optimization of exponential queuing systems.
Operations Research, 23:687— 710, 1975.
[17] A. Müller. How does the value function of a markov decision process depend on the transition
probabilities. Mathematics of Operations Research, 22:872—885, 1996.
29
[18] E. L. Örmeci and A. Burnetas. Admission control with batch arrivals. Operations Research
Letters, 32:448—454, 2004.
[19] E. L. Örmeci, A. Burnetas, and J. van der Wal. Admission policies for a two class loss system.
Stochastic Models, 17:513— 540, 2001.
[20] E. L. Örmeci and J. van der Wal. Admission policies for a two class loss system with general
interarrival times. Stochastic Models, 22:37—53, 2006.
[21] M.L. Puterman. Discrete Stochastic Dynamic Programming. John Wiley and Sons, USA, 1994.
[22] Savin S., Cohen M., Gans N., and Katalan Z. Capacity management in rental businesses with
heterogeneous customer bases. Operations Research, 53:617—631, 2005.
[23] S. Stidham. Optimal control of admission to a queuing system. IEEE Transactions on Automatic Control, 30:705— 713, 1985.
[24] D. M. Topkis. Optimal orderingand rationing policies in a nonstationary dynamic inventory
model with n demand classes. Management Science, 15:160—176, 1968.
[25] M.H. Veatch and L.M. Wein. Monotone control of queueing networks. Queueing Systems,
12:391—408, 1992.
[26] M.H. Veatch and L.M. Wein. Scheduling a make-to-stock queue: index policies and hedging
points. Operations Research, 44(4):634—647, 1996.
30