EFFECTS OF SYSTEM PARAMETERS ON THE OPTIMAL POLICY STRUCTURE IN A CLASS OF QUEUEING CONTROL PROBLEMS Eren Başar Çil∗ , E. Lerzan Örmeci∗∗ and Fikri Karaesmen∗∗ ∗ Kellog School of Management Northwestern University 2001 Sheridan Rd Evanston, IL 60208, USA [email protected] ∗∗ Department of Industrial Engineering Koç University 34450, Sarıyer, İstanbul, TURKEY [email protected], [email protected], Effects of System Parameters on the Optimal Policy Structure in a Class of Queueing Control Problems Eren Başar Çil∗ , E. Lerzan Örmeci∗∗ , Fikri Karaesmen∗∗ ∗ Kellog School of Management, Northwestern University, Evanston, USA [email protected] ∗∗ Department of Industrial Engineering, Koç University, İstanbul, TURKEY [email protected], [email protected] Abstract This paper studies a class of queueing control problems involving commonly used control mechanisms such as admission control, pricing, and service rate control. Such controls are frequently employed in modeling service systems, telecommunications, and also in production/inventory systems modeled as make-to-stock queues. It is by now well established that in a number of such important problems, there is an optimal policy with a certain structure that can be described by a few parameters. From a design point of view, it is useful to understand how such an optimal policy changes when the system parameters such as arrival or service rates, or the number of servers vary. We present a general framework to investigate the policy implications of the changes in system parameters by using event-based dynamic programming. In this framework, the value function of the control problem is first represented by a number of common operators, and the effect of system parameters on the structured optimal policy is analyzed for each individual operator. We present this analysis for a set of frequently used operators. Whenever a queueing control problem can be modeled by these operators, the effects of system parameters on the optimal policy follows trivially from this analysis. This enables us to easily retrieve and generalize some of the existing results in the literature. We also present examples in a number of challenging problems. 1 Introduction This paper focuses on a class of queueing control problems that frequently arise in modelling service systems, telecommunications systems, or production and inventory systems. The objective in such systems is, in general, managing the queue length or the inventory level in order to maximize a reward function. This can be achieved by controlling the arrival rates using admission control or dynamic pricing, or by controlling the service rate by slowing down or speeding up service. A significant literature exists on the investigation of such problems. One of the common interesting findings of this literature is that in a number of situations, the optimal control policy can be shown to have a simple structure. A frequent case is the optimality of threshold policies where the optimal 1 action only depends on a threshold value of the state variable which may be the queue length or the inventory level. Such structural results are important, because based on them simple operational policies can be designed. In this paper, we pursue this line of exploration further and address the following question: assume that the system parameters (i.e., arrival rates, service rates, number of servers, buffer size) change, how would the structured optimal policy be impacted by this change? The main purpose of this paper is to develop a general approach to analyze this issue. The optimality of structured policies is essential for our investigation. Unless the optimal policy is structured, it is difficult and not very meaningful to make statements on how it would change when the parameters vary. Fortunately, optimality of structured policies was addressed on an individual problem basis in a large number of papers over the past thirty years. In addition, there has been some work on developing general approaches for obtaining results on the structure of optimal policies that apply to a class of problems. For instance, Veatch and Wein [25] investigate a class of queueing problems with this perspective. Smith and McCardle [10] present some results for general Markov Decision Processes (MDPs). In particular, a fruitful general approach for queueing problems is proposed by Koole [11] which introduces the so-called event-based dynamic programming framework. Event-based dynamic programming expresses the value function of a given control problem in terms of the composition of different operators corresponding to individual events. Establishing the structure of an optimal policy can then be performed by verifying that the different operators constituting the problem satisfy certain properties. This decomposition of the problem is extremely useful, since the properties of commonly used operators in queueing control problems can be investigated individually for once and collected in a library as in Koole [11] and [12]. The end result is that, if a queueing control problem is composed of known operators, establishing the structure of the optimal policy is usually trivial by checking the operator library. Our approach for the investigation of the effects of system parameters on the optimal policy is in a similar spirit. We employ the event-based dynamic programming framework and focus on a number of frequently employed operators. We investigate the properties of these operators in detail with a view towards the effects of system parameters. This has two purposes: first it gives a clearer view of how and why the optimal policy is influenced by system parameters. Second, as in Koole [11], whenever a queueing control problem can be expressed as a composition of these operators, understanding the effects of a given system parameter on the optimal policy becomes an easy problem. In order to investigate the influence of system parameters on the optimal policy, we first need to establish certain properties of the value functions with respect to these system parameters. Koole [12], independently, has considered these properties, but with a focus on optimal design issues to support strategical decisions. We, on the other hand, concentrate on the effects of these properties on optimal control policies, which correspond to operational decisions. To our knowledge, there are only a few specific studies which concentrate on this issue. For instance, Ku and Jordan [13] study an admission control problem in a two-stage multi-server loss system. They characterize 2 the structure of the optimal policy, and then establish the effects of the system parameters on this policy. Gans and Savin [6] consider a joint admission control and dynamic pricing problem in a multi-server loss system and analyze the effects of the parameters on the optimal policy. The optimal admission policy is shown to have a threshold structure, and they prove the monotonicity of the optimal thresholds and the optimal prices in the system parameters. Aktaran-Kalaycı and Ayhan [1] and Çil, Karaesmen and Örmeci [4] examine the effects of the parameters on the optimal decisions in a multi-server queueing system with limited capacity. They show the monotonicity of the optimal prices as in [6]. Most of these papers and a number of other important models in the literature fit into our framework, and the influence of the parameters on these models can be determined easily by our results. In order to demonstrate the application of the general approach, we also investigate two models in some detail. Our main contribution is providing a general framework to investigate the effects of system parameters on the optimal policy in queueing control problems. To our knowledge, this problem has not been considered in this generality before. Müller [17] proposes an approach to investigate the effects of transition probabilities on the value function of a Markov Decision Process in a general setting. However, his focus is not on policy implications. A fairly rich class of problems, including several earlier studied ones, fall within the framework developed here. There are, of course, certain limitations to the generality. First a given problem must be within the scope of event-based programming and the value iteration principle must hold. Moreover, for infinite-horizon problems the existence of optimal value functions and optimal policies should be checked separately. We do not address these issues here (see [11] and [12] for more on these issues). These issues are sometimes easy to verify and follow by known general results such as the ones in Puterman [21] but in some cases an analysis of the individual problem may be necessary. Second, we investigate a certain number of operators which are frequently used and cover a significant scope, but other problem-specific operators should be investigated individually. Using the results on the operators here, we think it should not be difficult to extend the investigation to other operators. The other contributions of this paper are as follows: we investigate a subset of the dynamic programming operators from Koole [11] but in certain cases we modify or generalize these operators. In addition, we also present a number of new operators. Some of these are relevant for recent applications of interest such as dynamic pricing, others are introduced since they are used in the control of make-to-stock queues that model production/inventory systems. We present structural properties for these parameters as well as investigating the effects of system parameters. The full strength of event-based dynamic programming appears when the structure of the optimal policy and the effects of the parameters have to be determined for a new problem. To this end, we explore an inventory rationing problem for a make-to-stock queue with multiple classes of demand arriving in batches. Using the operators and the framework, we present a characterization of the optimal policy and determine the effects of the system parameters. 3 The rest of the paper is organized as follows. Section 2 introduces a simple model to demonstrate how the event-based dynamic programming approach can be employed to determine the structure of optimal policies and the effects of parameter changes in these structures. We present the operators modelling the events in stochastic control problems, as well as their relations to certain parameters in Section 3. Then, in Section 4, we work on certain properties of the operators to guarantee the existence of optimal monotone policies and to characterize their behavior with respect to parameter changes. Section 5 presents several models that can be generated by combining the operators we have defined, in order to illustrate how to apply our results regarding to the effects of system parameters. In particular, we discuss two models in detail. Finally, we conclude the study and mention our future research objectives in section 6. 2 Illustration of the framework In this section, our goal is three-fold: First, we motivate our main tool, the event-based dynamic programming technique (introduced in [11]). For this purpose, we represent a typical Markovian control problem as an event-based dynamic program, from which the structure of the optimal control policy can easily be deduced. Second, we illustrate that it is not straightforward to predict how the policy parameters would change when the system parameters are varied. The rest of the paper will, then, show the effectiveness of the event-based dynamic programming technique in investigating the influence of parameter changes on optimal control policies. Finally, with the aid of the model, we classify different kinds of effects that different parameters may impose. We consider an admission control problem in a simple queueing system with m identical parallel servers and infinite waiting room. The system receives N different types of customers, where class-i customers arrive at the system according to a Poisson process with a rate λi bringing a reward of Ri , and demand an exponential service time with rate µ. Then, the state of the system can be defined as the current number of customers in system x, where x ∈ Z+ , Z+ = {0, 1, . . . }. A holding cost of h(x) is incurred per unit time, where h(x) is a convex and non-decreasing function of x. At each arrival epoch, the decision maker either accepts the incoming customer or rejects her in order to maximize the expected profit over an infinite horizon. We assume that a rejected customer is lost forever. We can consider both the total discounted and long-run average criteria. However, for now we concentrate on the total discounted profit with a discount rate of β. We employ uniformization to form the equivalent discrete time Markov decision model of the system (see [16]). Hence, we assume that N i=1 λi + M µ + θ + β = 1 without loss of generality, where M ≥ m + 1 and θ > 0. We note that θ + µ(M − m) corresponds to the rate of fictitious events, which are introduced to ensure that the time scale will not be affected when one of the parameters, µ or λ or m, increases. We define vn (x) as the total expected β-discounted profit of such a system when there are n remaining transitions in the horizon. Then the standard arguments of Markov decision processes can be used 4 to analyze the finite horizon problems, and the infinite horizon problems as their limits, see e.g., [21]. The value functions vn (x) satisfy the following optimality equations: N vn+1 (x) = i=1 λi max {vn (x), vn (x + 1) + Ri } + µ min{m, x} vn ((x − 1)+ ) +(µ (M − min{m, x}) + θ) vn (x) − h(x), where a+ = max{0, a}. For this particular model, we can show that both the discounted infinite horizon problem and the long-run average problem have a solution, which can be found by the value iteration algorithm. Hence, we can define v(x) as the total expected β-discounted profit over an infinite horizon, and compute it by v(x) = limn→∞ vn (x). Finally, taking β → 0 in v(x) converts the problem to maximizing the long-run average profit in the usual way, see e.g., [21]. In the rest of this section we will concentrate on the total expected β-discounted profit over an infinite horizon, v(x). The traditional approach is to examine the structure of an optimal policy using the optimality equations in order to prove some structural properties (monotonicity, concavity, etc.) of the optimal policies. For example, one can prove that if v0 (x) is non-increasing and concave in x, then vn (x) will also be non-increasing and concave in x for all n > 0 by induction. However, proving the desired properties may not be easy by working on the entire value function in some cases, as Koole [11] points out. Therefore, Koole [11] suggests defining the value function as a combination of distinct event operators as an alternative to the traditional approach. Now, we demonstrate how one can use the event-based dynamic programming technique by redefining the value function of the above model as a combination of certain event operators. Let us first introduce the following event operators: TCOST v(x) = v(x) − h(x), TU N IF ({fj (x)}; {pj }) = pj fj (x), j TDEP v(x) = b(x)v((x − 1)+ ) + [1 − b(x)]v(x), with b(x) = min{x, m} , M (1) TADMi v(x) = max {v(x), v(x + 1) + Ri } , TF IC v(x) = v(x). Using the above operators, the value function for the total expected β-discounted profit over an infinite horizon, v(x) is: v(x) = TCOST TU N IF (TDEP v(x), {TADMi v(x)}i , TF IC v(x); M µ, {λi }, θ) , In order to investigate the structure of the value function, we first show that TADMi , TDEP , TCOST , TF IC preserve some properties, say P rop1 ,. . . , P ropk , of any function, f (x), on which they 5 are applied, whereas we prove that TU N IF preserves these properties whenever all functions entering the operator satisfy them. Then, the value function, vn+1 (x), has P rop1 ,. . . , P ropk , if vn (x) has these properties since it is the combination of the event operators: TU N IF , TADMi , TDEP , TCOST , TF IC . This implies, by induction, that vn (x) has these properties for all n if v0 (x) has them. As v(x) = limn→∞ vn (x), v(x) also inherits these properties, as well as the relative value functions for the long-run average criterion (see e.g., [21]). Finally, the properties of the value functions should be related to the optimal decisions, which will characterize the optimal policy structure for this model. In this specific model, it can be shown that all the operators TU N IF , TADMi , TDEP , TCOST , TF IC preserve monotonicity (f (x + 1) ≤ f (x)) and concavity (f (x − 1) − f (x) ≤ f (x) − f (x + 1)), which then guarantees the existence of an optimal threshold policy: Proposition 1 There exists an optimal policy of threshold type, i.e., there exist numbers li ≥ 0 for i = 1, · · · , N , such that: If x ≥ li , it is optimal to reject an incoming class-i customer; otherwise it is optimal to admit her. Moreover, if the rewards R1 > R2 > · · · > RN , then l1 ≥ l2 ≥ · · · ≥ lN . Now we arrive at the question which is the main subject of this paper: How will these thresholds change if one of the parameters, more explicitly one of the arrival rates λi , or the service rate µ, or the number of servers m increase, by some ε > 0? In general, the answer to this question is not obvious. Let us consider a specific version of the above model, where the system has one server and three different customer types with rewards (R1 , R2 , R3 ) = (50, 30, 10). A holding cost of 1 unit is incurred for each waiting customer per unit time, so h(x) = x. Finally, the service rate is µ = 2.5, and the arrival rates are (λ1 , λ2 , λ3 ) = (0.5, 0.7, 0.9), before uniformization. Optimal thresholds, which maximize the long-run average profit of the system, are (l1 , l2 , l3 ) = (72, 32, 6). Will these optimal thresholds increase or decrease when one of the arrival rates of different classes increases? The most puzzling case is when the arrival rate of class 2, λ2 , increases, where one can see two different lines of intuition: (1) We may expect the threshold of class 3, l3 , to decrease to have more room for class-2 customers, and the threshold of class-1, l1 , to increase in order to protect the system from the growing number of class-2 customers. In this line of of thinking, the behavior of l2 is not clear, it may decrease to open room for class 1, or increase to limit the access of class 3. (2) Another line of thought may take an increase in any one of the arrival rates as an increase in the overall demand, so that it expects all thresholds to decrease (which includes staying constant). Both arguments are intuitively convincing, but also contradictory, and so it appears that the intuition does not always provide satisfactory answers to these types of questions. Hence, our aim in this paper is to provide a general framework, which, completely and automatically, characterizes the behavior of optimal control policies when one of the parameters changes. Finally, we would like to point out the different types of effects that different parameters impose on the operators. An increase in λi affects the admission control operator, TADMi , directly, since 6 the probability of an arrival, i.e., the probability of the event which triggers the operator TADMi , is increased. Such an increase in λi decreases the probability of the fictitious event, θ. We will see, later on, that this simultaneous change in the probability of these two events will call for a comparison of their consequences. Finally, notice that the definition of the operator TADMi is still the same, only the value of its multiplier has increased. Now assume that we increase m by 1, which affects the departure operator, TDEP , directly. But now the definition of the operator TDEP is altered, while the probabilities of all events remain the same, in particular the probability of observing TDEP is still M µ. In all the operators we will define in the next section, the direct effects of parameters will be either a change in the probability of the event that the operator corresponds to (as in λi above) or a change in the definition of the operator itself (as in m above) or both. Finally, we note that, obviously, the value functions will be influenced by a change in one of the parameters, which will affect all other events and so corresponding operators. However, these effects are indirect effects, which do not need to be addressed explicitly. Now we are ready to introduce our framework, starting with the operators. 3 The Operators The important issues addressed in this paper are to introduce the appropriate event operators that are commonly used, and to prove that they preserve certain properties of a function f . This section introduces the operators that correspond to certain events occurring in Markovian queuing and inventory systems. By using these operators, various Markovian systems can be constructed. The next section will examine certain properties of these operators in order to characterize the structure of control policies. Our eventual goal is to characterize the effects of four parameters on the structure of optimal policies, namely the customer or demand arrival rate λ, the service or production rate µ, the number of servers m, and the capacity of the system K. While defining the operators, we will keep the notation as general as possible. Therefore, in certain cases the relation of the operators with these parameters will not be obvious from the definitions only. Hence, we will comment on the potential direct influence of the parameters on the given operators, when applicable. As we mentioned in the previous section, these effects can be of two types, changing the probability of the corresponding event or changing the definition of the operator. In the rest of our paper, x will denote the state of the system (the number of customers in the queue or the inventory level of an item). We consider only the objective of maximizing, but all our results are valid for minimization problems when the properties are re-defined properly. Moreover, we define v(x) as a generic value function of an infinite horizon problem with discounting. In other words, v(x) is the total maximal expected discounted profit of the system over an infinite horizon. 7 However, we note that, as mentioned earlier, the existence of the infinite-horizon value functions should be shown for each different model, even if the model is constructed as a combination of the operators defined below. We define our operators mainly for one-dimensional systems. However, our framework can accommodate Markov modulation by defining an operator to represent the changes in the environment state, namely TEN V (see below for the precise definition). Since the state of the environment is exogeneous, i.e., we do not control the environment, it can even be multidimensional. We denote the environment state by e, and change the state of the system x to (x, e). Then the operators will be modified as in the following example: If a certain operator T is defined as T v(x) = v(x + 1), we re-define it as T v(x, e) = v(x + 1, e), where e is the state of the exogenous environment. Our framework can model systems with finite or infinite buffer capacity. We will define the operators mainly for systems with infinite capacity. When stating and proving our results, we will always address the finite-capacity systems explicitly. In principle, we can classify the operators into two: the controlling operators are those which involve a maximization, e.g., TADMi in the example, while the un-controlling operators are all the others such as TDEP . Now we are ready to define the operators to be considered: The cost operator represents the systems incurring a holding cost, h(x), which is a function of the state of the system: TCOST v(x) = v(x) − h(x), where the holding cost function h(x) is non-decreasing and convex in the inventory level x. We will define the arrival operator, TARR , to represent the arrival process to a queueing system: TARR v(x) = a(x)v(x + 1) + [1 − a(x)]v(x). The function a(x) is, in fact, the probability that an arriving customer joins the system when there are x customers, which we refer to as the joining probability. We assume that a(x) is non-increasing in x. When a(x) is constant, TARR models a system where customers enter the system with a fixed probability, say a, independent of the state, or choose not to enter the system with probability 1 − a. We will call this type of arrivals as regular arrivals, since they do not depend on the state of the system. Moreover, we can model finite systems by setting a(x) = 0 for all x ≥ K. Our results will require certain properties on a(x), which will be explicitly mentioned. A change in the arrival rate λ will change the probability of observing an arrival, and so this operator is directly related with λ. Whenever the function a(x) does not depend on λ, the operators themselves do not change with λ, only their multipliers in the optimality equations will change. The function a(x), on the other hand, may depend on one of the parameters λ, µ, or m. In this case, the definition of the operator TARR changes with the parameter under consideration, so that we need to consider explicitly how a(x) varies with it. Finally, an increase in the system capacity K always changes the definition of this arrival operator. 8 The departure operator, TDEP , represents the departure of an existing customer from the system, where the service rate may depend on the state of the system. TDEP is defined as in (1), where the function b(x) corresponds to the probability of a service completion when the system has x customers. In general, b(x) can be any non-decreasing and concave function of x. Whenever we need to model m identical parallel servers, b(x) is specified as in (1). We note that the function b(x) may depend on one of the parameters λ, µ, or m in a more complex way. Then, as with the function a(x) in TARR , we need to consider explicitly how b(x) varies with the parameter under consideration. The controlled departure and the controlled production operators, TCD and TC P RD , represent the choice of the best service rate in queuing and inventory systems, respectively. π is the used portion of the service rate and cπ is the non-negative cost of using this portion, i.e., cπ ≥ 0. We assume that c0 = minπ∈[0,1] cπ . Then: ⎧ ⎨max π∈[0,1] {−cπ + πv(x − 1) + (1 − π)v(x)} if x > 0 TCD v(x) = ⎩−c + v(x) if x = 0, 0 TC TCD (TC P RD ) P RD v(x) = max {−cπ + πv(x + 1) + (1 − π)v(x)} . π∈[0,1] is affected by a change in the service rate (production rate) µ, where µ changes the probability of finishing the service of a customer (of producing an item). For a capacitated system, an increase in the parameter K will change the definition of TC to emphasize at this point that TC P RD P RD . It is important is the main production control operator employed in the literature on make-to-stock queue. The queue pricing and the inventory pricing operators, TQ P RC and TI P RC represent the opti- mal price to be charged for the arriving customers in queuing and inventory systems, respectively. F (.) is the cumulative distribution function of the reservation price of an arriving customer, R, where R is the maximum price a customer is willing to pay. Hence: TQ P RC v(x) = max F̄ (p)[v(x + 1) + p] + F (p)v(x) , p ⎧ ⎨maxp F̄ (p)[v(x − 1) + p] + F (p)v(x) TI P RC v(x) = ⎩v(x) if x > 0 if x = 0, where F̄ (p) = 1 − F (p). Both of these pricing operators are affected by an increase in the arrival rate λ, through an increase in the probability of observing the event of pricing due to an arrival. The operator TQ P RC is additionally affected by an increase in the capacity of the system K, which changes the definition of the operator. The batch admission and the batch rationing operators, TB ADM i and TB RT i , represent the choice of the number of the customers to be admitted from an arriving batch of class-i customers 9 in queuing and inventory systems, respectively. We assume that some of the customers in a batch can be admitted while the remaining ones are rejected, which is defined as partial acceptance in [18]. Bi is the size of an arriving batch, κi is the number of class-i customers admitted from this batch, and Ri is the reward obtained by admitting one class-i customer. Therefore: TB ADM i v(x) = RT i v(x) = TB max {κi Ri + v(x + κi )} , κi ≤Bi max κi ≤min{x,Bi } {κi Ri + v(x − κi )} . We note that TADMi defined in the above example is a special case of TB ADM i , where only single customers arrive at the system, i.e., Bi = 1 for all i. The batch operators have a similar spirit to pricing operators, with respect to the effects of parameters. Hence, they are both affected by an increase in the arrival rate λ, which increases the probability of observing the event of batch admission or rationing, while the operator TB ADM i is additionally affected by an increase in the capacity of the system K, which changes the definition of the operator. The uniformization operator puts all events together with their corresponding probabilities: l TU N IF ({fi }lj=1 ; {pi }lj=1 )(x) = l pj fj (x), with j=1 j=1 pj ≤ 1. TU N IF is a convex combination of functions fj , and it only reflects the changes in the parameters so that it is not affected by any parameter directly. whereas l j=1 pj l j=1 pj < 1 models discounting criterion, = 1 refers to long-run average criterion. The fictitious, TF IC , operator represents all the fictitious events, which affect neither the state nor the reward of the system: TF IC v(x) = v(x). This operator is affected by both λ and µ, since any change in these parameters increase the probability of an arrival or a departure (or production), which decreases the probability of a fictitious event. The environment, TEN V (j) , operator represents the transition from the exogenous environment e to the exogenous environment j in the Markov Modulated models. Then: TEN V (j) v(x, e) = v(x, j). TEN V (j) is not affected by a change in any of the parameters since it represents an exogeneous process. 4 Properties of the Operators In this section, we first prove that the operators introduced in Section 3 preserve some structural properties such as monotonicity and concavity under certain assumptions. This will enable us to 10 conclude that the models constructed by using these operators have the same properties. Some of these operators, especially the ones applying to queueing systems, are similar to the operators in Koole [11] and [12], and Koole has already shown some structural properties of these operators. However, Section 4.1 discusses and proves these properties in some detail for a number of reasons: The operators defined here and in [11] and [12] do not coincide, as we present some extensions on queueing operators, either to incorporate certain generalities, or to facilitate representing certain events, whereas operators related to inventory systems have not been considered. Moreover, we consider some additional structural properties in terms of bounding certain quantities. Finally, it is essential to understand the main structural properties of the operators, which guarantee the existence of a monotone optimal policy, in order to consider the effects of system parameters on the monotonicity of optimal policies. 4.1 Structural Properties Preserved by the Operators The first property on which we focus is the monotonicity. We define the monotonicity such that the value function is non-increasing in x for all states x, i.e., v(x) ≥ v(x + 1). This property implies a positive burden of an additional customer or an additional inventory in the system. We refer to this burden as the opportunity cost of an additional customer or an additional inventory in state x, and denote it by ∆v(x) with ∆v(x) = v(x) − v(x + 1). For most plausible queueing systems, this burden is indeed positive. However, in inventory models, opportunity costs are not always positive or always negative, so that the monotonicity of the value functions does not hold in general. In many queueing and inventory problems, the opportunity costs affect optimal decisions, and monotonicity of the opportunity costs (e.g., concavity of the value function in a maximization problem), implies the monotonicity of optimal policies. For instance, threshold policies are optimal in admission control problems due to the monotonicity of opportunity costs. Therefore, we also investigate the monotonicity of opportunity costs. In addition to the above properties, we consider upper and lower bounds on the opportunity costs, ∆v(x), in queueing and inventory problems, respectively. The existence of upper bounds in queueing systems implies that the opportunity cost of a new customer, ∆v(x), may be lower than the reward of one or more demand classes for all states x, and thus these classes are admitted to the system whenever it is possible. Similarly, the existence of lower bounds in inventory systems implies that opportunity cost of an additional inventory, ∆v(x), may be higher than the reward of one or more demand classes for all states x, and thus the demands from these classes are satisfied whenever it is possible. Such classes are defined to be preferred classes in [19] and [22], a terminology we adopt in this paper. Hence, these bounds imply the existence of preferred class(es). Because of these relationships, we investigate the upper and lower bounds on the opportunity costs as well as the monotonicity and concavity of the value function. To denote the bounds, we define new 11 properties Lower-Bounded Differences (LBD) and Upper-Bounded Differences (U BD) such that f is an LBD function if there exists L ∈ R such that f (x) − f (x + 1) ≥ L for all x, and it is a U BD function if there exists 0 < U ∈ R with f (x) − f (x + 1) ≤ U for all x. The results on the monotonicity, concavity and bounds of the operators are presented in the fol- lowing lemma. In Lemma 1, we use a similar notation to the one in [11]: For a certain event operator, T , P rop1 , . . . , P ropk → P rop denotes that if the function f has the properties P rop1 , . . . , P ropk then T f has the property P rop. The explicit definitions of the properties we consider are as follows: N on − Inc(x) : f (x) ≥ f (x + 1), Conc(x) : ∆f (x) ≤ ∆f (x + 1), LBD(L) : f (x) − f (x + 1) ≥ L U BD(U ) : f (x) − f (x + 1) ≤ U, for some L ∈ R and U ≥ 0. We do not have any restriction on L because in the inventory models the opportunity cost of an additional inventory may be less than 0. Lemma 1 Conc(x). • TU N IF : Non − Inc(x) → N on − Inc(x) ; LBD(L) → LBD(L); Conc(x) → • TCOST : N on − Inc(x) → N on − Inc(x) ; LBD(L) → LBD(L); Conc(x) → Conc(x). • TARR : N on − Inc(x) → N on − Inc(x) ; U BD(U ) → U BD(U ); Conc(x) → Conc(x), given that a(x) is non-increasing and convex in x. • TDEP : N on − Inc(x) → Non − Inc(x) ; U BD(U ) → U BD(U ); Conc(x) → Conc(x). • TCD : N on − Inc(x) → Non − Inc(x) ; U BD(U ) → U BD(U ); Conc(x) → Conc(x). • TC P RD : LBD(L) → LBD(L); Conc(x) → Conc(x). • TQ P RC : N on − Inc(x) → N on − Inc(x) ; U BD(U ), Conc(x) → U BD(U ); Conc(x) → Conc(x). • TI−P RC : LBD(L), Conc(x) → LBD(L); Conc(x) → Conc(x). • TB ADM i • TB RT i Conc(x). : N on − Inc(x) → N on − Inc(x) ; U BD(U ), Conc(x) → U BD(U ); Conc(x) → : LBD(L), Conc(x) → LBD(L); Conc(x) → Conc(x). • TEN V (j) : N on − Inc(x) → N on − Inc(x); U BD(U ) → U BD(U ); LBD(L) → LBD(L); Conc(x) → Conc(x). • TF IC : N on − Inc(x) → N on − Inc(x) ; U BD(U ) → U BD(U ); LBD(L) → LBD(L); Conc(x) → Conc(x). We note that whenever a finite system receives regular arrivals, so that a(x) = a for all x < K 12 and a(x) = 0 for all x ≥ K, TARR does not preserve concavity since a(x) is not convex in x. Hence, regular arrivals preserve concavity only in infinite systems. The proof of the lemma is trivial for TF IC since it is the same as f (x), and the proofs for TU N IF and TEN V (j) are obvious since TU N IF is a convex combination of all functions entering the operator, and since we assume that f (x, e) has the same structural properties in all demand environments, respectively. The proofs of the monotonicity, LBD property and concavity for TCOST are obvious when the holding cost h(x) is assumed to be non-decreasing and convex. For the remaining operators, the proofs can be found in the Appendix. In the following remark, we discuss why TCOST does not preserve the U BD property and its implication on optimal policies. Remark 1 Unlike monotonicity, LBD and concavity, TCOST does not preserve the U BD property of the function on which it is applied. To illustrate, we let f (x) be a U BD function, and then try to prove that TCOST f (x) is also a U BD function. To achieve this aim, we have to prove that ∆f (x) − ∆h(x) ≤ R is true. We know that ∆f (x) ≤ R by our assumption, so that we need to show ∆h(x) ≥ 0, i.e., the holding cost is non-increasing in x. However, in any realistic queueing system, this is not possible for obvious reasons. Therefore, the cost operator does not preserve the U BD property of f (x), which implies that there does not exist a preferred class in queueing systems incurring a holding cost. 4.2 Effects of Parameters on Structural Properties In this subsection, we focus on the behavior of the operators when the parameters change. Now that the system parameters are allowed to change, we will be considering two systems with two different parameters simultaneously. Hence, we extend our notation, so that vα (x) denotes the value function when the parameter under consideration has the value of α. However, if it is clear that we consider only one system, we will drop the subscript α. In order to observe the effects of parameter changes on the structure of optimal policies, we increase α, the parameter whose effects we would like to observe, by ε > 0, and compare the opportunity costs in the systems with parameters α and α + ε. For this purpose, we define the supermodularity and submodularity of the value function with respect to the state (x) and the parameter under consideration (α) as follows: ∆vα (x) ≥ ∆vα+ε (x), and (2) ∆vα (x) ≤ ∆vα+ε (x), (3) respectively. Intuitively, the value function, vα (x), is supermodular with respect to α and x if the opportunity cost, ∆vα (x), is non-increasing in α, and it is submodular if ∆vα (x) is non-decreasing in α. 13 To show the supermodularity (submodularity) of the value function, vα (x), with respect to x and α, we need to prove that all operators which constitute the model preserve the supermodularity (submodularity) with respect to x and α. However, this is not always sufficient to guarantee the desired result. Let us illustrate this with the example we introduced earlier. Assume that we are interested in the effect of an ε increase in the service rate µ. Unlike the arrival rates λi , such an increase does not introduce any trade-off between customers, so the intuition points out only one direction: Since the service will be faster for all customers, the expected burden of an extra customer in the system should decrease. Hence, we expect the value function, vµ (x), to be supermodular with respect to x and µ. Then, we need to show: ⎡ N λ ∆TADMi vµ (x) ⎢ i=1 i ⎢ +M µ ∆T DEP vµ (x) ⎢ ∆vµ (x) = ⎢ ⎢ +θ ∆TF IC vµ (x) ⎣ with: ⎤ ⎡ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥≥⎢ ⎥ ⎢ ⎦ ⎣ N i=1 λi ∆TADMi vµ+ε (x) +M µ ∆TDEP vµ+ε (x) +θ ∆TF IC vµ+ε (x) +M ε ∆[TDEP vµ+ε (x) − vµ+ε (x)] ⎤ ⎥ ⎥ ⎥ ⎥ = ∆vµ+ε (x), (4) ⎥ ⎦ N vµ+ε (x) = TCOST i=1 λi TADMi vµ+ε (x) + M (µ + ε)TDEP vµ+ε (x) + (θ − M ε)TF IC vµ+ε (x) , N vµ (x) = TCOST λi TADMi vµ (x) + M µTDEP vµ (x) + θTF IC vµ (x) ). i=1 When supermodularity is preserved by all operators of this model, the first three lines in (4) satisfy the desired inequality. However, the last line involves with the quantity TDEP vµ+ε (x) − vµ+ε (x): In the altered system, the service rate is increased to µ + ε for each server, which decreases the rate of the fictitious event to θ − M ε to ensure that uniformization still holds with the same time scale. Hence, we need to compare the consequences of having a service completion instead of having no event in states x and x + 1. Notice that the remaining term is observed only in one of the systems, so it depends only on the state of the system. In this example, this extra term should be non-decreasing in x to preserve the supermodularity of the value functions: ∆[TDEP vµ+ε (x) − vµ+ε (x)] = TDEP vµ+ε (x) − vµ+ε (x) − (TDEP vµ+ε (x + 1) − vµ+ε (x + 1)) ≤ 0. This monotonicity implies that service completions are more valuable when the number of customers is higher. Now we consider the general framework: Let T be the operator related with the parameter α. Then, an increase in parameter α generates an extra term T vα+ε (x) − vα+ε (x) in the optimality equation, whenever this increase affects the probability of observing the event represented by T . We call this term “the marginal benefit of operator T ”, since it compares two systems, one observing 14 the event generated by the operator T , and the other remaining in its current state. Here, we note that the marginal benefit of an operator T is generated only by a change in µ and λ, since neither K nor m changes the probability of events, as discussed in Section 3. When the marginal benefit of an operator has to be acknowledged separately, its monotonicity will be a determining factor in showing the supermodularity or submodularity of the value functions with respect to the corresponding parameter. The next subsection is devoted to the supermodularity and submodularity properties preserved by the operators. Section 4.2.2 concentrates on the monotonicity of the marginal benefit of operators. Finally, in section 4.2.3, we present a short discussion on the consequences of all our results in the queueing and inventory systems. 4.2.1 Supermodularity and Submodularity In this section, we present a systematic analysis of supermodularity and submodularity of the value functions with respect to the state of the system, x, and the parameter under consideration, α, where α can be µ, λ, m or K. We first consider the effects of the service (production) rate, µ, and the arrival rate, λ, so let α ∈ {µ, λ}. Then, a change in α does not change the state space of the system, meaning that both systems under consideration will live in the same state space. Moreover, α does not change the definitions of operators, unless the functions a(x) and b(x) in operators TARR and TDEP , respectively, are allowed to vary by α. We show that when the definitions of the operators are not changed by α, all operators preserve supermodularity and submodularity of the value function, vα (x). If a(x) and b(x) depend on α, then we need additional conditions on how α affects these functions. For example, assume that we are interested in the supermodularity of the value functions with respect to α, i.e., we expect the opportunity costs to decrease in α. We assume, furthermore, that a(x), the joining probability, in the operator TARR is a function of α. Now, we need to check whether TARR preserves supermodularity. If a(x) increases with α in all states x, then the overall load of the system may increase, which may also increase the opportunity costs, to the contrary of the initial expectation of supermodularity. However, under certain restrictions on how a(x) changes with α, we can ensure that TARR preserves supermodularity. Lemmas 2 and 3 specify conditions on how a(x) and b(x) can depend on α for TARR and TDEP to preserve supermodularity and submodularity, respectively. Now we consider the parameter K. An increase in K changes the state space, which has a greater impact on the directly related operators. In particular, K affects the arrival-related operators in queueing and TC P RD in inventory systems in a special way, whereas its impacts on other operators are similar to those discussed above. At this point, we group the operators affected by K strongly, 15 as controlling operators, i.e., TQ P RC , TB ADM i and TC P RD , and as un-controlling operator, i.e., TARR . A change in K affects these two groups of operators in opposite ways. We first consider the un-controlling operator, TARR , defined for queueing systems: When K is increased to K + 1, a(K) increases from 0 to some strictly positive value, so that an arrival which would never join the queue in state K before the change, will join the queue with a positive probability. Therefore, the un-controlling operator, TARR , leads to an increase in the overall load of the system when K is increased. Moreover, we cannot derive sufficient conditions to bound this increase as in the case when λ or µ affects a(x) or b(x), since the state space of the system is changed. Hence, we cannot guarantee that this operator will preserve supermodularity, which needs the opportunity costs to decrease with an increase in K. As a result, we show that TARR preserves only submodularity with respect to K. For the controlling operators, on the other hand, increasing K brings an extra opportunity to the decision maker: In a queueing system with waiting room capacity K, new arrivals are not allowed to enter the system when there are K customers in the system whether it is optimal or not. If it is optimal to reject the arrivals, the optimal policy does not change when the capacity increases. On the other hand, if the arrivals are rejected only because of the capacity, a new customer may be accepted when there are K customers after an increase in the capacity. Obviously, an increase in the capacity to K + 1 cannot lead to rejecting the customers who were accepted in the system with capacity K. In other words, the opportunity cost of a new customer cannot increase by an increase in the capacity. A similar argument can be given for the controlled production operator TC P RD . Thus, we only show that the controlling operators preserve supermodularity with respect to K. Section 3 of the Appendix presents explicit examples, where submodularity is not preserved with respect to K by TB ADM i and supermodularity is not preserved by TARR for regular arrivals. All other operators preserve both supermodularity and submodularity of the value function with respect to K and x. Finally, we consider the effects of the number of servers, m. When the system has m identical parallel servers, the function b(x) depends on m in a specific way given by formula (1). Hence, it is non-decreasing in m and supermodular with respect to m and x. Then, b(x) does not satisfy the condition given for the operator TDEP in Lemma 3 when α = m, so that TDEP does not preserve submodularity with respect to m and x. This is very intuitive, since increasing the number of servers, in other words, service capacity, cannot increase the opportunity costs. At this point, a natural question comes up regarding the service rate µ, since Lemma 3 will ensure that TDEP preserves submodularity with respect to µ and x. In fact, increasing m changes the definition of the operator TDEP . Therefore, the effect of a change in m involves the marginal benefit generated by TDEP implicitly. As we will see in the next subsection, the marginal benefit of TDEP is nondecreasing in x, so that whenever µ is increased the opportunity costs will always decrease. Hence, the effects of increasing µ and m on the opportunity costs are the same, and they agree with our 16 intuition. For all other operators, the effects of m are similar to those of λ and µ when m < K, while they are similar to those of K for m = K. We summarize our results about the effects of the parameters on the operators in the following lemmas. We use the same notation in Lemma 1. In addition, we define SuperM (α, x) and SubM (α, x) to denote the supermodularity and submodularity with respect to α and x. A detailed proof of Lemma 2 can be found in the Appendix, while the proof of Lemma 3 is omitted since it is very similar to that of Lemma 2. The properties of the operators related to preserving supermodularity is summarized by the following lemma: • TU N IF : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, m, K}. Lemma 2 • TCOST : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, m, K}. • TARR : In systems with infinite capacity: N on−Inc(x), Conc(x), SuperM (α, x) → SuperM (α, x), given that a(x) is non-increasing and convex in x, non-increasing in α, and submodular with respect to α and x, for all α ∈ {µ, λ, m}. TARR : In systems with finite capacity: N on−Inc(x), Conc(x), SuperM (α, x) → SuperM (α, x), given that a(x) is non-increasing and convex in x and it does not depend on α, for all α ∈ {µ, λ, m} if m < K, and for all α ∈ {µ, λ} if m = K. • TDEP : N on − Inc(x), Conc(x), SuperM (α, x) → SuperM (α, x), given that b(x) is nondecreasing in α and supermodular with respect to α and x, for all α ∈ {µ, λ, m, K}. • TCD : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, K}. • TC P RD : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, K}. • TQ P RC : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, m, K}. • TI P RC : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, K}. • TB ADM i • TB RT i : Conc(x), SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, m, K}. : Conc(x), SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, K}. • TEN V (j) : SuperM (α, x) for all demand environments → SuperM (α, x) for all demand environments, for all α ∈ {µ, λ, m, K}. • TF IC : SuperM (α, x) → SuperM (α, x), for all α ∈ {µ, λ, m, K}. We note that systems receiving regular arrivals, which join the queue with a constant probability, do not preserve supermodularity (or submodularity) in finite systems, since concavity cannot be guaranteed in these systems due to the fact that a(x) is not convex in x. The following lemma presents the properties of the operators related to preserving submodularity: Lemma 3 • TU N IF : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, m, K}. 17 • TCOST : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, m, K}. • TARR : N on − Inc(x), Conc(x), SubM (α, x) → SubM (α, x), given that a(x) is non-increasing and convex in x, non-increasing in α, and submodular with respect to α and x, for all α ∈ {µ, λ, m, K}. • TDEP : N on − Inc(x), Conc(x), SubM (α, x) → SubM (α, x), given that b(x) is non-increasing in α and submodular with respect to α and x, for all α ∈ {µ, λ, K}. • TCD : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, K}. • TC : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ}. P RD • TQ P RC : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, m} if m < K, and for all α ∈ {µ, λ} if m = K. • TI P RC : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, K}. • TB ADM i • TB RT i : Conc(x), SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, m} if m < K, and for all α ∈ {µ, λ} if m = K. : Conc(x), SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, K}. • TEN V (j) : SubM (α, x) for all demand environments → SubM (α, x) for all demand environments, for all α ∈ {µ, λ, m, K}. • TF IC : SubM (α, x) → SubM (α, x), for all α ∈ {µ, λ, m, K}. As we can observe from the above lemmas, most operators preserve both supermodularity and submodularity. In particular, if the waiting room capacity is not limited to a finite K, all operators preserve both properties with respect to µ and λ. This is one of the reasons that the monotonicity of the marginal benefit of operators is the key factor to guarantee the supermodularity and submodularity of the value functions. 4.2.2 The Marginal Benefit of Operator T : T v(x) − v(x) In this section, we will not indicate the dependence of the value functions on α, since the marginal benefit of an operator is defined for systems having the same set of parameters. In fact, the marginal benefit of an operator, T , compares two systems both in state x, where one observes the event caused by an increase in the parameter, α, which affects the operator, T , directly, while the other one remains in its current state. Now consider the case when the marginal benefit is nondecreasing in x, so T v(x) − v(x) ≤ T v(x + 1) − v(x + 1), which can be written as T ∆v(x) ≤ ∆v(x). Then observing the event associated with the operator T decreases the opportunity costs, so that increasing the parameter α will also decrease the opportunity costs. Hence, supermodularity of the value functions in parameter α and x will always need the marginal benefit of operator T associated with parameter α to be non-decreasing in x. Similarly, whenever we need submodularity of the 18 value functions, we expect to have the marginal benefit of T to be non-increasing in x. The supermodularity and submodularity of the value functions depend on the characteristics of the event and the context of the problem. In the context of queueing control, an increase in the service capacity will decrease the opportunity costs, which calls for the supermodularity of the value functions. In terms of the marginal benefit, this intuition requires the departure of an existing customer to be more valuable when there is one more customer in the system. Similarly, an increase in the arrival rate of a queueing system should increase the opportunity costs, so that the arrival of a new customer is expected to be more valuable when there is one less customer. Therefore, T v(x) − v(x) is non-decreasing in x for departure-related operators and non-increasing in x for arrival-related operators. However, this intuition may not apply to the un-controlled operator TARR in all circumstances: If the waiting room capacity is K, then the arrivals to a system in state K do not bring any burden, and so a high level of x may, in fact, decrease the load of the system. Moreover, since the joining probability function, a(x), is assumed to be non-increasing in x, the number of customers in the system, the system faces two opposing effects: On one hand, as the load of the system becomes higher, the arrival of a new customer has a higher burden on the system. On the other hand, a new customer also relieves this burden because s/he induces a decrease in the joining probability. Hence, we consider the marginal benefit of the arrival operator TARR only in systems receiving regular arrivals to an unlimited waiting room, so that a(x) = a for all x ≥ 0 for some 0 < a ≤ 1. In the inventory control context, if the level of on-hand inventory is high, then the arrival of a new customer will be more valuable, while producing a new item will bring little benefit. More precisely, an increase in the production capacity will increase the opportunity costs. Hence, the value functions are expected to be submodular in µ and x, while the marginal benefit of production operators should be non-increasing in x, so that producing a new item is more valuable if the inventory is lower. Similarly, an increase in the demand arrival rate will decrease the opportunity costs, and the marginal benefit of demand arrival operators should be non-decreasing in x. To summarize, in inventory systems T v(x) − v(x) is non-decreasing in x for arrival-related operators and non-increasing in x for the production operator. We state the results on the marginal benefit in Lemma 4. The notation is similar to Lemma 1, where we say that f has N on − Dec − M B(x) property if the marginal benefit generated by the operator T is non-decreasing in x, i.e., T f (x) − f (x) ≤ T f (x + 1) − f (x + 1). Similarly, if a function f has the property N on − Inc − M B − T (x), then T f (x) − f (x) ≥ T f (x + 1) − f (x + 1). The proof of Lemma 4 is presented in the Appendix. Lemma 4 • TARR : Conc(x) → N on − Inc − M B(x), in systems with infinite buffer, given that a(x) = a for all x ≥ 0 for some 0 < a ≤ 1. • TDEP : N on − Inc(x), Conc(x) → N on − Dec − M B(x). 19 • TCD : Conc(x) → N on − Dec − M B(x). • TC P RD : Conc(x) → N on − Inc − M B(x). • TQ P RC : Conc(x) → N on − Inc − M B(x). • TI−P RC : Conc(x) → N on − Dec − M B(x). • TB ADM i • TB RT i 4.2.3 : Conc(x) → N on − Inc − M B(x). : Conc(x) → N on − Dec − M B(x). Discussion of the results The supermodularity and submodularity of the value functions with respect to µ and λ requires the conclusions of both sections 4.2.1 and 4.2.2, since the monotonicity of the marginal benefits is necessary in addition to supermodularity or submodularity of the value functions with respect to the parameter under consideration. In fact, the determining factor for the value functions to preserve supermodularity or submodularity will always be Lemma 4. For the parameters m and K, on the other hand, the marginal benefits of the affected operators do not come into the picture, and so the supermodularity or submodularity of the value functions with respect to m and K will follow from section 4.2.1 immediately. Now consider systems with infinite capacity. All queueing models that can be represented as a combination of the operators described here satisfy supermodularity with respect to µ and m. However, the queueing models satisfying submodularity property with respect to λ can be built as a combination of all operators, except for the operator TARR . By Lemma 4, TARR preserves the property of non-increasing marginal benefits only if a(x) is constant for all x. Hence, we cannot guarantee submodularity with respect to λ in the presence of the operator TARR when a(x) is not constant. In inventory models with infinite capacity, the value functions satisfy supermodularity with respect to λ and submodularity with respect to µ for all possible combinations of the operators considered here. When the waiting room is finite, all queueing models represented by any combination of the operators still satisfy supermodularity with respect to µ and m for m < K. For these systems, we cannot guarantee that operator TARR preserves the submodularity of the value functions with respect to λ even when a(x) is constant (see Lemma 4). However, any combination of all other operators will satisfy submodularity with respect to λ. The effect of a change in the size of the waiting room K is more complex: Any queueing system which represents the arrivals by only uncontrolling operator TARR , satisfies submodularity with respect to K, whereas any system with only controlling arrival operators TQ P RC and TB ADMi will have supermodularity with respect to K. We, finally, consider the case with m = K: Then all combinations of operators which exclude TARR will model queueing systems that satisfy supermodularity with respect to m (or K), whereas 20 the value functions of the systems with TARR are not submodular in general due to the preserved supermodularity property with respect to m. In inventory systems with finite capacity, the value functions will still satisfy supermodularity with respect to λ and submodularity with respect to µ for all possible combinations of the operators considered here. Now consider the effect of changing the finite capacity K: Since production can always be controlled in inventory systems, and so TC P RD is the only production operator considered here, all combinations of inventory operators representing a meaningful system will always include the operator TC P RD . Then, the value functions of all possible inventory models are supermodular with respect to K, due to the supermodularity preserved by TC P RD . All the results of this section agree with the basic intuition regarding to these properties. In fact, the results are restricted to some parameters and/or by certain conditions whenever the corresponding intuition is not clear. For example, when the waiting room capacity is finite in queueing systems, the intuition regarding to the parameter changes blurs due to the boundary effect of K, which is reflected in our results. 5 Illustration of Results and Examples In this section, we illustrate how our results can establish the effects of parameter changes on the structure of optimal policies in a number of models. We will first consider two admission control models, where the first model is the one introduced in section 2, and the other one considers a loss system receiving batch arrivals. Section 5.2 will sweep through the literature to point out several interesting models, which fall into our framework. Finally, we show a full application in a stock rationing problem of an inventory system with batch arrivals, for which we not only establish the structure of the optimal policy but also establish the effects of the parameters on the optimal policy. 5.1 Illustration of Results on Admission Control Problems Let us first illustrate the approach in detail using the admission control model of Section 2. Similar admission control problems for queueing systems have been studied by for instance Stidham [23] and Blanc et. al. [3]. Recall that the model outlined in Section 2 uses the main operators TADMi and TDEP , in addition to TF IC , TU N IF and TCOST . As mentioned before, TADMi is, of course, a special case of TB ADMi with a batch size of 1. The main structural results announced in Section 2 can now be easily verified. Using Lemma 1, it is seen that all operators propagate monotonicity (in the non-increasing sense) and concavity. Therefore, the value function is non-increasing and concave in the queue length and threshold-type policies are optimal, as stated in Proposition 1. In addition, class 1 would be a preferred class if ∆v(x) ≤ R1 , that is if the U BD property holds with parameter R1 . From Lemma 1, it can be seen that along with concavity, U BD(R1 ) propagates through all 21 operators but TCOST . This implies that unless h(x) = 0 for all x, there cannot be a preferred class. On the other hand if h(x) = 0 for all x, class 1 is preferred and will always be accepted. Let us now turn our attention to the effects of system parameters. As explained before, changes in the arrival or service rates in this model change the event occurence probabilities and must be treated differently. In particular, for these parameters, the results of Lemma 4 are critical. Let us start by investigating the effect of an increase in the service rate µ. This has a direct effect on TDEP and indirect effects on all other operators. By Lemma 4, TDEP preserves the non-decreasing marginal benefits property since the value function is non-increasing and concave in x. This implies that if supermodularity in µ and x also holds, optimal thresholds will be non-decreasing in µ. To check supermodularity, we go back to Lemma 2 which reveals that supermodularity propagates for all operators when the value function is non-increasing and concave. This shows that all optimal thresholds are non-decreasing in the service rate, µ. Now comes the more interesting case, where we increase the arrival rates, λi . In fact, we will check the effect of an increase in λi similarly to above. By Lemma 4, TADMi propagates the property of non-increasing marginal benefits since the value function is concave in x. Lemma 3 then shows that all operators support submodularity in λi and x when the value function is non-increasing and concave in x. This establishes that all optimal thresholds are non-increasing in arrival rates λi , which agrees with the second line of thought proposed in Section 2. In terms of the numerical example introduced, Table 1 gives the values of optimal thresholds, when each λi is increased by 0.1 Finally, let us investigate the effect of increasing the number of servers m. This is relatively easier to check since Lemma 4 is not needed. b(x) is non-decreasing in m and supermodular in m and x. This implies by Lemma 2 that TDEP preserves supermodularity in m and x since the value function is non-increasing and concave in x. In addition, the lemma also shows that all other operators preserve supermodularity. This ensures that all optimal thresholds are non-decreasing in the number of servers, m. In order to emphasize the strength of the framework in its generality, let us now extend the basic model of Section 2. Most of the studies on admission control focus on loss systems with no waiting room. Lippman and Ross [15], Altman et. al. [2], Örmeci et. al. [19], Savin et. al. [22], Örmeci and van der Wal [20] and Gans and Savin [6] consider the admission control problem in a loss system, which receives single arrivals. As a relatively complicated example, let us take the model in Örmeci and Burnetas [18] where a loss system with batch arrivals is considered. In particular, Örmeci and Burnetas [18] consider a partial acceptance problem, where some of the customers in a batch can be admitted while the remaining ones are rejected, in a loss system which consists of m identical parallel servers with no waiting room and N classes of customers. In the original model, it is allowed to have a batch containing different classes of customers. However, here we assume that each arriving batch consists of only one class of customers, so that arrivals 22 (λ1 , λ2 , λ3 ) (0.5, 0.7, 0.9) (0.6, 0.7, 0.9) (0.5, 0.8, 0.9) (0.5, 0.7, 1.0) Thresholds (l1 , l2 , l3 ) (72, 32, 6) (68, 30, 5) (70, 30, 6) (72, 32, 5) Table 1: The values of optimal thresholds in systems with different arrival rates from class i occur according to a Poisson process with rate λi , and a class-i arrival consists of B customers with probability piB . At each arrival epoch, the decision maker has to decide the number of customers to be admitted from an arriving batch. Each admitted class-i customer brings a reward of Ri ≥ 0 and requires an exponential service time with rate µ, where the classes are ordered with respect to their rewards, i.e., R1 ≥ · · · ≥ RN . We model this system as an event-based dynamic program by using the batch admission opera- tor, TB ADM i (modified for the finite buffer size with K = m), the departure operator TDEP , where the probability of service completion is b(x) = min{x, m}/M with M > m. The value function for infinite-horizon discounted reward problem for this model is then given by: N v(x) = M µTDEP v(x) + λi i=1 piB TB ADM i v(x) + θTF IC v(x), (5) B It is seen that most arguments used in the analysis from the simpler version of the model carry over directly. Hence, v(x) is non-increasing and concave in x, and it has the U BD property since no holding costs are incurred. These properties, then, imply the existence of an optimal threshold policy as in [18]: Proposition 2 There exists an optimal policy of threshold type, i.e., there exist numbers li ≥ 0 for i = 1, · · · , N , such that it is optimal to admit min{B, (li − x)+ } of incoming B class-i customers in state x. Moreover, all class-1 customers are always admitted, i.e., m = l1 ≥ l2 ≥ · · · ≥ lN . Now Lemmas 2 and 4 guarantee that the value functions are supermodular with respect to µ and x, as well as with respect to m and x, whereas Lemmas 3 and 4 ensure the submodularity of the value functions with respect to λi and x for all i. These results immediately translate to the following theorem, which summarizes the effects of the parameters on optimal thresholds. Theorem 1 In this partial acceptance problem, the optimal thresholds li are non-decreasing in the service rate, µ, and the number of servers, m, and non-increasing in the arrival rates λi . Finally we note that the first model studied in Lippman [16] is a special case of this model, where all batches consist of only single arrivals. 23 5.2 Other Examples from the Literature Several known models from the literature can easily be analyzed for the effects of changes in the parameters using the framework provided. We continue to consider the basic control problems studied in Lippman [16]. The second model of Lippman is an M/M/1 queue with a controllable service rate consisting of the operators TCD , TARR and TF IC . It is shown that the optimal service rate is non-decreasing in the queue length. We can complement these results by establishing, for example, that the optimal service rate is increasing in the arrival rate. The third model of Lippman considers a dynamic pricing problem in an M/M/m queue. This problem employs the operators TQ P RC , TDEP and TF IC . The optimal price to be charged to an arriving customer is shown to be increasing in the queue length. The effects of parameters on this problem have recently been analyzed independently by AktaranKalaycı and Ayhan [1] and Çil, Karaesmen and Örmeci [4]. Our framework also easily yields that the optimal prices are non-increasing in the service rate and in the number of servers, whereas they are non-decreasing in the arrival rate. Most basic control problems for make-to-stock queues can also be analyzed within this framework. Consider the standard problem of production control of a single-processor with exponential processing times and Poisson demand arrivals with linear holding costs and lost sales (see Veatch and Wein [26]). The optimal production policy for this problem is a threshold policy called a base stock policy: it is optimal to produce whenever the inventory level is below the base stock level and not to produce otherwise. The optimality equation is composed of the operators TC P RD , and TDEP , in addition to TCOST and TF IC . It then follows by Lemmas 3 and 4 that the optimal base stock level is decreasing in µ. Li [14] considers a make-to-stock queue with dynamic pricing of inventories. In this model, the Poisson demand rate depends on the price chosen. Li shows that the optimal production policy is of base-stock type, and optimal prices are decreasing in the inventory level. The optimality equation of this model is composed of the operators TC TI P RC , P RD , and in addition to TCOST and TF IC . Using the lemmas, we can establish that the base stock level is non-increasing and the optimal prices are non-decreasing in the processing rate. Even an extension of Li, when the demand arrivals occur according to a Markov modulated Poisson process (MMPP), studied by Gayon et. al. [7], falls into our framework by adding the operator TEN V (j) . For instance, an increased arrival rate in any one of the environment states leads to non-decreasing optimal prices. 5.3 A Stock Rationing Problem for a Make-to-Stock Queue with Batch Arrivals Our final example investigates a stock rationing problem with batch demand arrivals. To our knowledge, this problem has not been analyzed in such generality before and the framework is employed for both obtaining structural results and investigating the effects of parameters. 24 The stock rationing problem appears when different customer classes who bring different rewards have to be satisfied from a common stock. Since demand classes have different values, it is more important to satisfy one class rather than the others. Therefore, it is even possible not to fulfill certain demand types in anticipation of future demand from the more valuable classes. The basic stock rationing problem has been studied widely in the inventory literature going back to Topkis [24] who studies the optimal ordering and rationing policies for a single-product inventory system with several demand classes under periodic review. Our focus here is on a recent stream of research that uses the make-to-stock queueing framework to model limited production capacity. For instance, Ha [8], [9], and de Véricourt et. al. [5] examine such inventory systems under different assumptions. In particular, Ha [8] considers the stock rationing problem in a make-to-stock production system with several demand classes and lost sales and characterizes the structure of the optimal policy. To illustrate our framework in a rationing problem, we choose an extension of the model introduced by Ha [8], but modify it so that we consider profit maximization, as opposed to cost minimization in the original model. Ha considers a make-to-stock production system that produces a single product with N demand classes and lost sales. Demand arrivals of class i occur according to a Poisson process with rate λi , and each customer requests one unit of product. Let us consider the following plausible extension of this model: customers may demand more than one unit of product at a time. Therefore, we extend the model by considering batch arrivals. We assume that at an arrival epoch of class i, piB is the probability that there are B customers in the batch, where i = 1, · · · , N and B = 1, 2, · · · . We assume lost sales as in Ha, so that when a demand occurs, it is either satisfied from on-hand inventory or completely lost. Whenever a class-i demand is satisfied, a reward of Ri > 0 per unit is obtained, and without loss of generality we assume that R1 ≥ · · · ≥ RN . The production time is assumed to be exponential with mean 1/µ. Moreover, inventory holding cost per unit time, h(x), is a non-decreasing and convex function of the on-hand inventory. At any time, the decision maker has to decide whether to produce or not, and the number of class-i customers to be satisfied from an arriving batch, κi . We assume that the variable production cost is c, which is set to 0 in [8]. Then, we denote the arrivals and rationing policy by TB production control by TC P RD , RT i , the where π ∈ {0, 1} and cπ = πc, and the fictitious events by TF IC . The objective of the problem is to maximize the expected total β-discounted reward over an infinite horizon. To achieve this aim, we build the corresponding discrete-time MDP of the model by using uniformization and normalization since the potential transition rate is finite. Then, we assume that the time between two consecutive transitions is exponentially distributed with rate µ+ N i=1 λi + β + θ and using the appropriate time scale, assume that µ + N i=1 λi + β + θ = 1. The optimality equation can be expressed as: N v(x) = TCOST (µTC P RD v(x) + λi i=1 25 piB TB B RT i v(x) + θTF IC v). (6) 5.3.1 Structure of the Optimal Policy In the original model [8], Ha shows that the optimal production control and rationing policies are of threshold type, and class 1 is preferred, i.e., it is always optimal to satisfy class-1 demands whenever it is possible. We investigate the existence of similar structures in the extended model. We first concentrate on the concavity of the value function to ensure that there is an optimal policy which is of threshold type. To establish concavity we need to show the following inequality: µ∆TC + N i=1 λi B P RD v(x) piB ∆TB +θ∆TF IC v(x) µ∆TC RT i v(x) ≤ + −∆h(x) N i=1 λi B P RD v(x + piB ∆TB 1) RT i v(x +θ∆TF IC v(x + 1) + 1) (7) −∆h(x + 1). The first three lines are directly verified by Lemma 1 and the last line is true by the convexity of h(x). Hence, v(x) is concave in x. Concavity of the value function v(x) means that the value of an additional item in inventory is increasing with the current inventory level, x. Hence, as x increases, the system will be more willing to sell the items, while being less willing to produce them. To state the effects of the concavity of v(x) on optimal production decisions, we define: S ∗ = min{x : v(x) − v(x + 1) > −c}, (8) so that when x = S ∗ , it is optimal not to produce more items. Due to the concavity of v(x), for all states x ≥ S ∗ , it is not worth producing a new unit; while for all x < S ∗ it is always optimal to produce. Now we consider the effect of concavity on the rationing policy. Let: li∗ = max{x : v(x − 1) − v(x) < −Ri }, (9) with li∗ = 0 if there is no such x. Therefore, when x = li∗ , it is optimal to reject the whole class-i batch since v(li∗ − 1) + Ri < v(li∗ ), while for x = li∗ + 1 it is optimal to satisfy one unit from a class-i batch because v(li∗ )+Ri ≥ v(li∗ +1), by the definition of li∗ . Then for all x ≤ li∗ , v(x−1)+Ri < v(x) due to the concavity of v(x), so that it will be always optimal to reject the whole batch in all states x ≤ li∗ , i.e., κ∗i (x) = 0. Similarly, for all x ≥ li∗ + 1, v(x − 1) + Ri ≥ v(x) by concavity, so that it will be optimal to satisfy class-i demand until either the inventory level drops down to li∗ or the whole batch is satisfied. Hence, κ∗i (x) = min{x − li∗ , B} for all x ≥ li∗ + 1. In other words, the optimal rationing policy will reject the whole class-i batch if x ≤ li∗ , partially satisfy the demand if li∗ < x < li∗ + B, and satisfy the entire batch if x ≥ li∗ + B. Therefore, a low threshold li∗ leads to a higher fulfillment rate of class-i demand. In fact, if the reward obtained by admitting a class-i customer is higher than the reward of a class-j customer, then the optimal threshold of class-i 26 will be lower than that of class-j as a result of the definition of li∗ , i.e., when Ri ≥ Rj , li∗ ≤ lj∗ . Moreover, it is always optimal to satisfy class-1 customers whenever it is possible, i.e., class 1 is preferred. This result is easily established by Lemma 1, since all operators of the model preserve the LBD(R1 ) property (see the proof of Lemma 1 for details). The following theorem establishes the structure of an optimal policy for this model. Theorem 2 In this stock rationing model: • A base-stock policy is an optimal production control policy. S ∗ (given by (8)) is the the base-stock level so that it is optimal to produce only if x < S ∗ . • The optimal rationing policy is a sequential threshold policy for each demand class, where optimal thresholds are given by (9). Then, optimal number of class-i customers to be satisfied from an arriving batch in state x, κ∗i (x), is given by: ⎧ ⎨min{B, x − l∗ } i κ∗i (x) = ⎩0 if x > li∗ if x ≤ li∗ . Moreover, li∗ ’s are monotone in i and class-1 is preferred, i.e., lN ∗ ≥ · · · ≥ l1 ∗ = 0 5.3.2 Effects of the Parameters: µ, λi Intuitively, if the production rate, µ, increases, then the system will process faster and the opportunity cost of an additional inventory will increase because it is not necessary to keep more inventory when the system processes faster. On the other hand, the opportunity cost of an additional inventory decreases by an increase in the arrival rate of class-i customers, λi , because the higher the arrival rate, the faster the inventory is consumed. Inequalities (2) and (3) reflect the intuitions about the effects of the arrival rate and the production rate on the opportunity cost, respectively, and thus, we may anticipate supermodularity of the value function, vλi (x), with respect to λi and x, and submodularity of vµ (x) with respect to µ and x. We only show the proof of supermodularity of vλi (x) with respect to λi and x (submodularity is shown similarly). We need to show that: µ∆TC P RD vλi (x) µ∆TC N λj + j=1 P RD vλi +ε (x) N pjB ∆TB RT j vλi (x) B ≥ + λj j=1 pjB ∆TB RT j vλi +ε (x) B +θ∆TF IC vλi (x) +θ∆TF IC vλi +ε (x) −∆h(x) −∆h(x) piB (∆TB +ε RT i vλi +ε (x) B (10) − ∆vλi +ε (x)) , where the first three lines satisfy the inequality due to Lemma 2. Hence, we consider only the last line: By Lemma 4, TB RT i f (x) − f (x) is non-decreasing in x, i.e., ∆TB RT i f (x) − ∆f (x) ≤ 0, so that the last line is also true. Hence, vλi (x) is supermodular with respect to λi and x. 27 As to the effects of supermodularity on optimal actions, from (8) we see that as ∆vλi (x) decreases in λi , S ∗ increases. Similarly, li∗ is also increasing by a decrease in ∆vλi (x) by (9). The proof of submodularity of vµ (x) with respect to µ and x, and of its effects on the optimal actions are similar to the above, and so omitted. The following theorem summarizes our results on the effects of the parameters on the optimal policy: Theorem 3 In this stock rationing model, the optimal base-stock level, S ∗ , and the optimal rationing thresholds, li∗ , are non-increasing in µ and non-decreasing in λi . 6 Conclusion We presented a general framework for investigating the effects of system parameters on the optimal policy for a class of queueing and inventory control problems. Our approach is based on an exploration of the properties of the operators that constitute the value function of the given control problem. The approach gives a clear guideline of how system parameters affect the structure of the operators and the consequent effects on the optimal policy. In addition, for a rich class of problems, the framework enables an automatic assessment how changes in different system parameters affect the optimal policy. There are several interesting research directions. Extending the scope of application of the general approach is one potential direction. Additional control operators could be investigated and possibly a more general class of problems could be addressed. Exploring multi-dimensional problems is also a major challenge. As always these problems pose additional difficulties and an immediate general extension seems very hard but it would be interesting to understand to what extent the results here could be generalized to multi-dimensional problems. Acknowledgements: We thank Yalçın Akçay, Refik Güllü and Evrim Güneş for their helpful comments, and Ger Koole for sharing an unpublished manuscript. This work has been partially supported by a TÜBA Grant. References [1] T. Aktaran-Kalaycı and H. Ayhan. Sensitivity of optimal prices to system parameters in a service facility. Submitted to European Journal of Operations Research, 2005. [2] E. Altman, T. Jimenez, and G. M. Koole. On optimal call admission control in a resourcesharing system. IEEE Transactions on Communications, 49:1659— 1668, 2001. [3] J. P. C. Blanc, P. R. de Waal, P. Nain, and D. Towsley. Optimal control of admission to a multiserver queue with two arrival streams. IEEE Transactions on Automatic Control, 37:785— 797, 1992. 28 [4] E. B. Çil, F. Karaesmen, and E. L. Örmeci. Sensitivity analysis on a dynamic pricing problem of an m/m/c queueing system. Proceeedings of the 12th IFAC Semposium on Information Control Problems in Manufacturing, 2006. [5] F. de Véricourt, F. Karaesmen, and Y. Dallery. Optimal stock allocation for a capacitated supply system. Management Science, 48:1486—1501, 2002. [6] N. Gans and S. Savin. Pricing and capacity rationing for rentals with uncertain durations. Management Science, 53:390—407, 2007. [7] J.P. Gayon, I.T. Değirmenci, F. Karaesmen, and E.L. Örmeci. Dynamic pricing and replenishment in a production-inventory system with markov modulated demand. Working Paper, Department of Industrial Engineering, Koç University, 2004. [8] A. Y. Ha. Inventory rationing in a make-to-stock production system with several demand classes and lost sales. Management Science, 43:1093— 1103, 1997. [9] A. Y. Ha. Stock-rationing policy for a make-to-stock production system with two priority classes and backordering. Naval Research Logistics, 44:458—472, 1997. [10] Smith J.E. and K.F. McCardle. Structural properties of stochastic dynamic programs. Operations Research, 20:796—809, 2002. [11] G. Koole. Structural results for the control of queueing systems using event-based dynamic programming. Queueing Systems, 30:323— 339, 1998. [12] G. Koole. Monotonicity in markov reward and decision chains: Theory and applications. Working Paper, Vrije University, The Netherlands, 2007. [13] C. Y. Ku and S. Jordan. Access control to two multiserver loss queues in series. IEEE Transactions on Automatic Control, 42:1017— 1023, 1997. [14] L. Li. A stochastic theory of the firm. Mathematics of Operations Research, 13(3):447—466, 1988. [15] S. A. Lippman and S. M. Ross. The streetwalker’s dilemma: A job shop model. SIAM Journal of Applied Mathematics, 20:336— 342, 1971. [16] S.A. Lippman. Applying a new device in the optimization of exponential queuing systems. Operations Research, 23:687— 710, 1975. [17] A. Müller. How does the value function of a markov decision process depend on the transition probabilities. Mathematics of Operations Research, 22:872—885, 1996. 29 [18] E. L. Örmeci and A. Burnetas. Admission control with batch arrivals. Operations Research Letters, 32:448—454, 2004. [19] E. L. Örmeci, A. Burnetas, and J. van der Wal. Admission policies for a two class loss system. Stochastic Models, 17:513— 540, 2001. [20] E. L. Örmeci and J. van der Wal. Admission policies for a two class loss system with general interarrival times. Stochastic Models, 22:37—53, 2006. [21] M.L. Puterman. Discrete Stochastic Dynamic Programming. John Wiley and Sons, USA, 1994. [22] Savin S., Cohen M., Gans N., and Katalan Z. Capacity management in rental businesses with heterogeneous customer bases. Operations Research, 53:617—631, 2005. [23] S. Stidham. Optimal control of admission to a queuing system. IEEE Transactions on Automatic Control, 30:705— 713, 1985. [24] D. M. Topkis. Optimal orderingand rationing policies in a nonstationary dynamic inventory model with n demand classes. Management Science, 15:160—176, 1968. [25] M.H. Veatch and L.M. Wein. Monotone control of queueing networks. Queueing Systems, 12:391—408, 1992. [26] M.H. Veatch and L.M. Wein. Scheduling a make-to-stock queue: index policies and hedging points. Operations Research, 44(4):634—647, 1996. 30
© Copyright 2025 Paperzz