Multiuser Throughput Dynamics under Proportional Fair and other Gradient Based Scheduling Algorithms Alexander L. Stolyar Bell Labs, Lucent Technologies Murray Hill, NJ 07974 [email protected] Abstract We consider the model where N queues (users) are served in discrete time by a The switch state is random, and it determines the set of possible service rate choices (scheduling decisions) in each time slot. This model is primarily motivated by the problem of scheduling transmissions of N data users over a shared time-varying wireless environment, but also includes other applications such as inputqueued cross-bar switches and parallel exible server systems. The objective is to nd a scheduling strategy maximizing a concave utility function H (u1 ; : : : ; uN ), where un 's are long-term average service rates (data throughputs) of the users, assuming users always have data to be served. We prove asymptotic optimality of the Gradient scheduling algorithm (which generalizes the well known Proportional Fair algorithm) for our model which, in particular, allows for simultaneous service of multiple users and for discrete sets of scheduling decisions. Analysis of the transient dynamics of user throughputs is the key part of this work. generalized switch. Scheduling, Optimal Throughput Allocation, Proportional Fair, Gradient Algorithm, Wireless, Generalized Switch Key words and phrases: 1 Introduction We consider the model where N queues (users) are served in discrete time by a generalized switch. The switch state is a random ergodic process, and it determines the set of possible service rate choices (scheduling decisions) in each time slot. This model is primarily motivated by the problem of scheduling transmissions of N data users over a shared time-varying 1 wireless environment, which naturally arises, for example, in modern wireless data technologies, such as HDR (3G1X EV-DO) [2, 7]. It also includes as special cases other models and applications such as input-queued cross-bar switches [11, 12], a discrete time version of the parallel server system [6, 19], and a parallel-processing system [5]. (The specic model we consider is identical to that of [14]. However, models of this type have a long history, going back to the work of Tassiulas and Ephremides [15, 16]. See [14] for a further review of the model history and applications.) In this paper we consider the \saturated system" where each user has innite amount of data to be served (transmitted), and we are concerned with optimizing the vector of average service rates (throughputs) u = (u1; : : : ; uN ), so as to maximize some concave utility function H (u). In the wireless data scheduling context, P this problem was rst considered by Tse in [17], where the Proportional Fair utility function log(un), and the corresponding Proportional Fair scheduling algorithm (dened below) were introduced for the model where (as in HDR) one user can be served (transmit) at a time. The Gradient Algorithm (dened below) is a natural generalization of Proportional Fair algorithm in that it applies to any concave utility function, and to the systems where multiple users can be served at a time. The goal of this paper is to formally prove (asymptotic) optimality of the Gradient Algorithm for our rather general model, which in particular allows for (a) simultaneous service of multiple users and (b) the set of scheduling decisions to be discrete (as is the case in HDR and other wireless data technologies). To describe our main results and connections to previous work, let us rst describe the model in more detail. (The formal description will be given in Section 3.) Queues (users) n = 1; : : : ; N , are served by a generalized switch in discrete time t = 0; 1; 2; : : : . Switch state m = (m(t); t = 0; 1; 2; : : : ) is a random ergodic process. In each state m, the switch can choose a scheduling decision k from a set K (m). Each decision k has the associated service rate vector m(k) = (m1 (k); : : : ; mN(k)). This vector represents the amount of data of each user which will be served (transmitted) in one \time slot" if decision k is chosen. (We emphasize the fact that our model allows for features (a) and (b) specied earlier.) As mentioned earlier, we consider the situation when each user always has data to be served, and we are interested in optimizing the average service rates u = (u1; : : : ; uN ). Namely, we seek to nd a scheduling algorithm which produces u solving the problem max H (u) (1) subject to u2V ; (2) where H (u) is a strictly concave utility function, and set V above is the system rate region, i.e. the set of all feasible long-term service rate vectors. It is easily observed that (under some 2 natural non-restrictive assumptions) V is a convex closed bounded set and, consequently, the solution u to the problem (1)-(2) exists and is unique. The Gradient Algorithm is dened as follows. If at time t switch is in state m, the algorithm chooses a (possibly non-unique) decision k(t) 2 arg max rH (X (t)) m (k) k maximizing the dot-product with the gradient of H (X (t)), where the vector X (t) is updated as follows: X (t + 1) = (1 )X (t) + m (k) ; with arbitrary initial value X (0) and with > 0 being a xed (small) parameter. (We see from this denition that, roughly, vector X (t) represents exponentially smoothed average service rates.) The Fair (PF) algorithm is the Gradient Algorithm with utility function H (u) = P log(Proportional un). Also, very often PF algorithm is considered for the \one-at-a-time" case, when service rate vectors m(k) can have no more than one non-zero component. In other words, in each time slot only one user can be picked for service, and if user n is picked, it is served at the rate mn , depending on the switch state m. In this case the PF algorithm is particularly simple [17, 7]: in each time slot t it serves the user n for which mn =Xn(t) is maximal. In this paper we prove the following property for our model. (P1) Asymptotic Optimality of Gradient Algorithm. Let u denote the vector of expected average service rates in a system with xed parameter > 0. Then, as ! 0, u ! u : (The rates u are dened in terms of averages over nite time intervals. Formal denitions and corresponding results, Theorem 1 and Corollary 1, are presented in Section 7. The results show that the convergence in (P1) holds in a quite strong sense.) Establishing property (P1) naturally involves the analysis of the (limiting, continuous time) trajectories of the process X (t= ), with very small xed > 0. Such trajectories x = (x(t); t 0), which we call uid sample paths (FSP), satisfy, in particular, the dierential inclusion x0 (t) = v (t) x(t); v (t) 2 arg max[rH (x(t))] v : (3) v2V As the key element of proving (P1), we prove the following attraction property of FSPs. (P2) Attraction Property. The convergence x(t) ! u ; t ! 1 ; holds for any FSP x, with any initial state x(0). (In fact we prove uniform convergence on the initial states x(0) from a bounded set. See Theorem 3 for precise formulation.) 3 The problem of asymptotic optimality of the Gradient Algorithm was studied in the recent papers [8] and [1]. In particular, in those papers the dierential inclusion (3) was derived and the attraction property (P2) was proved, under certain constraints which we now discuss in more detail. In both [8] and [1] it is required that the arg max in (3) is a single point v, depending on x(t) continuously. This makes (3) a dierential equation of the form x0 (t) = f (x(t)) (4) with continuous f (). In essence, this is the condition that rate region V is strictly convex. Such condition does not hold, for example, if the set of all possible scheduling decisions is nite - a situation typical in many applications, including HDR [2, 7]. The proof of convergence (P2) in [8] relies in essential way on a further assumption that the dynamics described by (4) is cooperative. This roughly means that the vector-function f (u) is such that if we decrease all components of vector u except one, say ui, then fi(u) will also decrease. This is another condition on the shape of V . It holds in the \one-at-a-time" case (when the service rate vectors m(k) can have at most one non-zero component). It however typically does not hold in important cases when multiple users can be served simultaneously in one time slot, as in [18] (or in cross-bar switch models). In [1], property (P2) is proved under the additional assumption that the initial state x(0) 2 V . (In this case H (x(t)) is a natural Lyapunov function, which is no longer the case if x(0) is outside of V .) This assumption is restrictive for the following two (related) reasons. First, such form of property (P2) does not imply the asymptotic optimality (P1). Secondly, in applications, and especially wireless applications, the stationarity of the switch (channel) state process is only an approximation - the \stationary" distribution of this process is subject to both slow and sudden changes (for example, due to arrival and departure of users). Therefore, the ability of an algorithm to bring system to the optimal state from any initial state is in fact required for algorithm robustness in real applications. Our proof of property (P2) follows a \geometric" approach (inherited from [14]), which does not rely on the cooperative dynamics assumption, or on the condition that rate region is strictly convex, or on the utility function H as the sole Lyapunov function. As a result, we are able to establish property (P2), and consequently the asymptotic optimality (P1), in the desired generality, which allows in particular for features (a) and (b) of the model. We also note that the proofs of (P2) in [8] and [1] exclude the utility functions of the type P log( un), which take value 1 when some un = 0. Due to the wide use of such (somewhat less \benign") utility functions in applications, we include them in our results. To do this, for such utility functions, we prove and use additional properties of uid sample paths, beyond the basic dierential inclusion (3). As we comment later in the paper, the assumption of strict concavity of utility function H is not essential. If H is just concave, all our results and proofs hold if we replace the single optimal solution u of the problem (1)-(2) by the set of optimal solutions (which necessarily will be convex and compact). 4 To summarize, the main contributions of this paper are as follows: - The key attraction property (P2) of uid sample paths under Gradient Algorithm is proved for arbitrary initial state, and moreover, uniformly on the initial states from a bounded subset; - Property (P2) is proved under very general (but common in applications) assumptions on the model and utility function; - The asymptotic optimality (P1) of the Gradient Algorithm is proved, based on the property (P2). Finally, we note that papers [8] and [1] (as well as earlier papers [9, 10]) also consider versions of the Gradient Algorithm, such that the parameter is not xed, but rather is a decreasing vanishing function of time t. Such versions of the algorithm are asymptotically optimal in the sense that long-term average throughputs converge to the exact optimal values u, as time t ! 1. However, the smaller the initial value (0), the greater the convergence time. The algorithm with xed (which we study in this paper) is dierent in that it brings user throughputs into a (small) neighborhood of u. However, as we show, the convergence is within the time of the order of O(1= ), uniformly on essentially any initial state. (This is because, in applications, usually there exists some a priori known upper bound a on the maximum possible service rate of any user at any time. As a result, as long as all Xn(0) a, the components of X (t) will always stay bounded by a.) This means that the algorithm is able to adjust to the \environment" changes without resetting the algorithm parameters. Such robustness of the algorithm with xed is very desirable in applications (as already mentioned earlier). The rest of the paper is organized as follows. In Section 2 we introduce basic notations and conventions used in the paper. We introduce the formal model in Section 3, and in the following Section 4 dene the system rate region V . The optimization problem for the average user rates is formulated in Section 5, and the Gradient Scheduling Algorithm is dened in Section 6. The asymptotic optimality results (Theorem 1, Corollary 1, and Theorem 2), which formalize and strengthen property (P1), are presented in Section 7. Section 8 contains the main result on uniform convergence of uid sample paths (Theorem 3), which formalizes property (P2), and the process level convergence result (Theorem 4). In Section 9 we prove Theorem 3. Section 10 contains formal denition of uid samlpe paths, and proofs of their properties. Finally, in Section 11 we prove Theorems 1, 2, and 4. Section 12 contains concluding remarks. 2 Notations We will use standard notations R and R+ for the sets of real and real non-negative numbers, respectively. Corresponding N -times product spaces are denoted RN and R+N . The space RN is viewed as a standard vector-space, with elements x 2 RN being row-vectors x = 5 (x1 ; : : : ; xN ). The dot-product (scalar product) of x; y 2 RN , is x y =: and the norm of x is Xx y N n=1 n n ; p kxk =: x x : Sometimes we use notation (x; y ) =: kx y k for the distance between vectors x and y, and notation (x; V ) =: inf (x; y ) y2V for the distance between vector x and a set V RN . Suppose x is an N -dimensional vector with non-negative components xn, and V is a nite closed subset of RN . Then, arg max x v v2V denotes the subset of vectors v 2 V with the maximum value of x v. We do not exclude the case when xn = +1 for some n. For this case we use the convention that the product (+1) c is equal to 1, 0, and +1, if c is negative, zero, and positive, respectively. Thus, if xn = +1 and V 2 R+N , then any u 2 V with un > 0 belongs to arg maxv2V x v. For a scalar function h(t) of a real variable t, we use the following notations for the lower and upper right derivative number: h(t + dt) h(t) d+ h(t) = lim inf ; dt#0 dt dt d+ h(t + dt) h(t) h(t) = lim sup : dt dt dt#0 The abbreviation u.o.c. means uniform on compact sets convergence of functions. 3 Generalized Switch Model Generalized switch [14] is the following queueing system. There is a nite set N = f1; 2; : : : ; N g of queues (and corresponding \customer types") served by a switch. (We will use the same symbol N for both the set and its cardinality.) In this paper we assume that queues always have suÆcient \supply" of customers to serve. The system operates in discrete time t = 0; 1; 2; : : : . By convention, we will identify an (integer) time t with the unit time interval [t; t + 1), which will sometimes be referred to as the time slot t. 6 The switch has a nite set of switch states M . In each time slot, the switch is in one of the states m 2 M ; and the sequence of states m(t); t = 0; 1; 2; : : : , forms an irreducible (nite) Markov chain with stationary distribution fm; m 2 M g, X m = 1 : m > 0; 8m 2 M; m2M When switch is in state m 2 M , a nite number of scheduling decisions can be made, which form a nite set K (m); if a decision k 2 K (m) is chosen at time t, then the \amount" mn (k) 0 of \customers" of type n 2 N are served and depart the system at time t + 1. We will denote m(k) =: (m1 (k); : : : ; mN(k)) 2 R+N the corresponding vector of service rates, and make a non-degeneracy assumption that for any type n 2 N , there exist at least one state m 2 M and a decision k 2 K (m) such that mn(k) > 0. (The values of mn(k) are real numbers, not necessarily integers. For exposition purposes, we sometimes refer to mn(k) as \number of customers.") We will denote by the maximum of the values of mn(k) over all possible triples (n; m; k), and by > 0 the smallest strictly positive value of mn(k) over all (n; m; k). Remark. Formally, our model does not include those in [8, 1], primarily because the set of switch states M and the sets K (m) of scheduling decisions are nite. However, rst, the key result of this paper on the uniform convergence of uid sample paths (Theorem 3) is strictly more general than the corresponding results in [8, 1]. Secondly, the extension of our results, for example, to the case of ergodic m with arbitrary state space M and compact (uniformly bounded) sets K (m) (as in [1]) is straightforward. This would require only minor adjustments in formulations and proofs. We concentrate on a \nite" model to simplify the exposition. Note also that this nite model is common in applications and is a model not covered by the results of [8, 1]. (See also remark in the next section.) 4 System Rate Region Suppose, for each of its states m 2 M , a probability measure m = (mk ; k 2 K (m)), is P xed, which means that mk 0 for all k 2 K (m), and k mk = 1. Consider a Static Service Split (SSS) scheduling rule, parameterized by the set of measures : = (m ; m 2 M ). When the switch is in state m, the SSS rule chooses one of the scheduling decisions k 2 K (m) randomly with probability mk . Then, clearly, the long-term service rate allocated to queue n 2 N is equal to X m X m(k) : vn = mk n m2M Thus, k2K (m) v =: (v1 ; : : : ; vN ) = v () ; 7 where v() is the function of an SSS rule , dened by the above expressions. (Here and below, a measure dening an SSS rule is itself called an SSS rule.) The system rate region V is dened as the set of all service rate vectors v() corresponding to all possible SSS rules . In other words, V is the set of all feasible long-term service rate vectors. Obviously, V is a closed bounded convex set (and, moreover, a polyhedron) in R+N , as a linear image of the convex polyhedron of possible values of . Rate region V may turn out to be degenerate (i.e., have dimension less than N ). Remark. The fact that V is a polyhedron is a consequence of the fact that sets of switch states M and scheduling decisions K (m) in each state m are nite. However, our analysis only requires that V is convex - the polyhedral shape of V is not used. Moreover, the fact that V is allowed to be non-strictly convex is a generality not covered in the previous work ([8, 1]), because it causes uid sample paths to be generally non-smooth. 5 Optimization Problem for the Rate Allocation Consider the following optimization problem: max H (u) (5) subject to u2V ; (6) where H (u) is a strictly concave utility function. We will consider two types of utility function, dened as follows. Type (I) Utility Function. H (u) is continuous stictly concave function on R+N . Moreover, H (u) is continuously dierentiable, i.e. the gradient rH is nite and continuous everywhere in R+N . P Type (II) Utility Function. H (u) = n Hn (un ) where each Hn (un ) is strictly concave continuously dierentiable function, dened for all un > 0, and such that Hn(un) # 1 as un # 0. (The denition implies Hn0 (un ) " +1 as un # 0. We adopt the conventions that Hn (0) = 1 and Hn0 (0) = +1.) Remark 1. Type (I) utility function is more \benign." The system dynamics under the Gradient algorithm with such utility function is rather simple. Type (II) function has additive form, but the components are not bounded and have innite derivative at 0, which P somewhat complicates analysis. The Proportional Fair utility function n log(un), widely used in applications, is of type (II). This is one of the reasons why we do not want to exclude such functions from our analysis. We also note that all our results hold as is for more general type (II) functions, where some of the components Hn are allowed to be (more \benign") nite and continuously dierentiable everywhere on R+, including 0. Extensions of the analysis to this case will be transparent - we do not do it here to simplify the exposition. 8 Proposition 1 Suppose the utility function H is of either type (I) or type (II). Then, solution u to the problem (5)-(6) exists and is unique. Proof is trivial for type (I) utility function H , since V is a bounded convex compact set and H is strictly concave. For the type (II) function H , its domain needs to be rst restricted to a compact set V \ [; 1)N , with > 0 small enough so that the supremum of H is certainly attained within this set. Our denition of the utility function does not require that H is non-decreasing on each argument. So, the optimal point u is not necessarily a point on the outer (\northeast") boundary of V . Although, in many, if not most, applications of interest this would be the case. Remark 3. The assumption of strict concavity of utility function H is not essential for the results of this paper. If H is just concave, the set of optimal solutions to the problem (5)-(6) is a convex compact set u. Our key results and proofs in Sections 8 and 9 hold virtually verbatim, with u replaced by u. And so do the asymptotic optimality results in Section 7 (with the exception of the proof of Theorem 1 which would have to be slightly adjusted). We make the strict concavity assumption to simplify the exposition. Remark 2. 6 Gradient Scheduling Algorithm Now, consider the random process describing behavior of the Generalized Switch under the following scheduling algorithm which is called Gradient Algorithm (GA). Gradient Algorithm: If at time t switch is in state m, choose a scheduling decision k(t) 2 arg max [ r H (X (t))] m (k) ; k2K (m) where X (t) = (X1 (t); : : : ; XN (t)) is the vector of the average service rate estimates, which is updated as follows: X (t + 1) = (1 )X (t) + m(t) (k(t)) ; with > 0 being a (small) parameter. The initial vector X (0) 2 R+N is arbitrary. Let us denote S (t) =: (X (t); m(t)) : Our assumptions imply that S = fS (t); t = 0; 1; : : : g is a discrete time Markov chain with the state space R+N M . 9 7 Asymptotic Optimality of Gradient Algorithm This section presents the results showing that, as # 0, the average user service rates converge to the optimal solution u of problem (5)-(6). Moreover, this convergence is uniform (up to a time scaling by the factor of 1= ). First, we need to dene the asymptotic regime. From this point on in the paper, we consider a sequence of processes S , indexed by the value of parameter , with # 0 along a sequence B = fj ; j = 1; 2; : : : g such that j > 0 for all j . The initial state S (0) = (X (0); m (0)) is xed for each 2 B. (Here and below, the processes and variables pertaining to a xed parameter will be supplied the upper index . Expression # 0 means that converges to 0 along the sequence B, unless otherwise specied.) The probability law of the Markov chain m () describing the switch state process is the same for each . Let us denote by U (l1; l2) the vector of average service rates in the interval [l1 ; l2]. Namely, for a pair of integers 1 l1 l2, l X 1 : D (j ) ; U (l1 ; l2 ) = l2 l1 + 1 j =l where D (l) = m(l 1) (k(l 1)) the vector of numbers of customers departed at time l (with k(l 1) 2 K (m(l 1)) being the scheduling decision chosen in slot l 1). 2 1 Theorem 1 Let A be a bounded subset of R+N . Then, for any > 0 there exists T > 0 and T > 0 (both depending on and A), such that lim sup kEU (l1 ; l2) uk < : #0 X (0)2A; l1 >T =; l2 l1 >T = Corollary 1 Suppose for each we consider a stationary version of the process S , in which case ED (1) = EU (l1 ; l2 ) (for any integer 1 l1 l2 ) is simply the average service rate vector. Then, : lim ED (1) ! u #0 of Theorem 1 will be presented in Section 11. It is obtained using the following uniform convergence in probability result for X . Proof Theorem 2 Let A be a bounded subset of R+N . Then, for any > (depending on and A) such that lim sup P fkX (t) uk > g = 0 : #0 X (0)2A; t>T= Proof. Presented in Section 11. 10 0 there exists T > 0 8 System Dynamics when is Small We consider the sequence of systems introduced in the previous section, with parameter # 0. Let us extend the denition of X (t) to the continuous time t 2 R+ , by adopting a convention that X (t) is constant within each time slot [l; l + 1). (We use this convention throughout the paper, for all other discrete time processes, which are originally dened on the integers.) Then, for each we can consider the scaled process x (t) =: X (t= ); t 2 R+ : In this section we formulate our main results, describing asymptotic behavior of the sequence of processes fx g. To this end, rst consider the family of uid sample paths (FSP) x = (x(t); t 2 R+ ), which arise as possible limits of sample paths of the processes x as # 0 (given that the sample paths of the switch state process m satisfy the law of large numbers condition). The formal construction of the family of FSPs is presented in Section 10, and it will be proved (in Lemmas 2 and 3) that this family satises the following properties (i)-(iii). (i) \Basic dynamics." Each FSP x is a Lipschitz continuous function in [0; 1), with Lipschitz constant C + kx(0)k (where C > 0 is a xed constant depending only on system parameters), and satisfying the following basic dierential inclusion for almost all t 2 R+: x0 (t) = v (t) x(t); v (t) 2 arg max[rH (x(t))] v : (7) v2V (Points t for which (7) holds will be called regular.) (ii) \Boundary condition." Suppose the utility function H is of type (II). Suppose xi (t) = 0 for i 2 B N and xi(t) > 0 for i 2 N n B . Then X d+ x (t) c > 0 ; dt i2B i where c > 0 is a xed constant depending only on the system parameters. (iii) \Compactness." If a sequence of FSP x(j) ! x u.o.c. as j ! 1, then this x is also an FSP. The following uniform convergence theorem is the key result of this paper. Theorem 3 Suppose H (x) is a utility function of either type (I) or type (II). Then, uid sample paths x have the following convergence property. For any bounded subset A R+N , x(t) ! u ; t ! 1; uniformly on x(0) 2 A. 11 Proof is presented in the next section. It is easy to observe that Theorem 3 is a \deterministic analog" of Theorem 2 of the previous section. In fact the proof of Theorem 2 follows from Theorem 3 and the following process convergence result. Theorem 4 Consider the sequence of processes fx g with # 0, such that x (0) ! x(0), where x(0) 2 R+N is a xed vector. Then, the sequence fx g is relatively compact and any weak limit of this sequence (i.e., a process obtained as a weak limit of a subsequence of fx g) is a process with sample paths being FSPs with probability 1. The processes x are processes in the Skorohod space DRN [0; 1) of rightcontinuous left-limits (RCLL) functions, taking values in the space RN (with standard topology and Borel -algebra on RN ). (See [4] for the denition.) Remark 2. Theorem 4 is analogous to the corresponding process convergence results in [8, 1]. Note however that Theorem 4 is stronger is the sense that it claims that the limiting process is concentrated on the family of FSPs, which is \narrower" than simply the family of solutions of the dierential inclusion (7). This subtlety is important because when we use this theorem, we need all properties of FSPs, not just property (i). This also eects the proof. Proof is in Section 11. Remark 1. 9 Proof of Theorem 3 In this proof, without loss of generality, we assume that the set A is closed, and therefore compact. 9.1 Uniform Convergence to the Set V The following lemma formalizes a simple, but nevertheless important, general observation that if velocity x0 (t) is always a vector from x(t) to an arbitrary point within a convex set V , then the distance from x(t) to V can only decrease. Note, that in the dierential inclusion (8) in Lemma 1 it is only required that v(t) 2 V . (There is no condition v(t) 2 arg max[rH (x(t))] v.) Lemma 1 Suppose V is a convex bounded closed subset of RN . Suppose a vector-function (x(t); t 0) is Lipschitz continuous, satisfying the following dierential inclusion for almost all t 0: x0 (t) = v (t) x(t); v (t) 2 V : (8) 12 Then, (x(t); V ) is a Lipschitz continuous non-increasing function, and moreover, for almost all t 0, d (x(t); V ) (x(t); V ) ; dt which implies that (9) (x(t); V ) (x(0); V )e t : As we see, under the conditions of this lemma, functions satisfying (8) converge to set V uniformly on the initial states from a bounded set. For ease of reference we record this fact as a Corollary 2 below. For 0, let us denote by V, the -thickening of a subset V RN , namely V =: fy 2 RN j (y; V ) g Corollary 2 Let arbitrary bounded set A RN and arbitrary > 0 be xed. Then there exists T1 = T1 (; A) such that for any (x(t); t 0) satisfying conditions of Lemma 1 and x(0) 2 A, x(t) 2 V; 8t T1 : Lipschitz continuity of (x(t); V ) follows from the fact that x(t) is Lipschitz continuous. To prove the rest of the statement, it suÆces to show that (9) holds for t = , where > 0 is a regular point such that (x(); V ) > 0. Consider the point which is the (unique) point of V closest to x(), i.e., (x(); ) = (x(); V ). (See Figure 1.) Consider vectors 1 = x(); 2 = v () ; and = v () x() = 1 + 2 : Consider auxiliary linear function y (t) = + 2 (t ); t ; and note that y(t) 2 V for all suÆciently small increments (t ) 0. Then, we can write Proof of Lemma 1. d d+ (x(t); V ) (x(t); y (t)) = dt dt k1k = (x(); V ) : 13 v2 v () 2 y (t) V x() 1 v1 Figure 1: Proof of Lemma 1. 9.2 Proof of Theorem 3 for Type (I) Utility Function This proof uses only the basic dynamics property (i) of the FSPs. Let us x arbitrary Æ > 0. Then x > 0: small enough so that h < H (u), where h is the maximum of H (x) over the subset UÆ; = V n OÆ (u). Using concavity and smoothness of H (x), it is easy to verify that at any point x 2 UÆ; , the directional derivative of H (x) in the direction of point u is at least (H (u) h )=Æ, namely (10) [rH (x)] (u x) H (u Æ) h ku xk H (u) h : By Corollary 2 (from Lemma 1), there exists T1 such that x(t) 2 V for all t T1 . Then, for any regular point t T1 such that x 2 UÆ; we have: d H (x(t)) = [rH (x(t))] (v (t) x) [rH (x(t))] (u x) H (u ) h : dt Since H (u) h > 0, we see that there exists a constant T2 T1 such that (uniformly on x(0) 2 A), t2 = minft T1 j x(t) 2 O Æ (u )g T2 ; i.e. x(t) must \hit" the closed neighborhood OÆ (u) no later than at time T2 . For all t t2, we must have H (x(t)) h, where h is the minimum of H (x) over x 2 V \ O Æ (u ). (This is because, for t T1 , H (x(t)) can not decrease outside of O Æ (u ).) Since our Æ > 0 could be chosen arbitrarily small, we observe that (uniformly on x(0) 2 A) lim inf H (x(t)) = H (u) : (11) t!1 Finally, there exists T3 T2 such that (uniformly on x(0) 2 A) for all t T3 x(t) 2 V \ O Æ (u ) ; 14 because otherwise (11) would not hold. 9.3 Proof of Theorem 3 for Type (II) Utility Function This proof uses all three properties (i)-(iii) of the FSPs. As in the proof for type (I) utility function, let us x arbitrary Æ > 0, and then x > :0 small enough so that h < H (u), where h is the maximum of H (x) over the subset UÆ; = V n OÆ (u ). By Corollary 2 (from Lemma 1), there exist T10 such that x(t) 2 V for all t T10 . (The above statement, and other statements in this proof, hold uniformly on x(0) 2 A, unless specied otherwise.) The rest of the proof consists of a sequence of propositions. Proposition 2 At any regular point t 0, we have xi (t) > 0 for all i. Suppose not. Consider the subset B N of types i such that xi(t) = 0. Since t is a regular point, we must have x0i (P t) = 0. (Otherwise xi () would be negative just before or just after time t.) Consequently, i2B x0i(t) = 0. This however contradicts to the property (ii) of FSPs (proved later in Lemma 3 (ii)). Proof. Proposition 3 For all t > T10 , we have xi (t) > 0 for all i. We can choose a regular point t0 > T10 , arbitrarily close to T10 . We have H (x(t0)) > 1, because all xi (t01) > 0. Similarly to the way it is established in the proof for type (I) utility function, we see that H (x(t)) has strictly positive derivative at any regular point t 2 [T10 ; 1) such that H (x(t)) > 1 and x(t) 2 UÆ;. This implies that H (x(t)) > 1 for all t t01. Proof. Proposition 4 For any T1 > T10 , there exists 1 > 0 such that we have xi (T1 ) 1 for all i. Suppose not. Then there exists a sequence of FSPs (with initial states within the compact set A) converging u.o.c. to a function x = (x(t); t 0) such that for at least one i, xi (T1 ) = 0. By property (iii) of FSPs (proved later in Lemma 3 (iii)), this function x is also an FSP with x(0) 2 A. This contradicts to Proposition 3. Proof. Proposition 5 For any T1 > T10 , there exists > 0 such that we have xi (t) for all i and all t T1 . 15 Let us x arbitrary T1 > T10 and choose a small 1 > 0 such that the statement of Proposition 4 holds. Without loss of generality we can assume that 1 is small enough so that H (1 ; : : : ; 1 ) < min H (x) : x2O Æ Then, again using the fact that H (x(t)) has strictly positive derivative at any regular point t 2 [T10 ; 1) such that H (x(t)) > 1 and x(t) 2 UÆ;, it is easy to see that H (x(t)) H (1 ; : : : ; 1 ); 8t T1 : This easily implies that all xi(t) must stay separated from 0 in [T1 ; 1). Proof. Thus we have proved that there exists T1 > 0 such that for all t T1 , x(t) 2 V; =: V \fxi ; 8ig. Note that H (x) is bounded on V; . Given this fact, the rest of the proof is virtually identical to the proof for the type (I) utility function. (Namely, the choice of constants T2 and T3, and the proofs of the corresponding properties repeat those in the proof for type (I) utility function verbatim.) Proof of Theorem 3 for type (II) function H is complete. 10 Fluid Sample Paths under the Gradient Algorithm Let us dene some additional random functions associated with the system for each value of the scaling parameter . These functions will be dened for a continuous time t. (Recall our convention that functions originally dened on the integers l are assumed to be constant within corresponding time slots [l; l + 1).) Let btc ^Fn (t) =: X Dn (l) l=1 denote the number of type n customers that were served by time t 0 (i.e. in the interval [0; t]), where Dn (l) = mn (l 1) (k(l 1)) is the number of type n customers served in slot l 1 (with k(l 1) 2 K (m(l 1)) being the scheduling decision chosen in slot l 1). Denote by Gm (t) the total number of time slots by (and including) time t 1, when the server was in state m; and by G^ mk (t) the number of time slots by (and including) time t 1 when the server state was m and the scheduling decision k 2 K (m) was chosen. Obviously, for any n 2 N , m 2 M , and k 2 K (m), we have: F^n (0) = 0; Gm (0) = 0; G^ mk (0) = 0 ; and we have the following relations: X G^ (t); t 0; Gm (t) = mk k2K (m) 16 F^n (t) = X X m2M k2K (m) mn (k)G^ mk (t); t 0: Let Xn (t) 0 is the value of variable Xn at time t. Recall that for all integer l 1, Xn (l) = Dn (l) + (1 )Xn (l 1) : (12) Also, recall that the probability law of the Markov chain m () describing the switch state process is the same for each . We dene the process Z , describing system evolution: Z = (X ; F^ ; G ; G^ ) ; where X = (X (t) = (X1 (t); : : : ; XN (t)); t 0); F^ = (F^ (t) = (F^1 (t); : : : ; F^N (t)); t 0); G = ((Gm (t); m 2 M ); t 0) ; G^ = ((G^ mk (t); m 2 M; k 2 K (m)); t 0): In this section we study the sequence of processes Z under uid limit scaling. To be precise, we will only need to consider sample paths of the processes under this scaling, and their limits. For each consider the scaled process z = (x ; f^ ; g ; g^ ) ; where x (t) =: X (t= ) is obtained: from X by time scaling only, and the other components by time and space scaling, f^ (t) = F^ (t= ) and similarly for g and g^ . Thus, the component functions of z are piece-wise constant, with a \time slot" of the length 1= . ^ g; g^) we will call a uid sample path (FSP) if Denition. A xed set of functions z = (x; f; there exists a sequence B0 of positive values of , such that # 0, and a sequence of sample paths (of the corresponding processes) fz g such that ( as # 0 along sequence B0 ) z ! z; u:o:c: ; and in addition kx(0)k < 1 ; (gm (t); t 0) ! (mt; t 0) 17 u:o:c: ; 8m 2 M: (13) It follows directly from the above denition that any FSP satises the following properties: gm (t) = m t ; t 0; m 2 M ; X g^ (t); t 0; m 2 M ; gm (t) = mk 2 X X f^(t) = k K (m) m2M k2K (m) m (k)^gmk (t); t 0: A sequence B0 whose existence is required in the above denition may be completely unrelated to the sequence B we introduced earlier. The following two lemmas establish some properties of uid sample paths. We note that these properties may not characterize the family of FSPs completely, but they will suÆce to establish the desired convergence properties. Remark. Lemma 2 For any uid sample path z , all its component functions are Lipschitz continuous in [0; 1), with the Lipschitz constant upper bounded by C + kx(0)k, where C > 0 is a xed constant depending only on the system parameters. Consider a xed FSP z and a sequence of paths z \dening it." For each , Z is the \unscaled" path from which z is obtained. To prove Lipschitz property of z, let us recall the Xn () update equation (12): Xn (l) = Dn (l) + (1 )Xn (l 1) : (14) Applying this equation iteratively for l = 1; 2; : : : , we obtain: Xn (l) = X~ n (l) + (1 )l Xn (0) ; (15) where Proof. X (1 X~ n (l) =: l 1 )j Dn (l j =0 We see that for any l 1 X~ n (l) = max mn (k) ; n;m;k j) : (16) (17) because X~n (l) is simply the weighted sum of the values of Dn (0); : : : ; Dn (l) (with positive weights summing up to at most 1), and each of those values is of course bounded above by . We can notice (for future reference) that expressions (15)-(17) imply that Xn (l) maxf; Xn (0)g; 8l 0 : (18) 18 Let us rewrite (14) as follows: Xn (l) Xn (l 1) = Dn (l) Xn (l 1) : (19) We see that jXn (l) Xn (l 1)j (Dn (l) + X~n (l 1) + Xn (0)) (2 + Xn (0))) : (20) All other components of the process Z are non-decreasing and we trivially have for all integer l 1 (and any n, m, k): F^n (l) F^n (l 1) ; Gm (l) Gm (l 1) 1 ; G^ mk (l) G^ mk (l 1) 1 : These relations along with (20) imply that all components of an FSP z are Lipschitz in [0; 1) with the specied bound on the Lipschitz constant. Since all component functions of an FSP are Lipschitz, they are absolutely continuous, and therefore almost all points t 2 R+ (with respect to Lebesgue measure) are such that all component functions of z have derivatives. We will call such points regular. In the rest of the paper, when we write an expression containing derivatives of FSP components at time t, we always mean that it holds under the additional condition that t is regular, even if we do not state this condition explicitly. Lemma 3 The family of FSPs z satises the following additional properties: (i) For almost all t 0 (with respect to Lebesgue measure) we have: x0 (t) = v (t) x(t); where v (t) =: (d=dt)f^(t) = X X m2M k2K (m) 0 (t)m (k) g^mk (21) (22) satises the condition v (t) 2 arg max(rH (x(t))) v : v 2V (23) (ii) Suppose the utility function H is of type (II). Suppose, for some subset B N , xi (t) = 0 if i 2 B and xi (t) > 0 if i 2 N n B . Then X d+ x (t) c > 0 ; dt i2B i where c > 0 is a xed constant depending only on the system parameters. (iii) If a sequence of FSPs z (j ) ! z u.o.c. as j ! 1, then this z is also an FSP. 19 As in the proof of Lemma 3, consider a xed FSP z and a sequence of paths z \dening it," along with their \unscaled" versions Z . To prove properties (i) and (ii) let us rst derive the basic integral equation for x(). (Equation (26) below.) The derivation of this equation is same as in [8, 1] - we present it for completeness. Let us sum up equations (19) for l = 1; : : : ; j . We obtain Proof. Xn (j ) Xn X X X (l (0) = D (l) j j n l=1 l=1 n 1) : (24) Switching to scaled processes gives, for all integer j 0, xn (j ) xn (0) = f^n (j ) Z l 0 xn ( )d : (25) Using the fact that all limiting functions xn and f^n are Lipschitz continuous and we have u.o.c. convergence z ! z, we nally obtain the desired integral equation xn (t) xn (0) = f^n (t) Z 0 t xn ( )d ; t 2 R+ : (26) Since both x() and f^() are Lipschitz continuous, the equation (21) (for every regular point t 0) follows. Now, let us prove property (ii). (After that we will prove (23), i.e., the rest of property (i)). Suppose B N . (The case B = N is simpler. The proof for this case is a simplied version of the proof for B N .) We can make the following observation, which follows from the continuity of x() and properties of the component utility functions Hi(). There exist suÆciently small xed > 0 and > 1 (both depending on the values xn (t) for n 2 N n B ) such that for any 2 [t; t + ], any i 2 B and any n 2 N n B , Hi0 (xi ( )) > 0 and Hi0 (xi ( )) N : jHn0 (xn( ))j (We remind that denotes the smallest positive value of mn (k) over all n; m; k.) This means that if we consider the \unscaled" paths Z , then for all suÆciently small and any (positive integer) time slot l within interval [t=; t= + = ] we have the following property: if m = m(l 1) is such that m i (k ) > 0 for at least one pair of k 2 K (m) and i 2 B , then X D(l) : i i2B 20 Let us pick a state m such that mi(k) > 0 for at least one i 2 B . (Such a state exists.) By the denition of a uid sample path, the fraction of time in the interval [t=; t= += ] the switch state is m converges to its stationary probability m. This easily implies d+ X d X^ xi (t) = + fi (t) m > 0 : dt dt i2B i2B Since there is only a nite number of subsets B N , the property (ii) has been proved. To prove (23), we rst notice that in the case of type (II) utility function, at any regular point t, we must have xn (t) > 0 for all n 2 N . Otherwise (since t is regular) we would have x0n(t) = 0 for any n with xn(t) = 0, which is impossible due to property (ii). Thus, if t is regular, then (for any utility function type), the gradient rH (x()) is continuous in a neighborhood of point t. Given this, the proof of (23) is analogous to the proof of Lemma 4(ii) in [14]. Namely, due to the continuity of the gradient rH (x()) at time t, the following observation is true. For any > 0, there exists a suÆsiently small > 0, such that for all \unscaled" paths Z , with suÆciently small , we have the following property. For any (positive integer) time slot l within interval [t= =; t= + = ], krH (X (l 1)) D (l) k2max rH (x(t)) m(l 1) (k)k : m(l 1) This easily implies that krH (x(t)) v(t) max rH (x(t)) vk ; v 2V and since can be arbitrarily small, rH (x(t)) v(t) = maxv2V rH (x(t)) v, which completes the proof of (23) and with it the proof of property (i). The compactness property (iii) of the family of FSPs is analogous to the compactness properties described in Lemmas 5.1-5.3 in [13]. It follows directly from the construction of an FSP as a limit. Namely, for each FSP z(j) consider a \dening" sequence z(j); ; ! 0; such that z(j); ! z(j) u.o.c. For each j , consider z(j);j which is within distance j > 0 from z(j) in the uniform metric in the interval [0; j ], where j # 0. Then, it is easy to see that the sequence z(j);j denes FSP z. 11 11.1 Proofs of Theorems 4, 2, and 1 Proof of Theorem 4 It suÆces to prove a more general fact that any weak limit of the sequence of processes z is a process z concentrated on FSPs with probability 1. (In this case all processes are in the 21 Skorohod space DRL [0; 1), where L is the total number of scalar component functions of z .) Each sequence fgm g satises the law of large numbers, namely gm (t2 ) gm (t1 ) ! m (t2 t1 ) in probability 8 0 t1 t2 ; (27) and the sequence fz g is \asymptotically Lipschitz", namely, P fkz (t2 ) z (t1 )k C (t2 t1 ) + g ! 1; 8 0 t1 t2 ; > 0 ; for some xed C > 0. Using the above two facts, the proof is completely analogous to the proof of Theorem 7.1 in [13]. We note that, alternatively, the proof can also be obtained more directly using Skorohod representation (see [4]). Namely, asymptotic Lipschitz property implies that the sequence fz g is relatively compact. Then, any converging subsequence can be constructed on a probability space such that the convergence is with probability 1. Then, since property (27) holds, the property (13) holds with probability 1. This, by the denition of an FSP, implies that z converges to an FSP with probability 1. 11.2 Proof of Theorem 2 First of all, let us choose a > 0 to be xed and large enough so that a > , V [0; a]N , and A [0; a]N . Then, without loss of generality, it will suÆce to prove the theorem for the set A rechosen to be A = [0; a]N . For a given > 0 and Æ > 0, by Theorem 3, we can choose T > 0 (depending on A and ), so that for any FSP with x(0) 2 A, we have sup kx(T ) uk =2 : t2[T;T +Æ] Using this, Theorem 4, and the continuous mapping theorem (see [3]), it is easy to obtain lim sup P f sup kx (t) uk > g = 0 : (28) #0 x (0)2A t2[T;T +Æ] Note, however that (by (18)), xn(t) remains bounded by the maximum of and xn (0) for all t. This means that x (t) 2 A for all t and all . Applying (28) to the processes restarted at times 0, we can rewrite (28) in a stronger form lim sup sup P f sup kx (t) uk > g = 0 ; #0 0 x (0)2A which implies the desired result. t2[ +T; +T +Æ] 22 11.3 Proof of Theorem 1 Both U (l1 ; l2) and X (l2) are dierently computed averages of the values of D (l): the former one is the average over l1 l l2 with equal weights 1=(l2 l1 + 1), and the latter one is (roughly) the average of D (l2 j ) with \exponential" weights (1 )j . We know from Theorem 2 that EX (l) is close to u for large l, and therefore, when l1 is large, X EX (l) 1 (29) l2 l1 + 1 l ll is close to u, uniformly on l2 l1 . In this proof, we show that, when both l1 and l2 l1 are suÆciently large, the expected value EU (l1 ; l2) is close to (29). Let us choose a > 0 the same way as in the proof of Theorem 2. So, without loss of generality we can assume that A = [0; a]N . Then, again as observed in: the p proof of Theorem 2, for all l 0 and all , X (l) 2 A, and consequently kX (l)k b = Na. Let us x (small) > 0. Theorem 2 along with the fact that the values of X (l) are uniformly bounded implies that we can choose T > 0 such that for all suÆciently small , uniformly on l T= , we have kEX (l) uk < : (30) Without loss of generality, we can always choose T to be large enough so that e T is arbitrarily small. We note that X (1 )j = e T + O() ; 1 2 j T = where (here and below) O( ) denotes a term with absolute value bounded above by C as ! 0, with C > 0 being a xed constant. In what follows, l1 , l2 are functions of , such that l1 T= , l2 l1 T 0= , and T 0 > T . : Also, we will use a simplied notation d(p) = ED (p). Then, we can write X EX (l) = 1 Q =: (31) l2 l1 + 1 l ll 1 1 X X (1 l2 T 0 = l 1 l=l1 j =0 2 )j d(l j ) + O( ) = (32) Q1 + Q2 + Q3 + O( ) ; where Q1 , Q1 , and Q3 , break down the summation in (32), into the summations over X X l2 l=l1 0j<l l1 ; X X l1 l<l1 +T= l l1 j l 1 23 ; X X l1 +T= ll2 l l1 j l 1 ; respectively. Since we have kd(p)k b, we obtain the following estimates kQ2k bT=T 0 + O( ) ; kQ3k be T (T 0 T )=T 0 + O( ) : For Q1, by changing summation indexes from (l; j ) to (p = l l lXp X 1 Q1 = 0 (1 T = 2 2 p=l1 j =0 1 1 XX (1 l2 where )j d(p) = )j d(p) Q4 T 0 = p=l1 j =0 EU (l1 ; l2 ) + O( ) Q4 1 X 1 X Q4 = 0 T = l1 p<l2 Q5 = 0 T = l2 T= T= pl2 X (1 j>l2 p X (1 j>l2 p j; j ), we can write: Q5 = Q5 ; )j d(p) ; )j d(p) ; kQ4 k be T (T 0 T )=T 0 + O( ) ; kQ5k bT=T 0 + O( ) : We see that kQ EU (l1; l2)k kQ2k + kQ3 k + kQ4k + kQ5 k + O( ) 2be T (T 0 T )=T 0 +2bT=T 0 + O( ) : Thus, if we x T large enough so that e T is small, and then choose T such that T=T is small, then we have kQ EU (l1; l2 )k for all suÆciently small , uniformly on T 0 > T . Since we also know that kQ uk , the result follows. 12 Conclusions In this paper we consider the problem of optimal throughput allocation to multiple users in a generalized switch model, with the objective of maximizing a concave utility function. We prove asymptotic optimality of the Gradient scheduling algorithm. Our model is general enough to cover a wide class of applications of interest, including those not covered in the previous work. Most notably, the model allows multiple users to be served simultaneously and allows discrete sets of scheduling decisions. We also allow very general utility functions. The analysis of transient dynamics of user throughputs under Gradient algorithm is the key problem and the main part of this work. Our \geometric" approach to this problem is inherited from [14], and may be of independent interest. 24 References [1] R. Agrawal, V. Subramanian. Optimality of Certain Channel Aware Scheduling Policies. In Proc. of the 40th Annual Allerton Conference on Communication, Control, and Computing. Monticello, Illinois, USA, October 2002. [2] P.Bender, P.Black, M.Grob, R.Padovani, N.Sindhushayana, A.Viterbi. CDMA/HDR: A Bandwidth EÆcient High Speed Wireless Data Service for Nomadic Users. IEEE Communications Magazine, July 2000. [3] P.Billingsley. Convergence of Probability Measures. John Wiley and Sons, New York, 1968. [4] S. N. Ethier and T. G. Kurtz. Markov Process: Characterization and Convergence. John Wiley and Sons, New York, 1986. [5] N. Gans and G. van Ryzin. Optimal Dynamic Scheduling of a General Class of ParallelProcessing Queueing Systems. Adv. Appl. Prob., Vol. 30, (1998), pp. 1130-1156. [6] J. M. Harrison. Heavy TraÆc Analysis of a System with Parallel Servers: Asymptotic Optimality of Discrete Review Policies. Annals of Applied Probability, Vol. 8, (1998), pp. 822-848. [7] A.Jalali, R.Padovani, and R.Pankaj. Data Throughput of CDMA-HDR, a High EÆciency - High Data Rate Personal Communication Wireless System. In Proc. of the IEEE Semiannual Vehicular Technology Conference, VTC2000-Spring, Tokyo, Japan, May 2000. [8] H. Kushner, P. Whiting. Asymptotic Properties of Proportional Fair Sharing Algorithms. In Proc. of the 40th Annual Allerton Conference on Communication, Control, and Computing. Monticello, Illinois, USA, October 2002. [9] X.Liu, E.K.P.Chong, N.B.Shro. Opportunistic Transmission Scheduling with ResourceSharing Constraints in Wireless Networks. IEEE Journal on Selected Areas in Communications, 19(10), 2001. [10] X.Liu, E.K.P.Chong, N.B.Shro. A Framework for Opportunistic Scheduling in Wireless Networks. Computer Networks. (To appear.) [11] N.McKeown, V.Anantharam, and J.Walrand. Achieving 100% Throughput in an InputQueued Switch. Proceedings of the INFOCOM'96, 1996, pp. 296-302. [12] A.Mekkittikul and N.McKeown. A Starvation Free Algorithm for Achieving 100% Throughput in an Input-Queued Switch. Proceedings of the ICCCN'96, 1996, pp. 226-231. [13] A.L. Stolyar. On the Stability of Multiclass Queueing Networks: A Relaxed SuÆcient Condition via Limiting Fluid Processes. Markov Processes and Related Fields, 1(4), 1995, pp.491-512. 25 [14] A. L. Stolyar. MaxWeight Scheduling in a Generalized Switch: State Space Collapse and Equivalent Workload Minimization under Complete Resource Pooling. 2001. (Submitted) [15] L.Tassiulas, A.Ephremides. Stability Properties of Constrained Queueing Systems and Scheduling Policies for Maximum Throughput in Multihop Radio Networks. IEEE Transactions on Automatic Control, Vol. 37, (1992), pp. 1936-1948. [16] L.Tassiulas, A.Ephremides. Dynamic Server Allocation to Parallel Queues with Randomly Varying Connectivity. IEEE Transactions on Information Theory, Vol. 39, (1993), pp. 466-478. [17] D. Tse. Multiuser Diversity and Proportionally Fair Scheduling. In preparation. [18] H. Viswanathan, S. Venkatesan, and H. C. Huang. Downlink Capacity Evaluation of Cellular Networks with Known Interference Cancellation. 2002. To appear in IEEE Journal on Selected Areas on Communications. [19] R.J.Williams. On Dynamic Scheduling of a Parallel Server System with Complete Resource Pooling. Fields Institute Communications, Vol. 28, American Mathematical Society, Providence, RI, 2000, pp. 49-71. 26
© Copyright 2026 Paperzz