Dynamic Admission and Service Rate Control of a Queue


Kranthi Mitra Adusumilli and John J. Hasenbein¹
Graduate Program in Operations Research and Industrial Engineering
Department of Mechanical Engineering
University of Texas at Austin, Austin, Texas
{akranthi@gmail.com, jhas@mail.utexas.edu}

Abstract

This paper investigates a queueing system in which the controller can perform admission and service rate control. In particular, we examine a single-server queueing system with Poisson arrivals and exponentially distributed services with adjustable rates. At each decision epoch the controller may adjust the service rate. Also, the controller can reject incoming customers as they arrive. The objective is to minimize long-run average costs, which include: a holding cost, which is a non-decreasing function of the number of jobs in the system; a service rate cost c(x), representing the cost per unit time for servicing jobs at rate x; and a rejection cost κ for rejecting a single job. From basic principles, we derive a simple, efficient algorithm for computing the optimal policy. Our algorithm also provides an easily computable bound on the optimality gap at every step. Finally, we demonstrate that, in the class of stationary policies, deterministic stationary policies are optimal for this problem.

April 13,

1 Introduction

This paper investigates a joint admission and service rate control problem for a single-server queue with Poisson arrivals and exponential service times. The controller is allowed to choose a service rate in each state and also whether to outsource (or reject) arriving jobs in each state. Thus the model allows service rate control in addition to a form of admission control. There is a cost function associated with the available service rates, in addition to a rejection cost and a general holding cost. The controller's objective is to minimize the long-run average cost.
There is a rich literature associated with this model and similar models, but the primary motivation for this work derives from an elegant analysis of the same problem, without admission control, performed in George and Harrison [4]. That paper specifically left open the problem of extending the model to incorporate admission control. Furthermore, [4] assumes that the optimal control is in the class of stationary deterministic control policies. In this paper, we extend their model to allow admission control (which also removes a technical

¹ Research supported in part by National Science Foundation grant DMI

condition on the service rate cost function) and we prove that the optimal control policy is deterministic, within the class of stationary policies. The problem of controlling a queue with service rate or admission decisions has been addressed by many authors in various forms. Apart from [4], the most closely related paper is one by Stidham and Weber [10], who provided a uniform method for proving monotonicity of optimal rates in a variety of one-station queueing problems. There are two primary differences between their model and ours. First, they did not explicitly consider the case of admission control, in the sense that customers can be rejected. Although they allow arrival rate control, this is not necessarily equivalent to the problem with admission control. Second, they require the action space to be compact, an assumption not imposed in our model. Koole [7] also provided a framework, called event-based dynamic programming, to prove monotonicity results in queueing control problems of the type analyzed here. Koole's framework applies most directly to finite-horizon discounted cost problems, and it is possible that those results could be used to prove monotonicity results for the finite-horizon version of our problem. However, extending the technique to particular infinite-horizon average cost problems requires verifying technical side conditions. Furthermore, Koole's technique seems to rely on a uniformization, which would not apply in our model. The books of Sennott [9] and Puterman [8] also contain discussion and background on admission and rate control problems in queueing models. To our knowledge, the combined problem studied in this paper has not been analyzed previously. In particular, we allow the set of available service rates to be non-compact, which is different from the action space assumption in many standard Markov decision process (MDP) models, such as those in [2, 8, 9].
For example, in [5], Guo and Hernández-Lerma consider the overall action space to be unbounded; however, for each state, the action space is bounded. In our model, as in [4], the action space is allowed to be unbounded in each state. Ata and Shneorson [1], which extends the model of [4] to include capacity constraints, also consider an unbounded action space. There are three main contributions in this paper. First, we construct a simple, efficient iterative algorithm for solving the joint service rate and admission control problem for a single-server queue. Although the system is relatively simple, the model is quite general in that there are minimal assumptions on the service rate and holding cost functions (see the next section for details). Furthermore, each step of the algorithm provides an immediately computable upper bound on the optimality gap. If the optimal policy is to reject customers at a system threshold level n, then the algorithm terminates when the truncation scheme reaches level n. If it is not optimal to apply admission control, then the algorithm does not terminate, but converges to the optimal control policy as the truncation level goes to infinity. Hence, in this case, the bound on the optimality gap is useful when applying a stopping criterion. Although our approach is similar in spirit to the algorithm in [4], the iterative approximation scheme is different. In particular, our algorithm truncates the state space, whereas the algorithm in [4] truncates the holding costs. The second contribution comes as a byproduct of the computational development, in which we also prove that the optimal service rates are monotone in the system state. The third contribution is to prove that deterministic policies

are optimal within the class of stationary policies. It remains to show that one can restrict the search for optimal policies to the class of stationary policies. Stidham and Weber [10] are able to show that this is the case when the action space is compact, and their analysis relies crucially on this assumption. Hernández-Lerma and Lasserre [6] provide more general results on the optimality of stationary policies for MDP models with unbounded action spaces. However, that book deals only with discrete-time Markov decision processes. Because our action space is unbounded, so are the transition rates, and thus uniformization cannot be invoked to convert our problem into an equivalent discrete-time problem. The rest of this paper is organized as follows. In Section 2 we introduce the control model. Section 3 presents the optimality equations and an associated verification theorem. Also, modified optimality equations are provided for a system with a fixed rejection threshold. The computational algorithm is presented in Section 4, and numerical examples are given in Section 5. Finally, in Section 6 we prove that the optimal control policy is stationary.

2 The Control Model

Our model consists of a single server with an adjustable service rate. The service time of a customer being served at rate x > 0 is exponentially distributed with mean 1/x. Arrivals occur according to a Poisson process, and without loss of generality we take the rate of this process to be 1. The system manager can change the service rate at any time. Further, an arrival can be denied admission by the system manager. There are costs associated both with providing service at a particular rate x and with denying admission. These costs, along with a holding cost, comprise the total system cost. The objective of the system manager is to minimize the long-run average cost per unit time. For practical applications, the rejection cost can be viewed as a cost to outsource jobs.
It is assumed that all cost functions are known to the system manager. Let the cost per unit time to serve at rate x be c(x), and let h_n be the cost per unit time to hold n customers in the system. In general, there can be a non-negative holding cost h_0 incurred even when no customers are present in the system. The cost incurred for rejecting a customer is κ. One can also view the rejection of jobs as providing instantaneous service. As will be seen below, this view is useful when considering the technical conditions on the cost of service function c(·). In summary, the system manager's decision policy consists of determining the service rates and admission policy at any instant of time, with the objective of minimizing the long-run average cost of operating the system. Under such an objective function, standard arguments show that the decision time points can be restricted to times when arrivals or departures occur. Thus, the control problem can be embedded in the framework of a continuous-time Markov decision process with a countable state space. In Section 6 we show that one need only consider the set of stationary, state-dependent controls. We now discuss the technical assumptions in more detail. We make the following assumptions on the action space and system costs:

(A1) The action space A is a closed subset of [0, ∞) containing 0 and an element greater

than 1.

(A2) The holding cost h_n is non-decreasing in n.

(A3) The cost for service function c(x) is non-decreasing in x.

(A4) c(·) is continuous on (0, ∞) (and right-continuous at 0) with c(0) = 0.

These assumptions are relatively weak and are essentially the minimal assumptions required to avoid pathological cases. They are also identical to the assumptions that appear in the motivating paper of George and Harrison [4], with some exceptions. First, George and Harrison require only left-continuity of c(·). Our computational approach requires the additional assumption of right-continuity. Second, they also require a geometric growth condition on the holding costs (see equation (2) in [4]). This condition was imposed primarily in order to prove convergence of optimal values under their algorithmic method, which involves truncating holding costs. Since our truncation method is different, we do not require this geometric growth condition. Finally, and most importantly, in [4] the following additional condition is imposed on the cost function when the action space A is unbounded:

    lim_{y→∞} ( inf { c(x)/x : x ∈ A, x ≥ y } ) = ∞.

If the limit above is finite, then the control model must take into account a possible additional mode of control: instantaneous service of a customer, for a cost equal to the limit. George and Harrison wished to exclude that mode of control from their analysis. We allow this mode of control, and thus the relaxed assumption below is imposed:

(A5) lim_{y→∞} ( inf { c(x)/x : x ∈ A, x ≥ y } ) ≥ κ ≥ 0.

Recall that κ is the cost to reject (or outsource) a customer. Assumption A5 implies that the cost for service per unit time is greater than or equal to the rejection cost, as the service rate grows without bound. When A5 is not satisfied, the rejection option can be ignored, since serving at some large service rate is always more beneficial than rejecting the customer. A5 is similar in form to the condition for "near monotone" costs discussed in Borkar and Meyn [3].
When this condition holds, they are able to show existence of optimal policies under the so-called risk-sensitive cost criterion. The five assumptions A1–A5 are the only technical assumptions needed for our analysis. Since all interevent times are exponential in our model, the state space can be restricted to simply the current system size. The action space at each decision epoch involves the admission decision and the service rate control. Note that admission control is only relevant when the decision epoch is at a customer arrival time (as opposed to a departure time). We represent the admission control decision by a, where a ∈ {0, 1}: a = 1 when the decision is to admit the new customer and a = 0 when the decision is to reject. The service rate can be changed when the system state changes due to a departure or an arrival, even if the customer is not admitted. We assume that the admission and service rate decisions are made

simultaneously by the controller. Thus, at an arrival time the joint control decision can be represented as (a, x), where the service rate until the next decision epoch is 0 ≤ x < ∞. When the state changes due to a departure, we represent the action as (1, x). Without loss of generality we assume x = 0 when n = 0. For most of this paper, we restrict our analysis to the set of deterministic stationary controls. Under a given stationary control, it is clear that the system size process is a continuous-time Markov chain (CTMC). If the Markov chain is positive recurrent under a policy, we call the policy ergodic.

2.1 Dynamical Equations and Objective Functions

In this section we assume that the system is operated under a stationary ergodic policy. For simplicity of discussion we also assume that the system starts empty at time 0, although the results hold for any initial state. Since the control is stationary, if incoming customers are rejected when the system size is m, then the system size will never exceed m. Hence the set of controls can be divided into two general types: terminating and non-terminating. Under a given terminating policy we let m be the smallest state for which customers are rejected. For such a policy, we can specify the policy by (μ, m), where μ = (μ_1, …, μ_m) is the vector of service rates. When there are n ≤ m customers in the system the server processes jobs at rate μ_n. When there are m customers in the system, all incoming customers are rejected. We refer to such a policy as m-terminating when the threshold level m needs to be emphasized. If a given stationary policy does not reject arrivals in any state, then the policy is non-terminating and the policy can be specified by an infinite vector μ. In this case, we think of m as taking the value infinity. Under an ergodic policy (μ, m), let p_n(μ, m) be the steady-state probability that the system size is n.
Standard CTMC theory then implies that the long-run average cost per unit time under this policy is:

    z(μ, m) := p_m(μ, m) κ + Σ_{n=0}^{m} p_n(μ, m) {c(μ_n) + h_n}.   (1)

For a non-terminating policy, with m = ∞, we take p_m(μ, m) = 0. Hence, in this case the first term on the right-hand side of (1) is zero, and the sum is an infinite sum. The steady-state probabilities satisfy the local balance equations:

    p_n(μ, m) = p_{n−1}(μ, m) μ_n^{−1},   1 ≤ n ≤ m, μ_n > 0.   (2)

It is possible to not serve customers in some states, i.e., to have μ_n = 0 for some n. In this case, the state space and balance equations are modified in a straightforward manner. Note that in order for a non-terminating policy to be ergodic, the number of states with μ_n = 0 has to be finite. Next, define z*(m) := inf z(μ, m), where the infimum is taken over all m-terminating policies. Note that z(0, 0) = z*(0). If customers are rejected in every state, the resulting policy is ergodic with z(0, 0) = h_0 + κ < ∞.
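To make (1) and (2) concrete, the long-run average cost of an m-terminating policy can be evaluated directly from the stationary distribution. The sketch below is purely illustrative and not part of the paper's development: the cost functions c(x) = x², h_n = n, the value κ = 5, and the rate vector are arbitrary choices, and all rates are assumed strictly positive.

```python
# Evaluate z(mu, m) for an m-terminating policy via (1) and (2).
# Illustrative instance (not from the paper): c(x) = x^2, h_n = n, kappa = 5,
# arrival rate normalized to 1, and all mu_n > 0.

def avg_cost(mu, kappa, c, h):
    """mu[k] is the service rate in state k+1; arrivals are rejected in state m = len(mu)."""
    m = len(mu)
    # Local balance (2): p_n = p_{n-1} / mu_n, since the arrival rate is 1.
    q = [1.0]
    for rate in mu:
        q.append(q[-1] / rate)
    total = sum(q)
    p = [w / total for w in q]
    # Equation (1): rejection cost rate plus service and holding cost rates.
    service = sum(p[n] * c(mu[n - 1]) for n in range(1, m + 1))  # c(mu_0) = c(0) = 0
    holding = sum(p[n] * h(n) for n in range(m + 1))
    return p[m] * kappa + service + holding

z = avg_cost(mu=[1.0, 2.0, 3.0], kappa=5.0, c=lambda x: x * x, h=lambda n: float(n))
# z is approximately 2.9375 for this instance
```

Here the q_n are the unnormalized stationary weights obtained by iterating (2) from q_0 = 1; dividing by their sum gives p_n(μ, m).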

Thus, for every set of parameters there exists at least one ergodic policy with a finite average cost. As a result, the infimum above is well-defined and finite. An m-terminating policy with z(μ, m) = z*(m), if it exists, is called m-optimal. The infimum among all ergodic policies is:

    z* = inf_{m≥0} z*(m).

The control problem is to find an ergodic policy (μ, m) which achieves the infimum, and a policy which does so is said to be globally optimal. We use this term to distinguish such policies from policies which are only optimal among all m-terminating policies, for a given m (we sometimes call such policies "m-optimal"). When holding cost rates are bounded, it is possible that h_n ↑ h_∞ with h_∞ ≤ z* < ∞, i.e., the long-run average cost under the "do-nothing" policy is smaller than the achievable long-run average cost under any ergodic policy. As in [4], in this case the MDP is said to be degenerate. When the problem is non-degenerate, we prove existence of the optimal policy in a constructive manner; in particular, we provide an algorithm to compute the policy.

3 Optimality Equations and Verification

In this section, we provide the optimality equations and an associated verification theorem. The overall procedure is similar to that in [4]. However, several modifications are necessary due to the extra mode of control allowed. Standard arguments from the theory of Markov decision processes yield the following equations for the relative cost functions, which we denote by v_n, n ≥ 0:

    v_n = min( inf_{x∈A} { [c(x) + h_n − z + x v_{n−1} + v_{n+1}] / (1 + x) },
               inf_{x∈A} { [c(x) + h_n − z + x v_{n−1} + v_n + κ] / (1 + x) } ),   n ≥ 1,   (3a)

and

    0 = min( inf_{x∈A} { [c(x) + h_n − z − x(v_n − v_{n−1}) + (v_{n+1} − v_n)] / (1 + x) },
             inf_{x∈A} { [c(x) + h_n − z − x(v_n − v_{n−1}) + κ] / (1 + x) } ),   n ≥ 1,   (3b)

with v_1 = v_0 − h_0 + z. Since for a given z the sequence of v_n's is determined only up to an additive constant, one usually works with the relative cost differences y_n := v_n − v_{n−1}, n ≥ 1.
Following the arguments in [4] and [11], using the relative cost differences reduces the optimality equations to the following form:

    z − h_n = min( inf_{x∈A} { c(x) − y_n x + y_{n+1} },  inf_{x∈A} { c(x) − y_n x + κ } ),   n ≥ 1,   (4a)

and

    y_1 = z − h_0.   (4b)

Defining

    φ(y) := sup_{x∈A} { yx − c(x) },   for y ≥ 0,   (5)

simplifies the optimality equations. Specifically, (4a) becomes

    h_n − z = φ(y_n) − min{ y_{n+1}, κ },   n ≥ 1.   (6)

It is worthwhile to note that the optimality equations remain the same even if the holding cost is not non-decreasing. The smallest value of n for which y_{n+1} ≥ κ is said to be the terminating state for the optimality equations. This corresponds to the threshold state m of the associated terminating policy. If y_{n+1} < κ for all n ≥ 1, then the solution pair is said to be non-terminating. Note that if a particular pair z and (y_1, y_2, …) is a solution of the optimality equations (4b) and (6), then y_{n+1} < κ for any non-terminating state n ≥ 1. When y < κ, under A1, A4 and A5, the function φ(·) has finite values which are attained, and the smallest maximizers in the set A exist. Let ψ(y) be the smallest maximizer in the definition of φ(y). Note that assumption A5 implies that ψ(y) is finite for each y ≥ 0. Further, if φ(κ) < ∞, then ψ(κ) is well defined. In the next subsection we confirm the validity of the optimality equations for bounded solutions (sometimes called a "verification theorem"). Also, a modified version of these optimality equations is introduced and verified.

3.1 Verification Theorem

We first state and prove the main verification result.

Theorem 1. Let z < ∞ and (y_1, y_2, …) be a solution to the optimality equations (4b) and (6), with the y_i's being uniformly bounded. Let m* be the corresponding terminating state (if there is no terminating state, then m* = ∞). Then z ≤ z(μ, m) for every ergodic policy (μ, m); that is, z ≤ z*. If the policy (μ*, m*) defined by

    μ*_n = ψ(y_n) for 1 ≤ n < m*;   μ*_n = ψ(y_{m*}) for n ≥ m* (when m* < ∞)

is ergodic, then (μ*, m*) is an optimal policy.

Proof. First, note that since z < ∞ and the y_i are bounded, φ(y_n) < ∞ for all n ≥ 1. Using (5) and (6), one obtains the following relations:

    x y_n − c(x) ≤ φ(y_n) ≤ y_{n+1} + h_n − z,   x ∈ A, n ≥ 1,   (7)

    x y_n − c(x) ≤ φ(y_n) ≤ κ + h_n − z,   x ∈ A, n ≥ 1.   (8)

Any ergodic policy is either a terminating or a non-terminating policy. First, consider an arbitrary terminating policy (μ, m). Setting x = μ_n in (7), and x = μ_m and n = m in (8), we have

    μ_n y_n − c(μ_n) ≤ y_{n+1} + h_n − z   for n ≥ 1,   (9)

    μ_m y_m − c(μ_m) ≤ κ + h_m − z   for m < ∞.   (10)

Multiplying both sides of (9) by p_n(μ, m), multiplying both sides of (10) by p_m(μ, m), substituting μ_n p_n(μ, m) = p_{n−1}(μ, m) from (2), and rearranging terms yields:

    p_n(μ, m)[h_n + c(μ_n) − z] ≥ p_{n−1}(μ, m) y_n − p_n(μ, m) y_{n+1}   for 1 ≤ n < m,   (11a)

    p_m(μ, m)[h_m + c(μ_m) + κ − z] ≥ p_{m−1}(μ, m) y_m.   (11b)

Summing all the equations in (11a) and (11b), and using relation (1), gives

    z(μ, m) − p_0(μ, m) h_0 − z[1 − p_0(μ, m)] ≥ p_0(μ, m) y_1.   (12)

Applying (4b) we conclude z(μ, m) ≥ z, which establishes the result for any terminating policy. For a non-terminating policy (μ, ∞), the derivation is analogous, where now we sum the equations in (11a) over all n ≥ 1. Since the y_i's are bounded, the sums due to the right-hand side of (11a) are finite and we again obtain (12). This establishes z(μ, ∞) ≥ z for any non-terminating policy. Next, given a solution to the optimality equations with m* < ∞, set x = ψ(y_n) in (7) and x = ψ(y_{m*}) in (8). In that case, (9) and (10) hold with equality, implying that (11a) and (11b) also hold with equality. As a result, (12) holds with equality for this policy, i.e., (μ*, m*) is optimal. An analogous argument holds in the non-terminating case.

In the theorem above, even for a terminating policy, we assigned a service rate for states beyond the threshold state. Of course, these states are transient, so the assigned rates are inconsequential in terms of long-run costs. In the theorem below, we consider the optimality equations for a fixed n-terminating policy. These equations will come into play in later sections. As such, we refer to (4b) and the equations in (13) as the n-optimality equations. Furthermore, we call a policy n-optimal if it is optimal for the control problem in which the state space is truncated to be {0, …, n}. The following theorem is stated without proof, as it follows directly from arguments in the proof above.

Theorem 2.
If there exist an n ≥ 1, a sequence (y_1, …, y_n), and a z(n) < ∞ satisfying (4b) and

    h_k − z(n) = φ(y_k) − y_{k+1}   for 1 ≤ k ≤ n,   with y_{n+1} = κ,   (13)

then z(n) = z*(n) and the policy (μ, n) given by

    μ_k = ψ(y_k)   for 1 ≤ k ≤ n   (14)

is n-optimal.

This theorem can be applied when a system with a fixed threshold is considered and there is no option of rejecting customers unless the buffer is full. One such case is discussed in [1]. Note that both theorems in this section hold even when the holding cost h_n is not non-decreasing in n.
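Theorem 2 suggests a direct numerical route to z(n): for a fixed trial value z, equations (4b) and (13) generate y_1, …, y_{n+1} recursively, and y_{n+1} is increasing in z, so the value z(n) at which y_{n+1} = κ can be found by bisection. The sketch below is illustrative only, not the paper's implementation: the finite action grid A, the costs c(x) = x² and h_n = n, and κ = 5 are arbitrary choices (with a finite grid the supremum in (5) is always attained, so the no-solution case does not arise).

```python
# Solve the n-optimality equations (4b) and (13) by bisection on z(n).
# Illustrative instance (not from the paper): finite action grid, c(x) = x^2,
# h_n = n, kappa = 5, arrival rate 1.

A = [i / 10 for i in range(51)]              # action grid, subset of [0, 5]
c = lambda x: x * x
h = lambda n: float(n)
kappa = 5.0

def phi(y):
    """phi(y) = sup_{x in A} {y*x - c(x)}, equation (5)."""
    return max(y * x - c(x) for x in A)

def y_terminal(z, n):
    """Generate y_1, ..., y_{n+1} from y_1 = z - h_0 and the rearrangement of
    (13), y_{k+1} = phi(y_k) - h_k + z; return y_{n+1}."""
    y = z - h(0)
    for k in range(1, n + 1):
        y = phi(y) - h(k) + z
    return y

def solve_z(n, z_hi=100.0, tol=1e-10):
    """y_{n+1}(z) is increasing in z; bisect for the z with y_{n+1} = kappa."""
    lo, hi = h(0), z_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if y_terminal(mid, n) < kappa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z1 = solve_z(1)   # value of the optimal 1-terminating policy for this instance
```

For n = 1 the equation reduces to φ(z) + z − h_1 = κ; with c(x) = x² (so φ(y) ≈ y²/4 on a fine grid) the root is z = −2 + √28 ≈ 3.29, which the grid version approximates.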

4 Policy Computation

4.1 Overview

In Section 3.1 we developed optimality equations and established their validity. In this section, we suggest a computational methodology to obtain an optimal or near-optimal policy. The optimality equations alone do not suggest an efficient methodology for constructing near-optimal policies. However, the structure of the model does suggest an algorithm, and most of this section is devoted to validating the approach. It should be noted that our algorithm differs from the one suggested in [4]. In their model, approximating problems are formed by truncating the holding costs at some buffer level n, i.e., they set h_n = h_{n+1} = ⋯ for some n ≥ 1. Since our model allows customer rejection, it is natural to consider a sequence of approximating problems which truncate the state space instead. This approach works because it can be shown that if there is a local minimum in these approximating problems, then the solution to the corresponding truncated problem yields the optimal policy for the original problem. If no local minimum exists, then it is not optimal to reject customers in any state. In either case, one can use the approximating problems to generate a bound on the optimality gap at any stage, thus allowing the implementation of a stopping criterion. Because we allow the action space to be unbounded, there may be no formal solution to the optimality equations for a particular truncation level n. Such cases arise when the optimal service rate for a state below the truncation level is infinite. In such cases, as the algorithm below indicates, it is then globally optimal to reject at a lower threshold value. Assumption A5 ensures that such a policy is optimal in these cases. The algorithm is as follows:

Initialization. Set n = 1.

Step 1. Solve the n-optimality equations.

Step 2. If a solution to the optimality equations exists and z*(n) ≥ z*(n − 1), then the optimal (n − 1)-terminating policy is globally optimal.
If no solution exists, then the optimal (n − 1)-terminating policy is globally optimal. If neither case applies, then increase n by 1 and go to Step 1.

Consider the sequence of solutions (z(n), (y^n_1, …, y^n_n)), n ≥ 1, satisfying equations (4b) and (13). If a solution pair exists for some n ≥ 1, then Theorem 2 applies, i.e., z(n) = z*(n). In particular, this solution pair corresponds to an optimal policy for a system with a limit of n customers. Starting from n = 1, solution pairs are computed for incremental values of n. As mentioned above, this computation continues until a local minimum is found or until no solution pair exists for some n. It is shown below that if ñ is the first local minimum of the sequence, then z(ñ) = z*. If there is no such local minimum, then lim_{n→∞} z(n) = z*.
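The full procedure can then be sketched as a loop over n that stops at the first local minimum of z(n). This is again an illustrative sketch, not the paper's code: the finite action grid A, the costs c(x) = x², h_n = n, and κ = 4 are arbitrary choices, and with a finite grid the "no solution exists" branch of Step 2 never occurs, so it is omitted.

```python
# Iterative algorithm of Section 4: solve the n-optimality equations for
# n = 1, 2, ... and stop at the first n with z(n) >= z(n-1).
# Illustrative instance (not from the paper): c(x) = x^2, h_n = n, kappa = 4.

A = [i / 10 for i in range(51)]              # finite action grid in [0, 5]
c = lambda x: x * x
h = lambda n: float(n)
kappa = 4.0

def phi(y):
    return max(y * x - c(x) for x in A)      # equation (5)

def psi(y):
    best = phi(y)
    return min(x for x in A if y * x - c(x) == best)   # smallest maximizer

def y_seq(z, n):
    """y_1, ..., y_{n+1} generated from (4b) and (13) for a trial value z."""
    ys = [z - h(0)]
    for k in range(1, n + 1):
        ys.append(phi(ys[-1]) - h(k) + z)
    return ys

def solve_z(n, tol=1e-10):
    lo, hi = h(0), 100.0                     # y_{n+1}(z) is increasing in z
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if y_seq(mid, n)[-1] < kappa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def optimal_policy(max_n=100):
    """Return (z*, service rates) at the first local minimum of z(n)."""
    z_prev = h(0) + kappa                    # z*(0): reject in every state
    for n in range(1, max_n + 1):
        z_n = solve_z(n)
        if z_n >= z_prev:                    # first local minimum found
            if n == 1:
                return z_prev, []            # reject everything
            ys = y_seq(solve_z(n - 1), n - 1)[:-1]   # y_1, ..., y_{n-1}
            return z_prev, [psi(y) for y in ys]
        z_prev = z_n
    return z_prev, None                      # no local minimum up to max_n

z_star, rates = optimal_policy()
```

For this instance the computed sequence is z(1) ≈ 2.90, z(2) ≈ 2.74, z(3) ≈ 2.76, so the first local minimum occurs at n = 2, the returned policy is 2-terminating, and the returned rates are non-decreasing, as Theorem 3 guarantees.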

The sequence of terminating optimal policies can be seen as corresponding to progressive approximations of an optimal policy, in the same way that [4] provides successive approximations via holding cost truncation. We now summarize the development in the remainder of this section. In Section 4.2 we prove optimality of the n-terminating policy corresponding to the first local minimum of the z(·) values described above. Section 4.3 treats the case when there is no local minimum. In that case, the limiting policy is shown to be optimal. In both cases, it is shown that the optimal service rates are monotone in the number of customers in the system.

4.2 Policy Computation: The Local Minimum Case

The sequence of solutions (z(1), (y¹_1)), (z(2), (y²_1, y²_2)), … might have a local minimum or terminate when no solution pair exists for a particular n + 1. When no solution exists, it is shown that z(n) < z*(n + 1) (note that z*(n + 1) exists even if there is no solution z(n + 1)). Hence, in either case the sequence is said to have a local minimum. Here, we show that when z(n) is the first local minimum of the sequence, the n-optimal policy is an optimal policy. We first establish a preliminary lemma.

Lemma 1. Consider a sequence of optimal terminating solutions (z(1), (y¹_1)), (z(2), (y²_1, y²_2)), …. If z(k) > z(n) for 1 ≤ k < n, then

    y^n_{k+1} < κ.   (15)

Proof. Since z(k) > z(n), we have

    h_k − z(n) > h_k − z(k),   and hence   φ(y^n_k) − y^n_{k+1} > φ(y^k_k) − κ,   (16)

where the latter inequality is due to (13). Next, note that given a value z(n), the sequence of y's is constructed from (4b) and (13). This construction directly implies that if z(k) > z(n) then y^k_k > y^n_k. Since φ(·) is a non-decreasing function, we have φ(y^k_k) ≥ φ(y^n_k). Combining this with (16) yields y^n_{k+1} < κ.

For a decreasing sequence of z's, Lemma 1 establishes that the y's are bounded away from κ, i.e., they satisfy the defining property of non-terminating states. The next theorem is the main result of this subsection.

Theorem 3.
If there exists a sequence of optimal terminating solutions with values satisfying

    z(k) > z(n) for 1 ≤ k < n,   and   z*(n + 1) ≥ z(n),   (17)

then the n-optimal policy corresponding to the solution (z(n), (y^n_1, …, y^n_n)) is globally optimal. Furthermore,

    y^n_1 ≤ y^n_2 ≤ ⋯ ≤ y^n_n,   (18)

which implies that the optimal service rates are monotone in the number of jobs in the system.

Proof. By assumption, n ≥ 1. The case where even the 1-optimality equations cannot be satisfied is discussed at the end of this subsection. We first establish (18). For any l such that 1 < l ≤ n, we can apply (13) and the fact that the holding cost rates are non-decreasing to obtain:

    φ(y^l_l) − κ = h_l − z(l) ≥ h_{l−1} − z(l) ≥ φ(y^l_{l−1}) − κ,

where the last inequality is derived by applying both (13) and Lemma 1. Since φ(·) is non-decreasing, the inequality above implies y^l_l ≥ y^l_{l−1}, and another application of (13) yields

    φ(y^l_{l−1}) − h_{l−1} ≥ φ(y^l_{l−2}) − h_{l−2}.   (19)

Since the holding costs are non-decreasing, (19) implies φ(y^l_{l−1}) ≥ φ(y^l_{l−2}), which gives y^l_{l−1} ≥ y^l_{l−2}. We can now recursively apply the last few observations to obtain (18).

We are now prepared to prove the main part of the theorem. By Lemma 1, y^n_{k+1} < κ for 1 ≤ k < n. Thus, the optimality equations in (13) from the terminating case can be written in the equivalent form of the original optimality equations:

    h_k − z(n) = φ(y^n_k) − min{ y^n_{k+1}, κ }   for 1 ≤ k < n.   (20)

Case 1: Suppose that there exists a solution to the set of (n + 1)-optimality equations. As previously noted, if z(n) ≤ z(n + 1), then by construction y^n_k ≤ y^{n+1}_k for all 1 ≤ k ≤ n. Further, we also have h_n − z(n) ≥ h_n − z(n + 1). Applying the n- and (n + 1)-optimality equations, plus the last two observations, gives

    φ(y^n_n) − κ ≥ φ(y^{n+1}_n) − y^{n+1}_{n+1} ≥ φ(y^n_n) − y^{n+1}_{n+1},   so that   y^{n+1}_{n+1} ≥ κ.   (21)

Define δ := z(n + 1) − z(n) and, for k ≥ n + 1, set g := h_{n+1} − δ. Using these definitions, (13) and (21) yields:

    g − z(n) = φ(y^{n+1}_{n+1}) − min{ y^{n+1}_{n+1}, κ }.   (22)

Let us modify the holding cost vector by replacing h_i, i > n, with g, i.e., the new rates are (h_0, h_1, …, h_{n−1}, h_n, g, g, …). Since δ ≥ 0, the new holding cost rates for states larger than n are less than or equal to the original rates. Then, for this cost rate vector, the n-optimal policy corresponding to the solution (z(n), (y^n_1, …, y^n_n)) is globally optimal, using (4b), (20) and (22) and applying Theorem 1.
Since it is optimal to reject customers in state n with these modified holding

costs, it must also be optimal to reject them when all holding costs beyond state n are larger, as they are in the original holding cost vector. Hence, the n-optimal policy is also globally optimal under the original holding cost vector.

Case 2: Suppose that a solution to the set of (n + 1)-optimality equations does not exist. In state n + 1, replace the holding cost h_{n+1} with the modified cost h̄_{n+1} := z(n) + φ(κ) − κ. We claim that h_{n+1} − h̄_{n+1} ≥ 0. To see this, first note that for the system with the modified cost, (z(n), (y^n_1, …, y^n_n, κ)) satisfies the (n + 1)-optimality equations. Now suppose that h_{n+1} − h̄_{n+1} < 0. Let ẑ(n + 1) be the cost of implementing the policy corresponding to (y^n_1, …, y^n_n, κ) with the original holding cost vector. In that case, we must have ẑ(n + 1) < z(n), since the only thing that has been changed is the holding cost in state n + 1, which by assumption is strictly lower in the original system. This last inequality implies z*(n + 1) < z(n), which contradicts the assumption in the theorem statement that z*(n + 1) ≥ z(n). Thus we have established that h_{n+1} − h̄_{n+1} ≥ 0. The remainder of the argument is similar to that of Case 1. In particular, we again modify the holding cost vector to be (h_0, h_1, …, h_{n−1}, h_n, h̄_{n+1}, h̄_{n+1}, …) and argue that the n-optimal policy corresponding to the solution (z(n), (y^n_1, …, y^n_n)) is globally optimal in the modified system. Hence, it must also be globally optimal in the original system.

It is possible that there is not even a solution to the 1-optimality equations, i.e., there is no z and finite y_1 satisfying the equations. In this case, it is straightforward to argue that the policy of rejecting customers in all states (including state 0) is optimal.
4.3 Policy Computation: The Decreasing Sequence Case

If there exists a sequence of terminating optimal solutions satisfying

    z(n) > z(n + 1)   for all n ≥ 0,   (23)

then z(∞) := lim_{n→∞} z(n) is the cost incurred by using the limiting control policy, i.e., the policy (ζ, ∞) where ζ_m = lim_{n→∞} ψ(y^n_m) for all m ≥ 1. From the discussion preceding Section 3.1 it is clear that the limiting service rate ζ_m is finite for all m ≥ 1. Thus, to prove optimality of the limiting control policy we need to show that a limiting policy exists and that z(∞) = z*. The theory in this section is closely related to the truncated holding cost case considered in Section 6 of [4]. So, consider a modified problem with holding costs (h_0, h_1, …, h_{n−1}, h_n, h_n, …). The next lemma shows that there exists an n-terminating solution for this truncated holding cost problem. This result is used in establishing optimality of the limiting control policy.

Lemma 2. Suppose there exists a sequence of $n$-terminating solutions $\{(z(n), (y_1^n, \dots, y_n^n)),\ n \ge 1\}$ with decreasing values, i.e., (23) is satisfied. Then for each $n \ge 1$ there exists a solution vector $(\hat z(n), (\hat y_1^n, \dots, \hat y_n^n))$ which solves the truncated holding cost problem. In particular, $(\hat z(n), (\hat y_1^n, \dots, \hat y_n^n))$ satisfies:

(i) (4b) and
$$\hat y_k^n = \varphi(\hat y_{k-1}^n) - h_{k-1} + \hat z(n) \quad \text{for } 2 \le k \le n, \qquad (24)$$
$$\hat y_n^n = \varphi(\hat y_n^n) - h_n + \hat z(n). \qquad (25)$$

(ii) Furthermore, for each $n \ge 1$ the $\hat y^n$'s are non-decreasing:
$$\hat y_1^n \le \hat y_2^n \le \cdots \le \hat y_n^n. \qquad (26)$$

Proof. Consider an $n$-terminating solution $(z, (y_1, \dots, y_n))$. Such a solution satisfies (4b) and (24), but not necessarily (25). We wish to show that we can find a $\hat z(n)$ which will generate a new sequence of $\hat y^n$'s which also satisfy (25). Note that for any fixed $z$, if a sequence of $y_n$'s exists it is unique. Hence, below we can think of $y_n$ as a function of $z$, which we denote by $y_n(z)$. Define
$$\Delta_n(z) := \varphi(y_n(z)) - y_n(z) + z - h_n \qquad (27)$$
and let $S(n)$ represent the statement that there exists a $\hat z(n)$ such that $\Delta_n(\hat z(n)) = 0$.

If for some $n \ge 1$ the original $n$-terminating solution is such that $\Delta_n(z(n)) \le 0$, then this solution satisfies the global optimality equations (4b)–(6), implying that the corresponding policy is globally optimal. This contradicts our assumption that the sequence of $n$-terminating optimal policies has decreasing values. Therefore, $\Delta_n(z(n)) > 0$ for all $n \ge 1$.

For $n = 1$, plugging $z = h_0$ into (27) and using the fact that the holding cost is non-decreasing, we have $\Delta_1(h_0) = \varphi(0) + h_0 - h_1 \le 0$. By construction, it can be seen that $\Delta_1(\cdot)$ is continuous on $(0, \infty)$. Hence, from the observations in the last two displays we conclude that there exists a $\hat z(1)$, $h_0 \le \hat z(1) < z(1)$, with $\Delta_1(\hat z(1)) = 0$, i.e., $S(1)$ holds.

Now assume $S(m)$ holds for some $m \ge 1$. Let $\hat z(m)$ be the function argument for which $S(m)$ holds, i.e., suppose $\Delta_m(\hat z(m)) = 0$. Next, by definition,
$$\Delta_{m+1}(\hat z(m)) = \varphi(y_{m+1}(\hat z(m))) - y_{m+1}(\hat z(m)) + \hat z(m) - h_{m+1}. \qquad (28)$$
Using (28), the identity $y_{m+1}(\hat z(m)) = y_m(\hat z(m))$, and the fact that $S(m)$ holds, we have
$$\Delta_{m+1}(\hat z(m)) = h_m - h_{m+1},$$

which implies that $\Delta_{m+1}(\hat z(m))$ is non-positive. From the observations above, recall that $\Delta_{m+1}(z(m+1)) > 0$. Again using the continuity of $\Delta_{m+1}(\cdot)$, we conclude that there exists a $\hat z(m+1)$,
$$\hat z(m) \le \hat z(m+1) < z(m+1), \qquad (29)$$
such that $\Delta_{m+1}(\hat z(m+1)) = 0$. So $S(m+1)$ holds and, by induction, $S(n)$ holds for all $n \ge 1$, thus establishing (i). Result (ii) follows from an argument analogous to that in the proof of Theorem 3.

For completeness, we now state a lemma similar to Proposition 4 in [4]. The lemma establishes that the solutions of Lemma 2 correspond to optimal policies, at least when the control problem is nondegenerate.

Lemma 3. For a fixed $n \ge 1$, consider the control problem with modified holding cost vector $(h_0, \dots, h_{n-1}, h_n, h_n, \dots)$. Let $\hat z(n)$ be the optimal objective value for the modified problem, and let $(\hat\mu_1^n, \hat\mu_2^n, \dots)$ be the corresponding optimal service rates. The following hold:

(i) If $\hat z(n) \ge h_n$, then the modified control problem is degenerate.

(ii) If $\hat z(n) < h_n$, then $(\eta(n), \infty)$, where $\eta(n) = \{\hat\mu_i^n\}$, is an optimal ergodic policy.

Proof. For clarity, note that the elements of $\eta(n)$ are dictated by $\hat z(n)$, and $\hat\mu_k^n = \hat\mu_n^n$ for $k \ge n$. The result follows from Proposition 4 in [4] and observing that $(\hat z(n), (\hat y_1^n, \dots, \hat y_n^n))$ satisfies (4b), (24) and (25).

If the optimal value for a terminating policy is strictly smaller than the previous value, i.e., $z(n) > z(n+1)$, then Lemma 3 guarantees the existence of a non-terminating optimal policy for a system with holding cost truncated at $h_n$. This result holds even if $z(n+1) \le z(n+2)$, i.e., the sequence of $z(\cdot)$'s need only be decreasing up until stage $n+1$ for the existence of an optimal policy under truncated holding costs.

Recall that the original holding costs are non-decreasing in the number of jobs in the system. Hence, as we increase the truncation level, the sequence of $\hat z(n)$'s must also be non-decreasing. Furthermore, we have $\hat z(n) \le z^*$ for all $n \ge 1$. Hence,
$$\hat z(\infty) := \lim_{n\to\infty} \hat z(n) \le z^*. \qquad (30)$$
Next, note that the $\hat y_i^n(\cdot)$ are bounded and increasing functions of $\hat z$. From these properties, (26), and the construction of non-terminating policies in Lemma 2, we have for $i \ge 1$
$$\hat y_i^\infty := \lim_{n\to\infty} \hat y_i^n = \hat y_i(\hat z(\infty)).$$
Similarly, the pre-limit property (26) of the $\hat y_i^n$ yields
$$\hat y_1^\infty \le \hat y_2^\infty \le \cdots.$$
Since $\psi(\cdot)$ is continuous and non-decreasing, (30) implies the existence of a limiting control rate for each $i \ge 1$:
$$\hat\mu_i^\infty := \lim_{n\to\infty} \psi(\hat y_i^n) = \psi(\hat y_i^\infty). \qquad (31)$$
With these observations in hand, we first consider the degenerate case.

Theorem 4. If $h_n \to h$ as $n \to \infty$ and $\hat z(\infty) \ge h$, then the original control problem is degenerate.

Proof. The result follows from (30) and the definition of degeneracy.

In the appendix we prove the following lemma, from which the main result will follow immediately.

Lemma 4. For every non-terminating ergodic policy $(\eta(n), \infty)$ construct a terminating policy $(\bar\mu^n, n)$, defined by
$$\bar\mu_k^n = \begin{cases} \hat\mu_k^n & \text{if } 1 \le k < n, \\ \hat\mu_{n-1}^n & \text{if } k = n. \end{cases}$$
Let $\bar z(n)$ be the values of these so-constructed policies. Then
$$\lim_{n\to\infty} \bar z(n) = z^*.$$

We are now prepared to present the main result of this section, which provides the justification for the computational method when the optimal objective values are decreasing.

Theorem 5. If the sequence of $n$-terminating optimal policies has decreasing values, i.e., (23) is satisfied, then the corresponding limiting control policy $(\mu(\infty), \infty)$ is an optimal policy. Furthermore, the optimal service rates are non-decreasing with respect to the number of customers in the system.

Proof. First, we wish to establish that the sequence of values of optimal terminating policies converges to $z^*$. In Lemma 4 we establish this for the terminating policies derived from the truncated holding cost problems. Recall that these values are denoted by $\bar z(n)$, $n \ge 1$. It is easy to see that $z(n) \le \bar z(n)$ for all $n$, since $z(n)$ is the optimal value among all terminating policies at truncation level $n$, while $\bar z(n)$ corresponds to the value of some (perhaps suboptimal) terminating policy at the same truncation level. Lemma 4 shows that $\lim_{n\to\infty} \bar z(n) = z^*$, which together with the observation above immediately implies that
$$\lim_{n\to\infty} z(n) = z^*.$$
From here on, our argument is analogous to that in the proof of Proposition 7 in [4]. In particular, $y(\cdot)$ and $\psi(\cdot)$ are continuous functions, implying that the optimal truncated service rates converge to a limiting service rate vector corresponding to $z^*$:
$$\lim_{n\to\infty} \psi(y_k(z(n))) = \mu_k(\infty) = \psi(y_k(z^*)) \quad \text{for all } k \ge 1.$$
These same continuity properties similarly imply that the optimal limiting service rates are monotone in the number of customers in the system.
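The existence argument for $\hat z(n)$ in the proof of Lemma 2 is constructive: $\Delta_n(\cdot)$ is continuous and changes sign, so its root can be located by bisection. The sketch below is a hypothetical instantiation, assuming the quadratic cost $c(x) = x^2$ on $A = [0, \infty)$ (so $\varphi(y) = \max(y,0)^2/4$, the supremum of $xy - x^2$ over $x \ge 0$) and the seed $y_1(z) = \varphi(0) - h_0 + z$ for the recursion (24); both are assumptions about details not fixed in this excerpt.

```python
# Sketch of the root-finding step behind Lemma 2, under assumed instances:
# quadratic service cost c(x) = x**2, so phi(y) = max(y, 0)**2 / 4, and the
# seed y_1(z) = phi(0) - h_0 + z for the recursion (24).

def phi(y):
    """sup over x >= 0 of (x*y - x**2), for the quadratic cost c(x) = x**2."""
    return max(y, 0.0) ** 2 / 4.0

def y_seq(z, h, n):
    """y_1(z), ..., y_n(z) via y_k = phi(y_{k-1}) - h_{k-1} + z."""
    ys = [phi(0.0) - h[0] + z]            # assumed seed from (4b)
    for k in range(2, n + 1):
        ys.append(phi(ys[-1]) - h[k - 1] + z)
    return ys

def delta(z, h, n):
    """Delta_n(z) = phi(y_n(z)) - y_n(z) + z - h_n, as in (27)."""
    yn = y_seq(z, h, n)[-1]
    return phi(yn) - yn + z - h[n]

def solve_zhat(h, n, z_lo, z_hi, tol=1e-10):
    """Bisect for the root, assuming Delta_n(z_lo) <= 0 < Delta_n(z_hi)."""
    assert delta(z_lo, h, n) <= 0.0 < delta(z_hi, h, n)
    while z_hi - z_lo > tol:
        mid = 0.5 * (z_lo + z_hi)
        if delta(mid, h, n) <= 0.0:
            z_lo = mid
        else:
            z_hi = mid
    return 0.5 * (z_lo + z_hi)
```

For example, with $h_0 = 1$, $h_1 = 2$, one has $\Delta_1(z) = (z-1)^2/4 - 1$ for $z \ge 1$, so the bracket $[h_0, 4]$ contains the root $\hat z(1) = 3$.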

A minor difference between our proof and the arguments appearing in [4] should be noted. George and Harrison require left-continuity of $c(\cdot)$ because they truncate holding costs, which means that their sequence of truncated optimal values converges to $z^*$ from the left. Since we truncate the buffer level instead, our sequence of optimal values converges to $z^*$ from the right, requiring right-continuity of $c(\cdot)$ and the related function $\psi(\cdot)$ for the argument above (left-continuity is used in earlier sections).

Though $y_n < \kappa$ for $n \ge 1$, the optimal service rates associated with $z(\infty)$ can grow without bound as the number of customers in the system increases, i.e., the optimal service rates in this case could be unbounded.

5 Numerical Examples

In this section we illustrate the algorithm developed in the last section with a few numerical examples. First, we present a result which provides an upper bound on the optimality gap between the control policy obtained after an iteration of the algorithm and the optimal control policy. George and Harrison [4] also provided a bound on the optimality gap for their algorithm. However, their algorithm evolves in a different manner, since they truncate holding costs rather than the state space.

Theorem 6. If $z(n) > z(n+1)$ for some $n \ge 1$, and $z(k) > z(n)$ for all $k < n$, then
$$z(n) - z^* \le \kappa - y_n^n. \qquad (32)$$

Proof. Since $z(n) > z(n+1)$, the $n$-terminating policy is not globally optimal. It follows that
$$y_n^n < \kappa. \qquad (33)$$
If the above inequality did not hold, then setting $y_m = y_n^n$ for $m > n$ would give a solution pair to the optimality equations when the holding cost vector is modified to be $(h_0, h_1, \dots, h_{n-1}, h_n, h_n, \dots)$, by Theorem 1 and Lemma 1. This would imply $z^* = z(n)$, which contradicts our assumption.

Set $\tilde h_m = h_m - \theta_n$ for $m < n$, where $\theta_n = \kappa - y_n^n$. From (33), $\theta_n$ is positive. For a system with modified holding costs $(\tilde h_0, \tilde h_1, \dots, \tilde h_{n-1}, h_n, h_n, \dots)$, the pair $(z(n) - \theta_n, (y_1^n, y_2^n, \dots, y_{n-1}^n, y_n^n, y_n^n, \dots))$ satisfies the optimality equations. Since the modified holding costs are less than or equal to the original costs in every state, we conclude
$$z(n) - \theta_n \le z^*,$$
which immediately implies the result.

Thus, after each iteration of the algorithm, one has an easily computable bound on the optimality gap, which allows for the implementation of a stopping criterion, whether or not the optimal policy is a terminating policy. When the limiting policy is the optimal policy, $\theta_n \to 0$ as $n \to \infty$. Note that Theorem 6 holds irrespective of the holding cost structure.
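Theorem 6 translates directly into a stopping rule: after solving each truncation level, compute $\theta_n = \kappa - y_n^n$ and stop once it falls below a tolerance, or once $z(n)$ stops decreasing. A sketch, with `solve_level` as a hypothetical stand-in for the Step 1 solver of the algorithm:

```python
# Sketch of the stopping rule suggested by Theorem 6.  `solve_level(n)` is a
# hypothetical stand-in for Step 1 of the algorithm: it must return the
# n-terminating optimal value z(n) together with y_n^n.

def run_until_gap(solve_level, kappa, tol, n_max):
    """Increase the truncation level until Theorem 6's bound
    z(n) - z* <= kappa - y_n^n certifies a gap below tol, or until z(n)
    stops decreasing (a terminating policy is then optimal)."""
    z_prev = float("inf")
    for n in range(1, n_max + 1):
        z_n, y_nn = solve_level(n)
        if z_n >= z_prev:                 # z(n) stopped decreasing:
            return n - 1, z_prev, 0.0     # the (n-1)-terminating policy is optimal
        theta = kappa - y_nn              # Theorem 6: z(n) - z* <= theta
        if theta <= tol:
            return n, z_n, theta
        z_prev = z_n
    return n_max, z_n, kappa - y_nn
```

With a stub solver whose values decrease geometrically, the loop stops as soon as the certified gap clears the tolerance; with a solver whose values stall immediately, it returns the terminating policy at once.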

5.1 Example Control Problems

Figure 1: Optimal buffer size: $c(x) = x^2$

In all of the numerical examples in this section we take $A = [0, \infty)$ and set $h_n = h_0 + s(n - M + 1)^+$, where $s$ and $M$ are parameters that are varied across cases. In all cases, $s$ is strictly positive and $M$ is an integer.

In the first example we consider the quadratic service cost structure used in [4], namely $c(x) = x^2$. Further, we set $h_0 = 10$ and $M = 1$. For this example, the optimal policies are terminating policies. Figure 1 shows the variation in the buffer limit under the optimal policy as the rejection cost $\kappa$ and the multiplicative factor $s$ in the holding cost are varied. It is clear that for fixed $\kappa$, the optimal buffer size increases with decreasing $s$, and this effect is more prominent for smaller values of $s$. An interesting result is the apparent lack of monotonicity in the optimal buffer size with respect to $\kappa$ for a fixed value of $s$. One also observes that this apparent lack of monotonicity is prominent only once the optimal buffer size saturates. As expected, the optimal buffer size is largest when the rejection cost is large and $s$ is small. For small values of the rejection cost, irrespective of $s$, all customers are rejected. This is due primarily to the particular values chosen for the holding cost function.

For the same parameter settings, the optimal values are plotted against $s$ and $\kappa$ in Figure 2. As expected, the optimal value increases with either increased rejection or holding cost. However, note that the optimal values saturate relatively quickly in the rejection cost, and this saturation occurs faster for smaller values of $s$. The optimal buffer size and optimal objective values have opposite trends with respect to $s$, i.e., as the optimal cost decreases the optimal buffer size increases.

The next example is selected such that the cost for service per unit time approaches the rejection cost as the service rate approaches infinity.
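The examples share the holding-cost structure $h_n = h_0 + s(n - M + 1)^+$ stated above. A small helper makes the structure concrete; the value $s = 0.5$ below is illustrative, not one of the paper's plotted cases.

```python
# The holding costs used in the examples: h_n = h_0 + s * (n - M + 1)^+.
# The parameters h0 = 10 and M = 1 match the first example; s = 0.5 is an
# illustrative value chosen here, not one taken from the figures.

def holding_cost(n, h0, s, M):
    """h_n = h0 + s * max(n - M + 1, 0)."""
    return h0 + s * max(n - M + 1, 0)

costs = [holding_cost(n, h0=10.0, s=0.5, M=1) for n in range(5)]
# costs = [10.0, 10.5, 11.0, 11.5, 12.0]
```

Raising $M$ delays the point at which holding costs start to grow: with $M = 3$, states $0$ through $2$ all incur the base cost $h_0$.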
In this case A5 holds with equality. The cost for service is $c(x) = x - x^{1/(1+\varepsilon)}$, where $\varepsilon$ is a positive parameter that is varied in

Figure 2: Optimal cost: $c(x) = x^2$

Figure 3: Optimal buffer size: $c(x) = x - x^{1/(1+\varepsilon)}$

Figure 4: Optimal cost: $c(x) = x - x^{1/(1+\varepsilon)}$

the numerical experiments. In Figures 3 and 4, the optimal buffer limit and optimal cost, respectively, are plotted on the vertical axis against $\varepsilon$ and $1/s$. The other parameters are fixed at $\kappa = 1$, $M = 1$ and $h_0 = 1$. As in the first example, there is an apparent lack of monotonicity of the buffer limit with respect to $\varepsilon$ when the value of $s$ is fixed. Note that both the optimal cost and optimal buffer size are essentially insensitive to the value of $s$. Clearly, as the service cost function decreases, i.e., as $\varepsilon$ increases, the optimal value decreases.

The algorithm presented in this paper is quite efficient, in the following sense. In Step 1 of the algorithm, one need only perform a one-dimensional search in $z$ to satisfy the terminating optimality equations. Generally speaking, the computation of $\varphi(\cdot)$ is not intensive, and thus Step 1 is computationally inexpensive. The number of such steps which must be performed is simply linear in the truncation level. Hence, for most practical problems, the algorithm will terminate, or converge, in less than a second on a standard desktop computer. Although we did not do so, one could also fine-tune the algorithm to re-use the one-dimensional search results, which would increase the algorithmic efficiency even more.

6 Optimality of Deterministic Policies

In George and Harrison's paper an important question regarding the control of their model was left unresolved. In particular, it remains to show that the search for an optimal policy can be restricted to deterministic stationary policies. For a large class of MDPs this question has been settled by previous work. For example, control problems with adjustable arrival and service rates along with additional constraints are discussed in [5, Section 6], and the existence of an optimal stationary policy is established. However, there it is assumed that the service

rates are bounded. When the action space is uncountable and non-compact, there are fewer results. As mentioned earlier, Chapter 5 of Hernández-Lerma and Lasserre [6] provides conditions for the optimality of stationary deterministic policies. The primary requirement there is for the transition probabilities to be setwise continuous. However, the results are given only for discrete-time MDPs.

From basic principles, we now proceed to show that randomized policies need not be considered, assuming that one restricts to stationary policies a priori. So, we now consider an arbitrary stationary policy, which may be randomized. When such a policy is considered, at any point of time the control can be considered to have two components: admission control and service rate control. Service rate control is applied only when a customer is present in the system, and admission control is applied only when a new customer arrives. When control rates are random, the associated decision process is a continuous-time semi-Markov decision process (SMDP). Using the SMDP framework, we directly show that whenever a solution to the original optimality equations exists, and there is a corresponding stationary ergodic policy, then no randomized policy can do better. Hence, it is sufficient to restrict the search to deterministic policies.

We model the randomized decision in each state as follows. After a system event, the service rate $X_n$ is assumed to be a random variable with c.d.f. $F_n(\cdot)$. Clearly, $X_n$ must be restricted to take values in $A$. In this case, a system event is an arrival or departure of a customer. Furthermore, when there is an arrival to a system which is in state $n$, it is assumed that the customer is accepted with probability $q_n$. For clarity, note that we allow the controller to choose a new service rate $X_n$ (drawn from $F_n(\cdot)$) even if an incoming arrival is rejected. Thus the control decision now is to choose probabilities $\{q_n, n \ge 0\}$ and the c.d.f.'s $\{F_n(\cdot), n \ge 1\}$ which minimize the long-run cost. Specifically, when the number of customers in the system is $n \ge 0$, before an arrival or after a departure, the randomized policy is specified by $Q_n \equiv (q_n, F_n(\cdot))$, and the entire control policy is given by the functional vector $Q \equiv \{Q_n, n \ge 0\}$.

Below, the expectation operator is defined with respect to $Q$. So, for example, $E[X_n]$ is the expected service rate under this policy when there are $n$ jobs in the system. Under $Q$, we assume that $E[X_n] < \infty$ for all $n$. A policy with $E[X_n] = \infty$ will have an average cost per unit time equal to the cost when customers are always rejected in state $n$. Under a general stationary policy, the service time in state $n$ follows a distribution which is determined by $Q_n$. Given such a policy, it should be clear that the queue length process is a semi-Markov process. Under an ergodic policy $Q$ with $E[X_n] < \infty$, there exist steady-state probabilities $\nu_n$ which represent the long-run fraction of time the SMDP spends in state $n$. Note that $\nu_n > 0$ for states below the rejection threshold. Using standard ergodic theory for SMDPs, the long-run average cost under any ergodic stationary policy $Q$ is:
$$z(Q) = \sum_{n=0}^{\infty} \nu_n \left[ h_n + (1 - q_n)\kappa \right] + \sum_{n=1}^{\infty} \nu_n E[c(X_n)]. \qquad (34)$$

We now provide the extension to Theorem 1 which establishes the optimality of deterministic stationary policies.

Theorem 7. If there exist a $z < \infty$ and uniformly bounded $(y_1, y_2, \dots)$ satisfying the optimality equations (4b)–(6), and the corresponding policy as specified in Theorem 1 is ergodic, then $z \le z(Q)$ for every stationary policy $Q$.

Proof. Consider a fixed stationary policy $Q$ and a bounded solution $z, (y_1, y_2, \dots)$ to the optimality equations. Let $\mu_n := E[X_n]$ be the mean service rate in state $n$ under $Q$. First, one can derive the following detailed balance equations for the SMDP via the embedded Markov chain (which is a simple random walk with self-transitions):
$$q_n \nu_n = \nu_{n+1} \mu_{n+1} \quad \text{for } n \ge 0. \qquad (35)$$
Next, equations (7) and (8) imply:
$$x y_n - c(x) \le y_{n+1} + h_n - z, \quad x \in A,\ n \ge 1,$$
$$x y_n - c(x) \le \kappa + h_n - z, \quad x \in A,\ n \ge 1.$$
Taking the convex combination of these inequalities induced by $(q_n, 1 - q_n)$ yields:
$$x y_n - c(x) \le (1 - q_n)\kappa + q_n y_{n+1} + h_n - z.$$
This inequality holds for each $n \ge 1$ and $x \in A$; thus it also holds for each realization of $X_n$ taking values in $A$, i.e.,
$$X_n y_n - c(X_n) \le (1 - q_n)\kappa + q_n y_{n+1} + h_n - z \quad \text{w.p. 1},$$
for each $n \ge 1$. Taking expectations gives:
$$\mu_n y_n - E[c(X_n)] \le (1 - q_n)\kappa + q_n y_{n+1} + h_n - z, \quad n \ge 1. \qquad (36)$$
Next, by definition, if $n = 0$ is not a terminating state under the policy corresponding to $z, (y_1, y_2, \dots)$, then $y_1 < \kappa$. If $n = 0$ is a terminating state, then it is straightforward to check that $y_1 = \kappa$. Thus, in either case, $y_1 \le \kappa$. Combining this with (4b) yields $z - h_0 \le \kappa$. Taking a convex combination of (4b) and this last inequality gives
$$z - h_0 \le q_0 y_1 + (1 - q_0)\kappa. \qquad (37)$$
Now, multiplying (36) by $\nu_n$ for each $n \ge 1$, we obtain:
$$\nu_n \mu_n y_n - \nu_n E[c(X_n)] \le \nu_n (1 - q_n)\kappa + \nu_n q_n y_{n+1} + \nu_n [h_n - z].$$
Applying (35) gives
$$q_{n-1} \nu_{n-1} y_n - q_n \nu_n y_{n+1} - \nu_n E[c(X_n)] \le \nu_n (1 - q_n)\kappa + \nu_n [h_n - z], \quad n \ge 1.$$
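To make (34) and (35) concrete, the following sketch evaluates the long-run average cost of a stationary policy on a finite buffer. It assumes an arrival rate normalized to one, deterministic service rates $\mu_n$ (so $E[c(X_n)] = c(\mu_n)$), and admission probabilities with $q_N = 0$ at the buffer limit; these simplifications are assumptions for illustration, not the paper's general randomized setting.

```python
# Sketch: evaluate the long-run average cost (34) of a stationary policy on
# states 0..N.  Assumptions: arrival rate normalized to 1, deterministic
# service rates mu_n (so E[c(X_n)] = c(mu_n)), and q[N] = 0 so the chain is
# confined to a finite buffer.  The nu_n solve the balance equations
# q_n * nu_n = nu_{n+1} * mu_{n+1} of (35), normalized to sum to one.

def average_cost(q, mu, h, kappa, c):
    """q[n], h[n] for n = 0..N; mu[n] for n = 1..N (mu[0] unused); q[N] = 0."""
    N = len(q) - 1
    nu = [1.0]
    for n in range(N):                       # nu_{n+1} = q_n * nu_n / mu_{n+1}
        nu.append(q[n] * nu[n] / mu[n + 1])
    total = sum(nu)
    nu = [v / total for v in nu]             # normalize to a distribution
    # Holding plus rejection cost, then service cost, as in (34):
    cost = sum(nu[n] * (h[n] + (1.0 - q[n]) * kappa) for n in range(N + 1))
    cost += sum(nu[n] * c(mu[n]) for n in range(1, N + 1))
    return cost
```

For a one-slot buffer with $q_0 = 1$, $\mu_1 = 2$, the balance equation gives $\nu = (2/3, 1/3)$, and the cost is the corresponding mixture of holding, rejection, and service costs.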


More information

Sy D. Friedman. August 28, 2001

Sy D. Friedman. August 28, 2001 0 # and Inner Models Sy D. Friedman August 28, 2001 In this paper we examine the cardinal structure of inner models that satisfy GCH but do not contain 0 #. We show, assuming that 0 # exists, that such

More information

6. Martingales. = Zn. Think of Z n+1 as being a gambler s earnings after n+1 games. If the game if fair, then E [ Z n+1 Z n

6. Martingales. = Zn. Think of Z n+1 as being a gambler s earnings after n+1 games. If the game if fair, then E [ Z n+1 Z n 6. Martingales For casino gamblers, a martingale is a betting strategy where (at even odds) the stake doubled each time the player loses. Players follow this strategy because, since they will eventually

More information

1 Consumption and saving under uncertainty

1 Consumption and saving under uncertainty 1 Consumption and saving under uncertainty 1.1 Modelling uncertainty As in the deterministic case, we keep assuming that agents live for two periods. The novelty here is that their earnings in the second

More information

Laws of probabilities in efficient markets

Laws of probabilities in efficient markets Laws of probabilities in efficient markets Vladimir Vovk Department of Computer Science Royal Holloway, University of London Fifth Workshop on Game-Theoretic Probability and Related Topics 15 November

More information

So we turn now to many-to-one matching with money, which is generally seen as a model of firms hiring workers

So we turn now to many-to-one matching with money, which is generally seen as a model of firms hiring workers Econ 805 Advanced Micro Theory I Dan Quint Fall 2009 Lecture 20 November 13 2008 So far, we ve considered matching markets in settings where there is no money you can t necessarily pay someone to marry

More information

Building Infinite Processes from Regular Conditional Probability Distributions

Building Infinite Processes from Regular Conditional Probability Distributions Chapter 3 Building Infinite Processes from Regular Conditional Probability Distributions Section 3.1 introduces the notion of a probability kernel, which is a useful way of systematizing and extending

More information

COMBINATORICS OF REDUCTIONS BETWEEN EQUIVALENCE RELATIONS

COMBINATORICS OF REDUCTIONS BETWEEN EQUIVALENCE RELATIONS COMBINATORICS OF REDUCTIONS BETWEEN EQUIVALENCE RELATIONS DAN HATHAWAY AND SCOTT SCHNEIDER Abstract. We discuss combinatorial conditions for the existence of various types of reductions between equivalence

More information

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete)

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Ying Chen Hülya Eraslan March 25, 2016 Abstract We analyze a dynamic model of judicial decision

More information

Final exam solutions

Final exam solutions EE365 Stochastic Control / MS&E251 Stochastic Decision Models Profs. S. Lall, S. Boyd June 5 6 or June 6 7, 2013 Final exam solutions This is a 24 hour take-home final. Please turn it in to one of the

More information

Stochastic Optimal Control

Stochastic Optimal Control Stochastic Optimal Control Lecturer: Eilyan Bitar, Cornell ECE Scribe: Kevin Kircher, Cornell MAE These notes summarize some of the material from ECE 5555 (Stochastic Systems) at Cornell in the fall of

More information

Lecture 17: More on Markov Decision Processes. Reinforcement learning

Lecture 17: More on Markov Decision Processes. Reinforcement learning Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture

More information

Dynamic Programming: An overview. 1 Preliminaries: The basic principle underlying dynamic programming

Dynamic Programming: An overview. 1 Preliminaries: The basic principle underlying dynamic programming Dynamic Programming: An overview These notes summarize some key properties of the Dynamic Programming principle to optimize a function or cost that depends on an interval or stages. This plays a key role

More information

The value of foresight

The value of foresight Philip Ernst Department of Statistics, Rice University Support from NSF-DMS-1811936 (co-pi F. Viens) and ONR-N00014-18-1-2192 gratefully acknowledged. IMA Financial and Economic Applications June 11, 2018

More information

Handout 8: Introduction to Stochastic Dynamic Programming. 2 Examples of Stochastic Dynamic Programming Problems

Handout 8: Introduction to Stochastic Dynamic Programming. 2 Examples of Stochastic Dynamic Programming Problems SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 8: Introduction to Stochastic Dynamic Programming Instructor: Shiqian Ma March 10, 2014 Suggested Reading: Chapter 1 of Bertsekas,

More information

Lecture 2: Making Good Sequences of Decisions Given a Model of World. CS234: RL Emma Brunskill Winter 2018

Lecture 2: Making Good Sequences of Decisions Given a Model of World. CS234: RL Emma Brunskill Winter 2018 Lecture 2: Making Good Sequences of Decisions Given a Model of World CS234: RL Emma Brunskill Winter 218 Human in the loop exoskeleton work from Steve Collins lab Class Structure Last Time: Introduction

More information

Pricing Problems under the Markov Chain Choice Model

Pricing Problems under the Markov Chain Choice Model Pricing Problems under the Markov Chain Choice Model James Dong School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853, USA jd748@cornell.edu A. Serdar Simsek

More information

EE266 Homework 5 Solutions

EE266 Homework 5 Solutions EE, Spring 15-1 Professor S. Lall EE Homework 5 Solutions 1. A refined inventory model. In this problem we consider an inventory model that is more refined than the one you ve seen in the lectures. The

More information

PAULI MURTO, ANDREY ZHUKOV

PAULI MURTO, ANDREY ZHUKOV GAME THEORY SOLUTION SET 1 WINTER 018 PAULI MURTO, ANDREY ZHUKOV Introduction For suggested solution to problem 4, last year s suggested solutions by Tsz-Ning Wong were used who I think used suggested

More information

On the Lower Arbitrage Bound of American Contingent Claims

On the Lower Arbitrage Bound of American Contingent Claims On the Lower Arbitrage Bound of American Contingent Claims Beatrice Acciaio Gregor Svindland December 2011 Abstract We prove that in a discrete-time market model the lower arbitrage bound of an American

More information

The ruin probabilities of a multidimensional perturbed risk model

The ruin probabilities of a multidimensional perturbed risk model MATHEMATICAL COMMUNICATIONS 231 Math. Commun. 18(2013, 231 239 The ruin probabilities of a multidimensional perturbed risk model Tatjana Slijepčević-Manger 1, 1 Faculty of Civil Engineering, University

More information

THE OPTIMAL ASSET ALLOCATION PROBLEMFOR AN INVESTOR THROUGH UTILITY MAXIMIZATION

THE OPTIMAL ASSET ALLOCATION PROBLEMFOR AN INVESTOR THROUGH UTILITY MAXIMIZATION THE OPTIMAL ASSET ALLOCATION PROBLEMFOR AN INVESTOR THROUGH UTILITY MAXIMIZATION SILAS A. IHEDIOHA 1, BRIGHT O. OSU 2 1 Department of Mathematics, Plateau State University, Bokkos, P. M. B. 2012, Jos,

More information

LECTURE 2: MULTIPERIOD MODELS AND TREES

LECTURE 2: MULTIPERIOD MODELS AND TREES LECTURE 2: MULTIPERIOD MODELS AND TREES 1. Introduction One-period models, which were the subject of Lecture 1, are of limited usefulness in the pricing and hedging of derivative securities. In real-world

More information

B. Online Appendix. where ɛ may be arbitrarily chosen to satisfy 0 < ɛ < s 1 and s 1 is defined in (B1). This can be rewritten as

B. Online Appendix. where ɛ may be arbitrarily chosen to satisfy 0 < ɛ < s 1 and s 1 is defined in (B1). This can be rewritten as B Online Appendix B1 Constructing examples with nonmonotonic adoption policies Assume c > 0 and the utility function u(w) is increasing and approaches as w approaches 0 Suppose we have a prior distribution

More information

Admissioncontrolwithbatcharrivals

Admissioncontrolwithbatcharrivals Admissioncontrolwithbatcharrivals E. Lerzan Örmeci Department of Industrial Engineering Koç University Sarıyer 34450 İstanbul-Turkey Apostolos Burnetas Department of Operations Weatherhead School of Management

More information

On the Optimality of a Family of Binary Trees Techical Report TR

On the Optimality of a Family of Binary Trees Techical Report TR On the Optimality of a Family of Binary Trees Techical Report TR-011101-1 Dana Vrajitoru and William Knight Indiana University South Bend Department of Computer and Information Sciences Abstract In this

More information

Chapter 9 Dynamic Models of Investment

Chapter 9 Dynamic Models of Investment George Alogoskoufis, Dynamic Macroeconomic Theory, 2015 Chapter 9 Dynamic Models of Investment In this chapter we present the main neoclassical model of investment, under convex adjustment costs. This

More information

based on two joint papers with Sara Biagini Scuola Normale Superiore di Pisa, Università degli Studi di Perugia

based on two joint papers with Sara Biagini Scuola Normale Superiore di Pisa, Università degli Studi di Perugia Marco Frittelli Università degli Studi di Firenze Winter School on Mathematical Finance January 24, 2005 Lunteren. On Utility Maximization in Incomplete Markets. based on two joint papers with Sara Biagini

More information

Online Appendix Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared. A. Proofs

Online Appendix Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared. A. Proofs Online Appendi Optimal Time-Consistent Government Debt Maturity D. Debortoli, R. Nunes, P. Yared A. Proofs Proof of Proposition 1 The necessity of these conditions is proved in the tet. To prove sufficiency,

More information

Total Reward Stochastic Games and Sensitive Average Reward Strategies

Total Reward Stochastic Games and Sensitive Average Reward Strategies JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 98, No. 1, pp. 175-196, JULY 1998 Total Reward Stochastic Games and Sensitive Average Reward Strategies F. THUIJSMAN1 AND O, J. VaiEZE2 Communicated

More information

KIER DISCUSSION PAPER SERIES

KIER DISCUSSION PAPER SERIES KIER DISCUSSION PAPER SERIES KYOTO INSTITUTE OF ECONOMIC RESEARCH http://www.kier.kyoto-u.ac.jp/index.html Discussion Paper No. 657 The Buy Price in Auctions with Discrete Type Distributions Yusuke Inami

More information

Comparison of proof techniques in game-theoretic probability and measure-theoretic probability

Comparison of proof techniques in game-theoretic probability and measure-theoretic probability Comparison of proof techniques in game-theoretic probability and measure-theoretic probability Akimichi Takemura, Univ. of Tokyo March 31, 2008 1 Outline: A.Takemura 0. Background and our contributions

More information

CONVERGENCE OF OPTION REWARDS FOR MARKOV TYPE PRICE PROCESSES MODULATED BY STOCHASTIC INDICES

CONVERGENCE OF OPTION REWARDS FOR MARKOV TYPE PRICE PROCESSES MODULATED BY STOCHASTIC INDICES CONVERGENCE OF OPTION REWARDS FOR MARKOV TYPE PRICE PROCESSES MODULATED BY STOCHASTIC INDICES D. S. SILVESTROV, H. JÖNSSON, AND F. STENBERG Abstract. A general price process represented by a two-component

More information

Pricing Dynamic Solvency Insurance and Investment Fund Protection

Pricing Dynamic Solvency Insurance and Investment Fund Protection Pricing Dynamic Solvency Insurance and Investment Fund Protection Hans U. Gerber and Gérard Pafumi Switzerland Abstract In the first part of the paper the surplus of a company is modelled by a Wiener process.

More information

Bounding Optimal Expected Revenues for Assortment Optimization under Mixtures of Multinomial Logits

Bounding Optimal Expected Revenues for Assortment Optimization under Mixtures of Multinomial Logits Bounding Optimal Expected Revenues for Assortment Optimization under Mixtures of Multinomial Logits Jacob Feldman School of Operations Research and Information Engineering, Cornell University, Ithaca,

More information

INTERIM CORRELATED RATIONALIZABILITY IN INFINITE GAMES

INTERIM CORRELATED RATIONALIZABILITY IN INFINITE GAMES INTERIM CORRELATED RATIONALIZABILITY IN INFINITE GAMES JONATHAN WEINSTEIN AND MUHAMET YILDIZ A. We show that, under the usual continuity and compactness assumptions, interim correlated rationalizability

More information

Calibration Estimation under Non-response and Missing Values in Auxiliary Information

Calibration Estimation under Non-response and Missing Values in Auxiliary Information WORKING PAPER 2/2015 Calibration Estimation under Non-response and Missing Values in Auxiliary Information Thomas Laitila and Lisha Wang Statistics ISSN 1403-0586 http://www.oru.se/institutioner/handelshogskolan-vid-orebro-universitet/forskning/publikationer/working-papers/

More information

Computational Independence

Computational Independence Computational Independence Björn Fay mail@bfay.de December 20, 2014 Abstract We will introduce different notions of independence, especially computational independence (or more precise independence by

More information

IEOR E4004: Introduction to OR: Deterministic Models

IEOR E4004: Introduction to OR: Deterministic Models IEOR E4004: Introduction to OR: Deterministic Models 1 Dynamic Programming Following is a summary of the problems we discussed in class. (We do not include the discussion on the container problem or the

More information

Course notes for EE394V Restructured Electricity Markets: Locational Marginal Pricing

Course notes for EE394V Restructured Electricity Markets: Locational Marginal Pricing Course notes for EE394V Restructured Electricity Markets: Locational Marginal Pricing Ross Baldick Copyright c 2018 Ross Baldick www.ece.utexas.edu/ baldick/classes/394v/ee394v.html Title Page 1 of 160

More information

Economics 2010c: Lecture 4 Precautionary Savings and Liquidity Constraints

Economics 2010c: Lecture 4 Precautionary Savings and Liquidity Constraints Economics 2010c: Lecture 4 Precautionary Savings and Liquidity Constraints David Laibson 9/11/2014 Outline: 1. Precautionary savings motives 2. Liquidity constraints 3. Application: Numerical solution

More information

Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey

Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey By Klaus D Schmidt Lehrstuhl für Versicherungsmathematik Technische Universität Dresden Abstract The present paper provides

More information

Information aggregation for timing decision making.

Information aggregation for timing decision making. MPRA Munich Personal RePEc Archive Information aggregation for timing decision making. Esteban Colla De-Robertis Universidad Panamericana - Campus México, Escuela de Ciencias Económicas y Empresariales

More information

Kutay Cingiz, János Flesch, P. Jean-Jacques Herings, Arkadi Predtetchinski. Doing It Now, Later, or Never RM/15/022

Kutay Cingiz, János Flesch, P. Jean-Jacques Herings, Arkadi Predtetchinski. Doing It Now, Later, or Never RM/15/022 Kutay Cingiz, János Flesch, P Jean-Jacques Herings, Arkadi Predtetchinski Doing It Now, Later, or Never RM/15/ Doing It Now, Later, or Never Kutay Cingiz János Flesch P Jean-Jacques Herings Arkadi Predtetchinski

More information

AM 121: Intro to Optimization Models and Methods

AM 121: Intro to Optimization Models and Methods AM 121: Intro to Optimization Models and Methods Lecture 18: Markov Decision Processes Yiling Chen and David Parkes Lesson Plan Markov decision processes Policies and Value functions Solving: average reward,

More information

MAT25 LECTURE 10 NOTES. = a b. > 0, there exists N N such that if n N, then a n a < ɛ

MAT25 LECTURE 10 NOTES. = a b. > 0, there exists N N such that if n N, then a n a < ɛ MAT5 LECTURE 0 NOTES NATHANIEL GALLUP. Algebraic Limit Theorem Theorem : Algebraic Limit Theorem (Abbott Theorem.3.3) Let (a n ) and ( ) be sequences of real numbers such that lim n a n = a and lim n =

More information

The Stigler-Luckock model with market makers

The Stigler-Luckock model with market makers Prague, January 7th, 2017. Order book Nowadays, demand and supply is often realized by electronic trading systems storing the information in databases. Traders with access to these databases quote their

More information

Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh

Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh Omitted Proofs LEMMA 5: Function ˆV is concave with slope between 1 and 0. PROOF: The fact that ˆV (w) is decreasing in

More information

Fundamental Theorems of Asset Pricing. 3.1 Arbitrage and risk neutral probability measures

Fundamental Theorems of Asset Pricing. 3.1 Arbitrage and risk neutral probability measures Lecture 3 Fundamental Theorems of Asset Pricing 3.1 Arbitrage and risk neutral probability measures Several important concepts were illustrated in the example in Lecture 2: arbitrage; risk neutral probability

More information

4 Reinforcement Learning Basic Algorithms

4 Reinforcement Learning Basic Algorithms Learning in Complex Systems Spring 2011 Lecture Notes Nahum Shimkin 4 Reinforcement Learning Basic Algorithms 4.1 Introduction RL methods essentially deal with the solution of (optimal) control problems

More information

In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure

In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure In Discrete Time a Local Martingale is a Martingale under an Equivalent Probability Measure Yuri Kabanov 1,2 1 Laboratoire de Mathématiques, Université de Franche-Comté, 16 Route de Gray, 253 Besançon,

More information

E-companion to Coordinating Inventory Control and Pricing Strategies for Perishable Products

E-companion to Coordinating Inventory Control and Pricing Strategies for Perishable Products E-companion to Coordinating Inventory Control and Pricing Strategies for Perishable Products Xin Chen International Center of Management Science and Engineering Nanjing University, Nanjing 210093, China,

More information

Sublinear Time Algorithms Oct 19, Lecture 1

Sublinear Time Algorithms Oct 19, Lecture 1 0368.416701 Sublinear Time Algorithms Oct 19, 2009 Lecturer: Ronitt Rubinfeld Lecture 1 Scribe: Daniel Shahaf 1 Sublinear-time algorithms: motivation Twenty years ago, there was practically no investigation

More information

Assets with possibly negative dividends

Assets with possibly negative dividends Assets with possibly negative dividends (Preliminary and incomplete. Comments welcome.) Ngoc-Sang PHAM Montpellier Business School March 12, 2017 Abstract The paper introduces assets whose dividends can

More information

Macroeconomics and finance

Macroeconomics and finance Macroeconomics and finance 1 1. Temporary equilibrium and the price level [Lectures 11 and 12] 2. Overlapping generations and learning [Lectures 13 and 14] 2.1 The overlapping generations model 2.2 Expectations

More information