ADAPTIVE SIMULATION BUDGET ALLOCATION FOR DETERMINING THE BEST DESIGN. Qi Fan Jiaqiao Hu


Proceedings of the 2013 Winter Simulation Conference
R. Pasupathy, S.-H. Kim, A. Tolk, R. Hill, and M. E. Kuhl, eds.

Qi Fan, Jiaqiao Hu
Department of Applied Mathematics and Statistics, State University of New York, Stony Brook, NY 11794, USA

ABSTRACT

We consider the problem of allocating a given simulation budget among a set of design alternatives in order to maximize the probability of correct selection. Prior work has focused on deriving static rules that predetermine the number of simulation replications to be allocated to each design. In contrast, we formulate the problem as a Markov decision process (MDP) and propose a dynamic myopic scheme to adaptively allocate simulation samples based on current estimates of the means and variances of the design alternatives. We provide numerical examples to illustrate the performance of the proposed dynamic allocation rule.

1 INTRODUCTION

We consider the problem of identifying the best design from a finite set of design alternatives. Each design is assumed to involve random uncertainty and requires stochastic simulation for performance estimation. When simulation is expensive and the number of design alternatives is relatively small, a well-known class of procedures for solving such problems is ranking and selection, where the goal is to determine the number of simulation runs to be allocated to each design in order to guarantee a pre-specified probability of correct selection. Examples of ranking and selection methods include Rinott's two-stage indifference-zone procedure (Rinott 1978), the expected value of information procedure (Chick and Inoue 2001), and the KN family of algorithms (Kim and Nelson 2006); reviews and advances in this field can be found in, e.g., Goldsman and Nelson (1998), Nelson et al. (2001), and Kim and Nelson (2007).
Chen (1995) approached the problem from a different perspective by determining the best allocation of a given simulation budget among the designs in order to maximize the probability of correct selection (PCS). In particular, Chen (1996) proposed a Bayesian approach to estimate the design performance measures based on prior sampling information and derived a lower bound for the correct selection probability. The idea was subsequently used in Chen, Chen, and Yücesan (2000) and Chen et al. (2000) to develop analytical allocation rules, called Optimal Computing Budget Allocation (OCBA), that asymptotically optimize the lower bound of the probability of correct selection. More recently, Chen et al. (2006) also investigated a dynamic allocation rule based on a perfect information assumption, and suggested that a dynamic scheme could dramatically improve the performance of static allocation rules.

In this paper, we consider the setting of OCBA, i.e., maximizing PCS under a simulation budget constraint. However, unlike previous work, which has primarily focused on static rules that predetermine the number of simulation replications to be allocated to each design, we investigate a dynamic programming approach that adaptively allocates simulation samples based on current estimates of the means and variances of the designs. Our work can be viewed as an extension of that of Chen et al. (2006), which is based on perfect information. In particular, we model the simulation allocation process as a Markov decision process (MDP) with a terminal cost function. The state variable of the MDP model consists of the current sample mean of each design and the number of simulation replications allocated to each of them, under the assumption that the design variances are known. These variances can then be estimated by sample variances. Since analytically solving the MDP model is intractable, we further develop an upper bound to the optimal value function and propose a one-step lookahead index policy that myopically minimizes the sum of the current one-stage cost and the upper bound of the optimal value function. Our preliminary numerical results indicate that our approach performs competitively with OCBA, especially when the simulation budget is small.

The rest of the paper is organized as follows. In Section 2, we define notation and describe the problem setting. In Section 3, we formulate the problem as an MDP, provide an upper bound to the optimal value function, and derive a myopic index policy for simulation allocation. Numerical examples are provided in Section 4 to illustrate the performance of our approach. Finally, we conclude the paper in Section 5.

2 PROBLEM SETTING

Consider the following optimization problem:

$$\min_{i \in \Theta} J_i = \min_{i \in \Theta} E[L_i(\xi)],$$

where $\Theta = \{1, 2, \ldots, k\}$ is a finite set of design indices and $J_i$ is the true performance measure of design $i$. Note that $J_i$ itself is the expectation of the sample performance $L_i(\xi)$, where the expectation is understood with respect to the distribution of the random variable $\xi$ representing the stochastic uncertainty of the design. We assume that the expectation cannot be evaluated exactly; however, for a given simulation budget $t$, the performance measure $J_i$ can be estimated by the sample mean

$$\bar J_i^t = \frac{1}{N_i^t} \sum_{j=1}^{N_i^t} L_i(\xi_{ij}),$$

where $N_i^t$ represents the number of replication runs allocated to design $i$ and $\xi_{ij}$ represents the $j$th realization of $\xi$ (simulation sample path) from design $i$. Throughout this paper, we assume that the simulation outputs are independent of each other.

We begin by defining some notation.

$\sigma_i^2$: the true variance of design $i$, i.e., $\sigma_i^2 = \mathrm{Var}(L_i(\xi))$, which can be estimated by its sample variance.
$b_t$: the index of the design that shows the current best sample performance after $t$ simulation replications have been allocated, i.e., $\bar J^t_{b_t} \le \min_i \bar J_i^t$.
$s_t$: the index of the design that shows the second best sample performance after $t$ simulation runs have been allocated, i.e., $\bar J^t_{s_t} \le \min_{i \ne b_t} \bar J_i^t$.
$\delta^t_{b_t,i} = \bar J^t_{b_t} - \bar J^t_i$: the difference between the sample performance of the current best and the $i$th designs.
$\sigma^t_{b_t,i} = \sqrt{\sigma^2_{b_t}/N^t_{b_t} + \sigma_i^2/N_i^t}$: the standard deviation of the random variable $\delta^t_{b_t,i}$.

Define the event of correct selection (CS) as the event that design $b_t$ (i.e., the one with the current best sample performance) is actually the best design. For a given simulation budget, the goal is to find a way to maximize the probability of correct selection $P\{CS\}$. We follow the Bayesian approach introduced in Chen (1996) and assume that the output performance measure $L_i(\xi)$ is normally distributed for each design. Let $\tilde J_i$ be a random variable whose distribution is the posterior distribution of design $i$ given the previous sampling information:

$$P\{\tilde J_i \in \cdot\} = P\{J_i \in \cdot \mid L_i(\xi_{ij}),\ j = 1, 2, \ldots, N_i^t\}, \quad i = 1, 2, \ldots, k.$$

It can be shown that if no prior knowledge is given on the performance of each design, $\tilde J_i$ has the normal distribution $\tilde J_i \sim N(\bar J_i^t, \sigma_i^2/N_i^t)$. Thus, a lower bound to $P\{CS\}$, called the approximate probability of correct selection (APCS), can be obtained by applying Bonferroni's inequality:

$$\begin{aligned}
P\{CS\} &= P\{J_{b_t} < J_i,\ i \ne b_t \mid L_i(\xi_{ij}),\ j = 1, 2, \ldots, N_i^t,\ i = 1, 2, \ldots, k\} \\
&= P\{\tilde J_{b_t} < \tilde J_i,\ i \ne b_t\} \\
&= P\Big\{\bigcap_{i \ne b_t} \big(\tilde J_{b_t} - \tilde J_i < 0\big)\Big\} \\
&\ge 1 - \sum_{i \ne b_t} P\{\tilde J_{b_t} - \tilde J_i > 0\} \\
&= 1 - \sum_{i \ne b_t} \Phi\!\left(\frac{\delta^t_{b_t,i}}{\sigma^t_{b_t,i}}\right) = \mathrm{APCS},
\end{aligned}$$

where throughout this paper, we use $\Phi$ and $\phi$ to denote the c.d.f. and p.d.f. of the standard normal distribution. Since $P\{CS\}$ is difficult to evaluate, whereas APCS can be computed analytically without resorting to additional simulation effort, Chen et al. (2000) proposed to use APCS as an approximation to the true probability of correct selection. For a given budget $T$, the goal is to find a simulation budget allocation that solves the following optimization problem:

$$\min_{N_1^T, N_2^T, \ldots, N_k^T} \ \sum_{i \ne b_T} \Phi\!\left(\frac{\delta^T_{b_T,i}}{\sigma^T_{b_T,i}}\right) \quad \text{s.t.} \quad \sum_{i=1}^{k} N_i^T = T \ \text{ and } \ N_i^T \ge 0. \qquad (1)$$

3 A DYNAMIC BUDGET ALLOCATION PROCEDURE

Motivated by the work of Hu et al. (2011) and that of Chen et al. (2006), we aim to solve the allocation problem by modeling the allocation process as an MDP with a terminal cost function. So instead of allocating all $T$ simulation replications at the beginning as in (1), we derive a dynamic policy that sequentially allocates the budget based on the estimated performance measures of all designs as well as the current sampling information.

3.1 Modeling the Allocation Process as an MDP

Given a total of $T$ simulation samples, we start by assigning at step $t = 0$ a small number $n_0$ of simulation replications to each of the $k$ designs. Define the state variable $w_t = (\bar J_i^t, N_i^t,\ i = 1, 2, \ldots, k)^T$ as a vector containing the current performance estimates of all designs and the number of times each design has been sampled thus far, where $N_i^0 = n_0$ and $\bar J_i^0 = \frac{1}{n_0} \sum_{j=1}^{n_0} L_i(\xi_{ij})$ for all $i$. Next consider a sequential allocation policy $\pi$ that determines, at each step $t = 1, 2, \ldots, T - k n_0$, based on $w_t$, whether one more replication run should be allocated to one of the designs or the entire allocation process should be terminated.

Let $\pi_t(w_t) \in \{0, 1, \ldots, k\}$ be the action taken at step $t$; then

$$\pi_t(w_t) = \begin{cases} i & \text{allocate one simulation run to design } i,\ i = 1, 2, \ldots, k, \\ 0 & \text{stop the allocation process.} \end{cases} \qquad (2)$$
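The APCS bound of Section 2 depends only on quantities stored in the state (sample means and replication counts) plus the design variances, so it can be evaluated directly. The following is a minimal sketch under stated assumptions: the function names `norm_cdf` and `apcs` are illustrative and not from the paper, designs are indexed from 0 rather than 1, and variances are treated as known (or replaced by sample variances).

```python
import math


def norm_cdf(x):
    """Standard normal c.d.f., written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def apcs(means, variances, counts):
    """Approximate probability of correct selection (smaller mean is better).

    means[i], variances[i], counts[i] are the sample mean, the variance
    (assumed known or estimated), and the number of replications of design i.
    """
    k = len(means)
    b = min(range(k), key=lambda i: means[i])  # current best design
    total = 0.0
    for i in range(k):
        if i == b:
            continue
        delta = means[b] - means[i]  # delta_{b,i}, never positive
        sigma = math.sqrt(variances[b] / counts[b] + variances[i] / counts[i])
        # If both designs are deterministic, the posterior comparison is
        # settled and the term P{J_b - J_i > 0} is zero.
        if sigma > 0.0:
            total += norm_cdf(delta / sigma)
    return 1.0 - total


if __name__ == "__main__":
    # Hypothetical inputs: three designs with ten replications each.
    print(apcs([0.0, 0.4, 0.4], [0.0, 3.0, 3.0], [10, 10, 10]))
```

Because each term shrinks as the corresponding replication counts grow, adding samples to any design can only tighten the bound when the sample means stay fixed; which design tightens it most is what an allocation rule has to decide.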

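For every allocation policy $\pi$ described above, it can be seen that $\{w_t\}$ is a Markov chain with the following state transition dynamics:

$$\bar J_i^{t+1} = \frac{N_i^t \bar J_i^t + Y_i I_{\{a=i\}}}{N_i^t + I_{\{a=i\}}} \quad \text{for } i = 1, 2, \ldots, k, \qquad (3)$$

$$N_i^{t+1} = N_i^t + I_{\{a=i\}} \quad \text{for } i = 1, 2, \ldots, k, \qquad (4)$$

where $I$ is the indicator function, $a$ is the action taken at step $t$ under $\pi$, and $Y_i$ is the simulation output performance measure of design $i$ after the additional allocation. Based on (3) and (4), the updating formula for the current best sample mean is given by

$$\bar J^{t+1}_{b_{t+1}} = \begin{cases} \bar J^t_{s_t} - \left(\bar J^t_{s_t} - \dfrac{N^t_{b_t} \bar J^t_{b_t} + Y_{b_t}}{N^t_{b_t} + 1}\right)^{+} & \text{if } a = b_t, \\[2ex] \bar J^t_{b_t} - \left(\bar J^t_{b_t} - \dfrac{N^t_a \bar J^t_a + Y_a}{N^t_a + 1}\right)^{+} & \text{if } a \ne b_t, \end{cases} \qquad (5)$$

where $z^+ = \max\{0, z\}$. In both cases the right-hand side is the smaller of the previous best (or second best) sample mean and the updated sample mean of the design that was just simulated.

Let $w = (\bar J_i, N_i,\ i = 1, \ldots, k)^T$ be a given state and define $b = \arg\min_i \bar J_i$, $\delta_{b,i} = \bar J_b - \bar J_i$, and $\sigma_{b,i} = \sqrt{\sigma_b^2/N_b + \sigma_i^2/N_i}$. By associating the state-action pair $(w, a)$ with the one-stage cost function

$$R_t(w, a) = \begin{cases} 0 & \text{if } a \ne 0, \\ \sum_{i \ne b} \Phi\!\left(\dfrac{\delta_{b,i}}{\sigma_{b,i}}\right) & \text{if } a = 0, \end{cases}$$

and $R_s(w, a) = 0$ for all $s \ge t + 1$ whenever $\pi_t(w) = 0$, we obtain a $(T - k n_0)$-horizon MDP with the total cost

$$V^{\pi}(w) = E\left[\sum_{t=0}^{T - k n_0} R_t(w_t, \pi_t(w_t)) \,\Big|\, w_0 = w\right],$$

where the expectation is taken with respect to the probability measure induced by $\pi$. For a given initial state $w_0 = w$, the objective is to find an optimal simulation allocation policy $\pi^*$ that minimizes the total cost accumulated before the allocation process terminates.

3.2 A Myopic Index Policy

Since obtaining the exact optimal policy for the MDP model is intractable, we derive a myopic index policy using one-step lookahead optimization. The following result provides an upper bound to the optimal value function.

Theorem 1. Let $V_t$ be the optimal cost-to-go function at stage $t$ of the MDP defined in Section 3.1. For every $t = 0, 1, 2, \ldots, T - k n_0$ and $w = (\bar J_i, N_i,\ i = 1, \ldots, k)^T$, we have

$$V_t(w) \le \sum_{i \ne b} \Phi\!\left(\frac{\delta_{b,i}}{\sigma_{b,i}}\right). \qquad (6)$$

Proof. At the final stage $t = T - k n_0$, all $T$ replications have been exhausted, so the only option is to stop the allocation process. Therefore, we must have $\pi^*_{T - k n_0}(w) = 0$ and $V_{T - k n_0}(w) = \sum_{i \ne b} \Phi(\delta_{b,i}/\sigma_{b,i})$. It follows that when $t = T - k n_0 - 1$,

$$V_{T - k n_0 - 1}(w) = \min_a E_a\left[R_{T - k n_0 - 1}(w, a) + V_{T - k n_0}(w')\right],$$

where $a \in \{0, 1, 2, \ldots, k\}$ and $w' = (\bar J'_i, N'_i,\ i = 1, 2, \ldots, k)^T$ is the next state generated according to the transition dynamics (3) and (4) when action $a$ is taken; in particular, $\bar J'_i = \dfrac{N_i \bar J_i + Y_i I_{\{a=i\}}}{N_i + I_{\{a=i\}}}$ and $N'_i = N_i + I_{\{a=i\}}$. Therefore,

$$V_{T - k n_0 - 1}(w) = \min\left\{ \sum_{i \ne b} \Phi\!\left(\frac{\delta_{b,i}}{\sigma_{b,i}}\right),\ \min_{a \ne 0} E_a\left[\sum_{i \ne b'} \Phi\!\left(\frac{\delta'_{b',i}}{\sigma'_{b',i}}\right)\right] \right\} \le \sum_{i \ne b} \Phi\!\left(\frac{\delta_{b,i}}{\sigma_{b,i}}\right),$$

where $b' = \arg\min_i \bar J'_i$, $\delta'_{b',i} = \bar J'_{b'} - \bar J'_i$, and $\sigma'_{b',i} = \sqrt{\sigma_{b'}^2/N'_{b'} + \sigma_i^2/N'_i}$. Now proceed by induction and assume that $V_{t+1}(w) \le \sum_{i \ne b} \Phi(\delta_{b,i}/\sigma_{b,i})$ for all $w$. Then

$$V_t(w) = \min_a E_a\left[R_t(w, a) + V_{t+1}(w')\right] \le \min\left\{ \sum_{i \ne b} \Phi\!\left(\frac{\delta_{b,i}}{\sigma_{b,i}}\right),\ \min_{a \ne 0} E_a\left[\sum_{i \ne b'} \Phi\!\left(\frac{\delta'_{b',i}}{\sigma'_{b',i}}\right)\right] \right\} \le \sum_{i \ne b} \Phi\!\left(\frac{\delta_{b,i}}{\sigma_{b,i}}\right).$$

This completes the proof of the theorem.

Motivated by Theorem 1, we propose a simple stationary greedy policy that minimizes the sum of the current one-stage cost function and the upper bound of the optimal cost-to-go function at each step:

$$\pi(w) = \begin{cases} 0 & \text{if } \sum_{i \ne b} \Phi\!\left(\dfrac{\delta_{b,i}}{\sigma_{b,i}}\right) \le \min_{a \ne 0} E_a\left[\sum_{i \ne b'} \Phi\!\left(\dfrac{\delta'_{b',i}}{\sigma'_{b',i}}\right)\right], \\[2ex] \arg\min_{a \ne 0} E_a\left[\sum_{i \ne b'} \Phi\!\left(\dfrac{\delta'_{b',i}}{\sigma'_{b',i}}\right)\right] & \text{otherwise.} \end{cases} \qquad (7)$$

By connecting $\pi$ to (1), it is not difficult to see that if one more simulation sample is needed, such a policy myopically allocates the next sample in such a way that the APCS in the next step is maximized after the additional allocation. Since $\pi$ is an index policy, we can create an index for each action $a$ based on the current state $w$. Denote by $\mathrm{index}(a)$ the index of action $a$ and let $w' = (\bar J'_i, N'_i,\ i = 1, \ldots, k)^T$ be the sampled next state when action $a$ is taken. We have three different cases based on the transition dynamics (3), (4), and (5):

Case 1. If $a = 0$,

$$\mathrm{index}(0) = \sum_{i \ne b} \Phi\!\left(\frac{\bar J_b - \bar J_i}{\sqrt{\sigma_b^2/N_b + \sigma_i^2/N_i}}\right). \qquad (8)$$

Case 2. If $a = b$, the new best design $b'$ is either $b$ or the current second best design $s$, with best sample mean given by (5), and

$$\mathrm{index}(b) = E_b\left[\sum_{i \ne b'} \Phi\!\left(\frac{\delta'_{b',i}}{\sigma'_{b',i}}\right)\right]. \qquad (9)$$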

Case 3. If $a \ne b$ and $a \ne 0$, the new best design $b'$ is either $b$ or $a$, and

$$\mathrm{index}(a) = E_a\left[\sum_{i \ne b'} \Phi\!\left(\frac{\delta'_{b',i}}{\sigma'_{b',i}}\right)\right], \qquad (10)$$

where primes denote quantities evaluated at the next state $w'$, $b = \arg\min_i \bar J_i$, and $\sigma_b^2$ is the variance of design $b$.

Let $B_s = \left\{ \dfrac{N_b \bar J_b + Y_b}{N_b + 1} \le \bar J_s \right\}$. The $(\cdot)^+$ operator that enters (9) through (5) can be removed by conditioning on the event $B_s$:

$$\begin{aligned}
\mathrm{index}(b) ={} & E_b\left[\sum_{l \ne b} \Phi\!\left(\frac{\frac{N_b \bar J_b + Y_b}{N_b + 1} - \bar J_l}{\sqrt{\frac{\sigma_b^2}{N_b + 1} + \frac{\sigma_l^2}{N_l}}}\right) \Bigg|\, B_s\right] P(B_s) \\
& + E_b\left[\sum_{l \ne b, s} \Phi\!\left(\frac{\bar J_s - \bar J_l}{\sqrt{\frac{\sigma_s^2}{N_s} + \frac{\sigma_l^2}{N_l}}}\right) + \Phi\!\left(\frac{\bar J_s - \frac{N_b \bar J_b + Y_b}{N_b + 1}}{\sqrt{\frac{\sigma_s^2}{N_s} + \frac{\sigma_b^2}{N_b + 1}}}\right) \Bigg|\, B_s^c\right] P(B_s^c). \qquad (11)
\end{aligned}$$

Similarly, by conditioning on $B_b = \left\{ \dfrac{N_a \bar J_a + Y_a}{N_a + 1} \le \bar J_b \right\}$, the index in case 3 can be obtained as

$$\begin{aligned}
\mathrm{index}(a) ={} & E_a\left[\sum_{l \ne a} \Phi\!\left(\frac{\frac{N_a \bar J_a + Y_a}{N_a + 1} - \bar J_l}{\sqrt{\frac{\sigma_a^2}{N_a + 1} + \frac{\sigma_l^2}{N_l}}}\right) \Bigg|\, B_b\right] P(B_b) \\
& + E_a\left[\sum_{l \ne a, b} \Phi\!\left(\frac{\bar J_b - \bar J_l}{\sqrt{\frac{\sigma_b^2}{N_b} + \frac{\sigma_l^2}{N_l}}}\right) + \Phi\!\left(\frac{\bar J_b - \frac{N_a \bar J_a + Y_a}{N_a + 1}}{\sqrt{\frac{\sigma_b^2}{N_b} + \frac{\sigma_a^2}{N_a + 1}}}\right) \Bigg|\, B_b^c\right] P(B_b^c). \qquad (12)
\end{aligned}$$

3.3 A Dynamic Budget Allocation Algorithm

Note that when $a = 0$ the index in case 1 can be calculated analytically, whereas calculating the performance indices in (11) and (12) requires evaluating expectations with respect to the design distributions. One natural approach to evaluating or estimating these expectations is to use Taylor expansion. Taking (11) as an example, we can treat each respective term as a function of the sample mean $\frac{N_b \bar J_b + Y_b}{N_b + 1}$ and perform a first-order Taylor expansion of the term around $\bar J_b$. In addition, by replacing the true mean of the current best design with its sample mean, $P(B_s)$ can be approximated by $\Phi\!\left(\frac{(N_b + 1)(\bar J_s - \bar J_b)}{\sigma_b}\right)$. Thus when $a = b$, we can approximate the index of action $b$ by an analytical formula in $\Phi$ and $\phi$ evaluated at the current sample means, variances, and replication counts:

[Equation (13), the closed-form approximation of index(b), is not recoverable from the source text.]

Similarly, when $a \ne b$ and $a \ne 0$, $\mathrm{index}(a)$ can be approximated by

[Equation (14), the closed-form approximation of index(a), is not recoverable from the source text.]

Finally, by replacing true variances with sample variances, we propose the following algorithm for simulation budget allocation.

Dynamic Simulation Budget Allocation (DSBA)

Step 0: Perform $n_0$ simulation replications for all designs. Calculate the sample mean and sample variance for each design.
Step 1: For each action $a \in \{0, \ldots, k\}$, compute the index of action $a$ according to (8), (13), and (14).
Step 2: Select the action $a^*$ with the smallest index value. If $a^* = 0$, then stop the allocation process; else if $a^* = i$, perform one simulation replication for design $i$ and update the sample mean and sample variance of design $i$. Increase the number of simulation replications of design $i$ by 1 and go back to Step 1 until the given budget is exhausted.

4 NUMERICAL RESULTS

In this section, we test the proposed DSBA algorithm and compare its performance with that of OCBA on some simple examples. OCBA was derived by analytically solving the static optimization problem (1). It has been shown in Chen et al. (2000) that the asymptotically optimal solution to the problem as $T \to \infty$ satisfies the following conditions:

1. $\dfrac{N_i}{N_j} = \left(\dfrac{\sigma_i/\delta_{b,i}}{\sigma_j/\delta_{b,j}}\right)^2$ for all $i \ne j$ with $i, j \ne b$,
2. $N_b = \sigma_b \sqrt{\sum_{i \ne b} \dfrac{N_i^2}{\sigma_i^2}}$,

where $N_i$ is the number of samples allocated to design $i$, $\delta_{b,i} = \bar J_b - \bar J_i$, $\bar J_b \le \min_i \bar J_i$, and $\sigma_i$ is the standard deviation of the performance measure for design $i$, which can be estimated from the sample variance. OCBA sequentially allocates a given budget $T$ by splitting it into batches of size $\Delta$. At each step, the current computing budget is increased by $\Delta$ and a budget allocation is calculated using conditions 1 and 2 based on the updated computing budget. This allocation is then used to determine the number of additional simulation runs that need to be allocated to each design. The process continues until the entire budget has been consumed.
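The two OCBA conditions above determine the allocation only up to a common scale, so a budget can be split by computing relative weights and normalizing. The sketch below illustrates this under stated assumptions: the function name `ocba_allocation` is illustrative, designs are indexed from 0, every non-best design has a positive standard deviation and a sample mean strictly larger than the best one, and the result is real-valued (rounding to whole replications and the batch-by-batch update with step size are left out).

```python
import math


def ocba_allocation(means, stds, budget):
    """Split `budget` according to the asymptotic OCBA ratio conditions."""
    k = len(means)
    b = min(range(k), key=lambda i: means[i])  # current best design
    others = [i for i in range(k) if i != b]
    ref = others[0]  # reference design for the pairwise ratios

    weights = [0.0] * k
    for i in others:
        # Condition 1: N_i / N_ref = ((s_i / d_{b,i}) / (s_ref / d_{b,ref}))^2
        ratio_i = stds[i] / (means[b] - means[i])
        ratio_ref = stds[ref] / (means[b] - means[ref])
        weights[i] = (ratio_i / ratio_ref) ** 2
    # Condition 2: N_b = s_b * sqrt(sum_{i != b} N_i^2 / s_i^2)
    weights[b] = stds[b] * math.sqrt(
        sum((weights[i] / stds[i]) ** 2 for i in others)
    )

    total = sum(weights)
    return [budget * w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical inputs: three designs with unit standard deviations.
    print(ocba_allocation([0.0, 1.0, 2.0], [1.0, 1.0, 1.0], 100))
```

Both conditions are homogeneous in the allocation, which is why normalizing the weights to the budget preserves them. In this hypothetical input the design closer to the best one receives four times the samples of the farther one, reflecting the squared ratio in condition 1.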
We consider the following examples in our computational experiment.

Example 1: This is a special example where the best design has zero variance and the remaining two designs have the same performance: $X_{1j} \sim N(0, 0)$, $X_{2j} \sim N(0.4, 3)$, $X_{3j} \sim N(0.4, 3)$.

Example 2: There are five design alternatives, with the best design being deterministic: $X_{1j} \sim N(0, 0)$, $X_{2j} \sim N(0.4, 1.5)$, $X_{3j} \sim N(0.4, 3)$, $X_{4j} \sim N(1, 3)$, $X_{5j} \sim N(2, 3)$.

Example 3: This example is an extension of the previous one with the deterministic design removed: $X_{1j} \sim N(0, 1.5)$, $X_{2j} \sim N(0.6, 3)$, $X_{3j} \sim N(1, 3)$, $X_{4j} \sim N(2, 3)$.

Example 4: This is an example with three alternatives, all of which are random: $X_{1j} \sim N(1, 1)$, $X_{2j} \sim N(1.5, 3)$, $X_{3j} \sim N(1.5, 3)$.

In our experiment, the initial number of replications $n_0$ is set to 10 for both DSBA and OCBA. Figure 1 shows the performance of both algorithms for each of the four test cases, where the true $P\{CS\}$ in each case is estimated by the proportion of times the best design is found by an algorithm out of 10,000 independent experiments. The figure indicates that DSBA performs competitively with OCBA in all test cases. In particular, DSBA outperforms OCBA when the simulation budget is small, whereas OCBA shows slightly better performance when the budget is increased, especially in the last case. Our conjecture is that this is due to the asymptotic optimality of OCBA, whereas DSBA is myopic in nature.

5 CONCLUSION

In this paper, we have introduced a dynamic simulation budget allocation procedure for determining the best design from a finite set of design alternatives. The idea is to use a myopic one-step lookahead policy to approximately solve an underlying MDP characterizing the budget allocation process. Such a policy gives rise to a stationary index rule that adaptively determines at each step which design should be simulated next in order to myopically maximize the approximate probability of correct selection after the additional allocation. Our preliminary numerical results indicate that our approach may provide competitive performance with that of OCBA, especially when the computing budget is small.
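The one-step lookahead rule summarized above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of the paper: the expected next-step cost of each action is estimated by plain Monte Carlo instead of the Taylor-expansion formulas, the next output of a design is drawn from a normal distribution with its current sample mean and variance plugged in, designs are indexed from 0, and `None` plays the role of the stop action. All function names are hypothetical.

```python
import math
import random


def norm_cdf(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def apcs_loss(means, variances, counts):
    """Sum over i != b of Phi(delta_{b,i} / sigma_{b,i}), i.e. 1 - APCS."""
    b = min(range(len(means)), key=lambda i: means[i])
    loss = 0.0
    for i in range(len(means)):
        if i == b:
            continue
        sigma = math.sqrt(variances[b] / counts[b] + variances[i] / counts[i])
        if sigma > 0.0:  # two deterministic designs contribute nothing
            loss += norm_cdf((means[b] - means[i]) / sigma)
    return loss


def lookahead_index(a, means, variances, counts, rng, n_draws=500):
    """Monte Carlo estimate of the expected loss after one more run of design a."""
    total = 0.0
    for _ in range(n_draws):
        # Plug-in assumption: the next output is N(means[a], variances[a]).
        y = rng.gauss(means[a], math.sqrt(variances[a]))
        new_means = list(means)
        new_counts = list(counts)
        new_means[a] = (counts[a] * means[a] + y) / (counts[a] + 1)
        new_counts[a] = counts[a] + 1
        total += apcs_loss(new_means, variances, new_counts)
    return total / n_draws


def myopic_action(means, variances, counts, rng, n_draws=500):
    """Return None to stop, or the 0-based design to simulate next."""
    stop_index = apcs_loss(means, variances, counts)
    indices = [
        lookahead_index(a, means, variances, counts, rng, n_draws)
        for a in range(len(means))
    ]
    best = min(range(len(means)), key=lambda a: indices[a])
    return None if stop_index <= indices[best] else best


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical state: three designs after ten replications each.
    print(myopic_action([1.0, 1.5, 1.5], [1.0, 3.0, 3.0], [10, 10, 10], rng))
```

Monte Carlo estimation trades the closed-form indices for extra computation per step and introduces sampling noise into the comparison with the stop index, so the number of draws is a tuning choice in this sketch.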
Figure 1: Comparison of OCBA and DSBA.

ACKNOWLEDGMENTS

This work was supported in part by the Air Force Office of Scientific Research under Grant FA and by the National Science Foundation under Grant CMMI.

REFERENCES

Chen, C.-H. 1995. "An Effective Approach to Smartly Allocate Computing Budget for Discrete Event Simulation." In Proceedings of the 34th IEEE Conference on Decision and Control. Piscataway, NJ: IEEE.
Chen, C.-H. 1996. "A Lower Bound for the Correct Subset-Selection Probability and Its Application to Discrete-Event System Simulation." IEEE Transactions on Automatic Control 41.
Chen, C.-H., D. He, and M. Fu. 2006. "Efficient Dynamic Simulation Allocation in Ordinal Optimization." IEEE Transactions on Automatic Control 51.
Chen, C.-H., J. Lin, E. Yücesan, and S. E. Chick. 2000. "Simulation Budget Allocation for Further Enhancing the Efficiency of Ordinal Optimization." Discrete Event Dynamic Systems: Theory and Applications 10.
Chen, H.-C., C.-H. Chen, and E. Yücesan. 2000. "Computing Efforts Allocation for Ordinal Optimization and Discrete Event Simulation." IEEE Transactions on Automatic Control 45.
Chick, S. E., and K. Inoue. 2001. "New Two-Stage and Sequential Procedures for Selecting the Best Simulated System." Operations Research 49.
Goldsman, D., and B. L. Nelson. 1998. "Comparing Systems via Simulation." In Handbook of Simulation, edited by J. Banks. New York: John Wiley.
Hu, J., H. S. Chang, M. C. Fu, and S. I. Marcus. 2011. "Dynamic Sample Budget Allocation in Model-Based Optimization." Journal of Global Optimization 50.
Kim, S.-H., and B. L. Nelson. 2006. "Selecting the Best System." In Handbooks in Operations Research and Management Science: Simulation, edited by S. G. Henderson and B. L. Nelson, Chapter 17. Oxford, UK: Elsevier Science.
Kim, S.-H., and B. L. Nelson. 2007. "Recent Advances in Ranking and Selection." In Proceedings of the 2007 Winter Simulation Conference, edited by S. Henderson, B. Biller, M.-H. Hsieh, J. Shortle, J. Tew, and R. Barton. Piscataway, NJ: IEEE.
Nelson, B. L., J. Swann, D. Goldsman, and W. Song. 2001. "Simple Procedures for Selecting the Best Simulated System when the Number of Alternatives Is Large." Operations Research 49.
Rinott, Y. 1978. "On Two-Stage Selection Procedures and Related Probability Inequalities." Communications in Statistics - Theory and Methods 7.

AUTHOR BIOGRAPHIES

QI FAN is a Ph.D. student in the Department of Applied Mathematics and Statistics at the State University of New York, Stony Brook. He received the B.S. degree in mathematics from Zhejiang University, China in 2011. His research interests include Markov decision processes, optimization, and simulation. His email address is qfan@ams.sunysb.edu.

JIAQIAO HU is an Associate Professor in the Department of Applied Mathematics and Statistics at the State University of New York, Stony Brook. He received the B.S. degree in automation from Shanghai Jiao Tong University, the M.S. degree in applied mathematics from the University of Maryland, Baltimore County, and the Ph.D. degree in electrical engineering from the University of Maryland, College Park. His research interests include Markov decision processes, applied probability, and simulation optimization.
