Risk-Averse Anticipation for Dynamic Vehicle Routing
Marlin W. Ulmer (1) and Stefan Voß (2)

(1) Technische Universität Braunschweig, Mühlenpfordtstr. 23, Braunschweig, Germany, m.ulmer@tu-braunschweig.de
(2) Universität Hamburg, Von-Melle-Park 5, Hamburg, Germany, stefan.voss@hamburg.de

Abstract. In the field of dynamic vehicle routing, integrating stochastic information about possible future events into current decision making is becoming increasingly important. Integration is achieved by anticipatory solution approaches, often based on approximate dynamic programming (ADP). ADP methods estimate the expected mean values of future outcomes. In many cases, decision makers are risk-averse, meaning that they avoid risky decisions with highly volatile outcomes. Current ADP methods in the field of dynamic vehicle routing are not able to integrate risk-aversion. In this paper, we adapt a recently proposed ADP method that explicitly considers risk-aversion to a dynamic vehicle routing problem with stochastic requests. We analyze how risk-aversion impacts solution quality and variance. We show that a mild risk-aversion may even improve the risk-neutral objective.

Keywords: dynamic vehicle routing, anticipation, risk-aversion, approximate dynamic programming, stochastic customer requests

1 Introduction

Many service providers dispatch a fleet of vehicles during the day to transport goods or passengers and to conduct services at customers. Factors like e-commerce, digitization, and urbanization increase the uncertainty dispatchers have to consider in their plans, e.g., in travel times, service times, or customer demands [1]. In particular, customer requests often occur spontaneously during the day. In many cases, new requests require significant adaptations of the current plan [2]. These adaptations are enabled by real-time computational resources. Practical routing applications are generally modeled as dynamic vehicle routing problems (DVRPs, compare [1]). For many DVRPs, static approaches applied on a rolling horizon are not suitable [3]. Anticipation of possible future events and decisions is mandatory to allow reliable, flexible, and effective plans. Anticipation can be achieved by approximate dynamic programming (ADP) [4]. ADP for DVRPs is widely established, especially for stochastic requests [2]. ADP methods evaluate decisions regarding the expected future rewards (or costs). The expected future rewards are usually approximated via simulation.
Generally, a tradeoff between current and future rewards can be experienced: high immediate rewards lower the expected future rewards. Dispatchers aim for an optimal balance between immediate and future rewards. All ADP approaches applied to DVRPs maximize the sum of immediate and expected future rewards. In practice, decisions also depend on the variance of the expected future rewards, i.e., on the service provider's risk-aversion [5]. A risk-averse provider may discount the expected future rewards if the variance, i.e., the uncertainty about a decision's success, is high. In some cases, practitioners are able to quantify their risk-aversion. In other cases, the degree of risk-aversion can be derived by analyzing historical decisions [6]. The derived properties then have to be integrated into a suitable anticipatory DVRP approach.

Work on risk-aversion for vehicle routing problems is limited. In (static) vehicle routing with stochastic travel times, explicit inclusion of risk-aversion is, e.g., achieved by [7]. [8] evaluate plans by risk for a dynamic orienteering problem. Risk-aversion is indirectly considered in robust optimization (e.g., [9]) and competitive analysis (e.g., [10]). Both optimize to avoid worst-case scenarios; their practical suitability is often limited. Until now, the ADP methods applied to DVRPs have not been able to integrate practitioners' risk-aversion. Anticipation is based on mean values. In particular, low-probability, high-impact incidents are not sufficiently considered [11]. Recently, Jiang and Powell [12] proposed a general ADP method integrating quantiles of the expected value distribution, and therefore the variance, into the anticipation. In this paper, we adapt the proposed method to an ADP approach of anticipatory time budgeting (ATB, [2]) for a DVRP with stochastic customer requests. We analyze the impact on rewards and variances for different instance settings and degrees of risk-aversion. This paper is the first work integrating (dynamic) risk-aversion in an ADP approach for dynamic vehicle routing. We show that an explicit inclusion of risk-aversion in DVRPs is possible and that a mild risk-aversion even strengthens the approximation process, resulting in higher rewards and lower variances compared to the risk-neutral equivalent.

2 Dynamic VRP with Stochastic Requests

In this section, we define the DVRP with stochastic requests via a Markov decision process (MDP, [13]). For the given problem, a vehicle serves customers in a service area considering a time limit. The tour starts and ends in a depot. A set of known early request customers (ERC) has to be served. During the day, new requests occur. Whenever the vehicle is located at a customer, the dispatcher has to decide about the subset of occurred requests to be confirmed and the next customer to visit. Waiting is permitted. The dispatcher aims at maximizing the number of confirmed late request customers (LRC).

Modeling the problem as an MDP, a decision point k occurs when the vehicle is located at a customer. A state S_k consists of the point of time, the vehicle's position, the set of not yet served ERC and confirmed LRC, and the set of new LRC. Decisions x determine the subset of requests to be confirmed and the next customer to visit, or waiting. The immediate reward R(S_k, x) is the number of newly confirmed LRC. A post-decision state S_k^x consists of the point of time, the vehicle's position, the not yet served ERC and confirmed LRC, and the next customer to visit. The transition results from the vehicle's travel and provides a new set of requesting LRC. The process terminates in state S_K when no customers remain to be served and the vehicle has returned to the depot. The objective is to derive a decision policy π maximizing the expected sum of rewards over all decision points. Notably, this objective is defined for a risk-neutral dispatcher.
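To make the model concrete, the following minimal sketch encodes the state, decision, and reward elements just described. All names (State, Decision, immediate_reward, and so on) are our illustrative choices, not identifiers from the authors' implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """Decision state S_k: the vehicle is located at a customer."""
    time: float                # current point of time
    position: int              # index of the vehicle's current location
    to_serve: frozenset        # not yet served ERC and confirmed LRC
    new_requests: frozenset    # LRC that requested since the last decision point

@dataclass(frozen=True)
class Decision:
    """Decision x: which new requests to confirm and where to go next."""
    confirmations: frozenset   # subset of State.new_requests to confirm
    next_customer: object      # next customer to visit, or None to wait

def immediate_reward(state: State, decision: Decision) -> int:
    """R(S_k, x): the number of newly confirmed late request customers."""
    assert decision.confirmations <= state.new_requests
    return len(decision.confirmations)

@dataclass(frozen=True)
class PostDecisionState:
    """Post-decision state S_k^x, before the stochastic request transition."""
    time: float
    position: int
    to_serve: frozenset        # now including the newly confirmed LRC
    next_customer: object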
3 Risk-Averse Time Budgeting

In this section, we extend ATB by [2] to ATB_λ, allowing the integration of risk-aversion. ATB draws on the ADP method of approximate value iteration (AVI, [4]) to evaluate post-decision states (PDSs) S^x regarding the expected number of future confirmations, i.e., their value V(S^x). To be more specific, AVI represents ways of using past experience about the algorithm's behavior to improve future performance; tuning refers to the update of values. Because of the curses of dimensionality, PDSs are aggregated to vectors containing the point of time and the remaining free time budget. The resulting vector space is then partitioned to a lookup table (LT). Every entry of the LT contains a set of vectors. AVI starts with initial entry values V̂_0, inducing a policy π_0. Then, AVI iteratively simulates a problem realization i and tunes the values V̂_{i-1} regarding the algorithm's performance. Within each approximation run i, policy π_{i-1} is applied based on Bellman's Equation [13], depicted in Eq. (1). The values for the new policy π_i are tuned by the realized values of approximation run i.

$X_k^{\pi_i}(S_k) = \arg\max_{x \in \mathcal{X}(S_k)} \left\{ R(S_k, x) + \hat{V}_i(S^x) \right\}$   (1)

V(S^x) is a random variable. A risk-averse policy aims at avoiding highly volatile V(S^x). Notably, V(S^x) is the sum of a sequence of interdependent random variables R(S_{k+i}, x), 0 < i < K - k, i.e., the volatility and the impact of the volatility may change over the subsequent decision points. A straightforward evaluation of the variance of V(S^x) is therefore not sufficient to consider dynamic risk-aversion. [14] describes dynamic risk measures ρ(S^x) considering the risk over the subsequent decision points. [12] present an algorithm to approximate ρ(S^x) for every post-decision state by ρ_α via ADP methods. They use the quantiles of ρ_α as an approximation of the real value distribution of ρ. For ATB_λ, we draw on the concept of conditional value at risk (CVaR, [15]). The considered dynamic risk measure ρ_α is induced by the one-step conditional risk measure ρ_λ as depicted in Eq. (2).

$\rho_\lambda(S^x) = (1 - \lambda)\, V(S^x) + \lambda\, \rho_\alpha(S^x)$   (2)

To achieve ρ_α, ρ_λ is recursively applied over the subsequent decision points. For an efficient approximation, we simplistically assume that V(S^x) follows a uniform distribution. This avoids an extensive estimation of the distribution for every value. As a result, parameter λ ∈ [0, 1] directly determines the dispatcher's risk-aversion: λ = 0 results in risk-neutrality and ATB; λ = 1 results in a myopic policy. For the tuning of ATB_λ, we approximate both V(S^x) and ρ_α(S^x).
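To make Eqs. (1) and (2) concrete, the following is a minimal sketch of the risk-adjusted decision selection under the uniform-distribution assumption above. The function names, the lower-tail CVaR convention, and the callable interfaces are our illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Iterable

def cvar_uniform(mean: float, half_width: float, alpha: float) -> float:
    """CVaR_alpha of U(mean - half_width, mean + half_width): the expected
    value over the worst (lowest) alpha-fraction of outcomes. For a uniform
    distribution on [a, b] this has the closed form a + alpha * (b - a) / 2."""
    a, b = mean - half_width, mean + half_width
    return a + alpha * (b - a) / 2.0

def rho_lambda(v_hat: float, rho_alpha_hat: float, lam: float) -> float:
    """One-step conditional risk measure of Eq. (2)."""
    return (1.0 - lam) * v_hat + lam * rho_alpha_hat

def select_decision(decisions: Iterable,
                    reward: Callable,         # x -> R(S_k, x)
                    post_state: Callable,     # x -> S^x (deterministic)
                    v_hat: Callable,          # S^x -> estimate of V(S^x)
                    rho_alpha_hat: Callable,  # S^x -> estimate of rho_alpha(S^x)
                    lam: float):
    """Risk-averse variant of the Bellman selection in Eq. (1): the value
    estimate V_hat is replaced by the risk-adjusted estimate rho_lambda."""
    return max(decisions,
               key=lambda x: reward(x) + rho_lambda(v_hat(post_state(x)),
                                                    rho_alpha_hat(post_state(x)),
                                                    lam))
```

Under the uniform assumption, the ρ_α estimate of a lookup-table entry can be obtained in closed form via cvar_uniform from the entry's estimated mean and an assumed spread, so no per-entry distribution has to be stored.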
4 Computational Studies

In this section, we define the settings of ATB_λ, briefly describe the instances, and analyze the results. For ATB_λ, we follow the parameter settings of [2]. We use a (static) LT with an interval length of one. We consider the tuning after 1 million approximation runs. We draw on the instances of [2]. The time limit is set to 360 minutes. The vehicle travels with constant speed v = 15 km/h in a service area of 20 km × 20 km. The expected number of customers is 100. The percentage of LRC is 75%. Customer requests follow a Poisson distribution over time. We consider three spatial customer distributions: customers are distributed uniformly (F_U), equally grouped in two clusters (F_2C), or distributed in three clusters (F_3C). Within the clusters, the request probability is normally distributed.

Table 1. Results; best values are depicted in bold. [Columns: λ; confirmations for F_U, F_2C, F_3C; variance for F_U, F_2C, F_3C. Numerical entries not recoverable from this transcription.]

For each instance setting, we run 1,000 test runs for λ = 0.0, 0.1, ..., 1.0. The average number of confirmations and the variance are depicted in Table 1.
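A compact sketch of this evaluation protocol follows; run_realization is a placeholder for applying the policy tuned with a given λ to one sampled day of requests, and all names are our own.

```python
import statistics
from typing import Callable

def sweep_lambda(run_realization: Callable[[float, int], int],
                 n_runs: int = 1000):
    """For each risk-aversion level lambda, apply the tuned policy to n_runs
    simulated realizations and record the mean number of confirmations and
    the variance, as in Table 1."""
    results = {}
    for lam in [i / 10 for i in range(11)]:  # lambda = 0.0, 0.1, ..., 1.0
        confs = [run_realization(lam, seed) for seed in range(n_runs)]
        results[lam] = (statistics.mean(confs), statistics.variance(confs))
    return results
```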
Notably, a mild risk-aversion leads to a higher risk-neutral objective value. This can be explained by the impact of risk-aversion on the tuning process. For a high λ, only the (relatively certain) outcomes of the next few decision points define the decision policy, leading to a fast and more reliable tuning process. A low λ results in an equal consideration of all subsequent decision points and outcomes. The corresponding tuning process requires a high number of approximation runs to be accurate. This is especially the case for the clustered customer distributions [2]. Further, ATB is based only on temporal attributes and may provide a less reliable tuning for clustered distributions compared to F_U [16]. As a result, the highest number of confirmations is achieved for λ = 0.3 and λ = 0.4 for the clustered distributions. As expected, we experience a constant decrease of the variances between λ = 0.0 and λ = 0.5. Afterwards, the variance increases. A high λ is similar to a myopic policy and results in outcomes highly dependent on the problem's realization.

Fig. 1. Solution Quality and Standard Deviation for Varying λ and F_U

We now analyze the tuning process and the tradeoff between the number of confirmations and the variance in more detail. Figure 1 shows the number of confirmations and the variance for varying λ and F_U for 1,000 test runs and policies achieved by 100k, 500k, and 1,000k approximation runs. For 1,000k, λ = 0.1 to λ = 0.5 span a Pareto front for both dimensions. For 100k, the tuning for ATB_λ with λ = 0.1 is (still) not sufficient. During the tuning process, we experience an increase in the number of confirmations for low λ, and a decrease in the variance for high λ. Hence, a directed tuning of ATB_λ (and AVI) towards the two different objectives can be achieved. The integration of risk-aversion further results in a faster and more reliable AVI tuning.

5 Conclusion

In this paper, we applied an ADP method to a DVRP with stochastic customer requests, enabling anticipation and the inclusion of the service provider's risk-aversion. Even though we simplistically assume the expected values V to follow a uniform distribution, the results show that the integration is not only possible but also strengthens the tuning process and even improves the overall (risk-neutral) objective. In this paper, we considered a vanilla DVRP. Future work may focus on more real-world related problems and problems containing unlikely events with significant impacts (e.g., vehicle breakdowns).
For a more efficient tuning process, risk-directed sampling may be included in the approach as proposed in [12]. Further, historical data about previous decision making may be analyzed to quantify service providers' risk-aversion. For a more accurate approximation, the distribution of V could be explicitly considered by a set of quantiles. Finally, a mild risk-aversion improves the (risk-neutral) objective. Hence, it may be beneficial for many ADP methods to include a dynamic risk measure for a strengthened and more reliable tuning process. The risk-aversion may decrease during the tuning process once a more reliable approximation is achieved.

References

1. Psaraftis, H.N., Wen, M., Kontovas, C.A.: Dynamic vehicle routing problems: Three decades and counting. Networks, online available (2015)
2. Ulmer, M.W., Mattfeld, D.C., Köster, F.: Budgeting time for dynamic vehicle routing with stochastic customer requests. Technical report, Technische Universität Braunschweig, Germany (2015)
3. Powell, W.B., Towns, M.T., Marar, A.: On the value of optimal myopic solutions for dynamic routing and scheduling problems in the presence of user noncompliance. Transportation Science 34 (2000)
4. Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. Volume 842. John Wiley & Sons, New York (2011)
5. Dyer, J.S., Sarin, R.K.: Relative risk aversion. Management Science 28 (1982)
6. Jackwerth, J.C.: Recovering risk aversion from option prices and realized returns. Review of Financial Studies 13 (2000)
7. Adulyasak, Y., Jaillet, P.: Models and algorithms for stochastic and robust vehicle routing with deadlines. Transportation Science, online available (2015)
8. Lau, H.C., Yeoh, W., Varakantham, P., Nguyen, D.T., Chen, H.: Dynamic stochastic orienteering problems for risk-aware applications. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (2012)
9. Ordóñez, F.: Robust vehicle routing. TUTORIALS in Operations Research (2010)
10. Jaillet, P., Wagner, M.R.: Online routing problems: Value of advanced information as improved competitive ratios. Transportation Science 40 (2006)
11. Taniguchi, E., Thompson, R.G., Yamada, T.: Incorporating risks in city logistics. Procedia - Social and Behavioral Sciences 2 (2010)
12. Jiang, D.R., Powell, W.B.: Approximate dynamic programming for dynamic quantile-based risk measures. Technical report, Princeton University (2015)
13. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York (2014)
14. Ruszczyński, A.: Risk-averse dynamic programming for Markov decision processes. Mathematical Programming 125 (2010)
15. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. Journal of Risk 2 (2000)
16. Ulmer, M.W., Mattfeld, D.C., Hennig, M., Goodson, J.C.: A rollout algorithm for vehicle routing with stochastic customer requests. In: Logistics Management. Springer (2015)