Risk-Averse Anticipation for Dynamic Vehicle Routing

Marlin W. Ulmer¹ and Stefan Voß²

¹ Technische Universität Braunschweig, Mühlenpfordtstr. 23, Braunschweig, Germany, m.ulmer@tu-braunschweig.de
² Universität Hamburg, Von-Melle-Park 5, Hamburg, Germany, stefan.voss@hamburg.de

Abstract. In the field of dynamic vehicle routing, it becomes increasingly important to integrate stochastic information about possible future events into current decision making. This integration is achieved by anticipatory solution approaches, often based on approximate dynamic programming (ADP). ADP methods estimate the expected mean values of future outcomes. In many cases, decision makers are risk-averse, meaning that they avoid risky decisions with highly volatile outcomes. Current ADP methods in the field of dynamic vehicle routing are not able to integrate risk-aversion. In this paper, we adapt a recently proposed ADP method that explicitly considers risk-aversion to a dynamic vehicle routing problem with stochastic requests. We analyze how risk-aversion impacts solution quality and variance. We show that a mild risk-aversion may even improve the risk-neutral objective.

Keywords: dynamic vehicle routing, anticipation, risk-aversion, approximate dynamic programming, stochastic customer requests

1 Introduction

Many service providers dispatch a fleet of vehicles during the day to transport goods or passengers and to conduct services at customers. Factors like e-commerce, digitization, and urbanization increase the uncertainty dispatchers have to consider in their plans, e.g., in travel times, service times, or customer demands [1]. In particular, customer requests often occur spontaneously during the day. In many cases, new requests require significant adaptions of the current plan [2]. These adaptions are enabled by real-time computational resources.

Practical routing applications are generally modeled as dynamic vehicle routing problems (DVRPs, compare [1]). For many DVRPs, static approaches applied on a rolling horizon are not suitable [3]. Anticipation of possible future events and decisions is mandatory to allow reliable, flexible, and effective plans. Anticipation can be achieved by approximate dynamic programming [4]. ADP for DVRPs is widely established, especially for stochastic requests [2]. ADP methods evaluate decisions regarding the expected future rewards (or costs). The expected future rewards are usually approximated via simulation.

Generally, a tradeoff between current and future rewards can be observed: high immediate rewards lower the expected future rewards. Dispatchers aim for an optimal balance between immediate and future rewards. All ADP approaches applied to DVRPs maximize the sum of immediate and expected future rewards. In practice, decisions also depend on the variance of the expected future rewards, i.e., on the service provider's risk-aversion [5]. A risk-averse provider may discount the expected future rewards if a high variance, i.e., a high uncertainty about a decision's success, is given. In some cases, practitioners are able to quantify their risk-aversion. In other cases, the degree of risk-aversion can be derived by analyzing historical decisions [6]. The derived properties then have to be integrated into a suitable anticipatory DVRP approach.

Work on risk-aversion for vehicle routing problems is limited. In (static) vehicle routing with stochastic travel times, explicit inclusion of risk-aversion is, e.g., achieved by [7]. [8] evaluate plans by risk for a dynamic orienteering problem. Risk-aversion is indirectly considered in robust optimization (e.g., [9]) and competitive analysis (e.g., [10]). Both optimize to avoid worst-case scenarios; their practical suitability is often limited.

Until now, the ADP methods applied to DVRPs are not able to integrate practitioners' risk-aversion. Anticipation is based on mean values. Especially low-probability, high-impact incidents are not sufficiently considered [11]. Recently, Jiang and Powell [12] proposed a general ADP method integrating quantiles of the expected value distribution, and therefore the variance, into the anticipation. In this paper, we adapt the proposed method to an ADP approach of anticipatory time budgeting (ATB, [2]) for a DVRP with stochastic customer requests. We analyze the impact on rewards and variances for different instance settings and degrees of risk-aversion. This paper is the first work integrating (dynamic) risk-aversion in an ADP approach for dynamic vehicle routing. We show that an explicit inclusion of risk-aversion in DVRPs is possible and that a mild risk-aversion even strengthens the approximation process, resulting in higher rewards and lower variances compared to the risk-neutral equivalent.

2 Dynamic VRP with Stochastic Requests

In this section, we define the DVRP with stochastic requests via a Markov decision process (MDP, [13]). For the given problem, a vehicle serves customers in a service area considering a time limit. The tour starts and ends in a depot. A set of known early request customers (ERC) has to be served. During the day, new requests occur. If the vehicle is located at a customer, the dispatcher has to decide about the subset of occurred requests to be confirmed and about the next customer to visit. Waiting is permitted. The dispatcher aims at maximizing the number of confirmed late request customers (LRC).

Modeling the problem as an MDP, a decision point k occurs if the vehicle is located at a customer. A state S_k consists of the point of time, the vehicle's position, the set of not yet served ERC and confirmed LRC, and the set of new LRC. Decisions x are made about the subset of requests to be confirmed and about the next customer to visit, respectively, waiting. The immediate reward R(S_k, x) is the number of newly confirmed LRC. A post-decision state S_k^x consists of the point of time, the vehicle's position, the not yet served ERC and confirmed LRC, and the next customer to visit. The transition results from the vehicle's travel and provides a new set of requesting LRC. The process terminates in state S_K when no customers remain to be served and the vehicle has returned to the depot. The objective is to derive a decision policy π maximizing the expected sum of rewards over all decision points. Notably, this objective is defined for a risk-neutral dispatcher.
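To make the MDP components tangible, the following Python sketch outlines minimal data structures for states, decisions, rewards, and post-decision states. It is illustrative only and not taken from the paper; all type names and the representation of customers as plane coordinates are our own choices, and travel times, feasibility checks, and the stochastic transition are deliberately left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Customer = Tuple[float, float]  # customer location in the service area (x, y)

@dataclass(frozen=True)
class State:
    """State S_k at decision point k (vehicle located at a customer)."""
    time: float                        # current point of time
    position: Customer                 # vehicle's position
    to_serve: FrozenSet[Customer]      # not yet served ERC and already confirmed LRC
    new_requests: FrozenSet[Customer]  # LRC that requested since the last decision point

@dataclass(frozen=True)
class Decision:
    """Decision x: subset of new requests to confirm and the next customer (or waiting)."""
    confirmed: FrozenSet[Customer]     # subset of new_requests that is confirmed
    next_customer: Optional[Customer]  # next customer to visit; None encodes waiting

def reward(state: State, decision: Decision) -> int:
    """Immediate reward R(S_k, x): number of newly confirmed late request customers."""
    return len(decision.confirmed)

@dataclass(frozen=True)
class PostDecisionState:
    """Post-decision state S_k^x, before the transition reveals new LRC."""
    time: float
    position: Customer
    to_serve: FrozenSet[Customer]
    next_customer: Optional[Customer]
```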

3 Risk-Averse Time Budgeting

In this section, we extend ATB by [2] to ATB_λ, allowing the integration of risk-aversion. ATB draws on the ADP method of approximate value iteration (AVI, [4]) to evaluate post-decision states (PDSs) S^x regarding the expected number of future confirmations, i.e., their value V(S^x). More specifically, AVI uses past experience about the algorithm's behavior to improve future performance; tuning refers to the update of the values. Because of the curses of dimensionality, PDSs are aggregated to vectors containing the point of time and the remaining free time budget. The resulting vector space is then partitioned into a lookup table (LT). Every entry of the LT contains a set of vectors. AVI starts with initial entry values V̂_0 inducing an initial policy π_0. Then, AVI iteratively simulates a problem realization i and tunes the values V̂_{i−1} according to the algorithm's performance. Within each approximation run i, policy π_{i−1} is applied based on Bellman's equation [13], depicted in Eq. (1). The values for the new policy π_i are tuned by the realized values of approximation run i.

X_k^{π_i}(S_k) = arg max_{x ∈ X(S_k)} { R(S_k, x) + V̂_i(S^x) }    (1)

V(S^x) is a random variable. A risk-averse policy aims at avoiding highly volatile V(S^x). Notably, V(S^x) is the sum of a sequence of interdependent random variables R(S_{k+i}, x), 0 < i < K − k, i.e., the volatility and the impact of the volatility may change over the subsequent decision points. A straightforward evaluation of the variance of V(S^x) is therefore not sufficient to capture dynamic risk-aversion. [14] describe dynamic risk measures ρ(S^x) considering the risk over the subsequent decision points. [12] present an algorithm to approximate ρ(S^x) for every post-decision state by ρ_α via ADP methods. They use the quantiles of ρ_α as an approximation of the real value distribution of ρ. For ATB_λ, we draw on the concept of conditional value at risk (CVaR, [15]). The considered dynamic risk measure ρ_α is induced by the one-step conditional risk measure ρ_λ depicted in Eq. (2).

ρ_λ(S^x) = (1 − λ) V(S^x) + λ ρ_α(S^x)    (2)

To obtain ρ_α, ρ_λ is applied recursively over the subsequent decision points. For an efficient approximation, we simplistically assume that V(S^x) follows a uniform distribution. This avoids an extensive estimation of the distribution for every value. As a result, the parameter λ ∈ [0, 1] directly determines the dispatcher's risk-aversion: λ = 0 results in risk-neutrality and in ATB, while λ = 1 results in a myopic policy. For the tuning of ATB_λ, we approximate both V(S^x) and ρ_α(S^x).
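One way to read Eqs. (1) and (2) together is as a decision rule that scores each feasible decision by its immediate reward plus a λ-weighted mixture of the estimated value and an estimated CVaR. The sketch below is our interpretation under the stated uniform-distribution simplification, not the implementation of [2] or [12]: the closed-form CVaR of a uniform variable, the (mean, std) lookup-table entries, and all function and parameter names (post_state, aggregate, lam, alpha) are assumptions.

```python
import math
from typing import Callable, Dict, Iterable, Tuple

# Aggregated post-decision state: (point of time, remaining free time budget),
# as used by ATB's lookup table.
Entry = Tuple[int, int]

def cvar_uniform(mean: float, std: float, alpha: float) -> float:
    """Lower-tail CVaR_alpha under the uniform-distribution simplification.

    For U(a, b) with half-width h: Var = h^2 / 3 and
    E[V | V <= q_alpha] = mean - (1 - alpha) * h.
    """
    h = math.sqrt(3.0) * std
    return mean - (1.0 - alpha) * h

def risk_averse_decision(state,
                         feasible_decisions: Iterable,
                         reward: Callable,
                         post_state: Callable,
                         aggregate: Callable[..., Entry],
                         value_table: Dict[Entry, Tuple[float, float]],
                         lam: float,
                         alpha: float):
    """Pick x maximizing R(S_k, x) + (1 - lam) * V_hat(S^x) + lam * rho_hat(S^x).

    value_table maps an aggregated post-decision state to the (mean, std) of the
    observed future confirmations.  lam = 0 recovers risk-neutral ATB; lam = 1
    relies only on the risk term and acts myopically.
    """
    best, best_score = None, float("-inf")
    for x in feasible_decisions:
        entry = aggregate(post_state(state, x))
        v_hat, std = value_table.get(entry, (0.0, 0.0))
        rho_hat = cvar_uniform(v_hat, std, alpha)
        score = reward(state, x) + (1.0 - lam) * v_hat + lam * rho_hat
        if score > best_score:
            best, best_score = x, score
    return best
```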

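The AVI tuning described above can likewise be sketched as a loop that simulates realizations and updates the lookup-table entries with the realized future confirmations. Again, this is a rough sketch under our own assumptions: the running-average update and the (mean, mean of squares, count) bookkeeping are common AVI choices and need not match the step sizes or update rule used by ATB in [2].

```python
import math
from typing import Callable, Dict, Iterable, Tuple

Entry = Tuple[int, int]  # aggregated post-decision state: (time, remaining free time budget)
Stats = Tuple[float, float, int]  # running mean, running mean of squares, observation count

def avi_tuning(simulate_run: Callable[[Dict[Entry, Stats]], Iterable[Tuple[Entry, float]]],
               n_runs: int) -> Dict[Entry, Stats]:
    """Tune lookup-table values from realized approximation runs.

    simulate_run applies the current policy to one sampled problem realization and
    yields, for every visited aggregated post-decision state, the realized number
    of future confirmations.
    """
    table: Dict[Entry, Stats] = {}
    for _ in range(n_runs):
        for entry, realized in simulate_run(table):
            mean, mean_sq, n = table.get(entry, (0.0, 0.0, 0))
            n += 1
            mean += (realized - mean) / n
            mean_sq += (realized * realized - mean_sq) / n
            table[entry] = (mean, mean_sq, n)
    return table

def value_and_std(table: Dict[Entry, Stats], entry: Entry) -> Tuple[float, float]:
    """Read an entry as the (mean, std) pair used by the decision rule above."""
    mean, mean_sq, _ = table.get(entry, (0.0, 0.0, 0))
    return mean, math.sqrt(max(mean_sq - mean * mean, 0.0))
```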
4 Computational Studies

In this section, we define the settings of ATB_λ, briefly describe the instances, and analyze the results. For ATB_λ, we follow the parameter settings of [2]. We use a (static) LT with an interval length of one. We consider the tuning after 1 million approximation runs. We draw on the instances of [2]. The time limit is set to 360 minutes. The vehicle travels with constant speed v = 15 km/h in a service area of 20 × 20 km². The expected number of customers is 100. The percentage of LRC is 75%. Customer requests follow a Poisson distribution over time. We consider three spatial customer distributions: customers are distributed uniformly (F_U), equally grouped in two clusters (F_2C), or distributed in three clusters (F_3C). Within the clusters, the request probability is normally distributed.
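To illustrate the instance setting, a possible generator for one problem realization is sketched below. The time limit, service-area size, expected customer count, and LRC share are taken from the setup above; the cluster centers, the cluster standard deviation, and the way request times and counts are drawn are placeholders of ours, since the exact generation procedure follows [2] and is not reproduced here.

```python
import random
from typing import List, Tuple

AREA_KM = 20.0        # 20 x 20 km service area
TIME_LIMIT_MIN = 360  # time limit in minutes
N_CUSTOMERS = 100     # expected number of customers
LRC_SHARE = 0.75      # 75% late request customers

def sample_location(distribution: str, rng: random.Random) -> Tuple[float, float]:
    """Draw a customer location for F_U (uniform) or a clustered setting.

    Cluster centers and standard deviation are illustrative placeholders; the paper
    only states that request probability is normally distributed within clusters.
    """
    if distribution == "FU":
        return rng.uniform(0, AREA_KM), rng.uniform(0, AREA_KM)
    centers = {"F2C": [(5.0, 5.0), (15.0, 15.0)],
               "F3C": [(4.0, 4.0), (10.0, 16.0), (16.0, 6.0)]}[distribution]
    cx, cy = rng.choice(centers)
    x = min(max(rng.gauss(cx, 2.0), 0.0), AREA_KM)
    y = min(max(rng.gauss(cy, 2.0), 0.0), AREA_KM)
    return x, y

def sample_instance(distribution: str, seed: int = 0):
    """Sample one realization: early request customers plus timed late requests.

    Drawing LRC request times uniformly over the horizon matches a homogeneous
    Poisson request process conditional on the number of requests.
    """
    rng = random.Random(seed)
    n_lrc = sum(rng.random() < LRC_SHARE for _ in range(N_CUSTOMERS))
    erc: List[Tuple[float, float]] = [sample_location(distribution, rng)
                                      for _ in range(N_CUSTOMERS - n_lrc)]
    lrc = sorted((rng.uniform(0, TIME_LIMIT_MIN), sample_location(distribution, rng))
                 for _ in range(n_lrc))
    return erc, lrc
```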

Table 1. Results; best values are depicted in bold. Rows: λ = 0.0, 0.1, ..., 1.0; columns: confirmations and variance, each for F_U, F_2C, and F_3C.

For each instance setting, we run 1,000 test runs for λ = 0.0, 0.1, ..., 1.0. The average number of confirmations and the variance are depicted in Table 1. Notably, a mild risk-aversion leads to a higher risk-neutral objective value. This can be explained by the impact of risk-aversion on the tuning process. For a high λ, only the (relatively certain) outcomes of the next few decision points define the decision policy, leading to a fast and more reliable tuning process. A low λ results in an equal consideration of all subsequent decision points and outcomes; the corresponding tuning process requires a high number of approximation runs to be accurate. This is especially the case for the clustered customer distributions [2]. Further, ATB is based only on temporal attributes and may provide a less reliable tuning for clustered distributions compared to F_U [16]. As a result, the highest number of confirmations is achieved for λ = 0.3 and λ = 0.4 for the clustered distributions. As expected, we observe a constant decrease of the variances between λ = 0.0 and λ = 0.5; afterwards, the variance increases. A high λ is similar to a myopic policy and results in outcomes that highly depend on the problem's realization.

Fig. 1. Solution quality and standard deviation for varying λ and F_U.

We now analyze the tuning process and the tradeoff between the number of confirmations and the variance in more detail. Figure 1 shows the number of confirmations and the variance for varying λ and F_U for 1,000 test runs and for policies achieved by 100k, 500k, and 1,000k approximation runs. For 1,000k, λ = 0.1 to λ = 0.5 span a Pareto front for both dimensions. For 100k, the tuning of ATB_λ with λ = 0.1 is (still) not sufficient. During the tuning process, we observe an increase in the number of confirmations for low λ and a decrease in the variance for high λ. Hence, a directed tuning of ATB_λ (and AVI) towards the two different objectives can be achieved. The integration of risk-aversion further results in a faster and more reliable AVI tuning.

5 Conclusion

In this paper, we applied an ADP method to a DVRP with stochastic customer requests, enabling anticipation and the inclusion of the service provider's risk-aversion. Even though we simplistically assume the expected values V to follow a uniform distribution, the results show that the integration is not only possible, but also strengthens the tuning process and even improves the overall (risk-neutral) objective. In this paper, we considered a vanilla DVRP. Future work may focus on more real-world related problems and on problems containing unlikely events with significant impacts (e.g., vehicle breakdowns).

For a more efficient tuning process, risk-directed sampling may be included in the approach, as proposed in [12]. Further, historical data about previous decision making may be analyzed to quantify service providers' risk-aversion. For a more accurate approximation, the distribution of V could be considered explicitly by a set of quantiles. Finally, a mild risk-aversion improves the (risk-neutral) objective. Hence, it may be beneficial for many ADP methods to include a dynamic risk measure for a strengthened and more reliable tuning process. The risk-aversion may be decreased during the tuning process once a more reliable approximation is achieved.

References

1. Psaraftis, H.N., Wen, M., Kontovas, C.A.: Dynamic vehicle routing problems: Three decades and counting. Networks, online available (2015)
2. Ulmer, M.W., Mattfeld, D.C., Köster, F.: Budgeting time for dynamic vehicle routing with stochastic customer requests. Technical report, Technische Universität Braunschweig, Germany (2015)
3. Powell, W.B., Towns, M.T., Marar, A.: On the value of optimal myopic solutions for dynamic routing and scheduling problems in the presence of user noncompliance. Transportation Science 34 (2000)
4. Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. Volume 842. John Wiley & Sons, New York (2011)
5. Dyer, J.S., Sarin, R.K.: Relative risk aversion. Management Science 28 (1982)
6. Jackwerth, J.C.: Recovering risk aversion from option prices and realized returns. Review of Financial Studies 13 (2000)
7. Adulyasak, Y., Jaillet, P.: Models and algorithms for stochastic and robust vehicle routing with deadlines. Transportation Science, online available (2015)
8. Lau, H.C., Yeoh, W., Varakantham, P., Nguyen, D.T., Chen, H.: Dynamic stochastic orienteering problems for risk-aware applications. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (2012)
9. Ordóñez, F.: Robust vehicle routing. TUTORIALS in Operations Research (2010)
10. Jaillet, P., Wagner, M.R.: Online routing problems: Value of advanced information as improved competitive ratios. Transportation Science 40 (2006)
11. Taniguchi, E., Thompson, R.G., Yamada, T.: Incorporating risks in city logistics. Procedia - Social and Behavioral Sciences 2 (2010)
12. Jiang, D.R., Powell, W.B.: Approximate dynamic programming for dynamic quantile-based risk measures. Technical report, Princeton University (2015)
13. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York (2014)
14. Ruszczyński, A.: Risk-averse dynamic programming for Markov decision processes. Mathematical Programming 125 (2010)
15. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. Journal of Risk 2 (2000)
16. Ulmer, M.W., Mattfeld, D.C., Hennig, M., Goodson, J.C.: A rollout algorithm for vehicle routing with stochastic customer requests. In: Logistics Management. Springer (2015)
