Risk-Averse Anticipation for Dynamic Vehicle Routing
Marlin W. Ulmer (1) and Stefan Voß (2)

(1) Technische Universität Braunschweig, Mühlenpfordtstr. 23, Braunschweig, Germany, m.ulmer@tu-braunschweig.de
(2) Universität Hamburg, Von-Melle-Park 5, Hamburg, Germany, stefan.voss@hamburg.de

Abstract. In the field of dynamic vehicle routing, integrating stochastic information about possible future events into current decision making is becoming increasingly important. Integration is achieved by anticipatory solution approaches, often based on approximate dynamic programming (ADP). ADP methods estimate the expected mean values of future outcomes. In many cases, decision makers are risk-averse, meaning that they avoid risky decisions with highly volatile outcomes. Current ADP methods in the field of dynamic vehicle routing are not able to integrate risk-aversion. In this paper, we adapt a recently proposed ADP method that explicitly considers risk-aversion to a dynamic vehicle routing problem with stochastic requests. We analyze how risk-aversion impacts solution quality and variance. We show that a mild risk-aversion may even improve the risk-neutral objective.

Keywords: dynamic vehicle routing, anticipation, risk-aversion, approximate dynamic programming, stochastic customer requests

1 Introduction

Many service providers dispatch a fleet of vehicles during the day to transport goods or passengers and to conduct services at customers. Factors like e-commerce, digitization, and urbanization increase the uncertainty dispatchers have to consider in their plans, e.g., in travel times, service times, or customer demands [1]. In particular, customer requests often occur spontaneously during the day. In many cases, new requests require significant adaptations of the current plan [2]. These adaptations are enabled by real-time computational resources. Practical routing applications are generally modeled as dynamic vehicle routing problems (DVRPs, compare [1]). For many DVRPs, static approaches applied on a rolling horizon are not suitable [3]. Anticipation of possible future events and decisions is mandatory to allow reliable, flexible, and effective plans. Anticipation can be achieved by approximate dynamic programming (ADP) [4]. ADP for DVRPs is widely established, especially for stochastic requests [2]. ADP methods evaluate decisions regarding the expected future rewards (or costs). The expected future rewards are usually approximated via simulation.
Generally, a tradeoff between current and future rewards can be experienced: high immediate rewards lower the expected future rewards. Dispatchers aim for an optimal balance between immediate and future rewards. All ADP approaches applied to DVRPs maximize the sum of immediate and expected future rewards. In practice, decisions also depend on the variance of the expected future rewards, i.e., on the service provider's risk-aversion [5]. A risk-averse provider may discount the expected future rewards if the variance, i.e., the uncertainty about a decision's success, is high. In some cases, practitioners are able to quantify their risk-aversion. In other cases, the degree of risk-aversion can be derived by analyzing historical decisions [6]. The derived properties then have to be integrated into a suitable anticipatory DVRP approach.

Work on risk-aversion for vehicle routing problems is limited. In (static) vehicle routing with stochastic travel times, explicit inclusion of risk-aversion is, e.g., achieved by [7]. [8] evaluate plans by risk for a dynamic orienteering problem. Risk-aversion is indirectly considered in robust optimization (e.g., [9]) and competitive analysis (e.g., [10]). Both optimize to avoid worst-case scenarios; their practical suitability is often limited. Until now, the ADP methods applied to DVRPs have not been able to integrate practitioners' risk-aversion. Anticipation is based on mean values. In particular, low-probability, high-impact incidents are not sufficiently considered [11]. Recently, Jiang and Powell [12] proposed a general ADP method integrating quantiles of the expected value distribution, and therefore the variance, into the anticipation. In this paper, we adapt the proposed method to an ADP approach of anticipatory time budgeting (ATB, [2]) for a DVRP with stochastic customer requests. We analyze the impact on rewards and variances for different instance settings and degrees of risk-aversion. This paper is the first work integrating (dynamic) risk-aversion in an ADP approach for dynamic vehicle routing. We show that an explicit inclusion of risk-aversion in DVRPs is possible and that a mild risk-aversion even strengthens the approximation process, resulting in higher rewards and lower variances compared to the risk-neutral equivalent.

2 Dynamic VRP with Stochastic Requests

In this section, we define the DVRP with stochastic requests via a Markov decision process (MDP, [13]). For the given problem, a vehicle serves customers in a service area considering a time limit. The tour starts and ends in a depot. A set of known early request customers (ERC) has to be served. During the day, new requests occur. Whenever the vehicle is located at a customer, the dispatcher has to decide about the subset of occurred requests to be confirmed and the next customer to visit. Waiting is permitted. The dispatcher aims at maximizing the number of confirmed late request customers (LRC).

Modeling the problem as an MDP, a decision point k occurs when the vehicle is located at a customer. A state S_k consists of the point of time, the vehicle's position, the set of not yet served ERC and confirmed LRC, and the set of new LRC. Decisions x determine the subset of requests to be confirmed and the next customer to visit, or waiting. The immediate reward R(S_k, x) is the number of newly confirmed LRC. A post-decision state S_k^x consists of the point of time, the vehicle's position, the not yet served ERC and confirmed LRC, and the next customer to visit. The transition results from the vehicle's travel and provides a new set of requesting LRC. The process terminates in state S_K when no customers remain to be served and the vehicle has returned to the depot. The objective is to derive a decision policy π maximizing the expected sum of rewards over all decision points. Notably, this objective is defined for a risk-neutral dispatcher.
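To make the model concrete, the following minimal sketch encodes the state, decision, and reward elements just described. All names (State, Decision, immediate_reward, and so on) are our illustrative choices, not identifiers from the authors' implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """Decision state S_k: the vehicle is located at a customer."""
    time: float                # current point of time
    position: int              # index of the vehicle's current location
    to_serve: frozenset        # not yet served ERC and confirmed LRC
    new_requests: frozenset    # LRC that requested since the last decision point

@dataclass(frozen=True)
class Decision:
    """Decision x: which new requests to confirm and where to go next."""
    confirmations: frozenset   # subset of State.new_requests to confirm
    next_customer: object      # next customer to visit, or None to wait

def immediate_reward(state: State, decision: Decision) -> int:
    """R(S_k, x): the number of newly confirmed late request customers."""
    assert decision.confirmations <= state.new_requests
    return len(decision.confirmations)

@dataclass(frozen=True)
class PostDecisionState:
    """Post-decision state S_k^x, before the stochastic request transition."""
    time: float
    position: int
    to_serve: frozenset        # now including the newly confirmed LRC
    next_customer: object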
3 Risk-Averse Time Budgeting

In this section, we extend ATB by [2] to ATB_λ, allowing the integration of risk-aversion. ATB draws on the ADP method of approximate value iteration (AVI, [4]) to evaluate post-decision states (PDSs) S^x regarding the expected number of future confirmations, i.e., their value V(S^x). To be more specific, AVI represents ways of using past experience about the algorithm's behavior to improve future performance; tuning refers to the update of values. Because of the curses of dimensionality, PDSs are aggregated to vectors containing the point of time and the remaining free time budget. The resulting vector space is then partitioned to a lookup table (LT). Every entry of the LT contains a set of vectors. AVI starts with initial entry values V̂_0, inducing a policy π_0. Then, AVI iteratively simulates a problem realization i and tunes the values V̂_{i-1} regarding the algorithm's performance. Within each approximation run i, policy π_{i-1} is applied based on Bellman's Equation [13], depicted in Eq. (1). The values for the new policy π_i are tuned by the realized values of approximation run i.

$X_k^{\pi_i}(S_k) = \arg\max_{x \in \mathcal{X}(S_k)} \left\{ R(S_k, x) + \hat{V}_i(S^x) \right\}$   (1)

V(S^x) is a random variable. A risk-averse policy aims at avoiding highly volatile V(S^x). Notably, V(S^x) is the sum of a sequence of interdependent random variables R(S_{k+i}, x), 0 < i < K - k, i.e., the volatility and the impact of the volatility may change over the subsequent decision points. A straightforward evaluation of the variance of V(S^x) is therefore not sufficient to consider dynamic risk-aversion. [14] describes dynamic risk measures ρ(S^x) considering the risk over the subsequent decision points. [12] present an algorithm to approximate ρ(S^x) for every post-decision state by ρ_α via ADP methods. They use the quantiles of ρ_α as an approximation of the real value distribution of ρ. For ATB_λ, we draw on the concept of conditional value at risk (CVaR, [15]). The considered dynamic risk measure ρ_α is induced by the one-step conditional risk measure ρ_λ as depicted in Eq. (2).

$\rho_\lambda(S^x) = (1 - \lambda)\, V(S^x) + \lambda\, \rho_\alpha(S^x)$   (2)

To achieve ρ_α, ρ_λ is recursively applied over the subsequent decision points. For an efficient approximation, we simplistically assume that V(S^x) follows a uniform distribution. This avoids an extensive estimation of the distribution for every value. As a result, parameter λ ∈ [0, 1] directly determines the dispatcher's risk-aversion: λ = 0 results in risk-neutrality and ATB; λ = 1 results in a myopic policy. For the tuning of ATB_λ, we approximate both V(S^x) and ρ_α(S^x).
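To make Eqs. (1) and (2) concrete, the following is a minimal sketch of the risk-adjusted decision selection under the uniform-distribution assumption above. The function names, the lower-tail CVaR convention, and the callable interfaces are our illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Iterable

def cvar_uniform(mean: float, half_width: float, alpha: float) -> float:
    """CVaR_alpha of U(mean - half_width, mean + half_width): the expected
    value over the worst (lowest) alpha-fraction of outcomes. For a uniform
    distribution on [a, b] this has the closed form a + alpha * (b - a) / 2."""
    a, b = mean - half_width, mean + half_width
    return a + alpha * (b - a) / 2.0

def rho_lambda(v_hat: float, rho_alpha_hat: float, lam: float) -> float:
    """One-step conditional risk measure of Eq. (2)."""
    return (1.0 - lam) * v_hat + lam * rho_alpha_hat

def select_decision(decisions: Iterable,
                    reward: Callable,         # x -> R(S_k, x)
                    post_state: Callable,     # x -> S^x (deterministic)
                    v_hat: Callable,          # S^x -> estimate of V(S^x)
                    rho_alpha_hat: Callable,  # S^x -> estimate of rho_alpha(S^x)
                    lam: float):
    """Risk-averse variant of the Bellman selection in Eq. (1): the value
    estimate V_hat is replaced by the risk-adjusted estimate rho_lambda."""
    return max(decisions,
               key=lambda x: reward(x) + rho_lambda(v_hat(post_state(x)),
                                                    rho_alpha_hat(post_state(x)),
                                                    lam))
```

Under the uniform assumption, the ρ_α estimate of a lookup-table entry can be obtained in closed form via cvar_uniform from the entry's estimated mean and an assumed spread, so no per-entry distribution has to be stored.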
4 Computational Studies

In this section, we define the settings of ATB_λ, briefly describe the instances, and analyze the results. For ATB_λ, we follow the parameter settings of [2]. We use a (static) LT with an interval length of one. We consider the tuning after 1 million approximation runs. We draw on the instances of [2]. The time limit is set to 360 minutes. The vehicle travels with constant speed v = 15 km/h in a service area of 20 km × 20 km. The expected number of customers is 100. The percentage of LRC is 75%. Customer requests follow a Poisson distribution over time. We consider three spatial customer distributions: customers are distributed uniformly (F_U), equally grouped in two clusters (F_2C), or distributed in three clusters (F_3C). Within the clusters, the request probability is normally distributed.

Table 1. Results; best values are depicted in bold. [Columns: λ; confirmations for F_U, F_2C, F_3C; variance for F_U, F_2C, F_3C. Numerical entries not recoverable from this transcription.]

For each instance setting, we run 1,000 test runs for λ = 0.0, 0.1, ..., 1.0. The average number of confirmations and the variance are depicted in Table 1.
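A compact sketch of this evaluation protocol follows; run_realization is a placeholder for applying the policy tuned with a given λ to one sampled day of requests, and all names are our own.

```python
import statistics
from typing import Callable

def sweep_lambda(run_realization: Callable[[float, int], int],
                 n_runs: int = 1000):
    """For each risk-aversion level lambda, apply the tuned policy to n_runs
    simulated realizations and record the mean number of confirmations and
    the variance, as in Table 1."""
    results = {}
    for lam in [i / 10 for i in range(11)]:  # lambda = 0.0, 0.1, ..., 1.0
        confs = [run_realization(lam, seed) for seed in range(n_runs)]
        results[lam] = (statistics.mean(confs), statistics.variance(confs))
    return results
```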
Notably, a mild risk-aversion leads to a higher risk-neutral objective value. This can be explained by the impact of risk-aversion on the tuning process. For a high λ, only the (relatively certain) outcomes of the next few decision points define the decision policy, leading to a fast and more reliable tuning process. A low λ results in an equal consideration of all subsequent decision points and outcomes. The corresponding tuning process requires a high number of approximation runs to be accurate. This is especially the case for the clustered customer distributions [2]. Further, ATB is based only on temporal attributes and may provide a less reliable tuning for clustered distributions compared to F_U [16]. As a result, the highest number of confirmations is achieved for λ = 0.3 and λ = 0.4 for the clustered distributions. As expected, we experience a constant decrease of the variances between λ = 0.0 and λ = 0.5. Afterwards, the variance increases. A high λ is similar to a myopic policy and results in outcomes highly dependent on the problem's realization.

Fig. 1. Solution Quality and Standard Deviation for Varying λ and F_U

We now analyze the tuning process and the tradeoff between the number of confirmations and the variance in more detail. Figure 1 shows the number of confirmations and the variance for varying λ and F_U for 1,000 test runs and policies achieved by 100k, 500k, and 1,000k approximation runs. For 1,000k, λ = 0.1 to λ = 0.5 span a Pareto front for both dimensions. For 100k, the tuning for ATB_λ with λ = 0.1 is (still) not sufficient. During the tuning process, we experience an increase in the number of confirmations for low λ, and a decrease in the variance for high λ. Hence, a directed tuning of ATB_λ (and AVI) towards the two different objectives can be achieved. The integration of risk-aversion further results in a faster and more reliable AVI tuning.

5 Conclusion

In this paper, we applied an ADP method to a DVRP with stochastic customer requests, enabling anticipation and the inclusion of the service provider's risk-aversion. Even though we simplistically assume the expected values V to follow a uniform distribution, the results show that the integration is not only possible but also strengthens the tuning process and even improves the overall (risk-neutral) objective. In this paper, we considered a vanilla DVRP. Future work may focus on more real-world related problems and problems containing unlikely events with significant impacts (e.g., vehicle breakdowns).
For a more efficient tuning process, risk-directed sampling may be included in the approach as proposed in [12]. Further, historical data about previous decision making may be analyzed to quantify service providers' risk-aversion. For a more accurate approximation, the distribution of V could be explicitly considered by a set of quantiles. Finally, a mild risk-aversion improves the (risk-neutral) objective. Hence, it may be beneficial for many ADP methods to include a dynamic risk measure for a strengthened and more reliable tuning process. The risk-aversion may decrease during the tuning process once a more reliable approximation is achieved.

References

1. Psaraftis, H.N., Wen, M., Kontovas, C.A.: Dynamic vehicle routing problems: Three decades and counting. Networks, online available (2015)
2. Ulmer, M.W., Mattfeld, D.C., Köster, F.: Budgeting time for dynamic vehicle routing with stochastic customer requests. Technical report, Technische Universität Braunschweig, Germany (2015)
3. Powell, W.B., Towns, M.T., Marar, A.: On the value of optimal myopic solutions for dynamic routing and scheduling problems in the presence of user noncompliance. Transportation Science 34 (2000)
4. Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. Volume 842. John Wiley & Sons, New York (2011)
5. Dyer, J.S., Sarin, R.K.: Relative risk aversion. Management Science 28 (1982)
6. Jackwerth, J.C.: Recovering risk aversion from option prices and realized returns. Review of Financial Studies 13 (2000)
7. Adulyasak, Y., Jaillet, P.: Models and algorithms for stochastic and robust vehicle routing with deadlines. Transportation Science, online available (2015)
8. Lau, H.C., Yeoh, W., Varakantham, P., Nguyen, D.T., Chen, H.: Dynamic stochastic orienteering problems for risk-aware applications. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (2012)
9. Ordóñez, F.: Robust vehicle routing. TUTORIALS in Operations Research (2010)
10. Jaillet, P., Wagner, M.R.: Online routing problems: Value of advanced information as improved competitive ratios. Transportation Science 40 (2006)
11. Taniguchi, E., Thompson, R.G., Yamada, T.: Incorporating risks in city logistics. Procedia - Social and Behavioral Sciences 2 (2010)
12. Jiang, D.R., Powell, W.B.: Approximate dynamic programming for dynamic quantile-based risk measures. Technical report, Princeton University (2015)
13. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York (2014)
14. Ruszczyński, A.: Risk-averse dynamic programming for Markov decision processes. Mathematical Programming 125 (2010)
15. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. Journal of Risk 2 (2000)
16. Ulmer, M.W., Mattfeld, D.C., Hennig, M., Goodson, J.C.: A rollout algorithm for vehicle routing with stochastic customer requests. In: Logistics Management. Springer (2015)