Dynamic Programming (DP) Massimo Paolucci University of Genova
|
|
- Magnus Rice
- 5 years ago
- Views:
Transcription
1 Dynamic Programming (DP) Massimo Paolucci University of Genova
2 DP cannot be applied to each kind of problem In particular, it is a solution method for problems defined over stages For each stage a subproblem is defined The overall solution is obtained recursively DP is based on the so-called Optimality Principle, which allows one to reduce the solution of a given problem to the solutions of a series of subproblems
3 A company has three machineries and 5M available to expand them C i : possible investment on machinery i R i : corresponding profit Possibility of investment An example Machinery 1 Machinery 2 Machinery 3 C1 R1 C2 R2 C3 R Target: finding the most rewarding investment
4 In this simple example one might enumerate all possible alternatives, but in general explicit enumeration is cumbersome and inefficient: for each alternative one has to solve the whole problem; unfeasible alternatives are not recognized a priori; at each step, the information obtained during the computation of previous alternatives is not exploited.
5 In this example, the approach based on Dynamic Programming can be introduced via a graph. The model: stage machinery state (x i ) possible allocation of money to machineries from stage to stage i (x i 5) arc (x i, x i+1 ) most rewarding allocation of money (x i+1 -x i ) Weight of each arc profit associated with the corresponding investment
6 stage x stage 1 x 1 stage 2 x 2 stage 3 x
7 The problem consists in finding the path -5 (from stage to stage 3) with largest weight Each path represents an admissible solution Some arcs represent overspending situations
8 Backward phase The graph is travelled backward, starting from stage 3 To each node one associates f i (x i ) := length of the maximal path between x i and x 3
9 stage 3: f 3 (5) = stage 2: f 2 (x 2 ) = R(x 2, x 3 ) = length of the maximal path between x 2 and x 3 f 2 () = f 2 (1) = f 2 (2) = f 2 (3) = f 2 (4) = 3 f 2 (5) =
10 stage 1: length of the maximal path between x 1 and x 3 f1(x 1) max f2(x2) R(x1,x 2) x 2 length of the maximal path between x 2 and x 3 length of the arc between x 1 and x 2
11 stage 1: f 1 () = max[+3, +3, 8+3, 9+3, 12+3, 12+] = 15 f 1 (1) = max[+3, +3, 8+3, 9+3, 12+] = 12 f 1 (2) = max[+3, +3, 8+3, 9+] = 11 f 1 (3) = max[+3, +3, 8+] = 8 f 1 (4) = max[+3, +] = 3 f 1 (5) = max[+] =
12 stage : length of the maximal path between x and x 3 f () = max[+15, 5+12, 6+11, 6+8, 6+3, 6+] = 17 f() max f1(x 1) R(,x1) x 1
13 Forward phase There exist various alternative maximal paths, which can be found by travelling the solution forward: path 1) possibility of investment 2, 3, 2 path 2) possibility of investment 2, 4, 1 path 3) possibility of investment 3, 2, 2
14 We have seen a deterministic example The DP approach can be applied, with suitable modifications, also to stochastic contexts
15 The backward equations of DP (deterministic context) Let us consider the following general case: x N-1 x 1 x 2 x N-1 N x N N stages (i=,...,n-1) For each stage i one has x i {1,...,q}, i.e., there are q possible different states. In general, the state is a vector in R n A cost T(x i, x i+1 ) is associated with each arc (e.g., the cost when the arc is travelled)
16 Minimum total cost: T N1 min T i(x i,xi1) i Number of possible paths: q N-1 E.g., q=1, N= If a path is computed in 1-6 sec, then finding the solution requires 1 14 sec years!!!
17 DP solves the problem with a backward procedure, in which a subproblem is solved at each stage To each arc one associates the optimal cost required to reach the final stage starting from it Stage N-1: T N-1(x N-1 ) = T N-1 (x N-1, x N ) For instance, in the case of the route of an plane, such a cost is the time required to reach x N from x N-1
18 Stage N - 2: T (x ) min T (x,x ) T N 2 N2 N2 N2 N1 N 1 (xn1) xn1 Minimum time to reach x N starting from x N-2 Minimum time to reach x N starting from x N-1 Minimum time to reach x N-1 starting from x N-2
19 In general Backward equations of DP T (xi) i min T i(xi,xi1) xi1 T i 1 (xi1) i N 1,N 2,..., T (xn) N Cost-To-Go
20 Optimality Principle: Whichever the state at a certain stage is, one has to proceed by following the optimal trajectory: T minmin... x1 x2 min TN xn1... At each stage, one has to make q sums q q 2 sums Hence, in total one has q 2 N operations Example: = = 2.1 msec Compare with years!!!
21 Bellman s theorem It proves the correctness of the backward equations of DP T N1 min T i(x i,xi1) x1,...,xn1 i x : departure, x N : arrival T min T (x,x1) x1,...,xn1 N1 T i(x i,xi1) i1 x 1 influences all the terms, but x 2,...,x N-1 influence only the terms inside the sum.
22 Hence T mint (x,x1) x1 N1 min T i(x i,xi1) x2,...,xn1 i1 Cost-To-Go Optimal cost from the second stage till the last one
23 The equation can be rewritten as T min T (x,x ) T 1 (x1) 1 x1 1st equation of DP By writing T 1 (x 1 ) explicitly T N 1 (x ) min T (x,x ) T (x,x ) i i i 1 x2,...,xn1 i2 mint1 (x1,x 2) x2 min T (x,x ) T (x2) 2 x2 N1 min T i(x i,xi1) x3,...,xn1 i2 2nd equation of DP... and so on
24 The curse of dimensionality In general the state is not a scalar, but an n-dimensional vector Hence, at each stage of DP one has to keep the information regarding all possible combinations of values.
25 For instance, when the state has dimension 2 (i.e., n=2) and the possible values of each x i are d, then one has for each stage q q for N-1 stages q 2 q 2... q 2 = q 2(N-1) in general, for dimension n q n(n-1) x 2 x 1 stage 1... stage N Number of operations in dimension n: q n q n =q 2n sums per stage q 2n N operations over N stages
26 Dynamic Programming: summing up To apply DP, it is required that the problem can be divided into stages. For every stage, a decision policy has to be determined A number of states is associated with each stage The effect of the decision at each stage consists in transforming the present state into a new state, associated with the next stage
27 At the current stage, the decision taken at previous stages does not influence the decision at next stages The backward solution process determines the optimal decision policy for each state of the previous stage The optimal decision policy is obtained recursively It is determined by travelling forward the sequence of decisions
Optimization Methods. Lecture 16: Dynamic Programming
15.093 Optimization Methods Lecture 16: Dynamic Programming 1 Outline 1. The knapsack problem Slide 1. The traveling salesman problem 3. The general DP framework 4. Bellman equation 5. Optimal inventory
More informationDynamic Programming: An overview. 1 Preliminaries: The basic principle underlying dynamic programming
Dynamic Programming: An overview These notes summarize some key properties of the Dynamic Programming principle to optimize a function or cost that depends on an interval or stages. This plays a key role
More informationIntroduction to Dynamic Programming
Introduction to Dynamic Programming http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Mengdi Wang s and Prof. Dimitri Bertsekas lecture notes Outline 2/65 1
More informationHandout 4: Deterministic Systems and the Shortest Path Problem
SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 4: Deterministic Systems and the Shortest Path Problem Instructor: Shiqian Ma January 27, 2014 Suggested Reading: Bertsekas
More informationLecture Notes 1
4.45 Lecture Notes Guido Lorenzoni Fall 2009 A portfolio problem To set the stage, consider a simple nite horizon problem. A risk averse agent can invest in two assets: riskless asset (bond) pays gross
More informationIEOR E4004: Introduction to OR: Deterministic Models
IEOR E4004: Introduction to OR: Deterministic Models 1 Dynamic Programming Following is a summary of the problems we discussed in class. (We do not include the discussion on the container problem or the
More informationHandout 8: Introduction to Stochastic Dynamic Programming. 2 Examples of Stochastic Dynamic Programming Problems
SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 8: Introduction to Stochastic Dynamic Programming Instructor: Shiqian Ma March 10, 2014 Suggested Reading: Chapter 1 of Bertsekas,
More informationProblem Set 2: Answers
Economics 623 J.R.Walker Page 1 Problem Set 2: Answers The problem set came from Michael A. Trick, Senior Associate Dean, Education and Professor Tepper School of Business, Carnegie Mellon University.
More information6.231 DYNAMIC PROGRAMMING LECTURE 3 LECTURE OUTLINE
6.21 DYNAMIC PROGRAMMING LECTURE LECTURE OUTLINE Deterministic finite-state DP problems Backward shortest path algorithm Forward shortest path algorithm Shortest path examples Alternative shortest path
More information6.231 DYNAMIC PROGRAMMING LECTURE 10 LECTURE OUTLINE
6.231 DYNAMIC PROGRAMMING LECTURE 10 LECTURE OUTLINE Rollout algorithms Cost improvement property Discrete deterministic problems Approximations of rollout algorithms Discretization of continuous time
More informationLEC 13 : Introduction to Dynamic Programming
CE 191: Civl and Environmental Engineering Systems Analysis LEC 13 : Introduction to Dynamic Programming Professor Scott Moura Civl & Environmental Engineering University of California, Berkeley Fall 2013
More informationDynamic Portfolio Choice II
Dynamic Portfolio Choice II Dynamic Programming Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Dynamic Portfolio Choice II 15.450, Fall 2010 1 / 35 Outline 1 Introduction to Dynamic
More informationCS 188: Artificial Intelligence. Outline
C 188: Artificial Intelligence Markov Decision Processes (MDPs) Pieter Abbeel UC Berkeley ome slides adapted from Dan Klein 1 Outline Markov Decision Processes (MDPs) Formalism Value iteration In essence
More informationLecture 5 January 30
EE 223: Stochastic Estimation and Control Spring 2007 Lecture 5 January 30 Lecturer: Venkat Anantharam Scribe: aryam Kamgarpour 5.1 Secretary Problem The problem set-up is explained in Lecture 4. We review
More informationDeterministic Dynamic Programming
Deterministic Dynamic Programming Dynamic programming is a technique that can be used to solve many optimization problems. In most applications, dynamic programming obtains solutions by working backward
More informationEE365: Risk Averse Control
EE365: Risk Averse Control Risk averse optimization Exponential risk aversion Risk averse control 1 Outline Risk averse optimization Exponential risk aversion Risk averse control Risk averse optimization
More informationOptimal energy management and stochastic decomposition
Optimal energy management and stochastic decomposition F. Pacaud P. Carpentier J.P. Chancelier M. De Lara JuMP-dev workshop, 2018 ENPC ParisTech ENSTA ParisTech Efficacity 1/23 Motivation We consider a
More informationCS 188: Artificial Intelligence
CS 188: Artificial Intelligence Markov Decision Processes Dan Klein, Pieter Abbeel University of California, Berkeley Non-Deterministic Search 1 Example: Grid World A maze-like problem The agent lives
More informationChapter 21. Dynamic Programming CONTENTS 21.1 A SHORTEST-ROUTE PROBLEM 21.2 DYNAMIC PROGRAMMING NOTATION
Chapter 21 Dynamic Programming CONTENTS 21.1 A SHORTEST-ROUTE PROBLEM 21.2 DYNAMIC PROGRAMMING NOTATION 21.3 THE KNAPSACK PROBLEM 21.4 A PRODUCTION AND INVENTORY CONTROL PROBLEM 23_ch21_ptg01_Web.indd
More information6.231 DYNAMIC PROGRAMMING LECTURE 5 LECTURE OUTLINE
6.231 DYNAMIC PROGRAMMING LECTURE 5 LECTURE OUTLINE Stopping problems Scheduling problems Minimax Control 1 PURE STOPPING PROBLEMS Two possible controls: Stop (incur a one-time stopping cost, and move
More informationLecture 2 Dynamic Equilibrium Models: Three and More (Finite) Periods
Lecture 2 Dynamic Equilibrium Models: Three and More (Finite) Periods. Introduction In ECON 50, we discussed the structure of two-period dynamic general equilibrium models, some solution methods, and their
More informationTHE TRAVELING SALESMAN PROBLEM FOR MOVING POINTS ON A LINE
THE TRAVELING SALESMAN PROBLEM FOR MOVING POINTS ON A LINE GÜNTER ROTE Abstract. A salesperson wants to visit each of n objects that move on a line at given constant speeds in the shortest possible time,
More information6.231 DYNAMIC PROGRAMMING LECTURE 3 LECTURE OUTLINE
6.21 DYNAMIC PROGRAMMING LECTURE LECTURE OUTLINE Deterministic finite-state DP problems Backward shortest path algorithm Forward shortest path algorithm Shortest path examples Alternative shortest path
More informationCHAPTER 5: DYNAMIC PROGRAMMING
CHAPTER 5: DYNAMIC PROGRAMMING Overview This chapter discusses dynamic programming, a method to solve optimization problems that involve a dynamical process. This is in contrast to our previous discussions
More informationStochastic Optimal Control
Stochastic Optimal Control Lecturer: Eilyan Bitar, Cornell ECE Scribe: Kevin Kircher, Cornell MAE These notes summarize some of the material from ECE 5555 (Stochastic Systems) at Cornell in the fall of
More informationDefinition 4.1. In a stochastic process T is called a stopping time if you can tell when it happens.
102 OPTIMAL STOPPING TIME 4. Optimal Stopping Time 4.1. Definitions. On the first day I explained the basic problem using one example in the book. On the second day I explained how the solution to the
More informationNon-Deterministic Search
Non-Deterministic Search MDP s 1 Non-Deterministic Search How do you plan (search) when your actions might fail? In general case, how do you plan, when the actions have multiple possible outcomes? 2 Example:
More information0/1 knapsack problem knapsack problem
1 (1) 0/1 knapsack problem. A thief robbing a safe finds it filled with N types of items of varying size and value, but has only a small knapsack of capacity M to use to carry the goods. More precisely,
More informationComplex Decisions. Sequential Decision Making
Sequential Decision Making Outline Sequential decision problems Value iteration Policy iteration POMDPs (basic concepts) Slides partially based on the Book "Reinforcement Learning: an introduction" by
More informationFrom Discrete Time to Continuous Time Modeling
From Discrete Time to Continuous Time Modeling Prof. S. Jaimungal, Department of Statistics, University of Toronto 2004 Arrow-Debreu Securities 2004 Prof. S. Jaimungal 2 Consider a simple one-period economy
More informationCS 188: Artificial Intelligence
CS 188: Artificial Intelligence Markov Decision Processes Dan Klein, Pieter Abbeel University of California, Berkeley Non Deterministic Search Example: Grid World A maze like problem The agent lives in
More informationForecast Horizons for Production Planning with Stochastic Demand
Forecast Horizons for Production Planning with Stochastic Demand Alfredo Garcia and Robert L. Smith Department of Industrial and Operations Engineering Universityof Michigan, Ann Arbor MI 48109 December
More informationMartingale Pricing Theory in Discrete-Time and Discrete-Space Models
IEOR E4707: Foundations of Financial Engineering c 206 by Martin Haugh Martingale Pricing Theory in Discrete-Time and Discrete-Space Models These notes develop the theory of martingale pricing in a discrete-time,
More information6.231 DYNAMIC PROGRAMMING LECTURE 8 LECTURE OUTLINE
6.231 DYNAMIC PROGRAMMING LECTURE 8 LECTURE OUTLINE Suboptimal control Cost approximation methods: Classification Certainty equivalent control: An example Limited lookahead policies Performance bounds
More informationLecture 17: More on Markov Decision Processes. Reinforcement learning
Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture
More information1 Answers to the Sept 08 macro prelim - Long Questions
Answers to the Sept 08 macro prelim - Long Questions. Suppose that a representative consumer receives an endowment of a non-storable consumption good. The endowment evolves exogenously according to ln
More informationCS 188: Artificial Intelligence Fall 2011
CS 188: Artificial Intelligence Fall 2011 Lecture 9: MDPs 9/22/2011 Dan Klein UC Berkeley Many slides over the course adapted from either Stuart Russell or Andrew Moore 2 Grid World The agent lives in
More informationMarkov Decision Processes: Making Decision in the Presence of Uncertainty. (some of) R&N R&N
Markov Decision Processes: Making Decision in the Presence of Uncertainty (some of) R&N 16.1-16.6 R&N 17.1-17.4 Different Aspects of Machine Learning Supervised learning Classification - concept learning
More informationEssays on Some Combinatorial Optimization Problems with Interval Data
Essays on Some Combinatorial Optimization Problems with Interval Data a thesis submitted to the department of industrial engineering and the institute of engineering and sciences of bilkent university
More informationDynamic Appointment Scheduling in Healthcare
Brigham Young University BYU ScholarsArchive All Theses and Dissertations 2011-12-05 Dynamic Appointment Scheduling in Healthcare McKay N. Heasley Brigham Young University - Provo Follow this and additional
More informationAvailable online at ScienceDirect. Procedia Computer Science 95 (2016 )
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 95 (2016 ) 483 488 Complex Adaptive Systems, Publication 6 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri
More informationPortfolio Choice. := δi j, the basis is orthonormal. Expressed in terms of the natural basis, x = j. x j x j,
Portfolio Choice Let us model portfolio choice formally in Euclidean space. There are n assets, and the portfolio space X = R n. A vector x X is a portfolio. Even though we like to see a vector as coordinate-free,
More information17 MAKING COMPLEX DECISIONS
267 17 MAKING COMPLEX DECISIONS The agent s utility now depends on a sequence of decisions In the following 4 3grid environment the agent makes a decision to move (U, R, D, L) at each time step When the
More informationHomework #4. CMSC351 - Spring 2013 PRINT Name : Due: Thu Apr 16 th at the start of class
Homework #4 CMSC351 - Spring 2013 PRINT Name : Due: Thu Apr 16 th at the start of class o Grades depend on neatness and clarity. o Write your answers with enough detail about your approach and concepts
More informationSequential Decision Making
Sequential Decision Making Dynamic programming Christos Dimitrakakis Intelligent Autonomous Systems, IvI, University of Amsterdam, The Netherlands March 18, 2008 Introduction Some examples Dynamic programming
More informationRisk-Averse Anticipation for Dynamic Vehicle Routing
Risk-Averse Anticipation for Dynamic Vehicle Routing Marlin W. Ulmer 1 and Stefan Voß 2 1 Technische Universität Braunschweig, Mühlenpfordtstr. 23, 38106 Braunschweig, Germany, m.ulmer@tu-braunschweig.de
More informationReinforcement Learning
Reinforcement Learning Basic idea: Receive feedback in the form of rewards Agent s utility is defined by the reward function Must (learn to) act so as to maximize expected rewards Grid World The agent
More informationLecture outline W.B.Powell 1
Lecture outline What is a policy? Policy function approximations (PFAs) Cost function approximations (CFAs) alue function approximations (FAs) Lookahead policies Finding good policies Optimizing continuous
More informationChapter 15: Dynamic Programming
Chapter 15: Dynamic Programming Dynamic programming is a general approach to making a sequence of interrelated decisions in an optimum way. While we can describe the general characteristics, the details
More informationMarkov Decision Processes
Markov Decision Processes Robert Platt Northeastern University Some images and slides are used from: 1. CS188 UC Berkeley 2. RN, AIMA Stochastic domains Image: Berkeley CS188 course notes (downloaded Summer
More informationEE266 Homework 5 Solutions
EE, Spring 15-1 Professor S. Lall EE Homework 5 Solutions 1. A refined inventory model. In this problem we consider an inventory model that is more refined than the one you ve seen in the lectures. The
More informationEC316a: Advanced Scientific Computation, Fall Discrete time, continuous state dynamic models: solution methods
EC316a: Advanced Scientific Computation, Fall 2003 Notes Section 4 Discrete time, continuous state dynamic models: solution methods We consider now solution methods for discrete time models in which decisions
More informationOptimal Dam Management
Optimal Dam Management Michel De Lara et Vincent Leclère July 3, 2012 Contents 1 Problem statement 1 1.1 Dam dynamics.................................. 2 1.2 Intertemporal payoff criterion..........................
More informationOptimal Security Liquidation Algorithms
Optimal Security Liquidation Algorithms Sergiy Butenko Department of Industrial Engineering, Texas A&M University, College Station, TX 77843-3131, USA Alexander Golodnikov Glushkov Institute of Cybernetics,
More informationNeuro-Dynamic Programming for Fractionated Radiotherapy Planning
Neuro-Dynamic Programming for Fractionated Radiotherapy Planning Geng Deng Michael C. Ferris University of Wisconsin at Madison Conference on Optimization and Health Care, Feb, 2006 Background Optimal
More informationRobust Dual Dynamic Programming
1 / 18 Robust Dual Dynamic Programming Angelos Georghiou, Angelos Tsoukalas, Wolfram Wiesemann American University of Beirut Olayan School of Business 31 May 217 2 / 18 Inspired by SDDP Stochastic optimization
More informationMDPs: Bellman Equations, Value Iteration
MDPs: Bellman Equations, Value Iteration Sutton & Barto Ch 4 (Cf. AIMA Ch 17, Section 2-3) Adapted from slides kindly shared by Stuart Russell Sutton & Barto Ch 4 (Cf. AIMA Ch 17, Section 2-3) 1 Appreciations
More informationOptimal Securitization via Impulse Control
Optimal Securitization via Impulse Control Rüdiger Frey (joint work with Roland C. Seydel) Mathematisches Institut Universität Leipzig and MPI MIS Leipzig Bachelier Finance Society, June 21 (1) Optimal
More informationAMH4 - ADVANCED OPTION PRICING. Contents
AMH4 - ADVANCED OPTION PRICING ANDREW TULLOCH Contents 1. Theory of Option Pricing 2 2. Black-Scholes PDE Method 4 3. Martingale method 4 4. Monte Carlo methods 5 4.1. Method of antithetic variances 5
More informationYao s Minimax Principle
Complexity of algorithms The complexity of an algorithm is usually measured with respect to the size of the input, where size may for example refer to the length of a binary word describing the input,
More informationDynamic Programming cont. We repeat: The Dynamic Programming Template has three parts.
Page 1 Dynamic Programming cont. We repeat: The Dynamic Programming Template has three parts. Subproblems Sometimes this is enough if the algorithm and its complexity is obvious. Recursion Algorithm Must
More informationReinforcement Learning. Slides based on those used in Berkeley's AI class taught by Dan Klein
Reinforcement Learning Slides based on those used in Berkeley's AI class taught by Dan Klein Reinforcement Learning Basic idea: Receive feedback in the form of rewards Agent s utility is defined by the
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2011 Lecture 9: MDPs 2/16/2011 Pieter Abbeel UC Berkeley Many slides over the course adapted from either Dan Klein, Stuart Russell or Andrew Moore 1 Announcements
More informationCOMP417 Introduction to Robotics and Intelligent Systems. Reinforcement Learning - 2
COMP417 Introduction to Robotics and Intelligent Systems Reinforcement Learning - 2 Speaker: Sandeep Manjanna Acklowledgement: These slides use material from Pieter Abbeel s, Dan Klein s and John Schulman
More information25 Increasing and Decreasing Functions
- 25 Increasing and Decreasing Functions It is useful in mathematics to define whether a function is increasing or decreasing. In this section we will use the differential of a function to determine this
More informationDynamic Programming and Reinforcement Learning
Dynamic Programming and Reinforcement Learning Daniel Russo Columbia Business School Decision Risk and Operations Division Fall, 2017 Daniel Russo (Columbia) Fall 2017 1 / 34 Supervised Machine Learning
More informationarxiv: v1 [math.pr] 6 Apr 2015
Analysis of the Optimal Resource Allocation for a Tandem Queueing System arxiv:1504.01248v1 [math.pr] 6 Apr 2015 Liu Zaiming, Chen Gang, Wu Jinbiao School of Mathematics and Statistics, Central South University,
More informationMultistage Stochastic Programming
IE 495 Lecture 21 Multistage Stochastic Programming Prof. Jeff Linderoth April 16, 2003 April 16, 2002 Stochastic Programming Lecture 21 Slide 1 Outline HW Fixes Multistage Stochastic Programming Modeling
More information16 MAKING SIMPLE DECISIONS
253 16 MAKING SIMPLE DECISIONS Let us associate each state S with a numeric utility U(S), which expresses the desirability of the state A nondeterministic action a will have possible outcome states Result(a)
More informationAN ALGORITHM FOR FINDING SHORTEST ROUTES FROM ALL SOURCE NODES TO A GIVEN DESTINATION IN GENERAL NETWORKS*
526 AN ALGORITHM FOR FINDING SHORTEST ROUTES FROM ALL SOURCE NODES TO A GIVEN DESTINATION IN GENERAL NETWORKS* By JIN Y. YEN (University of California, Berkeley) Summary. This paper presents an algorithm
More informationRevenue Management Under the Markov Chain Choice Model
Revenue Management Under the Markov Chain Choice Model Jacob B. Feldman School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853, USA jbf232@cornell.edu Huseyin
More informationCSEP 573: Artificial Intelligence
CSEP 573: Artificial Intelligence Markov Decision Processes (MDP)! Ali Farhadi Many slides over the course adapted from Luke Zettlemoyer, Dan Klein, Pieter Abbeel, Stuart Russell or Andrew Moore 1 Outline
More informationPrize offered for the solution of a dynamic blocking problem
Prize offered for the solution of a dynamic blocking problem Posted by A. Bressan on January 19, 2011 Statement of the problem Fire is initially burning on the unit disc in the plane IR 2, and propagateswith
More informationOnline Appendix. ( ) =max
Online Appendix O1. An extend model In the main text we solved a model where past dilemma decisions affect subsequent dilemma decisions but the DM does not take into account how her actions will affect
More informationCS 188 Fall Introduction to Artificial Intelligence Midterm 1. ˆ You have approximately 2 hours and 50 minutes.
CS 188 Fall 2013 Introduction to Artificial Intelligence Midterm 1 ˆ You have approximately 2 hours and 50 minutes. ˆ The exam is closed book, closed notes except your one-page crib sheet. ˆ Please use
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Markov Decision Processes II Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC
More informationUNIT 2. Greedy Method GENERAL METHOD
UNIT 2 GENERAL METHOD Greedy Method Greedy is the most straight forward design technique. Most of the problems have n inputs and require us to obtain a subset that satisfies some constraints. Any subset
More informationMS-E2114 Investment Science Exercise 10/2016, Solutions
A simple and versatile model of asset dynamics is the binomial lattice. In this model, the asset price is multiplied by either factor u (up) or d (down) in each period, according to probabilities p and
More informationTDT4171 Artificial Intelligence Methods
TDT47 Artificial Intelligence Methods Lecture 7 Making Complex Decisions Norwegian University of Science and Technology Helge Langseth IT-VEST 0 helgel@idi.ntnu.no TDT47 Artificial Intelligence Methods
More information4 Reinforcement Learning Basic Algorithms
Learning in Complex Systems Spring 2011 Lecture Notes Nahum Shimkin 4 Reinforcement Learning Basic Algorithms 4.1 Introduction RL methods essentially deal with the solution of (optimal) control problems
More informationStochastic Optimization
Stochastic Optimization Introduction and Examples Alireza Ghaffari-Hadigheh Azarbaijan Shahid Madani University (ASMU) hadigheha@azaruniv.edu Fall 2017 Alireza Ghaffari-Hadigheh (ASMU) Stochastic Optimization
More informationCSE 473: Artificial Intelligence
CSE 473: Artificial Intelligence Markov Decision Processes (MDPs) Luke Zettlemoyer Many slides over the course adapted from Dan Klein, Stuart Russell or Andrew Moore 1 Announcements PS2 online now Due
More informationLecture 7: Bayesian approach to MAB - Gittins index
Advanced Topics in Machine Learning and Algorithmic Game Theory Lecture 7: Bayesian approach to MAB - Gittins index Lecturer: Yishay Mansour Scribe: Mariano Schain 7.1 Introduction In the Bayesian approach
More informationOn Complexity of Multistage Stochastic Programs
On Complexity of Multistage Stochastic Programs Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA e-mail: ashapiro@isye.gatech.edu
More informationOptimal Long-Term Supply Contracts with Asymmetric Demand Information. Appendix
Optimal Long-Term Supply Contracts with Asymmetric Demand Information Ilan Lobel Appendix Wenqiang iao {ilobel, wxiao}@stern.nyu.edu Stern School of Business, New York University Appendix A: Proofs Proof
More informationDM559/DM545 Linear and integer programming
Department of Mathematics and Computer Science University of Southern Denmark, Odense May 22, 2018 Marco Chiarandini DM559/DM55 Linear and integer programming Sheet, Spring 2018 [pdf format] Contains Solutions!
More informationCS885 Reinforcement Learning Lecture 3b: May 9, 2018
CS885 Reinforcement Learning Lecture 3b: May 9, 2018 Intro to Reinforcement Learning [SutBar] Sec. 5.1-5.3, 6.1-6.3, 6.5, [Sze] Sec. 3.1, 4.3, [SigBuf] Sec. 2.1-2.5, [RusNor] Sec. 21.1-21.3, CS885 Spring
More informationScenario reduction and scenario tree construction for power management problems
Scenario reduction and scenario tree construction for power management problems N. Gröwe-Kuska, H. Heitsch and W. Römisch Humboldt-University Berlin Institute of Mathematics Page 1 of 20 IEEE Bologna POWER
More informationDifferential Geometry: Curvature, Maps, and Pizza
Differential Geometry: Curvature, Maps, and Pizza Madelyne Ventura University of Maryland December 8th, 2015 Madelyne Ventura (University of Maryland) Curvature, Maps, and Pizza December 8th, 2015 1 /
More informationConsumption and Portfolio Choice under Uncertainty
Chapter 8 Consumption and Portfolio Choice under Uncertainty In this chapter we examine dynamic models of consumer choice under uncertainty. We continue, as in the Ramsey model, to take the decision of
More informationReinforcement Learning and Optimal Control. Chapter 1 Exact Dynamic Programming DRAFT
Reinforcement Learning and Optimal Control by Dimitri P. Bertsekas Massachusetts Institute of Technology Chapter 1 Exact Dynamic Programming DRAFT This is Chapter 1 of the draft textbook Reinforcement
More informationPricing Kernel. v,x = p,y = p,ax, so p is a stochastic discount factor. One refers to p as the pricing kernel.
Payoff Space The set of possible payoffs is the range R(A). This payoff space is a subspace of the state space and is a Euclidean space in its own right. 1 Pricing Kernel By the law of one price, two portfolios
More informationuseful than solving these yourself, writing up your solution and then either comparing your
CSE 441T/541T: Advanced Algorithms Fall Semester, 2003 September 9, 2004 Practice Problems Solutions Here are the solutions for the practice problems. However, reading these is far less useful than solving
More informationMarkov Decision Process
Markov Decision Process Human-aware Robotics 2018/02/13 Chapter 17.3 in R&N 3rd Ø Announcement: q Slides for this lecture are here: http://www.public.asu.edu/~yzhan442/teaching/cse471/lectures/mdp-ii.pdf
More informationLecture 2: Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and
Lecture 2: Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and Constrained Optimization The marginal or derivative function and optimization-basic principles The average function
More informationDynamic Portfolio Execution Detailed Proofs
Dynamic Portfolio Execution Detailed Proofs Gerry Tsoukalas, Jiang Wang, Kay Giesecke March 16, 2014 1 Proofs Lemma 1 (Temporary Price Impact) A buy order of size x being executed against i s ask-side
More informationIEOR 3106: Introduction to Operations Research: Stochastic Models SOLUTIONS to Final Exam, Sunday, December 16, 2012
IEOR 306: Introduction to Operations Research: Stochastic Models SOLUTIONS to Final Exam, Sunday, December 6, 202 Four problems, each with multiple parts. Maximum score 00 (+3 bonus) = 3. You need to show
More informationInterpolation. 1 What is interpolation? 2 Why are we interested in this?
Interpolation 1 What is interpolation? For a certain function f (x we know only the values y 1 = f (x 1,,y n = f (x n For a point x different from x 1,,x n we would then like to approximate f ( x using
More informationIdentification and Estimation of Dynamic Games when Players Belief Are Not in Equilibrium
Identification and Estimation of Dynamic Games when Players Belief Are Not in Equilibrium A Short Review of Aguirregabiria and Magesan (2010) January 25, 2012 1 / 18 Dynamics of the game Two players, {i,
More informationHomework solutions, Chapter 8
Homework solutions, Chapter 8 NOTE: We might think of 8.1 as being a section devoted to setting up the networks and 8.2 as solving them, but only 8.2 has a homework section. Section 8.2 2. Use Dijkstra
More information