3. The Dynamic Programming Algorithm (cont'd)
Last lecture we introduced the DPA. In this lecture, we first apply the DPA to the chess match example, and then show how to deal with problems that do not match the standard form outlined in Section 2.1.

Example 1: Chess match strategy (revisited)

Consider a two-game chess match with an opponent. Our objective is to develop a strategy that maximizes the chance of winning the match. Each game can have one of two outcomes:

1) Win/Lose: 1 point for the winner, 0 for the loser;
2) Draw: 0.5 points for each player.

In addition, if at the end of two games the score is equal, the players keep playing new games until one wins, and thereby wins the match (also known as sudden death).

There are two possible playing styles for our player: timid and bold. When playing timid, our player draws with probability p_d and loses with probability (1 - p_d). When playing bold, our player wins with probability p_w and loses with probability (1 - p_w). We also assume that p_d > p_w, a necessary condition for this problem to make sense. We want to find a control policy that maximizes the probability of winning the match. We will solve this using the DPA (replacing min with max).

The state x_k is the difference between our player's score and the opponent's score at the end of game k. That is, x_0 = 0, x_1 in S_1 = {-1, 0, 1}, x_2 in S_2 = {-2, -1, 0, 1, 2}. The control inputs u_k are the two playing styles, that is, u_k in U = {timid, bold}.

Dynamics: model as a finite state system using transition probabilities (see Section 1.3): x_{k+1} = w_k, k = 0, 1, where

  Pr(w_k = x_k     | u_k = timid) = p_d
  Pr(w_k = x_k - 1 | u_k = timid) = 1 - p_d
  Pr(w_k = x_k + 1 | u_k = bold)  = p_w
  Pr(w_k = x_k - 1 | u_k = bold)  = 1 - p_w

with p_d > p_w.

Date compiled: October 6,
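The transition model above can be encoded directly as a small lookup; a minimal sketch (the function name and the dictionary representation of the distribution are mine, not from the notes):

```python
def transition(x, u, p_d, p_w):
    """Distribution of w_k (next score difference) given x_k = x and style u.

    Returns a dict {next_state: probability}, mirroring the four
    transition probabilities of the chess-match model.
    """
    if u == "timid":
        return {x: p_d, x - 1: 1.0 - p_d}       # draw keeps the gap, loss drops it
    elif u == "bold":
        return {x + 1: p_w, x - 1: 1.0 - p_w}   # win raises the gap, loss drops it
    raise ValueError("u must be 'timid' or 'bold'")
```

Each returned distribution sums to one, which makes for a quick sanity check of the model.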
Cost: we want to maximize the probability of winning. This is equivalent to the standard form

  E[ g_2(x_2) + sum_{k=0}^{1} g_k(x_k, u_k, w_k) ]

where

  g_k(x_k, u_k, w_k) = 0,  k in {0, 1}

  g_2(x_2) = 1    if x_2 > 0
             p_w  if x_2 = 0
             0    if x_2 < 0

To see that the expected cost is equal to the probability of winning P_win, let q_+ := Pr(x_2 > 0), q_0 := Pr(x_2 = 0), q_- := Pr(x_2 < 0). The probability of winning is P_win = q_+ + q_0 p_w, and the expected value of the cost is

  E[g_2(x_2)] = q_+ * 1 + q_0 * p_w + q_- * 0 = q_+ + q_0 p_w.

Now apply the DPA:

Initialization:

  J_2(x) = 1    if x > 0
           p_w  if x = 0
           0    if x < 0

Recursion:

  J_k(x) = max_{u in U} E_{w_k | x_k = x, u_k = u}[ g_k(x_k, u_k, w_k) + J_{k+1}(x_{k+1}) ],  x in S_k, k in {0, 1}
         = max_{u in U} E_{w_k | x_k = x, u_k = u}[ J_{k+1}(w_k) ]
         = max{ p_d J_{k+1}(x) + (1 - p_d) J_{k+1}(x - 1),  p_w J_{k+1}(x + 1) + (1 - p_w) J_{k+1}(x - 1) }

Henceforth the first entry of the maximum will denote the cost associated with timid play and the second with bold play.

k = 1:

  J_1(x) = max{ p_d J_2(x) + (1 - p_d) J_2(x - 1),  p_w J_2(x + 1) + (1 - p_w) J_2(x - 1) }

x_1 = 1:

  J_1(1) = max{ p_d + (1 - p_d) p_w,  p_w + (1 - p_w) p_w }

Comparing the two entries yields

  (p_d + (1 - p_d) p_w) - (p_w + (1 - p_w) p_w) = (p_d - p_w)(1 - p_w) > 0  (since p_d > p_w).

Therefore µ_1(1) = timid and J_1(1) = p_d + (1 - p_d) p_w.
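The hand computation of the remaining cases can be cross-checked numerically. A sketch of the complete backward pass (function and variable names are mine; since p_d > p_w makes all comparisons strict here, the tie-breaking direction never matters):

```python
def solve_match(p_d, p_w):
    """Backward DPA for the two-game chess match, maximizing win probability."""
    # Initialization: J_2(x) = 1 if x > 0, p_w if x = 0 (sudden death), 0 if x < 0.
    J = {x: 1.0 if x > 0 else (p_w if x == 0 else 0.0) for x in range(-2, 3)}
    policy = {}
    for k in (1, 0):
        J_prev = {}
        for x in range(-k, k + 1):  # reachable states S_k
            timid = p_d * J[x] + (1.0 - p_d) * J[x - 1]
            bold = p_w * J[x + 1] + (1.0 - p_w) * J[x - 1]
            policy[(k, x)] = "timid" if timid > bold else "bold"
            J_prev[x] = max(timid, bold)
        J = J_prev
    return J[0], policy
```

With, e.g., p_d = 0.9 and p_w = 0.45 this reproduces timid play at (k, x) = (1, 1), bold play everywhere else, and J_0(0) = p_d p_w + (1 - p_d) p_w^2 + (1 - p_w) p_w^2.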
x_1 = 0:

  J_1(0) = max{ p_d p_w + (1 - p_d) * 0,  p_w + (1 - p_w) * 0 } = max{ p_d p_w, p_w }

Therefore µ_1(0) = bold and J_1(0) = p_w.

x_1 = -1:

  J_1(-1) = max{ p_d * 0 + (1 - p_d) * 0,  p_w p_w + (1 - p_w) * 0 } = max{ 0, p_w^2 }

Therefore µ_1(-1) = bold and J_1(-1) = p_w^2.

k = 0:

  J_0(x) = max{ p_d J_1(x) + (1 - p_d) J_1(x - 1),  p_w J_1(x + 1) + (1 - p_w) J_1(x - 1) }

x_0 = 0:

  J_0(0) = max{ p_d J_1(0) + (1 - p_d) J_1(-1),  p_w J_1(1) + (1 - p_w) J_1(-1) }
         = max{ p_d p_w + (1 - p_d) p_w^2,  p_w (p_d + (1 - p_d) p_w) + (1 - p_w) p_w^2 }
         = max{ p_d p_w + (1 - p_d) p_w^2,  p_d p_w + (1 - p_d) p_w^2 + (1 - p_w) p_w^2 }

Therefore µ_0(0) = bold and J_0(0) = p_d p_w + (1 - p_d) p_w^2 + (1 - p_w) p_w^2.

Optimal match strategy: play timid if and only if ahead in the score.

3.1 Converting non-standard problems to the standard form

At first glance, the class of systems in our standard problem formulation of Section 1.1 may seem limiting, but it is in fact general enough to handle other types of problems via state augmentation. We include some examples below.

3.1.1 Time Lags

Assume the dynamics have the following form:

  x_{k+1} = f_k(x_k, x_{k-1}, u_k, u_{k-1}, w_k)

Let y_k := x_{k-1}, s_k := u_{k-1}, and define the augmented state vector x̃_k := (x_k, y_k, s_k). The dynamics of the augmented state then become

  x̃_{k+1} = (x_{k+1}, y_{k+1}, s_{k+1}) = (f_k(x_k, y_k, u_k, s_k, w_k), x_k, u_k) =: f̃_k(x̃_k, u_k, w_k)

which now matches the standard form. Note that this procedure works for an arbitrary number of time lags.
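The time-lag augmentation can be expressed as a generic wrapper around any lagged dynamics; a sketch (the wrapper name and the tuple encoding of x̃_k are my choices, not from the notes):

```python
def augment_time_lag(f_k):
    """Wrap dynamics x_{k+1} = f_k(x_k, x_{k-1}, u_k, u_{k-1}, w_k) into
    standard form on the augmented state x~_k = (x_k, y_k, s_k),
    where y_k = x_{k-1} and s_k = u_{k-1}."""
    def f_aug(x_aug, u, w):
        x, y, s = x_aug                       # (x_k, x_{k-1}, u_{k-1})
        return (f_k(x, y, u, s, w), x, u)     # (x_{k+1}, y_{k+1}, s_{k+1})
    return f_aug
```

For example, the scalar lagged dynamics x_{k+1} = x_k + 0.5 x_{k-1} + u_k + w_k become a standard-form map on triples, with the wrapper shifting the current state and input into the lag slots.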
3.1.2 Correlated Disturbances

Disturbances w_k that are correlated across time (colored noise) can commonly be modeled as the output of a linear system driven by independent random variables as follows:

  w_k = C_k y_{k+1}
  y_{k+1} = A_k y_k + ξ_k

where A_k, C_k are given and ξ_k, k = 0, ..., N - 1, are independent random variables. Let the augmented state vector x̃_k := (x_k, y_k). Note that now y_k must be observed at time k, which can be done using a state estimator. The dynamics of the augmented state then become

  x̃_{k+1} = (x_{k+1}, y_{k+1}) = (f_k(x_k, u_k, C_k(A_k y_k + ξ_k)), A_k y_k + ξ_k) =: f̃_k(x̃_k, u_k, ξ_k)

which now matches the standard form.

3.1.3 Forecasts

Consider the case where at each time period we have access to a forecast that reveals the probability distribution of w_k, and possibly of future disturbances. For example, assume that w_k is independent of x_k and u_k. At the beginning of each period k, we receive a prediction y_k (forecast) that w_k will attain a probability distribution out of a given finite collection of distributions

  { p_{w_k | y_k}(· | 1), p_{w_k | y_k}(· | 2), ..., p_{w_k | y_k}(· | m) }.

In particular, we receive a forecast that y_k = i, and thus p_{w_k | y_k}(· | i) is used to generate w_k. Furthermore, the forecast itself has a given a-priori probability distribution, namely y_{k+1} = ξ_k, where the ξ_k are independent random variables taking value i in {1, 2, ..., m} with probability p_{ξ_k}(i).

Let the augmented state vector x̃_k := (x_k, y_k). Since the forecast y_k is known at time k, we still have perfect state information. We define our new disturbance as w̃_k := (w_k, ξ_k), with probability distribution

  p(w̃_k | x̃_k, u_k) = p(w_k, ξ_k | x_k, y_k, u_k)
                     = p(w_k | x_k, y_k, u_k, ξ_k) p(ξ_k | x_k, y_k, u_k)
                     = p(w_k | y_k) p(ξ_k).

Note that w_k depends only on x̃_k (in particular on y_k), and ξ_k does not depend on anything. The dynamics therefore become

  x̃_{k+1} = (x_{k+1}, y_{k+1}) = (f_k(x_k, u_k, w_k), ξ_k) =: f̃_k(x̃_k, u_k, w̃_k)

which now matches the standard form. The associated DPA becomes:
Initialization:

  J_N(x̃) = J_N(x, y) = g_N(x),  x in S_N, y in {1, ..., m}

Recursion:

  J_k(x̃) = J_k(x, y)
          = min_{u in U_k(x)} E_{w̃_k | x̃_k = (x, y), u_k = u}[ g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k), ξ_k) ]
          = min_{u in U_k(x)} E_{w_k | y_k = y} E_{ξ_k}[ g_k(x, u, w_k) + J_{k+1}(f_k(x, u, w_k), ξ_k) ]      (1)
          = min_{u in U_k(x)} E_{w_k | y_k = y}[ g_k(x, u, w_k) + E_{ξ_k}[ J_{k+1}(f_k(x, u, w_k), ξ_k) ] ]   (2)
          = min_{u in U_k(x)} E_{w_k | y_k = y}[ g_k(x, u, w_k) + sum_{i=1}^{m} p_{ξ_k}(i) J_{k+1}(f_k(x, u, w_k), i) ],

  x in S_k, y in {1, ..., m}, k = N - 1, ..., 0

Steps:

1. Using p(w̃_k | x̃_k, u_k) = p(w_k | y_k) p(ξ_k).
2. Since g_k(x_k, u_k, w_k) is not a function of the random variable ξ_k.

3.1.4 The Curse of Dimensionality

The above-mentioned conversions come at the price of increased computational complexity: the augmented state space increases the computational burden exponentially. This is sometimes known as the curse of dimensionality.
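One backward step of the forecast-augmented recursion can be sketched directly from its final form, J_k(x, y) = min over u of the expectation over w given y of the stage cost plus the ξ-averaged cost-to-go (all names are mine; small finite sets stand in for U_k, the disturbance support, and the forecast set):

```python
def forecast_dp_step(J_next, U, W, g, f, p_w_given_y, p_xi):
    """One backward step of the forecast DPA:
    J_k(x, y) = min_u sum_w p(w|y) [ g(x,u,w) + sum_i p_xi[i] J_next(f(x,u,w), i) ].
    """
    def J_k(x, y):
        best = float("inf")
        for u in U:
            total = 0.0
            for w in W:
                # Inner expectation over the next forecast xi_k.
                cont = sum(p_xi[i] * J_next(f(x, u, w), i) for i in p_xi)
                total += p_w_given_y(w, y) * (g(x, u, w) + cont)
            best = min(best, total)
        return best
    return J_k
```

As a toy check: with zero stage cost, dynamics f(x, u, w) = x + u + w, and J_next(x, i) = x independent of the forecast, the step reduces to min_u (x + u + E[w | y]).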