3. The Dynamic Programming Algorithm (cont'd)


Last lecture we introduced the DPA. In this lecture, we first apply the DPA to the chess match example, and then show how to deal with problems that do not match the standard form outlined in Section 2.1.

Example 1: Chess match strategy (revisited)

Consider a two-game chess match with an opponent. Our objective is to develop a strategy that maximizes the chance of winning the match. Each game can have one of two outcomes:

1) Win/Lose: 1 point for the winner, 0 for the loser;
2) Draw: 0.5 points for each player.

In addition, if at the end of two games the score is equal, the players keep on playing new games until one wins, and thereby wins the match (also known as sudden death).

There are two possible playing styles for our player: timid and bold. When playing timid, our player draws with probability p_d and loses with probability (1 - p_d). When playing bold, our player wins with probability p_w and loses with probability (1 - p_w). We also assume that p_d > p_w, a necessary condition for this problem to make sense.

We want to find a control policy that maximizes the probability of winning the match. We will solve this using the DPA (replacing min with max).

The state x_k is the difference between our player's score and the opponent's score at the end of game k. That is, x_0 = 0, x_1 ∈ S_1 = {-1, 0, 1}, x_2 ∈ S_2 = {-2, -1, 0, 1, 2}.

The control inputs u_k are the two playing styles, that is, u_k ∈ U = {timid, bold}.

Dynamics: model as a finite state system using transition probabilities (see Section 1.3): x_{k+1} = w_k, k = 0, 1, where

Pr(w_k = x_k     | u_k = timid) = p_d
Pr(w_k = x_k - 1 | u_k = timid) = 1 - p_d
Pr(w_k = x_k + 1 | u_k = bold)  = p_w
Pr(w_k = x_k - 1 | u_k = bold)  = 1 - p_w

with p_d > p_w.

Date compiled: October 6,

Cost: we want to maximize the probability of winning. This is equivalent to the standard form cost

E[ g_2(x_2) + Σ_{k=0}^{1} g_k(x_k, u_k, w_k) ],  where

g_k(x_k, u_k, w_k) = 0,  k ∈ {0, 1}

g_2(x_2) = 1    if x_2 > 0
           p_w  if x_2 = 0
           0    if x_2 < 0.

To see that the expected cost is equal to the probability of winning P_win, let

q_+ := Pr(x_2 > 0),  q_0 := Pr(x_2 = 0),  q_- := Pr(x_2 < 0).

The probability of winning is P_win = q_+ + q_0 p_w (if the score is level after two games, the match goes to sudden death, which our player wins with probability p_w by playing bold), and the expected value of the cost is

E[g_2(x_2)] = q_+ · 1 + q_0 · p_w + q_- · 0 = q_+ + q_0 p_w.

Now apply the DPA.

Initialization:

J_2(x) = 1    if x > 0
         p_w  if x = 0
         0    if x < 0.

Recursion: for x ∈ S_k, k ∈ {0, 1},

J_k(x) = max_{u ∈ U} E_{w_k | x_k = x, u_k = u} [ g_k(x_k, u_k, w_k) + J_{k+1}(x_{k+1}) ]
       = max_{u ∈ U} E_{w_k | x_k = x, u_k = u} [ J_{k+1}(w_k) ]
       = max { p_d J_{k+1}(x) + (1 - p_d) J_{k+1}(x - 1),  p_w J_{k+1}(x + 1) + (1 - p_w) J_{k+1}(x - 1) }.

Henceforth the first entry of the maximum will denote the cost associated with timid play and the second with bold play.

k = 1:

J_1(x) = max { p_d J_2(x) + (1 - p_d) J_2(x - 1),  p_w J_2(x + 1) + (1 - p_w) J_2(x - 1) }

x_1 = 1:

J_1(1) = max { p_d + (1 - p_d) p_w,  p_w + (1 - p_w) p_w }

Comparing the two entries yields:

(p_d + (1 - p_d) p_w) - (p_w + (1 - p_w) p_w) = (p_d - p_w)(1 - p_w) > 0   (since p_d > p_w)

Therefore µ_1*(1) = timid and J_1(1) = p_d + (1 - p_d) p_w.
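The backward recursion can also be checked numerically. A minimal sketch, assuming the illustrative values p_d = 0.9 and p_w = 0.45 (these particular numbers are not from the notes; only p_d > p_w is required):

```python
# Backward DP for the two-game chess match (max form).
# p_d, p_w are illustrative assumptions; the notes only require p_d > p_w.
p_d, p_w = 0.9, 0.45

# Terminal cost J_2(x): probability of winning given final score difference x.
J = {2: {x: 1.0 if x > 0 else (p_w if x == 0 else 0.0) for x in range(-2, 3)}}
policy = {}

states = {0: [0], 1: [-1, 0, 1]}  # S_0 and S_1
for k in (1, 0):
    J[k], policy[k] = {}, {}
    for x in states[k]:
        # First entry of the max: timid play; second entry: bold play.
        timid = p_d * J[k + 1][x] + (1 - p_d) * J[k + 1][x - 1]
        bold = p_w * J[k + 1][x + 1] + (1 - p_w) * J[k + 1][x - 1]
        J[k][x] = max(timid, bold)
        policy[k][x] = "timid" if timid > bold else "bold"

print(policy[1], policy[0], J[0][0])
```

For these values the computed policy is timid at x = 1 and bold otherwise, matching the hand calculation.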

x_1 = 0:

J_1(0) = max { p_d p_w + (1 - p_d) · 0,  p_w · 1 + (1 - p_w) · 0 } = max { p_d p_w, p_w }

Therefore µ_1*(0) = bold and J_1(0) = p_w.

x_1 = -1:

J_1(-1) = max { p_d · 0 + (1 - p_d) · 0,  p_w p_w + (1 - p_w) · 0 } = max { 0, p_w^2 }

Therefore µ_1*(-1) = bold and J_1(-1) = p_w^2.

k = 0:

J_0(x) = max { p_d J_1(x) + (1 - p_d) J_1(x - 1),  p_w J_1(x + 1) + (1 - p_w) J_1(x - 1) }

x_0 = 0:

J_0(0) = max { p_d J_1(0) + (1 - p_d) J_1(-1),  p_w J_1(1) + (1 - p_w) J_1(-1) }
       = max { p_d p_w + (1 - p_d) p_w^2,  p_w (p_d + (1 - p_d) p_w) + (1 - p_w) p_w^2 }
       = max { p_d p_w + (1 - p_d) p_w^2,  p_d p_w + (1 - p_d) p_w^2 + (1 - p_w) p_w^2 }

Therefore µ_0*(0) = bold and J_0(0) = p_d p_w + (1 - p_d) p_w^2 + (1 - p_w) p_w^2.

Optimal match strategy: Play timid iff ahead in the score.

3.1 Converting non-standard problems to the standard form

At first glance, the class of systems in our standard problem formulation in Section 1.1 may seem limiting, but it is in fact general enough to handle other types of problems via state augmentation. We include some examples below.

3.1.1 Time Lags

Assume the dynamics have the following form:

x_{k+1} = f_k(x_k, x_{k-1}, u_k, u_{k-1}, w_k)

Let y_k := x_{k-1}, s_k := u_{k-1}, and define the augmented state vector x̃_k := (x_k, y_k, s_k). The dynamics of the augmented state then become

x̃_{k+1} = (x_{k+1}, y_{k+1}, s_{k+1}) = (f_k(x_k, y_k, u_k, s_k, w_k), x_k, u_k) =: f̃_k(x̃_k, u_k, w_k),

which now matches the standard form. Note that this procedure works for an arbitrary number of time lags.
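The time-lag augmentation can be sketched in a few lines of code. The dynamics function f below is a hypothetical one-lag example (its coefficients are assumptions, not from the notes); the point is only that the augmented map f̃ reproduces the lagged recursion while depending on the current augmented state alone:

```python
# Wrap a one-step time-lag system into standard (memoryless) form
# via state augmentation. f is a hypothetical lagged dynamics function.
def f(x, x_prev, u, u_prev, w):
    return 0.5 * x + 0.2 * x_prev + u + 0.1 * u_prev + w

def f_aug(x_aug, u, w):
    # Augmented state (x_k, y_k, s_k) = (x_k, x_{k-1}, u_{k-1}).
    x, y, s = x_aug
    return (f(x, y, u, s, w), x, u)

# Simulate: the augmented trajectory must reproduce the lagged recursion.
x_prev, x, u_prev = 0.0, 1.0, 0.0
x_aug = (x, x_prev, u_prev)
for u, w in [(1.0, 0.0), (-0.5, 0.1), (0.2, -0.1)]:
    x_next = f(x, x_prev, u, u_prev, w)   # original lagged update
    x_aug = f_aug(x_aug, u, w)            # standard-form update
    assert abs(x_aug[0] - x_next) < 1e-12
    x_prev, x, u_prev = x, x_next, u
```

The first component of the augmented state tracks x_k exactly, so any DP machinery for the standard form applies unchanged.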

3.1.2 Correlated Disturbances

Disturbances w_k that are correlated across time (colored noise) can commonly be modeled as the output of a linear system driven by independent random variables as follows:

w_k = C_k y_{k+1}
y_{k+1} = A_k y_k + ξ_k

where A_k, C_k are given and ξ_k, k = 0, ..., N - 1, are independent random variables. Let the augmented state vector be x̃_k := (x_k, y_k). Note that now y_k must be observed at time k, which can be done using a state estimator. The dynamics of the augmented state then become

x̃_{k+1} = (x_{k+1}, y_{k+1}) = (f_k(x_k, u_k, C_k(A_k y_k + ξ_k)), A_k y_k + ξ_k) =: f̃_k(x̃_k, u_k, ξ_k),

which now matches the standard form.

3.1.3 Forecasts

Consider the case where at each time period we have access to a forecast that reveals the probability distribution of w_k, and possibly of future disturbances. For example, assume that w_k is independent of x_k and u_k. At the beginning of each period k, we receive a prediction y_k (forecast) that w_k will attain a probability distribution out of a given finite collection of distributions

{ p_{w_k | y_k}(· | 1), p_{w_k | y_k}(· | 2), ..., p_{w_k | y_k}(· | m) }.

In particular, we receive a forecast that y_k = i, and thus p_{w_k | y_k}(· | i) is used to generate w_k. Furthermore, the forecast itself has a given a-priori probability distribution, namely y_{k+1} = ξ_k, where the ξ_k are independent random variables taking value i ∈ {1, 2, ..., m} with probability p_{ξ_k}(i).

Let the augmented state vector be x̃_k := (x_k, y_k). Since the forecast y_k is known at time k, we still have perfect state information. We define our new disturbance as w̃_k := (w_k, ξ_k), with probability distribution

p(w̃_k | x̃_k, u_k) = p(w_k, ξ_k | x_k, y_k, u_k) = p(w_k | x_k, y_k, u_k, ξ_k) p(ξ_k | x_k, y_k, u_k) = p(w_k | y_k) p(ξ_k).

Note that w_k depends only on x̃_k (in particular y_k), and ξ_k does not depend on anything. The dynamics therefore become

x̃_{k+1} = (x_{k+1}, y_{k+1}) = (f_k(x_k, u_k, w_k), ξ_k) =: f̃_k(x̃_k, u_k, w̃_k),

which now matches the standard form. The associated DPA becomes:

Initialization:

J̃_N(x̃) = J̃_N(x, y) = g_N(x),  x ∈ S_N, y ∈ {1, ..., m}

Recursion: for x ∈ S_k, y ∈ {1, ..., m}, k = N - 1, ..., 0,

J̃_k(x̃) = J̃_k(x, y)
  = min_{u ∈ U_k(x)} E_{w̃_k | x̃_k = x̃, u_k = u} [ g_k(x_k, u_k, w_k) + J̃_{k+1}(f_k(x_k, u_k, w_k), ξ_k) ]
  = min_{u ∈ U_k(x)} E_{w_k | y_k = y} E_{ξ_k} [ g_k(x, u, w_k) + J̃_{k+1}(f_k(x, u, w_k), ξ_k) ]          (Step 1)
  = min_{u ∈ U_k(x)} E_{w_k | y_k = y} [ g_k(x, u, w_k) + E_{ξ_k} [ J̃_{k+1}(f_k(x, u, w_k), ξ_k) ] ]      (Step 2)
  = min_{u ∈ U_k(x)} E_{w_k | y_k = y} [ g_k(x, u, w_k) + Σ_{i=1}^{m} p_{ξ_k}(i) J̃_{k+1}(f_k(x, u, w_k), i) ].

Steps:
1. Using p(w̃_k | x̃_k, u_k) = p(w_k | y_k) p(ξ_k).
2. Since g_k(x_k, u_k, w_k) is not a function of the random variable ξ_k.

The Curse of Dimensionality

The above-mentioned conversions come with the price of increased computational complexity: the augmented state space increases the computational burden exponentially. This is sometimes known as the curse of dimensionality.
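One stage of the forecast-augmented recursion can be sketched as code. Everything below (state space, controls, costs, distributions) is an illustrative toy, not from the notes; note how the loops range over the augmented state (x, y), the disturbances w, and the forecast values i, which is exactly the extra burden the curse of dimensionality refers to:

```python
# One stage of the forecast-augmented DPA (min form):
#   J_k(x, y) = min_u E_{w|y}[ g(x,u,w) + sum_i p_xi(i) * J_{k+1}(f(x,u,w), i) ]
# All model data below are illustrative assumptions.

def dp_step(states, controls, W, p_w_given_y, p_xi, f, g, J_next):
    J, mu = {}, {}
    for x in states:
        for y in p_w_given_y:          # forecast part of the augmented state
            best_cost, best_u = float("inf"), None
            for u in controls:
                cost = sum(
                    p_w_given_y[y][w]
                    * (g(x, u, w)
                       + sum(p_xi[i] * J_next[f(x, u, w), i] for i in p_xi))
                    for w in W
                )
                if cost < best_cost:
                    best_cost, best_u = cost, u
            J[x, y], mu[x, y] = best_cost, best_u
    return J, mu

# Toy instance: binary state/control/disturbance, m = 2 forecast values.
states, controls, W = (0, 1), (0, 1), (0, 1)
p_w_given_y = {1: {0: 0.9, 1: 0.1}, 2: {0: 0.2, 1: 0.8}}  # p(w | y)
p_xi = {1: 0.5, 2: 0.5}                                   # p_xi(i)
f = lambda x, u, w: (x + u + w) % 2
g = lambda x, u, w: u + (x != w)
J_N = {(x, i): x for x in states for i in p_xi}           # g_N(x) = x
J, mu = dp_step(states, controls, W, p_w_given_y, p_xi, f, g, J_N)
```

A full solver would call dp_step N times, with a table of size |S_k| · m at each stage; augmenting the state with the forecast multiplies the work per stage by m.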


More information

Lecture 17: More on Markov Decision Processes. Reinforcement learning

Lecture 17: More on Markov Decision Processes. Reinforcement learning Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture

More information

Financial Risk Management

Financial Risk Management Financial Risk Management Professor: Thierry Roncalli Evry University Assistant: Enareta Kurtbegu Evry University Tutorial exercices #4 1 Correlation and copulas 1. The bivariate Gaussian copula is given

More information

CUR 412: Game Theory and its Applications, Lecture 4

CUR 412: Game Theory and its Applications, Lecture 4 CUR 412: Game Theory and its Applications, Lecture 4 Prof. Ronaldo CARPIO March 22, 2015 Homework #1 Homework #1 will be due at the end of class today. Please check the website later today for the solutions

More information

ORF 307: Lecture 12. Linear Programming: Chapter 11: Game Theory

ORF 307: Lecture 12. Linear Programming: Chapter 11: Game Theory ORF 307: Lecture 12 Linear Programming: Chapter 11: Game Theory Robert J. Vanderbei April 3, 2018 Slides last edited on April 3, 2018 http://www.princeton.edu/ rvdb Game Theory John Nash = A Beautiful

More information

MATH3075/3975 FINANCIAL MATHEMATICS TUTORIAL PROBLEMS

MATH3075/3975 FINANCIAL MATHEMATICS TUTORIAL PROBLEMS MATH307/37 FINANCIAL MATHEMATICS TUTORIAL PROBLEMS School of Mathematics and Statistics Semester, 04 Tutorial problems should be used to test your mathematical skills and understanding of the lecture material.

More information

Approximation of Continuous-State Scenario Processes in Multi-Stage Stochastic Optimization and its Applications

Approximation of Continuous-State Scenario Processes in Multi-Stage Stochastic Optimization and its Applications Approximation of Continuous-State Scenario Processes in Multi-Stage Stochastic Optimization and its Applications Anna Timonina University of Vienna, Abraham Wald PhD Program in Statistics and Operations

More information

The Simple Random Walk

The Simple Random Walk Chapter 8 The Simple Random Walk In this chapter we consider a classic and fundamental problem in random processes; the simple random walk in one dimension. Suppose a walker chooses a starting point on

More information

Yao s Minimax Principle

Yao s Minimax Principle Complexity of algorithms The complexity of an algorithm is usually measured with respect to the size of the input, where size may for example refer to the length of a binary word describing the input,

More information

Computing Optimal Randomized Resource Allocations for Massive Security Games

Computing Optimal Randomized Resource Allocations for Massive Security Games Computing Optimal Randomized Resource Allocations for Massive Security Games Christopher Kiekintveld, Manish Jain, Jason Tsai, James Pita, Fernando Ordonez, Milind Tambe The Problem The LAX canine problems

More information

ECE 586BH: Problem Set 5: Problems and Solutions Multistage games, including repeated games, with observed moves

ECE 586BH: Problem Set 5: Problems and Solutions Multistage games, including repeated games, with observed moves University of Illinois Spring 01 ECE 586BH: Problem Set 5: Problems and Solutions Multistage games, including repeated games, with observed moves Due: Reading: Thursday, April 11 at beginning of class

More information

Introduction to Artificial Intelligence Spring 2019 Note 2

Introduction to Artificial Intelligence Spring 2019 Note 2 CS 188 Introduction to Artificial Intelligence Spring 2019 Note 2 These lecture notes are heavily based on notes originally written by Nikhil Sharma. Games In the first note, we talked about search problems

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Markov Decision Processes II Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC

More information

A Dynamic Model of Mixed Duopolistic Competition: Open Source vs. Proprietary Innovation

A Dynamic Model of Mixed Duopolistic Competition: Open Source vs. Proprietary Innovation A Dynamic Model of Mixed Duopolistic Competition: Open Source vs. Proprietary Innovation Suat Akbulut Murat Yılmaz August 015 Abstract Open source softare development has been an interesting investment

More information

The Cagan Model. Lecture 15 by John Kennes March 25

The Cagan Model. Lecture 15 by John Kennes March 25 The Cagan Model Lecture 15 by John Kennes March 25 The Cagan Model Let M denote a country s money supply and P its price level. Higher expected inflation lowers the demand for real balances M/P by raising

More information

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence CS 188: Artificial Intelligence Markov Decision Processes Dan Klein, Pieter Abbeel University of California, Berkeley Non-Deterministic Search 1 Example: Grid World A maze-like problem The agent lives

More information

Markov Decision Processes

Markov Decision Processes Markov Decision Processes Robert Platt Northeastern University Some images and slides are used from: 1. CS188 UC Berkeley 2. AIMA 3. Chris Amato Stochastic domains So far, we have studied search Can use

More information

Lecture 8: Linear Prediction: Lattice filters

Lecture 8: Linear Prediction: Lattice filters 1 Lecture 8: Linear Prediction: Lattice filters Overview New AR parametrization: Reflection coefficients; Fast computation of prediction errors; Direct and Inverse Lattice filters; Burg lattice parameter

More information

Problem set #2. Martin Ellison MPhil Macroeconomics, University of Oxford. The questions marked with an * should be handed in. max log (1) s.t.

Problem set #2. Martin Ellison MPhil Macroeconomics, University of Oxford. The questions marked with an * should be handed in. max log (1) s.t. Problem set #2 Martin Ellison MPhil Macroeconomics, University of Oxford The questions marked with an * should be handed in 1 A representative household model 1. A representative household consists of

More information

PORTFOLIO OPTIMIZATION AND EXPECTED SHORTFALL MINIMIZATION FROM HISTORICAL DATA

PORTFOLIO OPTIMIZATION AND EXPECTED SHORTFALL MINIMIZATION FROM HISTORICAL DATA PORTFOLIO OPTIMIZATION AND EXPECTED SHORTFALL MINIMIZATION FROM HISTORICAL DATA We begin by describing the problem at hand which motivates our results. Suppose that we have n financial instruments at hand,

More information

Real Options and Game Theory in Incomplete Markets

Real Options and Game Theory in Incomplete Markets Real Options and Game Theory in Incomplete Markets M. Grasselli Mathematics and Statistics McMaster University IMPA - June 28, 2006 Strategic Decision Making Suppose we want to assign monetary values to

More information