Reduced Complexity Approaches to Asymmetric Information Games


Jeff Shamma and Lichun Li, Georgia Institute of Technology
ARO MURI Annual Review, November 19, 2014

Research Thrust: Obtaining Actionable Cyber-Attack Forecasts

Today's talk:
- Value Iteration of Repeated Asymmetric Games and Its Application in Network Interdiction Problems
- Resilience of LTE Networks against Smart Jamming Attacks

Project structure

[Block diagram: simulation and live security exercises produce observations (NetFlow, probing, timing analysis); analysis maintains an up-to-date view of cyber-assets and the dependencies between assets and missions; the resulting mission model and cyber-assets model feed analysis to characterize attackers, predict future actions (COAs), perform impact analysis of sensor alerts, and create a semantically rich view of cyber-mission status.]

Games with different information patterns

[Diagram: games classified along two axes, Player 1's information versus Player 2's information, and by information pattern: one-shot, repeated, or Markovian.]

Network Interdiction Problem: An Asymmetric Game

[Figure: a transmitter sends to a receiver over two channels, one of capacity 10 and one of capacity 1; side tables enumerate the attacker's options (observe, attack channel 1, attack channel 2) and the defender's options (use channel 1, use channel 2) under each hypothesis about which channel has the high capacity.]

Attacker's actions:
- Observe which channel is in use, without being able to measure its capacity. This action is effortless.
- Block one of the channels. This action has a cost of 1.

Defender's goal: transmit as much information as possible, the sooner the better.

Abstraction of the Game: A Discounted Asymmetric Repeated Game

Asymmetric repeated games consist of:
- Three finite sets: a state set $S$ (e.g., is the high-capacity channel 1 or 2?), the defender's action set $I$ (use channel 1 or 2?), and the attacker's action set $J$ (observe, block channel 1, or block channel 2).
- An initial belief (probability distribution) $p_0$ over the state $s$, e.g., $[0.5; 0.5]$.
- A payoff function $g : S \times I \times J \to \mathbb{R}$, e.g., $g(1, 1, 2) = 11$.

The play rule:
- Stage 1: a state $s \sim p_0$ is drawn and revealed to the defender only. Both players independently choose their actions, and both actions are then announced.
- Stage 2 and onward: both players independently choose their actions, and both actions are announced.
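To make the payoff convention concrete, the tensor below (our own encoding, not given on the slides; the names `payoff` and `CAPACITY` are ours) scores each stage as the defender's throughput plus the attacker's blocking cost. The one value quoted above, $g(1,1,2) = 11$, is consistent with this convention: in state 1 the defender transmits at capacity 10 on channel 1 while the attacker wastes a blocking cost of 1 on channel 2.

```python
import numpy as np

# States: s = 0 -> channel 1 has the high capacity; s = 1 -> channel 2 does.
# Defender actions: i = 0 -> use channel 1; i = 1 -> use channel 2.
# Attacker actions: j = 0 -> observe (free); j = 1 / j = 2 -> block channel
# 1 / channel 2 (blocking costs the attacker 1, credited to the defender).
CAPACITY = np.array([[10.0, 1.0],    # (channel 1, channel 2) capacities, s = 0
                     [1.0, 10.0]])   # capacities when s = 1

def payoff(s: int, i: int, j: int) -> float:
    """Zero-sum stage payoff to the defender: throughput of the channel in
    use (zero if the attacker blocks that channel) plus the blocking cost."""
    blocked = (j == i + 1)                         # j = i + 1 blocks channel i
    throughput = 0.0 if blocked else CAPACITY[s, i]
    return throughput + (1.0 if j > 0 else 0.0)

# Tensor G[s, i, j]; G[0, 0, 2] reproduces the slides' g(1, 1, 2) = 11
# (the slides index from 1, the code from 0).
G = np.array([[[payoff(s, i, j) for j in range(3)]
               for i in range(2)] for s in range(2)])
assert G[0, 0, 2] == 11.0
```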

Discounted Asymmetric Repeated Games (continued)

Behavior strategies and the discounted payoff:
- Defender's behavior strategy $\sigma : S \times (I \times J)^t \to \Delta(I)$, $\sigma \in \Sigma$.
- Attacker's behavior strategy $\tau : (I \times J)^t \to \Delta(J)$, $\tau \in T$.
- Discounted payoff: $\gamma_\lambda(p_0, \sigma, \tau) = \mathbb{E}_{p_0,\sigma,\tau}\!\left[ \sum_{t=1}^{\infty} \lambda (1-\lambda)^{t-1} g(s, i_t, j_t) \right]$.

The $\lambda$-discounted asymmetric game $\Gamma_\lambda(p_0)$ and its value:
- $\Gamma_\lambda(p_0)$ is the repeated asymmetric game with initial distribution $p_0$, strategy spaces $\Sigma$ and $T$, and payoff function $\gamma_\lambda(p_0, \sigma, \tau)$.
- The lower and upper values are
$$ \underline{v}_\lambda(p_0) = \sup_{\sigma \in \Sigma} \inf_{\tau \in T} \gamma_\lambda(p_0, \sigma, \tau), \qquad \overline{v}_\lambda(p_0) = \inf_{\tau \in T} \sup_{\sigma \in \Sigma} \gamma_\lambda(p_0, \sigma, \tau), $$
and when $\underline{v}_\lambda(p_0) = \overline{v}_\lambda(p_0)$, the common quantity is the game value $v_\lambda(p_0)$.

The Key of the Discounted Asymmetric Game

A smart attacker learns from the history of actions. Its belief about the state is updated by Bayes' rule:
$$ p_+^s(p, x_t, i) = \frac{p^s\, x_t^s(i)}{\bar{x}_{p,x_t}(i)}, \qquad \bar{x}_{p,x_t}(i) = \sum_{s \in S} p^s x_t^s(i), \tag{1} $$
where
- $p_+$ is the current belief (at the beginning of stage $t+1$) about the state;
- $p$ is the previous belief (at the beginning of stage $t$);
- $x_t$ is the previous probability distribution over the defender's action set (one distribution $x_t^s$ for each state $s$);
- $i$ is the action the defender actually took.

Since the defender knows $p$, $x_t$, and $i$, the defender fully monitors the attacker's learning.

The game value exists and satisfies the recursive formula
$$ v_\lambda(p_0) = \max_{x_t \in \Delta(I)^S} \min_{y_t \in \Delta(J)} \big( \lambda g(p_0, x_t, y_t) + (1-\lambda) T_{p_0, x_t}(v_\lambda) \big) = \min_{y_t \in \Delta(J)} \max_{x_t \in \Delta(I)^S} \big( \lambda g(p_0, x_t, y_t) + (1-\lambda) T_{p_0, x_t}(v_\lambda) \big), $$
where $T_{p,x}(v) = \sum_{i \in I} \bar{x}_{p,x}(i)\, v\big(p_+(p, x, i)\big)$ is the expected continuation value. The defender's optimal strategy depends only on the attacker's belief.
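Equation (1) takes only a few lines to implement. A minimal sketch (our own code; states and actions are indexed from 0, and `belief_update` is a hypothetical helper name):

```python
import numpy as np

def belief_update(p: np.ndarray, x: np.ndarray, i: int) -> np.ndarray:
    """Equation (1): posterior belief after seeing defender action i.

    p : shape (|S|,)      prior belief over states
    x : shape (|S|, |I|)  x[s] is the defender's mixed action in state s
    """
    x_bar = float(p @ x[:, i])           # total probability of action i
    if x_bar == 0.0:
        return p.copy()                  # zero-probability action: belief kept
    return p * x[:, i] / x_bar           # p_plus^s = p^s x^s(i) / x_bar(i)

p0 = np.array([0.5, 0.5])

# A fully revealing strategy ("always use the high-capacity channel"):
x_reveal = np.array([[1.0, 0.0],         # state 0: always action 0
                     [0.0, 1.0]])        # state 1: always action 1
print(belief_update(p0, x_reveal, 0))    # -> [1. 0.]: the attacker learns s

# A non-revealing strategy leaves the belief unchanged:
x_blind = np.array([[0.5, 0.5],
                    [0.5, 0.5]])
print(belief_update(p0, x_blind, 0))     # -> [0.5 0.5]
```

The two test strategies show the defender's central trade-off: exploiting the state reveals it, while hiding it sacrifices payoff.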

Value Iteration: A Learning Process

Value iteration:
$$ v_{n+1}^\lambda(p) = \max_{x_n \in \Delta(I)^S} \min_{y_n \in \Delta(J)} \big( \lambda g(p, x_n, y_n) + (1-\lambda) T_{p, x_n}(v_n^\lambda) \big). $$

A functional $M$ is a contraction if $\|M(f) - M(\tilde f)\|_{\sup} \le a \|f - \tilde f\|_{\sup}$ for some $a \in [0, 1)$.
- $Q_x^v(p) = \min_{y \in \Delta(J)} \big\{ \lambda \sum_{s \in S} p^s (x^s)^T G^s y + (1-\lambda) T_{p,x}(v) \big\}$ is a contraction in $v$ with contraction constant $1 - \lambda$.
- $H^v(p) = \max_{x \in \Delta(I)^S} Q_x^v(p)$ is a contraction in $v$ with contraction constant $1 - \lambda$.

Therefore the iterates $v_n^\lambda$ converge to $v_\lambda$ exponentially with rate $1 - \lambda$:
$$ \|v_\lambda - v_{n+1}^\lambda\|_{\sup} \le (1-\lambda)\, \|v_\lambda - v_n^\lambda\|_{\sup}. $$
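For intuition, this value iteration can be run numerically on a belief grid. The sketch below is our simplification, not the talk's LP-based method: it represents $v$ by linear interpolation on a grid over $q = P(s = 0)$ and replaces the max over $x$ with brute-force grid search, using the interdiction payoff tensor from the earlier sketch. Note the min over $y$ reduces to a min over pure actions $j$, since the objective is linear in $y$ and $T_{p,x}(v)$ does not depend on $y$.

```python
import numpy as np

# Interdiction payoff tensor g(s, i, j) from the earlier sketch.
G = np.array([[[10., 1., 11.], [1., 2., 1.]],
              [[1., 1., 2.], [10., 11., 1.]]])

LAM = 0.1                              # discount parameter lambda
QGRID = np.linspace(0.0, 1.0, 101)     # belief grid over q = P(state 0)
XGRID = np.linspace(0.0, 1.0, 21)      # coarse grid over defender mixtures

def bellman_sweep(v: np.ndarray) -> np.ndarray:
    """One sweep of v_{n+1}(q) = max_x min_y [lam*g + (1-lam)*T v], with the
    min over y taken over pure attacker actions j and the max over x done by
    brute-force grid search (slow and coarse, for illustration only)."""
    v_next = np.empty_like(v)
    for k, q in enumerate(QGRID):
        p = np.array([q, 1.0 - q])
        best = -np.inf
        for a in XGRID:                          # P(use channel 1 | state 0)
            for b in XGRID:                      # P(use channel 1 | state 1)
                x = np.array([[a, 1 - a], [b, 1 - b]])
                cont = 0.0                       # T_{p,x}(v) via belief updates
                for i in range(2):
                    xbar = p @ x[:, i]
                    if xbar > 0.0:
                        q_plus = q * x[0, i] / xbar
                        cont += xbar * np.interp(q_plus, QGRID, v)
                stage = np.einsum('s,si,sij->j', p, x, G)
                best = max(best, np.min(LAM * stage + (1 - LAM) * cont))
        v_next[k] = best
    return v_next

v = np.zeros_like(QGRID)                         # start from v_0 = 0
for _ in range(6):
    v = bellman_sweep(v)                         # grid approximation of v_6
print(v[50])                                     # approx. v_6 at p0 = [0.5, 0.5]
```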

Suboptimal Policy $\sigma_n^\lambda$ and Its Error Bound

Suboptimal policy based on $v_n^\lambda$:
$$ \sigma_n^\lambda = \arg\max_{\sigma \in \Sigma} \min_{y \in \Delta(J)} \big( \lambda g(p, \sigma(p), y) + (1-\lambda) T_{p, \sigma(p)}(v_n^\lambda) \big). $$

Its worst-case payoff is $J_{\sigma_n^\lambda}(p) = \min_{\tau} \gamma_\lambda(p, \sigma_n^\lambda, \tau)$.

The payoff $J_{\sigma_n^\lambda}$ guaranteed by the suboptimal policy $\sigma_n^\lambda$ satisfies
$$ \|v_\lambda - J_{\sigma_n^\lambda}\|_{\sup} \le \frac{2(1-\lambda)}{\lambda} \|v_\lambda - v_n^\lambda\|_{\sup} \le \frac{2(1-\lambda)^{n+1}}{\lambda} \|v_\lambda\|_{\sup}. $$
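To size the bound for the parameters used in the application below ($\lambda = 0.1$ and $n = 6$ value-iteration steps), a quick calculation (ours) gives
$$ \frac{2(1-\lambda)^{n+1}}{\lambda} = \frac{2\,(0.9)^7}{0.1} \approx 9.57, $$
so the a priori guarantee at $n = 6$ is still loose, about $9.57\,\|v_\lambda\|_{\sup}$, and each additional iteration tightens it by a factor of $1 - \lambda = 0.9$.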

Linear Programming Formulation of $v_n^\lambda(p)$

Compute $v_1^\lambda(p) = \max_{x \in \Delta(I)^S} \min_{y \in \Delta(J)} \lambda \sum_{s \in S} p^s (x^s)^T G^s y$:
$$ \begin{aligned} \max_{x,\, \ell} \ & \lambda \ell \\ \text{s.t.} \ & \sum_{s \in S} p^s (G^s)^T x^s \ge \ell \mathbf{1}, \\ & \mathbf{1}^T x^s = 1, \quad x^s \ge 0, \quad \forall s \in S. \end{aligned} $$

The change of variables $z^s = p^s x^s$ turns this into a linear program:
$$ \begin{aligned} \max_{z,\, \ell} \ & \lambda \ell \\ \text{s.t.} \ & \sum_{s \in S} (G^s)^T z^s \ge \ell \mathbf{1}, \\ & \mathbf{1}^T z^s = p^s, \quad z^s \ge 0, \quad \forall s \in S. \end{aligned} $$
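The $z$-form LP drops directly into an off-the-shelf solver. Below is a sketch using scipy.optimize.linprog (the function name `v1` and the variable layout are ours):

```python
import numpy as np
from scipy.optimize import linprog

def v1(G: np.ndarray, p: np.ndarray, lam: float) -> float:
    """Solve the z-form LP for v_1^lambda(p):
    max lam*l  s.t.  sum_s (G^s)^T z^s >= l*1,  1^T z^s = p^s,  z^s >= 0."""
    S, I, J = G.shape
    n = S * I + 1                       # decision vector [z^0, ..., z^{S-1}, l]
    c = np.zeros(n)
    c[-1] = -lam                        # linprog minimizes, so negate
    # Inequalities: l*1 - sum_s (G^s)^T z^s <= 0   (one row per column of G^s)
    A_ub = np.zeros((J, n))
    for s in range(S):
        A_ub[:, s * I:(s + 1) * I] = -G[s].T
    A_ub[:, -1] = 1.0
    # Equalities: 1^T z^s = p^s for each state s
    A_eq = np.zeros((S, n))
    for s in range(S):
        A_eq[s, s * I:(s + 1) * I] = 1.0
    bounds = [(0, None)] * (S * I) + [(None, None)]   # z >= 0, l free
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(J),
                  A_eq=A_eq, b_eq=p, bounds=bounds)
    return -res.fun                     # = lam * l at the optimum

G = np.array([[[10., 1., 11.], [1., 2., 1.]],
              [[1., 1., 2.], [10., 11., 1.]]])   # interdiction payoffs g(s,i,j)
print(v1(G, p=np.array([0.5, 0.5]), lam=0.1))    # v_1^0.1 at p0 = [0.5; 0.5]
```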

Linear Programming Formulation of $v_n^\lambda(p)$ (continued)

Compute $v_n^\lambda(p)$:
$$ \begin{aligned} \max_{z_t^{h_t},\, \ell_t^{h_t}} \ & \sum_{t=1}^{n} \sum_{h_t \in H_t} \lambda (1-\lambda)^{t-1} \ell_t^{h_t} \\ \text{s.t. for all } t = 1, 2, \dots, n \text{ and all } h_t \in H_t: \ & \sum_{s \in S} (G^s)^T z_t^{s, h_t} \ge \ell_t^{h_t} \mathbf{1}, \\ & \mathbf{1}^T z_t^{s, h_t} = z_{t-1}^{s, h_{t-1}}(i_{t-1}), \quad \forall s \in S \quad (\text{with } \mathbf{1}^T z_1^{s} = p^s \text{ at } t = 1), \\ & z_t^{s, h_t} \ge 0, \quad \forall s \in S. \end{aligned} $$

Here $H_t$ is the set of defender action histories of length $t-1$, so the computational complexity is $O(|S|\,|J|\,|I|^n)$, versus $O(|S|\,|J|^n\,|I|^n)$ for solving the extensive form of the $n$-stage game (which enumerates attacker histories as well). The one-stage LP for $v_1^\lambda$ above is the $n = 1$ special case.
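The same pattern extends to the $n$-stage LP by indexing the $z$ and $\ell$ variables with defender action histories. The builder below is a sketch of our assumed reconstruction of that LP; the helper `vn` and the variable bookkeeping are ours, not the authors':

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def vn(G: np.ndarray, p: np.ndarray, lam: float, n: int) -> float:
    """Build and solve the history-indexed LP for v_n^lambda(p).

    Variables: for each stage t and defender action history h of length t-1,
    an I-vector z[t, h, s] per state s (carrying probability mass) and a
    scalar l[t, h]."""
    S, I, J = G.shape
    hists = {t: list(product(range(I), repeat=t - 1)) for t in range(1, n + 1)}
    z_idx, l_idx, nv = {}, {}, 0
    for t in range(1, n + 1):
        for h in hists[t]:
            for s in range(S):
                z_idx[t, h, s] = nv
                nv += I
            l_idx[t, h] = nv
            nv += 1
    c = np.zeros(nv)
    for t in range(1, n + 1):               # maximize the discounted sum of l's
        for h in hists[t]:
            c[l_idx[t, h]] = -lam * (1 - lam) ** (t - 1)
    A_ub, b_ub, A_eq, b_eq = [], [], [], []
    for t in range(1, n + 1):
        for h in hists[t]:
            for j in range(J):              # l_t^h - sum_s (G^s)^T z_t^{s,h} <= 0
                row = np.zeros(nv)
                row[l_idx[t, h]] = 1.0
                for s in range(S):
                    row[z_idx[t, h, s]:z_idx[t, h, s] + I] -= G[s, :, j]
                A_ub.append(row)
                b_ub.append(0.0)
            for s in range(S):              # mass consistency across stages
                row = np.zeros(nv)
                row[z_idx[t, h, s]:z_idx[t, h, s] + I] = 1.0
                if t == 1:
                    b = p[s]                # 1^T z_1^s = p^s
                else:
                    row[z_idx[t - 1, h[:-1], s] + h[-1]] -= 1.0
                    b = 0.0                 # 1^T z_t^{s,h} = z_{t-1}^{s,h'}(i)
                A_eq.append(row)
                b_eq.append(b)
    bounds = [(0, None)] * nv               # z >= 0 ...
    for t in range(1, n + 1):
        for h in hists[t]:
            bounds[l_idx[t, h]] = (None, None)   # ... but the l's are free
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds)
    return -res.fun                          # optimal discounted sum

# e.g. vn(G, np.array([0.5, 0.5]), lam=0.1, n=3) with G from the previous sketch
```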

Application in Network Interdiction Problems

[Figure: the two-channel interdiction game from before (capacities 10 and 1), with each player's options under each hypothesis about which channel has the high capacity.]

- $p_0 = [0.5; 0.5]$, $\lambda = 0.1$.
- We run a 100-stage simulation 10 times. At each stage, we update the current belief $p$, compute $v_6^\lambda(p)$, $\sigma_6^\lambda(p)$, and the worst-case payoff, and choose an action according to $\sigma_6^\lambda(p)$.
- The worst-case payoff ranges from 3.61 to 3.97, with an average of 3.77.
- With only one channel, the defender has a reward of 1. With an additional channel of capacity 1, the defender gains 2.77 more reward.

Markovian Games (ongoing work)

When the transition matrix depends only on the defender:
- The game value exists.
- Value iteration converges exponentially to the game value.
- An LP formulation computes the value iteration and the corresponding strategy.

When the transition matrix depends on both the defender and the attacker, the game value may not exist.¹

¹ D. Rosenberg, E. Solan, N. Vieille. Stochastic games with a single controller and incomplete information. SIAM J. Control Optim., 2004.

Project Progress

- Last year: LP formulation of repeated asymmetric games with finite horizon.
- This year: LP formulation of Markovian asymmetric games with finite horizon.
- This year: approximate policies for discounted repeated asymmetric games with infinite horizon.
- Ongoing: approximate policies for discounted Markovian asymmetric games with infinite horizon.

Thanks.
