Reduced Complexity Approaches to Asymmetric Information Games
Reduced Complexity (cybaware) Approaches to Asymmetric Information Games
Jeff Shamma and Lichun Li, Georgia Institute of Technology
ARO MURI Annual Review, November 19, 2014
Research Thrust: Obtaining Actionable Cyber-Attack Forecasts
Today's talk covers two topics:
- Value iteration of repeated asymmetric games and its application in network interdiction problems.
- Resilience of LTE networks against smart jamming attacks.
Project structure
(Diagram.) Observations (NetFlow, probing, time analysis) and simulation/live security exercises feed analyses that maintain an up-to-date view of the cyber-assets and determine the dependencies between assets and missions. The resulting data drive a mission model, a cyber-assets model, and a characterization of attackers, which in turn support prediction of future actions (COAs), impact analysis of sensor alerts, and a semantically rich view of cyber-mission status.
Games with different information patterns
(Diagram.) Games are classified along two axes, player 1's information and player 2's information, and by information pattern, ranging from one-shot to repeated to Markovian.
Network Interdiction Problem: An Asymmetric Game
Channel 1 has capacity 10; channel 2 has capacity 1. The defender knows which channel has the high capacity and chooses which channel to use. The attacker's actions are:
- Observe which channel is in use, without being able to measure its capacity. This action is effortless.
- Block one of the channels. This action has a cost of 1.
The defender's goal is to transmit as much information as possible, the sooner the better.
Abstraction of the Game: A Discounted Asymmetric Repeated Game
An asymmetric repeated game consists of:
- Three finite sets: a state set S (i.e., is the high-capacity channel 1 or 2?), the defender's action set I (use channel 1 or 2?), and the attacker's action set J (observe, or block channel 1 or 2?).
- An initial belief (probability distribution) p_0 over the state s, e.g., [0.5; 0.5].
- A payoff function g : S × I × J → R, e.g., g(1, 1, 2) = 11.
The play rule: at stage 1, the state s is drawn according to p_0 and told to the defender only; both players then independently choose their actions, and both actions are announced. At stage 2 and every stage thereafter, both players independently choose their actions, and both actions are announced.
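A concrete encoding of this abstraction for the channel example can be sketched as follows. Note the assumptions: the slides confirm only g(1, 1, 2) = 11; the rest of the payoff table is reconstructed from the capacities (10 and 1) and the blocking cost of 1, credited to the defender in this zero-sum formulation.

```python
# State set S: which channel has the high capacity; action sets I and J.
S = (1, 2)                      # high-capacity channel is 1 or 2
I = (1, 2)                      # defender: use channel 1 or 2
J = ("observe", "block1", "block2")

p0 = {1: 0.5, 2: 0.5}           # initial belief over the state

def g(s, i, j):
    """Stage payoff to the defender (zero-sum); reconstructed, not from the slides."""
    cap = 10 if i == s else 1             # capacity of the channel the defender uses
    blocked = (j == "block%d" % i)        # attacker blocked the channel in use
    cost = 0 if j == "observe" else 1     # attacker's blocking cost, paid to defender
    return (0 if blocked else cap) + cost

print(g(1, 1, "block2"))        # matches the slides' example: 11
```

The single confirmed value g(1, 1, 2) = 11 (state 1, defender uses channel 1, attacker blocks channel 2) checks out: 10 transmitted plus the attacker's cost of 1.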
Discounted Asymmetric Repeated Games (continued)
Behavior strategies and discounted payoff:
- The defender's behavior strategy is σ : S × (I × J)^t → Δ(I), σ ∈ Σ.
- The attacker's behavior strategy is τ : (I × J)^t → Δ(J), τ ∈ T.
- The discounted payoff is γ_λ(p_0, σ, τ) = E_{p_0,σ,τ} [ Σ_{t=1}^∞ λ(1 − λ)^{t−1} g(s, i_t, j_t) ].
The λ-discounted asymmetric game Γ_λ(p_0) and its value:
- The λ-discounted game Γ_λ(p_0) is the repeated asymmetric game with initial distribution p_0, strategy spaces Σ and T, and payoff function γ_λ(p_0, σ, τ).
- The game value v_λ(p_0) exists when the lower and upper values coincide:
  v_λ(p_0) = sup_{σ∈Σ} inf_{τ∈T} γ_λ(p_0, σ, τ) = inf_{τ∈T} sup_{σ∈Σ} γ_λ(p_0, σ, τ).
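As an illustration (not from the slides): the discount weights λ(1 − λ)^{t−1} sum to one, so a pair of stationary strategies earns a discounted payoff equal to its expected stage payoff. A minimal numeric check, assuming stage-payoff matrices G^s reconstructed from the channel example (capacities 10 and 1, blocking cost 1 credited to the defender, consistent with g(1, 1, 2) = 11):

```python
import numpy as np

lam = 0.1  # discount parameter lambda

# Assumed stage-payoff matrices (rows: use channel 1/2;
# columns: observe, block channel 1, block channel 2).
G = {1: np.array([[10.0,  1.0, 11.0],
                  [ 1.0,  2.0,  1.0]]),
     2: np.array([[ 1.0,  1.0,  2.0],
                  [10.0, 11.0,  1.0]])}

x = np.array([0.5, 0.5])         # stationary defender mix over I
y = np.array([1/3, 1/3, 1/3])    # stationary attacker mix over J

stage = float(x @ G[1] @ y)      # expected stage payoff in state 1

# Discounted payoff: sum_t lam*(1-lam)**(t-1) * stage; weights sum to 1.
T = 500
weights = lam * (1 - lam) ** np.arange(T)
discounted = float(weights.sum() * stage)

print(round(weights.sum(), 6))   # ~1 (up to truncation at T stages)
print(round(stage, 4), round(discounted, 4))
```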
The KEY of the discounted asymmetric game
A smart attacker learns from the history of actions via the Bayes update
  p^+_s(p, x_t, i) = p_s x^s_t(i) / x̄_{p,x_t}(i),  where x̄_{p,x_t}(i) = Σ_{s∈S} p_s x^s_t(i).   (1)
- p^+: the current belief (at the beginning of stage t + 1) about the state.
- p: the previous belief (at the beginning of stage t) about the state.
- x_t: the previous probability distribution over the defender's action set, one distribution x^s_t per state s.
- i: the previous action the defender took.
Since both actions are announced, the defender fully monitors the attacker's learning. The game value exists and satisfies the recursive formula
  v_λ(p_0) = max_{x_t ∈ Δ(I)^S} min_{y_t ∈ Δ(J)} [ λ g(p_0, x_t, y_t) + (1 − λ) T_{p_0,x_t}(v_λ) ]
           = min_{y_t ∈ Δ(J)} max_{x_t ∈ Δ(I)^S} [ λ g(p_0, x_t, y_t) + (1 − λ) T_{p_0,x_t}(v_λ) ].
The defender's optimal strategy depends only on the attacker's belief.
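The Bayes update (1) can be sketched directly. An illustrative implementation (not from the slides), assuming x is stored as an |S| × |I| array whose row s is the defender's mixed action x^s_t:

```python
import numpy as np

def belief_update(p, x, i):
    """Bayes update of the attacker's belief after seeing defender action i.

    p : current belief over states, shape (|S|,)
    x : defender's strategy, x[s, a] = prob. of action a in state s
    i : observed defender action (index into I)
    """
    xbar = float(p @ x[:, i])          # total probability of seeing action i
    if xbar == 0.0:
        return p.copy()                # zero-probability action; belief unchanged
    return p * x[:, i] / xbar          # p_plus[s] = p[s] * x[s, i] / xbar

p0 = np.array([0.5, 0.5])
# Hypothetical strategy: defender uses the channel matching the state w.p. 0.8.
x = np.array([[0.8, 0.2],
              [0.2, 0.8]])
p1 = belief_update(p0, x, i=0)         # attacker saw channel 1 in use
print(p1)                              # belief tilts toward state 1
```

Fully revealing strategies (x^s concentrated on distinct actions) drive p^+ to a vertex after one observation, which is exactly what a belief-aware defender mixes to avoid.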
Value Iteration: A Learning Process
Value iteration:
  v^λ_{n+1}(p) = max_{x_n ∈ Δ(I)^S} min_{y_n ∈ Δ(J)} [ λ g(p, x_n, y_n) + (1 − λ) T_{p,x_n}(v^λ_n) ].
A functional M is a contraction if ‖M(f) − M(f̃)‖_sup ≤ a ‖f − f̃‖_sup for some a ∈ [0, 1).
- Q^v_x(p) = min_{y ∈ Δ(J)} { λ Σ_{s∈S} p_s (x^s)^T G^s y + (1 − λ) T_{p,x}(v) } is a contraction with contraction constant 1 − λ.
- H_v(p) = max_{x ∈ Δ(I)^S} Q^v_x(p) is a contraction with contraction constant 1 − λ.
The approximate value function v^λ_n therefore converges to v^λ exponentially with rate 1 − λ, i.e.,
  ‖v^λ − v^λ_{n+1}‖_sup ≤ (1 − λ) ‖v^λ − v^λ_n‖_sup.
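The geometric rate can be seen numerically. A toy sketch (illustration only: the real operator H involves a max-min over strategies at every belief p; here a scalar map with the same contraction constant 1 − λ stands in for it):

```python
lam = 0.1

# Toy contraction with constant 1 - lam: M(v) = lam*c + (1-lam)*v,
# whose fixed point is c.
c = 6.0
v, v_star = 0.0, c
errors = []
for n in range(20):
    errors.append(abs(v_star - v))
    v = lam * c + (1 - lam) * v

ratios = [errors[n + 1] / errors[n] for n in range(len(errors) - 1)]
print(ratios[:3])   # each step shrinks the error by the factor 1 - lam
```

With λ = 0.1 every iteration removes only 10% of the remaining error, which is why the suboptimal-policy error bound on the next slide matters in practice.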
Suboptimal Policy σ^λ_n and Its Error Bound
The suboptimal policy based on v^λ_n is
  σ^λ_n = arg max_{σ∈Σ} min_{y ∈ Δ(J)} [ λ g(p, σ(p), y) + (1 − λ) T_{p,σ(p)}(v^λ_n) ],
and its worst-case payoff is
  J_{σ^λ_n}(p) = min_τ γ_λ(p, σ^λ_n, τ).
The game value J_{σ^λ_n} induced by the suboptimal policy σ^λ_n satisfies
  ‖v^λ − J_{σ^λ_n}‖_sup ≤ (2(1 − λ)/λ) ‖v^λ − v^λ_n‖_sup ≤ (2(1 − λ)^{n+1}/λ) ‖v^λ − v^λ_0‖_sup.
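A quick illustration (not from the slides) of how this bound dictates the iteration count: with λ = 0.1 the factor 2(1 − λ)^{n+1}/λ shrinks by 0.9 per iteration, so one can solve for the smallest n guaranteeing a target accuracy. The bound on ‖v^λ − v^λ_0‖_sup below is a hypothetical placeholder, here taken as 1:

```python
lam = 0.1
v_gap = 1.0   # assumed bound on ||v - v_0||_sup (hypothetical)
eps = 0.01    # target accuracy

# Smallest n with 2*(1-lam)**(n+1)/lam * v_gap <= eps.
n = 0
while 2 * (1 - lam) ** (n + 1) / lam * v_gap > eps:
    n += 1
print(n)
```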
Linear Programming Formulation of v^λ_n(p)
Compute v^λ_1(p) = max_{x ∈ Δ(I)^S} min_{y ∈ Δ(J)} λ Σ_{s∈S} p_s (x^s)^T G^s y. This equals the LP
  max_{x,l} λ l
  s.t. Σ_{s∈S} p_s (G^s)^T x^s ≥ l 1,
       1^T x^s = 1, ∀s ∈ S,
       x^s ≥ 0, ∀s ∈ S.
With the change of variables z^s = p_s x^s, this becomes
  max_{z,l} λ l
  s.t. Σ_{s∈S} (G^s)^T z^s ≥ l 1,
       1^T z^s = p_s, ∀s ∈ S,
       z^s ≥ 0, ∀s ∈ S.
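The second LP can be handed to any LP solver. A sketch using `scipy.optimize.linprog`, with the payoff matrices an assumption reconstructed from the channel example (only g(1, 1, 2) = 11 is confirmed by the slides):

```python
import numpy as np
from scipy.optimize import linprog

lam, p = 0.1, np.array([0.5, 0.5])

# Assumed stage-payoff matrices G^s (rows: use channel 1/2;
# columns: observe, block channel 1, block channel 2).
G1 = np.array([[10.0,  1.0, 11.0],
               [ 1.0,  2.0,  1.0]])
G2 = np.array([[ 1.0,  1.0,  2.0],
               [10.0, 11.0,  1.0]])

# Variables: [z^1 (2), z^2 (2), l]. Maximize lam*l  <=>  minimize -lam*l.
c = np.array([0.0, 0.0, 0.0, 0.0, -lam])
# Column constraints: l*1 - (G1^T z^1 + G2^T z^2) <= 0.
A_ub = np.hstack([-G1.T, -G2.T, np.ones((3, 1))])
b_ub = np.zeros(3)
# Mass constraints: 1^T z^s = p_s.
A_eq = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0, 0.0]])
b_eq = p
bounds = [(0, None)] * 4 + [(None, None)]   # z >= 0, l free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
v1 = -res.fun                               # = lam * l*
print(round(v1, 4))
```

For this instance the one-shot value is l* = 6: in a single stage the defender loses nothing by revealing, so it simply uses the high-capacity channel, and the attacker's best reply (blocking a channel) yields 6 on average; hence v^λ_1(p) = λ · 6 = 0.6.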
Linear Programming Formulation of v^λ_n(p) (continued)
The n-stage value v^λ_n is computed by the sequence-form LP over histories h_t ∈ H_t:
  max_{z, l} Σ_{t=1}^n Σ_{h_t ∈ H_t} λ(1 − λ)^{t−1} l_t(h_t)
  s.t. for all t = 1, 2, ..., n, and all h_t ∈ H_t,
       Σ_{s∈S} (G^s)^T z^s_t(h_t) ≥ l_t(h_t) 1,
       1^T z^s_t(h_t) = z^s_{t−1}(h_{t−1})(i_{t−1}), ∀s ∈ S,
       z^s_t(h_t) ≥ 0, ∀s ∈ S.
Computational complexity: O(|S| |J| |I|^n), compared with O(|S| |J|^n |I|^n) for the extensive form.
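To see the gap concretely, an illustrative count (assuming the complexity expressions above, with the channel game's set sizes and the horizon n = 6 used in the application on the next slide):

```python
S, I, J, n = 2, 2, 3, 6   # |S|, |I|, |J|, horizon

sequence_form = S * J * I**n       # O(|S||J||I|^n): histories of defender actions only
extensive_form = S * J**n * I**n   # O(|S||J|^n|I|^n): histories of both players' actions
print(sequence_form, extensive_form)
```

Only the defender's actions move the belief, so the sequence form avoids enumerating the attacker's |J|^n action histories.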
Application in Network Interdiction Problems
Setup: channel 1 has capacity 10, channel 2 has capacity 1; in each state the defender chooses which channel to use, while the attacker observes, blocks channel 1, or blocks channel 2. We take p_0 = [0.5; 0.5] and λ = 0.1, and run a 100-stage simulation 10 times. At each stage, we update the current belief p, compute v^λ_6(p), σ^λ_6(p), and the worst-case payoff, and choose an action according to σ^λ_6(p). The worst-case payoff ranges from 3.61 to 3.97, with an average of 3.77. With only one channel, the defender has a reward of 1; adding a channel of capacity 1 thus gains the defender 2.77 more reward.
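The simulation protocol can be sketched as follows. Illustration only: the true policy σ^λ_6 comes from the LP machinery above, so a hypothetical placeholder policy and attacker stand in for it, and the payoff tensor is the assumed reconstruction with g(1, 1, 2) = 11:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n_stages = 0.1, 100

# Assumed stage payoffs G[s, i, j] (rows: use channel 1/2;
# columns: observe, block channel 1, block channel 2).
G = np.array([[[10.0,  1.0, 11.0],
               [ 1.0,  2.0,  1.0]],
              [[ 1.0,  1.0,  2.0],
               [10.0, 11.0,  1.0]]])

def policy(p, s):
    """Hypothetical stand-in for sigma^lambda_6(p): use the known-good
    channel with prob. 0.7, mixing to limit belief leakage."""
    return np.array([[0.7, 0.3],
                     [0.3, 0.7]])   # x[state, action]

def attacker(p):
    """Hypothetical stand-in adversary: block the channel the belief favors."""
    return 1 + int(np.argmax(p))    # 0 = observe, 1/2 = block channel 1/2

s = rng.integers(2)                 # true state drawn from p0 = [0.5, 0.5]
p = np.array([0.5, 0.5])            # attacker's belief
payoff = 0.0
for t in range(n_stages):
    x = policy(p, s)
    i = rng.choice(2, p=x[s])
    j = attacker(p)
    payoff += lam * (1 - lam) ** t * G[s, i, j]
    p = p * x[:, i] / (p @ x[:, i])  # Bayes update from the announced action
print(round(payoff, 3))
```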
Markovian Games (ongoing)
If the transition matrix depends only on the defender:
- The game value exists.
- Value iteration converges exponentially to the game value.
- An LP formulation computes the value iteration and the corresponding strategy.
If the transition matrix depends on both the defender and the attacker, the game value may not exist [1].
[1] D. Rosenberg, E. Solan, N. Vieille. Stochastic games with a single controller and incomplete information. SIAM J. Control Optim.
Project Progress
- Last year: LP formulation of repeated asymmetric games with finite horizon.
- This year: LP formulation of Markovian asymmetric games with finite horizon.
- This year: approximate policies for discounted repeated asymmetric games with infinite horizon.
- Ongoing: approximate policies for discounted Markovian asymmetric games with infinite horizon.
Thanks.
Reasoning with Uncertainty Markov Decision Models Manfred Huber 2015 1 Markov Decision Process Models Markov models represent the behavior of a random process, including its internal state and the externally
More informationRepeated Games. September 3, Definitions: Discounting, Individual Rationality. Finitely Repeated Games. Infinitely Repeated Games
Repeated Games Frédéric KOESSLER September 3, 2007 1/ Definitions: Discounting, Individual Rationality Finitely Repeated Games Infinitely Repeated Games Automaton Representation of Strategies The One-Shot
More informationApplications of short-time asymptotics to the statistical estimation and option pricing of Lévy-driven models
Applications of short-time asymptotics to the statistical estimation and option pricing of Lévy-driven models José Enrique Figueroa-López 1 1 Department of Statistics Purdue University CIMAT and Universidad
More informationWorst-case-expectation approach to optimization under uncertainty
Worst-case-expectation approach to optimization under uncertainty Wajdi Tekaya Joint research with Alexander Shapiro, Murilo Pereira Soares and Joari Paulo da Costa : Cambridge Systems Associates; : Georgia
More informationMicroeconomics II. CIDE, MsC Economics. List of Problems
Microeconomics II CIDE, MsC Economics List of Problems 1. There are three people, Amy (A), Bart (B) and Chris (C): A and B have hats. These three people are arranged in a room so that B can see everything
More informationMS&E 246: Lecture 2 The basics. Ramesh Johari January 16, 2007
MS&E 246: Lecture 2 The basics Ramesh Johari January 16, 2007 Course overview (Mainly) noncooperative game theory. Noncooperative: Focus on individual players incentives (note these might lead to cooperation!)
More informationMarkov Decision Processes: Making Decision in the Presence of Uncertainty. (some of) R&N R&N
Markov Decision Processes: Making Decision in the Presence of Uncertainty (some of) R&N 16.1-16.6 R&N 17.1-17.4 Different Aspects of Machine Learning Supervised learning Classification - concept learning
More informationSUCCESSIVE INFORMATION REVELATION IN 3-PLAYER INFINITELY REPEATED GAMES WITH INCOMPLETE INFORMATION ON ONE SIDE
SUCCESSIVE INFORMATION REVELATION IN 3-PLAYER INFINITELY REPEATED GAMES WITH INCOMPLETE INFORMATION ON ONE SIDE JULIAN MERSCHEN Bonn Graduate School of Economics, University of Bonn Adenauerallee 24-42,
More informationOptimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models
Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models José E. Figueroa-López 1 1 Department of Statistics Purdue University University of Missouri-Kansas City Department of Mathematics
More informationReinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration
Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration Piyush Rai CS5350/6350: Machine Learning November 29, 2011 Reinforcement Learning Supervised Learning: Uses explicit supervision
More informationReinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration
Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration Piyush Rai CS5350/6350: Machine Learning November 29, 2011 Reinforcement Learning Supervised Learning: Uses explicit supervision
More information91.420/543: Artificial Intelligence UMass Lowell CS Fall 2010
91.420/543: Artificial Intelligence UMass Lowell CS Fall 2010 Lecture 17 & 18: Markov Decision Processes Oct 12 13, 2010 A subset of Lecture 9 slides from Dan Klein UC Berkeley Many slides over the course
More informationBlack-Scholes and Game Theory. Tushar Vaidya ESD
Black-Scholes and Game Theory Tushar Vaidya ESD Sequential game Two players: Nature and Investor Nature acts as an adversary, reveals state of the world S t Investor acts by action a t Investor incurs
More informationSOLVING ROBUST SUPPLY CHAIN PROBLEMS
SOLVING ROBUST SUPPLY CHAIN PROBLEMS Daniel Bienstock Nuri Sercan Özbay Columbia University, New York November 13, 2005 Project with Lucent Technologies Optimize the inventory buffer levels in a complicated
More informationStochastic Games with 2 Non-Absorbing States
Stochastic Games with 2 Non-Absorbing States Eilon Solan June 14, 2000 Abstract In the present paper we consider recursive games that satisfy an absorbing property defined by Vieille. We give two sufficient
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Gittins Index: Discounted, Bayesian (hence Markov arms). Reduces to stopping problem for each arm. Interpretation as (scaled)
More informationOptimal structural policies for ambiguity and risk averse inventory and pricing models
Optimal structural policies for ambiguity and risk averse inventory and pricing models Xin Chen Peng Sun March 13, 2009 Abstract This paper discusses multi-period stochastic joint inventory and pricing
More information6. Martingales. = Zn. Think of Z n+1 as being a gambler s earnings after n+1 games. If the game if fair, then E [ Z n+1 Z n
6. Martingales For casino gamblers, a martingale is a betting strategy where (at even odds) the stake doubled each time the player loses. Players follow this strategy because, since they will eventually
More informationCS 331: Artificial Intelligence Game Theory I. Prisoner s Dilemma
CS 331: Artificial Intelligence Game Theory I 1 Prisoner s Dilemma You and your partner have both been caught red handed near the scene of a burglary. Both of you have been brought to the police station,
More informationStrategies and Nash Equilibrium. A Whirlwind Tour of Game Theory
Strategies and Nash Equilibrium A Whirlwind Tour of Game Theory (Mostly from Fudenberg & Tirole) Players choose actions, receive rewards based on their own actions and those of the other players. Example,
More informationAnumericalalgorithm for general HJB equations : a jump-constrained BSDE approach
Anumericalalgorithm for general HJB equations : a jump-constrained BSDE approach Nicolas Langrené Univ. Paris Diderot - Sorbonne Paris Cité, LPMA, FiME Joint work with Idris Kharroubi (Paris Dauphine),
More informationOption pricing in the stochastic volatility model of Barndorff-Nielsen and Shephard
Option pricing in the stochastic volatility model of Barndorff-Nielsen and Shephard Indifference pricing and the minimal entropy martingale measure Fred Espen Benth Centre of Mathematics for Applications
More informationGame Theory for Wireless Engineers Chapter 3, 4
Game Theory for Wireless Engineers Chapter 3, 4 Zhongliang Liang ECE@Mcmaster Univ October 8, 2009 Outline Chapter 3 - Strategic Form Games - 3.1 Definition of A Strategic Form Game - 3.2 Dominated Strategies
More informationOptimal Order Placement
Optimal Order Placement Peter Bank joint work with Antje Fruth OMI Colloquium Oxford-Man-Institute, October 16, 2012 Optimal order execution Broker is asked to do a transaction of a significant fraction
More information