Long Term Values in MDPs Second Workshop on Open Games
Slide 1: A (Co)Algebraic Perspective on Long-Term Values in MDPs
Second Workshop on Open Games
Helle Hvid Hansen, Delft University of Technology
Helle Hvid Hansen (TU Delft), 2nd Workshop on Open Games, Oxford, 4-6 July
Slide 2: Introduction
Joint work with Frank Feys (Delft) and Larry Moss (Indiana): Long-Term Values in Markov Decision Processes, (Co)Algebraically. Proc. of Coalgebraic Methods in Computer Science (CMCS 2018).
Aim: apply (co)algebraic techniques to reason about Markov decision processes, and more generally about infinite games and equilibria (cf. Abramsky & Winschel, Lescanne, Hedges, Zahn, Ghani, Kupke, Lambert, Nordvall-Forsberg, ...).
Slide 3: Outline
1. MDP Preliminaries
2. Part I: Long-Term Values from b-corecursive Algebras
3. Part II: Policy Improvement (Co)Inductively
4. Conclusion
Slide 4: MDPs: Planning under Uncertainty
Markov decision processes (MDPs) are state-based models of sequential decision-making under uncertainty. The system/agent chooses actions and collects rewards, but does not have full control over transitions. The decision maker wants to find a policy/plan that maximizes expected long-term rewards.
Applications: maintenance schedules, production planning, finance, reinforcement learning, ...
MDPs are one-player stochastic games.
Slide 5: MDP Example
A start-up company needs to decide whether to Advertise or Save money.
[Figure: a four-state MDP with states Poor & Unknown (+0), Poor & Famous (+0), Rich & Unknown (+10), Rich & Famous (+10); in each state the company chooses Save (S) or Advertise (A) and moves along the labelled edges with probability 1/2 or 1.]
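The slide's example can be written down concretely. A minimal plain-Python sketch: the state labels and rewards follow the diagram, but the transition probabilities are only partly recoverable from the garbled figure, so the ones below are illustrative assumptions.

```python
# Start-up MDP from the slide, as plain Python. State labels and rewards follow
# the diagram; the transition probabilities are illustrative assumptions.
states = ["PU", "PF", "RU", "RF"]   # Poor&Unknown, Poor&Famous, Rich&Unknown, Rich&Famous
actions = ["S", "A"]                # Save, Advertise
reward = {"PU": 0.0, "PF": 0.0, "RU": 10.0, "RF": 10.0}   # the map u : S -> R

# t[s][a] is a finitely-supported distribution over next states (the (D S)^A part).
t = {
    "PU": {"S": {"PU": 1.0},            "A": {"PU": 0.5, "PF": 0.5}},
    "PF": {"S": {"PU": 0.5, "RF": 0.5}, "A": {"PF": 1.0}},
    "RU": {"S": {"PU": 0.5, "RU": 0.5}, "A": {"PU": 0.5, "PF": 0.5}},
    "RF": {"S": {"RU": 0.5, "RF": 0.5}, "A": {"PF": 1.0}},
}

# Sanity check: every t(s)(a) is a probability distribution.
for s in states:
    for a in actions:
        assert abs(sum(t[s][a].values()) - 1.0) < 1e-12
```

A deterministic stationary policy is then simply a dict from states to actions, e.g. `{"PU": "A", "PF": "S", "RU": "S", "RF": "S"}`.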
Slide 6: Markov Decision Processes
Def. A (discrete, time-independent) Markov decision process (MDP) is a Set-coalgebra $m = \langle u, t \rangle : S \to \mathbb{R} \times (\mathcal{D}S)^A$, where
- $S$ is a finite set of states,
- $A$ is a finite set of actions,
- $(\mathcal{D}, \delta, \mu)$ is the monad of finitely-supported distributions,
- $t : S \to (\mathcal{D}S)^A$ is a probabilistic transition function,
- $u : S \to \mathbb{R}$ is a reward function.
(Alternatively, $m : S \to (\mathbb{R} \times \mathcal{D}S)^A$, i.e. rewards are given on transitions.)
Def. A (deterministic, stationary) policy is a map $\sigma : S \to A$.
Slide 7: Expected Rewards via Trace Semantics
Given $m = \langle u, t \rangle : S \to \mathbb{R} \times (\mathcal{D}S)^A$ and a policy $\sigma : S \to A$, we get:
- a Markov reward process $m_\sigma = \langle u, t_\sigma \rangle : S \to \mathbb{R} \times \mathcal{D}S$, where $t_\sigma(s) := t(s)(\sigma(s))$;
- $m_\sigma^\sharp : \mathcal{D}S \to \mathbb{R} \times \mathcal{D}S$ by determinisation (cf. Jacobs, Silva, Sokolova), details coming up.
The trace map $\mathrm{trc} : S \to \mathbb{R}^\omega$ is the composite of the unit $\delta_S : S \to \mathcal{D}S$ with the unique morphism $! : \mathcal{D}S \to \mathbb{R}^\omega$ from $m_\sigma^\sharp$ into the final $(\mathbb{R} \times \mathrm{Id})$-coalgebra $\mathbb{R}^\omega \to \mathbb{R} \times \mathbb{R}^\omega$.
Trace semantics: $\mathrm{trc}(s) = (r_0^\sigma(s), r_1^\sigma(s), r_2^\sigma(s), \ldots)$, where $r_n^\sigma(s)$ is the expected reward at time step $n$ starting from $s$.
Slide 8: Distributive Laws (cf. Bartels (2004))
Let $(T, \eta, \mu)$ be a monad on $\mathcal{C}$ and $F : \mathcal{C} \to \mathcal{C}$ a functor. A distributive law of $(T, \eta, \mu)$ over $F$ is a natural transformation $\lambda : TF \Rightarrow FT$ satisfying $\lambda \circ \eta F = F\eta$ and $\lambda \circ \mu F = F\mu \circ \lambda T \circ T\lambda$.
Given such a $\lambda$, we obtain liftings
$F^\lambda : \mathrm{EM}(T) \to \mathrm{EM}(T)$: $(TA \xrightarrow{\alpha} A) \mapsto (TFA \xrightarrow{\lambda_A} FTA \xrightarrow{F\alpha} FA)$,
$T_\lambda : \mathrm{Coalg}(F) \to \mathrm{Coalg}(F)$: $(X \xrightarrow{c} FX) \mapsto (TX \xrightarrow{Tc} TFX \xrightarrow{\lambda_X} FTX)$,
... and determinisation
$(-)^\sharp : \mathrm{Coalg}(FT) \to \mathrm{Coalg}(F^\lambda)$: $(X \xrightarrow{c} FTX) \mapsto (TX \xrightarrow{Tc} TFTX \xrightarrow{\lambda_{TX}} FT^2X \xrightarrow{F\mu_X} FTX)$.
Slide 9: Distributive Law for Markov Reward Processes
A Markov reward process $m_\sigma : S \to \mathbb{R} \times \mathcal{D}S$ is an $H\mathcal{D}$-coalgebra, where $H = \mathbb{R} \times \mathrm{Id}$. There is a distributive law of $(\mathcal{D}, \delta, \mu)$ over $H$ (cf. Jacobs (2006)):
$\chi_X : \mathcal{D}(\mathbb{R} \times X) \xrightarrow{\langle \mathcal{D}\pi_1, \mathcal{D}\pi_2 \rangle} \mathcal{D}\mathbb{R} \times \mathcal{D}X \xrightarrow{E \times \mathrm{id}} \mathbb{R} \times \mathcal{D}X$, i.e. $\chi = (E \times \mathrm{id}) \circ \langle \mathcal{D}\pi_1, \mathcal{D}\pi_2 \rangle$,
which yields the determinisation $m_\sigma^\sharp : \mathcal{D}S \to \mathbb{R} \times \mathcal{D}S$ of the Markov reward process $m_\sigma$:
$m_\sigma^\sharp(\varphi) = \langle (E \circ \mathcal{D}u)(\varphi),\ (\mu_S \circ \mathcal{D}t_\sigma)(\varphi) \rangle = \big( \sum_{s \in S} u(s)\,\varphi(s),\ \ s' \mapsto \sum_{s \in S} t_\sigma(s)(s')\,\varphi(s) \big)$.
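Concretely, the determinisation formula above says: expected reward now, pushforward distribution next. A minimal plain-Python sketch on an illustrative two-state reward process (the numbers `u`, `t_sigma` are assumptions, not from the slides):

```python
# Determinised Markov reward process m_sigma^# : D(S) -> R x D(S), following the
# slide's formula: first component (E . D u), second component (mu_S . D t_sigma).
# Illustrative two-state example; u and t_sigma are assumed numbers.
u = {"x": 0.0, "y": 10.0}
t_sigma = {"x": {"x": 0.5, "y": 0.5}, "y": {"y": 1.0}}   # t_sigma(s) in D(S)

def m_sharp(phi):
    """Map a distribution phi over states to (expected reward, next distribution)."""
    expected_reward = sum(u[s] * p for s, p in phi.items())   # (E . D u)(phi)
    next_phi = {}
    for s, p in phi.items():                                  # (mu_S . D t_sigma)(phi)
        for s2, p2 in t_sigma[s].items():
            next_phi[s2] = next_phi.get(s2, 0.0) + p * p2
    return expected_reward, next_phi

r0, phi1 = m_sharp({"x": 1.0})   # start from the point distribution delta_x
```

Iterating `m_sharp` from $\delta_s$ and collecting the first components yields exactly the trace stream $(r_0^\sigma(s), r_1^\sigma(s), \ldots)$ of the next slide.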
Slide 10: Trace Semantics of Markov Reward Processes
As before, $\mathrm{trc} : S \to \mathbb{R}^\omega$ is $\delta_S$ followed by the unique morphism from the determinised coalgebra $m_\sigma^\sharp : \mathcal{D}S \to \mathbb{R} \times \mathcal{D}S$ into the final $(\mathbb{R} \times \mathrm{Id})$-coalgebra $\mathbb{R}^\omega$, and $\mathrm{trc}(s) = (r_0^\sigma(s), r_1^\sigma(s), r_2^\sigma(s), \ldots)$, where $r_n^\sigma(s)$ is the expected reward at time step $n$ starting from $s$.
The long-term expected value of $\sigma$ in $s$ depends on how you evaluate the stream $(r_0^\sigma(s), r_1^\sigma(s), r_2^\sigma(s), \ldots)$; different evaluation criteria exist.
Slide 11: Long-Term Value via Discounted Sums
Let $0 \le \gamma < 1$ be a discount factor.
Def. The long-term value of policy $\sigma$ according to the discounted sum criterion is $V^\sigma : S \to \mathbb{R}$:
$V^\sigma(s) = \sum_{n=0}^{\infty} \gamma^n\, r_n^\sigma(s)$.
The sum converges because the reward map $u : S \to \mathbb{R}$ is bounded.
We define: $\sigma \le \tau$ if $V^\sigma(s) \le V^\tau(s)$ for all $s$. $\sigma$ is an optimal policy if $\tau \le \sigma$ for all $\tau$.
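The discounted sum can be approximated directly from the trace stream: start from the point distribution $\delta_s$, read off the expected reward, push the distribution forward through $t_\sigma$, and accumulate $\gamma^n r_n$. A sketch on an illustrative two-state Markov reward process (assumed numbers, not the slide's example):

```python
# V^sigma(s) as a truncated discounted sum of the trace stream (r_0(s), r_1(s), ...).
# Illustrative two-state Markov reward process with assumed numbers.
gamma = 0.9
u = {"x": 0.0, "y": 10.0}
t_sigma = {"x": {"x": 0.5, "y": 0.5}, "y": {"y": 1.0}}

def value(s, n_steps=600):
    """Sum gamma^n * r_n(s) for n < n_steps (truncation error <= gamma^n_steps * sup|V|)."""
    phi = {s: 1.0}                  # delta_s, the unit of the distribution monad
    total, discount = 0.0, 1.0
    for _ in range(n_steps):
        r_n = sum(u[q] * p for q, p in phi.items())   # expected reward at step n
        total += discount * r_n
        discount *= gamma
        nxt = {}                    # push phi forward: (mu_S . D t_sigma)(phi)
        for q, p in phi.items():
            for q2, p2 in t_sigma[q].items():
                nxt[q2] = nxt.get(q2, 0.0) + p * p2
        phi = nxt
    return total

V = {s: value(s) for s in u}
```

Here the exact values can be read off the linear system $V = u + \gamma\, t_\sigma V$: $V(y) = 10/(1-\gamma) = 100$ and $V(x) = 0.45\,V(y)/0.55$, which the truncated sum reproduces to high precision.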
Slide 12: Optimal Value
Def. The optimal value function $V^* : S \to \mathbb{R}$ of $m$ is defined as $V^*(s) = \max_\sigma V^\sigma(s)$.
Classical facts (w.r.t. the discounted sum criterion), cf. (Puterman, 2014):
- If $\sigma$ is optimal, then $V^\sigma = V^*$.
- An optimal policy always exists.
- Optimal policies need not be unique.
- Stationary (memoryless), deterministic policies suffice.
Slide 13: Outline
1. MDP Preliminaries
2. Part I: Long-Term Values from b-corecursive Algebras
3. Part II: Policy Improvement (Co)Inductively
4. Conclusion
Slide 14: V^σ as a Coalgebra-to-Algebra Morphism
$V^\sigma : S \to \mathbb{R}$ satisfies, for all $s \in S$:
$V^\sigma(s) = u(s) + \gamma \sum_{s' \in S} t_\sigma(s)(s')\, V^\sigma(s')$   (1)
i.e. $V^\sigma = u + \gamma\, t_\sigma \cdot V^\sigma$ (as a linear system). So $V^\sigma$ arises as a fixpoint of the linear operator $\Psi_\sigma : \mathbb{R}^S \to \mathbb{R}^S$ given by $\Psi_\sigma(v) = u + \gamma\, t_\sigma \cdot v$.
Observation: we can re-express (1) as $V^\sigma$ being a coalgebra-to-algebra morphism from $m_\sigma = \langle u, t_\sigma \rangle : S \to \mathbb{R} \times \mathcal{D}S$ to $\alpha_\gamma \circ (\mathbb{R} \times E) : \mathbb{R} \times \mathcal{D}\mathbb{R} \to \mathbb{R}$:
$V^\sigma = \alpha_\gamma \circ (\mathbb{R} \times E) \circ (\mathbb{R} \times \mathcal{D}V^\sigma) \circ m_\sigma$,
where $\alpha_\gamma : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is $\alpha_\gamma(x, y) = x + \gamma y$.
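Since $\Psi_\sigma$ is a $\gamma$-contraction on bounded functions, its fixpoint $V^\sigma$ can be computed by plain iteration from any bounded starting point. A sketch, reusing illustrative two-state numbers rather than the slide's example:

```python
# Compute V^sigma as the fixpoint of Psi_sigma(v) = u + gamma * t_sigma . v
# by Banach-style iteration. Illustrative two-state example with assumed numbers.
gamma = 0.9
u = {"x": 0.0, "y": 10.0}
t_sigma = {"x": {"x": 0.5, "y": 0.5}, "y": {"y": 1.0}}

def psi(v):
    """One application of the linear operator Psi_sigma."""
    return {s: u[s] + gamma * sum(p * v[q] for q, p in t_sigma[s].items())
            for s in u}

v = {s: 0.0 for s in u}          # any bounded starting point works
while True:
    v_next = psi(v)
    if max(abs(v_next[s] - v[s]) for s in u) < 1e-10:
        break                    # sup-norm change below tolerance: v ~ fixpoint
    v = v_next
```

Because the contraction factor is $\gamma$, the final iterate is within (change)/(1-$\gamma$) of the true fixpoint, here within $10^{-9}$.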
Slide 15: V^σ via a Universal Property?
Recall that a corecursive algebra (for a functor $F$) is an $F$-algebra $\alpha : FA \to A$ such that every coalgebra $f : C \to FC$ has a unique solution $\hat f : C \to A$ with $\hat f = \alpha \circ F\hat f \circ f$.
Question: Is $\alpha_\gamma \circ (\mathbb{R} \times E)$ a corecursive algebra?
By (Capretta et al., 2004): let $H = \mathbb{R} \times \mathrm{Id}$. Then $\alpha_\gamma \circ (\mathbb{R} \times E)$ is a corecursive algebra (for $H\mathcal{D}$) iff $\alpha_\gamma$ is a corecursive algebra (for $H$).
Slide 16: Is α_γ a Corecursive Algebra?
$\alpha_\gamma : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is corecursive (for $H = \mathbb{R} \times \mathrm{Id}$) if for all $f : X \to \mathbb{R} \times X$ there is a unique $\hat f : X \to \mathbb{R}$ such that $\hat f = \alpha_\gamma \circ (\mathbb{R} \times \hat f) \circ f$.
Consider the coalgebra $f : X \to \mathbb{R} \times X$ given by the stream system $x_0 \xrightarrow{a_0} x_1 \xrightarrow{a_1} x_2 \xrightarrow{a_2} \cdots$, i.e. $f(x_n) = (a_n, x_{n+1})$.
Then $\hat f$ is a solution iff $\hat f(x_n) = a_n + \gamma\, \hat f(x_{n+1})$ for $n = 0, 1, 2, \ldots$.
This system has infinitely many solutions when $\gamma > 0$, even if $(a_n)_n$ is bounded. So $\alpha_\gamma$ is not corecursive for $\gamma > 0$. However, if $(a_n)_n$ is bounded, then the system has a unique bounded solution.
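To see the non-uniqueness concretely: given one solution $\hat f$, every real constant $c$ yields another, obtained by adding a geometric perturbation.

```latex
% If \hat f solves \hat f(x_n) = a_n + \gamma \hat f(x_{n+1}) for all n,
% then for any c \in \mathbb{R} the map \hat f_c(x_n) := \hat f(x_n) + c\,\gamma^{-n}
% is also a solution:
a_n + \gamma\,\hat f_c(x_{n+1})
  = a_n + \gamma\,\hat f(x_{n+1}) + \gamma \cdot c\,\gamma^{-(n+1)}
  = \hat f(x_n) + c\,\gamma^{-n}
  = \hat f_c(x_n).
```

For $0 < \gamma < 1$ and $c \neq 0$ the term $c\,\gamma^{-n}$ is unbounded, which is exactly why restricting to bounded solutions restores uniqueness.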
Slide 17: Bounded Corecursive Algebras (bca)
To get uniqueness, incorporate boundedness information.
Def. A b-category $(\mathcal{C}, B)$ is a category $\mathcal{C}$ together with a subclass $B \subseteq \mathrm{Mor}(\mathcal{C})$ of bounded morphisms such that $f \in B$ implies $f \circ g \in B$ for all composable $g$. (Also known as a sieve.)
Main example: $(\mathbf{Met}, B)$, where $\mathbf{Met}$ is metric spaces with all maps and $B$ is the class of all bounded maps.
Def. Let $(\mathcal{C}, B)$ be a b-category and $F : \mathcal{C} \to \mathcal{C}$ an endofunctor. An $F$-algebra $\alpha : FA \to A$ is a b-corecursive algebra (bca) if every coalgebra $f : X \to FX$ with $f \in B$ has a unique solution $\hat f : X \to A$ in $B$ such that $\hat f = \alpha \circ F\hat f \circ f$.
Slide 18: V^σ from the Universal Property of a bca
We show that $\alpha_\gamma \circ (\mathbb{R} \times E)$ is a bca for $H\mathcal{D}$:
1. Develop some theory of b-categories (b-functors, b-natural transformations, B-preservation properties, ...).
2. Prove a b-version of the (Capretta et al.) result (Theorem 2): from a bca for $H$ we obtain a bca for $H\mathcal{D}$, under certain conditions.
3. Show that $\alpha_\gamma$ is a bca for $H$.
4. Show that the conditions of Theorem 2 apply.
Slide 19: Step 1.1: b-Categories
Let $(\mathcal{C}, B)$ and $(\mathcal{C}', B')$ be b-categories, and let $F, G : \mathcal{C} \to \mathcal{C}'$ be functors.
Def. A $\mathcal{C}'$-arrow $f$ preserves $B'$ if $g \in B'$ implies $f \circ g \in B'$ (whenever $f \circ g$ is defined).
- $F$ is a b-functor if $f \in B$ implies that $Ff$ preserves $B'$.
- $F$ is a strong b-functor if $f \in B$ implies $Ff \in B'$.
- A natural transformation $\sigma : F \Rightarrow G$ is a b-natural transformation if all components $\sigma_X$ preserve $B'$.
Slide 20: Step 1.2: MDPs in (Met, B)
The functors $H = \mathbb{R} \times \mathrm{Id}$ and $\mathcal{D}$ are lifted to $\mathbf{Met}$:
- The product $(X, d_X) \times (Y, d_Y)$ is given the maximum metric.
- $\mathcal{D}(X, d_X) = (\mathcal{D}X, \bar{d}_X)$, where $\bar{d}_X$ is the Kantorovich lifting of $d_X$ to $\mathcal{D}X$.
We have:
- $H : \mathbf{Met} \to \mathbf{Met}$ is a b-functor on $(\mathbf{Met}, B)$, but not a strong one.
- $\mathcal{D} : \mathbf{Met} \to \mathbf{Met}$ is a strong b-functor on $(\mathbf{Met}, B)$.
- $\delta$, $\mu$, $\chi$ are b-natural transformations w.r.t. $(\mathbf{Met}, B)$.
- $B$ (bounded maps) is closed under determinisation: $c \in B$ implies $c^\sharp \in B$.
Slide 21: Step 2: b-Version of (Capretta et al.)
Theorem 2. Let $(\mathcal{C}, B)$ be a b-category, $F$ a $\mathcal{C}$-endofunctor, $(T, \eta, \mu)$ a monad on $\mathcal{C}$, and $\lambda$ a distributive law of $(T, \eta, \mu)$ over $F$. Assume further that $T$ is a strong b-functor and that $\lambda$ and $F\mu$ are b-natural in $(\mathcal{C}, B)$.
1. If $\beta : F^\lambda(A, \theta) \to (A, \theta)$ is an $F^\lambda$-algebra in $\mathrm{EM}(T)$ such that the underlying $\beta : FA \to A$ is a bca for $F$, and $\theta$ preserves $B$, then $\beta \circ F\theta : FTA \to A$ is a bca for $FT$.
2. Moreover, for all $g : X \to FTX$ in $B$, we have $\bar{g} = \widehat{g^\sharp} \circ \eta_X$ and $\widehat{g^\sharp} = \theta \circ T\bar{g}$, where $h \mapsto \hat{h}$ denotes the solution operation for the bca $\beta : FA \to A$ and $h \mapsto \bar{h}$ denotes the solution operation for the bca $\beta \circ F\theta : FTA \to A$.
Slide 22: Steps 3+4: Obtaining bcas
Step 3: Apply the Banach fixed-point theorem to show that $\alpha_\gamma$ is a bca for $H$.
Step 4: Apply Theorem 2:
- $\alpha_\gamma$ is an $H^\chi$-algebra on $(\mathbb{R}, E)$, i.e. the square $E \circ \mathcal{D}\alpha_\gamma = \alpha_\gamma \circ (E \times E) \circ \langle \mathcal{D}\pi_1, \mathcal{D}\pi_2 \rangle : \mathcal{D}(\mathbb{R} \times \mathbb{R}) \to \mathbb{R}$ commutes.
- $E$ preserves $B$.
Slide 23: Optimal Value V* from a bca
$V^* : S \to \mathbb{R}$ is the coalgebra-to-algebra morphism from $m = \langle u, t \rangle : S \to \mathbb{R} \times (\mathcal{D}S)^A$ to the algebra $\alpha_\gamma \circ (\mathbb{R} \times (\max_A \circ\, E^A)) : \mathbb{R} \times (\mathcal{D}\mathbb{R})^A \to \mathbb{R}$, i.e.
$V^* = \alpha_\gamma \circ (\mathbb{R} \times (\max_A \circ\, E^A)) \circ (\mathbb{R} \times (\mathcal{D}V^*)^A) \circ m$.
But this bca is not obtained via Theorem 2. Problem: $\max_A$ is not affine/linear. We can show directly that we have a bca.
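Even though $\max_A$ is not linear, the operator $v \mapsto u + \gamma \max_a (t_a \cdot v)$ is still a $\gamma$-contraction on bounded functions, so $V^*$ can again be computed by Banach iteration (value iteration). A sketch on the start-up example, with the same illustrative probabilities assumed earlier (the garbled diagram does not pin them down):

```python
# Value iteration: iterate the Bellman optimality operator
#   v |-> u + gamma * max_a E_{t_a}[v]
# which is a gamma-contraction. Transition probabilities are assumed numbers.
gamma = 0.9
states = ["PU", "PF", "RU", "RF"]
actions = ["S", "A"]
u = {"PU": 0.0, "PF": 0.0, "RU": 10.0, "RF": 10.0}
t = {
    "PU": {"S": {"PU": 1.0},            "A": {"PU": 0.5, "PF": 0.5}},
    "PF": {"S": {"PU": 0.5, "RF": 0.5}, "A": {"PF": 1.0}},
    "RU": {"S": {"PU": 0.5, "RU": 0.5}, "A": {"PU": 0.5, "PF": 0.5}},
    "RF": {"S": {"RU": 0.5, "RF": 0.5}, "A": {"PF": 1.0}},
}

def bellman(v):
    """One application of the Bellman optimality operator."""
    return {s: u[s] + gamma * max(sum(p * v[q] for q, p in t[s][a].items())
                                  for a in actions)
            for s in states}

v_star = {s: 0.0 for s in states}
for _ in range(500):              # geometric convergence: error ~ gamma^500
    v_star = bellman(v_star)
```

The fixpoint reached here is (up to numerical tolerance) the optimal value function $V^*$ for the assumed numbers.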
Slide 24: Outline
1. MDP Preliminaries
2. Part I: Long-Term Values from b-corecursive Algebras
3. Part II: Policy Improvement (Co)Inductively
4. Conclusion
Slide 25: Policy Iteration
For $\sigma : S \to A$ and $\varphi \in \mathcal{D}S$, let $l_\sigma(\varphi) = \sum_{s' \in S} \varphi(s')\, V^\sigma(s')$ (the expected long-term value for $\sigma$ w.r.t. $\varphi$).
Policy Iteration Algorithm:
1. Initialise $\sigma_0$ to any policy.
2. Compute $V^{\sigma_k}$ (e.g. by solving a system of linear equations).
3. Define $\sigma_{k+1}$ by $\sigma_{k+1}(s) := \mathrm{argmax}_{a \in A} \{ l_{\sigma_k}(t_a(s)) \}$.
4. If $\sigma_{k+1} = \sigma_k$ then stop, else go to step 2.
Termination: since $A^S$ is finite. Correctness: follows if $\sigma_{k+1} \ge \sigma_k$.
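The four steps above can be sketched directly in plain Python. The MDP numbers are the same illustrative assumptions used for the start-up example; evaluation (step 2) is done by iterating $\Psi_\sigma$ rather than by a linear solve, which for a discount factor below 1 converges to the same $V^\sigma$:

```python
# Policy iteration on the (assumed) start-up MDP numbers.
gamma = 0.9
states = ["PU", "PF", "RU", "RF"]
actions = ["S", "A"]
u = {"PU": 0.0, "PF": 0.0, "RU": 10.0, "RF": 10.0}
t = {
    "PU": {"S": {"PU": 1.0},            "A": {"PU": 0.5, "PF": 0.5}},
    "PF": {"S": {"PU": 0.5, "RF": 0.5}, "A": {"PF": 1.0}},
    "RU": {"S": {"PU": 0.5, "RU": 0.5}, "A": {"PU": 0.5, "PF": 0.5}},
    "RF": {"S": {"RU": 0.5, "RF": 0.5}, "A": {"PF": 1.0}},
}

def evaluate(sigma, iters=500):
    """Step 2: V^sigma, here by iterating Psi_sigma (a linear solve also works)."""
    v = {s: 0.0 for s in states}
    for _ in range(iters):
        v = {s: u[s] + gamma * sum(p * v[q] for q, p in t[s][sigma[s]].items())
             for s in states}
    return v

def l(v_sigma, phi):
    """l_sigma(phi): expected long-term value of the distribution phi."""
    return sum(p * v_sigma[q] for q, p in phi.items())

sigma = {s: "S" for s in states}                       # step 1: arbitrary initial policy
while True:
    v = evaluate(sigma)                                 # step 2
    sigma_next = {s: max(actions, key=lambda a: l(v, t[s][a]))
                  for s in states}                      # step 3: greedy improvement
    if sigma_next == sigma:                             # step 4: stop when stable
        break
    sigma = sigma_next
```

At termination the returned policy is greedy with respect to its own value function, which (by the policy improvement lemma on the next slides) is the optimality condition.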
Slide 26: Policy Improvement
By definition, for all $s \in S$, $\sigma_{k+1}(s) = \mathrm{argmax}_{a \in A} \{ l_{\sigma_k}(t_a(s)) \}$, which implies that for all $s \in S$, $l_{\sigma_k}(t_{\sigma_{k+1}}(s)) \ge l_{\sigma_k}(t_{\sigma_k}(s))$, i.e. $l_{\sigma_k} \circ t_{\sigma_{k+1}} \ge l_{\sigma_k} \circ t_{\sigma_k}$ (in the pointwise order on $\mathbb{R}^S$).
Policy Improvement Lemma: For all $\sigma, \tau$: if $l_\sigma \circ t_\tau \ge l_\sigma \circ t_\sigma$, then $V^\tau \ge V^\sigma$.
Slide 27: Contraction (Co)Induction
Def. (Ordered metric space) An ordered (complete) metric space $(M, d, \le)$ is a (complete) metric space $(M, d)$ together with a partial order $(M, \le)$ such that for all $y \in M$, the sets $\{z \mid z \le y\}$ and $\{z \mid y \le z\}$ are closed in the metric topology.
Example: $B(X, \mathbb{R})$ with the pointwise order and supremum metric.
Theorem (Contraction (Co)Induction). Let $M$ be a non-empty, ordered complete metric space. If $f : M \to M$ is both contractive and order-preserving, then the fixpoint $x^*$ of $f$ is:
(i) a least pre-fixpoint (if $f(x) \le x$, then $x^* \le x$), and
(ii) a greatest post-fixpoint (if $x \le f(x)$, then $x \le x^*$).
Cf. Metric Coinduction (Kozen & Ruozzi, 2009) and (Denardo, 1967).
Slide 28: Proof of Policy Improvement
Policy Improvement Lemma: For all $\sigma, \tau$: if $l_\sigma \circ t_\tau \ge l_\sigma \circ t_\sigma$, then $V^\tau \ge V^\sigma$.
Proof: Apply contraction (co)induction to $\Psi_\tau : \mathbb{R}^S \to \mathbb{R}^S$, $\Psi_\tau(v) = u + \gamma\, t_\tau \cdot v$, which is contractive and order-preserving and has $V^\tau$ as its fixpoint. We have:
$l_\sigma \circ t_\tau \ge l_\sigma \circ t_\sigma$
$\Rightarrow\ u + \gamma\,(l_\sigma \circ t_\tau) \ge u + \gamma\,(l_\sigma \circ t_\sigma)$
$\Rightarrow\ \Psi_\tau(V^\sigma) \ge \Psi_\sigma(V^\sigma) = V^\sigma$
$\Rightarrow\ V^\tau \ge V^\sigma$, by contraction (co)induction, since $V^\sigma$ is a post-fixpoint of $\Psi_\tau$.
Slide 29: Conclusion
- Value functions $V^\sigma$ and $V^*$ from b-corecursive algebras.
- Coinductive proof of the policy improvement theorem.
- We still need to resort to the Banach FPT to get fixpoints; cf. metric coinduction (Kozen & Ruozzi, CALCO 2009).
Future work:
- Stochastic games (an MDP is a 1-player stochastic game): existence of Nash equilibria = Kakutani + Contraction (Co)Induction.
- Other types of equilibria (subgame perfect, Markov, ...).
- Connections to open games (Hedges et al.).
- Semantics of equilibria (Pavlovic, 2009).
- Coalgebraic infinite games (Abramsky & Winschel, 2017).
- Learning: reinforcement learning.
Thanks!
More informationExponential utility maximization under partial information
Exponential utility maximization under partial information Marina Santacroce Politecnico di Torino Joint work with M. Mania AMaMeF 5-1 May, 28 Pitesti, May 1th, 28 Outline Expected utility maximization
More informationFinal exam solutions
EE365 Stochastic Control / MS&E251 Stochastic Decision Models Profs. S. Lall, S. Boyd June 5 6 or June 6 7, 2013 Final exam solutions This is a 24 hour take-home final. Please turn it in to one of the
More informationTABLEAU-BASED DECISION PROCEDURES FOR HYBRID LOGIC
TABLEAU-BASED DECISION PROCEDURES FOR HYBRID LOGIC THOMAS BOLANDER AND TORBEN BRAÜNER Abstract. Hybrid logics are a principled generalization of both modal logics and description logics. It is well-known
More informationRobust hedging with tradable options under price impact
- Robust hedging with tradable options under price impact Arash Fahim, Florida State University joint work with Y-J Huang, DCU, Dublin March 2016, ECFM, WPI practice is not robust - Pricing under a selected
More informationForecast Horizons for Production Planning with Stochastic Demand
Forecast Horizons for Production Planning with Stochastic Demand Alfredo Garcia and Robert L. Smith Department of Industrial and Operations Engineering Universityof Michigan, Ann Arbor MI 48109 December
More informationPh.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017
Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017 The time limit for this exam is four hours. The exam has four sections. Each section includes two questions.
More informationBrief Notes on the Category Theoretic Semantics of Simply Typed Lambda Calculus
University of Cambridge 2017 MPhil ACS / CST Part III Category Theory and Logic (L108) Brief Notes on the Category Theoretic Semantics of Simply Typed Lambda Calculus Andrew Pitts Notation: comma-separated
More informationbeing saturated Lemma 0.2 Suppose V = L[E]. Every Woodin cardinal is Woodin with.
On NS ω1 being saturated Ralf Schindler 1 Institut für Mathematische Logik und Grundlagenforschung, Universität Münster Einsteinstr. 62, 48149 Münster, Germany Definition 0.1 Let δ be a cardinal. We say
More informationarxiv: v1 [math.oc] 23 Dec 2010
ASYMPTOTIC PROPERTIES OF OPTIMAL TRAJECTORIES IN DYNAMIC PROGRAMMING SYLVAIN SORIN, XAVIER VENEL, GUILLAUME VIGERAL Abstract. We show in a dynamic programming framework that uniform convergence of the
More informationISBN ISSN
UNIVERSITY OF OSLO Department of Informatics A Logic-based Approach to Decision Making (extended version) Research Report 441 Magdalena Ivanovska Martin Giese ISBN 82-7368-373-7 ISSN 0806-3036 A Logic-based
More informationAMH4 - ADVANCED OPTION PRICING. Contents
AMH4 - ADVANCED OPTION PRICING ANDREW TULLOCH Contents 1. Theory of Option Pricing 2 2. Black-Scholes PDE Method 4 3. Martingale method 4 4. Monte Carlo methods 5 4.1. Method of antithetic variances 5
More informationSlides III - Complete Markets
Slides III - Complete Markets Julio Garín University of Georgia Macroeconomic Theory II (Ph.D.) Spring 2017 Macroeconomic Theory II Slides III - Complete Markets Spring 2017 1 / 33 Outline 1. Risk, Uncertainty,
More informationLecture 4: Model-Free Prediction
Lecture 4: Model-Free Prediction David Silver Outline 1 Introduction 2 Monte-Carlo Learning 3 Temporal-Difference Learning 4 TD(λ) Introduction Model-Free Reinforcement Learning Last lecture: Planning
More informationStochastic Games and Bayesian Games
Stochastic Games and Bayesian Games CPSC 532L Lecture 10 Stochastic Games and Bayesian Games CPSC 532L Lecture 10, Slide 1 Lecture Overview 1 Recap 2 Stochastic Games 3 Bayesian Games Stochastic Games
More informationSPDE and portfolio choice (joint work with M. Musiela) Princeton University. Thaleia Zariphopoulou The University of Texas at Austin
SPDE and portfolio choice (joint work with M. Musiela) Princeton University November 2007 Thaleia Zariphopoulou The University of Texas at Austin 1 Performance measurement of investment strategies 2 Market
More informationAnswers to Problem Set 4
Answers to Problem Set 4 Economics 703 Spring 016 1. a) The monopolist facing no threat of entry will pick the first cost function. To see this, calculate profits with each one. With the first cost function,
More informationLecture 14: Basic Fixpoint Theorems (cont.)
Lecture 14: Basic Fixpoint Theorems (cont) Predicate Transformers Monotonicity and Continuity Existence of Fixpoints Computing Fixpoints Fixpoint Characterization of CTL Operators 1 2 E M Clarke and E
More informationFinite Memory and Imperfect Monitoring
Federal Reserve Bank of Minneapolis Research Department Finite Memory and Imperfect Monitoring Harold L. Cole and Narayana Kocherlakota Working Paper 604 September 2000 Cole: U.C.L.A. and Federal Reserve
More informationStochastic Proximal Algorithms with Applications to Online Image Recovery
1/24 Stochastic Proximal Algorithms with Applications to Online Image Recovery Patrick Louis Combettes 1 and Jean-Christophe Pesquet 2 1 Mathematics Department, North Carolina State University, Raleigh,
More informationNot 0,4 2,1. i. Show there is a perfect Bayesian equilibrium where player A chooses to play, player A chooses L, and player B chooses L.
Econ 400, Final Exam Name: There are three questions taken from the material covered so far in the course. ll questions are equally weighted. If you have a question, please raise your hand and I will come
More informationThe Irrevocable Multi-Armed Bandit Problem
The Irrevocable Multi-Armed Bandit Problem Ritesh Madan Qualcomm-Flarion Technologies May 27, 2009 Joint work with Vivek Farias (MIT) 2 Multi-Armed Bandit Problem n arms, where each arm i is a Markov Decision
More informationAppendix: Common Currencies vs. Monetary Independence
Appendix: Common Currencies vs. Monetary Independence A The infinite horizon model This section defines the equilibrium of the infinity horizon model described in Section III of the paper and characterizes
More informationOptimal stopping problems for a Brownian motion with a disorder on a finite interval
Optimal stopping problems for a Brownian motion with a disorder on a finite interval A. N. Shiryaev M. V. Zhitlukhin arxiv:1212.379v1 [math.st] 15 Dec 212 December 18, 212 Abstract We consider optimal
More information