Long-Term Values in MDPs (Second Workshop on Open Games)


1 A (Co)Algebraic Perspective on Long-Term Values in MDPs
Second Workshop on Open Games
Helle Hvid Hansen, Delft University of Technology
Helle Hvid Hansen (TU Delft), 2nd WS Open Games, Oxford, 4-6 July

2 Introduction
Joint work with Frank Feys (Delft) and Larry Moss (Indiana): Long-Term Values in Markov Decision Processes, (Co)Algebraically. Proc. of Coalgebraic Methods in Computer Science (CMCS 2018).
Aim: apply (co)algebraic techniques to reason about Markov decision processes, and more generally about infinite games and equilibria (cf. Abramsky & Winschel, Lescanne, Hedges, Zahn, Ghani, Kupke, Lambert, Nordvall-Forsberg, ...).

3 Outline
1. MDP Preliminaries
2. Part I: Long-Term Values from b-corecursive Algebras
3. Part II: Policy Improvement (Co)Inductively
4. Conclusion

4 MDPs: Planning under Uncertainty
Markov decision processes (MDPs) are state-based models of sequential decision-making under uncertainty. The system/agent chooses actions and collects rewards, but does not have full control over transitions. The decision maker wants to find a policy/plan that maximizes the expected long-term reward.
Applications: maintenance schedules, production planning, finance, reinforcement learning, ...
MDPs are one-player stochastic games.

5 MDP Example
A start-up company must repeatedly decide whether to Advertise (A) or Save money (S).
[Diagram: a four-state MDP with states Poor & Unknown (reward +0), Poor & Famous (+0), Rich & Unknown (+10) and Rich & Famous (+10); from each state the actions A and S move the company between these states with transition probabilities 1/2 or 1.]

6 Markov Decision Processes
Def. A (discrete, time-independent) Markov decision process (MDP) is a Set-coalgebra m = ⟨u, t⟩ : S → ℝ × (𝒟S)^A, where
- S is a finite set of states,
- A is a finite set of actions,
- (𝒟, δ, µ) is the monad of finitely supported probability distributions,
- t : S → (𝒟S)^A is a probabilistic transition function,
- u : S → ℝ is a reward function.
(Alternatively, m : S → 𝒟(ℝ × S)^A, i.e. rewards are given on transitions.)
Def. A (deterministic, stationary) policy is a map σ : S → A.
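To make the definition concrete, here is a minimal sketch of how a finite MDP m = ⟨u, t⟩ : S → ℝ × (𝒟S)^A might be encoded in Python. The two-state example below is hypothetical illustration data, not the start-up MDP from the slides.

```python
# A finite MDP m = <u, t> : S -> R x (D S)^A, encoded as plain dictionaries.
# States, actions and transitions are a hypothetical toy example.
states = ["s0", "s1"]
actions = ["a", "b"]

# u : S -> R, the reward function
u = {"s0": 0.0, "s1": 10.0}

# t : S -> (D S)^A; each t[s][a] is a finitely supported distribution on S
t = {
    "s0": {"a": {"s0": 0.5, "s1": 0.5}, "b": {"s0": 1.0}},
    "s1": {"a": {"s1": 1.0}, "b": {"s0": 0.5, "s1": 0.5}},
}

# Sanity check: every t[s][a] is a probability distribution
for s in states:
    for a in actions:
        assert abs(sum(t[s][a].values()) - 1.0) < 1e-9
```

A deterministic, stationary policy σ : S → A is then just another dictionary, e.g. `{"s0": "a", "s1": "a"}`.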

7 Expected Rewards via Trace Semantics
Given m = ⟨u, t⟩ : S → ℝ × (𝒟S)^A and a policy σ : S → A, we get the Markov reward process m_σ = ⟨u, t_σ⟩ : S → ℝ × 𝒟S, where t_σ(s) := t(s)(σ(s)).
Determinisation (cf. Jacobs, Silva, Sokolova; details coming up) yields m_σ♯ : 𝒟S → ℝ × 𝒟S.
[Diagram: the trace map trc : S → ℝ^ω factors as the unit δ_S : S → 𝒟S followed by the unique coalgebra morphism from m_σ♯ into the final (ℝ × Id)-coalgebra ℝ^ω.]
Trace semantics: trc(s) = (r_0^σ(s), r_1^σ(s), r_2^σ(s), ...), where r_n^σ(s) is the expected reward at time step n starting from s.

8 Distributive Laws (cf. Bartels (2004))
Let (T, η, µ) be a monad on C and F : C → C a functor. A distributive law of (T, η, µ) over F is a natural transformation λ : TF ⇒ FT satisfying λ ∘ ηF = Fη and λ ∘ µF = Fµ ∘ λT ∘ Tλ.
Given such a λ, we obtain liftings
- F^λ : EM(T) → EM(T), sending α : TA → A to Fα ∘ λ_A : TFA → FA,
- T^λ : Coalg(F) → Coalg(F), sending c : X → FX to λ_X ∘ Tc : TX → FTX,
... and determinisation
- (−)♯ : Coalg(FT) → Coalg(F^λ), sending c : X → FTX to Fµ_X ∘ λ_TX ∘ Tc : TX → FTX.

9 Distributive Law for Markov Reward Processes
The Markov reward process m_σ : S → ℝ × 𝒟S is an H𝒟-coalgebra, where H = ℝ × Id.
There is a distributive law of (𝒟, δ, µ) over H (cf. Jacobs (2006)):
χ_X : 𝒟(ℝ × X) → ℝ × 𝒟X,   χ_X = ⟨E ∘ 𝒟π₁, 𝒟π₂⟩,
where E : 𝒟ℝ → ℝ is expectation. This yields the determinisation of the Markov reward process m_σ:
m_σ♯ : 𝒟S → ℝ × 𝒟S given by
m_σ♯(ϕ) = ⟨(E ∘ 𝒟u)(ϕ), (µ_S ∘ 𝒟t_σ)(ϕ)⟩ = ( Σ_{s∈S} u(s)·ϕ(s),  s' ↦ Σ_{s∈S} t_σ(s)(s')·ϕ(s) ).

10 Trace Semantics of Markov Reward Process
[Diagram: as before, trc : S → ℝ^ω is the unit δ_S followed by the unique coalgebra morphism from m_σ♯ : 𝒟S → ℝ × 𝒟S into the final (ℝ × Id)-coalgebra ℝ^ω.]
Trace semantics: trc(s) = (r_0^σ(s), r_1^σ(s), r_2^σ(s), ...), where r_n^σ(s) is the expected reward at time step n starting from s.
The long-term expected value of σ in s depends on how you evaluate the stream (r_0^σ(s), r_1^σ(s), r_2^σ(s), ...). Different evaluation criteria exist ...

11 Long-Term Value via Discounted Sums
Let 0 ≤ γ < 1 be a discount factor.
Def. The long-term value of policy σ according to the discounted sum criterion is V^σ : S → ℝ,
V^σ(s) = Σ_{n=0}^∞ γ^n · r_n^σ(s).
The sum converges because the reward map u : S → ℝ is bounded.
We define: σ ⊒ τ if V^σ(s) ≥ V^τ(s) for all s. A policy σ is optimal if σ ⊒ τ for all τ.
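As a sanity check on the definition, V^σ can be approximated by truncating the discounted sum over the trace (r_0^σ, r_1^σ, ...). The following sketch pushes a state distribution through a hypothetical two-state Markov reward process (toy data, not the slides' example):

```python
# Discounted-sum value V^sigma(s) = sum_n gamma^n r_n^sigma(s),
# approximated by truncating the sum after n_steps terms.
gamma = 0.9

u = {"s0": 0.0, "s1": 10.0}                   # reward function u : S -> R
t_sigma = {"s0": {"s0": 0.5, "s1": 0.5},      # t_sigma(s) = t(s)(sigma(s))
           "s1": {"s1": 1.0}}

def value(s, n_steps=500):
    """Truncated discounted sum over the trace (r_0, r_1, ...)."""
    phi = {s: 1.0}                            # state distribution at time n
    total, discount = 0.0, 1.0
    for _ in range(n_steps):
        r_n = sum(phi[x] * u[x] for x in phi)  # expected reward at step n
        total += discount * r_n
        discount *= gamma
        nxt = {}
        for x, p in phi.items():              # push phi forward through t_sigma
            for y, q in t_sigma[x].items():
                nxt[y] = nxt.get(y, 0.0) + p * q
        phi = nxt
    return total

# From s1 we stay in s1 forever, so the value is 10/(1-0.9) = 100
# up to (tiny) truncation error.
print(round(value("s1"), 3))
```

The truncation error is bounded by γ^n_steps times the sup of the trace, which is why the discount factor γ < 1 (and boundedness of u) matters for convergence.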

12 Optimal Value
Def. The optimal value function V* : S → ℝ of m is defined as
V*(s) = max_σ V^σ(s).
Classical facts (wrt the discounted sum criterion), cf. (Puterman, 2014):
- If σ is optimal, then V^σ = V*.
- An optimal policy always exists.
- Optimal policies need not be unique.
- Stationary (memoryless), deterministic policies suffice.

13 Outline
1. MDP Preliminaries
2. Part I: Long-Term Values from b-corecursive Algebras
3. Part II: Policy Improvement (Co)Inductively
4. Conclusion

14 V^σ as Coalgebra-to-Algebra Morphism
V^σ : S → ℝ satisfies, for all s ∈ S,
V^σ(s) = u(s) + γ · Σ_{s'∈S} t_σ(s)(s') · V^σ(s'),   (1)
i.e. V^σ = u + γ·t_σ·V^σ (as a linear system). So V^σ arises as a fixpoint of the linear operator Ψ_σ : ℝ^S → ℝ^S given by Ψ_σ(v) = u + γ·t_σ·v.
Observation: we can re-express (1) as saying that V^σ is a coalgebra-to-algebra morphism:
V^σ = α_γ ∘ (id_ℝ × (E ∘ 𝒟(V^σ))) ∘ m_σ,
i.e. a morphism from the coalgebra m_σ = ⟨u, t_σ⟩ : S → ℝ × 𝒟S to the algebra α_γ ∘ (id_ℝ × E) : ℝ × 𝒟ℝ → ℝ, where α_γ : ℝ × ℝ → ℝ is α_γ(x, y) = x + γ·y.
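The fixpoint characterisation suggests a direct computation: since Ψ_σ is a γ-contraction in the sup norm, iterating it from any starting point converges to the unique bounded fixpoint V^σ. A sketch on hypothetical toy data (not the slides' example):

```python
# Fixed-point iteration of Psi_sigma(v) = u + gamma * t_sigma v.
# Psi_sigma is a gamma-contraction in the sup norm, so iteration
# converges to the unique bounded fixpoint V^sigma.
gamma = 0.9
states = ["s0", "s1"]
u = {"s0": 0.0, "s1": 10.0}
t_sigma = {"s0": {"s0": 0.5, "s1": 0.5}, "s1": {"s1": 1.0}}

def psi(v):
    """One application of the policy operator Psi_sigma."""
    return {s: u[s] + gamma * sum(p * v[s2] for s2, p in t_sigma[s].items())
            for s in states}

v = {s: 0.0 for s in states}
for _ in range(1000):
    v = psi(v)

print({s: round(v[s], 3) for s in states})
```

After n iterations the error is at most γ^n times the initial error, which is the quantitative content of the Banach fixpoint argument used later in the talk.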

15 V^σ via Universal Property?
Recall that a corecursive algebra (for a functor F) is an F-algebra α : FA → A such that every coalgebra f : C → FC has a unique morphism f† : C → A with f† = α ∘ F(f†) ∘ f.
Question: Is α_γ ∘ (ℝ × E) a corecursive algebra?
By (Capretta et al., 2004): let H = ℝ × Id. Then α_γ ∘ (ℝ × E) is a corecursive algebra (for H𝒟) iff α_γ is a corecursive algebra (for H).

16 Is α_γ a Corecursive Algebra?
α_γ : ℝ × ℝ → ℝ is corecursive (for H = ℝ × Id) if for every f : X → ℝ × X there is a unique f† : X → ℝ such that f† = α_γ ∘ (ℝ × f†) ∘ f.
Consider the coalgebra f : X → ℝ × X given by x_n ↦ (a_n, x_{n+1}), n = 0, 1, 2, ....
Then f† is a solution iff f†(x_n) = a_n + γ · f†(x_{n+1}) for all n = 0, 1, 2, ....
This system has infinitely many solutions when γ > 0, even if (a_n)_n is bounded. So α_γ is not corecursive for γ > 0.
However, if (a_n)_n is bounded, then this system has a unique bounded solution.
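The non-uniqueness claim admits a one-line worked instance (an illustration added here, taking the homogeneous case a_n = 0 for all n):

```latex
\text{For } a_n = 0 \ (n \geq 0), \text{ every } c \in \mathbb{R}
\text{ yields a solution } f_c(x_n) = c\,\gamma^{-n}:
\qquad
f_c(x_n) \;=\; c\,\gamma^{-n} \;=\; \gamma \cdot c\,\gamma^{-(n+1)}
\;=\; \gamma\, f_c(x_{n+1}).
% For 0 < gamma < 1 the sequence (c gamma^{-n})_n is unbounded unless c = 0,
% so f = 0 is the unique *bounded* solution, as the slide asserts.
```

For 0 < γ < 1 the family (c·γ^{-n})_n is unbounded unless c = 0, so f ≡ 0 is the unique bounded solution.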

17 Bounded Corecursive Algebra (bca)
To get uniqueness, incorporate boundedness information.
Def. A b-category (C, B) is a category C together with a subclass B ⊆ Mor(C) of bounded morphisms such that f ∈ B implies f ∘ g ∈ B for every composable g. (Such a B is also known as a sieve.)
Main example: (Met, B), where Met is the category of metric spaces with all maps, and B consists of all bounded maps.
Def. Let (C, B) be a b-category and F : C → C an endofunctor. An F-algebra α : FA → A is a b-corecursive algebra (bca) if every coalgebra f : X → FX with f ∈ B has a unique solution f† ∈ B, i.e. a unique f† : X → A in B with f† = α ∘ F(f†) ∘ f.

18 V^σ from Universal Property of bca
We show that α_γ ∘ (ℝ × E) is a bca for H𝒟:
1. Develop some theory of b-categories (b-functors, b-natural transformations, B-preservation properties, ...).
2. Prove a b-version of the (Capretta et al.) result (Theorem 2): from a bca for H we obtain a bca for H𝒟 under certain conditions.
3. Show that α_γ is a bca for H.
4. Show that the conditions of Theorem 2 apply.

19 Step 1.1: b-Categories
Let (C, B) and (C', B') be b-categories, and let F, G : C → C' be functors.
Def.
- An arrow f preserves B if g ∈ B implies f ∘ g ∈ B (whenever f ∘ g is defined).
- F is a b-functor if f ∈ B implies that Ff preserves B'.
- F is a strong b-functor if f ∈ B implies Ff ∈ B'.
- A natural transformation σ : F ⇒ G is a b-natural transformation if all components σ_X preserve B'.

20 Step 1.2: MDPs in (Met, B)
The functors H = ℝ × Id and 𝒟 are lifted to Met:
- The product (X, d_X) × (Y, d_Y) is given the maximum metric.
- 𝒟(X, d_X) = (𝒟X, d̂_X), where d̂_X is the Kantorovich lifting of d_X to 𝒟X.
We have:
- H : Met → Met is a b-functor on (Met, B), but not a strong one.
- 𝒟 : Met → Met is a strong b-functor on (Met, B).
- δ, µ, χ are b-natural transformations wrt (Met, B).
- B (the bounded maps) is closed under determinisation: c ∈ B implies c♯ ∈ B.

21 Step 2: b-Version of (Capretta et al.)
Theorem 2. Let (C, B) be a b-category, F a C-endofunctor, (T, η, µ) a monad on C, and λ a distributive law of (T, η, µ) over F. Assume further that T is a strong b-functor and that λ and Fµ are b-natural in (C, B).
1. If β : F^λ(A, θ) → (A, θ) is an F^λ-algebra in EM(T) such that the underlying β : FA → A is a bca for F, and θ preserves B, then β ∘ Fθ : FTA → A is a bca for FT.
2. Moreover, for all g : X → FTX in B, we have g† = (g♯)‡ ∘ η_X and (g♯)‡ = θ ∘ T(g†), where h ↦ h‡ is the solution operation for the bca β : FA → A and h ↦ h† is the solution operation for the bca β ∘ Fθ : FTA → A.

22 Step 3+4: Obtaining bcas
Step 3: Apply the Banach fixpoint theorem to show that α_γ is a bca for H.
Step 4: Apply Theorem 2:
- α_γ is an H^λ-algebra on (ℝ, E): by linearity of expectation, α_γ ∘ (ℝ × E) ∘ ⟨E ∘ 𝒟π₁, 𝒟π₂⟩ = E ∘ 𝒟α_γ : 𝒟(ℝ × ℝ) → ℝ.
- E preserves B.

23 Optimal Value V* from bca
[Diagram: V* : S → ℝ is the unique bounded coalgebra-to-algebra morphism from m = ⟨u, t⟩ : S → ℝ × (𝒟S)^A to the algebra α_γ ∘ (ℝ × max_A) ∘ (ℝ × E^A) : ℝ × (𝒟ℝ)^A → ℝ.]
But this bca is not obtained via Theorem 2. Problem: max_A is not affine/linear. We can show directly that we have a bca.

24 Outline
1. MDP Preliminaries
2. Part I: Long-Term Values from b-corecursive Algebras
3. Part II: Policy Improvement (Co)Inductively
4. Conclusion

25 Policy Iteration
For σ : S → A and ϕ ∈ 𝒟S, let l_σ(ϕ) = Σ_{s'∈S} ϕ(s')·V^σ(s') (the expected long-term value of σ wrt ϕ).
Policy Iteration Algorithm:
1. Initialise σ_0 to any policy.
2. Compute V^{σ_k} (e.g. by solving a system of linear equations).
3. Define σ_{k+1} by σ_{k+1}(s) := argmax_{a∈A} { l_{σ_k}(t_a(s)) }, where t_a(s) = t(s)(a).
4. If σ_{k+1} = σ_k then stop, else go to step 2.
Termination: guaranteed, since A^S is finite.
Correctness: follows if σ_{k+1} ⊒ σ_k.
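The four steps above can be sketched directly in code. The MDP below is hypothetical toy data (not the slides' start-up example), and policy evaluation is done by fixed-point iteration rather than by solving the linear system exactly:

```python
# Policy iteration on a hypothetical two-state, two-action MDP.
gamma = 0.9
states = ["s0", "s1"]
actions = ["a", "b"]
u = {"s0": 0.0, "s1": 10.0}
t = {"s0": {"a": {"s0": 0.5, "s1": 0.5}, "b": {"s0": 1.0}},
     "s1": {"a": {"s1": 1.0}, "b": {"s0": 0.5, "s1": 0.5}}}

def policy_value(sigma, iters=2000):
    """V^sigma as the fixpoint of Psi_sigma(v) = u + gamma * t_sigma v."""
    v = {s: 0.0 for s in states}
    for _ in range(iters):
        v = {s: u[s] + gamma * sum(p * v[s2]
                                   for s2, p in t[s][sigma[s]].items())
             for s in states}
    return v

def l(v, phi):
    """l_sigma(phi) = sum_s phi(s) * V^sigma(s), with v = V^sigma."""
    return sum(p * v[s] for s, p in phi.items())

sigma = {s: "a" for s in states}              # step 1: any initial policy
while True:
    v = policy_value(sigma)                   # step 2: evaluate sigma_k
    improved = {s: max(actions, key=lambda a: l(v, t[s][a]))
                for s in states}              # step 3: greedy improvement
    if improved == sigma:                     # step 4: stable -> stop
        break
    sigma = improved

print(sigma)
```

Termination is exactly the slide's argument: each improvement step either changes the policy (and there are only |A|^|S| policies) or stops.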

26 Policy Improvement
By definition, for all s ∈ S, σ_{k+1}(s) = argmax_{a∈A} { l_{σ_k}(t_a(s)) },
which implies l_{σ_k}(t_{σ_{k+1}}(s)) ≥ l_{σ_k}(t_{σ_k}(s)) for all s ∈ S, i.e.
l_{σ_k} ∘ t_{σ_{k+1}} ≥ l_{σ_k} ∘ t_{σ_k} (in the pointwise order on ℝ^S).
Policy Improvement Lemma: For all σ, τ:
l_σ ∘ t_τ ≥ l_σ ∘ t_σ  ⟹  V^τ ≥ V^σ.

27 Contraction (Co)Induction
Def. (Ordered metric space) An ordered (complete) metric space (M, d, ≤) is a (complete) metric space (M, d) together with a partial order (M, ≤) such that for all y ∈ M, the sets {z | z ≤ y} and {z | y ≤ z} are closed in the metric topology.
Example: B(X, ℝ), the bounded real-valued functions on X, with the pointwise order and the supremum metric.
Theorem (Contraction (Co)Induction). Let M be a non-empty, ordered complete metric space. If f : M → M is both contractive and order-preserving, then the fixpoint x* of f is
(i) a least pre-fixpoint: if f(x) ≤ x, then x* ≤ x, and
(ii) a greatest post-fixpoint: if x ≤ f(x), then x ≤ x*.
Cf. Metric Coinduction (Kozen & Ruozzi, 2009) and (Denardo, 1967).
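To see the principle in action on the operators from Part I, here is a small numeric check (hypothetical toy data): any post-fixpoint of the contractive, order-preserving operator Ψ_σ lies below its fixpoint V^σ.

```python
# Numeric illustration of contraction (co)induction for
# Psi_sigma(v) = u + gamma * t_sigma v on B(S, R) with the pointwise
# order and sup metric: v <= Psi_sigma(v) implies v <= V^sigma.
gamma = 0.9
states = ["s0", "s1"]
u = {"s0": 0.0, "s1": 10.0}
t_sigma = {"s0": {"s0": 0.5, "s1": 0.5}, "s1": {"s1": 1.0}}

def psi(v):
    return {s: u[s] + gamma * sum(p * v[s2] for s2, p in t_sigma[s].items())
            for s in states}

# The fixpoint V^sigma, obtained by Banach iteration
V = {s: 0.0 for s in states}
for _ in range(2000):
    V = psi(V)

v = {"s0": 10.0, "s1": 50.0}                    # a candidate post-fixpoint
assert all(v[s] <= psi(v)[s] for s in states)   # v <= Psi_sigma(v) ...
assert all(v[s] <= V[s] for s in states)        # ... hence v <= V^sigma
```

Of course a finite check is only an illustration; the theorem is what licenses the inference in general.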

28 Proof of Policy Improvement
Policy Improvement Lemma: For all σ, τ: l_σ ∘ t_τ ≥ l_σ ∘ t_σ ⟹ V^τ ≥ V^σ.
Proof: Each Ψ_σ : ℝ^S → ℝ^S, Ψ_σ(v) = u + γ·t_σ·v, is contractive and order-preserving, and V^σ is its fixpoint. We have:
l_σ ∘ t_τ ≥ l_σ ∘ t_σ
⟹ u + γ·(l_σ ∘ t_τ) ≥ u + γ·(l_σ ∘ t_σ)
⟹ Ψ_τ(V^σ) ≥ Ψ_σ(V^σ) = V^σ
⟹ V^τ ≥ V^σ (by contraction (co)induction applied to Ψ_τ).

29 Conclusion
- Value functions V^σ and V* from b-corecursive algebras.
- Coinductive proof of the policy improvement theorem.
- We still need to resort to the Banach fixpoint theorem to get fixpoints; cf. metric coinduction (Kozen & Ruozzi, CALCO 2009).
Future work:
- Stochastic games (an MDP is a 1-player stochastic game): existence of Nash equilibria = Kakutani + Contraction (Co)Induction.
- Other types of equilibria (subgame perfect, Markov, ...).
- Connections to open games (Hedges et al.), semantics of equilibria (Pavlovic, 2009), coalgebraic infinite games (Abramsky & Winschel, 2017).
- Learning: reinforcement learning.
Thanks!


More information

Assets with possibly negative dividends

Assets with possibly negative dividends Assets with possibly negative dividends (Preliminary and incomplete. Comments welcome.) Ngoc-Sang PHAM Montpellier Business School March 12, 2017 Abstract The paper introduces assets whose dividends can

More information

Chapter 3: Black-Scholes Equation and Its Numerical Evaluation

Chapter 3: Black-Scholes Equation and Its Numerical Evaluation Chapter 3: Black-Scholes Equation and Its Numerical Evaluation 3.1 Itô Integral 3.1.1 Convergence in the Mean and Stieltjes Integral Definition 3.1 (Convergence in the Mean) A sequence {X n } n ln of random

More information

Intro to Reinforcement Learning. Part 3: Core Theory

Intro to Reinforcement Learning. Part 3: Core Theory Intro to Reinforcement Learning Part 3: Core Theory Interactive Example: You are the algorithm! Finite Markov decision processes (finite MDPs) dynamics p p p Experience: S 0 A 0 R 1 S 1 A 1 R 2 S 2 A 2

More information

1 A tax on capital income in a neoclassical growth model

1 A tax on capital income in a neoclassical growth model 1 A tax on capital income in a neoclassical growth model We look at a standard neoclassical growth model. The representative consumer maximizes U = β t u(c t ) (1) t=0 where c t is consumption in period

More information

Interpolation of κ-compactness and PCF

Interpolation of κ-compactness and PCF Comment.Math.Univ.Carolin. 50,2(2009) 315 320 315 Interpolation of κ-compactness and PCF István Juhász, Zoltán Szentmiklóssy Abstract. We call a topological space κ-compact if every subset of size κ has

More information

An effective perfect-set theorem

An effective perfect-set theorem An effective perfect-set theorem David Belanger, joint with Keng Meng (Selwyn) Ng CTFM 2016 at Waseda University, Tokyo Institute for Mathematical Sciences National University of Singapore The perfect

More information

Introduction to Probability Theory and Stochastic Processes for Finance Lecture Notes

Introduction to Probability Theory and Stochastic Processes for Finance Lecture Notes Introduction to Probability Theory and Stochastic Processes for Finance Lecture Notes Fabio Trojani Department of Economics, University of St. Gallen, Switzerland Correspondence address: Fabio Trojani,

More information

Equivalence between Semimartingales and Itô Processes

Equivalence between Semimartingales and Itô Processes International Journal of Mathematical Analysis Vol. 9, 215, no. 16, 787-791 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/1.12988/ijma.215.411358 Equivalence between Semimartingales and Itô Processes

More information

Game Theory for Wireless Engineers Chapter 3, 4

Game Theory for Wireless Engineers Chapter 3, 4 Game Theory for Wireless Engineers Chapter 3, 4 Zhongliang Liang ECE@Mcmaster Univ October 8, 2009 Outline Chapter 3 - Strategic Form Games - 3.1 Definition of A Strategic Form Game - 3.2 Dominated Strategies

More information

PURE-STRATEGY EQUILIBRIA WITH NON-EXPECTED UTILITY PLAYERS

PURE-STRATEGY EQUILIBRIA WITH NON-EXPECTED UTILITY PLAYERS HO-CHYUAN CHEN and WILLIAM S. NEILSON PURE-STRATEGY EQUILIBRIA WITH NON-EXPECTED UTILITY PLAYERS ABSTRACT. A pure-strategy equilibrium existence theorem is extended to include games with non-expected utility

More information

CS792 Notes Henkin Models, Soundness and Completeness

CS792 Notes Henkin Models, Soundness and Completeness CS792 Notes Henkin Models, Soundness and Completeness Arranged by Alexandra Stefan March 24, 2005 These notes are a summary of chapters 4.5.1-4.5.5 from [1]. 1 Review indexed family of sets: A s, where

More information

Model-independent bounds for Asian options

Model-independent bounds for Asian options Model-independent bounds for Asian options A dynamic programming approach Alexander M. G. Cox 1 Sigrid Källblad 2 1 University of Bath 2 CMAP, École Polytechnique University of Michigan, 2nd December,

More information

Gödel algebras free over finite distributive lattices

Gödel algebras free over finite distributive lattices TANCL, Oxford, August 4-9, 2007 1 Gödel algebras free over finite distributive lattices Stefano Aguzzoli Brunella Gerla Vincenzo Marra D.S.I. D.I.COM. D.I.C.O. University of Milano University of Insubria

More information

GAME THEORY. Department of Economics, MIT, Follow Muhamet s slides. We need the following result for future reference.

GAME THEORY. Department of Economics, MIT, Follow Muhamet s slides. We need the following result for future reference. 14.126 GAME THEORY MIHAI MANEA Department of Economics, MIT, 1. Existence and Continuity of Nash Equilibria Follow Muhamet s slides. We need the following result for future reference. Theorem 1. Suppose

More information

Unary PCF is Decidable

Unary PCF is Decidable Unary PCF is Decidable Ralph Loader Merton College, Oxford November 1995, revised October 1996 and September 1997. Abstract We show that unary PCF, a very small fragment of Plotkin s PCF [?], has a decidable

More information

Exponential utility maximization under partial information

Exponential utility maximization under partial information Exponential utility maximization under partial information Marina Santacroce Politecnico di Torino Joint work with M. Mania AMaMeF 5-1 May, 28 Pitesti, May 1th, 28 Outline Expected utility maximization

More information

Final exam solutions

Final exam solutions EE365 Stochastic Control / MS&E251 Stochastic Decision Models Profs. S. Lall, S. Boyd June 5 6 or June 6 7, 2013 Final exam solutions This is a 24 hour take-home final. Please turn it in to one of the

More information

TABLEAU-BASED DECISION PROCEDURES FOR HYBRID LOGIC

TABLEAU-BASED DECISION PROCEDURES FOR HYBRID LOGIC TABLEAU-BASED DECISION PROCEDURES FOR HYBRID LOGIC THOMAS BOLANDER AND TORBEN BRAÜNER Abstract. Hybrid logics are a principled generalization of both modal logics and description logics. It is well-known

More information

Robust hedging with tradable options under price impact

Robust hedging with tradable options under price impact - Robust hedging with tradable options under price impact Arash Fahim, Florida State University joint work with Y-J Huang, DCU, Dublin March 2016, ECFM, WPI practice is not robust - Pricing under a selected

More information

Forecast Horizons for Production Planning with Stochastic Demand

Forecast Horizons for Production Planning with Stochastic Demand Forecast Horizons for Production Planning with Stochastic Demand Alfredo Garcia and Robert L. Smith Department of Industrial and Operations Engineering Universityof Michigan, Ann Arbor MI 48109 December

More information

Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017

Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017 Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017 The time limit for this exam is four hours. The exam has four sections. Each section includes two questions.

More information

Brief Notes on the Category Theoretic Semantics of Simply Typed Lambda Calculus

Brief Notes on the Category Theoretic Semantics of Simply Typed Lambda Calculus University of Cambridge 2017 MPhil ACS / CST Part III Category Theory and Logic (L108) Brief Notes on the Category Theoretic Semantics of Simply Typed Lambda Calculus Andrew Pitts Notation: comma-separated

More information

being saturated Lemma 0.2 Suppose V = L[E]. Every Woodin cardinal is Woodin with.

being saturated Lemma 0.2 Suppose V = L[E]. Every Woodin cardinal is Woodin with. On NS ω1 being saturated Ralf Schindler 1 Institut für Mathematische Logik und Grundlagenforschung, Universität Münster Einsteinstr. 62, 48149 Münster, Germany Definition 0.1 Let δ be a cardinal. We say

More information

arxiv: v1 [math.oc] 23 Dec 2010

arxiv: v1 [math.oc] 23 Dec 2010 ASYMPTOTIC PROPERTIES OF OPTIMAL TRAJECTORIES IN DYNAMIC PROGRAMMING SYLVAIN SORIN, XAVIER VENEL, GUILLAUME VIGERAL Abstract. We show in a dynamic programming framework that uniform convergence of the

More information

ISBN ISSN

ISBN ISSN UNIVERSITY OF OSLO Department of Informatics A Logic-based Approach to Decision Making (extended version) Research Report 441 Magdalena Ivanovska Martin Giese ISBN 82-7368-373-7 ISSN 0806-3036 A Logic-based

More information

AMH4 - ADVANCED OPTION PRICING. Contents

AMH4 - ADVANCED OPTION PRICING. Contents AMH4 - ADVANCED OPTION PRICING ANDREW TULLOCH Contents 1. Theory of Option Pricing 2 2. Black-Scholes PDE Method 4 3. Martingale method 4 4. Monte Carlo methods 5 4.1. Method of antithetic variances 5

More information

Slides III - Complete Markets

Slides III - Complete Markets Slides III - Complete Markets Julio Garín University of Georgia Macroeconomic Theory II (Ph.D.) Spring 2017 Macroeconomic Theory II Slides III - Complete Markets Spring 2017 1 / 33 Outline 1. Risk, Uncertainty,

More information

Lecture 4: Model-Free Prediction

Lecture 4: Model-Free Prediction Lecture 4: Model-Free Prediction David Silver Outline 1 Introduction 2 Monte-Carlo Learning 3 Temporal-Difference Learning 4 TD(λ) Introduction Model-Free Reinforcement Learning Last lecture: Planning

More information

Stochastic Games and Bayesian Games

Stochastic Games and Bayesian Games Stochastic Games and Bayesian Games CPSC 532L Lecture 10 Stochastic Games and Bayesian Games CPSC 532L Lecture 10, Slide 1 Lecture Overview 1 Recap 2 Stochastic Games 3 Bayesian Games Stochastic Games

More information

SPDE and portfolio choice (joint work with M. Musiela) Princeton University. Thaleia Zariphopoulou The University of Texas at Austin

SPDE and portfolio choice (joint work with M. Musiela) Princeton University. Thaleia Zariphopoulou The University of Texas at Austin SPDE and portfolio choice (joint work with M. Musiela) Princeton University November 2007 Thaleia Zariphopoulou The University of Texas at Austin 1 Performance measurement of investment strategies 2 Market

More information

Answers to Problem Set 4

Answers to Problem Set 4 Answers to Problem Set 4 Economics 703 Spring 016 1. a) The monopolist facing no threat of entry will pick the first cost function. To see this, calculate profits with each one. With the first cost function,

More information

Lecture 14: Basic Fixpoint Theorems (cont.)

Lecture 14: Basic Fixpoint Theorems (cont.) Lecture 14: Basic Fixpoint Theorems (cont) Predicate Transformers Monotonicity and Continuity Existence of Fixpoints Computing Fixpoints Fixpoint Characterization of CTL Operators 1 2 E M Clarke and E

More information

Finite Memory and Imperfect Monitoring

Finite Memory and Imperfect Monitoring Federal Reserve Bank of Minneapolis Research Department Finite Memory and Imperfect Monitoring Harold L. Cole and Narayana Kocherlakota Working Paper 604 September 2000 Cole: U.C.L.A. and Federal Reserve

More information

Stochastic Proximal Algorithms with Applications to Online Image Recovery

Stochastic Proximal Algorithms with Applications to Online Image Recovery 1/24 Stochastic Proximal Algorithms with Applications to Online Image Recovery Patrick Louis Combettes 1 and Jean-Christophe Pesquet 2 1 Mathematics Department, North Carolina State University, Raleigh,

More information

Not 0,4 2,1. i. Show there is a perfect Bayesian equilibrium where player A chooses to play, player A chooses L, and player B chooses L.

Not 0,4 2,1. i. Show there is a perfect Bayesian equilibrium where player A chooses to play, player A chooses L, and player B chooses L. Econ 400, Final Exam Name: There are three questions taken from the material covered so far in the course. ll questions are equally weighted. If you have a question, please raise your hand and I will come

More information

The Irrevocable Multi-Armed Bandit Problem

The Irrevocable Multi-Armed Bandit Problem The Irrevocable Multi-Armed Bandit Problem Ritesh Madan Qualcomm-Flarion Technologies May 27, 2009 Joint work with Vivek Farias (MIT) 2 Multi-Armed Bandit Problem n arms, where each arm i is a Markov Decision

More information

Appendix: Common Currencies vs. Monetary Independence

Appendix: Common Currencies vs. Monetary Independence Appendix: Common Currencies vs. Monetary Independence A The infinite horizon model This section defines the equilibrium of the infinity horizon model described in Section III of the paper and characterizes

More information

Optimal stopping problems for a Brownian motion with a disorder on a finite interval

Optimal stopping problems for a Brownian motion with a disorder on a finite interval Optimal stopping problems for a Brownian motion with a disorder on a finite interval A. N. Shiryaev M. V. Zhitlukhin arxiv:1212.379v1 [math.st] 15 Dec 212 December 18, 212 Abstract We consider optimal

More information