Optimal Stopping. Nick Hay (presentation follows Thomas Ferguson's Optimal Stopping and Applications). November 6, 2008


Optimal Stopping. Nick Hay (presentation follows Thomas Ferguson's Optimal Stopping and Applications). November 6, 2008 1 / 35

Contents: Introduction, Problems, Markov Models, Monotone Stopping Problems, Summary. 2 / 35

The Secretary problem. You have one secretarial position open and you interview n (known) candidates sequentially: The candidates appear in uniformly random order. There is a true ranking from best to worst, but you observe only the relative ranks of the candidates seen so far. You must accept or reject each candidate on the spot. Which rule maximises the probability of hiring the best candidate? 3 / 35

The Secretary problem. We accept only if the current candidate is relatively best (best of those seen so far). The probability that the jth candidate is overall best, given that it is relatively best, is j/n. Let W_j be the probability of winning the best secretary using an optimal rule that rejects the first j applicants. W_{j+1} ≤ W_j, since a rule which rejects the first j + 1 also rejects the first j. It is optimal to accept the jth candidate (when relatively best) iff j/n ≥ W_j. Since (j + 1)/n > j/n and W_j ≥ W_{j+1}, an optimal rule will, for some r, reject the first r − 1 candidates and then accept the next relatively best candidate. Call this rule N_r. This is a threshold rule. 4 / 35

The Secretary problem. The probability of winning using the rule N_r is (for r ≥ 2; N_1 simply accepts the first candidate, so P_1 = 1/n)

P_r = Σ_{k=r}^{n} P(kth applicant is best and is selected)
    = Σ_{k=r}^{n} P(kth applicant is best) · P(kth applicant is selected | it is best)
    = Σ_{k=r}^{n} (1/n) · P(best of first k − 1 appears before stage r)
    = Σ_{k=r}^{n} (1/n) · (r − 1)/(k − 1)
    = ((r − 1)/n) · Σ_{k=r}^{n} 1/(k − 1). 5 / 35

The Secretary problem. Comparing consecutive thresholds,

P_{r+1} ≤ P_r  iff  (r/n) Σ_{k=r+1}^{n} 1/(k − 1) ≤ ((r − 1)/n) Σ_{k=r}^{n} 1/(k − 1)  iff  Σ_{k=r+1}^{n} 1/(k − 1) ≤ 1.

The optimal rule selects the first relatively best candidate that appears from stage r_1 on, where

r_1 = min{ r ≥ 1 : Σ_{k=r+1}^{n} 1/(k − 1) ≤ 1 }.

Since the sum is approximately log(n/r), setting it equal to 1 gives r_1/n ≈ 1/e, and P_{r_1} ≈ 1/e. 6 / 35

The Secretary problem. The exact optimal stopping thresholds and winning probabilities:

n        1      2      3      4      5      6      7      8
r_1      1      1      2      2      3      3      3      4
P_{r_1}  1.000  .500   .500   .458   .433   .428   .414   .410

Compare with 1/e ≈ 0.368. 7 / 35
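
To make this concrete, here is a short Python sketch (not part of the original presentation; the function name is illustrative) that evaluates r_1 and P_{r_1} directly from the formulas on the previous slides. It reproduces the table above for n = 1, ..., 8.

```python
def secretary_threshold(n):
    """Optimal threshold r_1 and winning probability P_{r_1} for n candidates.

    r_1 is the smallest r >= 1 with sum_{k=r+1}^{n} 1/(k-1) <= 1, and
    P_r = ((r-1)/n) * sum_{k=r}^{n} 1/(k-1) for r >= 2, with P_1 = 1/n."""
    def win_prob(r):
        if r == 1:
            return 1.0 / n  # accepting the first candidate wins iff it is the best
        return (r - 1) / n * sum(1.0 / (k - 1) for k in range(r, n + 1))

    r1 = next(r for r in range(1, n + 1)
              if sum(1.0 / (k - 1) for k in range(r + 1, n + 1)) <= 1)
    return r1, win_prob(r1)

for n in range(1, 9):
    r1, p = secretary_threshold(n)
    print(n, r1, round(p, 3))
```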

Problems. An optimal stopping problem consists of a sequence X_1, X_2, ... of random variables with a given joint distribution, and a sequence of real-valued reward functions: y_0, y_1(x_1), y_2(x_1, x_2), ..., y_∞(x_1, x_2, ...). A stopping rule is a sequence of functions taking values in [0, 1], φ = (φ_0, φ_1(x_1), φ_2(x_1, x_2), ...), which give the probability of stopping at stage n conditional on the n observations so far. If these functions' values are always 0 or 1, the stopping rule is deterministic. Equivalently, a stopping rule is a random variable N satisfying, for each n, the conditional independence P(N = n | X = x) = P(N = n | X_{1:n} = x_{1:n}). 8 / 35

Problems. The expected return V(φ) of a stopping rule is V(φ) = E[y_N(X_1, ..., X_N)]. An optimal stopping rule maximises this expectation. Deterministic rewards are without loss of generality: we can expand X_i to capture the randomness of the ith reward. The secretary problem is an optimal stopping problem with X_i taking values in {1, ..., i} (the relative rank of the ith candidate), and

y_i(x_1, ..., x_i) = i/n if x_i = 1 and i ≤ n; 0 if x_i > 1 and i ≤ n; 0 otherwise;

our solution defined a deterministic optimal stopping rule. 9 / 35

Examples. The house-selling/job-search problem. Offers X_i come in daily for an asset you wish to sell. The offers are iid, and there is a cost c per day of waiting. If you can recall previous offers, you receive utility max_{i ≤ n} X_i − nc when stopping at time n; if you cannot, you receive X_n − nc. Detecting a change point. A sequence of variables X_1, ... is initially distributed iid according to F_0(x), but at some unknown time T switches to F_1(x). Stopping at time n incurs cost Y_n = c·1[n < T] + (n − T)·1[n ≥ T], whose conditional expectation given the data is c·P(T > n | X_{1:n} = x_{1:n}) + E[(n − T)^+ | X_{1:n} = x_{1:n}]. Applications: monitoring heart patients, production quality, missile courses. 10 / 35

Examples. Search for a new species. Individual beetles are observed at unit time intervals; with probability p_j, independently, the next observation is a member of species µ_j. There is a cost c per observation, and the reward is the number of distinct species seen. Sequential statistical decision problems. You have a prior τ(θ) over a parameter Θ, and your goal is to choose an action a ∈ A maximising the utility U(θ, a) gained. Before deciding, you can observe variables X_1, X_2, ... sequentially, at a cost of c each. The X_i are iid given θ with distribution F(x | θ). If you stop at stage n, you select the action maximising the conditional expected utility: a*(X_1, ..., X_n) = argmax_{a ∈ A} E[U(Θ, a) | X_1, ..., X_n]. This is a terminal decision rule, and it can be chosen independently of the stopping rule. 11 / 35

Finite Horizon problems. A finite horizon problem is one where, for some horizon T, y_i(x_1, ..., x_i) = −∞ for all i > T. These problems can be solved by backwards induction. Define V^(T)_T(x_{1:T}) = y_T(x_{1:T}), and inductively

V^(T)_j(x_{1:j}) = max{ y_j(x_{1:j}), E[ V^(T)_{j+1}(x_{1:j}, X_{j+1}) | X_{1:j} = x_{1:j} ] }.

It is optimal to stop at stage j iff V^(T)_j(x_{1:j}) = y_j(x_{1:j}). 12 / 35
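
As an illustration of backwards induction (again not from the slides), here is a sketch for a finite-horizon variant of the house-selling problem from the Examples slide: offers are assumed iid Uniform(0, 1), waiting costs c per period, there is no recall, and stopping is forced by stage T. The horizon, cost, and grid size are illustrative choices. Because the offers are iid, the value function at each stage depends only on the current offer, so the optimal rule is a time-dependent threshold.

```python
import numpy as np

def house_selling_thresholds(T=10, c=0.05, grid=10_001):
    """Finite-horizon house selling without recall, offers X_i iid Uniform(0, 1).

    Reward for stopping at stage j is x_j - j*c; stopping is forced by stage T.
    Backward induction: V_T(x) = x - T*c and V_j(x) = max(x - j*c, E[V_{j+1}(X)]),
    so the optimal rule at stage j < T is "stop iff x_j >= E[V_{j+1}(X)] + j*c".
    Returns the stopping thresholds for stages 1, ..., T."""
    xs = np.linspace(0.0, 1.0, grid)        # grid approximating the Uniform(0, 1) offers
    cont = np.mean(xs - T * c)              # E[V_T(X)] = 1/2 - T*c
    thresholds = {T: 0.0}                   # at the horizon you must accept any offer
    for j in range(T - 1, 0, -1):
        thresholds[j] = min(max(cont + j * c, 0.0), 1.0)
        V_j = np.maximum(xs - j * c, cont)  # value function at stage j
        cont = np.mean(V_j)                 # E[V_j(X)], the continuation value at stage j - 1
    return [thresholds[j] for j in range(1, T + 1)]

print([round(b, 3) for b in house_selling_thresholds()])
```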

Secretary problem with arbitrary monotonic utility. Utility U(j) for accepting a candidate of absolute rank j, nonincreasing in j. Selecting no candidate has utility Z. The probability that the jth candidate has absolute rank b, given that it has relative rank x, is

f(b | j, x) = C(b − 1, x − 1) · C(n − b, j − x) / C(n, j).

The expected utility of stopping on a candidate with relative rank x = x_j is

y_j(x_{1:j}) = y_j(x) = Σ_{b=x}^{n−j+x} U(b) f(b | j, x).

Due to a recursion on f of the same form, with y_n(x) = U(x) we find that

y_{j−1}(x) = (x/j) y_j(x + 1) + ((j − x)/j) y_j(x). 13 / 35

Monotonic utility. The above gives us a recurrence for the value function. With V^(n)_n(x_n) = max(U(x_n), Z), we have

V^(n)_j(x_j) = max{ y_j(x_j), (1/(j + 1)) Σ_{x=1}^{j+1} V^(n)_{j+1}(x) }. 14 / 35

Monotonic utility. Lemma. If it is optimal to select an applicant of relative rank x at stage k, then it is also optimal at (x − 1, k) and at (x, k + 1). Proof. Define A(j) = (1/j) Σ_{i=1}^{j} V^(n)_j(i). By hypothesis y_k(x) ≥ A(k + 1). Since y_k(x − 1) ≥ y_k(x) we have y_k(x − 1) ≥ A(k + 1). By our previous recursion for y_j(x), one can see that y_{k+1}(x) ≥ y_k(x), so since A(k + 1) ≥ A(k + 2) we have y_{k+1}(x) ≥ A(k + 2). Consequence: the optimal rule is defined by thresholds 1 ≤ r_1 ≤ r_2 ≤ ... ≤ r_n ≤ n, where if at stage j you see a candidate with relative rank x, you stop iff r_x ≤ j. 15 / 35
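
The recursions on the last two slides translate directly into code. The sketch below (illustrative, not from the presentation) computes the thresholds r_x; with U(1) = 1, U(b) = 0 for b > 1 and Z = 0 it reduces to the classical secretary problem, so for n = 8 it should report r_1 = 4, matching the earlier table. For that utility only relative rank 1 is worth accepting before the final stage.

```python
from fractions import Fraction

def secretary_thresholds(n, U, Z):
    """Optimal thresholds for the secretary problem with utility U(b) for hiring a
    candidate of absolute rank b (nonincreasing in b) and utility Z for hiring no one.

    Recursions from the slides:
        y_n(x) = U(x),  y_{j-1}(x) = (x/j) y_j(x+1) + ((j-x)/j) y_j(x),
        V_n(x) = max(U(x), Z),  V_j(x) = max(y_j(x), A_{j+1}),
        A_{j+1} = (1/(j+1)) sum_{x=1}^{j+1} V_{j+1}(x).
    Returns r[x] = earliest stage at which accepting relative rank x is optimal
    (None if it never is)."""
    y = {x: Fraction(U(x)) for x in range(1, n + 1)}                     # y_n
    V = {x: max(Fraction(U(x)), Fraction(Z)) for x in range(1, n + 1)}   # V_n
    r = {x: (n if Fraction(U(x)) >= Fraction(Z) else None) for x in range(1, n + 1)}
    for j in range(n - 1, 0, -1):
        A = sum(V.values()) / (j + 1)                                    # continuation value at stage j
        y = {x: Fraction(x, j + 1) * y[x + 1] + Fraction(j + 1 - x, j + 1) * y[x]
             for x in range(1, j + 1)}                                   # y_j from y_{j+1}
        V = {x: max(y[x], A) for x in range(1, j + 1)}                   # V_j
        for x in range(1, j + 1):
            if y[x] >= A:                                                # stopping optimal at (x, j)
                r[x] = j
    return r

# Classical secretary problem: only the best candidate has value.
print(secretary_thresholds(8, lambda b: 1 if b == 1 else 0, Z=0))
```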

Existence of optimal stopping rules. Optimal stopping rules always exist for finite horizon problems, but not necessarily in the unbounded case. E.g. Y_∞ = 0 and Y_n = (2^n − 1) Π_{i=1}^{n} X_i for X_i independent fair coin flips (X_i ∈ {0, 1}): the return for stopping at stage n without failure is 2^n − 1, while continuing one more step gives expected value at least (2^{n+1} − 1)/2, which is better, so no rule is optimal. Similarly: Y_0 = 0, Y_n = 1 − 1/n, Y_∞ = 0. Two assumptions suffice to prove existence: A1. E[sup_n Y_n] < ∞. A2. lim sup_{n→∞} Y_n ≤ Y_∞ a.s. 16 / 35

Optimality equation. Two properties of the optimal value function are useful for later results. The principle of optimality: it is optimal to stop iff y_n(x_{1:n}) = V*_n(x_{1:n}), where V*_n(x_{1:n}) = ess sup_{N ≥ n} E[Y_N | X_{1:n} = x_{1:n}]. The optimality equation: V*_n(x_{1:n}) = max( Y_n, E[V*_{n+1}(X_{1:n+1}) | X_{1:n} = x_{1:n}] ), where Y_n = y_n(x_{1:n}). 17 / 35

Prophets. A prophet can observe all the Y_n values and pick the best. Denote by M = E[sup_n Y_n] the prophet's expected value. How much larger than V, the value of the optimal stopping rule, can M be? 18 / 35

Prophet inequalities. Theorem: for X_i a sequence of independent nonnegative random variables with payoff Y_i = X_i, we have M ≤ 2V. The proof is constructive: by examining the marginal distributions we can find a stopping rule that does no worse than 1/2 of the prophet's value. There are a number of other results of this nature. 19 / 35
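
A small numerical illustration (assumed setup, not from the slides): for X_i iid Uniform(0, 1) with horizon n, the optimal stopper's value follows from the backward recursion v_1 = 1/2, v_{k+1} = E[max(X, v_k)] = (1 + v_k²)/2, while the prophet's value is E[max_i X_i] = n/(n + 1). The ratio M/V stays well below the bound of 2.

```python
def prophet_vs_stopper(n):
    """X_i iid Uniform(0, 1), payoff Y_i = X_i, horizon n.

    Returns (M, V): the prophet's value E[max_i X_i] = n/(n+1) and the optimal
    stopper's value from backward induction v_{k+1} = (1 + v_k**2) / 2."""
    v = 0.5                      # value with one offer remaining
    for _ in range(n - 1):
        v = (1 + v * v) / 2      # E[max(U, v)] for U ~ Uniform(0, 1)
    return n / (n + 1), v

for n in (2, 5, 10, 100):
    M, V = prophet_vs_stopper(n)
    print(n, round(M, 3), round(V, 3), round(M / V, 3))   # ratio stays below 2
```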

Markov Models. [Diagram: a Markov chain X_1 → X_2 → X_3 → ..., with reward Y_n depending only on X_n.] Let {X_n}_n be a sequence of random variables forming a Markov chain, with Y_n = u_n(X_n). Then V*_n(x_1, ..., x_n) is a function of x_n alone, denoted V*_n(x_n); this is the optimal value function of the corresponding MDP. The principle of optimality gives the rule N = min{n ≥ 0 : u_n(X_n) = V*_n(X_n)}. 20 / 35

Example: selling an asset with and without recall. X_1, X_2, X_3, ... The variables X_i are offers on a house, or expected values of actions we've computed. We suppose the observations are iid with distribution F(x). If recall is not allowed, then Y_n = X_n − nc, with Y_0 = Y_∞ = −∞. If recall is allowed, then Y_n = M_n − nc where M_n = max{X_1, ..., X_n}, again with Y_0 = Y_∞ = −∞. One can show this problem satisfies A1 and A2, and so has an optimal rule. The problem is invariant in time: after observing a value and paying a cost, the remaining problem looks exactly like the original, the cost already paid being sunk. 21 / 35

Example: selling an asset with and without recall. Time invariance and monotonicity in X_n, combined with the principle of optimality, give N = min{n ≥ 1 : X_n ≥ V*}. To compute V*, use the optimality equation:

V* = E[max{X_1, V*}] − c = V* ∫_{−∞}^{V*} dF(x) + ∫_{V*}^{∞} x dF(x) − c,

so

∫_{V*}^{∞} (x − V*) dF(x) = c.

The integral is continuous in V* and decreasing from +∞ to 0, hence there exists a unique solution V*. For F uniform on [0, 1], a simple computation finds

V* = 1 − (2c)^{1/2} if c ≤ 1/2, and V* = 1/2 − c if c > 1/2. 22 / 35
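
The optimality equation can be solved numerically for an arbitrary offer distribution. The sketch below (illustrative; the sample size, cost, and tolerance are assumptions) solves E[(X − V)^+] = c by bisection over a Monte Carlo sample of offers, and checks the Uniform(0, 1) case against the exact answer 1 − √(2c).

```python
import numpy as np

def reservation_value(sample_offers, c, tol=1e-6):
    """Solve E[(X - V)^+] = c for V by bisection over a Monte Carlo sample of offers."""
    xs = np.asarray(sample_offers, dtype=float)
    g = lambda v: np.mean(np.maximum(xs - v, 0.0)) - c   # decreasing in v
    lo, hi = xs.min() - 1.0, xs.max()                    # g(lo) > 0 > g(hi) for reasonable c
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
c = 0.1
v_hat = reservation_value(rng.uniform(0, 1, 1_000_000), c)
print(round(v_hat, 4), round(1 - (2 * c) ** 0.5, 4))     # estimate vs exact 1 - sqrt(2c)
```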

Example: testing simple statistical hypotheses. The special case of sequential statistical decision problems with two hypotheses Θ = {H_0, H_1}, where P(x | H_i) = f_i(x), and each action accepts one hypothesis: A = {a_0, a_1}. The utility is U(H_i, a_j) = 0 if i = j and −L_i if i ≠ j, for given positive numbers L_0, L_1. Denote by τ_0 the prior probability of H_0. The posterior probability of H_0 is

τ_n(X_1, ..., X_n) = τ_0 λ_n / (τ_0 λ_n + (1 − τ_0)),

where the likelihood ratio is λ_n = Π_i f_0(X_i) / f_1(X_i). 23 / 35

Example: testing simple statistical hypotheses. Upon stopping with τ the probability of H_0, the expected utility is that of the best action: ρ(τ) = max{ −τ L_0, −(1 − τ) L_1 }. Therefore, with Y_∞ = −∞ and Y_n = ρ(τ_n(X_1, ..., X_n)) − nc, A1 and A2 are easily verified, so we have an optimal rule. With V*_0(τ_0) the expected utility of the optimal rule, observe a time invariance: V*_n(X_1, ..., X_n) = V*_0(τ_n(X_1, ..., X_n)) − nc, so the rule given by the principle of optimality reduces to

N = min{n ≥ 0 : Y_n = V*_n(X_1, ..., X_n)} = min{n ≥ 0 : ρ(τ_n(X_1, ..., X_n)) = V*_0(τ_n(X_1, ..., X_n))}. 24 / 35

Example: testing simple statistical hypotheses. V*_0(τ) is a convex function of τ (equivalently, the corresponding Bayes risk −V*_0 is concave): in the inequality

α V*_0(τ) + (1 − α) V*_0(τ′) ≥ V*_0(ατ + (1 − α)τ′),

think of a switch which with probability α makes H_0 true with probability τ, and with probability 1 − α with probability τ′; the left side is the expected utility when the switch can be observed, the right side when it cannot. Note V*_0(0) = 0 = ρ(0) and V*_0(1) = 0 = ρ(1). This, together with convexity and ρ(τ) ≤ V*_0(τ), implies there are numbers a, b with 0 ≤ a ≤ L ≤ b ≤ 1 such that

{τ : V*_0(τ) = ρ(τ)} = {τ : 0 ≤ τ ≤ a or b ≤ τ ≤ 1},

where L = L_1 / (L_0 + L_1). Therefore

N = min{n ≥ 0 : τ_n(X_1, ..., X_n) ≤ a or τ_n(X_1, ..., X_n) ≥ b}.

Typically a and b are found by approximation. 25 / 35
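
A sketch of the resulting two-threshold test (not from the slides; the densities, prior, and in particular the thresholds a and b are illustrative, since in practice a and b come from the approximation just mentioned). The posterior τ_n is updated through the likelihood ratio and sampling stops the first time τ_n leaves (a, b).

```python
import numpy as np

def sequential_test(xs, f0, f1, tau0=0.5, a=0.1, b=0.9):
    """Sequential test of H0 vs H1 with posterior thresholds a < b.

    tau_n = tau0 * lambda_n / (tau0 * lambda_n + 1 - tau0), where
    lambda_n = prod_i f0(x_i) / f1(x_i).
    Stops at the first n with tau_n <= a (accept H1) or tau_n >= b (accept H0)."""
    log_lam, tau = 0.0, tau0
    for n, x in enumerate(xs, start=1):
        log_lam += np.log(f0(x)) - np.log(f1(x))   # accumulate the log likelihood ratio
        lam = np.exp(log_lam)
        tau = tau0 * lam / (tau0 * lam + 1 - tau0)
        if tau <= a:
            return n, "accept H1", tau
        if tau >= b:
            return n, "accept H0", tau
    return len(xs), "no decision", tau

# Illustration: H0 is N(0, 1), H1 is N(1, 1); the data are actually drawn from H1.
rng = np.random.default_rng(1)
f0 = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
f1 = lambda x: np.exp(-(x - 1) ** 2 / 2) / np.sqrt(2 * np.pi)
print(sequential_test(rng.normal(1.0, 1.0, size=1000), f0, f1))
```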

k-lookahead. The previous problems were too easy: in general we must approximate solutions. We can approximate by truncating to a finite horizon problem, but this does not avoid the combinatorial explosion in the value function tables. Better is the k-stage lookahead rule, which truncates dynamically:

N_k = min{n ≥ 0 : Y_n = V^(n+k)_n(x_{1:n})} = min{n ≥ 0 : Y_n ≥ E[V^(n+k)_{n+1}(x_{1:n}, X_{n+1}) | X_{1:n} = x_{1:n}]}.

Simplest is the 1-sla, the myopic rule:

N_1 = min{n ≥ 0 : Y_n ≥ E[Y_{n+1} | X_{1:n} = x_{1:n}]}. 26 / 35

k-lookahead. If an optimal rule exists and the k-sla tells you to continue, then it is optimal to continue: there is at least one rule that does better by continuing than by stopping now. Therefore, instead of using the 2-sla throughout, we can use the 1-sla until it tells us to stop, then switch to the 2-sla, and so on. 27 / 35

Monotone stopping rule problems. A stopping problem is monotone if Y_n ≥ E[Y_{n+1} | X_{1:n} = x_{1:n}] implies Y_{n+1} ≥ E[Y_{n+2} | X_{1:n+1} = x_{1:n+1}] a.s. Equivalently, when the 1-sla calls for stopping at time n, it also calls for stopping at time n + 1, irrespective of X_{n+1} (a.s.). Theorem: in a finite-horizon monotone stopping problem the 1-sla is optimal. This can be extended to the infinite horizon case under a reasonable regularity condition. 28 / 35

Example: proofreading (bug fixing). The number of errors M (e.g. misprints) and the numbers of errors detected on successive proofreadings X_1, X_2, ... have some joint distribution such that X_j ≥ 0, Σ_j X_j ≤ M a.s., and E[M] < ∞. The cost for stopping after n proofreadings is

Y_n = n c_1 + (M − Σ_{j=1}^{n} X_j) c_2,

where c_1 > 0 is the cost of a proofread, and c_2 > 0 the cost of a remaining error. 29 / 35

Example: proofreading (bug fixing). Let's compute the 1-sla. We find

N_1 = min{n ≥ 0 : E[X_{n+1} | X_1, ..., X_n] ≤ c_1/c_2}.

One instance where this problem is monotone: M has a Poisson distribution with known mean λ, and X_{n+1} has a binomial distribution with sample size M − Σ_{j=1}^{n} X_j and success probability p. We find

N_1 = min{n ≥ 0 : λ p (1 − p)^n ≤ c_1/c_2}. 30 / 35
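
In the Poisson/binomial instance the 1-sla has a closed form, so the stopping time can be computed directly; the parameter values below are illustrative, not from the presentation. Since the problem is monotone here, this myopic rule is also the optimal rule.

```python
import math

def proofreading_stop(lam, p, c1, c2):
    """1-sla for the proofreading problem: M ~ Poisson(lam) errors, each remaining
    error caught on a given proofread with probability p.

    Returns the smallest n >= 0 with lam * p * (1 - p)**n <= c1 / c2."""
    if lam * p <= c1 / c2:
        return 0
    # lam * p * (1-p)^n <= c1/c2  <=>  n >= log((c1/c2) / (lam*p)) / log(1-p)
    return math.ceil(math.log((c1 / c2) / (lam * p)) / math.log(1 - p))

# E.g. 20 expected errors, 30% detection per pass, a pass costs 1, a missed error costs 5.
print(proofreading_stop(lam=20, p=0.3, c1=1.0, c2=5.0))   # -> 10
```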

Example: best-choice; sum-the-odds. Observations are independent random variables X_i taking values 0 and 1 (failure and success). Our goal is to stop on the last success. Denote by p_n = P(X_n = 1) the nth success probability. Since we would never stop if some later p_i = 1, we assume p_i < 1 for i > 1. If we stop at stage n our payoff, the probability that we are on the last success, is

Y_n = X_n Π_{i=n+1}^{∞} (1 − p_i).

We assume Σ_i p_i < ∞ so that, by the Borel-Cantelli lemma, there are finitely many successes a.s. 31 / 35

Example: best-choice; sum-the-odds. Secretary problem: the probability that the ith candidate is relatively best is 1/i, independently of the others, so it is an instance of the above with p_i = (1/i)·1[i ≤ n]. The secretary problem is not monotone, and the 1-sla is not optimal: continuing from a relatively best candidate, the next may not be relatively best, which is obviously bad. To fix this we only allow stopping on successes. Pretend observations occur only at the success times T_1, T_2, .... Let K be the time of the last success, or ∞ if none occur. The expected payoff at the nth success is

Y_n = P(K = t | T_n = t) = Π_{i=t+1}^{∞} (1 − p_i) if t < ∞, and 0 otherwise. 32 / 35

Example: best-choice; sum-the-odds. If we continue at time T_n = t < ∞ and stop at T_{n+1}, we expect to receive

P(K = T_{n+1} | T_1, ..., T_n = t) = p_{t+1} Π_{i=t+2}^{∞} (1 − p_i) + (1 − p_{t+1}) p_{t+2} Π_{i=t+3}^{∞} (1 − p_i) + ...
                                   = [ Π_{i=t+1}^{∞} (1 − p_i) ] · Σ_{i=t+1}^{∞} p_i / (1 − p_i). 33 / 35

Example: best-choice; sum-the-odds. The 1-sla is therefore

N_1 = min{n ≥ 0 : Σ_{i=T_n+1}^{∞} p_i/(1 − p_i) ≤ 1} = min{t ≥ 1 : X_t = 1 and Σ_{i=t+1}^{∞} p_i/(1 − p_i) ≤ 1}.

This rule stops on a success at time t iff the sum of the odds for future times is at most 1. The 1-sla is optimal: define r_i = p_i/(1 − p_i); the problem is monotone, since Σ_{i > T_n} r_i ≤ 1 implies the same at T_{n+1}, and the regularity conditions (not described here) hold. For example, for the secretary problem the stopping rule reduces to the rule we computed before:

N_1 = min{t ≥ 1 : X_t = 1 and Σ_{i=t+1}^{n} (1/i)/(1 − 1/i) ≤ 1}, i.e. Σ_{i=t+1}^{n} 1/(i − 1) ≤ 1. 34 / 35
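
A sketch of the sum-the-odds computation (illustrative, not from the presentation). Applied to the secretary problem's success probabilities p_i = 1/i it returns the earliest time from which stopping on a success is allowed, which coincides with the threshold r_1 from the earlier table (4 for n = 8).

```python
def sum_the_odds_start(p):
    """p[i-1] = P(X_i = 1). Return the smallest t with sum_{i > t} p_i / (1 - p_i) <= 1;
    the sum-the-odds rule stops on the first success observed at time >= t."""
    n = len(p)
    for t in range(1, n + 1):
        if sum(q / (1 - q) for q in p[t:]) <= 1:
            return t
    return n

# Secretary problem with n = 8 candidates: p_i = 1/i.
print(sum_the_odds_start([1.0 / i for i in range(1, 9)]))   # -> 4, matching r_1 from the table
```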

Summary. Defined optimal stopping problems. Examples: house-selling, change point detection, search for new species, sequential statistical decision problems. Introduced finite horizon problems, Markov models, and monotone problems. Solved the secretary problem, its monotone utility extension, house selling with and without recall, testing simple statistical hypotheses, and stopping on the last success (sum-the-odds). Mentioned proofreading/bug-fixing. Discussed general existence of optimal rules, the optimality equation, and the principle of optimality. Covered a prophet inequality. To approximate solutions, we described k-lookahead; the 1-sla is optimal if the problem is monotone. 35 / 35