Optimal Stopping. Nick Hay (presentation follows Thomas Ferguson's Optimal Stopping and Applications). November 6, 2008


Optimal Stopping. Nick Hay (presentation follows Thomas Ferguson's Optimal Stopping and Applications). November 6, 2008 1 / 35

Contents: Introduction, Problems, Markov Models, Monotone Stopping Problems, Summary. 2 / 35

The Secretary problem. You have one secretarial position open and you interview n (known) candidates sequentially: The candidates appear in uniformly random order. There is a true ranking from best to worst, but you observe only the relative ranks of the candidates seen so far. You must accept or reject each candidate on the spot. Which rule maximises the probability of hiring the best candidate? 3 / 35

The Secretary problem. We accept only if the current candidate is relatively best (best of those seen so far). The probability that the jth candidate is overall best, given that it is relatively best, is j/n. Let W_j be the probability of winning the best secretary using an optimal rule that rejects the first j applicants. W_{j+1} ≤ W_j, since a rule which rejects the first j + 1 also rejects the first j. It is optimal to accept the jth candidate (when relatively best) iff j/n ≥ W_j. Since (j + 1)/n > j/n and W_j ≥ W_{j+1}, an optimal rule will, for some r, reject the first r − 1 candidates and then accept the next relatively best candidate. Call this rule N_r. This is a threshold rule. 4 / 35

The Secretary problem. The probability of winning using the rule N_r is (for r ≥ 2; N_1 simply accepts the first candidate, so P_1 = 1/n)

P_r = Σ_{k=r}^{n} P(kth applicant is best and is selected)
    = Σ_{k=r}^{n} P(kth applicant is best) · P(kth applicant is selected | it is best)
    = Σ_{k=r}^{n} (1/n) · P(best of first k − 1 appears before stage r)
    = Σ_{k=r}^{n} (1/n) · (r − 1)/(k − 1)
    = ((r − 1)/n) · Σ_{k=r}^{n} 1/(k − 1). 5 / 35

The Secretary problem. Comparing consecutive thresholds,

P_{r+1} ≤ P_r  iff  (r/n) Σ_{k=r+1}^{n} 1/(k − 1) ≤ ((r − 1)/n) Σ_{k=r}^{n} 1/(k − 1)  iff  Σ_{k=r+1}^{n} 1/(k − 1) ≤ 1.

The optimal rule selects the first relatively best candidate that appears from stage r_1 on, where

r_1 = min{ r ≥ 1 : Σ_{k=r+1}^{n} 1/(k − 1) ≤ 1 }.

Since the sum is approximately log(n/r), setting it equal to 1 gives r_1/n ≈ 1/e, and P_{r_1} ≈ 1/e. 6 / 35

The Secretary problem. The exact optimal stopping thresholds and winning probabilities:

n        1      2      3      4      5      6      7      8
r_1      1      1      2      2      3      3      3      4
P_{r_1}  1.000  .500   .500   .458   .433   .428   .414   .410

Compare with 1/e ≈ 0.368. 7 / 35
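
To make this concrete, here is a short Python sketch (not part of the original presentation; the function name is illustrative) that evaluates r_1 and P_{r_1} directly from the formulas on the previous slides. It reproduces the table above for n = 1, ..., 8.

```python
def secretary_threshold(n):
    """Optimal threshold r_1 and winning probability P_{r_1} for n candidates.

    r_1 is the smallest r >= 1 with sum_{k=r+1}^{n} 1/(k-1) <= 1, and
    P_r = ((r-1)/n) * sum_{k=r}^{n} 1/(k-1) for r >= 2, with P_1 = 1/n."""
    def win_prob(r):
        if r == 1:
            return 1.0 / n  # accepting the first candidate wins iff it is the best
        return (r - 1) / n * sum(1.0 / (k - 1) for k in range(r, n + 1))

    r1 = next(r for r in range(1, n + 1)
              if sum(1.0 / (k - 1) for k in range(r + 1, n + 1)) <= 1)
    return r1, win_prob(r1)

for n in range(1, 9):
    r1, p = secretary_threshold(n)
    print(n, r1, round(p, 3))
```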

Problems. An optimal stopping problem consists of a sequence X_1, X_2, ... of random variables with a given joint distribution, and a sequence of real-valued reward functions: y_0, y_1(x_1), y_2(x_1, x_2), ..., y_∞(x_1, x_2, ...). A stopping rule is a sequence of functions taking values in [0, 1], φ = (φ_0, φ_1(x_1), φ_2(x_1, x_2), ...), which give the probability of stopping at stage n conditional on the n observations so far. If these functions' values are always 0 or 1, the stopping rule is deterministic. Equivalently, a stopping rule is a random variable N satisfying, for each n, the conditional independence P(N = n | X = x) = P(N = n | X_{1:n} = x_{1:n}). 8 / 35

Problems. The expected return V(φ) of a stopping rule is V(φ) = E[y_N(X_1, ..., X_N)]. An optimal stopping rule maximises this expectation. Deterministic rewards are without loss of generality: we can expand X_i to capture the randomness of the ith reward. The secretary problem is an optimal stopping problem with X_i taking values in {1, ..., i} (the relative rank of the ith candidate), and

y_i(x_1, ..., x_i) = i/n if x_i = 1 and i ≤ n; 0 if x_i > 1 and i ≤ n; 0 otherwise;

our solution defined a deterministic optimal stopping rule. 9 / 35

Examples. The house-selling/job-search problem. Offers X_i come in daily for an asset you wish to sell. The offers are iid, and there is a cost c per day of waiting. If you can recall previous offers, you receive utility max_{i ≤ n} X_i − nc when stopping at time n; if you cannot, you receive X_n − nc. Detecting a change point. A sequence of variables X_1, ... is initially distributed iid according to F_0(x), but at some unknown time T switches to F_1(x). Stopping at time n incurs cost Y_n = c·1[n < T] + (n − T)·1[n ≥ T], whose conditional expectation given the data is c·P(T > n | X_{1:n} = x_{1:n}) + E[(n − T)^+ | X_{1:n} = x_{1:n}]. Applications: monitoring heart patients, production quality, missile courses. 10 / 35

Examples. Search for a new species. Individual beetles are observed at unit time intervals; with probability p_j, independently, the next observation is a member of species µ_j. There is a cost c per observation, and the reward is the number of distinct species seen. Sequential statistical decision problems. You have a prior τ(θ) over a parameter Θ, and your goal is to choose an action a ∈ A maximising the utility U(θ, a) gained. Before deciding, you can observe variables X_1, X_2, ... sequentially, at a cost of c each. The X_i are iid given θ with distribution F(x | θ). If you stop at stage n, you select the action maximising the conditional expected utility: a*(X_1, ..., X_n) = argmax_{a ∈ A} E[U(Θ, a) | X_1, ..., X_n]. This is a terminal decision rule, and it can be chosen independently of the stopping rule. 11 / 35

Finite Horizon problems. A finite horizon problem is one where, for some horizon T, y_i(x_1, ..., x_i) = −∞ for all i > T. These problems can be solved by backwards induction. Define V^(T)_T(x_{1:T}) = y_T(x_{1:T}), and inductively

V^(T)_j(x_{1:j}) = max{ y_j(x_{1:j}), E[ V^(T)_{j+1}(x_{1:j}, X_{j+1}) | X_{1:j} = x_{1:j} ] }.

It is optimal to stop at stage j iff V^(T)_j(x_{1:j}) = y_j(x_{1:j}). 12 / 35
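
As an illustration of backwards induction (again not from the slides), here is a sketch for a finite-horizon variant of the house-selling problem from the Examples slide: offers are assumed iid Uniform(0, 1), waiting costs c per period, there is no recall, and stopping is forced by stage T. The horizon, cost, and grid size are illustrative choices. Because the offers are iid, the value function at each stage depends only on the current offer, so the optimal rule is a time-dependent threshold.

```python
import numpy as np

def house_selling_thresholds(T=10, c=0.05, grid=10_001):
    """Finite-horizon house selling without recall, offers X_i iid Uniform(0, 1).

    Reward for stopping at stage j is x_j - j*c; stopping is forced by stage T.
    Backward induction: V_T(x) = x - T*c and V_j(x) = max(x - j*c, E[V_{j+1}(X)]),
    so the optimal rule at stage j < T is "stop iff x_j >= E[V_{j+1}(X)] + j*c".
    Returns the stopping thresholds for stages 1, ..., T."""
    xs = np.linspace(0.0, 1.0, grid)        # grid approximating the Uniform(0, 1) offers
    cont = np.mean(xs - T * c)              # E[V_T(X)] = 1/2 - T*c
    thresholds = {T: 0.0}                   # at the horizon you must accept any offer
    for j in range(T - 1, 0, -1):
        thresholds[j] = min(max(cont + j * c, 0.0), 1.0)
        V_j = np.maximum(xs - j * c, cont)  # value function at stage j
        cont = np.mean(V_j)                 # E[V_j(X)], the continuation value at stage j - 1
    return [thresholds[j] for j in range(1, T + 1)]

print([round(b, 3) for b in house_selling_thresholds()])
```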

Secretary problem with arbitrary monotonic utility. Utility U(j) for accepting a candidate of absolute rank j, nonincreasing in j. Selecting no candidate has utility Z. The probability that the jth candidate has absolute rank b, given that it has relative rank x, is

f(b | j, x) = C(b − 1, x − 1) · C(n − b, j − x) / C(n, j).

The expected utility of stopping on a candidate with relative rank x = x_j is

y_j(x_{1:j}) = y_j(x) = Σ_{b=x}^{n−j+x} U(b) f(b | j, x).

Due to a recursion on f of the same form, with y_n(x) = U(x) we find that

y_{j−1}(x) = (x/j) y_j(x + 1) + ((j − x)/j) y_j(x). 13 / 35

Monotonic utility. The above gives us a recurrence for the value function. With V^(n)_n(x_n) = max(U(x_n), Z), we have

V^(n)_j(x_j) = max{ y_j(x_j), (1/(j + 1)) Σ_{x=1}^{j+1} V^(n)_{j+1}(x) }. 14 / 35

Monotonic utility. Lemma. If it is optimal to select an applicant of relative rank x at stage k, then it is also optimal at (x − 1, k) and at (x, k + 1). Proof. Define A(j) = (1/j) Σ_{i=1}^{j} V^(n)_j(i). By hypothesis y_k(x) ≥ A(k + 1). Since y_k(x − 1) ≥ y_k(x) we have y_k(x − 1) ≥ A(k + 1). By our previous recursion for y_j(x), one can see that y_{k+1}(x) ≥ y_k(x), so since A(k + 1) ≥ A(k + 2) we have y_{k+1}(x) ≥ A(k + 2). Consequence: the optimal rule is defined by thresholds 1 ≤ r_1 ≤ r_2 ≤ ... ≤ r_n ≤ n, where if at stage j you see a candidate with relative rank x, you stop iff r_x ≤ j. 15 / 35
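
The recursions on the last two slides translate directly into code. The sketch below (illustrative, not from the presentation) computes the thresholds r_x; with U(1) = 1, U(b) = 0 for b > 1 and Z = 0 it reduces to the classical secretary problem, so for n = 8 it should report r_1 = 4, matching the earlier table. For that utility only relative rank 1 is worth accepting before the final stage.

```python
from fractions import Fraction

def secretary_thresholds(n, U, Z):
    """Optimal thresholds for the secretary problem with utility U(b) for hiring a
    candidate of absolute rank b (nonincreasing in b) and utility Z for hiring no one.

    Recursions from the slides:
        y_n(x) = U(x),  y_{j-1}(x) = (x/j) y_j(x+1) + ((j-x)/j) y_j(x),
        V_n(x) = max(U(x), Z),  V_j(x) = max(y_j(x), A_{j+1}),
        A_{j+1} = (1/(j+1)) sum_{x=1}^{j+1} V_{j+1}(x).
    Returns r[x] = earliest stage at which accepting relative rank x is optimal
    (None if it never is)."""
    y = {x: Fraction(U(x)) for x in range(1, n + 1)}                     # y_n
    V = {x: max(Fraction(U(x)), Fraction(Z)) for x in range(1, n + 1)}   # V_n
    r = {x: (n if Fraction(U(x)) >= Fraction(Z) else None) for x in range(1, n + 1)}
    for j in range(n - 1, 0, -1):
        A = sum(V.values()) / (j + 1)                                    # continuation value at stage j
        y = {x: Fraction(x, j + 1) * y[x + 1] + Fraction(j + 1 - x, j + 1) * y[x]
             for x in range(1, j + 1)}                                   # y_j from y_{j+1}
        V = {x: max(y[x], A) for x in range(1, j + 1)}                   # V_j
        for x in range(1, j + 1):
            if y[x] >= A:                                                # stopping optimal at (x, j)
                r[x] = j
    return r

# Classical secretary problem: only the best candidate has value.
print(secretary_thresholds(8, lambda b: 1 if b == 1 else 0, Z=0))
```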

Existence of optimal stopping rules. Optimal stopping rules always exist for finite horizon problems, but not necessarily in the unbounded case. E.g. Y_∞ = 0 and Y_n = (2^n − 1) Π_{i=1}^{n} X_i for X_i independent fair coin flips (X_i ∈ {0, 1}): the return for stopping at stage n without failure is 2^n − 1, while continuing one more step gives expected value at least (2^{n+1} − 1)/2, which is better, so no rule is optimal. Similarly: Y_0 = 0, Y_n = 1 − 1/n, Y_∞ = 0. Two assumptions suffice to prove existence: A1. E[sup_n Y_n] < ∞. A2. lim sup_{n→∞} Y_n ≤ Y_∞ a.s. 16 / 35

Optimality equation. Two properties of the optimal value function are useful for later results. The principle of optimality: it is optimal to stop iff y_n(x_{1:n}) = V*_n(x_{1:n}), where V*_n(x_{1:n}) = ess sup_{N ≥ n} E[Y_N | X_{1:n} = x_{1:n}]. The optimality equation: V*_n(x_{1:n}) = max( Y_n, E[V*_{n+1}(X_{1:n+1}) | X_{1:n} = x_{1:n}] ), where Y_n = y_n(x_{1:n}). 17 / 35

Prophets. A prophet can observe all the Y_n values and pick the best. Denote by M = E[sup_n Y_n] the prophet's expected value. How much larger than V, the value of the optimal stopping rule, can M be? 18 / 35

Prophet inequalities. Theorem: for X_i a sequence of independent nonnegative random variables with payoff Y_i = X_i, we have M ≤ 2V. The proof is constructive: by examining the marginal distributions we can find a stopping rule that does no worse than 1/2 of the prophet's value. There are a number of other results of this nature. 19 / 35
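
A small numerical illustration (assumed setup, not from the slides): for X_i iid Uniform(0, 1) with horizon n, the optimal stopper's value follows from the backward recursion v_1 = 1/2, v_{k+1} = E[max(X, v_k)] = (1 + v_k²)/2, while the prophet's value is E[max_i X_i] = n/(n + 1). The ratio M/V stays well below the bound of 2.

```python
def prophet_vs_stopper(n):
    """X_i iid Uniform(0, 1), payoff Y_i = X_i, horizon n.

    Returns (M, V): the prophet's value E[max_i X_i] = n/(n+1) and the optimal
    stopper's value from backward induction v_{k+1} = (1 + v_k**2) / 2."""
    v = 0.5                      # value with one offer remaining
    for _ in range(n - 1):
        v = (1 + v * v) / 2      # E[max(U, v)] for U ~ Uniform(0, 1)
    return n / (n + 1), v

for n in (2, 5, 10, 100):
    M, V = prophet_vs_stopper(n)
    print(n, round(M, 3), round(V, 3), round(M / V, 3))   # ratio stays below 2
```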

Markov Models. [Diagram: a Markov chain X_1 → X_2 → X_3 → ..., with reward Y_n depending only on X_n.] Let {X_n}_n be a sequence of random variables forming a Markov chain, with Y_n = u_n(X_n). Then V*_n(x_1, ..., x_n) is a function of x_n alone, denoted V*_n(x_n); this is the optimal value function of the corresponding MDP. The principle of optimality gives the rule N = min{n ≥ 0 : u_n(X_n) = V*_n(X_n)}. 20 / 35

Example: selling an asset with and without recall. X_1, X_2, X_3, ... The variables X_i are offers on a house, or expected values of actions we've computed. We suppose the observations are iid with distribution F(x). If recall is not allowed, then Y_n = X_n − nc, with Y_0 = Y_∞ = −∞. If recall is allowed, then Y_n = M_n − nc where M_n = max{X_1, ..., X_n}, again with Y_0 = Y_∞ = −∞. One can show this problem satisfies A1 and A2, and so has an optimal rule. The problem is invariant in time: after observing a value and paying a cost, the remaining problem looks exactly like the original, the cost already paid being sunk. 21 / 35

Example: selling an asset with and without recall. Time invariance and monotonicity in X_n, combined with the principle of optimality, give N = min{n ≥ 1 : X_n ≥ V*}. To compute V*, use the optimality equation:

V* = E[max{X_1, V*}] − c = V* ∫_{−∞}^{V*} dF(x) + ∫_{V*}^{∞} x dF(x) − c,

so

∫_{V*}^{∞} (x − V*) dF(x) = c.

The integral is continuous in V* and decreasing from +∞ to 0, hence there exists a unique solution V*. For F uniform on [0, 1], a simple computation finds

V* = 1 − (2c)^{1/2} if c ≤ 1/2, and V* = 1/2 − c if c > 1/2. 22 / 35
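
The optimality equation can be solved numerically for an arbitrary offer distribution. The sketch below (illustrative; the sample size, cost, and tolerance are assumptions) solves E[(X − V)^+] = c by bisection over a Monte Carlo sample of offers, and checks the Uniform(0, 1) case against the exact answer 1 − √(2c).

```python
import numpy as np

def reservation_value(sample_offers, c, tol=1e-6):
    """Solve E[(X - V)^+] = c for V by bisection over a Monte Carlo sample of offers."""
    xs = np.asarray(sample_offers, dtype=float)
    g = lambda v: np.mean(np.maximum(xs - v, 0.0)) - c   # decreasing in v
    lo, hi = xs.min() - 1.0, xs.max()                    # g(lo) > 0 > g(hi) for reasonable c
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
c = 0.1
v_hat = reservation_value(rng.uniform(0, 1, 1_000_000), c)
print(round(v_hat, 4), round(1 - (2 * c) ** 0.5, 4))     # estimate vs exact 1 - sqrt(2c)
```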

Example: testing simple statistical hypotheses. The special case of sequential statistical decision problems with two hypotheses Θ = {H_0, H_1}, where P(x | H_i) = f_i(x), and each action accepts one hypothesis: A = {a_0, a_1}. The utility is U(H_i, a_j) = 0 if i = j and −L_i if i ≠ j, for given positive numbers L_0, L_1. Denote by τ_0 the prior probability of H_0. The posterior probability of H_0 is

τ_n(X_1, ..., X_n) = τ_0 λ_n / (τ_0 λ_n + (1 − τ_0)),

where the likelihood ratio is λ_n = Π_i f_0(X_i) / f_1(X_i). 23 / 35

Example: testing simple statistical hypotheses. Upon stopping with τ the probability of H_0, the expected utility is that of the best action: ρ(τ) = max{ −τ L_0, −(1 − τ) L_1 }. Therefore, with Y_∞ = −∞ and Y_n = ρ(τ_n(X_1, ..., X_n)) − nc, A1 and A2 are easily verified, so we have an optimal rule. With V*_0(τ_0) the expected utility of the optimal rule, observe a time invariance: V*_n(X_1, ..., X_n) = V*_0(τ_n(X_1, ..., X_n)) − nc, so the rule given by the principle of optimality reduces to

N = min{n ≥ 0 : Y_n = V*_n(X_1, ..., X_n)} = min{n ≥ 0 : ρ(τ_n(X_1, ..., X_n)) = V*_0(τ_n(X_1, ..., X_n))}. 24 / 35

Example: testing simple statistical hypotheses. V*_0(τ) is a convex function of τ (equivalently, the corresponding Bayes risk −V*_0 is concave): in the inequality

α V*_0(τ) + (1 − α) V*_0(τ′) ≥ V*_0(ατ + (1 − α)τ′),

think of a switch which with probability α makes H_0 true with probability τ, and with probability 1 − α with probability τ′; the left side is the expected utility when the switch can be observed, the right side when it cannot. Note V*_0(0) = 0 = ρ(0) and V*_0(1) = 0 = ρ(1). This, together with convexity and ρ(τ) ≤ V*_0(τ), implies there are numbers a, b with 0 ≤ a ≤ L ≤ b ≤ 1 such that

{τ : V*_0(τ) = ρ(τ)} = {τ : 0 ≤ τ ≤ a or b ≤ τ ≤ 1},

where L = L_1 / (L_0 + L_1). Therefore

N = min{n ≥ 0 : τ_n(X_1, ..., X_n) ≤ a or τ_n(X_1, ..., X_n) ≥ b}.

Typically a and b are found by approximation. 25 / 35
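
A sketch of the resulting two-threshold test (not from the slides; the densities, prior, and in particular the thresholds a and b are illustrative, since in practice a and b come from the approximation just mentioned). The posterior τ_n is updated through the likelihood ratio and sampling stops the first time τ_n leaves (a, b).

```python
import numpy as np

def sequential_test(xs, f0, f1, tau0=0.5, a=0.1, b=0.9):
    """Sequential test of H0 vs H1 with posterior thresholds a < b.

    tau_n = tau0 * lambda_n / (tau0 * lambda_n + 1 - tau0), where
    lambda_n = prod_i f0(x_i) / f1(x_i).
    Stops at the first n with tau_n <= a (accept H1) or tau_n >= b (accept H0)."""
    log_lam, tau = 0.0, tau0
    for n, x in enumerate(xs, start=1):
        log_lam += np.log(f0(x)) - np.log(f1(x))   # accumulate the log likelihood ratio
        lam = np.exp(log_lam)
        tau = tau0 * lam / (tau0 * lam + 1 - tau0)
        if tau <= a:
            return n, "accept H1", tau
        if tau >= b:
            return n, "accept H0", tau
    return len(xs), "no decision", tau

# Illustration: H0 is N(0, 1), H1 is N(1, 1); the data are actually drawn from H1.
rng = np.random.default_rng(1)
f0 = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
f1 = lambda x: np.exp(-(x - 1) ** 2 / 2) / np.sqrt(2 * np.pi)
print(sequential_test(rng.normal(1.0, 1.0, size=1000), f0, f1))
```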

k-lookahead. The previous problems were too easy: in general we must approximate solutions. We can approximate by truncating to a finite horizon problem, but this does not avoid the combinatorial explosion in the value function tables. Better is the k-stage lookahead rule, which truncates dynamically:

N_k = min{n ≥ 0 : Y_n = V^(n+k)_n(x_{1:n})} = min{n ≥ 0 : Y_n ≥ E[V^(n+k)_{n+1}(x_{1:n}, X_{n+1}) | X_{1:n} = x_{1:n}]}.

Simplest is the 1-sla, the myopic rule:

N_1 = min{n ≥ 0 : Y_n ≥ E[Y_{n+1} | X_{1:n} = x_{1:n}]}. 26 / 35

k-lookahead. If an optimal rule exists and the k-sla tells you to continue, then it is optimal to continue: there is at least one rule that does better by continuing than by stopping now. Therefore, instead of using the 2-sla throughout, we can use the 1-sla until it tells us to stop, then switch to the 2-sla, and so on. 27 / 35

Monotone stopping rule problems. A stopping problem is monotone if Y_n ≥ E[Y_{n+1} | X_{1:n} = x_{1:n}] implies Y_{n+1} ≥ E[Y_{n+2} | X_{1:n+1} = x_{1:n+1}] a.s. Equivalently, when the 1-sla calls for stopping at time n, it also calls for stopping at time n + 1, irrespective of X_{n+1} (a.s.). Theorem: in a finite-horizon monotone stopping problem the 1-sla is optimal. This can be extended to the infinite horizon case under a reasonable regularity condition. 28 / 35

Example: proofreading (bug fixing). The number of errors M (e.g. misprints) and the numbers of errors detected on successive proofreadings X_1, X_2, ... have some joint distribution such that X_j ≥ 0, Σ_j X_j ≤ M a.s., and E[M] < ∞. The cost for stopping after n proofreadings is

Y_n = n c_1 + (M − Σ_{j=1}^{n} X_j) c_2,

where c_1 > 0 is the cost of a proofread, and c_2 > 0 the cost of a remaining error. 29 / 35

Example: proofreading (bug fixing). Let's compute the 1-sla. We find

N_1 = min{n ≥ 0 : E[X_{n+1} | X_1, ..., X_n] ≤ c_1/c_2}.

One instance where this problem is monotone: M has a Poisson distribution with known mean λ, and X_{n+1} has a binomial distribution with sample size M − Σ_{j=1}^{n} X_j and success probability p. We find

N_1 = min{n ≥ 0 : λ p (1 − p)^n ≤ c_1/c_2}. 30 / 35
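
In the Poisson/binomial instance the 1-sla has a closed form, so the stopping time can be computed directly; the parameter values below are illustrative, not from the presentation. Since the problem is monotone here, this myopic rule is also the optimal rule.

```python
import math

def proofreading_stop(lam, p, c1, c2):
    """1-sla for the proofreading problem: M ~ Poisson(lam) errors, each remaining
    error caught on a given proofread with probability p.

    Returns the smallest n >= 0 with lam * p * (1 - p)**n <= c1 / c2."""
    if lam * p <= c1 / c2:
        return 0
    # lam * p * (1-p)^n <= c1/c2  <=>  n >= log((c1/c2) / (lam*p)) / log(1-p)
    return math.ceil(math.log((c1 / c2) / (lam * p)) / math.log(1 - p))

# E.g. 20 expected errors, 30% detection per pass, a pass costs 1, a missed error costs 5.
print(proofreading_stop(lam=20, p=0.3, c1=1.0, c2=5.0))   # -> 10
```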

Example: best-choice; sum-the-odds. Observations are independent random variables X_i taking values 0 and 1 (failure and success). Our goal is to stop on the last success. Denote by p_n = P(X_n = 1) the nth success probability. Since we would never stop if some later p_i = 1, we assume p_i < 1 for i > 1. If we stop at stage n our payoff, the probability that we are on the last success, is

Y_n = X_n Π_{i=n+1}^{∞} (1 − p_i).

We assume Σ_i p_i < ∞ so that, by the Borel-Cantelli lemma, there are finitely many successes a.s. 31 / 35

Example: best-choice; sum-the-odds. Secretary problem: the probability that the ith candidate is relatively best is 1/i, independently of the others, so it is an instance of the above with p_i = (1/i)·1[i ≤ n]. The secretary problem is not monotone, and the 1-sla is not optimal: continuing from a relatively best candidate, the next may not be relatively best, which is obviously bad. To fix this we only allow stopping on successes. Pretend observations occur only at the success times T_1, T_2, .... Let K be the time of the last success, or ∞ if none occur. The expected payoff at the nth success is

Y_n = P(K = t | T_n = t) = Π_{i=t+1}^{∞} (1 − p_i) if t < ∞, and 0 otherwise. 32 / 35

Example: best-choice; sum-the-odds. If we continue at time T_n = t < ∞ and stop at T_{n+1}, we expect to receive

P(K = T_{n+1} | T_1, ..., T_n = t) = p_{t+1} Π_{i=t+2}^{∞} (1 − p_i) + (1 − p_{t+1}) p_{t+2} Π_{i=t+3}^{∞} (1 − p_i) + ...
                                   = [ Π_{i=t+1}^{∞} (1 − p_i) ] · Σ_{i=t+1}^{∞} p_i / (1 − p_i). 33 / 35

Example: best-choice; sum-the-odds. The 1-sla is therefore

N_1 = min{n ≥ 0 : Σ_{i=T_n+1}^{∞} p_i/(1 − p_i) ≤ 1} = min{t ≥ 1 : X_t = 1 and Σ_{i=t+1}^{∞} p_i/(1 − p_i) ≤ 1}.

This rule stops on a success at time t iff the sum of the odds for future times is at most 1. The 1-sla is optimal: define r_i = p_i/(1 − p_i); the problem is monotone, since Σ_{i > T_n} r_i ≤ 1 implies the same at T_{n+1}, and the regularity conditions (not described here) hold. For example, for the secretary problem the stopping rule reduces to the rule we computed before:

N_1 = min{t ≥ 1 : X_t = 1 and Σ_{i=t+1}^{n} (1/i)/(1 − 1/i) ≤ 1}, i.e. Σ_{i=t+1}^{n} 1/(i − 1) ≤ 1. 34 / 35
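
A sketch of the sum-the-odds computation (illustrative, not from the presentation). Applied to the secretary problem's success probabilities p_i = 1/i it returns the earliest time from which stopping on a success is allowed, which coincides with the threshold r_1 from the earlier table (4 for n = 8).

```python
def sum_the_odds_start(p):
    """p[i-1] = P(X_i = 1). Return the smallest t with sum_{i > t} p_i / (1 - p_i) <= 1;
    the sum-the-odds rule stops on the first success observed at time >= t."""
    n = len(p)
    for t in range(1, n + 1):
        if sum(q / (1 - q) for q in p[t:]) <= 1:
            return t
    return n

# Secretary problem with n = 8 candidates: p_i = 1/i.
print(sum_the_odds_start([1.0 / i for i in range(1, 9)]))   # -> 4, matching r_1 from the table
```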

Summary. Defined optimal stopping problems. Examples: house-selling, change point detection, search for new species, sequential statistical decision problems. Introduced finite horizon problems, Markov models, and monotone problems. Solved the secretary problem, its monotone utility extension, house selling with and without recall, testing simple statistical hypotheses, and stopping on the last success (sum-the-odds). Mentioned proofreading/bug-fixing. Discussed general existence of optimal rules, the optimality equation, and the principle of optimality. Covered a prophet inequality. To approximate solutions, we described k-lookahead; the 1-sla is optimal if the problem is monotone. 35 / 35