Optimal Stopping. Nick Hay (presentation follows Thomas Ferguson's Optimal Stopping and Applications). November 6, 2008
1 Optimal Stopping (presentation follows Thomas Ferguson's Optimal Stopping and Applications). November 6, 2008. 1 / 35
2 Contents: Introduction Problems Markov Models Monotone Stopping Problems Summary 2 / 35
3 The Secretary problem You have one secretarial position open; you interview n (known) candidates sequentially: The candidates appear in uniformly random order. They are totally ordered from best to worst, but you observe only their relative ranks. You must accept or reject each candidate on the spot. Which rule maximises the probability of selecting the best candidate? 3 / 35
4 The Secretary problem We accept only if the current candidate is relatively best (better than all seen so far). Given that the jth candidate is relatively best, the probability it is best overall is j/n. Let $W_j$ be the probability of winning using an optimal rule among those that reject the first j candidates. $W_{j+1} \le W_j$, since any rule which rejects the first j+1 also rejects the first j. It is optimal to accept the jth candidate iff $j/n \ge W_j$. Since $(j+1)/n > j/n$ and $W_j \ge W_{j+1}$, an optimal rule will, for some r, reject the first r-1 candidates and then accept the next relatively best one. Call this rule $N_r$. This is a threshold rule. 4 / 35
5 The Secretary problem The probability of winning using the rule $N_r$ is
$P_r = \sum_{k=r}^{n} P(\text{kth applicant is best and is selected})$
$= \sum_{k=r}^{n} P(\text{kth applicant is best}) \, P(\text{kth applicant is selected} \mid \text{it is best})$
$= \sum_{k=r}^{n} \frac{1}{n} P(\text{best of first } k-1 \text{ appears before stage } r)$
$= \sum_{k=r}^{n} \frac{1}{n} \cdot \frac{r-1}{k-1} = \frac{r-1}{n} \sum_{k=r}^{n} \frac{1}{k-1}.$ 5 / 35
6 The Secretary problem Since $P_{r+1} - P_r = \frac{1}{n}\left(\sum_{k=r+1}^{n} \frac{1}{k-1} - 1\right)$, increasing r improves the rule exactly while this sum exceeds 1. The optimal rule selects the first relatively best candidate that appears from stage $r_1$ on, where
$r_1 = \min\{r \ge 1 : \sum_{k=r+1}^{n} \frac{1}{k-1} \le 1\}.$
Since the sum is approximately $\log(n/r)$, we have $r_1/n \to e^{-1}$ and $P_{r_1} \to e^{-1}$. 6 / 35
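The threshold $r_1$ and win probability $P_{r_1}$ above are easy to compute exactly. A minimal sketch (the function names are illustrative, not from the slides):

```python
def win_prob(r, n):
    """P_r: probability that rule N_r (reject the first r-1 candidates,
    then take the next relatively best one) selects the best of n."""
    if r == 1:
        return 1.0 / n
    return (r - 1) / n * sum(1.0 / (k - 1) for k in range(r, n + 1))

def optimal_threshold(n):
    """Smallest r with sum_{k=r+1}^{n} 1/(k-1) <= 1."""
    for r in range(1, n + 1):
        if sum(1.0 / (k - 1) for k in range(r + 1, n + 1)) <= 1:
            return r
    return n
```

For n = 100 this gives $r_1 = 38$ (reject the first 37 candidates) and $P_{r_1} \approx 0.371$, close to $1/e \approx 0.368$.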
7 The Secretary problem The exact optimal thresholds and win probabilities (compare $P_{r_1}$ with $1/e \approx 0.3679$):
n:       1      2      3      4      5      6      7      8      100
$r_1$:   1      1      2      2      3      3      3      4      38
$P_{r_1}$: 1.000  0.500  0.500  0.458  0.433  0.428  0.414  0.410  0.371
7 / 35
8 Problems An optimal stopping problem consists of a sequence $X_1, X_2, \ldots$ of random variables with a given joint distribution, and a sequence of real-valued reward functions: $y_0, y_1(x_1), y_2(x_1, x_2), \ldots, y_\infty(x_1, x_2, \ldots)$. A stopping rule is a sequence of functions taking values in [0, 1]: $\varphi = (\varphi_0, \varphi_1(x_1), \varphi_2(x_1, x_2), \ldots)$ which give the probability of stopping at stage n conditional on the n observations so far. If these functions only take the values 0 or 1, the stopping rule is deterministic. Equivalently, a stopping rule is a random variable N satisfying for each n the conditional independence: $P(N = n \mid X = x) = P(N = n \mid X_{1:n} = x_{1:n})$. 8 / 35
9 Problems The expected return $V(\varphi)$ of a stopping rule is: $V(\varphi) = E[y_N(X_1, \ldots, X_N)]$. An optimal stopping rule maximises this expectation. Deterministic rewards are without loss of generality: we can expand $X_i$ to capture any randomness in the ith reward. The secretary problem is an optimal stopping problem with $X_i$ the relative rank of the ith candidate, taking values in $\{1, \ldots, i\}$, and
$y_i(x_1, \ldots, x_i) = i/n$ if $x_i = 1$ and $i \le n$; $0$ if $x_i > 1$ and $i \le n$; $-\infty$ otherwise,
and our solution defined a deterministic optimal stopping rule. 9 / 35
10 Examples The house-selling/job-search problem. Offers $X_i$ come in daily for an asset you wish to sell. The offers are iid, and there is a cost c per day of waiting. If you can recall previous offers, you receive utility $\max_{i \le n} X_i - nc$ when stopping at time n; if you cannot, you receive $X_n - nc$. Detecting a change point. A sequence of variables $X_1, \ldots$ is initially distributed iid according to $F_0(x)$, but at some unknown time T switches to $F_1(x)$. Stopping at n incurs loss $c \cdot 1[n < T] + (n - T) \cdot 1[n \ge T]$, with conditional expectation $c P(T > n \mid X_{1:n} = x_{1:n}) + E[(n - T)^+ \mid X_{1:n} = x_{1:n}]$. Applications: monitoring heart patients, production quality, missile course. 10 / 35
11 Examples Search for a new species. Individual beetles are observed at unit time intervals; each observation is independently a member of species $\mu_j$ with probability $p_j$. Cost c per observation; the reward is the number of unique species seen. Sequential statistical decision problems. You have a prior $\tau(\theta)$ over a parameter $\Theta$; your goal is to choose an action $a \in A$ maximising the utility $U(\theta, a)$ gained. Before deciding, you can sequentially observe variables $X_1, X_2, \ldots$, at a cost of c each. The $X_i$ are iid given $\theta$ with distribution $F(x \mid \theta)$. If you stop at stage n, you select a to maximise the conditional expected utility: $a^*(X_1, \ldots, X_n) = \arg\max_{a \in A} E[U(\Theta, a) \mid X_1, \ldots, X_n]$. This terminal decision rule can be selected independently of the stopping rule. 11 / 35
12 Finite Horizon problems A finite horizon problem is one where for some T we have $y_i(x_1, \ldots, x_i) = -\infty$ for all $i > T$. These problems can be solved by backwards induction. Define $V_T^{(T)}(x_{1:T}) = y_T(x_{1:T})$, and inductively:
$V_j^{(T)}(x_{1:j}) = \max\{y_j(x_{1:j}), E[V_{j+1}^{(T)}(x_{1:j}, X_{j+1}) \mid X_{1:j} = x_{1:j}]\}.$
It is optimal to stop at stage j iff $V_j^{(T)}(x_{1:j}) = y_j(x_{1:j})$. 12 / 35
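In the simplest special case, i.i.d. observations with payoff $Y_j = X_j$ and finite support, the value function depends on nothing but the stage, and the backward induction above collapses to a scalar recursion. A sketch under those assumptions (names are illustrative):

```python
def finite_horizon_value(values, probs, T):
    """Backward induction for stopping an i.i.d. sequence with horizon T
    and payoff Y_j = X_j: V_T = E[X] (you must stop at the end), and
    V_j = E[max(X, V_{j+1})] (stop iff the current x beats continuing)."""
    V = sum(v * p for v, p in zip(values, probs))   # V_T
    for _ in range(T - 1):                          # V_{T-1}, ..., V_1
        V = sum(max(v, V) * p for v, p in zip(values, probs))
    return V
```

For example, stopping on one of T fair die rolls gives values 3.5, 4.25, and 14/3 for T = 1, 2, 3.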
13 Secretary problem with arbitrary monotonic utility Utility $U(j)$ for accepting a candidate of absolute rank j, nonincreasing in j. Selecting no candidate has utility Z. The probability that the jth candidate has absolute rank b, given that it has relative rank x, is
$f(b \mid j, x) = \binom{b-1}{x-1}\binom{n-b}{j-x} \big/ \binom{n}{j}.$
The expected utility of stopping on a candidate with relative rank $x = x_j$ is
$y_j(x_{1:j}) = y_j(x) = \sum_{b=x}^{n-j+x} U(b) f(b \mid j, x).$
Using a recursion on f of the same form, with $y_n(x) = U(x)$ we find:
$y_{j-1}(x) = \frac{x}{j} y_j(x+1) + \frac{j-x}{j} y_j(x).$ 13 / 35
14 Monotonic utility The above gives us a recurrence for the value function. With $V_n^{(n)}(x_n) = \max(U(x_n), Z)$, we have
$V_j^{(n)}(x_j) = \max\{y_j(x_j), \frac{1}{j+1} \sum_{x=1}^{j+1} V_{j+1}^{(n)}(x)\}.$ 14 / 35
15 Monotonic utility Lemma: if it is optimal to select an applicant of relative rank x at stage k, then it is optimal at $(x-1, k)$ and at $(x, k+1)$. Proof. Define $A(j) = \frac{1}{j} \sum_{i=1}^{j} V_j^{(n)}(i)$, the value of continuing to stage j. By hypothesis $y_k(x) \ge A(k+1)$. Since $y_k(x-1) \ge y_k(x)$ we have $y_k(x-1) \ge A(k+1)$. By the previous recursion for $y_j(x)$, one can see that $y_{k+1}(x) \ge y_k(x)$, so since $A(k+1) \ge A(k+2)$ we have $y_{k+1}(x) \ge A(k+2)$. Consequence: the optimal rule is defined by thresholds $1 \le r_1 \le \cdots \le r_n \le n$, where if at stage j you see a candidate with relative rank x, you stop iff $r_x \le j$. 15 / 35
16 Existence of optimal stopping rules Optimal stopping rules always exist for finite horizon problems, but not in general: e.g. with $Y_\infty = 0$ and $Y_n = (2^n - 1) \prod_{i=1}^{n} X_i$ for $X_i$ independent fair coin flips (values 0 and 1), the return for stopping at stage n without failure is $2^n - 1$, while continuing yields expected value at least $(2^{n+1} - 1)/2$, which is better, so no rule is optimal; e.g. $Y_0 = 0$, $Y_n = 1 - 1/n$, $Y_\infty = 0$, where the supremum is never attained. Two assumptions suffice to prove existence:
A1. $E[\sup_n Y_n] < \infty$,
A2. $\limsup_n Y_n \le Y_\infty$ a.s. 16 / 35
17 Optimality equation Two properties of the optimal value function are useful for later results. The principle of optimality: it is optimal to stop iff $y_n(x_{1:n}) = V_n^*(x_{1:n})$, where
$V_n^*(x_{1:n}) = \operatorname{ess\,sup}_{N \ge n} E[Y_N \mid X_{1:n} = x_{1:n}].$
The optimality equation:
$V_n^*(x_{1:n}) = \max(Y_n, E[V_{n+1}^* \mid X_{1:n} = x_{1:n}]),$
where $Y_n = y_n(x_{1:n})$. 17 / 35
18 Prophets A prophet can observe all the $Y_n$ values and pick the best. Denote $M = E[\sup_n Y_n]$, and let $V^*$ be the value of the optimal stopping rule. How much larger than $V^*$ can M be? 18 / 35
19 Prophet inequalities Theorem: let $X_i$ be a sequence of independent nonnegative random variables with payoff $Y_i = X_i$; then $M \le 2V^*$. The proof is constructive: by examining the marginal distributions we can find a rule that achieves at least half of the prophet's value. There are a number of other results of this nature. 19 / 35
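The bound can be checked exactly on small discrete examples, computing both sides by enumeration. A sketch (the example distribution and function names are my own, not from the slides):

```python
from itertools import product

def optimal_value(dists):
    """Optimal stopping value for independent nonnegative X_1..X_n with
    finite supports: W_n = E[X_n], then W_i = E[max(X_i, W_{i+1})]."""
    W = 0.0
    for values, probs in reversed(dists):
        W = sum(max(v, W) * p for v, p in zip(values, probs))
    return W

def prophet_value(dists):
    """M = E[max_i X_i], by enumerating the joint distribution."""
    M = 0.0
    for combo in product(*(list(zip(v, p)) for v, p in dists)):
        prob = 1.0
        for _, p in combo:
            prob *= p
        M += prob * max(v for v, _ in combo)
    return M
```

A near-extremal example: $X_1 = 1$ surely, and $X_2 = 100$ with probability 0.01, else 0. Here $V^* = 1$ but $M = 1.99$, so the factor 2 is nearly attained.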
20 Markov Models Let $\{X_n\}$ be a sequence of random variables forming a Markov chain, with $Y_n = u_n(X_n)$. Then $V_n^*(x_1, \ldots, x_n)$ is a function of $x_n$ alone, denoted $V_n^*(x_n)$. This is the optimal value function for the corresponding MDP. The principle of optimality gives the rule $N = \min\{n \ge 0 : u_n(X_n) = V_n^*(X_n)\}$. 20 / 35
21 Example: selling an asset with and without recall The variables $X_1, X_2, X_3, \ldots$ are offers on a house, or expected values of actions we've computed. We suppose the observations are iid with distribution $F(x)$. If recall is not allowed, then $Y_n = X_n - nc$; if recall is allowed, then $Y_n = M_n - nc$ where $M_n = \max\{X_1, \ldots, X_n\}$; in both cases $Y_0 = Y_\infty = -\infty$. One can show this problem satisfies A1 and A2, and so has an optimal rule. The problem is invariant in time: after observing a value and paying a cost, the future looks the same as at the start, and the cost is sunk. 21 / 35
22 Example: selling an asset with and without recall Invariance in time and monotonicity in $X_i$, combined with the principle of optimality, give $N = \min\{n \ge 1 : X_n \ge V^*\}$. To compute $V^*$, use the optimality equation:
$V^* = E[\max\{X_1, V^*\}] - c = V^* F(V^*) + \int_{V^*}^{\infty} x \, dF(x) - c,$
so
$\int_{V^*}^{\infty} (x - V^*) \, dF(x) = c.$
The integral is continuous in $V^*$ and decreasing from $+\infty$ to 0, hence there exists a unique solution for $V^*$. For F uniform on [0, 1], a simple computation finds
$V^* = 1 - (2c)^{1/2}$ if $c \le 1/2$, and $V^* = 1/2 - c$ if $c > 1/2$. 22 / 35
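The fixed-point equation for $V^*$ can also be solved numerically and checked against the closed form. A sketch for the uniform case (the function name is illustrative):

```python
def threshold_uniform(c):
    """Solve int_V^1 (x - V) dx = (1 - V)^2 / 2 = c for V by bisection
    (F uniform on [0, 1], 0 < c <= 1/2)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (1 - mid) ** 2 / 2 > c:  # continuing still worth more: raise V
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Agrees with the closed form V* = 1 - sqrt(2c) for c <= 1/2.
```

For instance c = 0.08 gives $V^* = 1 - \sqrt{0.16} = 0.6$: reject any offer below 0.6.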
23 Example: testing simple statistical hypotheses The special case of sequential statistical decision problems with two hypotheses $\Theta = \{H_0, H_1\}$, where $P(x \mid H_i) = f_i(x)$, and each action accepts one hypothesis, $A = \{a_0, a_1\}$. The utility is $U(H_i, a_j) = 0$ if $i = j$ and $-L_i$ if $i \ne j$, for $L_0, L_1$ given positive numbers. Denote by $\tau_0$ the prior probability of $H_0$. The posterior is
$\tau_n(X_1, \ldots, X_n) = \frac{\tau_0 \lambda_n}{\tau_0 \lambda_n + (1 - \tau_0)},$
where the likelihood ratio is $\lambda_n = \prod_{i=1}^{n} \frac{f_0(X_i)}{f_1(X_i)}$. 23 / 35
24 Example: testing simple statistical hypotheses Upon stopping with $\tau$ the posterior probability of $H_0$, the expected utility is that of the best action:
$\rho(\tau) = \max\{-\tau L_0, -(1 - \tau) L_1\}.$
Therefore with $Y_\infty = -\infty$ and $Y_n = \rho(\tau_n(X_1, \ldots, X_n)) - nc$, A1 and A2 are easily verified, so we have an optimal rule. With $V_0^*(\tau_0)$ the expected utility of the optimal rule, observe a time invariance: $V_n^*(X_1, \ldots, X_n) = V_0^*(\tau_n(X_1, \ldots, X_n)) - nc$, so the rule given by the principle of optimality reduces to
$N = \min\{n \ge 0 : Y_n = V_n^*(X_1, \ldots, X_n)\} = \min\{n \ge 0 : \rho(\tau_n(X_1, \ldots, X_n)) = V_0^*(\tau_n(X_1, \ldots, X_n))\}.$ 24 / 35
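The posterior update driving this rule is worth making concrete. A minimal sketch (the function name is mine; each argument is a ratio $f_0(x_i)/f_1(x_i)$):

```python
def posterior_h0(tau0, likelihood_ratios):
    """tau_n = tau0 * lambda_n / (tau0 * lambda_n + (1 - tau0)), where
    lambda_n is the product of the ratios f0(x_i) / f1(x_i)."""
    lam = 1.0
    for r in likelihood_ratios:
        lam *= r
    return tau0 * lam / (tau0 * lam + (1 - tau0))
```

The stopping rule of the next slide then continues sampling while the posterior stays strictly between the two thresholds. Note that an observation favouring $H_0$ (ratio above 1) raises the posterior, as expected.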
25 Example: testing simple statistical hypotheses $V_0^*(\tau)$ is a concave function of $\tau$: in the inequality $\alpha V_0^*(\tau) + (1 - \alpha) V_0^*(\tau') \le V_0^*(\alpha\tau + (1 - \alpha)\tau')$, if a hidden switch makes the probability of $H_0$ equal to $\tau$ with probability $\alpha$ and $\tau'$ with probability $1 - \alpha$, the left side is the expected utility when able to observe the switch, the right when not. Note $V_0^*(0) = 0 = \rho(0)$ and $V_0^*(1) = 0 = \rho(1)$. This plus concavity and $\rho(\tau) \le V_0^*(\tau)$ implies there are numbers a, b with $0 \le a \le L \le b \le 1$, where $L = L_1/(L_0 + L_1)$, such that $\{\tau : V_0^*(\tau) = \rho(\tau)\} = \{\tau : 0 \le \tau \le a \text{ or } b \le \tau \le 1\}$. Therefore
$N = \min\{n \ge 0 : \tau_n(X_1, \ldots, X_n) \le a \text{ or } \tau_n(X_1, \ldots, X_n) \ge b\}.$
Typically a and b are found by approximation. 25 / 35
26 k-lookahead The previous problems were unusually tractable: in general we must approximate solutions. We can approximate by truncating to a finite horizon problem, but this doesn't avoid combinatorial explosion in the value function tables. Better is the k-stage lookahead rule, which truncates dynamically:
$N_k = \min\{n \ge 0 : Y_n \ge V_n^{(n+k)}\} = \min\{n \ge 0 : Y_n \ge E[V_{n+1}^{(n+k)} \mid X_{1:n} = x_{1:n}]\}.$
Simplest is the 1-sla, the myopic rule:
$N_1 = \min\{n \ge 0 : Y_n \ge E[Y_{n+1} \mid X_{1:n} = x_{1:n}]\}.$ 26 / 35
27 k-lookahead If an optimal rule exists, and if k-sla tells you to continue, then it is optimal to continue, as there is at least one rule that does better continuing than stopping now. Therefore, instead of using 2-sla continuously we can use the 1-sla until it tells us to stop, then use the 2-sla, etc. 27 / 35
28 Monotone stopping rule problems A stopping problem is monotone if $Y_n \ge E[Y_{n+1} \mid X_{1:n} = x_{1:n}]$ implies $Y_{n+1} \ge E[Y_{n+2} \mid X_{1:n+1} = x_{1:n+1}]$ a.s. Equivalently, when the 1-sla calls to stop at time n, it also calls to stop at time n + 1, irrespective of $X_{n+1}$ (a.s.). Theorem: in a finite-horizon monotone stopping problem the 1-sla is optimal. This extends to the infinite horizon case under a reasonable regularity condition. 28 / 35
29 Example: proofreading (bug fixing) The number of errors M (e.g. misprints) and the numbers of errors detected on successive proofreadings $X_1, X_2, \ldots$ have some joint distribution such that $X_j \ge 0$, $\sum_j X_j \le M$ a.s., and $E[M] < \infty$. The cost for stopping after n proofreadings is
$Y_n = n c_1 + (M - \sum_{j=1}^{n} X_j) c_2,$
where $c_1 > 0$ is the cost of a proofreading, and $c_2 > 0$ the cost of a remaining error. 29 / 35
30 Example: proofreading (bug fixing) Let's compute the 1-sla (here minimising cost). We find
$N_1 = \min\{n \ge 0 : E[X_{n+1} \mid X_1, \ldots, X_n] \le c_1/c_2\}.$
One instance where this problem is monotone: M has a Poisson distribution with known mean $\lambda$, and $X_{n+1}$ has a binomial distribution with sample size $M - \sum_{j=1}^{n} X_j$ and success probability p. We find
$N_1 = \min\{n \ge 0 : \lambda p (1 - p)^n \le c_1/c_2\}.$ 30 / 35
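The Poisson/binomial 1-sla above is a one-line scan. A sketch (names are illustrative; assumes $0 < p < 1$ and $c_1, c_2 > 0$, so the loop terminates):

```python
def proofread_stop(lam, p, c1, c2):
    """1-sla for the Poisson(lam)/binomial(p) proofreading model:
    N_1 = min{n >= 0 : lam * p * (1 - p)**n <= c1 / c2}."""
    n = 0
    while lam * p * (1 - p) ** n > c1 / c2:
        n += 1
    return n
```

For example, with an expected 10 errors, a 50% detection rate per pass, and a proofreading pass costing half as much as a remaining error, the rule says to proofread 4 times.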
31 Example: best-choice; sum-the-odds Observations are independent random variables $X_i$ taking values 0 and 1, failure and success. Our goal is to stop on the last success. Denote by $p_n = P(X_n = 1)$ the nth success probability. Since we'd never stop before a later certain success, we assume $p_i < 1$ for $i > 1$. If we stop at stage n our payoff, the probability that we are on the last success, is
$Y_n = X_n \prod_{i=n+1}^{\infty} (1 - p_i).$
We assume $\sum_i p_i < \infty$ so that by the Borel-Cantelli lemma there are finitely many successes a.s. 31 / 35
32 Example: best-choice; sum-the-odds Secretary problem. The event that the ith candidate is relatively best is independent of the others, with probability 1/i. Therefore the secretary problem is an instance of the above with $p_i = (1/i) \cdot 1[i \le n]$. The secretary problem is not monotone, and the 1-sla is not optimal: continuing past a relatively best option, the next candidate may not be relatively best, which is obviously bad. To fix this we only allow stopping on successes. Pretend observations occur on successes, at times $T_1, T_2, \ldots$. Let K be the time of the last success, or $\infty$ if none occur. The expected payoff at time n is
$Y_n = P(K = t \mid T_n = t) = \prod_{i=t+1}^{\infty} (1 - p_i)$ if $t < \infty$, 0 otherwise. 32 / 35
33 Example: best-choice; sum-the-odds If we continue at time $T_n = t < \infty$ and stop at $T_{n+1}$, we expect to receive
$P(K = T_{n+1} \mid T_1, \ldots, T_n = t) = p_{t+1} \prod_{i=t+2}^{\infty} (1 - p_i) + (1 - p_{t+1}) p_{t+2} \prod_{i=t+3}^{\infty} (1 - p_i) + \cdots = \left[\prod_{i=t+1}^{\infty} (1 - p_i)\right] \sum_{i=t+1}^{\infty} \frac{p_i}{1 - p_i}.$ 33 / 35
34 Example: best-choice; sum-the-odds The 1-sla is therefore
$N_1 = \min\{n \ge 0 : \sum_{i=T_n+1}^{\infty} \frac{p_i}{1 - p_i} \le 1\} = \min\{t \ge 1 : X_t = 1 \text{ and } \sum_{i=t+1}^{\infty} \frac{p_i}{1 - p_i} \le 1\}.$
This rule stops on a success at time t if the sum of the odds for future times is at most 1. The 1-sla is optimal: defining $r_i = p_i/(1 - p_i)$, the problem is monotone since $\sum_{i=T_n+1}^{\infty} r_i \le 1$ implies the same for $T_{n+1}$, and the unstated regularity conditions hold. For the secretary problem the stopping rule reduces to the rule we computed before:
$N_1 = \min\{t \ge 1 : X_t = 1 \text{ and } \sum_{i=t+1}^{n} \frac{1}{i-1} \le 1\}.$ 34 / 35
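For a finite horizon the sum-the-odds threshold can be sketched as follows (names are mine; $p_1 = 1$, as in the secretary problem, is handled by treating its odds as infinite):

```python
def first_stop_time(p):
    """Smallest (1-indexed) time t with sum_{i > t} p_i/(1-p_i) <= 1;
    the 1-sla stops on the first success at or after this time."""
    n = len(p)
    odds = [q / (1 - q) if q < 1 else float("inf") for q in p]
    tail, best = 0.0, n
    for t in range(n, 0, -1):   # tail = sum of odds for times t+1..n
        if tail <= 1:
            best = t
        tail += odds[t - 1]
    return best
```

For the secretary problem with n = 10 ($p_i = 1/i$, so the odds are $1/(i-1)$) this gives 4, matching the earlier threshold $r_1 = 4$: take the first relatively best candidate from stage 4 on.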
35 Summary Defined optimal stopping problems. Examples: house-selling, change point, search for species, sequential statistical decision problems. Introduced finite horizon problems, Markov models, and monotone problems. Solved the secretary problem, its monotone utility extension, house selling with and without recall, testing simple statistical hypotheses, and stopping on the last success (sum-the-odds). Mentioned proofreading/bug-fixing. Discussed general existence of optimal rules, the optimality equation, and the principle of optimality. Covered a prophet inequality. To approximate solutions, we described k-lookahead; the 1-sla is optimal if the problem is monotone. 35 / 35
Reinforcement Learning Basic idea: Receive feedback in the form of rewards Agent s utility is defined by the reward function Must (learn to) act so as to maximize expected rewards Grid World The agent
More informationWeb Appendix: Proofs and extensions.
B eb Appendix: Proofs and extensions. B.1 Proofs of results about block correlated markets. This subsection provides proofs for Propositions A1, A2, A3 and A4, and the proof of Lemma A1. Proof of Proposition
More informationChapter 6: Risky Securities and Utility Theory
Chapter 6: Risky Securities and Utility Theory Topics 1. Principle of Expected Return 2. St. Petersburg Paradox 3. Utility Theory 4. Principle of Expected Utility 5. The Certainty Equivalent 6. Utility
More informationOptimizing Portfolios
Optimizing Portfolios An Undergraduate Introduction to Financial Mathematics J. Robert Buchanan 2010 Introduction Investors may wish to adjust the allocation of financial resources including a mixture
More informationPakes (1986): Patents as Options: Some Estimates of the Value of Holding European Patent Stocks
Pakes (1986): Patents as Options: Some Estimates of the Value of Holding European Patent Stocks Spring 2009 Main question: How much are patents worth? Answering this question is important, because it helps
More informationHomework Assignments
Homework Assignments Week 1 (p. 57) #4.1, 4., 4.3 Week (pp 58 6) #4.5, 4.6, 4.8(a), 4.13, 4.0, 4.6(b), 4.8, 4.31, 4.34 Week 3 (pp 15 19) #1.9, 1.1, 1.13, 1.15, 1.18 (pp 9 31) #.,.6,.9 Week 4 (pp 36 37)
More informationSTAT/MATH 395 PROBABILITY II
STAT/MATH 395 PROBABILITY II Distribution of Random Samples & Limit Theorems Néhémy Lim University of Washington Winter 2017 Outline Distribution of i.i.d. Samples Convergence of random variables The Laws
More informationCS 361: Probability & Statistics
March 12, 2018 CS 361: Probability & Statistics Inference Binomial likelihood: Example Suppose we have a coin with an unknown probability of heads. We flip the coin 10 times and observe 2 heads. What can
More information5. In fact, any function of a random variable is also a random variable
Random Variables - Class 11 October 14, 2012 Debdeep Pati 1 Random variables 1.1 Expectation of a function of a random variable 1. Expectation of a function of a random variable 2. We know E(X) = x xp(x)
More informationCONVERGENCE OF OPTION REWARDS FOR MARKOV TYPE PRICE PROCESSES MODULATED BY STOCHASTIC INDICES
CONVERGENCE OF OPTION REWARDS FOR MARKOV TYPE PRICE PROCESSES MODULATED BY STOCHASTIC INDICES D. S. SILVESTROV, H. JÖNSSON, AND F. STENBERG Abstract. A general price process represented by a two-component
More informationOptimal Policies for Distributed Data Aggregation in Wireless Sensor Networks
Optimal Policies for Distributed Data Aggregation in Wireless Sensor Networks Hussein Abouzeid Department of Electrical Computer and Systems Engineering Rensselaer Polytechnic Institute abouzeid@ecse.rpi.edu
More informationOn the Optimality of a Family of Binary Trees Techical Report TR
On the Optimality of a Family of Binary Trees Techical Report TR-011101-1 Dana Vrajitoru and William Knight Indiana University South Bend Department of Computer and Information Sciences Abstract In this
More informationAdditional questions for chapter 3
Additional questions for chapter 3 1. Let ξ 1, ξ 2,... be independent and identically distributed with φθ) = IEexp{θξ 1 })
More informationAMH4 - ADVANCED OPTION PRICING. Contents
AMH4 - ADVANCED OPTION PRICING ANDREW TULLOCH Contents 1. Theory of Option Pricing 2 2. Black-Scholes PDE Method 4 3. Martingale method 4 4. Monte Carlo methods 5 4.1. Method of antithetic variances 5
More informationAdaptive Experiments for Policy Choice. March 8, 2019
Adaptive Experiments for Policy Choice Maximilian Kasy Anja Sautmann March 8, 2019 Introduction The goal of many experiments is to inform policy choices: 1. Job search assistance for refugees: Treatments:
More informationRevenue Management with Forward-Looking Buyers
Revenue Management with Forward-Looking Buyers Posted Prices and Fire-sales Simon Board Andy Skrzypacz UCLA Stanford June 4, 2013 The Problem Seller owns K units of a good Seller has T periods to sell
More informationIntroduction to Fall 2007 Artificial Intelligence Final Exam
NAME: SID#: Login: Sec: 1 CS 188 Introduction to Fall 2007 Artificial Intelligence Final Exam You have 180 minutes. The exam is closed book, closed notes except a two-page crib sheet, basic calculators
More informationReview for Final Exam Spring 2014 Jeremy Orloff and Jonathan Bloom
Review for Final Exam 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom THANK YOU!!!! JON!! PETER!! RUTHI!! ERIKA!! ALL OF YOU!!!! Probability Counting Sets Inclusion-exclusion principle Rule of product
More informationA Decentralized Learning Equilibrium
Paper to be presented at the DRUID Society Conference 2014, CBS, Copenhagen, June 16-18 A Decentralized Learning Equilibrium Andreas Blume University of Arizona Economics ablume@email.arizona.edu April
More informationPOMDPs: Partially Observable Markov Decision Processes Advanced AI
POMDPs: Partially Observable Markov Decision Processes Advanced AI Wolfram Burgard Types of Planning Problems Classical Planning State observable Action Model Deterministic, accurate MDPs observable stochastic
More informationMonte-Carlo Planning: Introduction and Bandit Basics. Alan Fern
Monte-Carlo Planning: Introduction and Bandit Basics Alan Fern 1 Large Worlds We have considered basic model-based planning algorithms Model-based planning: assumes MDP model is available Methods we learned
More informationGame Theory: Normal Form Games
Game Theory: Normal Form Games Michael Levet June 23, 2016 1 Introduction Game Theory is a mathematical field that studies how rational agents make decisions in both competitive and cooperative situations.
More informationLecture 3: Review of mathematical finance and derivative pricing models
Lecture 3: Review of mathematical finance and derivative pricing models Xiaoguang Wang STAT 598W January 21th, 2014 (STAT 598W) Lecture 3 1 / 51 Outline 1 Some model independent definitions and principals
More informationProbability without Measure!
Probability without Measure! Mark Saroufim University of California San Diego msaroufi@cs.ucsd.edu February 18, 2014 Mark Saroufim (UCSD) It s only a Game! February 18, 2014 1 / 25 Overview 1 History of
More informationOptimal stopping problems for a Brownian motion with a disorder on a finite interval
Optimal stopping problems for a Brownian motion with a disorder on a finite interval A. N. Shiryaev M. V. Zhitlukhin arxiv:1212.379v1 [math.st] 15 Dec 212 December 18, 212 Abstract We consider optimal
More informationSublinear Time Algorithms Oct 19, Lecture 1
0368.416701 Sublinear Time Algorithms Oct 19, 2009 Lecturer: Ronitt Rubinfeld Lecture 1 Scribe: Daniel Shahaf 1 Sublinear-time algorithms: motivation Twenty years ago, there was practically no investigation
More informationOptimal Investment for Worst-Case Crash Scenarios
Optimal Investment for Worst-Case Crash Scenarios A Martingale Approach Frank Thomas Seifried Department of Mathematics, University of Kaiserslautern June 23, 2010 (Bachelier 2010) Worst-Case Portfolio
More informationAre the Azéma-Yor processes truly remarkable?
Are the Azéma-Yor processes truly remarkable? Jan Obłój j.obloj@imperial.ac.uk based on joint works with L. Carraro, N. El Karoui, A. Meziou and M. Yor Welsh Probability Seminar, 17 Jan 28 Are the Azéma-Yor
More informationDynamic and Stochastic Knapsack-Type Models for Foreclosed Housing Acquisition and Redevelopment
Proceedings of the 2012 International Conference on Industrial Engineering and Operations Management Istanbul, Turkey, July 3-6, 2012 Dynamic and Stochastic Knapsack-Type Models for Foreclosed Housing
More informationOutline. 1 Introduction. 2 Algorithms. 3 Examples. Algorithm 1 General coordinate minimization framework. 1: Choose x 0 R n and set k 0.
Outline Coordinate Minimization Daniel P. Robinson Department of Applied Mathematics and Statistics Johns Hopkins University November 27, 208 Introduction 2 Algorithms Cyclic order with exact minimization
More informationMultiple Optimal Stopping Problems and Lookback Options
Multiple Optimal Stopping Problems and Lookback Options Yue Kuen KWOK Department of Mathematics Hong Kong University of Science & Technology Hong Kong, China web page: http://www.math.ust.hk/ maykwok/
More informationLecture 10: Point Estimation
Lecture 10: Point Estimation MSU-STT-351-Sum-17B (P. Vellaisamy: MSU-STT-351-Sum-17B) Probability & Statistics for Engineers 1 / 31 Basic Concepts of Point Estimation A point estimate of a parameter θ,
More informationInformation Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete)
Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Ying Chen Hülya Eraslan March 25, 2016 Abstract We analyze a dynamic model of judicial decision
More informationDynamic Portfolio Execution Detailed Proofs
Dynamic Portfolio Execution Detailed Proofs Gerry Tsoukalas, Jiang Wang, Kay Giesecke March 16, 2014 1 Proofs Lemma 1 (Temporary Price Impact) A buy order of size x being executed against i s ask-side
More information