STP Problem Set 2 Solutions


Adrian Hall
5 years ago

3.2.) Suppose that the inventory model is modified so that orders may be backlogged with a cost of $b(u)$ when $u$ units are backlogged for one period. We assume that revenue is received at the end of the period in which orders are placed and that backlogging costs are incurred at the beginning of each month.

a.) The modified model can be formulated as an MDP as follows:

Decision epochs: $T = \{1, 2, \dots, N\}$.

States: $S = \{\dots, -1, 0, 1, \dots, M\}$, where $s_t < 0$ if there are unfilled orders at the beginning of the $t$-th period and $s_t > 0$ if there is stock left over from the preceding period.

Actions: $A_s = \{0, 1, \dots, M - s\}$, assuming that there is no constraint on the amount of product that can be ordered to fill backlogged orders, but that the warehouse capacity limits the amount of stock that can be held until the end of the month.

Transition probabilities: If $p_j$, $j \ge 0$, is the probability that $j$ new orders are received during a period, then the transition probabilities are
\[
p_t(j \mid s, a) =
\begin{cases}
p_{s+a-j} & \text{if } j \le s + a \\
0 & \text{if } j > s + a.
\end{cases}
\]

If we assume that newly ordered inventory is used to fill any backlogged orders as soon as it arrives at the beginning of each period, then
\[
r_t(s, a) = \bar{f} - O(a) - h([s + a]^+) - b([s + a]^-),
\]
where $[x]^+ = x \vee 0$ is the positive part of $x$, $[x]^- = (-x) \vee 0$ is the negative part of $x$, and $\bar{f} = \sum_{j=1}^{\infty} p_j f(j)$.

c.) If we assume that payment for an order is not received until the end of the month in which it is filled, then the rewards must be modified so that
\[
r_t(s, a) = F(s, a) - O(a) - h([s + a]^+) - b([s + a]^-),
\]
where the expected revenue is equal to
\[
F(s, a) =
\begin{cases}
\sum_{j=0}^{s+a-1} p_j f(j) + f(s + a) \sum_{j=s+a}^{\infty} p_j & \text{if } s + a \ge s \ge 0 \\
\sum_{j=0}^{s+a-1} p_j f(-s + j) + f(a) \sum_{j=s+a}^{\infty} p_j & \text{if } s + a \ge 0 > s \\
f(a) & \text{if } s + a < 0,
\end{cases}
\]
with $f(u)$ equal to the actual revenue when $u$ units are sold.
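The transition probabilities and the reward in part a.) can be written out in a few lines of Python. This is only a minimal sketch: the function names, the finite demand distribution `demand_pmf` (with `demand_pmf[d]` playing the role of $p_d$), and the cost functions passed in as `f`, `O`, `h` and `b` are illustrative assumptions, not part of the original solution.

```python
from typing import Callable, Sequence


def transition_prob(j: int, s: int, a: int, demand_pmf: Sequence[float]) -> float:
    """p_t(j | s, a): the next state is j = s + a - d when d new orders arrive."""
    d = s + a - j  # demand needed to move from s + a down to j
    if d < 0 or d >= len(demand_pmf):
        return 0.0  # j > s + a, or demand outside the assumed finite support
    return demand_pmf[d]


def reward(
    s: int,
    a: int,
    demand_pmf: Sequence[float],
    f: Callable[[int], float],  # revenue from u units ordered
    O: Callable[[int], float],  # ordering cost
    h: Callable[[int], float],  # holding cost
    b: Callable[[int], float],  # backlogging cost
) -> float:
    """r_t(s, a) = f_bar - O(a) - h([s + a]^+) - b([s + a]^-)."""
    # f_bar = sum over j >= 1 of p_j f(j), truncated to the assumed support.
    f_bar = sum(p * f(d) for d, p in enumerate(demand_pmf) if d >= 1)
    y = s + a  # inventory position after the order arrives
    return f_bar - O(a) - h(max(y, 0)) - b(max(-y, 0))
```

Negative states are handled without special cases: a backlog of three units is simply `s = -3`, and `max(-y, 0)` picks out the units that remain backlogged after the order arrives.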

3.8) At each round of a game of chance, a player may either pay a fee of $C$ units and play the game or else quit and depart with his current fortune. Upon paying the fee, he receives a random payoff with (discrete) distribution $F(\cdot)$.

a.) Here is one formulation as an optional stopping problem:

Decision epochs: $T = \{1, 2, \dots, N\}$.

States: $S = \mathbb{R} \cup \{\Delta\}$, where $s_t \in \mathbb{R}$ are the player's cumulative winnings less costs at the start of the $t$-th period if the player chooses to play the game and $s_t = \Delta$ if the player has chosen to quit. Here I am assuming that the gambler can run up an arbitrarily large debt.

Actions:
\[
A_s =
\begin{cases}
\{P, Q\} & \text{if } s \in \mathbb{R} \\
\{Q\} & \text{if } s = \Delta,
\end{cases}
\]
where $P$ indicates that the player has chosen to play the game and $Q$ indicates that the player has chosen to quit.

Transition probabilities:
\[
p_t(j \mid s, a) =
\begin{cases}
F(j - s + C) & \text{if } s, j \in \mathbb{R} \text{ and } a = P \\
1 & \text{if } j = \Delta \text{ and } a = Q.
\end{cases}
\]

Rewards:
\[
r_t(s, a) =
\begin{cases}
0 & \text{if } s \in \mathbb{R} \text{ and } a = P \\
s & \text{if } s \in \mathbb{R} \text{ and } a = Q \\
0 & \text{if } s = \Delta \text{ and } a = Q.
\end{cases}
\]
This formulation assumes that the player does not receive the reward until he quits the game. However, other choices of the rewards are also possible, e.g.,
\[
r_t(s, a) =
\begin{cases}
\bar{F} - C & \text{if } a = P \\
0 & \text{if } a = Q,
\end{cases}
\]
where $\bar{F}$ is the mean of the payoff distribution.

b.) If $C = 1$ and $F(2) = p$ and $F(0) = 1 - p$, where $p \in [0, 1]$, then $\bar{F} = 2p$ and the transition probabilities are
\[
p_t(j \mid s, a) =
\begin{cases}
p & \text{if } j = s + 1 \in \mathbb{R} \text{ and } a = P \\
1 - p & \text{if } j = s - 1 \in \mathbb{R} \text{ and } a = P \\
1 & \text{if } j = \Delta \text{ and } a = Q.
\end{cases}
\]
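To see what the formulation in part b.) implies, the stopping problem can be solved by backward induction. The sketch below is an illustration under stated assumptions rather than part of the original solution: the function name `stopping_value` is hypothetical, the fortune is restricted to integers, and the player is assumed to be forced to quit (collecting the current fortune) at the final epoch `horizon`. Exact fractions are used so that the values can be compared without rounding.

```python
from fractions import Fraction
from functools import lru_cache


def stopping_value(s: int, t: int, horizon: int, p: Fraction) -> Fraction:
    """Optimal expected fortune at departure, starting from fortune s at epoch t.

    Assumes C = 1 and a payoff of 2 with probability p (0 otherwise), so
    playing moves the fortune to s + 1 w.p. p and to s - 1 w.p. 1 - p.
    """

    @lru_cache(maxsize=None)
    def u(state: int, epoch: int) -> Fraction:
        if epoch == horizon:
            # Assumption: at the last epoch the player must quit and collect s.
            return Fraction(state)
        quit_now = Fraction(state)
        play = p * u(state + 1, epoch + 1) + (1 - p) * u(state - 1, epoch + 1)
        return max(quit_now, play)

    return u(s, t)
```

Under these assumptions the recursion reproduces the intuitive rule: each play changes the expected fortune by $2p - 1$, so the value is $s + (N - t)\max(2p - 1, 0)$ and it is optimal to keep playing exactly when $p > 1/2$.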

3.11) We reconsider the secretary problem when the aim is to minimize the expected rank of the candidate who receives the offer. This can be formulated as an optional stopping problem.

Decision epochs: $T = \{1, 2, \dots, N\}$.

States: $S = \{1, 2, \dots, N\} \cup \{\Delta\}$. Here $s_t \in \{1, \dots, N\}$ specifies the rank of the $t$-th candidate relative to candidates $1, \dots, t - 1$ and we require $s_1 = 1$.

Actions:
\[
A_s =
\begin{cases}
\{C, Q\} & \text{if } 1 \le s \le N \\
\{Q\} & \text{if } s = \Delta,
\end{cases}
\]
where $C$ indicates that the employer chooses to interview the next candidate and $Q$ indicates that they choose to hire the current candidate.

Transition probabilities:
\[
p_t(j \mid s, a) =
\begin{cases}
\frac{1}{t+1} & \text{if } 1 \le s \le t,\ 1 \le j \le t + 1, \text{ and } a = C \\
1 & \text{if } j = \Delta \text{ and } a = Q \\
0 & \text{otherwise.}
\end{cases}
\]

Rewards:
\[
r_t(s, a) =
\begin{cases}
0 & \text{if } a = C \\
\Phi(s, t) & \text{if } a = Q \text{ and } s \in \{1, \dots, N\} \\
0 & \text{if } s = \Delta,
\end{cases}
\]
where
\[
\Phi(s, t) = \sum_{i=1}^{N} \frac{i\,t}{N} \cdot \frac{\binom{i-1}{s-1}\binom{N-i}{t-s}}{\binom{N-1}{t-1}}
\]
is the expected absolute rank of the $t$-th candidate given that their relative rank amongst the first $t$ candidates is $s \in \{1, \dots, t\}$.

To derive the formula for $\Phi(s, t)$, let $X_t$ denote the relative rank of the $t$-th candidate amongst the first $t$ candidates and let $Z_t$ be the absolute rank of this candidate. Then, by using Bayes' formula, the conditional probability mass function of $Z_t$ given $X_t = s \in \{1, \dots, t\}$ is equal to
\[
P(Z_t = i \mid X_t = s) = \frac{P(Z_t = i)\, P(X_t = s \mid Z_t = i)}{P(X_t = s)} = \frac{1/N}{1/t} \cdot \frac{\binom{i-1}{s-1}\binom{N-i}{t-s}}{\binom{N-1}{t-1}},
\]
since
\[
P(Z_t = i) = \frac{1}{N}, \quad 1 \le i \le N, \qquad P(X_t = s) = \frac{1}{t}, \quad 1 \le s \le t,
\]
and $P(X_t = s \mid Z_t = i)$ is a hypergeometric probability.
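The formula for $\Phi(s, t)$ is easy to evaluate exactly, which gives a quick sanity check on the reconstruction. The sketch below is an illustration, not part of the original solution; the function name `expected_absolute_rank` is hypothetical. It also compares the sum against the closed form $s(N+1)/(t+1)$, a standard identity for this problem that the solution itself does not state.

```python
from fractions import Fraction
from math import comb


def expected_absolute_rank(s: int, t: int, n: int) -> Fraction:
    """Phi(s, t): expected absolute rank of candidate t among n candidates,
    given relative rank s among the first t (requires 1 <= s <= t <= n)."""
    total = Fraction(0)
    for i in range(1, n + 1):
        # P(X_t = s | Z_t = i): s - 1 of the earlier t - 1 candidates rank
        # better than i and t - s of them rank worse.
        likelihood = Fraction(
            comb(i - 1, s - 1) * comb(n - i, t - s), comb(n - 1, t - 1)
        )
        # Bayes: P(Z_t = i | X_t = s) = (1/n) * likelihood / (1/t).
        total += i * Fraction(t, n) * likelihood
    return total


if __name__ == "__main__":
    n = 10
    for t in range(1, n + 1):
        for s in range(1, t + 1):
            assert expected_absolute_rank(s, t, n) == Fraction(s * (n + 1), t + 1)
    print(expected_absolute_rank(1, 2, 3))
```

Because `math.comb` returns 0 whenever the lower index exceeds the upper one, the terms with impossible absolute ranks drop out of the sum without any explicit bounds on `i`.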

The likelihood of the relative rank given the absolute rank is
\[
P(X_t = s \mid Z_t = i) = P\big(\text{the first } t - 1 \text{ objects sampled include } s - 1 \text{ objects with absolute rank} < i \text{ and } t - s \text{ objects with absolute rank} > i\big) = \frac{\binom{i-1}{s-1}\binom{N-i}{t-s}}{\binom{N-1}{t-1}},
\]
the last expression being the probability that a hypergeometric random variable with parameters $(N - 1, i - 1, t - 1)$ (population size, number of objects with absolute rank less than $i$, number of draws) is equal to $s - 1$. It follows that by choosing a stopping rule that minimizes the total expected cost, we will also be minimizing the expected rank of the candidate that is offered the position.

3.18) At the start of each week a worker receives a wage offer of $w$ units per week. He can either choose to accept employment at that wage for the entire week or he can seek alternative employment. If he decides to work during the current week, then with probability $p$ he will have the same wage offer available at the start of the next week and with probability $1 - p$ he will be laid off and be unable to seek new employment during that week. If he chooses to seek alternative employment, he will receive no wage during the current week, but will receive a wage offer of $w'$ during the next week, where $w'$ is chosen from the distribution $p_t(w' \mid w)$ and $w$ is his most recent wage offer. The utility of wage $w$ during week $t$ is $\Psi_t(w)$.

There is some ambiguity in the statement of the problem, as it is not entirely clear whether being laid off prevents the worker from seeking new employment for that week only or also prevents them from seeking alternative work for the next week. I will assume the latter, so that a worker who was laid off in week $t - 1$ must seek alternative work during week $t$, which then leads to a job offer during week $t + 1$. I will also assume that the wage offer is $w = 0$ when the worker is laid off. Of course, other interpretations are possible and these will lead to somewhat different Markov decision processes.

a.) This can be formulated as the following Markov decision process:

Decision epochs: $T = \{1, 2, \dots, N\}$, $N \le \infty$.

States: $S = [0, \infty) \times \{e, l\}$, where a state $s = (w, y)$ specifies the current wage offer $w$ and $y = e$ if the worker is either employed or seeking employment and $y = l$ if the worker has just been laid off.

Actions: $A_{(w,e)} = \{W, S\}$ if $w > 0$, $A_{(0,e)} = \{S\}$, and $A_{(w,l)} = \{N\}$, where $a = W$ if the worker chooses to work, $a = S$ if the worker chooses to seek alternative employment, and $a = N$ if the worker has just been laid off, in which case they can neither work nor seek employment for the duration of that week.

Rewards:
\[
r_t((w, y), a) =
\begin{cases}
\Psi_t(w) & \text{if } w > 0,\ y = e \text{ and } a = W \\
0 & \text{otherwise.}
\end{cases}
\]

Transition probabilities:
\[
p_t\big((w', y') \mid (w, y), a\big) =
\begin{cases}
p & \text{if } w' = w,\ y' = y = e \text{ and } a = W \\
1 - p & \text{if } w' = 0,\ w > 0,\ y' = l,\ y = e \text{ and } a = W \\
1 & \text{if } w' = w = 0,\ y' = e,\ y = l \text{ and } a = N \\
p_t(w' \mid w) & \text{if } y' = y = e \text{ and } a = S \\
0 & \text{otherwise.}
\end{cases}
\]

b.) Alternatively, this model can be formulated as a restless bandit. Let $S_1 = [0, \infty) \cup \{l\}$ and $S_2 = [0, \infty)$ and suppose that the transition probabilities are given by
\[
p_t\big((s_1', s_2') \mid (s_1, s_2), a\big) =
\begin{cases}
p & \text{if } s_1' = s_1 = s_2' = s_2 \text{ and } a = 1 \\
1 - p & \text{if } s_1' = l,\ s_2' = 0,\ s_1 = s_2 \ge 0 \text{ and } a = 1 \\
1 & \text{if } s_1' = s_2' = s_2 = 0,\ s_1 = l, \text{ and } a \in \{1, 2\} \\
p_t(s_2' \mid s_2) & \text{if } s_1' = s_2',\ s_1 \ne l,\ s_2 \ge 0 \text{ and } a = 2 \\
0 & \text{otherwise,}
\end{cases}
\]
while the rewards are
\[
r_t((s_1, s_2), a) =
\begin{cases}
\Psi_t(s_1) & \text{if } a = 1 \text{ and } s_1 \ne l \\
0 & \text{otherwise.}
\end{cases}
\]
As far as I can tell, this model cannot be formulated as a classical bandit problem, in which only the chosen process advances in each decision epoch.

3.26) The lazy, adaptable lion (Mangel and Clark, 1986). This problem is concerned with the effect of group size on the trade-off between hunting success and the amount of meat available to the members of a cooperatively hunting group. Larger group sizes increase the probability of a successful hunt, but result in less meat per individual. The following is a formulation of this problem as a Markov decision process.

Decision epochs: $T = \{1, 2, \dots, T\}$.

States: $S = [0, 30]$, where $s$ specifies the energy reserves of an individual lion, measured in units of kilograms of meat. We assume that these are limited by gut capacity (30 kg) and thus we are ignoring fat reserves. We also assume that the lion is dead if $s = 0$.

Actions: $A_s = \{0, 1, 2, 3, 4, 5, 6\}$ if $s \ge 0.5$ and $A_s = \{0\}$ if $s < 0.5$. Here $a = 0$ means that the lion either chooses not to or is unable to hunt, $a = 1$ means that they hunt on their own, and $a = n \in \{2, \dots, 6\}$ means that they hunt in a group of size $n$. Since a hunt consumes 0.5 kg of energy, I assume that lions with energy reserves that are less than this are physically unable to hunt.

On average, prey yield 164 kg of meat and this is shared equally between the members of a hunting party. Since the aim is to maximize the probability that the lion survives for $T$ days, we set $r_t(s, a) = 0$ for $t = 1, \dots, T - 1$ for all $s$ and $a$, and $r_T(s) = 1$ if $s > 0$ and $r_T(0) = 0$.

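Transition probabilities:
\[
p_t(j \mid s, a) =
\begin{cases}
1 & \text{if } j = (s - 6)^+ \text{ and } a = 0 \\
p(a) & \text{if } j = \min\{30, (s - 6.5 + 164/a)^+\} \text{ and } a > 0 \\
1 - p(a) & \text{if } j = (s - 6.5)^+ \text{ and } a > 0 \\
0 & \text{otherwise,}
\end{cases}
\]
where $p(1) = 0.15$, $p(2) = 0.33$, $p(3) = 0.37$, $p(4) = 0.40$, $p(5) = 0.42$ and $p(6) =$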
