Heckmeck am Bratwurmeck or How to grill the maximum number of worms

Roland C. Seydel, 24/05/22

Overview
1 Introducing the dice game
    The basic rules
    Understanding the strategy
2 Computing the optimal strategy
    Value function and state space
    Computing the value function
3 The results

1 Introducing the dice game

The rules (I)
Goal of the game: Get the most worms by throwing high (spot) numbers.
1 There are eight dice with six sides each, going from 1 to 6, where a worm is shown instead of the 6.
2 You can throw the remaining dice as often as you want; after each throw you choose a number which you haven't chosen yet and put aside all the dice showing this number.
3 The numbers of the dice put aside are added, where a worm (6) counts as a 5. No 6 put aside → 0 points!
You can take a piece from the shelf with number ≤ your points, and put it on top of your pile.

The rules (II)
The winner is the player with the most worms in his pile.
Several additional rules which matter to us:
- Not able to put aside a new number? → 0 points!
- You can also take a piece from the top of your co-players' piles, if you match the number exactly.
- If your points are not sufficient to pick a piece, you lose the piece on top of your own pile.
Rules we ignore for now:
- Lost pieces are put back on the shelf; the piece with the largest available number is removed from the shelf.
- Your co-players can also take pieces from you when it's their turn!
- The game is over when the shelf is empty.

Example game
[Figure: example game]

Optimal strategy: Easy to tell
- Is selecting four 1s a good idea?
- Are two 5s better, or two 6s?
- Are two 4s better, or two 5s?
- If I could lose four worms, should I bet or rather risk nothing?

Optimal strategy: Difficult to tell
- Are three 5s better, or two 6s? And what about four 5s?
- Could it be optimal to select one or more 3s in the first throw?
- Should I take the five 5s, or is it too dangerous?
- What is the expected / most likely outcome in terms of worms?
- Should I stop with three 5s and two 6s, or continue?

Parallels to option pricing
- Early exercise: The player can decide when to stop and exercise. → American option
- Optimal control: At each point in time, a decision has to be taken. → Swing option
- Knock out: If there is no 6, or you are not able to put aside a new number, your points are 0. → Barrier option
Conclusion: Compute the optimal solution by option pricing methods?

2 Computing the optimal strategy

Differences to option pricing

Option pricing                      Heckmeck
Continuous state space              Discrete state space
Continuous time                     Only up to 8 times
Typically 1d state space            Up to 8d state space
Same state space in time            Changing state space
Knockout happens at barrier line    No clear line for knockout
Random noise is added to process    Randomness is not added to state
No decision needed                  Each throw needs a decision (control)

Assumptions
1 The player wants to maximize the expected number of worms in his turn.
2 Reactions of other players are not anticipated (single-player optimization).
3 It is not the total outcome of the game that is optimized, but one single turn of up to 6 throws!

Value function
Situation: An array of pieces available on the shelf, on other players' piles and on your own pile is called a situation.
State space: A vector $x \in \{1,\dots,6\}^t$ of already selected dice at time $t \in \{0,\dots,8\}$ is called the state at time t. The state space for time t is the ensemble of all possible selections with $x \in \{1,\dots,6\}^t$. We consider only sorted states with notation x = {...}.
Value function: For a particular situation, the value function v(x) is the expected number of worms (on pieces), assuming that starting from a state x the expectation-optimal decisions will be taken.
Caveat: From now on, worms are only worms on pieces, not on dice!

Value function: payoff
(Worm) payoff: For a particular situation let w(n) be the number of worms you would get or lose for a sum of points n. Then the (worm) payoff p(x) of a state x is defined by the number of worms you would get or lose upon termination. In formulas,
$$p(x) = w(n(x)) = w\left(\begin{cases} \sum_i \min(x_i, 5) & 6 \in x \\ 0 & \text{else} \end{cases}\right).$$
Each situation has a worst payoff w(0), equal to the negative number of worms on top of your pile.
Examples ({...} denotes a sorted vector!):
p({5, 5, 5, 5, 5, 5, 5, 5}) = w(0)
p({1, 2, 3, 4, 5, 6, 6, 6}) = 3
p({1, 5, 5, 5, 6}) = 1
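
The payoff can be sketched in a few lines of Python (not the MATLAB of the original implementation). This is a minimal sketch under stated assumptions: the worm counts per piece (1 worm on 21 to 24, 2 on 25 to 28, 3 on 29 to 32, 4 on 33 to 36) are taken from the standard game and are not given on the slide; the situation is reduced to the shelf plus the top piece of your own pile, so other players' piles are ignored; and the piece taken is the largest shelf number not exceeding the dice sum. The names `worms_on_piece`, `dice_sum`, `payoff`, `shelf` and `own_top` are illustrative only.

```python
def worms_on_piece(number):
    """Worms printed on a piece (assumed standard values: 21-24 -> 1, ..., 33-36 -> 4)."""
    return (number - 21) // 4 + 1


def dice_sum(state):
    """n(x): sum of the selected dice with a worm (6) counted as 5, or 0 without a worm."""
    dice = list(state)
    if 6 not in dice:
        return 0
    return sum(min(d, 5) for d in dice)


def payoff(state, shelf=range(21, 37), own_top=None):
    """p(x) = w(n(x)) for a simplified situation (shelf and own pile only).

    own_top is the number of the piece on top of your own pile, or None
    if the pile is empty (an assumption of this sketch).
    """
    n = dice_sum(state)
    reachable = [piece for piece in shelf if piece <= n]
    if reachable:
        return worms_on_piece(max(reachable))
    # Worst payoff w(0): lose the top piece of your own pile, if there is one.
    return -worms_on_piece(own_top) if own_top is not None else 0


print(payoff([1, 2, 3, 4, 5, 6, 6, 6]))  # 30 points on a full shelf
print(payoff([1, 5, 5, 5, 6]))           # 21 points on a full shelf
```

With a full shelf and an empty pile, the two calls correspond to the second and third example above.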

Value function on intermediate states
Intermediate state: For a state $x \in \{1,\dots,6\}^t$ and a throw $y \in \{1,\dots,6\}^{8-t}$, the tuple (x, y) is called intermediate state, i.e., a state that still needs a decision.
We can also define the value function of intermediate states (x, y):
- Either there is a valid best choice $\tilde{y} \subseteq y$ (in particular $\tilde{y} \cap x = \emptyset$) such that $v((x, y)) = v(\{x, \tilde{y}\})$,
- or there is no valid choice, in which case v((x, y)) = w(0) (worst payoff).
Conclusion: It is sufficient to compute only the value function on normal states!
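
To make the notion of a valid choice concrete, here is a small Python sketch that lists the decisions available in an intermediate state (x, y). The function names `valid_choices` and `successors` are hypothetical; states are represented as sorted lists, matching the {...} notation.

```python
def valid_choices(selected, throw):
    """Spot numbers that appear in the throw y and have not been selected in x."""
    return sorted(set(throw) - set(selected))


def successors(selected, throw):
    """All normal states {x, y~} reachable from the intermediate state (x, y).

    Choosing a spot number puts aside every die in the throw showing it.
    An empty result means there is no valid choice (knock out).
    """
    throw = list(throw)
    return [
        sorted(list(selected) + [face] * throw.count(face))
        for face in valid_choices(selected, throw)
    ]


print(successors([5, 5, 5], [1, 3, 3, 4, 5]))
print(successors([5, 5, 5], [5, 5, 5, 5, 5]))
```

The value of the intermediate state is then the maximum of v over these successors, or w(0) if the list is empty.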

Value function: optimal exercise
For states in $\{1,\dots,6\}^8$, the value function is equal to the payoff (assuming full shelf), e.g.:
v({5, 5, 5, 5, 5, 5, 5, 5}) = p({5, 5, 5, 5, 5, 5, 5, 5}) = w(0)
v({1, 2, 3, 4, 5, 6, 6, 6}) = p({1, 2, 3, 4, 5, 6, 6, 6}) = 3
We call this the terminal condition. There are other types of states for which the value function is determined by the payoff function:
Optimal exercise: It is optimal to exercise in a state $x \in \{1,\dots,6\}^t$ if v(x) = p(x), i.e., the value function equals the payoff function. In this case the expected optimal number of worms can be obtained immediately.
But we do not know the value function yet...

State space in time t: example
Start at t=3 with x = {5, 5, 5}. Throw dice (random), then decide:
- y = {1, 3, 3, 4, 5}: decide between x = {1, 5, 5, 5} (t=4), x = {4, 5, 5, 5} (t=4) and x = {3, 3, 5, 5, 5} (t=5)
- y = {5, 5, 5, 6, 6}: x = {5, 5, 5, 6, 6} (t=5). Exercise?
- y = {5, 5, 5, 5, 5}: Knock out!

Finding the value function: Possible approaches
Monte Carlo simulation? This is the approach implicitly chosen by experienced players:
- Simulate dice throws on the computer
- Try different strategies in different (simulated) games
Problem: Causality is difficult to establish because of the multitude of possible strategies → Need many simulations!
Use recursive programming? The solution that is fastest to implement:
- Uses
$$v(x) = \begin{cases} \max\{p(x),\ \mathbb{E}[\max_{y \in Y} v((x, y))]\} & |x| < 8 \\ p(x) & |x| = 8 \end{cases} \qquad (1)$$
  where Y follows a multinomial (dice) distribution
- One command v({}) starts the whole calculation recursively
Problem: Takes an eternity, because most states are computed multiple times.
Backward induction! Start at terminal time and compute p(x), then go backwards in time using (1) → Compute each state only once

Needed: The multinomial distribution
Q: If you toss a coin 5 times, what is the probability of getting 5 − 4 heads and 4 tails?
A: Binomial distribution!
$$\frac{5!}{(5-4)!\,4!}\; 0.5^{5-4}\, (1-0.5)^{4}$$
Multinomial (dice) distribution: If you throw a k-sided die n times, then the probability of getting $x_i$ times the spot number i for i = 1,..., k is
$$f_M(x) = \frac{n!}{x_1! \cdots x_k!}\; p_1^{x_1} \cdots p_k^{x_k},$$
where $\sum_{i=1}^{k} p_i = 1$ and $\sum_{i=1}^{k} x_i = n$.
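
As a quick numerical check of the two formulas, the multinomial probability can be written directly in Python. This is only a sketch; `multinomial_pmf` is an illustrative name, and fair coins and fair dice are assumed in the usage lines.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """f_M(x) = n! / (x_1! ... x_k!) * p_1^x_1 ... p_k^x_k with n = sum of counts."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)  # every intermediate quotient is an integer
    return coeff * prod(p ** c for p, c in zip(probs, counts))


# Coin example: 1 head and 4 tails in 5 tosses (the binomial special case, k = 2).
print(multinomial_pmf([1, 4], [0.5, 0.5]))

# Dice example: probability that all eight dice show a worm.
print(multinomial_pmf([0, 0, 0, 0, 0, 8], [1 / 6] * 6))
```

For the coin example the coefficient is 5 and each sequence has probability 0.5^5, so the first call evaluates to 5/32.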

Backward induction
The backward induction makes the implicit equation (1) explicit by making sure that the values on the right-hand side are already computed:
Algorithm: Backward induction
1 At time t = 8, determine the terminal payoff p(x) for all possible states $x \in \{1,\dots,6\}^8$
2 Go backwards in time, $t \to t - 1$; for each t-state x do:
    1 Compute the distribution of possible dice scenarios
    2 Check for each scenario whether it is knocked out (no new spot numbers), or else to which future states the scenario could lead
    3 Take the expectation $\mathbb{E}_Y[\max_{y \in Y} v((x, y))]$ with v from future times
    4 Take the maximum with p(x) (exercise instead of throwing dice)
MATLAB implementation runs in just a few seconds on a normal PC!
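
The algorithm above can be sketched in Python as follows. This is not the author's MATLAB code but a compact stand-in under stated assumptions: a sorted state is stored as a tuple of counts per spot number (index 5 holds the worms); the worm counts per piece are the standard ones (21 to 24 carry 1 worm, up to 4 worms on 33 to 36); the situation consists only of the shelf and the top piece of your own pile; and the piece taken is the largest shelf number not exceeding the dice sum. All function and parameter names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import factorial

FACES, N_DICE = 6, 8


def worms_on_piece(number):
    return (number - 21) // 4 + 1  # assumed standard worm counts


def make_payoff(shelf, own_top):
    """Build p(x) for a situation given by the shelf and the top of the own pile."""
    worst = -worms_on_piece(own_top) if own_top is not None else 0  # w(0)
    pieces = sorted(shelf)

    def payoff(counts):
        if counts[5] == 0:  # no worm put aside: 0 points
            return worst
        points = sum(min(face, 5) * c for face, c in enumerate(counts, start=1))
        reachable = [p for p in pieces if p <= points]
        return worms_on_piece(reachable[-1]) if reachable else worst

    return payoff


def count_vectors(n):
    """All sorted outcomes of n dice, as counts per spot number."""
    for combo in combinations_with_replacement(range(FACES), n):
        counts = [0] * FACES
        for face in combo:
            counts[face] += 1
        yield tuple(counts)


def throw_probability(counts):
    """Multinomial probability of a throw of fair dice."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)
    return coeff / FACES ** n


def backward_induction(shelf=range(21, 37), own_top=None):
    payoff = make_payoff(shelf, own_top)
    worst = payoff((0,) * FACES)
    # Step 1: terminal condition, v = p for all states with 8 selected dice.
    value = {x: payoff(x) for x in count_vectors(N_DICE)}
    # Step 2: go backwards in the number t of dice already selected.
    for t in range(N_DICE - 1, -1, -1):
        throws = [(y, throw_probability(y)) for y in count_vectors(N_DICE - t)]
        for x in count_vectors(t):
            expected = 0.0
            for y, prob in throws:
                # New spot numbers lead to future states that are already valued.
                options = [
                    value[x[:f] + (y[f],) + x[f + 1:]]
                    for f in range(FACES)
                    if y[f] > 0 and x[f] == 0
                ]
                expected += prob * (max(options) if options else worst)
            value[x] = max(payoff(x), expected)  # exercise or throw again
    return value


v = backward_induction()
print(v[(0, 0, 0, 0, 3, 2)])  # state {5, 5, 5, 6, 6} on a full shelf
```

Under these assumptions the printed value should come out at about 2.6277, the figure reported in the results for the full shelf, and `backward_induction([p for p in range(21, 37) if p != 32], own_top=32)` should give about 2.4352 for the same state. The dictionary holds one entry per sorted state, so each state is computed only once.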

3 The results

Contour lines of value function, full shelf, initial throw
[Figure: contour plot "Heckmeck initial throw"; horizontal axis: spot number (1 to 6), vertical axis: #dice selected (1 to 8)]
Figure: Contour lines of value function in initial throw, if shelf starts at 21; dependent on the number of selected dice (vertical axis) of a particular spot

Distribution of points
[Figure: two histograms, "Dice sum distribution for full shelf" and "Dice sum distribution for shelf starting at 30"; horizontal axis: dice sum (0 to 40), vertical axis: probability (0 to 0.5)]
Figure: Distribution of dice sum under optimal strategy, from a Monte Carlo simulation with 1000 paths. Left: full shelf; right: shelf starting at 30
Explanation: The left panel shows almost no values in [5, 20], because for these dice sums knockout is very likely.

Most likely selections
What is the most likely initial selection of dice, assuming we act optimally? Answer: two 6s

         Number of dice
Spots    1       2       3       4       5       6       7       8
1        0.0003  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0
2        0.09    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0
3        0.13    0.0507  0.0009  0.0003  0.00    0.0000  0.0000  0.0
4        0.0724  0.13    0.0531  0.0206  0.0037  0.0004  0.0000  0.0
5        0.0037  0.1666  0.0982  0.0260  0.0042  0.0004  0.0000  0.0
6        0.50    0.2347  0.1035  0.0260  0.0042  0.0004  0.0000  0.0

Table: Probability that a number of dice (horizontal axis) carrying a certain number of spots (vertical axis) is selected under the optimal strategy.


Optimal exercise decisions: Example
Original question: Stop with three 5s and two 6s?
>> [v,a,a_inv] = heckmeck_v4(21:36, [], []);
>> heckmeck_value([5,5,5,6,6],a_inv,v)
2.6277
>> [v,a,a_inv] = heckmeck_v4([21:31,33:36], [], [32]);
>> heckmeck_value([5,5,5,6,6],a_inv,v)
2.4352
In both situations the value exceeds the payoff of stopping (25 points, i.e. 2 worms), so it is optimal to continue.

Optimal strategy in Monte Carlo simulation
[Figure: "MC simulation vs. backward induction for initial throw, full shelf"; horizontal axis: number of paths (0 to 10000), vertical axis: number of worms; legend: mean of MC sim, value function (backward induction), value fct +/- stdev]
Figure: Convergence of mean of Monte Carlo simulation depending on number of paths (blue line) in initial throw, if shelf starts at 21. Green: backward induction

Optimal strategy in Monte Carlo simulation (2)
In the whole game, what is our probability of winning if others pursue average strategies?
Test: Optimal strategy vs. fuzzy strategy
We test our results in a two-player game:
- Player 1 follows the optimal strategy (derived from the value function v)
- Player 2 derives his strategy from a value function misestimated by ±0.1 worms (by randomly perturbing v with stdev of 0.1)
Result: Player 1 wins 17 out of 20 games!
Conclusion: Even a slight difference in optimality makes us win most of the games (law of large numbers because of many rounds per game!)

Extensions / References
Possible extensions:
- Incorporate risk in pricing? → Utility functions
- Compute optimal strategies for the whole game
Reference: Reiner Knizia: Heckmeck am Bratwurmeck (Pickomino), Zoch-Verlag 2005

Value function dependent on shelf, initial throw
[Figure: "Value depending on shelf"; horizontal axis: shelf starts at... (20 to 36), vertical axis: expected optimal worms (0 to 3.5); one curve for each combination of die = 4, 5, 6 and #selected = 1 to 5]
Figure: Expected number of worms under optimal strategy for different initial dice choices, dependent on the minimal number available on the shelf.