Monte-Carlo tree search for multi-player, no-limit Texas hold'em poker. Guy Van den Broeck

1 Monte-Carlo tree search for multi-player, no-limit Texas hold'em poker Guy Van den Broeck

3 Should I bluff? Challenge: deceptive play.

4 Is he bluffing? Challenge: opponent modeling.

5 Who has the Ace? Challenge: incomplete information.

6 What are the odds? Challenge: game of chance.

7 I'll bet because he always calls. Challenge: exploitation.

8 What can happen next? Challenge: huge state space.

9 Should I bet $5 or $10? Challenge: risk management and continuous action space.

10 Should I bluff? Should I bet $5 or $10? Who has the Ace? Is he bluffing? What are the odds? What can happen next? I'll bet because he always calls. Take-away message: we can solve all these problems!

11 Problem Statement. A bot for Texas hold'em poker: no-limit and more than 2 players. Not done before! Exploitative, not game-theoretic. Game tree search + opponent modeling. Applies to any problem with incomplete information, non-determinism, or continuous actions.

12 Outline. Overview of the approach: the poker game tree, opponent model, Monte-Carlo tree search. Research challenges: search, opponent model. Conclusion.

15 Poker Game Tree. Minimax trees: deterministic games (tic-tac-toe, checkers, chess, Go). Node types: max, min.

16 Poker Game Tree. Expecti(mini)max trees: games of chance (backgammon). Node types: max, min, mix.

17 Poker Game Tree. Miximax trees: hidden information. Node types: max, mix, mix + opponent model.
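
The miximax idea can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the `Node` class, the `miximax` function, and the payoffs are assumptions made for the example. Max nodes take the best child; mix nodes take the probability-weighted average, with the probabilities supplied by an opponent model or by the card deal.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    kind: str                      # "max", "mix", or "leaf" (hypothetical labels)
    value: float = 0.0             # payoff, used only by leaves
    # (probability, child) pairs; the probability is ignored at max nodes
    children: List[Tuple[float, "Node"]] = field(default_factory=list)


def miximax(node: Node) -> float:
    """Back up values: max over my actions, expectation over mix nodes."""
    if node.kind == "leaf":
        return node.value
    values = [(p, miximax(child)) for p, child in node.children]
    if node.kind == "max":
        return max(v for _, v in values)
    return sum(p * v for p, v in values)


# Illustrative tree: I can fold (payoff 0) or raise; after a raise the
# opponent model predicts fold 0.6, call 0.3, raise 0.1 (made-up payoffs).
opponent = Node("mix", children=[
    (0.6, Node("leaf", 1.0)),
    (0.3, Node("leaf", -1.0)),
    (0.1, Node("leaf", 3.0)),
])
root = Node("max", children=[(1.0, Node("leaf", 0.0)), (1.0, opponent)])
root_value = miximax(root)
```

With these made-up numbers, raising has a positive expected value, so the max node prefers it to folding.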

18-34 [Figure, built up over slides 18 to 34: example poker game tree. The root is "my action" with branches fold, call, and raise; folding has value 0. Below it are "Resolve" and "Reveal Cards" nodes and the opponent nodes "opp-1 action" and "opp-2 action", each branching on fold, call, and raise. Opponent branches carry model probabilities (0.6 fold, 0.3 call, 0.1 raise). Node values shown during the build-up: 0, 1, -1, 2, 3; the last slide labels the root with 3.]

36 Short Experiment

37 Opponent Model. A set of probability trees (Weka's M5'). Separate models for actions and for hand cards at showdown.

38 Fold Probability. [Figure: excerpt of a model tree that predicts the fold probability. The splits test nballplayerraises, callfrequency, nbactionsthisround, potodds, AF, potsize, round=flop, stacksize, nbseatedplayers, and foldfrequency. Most thresholds and leaf values were lost in extraction. Surviving tests: nballplayerraises <= 1.5, nbactionsthisround <= 2.5, potodds <= 0.28, round=flop <= 0.5, nbseatedplayers <= 7.5. Surviving leaf value: 0.921.]

39 (Can also be relational) Tilde probability tree [Ponsen08]

40 Opponent Ranks. Learn the distribution of hand ranks at showdown. [Figure: two probability plots, one over rank bucket and one over number of raises.]

42 Traversing the tree. Limit Texas Hold'em: 10^18 nodes, fully traversable. No-limit: > 10^71 nodes, too large to traverse. Sampled, not searched: Monte-Carlo Tree Search.

43 Monte-Carlo Tree Search [Chaslot08]

44 Selection. Each node stores an estimate of the reward and the number of samples. (The symbols for these two quantities were lost in extraction.)

45-47 Selection. UCT (multi-armed bandit): the selection score is the sum of an exploitation term and an exploration term.

48 Selection. Alternative: CrazyStone.
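
The slide's formula did not survive extraction. As a rough sketch, the standard UCT rule scores each child by its mean reward (exploitation) plus a bonus that shrinks as the child is sampled more often (exploration). The function names and the constant `c` below are assumptions for illustration, not taken from the talk.

```python
import math


def uct_score(mean_reward, n_child, n_parent, c=math.sqrt(2)):
    """Exploitation term plus exploration term of the standard UCT rule."""
    if n_child == 0:
        return math.inf            # unvisited children are tried first
    return mean_reward + c * math.sqrt(math.log(n_parent) / n_child)


def select_child(children, c=math.sqrt(2)):
    """children: list of (mean_reward, visit_count); returns the chosen index."""
    n_parent = sum(n for _, n in children)
    return max(
        range(len(children)),
        key=lambda i: uct_score(children[i][0], children[i][1], n_parent, c),
    )
```

A larger `c` favors rarely sampled children; `c = 0` reduces the rule to picking the best mean reward.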

49 Expansion Simulation

50 Backpropagation. Each node stores an estimate of the reward and the number of samples. (The symbols for these two quantities were lost in extraction.)

51 Backpropagation, option 1: sample-weighted average of the children.

52 Backpropagation, option 2: maximum child.
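
A minimal sketch of the two backpropagation options, assuming each child is summarized as a `(value_estimate, sample_count)` pair. The function names and the example numbers are illustrative.

```python
def sample_weighted_average(children):
    """Parent estimate = child estimates averaged, weighted by sample counts."""
    total = sum(n for _, n in children)
    return sum(v * n for v, n in children) / total


def maximum_child(children):
    """Parent estimate = estimate of the best child, ignoring sample counts."""
    return max(v for v, _ in children)


# Two children with estimates 3 and 4, each sampled 10 times.
children = [(3.0, 10), (4.0, 10)]
weighted = sample_weighted_average(children)
best = maximum_child(children)
```

The weighted average tends to underestimate a max node while the poorer child is still being sampled; the maximum child ignores how uncertain each estimate is.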

53 Initial experiments. 1 MCTS bot + 2 rule-based bots. Exploitative! [Figure, with curve label "MCTS Bot".]

55 Outline. Overview of the approach: the poker game tree, opponent model, Monte-Carlo tree search. Research challenges: search (uncertainty in MCTS, continuous action spaces), opponent model (online learning, concept drift). Conclusion.

58 MCTS for games with uncertainty? Expected reward distributions (ERD). Sample selection using ERD. Backpropagation of ERD. [VandenBroeck09]

59-74 Expected reward distribution. [Figure, built up over slides 59 to 74: estimated reward distributions after 10 samples, after 100 samples, and after a third sample count that was lost in extraction, for MiniMax nodes and for ExpectiMax/MixiMax nodes. For MiniMax the variance of the estimate comes from sampling only. For ExpectiMax/MixiMax it comes from uncertainty plus sampling. One further label, "/ T(P)", is garbled.]

75 ERD selection strategy. Objective? Find the maximum expected reward. Sample more in subtrees with (1) a high expected reward and (2) an uncertain estimate. UCT does (1) but not really (2). CrazyStone does (1) and (2) for deterministic games (Go). UCT+ selection combines a term for (1) and a term for (2).

76 UCT+ selection, term (1): expected value under perfect play.

77 UCT+ selection, term (2): measure of uncertainty due to sampling.
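
The UCT+ formula itself did not survive extraction. One plausible reading, suggested by the label "UCT+ (stddev)" on the experiments slide, is to add a multiple of the standard error of the child's estimate to its expected value. The sketch below works under that assumption; the function names and the constant `c` are hypothetical.

```python
import math


def uct_plus_score(mean, stddev, n, c=1.0):
    """Expected value plus c times the standard error of that estimate.

    Assumption: the sampling uncertainty is measured as stddev / sqrt(n).
    """
    if n == 0:
        return math.inf            # unvisited children are tried first
    return mean + c * stddev / math.sqrt(n)


def select_child_uct_plus(children, c=1.0):
    """children: list of (mean, stddev, n); returns the chosen index."""
    return max(
        range(len(children)),
        key=lambda i: uct_plus_score(*children[i], c=c),
    )
```

Unlike the logarithmic UCT bonus, this bonus depends on the observed spread of rewards, so a well-sampled but noisy subtree keeps attracting samples.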

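78 ERD max-distribution backpropagation. Example: a max node with children A (estimate 3) and B (estimate 4).

79 ERD max-distribution backpropagation. Sample-weighted backpropagation gives the max node the value 3.5.

80 ERD max-distribution backpropagation. Max backpropagation gives the max node the value 4.

81 ERD max-distribution backpropagation. When the game reaches P, we'll have more time to find the real [sentence truncated in the source]

82 ERD max-distribution backpropagation. Max-distribution backpropagation gives the max node the value 4.5.

83 ERD max-distribution backpropagation. Given P(A<4) = 0.8, P(A>4) = 0.2, P(B<4) = 0.5, P(B>4) = 0.5, the four joint cases are: A<4 and B<4: 0.8*0.5; A<4 and B>4: 0.8*0.5; A>4 and B<4: 0.2*0.5; A>4 and B>4: 0.2*0.5. Only the first case leaves the maximum below 4, so P(max(A,B)>4) = 1 - 0.8*0.5 = 0.6. (The slide follows this with a ">" comparison whose right-hand side was lost in extraction.)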
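
The arithmetic on this slide can be checked with a short sketch. The first function reproduces the 0.6 from the slide. The second goes one step further and computes the expected maximum of two independent normal estimates with the standard closed form; treating A and B as independent and normally distributed is an assumption made here for illustration, and the standard deviations in the example are made up.

```python
import math


def prob_max_exceeds(p_a_below, p_b_below):
    """P(max(A, B) > t) for independent A, B, given P(A < t) and P(B < t)."""
    return 1.0 - p_a_below * p_b_below


def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_max_of_normals(mean_a, std_a, mean_b, std_b):
    """E[max(A, B)] for independent normal A and B (closed form)."""
    theta = math.sqrt(std_a ** 2 + std_b ** 2)
    alpha = (mean_a - mean_b) / theta
    return mean_a * _Phi(alpha) + mean_b * _Phi(-alpha) + theta * _phi(alpha)


p = prob_max_exceeds(0.8, 0.5)                        # values from the slide
e_max = expected_max_of_normals(3.0, 1.0, 4.0, 1.0)   # made-up std deviations
```

The expected maximum is larger than both child estimates, which is why max-distribution backpropagation can report a value above 4 for a node whose best child is estimated at 4.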

84 Experiments. [Figure: 2 MCTS bots, max-distribution vs. sample-weighted backpropagation.] [Figure: 2 MCTS bots, UCT+ (stddev) vs. UCT selection.]

86 Dealing with continuous actions. Sample discrete actions. Progressive unpruning [Chaslot08] (ignores smoothness of the EV function). ... Tree learning search (work in progress). [Figure axis: relative bet size.]

87 Tree learning search. Based on regression tree induction from data streams: training examples arrive quickly, nodes split when there is a significant reduction in stddev, and training examples are immediately forgotten. Edges in the TLS tree are not actions but sets of actions, e.g., (raise in [2,40]), (fold or call). MCTS provides a stream of (action, EV) examples. Split action sets to reduce the stddev of the EV (when significant).
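
The splitting step can be sketched as follows. This is a simplified illustration, not the author's algorithm: the class names, the fixed grid of candidate split points, and the threshold `min_reduction` (standing in for a proper significance test) are all assumptions. Only running statistics are kept, so each (action, EV) example is forgotten as soon as it has been counted.

```python
import math


class RunningStats:
    """Count, sum, and sum of squares; individual examples are not stored."""

    def __init__(self):
        self.n, self.total, self.total_sq = 0, 0.0, 0.0

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def stddev(self):
        if self.n == 0:
            return 0.0
        mean = self.total / self.n
        return math.sqrt(max(self.total_sq / self.n - mean * mean, 0.0))


class BetInterval:
    """One edge of the tree: all bets in [low, high]."""

    def __init__(self, low, high, n_candidates=9, min_samples=30, min_reduction=0.2):
        self.min_samples = min_samples
        self.min_reduction = min_reduction      # hypothetical threshold
        step = (high - low) / (n_candidates + 1)
        self.candidates = [low + step * (i + 1) for i in range(n_candidates)]
        self.overall = RunningStats()
        self.left = [RunningStats() for _ in self.candidates]
        self.right = [RunningStats() for _ in self.candidates]

    def observe(self, action, ev):
        """Update the statistics with one (action, EV) example from MCTS."""
        self.overall.add(ev)
        for i, split in enumerate(self.candidates):
            (self.left[i] if action < split else self.right[i]).add(ev)

    def best_split(self):
        """Return the split point with the largest stddev reduction, or None."""
        if self.overall.n < self.min_samples:
            return None
        base = self.overall.stddev()
        best, best_reduction = None, 0.0
        for i, split in enumerate(self.candidates):
            l, r = self.left[i], self.right[i]
            if l.n == 0 or r.n == 0:
                continue
            pooled = (l.n * l.stddev() + r.n * r.stddev()) / self.overall.n
            if base - pooled > best_reduction:
                best, best_reduction = split, base - pooled
        if best is not None and best_reduction >= self.min_reduction * base:
            return best
        return None
```

For example, if bets below 4 always yield one EV and bets of 4 or more always yield another, feeding such examples into `BetInterval(0.0, 10.0)` makes `best_split()` return 4.0 once enough samples have arrived.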

88-91 Tree learning search. [Figure, built up over slides 88 to 91: a max node with two edges, "Bet in [0,10]" and "{Fold, Call}", each leading to a max node with unknown children. The optimal split is at 4, so the edge "Bet in [0,10]" is replaced by two edges, "Bet in [0,4]" and "Bet in [4,10]", each leading to its own max node.]

92 Tree learning search. [Figure labels: one action of P1, one action of P2.]

93 Selection phase. [Figure: at P1's node, the sampled action is 2.4.] Each node has an EV estimate, which generalizes over actions.

94 Expansion. [Figure: the selected node, below the levels of P1 and P2.]

95 Expansion. [Figure: an expanded node is added at the level of P3.] The expanded node represents any action of P3.

96 Backpropagation. A new sample arrives; a split becomes significant.

100 Online learning of the opponent model. Start from a (safe) model of a general opponent. Exploit weaknesses of the specific opponent. Start to learn a model of the specific opponent (exploration of opponent behavior).

101-104 Multi-agent interaction. [Figure, built up over slides 101 to 104, with players Yellow, Blue, and Green.] Yellow learns a model for Blue and changes strategy. Yellow doesn't profit! Green profits without changing strategy!!

106 Concept drift. While learning from a stream, the training examples in the stream change. In the opponent model: a changing strategy. "Changing gears is not just about bluffing, it's about changing strategy to achieve a goal." Learning with concept drift: adapt quickly to changes, yet stay robust to noise (recognize recurrent concepts).

107 Basic approach to concept drift. Maintain a window of training examples: large enough to learn, small enough to adapt quickly without 'old' concepts. Heuristics to adjust the window size, based on the FLORA2 framework [Widmer92].
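
A window that grows while the model is stable and shrinks when drift is suspected can be sketched as below. This is a simplified illustration, not the FLORA2 heuristic itself: the class name, the drift test (recent accuracy well below overall accuracy), and all thresholds are assumptions.

```python
from collections import deque


class AdaptiveWindow:
    """Sliding window of (example, predicted_correctly) pairs."""

    def __init__(self, min_size=20, max_size=200, recent=10, drop=0.3, shrink=0.8):
        self.items = deque()       # oldest example first
        self.min_size = min_size
        self.max_size = max_size
        self.recent = recent       # how many latest examples count as "recent"
        self.drop = drop           # accuracy drop that signals drift (assumed)
        self.shrink = shrink       # shrink factor applied on suspected drift
        self.limit = min_size      # current maximum window size

    def _drift_suspected(self):
        if len(self.items) < 2 * self.recent:
            return False
        flags = [ok for _, ok in self.items]
        overall = sum(flags) / len(flags)
        recent = sum(flags[-self.recent:]) / self.recent
        return overall - recent > self.drop

    def add(self, example, predicted_correctly):
        """Store one example, then grow or shrink the window."""
        self.items.append((example, bool(predicted_correctly)))
        if self._drift_suspected():
            self.limit = max(self.min_size, int(len(self.items) * self.shrink))
        else:
            self.limit = min(self.max_size, self.limit + 1)
        while len(self.items) > self.limit:
            self.items.popleft()   # forget the oldest examples
```

With badly chosen thresholds the window can oscillate or shrink on noise alone, which is the kind of sensitivity the "bad parameters" slide warns about.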

108 4 components of a single opponent model. [Figure labels: accuracy, window size, start of online learning, concept drift.]

109 Bad parameters for the heuristic: NOT ROBUST. [Figure labels: accuracy, window size.]

111 Conclusions. First exploitative poker bot for no-limit Hold'em with more than 2 players. Apply in other games: backgammon, computational pool, ... Challenge for MCTS: games with uncertainty, continuous action space. Challenge for ML: online learning, concept drift, (relational learning).

112 Thanks for listening!


More information

Chapter 18 Student Lecture Notes 18-1

Chapter 18 Student Lecture Notes 18-1 Chapter 18 Student Lecture Notes 18-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 18 Introduction to Decision Analysis 5 Prentice-Hall, Inc. Chap 18-1 Chapter Goals After completing

More information

Logistics. CS 473: Artificial Intelligence. Markov Decision Processes. PS 2 due today Midterm in one week

Logistics. CS 473: Artificial Intelligence. Markov Decision Processes. PS 2 due today Midterm in one week CS 473: Artificial Intelligence Markov Decision Processes Dan Weld University of Washington [Slides originally created by Dan Klein & Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials

More information

OPTIMAL BLUFFING FREQUENCIES

OPTIMAL BLUFFING FREQUENCIES OPTIMAL BLUFFING FREQUENCIES RICHARD YEUNG Abstract. We will be investigating a game similar to poker, modeled after a simple game called La Relance. Our analysis will center around finding a strategic

More information

Genetic Algorithms Overview and Examples

Genetic Algorithms Overview and Examples Genetic Algorithms Overview and Examples Cse634 DATA MINING Professor Anita Wasilewska Computer Science Department Stony Brook University 1 Genetic Algorithm Short Overview INITIALIZATION At the beginning

More information

National Security Strategy: Perfect Bayesian Equilibrium

National Security Strategy: Perfect Bayesian Equilibrium National Security Strategy: Perfect Bayesian Equilibrium Professor Branislav L. Slantchev October 20, 2017 Overview We have now defined the concept of credibility quite precisely in terms of the incentives

More information

Quantitative Trading System For The E-mini S&P

Quantitative Trading System For The E-mini S&P AURORA PRO Aurora Pro Automated Trading System Aurora Pro v1.11 For TradeStation 9.1 August 2015 Quantitative Trading System For The E-mini S&P By Capital Evolution LLC Aurora Pro is a quantitative trading

More information

Rollout Allocation Strategies for Classification-based Policy Iteration

Rollout Allocation Strategies for Classification-based Policy Iteration Rollout Allocation Strategies for Classification-based Policy Iteration V. Gabillon, A. Lazaric & M. Ghavamzadeh firstname.lastname@inria.fr Workshop on Reinforcement Learning and Search in Very Large

More information

CLASS 4: ASSEt pricing. The Intertemporal Model. Theory and Experiment

CLASS 4: ASSEt pricing. The Intertemporal Model. Theory and Experiment CLASS 4: ASSEt pricing. The Intertemporal Model. Theory and Experiment Lessons from the 1- period model If markets are complete then the resulting equilibrium is Paretooptimal (no alternative allocation

More information

DECISION TREE INDUCTION

DECISION TREE INDUCTION CSc-215 (Gordon) Week 12A notes DECISION TREE INDUCTION A decision tree is a graphic way of representing certain types of Boolean decision processes. Here is a simple example of a decision tree for determining

More information

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence CS 188: Artificial Intelligence Markov Decision Processes Dan Klein, Pieter Abbeel University of California, Berkeley Non-Deterministic Search 1 Example: Grid World A maze-like problem The agent lives

More information

Extensive-Form Games with Imperfect Information

Extensive-Form Games with Imperfect Information May 6, 2015 Example 2, 2 A 3, 3 C Player 1 Player 1 Up B Player 2 D 0, 0 1 0, 0 Down C Player 1 D 3, 3 Extensive-Form Games With Imperfect Information Finite No simultaneous moves: each node belongs to

More information

The Ohio State University Department of Economics Econ 601 Prof. James Peck Extra Practice Problems Answers (for final)

The Ohio State University Department of Economics Econ 601 Prof. James Peck Extra Practice Problems Answers (for final) The Ohio State University Department of Economics Econ 601 Prof. James Peck Extra Practice Problems Answers (for final) Watson, Chapter 15, Exercise 1(part a). Looking at the final subgame, player 1 must

More information

Monte-Carlo Beam Search

Monte-Carlo Beam Search IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES 1 Monte-Carlo Beam Search Tristan Cazenave Abstract Monte-Carlo Tree Search is state of the art for multiple games and for solving puzzles

More information

8. Uncertainty. Reading: BGVW, Chapter 7

8. Uncertainty. Reading: BGVW, Chapter 7 8. Uncertainty Reading: BGVW, Chapter 7 1. Introduction Uncertainties abound future: incomes/prices/populations analysis: dose-response/valuation/climate/effects of regulation on environmental quality/longevity

More information

3. The Dynamic Programming Algorithm (cont d)

3. The Dynamic Programming Algorithm (cont d) 3. The Dynamic Programming Algorithm (cont d) Last lecture e introduced the DPA. In this lecture, e first apply the DPA to the chess match example, and then sho ho to deal ith problems that do not match

More information

Top-down particle filtering for Bayesian decision trees

Top-down particle filtering for Bayesian decision trees Top-down particle filtering for Bayesian decision trees Balaji Lakshminarayanan 1, Daniel M. Roy 2 and Yee Whye Teh 3 1. Gatsby Unit, UCL, 2. University of Cambridge and 3. University of Oxford Outline

More information

Objective of Decision Analysis. Determine an optimal decision under uncertain future events

Objective of Decision Analysis. Determine an optimal decision under uncertain future events Decision Analysis Objective of Decision Analysis Determine an optimal decision under uncertain future events Formulation of Decision Problem Clear statement of the problem Identify: The decision alternatives

More information

Algorithmic Game Theory and Applications. Lecture 11: Games of Perfect Information

Algorithmic Game Theory and Applications. Lecture 11: Games of Perfect Information Algorithmic Game Theory and Applications Lecture 11: Games of Perfect Information Kousha Etessami finite games of perfect information Recall, a perfect information (PI) game has only 1 node per information

More information

Non-Deterministic Search

Non-Deterministic Search Non-Deterministic Search MDP s 1 Non-Deterministic Search How do you plan (search) when your actions might fail? In general case, how do you plan, when the actions have multiple possible outcomes? 2 Example:

More information

Answer Key: Problem Set 4

Answer Key: Problem Set 4 Answer Key: Problem Set 4 Econ 409 018 Fall A reminder: An equilibrium is characterized by a set of strategies. As emphasized in the class, a strategy is a complete contingency plan (for every hypothetical

More information

IV. Cooperation & Competition

IV. Cooperation & Competition IV. Cooperation & Competition Game Theory and the Iterated Prisoner s Dilemma 10/15/03 1 The Rudiments of Game Theory 10/15/03 2 Leibniz on Game Theory Games combining chance and skill give the best representation

More information

The exam is closed book, closed calculator, and closed notes except your three crib sheets.

The exam is closed book, closed calculator, and closed notes except your three crib sheets. CS 188 Spring 2016 Introduction to Artificial Intelligence Final V2 You have approximately 2 hours and 50 minutes. The exam is closed book, closed calculator, and closed notes except your three crib sheets.

More information

36106 Managerial Decision Modeling Decision Analysis in Excel

36106 Managerial Decision Modeling Decision Analysis in Excel 36106 Managerial Decision Modeling Decision Analysis in Excel Kipp Martin University of Chicago Booth School of Business October 19, 2017 Reading and Excel Files Reading: Powell and Baker: Sections 13.1,

More information

CUR 412: Game Theory and its Applications, Lecture 9

CUR 412: Game Theory and its Applications, Lecture 9 CUR 412: Game Theory and its Applications, Lecture 9 Prof. Ronaldo CARPIO May 22, 2015 Announcements HW #3 is due next week. Ch. 6.1: Ultimatum Game This is a simple game that can model a very simplified

More information

Ch 10 Trees. Introduction to Trees. Tree Representations. Binary Tree Nodes. Tree Traversals. Binary Search Trees

Ch 10 Trees. Introduction to Trees. Tree Representations. Binary Tree Nodes. Tree Traversals. Binary Search Trees Ch 10 Trees Introduction to Trees Tree Representations Binary Tree Nodes Tree Traversals Binary Search Trees 1 Binary Trees A binary tree is a finite set of elements called nodes. The set is either empty

More information

An Analysis of Forward Pruning. University of Maryland Institute for Systems Research. College Park, MD 20742

An Analysis of Forward Pruning. University of Maryland Institute for Systems Research. College Park, MD 20742 Proc. AAAI-94, to appear. An Analysis of Forward Pruning Stephen J. J. Smith Dana S. Nau Department of Computer Science Department of Computer Science, and University of Maryland Institute for Systems

More information

Credibility and Subgame Perfect Equilibrium

Credibility and Subgame Perfect Equilibrium Chapter 7 Credibility and Subgame Perfect Equilibrium 1 Subgames and their equilibria The concept of subgames Equilibrium of a subgame Credibility problems: threats you have no incentives to carry out

More information

Management Services Reviewer by Ma. Elenita Balatbat-Cabrera

Management Services Reviewer by Ma. Elenita Balatbat-Cabrera Course Name: Course Title: Instructors: Required Text: Course Description: XMASREV Management Services Review David, Dimalanta and Morales Management Services Reviewer by Ma. Elenita Balatbat-Cabrera This

More information

Monte-Carlo Methods in Financial Engineering

Monte-Carlo Methods in Financial Engineering Monte-Carlo Methods in Financial Engineering Universität zu Köln May 12, 2017 Outline Table of Contents 1 Introduction 2 Repetition Definitions Least-Squares Method 3 Derivation Mathematical Derivation

More information

Utilities and Decision Theory. Lirong Xia

Utilities and Decision Theory. Lirong Xia Utilities and Decision Theory Lirong Xia Checking conditional independence from BN graph ØGiven random variables Z 1, Z p, we are asked whether X Y Z 1, Z p dependent if there exists a path where all triples

More information

Resource Allocation and Decision Analysis (ECON 8010) Spring 2014 Foundations of Decision Analysis

Resource Allocation and Decision Analysis (ECON 8010) Spring 2014 Foundations of Decision Analysis Resource Allocation and Decision Analysis (ECON 800) Spring 04 Foundations of Decision Analysis Reading: Decision Analysis (ECON 800 Coursepak, Page 5) Definitions and Concepts: Decision Analysis a logical

More information

Finding Equilibria in Games of No Chance

Finding Equilibria in Games of No Chance Finding Equilibria in Games of No Chance Kristoffer Arnsfelt Hansen, Peter Bro Miltersen, and Troels Bjerre Sørensen Department of Computer Science, University of Aarhus, Denmark {arnsfelt,bromille,trold}@daimi.au.dk

More information

Dynamic Real Return Series

Dynamic Real Return Series RUSSELL INVESTMENTS Dynamic Real Return Series Outcome-oriented Dynamic asset allocation Limited downside risk February 2017 A new approach for challenging times Financial markets are experiencing unusually

More information

TIm 206 Lecture notes Decision Analysis

TIm 206 Lecture notes Decision Analysis TIm 206 Lecture notes Decision Analysis Instructor: Kevin Ross 2005 Scribes: Geoff Ryder, Chris George, Lewis N 2010 Scribe: Aaron Michelony 1 Decision Analysis: A Framework for Rational Decision- Making

More information

Noncooperative Oligopoly

Noncooperative Oligopoly Noncooperative Oligopoly Oligopoly: interaction among small number of firms Conflict of interest: Each firm maximizes its own profits, but... Firm j s actions affect firm i s profits Example: price war

More information

GAME THEORY: DYNAMIC. MICROECONOMICS Principles and Analysis Frank Cowell. Frank Cowell: Dynamic Game Theory

GAME THEORY: DYNAMIC. MICROECONOMICS Principles and Analysis Frank Cowell. Frank Cowell: Dynamic Game Theory Prerequisites Almost essential Game Theory: Strategy and Equilibrium GAME THEORY: DYNAMIC MICROECONOMICS Principles and Analysis Frank Cowell April 2018 1 Overview Game Theory: Dynamic Mapping the temporal

More information

Approximations of Stochastic Programs. Scenario Tree Reduction and Construction

Approximations of Stochastic Programs. Scenario Tree Reduction and Construction Approximations of Stochastic Programs. Scenario Tree Reduction and Construction W. Römisch Humboldt-University Berlin Institute of Mathematics 10099 Berlin, Germany www.mathematik.hu-berlin.de/~romisch

More information

Dynamic Games. Econ 400. University of Notre Dame. Econ 400 (ND) Dynamic Games 1 / 18

Dynamic Games. Econ 400. University of Notre Dame. Econ 400 (ND) Dynamic Games 1 / 18 Dynamic Games Econ 400 University of Notre Dame Econ 400 (ND) Dynamic Games 1 / 18 Dynamic Games A dynamic game of complete information is: A set of players, i = 1,2,...,N A payoff function for each player

More information

INFORMATION AND WAR PSC/IR 265: CIVIL WAR AND INTERNATIONAL SYSTEMS WILLIAM SPANIEL WJSPANIEL.WORDPRESS.COM/PSCIR-265

INFORMATION AND WAR PSC/IR 265: CIVIL WAR AND INTERNATIONAL SYSTEMS WILLIAM SPANIEL WJSPANIEL.WORDPRESS.COM/PSCIR-265 INFORMATION AND WAR PSC/IR 265: CIVIL WAR AND INTERNATIONAL SYSTEMS WILLIAM SPANIEL WJSPANIEL.WORDPRESS.COM/PSCIR-265 AGENDA 1. ULTIMATUM GAME 2. EXPERIMENT #2 3. RISK-RETURN TRADEOFF 4. MEDIATION, PREDICTION,

More information

Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing

Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing Prof. Chuan-Ju Wang Department of Computer Science University of Taipei Joint work with Prof. Ming-Yang Kao March 28, 2014

More information

Asset Liability Management An Integrated Approach to Managing Liquidity, Capital, and Earnings

Asset Liability Management An Integrated Approach to Managing Liquidity, Capital, and Earnings Actuaries Club of Philadelphia Asset Liability Management An Integrated Approach to Managing Liquidity, Capital, and Earnings Alan Newsome, FSA, MAAA February 28, 2018 Today s Agenda What is Asset Liability

More information

CS 360: Advanced Artificial Intelligence Class #16: Reinforcement Learning

CS 360: Advanced Artificial Intelligence Class #16: Reinforcement Learning CS 360: Advanced Artificial Intelligence Class #16: Reinforcement Learning Daniel M. Gaines Note: content for slides adapted from Sutton and Barto [1998] Introduction Animals learn through interaction

More information

Finitely repeated simultaneous move game.

Finitely repeated simultaneous move game. Finitely repeated simultaneous move game. Consider a normal form game (simultaneous move game) Γ N which is played repeatedly for a finite (T )number of times. The normal form game which is played repeatedly

More information