Algorithms and Networking for Computer Games


Algorithms and Networking for Computer Games Chapter 4: Game Trees http://www.wiley.com/go/smed

Game types
- perfect information games: no hidden information
  - two-player, perfect information games: Noughts and Crosses, Chess, Go
- imperfect information games: Poker, Backgammon, Monopoly
- zero-sum property: one player's gain equals another player's loss

Game tree
- all possible plays of a two-player, perfect information game can be represented with a game tree
  - nodes: positions (or states)
  - edges: moves
- players: MAX (has the first move) and MIN
- ply = the length of the path between two nodes
  - MAX has even plies, counting from the root node
  - MIN has odd plies, counting from the root node

Division Nim with seven matches

Game tree for Division Nim

Problem statement
Given a node v in a game tree, find a winning strategy for MAX (or MIN) from v, or (equivalently) show that MAX (or MIN) can force a win from v.

Minimax
- assumption: players are rational and try to win
- given a game tree, we know the outcome in the leaves
  - assign the leaves to win, draw, or loss (or a numeric value like +1, 0, −1) according to MAX's point of view
- at nodes one ply above the leaves, choose the best outcome among the children (which are leaves)
  - MAX: win if possible; otherwise, draw if possible; else loss
  - MIN: loss if possible; otherwise, draw if possible; else win
- recurse up through the nodes to the root

Minimax rules
1. If the node is labelled to MAX, assign it the maximum value of its children.
2. If the node is labelled to MIN, assign it the minimum value of its children.
MIN minimizes, MAX maximizes: minimax.
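These two rules are all a direct implementation needs. Below is a minimal, runnable Python sketch (not the slides' pseudocode), using Division Nim from the earlier slides as the game, assuming its usual rules: a move splits one heap into two unequal non-empty heaps, and the player who cannot move loses.

    # Minimax sketch for Division Nim. A state is a tuple of heap sizes.
    def moves(heaps):
        """All states reachable by splitting one heap into two unequal parts."""
        result = []
        for i, h in enumerate(heaps):
            for a in range(1, h // 2 + h % 2):      # a < h - a, so the parts differ
                result.append(tuple(sorted(heaps[:i] + (a, h - a) + heaps[i + 1:])))
        return result

    def minimax(heaps, maximizing):
        """+1 if the position is a win for MAX, -1 if it is a win for MIN."""
        succ = moves(heaps)
        if not succ:                                # no move left: the player to move loses
            return -1 if maximizing else +1
        values = [minimax(s, not maximizing) for s in succ]
        return max(values) if maximizing else min(values)

    print(minimax((7,), True))                      # -1: MIN can force a win from seven matches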

Game tree with valued nodes (figure: the game tree with every node labelled +1 or −1 by the minimax rules, MAX and MIN plies alternating; the root value is −1)

Analysis
- simplifying assumptions:
  - internal nodes have the same branching factor b
  - game tree is searched to a fixed depth d
- time consumption is proportional to the number of expanded nodes:
  - 1 root node (the initial ply)
  - b nodes in the first ply
  - b^2 nodes in the second ply
  - ...
  - b^d nodes in the dth ply
- overall running time O(b^d)

Rough estimates on running times when d = 5
- suppose expanding a node takes 1 ms
- branching factor b depends on the game:
  - Draughts (b ≈ 3): t = 0.243 s
  - Chess (b ≈ 30): t = 6¾ h
  - Go (b ≈ 300): t = 77 years
- alpha-beta pruning reduces b
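These figures are simply b^d node expansions at one millisecond each; a quick sanity check:

    # t = b**d milliseconds, with d = 5 and 1 ms per expanded node
    for game, b in [("Draughts", 3), ("Chess", 30), ("Go", 300)]:
        print(game, b**5 / 1000, "s")   # 0.243 s, 24300 s (~6.75 h), 2.43e9 s (~77 years)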

Controlling the search depth
- usually the whole game tree is too large
- limit the search depth: a partial game tree, partial minimax
  - n-move look-ahead strategy: stop searching after n moves
  - make the internal nodes at the frontier (i.e., frontier nodes) leaves
- use an evaluation function to guess the outcome

Evaluation function
- combination of numerical measurements m_i(s, p) of the game state
  - single measurement: m_i(s, p)
  - difference measurement: m_i(s, p) − m_j(s, q)
  - ratio of measurements: m_i(s, p) / m_j(s, q)
- aggregate the measurements, maintaining the zero-sum property

Example: Noughts and Crosses
- heuristic evaluation function e: count the winning lines open to MAX, subtract the number of winning lines open to MIN
- forced wins:
  - state is evaluated +∞ if it is a forced win for MAX
  - state is evaluated −∞ if it is a forced win for MIN

Examples of the evaluation (figure: three example board positions)
- e(first board) = 6 − 5 = 1
- e(second board) = 4 − 5 = −1
- e(third board) = +∞ (a forced win for MAX)
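A sketch of e in Python, assuming a representation that is not in the slides: a board of nine cells holding 'X' for MAX, 'O' for MIN, or None for empty, with a completed line treated as the forced-win case:

    # Heuristic evaluation for Noughts and Crosses: winning lines still
    # open to MAX minus winning lines still open to MIN.
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def evaluate(board):
        for line in LINES:                      # forced wins dominate the count
            if all(board[i] == 'X' for i in line):
                return float('inf')
            if all(board[i] == 'O' for i in line):
                return float('-inf')
        open_max = sum(all(board[i] != 'O' for i in line) for line in LINES)
        open_min = sum(all(board[i] != 'X' for i in line) for line in LINES)
        return open_max - open_min

    print(evaluate([None] * 9))                 # 8 - 8 = 0 on an empty board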

Drawbacks of partial minimax
- horizon effect: a heuristically promising path can lead to an unfavourable situation
  - staged search: extend the search on promising nodes
  - iterative deepening: increase n until out of memory or time
  - phase-related search: opening, midgame, end game
  - however, the horizon effect cannot be totally eliminated
- bias: we want an estimate of minimax but get a minimax of estimates
  - distortion in the root: searches to odd plies tend towards win, searches to even plies towards loss

The deeper the better...?
- assumptions: n-move look-ahead, branching factor b, depth d, leaves with uniform random distribution
- minimax convergence theorem: as n increases, the root value converges to f(b, d)
- last player theorem: root values from odd and even plies are not comparable
- minimax pathology theorem: as n increases, the probability of selecting a non-optimal move increases (due to the uniformity assumption!)

Alpha-beta pruning
- reduce the branching factor of nodes
- alpha value
  - associated with MAX nodes
  - represents the worst outcome MAX can achieve
  - can never decrease
- beta value
  - associated with MIN nodes
  - represents the worst outcome MIN can achieve
  - can never increase

Example
- in a MAX node, α = 4:
  - we know that MAX can make a move which will result in at least the value 4
  - we can omit children whose value is less than or equal to 4
- in a MIN node, β = 4:
  - we know that MIN can make a move which will result in at most the value 4
  - we can omit children whose value is greater than or equal to 4

Ancestors and α & β
- the alpha value of a node is never less than the alpha value of its ancestors
- the beta value of a node is never greater than the beta value of its ancestors

Once again (figure: a worked example propagating α and β values, such as α = 3, 4, 5 and β = 3, 4, 5, through a tree, with cut-offs wherever α ≥ β)

Rules of pruning
1. Prune below any MIN node having a beta value less than or equal to the alpha value of any of its MAX ancestors.
2. Prune below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN ancestors.
Or, simply put: if α ≥ β, then prune below!
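The rules translate directly into code. A depth-limited sketch in Python, reusing the `evaluate` sketch above and an assumed helper `children(state)` that returns the successor states; pruning happens exactly when α ≥ β:

    # Alpha-beta sketch: depth-limited, as in partial minimax.
    def alphabeta(state, depth, alpha, beta, maximizing):
        succ = children(state)
        if depth == 0 or not succ:
            return evaluate(state)
        if maximizing:
            value = float('-inf')
            for child in succ:
                value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
                alpha = max(alpha, value)       # alpha can never decrease
                if alpha >= beta:
                    break                       # prune below (rule 2)
            return value
        else:
            value = float('inf')
            for child in succ:
                value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
                beta = min(beta, value)         # beta can never increase
                if alpha >= beta:
                    break                       # prune below (rule 1)
            return value

    # Initial call for an n-move look-ahead:
    # alphabeta(root, n, float('-inf'), float('inf'), True)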

Best-case analysis
- omit the principal variation
- at depth d − 1 optimum pruning: each node expands one child at depth d
- at depth d − 2 no pruning: each node expands all children at depth d − 1
- at depth d − 3 optimum pruning
- at depth d − 4 no pruning, etc.
- total amount of expanded nodes: Ω(b^(d/2))

Principal variation search
- the alpha-beta range should be small (a narrower window prunes more)
- limit the range artificially: aspiration search
  - if the search fails, revert to the original range
- a game tree node is either
  - α-node: every move has e ≤ α
  - β-node: every move has e ≥ β
  - principal variation node: one or more moves has e > α but none has e ≥ β

Principal variation search (cont'd)
- if we find a principal variation move (i.e., between α and β), assume we have found a principal variation node
- search the rest of the nodes assuming that they will not produce a good move
  - assume that the rest of the nodes have values < α
  - null window: [α, α + ε]
- if the assumption fails, re-search the node
- works well if the principal variation node is likely to get selected first
  - sort the children?
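A sketch of the idea in negamax form (a common way to write it, not the slides' formulation): the first child is searched with the full window, the rest with the null window, and a failed assumption triggers a re-search. With integer-valued evaluations, ε = 1. `children` and `evaluate` are the assumed helpers from above.

    # Principal variation search sketch; `color` is +1 on MAX plies, -1 on MIN plies.
    def pvs(state, depth, alpha, beta, color):
        succ = children(state)
        if depth == 0 or not succ:
            return color * evaluate(state)
        for i, child in enumerate(succ):
            if i == 0:                          # principal variation candidate
                score = -pvs(child, depth - 1, -beta, -alpha, -color)
            else:
                # null window [alpha, alpha + 1]: assume the move cannot beat alpha
                score = -pvs(child, depth - 1, -alpha - 1, -alpha, -color)
                if alpha < score < beta:        # assumption failed: re-search
                    score = -pvs(child, depth - 1, -beta, -alpha, -color)
            alpha = max(alpha, score)
            if alpha >= beta:
                break                           # the usual alpha-beta cut-off
        return alpha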

Non-zero-sum game: Prisoner's dilemma
- two criminals are arrested and isolated from each other
- the police suspect they have committed a crime together but don't have enough proof
- both are offered a deal: rat on the other one and get a lighter sentence
  - if one defects, he goes free whilst the other gets a long sentence
  - if both defect, both get a medium sentence
  - if neither one defects (i.e., they co-operate with each other), both get a short sentence

Prisoner's dilemma (cont'd)
- two players
- possible moves: co-operate, defect
- the dilemma: a player cannot make a good decision without knowing what the other will do

Payoffs for prisoner A (rows: A's move, columns: B's move)
- A co-operates (keeps silent), B co-operates (keeps silent): Fairly good: 6 months
- A co-operates, B defects (rats on the other prisoner): Bad: 10 years
- A defects, B co-operates: Good: no penalty
- A defects, B defects: Mediocre: 5 years
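With the sentence lengths above as A's payoffs (lower is better), defecting is better for A whatever B does, which is the core of the dilemma; a quick check, with the numbers taken straight from the table:

    # Prisoner A's sentence in months, indexed by (A's move, B's move).
    sentence = {("C", "C"): 6,   ("C", "D"): 120,   # A co-operates
                ("D", "C"): 0,   ("D", "D"): 60}    # A defects

    for b in ("C", "D"):    # defecting dominates: better against either move of B
        assert sentence[("D", b)] < sentence[("C", b)]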

Payoffs in Chicken (rows: driver A's move, columns: driver B's move)
- A co-operates (swerves), B co-operates (swerves): Fairly good: "It's a draw."
- A co-operates, B defects (keeps going): Mediocre: "I'm chicken..."
- A defects (keeps going), B co-operates: Good: "I win!"
- A defects, B defects: Bad: "Crash, boom, bang!!"

Payoffs in Battle of Sexes (rows: husband's move, columns: wife's move)
- Husband co-operates (boxing), wife co-operates (opera): Wife: Very bad, Husband: Very bad
- Husband co-operates (boxing), wife defects (boxing): Wife: Mediocre, Husband: Good
- Husband defects (opera), wife co-operates (opera): Wife: Good, Husband: Mediocre
- Husband defects (opera), wife defects (boxing): Wife: Bad, Husband: Bad

Iterated prisoner's dilemma
- encounters are repeated; players have memory of the previous encounters
- R. Axelrod: The Evolution of Cooperation (1984)
  - greedy strategies tend to work poorly; altruistic strategies work better, even if judged by self-interest only
- Nash equilibrium: always defect!
  - but sometimes rational decisions are not sensible
- Tit for Tat (A. Rapoport), sketched below:
  - co-operate on the first iteration
  - then do what the opponent did on the previous move
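Tit for Tat is short enough to state as code; a minimal sketch:

    # Tit for Tat: co-operate first, then mirror the opponent's last move.
    def tit_for_tat(opponent_history):
        """Return 'C' or 'D' given the list of the opponent's moves so far."""
        if not opponent_history:        # first iteration: co-operate
            return "C"
        return opponent_history[-1]     # repeat the opponent's previous move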