Introduction to Game Theory Lecture Note 5: Repeated Games

Haifeng Huang, University of California, Merced

Repeated games

Repeated games: given a simultaneous-move game $G$, a repeated game of $G$ is an extensive game with perfect information and simultaneous moves in which a history is a sequence of action profiles in $G$. I will denote the repeated game, if repeated $T$ times, as $G^T$. $G$ is often called a stage game, and $G^T$ is called a supergame.

Why repeated games matter: if players interact repeatedly, they may be induced to cooperate rather than defect by the threat that the other players will punish a defection in later rounds of the interaction. E.g., the Prisoner's Dilemma.

Discounting

Discounting: people value $100 tomorrow less than $100 today. (Why? Interest rate; the world may end tomorrow.)

The discount factor $\delta$ ($0 \le \delta \le 1$) denotes how much a future payoff is valued at the current period, or how patient a player is. If a player has a $\delta$ of 0.8, then $100 tomorrow is equivalent to $80 today for her.

Relationship between discount factor ($\delta$) and discount rate/interest rate ($r$): for a fixed discount rate $r$, discretely compounded over $T$ periods, the discount factor is

$$\delta = \frac{1}{(1+r)^T};$$

if the discount rate is continuously compounded, $\delta = e^{-rT}$, since

$$\left(\frac{1}{1+\frac{r}{n}}\right)^{nT} = \left(1+\frac{r}{n}\right)^{-\frac{n}{r}\,rT} \to e^{-rT} \quad \text{as } n \to \infty.$$
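The two compounding formulas can be compared numerically. The sketch below is only an illustration: the function names and the sample values of `r`, `periods`, and `n` are assumptions chosen for the example, not part of the lecture note.

```python
import math


def discount_factor_discrete(r: float, periods: float) -> float:
    """Discount factor 1 / (1 + r)^T when interest compounds once per period."""
    return 1.0 / (1.0 + r) ** periods


def discount_factor_compounded(r: float, periods: float, n: int) -> float:
    """Discount factor (1 + r/n)^(-n*T) when interest compounds n times per period."""
    return (1.0 + r / n) ** (-n * periods)


def discount_factor_continuous(r: float, periods: float) -> float:
    """Limit of the compounded factor as n grows without bound: e^(-r*T)."""
    return math.exp(-r * periods)


if __name__ == "__main__":
    # A rate of 25% over one period reproduces the delta = 0.8 example.
    print(discount_factor_discrete(0.25, 1))
    # Compounding more and more often approaches the continuous formula.
    for n in (1, 12, 365, 10**6):
        print(n, discount_factor_compounded(0.05, 10, n))
    print("continuous", discount_factor_continuous(0.05, 10))
```

With frequent compounding the discrete expression settles on $e^{-rT}$, which is the limit stated above.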

Payoffs from a repeated game

A player gets a payoff from each stage game, so her total payoff from the supergame is the discounted sum of the payoffs from each stage game.

Call a sequence (with $T$ periods) of action profiles $(a^1, a^2, \ldots, a^T)$. Player $i$'s total payoff from this sequence, when her discount factor is $\delta$, is

$$u_i(a^1) + \delta u_i(a^2) + \delta^2 u_i(a^3) + \cdots + \delta^{T-1} u_i(a^T) = \sum_{t=1}^{T} \delta^{t-1} u_i(a^t).$$

If the sequence is infinite, then the discounted sum is

$$\sum_{t=1}^{\infty} \delta^{t-1} u_i(a^t).$$

Geometric series

The discounted sum of a player's payoffs from a repeated game is a geometric series.

Geometric sequence: $b, bk, bk^2, \ldots, bk^T$; geometric series:

$$b + bk + bk^2 + \cdots + bk^T = \sum_{t=0}^{T} bk^t.$$

When $T$ is finite and $k \neq 1$,

$$S_T = b + bk + bk^2 + \cdots + bk^T = \frac{b(1-k^{T+1})}{1-k}.$$

When $T$ is infinite,

$$b + bk + bk^2 + \cdots = \frac{b}{1-k}, \quad \text{if } -1 < k < 1.$$
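A short numerical check ties the discounted payoff sum to the geometric-series formulas. This is a minimal sketch; the function names and sample numbers are illustrative assumptions. Note that the list index starts at 0, so the payoff in period $t$ is weighted by $\delta^{t-1}$.

```python
def discounted_sum(payoffs, delta):
    """Discounted sum u_1 + delta*u_2 + delta^2*u_3 + ... for a finite payoff list."""
    return sum(delta ** t * u for t, u in enumerate(payoffs))


def geometric_closed_form(b, k, T):
    """Closed form of b + b*k + ... + b*k**T (T + 1 terms), valid for k != 1."""
    return b * (1 - k ** (T + 1)) / (1 - k)


def geometric_limit(b, k):
    """Value of the infinite series b + b*k + b*k**2 + ..., valid for -1 < k < 1."""
    if not -1 < k < 1:
        raise ValueError("the infinite series converges only for -1 < k < 1")
    return b / (1 - k)


if __name__ == "__main__":
    # A constant stage payoff of 2 for 11 periods, discounted at 0.9.
    print(discounted_sum([2] * 11, 0.9), geometric_closed_form(2, 0.9, 10))
    # A long finite horizon approximates the infinite-horizon value 2 / (1 - 0.5).
    print(discounted_sum([2] * 200, 0.5), geometric_limit(2, 0.5))
```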

PD finitely repeated

If a Prisoner's Dilemma game is repeated twice, what is the SPNE? Does the discount factor matter here?

What if the PD game is repeated $T$ times, $T < \infty$?

Unraveling in finitely repeated games

Proposition (unraveling): Suppose the simultaneous-move game $G$ has a unique Nash equilibrium, $\sigma^*$. If $T < \infty$, then the repeated game $G^T$ has a unique SPNE, in which each player plays her strategy in $\sigma^*$ in each of the stage games.

If the stage game has more than one Nash equilibrium, however, players' SPNE strategies in any given period (except the last period) need not coincide with any one-shot equilibrium.

Finitely repeated PD with punishment (1)

How many Nash equilibria are there in the following stage game (a variant of PD with punishment)?

                            Player 2
                 Cooperate     Defect       Punish
Player 1
  Cooperate      -1, -1        -10, 0       -15, -10
  Defect         0, -10        -9, -9       -15, -10
  Punish         -10, -15      -10, -15     -12, -12

(Cooperate, Cooperate) is not a NE in the stage game, but it can be supported as part of a SPNE if the stage game is played twice and the players are patient enough.

Finitely repeated PD with punishment (2)

Let the discount factor $\delta \ge 1/3$. For $i = 1, 2$, let $s_{i0}$ be player $i$'s strategy after the initial (empty) history $\emptyset$, $s_{i1}$ be player $i$'s strategy after the first stage game, and $a^1$ be the action profile in the first stage game. The following strategy profile, which induces the players to cooperate in the first stage, is a SPNE:

$$s_{i0} = C, \quad \text{and} \quad s_{i1}(a^1) = \begin{cases} D, & \text{if } a^1 = (C, C); \\ P, & \text{otherwise.} \end{cases}$$

Finitely repeated PD with punishment (2)

                            Player 2
                 Cooperate     Defect       Punish
Player 1
  Cooperate      -1, -1        -10, 0       -15, -10
  Defect         0, -10        -9, -9       -15, -10
  Punish         -10, -15      -10, -15     -12, -12

Proof: First, the strategies induce a NE in the 2nd stage game regardless of the history: (D, D) after (C, C), and (P, P) otherwise.

Now, if the other player is sticking to the above strategy, player $i$'s discounted payoff from adhering to the strategy is $-1 - 9\delta$.

Her best deviation, given that the other player sticks to the above strategy, is to play D in the 1st round and P in the 2nd round, yielding a discounted payoff of $0 - 12\delta$.

Therefore the original strategies constitute a SPNE if $-1 - 9\delta \ge 0 - 12\delta$, or $\delta \ge 1/3$.
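The argument can be checked mechanically. The sketch below assumes the stage payoffs are the negative numbers implied by the proof (for example $-1$ for mutual cooperation and $-9$ for mutual defection); the dictionary layout and function names are hypothetical choices for this illustration. It lists the pure-strategy Nash equilibria of the stage game and then tests the first-period incentive condition for player 1, with player 2's case symmetric.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("C", "D", "P")

# (player 1 action, player 2 action) -> (player 1 payoff, player 2 payoff).
# Assumption: payoffs are negative, as implied by the payoffs used in the proof.
PAYOFFS = {
    ("C", "C"): (-1, -1),   ("C", "D"): (-10, 0),   ("C", "P"): (-15, -10),
    ("D", "C"): (0, -10),   ("D", "D"): (-9, -9),   ("D", "P"): (-15, -10),
    ("P", "C"): (-10, -15), ("P", "D"): (-10, -15), ("P", "P"): (-12, -12),
}


def pure_nash_equilibria(payoffs, actions):
    """Return all action profiles where neither player gains by deviating alone."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        p1_best = all(payoffs[(d, a2)][0] <= u1 for d in actions)
        p2_best = all(payoffs[(a1, d)][1] <= u2 for d in actions)
        if p1_best and p2_best:
            equilibria.append((a1, a2))
    return equilibria


def cooperation_is_sustainable(delta):
    """First-period one-deviation check for player 1 in the twice-played game."""
    # On the path: (C, C) today, then the 'reward' equilibrium (D, D).
    on_path = PAYOFFS[("C", "C")][0] + delta * PAYOFFS[("D", "D")][0]
    # Best first-period deviation against C, followed by the 'punishment' (P, P).
    best_deviation_today = max(PAYOFFS[(d, "C")][0] for d in ACTIONS if d != "C")
    off_path = best_deviation_today + delta * PAYOFFS[("P", "P")][0]
    return on_path >= off_path


if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFFS, ACTIONS))
    # Fractions keep the boundary case delta = 1/3 exact.
    for delta in (Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)):
        print(delta, cooperation_is_sustainable(delta))
```

The stage game has two pure-strategy equilibria, which is what gives the second period a credible reward and a credible punishment to choose between.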

PD infinitely repeated

An infinitely repeated game usually has more SPNEs than a finitely repeated game, and it may have multiple SPNEs even if the stage game has a unique NE.

But playing the NE strategies in each stage game, regardless of history, is still a SPNE in the infinitely repeated game. So each player choosing defection is a SPNE in the infinitely repeated PD, as in the finitely repeated PD.

But there are other SPNEs in the infinitely repeated PD.

One-deviation property

One-deviation property of SPNE of finite horizon games: A strategy profile in an extensive game with perfect information and a finite horizon is a SPNE if and only if it satisfies the one-deviation property.

The same holds for infinitely repeated games with discounting: a strategy profile in an infinitely repeated game with a discount factor less than 1 is a SPNE if and only if it satisfies the one-deviation property.

One-deviation property: no player can increase her payoff by changing her action at the start of any subgame in which she is the first-mover, given the other players' strategies and the rest of her own strategy.

Grim trigger strategies

Consider an infinitely repeated standard PD game, where (D, D) is the unique stage game NE.

                         Player 2
                  Defect       Cooperate
Player 1
  Defect          1, 1         3, 0
  Cooperate       0, 3         2, 2

Let $s_{it}$ be $i$'s strategy after period $t$ (history $x_t$). The following strategy profile will be a SPNE if $\delta \ge 1/2$. For $i = 1, 2$,

$$s_{i0} = C, \quad \text{and} \quad s_{it}(x_t) = \begin{cases} C, & \text{if } x_t = ((C, C), (C, C), \ldots, (C, C)); \\ D, & \text{otherwise.} \end{cases}$$

You start by playing C, but if any player (including yourself) ever deviates to D, then both players play D forever.

Grim trigger strategies: proof of SPNE (1)

When both players follow the grim trigger strategy, the outcome path of a subgame will be either (C, C) in every period or (D, D) in every period.

When the outcome is (C, C) in every period, a player's payoff (in that subgame) from following the strategy is

$$2 + 2\delta + 2\delta^2 + \cdots = \frac{2}{1-\delta}.$$

If she deviates in one period, she gets 3 in that period, but 1 in every period thereafter (since both players will then play D), so her payoff is

$$3 + \delta \cdot \frac{1}{1-\delta}.$$

She has no incentive to make that one deviation if

$$\frac{2}{1-\delta} \ge 3 + \frac{\delta}{1-\delta}, \quad \text{or } \delta \ge \frac{1}{2}.$$

Grim trigger strategies: proof of SPNE (2)

When the outcome is (D, D) in every period, clearly no player wants to deviate to C in any one period (getting 0 rather than 1), since what follows after that period is still (D, D) forever.

By the one-deviation property, the proposed grim trigger strategies constitute a SPNE for $\delta \ge 1/2$.

Cooperation can also be induced by punishment for a finite number of periods rather than punishment forever, as long as $\delta$ is sufficiently high.

But it still has to be the case that D is triggered for both players whenever one player deviates. Triggering D for a player only if the other player deviates will not sustain cooperation in SPNE.

The value of $\delta$ that supports cooperation depends on the specific payoffs in the stage game.
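Because the threshold depends on the stage payoffs, it can help to compute it for a general Prisoner's Dilemma. In the sketch below, the parameter names `temptation`, `reward`, and `punishment` (the payoffs 3, 2, and 1 in this example) are labels introduced for illustration, and the threshold formula is obtained by rearranging the no-deviation inequality $\frac{\text{reward}}{1-\delta} \ge \text{temptation} + \frac{\delta \cdot \text{punishment}}{1-\delta}$.

```python
from fractions import Fraction


def cooperate_value(reward, delta):
    """Discounted payoff from (C, C) in every period: reward / (1 - delta)."""
    return reward / (1 - delta)


def deviate_value(temptation, punishment, delta):
    """Payoff from defecting once, then (D, D) forever under grim trigger."""
    return temptation + delta * punishment / (1 - delta)


def grim_trigger_threshold(temptation, reward, punishment):
    """Smallest delta with cooperate_value >= deviate_value (integer payoffs).

    Rearranging the inequality gives
    delta >= (temptation - reward) / (temptation - punishment).
    """
    return Fraction(temptation - reward, temptation - punishment)


if __name__ == "__main__":
    threshold = grim_trigger_threshold(3, 2, 1)
    print("threshold:", threshold)
    for delta in (Fraction(2, 5), threshold, Fraction(3, 5)):
        print(delta, cooperate_value(2, delta), deviate_value(3, 1, delta))
```

Raising the temptation payoff or the punishment payoff pushes the threshold up, so cooperation then requires more patient players.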

Tit-for-tat: another SPNE in infinitely repeated PD

Tit-for-tat: a player starts by playing C, and then does whatever the other player did in the previous period.

When players follow tit-for-tat, behavior in a subgame hinges on the last outcome in the history that preceded the subgame: (C, C), (C, D), (D, C), or (D, D). Suppose player 2 sticks to t-f-t, and consider player 1's choices.

After (C, C): if player 1 sticks to t-f-t, she gets $2/(1-\delta)$; if she deviates in and only in the period following (C, C) (choosing D in that period), the subgame outcome path will be (D, C), (C, D), (D, C), (C, D), ..., and her payoff is

$$3 + 0 \cdot \delta + 3\delta^2 + 0 \cdot \delta^3 + 3\delta^4 + \cdots = 3 + 3(\delta^2) + 3(\delta^2)^2 + 3(\delta^2)^3 + \cdots = \frac{3}{1-\delta^2}.$$

She has no incentive to deviate if $2/(1-\delta) \ge 3/(1-\delta^2)$, or $\delta \ge 1/2$.

Tit-for-tat (2)

After (C, D): if player 1 sticks to t-f-t, the subgame outcome path will again be (D, C), (C, D), (D, C), (C, D), ..., and her payoff is $3/(1-\delta^2)$. If she deviates in and only in the period following (C, D) (choosing C), the outcome is (C, C) in every subsequent period, so her payoff is $2/(1-\delta)$. For her not to deviate, we must have $3/(1-\delta^2) \ge 2/(1-\delta)$, or $\delta \le 1/2$.

After (D, C): if player 1 sticks to t-f-t, the subgame outcome path will be (C, D), (D, C), (C, D), (D, C), ..., and her payoff is $3\delta/(1-\delta^2)$. If she deviates in and only in the period following (D, C) (choosing D), the outcome is (D, D) in every subsequent period, so her payoff is $1/(1-\delta)$. For her not to deviate, we must have $3\delta/(1-\delta^2) \ge 1/(1-\delta)$, or $\delta \ge 1/2$.

Tit-for-tat (3)

Finally, after a history ending in (D, D): if player 1 sticks to t-f-t, the subgame outcome path will be (D, D) in every subsequent period, and her payoff is $1/(1-\delta)$. If she deviates in and only in the period following the history ending in (D, D) (choosing C), the outcome path in subsequent periods will be (C, D), (D, C), (C, D), (D, C), ..., so her payoff is $3\delta/(1-\delta^2)$. For her not to deviate, we must have $1/(1-\delta) \ge 3\delta/(1-\delta^2)$, or $\delta \le 1/2$.

By the one-deviation property, the strategy profile (tit-for-tat, tit-for-tat) constitutes a SPNE if and only if $\delta = 1/2$ in this PD.
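The four cases can be cross-checked by simulation. The sketch below is an illustration under stated assumptions: it truncates the infinite horizon at `horizon` periods (accurate for $\delta$ well below 1), uses a small tolerance for floating-point comparisons, and the function names are hypothetical. It plays tit-for-tat against tit-for-tat from each possible last outcome, with and without a single deviation by player 1 in the first period of the subgame.

```python
# Player 1's stage payoff, keyed by (player 1 action, player 2 action).
PAYOFF_1 = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def tft_payoff(last_outcome, delta, deviate_first, horizon=500):
    """Player 1's (truncated) discounted payoff in the subgame after last_outcome.

    Both players follow tit-for-tat, except that player 1 flips her action in
    the first period of the subgame when deviate_first is True.
    """
    prev1, prev2 = last_outcome
    total = 0.0
    for t in range(horizon):
        a1, a2 = prev2, prev1  # each player copies the other's previous action
        if t == 0 and deviate_first:
            a1 = "D" if a1 == "C" else "C"
        total += delta ** t * PAYOFF_1[(a1, a2)]
        prev1, prev2 = a1, a2
    return total


def profitable_deviation_exists(delta, tol=1e-9):
    """True if a one-shot deviation beats tit-for-tat after some last outcome."""
    for last_outcome in PAYOFF_1:
        follow = tft_payoff(last_outcome, delta, deviate_first=False)
        deviate = tft_payoff(last_outcome, delta, deviate_first=True)
        if deviate > follow + tol:
            return True
    return False


if __name__ == "__main__":
    for delta in (0.4, 0.5, 0.6):
        print(delta, profitable_deviation_exists(delta))
```

Only at $\delta = 1/2$ does the check find no profitable one-shot deviation, in line with the knife-edge conclusion above.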

The (Perfect) Folk Theorem

So a cooperative outcome (C, C) can be supported in an infinitely repeated PD game by grim trigger strategies if the discount factor is sufficiently high. More generally,

Proposition: Suppose that $\sigma^*$ is a Nash equilibrium of the simultaneous-move game $G$, and consider the supergame $G^\infty(\delta)$ with $\delta < 1$. Let $\sigma$ be any one-shot strategy profile that is strictly preferred to $\sigma^*$ by all players: $u_i(\sigma) > u_i(\sigma^*)$, $\forall i$. There exists $\underline{\delta}$ such that if $\delta \in [\underline{\delta}, 1)$, then the following strategies, $i = 1, 2, \ldots, N$, constitute a SPNE of $G^\infty(\delta)$:

$$\sigma_{i0} = \sigma_i, \quad \text{and} \quad \sigma_{it}(x_t) = \begin{cases} \sigma_i, & \text{if } x_t = (\sigma, \sigma, \ldots, \sigma); \\ \sigma_i^*, & \text{otherwise,} \end{cases}$$

where $\sigma_{it}$ refers to player $i$'s strategy after history $x_t$.

Proof of the (Perfect) Folk Theorem

As in the grim trigger strategy SPNE for PD, the above strategies induce a SPNE in any subgame following a deviation by any player, since the stage-game equilibrium $\sigma^*$ is then played in every period.

To show that no one can do better by unilaterally deviating from $\sigma_i$, we just need to show that no one can benefit from a one-shot deviation $\sigma_i'$:

$$u_i(\sigma_i', \sigma_{-i}) + \frac{\delta\, u_i(\sigma^*)}{1-\delta} \le \frac{u_i(\sigma)}{1-\delta}.$$

The inequality is satisfied when

$$\frac{\delta}{1-\delta} \ge \frac{u_i(\sigma_i', \sigma_{-i}) - u_i(\sigma)}{u_i(\sigma) - u_i(\sigma^*)}.$$

If $\sigma$ is a mixed strategy, the randomization of each player needs to be verifiable ex post.
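The condition on $\delta/(1-\delta)$ can be solved for $\delta$ directly. The worked step below is a sketch: the shorthand $d_i$ for player $i$'s best one-shot deviation payoff against $\sigma_{-i}$ is notation introduced here, not in the original note, and the last line plugs in the payoffs of the PD used for grim trigger (3 from deviating, 2 from cooperating, 1 in the stage-game equilibrium).

```latex
% Shorthand (an assumption of this sketch):
%   d_i = \max_{\sigma_i'} u_i(\sigma_i', \sigma_{-i}), with d_i \ge u_i(\sigma) > u_i(\sigma^*).
\begin{align*}
\frac{\delta}{1-\delta} \ge \frac{d_i - u_i(\sigma)}{u_i(\sigma) - u_i(\sigma^*)}
  &\iff \delta \bigl(u_i(\sigma) - u_i(\sigma^*)\bigr) \ge (1-\delta)\bigl(d_i - u_i(\sigma)\bigr) \\
  &\iff \delta \bigl(d_i - u_i(\sigma^*)\bigr) \ge d_i - u_i(\sigma) \\
  &\iff \delta \ge \frac{d_i - u_i(\sigma)}{d_i - u_i(\sigma^*)}.
\end{align*}
% One valid choice of threshold is therefore
%   \underline{\delta} = \max_i \frac{d_i - u_i(\sigma)}{d_i - u_i(\sigma^*)} < 1.
% In the PD: d_i = 3, u_i(\sigma) = 2, u_i(\sigma^*) = 1, so
%   \underline{\delta} = \frac{3-2}{3-1} = \frac{1}{2}.
```

The threshold is strictly below 1 because $u_i(\sigma) > u_i(\sigma^*)$ for every player, which is why sufficiently patient players can always sustain $\sigma$.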

Significance of the Folk Theorem

Any strategy profile that is Pareto superior to some Nash equilibrium of the stage game can be supported as infinite play on the equilibrium path of a SPNE if the players are patient enough.

There are other versions of the Folk Theorem, in which any feasible payoff profile of the stage game that exceeds each player's min-max payoff can be sustained in a SPNE of the infinitely repeated game, as long as the discount factor is high enough.