The folk theorem revisited


Economic Theory 27, 321-332 (2006). DOI: 10.1007/s00199-004-0580-7

James Bergin, Department of Economics, Queen's University, Ontario K7L 3N6, Canada (e-mail: berginj@qed.econ.queensu.ca)

Received: November 20, 2003; revised version: November 1, 2004

Summary. This paper develops a simple instant-response model of strategic behavior in which players can react instantly to changing circumstances, but at the same time face some inertia after changing action. The framework is used to reconsider the folk theorem and, in particular, the role of the key condition of dimensionality. In contrast to the discounted case in discrete time, here low dimensionality may help support equilibria, because it is more difficult for a potential deviator or punisher to defect beneficially.

Keywords and Phrases: Instant response, Folk theorem.

JEL Classification Numbers: C72, C73.

1 Introduction

The term "quick-response equilibria" was coined by Anderson (1984) in the study of dynamic games and captures the idea that agents can react quickly to changing circumstances. Here, in a variation on the terminology, the term "instant response" is used to reflect the possibility that individuals can react in a time frame so short that a player cannot gain from a strategy change before the reaction occurs. This paper develops an instant-response model of behavior based on two simple behavioral assumptions: if a player initiates a change in action, there is an identifiable first time at which the change occurs; and a player cannot switch action twice at the same time (for example, a player cannot choose x prior to time t, y at time t, and z at all times greater than t). From a descriptive point of view these assumptions are natural; technically, their role is to associate outcomes unambiguously to strategies.

(Thanks are due to an anonymous referee for detailed comments.)

An important motivating application is the standard repeated game. There, a central theme is that substantial cooperation can be sustained in long-run relationships: short-run gains from unilateral pursuit of one's own self-interest ("defection") are offset against the resulting breakdown in cooperation and the impact this has on future payoffs. When payoffs are averaged uniformly, the folk theorem asserts that all feasible and individually rational payoffs are equilibrium payoffs. With discounting, the theorem is cast as a limiting result, as the future is less and less heavily discounted; but it is subject to a key qualification, the dimensionality condition introduced by Fudenberg and Maskin (1986). Low dimensionality limits the scope for selective punishment of individuals, and it is this fact which leads to a discrepancy in the equilibrium payoff set when comparing averaging and discounting. With the limiting average criterion, dimensionality is irrelevant: a one-period payoff gain (from deviation) does not affect a player's overall payoff, nor does any finite length of payoff loss incurred during punishment (of a deviation). Because of these considerations, the folk theorem holds even in the low-dimensional case with the limiting average payoff criterion. When payoffs are discounted, one-shot improving deviations produce an overall gain, and so punishment is both necessary for deterrence and costly to implement. Thus the ability to punish (selectively) becomes important, and is ensured by full dimensionality. The scope for relaxing the dimensionality condition has been explored in detail (Abreu et al., 1994; Wen, 1994). As long as it is possible to punish deviations selectively, the folk theorem continues to hold: gains from deviation can be taken away by player-specific punishment.

In either case (averaging or discounting), deviation from agreement produces gains before reaction can occur; but when payoffs are discounted, the ability to inflict punishment and lower payoffs selectively is the key deterrent in the model where time is measured discretely. In contrast, in the continuous time framework, if reaction occurs before gains to a deviation accrue, the need to inflict punishment is no longer present. This changes the issue from one of balancing gain against punishment to one of coordination (and the incentive to coordinate reaction) such that a deviator does not gain. In the particular case of the (low-dimensional) Fudenberg-Maskin example, the individually rational payoffs turn out to be payoff outcomes of a subgame perfect equilibrium in the instant-response formulation. There, given a deviation by one player, no other player has an incentive not to respond given that the others will; and given that other players respond instantly, the deviating player is constrained by the restriction of not moving twice at the same time, so that the response punishes effectively. However, in a second example, again with a one-dimensional payoff space, not all individually rational feasible payoffs are sustainable as subgame perfect equilibria: in that example there is an incentive to fail to coordinate in preventing gain by a defector, and it is for this reason that the folk theorem fails in continuous time. Thus, in games with a low-dimensional payoff structure, the critical issue is the specific structure of payoffs relative to strategies, rather than low dimensionality per se.

The model is developed next; Section 3 considers the folk theorem. Although strategies are formulated for repeated games, these ideas can be extended to more general games (such as stochastic games with suitable restrictions on transition probabilities).

2 The framework

The definition of a strategic game, G = {A_i, u_i}_{i=1}^n, is standard. There is a set of players, I = {1, ..., n}, where each i ∈ I has action set A_i and payoff function u_i : A → R (A := ∏_{i=1}^n A_i). In the repeated game a set of time periods, T, is given, and players make a choice at each t ∈ T. Recall that when time is discrete (T = {1, 2, ...}), an outcome for the game is a sequence of period choices, h = {a_t}_{t∈T} (a_t ∈ A), generating corresponding payoffs for each player i, {u_i(a_t)}_{t∈T}. The payoff attached to the (discrete time) repeated game with averaging is lim inf_{τ→∞} (1/τ) ∑_{t=1}^{τ} u_i(a_t), and the payoff with discounting is (1 − δ) ∑_{t=1}^{∞} δ^{t−1} u_i(a_t). A strategy for i is a sequence of functions σ_i = (σ_i(t))_{t∈T}, where σ_i(1) ∈ A_i and, for t > 1, σ_i(t) : A^{t−1} → A_i. With that framework, a strategy profile σ = (σ_1, ..., σ_n) determines an outcome in the game period by period: a_1 = h(1) = (σ_i(1))_{i=1}^n, a_2 = h(2) = (σ_i(h(1), 2))_{i=1}^n, and, given a history to time t > 1, h^t = (h(1), ..., h(t−1)), a_t = h(t) = (σ_i(h^t, t))_{i=1}^n; and so on. Adopting the convention that h^1 denotes no information, at any time t ≥ 1, given h^t, the strategy profile determines the action choices at all subsequent periods, (h(t), ..., h(t')) for any t' ≥ t.

In this paper the game is played in continuous time, and the appropriate time domain is T = [0, 1). An outcome in the game is a function h : T → A, associating actions to each point in time. As before, A = ∏_i A_i, and each A_i is taken to be a compact metric space (with metric ρ_i). The corresponding payoff flow for i is {u_i(h(t))}_{t∈T}. Time is weighted (discounted) according to an atomless distribution λ with full support equal to [0, 1). If the outcome is h, the associated payoff to i is V_i(h) = ∫_T u_i(h(t)) λ(dt).
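To make the contrast between the two discrete-time criteria concrete, the following minimal sketch (not from the paper; the payoff values are hypothetical) evaluates a cooperative stream against a stream with a one-period deviation gain followed by a finite punishment phase:

    # Illustrative sketch (hypothetical values): discounted versus
    # limiting-average evaluation of two payoff streams. Cooperation pays 1
    # forever; deviation pays 2 once, then 0 for nine punishment periods,
    # then 1 forever.

    def discounted_value(head, tail, delta):
        # (1 - delta) * sum_{t>=1} delta**(t-1) * u_t, with constant
        # payoff 'tail' after the finite list 'head'.
        v = (1 - delta) * sum(delta**t * u for t, u in enumerate(head))
        return v + delta**len(head) * tail

    def limiting_average_value(head, tail):
        # liminf_tau (1/tau) * sum_{t<=tau} u_t: any finite head is irrelevant.
        return tail

    cooperate = ([], 1.0)                # 1, 1, 1, ...
    deviate = ([2.0] + [0.0] * 9, 1.0)   # 2, then 0 x 9, then 1 forever

    for delta in (0.5, 0.9, 0.99):
        print(delta,
              round(discounted_value(*cooperate, delta), 4),
              round(discounted_value(*deviate, delta), 4))
    # Under averaging both streams are worth 1: neither the one-period gain
    # nor the finite punishment affects the limiting average payoff.

For δ = 0.5 the deviation stream is worth slightly more than cooperation, while for δ near 1 it is worth strictly less, illustrating why the discounted folk theorem is a limiting result in δ.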

In the discrete time environment, the unequivocal association of outcomes to strategies permits a straightforward definition of subgames and subgame perfect equilibrium: defining a strategy as a rule selecting a choice at each history unambiguously determines an outcome in the game when time is discrete. But in the continuous time game, subtle issues arise because it is impossible to proceed on a period-by-period basis to iteratively construct outcomes from strategies. The following example illustrates.

Example 1. A difficulty in the formulation of strategies in continuous time was noted by Anderson (1984). Consider the prisoner's dilemma game where each player begins by cooperating, and at each time chooses cooperation if the other player has always chosen cooperation in the past; otherwise, the player chooses defect. In the discrete time setting this is an unambiguously defined strategy producing the outcome where each player chooses cooperation throughout the game. Likewise, on all subgames, outcomes are unambiguously determined. However, in the continuous time context, for any t > 0, a history of cooperation up to and including t, and defection thereafter, is consistent with the strategies: on [0, t] cooperation is observed; on (t, 1) defection is observed. At any τ ∈ (t, 1) there is τ' ∈ (t, τ) at which defection has occurred, justifying the defection at τ according to the strategy. Thus a sensible verbal description of strategies leads to ambiguity or incoherency in the determination of outcomes.

One approach to this difficulty is to build lags into the model. Introducing endogenous lags in the continuous time framework can be motivated by the observation that people do in fact change decisions at discrete steps in time. That formulation is developed in Bergin (1992) and Bergin and MacLeod (1992), and is based on lags of endogenous length: players cannot react infinitely fast, but there is no lower bound on the speed of reaction. An alternative procedure is developed by Simon and Stinchcombe (1989), where continuous time strategies are uniquely defined for any arbitrarily fine grid. They also provide a detailed discussion of the technical and conceptual issues involved in modeling continuous time games.

2.1 Formulation of strategies

As Example 1 illustrates, strategies specified pointwise do not determine a unique outcome: some restrictions are required. Yet any formulation that permits instant response inevitably faces this difficulty. The formulation here rests on two simple assumptions. The first requires that if a player initiates a deviation (rather than responds to one), there must be an identifiable first time at which the deviation began: it is impossible to contemplate immediate reaction if one cannot identify when a change occurs. (In Example 1, there is no first time at which the deviation began.) The second is the instant response assumption, which allows a player to react instantly to a deviation by some other player. But a player cannot instantly react following a change of own action. Implicitly, this enforces the condition that a player cannot switch actions twice in an instant of time: after changing action there is some inertia before further change can occur. So, whether initiating a change or reacting to one, there is some lag before subsequent changes by the same individual, although this lag may have no lower bound. These are behavioral assumptions concerning the capability of a player to implement action change over time. In what follows, these features are formalized to define continuous time strategies.

A history for i is a function h_i : T → A_i. An outcome in the game, h, is a collection of such histories, one for each player. The set of outcomes in the game is H = {h = (h_1, ..., h_n) | h_i : T → A_i, i = 1, ..., n}. Define a strategy for i as a function σ_i : H × T → A_i. For fixed h, σ_i(h, ·) : T → A_i. Because actions can only depend on the past, we require that if h(τ) = h'(τ), ∀τ ∈ [0, t), then σ_i(h, t) = σ_i(h', t). (In the discrete time game this is captured directly by writing σ_i(t) : A^{t−1} → A_i.) Call this requirement measurability. In view of the example above, additional conditions are required if a strategy is to determine an outcome unambiguously.
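Before the formal definitions, it is worth seeing how trivially the discrete-time recursion resolves the problem that continuous time raises. The sketch below (illustrative, not from the paper) constructs the grim-trigger outcome of Example 1 period by period, exactly the device unavailable in continuous time:

    # Illustrative sketch: period-by-period outcome construction in discrete
    # time for the grim-trigger strategy of Example 1. 'C' = cooperate,
    # 'D' = defect.

    def grim_trigger(history, player):
        # Cooperate iff the other player has always cooperated so far.
        other = 1 - player
        return 'C' if all(a[other] == 'C' for a in history) else 'D'

    def build_discrete_outcome(strategies, periods):
        history = []                          # h^1: no information
        for t in range(periods):
            a_t = tuple(s(history, i) for i, s in enumerate(strategies))
            history.append(a_t)               # h^{t+1} = (h(1), ..., h(t))
        return history

    print(build_discrete_outcome([grim_trigger, grim_trigger], 5))
    # [('C', 'C'), ('C', 'C'), ...]: cooperation throughout

In continuous time there is no "next instant" after t, so no such recursion exists; this is the gap the conditions below are designed to fill.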

To formalize the ideas on the structure of response times described above, some notions of restricted variation in action are required; these are developed next.

Definition 1. Let f be a function from the set of times to the set of actions for i, f : T → A_i. Say that f is left convergent at t if the limit lim_{τ↑t} f(τ) exists; in this case, denote it f^−(t). Say that f is left continuous at t if the limit lim_{τ↑t} f(τ) exists and f^−(t) = f(t). Say that f : T → A_i is right constant at t ∈ [0, 1) if ∃ε > 0 such that f(τ) = f(t), ∀τ ∈ [t, t + ε).

Put differently, f is left convergent at t if ∃f^−(t) ∈ A_i such that ρ_i(f(τ), f^−(t)) → 0 as τ ↑ t. (If f is not left convergent at t, then by compactness of A, ∃δ > 0 such that for any ε > 0 there are τ, τ' ∈ (t − ε, t) with ρ_i(f(τ), f(τ')) > δ.) The functions h_i and σ_i(h, ·) are particular functions mapping from T to A_i. For example, if A_i = [0, 1] and h_i(t) = t_k for t ∈ [t_k, t_{k+1}), where t_k = 1/2 − (1/2)^k, k = 1, 2, ..., then h_i is left convergent at t = 1/2 with h_i^−(1/2) = 1/2. If h_i(1/2) = 1/2, then h_i is left continuous at t = 1/2. Say that h = (h_1, ..., h_n) : T → A_1 × ... × A_n is left convergent if each h_i is left convergent, and left continuous if each h_i is left continuous.
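The step-function example just given can be checked numerically. The sketch below (illustrative only) evaluates h_i(t) = t_k on [t_k, t_{k+1}) with t_k = 1/2 − (1/2)^k and confirms that h_i(τ) approaches 1/2 as τ ↑ 1/2, i.e. that h_i is left convergent at t = 1/2:

    # Illustrative check of the example following Definition 1.

    def t_(k):
        return 0.5 - 0.5**k                 # t_1 = 0, t_2 = 1/4, ... -> 1/2

    def h_i(t):
        # Step function: value t_k on [t_k, t_{k+1}); set h_i(1/2) = 1/2,
        # which makes h_i left continuous at 1/2 as well.
        if t >= 0.5:
            return 0.5
        k = 1
        while not (t_(k) <= t < t_(k + 1)):
            k += 1
        return t_(k)

    for eps in (1e-1, 1e-3, 1e-6):
        tau = 0.5 - eps
        print(tau, h_i(tau), abs(h_i(tau) - 0.5))   # gap -> 0 as tau -> 1/2-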

With this terminology, Definition 2 provides the conditions defining a strategy. Later, these are shown to be sufficient to address the issues raised by Example 1.

Definition 2. A strategy for i is a function σ_i : H × T → A_i such that for all h ∈ H:
1. σ_i(h, t) is right constant at t = 0; if h is left continuous at t > 0, then σ_i(h, t) is right constant at t.
2. If h is not left continuous at t:
(a) if h_i is not left continuous at t, then σ_i(h, t) is right constant at t;
(b) if h_i is left continuous at t, then ∃δ_i(h, t) > 0 and a_i ∈ A_i such that σ_i(h, τ) = a_i, ∀τ ∈ (t, t + δ_i(h, t)).

Denote by Σ_i the set of strategies of player i. The definition of a strategy formalizes the intuition given earlier. Still, the issues are delicate, so the features of a strategy merit some detailed discussion.

2.1.1 Interpretation of strategies

Condition 1 ensures that if a player initiates a change in choice, there is an identifiable first time. For example, if h(τ) = h(t) = a for all τ ∈ (t', t), some t' < t, so that h is left continuous at t, then right constancy of σ_i requires that σ_i(h, τ) = a_i for all τ in some interval [t, t''), t'' > t. In particular, the possibility that i chooses a_i at t but σ_i(h, τ) = b ≠ a_i for all τ in some interval (t, t''), t'' > t, is excluded, since this would imply that each player, including i, chose a constant action from some time t' < t up to (and including) time t, but with i choosing b on (t, t''), so that there is no first time at which i's choice switched from a_i to b. Note that this restriction is consistent with a player, say i, playing a_i on [t − ε, t) and a_i' ≠ a_i on [t, t + ε'), where ε, ε' > 0 (since left continuity is not satisfied at t).

Condition 2 allows instant response, but a deviating player cannot instantly respond to an own initiated deviation. If h is not left continuous at t, then at least one player has an action change at t. For any such player, say i, there are two ways in which an action change can occur. If h_i is left convergent at t, then the limit h_i^−(t) exists but, for some δ > 0, ρ_i(h_i^−(t), h_i(t)) ≥ δ. If h_i is not left convergent at t > 0, then there is a δ > 0 such that for every ε > 0 there are τ, τ' ∈ (t − ε, t) with ρ_i(h_i(τ), h_i(τ')) ≥ δ, and so there is some δ' > 0 such that for small ε > 0 there is a τ ∈ (t − ε, t) with ρ_i(h_i(τ), h_i(t)) ≥ δ'. From condition 2a, if h has an action change at t, any player i who changed action at t must continue with the action chosen at t from some time forward (the strategy is right constant). Condition 2b requires that if some other player(s) changed action at time t but i did not, then i can switch (instantly) to some action, a_i, which must then be chosen for a length of time depending on the history and current time: i is free to react.

In this formulation, those players that did not switch action at t (those for which h_i^−(t) = h_i(t)) can react instantly to the change that occurred at t, but they cannot react instantly to each other: if player 1 switches action at t, and 2 then responds by switching from, say, a at t to b on (t, t + ε), then player 3 cannot respond by switching from c at t to d on (t, t + ε'), taking into account the switch in actions of both 1 and 2. This would involve 3 reacting not only to 1 but also to 2, although there is no first time at which 2 switched action. Put differently, individuals who did not switch at time t are symmetric in terms of the information they possess when choosing an action on (t, t + ε), for some ε > 0.

Finally, since σ_i maps to A_i, the strategies used are pure strategies. One consequence of this is that the min-max payoffs defined later are relative to pure strategies. There are a number of conceptual difficulties with the use of min-max strategies involving randomization in the continuous time framework. For example, if the min-max strategy involves randomization by a player, and that player chooses the min-max strategy at each instant in time, then, viewed as a stochastic process, measurability of the strategy implies that the distributions on choices at different points in time are not independent (although the distributions on actions at each point in time are the same: the min-max distribution). In principle, the player subject to punishment could exploit this fact to raise their payoff above the min-max level even during a punishment phase. There is no obvious (or perhaps no natural) way to address this issue; here, this and other conceptual issues associated with randomization in continuous time are avoided by the focus on pure strategies.

2.1.2 Strategies and outcomes

Definition 2 imposes behavioral restrictions in defining a strategy. It is necessary to confirm that strategies so defined do indeed unequivocally determine outcomes. A key point of Example 1 above is that informally defined strategies may not unambiguously determine outcomes.

A strategy profile σ = {σ_i}_{i=1}^n is a collection of n strategies, one for each player. Say that a strategy profile σ determines a unique outcome at a history-time pair (h, t) ∈ H × [0, 1) if, given (h, t), there is a unique h̄ ∈ H such that σ(h̄, τ) = h̄(τ), ∀τ ∈ [0, 1), if t = 0; and, if t > 0, h̄(τ) = h(τ), ∀τ ∈ [0, t), and σ(h̄, τ) = h̄(τ), ∀τ ∈ [t, 1). Call a strategy profile coherent if outcomes are unambiguously determined at every (h, t) pair; this is formalized in Definition 3.

Definition 3. A strategy profile {σ_i}_{i=1}^n is coherent if it determines a unique outcome at every history and time pair (h, t) ∈ H × [0, 1).

The next result confirms that strategies as formulated above (in Definition 2) are coherent.

Theorem 1. Every strategy profile is coherent: given any (h, t), a strategy profile determines a unique outcome on [0, 1).

Proof. At t = 0, for each i, σ_i(h, 0) = a_i for some a_i ∈ A_i. From right constancy of σ_i, for any h there is ε_i > 0 such that σ_i(h, t) = a_i, ∀t < ε_i. In particular, for any h' with h'(t) = a, ∀t < ε := min_i ε_i, σ(h', t) = a. Since σ(h', t) is determined on [0, ε), σ(h', ε) = a', say, is uniquely determined (by measurability). If a' = a, then h' is left continuous at t = ε and condition 1 in the definition of a strategy applies: there is ε' > 0 such that, for h' equal to χ_{[0,ε)}(t)·a + χ_{[ε,ε+ε')}(t)·a on [0, ε + ε'), σ(h', t) = h'(t), ∀t ∈ [0, ε + ε'). If a' ≠ a, then for one or more i, a_i' ≠ a_i. For any such i, condition 2a implies that σ_i(h', t) = a_i' on a neighborhood of the form [ε, ε + ε(h', ε)), where ε(h', ε) > 0, from the condition of right constancy. If a_i' = a_i (but a' ≠ a), then from condition 2b there is some a_i'' with σ_i(h', t) = a_i'' on a neighborhood (ε, ε + ε(h', ε)). Let a'' be the vector of actions determined this way. Then, from condition 2, for some ε'' > 0, if h' agrees with χ_{[0,ε)}(t)·a + χ_{[ε,ε+ε'')}(t)·a'', then σ(h', t) = χ_{[0,ε)}(t)·a + χ_{[ε,ε+ε'')}(t)·a'' on [0, ε + ε''). Proceed iteratively in this way, constructing a unique outcome on lengthening intervals I_1 = [0, t_1), I_2 = [0, t_2), ..., I_j = [0, t_j), with t_j > t_{j−1}. Let I_∞ = ∪_{j≥1} [0, t_j) = [0, t_∞). If t_∞ < 1, the earlier calculation can be repeated to identify a unique outcome further in time, beyond t_∞. The upper bound on this process is 1. Thus a strategy profile determines a unique outcome from time 0, say h̄. Next consider a pair (h̃, t), where the history h̃ may differ from h̄ somewhere on [0, t). From measurability, σ(h̃, t) is determined. The earlier construction can now be applied to determine a unique outcome on [t, 1). Thus, given t ∈ (0, 1) and history to t agreeing with h̃, σ determines a unique outcome h̄' in the game: h̄'(τ) = h̃(τ), τ ∈ [0, t), and h̄'(τ) is determined by σ on [t, 1). ∎
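The proof's interval-by-interval construction has a natural computational analogue when each strategy is represented by an initial action, a reaction rule, and a strictly positive dwell time after an own switch. The sketch below is an illustration under these simplifying assumptions (a fixed dwell standing in for the history-dependent δ_i(h, t) > 0), not the paper's formal object; it builds a piecewise-constant outcome on [0, 1) as a finite list of switch times:

    # Sketch: event-driven outcome construction mirroring Theorem 1.
    # A player who switches at t is locked in until t + dwell, so within a
    # single instant each player can move at most once.

    def build_outcome(players, horizon=1.0, max_events=100):
        n = len(players)
        t, profile = 0.0, [p['initial'] for p in players]
        free_at = [0.0] * n                   # when each player may next switch
        segments = [(t, tuple(profile))]      # outcome as (start, profile) pieces
        while t < horizon and len(segments) < max_events:
            movers = [i for i in range(n)
                      if free_at[i] <= t
                      and players[i]['react'](tuple(profile)) != profile[i]]
            if movers:
                targets = {i: players[i]['react'](tuple(profile)) for i in movers}
                for i, a in targets.items():  # instant reaction, then lock-in
                    profile[i] = a
                    free_at[i] = t + players[i]['dwell']
                segments.append((t, tuple(profile)))
                continue                      # others may react at the same instant
            pending = [f for f in free_at if f > t]
            if not pending:
                break                         # current profile is absorbing
            t = min(pending)                  # advance to the next unlock time
        return segments

    # Usage: a grim responder facing an unconditional defector.
    grim = {'initial': 'C',
            'react': lambda prof: 'D' if 'D' in prof else 'C',
            'dwell': 0.1}
    defect = {'initial': 'D', 'react': lambda prof: 'D', 'dwell': 0.1}
    print(build_outcome([grim, defect]))
    # [(0.0, ('C', 'D')), (0.0, ('D', 'D'))]: an instant response at t = 0
    # (the first segment has zero length), after which the profile is absorbing.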

2.2 Payoffs, games and subgames

From the previous discussion, since strategies are coherent, given a strategy profile σ, a unique outcome is determined; and given a history h and time t ∈ (0, 1), σ determines a unique history on [t, 1). For t ∈ (0, 1) write h(h, σ, t) to denote the history that equals h on [0, t) and is determined by σ, given (h, t), on [t, 1). Thus, for example, h(h, σ, t)(τ) = h(τ), 0 ≤ τ < t. At t = 0, denote the history fully determined by σ as h(h, σ, 0); although h(h, σ, 0) is independent of h, the notation preserves symmetry with the case where t > 0. From this formulation, payoffs as functions of strategies may be defined. Recall that the payoff to player i from history h is ∫_{[0,1)} u_i(h(τ)) λ(dτ). So, with strategy profile σ = (σ_i)_{i=1}^n, the corresponding payoff is

V_i^0(σ_1, ..., σ_n) = ∫_{[0,1)} u_i(h(h, σ, 0)(τ)) λ(dτ).

Definition 4. The infinitely repeated game in continuous time is Γ = (Σ_i, V_i)_{i=1}^n, where Σ_i is the strategy space of i and V_i the payoff function.

With these details in place, the definition of Nash equilibrium is standard: a strategy profile σ ∈ ∏_i Σ_i is a Nash equilibrium if, for each i, V_i^0(σ) ≥ V_i^0(σ_i', σ_{−i}) for all σ_i' ∈ Σ_i. The following discussion provides a framework for the formulation of subgames and subgame perfection.

For t ∈ [0, 1), a game may be defined on [t, 1). For 0 ≤ t < 1, let H^t = {h : [t, 1) → A} be the set of functions from [t, 1) to A, and for 0 ≤ t ≤ 1 let H_t = {h : [0, t) → A} be the set of functions from [0, t) to A. Thus H^0 = H_1 = H. Taking H^t and [t, 1) as the history set and time domain respectively, let player i's strategy map from H^t × [t, 1) to A_i, σ_i^t : H^t × [t, 1) → A_i, and assume that the strategy on this domain satisfies conditions analogous to those in Definition 2. Such a strategy determines a unique outcome ĥ(h^t, σ^t, t)(τ), τ ∈ [t, 1), analogous to h(h, σ, 0)(t), t ∈ [0, 1) (so, in particular, ĥ(h^t, σ^t, t)(τ) is independent of h^t ∈ H^t, while for 1 > t' > t, ĥ(h^t, σ^t, t')(τ) may depend on the value of h^t on the interval [t, t')). From this, the payoff associated to σ^t is V_i^t(σ_1^t, ..., σ_n^t) = ∫_{[t,1)} u_i(ĥ(h^t, σ^t, t)(τ)) λ(dτ), where h^t ∈ H^t and the game begins at time t. Together, the strategy spaces Σ_i^t, i = 1, ..., n, and the payoff functions V_i^t, i = 1, ..., n, define a subgame Γ^t = (Σ_i^t, V_i^t)_{i=1}^n of Γ^0.

It remains to describe strategies induced on a subgame by a strategy profile in the initial game. Given h ∈ H, let h_t(τ) := h(τ), τ ∈ [0, t), and h^t(τ) := h(τ), τ ∈ [t, 1), so h_t ∈ H_t and h^t ∈ H^t. Considering a strategy σ_i ∈ Σ_i and time t, given any h ∈ H, the strategy σ_i^{(h,t)}(h̃^t, τ) := σ_i((h_t, h̃^t), τ) is a strategy in Σ^t. So, given any (h, t) there is an associated (sub)game, Γ^t; and given a strategy σ_i ∈ Σ_i^0, there is an implied (or induced) strategy in Γ^t associated with (h, t). Thus, on the subgame associated with (h, t), the strategy profile derived from σ determines a payoff on the subgame:

V_i^{(h,t)}(σ) = ∫_{[t,1)} u_i(h(h, σ, t)(τ)) λ(dτ).

Thus V_i^{(h,0)}(σ) = V_i^0(σ). A strategy profile σ is subgame perfect if, for each i and each (h, t), V_i^{(h,t)}(σ) ≥ V_i^{(h,t)}(σ_i', σ_{−i}) for all σ_i' ∈ Σ_i.
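For piecewise-constant outcomes, which is all the event-driven construction above ever produces, the payoff integral reduces to a finite sum of flow payoffs weighted by the λ-measure of each constant segment. A minimal sketch, assuming for concreteness that λ is the uniform (Lebesgue) distribution on [0, 1) (the paper only requires λ atomless with full support):

    # Minimal sketch: V_i(h) for a piecewise-constant outcome, with lambda
    # taken to be uniform on [0, 1) (an assumption made for illustration).

    def payoff(segments, u_i, horizon=1.0):
        # segments: list of (start_time, action_profile), as produced by
        # build_outcome above.
        total = 0.0
        for k, (start, prof) in enumerate(segments):
            end = segments[k + 1][0] if k + 1 < len(segments) else horizon
            total += u_i(prof) * (end - start)   # lambda-measure of the segment
        return total

    # Example: flow payoff 1 while both cooperate, 0 otherwise.
    u = lambda prof: 1.0 if prof == ('C', 'C') else 0.0
    print(payoff([(0.0, ('C', 'C')), (0.5, ('D', 'D'))], u))   # 0.5

Because λ is atomless, a profile held at only a single instant contributes nothing to this sum; this fact drives Proposition 1 below.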

3 The folk theorem revisited

In characterizing equilibria in repeated games, two sets are central: the feasible set and the individually rational set. Let u = (u_1, ..., u_n), U = {u ∈ R^n | ∃a ∈ A, u = u(a)}, and define V to be the convex hull of U. The set V is the set of feasible payoffs. The minmax payoff of player i is defined as ν_i = min_{a_{−i}} max_{a_i} u_i(a_i, a_{−i}). The set of feasible individually rational payoffs is V* = {v ∈ V | v_i ≥ ν_i}. Loosely stated, the folk theorem asserts that the feasible individually rational payoffs arise in subgame perfect equilibrium when players are sufficiently patient. In the repeated n-player game with the limiting average criterion, for any point in V* there is a subgame perfect equilibrium with that payoff vector; this is shown by Rubinstein (1977) and Aumann and Shapley (1976). With discounting, for two-player games and for n-player games where the dimension of V is n, Fudenberg and Maskin (1986) show that for each point u ∈ V* with u_i > ν_i for all i, there is a δ' < 1 such that for any δ ∈ (δ', 1) there is a subgame perfect equilibrium with payoff u. Benoit and Krishna (1985) prove a similar result in the finitely repeated game with a large number of repetitions, again assuming the dimensionality condition. The importance of the dimensionality condition is shown by the next example.

Example 2. The following example is due to Fudenberg and Maskin. The game G_1 has three players, but the set of payoffs is one-dimensional. In this game, player 1 chooses the row, 2 chooses the column, and 3 chooses the matrix. Each player in this game has a minmax payoff of 0, but for any δ < 1 there is no subgame perfect equilibrium with payoff lower than 1/4.

However, while full dimensionality of the payoff set is a sufficient condition for the identification of subgame perfect equilibria with the feasible individually rational set, it is not necessary. Abreu, Dutta and Smith (1994) and Wen (1994) identify milder conditions under which the folk theorem continues to hold: Abreu, Dutta and Smith impose a pairwise dimensionality condition; Wen modifies the notion of minmax payoff, recognizing the possible perfect correlation of payoffs.

Values of δ close to 1 identify naturally with small time intervals between moves. Let ρ be the instantaneous discount rate, so that the discount factor per unit of time Δ is δ = e^{−ρΔ}; for t periods, the discount factor is then δ^t = e^{−ρΔt}, and as Δ → 0, δ → 1. As noted earlier, there is a discontinuity in the subgame perfect equilibrium outcome set when comparing the limiting average with limiting discounting (δ → 1). In the discrete or discretized model, no matter how small Δ is, there is scope for unilateral gain which can be sustained for a period Δ: the Fudenberg-Maskin example illustrates that low dimensionality in payoff space may impose a uniform bound away from the individually rational payoff vector in all subgame perfect equilibria (for all discount factors δ).
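The payoff matrices of G_1 are not reproduced in this transcription. As a concrete stand-in, the sketch below uses the well-known matching rendering of the Fudenberg-Maskin example, in which each player chooses L or R and all three receive 1 if their choices coincide and 0 otherwise; this payoff rule is an assumption made purely for illustration. It computes each player's pure-strategy minmax ν_i = min_{a_{−i}} max_{a_i} u_i(a_i, a_{−i}):

    # Illustrative computation of pure-strategy minmax values for a
    # 3-player game. The payoff rule is an assumed rendering of the
    # Fudenberg-Maskin example (common payoff 1 iff all actions coincide),
    # since the matrices do not survive in the text.

    from itertools import product

    ACTIONS = ('L', 'R')

    def u(profile):
        # Common payoff: 1 if all three players choose the same action.
        return 1.0 if len(set(profile)) == 1 else 0.0

    def minmax(i):
        # min over the opponents' pure actions of i's best-response payoff
        best = []
        for a_others in product(ACTIONS, repeat=2):
            values = []
            for a_i in ACTIONS:
                prof = list(a_others)
                prof.insert(i, a_i)          # rebuild the full profile
                values.append(u(tuple(prof)))
            best.append(max(values))
        return min(best)

    print([minmax(i) for i in range(3)])     # [0.0, 0.0, 0.0]

Under this rule the pure minmax is 0 (opponents simply mismatch), every profile gives a common payoff, and the feasible set is one-dimensional, {(x, x, x) | x ∈ [0, 1]}, consistent with the statements about G_1 in the text.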

In the quick-response formulation, matters are dramatically different.

Proposition 1. For the game G_1, the payoff (0, 0, 0) is a subgame perfect equilibrium payoff. This implies that the set of subgame perfect equilibrium payoffs is {(x, x, x) | x ∈ [0, 1]}.

Proof. At any point in time t, the choice profile is one of eight (2^3) possibilities. Consider one such possibility, (R, R, L). The following discussion shows that there is a subgame perfect equilibrium where players choose (R, R, L) at every point along the equilibrium path, with corresponding payoff (0, 0, 0). Suppose that σ(h, 0) = (R, R, L). By right constancy, there are some positive t̄, a ∈ A, and h with h(τ) = a and σ(h, τ) = h(τ) for τ < t̄; thus on the interval [0, t̄), (R, R, L) is played at each point in time. Suppose some player deviates. Only player 3 can gain, so suppose that player 3 deviates to R at t, leading to the choice (R, R, R) at time t. Let the strategies of 1 and 2 specify a switch to L, moving the action profile to (L, L, R). From condition 2a, player 3 now plays R for some positive length of time past t; players 1 and 2 can react, so let both switch to L. From condition 2b, players 1 and 2 now play L for some positive length of time. This can be depicted:

(R, R, L)_{<t}, (R, R, R)_{[t]}, (L, L, R)_{>t}.

All players now play (L, L, R) for some period of time. If 3 deviates again, to L at time t, a similar cycle occurs: (L, L, R)_{<t}, (L, L, L)_{[t]}, (L, R, R)_{>t}. (The notation (L, L, R)_{<t} means that the triple (L, L, R) was chosen for a period of time up to t, on some interval (t', t), t' < t; similarly, (L, R, R)_{>t} means that (L, R, R) is chosen for some period immediately after t, on (t, t''), t'' > t.) Going forward in time determines a subgame perfect equilibrium with payoff (0, 0, 0). Starting from any strategy profile in the game, a deviation by one player followed by a change of action by the other two players leaves the payoff at 0, and no player can gain by deviation. The key feature of this equilibrium is that, following a deviation by one player, quick response by the other two is optimal in that neither gains by unilaterally not adopting the specified response: the agreement to punish is not subject to cheating. Furthermore, once both switch, the deviator is locked in for some positive length of time (so that no tail-chasing of strategy choice occurs). ∎
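Under an atomless λ, a deviation that is met instantly occupies a λ-null set of times and so cannot raise the deviator's payoff, which is the heart of Proposition 1. Continuing with the assumed matching-game payoffs and the uniform λ from the earlier sketches, the deviation-and-response path of the proof evaluates as follows:

    # The deviation path of Proposition 1 under the assumed matching-game
    # payoffs: (R,R,L) up to t = 0.3, (R,R,R) at the single instant t
    # (a lambda-null set, so it carries no weight and is omitted from the
    # segment list), then (L,L,R) thereafter. Uses payoff() from the
    # earlier sketch, with lambda uniform on [0, 1).

    u3 = lambda prof: 1.0 if len(set(prof)) == 1 else 0.0  # player 3's payoff

    equilibrium_path = [(0.0, ('R', 'R', 'L'))]
    deviation_path = [(0.0, ('R', 'R', 'L')),   # agreement up to t = 0.3
                      (0.3, ('L', 'L', 'R'))]   # instant response after t

    print(payoff(equilibrium_path, u3))   # 0.0
    print(payoff(deviation_path, u3))     # 0.0: the instant of (R,R,R) gains nothing

In discrete time, by contrast, the profile (R, R, R) would be held for a full period of positive weight, so the deviation would pay and punishment would be needed.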

The next result shows that the subgame perfect equilibrium payoff sets are nested. Let SP_δ be the set of subgame perfect equilibrium payoffs in the game with discount factor δ < 1, let SP_1 be the corresponding set when the limiting average criterion is used, and let SP be the set of instant-response subgame perfect equilibrium payoffs. Then:

Theorem 2. For 0 < δ < 1, SP_δ ⊆ SP ⊆ SP_1.

Proof. To show SP ⊆ SP_1, it is sufficient to show that payoffs in SP are never below the minmax level. Let b_i be a selection from the best response correspondence and consider the function b_i(σ_{−i}(h, t)). For some ε > 0, σ_{−i}(h, t) = a_{−i} on [0, ε), for some a_{−i} ∈ A_{−i}. On [0, ε), define σ_i'(h, t) = b_i(σ_{−i}(h, t)) = a_i'. Measurability of σ_{−i} implies that any associated history h (equal to (a_i', a_{−i})) determines the value of σ_{−i}(h, ε). If σ_{−i}(h, ε) = a_{−i}, define σ_i'(h, ε) = a_i'. In this case the associated history is left continuous at t = ε, so σ_{−i}(h, ·) is right constant at t = ε and equal to a_{−i} on [ε, ε + ε') for some ε' > 0; define σ_i'(h, t) = b_i(σ_{−i}(h, t)) as the strategy of player i on the interval [ε, ε + ε'). If σ_{−i}(h, ε) = a_{−i}' ≠ a_{−i}, define σ_i'(h, ε) = a_i'. The strategies σ_{−i} then specify some constant choice, say â_{−i}, on an interval (ε, ε̂); let σ_i'(h, t) = b_i(â_{−i}) on (ε, ε̂). This construction can be extended forward over the full time domain (as in Theorem 1), and the strategy for i defined this way achieves a payoff at least as high as i's minmax payoff.

To show SP_δ ⊆ SP, it is sufficient to show that every equilibrium in the game with discount factor δ defines a corresponding equilibrium in the instant response game. Partition T into intervals {[t_k, t_{k+1})}_{k=1}^∞ with λ([t_k, t_{k+1})) = (1 − δ)δ^{k−1}, so that interval k carries the weight of discrete period k. Let h^d = {h_t}_{t=1}^∞ be the subgame perfect equilibrium outcome path in the discrete time game. Define a continuous time outcome path h̄ according to h̄(τ) = h_s, τ ∈ [t_s, t_{s+1}). In the continuous time game, h̄ gives the same payoff as {h_t} in the discrete game. The outcome path h̄ is a subgame perfect equilibrium outcome path: the following strategy profile is a corresponding subgame perfect equilibrium strategy. At t_1 = 0, define σ_i(h, 0) = h̄_i(0) for all h. For t ∈ (t_1, t_2), if h = h̄ on [0, t), let σ_i(h, t) = h̄_i(t). If h ≠ h̄ on [0, t), let t_d^i = inf{t' ∈ [0, t) | h_i(t') ≠ h̄_i(t')}; put t* = min_i {t_d^i}, and let I* = {i | t_d^i = t*}. If I* has just one element, there is a unique minimizer, say i*. For i ≠ i*, let σ_i(h, t*) = h̄_i(t*), and σ_{i*}(h, t*) = a_{i*}, the defecting choice of i*. Let a^{i*} be the strategy profile that minmaxes i*. On (t*, t_2), let each i ≠ i* play a_i^{i*} (the punishment strategy for i ≠ i*) and let i* play a_{i*}. If players adhere to this strategy over (t*, t_2), they continue with h̄ after t_2. If there is a defection, the defecting player is punished in a similar manner. If there is not a unique minimizer, define i* = min{i ∈ I*} and proceed as before. ∎
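The proof's embedding of the discounted game is easy to make concrete when λ is uniform on [0, 1): taking λ([t_k, t_{k+1})) = (1 − δ)δ^{k−1} gives cut points t_k = 1 − δ^{k−1}, and the continuous-time integral of the step path reproduces the discrete discounted sum. A small sketch under these assumptions, with a hypothetical finite head of flow payoffs (the constant tail is handled identically):

    # Sketch of the proof's partition with lambda uniform on [0, 1):
    # interval k = [t_k, t_{k+1}) has length (1 - delta) * delta**(k-1),
    # i.e. t_k = 1 - delta**(k-1), so a discrete path copied onto the
    # intervals has continuous-time payoff equal to its discounted payoff.

    delta = 0.9
    flows = [2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]   # u_i(h_t), hypothetical

    t = lambda k: 1.0 - delta**(k - 1)                 # t_1 = 0, t_k -> 1

    continuous = sum(u * (t(k + 1) - t(k))
                     for k, u in enumerate(flows, start=1))
    discrete = (1 - delta) * sum(delta**(s - 1) * u
                                 for s, u in enumerate(flows, start=1))
    print(abs(continuous - discrete) < 1e-12)          # True: payoffs agree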

This discussion might suggest that, with the addition of quick response, the issue of dimensionality disappears and equivalence between limiting average and limiting discounting is restored. However, the speed of response is not the only issue, as the next example illustrates.

Example 3. This example modifies the previous game so that the feasible individually rational payoff set is still a one-dimensional set with minmax payoff (0, 0, 0). However, in this case the minmax payoff is not subgame perfect in the instant response framework. To see this, consider some triple of actions giving a payoff of (0, 0, 0), for example, (R, R, L). A deviation by player 3 to R requires a switch by player 1 to L, and no change in action by player 2, for punishment to take effect. But in this case, if 1 fails to switch, then for an interval of time 1 accumulates a payoff flow of 1, for a positive overall payoff. Similar issues arise with each of the choice profiles giving a payoff of (0, 0, 0).

4 Conclusion

In fully characterizing equilibria of discrete time repeated games, the dimensionality condition has played a key role; with discounting, this is intimately related to the ability to punish selectively, whereas with the limiting average criterion the issue disappears, since punishment is costless. This paper shows that when instant reaction is allowed, low dimensionality is still a problem, but for very different reasons. In particular, it is not low dimensionality per se that precludes supporting all individually rational payoffs, but the way in which action profiles and corresponding payoffs are configured in low-dimensional environments. In the instant response context, dimensionality is not the key issue; rather, it is the strategic structure of the game (the configuration of payoffs relative to actions) which is critical in identifying equilibrium payoffs.

References

1. Abreu, D., Dutta, P.K., Smith, L.: The folk theorem for repeated games: a NEU condition. Econometrica 62, 939-948 (1994)
2. Anderson, B.: Quick-response equilibrium. Working Papers in Economic Theory and Econometrics #IP-323, Center for Research in Management, University of California, Berkeley (1984)
3. Bergin, J.: A model of strategic behavior in repeated games. Journal of Mathematical Economics 21, 113-153 (1992)
4. Bergin, J., MacLeod, W.B.: Continuous time repeated games. International Economic Review 34(1), 21-37 (1992)
5. Fudenberg, D., Maskin, E.: The folk theorem in repeated games with discounting or with incomplete information. Econometrica 54, 533-554 (1986)
6. Simon, L.K., Stinchcombe, M.B.: Extensive form games in continuous time: pure strategies. Econometrica 57, 1171-1214 (1989)
7. Wen, Q.: The "folk theorem" for repeated games with complete information. Econometrica 62, 949-954 (1994)