Convergence of Best-response Dynamics in Zero-sum Stochastic Games


David Leslie, Steven Perkins, and Zibo Xu

April 3, 2015

Abstract

Given a two-player zero-sum discounted-payoff stochastic game, we introduce three classes of continuous-time best-response dynamics: stopping-time best-response dynamics, closed-loop best-response dynamics, and open-loop best-response dynamics. We show the global convergence of the first two classes to the set of minimax strategy profiles, and the convergence of the last class when the players are not patient. We also show that the payoffs in a modified closed-loop best-response dynamic converge to the asymptotic value of the zero-sum stochastic game.

1 Introduction

The continuous-time best-response dynamic is a well-known evolutionary dynamic. It takes the form of a differential inclusion, with a constant revision rate in myopic optimization; see Matsui (1989), Gilboa and Matsui (1991), Hofbauer (1995), and Balkenborg et al. (2013). A state in the dynamic specifies the strategy profile of all players, and the frequency of a strategy increases only if it is a best response to the current state. The continuous-time best-response dynamic has been analyzed in various classes of games; see Hofbauer and Sigmund (1998) and Sandholm (2010). In particular, convergence of a continuous-time best-response dynamic to the set of Nash equilibria has been shown in Harris (1994), Hofbauer (1995), and Hofbauer and Sorin (2006) for two-player zero-sum games, in Harris (1994) for weighted-potential games, and in Berger (2005) for 2 × n games.

A zero-sum stochastic game with discounted payoff was introduced in Shapley (1953). The study of non-zero-sum stochastic games builds on the zero-sum case: by the Folk Theorem, the analysis of the former often needs the values of related zero-sum stochastic games, which serve as punishment for any player who deviates from the strategy agreed upon beforehand; see Dutta (1995).

In the present paper, we introduce several candidates for the continuous-time best-response dynamic in two-player zero-sum stochastic games with discounted payoff. We should first point out that it is tricky to define a best-response dynamic in a stochastic game, and indeed no established notion is available in the literature yet. We construct a few dynamics in the present paper, and the convergence results depend heavily on the conditions of the dynamic. The key question is what information and how much rationality the players have in the game. For instance, given the stationary strategy of the opponent, can a player find her best stationary strategy against it? This innocent-looking question is not easy to answer, or at least the search takes quite a long time, even if the player is equipped with a modern computer. Note that the best stationary strategy may not be a pure strategy, because the discounted payoff in a stochastic game may be concave or convex in a player's strategy. A more suitable question for myopic players may be whether, given a stationary strategy profile, a player can compute the discounted payoff, or whether a referee is available to provide this information.

Regarding convergence of best-response dynamics, one may wonder why we do not simply view a zero-sum stochastic game as a general zero-sum game with a compact and convex strategy space for each player and apply results such as those in Hofbauer and Sorin (2006). Our answer is that the transition between states after each stage of a stochastic game makes the dynamic more complicated than those studied in Hofbauer and Sorin (2006). In particular, Hofbauer and Sorin (2006) consider only games whose payoff is concave in player 1's strategy and convex in player 2's strategy. In a stochastic game, however, it can be the opposite, i.e., convex for player 1 and concave for player 2: player 1's strategy in the current state may give her not only a better payoff today but also a better state position tomorrow. Another natural attempt is to approximate the stochastic game by a super normal-form game in which each player's strategy is a bounded sequence of actions in the state games. However, one can only find an approximate value of the game this way. Furthermore, the stationary minimax strategy of the stochastic game may still look elusive, as we can only see how players best respond in a truncated stochastic game. Finally, as the discount factor increases, i.e., as the players become more and more patient, we need a bigger and bigger super normal-form game to approximate the stochastic game.

For an evolutionary dynamic in a stochastic game, we assume that a player does not have the full rationality to compute the global best response given the strategy profiles of all players.

However, the player should be able to see the difference between her state-game payoff and the continuation payoff in that state. For the stopping-time best-response dynamics and closed-loop best-response dynamics defined in this paper, we specify how the continuation payoff vector evolves in each state game along the dynamic. We then study how to apply the convergence result for continuous-time best-response dynamics in normal-form zero-sum games in the context of evolving state games.

We first follow the original idea in Shapley (1953), define the stopping-time best-response dynamic in a zero-sum stochastic game, and show the convergence of the dynamic to the set of minimax strategy profiles. This dynamic requires only that both players play best responses against each other, as in a standard continuous-time best-response dynamic. The continuation payoff vector in each state game is updated to the current payoff vector of that state game only at countably many times, and the time interval between two consecutive updates is sufficiently long.

The stopping-time best-response dynamic reminds us that, for a state game with a fixed continuation payoff vector, the best-response dynamic converges to the set of minimax strategy profiles of that zero-sum game; but how can this be applied to learning in the original stochastic game? To this end, for each state s, we introduce a feedback loop from the current payoff of the state game at s to the continuation payoff of state s assumed in each state game. We further require that the players play best responses against each other in all state games simultaneously, while the commonly shared continuation payoff vector approaches the state-game payoff vector as in continuous-time fictitious play. As time passes, the continuation payoffs change more and more slowly, but the players can still adjust their strategies at the same speed as before. The key to the convergence of the closed-loop best-response dynamic in a zero-sum stochastic game is simply this difference in adjustment speed between the best-response dynamics on the players' strategies and the fictitious play on the continuation payoffs.

In the literature, there are a few papers concerning algorithms for computing the value of a zero-sum stochastic game with discounted payoff; see, e.g., Vrieze and Tijs (1982) and Borkar (2002). However, they do not take the perspective of an evolutionary or learning process. The closed-loop best-response dynamic instead follows a rudimentary approach observed in the real world: when the state is changing very slowly, the players may simply play best responses in the current state, even if they know that the future states depend on their current behavior and that the currently assumed continuation payoffs may not match the payoffs generated in the state games.

Familiar examples are production that consumes natural resources and the issue of global warming. In the long run, what is obtained in a state game must match the corresponding continuation payoff, and the players will eventually learn what behavior is best suited and how much payoff can be sustained in each state.

We can go further and propose a variant of the closed-loop best-response dynamic such that the value in each state game converges to the asymptotic value of the zero-sum stochastic game as the discount factor increases to 1. For this purpose, we simply make the discount factor change even more slowly than the continuation payoffs. Note that we only obtain convergence in value here, not necessarily in stationary strategies. Similarly to Harris (1994), we can show that the rate of convergence in payoff terms is $1/t$ and $1/\ln t$ for the closed-loop best-response dynamic and the discount-factor-converging best-response dynamic, respectively.

For the case in which the feedback loop is unavailable, we introduce open-loop best-response dynamics, and assume that a referee tells each player her discounted payoff given the current stationary strategy profile of both players in the zero-sum stochastic game. In this dynamic, a player is unable to understand the infinitely long stochastic state-transition process, but naively reduces the stochastic game to a zero-sum state game whose continuation payoff vector is the current discounted payoff vector. When the players play best responses against each other in such a state game, each player assumes that the other player will use the same current strategy from stage 2 onwards, even if she is aware that the opponent is adjusting her strategy at stage 1 of the stochastic game. We regard the open-loop best-response dynamic as a primary model for studying myopic behavior in state games that approximate the stochastic game. We are also interested in whether this dynamic can serve as an alternative algorithm to compute the value of the zero-sum stochastic game. In Section 3.3, we show that when the discount factor is not too big, i.e., when the players are not patient, the open-loop best-response dynamic converges to the set of stationary minimax strategy profiles.

2 The Model

We begin by reviewing two-player normal-form zero-sum games.

2.1 Normal-form Zero-sum Games

In a zero-sum game $G$ in which the pure strategy sets of players 1 and 2 are $A^1$ and $A^2$, respectively, the $(a^1, a^2)$ element $u(a^1, a^2)$ of the payoff matrix denotes the payoff to player 1 when player 1 plays $a^1$ and player 2 plays $a^2$. We can then linearly extend the payoff function of player 1 to mixed strategies, i.e., $u(x^1, x^2)$ is defined for any $x^1 \in \Delta(A^1)$ and $x^2 \in \Delta(A^2)$. Recall that the value of the game is
$$v(G) = \max_{x^1 \in \Delta(A^1)} \min_{x^2 \in \Delta(A^2)} u(x^1, x^2) = \min_{x^2 \in \Delta(A^2)} \max_{x^1 \in \Delta(A^1)} u(x^1, x^2).$$
A minimax strategy of player 1 guarantees her a payoff no less than $v(G)$, regardless of the strategy of player 2; similarly, a minimax strategy of player 2 guarantees that player 1's payoff is no more than $v(G)$. A minimax strategy profile is also a Nash equilibrium in $G$.

2.1.1 Preliminary Results

Lemma 2.1. Given a positive finite number $c$, if we modify the payoff function $u$ to $u'$ with the property $|u'(a^1, a^2) - u(a^1, a^2)| \le c$ for all $(a^1, a^2) \in A^1 \times A^2$, then for any (mixed) strategy profile $(x^1, x^2)$, $|u'(x^1, x^2) - u(x^1, x^2)| \le c$.

Proof. This follows from the linearity of $u$.

Lemma 2.2. Given a positive finite number $c$, if we modify the payoff function $u$ to $u'$ with the property $|u'(a^1, a^2) - u(a^1, a^2)| \le c$ for all $(a^1, a^2) \in A^1 \times A^2$, then $|v(G) - v(G')| \le c$, where $G'$ is the game with the modified payoff function $u'$.

Proof. For any minimax strategy profile $(x^1, x^2)$ in $G$ and any minimax strategy profile $(\tilde x^1, \tilde x^2)$ in $G'$, we have
$$u(x^1, \tilde x^2) \ge u(x^1, x^2) \quad\text{and}\quad u'(\tilde x^1, \tilde x^2) \ge u'(x^1, \tilde x^2).$$
Thus
$$u(x^1, x^2) - u'(\tilde x^1, \tilde x^2) \le u(x^1, \tilde x^2) - u'(x^1, \tilde x^2) \le \max_{a^1, a^2} |u'(a^1, a^2) - u(a^1, a^2)| \le c,$$
by Lemma 2.1. Similarly, we can show that
$$u'(\tilde x^1, \tilde x^2) - u(x^1, x^2) \le \max_{a^1, a^2} |u'(a^1, a^2) - u(a^1, a^2)| \le c.$$
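The value and minimax strategies above are the basic computational objects used throughout the paper. As an illustration only (not part of the original text), here is a minimal numerical sketch of computing $v(G)$ and a minimax strategy of player 1 by linear programming; the use of numpy and scipy.optimize.linprog is an assumption of the sketch, and any LP solver would do.

```python
# Sketch (not from the paper): value and a minimax strategy of a finite
# zero-sum game via linear programming.
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(U):
    """Return (v, x1): the value and a maximin mixed strategy of player 1
    for the payoff matrix U (rows = player 1's actions)."""
    m, n = U.shape
    # Variables (x1_1, ..., x1_m, v).  Maximise v subject to
    #   sum_i x1_i * U[i, j] >= v  for every column j,
    #   sum_i x1_i = 1,  x1 >= 0,  v free.
    c = np.zeros(m + 1); c[-1] = -1.0                  # minimise -v
    A_ub = np.hstack([-U.T, np.ones((n, 1))])          # v - x1^T U[:, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

# Matching pennies: value 0, minimax strategy (1/2, 1/2).
v, x1 = solve_matrix_game(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(v, x1)
```

By symmetry, a minimax strategy of player 2 can be obtained by applying the same routine to $-U^\top$.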

2.1.2 Best-response Dynamics in Normal-form Zero-sum Games

In continuous-time best-response dynamics, player 1 revises her strategy according to the set of current best-response strategies $br^1(x^2) := \operatorname*{argmax}_{\rho^1 \in \Delta(A^1)} u(\rho^1, x^2)$; similarly, for player 2, $br^2(x^1) := \operatorname*{argmin}_{\rho^2 \in \Delta(A^2)} u(x^1, \rho^2)$. A best-response dynamic in a normal-form zero-sum game $G$ satisfies
$$\dot x^i \in br^i(x^{-i}) - x^i, \quad i = 1, 2, \qquad (2.1)$$
where the dot denotes the derivative with respect to time, and we have suppressed the time argument. Since best-response strategies are in general not unique, this is, rigorously speaking, a differential inclusion, not always reducible to a differential equation. In normal-form zero-sum games, the set $br^i(x^{-i})$ is upper semi-continuous in $x^{-i}$. Hence, from any initial strategy profile $(x^1(0), x^2(0))$, a solution trajectory $(x(t))_{t \ge 0}$ exists, and $x(t)$ is Lipschitz continuous and satisfies (2.1) for almost all $t \ge 0$; see Aubin and Cellina (1984).

Given a strategy profile $(x^1, x^2)$, we define
$$H(x^2) := \max_{y^1 \in \Delta(A^1)} u(y^1, x^2), \qquad (2.2)$$
and
$$L(x^1) := \min_{y^2 \in \Delta(A^2)} u(x^1, y^2). \qquad (2.3)$$
For any $t \ge 0$, we define along a solution trajectory $(x(t))_{t \ge 0}$
$$w(t) := H(x^2(t)) - L(x^1(t)). \qquad (2.4)$$
We call $w(t)$ the energy of the dynamic at time $t$. It is straightforward to see that
$$|u(x^1(t), x^2(t)) - v(G)| \le w(t), \quad t \ge 0, \qquad (2.5)$$
and that $w(t) = 0$ if and only if $x^1(t)$ and $x^2(t)$ are minimax strategies of players 1 and 2, respectively. Thus, at such a time $t$, $u(x^1(t), x^2(t)) = v(G)$.

Harris (1994) and Hofbauer and Sorin (2006) show the following result.

Theorem 2.3. Given a normal-form zero-sum game $G$, along every solution trajectory $(x(t))_{t \ge 0}$ of (2.1), $w(t)$ satisfies
$$\dot w(t) = -w(t) \quad\text{for almost all } t, \qquad (2.6)$$
hence
$$w(t) = e^{-t} w(0). \qquad (2.7)$$
Thus, every solution trajectory of (2.1) converges to the set of minimax strategy profiles in $G$.
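To illustrate Theorem 2.3 numerically (this sketch is not from the paper), one can discretize the inclusion (2.1) with a forward Euler step and watch the energy (2.4) decay approximately like $e^{-t}$; the payoff matrix below is an arbitrary example.

```python
# Sketch (not from the paper): forward-Euler discretisation of the
# best-response inclusion (2.1) for a matrix game, tracking the energy (2.4).
import numpy as np

U = np.array([[3.0, -1.0], [-2.0, 4.0]])   # arbitrary example payoff matrix
rng = np.random.default_rng(0)
x1 = rng.dirichlet(np.ones(U.shape[0]))    # initial mixed strategies
x2 = rng.dirichlet(np.ones(U.shape[1]))
dt, T = 0.01, 10.0

for step in range(int(T / dt)):
    b1 = np.zeros_like(x1); b1[np.argmax(U @ x2)] = 1.0   # a best response of player 1
    b2 = np.zeros_like(x2); b2[np.argmin(x1 @ U)] = 1.0   # a best response of player 2
    x1 += dt * (b1 - x1)                                  # Euler step for (2.1)
    x2 += dt * (b2 - x2)
    w = np.max(U @ x2) - np.min(x1 @ U)                   # energy (2.4)

print("final energy", w, "strategies", x1, x2)
```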

Sketch Proof: $\dot w = H'(x^2; \dot x^2) - L'(x^1; \dot x^1)$, where $f'(x^i; \dot x^i)$ denotes the one-sided directional derivative of $f$ at $x^i$ in the direction $\dot x^i$. The best-response strategies picked along the solution trajectory at time $t$ are denoted by $b^1_t \in br^1(x^2_t)$ and $b^2_t \in br^2(x^1_t)$. Then, by the envelope theorem, we have $\dot w = u(b^1, \dot x^2) - u(\dot x^1, b^2)$. From (2.1), it follows that
$$\dot w = u(b^1, b^2 - x^2) - u(b^1 - x^1, b^2) = -u(b^1, x^2) + u(x^1, b^2) = -w.$$

Hofbauer and Sorin (2006) have extended this convergence result to continuous concave-convex zero-sum games, i.e., games in which $u(x^1, x^2)$ is concave in $x^1$ for each fixed $x^2$ and convex in $x^2$ for each fixed $x^1$. Barron et al. (2009) further extend the convergence result to all continuous quasiconcave-quasiconvex zero-sum games for best-response dynamics on convex/concave envelopes of the payoff function.

2.2 Zero-sum Stochastic Games

A two-player zero-sum discounted-payoff stochastic game is a tuple $\Gamma = \langle I, S, A, P, r, \omega \rangle$ constructed as follows.

Let $I = \{1, 2\}$ be the set of players.

Let $S$ be a finite set of states. For each player $i$ in state $s$, $A^i_s$ denotes her finite set of actions. For each state $s$, we write $A_s := A^1_s \times A^2_s$ for the set of action pairs.

For each pair of states $(s, s')$ and each action pair $a \in A_s$, $P_{s,s'}(a)$ is the transition probability from state $s$ to state $s'$ given the action pair $a$.

We define $r_s(\cdot)$ to be the stage payoff function for player 1. That is, when the process is at state $s$, $r_s(a)$ is the stage payoff to player 1 for the action pair $a \in A_s$. Note that, in a zero-sum game, player 2 always receives stage payoff $-r_s(a)$.

$\omega$ is a discount factor that determines the importance of future stage payoffs relative to the current stage payoff.

In any state $s$, player $i$ can play a mixed action $\pi^i_s \in \Delta(A^i_s)$: $\pi^i_s(a^i)$ denotes the probability that, when in state $s$, player $i$ selects action $a^i \in A^i_s$. In this paper, we only consider stationary strategies for both players. A stationary strategy $\pi^i \in \Delta^i := \prod_{s \in S} \Delta(A^i_s)$ of player $i$ specifies, for each state $s$, a mixed action $\pi^i_s$ to be played whenever the state is $s$. For convenience, we may write $\Delta^i_s$ for $\Delta(A^i_s)$. We denote a strategy profile by $\pi = (\pi^1, \pi^2) = ((\pi^1_s)_{s \in S}, (\pi^2_s)_{s \in S})$, and the set of strategy profiles by $\Delta := \Delta^1 \times \Delta^2$.

Given a strategy profile $\pi$, for any state $s$, we may write
$$r_s(\pi) = r_s(\pi^1_s, \pi^2_s) = \sum_{a \in A_s} \pi^1_s(a^1)\,\pi^2_s(a^2)\,r_s(a),$$
and a similar convention applies to the transition probability $P_{s,s'}(\pi)$ from state $s$ to $s'$. We can then define the expected discounted payoff for player 1 starting in state $s$ under the strategy profile $\pi$ as
$$u_s(\pi) := E\Bigl[(1-\omega) \sum_{n=0}^{\infty} \omega^n r_{s_n}(\pi) \,\Big|\, s_0 = s\Bigr], \qquad (2.8)$$
where $\{s_n\}_{n \in \mathbb{N}}$ is the stochastic process representing the state at each iteration, and the factor $(1-\omega)$ normalizes the discounted payoff. Of course, player 2 has expected discounted payoff $-u_s(\pi)$. Define
$$b_1 := \min_{s \in S,\, a \in A_s} r_s(a), \qquad b_2 := \max_{s \in S,\, a \in A_s} r_s(a), \qquad B := [b_1, b_2]. \qquad (2.9)$$
Then $B$ is the set of achievable discounted payoffs in $\Gamma$, and $u_s(\pi) \in B$ for any strategy profile $\pi$ and any starting state $s$.

A Nash equilibrium $\tilde\pi$ requires that, for both players $i = 1, 2$ and all states $s$ in $S$,
$$u^i_s(\tilde\pi) \ge u^i_s(\pi^i, \tilde\pi^{-i}), \quad \forall \pi^i \in \Delta^i. \qquad (2.10)$$
Note that in a zero-sum stochastic game $\Gamma$, any strategy of any player $i$ in a Nash equilibrium is a minimax strategy of that player. Shapley (1953) proves that for every two-player zero-sum discounted-payoff stochastic game $\Gamma$ and every starting state $s$, there exists a unique optimal value $V_s$, called the value of state $s$, equal to the expected discounted payoff that player 1 can guarantee by any minimax strategy. Shapley (1953) further shows the existence of a stationary minimax strategy profile, and that for any stationary minimax strategy profile $\tilde\pi$, $V_s$ satisfies the equations
$$V_s = (1-\omega)\, r_s(\tilde\pi) + \omega \sum_{s' \in S} P_{s,s'}(\tilde\pi)\, V_{s'}, \quad \forall s \in S. \qquad (2.11)$$
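Because the strategies are stationary, the normalized payoff (2.8) can be computed without simulating the process: it solves the linear system $u = (1-\omega)\, r(\pi) + \omega P(\pi)\, u$. The following sketch (an illustration, not part of the paper; numpy assumed) does exactly that for given averaged stage payoffs and transition matrix.

```python
# Sketch (not from the paper): the discounted payoff (2.8) of a fixed
# stationary profile solves  u = (1 - omega) * r(pi) + omega * P(pi) u.
# r_pi[s] and P_pi[s, s'] are the stage payoff and transition probabilities
# already averaged under pi.
import numpy as np

def discounted_payoff(r_pi, P_pi, omega):
    """r_pi: (S,) stage payoffs under pi; P_pi: (S, S) transition matrix under pi."""
    S = len(r_pi)
    return np.linalg.solve(np.eye(S) - omega * P_pi, (1 - omega) * r_pi)

# Two-state toy example (hypothetical numbers, for illustration only).
r_pi = np.array([1.0, 0.0])
P_pi = np.array([[0.9, 0.1],
                 [0.5, 0.5]])
print(discounted_payoff(r_pi, P_pi, omega=0.9))
```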

We can also study the asymptotic behavior in a stochastic game $\Gamma(\omega)$ as $\omega$ increases to 1. Given a finite stochastic game, at each state $s \in S$ the asymptotic value $\lim_{\omega \to 1} V_s(\omega)$ exists; see Bewley and Kohlberg (1976) and Mertens and Neyman (1981).

Denote the sets of stationary minimax strategies of player 1 and player 2 in the stochastic game by $X^1$ and $X^2$, respectively, and the set of stationary minimax strategy profiles by $X$.

2.3 A State Game

If every stage payoff after the initial stage is a constant, then the stochastic game reduces to a one-shot normal-form game. Given a zero-sum stochastic game $\Gamma$, for any vector $\bar u = (\bar u_s)_{s \in S}$ of finite numbers, we define for each state $s$ a normal-form zero-sum state game $G_s(\bar u)$ on the action sets $A^1_s$ and $A^2_s$ of players 1 and 2, respectively: the payoff function of player 1 in $G_s(\bar u)$ is
$$z^{\bar u}_s(a) := (1-\omega)\, r_s(a) + \omega \sum_{s' \in S} P_{s,s'}(a)\, \bar u_{s'}, \quad a \in A_s. \qquad (2.12)$$
As $G_s(\bar u)$ is a zero-sum game, player 2 receives payoff $-z^{\bar u}_s(a)$. We view an action at state $s$ in $\Gamma$ as a strategy in the state game $G_s(\bar u)$. The payoff function above can be linearly extended to mixed strategy profiles in $\Delta(A_s)$. We call the vector $\bar u$ in (2.12) the continuation payoff vector of the state game. We denote the value of this state game by $v_s(\bar u)$ and a minimax strategy of player $i$ by $x^i_s(\bar u)$, for $i = 1, 2$. We call $G_s(V)$ the value state game at state $s$, where $V = (V_{s'})_{s' \in S}$ is defined in (2.11).

To define a best-response dynamic in a stochastic game and to show its convergence, we will apply continuous-time best-response dynamics in state games. We are now ready to study three classes of continuous-time best-response dynamics in zero-sum stochastic games with discounted payoff.
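Equations (2.11) and (2.12) suggest the classical way to compute the values $V_s$: iterate the map that replaces $\bar u_s$ by the value of the state game $G_s(\bar u)$. The sketch below (an illustration, not part of the paper) implements this value iteration, with the matrix-game value computed by the same LP as in the earlier sketch; scipy is again an assumption, and the data layout (r[s] a payoff matrix, P[s][(i, j)] a probability vector over states) is hypothetical.

```python
# Sketch (not from the paper): Shapley-style value iteration.  Each step builds
# the state games G_s(u) of (2.12) from the current payoff vector u and replaces
# u_s by the value of G_s(u); the fixed point satisfies (2.11).
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(U):
    m, n = U.shape
    c = np.zeros(m + 1); c[-1] = -1.0
    A_ub = np.hstack([-U.T, np.ones((n, 1))])
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[-1]

def shapley_value_iteration(r, P, omega, iters=200):
    """r[s]: payoff matrix at state s; P[s][(i, j)]: distribution over next states
    for the action pair (i, j) at s.  Returns approximate values V_s."""
    S = len(r)
    u = np.zeros(S)
    for _ in range(iters):
        u_new = np.empty(S)
        for s in range(S):
            m, n = r[s].shape
            z = np.array([[(1 - omega) * r[s][i, j] + omega * P[s][(i, j)] @ u
                           for j in range(n)] for i in range(m)])   # payoff (2.12)
            u_new[s] = matrix_game_value(z)
        u = u_new
    return u
```

Since this map is a contraction with modulus $\omega$ in the sup norm, the iterates converge geometrically to $(V_s)_{s \in S}$.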

3 Best-response Dynamics in Zero-sum Stochastic Games

All three classes of continuous-time best-response dynamics below may be viewed as variations of an agent-form best-response dynamic; see Appendix B. In these dynamics, each player in the stochastic game is represented by an agent in each state, and the two agents at a state continuously play best responses against each other in that state game under some condition, while all other agents play simultaneously in all other states. The profile of the strategies of player $i$'s agents in all state games at any time $t$ is the evolving (stationary) strategy of player $i$ in the stochastic game at time $t$ in the dynamic.

3.1 Stopping-time Best-response Dynamics

For each state of the zero-sum stochastic game $\Gamma$, we construct below a continuous-time best-response dynamic in that state game, in which the continuation payoff vector is updated at countably many discrete times. To be specific, the continuation payoffs in each state game are updated at a sequence of stopping times defined in terms of the energy $w_s$ in that state game.

We choose an arbitrary positive number $\mu \in (0, 1)$, and we first pick an arbitrary payoff vector $\bar u_0$ with each element $\bar u_{s,0} \in B$. Given any initial condition $(x_s(0))_{s \in S}$, for each state $s$, consider a continuous-time best-response dynamic $(x_s(t))_{0 \le t \le T_1}$ defined as in (2.1) in the state game $G_s(\bar u_0)$, where $T_1$ is defined in (3.2) below. If there is no ambiguity, we abbreviate the payoff function of $G_s(\bar u_0)$ to $z^0_s(\cdot)$. We let $w_s(t)$ denote the energy in $G_s(\bar u_0)$ at time $t \in (0, T_1]$. Note that for $t \in (0, T_1]$, $x_s(t)$ is a minimax strategy profile in the state game $G_s(\bar u_0)$ if and only if $w_s(t) = 0$. Moreover, from (2.5), it follows that
$$|z^0_s(x_s(t)) - v_s(\bar u_0)| \le w_s(t). \qquad (3.1)$$
We stop the best-response dynamic at time
$$T_1 := \min\{\, t \ge 0 : \max_{s \in S} w_s(t) \le \mu \,\}, \qquad (3.2)$$
and record $\bar u_1 := (z^0_s(x_s(T_1)))_{s \in S}$. We then run the best-response dynamics in the state games $G_s(\bar u_1)$ at all states $s$ for all $t \in (T_1, T_2]$, with
$$T_2 := \min\{\, t \ge T_1 : \max_{s \in S} w_s(t) \le \mu/2 \,\},$$
where $w_s(t)$ is now defined in $G_s(\bar u_1)$. After recording $\bar u_2 := (z^1_s(x_s(T_2)))_{s \in S}$, we then run best-response dynamics in the state games $G_s(\bar u_2)$, and so on. For completeness, we let $T_0 = 0$.

In the best-response dynamics $(x_s(t))_{s \in S}$ thus defined, there is an increasing sequence $(T_n)_{n \in \mathbb{N}}$, possibly with $T_n = T_{n+1}$ for some $n$, such that for each $n \ge 0$,
$$T_{n+1} := \min\{\, t \ge T_n : \max_{s \in S} w_s(t) \le \mu/2^n \,\}, \qquad (3.3)$$
where $w_s(t)$ is defined in $G_s(\bar u_n)$, and $\bar u_{n+1} := (z^n_s(x_s(T_{n+1})))_{s \in S}$ is defined recursively. For every finite time $t \ge 0$, there is $n \ge 0$ such that
$$T_n \le t \le T_{n+1}. \qquad (3.4)$$
Note that we run the best-response dynamic in all state games $G_s(\bar u_n)$ with this $n$ at such $t$. If there is no ambiguity about $n$, we further denote
$$y_s(t) := z^n_s(x_s(t)), \quad s \in S,$$
at each $t \in (T_n, T_{n+1}]$.

Under the stopping-time best-response dynamics $(x_s(t))_{s \in S, t \ge 0}$ thus defined, at each state $s$, $x_s(t)$ is Lipschitz continuous except at countably many times, and $w_s(t)$ is continuous on every interval $(T_n, T_{n+1}]$ for $n \ge 0$, but possibly discontinuous at some $T_n$.
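A discrete-time caricature of this construction (not from the paper; same hypothetical data layout as the sketches above) keeps $\bar u$ fixed, runs Euler steps of (2.1) in every state game until the largest energy falls below the current threshold $\mu/2^n$, then records the state-game payoffs as the new continuation vector.

```python
# Sketch (not from the paper): Euler caricature of the stopping-time dynamic.
import numpy as np

def state_game_payoff(r_s, P_s, u_bar, omega):
    m, n = r_s.shape
    return np.array([[(1 - omega) * r_s[i, j] + omega * P_s[(i, j)] @ u_bar
                      for j in range(n)] for i in range(m)])        # (2.12)

def stopping_time_br(r, P, omega, mu=0.5, rounds=8, dt=0.01, inner_steps=20000):
    S = len(r)
    u_bar = np.zeros(S)
    x1 = [np.ones(r[s].shape[0]) / r[s].shape[0] for s in range(S)]
    x2 = [np.ones(r[s].shape[1]) / r[s].shape[1] for s in range(S)]
    for n in range(rounds):
        threshold = mu / 2 ** n
        # BR dynamics with u_bar held fixed; the iteration cap is a safeguard,
        # since the Euler discretisation cannot reach arbitrarily small energies.
        for _ in range(inner_steps):
            w = np.zeros(S)
            for s in range(S):
                z = state_game_payoff(r[s], P[s], u_bar, omega)
                b1 = np.zeros_like(x1[s]); b1[np.argmax(z @ x2[s])] = 1.0
                b2 = np.zeros_like(x2[s]); b2[np.argmin(x1[s] @ z)] = 1.0
                x1[s] += dt * (b1 - x1[s]); x2[s] += dt * (b2 - x2[s])
                w[s] = np.max(z @ x2[s]) - np.min(x1[s] @ z)
            if w.max() <= threshold:                  # stopping time, cf. (3.3)
                break
        u_bar = np.array([x1[s] @ state_game_payoff(r[s], P[s], u_bar, omega) @ x2[s]
                          for s in range(S)])          # record the new continuation vector
    return u_bar, x1, x2
```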

Lemma 3.1. Each $T_n$ is bounded.

Proof. From the definition of $b_1$, $b_2$, and $B$ in (2.9), it follows that a continuation payoff $\bar u_{s,n}$ is always in $B$. Thus, at any state $s$ and any time $t \ge 0$,
$$0 \le w_s(t) \le b_2 - b_1. \qquad (3.5)$$
From Theorem 2.3, it follows that $\dot w_s(t) = -w_s(t)$ for almost all $t$ in $[T_n, T_{n+1})$ for any $n \ge 0$. Therefore, at any state $s$,
$$w_s(t) = w_s(T_n)\, \exp(-(t - T_n)), \quad T_n < t \le T_{n+1},\ n \ge 0. \qquad (3.6)$$
From (3.5) and (3.3), we have $\lim_{t \downarrow 0} w_s(T_n + t) \le b_2 - b_1$ and $\max_{s \in S} w_s(T_{n+1}) = \mu/2^n$ when $T_n < T_{n+1}$. For such a pair $(T_n, T_{n+1})$, we may then further infer from (3.6) that
$$T_{n+1} - T_n \le \ln(b_2 - b_1) - \ln\frac{\mu}{2^n}. \qquad (3.7)$$

Theorem 3.2. For each state $s$, as $t \to \infty$, $y_s(t) \to V_s$, and $x^1(t)$ and $x^2(t)$ converge to the sets of stationary minimax strategies of players 1 and 2, respectively, in the stochastic game $\Gamma$.

Proof. The proof is analogous to the argument used in Shapley (1953). For each $s \in S$ and $n \ge 0$, denote by $v_{s,n}$ the value of the state game $G_s(\bar u_n)$. Note that $\bar u_{s,n+1}$ is the payoff to player 1 in the state game $G_s(\bar u_n)$ at time $t = T_{n+1}$. After time $T_{n+1}$, the state game $G_s(\bar u_n)$ transforms into $G_s(\bar u_{n+1})$ with continuation payoffs $y_s(T_{n+1}) = \bar u_{s,n+1}$ for each state $s$ in $S$. From (3.3), it follows that at each state $s$, for each $n \ge 0$,
$$|\bar u_{s,n+1} - v_{s,n}| \le \frac{\mu}{2^n}. \qquad (3.8)$$
Therefore, for all $n > 0$,
$$|\bar u_{s,n+1} - \bar u_{s,n}| \le \frac{\mu}{2^n} + \frac{\mu}{2^{n-1}} + |v_{s,n} - v_{s,n-1}|. \qquad (3.9)$$
Comparing the state games $G_s(\bar u_{n-1})$ and $G_s(\bar u_n)$, we find that each element of the payoff matrix changes by at most $\omega \max_{s' \in S} |\bar u_{s',n-1} - \bar u_{s',n}|$. Hence, by Lemma 2.2,
$$\max_{s \in S} |v_{s,n-1} - v_{s,n}| \le \omega \max_{s \in S} |\bar u_{s,n-1} - \bar u_{s,n}|. \qquad (3.10)$$
From (3.9) and (3.10), it follows that
$$\max_{s \in S} |\bar u_{s,n+1} - \bar u_{s,n}| \le \frac{3\mu}{2^n} + \omega \max_{s \in S} |\bar u_{s,n-1} - \bar u_{s,n}|.$$
By iteration, we find that for any $n > 0$,
$$\max_{s \in S} |\bar u_{s,n+1} - \bar u_{s,n}| \le \omega^n \Bigl(\max_{s \in S} |\bar u_{s,1} - \bar u_{s,0}|\Bigr) + \sum_{k=1}^{n} \omega^{n-k}\, \frac{3\mu}{2^k}. \qquad (3.11)$$
Note that for fixed $n$, $\sum_{k=1}^{n} \omega^{n-k}/2^k$ is increasing with respect to $\omega \in (0, 1)$. It then follows that
$$\sum_{k=1}^{n} \frac{\omega^{n-k}}{2^k} < \begin{cases} \dfrac{\omega^n}{2\omega - 1}, & \text{when } \omega > 0.75, \\ 2\,(0.75)^n, & \text{when } \omega \le 0.75. \end{cases}$$
Thus, from (3.11), as $n$ increases to $\infty$, $\max_{s \in S} |\bar u_{s,n+1} - \bar u_{s,n}|$ decreases to 0. From (3.8), it then follows that
$$\max_{s \in S} |\bar u_{s,n} - v_{s,n}| \to 0. \qquad (3.12)$$
For each state $s$, as the state game $G_s(\bar u_{n-1})$ transforms into $G_s(\bar u_n)$, $w_s(T_n)$ jumps by at most $2\omega \max_{s' \in S} |\bar u_{s',n} - \bar u_{s',n-1}|$, i.e.,
$$\lim_{t \downarrow 0} w_s(T_n + t) - w_s(T_n) \le 2\omega \max_{s' \in S} |\bar u_{s',n} - \bar u_{s',n-1}|.$$
From (3.3), it then follows that
$$w_s(t) \le 2\omega \max_{s' \in S} |\bar u_{s',n} - \bar u_{s',n-1}| + \frac{\mu}{2^{n-1}}$$
for all $t \in (T_n, T_{n+1}]$.

Thus, $w_s(t)$ decreases to 0 as $t$ increases to $\infty$. Hence $\max_{s \in S} |y_s(t) - \bar u_{s,n}|$ decreases to 0, where $n$ is defined with respect to $t$ as in (3.4). Therefore, by (3.12) and (2.11), for all $s \in S$, $y_s(t) \to V_s$ as $t \to \infty$. The convergence of $x^i(t)$ to $X^i$ for $i = 1, 2$ follows the standard arguments in Shapley (1953).

Comment: By (3.7), we can define for the best-response dynamic a sequence of bounded stopping times independent of $(w_s(t))_{s \in S}$ such that Theorem 3.2 still holds.

3.2 Closed-loop Best-response Dynamics

Inspired by Shapley (1953) and the stopping-time best-response dynamics, we study a continuous-time dynamical system in which the continuation payoff vector is slowly and continuously affected by the current payoffs in all state games. There is a closed loop between the continuation payoffs and the state-game payoffs, and each player's strategy in each state game is always moving towards her current best response there.

Given a zero-sum stochastic game $\Gamma$, we adapt the continuous-time best-response dynamics $(x_s(t))_{t \ge 0}$ in each evolving state game in the following way. Pick an arbitrary $\bar u(0) = (\bar u_s(0))_{s \in S}$ with $\bar u_s(0) \in B$ for every $s \in S$, where $B$ is defined in (2.9). Suppose that the initial stationary strategy profile $(x_s(0))_{s \in S}$ is given. At each time $t \ge 0$, for each state $s \in S$, we consider the state game $G_s(t)$ with continuation payoff vector $\bar u(t)$ defined by the dynamical system
$$\dot{\bar u}_s(t) = \frac{y_s(t) - \bar u_s(t)}{1 + t}, \qquad (3.13)$$
$$\dot x^i_s \in br^i(x^{-i}_s) - x^i_s, \quad i = 1, 2, \qquad (3.14)$$
where $y_s(t) := z^{\bar u(t)}_s(x_s(t))$ is the payoff to player 1 in the state game $G_s(t)$. We call the dynamical system so defined a closed-loop best-response dynamic. Equation (3.14) says that the best-response dynamics defined in (2.1) are played in the state game $G_s(t)$ at every time $t \ge 0$, while the continuation payoff vector of $G_s(t)$ continuously evolves according to (3.13). Thus, there is feedback from $y_s$ to $\bar u_s$ such that $\bar u_s$ is always moving towards $y_s$, though more and more slowly. We may view this as fictitious play applied to the continuation payoff vector with respect to the state-game payoff vector.
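For intuition only (not from the paper), a forward-Euler discretization of the closed-loop system (3.13)-(3.14) looks as follows; the data layout is the same hypothetical one used in the earlier sketches.

```python
# Sketch (not from the paper): Euler discretisation of the closed-loop dynamic.
# Strategies follow best responses in the current state games while the
# continuation vector u_bar drifts towards the state-game payoffs at rate 1/(1+t).
import numpy as np

def closed_loop_br(r, P, omega, T=200.0, dt=0.01):
    S = len(r)
    u_bar = np.zeros(S)
    x1 = [np.ones(r[s].shape[0]) / r[s].shape[0] for s in range(S)]
    x2 = [np.ones(r[s].shape[1]) / r[s].shape[1] for s in range(S)]
    t = 0.0
    while t < T:
        y = np.zeros(S)
        for s in range(S):
            m, n = r[s].shape
            z = np.array([[(1 - omega) * r[s][i, j] + omega * P[s][(i, j)] @ u_bar
                           for j in range(n)] for i in range(m)])   # G_s(t), cf. (2.12)
            b1 = np.zeros(m); b1[np.argmax(z @ x2[s])] = 1.0
            b2 = np.zeros(n); b2[np.argmin(x1[s] @ z)] = 1.0
            x1[s] += dt * (b1 - x1[s])                               # (3.14)
            x2[s] += dt * (b2 - x2[s])
            y[s] = x1[s] @ z @ x2[s]                                 # y_s(t)
        u_bar += dt * (y - u_bar) / (1.0 + t)                        # (3.13)
        t += dt
    return u_bar, x1, x2
```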

By an argument similar to that for (2.1), from any initial condition $(x_s(0), \bar u_s(0))_{s \in S}$, there exists a solution trajectory $(x_s(t), \bar u_s(t))_{s \in S, t \ge 0}$, where $x_s(t)$ and $\bar u_s(t)$ are Lipschitz continuous and satisfy the system for almost all $t \ge 0$ at all states $s$; see Aubin and Cellina (1984).

For each state $s$ in $S$ and each time $t \ge 0$, denote the value of the state game $G_s(t)$ by $v_s(t)$. We still define $w_s(t)$ to be the energy in $G_s(t)$, as in (2.4).

Lemma 3.3. For each state $s$ in $S$, as $t$ increases to infinity, $y_s(t)$ converges to $v_s(t)$ in the state game $G_s(t)$, and $w_s(t)$ decreases to 0.

Proof. We prove this by the standard results on the convergence of best-response dynamics in normal-form zero-sum games. Suppose that a finite number $\epsilon > 0$ is given. The definition of $b_1$ and $b_2$ in (2.9) implies that at any state $s$, $|y_s(t) - \bar u_s(t)| \le b_2 - b_1$ for all $t \ge 0$. Therefore, it follows from (3.13) that there exists $t_\epsilon > 0$ such that
$$|\dot{\bar u}_s(t)| \le \epsilon, \quad \forall t \ge t_\epsilon,\ \forall s \in S. \qquad (3.15)$$
On the one hand, from Harris (1994) and Hofbauer and Sorin (2006), we know that if $\bar u_s(t)$ and $w_s(t)$ are differentiable and $\dot{\bar u}_s(t) = 0$ holds for all states $s$ at some time $t$, then $\dot w_s(t) = -w_s(t)$ at all $s$. On the other hand, from Lemma 2.1, it follows that given any period of time $[t_1, t_2]$, if $\max_{s \in S} |\bar u_s(t_1) - \bar u_s(t_2)| \le c$ for some $c > 0$ and $x(t_1) = x(t_2)$ in the state games $G_s(t_1)$ and $G_s(t_2)$, then $|w_s(t_1) - w_s(t_2)| \le 2\omega c$. From these two observations, it follows that the total derivative of $w_s$ satisfies
$$\dot w_s \le -w_s + 2\omega \max_{s' \in S} |\dot{\bar u}_{s'}|$$
for almost all $t \ge 0$. Taking (3.15) into account, we can find a time $T_\epsilon > t_\epsilon$ such that $w_s(t) \le 2\epsilon$ for all states $s$ and all times $t \ge T_\epsilon$. Note that $\epsilon$ in (3.15) can be arbitrarily small. Hence, $y_s(t)$ converges to $v_s(t)$ in the evolving state game $G_s(t)$, and $w_s(t)$ converges to 0.

Recall that for all states $s$, $\bar u_s(t)$ and $v_s(t)$ are Lipschitz continuous and hence differentiable almost everywhere. For convenience, in the lemmata below we mean the right derivative whenever the derivative of $\bar u_s$ or $v_s$ does not exist. We take an arbitrarily small $\epsilon > 0$. Here is some preparation for the next lemma.

For any time $t \ge 0$, we mark a state
$$\hat s(t) \in \arg\max_{s \in S} |y_s(t) - \bar u_s(t)|. \qquad (3.16)$$
For any time $t \ge 0$, we also mark a state
$$\tilde s(t) \in \arg\max_{s \in S} |v_s(t) - \bar u_s(t)|. \qquad (3.17)$$
From Lemma 3.3, there exists a time $t_1$ such that for all $t \ge t_1$ and all states $s$ in $S$,
$$|y_s(t) - v_s(t)| \le \frac{(1-\omega)\epsilon}{64}. \qquad (3.18)$$

Lemma 3.4. For any time $t \ge t_1$, if
$$|\bar u_{\hat s(t)}(t) - y_{\hat s(t)}(t)| \ge \frac{\epsilon}{4}, \qquad (3.19)$$
then for any state $s$ with the property
$$|\bar u_s(t) - v_s(t)| \ge |\bar u_{\tilde s(t)}(t) - v_{\tilde s(t)}(t)| - \frac{(1-\omega)\epsilon}{32}, \qquad (3.20)$$
we have
$$\frac{d}{dt}|\bar u_s - v_s| \le -\frac{1}{2}(1-\omega)\Bigl|\frac{d\bar u_{\hat s(\cdot)}}{dt}\Bigr| \le -\frac{(1-\omega)\epsilon}{8(1+t)}. \qquad (3.21)$$

This lemma says that, under condition (3.19), at any state $s$ with property (3.20) the distance $|\bar u_s - v_s|$ is decreasing at a speed at least linear in $1/(1+t)$.

Proof. From Lemma 2.2, the definition of $\hat s(t)$, and the differential equation (3.13), it follows that
$$\forall s \in S, \quad \Bigl|\frac{dv_s}{dt}\Bigr| \le \omega \Bigl|\frac{d\bar u_{\hat s(\cdot)}}{dt}\Bigr|. \qquad (3.22)$$
For a state $s$ with property (3.20) at time $t \ge t_1$, we may infer from the definition of $t_1$, from (3.18), and from the definition of $\tilde s(t)$ that
$$|\bar u_s(t) - y_s(t)| \ge |\bar u_{\hat s(t)}(t) - y_{\hat s(t)}(t)| - \frac{(1-\omega)\epsilon}{16}.$$
Thus, from (3.19), it follows that
$$\Bigl|\frac{d\bar u_s}{dt}\Bigr| \ge \frac{|\bar u_{\hat s(t)}(t) - y_{\hat s(t)}(t)| - (1-\omega)\epsilon/16}{1+t} \ge \Bigl(\omega + \frac{1-\omega}{2}\Bigr)\Bigl|\frac{d\bar u_{\hat s(\cdot)}}{dt}\Bigr|, \qquad (3.23)$$
since $(1-\omega)\epsilon/16 \le \tfrac{1-\omega}{4}\,|\bar u_{\hat s(t)}(t) - y_{\hat s(t)}(t)|$ by (3.19). Note that $\bar u_s$ is moving towards $v_s$, regardless of the movement of $v_s$. By (3.22), we have
$$\frac{d}{dt}|\bar u_s - v_s| \le -\Bigl(\omega + \frac{1-\omega}{2}\Bigr)\Bigl|\frac{d\bar u_{\hat s(\cdot)}}{dt}\Bigr| + \omega \Bigl|\frac{d\bar u_{\hat s(\cdot)}}{dt}\Bigr| = -\frac{1-\omega}{2}\Bigl|\frac{d\bar u_{\hat s(\cdot)}}{dt}\Bigr|.$$
We complete the proof by (3.19).

Lemma 3.5. There exists a time $t(\epsilon)$ such that for all $t \ge t(\epsilon)$, $|\bar u_{\hat s(t)}(t) - y_{\hat s(t)}(t)| \le \epsilon$.

Proof. For any time $t \ge t_1$,
$$|\bar u_{\hat s(t)}(t) - v_{\hat s(t)}(t)| \ge |\bar u_{\tilde s(t)}(t) - v_{\tilde s(t)}(t)| - \frac{(1-\omega)\epsilon}{32},$$
so the state $\hat s(t)$ has property (3.20). From Lemma 3.4, it follows that for all $t \ge t_1$ with property (3.19),
$$\frac{d}{dt}\bigl|\bar u_{\hat s(\cdot)} - v_{\hat s(\cdot)}\bigr| \le -\frac{(1-\omega)\epsilon}{8(1+t)}. \qquad (3.24)$$
Hence, there exists a time period $(t_1, t_2)$ such that (3.24) holds at any $t$ between $t_1$ and $t_2$, and $|\bar u_{\hat s(t_2)}(t_2) - y_{\hat s(t_2)}(t_2)| \le \epsilon/4$.

Recall that $X^i$ is the set of stationary minimax strategies of player $i$ in $\Gamma$.

Theorem 3.6. For each state $s$ and for both players, as time $t$ increases to infinity, both $y_s(t)$ and $\bar u_s(t)$ converge to $v_s(t)$; $x^i_s(t)$ converges to the set of minimax strategies of player $i = 1, 2$ in the state game $G_s(t)$, and hence $x^i(t)$ converges to $X^i$ in the stochastic game $\Gamma$.

Proof. Note that $y_s(t) \to v_s(t)$ by Lemma 3.3. Taking the sequence $(\epsilon/2^n)_{n \ge 0}$ in Lemma 3.5, we find that $\bar u_s(t)$ also converges to $v_s(t)$. We complete the proof by the results in Shapley (1953), in particular the equations (2.11).

We now show that, as the discount factor increases to 1, $\bar u_s(t)$ along any solution trajectory of the following system converges to the asymptotic value of the zero-sum stochastic game:
$$\dot{\bar u}_s(t) = \frac{y_s(t) - \bar u_s(t)}{t + 2}, \qquad (3.25)$$
$$\dot x^i_s \in br^i(x^{-i}_s) - x^i_s, \quad i = 1, 2, \qquad (3.26)$$
$$\dot\omega(t) = \frac{1 - \omega(t)}{(2 + t)\ln(t + 2)}. \qquad (3.27)$$
We call such a dynamic an $\omega$-converging best-response dynamic. Again, one can show the existence of a solution trajectory of this dynamical system. Note that it is straightforward to see from (3.27) that
$$\omega(t) = 1 - \frac{e^{-c}}{\ln(t + 2)}, \qquad (3.28)$$
where $e^{-c}/\ln 2 = 1 - \omega(0)$.
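The closed form (3.28) can be checked by direct differentiation (a short verification added here, not in the original text):
$$\frac{d}{dt}\Bigl(1 - \frac{e^{-c}}{\ln(t+2)}\Bigr) = \frac{e^{-c}}{(t+2)\ln^2(t+2)} = \frac{1}{(2+t)\ln(t+2)}\cdot\frac{e^{-c}}{\ln(t+2)} = \frac{1-\omega(t)}{(2+t)\ln(t+2)},$$
which is exactly (3.27); evaluating (3.28) at $t = 0$ gives $e^{-c}/\ln 2 = 1 - \omega(0)$.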

17 Lemma 3.7. In any ω-converging best-response dynamic, for all ɛ > 0, there exists t(ɛ) such that for all t t(ɛ), u s(t) y s(t) < ɛ. Proof. We firstly observe that Lemma 3.3 still holds for any ω-converging bestresponse dynamic and that both u s (t) and v s (t) are still Lipschitz continuous. We can then find a time t 1 with the property t t 1, s S, y s (t) v s (t) ɛ 64. (3.29) We define δ := max{ b 1, b 2 }, where b 1 and b 2 are defined in (2.9). We then take a time t 2 t 1 such that t t 2, 4δ ɛ ln(t + 2) ɛ 8. (3.30) We still use the notation of s(t) and s(t) introduced in (3.16) and (3.17), respectively. Suppose that at a time t t 2 Then, by (3.25) y s(t) (t) u s(t) (t) ɛ/4. (3.31) du s( ) dt ɛ 4(2 + t), (3.32) at this t. We now consider both v s( ) and u s( ) as functions of ω and t. (When in closed-loop best-response dynamics, they are functions of t only.) From Lemma 2.2 and (3.30), it follows that at this t v s( ) ω du s( ) dt dω dt δ(1 ω(t)) (t+2) ln(t+2) ɛ 4(2+t) On the other hand, (3.21) implies that v s( ) u s( ) t = 4δ(1 ω(t)) ɛ ln(t + 2) 1 ω(t) 2 du s( ) dt 1 ω(t). (3.33) 8. (3.34) Note that u s( ) is moving to v s( ) regardless of the movement of v s( ). Thus, from (3.33), (3.34), and (3.32), it follows that d v s( ) u s( ) dt 3(1 ω(t)) 8 We may further deduce from (3.28) that where c is defined in (3.28). d v s( ) u s( ) dt du s( ) dt ω(t)) 3ɛ(1. (3.35) 32(t + 2) 3ɛe c 32(t + 2) ln(t + 2), (3.36) Thus, there exists time t 3 t 2 such that (3.36) holds for all t between t 2 and t 3, and y s(t3 )(t 3 ) u s(t3 )(t 3 ) ɛ/4. 17

Theorem 3.8. For each state $s$, as time $t$ increases to infinity in the $\omega$-converging best-response dynamic, $y_s(t)$, $\bar u_s(t)$, and $v_s(t)$ all converge to the asymptotic value at state $s$ of the stochastic game as the discount factor $\omega$ increases to 1.

Proof. This follows from Lemma 3.7 and Theorem 3.6.

3.3 Open-loop Best-response Dynamics

In contrast to the closed-loop best-response dynamics, for open-loop ones in zero-sum stochastic games, the continuation payoff vector in each state game is equal to the expected discounted payoff generated by the current stationary strategy profile in the stochastic game starting from that state.

3.3.1 The Dynamics

Given a stationary strategy profile $\pi$, recall the expected discounted payoff $u_s(\pi)$ at each state $s$ defined in (2.8). Denote the vector $(u_s(\pi))_{s \in S}$ by $u(\pi)$. For each state $s$, we denote the payoff function of player 1 in the state game $G_s(u(\pi))$ by $Q_s$, i.e., for an action pair $a \in A_s$ and the current strategy profile $\pi$,
$$Q_s(\pi, a) := (1-\omega)\, r_s(a) + \omega \sum_{s' \in S} P_{s,s'}(a)\, u_{s'}(\pi). \qquad (3.37)$$
We can linearly extend this payoff function to $Q_s(\pi, \rho_s)$ for a strategy profile $\pi$ and $\rho_s \in \Delta(A_s)$ at state $s$. Note that $Q_s(\pi, \pi_s) = u_s(\pi)$.

Given the current strategy profile $\pi$, the best-response sets of player 1 and player 2 for $Q$ at state $s$ are
$$BR^1_s(\pi) := \operatorname*{argmax}_{\rho^1_s \in \Delta^1_s} Q_s(\pi, \rho^1_s, \pi^2_s) \qquad (3.38)$$
and
$$BR^2_s(\pi) := \operatorname*{argmin}_{\rho^2_s \in \Delta^2_s} Q_s(\pi, \pi^1_s, \rho^2_s), \qquad (3.39)$$
respectively. We denote $\{BR^1_s(\pi), BR^2_s(\pi)\}$ by $BR_s(\pi)$.

The open-loop best-response dynamic in the stochastic game is defined by the differential inclusions
$$\dot\pi^i_s \in BR^i_s(\pi) - \pi^i_s, \quad \forall i,\ \forall s. \qquad (3.40)$$
A solution of the open-loop best-response inclusion, also called an open-loop best-response trajectory, is an absolutely continuous function $\pi(\cdot) : \mathbb{R}_+ \to \Delta$ that satisfies (3.40) for almost all $t$ in $\mathbb{R}_+$.
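As an illustration (not from the paper; numpy assumed, same hypothetical data layout as before), one Euler step of (3.40) can be sketched as follows: solve the linear system behind (2.8) for the current profile, form $Q_s$ as in (3.37), and move each agent towards a best response.

```python
# Sketch (not from the paper): one Euler step of the open-loop inclusion (3.40).
import numpy as np

def open_loop_step(r, P, omega, x1, x2, dt=0.01):
    S = len(r)
    # Stage payoffs and transition matrix induced by the current profile.
    r_pi = np.array([x1[s] @ r[s] @ x2[s] for s in range(S)])
    P_pi = np.array([[sum(x1[s][i] * x2[s][j] * P[s][(i, j)][sp]
                          for i in range(r[s].shape[0])
                          for j in range(r[s].shape[1]))
                      for sp in range(S)] for s in range(S)])
    u = np.linalg.solve(np.eye(S) - omega * P_pi, (1 - omega) * r_pi)   # (2.8)
    for s in range(S):
        m, n = r[s].shape
        Q = np.array([[(1 - omega) * r[s][i, j] + omega * P[s][(i, j)] @ u
                       for j in range(n)] for i in range(m)])           # (3.37)
        b1 = np.zeros(m); b1[np.argmax(Q @ x2[s])] = 1.0                # BR^1_s, (3.38)
        b2 = np.zeros(n); b2[np.argmin(x1[s] @ Q)] = 1.0                # BR^2_s, (3.39)
        x1[s] += dt * (b1 - x1[s])                                      # (3.40)
        x2[s] += dt * (b2 - x2[s])
    return x1, x2, u
```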

An open-loop best-response trajectory starts from the initial strategy profile $\pi(0)$. At any time $t$ along the trajectory, given the current strategy profile $\pi(t)$, each player $i$ calculates the expected discounted payoff $u^i_s(t)$ and the set $BR^i_s(t)$ at every state $s$, based on (2.8) and (3.38) or (3.39). Each player then chooses an element of $BR^i_s(t)$ to generate $\dot\pi^i_s(\cdot)$ at $t$, which specifies the adjustment direction and rate of the mixed action $\pi^i_s(t)$.

3.3.2 The Result

As we can see, the adjustment of a continuation payoff in the open-loop best-response dynamic depends on the movement of the current strategy profile, while in the closed-loop dynamic it depends on the distance between the continuation payoff and the current state-game payoff, as well as on the current time. Perkins (2013) shows the following theorem.

Theorem 3.9. Given any two-player zero-sum stochastic game with
$$\omega \le \frac{1}{1 + \max_{s \in S} \sum_{s' \in S} \max_{a \in A_s} P_{s,s'}(a)}, \qquad (3.41)$$
from any initial strategy profile $\pi_0$, any open-loop best-response trajectory converges to the set of stationary minimax strategy profiles. That is,
$$\lim_{t \to \infty} \pi(t) \in X \quad\text{and}\quad \lim_{t \to \infty} u_s(\pi(t)) = V_s \quad \forall s \in S.$$

The convergence result holds when the discount factor $\omega$ is not too big. In particular, for a zero-sum stochastic game with $|S|$ states, a sufficient condition for the convergence of an open-loop best-response trajectory to $X$ is $\omega < 1/(1 + |S|)$.

Comment: The proof in Perkins (2013) elaborates on the Lyapunov-function approach used in the proof for best-response dynamics in normal-form zero-sum games. To be specific, if the continuation payoffs were fixed in each state game, then the Lyapunov-function technique could be applied in all state games, as in normal-form zero-sum games. We also know that in the dynamic, if the continuation payoff $u_s$ changes by $\delta$, then the energy $w_s$ in the Lyapunov function changes by at most $\omega\delta$. However, the problem is that the contribution to $w_s$ due to the magnitude of $\dot u_s$ may overpower the declining tendency of $w_s$ in the one-shot state game $G_s$. We have so far only found an upper bound on $\dot u_s$ in terms of $Q_s$, the transition probabilities, and $\omega$; see the corresponding lemma in Perkins (2013). In short, at each $s$, $u_s$ in a stochastic game may be convex due to the discount factor and the transition probabilities, while the Lyapunov function can only be applied to a concave (or linear) payoff function; see Hofbauer and Sorin (2006). The closer $\omega$ is to 1 in a stochastic game, and the more diverse the transition probabilities, the more convex $u_s$ can be.
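The sufficient condition (3.41) is easy to evaluate from the transition data alone; a small sketch (not from the paper, same hypothetical layout as above):

```python
# Sketch (not from the paper): the discount bound of (3.41),
#   omega <= 1 / (1 + max_s sum_{s'} max_a P_{s,s'}(a)).
import numpy as np

def open_loop_discount_bound(P, num_states):
    worst = 0.0
    for s in range(num_states):
        # For each destination s', take the largest transition probability over
        # action pairs, then sum over destinations.
        col_max = np.max(np.stack(list(P[s].values())), axis=0)
        worst = max(worst, col_max.sum())
    return 1.0 / (1.0 + worst)
```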

[Figure 1: BR may not be a global best response.]

3.3.3 Discussion

We would like to point out that in a stochastic game $\Gamma$, for player 1, $BR^1_s(\pi)$ in (3.38) is the set of best responses for the payoff $Q_s(\pi, \cdot)$ and a subset of the better responses for the expected discounted payoff $u_s(\pi)$ (see Lemma 3.10), but $BR^1_s(\pi)$ is not necessarily the set of best responses for $u_s(\pi)$.

The example below (see Figure 2) is a one-player stochastic game, a so-called Markov decision process. We may view it as a trivial two-player zero-sum game if we let player 2 have no effect on any $r(s)$ by any means. From any state in this game, the player can move to an adjacent state, or stay at the current state, in either case with probability 1. The stage payoffs are independent of the player's moves: $r(s_1) = 1$, $r(s_5) = 100$, $r(s_2) = r(s_3) = r(s_4) = 0$. Suppose that the initial strategy satisfies $\pi(s_3, s_1) = 1$ and $\pi(s_4, s_2) = 1$, while at all other states the player just stays put at time 0. It follows that
$$Q_{s_3}(\pi(0), (s_3, s_1)) = \omega > Q_{s_3}(\pi(0), (s_3, s_4)) = 0,$$
and hence $(s_3, s_4)$ is not included in $BR_{s_3}(\pi_0)$. However, when $\omega$ is close to 1, the best-response strategy for the expected discounted payoff $u$ at any time $t$ is a constant one, denoted by $\hat\pi$, in which the player always moves towards state $s_5$, i.e., $\hat\pi(s_1, s_3) = \hat\pi(s_3, s_4) = \hat\pi(s_4, s_5) = \hat\pi(s_2, s_4) = \hat\pi(s_5, s_5) = 1$.

The difference between $BR_{s_3}(\pi_0)$ and $\hat\pi$ arises from the fact that $BR_{s_3}(\pi_0)$ is optimal for the payoff in the state game $G_{s_3}(u(0))$, while $\hat\pi$ is optimal for the expected discounted payoff of the player in the whole game. The best-response dynamic in (3.40) is in the agent-form setting: the player has one agent in each state, and each agent chooses an action independently. In contrast, a strategy of the player in the whole game is a sequence of correlated actions. (See Appendix B for more discussion of the agent-form best-response dynamic.)
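The numbers in this example are easy to verify; the following sketch (an illustration added here, not in the paper) computes $u(\pi(0))$ from the linear system behind (2.8) and then the two $Q_{s_3}$ values, assuming $\omega = 0.95$ for concreteness.

```python
# Sketch (not from the paper): numerical check of the one-player example.
# Under the initial policy the player moves s3 -> s1, s4 -> s2 and stays
# elsewhere; the normalised payoffs u_s(pi(0)) then give
# Q_{s3}(pi(0), s3 -> s1) = omega and Q_{s3}(pi(0), s3 -> s4) = 0.
import numpy as np

omega = 0.95
r = np.array([1.0, 0.0, 0.0, 0.0, 100.0])        # r(s1), ..., r(s5)
next_state = {0: 0, 1: 1, 2: 0, 3: 1, 4: 4}      # initial policy pi(0)

P = np.zeros((5, 5))
for s, s_next in next_state.items():
    P[s, s_next] = 1.0
u = np.linalg.solve(np.eye(5) - omega * P, (1 - omega) * r)   # (2.8) under pi(0)

Q_s3_to_s1 = (1 - omega) * r[2] + omega * u[0]
Q_s3_to_s4 = (1 - omega) * r[2] + omega * u[3]
print(Q_s3_to_s1, Q_s3_to_s4)   # approx. omega and 0: the myopic BR avoids s4
```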

However, we can still show that $BR^1_s(\pi)$ is a subset of the better-response strategies for player 1 at $s$. The proof of the following lemma is straightforward.

Lemma 3.10. Given a strategy profile $\pi$ in a zero-sum stochastic game $\Gamma$, for player 1 at any state $s$ and any mixed action $x^1_s \in \Delta(A^1_s)$,
$$Q_s(\pi, x^1_s, \pi^2_s) \ge u_s(\pi) \;\Longrightarrow\; u_s(\pi_{-s}, x^1_s, \pi^2_s) \ge Q_s(\pi, x^1_s, \pi^2_s)$$
and
$$Q_s(\pi, x^1_s, \pi^2_s) \le u_s(\pi) \;\Longrightarrow\; u_s(\pi_{-s}, x^1_s, \pi^2_s) \le Q_s(\pi, x^1_s, \pi^2_s),$$
where $\pi_{-s} = \prod_{s' \ne s} \pi_{s'}$.

Appendix A Minor results on open-loop best-response dynamics

For further research, we present additional results regarding the players' behavior in the open-loop best-response dynamics of a zero-sum stochastic game $\Gamma$ in which each player has at most two actions in each state and the minimax strategy of each player in each value state game is a mixed strategy. To ease the exposition, we consider in this toy game the case $|S| = 2$ for the stochastic game $\Gamma$. The results can be generalized to any finite $S$.

We define the maximum payoff to player 1 given the strategy profile $\pi$ as
$$U^1_s(\pi) := \max_{\rho^1_s \in \Delta^1_s} Q_s(\pi, \rho^1_s, \pi^2_s), \qquad (A.1)$$
and similarly the minimum payoff to player 1 as
$$U^2_s(\pi) := \min_{\rho^2_s \in \Delta^2_s} Q_s(\pi, \pi^1_s, \rho^2_s). \qquad (A.2)$$
For convenience, given a best-response trajectory $(\pi(t))_{t \ge 0}$, we may sometimes write $u_s(t)$ for $u_s(\pi(t))$, $BR^i_s(t)$ for $BR^i_s(\pi(t))$, $U^i_s(t)$ for $U^i_s(\pi(t))$, and, given a mixed action $a^i \in \Delta^i_s$, $Q_s(t, a^i)$ for $Q_s(\pi(t), a^i, \pi^{-i}_s(t))$.

We denote the two states in $S$ by $\alpha$ and $\beta$. When we refer to continuation payoffs $(u, v)$ in a state game $G_s$ at any state $s$, we mean that the continuation payoff of state $\alpha$ is $u$ and that of $\beta$ is $v$. We pick a strategy $\bar\pi^i_s$ in each $X^i_s$ for each state $s$ and each player $i$. Given a best-response trajectory $(\pi(t))_{t \ge 0}$, we put $\delta_s(t) := V_s - u_s(t)$ for $s \in \{\alpha, \beta\}$. The lemmata in this section concern the behavior of player 1, but the dual results hold for player 2.

22 Lemma A.1. If at time t, δ α (t) = max s S δ s (t), then Uα(t) 1 Q α (t, π α) 1 V α ωδ α (t). If this δ α (t) > 0, then for π α (t), π α 1 is a better-response strategy for player 1 in the state game G α with continuation payoffs (V α δ α (t), V β δ α (t)). Proof. Since π α 1 is a component of the minimax strategy of player 1, we may infer that Q α (t, π α) 1 =r α ( π α, 1 πα(t)) 2 + ω P αs ( π α, 1 πα(t))u 2 s (t) s S =r α ( π 1 α, π 2 α(t)) + ω s S P αs ( π 1 α, π 2 α(t))(v s δ s (t)) V α ωδ α (t). Here is a dual lemma. Lemma A.2. If at time t, δ α (t) = min s S δ s (t), then Q α (t, π 1 α) V α ωδ α (t). If this δ α (t) < 0, then for π α (t), π 1 α is not a better-response strategy for player 1 in the state game G α with continuation payoffs (V α δ α (t), V β δ α (t)). Recall that in (3.40) the strategy of a player is alway moving to a current best-response strategy in the best-response dynamic, and that A 1 s = 2 for all s in Γ. At time t, if BR 1 s(π(t)) = {a 1 s}, i.e., the best-response strategy for player 1 in the state game G s (t) is a pure strategy a 1 s, and if the strategy π 1 s X 1 s is a convex combination of a 1 s and π 1 s(t), then the strategy π 1 s(t) is also moving towards π 1 s. In fact, π 1 s(t) is moving towards any better-response strategy when BR 1 s(π(t)) is a singleton. In this case, Q s (t, π i s) > u s (t). Lemma A.3. If u α (t) < V α and δ β (t) δ α (t), then Q α (t, π 1 α) > u α (t), π 1 α(t) is moving towards π 1 α, and BR 1 α(t) is the same as the set of best-response strategies of player 1 in the state game G α with continuation payoffs (V α δ α (t), V β δ α (t)). If it also satisfies that δ α (t) < δ β (t)/ω, (A.3) then Q β (t, π β 1) > u β(t), πβ 1(t) is also moving towards π1 β, and BR1 β (t) is the same as the set of best-response strategies of player 1 in the state game G β with continuation payoffs (V α δ β (t), V β δ β (t)). 22

23 Note that this result is independent of player 2 s strategy π 2 (t). Proof. The conclusion for state α follows from Lemma A.1. At state β, we first observe that 0 < δ β (t) δ α (t) and Q β (t, π β) 1 (A.4) =r β ( π β, 1 πβ(t)) 2 + ωp βα ( π β, 1 πβ(t))(v 2 α δ α (t)) + ωp ββ ( π β, 1 πβ(t))(v 2 β δ β (t)) =V β ωp βα ( π β, 1 πβ(t))δ 2 α (t) ωp ββ ( π β, 1 πβ(t))δ 2 β (t). 1 From (A.3), it follows that for any P ββ ( π β, πβ 2 (t)), δ α (t) < ( 1 ωpββ ( π 1 β, π 2 β (t))) δ β (t) ω ( 1 P ββ ( π 1 β, π 2 β (t))), and thus, Therefore, δ β (t) > ωp βα( π 1 β, π2 β (t))δ α(t) 1 ωp ββ ( π 1 β, π2 β (t)). δ β (t) > ωp βα ( π 1 β, π 2 β(t))δ α (t) + ωp ββ ( π 1 β, π 2 β(t))δ β (t). Combined with (A.4), it follows that π β 1 is a better-response strategy for player 1 in the state game G β (t). Since there are only two pure strategies for each player in G β (t), πβ 1 (t) moving to the best-response strategy is equivalent to moving to a better-response strategy. Finally, by a similar argument to Lemma A.1 and the case of state α above, we reach the conclusion that BRβ 1 (t) is the same as the set of best-response strategies of player 1 in the state game G β with continuation payoffs (V α δ β (t), V β δ β (t)). Here is a dual lemma. Lemma A.4. If u α (t) > V α and δ β (t) δ α (t), then Q α (t, π 1 α) < u α (t), π 1 α(t) is moving away from π 1 α, and BR 1 α(t) is the same as the set of best-response strategies of player 1 in the state game G α with continuation payoffs (V α δ α (t), V β δ α (t)). If it also satisfies that δ α (t) < δ β (t)/ω, then Q β (t, π β 1) < u β(t), πβ 1(t) is also moving away from π1 β, and BR1 β (t) is the same as the set of best-response strategies of player 1 in the state game G β with continuation payoffs (V α δ β (t), V β δ β (t)). The next lemma concerns the behavior in the best-response dynamic when δ α δ β < 0. 23

Lemma A.5. If $u_\alpha(t) < V_\alpha$ and $u_\beta(t) \ge V_\beta$, then regardless of player 2's current strategy $\pi^2(t)$, it follows that $\pi^1_\alpha(t) \ne \bar\pi^1_\alpha$ and $\pi^1_\beta(t) \ne \bar\pi^1_\beta$. Moreover, player 1 is moving towards $\bar\pi^1_\alpha$ at state $\alpha$ and moving away from $\bar\pi^1_\beta$ at state $\beta$ at time $t$. $BR^1_\alpha(t)$ and $BR^1_\beta(t)$ are the same as the sets of best-response strategies of player 1 in the state game $G_\alpha$ with continuation payoffs $(V_\alpha - \delta_\alpha(t), V_\beta - \delta_\alpha(t))$ and in the state game $G_\beta$ with continuation payoffs $(V_\alpha - \delta_\beta(t), V_\beta - \delta_\beta(t))$, respectively.

Proof. This is a corollary of Lemma A.3 and Lemma A.4.

Appendix B Agent-form Best-response Dynamics

Given a zero-sum stochastic game $\Gamma$ with discounted payoff, for any stationary strategy $\pi^i$ of player $i$ and any state $s$, we denote player $i$'s actions at all states except $s$ by $\pi^i_{-s}$. Given a strategy profile $\pi$ and a state $s$, the agent-form best-response sets of player 1 and player 2 for the expected discounted payoff $u_s$ are defined as
$$ABR^1_s(\pi) := \operatorname*{argmax}_{\rho^1_s \in \Delta^1_s} u_s(\rho^1_s, \pi^1_{-s}, \pi^2) \qquad (B.1)$$
and
$$ABR^2_s(\pi) := \operatorname*{argmin}_{\rho^2_s \in \Delta^2_s} u_s(\pi^1, \rho^2_s, \pi^2_{-s}), \qquad (B.2)$$
respectively. We can then define player $i$'s agent-form best-response differential inclusion
$$\forall s \in S, \quad \dot\pi^i_s \in ABR^i_s(\pi) - \pi^i_s, \quad i = 1, 2. \qquad (B.3)$$
Again, as in normal-form zero-sum games, the set $ABR^i_s(\pi)$ is upper semi-continuous. Hence, from any initial strategy profile $\pi(0)$, a solution trajectory exists, and $\pi(t)$ is Lipschitz continuous and satisfies (B.3) for almost all $t \ge 0$.

We conjecture that not every agent-form best-response trajectory in every $\Gamma$ converges to the set of minimax strategy profiles. As for the general convergence results in Barron et al. (2009), in a stochastic game we need to consider the expected discounted payoff function $u : X \to \mathbb{R}$ rather than $u_s$ at only one state $s$. It would be interesting to characterize which $\Gamma$ have $u$ quasiconcave in $\pi^1$ for each fixed $\pi^2$ (or satisfying the weaker condition in that paper).

By the one-player game example in Section 3.3.3, we can show that $ABR^1_s(\pi)$ may still fail to be the set of best-response strategies of player 1's agent for $u_s(\pi)$ at state $s$: when players best respond to each other in the state game $G_s(u(\pi_t))$ at time $t$, they do not take into account that the continuation payoff vector $u(\pi_t)$ is itself adapting under the current strategies.

References

Aubin, J.-P., and A. Cellina (1984): Differential Inclusions. Springer, Berlin.

Balkenborg, D., C. Kuzmics, and J. Hofbauer (2013): "Refined Best-Response Correspondence and Dynamics," Theoretical Economics, 8(1).

Barron, E.N., R. Goebel, and R.R. Jensen (2010): "Best Response Dynamics for Continuous Games," Proceedings of the American Mathematical Society, 138(3).

Berger, U. (2005): "Fictitious play in 2 × n games," Journal of Economic Theory, 120.

Bewley, T., and E. Kohlberg (1976): "The asymptotic theory of stochastic games," Mathematics of Operations Research, 1.

Borkar, V. (2002): "Reinforcement learning in Markovian evolutionary games," Advances in Complex Systems, 5.

Dutta, P.K. (1995): "A Folk Theorem for Stochastic Games," Journal of Economic Theory, 66.

Harris, C. (1998): "On the Rate of Convergence of Continuous-Time Fictitious Play," Games and Economic Behavior, 22.

Hofbauer, J. (1995): "Stability for the Best Response Dynamics," mimeo, University of Vienna.

Hofbauer, J., and K. Sigmund (1998): Evolutionary Games and Population Dynamics. Cambridge University Press.

Hofbauer, J., and S. Sorin (2006): "Best Response Dynamics for Continuous Zero-sum Games," Discrete and Continuous Dynamical Systems - Series B, 6(1).

Gilboa, I., and A. Matsui (1991): "Social Stability and Equilibrium," Econometrica, 59.

Matsui, A. (1989): "Social Stability and Equilibrium," CMS-DMS No. 819, Northwestern University.

Mertens, J.-F., and A. Neyman (1981): "Stochastic games," International Journal of Game Theory, 10.

Perkins, S. (2013): Advanced Stochastic Approximation Frameworks and their Applications. PhD thesis, University of Bristol.

Sandholm, W.H. (2010): Population Games and Evolutionary Dynamics. MIT Press.

Shapley, L. (1953): "Stochastic Games," Proceedings of the National Academy of Sciences of the United States of America, 39.

Vrieze, O., and S. Tijs (1982): "Fictitious play applied to sequences of games and discounted stochastic games," International Journal of Game Theory, 11.


More information

Lecture 7: Bayesian approach to MAB - Gittins index

Lecture 7: Bayesian approach to MAB - Gittins index Advanced Topics in Machine Learning and Algorithmic Game Theory Lecture 7: Bayesian approach to MAB - Gittins index Lecturer: Yishay Mansour Scribe: Mariano Schain 7.1 Introduction In the Bayesian approach

More information

INTERIM CORRELATED RATIONALIZABILITY IN INFINITE GAMES

INTERIM CORRELATED RATIONALIZABILITY IN INFINITE GAMES INTERIM CORRELATED RATIONALIZABILITY IN INFINITE GAMES JONATHAN WEINSTEIN AND MUHAMET YILDIZ A. In a Bayesian game, assume that the type space is a complete, separable metric space, the action space is

More information

Game Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India August 2012

Game Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India August 2012 Game Theory Lecture Notes By Y. Narahari Department of Computer Science and Automation Indian Institute of Science Bangalore, India August 2012 Chapter 6: Mixed Strategies and Mixed Strategy Nash Equilibrium

More information

Copyright (C) 2001 David K. Levine This document is an open textbook; you can redistribute it and/or modify it under the terms of version 1 of the

Copyright (C) 2001 David K. Levine This document is an open textbook; you can redistribute it and/or modify it under the terms of version 1 of the Copyright (C) 2001 David K. Levine This document is an open textbook; you can redistribute it and/or modify it under the terms of version 1 of the open text license amendment to version 2 of the GNU General

More information

A class of coherent risk measures based on one-sided moments

A class of coherent risk measures based on one-sided moments A class of coherent risk measures based on one-sided moments T. Fischer Darmstadt University of Technology November 11, 2003 Abstract This brief paper explains how to obtain upper boundaries of shortfall

More information

Long run equilibria in an asymmetric oligopoly

Long run equilibria in an asymmetric oligopoly Economic Theory 14, 705 715 (1999) Long run equilibria in an asymmetric oligopoly Yasuhito Tanaka Faculty of Law, Chuo University, 742-1, Higashinakano, Hachioji, Tokyo, 192-03, JAPAN (e-mail: yasuhito@tamacc.chuo-u.ac.jp)

More information

arxiv: v1 [math.oc] 23 Dec 2010

arxiv: v1 [math.oc] 23 Dec 2010 ASYMPTOTIC PROPERTIES OF OPTIMAL TRAJECTORIES IN DYNAMIC PROGRAMMING SYLVAIN SORIN, XAVIER VENEL, GUILLAUME VIGERAL Abstract. We show in a dynamic programming framework that uniform convergence of the

More information

Microeconomics II. CIDE, MsC Economics. List of Problems

Microeconomics II. CIDE, MsC Economics. List of Problems Microeconomics II CIDE, MsC Economics List of Problems 1. There are three people, Amy (A), Bart (B) and Chris (C): A and B have hats. These three people are arranged in a room so that B can see everything

More information

Bargaining and Competition Revisited Takashi Kunimoto and Roberto Serrano

Bargaining and Competition Revisited Takashi Kunimoto and Roberto Serrano Bargaining and Competition Revisited Takashi Kunimoto and Roberto Serrano Department of Economics Brown University Providence, RI 02912, U.S.A. Working Paper No. 2002-14 May 2002 www.econ.brown.edu/faculty/serrano/pdfs/wp2002-14.pdf

More information

10.1 Elimination of strictly dominated strategies

10.1 Elimination of strictly dominated strategies Chapter 10 Elimination by Mixed Strategies The notions of dominance apply in particular to mixed extensions of finite strategic games. But we can also consider dominance of a pure strategy by a mixed strategy.

More information

A reinforcement learning process in extensive form games

A reinforcement learning process in extensive form games A reinforcement learning process in extensive form games Jean-François Laslier CNRS and Laboratoire d Econométrie de l Ecole Polytechnique, Paris. Bernard Walliser CERAS, Ecole Nationale des Ponts et Chaussées,

More information

Repeated Games. September 3, Definitions: Discounting, Individual Rationality. Finitely Repeated Games. Infinitely Repeated Games

Repeated Games. September 3, Definitions: Discounting, Individual Rationality. Finitely Repeated Games. Infinitely Repeated Games Repeated Games Frédéric KOESSLER September 3, 2007 1/ Definitions: Discounting, Individual Rationality Finitely Repeated Games Infinitely Repeated Games Automaton Representation of Strategies The One-Shot

More information

Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors

Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors 1 Yuanzhang Xiao, Yu Zhang, and Mihaela van der Schaar Abstract Crowdsourcing systems (e.g. Yahoo! Answers and Amazon Mechanical

More information

Lecture Note Set 3 3 N-PERSON GAMES. IE675 Game Theory. Wayne F. Bialas 1 Monday, March 10, N-Person Games in Strategic Form

Lecture Note Set 3 3 N-PERSON GAMES. IE675 Game Theory. Wayne F. Bialas 1 Monday, March 10, N-Person Games in Strategic Form IE675 Game Theory Lecture Note Set 3 Wayne F. Bialas 1 Monday, March 10, 003 3 N-PERSON GAMES 3.1 N-Person Games in Strategic Form 3.1.1 Basic ideas We can extend many of the results of the previous chapter

More information

Martingales. by D. Cox December 2, 2009

Martingales. by D. Cox December 2, 2009 Martingales by D. Cox December 2, 2009 1 Stochastic Processes. Definition 1.1 Let T be an arbitrary index set. A stochastic process indexed by T is a family of random variables (X t : t T) defined on a

More information

Homework 2: Dynamic Moral Hazard

Homework 2: Dynamic Moral Hazard Homework 2: Dynamic Moral Hazard Question 0 (Normal learning model) Suppose that z t = θ + ɛ t, where θ N(m 0, 1/h 0 ) and ɛ t N(0, 1/h ɛ ) are IID. Show that θ z 1 N ( hɛ z 1 h 0 + h ɛ + h 0m 0 h 0 +

More information

STOCHASTIC REPUTATION DYNAMICS UNDER DUOPOLY COMPETITION

STOCHASTIC REPUTATION DYNAMICS UNDER DUOPOLY COMPETITION STOCHASTIC REPUTATION DYNAMICS UNDER DUOPOLY COMPETITION BINGCHAO HUANGFU Abstract This paper studies a dynamic duopoly model of reputation-building in which reputations are treated as capital stocks that

More information

Chapter 3. Dynamic discrete games and auctions: an introduction

Chapter 3. Dynamic discrete games and auctions: an introduction Chapter 3. Dynamic discrete games and auctions: an introduction Joan Llull Structural Micro. IDEA PhD Program I. Dynamic Discrete Games with Imperfect Information A. Motivating example: firm entry and

More information

Finite Memory and Imperfect Monitoring

Finite Memory and Imperfect Monitoring Federal Reserve Bank of Minneapolis Research Department Staff Report 287 March 2001 Finite Memory and Imperfect Monitoring Harold L. Cole University of California, Los Angeles and Federal Reserve Bank

More information

Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh

Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh Online Appendix for Debt Contracts with Partial Commitment by Natalia Kovrijnykh Omitted Proofs LEMMA 5: Function ˆV is concave with slope between 1 and 0. PROOF: The fact that ˆV (w) is decreasing in

More information

CONSISTENCY AMONG TRADING DESKS

CONSISTENCY AMONG TRADING DESKS CONSISTENCY AMONG TRADING DESKS David Heath 1 and Hyejin Ku 2 1 Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, PA, USA, email:heath@andrew.cmu.edu 2 Department of Mathematics

More information

The folk theorem revisited

The folk theorem revisited Economic Theory 27, 321 332 (2006) DOI: 10.1007/s00199-004-0580-7 The folk theorem revisited James Bergin Department of Economics, Queen s University, Ontario K7L 3N6, CANADA (e-mail: berginj@qed.econ.queensu.ca)

More information

6.207/14.15: Networks Lecture 10: Introduction to Game Theory 2

6.207/14.15: Networks Lecture 10: Introduction to Game Theory 2 6.207/14.15: Networks Lecture 10: Introduction to Game Theory 2 Daron Acemoglu and Asu Ozdaglar MIT October 14, 2009 1 Introduction Outline Review Examples of Pure Strategy Nash Equilibria Mixed Strategies

More information

Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration

Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration Piyush Rai CS5350/6350: Machine Learning November 29, 2011 Reinforcement Learning Supervised Learning: Uses explicit supervision

More information

Mixed Strategies. Samuel Alizon and Daniel Cownden February 4, 2009

Mixed Strategies. Samuel Alizon and Daniel Cownden February 4, 2009 Mixed Strategies Samuel Alizon and Daniel Cownden February 4, 009 1 What are Mixed Strategies In the previous sections we have looked at games where players face uncertainty, and concluded that they choose

More information

Subgame Perfect Cooperation in an Extensive Game

Subgame Perfect Cooperation in an Extensive Game Subgame Perfect Cooperation in an Extensive Game Parkash Chander * and Myrna Wooders May 1, 2011 Abstract We propose a new concept of core for games in extensive form and label it the γ-core of an extensive

More information

Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration

Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration Piyush Rai CS5350/6350: Machine Learning November 29, 2011 Reinforcement Learning Supervised Learning: Uses explicit supervision

More information

No-arbitrage theorem for multi-factor uncertain stock model with floating interest rate

No-arbitrage theorem for multi-factor uncertain stock model with floating interest rate Fuzzy Optim Decis Making 217 16:221 234 DOI 117/s17-16-9246-8 No-arbitrage theorem for multi-factor uncertain stock model with floating interest rate Xiaoyu Ji 1 Hua Ke 2 Published online: 17 May 216 Springer

More information

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete)

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Ying Chen Hülya Eraslan January 9, 216 Abstract We analyze a dynamic model of judicial decision

More information

Chair of Communications Theory, Prof. Dr.-Ing. E. Jorswieck. Übung 5: Supermodular Games

Chair of Communications Theory, Prof. Dr.-Ing. E. Jorswieck. Übung 5: Supermodular Games Chair of Communications Theory, Prof. Dr.-Ing. E. Jorswieck Übung 5: Supermodular Games Introduction Supermodular games are a class of non-cooperative games characterized by strategic complemetariteis

More information

Credibilistic Equilibria in Extensive Game with Fuzzy Payoffs

Credibilistic Equilibria in Extensive Game with Fuzzy Payoffs Credibilistic Equilibria in Extensive Game with Fuzzy Payoffs Yueshan Yu Department of Mathematical Sciences Tsinghua University Beijing 100084, China yuyueshan@tsinghua.org.cn Jinwu Gao School of Information

More information

On Forchheimer s Model of Dominant Firm Price Leadership

On Forchheimer s Model of Dominant Firm Price Leadership On Forchheimer s Model of Dominant Firm Price Leadership Attila Tasnádi Department of Mathematics, Budapest University of Economic Sciences and Public Administration, H-1093 Budapest, Fővám tér 8, Hungary

More information

On the Lower Arbitrage Bound of American Contingent Claims

On the Lower Arbitrage Bound of American Contingent Claims On the Lower Arbitrage Bound of American Contingent Claims Beatrice Acciaio Gregor Svindland December 2011 Abstract We prove that in a discrete-time market model the lower arbitrage bound of an American

More information

Strategies and Nash Equilibrium. A Whirlwind Tour of Game Theory

Strategies and Nash Equilibrium. A Whirlwind Tour of Game Theory Strategies and Nash Equilibrium A Whirlwind Tour of Game Theory (Mostly from Fudenberg & Tirole) Players choose actions, receive rewards based on their own actions and those of the other players. Example,

More information

On the existence of coalition-proof Bertrand equilibrium

On the existence of coalition-proof Bertrand equilibrium Econ Theory Bull (2013) 1:21 31 DOI 10.1007/s40505-013-0011-7 RESEARCH ARTICLE On the existence of coalition-proof Bertrand equilibrium R. R. Routledge Received: 13 March 2013 / Accepted: 21 March 2013

More information

Sequential Investment, Hold-up, and Strategic Delay

Sequential Investment, Hold-up, and Strategic Delay Sequential Investment, Hold-up, and Strategic Delay Juyan Zhang and Yi Zhang February 20, 2011 Abstract We investigate hold-up in the case of both simultaneous and sequential investment. We show that if

More information

Regret Minimization and Security Strategies

Regret Minimization and Security Strategies Chapter 5 Regret Minimization and Security Strategies Until now we implicitly adopted a view that a Nash equilibrium is a desirable outcome of a strategic game. In this chapter we consider two alternative

More information

Finding Equilibria in Games of No Chance

Finding Equilibria in Games of No Chance Finding Equilibria in Games of No Chance Kristoffer Arnsfelt Hansen, Peter Bro Miltersen, and Troels Bjerre Sørensen Department of Computer Science, University of Aarhus, Denmark {arnsfelt,bromille,trold}@daimi.au.dk

More information

Making Complex Decisions

Making Complex Decisions Ch. 17 p.1/29 Making Complex Decisions Chapter 17 Ch. 17 p.2/29 Outline Sequential decision problems Value iteration algorithm Policy iteration algorithm Ch. 17 p.3/29 A simple environment 3 +1 p=0.8 2

More information

Bilateral trading with incomplete information and Price convergence in a Small Market: The continuous support case

Bilateral trading with incomplete information and Price convergence in a Small Market: The continuous support case Bilateral trading with incomplete information and Price convergence in a Small Market: The continuous support case Kalyan Chatterjee Kaustav Das November 18, 2017 Abstract Chatterjee and Das (Chatterjee,K.,

More information

Yao s Minimax Principle

Yao s Minimax Principle Complexity of algorithms The complexity of an algorithm is usually measured with respect to the size of the input, where size may for example refer to the length of a binary word describing the input,

More information

Economics 209A Theory and Application of Non-Cooperative Games (Fall 2013) Repeated games OR 8 and 9, and FT 5

Economics 209A Theory and Application of Non-Cooperative Games (Fall 2013) Repeated games OR 8 and 9, and FT 5 Economics 209A Theory and Application of Non-Cooperative Games (Fall 2013) Repeated games OR 8 and 9, and FT 5 The basic idea prisoner s dilemma The prisoner s dilemma game with one-shot payoffs 2 2 0

More information

Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017

Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017 Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017 The time limit for this exam is four hours. The exam has four sections. Each section includes two questions.

More information

Discounted Stochastic Games

Discounted Stochastic Games Discounted Stochastic Games Eilon Solan October 26, 1998 Abstract We give an alternative proof to a result of Mertens and Parthasarathy, stating that every n-player discounted stochastic game with general

More information

Convergence Analysis of Monte Carlo Calibration of Financial Market Models

Convergence Analysis of Monte Carlo Calibration of Financial Market Models Analysis of Monte Carlo Calibration of Financial Market Models Christoph Käbe Universität Trier Workshop on PDE Constrained Optimization of Certain and Uncertain Processes June 03, 2009 Monte Carlo Calibration

More information

Mixed Strategies. In the previous chapters we restricted players to using pure strategies and we

Mixed Strategies. In the previous chapters we restricted players to using pure strategies and we 6 Mixed Strategies In the previous chapters we restricted players to using pure strategies and we postponed discussing the option that a player may choose to randomize between several of his pure strategies.

More information

Game Theory for Wireless Engineers Chapter 3, 4

Game Theory for Wireless Engineers Chapter 3, 4 Game Theory for Wireless Engineers Chapter 3, 4 Zhongliang Liang ECE@Mcmaster Univ October 8, 2009 Outline Chapter 3 - Strategic Form Games - 3.1 Definition of A Strategic Form Game - 3.2 Dominated Strategies

More information

FDPE Microeconomics 3 Spring 2017 Pauli Murto TA: Tsz-Ning Wong (These solution hints are based on Julia Salmi s solution hints for Spring 2015.

FDPE Microeconomics 3 Spring 2017 Pauli Murto TA: Tsz-Ning Wong (These solution hints are based on Julia Salmi s solution hints for Spring 2015. FDPE Microeconomics 3 Spring 2017 Pauli Murto TA: Tsz-Ning Wong (These solution hints are based on Julia Salmi s solution hints for Spring 2015.) Hints for Problem Set 3 1. Consider the following strategic

More information

Maintaining a Reputation Against a Patient Opponent 1

Maintaining a Reputation Against a Patient Opponent 1 Maintaining a Reputation Against a Patient Opponent July 3, 006 Marco Celentani Drew Fudenberg David K. Levine Wolfgang Pesendorfer ABSTRACT: We analyze reputation in a game between a patient player and

More information

CS364A: Algorithmic Game Theory Lecture #14: Robust Price-of-Anarchy Bounds in Smooth Games

CS364A: Algorithmic Game Theory Lecture #14: Robust Price-of-Anarchy Bounds in Smooth Games CS364A: Algorithmic Game Theory Lecture #14: Robust Price-of-Anarchy Bounds in Smooth Games Tim Roughgarden November 6, 013 1 Canonical POA Proofs In Lecture 1 we proved that the price of anarchy (POA)

More information

Assets with possibly negative dividends

Assets with possibly negative dividends Assets with possibly negative dividends (Preliminary and incomplete. Comments welcome.) Ngoc-Sang PHAM Montpellier Business School March 12, 2017 Abstract The paper introduces assets whose dividends can

More information

Game Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India October 2012

Game Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India October 2012 Game Theory Lecture Notes By Y. Narahari Department of Computer Science and Automation Indian Institute of Science Bangalore, India October 2012 COOPERATIVE GAME THEORY The Core Note: This is a only a

More information

Sequential Investment, Hold-up, and Strategic Delay

Sequential Investment, Hold-up, and Strategic Delay Sequential Investment, Hold-up, and Strategic Delay Juyan Zhang and Yi Zhang December 20, 2010 Abstract We investigate hold-up with simultaneous and sequential investment. We show that if the encouragement

More information

Equilibrium selection and consistency Norde, Henk; Potters, J.A.M.; Reijnierse, Hans; Vermeulen, D.

Equilibrium selection and consistency Norde, Henk; Potters, J.A.M.; Reijnierse, Hans; Vermeulen, D. Tilburg University Equilibrium selection and consistency Norde, Henk; Potters, J.A.M.; Reijnierse, Hans; Vermeulen, D. Published in: Games and Economic Behavior Publication date: 1996 Link to publication

More information

A Decentralized Learning Equilibrium

A Decentralized Learning Equilibrium Paper to be presented at the DRUID Society Conference 2014, CBS, Copenhagen, June 16-18 A Decentralized Learning Equilibrium Andreas Blume University of Arizona Economics ablume@email.arizona.edu April

More information

MA300.2 Game Theory 2005, LSE

MA300.2 Game Theory 2005, LSE MA300.2 Game Theory 2005, LSE Answers to Problem Set 2 [1] (a) This is standard (we have even done it in class). The one-shot Cournot outputs can be computed to be A/3, while the payoff to each firm can

More information

Tangent Lévy Models. Sergey Nadtochiy (joint work with René Carmona) Oxford-Man Institute of Quantitative Finance University of Oxford.

Tangent Lévy Models. Sergey Nadtochiy (joint work with René Carmona) Oxford-Man Institute of Quantitative Finance University of Oxford. Tangent Lévy Models Sergey Nadtochiy (joint work with René Carmona) Oxford-Man Institute of Quantitative Finance University of Oxford June 24, 2010 6th World Congress of the Bachelier Finance Society Sergey

More information

Alp E. Atakan and Mehmet Ekmekci

Alp E. Atakan and Mehmet Ekmekci REPUTATION IN REPEATED MORAL HAZARD GAMES 1 Alp E. Atakan and Mehmet Ekmekci We study an infinitely repeated game where two players with equal discount factors play a simultaneous-move stage game. Player

More information

3.2 No-arbitrage theory and risk neutral probability measure

3.2 No-arbitrage theory and risk neutral probability measure Mathematical Models in Economics and Finance Topic 3 Fundamental theorem of asset pricing 3.1 Law of one price and Arrow securities 3.2 No-arbitrage theory and risk neutral probability measure 3.3 Valuation

More information

NAIVE REINFORCEMENT LEARNING WITH ENDOGENOUS ASPIRATIONS. University College London, U.K., and Texas A&M University, U.S.A. 1.

NAIVE REINFORCEMENT LEARNING WITH ENDOGENOUS ASPIRATIONS. University College London, U.K., and Texas A&M University, U.S.A. 1. INTERNATIONAL ECONOMIC REVIEW Vol. 41, No. 4, November 2000 NAIVE REINFORCEMENT LEARNING WITH ENDOGENOUS ASPIRATIONS By Tilman Börgers and Rajiv Sarin 1 University College London, U.K., and Texas A&M University,

More information

Introduction to Game Theory Lecture Note 5: Repeated Games

Introduction to Game Theory Lecture Note 5: Repeated Games Introduction to Game Theory Lecture Note 5: Repeated Games Haifeng Huang University of California, Merced Repeated games Repeated games: given a simultaneous-move game G, a repeated game of G is an extensive

More information

CHAPTER 14: REPEATED PRISONER S DILEMMA

CHAPTER 14: REPEATED PRISONER S DILEMMA CHAPTER 4: REPEATED PRISONER S DILEMMA In this chapter, we consider infinitely repeated play of the Prisoner s Dilemma game. We denote the possible actions for P i by C i for cooperating with the other

More information

Equilibrium payoffs in finite games

Equilibrium payoffs in finite games Equilibrium payoffs in finite games Ehud Lehrer, Eilon Solan, Yannick Viossat To cite this version: Ehud Lehrer, Eilon Solan, Yannick Viossat. Equilibrium payoffs in finite games. Journal of Mathematical

More information

Characterization of the Optimum

Characterization of the Optimum ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 5. Portfolio Allocation with One Riskless, One Risky Asset Characterization of the Optimum Consider a risk-averse, expected-utility-maximizing

More information

REPUTATION WITH LONG RUN PLAYERS

REPUTATION WITH LONG RUN PLAYERS REPUTATION WITH LONG RUN PLAYERS ALP E. ATAKAN AND MEHMET EKMEKCI Abstract. Previous work shows that reputation results may fail in repeated games with long-run players with equal discount factors. Attention

More information

Rohini Kumar. Statistics and Applied Probability, UCSB (Joint work with J. Feng and J.-P. Fouque)

Rohini Kumar. Statistics and Applied Probability, UCSB (Joint work with J. Feng and J.-P. Fouque) Small time asymptotics for fast mean-reverting stochastic volatility models Statistics and Applied Probability, UCSB (Joint work with J. Feng and J.-P. Fouque) March 11, 2011 Frontier Probability Days,

More information

Information Aggregation in Dynamic Markets with Strategic Traders. Michael Ostrovsky

Information Aggregation in Dynamic Markets with Strategic Traders. Michael Ostrovsky Information Aggregation in Dynamic Markets with Strategic Traders Michael Ostrovsky Setup n risk-neutral players, i = 1,..., n Finite set of states of the world Ω Random variable ( security ) X : Ω R Each

More information