Fixed Point Solution to Stochastic Priority Games

Bruno Karelović and Wiesław Zielonka
IRIF, Université Paris Diderot - Paris 7, case 7014, Paris Cedex 13, France
bruno.karelovic@gmail.com, wieslaw.zielonka@irif.fr
March 1, 2017

Abstract

We define and examine a class of two-player stochastic games that we call priority games. The priority games contain as proper subclasses the parity games studied in computer science [4] and the games with the limsup and liminf payoff mappings. We show that the value of the priority game can be expressed as an appropriate nested fixed point of the value mapping of the one-day game. This extends the result of de Alfaro and Majumdar [4], where the authors proved that the value of the stochastic parity game can be expressed as the nested fixed point of the one-day value mapping.

The difference between our paper and [4] is two-fold. The value of the parity game is obtained by applying the least and the greatest fixed points to the value mapping of the one-day game. However, in general, the greatest and the least fixed points are not sufficient to obtain the value of the priority game. To cope with this problem we introduce the notion of the nearest fixed point of a monotone bounded nonexpansive mapping. Our main result is that the value of the priority game can be obtained as the nested nearest fixed point of the value mapping of the one-day game.

The second point that makes our proof different from [4] is that our proof is inductive. We give a game interpretation for the nested fixed point formula where some variables are free (not bound by the fixed point operator). Thus instead of proving the main result in one big step as in [4] we can limit ourselves to the case when just one fixed point is added to the nested fixed point formula.

1 Introduction

Stochastic two-player zero-sum games model the long-term interactions between two players that have strictly opposite objectives. The study of stochastic games starts with the seminal paper of Shapley [10]. Since then stochastic games have been intensively studied in game theory and, more recently, in computer science.

In stochastic games players' preferences are expressed by means of a payoff mapping. The payoff mapping maps infinite plays (infinite sequences of states and actions) to real numbers. The payoff mappings used in computer science tend to be different from the payoff mappings used in game theory. The payoffs prevalent in computer science are often expressed in some kind of logic, which implies that they take only two values, 1 for the winning plays and 0 for the losing plays. On the other hand, the payoff mappings used in game theory are rather real valued: mean-payoff, discounted payoff, lim sup and lim inf payoffs are among the most popular ones.

In this paper we define and examine the class of priority games. The priority games constitute a natural extension of parity games; this latter class is the class of games popular in computer science having applications in automata theory and verification. However, the priority games are also relevant to the games traditionally studied in game theory. It turns out that the games with the limsup and liminf payoff [8] belong to the class of priority games.

To put the results of the paper in context we recall below the relevant results concerning the stochastic parity games due to de Alfaro and Majumdar [4].

1.1 Parity games

A stochastic parity game is a zero-sum infinite game played by two players, player Max and player Min, on an arena with a finite set of states $S = [n] = \{1, \dots, n\}$. For each state $i$, players Max and Min have nonempty sets of available actions, $A(i)$ and $B(i)$ respectively. At each stage, the players, knowing the current state $i$ and all the previous history, choose independently and simultaneously actions $a \in A(i)$ and $b \in B(i)$ respectively and the game moves to state $j$ with probability $p(j \mid i, a, b)$. Immediately after each stage, and before the next one, both players are informed about the action played by the adversary player. An infinite sequence of states and actions occurring during the game is called a play.
The parity games are endowed with the reward vector $r = (r_1, \dots, r_n)$, where $r_i \in \{0, 1\}$ is the reward of state $i$. The parity payoff $\varphi(h)$ of an infinite play $h$ is defined to be equal to the reward of the maximal state visited infinitely often in $h$ (see footnote 1), i.e. the payoff is equal to $r_i$ if $i$ was visited infinitely often in $h$ and all states $j$, $j > i$, were visited only a finite number of times.

The set of all plays is endowed in the usual way with the Borel σ-algebra generated by the cylinders. Strategies $\sigma$, $\tau$ of players Max and Min and an initial state $i \in S$ give rise to a probability measure $P_i^{\sigma,\tau}$ over the Borel σ-algebra. The aim of player Max (respectively Min) is to maximize (respectively minimize) the expected payoff

$$E_i^{\sigma,\tau}(\varphi) = \int \varphi(h)\, P_i^{\sigma,\tau}(dh)$$

for each initial state $i$. Since the parity payoff is Borel measurable, by the result of Martin [9], parity games have a value $v_i$ for each initial state $i$, i.e.

$$\sup_{\sigma} \inf_{\tau} E_i^{\sigma,\tau}(\varphi) = v_i = \inf_{\tau} \sup_{\sigma} E_i^{\sigma,\tau}(\varphi), \quad \forall i \in S. \tag{1}$$

Footnote 1. The payoff of the parity game is usually formulated in a slightly different way; however, it is easy to see that the definition given here is equivalent to the usual one.

One of the techniques used to solve parity games relies on the µ-calculus. In this approach the point of departure is the one-day game (see footnote 2) played at each state $i \in S$. The one-day game has a value for each state $i \in S$ and each reward vector $r = (r_1, \dots, r_n)$. Let

$$f = (f_1, \dots, f_n) \tag{2}$$

be the mapping that maps the reward vector $r \in \{0, 1\}^n$ to the vector of values of the one-day game, i.e. for $r = (r_1, \dots, r_n)$ and $i \in S$, $f_i(r)$ is the value of the one-day game played at state $i$ when the reward vector is $r$.

We endow $[0, 1]^n$ with the product order, $x = (x_1, \dots, x_n) \le (y_1, \dots, y_n) = y$ if $x_i \le y_i$ for all $i \in [n]$, which makes it a complete lattice. It is easy to see that $f : [0, 1]^n \to [0, 1]^n$ is monotone under $\le$, thus by Tarski's theorem [11], $f$ has the least and the greatest fixed points. Then one can define the nested fixed point

$$\mathrm{Fix}^n(f)(r) = \mu_{r_n} x_n . \mu_{r_{n-1}} x_{n-1} \dots \mu_{r_2} x_2 . \mu_{r_1} x_1 . f(x_1, x_2, \dots, x_{n-1}, x_n), \tag{3}$$

where $\mu_{r_i} x_i$ denotes either the greatest fixed point if $r_i = 1$, or the least fixed point if $r_i = 0$, and $f$ is the value function (2) of the one-day game.

The main result obtained by de Alfaro and Majumdar [4] in the µ-calculus approach to parity games is that

$$v = (v_1, \dots, v_n) = \mathrm{Fix}^n(f)(r),$$

where the left-hand side vector $v$ is composed of the values $v_i$ for the parity game starting at $i$, cf. (1). To summarize, the value vector of the parity game can be obtained by calculating the nested fixed point of the one-day value mapping (see footnote 3).

The µ-calculus approach to parity games was first developed for deterministic parity games (perfect information games with deterministic transitions), see Walukiewicz [12]. The paper of de Alfaro and Majumdar [4] extended this approach to stochastic parity games.
Footnote 2. In computer science papers the one-day game is often not mentioned explicitly, but the value function $f$ of the one-day game is used in the µ-calculus approach to parity games, where it is sometimes called the predecessor operator.

Footnote 3. The traditional presentation of this result is a bit different. Roughly speaking, the variables are regrouped in blocks, each block consisting of consecutive variables to which the same fixed point is applied. Each fixed point is then applied to a group of variables rather than to separate variables. This decreases the number of fixed points, and the resulting formula alternates the least and the greatest fixed points. This is, however, only a technical detail which has no bearing on the result. For our purposes it is more convenient to apply fixed points to variables rather than to groups of variables.

1.2 From parity games to priority games

The parity games arose from the study of decidability questions in logic. In this framework the winning criteria are expressed in some kind of logic, where there is room for only two types of plays, the winning plays that satisfy a logical formula and the losing plays that do not satisfy the formula. For this reason the rewards in the parity games take only two values, 0 and 1, with the intuition that the reward 1 is favorable and the reward 0 unfavorable for our player (and the preferences are inverse for the adversary player). However, the restriction to 0, 1 rewards does not allow one to express finer preferences of the players. This motivates the study of games that allow any real rewards.

We define the priority game as the game where each state $i \in [n] = S$ is equipped with a reward $r_i \in \mathbb{R}$. As in the parity game, the payoff $\varphi(h)$ of a play $h$ is defined to be the reward $r_i$ of the greatest state $i$ that is visited infinitely often in $h$.

At first glance, the priority game is just a mild extension of the parity game. This impression is reinforced by the fact that deterministic priority games can be reduced to deterministic parity games. (However, we do not know if such a reduction is possible for stochastic priority games.)

The interest in priority games is twofold. First, the priority games allow us to quantify players' preferences in a more subtle way than is possible in parity games. While in parity games there are only two classes of plays, the plays with the parity payoff 1 and the plays with the parity payoff 0, in priority games we can distinguish many levels of preferences.
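The priority payoff defined above is easy to evaluate for eventually periodic plays. The following sketch is only an illustration under the assumption that a play is given as a finite prefix followed by a cycle that is repeated forever; the function name `priority_payoff` and the reward values are hypothetical and chosen for the example.

```python
from typing import Mapping, Sequence


def priority_payoff(prefix: Sequence[int],
                    cycle: Sequence[int],
                    rewards: Mapping[int, float]) -> float:
    """Priority payoff of the play prefix . cycle . cycle . ...

    The states visited infinitely often are exactly the states of the
    cycle, so the payoff is the reward of the greatest state in the cycle.
    The prefix is accepted only to make explicit that it has no influence.
    """
    if not cycle:
        raise ValueError("the repeated cycle must contain at least one state")
    return rewards[max(cycle)]


# Illustrative rewards for three states (an assumption of this example).
rewards = {1: 0.0, 2: 1.0, 3: 0.75}

print(priority_payoff([1, 3, 1], [2, 1], rewards))  # greatest recurrent state is 2
print(priority_payoff([2], [3, 1, 2], rewards))     # greatest recurrent state is 3
print(priority_payoff([3, 2], [1], rewards))        # the play stays in state 1
```

Note that the payoff is not monotone in the state index: state 3 dominates state 2 in priority, but carries a smaller reward.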
As a simple motivating example consider the priority game with three states $S = \{1, 2, 3\}$ and rewards $r_1 = 0$, $r_2 = 1$, $r_3 = \frac{3}{4}$. This game gives rise to three distinct classes of infinite plays: player Max's highest preference is for the plays such that the maximal state visited infinitely often is state 2 (these plays give him the payoff 1), his second preference is for the plays that visit state 3 infinitely often (such plays give him the payoff $\frac{3}{4}$), and his lowest preference is for the plays that from some moment onward stay forever in state 1 (this yields him the payoff 0). It is impossible to capture such a hierarchy of preferences when we limit ourselves to the parity payoff.

The second reason to be interested in the games with priority payoff stems from the fact that not only do they generalize the parity games, but they also contain as proper subclasses the games with the lim sup and lim inf payoffs [7]. Let $(i_k)_{k=1}^{\infty}$ be the infinite sequence of states visited during the play, where $i_k$ is the state visited at stage $k$. Let $(r_{i_k})_{k=1}^{\infty}$ be the corresponding sequence of rewards. The limsup game (respectively liminf game) is the game with the payoff equal to $\limsup_k r_{i_k}$ (respectively $\liminf_k r_{i_k}$). To see that a limsup game is a priority game let us take a finite state limsup game and rename the states in such a way that for any two states $i, j \in [n]$, if $i < j$ then $r_i \le r_j$, i.e. the natural order of states reflects the reward order. Then the limsup payoff will be equal to the priority payoff. For a liminf game we proceed in a similar way: we rename the states in such a way that, for any two states $i, j \in [n]$, $i < j$ implies that $r_j \le r_i$. Under this condition the liminf payoff will be equal to the priority payoff.

Our approach to priority games is inspired by the µ-calculus approach to parity games. There are two major differences however.

It is impossible to solve the priority games using only the least and the greatest fixed points; we also need other fixed points that we name the nearest fixed points. To define this notion we use the well known fact that the one-day game value mapping (2) is not only monotone but also nonexpansive, which means that, for $x, y \in \mathbb{R}^n$,

$$\|f(x) - f(y)\|_\infty \le \|x - y\|_\infty,$$

where $\|x\|_\infty = \sup_i |x_i|$ is the supremum norm. In the study of parity games the fact that the one-day game value mapping $f$ is nonexpansive is irrelevant; the monotonicity of $f$ is all that we need in order to apply Tarski's fixed point theorem. When we study the priority games, other fixed points enter into consideration and the fact that $f$ is nonexpansive becomes paramount.

It turns out that the priority games with rewards in $\mathbb{R}$ can be reduced through a linear transformation to the priority games with rewards in the interval $[0, 1]$. Therefore in the sequel we assume that the reward vector $r = (r_1, \dots, r_n)$ belongs to $[0, 1]^n$. Under this condition the value mapping $f$ of the one-day game (2) is a monotone nonexpansive mapping from $[0, 1]^n$ to $[0, 1]^n$.

Since our study of priority games is based on the analysis of the fixed points of $f$, in Section 3 we prepare the background and present basic facts concerning fixed points of monotone nonexpansive mappings from $[0, 1]^n$ to $[0, 1]^n$. All the results presented in Section 3 are either well known or are rather straightforward observations. The only purpose of Section 3 is to regroup in one place all the relevant facts and to introduce the notion of the nearest fixed point $\mu_r x.g(x)$ of monotone nonexpansive mappings $g : [0, 1] \to [0, 1]$. Intuitively, $\mu_r x.g(x)$ is the fixed point of $g$ which is nearest to $r \in [0, 1]$. Note that the least and the greatest fixed points of $g$ are special cases of this notion: the greatest fixed point is the fixed point nearest to 1 and the least fixed point is the fixed point nearest to 0. We show that the notion of the nearest fixed point makes sense for monotone nonexpansive mappings from $[0, 1]$ to $[0, 1]$.
In Section 3 we define also, for each vector $r = (r_1, \dots, r_n) \in [0, 1]^n$ and a monotone nonexpansive mapping $f : [0, 1]^n \to [0, 1]^n$, the nested nearest fixed point

$$\mathrm{Fix}^n(f)(r) = \mu_{r_n} x_n . \mu_{r_{n-1}} x_{n-1} \dots \mu_{r_2} x_2 . \mu_{r_1} x_1 . f(x_1, x_2, \dots, x_{n-1}, x_n), \tag{4}$$

which generalizes the nested least/greatest fixed point (3).

Section 4 introduces the one-day games.

Section 5 constitutes the core of the paper. We prove that the value vector $v = (v_1, \dots, v_n)$, where $v_i$ is the value of state $i$ in the priority game, satisfies

$$v = (v_1, \dots, v_n) = \mathrm{Fix}^n(f)(r),$$

where the right-hand side is the nested nearest fixed point (4) of the value mapping of the one-day game.

Although the result of Section 5 can be seen as an extension of the µ-calculus characterization known for parity games [4], there is one point that distinguishes our approach from the traditional µ-calculus approach to parity games. In the case of parity games (see footnote 4), to the best of our knowledge, the µ-calculus proofs presented previously were not inductive; rather, a formula similar to (3) was presented and it was shown, in one big step, that it represents the value of the parity game. The fact that the nested fixed point formula (3) is in some sense recursive was not exploited to the full extent in the proof.

Footnote 4. This remark concerns also deterministic parity games [12].

The novelty of the proof presented in Section 5 lies in the fact that it is genuinely inductive. We provide a clear game theoretic interpretation of the partial fixed point formula

$$\mathrm{Fix}^k(f)(r) = \mu_{r_k} x_k \dots \mu_{r_1} x_1 . f(x_1, \dots, x_k, r_{k+1}, \dots, r_n), \tag{5}$$

where the fixed points are applied only to the low priority variables $x_1, \dots, x_k$, while the free variables $x_{k+1}, \dots, x_n$ take values $r_{k+1}, \dots, r_n$ respectively.

Let $G(r)$ be the priority game endowed with the reward vector $r$. Let $G_k(r)$ be the priority game obtained from $G(r)$ by transforming all states $i$, $i > k$, into absorbing states (see footnote 5), while the states $j$ with $j \le k$ have the same transitions in $G_k(r)$ as in $G(r)$. Both games have the same reward vector $r$. It turns out that the partial nested fixed point (5) is equal to the value vector $v = (v_1, \dots, v_n)$ of the priority game $G_k(r)$. We prove this fact by induction starting with the trivial priority game $G_0(r)$, where all states are absorbing. The inductive step consists in showing that, if (5) is the value of the game $G_k(r)$, then adding the new fixed point $\mu_{r_{k+1}} x_{k+1}$ we obtain the value vector of the game $G_{k+1}(r)$. In other words, adding one fixed point corresponds to the transformation of an absorbing state into a nonabsorbing one.

Note that in priority games the absorbing states are trivial: if a state $m$ is absorbing then $v_m = r_m$, i.e. the value of $m$ is equal to the reward $r_m$. Thus by transforming an absorbing state into a nonabsorbing one we convert a trivial state into a nontrivial one. The crucial point is that in the inductive proof given in the paper we apply this transformation to just one state. And it is much easier to comprehend what happens if one state changes its quality from absorbing to nonabsorbing than when all states are nonabsorbing from the outset.
The preliminary version of this paper appeared in [5].

Footnote 5. Recall that a state $i$ is absorbing if it is impossible to leave $i$, i.e. for all possible actions executed in $i$ the game remains in $i$ with probability 1.

2 Stochastic priority games

An arena for a two-player stochastic priority game is composed of a finite set of states $S = [n] = \{1, 2, \dots, n\} \subset \mathbb{N}$ (we assume without loss of generality that $S$ is a subset of positive integers) and finite sets $A$ and $B$ of actions of players Max and Min. For each state $i$, $A(i) \subseteq A$ and $B(i) \subseteq B$ are finite nonempty sets of actions that players Max and Min can play at $i$. We assume that $A$ and $B$ are disjoint and that $(A(i))_{i \in S}$, $(B(i))_{i \in S}$ are partitions of $A$ and $B$. For $i, j \in S$, $a \in A(i)$, $b \in B(i)$, $p(j \mid i, a, b)$ is the probability to move to $j$ if players Max and Min execute respectively actions $a$ and $b$ at $i$.

An infinite game is played by players Max and Min. At each stage, given the current state $i$, the players choose simultaneously and independently actions $a \in A(i)$ and $b \in B(i)$ and the game moves to a new state $j$ with probability $p(j \mid i, a, b)$. The couple $(a, b)$ is called the joint action.

A finite history is a sequence $h = (s_1, a_1, b_1, s_2, a_2, b_2, s_3, \dots, a_{t-1}, b_{t-1}, s_t)$ alternating states $s_i$ and joint actions $(a_i, b_i)$ and beginning and ending with a state. The length of $h$ is the number of joint actions in $h$; in particular a history of length 0 consists of just one state and no actions. The set of finite histories is denoted $H$.

A strategy of player Max is a mapping $\sigma : H \to \Delta(A)$, where $\Delta(A)$ denotes the set of probability distributions over $A$. We require that $\mathrm{supp}(\sigma(h)) \subseteq A(i)$, where $i$ is the last state of $h$ and $\mathrm{supp}(\sigma(h)) := \{a \in A \mid \sigma(h)(a) > 0\}$ is the support of the measure $\sigma(h)$.

A strategy $\sigma$ is memoryless if $\sigma(h)$ depends only on the last state of $h$. Thus memoryless strategies of player Max can be identified with mappings $\sigma$ from $S$ to $\Delta(A)$ such that $\mathrm{supp}(\sigma(i)) \subseteq A(i)$ for each $i \in S$.

Strategies for player Min are defined in a similar way. We use $\sigma$ and $\tau$ (with subscripts or superscripts) to denote strategies of Max and Min. $\Sigma$ and $T$ will stand for the sets of all strategies for players Max and Min respectively.

An infinite history or a play is an infinite sequence $h = (s_1, a_1, b_1, s_2, a_2, b_2, s_3, a_3, b_3, \dots)$ alternating states $s_i$ and joint actions $(a_i, b_i)$. The set of infinite histories is denoted $H^\infty$. For a finite history $h$, by $h^+$ we denote the cylinder generated by $h$ consisting of all infinite histories with prefix $h$. We assume that $H^\infty$ is endowed with the σ-algebra $\mathcal{B}(H^\infty)$ generated by the set of cylinders.

Strategies $\sigma$, $\tau$ of players Max and Min and the initial state $i$ determine a probability measure $P_i^{\sigma,\tau}$ on $(H^\infty, \mathcal{B}(H^\infty))$. We define $P_i^{\sigma,\tau}$ inductively for cylinders in the following way. Let $h_0 = s_1$ be a finite history of length 0. Then

$$P_i^{\sigma,\tau}(h_0^+) = \begin{cases} 0 & \text{if } s_1 \ne i, \\ 1 & \text{if } s_1 = i. \end{cases}$$

Let $h_{t-1} = (s_1, a_1, b_1, \dots, s_{t-1}, a_{t-1}, b_{t-1}, s_t)$ and $h_t = (h_{t-1}, a_t, b_t, s_{t+1})$. Then

$$P_i^{\sigma,\tau}(h_t^+) = P_i^{\sigma,\tau}(h_{t-1}^+) \cdot \sigma(h_{t-1})(a_t) \cdot \tau(h_{t-1})(b_t) \cdot p(s_{t+1} \mid s_t, a_t, b_t).$$

Note that the set of cylinders is a π-system (i.e. a family of sets closed under intersection), thus a probability defined on cylinders extends in a unique way to all sets of $\mathcal{B}(H^\infty)$.

To define the stochastic priority game we endow the arena with a reward vector $r = (r_1, \dots, r_n)$ associating with each state $i$ a reward $r_i \in \mathbb{R}$.
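The inductive definition of the measure on cylinders can be illustrated by a short sketch. It assumes that a finite history is encoded as a flat tuple `(s1, a1, b1, s2, ...)`, that strategies are functions from finite histories to dictionaries of action probabilities, and that the transition law is a dictionary keyed by `(state, a, b)`. These encodings, all names, and the toy arena are assumptions made for the illustration only.

```python
from typing import Callable, Dict, Tuple

History = Tuple                      # (s1, a1, b1, s2, ..., s_t)
Strategy = Callable[[History], Dict[str, float]]
Transitions = Dict[Tuple[int, str, str], Dict[int, float]]


def cylinder_probability(history: History, start: int,
                         sigma: Strategy, tau: Strategy,
                         p: Transitions) -> float:
    """Probability of the cylinder h+ when the game starts in `start`."""
    if history[0] != start:          # base case: history of length 0
        return 0.0
    prob = 1.0
    # Each step appends a triple (a_t, b_t, s_{t+1}) to the prefix h_{t-1}.
    for k in range(1, len(history), 3):
        prefix = history[:k]         # h_{t-1}, which ends with the state s_t
        s_t = prefix[-1]
        a, b, s_next = history[k], history[k + 1], history[k + 2]
        prob *= (sigma(prefix).get(a, 0.0)
                 * tau(prefix).get(b, 0.0)
                 * p[(s_t, a, b)].get(s_next, 0.0))
    return prob


# Toy arena with two states; the strategies below are memoryless.
p = {
    (1, "a1", "b1"): {2: 1.0},
    (1, "a2", "b1"): {1: 0.5, 2: 0.5},
    (2, "a3", "b2"): {1: 1.0},
}
sigma = lambda h: {"a1": 0.5, "a2": 0.5} if h[-1] == 1 else {"a3": 1.0}
tau = lambda h: {"b1": 1.0} if h[-1] == 1 else {"b2": 1.0}

h = (1, "a2", "b1", 1, "a1", "b1", 2)
print(cylinder_probability(h, 1, sigma, tau, p))
```

For the history `h` above the product is $(0.5 \cdot 1 \cdot 0.5) \cdot (0.5 \cdot 1 \cdot 1) = 0.125$.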

Given the reward vector $r$, the priority payoff is a mapping $\varphi_r : H^\infty \to \mathbb{R}$ such that for an infinite history $h = (s_1, (a_1, b_1), s_2, (a_2, b_2), s_3, (a_3, b_3), \dots)$

$$\varphi_r(h) = r_l, \quad \text{where } l = \limsup_t s_t. \tag{6}$$

Thus the priority payoff is equal to the reward of the greatest (in the usual integer order) state visited infinitely often.

The aim of player Max (player Min) is to maximize (resp. minimize) the expected payoff

$$E_i^{\sigma,\tau}[\varphi_r] = \int_{H^\infty} \varphi_r(h)\, P_i^{\sigma,\tau}(dh).$$

The priority game has value $v_i$ for a starting state $i$ if

$$\inf_{\tau \in T} \sup_{\sigma \in \Sigma} E_i^{\sigma,\tau}[\varphi] = v_i = \sup_{\sigma \in \Sigma} \inf_{\tau \in T} E_i^{\sigma,\tau}[\varphi].$$

From the determinacy of Blackwell games proved by Martin [9] it follows that the priority game has a value for each initial state $i$. (The Blackwell games do not have states but the result of Martin extends to the games with states as shown by Maitra and Sudderth [7].)

A strategy $\tau$ of player Min is ε-optimal, $\varepsilon \ge 0$, if for each state $i$,

$$\sup_{\sigma \in \Sigma} E_i^{\sigma,\tau}[\varphi] \le v_i + \varepsilon.$$

Symmetrically, a strategy $\sigma$ of player Max is ε-optimal if for each state $i$,

$$\inf_{\tau \in T} E_i^{\sigma,\tau}[\varphi] \ge v_i - \varepsilon.$$

An ε-optimal strategy with $\varepsilon = 0$ is called optimal.

If the reward vector is such that $r_i \in \{0, 1\}$ for each state $i$ then we obtain the parity payoff. A proof of determinacy of stochastic parity games using fixed points was given by de Alfaro and Majumdar [4].

2.1 Normalizing the rewards

In the sequel it will be convenient to assume that all rewards belong to the interval $[0, 1]$ rather than to $\mathbb{R}$. This can be achieved without loss of generality by a simple linear transformation. Let $a = \min_{i \in S} r_i$, $b = \max_{i \in S} r_i$ and

$$g(x) = \frac{1}{b - a}\, x - \frac{a}{b - a}.$$

Then $0 = g(a) \le g(x) \le g(b) = 1$ for $x \in \{r_1, \dots, r_n\}$. Changing the reward vector from $r = (r_1, \dots, r_n)$ to $g(r) = (g(r_1), \dots, g(r_n))$ transforms linearly the priority payoff of all plays $h$ since $\varphi_{g(r)}(h) = g(\varphi_r(h))$.

By the linearity of expectation, this implies that for all starting states $i$ and all strategies $\sigma$ and $\tau$ we have $g(E_i^{\sigma,\tau}(\varphi_r)) = E_i^{\sigma,\tau}(g(\varphi_r))$. This implies that $v_i$ is the value of state $i$ for the game with the priority payoff $\varphi_r$ if and only if $g(v_i)$ is the value of $i$ for the game with the priority payoff $\varphi_{g(r)}$. Similarly a strategy is ε-optimal for the priority payoff $\varphi_r$ if and only if it is $\frac{\varepsilon}{b - a}$-optimal for the priority payoff $\varphi_{g(r)}$.

3 On fixed points of bounded monotone nonexpansive mappings

In this technical section we introduce monotone nonexpansive mappings, which play a crucial role in the study of stochastic priority games. The solution to stochastic priority games given in Section 5 relies heavily on fixed point properties of such mappings examined in Section 3.1. In Section 3.2 we define and examine the nested nearest fixed points of monotone nonexpansive mappings. The duality of the nested nearest fixed points is studied in Section 3.3.

An element $x = (x_1, \dots, x_n)$ of $\mathbb{R}^n$ will be identified with the mapping $x$ from $[n] = \{1, \dots, n\}$ to $\mathbb{R}$ and we can occasionally write $x(i)$ to denote $x_i$.

The set $\mathbb{R}^n$ is endowed with the natural componentwise order: for $x, y \in \mathbb{R}^n$, $x \le y$ if $x_i \le y_i$ for all $i \in [n]$.

A mapping $f : \mathbb{R}^n \to \mathbb{R}^k$ is monotone if for $x, y \in \mathbb{R}^n$, $x \le y$ implies $f(x) \le f(y)$ (we do not assume that $n = k$, thus $x \le y$ and $f(x) \le f(y)$ can relate to componentwise orders in two different spaces).

We assume that the Cartesian product $\mathbb{R}^n$ is endowed with the structure of a normed real vector space with the norm $\|\cdot\|_\infty$: for $x \in \mathbb{R}^n$, $\|x\|_\infty = \max_{i \in [n]} |x_i|$. Thus, for $x, y \in \mathbb{R}^n$, $\|x - y\|_\infty$ defines a distance between $x$ and $y$.

We say that a mapping $f : \mathbb{R}^n \to \mathbb{R}^k$ is nonexpansive if, for all $x, y \in \mathbb{R}^n$,

$$\|f(x) - f(y)\|_\infty \le \|x - y\|_\infty.$$

Such a mapping $f$ can be written as a vector of mappings $f = (f_1, \dots, f_k)$, where $f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, k$. Clearly, $f$ is monotone nonexpansive iff all $f_i$ are monotone nonexpansive.

We say that a mapping $f : \mathbb{R}^n \to \mathbb{R}^k$ is additively homogeneous if for all $\lambda \in \mathbb{R}$ and $x \in \mathbb{R}^n$

$$f(x + \lambda e_n) = f(x) + \lambda e_k,$$

where $e_n$ and $e_k$ are the vectors $(1, \dots, 1)$ in $\mathbb{R}^n$ and $\mathbb{R}^k$ respectively having all components equal to 1.

Crandall and Tartar [2] proved the following result.

Lemma 1 (Crandall and Tartar [2]). For additively homogeneous mappings $f : \mathbb{R}^n \to \mathbb{R}^k$ the following conditions are equivalent:
(i) $f$ is monotone,
(ii) $f$ is nonexpansive.

We will need only the implication (i) → (ii) that we prove below for the reader's convenience. Moreover, if the result holds for mappings from $\mathbb{R}^n$ to $\mathbb{R}$ then it holds for mappings from $\mathbb{R}^n$ to $\mathbb{R}^k$. Thus we assume in the proof that $f : \mathbb{R}^n \to \mathbb{R}$.

Proof. For $x, y \in \mathbb{R}^n$, $e_n = (1, 1, \dots, 1) \in \mathbb{R}^n$ and $\lambda = \|x - y\|_\infty$ we have $y - \lambda e_n \le x \le y + \lambda e_n$. Thus for $f : \mathbb{R}^n \to \mathbb{R}$ monotone and additively homogeneous we obtain

$$f(y) - \lambda \le f(x) \le f(y) + \lambda.$$

Thus $|f(x) - f(y)| \le \lambda = \|x - y\|_\infty$.

3.1 Fixed points of monotone nonexpansive mappings

We say that a monotone mapping $f : \mathbb{R}^n \to \mathbb{R}^k$ is bounded if $f([0, 1]^n) \subseteq [0, 1]^k$. The set of bounded monotone nonexpansive mappings will be denoted by $M_{n,k}[0, 1]$. Moreover BMN will stand as the abbreviation for bounded monotone nonexpansive.

In this section we introduce the notion of the nearest fixed point of BMN mappings generalizing the least and greatest fixed points. The following lemma states basic properties of fixed points of BMN mappings.

Lemma 2. Let $f \in M_{1,1}[0, 1]$. Define by induction, $f^{(0)}(x) = x$, $f^{(1)}(x) = f(x)$, $f^{(k+1)}(x) = f(f^{(k)}(x))$, for $x \in [0, 1]$. Then

(i) for each $x \in [0, 1]$ the sequence $(f^{(k)}(x))$, $k = 0, 1, \dots$, is monotone and converges to some $x_\infty \in [0, 1]$. The limit $x_\infty$ is a fixed point of $f$, $f(x_\infty) = x_\infty$,

(ii) if $x \le y$ are fixed points of $f$, $f(x) = x$ and $f(y) = y$, then for each $z$ such that $x \le z \le y$, $f(z) = z$,

(iii) the sequence $(f^{(k)}(0))$, $k = 0, 1, 2, \dots$, converges to the least fixed point $\bot_f$ of $f$ while the sequence $(f^{(k)}(1))$, $k = 0, 1, 2, \dots$, converges to the greatest fixed point $\top_f$ of $f$. The interval $[\bot_f, \top_f]$ is the set of all fixed points of $f$. If $0 \le x \le \bot_f$ then the sequence $(f^{(k)}(x))$ converges to $\bot_f$. If $\top_f \le x \le 1$ then the sequence $(f^{(k)}(x))$ converges to $\top_f$. If $0 \le x < \bot_f$ then $x < f(x)$. If $\top_f < x \le 1$ then $f(x) < x$.

Proof. (i) Suppose that $f(x) \le x$. Then inductively, since $f$ is monotone, $f^{(k+1)}(x) \le f^{(k)}(x)$ for all $k$, i.e. the sequence $f^{(k)}(x)$ is non-increasing. Since this sequence is bounded from below by 0 it converges to some $x_\infty$. The case of $f(x) \ge x$ can be treated in a similar way.

Since $f$ is nonexpansive, $|f(x_\infty) - f^{(k+1)}(x)| \le |x_\infty - f^{(k)}(x)|$. Because the right-hand side tends to 0 we can see that $f^{(k+1)}(x)$ converges to $f(x_\infty)$. On the other hand, $f^{(k+1)}(x)$ converges to $x_\infty$. Therefore $f(x_\infty) = x_\infty$.

(ii) Let $0 \le x \le z \le y \le 1$ and $f(x) = x$, $f(y) = y$. Since $f$ is monotone, $f(x) \le f(z) \le f(y)$. Thus, since $f$ is nonexpansive, $0 \le f(y) - f(z) \le y - z$ and $0 \le f(z) - f(x) \le z - x$. Using $f(y) = y$ the first inequality gives $f(z) \ge z$, and using $f(x) = x$ the second gives $f(z) \le z$. This implies that $f(z) = z$.

(iii) is a direct consequence of (i) and (ii).

Let $f \in M_{1,1}[0, 1]$. For $a \in [0, 1]$ we define the nearest fixed point $\mu_a x.f(x)$ of $f$ to be

$$\mu_a x.f(x) := \lim_{k} f^{(k)}(a).$$

Lemma 2 shows that this is really a fixed point of $f$ which is closest to $a$, i.e.

$$|a - \mu_a x.f(x)| = \min_{z \in [0, 1]} \{ |a - z| \mid f(z) = z \}.$$

Moreover, the least and the greatest fixed points of $f \in M_{1,1}[0, 1]$ are respectively equal to $\mu_0 x.f(x)$ and $\mu_1 x.f(x)$. We can see also that

$$\mu_a x.f(x) = \begin{cases} \mu_0 x.f(x) & \text{if } a \le \mu_0 x.f(x), \\ a & \text{if } \mu_0 x.f(x) \le a \le \mu_1 x.f(x), \\ \mu_1 x.f(x) & \text{if } \mu_1 x.f(x) \le a, \end{cases} \tag{7}$$

i.e. the fixed point nearest to $a$ is equal either to the least or to the greatest fixed point or is equal to $a$ itself.

Let $f \in M_{n,1}[0, 1]$. Fixing $(r_1, \dots, r_{i-1}, r_{i+1}, \dots, r_n) \in [0, 1]^{n-1}$ we can consider the mapping

$$x \mapsto f(r_1, \dots, r_{i-1}, x, r_{i+1}, \dots, r_n)$$

from $[0, 1]$ to $[0, 1]$. This mapping belongs to $M_{1,1}[0, 1]$, thus, given $r_i \in [0, 1]$, we can calculate the nearest fixed point $\mu_{r_i} x.f(r_1, \dots, r_{i-1}, x, r_{i+1}, \dots, r_n)$. This fixed point depends on $r = (r_1, \dots, r_{i-1}, r_i, r_{i+1}, \dots, r_n)$, thus we can define the mapping

$$[0, 1]^n \ni (r_1, \dots, r_{i-1}, r_i, r_{i+1}, \dots, r_n) \mapsto \mu_{r_i} x.f(r_1, \dots, r_{i-1}, x, r_{i+1}, \dots, r_n) \in [0, 1]. \tag{8}$$

Lemma 3. If $(x_1, \dots, x_n) \mapsto f(x_1, \dots, x_n)$ is BMN then the mapping (8) is BMN.

Proof. Let $r = (r_1, \dots, r_n)$, $w = (w_1, \dots, w_n) \in [0, 1]^n$. Define two sequences $(r_i^k)$, $k = 1, 2, \dots$, and $(w_i^k)$, $k = 1, 2, \dots$, such that

$$r_i^1 = r_i \quad \text{and} \quad r_i^{k+1} = f(r_1, \dots, r_{i-1}, r_i^k, r_{i+1}, \dots, r_n)$$

and

$$w_i^1 = w_i \quad \text{and} \quad w_i^{k+1} = f(w_1, \dots, w_{i-1}, w_i^k, w_{i+1}, \dots, w_n).$$

By Lemma 2 both sequences converge to some $r_i^\infty$ and $w_i^\infty$ respectively and

$$r_i^\infty = \mu_{r_i} x.f(r_1, \dots, r_{i-1}, x, r_{i+1}, \dots, r_n), \qquad w_i^\infty = \mu_{w_i} x.f(w_1, \dots, w_{i-1}, x, w_{i+1}, \dots, w_n).$$

We shall prove by induction that for all $k$, $|r_i^k - w_i^k| \le \|r - w\|_\infty$. Clearly, $|r_i^1 - w_i^1| = |r_i - w_i| \le \max_j |r_j - w_j| = \|r - w\|_\infty$. Suppose that $|r_i^k - w_i^k| \le \|r - w\|_\infty$. Then

$$|r_i^{k+1} - w_i^{k+1}| = |f(r_1, \dots, r_{i-1}, r_i^k, r_{i+1}, \dots, r_n) - f(w_1, \dots, w_{i-1}, w_i^k, w_{i+1}, \dots, w_n)| \le \max\{\max_{j \ne i} |r_j - w_j|, |r_i^k - w_i^k|\} \le \max\{\max_j |r_j - w_j|, \|r - w\|_\infty\} = \|r - w\|_\infty.$$

Taking the limit $k \uparrow \infty$ we obtain $|r_i^\infty - w_i^\infty| \le \|r - w\|_\infty$. This proves that (8) is nonexpansive. That (8) is monotone is obvious and left to the reader.

Note that the usual point of view (at least when only the greatest and the least fixed points are applied) is that, for a mapping $f \in M_{n,1}[0, 1]$, taking the fixed point $\mu_{r_i} x_i.f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_n)$ binds the variable $x_i$, i.e. we consider this expression as a function of the variables $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n$ while $r_i$ is considered as a constant. In other words, for a given fixed $r_i$ we can consider the mapping

$$(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) \mapsto \mu_{r_i} x_i.f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_n).$$

From Lemma 3 it follows that this mapping belongs to $M_{n-1,1}[0, 1]$. Clearly, Lemma 3 adopts a larger point of view where, although in some sense the variable $x_i$ becomes bound by the fixed point $\mu_{r_i} x_i$, at the same time $r_i$ becomes a new variable. This larger point of view is interesting since it allows us to examine how the nearest fixed point changes as a function of $r_i$.

In the next section we will define the nested nearest fixed point $\mu_{r_n} x_n \dots \mu_{r_1} x_1.f(x_1, \dots, x_n)$ of a mapping $f \in M_{n,n}[0, 1]$. From the traditional point of view this expression defines some special fixed point of $f$, i.e. some special element $d \in [0, 1]^n$ such that $f(d) = d$. However $d$ depends on, or more precisely is a function of, $r = (r_1, \dots, r_n)$. And it is interesting and fruitful to examine the function

$$(r_1, \dots, r_n) \mapsto \mu_{r_n} x_n \dots \mu_{r_1} x_1.f(x_1, \dots, x_n).$$

Lemma 4. If $f \in M_{k,m}[0, 1]$ and $g \in M_{m,n}[0, 1]$ then $g \circ f \in M_{k,n}[0, 1]$, i.e. the composition of BMN mappings is BMN.

Proof. For $x, y \in [0, 1]^k$, we have

$$\|g(f(x)) - g(f(y))\|_\infty \le \|f(x) - f(y)\|_\infty \le \|x - y\|_\infty.$$

Trivially, monotonicity is also preserved by composition.
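The nearest fixed point $\mu_a x.f(x) = \lim_k f^{(k)}(a)$ of a mapping in $M_{1,1}[0, 1]$ can be approximated by plain iteration. The sketch below is a numerical illustration only: the stopping tolerance, the iteration cap and the example mapping `clamp` are assumptions of this example, and the iteration can converge slowly for general nonexpansive mappings.

```python
from typing import Callable


def nearest_fixed_point(f: Callable[[float], float], a: float,
                        tol: float = 1e-12,
                        max_iter: int = 1_000_000) -> float:
    """Approximate lim_k f^(k)(a) for a monotone nonexpansive f on [0, 1].

    By Lemma 2 the iterates form a monotone sequence converging to a fixed
    point, so we iterate until two consecutive iterates differ by at most tol.
    """
    x = a
    for _ in range(max_iter):
        y = f(x)
        if abs(y - x) <= tol:
            return y
        x = y
    return x  # best approximation found within max_iter steps


def clamp(x: float) -> float:
    """Hypothetical BMN mapping whose set of fixed points is [0.25, 0.5]."""
    return min(max(x, 0.25), 0.5)


print(nearest_fixed_point(clamp, 0.0))   # least fixed point
print(nearest_fixed_point(clamp, 1.0))   # greatest fixed point
print(nearest_fixed_point(clamp, 0.4))   # 0.4 is itself a fixed point
```

The three calls correspond to the three cases of (7): a starting point below the least fixed point, above the greatest fixed point, and inside the interval of fixed points.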

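3.2 Nested fixed points of bounded monotone nonexpansive mappings

In this section we define the nested nearest fixed point operators

$$\mathrm{Fix}^k : M_{n,n}[0, 1] \to M_{n,n}[0, 1], \quad k = 0, 1, \dots, n.$$

Each $\mathrm{Fix}^k$ can be decomposed into $n$ operators $\mathrm{Fix}_i^k$,

$$\mathrm{Fix}_i^k : M_{n,n}[0, 1] \to M_{n,1}[0, 1], \quad i \in [n],$$

so that, for $f \in M_{n,n}[0, 1]$, $\mathrm{Fix}^k(f) = (\mathrm{Fix}_1^k(f), \dots, \mathrm{Fix}_n^k(f))$.

Let $f = (f_1, \dots, f_n) \in M_{n,n}[0, 1]$, where $f_i \in M_{n,1}[0, 1]$, for $i \in [n]$. We set $\mathrm{Fix}^0(f)$ to be such that $\mathrm{Fix}^0(f)(r) = r$, for $r \in [0, 1]^n$. Thus $\mathrm{Fix}^0(f)$ is the identity mapping and does not depend on $f$. Note that $\mathrm{Fix}_i^0(f)(r) = r_i$, i.e. $\mathrm{Fix}_i^0(f)$ is the projection on the $i$-th coordinate.

In general we set

$$\mathrm{Fix}_i^k(f)(r) = r_i, \quad \text{for all } 0 \le k < i \le n.$$

It remains to define $\mathrm{Fix}_i^k(f)(r)$ for $i \le k$.

The definition is by induction on $k$. Suppose that $\mathrm{Fix}^{k-1}(f)$ is defined. For $r \in [0, 1]^n$ and $\zeta \in [0, 1]$ let us set

$$F_i^{k-1}(\zeta; r) := \mathrm{Fix}_i^{k-1}(f)(r_1, \dots, r_{k-1}, \zeta, r_{k+1}, \dots, r_n), \quad \text{for } i \in [k-1]. \tag{9}$$

Note that $F_i^{k-1}(\zeta; r)$ depends on $\zeta$ and on $(r_1, \dots, r_{k-1}, r_{k+1}, \dots, r_n)$ but does not depend on $r_k$. Thus $F_i^{k-1}$ is in fact a mapping from $[0, 1]^n$ to $[0, 1]$.

Then we define

$$\mathrm{Fix}_k^k(f)(r) := \mu_{r_k} \zeta . f_k(F_1^{k-1}(\zeta; r), \dots, F_{k-1}^{k-1}(\zeta; r), \zeta, r_{k+1}, \dots, r_n), \tag{10}$$

$$\mathrm{Fix}_i^k(f)(r) := F_i^{k-1}(\mathrm{Fix}_k^k(f)(r); r) = \mathrm{Fix}_i^{k-1}(f)(r_1, \dots, r_{k-1}, \mathrm{Fix}_k^k(f)(r), r_{k+1}, \dots, r_n), \quad \text{for } i \in [k-1],$$

$$\mathrm{Fix}_i^k(f)(r) := r_i, \quad \text{for } i \in \{k+1, \dots, n\}.$$

Since the definition of the nested fixed point mappings uses only the composition and the nearest fixed point operators, Lemmas 4 and 3 imply the following.

Corollary 5. If $f \in M_{n,n}[0, 1]$ then, for all $k \in \{0\} \cup [n]$, $\mathrm{Fix}^k(f) \in M_{n,n}[0, 1]$.

Let us note finally that $\mathrm{Fix}^k(f)$ depends only on $f_1, \dots, f_k$ but is independent of $f_{k+1}, \dots, f_n$.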
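The inductive definition of $\mathrm{Fix}^k(f)$ translates into a short recursion. The sketch below is a numerical illustration under stated assumptions: fixed points are approximated by iteration with a tolerance, vectors are Python lists with 0-based indices (so state $k$ is index `k - 1`), and the two example mappings are hypothetical one-day value mappings of arenas without any choice for the players. It relies on the observation that the argument of $f_k$ in (10) is exactly the vector $\mathrm{Fix}^{k-1}(f)(r_1, \dots, r_{k-1}, \zeta, r_{k+1}, \dots, r_n)$. The inner fixed points are recomputed at every outer step, so the cost grows quickly with $k$; this is meant for small examples only.

```python
from typing import Callable, List

Vector = List[float]


def nearest_fixed_point(g: Callable[[float], float], a: float,
                        tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Approximate lim_k g^(k)(a) by iteration (see Lemma 2)."""
    x = a
    for _ in range(max_iter):
        y = g(x)
        if abs(y - x) <= tol:
            return y
        x = y
    return x


def nested_fix(f: Callable[[Vector], Vector], r: Vector, k: int) -> Vector:
    """Approximate Fix^k(f)(r) following the inductive definition."""
    r = list(r)
    if k == 0:
        return r                       # Fix^0(f) is the identity

    def g(zeta: float) -> float:
        # Fix^{k-1}(f) applied to r with its k-th component replaced by zeta;
        # its components are F_1(zeta; r), ..., F_{k-1}(zeta; r), zeta, r_{k+1}, ...
        inner = nested_fix(f, r[:k - 1] + [zeta] + r[k:], k - 1)
        return f(inner)[k - 1]         # component f_k

    x_k = nearest_fixed_point(g, r[k - 1])              # Fix_k^k(f)(r)
    return nested_fix(f, r[:k - 1] + [x_k] + r[k:], k - 1)


# State 1 moves to state 1 or 2 with probability 1/2 each, state 2 moves to 1.
def lazy_cycle(x: Vector) -> Vector:
    return [(x[0] + x[1]) / 2, x[0]]


# State 1 is absorbing, state 2 moves to state 1.
def sink(x: Vector) -> Vector:
    return [x[0], x[0]]


print(nested_fix(lazy_cycle, [0.2, 0.7], 2))  # close to [0.7, 0.7]
print(nested_fix(sink, [0.4, 0.9], 2))        # [0.4, 0.4]
```

In the first arena state 2 is visited infinitely often with probability 1, so both coordinates approach $r_2$; in the second every play ends up in state 1 forever, so both coordinates equal $r_1$.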

3.3 Duality for the bounded monotone nonexpansive mappings

In this section we define and examine the notion of duality for the BMN mappings. For $r = (r_1, \dots, r_n) \in [0, 1]^n$ we set

$$1 - r := (1 - r_1, \dots, 1 - r_n).$$

Given a BMN mapping $f : [0, 1]^n \to [0, 1]$ the dual of $f$ is the mapping $\bar{f} : [0, 1]^n \to [0, 1]$ such that

$$\bar{f}(r_1, \dots, r_n) = 1 - f(1 - r_1, \dots, 1 - r_n).$$

The dual of $f = (f_1, \dots, f_k) \in M_{n,k}[0, 1]$ is defined as $\bar{f} = (\bar{f}_1, \dots, \bar{f}_k)$. We can write this in a more explicit way if for $f = (f_1, \dots, f_k) \in M_{n,k}[0, 1]$ we define $1 - f := (1 - f_1, \dots, 1 - f_k)$. Then using this notation, for $f \in M_{n,k}[0, 1]$, we can write succinctly

$$\bar{f}(r) = 1 - f(1 - r).$$

Lemma 6. If $f$ is BMN then $\bar{f}$ is BMN.

Proof. Let $(r_1, \dots, r_n) \le (w_1, \dots, w_n)$. Then $(1 - r_1, \dots, 1 - r_n) \ge (1 - w_1, \dots, 1 - w_n)$ and $f(1 - r_1, \dots, 1 - r_n) \ge f(1 - w_1, \dots, 1 - w_n)$. Thus $\bar{f}(r_1, \dots, r_n) = 1 - f(1 - r_1, \dots, 1 - r_n) \le 1 - f(1 - w_1, \dots, 1 - w_n) = \bar{f}(w_1, \dots, w_n)$, i.e. $\bar{f}$ is monotone.

Finally

$$\|\bar{f}(r) - \bar{f}(w)\|_\infty = \|(1 - f(1 - r)) - (1 - f(1 - w))\|_\infty \le \|(1 - r) - (1 - w)\|_\infty = \|r - w\|_\infty,$$

i.e. $\bar{f}$ is nonexpansive.

Lemma 7. If $f \in M_{n,1}[0, 1]$ then, for all $i \in [n]$ and $r = (r_1, \dots, r_n) \in [0, 1]^n$,

$$\mu_{r_i} x.f(r_1, \dots, r_{i-1}, x, r_{i+1}, \dots, r_n) = 1 - \mu_{1 - r_i} x.\bar{f}(1 - r_1, \dots, 1 - r_{i-1}, x, 1 - r_{i+1}, \dots, 1 - r_n).$$

Proof. Let $\top_f$ and $\bot_f$ be respectively the greatest and the least fixed points of the mapping

$$x \mapsto f(r_1, \dots, r_{i-1}, x, r_{i+1}, \dots, r_n).$$

Similarly let $\top_{\bar{f}}$, $\bot_{\bar{f}}$ be the greatest and the least fixed points of the mapping

$$x \mapsto \bar{f}(1 - r_1, \dots, 1 - r_{i-1}, x, 1 - r_{i+1}, \dots, 1 - r_n).$$

Since

$$\bar{f}(1 - r_1, \dots, 1 - r_{i-1}, x, 1 - r_{i+1}, \dots, 1 - r_n) = 1 - f(r_1, \dots, r_{i-1}, 1 - x, r_{i+1}, \dots, r_n)$$

we have $\bot_{\bar{f}} = 1 - \top_f$ and $\top_{\bar{f}} = 1 - \bot_f$.

There are three possibilities concerning the position of $r_i$ relative to $\bot_f$ and $\top_f$. If $\top_f \le r_i$ then $\mu_{r_i} x.f(r_1, \dots, r_{i-1}, x, r_{i+1}, \dots, r_n) = \top_f$.

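However, in this case we have also $1 - r_i \le 1 - \top_f = \bot_{\bar{f}}$, implying that $\mu_{1 - r_i} x.\bar{f}(1 - r_1, \dots, 1 - r_{i-1}, x, 1 - r_{i+1}, \dots, 1 - r_n) = \bot_{\bar{f}} = 1 - \top_f$.

In a similar way, if $r_i \le \bot_f$ then $\mu_{r_i} x.f(r_1, \dots, r_{i-1}, x, r_{i+1}, \dots, r_n) = \bot_f$ and $\mu_{1 - r_i} x.\bar{f}(1 - r_1, \dots, 1 - r_{i-1}, x, 1 - r_{i+1}, \dots, 1 - r_n) = \top_{\bar{f}} = 1 - \bot_f$.

The last case to examine is when $\bot_f \le r_i \le \top_f$. Then $\mu_{r_i} x.f(r_1, \dots, r_{i-1}, x, r_{i+1}, \dots, r_n) = r_i$ and, on the other hand, $\bot_{\bar{f}} \le 1 - r_i \le \top_{\bar{f}}$, implying $\mu_{1 - r_i} x.\bar{f}(1 - r_1, \dots, 1 - r_{i-1}, x, 1 - r_{i+1}, \dots, 1 - r_n) = 1 - r_i$.

Lemma 8. Let $g \in M_{m,k}[0, 1]$ and $f \in M_{k,n}[0, 1]$. Then $\overline{f \circ g} = \bar{f} \circ \bar{g}$, i.e. the dual of the composition of BMN mappings is equal to the composition of duals.

Proof. For $r \in [0, 1]^m$ we have

$$\overline{(f \circ g)}(r) = 1 - (f \circ g)(1 - r) = 1 - f(g(1 - r)) = 1 - f(1 - (1 - g(1 - r))) = 1 - f(1 - \bar{g}(r)) = \bar{f}(\bar{g}(r)).$$

The following lemma examines the duality for the nested nearest fixed points.

Lemma 9. Let $f = (f_1, \dots, f_n) \in M_{n,n}[0, 1]$. Then for all $k$, $0 \le k \le n$, and $r \in [0, 1]^n$

$$\mathrm{Fix}^k(f)(r) = 1 - \mathrm{Fix}^k(\bar{f})(1 - r). \tag{11}$$

Proof. Induction on $k$. The mapping $r \mapsto \mathrm{Fix}^0(f)(r) = r$ is the identity mapping independently of $f$. Thus the left-hand side of (11) is equal to $r$ and the right-hand side is $1 - (1 - r) = r$ as well.

For each $0 \le k \le n$, let us set

$$\mathrm{Fix}^k(f)(r) = H^k(r) = (H_1^k(r), \dots, H_n^k(r))$$

and

$$\mathrm{Fix}^k(\bar{f})(r) = \widetilde{H}^k(r) = (\widetilde{H}_1^k(r), \dots, \widetilde{H}_n^k(r)).$$

Using this notation (11) can be written as

$$H^k(r) = 1 - \widetilde{H}^k(1 - r). \tag{12}$$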

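Our aim is to prove the last equality for $k$ under the assumption that it holds for $k - 1$. By definition
\[ \overline{H}^k_k(1 - r) = \mu_{1-r_k} x.\overline{f}_k\bigl(\overline{H}^{k-1}_1(1 - r_1,\dots,1 - r_{k-1},x,1 - r_{k+1},\dots,1 - r_n),\dots,\overline{H}^{k-1}_{k-1}(1 - r_1,\dots,1 - r_{k-1},x,1 - r_{k+1},\dots,1 - r_n),x,1 - r_{k+1},\dots,1 - r_n\bigr). \]
Let us define a mapping $G \in M_{n,n}[0,1]$:
\[ G := (H^{k-1}_1,\dots,H^{k-1}_{k-1},\pi_k,\pi_{k+1},\dots,\pi_n), \]
where $\pi_i(x_1,\dots,x_n) = x_i$, $i = k, k+1,\dots,n$, is the projection on the $i$-th coordinate. Since $\overline{\pi}_i = \pi_i$, i.e. the dual of the projection is equal to the same projection mapping, we can see that the dual to $G$ is
\[ \overline{G} = (\overline{H}^{k-1}_1,\dots,\overline{H}^{k-1}_{k-1},\pi_k,\pi_{k+1},\dots,\pi_n). \]
Therefore, by Lemmas 8 and 7,
\begin{align*}
\overline{H}^k_k(1 - r) &= \mu_{1-r_k} x.\,\overline{f}_k \circ \overline{G}(1 - r_1,\dots,1 - r_{k-1},x,1 - r_{k+1},\dots,1 - r_n) \\
&= \mu_{1-r_k} x.\,\overline{f_k \circ G}(1 - r_1,\dots,1 - r_{k-1},x,1 - r_{k+1},\dots,1 - r_n) \\
&= 1 - \mu_{r_k} x.\,f_k \circ G(r_1,\dots,r_{k-1},x,r_{k+1},\dots,r_n) \\
&= 1 - H^k_k(r).
\end{align*}
For $m \in [k-1]$,
\begin{align*}
\overline{H}^k_m(1 - r) &= \overline{H}^{k-1}_m(1 - r_1,\dots,1 - r_{k-1},\overline{H}^k_k(1 - r),1 - r_{k+1},\dots,1 - r_n) \\
&= \overline{H}^{k-1}_m(1 - r_1,\dots,1 - r_{k-1},1 - H^k_k(r),1 - r_{k+1},\dots,1 - r_n) \\
&= 1 - H^{k-1}_m(r_1,\dots,r_{k-1},H^k_k(r),r_{k+1},\dots,r_n) \\
&= 1 - H^k_m(r).
\end{align*}
Finally, for $m > k$,
\[ 1 - \overline{H}^k_m(1 - r) = 1 - (1 - r_m) = r_m = H^k_m(r). \]
This terminates the proof of (12).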
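The duality of Lemma 7 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the helper names `nearest_fixed_point`, `dual` and `clamp` are hypothetical, the mapping is a one-variable BMN mapping chosen by hand, and the nearest fixed point is located by bisection, which relies on the fact that $x \mapsto g(x) - x$ is nonincreasing when $g$ is monotone and nonexpansive. The tolerance `eps` is an arbitrary choice.

```python
def nearest_fixed_point(g, r, eps=1e-9):
    """Approximate mu_r x.g(x), the fixed point of g closest to r.

    Assumes g maps [0, 1] to [0, 1] and is monotone and nonexpansive,
    so h(x) = g(x) - x is nonincreasing and its zero set is an interval.
    """
    def h(x):
        return g(x) - x

    if abs(h(r)) <= eps:          # r is (numerically) a fixed point itself
        return r
    if h(r) > 0:                  # r lies below all fixed points: find the least one
        lo, hi = r, 1.0
        while hi - lo > eps:
            mid = (lo + hi) / 2
            if h(mid) > eps:
                lo = mid
            else:
                hi = mid
        return hi
    lo, hi = 0.0, r               # r lies above all fixed points: find the greatest one
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if h(mid) < -eps:
            hi = mid
        else:
            lo = mid
    return lo


def dual(g):
    """Dual of a one-variable mapping: x -> 1 - g(1 - x)."""
    return lambda x: 1.0 - g(1.0 - x)


def clamp(x):
    """A hand-picked BMN mapping whose fixed points form the interval [0.2, 0.5]."""
    return min(max(x, 0.2), 0.5)


# Both sides of Lemma 7 for three positions of r relative to [0.2, 0.5].
checks = [(nearest_fixed_point(clamp, r),
           1.0 - nearest_fixed_point(dual(clamp), 1.0 - r))
          for r in (0.1, 0.3, 0.9)]
```

Each pair in `checks` holds the left-hand and right-hand side of the identity in Lemma 7 for one choice of $r$; the three choices correspond to the three cases distinguished in the proof.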

4 The one-day game

In this section we define an auxiliary one-day game. This simple game constitutes an essential ingredient in our solution to the general priority games.

Let $x = (x_1,\dots,x_n) \in \mathbb{R}^n$ be a reward vector assigning to each state $i$ the reward $x_i$. A one-day game $M(x)$ is the game played in the following way. If the game starts at a state $k$ then players Max and Min choose independently and simultaneously actions $a \in A(k)$ and $b \in B(k)$. Suppose that upon execution of $(a, b)$ the game moves to the next state $m$. This ends the game and player Max receives from player Min the payoff $x_m$.

A one-day game played at state $k$ given the reward mapping $x$ will be denoted $M_k(x)$. Note that $M_k(x)$ can be seen as a matrix game where
\[ M_k(x)[a, b] := \sum_{m \in S} x_m \, p(m \mid k, a, b) \]
is the (expected) payoff obtained by player Max from player Min when the players play actions $a$ and $b$ respectively.

The value mapping of the one-day game is the mapping $f = (f_1,\dots,f_n)$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ such that, for each state $k \in [n]$,
\[ f_k(x_1,\dots,x_n) := \mathrm{val}(M_k(x)), \tag{13} \]
where $\mathrm{val}(M_k(x))$ is the value of the matrix game $M_k(x)$. In other words, $f_k(x_1,\dots,x_n)$ is the value of the one-day game played at state $k$, and we will be interested in $f_k(x)$ seen as a function of the reward vector $x = (x_1,\dots,x_n)$. Since all entries in the matrix game $M_k(x)$ belong to $\mathbb{R}$, $f_k(x) \in \mathbb{R}$, i.e. $f_k$ is a mapping from $\mathbb{R}^n$ into $\mathbb{R}$.

Lemma 10. The value mapping $f$ of the one-day game defined in (13) is monotone and nonexpansive.

Proof. It is easy to see that $f$ is monotone and it is also straightforward that $f$ is additively homogeneous, i.e., for all $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$,
\[ f(x + \lambda e_n) = f(x) + \lambda e_n, \]
where $e_n = (1,\dots,1) \in \mathbb{R}^n$ is the vector with 1 on all components. By Lemma 1 this implies that $f$ is nonexpansive.

5 Stopping priority games

Stopping priority games are a variant of priority games where some states are stopping or, equivalently, where some states are absorbing.
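Returning briefly to the one-day game of Section 4, the value mapping (13) can be sketched in code for a toy arena. Everything concrete here is an assumption made for illustration: the two-state transition table `P`, the restriction to exactly two actions per player (so that the classical closed form for 2x2 matrix games applies; larger action sets would need linear programming), and the function names. States are indexed from 0 in the code, whereas the text numbers them from 1.

```python
def value_2x2(M):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:            # a pure saddle point exists
        return maximin
    # No saddle point: both players mix, and the value has a closed form.
    return (a * d - b * c) / (a + d - b - c)


def one_day_value(p, k, x):
    """f_k(x): value of the matrix game M_k(x)[a][b] = sum_m x[m] * p[k][a][b][m]."""
    n = len(x)
    M = [[sum(x[m] * p[k][a][b][m] for m in range(n)) for b in range(2)]
         for a in range(2)]
    return value_2x2(M)


def value_mapping(p, x):
    """f(x) = (f_1(x), ..., f_n(x)) for the transition table p."""
    return [one_day_value(p, k, x) for k in range(len(x))]


# Hypothetical arena: in state 0 the game stays in state 0 when the two
# actions match and moves to state 1 otherwise; state 1 is absorbing.
P = [
    [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]],
    [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]],
]

x = [0.2, 0.8]
fx = value_mapping(P, x)
# Additive homogeneity from the proof of Lemma 10: shifting all rewards by 0.1.
fx_shifted = value_mapping(P, [xi + 0.1 for xi in x])
```

For this arena the matrix game at state 0 is $\begin{pmatrix} x_0 & x_1 \\ x_1 & x_0 \end{pmatrix}$, so `fx` and `fx_shifted` can be compared entry by entry to observe the shift by 0.1 used in the proof of Lemma 10.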

We solve the stopping priority games by induction on the number of non-stopping states and we show that the value function can be expressed as the nearest fixed point of the value function (13) of the one-day game.

Let $(S_t, t \ge 1)$ be the stochastic process such that $S_t$ is the state visited at stage $t$. For each state $k \in [n]$ we define the random variable
\[ T_{>k} : H^\infty \to \mathbb{N} \cup \{\infty\} \]
such that
\[ T_{>k} = \min\{t \mid S_t > k\}. \]
Thus $T_{>k}$ is the time of the first visit to a state greater than $k$.

We define a new stochastic process $S^{[k]}_t$, $t \in \mathbb{N}$, that we shall call the stopped state process:
\[ S^{[k]}_t = \begin{cases} S_t & \text{if } T_{>k} \ge t, \\ S_q & \text{if } q = T_{>k} < t. \end{cases} \]
Thus if all previously visited states belong to $\{1,\dots,k\}$ then $S^{[k]}_t$ is equal to the state visited at the current epoch $t$. However, if at some previous epoch a state $> k$ was visited then $S^{[k]}_t$ is the first such state. In other words, $S^{[k]}_t$ behaves as if the states $> k$ were absorbing: if $S^{[k]}_t > k$ then $S^{[k]}_q = S^{[k]}_t$ for all $q \ge t$.

For a given reward vector $r$ and $k \in [n]$ we define the stopping priority payoff $\varphi^{[k]}_r$:
\[ \varphi^{[k]}_r = r_l \quad \text{where} \quad l = \limsup_t S^{[k]}_t. \]
The games with payoff $\varphi^{[k]}_r$ will be called stopping priority games. We will also speak about the $\varphi^{[k]}_r$-game to refer to the game with payoff $\varphi^{[k]}_r$. Similarly, $\varphi_r$-game will stand for the usual priority game.

Note that once a state $j$ greater than $k$ is visited, the game with payoff $\varphi^{[k]}_r$ is for all practical purposes over: independently of what can happen in the future, the payoff is equal to the reward $r_j$ of this state, and the states visited after the moment $T_{>k}$ have no bearing on the payoff.

In the $\varphi^{[k]}_r$-game the states in $[k]$ will be called non-stopping while the states $i > k$ will be called stopping. Note that since we have assumed that $S = [n]$, i.e. $n$ is the greatest state, we have
\[ \varphi^{[n]}_r = \varphi_r. \]
Note also that stopping states are trivial. If $i > k$ then for all plays $h$ starting at $i$, $\varphi^{[k]}_r(h) = r_i$, thus $E^{\sigma,\tau}_i(\varphi^{[k]}_r) = r_i$ for all strategies $\sigma, \tau$; in particular the value of a stopping state $i$, $i > k$, is $r_i$.
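The definition of the stopping priority payoff can be illustrated on eventually periodic plays. The following sketch is an assumption-laden illustration rather than part of the construction: it represents a play as a finite prefix followed by a cycle repeated forever (a representation chosen here only because it makes the lim sup computable), ignores actions, and uses the hypothetical function name `stopping_priority_payoff`. States are numbered from 1 as in the text, and `r[i - 1]` is the reward of state `i`.

```python
def stopping_priority_payoff(prefix, cycle, k, r):
    """Payoff phi^{[k]}_r of the play prefix . cycle . cycle . ...

    prefix, cycle: lists of states in 1..n (cycle must be non-empty).
    k: states 1..k are non-stopping, states greater than k are stopping.
    r: reward vector, r[i - 1] is the reward of state i.
    """
    for s in list(prefix) + list(cycle):
        if s > k:
            # First visit to a stopping state: the stopped state process
            # stays at s forever, so the lim sup is s.
            return r[s - 1]
    # No stopping state is ever visited: the lim sup is the largest
    # state occurring infinitely often, i.e. the largest state on the cycle.
    return r[max(cycle) - 1]


rewards = [0.3, 0.9, 0.1]
play_prefix, play_cycle = [1, 2], [1, 3]

payoff_k3 = stopping_priority_payoff(play_prefix, play_cycle, 3, rewards)  # usual priority payoff
payoff_k1 = stopping_priority_payoff(play_prefix, play_cycle, 1, rewards)  # state 2 already stops the play
```

With $k = 3 = n$ the payoff is the usual priority payoff of the play, while with $k = 1$ the play is frozen at its first visit to state 2, which matches the remark that the states visited after $T_{>k}$ have no bearing on the payoff.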

5.1 Dual game

We have constructed an ε-optimal strategy for Max and Min for the game starting at $k$, but the strategy for Max was constructed under the condition $r_k < w_k$ while the strategy for Min was constructed under the condition $r_k \le w_k$. To obtain ε-optimal strategies for both players in the two remaining cases ($r_k \ge w_k$ for Max and $r_k > w_k$ for Min) we use the natural duality of the nested fixed points and the games.

Let $G$ be a priority game. The dual game $\overline{G}$ is obtained in the following way:

(Di) $\overline{G}$ has the same states, actions and transition probabilities as $G$,

(Dii) if $r = (r_1,\dots,r_n)$ is the reward vector in $G$ then $\overline{r} = (\overline{r}_1,\dots,\overline{r}_n)$ is the reward vector in $\overline{G}$, where for $z \in [0,1]$, $\overline{z} := 1 - z$,

(Diii) players Max and Min exchange the roles: in the dual game, for each state $i \in S$, $A(i)$ are the actions of player Max while $B(i)$ are the actions of player Min; moreover, in the dual game player Max wants to minimize the priority payoff $\varphi_{\overline{r}}$ while Min wants to maximize the priority payoff $\varphi_{\overline{r}}$.

To avoid confusion, we write $\overline{\text{Max}}$ and $\overline{\text{Min}}$ to denote the players, respectively, maximizing and minimizing the priority payoff in the dual game.

A strategy $\sigma$ is a strategy of player Max in $G$ if and only if it is a strategy of player $\overline{\text{Min}}$ in the dual game $\overline{G}$. A symmetric property holds for strategies of player Min.

For each play $h$ we have $\varphi_r(h) = 1 - \varphi_{\overline{r}}(h)$, thus
\[ E^{\sigma,\tau}_i(\varphi_r) = 1 - E^{\tau,\sigma}_i(\varphi_{\overline{r}}), \]
where the left-hand side is the expected payoff in $G$, while $E^{\tau,\sigma}_i(\varphi_{\overline{r}})$ is the expected payoff in $\overline{G}$ when $\overline{\text{Max}}$ plays according to $\tau$ and $\overline{\text{Min}}$ plays according to $\sigma$. This implies that
\[ v_i = 1 - \overline{v}_i, \]
where $v_i$ is the value of state $i$ in $G$ while $\overline{v}_i$ is the value of $i$ in $\overline{G}$. Moreover, a strategy is ε-optimal for player Max in $G$ if and only if it is ε-optimal for player $\overline{\text{Min}}$ in $\overline{G}$. A symmetric property holds for strategies of player Min.

6 Constructing ε-optimal strategies

The rest of this section is devoted to the proof of the following main result characterizing the values of the stopping priority games by means of the nested nearest fixed points.

Theorem 11. Let $f : [0,1]^n \to [0,1]^n$ be the value mapping of the one-day game defined in Section 4. For $0 \le k \le n$, let $\mathrm{Fix}^k(f)$ be the $k$-th nested fixed point of $f$, see Section 3.2. Then, for each reward vector $r$ and each initial state $i \in [n]$, the stopping priority $\varphi^{[k]}_r$-game starting at $i$ has value equal to $\mathrm{Fix}^k_i(f)(r)$.
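To make the statement of Theorem 11 more concrete, the nested nearest fixed point can be evaluated numerically for very small examples. The sketch below is a naive illustration, not the construction used in the proof. It assumes the recursive shape of (14) and (15) from the proof: $w_k$ is the nearest fixed point to $r_k$ of $x \mapsto f_k(F^{k-1}(x; r))$, where $F^{k-1}(x; r)$ is taken to be the $(k-1)$-th nested fixed point evaluated at $r$ with $r_k$ replaced by $x$. The nearest fixed point is approximated by iterating the mapping from $r_k$, which for a monotone nonexpansive mapping on $[0,1]$ yields a monotone sequence converging to the fixed point closest to the start. The names `nearest_fixed_point`, `fix`, `f_two_states` and `f_loop` are hypothetical, the two value mappings are written by hand, and indices are 0-based in the code.

```python
def nearest_fixed_point(g, r, iters=10_000, eps=1e-12):
    """Approximate mu_r x.g(x) by iterating x <- g(x) starting from r."""
    x = r
    for _ in range(iters):
        y = g(x)
        if abs(y - x) <= eps:
            return y
        x = y
    return x


def fix(f, k, r):
    """Approximate Fix^k(f)(r) as a list; paper state i is index i - 1."""
    r = list(r)
    if k == 0:
        return r                      # Fix^0 is the identity

    def F(x):
        # (k-1)-th nested fixed point with the k-th reward replaced by x;
        # its k-th coordinate is x and the later coordinates are those of r.
        return fix(f, k - 1, r[:k - 1] + [x] + r[k:])

    # w_k as in (14), then the remaining coordinates as in (15).
    w_k = nearest_fixed_point(lambda x: f(F(x))[k - 1], r[k - 1])
    return F(w_k)


def f_two_states(x):
    """Hand-written value mapping: state 1 has value (x1 + x2) / 2, state 2 is absorbing."""
    return [(x[0] + x[1]) / 2, x[1]]


def f_loop(x):
    """Hand-written value mapping of a single state with a self-loop."""
    return [x[0]]


values_two_states = fix(f_two_states, 2, [0.2, 0.8])
value_loop = fix(f_loop, 1, [0.4])
```

In the single-state example every point of $[0,1]$ is a fixed point of the value mapping, and the nearest one to $r_1 = 0.4$ is $0.4$ itself, which is the payoff of the only play; neither the least nor the greatest fixed point would give this value. The recursion re-evaluates the inner fixed point at every iteration, so its cost grows quickly with $k$ and it is meant only for tiny examples.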

Proof. For each $\varepsilon > 0$ we construct ε-optimal strategies for both players. The proof is carried out by induction on $k$.

The case $k = 0$ is trivial since when all states are stopping then the value of each state is equal to its reward, i.e. the value of state $i$ is $\mathrm{Fix}^0_i(f)(r) = r_i$.

Under the assumption that the theorem holds for $k - 1$, i.e. $\mathrm{Fix}^{k-1}_i(f)(r)$ is the value of the non-stopping state $i \in [k-1]$ in the $\varphi^{[k-1]}_r$-game, we shall prove that $\mathrm{Fix}^k_i(f)(r)$ is the value of the non-stopping state $i \in [k]$ in the $\varphi^{[k]}_r$-game.

We will use the following notation:
\[ w_k := \mathrm{Fix}^k_k(f)(r) = \mu_{r_k} x.f_k\bigl(F^{k-1}_1(x; r),\dots,F^{k-1}_{k-1}(x; r),x,r_{k+1},\dots,r_n\bigr) \tag{14} \]
and
\[ w_i := \mathrm{Fix}^k_i(f)(r) = F^{k-1}_i(w_k; r), \quad i \in [k-1], \tag{15} \]
where $F^{k-1}_i$ are defined as in (9). Thus our aim is to prove that $(w_1,\dots,w_{k-1},w_k)$ are the values of the states $\{1,\dots,k-1,k\}$ in the $\varphi^{[k]}_r$-game.

Since $w_k$ is a fixed point of (14) we have
\[ w_k = f_k(w_1,\dots,w_{k-1},w_k,r_{k+1},\dots,r_n). \tag{16} \]

Let $T_m$ be the random time of the $m$-th visit to state $k$ of the stopped state process $(S^{[k]}_t)_{t \ge 1}$, i.e.
\[ T_1 = \min\{t \mid S^{[k]}_t = k\}, \qquad T_m = \min\{t \mid t > T_{m-1} \text{ and } S^{[k]}_t = k\} \text{ for } m > 1. \tag{17} \]
$T_m$ can be infinite if the number of visits of the stopped state process $S^{[k]}_t$ to the state $k$ is smaller than $m$, and $T_1 = 1$ if the game starts at $k$. Since $T_m$ is defined w.r.t. the stopped state process $S^{[k]}_t$, $T_m < \infty$ implies that all states visited prior to the moment $T_m$ are $\le k$.

Recall that $S_t$, $t \ge 1$, is the stochastic process that gives the state visited at stage $t$; $A_t$, $t \ge 1$, and $B_t$, $t \ge 1$, are the stochastic processes that give the actions played by players Max and Min respectively at stage $t$.

Let $T$ be any random time, i.e. a mapping from plays to $\{1, 2, \dots\} \cup \{\infty\}$ such that for each $m \in \{1, 2, \dots\}$ the event $\{T = m\}$ belongs to the σ-algebra $\mathcal{F}_m = \sigma(S_1, A_1, B_1, S_2, \dots, S_m)$. In other words, $\mathcal{F}_m$ is the σ-algebra generated by the cylinders $h_m^{+}$, where $h_m$ are histories of length $m$. Intuitively that means that knowing the states and actions up to time $m$ we can decide if $T = m$ or not.

Definition 12. For a random time $T$, $\theta_T : H^\infty \to H^\infty$ will denote the shift mapping that maps plays to plays and is defined in the following way:
\[ \theta_T(S_1, A_1, B_1, S_2, \dots) = (S_T, A_T, B_T, S_{T+1}, A_{T+1}, B_{T+1}, S_{T+2}, A_{T+2}, B_{T+2}, \dots), \]

where $S_t$ is the state process giving the state visited at stage $t$ and $A_t$, $B_t$ are action processes that give the actions played by players Max and Min at stage $t$.

Thus the shift $\theta_T$ forgets all history prior to time $T$. Of course, $\theta_T$ is well defined only on plays such that $T < \infty$.

Below we use the shift $\theta_{T_m+1}$, where $T_m$ is the time of the $m$th visit to state $k$. This shift will be applied only to the plays with $T_m < \infty$.

ε/2-optimal strategy $\sigma$ for player Max when $r_k < w_k$ and $k$ is the starting state.

We assume that
\[ r_k < w_k \tag{18} \]
and the aim is to construct a strategy $\sigma$ satisfying
\[ E^{\sigma,\tau}_k(\varphi^{[k]}_r) \ge w_k - \varepsilon/2 \tag{19} \]
for each strategy $\tau$ of Min. Let
\[ \eta \in (w_k - \varepsilon/2, w_k) \]
and define
\[ \xi_i = F^{k-1}_i(\eta; r), \quad i \in [k-1]. \tag{20} \]
By the induction hypothesis, $\xi_i$ is the value of the $\varphi^{[k-1]}_{(r_1,\dots,r_{k-1},\eta,r_{k+1},\dots,r_n)}$-game starting at the state $i$.

Let us consider the one-day game $M_k(\xi_1,\dots,\xi_{k-1},\eta,r_{k+1},\dots,r_n)$ played at state $k$. Then
\[ \eta' := f_k(\xi_1,\dots,\xi_{k-1},\eta,r_{k+1},\dots,r_n) \tag{21} \]
is the value of this game. By the properties of monotone nonexpansive mappings, (18) implies that $w_k$ is in fact the least fixed point of the mapping
\[ x \mapsto f_k\bigl(F^{k-1}_1(x; r),\dots,F^{k-1}_{k-1}(x; r),x,r_{k+1},\dots,r_n\bigr). \]
Thus $\eta < w_k$ implies that
\[ \eta < f_k(\xi_1,\dots,\xi_{k-1},\eta,r_{k+1},\dots,r_n) = \eta' \le w_k. \tag{22} \]
Fix $\delta$ such that
\[ 0 < \delta < \eta' - \eta. \tag{23} \]
We define the strategy $\sigma$ of player Max in the following way:

- during the $m$-th visit to the state $k$, which takes place at time $T_m$, cf. (17), player Max selects actions according to his optimal strategy in the one-day game $M_k(\xi_1,\dots,\xi_{k-1},\eta,r_{k+1},\dots,r_n)$,

- during all stages $j$ such that $T_m < j < T_{m+1}$, i.e. between the $m$th and $(m+1)$th visit to $k$, player Max plays according to his δ-optimal strategy for the $\varphi^{[k-1]}_{(r_1,\dots,r_{k-1},\eta,r_{k+1},\dots,r_n)}$-game. When he applies this strategy we tacitly assume that after each visit to $k$ player Max forgets all preceding history and plays as if the game started afresh at the first state visited after the last visit to $k$.

From the optimality of $\sigma$ in the one-day game $M_k(\xi_1,\dots,\xi_{k-1},\eta,r_{k+1},\dots,r_n)$, we have
\[ \sum_{i<k} \xi_i \, P^{\sigma,\tau}_k(S_{T_m+1} = i \mid T_m < \infty) + \eta \, P^{\sigma,\tau}_k(S_{T_m+1} = k \mid T_m < \infty) + \sum_{i>k} r_i \, P^{\sigma,\tau}_k(S_{T_m+1} = i \mid T_m < \infty) \ge \eta'. \tag{24} \]
Indeed, when player Max plays according to the strategy $\sigma$ at the moment $T_m$ then the current state is $k$ and he plays using his optimal strategy in the one-day game $M_k(\xi_1,\dots,\xi_{k-1},\eta,r_{k+1},\dots,r_n)$. Now it suffices to notice that the left-hand side of (24) is nothing else but the payoff that player Max obtains in the one-day game $M_k(\xi_1,\dots,\xi_{k-1},\eta,r_{k+1},\dots,r_n)$ (because $S_{T_m+1}$ is the state visited at the next time moment $T_m + 1$). Since $\eta'$ is the value of this one-day game the inequality follows.

In the sequel we will write $1_A$ for the indicator of the event $A$, i.e. the mapping that is equal to 1 on $A$ and to 0 on the complement of $A$. Let us note the following equality:
\[ \sum_{i>k} r_i \, P^{\sigma,\tau}_k(S_{T_m+1} = i \mid T_m < \infty) = E^{\sigma,\tau}_k\bigl(\varphi^{[k]}_r \cdot 1_{\{S_{T_m+1} > k\}} \mid T_m < \infty\bigr). \tag{25} \]
Indeed, if a play belongs to the event $\{S_{T_m+1} = i, T_m < \infty\}$ for $i > k$ then $T_m < \infty$ means that at the moment $T_m$ this play visits $k$ and prior to $T_m$ it never visited states $> k$, cf. (17), and at the next time moment $T_m + 1$ such a play visits the stopping state $i > k$. But for such plays the payoff $\varphi^{[k]}_r$ is equal to $r_i$.

Consider now the event $\{S_{T_m+1} = i, T_m < \infty\}$, for $i < k$, see Figure 1. This event consists of the plays such that the stopped state process $S^{[k]}$ visits $k$ for the $m$th time at time $T_m$ (this is guaranteed by $T_m < \infty$, cf. (17)) and at the next time moment $T_m + 1$ the play visits the state $i < k$. From the definition of $\sigma$ it follows that starting from the time $T_m + 1$ player Max plays using his δ-optimal strategy in the $\varphi^{[k-1]}_{(r_1,\dots,r_{k-1},\eta,r_{k+1},\dots,r_n)}$-game. Since, by the inductive hypothesis (20), the value of such a game for state $i$ is $\xi_i$, we have

\[ E^{\sigma,\tau}_k\bigl(\varphi^{[k-1]}_{(r_1,\dots,r_{k-1},\eta,r_{k+1},\dots,r_n)} \circ \theta_{T_m+1} \mid S_{T_m+1} = i, T_m < \infty\bigr) \ge \xi_i - \delta, \quad \text{for all } i < k, \tag{26} \]
where $\theta_{T_m+1}$ is the shift mapping that deletes all history prior to the time $T_m + 1$.

Figure 1: The upper figure: the event $\{S_{T_m+1} = i, T_m < \infty\}$ consists of the plays that at time $T_m$ visit state $k$ for the $m$th time without ever visiting the states $> k$ before, and at time $T_m + 1$ they visit state $i$, where $i < k$. These plays are partitioned into two sets: the set $\{T_{m+1} < \infty, S_{T_m+1} = i, T_m < \infty\}$ of plays that will visit $k$ for the $(m+1)$th time and the set $\{T_{m+1} = \infty, S_{T_m+1} = i, T_m < \infty\}$ of the plays for which the $m$th visit in $k$ was the last one. The lower figure: the shift mapping $\theta_{T_m+1}$ forgets all the history prior to the time $T_m + 1$.

Using the fact that for all events $A$ and $B$ and each integrable mapping $f$ we have $E(f \mid A, B) \cdot P(A \mid B) = E(f \cdot 1_A \mid B)$, we can rewrite (26) in the following form:
\[ E^{\sigma,\tau}_k\bigl(\varphi^{[k-1]}_{(r_1,\dots,r_{k-1},\eta,r_{k+1},\dots,r_n)} \circ \theta_{T_m+1} \cdot 1_{\{S_{T_m+1} = i\}} \mid T_m < \infty\bigr) \ge (\xi_i - \delta) \, P^{\sigma,\tau}_k(S_{T_m+1} = i \mid T_m < \infty), \quad \text{for } i < k. \tag{27} \]
We shall prove that for $i < k$,
\[ E^{\sigma,\tau}_k\bigl(\varphi^{[k-1]}_{(r_1,\dots,r_{k-1},\eta,r_{k+1},\dots,r_n)} \circ \theta_{T_m+1} \cdot 1_{\{S_{T_m+1} = i\}} \mid T_m < \infty\bigr) = \eta \, P^{\sigma,\tau}_k(T_{m+1} < \infty, S_{T_m+1} = i \mid T_m < \infty) + E^{\sigma,\tau}_k\bigl(\varphi^{[k]}_r \cdot 1_{\{T_{m+1} = \infty\}} \cdot 1_{\{S_{T_m+1} = i\}} \mid T_m < \infty\bigr). \tag{28} \]

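Indeed the left-hand side of (28) is the sum of
\[ E^{\sigma,\tau}_k\bigl(\varphi^{[k-1]}_{(r_1,\dots,r_{k-1},\eta,r_{k+1},\dots,r_n)} \circ \theta_{T_m+1} \cdot 1_{\{S_{T_m+1} = i\}} \cdot 1_{\{T_{m+1} = \infty\}} \mid T_m < \infty\bigr) \tag{29} \]
and
\[ E^{\sigma,\tau}_k\bigl(\varphi^{[k-1]}_{(r_1,\dots,r_{k-1},\eta,r_{k+1},\dots,r_n)} \circ \theta_{T_m+1} \cdot 1_{\{S_{T_m+1} = i\}} \cdot 1_{\{T_{m+1} < \infty\}} \mid T_m < \infty\bigr). \tag{30} \]
Consider first (30). For plays $h$ belonging to the event $\{T_{m+1} < \infty, S_{T_m+1} = i\}$, $i < k$, the shift $\theta_{T_m+1}$ removes all prefix history up to the time $T_m + 1$, see Figure 1. Since $T_{m+1} < \infty$, in the remaining suffix play $\theta_{T_m+1}(h)$ all visited states up to the next visit to $k$ are $< k$. But for the plays that visit $k$ at some moment and for which all states prior to this first visit to $k$ are $< k$, the payoff $\varphi^{[k-1]}_{(r_1,\dots,r_{k-1},\eta,r_{k+1},\dots,r_n)}$ is constant and equal to the reward $\eta$ associated with $k$. Thus (30) is equal to
\[ \eta \, P^{\sigma,\tau}_k(T_{m+1} < \infty, S_{T_m+1} = i \mid T_m < \infty). \]
Let us examine now (29). The plays $h$ belonging to the event $\{S_{T_m+1} = i, T_{m+1} = \infty, T_m < \infty\}$ have the following properties:

- at time $T_m$ they visit $k$ and all states visited prior to $T_m$ are $\le k$,
- at time $T_m + 1$, just after the $m$th visit to $k$, they visit the state $i$,
- since $T_{m+1} = \infty$ the suffix play $\theta_{T_m+1}(h)$ does not contain any occurrence of $k$ ($k$ is never visited for the $(m+1)$th time).

These properties assure that for such plays $\varphi^{[k]}_r(h) = \varphi^{[k]}_r(\theta_{T_m+1}(h))$. However, $\theta_{T_m+1}(h)$ has no occurrence of $k$, which implies that for the resulting payoff it is irrelevant whether $k$ is stopping or not and what the reward of $k$ is. Thus
\[ \varphi^{[k]}_r(\theta_{T_m+1}(h)) = \varphi^{[k-1]}_{(r_1,\dots,r_{k-1},\eta,r_{k+1},\dots,r_n)}(\theta_{T_m+1}(h)). \]
This terminates the proof that (29) is equal to
\[ E^{\sigma,\tau}_k\bigl(\varphi^{[k]}_r \cdot 1_{\{T_{m+1} = \infty\}} \cdot 1_{\{S_{T_m+1} = i\}} \mid T_m < \infty\bigr). \]
This concludes also the proof of (28).

From (27) and (28) we obtain
\[ \eta \, P^{\sigma,\tau}_k(T_{m+1} < \infty, S_{T_m+1} = i \mid T_m < \infty) + E^{\sigma,\tau}_k\bigl(\varphi^{[k]}_r \cdot 1_{\{T_{m+1} = \infty\}} \cdot 1_{\{S_{T_m+1} = i\}} \mid T_m < \infty\bigr) \ge (\xi_i - \delta) \, P^{\sigma,\tau}_k(S_{T_m+1} = i \mid T_m < \infty). \]
Summing both sides of this inequality for $i < k$ and rearranging the terms we obtain
\begin{align*}
\sum_{i<k} \xi_i \, P^{\sigma,\tau}_k(S_{T_m+1} = i \mid T_m < \infty)
&\le \eta \, P^{\sigma,\tau}_k(T_{m+1} < \infty, S_{T_m+1} < k \mid T_m < \infty) \\
&\quad + E^{\sigma,\tau}_k\bigl(\varphi^{[k]}_r \cdot 1_{\{T_{m+1} = \infty\}} \cdot 1_{\{S_{T_m+1} < k\}} \mid T_m < \infty\bigr) + \delta \, P^{\sigma,\tau}_k(S_{T_m+1} < k \mid T_m < \infty) \\
&\le \eta \, P^{\sigma,\tau}_k(T_{m+1} < \infty, S_{T_m+1} < k \mid T_m < \infty) \\
&\quad + E^{\sigma,\tau}_k\bigl(\varphi^{[k]}_r \cdot 1_{\{T_{m+1} = \infty\}} \cdot 1_{\{S_{T_m+1} < k\}} \mid T_m < \infty\bigr) + \delta.
\end{align*}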

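The last inequality, (24) and (25) yield
\begin{align*}
\eta' &\le \eta \, P^{\sigma,\tau}_k(T_{m+1} < \infty, S_{T_m+1} < k \mid T_m < \infty) \\
&\quad + E^{\sigma,\tau}_k\bigl(\varphi^{[k]}_r \cdot 1_{\{T_{m+1} = \infty\}} \cdot 1_{\{S_{T_m+1} < k\}} \mid T_m < \infty\bigr) \\
&\quad + \delta \\
&\quad + \eta \, P^{\sigma,\tau}_k(S_{T_m+1} = k \mid T_m < \infty) \\
&\quad + E^{\sigma,\tau}_k\bigl(\varphi^{[k]}_r \cdot 1_{\{S_{T_m+1} > k\}} \mid T_m < \infty\bigr). \tag{31}
\end{align*}
Notice that
\[ P^{\sigma,\tau}_k(T_{m+1} < \infty, S_{T_m+1} < k \mid T_m < \infty) + P^{\sigma,\tau}_k(S_{T_m+1} = k \mid T_m < \infty) = P^{\sigma,\tau}_k(T_{m+1} < \infty \mid T_m < \infty), \tag{32} \]
which allows us to regroup the first and the fourth summand of the right-hand side of (31). Indeed, $\{T_{m+1} < \infty, T_m < \infty\}$ is the union of three disjoint events, depending on whether the state visited at the next time moment $T_m + 1$ is $< k$, $= k$, or $> k$. But for the second of these events we have
\[ \{T_{m+1} < \infty, T_m < \infty, S^{[k]}_{T_m+1} = k\} = \{T_m < \infty, S^{[k]}_{T_m+1} = k\} \]
since $S^{[k]}_{T_m+1} = k$ implies that $T_{m+1} = T_m + 1 < \infty$. And finally the third event $\{T_{m+1} < \infty, T_m < \infty, S^{[k]}_{T_m+1} > k\}$ is empty, since $S^{[k]}_{T_m+1} > k$ means that at time $T_m + 1$ the game hits a stopping state; thus the stopped state process will never return to $k$, and therefore $T_{m+1} = \infty$. This terminates the proof of (32).

We can regroup also the second and the last summands of (31) since
\[ P^{\sigma,\tau}_k(T_{m+1} = \infty, S_{T_m+1} < k \mid T_m < \infty) + P^{\sigma,\tau}_k(S_{T_m+1} > k \mid T_m < \infty) = P^{\sigma,\tau}_k(T_{m+1} = \infty \mid T_m < \infty). \]
We obtain this again by presenting the event $\{T_{m+1} = \infty, T_m < \infty\}$ as the union of three disjoint events depending on the value of $S_{T_m+1}$. However, $S_{T_m+1} = k$ contradicts $T_{m+1} = \infty$ and $S_{T_m+1} > k$ implies $T_{m+1} = \infty$.

Using these observations we deduce from (31) that
\[ \eta' \le \eta \, P^{\sigma,\tau}_k(T_{m+1} < \infty \mid T_m < \infty) + E^{\sigma,\tau}_k\bigl(\varphi^{[k]}_r \cdot 1_{\{T_{m+1} = \infty\}} \mid T_m < \infty\bigr) + \delta. \tag{33} \]
Since $\varphi^{[k]}_r \le 1$, from (33) we obtain that
\[ \eta \, P^{\sigma,\tau}_k(T_{m+1} < \infty \mid T_m < \infty) + P^{\sigma,\tau}_k(T_{m+1} = \infty \mid T_m < \infty) \ge \eta' - \delta. \]
But $P^{\sigma,\tau}_k(T_{m+1} = \infty \mid T_m < \infty) + P^{\sigma,\tau}_k(T_{m+1} < \infty \mid T_m < \infty) = 1$, thus the last inequality yields
\[ P^{\sigma,\tau}_k(T_{m+1} < \infty \mid T_m < \infty) \le \frac{1 + \delta - \eta'}{1 - \eta} < \frac{1 + (\eta' - \eta) - \eta'}{1 - \eta} = 1. \]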
