Existence of Multiagent Equilibria with Limited Agents


Existence of Multiagent Equilibria with Limited Agents

Michael Bowling
Manuela Veloso
Computer Science Department, Carnegie Mellon University, Pittsburgh PA

Abstract

Multiagent learning is a necessary yet challenging problem as multiagent systems become more prevalent and environments become more dynamic. Much of the groundbreaking work in this area draws on notable results from game theory, in particular, the concept of Nash equilibria. Learners that directly learn equilibria obviously rely on their existence. Learners that instead seek to play optimally with respect to the other players also depend upon equilibria, since equilibria are, and are the only, learning fixed points. From another perspective, agents with limitations are real and common. These may be undesired physical limitations as well as self-imposed rational limitations, such as abstraction and approximation techniques, used to make learning tractable. This article explores the interactions of these two important concepts, raising for the first time the question of whether equilibria continue to exist when agents have limitations. We look at the general effects limitations can have on agent behavior, and define a natural extension of equilibria that accounts for these limitations. Using this formalization, we show that the existence of equilibria cannot be guaranteed in general. We then prove their existence for certain classes of domains and agent limitations. These results have wide applicability as they are not tied to any particular learning algorithm or specific instance of agent limitations. We then present empirical results from a specific multiagent learner applied to a specific instance of limited agents. These results demonstrate that learning with limitations is possible, and our theoretical analysis of equilibria under limitations is relevant.

1. Introduction

Multiagent domains are becoming more prevalent as more applications and situations require multiple agents. Learning in these systems is as useful and important as in single-agent domains, possibly more so. Optimal behavior in a multiagent system may depend on the behavior of the other agents. For example, in robot soccer, passing the ball may only be optimal if the defending goalie is going to move to block the player's shot and no defender will move to intercept the pass. This is complicated by the fact that the behavior of the other agents is often not predictable by the agent designer, making learning and adaptation a necessary component of the agent itself. In addition, the behavior of the other agents, and therefore the optimal response, can be changing as they also adapt to achieve their own goals. Game theory provides a framework for reasoning about these strategic interactions. The game theoretic concepts of stochastic games and Nash equilibria are the foundation for much of the recent research in multiagent learning, e.g., (Littman, 1994; Hu & Wellman, 1998; Greenwald & Hall, 2002; Bowling & Veloso, 2002). Nash equilibria define a course of action for each agent, such that no agent could benefit by changing their behavior. So, all agents are playing optimally, given that the other agents continue to play according to the equilibrium.

From the agent design perspective, optimal agents in realistic environments are not practical. Agents are faced with all sorts of limitations. Some limitations may physically prevent certain behavior, e.g., a soccer robot that has traction limits on its acceleration. Other limitations are self-imposed to help guide an agent's learning, e.g., using a subproblem solution for advancing the ball down the field. In short, limitations prevent agents from playing optimally and possibly from following a Nash equilibrium. This clash between the concept of equilibria and the reality of limited agents is a topic of critical importance. Do equilibria exist when agents have limitations? Are there classes of domains or classes of limitations where equilibria are guaranteed to exist? This article introduces these questions and provides concrete answers. Section 2 introduces the stochastic game framework as a model for multiagent learning. We define the game theoretic concept of equilibria, and examine the dependence of current multiagent learning algorithms on this concept. Section 3 enumerates and classifies some common agent limitations and presents two formal models incorporating the effects of limitations into the stochastic game framework. Section 4 is the major contribution of the article, presenting both proofs of existence for certain domains and limitations as well as counterexamples for others. Section 5 gives an example of how these results affect and relate to one particular multiagent learning algorithm. We present the first known results of applying an explicitly multiagent learning algorithm in a setting with limited agents. Finally, Section 6 concludes with implications of this work and future directions.

2. Stochastic Games

A stochastic game is a tuple (n, S, A_1...n, T, R_1...n), where n is the number of agents; S is a set of states; A_i is the set of actions available to agent i, with A being the joint action space A_1 × ... × A_n; T is a transition function, S × A × S → [0, 1], such that ∀s ∈ S, a ∈ A, Σ_{s'∈S} T(s, a, s') = 1; and R_i is a reward function for the ith agent, S × A → ℝ. This is very similar to the framework of a Markov Decision Process (MDP). Instead of a single agent, though, there are multiple agents whose joint action determines the next state and rewards to the agents. The goal of an agent, as in MDPs, is to maximize its long-term reward. Notice, though, that each agent has its own independent reward function that it is seeking to maximize. The goal of maximizing long-term reward will be made formal in Section 2.2.
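To make the tuple concrete, here is a minimal sketch of this structure in Python (our own illustration, not from the article; all names are ours), including the check that T(s, a, ·) is a probability distribution. The example encodes a two-player game with a single, self-looping state — the Rock-Paper-Scissors matrix game discussed below:

    import itertools
    import numpy as np

    class StochasticGame:
        """(n, S, A_1..n, T, R_1..n): T[s][a] is a distribution over next
        states, R[i][s][a] is agent i's reward, a a tuple of per-agent actions."""
        def __init__(self, n, n_states, action_counts, T, R):
            self.n, self.n_states, self.action_counts = n, n_states, action_counts
            self.T, self.R = T, R
            for s in range(n_states):       # check sum over s' of T(s, a, s') = 1
                for a in itertools.product(*[range(k) for k in action_counts]):
                    assert abs(sum(T[s][a]) - 1.0) < 1e-9

    # Rock-Paper-Scissors as a stochastic game with one self-looping state.
    rps_payoff = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
    joint = list(itertools.product(range(3), range(3)))
    T = {0: {a: [1.0] for a in joint}}                  # always stay in state 0
    R = [{0: {a: rps_payoff[a] for a in joint}},        # row player
         {0: {a: -rps_payoff[a] for a in joint}}]       # column player (zero-sum)
    rps = StochasticGame(2, 1, (3, 3), T, R)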

Stochastic games can equally be thought of as an extension of the concept of matrix games to multiple states. Two common matrix games are in Table 1. In these games there are two players; one selects a row and the other selects a column of the matrix. The entry of the matrix they jointly select determines the payoffs. Rock-Paper-Scissors in Table 1(a) is a zero-sum game, where the column player receives the negative of the row player's payoff. In the general case (general-sum games; e.g., Bach or Stravinsky in Table 1(b)) each player has an independent matrix that determines its payoff. Stochastic games, then, can be viewed as having a matrix game associated with each state. The immediate payoffs at a particular state are determined by the matrix entries R_i(s, a). After selecting actions and receiving their rewards from the matrix game, the players transition to another state and associated matrix game, which is determined by their joint action. So stochastic games contain both MDPs (when n = 1) and matrix games (when |S| = 1) as subsets of the framework.

(a) Rock-Paper-Scissors:
    R_r(s_0) = [  0  -1   1 ]        R_c(s_0) = -R_r(s_0)
               [  1   0  -1 ]
               [ -1   1   0 ]

(b) Bach or Stravinsky:
    R_r(s_0) = [ 2  0 ]              R_c(s_0) = [ 1  0 ]
               [ 0  1 ]                         [ 0  2 ]

Table 1: Two example matrix games.

2.1 Policies

Unlike in single-agent settings, deterministic policies, which associate a single action with every state, can often be exploited in multiagent settings. Consider Rock-Paper-Scissors as shown in Table 1(a). If the column player were to play any action deterministically, the row player could win a payoff of one every time. This requires us to consider stochastic strategies and policies. A stochastic policy for player i, π_i : S → PD(A_i), is a function that maps states to mixed strategies, which are probability distributions over the player's actions. We use the notation Π_i to be the set of all possible stochastic policies available to player i, and Π = Π_1 × ... × Π_n to be the set of joint policies of all the players. We also use the notation π_{-i} to refer to a particular joint policy of all the players except player i, and Π_{-i} to refer to the set of such joint policies. Finally, the notation ⟨π_i, π_{-i}⟩ refers to the joint policy where player i follows π_i while the other players follow their policy from π_{-i}. In this work, we make the distinction between the concept of stochastic policies and mixtures of policies. A mixture of policies, σ_i : PD(S → A_i), is a probability distribution over the set of deterministic policies. An agent following a mixture of policies selects a deterministic policy according to its mixture distribution at the start of the game and always follows this policy. This is similar to the distinction between mixed strategies and behavioral strategies in extensive-form games (Kuhn, 1953). This work focuses on stochastic policies as they (i) are a more compact representation, requiring |A_i| · |S| parameters instead of |A_i|^|S| parameters to represent the complete space of policies, (ii) are the common notion of stochastic policies in single-agent behavior learning, e.g., (Jaakkola, Singh, & Jordan, 1994; Sutton, McAllester, Singh, & Mansour, 2000; Ng, Parr, & Koller, 1999), and (iii) don't require the artificial commitment to a single deterministic policy at the start of the game, which can be difficult to understand within a learning context.

2.2 Reward Formulations

There are a number of possible reward formulations in single-agent learning that define the agent's notion of optimality. These formulations also apply to stochastic games. We will explore two of these reward formulations in this article: discounted reward and average reward. Although this work focuses on discounted reward, many of our theoretical results also apply to average reward.

Discounted Reward. In the discounted reward formulation, the value of future rewards is diminished by a discount factor γ. Formally, given a joint policy π for all the agents, the value to agent i of starting at state s ∈ S is,

    V_i^π(s) = Σ_{t=0}^{∞} γ^t E{ r_i^t | s_0 = s, π },    (1)

where r_i^t is the immediate reward to player i at time t, with the expectation conditioned on s as the initial state and the players following the joint policy π. In our formulation, we will assume an initial state, s_0 ∈ S, is given and define the goal of each agent as maximizing V_i^π(s_0). This differs from the usual goal in MDPs and stochastic games, which is to simultaneously maximize the value of all states. We require this weaker goal since our exploration into agent limitations makes simultaneous maximization unattainable.¹ This same distinction was required by Sutton and colleagues (Sutton et al., 2000) in their work on parameterized policies, one example of an agent limitation.

Average Reward. In the average reward formulation all rewards in the sequence are equally weighted. Formally, this corresponds to,

    V_i^π(s) = lim_{T→∞} (1/T) Σ_{t=1}^{T} E{ r_i^t | s_0 = s, π },    (2)

with the expectation defined as in Equation 1. As is common with this formulation, we assume that the stochastic game is ergodic. A stochastic game is ergodic if for all joint policies any state can be reached in finite time from any other state with non-zero probability. This assumption makes the value of a policy independent of the initial state. Therefore, ∀s, s' ∈ S, V_i^π(s) = V_i^π(s'). So any policy that maximizes the average value from one state maximizes the average value from all states. These results, along with more details on the average reward formulation for MDPs, are summarized by Mahadevan (1996).

For either formulation we will use the notation V_i^π to refer to the value of the joint policy π to agent i, which in either formulation is simply V_i^π(s_0), where s_0 can be any arbitrary state for the average reward formulation.

1. This fact is demonstrated later by the example in Fact 5 in Section 4. In this game with the described limitation, if the column player randomizes among its actions, then the row player cannot simultaneously maximize the value of the left and right states.
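Equation 1 can be evaluated exactly for a fixed joint policy: the joint policy induces a Markov chain, and V = (I − γP_π)^{-1} r_π. The following sketch is ours (assuming small tabular games in the representation from the earlier sketch) and performs this evaluation with a single linear solve:

    import numpy as np

    def discounted_values(T, R_i, policies, gamma):
        """V_i^pi(s) for every s. T[s][a][s'] transition probabilities,
        R_i[s][a] agent i's reward, policies[j][s] a distribution over
        agent j's actions, with a a joint-action tuple."""
        n_states = len(T)
        P = np.zeros((n_states, n_states))   # induced chain P_pi(s, s')
        r = np.zeros(n_states)               # expected one-step reward to agent i
        for s in range(n_states):
            for a in T[s]:
                p_a = np.prod([policies[j][s][a[j]] for j in range(len(a))])
                r[s] += p_a * R_i[s][a]
                for s2 in range(n_states):
                    P[s, s2] += p_a * T[s][a][s2]
        return np.linalg.solve(np.eye(n_states) - gamma * P, r)

    # Reusing the Rock-Paper-Scissors encoding above: uniform play has value 0.
    uniform = {0: np.ones(3) / 3}
    print(discounted_values(T, R[0], [uniform, uniform], gamma=0.9))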

2.3 Best-Response and Equilibria

Even with the concept of stochastic policies and well-defined reward formulations, there are still no optimal policies that are independent of the other players' policies. We can, though, define a notion of best-response.

Definition 1 For a game, the best-response function for player i, BR_i(π_{-i}), is the set of all policies that are optimal given the other player(s) play the joint policy π_{-i}. A policy π_i is optimal given π_{-i} if and only if,

    ∀π'_i ∈ Π_i    V_i^{⟨π_i, π_{-i}⟩} ≥ V_i^{⟨π'_i, π_{-i}⟩}.

The major advancement that has driven much of the development of game theory, matrix games, and stochastic games is the notion of a best-response equilibrium, or Nash equilibrium (Nash, Jr., 1950).

Definition 2 A Nash equilibrium is a joint policy, π_{i=1...n}, with

    ∀i = 1, ..., n    π_i ∈ BR_i(π_{-i}).

Basically, each player is playing a best-response to the other players' policies. So, no player can do better by changing policies given that all the other players continue to follow the equilibrium policy. What makes the notion of an equilibrium interesting is that at least one, possibly many, exist in all matrix games and stochastic games. This was proven by Nash (1950) for matrix games, Shapley (1953) for zero-sum discounted stochastic games, Fink (1964) for general-sum discounted stochastic games, and Mertens and Neyman (1981) for zero-sum average reward stochastic games. The existence of equilibria of general-sum average reward stochastic games is still an open problem (Filar & Vrieze, 1997). In the Rock-Paper-Scissors example in Table 1(a), the only equilibrium consists of each player playing the mixed strategy where all the actions have equal probability. In the Bach-or-Stravinsky example in Table 1(b), there are three equilibria. Two consist of both players selecting their first action or both selecting their second. The third involves both players selecting their preferred cooperative action with probability 2/3, and the other action with probability 1/3.
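Definitions 1 and 2 suggest a direct numerical check: against a fixed opponent strategy, some pure strategy is always a best-response, so a candidate joint strategy in a two-player matrix game is a Nash equilibrium exactly when no pure deviation improves either player's payoff. A small sketch (ours):

    import numpy as np

    def is_nash(R_row, R_col, x, y, tol=1e-8):
        """x, y: mixed strategies. It suffices to compare against pure deviations,
        since some pure strategy always attains the best-response value."""
        row_val, col_val = x @ R_row @ y, x @ R_col @ y
        return (np.max(R_row @ y) <= row_val + tol and   # no row deviation helps
                np.max(x @ R_col) <= col_val + tol)      # no column deviation helps

    R_row = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
    uniform = np.ones(3) / 3
    print(is_nash(R_row, -R_row, uniform, uniform))              # True: equilibrium
    print(is_nash(R_row, -R_row, np.array([1., 0, 0]), uniform)) # False: pure Rock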

2.4 Learning in Stochastic Games

Learning in stochastic games has received much attention in recent years as the natural extension of MDPs to multiple agents. The Minimax-Q algorithm (Littman, 1994) was the first reinforcement learning algorithm to explicitly consider the stochastic game framework. Developed for discounted reward, zero-sum stochastic games, the essence of the algorithm was to use Q-learning to learn the values of joint actions. The value of the next state was then computed by solving for the value of the unique Nash equilibrium of that state's Q-values. Littman proved that under usual exploration requirements, Minimax-Q would converge to the Nash equilibrium of the game, independent of the opponent's play. Other algorithms have since been presented for learning in stochastic games. We will summarize these algorithms by broadly grouping them into two categories: equilibria learners and best-response learners. The main focus of this summarization is to demonstrate how the existence of equilibria under limitations is a critical question for existing algorithms.

Equilibria Learners. Minimax-Q has been extended in many different ways. Nash-Q (Hu & Wellman, 1998), Friend-or-Foe-Q (Littman, 2001), and Correlated-Q (Greenwald & Hall, 2002) are all variations on this same theme with different restrictions on the applicable class of games or the notion of equilibria learned. All of the algorithms, though, seek to learn an equilibrium of the game directly, by iteratively computing intermediate equilibria. They are, generally speaking, guaranteed to converge to their part of an equilibrium solution regardless of the play or convergence of the other agents. We refer collectively to these algorithms as equilibria learners. What is important to observe is that these algorithms depend explicitly on the existence of equilibria. If an agent or agents were limited in such a way that no equilibria existed, then these algorithms would be, for the most part, ill-defined.²

2. It should be noted that in the case of Minimax-Q, the algorithm and solution concept are still well-defined. A policy that maximizes its worst-case value may still exist even if limitations make it such that no equilibria exist. But this minimax optimal policy might not necessarily be part of an equilibrium. Later, in Section 4, Fact 5, we present an example of a zero-sum stochastic game and agent limitations where the minimax optimal policies exist but do not comprise an equilibrium.

Best-Response Learners. Another class of algorithms is the class of best-response learners. These algorithms do not explicitly seek to learn equilibria, instead seeking to learn best-responses to the other agents. The simplest example of one of these algorithms is Q-learning (Watkins, 1989). Although not an explicitly multiagent algorithm, it was one of the first algorithms applied to multiagent environments (Tan, 1993; Sen, Sekaran, & Hale, 1994). Another less naive best-response learning algorithm is WoLF-PHC (Bowling & Veloso, 2002), which varies the learning rate to account for the other agents learning simultaneously. Other best-response learners include Fictitious Play (Robinson, 1951; Vrieze, 1987), Opponent-Modeling Q-Learning (Uther & Veloso, 1997), Joint Action Learners (Claus & Boutilier, 1998), and any single-agent learning algorithm that learns optimal policies. Although these algorithms have no explicit dependence on equilibria, there is an important implicit dependence. If algorithms that learn best-responses converge when playing each other, then it must be to a Nash equilibrium (Bowling & Veloso, 2002). Therefore, Nash equilibria are, and are the only, learning fixed points. In the context of agent limitations, this means that if limitations cause equilibria to not exist, then best-response learners could not converge. This is exactly one of the problems faced by Q-learning in stochastic games. Q-learning is limited to deterministic policies. This deterministic policy limitation can, in fact, cause no equilibria to exist (see Fact 1 in Section 4). So there are many games for which Q-learning cannot converge when playing with other best-response learners, such as other Q-learners.

In summary, both equilibria and best-response learners depend on the existence of equilibria. The next section explores agent limitations that are likely to be faced in realistic learning situations. In Section 4, we then present our main results examining the effect these limitations have on the existence of equilibria, and consequently on both equilibria and best-response learners.
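For concreteness, the per-state computation at the heart of an equilibria learner such as Minimax-Q — solving the zero-sum matrix game formed by the state's Q-values — can be written as a small linear program. This sketch is ours and assumes SciPy is available:

    import numpy as np
    from scipy.optimize import linprog

    def minimax_strategy(Q):
        """Row player's maximin mixed strategy and game value for payoff matrix Q.
        Variables [x_1..x_m, v]; maximize v s.t. x^T Q[:, j] >= v for every j."""
        m, k = Q.shape
        c = np.zeros(m + 1); c[-1] = -1.0                # minimize -v
        A_ub = np.hstack([-Q.T, np.ones((k, 1))])        # v - x^T Q[:, j] <= 0
        b_ub = np.zeros(k)
        A_eq = np.ones((1, m + 1)); A_eq[0, -1] = 0.0    # probabilities sum to 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, 1)] * m + [(None, None)], method="highs")
        return res.x[:m], res.x[-1]

    x, v = minimax_strategy(np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]]))
    print(np.round(x, 3), round(v, 3))   # ~[0.333 0.333 0.333], value 0.0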

3. Limitations

The solution concept of Nash equilibria depends on all the agents playing optimally. From the agent development perspective, agents have limitations that prevent this from being a reality. The working definition of limitation in this article is anything that can restrict the agent from learning or playing optimal policies. Broadly speaking, limitations can be classified into two categories: physical limitations and rational limitations. Physical limitations are those caused by the interaction of the agent with its environment and are often unavoidable. Rational limitations are limitations specifically chosen by the agent designer to make the learning problem tractable, either in memory or time. We briefly explore some of these limitations informally before presenting a formal model of limitations that attempts to capture their effect within the stochastic game framework.

3.1 Physical Limitations

One obvious physical limitation is that the agent simply is broken. A mobile agent may cease to move, or less drastically may lose the use of one of its actuators, preventing certain movements. Similarly, another agent may appear to be broken when in fact the motion is simply outside its capabilities. For example, in a mobile robot environment where the rules allow robots to move up to two meters per second, there may be a robot that isn't capable of reaching that speed. An agent that is not broken may suffer from poor control where its actions aren't always carried out as desired, e.g., due to poorly tuned servos, inadequate wheel traction, or high system latency. Another common physical limitation is hardwired behavior. Most agents in dynamic domains need some amount of hard-wiring for fast response and safety. For example, many mobile robot platforms are programmed to immediately stop if an obstacle is too close. These hardwired actions prevent certain behavior by the agent, which is often unsafe but is potentially optimal. Sensing is a common area of agent limitations, containing everything from noise to partial observability. Here we'll mention just one broad category of sensing problems: state aliasing. This occurs when an agent cannot distinguish between two different states of the world. An agent may need to remember past states and actions in order to properly distinguish the states, or may simply execute the same action in both states.

3.2 Rational Limitations

Rational limitations are a requirement for agents to learn in even moderately sized problems. Techniques for making learning scale, which often focus on near-optimal solutions, continue to be proposed and investigated in single-agent learning. They are likely to be even more necessary in multiagent environments, which tend to have larger state spaces. We will examine a few specific methods. In domains with sparse rewards one common technique is reward shaping, e.g., (Mataric, 1994). A designer artificially rewards the agent for actions the designer believes to be progressing toward the sparse rewards. This can often speed learning by focusing exploration, but also can cause the agent to learn suboptimal policies. For example, in robotic soccer, moving the ball down the field is a good heuristic for goal progression, but at times the optimal goal-scoring policy is to pass the ball backwards to an open teammate. Subproblem reuse also has a similar effect, where a subgoal is used in a portion of the state space to speed learning, e.g., (Hauskrecht, Meuleau, Kaelbling, Dean, & Boutilier, 1998; Bowling & Veloso, 1999). These subgoals, though, may not be optimal for the global problem and so prevent the agent from playing optimally. Temporally abstract options, either provided (Sutton, Precup, & Singh, 1998) or learned (McGovern & Barto, 2001; Uther, 2002), also enforce a particular subpolicy on a portion of the state space. Although in theory the primitive actions are still available to the agents to play optimal policies, in practice abstraction away from primitive actions is often necessary in large or continuous state spaces. Parameterized policies are receiving a great deal of attention as a way for reinforcement learning to scale to large problems, e.g., (Williams & Baird, 1993; Sutton et al., 2000; Baxter & Bartlett,

2001). The idea is to give the learner a policy that depends on far fewer parameters than the entire policy space actually would require. Learning is then performed in this smaller space of parameters using gradient techniques. This simplifies and speeds learning at the expense of possibly not being able to represent the optimal policy in the parameter space.

3.3 Models of Limitations

This enumeration of limitations shows that there are a number and variety of limitations with which agents may be faced, and they cannot be realistically avoided. In order to understand their impact on equilibria, we model limitations formally within the game theoretic framework. We introduce two models that capture broad classes of limitations: implicit games and restricted policy spaces.

Implicit Games. Limitations may cause an agent to play suboptimally, but it may be that the agent is actually playing optimally in a different game. If this new game can be defined within the stochastic game framework we call this the implicit game, in contrast to the original game called the explicit game. For example, reward shaping adds artificial rewards to help guide the agent's search. Although the agent is no longer learning an optimal policy in the explicit game, it is learning an optimal policy of some game, specifically the game with these additional rewards added to that agent's R_i function. Another example is due to broken actuators preventing an agent from taking some action. The agent may be suboptimal in the explicit game, while still being optimal in the implicit game defined by removing these actions from the agent's action set, A_i. We can formalize this concept in the following definition.

Definition 3 Given a stochastic game (n, S, A_1...n, T, R_1...n), the tuple (n, S, Â_1...n, T̂, R̂_1...n) is an implicit game if and only if it is itself a stochastic game and there exist mappings

    τ_i : S × Â_i → A_i,

such that,

    ∀s, s' ∈ S  ∀â ∈ Â    T̂(s, ⟨â_i⟩_{i=1...n}, s') = T(s, ⟨τ_i(s, â_i)⟩_{i=1...n}, s').

Reward shaping and broken actuators can both be captured within this model. For reward shaping, the implicit game is (n, S, A_1...n, T, R̂_1...n), where R̂_i adds the shaped reward into the original reward, R_i. In this case the τ_i mappings are just the identity, τ_i(s, a) = a. For the broken actuator example, let a_0 ∈ A_i be some null action for agent i and let a_b ∈ A_i be some broken action for agent i that under the limitation has the same effect as the null action. The implicit game, then, is (n, S, A_1...n, T̂, R̂_1...n), where,

    T̂(s, a, s') = T(s, ⟨a_{-i}, a_0⟩, s')   if a_i = a_b,
                   T(s, a, s')                otherwise;

    R̂(s, a)     = R(s, ⟨a_{-i}, a_0⟩)       if a_i = a_b,
                   R(s, a)                    otherwise;

and,

    τ_i(s, a) = a_0  if a = a_b,
                a    otherwise.
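The broken-actuator construction above is mechanical enough to state as code. A sketch under our own representation (T[s][a] a next-state distribution, joint actions as tuples, R a list of per-agent reward tables), remapping agent i's broken action onto its null action:

    def broken_actuator_implicit_game(T, R, i, a_broken, a_null):
        """Build (T_hat, R_hat) in which agent i's broken action behaves like
        its null action; tau_i(s, a) = a_null if a == a_broken else a."""
        def remap(a):                        # the tau mapping, applied jointly
            a = list(a)
            if a[i] == a_broken:
                a[i] = a_null
            return tuple(a)
        T_hat = {s: {a: T[s][remap(a)] for a in T[s]} for s in T}
        R_hat = [{s: {a: R_j[s][remap(a)] for a in R_j[s]} for s in R_j}
                 for R_j in R]
        return T_hat, R_hat

The resulting implicit game can be handed to any standard solver; by the argument below, its equilibria are also equilibria of the explicit game under the limitation.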

Limitations captured by this model can be easily analyzed with respect to their effect on the existence of equilibria. Using the intuitive definition of equilibria as a joint policy such that no player can do better by changing policies, an equilibrium in the implicit game achieves this definition for the explicit game. Since all stochastic games have at least one equilibrium, so must the implicit game, and therefore the explicit game when accounting for the agents' limitations also has an equilibrium. On the other hand, many of the limitations described above cannot be modeled in this way. None of the limitations of abstraction, subproblem reuse, parameterized policies, or state aliasing lend themselves to be described by this model. This leads us to our second, and in many ways more general, model of limitations.

Restricted Policy Spaces. The second model is that of restricted policy spaces, which models limitations as restricting the agent from playing certain policies. For example, a fixed exploration strategy restricts the player to policies that select all actions with some minimum probability. Parameterized policy spaces have a restricted policy space corresponding to the space of policies that can be represented by their parameters. We can define this formally.

Definition 4 A restricted policy space for player i is a non-empty and compact subset, Π̄_i ⊆ Π_i.

The assumption of compactness³ may at first appear strange, but it is not particularly limiting, and is critical for any equilibria analysis. It should be straightforward to see that parameterized policies, exploration, state aliasing (with no memory), and subproblem reuse all can be captured as a restriction on the policies the agent can play. Therefore they can be naturally described as restricted policy spaces. On the other hand, the analysis of the existence of equilibria under this model is not at all straightforward. Since restricted policy spaces capture most of the really interesting limitations we have discussed, this is precisely the focus of the next section. Before moving on to this analysis, we summarize our enumeration of limitations in Table 2. The limitations that we have discussed are listed, along with the model that most naturally captures their effect on agent behavior.

4. Existence of Equilibria

In this section we define formally the concept of restricted equilibria, which account for agents' restricted policy spaces. We then carefully analyze what can be proven about the existence of restricted equilibria. The results presented range from somewhat trivial examples (Facts 1, 2, 3, and 4) and applications of known results from game theory and basic analysis (Theorems 1 and 5) to results that we believe are completely new (Theorems 2, 3, and 4), as well as a critical counterexample to the wider existence of restricted equilibria (Fact 5). But all of the results are in a sense novel, since this specific question has received no direct attention in either the game theory or the multiagent learning literature.

3. Since Π̄_i is a subset of a bounded set, the requirement that Π̄_i is compact merely adds that the limit point of any sequence of elements from the set is also in the set.
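As one concrete instance of Definition 4 (our example, not the article's), a softmax policy over state-action features restricts the agent to the image of a low-dimensional parameter vector; with the parameters θ ranging over a bounded closed set, that image is a compact subset of Π_i:

    import numpy as np

    def softmax_policy(theta, features):
        """pi(s, a) proportional to exp(theta . phi(s, a)). The restricted policy
        space is the image of the low-dimensional theta, not all of Pi_i.
        features[s] is an (n_actions, d) array of features phi(s, a)."""
        policy = {}
        for s, phi in features.items():
            z = phi @ theta
            z -= z.max()                     # for numerical stability
            e = np.exp(z)
            policy[s] = e / e.sum()
        return policy

    # Two states, three actions, but only d = 2 parameters: both states share
    # theta, so (as with state aliasing) their strategies cannot be set freely.
    features = {0: np.array([[1., 0.], [0., 1.], [1., 1.]]),
                1: np.array([[0., 1.], [1., 0.], [1., 1.]])}
    print(softmax_policy(np.array([0.5, -0.5]), features))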

Physical Limitations             Implicit Games    Restricted Policies
Broken Actuators                       X                    X
Hardwired Behavior                     X                    X
Poor Control                           X
State Aliasing                                              X

Rational Limitations             Implicit Games    Restricted Policies
Reward Shaping or Incentives           X
Exploration                            X                    X
State Abstraction/Options                                   X
Subproblems                                                 X
Parameterized Policy                                        X

Table 2: Common agent limitations. The column check-marks correspond to whether the limitation can be modeled straightforwardly using implicit games and/or restricted policy spaces.

4.1 Restricted Equilibria

We begin by defining the concept of equilibria under the model of restricted policy spaces. First we need a notion of best-response that accounts for the players' limitations.

Definition 5 A restricted best-response for player i, BR̄_i(π_{-i}), is the set of all policies from Π̄_i that are optimal given the other player(s) play the joint policy π_{-i}.

We can now use this to define an equilibrium.

Definition 6 A restricted equilibrium is a joint policy, π_{i=1...n}, where,

    π_i ∈ BR̄_i(π_{-i}).

So no player can, within their restricted policy space, do better by changing policies given that all the other players continue to follow the equilibrium policy.

4.2 Existence of Restricted Equilibria

We can now state some results about when equilibria are preserved by restricted policy spaces, and when they are not. Unless otherwise stated (as in Theorems 2 and 4, which only apply to discounted reward), the results presented here apply equally to both the discounted reward and the average reward formulations. We will separate the proofs for the two reward formulations when needed. The first four facts show that the question of the existence of restricted equilibria does not have a trivial answer.

Fact 1 Restricted equilibria do not necessarily exist.

Proof. Consider the Rock-Paper-Scissors matrix game with players restricted to the space of deterministic policies. There are nine joint deterministic policies, and none of these joint policies are equilibria.

Fact 2 There exist restricted policy spaces such that restricted equilibria exist.

Proof. One trivial restricted equilibrium is in the case where all agents have a singleton policy subspace. The singleton joint policy therefore must be a restricted equilibrium.

Fact 3 If π is a Nash equilibrium and π ∈ Π̄, then π is a restricted equilibrium.

Proof. If π is a Nash equilibrium, then we have

    ∀i ∈ {1 ... n}  ∀π'_i ∈ Π_i    V_i^π ≥ V_i^{⟨π'_i, π_{-i}⟩}.

Since Π̄_i ⊆ Π_i, we also have

    ∀i ∈ {1 ... n}  ∀π'_i ∈ Π̄_i    V_i^π ≥ V_i^{⟨π'_i, π_{-i}⟩},

and thus π is a restricted equilibrium.

On the other hand, the converse is not true; not all restricted equilibria are of this trivial variety.

Fact 4 There exist non-trivial restricted equilibria that are neither Nash equilibria nor come from singleton policy spaces.

Proof. Consider the Rock-Paper-Scissors matrix game from Table 1. Suppose the column player is forced, due to some limitation, to play Paper exactly half the time, but is free to choose between Rock and Scissors otherwise. This is a restricted policy space that excludes the only Nash equilibrium of the game. We can solve this game using the implicit game model, by giving the limited player only two actions, s_1 = (0.5, 0.5, 0) and s_2 = (0, 0.5, 0.5), which the player can mix between. This is depicted graphically in Figure 1. We can solve the implicit game and convert the two actions back to actions of the explicit game to find a restricted equilibrium. Notice this restricted equilibrium is not a Nash equilibrium.

[Figure 1: Example of a restricted equilibrium that is not a Nash equilibrium. Here, the column player in Rock-Paper-Scissors is restricted to playing only linear combinations of the strategies s_1 = ⟨1/2, 1/2, 0⟩ and s_2 = ⟨0, 1/2, 1/2⟩. The Nash equilibrium of the explicit game is ⟨1/3, 1/3, 1/3⟩ for each player; the restricted equilibrium is ⟨0, 1/3, 2/3⟩ for the row player and ⟨1/3, 1/2, 1/6⟩ for the column player.]

Notice that the Fact 4 example has a convex policy space, i.e., all linear combinations of policies in the set are also in the set. Also, notice that the Fact 1 counterexample has a non-convex policy space. This suggests that restricted equilibria may exist as long as the restricted policy space is convex. We can prove this for matrix games, but unfortunately it is not generally true for stochastic games.
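Fact 4's restricted equilibrium can be computed exactly as the proof describes: form the 3×2 implicit payoff matrix whose columns are the limited player's two available mixtures, solve the resulting zero-sum game, and map the column strategy back to explicit actions. A sketch (ours), reusing the minimax_strategy routine from the Section 2.4 example; its output matches the strategies listed in Figure 1:

    import numpy as np

    R_row = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
    s1, s2 = np.array([0.5, 0.5, 0.0]), np.array([0.0, 0.5, 0.5])

    M = R_row @ np.column_stack([s1, s2])   # 3x2 implicit game for the row player
    x, v = minimax_strategy(M)              # unrestricted row player
    q, _ = minimax_strategy(-M.T)           # restricted column player
    y = q[0] * s1 + q[1] * s2               # back to explicit actions
    print(np.round(x, 3), np.round(y, 3), round(v, 3))
    # ~[0. 0.333 0.667], [0.333 0.5 0.167], value 1/6 to the unrestricted player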

Theorem 1 When |S| = 1, i.e., in matrix games, if Π̄_i is convex, then there exists a restricted equilibrium.

Proof. One might think of proving this by appealing to implicit games as was used in Fact 4. In fact, if Π̄_i were a convex hull of a finite number of strategies, this would be the case. In order to prove it for any convex Π̄_i, we apply Rosen's theorem about the existence of equilibria in concave games (Rosen, 1965). In order to use this theorem we need to show the following:

1. Π̄ is non-empty, compact, and convex.
2. V_i^π as a function over π ∈ Π̄ is continuous.
3. For any π ∈ Π̄, the function over π'_i ∈ Π̄_i defined as V_i^{⟨π'_i, π_{-i}⟩} is concave.

Condition 1 is by assumption. In matrix games, where S = {s_0}, we can simplify the definition of a policy's value from Equations 1 and 2:

    V_i^π = (1/(1−γ)) Σ_{a∈A} R_i(s_0, a) ∏_{i=1...n} π_i(s_0, a_i),    (3)

where the factor 1/(1−γ) is taken to be 1 for the average reward formulation. Equation 3 shows that the value is a multilinear function with respect to the joint policy and therefore is continuous. So Condition 2 is satisfied. Observe that by fixing the policies for all but one player, Equation 3 becomes a linear function over the remaining player's policy and so is also concave, satisfying Condition 3. Therefore Rosen's theorem applies and this game has a restricted equilibrium.

Fact 5 For a stochastic game, even if Π̄_i is convex, restricted equilibria do not necessarily exist.

Proof. Consider the stochastic game in Figure 2. This is a zero-sum game where only the payoffs to the row player are shown. The discount factor γ ∈ (0, 1). The actions available to the row player are U and D, and for the column player L or R. From the initial state, the column player may select either L or R, which results in no rewards but with high probability, 1 − ε, transitions to the specified state (regardless of the row player's action), and with low probability, ε, transitions to the opposite state. Notice that this stochasticity is not explicitly shown in Figure 2. In each of the resulting states the players play the matrix game shown and then deterministically transition back to the initial state. Notice that this game is unichain, where all the states are in a single ergodic set, thus satisfying the average reward formulation requirement.

13 L s R s L s R Fgure 2: An example stochastc game where convex restrcted polcy spaces don t preserve the exstence of equlbra. Now consder the restrcted polcy space where players have to play ther actons wth the same probablty n all states. So, Π = { π Π s, s S a A π (s, a) = π (s, a) }. (4) Notce that ths s a convex set of polces. That s, f polces x and x 2 are n Π (accordng to Equaton 4), then for any α [, ], x 3 must also be n Π, where, x 3 (s, a) = αx (s, a) + ( α)x 2 (s, a). (5) Ths can be seen by examnng x 3 (s, a) for any s S. From Equaton 5, we have, x 3 (s, a) = αx (s, a) + ( α)x 2 (s, a) (6) = αx (s, a) + ( α)x 2 (s, a) (7) = x 3 (s, a). (8) Therefore, x 3 s n Π and hence Π s convex. Ths game, though, does not have a restrcted equlbrum. The four possble jont determnstc polces, (U, L), (U, R), (D, L), and (D, R), are not equlbra. So f there exsts an equlbrum t must be mxed. Consder any mxed strategy for the row player. If ths strategy plays U wth probablty less than 2 then the unque best-response for the column player s to play L; f greater than 2 then the unque best-response s to play R; f equal then the unque best-responses are to play L or R determnstcally. In all cases all best-responses are determnstc, so ths rules out mxed strategy equlbra, and so no equlbra exsts. Convexty s not a strong enough property to guarantee the exstence of restrcted equlbra. Standard equlbrum proof technques fal for ths example due to the fact that the player s bestresponse sets are not convex, even though ther restrcted polcy spaces are convex. Notce that the best-response to the row player mxng equally between actons s to play ether of ts actons determnstcally. But, lnear combnatons of these actons (e.g., mxng equally) are not bestresponses. Ths ntuton s proven n the followng lemma. Lemma For any stochastc game, f Π s convex and for all π Π, BR (π ) s convex, then there exsts a restrcted equlbrum. 3

Proof. The proof relies on Kakutani's fixed point theorem. We first need to show some facts about the restricted best-response function. First, remember that Π̄_i is non-empty and compact. Also, note that the value (with both discounted and average reward) to a player at any state of a joint policy is a continuous function of that joint policy (Filar & Vrieze, 1997, Lemma 5.1.4). Therefore, from basic analysis (Gaughan, 1993, Theorem 3.5 and Corollary 3.1), the set of maximizing (or optimal) points must be a non-empty and compact set. So BR̄_i(π_{-i}) is non-empty and compact. Define the set-valued function, F : Π̄ → Π̄,

    F(π) = ×_{i=1}^{n} BR̄_i(π_{-i}).

We want to show F has a fixed point. To apply Kakutani's fixed point theorem we must show the following conditions to be true:

1. Π̄ is a non-empty, compact, and convex subset of a Euclidean space,
2. F(π) is non-empty,
3. F(π) is compact and convex, and
4. F is upper hemi-continuous.

Since the Cartesian product of non-empty, compact, and convex sets is non-empty, compact, and convex, we have condition (1) by the assumptions on Π̄_i. By the facts about BR̄_i from above and the lemma's assumptions we similarly get conditions (2) and (3). What remains is to show condition (4). Consider two sequences x^j → x ∈ Π̄ and y^j → y ∈ Π̄ such that ∀j, y^j ∈ F(x^j). It must be shown that y ∈ F(x), or just y_i ∈ BR̄_i(x_{-i}). Let v be y_i's value against x_{-i}. By contradiction, assume there exists a y'_i with higher value, v'; let δ = v' − v. Since the value function is continuous, we can choose an N large enough that the value of y'_i against x^N differs from v' by at most δ/4,⁴ and the value of y_i against x^N differs from v by at most δ/4, and the value of y^N_i against x^N differs from the value of y_i against x^N by at most δ/4. The comparison of values of these various joint policies is shown in Figure 3. Adding all of these together, we have a point in the sequence, y^{n>N}, whose value against x^n is less than the value of y'_i against x^n. So y^n_i ∉ BR̄_i(x^n_{-i}), and therefore y^n ∉ F(x^n), creating our contradiction.

[Figure 3: An illustration of the demonstration by contradiction that the best-response functions are upper hemi-continuous. The values of the joint policies (y', x), (y', x^N), (y^N, x^N), (y, x^N), and (y, x) range from v' down to v, with consecutive differences of at most δ/4.]

We can now apply Kakutani's fixed point theorem. So there exists π ∈ Π̄ such that π ∈ F(π). This means π_i ∈ BR̄_i(π_{-i}), and therefore this is a restricted equilibrium.

4. This value is arbitrarily selected and is only required to be strictly smaller than δ/3.

The consequence of this lemma is that, if we can prove that the sets of restricted best-responses are convex, then restricted equilibria exist. As we have stated earlier, this was not true of the counterexample in Fact 5. The next four theorems all further limit either the restricted policy spaces or the stochastic game to situations where the best-response sets are provably convex. We will first examine a specific class of restricted policy spaces, and then examine specific classes of stochastic games.

4.2.1 A SUBCLASS OF RESTRICTED POLICIES

Our first result for general stochastic games uses a stronger notion of convexity for restricted policy spaces.

Definition 7 A restricted policy space Π̄_i is statewise convex if it is the Cartesian product over all states of convex strategy sets. Equivalently, if for all x_1, x_2 ∈ Π̄_i and all functions α : S → [0, 1], the policy

    x_3(s, a) = α(s)x_1(s, a) + (1 − α(s))x_2(s, a)

is also in Π̄_i.

Theorem 2 In the discounted reward formulation, if Π̄_i is statewise convex, then there exists a restricted equilibrium.

Proof. With statewise convex policy spaces, there exist optimal policies in the strong sense, as mentioned in Section 2.2. Specifically, there exists a policy that can simultaneously maximize the value of all states. Formally, for any π_{-i} there exists a π_i ∈ Π̄_i such that,

    ∀s ∈ S  ∀π'_i ∈ Π̄_i    V_i^{⟨π_i, π_{-i}⟩}(s) ≥ V_i^{⟨π'_i, π_{-i}⟩}(s).

Suppose this were not true, i.e., there were two policies, each of which maximized the value of different states. We can construct a new policy that in each state follows the policy whose value is larger for that state. This policy will maximize the value of both states that those policies maximized, and due to statewise convexity is also in Π̄_i. We will use that fact to redefine optimality to this strong sense for this proof. We will now make use of Lemma 1. First, notice the lemma's proof still holds even with this new definition of optimality. We just showed that under this redefinition, BR̄_i(π_{-i}) is non-empty, and the same argument for compactness of BR̄_i(π_{-i}) holds. So we can make use of Lemma 1 and what remains is to prove that BR̄_i(π_{-i}) is convex. Since π_{-i} is a fixed policy for all the other players, this defines an MDP for player i (Filar & Vrieze, 1997, Corollary 4.2.1). So we need to show that the set of policies from the player's restricted set that are optimal for this MDP is a convex set. Concretely, if x_1, x_2 ∈ Π̄_i are optimal for this MDP, then the policy x_3(s, a) = αx_1(s, a) + (1 − α)x_2(s, a) is also optimal for any α ∈ [0, 1]. Since x_1 and x_2 are optimal in the strong sense, i.e., maximizing the value of all states simultaneously, they must have the same per-state value. Here, we will use the notation V^x(s) to refer to the value of policy x from state s in this fixed MDP. The value function for any policy satisfies the Bellman equations, specifically,

    ∀s ∈ S    V^x(s) = Σ_a x(s, a) ( R(s, a) + γ Σ_{s'} T(s, a, s') V^x(s') ).    (9)

For x_3 then we get the following,

    V^{x_3}(s) = Σ_a x_3(s, a) ( R(s, a) + γ Σ_{s'} T(s, a, s') V^{x_3}(s') )    (10)

               = Σ_a ( αx_1(s, a) + (1 − α)x_2(s, a) ) ( R(s, a) + γ Σ_{s'} T(s, a, s') V^{x_3}(s') )    (11)
               = α Σ_a x_1(s, a) ( R(s, a) + γ Σ_{s'} T(s, a, s') V^{x_3}(s') )
                 + (1 − α) Σ_a x_2(s, a) ( R(s, a) + γ Σ_{s'} T(s, a, s') V^{x_3}(s') ).    (12)

Notice that V^{x_3}(s) = V^{x_1}(s) = V^{x_2}(s) satisfies these equations. So x_3 has the same values as x_1 and x_2, and is therefore also optimal. Therefore BR̄_i(π_{-i}) is convex, and from Lemma 1 we get the existence of restricted equilibria under this stricter notion of optimality, which also makes the policies a restricted equilibrium under our original notion of optimality, that is, only maximizing the value of the initial state.

4.2.2 SUBCLASSES OF STOCHASTIC GAMES

Unfortunately, most rational limitations that allow reinforcement learning to scale are not statewise convex restrictions, and usually have some dependence between states. For example, parameterized policies involve far fewer parameters than the number of states, which can be intractably large, and so the space of policies cannot select actions at each state independently. Similarly, subproblems force whole portions of the state space to follow the same subproblem solution. Therefore, these portions of the state space do not select their actions independently. One way to relax from statewise convexity to general convexity is to consider only a subset of stochastic games.

Theorem 3 Consider no-control stochastic games, where all transitions are independent of the players' actions, i.e.,

    ∀s, s' ∈ S  ∀a, b ∈ A    T(s, a, s') = T(s, b, s').

If Π̄_i is convex, then there exists a restricted equilibrium.

Proof (Discounted Reward). This proof also makes use of Lemma 1, leaving us only to show that BR̄_i(π_{-i}) is convex. Just as in the proof of Theorem 2, we will consider the MDP defined for player i when the other players follow the fixed policy π_{-i}. As before, it suffices to show that for this MDP, if x_1, x_2 ∈ Π̄_i are optimal, then the policy x_3(s, a) = αx_1(s, a) + (1 − α)x_2(s, a) is also optimal for any α ∈ [0, 1]. Again we use the notation V^π(s) to refer to the traditional value of a policy π at state s in this fixed MDP. Since T(s, a, s') is independent of a, we can simplify the Bellman equations (Equation 9) to

    V^x(s) = Σ_a x(s, a)R(s, a) + γ Σ_{s'} Σ_a x(s, a) T(s, a, s') V^x(s')    (13)
           = Σ_a x(s, a)R(s, a) + γ Σ_{s'} T(s, ·, s') V^x(s').               (14)

For the policy x_3, the value of state s is then,

    V^{x_3}(s) = α Σ_a x_1(s, a)R(s, a) + (1 − α) Σ_a x_2(s, a)R(s, a) + γ Σ_{s'} T(s, ·, s') V^{x_3}(s').    (15)

Using Equation 14 for both x_1 and x_2 we get,

    V^{x_3}(s) = α ( V^{x_1}(s) − γ Σ_{s'} T(s, ·, s') V^{x_1}(s') )
                 + (1 − α) ( V^{x_2}(s) − γ Σ_{s'} T(s, ·, s') V^{x_2}(s') )
                 + γ Σ_{s'} T(s, ·, s') V^{x_3}(s')    (16)
               = αV^{x_1}(s) + (1 − α)V^{x_2}(s)
                 + γ Σ_{s'} T(s, ·, s') ( V^{x_3}(s') − αV^{x_1}(s') − (1 − α)V^{x_2}(s') ).    (17)

Notice that a solution to these equations is V^{x_3}(s) = αV^{x_1}(s) + (1 − α)V^{x_2}(s). Therefore V^{x_3}(s') is equal to V^{x_1}(s') and V^{x_2}(s'), which are equal since both are optimal. So x_3 is optimal, and BR̄_i(π_{-i}) is convex. Applying Lemma 1, we get that restricted equilibria exist.

Proof (Average Reward). An equivalent definition to Equation 2 of a policy's average reward is,

    V^π(s) = Σ_{s'∈S} d^π(s') Σ_a π(s', a) R(s', a),    (18)

where d^π(s') defines the distribution over states visited while following π after infinite time. For a stochastic game or MDP that is unichain, we know that this distribution is independent of the initial state. In the case of no-control stochastic games or MDPs, this distribution becomes independent of the actions and policies of the players, and depends solely on the transition probabilities. So Equation 18 can be written,

    V^π(s) = Σ_{s'∈S} d(s') Σ_a π(s', a) R(s', a).    (19)

As before, we must show that BR̄_i(π_{-i}) is convex to apply Lemma 1. Consider the MDP defined for player i when the other players follow the policy π_{-i}. It suffices to show that for this MDP, if x_1, x_2 ∈ Π̄_i are optimal, then the policy x_3(s, a) = αx_1(s, a) + (1 − α)x_2(s, a) is also optimal for any α ∈ [0, 1]. Using Equation 19, we can write the value of x_3 as,

    V^{x_3}(s) = Σ_{s'} d(s') Σ_a x_3(s', a) R(s', a)    (20)
               = Σ_{s'} d(s') Σ_a ( αx_1(s', a) + (1 − α)x_2(s', a) ) R(s', a)    (21)
               = Σ_{s'} d(s') ( α Σ_a x_1(s', a) R(s', a) + (1 − α) Σ_a x_2(s', a) R(s', a) )    (22)
               = α Σ_{s'} d(s') Σ_a x_1(s', a) R(s', a) + (1 − α) Σ_{s'} d(s') Σ_a x_2(s', a) R(s', a)    (23)
               = αV^{x_1}(s) + (1 − α)V^{x_2}(s).    (24)

Therefore x_3 has the same average reward as x_1 and x_2 and so is also optimal. So BR̄_i(π_{-i}) is convex and by Lemma 1 there exists an equilibrium.
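Equation 19 makes the average-reward value linear in the policy because the state distribution d is policy-independent; this is easy to check numerically. A sketch (ours) for a two-state no-control chain:

    import numpy as np

    def stationary_distribution(P):
        """Solve d = d P with sum(d) = 1 for an ergodic chain (least squares)."""
        n = P.shape[0]
        A = np.vstack([P.T - np.eye(n), np.ones(n)])
        b = np.zeros(n + 1); b[-1] = 1.0
        return np.linalg.lstsq(A, b, rcond=None)[0]

    def average_value(d, policy, R):
        """V = sum_s d(s) sum_a pi(s, a) R(s, a); policy, R: (n_states, n_actions)."""
        return float(np.sum(d[:, None] * policy * R))

    P = np.array([[0.1, 0.9], [0.6, 0.4]])   # action-independent transitions
    R = np.array([[1.0, 0.0], [0.0, 2.0]])
    d = stationary_distribution(P)
    x1 = np.array([[1.0, 0.0], [1.0, 0.0]])
    x2 = np.array([[0.0, 1.0], [0.0, 1.0]])
    x3 = 0.5 * x1 + 0.5 * x2                 # a convex combination of policies
    assert abs(average_value(d, x3, R) -
               0.5 * (average_value(d, x1, R) + average_value(d, x2, R))) < 1e-12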

We can now merge Theorem 2 and Theorem 3, allowing us to prove existence of equilibria for a general class of games where only one of the players' actions affects the next state.

Theorem 4 Consider single-controller stochastic games (Filar & Vrieze, 1997), where all transitions depend solely on player 1's actions, i.e.,

    ∀s, s' ∈ S  ∀a, b ∈ A    a_1 = b_1 ⇒ T(s, a, s') = T(s, b, s').

In the discounted reward formulation, if Π̄_1 is statewise convex and Π̄_{i≠1} is convex, then there exists a restricted equilibrium.

Proof. This proof again makes use of Lemma 1, leaving us to show that BR̄_i(π_{-i}) is convex. For i = 1 we use the argument from the proof of Theorem 2. For i ≠ 1 we use the argument from Theorem 3.

The previous results have looked at stochastic games whose transition functions have particular properties. Our final theorem examines stochastic games where the rewards have a particular structure. Specifically, we address team games, where the agents all receive equal payoffs.

Theorem 5 For team games, i.e.,

    ∀i, j ∈ {1, ..., n}  ∀s ∈ S  ∀a ∈ A    R_i(s, a) = R_j(s, a),

there exists a restricted equilibrium.

Proof. The only constraints on the players' restricted policy spaces are those stated at the beginning of this section: non-empty and compact. Since Π̄ is compact, being a Cartesian product of compact sets, and player one's value in either formulation is a continuous function of the joint policy, the value function attains its maximum (Gaughan, 1993, Corollary 3.1). Specifically, there exists π ∈ Π̄ such that,

    ∀π' ∈ Π̄    V_1^π ≥ V_1^{π'}.

Since V_i = V_1 we then get that the policy π maximizes all the players' rewards, and so each must be playing a restricted best-response to the others' policies.

4.3 Summary

Facts 1 and 5 provide counterexamples that show the threat limitations pose to equilibria. Theorems 1, 2, 4, and 5 give us four general classes of stochastic games and restricted policy spaces where equilibria are guaranteed to exist. The fact that equilibria do not exist in general raises concerns about equilibria as a general basis for multiagent learning in domains where agents have limitations. On the other hand, combined with the model of implicit games, the presented theoretical results lay the initial groundwork for understanding when equilibria can be relied on and when their existence may be in question. These contributions also provide some formal foundation for applying multiagent learning in limited agent problems.

5. Learning with Limitations

In Section 2, we highlighted the importance of the existence of equilibria to multiagent learning algorithms. This section presents results of applying a particular learning algorithm to a setting of limited agents. We use the best-response learner WoLF-PHC (Bowling & Veloso, 2002). This algorithm is rational, that is, it is guaranteed to converge to a best-response if the other players' policies converge. In addition, it has been empirically shown to converge in self-play, where both players use WoLF-PHC for learning. In this article we apply this algorithm in self-play to matrix games, both with and without player limitations. Since the algorithm is rational, if the players converge, their converged policies must be an equilibrium (Bowling & Veloso, 2002). The specific limitations we examine fall into both the restricted policy space model as well as the implicit game model. One player is restricted to playing strategies that are the convex hull of a subset of the available strategies. From Theorem 1, there exists a restricted equilibrium with these limitations. For best-response learners, this amounts to a possible convergence point for the players. For the limited player, the WoLF-PHC algorithm is modified slightly so that the player maintains Q-values of its restricted set of available strategies and performs its usual hill-climbing in the mixed space of these strategies. The unlimited player is unchanged and completely uninformed of the limitation of its opponent.

5.1 Rock-Paper-Scissors

The first game we examine is Rock-Paper-Scissors. Figure 4 shows the results of learning when neither player is limited. Each graph shows the mixed policy the player is playing over time. The labels to the right of the graph signify the probabilities of each action in the game's unique Nash equilibrium. Observe that the players' strategies converge to this learning fixed point.

[Figure 4: WoLF-PHC in Rock-Paper-Scissors. Neither player is limited. Each plot shows one player's probabilities P(Rock), P(Paper), P(Scissors) over time; both converge to the equilibrium, with E(Reward) = 0 for each player.]
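For reference, the following is a condensed sketch of the WoLF-PHC update for a single matrix-game state, written from the algorithm's published description (Bowling & Veloso, 2002) with our own simplifications — in particular, the crude projection back onto the probability simplex — so it should be read as an illustration, not the authors' implementation:

    import numpy as np

    class WoLFPHC:
        """Win or Learn Fast policy hill-climbing for one matrix-game state."""
        def __init__(self, n_actions, alpha=0.1, d_win=0.01, d_lose=0.04):
            self.Q = np.zeros(n_actions)
            self.pi = np.ones(n_actions) / n_actions      # current policy
            self.avg_pi = np.ones(n_actions) / n_actions  # long-run average policy
            self.count = 0.0
            self.alpha, self.d_win, self.d_lose = alpha, d_win, d_lose

        def act(self, rng):
            return rng.choice(len(self.pi), p=self.pi)

        def update(self, a, reward):
            self.Q[a] += self.alpha * (reward - self.Q[a])
            self.count += 1.0
            self.avg_pi += (self.pi - self.avg_pi) / self.count
            # "winning" if the current policy outperforms the average policy
            delta = (self.d_win if self.pi @ self.Q > self.avg_pi @ self.Q
                     else self.d_lose)
            step = np.full(len(self.pi), -delta / (len(self.pi) - 1))
            step[np.argmax(self.Q)] = delta               # climb toward greedy action
            self.pi = np.clip(self.pi + step, 0.0, 1.0)
            self.pi /= self.pi.sum()                      # simplified projection

    rng = np.random.default_rng(0)
    agents = [WoLFPHC(3), WoLFPHC(3)]
    R_row = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
    for _ in range(50000):
        a1, a2 = agents[0].act(rng), agents[1].act(rng)
        agents[0].update(a1, R_row[a1, a2])
        agents[1].update(a2, -R_row[a1, a2])

The larger losing step (d_lose > d_win) is the "learn fast while losing" principle that stabilizes self-play; two such learners on Rock-Paper-Scissors drift toward the uniform fixed point, consistent with Figure 4.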

Figure 5 shows the results of restricting player 1 to a convex restricted policy space, defined by requiring the player to play Paper exactly half the time. This is the same restriction as shown graphically in Figure 1. The graphs again show the players' strategies over time, and the labels to the right now label the game's restricted equilibrium, which accounts for the limitation (see Figure 1). The players' strategies now converge to this new learning fixed point. If we examine the expected rewards to the players, we see that the unrestricted player gets a higher expected reward in the restricted equilibrium than in the game's Nash equilibrium (1/6 compared to 0). In summary, both players learn optimal best-response policies, with the unrestricted learner appropriately taking advantage of the other player's limitation.

[Figure 5: WoLF-PHC in Rock-Paper-Scissors. Player 1 must play Paper with probability 0.5. The strategies converge to the restricted equilibrium, with E(Reward) = −0.167 for the limited player and E(Reward) = 0.167 for the unlimited player.]

5.2 Colonel Blotto

The second game we examined is Colonel Blotto (Gintis, 2000), which is also a zero-sum matrix game. In this game, players simultaneously allot regiments to one of two battlefields. If one player allots more armies to a battlefield than the other, they receive a reward of one plus the number of armies defeated, and the other player loses this amount. If the players tie, then the reward is zero for both. In the unlimited game, the row player has four regiments to allot, and the column player has only three. The matrix of payoffs for this game is shown in Figure 6.

[Figure 6: Colonel Blotto Game. The row player's rewards are shown; the column player receives the negative of this reward.]

Figure 7 shows experimental results with unlimited players. The labels on the right signify the probabilities associated with the Nash equilibrium to which the players' strategies converge. Player 1 is then given the limitation that it could only allot two of its armies; the other two would be allotted randomly. This is also a convex restricted policy space and therefore by Theorem 1 has a restricted equilibrium. Figure 8 shows the learning results. The labels to the right correspond to the action probabilities for the restricted equilibrium, which was computed by hand. As in Rock-Paper-Scissors, the players' strategies converge to the new learning fixed point. Similarly, the expected reward for the unrestricted player resulting from the restricted equilibrium is considerably
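Since the payoff matrix in Figure 6 is fully determined by the rules just described, it can be regenerated mechanically; the following sketch (our code) enumerates the allotments and scores each battlefield as one plus the armies defeated:

    import itertools
    import numpy as np

    def blotto_payoffs(row_regiments=4, col_regiments=3, fields=2):
        """Row player's payoff matrix; the column player receives its negation."""
        def allotments(total):
            return [a for a in itertools.product(range(total + 1), repeat=fields)
                    if sum(a) == total]
        rows, cols = allotments(row_regiments), allotments(col_regiments)
        R = np.zeros((len(rows), len(cols)))
        for i, ra in enumerate(rows):
            for j, ca in enumerate(cols):
                for f in range(fields):
                    if ra[f] > ca[f]:
                        R[i, j] += 1 + ca[f]    # win: one plus armies defeated
                    elif ra[f] < ca[f]:
                        R[i, j] -= 1 + ra[f]    # loss: symmetric penalty
        return rows, cols, R

    rows, cols, R = blotto_payoffs()
    print(rows); print(cols); print(R)          # a 5x4 zero-sum payoff matrix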


More information

Measures of Spread IQR and Deviation. For exam X, calculate the mean, median and mode. For exam Y, calculate the mean, median and mode.

Measures of Spread IQR and Deviation. For exam X, calculate the mean, median and mode. For exam Y, calculate the mean, median and mode. Part 4 Measures of Spread IQR and Devaton In Part we learned how the three measures of center offer dfferent ways of provdng us wth a sngle representatve value for a data set. However, consder the followng

More information

Appendix - Normally Distributed Admissible Choices are Optimal

Appendix - Normally Distributed Admissible Choices are Optimal Appendx - Normally Dstrbuted Admssble Choces are Optmal James N. Bodurtha, Jr. McDonough School of Busness Georgetown Unversty and Q Shen Stafford Partners Aprl 994 latest revson September 00 Abstract

More information

Quiz on Deterministic part of course October 22, 2002

Quiz on Deterministic part of course October 22, 2002 Engneerng ystems Analyss for Desgn Quz on Determnstc part of course October 22, 2002 Ths s a closed book exercse. You may use calculators Grade Tables There are 90 ponts possble for the regular test, or

More information

- contrast so-called first-best outcome of Lindahl equilibrium with case of private provision through voluntary contributions of households

- contrast so-called first-best outcome of Lindahl equilibrium with case of private provision through voluntary contributions of households Prvate Provson - contrast so-called frst-best outcome of Lndahl equlbrum wth case of prvate provson through voluntary contrbutons of households - need to make an assumpton about how each household expects

More information

Parallel Prefix addition

Parallel Prefix addition Marcelo Kryger Sudent ID 015629850 Parallel Prefx addton The parallel prefx adder presented next, performs the addton of two bnary numbers n tme of complexty O(log n) and lnear cost O(n). Lets notce the

More information

New Distance Measures on Dual Hesitant Fuzzy Sets and Their Application in Pattern Recognition

New Distance Measures on Dual Hesitant Fuzzy Sets and Their Application in Pattern Recognition Journal of Artfcal Intellgence Practce (206) : 8-3 Clausus Scentfc Press, Canada New Dstance Measures on Dual Hestant Fuzzy Sets and Ther Applcaton n Pattern Recognton L Xn a, Zhang Xaohong* b College

More information

Applications of Myerson s Lemma

Applications of Myerson s Lemma Applcatons of Myerson s Lemma Professor Greenwald 28-2-7 We apply Myerson s lemma to solve the sngle-good aucton, and the generalzaton n whch there are k dentcal copes of the good. Our objectve s welfare

More information

Problem Set #4 Solutions

Problem Set #4 Solutions 4.0 Sprng 00 Page Problem Set #4 Solutons Problem : a) The extensve form of the game s as follows: (,) Inc. (-,-) Entrant (0,0) Inc (5,0) Usng backwards nducton, the ncumbent wll always set hgh prces,

More information

Static (or Simultaneous- Move) Games of Complete Information

Static (or Simultaneous- Move) Games of Complete Information Statc (or Smultaneous- Move) Games of Complete Informaton Nash Equlbrum Best Response Functon F. Valognes - Game Theory - Chp 3 Outlne of Statc Games of Complete Informaton Introducton to games Normal-form

More information

Quadratic Games. First version: February 24, 2017 This version: August 3, Abstract

Quadratic Games. First version: February 24, 2017 This version: August 3, Abstract Quadratc Games Ncolas S. Lambert Gorgo Martn Mchael Ostrovsky Frst verson: February 24, 2017 Ths verson: August 3, 2018 Abstract We study general quadratc games wth multdmensonal actons, stochastc payoff

More information

Optimal Service-Based Procurement with Heterogeneous Suppliers

Optimal Service-Based Procurement with Heterogeneous Suppliers Optmal Servce-Based Procurement wth Heterogeneous Supplers Ehsan Elah 1 Saf Benjaafar 2 Karen L. Donohue 3 1 College of Management, Unversty of Massachusetts, Boston, MA 02125 2 Industral & Systems Engneerng,

More information

Quadratic Games. First version: February 24, 2017 This version: December 12, Abstract

Quadratic Games. First version: February 24, 2017 This version: December 12, Abstract Quadratc Games Ncolas S. Lambert Gorgo Martn Mchael Ostrovsky Frst verson: February 24, 2017 Ths verson: December 12, 2017 Abstract We study general quadratc games wth mult-dmensonal actons, stochastc

More information

Survey of Math: Chapter 22: Consumer Finance Borrowing Page 1

Survey of Math: Chapter 22: Consumer Finance Borrowing Page 1 Survey of Math: Chapter 22: Consumer Fnance Borrowng Page 1 APR and EAR Borrowng s savng looked at from a dfferent perspectve. The dea of smple nterest and compound nterest stll apply. A new term s the

More information

Clearing Notice SIX x-clear Ltd

Clearing Notice SIX x-clear Ltd Clearng Notce SIX x-clear Ltd 1.0 Overvew Changes to margn and default fund model arrangements SIX x-clear ( x-clear ) s closely montorng the CCP envronment n Europe as well as the needs of ts Members.

More information

A New Uniform-based Resource Constrained Total Project Float Measure (U-RCTPF) Roni Levi. Research & Engineering, Haifa, Israel

A New Uniform-based Resource Constrained Total Project Float Measure (U-RCTPF) Roni Levi. Research & Engineering, Haifa, Israel Management Studes, August 2014, Vol. 2, No. 8, 533-540 do: 10.17265/2328-2185/2014.08.005 D DAVID PUBLISHING A New Unform-based Resource Constraned Total Project Float Measure (U-RCTPF) Ron Lev Research

More information

Problem Set 6 Finance 1,

Problem Set 6 Finance 1, Carnege Mellon Unversty Graduate School of Industral Admnstraton Chrs Telmer Wnter 2006 Problem Set 6 Fnance, 47-720. (representatve agent constructon) Consder the followng two-perod, two-agent economy.

More information

MULTIPLE CURVE CONSTRUCTION

MULTIPLE CURVE CONSTRUCTION MULTIPLE CURVE CONSTRUCTION RICHARD WHITE 1. Introducton In the post-credt-crunch world, swaps are generally collateralzed under a ISDA Master Agreement Andersen and Pterbarg p266, wth collateral rates

More information

A DUAL EXTERIOR POINT SIMPLEX TYPE ALGORITHM FOR THE MINIMUM COST NETWORK FLOW PROBLEM

A DUAL EXTERIOR POINT SIMPLEX TYPE ALGORITHM FOR THE MINIMUM COST NETWORK FLOW PROBLEM Yugoslav Journal of Operatons Research Vol 19 (2009), Number 1, 157-170 DOI:10.2298/YUJOR0901157G A DUAL EXTERIOR POINT SIMPLEX TYPE ALGORITHM FOR THE MINIMUM COST NETWORK FLOW PROBLEM George GERANIS Konstantnos

More information

Members not eligible for this option

Members not eligible for this option DC - Lump sum optons R6.2 Uncrystallsed funds penson lump sum An uncrystallsed funds penson lump sum, known as a UFPLS (also called a FLUMP), s a way of takng your penson pot wthout takng money from a

More information

Dynamic Analysis of Knowledge Sharing of Agents with. Heterogeneous Knowledge

Dynamic Analysis of Knowledge Sharing of Agents with. Heterogeneous Knowledge Dynamc Analyss of Sharng of Agents wth Heterogeneous Kazuyo Sato Akra Namatame Dept. of Computer Scence Natonal Defense Academy Yokosuka 39-8686 JAPAN E-mal {g40045 nama} @nda.ac.jp Abstract In ths paper

More information

Production and Supply Chain Management Logistics. Paolo Detti Department of Information Engeneering and Mathematical Sciences University of Siena

Production and Supply Chain Management Logistics. Paolo Detti Department of Information Engeneering and Mathematical Sciences University of Siena Producton and Supply Chan Management Logstcs Paolo Dett Department of Informaton Engeneerng and Mathematcal Scences Unversty of Sena Convergence and complexty of the algorthm Convergence of the algorthm

More information

A Distributed Algorithm for Constrained Multi-Robot Task Assignment for Grouped Tasks

A Distributed Algorithm for Constrained Multi-Robot Task Assignment for Grouped Tasks A Dstrbuted Algorthm for Constraned Mult-Robot Tas Assgnment for Grouped Tass Lngzh Luo Robotcs Insttute Carnege Mellon Unversty Pttsburgh, PA 15213 lngzhl@cs.cmu.edu Nlanjan Charaborty Robotcs Insttute

More information

Lecture Note 2 Time Value of Money

Lecture Note 2 Time Value of Money Seg250 Management Prncples for Engneerng Managers Lecture ote 2 Tme Value of Money Department of Systems Engneerng and Engneerng Management The Chnese Unversty of Hong Kong Interest: The Cost of Money

More information

Bargaining over Strategies of Non-Cooperative Games

Bargaining over Strategies of Non-Cooperative Games Games 05, 6, 73-98; do:0.3390/g603073 Artcle OPEN ACCESS games ISSN 073-4336 www.mdp.com/ournal/games Barganng over Strateges of Non-Cooperatve Games Guseppe Attanas, *, Aurora García-Gallego, Nkolaos

More information

Appendix for Solving Asset Pricing Models when the Price-Dividend Function is Analytic

Appendix for Solving Asset Pricing Models when the Price-Dividend Function is Analytic Appendx for Solvng Asset Prcng Models when the Prce-Dvdend Functon s Analytc Ovdu L. Caln Yu Chen Thomas F. Cosmano and Alex A. Hmonas January 3, 5 Ths appendx provdes proofs of some results stated n our

More information

An Application of Alternative Weighting Matrix Collapsing Approaches for Improving Sample Estimates

An Application of Alternative Weighting Matrix Collapsing Approaches for Improving Sample Estimates Secton on Survey Research Methods An Applcaton of Alternatve Weghtng Matrx Collapsng Approaches for Improvng Sample Estmates Lnda Tompkns 1, Jay J. Km 2 1 Centers for Dsease Control and Preventon, atonal

More information

2.1 Rademacher Calculus... 3

2.1 Rademacher Calculus... 3 COS 598E: Unsupervsed Learnng Week 2 Lecturer: Elad Hazan Scrbe: Kran Vodrahall Contents 1 Introducton 1 2 Non-generatve pproach 1 2.1 Rademacher Calculus............................... 3 3 Spectral utoencoders

More information

Economics 1410 Fall Section 7 Notes 1. Define the tax in a flexible way using T (z), where z is the income reported by the agent.

Economics 1410 Fall Section 7 Notes 1. Define the tax in a flexible way using T (z), where z is the income reported by the agent. Economcs 1410 Fall 2017 Harvard Unversty Yaan Al-Karableh Secton 7 Notes 1 I. The ncome taxaton problem Defne the tax n a flexble way usng T (), where s the ncome reported by the agent. Retenton functon:

More information

Members not eligible for this option

Members not eligible for this option DC - Lump sum optons R6.1 Uncrystallsed funds penson lump sum An uncrystallsed funds penson lump sum, known as a UFPLS (also called a FLUMP), s a way of takng your penson pot wthout takng money from a

More information

Finite Math - Fall Section Future Value of an Annuity; Sinking Funds

Finite Math - Fall Section Future Value of an Annuity; Sinking Funds Fnte Math - Fall 2016 Lecture Notes - 9/19/2016 Secton 3.3 - Future Value of an Annuty; Snkng Funds Snkng Funds. We can turn the annutes pcture around and ask how much we would need to depost nto an account

More information

Linear Combinations of Random Variables and Sampling (100 points)

Linear Combinations of Random Variables and Sampling (100 points) Economcs 30330: Statstcs for Economcs Problem Set 6 Unversty of Notre Dame Instructor: Julo Garín Sprng 2012 Lnear Combnatons of Random Varables and Samplng 100 ponts 1. Four-part problem. Go get some

More information

Global Optimization in Multi-Agent Models

Global Optimization in Multi-Agent Models Global Optmzaton n Mult-Agent Models John R. Brge R.R. McCormck School of Engneerng and Appled Scence Northwestern Unversty Jont work wth Chonawee Supatgat, Enron, and Rachel Zhang, Cornell 11/19/2004

More information

Taxation and Externalities. - Much recent discussion of policy towards externalities, e.g., global warming debate/kyoto

Taxation and Externalities. - Much recent discussion of policy towards externalities, e.g., global warming debate/kyoto Taxaton and Externaltes - Much recent dscusson of polcy towards externaltes, e.g., global warmng debate/kyoto - Increasng share of tax revenue from envronmental taxaton 6 percent n OECD - Envronmental

More information

Optimising a general repair kit problem with a service constraint

Optimising a general repair kit problem with a service constraint Optmsng a general repar kt problem wth a servce constrant Marco Bjvank 1, Ger Koole Department of Mathematcs, VU Unversty Amsterdam, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands Irs F.A. Vs Department

More information

A Polynomial-Time Algorithm for Action-Graph Games

A Polynomial-Time Algorithm for Action-Graph Games A Polynomal-Tme Algorthm for Acton-Graph Games Albert Xn Jang Kevn Leyton-Brown Department of Computer Scence Unversty of Brtsh Columba {ang;evnlb}@cs.ubc.ca Abstract Acton-Graph Games (AGGs) (Bhat & Leyton-Brown

More information

A Game-Theoretic Approach for Integrity Assurance in Resource-Bounded Systems

A Game-Theoretic Approach for Integrity Assurance in Resource-Bounded Systems Internatonal Journal of Informaton Securty 208 7:22 242 https://do.org/0.007/s0207-07-0364-2 A Game-Theoretc Approach for Integrty Assurance n Resource-Bounded Systems Aron aszka Yevgeny Vorobeychk Xenofon

More information

Fast Laplacian Solvers by Sparsification

Fast Laplacian Solvers by Sparsification Spectral Graph Theory Lecture 19 Fast Laplacan Solvers by Sparsfcaton Danel A. Spelman November 9, 2015 Dsclamer These notes are not necessarly an accurate representaton of what happened n class. The notes

More information

When is the lowest equilibrium payoff in a repeated game equal to the min max payoff?

When is the lowest equilibrium payoff in a repeated game equal to the min max payoff? JID:YJETH AID:3744 /FLA [m1+; v 1.113; Prn:21/08/2009; 11:31] P.1 (1-22) Journal of Economc Theory ( ) www.elsever.com/locate/jet When s the lowest equlbrum payoff n a repeated game equal to the mn max

More information

occurrence of a larger storm than our culvert or bridge is barely capable of handling? (what is The main question is: What is the possibility of

occurrence of a larger storm than our culvert or bridge is barely capable of handling? (what is The main question is: What is the possibility of Module 8: Probablty and Statstcal Methods n Water Resources Engneerng Bob Ptt Unversty of Alabama Tuscaloosa, AL Flow data are avalable from numerous USGS operated flow recordng statons. Data s usually

More information

Notes on experimental uncertainties and their propagation

Notes on experimental uncertainties and their propagation Ed Eyler 003 otes on epermental uncertantes and ther propagaton These notes are not ntended as a complete set of lecture notes, but nstead as an enumeraton of some of the key statstcal deas needed to obtan

More information

Topics on the Border of Economics and Computation November 6, Lecture 2

Topics on the Border of Economics and Computation November 6, Lecture 2 Topcs on the Border of Economcs and Computaton November 6, 2005 Lecturer: Noam Nsan Lecture 2 Scrbe: Arel Procacca 1 Introducton Last week we dscussed the bascs of zero-sum games n strategc form. We characterzed

More information

2) In the medium-run/long-run, a decrease in the budget deficit will produce:

2) In the medium-run/long-run, a decrease in the budget deficit will produce: 4.02 Quz 2 Solutons Fall 2004 Multple-Choce Questons ) Consder the wage-settng and prce-settng equatons we studed n class. Suppose the markup, µ, equals 0.25, and F(u,z) = -u. What s the natural rate of

More information

University of Toronto November 9, 2006 ECO 209Y MACROECONOMIC THEORY. Term Test #1 L0101 L0201 L0401 L5101 MW MW 1-2 MW 2-3 W 6-8

University of Toronto November 9, 2006 ECO 209Y MACROECONOMIC THEORY. Term Test #1 L0101 L0201 L0401 L5101 MW MW 1-2 MW 2-3 W 6-8 Department of Economcs Prof. Gustavo Indart Unversty of Toronto November 9, 2006 SOLUTION ECO 209Y MACROECONOMIC THEORY Term Test #1 A LAST NAME FIRST NAME STUDENT NUMBER Crcle your secton of the course:

More information

University of Toronto November 9, 2006 ECO 209Y MACROECONOMIC THEORY. Term Test #1 L0101 L0201 L0401 L5101 MW MW 1-2 MW 2-3 W 6-8

University of Toronto November 9, 2006 ECO 209Y MACROECONOMIC THEORY. Term Test #1 L0101 L0201 L0401 L5101 MW MW 1-2 MW 2-3 W 6-8 Department of Economcs Prof. Gustavo Indart Unversty of Toronto November 9, 2006 SOLUTION ECO 209Y MACROECONOMIC THEORY Term Test #1 C LAST NAME FIRST NAME STUDENT NUMBER Crcle your secton of the course:

More information

Multifactor Term Structure Models

Multifactor Term Structure Models 1 Multfactor Term Structure Models A. Lmtatons of One-Factor Models 1. Returns on bonds of all maturtes are perfectly correlated. 2. Term structure (and prces of every other dervatves) are unquely determned

More information

Jeffrey Ely. October 7, This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.

Jeffrey Ely. October 7, This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. October 7, 2012 Ths work s lcensed under the Creatve Commons Attrbuton-NonCommercal-ShareAlke 3.0 Lcense. Recap We saw last tme that any standard of socal welfare s problematc n a precse sense. If we want

More information

Stackelberg vs. Nash in Security Games: Interchangeability, Equivalence, and Uniqueness

Stackelberg vs. Nash in Security Games: Interchangeability, Equivalence, and Uniqueness Stackelberg vs. Nash n Securty Games: Interchangeablty, Equvalence, and Unqueness Zhengyu Yn 1, Dmytro Korzhyk 2, Chrstopher Kekntveld 1, Vncent Contzer 2, and Mlnd Tambe 1 1 Unversty of Southern Calforna,

More information

Cyclic Scheduling in a Job shop with Multiple Assembly Firms

Cyclic Scheduling in a Job shop with Multiple Assembly Firms Proceedngs of the 0 Internatonal Conference on Industral Engneerng and Operatons Management Kuala Lumpur, Malaysa, January 4, 0 Cyclc Schedulng n a Job shop wth Multple Assembly Frms Tetsuya Kana and Koch

More information

Final Exam. 7. (10 points) Please state whether each of the following statements is true or false. No explanation needed.

Final Exam. 7. (10 points) Please state whether each of the following statements is true or false. No explanation needed. Fnal Exam Fall 4 Econ 8-67 Closed Book. Formula Sheet Provded. Calculators OK. Tme Allowed: hours Please wrte your answers on the page below each queston. (5 ponts) Assume that the rsk-free nterest rate

More information

WHEN IS THE LOWEST EQUILIBRIUM PAYOFF IN A REPEATED GAME EQUAL TO THE MINMAX PAYOFF? OLIVIER GOSSNER and JOHANNES HÖRNER

WHEN IS THE LOWEST EQUILIBRIUM PAYOFF IN A REPEATED GAME EQUAL TO THE MINMAX PAYOFF? OLIVIER GOSSNER and JOHANNES HÖRNER WHEN IS THE LOWEST EQUILIBRIUM PAYOFF IN A REPEATED GAME EQUAL TO THE MINMAX PAYOFF? BY OLIVIER GOSSNER and JOHANNES HÖRNER COWLES FOUNDATION PAPER NO. 1294 COWLES FOUNDATION FOR RESEARCH IN ECONOMICS

More information

A Single-Product Inventory Model for Multiple Demand Classes 1

A Single-Product Inventory Model for Multiple Demand Classes 1 A Sngle-Product Inventory Model for Multple Demand Classes Hasan Arslan, 2 Stephen C. Graves, 3 and Thomas Roemer 4 March 5, 2005 Abstract We consder a sngle-product nventory system that serves multple

More information

Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning

Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning Mach Learn (2011) 82: 281 314 DOI 10.1007/s10994-010-5192-9 Learnng to compete, coordnate, and cooperate n repeated games usng renforcement learnng Jacob W. Crandall Mchael A. Goodrch Receved: 13 February

More information

Introduction to PGMs: Discrete Variables. Sargur Srihari

Introduction to PGMs: Discrete Variables. Sargur Srihari Introducton to : Dscrete Varables Sargur srhar@cedar.buffalo.edu Topcs. What are graphcal models (or ) 2. Use of Engneerng and AI 3. Drectonalty n graphs 4. Bayesan Networks 5. Generatve Models and Samplng

More information

Economic Design of Short-Run CSP-1 Plan Under Linear Inspection Cost

Economic Design of Short-Run CSP-1 Plan Under Linear Inspection Cost Tamkang Journal of Scence and Engneerng, Vol. 9, No 1, pp. 19 23 (2006) 19 Economc Desgn of Short-Run CSP-1 Plan Under Lnear Inspecton Cost Chung-Ho Chen 1 * and Chao-Yu Chou 2 1 Department of Industral

More information

Ch Rival Pure private goods (most retail goods) Non-Rival Impure public goods (internet service)

Ch Rival Pure private goods (most retail goods) Non-Rival Impure public goods (internet service) h 7 1 Publc Goods o Rval goods: a good s rval f ts consumpton by one person precludes ts consumpton by another o Excludable goods: a good s excludable f you can reasonably prevent a person from consumng

More information

REPUTATION WITHOUT COMMITMENT IN FINITELY-REPEATED GAMES

REPUTATION WITHOUT COMMITMENT IN FINITELY-REPEATED GAMES REPUTATION WITHOUT COMMITMENT IN FINITELY-REPEATED GAMES JONATHAN WEINSTEIN AND MUHAMET YILDIZ A. In the reputaton lterature, players have commtment types whch represent the possblty that they do not have

More information

Elton, Gruber, Brown, and Goetzmann. Modern Portfolio Theory and Investment Analysis, 7th Edition. Solutions to Text Problems: Chapter 9

Elton, Gruber, Brown, and Goetzmann. Modern Portfolio Theory and Investment Analysis, 7th Edition. Solutions to Text Problems: Chapter 9 Elton, Gruber, Brown, and Goetzmann Modern Portfolo Theory and Investment Analyss, 7th Edton Solutons to Text Problems: Chapter 9 Chapter 9: Problem In the table below, gven that the rskless rate equals

More information

PREFERENCE DOMAINS AND THE MONOTONICITY OF CONDORCET EXTENSIONS

PREFERENCE DOMAINS AND THE MONOTONICITY OF CONDORCET EXTENSIONS PREFERECE DOMAIS AD THE MOOTOICITY OF CODORCET EXTESIOS PAUL J. HEALY AD MICHAEL PERESS ABSTRACT. An alternatve s a Condorcet wnner f t beats all other alternatves n a parwse majorty vote. A socal choce

More information

/ Computational Genomics. Normalization

/ Computational Genomics. Normalization 0-80 /02-70 Computatonal Genomcs Normalzaton Gene Expresson Analyss Model Computatonal nformaton fuson Bologcal regulatory networks Pattern Recognton Data Analyss clusterng, classfcaton normalzaton, mss.

More information

Option pricing and numéraires

Option pricing and numéraires Opton prcng and numérares Daro Trevsan Unverstà degl Stud d Psa San Mnato - 15 September 2016 Overvew 1 What s a numerare? 2 Arrow-Debreu model Change of numerare change of measure 3 Contnuous tme Self-fnancng

More information

Note on Cubic Spline Valuation Methodology

Note on Cubic Spline Valuation Methodology Note on Cubc Splne Valuaton Methodology Regd. Offce: The Internatonal, 2 nd Floor THE CUBIC SPLINE METHODOLOGY A model for yeld curve takes traded yelds for avalable tenors as nput and generates the curve

More information

Stochastic Investment Decision Making with Dynamic Programming

Stochastic Investment Decision Making with Dynamic Programming Proceedngs of the 2010 Internatonal Conference on Industral Engneerng and Operatons Management Dhaka, Bangladesh, January 9 10, 2010 Stochastc Investment Decson Makng wth Dynamc Programmng Md. Noor-E-Alam

More information

Facility Location Problem. Learning objectives. Antti Salonen Farzaneh Ahmadzadeh

Facility Location Problem. Learning objectives. Antti Salonen Farzaneh Ahmadzadeh Antt Salonen Farzaneh Ahmadzadeh 1 Faclty Locaton Problem The study of faclty locaton problems, also known as locaton analyss, s a branch of operatons research concerned wth the optmal placement of facltes

More information

Pivot Points for CQG - Overview

Pivot Points for CQG - Overview Pvot Ponts for CQG - Overvew By Bran Bell Introducton Pvot ponts are a well-known technque used by floor traders to calculate ntraday support and resstance levels. Ths technque has been around for decades,

More information

Formation of Coalition Structures as a Non-Cooperative Game

Formation of Coalition Structures as a Non-Cooperative Game Formaton of Coalton Structures as a Non-Cooperatve Game Dmtry Levando Natonal Research Unversty Hgher School of Economcs, Moscow, Russa dlevando@hse.ru Abstract. The paper proposes a lst of requrements

More information

Mathematical Thinking Exam 1 09 October 2017

Mathematical Thinking Exam 1 09 October 2017 Mathematcal Thnkng Exam 1 09 October 2017 Name: Instructons: Be sure to read each problem s drectons. Wrte clearly durng the exam and fully erase or mark out anythng you do not want graded. You may use

More information

Intensive vs Extensive Margin Tradeo s in a Simple Monetary Search Model

Intensive vs Extensive Margin Tradeo s in a Simple Monetary Search Model Intensve vs Extensve Margn Tradeo s n a Smple Monetary Search Model Sébasten Lotz y Unversty of Pars 2 Andre Shevchenko z Mchgan State Unversty Aprl 2006 hrstopher Waller x Unversty of Notre Dame Abstract

More information

REFINITIV INDICES PRIVATE EQUITY BUYOUT INDEX METHODOLOGY

REFINITIV INDICES PRIVATE EQUITY BUYOUT INDEX METHODOLOGY REFINITIV INDICES PRIVATE EQUITY BUYOUT INDEX METHODOLOGY 1 Table of Contents INTRODUCTION 3 TR Prvate Equty Buyout Index 3 INDEX COMPOSITION 3 Sector Portfolos 4 Sector Weghtng 5 Index Rebalance 5 Index

More information

Online Appendix for Merger Review for Markets with Buyer Power

Online Appendix for Merger Review for Markets with Buyer Power Onlne Appendx for Merger Revew for Markets wth Buyer Power Smon Loertscher Lesle M. Marx July 23, 2018 Introducton In ths appendx we extend the framework of Loertscher and Marx (forthcomng) to allow two

More information

Meaningful cheap talk must improve equilibrium payoffs

Meaningful cheap talk must improve equilibrium payoffs Mathematcal Socal Scences 37 (1999) 97 106 Meanngful cheap talk must mprove equlbrum payoffs Lanny Arvan, Luıs Cabral *, Vasco Santos a b, c a Unversty of Illnos at Urbana-Champagn, Department of Economcs,

More information

Two Period Models. 1. Static Models. Econ602. Spring Lutz Hendricks

Two Period Models. 1. Static Models. Econ602. Spring Lutz Hendricks Two Perod Models Econ602. Sprng 2005. Lutz Hendrcks The man ponts of ths secton are: Tools: settng up and solvng a general equlbrum model; Kuhn-Tucker condtons; solvng multperod problems Economc nsghts:

More information