Learning in Network Games with Incomplete Information


Ceyhun Eksin, Pooya Molavi, Alejandro Ribeiro, and Ali Jadbabaie

[Asymptotic analysis and tractable implementation of rational behavior]

The role of social networks in learning and opinion formation has been demonstrated in a variety of scenarios such as the dynamics of technology adoption [1], consumption behavior [2], organizational behavior [3], and financial markets [4]. The emergence of network-wide social phenomena from local interactions between connected agents has been studied using field data [5]-[7] as well as lab experiments [8], [9]. Interest in opinion dynamics over networks is further amplified by the continuous growth in the amount of time that individuals spend on social media Web sites and the consequent increase in the importance of networked phenomena in social and economic outcomes. As quantitative data become more readily available, a research problem is to identify metrics that could characterize emergent phenomena such as conformism or diversity in individuals' preferences for consumer products or political ideologies [10]. With these metrics available, a natural follow-up research goal is the study of mechanisms that lead to diversity or conformism and the role of network properties like neighborhood structures on these outcomes. All of these questions motivate the development of theoretical models of opinion formation through local interactions in different scenarios.

The canonical model of learning in networks considers a set of connected agents, each endowed with private information regarding a common underlying random state. Each agent uses his private information to form a probability distribution on the state of the world and selects an action from an allowable set that is optimal with respect to this belief. The definition of optimality with respect to the belief varies, but a general model is to postulate the existence of a utility function that depends on the selected action, the state of the world, and possibly on the actions selected by other members of the network. If the state of the world were known, we would say that we have complete information and select the action that maximizes the utility. However, information is incomplete because only a belief on the state is available. Therefore, agents proceed to select actions that maximize the expectations of their utilities with respect to their beliefs. In a networked setting, agents further observe actions taken by agents in their connectivity neighborhoods. These observations contain information on the state that a rational agent would feel compelled to incorporate into his belief, leading to the selection of a different optimal action. This phenomenon of observations of neighboring actions affecting the decision making of agents is called an information externality. In general, at a given point in time, any given agent has seen a history of neighboring actions that he combines with his private information to update the probability distribution on the state of the world. This belief determines an action that is optimal with respect to the expected utility. As time progresses, agents learn the state of the world $\omega$ in the sense that they refine their knowledge, i.e., the mass of the belief becomes more concentrated, through the observation of neighboring actions. The focus is often on the characterization of asymptotic properties of the beliefs and the actions of agents as well as algorithmic considerations.

When utilities of agents depend not only on the unknown state of the world but also on the unknown actions of other agents, we say that there are payoff externalities. In most social learning scenarios, payoff externalities and information externalities coexist in that the action chosen by agent $i$ is determined by both an informational component pertaining to agents' beliefs about the underlying state of the world and a payoff externality corresponding to their beliefs about the actions taken by all the other agents in the network. In stock markets, for instance, the actions of each individual affect the utility of all the other agents, and agents respond strategically to actions based on their beliefs. At the same time, these actions contain information about stocks' intrinsic valuations that market participants are intent on learning. By selecting certain actions, agents are revealing, perhaps unwillingly, pieces of private information about the true value of the stock [11], [12]. The focus is the study of the asymptotic behavior of agents' actions and their beliefs given a fixed network that determines the flow of information; see the section "Learning with Payoff Externalities."

There exists an extensive literature on Bayesian learning over networks for scenarios without payoff externalities [13]-[16]. One may think of this problem as a variant of distributed estimation since agents intend to compute an estimate based on global information by aggregating local information and successively refining their estimates using those of their neighbors. Linear and nonlinear estimation problems are well studied in the signal processing literature; see, e.g., [17]. The main difference between distributed estimation problems and the ones considered here is that in the former, network nodes may exchange observations, estimates, and even some auxiliary variables [18]-[26]. In the problems considered here, on the other hand, agents try to infer the state of the world by observing actions of neighboring nodes. The former is a suitable model for algorithm and protocol design, but the latter is a more appropriate model of social and economic interactions.
Besides signal processing, models with purely informational externalities have been studied in economics [14], [15], [27], [28], computer science [29], statistics [30], and control theory [31]-[34]. Even though Bayesian learning stands as the normative behavioral model for agents in social networks, it is often computationally intractable even for networks with a small number of agents. This is because a Bayesian update requires an agent to infer not only the information of his neighbors but also that of the neighbors of his neighbors and so on. Because of such computational intractability, little is known about Bayesian learning besides the asymptotic behavior. However, under some structural assumptions on the distribution of information [30] or the network structure [29], Bayesian learning is shown to be tractable in the absence of payoff externalities. In this article, we present a tractable algorithm for the case when agents also face payoff externalities, assuming that agents' initial private signals are normally distributed; see the section "Gaussian Quadratic Network Games." We use the algorithm to numerically study the effect of the network structure on convergence time; see the section "Numerical Examples."

BAYESIAN LEARNING IN NETWORKS

The network learning models considered in this article are composed of an unknown state of the world $\omega \in X$ and a group of agents $N = \{1, 2, \ldots, N\}$ whose interactions are characterized by a network $G = (N, E)$. At subsequent points in time $t = 0, 1, 2, \ldots$, agents in the network observe private signals $s_{i,t}$ that carry information about the state of the world $\omega$ and decide on an action $a_{i,t}$ belonging to some common compact metric action space $A$ that they deem optimal with respect to a utility function of the form

    $u_i(\omega, a_{i,t}, \{a_{j,t}\}_{j \in N \setminus i})$.  (1)

Besides his action $a_{i,t}$, the utility of agent $i$ depends on the state of the world $\omega$ and the actions $\{a_{j,t}\}_{j \in N \setminus i}$ of all other agents in the network. This dependence tries to capture tradeoffs that arise in social and economic networks. For example, the state of the world $\omega$ may represent the inherent value of a service, the private signals $s_{i,t}$ quality perceptions after use, and $a_{i,t}$ decisions on how much to use the service. The utility a person derives from the use of the service depends not only on the inherent quality $\omega$ but also on how much other people use the service.

Deciding optimal actions $a_{i,t}$ would be easy if all agents were able to coordinate their actions. All private signals $s_{i,t}$ could be combined to form a single probability distribution on the state of the world $\omega$, and that common belief could be used to select $a_{i,t}$. Agents could act together and combine their utilities into a social objective, or they could exhibit strategic behavior and select game equilibrium actions. Whether there is a payoff externality or not, global coordination is an implausible model of social and economic behavior. We therefore consider agents that act independently of each other and couple their behavior through observation of the action history of agents in their network neighborhood $N_i$.
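For concreteness, the following minimal sketch encodes these model primitives, a network, per-agent signals, and a utility of the form (1), for a small example. The specific network, signal model, and quadratic utility are hypothetical choices made for illustration, not part of the article's setup.

```python
# Minimal sketch of the model primitives (hypothetical example values).
import numpy as np

N = 3                                    # number of agents
edges = {(1, 2), (2, 3)}                 # line network: 1 - 2 - 3
nbrs = {i: {j for e in edges for j in e if i in e and j != i}
        for i in range(1, N + 1)}        # neighborhoods N_i

def utility(i, omega, a):
    """A utility of the form (1): depends on omega, own action, and
    neighbors' actions. The quadratic shape is an illustrative choice."""
    others = sum((a[i] - a[j]) ** 2 for j in nbrs[i])
    return -(a[i] - omega) ** 2 - 0.5 * others

omega = 1.0                                   # unknown state (simulation only)
signals = omega + 0.1 * np.random.randn(N)    # private signals about omega
print(nbrs, utility(2, omega, {1: 0.9, 2: 1.1, 3: 1.0}))
```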

To be more precise, say that at time $t = 0$, there is a common initial belief among agents about the unknown parameter $\omega$. This common belief is represented by a probability distribution $P$. At time $t = 0$, each agent $i$ observes his own private signal $s_{i,0}$, which he uses in conjunction with the prior belief $P$ to choose and execute action $a_{i,0}$. Upon execution of $a_{i,0}$, actions $\{a_{j,0}\}_{j \in N_i}$ of neighboring agents become known to node $i$. Knowing the actions of its neighbors provides agent $i$ with information about the neighboring private signals $\{s_{j,0}\}_{j \in N_i}$, which in turn refines his belief about the state of the world $\omega$. This new knowledge prompts a reevaluation of the optimal action $a_{i,1}$ in the subsequent time slot. In general, at stage $t$, agent $i$ has acquired knowledge in the form of the history $h_{i,t}$ of past and present private signals $s_{i,\tau}$ for $\tau = 0, \ldots, t$ and past actions of neighboring agents $\{a_{j,\tau}\}_{j \in N_i}$ for times $\tau = 0, \ldots, t-1$. This history is used to determine the action $a_{i,t}$ for the current slot. In going from stage $t$ to stage $t+1$, neighboring actions $\{a_{j,t}\}_{j \in N_i}$ become known and incorporated into the history of past observations. We can thus formally define the history $h_{i,t}$ by the recursion

    $h_{i,t+1} = (h_{i,t}, \{a_{j,t}\}_{j \in N_i}, s_{i,t+1})$.  (2)

The component of the game that determines the action of agent $i$ from the observed history $h_{i,t}$ is his strategy $\sigma_{i,t}$. A pure strategy is a function that maps any possible history to an action, $\sigma_{i,t} : h_{i,t} \mapsto a_{i,t}$. The value of the strategy function $\sigma_{i,t}$ associated with the given observed history $h_{i,t}$ is the action of agent $i$, $a_{i,t}$. Given his strategy $\sigma_i := \{\sigma_{i,u}\}_{u = 0, \ldots, \infty}$, agent $i$ knows exactly what action to take at any stage upon observing the history at that stage. Hence, the (pure) strategies of all the agents across time, $\sigma := \{\sigma_{j,u}\}_{j \in N, u = 0, \ldots, \infty}$ — the strategy profile — determine the path of play, that is, the sequence of histories each agent will observe. As a result, if agent $i$ at time $t$ knows the information set at time $t$, i.e., $h_t = \{h_{1,t}, \ldots, h_{N,t}\}$, then he knows the continuation of the game from time $t$ onward given knowledge of the strategy profile $\sigma$.

When agents have a (common) prior $P$ on the state of the world at time $t = 0$, the strategy profile $\sigma$ induces a belief $P_\sigma(\cdot)$ on the path of play. That is, $P_\sigma(h)$ is the probability associated with reaching an information set $h$ when agents follow the actions prescribed by $\sigma$. Therefore, at time $t$, the strategy profile determines the belief $q_{i,t}$ of agent $i$ given $h_{i,t}$, that is,

    $q_{i,t}(\cdot) = P_\sigma(\cdot \mid h_{i,t})$.  (3)

The belief $q_{i,t}$ puts a distribution on the set of possible information sets $h_t$ at time $t$ given that agents played according to $\sigma_{0,\ldots,t-1}$ and $i$ observed $h_{i,t}$. Furthermore, the strategies from time $t$ onward, $\sigma_{t,\ldots,\infty}$, permit the transformation of beliefs on the information set into a distribution over respective upcoming actions $\{a_{j,u}\}_{j \in N, u = t, \ldots, \infty}$. As a result, upon observing $\{a_{j,t}\}_{j \in N_i}$ and $s_{i,t+1}$, agent $i$ updates his belief using Bayes' rule,

    $q_{i,t+1}(\cdot) = P_\sigma(\cdot \mid h_{i,t+1}) = P_\sigma(\cdot \mid h_{i,t}, s_{i,t+1}, \{a_{j,t}\}_{j \in N_i}) = q_{i,t}(\cdot \mid s_{i,t+1}, \{a_{j,t}\}_{j \in N_i})$.  (4)

Since the belief is a probability distribution over the set of possible actions in the future, agent $i$ can calculate expected payoffs from choosing an action. A rational behavior for agent $i$ is to select the action $a_{i,t}$ that maximizes the expected utility given his belief $q_{i,t}$,

    $a_{i,t} \in \arg\max_{a \in A} E_\sigma[u_i(\omega, a, \{\sigma_{j,t}(h_{j,t})\}_{j \in N \setminus i}) \mid h_{i,t}] := \arg\max_{a \in A} \int_h u_i(\omega, a, \{\sigma_{j,t}(h_{j,t})\}_{j \in N \setminus i})\, dq_{i,t}(h)$,  (5)

where we have defined the conditional expectation operator $E_\sigma[\cdot \mid h_{i,t}]$ with respect to the conditional distribution $P_\sigma(\cdot \mid h_{i,t})$. The rational action $a_{i,t}$ in (5) is optimal given the strategy profile $\sigma$; as a result, $a_{i,t}$ is a function of the strategy profile $\sigma$.

So far, we have not imposed any constraints on the strategy profile $\sigma$. According to the definition of rational behavior in (5), all agents should maximize the expected value of the self-utility function. With this in mind, we define the Bayesian Nash equilibrium (BNE) to be the strategy profile of rational agents. A BNE strategy $\sigma$ is a best response strategy such that no agent can expect to increase his utility by unilaterally deviating from his strategy $\sigma_{i,t}$ given that the rest of the agents play equilibrium strategies $\{\sigma_{j,t}\}_{j \in N \setminus i}$; that is, $\sigma$ is a BNE if for each $i \in N$ and $t = 0, 1, 2, \ldots$, the strategy $\sigma_{i,t}$ maximizes the expected payoff

    $\sigma_{i,t}(h_{i,t}) \in \arg\max_{a \in A} E_\sigma[u_i(\omega, a, \{\sigma_{j,t}(h_{j,t})\}_{j \in N \setminus i}) \mid h_{i,t}]$.  (6)

We emphasize that (6) needs to be satisfied for all possible histories $h_{i,t}$ and not just for the history realized in a particular game realization. This is necessary because agent $i$ does not know the history observed by agent $j$ but rather has a probability distribution on histories. Thus, to evaluate the expectation in (5), agent $i$ needs a representation of the equilibrium strategy for all possible histories $h_{j,t}$.

In this article, we restrict our attention to the equilibrium notion where agents choose myopically optimal actions as in (5). It is also possible to define BNE for nonmyopic agents that discount future payoffs. Agents exhibiting nonmyopic behavior might experiment to obtain valuable information to be used in the future. In the rest of the article, we consider myopic agents playing with respect to the BNE strategy $\sigma$. To simplify future notation, we define the expectation operator

    $E_{i,t}[\cdot] := E_\sigma[\cdot \mid h_{i,t}]$  (7)

to represent expectations with respect to the local history $h_{i,t}$ when agents play according to the equilibrium strategy $\sigma$.

BNE is an extension of Nash equilibrium to games with incomplete information. In this solution concept, we assume that agents interpret the actions of their neighbors knowing that they play according to the BNE strategy, i.e., BNE is common knowledge. Note that while defining rational behavior in (5), we have not specified how agent $i$ models the actions of other agents. To calculate his expected utility in (5), agent $i$ needs to have a model of the strategies of other agents. Common knowledge of BNE strategies and rationality is a particular model of agents' behavior in which agent $i$ believes, correctly so, that agent $j$ is rational.
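As a concrete illustration of the Bayes update (4) and the myopic choice (5), the sketch below works over a finite state and action space, with a deterministic model of neighbors' play as a function of the state standing in for the equilibrium strategy profile $\sigma$. The states, likelihoods, and utility values are hypothetical.

```python
# Sketch of the Bayes update (4) and myopic action choice (5) on finite sets.
# The likelihoods, utility, and fixed neighbor-play model are illustrative.
import numpy as np

states = ["w1", "w2", "w3"]
actions = ["l", "r"]
belief = np.ones(3) / 3.0                     # common prior, uniform

def bayes_update(belief, likelihood):
    """(4): condition the belief on one observation with P(obs | state)."""
    post = belief * likelihood
    return post / post.sum()

def myopic_action(belief, u, neighbor_play):
    """(5): pick the action maximizing expected utility under the belief,
    given a model of what each neighbor plays in each state."""
    def exp_u(a):
        return sum(p * u(w, a, neighbor_play(w)) for p, w in zip(belief, states))
    return max(actions, key=exp_u)

# Example: an observation that is twice as likely under state w3.
belief = bayes_update(belief, np.array([0.25, 0.25, 0.5]))
u = lambda w, a, others: 1.0 if (w == "w1" and a == "l"
                                 and all(o == "l" for o in others)) else 0.0
print(belief, myopic_action(belief, u, lambda w: ["l" if w == "w1" else "r"]))
```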

In other words, agent $i$'s model of the behavior of agent $j \in N \setminus \{i\}$ is that $j$ also maximizes expected payoff as in (6), and further that agent $i$ could correctly guess $j$'s actions if he had access to $j$'s history $h_{j,t}$. In a networked setting, agents also require knowledge of the network to make inferences about the information of other agents. Hence, we also assume the network structure is common knowledge. Notice that this equilibrium notion couples beliefs and strategies in a consistent way in the sense that strategies induce beliefs and the beliefs determine the optimal strategy. This rational model provides a benchmark for comparison with other behavioral models.

NOTIONS OF CONSENSUS

An important research question when studying social learning models is whether agents reach consensus and, if they do, whether the outcome is efficient according to some criterion. Several different notions of consensus have been studied in the literature. Agents are said to reach consensus in their actions if they all eventually take the same action or, more formally, if the distance between $a_{i,t}$ and $a_{j,t}$ goes to zero as time goes to infinity, where the distance is defined using the metric on the action space $A$. If agents' utility functions are the same, then reaching consensus in the actions implies that agents obtain the same utility; however, the converse is not necessarily true. A different characterization of consensus is in terms of agents' beliefs. Agents $i$ and $j$ are said to reach consensus in their beliefs if the distance (in total variation) between the probability measures $q_{i,t}(\cdot)$ and $q_{j,t}(\cdot)$ goes to zero as $t$ goes to infinity. If this is true for any pair of agents, we say that all agents reach consensus. The consensus belief, however, might be inaccurate in the sense of not corresponding correctly to the agents' observations. Another notion of convergence considers expected payoffs. We say that agents are expected to perform equally well asymptotically if

    $\lim_{t \to \infty} E[u_i(\omega, a_{i,t}, \{a_{k,t}\}_{k \in N \setminus i})] = \lim_{t \to \infty} E[u_j(\omega, a_{j,t}, \{a_{k,t}\}_{k \in N \setminus j})]$,  (8)

where the expectation is over all possible realizations of the state of the world $\omega$. The result in (8) establishes a form of consensus that is attained in the limit. It is possible that for individual realizations of the parameter $\omega$, the expected payoffs of different agents are different; however, if we consider an average across realizations of $\omega$, the payoffs asymptotically coincide. We can interpret this result as stating that ex ante all agents are expected to obtain the same payoff.

Each of the notions of agreement discussed above might be relevant in certain applications. Moreover, they do not necessarily coincide. Agents might reach consensus in their actions without having the same beliefs if agents' actions do not completely reflect the beliefs held by them. On the other hand, agents might reach consensus in their beliefs (and even learn all the information) and yet take disparate actions. In the following sections, we consider learning in a class of games in which actions are on the real line and the utility function is quadratic in agents' actions and the state of the world. In the section "Learning with Payoff Externalities," we survey a recent result that shows convergence in expected payoffs. This result is essential in proving that agents reach consensus in their actions in the limit. In the section "Gaussian Quadratic Network Games," we derive tractable recursions for rational learning given that agents' private signals are normally distributed.
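The action and belief notions of consensus above translate directly into two scalar diagnostics. The small helper below computes them for a simulated run; the pairwise maximum and the discrete total-variation formula are standard choices, and the trajectories fed to it are hypothetical.

```python
# Consensus diagnostics: maximum pairwise action distance and total-variation
# distance between two agents' (discrete) beliefs. Inputs are illustrative.
import numpy as np
from itertools import combinations

def action_disagreement(a):
    """Max pairwise |a_i - a_j| over agents; -> 0 means action consensus."""
    return max(abs(a[i] - a[j]) for i, j in combinations(range(len(a)), 2))

def tv_distance(q_i, q_j):
    """Total variation distance between two beliefs on a finite state space."""
    return 0.5 * np.abs(np.asarray(q_i) - np.asarray(q_j)).sum()

a_t = [1.02, 0.98, 1.00]                                 # actions at stage t
print(action_disagreement(a_t))                          # 0.04
print(tv_distance([0.2, 0.3, 0.5], [0.25, 0.3, 0.45]))   # 0.05
```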
LEARNING WITH PAYOFF EXTERNALITIES

The presence of payoff externalities adds another layer of complexity to the learning process compared to models with purely informational externalities, since it prohibits agents from interpreting the actions of their neighbors as solely revealing information about the true state of the world. Instead, they have to keep track of the motives of other agents and at the same time incorporate the new information effectively. The interested reader can refer to "Example of Learning with Payoff Externality" for an illustration of the rational learning process with both payoff and informational externalities. An example of learning with only informational externalities is given in [16]. In this section, we introduce games with utility functions that are quadratic both in the state of the world and in agents' actions. We then exemplify this quadratic form in the context of financial markets. Finally, we provide asymptotic convergence results for learning in quadratic games over networks.

QUADRATIC GAMES

At any time $t$, selection of actions $\{a_i := a_{i,t} \in \mathbb{R}\}_{i \in N}$ when the state of the world is $\omega \in \mathbb{R}$ results in agent $i$ receiving a payoff

    $u_i(\omega, a_i, \{a_j\}_{j \in N \setminus i}) = -\frac{1}{2} a_i^2 + \sum_{j \in N \setminus i} b_{ij} a_i a_j + \delta_i a_i \omega + c_i \omega$,  (9)

where $b_{ij}$, $\delta_i$, and $c_i$ are real-valued constants. The constant $b_{ij}$ measures the effect of $j$'s action on $i$'s utility. For notational convenience, we let $b_{ii} = 0$ for all $i \in N$. Since $u_i$ is a strictly concave function of $a_i$ (i.e., $\partial^2 u_i / \partial a_i^2 < 0$), the myopically optimal action can be computed explicitly by taking the derivative with respect to $a_i$, equating the result to zero, and solving for $a_i$. As a result, the rational action, defined in (5), for agent $i$ in response to any strategy $\{\sigma_{j,t}\}_{j \in N \setminus i}$ is a linear function of the strategies of other agents and the underlying parameter,

    $a_{i,t} = \sum_{j \in N \setminus i} b_{ij} E_{i,t}[\sigma_{j,t}(h_{j,t})] + \delta_i E_{i,t}[\omega]$.  (10)

According to the equilibrium definition (6), at each stage agents play a myopic best response given the observed history against other agents' actions, which in turn are myopic best responses. Consequently, for the quadratic utility function an equilibrium strategy profile $\{\sigma_{i,t}\}_{i \in N}$ solves the following set of equations:

    $\sigma_{i,t}(h_{i,t}) = \sum_{j \in N \setminus i} b_{ij} E_{i,t}[\sigma_{j,t}(h_{j,t})] + \delta_i E_{i,t}[\omega]$,  (11)

for all $i \in N$.
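The first-order condition behind (10) makes the best response a one-liner once the conditional expectations are in hand. The sketch below computes (10) from point estimates and, as a sanity check under complete information, iterates (11) to its fixed point. The coefficient values are hypothetical, chosen so that $\delta_i + \sum_j b_{ij} = 1$ (making the equilibrium actions equal $\omega$) and satisfying the diagonal dominance condition introduced later.

```python
# Myopic best response (10) for the quadratic utility (9), and fixed-point
# iteration of the equilibrium equations (11) under complete information.
# The matrix b and coefficients delta are illustrative.
import numpy as np

b = np.array([[0.0, 0.3, 0.0],           # b[i][j]: effect of j's action on i
              [0.2, 0.0, 0.2],
              [0.0, 0.3, 0.0]])
delta = np.array([0.7, 0.6, 0.7])

def best_response(i, exp_actions, exp_omega):
    """(10): a_i = sum_j b_ij E[a_j] + delta_i E[omega]."""
    return b[i] @ exp_actions + delta[i] * exp_omega

omega = 1.0
a = np.zeros(3)
for _ in range(100):                      # converges since sum_j b_ij < 1
    a = np.array([best_response(i, a, omega) for i in range(3)])
print(a)                                  # all entries equal omega here
```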

EXAMPLE OF LEARNING WITH PAYOFF EXTERNALITY

In this section, we give an example of rational behavior in a model with both informational and payoff externalities. The example illustrates how rational agents are able to rule out possible states of the world upon observing actions of their neighbors.

There are three agents in a line network; that is, $N = \{1, 2, 3\}$, $N_1 = \{2\}$, $N_2 = \{1, 3\}$, and $N_3 = \{2\}$. The possible states of the world belong to the set $X = \{\omega_1, \omega_2, \omega_3\}$. Agents have a common uniform prior over the possible states. At the beginning, agents receive private signals $s_1$, $s_2$, and $s_3$. Based on $s_1$, Agent 1 can distinguish whether the true state is $\omega_3$ or belongs to the set $\{\omega_1, \omega_2\}$. The private signal $s_2$ does not carry any information. $s_3$ reveals whether the true state is $\omega_1$ or belongs to the set $\{\omega_2, \omega_3\}$. We assume that agents know the informativeness of the private signals of all agents, i.e., the partition of the private signals is known by all agents. There are two possible actions, $A = \{l, r\}$. Agent $i$'s payoff depends on the actions of the other two agents, $a_{N \setminus i} := \{a_j\}_{j \in N \setminus i}$, in the following way:

    $u_i(\omega, a_{i,t}, a_{N \setminus i}) = \begin{cases} 1 & \text{if } \omega = \omega_1,\ a_i = l,\ a_{N \setminus i} = \{l, l\}, \\ 2 & \text{if } \omega = \omega_3,\ a_i = r,\ a_{N \setminus i} = \{r, r\}, \\ 0 & \text{otherwise.} \end{cases}$  (S1)

According to (S1), an agent earns a payoff only when all the agents choose $l$ and the state is $\omega_1$, or when all the agents choose $r$ and the state is $\omega_3$.

Initial strategies of agents consist of functions that map their observed histories at $t = 0$ (which only consist of their signals) to actions. Let $(\sigma_{1,0}, \sigma_{2,0}, \sigma_{3,0})$ be a strategy profile at $t = 0$ defined as

    $\sigma_{1,0}(s_1) = \begin{cases} l & \text{if } s_1 = \{\omega_1, \omega_2\}, \\ r & \text{if } s_1 = \{\omega_3\}, \end{cases}$  $\sigma_{2,0}(s_2) = r$,  $\sigma_{3,0}(s_3) = \begin{cases} l & \text{if } s_3 = \{\omega_1\}, \\ r & \text{if } s_3 = \{\omega_2, \omega_3\}. \end{cases}$

Note that since Agent 2's signal is uninformative, he needs to take the same action regardless of his signal. Agents' strategies at a time $t \geq 1$ map their observed histories to actions. For $t \geq 1$, let $(\sigma_{1,t}, \sigma_{2,t}, \sigma_{3,t})$ be a strategy profile defined as

    $\sigma_{1,t}(h_{1,t}) = \begin{cases} l & \text{if } s_1 = \{\omega_1, \omega_2\}, \\ r & \text{if } s_1 = \{\omega_3\}, \end{cases}$  $\sigma_{2,t}(h_{2,t}) = \begin{cases} r & \text{if } a_{1,t-1} = a_{3,t-1} = r, \\ l & \text{otherwise,} \end{cases}$  $\sigma_{3,t}(h_{3,t}) = \begin{cases} l & \text{if } s_3 = \{\omega_1\}, \\ r & \text{if } s_3 = \{\omega_2, \omega_3\}. \end{cases}$

Note that even though agents' strategies could depend on their entire histories, in the above specification Agent 1's and Agent 3's actions only depend on their private signals, whereas Agent 2's actions only depend on the last actions taken by his neighbors. We argue that $\sigma = (\sigma_{i,t})_{i \in N,\, t = 0, 1, \ldots}$ as defined above is an equilibrium strategy. We assume that the strategy profile $\sigma$ is common knowledge and verify that agents' actions given any history maximize their expected utilities given the beliefs induced by Bayes' rule.

First, consider the time period $t = 0$. Suppose that Agent 1 observes $s_1 = \{\omega_1, \omega_2\}$. He assigns one-half probability to the event $\omega = \omega_1$, in which case according to $\sigma$ Agent 2 plays $r$ and Agent 3 plays $l$, and he assigns one-half probability to the state $\omega = \omega_2$, in which case Agent 2 plays $r$ and Agent 3 plays $r$. Therefore, his expected payoff is zero regardless of the action he takes; that is, he does not have a profitable unilateral deviation from the strategy profile $\sigma$. Next, suppose that Agent 1 observes $s_1 = \{\omega_3\}$. In this case he knows for sure that $\omega = \omega_3$ and that Agents 2 and 3 both play $r$. Therefore, the best he can do is also to play $r$, which is the action specified by $\sigma$. This argument shows that Agent 1 has no profitable deviation from $\sigma$ regardless of the realization of $s_1$.

Next, we focus on Agent 2. He has no information at $t = 0$. Therefore, he assigns one-third probability to the event $\omega = \omega_1$, in which case $a_{1,0} = a_{3,0} = l$; one-third probability to the event $\omega = \omega_2$, in which case $a_{1,0} = l$ and $a_{3,0} = r$; and one-third probability to the event $\omega = \omega_3$, in which case $a_{1,0} = a_{3,0} = r$. Therefore, his expected payoff of taking action $r$ is $2/3$, whereas his expected payoff of taking action $l$ is $1/3$. Finally, considering Agent 3: if he observes $s_3 = \{\omega_1\}$, he knows that Agents 1 and 2 play $l$ and $r$, respectively, in which case he is indifferent between $l$ and $r$. If he observes $s_3 = \{\omega_2, \omega_3\}$, on the other hand, he assigns one-half probability to $\omega = \omega_2$, in which case $a_{1,0} = l$ and $a_{2,0} = r$, and one-half probability to $\omega = \omega_3$, in which case $a_{1,0} = a_{2,0} = r$. Therefore, he strictly prefers playing $r$ in this case. We have shown that at $t = 0$, no agent has an incentive to deviate from the actions prescribed by $\sigma$. We have indeed shown something stronger: strategies $\sigma_{1,0}$ and $\sigma_{3,0}$ are dominant strategies for Agents 1 and 3, respectively; that is, regardless of what other agents do, Agents 1 and 3 have no incentive to deviate from playing these strategies.

Next, consider the time period $t = 1$. In this time period, Agent 2 knows the strategies that Agents 1 and 3 used in the previous time period and learns the true state; specifically, if they played $\{l, l\}$ the state is $\omega_1$, if they played $\{r, r\}$ the state is $\omega_3$, and otherwise the state is $\omega_2$. Also, by the above argument, Agents 1 and 3 will never have an incentive to change their strategies from what is prescribed by $\sigma$. Therefore, $\sigma$ is consistent with equilibrium at $t = 1$ as well. The exact same argument can be repeated for $t > 1$.

Now that we have shown that $\sigma$ is an equilibrium strategy, we can focus on the evolution of agents' expected payoffs. For the rest of the example, assume that $\omega = \omega_1$. At $t = 0$, Agent 3 learns the true state. Agents 1, 2, and 3 play $l$, $r$, and $l$, respectively. Since Agents 1 and 3 know that Agent 2 will play $a_{2,0} = r$, their conditional expected payoffs at $t = 0$ are zero. Agent 2, on the other hand, assigns one-third probability to the state $\omega_3$ and action profile $(r, r, r)$; therefore, his expected payoff is given by $2/3$. At $t = 1$, all agents play $l$. Agent 2 learns the true state. Since Agents 2 and 3 know the true state and know that the action profile chosen is $(l, l, l)$, their expected payoffs are equal to one. On the other hand, Agent 1 does not know whether the state is $\omega_1$ or $\omega_2$, but he knows that the action profile taken is $(l, l, l)$; therefore, his conditional expected payoff is equal to $1/2$. In later stages, agents change neither their beliefs nor their actions.

The example illustrates an important aspect of learning in the presence of payoff externalities. Agents need to make inferences about the actions of other agents in the next stage based on the information available to them and use the knowledge of the equilibrium strategy to make predictions about how others would play. This inference process includes reasoning about others' reasoning about the actions of self and other agents, which in turn leads to the notion of equilibrium strategy that we formally defined in the section "Bayesian Learning in Networks."
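To make the sidebar's equilibrium check concrete, the short script below enumerates the three states, derives the $t = 0$ play implied by the signals, and evaluates Agent 2's one-shot deviations; it is a direct transcription of the example's payoff (S1) and strategies.

```python
# Enumeration check for the sidebar example: payoff (S1), strategies sigma,
# and Agent 2's one-shot deviations at t = 0.
from fractions import Fraction

states = ["w1", "w2", "w3"]

def payoff(w, a):                        # (S1), common to all agents
    if w == "w1" and all(x == "l" for x in a):
        return 1
    if w == "w3" and all(x == "r" for x in a):
        return 2
    return 0

def sigma0(w):                           # t = 0 play implied by the signals
    a1 = "r" if w == "w3" else "l"       # s1 separates w3 from {w1, w2}
    a3 = "l" if w == "w1" else "r"       # s3 separates w1 from {w2, w3}
    return [a1, "r", a3]                 # Agent 2 is uninformed and plays r

# Agent 2's expected payoff at t = 0 under the uniform prior:
for act in ["l", "r"]:
    eu = sum(Fraction(1, 3) * payoff(w, [sigma0(w)[0], act, sigma0(w)[2]])
             for w in states)
    print(f"Agent 2 plays {act}: expected payoff {eu}")  # 1/3 vs 2/3 -> play r
```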
COORDINATION GAMES

In his general theory, Keynes argues that if a person is asked to guess the beauty contest winner, he should evaluate each contestant with respect to what he thinks other people's criteria of beauty are. Similarly, an investment in the stocks of a company entails a player not only considering his own estimate of how well the company is doing but also what everyone else thinks about the company's status [11]. Let $\omega \in \mathbb{R}$ be the true stock value of a company. In this context, action $a_{i,t}$ represents player $i$'s valuation of the asset; i.e., it is the price that agent $i$ is willing to pay per stock share at time $t$. The payoff function for agent $i$ is given by

    $u_i(\omega, a_i, \{a_j\}_{j \in N \setminus i}) = -(1 - \lambda)(a_i - \omega)^2 - \frac{\lambda}{N - 1} \sum_{j \in N \setminus i} (a_i - a_j)^2$,  (12)

where $\lambda \in (0, 1)$. This payoff function is reminiscent of that of a coordination game (or potential game) [35], [36], with the only difference being the addition of a term corresponding to an estimation problem. The first term of the payoff function measures the desire of the player to estimate the true value of the stock as the quadratic distance between $i$'s action and $\omega$. The second term is the coordination (or the beauty contest) term measuring the payoff associated with being close to the valuations of other members of the society. It represents how the actions of others affect the payoff of agent $i$. The constant $\lambda$ gauges the relative importance of coordination and estimation. Using (6), the BNE strategy $\sigma$ in this quadratic game solves the following set of equations:

    $\sigma_{i,t}(h_{i,t}) = (1 - \lambda) E_{i,t}[\omega] + \frac{\lambda}{N - 1} \sum_{j \in N \setminus i} E_{i,t}[\sigma_{j,t}(h_{j,t})]$, for all $i \in N$ and $t \in \mathbb{N}$.  (13)

Since the payoff (12) is of the form in (9), the equilibrium equations in (13) are linear in the strategies of other agents, as in (11).

The same payoff function can also be motivated by looking at coordination among a network of mobile agents starting with a certain formation and trying to move toward a finish line on a straight path [14], [37]. Each agent collects an initial noisy measurement of the true heading angle $\omega$, that is, the angle that achieves the shortest path toward the finish line. In this example, the actions of agents represent their choice of heading direction or movement angle. We assume the agents move with constant and equal velocity. The first term in (12) represents agents' goal of estimating the correct heading angle. The desire of agent $i$ to maintain the initial formation is captured by the second term in (12).
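Under complete information (everyone knows $\omega$), (13) reduces to a linear system whose solution is immediate. The sketch below solves it and confirms that all valuations coincide at $\omega$, which is the benchmark that the learning dynamics approach; $N$ and $\lambda$ are hypothetical.

```python
# Complete-information equilibrium of the beauty contest (12)-(13):
# solve (I - J) a = (1 - lam) * omega * 1, with J the coupling to others.
import numpy as np

N, lam, omega = 5, 0.4, 2.0
J = (np.ones((N, N)) - np.eye(N)) * lam / (N - 1)   # lam/(N-1) per other agent
a = np.linalg.solve(np.eye(N) - J, (1 - lam) * omega * np.ones(N))
print(a)        # every entry equals omega: estimation and coordination agree
```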

ASYMPTOTIC PROPERTIES OF LEARNING IN QUADRATIC GAMES

In this section, we present results from [14] and [38] that focus on symmetric, supermodular, and diagonally dominant games. The game defined by the utility function in (9) is symmetric when the pairwise influences $b_{ij}$ are equal for any pair; that is, $b_{ij} = b$ for a constant $b \in \mathbb{R}$ for all $i \in N$ and $j \in N \setminus i$. A game is supermodular when agents' strategies are complementary to each other. Strategic complementarity between agents $i$ and $j$ means that the marginal utility of agent $i$'s action increases with an increase in $j$'s action. For a twice-differentiable utility function, this is equivalent to requiring that $\partial^2 u_i(\cdot) / \partial a_i \partial a_j \geq 0$ for any two agents $i$ and $j$. For our quadratic utility function in (9), the actions of $i$ and $j$ are strategic complements when $b_{ij} \geq 0$ for all $i \in N$ and $j \in N \setminus i$. We further restrict our attention to games in which the Hessian matrices of the utility functions are strictly diagonally dominant. For the utility function in (9), this is equivalent to requiring that there exists $\rho < 1$ such that

    $\sum_{j \in N \setminus i} b_{ij} \leq \rho$ for all $i \in N$.  (14)

The interpretation of (14) is that an agent's utility is more sensitive to changes in his own actions than to changes in the actions of other agents. Notice that the payoff function (12) of the coordination problem satisfies all of these properties.
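These three structural conditions are mechanical to verify for a given coefficient matrix. A small checker, with an illustrative $b$ matrix (the beauty contest coefficients $b_{ij} = \lambda/(N-1)$), might look as follows.

```python
# Check symmetry, supermodularity (b_ij >= 0), and diagonal dominance (14)
# for a quadratic game's coefficient matrix b. Example: beauty contest weights.
import numpy as np

N, lam = 4, 0.6
b = (np.ones((N, N)) - np.eye(N)) * lam / (N - 1)   # b_ij = lam/(N-1), b_ii = 0

off = b[~np.eye(N, dtype=bool)]                     # off-diagonal influences
symmetric    = np.allclose(off, off[0])             # all pairwise b_ij equal
supermodular = np.all(b >= 0)                       # strategic complements
dominant     = b.sum(axis=1).max() < 1              # (14): sum_j b_ij <= rho < 1
print(symmetric, supermodular, dominant)            # True True True
```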
According to the learning framework introduced in the section "Bayesian Learning in Networks," agents take actions specified by the equilibrium strategy, observe neighboring actions, update their beliefs according to Bayes' rule, and then start the next stage as a new game with beliefs different from the previous stage. This means that the equilibrium of the new game is not necessarily the same as the equilibrium of the previous stage. However, since agents accumulate information about the unknown state over time, it is possible to show that under the equilibrium behavior in (6), agents' expected utilities converge for the utility function in (9) [14]. By the same token, agents' equilibrium actions $a_{i,t} := \sigma_{i,t}(h_{i,t})$ defined in (11) converge in the limit,

    $a_{i,t} \to a_{i,\infty}$ for all $i \in N$.  (15)

Existence of limit actions implies that the agents can learn their neighbors' limit actions. Since $i$ observes the actions of $j \in N_i$, agent $j$'s action at time $t - 1$ is in the information set of agent $i$ at time $t$; i.e., $a_{j,t-1} \in h_{i,t}$. This implies that $a_{j,t-1}$ is measurable with respect to the information of $i$ at time $t$. Therefore, since $a_{j,t} \to a_{j,\infty}$ and $i$'s information is an increasing set that converges as $t$ goes to infinity, the limit action $a_{j,\infty}$ is measurable with respect to $i$'s information at infinity.

In other words, agent $i$ is able to identify the limit action of a neighboring agent $j$. The fact that agents can identify the limit actions of their neighbors leads to a number of interesting conclusions by making use of the so-called imitation principle [14], [15]. The imitation principle states that the expected payoff of agent $i$ with respect to his history cannot increase if he adopts an action of a neighboring agent,

    $E_{i,\infty}[u_i(\omega, a_{i,\infty}, \{a_{k,\infty}\}_{k \in N \setminus i})] \geq E_{i,\infty}[u_i(\omega, a_{j,\infty}, \{a_{k,\infty}\}_{k \in N \setminus i})]$ for all $j \in N_i$.  (16)

The inequality in (16) is due to the definition of $a_{i,t}$ as the maximizing action in (6). Notice that (16) is only true for neighboring nodes because $i$ only observes (and hence can identify) the actions of his neighbors. Actions of other agents, on the other hand, are not observed by $i$ and may not be measurable with respect to $q_{i,\infty}$. By applying the imitation principle and making use of the assumption on the strategic complementarity of the actions between agents $i$ and $j \in N_i$, we can show that neighboring agents are expected to receive the same payoff whether $i$ plays his own limit action $a_{i,\infty}$ or any of his neighbors' limit actions $a_{j,\infty}$ [14]; that is,

    $E[E_{i,\infty}[u_i(\omega, a_{i,\infty}, \{a_{k,\infty}\}_{k \in N \setminus i})]] = E[E_{i,\infty}[u_i(\omega, a_{j,\infty}, \{a_{k,\infty}\}_{k \in N \setminus i})]]$ for all $j \in N_i$.  (17)

The intuitive argument behind (17) is as follows. By the symmetry property of the utility function, strategic complementarity implies that unilateral deviations by $i$ and $j$ to each other's actions are at least as good as playing their own limit actions in expectation. However, by the imitation principle in (16), this behavior can never yield strictly higher payoffs. Hence, it must be the case that deviations to neighbors' limit actions result in no change in the expected performance of agents. Note that this is true only for neighboring agents. By the imitation principle in (16), the left-hand side of (17) is no larger than the right-hand side for all $\omega$. Hence, it must be the case that the equality holds almost surely when we remove the outer expectation in (17),

    $E_{i,\infty}[u_i(\omega, a_{i,\infty}, \{a_{k,\infty}\}_{k \in N \setminus i})] = E_{i,\infty}[u_i(\omega, a_{j,\infty}, \{a_{k,\infty}\}_{k \in N \setminus i})]$ for all $j \in N_i$.  (18)

According to (18), agent $i$ expects his limit action to result in a payoff no worse than if he were to play the limit action of one of his neighbors; i.e., from the perspective of agent $i$, agent $j$'s limit action is just as good as his own limit action. An immediate corollary of (18) is that for a connected network, agents reach consensus in their actions. The result is proved by the following argument. Given (18), the limit action of agent $j$ is a maximizer of the expected utility of agent $i$; i.e., $a_{j,\infty} = \arg\max_{a \in A} E_{i,\infty}[u_i(\omega, a, \{a_{k,\infty}\}_{k \in N \setminus i})]$. By strict concavity of (9), the myopically optimal action in (5) is unique. Hence, it must be the case that $a_{i,\infty} = a_{j,\infty}$ for all $i \in N$ and $j \in N_i$. Given that the network is connected, this implies that $a_{i,\infty} = a_{j,\infty}$ for any pair of agents. This conformity result proved in [14] extends some of the results in [15] to models with payoff externalities.

GAUSSIAN QUADRATIC NETWORK GAMES

In this section, we restrict our attention to games with quadratic utility functions as in (9) and private signals that are normally distributed, and we show that the equilibrium strategies can be computed explicitly. The results of this section are presented more formally and in more detail in [37]. Assume that at time $t = 0$, agent $i$ receives a private noisy signal $s_i \in \mathbb{R}$ about the unknown parameter,

    $s_i = \omega + \epsilon_i$,  (19)

where $\epsilon_i$ is normal with mean zero and variance $C_i$. The grouping of all private signals is denoted by the vector of private signals $s := [s_1, \ldots, s_N]^T \in \mathbb{R}^{N \times 1}$.
Further, agents' common prior for $\omega$ is an (improper) uniform measure over $\mathbb{R}$. Hence, the posterior at time $t = 0$, $P(\omega, s \mid s_i)$, is normal. To see how equilibrium responses can be computed explicitly, assume for the sake of argument that at a given time $t$ it is possible to write the minimum mean squared error (MMSE) estimates of the state of the world and the private signals $s$ as linear combinations of the private signals themselves; i.e., at time $t$ there are vectors $k_{i,t}$ and matrices $L_{i,t}$ for which we can write

    $E_{i,t}[\omega] = k_{i,t}^T s$, $E_{i,t}[s] = L_{i,t} s$, for all $i \in N$.  (20)

Notice that (20) does not imply that the MMSE estimates $E_{i,t}[\omega]$ and $E_{i,t}[s]$ are computed as linear combinations of the private signals $s$. This is not possible because agent $i$ does not know the values of all private signals — if this were the case, he would have complete information and the expression $E_{i,t}[s] = L_{i,t} s$ would be pointless. Our assumption is that whatever may be the computations that agents perform, they are equivalent to computing the linear combinations in (20).
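The base case of (20) is available in closed form: with the improper uniform prior, agent $i$'s only information at $t = 0$ is $s_i$, so $E_{i,0}[\omega] = s_i$ and $E_{i,0}[s_j] = s_i$, i.e., $k_{i,0} = e_i$ and every row of $L_{i,0}$ equals $e_i^T$ (this base case is confirmed at the end of the section). The sketch below draws signals per (19) and builds these initial coefficients; the network size and noise variances are hypothetical.

```python
# Signal model (19) and the t = 0 coefficients of the linear estimates (20):
# k_{i,0} = e_i and every row of L_{i,0} equal to e_i^T, since s_i is the
# only information agent i holds under the improper uniform prior.
import numpy as np

rng = np.random.default_rng(0)
N, omega = 4, 1.5
C = np.array([0.5, 0.8, 0.3, 0.6])              # per-agent noise variances
s = omega + rng.normal(0.0, np.sqrt(C))         # (19): s_i = omega + eps_i

k0 = [np.eye(N)[i] for i in range(N)]           # E_{i,0}[omega] = k_{i,0}^T s = s_i
L0 = [np.tile(np.eye(N)[i], (N, 1)) for i in range(N)]   # E_{i,0}[s] = s_i * 1

i = 2
print(k0[i] @ s, L0[i] @ s)                     # both reduce to s_2
```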

The validity of (20), which we assume for the moment without proof, is instrumental in simplifying the computation of best responses and the associated fixed points that define the equilibrium actions. For that matter, we solve (6) by postulating that the best response action can be written as a linear combination $a_{i,t} = v_{i,t}^T E_{i,t}[s]$ of the private signals' MMSE estimates with weights given by some vector $v_{i,t}$ to be determined. Given this candidate solution, we can rewrite the best response fixed point condition in (6) as

    $v_{i,t}^T E_{i,t}[s] = \sum_{j \in N \setminus i} b_{ij} E_{i,t}[v_{j,t}^T E_{j,t}[s]] + \delta_i E_{i,t}[\omega]$.  (21)

Since we are assuming that (20) holds, and that in particular the private signal MMSE estimate is $E_{j,t}[s] = L_{j,t} s$, we can rewrite the double expectations inside the summation in (21) as

    $E_{i,t}[v_{j,t}^T E_{j,t}[s]] = E_{i,t}[v_{j,t}^T L_{j,t} s]$.  (22)

Substituting the expression in (22) into (21), and further noting that as per (20) the MMSE estimate of the world state is $E_{i,t}[\omega] = k_{i,t}^T s$ and the estimate of the private signals is $E_{i,t}[s] = L_{i,t} s$, we can simplify the equilibrium condition to

    $v_{i,t}^T L_{i,t} s = \sum_{j \in N \setminus i} b_{ij} v_{j,t}^T L_{j,t} L_{i,t} s + \delta_i k_{i,t}^T s$,  (23)

which, as (6), or (21) for that matter, we require for all agents $i \in N$. To solve this system of equations, we observe that it is underdetermined. Each vector $v_{i,t}$ contains $N$ elements and, since there are $N$ of these vectors, there is a total of $N^2$ unknowns. However, there is one equation like (23) for each agent, leading to a total of $N$ equations. We can take advantage of this indeterminacy and proceed to equate the terms that multiply each individual signal $s_j$ on each side of (23). This results in a set of $N$ equations of the form

    $L_{i,t}^T v_{i,t} = \sum_{j \in N \setminus i} b_{ij} L_{i,t}^T L_{j,t}^T v_{j,t} + \delta_i k_{i,t}$  (24)

associated with each agent $i$. Since we have a total of $N$ agents, there are $N^2$ equations that we can use to determine the $N^2$ values of the vectors $v_{i,t}$ of all agents.

Observe that the system of linear equations defined by (24) does not depend on the realization of the private signals and that, as a consequence, neither do the coefficients $v_{i,t}$. Irrespective of the realization of the private signals $s$, the strategy of agent $i$ at time $t$ is the linear combination $a_{i,t} = v_{i,t}^T E_{i,t}[s]$ with weights $v_{i,t}$. An important consequence of this observation is that the coefficients $v_{i,t}$ can be determined locally by each agent as long as he has access to the (known) network parameters, without requiring knowledge of the (unknown) private signal values. The actions realized, on the other hand, depend on the observed history through the MMSE estimate $E_{i,t}[s]$ and hence on the realization of the private signals. As well they should.

For future reference, stack all weighting coefficients $v_{i,t}$ into the aggregate vector $v_t := [v_{1,t}^T, \ldots, v_{N,t}^T]^T$ and all coefficients $k_{i,t}$ into the aggregate $k_t := [k_{1,t}^T, \ldots, k_{N,t}^T]^T$. Further define the matrix $L_t \in \mathbb{R}^{N^2 \times N^2}$ as the matrix with $j$th $N \times N$ diagonal block equal to $L_{j,t}^T$ and off-diagonal $(i, j)$ blocks equal to $-b_{ij} L_{i,t}^T L_{j,t}^T$,

    $L_t := \begin{bmatrix} L_{1,t}^T & -b_{12} L_{1,t}^T L_{2,t}^T & \cdots & -b_{1N} L_{1,t}^T L_{N,t}^T \\ -b_{21} L_{2,t}^T L_{1,t}^T & L_{2,t}^T & \cdots & -b_{2N} L_{2,t}^T L_{N,t}^T \\ \vdots & \vdots & \ddots & \vdots \\ -b_{N1} L_{N,t}^T L_{1,t}^T & -b_{N2} L_{N,t}^T L_{2,t}^T & \cdots & L_{N,t}^T \end{bmatrix}$.  (25)

With these definitions, the system of linear equations in (24) can be written in the more compact form

    $L_t v_t = \delta k_t$,  (26)

where $\delta k_t$ stacks the vectors $\delta_i k_{i,t}$.

We have argued that if we can write MMSE estimates as linear combinations of private signals as per (20), the determination of the equilibrium strategy reduces to the solution of the system of linear equations in (26). However, is it true that we can write the estimates $E_{i,t}[\omega]$ and $E_{i,t}[s]$ as in (20)? And if this is true, what are the values of the vectors $k_{i,t}$ and the matrices $L_{i,t}$ that are needed to formulate (26)? To answer these questions, we offer an inductive argument in the form of a recursive equation to update the values of the coefficients $k_{i,t}$ and $L_{i,t}$.
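Assembling (25) and solving (26) is a direct block construction; the sketch below builds $L_t$ from per-agent $L_{i,t}$, $k_{i,t}$ and solves for the stacked $v_t$. The coefficients fed in are the $t = 0$ initialization of (20), and $b$, $\delta$ are hypothetical; a least-squares solve is used because at $t = 0$ the blocks $L_{i,0}^T$ are rank one, so the system pins down each $v_{i,t}$ only through $\mathbf{1}^T v_{i,t}$ and the minimum-norm solution is taken.

```python
# Build the block matrix L_t of (25) and solve L_t v_t = delta k_t as in (26).
# Inputs (b, delta, and the t = 0 coefficients) are illustrative.
import numpy as np

N = 3
b = 0.2 * (np.ones((N, N)) - np.eye(N))         # b_ij, with b_ii = 0
delta = 0.6 * np.ones(N)
L = [np.tile(np.eye(N)[i], (N, 1)) for i in range(N)]   # L_{i,0}
k = [np.eye(N)[i] for i in range(N)]                    # k_{i,0}

def solve_v(L, k):
    big = np.zeros((N * N, N * N))
    rhs = np.zeros(N * N)
    for i in range(N):
        big[i*N:(i+1)*N, i*N:(i+1)*N] = L[i].T           # diagonal block (25)
        rhs[i*N:(i+1)*N] = delta[i] * k[i]               # delta_i k_{i,t}
        for j in range(N):
            if j != i:                                   # off-diagonal blocks
                big[i*N:(i+1)*N, j*N:(j+1)*N] = -b[i, j] * L[i].T @ L[j].T
    v, *_ = np.linalg.lstsq(big, rhs, rcond=None)        # min-norm if singular
    return [v[i*N:(i+1)*N] for i in range(N)]            # per-agent v_{i,t}

v = solve_v(L, k)
print(np.round(v[0], 3))                                 # [0.333 0.333 0.333]
```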
An important consequence of the assumption in (20) (which we have not emphasized) is that equilibrium actions can also be written as a linear combination of the private signals. Indeed, since we can find vectors $v_{i,t}$ as the solution of the system of equations in (26) such that equilibrium actions are $a_{i,t} = v_{i,t}^T E_{i,t}[s]$, and since we assume $E_{i,t}[s] = L_{i,t} s$, we can write the action of agent $i$ at time $t$ as

    $a_{i,t} = v_{i,t}^T L_{i,t} s$, for all $i \in N$.  (27)

Do note that, as in the case of (20), we do not imply that agent $i$ calculates its equilibrium action using (27). This is impossible because some private signals are unknown, and the correct interpretation of (27) is that whatever computations agents perform to determine their equilibrium actions, they are equivalent to performing the linear combinations in (27).

The expression in (27) simplifies the understanding of the information revealed by the action of a user. From the perspective of agent $i$, observing the action $a_{j,t}$ of agent $j$ is equivalent to observing a linear combination of private signals. Observing the composition of neighboring actions $a_{N_i,t} := [a_{j_1,t}, \ldots, a_{j_{d(i)},t}]^T$, where we use $d(i)$ to denote the cardinality of the set $N_i$, is therefore equivalent to observing the vector linear combination

    $a_{N_i,t} = H_{i,t} s := [v_{j_1,t}^T L_{j_1,t}; \ldots; v_{j_{d(i)},t}^T L_{j_{d(i)},t}]\, s$,  (28)

where we have defined the observation matrix $H_{i,t} := [v_{j_1,t}^T L_{j_1,t}; \ldots; v_{j_{d(i)},t}^T L_{j_{d(i)},t}] \in \mathbb{R}^{d(i) \times N}$. If (27) is true for all times $t$, which implies that the same is true of (28), it follows that from the perspective of agent $i$, the estimation of the private signals $s$ and of the underlying state of the world $\omega$ is a simple sequential linear MMSE (LMMSE) estimation problem. Indeed, at time $t = 0$, the prior distribution $P(\omega, s \mid s_i)$ is Gaussian, and $i$ observes the neighboring actions $a_{N_i,0}$ given by the linear combination $H_{i,0} s$. Incorporating the information contained in this linear observation changes the posterior distribution to $P(\omega, s \mid h_{i,1})$, but this latter distribution is also normal. At a general time $t + 1$, agent $i$ has a normal prior $P(\omega, s \mid h_{i,t})$, and the observation of the neighboring actions $a_{N_i,t} = H_{i,t} s$ results in a normal posterior $P(\omega, s \mid h_{i,t+1})$.

Thus, to track the belief $P(\omega, s \mid h_{i,t})$, it suffices to keep bearings on the corresponding means and covariances, which we can do using an LMMSE filter. Specifically, consider agent $i$ at time $t$ and define the private signal covariance matrix $M_{ss}^i(t)$ and the state-private signal cross covariance $M_{\omega s}^i(t)$, respectively, by the expressions

    $M_{ss}^i(t) := E_{i,t}[(s - E_{i,t}[s])(s - E_{i,t}[s])^T]$,  (29)
    $M_{\omega s}^i(t) := E_{i,t}[(\omega - E_{i,t}[\omega])(s - E_{i,t}[s])^T]$.  (30)

The LMMSE estimation of $s$ from the observation $a_{N_i,t} = H_{i,t} s$ as given by (28) requires the definition of the LMMSE gain $K_s^i(t)$, given by the product of the cross covariance between the signal $s$ and the observation $a_{N_i,t}$ times the inverse of the covariance matrix of the observation $a_{N_i,t}$. Since the covariance of $s$ is $M_{ss}^i(t)$ and the observation model is $a_{N_i,t} = H_{i,t} s$, the LMMSE gain is given explicitly by

    $K_s^i(t) = M_{ss}^i(t) H_{i,t}^T (H_{i,t} M_{ss}^i(t) H_{i,t}^T)^{-1}$.  (31)

Using the value of the LMMSE gain $K_s^i(t)$ in (31), the posterior mean $E_{i,t+1}[s]$ and posterior covariance matrix $M_{ss}^i(t+1)$ after observing the neighboring actions $a_{N_i,t}$ follow from the recursive expressions

    $E_{i,t+1}[s] = E_{i,t}[s] + K_s^i(t)(a_{N_i,t} - E_{i,t}[a_{N_i,t}])$,  (32)
    $M_{ss}^i(t+1) = M_{ss}^i(t) - K_s^i(t) H_{i,t} M_{ss}^i(t)$,  (33)

where the expected value of the observations follows from (28) as $E_{i,t}[a_{N_i,t}] = H_{i,t} E_{i,t}[s]$. Likewise, for the estimation of the state $\omega$ from the observations $a_{N_i,t}$, we compute the LMMSE gain

    $K_\omega^i(t) = M_{\omega s}^i(t) H_{i,t}^T (H_{i,t} M_{ss}^i(t) H_{i,t}^T)^{-1}$,  (34)

given by the product of the cross covariance $M_{\omega s}^i(t) H_{i,t}^T$ between the signal $\omega$ and the observation $a_{N_i,t} = H_{i,t} s$ times the inverse of the observation's covariance $H_{i,t} M_{ss}^i(t) H_{i,t}^T$. We then have that the state's posterior mean $E_{i,t+1}[\omega]$ and posterior cross covariance $M_{\omega s}^i(t+1)$ after observing the neighboring actions $a_{N_i,t}$ are given by the recursions

    $E_{i,t+1}[\omega] = E_{i,t}[\omega] + K_\omega^i(t)(a_{N_i,t} - E_{i,t}[a_{N_i,t}])$,  (35)
    $M_{\omega s}^i(t+1) = M_{\omega s}^i(t) - K_\omega^i(t) H_{i,t} M_{ss}^i(t)$.  (36)

We emphasize that it is possible to write a similar variance update for the world state variance $M_{\omega\omega}^i(t) := E_{i,t}[(\omega - E_{i,t}[\omega])^2]$, but this is inconsequential to our argument. Further note that the somewhat unfamiliar form of the LMMSE gains $K_s^i(t)$ in (31) and $K_\omega^i(t)$ in (34) is due to the fact that the observation model $a_{N_i,t} = H_{i,t} s$ in (28) is noiseless.

We have therefore concluded that if we have linear actions as per (27), which is true as long as (20) is true, the propagation of the beliefs $P(\omega, s \mid h_{i,t})$ reduces to the recursive propagation of means and covariances in (31)-(36). From the expressions in (32) and (35), we can see that it is possible to write the state and private signal expectations at time $t + 1$ as the linear combinations $E_{i,t+1}[\omega] = k_{i,t+1}^T s$ and $E_{i,t+1}[s] = L_{i,t+1} s$, akin to those shown in (20). For the private signal MMSE, substitute $E_{i,t}[s] = L_{i,t} s$, $a_{N_i,t} = H_{i,t} s$, and $E_{i,t}[a_{N_i,t}] = H_{i,t} E_{i,t}[s] = H_{i,t} L_{i,t} s$ into (32) to conclude that if it is possible to write (20) at time $t$, we can also write it at time $t + 1$. Perhaps more importantly, these substitutions also yield a recursive formula that allows updating the matrices $L_{i,t}$ as

    $L_{i,t+1} = L_{i,t} + K_s^i(t)(H_{i,t} - H_{i,t} L_{i,t})$.  (37)

The same argument can be made for $E_{i,t}[\omega]$ to conclude that if $E_{i,t}[\omega] = k_{i,t}^T s$ at time $t$, it is also true at time $t + 1$, with the linear combination coefficients adhering to the recursion

    $k_{i,t+1}^T = k_{i,t}^T + K_\omega^i(t)(H_{i,t} - H_{i,t} L_{i,t})$.  (38)

To complete the inductive argument, we need to show that (20) is true at time $t = 0$, but this is obviously true because $E_{i,0}[\omega] = s_i$ and $E_{i,0}[s_j] = s_i$ for all agents.
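One full step of these recursions, the gains (31) and (34), the mean and covariance updates (32)-(33) and (35)-(36), and the coefficient propagation (37)-(38), fits in a few lines. A sketch, with a pseudoinverse in place of the inverse since the noiseless observation (28) can make the innovation covariance singular:

```python
# One step of the LMMSE recursions (31)-(36) and the coefficient recursions
# (37)-(38) for a single agent. pinv guards the noiseless observation model.
import numpy as np

def lmmse_step(H, Mss, Mws, Es, Ew, L, k, a_obs):
    S = H @ Mss @ H.T                        # innovation covariance
    Ks = Mss @ H.T @ np.linalg.pinv(S)       # (31)
    Kw = Mws @ H.T @ np.linalg.pinv(S)       # (34); Mws is a 1 x N row
    innov = a_obs - H @ Es                   # a_{N_i,t} - E_{i,t}[a_{N_i,t}]
    Es1  = Es + Ks @ innov                   # (32)
    Ew1  = Ew + float(Kw @ innov)            # (35)
    Mws1 = Mws - Kw @ H @ Mss                # (36), uses M_ss(t)
    Mss1 = Mss - Ks @ H @ Mss                # (33)
    L1   = L + Ks @ (H - H @ L)              # (37)
    k1   = k + (Kw @ (H - H @ L)).ravel()    # (38), row-vector form of k^T
    return Es1, Ew1, Mss1, Mws1, L1, k1
```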
The induction loop we just completed is sufficiently long so as to warrant retracing. We begin with the assumption that at time $t$ we can write MMSE estimates of the state of the world $\omega$ and the private signals $s$ as linear combinations of the private signals themselves as per (20). From this assumption it follows that equilibrium actions can be written as the linear combinations of private signals in (27). From here it follows that beliefs are propagated as per the LMMSE filter summarized in (31)-(36). A simple set of substitutions allows us to conclude that (20) is true at time $t + 1$, with the vector $k_{i,t+1}$ propagated as per (38) and the matrix $L_{i,t+1}$ propagated as in (37). The expressions in (20) and (27) are neither used to propagate beliefs nor to compute equilibrium actions. The actual operations carried out by each agent are summarized in Figures 1 and 2 and described in the following section.

QUADRATIC NETWORK GAME FILTER

To compute and play BNE strategies, each node runs a quadratic network game (QNG) filter. This filter entails a full network simulation in which agent $i$ maintains beliefs on the state of the world and the private signals of all other agents. These joint beliefs allow agent $i$ to form an implicit belief on all other actions $a_{j,t}$ for all $j \in N$, which he uses to find his equilibrium action $a_{i,t}$. The QNG filter at node $i$ is an implementation of the LMMSE filters defined by (32) and (35), followed by the play $a_{i,t} = v_{i,t}^T E_{i,t}[s]$. A block diagram for this filter is shown in Figure 1. At time $t$, the input to the filter is the observed actions $a_{N_i,t}$ of agent $i$'s neighbors. The prediction $E_{i,t}[a_{N_i,t}] = H_{i,t} E_{i,t}[s]$ of this vector is subtracted from the observed value, and the result is fed into two parallel blocks respectively tasked with updating the belief $E_{i,t}[\omega]$ on the state of the world $\omega$ and the belief $E_{i,t}[s]$ on the private signals $s$ of other agents.

To update the belief on $\omega$, we implement (35) by multiplying the innovation $a_{N_i,t} - E_{i,t}[a_{N_i,t}]$ by the gain $K_\omega^i(t)$ and adding the result to the previous state estimate $E_{i,t}[\omega]$. To update the belief on the private signals $s$, we multiply the innovation by the LMMSE gain $K_s^i(t)$. The result of this amplification is added to the previous private signal belief $E_{i,t}[s]$ as dictated by (32). To determine the equilibrium play, we multiply the private signal estimate $E_{i,t}[s]$ by the vector $v_{i,t}$ obtained by solving the system of linear equations in (26).

Observe that, in the QNG filter, we do not use the fact that the estimates $E_{i,t}[\omega]$ and $E_{i,t}[s]$ as well as the actions $a_{i,t}$ can be written as linear combinations of the private signals [cf. (20) and (27)]. While the expressions in (20) and (27) are certainly correct, they cannot be used for implementation, because $s$ is partially unknown to agent $i$. The role of (20) and (27) is to allow the derivation of recursions that we use to keep track of the gains used in the QNG filter. These recursions can be divided into a group of LMMSE updates and a group of game updates, as we show in Figure 2. As it follows from (31), (33), (34), and (36), the update of the LMMSE coefficients is identical to the gain and covariance updates of a conventional sequential LMMSE filter. The only peculiarity is that the observation matrix $H_{j,t}$ is fed from the game update block and is partially determined by the LMMSE gains and covariances of previous iterations. Nevertheless, this peculiarity is more associated with the game block than with the LMMSE block. The game block uses (37) and (38) to keep track of the matrices $L_{j,t}$ and vectors $k_{j,t}$.

[FIG1] Block diagram of the QNG filter at agent $i$. The QNG filter contains a mechanism for belief propagation and a mechanism to calculate equilibrium actions. Inside the dashed box, the belief propagation feedback loops that compute the estimates of $s$ and $\omega$ as linear combinations of the private signal estimates of the previous time are summarized. The observation prediction is subtracted from the observation to form the prediction error. Afterward, the belief propagation for $s$ and $\omega$ follows the same steps with different gains. The prediction error is multiplied by the corresponding gain matrix and added to the previous mean estimate to form the corrected estimate [see (32) and (35)]. Multiplying the corrected signal estimate with the action coefficient gives the equilibrium action. The gain coefficients are provided by the LMMSE block in Figure 2. The observation matrix and action coefficient are fed from the game block in Figure 2. While these coefficients can be calculated by each agent, the mean estimates and the equilibrium action $a_{i,t}$ can only be calculated by agent $i$.

[FIG2] Propagation of gains required to implement the QNG filter of Figure 1. Gains are separated into interacting (a) LMMSE and (b) game blocks. All agents perform a full network simulation in which they compute the gains of all other agents. The updates tabulated in the figure are:
(a) LMMSE block: $K_s^j(t) = M_{ss}^j(t) H_{j,t}^T (H_{j,t} M_{ss}^j(t) H_{j,t}^T)^{-1}$ (31); $K_\omega^j(t) = M_{\omega s}^j(t) H_{j,t}^T (H_{j,t} M_{ss}^j(t) H_{j,t}^T)^{-1}$ (34); $M_{ss}^j(t+1) = M_{ss}^j(t) - K_s^j(t) H_{j,t} M_{ss}^j(t)$ (33); $M_{\omega s}^j(t+1) = M_{\omega s}^j(t) - K_\omega^j(t) H_{j,t} M_{ss}^j(t)$ (36).
(b) Game block: $L_{j,t+1} = L_{j,t} + K_s^j(t)(H_{j,t} - H_{j,t} L_{j,t})$ (37); $k_{j,t+1}^T = k_{j,t}^T + K_\omega^j(t)(H_{j,t} - H_{j,t} L_{j,t})$ (38); $v_t$ from $L_t v_t = \delta k_t$ (26); $H_{j,t} := [v_{k_1,t}^T L_{k_1,t}; \ldots; v_{k_{d(j)},t}^T L_{k_{d(j)},t}]$ (28).
This is necessary because when we compute the play coefficients $v_{j,t}$ in the game block, agent $i$ builds the matrix $L_t$ that is formed by the blocks $L_{j,t}$ of all agents [see (25)]. This full network simulation is possible because the network topology and the private signal models are assumed to be common knowledge.
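Putting the pieces together, the following end-to-end sketch simulates the QNG filter on a small network: every agent tracks the (signal-independent) coefficients of all agents, solves (26) for the play coefficients, acts, and conditions on its neighbors' actions using the same recursions inlined. The network, $b$, $\delta$, and noise levels are hypothetical, and pseudoinverse and least-squares solves guard the rank-deficient early iterations.

```python
# End-to-end QNG filter sketch on a line network (hypothetical parameters).
import numpy as np

rng = np.random.default_rng(1)
N, T = 3, 8
nbrs = [[1], [0, 2], [1]]                        # line network 0 - 1 - 2
b = np.zeros((N, N)); b[0, 1] = b[2, 1] = 0.4; b[1, 0] = b[1, 2] = 0.3
delta = np.array([0.6, 0.4, 0.6])
C = 0.5 * np.ones(N)                             # signal noise variances
omega = 1.0
s = omega + rng.normal(0, np.sqrt(C))            # (19)

eye = np.eye(N)
L = [np.tile(eye[i], (N, 1)) for i in range(N)]  # (20) base case L_{i,0}
k = [eye[i].copy() for i in range(N)]            # k_{i,0}
Mss, Mws = [], []
for i in range(N):
    u = 1.0 - eye[i]                             # agent i's own signal is known
    Mss.append(C[i] * np.outer(u, u) + np.diag(C * u))  # (29) at t = 0
    Mws.append(C[i] * u)                                # (30) at t = 0
Es = [s[i] * np.ones(N) for i in range(N)]       # private mean estimates
Ew = [s[i] for i in range(N)]

for t in range(T):
    # Game block: assemble (25) and solve (26) for the play coefficients v_t.
    big = np.zeros((N * N, N * N)); rhs = np.zeros(N * N)
    for i in range(N):
        big[i*N:(i+1)*N, i*N:(i+1)*N] = L[i].T
        rhs[i*N:(i+1)*N] = delta[i] * k[i]
        for j in range(N):
            if j != i:
                big[i*N:(i+1)*N, j*N:(j+1)*N] = -b[i, j] * L[i].T @ L[j].T
    vflat, *_ = np.linalg.lstsq(big, rhs, rcond=None)   # min-norm if singular
    v = [vflat[i*N:(i+1)*N] for i in range(N)]
    a = np.array([v[i] @ Es[i] for i in range(N)])      # equilibrium plays
    H = [np.array([v[j] @ L[j] for j in nbrs[i]]) for i in range(N)]  # (28)
    print(t, np.round(a, 4))
    newL, newk = [], []
    for i in range(N):
        S  = np.linalg.pinv(H[i] @ Mss[i] @ H[i].T)
        Ks = Mss[i] @ H[i].T @ S                        # (31)
        Kw = Mws[i] @ H[i].T @ S                        # (34)
        innov = a[nbrs[i]] - H[i] @ Es[i]
        Es[i] = Es[i] + Ks @ innov                      # (32)
        Ew[i] = Ew[i] + Kw @ innov                      # (35)
        newL.append(L[i] + Ks @ (H[i] - H[i] @ L[i]))   # (37)
        newk.append(k[i] + Kw @ (H[i] - H[i] @ L[i]))   # (38)
        Mws[i] = Mws[i] - Kw @ H[i] @ Mss[i]            # (36), uses M_ss(t)
        Mss[i] = Mss[i] - Ks @ H[i] @ Mss[i]            # (33)
    L, k = newL, newk
```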


More information

Appendix - Normally Distributed Admissible Choices are Optimal

Appendix - Normally Distributed Admissible Choices are Optimal Appendx - Normally Dstrbuted Admssble Choces are Optmal James N. Bodurtha, Jr. McDonough School of Busness Georgetown Unversty and Q Shen Stafford Partners Aprl 994 latest revson September 00 Abstract

More information

Quadratic Games. First version: February 24, 2017 This version: December 12, Abstract

Quadratic Games. First version: February 24, 2017 This version: December 12, Abstract Quadratc Games Ncolas S. Lambert Gorgo Martn Mchael Ostrovsky Frst verson: February 24, 2017 Ths verson: December 12, 2017 Abstract We study general quadratc games wth mult-dmensonal actons, stochastc

More information

Consumption Based Asset Pricing

Consumption Based Asset Pricing Consumpton Based Asset Prcng Mchael Bar Aprl 25, 208 Contents Introducton 2 Model 2. Prcng rsk-free asset............................... 3 2.2 Prcng rsky assets................................ 4 2.3 Bubbles......................................

More information

Quadratic Games. First version: February 24, 2017 This version: August 3, Abstract

Quadratic Games. First version: February 24, 2017 This version: August 3, Abstract Quadratc Games Ncolas S. Lambert Gorgo Martn Mchael Ostrovsky Frst verson: February 24, 2017 Ths verson: August 3, 2018 Abstract We study general quadratc games wth multdmensonal actons, stochastc payoff

More information

/ Computational Genomics. Normalization

/ Computational Genomics. Normalization 0-80 /02-70 Computatonal Genomcs Normalzaton Gene Expresson Analyss Model Computatonal nformaton fuson Bologcal regulatory networks Pattern Recognton Data Analyss clusterng, classfcaton normalzaton, mss.

More information

ECONOMETRICS - FINAL EXAM, 3rd YEAR (GECO & GADE)

ECONOMETRICS - FINAL EXAM, 3rd YEAR (GECO & GADE) ECONOMETRICS - FINAL EXAM, 3rd YEAR (GECO & GADE) May 17, 2016 15:30 Frst famly name: Name: DNI/ID: Moble: Second famly Name: GECO/GADE: Instructor: E-mal: Queston 1 A B C Blank Queston 2 A B C Blank Queston

More information

TCOM501 Networking: Theory & Fundamentals Final Examination Professor Yannis A. Korilis April 26, 2002

TCOM501 Networking: Theory & Fundamentals Final Examination Professor Yannis A. Korilis April 26, 2002 TO5 Networng: Theory & undamentals nal xamnaton Professor Yanns. orls prl, Problem [ ponts]: onsder a rng networ wth nodes,,,. In ths networ, a customer that completes servce at node exts the networ wth

More information

Price and Quantity Competition Revisited. Abstract

Price and Quantity Competition Revisited. Abstract rce and uantty Competton Revsted X. Henry Wang Unversty of Mssour - Columba Abstract By enlargng the parameter space orgnally consdered by Sngh and Vves (984 to allow for a wder range of cost asymmetry,

More information

FORD MOTOR CREDIT COMPANY SUGGESTED ANSWERS. Richard M. Levich. New York University Stern School of Business. Revised, February 1999

FORD MOTOR CREDIT COMPANY SUGGESTED ANSWERS. Richard M. Levich. New York University Stern School of Business. Revised, February 1999 FORD MOTOR CREDIT COMPANY SUGGESTED ANSWERS by Rchard M. Levch New York Unversty Stern School of Busness Revsed, February 1999 1 SETTING UP THE PROBLEM The bond s beng sold to Swss nvestors for a prce

More information

Tests for Two Ordered Categorical Variables

Tests for Two Ordered Categorical Variables Chapter 253 Tests for Two Ordered Categorcal Varables Introducton Ths module computes power and sample sze for tests of ordered categorcal data such as Lkert scale data. Assumng proportonal odds, such

More information

Financial mathematics

Financial mathematics Fnancal mathematcs Jean-Luc Bouchot jean-luc.bouchot@drexel.edu February 19, 2013 Warnng Ths s a work n progress. I can not ensure t to be mstake free at the moment. It s also lackng some nformaton. But

More information

Quiz on Deterministic part of course October 22, 2002

Quiz on Deterministic part of course October 22, 2002 Engneerng ystems Analyss for Desgn Quz on Determnstc part of course October 22, 2002 Ths s a closed book exercse. You may use calculators Grade Tables There are 90 ponts possble for the regular test, or

More information

Lecture Note 1: Foundations 1

Lecture Note 1: Foundations 1 Economcs 703 Advanced Mcroeconomcs Prof. Peter Cramton ecture Note : Foundatons Outlne A. Introducton and Examples B. Formal Treatment. Exstence of Nash Equlbrum. Exstence wthout uas-concavty 3. Perfect

More information

Multifactor Term Structure Models

Multifactor Term Structure Models 1 Multfactor Term Structure Models A. Lmtatons of One-Factor Models 1. Returns on bonds of all maturtes are perfectly correlated. 2. Term structure (and prces of every other dervatves) are unquely determned

More information

Scribe: Chris Berlind Date: Feb 1, 2010

Scribe: Chris Berlind Date: Feb 1, 2010 CS/CNS/EE 253: Advanced Topcs n Machne Learnng Topc: Dealng wth Partal Feedback #2 Lecturer: Danel Golovn Scrbe: Chrs Berlnd Date: Feb 1, 2010 8.1 Revew In the prevous lecture we began lookng at algorthms

More information

Appendix for Solving Asset Pricing Models when the Price-Dividend Function is Analytic

Appendix for Solving Asset Pricing Models when the Price-Dividend Function is Analytic Appendx for Solvng Asset Prcng Models when the Prce-Dvdend Functon s Analytc Ovdu L. Caln Yu Chen Thomas F. Cosmano and Alex A. Hmonas January 3, 5 Ths appendx provdes proofs of some results stated n our

More information

Fast Laplacian Solvers by Sparsification

Fast Laplacian Solvers by Sparsification Spectral Graph Theory Lecture 19 Fast Laplacan Solvers by Sparsfcaton Danel A. Spelman November 9, 2015 Dsclamer These notes are not necessarly an accurate representaton of what happened n class. The notes

More information

Taxation and Externalities. - Much recent discussion of policy towards externalities, e.g., global warming debate/kyoto

Taxation and Externalities. - Much recent discussion of policy towards externalities, e.g., global warming debate/kyoto Taxaton and Externaltes - Much recent dscusson of polcy towards externaltes, e.g., global warmng debate/kyoto - Increasng share of tax revenue from envronmental taxaton 6 percent n OECD - Envronmental

More information

Economics 1410 Fall Section 7 Notes 1. Define the tax in a flexible way using T (z), where z is the income reported by the agent.

Economics 1410 Fall Section 7 Notes 1. Define the tax in a flexible way using T (z), where z is the income reported by the agent. Economcs 1410 Fall 2017 Harvard Unversty Yaan Al-Karableh Secton 7 Notes 1 I. The ncome taxaton problem Defne the tax n a flexble way usng T (), where s the ncome reported by the agent. Retenton functon:

More information

Linear Combinations of Random Variables and Sampling (100 points)

Linear Combinations of Random Variables and Sampling (100 points) Economcs 30330: Statstcs for Economcs Problem Set 6 Unversty of Notre Dame Instructor: Julo Garín Sprng 2012 Lnear Combnatons of Random Varables and Samplng 100 ponts 1. Four-part problem. Go get some

More information

Likelihood Fits. Craig Blocker Brandeis August 23, 2004

Likelihood Fits. Craig Blocker Brandeis August 23, 2004 Lkelhood Fts Crag Blocker Brandes August 23, 2004 Outlne I. What s the queston? II. Lkelhood Bascs III. Mathematcal Propertes IV. Uncertantes on Parameters V. Mscellaneous VI. Goodness of Ft VII. Comparson

More information

Notes on experimental uncertainties and their propagation

Notes on experimental uncertainties and their propagation Ed Eyler 003 otes on epermental uncertantes and ther propagaton These notes are not ntended as a complete set of lecture notes, but nstead as an enumeraton of some of the key statstcal deas needed to obtan

More information

Lecture Note 2 Time Value of Money

Lecture Note 2 Time Value of Money Seg250 Management Prncples for Engneerng Managers Lecture ote 2 Tme Value of Money Department of Systems Engneerng and Engneerng Management The Chnese Unversty of Hong Kong Interest: The Cost of Money

More information

Discounted Cash Flow (DCF) Analysis: What s Wrong With It And How To Fix It

Discounted Cash Flow (DCF) Analysis: What s Wrong With It And How To Fix It Dscounted Cash Flow (DCF Analyss: What s Wrong Wth It And How To Fx It Arturo Cfuentes (* CREM Facultad de Economa y Negocos Unversdad de Chle June 2014 (* Jont effort wth Francsco Hawas; Depto. de Ingenera

More information

Measures of Spread IQR and Deviation. For exam X, calculate the mean, median and mode. For exam Y, calculate the mean, median and mode.

Measures of Spread IQR and Deviation. For exam X, calculate the mean, median and mode. For exam Y, calculate the mean, median and mode. Part 4 Measures of Spread IQR and Devaton In Part we learned how the three measures of center offer dfferent ways of provdng us wth a sngle representatve value for a data set. However, consder the followng

More information

Understanding Annuities. Some Algebraic Terminology.

Understanding Annuities. Some Algebraic Terminology. Understandng Annutes Ma 162 Sprng 2010 Ma 162 Sprng 2010 March 22, 2010 Some Algebrac Termnology We recall some terms and calculatons from elementary algebra A fnte sequence of numbers s a functon of natural

More information

Prospect Theory and Asset Prices

Prospect Theory and Asset Prices Fnance 400 A. Penat - G. Pennacch Prospect Theory and Asset Prces These notes consder the asset prcng mplcatons of nvestor behavor that ncorporates Prospect Theory. It summarzes an artcle by N. Barbers,

More information

4. Greek Letters, Value-at-Risk

4. Greek Letters, Value-at-Risk 4 Greek Letters, Value-at-Rsk 4 Value-at-Rsk (Hull s, Chapter 8) Math443 W08, HM Zhu Outlne (Hull, Chap 8) What s Value at Rsk (VaR)? Hstorcal smulatons Monte Carlo smulatons Model based approach Varance-covarance

More information

REFINITIV INDICES PRIVATE EQUITY BUYOUT INDEX METHODOLOGY

REFINITIV INDICES PRIVATE EQUITY BUYOUT INDEX METHODOLOGY REFINITIV INDICES PRIVATE EQUITY BUYOUT INDEX METHODOLOGY 1 Table of Contents INTRODUCTION 3 TR Prvate Equty Buyout Index 3 INDEX COMPOSITION 3 Sector Portfolos 4 Sector Weghtng 5 Index Rebalance 5 Index

More information

CHAPTER 9 FUNCTIONAL FORMS OF REGRESSION MODELS

CHAPTER 9 FUNCTIONAL FORMS OF REGRESSION MODELS CHAPTER 9 FUNCTIONAL FORMS OF REGRESSION MODELS QUESTIONS 9.1. (a) In a log-log model the dependent and all explanatory varables are n the logarthmc form. (b) In the log-ln model the dependent varable

More information

Analysis of Variance and Design of Experiments-II

Analysis of Variance and Design of Experiments-II Analyss of Varance and Desgn of Experments-II MODULE VI LECTURE - 4 SPLIT-PLOT AND STRIP-PLOT DESIGNS Dr. Shalabh Department of Mathematcs & Statstcs Indan Insttute of Technology Kanpur An example to motvate

More information

On the use of menus in sequential common agency

On the use of menus in sequential common agency Games and Economc Behavor 6 (2008) 329 33 www.elsever.com/locate/geb Note On the use of menus n sequental common agency Gacomo Calzolar a, Alessandro Pavan b, a Department of Economcs, Unversty of Bologna,

More information

Teaching Note on Factor Model with a View --- A tutorial. This version: May 15, Prepared by Zhi Da *

Teaching Note on Factor Model with a View --- A tutorial. This version: May 15, Prepared by Zhi Da * Copyrght by Zh Da and Rav Jagannathan Teachng Note on For Model th a Ve --- A tutoral Ths verson: May 5, 2005 Prepared by Zh Da * Ths tutoral demonstrates ho to ncorporate economc ves n optmal asset allocaton

More information

Jeffrey Ely. October 7, This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.

Jeffrey Ely. October 7, This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. October 7, 2012 Ths work s lcensed under the Creatve Commons Attrbuton-NonCommercal-ShareAlke 3.0 Lcense. Recap We saw last tme that any standard of socal welfare s problematc n a precse sense. If we want

More information

Online Appendix for Merger Review for Markets with Buyer Power

Online Appendix for Merger Review for Markets with Buyer Power Onlne Appendx for Merger Revew for Markets wth Buyer Power Smon Loertscher Lesle M. Marx July 23, 2018 Introducton In ths appendx we extend the framework of Loertscher and Marx (forthcomng) to allow two

More information

Finance 402: Problem Set 1 Solutions

Finance 402: Problem Set 1 Solutions Fnance 402: Problem Set 1 Solutons Note: Where approprate, the fnal answer for each problem s gven n bold talcs for those not nterested n the dscusson of the soluton. 1. The annual coupon rate s 6%. A

More information

Creating a zero coupon curve by bootstrapping with cubic splines.

Creating a zero coupon curve by bootstrapping with cubic splines. MMA 708 Analytcal Fnance II Creatng a zero coupon curve by bootstrappng wth cubc splnes. erg Gryshkevych Professor: Jan R. M. Röman 0.2.200 Dvson of Appled Mathematcs chool of Educaton, Culture and Communcaton

More information

Introduction to PGMs: Discrete Variables. Sargur Srihari

Introduction to PGMs: Discrete Variables. Sargur Srihari Introducton to : Dscrete Varables Sargur srhar@cedar.buffalo.edu Topcs. What are graphcal models (or ) 2. Use of Engneerng and AI 3. Drectonalty n graphs 4. Bayesan Networks 5. Generatve Models and Samplng

More information

General Examination in Microeconomic Theory. Fall You have FOUR hours. 2. Answer all questions

General Examination in Microeconomic Theory. Fall You have FOUR hours. 2. Answer all questions HARVARD UNIVERSITY DEPARTMENT OF ECONOMICS General Examnaton n Mcroeconomc Theory Fall 2010 1. You have FOUR hours. 2. Answer all questons PLEASE USE A SEPARATE BLUE BOOK FOR EACH QUESTION AND WRITE THE

More information

Dynamic Analysis of Knowledge Sharing of Agents with. Heterogeneous Knowledge

Dynamic Analysis of Knowledge Sharing of Agents with. Heterogeneous Knowledge Dynamc Analyss of Sharng of Agents wth Heterogeneous Kazuyo Sato Akra Namatame Dept. of Computer Scence Natonal Defense Academy Yokosuka 39-8686 JAPAN E-mal {g40045 nama} @nda.ac.jp Abstract In ths paper

More information

occurrence of a larger storm than our culvert or bridge is barely capable of handling? (what is The main question is: What is the possibility of

occurrence of a larger storm than our culvert or bridge is barely capable of handling? (what is The main question is: What is the possibility of Module 8: Probablty and Statstcal Methods n Water Resources Engneerng Bob Ptt Unversty of Alabama Tuscaloosa, AL Flow data are avalable from numerous USGS operated flow recordng statons. Data s usually

More information

Meaningful cheap talk must improve equilibrium payoffs

Meaningful cheap talk must improve equilibrium payoffs Mathematcal Socal Scences 37 (1999) 97 106 Meanngful cheap talk must mprove equlbrum payoffs Lanny Arvan, Luıs Cabral *, Vasco Santos a b, c a Unversty of Illnos at Urbana-Champagn, Department of Economcs,

More information

Shock Propagation Through Cross-Learning with Costly Price Acquisition

Shock Propagation Through Cross-Learning with Costly Price Acquisition Shock Propagaton Through Cross-Learnng wth Costly Prce Acquston Jan Schneemeer Indana Unversty - Kelley School of Busness October 23, 2017 Abstract Ths paper shows that cross-learnng from other frms stock

More information

S yi a bx i cx yi a bx i cx 2 i =0. yi a bx i cx 2 i xi =0. yi a bx i cx 2 i x

S yi a bx i cx yi a bx i cx 2 i =0. yi a bx i cx 2 i xi =0. yi a bx i cx 2 i x LEAST-SQUARES FIT (Chapter 8) Ft the best straght lne (parabola, etc.) to a gven set of ponts. Ths wll be done by mnmzng the sum of squares of the vertcal dstances (called resduals) from the ponts to the

More information

Mechanisms for Efficient Allocation in Divisible Capacity Networks

Mechanisms for Efficient Allocation in Divisible Capacity Networks Mechansms for Effcent Allocaton n Dvsble Capacty Networks Antons Dmaks, Rahul Jan and Jean Walrand EECS Department Unversty of Calforna, Berkeley {dmaks,ran,wlr}@eecs.berkeley.edu Abstract We propose a

More information

Chapter 10 Making Choices: The Method, MARR, and Multiple Attributes

Chapter 10 Making Choices: The Method, MARR, and Multiple Attributes Chapter 0 Makng Choces: The Method, MARR, and Multple Attrbutes INEN 303 Sergy Butenko Industral & Systems Engneerng Texas A&M Unversty Comparng Mutually Exclusve Alternatves by Dfferent Evaluaton Methods

More information

Optimization in portfolio using maximum downside deviation stochastic programming model

Optimization in portfolio using maximum downside deviation stochastic programming model Avalable onlne at www.pelagaresearchlbrary.com Advances n Appled Scence Research, 2010, 1 (1): 1-8 Optmzaton n portfolo usng maxmum downsde devaton stochastc programmng model Khlpah Ibrahm, Anton Abdulbasah

More information

Common Belief Foundations of Global Games

Common Belief Foundations of Global Games Common Belef Foundatons of Global Games Stephen Morrs Prnceton Unversty Hyun Song Shn Bank for Internatonal Settlements Aprl 2015 Muhamet Yldz M.I.T. Abstract Ths paper makes two related contrbutons to

More information

A Utilitarian Approach of the Rawls s Difference Principle

A Utilitarian Approach of the Rawls s Difference Principle 1 A Utltaran Approach of the Rawls s Dfference Prncple Hyeok Yong Kwon a,1, Hang Keun Ryu b,2 a Department of Poltcal Scence, Korea Unversty, Seoul, Korea, 136-701 b Department of Economcs, Chung Ang Unversty,

More information

Interval Estimation for a Linear Function of. Variances of Nonnormal Distributions. that Utilize the Kurtosis

Interval Estimation for a Linear Function of. Variances of Nonnormal Distributions. that Utilize the Kurtosis Appled Mathematcal Scences, Vol. 7, 013, no. 99, 4909-4918 HIKARI Ltd, www.m-hkar.com http://dx.do.org/10.1988/ams.013.37366 Interval Estmaton for a Lnear Functon of Varances of Nonnormal Dstrbutons that

More information

Static (or Simultaneous- Move) Games of Complete Information

Static (or Simultaneous- Move) Games of Complete Information Statc (or Smultaneous- Move) Games of Complete Informaton Nash Equlbrum Best Response Functon F. Valognes - Game Theory - Chp 3 Outlne of Statc Games of Complete Informaton Introducton to games Normal-form

More information

Foundations of Machine Learning II TP1: Entropy

Foundations of Machine Learning II TP1: Entropy Foundatons of Machne Learnng II TP1: Entropy Gullaume Charpat (Teacher) & Gaétan Marceau Caron (Scrbe) Problem 1 (Gbbs nequalty). Let p and q two probablty measures over a fnte alphabet X. Prove that KL(p

More information

Topics on the Border of Economics and Computation November 6, Lecture 2

Topics on the Border of Economics and Computation November 6, Lecture 2 Topcs on the Border of Economcs and Computaton November 6, 2005 Lecturer: Noam Nsan Lecture 2 Scrbe: Arel Procacca 1 Introducton Last week we dscussed the bascs of zero-sum games n strategc form. We characterzed

More information

Bid-auction framework for microsimulation of location choice with endogenous real estate prices

Bid-auction framework for microsimulation of location choice with endogenous real estate prices Bd-aucton framework for mcrosmulaton of locaton choce wth endogenous real estate prces Rcardo Hurtuba Mchel Berlare Francsco Martínez Urbancs Termas de Chllán, Chle March 28 th 2012 Outlne 1) Motvaton

More information

Solution of periodic review inventory model with general constrains

Solution of periodic review inventory model with general constrains Soluton of perodc revew nventory model wth general constrans Soluton of perodc revew nventory model wth general constrans Prof Dr J Benkő SZIU Gödöllő Summary Reasons for presence of nventory (stock of

More information

Incorrect Beliefs. Overconfidence. Types of Overconfidence. Outline. Overprecision 4/15/2017. Behavioral Economics Mark Dean Spring 2017

Incorrect Beliefs. Overconfidence. Types of Overconfidence. Outline. Overprecision 4/15/2017. Behavioral Economics Mark Dean Spring 2017 Incorrect Belefs Overconfdence Behavoral Economcs Mark Dean Sprng 2017 In objectve EU we assumed that everyone agreed on what the probabltes of dfferent events were In subjectve expected utlty theory we

More information

CHAPTER 3: BAYESIAN DECISION THEORY

CHAPTER 3: BAYESIAN DECISION THEORY CHATER 3: BAYESIAN DECISION THEORY Decson makng under uncertanty 3 rogrammng computers to make nference from data requres nterdscplnary knowledge from statstcs and computer scence Knowledge of statstcs

More information

ISE High Income Index Methodology

ISE High Income Index Methodology ISE Hgh Income Index Methodology Index Descrpton The ISE Hgh Income Index s desgned to track the returns and ncome of the top 30 U.S lsted Closed-End Funds. Index Calculaton The ISE Hgh Income Index s

More information

Clearing Notice SIX x-clear Ltd

Clearing Notice SIX x-clear Ltd Clearng Notce SIX x-clear Ltd 1.0 Overvew Changes to margn and default fund model arrangements SIX x-clear ( x-clear ) s closely montorng the CCP envronment n Europe as well as the needs of ts Members.

More information

Mechanism Design in Hidden Action and Hidden Information: Richness and Pure Groves

Mechanism Design in Hidden Action and Hidden Information: Richness and Pure Groves 1 December 13, 2016, Unversty of Tokyo Mechansm Desgn n Hdden Acton and Hdden Informaton: Rchness and Pure Groves Htosh Matsushma (Unversty of Tokyo) Shunya Noda (Stanford Unversty) May 30, 2016 2 1. Introducton

More information

AC : THE DIAGRAMMATIC AND MATHEMATICAL APPROACH OF PROJECT TIME-COST TRADEOFFS

AC : THE DIAGRAMMATIC AND MATHEMATICAL APPROACH OF PROJECT TIME-COST TRADEOFFS AC 2008-1635: THE DIAGRAMMATIC AND MATHEMATICAL APPROACH OF PROJECT TIME-COST TRADEOFFS Kun-jung Hsu, Leader Unversty Amercan Socety for Engneerng Educaton, 2008 Page 13.1217.1 Ttle of the Paper: The Dagrammatc

More information

Economic Design of Short-Run CSP-1 Plan Under Linear Inspection Cost

Economic Design of Short-Run CSP-1 Plan Under Linear Inspection Cost Tamkang Journal of Scence and Engneerng, Vol. 9, No 1, pp. 19 23 (2006) 19 Economc Desgn of Short-Run CSP-1 Plan Under Lnear Inspecton Cost Chung-Ho Chen 1 * and Chao-Yu Chou 2 1 Department of Industral

More information

Aligned Delegation. Alexander Frankel Stanford GSB. December 2010

Aligned Delegation. Alexander Frankel Stanford GSB. December 2010 Algned Delegaton Alexander Frankel Stanford GSB December 2010 Abstract A prncpal delegates multple decsons to an agent, who has prvate nformaton relevant to each decson. The prncpal s uncertan about the

More information

Cyclic Scheduling in a Job shop with Multiple Assembly Firms

Cyclic Scheduling in a Job shop with Multiple Assembly Firms Proceedngs of the 0 Internatonal Conference on Industral Engneerng and Operatons Management Kuala Lumpur, Malaysa, January 4, 0 Cyclc Schedulng n a Job shop wth Multple Assembly Frms Tetsuya Kana and Koch

More information

Optimising a general repair kit problem with a service constraint

Optimising a general repair kit problem with a service constraint Optmsng a general repar kt problem wth a servce constrant Marco Bjvank 1, Ger Koole Department of Mathematcs, VU Unversty Amsterdam, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands Irs F.A. Vs Department

More information

SIMPLE FIXED-POINT ITERATION

SIMPLE FIXED-POINT ITERATION SIMPLE FIXED-POINT ITERATION The fed-pont teraton method s an open root fndng method. The method starts wth the equaton f ( The equaton s then rearranged so that one s one the left hand sde of the equaton

More information

ISyE 512 Chapter 9. CUSUM and EWMA Control Charts. Instructor: Prof. Kaibo Liu. Department of Industrial and Systems Engineering UW-Madison

ISyE 512 Chapter 9. CUSUM and EWMA Control Charts. Instructor: Prof. Kaibo Liu. Department of Industrial and Systems Engineering UW-Madison ISyE 512 hapter 9 USUM and EWMA ontrol harts Instructor: Prof. Kabo Lu Department of Industral and Systems Engneerng UW-Madson Emal: klu8@wsc.edu Offce: Room 317 (Mechancal Engneerng Buldng) ISyE 512 Instructor:

More information

Bayesian belief networks

Bayesian belief networks CS 2750 achne Learnng Lecture 12 ayesan belef networks los Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square CS 2750 achne Learnng Densty estmaton Data: D { D1 D2.. Dn} D x a vector of attrbute values ttrbutes:

More information

Maximum Likelihood Estimation of Isotonic Normal Means with Unknown Variances*

Maximum Likelihood Estimation of Isotonic Normal Means with Unknown Variances* Journal of Multvarate Analyss 64, 183195 (1998) Artcle No. MV971717 Maxmum Lelhood Estmaton of Isotonc Normal Means wth Unnown Varances* Nng-Zhong Sh and Hua Jang Northeast Normal Unversty, Changchun,Chna

More information

OCR Statistics 1 Working with data. Section 2: Measures of location

OCR Statistics 1 Working with data. Section 2: Measures of location OCR Statstcs 1 Workng wth data Secton 2: Measures of locaton Notes and Examples These notes have sub-sectons on: The medan Estmatng the medan from grouped data The mean Estmatng the mean from grouped data

More information

A Comparison of Statistical Methods in Interrupted Time Series Analysis to Estimate an Intervention Effect

A Comparison of Statistical Methods in Interrupted Time Series Analysis to Estimate an Intervention Effect Transport and Road Safety (TARS) Research Joanna Wang A Comparson of Statstcal Methods n Interrupted Tme Seres Analyss to Estmate an Interventon Effect Research Fellow at Transport & Road Safety (TARS)

More information