The Option-Critic Architecture


Pierre-Luc Bacon, Jean Harb and Doina Precup
Reasoning and Learning Lab, School of Computer Science, McGill University
{pbacon, jharb, ...}
arXiv v2 [cs.AI], 3 Dec 2016

Abstract

Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals. Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework.

Introduction

Temporal abstraction allows representing knowledge about courses of action that take place at different time scales. In reinforcement learning, options (Sutton, Precup, and Singh 1999; Precup 2000) provide a framework for defining such courses of action and for seamlessly learning and planning with them. Discovering temporal abstractions autonomously has been the subject of extensive research efforts in the last 15 years (McGovern and Barto 2001; Stolle and Precup 2002; Menache, Mannor, and Shimkin 2002; Şimşek and Barto 2009; Silver and Ciosek 2012), but approaches that can be used naturally with continuous state and/or action spaces have only recently started to become feasible (Konidaris et al. 2011; Niekum 2013; Mann, Mannor, and Precup 2015; Mankowitz, Mann, and Mannor 2016; Kulkarni et al. 2016; Vezhnevets et al. 2016; Daniel et al. 2016).

The majority of the existing work has focused on finding subgoals (useful states that an agent should reach) and subsequently learning policies to achieve them. This idea has led to interesting methods, but ones which are also difficult to scale up given their combinatorial flavor. Additionally, learning policies associated with subgoals can be expensive in terms of data and computation time; in the worst case, it can be as expensive as solving the entire task.

We present an alternative view, which blurs the line between the problem of discovering options and that of learning options. Based on the policy gradient theorem (Sutton et al. 2000), we derive new results which enable a gradual learning process of the intra-option policies and termination functions, simultaneously with the policy over them. This approach works naturally with both linear and non-linear function approximators, under discrete or continuous state and action spaces. Existing methods for learning options are considerably slower when learning from a single task: much of the benefit comes from re-using the learned options in similar tasks. In contrast, we show that our approach is capable of successfully learning options within a single task without incurring any slowdown and while still providing benefits for transfer learning.

We start by reviewing background related to the two main ingredients of our work: policy gradient methods and options. We then describe the core ideas of our approach: the intra-option policy and termination gradient theorems. Additional technical details are included in the appendix. We present experimental results showing that our approach learns meaningful temporally extended behaviors in an effective manner. As opposed to other methods, we only need to specify the number of desired options; it is not necessary to have subgoals, extra rewards, demonstrations, multiple problems or any other special accommodations (however, the approach can take advantage of pseudo-reward functions if desired). To our knowledge, this is the first end-to-end approach for learning options that scales to very large domains at comparable efficiency.

Preliminaries and Notation

A Markov Decision Process consists of a set of states S, a set of actions A, a transition function P : S × A → (S → [0, 1]) and a reward function r : S × A → ℝ. For convenience, we develop our ideas assuming discrete state and action sets. However, our results extend to continuous spaces using the usual measure-theoretic assumptions (some of our empirical results are in continuous tasks). A (Markovian stationary) policy is a probability distribution over actions conditioned on states, π : S × A → [0, 1]. In discounted problems, the value function of a policy π is defined as the expected return $V^\pi(s) = \mathbb{E}_\pi\big[\sum_{t=0}^{\infty} \gamma^t r_{t+1} \mid s_0 = s\big]$ and its action-value function as $Q^\pi(s, a) = \mathbb{E}_\pi\big[\sum_{t=0}^{\infty} \gamma^t r_{t+1} \mid s_0 = s, a_0 = a\big]$, where γ ∈ [0, 1) is the discount factor. A policy π is greedy with respect to a given action-value function Q if π(s, a) > 0 iff a = argmax_{a'} Q(s, a'). In a discrete MDP, there is at least one optimal policy which is greedy with respect to its own action-value function.
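
To make the preliminaries concrete, here is a minimal sketch (not from the paper) of estimating the value function defined above by Monte Carlo rollouts; the env interface (reset/step) and the policy sampler are hypothetical stand-ins introduced only for illustration.

    # Minimal sketch: Monte Carlo estimate of V^pi(s_0) = E_pi[sum_t gamma^t r_{t+1} | s_0].
    # `env` and `policy` are assumed interfaces, not part of the paper.
    import numpy as np

    def estimate_value(env, policy, gamma=0.99, episodes=1000, max_steps=500):
        returns = []
        for _ in range(episodes):
            s = env.reset()
            g, discount = 0.0, 1.0
            for _ in range(max_steps):
                a = policy(s)                     # sample a ~ pi(.|s)
                s, r, done = env.step(a)          # assumed to return (next state, reward, done)
                g += discount * r
                discount *= gamma
                if done:
                    break
            returns.append(g)
        return np.mean(returns)                   # estimate of V^pi(s_0)
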

Policy gradient methods (Sutton et al. 2000; Konda and Tsitsiklis 2000) address the problem of finding a good policy by performing stochastic gradient descent to optimize a performance objective over a given family of parametrized stochastic policies, π_θ. The policy gradient theorem (Sutton et al. 2000) provides expressions for the gradient of the average reward and discounted reward objectives with respect to θ. In the discounted setting, the objective is defined with respect to a designated start state (or distribution) s_0: $\rho(\theta, s_0) = \mathbb{E}_{\pi_\theta}\big[\sum_{t=0}^{\infty} \gamma^t r_{t+1} \mid s_0\big]$. The policy gradient theorem shows that:

$$\frac{\partial \rho(\theta, s_0)}{\partial \theta} = \sum_s \mu_{\pi_\theta}(s \mid s_0) \sum_a \frac{\partial \pi_\theta(a \mid s)}{\partial \theta}\, Q^{\pi_\theta}(s, a),$$

where $\mu_{\pi_\theta}(s \mid s_0) = \sum_{t=0}^{\infty} \gamma^t P(s_t = s \mid s_0)$ is a discounted weighting of the states along the trajectories starting from s_0. In practice, the policy gradient is estimated from samples along the on-policy stationary distribution. (Thomas 2014) showed that neglecting the discount factor in this stationary distribution makes the usual policy gradient estimator biased. However, correcting for this discrepancy also reduces data efficiency. For simplicity, we build on the framework of (Sutton et al. 2000) and discuss how to extend our results according to (Thomas 2014).

The options framework (Sutton, Precup, and Singh 1999; Precup 2000) formalizes the idea of temporally extended actions. A Markovian option ω ∈ Ω is a triple (I_ω, π_ω, β_ω) in which I_ω ⊆ S is an initiation set, π_ω is an intra-option policy, and β_ω : S → [0, 1] is a termination function. We also assume that ∀s ∈ S, ∀ω ∈ Ω : s ∈ I_ω (i.e., all options are available everywhere), an assumption made in the majority of option discovery algorithms. We will discuss how to dispense with this assumption in the final section. (Sutton, Precup, and Singh 1999; Precup 2000) show that an MDP endowed with a set of options becomes a Semi-Markov Decision Process (Puterman 1994, chapter 11), which has a corresponding optimal value function over options V_Ω(s) and option-value function Q_Ω(s, ω). Learning and planning algorithms for MDPs have their counterparts in this setting. However, the existence of the underlying MDP offers the possibility of learning about many different options in parallel: this is the idea of intra-option learning, which we leverage in our work.

Learning Options

We adopt a continual perspective on the problem of learning options. At any time, we would like to distill all of the available experience into every component of our system: value function and policy over options, intra-option policies and termination functions. To achieve this goal, we focus on learning option policies and termination functions, assuming they are represented using differentiable parameterized function approximators.

We consider the call-and-return option execution model, in which an agent picks option ω according to its policy over options π_Ω, then follows the intra-option policy π_ω until termination (as dictated by β_ω), at which point this procedure is repeated. Let π_{ω,θ} denote the intra-option policy of option ω parametrized by θ and β_{ω,ϑ} the termination function of ω parametrized by ϑ. We present two new results for learning options, obtained using as a blueprint the policy gradient theorem (Sutton et al. 2000). Both results are derived under the assumption that the goal is to learn options that maximize the expected return in the current task. However, if one wanted to add extra information to the objective function, this could readily be done so long as it comes in the form of an additive differentiable function.
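
The call-and-return execution model described above can be sketched as follows; the environment interface and the dictionaries of intra-option policies and termination functions are illustrative assumptions, not the authors' code.

    # Minimal sketch of call-and-return execution: pick an option with pi_Omega, follow its
    # intra-option policy until its termination function fires, then repeat.
    import numpy as np

    def run_call_and_return(env, pi_Omega, intra_policies, terminations, max_steps=1000):
        rng = np.random.default_rng()
        s = env.reset()
        w = pi_Omega(s)                                  # choose an option
        trajectory = []
        for _ in range(max_steps):
            a = intra_policies[w](s)                     # a ~ pi_{w,theta}(.|s)
            s_next, r, done = env.step(a)
            trajectory.append((s, w, a, r))
            if done:
                break
            if rng.random() < terminations[w](s_next):   # terminate with prob. beta_{w,vartheta}(s')
                w = pi_Omega(s_next)                     # only then re-query the policy over options
            s = s_next
        return trajectory
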
Suppose we aim to optimize directly the discounted return, expected over all the trajectories starting at a designated state s_0 and option ω_0; the objective is then $\rho(\Omega, \theta, \vartheta, s_0, \omega_0) = \mathbb{E}_{\Omega,\theta,\vartheta}\big[\sum_{t=0}^{\infty} \gamma^t r_{t+1} \mid s_0, \omega_0\big]$. Note that this return depends on the policy over options, as well as the parameters of the option policies and termination functions. We will take gradients of this objective with respect to θ and ϑ. In order to do this, we will manipulate equations similar to those used in intra-option learning (Sutton, Precup, and Singh 1999, section 8). Specifically, the definition of the option-value function can be written as:

$$Q_\Omega(s, \omega) = \sum_a \pi_{\omega,\theta}(a \mid s)\, Q_U(s, \omega, a), \qquad (1)$$

where Q_U : S × Ω × A → ℝ is the value of executing an action in the context of a state-option pair:

$$Q_U(s, \omega, a) = r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, U(\omega, s'). \qquad (2)$$

Note that the (s, ω) pairs lead to an augmented state space, cf. (Levy and Shimkin 2011). However, we will not work explicitly with this space; it is used only to simplify the derivation. The function U : Ω × S → ℝ is called the option-value function upon arrival (Sutton, Precup, and Singh 1999, equation 20). The value of executing ω upon entering a state s' is given by:

$$U(\omega, s') = (1 - \beta_{\omega,\vartheta}(s'))\, Q_\Omega(s', \omega) + \beta_{\omega,\vartheta}(s')\, V_\Omega(s'). \qquad (3)$$

Note that Q_U and U both depend on θ and ϑ, but we do not include these in the notation for clarity.

The last ingredient required to derive policy gradients is the Markov chain along which the performance measure is estimated. The natural approach is to consider the chain defined in the augmented state space, because state-option pairs now play the role of regular states in a usual Markov chain. If option ω_t has been initiated or is executing at time t in state s_t, then the probability of transitioning to (s_{t+1}, ω_{t+1}) in one step is:

$$P(s_{t+1}, \omega_{t+1} \mid s_t, \omega_t) = \sum_a \pi_{\omega_t,\theta}(a \mid s_t)\, P(s_{t+1} \mid s_t, a) \Big( (1 - \beta_{\omega_t,\vartheta}(s_{t+1}))\, \mathbf{1}_{\omega_{t+1}=\omega_t} + \beta_{\omega_t,\vartheta}(s_{t+1})\, \pi_\Omega(\omega_{t+1} \mid s_{t+1}) \Big). \qquad (4)$$

Clearly, the process given by (4) is homogeneous. Under mild conditions, and with options available everywhere, it is in fact ergodic, and a unique stationary distribution over state-option pairs exists.
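
A minimal tabular rendering of equations (1)-(4), assuming dense NumPy arrays for the policies, terminations, transition model and rewards (the array names and shapes are illustrative assumptions, not the authors' code):

    # Assumed shapes: pi[w, s, a] intra-option policies, beta[w, s] terminations,
    # pi_O[s, w] policy over options, P[s, a, s'] transitions, r[s, a] rewards,
    # Q_Omega[s, w] option values.
    import numpy as np

    def upon_arrival_U(Q_Omega, beta, pi_O):
        V_Omega = (pi_O * Q_Omega).sum(axis=1)                      # V_Omega(s') under pi_Omega
        return (1.0 - beta) * Q_Omega.T + beta * V_Omega[None, :]   # eq. (3): U[w, s']

    def action_value_QU(P, r, U, gamma):
        # eq. (2): Q_U(s, w, a) = r(s, a) + gamma * sum_s' P(s'|s,a) U(w, s')
        return r[:, None, :] + gamma * np.einsum('sap,wp->swa', P, U)

    def option_value_QOmega(pi, Q_U):
        # eq. (1): Q_Omega(s, w) = sum_a pi_{w,theta}(a|s) Q_U(s, w, a)
        return np.einsum('wsa,swa->sw', pi, Q_U)

    def augmented_step(pi, P, beta, pi_O, s, w):
        # eq. (4): P(s', w' | s, w), returned as an array of shape [n_states, n_options]
        p_next_s = pi[w, s] @ P[s]                                   # sum_a pi(a|s,w) P(s'|s,a)
        out = p_next_s[:, None] * beta[w][:, None] * pi_O            # option terminates, switch to w'
        out[:, w] += p_next_s * (1.0 - beta[w])                      # option w continues
        return out
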

We will now compute the gradient of the expected discounted return with respect to the parameters θ of the intra-option policies, assuming that they are stochastic and differentiable. From (1, 2), it follows that:

$$\frac{\partial Q_\Omega(s, \omega)}{\partial \theta} = \sum_a \left( \frac{\partial \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s, \omega, a) + \pi_{\omega,\theta}(a \mid s)\, \gamma \sum_{s'} P(s' \mid s, a)\, \frac{\partial U(\omega, s')}{\partial \theta} \right).$$

We can further expand the right hand side using (3) and (4), which yields the following theorem:

Theorem 1 (Intra-Option Policy Gradient Theorem). Given a set of Markov options with stochastic intra-option policies differentiable in their parameters θ, the gradient of the expected discounted return with respect to θ and initial condition (s_0, ω_0) is:

$$\sum_{s,\omega} \mu_\Omega(s, \omega \mid s_0, \omega_0) \sum_a \frac{\partial \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s, \omega, a),$$

where $\mu_\Omega(s, \omega \mid s_0, \omega_0) = \sum_{t=0}^{\infty} \gamma^t P(s_t = s, \omega_t = \omega \mid s_0, \omega_0)$ is a discounted weighting of state-option pairs along trajectories starting from (s_0, ω_0).

The proof is in the appendix. This gradient describes the effect of a local change at the primitive level on the global expected discounted return. In contrast, subgoal or pseudo-reward methods assume the objective of an option is simply to optimize its own reward function, ignoring how a proposed change would propagate in the overall objective.

We now turn our attention to computing gradients for the termination functions, assumed this time to be stochastic and differentiable in ϑ. From (1, 2, 3), we have:

$$\frac{\partial Q_\Omega(s, \omega)}{\partial \vartheta} = \sum_a \pi_{\omega,\theta}(a \mid s)\, \gamma \sum_{s'} P(s' \mid s, a)\, \frac{\partial U(\omega, s')}{\partial \vartheta}.$$

Hence, the key quantity is the gradient of U. This is a natural consequence of the call-and-return execution, in which the goodness of termination functions can only be evaluated upon entering the next state. The relevant gradient can be further expanded as:

$$\frac{\partial U(\omega, s')}{\partial \vartheta} = -\frac{\partial \beta_{\omega,\vartheta}(s')}{\partial \vartheta}\, A_\Omega(s', \omega) + \gamma \sum_{\omega'} \sum_{s''} P(s'', \omega' \mid s', \omega)\, \frac{\partial U(\omega', s'')}{\partial \vartheta}, \qquad (5)$$

where A_Ω is the advantage function (Baird 1993) over options, A_Ω(s', ω) = Q_Ω(s', ω) − V_Ω(s'). Expanding ∂U(ω, s')/∂ϑ recursively leads to a similar form as in Theorem 1, but where the weighting of state-option pairs is now according to a Markov chain shifted by one time step: μ_Ω(s_{t+1}, ω_t | s_t, ω_{t−1}) (details are in the appendix).

Theorem 2 (Termination Gradient Theorem). Given a set of Markov options with stochastic termination functions differentiable in their parameters ϑ, the gradient of the expected discounted return objective with respect to ϑ and the initial condition (s_1, ω_0) is:

$$-\sum_{s',\omega} \mu_\Omega(s', \omega \mid s_1, \omega_0)\, \frac{\partial \beta_{\omega,\vartheta}(s')}{\partial \vartheta}\, A_\Omega(s', \omega),$$

where $\mu_\Omega(s', \omega \mid s_1, \omega_0) = \sum_{t=0}^{\infty} \gamma^t P(s_{t+1} = s', \omega_t = \omega \mid s_1, \omega_0)$ is a discounted weighting of state-option pairs from (s_1, ω_0).

The advantage function often appears in policy gradient methods (Sutton et al. 2000) when forming a baseline to reduce the variance in the gradient estimates. Its presence in that context has to do mostly with algorithm design. It is interesting that in our case, it follows as a direct consequence of the derivation and gives the theorem an intuitive interpretation: when the option choice is suboptimal with respect to the expected value over all options, the advantage function is negative and it drives the gradient corrections up, which increases the odds of terminating. After termination, the agent has the opportunity to pick a better option using π_Ω. A similar idea also underlies the interrupting execution model of options (Sutton, Precup, and Singh 1999), in which termination is forced whenever the value of Q_Ω(s', ω) for the current option ω is less than V_Ω(s'). (Mann, Mankowitz, and Mannor 2014) recently studied interrupting options through the lens of an interrupting Bellman operator in a value-iteration setting. The termination gradient theorem can be interpreted as providing a gradient-based interrupting Bellman operator.

Algorithms and Architecture

Figure 1: Diagram of the option-critic architecture (policy over options π_Ω; options with intra-option policies π_ω and terminations β_ω; critic maintaining Q_U, A_Ω and the TD error; gradients flowing from critic to options; the agent exchanging a_t, s_t, r_t with the environment). The option execution model is depicted by a switch over the contacts. A new option is selected according to π_Ω only when the current option terminates.

Based on Theorems 1 and 2, we can now design a stochastic gradient descent algorithm for learning options. Using a two-timescale framework (Konda and Tsitsiklis 2000), we propose to learn the values at a fast timescale while updating the intra-option policies and termination functions at a slower rate.
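
The sign of the advantage in Theorem 2 is what drives terminations. A small numerical sketch (assuming a linear-sigmoid termination function, as used later in the experiments, and hypothetical features and step size) shows that an update in the direction prescribed by the theorem raises the termination probability when the advantage is negative:

    # Theorem 2 suggests vartheta <- vartheta - alpha * grad beta(s') * A_Omega(s', w).
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def termination_update(vartheta, phi_s_next, adv, alpha=0.25):
        beta = sigmoid(vartheta @ phi_s_next)
        grad_beta = beta * (1.0 - beta) * phi_s_next        # gradient of the sigmoid termination
        return vartheta - alpha * grad_beta * adv

    vartheta = np.zeros(3)
    phi = np.array([1.0, 0.5, -0.2])                         # hypothetical features of s'
    before = sigmoid(vartheta @ phi)
    after = sigmoid(termination_update(vartheta, phi, adv=-1.0) @ phi)
    print(before, after)   # a negative advantage pushes the termination probability up
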

We refer to the resulting system as an option-critic architecture, in reference to the actor-critic architectures (Sutton 1984). The intra-option policies, termination functions and policy over options belong to the actor part of the system, while the critic consists of Q_U and A_Ω. The option-critic architecture does not prescribe how to obtain π_Ω, since a variety of existing approaches would apply: using policy gradient methods at the SMDP level, with a planner over the options models, or using temporal difference updates. If π_Ω is the greedy policy over options, it follows from (2) that the corresponding one-step off-policy update target g_t^{(1)} is:

$$g_t^{(1)} = r_{t+1} + \gamma \Big( (1 - \beta_{\omega_t,\vartheta}(s_{t+1})) \sum_a \pi_{\omega_t,\theta}(a \mid s_{t+1})\, Q_U(s_{t+1}, \omega_t, a) + \beta_{\omega_t,\vartheta}(s_{t+1}) \max_{\omega} \sum_a \pi_{\omega,\theta}(a \mid s_{t+1})\, Q_U(s_{t+1}, \omega, a) \Big),$$

which is also the update target of the intra-option Q-learning algorithm of (Sutton, Precup, and Singh 1999). A prototypical implementation of option-critic which uses intra-option Q-learning is shown in Algorithm 1. The tabular setting is assumed only for clarity of presentation. We write α, α_θ and α_ϑ for the learning rates of the critic, intra-option policies and termination functions respectively.

Algorithm 1: Option-critic with tabular intra-option Q-learning

    s ← s_0
    Choose ω according to an ε-soft policy over options π_Ω(s)
    repeat
        Choose a according to π_{ω,θ}(a | s)
        Take action a in s, observe s', r
        1. Options evaluation:
            δ ← r − Q_U(s, ω, a)
            if s' is non-terminal then
                δ ← δ + γ (1 − β_{ω,ϑ}(s')) Q_Ω(s', ω) + γ β_{ω,ϑ}(s') max_{ω'} Q_Ω(s', ω')
            end
            Q_U(s, ω, a) ← Q_U(s, ω, a) + α δ
        2. Options improvement:
            θ ← θ + α_θ (∂ log π_{ω,θ}(a | s) / ∂θ) Q_U(s, ω, a)
            ϑ ← ϑ − α_ϑ (∂ β_{ω,ϑ}(s') / ∂ϑ) (Q_Ω(s', ω) − V_Ω(s'))
        if β_{ω,ϑ} terminates in s' then
            choose new ω according to ε-soft(π_Ω(s'))
        s ← s'
    until s' is terminal

Learning Q_U in addition to Q_Ω is computationally wasteful, both in terms of the number of parameters and of samples. A practical solution is to only learn Q_Ω and derive an estimate of Q_U from it. Because Q_U is an expectation over next states, $Q_U(s, \omega, a) = \mathbb{E}_{s' \sim P}\big[r(s, a) + \gamma U(\omega, s') \mid s, \omega, a\big]$, it follows that g_t^{(1)} is an appropriate estimator. We chose this approach for our experiment with deep neural networks in the Arcade Learning Environment.

Experiments

We first consider a navigation task in the four-rooms domain (Sutton, Precup, and Singh 1999). Our goal is to evaluate the ability of a set of options learned fully autonomously to recover from a sudden change in the environment. (Sutton, Precup, and Singh 1999) presented a similar experiment for a set of pre-specified options; the options in our results have not been specified a priori. Initially the goal is located in the east doorway and the initial state is drawn uniformly from all the other cells. After 1000 episodes, the goal moves to a random location in the lower right room. Primitive movements can fail with probability 1/3, in which case the agent transitions randomly to one of the empty adjacent cells. The discount factor was 0.99, and the reward was +1 at the goal and 0 otherwise. We chose to parametrize the intra-option policies with Boltzmann distributions and the terminations with sigmoid functions. The policy over options was learned using intra-option Q-learning. We also implemented primitive actor-critic (denoted AC-PG) using a Boltzmann policy. We also compared option-critic to a primitive SARSA agent using Boltzmann exploration and no eligibility traces. For all Boltzmann policies, we set the temperature parameter to 0.001. All the weights were initialized to zero.

Figure 2 (legend: SARSA(0), AC-PG, OC 4 options, OC 8 options; axes: episodes vs. steps): After 1000 episodes, the goal location in the four-rooms domain is moved randomly. Option-critic (OC) recovers faster than the primitive actor-critic (AC-PG) and SARSA(0). Each line is averaged over 350 runs.

As can be seen in Figure 2, when the goal suddenly changes, the option-critic agent recovers faster. Furthermore, the initial set of options is learned from scratch at a rate comparable to primitive methods. Despite the simplicity of the domain, we are not aware of other methods which could have solved this task without incurring a cost much larger than when using primitive actions alone (McGovern and Barto 2001; Şimşek and Barto 2009).
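
The following is a minimal sketch of Algorithm 1, assuming a discrete environment exposing reset()/step(a) that returns (s', r, done); it is an illustration of the tabular procedure written for this transcription, not the authors' implementation. Q_Ω is re-derived from Q_U via equation (1), V_Ω is approximated by the greedy max over options, and the hyperparameter defaults are placeholders.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def option_critic_tabular(env, n_states, n_actions, n_options, episodes=1000,
                              gamma=0.99, alpha=0.5, alpha_theta=0.25, alpha_vartheta=0.25,
                              epsilon=0.1, rng=np.random.default_rng(0)):
        Q_U = np.zeros((n_states, n_options, n_actions))
        Q_Omega = np.zeros((n_states, n_options))
        theta = np.zeros((n_options, n_states, n_actions))      # softmax intra-option policies
        vartheta = np.zeros((n_options, n_states))               # sigmoid terminations

        def pick_option(s):                                      # epsilon-soft policy over options
            if rng.random() < epsilon:
                return int(rng.integers(n_options))
            return int(Q_Omega[s].argmax())

        for _ in range(episodes):
            s = env.reset()
            w = pick_option(s)
            done = False
            while not done:
                pi_ws = softmax(theta[w, s])
                a = int(rng.choice(n_actions, p=pi_ws))
                s_next, r, done = env.step(a)

                # 1. Options evaluation (critic, fast timescale)
                beta = sigmoid(vartheta[w, s_next])
                target = r
                if not done:
                    target += gamma * ((1 - beta) * Q_Omega[s_next, w]
                                       + beta * Q_Omega[s_next].max())
                Q_U[s, w, a] += alpha * (target - Q_U[s, w, a])
                Q_Omega[s, w] = pi_ws @ Q_U[s, w]                 # eq. (1)

                # 2. Options improvement (actor, slow timescale)
                grad_log_pi = -pi_ws.copy()
                grad_log_pi[a] += 1.0                             # d log softmax / d theta[w, s, :]
                theta[w, s] += alpha_theta * grad_log_pi * Q_U[s, w, a]
                adv = Q_Omega[s_next, w] - Q_Omega[s_next].max()  # greedy V_Omega
                vartheta[w, s_next] -= alpha_vartheta * beta * (1 - beta) * adv

                if not done and rng.random() < beta:              # termination: re-pick an option
                    w = pick_option(s_next)
                s = s_next
        return Q_Omega, theta, vartheta
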

Figure 3: Termination probabilities for the option-critic agent learning with 4 options. The darkest color represents the walls in the environment while lighter colors encode higher termination probabilities.

In the two temporally extended settings, with 4 options and 8 options, termination events are more likely to occur near the doorways (Figure 3), agreeing with the intuition that they would be good subgoals. As opposed to (Sutton, Precup, and Singh 1999), we did not encode this knowledge ourselves but simply let the agents find options that would maximize the expected discounted return.

Pinball Domain

Figure 4: Pinball: sample trajectory of the solution found after 250 episodes of training using 4 options. All options (color-coded) are used by the policy over options in successful trajectories. The initial state is in the top left corner and the goal is in the bottom right one (red circle).

In the Pinball domain (Konidaris and Barto 2009), a ball must be guided through a maze of arbitrarily shaped polygons to a designated target location. The state space is continuous over the position and velocity of the ball in the x-y plane. At every step, the agent must choose among five discrete primitive actions: move the ball faster or slower, in the vertical or horizontal direction, or take the null action. Collisions with obstacles are elastic and can be used to the advantage of the agent. In this domain, a drag coefficient of 0.995 effectively stops ball movements after a finite number of steps when the null action is chosen repeatedly. Each thrust action incurs a penalty of 5 while taking no action costs 1. The episode terminates with a +10,000 reward when the agent reaches the target. We interrupted any episode taking more than 10,000 steps and set the discount factor to 0.99. We used intra-option Q-learning in the critic with linear function approximation over Fourier bases (Konidaris et al. 2011) of order 3. We experimented with 2, 3 or 4 options. We used Boltzmann policies for the intra-option policies and linear-sigmoid functions for the termination functions. The learning rates were set to 0.01 for the critic and 0.001 for both the intra and termination gradients. We used an epsilon-greedy policy over options.

Figure 5 (axes: episodes vs. undiscounted return; legend: 2, 3 and 4 options): Learning curves in the Pinball domain.

In (Konidaris and Barto 2009), an option can only be used and updated after a gestation period of 10 episodes. As learning is fully integrated in option-critic, by 40 episodes a near optimal set of options had already been learned in all settings. From a qualitative point of view, the options exhibit temporal extension and specialization (Fig. 4). We also observed that across many successful trajectories the red option would consistently be used in the vicinity of the goal.

Arcade Learning Environment

We applied the option-critic architecture in the Arcade Learning Environment (ALE) (Bellemare et al. 2013), using a deep neural network to approximate the critic and represent the intra-option policies and termination functions. We used the same configuration as (Mnih et al. 2013) for the first 3 convolutional layers of the network. We used 32 convolutional filters of size 8 × 8 and a stride of 4 in the first layer, 64 filters of size 4 × 4 with a stride of 2 in the second, and 64 filters of size 3 × 3 with a stride of 1 in the third layer. We then fed the output of the third layer into a dense shared layer of 512 neurons, as depicted in Figure 6. We fixed the learning rate for the intra-option policies and termination gradient to 0.00025 and used RMSProp for the critic.

Figure 6 (output heads: π_Ω(·|s), {β_ω(s)}, {π_ω(·|s)}): Deep neural network architecture. A concatenation of the last 4 images is fed through the convolutional layers, producing a dense representation shared across intra-option policies, termination functions and the policy over options.

We represented the intra-option policies as linear-softmax functions of the fourth (dense) layer, so as to output a probability distribution over actions conditioned on the current observation.
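
A PyTorch-style sketch of the network in Figure 6 (an assumption made for illustration, not the authors' released code): a shared DQN-style convolutional trunk and 512-unit dense layer feeding three heads, one for the critic Q_Ω, one for the sigmoid terminations, and one for the linear-softmax intra-option policies.

    import torch
    import torch.nn as nn

    class OptionCriticNet(nn.Module):
        def __init__(self, n_actions, n_options, in_channels=4):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(64 * 7 * 7, 512), nn.ReLU(),      # 84x84 inputs give a 7x7x64 map
            )
            self.q_omega = nn.Linear(512, n_options)         # critic head: Q_Omega(s, .)
            self.terminations = nn.Linear(512, n_options)    # sigmoid heads: beta_w(s)
            self.policy_logits = nn.Linear(512, n_options * n_actions)
            self.n_options, self.n_actions = n_options, n_actions

        def forward(self, frames):                            # frames: [batch, 4, 84, 84]
            h = self.trunk(frames)
            q = self.q_omega(h)
            beta = torch.sigmoid(self.terminations(h))
            logits = self.policy_logits(h).view(-1, self.n_options, self.n_actions)
            pi = torch.softmax(logits, dim=-1)                # linear-softmax intra-option policies
            return q, beta, pi
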

The termination functions were similarly defined using sigmoid functions, with one output neuron per termination. The critic network was trained using intra-option Q-learning with experience replay. Option policies and terminations were updated on-line. We used an ε-greedy policy over options with ε = 0.05 during the test phase (Mnih et al. 2013).

As a consequence of optimizing for the return, the termination gradient tends to shrink options over time. This is expected, since in theory primitive actions are sufficient for solving any MDP. We tackled this issue by adding a small ξ = 0.01 term to the advantage function used by the termination gradient: A_Ω(s, ω) + ξ = Q_Ω(s, ω) − V_Ω(s) + ξ. This term has a regularization effect, by imposing a ξ-margin between the value estimate of an option and that of the "optimal" one reflected in V_Ω. This makes the advantage function positive if the value of an option is near the optimal one, thereby stretching it. A similar regularizer was proposed in (Mann, Mankowitz, and Mannor 2014). As in (Mnih et al. 2016), we observed that the intra-option policies would quickly become deterministic. This problem seems to pertain to the use of policy gradient methods with deep neural networks in general, and not to option-critic itself. We applied the regularizer prescribed by (Mnih et al. 2016), by penalizing for low-entropy intra-option policies.

Figure 7 (columns grouped as: primitive actions; options, no baseline; 8 options, baseline; 2 options, baseline): Seaquest: Using a baseline in the gradient estimators improves the distribution over actions in the intra-option policies, making them less deterministic. Each column represents one of the options learned in Seaquest. The vertical axis spans the 18 primitive actions of ALE. The empirical action frequencies are coded by intensity.

Finally, the baseline Q_Ω was added to the intra-option policy gradient estimator to reduce its variance. This change provided substantial improvements (Harb 2016) in the quality of the intra-option policy distributions and the overall agent performance, as explained in Figure 7.

We evaluated option-critic in Asterix, Ms. Pacman, Seaquest and Zaxxon. For comparison, we allowed the system to learn for the same number of episodes as (Mnih et al. 2013) and fixed the parameters to the same values in all four domains. Despite having more parameters to learn, option-critic was capable of learning options that would achieve the goal in all games, from the ground up, within the same number of episodes (Figure 8). In Asterix, Seaquest and Zaxxon, option-critic surpassed the performance of the original DQN architecture based on primitive actions. The eight options learned in each game are learned fully end-to-end, in tandem with the feature representation, with no prior specification of a subgoal or pseudo-reward structure.

Figure 8 (panels: (a) Asterix, (b) Ms. Pacman, (c) Seaquest, (d) Zaxxon; axes: epochs vs. average score, compared against DQN): Learning curves in the Arcade Learning Environment. The same set of parameters was used across all four games: 8 options, 0.01 termination regularization, 0.01 entropy regularization, and a baseline for the intra-option policy gradients.

The solution found by option-critic was easy to interpret in the game of Seaquest when learning with only two options. We found that each option specialized in a behavior sequence which would include either the up or the down button. Figure 9 shows a typical transition from one option to the other, first going upward with option 0 then switching to option 1 downward. Options with a similar structure were also found in this game by (Krishnamurthy et al. 2016) using an option discovery algorithm based on graph partitioning.

Figure 9: Up/down specialization in the solution found by option-critic when learning with 2 options in Seaquest. The top bar shows a trajectory in the game, with white representing a segment during which option 1 was active and black for option 2.
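
The two regularizers used in the ALE experiments, the ξ-margin on the advantage inside the termination gradient and the entropy penalty on the intra-option policies, can be sketched as loss terms as follows; the loss assembly is an assumed rendering for illustration rather than the authors' code, with the constants matching the values reported above.

    import torch

    def termination_loss(beta_s_next, q_omega_s_next_w, v_omega_s_next, xi=0.01):
        adv = q_omega_s_next_w - v_omega_s_next + xi            # A_Omega(s', w) + xi
        return (beta_s_next * adv.detach()).mean()               # descending this follows Theorem 2

    def policy_loss(log_pi_a, q_u_estimate, baseline, pi_all, entropy_coef=0.01):
        pg = -(log_pi_a * (q_u_estimate - baseline).detach()).mean()   # Q_Omega baseline cuts variance
        entropy = -(pi_all * torch.log(pi_all + 1e-8)).sum(-1).mean()  # entropy bonus keeps pi stochastic
        return pg - entropy_coef * entropy
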
Related Work

As option discovery has received a lot of attention recently, we now discuss in more detail the place of our approach with respect to others. (Comanici and Precup 2010) used a gradient-based approach for improving only the termination function of semi-Markov options; termination was modeled by a logistic distribution over a cumulative measure of the features observed since initiation. (Levy and Shimkin 2011) also built on policy gradient methods by constructing explicitly the augmented state space and treating stopping events as additional control actions. In contrast, we do not need to construct this (very large) space directly. (Silver and Ciosek 2012) dynamically chained options into longer temporal sequences by relying on compositionality properties. Earlier work on linear options (Sorg and Singh 2010) also used compositionality to plan using linear expectation models for options. Our approach also relies on the Bellman equations and compositionality, but in conjunction with policy gradient methods.

Several very recent papers also attempt to formulate option discovery as an optimization problem with solutions that are compatible with function approximation. (Daniel et al. 2016) learn return-optimizing options by treating the termination functions as hidden variables, and using EM to learn them. (Vezhnevets et al. 2016) consider the problem of learning options that have open-loop intra-option policies, also called macro-actions. As in classical planning, action sequences that are more frequent are cached. A mapping from states to action sequences is learned along with a commitment module, which triggers re-planning when necessary. In contrast, we use closed-loop policies throughout, which are reactive to state information and can provide better solutions. (Mankowitz, Mann, and Mannor 2016) propose a gradient-based option learning algorithm, assuming a particular structure for the initiation sets and termination functions. Under this framework, exactly one option is active in any partition of the state space.

(Kulkarni et al. 2016) use the DQN framework to implement a gradient-based option learner, which uses intrinsic rewards to learn the internal policies of options, and extrinsic rewards to learn the policy over options. As opposed to our framework, descriptions of the subgoals are given as inputs to the option learners. Option-critic is conceptually general and does not require intrinsic motivation for learning the options.

Discussion

We developed a general gradient-based approach for learning simultaneously the intra-option policies and termination functions, as well as the policy over options, in order to optimize a performance objective for the task at hand. Our ALE experiments demonstrate successful end-to-end learning of options in the presence of nonlinear function approximation. As noted, our approach only requires specifying the number of options. However, if one wanted to use additional pseudo-rewards, the option-critic framework would easily accommodate it. In this case, the internal policies and termination function gradients would simply need to be taken with respect to the pseudo-rewards instead of the task reward. A simple instance of this idea, which we used in some of the experiments, is to use additional rewards to encourage options that are indeed temporally extended, by adding a penalty whenever a switching event occurs. Our approach can work seamlessly with any other heuristic for biasing the set of options towards some desirable property (e.g. compositionality or sparsity), as long as it can be expressed as an additive reward structure. However, as seen in the results, such biasing is not necessary to produce good results.

The option-critic architecture relies on the policy gradient theorem and, as discussed in (Thomas 2014), the gradient estimators can be biased in the discounted case. By introducing factors of the form $\gamma^t \prod_{i=1}^{t}(1 - \beta_i)$ in our updates (Thomas 2014, eq. (3)), it would be possible to obtain unbiased estimates. However, we do not recommend this approach, since the sample complexity of the unbiased estimators is generally too high and the biased estimators performed well in our experiments.

Perhaps the biggest remaining limitation of our work is the assumption that all options apply everywhere. In the case of function approximation, a natural extension to initiation sets is to use a classifier over features, or some other form of function approximation. As a result, determining which options are allowed may have a similar cost to evaluating a policy over options (unlike in the tabular setting, where options with sparse initiation sets lead to faster decisions). This is akin to eligibility traces, which are more expensive than using no trace in the tabular case, but have the same complexity with function approximation. If initiation sets are to be learned, the main constraint that needs to be added is that the options and the policy over them lead to an ergodic chain in the augmented state-option space. This can be expressed as a flow condition that links initiation sets with terminations.
The precise description of this condition, as well as sparsity regularization for initiation sets, is left for future work.

Acknowledgements

The authors gratefully acknowledge financial support for this work by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds de recherche du Québec - Nature et Technologies (FRQNT).

Appendix

Augmented Process

If ω_t has been initiated or is executing at time t, then the discounted probability of transitioning to (s_{t+1}, ω_{t+1}) is:

$$P_\gamma^{(1)}(s_{t+1}, \omega_{t+1} \mid s_t, \omega_t) = \sum_a \pi_{\omega_t}(a \mid s_t)\, \gamma\, P(s_{t+1} \mid s_t, a) \Big( (1 - \beta_{\omega_t}(s_{t+1}))\, \mathbf{1}_{\omega_{t+1}=\omega_t} + \beta_{\omega_t}(s_{t+1})\, \pi_\Omega(\omega_{t+1} \mid s_{t+1}) \Big).$$

When conditioning the process on (s_t, ω_{t−1}), the discounted probability of transitioning to (s_{t+1}, ω_t) is:

$$P_\gamma^{(1)}(s_{t+1}, \omega_t \mid s_t, \omega_{t-1}) = \Big( (1 - \beta_{\omega_{t-1}}(s_t))\, \mathbf{1}_{\omega_t=\omega_{t-1}} + \beta_{\omega_{t-1}}(s_t)\, \pi_\Omega(\omega_t \mid s_t) \Big) \sum_a \pi_{\omega_t}(a \mid s_t)\, \gamma\, P(s_{t+1} \mid s_t, a).$$

More generally, the k-step discounted probabilities can be expressed recursively as follows:

$$P_\gamma^{(k)}(s_{t+k}, \omega_{t+k} \mid s_t, \omega_t) = \sum_{s_{t+1}} \sum_{\omega_{t+1}} P_\gamma^{(1)}(s_{t+1}, \omega_{t+1} \mid s_t, \omega_t)\, P_\gamma^{(k-1)}(s_{t+k}, \omega_{t+k} \mid s_{t+1}, \omega_{t+1}),$$

$$P_\gamma^{(k)}(s_{t+k}, \omega_{t+k-1} \mid s_t, \omega_{t-1}) = \sum_{s_{t+1}} \sum_{\omega_t} P_\gamma^{(1)}(s_{t+1}, \omega_t \mid s_t, \omega_{t-1})\, P_\gamma^{(k-1)}(s_{t+k}, \omega_{t+k-1} \mid s_{t+1}, \omega_t).$$

Proof of the Intra-Option Policy Gradient Theorem

Taking the gradient of the option-value function:

$$\frac{\partial Q_\Omega(s, \omega)}{\partial \theta} = \frac{\partial}{\partial \theta} \sum_a \pi_{\omega,\theta}(a \mid s)\, Q_U(s, \omega, a) = \sum_a \left( \frac{\partial \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s, \omega, a) + \pi_{\omega,\theta}(a \mid s)\, \gamma \sum_{s'} P(s' \mid s, a)\, \frac{\partial U(\omega, s')}{\partial \theta} \right), \qquad (6)$$

$$\frac{\partial U(\omega, s')}{\partial \theta} = (1 - \beta_{\omega,\vartheta}(s'))\, \frac{\partial Q_\Omega(s', \omega)}{\partial \theta} + \beta_{\omega,\vartheta}(s') \sum_{\omega'} \pi_\Omega(\omega' \mid s')\, \frac{\partial Q_\Omega(s', \omega')}{\partial \theta} = \sum_{\omega'} \Big( (1 - \beta_{\omega,\vartheta}(s'))\, \mathbf{1}_{\omega'=\omega} + \beta_{\omega,\vartheta}(s')\, \pi_\Omega(\omega' \mid s') \Big) \frac{\partial Q_\Omega(s', \omega')}{\partial \theta}, \qquad (7)$$

where (7) follows from the assumption that θ only appears in the intra-option policies. Substituting (7) into (6) yields a recursion which, using the previous remarks about the augmented process, can be transformed into:

$$\frac{\partial Q_\Omega(s, \omega)}{\partial \theta} = \sum_a \frac{\partial \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s, \omega, a) + \sum_a \pi_{\omega,\theta}(a \mid s)\, \gamma \sum_{s'} P(s' \mid s, a) \sum_{\omega'} \Big( \beta_{\omega,\vartheta}(s')\, \pi_\Omega(\omega' \mid s') + (1 - \beta_{\omega,\vartheta}(s'))\, \mathbf{1}_{\omega'=\omega} \Big) \frac{\partial Q_\Omega(s', \omega')}{\partial \theta}$$
$$= \sum_a \frac{\partial \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s, \omega, a) + \sum_{s'} \sum_{\omega'} P_\gamma^{(1)}(s', \omega' \mid s, \omega)\, \frac{\partial Q_\Omega(s', \omega')}{\partial \theta} = \sum_{k=0}^{\infty} \sum_{s',\omega'} P_\gamma^{(k)}(s', \omega' \mid s, \omega) \sum_a \frac{\partial \pi_{\omega',\theta}(a \mid s')}{\partial \theta}\, Q_U(s', \omega', a).$$

The gradient of the expected discounted return with respect to θ is then:

$$\frac{\partial Q_\Omega(s_0, \omega_0)}{\partial \theta} = \sum_{s,\omega} \sum_{k=0}^{\infty} P_\gamma^{(k)}(s, \omega \mid s_0, \omega_0) \sum_a \frac{\partial \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s, \omega, a) = \sum_{s,\omega} \mu_\Omega(s, \omega \mid s_0, \omega_0) \sum_a \frac{\partial \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s, \omega, a).$$

Proof of the Termination Gradient Theorem

The expected sum of discounted rewards starting from (s_1, ω_0) is given by:

$$U(\omega_0, s_1) = \mathbb{E}\Big[ \sum_{t=1}^{\infty} \gamma^{t-1} r_t \,\Big|\, s_1, \omega_0 \Big].$$

We start by expanding U as follows:

$$U(\omega, s') = (1 - \beta_{\omega,\vartheta}(s'))\, Q_\Omega(s', \omega) + \beta_{\omega,\vartheta}(s')\, V_\Omega(s')$$
$$= (1 - \beta_{\omega,\vartheta}(s')) \sum_a \pi_{\omega,\theta}(a \mid s') \Big( r(s', a) + \gamma \sum_{s''} P(s'' \mid s', a)\, U(\omega, s'') \Big) + \beta_{\omega,\vartheta}(s') \sum_{\omega'} \pi_\Omega(\omega' \mid s') \sum_a \pi_{\omega',\theta}(a \mid s') \Big( r(s', a) + \gamma \sum_{s''} P(s'' \mid s', a)\, U(\omega', s'') \Big).$$

The gradient of U is then:

$$\frac{\partial U(\omega, s')}{\partial \vartheta} = \underbrace{\frac{\partial \beta_{\omega,\vartheta}(s')}{\partial \vartheta} \big( V_\Omega(s') - Q_\Omega(s', \omega) \big)}_{-\,\partial \beta_{\omega,\vartheta}(s')/\partial \vartheta \cdot A_\Omega(s',\omega)} + \sum_{\omega'} \Big( (1 - \beta_{\omega,\vartheta}(s'))\, \mathbf{1}_{\omega'=\omega} + \beta_{\omega,\vartheta}(s')\, \pi_\Omega(\omega' \mid s') \Big) \sum_a \pi_{\omega',\theta}(a \mid s')\, \gamma \sum_{s''} P(s'' \mid s', a)\, \frac{\partial U(\omega', s'')}{\partial \vartheta}.$$
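
The Intra-Option Policy Gradient Theorem can be checked numerically on a small random MDP: solve equations (1)-(3) exactly for Q_Ω as a function of θ, build μ_Ω from the augmented chain of equation (4), and compare the analytic gradient with finite differences. This verification script is an illustration written for this transcription, not part of the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    nS, nA, nO, gamma = 3, 2, 2, 0.9
    P = rng.dirichlet(np.ones(nS), size=(nS, nA))           # P[s, a, s']
    r = rng.normal(size=(nS, nA))                            # r[s, a]
    beta = rng.uniform(0.2, 0.8, size=(nO, nS))              # fixed terminations
    pi_O = rng.dirichlet(np.ones(nO), size=nS)               # fixed policy over options
    theta = rng.normal(size=(nO, nS, nA))                    # softmax intra-option parameters

    def pi(th):
        e = np.exp(th - th.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def solve_Q_Omega(th):
        """Solve the linear fixed point implied by equations (1)-(3) for Q_Omega[s, w]."""
        p = pi(th)
        A = np.eye(nS * nO)
        b = np.zeros(nS * nO)
        for s in range(nS):
            for w in range(nO):
                i = s * nO + w
                b[i] = p[w, s] @ r[s]
                for a in range(nA):
                    for s2 in range(nS):
                        c = gamma * p[w, s, a] * P[s, a, s2]
                        A[i, s2 * nO + w] -= c * (1 - beta[w, s2])
                        for w2 in range(nO):
                            A[i, s2 * nO + w2] -= c * beta[w, s2] * pi_O[s2, w2]
        return np.linalg.solve(A, b).reshape(nS, nO)

    def analytic_grad(th, s0, w0):
        """Gradient from Theorem 1: sum_{s,w} mu(s,w|s0,w0) sum_a dpi/dtheta * Q_U."""
        p, Q = pi(th), solve_Q_Omega(th)
        V = (pi_O * Q).sum(axis=1)
        U = (1 - beta) * Q.T + beta * V[None, :]             # U[w, s']
        Q_U = r[:, None, :] + gamma * np.einsum('sap,wp->swa', P, U)
        Pg = np.zeros((nS * nO, nS * nO))                     # discounted augmented chain, eq. (4)
        for s in range(nS):
            for w in range(nO):
                for a in range(nA):
                    for s2 in range(nS):
                        c = gamma * p[w, s, a] * P[s, a, s2]
                        Pg[s * nO + w, s2 * nO + w] += c * (1 - beta[w, s2])
                        for w2 in range(nO):
                            Pg[s * nO + w, s2 * nO + w2] += c * beta[w, s2] * pi_O[s2, w2]
        mu = np.linalg.inv(np.eye(nS * nO) - Pg)[s0 * nO + w0].reshape(nS, nO)
        grad = np.zeros_like(th)
        for s in range(nS):
            for w in range(nO):
                for a in range(nA):
                    # softmax: sum_a' dpi(a'|s,w)/dtheta[w,s,a] * Q_U = pi(a)(Q_U(a) - Q_Omega)
                    grad[w, s, a] = mu[s, w] * p[w, s, a] * (Q_U[s, w, a] - p[w, s] @ Q_U[s, w])
        return grad

    s0, w0, eps = 0, 0, 1e-5
    g = analytic_grad(theta, s0, w0)
    num = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        d = np.zeros_like(theta)
        d[idx] = eps
        num[idx] = (solve_Q_Omega(theta + d)[s0, w0] - solve_Q_Omega(theta - d)[s0, w0]) / (2 * eps)
    print(np.max(np.abs(g - num)))                            # should be vanishingly small
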

Using the structure of the augmented process:

$$\frac{\partial U(\omega, s')}{\partial \vartheta} = -\frac{\partial \beta_{\omega,\vartheta}(s')}{\partial \vartheta}\, A_\Omega(s', \omega) + \sum_{\omega'} \sum_{s''} P_\gamma^{(1)}(s'', \omega' \mid s', \omega)\, \frac{\partial U(\omega', s'')}{\partial \vartheta} = -\sum_{k=0}^{\infty} \sum_{\omega', s''} P_\gamma^{(k)}(s'', \omega' \mid s', \omega)\, \frac{\partial \beta_{\omega',\vartheta}(s'')}{\partial \vartheta}\, A_\Omega(s'', \omega').$$

We finally obtain:

$$\frac{\partial U(\omega_0, s_1)}{\partial \vartheta} = -\sum_{\omega,s'} \sum_{k=0}^{\infty} P_\gamma^{(k)}(s', \omega \mid s_1, \omega_0)\, \frac{\partial \beta_{\omega,\vartheta}(s')}{\partial \vartheta}\, A_\Omega(s', \omega) = -\sum_{\omega,s'} \mu_\Omega(s', \omega \mid s_1, \omega_0)\, \frac{\partial \beta_{\omega,\vartheta}(s')}{\partial \vartheta}\, A_\Omega(s', \omega).$$

References

Baird, L. C. 1993. Advantage updating. Technical Report, Wright Laboratory.
Bellemare, M. G.; Naddaf, Y.; Veness, J.; and Bowling, M. 2013. The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47.
Comanici, G., and Precup, D. 2010. Optimal policy switching algorithms for reinforcement learning. In AAMAS.
Şimşek, Ö., and Barto, A. G. 2009. Skill characterization based on betweenness. In NIPS 21.
Daniel, C.; van Hoof, H.; Peters, J.; and Neumann, G. 2016. Probabilistic inference for determining options in reinforcement learning. Machine Learning, Special Issue 104(2).
Harb, J. 2016. Learning options in deep reinforcement learning. Master's thesis, McGill University.
Konda, V. R., and Tsitsiklis, J. N. 2000. Actor-critic algorithms. In NIPS 12.
Konidaris, G., and Barto, A. 2009. Skill discovery in continuous reinforcement learning domains using skill chaining. In NIPS 22.
Konidaris, G.; Kuindersma, S.; Grupen, R. A.; and Barto, A. G. 2011. Autonomous skill acquisition on a mobile manipulator. In AAAI.
Krishnamurthy, R.; Lakshminarayanan, A. S.; Kumar, P.; and Ravindran, B. 2016. Hierarchical reinforcement learning using spatio-temporal abstractions and deep neural networks. CoRR.
Kulkarni, T.; Narasimhan, K.; Saeedi, A.; and Tenenbaum, J. 2016. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In NIPS 29.
Levy, K. Y., and Shimkin, N. 2011. Unified inter and intra options learning using policy gradient methods. In EWRL.
Mankowitz, D. J.; Mann, T. A.; and Mannor, S. 2016. Adaptive skills, adaptive partitions (ASAP). In NIPS 29.
Mann, T. A.; Mankowitz, D. J.; and Mannor, S. 2014. Time-regularized interrupting options (TRIO). In ICML.
Mann, T. A.; Mannor, S.; and Precup, D. 2015. Approximate value iteration with temporally extended actions. Journal of Artificial Intelligence Research 53.
McGovern, A., and Barto, A. G. 2001. Automatic discovery of subgoals in reinforcement learning using diverse density. In ICML.
Menache, I.; Mannor, S.; and Shimkin, N. 2002. Q-Cut - dynamic discovery of sub-goals in reinforcement learning. In ECML.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; and Riedmiller, M. A. 2013. Playing Atari with deep reinforcement learning. CoRR.
Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T. P.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In ICML.
Niekum, S. 2013. Semantically Grounded Learning from Unstructured Demonstrations. Ph.D. Dissertation, University of Massachusetts, Amherst.
Precup, D. 2000. Temporal abstraction in reinforcement learning. Ph.D. Dissertation, University of Massachusetts, Amherst.
Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc.
Silver, D., and Ciosek, K. 2012. Compositional planning using optimal option models. In ICML.
Sorg, J., and Singh, S. P. 2010. Linear options. In AAMAS.
Stolle, M., and Precup, D. 2002. Learning options in reinforcement learning. In Abstraction, Reformulation and Approximation, 5th International Symposium, SARA, Proceedings.
Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In NIPS 12.
Sutton, R. S.; Precup, D.; and Singh, S. P. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1-2).
Sutton, R. S. 1984. Temporal Credit Assignment in Reinforcement Learning. Ph.D. Dissertation, University of Massachusetts, Amherst.
Thomas, P. 2014. Bias in natural actor-critic algorithms. In ICML.
Vezhnevets, A. S.; Mnih, V.; Agapiou, J.; Osindero, S.; Graves, A.; Vinyals, O.; and Kavukcuoglu, K. 2016. Strategic attentive writer for learning macro-actions. In NIPS 29.


More information

Outline. CSE 326: Data Structures. Priority Queues Leftist Heaps & Skew Heaps. Announcements. New Heap Operation: Merge

Outline. CSE 326: Data Structures. Priority Queues Leftist Heaps & Skew Heaps. Announcements. New Heap Operation: Merge CSE 26: Dt Structures Priority Queues Leftist Heps & Skew Heps Outline Announcements Leftist Heps & Skew Heps Reding: Weiss, Ch. 6 Hl Perkins Spring 2 Lectures 6 & 4//2 4//2 2 Announcements Written HW

More information

Characterizing Higher-Order Ross More Risk Aversion by Comparison of Risk Compensation

Characterizing Higher-Order Ross More Risk Aversion by Comparison of Risk Compensation Chrcterizing Higher-Order Ross More Risk Aversion by Comprison of Risk Compenstion Guoqing Tin Yougong Tin b,c Deprtment of Economics, Texs A&M University, College Sttion, TX77843, USA b School of Economics,

More information

Chapter 4. Profit and Bayesian Optimality

Chapter 4. Profit and Bayesian Optimality Chpter 4 Profit nd Byesin Optimlity In this chpter we consider the objective of profit. The objective of profit mximiztion dds significnt new chllenge over the previously considered objective of socil

More information

NORTH YORKSHIRE PENSION FUND GOVERNANCE COMPLIANCE STATEMENT

NORTH YORKSHIRE PENSION FUND GOVERNANCE COMPLIANCE STATEMENT NORTH YORKSHIRE PENSION FUND GOVERNANCE COMPLIANCE STATEMENT TABLE OF CONTENTS Section Pge 1 INTRODUCTION 2 2 GOVERNANCE ARRANGEMENTS 2 3 REPRESENTATION AND MEETINGS 4 4 OPERATIONAL PROCEDRES 5 5 KEY POLICY

More information

MATH 236 ELAC MATH DEPARTMENT FALL 2017 SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

MATH 236 ELAC MATH DEPARTMENT FALL 2017 SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. MATH 236 ELAC MATH DEPARTMENT FALL 2017 TEST 1 REVIEW SHORT ANSWER. Write the word or phrse tht best completes ech sttement or nswers the question. 1) The supply nd demnd equtions for certin product re

More information

Measuring Search Trees

Measuring Search Trees Mesuring Serch Trees Christin Bessiere 1, Bruno Znuttini 2, nd Cèsr Fernández 3 1 LIRMM-CNRS, Montpellier, Frnce 2 GREYC, Cen, Frnce 3 Univ. de Lleid, Lleid, Spin Astrct. The SAT nd CSP communities mke

More information

Effects of Entry Restriction on Free Entry General Competitive Equilibrium. Mitsuo Takase

Effects of Entry Restriction on Free Entry General Competitive Equilibrium. Mitsuo Takase CAES Working Pper Series Effects of Entry Restriction on Free Entry Generl Competitive Euilirium Mitsuo Tkse Fculty of Economics Fukuok University WP-2018-006 Center for Advnced Economic Study Fukuok University

More information

Market uncertainty, macroeconomic expectations and the European sovereign bond spreads.

Market uncertainty, macroeconomic expectations and the European sovereign bond spreads. Mrket uncertinty, mcroeconomic expecttions nd the Europen sovereign bond spreds. Dimitris A. Georgoutsos Athens University of Economics & Business, Deprtment of Accounting & Finnce 76, Ptission str., 434,

More information

Rational Equity Bubbles

Rational Equity Bubbles ANNALS OF ECONOMICS AND FINANCE 14-2(A), 513 529 (2013) Rtionl Equity Bubbles Ge Zhou * College of Economics, Zhejing University Acdemy of Finncil Reserch, Zhejing University E-mil: flhszh@gmil.com This

More information

A comparison of quadratic discriminant function with discriminant function based on absolute deviation from the mean

A comparison of quadratic discriminant function with discriminant function based on absolute deviation from the mean A comprison of qudrtic discriminnt function with discriminnt function bsed on bsolute devition from the men S. Gneslingm 1, A. Nnthkumr Siv Gnesh 1, 1 Institute of Informtion Sciences nd Technology College

More information

MARKET POWER AND MISREPRESENTATION

MARKET POWER AND MISREPRESENTATION MARKET POWER AND MISREPRESENTATION MICROECONOMICS Principles nd Anlysis Frnk Cowell Note: the detil in slides mrked * cn only e seen if you run the slideshow July 2017 1 Introduction Presenttion concerns

More information

)''/?\Xck_

)''/?\Xck_ bcbsnc.com Deductible options: $250, $500, $1,000 or $2,500 Deductible options $500, $1,000, $2,500, $3,500 or $5,000 D or (100% coinsurnce is not vilble on the $2,500 deductible option) coinsurnce plns:

More information

Inequality and the GB2 income distribution

Inequality and the GB2 income distribution Working Pper Series Inequlity nd the GB2 income distribution Stephen P. Jenkins ECINEQ WP 2007 73 ECINEC 2007-73 July 2007 www.ecineq.org Inequlity nd the GB2 income distribution Stephen P. Jenkins* University

More information

Technical Report Global Leader Dry Bulk Derivatives. FIS Technical - Grains And Ferts. Highlights:

Technical Report Global Leader Dry Bulk Derivatives. FIS Technical - Grains And Ferts. Highlights: Technicl Report Technicl Anlyst FIS Technicl - Grins And Ferts Edwrd Hutn 44 20 7090 1120 Edwrdh@freightinvesr.com Highlights: SOY The weekly chrt is chowing lower high suggesting wekness going forwrd,

More information

Math F412: Homework 4 Solutions February 20, κ I = s α κ α

Math F412: Homework 4 Solutions February 20, κ I = s α κ α All prts of this homework to be completed in Mple should be done in single worksheet. You cn submit either the worksheet by emil or printout of it with your homework. 1. Opre 1.4.1 Let α be not-necessrily

More information

Controlling a population of identical MDP

Controlling a population of identical MDP Controlling popultion of identicl MDP Nthlie Bertrnd Inri Rennes ongoing work with Miheer Dewskr (CMI), Blise Genest (IRISA) nd Hugo Gimert (LBRI) Trends nd Chllenges in Quntittive Verifiction Mysore,

More information

The Option-Critic Architecture

The Option-Critic Architecture The Option-Critic Architecture Pierre-Luc Bacon, Jean Harb, Doina Precup Reasoning and Learning Lab McGill University, Montreal, Canada AAAI 2017 Intelligence: the ability to generalize and adapt efficiently

More information

Voluntary provision of threshold public goods with continuous contributions: experimental evidence

Voluntary provision of threshold public goods with continuous contributions: experimental evidence Journl of Public Economics 71 (1999) 53 73 Voluntry provision of threshold public goods with continuous contributions: experimentl evidence Chrles Brm Cdsby *, Elizbeth Mynes, b Deprtment of Economics,

More information

OPEN BUDGET QUESTIONNAIRE SOUTH AFRICA

OPEN BUDGET QUESTIONNAIRE SOUTH AFRICA Interntionl Budget Prtnership OPEN BUDGET QUESTIONNAIRE SOUTH AFRICA September 28, 2007 Interntionl Budget Prtnership Center on Budget nd Policy Priorities 820 First Street, NE Suite 510 Wshington, DC

More information

Problem Set 2 Suggested Solutions

Problem Set 2 Suggested Solutions 4.472 Prolem Set 2 Suggested Solutions Reecc Zrutskie Question : First find the chnge in the cpitl stock, k, tht will occur when the OLG economy moves to the new stedy stte fter the government imposes

More information

Rates of Return of the German PAYG System - How they can be measured and how they will develop

Rates of Return of the German PAYG System - How they can be measured and how they will develop Rtes of Return of the Germn PAYG System - How they cn be mesured nd how they will develop Christin Benit Wilke 97-2005 me Mnnheimer Forschungsinstitut Ökonomie und Demogrphischer Wndel Gebäude L 13, 17_D-68131

More information

Information Acquisition and Disclosure: the Case of Differentiated Goods Duopoly

Information Acquisition and Disclosure: the Case of Differentiated Goods Duopoly Informtion Acquisition nd Disclosure: the Cse of Differentited Goods Duopoly Snxi Li Jinye Yn Xundong Yin We thnk Dvid Mrtimort, Thoms Mriotti, Ptrick Rey, Wilfried Snd-Zntmn, Frnces Xu nd Yongsheng Xu

More information

POLICY BRIEF 11 POTENTIAL FINANCING OPTIONS FOR LARGE CITIES

POLICY BRIEF 11 POTENTIAL FINANCING OPTIONS FOR LARGE CITIES POTENTIAL FINANCING OPTIONS FOR LARGE CITIES EXECUTIVE SUMMARY In South Afric lrge cities fce myrid of chllenges including rpid urbnistion, poverty, inequlity, unemployment nd huge infrstructure needs.

More information

First version: September 1997 This version: October On the Relevance of Modeling Volatility for Pricing Purposes

First version: September 1997 This version: October On the Relevance of Modeling Volatility for Pricing Purposes First version: September 1997 This version: October 1999 On the Relevnce of Modeling Voltility for Pricing Purposes Abstrct: Mnuel Moreno 3 Deprtment of Economics nd Business Universitt Pompeu Fbr Crrer

More information

Static Fully Observable Stochastic What action next? Instantaneous Perfect

Static Fully Observable Stochastic What action next?  Instantaneous Perfect CS 188: Ar)ficil Intelligence Mrkov Deciion Procee K+1 Intructor: Dn Klein nd Pieter Abbeel - - - Univerity of Cliforni, Berkeley [Thee lide were creted by Dn Klein nd Pieter Abbeel for CS188 Intro to

More information

The Combinatorial Seller s Bid Double Auction: An Asymptotically Efficient Market Mechanism*

The Combinatorial Seller s Bid Double Auction: An Asymptotically Efficient Market Mechanism* The Combintoril Seller s Bid Double Auction: An Asymptoticlly Efficient Mret Mechnism* Rhul Jin IBM Wtson Reserch Hwthorne, NY rhul.jin@us.ibm.com Prvin Vriy EECS Deprtment University of Cliforni, Bereley

More information

Smart Investment Strategies

Smart Investment Strategies Smrt Investment Strtegies Risk-Rewrd Rewrd Strtegy Quntifying Greed How to mke good Portfolio? Entrnce-Exit Exit Strtegy: When to buy? When to sell? 2 Risk vs.. Rewrd Strtegy here is certin mount of risk

More information

OPEN BUDGET QUESTIONNAIRE MACEDONIA

OPEN BUDGET QUESTIONNAIRE MACEDONIA Interntionl Budget Prtnership OPEN BUDGET QUESTIONNAIRE MACEDONIA September 28, 2007 Interntionl Budget Prtnership Center on Budget nd Policy Priorities 820 First Street, NE Suite 510 Wshington, DC 20002

More information

Asset finance (US) Opportunity. Flexibility. Planning. Develop your capabilities using the latest equipment

Asset finance (US) Opportunity. Flexibility. Planning. Develop your capabilities using the latest equipment Asset finnce (US) Opportunity Develop your cpbilities using the ltest equipment Flexibility Mnge your cshflow nd ccess the technology you need Plnning Mnge your investment with predictble costs nd plnned

More information

ACHIEVING ALARP WITH SAFETY INSTRUMENTED SYSTEMS

ACHIEVING ALARP WITH SAFETY INSTRUMENTED SYSTEMS ACHIEVING ALARP WITH SAFETY INSTRUMENTED SYSTEMS C.R. Timms, MIEE, United Kingdom, Tel: + 44 (0) 9 88668, Emil: c.timms@ifb.co.uk Keywords: ALARP, hzrds, risk, sfety, SIS. Abstrct This pper sets out methodology

More information

Notes on the BENCHOP implementations for the COS method

Notes on the BENCHOP implementations for the COS method Notes on the BENCHOP implementtions for the COS method M. J. uijter C. W. Oosterlee Mrch 29, 2015 Abstrct This text describes the COS method nd its implementtion for the BENCHOP-project. 1 Fourier cosine

More information

MIXED OLIGOPOLIES AND THE PROVISION OF DURABLE GOODS. Baranovskyi Volodymyr. MA in Economic Analysis. Kyiv School of Economics

MIXED OLIGOPOLIES AND THE PROVISION OF DURABLE GOODS. Baranovskyi Volodymyr. MA in Economic Analysis. Kyiv School of Economics MIXED OLIGOPOLIES AND THE PROVISION OF DURABLE GOODS by Brnovskyi Volodymyr A thesis submitted in prtil fulfillment of the requirements for the degree of MA in Economic Anlysis Kyiv School of Economics

More information

Interacting with mathematics in Key Stage 3. Year 9 proportional reasoning: mini-pack

Interacting with mathematics in Key Stage 3. Year 9 proportional reasoning: mini-pack Intercting with mthemtics in Key Stge Yer 9 proportionl resoning: mini-pck Intercting with mthemtics Yer 9 proportionl resoning: mini-pck Crown copyright 00 Contents Yer 9 proportionl resoning: smple unit

More information

OPEN BUDGET QUESTIONNAIRE FRANCE

OPEN BUDGET QUESTIONNAIRE FRANCE Interntionl Budget Project OPEN BUDGET QUESTIONNAIRE FRANCE October 2005 Interntionl Budget Project Center on Budget nd Policy Priorities 820 First Street, NE Suite 510 Wshington, DC 20002 www.interntionlbudget.org

More information

PSAKUIJIR Vol. 4 No. 2 (July-December 2015)

PSAKUIJIR Vol. 4 No. 2 (July-December 2015) Resonble Concession Period for Build Operte Trnsfer Contrct Projects: A Cse Study of Theun-Hiboun Hydropower Dm Project nd Ntionl Rod No. 14 A Project Pnysith Vorsing * nd Dr.Sounthone Phommsone ** Abstrct

More information

This paper is not to be removed from the Examination Halls UNIVERSITY OF LONDON

This paper is not to be removed from the Examination Halls UNIVERSITY OF LONDON ~~FN3092 ZA 0 his pper is not to be remove from the Exmintion Hlls UNIESIY OF LONDON FN3092 ZA BSc egrees n Diploms for Grutes in Economics, Mngement, Finnce n the Socil Sciences, the Diploms in Economics

More information

Announcements. Maximizing Expected Utility. Preferences. Rational Preferences. Rational Preferences. Introduction to Artificial Intelligence

Announcements. Maximizing Expected Utility. Preferences. Rational Preferences. Rational Preferences. Introduction to Artificial Intelligence Introduction to Artificil Intelligence V22.0472-001 Fll 2009 Lecture 8: Utilitie Announcement Will hve Aignment 1 grded by Wed. Aignment 2 i up on webpge Due on Mon 19 th October (2 week) Rob Fergu Dept

More information