Multi-Step Reinforcement Learning: A Unifying Algorithm


Kristopher De Asis,1 J. Fernando Hernandez-Garcia,1 G. Zacharias Holland,1 Richard S. Sutton
Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta
{kldeasis,jfhernan,gholland,rsutton}@ualberta.ca

Abstract

Unifying seemingly disparate algorithmic ideas to produce better performing algorithms has been a longstanding goal in reinforcement learning. As a primary example, TD(λ) elegantly unifies one-step TD prediction with Monte Carlo methods through the use of eligibility traces and the trace-decay parameter λ. Currently, there are a multitude of algorithms that can be used to perform TD control, including Sarsa, Q-learning, and Expected Sarsa. These methods are often studied in the one-step case, but they can be extended across multiple time steps to achieve better performance. Each of these algorithms is seemingly distinct, and no one dominates the others for all problems. In this paper, we study a new multi-step action-value algorithm called Q(σ) that unifies and generalizes these existing algorithms, while subsuming them as special cases. A new parameter, σ, is introduced to allow the degree of sampling performed by the algorithm at each step during its backup to be continuously varied, with Sarsa existing at one extreme (full sampling), and Expected Sarsa existing at the other (pure expectation). Q(σ) is generally applicable to both on- and off-policy learning, but in this work we focus on experiments in the on-policy case. Our results show that an intermediate value of σ, which results in a mixture of the existing algorithms, performs better than either extreme. The mixture can also be varied dynamically, which can result in even greater performance.

The Landscape of TD Algorithms

Temporal-difference (TD) methods (Sutton and Barto 1998) are an important concept in reinforcement learning (RL) that combines ideas from Monte Carlo and dynamic programming methods. TD methods allow learning to occur directly from raw experience in the absence of a model of the environment's dynamics, like with Monte Carlo methods, while also allowing estimates to be updated based on other learned estimates without waiting for a final result, like with dynamic programming. The core concepts of TD methods provide a flexible framework for creating a variety of powerful algorithms that can be used for both prediction and control.

There are a number of TD control methods that have been proposed. Q-learning (Watkins 1989; Watkins and Dayan 1992) is arguably the most popular, and is considered an off-policy method because the policy generating the behaviour (the behaviour policy) and the policy that is being learned (the target policy) are different. Sarsa (Rummery and Niranjan 1994; Sutton 1996) is the classical on-policy control method, where the behaviour and target policies are the same. However, Sarsa can be extended to learn off-policy with the use of importance sampling (Precup, Sutton, and Singh 2000). Expected Sarsa is an extension of Sarsa that, instead of using the action-value of the next state to update the value of the current state, uses the expectation of all the subsequent action-values of the current state with respect to the target policy. Expected Sarsa has been studied as a strictly on-policy method (van Seijen et al. 2009), but in this paper we present a more general version that can be used for both on- and off-policy learning and that also subsumes Q-learning.

All of these methods are often described in the simple one-step case, but they can also be extended across multiple time steps. The TD(λ) algorithm unifies one-step TD learning with Monte Carlo methods (Sutton 1988). Through the use of eligibility traces and the trace-decay parameter λ ∈ [0, 1], a spectrum of algorithms is created. At one end, λ = 1, exist Monte Carlo methods, and at the other, λ = 0, exists one-step TD learning. In the middle of the spectrum are intermediate methods which can perform better than the methods at either extreme (Sutton and Barto 1998). The concept of eligibility traces can also be applied to TD control methods such as Sarsa and Q-learning, which can create more efficient learning and produce better performance (Rummery 1995).

Multi-step TD methods are usually thought of in terms of an average of many multi-step returns of differing lengths and are often associated with eligibility traces, as is the case with TD(λ). However, it is also natural to think of them in terms of individual n-step returns with their associated n-step backups (Sutton and Barto 1998). We refer to each of these individual backups as atomic backups, whereas the combination of several atomic backups of different lengths creates a compound backup.

In the existing literature, it is not clear how best to extend one-step Expected Sarsa to a multi-step algorithm. The Tree-backup algorithm was originally presented as a method to perform off-policy evaluation when the behaviour policy is non-Markov, non-stationary, or completely unknown (Precup, Sutton, and Singh 2000). In this paper, we re-present Tree-backup as a natural multi-step extension of Expected Sarsa.

1 Authors contributed equally, and are listed alphabetically.

Instead of performing the updates with entirely sampled transitions as with multi-step Sarsa, Tree-backup performs the update using the expected values of all the actions at each transition.

Q(σ) is an algorithm that was first proposed by Sutton and Barto (2018) which unifies and generalizes the existing multi-step TD control methods. The degree of sampling performed by the algorithm is controlled by the sampling parameter, σ. At one extreme (σ = 1) exists Sarsa (full sampling), and at the other (σ = 0) exists Tree-backup (pure expectation). Intermediate values of σ create algorithms with a mixture of sampling and expectation, and σ can be interpreted as a way to control the bias-variance trade-off inherent in multi-step TD algorithms. In this work, on problems with a tabular representation and a problem requiring function approximation, we show that an intermediate value of σ can outperform the algorithms that exist at either extreme. In addition, we show that σ can be varied dynamically to produce even greater performance. We limit our discussion of Q(σ) to the atomic multi-step case without eligibility traces, but a natural extension is to make use of compound backups, and this is an avenue for future research. Furthermore, Q(σ) is generally applicable to both on- and off-policy learning, but for our initial empirical study we examined only on-policy prediction and control problems.

MDPs and One-step Solution Methods

The sequential decision problem encountered in RL is often modeled as a Markov decision process (MDP). Under this framework, an agent and the environment interact over a sequence of discrete time steps t. At every time step, the agent receives information about the environment's state, S_t ∈ S, where S is the set of all possible states. The agent uses this information to select an action, A_t, from the set of all possible actions A. Based on the behavior of the agent and the state of the environment, the agent receives a reward, R_{t+1} ∈ R, and moves to another state, S_{t+1} ∈ S, with state-transition probability p(s'|s, a) = P(S_{t+1} = s' | S_t = s, A_t = a), for a ∈ A and s, s' ∈ S. The agent behaves according to a policy π(a|s), which is a probability distribution over the actions a ∈ A for each state s ∈ S. Through the process of policy iteration (Sutton and Barto 1998), the agent learns the optimal policy, π*, that maximizes the expected discounted return:

G_t = R_{t+1} + γR_{t+2} + γ^2 R_{t+3} + ... = Σ_{k=0}^{T−t−1} γ^k R_{t+1+k},   (1)

for a discount factor γ ∈ [0, 1) and T = ∞ in continuing tasks, or γ ∈ [0, 1] and T equal to the final time step in episodic tasks.

TD algorithms strive to maximize the expected return by computing value functions that estimate the expected future rewards in terms of the elements of the environment and the actions of the agent. The state-value function is the expected return when the agent is in state s and follows policy π, defined as v_π(s) = E_π[G_t | S_t = s]. For control, most of the time we focus on estimating the action-value function, which is the expected return when the agent takes an action a in state s while following policy π, and is defined as:

q_π(s, a) = E_π[G_t | S_t = s, A_t = a].   (2)

Equation 2 can be estimated iteratively by observing new rewards, bootstrapping on old estimates of q_π, and using the update rule:

Q(S_t, A_t) ← Q(S_t, A_t) + α[R_{t+1} + γQ(S_{t+1}, A_{t+1}) − Q(S_t, A_t)],   (3)

where α ∈ (0, 1] is the step size parameter. Update rules are also known as backup operations because they transfer information back from future states to the current one. A common way to visualize backup operations is by using backup diagrams such as the ones depicted in Figure 1.
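To make the update rule in (3) concrete, the following minimal sketch applies a one-step Sarsa backup to a tabular action-value array. The array sizes, step size, and discount factor below are placeholder assumptions for illustration, not values from the paper.

```python
import numpy as np

# Placeholder problem sizes and parameters (assumptions for illustration only).
num_states, num_actions = 10, 4
alpha, gamma = 0.1, 0.99
Q = np.zeros((num_states, num_actions))

def sarsa_update(Q, s, a, r, s_next, a_next):
    """One-step Sarsa backup as in Equation (3): bootstrap on the value of
    the action actually taken in the next state."""
    td_error = r + gamma * Q[s_next, a_next] - Q[s, a]   # the term in brackets
    Q[s, a] += alpha * td_error
    return Q
```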
For clarity, the algorithmic ideas in this paper are presented initially as tabular solution methods, but we also extend them to use function approximation, and thus they also serve as approximate solution methods.

The term in brackets in (3),

δ_t^S = R_{t+1} + γQ(S_{t+1}, A_{t+1}) − Q(S_t, A_t),   (4)

is also known as the TD error, denoted δ_t. TD control methods are characterized by their TD error; for example, the TD error in (4) corresponds to the classic on-policy method known as Sarsa.

Because learning requires a certain amount of exploration, behaving greedily with respect to the estimated optimal policy is often infeasible. Therefore, agents are often trained under ε-greedy policies, for which the agent only chooses the optimal action with probability (1 − ε) and behaves randomly with probability ε, for ε ∈ [0, 1]. Nevertheless, learning the optimal policy is possible if it is done off-policy. When the agent is learning off-policy, it behaves according to a behavior policy, µ, while learning a target policy, π. This can be achieved by using another TD control method, Expected Sarsa. In contrast with Sarsa, Expected Sarsa behaves according to the behavior policy, but updates its estimate by taking an expectation of Q(S_{t+1}, ·) over the actions at time t+1, according to the target policy (van Seijen et al. 2009). For convenience, let the expected action-value be defined as:

V_{t+1} = Σ_a π(a|S_{t+1}) Q(S_{t+1}, a).   (5)

Then, the TD error of Expected Sarsa can be written as:

δ_t^{ES} = R_{t+1} + γV_{t+1} − Q(S_t, A_t).   (6)

A special case of Expected Sarsa is Q-learning, where the estimate is updated according to the maximum of Q(S_{t+1}, ·) over the actions (Watkins 1989):

δ_t^{QL} = R_{t+1} + γ max_a Q(S_{t+1}, a) − Q(S_t, A_t).   (7)

Q-learning is the resulting algorithm when the target policy of Expected Sarsa is the greedy policy with respect to Q.
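The two TD errors above differ only in how the next state is evaluated. A minimal sketch, assuming a tabular Q and a target policy stored as a matrix pi of action probabilities (both hypothetical names, not from the paper):

```python
import numpy as np

gamma = 0.99  # assumed discount factor

def expected_sarsa_td_error(Q, pi, s, a, r, s_next):
    """Expected Sarsa TD error, Equation (6): bootstrap on the expectation of
    the next action-values under the target policy (Equation (5))."""
    v_next = np.dot(pi[s_next], Q[s_next])
    return r + gamma * v_next - Q[s, a]

def q_learning_td_error(Q, s, a, r, s_next):
    """Q-learning TD error, Equation (7): the special case in which the
    target policy is greedy with respect to Q."""
    return r + gamma * np.max(Q[s_next]) - Q[s, a]
```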

Figure 1: Backup diagrams for atomic 4-step Sarsa, Expected Sarsa, Tree-backup, and Q(σ). Here we can see that Q(σ) encompasses the other three algorithms based on the setting of σ.

Atomic Multi-Step Algorithms

The TD methods presented in the previous section can be generalized even further by bootstrapping over longer time intervals. This has been shown to decrease the bias of the update at the cost of increasing the variance (Jaakkola, Jordan, and Singh 1994). Nevertheless, in many cases it is possible to achieve better performance by choosing a value for the backup length parameter, n, greater than one (Sutton and Barto 1998). We refer to algorithms which make use of a multi-step atomic backup as atomic multi-step algorithms. Just as one-step methods are defined by their TD error, each atomic multi-step algorithm is characterized by its n-step return. For atomic multi-step Sarsa, the n-step return is:

G_{t:t+n} = R_{t+1} + γR_{t+2} + γ^2 R_{t+3} + ... + γ^{n−1} R_{t+n} + γ^n Q_{t+n−1}(S_{t+n}, A_{t+n})
          = Σ_{k=0}^{n−1} γ^k R_{t+k+1} + γ^n Q_{t+n−1}(S_{t+n}, A_{t+n}),   (8)

where Q_{t+n−1} is the estimate of q_π at time t + n − 1, and the subscript range, t : t + n, denotes the length of the backup. n-step Sarsa can be adapted for off-policy learning by introducing an importance sampling ratio term (Precup, Sutton, and Singh 2000):

ρ_t^{t+n} = Π_{k=t}^{τ} π(A_k|S_k) / µ(A_k|S_k),   (9)

and multiplying it with the TD error to get the following update rule:

Q_{t+n}(S_t, A_t) ← Q_{t+n−1}(S_t, A_t) + α ρ_{t+1}^{t+n} [G_{t:t+n} − Q_{t+n−1}(S_t, A_t)],   (10)

where τ = min(t + n − 1, T − 1) is the time step before the end of the update or before the end of the episode. In the update, the action-values for all other state-action pairs remain the same, i.e. Q_{t+n}(s, a) = Q_{t+n−1}(s, a) for all s ≠ S_t and a ≠ A_t. This update rule is not only applicable for off-policy n-step Sarsa, but is a generally useful form for other atomic multi-step algorithms as well. We present the algorithms in this work as general off-policy solution methods, but in the experiments section we evaluate them empirically on-policy only, which provides useful insight into their behaviour. We defer the empirical study and comparison of the algorithms in an off-policy setting to future work.

Expected Sarsa can also be generalized to a multi-step method by using the return:

G_{t:t+n} = R_{t+1} + γR_{t+2} + γ^2 R_{t+3} + ... + γ^{n−1} R_{t+n} + γ^n V_{t+n}.   (11)

The first n − 1 states and actions are sampled according to the behaviour policy, as with n-step Sarsa, but the last state is backed up according to the expected action-value under the target policy. To make n-step Expected Sarsa entirely off-policy, an importance sampling ratio term can also be introduced, but it needs to omit the last time step. The resulting update would be the same as in (10), but would use ρ_{t+1}^{t+n−1} and the n-step return for n-step Expected Sarsa from (11).

A drawback to using importance sampling to learn off-policy is that it can create high variance, which must be compensated for by using small step sizes; this can slow learning (Precup, Sutton, and Singh 2000). In the next section we present a method that is also a generalization of Expected Sarsa, but that can learn off-policy without importance sampling.
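A small sketch of the two quantities defined above, the n-step Sarsa return (8) and the importance sampling ratio (9), assuming the episode does not terminate within the n steps; the argument names are illustrative only:

```python
import numpy as np

def n_step_sarsa_return(rewards, bootstrap_q, gamma):
    """n-step Sarsa return, Equation (8). `rewards` holds R_{t+1}, ..., R_{t+n}
    and `bootstrap_q` is Q_{t+n-1}(S_{t+n}, A_{t+n})."""
    n = len(rewards)
    g = sum(gamma ** k * rewards[k] for k in range(n))
    return g + gamma ** n * bootstrap_q

def importance_sampling_ratio(pi_probs, mu_probs):
    """Product of per-step ratios pi(A_k|S_k) / mu(A_k|S_k), as in Equation (9),
    over the actions included in the correction."""
    return float(np.prod(np.asarray(pi_probs) / np.asarray(mu_probs)))
```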
Tree-backup

As shown in (11), the TD return of n-step Expected Sarsa is calculated by taking an expectation over the actions at the last step of the backup. However, it is possible to extend this idea to every time step of the backup by taking an expectation at every step (Precup, Sutton, and Singh 2000). The resulting algorithm is a multi-step generalization of Expected Sarsa that is known as Tree-backup because of its characteristic backup diagram (Figure 1).

Moreover, just like Expected Sarsa and Q-learning, this proposed generalization does not require importance sampling to be applied off-policy. Hence, it could be argued that it is a more appropriate generalization of Expected Sarsa to multi-step learning (Sutton and Barto 2018). Because Expected Sarsa subsumes Q-learning, Tree-backup can also be thought of as a multi-step generalization of Q-learning if the target policy is greedy with respect to the action-value function.

Tree-backup has several advantages over n-step Expected Sarsa. Tree-backup has the capacity for learning off-policy without the need for importance sampling, reducing the variance due to the importance sampling ratios. Additionally, because an importance sampling ratio does not need to be computed, the behavior policy does not need to be stationary, Markov, or even known (Precup, Sutton, and Singh 2000).

Each branch of the tree represents an action, while the main branch represents the action taken at time t. The value of each of the branches is the current action-value estimate of the corresponding action, whereas the value of each segment of the main branch is the reward at the corresponding time step.
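The weighting of branches described above can also be computed recursively: at each interior step, the actions not taken contribute their expected value under the target policy, and the action actually taken is followed one step deeper into the tree. A minimal sketch of this recursion (equivalent in spirit to the return given in the next section), assuming a tabular Q and a target-policy probability matrix pi, both hypothetical names:

```python
import numpy as np

def tree_backup_return(rewards, states, actions, Q, pi, gamma):
    """Recursive computation of the n-step Tree-backup return.
    rewards[k] = R_{t+k+1}, states[k] = S_{t+k+1}, actions[k] = A_{t+k+1};
    only the first n-1 actions are used, and a full expectation over all
    actions is taken at the final state of the backup."""
    # Base case: full expectation under the target policy at the last state.
    g = rewards[-1] + gamma * np.dot(pi[states[-1]], Q[states[-1]])
    # Interior steps, processed backwards: untaken actions contribute their
    # expected value, the taken action carries the deeper return.
    for k in range(len(rewards) - 2, -1, -1):
        s, a = states[k], actions[k]
        expected_untaken = np.dot(pi[s], Q[s]) - pi[s, a] * Q[s, a]
        g = rewards[k] + gamma * (expected_untaken + pi[s, a] * g)
    return g
```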

The n-step return is the sum of the values of each branch weighted by the product of the probabilities of the actions leading to the branch, and multiplied by the corresponding power of the discount factor. For clarity, it is easier to present the n-step return of the Tree-backup algorithm in terms of the TD error of Expected Sarsa from (6):

G_{t:t+n} = Q_{t−1}(S_t, A_t) + Σ_{k=t}^{τ} δ_k^{ES} Π_{i=t+1}^{k} γ π(A_i|S_i).   (12)

This atomic version of multi-step Tree-backup was first presented by Sutton and Barto (2018). As a result of the product term in (12), in addition to the discount factor γ, future rewards are further discounted by the probabilities of the actions taken. The Tree-backup algorithm therefore assigns less weight to the reward sequence received, and compensates by bootstrapping off of the values of actions not taken. Due to this, Tree-backup is more biased than Sarsa in the multi-step case with a stochastic policy, as Sarsa gives full weight to every reward received prior to bootstrapping. However, this increase in bias (towards the estimates in the value function) is traded off with decreased variance in the reward sequence from taking expectations.

The Q(σ) Algorithm

In the previous sections we have incrementally introduced several generalizations of the TD control methods Sarsa and Expected Sarsa, and in this section we present an algorithm that unifies them, called Q(σ). Sarsa can be generalized to an atomic multi-step algorithm by using an n-step return, and n-step Sarsa generalizes to an off-policy algorithm through the use of importance sampling. In contrast, Expected Sarsa can learn off-policy without the need for importance sampling, and generalizes to the atomic multi-step algorithms Tree-backup and n-step Expected Sarsa. All of the algorithms presented so far can be broadly categorized into two families: those that back up their actions as samples, like Sarsa; and those that consider an expectation over all actions in their backup, like Expected Sarsa and Tree-backup. In this section, we introduce a method to unify both families of algorithms by introducing a new parameter, σ. The possibility of unifying Sarsa and Tree-backup was first suggested by Precup et al. (2000), and the first formulation of Q(σ) was presented by Sutton and Barto (2018).

The intuition behind Q(σ) is based on the idea that we have a choice to update the estimate of q_π based on one action sampled from the set of possible future actions, or based on the expectation over the possible future actions. For example, with n-step Sarsa, a sample is taken at every step of the backup, whereas with the Tree-backup algorithm, an expectation is taken instead. However, the choice of sampling or expectation does not have to remain constant for every step of the backup. Furthermore, the backup at time step t could be based on a weighted average of both sampling and expectation. In order to implement this, the parameter σ_t ∈ [0, 1] is introduced to control the degree of sampling at each step of the backup. Thus, the TD error of Q(σ) can be represented in terms of a weighted sum of the TD errors of Sarsa and Expected Sarsa:

δ_t^σ = σ_{t+1} δ_t^S + (1 − σ_{t+1}) δ_t^{ES}
      = R_{t+1} + γ[σ_{t+1} Q_t(S_{t+1}, A_{t+1}) + (1 − σ_{t+1}) V_{t+1}] − Q_{t−1}(S_t, A_t).   (13)

The n-step return is then:

G_{t:t+n} = Q_{t−1}(S_t, A_t) + Σ_{k=t}^{τ} δ_k^σ Π_{i=t+1}^{k} γ[(1 − σ_i)π(A_i|S_i) + σ_i].   (14)

Moreover, the importance sampling ratio from (9) can be modified to include σ as follows:

ρ_{t+1}^{t+n} = Π_{k=t+1}^{τ} ( σ_k π(A_k|S_k)/µ(A_k|S_k) + 1 − σ_k ).   (15)

The update rule for Q(σ) can then be obtained by using G_{t:t+n} from (14) and ρ_{t+1}^{t+n} from (15), with the update rule from (10). Algorithm 1 shows the pseudocode for the complete off-policy n-step Q(σ) algorithm.

Algorithm 1: Off-policy n-step Q(σ) for estimating q_π

  Input: behaviour policy µ and target policy π
  Initialize S_0 ≠ terminal; select A_0 according to µ(·|S_0)
  Store S_0, A_0, and Q(S_0, A_0)
  for t = 0, 1, 2, ..., T + n − 1 do
      if t < T then
          Take action A_t; observe R and S_{t+1}; Store S_{t+1}
          if S_{t+1} is terminal then
              Store δ_t^σ = R − Q(S_t, A_t)
          else
              Select A_{t+1} according to µ(·|S_{t+1}) and Store it
              Store Q(S_{t+1}, A_{t+1}), σ_{t+1}, and π(A_{t+1}|S_{t+1})
              Store δ_t^σ = R + γ[σ_{t+1} Q(S_{t+1}, A_{t+1}) + (1 − σ_{t+1}) V_{t+1}] − Q(S_t, A_t)
              Store ρ_{t+1} = π(A_{t+1}|S_{t+1}) / µ(A_{t+1}|S_{t+1})
          end if
      end if
      τ ← t − n + 1
      if τ ≥ 0 then
          ρ ← 1; E ← 1; G ← Q(S_τ, A_τ)
          for k = τ, ..., min(τ + n − 1, T − 1) do
              G ← G + E δ_k^σ
              E ← γE[(1 − σ_{k+1}) π(A_{k+1}|S_{k+1}) + σ_{k+1}]
              ρ ← ρ(1 − σ_k + σ_k ρ_k)
          end for
          Q(S_τ, A_τ) ← Q(S_τ, A_τ) + αρ[G − Q(S_τ, A_τ)]
      end if
  end for

Additionally, a convergence proof for one-step Q(σ) is readily available by applying the results from Jaakkola et al. (1994), Singh et al. (2000), and van Seijen et al. (2009).
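Because the experiments in this paper are all on-policy, a compact way to exercise the algorithm is an on-policy version of Algorithm 1 (behaviour policy equal to the target policy, so every importance sampling ratio is 1). The sketch below follows Equations (13) and (14); the environment and policy interfaces (env.reset, env.step, policy, sigma_fn) are assumptions for illustration, not part of the paper.

```python
import numpy as np

def q_sigma_episode(env, Q, policy, n, alpha, gamma, sigma_fn, rng):
    """Runs one on-policy episode of n-step Q(sigma).
    Assumed interfaces: env.reset() -> state,
    env.step(a) -> (reward, next_state, done),
    policy(Q, s) -> probability vector over actions, sigma_fn(t) -> sigma_t."""
    s = env.reset()
    probs = policy(Q, s)
    a = rng.choice(len(probs), p=probs)
    states, actions = [s], [a]
    deltas = []                      # deltas[t] = delta_t^sigma, Equation (13)
    sigmas, pis = [None], [None]     # index 0 unused; sigma_0 is never needed
    T, t = float('inf'), 0
    while True:
        if t < T:
            r, s_next, done = env.step(actions[t])
            states.append(s_next)
            if done:
                T = t + 1
                deltas.append(r - Q[states[t], actions[t]])
            else:
                probs = policy(Q, s_next)
                a_next = rng.choice(len(probs), p=probs)
                actions.append(a_next)
                sigma = sigma_fn(t + 1)
                sigmas.append(sigma)
                pis.append(probs[a_next])
                v_next = np.dot(probs, Q[s_next])            # Equation (5)
                target = r + gamma * (sigma * Q[s_next, a_next]
                                      + (1.0 - sigma) * v_next)
                deltas.append(target - Q[states[t], actions[t]])
        tau = t - n + 1
        if tau >= 0:
            # Accumulate the n-step return of Equation (14) from stored deltas.
            e, g = 1.0, Q[states[tau], actions[tau]]
            for k in range(tau, int(min(tau + n - 1, T - 1)) + 1):
                g += e * deltas[k]
                if k + 1 < len(sigmas):   # weight for the next step, if stored
                    e *= gamma * ((1.0 - sigmas[k + 1]) * pis[k + 1] + sigmas[k + 1])
            Q[states[tau], actions[tau]] += alpha * (g - Q[states[tau], actions[tau]])
        if tau == T - 1:
            break
        t += 1
    return Q
```

Setting sigma_fn to return a constant 1 recovers on-policy n-step Sarsa, and a constant 0 recovers Tree-backup, matching the special cases discussed above.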

Theorem 1. The one-step Q(σ) estimate defined by

Q_{t+1}(S_t, A_t) = (1 − α_t) Q_t(S_t, A_t) + α_t [R_{t+1} + γ(σ_{t+1} Q_t(S_{t+1}, A_{t+1}) + (1 − σ_{t+1}) V_{t+1})],   (16)

converges to the optimal action-value function when the following conditions are satisfied:

1. The size of the set S × A is finite.
2. α_t = α_t(S_t, A_t) ∈ [0, 1], Σ_t α_t = ∞, Σ_t α_t^2 < ∞ w.p. 1, and α_t(s, a) = 0 for all (s, a) ≠ (S_t, A_t).
3. The policy is greedy in the limit with infinite exploration.
4. The reward function is bounded.

We defer the full details of the proof to the appendix; however, there are two important results from the proof that are worth emphasizing. First, just as with one-step Q-learning, Sarsa, and Expected Sarsa, one-step Q(σ) can be used to learn optimal action-value functions. Second, at each time step t it is possible to choose σ_t such that the contraction property of the Q(σ) update is less than or equal to the contraction induced by the Sarsa or Expected Sarsa updates. This implies that it is possible to choose σ_t at every time step in order to speed up convergence.

It is important to note that every TD control method presented thus far can be obtained with Q(σ) by varying the sampling parameter σ: when σ = 1, we obtain Sarsa; when σ = 0, we obtain Expected Sarsa and Tree-backup; and when σ = 1 for every step of the backup except for the last, where σ = 0, we obtain n-step Expected Sarsa. Thus, tuning the hyper-parameter σ is not strictly necessary, since it can be set to a fixed value in order to obtain one of the existing TD control algorithms. Nevertheless, intermediate values of σ between 0 and 1 create entirely new algorithms that exist somewhere between full sampling and pure expectation and that could result in better performance. Furthermore, σ does not need to remain constant throughout every episode, or even at every time step during an episode or continuing task. σ could be varied dynamically as a function of time, of the current state, or of some measure of the learning progress. In particular, σ could also be varied as a function of the episode number, which we investigate in our experiments. There are potentially a variety of effective schemes for choosing and varying σ, and these are a subject for further research.

Experiments

19-State Random Walk

The 19-state random walk, shown in Figure 2, is a 1-dimensional environment where an agent randomly transitions to one of two neighboring states. There is a terminal state at each end of the environment; transitioning into one of them gives a reward of -1, and transitioning into the other gives a reward of 1. To compare algorithms that involve taking an expectation based on the policy, the task is formulated such that each state has two actions. Each action deterministically transitions to one of the two neighboring states, and the agent learns on-policy under an equiprobable random behavior policy. This differs from typical random walk setups where each state has one action that will randomly transition to either neighboring state (Sutton and Barto 1998), but the resulting state values are identical.

Figure 2: The 19-state random walk MDP. The goal is to accurately estimate the value of each state under equiprobable random behavior.

This environment was treated as a prediction task where a learning algorithm is to estimate a value function under its behavior policy. We conducted an experiment comparing various Q(σ) algorithm instances, assessing different multi-step backup lengths, step sizes, and degrees of sampling. The root-mean-square (RMS) error between the estimated value function and the analytically computed values was measured after each episode. Each Q(σ) instance and parameter setting ran for 50 episodes, and the results are averaged across 100 runs.
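The analytically computed values referred to above can be obtained by solving the Bellman equations of this small MDP directly. A possible sketch, assuming the layout described in the text (19 non-terminal states, equiprobable behavior, undiscounted episodic task, rewards of -1 and +1 on the two terminal transitions):

```python
import numpy as np

n_states = 19
P = np.zeros((n_states, n_states))   # state-to-state transition probabilities
r = np.zeros(n_states)               # expected immediate reward from each state
for i in range(n_states):
    if i > 0:
        P[i, i - 1] = 0.5
    else:
        r[i] += 0.5 * (-1.0)         # transition into the left terminal state
    if i < n_states - 1:
        P[i, i + 1] = 0.5
    else:
        r[i] += 0.5 * (+1.0)         # transition into the right terminal state

# Solve v = r + P v (undiscounted, episodic), i.e. (I - P) v = r.
true_values = np.linalg.solve(np.eye(n_states) - P, r)
```

These values lie on a straight line from -0.9 to 0.9, and the RMS error after each episode is the root-mean-square difference between the learned estimates and true_values.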
Figure 3 shows the results with n = 3 and α = 0.4, which was found to be representative of the best parameter setting for each instance of Q(σ) on this task. Sarsa (full sampling) had better initial performance but poor asymptotic performance, Tree-backup (no sampling) had poor initial performance but better asymptotic performance, and intermediate degrees of sampling traded off between the initial and asymptotic performance. This motivated the idea of dynamically decreasing σ from 1 (full sampling) towards 0 (pure expectation) to take advantage of the initial performance of Sarsa and the asymptotic performance of Tree-backup. To accomplish this we decreased σ by a factor of 0.95 after each episode. Q(σ) with a dynamically varying σ outperformed all of the fixed degrees of sampling.

Figure 3: 19-state random walk results. The plot shows the performance of Q(σ) in terms of RMS error in the value function. The results are an average of 100 runs. Q(1) had the best initial performance, Q(0) had the best asymptotic performance, and dynamic σ outperformed all fixed values of σ.

Stochastic Windy Gridworld

The windy gridworld is a tabular navigation task in a standard gridworld, as described by Sutton and Barto (1998). There is a start state and a goal state, and there are four possible moves: right, left, up, and down. When the agent moves into one of the middle columns of the gridworld, it is affected by an upward wind which shifts the resultant next state upwards by a number of cells that varies from column to column. If the agent is at the edge of the world and selects a move that would cause it to leave the grid, or it would be pushed off the world by the wind, it is simply placed in the nearest state at the edge of the world. At each time step the agent receives a constant reward of -1 until the goal is reached.

A variation of the windy gridworld, called the stochastic windy gridworld, is one where the results of choosing an action are not deterministic. The layout, actions, and wind strengths are the same, but at each time step, with a probability of 10%, the next state that results from picking any action is determined at random from the 8 states currently surrounding the agent.

We conducted an experiment on the stochastic windy gridworld which consisted of 1000 runs of 100 episodes each to evaluate the performance of various instances of Q(σ) with different parameter combinations. All instances of the algorithms behaved and learned according to an ε-greedy policy, with ε = 0.1. As the performance measure, we compared the average return over the 100 episodes. The results are summarized in Figure 4. For all the values of σ that we tested, choosing n = 3 resulted in the greatest performance; higher and lower values of n decreased the performance. Overall, Q(σ) with dynamic σ performed the best, while σ = 0.5 was a close second.

Figure 4: Stochastic windy gridworld results. The plot shows the performance of Q(σ) in terms of the average return over 100 episodes as a function of the step size, α, for various values of σ. The results are for selected α values, which are then connected by straight lines, and are an average of 1000 runs. The standard errors are all less than 0.3, which is about a line width. 3-step algorithms performed better than their 1-step equivalents, and Q(σ) with dynamic σ performed the best overall.

Mountain Cliff

We implemented a variant of the classical episodic task mountain car, as described by Sutton and Barto (1998). For this implementation, the rewards, actions, and goal remained the same. However, if the agent ever ventured past the top of the leftmost mountain, it would fall off a cliff, be rewarded -100, and be returned to a random initial location in the valley between the two hills. We named this environment mountain cliff. Both environments were tested and showed the same trend in the results; however, the results obtained in mountain cliff were more pronounced and thus more suitable for demonstration purposes.

Because the state space is continuous, we approximated q_π using tile coding function approximation. Specifically, we used version 3 of Sutton's tile coding software (n.d.) with 8 tilings, an asymmetric offset by consecutive odd numbers, and each tile taking over a 1/8 fraction of the feature space, which gives a resolution of approximately 1.6%. For each algorithm, we conducted 500 independent runs of 500 episodes each. All training was done on-policy under an ε-greedy policy with ε = 0.1 and γ = 1. We optimized for the average return after 500 episodes over different values of the step size parameter, α, and the backup length, n. The results correspond to the best-performing parameter combination for each algorithm: α = 1/6 and n = 4 for Sarsa; α = 1/6 and n = 8 for Tree-backup; α = 1/4 and n = 4 for Q(0.5); and α = 1/7 and n = 8 for dynamic σ. We omit n-step Expected Sarsa from the results because its performance was not much different from n-step Sarsa's.

Figure 6 shows the return per episode averaged over 500 runs. To smooth the results, we computed a right-centered moving average with a window of 30 successive episodes. Additionally, we added the average return per episode in a lighter tone to show the variance of each algorithm. As can be observed, atomic multi-step Sarsa and Q(0.5) had fairly similar performance. Among the atomic multi-step methods with static σ, Tree-backup had the best performance. Nonetheless, Q(σ) with dynamic σ outperformed all the algorithms that were using static σ.
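The dynamic σ used in these experiments simply starts at 1 (full sampling) and is multiplied by 0.95 after every episode, as described in the random walk experiment. A minimal sketch, reusing the hypothetical q_sigma_episode routine from the earlier sketch (num_episodes, env, policy, and the remaining parameters are placeholders):

```python
sigma = 1.0
for episode in range(num_episodes):
    # Hold sigma fixed within the episode, then decay it towards pure expectation.
    Q = q_sigma_episode(env, Q, policy, n, alpha, gamma,
                        sigma_fn=lambda t, s=sigma: s, rng=rng)
    sigma *= 0.95
```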
In order to gain more insight into the nature of the results, we looked at the average return per episode after 50 episodes (initial performance) and after 500 episodes (asymptotic performance) for each algorithm. Additionally, a 95% confidence interval was computed in order to validate the results. After 50 episodes, Q(0.5) had the best average return per episode among the four algorithms, with dynamic σ a close second. On the other hand, after 500 episodes, dynamic σ managed to outperform all the other algorithms, followed by Q(0.5). Q(1) (Sarsa) had the lowest performance both after 50 episodes and after 500 episodes. These results contrast with Figure 6 because the average is taken over all the previous episodes instead of the preceding 30 episodes.

Figure 5: The mountain cliff environment. The goal of the agent is to drive past the flag without falling off the cliff. The agent receives a reward of -1 at every time step, and falling off the cliff returns it to a random initial location in the valley with a reward of -100.

Discussion

From our experiments, it is evident that there is merit in unifying the space of algorithms with Q(σ). In prediction tasks, such as the 19-state random walk, varying the degree of sampling results in a trade-off between initial and asymptotic performance. In control tasks, such as the stochastic windy gridworld, intermediate degrees of sampling are capable of achieving higher per-episode average return than either extreme, depending on the number of elapsed episodes. These findings also extend to tasks with continuous state spaces, such as the mountain cliff. Intermediate values of σ allow for higher initial performance, whereas small values of σ allow for better asymptotic performance. As shown in Figure 6, Q(σ) with dynamic σ is able to exploit these two benefits by adjusting σ over time.

Moreover, our experiments in the stochastic windy gridworld task demonstrated that it is possible to improve performance by choosing a higher value of the backup length parameter, n. Varying n controls a bias-variance trade-off by adjusting how many rewards are included in the backup before bootstrapping, similar to the parameter λ in the TD(λ) algorithm. The parameter σ also has a bias-variance trade-off interpretation, as the Tree-backup algorithm decays the weighting of future rewards based on the stochasticity in the policy (and is therefore more biased). The length parameter n controls the bias-variance trade-off in the direction of the trajectory taken, while the parameter σ manages it by controlling the bootstrapping in the direction of actions not taken.

A qualitative result that illustrates the bias-variance trade-off induced by the parameter σ can be observed in the 19-state random walk experiment. A large value of σ results in lower bias at the beginning of training, and lower RMS error as a consequence. However, as the bias of the return decreases in the asymptote, the low variance inherent to small values of σ results in more accurate estimates of the action-value function.

Figure 6: Mountain cliff results. The plot shows the performance of each atomic multi-step algorithm in terms of the average return per episode. The dark lines show the results smoothed using a right-centered moving average with a window of 30 successive episodes, while the light lines show the un-smoothed results. Q(σ) with dynamic σ had the best performance among all the algorithms.

Conclusions

In this paper we studied Q(σ), a unifying algorithm for multi-step TD control methods. Q(σ), through the use of the sampling parameter σ, allows for continuous variation between updating based on full sampling and updating based on pure expectation. Our results on prediction and control problems showed that an intermediate fixed degree of sampling can outperform the methods that exist at the extremes (Sarsa and Tree-backup). In addition, we presented a simple way of dynamically adjusting σ which outperformed any fixed degree of sampling.

Our presentation of Q(σ) was limited to the atomic multi-step case without eligibility traces, we only conducted experiments on on-policy problems, and we only investigated one simple method for dynamically varying σ. This leaves open several avenues for future research. First, Q(σ) could be extended to use eligibility traces and compound backups.
Second, the performance of Q(σ) could be evaluated on off-policy problems. Third, other schemes for dynamically varying σ could be investigated, perhaps as a function of the state, the recently observed rewards, or some measure of the learning progress.

Acknowledgments

The authors thank Vincent Zhang, Harm van Seijen, Doina Precup, and Pierre-Luc Bacon for insights and discussions contributing to the results presented in this paper, and the entire Reinforcement Learning and Artificial Intelligence research group for providing the environment to nurture and support this research. We gratefully acknowledge funding from Alberta Innovates Technology Futures, Google DeepMind, and the Natural Sciences and Engineering Research Council of Canada.

Appendix: Proof of Theorem 1

Let X = S × A, X_t = (S_t, A_t) ∈ X, R̄_t = E{R_t}, and Q* be the optimal action-value function, defined as

Q*(S_t, A_t) = R̄_{t+1} + γ E{max_a Q*(S_{t+1}, a)}.   (17)

We define a new stochastic process (α_t, Δ_t, F_t)_{t ≥ 0} by subtracting Q*(X_t) from both sides of equation (16),

Δ_{t+1}(X_t) = (1 − α_t(X_t)) Δ_t(X_t) + α_t(X_t) F_t(X_t),

and letting α_t ∈ (0, 1], Δ_t(X_t) = Q_t(X_t) − Q*(X_t), and F_t = R_{t+1} + γ[σ_{t+1} Q_t(X_{t+1}) + (1 − σ_{t+1}) V_{t+1}] − Q*(X_t). Additionally, let P_t be a sequence of increasing σ-fields representing the history, such that α_0 and Δ_0 are P_0-measurable and α_t, Δ_t, and F_{t−1} are P_t-measurable for t ≥ 1.

Proving that Δ_t converges to 0 as t → ∞ is equivalent to showing that Q_t converges to Q* as t → ∞. Consequently, the proof is equivalent to showing that the conditions of Lemma 1 from Singh et al. (2000) are satisfied for Δ_t. Conditions one, two, and three of the lemma are satisfied by the corresponding assumptions of the theorem. Hence, we only need to show that ||E{F_t | P_t}|| ≤ k||Δ_t|| + C_t, where ||·|| is the maximum norm, k ∈ [0, 1), and C_t goes to 0 with probability 1.

By adding and subtracting γ max_a Q_t(S_{t+1}, a), using the definition of Q* and the triangle inequality, we can show that

||E{F_t | P_t}|| ≤ ||E{R_{t+1} + γ max_a Q_t(S_{t+1}, a) − Q*(S_t, A_t)}||
                 + γ ||E{σ_{t+1} Q_t(S_{t+1}, A_{t+1}) + (1 − σ_{t+1}) V_{t+1} − max_a Q_t(S_{t+1}, a)}||
               = γ ||E{max_a Q_t(S_{t+1}, a) − max_b Q*(S_{t+1}, b)}|| + C_t
               ≤ γ max_s |max_a Q_t(s, a) − max_b Q*(s, b)| + C_t
               ≤ γ max_s max_a |Q_t(s, a) − Q*(s, a)| + C_t
               = γ ||Δ_t|| + C_t.

Note that if the policy is greedy and σ_{t+1} ∈ [0, 1], then σ_{t+1} Q_t(S_{t+1}, A_{t+1}) + (1 − σ_{t+1}) V_{t+1} = max_a Q_t(S_{t+1}, a). Therefore, C_t goes to 0 as the policy becomes greedy in the limit. Consequently, condition 3 of Lemma 1 from Singh et al. (2000) is satisfied. Therefore, Δ_t converges to 0 w.p. 1, which implies that Q_t converges to Q* w.p. 1.

References

Jaakkola, T.; Jordan, M. I.; and Singh, S. P. 1994. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6(6).

Precup, D.; Sutton, R. S.; and Singh, S. 2000. Eligibility traces for off-policy policy evaluation. In Proceedings of the 17th International Conference on Machine Learning. Morgan Kaufmann.

Rummery, G. A., and Niranjan, M. 1994. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Engineering Department, Cambridge University.

Rummery, G. A. 1995. Problem Solving with Reinforcement Learning. Ph.D. Dissertation, Cambridge University.

Singh, S.; Jaakkola, T.; Littman, M. L.; and Szepesvári, C. 2000. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning 38(3).

Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. Cambridge, Massachusetts: MIT Press.

Sutton, R. S., and Barto, A. G. 2018. Reinforcement Learning: An Introduction. 2nd edition. Manuscript in preparation.

Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3(1):9–44.

Sutton, R. S. 1996. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Touretzky, D. S., and Hasselmo, M. E., eds., Advances in Neural Information Processing Systems 8. MIT Press.

van Seijen, H.; van Hasselt, H.; Whiteson, S.; and Wiering, M. 2009. A theoretical and empirical analysis of expected Sarsa. In Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.

Watkins, C. J. C. H., and Dayan, P. 1992. Q-learning. Machine Learning 8(3-4).

Watkins, C. J. C. H. 1989. Learning from Delayed Rewards. Ph.D. Dissertation, Cambridge University.


More information

PSAS: Government transfers what you need to know

PSAS: Government transfers what you need to know PSAS: Government trnsfers wht you need to know Ferury 2018 Overview This summry will provide users with n understnding of the significnt recognition, presenttion nd disclosure requirements of the stndrd.

More information

A Static Model for Voting on Social Security

A Static Model for Voting on Social Security A Sttic Model for Voting on Socil Security Henning Bohn Deprtment of Economics University of Cliforni t Snt Brbr Snt Brbr, CA 93106, USA; nd CESifo Phone: 1-805-893-4532; Fx: 1-805-893-8830. E-mil: bohn@econ.ucsb.edu

More information

ACHIEVING ALARP WITH SAFETY INSTRUMENTED SYSTEMS

ACHIEVING ALARP WITH SAFETY INSTRUMENTED SYSTEMS ACHIEVING ALARP WITH SAFETY INSTRUMENTED SYSTEMS C.R. Timms, MIEE, United Kingdom, Tel: + 44 (0) 9 88668, Emil: c.timms@ifb.co.uk Keywords: ALARP, hzrds, risk, sfety, SIS. Abstrct This pper sets out methodology

More information

Name Date. Find the LCM of the numbers using the two methods shown above.

Name Date. Find the LCM of the numbers using the two methods shown above. Lest Common Multiple Multiples tht re shred by two or more numbers re clled common multiples. The lest of the common multiples is clled the lest common multiple (LCM). There re severl different wys to

More information

FINANCIAL ANALYSIS I. INTRODUCTION AND METHODOLOGY

FINANCIAL ANALYSIS I. INTRODUCTION AND METHODOLOGY Dhk Wter Supply Network Improvement Project (RRP BAN 47254003) FINANCIAL ANALYSIS I. INTRODUCTION AND METHODOLOGY A. Introduction 1. The Asin Development Bnk (ADB) finncil nlysis of the proposed Dhk Wter

More information

OPEN BUDGET QUESTIONNAIRE UKRAINE

OPEN BUDGET QUESTIONNAIRE UKRAINE Interntionl Budget Prtnership OPEN BUDGET QUESTIONNAIRE UKRAINE September 28, 2007 Interntionl Budget Prtnership Center on Budget nd Policy Priorities 820 First Street, NE Suite 510 Wshington, DC 20002

More information

Market uncertainty, macroeconomic expectations and the European sovereign bond spreads.

Market uncertainty, macroeconomic expectations and the European sovereign bond spreads. Mrket uncertinty, mcroeconomic expecttions nd the Europen sovereign bond spreds. Dimitris A. Georgoutsos Athens University of Economics & Business, Deprtment of Accounting & Finnce 76, Ptission str., 434,

More information

Technical Report Global Leader Dry Bulk Derivatives. FIS Technical - Grains And Ferts. Highlights:

Technical Report Global Leader Dry Bulk Derivatives. FIS Technical - Grains And Ferts. Highlights: Technicl Report Technicl Anlyst FIS Technicl - Grins And Ferts Edwrd Hutn 44 20 7090 1120 Edwrdh@freightinvesr.com Highlights: SOY The weekly chrt is chowing lower high suggesting wekness going forwrd,

More information

MATH 236 ELAC MATH DEPARTMENT FALL 2017 SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

MATH 236 ELAC MATH DEPARTMENT FALL 2017 SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. MATH 236 ELAC MATH DEPARTMENT FALL 2017 TEST 1 REVIEW SHORT ANSWER. Write the word or phrse tht best completes ech sttement or nswers the question. 1) The supply nd demnd equtions for certin product re

More information

Search-based Uncertainty-wise Requirements Prioritization

Search-based Uncertainty-wise Requirements Prioritization Simul Reserch Lbortory, Technicl Report, 2017-06 Jun, 2017 Serch-bsed Uncertinty-wise Requirements Prioritiztion Yn Li 1, Mn Zhng 2, To Yue 2,3, Shukt Ali 2 nd Li Zhng 1 1 Beihng University, 2 Simul Reserch

More information

Incentives from stock option grants: a behavioral approach

Incentives from stock option grants: a behavioral approach Incentives from stock option grnts: behviorl pproch Hmz Bhji To cite this version: Hmz Bhji. Incentives from stock option grnts: behviorl pproch. 6th Interntionl Finnce Conference (IFC)- Tunisi, Mr 2011,

More information

Characterizing Higher-Order Ross More Risk Aversion by Comparison of Risk Compensation

Characterizing Higher-Order Ross More Risk Aversion by Comparison of Risk Compensation Chrcterizing Higher-Order Ross More Risk Aversion by Comprison of Risk Compenstion Guoqing Tin Yougong Tin b,c Deprtment of Economics, Texs A&M University, College Sttion, TX77843, USA b School of Economics,

More information

Information Acquisition and Disclosure: the Case of Differentiated Goods Duopoly

Information Acquisition and Disclosure: the Case of Differentiated Goods Duopoly Informtion Acquisition nd Disclosure: the Cse of Differentited Goods Duopoly Snxi Li Jinye Yn Xundong Yin We thnk Dvid Mrtimort, Thoms Mriotti, Ptrick Rey, Wilfried Snd-Zntmn, Frnces Xu nd Yongsheng Xu

More information

Insurance: Mathematics and Economics

Insurance: Mathematics and Economics Insurnce: Mthemtics nd Economics 43 008) 303 315 Contents lists vilble t ScienceDirect Insurnce: Mthemtics nd Economics journl homepge: www.elsevier.com/locte/ime he design of equity-indexed nnuities Phelim

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning n-step bootstrapping Daniel Hennes 12.06.2017 University Stuttgart - IPVS - Machine Learning & Robotics 1 n-step bootstrapping Unifying Monte Carlo and TD n-step TD n-step Sarsa

More information

MARKET POWER AND MISREPRESENTATION

MARKET POWER AND MISREPRESENTATION MARKET POWER AND MISREPRESENTATION MICROECONOMICS Principles nd Anlysis Frnk Cowell Note: the detil in slides mrked * cn only e seen if you run the slideshow July 2017 1 Introduction Presenttion concerns

More information

Multi-step Bootstrapping

Multi-step Bootstrapping Multi-step Bootstrapping Jennifer She Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto February 7, 2017 J February 7, 2017 1 / 29 Multi-step Bootstrapping Generalization

More information

Preference Cloud Theory: Imprecise Preferences and Preference Reversals Oben Bayrak and John Hey

Preference Cloud Theory: Imprecise Preferences and Preference Reversals Oben Bayrak and John Hey Preference Cloud Theory: Imprecise Preferences nd Preference Reversls Oben Byrk nd John Hey This pper presents new theory, clled Preference Cloud Theory, of decision-mking under uncertinty. This new theory

More information

FIS Technical - Capesize

FIS Technical - Capesize Technicl Report Technicl Anlyst FIS Technicl - Cpesize Edwrd Hutn 442070901120 Edwrdh@freightinvesr.com Client Reltions Andrew Cullen 442070901120 Andrewc@freightinvesr.com Highlights: Cpesize Index- Holding

More information

A comparison of quadratic discriminant function with discriminant function based on absolute deviation from the mean

A comparison of quadratic discriminant function with discriminant function based on absolute deviation from the mean A comprison of qudrtic discriminnt function with discriminnt function bsed on bsolute devition from the men S. Gneslingm 1, A. Nnthkumr Siv Gnesh 1, 1 Institute of Informtion Sciences nd Technology College

More information

First version: September 1997 This version: October On the Relevance of Modeling Volatility for Pricing Purposes

First version: September 1997 This version: October On the Relevance of Modeling Volatility for Pricing Purposes First version: September 1997 This version: October 1999 On the Relevnce of Modeling Voltility for Pricing Purposes Abstrct: Mnuel Moreno 3 Deprtment of Economics nd Business Universitt Pompeu Fbr Crrer

More information

Rates of Return of the German PAYG System - How they can be measured and how they will develop

Rates of Return of the German PAYG System - How they can be measured and how they will develop Rtes of Return of the Germn PAYG System - How they cn be mesured nd how they will develop Christin Benit Wilke 97-2005 me Mnnheimer Forschungsinstitut Ökonomie und Demogrphischer Wndel Gebäude L 13, 17_D-68131

More information

"Multilateralism, Regionalism, and the Sustainability of 'Natural' Trading Blocs"

Multilateralism, Regionalism, and the Sustainability of 'Natural' Trading Blocs "Multilterlism, Regionlism, nd the Sustinility of 'Nturl' Trding Blocs" y Eric Bond Deprtment of Economics Penn Stte June, 1999 Astrct: This pper compres the mximum level of world welfre ttinle in n incentive

More information

Managerial Incentives and Financial Contagion

Managerial Incentives and Financial Contagion WP/04/199 ngeril Incentives nd Finncil Contgion Sujit Chkrvorti nd Subir Lll 004 Interntionl onetry Fund WP/04/199 IF Working Pper Policy Development nd Review Deprtment ngeril Incentives nd Finncil Contgion

More information

Trigonometry - Activity 21 General Triangle Solution: Given three sides.

Trigonometry - Activity 21 General Triangle Solution: Given three sides. Nme: lss: p 43 Mths Helper Plus Resoure Set. opyright 003 rue. Vughn, Tehers hoie Softwre Trigonometry - tivity 1 Generl Tringle Solution: Given three sides. When the three side lengths '', '' nd '' of

More information

Chapter 4. Profit and Bayesian Optimality

Chapter 4. Profit and Bayesian Optimality Chpter 4 Profit nd Byesin Optimlity In this chpter we consider the objective of profit. The objective of profit mximiztion dds significnt new chllenge over the previously considered objective of socil

More information

ECON 105 Homework 2 KEY Open Economy Macroeconomics Due November 29

ECON 105 Homework 2 KEY Open Economy Macroeconomics Due November 29 Instructions: ECON 105 Homework 2 KEY Open Economy Mcroeconomics Due Novemer 29 The purpose of this ssignment it to integrte the explntions found in chpter 16 ok Kennedy with the D-S model nd the Money

More information

Technical Report Global Leader Dry Bulk Derivatives

Technical Report Global Leader Dry Bulk Derivatives Soybens Mrch 17 - Weekly Soybens Mrch 17 - Dily Source Bloomberg Weekly Close US$ 1,026 7/8 RSI 56 MACD Bullish, the hisgrm is flt S1 US$ 1,032 ½ S2 US$ 1,001 R1 US$ 1,072 R2 US$ 1,080 Dily Close US$ 1,042

More information

Optimal Trading Strategies in a Limit Order Market with Imperfect Liquidity

Optimal Trading Strategies in a Limit Order Market with Imperfect Liquidity Optiml rding Strtegies in Limit Order Mrket with Imperfect Liquidity P. Kovlev,, G. Iori b City University, Deprtment of Economics, D Socil Sciences Bldg, Whiskin St., London ECR JD b City University,

More information

Does Population Aging Represent a Crisis for Rich Societies?

Does Population Aging Represent a Crisis for Rich Societies? First drft Does Popultion Aging Represent Crisis for Rich Societies? by Gry Burtless THE BROOKINGS INSTITUTION Jnury 2002 This pper ws prepred for session of the nnul meetings of the Americn Economic Assocition

More information

OPEN BUDGET QUESTIONNAIRE SOUTH AFRICA

OPEN BUDGET QUESTIONNAIRE SOUTH AFRICA Interntionl Budget Prtnership OPEN BUDGET QUESTIONNAIRE SOUTH AFRICA September 28, 2007 Interntionl Budget Prtnership Center on Budget nd Policy Priorities 820 First Street, NE Suite 510 Wshington, DC

More information

International Budget Partnership OPEN BUDGET QUESTIONNAIRE POLAND

International Budget Partnership OPEN BUDGET QUESTIONNAIRE POLAND Interntionl Budget Prtnership OPEN BUDGET QUESTIONNAIRE POLAND September 28, 2007 Interntionl Budget Prtnership Center on Budget nd Policy Priorities 820 First Street, NE Suite 510 Wshington, DC 20002

More information

Optimal incentive contracts under loss aversion and inequity aversion

Optimal incentive contracts under loss aversion and inequity aversion Fuzzy Optim Decis Mking https://doi.org/10.1007/s10700-018-9288-1 Optiml incentive contrcts under loss version nd inequity version Chi Zhou 1 Jin Peng 2 Zhibing Liu 2 Binwei Dong 3 Springer Science+Business

More information

NBER WORKING PAPER SERIES THE IMPACT OF TRADE ON INTRAINDUSTRY REALLOCATIONS AND AGGREGATE INDUSTRY PRODUCTIVITY: A COMMENT

NBER WORKING PAPER SERIES THE IMPACT OF TRADE ON INTRAINDUSTRY REALLOCATIONS AND AGGREGATE INDUSTRY PRODUCTIVITY: A COMMENT NBER WORKING PAPER SERIES THE IMPACT OF TRAE ON INTRAINUSTRY REALLOCATIONS AN AGGREGATE INUSTRY PROUCTIVITY: A COMMENT Richrd E. Bldwin Frederic Robert-Nicoud Working Pper 078 http://www.nber.org/ppers/w078

More information