Supplementary Material for Combinatorial Partial Monitoring Game with Linear Feedback and Its Application

A. Full proof for Theorems 4.1 and 4.2


If the reader will recall, we have the following problem-specific constants in the main text: the size |σ| of the global observer set, the parameter L > 0 from the Lipschitz continuity assumption, the error bound

  β_σ = max ‖(M_σ^⊤ M_σ)^{−1} Σ_{i=1}^{|σ|} M_{x_i}^⊤ M_{x_i}(ν_i − ν_0)‖₂,

where the max is taken over ν_0, ν_1, …, ν_{|σ|} ∈ [0,1]^n, and the maximum difference in expected reward R_max = max_{x_1,x_2 ∈ X, ν ∈ [0,1]^n} (r̄(x_1, ν) − r̄(x_2, ν)). For technical reasons, we also defined φ(ν) = max(min(ν, 1), 0) componentwise, which maps ν to the nearest vector in [0,1]^n, and set r̄(x, ν) = r̄(x, φ(ν)) for all ν ∈ R^n \ [0,1]^n to preserve Lipschitz continuity throughout R^n.

To make our proof clearer, we define v(t) as the state of any variable v by the end of time step t. Our analysis is based on the snapshot of all variables just before the statement t ← t + 1 (Lines 14 and 20). One batch of processing in the exploration phase is called one round, after which n_σ is increased by 1. Denote by ν̂^{(j)} the estimated mean of outcomes after j rounds of exploration. For example, at time t the estimated mean of outcomes is ν̂(t) and the exploration counter is n_σ(t), so we have ν̂^{(n_σ(t))} = ν̂(t). At time step t + 1, the player uses the previous knowledge of ν̂(t) to get x̂(t+1) = argmax_{x ∈ X} r̄(x, ν̂(t)) and x̂'(t+1) = argmax_{x ∈ X \ {x̂(t+1)}} r̄(x, ν̂(t)).

In the following analysis, the frequency function is set to f_{|X|}(t) = ln t + 2 ln|X|. Using f_{|X|}(t), we can construct the confidence interval √(α f_{|X|}(t)/n_σ) to eliminate failures with high probability. Define N(t) = |X|² t, which will be used frequently in our analysis; then exp{−f_{|X|}(t)} = N(t)^{−1}. Let α = 8L²β_σ²a, where a > 0 is a parameter to be tuned later. The symbols used in the proof are listed in Table 1, and to facilitate the understanding of our proof, each lemma is briefly summarized in Table 2.

Below we give an outline of the complete proof, which consists of three main parts. First, we introduce some general insights, present the preliminaries, and prove basic properties of our model and Algorithm GCB, which are shared by the proofs of both the distribution-independent and distribution-dependent regret bounds. Next, we obtain the concentration property of the empirical mean of outcomes via the Azuma-Hoeffding inequality (Fact A.2) in Lemma A.3. Lemma A.13 shows that our algorithm bounds the number of exploration rounds by O(T^{2/3}(log T)^{1/3}), which implies that our algorithm will not play exploration for too long. In Lemma A.14, we prove that when the gap between the estimated optimal action x̂ and the second optimal x̂' is large (i.e., the first condition in Line 8), the estimated optimal action is sub-optimal only with low probability. This means that our global confidence bound excludes sub-optimal actions effectively. In Section A.1, we prove the distribution-independent regret bound of O(T^{2/3}(log T)^{1/3}) (Theorem 4.1 in the main text). In Lemma A.15, we show that the gap between the estimated optimal action x̂ and the real one x* decays exponentially with the number of exploration rounds. Thus, the penalty in the exploitation phase can be derived in Lemma A.16. Then, we use Lemmas A.13 and A.16 to prove Theorem 4.1 in the main text, and the distribution-independent bound O(T^{2/3}(log T)^{1/3}) is achieved. In Section A.2, we prove the distribution-dependent bound of O(log T) related to the predetermined distribution p, assuming that the optimal action x* is unique (Theorem 4.2 in the main text).
First, we show in Lemma A.17 that when the algorithm has played Ω((ln t + 2 ln|X|)/Δ_min²) rounds of exploration, the probability that the estimated optimal action x̂ is sub-optimal is low. Then, in Lemma A.18, we combine the results of Lemmas A.14 and A.17 and show that the algorithm exploits with a sub-optimal action only with low probability. Thus, Lemma A.18 is enough to bound the regret of exploitation. Next, we bound the regret of exploration by bounding the number of exploration rounds to O((ln T + 2 ln|X|)/Δ_min²) in Lemma A.22. This is done by showing that whenever the algorithm has conducted Θ((ln t + 2 ln|X|)/Δ_min²) rounds of exploration, with high probability it switches to exploitation (Lemma A.19), and then aggregating multiple switches between exploration and exploitation in the proof of Lemma A.22. Finally, we combine Lemmas A.18 and A.22 to prove Theorem 4.2 in the main text.
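To make the recurring quantities above concrete, the following minimal Python sketch (not part of the original analysis; the linear reward function and all numeric values are illustrative assumptions) computes the frequency function f_{|X|}(t), the confidence radius √(α f_{|X|}(t)/n_σ), and the pair x̂, x̂' from an estimate ν̂:

import numpy as np

def f_freq(t, num_actions):
    # frequency function f_|X|(t) = ln t + 2 ln |X|
    return np.log(t) + 2.0 * np.log(num_actions)

def confidence_radius(t, n_sigma, alpha, num_actions):
    # global confidence interval sqrt(alpha * f_|X|(t) / n_sigma)
    return np.sqrt(alpha * f_freq(t, num_actions) / n_sigma)

# Illustrative action set with a linear expected reward r(x, nu) = <x, nu>,
# which is Lipschitz continuous in nu with constant L = max_x ||x||.
actions = [np.array(v, dtype=float) for v in ([1, 0, 1], [0, 1, 1], [1, 1, 0])]
nu_hat = np.array([0.4, 0.7, 0.5])                      # empirical mean of outcomes
rewards = np.array([x @ nu_hat for x in actions])
order = np.argsort(rewards)[::-1]
x_hat, x_hat_2 = actions[order[0]], actions[order[1]]   # estimated best and second best
print(x_hat, x_hat_2, confidence_radius(t=100, n_sigma=5, alpha=1.0, num_actions=len(actions)))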

Symbols in the main text:
  v(t) — state of any variable v by the end of time step t
  v ∈ {0,1}^n — outcomes of the environment
  ṽ — estimation of the outcomes through inversion
  ν ∈ [0,1]^n — mean of outcomes
  ν̂ — empirical mean of outcomes
  x ∈ X — action x in the action set X
  r(x, v), r̄(x, ν) — reward function taking x and v, and expected reward function taking x and ν
  M_x ∈ R^{m_x × n} — transformation matrix of action x, where m_x depends on x
  y(t) ∈ R^{m_{x(t)}} — feedback vector under the choice of x(t)
  y — vector that stacks feedbacks from different times
  Δ_x, Δ_max, Δ_min — reward gap of action x, the maximum gap, and the minimum (positive) gap
  σ ⊆ X — global observer set of actions
  L — Lipschitz constant
  β_σ — distribution-independent error bound from σ
  R_max — distribution-independent largest gap of expected reward
  f_{|X|}(t) — frequency function
  x* — real optimal action
  x̂(t) — estimated optimal action at time t
  x̂'(t) — estimated second optimal action at time t

Symbols in the proof:
  n_σ — exploration counter
  µ(t), η(t) — threshold functions
  X_Good, X_Bad — good action set, and bad action set
  F_Good, F_Bad — event of choosing the good action set, and the bad action set
  L_CI, L^c_CI — event of the occurring gap being larger than the confidence interval, and its complement
  E_Explore, E_InExplore, E_FinExplore — event of starting exploration, being in the middle of it, and being at its end
  E_Exploit — event of exploitation
  δ_{x_i,x_j}, δ̂_{x_i,x_j}(t) — reward gap of actions x_i and x_j, and its estimated value at time t
  G_k — the event indicating the first occurrence of k rounds of exploration

Table 1. List of symbols in the proof.

  Result — Succinct interpretation of the result — Dependence
  Lemma A.3 — Estimate of outcomes concentrates around the mean. — Fact A.2
  Lemma A.7 — Difference of real and estimated gap is bounded. — —
  Lemma A.8 — Estimated error of outcomes is small compared to the confidence interval. — Lemma A.3
  Lemma A.13 — The counter of exploration is bounded within O(T^{2/3}(log T)^{1/3}). — —
  Lemma A.14 — Finding a bad action that beats the confidence interval occurs rarely. — Lemma A.8
  Lemma A.15 — Incurring a large penalty for the current optimal action is rare. — Lemma A.3
  Lemma A.16 — The penalty in the exploitation phase is bounded. — Lemmas A.14, A.15
  Theorem 4.1 — Distribution-independent bound: O(T^{2/3}(log T)^{1/3}). — Lemmas A.13, A.16
  Lemma A.17 — With enough exploration, finding a bad action is rare. — Lemma A.3
  Lemma A.18 — Finding a bad action and exploiting it become rare as time elapses. — Lemmas A.14, A.17
  Lemma A.19 — With enough exploration, finding a good action but still exploring becomes rare. — Lemmas A.3, A.7, A.13
  Lemma A.20 — Once the algorithm performs enough exploration, it switches to exploitation. — Lemmas A.17, A.19
  Lemma A.22 — Exploration rounds are bounded. — Lemma A.20
  Theorem 4.2 — Distribution-dependent bound: O(log T). — Lemmas A.18, A.22

Table 2. List of lemmas and their dependencies in the proof.

Fact A.1. The following probability laws will be used in the analysis.

Law of Conditional Probability: Pr[A ∩ B] = Pr[A | B] · Pr[B]. Law of Total Probability: if {B_n : n = 1, 2, …} is a set of disjoint events whose union is the entire sample space, then Pr[A] = Σ_n Pr[A ∩ B_n].

Fact A.2 (Azuma-Hoeffding Inequality in Euclidean Space (Theorem 1.8 of (Hayes, 2005))). Let X = (X_0, …, X_n) be a very-weak martingale, meaning that for every i, E[X_i | X_{i−1}] = X_{i−1}, taking values in Euclidean space, i.e., X_i ∈ R^d for every i. Suppose X_0 = 0 and ‖X_i − X_{i−1}‖ ≤ 1 for i = 1, …, n. Then, for every ε > 0,

  Pr[‖X_n‖ ≥ ε] < 2e^{1−(ε−1)²/(2n)} < 2e² · e^{−ε²/(2n)}.

We can use the preceding fact to obtain the concentration property of outcomes during exploration.

Lemma A.3 (Concentration during exploration). After exploration rounds i = 1, 2, …, j, performed at times t_1, t_2, …, t_j respectively, we use the inverse to get ṽ_i = I(M_σ, y(t_i)) = M_σ⁺ y_i, and their mean is ν̂^{(j)} = (1/j) Σ_{i=1}^{j} ṽ_i. Then, for all γ > 0:

  Pr[‖ν − ν̂^{(j)}‖ ≥ γ] ≤ 2e² exp{−γ²j/(2β_σ²)}.

Proof. For each i, let X_i be the partial sum X_i = Σ_{l=1}^{i} (ν − ṽ_l)/β_σ, where E[ν − ṽ_i] = 0 and ‖ν − ṽ_i‖ ≤ β_σ. Thus X_i − X_{i−1} = (ν − ṽ_i)/β_σ implies ‖X_i − X_{i−1}‖ ≤ 1. Since ṽ_i is independent of the previous inverses ṽ_1, …, ṽ_{i−1}, it holds that

  E[X_i | X_{i−1}] = X_{i−1} + E[(ν − ṽ_i)/β_σ] = X_{i−1}.

Therefore, X = (X_0, …, X_n) satisfies the definition of a very-weak martingale. Applying Fact A.2 yields, for every ε > 0, Pr[‖X_j‖ ≥ ε] < 2e² e^{−ε²/(2j)}. Let γ = εβ_σ/j; since ‖ν − ν̂^{(j)}‖ = (β_σ/j)‖X_j‖, we get, for all γ > 0:

  Pr[‖ν − ν̂^{(j)}‖ ≥ γ] ≤ 2e² exp{−γ²j/(2β_σ²)}. ∎

Under a predetermined outcome distribution p with mean outcome vector ν and x* = argmax_{x ∈ X} r̄(x, ν), in the main text we define the gaps:

  Δ_x = r̄(x*, ν) − r̄(x, ν),
  Δ_max = max{Δ_x : x ∈ X},
  Δ_min = min{Δ_x : x ∈ X, Δ_x > 0}.

Definition A.4 (Good actions / bad actions). Based on the distance to the optimal action, define good actions and bad actions as:

  X_Good = {x : x ∈ X, Δ_x = 0},
  X_Bad = {x : x ∈ X, Δ_x > 0}.

Therefore, X = X_Good ∪ X_Bad. Moreover, x* ∈ X_Good. (x* is unique if and only if |X_Good| = 1.)
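The concentration claim of Lemma A.3 can be checked empirically. The sketch below is a simulation under assumed toy values — a small identity-plus-row matrix standing in for M_σ, Bernoulli outcomes, and β_σ replaced by the crude bound √n on ‖ν − ṽ‖ — and compares a Monte Carlo estimate of the left-hand side with the stated bound:

import numpy as np

rng = np.random.default_rng(0)
n = 4
M_sigma = np.vstack([np.eye(n), np.ones((1, n))])  # toy stacked matrix with full column rank
M_pinv = np.linalg.pinv(M_sigma)                   # Moore-Penrose pseudo-inverse M_sigma^+
nu = rng.uniform(0.2, 0.8, size=n)                 # true mean outcome vector

def estimate_one_round():
    v = rng.binomial(1, nu).astype(float)  # one realized outcome vector
    y = M_sigma @ v                        # linear feedback from the global observer set
    return M_pinv @ y                      # v~ = M_sigma^+ y, an unbiased estimate of nu

j, trials, gamma = 50, 2000, 1.0
beta = np.sqrt(n)  # crude deterministic bound on ||nu - v~|| for this toy M_sigma
deviations = [
    np.linalg.norm(nu - np.mean([estimate_one_round() for _ in range(j)], axis=0))
    for _ in range(trials)
]
empirical = np.mean([d >= gamma for d in deviations])
bound = 2 * np.e**2 * np.exp(-gamma**2 * j / (2 * beta**2))
print(f"empirical {empirical:.4f} <= bound {bound:.4f}")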

Definition A.5 (Events of finding a good action / bad action). Define x̂(t) = argmax_{x ∈ X} r̄(x, ν̂(t−1)) as the current optimal action at time t. Let F_Bad(t) be the event that the algorithm fails to choose an optimal action at time t. Formally, F_Bad(t) and its complement event are:

  F_Bad(t) = {x̂(t) ∈ X_Bad},
  F_Good(t) = {x̂(t) ∈ X_Good}.

To build the connection with the exploration round j, we define the time-invariant event F_Bad^{(j)} as the event in which the algorithm fails to choose an optimal action after j rounds of exploration:

  F_Bad^{(j)} = {x̂^{(j)} ∈ X_Bad},
  F_Good^{(j)} = {x̂^{(j)} ∈ X_Good},

where x̂^{(j)} = argmax_{x ∈ X} r̄(x, ν̂^{(j)}). By definition, it always holds that F_Bad^{(n_σ(t−1))} = F_Bad(t) and F_Good^{(n_σ(t−1))} = F_Good(t).

Definition A.6 (Estimated gap and real gap). For any pair of actions x_i, x_j ∈ X, define the gap of estimated reward between x_i and x_j as δ̂_{x_i,x_j}(t) = r̄(x_i, ν̂(t−1)) − r̄(x_j, ν̂(t−1)), and the gap of real reward between them as δ_{x_i,x_j} = r̄(x_i, ν) − r̄(x_j, ν).

Lemma A.7 (Bound of the gap). For any pair of actions x_i, x_j ∈ X, we have, for every time t:

  |δ̂_{x_i,x_j}(t) − δ_{x_i,x_j}| ≤ 2L‖ν − ν̂(t−1)‖.

Proof.

  |δ̂_{x_i,x_j}(t) − δ_{x_i,x_j}| = |(r̄(x_i, ν̂(t−1)) − r̄(x_i, ν)) − (r̄(x_j, ν̂(t−1)) − r̄(x_j, ν))|
  ≤ |r̄(x_i, ν̂(t−1)) − r̄(x_i, ν)| + |r̄(x_j, ν̂(t−1)) − r̄(x_j, ν)|
  ≤ L‖ν − ν̂(t−1)‖ + L‖ν − ν̂(t−1)‖
  = 2L‖ν − ν̂(t−1)‖. ∎

Lemma A.8 (Small error in estimation). Given time t, for f_{|X|}(t) = ln t + 2 ln|X|, α = 8L²β_σ²a, and a > 0, for all γ > 0:

  Pr[‖ν − ν̂(t−1)‖ ≥ γ √(α f_{|X|}(t)/n_σ(t−1))] ≤ (2e²/|X|²) · N(t)^{1−4γ²L²a}.

Proof. Since the number of exploration rounds equals the counter n_σ(t−1) and ν̂(t−1) = ν̂^{(n_σ(t−1))}, we have:

  Pr[‖ν − ν̂(t−1)‖ ≥ γ √(α f_{|X|}(t)/n_σ(t−1))]
  = Σ_{j=1}^{t−1} Pr[‖ν − ν̂^{(j)}‖ ≥ γ √(α f_{|X|}(t)/j)] · Pr[n_σ(t−1) = j]
  ≤ Σ_{j=1}^{t−1} Pr[‖ν − ν̂^{(j)}‖ ≥ γ √(α f_{|X|}(t)/j)]
  ≤ Σ_{j=1}^{t−1} 2e² exp{−γ²α f_{|X|}(t)/(2β_σ²)}    {Lemma A.3}
  ≤ 2e²(t−1) · N(t)^{−γ²α/(2β_σ²)}.

Since α = 8L²β_σ²a and a > 0, and t = N(t)/|X|², the probability is:

  Pr[‖ν − ν̂(t−1)‖ ≥ γ √(α f_{|X|}(t)/n_σ(t−1))] ≤ 2e² t · N(t)^{−4γ²L²a} = (2e²/|X|²) · N(t)^{1−4γ²L²a}. ∎

Definition A.9 (Events of exploration or exploitation). In Algorithm GCB, for any time t, we can define three events, namely the beginning of exploration E_Explore(t), being in the process of exploration E_InExplore(t), and exploitation E_Exploit(t). They are mutually exclusive, and E_Explore(t) ∪ E_InExplore(t) ∪ E_Exploit(t) always holds. Formally:

  E_Explore(t) = {state(t) = begin exploration},
  E_InExplore(t) = {state(t) = in exploration},
  E_Exploit(t) = {state(t) = exploitation}.

Definition A.10 (Events related to the confidence interval). From Line 8 of Algorithm GCB, we can define the event for the first condition — the gap between the estimated optimal action and every other action being larger than the confidence interval — as L_CI(t) at time t, i.e.:

  L_CI(t) = {∀x ∈ X \ {x̂(t)}: δ̂_{x̂(t),x}(t) > √(α f_{|X|}(t)/n_σ(t−1))}.

Its complement event is:

  L^c_CI(t) = {∃x ∈ X \ {x̂(t)}: δ̂_{x̂(t),x}(t) ≤ √(α f_{|X|}(t)/n_σ(t−1))}.

Remark 1. In Algorithm GCB, the first condition of Line 8 is true if and only if L_CI(t) occurs. Thus, we use this equivalent event in the following proof to make it clearer.
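Definitions A.9 and A.10 translate directly into the decision made at Line 8 of GCB. The following sketch (hypothetical helper names; the linear reward and the ceiling η(t) = t^{2/3} f_{|X|}(t)^{1/3} from Definition A.11 below are assumptions of this illustration) tests the event L_CI(t) together with the exploration ceiling:

import numpy as np

def gcb_decides_exploit(actions, nu_hat, n_sigma, t, alpha):
    """Mirror of Line 8: exploit iff L_CI(t) holds or n_sigma(t-1) > eta(t)."""
    k = len(actions)
    f = np.log(t) + 2 * np.log(k)                   # frequency function
    eta = t ** (2 / 3) * f ** (1 / 3)               # exploration ceiling eta(t)
    radius = np.sqrt(alpha * f / n_sigma)           # confidence interval
    r = np.array([x @ nu_hat for x in actions])     # toy linear reward r(x, nu_hat)
    best = int(np.argmax(r))
    gaps = np.delete(r[best] - r, best)             # estimated gaps to all other actions
    l_ci = bool(np.all(gaps > radius))              # event L_CI(t)
    return l_ci or (n_sigma > eta)

actions = [np.array(v, dtype=float) for v in ([1, 0], [0, 1], [1, 1])]
print(gcb_decides_exploit(actions, np.array([0.9, 0.1]), n_sigma=30, t=500, alpha=1.0))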

Definition A.11. For simplicity, suppose α = 8L²β_σ²a, with constants a > 0 and θ > 0. We can then define two threshold functions:

  η(t) = t^{2/3} f_{|X|}(t)^{1/3},
  µ(t) = (1 + θa)² α f_{|X|}(t)/Δ_min².

Note that η(t) and µ(t) are values, not random variables.

Proposition A.12. If t > T_0, where T_0 is the smallest integer such that T_0/f_{|X|}(T_0) ≥ ((1 + θa)²α/Δ_min²)^{3/2}, then µ(t) < η(t). (It can be verified by the definitions.)

Lemma A.13 (Exploration Ceiling). Let α = 8L²β_σ²a and a > 0. For any time t, if the exploration counter satisfies n_σ(t−1) > η(t), the algorithm will surely play exploitation, i.e.:

  Pr[E_Explore(t) | n_σ(t−1) > η(t)] = 0.

Proof. If n_σ(t−1) > η(t), then Line 8 of Algorithm GCB is true because of its second condition. According to the algorithm, it will not enter the exploration phase, so

  Pr[E_Explore(t) | n_σ(t−1) > η(t)] = 0,

which restricts n_σ(t−1) to be no larger than η(t) + 1 at any time t. ∎

Lemma A.14 (Low failure probability of the confidence interval). Let f_{|X|}(t) = ln t + 2 ln|X|, α = 8L²β_σ²a and 0 < a ≤ 1. For any time t, the probability of both choosing a bad action and the gap being larger than the confidence interval satisfies:

  Pr[L_CI(t) ∩ F_Bad(t)] ≤ 2e²/(|X|² N(t)).

Proof. The definition of F_Bad(t) = {x̂(t) ∈ X_Bad} implies x* ∈ X \ {x̂(t)}. Their gap is

  δ̂_{x̂(t),x*}(t) = r̄(x̂(t), ν̂(t−1)) − r̄(x*, ν̂(t−1))
  ≤ r̄(x̂(t), ν̂(t−1)) − r̄(x*, ν̂(t−1)) + r̄(x*, ν) − r̄(x̂(t), ν)    {definition of x*}
  ≤ |r̄(x̂(t), ν̂(t−1)) − r̄(x̂(t), ν)| + |r̄(x*, ν) − r̄(x*, ν̂(t−1))|
  ≤ 2L‖ν − ν̂(t−1)‖.

Thus, we can write the probability as:

  Pr[L_CI(t) ∩ F_Bad(t)]
  = Pr[{∀x ∈ X \ {x̂(t)}: δ̂_{x̂(t),x}(t) > √(α f_{|X|}(t)/n_σ(t−1))} ∩ F_Bad(t)]
  ≤ Pr[δ̂_{x̂(t),x*}(t) > √(α f_{|X|}(t)/n_σ(t−1))]
  ≤ Pr[‖ν − ν̂^{(n_σ(t−1))}‖ ≥ (1/(2L)) √(α f_{|X|}(t)/n_σ(t−1))]
  ≤ 2e²/(|X|² N(t)).    {Lemma A.8 with γ = 1/(2L) and the choice of a}
∎
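Proposition A.12's time T_0 can be found numerically once the two thresholds are implemented. This sketch (with arbitrary illustrative constants; µ(t) here includes the Δ_min² denominator used in our reconstruction) scans for the first t at which µ(t) < η(t):

import numpy as np

def eta(t, k):
    return t ** (2 / 3) * (np.log(t) + 2 * np.log(k)) ** (1 / 3)

def mu(t, k, alpha, theta, a, delta_min):
    return (1 + theta * a) ** 2 * alpha * (np.log(t) + 2 * np.log(k)) / delta_min ** 2

# Illustrative constants: |X| = 16 actions, alpha = 1, theta = 2, a = 1/2, gap 0.3.
k, alpha, theta, a, delta_min = 16, 1.0, 2.0, 0.5, 0.3
T0 = next(t for t in range(2, 10**8) if mu(t, k, alpha, theta, a, delta_min) < eta(t, k))
print("T_0 =", T0)  # beyond T_0 the exploitation threshold mu(t) stays below eta(t)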

A.1. Distribution-independent bound

Lemma A.15. For any ε > 0, j = 1, 2, …, and t ≥ 1, when the algorithm has played n_σ(t−1) = j rounds of exploration by time t, the probability of incurring a penalty of at least ε satisfies

  Pr[Δ_{x̂(t)} ≥ ε | n_σ(t−1) = j] ≤ 2e² exp{−jε²/(8L²β_σ²)}.

Proof. Δ_{x̂(t)} is the real gap of reward between x* and x̂(t):

  Δ_{x̂(t)} = δ_{x*,x̂(t)}
  ≤ δ_{x*,x̂(t)} + δ̂_{x̂(t),x*}(t)    {δ̂_{x̂(t),x*}(t) ≥ 0 by definition of x̂(t)}
  = δ_{x*,x̂(t)} − δ̂_{x*,x̂(t)}(t)
  ≤ |r̄(x*, ν) − r̄(x*, ν̂(t−1))| + |r̄(x̂(t), ν) − r̄(x̂(t), ν̂(t−1))|
  ≤ 2L‖ν − ν̂(t−1)‖ = 2L‖ν − ν̂^{(n_σ(t−1))}‖.

When n_σ(t−1) = j, we conclude that the probability of incurring a large penalty is:

  Pr[Δ_{x̂(t)} ≥ ε | n_σ(t−1) = j] ≤ Pr[2L‖ν − ν̂^{(j)}‖ ≥ ε] = Pr[‖ν − ν̂^{(j)}‖ ≥ ε/(2L)]
  ≤ 2e² exp{−jε²/(8L²β_σ²)}.    {Lemma A.3}
∎

In Algorithm GCB, exploitation is penalized with respect to the regret only if the algorithm chooses a bad action and exploits it simultaneously, i.e., F_Bad(t) and E_Exploit(t) both hold. When the algorithm chooses exploitation at time t, the regret at that time is E[Δ_{x̂(t)} · I{F_Bad(t) ∩ E_Exploit(t)}].

Lemma A.16 (Penalty of exploitation). For all ε > 0, Algorithm GCB with f_{|X|}(t) = ln t + 2 ln|X|, α = 8L²β_σ²a, 0 < a ≤ 1, and η(t) = t^{2/3} f_{|X|}(t)^{1/3} incurs an expected penalty in the exploitation phase at time t of at most:

  E[Δ_{x̂(t)} · I{F_Bad(t) ∩ E_Exploit(t)}] ≤ ε + Δ_max (2e²/(|X|² N(t)) + 2e² exp{−η(t)ε²/(8L²β_σ²)}).

Proof. For all ε > 0, the expectation satisfies:

  E[Δ_{x̂(t)} · I{F_Bad(t) ∩ E_Exploit(t)}]
  = E[Δ_{x̂(t)} | F_Bad(t) ∩ E_Exploit(t)] · Pr[F_Bad(t) ∩ E_Exploit(t)]
  ≤ ε · Pr[Δ_{x̂(t)} < ε | F_Bad(t) ∩ E_Exploit(t)] + Δ_max · Pr[Δ_{x̂(t)} ≥ ε | F_Bad(t) ∩ E_Exploit(t)]
  ≤ ε + Δ_max · Pr[{Δ_{x̂(t)} ≥ ε} ∩ F_Bad(t) ∩ E_Exploit(t)].

By definition, the exploitation event E_Exploit(t) = {L_CI(t) ∪ {n_σ(t−1) > η(t)}} happens when either no other action falls within the confidence interval or the counter satisfies n_σ(t−1) > η(t). We also know that n_σ(t−1) is no larger than η(t) + 1, because this is a hard constraint implied

by Lemma A.13. Therefore, the probability in the second term is the joint of these two events:

  Pr[{Δ_{x̂(t)} ≥ ε} ∩ F_Bad(t) ∩ E_Exploit(t)]
  = Pr[{Δ_{x̂(t)} ≥ ε} ∩ F_Bad(t) ∩ (L_CI(t) ∪ {n_σ(t−1) > η(t)})]
  ≤ Pr[{Δ_{x̂(t)} ≥ ε} ∩ F_Bad(t) ∩ L_CI(t) ∩ {n_σ(t−1) ≤ η(t)}] + Pr[{Δ_{x̂(t)} ≥ ε} ∩ F_Bad(t) ∩ {n_σ(t−1) > η(t)}]
  ≤ Pr[F_Bad(t) ∩ L_CI(t)] + Pr[Δ_{x̂(t)} ≥ ε | n_σ(t−1) = η(t) + 1]
  ≤ 2e²/(|X|² N(t)) + 2e² exp{−η(t)ε²/(8L²β_σ²)}.    {Lemmas A.14 and A.15}

Therefore, we have

  E[Δ_{x̂(t)} · I{F_Bad(t) ∩ E_Exploit(t)}] ≤ ε + Δ_max (2e²/(|X|² N(t)) + 2e² exp{−η(t)ε²/(8L²β_σ²)}). ∎

Theorem 4.1 (in the main text; distribution-independent bound). Let f_{|X|}(t) = ln t + 2 ln|X| and α = 4L²β_σ². The distribution-independent regret bound of Algorithm GCB is:

  R(T) ≤ R_max |σ| T^{2/3} (8(ln T + 2 ln|X|))^{1/3} + 6Lβ_σ T^{2/3} (ln T + 2 ln|X|)^{1/3} + R_max (|σ| + 4e²|X|^{−4}(ln T + 2)).

Proof. From the algorithm, we know that it plays actions either in the exploration phase or in the exploitation phase. One exploration phase takes |σ| time steps to finish, and its penalty is Σ_{x∈σ} Δ_x. The penalty of playing exploitation is Δ_{x̂(t)} at each such time step t. Hence

  R(T) ≤ Σ_{x∈σ} Δ_x · E[n_σ(T)] + Σ_{t=1}^{T} E[Δ_{x̂(t)} · I{F_Bad(t) ∩ E_Exploit(t)}].

From Lemma A.13, we can infer that once the exploration counter satisfies n_σ(t) > η(t) = t^{2/3} f_{|X|}(t)^{1/3}, the algorithm no longer plays exploration. Therefore, the expected number of rounds of exploration satisfies E[n_σ(T)] ≤ T^{2/3} f_{|X|}(T)^{1/3} + 1, so the regret from exploration is

  Σ_{x∈σ} Δ_x · E[n_σ(T)] ≤ Σ_{x∈σ} Δ_x · (T^{2/3} f_{|X|}(T)^{1/3} + 1).

Let ε(t) = 4Lβ_σ (f_{|X|}(t)/t)^{1/3}; then η(t)ε(t)²/(8L²β_σ²) = 2f_{|X|}(t). Therefore, we can apply Lemma A.16 to get the regret of the exploitation part:

  Σ_{t=1}^{T} E[Δ_{x̂(t)} · I{F_Bad(t) ∩ E_Exploit(t)}]
  ≤ Σ_{t=1}^{T} (ε(t) + Δ_max (2e²/(|X|² N(t)) + 2e² exp{−η(t)ε(t)²/(8L²β_σ²)}))
  = Σ_{t=1}^{T} (4Lβ_σ (f_{|X|}(t)/t)^{1/3} + Δ_max (2e²/(|X|² N(t)) + 2e² N(t)^{−2}))
  ≤ 6Lβ_σ T^{2/3} f_{|X|}(T)^{1/3} + 4e²|X|^{−4} Δ_max (ln T + 2).

Therefore, we will have

  R(T) ≤ Σ_{x∈σ} Δ_x (T^{2/3} f_{|X|}(T)^{1/3} + 1) + 6Lβ_σ T^{2/3} f_{|X|}(T)^{1/3} + 4e²|X|^{−4} Δ_max (ln T + 2).

Since Δ_x and Δ_max are bounded by R_max under any distribution, we conclude that:

  R(T) ≤ R_max |σ| T^{2/3} (8(ln T + 2 ln|X|))^{1/3} + 6Lβ_σ T^{2/3} (ln T + 2 ln|X|)^{1/3} + R_max (|σ| + 4e²|X|^{−4}(ln T + 2)). ∎

A.2. Distribution-dependent bound

Under a predetermined outcome distribution p, the minimum gap between the optimal action and any sub-optimal action is Δ_min. It follows that:

Lemma A.17 (Condition of choosing the optimal action). Suppose we have played j rounds of exploration by time t. If, for some b ≥ 1, j ≥ b · 8L²β_σ² f_{|X|}(t)/Δ_min², then Algorithm GCB chooses the optimal action with high probability:

  ∀j ≥ b · 8L²β_σ² f_{|X|}(t)/Δ_min²:  Pr[F_Bad^{(j)}] ≤ 2e²/(t · N(t)^{b−1}).

Proof. According to the definition, F_Bad^{(j)} occurs only if some sub-optimal action has the largest estimated reward:

  Pr[F_Bad^{(j)}]
  ≤ Pr[∃x_b ∈ X_Bad, ∃x_g ∈ X_Good: r̄(x_g, ν̂^{(j)}) ≤ r̄(x_b, ν̂^{(j)})]
  ≤ Σ_{x_b ∈ X_Bad} Σ_{x_g ∈ X_Good} Pr[r̄(x_g, ν̂^{(j)}) − r̄(x_b, ν̂^{(j)}) ≤ 0]    {union bound}
  ≤ Σ_{x_b ∈ X_Bad} Σ_{x_g ∈ X_Good} (Pr[r̄(x_g, ν) − r̄(x_g, ν̂^{(j)}) ≥ Δ_min/2] + Pr[r̄(x_b, ν̂^{(j)}) − r̄(x_b, ν) ≥ Δ_min/2])
  ≤ Σ_{x_b ∈ X_Bad} Σ_{x_g ∈ X_Good} 2 Pr[2L‖ν − ν̂^{(j)}‖ ≥ Δ_min],

since the real gap between x_g and x_b is at least Δ_min, so for the estimated order to flip, at least one of the two estimation errors must be Δ_min/2 or larger. Thus, by Lemma A.3,

  Pr[F_Bad^{(j)}] ≤ Σ_{x_b ∈ X_Bad, x_g ∈ X_Good} 4e² exp{−jΔ_min²/(8L²β_σ²)}
  = 4e² |X_Bad| |X_Good| exp{−jΔ_min²/(8L²β_σ²)}
  ≤ 2e² |X|² exp{−jΔ_min²/(8L²β_σ²)}.    {|X_Bad| + |X_Good| = |X|, so |X_Bad||X_Good| ≤ |X|²/2}

Therefore, if j ≥ b · 8L²β_σ² f_{|X|}(t)/Δ_min² with b ≥ 1, we conclude:

  Pr[F_Bad^{(j)}] ≤ 2e² |X|² N(t)^{−b} = 2e²/(t · N(t)^{b−1}). ∎
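The condition of Lemma A.17 is easy to tabulate. The sketch below (illustrative constants only) computes, for several values of b, the number of exploration rounds required at a given time t and the corresponding failure probability bound 2e²/(t·N(t)^{b−1}):

import numpy as np

def required_rounds(b, t, k, L, beta, delta_min):
    # condition of Lemma A.17: j >= b * 8 L^2 beta_sigma^2 f_|X|(t) / Delta_min^2
    f = np.log(t) + 2 * np.log(k)
    return int(np.ceil(b * 8 * L**2 * beta**2 * f / delta_min**2))

def failure_bound(b, t, k):
    # Pr[F_Bad^(j)] <= 2 e^2 / (t * N(t)^(b-1)),  with N(t) = |X|^2 t
    N = k**2 * t
    return 2 * np.e**2 / (t * N ** (b - 1))

t, k = 1000, 16
for b in (1, 2, 3):
    print(b, required_rounds(b, t, k, L=1.0, beta=1.0, delta_min=0.2), failure_bound(b, t, k))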

Lemma A.18 (Exploit the Optimal Action). Let α = 8L²β_σ²a, 0 < a ≤ 1 and θ = 2. For any time t > T_0, the probability that F_Bad(t) holds while Algorithm GCB plays exploitation is:

  Pr[E_Exploit(t) ∩ F_Bad(t)] ≤ 4e²/N(t).

Proof. If t > T_0, since E_Exploit(t) = {L_CI(t) ∪ {n_σ(t−1) > η(t)}}, we can write the probability of exploitation as:

  Pr[E_Exploit(t) ∩ F_Bad(t)]
  = Pr[E_Exploit(t) ∩ F_Bad(t) ∩ {n_σ(t−1) > η(t)}] + Pr[E_Exploit(t) ∩ F_Bad(t) ∩ {n_σ(t−1) ≤ η(t)}]
  ≤ Pr[F_Bad(t) ∩ {n_σ(t−1) > η(t)}] + Pr[L_CI(t) ∩ F_Bad(t)]
  ≤ Pr[F_Bad(t) ∩ {n_σ(t−1) > η(t)}] + 2e²/(|X|² N(t)).    {Lemma A.14}

Since n_σ(t−1) > η(t) > µ(t) = (1 + θa)²a · 8L²β_σ² f_{|X|}(t)/Δ_min² ≥ 2 · 8L²β_σ² f_{|X|}(t)/Δ_min² by the choice of a and θ, Lemma A.17 applies with b = 2, and the following inequality holds:

  Pr[F_Bad(t) ∩ {n_σ(t−1) > η(t)}]
  ≤ Σ_{j=η(t)}^{t−1} Pr[F_Bad(t) ∩ {n_σ(t−1) = j}]
  = Σ_{j=η(t)}^{t−1} Pr[F_Bad^{(n_σ(t−1))} ∩ {n_σ(t−1) = j}]
  ≤ Σ_{j=η(t)}^{t−1} Pr[F_Bad^{(j)}]
  ≤ Σ_{j=η(t)}^{t−1} 2e²/(t N(t))    {Lemma A.17 with b = 2}
  ≤ 2e²/N(t).

Therefore, we can get:

  Pr[E_Exploit(t) ∩ F_Bad(t)] ≤ 2e²/N(t) + 2e²/(|X|² N(t)) ≤ 4e²/N(t). ∎

Lemma A.19 (The exploration probability will drop). Suppose the instance has a unique optimal action under distribution p, i.e., |X_Good| = 1. Let α = 8L²β_σ²a, 0 < a ≤ 1 and θ = 2. For any time t > T_0, when n_σ(t−1) ≥ µ(t) = (1 + θa)²α f_{|X|}(t)/Δ_min², the probability of F_Good(t) and exploration happening simultaneously is:

  Pr[E_Explore(t) ∩ F_Good(t) | n_σ(t−1) ≥ µ(t)] ≤ 2e²/(|X|² N(t)).

Proof. By definition, the event that exploration happens at time t is E_Explore(t) = {L^c_CI(t) ∩ {n_σ(t−1) ≤ η(t)}}. When t > T_0, it holds that η(t) > µ(t).

On one hand, if n_σ(t−1) > η(t), then by Lemma A.13 we know that

  Pr[E_Explore(t) ∩ {n_σ(t−1) > η(t)}] = Pr[E_Explore(t) | n_σ(t−1) > η(t)] · Pr[n_σ(t−1) > η(t)] = 0.

On the other hand, for µ(t) ≤ n_σ(t−1) ≤ η(t), whether to play exploration depends only on the event L^c_CI(t). Given F_Good(t) = {x̂(t) ∈ X_Good} and the assumption that |X_Good| = 1, we know that x̂(t) = x* and X_Bad = X \ {x̂(t)}. So the gap at time t satisfies, for all x ∈ X \ {x̂(t)}:

  δ̂_{x̂(t),x}(t) = δ̂_{x*,x}(t) ≥ δ_{x*,x} − |δ̂_{x*,x}(t) − δ_{x*,x}|
  ≥ Δ_min − |δ̂_{x*,x}(t) − δ_{x*,x}|    {Δ_min is the minimum gap}
  ≥ Δ_min − 2L‖ν − ν̂(t−1)‖.    {Lemma A.7}

We also know that if n_σ(t−1) ≥ µ(t) = (1 + θa)²α f_{|X|}(t)/Δ_min², then

  √(α f_{|X|}(t)/n_σ(t−1)) ≤ Δ_min/(1 + θa),

thus we can get

  Pr[E_Explore(t) ∩ F_Good(t) | µ(t) ≤ n_σ(t−1) ≤ η(t)]
  ≤ Pr[{∃x ∈ X \ {x̂(t)}: δ̂_{x̂(t),x}(t) ≤ √(α f_{|X|}(t)/n_σ(t−1))} ∩ F_Good(t) | µ(t) ≤ n_σ(t−1) ≤ η(t)]
  ≤ Pr[Δ_min − 2L‖ν − ν̂(t−1)‖ ≤ Δ_min/(1 + θa) | µ(t) ≤ n_σ(t−1) ≤ η(t)]
  = Pr[2L‖ν − ν̂^{(n_σ(t−1))}‖ ≥ θaΔ_min/(1 + θa) | µ(t) ≤ n_σ(t−1) ≤ η(t)]
  ≤ Σ_{j=µ(t)}^{η(t)} Pr[2L‖ν − ν̂^{(j)}‖ ≥ θaΔ_min/(1 + θa) | n_σ(t−1) = j]
  ≤ Σ_{j=µ(t)}^{η(t)} Pr[2L‖ν − ν̂^{(j)}‖ ≥ θaΔ_min/(1 + θa)].

Let α = 8L²β_σ²a and 0 < a ≤ 1. For j = µ(t), …, η(t), with µ(t) = (1 + θa)²α f_{|X|}(t)/Δ_min², recall Lemma A.3; then we have:

  Pr[‖ν − ν̂^{(j)}‖ ≥ (1/(2L)) · θaΔ_min/(1 + θa)]
  ≤ 2e² exp{−(θa)²Δ_min² j/((1 + θa)² · 8L²β_σ²)}    {Lemma A.3}
  ≤ 2e² exp{−2f_{|X|}(t)}    {j ≥ µ(t), by the choice of θ and a}
  = 2e² N(t)^{−2}.

Therefore, we have:

  Pr[E_Explore(t) ∩ F_Good(t) | n_σ(t−1) ≥ µ(t)]
  ≤ Pr[E_Explore(t) ∩ {n_σ(t−1) > η(t)}] + Pr[E_Explore(t) ∩ F_Good(t) | µ(t) ≤ n_σ(t−1) ≤ η(t)]
  ≤ 0 + Σ_{j=µ(t)}^{η(t)} 2e² N(t)^{−2}
  ≤ 2e² t N(t)^{−2} = (2e²/|X|²) N(t)^{−1} = 2e²/(|X|² N(t)). ∎

When the instance has a unique optimal action x* under distribution p, the following lemma ensures that exploration will not continue endlessly, so the algorithm gradually switches to exploitation. For simplicity, we consider the case in which the number of exploration rounds has already reached µ(T) by some given time i ≤ T.

Lemma A.20 (Switch to exploitation gradually). Suppose the instance has a unique optimal action x* under distribution p. Given time T, suppose that for some time i ≤ T the number of exploration rounds already satisfies n_σ(i) ≥ µ(T), where µ(T) = (1 + θa)²α f_{|X|}(T)/Δ_min², 0 < a ≤ 1, θ = 2. Then for all t with max{i + 1, T_0} ≤ t ≤ T, the probability of playing exploration is:

  Pr[E_Explore(t) | n_σ(i) ≥ µ(T)] ≤ 4e²/N(t).

Proof. As n_σ is non-decreasing and n_σ(i) ≥ µ(T), we know that

  n_σ(t−1) ≥ n_σ(i) ≥ µ(T),

which implies that the event {n_σ(i) ≥ µ(T)} is a subset of the event {n_σ(t−1) ≥ µ(T)}. Hence

  Pr[E_Explore(t) | n_σ(i) ≥ µ(T)]
  ≤ Pr[E_Explore(t) | n_σ(t−1) ≥ µ(T)]
  ≤ Pr[E_Explore(t) ∩ F_Good(t) | n_σ(t−1) ≥ µ(T)] + Pr[E_Explore(t) ∩ F_Bad(t) | n_σ(t−1) ≥ µ(T)]
  ≤ Pr[E_Explore(t) ∩ F_Good(t) | n_σ(t−1) ≥ µ(T)] + Pr[F_Bad(t) | n_σ(t−1) ≥ µ(T)].

From Lemma A.19, since µ(T) ≥ µ(t) for t ≤ T, the first part satisfies

  Pr[E_Explore(t) ∩ F_Good(t) | n_σ(t−1) ≥ µ(T)] ≤ Pr[E_Explore(t) ∩ F_Good(t) | n_σ(t−1) ≥ µ(t)] ≤ 2e²/(|X|² N(t)).

For the second part, as 0 < a ≤ 1 and θ = 2, we can get

  µ(T) = (1 + θa)²α f_{|X|}(T)/Δ_min² = (1 + θa)²a · 8L²β_σ² f_{|X|}(T)/Δ_min² ≥ 2 · 8L²β_σ² f_{|X|}(t)/Δ_min².

Thus, by using Lemma A.17 with b = 2, it follows that

  Pr[F_Bad(t) | n_σ(t−1) ≥ µ(T)]
  ≤ Σ_{j=µ(T)}^{t−1} Pr[F_Bad(t) ∩ {n_σ(t−1) = j}]
  = Σ_{j=µ(T)}^{t−1} Pr[F_Bad^{(n_σ(t−1))} ∩ {n_σ(t−1) = j}]
  ≤ Σ_{j=µ(T)}^{t−1} Pr[F_Bad^{(j)}]
  ≤ Σ_{j=µ(T)}^{t−1} 2e²/(t N(t))    {Lemma A.17 with b = 2}
  ≤ 2e²/N(t).

Therefore, we can get

  Pr[E_Explore(t) | n_σ(i) ≥ µ(T)] ≤ 2e²/(|X|² N(t)) + 2e²/N(t) ≤ 4e²/N(t). ∎

For the counter n_σ, the following definition characterizes the first time its value reaches k.

Definition A.21. Given k, for any t, we define the event that n_σ(t) = k and n_σ(t−1) = k − 1 as G_k(t), i.e., G_k(t) = {n_σ(t) = k ∩ n_σ(t−1) = k − 1}.

Lemma A.22 (Exploration Numbers). Let µ(T) = (1 + θa)²α f_{|X|}(T)/Δ_min², 0 < a ≤ 1 and θ = 2. If under distribution p there is a unique optimal action, i.e., |X_Good| = 1, then the expected number of exploration rounds by time T (T_0 ≤ T) is:

  E[n_σ(T)] ≤ 2µ(T) + 1 + 4e²|X|^{−2} ln(T + 1) + Σ_{t=1}^{T_0} Pr[E_Explore(t)].

Proof. Note that it takes |σ| time steps to play one round of exploration and then increase n_σ by 1. Let E_FinExplore(t) be the event that the algorithm finishes one round of exploration and updates n_σ at time t. Then E_FinExplore(t) = E_Explore(t − |σ| + 1), and for t = 1, 2, …, |σ| − 1 we have Pr[E_FinExplore(t)] = 0, meaning that the event never happens for t < |σ|. By definition, we can get:

  E[n_σ(T)] = Σ_{t=1}^{T} Pr[E_FinExplore(t)] = Σ_{t=|σ|}^{T} Pr[E_FinExplore(t)] ≤ Σ_{t=1}^{T} Pr[E_Explore(t)].

Because the accumulation of exploration rounds is n_σ(T), its expected number can be written as:

  E[n_σ(T)] ≤ Σ_{t=1}^{T} Pr[E_Explore(t) ∩ {n_σ(T) < µ(T)}] + Σ_{t=1}^{T} Pr[E_Explore(t) ∩ {n_σ(T) ≥ µ(T)}].

The following inequality ensures that the first part is not large:

  Σ_{t=1}^{T} Pr[E_Explore(t) ∩ {n_σ(T) < µ(T)}]
  = Pr[n_σ(T) < µ(T)] · Σ_{t=1}^{T} Pr[E_Explore(t) | n_σ(T) < µ(T)]
  ≤ Pr[n_σ(T) < µ(T)] · Σ_{t=1}^{T} Pr[E_FinExplore(t) | n_σ(T) < µ(T)]
  = Pr[n_σ(T) < µ(T)] · E[n_σ(T) | n_σ(T) < µ(T)]
  ≤ Pr[n_σ(T) < µ(T)] · µ(T) ≤ µ(T).

We know the counter n_σ can only increase by 1 at a time. For this reason, if the value of n_σ(T) reaches µ(T) by time T, this event must happen at some time i ∈ {µ(T), …, T}. Thus, the occurrence of {n_σ(T) ≥ µ(T)} is equivalent to the union of the events G_{µ(T)}(i), and by definition the events G_{µ(T)}(i) for different i are mutually exclusive. Therefore, {n_σ(T) ≥ µ(T)} = ∪_{i=1}^{T} G_{µ(T)}(i), and the second part is:

  Σ_{t=1}^{T} Pr[E_Explore(t) ∩ {n_σ(T) ≥ µ(T)}]
  = Σ_{t=1}^{T} Pr[E_Explore(t) ∩ ∪_{i=1}^{T} G_{µ(T)}(i)]
  ≤ Σ_{t=1}^{T} Σ_{i=1}^{T} Pr[E_Explore(t) ∩ G_{µ(T)}(i)]    {union bound}
  = Σ_{i=1}^{T} Σ_{t=1}^{i} Pr[E_Explore(t) ∩ G_{µ(T)}(i)] + Σ_{i=1}^{T} Σ_{t=i+1}^{T} Pr[E_Explore(t) ∩ G_{µ(T)}(i)].

Now we will prove that the first term is in O(µ(T)):

  Σ_{i=1}^{T} Σ_{t=1}^{i} Pr[E_Explore(t) ∩ G_{µ(T)}(i)]
  = Σ_{i=1}^{T} Pr[G_{µ(T)}(i)] · Σ_{t=1}^{i} Pr[E_Explore(t) | G_{µ(T)}(i)]
  ≤ Σ_{i=1}^{T} Pr[G_{µ(T)}(i)] · Σ_{t=1}^{i+|σ|−1} Pr[E_FinExplore(t) | G_{µ(T)}(i)]
  = Σ_{i=1}^{T} Pr[G_{µ(T)}(i)] · E[n_σ(i + |σ| − 1) | G_{µ(T)}(i)]
  ≤ Σ_{i=1}^{T} Pr[G_{µ(T)}(i)] · (µ(T) + 1)    {n_σ(i + |σ| − 1) ≤ n_σ(i) + 1 = µ(T) + 1}
  = Pr[n_σ(T) ≥ µ(T)] · (µ(T) + 1) ≤ µ(T) + 1.    {mutually exclusive}

Since G_{µ(T)}(i) = {n_σ(i) = µ(T) ∩ n_σ(i−1) = µ(T) − 1}, we can write the second term as:

  Σ_{i=1}^{T} Σ_{t=i+1}^{T} Pr[E_Explore(t) ∩ G_{µ(T)}(i)]
  ≤ Σ_{t=1}^{T_0} Pr[E_Explore(t)] + Σ_{i=1}^{T} Pr[G_{µ(T)}(i)] · Σ_{t=max{i+1, T_0+1}}^{T} Pr[E_Explore(t) | G_{µ(T)}(i)]
  ≤ Σ_{t=1}^{T_0} Pr[E_Explore(t)] + Σ_{i=1}^{T} Pr[G_{µ(T)}(i)] · Σ_{t=max{i+1, T_0+1}}^{T} 4e²/N(t)    {Lemma A.20}
  ≤ Σ_{t=1}^{T_0} Pr[E_Explore(t)] + Σ_{t=1}^{T} 4e²/(|X|² t)    {mutually exclusive}
  ≤ Σ_{t=1}^{T_0} Pr[E_Explore(t)] + 4e²|X|^{−2} ln(T + 1).

Therefore, we can get

  E[n_σ(T)] ≤ 2µ(T) + 1 + 4e²|X|^{−2} ln(T + 1) + Σ_{t=1}^{T_0} Pr[E_Explore(t)]. ∎
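Numerically, the bound of Lemma A.22 grows only logarithmically in T. The sketch below (illustrative constants; the residual term Σ_{t≤T_0} Pr[E_Explore(t)] is replaced by its trivial upper bound T_0) evaluates the bound for a few horizons:

import numpy as np

def exploration_bound(T, k, L, beta, delta_min, theta=2.0, a=0.5, T0=100):
    f = np.log(T) + 2 * np.log(k)
    alpha = 8 * L**2 * beta**2 * a
    mu_T = (1 + theta * a) ** 2 * alpha * f / delta_min**2
    # E[n_sigma(T)] <= 2 mu(T) + 1 + 4 e^2 |X|^-2 ln(T+1) + sum_{t<=T_0} Pr[E_Explore(t)]
    return 2 * mu_T + 1 + 4 * np.e**2 * k**-2 * np.log(T + 1) + T0

for T in (10**3, 10**4, 10**5, 10**6):
    print(T, round(exploration_bound(T, k=16, L=1.0, beta=1.0, delta_min=0.2), 1))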

Theorem 4.2 (in the main text; distribution-dependent bound). For Algorithm GCB, let f_{|X|}(t) = ln t + 2 ln|X| and α = 4L²β_σ². If the instance has a unique optimal action under outcome distribution p with mean outcome vector ν, the distribution-dependent regret bound of Algorithm GCB is:

  R(T) ≤ (96L²β_σ²/Δ_min²) (Σ_{x∈σ} Δ_x)(ln T + 2 ln|X|) + (4e²|X|^{−2} ln T + 1)(Σ_{x∈σ} Δ_x + Δ_max) + Δ_max T_0,

where Σ_{x∈σ} Δ_x, Δ_max and Δ_min are problem-specific constants under the distribution p, and T_0 is the constant of Proposition A.12.

Proof. If we penalize each time step at which the algorithm plays a sub-optimal action by Δ_max, then the regret function is composed of exploration and exploitation:

  R(T) ≤ Σ_{x∈σ} Δ_x · E[n_σ(T)] + Δ_max Σ_{t=1}^{T} E[I{E_Exploit(t) ∩ F_Bad(t)}]
  = Σ_{x∈σ} Δ_x · E[n_σ(T)] + Δ_max Σ_{t=1}^{T} Pr[E_Exploit(t) ∩ F_Bad(t)].

Supposing the optimal action is unique, i.e., |X_Good| = 1, from Lemma A.22 the expected number of exploration rounds satisfies:

  E[n_σ(T)] ≤ 2(1 + θa)²α f_{|X|}(T)/Δ_min² + 1 + 4e²|X|^{−2} ln(T + 1) + Σ_{t=1}^{T_0} Pr[E_Explore(t)].

The regret of the exploitation phase can be inferred from Lemma A.18:

  Δ_max Σ_{t=1}^{T} Pr[F_Bad(t) ∩ E_Exploit(t)]
  ≤ Δ_max (Σ_{t=1}^{T_0} Pr[E_Exploit(t)] + Σ_{t=T_0+1}^{T} 4e²/N(t))    {Lemma A.18}
  ≤ Δ_max (Σ_{t=1}^{T_0} Pr[E_Exploit(t)] + 4e²|X|^{−2} ln T).

Since for t = 1, 2, …, T_0 the algorithm performs either exploration or exploitation, and each step contributes at most Δ_max to the regret, we have:

  Σ_{x∈σ} Δ_x Σ_{t=1}^{T_0} Pr[E_Explore(t)] + Δ_max Σ_{t=1}^{T_0} Pr[E_Exploit(t)] ≤ Δ_max T_0.

Thus, for f_{|X|}(t) = ln t + 2 ln|X|, α = 8L²β_σ²a, 0 < a ≤ 1 and θ = 2,

  R(T) ≤ 2(1 + θa)²α (Σ_{x∈σ} Δ_x)(ln T + 2 ln|X|)/Δ_min² + (4e²|X|^{−2} ln T + 1)(Σ_{x∈σ} Δ_x + Δ_max) + Δ_max T_0,

where T_0 is given by Proposition A.12. Letting a = 1/2, θ = 2, and hence α = 4L²β_σ², the first coefficient is bounded by 96L²β_σ², and we conclude:

  R(T) ≤ (96L²β_σ²/Δ_min²) (Σ_{x∈σ} Δ_x)(ln T + 2 ln|X|) + (4e²|X|^{−2} ln T + 1)(Σ_{x∈σ} Δ_x + Δ_max) + Δ_max T_0. ∎
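To see the two regimes side by side, the following sketch (again with illustrative constants, and using the bounds exactly as reconstructed above, so the specific coefficients are assumptions) evaluates the distribution-independent bound of Theorem 4.1 against the distribution-dependent bound of Theorem 4.2; the former grows as T^{2/3} while the latter grows only as log T:

import numpy as np

def bound_indep(T, k, sigma_size, L, beta, R_max):
    f = np.log(T) + 2 * np.log(k)
    return (R_max * sigma_size * T ** (2 / 3) * (8 * f) ** (1 / 3)
            + 6 * L * beta * T ** (2 / 3) * f ** (1 / 3)
            + R_max * (sigma_size + 4 * np.e**2 * k**-4 * (np.log(T) + 2)))

def bound_dep(T, k, L, beta, sum_gaps, gap_max, delta_min, T0=100):
    f = np.log(T) + 2 * np.log(k)
    return (96 * L**2 * beta**2 * sum_gaps * f / delta_min**2
            + (4 * np.e**2 * k**-2 * np.log(T) + 1) * (sum_gaps + gap_max)
            + gap_max * T0)

for T in (10**3, 10**4, 10**5, 10**6):
    print(T,
          round(bound_indep(T, k=16, sigma_size=4, L=1.0, beta=1.0, R_max=1.0)),
          round(bound_dep(T, k=16, L=1.0, beta=1.0, sum_gaps=2.0, gap_max=1.0, delta_min=0.2)))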

B. An Example of M_σ and Global Observer Set Construction for 1 < s < N in the Crowdsourcing Application

In this section, we provide an example of constructing the stacked matrix M_σ and the global observer set in the crowdsourcing application when we require 1 < s < N, where s is the number of matched worker-task pairs used for reporting the feedback. Recall that the feedback for a matching is the simple summation of the performance of these s matched worker-task pairs. This implies that for each matching x, the transformation matrix M_x contains a single row with exactly s ones and all other entries 0, so that the single feedback entry sums the outcomes of s matched pairs.

As an illustration, consider the case in which both N and M are divisible by s + 1. Then we can construct a full-rank square matrix M_σ such that, after rearranging the columns of M_σ, it is a block-diagonal matrix in which each block B is an (s+1)-by-(s+1) square matrix with 0 in the diagonal entries and 1 in the off-diagonal entries. The following is an illustration for the case s + 1 = N = M = 3, where the rearranged M_σ consists of three diagonal copies of the block

  B = ( 0 1 1
        1 0 1
        1 1 0 ),

i.e., M_σ = diag(B, B, B) is a 9-by-9 matrix. It is clear that this M_σ has full column rank. To recover the NM actions (matchings) corresponding to the NM rows, we map each block B to a matching that matches s + 1 workers to s + 1 tasks, such that these matchings share no common edges. This can be done in the following way. We partition the N workers into N/(s+1) groups of size s + 1 each, and partition the M tasks into M/(s+1) groups of size s + 1 each. Taking any group W of s + 1 workers and any group U of s + 1 tasks, we can find s + 1 non-overlapping matchings between W and U by rotation: in the j-th matching, the i-th worker is matched with the ((i + j) mod (s+1))-th task. Since we have NM/(s+1)² worker-task group pairs, and each group pair generates s + 1 non-overlapping matchings, in total we have NM/(s+1) non-overlapping matchings, and we map these matchings to the NM/(s+1) blocks in the rearranged matrix M_σ.

The above construction implies that we can find NM actions to form a global observer set, in which each action is a matching of s + 1 workers to s + 1 tasks, and each matching returns the aggregate performance feedback of s worker-task pairs in the matching. Thus the assumption on the existence of the global observer set holds, and the set can be constructed easily. The error bound β_σ for the above constructed M_σ is more complicated to analyze, but based on our empirical evaluation using Matlab, we believe that it is also a low-degree polynomial in N and M.
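The construction above is mechanical, so the following sketch (hypothetical function name; indices are 0-based) builds the rearranged block-diagonal M_σ together with the NM/(s+1) rotation matchings and verifies that M_σ has full rank:

import numpy as np

def build_global_observer_set(N, M, s):
    """Assumes (s+1) divides both N and M, as in the construction above."""
    k = s + 1
    B = np.ones((k, k)) - np.eye(k)        # block: 0 on the diagonal, 1 off-diagonal
    num_blocks = N * M // k                # NM/(s+1) blocks, NM rows in total
    M_sigma = np.kron(np.eye(num_blocks), B)
    matchings = []
    for wg in range(N // k):               # worker groups of size s+1
        for ug in range(M // k):           # task groups of size s+1
            for j in range(k):             # the j-th rotation matching of this group pair
                matchings.append([(wg * k + i, ug * k + (i + j) % k) for i in range(k)])
    return M_sigma, matchings

M_sigma, matchings = build_global_observer_set(N=6, M=6, s=2)
print(M_sigma.shape, len(matchings))                       # (36, 36) and NM/(s+1) = 12 matchings
print(np.linalg.matrix_rank(M_sigma) == M_sigma.shape[0])  # full rank, as claimed

Under this reading, the s + 1 rows inside one block correspond to the same matching reporting a different subset of s of its s + 1 matched pairs (the 0 on the diagonal marks the excluded pair), which is how one matching yields s + 1 rows of M_σ.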

References

Hayes, Thomas P. A large-deviation inequality for vector-valued martingales. Combinatorics, Probability and Computing, 2005.

More information

IEOR E4004: Introduction to OR: Deterministic Models

IEOR E4004: Introduction to OR: Deterministic Models IEOR E4004: Introduction to OR: Deterministic Models 1 Dynamic Programming Following is a summary of the problems we discussed in class. (We do not include the discussion on the container problem or the

More information

Theoretical Statistics. Lecture 4. Peter Bartlett

Theoretical Statistics. Lecture 4. Peter Bartlett 1. Concentration inequalities. Theoretical Statistics. Lecture 4. Peter Bartlett 1 Outline of today s lecture We have been looking at deviation inequalities, i.e., bounds on tail probabilities likep(x

More information

monotone circuit value

monotone circuit value monotone circuit value A monotone boolean circuit s output cannot change from true to false when one input changes from false to true. Monotone boolean circuits are hence less expressive than general circuits.

More information

Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017

Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017 Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017 The time limit for this exam is four hours. The exam has four sections. Each section includes two questions.

More information

No-arbitrage theorem for multi-factor uncertain stock model with floating interest rate

No-arbitrage theorem for multi-factor uncertain stock model with floating interest rate Fuzzy Optim Decis Making 217 16:221 234 DOI 117/s17-16-9246-8 No-arbitrage theorem for multi-factor uncertain stock model with floating interest rate Xiaoyu Ji 1 Hua Ke 2 Published online: 17 May 216 Springer

More information

Central Limit Theorem for the Realized Volatility based on Tick Time Sampling. Masaaki Fukasawa. University of Tokyo

Central Limit Theorem for the Realized Volatility based on Tick Time Sampling. Masaaki Fukasawa. University of Tokyo Central Limit Theorem for the Realized Volatility based on Tick Time Sampling Masaaki Fukasawa University of Tokyo 1 An outline of this talk is as follows. What is the Realized Volatility (RV)? Known facts

More information

Single Price Mechanisms for Revenue Maximization in Unlimited Supply Combinatorial Auctions

Single Price Mechanisms for Revenue Maximization in Unlimited Supply Combinatorial Auctions Single Price Mechanisms for Revenue Maximization in Unlimited Supply Combinatorial Auctions Maria-Florina Balcan Avrim Blum Yishay Mansour February 2007 CMU-CS-07-111 School of Computer Science Carnegie

More information

Write legibly. Unreadable answers are worthless.

Write legibly. Unreadable answers are worthless. MMF 2021 Final Exam 1 December 2016. This is a closed-book exam: no books, no notes, no calculators, no phones, no tablets, no computers (of any kind) allowed. Do NOT turn this page over until you are

More information

Techniques for Calculating the Efficient Frontier

Techniques for Calculating the Efficient Frontier Techniques for Calculating the Efficient Frontier Weerachart Kilenthong RIPED, UTCC c Kilenthong 2017 Tee (Riped) Introduction 1 / 43 Two Fund Theorem The Two-Fund Theorem states that we can reach any

More information

Risk Neutral Measures

Risk Neutral Measures CHPTER 4 Risk Neutral Measures Our aim in this section is to show how risk neutral measures can be used to price derivative securities. The key advantage is that under a risk neutral measure the discounted

More information

Homework Assignments

Homework Assignments Homework Assignments Week 1 (p. 57) #4.1, 4., 4.3 Week (pp 58 6) #4.5, 4.6, 4.8(a), 4.13, 4.0, 4.6(b), 4.8, 4.31, 4.34 Week 3 (pp 15 19) #1.9, 1.1, 1.13, 1.15, 1.18 (pp 9 31) #.,.6,.9 Week 4 (pp 36 37)

More information

The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay. Solutions to Final Exam

The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay. Solutions to Final Exam The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay Solutions to Final Exam Problem A: (40 points) Answer briefly the following questions. 1. Consider

More information

Representing Risk Preferences in Expected Utility Based Decision Models

Representing Risk Preferences in Expected Utility Based Decision Models Representing Risk Preferences in Expected Utility Based Decision Models Jack Meyer Department of Economics Michigan State University East Lansing, MI 48824 jmeyer@msu.edu SCC-76: Economics and Management

More information

Value of Flexibility in Managing R&D Projects Revisited

Value of Flexibility in Managing R&D Projects Revisited Value of Flexibility in Managing R&D Projects Revisited Leonardo P. Santiago & Pirooz Vakili November 2004 Abstract In this paper we consider the question of whether an increase in uncertainty increases

More information

Comprehensive Exam. August 19, 2013

Comprehensive Exam. August 19, 2013 Comprehensive Exam August 19, 2013 You have a total of 180 minutes to complete the exam. If a question seems ambiguous, state why, sharpen it up and answer the sharpened-up question. Good luck! 1 1 Menu

More information

You Have an NP-Complete Problem (for Your Thesis)

You Have an NP-Complete Problem (for Your Thesis) You Have an NP-Complete Problem (for Your Thesis) From Propositions 27 (p. 242) and Proposition 30 (p. 245), it is the least likely to be in P. Your options are: Approximations. Special cases. Average

More information

Monte-Carlo Planning: Introduction and Bandit Basics. Alan Fern

Monte-Carlo Planning: Introduction and Bandit Basics. Alan Fern Monte-Carlo Planning: Introduction and Bandit Basics Alan Fern 1 Large Worlds We have considered basic model-based planning algorithms Model-based planning: assumes MDP model is available Methods we learned

More information

An Axiomatic Approach to Arbitration and Its Application in Bargaining Games

An Axiomatic Approach to Arbitration and Its Application in Bargaining Games An Axiomatic Approach to Arbitration and Its Application in Bargaining Games Kang Rong School of Economics, Shanghai University of Finance and Economics Aug 30, 2012 Abstract We define an arbitration problem

More information