Supplementary Material for Combinatorial Partial Monitoring Game with Linear Feedback and Its Application. A. Full proof for Theorems 4.1 and 4.2
Recall the following problem-specific constants from the main text: the size |σ| of the global observer set σ; the parameter L > 0 from the Lipschitz continuity assumption; the error bound β_σ = max ‖(M_σ^⊤ M_σ)^{-1} Σ_{i=1}^{|σ|} M_{x_i}^⊤ M_{x_i}(ν_i − ν_0)‖, where the maximum is taken over ν_0, ν_1, …, ν_{|σ|} ∈ [0,1]^n; and the maximum difference in expected reward R_max = max_{x_1, x_2 ∈ X, ν ∈ [0,1]^n} (r(x_1, ν) − r(x_2, ν)). For technical reasons, we also defined φ(ν) = max(min(ν, 1), 0) (entrywise) to adjust ν to the nearest vector in [0,1]^n, and set r(x, ν) := r(x, φ(ν)) for ν ∈ R^n \ [0,1]^n to preserve the Lipschitz continuity throughout R^n.

To make our proof clearer, we define v(t) as the state of any variable v by the end of time step t. Our analysis is based on the snapshot of all variables just before the statement t ← t + 1 in Algorithm GCB. One batch of plays in the exploration phase is called one round, after which n_σ is increased by 1. Denote by ν̂^{(j)} the estimated mean of outcomes after j rounds of exploration. For example, at time t, the estimated mean of outcomes is ν̂(t) and the exploration counter is n_σ(t), so we have ν̂^{(n_σ(t))} = ν̂(t). At time step t + 1, the player uses the previous knowledge ν̂(t) to get x̂(t + 1) = argmax_{x ∈ X} r(x, ν̂(t)) and x̂′(t + 1) = argmax_{x ∈ X \ {x̂(t+1)}} r(x, ν̂(t)).

In the following analysis, the frequency function is set to f_X(t) = ln t + ln |X|. Using f_X(t), we can construct the confidence interval √(α f_X(t) / n_σ) to eliminate failures with high probability. Define N(t) = |X| t, which will be used frequently in our analysis; then exp{−f_X(t)} ≤ N(t)^{-1}. Let α = 8L²β_σ²/a, where a > 0 is a parameter to be tuned later. The symbols used in the proof are listed in Table 1, and, to facilitate the understanding of our proof, each lemma is briefly summarized in Table 2.
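To make the estimation step concrete, the following sketch shows the clipping map φ and the pseudo-inverse recovery ṽ = M_σ⁺ y used throughout the analysis. The toy stacked matrix and all sizes here are illustrative assumptions, not from the paper:

```python
import numpy as np

# Hypothetical sizes: n outcome entries; a toy stacked matrix M_sigma
# standing in for the global observer set's matrix (full column rank).
rng = np.random.default_rng(0)
n = 4
M_sigma = np.vstack([np.eye(n), np.ones((1, n))])  # 5 x 4, rank n

def clip_phi(nu):
    """phi(nu) = max(min(nu, 1), 0): project nu onto [0, 1]^n entrywise."""
    return np.clip(nu, 0.0, 1.0)

def estimate_outcome(M_sigma, y):
    """Recover an outcome estimate via the Moore-Penrose pseudo-inverse:
    v_tilde = M_sigma^+ y, as in the inversion step of the analysis."""
    return np.linalg.pinv(M_sigma) @ y

nu = rng.uniform(0, 1, size=n)          # true mean outcome vector in [0,1]^n
y = M_sigma @ nu                        # noiseless stacked feedback
v_tilde = estimate_outcome(M_sigma, y)  # exact recovery when noiseless
assert np.allclose(v_tilde, nu)
```

With noisy feedback the recovery is only approximate, which is exactly why the concentration argument below is needed.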
Below we give an outline of the complete proof, which consists of three main parts.

First, we introduce some general insights, present the preliminaries, and prove basic properties of our model and of Algorithm GCB; these are shared by the proofs of both the distribution-independent and the distribution-dependent regret bounds. We obtain the concentration property of the empirical mean of outcomes via the Azuma-Hoeffding inequality (Fact A.2) in Lemma A.3. Lemma A.13 shows that our algorithm bounds the number of exploration rounds by O(T^{2/3} log^{1/3} T), which implies that the algorithm will not play exploration for too long. In Lemma A.14, we prove that when the gap between the estimated optimal action x̂ and the estimated second-best action x̂′ is large (i.e., the first condition in Line 8 holds), the estimated optimal action is sub-optimal only with low probability. This means that our global confidence bound excludes sub-optimal actions effectively.

In Section A.1, we prove the distribution-independent regret bound of O(T^{2/3} log T) (Theorem 4.1 in the main text). In Lemma A.15, we show that the probability of a large gap between the estimated optimal action x̂ and the real one x* decays exponentially with the number of exploration rounds. The penalty in the exploitation phase is then derived in Lemma A.16. We then use Lemmas A.13 and A.16 to prove Theorem 4.1 in the main text; hence the distribution-independent bound O(T^{2/3} log T) is achieved.

In Section A.2, we prove the distribution-dependent bound of O(log T) under the predetermined distribution p, assuming that the optimal action x* is unique (Theorem 4.2 in the main text). First, we show in Lemma A.17 that, when the algorithm has played Ω(ln t + ln |X|) rounds of exploration, the probability of the estimated optimal action x̂ being sub-optimal is low. Then, in Lemma A.18, we combine the results of Lemmas A.14 and A.17 and show that the algorithm exploits a sub-optimal action only with low probability. Thus, Lemma A.18 is enough to bound the regret of exploitation. Next, we bound the regret of exploration by bounding the number of exploration rounds to O(ln T + ln |X|) in Lemma A.22. This is done by showing that whenever the algorithm has conducted Θ(ln t + ln |X|) rounds of exploration, with high probability it switches to exploitation (Lemmas A.19 and A.20), and then aggregating multiple switches between exploration and exploitation in the proof of Lemma A.22. Finally, we combine Lemmas A.18 and A.22 to prove Theorem 4.2 in the main text.

Fact A.1. The following probability laws will be used in the analysis.
Symbols in the main text:
v(t): state of any variable v by the end of time step t
v ∈ [0,1]^n: outcomes of the environment
ṽ ∈ [0,1]^n: estimation of outcomes through inversion
ν: mean of outcomes
ν̂: empirical mean of outcomes
x ∈ X: action x in the action set X
r(x, v), r̄(x, ν): reward function taking x and v, and expected reward function taking x and ν
M_x ∈ R^{m_x × n}: transformation matrix of action x, where m_x depends on x
y(t) ∈ R^{m_{x(t)}}: feedback vector under the choice of x(t)
y: vector that stacks feedbacks from different times
Δ_x, Δ_max, Δ_min: reward gap of action x, the maximum gap, and the minimum (positive) gap
σ ⊆ X: global observer set of actions
L: Lipschitz constant
β_σ: distribution-independent error bound from σ
R_max: distribution-independent largest gap of expected reward
f_X(t): frequency function
x*: real optimal action
x̂(t): estimated optimal action at time t
x̂′(t): estimated second optimal action at time t

Symbols in the proof:
n_σ: exploration counter
µ(t), η(t): threshold functions
X_Good, X_Bad: good action set and bad action set
F_Good, F_Bad: events of choosing an action from the good set or from the bad set
L_CI, L^c_CI: event that the estimated gap is larger than the confidence interval, and its complement
E_Explore, E_InExplore, E_FinExplore: events of starting exploration, being in the middle of it, and being at its end
E_Exploit: event of exploitation
δ_{x_i,x_j}, δ̂_{x_i,x_j}(t): reward gap between actions x_i and x_j, and its estimated value at time t
G_k: event indicating the first occurrence of k rounds of exploration

Table 1. List of symbols in the proof.

Succinct interpretation of the results, with dependencies in parentheses:
Lemma A.3: Estimate of outcomes concentrates around the mean. (Fact A.2)
Lemma A.7: Difference of the real and estimated gaps is bounded.
Lemma A.8: Estimation error of outcomes is small compared to the confidence interval. (Lemma A.3)
Lemma A.13: The counter of exploration is bounded within O(T^{2/3} log^{1/3} T).
Lemma A.14: Choosing a bad action that passes the confidence-interval test occurs rarely. (Lemma A.8)
Lemma A.15: Incurring a large penalty for the current optimal action is rare. (Lemma A.3)
Lemma A.16: The penalty in the exploitation phase is bounded. (Lemmas A.14, A.15)
Theorem 4.1: Distribution-independent bound: O(T^{2/3} log T). (Lemmas A.13, A.16)
Lemma A.17: With enough exploration, finding a bad action is rare. (Lemma A.3)
Lemma A.18: Finding a bad action and exploiting it becomes rare as time elapses. (Lemmas A.14, A.17)
Lemma A.19: With enough exploration, finding a good action but still exploring becomes rare. (Lemmas A.3, A.7, A.13)
Lemma A.20: Once the algorithm performs enough exploration, it switches to exploitation. (Lemmas A.17, A.19)
Lemma A.22: Exploration rounds are bounded. (Lemma A.20)
Theorem 4.2: Distribution-dependent bound: O(log T). (Lemmas A.18, A.22)

Table 2. List of lemmas and their dependencies in the proof.
Law of Conditional Probability: Pr[A | B] = Pr[A ∩ B] / Pr[B]. Law of Total Probability: if {B_n : n = 1, 2, …} is a set of disjoint events whose union is the entire sample space, then Pr[A] = Σ_n Pr[A ∩ B_n].

Fact A.2 (Azuma-Hoeffding Inequality in Euclidean Space (Theorem 1.8 of (Hayes, 2003))). Let X = (X_0, …, X_n) be a very-weak martingale, that is, E[X_i | X_{i−1}] = X_{i−1} for every i, taking values in Euclidean space, X_i ∈ R^d for every i. Suppose X_0 = 0 and ‖X_i − X_{i−1}‖ ≤ 1 for i = 1, …, n. Then, for every ε > 0,

Pr[‖X_n‖ ≥ ε] < 2e^{1 − (ε−1)²/(2n)} < 2e · e^{−ε²/(2n)}. (1)

We can use the preceding fact to obtain the concentration property of outcomes during exploration.

Lemma A.3 (Concentration during exploration). After exploration rounds i = 1, 2, …, j, finishing at times t_1, t_2, …, t_j respectively, we use the inverse to get ṽ_i = I(M_σ, y(t_i)) = M_σ⁺ y_i, and their mean is ν̂^{(j)} = (1/j) Σ_{i=1}^j ṽ_i. Then, for all γ > 0:

Pr[‖ν − ν̂^{(j)}‖ ≥ γ] ≤ 2e · exp{−γ² j / (2β_σ²)}. (2)

Proof. For each i, let X_i be the partial sum X_i = Σ_{l=1}^i (ν − ṽ_l)/β_σ, where E[ν − ṽ_l] = 0 and ‖ν − ṽ_i‖/β_σ ≤ 1. So X_i − X_{i−1} = (ν − ṽ_i)/β_σ implies ‖X_i − X_{i−1}‖ ≤ 1. Moreover, ṽ_i is independent of the previous inverses ṽ_1, …, ṽ_{i−1}, so it holds that

E[X_i | X_{i−1}] = X_{i−1} + E[(ν − ṽ_i)/β_σ] = X_{i−1}. (3)−(5)

Therefore, X = (X_0, …, X_n) satisfies the definition of a very-weak martingale. Applying Fact A.2 yields, for every ε > 0, Pr[‖X_j‖ ≥ ε] < 2e · e^{−ε²/(2j)}. Let γ = εβ_σ/j; since ‖ν − ν̂^{(j)}‖ = (β_σ/j)‖X_j‖, we get, for all γ > 0,

Pr[‖ν − ν̂^{(j)}‖ ≥ γ] < 2e · exp{−γ² j / (2β_σ²)}. (6)

Under a predetermined outcome distribution p with mean outcome vector ν and x* = argmax_{x ∈ X} r(x, ν), in the main text we define the gaps:

Δ_x = r(x*, ν) − r(x, ν), (7)
Δ_max = max{Δ_x : x ∈ X}, (8)
Δ_min = min{Δ_x : x ∈ X, Δ_x > 0}. (9)

Definition A.4 (Good actions / bad actions). Based on the distance to the optimal action, define good actions and bad actions as:

X_Good = {x : x ∈ X, Δ_x = 0}, (10)
X_Bad = {x : x ∈ X, Δ_x > 0}. (11)

Therefore, X = X_Good ∪ X_Bad. Moreover, x* ∈ X_Good. (x* is unique if and only if |X_Good| = 1.)
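As a sanity check on Lemma A.3 (a toy Monte Carlo simulation, not a proof), one can compare the empirical frequency of large deviations of ν̂^{(j)} against the bound 2e · exp{−γ²j/(2β_σ²)}. The distribution and the crude choice of β_σ below are illustrative assumptions:

```python
import numpy as np

# Toy check of Lemma A.3: the empirical mean of bounded, unbiased vector
# estimates stays within gamma of nu, with failure probability at most
# about 2e * exp(-gamma^2 j / (2 beta^2)).
rng = np.random.default_rng(1)
n, j, trials = 3, 200, 2000
nu = np.full(n, 0.5)
beta = np.sqrt(n)                 # crude bound on ||nu - v_tilde_i||_2 here
gamma = 0.25

deviations = np.empty(trials)
for k in range(trials):
    v_tilde = rng.uniform(0, 1, size=(j, n))   # unbiased estimates of nu
    deviations[k] = np.linalg.norm(v_tilde.mean(axis=0) - nu)

empirical = np.mean(deviations >= gamma)
azuma_bound = 2 * np.e * np.exp(-gamma**2 * j / (2 * beta**2))
assert empirical <= azuma_bound   # the bound should dominate the frequency
```

In this toy setting the empirical failure frequency is essentially zero, well below the (loose) Azuma-Hoeffding bound.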
Definition A.5 (Events of finding a good action / bad action). Define x̂(t) = argmax_{x ∈ X} r(x, ν̂(t−1)) as the current optimal action at time t. Let F_Bad(t) be the event of failing to choose the optimal action at time t. Formally, F_Bad(t) and its complement event are:

F_Bad(t) = {x̂(t) ∈ X_Bad}, (12)
F_Good(t) = {x̂(t) ∈ X_Good}. (13)

To build the connection with the exploration round j, we define the time-invariant event F_Bad^{(j)} as the event in which the algorithm fails to choose the optimal action after j rounds of exploration:

F_Bad^{(j)} = {x̂^{(j)} ∈ X_Bad}, (14)
F_Good^{(j)} = {x̂^{(j)} ∈ X_Good}, (15)

where x̂^{(j)} = argmax_{x ∈ X} r(x, ν̂^{(j)}). By definition, it is always true that F_Bad^{(n_σ(t−1))} = F_Bad(t) and F_Good^{(n_σ(t−1))} = F_Good(t).

Definition A.6 (Estimated gap and real gap). For any pair of actions x_i, x_j ∈ X, define the gap of estimated reward between x_i and x_j as δ̂_{x_i,x_j}(t) = r(x_i, ν̂(t−1)) − r(x_j, ν̂(t−1)), and the gap of real reward between them as δ_{x_i,x_j} = r(x_i, ν) − r(x_j, ν).

Lemma A.7 (Bound of the gap). For any pair of actions x_i, x_j ∈ X, the following inequality holds over time t:

|δ̂_{x_i,x_j}(t) − δ_{x_i,x_j}| ≤ 2L‖ν − ν̂(t−1)‖. (16)

Proof.

|δ̂_{x_i,x_j}(t) − δ_{x_i,x_j}| = |(r(x_i, ν̂(t−1)) − r(x_i, ν)) − (r(x_j, ν̂(t−1)) − r(x_j, ν))| (17)
≤ |r(x_i, ν̂(t−1)) − r(x_i, ν)| + |r(x_j, ν̂(t−1)) − r(x_j, ν)| (18)
≤ L‖ν − ν̂(t−1)‖ + L‖ν − ν̂(t−1)‖ (19)
= 2L‖ν − ν̂(t−1)‖. (20)

Lemma A.8 (Small error in estimation). Given time t, for f_X(t) = ln t + ln |X|, α = 8L²β_σ²/a, and a > 0, for all γ > 0:

Pr[‖ν − ν̂(t−1)‖ ≥ γ √(α f_X(t) / n_σ(t−1))] ≤ (2e/|X|) N(t)^{1 − 4γ²L²/a}. (21)
Proof. Since the number of exploration rounds equals the counter n_σ(t−1) and ν̂(t−1) = ν̂^{(n_σ(t−1))}, we have:

Pr[‖ν − ν̂(t−1)‖ ≥ γ √(α f_X(t) / n_σ(t−1))]
= Σ_{j=1}^{t−1} Pr[‖ν − ν̂^{(n_σ(t−1))}‖ ≥ γ √(α f_X(t) / n_σ(t−1)) ∧ n_σ(t−1) = j] (22)
≤ Σ_{j=1}^{t−1} Pr[‖ν − ν̂^{(j)}‖ ≥ γ √(α f_X(t) / j)] (23)−(24)
≤ Σ_{j=1}^{t−1} 2e · exp{−γ² (α f_X(t)/j) j / (2β_σ²)} {Lemma A.3} (25)−(26)
= 2e (t−1) exp{−γ² α f_X(t) / (2β_σ²)} (27)
= 2e (t−1) N(t)^{−γ²α/(2β_σ²)}. (28)

As α = 8L²β_σ²/a and a > 0, the probability is:

Pr[‖ν − ν̂(t−1)‖ ≥ γ √(α f_X(t) / n_σ(t−1))] ≤ 2e (t−1) N(t)^{−4γ²L²/a} ≤ (2e/|X|) N(t)^{1 − 4γ²L²/a}, (29)

where the last step uses t − 1 ≤ t = N(t)/|X|.
6 Definition A.11. For simplicity, suppose α 8L β σ a, constant a > 0 and θ > 0, then we can define two threshold functions: Note that η(t) and µ(t) are values, not random variables. Proposition A.1. If t > T 0 (1+θa) α η(t) t fx (t) (5) µ(t) (1 + θa) αf X (t). (6), then µ(t) < η(t). (It can be verified by the definition.) Lemma A.1 (Exploration Ceiling). Let α 8L β σ a and a > 0. For any time t, if the exploration counter n σ (t 1) > η(t), the algorithm will play exploitation surely, i.e., Pr E Explore (t) n σ (t 1) > η(t) 0. (7) Proof. If n σ (t 1) > η(t), then Line 8 of Algorithm GCB will be true because of its second condition. According to the algorithm, it will not go to exploration phase, so we know that which restricts n σ (t 1) to no larger than η(t) + 1 at any time t. Pr E Explore (t) n σ (t 1) > η(t) 0, (8) Lemma A.14 (Low failure probability of the confidence interval). Let f X (t) ln t+ ln X, α 8L β σ a and 0 < a 1. For any time t, the probability that both choosing bad action and the gap is larger than confidence interval satisfies: Pr L CI (t) F Bad (t) e X N(t). (9) Proof. The definition of F Bad (t) {ˆx(t) X Bad } implies x X \ {ˆx(t)}. Their gap is ˆδˆx(t),x (t) r(ˆx(t), ˆν(t 1)) r(x, ˆν(t 1)) (40) r(ˆx(t), ˆν(t 1)) r(x, ˆν(t 1)) + r(x, ν) r(ˆx(t), ν) {Definition of x } (41) r(ˆx(t), ˆν(t 1)) r(ˆx(t), ν) + r(x, ν) r(x, ˆν(t 1)) (4) L ν ˆν(t 1). (4) Thus, we can write the probability as: Pr L CI (t) F Bad (t) αf Pr x X \ {ˆx(t)}, ˆδˆx(t),x X (t) n σ (t 1) F Bad (t) αf X (t) Pr ˆδˆx(t),x n σ (t 1) F Bad (t) αf X (t) Pr ˆδˆx(t),x n σ (t 1) Pr ν ˆν (nσ(t 1)) 1 αf X (t) L n σ (t 1) e X N(t)1 1 a {Lemma A.8 with γ 1 L } (49) e X N(t). {0 < a 1 } (50) (44) (45) (46) (47) (48)
A.1. Distribution-independent bound

Lemma A.15. For any ε > 0, j = 1, 2, …, and t ≥ 1, when the algorithm has played n_σ(t−1) = j rounds of exploration by time t, the probability of incurring penalty Δ_{x̂(t)} ≥ ε satisfies

Pr[Δ_{x̂(t)} ≥ ε | n_σ(t−1) = j] ≤ 2e · exp{−j ε² / (8L²β_σ²)}. (51)

Proof. Δ_{x̂(t)} is the real gap of reward between x* and x̂(t):

Δ_{x̂(t)} = δ_{x*,x̂(t)} ≤ δ_{x*,x̂(t)} + δ̂_{x̂(t),x*}(t) {Definition of x̂(t)} (52)−(53)
= δ_{x*,x̂(t)} − δ̂_{x*,x̂(t)}(t) (54)
≤ |r(x*, ν) − r(x*, ν̂(t−1))| + |r(x̂(t), ν) − r(x̂(t), ν̂(t−1))| (55)
≤ 2L‖ν − ν̂(t−1)‖ = 2L‖ν − ν̂^{(n_σ(t−1))}‖. (56)

When n_σ(t−1) = j, we conclude that the probability of incurring a large penalty is:

Pr[Δ_{x̂(t)} ≥ ε | n_σ(t−1) = j] ≤ Pr[2L‖ν − ν̂^{(n_σ(t−1))}‖ ≥ ε | n_σ(t−1) = j] (57)
= Pr[‖ν − ν̂^{(j)}‖ ≥ ε/(2L)] (58)
≤ 2e · exp{−j ε² / (8L²β_σ²)}. {Lemma A.3} (59)

In Algorithm GCB, exploitation is penalized with respect to the regret only if the algorithm chooses a bad action and exploits it simultaneously, i.e., both F_Bad(t) and E_Exploit(t) hold. When the algorithm chooses exploitation at time t, the regret at that time is E[Δ_{x̂(t)} · I(F_Bad(t) ∧ E_Exploit(t))].

Lemma A.16 (Penalty of exploitation). For any ε > 0, for Algorithm GCB with f_X(t) = ln t + ln |X|, α = 8L²β_σ²/a, 0 < a ≤ 1/3, and η(t) = t^{2/3} f_X(t)^{1/3}, the penalty in the exploitation phase at time t satisfies, in expectation:

E[Δ_{x̂(t)} I(F_Bad(t) ∧ E_Exploit(t))] ≤ ε + Δ_max (2e/(|X| N(t)²) + 2e · exp{−η(t) ε² / (8L²β_σ²)}). (60)

Proof. For any ε > 0, the expectation satisfies:

E[Δ_{x̂(t)} I(F_Bad(t) ∧ E_Exploit(t))] (61)
= E[Δ_{x̂(t)} | F_Bad(t) ∧ E_Exploit(t)] Pr[F_Bad(t) ∧ E_Exploit(t)] (62)
≤ ε Pr[Δ_{x̂(t)} < ε ∧ F_Bad(t) ∧ E_Exploit(t)] + Δ_max Pr[Δ_{x̂(t)} ≥ ε ∧ F_Bad(t) ∧ E_Exploit(t)] (63)−(64)
≤ ε + Δ_max Pr[Δ_{x̂(t)} ≥ ε ∧ F_Bad(t) ∧ E_Exploit(t)]. (65)

By definition, the exploitation event E_Exploit(t) = {L_CI(t) ∨ n_σ(t−1) > η(t)} happens when no other action lies within the confidence-interval gap (the event L_CI(t)) or when the counter satisfies n_σ(t−1) > η(t).
And we know that n σ (t) is no larger than η(t) + 1, because it is a hard constraint implied
8 by Lemma A.1. Therefore, the probability in the second term is the joint of these two events: Pr ˆx(t) ɛ F Bad (t) E Exploit (t) (66) Pr ˆx(t) ɛ F Bad (t) (L CI (t) n σ (t 1) > η(t)) (67) Pr ˆx(t) ɛ F Bad (t) L CI (t) n σ (t 1) η(t) + Pr ˆx(t) ɛ F Bad (t) n σ (t 1) > η(t) (68) Pr F Bad (t) L CI (t) + Pr ˆx(t) ɛ F Bad (t) n σ (t 1) η(t) + 1 (69) } e X N(t) + e η(t) ɛ exp { 8L βσ. {Lemma A.14 and A.15} (70) Therefore, we have E ˆx(t) I F Bad (t) E Exploit (t) ( }) e ɛ + max X N(t) + e η(t) ɛ exp { 8L βσ. (71) Theorem 4.1 (in the main text): (Distribution-independent bound). Let f X (t) ln t + ln X, and α 4L βσ. The distribution-independent regret bound of Algorithm GCB is: ) R(T ) R max σ T 8 (ln T + ln X ) + LT + Rmax ( σ + 4e X 4. (7) Proof. From the algorithm, we know that it either plays actions in the exploration phase or in the exploitation phase. The exploration phase will take time σ to finish, and its penalty is x σ x. And the penalty of playing exploitation is ˆx(t) at each time step t. R(T ) x E n σ (T ) + x σ T E ˆx(t) I F Bad (t) E Exploit (t). (7) From Lemma A.1, we can infer that if the exploration counter n σ (t) > η(t) t f X (t), it will no longer play exploration. Therefore, the expected number of rounds of exploration satisfies E n σ (T ) T f X (T ) + 1, so the regret for exploration is x E n σ (T ) ( ) x T fx (T ) + 1. (74) x σ Let ɛ 4L t 1, then η(t) t f X (t) and η(t)ɛ 8L βσ exploitation part: Therefore, we will have R(T ) x σ f X (t). Therefore, we can apply Lemma A.16 to get the regret of T E ˆx(t) I F Bad (t) E Exploit (t) (75) T ( }) e ɛ + max X N(t) + e η(t) ɛ exp { 8L β (76) σ T ( ) 4L t 1 e + max X + e N(t) (77) 8 ( ) LT e 1 + max X + e X 4. (78) x σ ( x T 8 (ln T + ln X ) + LT + x σ ) x + 4e X 4 max. (79)
As Δ_x and Δ_max are bounded by R_max under any distribution, substituting R_max into the bound above yields (80), the statement of Theorem 4.1.

A.2. Distribution-dependent Bound

Under a predetermined outcome distribution p, the minimum gap between the optimal action and any sub-optimal action is Δ_min. It follows that:

Lemma A.17 (Condition of choosing the optimal action). Suppose we have played j rounds of exploration by time t. If b ≥ 1 and j ≥ 8bL²β_σ² f_X(t) / Δ²_min, Algorithm GCB chooses the optimal action with high probability:

∀ j ≥ 8bL²β_σ² f_X(t) / Δ²_min: Pr[F_Bad^{(j)}] ≤ e|X| / (t N(t)^{b−1}). (81)

Proof. According to the definition, F_Bad^{(j)} occurs only if some sub-optimal action has the largest estimated reward.

Pr[F_Bad^{(j)}] (82)
≤ Pr[∃x_b ∈ X_Bad, ∃x_g ∈ X_Good: r(x_g, ν̂^{(j)}) ≤ r(x_b, ν̂^{(j)})] (83)−(84)
≤ Σ_{x_b ∈ X_Bad} Σ_{x_g ∈ X_Good} Pr[r(x_g, ν̂^{(j)}) − r(x_b, ν̂^{(j)}) ≤ 0] {Union bound} (85)
≤ Σ_{x_b ∈ X_Bad} Σ_{x_g ∈ X_Good} (Pr[r(x_g, ν) − r(x_g, ν̂^{(j)}) ≥ Δ_min/2] + Pr[r(x_b, ν̂^{(j)}) − r(x_b, ν) ≥ Δ_min/2]) (86)−(87)
≤ Σ_{x_b ∈ X_Bad} Σ_{x_g ∈ X_Good} 2 Pr[L‖ν − ν̂^{(j)}‖ ≥ Δ_min/2]. (88)

Thus, by Lemma A.3, it is

Pr[F_Bad^{(j)}] ≤ Σ_{x_b ∈ X_Bad, x_g ∈ X_Good} 4e · exp{−j Δ²_min / (8L²β_σ²)} (89)−(90)
≤ 4e |X_Bad| |X_Good| exp{−j Δ²_min / (8L²β_σ²)} (91)
≤ e|X|² exp{−j Δ²_min / (8L²β_σ²)}, {|X_Bad| + |X_Good| = |X|} (92)

where we used |X_Bad||X_Good| ≤ |X|²/4. Therefore, if j ≥ 8bL²β_σ² f_X(t) / Δ²_min with b ≥ 1, we can conclude:

Pr[F_Bad^{(j)}] ≤ e|X|² N(t)^{−b} = e|X| / (t N(t)^{b−1}). (93)
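The exploration budget that Lemma A.17 demands, roughly j ≳ 8bL²β_σ² f_X(t) / Δ²_min as read from the constants above, is easy to evaluate numerically. The values below are arbitrary toy choices, not constants from the paper:

```python
import math

# Toy evaluation of the Lemma A.17 condition (constants illustrative):
# after j >= 8 b L^2 beta_sigma^2 f_X(t) / Delta_min^2 rounds, the failure
# probability is at most e|X| / (t N(t)^(b-1)), with N(t) = |X| t.
L_const, beta_sigma, delta_min = 1.0, 2.0, 0.5
num_actions, t, b = 8, 1000, 2
f_x = math.log(t) + math.log(num_actions)       # frequency function f_X(t)
j_min = math.ceil(8 * b * L_const**2 * beta_sigma**2 * f_x / delta_min**2)
N_t = num_actions * t
failure_bound = math.e * num_actions / (t * N_t**(b - 1))
assert j_min > 0
assert failure_bound < 1e-2   # already tiny for these toy values
```

Note how the budget scales with 1/Δ²_min: halving the gap quadruples the required number of exploration rounds, which is the source of the distribution-dependent constant in Theorem 4.2.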
10 Lemma A.18 (Exploit the Optimal Action). Let α 8L β σ a, 0 < a 1 and θ. For any time t > T 0, the probability of F Bad (t) and playing exploitation in Algorithm GCB is: Pr E Exploit (t) F Bad (t) e N(t). (94) Proof. If t > T 0, and E Exploit (t) {L CI (t) n σ (t 1) > η(t)}, we can write the probability of exploitation as: Pr E Exploit (t) F Bad (t) Pr E Exploit (t) F Bad (t) n σ (t 1) > η(t) + Pr E Exploit (t) F Bad (t) n σ (t 1) η(t) (96) Pr F Bad (t) n σ (t 1) > η(t) + Pr L CI (t) F Bad (t) n σ (t 1) η(t) (97) Pr F Bad (t) n σ (t 1) > η(t) + Pr L CI (t) F Bad (t) (98) Pr F Bad (t) n σ (t 1) > η(t) + e X N(t). {Lemma A.14} Since we know that n σ (t 1) > η(t), 0 < a 1 and θ 1, then n σ (t 1) > η(t) > µ(t) By Lemma A.17, the following inequality holds: (1 + θa) a 8L β σf X (t) > 8L βσf X (t). Pr F Bad (t) n σ (t 1) > η(t) (100) t 1 jη(t) t jη(t) t 1 jη(t) t 1 jη(t) Therefore, we can get: Pr F Bad (t) n σ (t 1) j (101) Pr F(n Bad n σ(t 1)) σ(t 1) j Pr e N(t). F Bad (j) (95) (99) (10) (10) e tn(t) {Lemma A.17 with b } (104) (105) Pr E Exploit (t) F Bad (t) e N(t) + e X N(t) e N(t). (106) Lemma A.19 (The exploration probability will drop). Suppose the instance has unique optimal action under distribution p, i.e., X Good 1. Let α 8L β σ a, 0 < a 1. For any time t > T 0, when n σ (t 1) µ(t) (1 + θa) αf X (t) where θ, and the probability of F Good (t) and exploration happening simultaneously is: Pr E Explore (t) F Good (t) n σ (t 1) µ(t) e X N(t). (107) Proof. By definition, the event that exploration happens at time t is E Explore (t) {L c CI (t) n σ η(t)}. When t > T 0, it is true that η(t) > µ(t).
11 On one hand, if n σ (t 1) > η(t), then by Lemma A.1, we know that Pr E Explore (t) n σ (t 1) > η(t) (108) Pr E Explore (t) nσ (t 1) > η(t) Pr n σ (t 1) > η(t) (109) 0. On the other hand, for µ(t) n σ (t 1) η(t), whether to play exploration only depends on the event L c CI (t). If F Good (t) {ˆx(t) X Good} and with the assumption that X Good 1, we know that X Good (X \ {ˆx(t)}). So the gap at time t is, x X \ {ˆx(t)}, ˆδˆx(t),x (t) ˆδ x,x(t) δ x,x ˆδ x,x(t) δ x,x (111) (110) ˆδ x,x(t) δ x,x { is the imum gap} (11) L ν ˆν(t 1). {Lemma A.7} (11) And we also know that if n σ (t 1) µ(t) (1 + θa) αf X (t), αf X (t) n σ (t 1) 1 + θa, (114) thus we can get Let α 8L β σ a Pr E Explore (t) F Good (t) µ(t) n σ (t 1) η(t) (115) αf Pr x X \ {ˆx(t)}, ˆδˆx(t),x X (t) (t) n σ (t 1) F Good (t) µ(t) n σ (t 1) η(t) (116) αf X (t) Pr L ν ˆν(t 1) n σ (t 1) µ(t) n σ(t 1) η(t) (117) Pr L ν ˆν(t 1) 1 + θa µ(t) n σ(t 1) η(t) (118) Pr L ν ˆν (nσ(t 1)) θa 1 + θa µ(t) n σ (t 1) η(t) (119) Pr L ν ˆν (nσ(t 1)) θa 1 + θa µ(t) n σ (t 1) η(t) (10) η(t) Pr L ν ˆν (nσ(t 1)) θa 1 + θa n σ (t 1) j (11) jµ(t) η(t) jµ(t) η(t) jµ(t) Pr L ν ˆν (j) θa 1 + θa n σ (t 1) j (1) Pr L ν ˆν (j) θa 1 + θa. (1), 0 < a 1. For j µ(t),, η(t) and µ(t) (1 + θa) αf X (t) Pr ν ˆν (j) 1 L θa 1 + θa } { e exp (θa) j (1 + θa) 8L, recall Lemma A., then we have: (14) {Lemma A.} (15) e exp { θ f X (t) } {j µ(t)} (16) e N(t) θ. (17)
12 Therefore, we have: Pr E Explore (t) F Good (t) n σ (t 1) µ(t) (18) Pr E Explore (t) F Good (t) n σ (t 1) > η(t) + Pr E Explore (t) F Good (t) µ(t) n σ (t 1) η(t) (19) 0 + η(t) jµ(t) e t N(t) θ e X N(t)1 θ e N(t) θ (10) (11) (1) e X N(t). {Let θ } (1) When the instance has a unique optimal action x under distribution p, the following lemmata ensures that exploration will not continue endlessly, thus it will switch to exploitation gradually. For simplicity, we consider the case that the exploration round has already reached µ(t ) at given time T. Lemma A.0 (Switch to exploitation gradually). Suppose the instance has a unique optimal action x under distribution p. Given time T, if for time i T the exploration rounds n σ (i) µ(t ) has already been satisfied, where µ(t ) (1 + θa) αf X (T ), 0 < a 1. θ. Then t, max{i + 1, T 0 } t T, the probability of playing exploration is: Proof. As n σ (i) µ(t ), we know that Pr E Explore (t) n σ (i) µ(t ) 4e N(t). (14) n σ (i) µ(t ) n σ (t 1) n σ (i) µ(t ), (15) which implies that the event n σ (i) µ(t ) is the subset of the event n σ (t 1) µ(t ). From Lemma A.19, the first part is For the second part, as 0 < a 1 and θ, we can get µ(t ) (1 + θa) αf X (T ) Pr E Explore (t) n σ (i) µ(t ) (16) Pr E Explore (t) n σ (t 1) µ(t ) (17) Pr E Explore (t) F Good (t) n σ (t 1) µ(t ) + Pr E Explore (t) F Bad (t) n σ (t 1) µ(t ) (18) Pr E Explore (t) F Good (t) n σ (t 1) µ(t ) (19) + Pr F Bad (t) n σ (t 1) µ(t ). (140) Pr E Explore (t) F Good (t) n σ (t 1) µ(t ) (141) Pr E Explore (t) F Good (t) n σ (t 1) µ(t) (14) e X N(t). (14) (1 + θa) αf X (t) (1 + θa) a 8L β σf X (t) > 8L βσf X (t). (144)
13 Thus, by using Lemma A.17, it is Pr F Bad (t) n σ (t 1) µ(t ) (145) t Therefore, we can get t 1 jµ(t ) t 1 jµ(t ) t 1 jµ(t ) t 1 jµ(t ) Pr F Bad (t) n σ (t 1) j (146) Pr F(n Bad n σ(t 1)) σ(t 1) j Pr F(j) Bad Pr F Bad (j) n σ (t 1) j (147) (148) (149) e tn(t) {Lemma A.17 with b } (150) e N(t). (151) Pr E Explore (t) n σ (i) µ(t ) e X N + e N 4e N. (15) For counter n σ, the following definition characterizes its first occurrence to be k. Definition A.1. Given k, for any t, we define the event that n σ (t) k and n σ (t 1) k 1 as G k (t), i.e., G k (t) {n σ (t) k n σ (t 1) k 1}. Lemma A. (Exploration Numbers). Let µ(t ) (1 + θa) αf X (T ), 0 < a 1 and θ. If under distribution p, there is a unique optimal action, i.e., X Good 1, then the expected exploration round at time T (T 0 T ) is: E n σ (T ) µ(t ) + 4e T0 X 4 ln(t + 1) Pr E Explore (t). (15) Proof. Note that it takes σ time steps to play exploration and then to increase n σ by 1. E FinExplore (t) is the event that the algorithm finishes one round of exploration and updates n σ at time t. Then, we have E FinExplore (t) E Explore (t σ + 1) and t 1,,, σ 1, Pr E FinExplore (t) 0, meaning that the event never happens for t < σ. By definition, we can get: E n σ (T ) T Pr E FinExplore (t) T Pr E FinExplore (t) t σ Because the accumulation of exploration rounds is n σ (T ), therefore its expected number can be: Pr E Explore (t). (154) E n σ (T ) Pr E Explore (t) (155) Pr E Explore (t) n σ (T ) < µ(t ) + Pr E Explore (t) n σ (T ) µ(t ). (156)
14 The following inequality ensures that the first part is not large: Pr E Explore (t) n σ (T ) < µ(t ) (157) Pr n σ (T ) < µ(t ) Pr E Explore (t) n σ (T ) < µ(t ) (158) Pr n σ (T ) < µ(t ) Pr E Explore (t) nσ (T ) < µ(t ) (159) Pr n σ (T ) < µ(t ) T Pr E FinExplore (t) n σ (T ) < µ(t ) (160) Pr n σ (T ) < µ(t ) E n σ (T ) n σ (T ) < µ(t ) (161) Pr n σ (T ) < µ(t ) µ(t ). (16) We know the counter n σ could only increase by 1 at a time. For this reason, if the value of n σ (T ) exceeds µ(t ) at time T {, this event must happen within t µ(t ),, T. Thus, the occurrence of µ(t ) is equivalent to the union of events T } G µ(t )(i). By definition, each event G µ(t ) (i), i µ(t ),, T, is mutually exclusive. Therefore, we have { T } {n σ (T ) µ(t )} G µ(t )(i), and the second part is: T T Pr E Explore (t) n σ (T ) µ(t ) (16) Pr E Explore (t) Pr T T T ( E Explore (t) G µ(t ) (i) ) G µ(t ) (i) (164) (165) Pr E Explore (t) G µ(t ) (i) {Union bound} (166) Pr E Explore (t) G µ(t ) (i) (167) i Pr E Explore (t) G µ(t ) (i) + T ti+1 Pr E Explore (t) G µ(t ) (i). (168)
15 Now we will prove that the first term is in O(µ(T )): T T T T T T i Pr E Explore (t) G µ(t ) (i) (169) i Pr G µ(t ) (i) Pr E Explore (t) Gµ(T ) (i) (170) Pr G µ(t ) (i) i Pr E Explore (t) G µ(t ) (i) (171) Pr G µ(t ) (i) i+ σ 1 Pr E FinExplore (t) G µ(t ) (i) (17) Pr G µ(t ) (i) E n σ (i + σ 1) G µ(t ) (i) (17) Pr G µ(t ) (i) E n σ (i) + 1 Gµ(T ) (i) {n σ (i + σ 1) n σ (i) + 1} (174) Pr n σ (T ) µ(t ) (µ(t ) + 1), {Mutually exclusive} (175) Since G µ(t ) (i) {n σ (i) µ(t ) n σ (i 1) µ(t ) 1}, we can write the second term as: T T T ti+1 T 0 T 0 T 0 T Pr E Explore (t) G µ(t ) (i) (176) Pr E Explore (t) G µ(t ) (i) + Pr E Explore (t) G µ(t ) (i) + Pr E Explore (t) G µ(t ) (i) + T 0 Pr E Explore (t) T 0 T Pr E Explore (t) T + 4e X 4 T 0 Pr E Explore (t) + 4e ln T. X 4 Therefore, we can get T tmax{i+1,t 0+1} T tmax{i+1,t 0+1} T tmax{i+1,t 0+1} G µ(t ) (i) + 4e X 4 T T Pr E Explore (t) G µ(t ) (i) (177) Pr E Explore (t) n σ (i) µ(t ) (178) 4e N(t) ti t dt di {Lemma A.0} (179) {Mutually exclusive} (180) 1 di (181) i (18) E n σ (T ) µ(t ) e T0 X 4 ln T + Pr E Explore (t). (18)
16 Theorem 4. (in the main text): (Distribution-dependent bound). For Algorithm GCB, let f X (t) ln t + ln X, α 4L βσ. If the instance has a unique optimal action under outcome distribution p and mean outcome vector ν, the distribution-dependent regret bound of Algorithm GCB is: R(T ) 96L βσ x x σ (ln T + ln X ) + 4e X 4 ln T + 1 where x σ x, max and are problem-specific constants under the distribution p. ( e + max X L βσ ), (184) Proof. If we penalize each time the algorithm plays a sub-optimal action by max, then the regret function is composed of exploration and exploitation: R(T ) x σ x E n σ (T ) + max x σ x E n σ (T ) + max T E I E Exploit (t) F Bad (t) (185) T Pr E Exploit (t) F Bad (t). (186) Suppose it has unique optimal action X Good 1, from Lemma A. the expected rounds of exploration are: E n σ (T ) (1 + θa) αf X (T ) e T0 X 4 ln(t + 1) + The regret of exploitation phase can be inferred from Lemma A.18 that: max max Pr E Explore (t). (187) T Pr F Bad (t) E Exploit (t) (188) ( T tt 0+1 ( e max X 4 ( max e T0 N(t) + Pr F Bad (t) E Exploit (t) ) (189) T tt 0+1 e T0 X 4 + T0 1 t + Pr E Exploit (t) ) (190) Pr E Exploit (t) ). (191) Since for t 1,,, T 0, we perform either exploration or exploitation, the regret is no worse than max T 0, that is: T 0 x Pr E Explore (t) T 0 + max Pr E Exploit (t) max T 0. (19) x σ Thus, for f X (t) ln t + ln X, α 8L β σ a, 0 < a 1 and θ, R(T ) (1 + θa) α x (ln T + ln X ) + 4e x σ X 4 ln T + 1 where T 0 (1+θa) α. Let a 1, θ, and α 4L βσ. As a conclusion, we will get: R(T ) 96L β σ x (ln T + ln X ) + 4e X 4 ln T + 1 x σ ( ) e + max X 4 + T 0, (19) ( e + max X L βσ ). (194)
B. An Example of M_σ and Global Observer Set Construction for 1 < s < N in the Crowdsourcing Application

In this section, we provide an example of constructing the stacked matrix M_σ and the global observer set in the crowdsourcing application when we require 1 < s < N, where s is the number of matched worker-task pairs used for reporting the feedback. Recall that the feedback for a matching is the simple summation over these s matched worker-task pairs. This implies that for each matching x, the transformation matrix M_x contains a single row with exactly s entries equal to 1 and all other entries equal to 0.

As an illustration, consider the case where both N and M are divisible by s + 1. Then we can construct a full-rank square matrix M_σ such that, after rearranging the columns of M_σ, it is a block diagonal matrix with each block B being an (s+1)-by-(s+1) square matrix with 0 on the diagonal entries and 1 on the off-diagonal entries. (The original includes an illustration of such a matrix for the case s + 1 = N = M.)

It is clear that this M_σ has full column rank. To recover the NM actions (matchings) corresponding to the NM rows, we map each block B to a matching that matches s + 1 workers to s + 1 tasks such that these matchings share no common edges. This can be done in the following way. We partition the N workers into N/(s+1) groups of size s + 1 each, and partition the M tasks into M/(s+1) groups of size s + 1 each. Taking any group W of s + 1 workers and any group U of s + 1 tasks, we can find s + 1 non-overlapping matchings between W and U by rotation: in the j-th matching, the i-th worker is matched with the ((i + j) mod (s+1))-th task. Since we have NM/(s+1)² worker-task group pairs, and each group pair generates s + 1 non-overlapping matchings, in total we have NM/(s+1) non-overlapping matchings, and we map these matchings to the NM/(s+1) blocks in the rearranged matrix M_σ.
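The block construction and the rotation matchings above can be checked numerically. This sketch uses toy sizes (a single worker group and a single task group, s = 2) and is only an illustration of the construction, not code from the paper:

```python
import numpy as np

# Sketch of the Section-B construction: blocks B of size (s+1)x(s+1) with
# zero diagonal and ones elsewhere, stacked block-diagonally, give a
# full-rank square M_sigma; rotation matchings realize the rows.
s = 2
N = M = s + 1                                  # one worker and one task group
B = np.ones((s + 1, s + 1)) - np.eye(s + 1)    # each row has exactly s ones
num_blocks = (N * M) // (s + 1)
M_sigma = np.kron(np.eye(num_blocks), B)       # block-diagonal stacking

assert M_sigma.shape == (N * M, N * M)
assert np.linalg.matrix_rank(M_sigma) == N * M  # full column rank

# Rotation matchings between a worker group W and a task group U of size
# s+1: in the j-th matching, worker i is matched to task (i + j) mod (s+1).
matchings = [{(i, (i + j) % (s + 1)) for i in range(s + 1)}
             for j in range(s + 1)]
# The s+1 matchings are pairwise edge-disjoint and together cover W x U.
for a in range(s + 1):
    for b in range(a + 1, s + 1):
        assert matchings[a].isdisjoint(matchings[b])
assert len(set().union(*matchings)) == (s + 1) ** 2
```

The rank assertion reflects why B works as a block: B = J − I has eigenvalues s and −1, so it is invertible, and a block-diagonal matrix of invertible blocks is invertible.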
The above construction implies that we can find NM actions to form a global observer set, in which each action is a matching of s + 1 workers to s + 1 tasks, and each matching returns an aggregate performance feedback over s worker-task pairs in the matching. Thus the assumption on the existence of the global observer set holds, and the set can be constructed easily. The error bound for the above constructed M_σ is more complicated to analyze, but based on our empirical evaluation using Matlab, we believe that it is also a low-degree polynomial in N and M.

References

Hayes, Thomas P. A large-deviation inequality for vector-valued martingales. Combinatorics, Probability and Computing, 2003.
More informationQI SHANG: General Equilibrium Analysis of Portfolio Benchmarking
General Equilibrium Analysis of Portfolio Benchmarking QI SHANG 23/10/2008 Introduction The Model Equilibrium Discussion of Results Conclusion Introduction This paper studies the equilibrium effect of
More informationLecture 23: April 10
CS271 Randomness & Computation Spring 2018 Instructor: Alistair Sinclair Lecture 23: April 10 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They
More informationLevin Reduction and Parsimonious Reductions
Levin Reduction and Parsimonious Reductions The reduction R in Cook s theorem (p. 266) is such that Each satisfying truth assignment for circuit R(x) corresponds to an accepting computation path for M(x).
More informationLecture 4. Finite difference and finite element methods
Finite difference and finite element methods Lecture 4 Outline Black-Scholes equation From expectation to PDE Goal: compute the value of European option with payoff g which is the conditional expectation
More information3.2 No-arbitrage theory and risk neutral probability measure
Mathematical Models in Economics and Finance Topic 3 Fundamental theorem of asset pricing 3.1 Law of one price and Arrow securities 3.2 No-arbitrage theory and risk neutral probability measure 3.3 Valuation
More informationMicroeconomic Theory II Preliminary Examination Solutions
Microeconomic Theory II Preliminary Examination Solutions 1. (45 points) Consider the following normal form game played by Bruce and Sheila: L Sheila R T 1, 0 3, 3 Bruce M 1, x 0, 0 B 0, 0 4, 1 (a) Suppose
More informationNotes on the symmetric group
Notes on the symmetric group 1 Computations in the symmetric group Recall that, given a set X, the set S X of all bijections from X to itself (or, more briefly, permutations of X) is group under function
More informationSublinear Time Algorithms Oct 19, Lecture 1
0368.416701 Sublinear Time Algorithms Oct 19, 2009 Lecturer: Ronitt Rubinfeld Lecture 1 Scribe: Daniel Shahaf 1 Sublinear-time algorithms: motivation Twenty years ago, there was practically no investigation
More informationGPD-POT and GEV block maxima
Chapter 3 GPD-POT and GEV block maxima This chapter is devoted to the relation between POT models and Block Maxima (BM). We only consider the classical frameworks where POT excesses are assumed to be GPD,
More informationSocially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors
Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors 1 Yuanzhang Xiao, Yu Zhang, and Mihaela van der Schaar Abstract Crowdsourcing systems (e.g. Yahoo! Answers and Amazon Mechanical
More informationRegret Minimization and Correlated Equilibria
Algorithmic Game heory Summer 2017, Week 4 EH Zürich Overview Regret Minimization and Correlated Equilibria Paolo Penna We have seen different type of equilibria and also considered the corresponding price
More informationMATH 5510 Mathematical Models of Financial Derivatives. Topic 1 Risk neutral pricing principles under single-period securities models
MATH 5510 Mathematical Models of Financial Derivatives Topic 1 Risk neutral pricing principles under single-period securities models 1.1 Law of one price and Arrow securities 1.2 No-arbitrage theory and
More informationThe Analytics of Information and Uncertainty Answers to Exercises and Excursions
The Analytics of Information and Uncertainty Answers to Exercises and Excursions Chapter 6: Information and Markets 6.1 The inter-related equilibria of prior and posterior markets Solution 6.1.1. The condition
More informationOn Complexity of Multistage Stochastic Programs
On Complexity of Multistage Stochastic Programs Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA e-mail: ashapiro@isye.gatech.edu
More informationEquity correlations implied by index options: estimation and model uncertainty analysis
1/18 : estimation and model analysis, EDHEC Business School (joint work with Rama COT) Modeling and managing financial risks Paris, 10 13 January 2011 2/18 Outline 1 2 of multi-asset models Solution to
More informationMartingales. by D. Cox December 2, 2009
Martingales by D. Cox December 2, 2009 1 Stochastic Processes. Definition 1.1 Let T be an arbitrary index set. A stochastic process indexed by T is a family of random variables (X t : t T) defined on a
More informationSingle-Parameter Mechanisms
Algorithmic Game Theory, Summer 25 Single-Parameter Mechanisms Lecture 9 (6 pages) Instructor: Xiaohui Bei In the previous lecture, we learned basic concepts about mechanism design. The goal in this area
More informationInterpolation. 1 What is interpolation? 2 Why are we interested in this?
Interpolation 1 What is interpolation? For a certain function f (x we know only the values y 1 = f (x 1,,y n = f (x n For a point x different from x 1,,x n we would then like to approximate f ( x using
More informationRoy Model of Self-Selection: General Case
V. J. Hotz Rev. May 6, 007 Roy Model of Self-Selection: General Case Results drawn on Heckman and Sedlacek JPE, 1985 and Heckman and Honoré, Econometrica, 1986. Two-sector model in which: Agents are income
More informationEssays on Some Combinatorial Optimization Problems with Interval Data
Essays on Some Combinatorial Optimization Problems with Interval Data a thesis submitted to the department of industrial engineering and the institute of engineering and sciences of bilkent university
More informationBandit Learning with switching costs
Bandit Learning with switching costs Jian Ding, University of Chicago joint with: Ofer Dekel (MSR), Tomer Koren (Technion) and Yuval Peres (MSR) June 2016, Harvard University Online Learning with k -Actions
More informationOn Packing Densities of Set Partitions
On Packing Densities of Set Partitions Adam M.Goyt 1 Department of Mathematics Minnesota State University Moorhead Moorhead, MN 56563, USA goytadam@mnstate.edu Lara K. Pudwell Department of Mathematics
More informationLecture 7: Bayesian approach to MAB - Gittins index
Advanced Topics in Machine Learning and Algorithmic Game Theory Lecture 7: Bayesian approach to MAB - Gittins index Lecturer: Yishay Mansour Scribe: Mariano Schain 7.1 Introduction In the Bayesian approach
More information4: SINGLE-PERIOD MARKET MODELS
4: SINGLE-PERIOD MARKET MODELS Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 4: Single-Period Market Models 1 / 87 General Single-Period
More informationLecture Notes 6. Assume F belongs to a family of distributions, (e.g. F is Normal), indexed by some parameter θ.
Sufficient Statistics Lecture Notes 6 Sufficiency Data reduction in terms of a particular statistic can be thought of as a partition of the sample space X. Definition T is sufficient for θ if the conditional
More informationMax Registers, Counters and Monotone Circuits
James Aspnes 1 Hagit Attiya 2 Keren Censor 2 1 Yale 2 Technion Counters Model Collects Our goal: build a cheap counter for an asynchronous shared-memory system. Two operations: increment and read. Read
More informationCS599: Algorithm Design in Strategic Settings Fall 2012 Lecture 6: Prior-Free Single-Parameter Mechanism Design (Continued)
CS599: Algorithm Design in Strategic Settings Fall 2012 Lecture 6: Prior-Free Single-Parameter Mechanism Design (Continued) Instructor: Shaddin Dughmi Administrivia Homework 1 due today. Homework 2 out
More informationThe University of Chicago, Booth School of Business Business 41202, Spring Quarter 2011, Mr. Ruey S. Tsay. Solutions to Final Exam.
The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2011, Mr. Ruey S. Tsay Solutions to Final Exam Problem A: (32 pts) Answer briefly the following questions. 1. Suppose
More informationModelling Returns: the CER and the CAPM
Modelling Returns: the CER and the CAPM Carlo Favero Favero () Modelling Returns: the CER and the CAPM 1 / 20 Econometric Modelling of Financial Returns Financial data are mostly observational data: they
More informationOutline. 1 Introduction. 2 Algorithms. 3 Examples. Algorithm 1 General coordinate minimization framework. 1: Choose x 0 R n and set k 0.
Outline Coordinate Minimization Daniel P. Robinson Department of Applied Mathematics and Statistics Johns Hopkins University November 27, 208 Introduction 2 Algorithms Cyclic order with exact minimization
More informationYao s Minimax Principle
Complexity of algorithms The complexity of an algorithm is usually measured with respect to the size of the input, where size may for example refer to the length of a binary word describing the input,
More informationGame Theory: Normal Form Games
Game Theory: Normal Form Games Michael Levet June 23, 2016 1 Introduction Game Theory is a mathematical field that studies how rational agents make decisions in both competitive and cooperative situations.
More informationA potentially useful approach to model nonlinearities in time series is to assume different behavior (structural break) in different subsamples
1.3 Regime switching models A potentially useful approach to model nonlinearities in time series is to assume different behavior (structural break) in different subsamples (or regimes). If the dates, the
More informationA No-Arbitrage Theorem for Uncertain Stock Model
Fuzzy Optim Decis Making manuscript No (will be inserted by the editor) A No-Arbitrage Theorem for Uncertain Stock Model Kai Yao Received: date / Accepted: date Abstract Stock model is used to describe
More informationLecture 5. 1 Online Learning. 1.1 Learning Setup (Perspective of Universe) CSCI699: Topics in Learning & Game Theory
CSCI699: Topics in Learning & Game Theory Lecturer: Shaddin Dughmi Lecture 5 Scribes: Umang Gupta & Anastasia Voloshinov In this lecture, we will give a brief introduction to online learning and then go
More informationTTIC An Introduction to the Theory of Machine Learning. Learning and Game Theory. Avrim Blum 5/7/18, 5/9/18
TTIC 31250 An Introduction to the Theory of Machine Learning Learning and Game Theory Avrim Blum 5/7/18, 5/9/18 Zero-sum games, Minimax Optimality & Minimax Thm; Connection to Boosting & Regret Minimization
More informationDRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics
Chapter 12 American Put Option Recall that the American option has strike K and maturity T and gives the holder the right to exercise at any time in [0, T ]. The American option is not straightforward
More informationThe Fixed Income Valuation Course. Sanjay K. Nawalkha Gloria M. Soto Natalia A. Beliaeva
Interest Rate Risk Modeling The Fixed Income Valuation Course Sanjay K. Nawalkha Gloria M. Soto Natalia A. Beliaeva Interest t Rate Risk Modeling : The Fixed Income Valuation Course. Sanjay K. Nawalkha,
More informationAsymptotic results discrete time martingales and stochastic algorithms
Asymptotic results discrete time martingales and stochastic algorithms Bernard Bercu Bordeaux University, France IFCAM Summer School Bangalore, India, July 2015 Bernard Bercu Asymptotic results for discrete
More informationStrategies and Nash Equilibrium. A Whirlwind Tour of Game Theory
Strategies and Nash Equilibrium A Whirlwind Tour of Game Theory (Mostly from Fudenberg & Tirole) Players choose actions, receive rewards based on their own actions and those of the other players. Example,
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 11 10/9/2013. Martingales and stopping times II
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 11 10/9/013 Martingales and stopping times II Content. 1. Second stopping theorem.. Doob-Kolmogorov inequality. 3. Applications of stopping
More informationM5MF6. Advanced Methods in Derivatives Pricing
Course: Setter: M5MF6 Dr Antoine Jacquier MSc EXAMINATIONS IN MATHEMATICS AND FINANCE DEPARTMENT OF MATHEMATICS April 2016 M5MF6 Advanced Methods in Derivatives Pricing Setter s signature...........................................
More informationBayesian Linear Model: Gory Details
Bayesian Linear Model: Gory Details Pubh7440 Notes By Sudipto Banerjee Let y y i ] n i be an n vector of independent observations on a dependent variable (or response) from n experimental units. Associated
More informationMath-Stat-491-Fall2014-Notes-V
Math-Stat-491-Fall2014-Notes-V Hariharan Narayanan December 7, 2014 Martingales 1 Introduction Martingales were originally introduced into probability theory as a model for fair betting games. Essentially
More informationIs Greedy Coordinate Descent a Terrible Algorithm?
Is Greedy Coordinate Descent a Terrible Algorithm? Julie Nutini, Mark Schmidt, Issam Laradji, Michael Friedlander, Hoyt Koepke University of British Columbia Optimization and Big Data, 2015 Context: Random
More informationWeb Appendix: Proofs and extensions.
B eb Appendix: Proofs and extensions. B.1 Proofs of results about block correlated markets. This subsection provides proofs for Propositions A1, A2, A3 and A4, and the proof of Lemma A1. Proof of Proposition
More informationThe Real Numbers. Here we show one way to explicitly construct the real numbers R. First we need a definition.
The Real Numbers Here we show one way to explicitly construct the real numbers R. First we need a definition. Definitions/Notation: A sequence of rational numbers is a funtion f : N Q. Rather than write
More information4 Reinforcement Learning Basic Algorithms
Learning in Complex Systems Spring 2011 Lecture Notes Nahum Shimkin 4 Reinforcement Learning Basic Algorithms 4.1 Introduction RL methods essentially deal with the solution of (optimal) control problems
More information3 The Model Existence Theorem
3 The Model Existence Theorem Although we don t have compactness or a useful Completeness Theorem, Henkinstyle arguments can still be used in some contexts to build models. In this section we describe
More informationNon replication of options
Non replication of options Christos Kountzakis, Ioannis A Polyrakis and Foivos Xanthos June 30, 2008 Abstract In this paper we study the scarcity of replication of options in the two period model of financial
More informationLecture 17: More on Markov Decision Processes. Reinforcement learning
Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture
More informationBrownian Motion. Richard Lockhart. Simon Fraser University. STAT 870 Summer 2011
Brownian Motion Richard Lockhart Simon Fraser University STAT 870 Summer 2011 Richard Lockhart (Simon Fraser University) Brownian Motion STAT 870 Summer 2011 1 / 33 Purposes of Today s Lecture Describe
More informationStatistical Tables Compiled by Alan J. Terry
Statistical Tables Compiled by Alan J. Terry School of Science and Sport University of the West of Scotland Paisley, Scotland Contents Table 1: Cumulative binomial probabilities Page 1 Table 2: Cumulative
More informationCHOICE THEORY, UTILITY FUNCTIONS AND RISK AVERSION
CHOICE THEORY, UTILITY FUNCTIONS AND RISK AVERSION Szabolcs Sebestyén szabolcs.sebestyen@iscte.pt Master in Finance INVESTMENTS Sebestyén (ISCTE-IUL) Choice Theory Investments 1 / 65 Outline 1 An Introduction
More informationThe University of Chicago, Booth School of Business Business 41202, Spring Quarter 2009, Mr. Ruey S. Tsay. Solutions to Final Exam
The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2009, Mr. Ruey S. Tsay Solutions to Final Exam Problem A: (42 pts) Answer briefly the following questions. 1. Questions
More informationEvaluating Strategic Forecasters. Rahul Deb with Mallesh Pai (Rice) and Maher Said (NYU Stern) Becker Friedman Theory Conference III July 22, 2017
Evaluating Strategic Forecasters Rahul Deb with Mallesh Pai (Rice) and Maher Said (NYU Stern) Becker Friedman Theory Conference III July 22, 2017 Motivation Forecasters are sought after in a variety of
More informationLECTURE NOTES 10 ARIEL M. VIALE
LECTURE NOTES 10 ARIEL M VIALE 1 Behavioral Asset Pricing 11 Prospect theory based asset pricing model Barberis, Huang, and Santos (2001) assume a Lucas pure-exchange economy with three types of assets:
More informationChapter 3: Black-Scholes Equation and Its Numerical Evaluation
Chapter 3: Black-Scholes Equation and Its Numerical Evaluation 3.1 Itô Integral 3.1.1 Convergence in the Mean and Stieltjes Integral Definition 3.1 (Convergence in the Mean) A sequence {X n } n ln of random
More informationBasic Procedure for Histograms
Basic Procedure for Histograms 1. Compute the range of observations (min. & max. value) 2. Choose an initial # of classes (most likely based on the range of values, try and find a number of classes that
More informationAD in Monte Carlo for finance
AD in Monte Carlo for finance Mike Giles giles@comlab.ox.ac.uk Oxford University Computing Laboratory AD & Monte Carlo p. 1/30 Overview overview of computational finance stochastic o.d.e. s Monte Carlo
More informationLog-linear Dynamics and Local Potential
Log-linear Dynamics and Local Potential Daijiro Okada and Olivier Tercieux [This version: November 28, 2008] Abstract We show that local potential maximizer ([15]) with constant weights is stochastically
More information1 Dynamic programming
1 Dynamic programming A country has just discovered a natural resource which yields an income per period R measured in terms of traded goods. The cost of exploitation is negligible. The government wants
More informationMAT 4250: Lecture 1 Eric Chung
1 MAT 4250: Lecture 1 Eric Chung 2Chapter 1: Impartial Combinatorial Games 3 Combinatorial games Combinatorial games are two-person games with perfect information and no chance moves, and with a win-or-lose
More informationAmerican options and early exercise
Chapter 3 American options and early exercise American options are contracts that may be exercised early, prior to expiry. These options are contrasted with European options for which exercise is only
More informationThe stochastic calculus
Gdansk A schedule of the lecture Stochastic differential equations Ito calculus, Ito process Ornstein - Uhlenbeck (OU) process Heston model Stopping time for OU process Stochastic differential equations
More informationFIGURE A1.1. Differences for First Mover Cutoffs (Round one to two) as a Function of Beliefs on Others Cutoffs. Second Mover Round 1 Cutoff.
APPENDIX A. SUPPLEMENTARY TABLES AND FIGURES A.1. Invariance to quantitative beliefs. Figure A1.1 shows the effect of the cutoffs in round one for the second and third mover on the best-response cutoffs
More informationCounting Basics. Venn diagrams
Counting Basics Sets Ways of specifying sets Union and intersection Universal set and complements Empty set and disjoint sets Venn diagrams Counting Inclusion-exclusion Multiplication principle Addition
More informationSTOCHASTIC REPUTATION DYNAMICS UNDER DUOPOLY COMPETITION
STOCHASTIC REPUTATION DYNAMICS UNDER DUOPOLY COMPETITION BINGCHAO HUANGFU Abstract This paper studies a dynamic duopoly model of reputation-building in which reputations are treated as capital stocks that
More informationMonte Carlo Methods in Option Pricing. UiO-STK4510 Autumn 2015
Monte Carlo Methods in Option Pricing UiO-STK4510 Autumn 015 The Basics of Monte Carlo Method Goal: Estimate the expectation θ = E[g(X)], where g is a measurable function and X is a random variable such
More informationWeek 1 Quantitative Analysis of Financial Markets Basic Statistics A
Week 1 Quantitative Analysis of Financial Markets Basic Statistics A Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 October
More informationAn Intertemporal Capital Asset Pricing Model
I. Assumptions Finance 400 A. Penati - G. Pennacchi Notes on An Intertemporal Capital Asset Pricing Model These notes are based on the article Robert C. Merton (1973) An Intertemporal Capital Asset Pricing
More informationDynamic Replication of Non-Maturing Assets and Liabilities
Dynamic Replication of Non-Maturing Assets and Liabilities Michael Schürle Institute for Operations Research and Computational Finance, University of St. Gallen, Bodanstr. 6, CH-9000 St. Gallen, Switzerland
More informationBest-Reply Sets. Jonathan Weinstein Washington University in St. Louis. This version: May 2015
Best-Reply Sets Jonathan Weinstein Washington University in St. Louis This version: May 2015 Introduction The best-reply correspondence of a game the mapping from beliefs over one s opponents actions to
More informationLecture 11: Bandits with Knapsacks
CMSC 858G: Bandits, Experts and Games 11/14/16 Lecture 11: Bandits with Knapsacks Instructor: Alex Slivkins Scribed by: Mahsa Derakhshan 1 Motivating Example: Dynamic Pricing The basic version of the dynamic
More informationIEOR E4004: Introduction to OR: Deterministic Models
IEOR E4004: Introduction to OR: Deterministic Models 1 Dynamic Programming Following is a summary of the problems we discussed in class. (We do not include the discussion on the container problem or the
More informationTheoretical Statistics. Lecture 4. Peter Bartlett
1. Concentration inequalities. Theoretical Statistics. Lecture 4. Peter Bartlett 1 Outline of today s lecture We have been looking at deviation inequalities, i.e., bounds on tail probabilities likep(x
More informationmonotone circuit value
monotone circuit value A monotone boolean circuit s output cannot change from true to false when one input changes from false to true. Monotone boolean circuits are hence less expressive than general circuits.
More informationPh.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017
Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2017 The time limit for this exam is four hours. The exam has four sections. Each section includes two questions.
More informationNo-arbitrage theorem for multi-factor uncertain stock model with floating interest rate
Fuzzy Optim Decis Making 217 16:221 234 DOI 117/s17-16-9246-8 No-arbitrage theorem for multi-factor uncertain stock model with floating interest rate Xiaoyu Ji 1 Hua Ke 2 Published online: 17 May 216 Springer
More informationCentral Limit Theorem for the Realized Volatility based on Tick Time Sampling. Masaaki Fukasawa. University of Tokyo
Central Limit Theorem for the Realized Volatility based on Tick Time Sampling Masaaki Fukasawa University of Tokyo 1 An outline of this talk is as follows. What is the Realized Volatility (RV)? Known facts
More informationSingle Price Mechanisms for Revenue Maximization in Unlimited Supply Combinatorial Auctions
Single Price Mechanisms for Revenue Maximization in Unlimited Supply Combinatorial Auctions Maria-Florina Balcan Avrim Blum Yishay Mansour February 2007 CMU-CS-07-111 School of Computer Science Carnegie
More informationWrite legibly. Unreadable answers are worthless.
MMF 2021 Final Exam 1 December 2016. This is a closed-book exam: no books, no notes, no calculators, no phones, no tablets, no computers (of any kind) allowed. Do NOT turn this page over until you are
More informationTechniques for Calculating the Efficient Frontier
Techniques for Calculating the Efficient Frontier Weerachart Kilenthong RIPED, UTCC c Kilenthong 2017 Tee (Riped) Introduction 1 / 43 Two Fund Theorem The Two-Fund Theorem states that we can reach any
More informationRisk Neutral Measures
CHPTER 4 Risk Neutral Measures Our aim in this section is to show how risk neutral measures can be used to price derivative securities. The key advantage is that under a risk neutral measure the discounted
More informationHomework Assignments
Homework Assignments Week 1 (p. 57) #4.1, 4., 4.3 Week (pp 58 6) #4.5, 4.6, 4.8(a), 4.13, 4.0, 4.6(b), 4.8, 4.31, 4.34 Week 3 (pp 15 19) #1.9, 1.1, 1.13, 1.15, 1.18 (pp 9 31) #.,.6,.9 Week 4 (pp 36 37)
More informationThe University of Chicago, Booth School of Business Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay. Solutions to Final Exam
The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay Solutions to Final Exam Problem A: (40 points) Answer briefly the following questions. 1. Consider
More informationRepresenting Risk Preferences in Expected Utility Based Decision Models
Representing Risk Preferences in Expected Utility Based Decision Models Jack Meyer Department of Economics Michigan State University East Lansing, MI 48824 jmeyer@msu.edu SCC-76: Economics and Management
More informationValue of Flexibility in Managing R&D Projects Revisited
Value of Flexibility in Managing R&D Projects Revisited Leonardo P. Santiago & Pirooz Vakili November 2004 Abstract In this paper we consider the question of whether an increase in uncertainty increases
More informationComprehensive Exam. August 19, 2013
Comprehensive Exam August 19, 2013 You have a total of 180 minutes to complete the exam. If a question seems ambiguous, state why, sharpen it up and answer the sharpened-up question. Good luck! 1 1 Menu
More informationYou Have an NP-Complete Problem (for Your Thesis)
You Have an NP-Complete Problem (for Your Thesis) From Propositions 27 (p. 242) and Proposition 30 (p. 245), it is the least likely to be in P. Your options are: Approximations. Special cases. Average
More informationMonte-Carlo Planning: Introduction and Bandit Basics. Alan Fern
Monte-Carlo Planning: Introduction and Bandit Basics Alan Fern 1 Large Worlds We have considered basic model-based planning algorithms Model-based planning: assumes MDP model is available Methods we learned
More informationAn Axiomatic Approach to Arbitration and Its Application in Bargaining Games
An Axiomatic Approach to Arbitration and Its Application in Bargaining Games Kang Rong School of Economics, Shanghai University of Finance and Economics Aug 30, 2012 Abstract We define an arbitration problem
More information