Treatment Allocations Based on Multi-Armed Bandit Strategies
1 Treatment Allocations Based on Multi-Armed Bandit Strategies Wei Qian and Yuhong Yang Applied Economics and Statistics, University of Delaware School of Statistics, University of Minnesota Innovative Statistics and Machine Learning for Precision Medicine September 15, 2017
2 Outline: 1. Bandit Problems 2. Methodology and Theory 3. Model Combining 4. Numerical Studies 5. Conclusion
3 Standard Multi-Armed Bandit Problem There is a wall of slot machines. [figure: machines with unknown winning percentages] Each machine has a certain winning probability of paying $1. Chances of winning are unknown to the game player. At each time, one and only one machine can be played, and the immediate result is observed. Goal: maximize the total number of wins over N plays.
5 Exploration-Exploitation Tradeoff Exploration: pull each arm enough times to learn the true reward probabilities. Exploitation: use the existing information and play the empirically best arm.
6 Motivation: Ethical Clinical Studies Slot machines: different treatments for a certain disease. Survival probability: unknown to the doctor. Goal: sequentially assign treatments to patients to maximize the survival rate.
7 A Real Example: ECMO Trial ECMO for treating newborns with persistent pulmonary hypertension? Ethical dilemma of using a conventional randomized controlled trial: current patients versus future patients; two hats on a participating doctor. A solution is response-adaptive design. L.J. Wei's randomized version of the play-the-winner rule was used in a study. The ECMO trial has generated a lot of discussion; see, e.g., two Statistical Science papers in 1989 and 1991.
8 Motivation: Online Services Web applications are generating massive data streams. Online recommendation systems: recommend articles to online newspaper readers; recommend products to customers of online retailers.
10 Motivation: Bandit Problem for Online Services Slot machines: multiple articles. Each internet visit: one and only one article delivered. Clicking probability: unknown to the internet company. Goal: sequentially choose an article for internet users to maximize the total number of clicks, i.e., the click-through rate (CTR).
12 Bandit Problem with Covariates The standard bandit problem assumes constant winning probabilities. In practice, the winning probability can depend on covariates. Personalized medical service: treatment effects (e.g., survival probability) can be associated with patients' prognostic factors.
13 Personalized Web Service Personalized online advertising and article recommendation: an internet user's interest in an ad or an article can be associated with some user information.
14 Multi-Armed Bandit with Covariates (MABC) for Precision Medicine An example scenario: a few FDA-approved drugs are available on the market for treating a certain disease. Currently, doctors perhaps choose among the available drugs based on limited information and readings of scattered publications, if any. Why not use the MABC framework for better medical practice?
15 Two-Armed Bandit Problem with Covariates Two treatments (news articles): A and B. Patient (user) covariate x ∈ [0, 1]. Recovering (clicking) probability: f_A(x), f_B(x). [figure: clicking probability curves f_A(x) and f_B(x) against x]
16 Problem Setup: Two-Armed Bandit with Covariates Given a bandit problem with two arms: treatments A and B. Unknown recovering probabilities given covariate x ∈ [0, 1]^d: f_A(x), f_B(x). Covariates X_n i.i.d. from a continuous distribution P_X. At each time n: (1) observe the patient covariate X_n ~ P_X; (2) based on previous observations and X_n, apply a sequential allocation algorithm to choose the treatment I_n ∈ {A, B}; (3) observe the result Y_{I_n,n} ~ Bernoulli(f_{I_n}(X_n)): recover, Y_{I_n,n} = 1; otherwise, Y_{I_n,n} = 0. Question: how to design the sequential allocation algorithm?
19 A Measure of Performance: Regret Given patient covariate x, the optimal strategy gives the treatment I*(x) := argmax_{i ∈ {A,B}} f_i(x), with optimal recovering probability f*(x) := max_{i ∈ {A,B}} f_i(x). Suppose at time n the patient covariate X_n is observed: the optimal choice is I*(X_n), while the algorithm chooses treatment I_n, incurring regret_n = f*(X_n) − f_{I_n}(X_n). To measure the overall performance, consider the cumulative regret R_N := Σ_{n=1}^N ( f*(X_n) − f_{I_n}(X_n) ). An algorithm is strongly consistent if R_N = o(N) almost surely.
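As a plain numerical sketch of cumulative regret (the reward curves f_A, f_B and the deliberately naive always-A rule are illustrative assumptions, not the talk's setting or algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical recovering probabilities (illustrative, not from the talk)
f_A = lambda x: 0.4 + 0.3 * x
f_B = lambda x: 0.7 - 0.3 * x

N = 1000
X = rng.uniform(0, 1, N)              # covariates X_n ~ P_X
f_star = np.maximum(f_A(X), f_B(X))   # f*(X_n): optimal recovering probability

# A naive rule that always assigns treatment A
f_chosen = f_A(X)

R = np.cumsum(f_star - f_chosen)      # cumulative regret R_1, ..., R_N
print(R[-1] / N)                      # per-round regret does not vanish here
```

Because always-A ignores the covariate, R_N grows linearly in N; a strongly consistent rule would drive R_N/N to zero.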
22 Model Assumptions on f_A and f_B Parametric framework (linear models): Woodroofe, 1979; Auer, 2002; Li et al., 2010; Goldenshluger and Zeevi, 2009, 2013; Bastani and Bayati, 2016. Nonparametric framework: Yang and Zhu, 2002; Rigollet and Zeevi, 2010; Perchet and Rigollet, 2013.
23 Algorithms Two articles A and B with clicking probabilities f_A(x) and f_B(x). (1) Deliver each article an equal number of times (e.g., each is delivered n_0 = 20 times): I_1 = A, I_2 = B, ..., I_{2n_0−1} = A, I_{2n_0} = B. (2) For the next internet visit (n = 2n_0 + 1), observe the internet user covariate X_n. (3) Estimate f_A and f_B using previous data to obtain f̂_{A,n} and f̂_{B,n}. (4) Find the more promising option î_n = argmax_{i ∈ {A,B}} f̂_{i,n}(X_n); deliver an article with the randomization scheme I_n = î_n with probability 1 − π_n, and I_n = i with probability π_n for i ≠ î_n. Observe the result Y_{I_n,n}.
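The four steps can be sketched end to end. Everything here beyond the structure of the algorithm (the reward curves, the bandwidth rate n^{-1/3}, the exploration schedule 1/log²(n), and the 0.5 default for an empty window) is an illustrative assumption, not the talk's exact tuning:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical clicking probabilities for articles A and B
f = {"A": lambda x: 0.4 + 0.3 * x, "B": lambda x: 0.7 - 0.3 * x}

def nw(x, xs, ys, h):
    """Nadaraya-Watson estimate with a uniform kernel; 0.5 when no data is in the window."""
    xs, ys = np.asarray(xs), np.asarray(ys)
    w = np.abs(xs - x) <= h
    return ys[w].mean() if w.any() else 0.5

data = {"A": ([], []), "B": ([], [])}   # per-arm (covariates, responses)
N, n0, clicks = 2000, 20, 0
for n in range(1, N + 1):
    x = rng.uniform()
    if n <= 2 * n0:                                  # step 1: forced alternation
        arm = "A" if n % 2 == 1 else "B"
    else:
        h = n ** (-1 / 3)                            # shrinking bandwidth (assumed rate)
        pi = min(0.5, 1 / np.log(n) ** 2)            # shrinking exploration probability
        est = {a: nw(x, *data[a], h) for a in ("A", "B")}   # step 3: estimate both arms
        best = max(est, key=est.get)                 # step 4: more promising arm
        arm = best if rng.uniform() > pi else ("B" if best == "A" else "A")
    y = rng.binomial(1, f[arm](x))                   # observe Bernoulli response
    data[arm][0].append(x)
    data[arm][1].append(y)
    clicks += y
print(clicks / N)   # empirical CTR; should beat the 0.55 uniform-random baseline
```

In the talk's illustration π_n starts around 20% and decreases; π_n = 1/log²(n) is just one convenient choice of schedule.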
24 Kernel Estimation Given article A, at each time point n, define J_{A,n} = {j : I_j = A, 1 ≤ j ≤ n − 1}. Nadaraya-Watson estimator of f_A(x): f̂_{A,n}(x) = Σ_{j ∈ J_{A,n}} Y_{A,j} K((x − X_j)/h_n) / Σ_{j ∈ J_{A,n}} K((x − X_j)/h_n), with kernel function K(u) : R^d → R and bandwidth h_n. Epanechnikov quadratic kernel: K(u) = (3/4)(1 − u²) I(|u| ≤ 1).
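A minimal sketch of the Nadaraya-Watson estimator with the Epanechnikov kernel for d = 1 (the toy target f(x) = x and sample size are assumptions for the sanity check):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov quadratic kernel K(u) = (3/4)(1 - u^2) I(|u| <= 1)."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def nadaraya_watson(x, xs, ys, h):
    """NW estimate of f(x) from one arm's past pairs (X_j, Y_j), bandwidth h."""
    k = epanechnikov((x - xs) / h)
    s = k.sum()
    return float(np.dot(ys, k) / s) if s > 0 else float("nan")

# Toy check: Bernoulli responses with true f(x) = x
rng = np.random.default_rng(0)
xs = rng.uniform(0, 1, 5000)
ys = rng.binomial(1, xs)
print(nadaraya_watson(0.5, xs, ys, h=0.1))  # should be near 0.5
```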
25 A UCB-Type Kernel Estimator Upper Confidence Bound (UCB) kernel estimator: f̂_{A,n}(x) = Σ_{j ∈ J_{A,n}} Y_{A,j} K((x − X_j)/h_n) / Σ_{j ∈ J_{A,n}} K((x − X_j)/h_n) + U_{A,n}(x), where U_{A,n}(x) is a standard error quantity: U_{A,n}(x) = c √( (log N) Σ_{j ∈ J_{A,n}} K²((x − X_j)/h_n) ) / Σ_{j ∈ J_{A,n}} K((x − X_j)/h_n). Under the uniform kernel K(u) = I(|u| ≤ 1), with N_{A,n}(x) = Σ_{j ∈ J_{A,n}} I(|X_j − x| ≤ h_n), this reduces to U_{A,n}(x) = c √( log N / N_{A,n}(x) ).
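Under the uniform kernel the UCB-type estimator reduces to a local average plus c·√(log N / N_{A,n}(x)). A sketch of that special case (the constant c = 2 and the toy data are assumptions):

```python
import numpy as np

def ucb_kernel(x, xs, ys, h, horizon, c=2.0):
    """Uniform-kernel UCB estimate: local mean + c * sqrt(log(horizon) / N_x).

    N_x counts this arm's past covariates within [x - h, x + h].
    Returns +inf when the window is empty (maximally optimistic).
    """
    in_win = np.abs(xs - x) <= h
    n_x = int(in_win.sum())
    if n_x == 0:
        return float("inf")
    return ys[in_win].mean() + c * np.sqrt(np.log(horizon) / n_x)

rng = np.random.default_rng(0)
xs = rng.uniform(0, 1, 200)
ys = rng.binomial(1, 0.3, 200)  # constant true clicking probability 0.3
u = ucb_kernel(0.5, xs, ys, h=0.1, horizon=200)
print(u)  # local mean plus an exploration bonus
```

The bonus shrinks as N_{A,n}(x) grows, so well-explored regions of the covariate space stop being inflated.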
26 Algorithm Illustration Deliver each article 20 times. X_1 = 0.93, article A (time n = 1, n_A = 1, n_B = 0). X_2 = 0.88, article B (time n = 2, n_A = 1, n_B = 1). [figures: clicking probability versus x]
30 Algorithm Illustration Deliver each article 20 times. Time n = 40, n_A = 20, n_B = 20. [figure: clicking probability versus x]
31 Algorithm Illustration Observe X_41; estimate f_A(X_41) and f_B(X_41) by kernel estimation. Time n = 40, n_A = 20, n_B = 20. [figure: clicking probability versus x]
32 Algorithm Illustration Estimate f_A(X_41): consider a window [X_41 − h, X_41 + h]; similar information may give a similar clicking probability. Here f̂_A(X_41) = 0. Estimate f̂_B(X_41) from its own window in the same way. Time n = 40, n_A = 20, n_B = 20. [figures: clicking probability versus x]
36 Algorithm Illustration Article B looks more promising: f̂_A(X_41) < f̂_B(X_41). With π_n = 20%: P(I_41 = B | H_41) = 80%, P(I_41 = A | H_41) = 20%. Time n = 40, n_A = 20, n_B = 20. [figure: clicking probability versus x]
37 Algorithm Illustration Continue the process with decreasing h_n and π_n to the end. Time n = 800, n_A = 349, n_B = 451. [figure: clicking probability versus x]
38 Challenges and Contributions Partial information in the bandit problem. Breakdown of i.i.d. assumptions: existing consistency results for kernel estimation under i.i.d. or weak-dependence assumptions do not apply. Technical tools to develop new arguments: martingale theories, Hoeffding-type inequalities, chaining methods. Strong consistency and finite-time analysis. Dimension reduction and model combination.
41 Asymptotic Performance Theorem (Qian and Yang, JMLR, 2016a). If the f_i (i ∈ {A, B}) are uniformly continuous, and h_n and π_n are chosen to satisfy h_n → 0, π_n → 0 and n h_n^{2d} π_n / (log n)^4 → ∞, then the Nadaraya-Watson estimators are uniformly strongly consistent; that is, for each i ∈ {A, B}, sup_{x ∈ [0,1]^d} |f̂_{i,n}(x) − f_i(x)| → 0 a.s. as n → ∞. Uniform strong consistency of the estimators implies that R_N = o(N) almost surely; equivalently, (Σ_{n=1}^N Y_{I_n,n}) / (Σ_{n=1}^N Y*_n) → 1 a.s. as N → ∞, where Y*_n denotes the response under the optimal arm I*(X_n).
42 Finite-Time Regret Analysis Modulus of continuity: ω(h; f) = sup_{|x_1 − x_2| ≤ h} |f(x_1) − f(x_2)|. Hölder continuity: ω(h; f_i) ≤ ρ h^κ (0 < κ ≤ 1). Theorem (Qian and Yang, JMLR, 2016a). There exists n_δ ∈ N such that with probability larger than 1 − 2δ, R_N < C_1 n_δ + Σ_{n=n_δ}^N ( 2 max_{i ∈ {A,B}} ω(h_n; f_i) + C_2 √( log(n) / (n h_n^d π_n) ) + π_n ) + C_3 √( N log(1/δ) ). The summand upper-bounds f*(X_n) − f_{I_n}(X_n): estimation bias ω(h_n; f_i), estimation variance C_2 √( log(n)/(n h_n^d π_n) ), and exploration price π_n. Nonparametric estimation: bias-variance tradeoff. Bandit problem: exploration-exploitation tradeoff.
44 Finite-Time Regret Upper Bounds Under Hölder continuity, when using the kernel UCB-type estimator, ER_N < C N^{(d+κ)/(d+2κ)} (log N)^c. A larger d and a smaller κ give a larger power index. This matches the minimax rate of Perchet and Rigollet (2013) up to a logarithmic factor. Adaptive performance (Qian and Yang, EJS, 2016b): the near-minimax rate can be achieved without knowing κ a priori (0 < c ≤ κ ≤ 1).
46 Model Combining Different regression methods: kernel estimation, histogram, K-nearest neighbors, linear regression. Model combining: weighted average of different statistical models. AFTER (Yang, 2004) combines different forecasting procedures: a data-driven algorithm with robust performance.
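A minimal sketch of combining candidate models by exponential weighting in the spirit of AFTER (the squared-error loss, the learning rate η = 1, and the toy predictions are simplifying assumptions; Yang, 2004 derives the actual weighting scheme):

```python
import numpy as np

def combine(weights, preds):
    """Combined prediction: weighted average of the candidates' predictions."""
    return float(np.dot(weights, preds))

def update(weights, preds, y, eta=1.0):
    """Multiplicative update: models with larger squared error lose weight."""
    w = weights * np.exp(-eta * (preds - y) ** 2)
    return w / w.sum()

# Three hypothetical candidate models predicting a clicking probability
weights = np.full(3, 1 / 3)
stream = [(1, np.array([0.9, 0.5, 0.1])),
          (1, np.array([0.8, 0.5, 0.2])),
          (0, np.array([0.2, 0.4, 0.9]))]
for y, preds in stream:
    p = combine(weights, preds)   # prediction used before observing y
    weights = update(weights, preds, y)
print(weights)  # the consistently accurate first model ends up heaviest
```

The combined forecast automatically tracks whichever candidate has been predicting well so far, which is the sense in which the algorithm is adaptive.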
47 Model Combining Illustration [figure: clicking probability curves f_A(x) and f_B(x) against x] f_A(x) = 0.7e^{−30(x−0.2)²} + 0.7e^{−30(x−0.8)²}, f_B(x) = x. Time horizon N = 800, π_n = 1/log²(n). Model combining: (1) Nadaraya-Watson estimation (bandwidths h_1 and h_2); (2) linear regression.
48 Model Combining Adaptive Performance Per-round regret r_n = R_n/n. [figure: r_n versus n for the combined method, Nadaraya-Watson with h_1, Nadaraya-Watson with h_2, and linear regression]
49 Yahoo! Front Page Today Module Dataset 46 million internet visit events with user responses and five user covariates over ten days. Contains a pool of about 10 editor-picked news articles. The raw data file is 8 GB per day. Algorithms are implemented efficiently in C++. Potentially adapted for online applications.
50 Evaluation Results Algorithms evaluated by click-through rate (CTR): complete random; naive simple average (no covariates); LinUCB (Chapelle and Li, 2011), a Bayesian-logistic-regression-based algorithm; model combining with kernel estimation (h_1 = n^{−1/6}, h_2 = n^{−1/8}, h_3 = n^{−1/10}). [table: average normalized CTR and standard deviation for Random, Naive, LinUCB, and Combining]
51 Conclusion Precision medicine demands online learning for optimal treatment results. MABC provides a framework for designing effective treatment-allocation rules in a way that integrates learning from experimentation with maximizing the benefits to the patients along the process. Many theoretical and practical issues remain to be addressed.
52 Some References
Auer, P., Cesa-Bianchi, N. and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47.
Lai, T. L. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6.
Perchet, V. and Rigollet, P. (2013). The multi-armed bandit problem with covariates. The Annals of Statistics, 41.
Qian, W. and Yang, Y. (2016a). Kernel estimation and model combination in a bandit problem with covariates. Journal of Machine Learning Research, 17.
Qian, W. and Yang, Y. (2016b). Randomized allocation with arm elimination in a bandit problem with covariates. Electronic Journal of Statistics, 10.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58.
Woodroofe, M. (1979). A one-armed bandit problem with a concomitant variable. Journal of the American Statistical Association, 74.
Yang, Y. (2004). Combining forecasting procedures: some theoretical results. Econometric Theory, 20.
Yang, Y. and Zhu, D. (2002). Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. The Annals of Statistics, 30.
Yahoo! Academic Relations (2011). Yahoo! front page today module user click log dataset, version 1.0.
More informationMonte-Carlo Planning Look Ahead Trees. Alan Fern
Monte-Carlo Planning Look Ahead Trees Alan Fern 1 Monte-Carlo Planning Outline Single State Case (multi-armed bandits) A basic tool for other algorithms Monte-Carlo Policy Improvement Policy rollout Policy
More information1 Bandit View on Noisy Optimization
1 Bandit View on Noisy Optimization Jean-Yves Audibert audibert@certis.enpc.fr Imagine, Université Paris Est; Willow, CNRS/ENS/INRIA Paris, France Sébastien Bubeck sebastien.bubeck@inria.fr Sequel Project,
More informationBandit Learning with switching costs
Bandit Learning with switching costs Jian Ding, University of Chicago joint with: Ofer Dekel (MSR), Tomer Koren (Technion) and Yuval Peres (MSR) June 2016, Harvard University Online Learning with k -Actions
More informationProblem Set 3. Thomas Philippon. April 19, Human Wealth, Financial Wealth and Consumption
Problem Set 3 Thomas Philippon April 19, 2002 1 Human Wealth, Financial Wealth and Consumption The goal of the question is to derive the formulas on p13 of Topic 2. This is a partial equilibrium analysis
More informationUniversal Portfolios
CS28B/Stat24B (Spring 2008) Statistical Learning Theory Lecture: 27 Universal Portfolios Lecturer: Peter Bartlett Scribes: Boriska Toth and Oriol Vinyals Portfolio optimization setting Suppose we have
More informationDynamic Programming and Reinforcement Learning
Dynamic Programming and Reinforcement Learning Daniel Russo Columbia Business School Decision Risk and Operations Division Fall, 2017 Daniel Russo (Columbia) Fall 2017 1 / 34 Supervised Machine Learning
More informationMATH3075/3975 FINANCIAL MATHEMATICS TUTORIAL PROBLEMS
MATH307/37 FINANCIAL MATHEMATICS TUTORIAL PROBLEMS School of Mathematics and Statistics Semester, 04 Tutorial problems should be used to test your mathematical skills and understanding of the lecture material.
More informationCore-Selecting Auction Design for Dynamically Allocating Heterogeneous VMs in Cloud Computing
Core-Selecting Auction Design for Dynamically Allocating Heterogeneous VMs in Cloud Computing Haoming Fu, Zongpeng Li, Chuan Wu, Xiaowen Chu University of Calgary The University of Hong Kong Hong Kong
More informationDynamic Pricing for Competing Sellers
Clemson University TigerPrints All Theses Theses 8-2015 Dynamic Pricing for Competing Sellers Liu Zhu Clemson University, liuz@clemson.edu Follow this and additional works at: https://tigerprints.clemson.edu/all_theses
More informationAn introduction to game-theoretic probability from statistical viewpoint
.. An introduction to game-theoretic probability from statistical viewpoint Akimichi Takemura (joint with M.Kumon, K.Takeuchi and K.Miyabe) University of Tokyo May 14, 2013 RPTC2013 Takemura (Univ. of
More informationFuel-Switching Capability
Fuel-Switching Capability Alain Bousquet and Norbert Ladoux y University of Toulouse, IDEI and CEA June 3, 2003 Abstract Taking into account the link between energy demand and equipment choice, leads to
More informationSOCIETY OF ACTUARIES EXAM STAM SHORT-TERM ACTUARIAL MATHEMATICS EXAM STAM SAMPLE QUESTIONS
SOCIETY OF ACTUARIES EXAM STAM SHORT-TERM ACTUARIAL MATHEMATICS EXAM STAM SAMPLE QUESTIONS Questions 1-307 have been taken from the previous set of Exam C sample questions. Questions no longer relevant
More informationCS364B: Frontiers in Mechanism Design Lecture #18: Multi-Parameter Revenue-Maximization
CS364B: Frontiers in Mechanism Design Lecture #18: Multi-Parameter Revenue-Maximization Tim Roughgarden March 5, 2014 1 Review of Single-Parameter Revenue Maximization With this lecture we commence the
More informationIEOR E4602: Quantitative Risk Management
IEOR E4602: Quantitative Risk Management Risk Measures Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Reference: Chapter 8
More informationProbability. An intro for calculus students P= Figure 1: A normal integral
Probability An intro for calculus students.8.6.4.2 P=.87 2 3 4 Figure : A normal integral Suppose we flip a coin 2 times; what is the probability that we get more than 2 heads? Suppose we roll a six-sided
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 9 Sep, 28, 2016 Slide 1 CPSC 422, Lecture 9 An MDP Approach to Multi-Category Patient Scheduling in a Diagnostic Facility Adapted from: Matthew
More informationAN ONLINE LEARNING APPROACH TO ALGORITHMIC BIDDING FOR VIRTUAL TRADING
AN ONLINE LEARNING APPROACH TO ALGORITHMIC BIDDING FOR VIRTUAL TRADING Lang Tong School of Electrical & Computer Engineering Cornell University, Ithaca, NY Joint work with Sevi Baltaoglu and Qing Zhao
More informationAvailable online at ScienceDirect. Procedia Computer Science 95 (2016 )
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 95 (2016 ) 483 488 Complex Adaptive Systems, Publication 6 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri
More informationChapter 7 1. Random Variables
Chapter 7 1 Random Variables random variable numerical variable whose value depends on the outcome of a chance experiment - discrete if its possible values are isolated points on a number line - continuous
More informationInformation Theory and Networks
Information Theory and Networks Lecture 18: Information Theory and the Stock Market Paul Tune http://www.maths.adelaide.edu.au/matthew.roughan/ Lecture_notes/InformationTheory/
More informationFinancial Risk Modeling on Low-power Accelerators: Experimental Performance Evaluation of TK1 with FPGA
Financial Risk Modeling on Low-power Accelerators: Experimental Performance Evaluation of TK1 with FPGA Rajesh Bordawekar and Daniel Beece IBM T. J. Watson Research Center 3/17/2015 2014 IBM Corporation
More informationJournal of Computational and Applied Mathematics. The mean-absolute deviation portfolio selection problem with interval-valued returns
Journal of Computational and Applied Mathematics 235 (2011) 4149 4157 Contents lists available at ScienceDirect Journal of Computational and Applied Mathematics journal homepage: www.elsevier.com/locate/cam
More informationStatistics for Business and Economics
Statistics for Business and Economics Chapter 5 Continuous Random Variables and Probability Distributions Ch. 5-1 Probability Distributions Probability Distributions Ch. 4 Discrete Continuous Ch. 5 Probability
More informationFinal exam solutions
EE365 Stochastic Control / MS&E251 Stochastic Decision Models Profs. S. Lall, S. Boyd June 5 6 or June 6 7, 2013 Final exam solutions This is a 24 hour take-home final. Please turn it in to one of the
More informationLecture Note 6 of Bus 41202, Spring 2017: Alternative Approaches to Estimating Volatility.
Lecture Note 6 of Bus 41202, Spring 2017: Alternative Approaches to Estimating Volatility. Some alternative methods: (Non-parametric methods) Moving window estimates Use of high-frequency financial data
More informationLecture 12: Introduction to reasoning under uncertainty. Actions and Consequences
Lecture 12: Introduction to reasoning under uncertainty Preferences Utility functions Maximizing expected utility Value of information Bandit problems and the exploration-exploitation trade-off COMP-424,
More informationEntropic Derivative Security Valuation
Entropic Derivative Security Valuation Michael Stutzer 1 Professor of Finance and Director Burridge Center for Securities Analysis and Valuation University of Colorado, Boulder, CO 80309 1 Mathematical
More informationSupplemental Online Appendix to Han and Hong, Understanding In-House Transactions in the Real Estate Brokerage Industry
Supplemental Online Appendix to Han and Hong, Understanding In-House Transactions in the Real Estate Brokerage Industry Appendix A: An Agent-Intermediated Search Model Our motivating theoretical framework
More informationMath 416/516: Stochastic Simulation
Math 416/516: Stochastic Simulation Haijun Li lih@math.wsu.edu Department of Mathematics Washington State University Week 13 Haijun Li Math 416/516: Stochastic Simulation Week 13 1 / 28 Outline 1 Simulation
More information12 The Bootstrap and why it works
12 he Bootstrap and why it works For a review of many applications of bootstrap see Efron and ibshirani (1994). For the theory behind the bootstrap see the books by Hall (1992), van der Waart (2000), Lahiri
More informationHigh Dimensional Edgeworth Expansion. Applications to Bootstrap and Its Variants
With Applications to Bootstrap and Its Variants Department of Statistics, UC Berkeley Stanford-Berkeley Colloquium, 2016 Francis Ysidro Edgeworth (1845-1926) Peter Gavin Hall (1951-2016) Table of Contents
More informationDO NOT OPEN THIS QUESTION PAPER UNTIL YOU ARE TOLD TO DO SO. Performance Pillar. P1 Performance Operations. Wednesday 27 August 2014
DO NOT OPEN THIS QUESTION PAPER UNTIL YOU ARE TOLD TO DO SO. Performance Pillar P1 Performance Operations Instructions to candidates Wednesday 27 August 2014 You are allowed three hours to answer this
More informationTwo-Sample Z-Tests Assuming Equal Variance
Chapter 426 Two-Sample Z-Tests Assuming Equal Variance Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample z-tests when the variances of the two groups
More informationIdiosyncratic risk, insurance, and aggregate consumption dynamics: a likelihood perspective
Idiosyncratic risk, insurance, and aggregate consumption dynamics: a likelihood perspective Alisdair McKay Boston University June 2013 Microeconomic evidence on insurance - Consumption responds to idiosyncratic
More information