Multi-armed bandit problems


Multi-armed bandit problems
Stochastic Decision Theory (2WB12)
Arnoud den Boer
13 March 2013

Set-up
- 13 and 14 March: lectures.
- 20 and 21 March: paper presentations (four groups, 45 min per group).
- Before 31 March: hand in exercises.
Papers and exercises can be found on the course website. Please form four groups and e-mail me before the end of this week.

Outline for today
- Optimization under uncertainty
- Multi-armed bandit problems
- Upper bounds on the performance of policies
- Lower bounds on the performance of policies

Decision making under uncertainty

Deterministic optimization problem:
$$\max_{x \in X} f(x; \theta),$$
where $\theta \in \Theta$ is unknown.

Robust optimization approach:
$$\max_{x \in X} \min_{\theta \in \Theta} f(x; \theta).$$

Note the difference with $\min_{\theta \in \Theta} \max_{x \in X} f(x; \theta)$. Also note that $X$ is known and deterministic (otherwise: stochastic programming, chance constraints).
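
The following sketch (my own illustration, not from the slides) contrasts the nominal and the robust solution on a small grid, for a made-up objective $f(x; \theta) = -(x - \theta)^2$:

```python
import numpy as np

def f(x, theta):
    return -(x - theta) ** 2  # hypothetical objective, for illustration only

X = np.linspace(0, 1, 101)         # feasible decisions
Theta = np.linspace(0.2, 0.9, 71)  # plausible parameter values

# Nominal: optimize for a single point guess theta = 0.5.
x_nominal = X[np.argmax(f(X, 0.5))]

# Robust: maximize the worst case over Theta.
worst_case = np.array([f(x, Theta).min() for x in X])
x_robust = X[np.argmax(worst_case)]

print(x_nominal, x_robust)  # the robust solution hedges against all of Theta
```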

In stochastic decision problems, $f$ is random and one typically maximizes the expectation of $f$:
$$\max_{x \in X} E[f(x; \theta)].$$
For example, $\theta$ parametrizes the distribution of a random variable $Y_\theta(x)$ which depends on $x$, and we solve
$$\max_{x \in X} E[f(x; Y_\theta(x))].$$

If data $D = (x_i, y_i(x_i))_{1 \le i \le n}$ is available, the value of $\theta$ may be inferred:
1) Let $\hat{\theta} = \hat{\theta}(D)$ be an estimate of $\theta$ (e.g. least squares, maximum likelihood).
2) Solve $\max_{x \in X} E[f(x; Y_{\hat{\theta}}(x))]$.

Robust alternatives are possible, e.g.
$$\max_{x \in X} \min_{\theta \in CI} E[f(x; Y_\theta(x))],$$
where $CI$ is a 95% confidence interval: $P(\theta \in CI) \ge 0.95$.
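
A minimal estimate-then-optimize sketch (my own illustration, assuming $Y_\theta(x) \sim N(\theta x, 1)$ and $f(x; y) = y - \tfrac{1}{2}x^2$, so the plug-in problem is maximized at $x = \hat{\theta}$):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0
x_data = rng.uniform(0, 1, size=50)
y_data = theta_true * x_data + rng.normal(size=50)

# Step 1: estimate theta by least squares (one-parameter fit through the origin).
theta_hat = x_data @ y_data / (x_data @ x_data)

# Step 2: maximize the plug-in expected objective
# E[f(x; Y_theta_hat(x))] = theta_hat * x - 0.5 * x**2 over a grid.
X = np.linspace(0, 3, 301)
x_star = X[np.argmax(theta_hat * X - 0.5 * X ** 2)]
print(theta_hat, x_star)
```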

Consider a discrete-time sequential stochastic decision problem under uncertainty:
$$x_t = \arg\max_{x \in X} E[f(x; Y_\theta(x))] \qquad (t \in \mathbb{N},\ \theta \text{ unknown}),$$
where previous decisions $x_1, \dots, x_{t-1}$ and observed realizations of $Y_\theta(x_1), \dots, Y_\theta(x_{t-1})$ can be used to estimate $\theta$. Then periodically updating $\hat{\theta}$ may be beneficial.

[Diagram, built up over several slides: a feedback loop connecting DATA, STATISTICS and OPTIMIZATION — estimate unknown parameters from the data (statistics), determine the optimal decision (optimization), collect new data, and repeat.]
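
The loop in the diagram can be written as a short skeleton (a sketch; `estimate`, `optimize` and `observe` are hypothetical placeholders, not functions from the slides):

```python
# Generic sketch of the DATA -> STATISTICS -> OPTIMIZATION feedback loop.
def sequential_decisions(T, estimate, optimize, observe):
    data = []
    for t in range(T):
        theta_hat = estimate(data)   # STATISTICS: estimate unknown parameters
        x_t = optimize(theta_hat)    # OPTIMIZATION: determine optimal decision
        y_t = observe(x_t)           # DATA: collect a new observation
        data.append((x_t, y_t))
    return data
```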

Examples of sequential stochastic decision problems under uncertainty:
- Clinical trials
- Optimal placement of online advertisements
- Recommendation systems
- Optimal routing
- Dynamic pricing
- Inventory control
- ...

Myopic policy:
$$x_t \in \arg\max_{x \in X} E[f(x; Y_{\hat{\theta}_t}(x))] \quad \text{for all sufficiently large } t,$$
where $\hat{\theta}_t$ is an estimate of $\theta$, based on $x_1, \dots, x_{t-1}$ and realizations of $Y_\theta(x_1), \dots, Y_\theta(x_{t-1})$.

Typical questions:
- How well does a myopic policy perform?
- Is experimentation beneficial?
- Given a policy, what are the costs-for-learning?
- What are the lowest costs-for-learning achievable by any policy?

Multi-armed bandit problems (MAB)

Given $K \ge 2$ independent slot machines ('bandits', 'arms'). At each time point $t = 1, \dots, n \in \mathbb{N}$, exactly one arm has to be pulled. The reward of pulling arm $i$ is random, with unknown finite mean $\mu_i$.

Let $I_t$ denote the arm pulled at time $t$. Each $I_t$ may depend on previously chosen arms and observed rewards, but not on the future.

Goal: maximize the expected reward $\sum_{t=1}^{n} E[\mu_{I_t}]$. Alternatively, minimize the regret
$$R_n = n \mu_{i^*} - \sum_{t=1}^{n} E[\mu_{I_t}], \qquad \text{where } i^* \in \arg\max_i \mu_i.$$
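
To make the setting concrete, here is a minimal simulation sketch (my own illustration, not from the slides): a Bernoulli bandit with made-up arm means, a generic policy interface, and the regret $R_n$ of a uniformly random baseline policy. Later sketches reuse `run`, `mu`, `n` and `rng` from this block.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.3, 0.5, 0.7])  # hypothetical arm means; arm 2 is optimal
n = 10_000                      # time horizon

def run(policy):
    """Play n rounds; policy(t, pulls, sums) returns the arm to pull at time t."""
    pulls = np.zeros(len(mu), dtype=int)  # T_i: number of times arm i was played
    sums = np.zeros(len(mu))              # total observed reward per arm
    expected_reward = 0.0
    for t in range(n):
        i = policy(t, pulls, sums)
        reward = float(rng.random() < mu[i])  # Bernoulli(mu_i) reward
        pulls[i] += 1
        sums[i] += reward
        expected_reward += mu[i]
    return n * mu.max() - expected_reward     # regret R_n

# Baseline: a uniformly random policy incurs regret linear in n.
print(run(lambda t, pulls, sums: rng.integers(len(mu))))
```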

Note:
- Rewards of arm $i$ are i.i.d., and independent of rewards from other arms.
- Finite number of arms.
- No structure or ordering assumed among arms.
- Stationary reward distributions.
- Finite time horizon.
- Non-Bayesian.

A simple policy (implemented in the sketch below):
- Use arm $i$ during time periods $(i-1)N + 1, \dots, iN$, for $i = 1, \dots, K$.
- Estimate $\hat{\mu}_i = N^{-1} \sum_{t=1}^{N} X_{i,t}$, where $X_{i,1}, \dots, X_{i,N}$ are the $N$ rewards observed from pulling arm $i$.
- Use an arm $j$ such that $\hat{\mu}_j = \max_i \hat{\mu}_i$ during time periods $KN + 1, \dots, n$.

Observe: both exploration and exploitation. One can show $R_n = O(\log n)$ by choosing $N$ appropriately.
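
A sketch of this explore-then-commit policy in the simulation framework above (the default $N = 100$ is an arbitrary choice of mine; the slides choose $N$ as a function of $n$ to obtain logarithmic regret):

```python
def explore_then_commit(N=100):
    K = len(mu)
    committed = {"arm": None}
    def policy(t, pulls, sums):
        if t < K * N:                 # exploration: N pulls of each arm in turn
            return t // N
        if committed["arm"] is None:  # commit once, based on exploration data only
            committed["arm"] = int(np.argmax(sums / pulls))
        return committed["arm"]       # exploitation: play the committed arm
    return policy

print(run(explore_then_commit()))     # regret of explore-then-commit
```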

Some disadvantages of the simple policy:
- It does not use all data to estimate $\mu_i$.
- It needs to know $n$ in advance.
- With positive probability, the number of times the optimal arm is chosen is $o(n)$.

Alternative policy?

UCB1. Idea: determine a confidence bound for $\hat{\mu}_i$, and use the arm with the highest upper confidence bound (see the sketch below):
- Choose each arm once.
- For all $t = K + 1, \dots, n$, play the arm $j$ that maximizes
$$\hat{\mu}_j + \sqrt{\frac{2 \log t}{T_j(t)}},$$
where $\hat{\mu}_j$ is the average reward obtained from arm $j$, and $T_j(t)$ is the number of times arm $j$ has been played up to time $t$.

Again one can show $R_n = O(\log n)$.
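
A sketch of UCB1 in the same framework (my own illustration; the index follows the formula above, with time counted from 0 in the simulation):

```python
def ucb1(t, pulls, sums):
    K = len(pulls)
    if t < K:                                # initialization: choose each arm once
        return t
    means = sums / pulls                     # empirical mean reward per arm
    bonus = np.sqrt(2 * np.log(t) / pulls)   # exploration bonus per arm
    return int(np.argmax(means + bonus))     # highest upper confidence bound

print(run(ucb1))                             # regret of UCB1
```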

We have seen two policies with $R_n = O(\log n)$. Can any policy do better?

No. All policies have $R_n = \Omega(\log n)$. Lai and Robbins (1985): for any uniformly good policy and any suboptimal arm $j$,
$$\liminf_{t \to \infty} \frac{E[T_j(t)]}{\log t} \ge \frac{1}{D_{KL}(X_j \,\|\, X_{i^*})},$$
where $D_{KL}(P \,\|\, Q)$ is the Kullback-Leibler divergence between the distributions of $P$ and $Q$, and uniformly good means $E[T_j(n)] = o(n^a)$ for all $a > 0$ and all suboptimal arms $j$.
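
For Bernoulli arms the Kullback-Leibler divergence has a closed form, so the constant in the lower bound can be computed directly. A small sketch for the example means used above (my own illustration):

```python
def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# Lai-Robbins: each suboptimal arm j must be pulled roughly at least
# log(n) / KL(mu_j, mu_star) times in expectation.
mu_star = mu.max()
for j, m in enumerate(mu):
    if m < mu_star:
        print(j, np.log(n) / kl_bernoulli(m, mu_star))
```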

How to choose between different policies, each with logarithmic regret? For instance, by comparing:
- the constant in front of the log-term,
- finite-time behavior,
- the variance of the regret.

Some numerical studies: Kuleshov and Precup (2000), Vermorel and Mohri (2005) (on the website). A quick comparison in the simulation framework above is sketched below.
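
A rough numerical comparison of the two sketched policies (the number of replications is an arbitrary choice of mine):

```python
# Compare mean and variance of regret over independent replications.
regrets_etc = [run(explore_then_commit()) for _ in range(20)]
regrets_ucb = [run(ucb1) for _ in range(20)]
for name, r in [("ETC", regrets_etc), ("UCB1", regrets_ucb)]:
    print(name, np.mean(r), np.var(r))
```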

Reminder: presentations next week

Topics:
1. Incomplete learning (20 March)
2. Adversarial bandits (20 March)
3. Non-stationarity (21 March)
4. Continuum-armed bandits (21 March)

See the course website for papers and more information. Please form four groups and e-mail me (a.v.d.boer@tue.nl) before the end of this week. First-come-first-served.
