Multi-armed bandit problems
Multi-armed bandit problems. Stochastic Decision Theory (2WB12). Arnoud den Boer. 13 March 2013.
Set-up
13 and 14 March: lectures. 20 and 21 March: paper presentations (four groups, 45 min per group). Before 31 March: hand in exercises. Papers and exercises can be found on the course website. Please form four groups and email me before the end of this week.
Outline for today
- Optimization under uncertainty
- Multi-armed bandit problems
- Upper bounds on the performance of policies
- Lower bounds on the performance of policies
Decision making under uncertainty
Deterministic optimization problem: \max_{x \in X} f(x; \theta), where \theta \in \Theta is unknown.
Robust optimization approach: \max_{x \in X} \min_{\theta \in \Theta} f(x; \theta).
Note the difference with \min_{\theta \in \Theta} \max_{x \in X} f(x; \theta). Also note that X is known and deterministic (otherwise: stochastic programming, chance constraints).
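The difference between the two orders of optimization can be checked on a toy payoff table (the values below are hypothetical, not from the lecture). The robust value \max_x \min_\theta f never exceeds \min_\theta \max_x f, and the gap can be strict:

```python
# Toy payoff table f(x, theta); the decision maker picks x, nature picks theta.
f = {("x1", "t1"): 1.0, ("x1", "t2"): 0.0,
     ("x2", "t1"): 0.0, ("x2", "t2"): 1.0}
X = ["x1", "x2"]
Theta = ["t1", "t2"]

# Robust value: commit to x first, then nature chooses the worst theta.
max_min = max(min(f[x, t] for t in Theta) for x in X)

# Reversed order: nature commits to theta first, the decision maker reacts.
min_max = min(max(f[x, t] for x in X) for t in Theta)
```

Here max_min = 0 while min_max = 1: moving second is an advantage, which is why the two problems are not interchangeable.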
Decision making under uncertainty
In stochastic decision problems, f is random and one typically maximizes its expectation: \max_{x \in X} E[f(x; \theta)]. For example, \theta parametrizes the distribution of a random variable Y_\theta(x) that depends on x, and we solve \max_{x \in X} E[f(x; Y_\theta(x))].
Decision making under uncertainty
If data D = (x_i, y_i(x_i))_{1 \le i \le n} is available, the value of \theta may be inferred:
1) Let \hat\theta = \hat\theta(D) be an estimate of \theta (e.g. least squares, maximum likelihood).
2) Solve \max_{x \in X} E[f(x; Y_{\hat\theta}(x))].
Robust alternatives are possible, e.g. \max_{x \in X} \min_{\theta \in CI} E[f(x; Y_\theta(x))], where CI is a 95% confidence interval: P(\theta \in CI) \ge 0.95.
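A minimal sketch of this estimate-then-optimize recipe, under an assumed model Y_\theta(x) ~ N(\theta x, 1) with objective E[f(x; Y_\theta(x))] = \theta x - x^2/2 (both hypothetical choices made here for illustration):

```python
import random

random.seed(0)

# Step 0: simulated data (x_i, y_i(x_i)) from the true model, theta = 2.
theta_true = 2.0
data = [(x, theta_true * x + random.gauss(0, 1))
        for x in [0.5, 1.0, 1.5, 2.0] * 25]

# Step 1: least-squares estimate of theta (no intercept): sum(x*y) / sum(x*x).
theta_hat = sum(x * y for x, y in data) / sum(x * x for x, _ in data)

# Step 2: plug in theta_hat and optimize over a grid for X.
# E[f] = theta * x - x^2 / 2 is maximized at x = theta.
X = [i / 10 for i in range(41)]
x_star = max(X, key=lambda x: theta_hat * x - x * x / 2)
```

With 100 observations the estimate lands close to 2, so the plug-in decision x_star is close to the true optimum.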
Decision making under uncertainty
Consider a discrete-time sequential stochastic decision problem under uncertainty: x_t = \arg\max_{x \in X} E[f(x; Y_\theta(x))] (t \in \mathbb{N}, \theta unknown), where the previous decisions x_1, ..., x_{t-1} and the observed realizations of Y_\theta(x_1), ..., Y_\theta(x_{t-1}) can be used to estimate \theta. Then periodically updating \hat\theta may be beneficial.
Decision making under uncertainty
[Diagram: a feedback loop. DATA -> STATISTICS (estimate unknown parameters) -> OPTIMIZATION (determine optimal decision) -> collect new data -> DATA.]
Decision making under uncertainty
Examples of sequential stochastic decision problems under uncertainty:
- Clinical trials
- Optimal placement of online advertisements
- Recommendation systems
- Optimal routing
- Dynamic pricing
- Inventory control
- ...
Decision making under uncertainty
Myopic policy: x_t \in \arg\max_{x \in X} E[f(x; Y_{\hat\theta_t}(x))] for all sufficiently large t, where \hat\theta_t is an estimate of \theta, based on x_1, ..., x_{t-1} and the realizations of Y_\theta(x_1), ..., Y_\theta(x_{t-1}).
Typical questions:
- How well does a myopic policy perform?
- Is experimentation beneficial?
- Given a policy, what are the costs-for-learning?
- What are the lowest costs-for-learning achievable by any policy?
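To make the myopic policy concrete, here is a minimal sketch on hypothetical Bernoulli reward distributions (not an example from the lecture): at every step, play whichever option currently has the highest estimated mean. This is the policy whose possible failure to experiment motivates the questions above.

```python
import random

random.seed(4)

def greedy(mu, n):
    """Myopic (greedy) policy: always play the arm with the highest
    current estimate; mu holds the true (unknown) Bernoulli means."""
    K = len(mu)
    pull = lambda i: 1.0 if random.random() < mu[i] else 0.0
    counts, sums = [0] * K, [0.0] * K
    for i in range(K):                 # one initial observation per option
        counts[i], sums[i] = 1, pull(i)
    for t in range(K, n):
        j = max(range(K), key=lambda i: sums[i] / counts[i])
        counts[j] += 1
        sums[j] += pull(j)
    return counts

counts = greedy([0.4, 0.6], n=1000)
```

If the first observation of the better option happens to be unlucky, the greedy rule can lock onto the worse option and never revise its estimate, which is exactly why deliberate experimentation can be beneficial.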
Multi-armed bandit problems (MAB)
Given K \ge 2 independent slot machines ("bandits", "arms"). At each time point t = 1, ..., n \in \mathbb{N}, exactly one arm has to be pulled. The reward of pulling arm i is random, with unknown finite mean \mu_i. Let I_t denote the arm pulled at time t; each I_t may depend on previously chosen arms and observed rewards, but not on the future.
Goal: maximize the expected reward \sum_{t=1}^n E[\mu_{I_t}]. Alternatively, minimize the regret R_n = n \mu_{i^*} - \sum_{t=1}^n E[\mu_{I_t}], where i^* \in \arg\max_i \mu_i.
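The regret definition can be sketched in a few lines of Python on a toy Bernoulli bandit (the means below are hypothetical). A policy that pulls arms uniformly at random illustrates why regret is the natural yardstick: its regret grows linearly in n, at rate \mu_{i^*} minus the average mean:

```python
import random

random.seed(1)

# Toy bandit: pulling arm i yields reward with mean mu[i] (unknown to a policy).
mu = [0.3, 0.5, 0.7]
n = 10_000
best = max(mu)                          # mu_{i*}

# A policy that ignores all feedback: pull a uniformly random arm each period.
pulls = [random.randrange(len(mu)) for _ in range(n)]

expected_reward = sum(mu[i] for i in pulls)   # sum over t of E[mu_{I_t}]
regret = n * best - expected_reward           # R_n
```

Per period the random policy loses about 0.7 - (0.3 + 0.5 + 0.7)/3 = 0.2, so R_n grows like 0.2 n; the policies discussed below do much better.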
Multi-armed bandit problems
Note:
- Rewards of arm i are i.i.d., and independent of the rewards from other arms.
- Finite number of arms.
- No structure or ordering assumed among the arms.
- Stationary reward distributions.
- Finite time horizon.
- Non-Bayesian.
Multi-armed bandit problems
A simple policy:
- Use arm i during time periods (i-1)N + 1, ..., iN, for i = 1, ..., K.
- Estimate \hat\mu_i = N^{-1} \sum_{t=1}^N X_{i,t}, where X_{i,1}, ..., X_{i,N} are the N rewards observed from pulling arm i.
- Use an arm j such that \hat\mu_j = \max_i \hat\mu_i during time periods KN + 1, ..., n.
Observe: both exploration and exploitation. One can show R_n = O(\log n), by choosing N appropriately.
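The simple policy above (pull each arm N times, then commit to the empirical winner) can be sketched as follows, on hypothetical Bernoulli arms with a fixed N rather than the horizon-dependent choice that yields the O(log n) bound:

```python
import random

random.seed(2)

def explore_then_commit(mu, n, N):
    """Pull each of the K arms N times (exploration), then pull the
    empirically best arm for the remaining n - K*N periods (exploitation)."""
    K = len(mu)
    pull = lambda i: 1.0 if random.random() < mu[i] else 0.0
    rewards = 0.0
    mu_hat = []
    for i in range(K):                       # exploration phase
        obs = [pull(i) for _ in range(N)]
        rewards += sum(obs)
        mu_hat.append(sum(obs) / N)
    j = max(range(K), key=lambda i: mu_hat[i])   # empirical winner
    for _ in range(n - K * N):               # exploitation phase
        rewards += pull(j)
    return rewards, j

rewards, j = explore_then_commit([0.2, 0.8], n=5000, N=100)
```

With means 0.2 and 0.8 and N = 100, the empirical winner is the true best arm with overwhelming probability, and the policy collects close to 0.8 per period after the exploration phase.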
Multi-armed bandit problems
Some disadvantages of the simple policy:
- It does not use all data to estimate \mu_i.
- It needs to know n in advance.
- With positive probability, the optimal arm is chosen only o(n) times.
Alternative policy?
Multi-armed bandit problems
UCB1. Idea: determine a confidence bound for \hat\mu_i, and use the arm with the highest upper confidence bound.
- Choose each arm once.
- For all t = K + 1, ..., n, play the machine j that maximizes \hat\mu_j + \sqrt{2 \log t / T_j(t)}, where \hat\mu_j is the average reward obtained from arm j, and T_j(t) is the number of times arm j has been played up to time t.
Again one can show R_n = O(\log n).
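A direct sketch of the UCB1 index rule, again on hypothetical Bernoulli arms:

```python
import math
import random

random.seed(3)

def ucb1(mu, n):
    """UCB1: play each arm once, then play the arm maximizing
    mu_hat_j + sqrt(2 * log(t) / T_j(t))."""
    K = len(mu)
    pull = lambda i: 1.0 if random.random() < mu[i] else 0.0
    counts = [0] * K           # T_j(t): number of pulls of arm j so far
    sums = [0.0] * K           # total reward collected from arm j
    for t in range(1, n + 1):
        if t <= K:
            j = t - 1          # initialization: each arm once
        else:
            j = max(range(K), key=lambda i:
                    sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        counts[j] += 1
        sums[j] += pull(j)
    return counts

counts = ucb1([0.2, 0.5, 0.8], n=5000)
```

The confidence radius \sqrt{2 \log t / T_j(t)} shrinks for frequently played arms and grows slowly with t, so under-explored arms are periodically revisited; the pull counts end up concentrated on the best arm, with suboptimal arms pulled only O(log n) times.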
Multi-armed bandit problems
We have seen two policies with R_n = O(\log n). Can any policy do better? No: all policies have R_n = \Omega(\log n). Lai and Robbins (1985): for any uniformly good policy and any suboptimal arm j,
\liminf_{t \to \infty} E[T_j(t)] / \log t \ge 1 / D_{KL}(X_j \| X_{i^*}),
where D_{KL}(P \| Q) is the Kullback-Leibler divergence between the distributions of P and Q, and "uniformly good" means E[T_j(n)] = o(n^a) for all a > 0 and all suboptimal arms j.
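For Bernoulli rewards the Lai-Robbins constant is explicit, since the KL divergence between Bernoulli(p) and Bernoulli(q) has a closed form. A small sketch (the means 0.5 and 0.7 are an illustrative choice, not from the lecture):

```python
import math

def bernoulli_kl(p, q):
    """KL divergence D(Bernoulli(p) || Bernoulli(q)), 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Suboptimal arm with mean 0.5, best arm with mean 0.7.
d = bernoulli_kl(0.5, 0.7)

# Lai-Robbins: any uniformly good policy must pull the suboptimal arm
# at least about log(t) / d times; by t = 10^4 that is roughly:
lower_bound = math.log(10_000) / d
```

Here d is about 0.087, so by t = 10^4 roughly a hundred pulls of the suboptimal arm are unavoidable; the closer the two means, the smaller d and the larger the forced cost of learning.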
Multi-armed bandit problems
How to choose between different policies, each with logarithmic regret?
- The constant in front of the log term
- Finite-time behavior
- Variance of the regret
Some numerical studies: Kuleshov and Precup (2000), Vermorel and Mohri (2005) (on website).
Reminder: presentations next week
Topics:
1. Incomplete learning (20 March)
2. Adversarial bandits (20 March)
3. Non-stationarity (21 March)
4. Continuum-armed bandits (21 March)
See the course website for papers and more information. Please form four groups and email me (a.v.d.boer@tue.nl) before the end of this week. First-come-first-served.
A New Hybrid Estimation Method for the Generalized Pareto Distribution Chunlin Wang Department of Mathematics and Statistics University of Calgary May 18, 2011 A New Hybrid Estimation Method for the GPD
More informationContents. 1 Introduction. Math 321 Chapter 5 Confidence Intervals. 1 Introduction 1
Math 321 Chapter 5 Confidence Intervals (draft version 2019/04/11-11:17:37) Contents 1 Introduction 1 2 Confidence interval for mean µ 2 2.1 Known variance................................. 2 2.2 Unknown
More informationINTERTEMPORAL ASSET ALLOCATION: THEORY
INTERTEMPORAL ASSET ALLOCATION: THEORY Multi-Period Model The agent acts as a price-taker in asset markets and then chooses today s consumption and asset shares to maximise lifetime utility. This multi-period
More informationModelling, Estimation and Hedging of Longevity Risk
IA BE Summer School 2016, K. Antonio, UvA 1 / 50 Modelling, Estimation and Hedging of Longevity Risk Katrien Antonio KU Leuven and University of Amsterdam IA BE Summer School 2016, Leuven Module II: Fitting
More information4 Reinforcement Learning Basic Algorithms
Learning in Complex Systems Spring 2011 Lecture Notes Nahum Shimkin 4 Reinforcement Learning Basic Algorithms 4.1 Introduction RL methods essentially deal with the solution of (optimal) control problems
More informationRecharging Bandits. Joint work with Nicole Immorlica.
Recharging Bandits Bobby Kleinberg Cornell University Joint work with Nicole Immorlica. NYU Machine Learning Seminar New York, NY 24 Oct 2017 Prologue Can you construct a dinner schedule that: never goes
More informationCS 361: Probability & Statistics
March 12, 2018 CS 361: Probability & Statistics Inference Binomial likelihood: Example Suppose we have a coin with an unknown probability of heads. We flip the coin 10 times and observe 2 heads. What can
More informationLecture 12: Introduction to reasoning under uncertainty. Actions and Consequences
Lecture 12: Introduction to reasoning under uncertainty Preferences Utility functions Maximizing expected utility Value of information Bandit problems and the exploration-exploitation trade-off COMP-424,
More informationPoint Estimators. STATISTICS Lecture no. 10. Department of Econometrics FEM UO Brno office 69a, tel
STATISTICS Lecture no. 10 Department of Econometrics FEM UO Brno office 69a, tel. 973 442029 email:jiri.neubauer@unob.cz 8. 12. 2009 Introduction Suppose that we manufacture lightbulbs and we want to state
More informationCSCI 1951-G Optimization Methods in Finance Part 07: Portfolio Optimization
CSCI 1951-G Optimization Methods in Finance Part 07: Portfolio Optimization March 9 16, 2018 1 / 19 The portfolio optimization problem How to best allocate our money to n risky assets S 1,..., S n with
More informationEVA Tutorial #1 BLOCK MAXIMA APPROACH IN HYDROLOGIC/CLIMATE APPLICATIONS. Rick Katz
1 EVA Tutorial #1 BLOCK MAXIMA APPROACH IN HYDROLOGIC/CLIMATE APPLICATIONS Rick Katz Institute for Mathematics Applied to Geosciences National Center for Atmospheric Research Boulder, CO USA email: rwk@ucar.edu
More informationAll Investors are Risk-averse Expected Utility Maximizers. Carole Bernard (UW), Jit Seng Chen (GGY) and Steven Vanduffel (Vrije Universiteit Brussel)
All Investors are Risk-averse Expected Utility Maximizers Carole Bernard (UW), Jit Seng Chen (GGY) and Steven Vanduffel (Vrije Universiteit Brussel) First Name: Waterloo, April 2013. Last Name: UW ID #:
More informationTeaching Bandits How to Behave
Teaching Bandits How to Behave Manuscript Yiling Chen, Jerry Kung, David Parkes, Ariel Procaccia, Haoqi Zhang Abstract Consider a setting in which an agent selects an action in each time period and there
More informationOptimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models
Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models José E. Figueroa-López 1 1 Department of Statistics Purdue University University of Missouri-Kansas City Department of Mathematics
More informationRobust Longevity Risk Management
Robust Longevity Risk Management Hong Li a,, Anja De Waegenaere a,b, Bertrand Melenberg a,b a Department of Econometrics and Operations Research, Tilburg University b Netspar Longevity 10 3-4, September,
More information