Dynamic Pricing with Varying Cost
- Emma Richardson
- 5 years ago
Transcription
1 Dynamic Pricing with Varying Cost. L. Jeff Hong, College of Business, City University of Hong Kong. Joint work with Ying Zhong and Guangwu Liu.
2 Outline: 1. Introduction; 2. Problem Formulation; 3. Pricing Policy; 4. Regret Analysis; 5. Numerical Results.
3 Online Pricing Problem. A company sells a product online. Customers arrive one at a time, and each buys one unit of the product if the price is lower than their willingness to pay (WTP). Customers are homogeneous: they share the same WTP distribution. The company has a menu of prices, e.g., 3.99, 4.99 and 5.99, to choose from. Question: How to set the price?
4 Learning and Earning. The objective is to maximize the cumulative profit by adaptively offering different prices to different customers. The decision maker faces a tradeoff between exploration of the acceptance probabilities at different prices (learning) and exploitation of the immediate profit (earning). The problem was first introduced by Rothschild (1974). Without any assumptions on the WTP distribution, the problem is typically formulated as a multi-armed bandit (MAB) problem.
5 Multi-armed Bandit. Originally formulated by Robbins (1952), the MAB is an important class of sequential optimization problems. Objective: Devise a sampling policy among a group of $K \ge 2$ statistical populations (arms) that maximizes the expected cumulative reward over a finite time horizon.
6 Multi-armed Bandit. Policies are evaluated based on the regret, $R[T] = T\mu_{i^*} - E\left[\sum_{t=1}^{T} \mu_{I_t}\right]$, where $i^*$ is the optimal arm and $I_t$ is the arm chosen in period $t$. Lai and Robbins (1985) proved that the regret for the MAB problem has to grow at least on the order of $\log T$. The upper-confidence-bound (UCB) policy of Auer et al. has $R[T] \le C \log T$ for some constant $C > 0$.
7 UCB Policy. 1. Initialization: Play each arm once. 2. Loop: Play the arm $j$ that maximizes $\bar{\mu}_j + \sqrt{\frac{2 \log t}{T_j(t-1)}}$, where $\bar{\mu}_j$ is the average reward obtained from arm $j$, $T_j(t-1)$ is the number of times arm $j$ has been played so far, and $t$ is the overall number of plays done so far.
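The two steps of the UCB policy can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the talk: the function name `ucb1`, the Bernoulli reward model, and the `seed` parameter are assumptions made here for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run the UCB policy on Bernoulli arms (assumed reward model).

    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # T_j: number of times arm j has been played
    sums = [0.0] * k   # total reward collected from arm j

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play each arm once
        else:
            # index of arm j = average reward + sqrt(2 log t / T_j)
            arm = max(
                range(k),
                key=lambda j: sums[j] / counts[j]
                + math.sqrt(2.0 * math.log(t) / counts[j]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Calling `ucb1([0.2, 0.8], 2000)` returns the play counts per arm. With a gap this large, the better arm should receive the bulk of the plays, while the worse arm is still sampled occasionally because its confidence bonus grows with $\log t$.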
8 Varying Cost. In some practical applications, costs may vary across customers. Online sales of an insurance product: Potential customers are usually asked to fill out questionnaires before getting quotes for the product. The insurance company can assess the potential risk (cost) of each individual customer through these questionnaires. The cost, as is often the case, varies from customer to customer. To maximize the cumulative profit, different premiums (prices) should be quoted to different customers based on their costs. Other examples include the sale of some perishable goods, e.g., gasoline and fresh fruit.
9 Notation.
$T$: Total length of the selling horizon (number of periods, or customers).
$c_t$: The cost observed in period $t$. We assume the costs are i.i.d. samples from a fixed (unknown) distribution on $\mathcal{C}$.
$p_1 < p_2 < \cdots < p_K$: Price choices. We assume $p_1 > \max_{c \in \mathcal{C}} c$.
$\mathcal{K} = \{1, 2, \ldots, K\}$: Index set of all the prices.
$\mu_k(c)$: The profit function of price $k$ when the cost is $c$, $\mu_k(c) = E[D(p_k)](p_k - c) = \pi_k \left(1 - \frac{c}{p_k}\right)$, where $\pi_k = E[D(p_k)]\,p_k$ is the expected revenue at $p_k$. We also assume that the observed revenue satisfies $D(p_k)\,p_k \in [0, 1]$.
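As a quick check on the notation, the two forms of the profit function can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, and the values $p_1 = 4.0$ and $\pi_1 = 0.6$ are borrowed from the figure caption in the numerical results.

```python
import math


def profit_margin_form(accept_prob, price, cost):
    """mu_k(c) = E[D(p_k)] * (p_k - c), with E[D(p_k)] the acceptance probability."""
    return accept_prob * (price - cost)


def profit_revenue_form(expected_revenue, price, cost):
    """mu_k(c) = pi_k * (1 - c / p_k), with pi_k = E[D(p_k)] * p_k."""
    return expected_revenue * (1.0 - cost / price)


price = 4.0
expected_revenue = 0.6                    # pi_1 from the figure caption
accept_prob = expected_revenue / price    # implied E[D(p_1)]
cost = 1.0                                # an arbitrary cost below p_1

# Both forms give the same expected profit for this price and cost.
assert math.isclose(
    profit_margin_form(accept_prob, price, cost),
    profit_revenue_form(expected_revenue, price, cost),
)
```

The second form is the one the pricing policy relies on: only $\pi_k$ is unknown, while the factor $1 - c/p_k$ is known as soon as the cost is observed.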
10 Problem Formulation. Consider a company selling a product over $T$ (unknown) periods. At the beginning of each period $t$, upon observing a cost $c_t$, the decision maker needs to choose a price $p_k$, where $k \in \mathcal{K}$. The index of the true optimal price at time $t$ is $i^*(c_t) = \arg\max_{k \in \mathcal{K}} \mu_k(c_t)$. Let $I_t(c_t)$ be the index of the price chosen by a pricing policy. Objective: Find a pricing policy that minimizes the cumulative regret $R[T] = E\left[\sum_{t=1}^{T} \left( \mu_{i^*(c_t)}(c_t) - \mu_{I_t(c_t)}(c_t) \right)\right]$.
11 Why Consider Varying Cost? Suppose one ignores the varying cost and uses the expected cost $E(c_t)$ in making the pricing decision. The problem then becomes a MAB problem, $\max_{k \in \mathcal{K}} \mu_k(E(c_t))$. When the varying cost is taken into account, the problem is $\max_{k \in \mathcal{K}} \mu_k(c_t)$. By Jensen's inequality, $\max_{k \in \mathcal{K}} \mu_k(E(c_t)) \le E\left[\max_{k \in \mathcal{K}} \mu_k(c_t)\right]$.
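The inequality can be checked numerically on a small instance. The sketch below is an illustration under stated assumptions: it uses the prices and expected revenues from the figure caption in the numerical results ($p_1 = 4.0$, $p_2 = 4.1$, $\pi_1 = 0.6$, $\pi_2 = 0.59$) and assumes, purely for the example, that the cost takes the values 1 and 3 with equal probability.

```python
def profit(pi_k, p_k, c):
    """Profit function mu_k(c) = pi_k * (1 - c / p_k)."""
    return pi_k * (1.0 - c / p_k)


prices = [4.0, 4.1]
pis = [0.6, 0.59]
costs = [1.0, 3.0]  # assumed equally likely (not stated in the slides)

# Left-hand side: best profit when pricing against the expected cost.
mean_cost = sum(costs) / len(costs)
profit_at_mean_cost = max(
    profit(pi, p, mean_cost) for pi, p in zip(pis, prices)
)

# Right-hand side: expected profit when the best price is chosen per cost.
mean_of_best_profit = sum(
    max(profit(pi, p, c) for pi, p in zip(pis, prices)) for c in costs
) / len(costs)

print(profit_at_mean_cost, mean_of_best_profit)
```

In this instance the left-hand side is about 0.3022 and the right-hand side about 0.3041, so pricing against the realized cost earns more in expectation than pricing against the average cost.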
12 Special Features. Notice that $\mu_k(c) = \pi_k \left(1 - \frac{c}{p_k}\right)$ for any $k \in \mathcal{K}$. The straight line $\mu_k(c)$ always passes through the fixed point $(p_k, 0)$ and through the point $(0, \pi_k)$, so it is fully determined once $\pi_k$ is known. Precisely estimating $\pi_k$ is crucial.
13 Pricing Policy.
1. Initialization: For $t \le K$, choose each price once.
2. Loop: For $t > K$:
Estimate the revenue for each $p_k$, and let $\bar{\pi}_{k,t} = \frac{1}{T_k(t-1)} \sum_{i=1}^{T_k(t-1)} \Pi_i^{(p_k)}$, where $\Pi_i^{(p_k)}$ is the $i$-th realization of the revenue of $p_k$.
Write down the upper bound of the profit function for each $p_k$ in the UCB manner, and let $\bar{\mu}_{k,t}(c) = \left( \bar{\pi}_{k,t} + \sqrt{\frac{2 \log t}{T_k(t-1)}} \right) \left(1 - \frac{c}{p_k}\right)$.
Choose the price with index $I_t(c_t) = \arg\max_{k \in \mathcal{K}} \bar{\mu}_{k,t}(c_t)$.
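The policy above can be simulated with a short Python sketch. This is a hedged illustration rather than the authors' implementation: the function names, the Bernoulli revenue model (revenue is 0 or 1 with mean $\pi_k$, which respects the $[0, 1]$ assumption), and the uniform draw of costs from a finite list are all assumptions introduced here.

```python
import math
import random


def profit(pi_k, p_k, c):
    """Profit function mu_k(c) = pi_k * (1 - c / p_k)."""
    return pi_k * (1.0 - c / p_k)


def run_policy(prices, pis, costs, horizon, seed=0):
    """Simulate the UCB-style pricing policy.

    prices: price menu p_1 < ... < p_K
    pis: true expected revenues pi_k (unknown to the policy, used only
         to generate revenues and to measure regret)
    costs: finite list of cost values, drawn uniformly (assumption)
    Returns (cumulative regret, number of times each price was chosen).
    """
    rng = random.Random(seed)
    k = len(prices)
    counts = [0] * k      # T_k: times price k has been chosen
    rev_sums = [0.0] * k  # sum of observed revenues at price k
    regret = 0.0

    for t in range(1, horizon + 1):
        c = rng.choice(costs)  # cost observed before pricing
        if t <= k:
            chosen = t - 1     # initialization: each price once
        else:
            def index(j):
                # upper confidence bound on pi_j, scaled by (1 - c / p_j)
                bonus = math.sqrt(2.0 * math.log(t) / counts[j])
                return profit(rev_sums[j] / counts[j] + bonus, prices[j], c)

            chosen = max(range(k), key=index)

        # Assumed revenue model: Bernoulli with mean pi_k, so revenue is in [0, 1].
        revenue = 1.0 if rng.random() < pis[chosen] else 0.0
        counts[chosen] += 1
        rev_sums[chosen] += revenue

        best = max(profit(pis[j], prices[j], c) for j in range(k))
        regret += best - profit(pis[chosen], prices[chosen], c)
    return regret, counts
```

For example, `run_policy([4.0, 4.1], [0.6, 0.59], [1.0, 3.0], 500)` returns the accumulated regret and the play counts. Note that a revenue observed at one cost updates $\bar{\pi}_{k,t}$ and therefore the index at every other cost, which is the information sharing the talk emphasizes.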
14 Main Results. Theorem (1): If $K = 2$ and $\pi_1 > \pi_2$, the cumulative regret is bounded by $R[T] \le C_1 (\log T)^2$, where $C_1$ is a positive constant that depends on the configuration of $\mu_1(c)$ and $\mu_2(c)$. Regret is mainly caused by inaccurate estimation of the intersection point of $\mu_1(c)$ and $\mu_2(c)$, and it arises in the neighborhood of that point. For each $t$, the expected regret is bounded by a constant times $\frac{\log t}{t}$. The result can be extended to $K > 2$ under some conditions.
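Two short calculations may help in reading the theorem. They are worked here from the definitions in the slides and are not taken from the talk itself. The first locates the intersection point, written $c^*$ (a symbol introduced here), and evaluates it for the values in the figure caption of the numerical results. The second indicates why a per-period bound of order $\frac{\log t}{t}$ accumulates to order $(\log T)^2$.

```latex
% Intersection of the two profit lines: set mu_1(c) = mu_2(c).
\pi_1\left(1 - \frac{c}{p_1}\right) = \pi_2\left(1 - \frac{c}{p_2}\right)
\quad\Longrightarrow\quad
c^* = \frac{\pi_1 - \pi_2}{\dfrac{\pi_1}{p_1} - \dfrac{\pi_2}{p_2}} .

% With p_1 = 4.0, p_2 = 4.1, \pi_1 = 0.6, \pi_2 = 0.59:
c^* = \frac{0.01}{0.15 - 0.59/4.1} = \frac{0.01 \times 4.1}{0.025} = 1.64 .

% Accumulating the per-period bound (integral comparison):
\sum_{t=1}^{T} \frac{\log t}{t} \;\approx\; \int_{1}^{T} \frac{\log x}{x}\,dx
= \frac{(\log T)^2}{2} .
```

With these values, price 1 is optimal for costs below 1.64 and price 2 for costs above it.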
15 Illustration. [Figure not captured in the transcription.]
16 Intuitions. The information learned at one value of the cost can also be used at other values of the cost. Our problem is significantly more difficult than the standard MAB problem, because the profits of two prices can be arbitrarily close, which makes the selection very difficult. Yet the regret is not much worse: $O((\log T)^2)$ compared to $O(\log T)$. The regret comes from inaccurate estimation of the intersection point, which leads to wrong decisions.
17 A Special Case: $|\mathcal{C}| < \infty$. If $|\mathcal{C}| < \infty$, then at any cost there is a gap between $\mu_1(c)$ and $\mu_2(c)$. Then, the inaccurate estimation of the intersection point will not happen infinitely often. What's the implication?
18 A Constant Bound. Theorem (3): If $|\mathcal{C}| < \infty$ and none of the feasible prices is inferior, then there exists a constant $C_2$ such that $R[T] \le C_2$. We would expect the problem with varying cost to be more difficult than the one with constant cost. It is not! Because every price is good for some costs, one does not have to conduct exploration on prices that do not look good.
19 Numerical Results. The cumulative regret with respect to $T$. [Figure: normalized cumulative regret versus $T$ ($\times 10^4$) for $\mathcal{C} = \{1, 3\}$ and $\mathcal{C} = [0, 4]$, with $p_1 = 4.0$, $p_2 = 4.1$, $\pi_1 = 0.6$, $\pi_2 = 0.59$.]
20 Q & A. Thank you!