Dynamic Pricing with Varying Cost

Size: px
Start display at page:

Download "Dynamic Pricing with Varying Cost"

Transcription

1 Dynamic Pricing with Varying Cost L. Jeff Hong College of Business City University of Hong Kong Joint work with Ying Zhong and Guangwu Liu

2 Outline 1 Introduction 2 Problem Formulation 3 Pricing Policy 4 Regret Analysis 5 Numerical Results Dynamic Pricing with Varying Cost 3 / 21

3 Online Pricing Problem A company sells a product online. Customers arrive one at a time and buy one unit of the product if the price is lower than their willingness to pay (WTP). Customers are homogenous, having the same WTP distribution. The company has a menu of prices, e.g., 3.99, 4.99 and 5.99, to choose from. Question: How to set the price? Dynamic Pricing with Varying Cost 4 / 21

4 Learning and Earning The objective is to maximize the cumulative profit by adaptively offering different prices to different customers. The decision maker faces a tradeoff between exploration of the acceptance probabilities at different prices (learning) and exploitation of the immediate profit (earning). The problem was first introduced by Rothschild (1974). Without any assumptions on the WTP distribution, the problem is typically formulated as a multi-armed bandit (MAB) problem. Dynamic Pricing with Varying Cost 5 / 21

5 Multi-armed Bandit Originally formulated by Robbins (1952), the MAB is an important class of sequential optimization problems. Objective: Devise a sampling policy among a group of K 2 statistical populations (arms) that maximizes expected cumulative reward over a finite time horizon. Dynamic Pricing with Varying Cost 6 / 21

6 Multi-armed Bandit Policies are evaluated based on the regret, R[T] = T E [ ] µ i µ It, t=1 where i is the optimal arm and I t is the arm chosen in period t. Lai and Robbins (1985) proved that the regret for the MAB problem has to grow at least O ( log T ). The upper-confidence-bound (UCB) policy of Auer et al has R[T] C log T for some constant C > 0. Dynamic Pricing with Varying Cost 7 / 21

7 UCB Policy 1 Initialization: Play each arm once. 2 Loop: Play arm j that maximizes µ j + 2 log t T j (t 1) where µ j is the average reward obtained from arm j, T j (t 1) is the number of times arm j has been played so far and t is the overall number of plays done so far. Dynamic Pricing with Varying Cost 8 / 21

8 Varying Cost In some practical applications, costs may vary for different customers. Online sales of an insurance product: Potential customers are usually asked to fill questionnaires before getting quotes for the product. The insurance company is able to assess the potential risk (cost) of each individual customer through these questionnaires. The cost for each customer, as is often the case, varies. To maximize the cumulative profit, different premiums (prices) should be asked for different customers based on their costs. Other examples include: the sales of some perishable goods, e.g. gasoline, fresh fruit, etc. Dynamic Pricing with Varying Cost 9 / 21

9 Notation T: Total length of of the selling periods (or customers). c t : The cost observed in period t. We assume they are i.i.d. samples from a fixed (unknown) distribution on C. p 1 < p 2 < p K : Prices choices. We assume p 1 > max c C c. K = {1, 2..., K}: index set of all the prices. µ k (c): The profit function of price k when the cost is c, ( µ k (c) = E[D(p k )] (p k c) = π k 1 c ), p k where π k = E[D(p k )]p k is the expected revenue at p k. We also assume that the observed revenue D(p k )p k [0, 1]. Dynamic Pricing with Varying Cost 10 / 21

10 Problem Formulation Consider a company selling a product over T (unknown) periods. At the beginning of each period t, upon observing a cost c t, the decision maker needs to choose a price p k where k K. The index of the true optimal price at time t is: i (c t ) = arg max k K µ k (c t ) Let I t (c t ) be the index of the price chosen by a pricing policy. Objective: Find a pricing policy that minimizes the cumulative regret: T R [T] = E [ µ i (c t ) (c t ) µ It (c t ) (c t ) ]. t=1 Dynamic Pricing with Varying Cost 11 / 21

11 Why Considering Varying Cost? Without considering the varying cost, suppose one uses the expected cost E(c t ) in making pricing decision. The problem becomes a MAB problem max k K µ k (E(c t )) Considering varying cost, the problem is max k K µ k (c t ) By Jensen s Inequality, [ ] max µ k (E(c t )) E max µ k (c t ) k K k K Dynamic Pricing with Varying Cost 12 / 21

12 Special Features Notice that µ k (c) = π k ( 1 c p k ), for any k K, the straight line µ k (c) always passes a fixed point [ p k, 0 ] and [0, π k ]. Precisely estimating π k is crucial. Dynamic Pricing with Varying Cost 13 / 21

13 Pricing Policy 1. Initialization: For t K, choose each price once. 2. Loop: For t > K Estimate revenue for each p k, and let π k,t = T 1 k (t 1) Π (p k ) T k (t 1) i where Π (p k ) i is the i-th realization of the revenue of p k. Write down the upper bound of the profit function for each p k in UCB manner, let µ k,t (c) = π k,t + Choose the price with index, i=1 2 log t T k (t 1) I t (c t ) = arg max k K µ k,t (c t ) ) (1 cpk Dynamic Pricing with Varying Cost 14 / 21

14 Main Results Theorem (1) If K = 2 and π 1 > π 2, the cumulative regret is bounded by R [T] C 1 ( log T ) 2 where C 1 is a positive constant that depends on the configuration of µ 1 (c) and µ 2 (c). Regret is mainly caused by the inaccurate estimation of the intersection point of µ 1 (c) and µ 2 (c), and the regret comes from the neighborhood of the intersection point. For each t, the expected regret is bounded by constant log t t. The result can be extended to K > 2 under some conditions. Dynamic Pricing with Varying Cost 15 / 21

15 Illustration Dynamic Pricing with Varying Cost 16 / 21

16 Intuitions The information learned at one value of the cost can also be used at other values of cost. Our problem is significantly more difficult than the standard MAB problem, because the profits can be arbitrarily close, making the selection very difficult. Yet, the regret is not much worse, O ( ( log T ) 2 ) compared to O ( log T ). The regret comes from the inaccurate estimation of the intersection point and, thus, causing wrong decisions. Dynamic Pricing with Varying Cost 17 / 21

17 A Special Case: C < If C <, at any cost, there is an gap between µ 1 (c) and µ 2 (c). Then, the inaccurate estimation of the intersection point will not happen infinitely often. What s the implication? Dynamic Pricing with Varying Cost 18 / 21

18 A Constant Bound Theorem (3) If C < and none of the feasible prices is inferior, then there exists a constant C 2 such that R [T] C 2. We would expect the problem with varying cost is more difficult than the one with constant cost. It is not! Because every price is good for some costs, one does not have to conduct exploration on prices that do not look good. Dynamic Pricing with Varying Cost 19 / 21

19 Numerical Results The cumulative regret with respective to T C={1,3} C=[0,4] Cumulative Regret (normalized) T x 10 4 Figure: p 1 = 4.0, p 2 = 4.1, π 1 = 0.6, π 2 = 0.59 Dynamic Pricing with Varying Cost 20 / 21

20 Q & A Thank you! Dynamic Pricing with Varying Cost 21 / 21

Multi-armed bandit problems

Multi-armed bandit problems Multi-armed bandit problems Stochastic Decision Theory (2WB12) Arnoud den Boer 13 March 2013 Set-up 13 and 14 March: Lectures. 20 and 21 March: Paper presentations (Four groups, 45 min per group). Before

More information

Zooming Algorithm for Lipschitz Bandits

Zooming Algorithm for Lipschitz Bandits Zooming Algorithm for Lipschitz Bandits Alex Slivkins Microsoft Research New York City Based on joint work with Robert Kleinberg and Eli Upfal (STOC'08) Running examples Dynamic pricing. You release a

More information

An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits JMLR: Workshop and Conference Proceedings vol 49:1 5, 2016 An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits Peter Auer Chair for Information Technology Montanuniversitaet

More information

Treatment Allocations Based on Multi-Armed Bandit Strategies

Treatment Allocations Based on Multi-Armed Bandit Strategies Treatment Allocations Based on Multi-Armed Bandit Strategies Wei Qian and Yuhong Yang Applied Economics and Statistics, University of Delaware School of Statistics, University of Minnesota Innovative Statistics

More information

Monte-Carlo Planning: Introduction and Bandit Basics. Alan Fern

Monte-Carlo Planning: Introduction and Bandit Basics. Alan Fern Monte-Carlo Planning: Introduction and Bandit Basics Alan Fern 1 Large Worlds We have considered basic model-based planning algorithms Model-based planning: assumes MDP model is available Methods we learned

More information

Tuning bandit algorithms in stochastic environments

Tuning bandit algorithms in stochastic environments Tuning bandit algorithms in stochastic environments Jean-Yves Audibert, CERTIS - Ecole des Ponts Remi Munos, INRIA Futurs Lille Csaba Szepesvári, University of Alberta The 18th International Conference

More information

The Irrevocable Multi-Armed Bandit Problem

The Irrevocable Multi-Armed Bandit Problem The Irrevocable Multi-Armed Bandit Problem Ritesh Madan Qualcomm-Flarion Technologies May 27, 2009 Joint work with Vivek Farias (MIT) 2 Multi-Armed Bandit Problem n arms, where each arm i is a Markov Decision

More information

Monte-Carlo Planning: Introduction and Bandit Basics. Alan Fern

Monte-Carlo Planning: Introduction and Bandit Basics. Alan Fern Monte-Carlo Planning: Introduction and Bandit Basics Alan Fern 1 Large Worlds We have considered basic model-based planning algorithms Model-based planning: assumes MDP model is available Methods we learned

More information

Lecture 7: Bayesian approach to MAB - Gittins index

Lecture 7: Bayesian approach to MAB - Gittins index Advanced Topics in Machine Learning and Algorithmic Game Theory Lecture 7: Bayesian approach to MAB - Gittins index Lecturer: Yishay Mansour Scribe: Mariano Schain 7.1 Introduction In the Bayesian approach

More information

Online Network Revenue Management using Thompson Sampling

Online Network Revenue Management using Thompson Sampling Online Network Revenue Management using Thompson Sampling Kris Johnson Ferreira David Simchi-Levi He Wang Working Paper 16-031 Online Network Revenue Management using Thompson Sampling Kris Johnson Ferreira

More information

D I S C O N T I N U O U S DEMAND FUNCTIONS: ESTIMATION AND PRICING. Rotterdam May 24, 2018

D I S C O N T I N U O U S DEMAND FUNCTIONS: ESTIMATION AND PRICING. Rotterdam May 24, 2018 D I S C O N T I N U O U S DEMAND FUNCTIONS: ESTIMATION AND PRICING Arnoud V. den Boer University of Amsterdam N. Bora Keskin Duke University Rotterdam May 24, 2018 Dynamic pricing and learning: Learning

More information

Dynamic Programming and Reinforcement Learning

Dynamic Programming and Reinforcement Learning Dynamic Programming and Reinforcement Learning Daniel Russo Columbia Business School Decision Risk and Operations Division Fall, 2017 Daniel Russo (Columbia) Fall 2017 1 / 34 Supervised Machine Learning

More information

Lecture 11: Bandits with Knapsacks

Lecture 11: Bandits with Knapsacks CMSC 858G: Bandits, Experts and Games 11/14/16 Lecture 11: Bandits with Knapsacks Instructor: Alex Slivkins Scribed by: Mahsa Derakhshan 1 Motivating Example: Dynamic Pricing The basic version of the dynamic

More information

Multi-armed bandits in dynamic pricing

Multi-armed bandits in dynamic pricing Multi-armed bandits in dynamic pricing Arnoud den Boer University of Twente, Centrum Wiskunde & Informatica Amsterdam Lancaster, January 11, 2016 Dynamic pricing A firm sells a product, with abundant inventory,

More information

Rollout Allocation Strategies for Classification-based Policy Iteration

Rollout Allocation Strategies for Classification-based Policy Iteration Rollout Allocation Strategies for Classification-based Policy Iteration V. Gabillon, A. Lazaric & M. Ghavamzadeh firstname.lastname@inria.fr Workshop on Reinforcement Learning and Search in Very Large

More information

Adaptive Experiments for Policy Choice. March 8, 2019

Adaptive Experiments for Policy Choice. March 8, 2019 Adaptive Experiments for Policy Choice Maximilian Kasy Anja Sautmann March 8, 2019 Introduction The goal of many experiments is to inform policy choices: 1. Job search assistance for refugees: Treatments:

More information

Bernoulli Bandits An Empirical Comparison

Bernoulli Bandits An Empirical Comparison Bernoulli Bandits An Empirical Comparison Ronoh K.N1,2, Oyamo R.1,2, Milgo E.1,2, Drugan M.1 and Manderick B.1 1- Vrije Universiteit Brussel - Computer Sciences Department - AI Lab Pleinlaan 2 - B-1050

More information

Regret Minimization against Strategic Buyers

Regret Minimization against Strategic Buyers Regret Minimization against Strategic Buyers Mehryar Mohri Courant Institute & Google Research Andrés Muñoz Medina Google Research Motivation Online advertisement: revenue of modern search engine and

More information

Bandit algorithms for tree search Applications to games, optimization, and planning

Bandit algorithms for tree search Applications to games, optimization, and planning Bandit algorithms for tree search Applications to games, optimization, and planning Rémi Munos SequeL project: Sequential Learning http://sequel.futurs.inria.fr/ INRIA Lille - Nord Europe Journées MAS

More information

Multi-Armed Bandit, Dynamic Environments and Meta-Bandits

Multi-Armed Bandit, Dynamic Environments and Meta-Bandits Multi-Armed Bandit, Dynamic Environments and Meta-Bandits C. Hartland, S. Gelly, N. Baskiotis, O. Teytaud and M. Sebag Lab. of Computer Science CNRS INRIA Université Paris-Sud, Orsay, France Abstract This

More information

Approximate Revenue Maximization with Multiple Items

Approximate Revenue Maximization with Multiple Items Approximate Revenue Maximization with Multiple Items Nir Shabbat - 05305311 December 5, 2012 Introduction The paper I read is called Approximate Revenue Maximization with Multiple Items by Sergiu Hart

More information

An Approximation Algorithm for Capacity Allocation over a Single Flight Leg with Fare-Locking

An Approximation Algorithm for Capacity Allocation over a Single Flight Leg with Fare-Locking An Approximation Algorithm for Capacity Allocation over a Single Flight Leg with Fare-Locking Mika Sumida School of Operations Research and Information Engineering, Cornell University, Ithaca, New York

More information

Recharging Bandits. Joint work with Nicole Immorlica.

Recharging Bandits. Joint work with Nicole Immorlica. Recharging Bandits Bobby Kleinberg Cornell University Joint work with Nicole Immorlica. NYU Machine Learning Seminar New York, NY 24 Oct 2017 Prologue Can you construct a dinner schedule that: never goes

More information

Multi-period mean variance asset allocation: Is it bad to win the lottery?

Multi-period mean variance asset allocation: Is it bad to win the lottery? Multi-period mean variance asset allocation: Is it bad to win the lottery? Peter Forsyth 1 D.M. Dang 1 1 Cheriton School of Computer Science University of Waterloo Guangzhou, July 28, 2014 1 / 29 The Basic

More information

Monte-Carlo Planning: Basic Principles and Recent Progress

Monte-Carlo Planning: Basic Principles and Recent Progress Monte-Carlo Planning: Basic Principles and Recent Progress Alan Fern School of EECS Oregon State University Outline Preliminaries: Markov Decision Processes What is Monte-Carlo Planning? Uniform Monte-Carlo

More information

Learning for Revenue Optimization. Andrés Muñoz Medina Renato Paes Leme

Learning for Revenue Optimization. Andrés Muñoz Medina Renato Paes Leme Learning for Revenue Optimization Andrés Muñoz Medina Renato Paes Leme How to succeed in business with basic ML? ML $1 $5 $10 $9 Google $35 $1 $8 $7 $7 Revenue $8 $30 $24 $18 $10 $1 $5 Price $7 $8$9$10

More information

Bandit Problems with Lévy Payoff Processes

Bandit Problems with Lévy Payoff Processes Bandit Problems with Lévy Payoff Processes Eilon Solan Tel Aviv University Joint with Asaf Cohen Multi-Arm Bandits A single player sequential decision making problem. Time is continuous or discrete. The

More information

THE TRAVELING SALESMAN PROBLEM FOR MOVING POINTS ON A LINE

THE TRAVELING SALESMAN PROBLEM FOR MOVING POINTS ON A LINE THE TRAVELING SALESMAN PROBLEM FOR MOVING POINTS ON A LINE GÜNTER ROTE Abstract. A salesperson wants to visit each of n objects that move on a line at given constant speeds in the shortest possible time,

More information

TTIC An Introduction to the Theory of Machine Learning. The Adversarial Multi-armed Bandit Problem Avrim Blum.

TTIC An Introduction to the Theory of Machine Learning. The Adversarial Multi-armed Bandit Problem Avrim Blum. TTIC 31250 An Introduction to the Theory of Machine Learning The Adversarial Multi-armed Bandit Problem Avrim Blum Start with recap 1 Algorithm Consider the following setting Each morning, you need to

More information

1 Dynamic programming

1 Dynamic programming 1 Dynamic programming A country has just discovered a natural resource which yields an income per period R measured in terms of traded goods. The cost of exploitation is negligible. The government wants

More information

1 Precautionary Savings: Prudence and Borrowing Constraints

1 Precautionary Savings: Prudence and Borrowing Constraints 1 Precautionary Savings: Prudence and Borrowing Constraints In this section we study conditions under which savings react to changes in income uncertainty. Recall that in the PIH, when you abstract from

More information

6.231 DYNAMIC PROGRAMMING LECTURE 5 LECTURE OUTLINE

6.231 DYNAMIC PROGRAMMING LECTURE 5 LECTURE OUTLINE 6.231 DYNAMIC PROGRAMMING LECTURE 5 LECTURE OUTLINE Stopping problems Scheduling problems Minimax Control 1 PURE STOPPING PROBLEMS Two possible controls: Stop (incur a one-time stopping cost, and move

More information

Assortment Optimization Over Time

Assortment Optimization Over Time Assortment Optimization Over Time James M. Davis Huseyin Topaloglu David P. Williamson Abstract In this note, we introduce the problem of assortment optimization over time. In this problem, we have a sequence

More information

OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF FINITE

OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF FINITE Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference 005 Seville, Spain, December 1-15, 005 WeA11.6 OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF

More information

Markov Decision Processes: Making Decision in the Presence of Uncertainty. (some of) R&N R&N

Markov Decision Processes: Making Decision in the Presence of Uncertainty. (some of) R&N R&N Markov Decision Processes: Making Decision in the Presence of Uncertainty (some of) R&N 16.1-16.6 R&N 17.1-17.4 Different Aspects of Machine Learning Supervised learning Classification - concept learning

More information

Decision Theory: Value Iteration

Decision Theory: Value Iteration Decision Theory: Value Iteration CPSC 322 Decision Theory 4 Textbook 9.5 Decision Theory: Value Iteration CPSC 322 Decision Theory 4, Slide 1 Lecture Overview 1 Recap 2 Policies 3 Value Iteration Decision

More information

An optimal policy for joint dynamic price and lead-time quotation

An optimal policy for joint dynamic price and lead-time quotation Lingnan University From the SelectedWorks of Prof. LIU Liming November, 2011 An optimal policy for joint dynamic price and lead-time quotation Jiejian FENG Liming LIU, Lingnan University, Hong Kong Xianming

More information

Dynamic Programming: An overview. 1 Preliminaries: The basic principle underlying dynamic programming

Dynamic Programming: An overview. 1 Preliminaries: The basic principle underlying dynamic programming Dynamic Programming: An overview These notes summarize some key properties of the Dynamic Programming principle to optimize a function or cost that depends on an interval or stages. This plays a key role

More information

Hedging Under Jump Diffusions with Transaction Costs. Peter Forsyth, Shannon Kennedy, Ken Vetzal University of Waterloo

Hedging Under Jump Diffusions with Transaction Costs. Peter Forsyth, Shannon Kennedy, Ken Vetzal University of Waterloo Hedging Under Jump Diffusions with Transaction Costs Peter Forsyth, Shannon Kennedy, Ken Vetzal University of Waterloo Computational Finance Workshop, Shanghai, July 4, 2008 Overview Overview Single factor

More information

Bandit Learning with switching costs

Bandit Learning with switching costs Bandit Learning with switching costs Jian Ding, University of Chicago joint with: Ofer Dekel (MSR), Tomer Koren (Technion) and Yuval Peres (MSR) June 2016, Harvard University Online Learning with k -Actions

More information

Lower Bounds on Revenue of Approximately Optimal Auctions

Lower Bounds on Revenue of Approximately Optimal Auctions Lower Bounds on Revenue of Approximately Optimal Auctions Balasubramanian Sivan 1, Vasilis Syrgkanis 2, and Omer Tamuz 3 1 Computer Sciences Dept., University of Winsconsin-Madison balu2901@cs.wisc.edu

More information

Final exam solutions

Final exam solutions EE365 Stochastic Control / MS&E251 Stochastic Decision Models Profs. S. Lall, S. Boyd June 5 6 or June 6 7, 2013 Final exam solutions This is a 24 hour take-home final. Please turn it in to one of the

More information

Lec 1: Single Agent Dynamic Models: Nested Fixed Point Approach. K. Sudhir MGT 756: Empirical Methods in Marketing

Lec 1: Single Agent Dynamic Models: Nested Fixed Point Approach. K. Sudhir MGT 756: Empirical Methods in Marketing Lec 1: Single Agent Dynamic Models: Nested Fixed Point Approach K. Sudhir MGT 756: Empirical Methods in Marketing RUST (1987) MODEL AND ESTIMATION APPROACH A Model of Harold Zurcher Rust (1987) Empirical

More information

Stock Repurchase with an Adaptive Reservation Price: A Study of the Greedy Policy

Stock Repurchase with an Adaptive Reservation Price: A Study of the Greedy Policy Stock Repurchase with an Adaptive Reservation Price: A Study of the Greedy Policy Ye Lu Asuman Ozdaglar David Simchi-Levi November 8, 200 Abstract. We consider the problem of stock repurchase over a finite

More information

Making Decisions. CS 3793 Artificial Intelligence Making Decisions 1

Making Decisions. CS 3793 Artificial Intelligence Making Decisions 1 Making Decisions CS 3793 Artificial Intelligence Making Decisions 1 Planning under uncertainty should address: The world is nondeterministic. Actions are not certain to succeed. Many events are outside

More information

Homework 3: Asset Pricing

Homework 3: Asset Pricing Homework 3: Asset Pricing Mohammad Hossein Rahmati November 1, 2018 1. Consider an economy with a single representative consumer who maximize E β t u(c t ) 0 < β < 1, u(c t ) = ln(c t + α) t= The sole

More information

Statistics for Managers Using Microsoft Excel 7 th Edition

Statistics for Managers Using Microsoft Excel 7 th Edition Statistics for Managers Using Microsoft Excel 7 th Edition Chapter 7 Sampling Distributions Statistics for Managers Using Microsoft Excel 7e Copyright 2014 Pearson Education, Inc. Chap 7-1 Learning Objectives

More information

Monte Carlo Methods (Estimators, On-policy/Off-policy Learning)

Monte Carlo Methods (Estimators, On-policy/Off-policy Learning) 1 / 24 Monte Carlo Methods (Estimators, On-policy/Off-policy Learning) Julie Nutini MLRG - Winter Term 2 January 24 th, 2017 2 / 24 Monte Carlo Methods Monte Carlo (MC) methods are learning methods, used

More information

Sequential Decision Making

Sequential Decision Making Sequential Decision Making Dynamic programming Christos Dimitrakakis Intelligent Autonomous Systems, IvI, University of Amsterdam, The Netherlands March 18, 2008 Introduction Some examples Dynamic programming

More information

The Neoclassical Growth Model

The Neoclassical Growth Model The Neoclassical Growth Model 1 Setup Three goods: Final output Capital Labour One household, with preferences β t u (c t ) (Later we will introduce preferences with respect to labour/leisure) Endowment

More information

Basic Framework. About this class. Rewards Over Time. [This lecture adapted from Sutton & Barto and Russell & Norvig]

Basic Framework. About this class. Rewards Over Time. [This lecture adapted from Sutton & Barto and Russell & Norvig] Basic Framework [This lecture adapted from Sutton & Barto and Russell & Norvig] About this class Markov Decision Processes The Bellman Equation Dynamic Programming for finding value functions and optimal

More information

Notes on Intertemporal Optimization

Notes on Intertemporal Optimization Notes on Intertemporal Optimization Econ 204A - Henning Bohn * Most of modern macroeconomics involves models of agents that optimize over time. he basic ideas and tools are the same as in microeconomics,

More information

Uncertainty in Equilibrium

Uncertainty in Equilibrium Uncertainty in Equilibrium Larry Blume May 1, 2007 1 Introduction The state-preference approach to uncertainty of Kenneth J. Arrow (1953) and Gérard Debreu (1959) lends itself rather easily to Walrasian

More information

Optimal Long-Term Supply Contracts with Asymmetric Demand Information. Appendix

Optimal Long-Term Supply Contracts with Asymmetric Demand Information. Appendix Optimal Long-Term Supply Contracts with Asymmetric Demand Information Ilan Lobel Appendix Wenqiang iao {ilobel, wxiao}@stern.nyu.edu Stern School of Business, New York University Appendix A: Proofs Proof

More information

MATH 425: BINOMIAL TREES

MATH 425: BINOMIAL TREES MATH 425: BINOMIAL TREES G. BERKOLAIKO Summary. These notes will discuss: 1-level binomial tree for a call, fair price and the hedging procedure 1-level binomial tree for a general derivative, fair price

More information

TDT4171 Artificial Intelligence Methods

TDT4171 Artificial Intelligence Methods TDT47 Artificial Intelligence Methods Lecture 7 Making Complex Decisions Norwegian University of Science and Technology Helge Langseth IT-VEST 0 helgel@idi.ntnu.no TDT47 Artificial Intelligence Methods

More information

Extraction capacity and the optimal order of extraction. By: Stephen P. Holland

Extraction capacity and the optimal order of extraction. By: Stephen P. Holland Extraction capacity and the optimal order of extraction By: Stephen P. Holland Holland, Stephen P. (2003) Extraction Capacity and the Optimal Order of Extraction, Journal of Environmental Economics and

More information

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete)

Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Information Acquisition under Persuasive Precedent versus Binding Precedent (Preliminary and Incomplete) Ying Chen Hülya Eraslan March 25, 2016 Abstract We analyze a dynamic model of judicial decision

More information

Posted-Price Mechanisms and Prophet Inequalities

Posted-Price Mechanisms and Prophet Inequalities Posted-Price Mechanisms and Prophet Inequalities BRENDAN LUCIER, MICROSOFT RESEARCH WINE: CONFERENCE ON WEB AND INTERNET ECONOMICS DECEMBER 11, 2016 The Plan 1. Introduction to Prophet Inequalities 2.

More information

E-companion to Coordinating Inventory Control and Pricing Strategies for Perishable Products

E-companion to Coordinating Inventory Control and Pricing Strategies for Perishable Products E-companion to Coordinating Inventory Control and Pricing Strategies for Perishable Products Xin Chen International Center of Management Science and Engineering Nanjing University, Nanjing 210093, China,

More information

Forecast Horizons for Production Planning with Stochastic Demand

Forecast Horizons for Production Planning with Stochastic Demand Forecast Horizons for Production Planning with Stochastic Demand Alfredo Garcia and Robert L. Smith Department of Industrial and Operations Engineering Universityof Michigan, Ann Arbor MI 48109 December

More information

Elif Özge Özdamar T Reinforcement Learning - Theory and Applications February 14, 2006

Elif Özge Özdamar T Reinforcement Learning - Theory and Applications February 14, 2006 On the convergence of Q-learning Elif Özge Özdamar elif.ozdamar@helsinki.fi T-61.6020 Reinforcement Learning - Theory and Applications February 14, 2006 the covergence of stochastic iterative algorithms

More information

Lecture 2: Making Good Sequences of Decisions Given a Model of World. CS234: RL Emma Brunskill Winter 2018

Lecture 2: Making Good Sequences of Decisions Given a Model of World. CS234: RL Emma Brunskill Winter 2018 Lecture 2: Making Good Sequences of Decisions Given a Model of World CS234: RL Emma Brunskill Winter 218 Human in the loop exoskeleton work from Steve Collins lab Class Structure Last Time: Introduction

More information

Information aggregation for timing decision making.

Information aggregation for timing decision making. MPRA Munich Personal RePEc Archive Information aggregation for timing decision making. Esteban Colla De-Robertis Universidad Panamericana - Campus México, Escuela de Ciencias Económicas y Empresariales

More information

The Real Numbers. Here we show one way to explicitly construct the real numbers R. First we need a definition.

The Real Numbers. Here we show one way to explicitly construct the real numbers R. First we need a definition. The Real Numbers Here we show one way to explicitly construct the real numbers R. First we need a definition. Definitions/Notation: A sequence of rational numbers is a funtion f : N Q. Rather than write

More information

Lecture 17: More on Markov Decision Processes. Reinforcement learning

Lecture 17: More on Markov Decision Processes. Reinforcement learning Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture

More information

Problem Set 3. Thomas Philippon. April 19, Human Wealth, Financial Wealth and Consumption

Problem Set 3. Thomas Philippon. April 19, Human Wealth, Financial Wealth and Consumption Problem Set 3 Thomas Philippon April 19, 2002 1 Human Wealth, Financial Wealth and Consumption The goal of the question is to derive the formulas on p13 of Topic 2. This is a partial equilibrium analysis

More information

Characterization of the Optimum

Characterization of the Optimum ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 5. Portfolio Allocation with One Riskless, One Risky Asset Characterization of the Optimum Consider a risk-averse, expected-utility-maximizing

More information

4 Martingales in Discrete-Time

4 Martingales in Discrete-Time 4 Martingales in Discrete-Time Suppose that (Ω, F, P is a probability space. Definition 4.1. A sequence F = {F n, n = 0, 1,...} is called a filtration if each F n is a sub-σ-algebra of F, and F n F n+1

More information

Dynamic and Stochastic Knapsack-Type Models for Foreclosed Housing Acquisition and Redevelopment

Dynamic and Stochastic Knapsack-Type Models for Foreclosed Housing Acquisition and Redevelopment Proceedings of the 2012 International Conference on Industrial Engineering and Operations Management Istanbul, Turkey, July 3-6, 2012 Dynamic and Stochastic Knapsack-Type Models for Foreclosed Housing

More information

Finite Memory and Imperfect Monitoring

Finite Memory and Imperfect Monitoring Federal Reserve Bank of Minneapolis Research Department Finite Memory and Imperfect Monitoring Harold L. Cole and Narayana Kocherlakota Working Paper 604 September 2000 Cole: U.C.L.A. and Federal Reserve

More information

Optimizing S-shaped utility and risk management

Optimizing S-shaped utility and risk management Optimizing S-shaped utility and risk management Ineffectiveness of VaR and ES constraints John Armstrong (KCL), Damiano Brigo (Imperial) Quant Summit March 2018 Are ES constraints effective against rogue

More information

Dynamic Pricing under Finite Space Demand Uncertainty: A Multi-Armed Bandit with Dependent Arms

Dynamic Pricing under Finite Space Demand Uncertainty: A Multi-Armed Bandit with Dependent Arms 1 Dynamic Pricing under Finite Space Demand Uncertainty: A Multi-Armed Bandit with Dependent Arms Pouya Tehrani, Yixuan Zhai, Qing Zhao Department of Electrical and Computer Engineering University of California,

More information

Modelling Anti-Terrorist Surveillance Systems from a Queueing Perspective

Modelling Anti-Terrorist Surveillance Systems from a Queueing Perspective Systems from a Queueing Perspective September 7, 2012 Problem A surveillance resource must observe several areas, searching for potential adversaries. Problem A surveillance resource must observe several

More information

We study a seller that starts with an initial inventory of goods, has a target horizon over which to sell the

We study a seller that starts with an initial inventory of goods, has a target horizon over which to sell the MANAGEMENT SCIENCE Vol. 58, No. 9, September 212, pp. 1715 1731 ISSN 25-199 (print) ISSN 1526-551 (online) http://dx.doi.org/1.1287/mnsc.111.1513 212 INFORMS Dynamic Pricing with Financial Milestones:

More information

Optimization Models in Financial Mathematics

Optimization Models in Financial Mathematics Optimization Models in Financial Mathematics John R. Birge Northwestern University www.iems.northwestern.edu/~jrbirge Illinois Section MAA, April 3, 2004 1 Introduction Trends in financial mathematics

More information

Dynamically Scheduling and Maintaining a Flexible Server

Dynamically Scheduling and Maintaining a Flexible Server Dynamically Scheduling and Maintaining a Flexible Server Jefferson Huang Operations Research Department Naval Postgraduate School INFORMS Annual Meeting November 7, 2018 Co-Authors: Douglas Down (McMaster),

More information

High Dimensional Bayesian Optimisation and Bandits via Additive Models

High Dimensional Bayesian Optimisation and Bandits via Additive Models 1/20 High Dimensional Bayesian Optimisation and Bandits via Additive Models Kirthevasan Kandasamy, Jeff Schneider, Barnabás Póczos ICML 15 July 8 2015 2/20 Bandits & Optimisation Maximum Likelihood inference

More information

Competing Mechanisms with Limited Commitment

Competing Mechanisms with Limited Commitment Competing Mechanisms with Limited Commitment Suehyun Kwon CESIFO WORKING PAPER NO. 6280 CATEGORY 12: EMPIRICAL AND THEORETICAL METHODS DECEMBER 2016 An electronic version of the paper may be downloaded

More information

Handout 4: Deterministic Systems and the Shortest Path Problem

Handout 4: Deterministic Systems and the Shortest Path Problem SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 4: Deterministic Systems and the Shortest Path Problem Instructor: Shiqian Ma January 27, 2014 Suggested Reading: Bertsekas

More information

CS 234 Winter 2019 Assignment 1 Due: January 23 at 11:59 pm

CS 234 Winter 2019 Assignment 1 Due: January 23 at 11:59 pm CS 234 Winter 2019 Assignment 1 Due: January 23 at 11:59 pm For submission instructions please refer to website 1 Optimal Policy for Simple MDP [20 pts] Consider the simple n-state MDP shown in Figure

More information

Regret Minimization and Security Strategies

Regret Minimization and Security Strategies Chapter 5 Regret Minimization and Security Strategies Until now we implicitly adopted a view that a Nash equilibrium is a desirable outcome of a strategic game. In this chapter we consider two alternative

More information

EE266 Homework 5 Solutions

EE266 Homework 5 Solutions EE, Spring 15-1 Professor S. Lall EE Homework 5 Solutions 1. A refined inventory model. In this problem we consider an inventory model that is more refined than the one you ve seen in the lectures. The

More information

The Value of Information in Central-Place Foraging. Research Report

The Value of Information in Central-Place Foraging. Research Report The Value of Information in Central-Place Foraging. Research Report E. J. Collins A. I. Houston J. M. McNamara 22 February 2006 Abstract We consider a central place forager with two qualitatively different

More information

Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models

Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models José E. Figueroa-López 1 1 Department of Statistics Purdue University University of Missouri-Kansas City Department of Mathematics

More information

STATE UNIVERSITY OF NEW YORK AT ALBANY Department of Economics. Ph. D. Comprehensive Examination: Macroeconomics Fall, 2010

STATE UNIVERSITY OF NEW YORK AT ALBANY Department of Economics. Ph. D. Comprehensive Examination: Macroeconomics Fall, 2010 STATE UNIVERSITY OF NEW YORK AT ALBANY Department of Economics Ph. D. Comprehensive Examination: Macroeconomics Fall, 2010 Section 1. (Suggested Time: 45 Minutes) For 3 of the following 6 statements, state

More information

Martingale Pricing Theory in Discrete-Time and Discrete-Space Models

Martingale Pricing Theory in Discrete-Time and Discrete-Space Models IEOR E4707: Foundations of Financial Engineering c 206 by Martin Haugh Martingale Pricing Theory in Discrete-Time and Discrete-Space Models These notes develop the theory of martingale pricing in a discrete-time,

More information

Lecture 1: Lucas Model and Asset Pricing

Lecture 1: Lucas Model and Asset Pricing Lecture 1: Lucas Model and Asset Pricing Economics 714, Spring 2018 1 Asset Pricing 1.1 Lucas (1978) Asset Pricing Model We assume that there are a large number of identical agents, modeled as a representative

More information

Maintenance Management of Infrastructure Networks: Issues and Modeling Approach

Maintenance Management of Infrastructure Networks: Issues and Modeling Approach Maintenance Management of Infrastructure Networks: Issues and Modeling Approach Network Optimization for Pavements Pontis System for Bridge Networks Integrated Infrastructure System for Beijing Common

More information

JEFF MACKIE-MASON. x is a random variable with prior distrib known to both principal and agent, and the distribution depends on agent effort e

JEFF MACKIE-MASON. x is a random variable with prior distrib known to both principal and agent, and the distribution depends on agent effort e BASE (SYMMETRIC INFORMATION) MODEL FOR CONTRACT THEORY JEFF MACKIE-MASON 1. Preliminaries Principal and agent enter a relationship. Assume: They have access to the same information (including agent effort)

More information

Unobserved Heterogeneity Revisited

Unobserved Heterogeneity Revisited Unobserved Heterogeneity Revisited Robert A. Miller Dynamic Discrete Choice March 2018 Miller (Dynamic Discrete Choice) cemmap 7 March 2018 1 / 24 Distributional Assumptions about the Unobserved Variables

More information

Portfolio Optimization using Conditional Sharpe Ratio

Portfolio Optimization using Conditional Sharpe Ratio International Letters of Chemistry, Physics and Astronomy Online: 2015-07-01 ISSN: 2299-3843, Vol. 53, pp 130-136 doi:10.18052/www.scipress.com/ilcpa.53.130 2015 SciPress Ltd., Switzerland Portfolio Optimization

More information

Importance Sampling for Fair Policy Selection

Importance Sampling for Fair Policy Selection Importance Sampling for Fair Policy Selection Shayan Doroudi Carnegie Mellon University Pittsburgh, PA 15213 shayand@cs.cmu.edu Philip S. Thomas Carnegie Mellon University Pittsburgh, PA 15213 philipt@cs.cmu.edu

More information

EE365: Markov Decision Processes

EE365: Markov Decision Processes EE365: Markov Decision Processes Markov decision processes Markov decision problem Examples 1 Markov decision processes 2 Markov decision processes add input (or action or control) to Markov chain with

More information

Multiproduct-Firm Oligopoly: An Aggregative Games Approach

Multiproduct-Firm Oligopoly: An Aggregative Games Approach Multiproduct-Firm Oligopoly: An Aggregative Games Approach Volker Nocke 1 Nicolas Schutz 2 1 UCLA 2 University of Mannheim ASSA ES Meetings, Philadephia, 2018 Nocke and Schutz (UCLA &Mannheim) Multiproduct-Firm

More information

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jordan Ash May 1, 2014

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jordan Ash May 1, 2014 COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture #24 Scribe: Jordan Ash May, 204 Review of Game heory: Let M be a matrix with all elements in [0, ]. Mindy (called the row player) chooses

More information

Data-Driven Pricing of Demand Response

Data-Driven Pricing of Demand Response Data-Driven Pricing of Demand Response Kia Khezeli Eilyan Bitar Abstract We consider the setting in which an electric power utility seeks to curtail its peak electricity demand by offering a fixed group

More information

Chapter 6 Analyzing Accumulated Change: Integrals in Action

Chapter 6 Analyzing Accumulated Change: Integrals in Action Chapter 6 Analyzing Accumulated Change: Integrals in Action 6. Streams in Business and Biology You will find Excel very helpful when dealing with streams that are accumulated over finite intervals. Finding

More information

X ln( +1 ) +1 [0 ] Γ( )

X ln( +1 ) +1 [0 ] Γ( ) Problem Set #1 Due: 11 September 2014 Instructor: David Laibson Economics 2010c Problem 1 (Growth Model): Recall the growth model that we discussed in class. We expressed the sequence problem as ( 0 )=

More information

SOLVING ROBUST SUPPLY CHAIN PROBLEMS

SOLVING ROBUST SUPPLY CHAIN PROBLEMS SOLVING ROBUST SUPPLY CHAIN PROBLEMS Daniel Bienstock Nuri Sercan Özbay Columbia University, New York November 13, 2005 Project with Lucent Technologies Optimize the inventory buffer levels in a complicated

More information