Regret Minimization against Strategic Buyers
Transcription
1 Regret Minimization against Strategic Buyers Mehryar Mohri Courant Institute & Google Research Andrés Muñoz Medina Google Research
2
3 Motivation Online advertisement: revenue of modern search engines and popular online sites. billions of transactions every day. key role of revenue optimization algorithms.
4 Motivation Second-price auctions with reserve: widely adopted mechanism in Ad Exchanges. many transactions admit a single bidder: posted-price auctions. study of posted-price auctions with strategic buyers.
5 Related Work Revenue optimization in second-price auctions [Cui et al., 2011; He et al., 2013; Cesa-Bianchi et al., 2013; MM and Muñoz, 2014]. Revenue optimization in generalized second-price auctions (GSP) [MM and Muñoz, 2015; Varian, 2007; Lucier et al., 2014; Sun et al., 2014; Rudolph et al., 2016; Charles et al., 2016; Roughgarden and Wang, 2016]. Dynamic pricing [Kanoria and Nazerzadeh, 2014; Bikhchandani and McCardle, 2012; den Boer, 2015; Chen et al., 2015]. Pricing with strategic and patient buyers [Feldman et al., 2016]. Preference reconstruction [Blum et al., 2014]. Learning optimal auctions [Huang et al., 2015; Morgenstern and Roughgarden, 2015].
6 This Talk Scenarios: Fixed valuation. Random valuation.
7 Setup Repeated posted-price auctions: good repeatedly offered for sale by a seller to a single buyer over $T$ rounds. buyer holds private valuation $v \in [0, 1]$. seller offers price $p_t$ and buyer accepts, $a_t = 1$, or rejects, $a_t = 0$, at each round $t \in [T]$.
8 Setup Seller: pricing algorithm $A$. total revenue: $\sum_{t=1}^{T} a_t p_t$. regret: $\mathrm{Reg}_T(A) = vT - \sum_{t=1}^{T} a_t p_t$. Buyer: discounting factor $\gamma \in (0, 1]$. surplus: $\mathrm{Sur}_T(A) = \sum_{t=1}^{T} \gamma^{t-1} a_t (v - p_t)$.
9 Strategic Setting [Amin et al., 2013] Seller announces his algorithm $A$. Buyer acts strategically: he seeks to maximize his surplus $\mathrm{Sur}_T(A)$. Seller seeks to minimize his strategic regret, that is, regret $\mathrm{Reg}_T(A)$ against a strategic buyer. Question: can we design algorithms for minimizing strategic regret?
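The seller and buyer quantities from the setup can be computed directly from a sequence of offered prices and accept/reject decisions. The following is a minimal sketch, assuming the fixed-valuation definitions (regret as $vT$ minus revenue, surplus discounted by $\gamma^{t-1}$); the function names are illustrative and not taken from the talk.

```python
from typing import Sequence


def revenue(prices: Sequence[float], accepts: Sequence[int]) -> float:
    """Total revenue: sum over rounds of a_t * p_t."""
    return sum(a * p for p, a in zip(prices, accepts))


def regret(v: float, prices: Sequence[float], accepts: Sequence[int]) -> float:
    """Seller regret: v * T minus the revenue actually collected."""
    return v * len(prices) - revenue(prices, accepts)


def discounted_surplus(
    v: float, prices: Sequence[float], accepts: Sequence[int], gamma: float
) -> float:
    """Buyer surplus: sum of gamma^(t-1) * a_t * (v - p_t) for t = 1..T."""
    # enumerate() starts at 0, which matches the exponent t - 1 for t = 1..T.
    return sum(
        (gamma ** i) * a * (v - p)
        for i, (p, a) in enumerate(zip(prices, accepts))
    )


if __name__ == "__main__":
    prices, accepts = [0.5, 0.25, 0.25], [0, 1, 1]
    print(regret(0.5, prices, accepts))
    print(discounted_surplus(0.5, prices, accepts, gamma=0.5))
```

A buyer who rejects early to obtain lower prices trades immediate surplus for discounted future surplus, which is exactly the tension the strategic setting studies.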
10 Truthful Setting Fast Search (FS) algorithm [Kleinberg and Leighton, 2003]: keeps track of feasible interval $[a, b]$, starting with $[0, 1]$ and parameter $\epsilon = \frac{1}{2}$. in each phase, offers prices $a + \epsilon, a + 2\epsilon, \ldots$ until a price is rejected. if price $a + k\epsilon$ is rejected, new phase with interval $[a + (k-1)\epsilon, a + k\epsilon]$ and parameter $\epsilon^2$. until size of the interval less than $\frac{1}{T}$.
11 Truthful Setting Fast Search (FS) algorithm [Kleinberg and Leighton, 2003]: at most $\lceil \log_2 \log_2 T \rceil + 1$ phases. regret in $O(\log \log T)$. lower bound: $\Omega(\log \log T)$.
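The Fast Search phases can be simulated against a truthful buyer. This is a sketch under stated assumptions: the buyer accepts exactly when the price is at most $v$, the step parameter starts at 1/2 and is squared after each rejection, a price above the current upper end $b$ is treated as rejected without being offered, and once the interval is shorter than $1/T$ the seller offers its lower end for the remaining rounds. The name `fast_search` is hypothetical.

```python
from typing import List, Tuple


def fast_search(v: float, T: int) -> Tuple[List[float], List[int]]:
    """Simulate Fast Search for T rounds against a truthful buyer with value v."""
    a, b, eps = 0.0, 1.0, 0.5
    prices: List[float] = []
    accepts: List[int] = []

    def offer(p: float) -> bool:
        accepted = p <= v  # truthful buyer (assumption)
        prices.append(p)
        accepts.append(int(accepted))
        return accepted

    # Search phases: shrink [a, b] until it is shorter than 1/T.
    while b - a >= 1.0 / T and len(prices) < T:
        k = 1
        # Offer a + eps, a + 2*eps, ... until a rejection (or until the
        # next price would exceed b, which v cannot reach).
        while len(prices) < T and a + k * eps <= b and offer(a + k * eps):
            k += 1
        # The last accepted price is the new lower end; the rejected one
        # is the new upper end. The right-hand side uses the old a.
        a, b = a + (k - 1) * eps, min(b, a + k * eps)
        eps = eps * eps

    # Exploitation: offer the lower end, which the buyer accepts.
    while len(prices) < T:
        offer(a)
    return prices, accepts


if __name__ == "__main__":
    v, T = 0.3, 1000
    prices, accepts = fast_search(v, T)
    print("rejections:", accepts.count(0))
    print("regret:", v * T - sum(p * a for p, a in zip(prices, accepts)))
```

Each phase ends with exactly one rejection, so the number of rejections counts the phases; squaring the step is what keeps that count doubly logarithmic in $T$.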
12 Example v = 16 $8? $4? $2?!!! $1? No No No YES!
13 Monotone algorithms Algorithm: offer price $p_t = \beta^t$ ($\beta < 1$) until it is accepted. offer accepted price thereafter. idea: slow enough decrease inconvenient for the buyer. Strategic regret in $O\big(\sqrt{T / (1 - \gamma)}\big)$. [Amin et al., 2013]
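The buyer's side of a monotone algorithm can be explored numerically. The sketch below assumes prices of the form $\beta^t$ with $\beta < 1$, a buyer with discount factor $\gamma$, and the fact that once a price is accepted it is offered forever, so the buyer's strategy reduces to choosing the first acceptance round. The brute-force search and the function names are illustrative assumptions, not the analysis from the talk.

```python
from typing import Optional


def best_acceptance_round(v: float, beta: float, gamma: float, T: int) -> Optional[int]:
    """Round k in 1..T maximizing the discounted surplus of accepting from k on.

    Returns None if no round gives positive surplus.
    """
    best_round, best_surplus = None, 0.0
    for k in range(1, T + 1):
        price = beta ** k
        # Sum of discounts gamma^(k-1) + ... + gamma^(T-1) over rounds k..T.
        if gamma == 1.0:
            weight = float(T - k + 1)
        else:
            weight = (gamma ** (k - 1) - gamma ** T) / (1.0 - gamma)
        surplus = weight * (v - price)
        if surplus > best_surplus:
            best_round, best_surplus = k, surplus
    return best_round


def monotone_regret(v: float, beta: float, gamma: float, T: int) -> float:
    """Seller regret v*T - revenue when the buyer best-responds."""
    k = best_acceptance_round(v, beta, gamma, T)
    if k is None:
        return v * T
    return v * T - (T - k + 1) * beta ** k


if __name__ == "__main__":
    for gamma in (0.5, 0.9, 0.99):
        print(gamma, monotone_regret(0.8, 0.95, gamma, 200))
```

Varying `beta` in this sketch shows the trade-off described on the slide: a slower decrease makes waiting costly for the buyer but also delays the first sale.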
14 Monotone algorithms Theorem: the strategic regret of any monotone decreasing convex algorithm is in $\Omega(\sqrt{T})$. [MM and Muñoz, 2014]
15 Proof idea Fix monotone function. Choose $v \in [\frac{1}{2}, 1]$ at random. Let $\kappa = \inf\{t : p_t < v\}$, then $\mathbb{E}[\kappa]\,\mathbb{E}[v - p_\kappa] \ge c$. Tradeoff optimized for $p_t - p_{t+1} \approx \frac{1}{\sqrt{T}}$.
16 Lower Bound [Amin et al., 2013; Kleinberg and Leighton, 2003; MM and Muñoz, 2014] Theorem: for any pricing algorithm $A$, the following lower bound holds: $\mathrm{Reg}_T(A) \ge \max\big(\frac{1}{12(1 - \gamma)}, C \log \log T\big)$ for some universal constant $C$.
17 Idea Lie: buyer lies when rejecting with $v > p_t$ or accepting with $v < p_t$. Can we dissuade the buyer from lying? buyer's weakness: time (discounted surplus). penalization: if buyer rejects price, reoffer the price for another $(r - 1)$ rounds. choice of $r$ subject to a trade-off.
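18 Pricing Strategies Any deterministic strategy can be represented by a tree. [Figure: binary tree of prices with root 1/2, children 1/4 and 3/4, and leaves 1/8, 3/8, 5/8, 7/8.]
19 Meta-Algorithm [Figure: the truthful price tree (1/2; 1/4, 3/4; 1/8, 3/8, 5/8, 7/8) and its strategic counterpart, in which a rejected price is reoffered for r rounds.]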
20 PFS Guarantees [MM and Muñoz, 2014] Theorem: let $\gamma_0 \in (\frac{1}{2}, 1)$; the following strategic regret guarantees hold for Penalized Fast Search (PFS): $\mathrm{Reg}_T(\mathrm{PFS}) = O(\log \log T)$ for $\gamma \in (0, \frac{1}{2}]$; $\mathrm{Reg}_T(\mathrm{PFS}) = O(\log T \log \log T)$ for $\gamma \in (\frac{1}{2}, \gamma_0)$.
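21 Proof Idea Surplus of rejected path (rejecting $p_t$) at most $\frac{\gamma^{t+r-1}}{1 - \gamma}$. Surplus of accepted path at least $\gamma^{t-1}(v - p_t)$. Hence the buyer rejects $p_t$ only if $(v - p_t) \le \frac{\gamma^{r}}{1 - \gamma}$.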
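The comparison of the two paths can be spelled out as a short derivation. It is a sketch under these assumptions: $\gamma < 1$ is the buyer's discount factor, the per-round surplus is at most 1 because $v, p_t \in [0, 1]$, and a buyer who rejects $p_t$ at round $t$ gains nothing until round $t + r$, the first round at which a different price can be offered.

```latex
\begin{align*}
\text{reject at round } t:\quad
  & \text{surplus} \;\le\; \sum_{s \ge t + r} \gamma^{s-1}
    \;=\; \frac{\gamma^{t+r-1}}{1 - \gamma}, \\
\text{accept at round } t:\quad
  & \text{surplus} \;\ge\; \gamma^{t-1}\,(v - p_t), \\
\text{rejecting can pay off only if}\quad
  & \gamma^{t-1}\,(v - p_t) \;\le\; \frac{\gamma^{t+r-1}}{1 - \gamma}
    \;\iff\; v - p_t \;\le\; \frac{\gamma^{r}}{1 - \gamma}.
\end{align*}
```

Increasing $r$ shrinks the right-hand side, so the buyer lies only when the price is very close to his valuation; the cost is that each honest rejection now consumes $r$ rounds, which is the trade-off in the choice of $r$.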
22 Horizon-Indep. Regret Extension of PFS via exponentiating trick to horizon-independent algorithm $\widetilde{\mathrm{PFS}}$: length $T_i$ of the $i$th epoch verifies $\log_2 \log_2 T_i = 2^{i-1}$. $\mathrm{Reg}_T(\widetilde{\mathrm{PFS}}) = O(\log \log T)$ for $\gamma \in (0, \frac{1}{2}]$; $\mathrm{Reg}_T(\widetilde{\mathrm{PFS}}) = O(\log T \log \log T)$ for $\gamma \in (\frac{1}{2}, \gamma_0)$. [Drutsa, 2017]
23 Further Improvement PRRFES algorithm [Drutsa, 2017]: truthful FES algorithm: modified FS; after rejection, reoffer last accepted price $g$ times. same lie penalization as in PFS. continue to offer until rejection. strategic regret: $\mathrm{Reg}_T(\mathrm{PRRFES}) = O(\log \log T)$ for $\gamma \in (0, \gamma_0]$.
24 Random valuations
25 Setup [Amin et al., 2013] Repeated posted-price auctions: good repeatedly offered for sale by a seller to a single buyer over $T$ rounds. buyer receives valuation $v_t \in [0, 1]$, $v_t \sim D$. seller offers price $p_t$ and buyer accepts, $a_t = 1$, or rejects, $a_t = 0$, at each round $t \in [T]$.
26 Setup Seller: pricing algorithm $A$. total revenue: $\mathbb{E}\big[\sum_{t=1}^{T} a_t p_t\big]$. regret: $\mathrm{Reg}_T(A) = \max_{p \in \mathcal{P}} p\,\mathbb{P}(v > p)\,T - \mathbb{E}\big[\sum_{t=1}^{T} a_t p_t\big]$. Buyer: discounting factor $\gamma \in (0, 1]$. surplus: $\mathrm{Sur}_T(A) = \mathbb{E}\big[\sum_{t=1}^{T} \gamma^{t-1} a_t (v_t - p_t)\big]$.
27 Strategic buyers Simple tree structure for fixed valuation. Seller offers price from distribution $P_t$. Surplus of state $s_t = (P_t, H_{t-1}, v_t, p_t)$: $S_t(s_t) = \max_{a_t \in \{0, 1\}} \gamma^{t-1} a_t (v_t - p_t) + \mathbb{E}_{(v_{t+1}, p_{t+1}) \sim D \times f_t(P_t, H_{t-1})}\big[S_{t+1}\big(f_t(P_t, H_{t-1}), H_t, v_{t+1}, p_{t+1}\big)\big]$. Solution found in time $T^{|\mathcal{P}|}$.
28 $\epsilon$-strategic buyers Stop optimizing if all future surplus is at most $\epsilon$. Behave truthfully otherwise. Optimize for $\frac{\log[1 / (\epsilon (1 - \gamma))]}{\log(1 / \gamma)}$ rounds. Tractable MDP.
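The look-ahead horizon of such a buyer follows from bounding the discounted tail of the surplus. The sketch below assumes per-round surplus at most 1 and a discount factor $\gamma < 1$, so that the surplus obtainable after $\tau$ rounds is at most $\gamma^{\tau} / (1 - \gamma)$; it rounds the horizon up to an integer. The function names are hypothetical.

```python
import math


def tail_surplus_bound(tau: int, gamma: float) -> float:
    """Upper bound gamma^tau / (1 - gamma) on surplus collected after tau rounds."""
    return gamma ** tau / (1.0 - gamma)


def optimization_horizon(eps: float, gamma: float) -> int:
    """Smallest integer tau with gamma^tau / (1 - gamma) <= eps (closed form)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    bound = math.log(1.0 / (eps * (1.0 - gamma))) / math.log(1.0 / gamma)
    return max(0, math.ceil(bound))


if __name__ == "__main__":
    eps, gamma = 0.01, 0.9
    tau = optimization_horizon(eps, gamma)
    print(tau, tail_surplus_bound(tau, gamma), tail_surplus_bound(tau - 1, gamma))
```

Because the horizon grows only logarithmically in $1/\epsilon$, the buyer's planning problem stays small even for tight tolerances, although it grows as $\gamma$ approaches 1.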
29 Bandit Formulation Only observe reward of price offered. Minimize pseudo-regret $\mathrm{Reg}_T(A) = \max_{p \in \mathcal{P}} p\,\mathbb{P}(v > p)\,T - \mathbb{E}\big[\sum_{t=1}^{T} a_t p_t\big]$. Problem: rewards not i.i.d. (strategic buyer).
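The benchmark in the pseudo-regret is the best fixed price in hindsight of the distribution. As a small illustration, assuming a discrete valuation distribution given by support points and probabilities (an assumption made only for this sketch), the per-round benchmark $\max_p p\,\mathbb{P}(v > p)$ can be computed as follows; `best_fixed_price` is a hypothetical name.

```python
from typing import Sequence, Tuple


def best_fixed_price(
    prices: Sequence[float],
    valuations: Sequence[float],
    probs: Sequence[float],
) -> Tuple[float, float]:
    """Return (p*, p* . P(v > p*)) over a finite price set."""

    def expected_revenue(p: float) -> float:
        # P(v > p) under the discrete distribution, times the price.
        return p * sum(q for v, q in zip(valuations, probs) if v > p)

    best = max(prices, key=expected_revenue)
    return best, expected_revenue(best)


if __name__ == "__main__":
    p_star, rev = best_fixed_price(
        prices=[0.25, 0.5, 0.75],
        valuations=[0.3, 0.6, 0.9],
        probs=[0.25, 0.5, 0.25],
    )
    print(p_star, rev)
```

Multiplying the returned per-round revenue by $T$ gives the first term of the pseudo-regret.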
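30 Regret bound [MM and Muñoz, 2015] Theorem: Let $\mathcal{P}$ be a finite set of prices. Let $L$ be the number of times the buyer lies. Let $p^* = \operatorname{argmax}_{p \in \mathcal{P}} p\,\mathbb{P}(v > p)$ and $\Delta_p = p^* \mathbb{P}(v > p^*) - p\,\mathbb{P}(v > p)$. For any $\delta > 0$, $\mathrm{Reg}_T \le \mathbb{E}[L] + \sum_{p \colon \Delta_p > \delta} \mathbb{E}[T_p(t)]\,\Delta_p + T\delta$, where $T_p(t)$ is the number of times price $p$ has been offered up to time $t$.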
31 R-UCB Make UCB robust to lies. Use different upper confidence bounds: $\hat{\mu}_p(t) = \frac{1}{T_p(t)} \sum_{i=1}^{t} a_i p_i 1_{p_i = p} + \sqrt{\frac{2 \log t}{T_p(t)}} + \frac{Lp}{T_p(t)}$.
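The index on this slide is the usual UCB index plus an extra term that compensates for up to $L$ lies at price $p$. A minimal sketch of the index and of a greedy selection rule is given below; it assumes each price is offered once before indices are compared, and that `lies_bound` plays the role of $L$. The function names and the initialization rule are illustrative assumptions, not details confirmed by the talk.

```python
import math
from typing import Sequence


def rucb_index(
    reward_sum: float, pulls: int, t: int, price: float, lies_bound: float
) -> float:
    """Empirical mean + UCB exploration term + lie-robustness term L*p/T_p(t)."""
    mean = reward_sum / pulls
    exploration = math.sqrt(2.0 * math.log(t) / pulls)
    lie_bonus = lies_bound * price / pulls
    return mean + exploration + lie_bonus


def select_price(
    prices: Sequence[float],
    reward_sums: Sequence[float],
    pulls: Sequence[int],
    t: int,
    lies_bound: float,
) -> int:
    """Index of the price to offer at round t (1-indexed)."""
    # Offer every price once before comparing indices (assumption).
    for i, n in enumerate(pulls):
        if n == 0:
            return i
    return max(
        range(len(prices)),
        key=lambda i: rucb_index(reward_sums[i], pulls[i], t, prices[i], lies_bound),
    )


if __name__ == "__main__":
    print(rucb_index(reward_sum=2.0, pulls=4, t=1, price=0.5, lies_bound=2.0))
    print(select_price([0.25, 0.5], [0.25, 0.0], [1, 1], t=2, lies_bound=1.0))
```

The extra term shrinks like $1 / T_p(t)$, faster than the exploration term, so a bounded number of lies delays convergence without changing the logarithmic order of exploration.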
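32 Regret analysis Proposition: The regret of the R-UCB algorithm is bounded by $\mathrm{Reg}_T \le \mathbb{E}[L] + \sum_{p \colon \Delta_p > \delta} \Big(4Lp + \frac{32 \log T}{\Delta_p} + 2\Delta_p + \sum_{t=1}^{T} P_t(p, L)\Big) + T\delta$, where $P_t(p, L) := \mathbb{P}\Big(\frac{L_t(p)}{T_p(t)} + \frac{L_t(p^*)}{T_{p^*}(t)} \ge L\Big(\frac{p}{T_p(t)} + \frac{p^*}{T_{p^*}(t)}\Big)\Big)$.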
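33 Bound on Lies An $\epsilon$-strategic buyer lies at most $\Big\lceil \frac{\log(1 / (\epsilon (1 - \gamma)))}{\log(1 / \gamma)} \Big\rceil$ times. Regret of R-UCB in $O\big(\log T + \frac{1}{\log(1 / \gamma)}\big)$. Extension to continuous set of prices by discretization. Regret in $O\big(\sqrt{T} + \frac{T^{1/4}}{\log(1 / \gamma)}\big)$.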
34 Conclusion Analysis of strategic regret. Fixed and random valuation scenarios. Simple algorithms extending truthful scenario. Many questions: Can we extend results to other types of buyers? What about if the buyer learns too? Extension to general auctions?
35 Other Related Questions Can the buyer trust the algorithm announced? testing incentive-compatibility [Lahaie, Muñoz, Sivan, and Vassilvitskii, 2017] (Andrés's talk). Extend analysis to the case where algorithmic details are not known.