Bandit Learning with switching costs
1 Bandit Learning with switching costs
Jian Ding, University of Chicago. Joint with Ofer Dekel (MSR), Tomer Koren (Technion), and Yuval Peres (MSR).
June 2016, Harvard University.
2-4 Online Learning with k Actions
A player (a.k.a. learner) repeatedly chooses among k actions (a.k.a. arms, experts), playing against an adversary (a.k.a. environment).
5-6 A Round
The player picks an action and the adversary assigns losses to the actions (in the figure, e.g., losses 0.6 and 0.2).
7 Finite-Action Online Learning
Problems range from easy to hard to unlearnable along two axes: less feedback, and a more powerful adversary.
Goal: a complete characterization of learning hardness.
8-9 Round t: Two Types of Adversaries
A randomized player faces an adversary. An adaptive adversary takes the player's past actions into account when setting loss values. An oblivious adversary ignores the player's past actions when setting loss values.
10-11 Round t: Two Feedback Models
In the bandit feedback model, the player only sees the loss associated with his own action (one number). In the full feedback model, the player also sees the losses associated with the other actions (k numbers).
12 Examples
Bandit feedback: display one of k news articles to maximize user clicks (only the displayed article's clicks are observed).
Full feedback: invest in one stock each day (every stock's return is observed).
13 More Formally
Setting: a T-round repeated game between a randomized player and a deterministic adaptive adversary.
Notation: the player's action set is X = {1, ..., k}.
Before the game: the adversary chooses a sequence of loss functions f_1, ..., f_T, where f_t : X^t -> [0, 1].
The game: for t = 1, ..., T:
- the player chooses a distribution mu_t over X and draws X_t ~ mu_t;
- the player suffers and observes the loss f_t(X_1, ..., X_t);
- under full feedback, the player also observes the map x -> f_t(X_1, ..., X_{t-1}, x).
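The protocol on this slide can be sketched in code (a minimal sketch; the function and variable names are mine, and the adversary is represented as its pre-chosen list of loss functions):

```python
import random

def play_game(T, k, player, adversary_fns):
    """Run the T-round game between a randomized player and a
    deterministic adversary.

    player(history) -> list of k nonnegative weights (the distribution mu_t);
    adversary_fns[t](x_1, ..., x_t) -> loss in [0, 1], fixed before the game.
    """
    history = []        # realized actions X_1, ..., X_t
    total_loss = 0.0
    for t in range(T):
        mu = player(list(history))
        x = random.choices(range(k), weights=mu)[0]   # X_t ~ mu_t
        history.append(x)
        # Bandit feedback: the player observes only this one number.
        total_loss += adversary_fns[t](*history)
    return total_loss, history
```

A player strategy is any callable mapping the realized history to a weight vector, so both oblivious and adaptive loss sequences fit this interface.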
14 Adaptive vs. Oblivious
Adaptive: f_t : X^t -> [0, 1] can be any function.
Oblivious: the adversary chooses l_1, ..., l_T, where l_t : X -> [0, 1], and sets f_t(x_1, ..., x_t) = l_t(x_t).
(Diagram: oblivious adversaries are a special case of adaptive ones.)
15-16 Loss, Regret
Definition. The player's expected cumulative loss is E[ sum_{t=1}^T f_t(X_1, ..., X_t) ].
Definition. The player's regret w.r.t. the best fixed action is
R(T) = E[ sum_{t=1}^T f_t(X_1, ..., X_t) ] - min_{x in X} sum_{t=1}^T f_t(x, ..., x).
Interpretation: R(T) = o(T) means the player gets better with time.
17 Minimax Regret
Regret measures a specific player's performance; we want to measure the inherent difficulty of the problem.
Definition. The minimax regret R*(T) is the inf over randomized player strategies of the sup over adversary loss sequences of the resulting expected regret.
R*(T) = Theta(sqrt(T)): the problem is easy. R*(T) = Theta(T): the problem is unlearnable.
18 Full + Oblivious, a.k.a. Prediction with Expert Advice
Littlestone & Warmuth (1994), Freund & Schapire (1997).
The Multiplicative Weights algorithm: sample X_t from mu_t, where
mu_t(i) proportional to exp( -gamma * sum_{j=1}^{t-1} l_j(i) ).
Theorem. gamma = 1/sqrt(T) yields R(T) = O(sqrt(T log k)).
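The multiplicative weights rule above admits a short sketch under full feedback (the helper name is mine; losses arrive as explicit vectors, which is exactly the full-feedback setting):

```python
import math
import random

def multiplicative_weights(loss_vectors, gamma):
    """Play T rounds of full-feedback prediction: sample X_t from
    mu_t(i) proportional to exp(-gamma * sum_{j<t} l_j(i)), then
    observe the whole loss vector l_t. Returns the total loss."""
    k = len(loss_vectors[0])
    cum = [0.0] * k                       # cumulative losses of each action
    total = 0.0
    for l in loss_vectors:
        w = [math.exp(-gamma * c) for c in cum]
        z = sum(w)
        mu = [wi / z for wi in w]
        x = random.choices(range(k), weights=mu)[0]
        total += l[x]
        # Full feedback: all k losses are observed and accumulated.
        cum = [c + li for c, li in zip(cum, l)]
    return total
```

On a sequence where one action always has zero loss, the distribution concentrates on it exponentially fast, which is the mechanism behind the O(sqrt(T log k)) bound.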
19 Bandit + Oblivious, a.k.a. the Adversarial Multiarmed Bandit Problem
Auer, Cesa-Bianchi, Freund, Schapire (2002).
The EXP3 algorithm: run the weighted majority algorithm with estimates of the full feedback vectors,
hat_l_t(i) = l_t(i) / mu_t(i) if i = X_t, and 0 otherwise.
Theorem. Since E[hat_l_t(i)] = l_t(i), this yields R(T) = O(sqrt(T k)).
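A minimal sketch of this importance-weighting idea (the function name is mine, and for simplicity it omits the explicit uniform-exploration mixing used in the original EXP3 analysis):

```python
import math
import random

def exp3(loss_vectors, gamma):
    """Bandit feedback: observe only l_t(X_t) and feed multiplicative
    weights the importance-weighted estimate
    hat_l_t(i) = l_t(i)/mu_t(i) if i == X_t else 0, which is unbiased."""
    k = len(loss_vectors[0])
    cum = [0.0] * k                 # cumulative *estimated* losses
    total = 0.0
    for l in loss_vectors:
        w = [math.exp(-gamma * c) for c in cum]
        z = sum(w)
        mu = [wi / z for wi in w]
        x = random.choices(range(k), weights=mu)[0]
        total += l[x]               # only this one number is observed
        cum[x] += l[x] / mu[x]      # importance-weighted estimate
    return total
```

Dividing by mu_t(X_t) makes the estimate unbiased for every action, at the price of higher variance, which is where the extra sqrt(k) factor in the regret comes from.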
20 Adaptive Obstacle
(Arora, Dekel, Tewari 2012) Against an adaptive adversary, R*(T) = Theta(T) in any feedback model.
Proof sketch: w.l.o.g. assume mu_1(1) > 0. Define f_t(x_1, ..., x_t) = 1 if x_1 = 1, and 0 otherwise. The comparator can play an action other than 1 and suffer 0 loss, so this loss sequence guarantees expected regret mu_1(1) * T; since some action has first-round probability at least 1/k, the adversary can always force regret T/k.
21 The Characterization (so far)
Oblivious adversary (bandit or full feedback): Theta(sqrt(T)), easy. Adaptive adversary: Theta(T), unlearnable.
(Axes: less feedback; more powerful adversary.)
Boring: the feedback models seem to be equivalent (when k = 2, say).
22 Adding a Switching Cost
The switching cost adversary chooses l_1, ..., l_T, where l_t : X -> [0, 1], and sets
f_t(x_1, ..., x_t) = (1/2) * ( l_t(x_t) + 1{x_t != x_{t-1}} ).
The Follow the Lazy Leader algorithm (Kalai-Vempala 2005) guarantees R(T) = O(sqrt(T)) under full information; so does Shrinking the Dartboard (Geulen-Vöcking-Winkler 2010).
(Diagram: the switching cost adversary sits between oblivious and adaptive.)
23 The m-Memory Adversary; Counterfactual Feedback
The m-memory adversary defines loss functions that depend only on the m+1 most recent actions:
f_t(x_1, ..., x_t) = f'_t(x_{t-m}, ..., x_t).
A third feedback model: in the counterfactual feedback model, the player receives the entire loss function f_t.
Merhav et al. (2002) proved R(T) = O(T^{2/3}); Gyorgy & Neu (2011) improved this to R(T) = O(sqrt(T)).
24 Adversaries and Feedbacks
(Diagram.) Adversaries, ordered by power: oblivious, switching, m-memory, adaptive. Feedback models, ordered by informativeness: bandit, full, counterfactual.
25-26 The Characterization (so far)
Oblivious adversary: Theta(sqrt(T)), easy (all feedback models). Adaptive adversary: Theta(T), unlearnable.
Switching cost adversary with bandit feedback: Omega(sqrt(T)) and O(T^{2/3}); easy? hard? (Arora, Dekel, Tewari 2012.)
(Axes: less feedback; more powerful adversary.)
27 The Characterization (so far)
Switching cost adversary with bandit feedback: Theta(T^{2/3}), hard.
Cesa-Bianchi, Dekel, Shamir (2013) (unbounded losses); Dekel, Ding, Koren, Peres (2013).
29 Bandit + Switching: Upper Bound
Algorithm: split the T rounds into T/B blocks of length B; use EXP3 to choose one action x̂_j for each entire block; the feedback to EXP3 is the average loss in the block.
Regret analysis:
R(T) <= T/B (switches) + B * O(sqrt(T/B)) (loss) = O( T/B + sqrt(T*B) ).
Minimized by selecting B = T^{1/3}, yielding R(T) = O(T^{2/3}).
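The trade-off in the regret analysis can be checked numerically (a sketch; `blocked_regret_bound` is my name for the order-of-magnitude bound T/B + sqrt(T*B), with constants dropped):

```python
def blocked_regret_bound(T, B):
    """Order-of-magnitude regret of the blocking scheme: about T/B
    switches (each costing at most 1) plus EXP3's O(sqrt(T/B)) regret
    over T/B block-plays, scaled by the block length B."""
    return T / B + (T * B) ** 0.5

# Minimize the bound over integer block lengths for a fixed horizon.
T = 10 ** 6
best_B = min(range(1, 5001), key=lambda B: blocked_regret_bound(T, B))
```

For T = 10^6 the minimizer lands at B on the order of T^{1/3} = 100 (up to a constant factor), and the minimized bound is within a small constant factor of T^{2/3} = 10^4.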
30 Bandit + Switching: Lower Bound
Yao's minimax principle (1977): the expected regret of the best deterministic algorithm on a random loss sequence lower-bounds the worst-case expected regret of any randomized algorithm.
Goal: find a random loss sequence on which every deterministic algorithm has expected regret Omega(T^{2/3}).
For simplicity, assume k = 2.
31 Bandit + Switching: Lower Bound
Cesa-Bianchi, Dekel, Shamir (2013): a random walk construction. Let (S_t) be a Gaussian random walk and let epsilon = T^{-1/3}. Randomly choose an action and assign it the loss sequence (S_t); the other action gets (S_t + epsilon).
Key: 1/epsilon^2 = T^{2/3} switches are required before determining which action is worse.
Drawback: the loss function is unbounded, and unbounded losses are hard; is the hardness an artifact of unboundedness?
32 Multi-Scale Random Walk
Define the loss of action 1:
- Draw independent Gaussians xi_1, ..., xi_T ~ N(0, sigma^2).
- For each t, define a parent rho(t) in {0, ..., t-1}.
- Define recursively: L_0 = 1/2, L_t = L_{rho(t)} + xi_t.
(Figure: a tree over L_0, ..., L_7 with edges labeled xi_1, ..., xi_7; the values L_t are the loss of action 1.)
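The recursive definition above is easy to simulate (a sketch; the names are mine):

```python
import random

def multiscale_walk(T, sigma, parent):
    """L_0 = 1/2 and, recursively, L_t = L_{rho(t)} + xi_t, where
    xi_t ~ N(0, sigma^2) and rho = parent maps t into {0, ..., t-1}."""
    L = [0.5]                                      # L_0
    for t in range(1, T + 1):
        L.append(L[parent(t)] + random.gauss(0.0, sigma))
    return L
```

With `parent = lambda t: t - 1` this is an ordinary Gaussian random walk; with `parent = lambda t: 0` the values are i.i.d., matching the "deep" and "wide" examples on the next slide.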
33 Examples
rho(t) = t - gcd(t, 2^T): the multi-scale parent (figure: tree over L_0, ..., L_7).
rho(t) = 0: all increments hang off the root; the tree is wide.
rho(t) = t - 1: a standard random walk; the tree is deep.
34 The Second Action
Define the loss of action 2: draw a random sign chi with Pr(chi = +1) = Pr(chi = -1) = 1/2, and set L'_t = L_t + chi * epsilon, where epsilon = T^{-1/3}.
(Figure: the two loss curves, action 1 (L_t) and action 2 (L'_t), separated by the gap epsilon.)
If the player chooses the worse action Theta(T) times, then R(T) = Omega(T^{2/3}).
35-38 The Information in One Sample
To avoid choosing the worse action Theta(T) times, the algorithm must identify the value of chi.
Fact. Q: How many samples are needed to estimate the mean of a Gaussian to accuracy epsilon? A: (sigma/epsilon)^2.
(Figures: the tree over L_0, ..., L_7 with the player's action marked at each node. An edge whose endpoints were played with the same action carries no information about chi; an edge whose endpoints were played with different actions, a red edge, provides one sample of the gap.)
How many red edges are needed? Since (sigma/epsilon)^2 = sigma^2 * T^{2/3}, the player needs at least sigma^2 * T^{2/3} of them.
39 Counting the Information
Define width(rho) as the maximum size of a vertical cut in the graph induced by rho. (Figure: a cut of the example tree with width(rho) = 3.)
Lemma. A switch contributes at most width(rho) samples.
40 Depth
Define depth(rho) as the length of the longest path in the graph induced by rho. (Figure: the example tree over L_0, ..., L_7.)
For the loss to remain bounded in [0, 1], set sigma ≈ 1/sqrt(depth(rho)).
41-42 Putting It All Together
- sigma^2 * T^{2/3} samples are needed;
- each switch gives at most width(rho) samples;
- keeping the loss bounded in [0, 1] forces sigma^2 ≈ 1/depth(rho).
Conclusion: the number of switches needed to determine the better action is at least T^{2/3} / ( width(rho) * depth(rho) ).
Choose rho(t) = t - gcd(t, 2^T).
Lemma. depth(rho) <= log(T) and width(rho) <= log(T) + 1.
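The lemma can be checked numerically for a small horizon (a sketch; `depth` and `width` are my straightforward implementations of the definitions on the preceding slides, with logs base 2):

```python
from math import gcd

def rho(t, T):
    """The chosen parent: t minus its lowest set bit, i.e. t - gcd(t, 2^T)."""
    return t - gcd(t, 2 ** T)

def depth(T):
    """Length of the longest chain t -> rho(t) -> ... -> 0."""
    def chain(t):
        d = 0
        while t > 0:
            t, d = rho(t, T), d + 1
        return d
    return max(chain(t) for t in range(1, T + 1))

def width(T):
    """Maximum number of edges (rho(t), t] crossing any vertical cut."""
    return max(sum(1 for t in range(1, T + 1) if rho(t, T) <= u < t)
               for u in range(T))
```

For T = 64 this gives depth 6 = log2(64) and width 7 = log2(64) + 1, matching the lemma: stripping the lowest set bit means the chain length of t is its binary popcount, and each bit level contributes at most one edge to any cut.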
43 Corollaries & Extensions
Corollary: exploration requires switching; e.g., EXP3 switches Theta(T) times.
Dependence on k: the minimax regret of the multiarmed bandit with switching costs is Theta(T^{2/3} k^{1/3}).
Implications for other models: the minimax regret of learning an adversarial deterministic MDP is Theta(T^{2/3}).
44 Summary
- A complete characterization of learning hardness.
- There exist online learning problems that are hard yet learnable.
- Learning with bandit feedback can be strictly harder than learning with full feedback.
- Exploration requires extensive switching.
45 The End
(Figure: the final easy / hard / unlearnable characterization diagram.)
More informationThe Value of Stochastic Modeling in Two-Stage Stochastic Programs
The Value of Stochastic Modeling in Two-Stage Stochastic Programs Erick Delage, HEC Montréal Sharon Arroyo, The Boeing Cie. Yinyu Ye, Stanford University Tuesday, October 8 th, 2013 1 Delage et al. Value
More informationCommitment in First-price Auctions
Commitment in First-price Auctions Yunjian Xu and Katrina Ligett November 12, 2014 Abstract We study a variation of the single-item sealed-bid first-price auction wherein one bidder (the leader) publicly
More informationOnline Algorithms SS 2013
Faculty of Computer Science, Electrical Engineering and Mathematics Algorithms and Complexity research group Jun.-Prof. Dr. Alexander Skopalik Online Algorithms SS 2013 Summary of the lecture by Vanessa
More informationComputing Unsatisfiable k-sat Instances with Few Occurrences per Variable
Computing Unsatisfiable k-sat Instances with Few Occurrences per Variable Shlomo Hoory and Stefan Szeider Abstract (k, s)-sat is the propositional satisfiability problem restricted to instances where each
More informationEssays on Some Combinatorial Optimization Problems with Interval Data
Essays on Some Combinatorial Optimization Problems with Interval Data a thesis submitted to the department of industrial engineering and the institute of engineering and sciences of bilkent university
More information1 Overview. 2 The Gradient Descent Algorithm. AM 221: Advanced Optimization Spring 2016
AM 22: Advanced Optimization Spring 206 Prof. Yaron Singer Lecture 9 February 24th Overview In the previous lecture we reviewed results from multivariate calculus in preparation for our journey into convex
More informationMonte-Carlo Planning: Basic Principles and Recent Progress
Monte-Carlo Planning: Basic Principles and Recent Progress Alan Fern School of EECS Oregon State University Outline Preliminaries: Markov Decision Processes What is Monte-Carlo Planning? Uniform Monte-Carlo
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Markov Decision Processes II Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC
More informationDecision Making in Uncertain and Changing Environments
Decision Making in Uncertain and Changing Environments Karl H. Schlag Andriy Zapechelnyuk June 18, 2009 Abstract We consider an agent who has to repeatedly make choices in an uncertain and changing environment,
More informationHandout 4: Deterministic Systems and the Shortest Path Problem
SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 4: Deterministic Systems and the Shortest Path Problem Instructor: Shiqian Ma January 27, 2014 Suggested Reading: Bertsekas
More informationOptimal Order Placement
Optimal Order Placement Peter Bank joint work with Antje Fruth OMI Colloquium Oxford-Man-Institute, October 16, 2012 Optimal order execution Broker is asked to do a transaction of a significant fraction
More informationMartingales. by D. Cox December 2, 2009
Martingales by D. Cox December 2, 2009 1 Stochastic Processes. Definition 1.1 Let T be an arbitrary index set. A stochastic process indexed by T is a family of random variables (X t : t T) defined on a
More informationRegret Minimization against Strategic Buyers
Regret Minimization against Strategic Buyers Mehryar Mohri Courant Institute & Google Research Andrés Muñoz Medina Google Research Motivation Online advertisement: revenue of modern search engine and
More informationCooperative Games with Monte Carlo Tree Search
Int'l Conf. Artificial Intelligence ICAI'5 99 Cooperative Games with Monte Carlo Tree Search CheeChian Cheng and Norman Carver Department of Computer Science, Southern Illinois University, Carbondale,
More informationResponse Regret. Martin Zinkevich University of Alberta Department of Computing Science. Fundamentals of Game Theory
Response Regret Martin Zinkevich University of Alberta Department of Computing Science Abstract The concept of regret is designed for the long-term interaction of multiple agents. However, most concepts
More informationAll-Pay Contests. (Ron Siegel; Econometrica, 2009) PhDBA 279B 13 Feb Hyo (Hyoseok) Kang First-year BPP
All-Pay Contests (Ron Siegel; Econometrica, 2009) PhDBA 279B 13 Feb 2014 Hyo (Hyoseok) Kang First-year BPP Outline 1 Introduction All-Pay Contests An Example 2 Main Analysis The Model Generic Contests
More informationHow to Buy Advice. Ronen Gradwohl Yuval Salant. First version: January 3, 2011 This version: September 20, Abstract
How to Buy Advice Ronen Gradwohl Yuval Salant First version: January 3, 2011 This version: September 20, 2011 Abstract A decision maker, whose payoff is influenced by an unknown stochastic process, seeks
More informationOutline Introduction Game Representations Reductions Solution Concepts. Game Theory. Enrico Franchi. May 19, 2010
May 19, 2010 1 Introduction Scope of Agent preferences Utility Functions 2 Game Representations Example: Game-1 Extended Form Strategic Form Equivalences 3 Reductions Best Response Domination 4 Solution
More informationBounded computational capacity equilibrium
Available online at www.sciencedirect.com ScienceDirect Journal of Economic Theory 63 (206) 342 364 www.elsevier.com/locate/jet Bounded computational capacity equilibrium Penélope Hernández a, Eilon Solan
More informationTDT4171 Artificial Intelligence Methods
TDT47 Artificial Intelligence Methods Lecture 7 Making Complex Decisions Norwegian University of Science and Technology Helge Langseth IT-VEST 0 helgel@idi.ntnu.no TDT47 Artificial Intelligence Methods
More informationREPUTATION WITH LONG RUN PLAYERS AND IMPERFECT OBSERVATION
REPUTATION WITH LONG RUN PLAYERS AND IMPERFECT OBSERVATION ALP E. ATAKAN AND MEHMET EKMEKCI Abstract. Previous work shows that reputation results may fail in repeated games between two long-run players
More informationOptimal online-list batch scheduling
Optimal online-list batch scheduling Paulus, J.J.; Ye, Deshi; Zhang, G. Published: 01/01/2008 Document Version Publisher s PDF, also known as Version of Record (includes final page, issue and volume numbers)
More informationUNIVERSITY OF VIENNA
WORKING PAPERS Ana. B. Ania Learning by Imitation when Playing the Field September 2000 Working Paper No: 0005 DEPARTMENT OF ECONOMICS UNIVERSITY OF VIENNA All our working papers are available at: http://mailbox.univie.ac.at/papers.econ
More informationCoordination Games on Graphs
CWI and University of Amsterdam Based on joint work with Mona Rahn, Guido Schäfer and Sunil Simon : Definition Assume a finite graph. Each node has a set of colours available to it. Suppose that each node
More informationReinforcement Learning
Reinforcement Learning MDP March May, 2013 MDP MDP: S, A, P, R, γ, µ State can be partially observable: Partially Observable MDPs () Actions can be temporally extended: Semi MDPs (SMDPs) and Hierarchical
More informationInformation Aggregation in Dynamic Markets with Strategic Traders. Michael Ostrovsky
Information Aggregation in Dynamic Markets with Strategic Traders Michael Ostrovsky Setup n risk-neutral players, i = 1,..., n Finite set of states of the world Ω Random variable ( security ) X : Ω R Each
More information