Regret Minimization and Correlated Equilibria


Algorithmic Game Theory, Summer 2017, Week 4, ETH Zürich

Overview: Regret Minimization and Correlated Equilibria

Paolo Penna

We have seen different types of equilibria and also considered the corresponding price of anarchy. These equilibria have different features:

    CCE ⊇ CE ⊇ MNE ⊇ PNE

- CCE: always exist, easy to find
- MNE: always exist, hard to find
- PNE: may not exist, hard to find

In this lecture we show that coarse correlated equilibria (CCE) are easy to compute. We have also seen that the price-of-anarchy bounds obtained via the smoothness framework extend to CCE. An interesting class of games are congestion games with affine delays: (1) the price of anarchy for PNE is 5/2, but (2) computing a PNE in such games is PLS-complete. Fortunately, (3) the 5/2 bound on the price of anarchy also holds for CCE, and (4) today we will see that CCE can be computed in polynomial time (in any game).

Structure of this lecture:

- How to play against an adversary (regret minimization)
- From regret minimization to CCE (no-regret dynamics)

1 Regret Minimization

The next two sections introduce the main ideas leading to the general definition of regret minimization and the corresponding algorithm. You can jump directly to Section 1.2 for the general results.

1.1 Experts Problem (warm-up)

Consider the following setting. We have m experts that tell us whether tomorrow it will rain (R) or be sunny (S). One of these experts is a real expert, meaning that he/she is never wrong. We do not know who this expert is. Every day we make a prediction based on what the experts tell us. If our prediction is wrong, we incur a cost of 1; otherwise we incur no cost. Here is one algorithm.

Version : October 16, 2017 Page 1 of 8

Majority Algorithm (MAJ): Each day do the following:

- Take the majority of the experts' advice.
- Every time an expert is wrong, discard him/her from future consideration.

Claim 1. The number of mistakes is at most log m, where m is the number of experts.

Proof. Every mistake halves the number of experts that the algorithm takes into account.

What if the best expert makes some mistakes? We could restart the previous algorithm every time we run out of experts. If the (best) expert makes r errors, we are going to make at most (r + 1) log m errors: after each completed phase (in which we discarded all experts), the best expert must have made at least one mistake. So we cannot restart more than r times, and each phase costs us at most log m mistakes (as before).

The main idea of the next algorithm is to keep a weight for each expert and to reduce his/her weight whenever he/she is wrong.

Weighted Majority (WM): Each day do the following:

- w_1(a) ← 1 (initial weights);
- w_{t+1}(a) ← w_t(a) · 1/2 if a errs at step t;
- Take the weighted majority of the experts' advice to decide S or R at step t.

Claim 2. The number of mistakes is at most 2.41 (C_BEST + log m), where m is the number of experts and C_BEST is the number of mistakes of the best expert.

Proof. We work with the total weight W_t := Σ_a w_t(a) and show two things:

1. If the best expert does not make many mistakes, the final weight W_{T+1} is not too small.
2. Every time we make a mistake, W_t drops by a constant factor.

The intuition is that we cannot make too many mistakes if the best expert makes few mistakes. Here is the first step: every time the best expert a* makes a mistake, we halve its weight; therefore

    W_{T+1} ≥ w_{T+1}(a*) = w_1(a*) · (1/2)^{C_BEST} = (1/2)^{C_BEST},

using w_1(a*) = 1.

We claim that every time we make a mistake at step t, we have

    W_{t+1} ≤ (3/4) W_t,

because we halve the weights forming the weighted majority, which account for at least half of W_t, leaving the weighted minority unchanged; hence W_{t+1} ≤ W_t − (1/2)(W_t/2) = (3/4) W_t. Therefore, if r is the number of mistakes we make, then

    W_{T+1} ≤ W_1 · (3/4)^r = m · (3/4)^r.

Combining the two inequalities on W_{T+1} we get

    (1/2)^{C_BEST} ≤ m · (3/4)^r,

and taking the log on both sides we obtain

    r ≤ (1 / log(4/3)) · (C_BEST + log m) ≈ 2.41 · (C_BEST + log m).

1.2 Minimizing External Regret (general setting)

Consider the following problem. There is a single player playing T rounds against an adversary, trying to minimize his cost. In each round, the player chooses a probability distribution over m strategies (also termed actions here). After the player has committed to a probability distribution, the adversary picks a cost vector fixing the cost of each of the m strategies. In round t = 1, ..., T, the following happens:

- The player picks a probability distribution p^t over his strategies.
- The adversary picks a cost vector c^t, specifying a cost c^t(a) ∈ [0, 1] for every strategy a.
- The player picks a strategy using his/her probability distribution p^t, and therefore incurs an expected cost of Σ_a p^t(a) c^t(a). At this point the player gets to know the entire cost vector c^t.

What is the right benchmark for an algorithm in this setting? The best action sequence in hindsight achieves a cost of Σ_{t=1}^T min_a c^t(a). However, getting close to this number is generally hopeless, as the following example shows.

Example 3. Suppose m = 2 and consider an adversary that chooses c^t = (1, 0) if p^t(1) ≥ 1/2 and c^t = (0, 1) otherwise. Then the expected cost of the player is at least T/2, while the best action sequence in hindsight has cost 0.

We will instead compare with the best fixed action over the same period:

    C_BEST := min_a Σ_{t=1}^T c^t(a),

which is nothing but the cost of the best fixed action in hindsight. The algorithm A used by the player to determine the distributions p^t has cost

    C_A := Σ_{t=1}^T Σ_a p^t(a) c^t(a).

Definition 4. The difference between this cost and the cost of the best single strategy in hindsight is called the external regret,

    R_A := C_A − C_BEST.

An algorithm is called a no-external-regret algorithm if for any adversary and all T we have R_A = o(T). This means that on average the cost of a no-external-regret algorithm approaches the one of the best fixed strategy in hindsight, or even beats it:

    C_A / T ≤ C_BEST / T + ɛ_T,   where ɛ_T → 0.

The next example shows that there can be no deterministic no-external-regret algorithm.

Example 5 (Randomization is necessary). Suppose there are m ≥ 2 actions. In each round t the algorithm deterministically commits to a single strategy a. The adversary can then set c^t(a) = 1 and c^t(b) = 0 for all b ≠ a. The total cost of the algorithm will be T, while the cost of the best fixed action in hindsight is at most T/m.

1.3 The Multiplicative-Weights Algorithm

In this section, we will get to know the multiplicative-weights algorithm (also known as randomized weighted majority or Hedge).

Multiplicative Weights Update Algorithm (MW):

- w_1(a) ← 1;
- w_{t+1}(a) ← w_t(a) · (1 − η)^{c^t(a)};
- At time t choose strategy a with probability p^t(a) = w_t(a)/W_t, where W_t = Σ_a w_t(a).   (1)
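Rule (1) takes only a few lines to implement. The following is a minimal Python sketch; the function names and the interface are ours for illustration, not part of the lecture:

```python
def mw_distribution(weights):
    """Turn the weights w_t into the distribution p^t of rule (1)."""
    total = sum(weights)
    return [w / total for w in weights]

def mw_update(weights, costs, eta):
    """One multiplicative-weights update: w_{t+1}(a) = w_t(a) * (1 - eta)^{c^t(a)}."""
    return [w * (1.0 - eta) ** c for w, c in zip(weights, costs)]

def run_mw(cost_vectors, eta):
    """Play T rounds against a fixed sequence of cost vectors in [0, 1]^m
    and return the algorithm's total expected cost sum_t p^t . c^t."""
    m = len(cost_vectors[0])
    weights = [1.0] * m                      # w_1(a) = 1 for every strategy
    expected_cost = 0.0
    for costs in cost_vectors:
        p = mw_distribution(weights)
        expected_cost += sum(pa * ca for pa, ca in zip(p, costs))
        weights = mw_update(weights, costs, eta)
    return expected_cost
```

For any cost sequence with entries in [0, 1] and any η ∈ (0, 1/2], Theorem 6 below guarantees that `run_mw(cost_vectors, eta)` is at most (1 + η) C_BEST + (ln m)/η.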

The algorithm maintains weights w_t(a), which are proportional to the probability that strategy a will be used in round t. After each round, the weights are updated by a multiplicative factor, which depends on the cost in the current round.

1.4 Analysis

The first step is to lower-bound the final weight in terms of the cost of the optimum (the smaller C_BEST is, the larger the final weight must be):

    W_{T+1} ≥ (1 − η)^{C_BEST}.   (2)

Here is the proof of (2): if a* denotes the best fixed action for the costs, C_BEST = Σ_{t=1}^T c^t(a*), then

    W_{T+1} ≥ w_{T+1}(a*) = w_1(a*) · (1 − η)^{c^1(a*)} · (1 − η)^{c^2(a*)} ⋯ (1 − η)^{c^T(a*)} = (1 − η)^{C_BEST}.

The second step is to relate W_{t+1} to the expected cost of the algorithm at time t:

    W_{t+1} ≤ W_t · (1 − η C^t_MW).   (3)

The expected cost of the algorithm at step t is

    C^t_MW := Σ_a p^t(a) c^t(a) = Σ_a (w_t(a) / W_t) c^t(a).

Now observe that

    W_{t+1} = Σ_a w_{t+1}(a) = Σ_a w_t(a) (1 − η)^{c^t(a)} ≤ Σ_a w_t(a) (1 − η c^t(a))   (4)
            = W_t − η W_t C^t_MW,   (5)

where (4) follows from the fact that (1 − η)^x ≤ 1 − ηx for η ∈ [0, 1/2] and x ∈ [0, 1]. This step of the proof uses the hypotheses η ∈ [0, 1/2] and costs c^t(a) ∈ [0, 1].

Now we compare the cost of the algorithm to the optimum:

    (1 − η)^{C_BEST} ≤ W_{T+1} ≤ W_1 · Π_{t=1}^T (1 − η C^t_MW) = m · Π_{t=1}^T (1 − η C^t_MW).

Taking the logarithm on both sides,

    C_BEST · ln(1 − η) ≤ ln m + Σ_{t=1}^T ln(1 − η C^t_MW).

Now we use the Taylor expansion

    ln(1 − x) = −x − x²/2 − x³/3 − ⋯ ;

in particular, ln(1 − η) ≥ −η − η² because η ≤ 1/2, and ln(1 − η C^t_MW) ≤ −η C^t_MW, thus obtaining

    C_BEST · (−η − η²) ≤ ln m − η Σ_{t=1}^T C^t_MW = ln m − η C_MW,

that is,

    C_MW ≤ (1 + η) C_BEST + (ln m)/η ≤ C_BEST + ηT + (ln m)/η,

where the last inequality uses the crude upper bound C_BEST ≤ T (because c^t(a) ≤ 1). Now we can optimize our parameter η knowing T. For η = √(ln m / T) the cost of MW satisfies

    C_MW ≤ C_BEST + 2 √(T ln m).

To summarize, we have proven the following results.

Theorem 6 (Littlestone and Warmuth, 1994). The multiplicative-weights algorithm, for any sequence of cost vectors from [0, 1], guarantees

    C_A ≤ (1 + η) C_BEST + (ln m)/η.

Corollary 7. The multiplicative-weights algorithm with η = √(ln m / T) has external regret at most 2 √(T ln m) = o(T), and hence is a no-external-regret algorithm.

2 Connection to Coarse Correlated Equilibria

Let us now connect this back to cost-minimization games. For this, fix a cost-minimization game. Without loss of generality, assume that all costs are in [0, 1]. We consider no-external-regret dynamics defined as follows. At each time step t = 1, ..., T:

1. Each player i simultaneously and independently chooses a mixed strategy σ^t_i using a no-external-regret algorithm A.
2. Each player i receives a cost vector c^t_i, where c^t_i(s_i) is the expected cost of strategy s_i when the other players play their chosen mixed strategies:

    c^t_i(s_i) := E_{s_{−i} ∼ σ^t_{−i}} [c_i(s_i, s_{−i})].
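For concreteness, these dynamics can be sketched for a two-player game in which both players run the multiplicative-weights algorithm of Section 1.3. The code below and its parameter choices are our illustration, not part of the lecture:

```python
import math

def no_regret_dynamics(cost1, cost2, T):
    """Both players run multiplicative weights on their expected costs.

    cost1[a][b], cost2[a][b]: costs in [0, 1] for players 1 and 2 when they
    play strategies a and b. Returns the list of mixed-strategy profiles
    (sigma^1, ..., sigma^T); the uniform mixture over these profiles is an
    approximate coarse correlated equilibrium."""
    m1, m2 = len(cost1), len(cost2[0])
    eta = math.sqrt(math.log(max(m1, m2)) / T)   # as in Corollary 7
    w1, w2 = [1.0] * m1, [1.0] * m2
    profiles = []
    for _ in range(T):
        s1 = [w / sum(w1) for w in w1]           # sigma_1^t
        s2 = [w / sum(w2) for w in w2]           # sigma_2^t
        profiles.append((s1, s2))
        # cost vectors c_i^t: expected cost given the opponent's mixed strategy
        c1 = [sum(s2[b] * cost1[a][b] for b in range(m2)) for a in range(m1)]
        c2 = [sum(s1[a] * cost2[a][b] for a in range(m1)) for b in range(m2)]
        # multiplicative-weights update (1) for each player
        w1 = [w * (1.0 - eta) ** c for w, c in zip(w1, c1)]
        w2 = [w * (1.0 - eta) ** c for w, c in zip(w2, c2)]
    return profiles
```

By Proposition 8 below, drawing t ∈ [T] uniformly and then sampling each s_i from σ^t_i yields an ɛ-CCE, where ɛ is each player's time-averaged regret, at most 2 √(ln m / T) by Corollary 7.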

Do such dynamics converge to Nash equilibria? Not necessarily. However, on average the players play according to an approximate coarse correlated equilibrium.

Proposition 8. Let σ^1, ..., σ^T be generated by no-external-regret dynamics such that each player's external regret is at most ɛT. Let p be the probability distribution that first selects a single t ∈ [T] uniformly at random and then chooses, for every player i, one s_i according to σ^t_i. Then p is an ɛ-coarse correlated equilibrium.

Proof. By definition, for each player i and each fixed strategy s'_i,

    E_{s ∼ p}[c_i(s)] − E_{s ∼ p}[c_i(s'_i, s_{−i})] = (1/T) Σ_{t=1}^T ( E_{s ∼ σ^t}[c_i(s)] − E_{s ∼ σ^t}[c_i(s'_i, s_{−i})] ) ≤ ɛ,

where the inequality follows by observing that, summed over t, the first term in the summation is the expected cost achieved by the regret-minimization algorithm A and the second term is bounded from below by the cost achieved by the best fixed strategy in hindsight:

    Σ_{t=1}^T E_{s ∼ σ^t}[c_i(s)] = C_A   and   Σ_{t=1}^T E_{s ∼ σ^t}[c_i(s'_i, s_{−i})] ≥ C_BEST.   (6)

(Note that C_A and C_BEST are defined with respect to the costs c^t_i(·) induced by the adversary of player i, that is, by the distributions of all other players.)

Exercise 1. Verify that (6) indeed holds by looking at the definitions of C_A and C_BEST given above.

Exercise 2. Show that an ɛ-CCE can be computed in O(ln m / ɛ²) iterations of the dynamics above. Hint: use the multiplicative-weights update algorithm.

References

The material of this lecture can also be found here:

- Tim Roughgarden, Twenty Lectures on Algorithmic Game Theory, Cambridge University Press, 2016 (Chapter 17 and references therein).
- Alternatively, see Tim Roughgarden's lecture notes: http://theory.stanford.edu/~tim/f13/f13.pdf
- A significant part of these notes is from last year's notes by Paul Dütting, available here: http://www.cadmo.ethz.ch/education/lectures/hs15/agt_hs2015/

Exercises (during the exercise class, 16.10.2017)

We shall discuss and solve these exercises together.

Exercise 3. Each of the following statements is false. Your task is to disprove them (give a counterexample):

1. A pure Nash equilibrium can be computed in the following way: find the state minimizing the social cost (the sum of all players' costs).
2. Suppose we have a game with no pure Nash equilibria. Then there is a mixed Nash equilibrium in which each player assigns strictly positive probability to every strategy.
3. Suppose best-response dynamics converge to a pure Nash equilibrium, no matter the starting state. Then the game is a potential game.

Exercise 4. Consider this symmetric network congestion game with two players:

    [Figure: nodes s and t connected by two parallel edges, labeled "1, 5" and "2, 6".]

(a) What are the price of anarchy and the price of stability for pure Nash equilibria?

(b) What are the price of anarchy and the price of stability for mixed Nash equilibria? Hint: Start by listing all mixed Nash equilibria. To obtain these, start with a sentence like "Let σ be a mixed Nash equilibrium with σ_1 = (λ_1, 1 − λ_1), σ_2 = (λ_2, 1 − λ_2)," and continue by deriving properties of λ_1 and λ_2.

(c) What is the best price-of-anarchy bound that can be shown via smoothness?