COS 511: Theoretical Machine Learning
Lecture #24, May 1, 2014
Lecturer: Rob Schapire
Scribe: Jordan Ash

1 Review of Game Theory

Let $M$ be a matrix with all elements in $[0, 1]$. Mindy (called the row player) chooses row $i$ while Max (called the column player) chooses column $j$. In this case, Mindy's expected loss is
$$\text{Loss} = M(i, j).$$
Alternatively, Mindy could select a move randomly from a distribution $P$ over the rows, and Max could select a move randomly from a distribution $Q$ over the columns. Here, the expected loss for Mindy is
$$\text{Loss} = \sum_{i,j} P(i)\, M(i, j)\, Q(j) = P^\top M Q = M(P, Q).$$
$P$ and $Q$ are called mixed strategies, while $i$ and $j$ are called pure strategies.
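As a concrete illustration (this example is not part of the original notes; it assumes numpy and a made-up Rock-Paper-Scissors loss matrix), the sketch below evaluates the expected loss $M(P, Q) = P^\top M Q$ for a pair of mixed strategies.

```python
# Minimal sketch, assuming numpy and a hypothetical Rock-Paper-Scissors loss matrix.
import numpy as np

# Loss matrix for the row player (Mindy): 0 = win, 1/2 = tie, 1 = loss.
# Rows and columns are ordered rock, paper, scissors.
M = np.array([[0.5, 1.0, 0.0],
              [0.0, 0.5, 1.0],
              [1.0, 0.0, 0.5]])

P = np.array([1/3, 1/3, 1/3])    # Mindy's mixed strategy over the rows
Q = np.array([0.5, 0.25, 0.25])  # Max's mixed strategy over the columns

expected_loss = P @ M @ Q        # sum_{i,j} P(i) M(i,j) Q(j)
print(expected_loss)             # 0.5: the uniform row strategy gets loss 1/2 against any Q
```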

2 Minimax Theorem

In some games, such as Rock, Paper, Scissors, players move at exactly the same time, so both players have the same information available to them at the time of moving. Now suppose instead that Mindy plays first, followed by Max. Max knows the $P$ that Mindy chose, and further knows $M(P, Q)$ for any $Q$ he might choose. Consequently, he chooses a $Q$ that maximizes $M(P, Q)$. Because Mindy knows that Max will choose $Q = \arg\max_Q M(P, Q)$ for whatever $P$ she picks, she selects a $P$ that minimizes $\max_Q M(P, Q)$. Thus, if Mindy goes first, she should expect to suffer a loss of $\min_P \max_Q M(P, Q)$. Overall, it may initially seem that the player who goes second has an advantage, because she has more information available to her. From Mindy's perspective, this suggests
$$\max_Q \min_P M(P, Q) \le \min_P \max_Q M(P, Q),$$
so Mindy playing after Max seems at least as good for her as playing in the reverse order. However, John von Neumann showed that the expected outcome of the game is the same regardless of the order in which the players move:
$$v = \max_Q \min_P M(P, Q) = \min_P \max_Q M(P, Q).$$
Here, $v$ denotes the value of the game. This may seem counterintuitive, because the player who goes second has more information available to her at the time of choosing a move. We will prove the above statement using an online learning algorithm. Let
$$P^* = \arg\min_P \max_Q M(P, Q), \qquad Q^* = \arg\max_Q \min_P M(P, Q).$$
Then
$$\forall Q:\ M(P^*, Q) \le v \qquad (1)$$
$$\forall P:\ M(P, Q^*) \ge v \qquad (2)$$
In other words, if Mindy plays the optimal $P^*$, the maximum loss that Max can cause her is bounded by $v$; and if Max plays the optimal $Q^*$, Mindy's loss is at least $v$, regardless of the particular strategy she chooses. If we had knowledge of $M$, we could find $P^*$ by employing techniques from linear programming. However, we don't necessarily have this knowledge, and even if we did, $M$ could be massively large. Further, $P^*$ is optimal only against a perfectly adversarial opponent, so it does not take advantage of an opponent who might make mistakes. Thus, it makes sense to try to learn to play the game iteratively. We do this with the following formulation:

for $t = 1, \ldots, T$:
    Mindy chooses $P_t$
    Max chooses $Q_t$ (with knowledge of $P_t$)
    Mindy observes $M(i, Q_t)$ for all $i$
    Mindy suffers loss $M(P_t, Q_t)$
end

Clearly, the total loss of this algorithm is simply $\sum_{t=1}^T M(P_t, Q_t)$. We want to compare this loss to the best possible loss that could have been achieved by playing a single fixed strategy on all iterations. In other words, we want to show
$$\sum_{t=1}^T M(P_t, Q_t) \le \min_P \sum_{t=1}^T M(P, Q_t) + [\text{small regret term}].$$

2.1 Multiplicative Updates

Suppose we use a multiplicative weights algorithm that maintains a distribution over the rows and updates it in the following way, where $n$ is the number of rows of the matrix $M$:

$$\beta \in [0, 1) \qquad (3)$$
$$P_1(i) = \frac{1}{n} \quad \forall i \qquad (4)$$
$$P_{t+1}(i) = \frac{P_t(i)\, \beta^{M(i, Q_t)}}{\text{normalizing constant}} \qquad (5)$$

This algorithm is similar to the weighted majority algorithm. The idea is to decrease the probability of choosing a particular row in proportion to the loss suffered by selecting that row. After making an argument using potentials, we can show that this algorithm obtains the following bound:
$$\sum_{t=1}^T M(P_t, Q_t) \le a_\beta \min_P \sum_{t=1}^T M(P, Q_t) + c_\beta \ln(n) \qquad (6)$$
where $a_\beta = \frac{\ln(1/\beta)}{1 - \beta}$ and $c_\beta = \frac{1}{1 - \beta}$.

2.2 Corollary

We can choose $\beta$ such that
$$\frac{1}{T} \sum_{t=1}^T M(P_t, Q_t) \le \min_P \frac{1}{T} \sum_{t=1}^T M(P, Q_t) + \Delta_T \qquad (7)$$
where $\Delta_T = O\!\left(\sqrt{\frac{\ln(n)}{T}}\right)$, which goes to zero for large $T$. In other words, the average loss suffered by Mindy per round approaches the optimal average loss per round. We will use this result to prove the minimax theorem.
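Before moving to the proof, the corollary can be checked numerically. The sketch below (my own illustration, not from the notes; the random loss matrix, the random column sequence, and the particular tuning of $\beta$ are all assumptions) runs update (5) and compares Mindy's average loss to that of the best fixed row, which also attains the minimum over mixed strategies.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 5, 4, 5000
M = rng.random((n, m))                      # an arbitrary loss matrix with entries in [0, 1]
cols = rng.integers(m, size=T)              # Max's columns Q_t, here chosen at random

beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / T))   # one standard tuning (an assumption)
P = np.full(n, 1.0 / n)                     # P_1(i) = 1/n, equation (4)
mindy_loss = np.zeros(T)
row_loss = np.zeros(n)
for t, j in enumerate(cols):
    mindy_loss[t] = P @ M[:, j]             # M(P_t, Q_t)
    row_loss += M[:, j]                     # cumulative loss of each fixed pure row
    P = P * beta ** M[:, j]                 # multiplicative update, equation (5)
    P /= P.sum()                            # the normalizing constant

# The best fixed mixed strategy is attained at a pure row, so compare against the best row.
print(mindy_loss.mean())                    # Mindy's average loss per round
print(row_loss.min() / T)                   # the best fixed strategy's average loss
print(np.sqrt(np.log(n) / T))               # the scale of the Delta_T gap between them
```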

2.3 Proof

Suppose that Mindy uses the above algorithm to choose $P_t$, and that Max chooses $Q_t = \arg\max_Q M(P_t, Q)$, maximizing Mindy's loss. Also, let
$$\bar{P} = \frac{1}{T} \sum_{t=1}^T P_t \qquad (8)$$
$$\bar{Q} = \frac{1}{T} \sum_{t=1}^T Q_t \qquad (9)$$
We already know intuitively, as mentioned before, that $\max_Q \min_P M(P, Q) \le \min_P \max_Q M(P, Q)$, because the player who goes second has more information available to her. To show equality, which would prove the minimax theorem stated earlier, it is enough to show that $\max_Q \min_P M(P, Q) \ge \min_P \max_Q M(P, Q)$ as well:
$$\begin{aligned}
\min_P \max_Q M(P, Q) &\le \max_Q M(\bar{P}, Q) \\
&= \max_Q \frac{1}{T} \sum_{t=1}^T M(P_t, Q) && \text{by definition of } \bar{P} \\
&\le \frac{1}{T} \sum_{t=1}^T \max_Q M(P_t, Q) && \text{by convexity} \\
&= \frac{1}{T} \sum_{t=1}^T M(P_t, Q_t) && \text{by definition of } Q_t \\
&\le \min_P \frac{1}{T} \sum_{t=1}^T M(P, Q_t) + \Delta_T && \text{by Corollary 2.2} \\
&= \min_P M(P, \bar{Q}) + \Delta_T && \text{by definition of } \bar{Q} \\
&\le \max_Q \min_P M(P, Q) + \Delta_T.
\end{aligned}$$
The proof is finished because $\Delta_T$ goes to zero as $T$ gets large. This proof also shows that
$$\max_Q M(\bar{P}, Q) \le v + \Delta_T,$$
where $v = \max_Q \min_P M(P, Q)$. That is, if we take the average of the $P_t$'s computed at each round of the algorithm, we get a strategy whose worst-case loss is within $\Delta_T$ of the value of the game. Because $\Delta_T$ goes to zero for large values of $T$, we can get closer to the optimal strategy simply by increasing $T$; this strategy becomes more and more nearly optimal as the number of rounds increases. For this reason, $\bar{P}$ is called an approximate minmax strategy. A similar argument shows that $\bar{Q}$ is an approximate maxmin strategy.
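The sketch below (again illustrative rather than part of the notes; numpy, a best-responding Max, and the same $\beta$ tuning are assumed) plays out exactly this dynamic on the Rock-Paper-Scissors matrix and reports how close $\bar{P}$ and $\bar{Q}$ come to the value $v = 1/2$.

```python
import numpy as np

def approximate_minimax(M, T=10_000):
    """Run multiplicative weights for Mindy against a best-responding Max."""
    n, m = M.shape
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / T))  # same tuning assumption as before
    P = np.full(n, 1.0 / n)
    P_bar, Q_bar = np.zeros(n), np.zeros(m)
    for _ in range(T):
        j = int(np.argmax(P @ M))       # Q_t: Max's best (pure) response to P_t
        P_bar += P / T                  # running average of the P_t's, equation (8)
        Q_bar[j] += 1.0 / T             # running average of the Q_t's, equation (9)
        P = P * beta ** M[:, j]         # multiplicative update, equation (5)
        P /= P.sum()
    return P_bar, Q_bar

# Rock-Paper-Scissors loss matrix from the earlier sketch; its value is v = 1/2.
M = np.array([[0.5, 1.0, 0.0],
              [0.0, 0.5, 1.0],
              [1.0, 0.0, 0.5]])
P_bar, Q_bar = approximate_minimax(M)
print((P_bar @ M).max())    # max_Q M(P_bar, Q): an approximate minmax guarantee, close to 0.5
print((M @ Q_bar).min())    # min_P M(P, Q_bar): an approximate maxmin guarantee, close to 0.5
```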

3 Relation to Online Learning

To cast our analysis in an online learning framework, consider the following problem setting:

for $t = 1, \ldots, T$:
    Observe $x_t$ from $X$
    Predict $\hat{y}_t \in \{0, 1\}$
    Observe the true label $c(x_t)$
end

Here we consider each hypothesis $h$ from the set of all hypotheses $\mathcal{H}$ as an expert. We want to show that
$$\text{number of mistakes} \le \text{number of mistakes of the best } h + [\text{small regret term}].$$
We set up a game matrix $M$ where $M(i, j) = M(h, x) = 1$ if $h(x) \ne c(x)$ and $0$ otherwise. Thus, the size of this matrix is $|\mathcal{H}| \times |X|$. Given an $x_t$, the algorithm must choose some $P_t$, a distribution over $\mathcal{H}$ used to predict $x_t$'s label: a hypothesis $h$ is chosen according to the distribution $P_t$, and then $\hat{y}_t$ is set to $h(x_t)$. $Q_t$ in this context is the distribution concentrated on $x_t$ (it is $1$ at $x_t$ and $0$ at all other $x \in X$). Consequently,
$$\sum_{t=1}^T M(P_t, x_t) = \mathbb{E}[\text{number of mistakes}] \le \min_h \sum_{t=1}^T M(h, x_t) + [\text{small regret term}].$$
Notice that $\min_h \sum_t M(h, x_t)$ is equal to the number of mistakes made by the best hypothesis $h$. If we substitute $M(P_t, x_t) = \sum_h P_t(h)\, \mathbf{1}\{h(x_t) \ne c(x_t)\} = \Pr_{h \sim P_t}[h(x_t) \ne c(x_t)]$ above, we obtain the same bound as was found in the previous section.
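A toy version of this reduction is sketched below (my own construction, not from the notes: the class of threshold hypotheses, the realizable target $c$, and the $\beta$ tuning are illustrative assumptions). Each hypothesis plays the role of an expert, and each round's column is concentrated on the observed $x_t$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite setting: X = {0, ..., 9}, H = all threshold functions
# h_k(x) = 1{x >= k}, and the target c is itself a threshold (purely illustrative).
X = np.arange(10)
H = [lambda x, k=k: int(x >= k) for k in range(11)]
c = H[4]                                   # the "true" labeling function

T = 2000
n = len(H)
beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / T))
P = np.full(n, 1.0 / n)                    # distribution over hypotheses (the experts)

expected_mistakes = 0.0
mistakes_per_h = np.zeros(n)
for t in range(T):
    x_t = rng.choice(X)                    # observe x_t; Q_t is concentrated on x_t
    losses = np.array([float(h(x_t) != c(x_t)) for h in H])   # the column M(., x_t)
    expected_mistakes += P @ losses        # probability of a mistake when h ~ P_t
    mistakes_per_h += losses
    P = P * beta ** losses                 # multiplicative update over the hypotheses
    P /= P.sum()

print(expected_mistakes)                   # expected number of mistakes of the learner
print(mistakes_per_h.min())                # mistakes of the best hypothesis (0 here)
```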

4 Relation to Boosting

We can think of boosting as a game between the boosting algorithm and the weak learner it calls. Consider the following problem:

for $t = 1, \ldots, T$:
    The boosting algorithm selects a distribution $D_t$ over the training set samples $X$
    The weak learner chooses a hypothesis $h_t$
end

Here we assume that every weak hypothesis $h_t$ obeys the weak learning assumption, i.e. that $\Pr_{(x, y) \sim D_t}[h_t(x) \ne y] \le \frac{1}{2} - \gamma$ for some $\gamma > 0$. We can define the game matrix $M'$ in terms of the matrix $M$ used in the last section. However, here we want a distribution over the samples in $X$ rather than over the hypotheses, so we transpose $M$ and reverse its entries:
$$M' = 1 - M^\top.$$
In other words, $M'(i, j) = M'(x, h) = 1$ if $h(x) = c(x)$ and $0$ otherwise. Here, $P_t = D_t$, and $Q_t$ is the distribution fully concentrated on the particular $h_t$ chosen by the weak learner. We can apply the same analysis from the multiplicative weights algorithm:
$$\frac{1}{T} \sum_{t=1}^T M'(P_t, h_t) \le \min_x \frac{1}{T} \sum_{t=1}^T M'(x, h_t) + \Delta_T.$$
Also, by the weak learning assumption,
$$M'(P_t, h_t) = \sum_x P_t(x)\, \mathbf{1}\{h_t(x) = c(x)\} = \Pr_{x \sim D_t}[h_t(x) = c(x)] \ge \frac{1}{2} + \gamma.$$
Combining these facts,
$$\frac{1}{2} + \gamma \le \frac{1}{T} \sum_{t=1}^T M'(P_t, h_t) \le \min_x \frac{1}{T} \sum_{t=1}^T M'(x, h_t) + \Delta_T.$$
Rearranging,
$$\forall x: \quad \frac{1}{T} \sum_{t=1}^T M'(x, h_t) \ge \frac{1}{2} + \gamma - \Delta_T > \frac{1}{2},$$
where the last inequality holds once $T$ is large enough, because $\Delta_T$ approaches $0$ (and in particular drops below $\gamma$) as $T$ grows. In other words, we have found that more than half of the weak hypotheses correctly classify any given $x$ once $T$ is sufficiently large. Because the final hypothesis is just a majority vote of these weak hypotheses, we have proven that the boosting algorithm drives the training error to zero when enough weak learners are employed.
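To close, here is a toy end-to-end version of this game (my own construction rather than the notes' algorithm: the 3-bit majority dataset, the single-coordinate weak hypotheses, and the $\beta$ tuning are illustrative assumptions). On this dataset some coordinate always agrees with $c$ on at least $2/3$ of any distribution's mass, so the weak learning assumption holds with $\gamma = 1/6$.

```python
import numpy as np
from itertools import product

X = np.array(list(product([0, 1], repeat=3)))   # 8 training points in {0,1}^3
y = (X.sum(axis=1) >= 2).astype(int)            # c(x) = majority of the three bits

def weak_learner(D):
    # Return the coordinate whose value agrees with c(x) on the most D-mass.
    accs = [((X[:, i] == y).astype(float)) @ D for i in range(3)]
    return int(np.argmax(accs))

T, n = 1000, len(X)
beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / T))   # same tuning assumption as before
D = np.full(n, 1.0 / n)                             # D_1 is uniform over the examples
votes = np.zeros(n)                                 # how many h_t are correct on each x
chosen = []
for t in range(T):
    i = weak_learner(D)                             # weak hypothesis h_t(x) = x_i
    chosen.append(i)
    correct = (X[:, i] == y).astype(float)          # M'(x, h_t) for every x
    votes += correct
    D = D * beta ** correct                         # downweight examples h_t got right
    D /= D.sum()

counts = np.bincount(chosen, minlength=3)           # how often each coordinate was chosen
H_pred = ((X * counts).sum(axis=1) > T / 2).astype(int)   # majority vote of h_1, ..., h_T
print((votes / T).min())       # exceeds 1/2 once T is large enough that Delta_T < gamma
print((H_pred != y).mean())    # training error of the majority vote (0.0 for large T)
```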