Regret Minimization against Strategic Buyers

1 Regret Minimization against Strategic Buyers. Mehryar Mohri (Courant Institute & Google Research), Andrés Muñoz Medina (Google Research).

3 Motivation Online advertisement: revenue of modern search engines and popular online sites. billions of transactions every day. key role of revenue optimization algorithms.

4 Motivation Second-price auctions with reserve: widely adopted mechanism in Ad Exchanges. many transactions admit a single bidder, in which case they reduce to posted-price auctions. study of posted-price auctions with strategic buyers.

5 Related Work Revenue optimization in second-price auctions [Cui et al., 2011; He et al., 2013; Cesa-Bianchi et al., 2013; MM and Muñoz, 2014]. Revenue optimization in generalized second-price auctions (GSP) [MM and Muñoz, 2015; Varian, 2007; Lucier et al., 2014; Sun et al., 2014; Rudolph et al., 2016; Charles et al., 2016; Roughgarden and Wang, 2016]. Dynamic pricing [Kanoria and Nazerzadeh, 2014; Bikhchandani and McCardle, 2012; den Boer, 2015; Chen et al., 2015]. Pricing with strategic and patient buyers [Feldman et al., 2016]. Preference reconstruction [Blum et al., 2014]. Learning optimal auctions [Huang et al., 2015; Morgenstern and Roughgarden, 2015].

6 This Talk Scenarios: Fixed valuation. Random valuation.

7 Setup Repeated posted-price auctions: good repeatedly offered for sale by a seller to a single buyer over $T$ rounds. buyer holds private valuation $v \in [0, 1]$. seller offers price $p_t$ and buyer accepts, $a_t = 1$, or rejects, $a_t = 0$, at each round $t \in [T]$.

8 Setup Seller: pricing algorithm $A$. total revenue: $\sum_{t=1}^{T} a_t p_t$. regret: $\mathrm{Reg}_T(A) = vT - \sum_{t=1}^{T} a_t p_t$. Buyer: discounting factor $\gamma \in (0, 1]$. surplus: $\mathrm{Sur}_T(A) = \sum_{t=1}^{T} \gamma^{t-1} a_t (v - p_t)$.
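As a minimal sketch of the three quantities on this slide, the following Python functions compute the seller's revenue and regret and the buyer's discounted surplus from a sequence of offered prices and accept/reject decisions. The function names and the 0-indexed loop are illustrative choices, not part of the talk.

```python
from typing import Sequence


def revenue(accepts: Sequence[int], prices: Sequence[float]) -> float:
    """Total revenue: sum over rounds of a_t * p_t."""
    return sum(a * p for a, p in zip(accepts, prices))


def regret(v: float, accepts: Sequence[int], prices: Sequence[float]) -> float:
    """Regret against selling at the valuation v in every one of the T rounds."""
    return v * len(prices) - revenue(accepts, prices)


def surplus(v: float, gamma: float, accepts: Sequence[int],
            prices: Sequence[float]) -> float:
    """Discounted surplus: sum of gamma^(t-1) * a_t * (v - p_t) for t = 1..T.

    enumerate() starts at 0, so gamma ** i equals gamma^(t-1) for round t = i + 1.
    """
    return sum(gamma ** i * a * (v - p)
               for i, (a, p) in enumerate(zip(accepts, prices)))
```

For instance, with valuation 0.5, prices 0.75, 0.5, 0.25 and decisions reject, accept, accept, the revenue is 0.75 and the regret is 1.5 - 0.75 = 0.75.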

9 Strategic Setting [Amin et al., 2013] Seller announces his algorithm $A$. Buyer acts strategically: he seeks to maximize his surplus $\mathrm{Sur}_T(A)$. Seller seeks to minimize his strategic regret, that is, the regret $\mathrm{Reg}_T(A)$ against a strategic buyer. Question: can we design algorithms for minimizing strategic regret?

10 Truthful Setting Fast Search (FS) algorithm: [Kleinberg and Leighton, 2003] keeps track of feasible interval $[a, b]$, starting with $[0, 1]$ and parameter $\epsilon = \frac{1}{2}$. in each phase, offers prices $a + \epsilon, a + 2\epsilon, \ldots$ until a price is rejected. if price $a + k\epsilon$ is rejected, new phase with interval $[a + (k-1)\epsilon, a + k\epsilon]$ and parameter $\epsilon^2$. until size of the interval less than $\frac{1}{T}$.

11 Truthful Setting Fast Search (FS) algorithm: [Kleinberg and Leighton, 2003] at most $\lceil \log_2 \log_2 T \rceil + 1$ phases. regret in $O(\log \log T)$. lower bound: $\Omega(\log \log T)$.
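The Fast Search description can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: it assumes a truthful buyer who accepts exactly when the price is at most his valuation, it offers the lower end of the interval once the interval is shorter than 1/T, and it adds one convenience rule that is an assumption here (when the next grid point would reach the upper end of the interval, the step is squared without making an offer).

```python
def fast_search(v, T):
    """Run a Fast Search sketch for T rounds against a truthful buyer.

    Returns the list of offered prices and the list of 0/1 decisions.
    """
    a, b, eps = 0.0, 1.0, 0.5          # feasible interval [a, b] and step
    prices, accepts = [], []
    while len(prices) < T:
        exploring = b - a >= 1.0 / T
        if exploring and a + eps >= b:
            # Assumption of this sketch: the grid is exhausted, refine it
            # without spending a round.
            eps *= eps
            continue
        price = a + eps if exploring else a
        accepted = price <= v          # truthful buyer
        prices.append(price)
        accepts.append(int(accepted))
        if exploring:
            if accepted:
                a = price              # valuation is at least this price
            else:
                b = price              # valuation is below this price
                eps *= eps             # new phase with squared parameter
    return prices, accepts
```

Tracking `a` as the last accepted price makes the offer `a + eps` coincide with the next grid point of the current phase, so a rejection leaves exactly the interval between the last accepted and the rejected price.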

12 Example v = 16. Offer $8? No. Offer $4? No. Offer $2?!!! No. Offer $1? YES!

13 Monotone algorithms Algorithm: offer price $p_t = \beta^t$ ($\beta < 1$) until it is accepted. offer accepted price thereafter. idea: slow enough decrease inconvenient for the buyer. Strategic regret in $O\big(\sqrt{T/(1-\gamma)}\big)$. [Amin et al., 2013]
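To illustrate why a slowly decreasing price schedule matters, the sketch below computes by brute force the round at which a surplus-maximizing buyer first accepts under the monotone algorithm, and compares it with the round at which a truthful buyer would accept. It assumes that the buyer keeps accepting the fixed price in every later round; the function names are hypothetical.

```python
def discounted_weight(gamma, start, T):
    """Sum of gamma^(t-1) for t = start..T."""
    if gamma == 1.0:
        return float(T - start + 1)
    return (gamma ** (start - 1) - gamma ** T) / (1.0 - gamma)


def best_acceptance_round(v, gamma, beta, T):
    """Round kappa maximizing the buyer's discounted surplus.

    Accepting first at round kappa fixes the price beta^kappa for rounds
    kappa..T. Returns (None, 0.0) if never accepting is optimal.
    """
    best_round, best_surplus = None, 0.0
    for kappa in range(1, T + 1):
        price = beta ** kappa
        s = discounted_weight(gamma, kappa, T) * (v - price)
        if s > best_surplus:
            best_round, best_surplus = kappa, s
    return best_round, best_surplus


def truthful_acceptance_round(v, beta, T):
    """First round whose price is at most the valuation, or None."""
    for kappa in range(1, T + 1):
        if beta ** kappa <= v:
            return kappa
    return None
```

With v = 0.5, gamma = 0.8, beta = 0.9 and T = 50, the truthful buyer accepts at round 7 while the strategic buyer waits until round 10, which lowers the price the seller collects in every remaining round.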

14 Monotone algorithms Theorem: the strategic regret of any monotone decreasing convex algorithm is in $\Omega(\sqrt{T})$. [MM and Muñoz, 2014]

15 Proof idea Fix monotone function. Choose $v \in [\frac{1}{2}, 1]$ at random. Let $\kappa = \inf\{t : p_t < v\}$, then $E[\kappa] \, E[v - p_\kappa] \geq c$. Tradeoff optimized for $p_t - p_{t+1} \approx \frac{1}{\sqrt{T}}$.

16 Lower Bound [Amin et al., 2013; Kleinberg and Leighton, 2003; MM and Muñoz, 2014] Theorem: for any pricing algorithm $A$, the following lower bound holds: $\mathrm{Reg}_T(A) \geq \max\left(\frac{1}{12(1-\gamma)}, C \log \log T\right)$ for some universal constant $C$.

17 Idea Lie: buyer lies when rejecting with $v > p_t$ or accepting with $v < p_t$. Can we dissuade the buyer from lying? buyer's weakness: time (discounted surplus). penalization: if buyer rejects price, reoffer the price for another $(r - 1)$ rounds. choice of $r$ subject to a trade-off.
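The penalization idea can be sketched on top of a Fast Search style loop. The code below is an illustration under stated assumptions, not the algorithm as published: the buyer is a hypothetical callback `accept(price, t)`, answers given during the r - 1 penalty rounds are recorded but not used to update the interval, and the step is squared without an offer when the grid is exhausted.

```python
def penalized_fast_search(accept, T, r):
    """Fast Search sketch where each rejected price is reoffered r - 1 times.

    accept(price, t) is the buyer's decision at round t (1-indexed).
    Returns the list of offered prices and the list of 0/1 decisions.
    """
    a, b, eps = 0.0, 1.0, 0.5
    prices, accepts = [], []

    def offer(price):
        accepted = bool(accept(price, len(prices) + 1))
        prices.append(price)
        accepts.append(int(accepted))
        return accepted

    while len(prices) < T:
        exploring = b - a >= 1.0 / T
        if exploring and a + eps >= b:
            eps *= eps                 # assumption: refine grid, no offer
            continue
        price = a + eps if exploring else a
        accepted = offer(price)
        if not exploring:
            continue
        if accepted:
            a = price
        else:
            # Penalty: the rejected price is offered for another r - 1 rounds.
            for _ in range(r - 1):
                if len(prices) >= T:
                    break
                offer(price)
            b = price
            eps *= eps
    return prices, accepts
```

Against a truthful buyer the search path is the same for every r, so the only effect of the penalty is that each rejection costs r rounds instead of one; this is the seller's side of the trade-off in the choice of r.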

18 Pricing Strategies Any deterministic strategy can be represented by a tree. [Figure: binary tree of prices with root 1/2, children 1/4 and 3/4, and leaves 1/8, 3/8, 5/8, 7/8.]

19 Meta-Algorithm [Figure: two price trees labeled Truthful and Strategic over the prices 1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8; one branch is annotated "r rounds".]

20 PFS Guarantees [MM and Muñoz, 2014] Theorem: let $\gamma_0 \in (\frac{1}{2}, 1)$; the following strategic regret guarantees hold for Penalized Fast Search (PFS): $\mathrm{Reg}_T(\mathrm{PFS}) = O(\log \log T)$ for $\gamma \in (0, \frac{1}{2}]$; $\mathrm{Reg}_T(\mathrm{PFS}) = O(\log T \log \log T)$ for $\gamma \in (\frac{1}{2}, \gamma_0)$.

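21 Proof Idea At price $p_t$: surplus of rejected path at most $\frac{\gamma^{t+r-1}}{1-\gamma}$. Surplus of accepted path at least $\gamma^{t-1}(v - p_t)$. Rejecting can pay off only if $(v - p_t) \leq \frac{\gamma^r}{1-\gamma}$.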
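The comparison between the two paths can be spelled out as a short derivation. It assumes $\gamma < 1$, that the per-round surplus is at most 1, and that nothing is gained during the penalty rounds; $\delta$ is a hypothetical target gap introduced here only to show how $r$ could be chosen.

```latex
\begin{align*}
\text{rejected path:}\quad
  & \sum_{s=t+r}^{\infty} \gamma^{s-1} \cdot 1 = \frac{\gamma^{t+r-1}}{1-\gamma}, \\
\text{accepted path:}\quad
  & \gamma^{t-1}(v - p_t), \\
\text{rejecting can pay off only if}\quad
  & \gamma^{t-1}(v - p_t) \le \frac{\gamma^{t+r-1}}{1-\gamma}
    \iff v - p_t \le \frac{\gamma^{r}}{1-\gamma}, \\
\frac{\gamma^{r}}{1-\gamma} \le \delta
  & \iff r \ge \frac{\log\big(1/(\delta(1-\gamma))\big)}{\log(1/\gamma)}.
\end{align*}
```

For example, with $\gamma = \frac{1}{2}$ and $\delta = \frac{1}{T}$ the last condition reads $r \ge 1 + \log_2 T$, so a logarithmic number of penalty rounds already confines profitable lies to prices within $\frac{1}{T}$ of the valuation.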

22 Horizon-Indep. Regret Extension of PFS via exponentiating trick to horizon-independent algorithm $\widetilde{\mathrm{PFS}}$: length $T_i$ of $i$th epoch verifies $\log_2 \log_2 T_i = 2^{i-1}$. $\mathrm{Reg}_T(\widetilde{\mathrm{PFS}}) = O(\log \log T)$ for $\gamma \in (0, \frac{1}{2}]$; $\mathrm{Reg}_T(\widetilde{\mathrm{PFS}}) = O(\log T \log \log T)$ for $\gamma \in (\frac{1}{2}, \gamma_0)$. [Drutsa, 2017]

23 Further Improvement PRRFES algorithm: truthful FES algorithm: modified FS; after rejection, reoffer last accepted price $g$ times. same lie penalization as in PFS. continue to offer until rejection. [Drutsa, 2017] strategic regret: $\mathrm{Reg}_T(\mathrm{PRRFES}) = O(\log \log T)$ for $\gamma \in (0, \gamma_0]$.

24 Random valuations

25 Setup [Amin et al., 2013] Repeated posted-price auctions: good repeatedly offered for sale by a seller to a single buyer over $T$ rounds. buyer receives valuation $v_t \in [0, 1]$, $v_t \sim D$. seller offers price $p_t$ and buyer accepts, $a_t = 1$, or rejects, $a_t = 0$, at each round $t \in [T]$.

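26 Setup Seller: pricing algorithm $A$. total revenue: $E\left[\sum_{t=1}^{T} a_t p_t\right]$. regret: $\mathrm{Reg}_T(A) = \max_{p \in \mathcal{P}} p \, P(v > p) \, T - E\left[\sum_{t=1}^{T} a_t p_t\right]$. Buyer: discounting factor $\gamma \in (0, 1]$. surplus: $\mathrm{Sur}_T(A) = E\left[\sum_{t=1}^{T} \gamma^{t-1} a_t (v_t - p_t)\right]$.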
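The benchmark in this regret definition is the best fixed price in hindsight of the distribution. As a small sketch, the function below estimates it from a sample of valuations, replacing the probability that the valuation exceeds the price by an empirical frequency; the function name and the use of a sample are assumptions made for illustration.

```python
def best_fixed_price(prices, valuations):
    """Price maximizing p * (empirical frequency of v > p).

    Returns the best price and its estimated expected revenue per round.
    """
    n = len(valuations)

    def expected_revenue(p):
        return p * sum(1 for v in valuations if v > p) / n

    best = max(prices, key=expected_revenue)
    return best, expected_revenue(best)
```

For valuations spread evenly over (0, 1], the estimated per-round revenue of price p is close to p(1 - p), so among the prices 1/4, 1/2, 3/4 the benchmark picks 1/2.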

27 Strategic buyers Simple tree structure for fixed valuation. Seller offers price from distribution $P_t$. Surplus of state $s_t = (P_t, h_{t-1}, v_t, p_t)$: $S_t(s_t) = \max_{a_t \in \{0,1\}} \gamma^{t-1} a_t (v_t - p_t) + E_{(v_{t+1}, p_{t+1}) \sim D \times f_t(P_t, h_{t-1})}\left[ S_{t+1}(f_t(P_t, h_{t-1}), h_t, v_{t+1}, p_{t+1}) \right]$. Solution found in time $T^{|\mathcal{P}|}$.

28 $\epsilon$-strategic buyers Stop optimizing if all future surplus is at most $\epsilon$. Behave truthfully otherwise. Optimize for $\frac{\log[1/(\epsilon(1-\gamma))]}{\log(1/\gamma)}$ rounds. Tractable MDP.
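The number of rounds over which such a buyer optimizes follows from requiring that the discounted surplus still available n rounds ahead, at most gamma^n / (1 - gamma), drops below the threshold. The sketch below computes the smallest such n; rounding up to an integer and the function name are assumptions of this illustration.

```python
import math


def optimization_horizon(eps, gamma):
    """Smallest n with gamma^n / (1 - gamma) <= eps, for 0 < gamma < 1.

    Assumes eps * (1 - gamma) < 1, so that the horizon is positive.
    """
    return math.ceil(math.log(1.0 / (eps * (1.0 - gamma))) / math.log(1.0 / gamma))
```

With eps = 0.01 and gamma = 0.5 the horizon is 8 rounds: the surplus left after 8 rounds is at most 2/256, which is below 0.01, while after 7 rounds it can still be 2/128.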

29 Bandit Formulation Only observe reward of price offered. Minimize pseudo-regret $\mathrm{Reg}_T(A) = \max_{p \in \mathcal{P}} p \, P(v > p) \, T - E\left[\sum_{t=1}^{T} a_t p_t\right]$. Problem: rewards not i.i.d. (strategic buyer).

30 Regret bound [MM and Muñoz, 2015] Theorem: Let $\mathcal{P}$ be a finite set of prices. Let $L$ be the number of times the buyer lies. Let $p^* = \mathrm{argmax}_{p \in \mathcal{P}} \, p \, P(v > p)$ and $\Delta_p = p^* P(v > p^*) - p \, P(v > p)$. For any $\delta > 0$, $\mathrm{Reg}_T \leq E[L] + \sum_{p : \Delta_p > \delta} E[T_p(t)] \Delta_p + T\delta$, where $T_p(t)$ is the number of times price $p$ has been offered up to time $t$.

31 R-UCB Make UCB robust to lies. Use different upper confidence bounds: $\hat{\mu}_p(t) = \frac{1}{T_p(t)} \sum_{i=1}^{t} a_i p_i 1_{p_i = p} + \sqrt{\frac{2 \log t}{T_p(t)}} + \frac{L p}{T_p(t)}$.
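A minimal sketch of how such an index could drive a UCB-style pricing loop is given below. It assumes that a bound L on the number of lies is known to the seller, that unexplored prices are tried first, and that the buyer is a hypothetical callback `accept(price, t)`; it is an illustration of the index above, not the authors' implementation.

```python
import math


def r_ucb_index(mean_reward, n_offers, t, price, L):
    """Empirical mean plus the usual UCB bonus plus the lie correction L*p/n."""
    if n_offers == 0:
        return math.inf                # force one offer of every price
    bonus = math.sqrt(2.0 * math.log(t) / n_offers)
    return mean_reward + bonus + L * price / n_offers


def r_ucb(prices, accept, T, L):
    """Offer, at each round, the price with the largest robust index.

    Returns the number of offers and the total reward for each price.
    """
    totals = {p: 0.0 for p in prices}
    counts = {p: 0 for p in prices}
    for t in range(1, T + 1):
        def index(p):
            mean = totals[p] / counts[p] if counts[p] else 0.0
            return r_ucb_index(mean, counts[p], t, p, L)

        price = max(prices, key=index)
        a = 1 if accept(price, t) else 0
        totals[price] += a * price     # reward a_t * p_t of the offered price
        counts[price] += 1
    return counts, totals
```

The extra term L * p / n shrinks faster than the square-root bonus, so it mostly affects the early rounds, which is where a buyer who is only willing to lie a bounded number of times can distort the empirical means.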

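32 Regret analysis Proposition: The regret of the R-UCB algorithm is bounded by $\mathrm{Reg}_T \leq E[L] + \sum_{p : \Delta_p > \delta} \left( 4Lp + \frac{32 \log T}{\Delta_p} + 2\Delta_p + \sum_{t=1}^{T} P_t(p, L) \right) + T\delta$, where $P_t(p, L) := P\left( \frac{L_t(p)}{T_p(t)} + \frac{L_t(p^*)}{T_{p^*}(t)} \geq L \left( \frac{p}{T_p(t)} + \frac{p^*}{T_{p^*}(t)} \right) \right)$.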

33 Bound on Lies An $\epsilon$-strategic buyer lies at most $\left\lceil \frac{\log(1/(\epsilon(1-\gamma)))}{\log(1/\gamma)} \right\rceil$ times. Regret of R-UCB in $O(\log T + \cdots)$. Extension to continuous set of prices by discretization. Regret in $O(\sqrt{T} + T^{1/4} \cdots)$.

34 Conclusion Analysis of strategic regret. Fixed and random valuation scenarios. Simple algorithms extending truthful scenario. Many questions: Can we extend results to other types of buyers? What about if the buyer learns too? Extension to general auctions?

35 Other Related Questions Can the buyer trust the algorithm announced? testing incentive-compatibility [Lahaie, Muñoz, Sivan, and Vassilvitskii, 2017] (Andrés's talk). Extend analysis to the case where algorithmic details are not known.
