Learning for Revenue Optimization. Andrés Muñoz Medina Renato Paes Leme

1 Learning for Revenue Optimization Andrés Muñoz Medina Renato Paes Leme

2 How to succeed in business with basic ML? [Figure: buyers labeled with dollar valuations, and a plot of revenue against price.]

3 Complications. What if the seller only sees a sample of the population? What if the seller doesn't know every buyer's valuation? Can buyers lie and not report their true valuation? What if valuations change as a function of features?

4 Outline. Online revenue optimization. Batch revenue optimization.

5 Various flavors of this problem. One buyer (pricing) vs multiple buyers (auctions). Fixed valuations (realizable), random valuations (stochastic) and worst-case valuations (adversarial). Contextual vs non-contextual. Strategic vs myopic buyers.

6 Definitions. Valuation (\(v\)): what a buyer is willing to pay for a good. Bid: how much a buyer claims she is willing to pay. Reserve price (\(p\)): minimum price acceptable to the seller. Revenue (\(\mathrm{Rev}\)): how much the seller gets from selling. Interactions (\(T\)): number of times buyer and seller interact.

7 Single buyer. Valuation \(v_t\) = maximum willingness to pay. Reserve price \(p_t\). Myopic (price-taking) buyer: buys whenever \(v_t \ge p_t\), i.e. doesn't reason about the consequences of the purchasing decision; the revenue function is \(\mathrm{Rev}(p_t, v_t) = p_t \mathbb{1}_{v_t \ge p_t}\). Strategic buyer: reasons about how purchasing decisions affect future prices.
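
The revenue function for a myopic buyer can be written down directly. The sketch below is illustrative only: the function names and the list of valuations are assumptions, not taken from the slides. It also finds the best single posted price for a known set of valuations, which is the full-information problem that the later slides complicate.

```python
def revenue(price, valuation):
    """Rev(p, v) = p * 1[v >= p] for a myopic (price-taking) buyer."""
    return price if valuation >= price else 0


def best_fixed_price(valuations):
    """Return (price, total revenue) of the best single posted price."""
    best_price, best_total = 0, 0
    # Some optimal price always coincides with one of the valuations,
    # so it is enough to try each distinct valuation as a price.
    for price in sorted(set(valuations)):
        total = sum(revenue(price, v) for v in valuations)
        if total > best_total:
            best_price, best_total = price, total
    return best_price, best_total


valuations = [1, 5, 7, 8, 9, 10]  # hypothetical buyers, in dollars
print(best_fixed_price(valuations))  # prints (7, 28)
```

With these hypothetical numbers, a price of 7 sells to four buyers and beats both the higher and the lower candidate prices.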

8 Single myopic buyer. Realizable setting: the valuation is fixed but unknown, \(v_t = v \in [0, 1]\). Stochastic setting: valuations are sampled from an unknown distribution, \(v_t \sim D\). Adversarial setting: no assumption is made on the valuations. Seller's goal: minimize regret.

9 Single myopic buyer [Figure: the seller offers prices $5, $6, $4, $7 and the buyer answers Yes, Yes, Yes, No.]

10 Fixed valuation. \(v_t = v \in [0, 1]\). Regret: \(R = Tv - \sum_{t=1}^{T} \mathrm{Rev}(p_t, v_t)\)

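11 Binary Search. At round \(k\): \(S_k = [a_k, a_k + \Delta_k]\), \(s = 0\) and \(\Delta_{k+1} = \Delta_k / 2\). While the price is accepted: \(p_t = a_k + s \Delta_{k+1}\); \(s = s + 1\). Rejection: start a new round; \(a_{k+1}\) is the last accepted price. When \(\Delta_k < 1/T\): stop and offer \(p_t = a_k\) for all remaining \(t \le T\). [Figure: number line showing the offered prices \(p_t\), the valuation \(v\), and the points \(a_k\), \(a_{k+1}\), \(a_{k+1} + \Delta_{k+1}\), \(a_k + \Delta_k\).]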
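
A minimal sketch of the binary search idea, assuming a myopic buyer with a fixed valuation in [0, 1]. It uses plain bisection on an interval `[lo, hi]` rather than the exact round bookkeeping of the slide, and the function name is hypothetical. Every rejected price costs the seller the full valuation for that round, which is why the number of rejections drives the regret.

```python
def binary_search_regret(v, T):
    """Post T prices to a myopic buyer with fixed valuation v; return the regret."""
    lo, hi = 0.0, 1.0              # v is known to lie in [lo, hi]
    collected = 0.0
    for _ in range(T):
        if hi - lo < 1.0 / T:
            price = lo             # interval is small enough: exploit a safe price
        else:
            price = (lo + hi) / 2  # explore: bisect the interval
        if v >= price:             # myopic buyer accepts iff v >= price
            collected += price
            lo = max(lo, price)
        else:
            hi = price
    return T * v - collected


print(binary_search_regret(0.7, 1000))
```

About log2(T) exploration rounds are needed before the interval is shorter than 1/T, so the regret of this variant grows logarithmically in T.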

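12 Fast Search [Kleinberg and Leighton 2007]. At round \(k\): \(S_k = [a_k, a_k + \Delta_k]\), \(s = 0\) and \(\Delta_{k+1} = \Delta_k^2\). While the price is accepted: \(p_t = a_k + s \Delta_{k+1}\); \(s = s + 1\). Rejection: start a new round; \(a_{k+1}\) is the last accepted price. When \(\Delta_k < 1/T\): stop and offer \(p_t = a_k\) for all remaining \(t \le T\). [Figure: number line showing the offered prices \(p_t\), the valuation \(v\), and the points \(a_k\), \(a_{k+1}\), \(a_{k+1} + \Delta_{k+1}\), \(a_k + \Delta_k\).]

13 Kleinberg and Leighton search. Analysis: in each round there is at most one no-sale; for each sale, the regret is at most \(\Delta_k\); there are at most \(\Delta_k / \Delta_{k+1} = 1/\Delta_k\) sales, so the total regret per round is \(O(1)\). Since there are \(O(\log\log T)\) rounds before \(\Delta_k < 1/T\), the total regret is \(O(\log\log T)\).

14 Kleinberg and Leighton search. Regret \(R \in O(\log\log T)\). Lower bound \(\Omega(\log\log T)\).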
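
The fast search can be sketched as follows, under a few assumptions that are not stated on the slides: the first round uses a step of 1/2 on the interval [0, 1], the buyer is myopic with a fixed valuation, and the function name is hypothetical. Within a round the price climbs in small steps until the first rejection, then the step is squared.

```python
def fast_search_regret(v, T):
    """Squared-step search against a myopic buyer with fixed valuation v in [0, 1].

    Returns (regret, number of exploration rounds used).
    """
    lo, width, step = 0.0, 1.0, 0.5    # v lies in [lo, lo + width]
    collected, t = 0.0, 0
    while t < T and width >= 1.0 / T:
        price = lo + step
        t += 1
        if v >= price:                 # sale: regret is at most the current width
            collected += price
            lo = price
        else:                          # the single no-sale that ends this round
            width, step = step, step * step
    collected += (T - t) * lo          # remaining rounds: offer the safe price lo
    return T * v - collected, t


regret, explored = fast_search_regret(0.7, 10**6)
print(regret, explored)
```

Because the width is squared after every rejection, only a handful of rejections occur before the width drops below 1/T, in contrast with the roughly log2(T) rejections that bisection can incur.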

15 Multiple valuations

16 Bandits. Expected revenue curve: \(R(p) = \mathbb{E}_v[\mathrm{Rev}(p, v)]\). Discretize the price range [0, 1], then apply a bandit algorithm (EXP3, UCB). [Figure: plot of \(R(p)\) against \(p\) on [0, 1].]

17 Random valuation. Valuation: \(v_t \sim D\). Regret: \(R = T \max_p \mathbb{E}_{v_t \sim D}[\mathrm{Rev}(p, v_t)] - \mathbb{E}\big[\sum_{t=1}^{T} \mathrm{Rev}(p_t, v_t)\big]\). General strategy: discretize prices and treat each price as a bandit arm. Without any assumptions the regret is \(\tilde{O}(T^{2/3})\): balance the discretization error and the error in UCB. This can be improved for special families of distributions.

18 Random valuation. Expected revenue function \(\mathbb{E}_{v \sim D}[\mathrm{Rev}(p, v)]\) is unimodal. Unimodal Lipschitz bandits [Combes, Proutiere 2014]: \(\tilde{O}(\sqrt{T})\). If the revenue curve is quadratic around the maximum, then Kleinberg and Leighton also give a \(\tilde{O}(\sqrt{T})\) regret algorithm, which is tight in this class.

19 Adversarial Valuations. Compete against the best fixed price policy: \(R = \mathbb{E}\big[\max_p \sum_{t=1}^{T} \mathrm{Rev}(p, v_t) - \sum_{t=1}^{T} \mathrm{Rev}(p_t, v_t)\big]\). General approach: discretize prices in \(K\) intervals and treat each as an arm. Use EXP3 [Kleinberg and Leighton 07]: \(R = \tilde{O}(\sqrt{KT}) + O(T/K) = \tilde{O}(T^{2/3})\), where the first term is the EXP3 regret and the second is the discretization regret; the two balance at \(K = T^{1/3}\).
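
A minimal sketch of the discretize-then-bandit strategy, using the textbook UCB1 index. The function name, the price grid `(k + 1) / K` and the uniform valuation distribution in the usage line are assumptions made for illustration, not details from the slides.

```python
import math
import random


def ucb_pricing(draw_valuation, T, K):
    """Run UCB1 over K evenly spaced prices against i.i.d. valuations.

    draw_valuation: zero-argument callable returning a valuation in [0, 1].
    Returns (total revenue, price grid, pull counts per price).
    """
    prices = [(k + 1) / K for k in range(K)]
    counts = [0] * K
    sums = [0.0] * K
    total = 0.0
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1                      # play every price once
        else:
            arm = max(
                range(K),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        v = draw_valuation()
        reward = prices[arm] if v >= prices[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, prices, counts


rng = random.Random(0)
total, prices, counts = ucb_pricing(rng.random, 20000, 10)
print(total / 20000, prices[counts.index(max(counts))])
```

For valuations uniform on [0, 1] the expected revenue curve is p(1 - p), so the most-pulled prices should cluster around 0.5. In the analysis on the slide, K would be chosen on the order of T^(1/3); the fixed K = 10 here only keeps the example small.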
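
The EXP3 approach can be sketched as below. This follows the standard EXP3 update with mixing parameter `gamma` (here an exploration rate, unrelated to any buyer discount factor); the function name, the price grid and the constant valuation sequence in the usage line are assumptions for illustration.

```python
import math
import random


def exp3_pricing(valuations, K, rng):
    """Run EXP3 over K evenly spaced prices on a fixed valuation sequence."""
    T = len(valuations)
    prices = [(k + 1) / K for k in range(K)]
    # Standard EXP3 tuning, using T as an upper bound on the best arm's gain.
    gamma = min(1.0, math.sqrt(K * math.log(K) / ((math.e - 1) * T)))
    weights = [1.0] * K
    collected = 0.0
    for v in valuations:
        total_w = sum(weights)
        probs = [(1 - gamma) * w / total_w + gamma / K for w in weights]
        arm = rng.choices(range(K), weights=probs)[0]
        reward = prices[arm] if v >= prices[arm] else 0.0
        collected += reward
        # Importance-weighted reward estimate for the arm that was played.
        weights[arm] *= math.exp(gamma * (reward / probs[arm]) / K)
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid overflow
    return collected


rng = random.Random(0)
print(exp3_pricing([0.8] * 5000, 5, rng))
```

Only the played price receives feedback, which is why the reward is divided by the probability of playing it before the weight update.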

20 Contextual Pricing. Each product is represented by a context \(x_t \in \mathbb{R}^d\), \(\|x_t\|_2 \le 1\). Buyer valuation is a dot-product: \(v_t = \langle \theta, x_t \rangle\). The weight vector \(\theta\) is fixed but unknown, \(\|\theta\|_2 \le 1\). Regret is: \(R = \sum_{t=1}^{T} \big(v_t - \mathrm{Rev}(p_t, v_t)\big)\). Can we draw a connection with online learning?

21 Contextual Pricing. Stochastic gradient gives \(\tilde{O}(\sqrt{T})\) regret [Amin et al. 2014]. Cohen, Lobel, Paes Leme, Vladu, Schneider: \(R = O(d \log T)\), with an algorithm based on the ellipsoid method. Keep knowledge sets: \(S_0 = \{\theta \in \mathbb{R}^d ; \|\theta\|_2 \le 1\}\). For each \(x_t\) we know \(v_t \in [a_t, b_t]\), where \(a_t = \min_{\theta \in S_t} \langle \theta, x_t \rangle\) and \(b_t = \max_{\theta \in S_t} \langle \theta, x_t \rangle\).

22 Contextual Pricing (ellipsoid-based algorithm, continued). If \(b_t - a_t \le 1/T\) then we are done. If not, guess \(p_t \in [a_t, b_t]\). Update the knowledge set to either \(S_{t+1} = \{\theta \in S_t ; \langle \theta, x_t \rangle \le p_t\}\) (the price was rejected) or \(S_{t+1} = \{\theta \in S_t ; \langle \theta, x_t \rangle \ge p_t\}\) (the price was accepted).

23 Contextual Pricing (ellipsoid-based algorithm, continued). Theorem: Setting \(p_t = \tfrac{1}{2}(a_t + b_t)\) has \(\Omega(2^d \log T)\) regret. Theorem: Ellipsoid regularization has \(O(d^2 \log T)\) regret. Theorem: Cylindrification regularizer has \(O(d \log T)\) regret. Theorem: Squaring trick has regret \(O(d^4 \log\log T)\).
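
A rough sketch of the knowledge-set idea, assuming the set is kept as an ellipsoid and updated with the classical central-cut ellipsoid formulas. This is an illustration under stated assumptions, not the authors' exact algorithm: it requires d >= 2, non-negative valuations, and a myopic buyer, and all names (`ellipsoid_pricing`, `random_contexts`, `eps`) are hypothetical.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def random_contexts(n, d, rng):
    """Hypothetical helper: n non-negative contexts with unit Euclidean norm."""
    out = []
    for _ in range(n):
        raw = [rng.random() + 1e-9 for _ in range(d)]
        norm = math.sqrt(dot(raw, raw))
        out.append([u / norm for u in raw])
    return out


def ellipsoid_pricing(theta, contexts, eps):
    """Knowledge set {z : (z - c)^T A^{-1} (z - c) <= 1}, starting from the unit ball.

    Returns (regret, number of exploration rounds).
    """
    d = len(theta)
    assert d >= 2, "the central-cut update below needs d >= 2"
    c = [0.0] * d
    A = [[float(i == j) for j in range(d)] for i in range(d)]
    regret, explored = 0.0, 0
    for x in contexts:
        v = dot(theta, x)                       # hidden from the seller
        Ax = [dot(row, x) for row in A]
        half = math.sqrt(max(dot(x, Ax), 0.0))  # half-width of [a_t, b_t]
        mid = dot(c, x)
        a, b = mid - half, mid + half
        explore = b - a > eps
        price = max(mid if explore else a, 0.0)
        sold = v >= price
        regret += v - (price if sold else 0.0)
        if explore:
            explored += 1
            sign = 1.0 if sold else -1.0        # sold: keep <z, x> >= <c, x>
            g = [sign * u / half for u in Ax]
            c = [ci + gi / (d + 1) for ci, gi in zip(c, g)]
            scale = d * d / (d * d - 1.0)
            A = [[scale * (A[i][j] - 2.0 / (d + 1) * g[i] * g[j])
                  for j in range(d)] for i in range(d)]
    return regret, explored


rng = random.Random(1)
print(ellipsoid_pricing([0.6, 0.3], random_contexts(5000, 2, rng), 1.0 / 5000))
```

For an ellipsoid, the interval endpoints have a closed form: the midpoint is the inner product of the center with the context, and the half-width is the square root of x^T A x. That is what makes the ellipsoid a convenient stand-in for the exact knowledge set.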

24 Strategic Buyers

25 Strategic buyers What happens if buyers know the seller will adapt prices?

26 Setup. Buyer's valuation \(v_t\). Seller offers price \(p_t\). Buyer accepts (\(a_t = 1\)) or rejects (\(a_t = 0\)). Discount factor \(\gamma\). Buyer optimizes \(\mathbb{E}\big[\sum_{t=1}^{T} \gamma^t a_t (v_t - p_t)\big]\). Seller maximizes revenue \(\mathbb{E}\big[\sum_{t=1}^{T} a_t p_t\big]\).

27 Three scenarios. Fixed value \(v_t = v\) [Amin et al. 2013, Mohri and Muñoz 2014, Drutsa 2017]. Random valuation \(v_t \sim D\) [Amin et al. 2013, Mohri and Muñoz 2015]. Contextual valuation \(v_t = \langle \theta, x_t \rangle\) with \(x_t \sim D\) [Amin et al. 2014].

28 Game setup. Seller selects a pricing algorithm. Announces the algorithm to the buyer. Buyer can play strategically.

29 Measuring regret. Best fixed price in hindsight? Real value = 8, fake value = 1. Seller: $4? $2? $1? Buyer: No, No, YES! \(p_t = 4, 2, 1, 1, 1, 1, \ldots\) and \(a_t = 0, 0, 1, 1, 1, 1, \ldots\)

30 Strategic Regret. Compare against the best possible outcome. Fixed valuation: \(R = Tv - \sum_{t=1}^{T} a_t p_t\). Random valuation: \(R = T \max_p \mathbb{E}_{v_t \sim D}[\mathrm{Rev}(p, v_t)] - \mathbb{E}\big[\sum_{t=1}^{T} a_t p_t\big]\). Contextual valuation: \(R = \mathbb{E}\big[\sum_{t=1}^{T} \big(v_t - a_t p_t\big)\big]\).

31 The Buyer. Knowledge of the future incentivizes the buyer to lie. Lie: the buyer rejects even if his value is greater than the reserve price.
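
The example on this slide can be checked numerically. The sketch below evaluates the buyer's discounted surplus and the seller's revenue for the price and acceptance sequences shown, over six rounds. The comparison sequence (a buyer who accepts $4 immediately and keeps being offered $4) and the discount values 0.9 and 0.5 are assumptions added for illustration.

```python
def buyer_surplus(value, prices, accepts, discount):
    """Discounted surplus: sum over t >= 1 of discount^t * a_t * (value - p_t)."""
    return sum(
        discount ** t * a * (value - p)
        for t, (p, a) in enumerate(zip(prices, accepts), start=1)
    )


def seller_revenue(prices, accepts):
    """Undiscounted revenue: sum of a_t * p_t."""
    return sum(a * p for p, a in zip(prices, accepts))


value = 8
lying = ([4, 2, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1])      # sequence from the slide
truthful = ([4, 4, 4, 4, 4, 4], [1, 1, 1, 1, 1, 1])   # assumed comparison

for discount in (0.9, 0.5):
    print(
        discount,
        buyer_surplus(value, *lying, discount),
        buyer_surplus(value, *truthful, discount),
    )
print(seller_revenue(*lying), seller_revenue(*truthful))  # prints 4 24
```

With a discount of 0.9 the patient buyer earns more by rejecting twice, while with 0.5 the lost early rounds outweigh the lower price. In both cases the seller collects 4 instead of 24 when the buyer lies.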

32 How can we reduce the number of lies?

33 Warm up. Monotone algorithms [Amin et al. 2013]. Choose \(\beta < 1\). Offer prices \(p_t = \beta^t\). If accepted, offer that price for the remaining rounds.

34 Warm up. Decrease slowly to make lies costly. Not too slow, or the seller accumulates regret. Regret in \(O\big(\frac{\sqrt{T}}{1 - \gamma}\big)\). Lower bound \(\Omega\big(\log\log T + \frac{1}{1 - \gamma}\big)\).
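
To see why the decay rate matters, the sketch below brute-forces a strategic buyer's best response to a monotone algorithm with prices that decay geometrically. The names `beta` (price decay), `gamma` (buyer discount) and both function names are assumptions, and the buyer is assumed to keep accepting once the price is frozen.

```python
def best_response_round(v, beta, gamma, T):
    """Round at which a discounting buyer first accepts, or None if never.

    Accepting at round tau freezes the price at beta**tau for rounds tau..T.
    """
    # suffix[t] = gamma^t + gamma^(t+1) + ... + gamma^T
    suffix = [0.0] * (T + 2)
    for t in range(T, 0, -1):
        suffix[t] = suffix[t + 1] + gamma ** t
    best_tau, best_surplus = None, 0.0
    for tau in range(1, T + 1):
        surplus = suffix[tau] * (v - beta ** tau)
        if surplus > best_surplus:
            best_tau, best_surplus = tau, surplus
    return best_tau


def truthful_round(v, beta, T):
    """First round at which the posted price is at most v, or None."""
    for tau in range(1, T + 1):
        if beta ** tau <= v:
            return tau
    return None


def strategic_regret(v, beta, gamma, T):
    tau = best_response_round(v, beta, gamma, T)
    if tau is None:
        return T * v
    return T * v - (T - tau + 1) * beta ** tau


print(truthful_round(0.5, 0.95, 200), best_response_round(0.5, 0.95, 0.9, 200))
print(strategic_regret(0.5, 0.95, 0.9, 200))
```

The gap between the two printed rounds is the number of lies: rounds in which the buyer rejects a price below the valuation in order to lock in a lower one.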

35 Better guarantees. Fast search with penalized rejections [Mohri and Muñoz 2014]: every time a price is rejected, offer it again for several rounds. Regret in \(O\big(\frac{\log T}{1 - \gamma}\big)\). Horizon-independent guarantees [Drutsa 2017]. Regret in \(O\big(\frac{\log\log T}{1 - \gamma}\big)\).

36 Random valuations. Valuation \(v_t \sim D\). Regret \(R = T \max_p \mathbb{E}_{v_t \sim D}[\mathrm{Rev}(p, v_t)] - \mathbb{E}\big[\sum_{t=1}^{T} a_t p_t\big]\). UCB-type algorithm with slowly decreasing confidence bounds [Mohri and Muñoz 2015]. Regret in \(O\big(\sqrt{T} + \frac{1}{\log(1/\gamma)} T^{1/4}\big)\).

37 Contextual Valuation. Explore-exploit algorithm with longer explore time [Amin et al.]. Regret in \(O\big(\frac{T^{2/3}}{\sqrt{\log(1/\gamma)}}\big)\).

38 Related Work. Revenue optimization in second price auctions [Cesa-Bianchi et al. 2013]. Modeling buyers as regret minimizers [Nekipelov et al. 2015]. Selling to no-regret buyers [Heidari et al. 2017, Braverman et al. 2017]. Selling to patient buyers [Feldman et al. 2016].

39 Open problems. Contextual valuations without realizability assumptions. Strategic buyers with adversarial valuations. Online learning algorithms in general auctions [Roughgarden 2016]. Multiple strategic buyers.

40 Revenue from Multiple Buyers (Pricing -> Auctions)

41 Multiple buyers [Figure: three buyers labeled $100, $1000 and $50, with a question mark.]

42 Multi-buyer Setup. \(N\) buyers with valuations \(v_i \in [0, 1]\) from distribution \(D_i\). Auction \(A\) is an allocation \(x_i : [0, 1]^N \to \{0, 1\}\) and payment \(p_i : [0, 1]^N \to \mathbb{R}\). Revenue: \(\mathrm{Rev}(A) = \sum_{i=1}^{N} p_i\). Goal: maximize \(\mathbb{E}_{v_1, \ldots, v_N}[\mathrm{Rev}(A)]\). Notation: given a valuation vector \((v_1, \ldots, v_N)\), \((v, v_{-i}) = (v_1, \ldots, v_{i-1}, v, v_{i+1}, \ldots, v_N)\).

43 Conditions on auction. Object can only be allocated once: \(\sum_{i=1}^{N} x_i \le 1\). Individual rationality (IR): \(u_i = v_i x_i - p_i \ge 0\). Incentive compatibility (IC): \(v_i x_i(v_i, v_{-i}) - p_i(v_i, v_{-i}) \ge v_i x_i(v, v_{-i}) - p_i(v, v_{-i})\) for every alternative bid \(v\).

44 Why IC? Buyers truthfully reveal how much they are willing to pay. Makes the auction stable. Allows learning.

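45 Some IC auctions. Second price auction: allocate to the buyer with the highest \(v_i\) and charge the second highest value. \(x_i = 1 \Leftrightarrow v_i = \max_j v_j\); \(p_i = \max_{j \ne i} v_j\) if \(x_i = 1\), 0 otherwise.

46 Second price auction [Figure: three buyers with bids $100, $1000 and $50.]

47 IC auctions. Second price with reserve price \(r\): allocate to the highest bidder if \(v_i \ge r\). Charge \(p_i = \max(r, \max_{j \ne i} v_j)\). \(x_i = 1\) if \(v_i \ge \max(\max_j v_j, r)\); \(p_i = \max(\max_{j \ne i} v_j, r)\) if \(x_i = 1\).

48 Second Price Auction With Reserve [Figure: three buyers with bids $100, $1000 and $50, shown with reserve prices r = $2000 and r = $900.]

49 Myerson Auction [Figure: buyers 1, 2 and 3 with bids $100, $1000 and $50, annotated with the amounts $600, $500, $300 and $90.]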
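
The allocation and payment rule of the second price auction with reserve can be written in a few lines. This is a minimal sketch with a hypothetical function name; ties are broken in favor of the lowest index, which the slides do not specify.

```python
def second_price_with_reserve(bids, reserve=0.0):
    """Return (winner index or None, price charged)."""
    if not bids:
        return None, 0.0
    winner = max(range(len(bids)), key=lambda i: bids[i])
    if bids[winner] < reserve:
        return None, 0.0                      # nobody clears the reserve: no sale
    others = [b for i, b in enumerate(bids) if i != winner]
    return winner, max([reserve] + others)    # max(r, second highest bid)


bids = [100, 1000, 50]
print(second_price_with_reserve(bids))        # prints (1, 100)
print(second_price_with_reserve(bids, 900))   # prints (1, 900)
print(second_price_with_reserve(bids, 2000))  # prints (None, 0.0)
```

The three calls use the bid values that appear in the figures: a reserve between the two highest bids raises the price, while a reserve above the highest bid loses the sale.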

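50 Some IC auctions. Myerson's auction: pick a monotone bid deformation \(\phi_i(\cdot)\). \(x_i = 1 \Leftrightarrow \phi_i(v_i) = \max_j \phi_j(v_j)\) and \(\phi_i(v_i) > 0\). \(p_i = \phi_i^{-1}\big(\max(\max_{j \ne i} \phi_j(v_j), 0)\big)\) if \(x_i = 1\), 0 otherwise. If \(\phi_i = \phi\) for all \(i\): \(x_i = 1 \Leftrightarrow v_i = \max_j v_j\) and \(p_i = \phi^{-1}\big(\max(\max_{j \ne i} \phi(v_j), 0)\big) = \max\big(\max_{j \ne i} v_j, \phi^{-1}(0)\big)\), i.e. a second price auction with reserve \(\phi^{-1}(0)\).

51 Myerson Auction. Optimal auction if \(v_i \sim D_i\) independently. If \(D_i\) is known, the functions \(\phi_i\) can be calculated exactly. What about unknown distributions? Can we learn the optimal monotone functions? What is the sample complexity?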
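
A small sketch of the allocation and payment rule on this slide. The deformation functions used in the example are the standard Myerson virtual values for uniform distributions on [0, h], namely 2v - h with inverse (y + h) / 2; choosing uniform distributions, and all function names, are assumptions made for illustration.

```python
def uniform_virtual_value(h):
    """Virtual value for Uniform[0, h]: phi(v) = v - (1 - F(v)) / f(v) = 2v - h."""
    return lambda v: 2.0 * v - h


def uniform_virtual_inverse(h):
    return lambda y: (y + h) / 2.0


def myerson_auction(values, phis, phi_invs):
    """Return (winner index or None, price) given deformations and their inverses."""
    virtual = [phi(v) for phi, v in zip(phis, values)]
    winner = max(range(len(values)), key=lambda i: virtual[i])
    if virtual[winner] <= 0:
        return None, 0.0                       # no positive deformed bid: no sale
    rivals = [virtual[j] for j in range(len(values)) if j != winner]
    threshold = max([0.0] + rivals)
    # The winner pays the smallest bid that would still have won.
    return winner, phi_invs[winner](threshold)


# Asymmetric bidders: Uniform[0, 1] and Uniform[0, 2] (hypothetical).
phis = [uniform_virtual_value(1.0), uniform_virtual_value(2.0)]
invs = [uniform_virtual_inverse(1.0), uniform_virtual_inverse(2.0)]
print(myerson_auction([0.8, 1.2], phis, invs))

# Symmetric bidders, both Uniform[0, 1]: second price with reserve 0.5.
same = [uniform_virtual_value(1.0)] * 2
same_inv = [uniform_virtual_inverse(1.0)] * 2
print(myerson_auction([0.6, 0.9], same, same_inv))
print(myerson_auction([0.3, 0.4], same, same_inv))
```

In the asymmetric example the bidder with the lower raw bid wins, because the deformation penalizes the bidder whose distribution has the larger support.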

52 Sample Complexity of Auctions. \(N\) bidders. Valuations \(v_i \sim D_i\) independent. Observe \(Nm\) samples \(v_{i,1}, \ldots, v_{i,m} \sim D_i\), \(i \in \{1, \ldots, N\}\). Find an auction \(A\) such that \(\mathbb{E}[\mathrm{Rev}(A)] \ge (1 - \epsilon) \max_{A'} \mathbb{E}[\mathrm{Rev}(A')]\). Can we use empirical revenue optimization? \(\max_A \frac{1}{m} \sum_{j=1}^{m} \sum_{i=1}^{N} p_i(v_{1j}, \ldots, v_{Nj})\)

53 Lower bounds on sample complexity. Proof for a single buyer [Huang et al. 2015]. Problem reduces to finding the optimal price for a distribution. Need at least \(1/\epsilon^2\) samples to get a \(1 - \epsilon\) approximation.

54 Idea of the proof. Two similar distributions \(D_1\) and \(D_2\) with small divergence \(\mathrm{KL}(D_1 \,\|\, D_2)\). Need \(1/\epsilon^2\) samples to distinguish them w.h.p.

55 Revenue curves. The approximately optimal revenue sets are disjoint. If an algorithm optimizes revenue for both distributions, it must be able to distinguish them. [Figure: the curves \(\mathbb{E}_{v \sim D_1}[\mathrm{Rev}(r, v)]\) and \(\mathbb{E}_{v \sim D_2}[\mathrm{Rev}(r, v)]\) plotted against \(r\).]

56 Upper bounds on sample complexity. Auctions are parametrized by increasing functions \(\phi_i\). Pseudo-dimension of increasing functions is infinite! Restrict the class and measure the approximation error.
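
Empirical revenue optimization can be illustrated on a deliberately small auction class. The sketch below assumes the class is restricted to second price auctions with a single reserve price, which is far narrower than the class of all auctions considered on the slide; the function names and the sample of bid profiles are hypothetical.

```python
def second_price_revenue(bids, reserve):
    """Revenue of a second price auction with reserve on one bid profile."""
    ranked = sorted(bids, reverse=True)
    if ranked[0] < reserve:
        return 0.0
    second = ranked[1] if len(ranked) > 1 else 0.0
    return max(reserve, second)


def empirical_best_reserve(samples):
    """Reserve price maximizing average revenue over the sampled bid profiles."""
    # Average revenue can only peak at one of the observed bids (or at 0).
    candidates = sorted({b for bids in samples for b in bids} | {0.0})
    return max(
        candidates,
        key=lambda r: sum(second_price_revenue(bids, r) for bids in samples)
        / len(samples),
    )


samples = [[0.2, 0.9], [0.3, 0.8], [0.1, 0.4]]  # m = 3 profiles, N = 2 bidders
print(empirical_best_reserve(samples))  # prints 0.8
```

On this toy sample the empirical optimum sacrifices the third profile entirely, which hints at why such a rule can overfit when m is small.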

57 t-level auctions [Figure: example bids, including $100 and $50.]

58 t-level auctions [Morgenstern and Roughgarden 2016]. Rank candidates using t-step functions. Pseudo-dimension bounded by \(O(Nt \log Nt)\). Best t-level auction is a \(1 - \frac{1}{t}\) approximation.

59 t-level auctions. Theorem: Let \(t = 1/\epsilon\). Using a sample of size \(m\) on the order of \(N/\epsilon^3\), the t-level auction \(\hat{A}\) maximizing empirical revenue is a \(1 - \epsilon\) approximation to the optimal auction.

60 Algorithm [Cole and Roughgarden 2015, Huang et al.]. In summary, optimize auctions over all increasing functions. Proof for finite support. Extension by discretization. \(O\big(\frac{1}{\epsilon^3}\big)\) samples.

61 Is this enough?

62 Features in auctions. In practice valuations are not i.i.d. They depend on features (context). Dependency is not realizable in general. Algorithm of Huang et al. can be generalized to 1 feature.

63 Display ads. Millions of auctions. Parametrized by publisher information, time of day, ... Dependency of valuations on features is not clear.

64 Setup. Single buyer auction, find the optimal reserve price. Observe a sample \((x_1, v_1), \ldots, (x_m, v_m)\) from a distribution \(D\) over \(\mathcal{X} \times [0, 1]\). Hypotheses \(h : \mathcal{X} \to \mathbb{R}\). Goal: find \(\max_{h \in H} \mathbb{E}_{(x, v) \sim D}[\mathrm{Rev}(h(x), v)]\).

65 Revenue function. Non-concave. Non-differentiable. Discontinuous. Is it possible to learn?

66 Learning Theory. Theorem [Mohri and Muñoz 2013]: given a sample of size \(m\), with high probability the following bound holds uniformly for all \(h \in H\): \(\mathbb{E}[\mathrm{Rev}(h(x), v)] - \frac{1}{m} \sum_{i=1}^{m} \mathrm{Rev}(h(x_i), v_i) \le O\Big(\sqrt{\frac{\mathrm{PDim}(H)}{m}}\Big)\). Space of linear functions?

67 Can we do empirical maximization?

68 The revenue function

69 Revenue function. Non-concave. Non-differentiable. Discontinuous. Is it possible to optimize?

70 Surrogates. Loss similar to 0-1 loss. Can we optimize a concave surrogate reward?

71 Calibration. We say a function \(R : \mathbb{R} \times \mathbb{R} \to \mathbb{R}\) is calibrated with respect to \(\mathrm{Rev}\) if for any distribution \(D\) we have \(\operatorname{argmax}_r \mathbb{E}_v[R(r, v)] \subseteq \operatorname{argmax}_r \mathbb{E}_v[\mathrm{Rev}(r, v)]\).

72 Surrogates Theorem [Mohri and Muñoz 2013]: Any concave function that is calibrated is constant.

73 Continuous Surrogates. Remove discontinuity. Difference of concave functions. DC algorithm for linear hypothesis class [Mohri and Muñoz 2013].

74 Optimization Issues. Sequential algorithm. Not scalable.

75 Other class of functions?

76 Clustering [Muñoz and Vassilvitskii 2017]. Show that attainable revenue is related to the variance of the distribution. Cluster features to have low variance of valuations. Revenue is related to the quality of the clusters.

77 Related problems. Dynamic reserves for repeated auctions [Kanoria and Nazerzadeh 2017]. New complexity measures [Syrgkanis 2017]. Combinatorial auction sample complexity [Morgenstern and Roughgarden 2016, Balcan et al. 2016]. Optimal auction design with neural networks [Dütting et al. 2017].

78 Conclusion. Revenue optimization is a crucial practical problem. Machine learning techniques have yielded new theory and algorithms in this field. We need to better understand the relationship between buyers and sellers. There are several open problems still out there.

79 Thank you!
