Dynamic Pricing with Limited Supply


Moshe Babaioff, Microsoft Research Silicon Valley, Mountain View CA, USA
Shaddin Dughmi, Microsoft Research Redmond, Redmond WA, USA
Robert Kleinberg, Cornell University, Ithaca NY, USA
Aleksandrs Slivkins, Microsoft Research Silicon Valley, Mountain View CA, USA

We consider the problem of designing revenue maximizing online posted-price mechanisms when the seller has limited supply. A seller has $k$ identical items for sale and is facing $n$ potential buyers ("agents") that are arriving sequentially. Each agent is interested in buying one item. Each agent's value for an item is an independent sample from some fixed (but unknown) distribution with support $[0, 1]$. The seller offers a take-it-or-leave-it price to each arriving agent (possibly different for different agents), and aims to maximize his expected revenue.

We focus on mechanisms that do not use any information about the distribution; such mechanisms are called detail-free (an alternative term is "prior-independent"). They are desirable because knowing the distribution is unrealistic in many practical scenarios. We study how the revenue of such mechanisms compares to the revenue of the optimal offline mechanism that knows the distribution ("offline benchmark").

We present a detail-free online posted-price mechanism whose revenue is at most $O((k \log n)^{2/3})$ less than the offline benchmark, for every distribution that is regular. In fact, this guarantee holds without any assumptions if the benchmark is relaxed to fixed-price mechanisms. Further, we prove a matching lower bound. The performance guarantee for the same mechanism can be improved to $O(\sqrt{k} \log n)$, with a distribution-dependent constant, if the ratio $k/n$ is sufficiently small. We show that, in the worst case over all demand distributions, this is essentially the best rate that can be obtained with a distribution-specific constant.
Categories and Subject Descriptors: J.4 [Social and Behavioral Sciences]: Economics; K.4.4 [Computers and Society]: Electronic Commerce; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems; F.1.2 [Modes of Computation]: Online computation

General Terms: Design, Algorithms, Performance, Theory, Economics

Additional Key Words and Phrases: Mechanism design; revenue maximization; dynamic pricing; multi-armed bandits

The full version of this paper is available on arxiv.org. Authors' e-mail addresses: {moshe,shaddin,slivkins}@microsoft.com, rdk@cs.cornell.edu. EC'12, June 4-8, 2012, Valencia, Spain. Copyright 2012 ACM.

1. INTRODUCTION

Consider an airline that is interested in selling $k$ tickets for a given flight. The seller is interested in maximizing her revenue from selling these tickets, and is offering the tickets on a website such as Expedia. Potential buyers ("agents") arrive one after another, each with the goal of purchasing a ticket if the price is smaller than the agent's valuation. The seller expects $n$ such agents to arrive. Whenever an agent arrives the seller presents to him a take-it-or-leave-it price, and the agent makes a purchasing decision according to that price. The seller can update the price taking into account the observed history and the number of remaining items and agents.

We adopt a Bayesian view that the valuations of the agents are IID samples from a fixed distribution, called demand distribution. A standard assumption in a Bayesian setting is that the demand distribution is known to the seller, who can design a specific mechanism tailored to this knowledge. (For example, the Myerson optimal auction for one item sets a reserve price that is a function of the distribution.) However, in some settings this assumption is very strong, and should be avoided if possible. For example, when the seller enters a new market, she might not know the demand distribution, and learning it through market research might be costly. Likewise, when the market has experienced a significant recent change, the new demand function might not be easily derived from the old data.

Ideally we would like to design mechanisms that perform well for any demand distribution, and yet do not rely on knowing it. Such mechanisms are called detail-free in the sense that the specification of the mechanism does not depend on the details of the environment, in the spirit of Wilson's Doctrine [Wilson 1989] (an alternative term is "prior-independent"). Learning about the demand distribution is an integral part of the problem that a detail-free mechanism faces. The performance of such mechanisms is compared to a benchmark that does depend on the specific demand distribution, as in [Kleinberg and Leighton 2003; Hartline and Roughgarden 2008; Besbes and Zeevi 2009; Dhangwatnotai et al. 2010] and many other papers.

In this paper we take this approach and design detail-free, online posted-price mechanisms with revenue that is close to the revenue of the optimal offline mechanism (that can depend on the demand distribution and is not restricted to be posted price). Our main results are for any demand distribution that is regular, or any demand distribution that satisfies the stronger condition of monotone hazard rate. Both conditions are mild and standard, and even the stronger one is satisfied by most common distributions, such as the normal, uniform, and exponential distributions.
Posted price mechanisms are commonly used in practice, and are appealing for several reasons. First, an agent only needs to evaluate her offer rather than compute her private value exactly. Human agents tend to find the former task much easier than the latter. Second, agents do not reveal their entire private information to the seller: rather, they only reveal whether their private value is larger than the posted price. Third, posted-price mechanisms are trivially truthful (in dominant strategies) and moreover also group strategy-proof (a notion of collusion resistance when side payments are not allowed). Further, detail-free posted-price mechanisms are particularly useful in practice as the seller is not required to estimate the demand distribution in advance. Similar arguments can be found in prior work, e.g. [Chawla et al. 2010].

Our model. We consider the following limited supply auction model, which we term dynamic pricing with limited supply. A seller has $k$ items she can sell to a set of $n$ agents (potential buyers), aiming to maximize her expected revenue. The agents arrive sequentially to the market and the seller interacts with each agent before observing future agents (in an online manner). We make the simplifying assumption that each agent interacts with the seller only once, and the timing of the interaction cannot be influenced by the agent. (This assumption is also made in other papers that consider our problem for special supply amounts [Kleinberg and Leighton 2003; Babaioff et al. 2011; Besbes and Zeevi 2009].)

Each agent $i$ ($1 \le i \le n$) is interested in buying one item, and has a private value $v_i$ for an item. The private values are independently drawn from the same demand distribution $F$. The demand distribution $F$ is unknown to the seller, but it is known that $F$ has support in $[0, 1]$. (Assuming $\mathrm{support}(F) \subseteq [0, 1]$ is w.l.o.g. by normalizing as long as the seller knows an upper bound on the support.) Whenever agent $i$ arrives to the market the seller offers him a price $p_i$ for an item. The agent buys the item if and only if $v_i \ge p_i$, and in case she buys the item she pays

$p_i$ (so the mechanism is incentive-compatible). The seller never learns the exact value of $v_i$; she only observes the agent's binary decision to buy the item or not. The seller selects prices $p_i$ using an online algorithm, which we henceforth call a pricing strategy. We are interested in designing pricing strategies with high revenue compared to a natural benchmark, with minimal assumptions on the demand distribution.

Our main benchmark is the maximal expected revenue of an offline mechanism that is allowed to use the demand distribution; henceforth, we will call it the offline benchmark. This is a very strong benchmark, as it has the following advantages over our mechanism: it is allowed to use the demand distribution, it is not constrained to posted prices, and it is not constrained to run online. It is realized by the well-known Myerson Auction [Myerson 1981] (which does rely on knowing the demand distribution).

High-level discussion. Absent the supply constraint, our problem fits into the multi-armed bandit (MAB) framework [Cesa-Bianchi and Lugosi 2006]: in each round, an algorithm chooses among a fixed set of alternatives ("arms") and observes a payoff, and the objective is to maximize the total payoff over a given time horizon. Our setting corresponds to (prior-free) MAB with stochastic payoffs [Lai and Robbins 1985]: in each round, the payoff is an independent sample from some unknown distribution that depends on the chosen arm (price). This connection is exploited in [Kleinberg and Leighton 2003; Blum et al. 2003] for the special case of unlimited supply ($k = n$). The authors use a standard algorithm for MAB with stochastic payoffs, called UCB1 [Auer et al. 2002a]. Specifically, they focus on the prices $\{i\delta : i \in \mathbb{N}\}$, for some parameter $\delta$, and run UCB1 with these prices as arms. The analysis relies on the regret bound from [Auer et al. 2002a].

However, neither the analysis nor the intuition behind UCB1 and similar MAB algorithms is directly applicable for the setting with limited supply.
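To make the unlimited-supply approach concrete, here is a minimal Python sketch of UCB1 run over the price grid $\{i\delta\}$. It is an illustration under stated assumptions rather than the implementation from the cited papers: the function name `ucb1_pricing`, the callback `sample_value`, and the exploration term $\sqrt{2 \ln t / N}$ (the textbook UCB1 form) are choices made here, and the per-round payoff of price $p$ is taken to be $p$ if the agent buys and 0 otherwise.

```python
import math
import random


def ucb1_pricing(n, delta, sample_value, rng):
    """Run UCB1 over prices {i * delta} for n rounds with unlimited supply.

    sample_value(rng) is a hypothetical callback that draws one agent's
    private value in [0, 1]; the seller only sees the buy / no-buy decision.
    """
    prices = [i * delta for i in range(1, int(1 / delta) + 1)]
    pulls = [0] * len(prices)      # times each price was offered
    total = [0.0] * len(prices)    # summed per-round revenue of each price
    revenue = 0.0
    for t in range(1, n + 1):
        if t <= len(prices):
            arm = t - 1            # offer every price once to initialize
        else:
            # Index = empirical mean payoff + confidence term.
            arm = max(
                range(len(prices)),
                key=lambda a: total[a] / pulls[a]
                + math.sqrt(2 * math.log(t) / pulls[a]),
            )
        p = prices[arm]
        payoff = p if sample_value(rng) >= p else 0.0
        pulls[arm] += 1
        total[arm] += payoff
        revenue += payoff
    return revenue, prices, pulls
```

For example, `ucb1_pricing(20000, 0.1, lambda r: r.random(), random.Random(0))` simulates uniform valuations on $[0, 1]$. Note that the index only tracks single-round revenue, which is why this approach ignores the inventory.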
Informally, the goal of an MAB algorithm would be to converge to a price $p$ that maximizes the expected per-round revenue $R(p) \triangleq p\,(1 - F(p))$. This is, in general, a wrong approach if the supply is limited: indeed, selling at a price that maximizes $R(\cdot)$ may quickly exhaust the inventory, in which case a higher price would be more profitable.

Our high-level conceptual contribution is showing that even the limited supply setting can be fruitfully treated as a bandit problem. The MAB perspective here is that we focus on the trade-off between exploration (acquiring new information) and exploitation (taking advantage of the information available so far). In particular, we recover an essential feature of UCB1 that it does not separate exploration and exploitation, and instead explores arms (prices) according to a schedule that unceasingly adapts to the observed payoffs. This feature results, both for UCB1 and for our algorithm, in a much more efficient exploration of suboptimal arms: very suboptimal arms are chosen very rarely even while they are being explored.

We use an index-based algorithm where each arm is deterministically assigned a numerical score ("index") based on the past history, and in each round an arm with a maximal index is chosen; the index of an arm depends on the past history of this arm (and not on other arms). One key idea is that we define the index of an arm according to the estimated expected total payoff from this arm given the known constraints, rather than according to its estimated expected payoff in a single round. This idea leads to an algorithm that is simple and (we believe) very natural. However, while the algorithm is simple its analysis is not: some new ideas are needed, as the elegant tricks from prior work do not apply (see Section 3 for further discussion).

Contributions. In all results below, we consider the dynamic pricing problem with limited supply: $n$ agents and $k$ items. We present pricing strategies with expected revenue that is close to the offline benchmark, for large families of natural distributions. All our pricing strategies are deterministic and (trivially) run in polynomial time. Our main result follows.

THEOREM 1.1. There exists a detail-free pricing strategy such that for any regular demand distribution its expected revenue is at least the offline benchmark minus $O((k \log n)^{2/3})$.

We emphasize that Theorem 1.1 holds for a pricing strategy that does not know the demand distribution. The resulting mechanism is incentive-compatible as it is a posted price mechanism. The specific bound $O((k \log n)^{2/3})$ is most informative when $k \gg \log n$, so that the dependence on $n$ is insignificant; the focus here is to optimize the power of $k$. (Note that any non-trivial bound must be below $k$.)

The proof of Theorem 1.1 consists of two stages. The first stage, immediate from [Yan 2011], is to observe that for any regular demand distribution the expected revenue of the best fixed-price strategy (footnote 1) is close to the offline benchmark. Henceforth, the expected revenue of the best fixed-price strategy will be called the fixed-price benchmark. The second stage, which is our main technical contribution, is to show that our pricing strategy achieves expected revenue that is close to the fixed-price benchmark. Surprisingly, this holds without any assumptions on the demand distribution.

THEOREM 1.2. There exists a detail-free pricing strategy whose expected revenue is at least the fixed-price benchmark minus $O((k \log n)^{2/3})$. This result holds for every demand distribution. Moreover, this result is the best possible up to a factor of $O(\log n)$.

As discussed above, we recover the MAB technique from [Auer et al. 2002a] for the unlimited supply setting. The corresponding contribution to the literature on MAB may be of independent interest.

If the demand distribution is regular and moreover the ratio $k/n$ is sufficiently small then the guarantee in Theorem 1.1 can be improved to $O(\sqrt{k} \log n)$, with a distribution-specific constant.

THEOREM 1.3.
There exists a detail-free pricing strategy whose expected revenue, for any regular demand distribution $F$, is at least the offline benchmark minus $O(c_F \sqrt{k} \log n)$ whenever $k/n \le s_F$, where $c_F$ and $s_F$ are positive constants that depend on $F$. For monotone hazard rate distributions one can take $s_F = 1/4$.

The bound in Theorem 1.3 is achieved using the pricing strategy from Theorem 1.1 with a different parameter (footnote 2). Varying this parameter, we obtain a family of strategies that improve over the bound in Theorem 1.1 in the "nice" setting of Theorem 1.3, and moreover have non-trivial additive guarantees for arbitrary demand distributions. However, we cannot match both theorems with the same parameter.

Note that the rate-$\sqrt{k}$ dependence on $k$ in Theorem 1.3 contains a distribution-dependent constant $c_F$ (which can be arbitrarily large, depending on $F$), and thus is not directly comparable to the rate-$k^{2/3}$ dependence in Theorem 1.2. A significant gap between bounds with and without distribution-dependent constants is not uncommon in the literature on sequential decision problems, e.g. in [Auer et al. 2002a; Kleinberg and Leighton 2003; Kleinberg et al. 2008] (footnote 3).

Footnote 1: A fixed-price strategy is a pricing strategy that offers the same price to all agents, as long as it has items to sell. The best such strategy is one with the maximal expected revenue for a given demand distribution.

Footnote 2: The pricing strategy in Theorem 1.3 chooses among prices $\delta(1+\delta)^i \in [0, 1]$, $i \in \mathbb{N}$, where $\delta$ is the parameter.

Footnote 3: E.g. for the $K$-armed bandit problem with stochastic payoffs the best possible rates for regret with and without a distribution-dependent constant are respectively $O(c_F \log n)$ and $O(\sqrt{Kn})$ [Auer et al. 2002a; Auer et al. 2002b; Audibert and Bubeck 2010].

In fact, we show that the $c_F \sqrt{k}$ dependence on $k$ is essentially the best possible (footnote 4). We focus on the fixed-price benchmark (which is a weaker benchmark, so it leads to a stronger lower bound). Following the literature, we define regret as the fixed-price benchmark minus the expected revenue of our pricing strategy.

Footnote 4: However, our lower bound is for arbitrary demand distributions rather than regular ones.

THEOREM 1.4. For any $\gamma < 1/2$, no detail-free pricing strategy can achieve regret $O(c_F k^{\gamma})$ for all demand distributions $F$ and arbitrarily large $k$, $n$, where the constant $c_F$ can depend on $F$.

The bounds in Theorem 1.1 and Theorem 1.2 are uninformative when $k = O(\log^2 n)$. We next provide another detail-free, online posted-price mechanism that gives meaningful bounds not depending on $n$ in the case that $k$ is very small (but bigger than some constant).

THEOREM 1.5. There exists a detail-free pricing strategy such that for any MHR demand distribution its expected revenue is at least the offline benchmark minus $O(k^{3/4}\, \mathrm{poly}\log(k))$.

Related Work

Dynamic pricing. Dynamic pricing problems and, more generally, revenue management problems, have a rich literature in Operations Research. A proper survey of this literature is beyond our scope; see [Besbes and Zeevi 2009] for an overview. The main focus is on parameterized demand distributions, with priors on the parameters. The study of dynamic pricing with unknown demand distribution (without priors) has been initiated in [Blum et al. 2003; Kleinberg and Leighton 2003]. Several special cases of our setting have been studied in [Kleinberg and Leighton 2003; Babaioff et al. 2011; Besbes and Zeevi 2009], detailed below.

First, [Kleinberg and Leighton 2003] consider the unlimited supply case (building on the earlier work [Blum et al. 2003]). Among other results, they study IID valuations, i.e. our setting with $k = n$. They provide upper bounds on regret of order $O(n^{2/3})$ and $O(c_F \sqrt{n})$ (footnote 5). The latter bound is akin to Theorem 1.3 in that it assumes a version of regularity, and depends on a distribution-specific constant $c_F$. Further, they prove matching lower bounds; the latter imply Theorem 1.4 for the special case of unlimited supply (footnote 6).

Footnote 5: Throughout this section, we omit the log factors in regret bounds.

Footnote 6: The construction in [Kleinberg and Leighton 2003] that proves Theorem 1.4(a) for the unlimited supply case is contained in the proof of a theorem on adversarial valuations, but the construction itself only uses IID valuations.

On the other extreme, [Babaioff et al. 2011] consider the case that the seller has only one item to sell ($k = 1$). They provide a super-constant multiplicative lower bound for unrestricted demand distribution (with respect to the online optimal mechanism), and a constant-factor approximation assuming MHR. Note that we also use MHR to derive bounds that apply to the case of a very small $k$.

[Besbes and Zeevi 2009] consider a continuous-time version which (when specialized to discrete time) is essentially equivalent to our setting with $k = \Omega(n)$. They prove a number of upper bounds on regret with respect to the fixed-price benchmark, with guarantees that are inferior to ours. The key distinction is that their pricing strategies separate exploration and exploitation. Assuming that the demand distribution $F(\cdot)$ and its inverse $F^{-1}(\cdot)$ are Lipschitz-continuous, they achieve regret $O(n^{3/4})$. They improve it to $O(n^{2/3})$ if furthermore the demand distributions are parameterized, and to

$O(\sqrt{n})$ if this is a single-parameter parametrization. Both results rely on knowing the parametrization: the mechanisms continuously update the estimates of the parameter(s) and revise the current price according to these estimates. The upper bounds in [Besbes and Zeevi 2009] should be contrasted with our $O(k^{2/3})$ upper bound that applies to an arbitrary $k$ and makes no assumptions on the demand distribution, and the $O(c_F \sqrt{k})$ improvement for MHR demand distributions.

Also, [Besbes and Zeevi 2009] contains an $\Omega(\sqrt{n})$ lower bound for their notion of regret. Essentially, this lower bound compares the best pricing strategy for a given demand distribution to the best (distribution-dependent) pricing strategy for a fictitious environment where in every round the mechanism sells a fractional amount of good. In particular, this lower bound does not have any immediate implications on regret with respect to either of the two benchmarks that we use in this paper.

Online mechanisms. The study of online mechanisms was initiated by [Lavi and Nisan 2000], who unlike us consider the case that each agent is interested in multiple items, and provide a logarithmic multiplicative approximation. Below we survey only the most relevant papers in this line of work, in addition to the special cases of our setting that we have already discussed.

Several papers [Bar-Yossef et al. 2002; Blum et al. 2003; Kleinberg and Leighton 2003; Blum and Hartline 2005] consider online mechanisms with unlimited supply and adversarial valuations (as opposed to limited supply and IID valuations in our setting). The mechanism in the initial paper [Bar-Yossef et al. 2002] requires the agents to submit bids and so is not posted-price. The subsequent work [Blum et al. 2003; Kleinberg and Leighton 2003; Blum and Hartline 2005] provides various improvements. In particular, [Blum et al. 2003] (among other results) design a simple posted-price mechanism which achieves multiplicative approximation $1+\epsilon$, for any $\epsilon > 0$, with an additive term that depends on $\epsilon$ (footnote 7). [Blum and Hartline 2005] use a more elaborate posted-price mechanism to improve the additive term. [Kleinberg and Leighton 2003] show that the simple mechanism in [Blum et al. 2003] achieves regret $O(n^{2/3})$; moreover, they provide a nearly matching lower bound of $\Omega(n^{2/3})$.

Footnote 7: This result considers valuations in the range $[1, H]$, and the additive term also depends on $H$.

[Hajiaghayi et al. 2004; Devanur and Hartline 2009] study online mechanisms for limited supply and IID valuations (same as us), but their mechanisms are not posted-price. [Hajiaghayi et al. 2004] consider an online auction model where players arrive and depart online, and may misreport the time period during which they participate in the auction. This makes designing strategy-proof mechanisms more challenging, and as a result their mechanisms achieve a constant multiplicative approximation rather than additive regret. [Devanur and Hartline 2009] study several variants of the limited-supply mechanism design problem: supply is known or unknown, online or offline. Most related to our paper is their mechanism for limited, known, online supply. This mechanism is based on random sampling and achieves constant (multiplicative) approximation, but is not posted-price. Our mechanism is posted-price and achieves low (additive) regret.

Other work. Absent the supply constraint, our problem (and a number of related formulations) fit into the multi-armed bandit (MAB) framework (footnote 8). MAB has a rich literature in Statistics, Operations Research, Computer Science and Economics. A proper discussion of this literature is beyond the scope of this paper; a reader can refer to [Cesa-Bianchi and Lugosi 2006; Bergemann and Välimäki 2006] for background. Most relevant to our specific setting is the work on (prior-free) MAB with stochastic payoffs, e.g. [Lai and Robbins 1985; Auer et al. 2002a], and MAB with Lipschitz-continuous stochastic payoffs, e.g. [Agrawal 1995; Kleinberg 2004; Auer et al. 2007; Kleinberg et al. 2008; Bubeck et al. 2008]. The posted-price mechanisms in [Blum et al. 2003; Kleinberg and Leighton 2003; Blum and Hartline 2005] described above are based on a well-known MAB algorithm [Auer et al. 2002b] for adversarial payoffs. The connection between online learning and online mechanisms has been explored in a number of other papers, including [Nazerzadeh et al. 2008; Devanur and Kakade 2009; Babaioff et al. 2009; Babaioff et al. 2010].

Footnote 8: To avoid a possible confusion, we note that the supply constraint in our setting may appear similar to the budget constraint in the line of work on budgeted MAB (see [Bubeck et al. 2009; Goel et al. 2009] for details and further references). However, the budget in budgeted MAB is essentially the duration of the experimentation phase ($n$), rather than the number of rounds with positive reward ($k$).

Recently, [Chawla et al. 2010; Chakraborty et al. 2010; Yan 2011] studied the problem of designing offline, sequential posted-price mechanisms in Bayesian settings, where the distributions of valuations are not necessarily identical, yet are known to the seller. [Chawla et al. 2010] provide constant multiplicative approximations. [Yan 2011] obtains a multiplicative bound that is optimal for large $k$, and [Chakraborty et al. 2010] obtain a PTAS for all $k$.

2. PRELIMINARIES

Throughout, we assume that agents' valuations are drawn independently from a distribution $F$ with support in $[0, 1]$, called demand distribution. We use $p \in [0, 1]$ to denote a price. We let $F(p)$ denote the c.d.f., and $S(p) = 1 - F(p)$ denote the survival rate at price $p$. Let $R(p) = p\, S(p)$ denote the revenue function: the expected single-round revenue at price $p$ given that there is still at least one item left.

The demand distribution $F$ is called regular if $F(\cdot)$ is twice differentiable and the revenue function $R(\cdot)$ is concave: $R''(\cdot) \le 0$. We call $F$ strictly regular if furthermore $R''(\cdot) < 0$. Then $R(p)$ is increasing for $p \le p_r$ and decreasing for $p \ge p_r$, where $p_r$ is the unique maximizer, known as the Myerson reserve price. Moreover, the survival rate $S(\cdot)$ is strictly decreasing, so the inverse $S^{-1}$ is well-defined.
We say $F$ is a Monotone Hazard Rate (MHR) distribution if $F(\cdot)$ is twice differentiable and the hazard rate $H(p) \triangleq F'(p)/S(p)$ is non-decreasing. All MHR distributions are regular.

A fixed-price strategy with $n$ agents, $k$ items and price $p$, denoted $A_k^n(p)$, is a pricing strategy that makes a fixed offer price $p$ to every agent so long as fewer than $k$ items have been sold, and stops afterwards (equivalently, from that point always sets the price to $\infty$). Note that for the unlimited supply case $A_n^n(p)$ sells $n\, S(p)$ items in expectation.

A pricing strategy is called detail-free if it does not use the knowledge of the demand distribution. We are interested in designing detail-free pricing strategies with good performance for every demand distribution in some (large) family of distributions. We compare our mechanisms to two benchmarks that depend on the demand distribution: the maximal expected revenue of an offline mechanism (the offline benchmark), and the maximal expected revenue of a fixed-price mechanism (the fixed-price benchmark). An offline mechanism that maximizes expected revenue was given in the seminal paper [Myerson 1981]; it is not an online posted price mechanism.

Let $\mathrm{Rev}(A)$ be the total expected revenue achieved by mechanism $A$. We define the regret of $A$ with respect to the fixed-price benchmark as follows:

$$\mathrm{Regret}(A) \triangleq \max_p \mathrm{Rev}[A_k^n(p)] - \mathrm{Rev}(A).$$

Thus, regret is the additive loss in expected revenue compared to the best fixed-price mechanism. (Note that the regret of $A$ could, in principle, be a negative number, since the fixed-price benchmark is not generally the Bayesian optimal pricing strategy for distribution $F$.)
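As an illustration of these definitions, the following Python sketch estimates $\mathrm{Rev}[A_k^n(p)]$ by simulation and approximates the fixed-price benchmark by a maximum over a finite list of candidate prices. The function names, the `sample_value` callback, and the use of Monte Carlo averaging are assumptions made for this example; the paper itself treats these quantities analytically.

```python
import random


def fixed_price_revenue(n, k, p, sample_value, rng):
    """Realized revenue of the fixed-price strategy A_k^n(p) in one run."""
    sold = 0
    for _ in range(n):
        if sold == k:
            break                      # inventory exhausted: price is infinite
        if sample_value(rng) >= p:     # agent buys iff her value is at least p
            sold += 1
    return p * sold


def estimate_fixed_price_benchmark(n, k, prices, sample_value, runs=200, seed=0):
    """Monte Carlo estimate of max over the given prices of Rev[A_k^n(p)]."""
    rng = random.Random(seed)
    best_price, best_rev = None, -1.0
    for p in prices:
        avg = sum(
            fixed_price_revenue(n, k, p, sample_value, rng) for _ in range(runs)
        ) / runs
        if avg > best_rev:
            best_price, best_rev = p, avg
    return best_price, best_rev
```

Given a simulated revenue for some pricing strategy $A$, subtracting it from the estimated benchmark yields an empirical counterpart of $\mathrm{Regret}(A)$, up to sampling error and the coarseness of the price list.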

Benchmarks Comparison. We observe that for regular demand distributions, the fixed-price benchmark is close to the offline benchmark. This result is immediate from [Yan 2011]; we provide a self-contained proof in the full version.

LEMMA 2.1 ([YAN 2011]). For each regular demand distribution there is a fixed-price strategy whose expected revenue is at least the offline benchmark minus $O(\sqrt{k})$.

Lemma 2.1 implies that any pricing strategy with regret $O(R)$, $R = \Omega(\sqrt{k})$, with respect to the fixed-price benchmark has the same asymptotic regret $O(R)$ with respect to the offline benchmark, as long as the demand distribution is regular, and in particular if it is MHR. Therefore, the rest of the paper can focus on the fixed-price benchmark. In particular, our main result, Theorem 1.1 for regular distributions, follows from Theorem 1.2 that addresses the fixed-price benchmark.

The expected revenue of a fixed-price mechanism has an easy characterization:

CLAIM 2.2. Let $A$ be the fixed-price mechanism with price $p$. Then

$$\nu(p) - O(p \sqrt{k \log k}) \le \mathrm{Rev}(A) \le \nu(p), \quad \text{where } \nu(p) \triangleq p \cdot \min(k,\, n\, S(p)). \qquad (1)$$

It follows that for a strictly regular demand distribution the bound in Lemma 2.1 is satisfied for the fixed price $p = \arg\max_p \nu(p) = \max(p_r,\, S^{-1}(k/n))$, where $p_r = \arg\max_p p\, S(p)$ is the Myerson reserve price.

PROOF. Let us focus on the first inequality in (1) (the second one is obvious). Let $X_t$ be the indicator variable of a sale in round $t$. Denote $X = \sum_{t=1}^{n} X_t$ and let $\mu = \mathbb{E}[X]$. Then by Chernoff Bounds (Theorem 3.7(a)) with probability at least $1 - 1/k$ it holds that $X \ge \mu - O(\sqrt{\mu \log k})$, in which case

$$\#\text{sales} = \min(k, X) \ge \min(k,\, \mu - O(\sqrt{\mu \log k})) \ge \min(k, \mu) - O(\sqrt{k \log k}),$$

which implies the claim since $\mu = n\, S(p)$.

3. THE MAIN TECHNICAL RESULT: THE UPPER BOUND IN THEOREM 1.2

This section is devoted to the main technical result (the upper bound in Theorem 1.2) which asserts that there exists a detail-free pricing strategy whose regret with respect to the fixed-price benchmark is at most $O((k \log n)^{2/3})$. This result is very general, as it makes no assumptions on the demand distribution.
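The closed form $\arg\max_p \nu(p) = \max(p_r, S^{-1}(k/n))$ from Claim 2.2 can be checked numerically. The sketch below does so for the uniform distribution on $[0, 1]$, a test case chosen for this example (not one analyzed in the text), for which $S(p) = 1 - p$, $p_r = 1/2$ and $S^{-1}(q) = 1 - q$. The function names and the grid search are likewise illustrative.

```python
def nu(p, n, k, survival):
    """nu(p) = p * min(k, n * S(p)), the approximate fixed-price revenue."""
    return p * min(k, n * survival(p))


def best_fixed_price_uniform(n, k):
    """Closed form max(p_r, S^{-1}(k/n)) for the uniform distribution on [0, 1]."""
    p_r = 0.5                 # maximizer of R(p) = p * (1 - p)
    return max(p_r, 1.0 - k / n)


def grid_argmax(n, k, survival, steps=10000):
    """Brute-force maximizer of nu over an evenly spaced price grid."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: nu(p, n, k, survival))
```

With $n = 1000$ and $k = 100$ both routes give a price of about 0.9 with $\nu \approx 90$, whereas the single-round maximizer $p_r = 0.5$ would sell out early and yield only $\nu(0.5) = 50$. This is the effect that makes maximizing $R(\cdot)$ the wrong target under limited supply.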
As discussed in Section 1, we design an algorithm that carefully optimizes the explore-exploit trade-off. We use an index-based algorithm in which each arm is assigned a numerical score, called index, so that in each round an arm with the highest index is picked. The index of an arm depends only on the past history of this arm. In prior work on index-based bandit algorithms the index of an arm was defined according to the estimated expected payoff from this arm in a single round. Instead, we define the index according to the estimated expected total payoff from this arm given the constraints.

We apply the above idea to UCB1. The index in UCB1 is, essentially, the best available Upper Confidence Bound (UCB) on the expected single-round payoff from a given arm. Accordingly, we define a new index, so that the index of a given price corresponds to a UCB on the expected total payoff from this price (i.e., from a fixed-price strategy with this price), given the number of agents and the inventory size. Such an index takes into account both the average payoff from this arm ("exploitation") and the number of samples for this arm ("exploration"), as well as the supply constraint. In particular we recover the appealing property of UCB1 that it does not separate exploration and exploitation, and instead explores arms (prices) according to a schedule that unceasingly adapts to the observed payoffs.

There are several steps to make this approach more precise. First, while it is tempting to use the current values for the number of agents and the inventory size to define the index, we adopt a non-obvious (but more elegant) design choice to use the original values, i.e. the $n$ and the $k$. Second, since the exact expected total payoff for a given price is hard to quantify, we will instead use a natural approximation thereof provided by $\nu(p)$ in Claim 2.2. In other words, our index will be a UCB on $\nu(p)$. Third, in specifying the UCB we will use a non-standard estimator from [Kleinberg et al. 2008] to better handle prices with very low survival rate.

The main technical hurdle in the analysis is to "charge" each suboptimal price for each time that it is chosen, in a way that the total regret is bounded by the sum of these charges and this sum can be usefully bounded from above. The analysis of UCB1 accomplishes this via simple (but very elegant) tricks which, unfortunately, fail in the limited supply setting.

An additional difficulty comes from the probabilistic nature of the analysis. While we adopt a well-known trick (we define some high-probability events and assume that these events hold deterministically in the rest of the analysis), choosing an appropriate collection of events is, in our case, non-trivial. Proving that these events indeed hold with high probability relies on some non-standard tail bounds from prior work.

Our pricing strategy

Let us define our pricing strategy, called CappedUCB. The pricing strategy is initialized with a set $P$ of "active prices". In each round $t$, some price $p \in P$ is chosen. Namely, for each price $p \in P$ we define a numerical score, called index, and we pick a price with the highest index, breaking ties arbitrarily. Once $k$ items are sold, CappedUCB sets the price to $\infty$ and never sells any additional item.

By Claim 2.2, the expected revenue from the fixed-price strategy $A_k^n(p)$ is roughly $\nu(p) \triangleq p \cdot \min(k,\, n\, S(p))$. In each round $t$, we define the index $I_t(p)$ as a UCB on $\nu(p)$:

$$I_t(p) \triangleq p \cdot \min(k,\, n\, S_t^{\mathrm{UB}}(p)).$$
Here $S_t^{\mathrm{UB}}(p)$ is a UCB on the survival rate $S(p)$, as defined below. For each $p \in P$, let $N_t(p)$ be the number of rounds before $t$ in which price $p$ has been chosen, and let $k_t(p)$ be the number of items sold in these rounds. Then $\hat{S}_t(p) \triangleq k_t(p)/N_t(p)$ is the current average survival rate. To avoid division by zero, we define $\hat{S}_t(p)$ to be equal to 1 when $N_t(p) = 0$. We will define $S_t^{\mathrm{UB}}(p) = \hat{S}_t(p) + r_t(p)$, where $r_t(p)$ is a confidence radius: some number such that

$$|S(p) - \hat{S}_t(p)| \le r_t(p) \quad (\forall p \in P,\ t \le n) \qquad (2)$$

holds with high probability, namely with probability at least $1 - n^{-2}$.

We need to define a suitable confidence radius $r_t(p)$, which we want to be as small as possible subject to (2). Note that $r_t(p)$ must be defined in terms of quantities that are observable at time $t$, such as $N_t(p)$ and $\hat{S}_t(p)$. A standard confidence radius used in the literature is (essentially)

$$r_t(p) = \sqrt{\frac{\Theta(\log n)}{N_t(p) + 1}}.$$

Instead, we use a more elaborate confidence radius from [Kleinberg et al. 2008]:

$$r_t(p) \triangleq \frac{\alpha}{N_t(p) + 1} + \sqrt{\frac{\alpha\, \hat{S}_t(p)}{N_t(p) + 1}}, \quad \text{for some } \alpha = \Theta(\log n). \qquad (3)$$

The confidence radius in (3) performs as well as the standard one in the worst case: $r_t(p) \le O\!\left(\sqrt{\frac{\log n}{N_t(p) + 1}}\right)$, and much better for very small survival rates: $r_t(p) \le O\!\left(\frac{\log n}{N_t(p) + 1}\right)$.
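The difference between the two confidence radii is easy to see numerically. The following Python sketch implements both, together with the resulting upper confidence bound on the survival rate. The function names are illustrative, and taking the constant inside $\alpha = \Theta(\log n)$ equal to 1 is an assumption of this example, since the text does not fix that constant.

```python
import math


def standard_radius(alpha, n_t):
    """Standard radius sqrt(alpha / (N_t + 1)), with alpha = Theta(log n)."""
    return math.sqrt(alpha / (n_t + 1))


def refined_radius(alpha, n_t, s_hat):
    """Radius from (3): alpha / (N_t + 1) + sqrt(alpha * S_hat / (N_t + 1))."""
    return alpha / (n_t + 1) + math.sqrt(alpha * s_hat / (n_t + 1))


def survival_ucb(k_t, n_t, alpha):
    """Upper confidence bound S_hat + r on the survival rate of one price."""
    s_hat = k_t / n_t if n_t > 0 else 1.0   # convention: S_hat = 1 if never tried
    return s_hat + refined_radius(alpha, n_t, s_hat)


if __name__ == "__main__":
    alpha = math.log(10**6)                 # assumed constant 1 in Theta(log n)
    for s_hat in (1.0, 0.1, 0.001):
        print(s_hat, standard_radius(alpha, 10_000),
              refined_radius(alpha, 10_000, s_hat))
```

For a price offered 10,000 times, the refined radius stays within a factor of two of the standard one when $\hat{S}_t(p) = 1$, and is more than ten times smaller when $\hat{S}_t(p) = 0.001$. This is what allows prices with very low survival rates to be estimated accurately.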

Mechanism 1. Pricing strategy CappedUCB for $n$ agents and $k$ items
Parameter: $\delta \in (0, 1)$
1: $P \leftarrow \{\delta(1+\delta)^i \in [0,1] : i \in \mathbb{N}\}$ {active prices}
2: While there is at least one item left, in each round $t$ pick any price $p \in \arg\max_{p \in P} I_t(p)$, where $I_t(p)$ is the index given by (4).
3: For all remaining agents, set price $p = \infty$.

To recap, we have

$$I_t(p) \triangleq p \cdot \min\bigl(k,\; n\,(\hat{S}_t(p) + r_t(p))\bigr), \quad \text{where } r_t(p) \text{ is from (3).} \tag{4}$$

Finally, the active prices are given by

$$P = \{\delta(1+\delta)^i \in [0,1] : i \in \mathbb{N}\}, \quad \text{where } \delta \in (0,1) \text{ is a parameter.} \tag{5}$$

This completes the specification of CappedUCB. See Mechanism 1 for the pseudocode.

3.2. Analysis of the pricing strategy

Our goal is to bound from above the regret of CappedUCB, which is the difference between the optimal expected revenue of a fixed-price strategy and the expected revenue of CappedUCB. We prove that CappedUCB achieves regret $O(k \log n)^{2/3}$ for a suitable choice of parameter $\delta$ in (5).

LEMMA 3.1. CappedUCB with $\delta = k^{-1/3} (\log n)^{2/3}$ achieves regret $O(k \log n)^{2/3}$.

The bound above is trivial for $k < \log^2 n$, so we will assume $k \ge \log^2 n$ from now on. Note that CappedUCB exits (sets the price to $\infty$) after it sells $k$ items. For a thought experiment, consider a version of this pricing strategy that does not exit and continues running as if it has unlimited supply of items; let us call this version CappedUCB'. Then the realized revenue of CappedUCB is exactly equal to the realized revenue obtained by CappedUCB' from selling the first $k$ items. Thus from here on we focus on analyzing the latter.

We will use the following notation. Let $X_t$ be the indicator variable of the random event that CappedUCB' makes a sale in round $t$. Note that $X_t$ is a 0-1 random variable with expectation $S(p_t)$, where $p_t$ depends on $X_1, \ldots, X_{t-1}$. Let $X \triangleq \sum_{t=1}^{n} X_t$ be the total number of sales if the inventory were unlimited. Note that $\mathbb{E}[X] = S \triangleq \sum_{t=1}^{n} S(p_t)$. Going back to our original algorithm, let $\widehat{\mathrm{Rev}}$ denote the realized revenue of CappedUCB (revenue that is realized in a given execution). Then

$$\widehat{\mathrm{Rev}} = \sum_{t=1}^{N} p_t X_t, \quad \text{where } N = \max\Bigl\{N \le n : \sum_{t=1}^{N} X_t \le k\Bigr\}. \tag{6}$$

High-probability events.
We tame the randomness inherent in the sales $X_t$ by setting up three high-probability events, as described below. In the rest of the analysis, we will argue deterministically under the assumption that these three events hold. It suffices because the expected loss in revenue from the low-probability failure events will be negligible. The three events are summarized in the following claim:

CLAIM 3.2. With probability at least $1 - n^{-2}$, for each round $t$ and each price $p \in P$:

$$|S(p) - \hat{S}_t(p)| \;\le\; r_t(p) \;\le\; 3\left(\frac{\alpha}{N_t(p)+1} + \sqrt{\frac{\alpha\, S(p)}{N_t(p)+1}}\right), \tag{7}$$

$$|X - S| < O\bigl(\sqrt{S \log n} + \log n\bigr), \tag{8}$$

$$\Bigl|\sum_{t=1}^{n} p_t\,(X_t - S(p_t))\Bigr| < O\bigl(\sqrt{S \log n} + \log n\bigr). \tag{9}$$
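Before arguing deterministically under these events, it can help to see the strategy of Mechanism 1 run on sampled valuations. The following is a minimal simulation sketch, not the authors' implementation. It assumes $\alpha = \log n$, that the exponent $i$ in (5) starts at 0, and that ties are broken by taking the lowest-indexed maximizer; the names `active_prices`, `capped_ucb`, and `sample_value` are hypothetical.

```python
import math
import random

def active_prices(delta):
    """Prices delta * (1 + delta)^i in [0, 1] as in eq. (5); i starts at 0 (assumption)."""
    prices, p = [], delta
    while p <= 1.0:
        prices.append(p)
        p *= 1.0 + delta
    return prices

def capped_ucb(n, k, delta, sample_value, rng):
    """Simulate CappedUCB for n agents and k items; returns (revenue, items_sold)."""
    alpha = math.log(n)            # assumed instance of alpha = Theta(log n)
    prices = active_prices(delta)
    plays = [0] * len(prices)      # N_t(p)
    sales = [0] * len(prices)      # k_t(p)
    revenue, sold = 0.0, 0

    def index(j):
        m = plays[j]
        s_hat = sales[j] / m if m > 0 else 1.0   # S_hat is 1 before the first play
        r = alpha / (m + 1) + math.sqrt(alpha * s_hat / (m + 1))  # eq. (3)
        return prices[j] * min(k, n * (s_hat + r))                # eq. (4)

    for _ in range(n):
        if sold >= k:              # supply exhausted: the price becomes infinite
            break
        j = max(range(len(prices)), key=index)
        plays[j] += 1
        if sample_value(rng) >= prices[j]:       # the agent buys iff value >= price
            sales[j] += 1
            sold += 1
            revenue += prices[j]
    return revenue, sold

# Example run with valuations uniform on [0, 1] (an illustrative distribution).
rev, sold = capped_ucb(2000, 50, 0.2, lambda r: r.random(), random.Random(0))
```

Note that the index uses the original $n$ and $k$ in every round rather than the remaining agents and inventory, matching the design choice described earlier.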

The probability bounds on the three events in Claim 3.2 are derived via appropriate tail inequalities, some of which are non-standard; see Section 3.3 for further discussion and proofs. In the first event, the left inequality asserts that $r_t(p)$ is a confidence radius, and the right inequality gives the performance guarantee for it. The other two events focus on CappedUCB', and bound the deviation of the total number of sales ($X$) and the realized revenue ($\sum_{t=1}^{n} p_t X_t$) from their respective expectations; importantly, these bounds are in terms of $\sqrt{S}$ rather than $\sqrt{n}$. In the rest of the analysis we assume that the three events hold deterministically.

Single-round analysis. Let us analyze what happens in a particular round $t$ of the pricing strategy. Let $p_t$ be the price chosen in round $t$. Let $p^*_{\mathrm{act}} \in \arg\max_{p \in P} \nu(p)$ be the best active price according to $\nu(\cdot)$, and let $\nu^*_{\mathrm{act}} \triangleq \nu(p^*_{\mathrm{act}})$. Let

$$\Delta(p) \triangleq \max\bigl(0,\; \tfrac{1}{n}\,\nu^*_{\mathrm{act}} - p\,S(p)\bigr)$$

be our notion of badness of price $p$, compared to the optimal approximate revenue $\nu^*_{\mathrm{act}}$. We will use this notation throughout the analysis, and eventually we will bound regret in terms of $\sum_{p \in P} \Delta(p)\, N(p)$, where $N(p)$ is the total number of times price $p$ is chosen.

CLAIM 3.3. For each price $p \in P$ it holds that

$$N(p)\,\Delta(p) \le O(\log n)\left(1 + \frac{k}{n} \cdot \frac{1}{\Delta(p)}\right). \tag{10}$$

PROOF. By definition (2) of the confidence radius, for each price $p \in P$ and each round $t$ we have

$$\nu(p) \le I_t(p) \le p \cdot \min\bigl(k,\; n\,(S(p) + 2\, r_t(p))\bigr). \tag{11}$$

Let us use this to connect each choice $p_t$ with $\nu^*_{\mathrm{act}}$:

$$I_t(p_t) \ge I_t(p^*_{\mathrm{act}}) \ge \nu(p^*_{\mathrm{act}}) = \nu^*_{\mathrm{act}},$$
$$I_t(p_t) \le p_t \cdot \min\bigl(k,\; n\,(S(p_t) + 2\, r_t(p_t))\bigr).$$

Combining these two inequalities, we obtain the key inequality:

$$\tfrac{1}{n}\,\nu^*_{\mathrm{act}} \le p_t \cdot \min\bigl(\tfrac{k}{n},\; S(p_t) + 2\, r_t(p_t)\bigr). \tag{12}$$

There are several consequences for $p_t$ and $\Delta(p_t)$:

$$p_t \ge \tfrac{1}{k}\,\nu^*_{\mathrm{act}}, \qquad \Delta(p_t) \le 2\, p_t\, r_t(p_t), \qquad \Delta(p_t) > 0 \;\Rightarrow\; S(p_t) < \tfrac{k}{n}. \tag{13}$$

The first two lines in (13) follow immediately from (12). To obtain the third line, note that $\Delta(p_t) > 0$ implies $p_t \tfrac{k}{n} \ge \tfrac{1}{n}\nu^*_{\mathrm{act}} > p_t\, S(p_t)$, which in turn implies $S(p_t) < \tfrac{k}{n}$. Note that we have not yet used the definition (3) of the confidence radius.
For each price $p = p_t$, let $t$ be the last round in which this price has been selected by the pricing strategy. Note that $N(p)$ (the total number of times price $p$ is chosen) is equal to $N_t(p) + 1$. Then using the second line in (13) to bound $\Delta(p)$, Eq. (7) to bound the confidence radius $r_t(p)$, and the third line in (13) to bound the survival rate, we obtain:

$$\Delta(p) \le O(p) \cdot \max\left(\frac{\log n}{N(p)},\; \sqrt{\frac{k}{n} \cdot \frac{\log n}{N(p)}}\right).$$

Rearranging the terms, we can bound $N(p)$ in terms of $\Delta(p)$ and obtain (10).

Analyzing the total revenue. A key step is the following claim that allows us to consider $\sum_{t=1}^{n} p_t\, S(p_t)$ instead of the realized revenue $\widehat{\mathrm{Rev}}$, effectively ignoring the capacity constraint. This is where we use the high-probability events (8) and (9). For brevity, let us denote $\beta(S) = O(\sqrt{S \log n} + \log n)$.

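CLAIM 3.4. $\widehat{\mathrm{Rev}} \ge \min\bigl(\nu^*_{\mathrm{act}},\; \sum_{t=1}^{n} p_t\, S(p_t)\bigr) - \beta(k)$.

PROOF. Recall that $p_t \ge \tfrac{1}{k}\nu^*_{\mathrm{act}}$ by (13). It follows that $\widehat{\mathrm{Rev}} \ge \nu^*_{\mathrm{act}}$ if $\sum_{t=1}^{n} X_t > k$. Hence, if $\widehat{\mathrm{Rev}} < \nu^*_{\mathrm{act}}$ then $\sum_{t=1}^{n} X_t \le k$ and so $\widehat{\mathrm{Rev}} = \sum_{t=1}^{n} p_t X_t$. Thus, by (9) we have

$$\widehat{\mathrm{Rev}} \ge \min\Bigl(\nu^*_{\mathrm{act}},\; \sum_{t=1}^{n} p_t X_t\Bigr) \ge \min\Bigl(\nu^*_{\mathrm{act}},\; \sum_{t=1}^{n} p_t\, S(p_t) - \beta(S)\Bigr).$$

So the claim holds when $S \le k$. On the other hand, if $S > k$ then by (8) it holds that $X \ge S - \beta(S) \ge k - \beta(k)$. It follows that

$$\widehat{\mathrm{Rev}} \ge \min(k, X) \cdot \bigl(\tfrac{1}{k}\,\nu^*_{\mathrm{act}}\bigr) \ge \nu^*_{\mathrm{act}} - \beta(k).$$

In light of Claim 3.4, we can now focus on $\sum_{t=1}^{n} p_t\, S(p_t)$:

$$\sum_{t=1}^{n} p_t\, S(p_t) \ge \sum_{t=1}^{n} \bigl[\tfrac{1}{n}\,\nu^*_{\mathrm{act}} - \Delta(p_t)\bigr] = \nu^*_{\mathrm{act}} - \sum_{t=1}^{n} \Delta(p_t) = \nu^*_{\mathrm{act}} - \sum_{p \in P} \Delta(p)\, N(p). \tag{14}$$

Fix a parameter $\epsilon > 0$ to be specified later, and denote

$$P_{\mathrm{sel}} \triangleq \{p \in P : N(p) \ge 1\}, \qquad P_\epsilon \triangleq \{p \in P_{\mathrm{sel}} : \Delta(p) \ge \epsilon\}$$

to be, respectively, the set of prices selected at least once and the set of prices of badness at least $\epsilon$ that have been selected at least once. Plugging (10) into (14), we obtain

$$\sum_{p \in P} \Delta(p)\, N(p) \le \sum_{p \in P_{\mathrm{sel}} \setminus P_\epsilon} \Delta(p)\, N(p) + \sum_{p \in P_\epsilon} \Delta(p)\, N(p) \le \epsilon n + O(\log n) \sum_{p \in P_\epsilon} \Bigl(1 + \frac{k}{n} \cdot \frac{1}{\Delta(p)}\Bigr) \le \epsilon n + O(\log n) \Bigl(|P_\epsilon| + \frac{k}{n} \sum_{p \in P_\epsilon} \frac{1}{\Delta(p)}\Bigr). \tag{15}$$

Combining (14), (15) and Claim 3.4 yields a claim that summarizes our findings:

CLAIM 3.5. For any set $P$ of active prices and any parameter $\epsilon > 0$ it holds that

$$\nu^*_{\mathrm{act}} - \mathbb{E}[\widehat{\mathrm{Rev}}] \le \epsilon n + O(\log n) \Bigl(|P_\epsilon| + \frac{k}{n} \sum_{p \in P_\epsilon} \frac{1}{\Delta(p)}\Bigr) + \beta(k).$$

Interestingly, this claim holds for any set of active prices. The following claim, however, takes advantage of the fact that the active prices are given by (5).

CLAIM 3.6. $\nu^*_{\mathrm{act}} \ge \nu^* - \delta k$, where $\nu^* \triangleq \max_p \nu(p)$.

PROOF. Let $p^* \in \arg\max_p \nu(p)$ denote the best fixed price with respect to $\nu(\cdot)$, ties broken arbitrarily. If $p^* \le \delta$ then $\nu^* \le \delta k$. Else, letting $p_0 = \max\{p \in P : p \le p^*\}$ we have $p_0/p^* \ge \tfrac{1}{1+\delta} \ge 1 - \delta$, and so

$$\nu^*_{\mathrm{act}} \ge \nu(p_0) \ge \frac{p_0}{p^*}\,\nu(p^*) \ge \nu^*\,(1 - \delta) \ge \nu^* - \delta k.$$

It follows that for any $\epsilon > 0$ and $\delta \in (0, 1)$ we have:

$$\mathrm{Regret} \le O(\log n) \Bigl(|P_\epsilon| + \frac{k}{n} \sum_{p \in P_\epsilon} \frac{1}{\Delta(p)}\Bigr) + \epsilon n + \delta k + \beta(k). \tag{16}$$

Plugging in $\Delta(p) \ge \epsilon$ for each $p \in P_\epsilon$ in (16), we obtain:

$$\mathrm{Regret} \le O(|P_\epsilon| \log n) \Bigl(1 + \frac{1}{\epsilon} \cdot \frac{k}{n}\Bigr) + \epsilon n + \delta k + \beta(k).$$

Note that $|P| \le \tfrac{1}{\delta} \log n$. To simplify, we will assume that $\delta \ge \tfrac{1}{n}$ and $\epsilon = \delta\, \tfrac{k}{n}$. Then

$$\mathrm{Regret} \le O\Bigl(\delta k + \frac{1}{\delta^2} (\log n)^2 + \sqrt{k \log n}\Bigr). \tag{17}$$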
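The trade-off in (17) can be checked numerically: the first summand grows with $\delta$ while the second shrinks. The sketch below evaluates the three summands with all hidden constants set to 1 (an assumption, since the $O(\cdot)$ notation does not fix them) and confirms that $\delta = k^{-1/3}(\log n)^{2/3}$ equalizes the first two. The function names are hypothetical.

```python
import math

def bound_17_terms(delta, k, n):
    """The three summands of (17) with constants set to 1 (assumption)."""
    log_n = math.log(n)
    return delta * k, (log_n / delta) ** 2, math.sqrt(k * log_n)

def balancing_delta(k, n):
    """The delta that equalizes the first two summands: k^(-1/3) * (log n)^(2/3)."""
    return k ** (-1.0 / 3.0) * math.log(n) ** (2.0 / 3.0)

# Illustrative instance (assumed): k = 1000 items, n = 10^6 agents.
k, n = 1000, 10**6
delta_star = balancing_delta(k, n)
first, second, third = bound_17_terms(delta_star, k, n)
target = (k * math.log(n)) ** (2.0 / 3.0)   # the (k log n)^(2/3) rate
```

At this choice both $\delta$-dependent summands equal $(k \log n)^{2/3}$, and the third summand $\sqrt{k \log n}$ is of lower order.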

Finally, it remains to pick $\delta$ to minimize the right-hand side of (17). Let us simply take $\delta$ such that the first two summands are equal: $\delta = k^{-1/3} (\log n)^{2/3}$. Then the two summands are equal to $O(k \log n)^{2/3}$. This completes the proof of Lemma 3.1.

3.3. Tail bounds and the proof of Claim 3.2

We use the well-known Chernoff Bounds (e.g., Theorem 2.3 in [McDiarmid 1998]).

THEOREM 3.7. Consider $n$ i.i.d. random variables $X_1, \ldots, X_n$ with values in $[0, 1]$. Let $X = \tfrac{1}{n} \sum_{i=1}^{n} X_i$ be their average, and let $\mu = \mathbb{E}[X]$. Then:
(a) $\Pr[\,|X - \mu| > \delta\mu\,] < 2\, e^{-\mu n \delta^2/3}$ for any $\delta \in (0, 1)$.
(b) $\Pr[X > a] < 2^{-an}$ for any $a > 6\mu$.

Further, we use a somewhat non-standard corollary from [Kleinberg et al. 2008] which provides a sharper (i.e., smaller) confidence radius (in the sense of (3)) when $\mu$ is small. For completeness, we provide the proof in the full version.

THEOREM 3.8. Consider $n$ i.i.d. random variables $X_1, \ldots, X_n$ on $[0, 1]$. Let $X$ be their average, and let $\mu = \mathbb{E}[X]$. Then for any $\alpha > 0$, letting $r(\alpha, x) = \tfrac{\alpha}{n} + \sqrt{\tfrac{\alpha x}{n}}$, we have:

$$\Pr\bigl[\,|X - \mu| < r(\alpha, X) < 3\, r(\alpha, \mu)\,\bigr] > 1 - e^{-\Omega(\alpha)}.$$

PROOF OF (7) IN CLAIM 3.2. For each price $p \in P$ let $\{Z_{i,p}\}_{i \le n}$ be a family of independent 0-1 random variables with expectation $S(p)$. Without loss of generality, let us pretend that the $i$-th time that price $p$ is selected by the pricing strategy, a sale happens if and only if $Z_{i,p} = 1$. Then by Theorem 3.8, after the $i$-th play of price $p$ the bound (7) holds with probability at least $1 - n^{-4}$. Taking the Union Bound over all choices of $i$ and all choices of $p$, we obtain that (7) holds with probability at least $1 - n^{-2}$ as long as $|P| \le n$ (which is the case for us).

Sharper Azuma-Hoeffding inequality. We use a tail bound on the sum of $n$ random variables $X_t \in \{0, 1\}$ such that each variable $X_t$ is a random coin toss with probability $M_t$ that depends on the previous variables $X_1, \ldots, X_{t-1}$. We are interested in bounding the deviation $|X - M|$, where $X = \sum_t X_t$ and $M = \sum_t M_t$. The well-known Azuma-Hoeffding inequality states that with high probability we have $|X - M| \le O(\sqrt{n \log n})$. However, we need a sharper high-probability bound: $|X - M| \le O(\sqrt{M \log n})$.
Moreover, we need an extension of such a bound which considers the deviation $\sum_{t=1}^{n} \alpha_t (X_t - M_t)$, where each multiplier $\alpha_t \in [0, 1]$ is determined by $X_1, \ldots, X_{t-1}$.

THEOREM 3.9. Let $X_1, \ldots, X_n$ be 0-1 random variables. Let $M = \sum_{t=1}^{n} M_t$, where $M_t = \mathbb{E}[X_t \mid X_1, \ldots, X_{t-1}]$. For each $t$, let $\alpha_t \in [0, 1]$ be the multiplier determined by $X_1, \ldots, X_{t-1}$. Then for any $b \ge 1$ the event

$$\Bigl|\sum_{t=1}^{n} \alpha_t (X_t - M_t)\Bigr| \le b\bigl(\sqrt{M \log n} + \log n\bigr)$$

holds with probability at least $1 - n^{-\Omega(b)}$.

We have not been able to find this exact formulation in the literature. In the full version we derive it as a corollary of a more general bound from the literature.

PROOF OF (8) AND (9) IN CLAIM 3.2. Recall that for each $t$, $X_t$ is a 0-1 random variable with expectation $S(p_t)$, where $p_t$ depends on $X_1, \ldots, X_{t-1}$. Using Theorem 3.9 with $\alpha_t \equiv 1$ we obtain (8). Using Theorem 3.9 with $\alpha_t = p_t$ we obtain (9).

4. THE $O(\sqrt{k} \log n)$ REGRET BOUND (THEOREM 1.3)

We show that the pricing strategy from Section 3 (with a different parameter) satisfies an improved regret bound, $O(\sqrt{k} \log n)$, if the demand distribution is regular and moreover the ratio $\tfrac{k}{n}$ is sufficiently small. The regret bound depends on a distribution-specific constant.

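THEOREM 4.1. For any regular demand distribution $F$ there exist positive constants $s_F$, $c_F$ such that CappedUCB with $\delta = k^{-1/2} \log n$ achieves regret $O(c_F \sqrt{k} \log n)$ whenever $\tfrac{k}{n} \le s_F$. For monotone hazard rate distributions we can take $s_F = \tfrac{1}{4}$.

PROOF. Let $g(s) \triangleq s \cdot S^{-1}(s)$ be a function from $[S(1), 1]$ to $[0, 1]$ that maps a survival rate to the corresponding revenue. Regularity implies $g''(\cdot) \le 0$. Since $g'(0) > 0$, we can pick a constant $s_F > 0$ such that $C \triangleq g'(s_F) > 0$. For monotone hazard rate distributions we can take $s_F = \tfrac{1}{4}$ because for any maximizer $s^*$ of $g(\cdot)$ it holds that $s^* \ge \tfrac{1}{e}$ (this claim is proved in the full version). Now, for any $\tfrac{k}{n} \le s_F$ we have that $g'(\tfrac{k}{n}) \ge C$. We will use this to obtain a lower bound on $\Delta(p)$; any such lower bound is absent in the analysis in Section 3. This improvement results in savings in (16), which in turn implies the claimed regret bound.

We will use the notation from Section 3.2, particularly the badness $\Delta(p)$ and the set $P_\epsilon$ of arms of badness at least $\epsilon$ that have been selected at least once. Note that by regularity $g'(s) \ge C$ for any $s \in (0, \tfrac{k}{n})$. Let $p^* = S^{-1}(\tfrac{k}{n})$ and $p \in P_\epsilon$. By the third line in (13) it holds that $S(p) < \tfrac{k}{n}$ and then $p > p^*$.

First, we claim that $S(p) < \tfrac{p^*}{p} \cdot \tfrac{k}{n}$. This is because $p\, S(p) = g(S(p)) < g(\tfrac{k}{n}) = p^*\, \tfrac{k}{n}$.

Second, we bound $\Delta(p)$ from below:

$$\tfrac{1}{n}\,\nu^*_{\mathrm{act}} \ge (1 - \delta)\,\tfrac{1}{n}\,\nu^* \ge (1 - \delta)\, g(\tfrac{k}{n}),$$

$$\Delta(p) \ge (1 - \delta)\, g(\tfrac{k}{n}) - g(S(p)) \ge \bigl[g(\tfrac{k}{n}) - g(S(p))\bigr] - \delta\, g(\tfrac{k}{n}) \ge C\bigl(\tfrac{k}{n} - S(p)\bigr) - \delta\, \tfrac{k}{n}\, p^* \ge C\, \tfrac{k}{n} \bigl(1 - \tfrac{p^*}{p}\bigr) - \delta\, \tfrac{k}{n}\, p^* \ge C\, \tfrac{k}{n} \Bigl(1 - \tfrac{p^*}{p}\bigl(1 + \tfrac{\delta}{C}\bigr)\Bigr).$$

Since $P$ is given by (5), it holds that $P_\epsilon \subset \{p^*\, \alpha\, (1 + \delta)^i : i \in \mathbb{N}\}$ for some $\alpha \ge 1$. Define

$$P' \triangleq \{p \in P_\epsilon : p = p^*\, \alpha\, (1 + \delta)^i \text{ with } i \ge \tfrac{2}{C}\}.$$

Then for any $p \in P'$ it holds that $p/p^* = \alpha (1 + \delta)^i \ge 1 + i\delta$ and therefore

$$\Delta(p) \ge C\, \tfrac{k}{n} \Bigl(1 - \frac{1 + \delta/C}{1 + i\delta}\Bigr) \ge \frac{C}{2} \cdot \frac{k}{n} \cdot \frac{i\delta}{1 + i\delta}.$$

Therefore, noting that $|P'| \le |P| \le O(\tfrac{1}{\delta} \log \tfrac{1}{\delta})$, we have

$$\frac{k}{n} \sum_{p \in P'} \frac{1}{\Delta(p)} \le \frac{2}{C} \sum_{p \in P'} \Bigl(1 + \frac{1}{i\delta}\Bigr) \le \frac{2}{C} \Bigl(|P'| + \frac{1}{\delta} \log |P'|\Bigr) \le O\Bigl(\frac{1}{C} \cdot \frac{1}{\delta} \log \frac{1}{\delta}\Bigr),$$

$$\sum_{p \in P_\epsilon \setminus P'} \frac{1}{\Delta(p)} \le \frac{1}{\epsilon}\, |P_\epsilon \setminus P'| \le \frac{1}{\epsilon} \Bigl(\frac{2}{C} + 1\Bigr).$$

Plugging this into (16) with $\epsilon = \delta\, \tfrac{k}{n}$, we obtain:

$$\frac{k}{n} \sum_{p \in P_\epsilon} \frac{1}{\Delta(p)} \le O\Bigl(\frac{1}{\delta} \log \frac{1}{\delta}\Bigr) \Bigl(1 + \frac{1}{C}\Bigr),$$

$$\mathrm{Regret} \le O\Bigl(\delta k + \frac{1}{\delta} \Bigl(1 + \frac{1}{C}\Bigr) (\log n)^2 + \sqrt{k \log n}\Bigr) \tag{18}$$

$$\le O(c_F \sqrt{k} \log n), \quad \text{where } c_F = 1 + 1/C.$$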
The regret bound (18) improves over the corresponding bound (17) in Section 3. We obtain the final bound by plugging in $\delta = k^{-1/2} \log n$.

It is desirable to achieve the bounds in Theorem 1.2 and Theorem 4.1 using the same pricing strategy. The choice of parameter $\delta$ in Theorem 4.1 results in a trivial $O(k)$ regret guarantee for arbitrary demand distributions (as per Equation (17)). However, varying $\delta$ and using Equations (17) and (18) we obtain a family of pricing strategies that improve over the bound in Theorem 1.2 for the nice setting in Theorem 4.1, and moreover have non-trivial regret bounds for arbitrary demand distributions.

THEOREM 4.2. For each $\gamma \in [\tfrac{1}{3}, \tfrac{1}{2}]$, pricing strategy CappedUCB with parameter $\delta = \tilde{O}(k^{-\gamma})$ achieves regret $\tilde{O}(k^{1-\gamma})\,(1 + 1/g'(\tfrac{k}{n}))$ if the demand distribution is regular and $g'(\tfrac{k}{n}) > 0$, and regret $\tilde{O}(k^{2\gamma})$ for arbitrary demand distributions.

5. LOWER BOUNDS

We prove two lower bounds on regret over all demand distributions which match the upper bounds in Theorem 1.2 and Theorem 1.3, respectively. (Note that the latter upper bound is specific to regular distributions.) Throughout this section, regret is with respect to the fixed-price benchmark. Our proof is a black-box reduction to the unlimited supply case ($k = n$), which in turn is proved in [Kleinberg and Leighton 2003].

THEOREM 5.1. Consider the dynamic pricing problem with $n$ agents and $k \le n$ items.
(a) No detail-free pricing strategy can achieve regret $o(k^{2/3})$ for arbitrarily large $k$, $n$.
(b) No detail-free pricing strategy can achieve regret $O(c_F\, k^{\gamma})$, $\gamma < \tfrac{1}{2}$, for all demand distributions $F$ and arbitrarily large $k$, $n$, where the constant $c_F$ can depend on $F$.

PROOF. Suppose that some pricing strategy $A$ violates part (a). Then there is a sequence $\{k_i, n_i\}_{i \in \mathbb{N}}$, where $k_i \le n_i$ and $\{k_i\}_{i \in \mathbb{N}}$ is strictly increasing, such that $A$ achieves regret $o(k^{2/3})$ for all problem instances with $n_i$ agents and $k_i$ items, for each $i \in \mathbb{N}$. To obtain a contradiction, let us use $A$ to solve the unlimited supply problem with regret $o(n^{2/3})$. Specifically, we will solve problem instances with $k_i/4$ agents, for each $i$.

Fix $i \in \mathbb{N}$ and let $k = k_i$ and $n = n_i$. Consider a problem instance $I$ with unlimited supply and $k/4$ agents and survival rate $S(\cdot)$. Let $I'$ be an artificial problem instance with unlimited supply and $n$ agents, so that the first $k/4$ agents in $I'$ correspond to $I$.
Form an artificial problem instance $J$ with $k$ items and $n$ agents as follows: in each round, $A$ outputs a price, then with probability $\tfrac{k}{2n}$ this price is offered to the next agent in $I'$, and with the remaining probability there is no interaction with agents in $I'$ and no sale. Since the demand distribution for $J$ is a mixture of the no-sale event, which happens with probability $1 - \tfrac{k}{2n}$, and the original demand distribution for $I$, the survival rate for $J$ is given by $S_J(p) = \tfrac{k}{2n}\, S(p)$.

Running $A$ on problem instance $J$ induces a pricing strategy $A'$ on the original problem instance $I$. (If $A$ stops before it iterates through all agents in $I$, the remaining agents in $I$ are offered a price of $\infty$.) We will show that $A'$ achieves regret $o(k^{2/3})$ on $I$.

Let $\mathrm{Rev}_J(A)$ and $\widehat{\mathrm{Rev}}_J(A)$ be, respectively, the expected revenue and the realized revenue of $A$ on problem instance $J$. Let $r = \arg\max_p p\, S(p)$ be the Myerson reserve price, and let $A_r$ be the fixed-price strategy with price $r$. By our assumption, we have that $\mathrm{Rev}_J(A) \ge \mathrm{Rev}_J(A_r) - o(k^{2/3})$. We need to deduce $\mathrm{Rev}_I(A') \ge \mathrm{Rev}_I(A_r) - o(k^{2/3})$.

Let $N$ be the number of rounds in $J$ in which $A$ interacts with the agents in $I'$. With high probability $\tfrac{k}{4} < N < k$. Let us condition on $N$ and the event $E_N \triangleq \{k/4 < N < k\}$:

$$\mathbb{E}[\widehat{\mathrm{Rev}}_J(A_r) \mid N, E_N] = N\, r\, S(r),$$
$$\mathbb{E}[\widehat{\mathrm{Rev}}_J(A) - \widehat{\mathrm{Rev}}_I(A') \mid N, E_N] \le (N - \tfrac{k}{4})\, r\, S(r).$$

Since $\mathbb{E}[N] = \tfrac{k}{2}$, it follows that

$$\mathrm{Rev}_I(A') \ge \mathrm{Rev}_J(A) - \tfrac{k}{4}\, r\, S(r) - o(1) \ge \mathrm{Rev}_J(A_r) - \tfrac{k}{4}\, r\, S(r) - o(k^{2/3}) = \tfrac{k}{4}\, r\, S(r) - o(k^{2/3}) = \mathrm{Rev}_I(A_r) - o(k^{2/3}),$$

as required. The reduction for part (b) proceeds similarly.
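The thinning step of the reduction can be illustrated with a short simulation. This is a sketch under assumptions: it uses a fixed price rather than an adaptive strategy $A$, valuations uniform on $[0,1]$, and hypothetical names (`run_thinned_instance`, `sample_value`). It only checks the two facts the proof relies on, namely that the thinned instance $J$ has survival rate $\tfrac{k}{2n} S(p)$ and that the number of interactions $N$ concentrates around $\tfrac{k}{2}$.

```python
import random

def run_thinned_instance(n, k, price, sample_value, rng):
    """Run n rounds of the artificial instance J at a fixed price.

    In each round, with probability k / (2n) the price is offered to the next
    agent of the underlying instance; otherwise there is no interaction and no
    sale. Returns (number of interactions N, number of sales).
    """
    interactions, sales = 0, 0
    for _ in range(n):
        if rng.random() < k / (2 * n):
            interactions += 1
            if sample_value(rng) >= price:   # the agent buys iff value >= price
                sales += 1
    return interactions, sales

# Illustrative parameters (assumed): uniform valuations, so S(0.5) = 0.5.
n, k, price = 200_000, 100_000, 0.5
N, sales = run_thinned_instance(n, k, price, lambda r: r.random(), random.Random(1))
empirical_rate = sales / n              # estimates S_J(p)
predicted_rate = k / (2 * n) * 0.5      # (k / 2n) * S(p)
```

Here `empirical_rate` should be close to `predicted_rate`, and `N` should fall strictly between $k/4$ and $k$, which is the event $E_N$ that the proof conditions on.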

6. SELLING VERY FEW ITEMS: PROOF OF THEOREM 1.5

In this section we target a case when very few items are available for sale (roughly, $k < O(\log^2 n)$), so that the bound in Theorem 1.1 becomes trivial. We provide a different pricing strategy whose regret does not depend on $n$, under the mild assumption of monotone hazard rate.

We rely on the characterization in Claim 2.2: we look for the price $p^* = \max(p_r, S^{-1}(\tfrac{k}{n}))$, where $p_r = \arg\max_p p\, S(p)$ is the Myerson reserve price. The pricing strategy proceeds as follows (see Mechanism 2). It considers prices $p_l = (1 - \delta)^l$, $l \in \mathbb{N}$, sequentially in descending order. For each $l$, it offers the price $p_l$ to a fixed number of agents. The loop stops once the pricing strategy detects that, essentially, the best $p_l$ has been reached: either $S(p_l)$ is close to $\tfrac{k}{n}$, or we are near a maximum of $p\, S(p)$. Parameters are chosen so as to minimize regret.

Mechanism 2. Descending prices
Parameter: Approximation parameters $\delta, \epsilon \in [0, 1]$
1: Let $\alpha = \tfrac{k}{n} \cdot \tfrac{1}{1 - \delta}$, $\gamma = \min(\alpha, 1/e)$.
2: $l \leftarrow 0$, $l_{\max} \leftarrow 0$, $R_{\max} \leftarrow 0$.
3: repeat
4: $l \leftarrow l + 1$, $p_l \leftarrow (1 + \delta)^{-l}$
5: Offer price $p_l$ to $m = \delta n / \log_{1+\delta}(1/\epsilon)$ agents.
6: Let $S_l$ be the fraction of them who accept.
7: Let $R_l = p_l\, S_l$ be the average per-agent revenue.
8: If $S_l \ge (1 + \delta)^{-1} \gamma$ and $R_l \ge R_{\max}$,
9: then $R_{\max} \leftarrow R_l$, $l_{\max} \leftarrow l$
10: until $p_l \le \epsilon$ or $S_l \ge (1 + \delta)\alpha$ or $R_l \le (1 + \delta)^{-2} R_{\max}$
11: Offer price $p = p_{l_{\max}}$ so long as unsold items remain.

THEOREM 6.1. For some parameters $\epsilon$ and $\delta$, Mechanism 2 achieves regret $O(k^{3/4}\, \mathrm{poly}\log(k))$ with respect to the offline benchmark, for any demand distribution that satisfies the monotone hazard rate condition.

We prove Theorem 6.1 for parameters $\epsilon = k^{-1/4}$ and $\delta = (\tfrac{1}{k} \log k)^{1/4}$. We derive it from a multiplicative bound: if $p^* > \epsilon$ then the expected revenue of Mechanism 2 is at least a $1 - O(\delta)$ fraction of the offline benchmark. It appears difficult to prove the additive bound directly. The proof can be found in the full version.

7. CONCLUSIONS AND OPEN QUESTIONS

We consider dynamic pricing with limited supply and achieve near-optimal performance using an index-based bandit-style algorithm. A key idea in designing this algorithm is that we define the index of an arm (price) according to the estimated expected total payoff from this arm given the known constraints.

It is worth noting that a good index-based algorithm did not have to exist in our setting. Indeed, many bandit algorithms in the literature are not index-based, e.g. EXP3 [Auer et al. 2002b] and the zooming algorithm [Kleinberg et al. 2008] and their respective variants. The fact that Gittins algorithm [Gittins 1979] and UCB1 [Auer et al. 2002a] achieve (near-)optimal performance with index-based algorithms was widely seen as an impressive contribution.

While in this paper we apply the above key idea to a specific index-based algorithm (UCB1), it can be seen as an (informal) general reduction for index-based algorithms
