Pricing for Customers with Probabilistic Valuations as a Continuous Knapsack Problem

Michael Benisch
School of Computer Science, Carnegie Mellon University
mbenisch@cs.cmu.edu

Norman Sadeh
School of Computer Science, Carnegie Mellon University
sadeh@cs.cmu.edu

James Andrews
School of Computer Science, Carnegie Mellon University
jandrews@andrew.cmu.edu

ABSTRACT

In this paper, we examine the problem of choosing discriminatory prices for customers with probabilistic valuations and a seller with indistinguishable copies of a good. We show that under certain assumptions this problem can be reduced to the continuous knapsack problem (CKP). We present a new fast ɛ-optimal algorithm for solving CKP instances with arbitrary concave reward functions. We also show that our algorithm can be extended beyond the CKP setting to handle pricing problems with overlapping goods (e.g., goods with common components or common resource requirements), rather than indistinguishable goods. We provide a framework for learning distributions over customer valuations from historical data that are accurate and compatible with our CKP algorithm, and we validate our techniques with experiments on pricing instances derived from the Trading Agent Competition in Supply Chain Management (TAC SCM). Our results confirm that our algorithm converges to an ɛ-optimal solution more quickly in practice than a previously proposed greedy heuristic.

1. INTRODUCTION

In this paper we study a ubiquitous pricing problem: a seller with finite, indistinguishable copies of a good attempts to optimize profit in choosing discriminatory, take-it-or-leave-it offers for a set of customers. Each customer draws a valuation from some probability distribution known to the seller, and decides whether or not to accept the seller's offer (we will refer to this as a probabilistic pricing problem for short). This setting characterizes existing electronic markets built around supply chains for goods or services.
In such markets, sellers can build probabilistic valuation models for their customers, e.g., to capture uncertainty about prices offered by competitors, or to reflect the demand of their own customers.

We show that this pricing problem is equivalent to a continuous knapsack problem (CKP) (i.e., the pricing problem can be reduced to the knapsack problem and vice versa) under two reasonable assumptions: (i) that probabilistic demand is equivalent to actual demand, and (ii) that the seller does not wish to over-promise goods in expectation. The CKP asks: given a knapsack with a weight limit and a set of weighted items, each with its value defined as a function of the fraction possessed, fill the knapsack with fractions of those items to maximize the knapsack's value. In the equivalent pricing problem, the items are the customer demand curves. The weight limit is the supply of the seller. The value of a fraction of an item is the expected value of that customer demand curve. The expected value is defined as the probability with which the customer is expected to accept the corresponding offer times the offer price.

Studies of CKPs in Artificial Intelligence (AI) and Operations Research (OR) most often focus on classes involving only linear and quadratic reward functions [10]. We present a fast algorithm for finding ɛ-optimal solutions to CKPs with arbitrary concave reward functions. The class of pricing problems that reduce to CKPs with concave reward functions involves customers with valuation distributions that satisfy the diminishing returns (DMR) property. We further augment our CKP algorithm by providing a framework for learning accurate customer valuation distributions that satisfy this property from historical pricing data.

We also show that our algorithm can be extended to pricing problems that involve sellers with distinguishable goods that require some indistinguishable shared resources (for example, common components or shared assembly capacity).
Such problems more accurately represent the movement from make-to-stock production to assemble-to-order and make-to-order production, but involve constraints that are too complex for traditional CKP algorithms.

The rest of this paper is structured as follows: In Section 2 we discuss related work on the probabilistic pricing and continuous knapsack problems. In Section 3 we present the pricing problem and its equivalence to continuous knapsack. In Section 4 we present our ɛ-optimal binary search algorithm for concave CKPs. Section 5 presents the framework for learning customer valuation functions. In Section 6 we validate our algorithm and framework empirically on instances derived from the Trading Agent Competition in Supply Chain Management (TAC SCM).

2. BACKGROUND

2.1 Related Work on Pricing Problems

The pricing problem we study captures many real-world settings; it is also the basis of interactions between customers and agents in the Trading Agent Competition in Supply Chain Management. TAC SCM is an international competition that revolves around a game featuring six competing agents, each entered by a different team. In TAC SCM, simulated customers submit requests for quotes (RFQs), which include a PC type, a quantity, a delivery date, a reserve price, and a tardiness penalty incurred for missing the requested delivery date. Agents can respond to RFQs with price quotes, or bids, and the agent that offers the lowest bid on an RFQ is rewarded with a contractual order (the reader is referred to [3] for the full game specification).

Other entrants from TAC SCM have published techniques that can be adapted to the setting we study. Pardoe and Stone proposed a heuristic algorithm with motivations similar to ours [8]. The algorithm greedily allocates resources to customers with the largest increase in price per additional unit sold. Benisch et al. suggested discretizing the space of prices and using Mixed Integer Programming to determine offers [1]; however, this technique requires a fairly coarse discretization on large-scale problems.
Sandholm and Suri provide research on the closely related setting of demand curve pricing. The work in [11] investigates the problem of a limited-supply seller choosing discriminatory prices with respect to a set of demand curves. Under the assumptions we make, the optimal polynomial-time pricing algorithm presented in [11] translates directly to the case when all customers have uniform valuation distributions. Additionally, the result in [11] that non-continuous demand functions are NP-Complete to price optimally implies the same is true of non-continuous valuation distributions.

Additionally, there have been several algorithms developed for solving certain classes of continuous knapsack problems. When rewards are linear functions of the included fractions of items, it is well known that a greedy algorithm provides an optimal solution in polynomial time¹. CKP instances with concave quadratic reward functions can be solved with standard quadratic programming solvers [10], or the algorithm provided by Sandholm and Suri. The only technique that generalizes beyond quadratic reward functions was presented by Melman and Rabinowitz in [7]. The technique in that paper provides a numerical solution to symmetric CKP instances where all reward functions are concave and identical². However, this technique involves solving a difficult root-finding problem, and its computational costs have not been fully explored.

¹ Linear reward functions for CKP would result from a pricing problem where all customers have fixed valuations.

2.2 Related Work on Learning Valuations

The second group of relevant work involves learning techniques for distributions over customer valuations. Relevant work on automated valuation profiling has focused primarily on first price sealed bid (FPSB) reverse auction settings. Reverse auctions refer to scenarios where several sellers are bidding for the business of a single customer. In the FPSB variant, customers collect bids from all potential sellers and pay the price associated with the lowest bid to the lowest bidder.
Predicting the winning bid in a first price reverse auction amounts to finding the largest price a seller could have offered the customer and still won. From the point of view of a seller, this price is equivalent to the customer's valuation for the good. Pardoe and Stone provide a technique for learning distributions over FPSB reverse auctions in TAC SCM [8]. The technique involves discretizing the range of possible customer valuations, and training a regression from historical data at each discrete valuation. The regression is used to predict the probability that a customer's valuation is less than or equal to the discrete point it is associated with. Similar techniques have been used to predict FPSB auction prices for IBM PCs [6], PDAs on eBay [5], and airline tickets [4].

3. MARKET MODEL

3.1 P3ID

We define the Probabilistic Pricing Problem with Indistinguishable Goods (P3ID) as follows: A seller has k indistinguishable units of a good to sell. There are n customers that demand different quantities of the good. Each customer has a private valuation for the entirety of her demand, and the seller has a probabilistic model of this valuation. Formally, the seller has the following inputs:

  k: the number of indistinguishable goods available to sell.
  n: the number of customers that have expressed demand for the good.
  \( q_i \): the number of units demanded by the ith customer.
  \( G_i(v_i) \): a cumulative distribution function (CDF) indicating the probability that the ith customer draws a valuation below \( v_i \). Consequently, \( 1 - G_i(p) \) is the probability that the customer will be willing to purchase her demand at price p.

² Identical reward functions for CKP would result from a pricing problem where all customers draw valuations from the same distribution.
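As a rough illustration of the inputs listed above, a P3ID instance could be held in a small Python structure. This is only a sketch: the class and field names are hypothetical, and modelling each \( G_i \) as a normal distribution is an assumption made here for concreteness (using the standard library's `statistics.NormalDist`), with the numbers borrowed from the example in Section 3.3.

```python
from dataclasses import dataclass
from statistics import NormalDist
from typing import List


@dataclass
class Customer:
    """One customer: demanded quantity q_i and a model of G_i (normal is an assumption)."""
    quantity: int
    valuation: NormalDist

    def accept_probability(self, price: float) -> float:
        # 1 - G_i(p): probability that the drawn valuation is at least the offered price.
        return 1.0 - self.valuation.cdf(price)


@dataclass
class P3ID:
    """A pricing instance: supply k and the n customers (n = len(customers))."""
    supply: int
    customers: List[Customer]


# Hypothetical instance mirroring the two-customer example of Section 3.3.
instance = P3ID(
    supply=5,
    customers=[
        Customer(quantity=3, valuation=NormalDist(mu=1500, sigma=300)),
        Customer(quantity=4, valuation=NormalDist(mu=1200, sigma=100)),
    ],
)
```

Offering a customer exactly the mean of her valuation distribution gives an acceptance probability of one half, and raising the price can only lower it.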

The seller wishes to make optimal discriminatory take-it-or-leave-it offers to all customers simultaneously. We make the following two assumptions as part of the P3ID to simplify the problem of choosing prices:

Continuous Probabilistic Demand (CPD) Assumption: For markets involving a large number of customers, we can assume that the customer cumulative probability curves can be treated as continuous demand curves. In other words, if a customer draws a valuation greater than or equal to $1000 with probability 1/2, we assume the customer demands 1/2 of her actual demand at that price. This is formally modelled by the probabilistic demand of customer i at price p, \( q_i (1 - G_i(p)) \).

Expected Supply (ESY) Assumption: We assume that the seller maintains a strict policy against over-offering supply in expectation by limiting the expected number of goods sold to k (the supply). Note that k is not necessarily the entirety of the seller's inventory.

Under these assumptions, the goal of the seller is to choose a price to offer each customer, \( p_i \), that maximizes the expected total revenue function, F(p):

\[ F(p) = \sum_{i} (1 - G_i(p_i)) \, q_i \, p_i \tag{1} \]

subject to the ESY constraint that supply is not exceeded in expectation:

\[ \sum_{i} (1 - G_i(p_i)) \, q_i \le k \tag{2} \]

3.2 P3ID and CKP Equivalence

To demonstrate the equivalence between the P3ID and CKP we will show that an instance of either can easily be reduced to an instance of the other. CKP instances involve a knapsack with a finite capacity, k, and a set of n items. Each item has a reward function, \( f_i(x) \), and a weight \( w_i \). Including a fraction \( x_i \) of item i in the knapsack yields a reward of \( f_i(x_i) \) and consumes \( w_i x_i \) of the capacity. We can easily reduce a P3ID instance to a CKP instance using the following conversion:

  Set the knapsack capacity to the seller's capacity in the P3ID instance: \( k_{\text{CKP}} = k_{\text{P3ID}} \).
  Include one item in the CKP instance for each of the n customers in the P3ID instance.
  Set the weight of the ith item to the customer's demanded quantity in the P3ID instance: \( w_i = q_i \).
  Set the reward function of the ith item to be the inverse of the seller's expected revenue from customer i: \( f_i(x) = G_i^{-1}(1 - x) \, x \, q_i \).

The fraction of each item included in the optimal solution to this CKP instance, \( x_i \), can be converted to an optimal price in the P3ID instance, \( p_i \), using the inverse of the CDF over customer valuations,

\[ p_i = G_i^{-1}(1 - x_i) \]

To reduce a CKP instance to a P3ID instance we can reverse this reduction. The CDF for the new P3ID instance is defined as,

\[ G_i(p) = 1 - \frac{f_i^{-1}(p)}{p \, q_i} \]

Once found, the optimal price for a customer, \( p_i \), can be translated to the optimal fraction to include, \( x_i \), using this CDF,

\[ x_i = 1 - G_i(p_i) \]

This equivalence does not hold if either the CDF over customer valuations in the P3ID instance, or the reward function in the CKP instance, is not invertible. However, if the inverse exists but is difficult to compute numerically, it can be approximated to arbitrary precision by precomputing a mapping from inputs to outputs.

3.3 Example Problem

We provide this simple example to illustrate the kind of pricing problem we address in this paper, and its reduction to a CKP instance. Our example involves a PC manufacturer with k = 5 finished PCs of the same type. Two customers have submitted requests for prices on different quantities of PCs. Customer A demands 3 PCs and Customer B demands 4 PCs. Each customer has a private valuation; if the manufacturer's offer price is less than or equal to this valuation, the customer will purchase the PCs. Based on public attributes that the customers have revealed, the seller is able to determine that Customer A has a normal unit-valuation (price per unit) distribution with a mean of $1500 and a standard deviation of $300, \( g_A = N(1500, 300) \), and Customer B has a normal unit-valuation distribution with a mean of $1200 and a standard deviation of $100, \( g_B = N(1200, 100) \). Figure 1(a) shows the expected revenue gained by the seller from each customer as a function of the offer price according to these valuation distributions. Figure 1(b) shows the reward functions for the corresponding CKP instance as a function of the fraction of the customer's demand included in the knapsack.
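The reduction of Section 3.2 can be sketched in Python for the two example customers. This is a minimal sketch rather than the authors' implementation: the function names are hypothetical, and it relies on the standard library's `statistics.NormalDist`, whose `inv_cdf` is only defined for arguments strictly between 0 and 1, so the fraction x must satisfy 0 < x < 1.

```python
from statistics import NormalDist


def price_from_fraction(valuation: NormalDist, x: float) -> float:
    """Price at which a fraction x of the demand is sold in expectation: G^{-1}(1 - x)."""
    return valuation.inv_cdf(1.0 - x)


def ckp_reward(valuation: NormalDist, quantity: float, x: float) -> float:
    """CKP reward of including fraction x of the item: G^{-1}(1 - x) * x * q."""
    return price_from_fraction(valuation, x) * x * quantity


def expected_revenue(valuation: NormalDist, quantity: float, price: float) -> float:
    """Expected revenue in the pricing problem: (1 - G(p)) * q * p."""
    return (1.0 - valuation.cdf(price)) * quantity * price


# Unit-valuation models from the example: Customer A (q = 3) and Customer B (q = 4).
customer_a = NormalDist(mu=1500, sigma=300)
customer_b = NormalDist(mu=1200, sigma=100)

x = 0.3                                    # include 30% of Customer A's demand
p = price_from_fraction(customer_a, x)     # corresponding unit offer price
print(p, ckp_reward(customer_a, 3, x), expected_revenue(customer_a, 3, p))
```

The last two printed values agree up to floating-point error, which is the sense in which the item's reward function and the customer's expected revenue describe the same quantity under a change of variable between price and fraction.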
Note that in this example, as the price offered to Customer A (or Customer B) increases, the probability of Customer A (or Customer B) accepting it decreases, and hence so does the expected number of PCs sold to that customer. The manufacturer wishes to choose prices to offer each customer to maximize his overall expected revenue, and sell less than or equal to 5 PCs in expectation. In this example it can be shown that the optimal solution is for the manufacturer to offer a unit price of $1413 to Customer A, which has about a 58% chance of being accepted, and a price of $1112 to Customer B, which has about an 81% chance of being accepted. The total expected revenue of this solution is about $1212 per unit and it sells exactly 5 units in expectation.

[Figure 1, panel (a): "Expected Revenue as a Function of Offer Price" for Customer A and Customer B; x-axis: Offer Price, \( p_i \); y-axis: Expected Revenue, \( (1 - G_i(p_i)) \, p_i \). Panel (b): "CKP Reward as Function of Included Fraction" for Customer A and Customer B; x-axis: Fraction of Demand Included, x; y-axis: CKP Reward, f.]

(a) The expected unit-revenue generated for the seller by each customer as a function of the customer's offer price, \( (1 - G_i(p_i)) \, p_i \), with \( p_i \) between 1000 and 2000.
(b) The reward function in the CKP instance corresponding to the expected revenue curves in Figure 1(a). Reward is presented as a function of the fraction of demand included in the knapsack.

Figure 1: The expected customer revenue and corresponding reward for the example problem in Section 3.3.

4. SOLVING ASYMMETRIC CONCAVE CKPS

4.1 Characterizing an Optimal Solution

The main idea behind our algorithm for solving asymmetric CKPs is to add items to the knapsack according to the rate, or first derivative, of their reward functions. We will show that, if all reward functions are concave³, they share a unique first derivative value in an optimal solution. Finding the optimal solution amounts to searching for this first derivative value. To formalize and prove this proposition we introduce the following notation:

  Let \( \phi_i(x) = f_i'(x) \, \frac{1}{w_i} \) be the first derivative of the ith item's unit reward function. Item i's unit reward function is its reward per weight unit.
  Let \( \phi_i^{-1}(\Delta) \) be the inverse of the first derivative of the ith item's unit reward function. In other words, it returns the fraction of the ith item where its unit reward is changing at the rate \( \Delta \).

Proposition 1. Given a CKP instance, K, if all \( f_i \) in K are concave over the interval [0, 1], then there exists a unique \( \Delta \) such that \( x_i = \phi_i^{-1}(\Delta) \), where \( x_i \) is the fraction of the ith item in an optimal solution to K.

³ Section 5.1 explains why we can reasonably restrict our consideration to concave reward functions in reductions from P3ID instances.

Proof. First we will prove that \( \phi_i(x) \) is invertible, and that \( \phi_i^{-1}(\Delta) \) is unique for all \( \Delta \). The reward functions and unit reward functions (since these are simply scaled versions of the originals) in the CKP instance are concave on the interval [0, 1], by the predicate of our proposition. In other words, the first derivative of each unit reward function, \( \phi_i(x) \), is decreasing and unique on the interval [0, 1]. Because each unit reward function's first derivative is continuous, decreasing, and unique, it is invertible, and its inverse, \( \phi_i^{-1}(\Delta) \), is unique⁴.

⁴ This inverse may be difficult to characterize numerically. However, the precomputation technique suggested for approximating the inverse of \( G_i \) or \( f_i \) applies to \( \phi_i \) as well.

We will now prove that the unit reward functions of any two items, i and j, must share the same first derivative value in the optimal solution. To do this we introduce the following Lemma,

Lemma 1. If \( f_i \) is concave over the interval [0, 1], \( \phi_i^{-1}(\Delta) \) increases as \( \Delta \) decreases from \( \phi_i(0) \) to 0.

Essentially the Lemma states that as the derivative of item i's unit reward function increases, the fraction of the item included in the knapsack shrinks. This is true because, as we have shown, the derivative is decreasing and unique.

For the remainder of the proof there are two cases we must consider:

Case 1: the knapsack is not full in the optimal solution. In this case the unit reward functions will all have derivatives of 0, since every item is included up to the point where its reward begins to decrease⁵.

⁵ We assume that all reward functions have derivatives \( \le 0 \) when an item is entirely included in the knapsack, since the item cannot possibly provide any additional reward.

Case 2: the knapsack is full in the optimal solution. In this case we will assume that \( f_i \) and \( f_j \) do not share the same derivative value, and show this assumption leads to a contradiction. Specifically, we can assume, without loss of generality, that the reward function of item i has a larger first derivative than j, i.e., \( \phi_i(x_i) > \phi_j(x_j) \). Therefore, there must exist some ɛ such that adding it to item j's unit reward derivative maintains the inequality, \( \phi_i(x_i) > \phi_j(x_j) + \epsilon \). We can then construct an alternative solution to K as follows:

Set \( x_j' \) in our alternative solution to be the fraction of item j that provides its original derivative plus ɛ,

\[ x_j' = \phi_j^{-1}(\phi_j(x_j) + \epsilon) \]

By Lemma 1 we know that \( x_j' < x_j \), which provides some excess space, \( \alpha \), in the knapsack, \( \alpha = w_j (x_j - x_j') \).

We can fill the empty space with item i, up to the point where the knapsack is full, or its derivative decreases by ɛ,

\[ x_i' = \min\left( x_i + \frac{\alpha}{w_i},\ \phi_i^{-1}(\phi_i(x_i) - \epsilon) \right) \]

It must be that \( x_i' > x_i \). Either all of the knapsack space from item j was added, in which case the fraction of item i clearly increased. Otherwise, its derivative value decreased by ɛ, which, by Lemma 1, must have increased its included fraction.

If \( \phi_i(x_i) \) decreased by ɛ before the knapsack filled up, we can reallocate the excess space to j,

\[ x_j' = (k - x_i') \, \frac{1}{w_j} \]

Notice that we have constructed our alternate solution by moving the same number of knapsack units from item j to item i. In our construction we guaranteed that item i was gaining more reward per unit during the entire transfer. Therefore, the knapsack space is more valuable in the alternate solution. This contradicts our assumption that \( x_i \) and \( x_j \) were part of an optimal solution.

Figure 2: Initial values for \( \Delta^+ \) and \( \Delta^- \) are computed from the even CKP solution for the example problem in Section 3.3.
We have shown that any two unit reward functions must share the same derivative value, \( \Delta \), in an optimal solution. This implies that all unit reward functions must share the derivative value \( \Delta \) in an optimal solution (since no two can differ).

4.2 Finding \( \Delta \)

In our proof of Proposition 1 we showed that \( \Delta \ge 0 \). We also showed that as \( \Delta \) increases, the fraction of each item in the knapsack decreases. Thus, one method for finding \( \Delta \) would be to begin with \( \Delta = 0 \) and increment by ɛ until the resulting solution is feasible (fits in the knapsack). However, much of this search effort can be reduced by employing a binary search technique. Figure 3 presents pseudo-code for a binary search algorithm that finds solutions provably within ɛ of an optimal reward value. The algorithm recursively refines its upper and lower bounds on \( \Delta \), \( \Delta^+ \) and \( \Delta^- \), until the reward difference between solutions defined by the bounds is less than or equal to ɛ.

    procedure ɛ-opt CKP(K)
        x ← even CKP(K)
        Δ+ ← max_i φ_i(x_i)
        Δ− ← min_i φ_i(x_i)
        return binary search(Δ+, Δ−, K)

    procedure binary search(Δ+, Δ−, K)
        if converged(Δ+, Δ−, K) then
            x+ ← {φ_1^{-1}(Δ+), ..., φ_n^{-1}(Δ+)}
            return x+
        end if
        δ ← (Δ+ + Δ−) / 2
        if feasible({φ_1^{-1}(δ), ..., φ_n^{-1}(δ)}, K) then
            return binary search(δ, Δ−, K)
        else
            return binary search(Δ+, δ, K)
        end if

    procedure even CKP(K)
        ŵ ← Σ_i w_i
        return {k/ŵ, ..., k/ŵ}

    procedure feasible(x, K)
        return Σ_i w_i x_i ≤ k

    procedure converged(Δ+, Δ−, K)
        x+ ← {φ_1^{-1}(Δ+), ..., φ_n^{-1}(Δ+)}
        x− ← {φ_1^{-1}(Δ−), ..., φ_n^{-1}(Δ−)}
        return |Σ_i (f_i(x+_i) − f_i(x−_i))| ≤ ɛ

Figure 3: Pseudo-code for an ɛ-optimal concave CKP binary search algorithm.

The initial bounds, shown in Figure 2, are derived from a simple feasible solution where the same fraction of each item is included in the knapsack (see even CKP in Figure 3). The largest derivative value in this solution provides the upper bound, \( \Delta^+ \). This is because we can reduce the included fractions of each item to the point where all of their derivatives equal \( \Delta^+ \), and guarantee the solution is still feasible. By the same reasoning, the smallest derivative value in the simple solution provides a lower bound. Figure 2 shows how initial values of \( \Delta^+ \) and \( \Delta^- \) are computed from the even solution on the example problem from Section 3.3.

During each iteration, a new candidate bound, δ, is computed by halving the space between the prior bounds. The process continues recursively: if the new bound defines a feasible solution it replaces the old upper bound, otherwise (if it is not a valid upper bound) it replaces the old lower bound. When the algorithm converges, the solution defined by the upper bound \( \Delta^+ \) is guaranteed to be feasible and within ɛ of the optimal solution. Convergence is guaranteed since we have proved that \( \Delta \) exists, and the bounds get tighter after each iteration.
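The procedure of Figure 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it is written iteratively rather than recursively, the `Item` container and the iteration cap `max_iter` are hypothetical additions, each item is assumed to supply its own \( \phi_i \) and \( \phi_i^{-1} \), and the two quadratic rewards are made-up examples chosen so that the inverses have closed forms.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    """One CKP item; all four fields are supplied by the caller (assumed layout)."""
    weight: float                        # w_i
    reward: Callable[[float], float]     # f_i(x), concave on [0, 1]
    deriv: Callable[[float], float]      # phi_i(x) = f_i'(x) / w_i
    deriv_inv: Callable[[float], float]  # phi_i^{-1}(d), clipped to [0, 1]


def fractions_at(items: List[Item], d: float) -> List[float]:
    """Fractions at which every item's unit-reward derivative equals d."""
    return [it.deriv_inv(d) for it in items]


def feasible(items: List[Item], xs: List[float], k: float) -> bool:
    """Knapsack check: total consumed weight must not exceed capacity k."""
    return sum(it.weight * x for it, x in zip(items, xs)) <= k + 1e-12


def total_reward(items: List[Item], xs: List[float]) -> float:
    return sum(it.reward(x) for it, x in zip(items, xs))


def eps_opt_ckp(items: List[Item], k: float, eps: float, max_iter: int = 200) -> List[float]:
    """Binary search on the shared derivative value between an upper and a lower bound."""
    even = min(1.0, k / sum(it.weight for it in items))  # even CKP solution
    derivs = [it.deriv(even) for it in items]
    upper, lower = max(derivs), min(derivs)              # initial bounds
    for _ in range(max_iter):                            # cap guards against float stalls
        x_up, x_lo = fractions_at(items, upper), fractions_at(items, lower)
        if abs(total_reward(items, x_lo) - total_reward(items, x_up)) <= eps:
            break                                        # bounds are within eps in reward
        mid = (upper + lower) / 2.0
        if feasible(items, fractions_at(items, mid), k):
            upper = mid                                  # feasible: new upper bound
        else:
            lower = mid                                  # infeasible: new lower bound
    return fractions_at(items, upper)


def clip(x: float) -> float:
    return max(0.0, min(1.0, x))


# Hypothetical concave quadratic rewards with closed-form derivative inverses.
items = [
    Item(3.0, lambda x: 6 * x - 3 * x * x, lambda x: 2 - 2 * x, lambda d: clip(1 - d / 2)),
    Item(4.0, lambda x: 4 * x - 2 * x * x, lambda x: 1 - x, lambda d: clip(1 - d)),
]
solution = eps_opt_ckp(items, k=5.0, eps=1e-6)
```

Returning the fractions defined by the upper bound mirrors the choice in Figure 3: that bound is feasible at every step of the search, so the returned solution always fits in the knapsack.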
It is difficult to provide theoretical guarantees about the number of iterations, since convergence is defined in terms of the instance-specific reward functions. However, the empirical results in Section 6 show that the algorithm typically converges exponentially fast in the number of feasibility checks.

4.3 Shared Resource Extension

Our ɛ-optimal binary search algorithm can be extended to solve problems involving more complex resource constraints than are typically associated with CKPs. In particular, the algorithm can be generalized to solve reductions of Probabilistic Pricing Problems with Shared Resources (P3SR). P3SR instances involve sellers with multiple distinguishable goods for sale. Each good in a P3SR consumes some amount of a finite shared resource, such as parts or assembly time. This model provides a more accurate characterization of the problem faced by PC manufacturers, and agents in TAC SCM.

By applying the reduction described in Section 3.2, a P3SR instance can be converted to a problem similar to a CKP instance. However, the resource constraint in the resulting problem is more complex than ensuring a knapsack contains less than its weight limit. It could involve determining the feasibility of a potentially NP-Hard scheduling problem, in the case of a shared assembly line and demands with deadlines. Clearly, this would require, among other things, changing the feasibility checking procedure (see feasible() in Figure 3), and could make each check substantially more expensive.

5. CUSTOMER VALUATIONS

5.1 Diminishing Returns Property

Our algorithm was designed to solve CKP reductions of P3ID instances. Recall that it applies only when the reward functions are concave over the interval [0, 1]. This is not a particularly restrictive requirement. In the setting where customers draw valuations according to internal processes, this corresponds with the Diminishing Returns⁶ (DMR) property.

Definition 1. The DMR property is satisfied when, for any given increase in output (filled customer demand), the increase in expected revenue is less per unit than it was for any previous increase in output.

⁶ This is also occasionally referred to as the Decreasing Marginal Returns property.

This is generally accepted as characterizing most real-world economic processes [2]. Another typical setting involves the seller competing with others, and valuations that reflect the competitors' prices. In this setting, normally distributed competing prices can also be shown to result in concave reward functions. This is representative of large markets where market transparency leads sellers to hover around a common price.

5.2 Normal Distribution Trees

We consider a technique which a seller may use to model a customer's valuation distribution. It uses a normal distribution to ensure our model satisfies the desired DMR property. We assume that customers have some public attributes, and the seller has historical data associating attribute vectors with valuations. Our technique trains a regression tree to predict a customer's valuation from the historical pricing data. A regression tree splits attributes at internal nodes, and builds a linear regression that best fits the training data at each leaf. When a valuation distribution for a new customer needs to be created, the customer is associated with a leaf node by traversing the tree according to her attributes. The prediction from the linear model at the leaf node is used as the mean of a normal valuation distribution, and the standard deviation of the distribution is taken from the training data that generated the leaf. Formally, the regression tree learning algorithm receives as input:

  n: the number of training examples.
  \( a_i \): the attribute vector of the ith training example.
  \( v_i \): the valuation associated with the ith training example.

A regression tree learning algorithm, such as the M5 algorithm [9], can be used to learn a tree, T, from the training examples. After the construction of T, the jth leaf of the tree contains a linear regression over attributes, \( y_j(a) \). The regression is constructed to best fit the training data associated with the leaf.
The leaf also contains the average error over this data, \( s_j \). The regression tree, T, is converted to a distribution tree by replacing the regression at each leaf node with a normal distribution. The mean of the normal distribution at the jth leaf is set to the prediction of the regression, \( \mu_j = y_j(a) \). The standard deviation of the distribution at the jth leaf is set to the average error over training examples at the leaf, \( \sigma_j = s_j \). Figure 4 shows an example of this kind of normal distribution tree.

Figure 4: An example Normal Distribution Tree.

5.3 Learning Customer Valuations in TAC SCM

TAC SCM provides an ideal setting to evaluate the distribution tree technique described in the previous section. Each customer request in TAC SCM can be associated with several attributes. The attributes include characterizations of the request, such as its due date, PC type, and quantity. The attributes also include high and low selling prices for the requested PC type from previous simulation days. Upon the completion of a game, the price at which each customer request was filled is made available to agents. This data can be used with the technique described in the previous section to train a normal distribution tree. The tree can then be used in subsequent games to construct valuation distributions from request attributes.

Figure 5 shows the accuracy curve of a normal distribution tree trained on historical data with an M5 learning algorithm. Training instances were drawn randomly from customer requests in the 2005 Semi-Final round of TAC SCM and testing instances were drawn from the Finals. The attributes selected to characterize each request included: the due date, PC type, quantity, reserve price, penalty, day on which the request was placed, and the high and low selling prices of the requested PC type from the previous 5 game days. The error of the distribution was measured in the following way: starting at p = .1, and increasing to p = .9, the trained distribution was asked to supply a price for all test instances that would fall below the actual closing price (be a winning bid) with probability p.
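The price query used in this error measure can be sketched in Python. The sketch assumes each test instance has already been mapped to a leaf, so that its prediction is just a (mean, standard deviation) pair; the function names and the tiny data set are hypothetical, and the final averaging of the gap between p and the observed win rate over the probed levels is one plausible reading of the measure, not a confirmed detail of the experiments.

```python
from statistics import NormalDist
from typing import List, Tuple


def price_for_win_probability(mu: float, sigma: float, p: float) -> float:
    """Price that a N(mu, sigma) valuation meets or exceeds with probability p (0 < p < 1)."""
    return NormalDist(mu, sigma).inv_cdf(1.0 - p)


def calibration_error(
    predictions: List[Tuple[float, float]],  # (mu_j, sigma_j) per test instance
    closing_prices: List[float],             # actual closing price per test instance
    levels: List[float],                     # probabilities p to probe, e.g. 0.1 ... 0.9
) -> float:
    """Average gap between each target p and the fraction of test instances actually won."""
    gaps = []
    for p in levels:
        wins = sum(
            price_for_win_probability(mu, sigma, p) <= actual
            for (mu, sigma), actual in zip(predictions, closing_prices)
        )
        gaps.append(abs(p - wins / len(closing_prices)))
    return sum(gaps) / len(gaps)


# Hypothetical toy data: two test instances with the same predicted distribution.
toy_error = calibration_error(
    predictions=[(100.0, 10.0), (100.0, 10.0)],
    closing_prices=[105.0, 95.0],
    levels=[0.1, 0.5, 0.9],
)
```

A well-calibrated distribution wins close to a fraction p of the test instances at every level, which drives this error toward zero.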
The average difference between p and the actual percentage of test instances won was considered the error of the distribution. The experiments were repeated with 10 different training and testing sets. The results show that normal distribution trees can be used to predict distributions over customer valuations in TAC SCM with about 95% accuracy after about 25,000 training examples.

6. EMPIRICAL EVALUATION

6.1 Empirical Setup

Our experiments were designed to investigate the convergence rate of the ɛ-optimal binary search algorithm. We generated 100 CKP instances from P3ID instances based on the pricing problem faced by agents in TAC SCM. The P3ID instances were generated by randomly selecting customer requests from the final round of the 2005 TAC SCM.

[Figure 5 plot: "Distribution Accuracy" for the M5 Normal Distribution Tree; x-axis: Number of Training Examples; y-axis: % Error in Learned Distribution.]

Figure 5: The accuracy curve of an M5 normal distribution tree as the number of training instances increases.

Each customer request in TAC SCM has a quantity chosen uniformly at random between 1 and 20 units. Normal probability distributions were generated to approximate the valuations of each customer using the technique described in Section 5 with an M5 regression tree learning algorithm. The learning algorithm was given 50,000 training instances from the 2005 TAC SCM Semi-Final rounds.

We tested our algorithm against the even solution, which allocates equal resources to each customer, and the greedy heuristic algorithm used by the first place agent, TacTex [8]. Figure 6 provides pseudo-code for the TacTex algorithm. It greedily adds fractions of items to the knapsack that result in the largest increases in expected revenue.

    procedure greedy CKP(K)
        converged ← false
        while ¬converged and Σ_j x_j < n do
            i ← argmax_j unit reward increase(j, x, K)
            δ ← best increase(i, x, K)
            if feasible({x_1, ..., x_i + δ, ..., x_n}, K) then
                x_i ← x_i + δ
            else
                x_i ← x_i + (1/w_i)(k − Σ_j x_j w_j)
                converged ← true
            end if
        end while
        return {x_1, ..., x_n}

    procedure unit reward increase(i, x, K)
        δ_i ← best increase(i, x, K)
        return (1/w_i)(f_i(x_i + δ_i) − f_i(x_i))

    procedure best increase(i, x, K)
        return argmax_δ (f_i(x_i + δ) − f_i(x_i)) / δ

Figure 6: Pseudo-code for the greedy heuristic algorithm used by the 2005 first placed agent, TacTex.

We performed three sets of experiments. The first set of experiments provided each algorithm with 20 PCs to sell in expectation, and the same 200 customer requests (this represents a pricing instance of a TAC SCM agent operating under a make-to-stock policy). Figure 7(a) shows each algorithm's percentage of optimality after each feasibility check. For the second set of experiments, the algorithms were given 200 customer requests, and their PC supply was varied by 10 from k = 10 to k = 100. Figure 7(b) shows the number of feasibility checks needed by the binary search and greedy algorithms to reach solutions within 1% of optimal. The last set of experiments fixed k = 20 and varied n by 100 from n = 200 to n = 1000. Figure 7(c) shows the number of feasibility checks needed by each algorithm to reach a solution within 1% of optimal as n increased.

6.2 Empirical Results

The results presented in Figure 7 compare the optimality of the CKP algorithms to the number of feasibility checks performed. This comparison is important to investigate for two reasons: (i) because it captures the convergence rate of the algorithms, and (ii) because these algorithms are designed to be extended to shared resource settings where each feasibility check involves solving (or approximating) an NP-Hard scheduling problem.

The first set of results, shown in Figure 7(a), confirm that the ɛ-optimal binary search algorithm converges exponentially fast in the number of feasibility checks. In addition, the results confirm the intuitions of Pardoe and Stone in [8] that the greedy heuristic finds near-optimal solutions on CKP instances generated from TAC SCM. However, the results also show that it has a linear, rather than exponential, convergence rate in terms of feasibility checks. Finally, the first set of results show that the even solution, which does not use feasibility checks, provides solutions to TAC SCM instances that are about 80% optimal on average.

Figures 7(b) and 7(c) investigate how the number of feasibility checks needed to find near-optimal solutions (within 1% of optimal) changes as the supply and number of customers increase. The even solution is not included in these results because it does not produce near-optimal solutions. The results in Figure 7(b) show that the number of feasibility checks used by the greedy algorithm increases linearly with the size of the knapsack, whereas the convergence rate of the binary search algorithm does not change. The results in Figure 7(c) show that the number of feasibility checks used by both algorithms does not significantly increase with the number of customers.

[Figure 7, panel (a): "% Optimality Versus Number of Feasibility Checks" for ε-opt_ckp, greedy_ckp, and even_ckp; x-axis: Number of Feasibility Checks; y-axis: % Optimal Revenue. Panel (b): "Number of Feasibility Checks Versus k" for ε-opt_ckp and greedy_ckp; x-axis: Knapsack Capacity, k; y-axis: Number of Feasibility Checks. Panel (c): "Number of Feasibility Checks Versus n" for ε-opt_ckp and greedy_ckp; x-axis: Number of Items, n; y-axis: Number of Feasibility Checks.]

(a) This graph shows how the optimality of each algorithm improves with each feasibility check it uses.
(b) The number of feasibility checks needed to reach a solution within 1% of optimal as k increases.
(c) The number of feasibility checks needed to reach a solution within 1% of optimal as n increases.

Figure 7: Performance of CKP algorithms on instances reduced from TAC SCM pricing problems. Unless otherwise specified, results are averaged over 100 CKP instances with n = 200 and k = 20.

7. CONCLUSION

In this paper we define the P3ID and formally show its equivalence to the CKP. The P3ID is a pricing problem that captures the challenges faced by sellers in large electronic markets, and the CKP is a problem that has received much attention in traditional AI and OR literature. P3ID instances with customer valuation distributions that satisfy the DMR property reduce to CKP instances with arbitrary concave reward functions. However, traditional CKP literature has largely ignored CKP instances with arbitrary nonlinear concave reward functions. To address this gap, we provide a new ɛ-optimal algorithm for such CKP instances, and we show that it converges exponentially fast in practice. Finally, we provide a technique for learning distributions over customer valuations from historical data, by extending existing regression tree learning algorithms. We validate our distribution learning technique on data from TAC SCM, and show that it achieves about 95% accuracy in this setting.

8. ACKNOWLEDGMENTS

The research reported in this paper has been funded by the National Science Foundation under ITR Grant 0205435.

9. REFERENCES

[1] M. Benisch, A. Greenwald, I. Grypari, R. Lederman, V. Naroditskiy, and M. C. Tschantz. Botticelli: A supply chain management agent. In Third International Joint Conference on Autonomous Agents and Multiagent Systems AAMAS 04, pages 1174-1181, New York, July 2004.
[2] K. E. Case and R. C. Fair.
Prncples of Economcs (5th ed.). Prentce-Hall, 1999. [3] J. Collns, R. Arunachalam, N. Sadeh, J. Erksson, N. Fnne, and S. Janson. The supply chan management game for the 2005 tradng agent competton. Techncal Report CMU-ISRI-04-139, Carnege Mellon Unversty, 2005. [4] O. Etzon, R. Tuchnda, C. A. Knoblock, and A. Yates. To buy or not to buy: mnng arfare data to mnmze tcket purchase prce. In KDD 03: Proceedngs of the nnth ACM SIGKDD nternatonal conference on Knowledge dscovery and data mnng, pages 119 128, New York, NY, USA, 2003. ACM Press. [5] R. Ghan. Prce predcton and nsurance for onlne auctons. In KDD 05: Proceedng of the eleventh ACM SIGKDD nternatonal conference on Knowledge dscovery n data mnng, pages 411 418, New York, NY, USA, 2005. ACM Press. [6] D. Lawrence. A machne-learnng approach to optmal bd prcng. In Proceedngs of INFORMS 03, 2003. [7] A. Melman and G. Rabnowtz. An effcent method for a class of contnuous knapsack problems. Socety for Industral and Appled Mathematcs Revew, 42(3):440 448, 2000. [8] D. Pardoe and P. Stone. Bddng for customer orders n TAC SCM. In Proceedngs of AAMAS-04 Workshop on Agent-Medated Electronc Commerce, 2004. [9] J. R. Qunlan. Learnng wth Contnuous Classes. In 5th Australan Jont Conference on Artfcal Intellgence, pages 343 348, 1992. [10] A. G. Robnson, N. Jang, and C. S. Lerme. On the contnuous quadratc knapsack problem. Math. Program., 55(1-6):99 108, 1992.

[11] T. Sandholm and S. Sur. Market clearablty. In IJCAI, pages 1145 1151, 2001.
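As a supplementary illustration of the two baselines compared in Section 6, the following is a minimal Python sketch of a greedy CKP heuristic and of the even solution. It is not the TacTex implementation. The names greedy_ckp, even_ckp, feasible and total_reward are hypothetical, the fixed step size delta is a simplifying assumption that stands in for the best-increase step of the greedy pseudo-code, the feasibility check is the plain capacity constraint rather than a scheduling problem, and the even solution is read here as giving each item k/n units of capacity.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Concave reward f_i(x_i) for holding a fraction x_i in [0, 1] of item i.
Reward = Callable[[float], float]


def feasible(x: Sequence[float], weights: Sequence[float], k: float,
             tol: float = 1e-12) -> bool:
    """Plain CKP feasibility check: total weight must not exceed capacity k."""
    return sum(w * xi for w, xi in zip(weights, x)) <= k + tol


def total_reward(rewards: Sequence[Reward], x: Sequence[float]) -> float:
    """Value of the knapsack for the fractions x."""
    return sum(f(xi) for f, xi in zip(rewards, x))


def even_ckp(weights: Sequence[float], k: float) -> List[float]:
    """Even baseline (assumed reading): every item gets k/n units of capacity."""
    n = len(weights)
    return [min(1.0, (k / n) / w) for w in weights]


def greedy_ckp(rewards: Sequence[Reward], weights: Sequence[float], k: float,
               delta: float = 0.01) -> Tuple[List[float], int]:
    """Greedy sketch: repeatedly grow the item with the best reward per unit weight.

    Returns the fractions and the number of feasibility checks performed.
    The fixed step `delta` is an assumption of this sketch.
    """
    n = len(weights)
    x = [0.0] * n
    checks = 0
    while True:
        best_i, best_gain, best_step = -1, 0.0, 0.0
        for i in range(n):
            step = min(delta, 1.0 - x[i])
            if step <= 1e-12:
                continue  # item i is already fully included
            gain = (rewards[i](x[i] + step) - rewards[i](x[i])) / (weights[i] * step)
            if gain > best_gain:
                best_i, best_gain, best_step = i, gain, step
        if best_i < 0:
            break  # no remaining item increases the reward
        trial = list(x)
        trial[best_i] += best_step
        checks += 1
        if feasible(trial, weights, k):
            x = trial
        else:
            # Fill whatever capacity is left with the chosen item, then stop.
            used = sum(w * xi for w, xi in zip(weights, x))
            x[best_i] += max(0.0, k - used) / weights[best_i]
            break
    return x, checks


if __name__ == "__main__":
    # Illustrative concave rewards (square roots), not TAC SCM demand curves.
    demo_rewards = [lambda x: 10 * math.sqrt(x), lambda x: 4 * math.sqrt(x)]
    demo_weights = [1.0, 1.0]
    fractions, n_checks = greedy_ckp(demo_rewards, demo_weights, k=1.0)
    print(fractions, n_checks, total_reward(demo_rewards, fractions))
    print(total_reward(demo_rewards, even_ckp(demo_weights, 1.0)))
```

In this sketch every accepted step consumes one feasibility check and a fixed amount of capacity, so the number of checks grows roughly in proportion to k divided by the step size. That is in line with the linear growth reported for the greedy heuristic in Figure 7(b), although the sketch makes no claim about the constants measured there.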