A Truthful Budget Feasible Multi-Armed Bandit Mechanism for Crowdsourcing Time Critical Tasks

Arpita Biswas
Xerox Research Centre India
Bangalore, India
arpita.biswas@xerox.com

Debmalya Mandal
School of Engineering and Applied Sciences
Harvard University, Cambridge, MA - 02138
dmandal@g.harvard.edu

ABSTRACT
Motivated by allocation and pricing problems faced by service requesters on modern crowdsourcing platforms, we study a multi-armed bandit (MAB) problem with several real-world features: (a) the requester wishes to crowdsource a number of tasks but has a fixed budget, which leads to a trade-off between cost and quality while allocating tasks to workers; (b) each task has a fixed deadline, and a worker who is allocated a task is not available until this deadline; (c) the qualities (probability of completing a task successfully within the deadline) of crowd workers are not known; and (d) the crowd workers are strategic about their costs. We propose a mechanism that maximizes the expected number of successfully completed tasks while assuring budget feasibility, incentive compatibility, and individual rationality. We establish an upper bound of $O(B^{2/3}(K\ln(KB))^{1/3})$ on the expected regret of the proposed mechanism with respect to an appropriate benchmark algorithm, where $B$ is the total budget and $K$ is the number of workers. Next, we provide a characterization of any deterministic truthful mechanism that solves the above class of problems and use this characterization to establish a lower bound of $\Omega(B^{2/3}K^{1/3})$ on the expected regret for any budgeted MAB mechanism satisfying the above properties.

Categories and Subject Descriptors
I.2.11 [Distributed Artificial Intelligence]: Multiagent systems; I.2.6 [Learning]: Parameter learning

This work was done when the author was a student at Indian Institute of Science.
This work was partially supported by a Microsoft Research doctoral fellowship grant to the author.
This work was done when the author was a research associate at Indian Institute of Science.
Appears in: Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015), Bordini, Elkind, Weiss, Yolum (eds.), May 4-8, 2015, Istanbul, Turkey. Copyright 2015, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Shweta Jain
Indian Institute of Science
Bangalore, India, 560012
jainshweta@csa.iisc.ernet.in

Y. Narahari
Indian Institute of Science
Bangalore, India, 560012
hari@csa.iisc.ernet.in

Keywords
Multi-Armed Bandits; Mechanism Design; Online Learning; Crowdsourcing; Regret Bound; Rational Agents

1. INTRODUCTION
Over the past decade, crowdsourcing has received significant attention for its utility in solving problems that require the intelligence and effort of human beings. In a typical crowdsourcing platform, the requesters submit their tasks to the platform and expect the tasks to be completed with high quality. The requesters are required to make online decisions about the tasks to be assigned to specific workers and the payments to be offered. Often these decisions are complex, requiring the use of algorithms that learn different attributes of the workers over time, such as the quality of the workers and the time taken by a worker to complete a task, besides taking into account the strategic behavior of the workers.

In this paper, we study a budgeted multi-armed bandit mechanism, motivated by online crowdsourcing platforms like guru.com, elance.com, rent-acoder.com etc., where each requester posts tasks and workers bid for the desired tasks. In rent-acoder.com, for example, a requester posts a project that has to be completed within a budget and has a fixed deadline. Once the project is posted, various registered workers can bid for the project. The requester then assigns the project to one or more workers depending on their bids and his past experience about the quality of the workers.
However, if the requester is completely unaware of the quality of the workers bidding on the project, the requester would like to learn the qualities of these workers while ensuring that the project is completed within the project deadline. This could be done by dividing the project into smaller tasks and giving these tasks to the workers in a sequential manner. Deadlines of these tasks should be planned in accordance with the overall project deadline. Submissions made by the allocated workers could be evaluated for quality. For example, if a task is completed with high quality by a worker, the same worker can be given another task. Also, for posting a new task, the requester has to choose among only the available workers.

Motivated by such crowdsourcing platforms, we consider a model where a requester has a set of homogeneous tasks

that need to be completed within a fixed budget. Each task has a fixed deadline, which is assumed to be the same for all the tasks. Each worker, on the other hand, has a fixed quality which is unknown to the requester, and the cost for completing a task is private information of the worker. In order to maximize the quality, the requester assigns the tasks to the workers one by one so as to learn their qualities. We assume that the success or failure of a task can be verified instantly as soon as the task deadline elapses, and thus the quality of the workers can be estimated. The estimated quality is useful to determine future allocations. No further task is allocated to a worker until the deadline for his allocated task elapses. In order to utilize the budget in the best possible way, the requester has to incentivize the workers to bid their true cost of effort and, at the same time, ensure voluntary participation. Thus, the requester seeks to maximize the expected number of tasks completed successfully within the fixed budget, besides ensuring truthfulness and voluntary participation of the strategic workers.

The above situation is an example of sequential decision making in an uncertain environment. Here, the requester seeks to optimize his allocation and payment decisions while continuously gathering more information about the qualities of the workers. This leads to a trade-off between exploration (allocating tasks to all workers sufficiently often to obtain better estimates of the quality of each worker) and exploitation (allocating each task to the best available worker). Problems of this kind fall naturally into the class of multi-armed bandit (MAB) problems. The presence of strategic agents in multi-armed bandit problems leads to multi-armed bandit (MAB) mechanism design problems. The budget constraint leads to budgeted MAB mechanism design problems. We have, in addition, deadlines for tasks which have to be honored.
1.1 Our Contributions
There exist several papers in the literature (we provide a review in the next section) that deal with budgeted multi-armed bandit problems. However, there is no existing work that additionally captures the task deadlines and the strategic nature of workers over their costs. The main contributions of our work are as follows.

We propose a MAB mechanism that takes into account limited budget, task deadlines, unknown qualities, and strategic workers (strategic about their costs). Note that the quality of a worker refers to the probability of the worker completing a task successfully within the given deadline. Our mechanism maximizes the expected number of tasks completed successfully subject to budget feasibility, incentive compatibility, and individual rationality. We believe this is the first effort in designing a mechanism for this class of problems.

We establish an upper bound of $O(B^{2/3}(K\ln(KB))^{1/3})$ (Theorem 2) on the expected regret of the proposed mechanism with respect to an appropriate benchmark algorithm, where $B$ is the total budget and $K$ is the number of workers. The uncertainty in the availability of a worker, the budget constraint, and the strategic nature of the workers render the regret analysis challenging.

We provide a characterization of any deterministic truthful mechanism that solves the above class of problems. We establish a lower bound of $\Omega(B^{2/3}K^{1/3})$ on the expected regret (Theorem 6).

2. RELATED WORK
Multi-armed bandit problems have been studied extensively for solving problems in different domains [, 4, 7]. The budgeted multi-armed bandit that is close to our setting is considered by Tran-Thanh et al. [8, 7], without strategic agents and task deadlines. The algorithms in [8, 7] achieve regret of $O(B^{2/3})$ and $O(\ln(B))$ respectively. A more general formulation is considered by Agrawal and Devanur [2], where the authors attempt to maximize a concave objective function with convex constraints via a multi-armed bandit algorithm.
However, these algorithms do not consider the strategic behavior of the agents and assume that the costs are public knowledge [8, 7] or stochastic [2].

Budgeted MAB problems have also been widely studied for pricing tasks (or items) in crowdsourcing (or dynamic procurement) problems. With workers arriving online with a fixed and known distribution, Singer et al. [5] considered a budgeted setting with the goal of maximizing the total number of tasks allocated to the workers. The workers were assumed to complete the task successfully if allocated, and thus the goal was to design a pricing mechanism to complete the tasks within a budget. In our setting, workers complete the allocated task with a fixed probability which is unknown, and we wish to design an auction mechanism by incentivizing the workers to bid their true cost of effort. Badanidiyuru et al. [6] and Ho et al. [] modeled dynamic procurement and crowdsourcing problems as MAB problems where arms corresponded to feasible posted prices. However, they did not capture task deadlines or the strategic nature of the workers. The survey paper by Slivkins et al. [6] lists results for various crowdsourcing problems and provides insights into possible directions of research in this area.

In MAB problems, the need for mechanism design arises when each arm holds some private valuation. Thus, in addition to learning the unknown parameters, the mechanism is also required to elicit the private valuations of the arms truthfully. Most of the research in this area deals with forward auctions, for example, the auction of ad-slots on a webpage among advertisers, where the click probabilities of the advertisers are to be learnt and the value of an advertisement is held privately by the corresponding advertiser. In the online advertising context, MAB mechanisms aim at maximizing the revenue of the platform or the social welfare as a whole. Devanur et al.
[9] showed that the truthfulness restriction on the pay-per-click online advertising problem imposes statistical limits on the achievable regret in terms of revenue, and thus the achievable regret is very high ($\Theta(n^{2/3})$). Babaioff et al. [5] proved that any truthful mechanism for forward auctions must separate exploration and exploitation, and that the regret in terms of social welfare is $\Omega(n^{2/3}K^{1/3})$. Our work, when compared to the existing MAB mechanisms, is novel due to the extension to the case of limited budget and non-availability of workers due to task deadlines. In the absence of learning, Singer et al. [4] considered the problem of budget feasible truthful mechanisms for non-decreasing submodular valuation functions.

We wish to point out that although we adopt allocation techniques from standard MAB variations like the budgeted MAB by Guha et al. [0] and sleeping bandits (arms that may be unavailable) by Kleinberg et al. [2], the presence of strategic agents requires

a carefully designed payment rule that makes the regret analysis interesting.

3. THE MODEL
In this section, we formalize the budgeted MAB mechanism design problem for the crowdsourcing scenario. We consider a requester and a fixed set of $K$ workers denoted by $N = \{1, \ldots, K\}$. The requester has a set of homogeneous tasks to be completed within a budget $B$. In addition to the budget, the requester also has a fixed deadline and wishes to complete all the tasks within the deadline. On the other hand, each worker $i \in N$ is associated with a quality $q_i$ and incurs a cost $c_i$ for completing a task. The quality $q_i$ represents the probability with which worker $i$ successfully completes the allocated task within the specified deadline. The qualities are initially unknown to the requester as well as the workers, whereas the costs of the workers are private information held by the respective workers. We consider a general version of this problem wherein the costs and qualities do not depend on each other. The requester's objective is to design a mechanism that maximizes the expected number of successfully completed tasks within the budget $B$, in the presence of strategic workers. The model is described in Figure 1.
Symbol                                  Description
$K$                                     Number of workers
$N$                                     Set of workers $\{1, 2, \ldots, K\}$
$B$                                     Total budget available
$q_i$                                   Quality (probability of success) of worker $i$
$\hat q_i$                              Estimated quality of worker $i$
$\hat q_i^+$                            Upper confidence bound on $\hat q_i$
$c_i$                                   True cost of worker $i$
$c$                                     Vector of true costs, $c = (c_1, \ldots, c_K)$
$\hat c_i$                              Bid of worker $i$
$\hat c_{-i}$                           Bid vector of all the workers except $i$
$[\underline{c}, \bar c]$               Minimum and maximum bids
$\tau$                                  Fixed deadline for executing each task
$B^t$                                   Budget remaining with the requester after $t$ time steps
$s$                                     Success realization, where $s_{i,t}$ is the indicator variable denoting success of the task submitted at time step $t$ by worker $i$
$A_i^t(\hat c; s; B^t)$                 Indicator function denoting whether worker $i$ is allocated a task at time step $t$
$P_i^t(\hat c; s; B^t)$                 Payment for a task to an allocated worker $i$ at time step $t$
$U_i^t(\hat c; c_i; s; B^t)$            Utility of worker $i$ for a task at time step $t$
$E[R]$                                  Expected regret
$\operatorname{argmax}_n [a]$           Index of the $n$-th maximum value in an array $[a]$

Table 1: Table of Notations

Figure 1: A pictorial representation of the model

Since the qualities of the workers are stochastic and unknown, the requester is required to learn the qualities. So, the requester posts one task per time step, allocates the task to a worker, observes the success of the task, and thus gains a better estimate of the quality of the worker after each time step. In this paper, we assume that the requester gives a fixed deadline of $\tau$ time steps for each task posted, which adds complexity to the budgeted MAB mechanism design problem. When a task is allocated to a worker $i$, the worker becomes unavailable for the next $\tau$ time steps as the worker is busy executing the task. Thus, no further tasks can be assigned to the worker during these $\tau$ time steps. The tasks are homogeneous, that is, each worker $i$ completes any task successfully with probability $q_i$ and incurs a fixed cost $c_i$, and each task has a fixed deadline of $\tau$. Thus, these tasks are time critical tasks.
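The availability constraint described above can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not part of the mechanism: the function name `simulate`, the `pick` callback and the `free_at` bookkeeping are hypothetical, and the success bit is drawn at allocation time for simplicity even though the model reveals it only when the task is submitted.

```python
import random


def simulate(qualities, tau, horizon, pick, seed=0):
    """Post one task per time step; a worker allocated at step t is free again at step t + tau.

    `pick` is a hypothetical callback that chooses one worker index from the
    list of currently available workers.
    """
    rng = random.Random(seed)
    free_at = [1] * len(qualities)      # first step at which each worker is available
    log = []                            # (allocation step, worker, success bit)
    for t in range(1, horizon + 1):
        available = [i for i, f in enumerate(free_at) if f <= t]
        if not available:
            continue                    # every worker is busy in this step
        i = pick(available)
        free_at[i] = t + tau            # busy until the task deadline elapses
        success = 1 if rng.random() < qualities[i] else 0
        log.append((t, i, success))
    return log


if __name__ == "__main__":
    # Always pick the lowest-indexed available worker, with deadline tau = 3.
    for step, worker, bit in simulate([0.9, 0.7, 0.5, 0.3, 0.1], 3, 7, min):
        print(step, worker, bit)
```

With `tau = 3` and a rule that always prefers the lowest index, the first three workers are used in rotation, which is the pattern the benchmark of Section 4 relies on.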
Let $s$ be a success realization matrix, where $s_{i,t} \in \{0, 1\}$ denotes whether or not (1 or 0) a task submitted by worker $i$ at time step $t$ is successfully completed within the deadline. A worker $i$ who is allocated a task at time step $t$ submits it at time step $t + \tau$ and becomes available again, and the success bit $s_{i,(t+\tau)} \in \{0, 1\}$ is observed at time step $t + \tau$ for estimating the worker's quality, where $s_{i,(t+\tau)} = 1$ with probability $q_i$ and $s_{i,(t+\tau)} = 0$ with probability $1 - q_i$. Thus, a task allocated at time step $t$ can be unsuccessful in two ways: first, the allocated worker may fail to submit the task at time step $t + \tau$, and second, the submission may be incorrect (e.g. code submitted to rentacoder.com might fail to work). A task is allocated to only one of the available workers at each time step.

When a task is allocated to a worker, the requester has to make a payment to the worker irrespective of whether the task is successfully completed. In order to decide the payment, the requester conducts a reverse auction where the workers bid for taking up the task. We assume that the workers are allowed to bid only at the start of the auction and that their costs remain the same for all the tasks, as the tasks are homogeneous. The requester also provides a lower limit and an upper limit on the bids, $\underline{c}$ and $\bar c$ respectively. The lower limit $\underline{c}$ can be 0, and the upper limit $\bar c$ indicates that the requester is not willing to pay more than $\bar c$ per task. The bids of all the workers are denoted by a vector $\hat c = (\hat c_1, \ldots, \hat c_K) \in [\underline{c}, \bar c]^K$. The requester needs to define an allocation rule $A$ and a payment rule $P$ to ensure that the workers bid their true costs, which leads to budgeted MAB mechanism design.

Let $s^t = (s_{i,t'})_{i \in N,\, t' \in \{1, \ldots, t\}}$ and $B^t$ denote the success realization and the budget left, respectively, up to time step $t$. An allocation rule $A$ is a function that maps the bid vector $\hat c$, the

success realization $s^t$, and the budget left $B^t$ to a worker $i \in N$ for each time $t$. For each task a worker $i$ is allocated, a payment $P_i^t$ is given to the worker. The requester has a budget $B$, and the total payment given to the workers cannot exceed the budget $B$. Also, not all the entries in $s^t$ are known at a time step $t$, since only the success of the allocated workers is revealed. Thus, the allocation $A$ and payment $P$ depend only on the observed entries of the success realization $s^t$. Ideally, the bid $\hat c_i$ must be equal to the true cost $c_i$ for each worker $i$. However, the workers, being strategic agents, may bid a value different from their true costs to maximize their own utilities. The utility of a worker $i$ at time step $t$ is given as

$$U_i^t(\hat c; c_i; s^t; B^t) = \left(P_i^t(\hat c; s^t; B^t) - c_i\right) A_i^t(\hat c; s^t; B^t) \tag{1}$$

where

$$A_i^t(\hat c; s^t; B^t) = \begin{cases} 1 & \text{if worker } i \text{ is allocated a task at time } t \\ 0 & \text{otherwise} \end{cases}$$

We now present some essential properties that our mechanism should satisfy.

Definition 1 (Truthful) A mechanism is truthful if bidding true costs maximizes the utility of any worker $i$ irrespective of the bids of other workers. Formally, $\forall\, \hat c_i, \hat c_{-i}, c_i, s^t, B^t$,

$$U_i^t(c_i, \hat c_{-i}; c_i; s^t; B^t) \geq U_i^t(\hat c_i, \hat c_{-i}; c_i; s^t; B^t). \tag{2}$$

In this work, we consider the strongest notion of truthfulness, also known as Dominant Strategy Incentive Compatibility (DSIC), where no worker has an incentive to misreport his bid irrespective of the bids of other workers.

Definition 2 (Individually Rational) A mechanism is individually rational if every worker $i$ derives a non-negative utility by participating in the auction. Formally, $\forall\, c, s^t, B^t$,

$$U_i^t(c; c_i; s^t; B^t) \geq 0. \tag{3}$$

Note that even though the workers bid only once, truthfulness is defined in terms of the utility a worker achieves at every time step. This is because workers are unaware of the budget of the requester and thus wish to maximize their utility at every time step, since the budget might run out in later rounds.
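Equation (1) and Definitions 1 and 2 can be checked by brute force on a small bid grid. The sketch below is only an illustration under stated assumptions: the two one-task rules `second_price_reverse` and `pay_as_bid` are textbook toy rules introduced here for contrast, not the mechanism proposed in this paper, and every function name is hypothetical.

```python
from itertools import product


def utility(payment, true_cost, allocated):
    """Per-task utility as in Equation (1): (payment - true cost) * allocation bit."""
    return (payment - true_cost) * allocated


def second_price_reverse(bids, cap):
    """Toy rule (assumption): lowest bidder wins and is paid the second-lowest bid, capped."""
    order = sorted(range(len(bids)), key=lambda i: (bids[i], i))
    return order[0], min(bids[order[1]], cap)


def pay_as_bid(bids, cap):
    """Toy rule (assumption): lowest bidder wins and is paid his own bid, capped."""
    order = sorted(range(len(bids)), key=lambda i: (bids[i], i))
    return order[0], min(bids[order[0]], cap)


def is_truthful_and_ir(mechanism, grid, n_workers, cap):
    """Check Definitions 1 and 2 for every worker, true cost and opponent bid profile on the grid."""
    for i in range(n_workers):
        for true_cost in grid:
            for others in product(grid, repeat=n_workers - 1):
                def u(bid):
                    bids = list(others[:i]) + [bid] + list(others[i:])
                    winner, pay = mechanism(bids, cap)
                    return utility(pay, true_cost, 1 if winner == i else 0)
                honest = u(true_cost)
                if honest < 0 or any(u(b) > honest + 1e-12 for b in grid):
                    return False
    return True


if __name__ == "__main__":
    grid = [1.0, 2.0, 3.0]
    print(is_truthful_and_ir(second_price_reverse, grid, 3, 3.0))
    print(is_truthful_and_ir(pay_as_bid, grid, 3, 3.0))
```

The pay-as-bid rule fails the check because a low-cost worker can raise his bid, still win, and be paid more, which is exactly the kind of deviation Definition 1 rules out.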
The non-strategic version of our problem can be mapped to a budgeted MAB problem [8] where workers represent the arms, tasks represent time steps, and allocating a task to a worker corresponds to pulling an arm. Table 1 provides the notation that we will be using throughout.

4. BUDGETED MAB MECHANISM WITH TIME CRITICAL TASKS

4.1 A Benchmark Mechanism
In order to compare the performance of our mechanism, we consider a benchmark mechanism that knows the quality $q_i$ of each worker $i$, but still has to incentivize the workers to bid truthfully. Thus, the benchmark algorithm is not required to learn the qualities of the workers; however, it should still satisfy properties like budget feasibility, truthfulness, and individual rationality. Let $T_c(B)$ denote the total number of tasks that can be executed with budget $B$ and cost vector $c$. Note that the total number of tasks also depends on the payment, which in turn depends on the costs and qualities. We have dropped this dependence for notational brevity. Let $A_i(\hat c; q; B) = \sum_{t=1}^{T_{\hat c}(B)} A_i^t(\hat c; q; B)$ denote the number of tasks allocated to a worker $i$ when the bid profile is $\hat c$ and the known quality vector is $q$. Let the payment given to a worker $i$ be denoted by $P_i(\hat c; q; B) = \sum_{t=1}^{T_{\hat c}(B)} P_i^t(\hat c; q; B)$. If $\tau = 0$, i.e., all the workers are available for all tasks, we have the following optimization problem:

$$\begin{aligned} \underset{(A(\hat c; q; B),\, P(\hat c; q; B))}{\text{maximize}} \quad & \sum_{i=1}^{K} A_i(\hat c; q; B)\, q_i \\ \text{subject to} \quad & \sum_{i=1}^{K} P_i(\hat c; q; B) \leq B \\ & A_i(\hat c; q; B) \in \mathbb{Z}_{\geq 0} \quad \forall i \in N \end{aligned}$$

where worker $i$'s payment $P_i^t(\hat c; q; B)$ and allocation $A_i^t(\hat c; q; B)$ satisfy the properties of truthfulness and individual rationality given in Definitions 1 and 2 respectively. If we assume that the costs are not private information, then this optimization problem reduces to the unbounded knapsack problem, which is NP-hard and is considered in [8] when costs are public knowledge. Thus, we adopt a similar benchmark where the best worker is identified according to the highest quality by cost ratio.
However, when there is a non-zero task deadline, $\tau > 0$, allocating a task to a worker implies that the worker remains unavailable for the next $\tau$ time steps. The idea is to allocate a task to the most efficient available worker at each time step. Without loss of generality, let us assume that all the workers are ranked according to their quality by cost ratio, that is,

$$\frac{q_1}{\hat c_1} \geq \frac{q_2}{\hat c_2} \geq \cdots \geq \frac{q_K}{\hat c_K}. \tag{4}$$

Workers are allocated greedily according to this ranking among all the available workers. By the end of the first $\tau$ time steps, the best worker is available again and receives the next task. So, instead of selecting one optimal worker, the allocation rule has to select the best $\tau$ workers. Thus, the best $\tau$ workers are allocated one by one sequentially. The benchmark algorithm is given in Algorithm 1, where $N_\tau$ is the set of the $\tau$ best workers, $P$ denotes the sum of payments given to the best $\tau$ workers for taking one task each, and $T$ denotes the number of tasks given to each worker.

ALGORITHM 1: Benchmark mechanism
Input: Bids $\hat c_1, \ldots, \hat c_K$, Qualities $q_1, \ldots, q_K$, Deadline $\tau$, Maximum possible bid $\bar c$, and Budget $B$
Output: Mechanism $M = (A, P)$
1  Assumption: $\frac{q_1}{\hat c_1} \geq \frac{q_2}{\hat c_2} \geq \cdots \geq \frac{q_K}{\hat c_K}$ and $\tau < K$;
2  $N_\tau = \{1, \ldots, \tau\}$;
3  for each $i$ in $N_\tau$ do
4      Set $p_i = \min\{\frac{q_i}{q_{\tau+1}} \hat c_{\tau+1}, \bar c\}$;
5  $P = \sum_{i \in N_\tau} p_i$;
6  $T = \lfloor B/P \rfloor$;
7  for $t = 1$ to $T$ do
8      for each $i$ in $N_\tau$ do
9          $A_i^{t_i} = 1$;
10         $P_i^{t_i} = p_i$;
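Algorithm 1 translates almost line by line into code. The following is a minimal sketch, assuming strictly positive bids and qualities; the function name `benchmark_mechanism` and its return format are hypothetical, and the workers are sorted inside the function rather than assumed to be pre-ranked.

```python
import math


def benchmark_mechanism(bids, qualities, tau, c_max, budget):
    """Sketch of Algorithm 1 with known qualities.

    Returns the tau best workers, their per-task payments and the number of
    rounds T; in each round every selected worker receives one task.
    """
    K = len(bids)
    if not 0 < tau < K:
        raise ValueError("the benchmark needs 0 < tau < K")
    # Rank workers by quality per unit of bid, best first (Equation 4).
    ranked = sorted(range(K), key=lambda i: qualities[i] / bids[i], reverse=True)
    top, ref = ranked[:tau], ranked[tau]          # ref is the (tau + 1)-th ranked worker
    # Per-task payment: the largest bid that keeps worker i among the best tau, capped at c_max.
    pay = {i: min(qualities[i] * bids[ref] / qualities[ref], c_max) for i in top}
    round_cost = sum(pay.values())                # P in Algorithm 1
    rounds = math.floor(budget / round_cost)      # T in Algorithm 1
    return top, pay, rounds


if __name__ == "__main__":
    top, pay, rounds = benchmark_mechanism(
        bids=[1.0, 1.0, 1.0, 1.0], qualities=[0.8, 0.6, 0.4, 0.2],
        tau=2, c_max=5.0, budget=10.0)
    print(top, pay, rounds)
```

Because the number of rounds is rounded down, a small part of the budget may stay unspent, which keeps every selected worker at the same number of tasks.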

Now, we show that the benchmark mechanism satisfies budget feasibility, truthfulness and individual rationality. The mechanism is budget feasible as $T = \lfloor B/P \rfloor$ ensures that the total payment is less than or equal to $B$. For ensuring truthfulness and individual rationality, a property called monotone allocation is required, and hence we define it first.

Definition 3 (Monotone Allocation) An allocation rule $A$ is monotone if for any quality vector $q$, budget $B$, and $\forall\, \hat c_i \leq c_i$,

$$A_i^t(c_i, c_{-i}; q; B) \leq A_i^t(\hat c_i, c_{-i}; q; B). \tag{5}$$

Each worker $i$ that belongs to the set of best $\tau$ workers ($i \in N_\tau$) gets the allocation only if his bid is less than $\frac{q_i}{q_{\tau+1}} \hat c_{\tau+1}$, keeping all the other bids the same. Thus, the allocation rule for the benchmark mechanism is monotone. Also, the payment rule should satisfy some criteria to make the benchmark mechanism truthful and individually rational, according to the following theorem.

Theorem 1 [3, 3] A mechanism is incentive compatible (truthful) and individually rational if and only if for each agent $i$ and bid vector $\hat c$, the allocation rule $A_i^t$ is monotone, $\int_0^\infty A_i^t(\hat c; q; B)\, d\hat c_i < \infty$, and the payment is given by

$$P_i^t(\hat c; q; B) = \hat c_i A_i^t(\hat c; q; B) + \int_{\hat c_i}^{\infty} A_i^t(z, \hat c_{-i}; q; B)\, dz. \tag{6}$$

This payment is called the critical payment.

The critical value which is paid to each of the best $\tau$ workers is given by Equation (6). From Algorithm 1, the payment for any worker $i \in N_\tau$ is given by $P_i^t = \frac{q_i}{q_{\tau+1}} \hat c_{\tau+1}$. To see that the payment satisfies Equation (6), we observe that worker $i \in N_\tau$ loses the allocation if $\hat c_i > \frac{q_i}{q_{\tau+1}} \hat c_{\tau+1}$. Thus, the value of the integral is $\int_{\hat c_i}^{\infty} A_i^t(z, \hat c_{-i}; q; B)\, dz = \left(\frac{q_i}{q_{\tau+1}} \hat c_{\tau+1} - \hat c_i\right) A_i^t(\hat c_i, \hat c_{-i}; q; B)$, and the payment satisfies Equation (6). Note that if all the best $\tau$ workers are not allocated the same number of tasks, it is hard to define the critical payments. Thus, we let a budget of $B - \lfloor B/P \rfloor P$, with $P = \sum_{i \in N_\tau} p_i$, remain unallocated to ensure truthfulness. Also, for worker $i \in N_\tau$, the payment for each task is $\frac{q_i}{q_{\tau+1}} \hat c_{\tau+1}$, which is greater than or equal to $c_i$, making the mechanism individually rational.
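The claim that Equation (6) reproduces the threshold payment can be checked numerically. The sketch below is an illustration under stated assumptions: the per-task allocation is modelled as a simple threshold rule, the integral is truncated at the maximum bid `c_max` (bids above it are not admissible), and the helper name `critical_payment` is hypothetical.

```python
def critical_payment(bid, allocation, c_max, steps=100_000):
    """Evaluate bid * A(bid) + integral of A(z) for z from bid to c_max (midpoint rule)."""
    width = (c_max - bid) / steps
    integral = sum(allocation(bid + (j + 0.5) * width) for j in range(steps)) * width
    return bid * allocation(bid) + integral


if __name__ == "__main__":
    # Worker i keeps the allocation while his bid stays at or below q_i / q_ref * c_ref.
    q_i, q_ref, c_ref = 0.8, 0.4, 1.0
    threshold = q_i / q_ref * c_ref

    def allocation(z):
        return 1 if z <= threshold else 0

    print(critical_payment(1.2, allocation, c_max=5.0))   # close to the threshold
    print(critical_payment(1.2, allocation, c_max=1.5))   # capped by the maximum bid
```

When the threshold exceeds the maximum bid, the truncated integral returns the maximum bid, which matches the cap in the payment step of Algorithm 1.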
Thus, the benchmark mechanism ensures budget feasibility, truthfulness and individual rationality. Note that generalizing the above algorithm to the case of heterogeneous task deadlines, i.e. each task having a different deadline, is difficult because the allocation is no longer uniform, and designing a truthful mechanism then requires complicated payments.

The total expected reward (expected number of successfully completed tasks) accumulated by the benchmark algorithm is

$$\sum_{k=1}^{\tau} q_k \cdot \frac{B}{\sum_{k=1}^{\tau} \frac{q_k}{q_{\tau+1}} c_{\tau+1}} = B\, \frac{q_{\tau+1}}{c_{\tau+1}}. \tag{7}$$

The expected regret for an algorithm with unknown success rates is given by

$$E[R] = B\, \frac{q_{\tau+1}}{c_{\tau+1}} - \sum_{t=1}^{T_{\hat c}(B)} q_{I_t} \tag{8}$$

where $T_{\hat c}(B)$ is the total number of tasks, and $I_t$ denotes the worker allocated for task $t$. We bound the regret with respect to the benchmark greedy algorithm, since it is hard to compute the optimal solution even when the qualities are known. Moreover, the benchmark algorithm gives us an approximate solution within a factor of two.

4.2 Proposed Mechanism
In this section, we provide a mechanism for the budgeted MAB problem with task deadlines, given in Algorithm 2. The input parameters to the algorithm are the bid vector $\hat c$, task deadline $\tau$, budget $B$ and the maximum allowed bid $\bar c$. We call our mechanism exploration-separated, as the mechanism divides the budget $B$ into an exploration budget and an exploitation budget. The value of the exploration budget $B'$ is set at the beginning of Algorithm 2 and is calculated in a way that minimizes the expected regret of Equation (8) (the calculation is provided in Section 4.4). In Section 4.5, we prove that the exploration-separated property is necessary for any truthful and individually rational mechanism.
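The exploration-separated idea can be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not a definitive implementation of Algorithm 2: the exploration budget is passed in as a parameter (`explore_budget`) instead of being computed, exploration outcomes are drawn immediately rather than observed after $\tau$ steps, `true_q` is used only to simulate those outcomes, and the confidence width is written as $\sqrt{\ln(KB)/(2n)}$ for $n$ exploration pulls per worker. All names are hypothetical.

```python
import math
import random


def explore_then_exploit(bids, true_q, tau, c_max, budget, explore_budget, seed=0):
    """Sketch of an exploration-separated mechanism with a bid-independent exploration phase."""
    rng = random.Random(seed)
    K = len(bids)
    n = int(explore_budget // (K * c_max))       # exploration pulls per worker
    if n < 1 or not 0 < tau < K:
        raise ValueError("need at least one exploration pull per worker and 0 < tau < K")

    # Exploration: round robin over all workers, each pull paid the maximum bid c_max.
    q_hat = [sum(rng.random() < true_q[k] for _ in range(n)) / n for k in range(K)]
    remaining = budget - n * K * c_max

    # Upper confidence bounds on the qualities after n observations per worker.
    width = math.sqrt(math.log(K * budget) / (2 * n))
    ucb = [q + width for q in q_hat]

    # Exploitation: best tau workers by UCB per unit bid, priced against the (tau + 1)-th.
    ranked = sorted(range(K), key=lambda k: ucb[k] / bids[k], reverse=True)
    top, j = ranked[:tau], ranked[tau]
    pay = {i: min(ucb[i] / ucb[j] * bids[j], c_max) for i in top}
    round_cost = sum(pay.values())

    rounds = 0
    while remaining > round_cost:                # stop before the budget is exceeded
        remaining -= round_cost
        rounds += 1
    return {"top": top, "pay": pay, "rounds": rounds, "spent": budget - remaining}


if __name__ == "__main__":
    result = explore_then_exploit(
        bids=[1.0, 1.5, 2.0, 2.5], true_q=[0.9, 0.8, 0.5, 0.3],
        tau=2, c_max=3.0, budget=1000.0, explore_budget=240.0)
    print(result)
```

Splitting the budget up front means the exploitation ranking never changes once it is fixed, which is what makes the per-round payments well defined.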
ALGORITHM 2: Budgeted MAB mechanism with task deadline
Input: Bids $\hat c_1, \ldots, \hat c_K$, Deadline $\tau$, Maximum bid $\bar c$, Budget $B$
Output: Mechanism $M = (A, P)$
1  Initialize $t = 1$; $\hat q_k = 0$ and $n_k = 0$ $\forall k \in N$;
2  Set $B' = 2^{-1/3} (K \ln(KB))^{1/3} B^{2/3}$;
3  for $l = 1, 2, \ldots, \lfloor \frac{B'}{K \bar c} \rfloor$ do
4      for $i = 1, 2, \ldots, K$ do
5          if $t > \tau$ then
6              Let $k$ = worker allocated at time $(t - \tau)$;
7              Observe reward $s_{k,t}$;
8              Update $\hat q_k = (\hat q_k n_k + s_{k,t})/(n_k + 1)$;
9              Update $n_k = n_k + 1$;
10         Allocate worker $i$, $A_i^t = 1$, $P_i^t = \bar c$;
11         Update $t = t + 1$;
12 for each $k$ in $N$ do
13     Update $\hat q_k^+ = \hat q_k + \sqrt{\frac{K \bar c \ln(KB)}{2B'}}$;
14 Set $N'_\tau = \{[1], \ldots, [\tau]\}$;
15 Set $j = \operatorname{argmax}_{\tau+1} \frac{\hat q_k^+}{\hat c_k}$;
16 Set $B^t = B - \lfloor \frac{B'}{K \bar c} \rfloor K \bar c$;
17 while $B^t > \sum_{i \in N'_\tau} \min\{\frac{\hat q_i^+}{\hat q_j^+} \hat c_j, \bar c\}$ do
18     for each $i$ in $N'_\tau$ do
19         Set $A_i^t = 1$, $P_i^t = \min\{\frac{\hat q_i^+}{\hat q_j^+} \hat c_j, \bar c\}$;
20         Allocate worker $i$, $A_i^t = 1$ and pay $P_i^t$;
21         Update $B^{t+1} = B^t - P_i^t$;
22         Update $t = t + 1$;

The algorithm first explores the workers by allocating the tasks to the workers in a round robin fashion till the budget $B'$ is exhausted. Such rounds are known as exploration rounds. The algorithm maintains the running average of quality $\hat q_k$ obtained from each worker $k$ in the exploration rounds. The per allocation payment for each worker during the exploration phase is $\bar c$. After the exploration rounds, the best $\tau$ workers,

according to decreasing order of $\frac{\hat q_k^+}{\hat c_k}$, $k \in N$, are chosen to be played sequentially one by one in each round, where $\hat q_k^+ = \hat q_k + \sqrt{\frac{K \bar c \ln(KB)}{2B'}}$ is the upper bound quality estimate for worker $k$. Let $[i]$ denote the $i$-th ranked worker and $N'_\tau$ denote the best $\tau$ workers according to the ratio $\frac{\hat q_k^+}{\hat c_k}$, that is, $[i] = \operatorname{argmax}_i \frac{\hat q_k^+}{\hat c_k}$ and $N'_\tau = \{[1], \ldots, [\tau]\}$. We use the notation $\operatorname{argmax}_i$ to denote the parameter value that gives the $i$-th maximum value of the corresponding array. The per allocation payment made to a worker $k$ among the best $\tau$ workers is $\frac{\hat q_k^+}{\hat q_{[\tau+1]}^+} \hat c_{[\tau+1]}$, where $[\tau + 1] = \operatorname{argmax}_{\tau+1} \frac{\hat q_k^+}{\hat c_k}$.

4.3 Properties of the Proposed Mechanism
Our mechanism satisfies the following desirable properties:

Truthfulness:
1. Monotone Allocation: The allocation function during the exploration rounds is independent of the bids, so $A_i^t(\hat c_i, \hat c_{-i}; s^t; B^t)$ does not change for a different bid $c_i < \hat c_i$. During the exploitation phase, the workers are allocated according to the ratio $\frac{\hat q_i^+}{\hat c_i}$. Since the qualities do not change in the exploitation rounds, the allocation rule of the proposed mechanism is monotone.
2. Critical Payment: The critical payment is given by Equation (6). In the exploration phase, the allocation is bid independent, that is, the allocation remains the same for any $\hat c_i \in [c_i, \bar c]$. Thus, the payment to an allocated worker during the exploration phase at each time step is $\bar c$. During the exploitation phase, any worker $i$ among the best $\tau$ workers loses the allocation when $\hat c_i > \frac{\hat q_i^+}{\hat q_{[\tau+1]}^+} \hat c_{[\tau+1]}$. Thus, the payment using Equation (6) can be shown to be $\frac{\hat q_i^+}{\hat q_{[\tau+1]}^+} \hat c_{[\tau+1]}$ per round.
As the payment rule of the mechanism matches the payment given by Equation (6) with a monotone allocation rule, the mechanism is truthful, that is, $\hat c_i = c_i$ $\forall i \in N$.

Individual Rationality: The payment $\bar c$ during the exploration phase and $\frac{\hat q_i^+}{\hat q_{[\tau+1]}^+} \hat c_{[\tau+1]}$ during the exploitation phase to each worker $i$ is always greater than or equal to $\hat c_i$, and hence the mechanism is individually rational.

Budget Feasibility: The algorithm stops when there is no more budget left to be paid to the workers in the exploitation phase.
Thus, the total payment given to all the workers does not exceed the given budget $B$, and the mechanism is budget feasible.

Computationally Efficient: Initially, the mechanism invites bids from all the workers and, using the observed qualities and bids, finds the best $\tau + 1$ workers according to their quality per cost ratio. This takes $O(K)$ time [8].

The following section provides an upper bound on the expected regret and proposes the optimal value of $B'$ for the algorithm.

4.4 Upper Bound Analysis
The performance of any mechanism depends upon the regret accumulated by the mechanism. The expected regret of the proposed algorithm (as given by Equation 8) is

$$B\, \frac{q_{\tau+1}}{c_{\tau+1}} - \frac{B'}{K \bar c} \sum_{k \in N} q_k - (B - B')\, \frac{\hat q_{[\tau+1]}^+}{c_{[\tau+1]}} \cdot \frac{\sum_{i \in N'_\tau} q_{[i]}}{\sum_{i \in N'_\tau} \hat q_{[i]}^+} \tag{9}$$

Here, $[i]$ denotes the $i$-th ranked worker according to the ratio $\frac{\hat q_k^+}{c_k}$. The following theorem provides an upper bound on the regret by taking an optimal value of $B'$.

Theorem 2 The expected regret for the proposed algorithm is $O(B^{2/3} K^{1/3} (\ln(KB))^{1/3})$.

We first provide bounds on the learnt quality after the exploration phase in the following two lemmas.

Lemma 3 After the exploration rounds, for each worker $k \in N$, $q_k < \hat q_k^+$ with probability at least $1 - \frac{1}{KB}$.

Proof: Let each worker be allocated $n$ times in the exploration phase, i.e. $n = \frac{B'}{K \bar c}$. After $n$ allocations we have

$$P\left(\hat q_k^+ \leq q_k\right) = P\left(\hat q_k + \sqrt{\frac{\ln(KB)}{2n}} \leq q_k\right) = P\left(\hat q_k \leq q_k - \sqrt{\frac{\ln(KB)}{2n}}\right).$$

Applying the Chernoff-Hoeffding bound, we get

$$P\left(\hat q_k \leq q_k - \sqrt{\frac{\ln(KB)}{2n}}\right) \leq \exp\left\{-2n \left(\sqrt{\frac{\ln(KB)}{2n}}\right)^2\right\} = \frac{1}{KB}.$$

Therefore, with probability at least $\left(1 - \frac{1}{KB}\right)$, for a worker $k$, $q_k < \hat q_k^+$.

Lemma 4 After the exploration rounds, with probability at least $1 - \frac{1}{B}$, $\frac{q_r}{c_r} \leq \frac{\hat q_{[r]}^+}{c_{[r]}}$, where $[r] = \operatorname{argmax}_r \frac{\hat q_k^+}{c_k}$.

Proof: The probability that $\hat q_k^+$ is greater than $q_k$ for every worker $k \in N$ is

$$P\left(\bigcap_{k \in N} (\hat q_k^+ > q_k)\right) = 1 - P\left(\bigcup_{k \in N} (\hat q_k^+ \leq q_k)\right) \geq 1 - \sum_{k \in N} P(\hat q_k^+ \leq q_k) \geq 1 - \frac{K}{KB} = 1 - \frac{1}{B} \quad \text{(using Lemma 3)}.$$

Hence, the $r$-th maximum value of the set $\left\{\frac{q_k}{c_k} : k \in N\right\}$ is less than or equal to the $r$-th maximum value of the set $\left\{\frac{\hat q_k^+}{c_k} : k \in N\right\}$ with probability $\left(1 - \frac{1}{B}\right)$.

Proof of Theorem 2: The expected regret of the algorithm can be upper bounded by

$$\left(1 - \frac{1}{B}\right) \left[ B\, \frac{q_{\tau+1}}{c_{\tau+1}} - \frac{B'}{K \bar c} \sum_{k \in N} q_k - (B - B')\, \frac{\hat q_{[\tau+1]}^+}{c_{[\tau+1]}} \cdot \frac{\sum_{i \in N'_\tau} q_{[i]}}{\sum_{i \in N'_\tau} \hat q_{[i]}^+} \right] + \frac{1}{B} \cdot B\, \frac{q_{\tau+1}}{c_{\tau+1}}$$

The bracketed term is the regret of Equation (9) on the event of Lemma 4, which holds with probability at least $1 - \frac{1}{B}$; on the complementary event the regret is at most the benchmark reward $B \frac{q_{\tau+1}}{c_{\tau+1}}$. Writing $B = B' + (B - B')$ and applying Lemma 4 to replace $\frac{q_{\tau+1}}{c_{\tau+1}}$ by $\frac{\hat q_{[\tau+1]}^+}{c_{[\tau+1]}}$, the exploitation part of the regret is governed by the gaps $\hat q_{[i]}^+ - q_{[i]}$ of the $\tau$ selected workers. Bounding each gap by twice the confidence width gives a factor $(B - B')\, 2\tau \sqrt{\frac{K \bar c \ln(KB)}{2B'}}$, and the chain of inequalities ends, up to constant factors that depend only on the cost bounds, in a bound of the form

$$B' + 2B \sqrt{\frac{K \bar c \ln(KB)}{2B'}} + \frac{q_{\tau+1}}{c_{\tau+1}}.$$

It can be shown that the above expression attains its minimum value when the value of $B'$ is

$$B' = 2^{-1/3} (K \ln(KB))^{1/3} B^{2/3}. \tag{10}$$

Substituting the value of $B'$, we get the upper bound on the expected regret as

$$E[R] \leq 1.587\, (K \ln(KB))^{1/3} B^{2/3} + \frac{q_{\tau+1}}{c_{\tau+1}}.$$

Thus it is proved that the expected regret is $O(B^{2/3} K^{1/3} (\ln(KB))^{1/3})$.

Note that the upper bound on regret does not depend on $\tau$ directly, though the presence of $\frac{q_{\tau+1}}{c_{\tau+1}}$ in the expression makes the regret bound decrease as $\tau$ increases. Also, our regret bound matches the existing literature [8]. We now show that this is the best achievable regret up to a constant factor.

4.5 Lower Bound Analysis
In this section, we prove that the regret bound for the proposed algorithm is tight by providing a lower bound analysis. A lower bound on regret exists for MAB mechanisms in the context of sponsored search auctions without any budget constraint [5]. However, these results cannot be extended in a straightforward way to budgeted MAB mechanisms. The presence of a budget causes an inherent dependency between the payment and the number of rounds. Given a fixed budget, the number of rounds of the algorithm may vary as the bids change or the observed success rate changes. This makes the lower bound proof in this setting more challenging. Let $T_c(B)$ denote the total number of tasks that are executed with cost profile $c$ and budget $B$. Note that the total number of tasks will also depend on the success realization and, more importantly, on the allocation and payment rule. However, for notational simplicity we do not show this dependence explicitly.
We prove the lower bounds for the special case of $\tau = 0$, i.e., the task is verified instantly and thus all the workers are available for all the tasks. Since the problem with task deadline $\tau = 0$ is a special case, the lower bounds are applicable to the general class of problems. We start with some definitions that will be used in the proofs; Theorem 6 provides the final lower bound.

Definition 4 (Similar success realization) Two success realizations, $s_1$ corresponding to bid profile $c$ and $s_2$ corresponding to bid profile $\hat c$, are said to be similar if $s_1^t = s_2^t$ $\forall t \leq \min(T_c(B), T_{\hat c}(B))$. Similarity between two success realizations $s_1$ and $s_2$ is denoted by $s_1 \sim s_2$.

We now redefine monotonicity of the allocation rule for the learning setting:

Definition 5 (Monotone) An allocation $A$ is called monotone if for any $\hat c_i \leq c_i$, $A_i^t(c_i, c_{-i}; s_1) = 1$ implies $A_i^t(\hat c_i, c_{-i}; s_2) = 1$, $\forall s_1 \sim s_2$ and $\forall t \leq \min(T_{(c_i, c_{-i})}(B), T_{(\hat c_i, c_{-i})}(B))$.

To provide lower bounds, we will assume that the mechanism knows the full success realization, and hence the allocation rule at any time $t$ depends on the complete success realization $s$ instead of $s^t$. An allocation rule at time $t$ may or may not depend on the bids. Rounds whose allocation does not depend on the bids are called bid independent rounds.

Definition 6 (Bid Independent Round) A round $t$ is called bid independent if the allocation $A^t(\hat c)$ remains the same for all $\hat c \in [\underline{c}, \bar c]^K$.

It is easy to see from the critical payment that if a round $t$ is bid independent, then the payment to the allocated worker at that round is $\bar c$. If the requester chooses to make the allocations of all rounds bid independent, then only $B/\bar c$ rounds will be played. However, such an allocation rule severely reduces the expected number of successfully completed jobs. Ideally,

there should be no bid-independent alloation rounds. On the other hand, we show that bid independent rounds are neessary for inentive ompatibility. Hene, the total number of bid independent rounds should be properly balaned. We say a round t is used for exploration, when the suess observed in that round is used for deiding future alloations. We denote suh a round t as influential round, and the round t whose alloation depends on the suess observed at round t is alled influened round. Suh a pair (t, t ) is alled influential pair. Definition 7 (Influential Round) A round t is alled an influential round if the suess observed in round t, is used for deiding future alloation in round t and round t is alled influened round. Definition 8 (Influential Pair) A pair (t, t ) is alled an influential pair if A t (; s) A t (; s ) for t > t, where A t (; s) = j and s = s I{j, t} for some bid profile, and some suess realization s. Here, s = s I{j, t} denotes suess realization with only the suess bit for worker j at time t is flipped in s. From Theorem, any truthful and individually rational mehanism should satisfy the property of monotoniity. Number of influential rounds indiates how many rounds are required to learn the qualities. We now show that any influential pair (t, t ) should be bid independent for truthful mehanism and hene, payment in these influential rounds should be. Theorem 5 If there exists an influential pair (t, t ) for some bid profile, and suess realization s, where t < t B then a deterministi mehanism is truthful only if the payment at time t is bid independent. Proof: Let us hoose K = 2 with workers and 2. Let (t, t ) be an influential pair for some bid profile ĉ and some suess realization s suh that A t (ĉ; s) = 2 and A t (ĉ; s) A t (ĉ; s ), where, s = s I{2, t}. We assume t is the least time step whih is influened by time step t. Any algorithm plays for at least B rounds sine the payment to any worker for any task does not exeed. 
Let us assume that round $t$ is bid dependent; hence, for some $\hat{c}'_1 < \hat{c}_1$, $A^t(\hat{c}'_1, \hat{c}_2; \tilde{s}) = 1$, where $\tilde{s} \sim s$. Since $\tilde{s}_x = s_x$ for all $x \le B/c_{\max}$, we replace $\tilde{s}$ with $s$ in the rest of the proof (as $t, t' \le B/c_{\max}$). Under the bid profile $(\hat{c}'_1, \hat{c}_2)$, the success of arm 2 at round $t$ is not observed by the algorithm, so the allocation and payment at $t'$ remain the same for $s$ and $s'$. Thus, $A^{t'}(\hat{c}'_1, \hat{c}_2; s) = A^{t'}(\hat{c}'_1, \hat{c}_2; s') = i$ and $P_i^{t'}(\hat{c}'_1, \hat{c}_2; s) = P_i^{t'}(\hat{c}'_1, \hat{c}_2; s')$.

Let us assume that $A^{t'}(\hat{c}; s) = 1$ and $A^{t'}(\hat{c}; s') = 2$; otherwise we can swap $s$ with $s'$. Due to monotonicity, $A^{t'}(\hat{c}; s) = A^{t'}(\hat{c}'_1, \hat{c}_2; s) = 1$. Thus, $A^{t'}(\hat{c}'_1, \hat{c}_2; s) = A^{t'}(\hat{c}'_1, \hat{c}_2; s') = 1$.

Now, $A^{t'}(\hat{c}; s) = 1$, and worker 1 is paid the amount up to which he could have increased his bid and still received the allocation. Thus the payment satisfies

\[ P_1^{t'}(\hat{c}; s) \ge \hat{c}_1. \qquad (11) \]

Since $A^{t'}(\hat{c}'_1, \hat{c}_2; s) = 1$, worker 1 is again paid the amount up to which he could increase his bid and still receive the allocation. As we increase the cost from $\hat{c}'_1$ to $\hat{c}_1$, arm 1 still gets the allocation, so the payment satisfies

\[ P_1^{t'}(\hat{c}'_1, \hat{c}_2; s) \ge \hat{c}_1. \qquad (12) \]

Again, $A^{t'}(\hat{c}; s') = 2$ and $A^{t'}(\hat{c}'_1, \hat{c}_2; s') = 1$, so the amount up to which worker 1 can raise his bid and still receive the allocation is at most $\hat{c}_1$, as he did not get an allocation at round $t'$ with bid $\hat{c}_1$. Hence $P_1^{t'}(\hat{c}'_1, \hat{c}_2; s') \le \hat{c}_1$. By arguments (11) and (12) and the bound above, it can be seen that the payment is $P_1^{t'}(\hat{c}'_1, \hat{c}_2; s) = P_1^{t'}(\hat{c}'_1, \hat{c}_2; s') = \hat{c}_1$. If the payment to a worker is equal to his bid, then the mechanism cannot be truthful. Thus, the payment at time $t$ has to be bid independent.

According to Theorem 5, the allocation at an influential (exploration) round $t$ does not depend on the bids, and thus the payment at any influential round $t$ is $c_{\max}$.

Theorem 6 Any deterministic truthful mechanism for the budgeted MAB setting with strategic costs has an expected regret of $\Omega(B^{2/3}K^{1/3})$.

Proof: To prove lower bounds, we assume that there is an adversary that tries to maximize the regret and provides the quality and cost vectors as input to the algorithm. Since the payment to any worker is bounded above by $c_{\max}$, any mechanism in this setting runs for at least $B/c_{\max}$ rounds.
If there are no influential rounds, the allocation to any worker does not depend on his quality (since nothing is learnt about the qualities). In this case the best any mechanism can do is to allocate an equal number of tasks to all the workers. Otherwise, the adversary can choose the quality of the worker that has been assigned the fewest tasks to be 1 and the quality of the rest to be 0, thus incurring a regret of at least $\frac{B(K-1)}{K c_{\max}}$ for the quality profile $q = (1, 0, \ldots, 0)$.

Now, let us assume that there are some influential pairs in the first $B/c_{\max}$ rounds, and let the corresponding influential rounds be denoted by the set $I$. By Theorem 5, these influential rounds should be bid independent. Now consider two cases:

1. $|I| \ge \beta B^{2/3}K^{1/3}$, where $\beta$ is any constant. These rounds are bid independent, so by Theorem 5 the payment made in each of these rounds is $c_{\max}$. Now, consider the quality vector $q = (1, 1, \ldots, 1)$ and the cost vector $c = (c_{\min}, c_{\min}, \ldots, c_{\min})$. The regret for this profile is at least $\beta B^{2/3}K^{1/3}\frac{c_{\max}}{c_{\min}} - \beta B^{2/3}K^{1/3}$. As $c_{\max} > c_{\min}$, we get a regret of $\Omega(B^{2/3}K^{1/3})$.

2. $|I| \le \beta B^{2/3}K^{1/3}$. Then, using the result in [5], one can prove that there exist success realizations that achieve a regret of at least $\beta B^{2/3}K^{1/3}$.

Thus, the upper bound on regret is tight up to a logarithmic factor.

5. CONCLUSION AND FUTURE WORK

We studied a budgeted MAB problem with strategic arms and task deadlines. We proposed a budget feasible, truthful, and individually rational mechanism to solve the problem. We provided an upper bound of $O(B^{2/3}K^{1/3}(\ln(KB))^{1/3})$ on the regret of the proposed algorithm with respect to an appropriate benchmark. We also showed that any deterministic truthful algorithm that solves the budgeted MAB mechanism design problem suffers an expected regret of $\Omega(B^{2/3}K^{1/3})$. The questions left open by this work concern results for the following problems:

(a) A randomized truthful mechanism for the problem with a better regret bound.
(b) A truthful mechanism for the problem with tasks having different deadlines.
(c) A truthful mechanism for minimizing the total payment where the total reward obtained is higher than a fixed threshold.
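The gap between the two bounds summarized above can be made concrete with a small numerical sketch. It compares only the orders of growth, $B^{2/3}(K\ln(KB))^{1/3}$ and $B^{2/3}K^{1/3}$, and ignores all constant factors. The function names and the sample values of $B$ and $K$ are hypothetical choices made for illustration.

```python
import math


def upper_bound_order(budget: float, num_workers: int) -> float:
    """Order of the regret upper bound: B^(2/3) * (K * ln(K * B))^(1/3)."""
    log_term = math.log(num_workers * budget)
    return budget ** (2 / 3) * (num_workers * log_term) ** (1 / 3)


def lower_bound_order(budget: float, num_workers: int) -> float:
    """Order of the regret lower bound: B^(2/3) * K^(1/3)."""
    return budget ** (2 / 3) * num_workers ** (1 / 3)


def bound_gap(budget: float, num_workers: int) -> float:
    """Ratio of the two orders; algebraically equal to ln(K * B)^(1/3)."""
    return upper_bound_order(budget, num_workers) / lower_bound_order(budget, num_workers)


if __name__ == "__main__":
    # Hypothetical budgets, with K = 10 workers.
    for budget in (1e2, 1e4, 1e6):
        print(f"B = {budget:.0e}: gap = {bound_gap(budget, 10):.3f}")
```

The ratio grows only as the cube root of $\ln(KB)$, so it stays small even when the budget increases by several orders of magnitude.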

REFERENCES

[1] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, pages 1054–1078, 1995.
[2] S. Agrawal and N. R. Devanur. Bandits with concave rewards and convex knapsacks. In Proceedings of the Fifteenth ACM Conference on Economics and Computation (EC '14), pages 989–1006. ACM, 2014.
[3] A. Archer and É. Tardos. Truthful mechanisms for one-parameter agents. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, pages 482–491. IEEE, 2001.
[4] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
[5] M. Babaioff, Y. Sharma, and A. Slivkins. Characterizing truthful multi-armed bandit mechanisms. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 79–88. ACM, 2009.
[6] A. Badanidiyuru, R. Kleinberg, and Y. Singer. Learning on a budget: posted price mechanisms for online procurement. In Proceedings of the Thirteenth ACM Conference on Electronic Commerce (EC '12), pages 128–145. ACM, 2012.
[7] S. Bubeck and N. Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.
[8] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, 2nd edition, 2001.
[9] N. R. Devanur and S. M. Kakade. The price of truthfulness for pay-per-click auctions. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 99–106. ACM, 2009.
[10] S. Guha and K. Munagala. Approximation algorithms for budgeted learning problems. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pages 104–113. ACM, 2007.
[11] C.-J. Ho, A. Slivkins, and J. W. Vaughan. Adaptive contract design for crowdsourcing markets: bandit algorithms for repeated principal-agent problems. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, pages 359–376. ACM, 2014.
[12] R. Kleinberg, A. Niculescu-Mizil, and Y. Sharma. Regret bounds for sleeping experts and bandits. Machine Learning, 80(2-3):245–272, 2010.
[13] R. B. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58–73, 1981.
[14] Y. Singer. Budget feasible mechanisms. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 765–774. IEEE, 2010.
[15] Y. Singer and M. Mittal. Pricing mechanisms for crowdsourcing markets. In Proceedings of the 22nd International Conference on World Wide Web, pages 1157–1166, 2013.
[16] A. Slivkins and J. W. Vaughan. Online decision making in crowdsourcing markets: Theoretical challenges (position paper). arXiv preprint arXiv:1308.1746, 2013.
[17] L. Tran-Thanh, A. C. Chapman, A. Rogers, and N. R. Jennings. Knapsack based optimal policies for budget-limited multi-armed bandits. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, pages 1134–1140, 2012.
[18] L. Tran-Thanh, S. Stein, A. Rogers, and N. R. Jennings. Efficient crowdsourcing of unknown experts using multi-armed bandits. In European Conference on Artificial Intelligence, pages 768–773, 2012.