Socially-Optimal Design of Crowdsourcing Platforms with Reputation Update Errors


Yuanzhang Xiao, Yu Zhang, and Mihaela van der Schaar

Abstract—Crowdsourcing systems (e.g., Yahoo! Answers and Amazon Mechanical Turk) provide a platform for requesters, who have tasks to solve, to ask for help from workers. Vital to the proliferation of crowdsourcing systems is incentivizing the workers to exert high effort to provide high-quality services. Preferably, this incentive should be provided without monetary payment, which adds extra infrastructure to the system. Reputation mechanisms have been shown to work effectively as incentive schemes in crowdsourcing systems. A reputation agency updates the reputations of the workers based on the requesters' reports on the quality of the workers' services. A low-reputation worker is less likely to get served when it requests help, which gives the workers an incentive to obtain a high reputation by exerting high effort. However, reputation update errors are inevitable, caused either by system errors such as loss of reports, or by inaccurate reports resulting from the difficulty of accurately assessing the quality of a worker's service. Reputation update errors prevent existing reputation mechanisms from achieving the social optimum. In this paper, we propose a simple binary reputation mechanism, which has only two reputation labels ("good" and "bad"). To the best of our knowledge, our proposed reputation mechanism is the first that is proven to achieve the social optimum even in the presence of reputation update errors. We give the conditions under which the designed binary reputation mechanism achieves the social optimum, and discuss how to design the reputation update rules such that these conditions can be fulfilled.

I. INTRODUCTION

Crowdsourcing systems (e.g., Yahoo!
Answers and Amazon Mechanical Turk) provide a platform for a user to elicit collective effort from other users in order to solve a task. In a typical crowdsourcing system, a user can either post tasks and request help as a requester, or solve the tasks posted by others as a worker. In some crowdsourcing systems [1][2], the workers are rewarded by monetary payments from the requesters. The payment is usually made when the

worker is assigned the task, rather than after the worker completes it. This creates an incentive problem: the worker may want to exert low effort on the task since it has already been paid. In other systems [3], the workers are rewarded by the benefit obtained from other users' services. In such systems without monetary payment, it is even more difficult to provide the workers with an incentive to exert high effort. In summary, incentive provision for the workers is vital to the effectiveness of crowdsourcing systems. One effective incentive mechanism is the reputation mechanism [1]-[3]. In a reputation mechanism, there is a reputation agency, which assigns reputations to all the users based on their behaviors. If a user exerted high effort in the past as a worker, it receives a high reputation as a summary of its good past behavior. Similarly, a user's low reputation indicates that it used to exert low effort as a worker. Since a low-reputation user may get its task served with low effort, the users have an incentive to obtain a high reputation by exerting high effort on the others' tasks. Built on this intuition, the existing reputation mechanisms work well, except when there are reputation update errors. Reputation update errors come from two sources. First, a requester cannot perfectly assess the service quality of a worker, and hence may report the service quality as low to the reputation agency even when the worker actually exerted high effort. Second, the reputation agency may miss the requester's report on the service quality, or update the reputation incorrectly by mistake. In the presence of reputation update errors, the existing reputation mechanisms [1]-[3] have a performance loss that increases with the update error probability. This performance loss in [1]-[3] partially comes from the restriction of attention to stationary Markov strategies.
As we will see later, once we remove this restriction, the social optimum can be achieved. In this paper, we show that the social optimum can be achieved in the presence of reputation update errors if we do not restrict our attention to stationary Markov strategies as in [1]-[3]. In other words, we allow the users to take different actions given the same current state at different time instants, which significantly enlarges the strategy space to be considered and potentially complicates the optimal strategy. Nevertheless, we rigorously prove that, under certain conditions, the social optimum can be achieved by a class of simple reputation mechanisms, namely binary reputation mechanisms that assign binary reputation labels to the users. We derive the conditions under which the social optimum can be achieved, which provide guidelines for the design of reputation update rules. In addition, we further simplify the optimal strategy by proving that

strategies with a simple structure can be optimal, and propose an algorithm to construct the optimal strategy. The rest of the paper is organized as follows. In Section II, we discuss related work. In Section III, we describe the model of crowdsourcing systems, and in Section IV we formulate the interaction as a stochastic game. Then we design the optimal reputation mechanisms in Section V. Simulation results in Section VI demonstrate the performance improvement of the proposed reputation mechanism. Finally, Section VII concludes the paper.

II. RELATED WORK

The idea of social norms and reputation mechanisms was originally proposed in [4]. Assuming that there is no reputation update error, [4] proposed a simple reputation mechanism in which any deviating user is assigned the lowest reputation forever and is effectively excluded from the system. Without reputation update errors, this reputation mechanism with the most severe punishment can achieve the social optimum. However, with reputation update errors, eventually all the users will have the lowest reputation and receive no payoff. Hence, the performance loss of the reputation mechanism in [4] is large under reputation update errors. Based on the idea in [4], the works [5][6][7] analyzed the performance of different reputation mechanisms through simulation, showing that some simple reputation mechanisms suffer performance loss under reputation update errors. The work in [8] is the first to rigorously analyze a class of reputation mechanisms under reputation update errors. For the class of reputation mechanisms considered, [8] quantifies the performance loss in terms of the reputation update error probability. Since there is only one long-lived player in the model of [8], the reputation mechanisms in [8] are actually equivalent to a class of reputation mechanisms that we analyze in Section V-C. Our analysis of this subclass of reputation mechanisms yields the same intuition as in [8].
However, we also design another class of reputation mechanisms that asymptotically achieves the social optimum under reputation update errors. The contrast between these two classes of reputation mechanisms shows that differential punishment is crucial for achieving the social optimum. In our setting, we can punish users with low reputation and reward users with high reputation, thereby implementing differential punishment. By contrast, in [8], because there is only one long-lived player, all the users (considered as the single long-lived player) are punished or rewarded simultaneously. The works [1]-[3] also rigorously analyze a class of reputation mechanisms in the limit case when the

TABLE I
COMPARISON WITH RELATED WORK

Work       | User strategy     | Deviation-proof | Discount factor | Performance loss (due to update errors)
[1]-[3]    | Stationary Markov | Yes             | δ < 1           | Yes
[5][6][7]  | Stationary Markov | No              | δ < 1           | Yes
[8]        | Stationary Markov | Yes             | δ < 1           | Yes
[4]        | Nonstationary     | Yes             | δ → 1           | Yes
This work  | Nonstationary     | Yes             | δ < 1           | No

reputation update error probability goes to 0. However, under positive reputation update error probabilities, there is also a performance loss. Our paper is the first to propose a class of socially optimal reputation mechanisms under reputation update errors. Although we model the interaction among the users in the crowdsourcing platform as a stochastic game, our results are fundamentally different from the folk-theorem-type results in repeated games [9] and in stochastic games [10] in two aspects. First, [9] and [10] show that for any feasible payoff profile v, there exists a threshold discount factor such that under any larger discount factor, the payoff profile v can be achieved at the equilibrium. However, they cannot obtain this threshold in closed form. Hence, they cannot construct the equilibrium strategy that achieves v, because a feasible discount factor is needed to compute the strategy. In contrast, for a payoff profile v that is arbitrarily close to the social optimum (b − c) · 1_N, we can analytically determine the lower bound of feasible discount factors, and thus can construct the equilibrium strategy that achieves v based on a feasible discount factor. Second, although both our work and [9][10] build on the theory of self-generating sets [11], we cannot use the theory of self-generating sets in the same way as [9] and [10] did, because of restrictions on the strategies in our setting. We will discuss this in detail in Section V-A. In Table I, we compare the proposed work with existing reputation mechanisms.

III. SYSTEM MODEL

A. Basic Setup

Consider a crowdsourcing platform with N users. Denote the set of users by N = {1, ..., N}. We assume that the number of users N is displayed on the platform and is known to all the users. Each user has plenty of tasks to solve, and possesses some resources (e.g., knowledge) valuable to the other users' tasks. A user can request help to solve its task as a requester, and can provide service for another user as a worker. Since the users usually stay in the platform for a long period of time, we divide time into periods labeled by t = 0, 1, 2, .... In each period t, the users act in the following order: each user requests help to solve its tasks; each user is matched to another user's task by a matching rule; each user chooses to exert high or low effort in solving the task. A matching is defined as a mapping m : N → N, where user i, as a worker, is matched to the task of user m(i). Since a user cannot be matched to itself, we denote the set of all possible matchings as

M = {m : m(i) ≠ i, ∀i ∈ N}.  (1)

A matching rule is then defined as a probability distribution µ on the set of all possible matchings M. In this paper, we focus on the uniformly random matching rule, which satisfies

µ(m) = 1 / ( N! Σ_{i=2}^{N} (−1)^i / i! ),  ∀m ∈ M,  (2)

where the denominator is the number of derangements of N items. We assume that there is no cost for requesting service, and that the users always have tasks to solve. Hence, each user requests service in every period. A user pays a high or low cost to exert high or low effort, respectively, as a worker. We assume that the cost of high or low effort is the same across all the users. A user receives a high or low benefit as a requester if the matched worker exerts high or low effort, respectively. We also assume that the high or low benefit is the same across all the users. Since only the difference between the costs of high effort and low effort matters, we normalize the worker's cost of exerting low effort to 0, and set the cost of exerting high effort to c > 0.
For the same reason, we normalize the requester's received benefit from the worker's low effort to 0, and set the benefit from high effort to b > 0. Hence,

TABLE II
THE GIFT-GIVING GAME BETWEEN A REQUESTER AND A WORKER

          | high effort | low effort
request   | (b, −c)     | (0, 0)

the interaction of a requester and a worker can be summarized by the gift-giving game in Table II, where the row player is the requester and the column player is the worker. We can see that in the unique Nash equilibrium of the gift-giving game, the worker exerts low effort, which results in a zero payoff for both the requester and the worker. We are interested in the scenario where b > c, namely where exerting high effort leads to the social optimum. Our goal is to design an incentive scheme such that it is in the workers' self-interest to exert high effort.

B. Binary Reputation Mechanisms

A powerful incentive scheme is the reputation mechanism, which exploits the repeated interaction of the requesters and the workers, and assigns each user a reputation as a summary of its past behavior. In this paper, we focus on the simplest reputation mechanisms, namely binary reputation mechanisms, and show that binary reputation mechanisms can achieve the social optimum. A binary reputation mechanism assigns binary reputations to the users. Denote the set of binary reputations by Θ = {0, 1}, and the reputation of user i by θ_i, i ∈ N. The reputation profile is defined as the vector of all the users' reputations, θ = (θ_1, ..., θ_N). The reputation profile contains the users' private information and is not known to the users. However, an important statistic, namely the reputation distribution, can be listed on the platform and observed by all the users. A reputation distribution is defined as a tuple s(θ) = (s_0(θ), s_1(θ)), where s_1(θ) = Σ_{i∈N} θ_i is the number of users with reputation 1, and s_0(θ) = Σ_{i∈N} (1 − θ_i) is the number of users with reputation 0. The set of all possible reputation distributions is denoted by S. Note, however, that when a worker is matched to a requester, it is informed by the platform of the requester's reputation.
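As a quick sanity check, the gift-giving game of Table II can be sketched in a few lines of Python; the numerical values b = 3 and c = 1 are illustrative, not from the paper:

```python
def gift_giving_payoffs(effort, b=3.0, c=1.0):
    """Payoffs (requester, worker) of one interaction in the gift-giving game:
    high effort (1) yields benefit b to the requester at cost c to the worker;
    low effort (0) is normalized to (0, 0)."""
    return (b, -c) if effort == 1 else (0.0, 0.0)

# The worker's dominant action in the one-shot game is low effort ...
assert gift_giving_payoffs(0)[1] > gift_giving_payoffs(1)[1]
# ... yet mutual high effort is socially optimal whenever b > c.
assert sum(gift_giving_payoffs(1)) > sum(gift_giving_payoffs(0))
```

This is exactly the tension the reputation mechanism must resolve: the stage game's unique Nash equilibrium yields zero payoffs, while high effort yields b − c per user.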
As described before, a user always requests service as a requester because there is no cost to request. Hence, a user only needs to determine its effort level as a worker. Consequently, we

model each user's action as a contingent plan of exerting high or low effort based on its own reputation and the reputation of the requester matched to it. Formally, each user i's action, denoted by α_i, is a mapping α_i : Θ × Θ → Z, where Z = {0, 1} is the set of effort levels, with z = 0 representing low effort. Then α_i(θ_{m(i)}, θ_i) denotes user i's effort level as a worker when it is matched to a requester with reputation θ_{m(i)}. We write the action set as A = {α | α : Θ × Θ → Z}, and the joint action profile of all the users as α = (α_1, ..., α_N). After each worker exerts effort based on its action, the requester reports its assessment of the effort level to the platform. The report is defined as a mapping R : Z → Δ(Z), where Δ(Z) is the set of probability distributions over Z. For example, R(1 | z) is the probability that the requester reports high effort given the worker's actual effort level z. An example report mapping is

R(z' | z) = 1 − ε, if z' = z;  ε, if z' ≠ z,  (3)

where ε ∈ [0, 0.5) is the report error probability.¹ This report error may be caused by the requester's inaccurate assessment of the effort level and by system errors of the platform (e.g., loss of reports). Since the users are homogeneous, we assume the same report mapping R for all the users. Based on the requester's report, the platform updates the worker's reputation according to the reputation update rule. Again, since the users are homogeneous, we assume that the reputation update rule is the same for all the users. The reputation update rule, denoted by τ, is defined as a mapping τ : Θ × Θ × Z → Δ(Θ). For example, τ(θ_w' | θ_r, θ_w, z') is the probability of the worker's updated reputation θ_w', given the reputation of the requester θ_r, the worker's own reputation θ_w, and the requester's report z'. We focus on a class of reputation update rules

¹We confine the report error probability ε to be smaller than 0.5, because with an error probability ε = 0.5, the report contains no information about the effort level.
If an error probability is larger than 0.5, the platform can use the opposite of the report as an indication of the effort level, which is equivalent to the case with an error probability smaller than 0.5.
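The report mapping in (3) is a binary symmetric channel with crossover probability ε; a small Monte Carlo sketch (the parameter values are illustrative):

```python
import random

def report(z, eps, rng):
    """Noisy report of Eq. (3): returns the true effort level z with
    probability 1 - eps and the opposite level with probability eps."""
    return z if rng.random() >= eps else 1 - z

# The empirical flip rate should be close to eps.
rng = random.Random(0)
eps = 0.1
n = 100_000
flips = sum(report(1, eps, rng) != 1 for _ in range(n))
assert abs(flips / n - eps) < 0.01
```

For ε > 0.5 the platform would simply invert the reports, which is why ε is confined to [0, 0.5).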

defined as

τ(θ_w' | θ_r, θ_w, z') =
  β⁺_{θ_w},       if θ_w' = 1 and z' ≥ α_0(θ_r, θ_w),
  1 − β⁺_{θ_w},   if θ_w' = 0 and z' ≥ α_0(θ_r, θ_w),
  1 − β⁻_{θ_w},   if θ_w' = 1 and z' < α_0(θ_r, θ_w),
  β⁻_{θ_w},       if θ_w' = 0 and z' < α_0(θ_r, θ_w),

for θ_w = 0, 1, where α_0 is the platform's recommended action, based on which the reputation is updated. In the above reputation update rule, if the reported effort level is not lower than the one specified by the recommended action, a worker with reputation θ_w will have reputation 1 with probability β⁺_{θ_w}; otherwise, it will have reputation 0 with probability β⁻_{θ_w}. In summary, a reputation mechanism (Θ, τ) is completely determined by the set of reputations Θ and the reputation update rule τ. Finally, we summarize the repeated interaction among the users in the crowdsourcing platform with a reputation mechanism as follows.
1) At the beginning of period t, the platform displays the current reputation distribution s(θ), and announces the recommended action α_0.
2) Each user i requests service, following which it is matched as a worker to the task of user m(i) with probability µ(m). It is then informed by the platform of the requester's reputation θ_{m(i)}. According to its action α_i, it chooses its effort level α_i(θ_{m(i)}, θ_i).
3) Each requester reports its assessment of the effort level to the platform.
4) At the end of period t, the platform updates the reputation profile based on the reputation update rule τ.

IV. STOCHASTIC GAME FORMULATION

In this section, we first formulate the repeated interaction among the users in the crowdsourcing platform with reputation mechanisms as a stochastic game. Then we restrict the users' strategies based on the users' knowledge of the system, and define the corresponding equilibrium under these restrictions. Finally, we define the platform designer's problem.
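To make the reputation update rule τ above concrete, here is a minimal simulation sketch; the dictionary-based parameterization and all numerical values are illustrative:

```python
import random

def update_reputation(theta_w, theta_r, reported_z, alpha0,
                      beta_plus, beta_minus, rng):
    """One application of the update rule tau: if the reported effort meets the
    recommendation, the worker gets reputation 1 with prob. beta_plus[theta_w];
    otherwise it gets reputation 0 with prob. beta_minus[theta_w]."""
    if reported_z >= alpha0(theta_r, theta_w):       # report meets recommendation
        return 1 if rng.random() < beta_plus[theta_w] else 0
    return 0 if rng.random() < beta_minus[theta_w] else 1

# Example: altruistic recommendation; a compliant worker with reputation 0
# should move up with probability beta_plus[0] = 0.9.
alpha_a = lambda theta_r, theta_w: 1
beta_plus, beta_minus = {0: 0.9, 1: 0.9}, {0: 0.9, 1: 0.9}
rng = random.Random(1)
n = 100_000
ups = sum(update_reputation(0, 1, 1, alpha_a, beta_plus, beta_minus, rng)
          for _ in range(n))
assert abs(ups / n - beta_plus[0]) < 0.01
```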

A. The Stochastic Game

A stochastic game is described by the set of players, the set of states with the state transition probability, the set of actions, and the players' stage-game payoff functions. We consider the platform as a player, indexed by 0, and define the set of players as {0} ∪ N. The state is defined as the reputation profile θ, and the set of states as Θ^N. The platform's action is the recommended action, which is defined in the same way as the users' actions, namely α_0 : Θ × Θ → {0, 1}. We write the state transition probability as q(θ' | θ, α_0, α), which is the probability that the next state is θ' given the current state θ, the platform's recommended action α_0, and the joint action profile α. Note that the state transition probability q is determined by the matching rule µ, the report function R, and the reputation update rule τ. Finally, each user i's stage-game payoff function is u_i(θ, α_0, α), which depends on² the state θ and the joint action profile α. We define the platform's payoff as a constant, u_0(θ, α_0, α) = 0 for all θ, α_0, α, such that it will follow the platform designer's decisions on how to choose the recommended action. Note that the platform designer's goal (or payoff) is the social welfare, which will be defined later. The users interact in the platform repeatedly. Hence, each user should have a strategy, which is a contingent plan of which action to take based on the history. The history is the collection of the past and current states (i.e., reputation profiles) and the user's own service qualities received from the workers. This history is different for different users, who have different records of service qualities, and is thus called the private history. By contrast, we define the public history as the collection of states, which is the same for all the users. In this paper, we focus on public strategies, which determine the actions based on public histories only.
Public strategies are less complicated than private strategies, and more importantly, will be proven able to achieve the social optimum. Denote the public history at period t as h^t = (θ^0, ..., θ^t), where θ^t is the reputation profile at the beginning of period t, and the set of public histories at period t as H^t = (Θ^N)^{t+1}. Then each user i's strategy is a mapping π_i : ∪_{t=0}^∞ H^t → A from the set of all possible histories to the action set. Similarly, we define the platform's recommended strategy as π_0 : ∪_{t=0}^∞ H^t → A.

²Note that, although the stage-game payoff does not depend on the recommended action, we write the recommended action α_0 as an argument of the payoff function following game-theoretic convention. However, the recommended action does affect the long-term payoff through the state transition probability.

The joint strategy profile of all the users is written as π = (π_1, ..., π_N). We write the set of all strategies and the set of all strategy profiles as Π and Π^N, respectively. Given the matching rule µ, the initial reputation profile θ^0, the report function R, and the reputation mechanism (Θ, τ), the recommended strategy π_0 and the joint strategy profile π induce a probability distribution over the set of all histories H. Taking the expectation with respect to this probability distribution, each user i receives a discounted average payoff U_i(θ^0, π_0, π) defined as

U_i(θ^0, π_0, π) = E_h { (1 − δ) Σ_{t=0}^∞ δ^t u_i(θ^t, π_0(h^t), π(h^t)) },

where δ ∈ [0, 1) is the common discount factor of all the users. The discount factor is the rate at which the users discount future payoffs, and reflects the patience of the users: a more patient user has a larger discount factor.

B. Symmetric Feasible Strategy Profiles

First, since the users are homogeneous, we assume that all the users adopt the same strategy, namely π_i = π for all i ∈ N. In particular, we write the symmetric joint strategy profile as π · 1_N, where 1_N is a vector of 1's of length N. Note that although all the users adopt the same action in each period, they may choose different effort levels, since their own and their requesters' reputations differ. Second, since the users know the reputation distributions only, but not the reputation profiles, we restrict the users' strategies to be indifferent among reputation profiles that have the same reputation distribution. We call such strategies feasible strategies, since it is infeasible for the users to adopt strategies that require the knowledge of reputation profiles. Formally, we define feasible strategies as follows.

Definition 1: A strategy π is feasible if, for all t ≥ 0 and for all h^t, h̃^t ∈ H^t, we have

π(h^t) = π(h̃^t), if s(θ^k) = s(θ̃^k), ∀k = 0, 1, ..., t.  (4)

We write the set of all feasible strategies as Π_f.
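The discounted average payoff U_i can be evaluated on any realized payoff stream; a minimal sketch, using a finite horizon to approximate the infinite sum (all values illustrative):

```python
def discounted_average(payoffs, delta):
    """(1 - delta) * sum_t delta^t * u_t over a finite payoff stream, i.e. the
    quantity inside the expectation defining U_i for one realized history."""
    return (1 - delta) * sum((delta ** t) * u for t, u in enumerate(payoffs))

# A constant stream of b - c per period has discounted average
# (b - c) * (1 - delta^T), which approaches the social optimum b - c as T grows.
b, c, delta, T = 3.0, 1.0, 0.9, 500
val = discounted_average([b - c] * T, delta)
assert abs(val - (b - c) * (1 - delta ** T)) < 1e-9
```

The (1 − δ) normalization makes U_i directly comparable with the stage-game payoffs, so the social optimum is exactly b − c.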
In this paper, we focus on symmetric feasible strategy profiles, defined as follows.

Definition 2: A symmetric feasible strategy profile is a symmetric strategy profile π · 1_N in which the strategy π is feasible, namely π ∈ Π_f. We write the set of all symmetric feasible strategy profiles as Π_f^N.

If the users adopt symmetric feasible strategies, their stage-game payoffs can be calculated as

u_i(θ, α_0, (α_i, α · 1_{N−1})) = b [ (s_{θ_i} − 1)/(N − 1) · α(θ_i, θ_i) + s_{1−θ_i}/(N − 1) · α(θ_i, 1 − θ_i) ] − c [ (s_{θ_i} − 1)/(N − 1) · α_i(θ_i, θ_i) + s_{1−θ_i}/(N − 1) · α_i(1 − θ_i, θ_i) ],  (5)

where (α_i, α · 1_{N−1}) is the action profile in which user i chooses α_i and all the other users choose α. We can see that each user's payoff is determined by the reputation distribution s(θ) only. Finally, since the action set A has 2^4 = 16 elements, the complexity of choosing an action is large for the users. Hence, we consider strategies that choose actions from a subset B ⊆ A, and define Π_f(B) as the set of symmetric feasible strategies restricted to the subset of actions B. We are particularly interested in two subsets of actions: A_afs = {α_a, α_f, α_s} with three actions, and A_as = {α_a, α_s} with two actions. Specifically, the action α_a is the altruistic action, defined as

α_a(θ_r, θ_w) = 1, ∀θ_r, θ_w ∈ {0, 1},  (6)

where the worker exerts high effort regardless of the worker's and the requester's reputations. The action α_f is the fair action, defined as

α_f(θ_r, θ_w) = 0 if θ_w > θ_r, and 1 if θ_w ≤ θ_r,  (7)

where the worker exerts high effort only when the requester has a higher or equal reputation. The action α_s is the selfish action, defined as

α_s(θ_r, θ_w) = 0, ∀θ_r, θ_w ∈ {0, 1},  (8)

where the worker exerts low effort regardless of the worker's and the requester's reputations. As we will show later, a pair of a recommended strategy and a strategy profile (π_0, π · 1_N) ∈ Π_f(A_afs) × Π_f^N(A_afs) can achieve the social optimum at the equilibrium if we design the reputation update rules carefully, while any pair (π_0, π · 1_N) ∈ Π_f(A_as) × Π_f^N(A_as) incurs a performance loss at the equilibrium under all reputation update rules unless the reputation update error ε = 0.
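The three actions and the symmetric stage payoff in (5) translate directly into code; a sketch with illustrative parameter values:

```python
def alpha_a(theta_r, theta_w):      # altruistic: always exert high effort
    return 1

def alpha_f(theta_r, theta_w):      # fair: high effort iff the requester's
    return 1 if theta_r >= theta_w else 0   # reputation is at least the worker's

def alpha_s(theta_r, theta_w):      # selfish: always exert low effort
    return 0

def stage_payoff(theta_i, s, alpha_i, alpha, N, b, c):
    """Eq. (5): expected stage payoff of a user with reputation theta_i when
    the reputation distribution is s = (s_0, s_1), it plays alpha_i as a
    worker, and all other users play alpha."""
    same, other = s[theta_i] - 1, s[1 - theta_i]
    benefit = b * (same * alpha(theta_i, theta_i)
                   + other * alpha(theta_i, 1 - theta_i)) / (N - 1)
    cost = c * (same * alpha_i(theta_i, theta_i)
                + other * alpha_i(1 - theta_i, theta_i)) / (N - 1)
    return benefit - cost

N, b, c, s = 10, 3.0, 1.0, (4, 6)
# When everyone is altruistic, each user earns the social optimum b - c.
assert stage_payoff(1, s, alpha_a, alpha_a, N, b, c) == b - c
# A one-period deviation to the selfish action saves the cost but keeps the
# benefit; the punishment must come later through the reputation update.
assert stage_payoff(1, s, alpha_s, alpha_a, N, b, c) == b
```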

C. Definition of the Equilibrium

As in the game-theory literature on stochastic games [10][12], we use public perfect equilibrium (PPE) as the equilibrium concept, but adapt PPE to our setting because we focus on feasible strategy profiles. In a PPE, no user has an incentive to deviate following any history. This is a stronger requirement than that of the Nash equilibrium (NE), which only requires that the users do not deviate following the histories that occur on the equilibrium path. However, in our setting, due to the reputation update error, all the states occur with positive probability under all strategy profiles. Hence, NE is equivalent to PPE, which makes PPE the appropriate equilibrium concept. Before we define PPE, we need to define the continuation strategy π|_{h^t}, which is a mapping π|_{h^t} : ∪_{k=0}^∞ H^k → A with π|_{h^t}(h^k) = π(h^t h^k), where h^t h^k is the concatenation of h^t and h^k. Keeping in mind that we focus on symmetric feasible strategy profiles, we formally define the symmetric feasible PPE (SF-PPE) as follows.

Definition 3: A pair of a feasible recommended strategy and a symmetric feasible strategy profile (π_0, π · 1_N) ∈ Π_f × Π_f^N is a SF-PPE if, for all t ≥ 0, for all h^t ∈ H^t, and for all i ∈ N, we have

U_i(θ^t, π_0|_{h^t}, π|_{h^t} · 1_N) ≥ U_i(θ^t, π_0|_{h^t}, (π_i|_{h^t}, π|_{h^t} · 1_{N−1})), ∀π_i|_{h^t} ∈ Π_f,  (9)

where (π_i|_{h^t}, π|_{h^t} · 1_{N−1}) is the continuation strategy profile in which user i deviates to π_i|_{h^t} and the other users follow the strategy π|_{h^t}. The major difference between SF-PPE and conventional PPE is that in PPE, we need to prevent the users from deviating to any strategy π' ∈ Π, whereas in SF-PPE, we need to prevent deviations to feasible strategies π' ∈ Π_f only, because the users do not know the reputation profiles.

D. The Platform Design Problem

The goal of the platform designer is to maximize the social welfare at the equilibrium in the worst case (with respect to different initial reputation distributions). Hence, the platform design

problem can be formulated as follows:

max_{τ, (π_0, π · 1_N) ∈ Π_f × Π_f^N}  min_{θ^0 ∈ Θ^N}  (1/N) Σ_{i∈N} U_i(θ^0, π_0, π · 1_N)  (10)
s.t.  (π_0, π · 1_N) is a SF-PPE.

The social optimum is b − c, achieved by the workers exerting high effort all the time, which by itself is not an equilibrium strategy. In the next section, we show that under properly designed reputation update rules, the social optimum b − c can be asymptotically achieved at the SF-PPE when the users are sufficiently patient (i.e., the discount factor δ → 1).

V. SOCIALLY OPTIMAL DESIGN

In this section, we design reputation mechanisms that asymptotically achieve the social optimum at the equilibrium, even when the reputation update error ε > 0. In our design, we use the APS technique, named after the authors of the seminal paper [11], which is also used to prove the folk theorem for repeated games in [9] and for stochastic games in [10]. We will briefly introduce the APS technique first. More importantly, we will illustrate why we cannot use APS in our setting in the same way as [9] and [10] did. Then, we will show how we use APS in a different way in our setting, in order to design the optimal reputation mechanism and to construct the equilibrium strategy. Finally, we analyze the performance of a class of simple but suboptimal strategies, which sheds light on why the proposed strategy can achieve the social optimum.

A. The APS Technique

APS [11] provides a characterization of the set of PPE payoffs. It builds on the idea of self-generating sets, described as follows. Define a set W_θ ⊆ R^N for every state θ ∈ Θ^N, and write (W_θ)_{θ∈Θ^N} as the collection of these sets. Then we have the following definitions [11][10][12].
First, we say a payoff profile v(θ) ∈ R^N is decomposable on (W_θ)_{θ∈Θ^N} given θ, if there exists a recommended action α_0*, an action profile α*, and a continuation payoff function γ : Θ^N → ∪_{θ'∈Θ^N} W_{θ'} with γ(θ') ∈ W_{θ'}, such that for all i ∈ N and for all α_i' ∈ A,

v_i = (1 − δ) u_i(θ, α_0*, α*) + δ Σ_{θ'} γ_i(θ') q(θ' | θ, α_0*, α*)
    ≥ (1 − δ) u_i(θ, α_0*, (α_i', α*_{−i})) + δ Σ_{θ'} γ_i(θ') q(θ' | θ, α_0*, (α_i', α*_{−i})).  (11)
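The one-shot condition in (11) can be checked numerically. Below is a sketch on a deliberately simplified single-user example with two states (the user's own reputation) and two actions (effort levels); all numbers are illustrative and are not the paper's construction:

```python
def decomposes(v_i, u_fn, q_fn, gamma_i, states, delta, actions, a_star):
    """Check Eq. (11) for one user: v_i must equal the value of the recommended
    action a_star, and no one-period deviation may achieve a higher value."""
    def value(a):
        return (1 - delta) * u_fn(a) + delta * sum(
            gamma_i[s] * q_fn(a)[s] for s in states)
    return (abs(value(a_star) - v_i) < 1e-9
            and all(value(a) <= value(a_star) + 1e-9 for a in actions))

# High effort decomposes when continuation payoffs reward reputation 1 enough.
b, c, delta = 3.0, 1.0, 0.9
gamma = {1: 2.0, 0: 0.0}                 # continuation payoff by next reputation
q = {1: {1: 0.9, 0: 0.1},                # high effort: reputation likely rises
     0: {1: 0.1, 0: 0.9}}                # low effort: reputation likely falls
u = {1: b - c, 0: b}                     # deviating saves the cost c today
assert decomposes(1.82, lambda a: u[a], lambda a: q[a], gamma,
                  (0, 1), delta, (0, 1), a_star=1)
```

The deviation gains (1 − δ)c today but loses δ(0.9 − 0.1)(γ(1) − γ(0)) in continuation value, so a sufficiently patient user complies.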

Then, we say a collection (W_θ)_{θ∈Θ^N} is a self-generating set if, for any θ, every payoff profile v(θ) ∈ W_θ is decomposable on (W_θ)_{θ∈Θ^N} given θ. The important property of self-generating sets is that any self-generating set is a set of PPE payoffs [11][10][12]. Based on the idea of self-generating sets, [9] and [10] proved the folk theorem for repeated games and stochastic games, respectively. However, we cannot use APS in the same way as [9] and [10] did, for the following two reasons. First, we restrict our attention to symmetric feasible strategy profiles, which imposes the following constraints on the self-generating sets and the continuation payoff functions.

Lemma 1: Suppose that a self-generating set (W_θ)_{θ∈Θ^N} is a set of SF-PPE payoffs. Then for all θ, every payoff profile v(θ) ∈ W_θ should satisfy

v_i(θ) = v_j(θ), ∀i, j ∈ N s.t. θ_i = θ_j,  (12)

and should be decomposed by a continuation payoff function such that for all θ',

γ_i(θ') = γ_j(θ'), ∀i, j ∈ N s.t. θ_i' = θ_j'.  (13)

Proof: For a pair of a feasible recommended strategy and a symmetric feasible strategy profile (π_0, π · 1_N), we have for all θ,

U_i(θ, π_0, π · 1_N) = U_j(θ, π_0, π · 1_N), ∀i, j ∈ N s.t. θ_i = θ_j.  (14)

Since for all θ, any payoff profile v(θ) ∈ W_θ can be achieved by a SF-PPE, it should satisfy

v_i(θ) = v_j(θ), ∀i, j ∈ N s.t. θ_i = θ_j.  (15)

Since the range of the continuation payoff function lies in the self-generating set, for the same reason, any continuation payoff function should satisfy, for all θ',

γ_i(θ') = γ_j(θ'), ∀i, j ∈ N s.t. θ_i' = θ_j'.  (16)

Since the restrictions in Lemma 1 are not considered in [9][10], we cannot apply APS in the same way as [9] and [10] did. Moreover, when deriving the sufficient conditions for the folk theorem to hold, both [9] and [10] use the NE of the stage game to decompose the payoffs in the interior of the self-generating set. This prevents them from obtaining the lower bound on the discount factor. More specifically,

for any feasible payoff profile v, they can prove that there exists a threshold discount factor such that under any larger discount factor, the payoff profile v can be achieved at a PPE. However, they cannot obtain this lower bound analytically. Because of this, they cannot construct the equilibrium strategy that achieves v, since a feasible discount factor is needed to compute the strategy. In contrast, for a payoff profile v that is arbitrarily close to the social optimum (b − c) · 1_N, we can analytically determine the lower bound of feasible discount factors, and thus can construct the equilibrium strategy that achieves v based on a feasible discount factor.

B. Socially Optimal Design

Now we use APS in a different way to prove that the social optimum can be achieved at a SF-PPE. For given ɛ_0 and ɛ_1, we construct the following self-generating set. For θ with 1 ≤ s_1(θ) ≤ N − 1,

W_θ = { v : v_i = v_{θ_i}, ∀i ∈ N },  (17)

where (v_0, v_1) satisfies the inequalities (18)-(20), which bound v_1, the gap v_1 − v_0, and their distance to the social optimum in terms of b, c, N, and ɛ_0 − ɛ_1. For θ with s_1(θ) = 0,

W_{0 · 1_N} = { v : v_i = v_0, ∀i ∈ N },  (21)

where v_0 lies in an interval determined by (18)-(20). For θ with s_1(θ) = N,

W_{1 · 1_N} = { v : v_i = v_1, ∀i ∈ N },  (22)

where v_1 lies in a similar interval. We can see that each set W_θ is bounded by hyperplanes, which is different from the smooth sets (e.g., balls) used in [9] and [10].

Apart from the restrictions in Lemma 1, we impose more restrictions on the continuation payoff function to make it even simpler. Specifically, we focus on the simple continuation payoff functions defined below.

Definition 4: We say a continuation payoff function γ : Θ^N → ∪_θ W_θ is a simple continuation payoff function if, for any given state θ and any given payoff profile v ∈ W_θ, γ satisfies

γ_i(θ′) = γ_j(θ″), ∀θ′, θ″ ∈ Θ^N and ∀i, j ∈ N s.t. θ′_i = θ″_j.

Given a state θ and a payoff v ∈ W_θ, a simple continuation payoff function assigns the same continuation payoff to the users with the same reputation under all future states.

Using the above self-generating set and simple continuation payoff functions, we prove that, under certain conditions on the reputation update rule τ and the discount factor δ, the social optimum can be asymptotically achieved at a SF-PPE. For better presentation of the theorem, we first define a few auxiliary variables: the constants κ_1 and κ_2, determined by b, c, and N, and the threshold δ̄, which will be proved to be the lower bound on the feasible discount factors. The threshold δ̄ is the maximum of a finite collection of closed-form expressions in b, c, N, ε, β_θ^±, and ε_0 − ε_1 (see (23)); what matters for the construction is that δ̄ < 1 and that it can be computed analytically.

Theorem 1: Choose any small real numbers ε_1 > 0 and ε_0 ∈ (ε_1, (1 + κ_2/κ_1)ε_1). If the following three sets of conditions are satisfied:

Condition 1: β_1^+ > 1 − β_1^−, and x_1^+ ≜ (1 − ε)β_1^+ + ε(1 − β_1^−) is larger than a threshold determined by b, c, and N;

Condition 2: β_0^+ > 1 − β_0^−, and x_0^+ ≜ (1 − ε)β_0^+ + ε(1 − β_0^−) is smaller than a threshold determined by β_1^+, b, c, and N;

Condition 3: δ ∈ [δ̄, 1);

then there exists a SF-PPE (π_0, π_0·1_N) ∈ Π_f(A_afs) × Π_f^N(A_afs) that achieves

U_i(θ^0, π_0, π_0·1_N) = b − c − ε_{θ_i^0}, ∀i ∈ N,  (24)

starting from any initial reputation profile θ^0.

Proof: See Appendix A.
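The explicitly stated parts of Conditions 1 and 2, namely β_θ^+ > 1 − β_θ^− and the definition of the error-corrected up-probability x_θ^+, can be checked numerically. A minimal sketch, with hypothetical update-rule parameters (the closed-form thresholds of Conditions 1–2 are not reproduced here):

```python
def x_plus(beta_plus, beta_minus, eps):
    """Probability that the reputation goes up under recommended effort:
    the report is correct w.p. 1-eps (up w.p. beta_plus) and flipped
    w.p. eps (up w.p. 1-beta_minus)."""
    return (1 - eps) * beta_plus + eps * (1 - beta_minus)

def enforceable(beta_plus, beta_minus):
    """Part of Conditions 1-2: up-probability under high effort
    must exceed up-probability under low effort."""
    return beta_plus > 1 - beta_minus

# hypothetical reputation update rule (not from the paper)
eps = 0.1
bp1, bm1 = 0.9, 0.8   # beta_1^+, beta_1^-
bp0, bm0 = 0.6, 0.9   # beta_0^+, beta_0^-

print(enforceable(bp1, bm1), enforceable(bp0, bm0))   # True True
x1, x0 = x_plus(bp1, bm1, eps), x_plus(bp0, bm0, eps)
print(round(x1, 3), round(x0, 3))                     # 0.83 0.55
```

Note that with these numbers x_1^+ > x_0^+, which is the ordering the two thresholds are designed to enforce.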

Theorem 1 gives the sufficient conditions under which the social optimum can be asymptotically achieved. The first two conditions are restrictions on the optimal reputation update rules. First, the probability that the reputation goes up when the effort level is not lower than the recommended one, β_θ^+, should be larger than 1 − β_θ^−, the probability that the reputation goes up when the effort level is lower than the recommended one. Second, for the users with reputation 1, the expected probability that the reputation goes up when the effort level is not lower than the recommended one, x_1^+, should be larger than the threshold specified in Condition 1. Meanwhile, for the users with reputation 0, the corresponding probability x_0^+ should be smaller than the threshold specified in Condition 2. In this way, a user prefers to have reputation 1, because x_0^+ < x_1^+. Condition 3 gives an analytical lower bound for the discount factor. This is important for constructing the equilibrium (π_0, π_0·1_N), because a feasible discount factor is needed to determine the equilibrium. Note that in [9] and [10], the lower bound for the discount factor cannot be obtained analytically.

Remark 1: The SF-PPE has the desirable property that the users' strategy π = π_0 is the same as the recommended strategy. Hence, the platform can announce the recommended action in each period, and the users can simply follow it. In this way, the users do not need to calculate the action to take in each period by themselves.

Once the conditions for achieving the social optimum are satisfied, we know that an optimal equilibrium strategy exists. Now we show how to construct it. The optimal equilibrium strategy is not stationary, in that given the same state, the actions taken can be different at different periods.
Hence, at each period t, we need to determine the recommended action α_0^t, which is the same as the users' action α^t. The algorithm for constructing the recommended strategy, given a sufficiently large discount factor, is described in Table III.

Remark 2: Note that in the algorithm, if all the users have the same reputation, only the altruistic action α_a and the selfish action α_s are used; otherwise, only the altruistic action α_a and the fair action α_f are used.
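The incentive behind Conditions 1 and 2, that a user prefers reputation 1 because x_0^+ < x_1^+, can be seen in one line: whenever the continuation payoff of reputation 1 exceeds that of reputation 0, the expected continuation payoff is increasing in the up-probability. A small sketch with hypothetical numbers (γ^1, γ^0, x_1^+, x_0^+ below are illustrative, not values from the paper):

```python
def expected_continuation(x_plus, gamma1, gamma0):
    """Expected continuation payoff of a user whose reputation
    rises with probability x_plus."""
    return x_plus * gamma1 + (1 - x_plus) * gamma0

# hypothetical values satisfying gamma1 > gamma0 and x1 > x0
gamma1, gamma0 = 0.95, 0.70
x1, x0 = 0.83, 0.55
hi = expected_continuation(x1, gamma1, gamma0)
lo = expected_continuation(x0, gamma1, gamma0)
print(hi > lo)  # True: reputation 1 is strictly preferred
```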

TABLE III
THE ALGORITHM OF CONSTRUCTING THE EQUILIBRIUM STRATEGY.

Require: ε_0, ε_1, δ, initial state θ
Initialization: t = 0, v^θ = b − c − ε_θ.
repeat
  if s_1(θ) = 0 then
    if v^0 is at least the threshold determined by (1 − δ), b, c, ε, β_0^+, and β_0^− then
      α_0^t = α^t = α_a; update v^0 by the decomposition equality
    else
      α_0^t = α^t = α_s; update v^0 and v^1 accordingly
  else if s_1(θ) = N then
    if v^1 is at least the corresponding threshold (with β_1^+ and β_1^−) then
      α_0^t = α^t = α_a; update v^1 by the decomposition equality
    else
      α_0^t = α^t = α_s; update v^1 and v^0 accordingly
  else (1 ≤ s_1(θ) ≤ N − 1)
    if v^1 is at least the threshold determined by δ, b, c, N, x_0^+, and x_1^+ then
      α_0^t = α^t = α_a; update v^1 and v^0 via the continuation payoffs
    else
      α_0^t = α^t = α_f; update v^1 and v^0 via the continuation payoffs
  t ← t + 1
until the end of the game

C. Performance Analysis of Strategies Restricted to A_as

Previously, we have shown that under certain conditions, the social optimum can be achieved by feasible recommended strategies and symmetric feasible strategy profiles restricted to the action subset A_afs. Now we consider an even simpler class of strategies, restricted to the action subset A_as. The following proposition quantifies the performance loss of the optimal feasible recommended strategy and symmetric feasible strategy profile restricted to A_as. Before stating the proposition, we first define ρ(θ, α_0, S_B) as in (25): a ratio that compares, for the worst-case user i and the worst of the three deviating actions (α_i = α_0, α_i = α_1, and α_i = α_01), the term (s_{θ_i}(θ) − 1)/(N − 1) against the deviation-detection gap

Σ_{s∈S\S_B} [ q(s|θ, α_0, α_a·1_N) − q(s|θ, α_0, α_i, α_a·1_{N−1}) ].  (25)

Proposition 1: Starting from any initial reputation profile θ, the maximum social welfare achievable by feasible recommended strategies and symmetric feasible strategy profiles restricted to A_as, namely (π_0, π·1_N) ∈ Π_f(A_as) × Π_f^N(A_as), is at most

b − c − c · ρ(θ, α_0^*, S_B^*) · Σ_{s∈S_B^*} q(s|θ, α_0^*, α_a·1_N),  (26)

where α_0^*, the optimal recommended action, and S_B^*, the optimal subset of reputation distributions, are the solutions to the following optimization problem:

min_{α_0} min_{S_B⊆S} ρ(θ, α_0, S_B) Σ_{s∈S_B} q(s|θ, α_0, α_a·1_N)  (27)
s.t. Σ_{s∈S\S_B} q(s|θ, α_0, α_a·1_N) > Σ_{s∈S\S_B} q(s|θ, α_0, α_i = α_0, α_a·1_{N−1}), ∀i ∈ N,
     Σ_{s∈S\S_B} q(s|θ, α_0, α_a·1_N) > Σ_{s∈S\S_B} q(s|θ, α_0, α_i = α_1, α_a·1_{N−1}), ∀i ∈ N,
     Σ_{s∈S\S_B} q(s|θ, α_0, α_a·1_N) > Σ_{s∈S\S_B} q(s|θ, α_0, α_i = α_01, α_a·1_{N−1}), ∀i ∈ N.

Proof: See Appendix B.

The following corollary shows that the maximum social welfare achievable by (π_0, π·1_N) ∈ Π_f(A_as) × Π_f^N(A_as) is bounded away from the social optimum b − c, unless there is no reputation update error. In contrast, if we can use the fair action α_f, the social optimum can be asymptotically

achieved when the discount factor goes to 1. In other words, the differential punishment introduced by the fair action is crucial for achieving the social optimum.

Corollary 1: Suppose that the reputation update error ε > 0. Then, starting from any initial reputation profile θ, the maximum social welfare achievable by feasible recommended strategies and symmetric feasible strategy profiles restricted to A_as, namely (π_0, π·1_N) ∈ Π_f(A_as) × Π_f^N(A_as), is bounded away from the social optimum b − c, regardless of the discount factor.

Proof: We can see from (26) that the distance between the maximum social welfare achievable by (π_0, π·1_N) ∈ Π_f(A_as) × Π_f^N(A_as) and the social optimum b − c is

c · ρ(θ, α_0^*, S_B^*) · Σ_{s∈S_B^*} q(s|θ, α_0^*, α_a·1_N),  (28)

which is independent of the discount factor. Moreover, since ρ(θ, α_0^*, S_B^*) > 0, the distance can be zero only when Σ_{s∈S_B^*} q(s|θ, α_0^*, α_a·1_N) = 0. However, when the reputation update error ε is strictly larger than 0, any reputation distribution occurs with positive probability. Hence, we must choose S_B^* = ∅ to have Σ_{s∈S_B^*} q(s|θ, α_0^*, α_a·1_N) = 0. In this case, the constraint

Σ_{s∈S\S_B} q(s|θ, α_0, α_a·1_N) > Σ_{s∈S\S_B} q(s|θ, α_0, α_i = α_0, α_a·1_{N−1})  (29)

cannot be satisfied, since with S_B = ∅ both sides sum over all of S, and Σ_{s∈S} q(s|θ, α_0, α_a·1_N) = Σ_{s∈S} q(s|θ, α_0, α_i = α_0, α_a·1_{N−1}) = 1.

VI. SIMULATION RESULTS

We compare against the state-of-the-art reputation mechanism, namely the reputation mechanism with optimal stationary Markov strategies in [3]. The mechanism in [3] has been shown to outperform existing reputation mechanisms [1] and BitTorrent protocols [?]. In a stationary Markov strategy, the action to take depends on the current state only, independently of when the current state appears. In contrast, the proposed strategy depends on the history of states: the same current state may lead to different actions in different periods.
In our experiments, we consider a system with N = 10 users and a fixed discount factor δ. In the simulation of the optimal stationary Markov strategy [3], the recommended action is fixed to be the fair action α_f.
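Under a recommended strategy that all users follow, each user's reputation evolves as a two-state Markov chain that moves to reputation 1 with probability x_θ^+ from state θ. As an illustration of the resulting reputation dynamics (this is not the paper's simulation; the update probabilities below are hypothetical), the long-run fraction of reputation-1 users can be computed by iterating the chain:

```python
def stationary_rep1_fraction(x1, x0, iters=5000):
    """Long-run fraction of reputation-1 users: iterate the one-step map
    p -> p*x1 + (1-p)*x0 of the two-state reputation chain."""
    p = 0.5  # initial fraction of reputation-1 users
    for _ in range(iters):
        p = p * x1 + (1 - p) * x0
    return p

# hypothetical up-probabilities x_1^+, x_0^+
p = stationary_rep1_fraction(x1=0.83, x0=0.55)
# fixed point of p = p*x1 + (1-p)*x0 is x0 / (1 - x1 + x0)
print(round(p, 4))  # 0.7639
```

The iteration converges because the map is a contraction whenever |x1 − x0| < 1.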

Fig. 1. Comparison of the social welfare under different reputation update errors (normalized social welfare vs. error probability; curves: proposed, and stationary Markov with b = 2, 3, 4, 5).

We first compare the social welfare (i.e. the average payoff), normalized by the social optimum b − c, of the two reputation mechanisms under different benefit values. We can see from Fig. 1 that as the error probability grows, the social welfare of the optimal stationary Markov strategy decreases, and drops to 0 when the error probability is large (e.g. when ε > 0.4). Then, we compare the evolution of the states, the recommended actions, and the actions taken by the users under these two reputation mechanisms, in order to see how they work differently. The evolution under the optimal stationary Markov strategy is shown in Table IV, and that under the proposed strategy in Table V. The first difference is that in the reputation mechanism with stationary Markov strategies [3], the recommended action is fixed and the action taken can differ from the recommended action. On the contrary, in the proposed reputation mechanism, the recommended action is not fixed over time and the action taken is always the same as the recommended action. The second difference is that in stationary Markov strategies, the actions taken in different periods are the same as long as the current state is the same. In contrast, in the proposed strategy, the actions taken can be different even at the same current state.

VII. CONCLUSION

In this paper, we proposed a design framework for binary reputation mechanisms that can achieve the social optimum in the presence of reputation update errors. We discovered the

TABLE IV
EVOLUTION OF STATES AND ACTIONS TAKEN IN THE OPTIMAL STATIONARY MARKOV STRATEGY.

Period | State | Recommended action (stationary Markov) | Action taken (stationary Markov)
0 | (4,6) | Fair | Selfish
1 | (4,6) | Fair | Selfish
2 | (6,4) | Fair | Selfish
3 | (6,4) | Fair | Selfish
4 | (4,6) | Fair | Selfish
5 | (4,6) | Fair | Selfish
6 | (3,7) | Fair | Altruistic
7 | (1,9) | Fair | Altruistic
8 | (1,9) | Fair | Altruistic
9 | (2,8) | Fair | Altruistic

TABLE V
EVOLUTION OF STATES, RECOMMENDED ACTIONS, AND ACTIONS TAKEN IN THE PROPOSED STRATEGY.

Period | State | Recommended action (proposed) | Action taken (proposed)
0 | (4,6) | Altruistic | Altruistic
1 | (3,7) | Fair | Fair
2 | (5,5) | Altruistic | Altruistic
3 | (4,6) | Fair | Fair
4 | (5,5) | Altruistic | Altruistic
5 | (6,4) | Altruistic | Altruistic
6 | (4,6) | Fair | Fair
7 | (2,8) | Altruistic | Altruistic
8 | (1,9) | Altruistic | Altruistic
9 | (1,9) | Altruistic | Altruistic

enforceability conditions, which guarantee that the social optimum can be achieved. We proposed a class of reputation update rules, which can fulfill the enforceability conditions when the parameters are chosen carefully. Moreover, we constructed the optimal recommended strategy in the reputation mechanism. The computational complexity is greatly reduced, since only three actions are proved to be enough for the optimal strategy. Simulation results demonstrate the

significant performance gain of the proposed reputation mechanism over the state-of-the-art mechanisms, especially when the reputation update error probability is large.

APPENDIX A
PROOF OF THEOREM 1

A. Outline of the proof

We derive the conditions under which the set (W_θ)_{θ∈Θ^N} is a self-generating set. Specifically, we derive the conditions under which any payoff profile v ∈ W_θ is decomposable on (W_θ)_{θ∈Θ^N} given θ, for all θ ∈ Θ^N.

B. When users have different reputations

1) Preliminaries: We first focus on the states θ with 1 ≤ s_1(θ) ≤ N − 1, and derive the conditions under which any payoff profile v ∈ W_θ can be decomposed by (α_0 = α_a, α_a·1_N) or (α_0 = α_f, α_f·1_N).

First, v can be decomposed by (α_a, α_a·1_N) if there exists a continuation payoff function γ : Θ^N → ∪_{θ′} W_{θ′} with γ(θ′) ∈ W_{θ′}, such that for all i ∈ N and all α_i ∈ A,

v_i = (1 − δ) u_i(θ, α_a, α_a·1_N) + δ Σ_{θ′} γ_i(θ′) q(θ′|θ, α_a, α_a·1_N)  (30)
    ≥ (1 − δ) u_i(θ, α_a, (α_i, α_a·1_{N−1})) + δ Σ_{θ′} γ_i(θ′) q(θ′|θ, α_a, (α_i, α_a·1_{N−1})).

Since we focus on simple continuation payoff functions, all the users with the same future reputation have the same continuation payoff, regardless of the recommended action α_0, the action profile (α_i, α·1_{N−1}), and the future state θ′. Hence, we write the continuation payoffs of the users with future reputation 1 and 0 as γ^1 and γ^0, respectively. Consequently, the above decomposability conditions simplify to

v_i = (1 − δ) u_i(θ, α_a, α_a·1_N) + δ [ γ^1 Σ_{θ′:θ′_i=1} q(θ′|θ, α_a, α_a·1_N) + γ^0 Σ_{θ′:θ′_i=0} q(θ′|θ, α_a, α_a·1_N) ]  (31)
    ≥ (1 − δ) u_i(θ, α_a, (α_i, α_a·1_{N−1})) + δ [ γ^1 Σ_{θ′:θ′_i=1} q(θ′|θ, α_a, (α_i, α_a·1_{N−1})) + γ^0 Σ_{θ′:θ′_i=0} q(θ′|θ, α_a, (α_i, α_a·1_{N−1})) ].

First, consider the case where user i has reputation 1 (i.e. θ_i = 1). Based on (5), the stage-game payoff is u_i(θ, α_a, α_a·1_N) = b − c. The term Σ_{θ′:θ′_i=1} q(θ′|θ, α_a, α_a·1_N) is the probability that user i has reputation 1 in the next state. Since user i's reputation update is independent of the other users' reputation updates, we can calculate this probability as

Σ_{θ′:θ′_i=1} q(θ′|θ, α_a, α_a·1_N)
  = Σ_{m∈M:θ_{m(i)}=1} [(1 − ε)β_1^+ + ε(1 − β_1^−)] μ(m)  (32)
  + Σ_{m∈M:θ_{m(i)}=0} [(1 − ε)β_1^+ + ε(1 − β_1^−)] μ(m)  (33)
  = (1 − ε)β_1^+ + ε(1 − β_1^−) = x_1^+.  (34)

Similarly, we can calculate Σ_{θ′:θ′_i=0} q(θ′|θ, α_a, α_a·1_N), the probability that user i has reputation 0 in the next state, as

Σ_{θ′:θ′_i=0} q(θ′|θ, α_a, α_a·1_N)
  = Σ_{m∈M:θ_{m(i)}=1} [(1 − ε)(1 − β_1^+) + εβ_1^−] μ(m)  (35)
  + Σ_{m∈M:θ_{m(i)}=0} [(1 − ε)(1 − β_1^+) + εβ_1^−] μ(m)  (36)
  = (1 − ε)(1 − β_1^+) + εβ_1^− = 1 − x_1^+.  (37)

Now we discuss what happens if user i deviates. Since the recommended action α_a is to exert high effort for all the users, user i can deviate to three actions: exert high effort for reputation-1 users only, exert high effort for reputation-0 users only, or exert low effort for all the users. We can calculate the corresponding stage-game payoff and state transition probabilities under each deviation.

Exert high effort for reputation-1 users only (α_i(1, θ_i) = 1, α_i(0, θ_i) = 0):

u_i(θ, α_a, (α_i, α_a·1_{N−1})) = b − c Σ_{m∈M:θ_{m(i)}=1} μ(m) = b − c (s_1(θ) − 1)/(N − 1),  (38)

Σ_{θ′:θ′_i=1} q(θ′|θ, α_a, (α_i, α_a·1_{N−1}))  (39)
  = Σ_{m∈M:θ_{m(i)}=1} [(1 − ε)β_1^+ + ε(1 − β_1^−)] μ(m) + Σ_{m∈M:θ_{m(i)}=0} [(1 − ε)(1 − β_1^−) + εβ_1^+] μ(m)
  = [(1 − ε)β_1^+ + ε(1 − β_1^−)] (s_1(θ) − 1)/(N − 1) + [(1 − ε)(1 − β_1^−) + εβ_1^+] s_0(θ)/(N − 1).

Σ_{θ′:θ′_i=0} q(θ′|θ, α_a, (α_i, α_a·1_{N−1}))  (40)
  = Σ_{m∈M:θ_{m(i)}=1} [(1 − ε)(1 − β_1^+) + εβ_1^−] μ(m) + Σ_{m∈M:θ_{m(i)}=0} [(1 − ε)β_1^− + ε(1 − β_1^+)] μ(m)
  = [(1 − ε)(1 − β_1^+) + εβ_1^−] (s_1(θ) − 1)/(N − 1) + [(1 − ε)β_1^− + ε(1 − β_1^+)] s_0(θ)/(N − 1).

Exert high effort for reputation-0 users only (α_i(1, θ_i) = 0, α_i(0, θ_i) = 1):

u_i(θ, α_a, (α_i, α_a·1_{N−1})) = b − c Σ_{m∈M:θ_{m(i)}=0} μ(m) = b − c s_0(θ)/(N − 1),  (41)

Σ_{θ′:θ′_i=1} q(θ′|θ, α_a, (α_i, α_a·1_{N−1}))  (42)
  = [(1 − ε)(1 − β_1^−) + εβ_1^+] (s_1(θ) − 1)/(N − 1) + [(1 − ε)β_1^+ + ε(1 − β_1^−)] s_0(θ)/(N − 1),

Σ_{θ′:θ′_i=0} q(θ′|θ, α_a, (α_i, α_a·1_{N−1}))  (43)
  = [(1 − ε)β_1^− + ε(1 − β_1^+)] (s_1(θ) − 1)/(N − 1) + [(1 − ε)(1 − β_1^+) + εβ_1^−] s_0(θ)/(N − 1).

Exert low effort for all the users (α_i(1, θ_i) = 0, α_i(0, θ_i) = 0):

u_i(θ, α_a, (α_i, α_a·1_{N−1})) = b,  (44)

Σ_{θ′:θ′_i=1} q(θ′|θ, α_a, (α_i, α_a·1_{N−1})) = (1 − ε)(1 − β_1^−) + εβ_1^+,  (45)

Σ_{θ′:θ′_i=0} q(θ′|θ, α_a, (α_i, α_a·1_{N−1})) = (1 − ε)β_1^− + ε(1 − β_1^+).  (46)
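The transition probabilities above are simple mixtures over whom the deviator is matched with. A short sketch with hypothetical parameters (β_1^+, β_1^−, ε, s_1, and the function names are ours, chosen only to mirror the structure of (39)):

```python
def up_prob(beta_p, beta_m, eps, high_effort):
    """Prob. the reputation rises after one match: the report is
    correct w.p. 1-eps and flipped w.p. eps."""
    if high_effort:
        return (1 - eps) * beta_p + eps * (1 - beta_m)
    return (1 - eps) * (1 - beta_m) + eps * beta_p

def up_prob_dev_rep1_only(beta_p, beta_m, eps, s1, N):
    """Mixture as in (39): the deviator exerts high effort only for the
    reputation-1 partners, who make up (s1-1)/(N-1) of the other users."""
    frac1 = (s1 - 1) / (N - 1)
    return (up_prob(beta_p, beta_m, eps, True) * frac1
            + up_prob(beta_p, beta_m, eps, False) * (1 - frac1))

p = up_prob_dev_rep1_only(beta_p=0.9, beta_m=0.8, eps=0.1, s1=5, N=10)
print(round(p, 4))  # a mixture of 0.83 (high effort) and 0.27 (low effort)
```

Each up-probability and its corresponding down-probability sum to 1, which is a useful sanity check on the reconstructed expressions.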

Plugging the above expressions into (31), we can simplify the incentive compatibility constraints (i.e. the inequality constraints) under all three deviating actions to

δ (1 − 2ε) [β_1^+ − (1 − β_1^−)] (γ^1 − γ^0) ≥ (1 − δ) c.  (47)

Hence, if user i has reputation 1, the decomposability constraints (31) reduce to

v^1 = (1 − δ)(b − c) + δ [x_1^+ γ^1 + (1 − x_1^+) γ^0],  (48)

where v^1 is the payoff of the users with reputation 1, and

δ (1 − 2ε) [β_1^+ − (1 − β_1^−)] (γ^1 − γ^0) ≥ (1 − δ) c.  (49)

Similarly, if user i has reputation 0, the decomposability constraints (31) reduce to

v^0 = (1 − δ)(b − c) + δ [x_0^+ γ^1 + (1 − x_0^+) γ^0],  (50)

and

δ (1 − 2ε) [β_0^+ − (1 − β_0^−)] (γ^1 − γ^0) ≥ (1 − δ) c.  (51)

For the above incentive compatibility constraints (the two inequalities) to hold, we need β_1^+ − (1 − β_1^−) > 0 and β_0^+ − (1 − β_0^−) > 0, which are part of Condition 1 and Condition 2. Now we derive the rest of the sufficient conditions in Theorem 1. The two equalities above determine the continuation payoffs γ^1 and γ^0 as

γ^1 = (1/δ) [ ((1 − x_0^+) v^1 − (1 − x_1^+) v^0)/(x_1^+ − x_0^+) − (1 − δ)(b − c) ],
γ^0 = (1/δ) [ (x_1^+ v^0 − x_0^+ v^1)/(x_1^+ − x_0^+) − (1 − δ)(b − c) ].  (52)

Now we consider the decomposability constraints if we want to decompose a payoff profile v ∈ W_θ using the fair action α_f. Since we focus on decomposition by simple continuation payoff functions, we write the decomposability constraints as

v_i = (1 − δ) u_i(θ, α_f, α_f·1_N) + δ [ γ^1 Σ_{θ′:θ′_i=1} q(θ′|θ, α_f, α_f·1_N) + γ^0 Σ_{θ′:θ′_i=0} q(θ′|θ, α_f, α_f·1_N) ]  (53)
    ≥ (1 − δ) u_i(θ, α_f, (α_i, α_f·1_{N−1})) + δ [ γ^1 Σ_{θ′:θ′_i=1} q(θ′|θ, α_f, (α_i, α_f·1_{N−1})) + γ^0 Σ_{θ′:θ′_i=0} q(θ′|θ, α_f, (α_i, α_f·1_{N−1})) ].
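The two decomposition equalities form a 2×2 linear system in (γ^1, γ^0), so the continuation payoffs can be computed and checked numerically. A sketch under hypothetical parameter values (b, c, δ, ε, β_1^±, x_θ^+, v^1, v^0 below are illustrative, not values from the paper):

```python
def continuation_payoffs(v1, v0, x1, x0, delta, b, c):
    """Solve the decomposition equalities
       v_theta = (1-delta)*(b-c) + delta*(x_theta*g1 + (1-x_theta)*g0)
    for the continuation payoffs (g1, g0)."""
    g1 = (((1 - x0) * v1 - (1 - x1) * v0) / (x1 - x0) - (1 - delta) * (b - c)) / delta
    g0 = ((x1 * v0 - x0 * v1) / (x1 - x0) - (1 - delta) * (b - c)) / delta
    return g1, g0

# hypothetical primitives and target payoffs
b, c, delta, eps = 2.0, 1.0, 0.95, 0.1
bp1, bm1 = 0.9, 0.8               # beta_1^+, beta_1^-
x1, x0 = 0.83, 0.55               # up-probabilities under this update rule
v1, v0 = 0.98, 0.95
g1, g0 = continuation_payoffs(v1, v0, x1, x0, delta, b, c)

# the decomposition equalities hold ...
assert abs(v1 - ((1 - delta) * (b - c) + delta * (x1 * g1 + (1 - x1) * g0))) < 1e-9
assert abs(v0 - ((1 - delta) * (b - c) + delta * (x0 * g1 + (1 - x0) * g0))) < 1e-9
# ... and the incentive constraint is satisfied for these numbers
assert delta * (1 - 2 * eps) * (bp1 - (1 - bm1)) * (g1 - g0) >= (1 - delta) * c
```

Note that g1 > g0 whenever v1 > v0 and x1 > x0, which is exactly what gives the incentive constraint any slack.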


More information

Lecture 5 Leadership and Reputation

Lecture 5 Leadership and Reputation Lecture 5 Leadership and Reputation Reputations arise in situations where there is an element of repetition, and also where coordination between players is possible. One definition of leadership is that

More information

Credible Threats, Reputation and Private Monitoring.

Credible Threats, Reputation and Private Monitoring. Credible Threats, Reputation and Private Monitoring. Olivier Compte First Version: June 2001 This Version: November 2003 Abstract In principal-agent relationships, a termination threat is often thought

More information

Repeated Games with Perfect Monitoring

Repeated Games with Perfect Monitoring Repeated Games with Perfect Monitoring Mihai Manea MIT Repeated Games normal-form stage game G = (N, A, u) players simultaneously play game G at time t = 0, 1,... at each date t, players observe all past

More information

Bargaining and Competition Revisited Takashi Kunimoto and Roberto Serrano

Bargaining and Competition Revisited Takashi Kunimoto and Roberto Serrano Bargaining and Competition Revisited Takashi Kunimoto and Roberto Serrano Department of Economics Brown University Providence, RI 02912, U.S.A. Working Paper No. 2002-14 May 2002 www.econ.brown.edu/faculty/serrano/pdfs/wp2002-14.pdf

More information

Stochastic Games and Bayesian Games

Stochastic Games and Bayesian Games Stochastic Games and Bayesian Games CPSC 532l Lecture 10 Stochastic Games and Bayesian Games CPSC 532l Lecture 10, Slide 1 Lecture Overview 1 Recap 2 Stochastic Games 3 Bayesian Games 4 Analyzing Bayesian

More information

Leader or Follower? A Payoff Analysis in Quadratic Utility Harsanyi Economy

Leader or Follower? A Payoff Analysis in Quadratic Utility Harsanyi Economy Leader or Follower? A Payoff Analysis in Quadratic Utility Harsanyi Economy Sai Ma New York University Oct. 0, 015 Model Agents and Belief There are two players, called agent i {1, }. Each agent i chooses

More information

Auctions That Implement Efficient Investments

Auctions That Implement Efficient Investments Auctions That Implement Efficient Investments Kentaro Tomoeda October 31, 215 Abstract This article analyzes the implementability of efficient investments for two commonly used mechanisms in single-item

More information

Stochastic Games and Bayesian Games

Stochastic Games and Bayesian Games Stochastic Games and Bayesian Games CPSC 532L Lecture 10 Stochastic Games and Bayesian Games CPSC 532L Lecture 10, Slide 1 Lecture Overview 1 Recap 2 Stochastic Games 3 Bayesian Games Stochastic Games

More information

Problem Set: Contract Theory

Problem Set: Contract Theory Problem Set: Contract Theory Problem 1 A risk-neutral principal P hires an agent A, who chooses an effort a 0, which results in gross profit x = a + ε for P, where ε is uniformly distributed on [0, 1].

More information

Efficient Outcomes in Repeated Games with Limited Monitoring and Impatient Players 1

Efficient Outcomes in Repeated Games with Limited Monitoring and Impatient Players 1 Efficient Outcomes in Repeated Games with Limited Monitoring and Impatient Players 1 Mihaela van der Schaar a, Yuanzhang Xiao b, William Zame c a Department of Electrical Engineering, UCLA. Email: mihaela@ee.ucla.edu.

More information

A Decentralized Learning Equilibrium

A Decentralized Learning Equilibrium Paper to be presented at the DRUID Society Conference 2014, CBS, Copenhagen, June 16-18 A Decentralized Learning Equilibrium Andreas Blume University of Arizona Economics ablume@email.arizona.edu April

More information

Moral Hazard: Dynamic Models. Preliminary Lecture Notes

Moral Hazard: Dynamic Models. Preliminary Lecture Notes Moral Hazard: Dynamic Models Preliminary Lecture Notes Hongbin Cai and Xi Weng Department of Applied Economics, Guanghua School of Management Peking University November 2014 Contents 1 Static Moral Hazard

More information

Aggressive Corporate Tax Behavior versus Decreasing Probability of Fiscal Control (Preliminary and incomplete)

Aggressive Corporate Tax Behavior versus Decreasing Probability of Fiscal Control (Preliminary and incomplete) Aggressive Corporate Tax Behavior versus Decreasing Probability of Fiscal Control (Preliminary and incomplete) Cristian M. Litan Sorina C. Vâju October 29, 2007 Abstract We provide a model of strategic

More information

An Application of Ramsey Theorem to Stopping Games

An Application of Ramsey Theorem to Stopping Games An Application of Ramsey Theorem to Stopping Games Eran Shmaya, Eilon Solan and Nicolas Vieille July 24, 2001 Abstract We prove that every two-player non zero-sum deterministic stopping game with uniformly

More information

Introduction to Game Theory

Introduction to Game Theory Introduction to Game Theory What is a Game? A game is a formal representation of a situation in which a number of individuals interact in a setting of strategic interdependence. By that, we mean that each

More information

Efficiency in Decentralized Markets with Aggregate Uncertainty

Efficiency in Decentralized Markets with Aggregate Uncertainty Efficiency in Decentralized Markets with Aggregate Uncertainty Braz Camargo Dino Gerardi Lucas Maestri December 2015 Abstract We study efficiency in decentralized markets with aggregate uncertainty and

More information

CS364A: Algorithmic Game Theory Lecture #14: Robust Price-of-Anarchy Bounds in Smooth Games

CS364A: Algorithmic Game Theory Lecture #14: Robust Price-of-Anarchy Bounds in Smooth Games CS364A: Algorithmic Game Theory Lecture #14: Robust Price-of-Anarchy Bounds in Smooth Games Tim Roughgarden November 6, 013 1 Canonical POA Proofs In Lecture 1 we proved that the price of anarchy (POA)

More information

Microeconomic Theory II Preliminary Examination Solutions Exam date: June 5, 2017

Microeconomic Theory II Preliminary Examination Solutions Exam date: June 5, 2017 Microeconomic Theory II Preliminary Examination Solutions Exam date: June 5, 07. (40 points) Consider a Cournot duopoly. The market price is given by q q, where q and q are the quantities of output produced

More information

Problem set Fall 2012.

Problem set Fall 2012. Problem set 1. 14.461 Fall 2012. Ivan Werning September 13, 2012 References: 1. Ljungqvist L., and Thomas J. Sargent (2000), Recursive Macroeconomic Theory, sections 17.2 for Problem 1,2. 2. Werning Ivan

More information

Optimal Sequential Resource Sharing and Exchange in Multi-Agent Systems

Optimal Sequential Resource Sharing and Exchange in Multi-Agent Systems University of California Los Angeles Optimal Sequential Resource Sharing and Exchange in Multi-Agent Systems A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of

More information

Introduction to Game Theory Lecture Note 5: Repeated Games

Introduction to Game Theory Lecture Note 5: Repeated Games Introduction to Game Theory Lecture Note 5: Repeated Games Haifeng Huang University of California, Merced Repeated games Repeated games: given a simultaneous-move game G, a repeated game of G is an extensive

More information

Haiyang Feng College of Management and Economics, Tianjin University, Tianjin , CHINA

Haiyang Feng College of Management and Economics, Tianjin University, Tianjin , CHINA RESEARCH ARTICLE QUALITY, PRICING, AND RELEASE TIME: OPTIMAL MARKET ENTRY STRATEGY FOR SOFTWARE-AS-A-SERVICE VENDORS Haiyang Feng College of Management and Economics, Tianjin University, Tianjin 300072,

More information

Essays on Some Combinatorial Optimization Problems with Interval Data

Essays on Some Combinatorial Optimization Problems with Interval Data Essays on Some Combinatorial Optimization Problems with Interval Data a thesis submitted to the department of industrial engineering and the institute of engineering and sciences of bilkent university

More information

A Core Concept for Partition Function Games *

A Core Concept for Partition Function Games * A Core Concept for Partition Function Games * Parkash Chander December, 2014 Abstract In this paper, we introduce a new core concept for partition function games, to be called the strong-core, which reduces

More information

In Diamond-Dybvig, we see run equilibria in the optimal simple contract.

In Diamond-Dybvig, we see run equilibria in the optimal simple contract. Ennis and Keister, "Run equilibria in the Green-Lin model of financial intermediation" Journal of Economic Theory 2009 In Diamond-Dybvig, we see run equilibria in the optimal simple contract. When the

More information

Efficient Outcomes in Repeated Games with Limited Monitoring

Efficient Outcomes in Repeated Games with Limited Monitoring Noname manuscript No. (will be inserted by the editor) Efficient Outcomes in Repeated Games with Limited Monitoring Mihaela van der Schaar Yuanzhang Xiao William Zame Received: September 1, 2014 / Accepted:

More information

Topics in Contract Theory Lecture 1

Topics in Contract Theory Lecture 1 Leonardo Felli 7 January, 2002 Topics in Contract Theory Lecture 1 Contract Theory has become only recently a subfield of Economics. As the name suggest the main object of the analysis is a contract. Therefore

More information

Challenge to Hotelling s Principle of Minimum

Challenge to Hotelling s Principle of Minimum Challenge to Hotelling s Principle of Minimum Differentiation Two conclusions 1. There is no equilibrium when sellers are too close i.e., Hotelling is wrong 2. Under a slightly modified version, get maximum

More information

Subgame Perfect Cooperation in an Extensive Game

Subgame Perfect Cooperation in an Extensive Game Subgame Perfect Cooperation in an Extensive Game Parkash Chander * and Myrna Wooders May 1, 2011 Abstract We propose a new concept of core for games in extensive form and label it the γ-core of an extensive

More information

Problem Set: Contract Theory

Problem Set: Contract Theory Problem Set: Contract Theory Problem 1 A risk-neutral principal P hires an agent A, who chooses an effort a 0, which results in gross profit x = a + ε for P, where ε is uniformly distributed on [0, 1].

More information

Best-Reply Sets. Jonathan Weinstein Washington University in St. Louis. This version: May 2015

Best-Reply Sets. Jonathan Weinstein Washington University in St. Louis. This version: May 2015 Best-Reply Sets Jonathan Weinstein Washington University in St. Louis This version: May 2015 Introduction The best-reply correspondence of a game the mapping from beliefs over one s opponents actions to

More information

Problem Set 3: Suggested Solutions

Problem Set 3: Suggested Solutions Microeconomics: Pricing 3E00 Fall 06. True or false: Problem Set 3: Suggested Solutions (a) Since a durable goods monopolist prices at the monopoly price in her last period of operation, the prices must

More information

General Equilibrium under Uncertainty

General Equilibrium under Uncertainty General Equilibrium under Uncertainty The Arrow-Debreu Model General Idea: this model is formally identical to the GE model commodities are interpreted as contingent commodities (commodities are contingent

More information

6.254 : Game Theory with Engineering Applications Lecture 3: Strategic Form Games - Solution Concepts

6.254 : Game Theory with Engineering Applications Lecture 3: Strategic Form Games - Solution Concepts 6.254 : Game Theory with Engineering Applications Lecture 3: Strategic Form Games - Solution Concepts Asu Ozdaglar MIT February 9, 2010 1 Introduction Outline Review Examples of Pure Strategy Nash Equilibria

More information

Forecast Horizons for Production Planning with Stochastic Demand

Forecast Horizons for Production Planning with Stochastic Demand Forecast Horizons for Production Planning with Stochastic Demand Alfredo Garcia and Robert L. Smith Department of Industrial and Operations Engineering Universityof Michigan, Ann Arbor MI 48109 December

More information

ECE 586BH: Problem Set 5: Problems and Solutions Multistage games, including repeated games, with observed moves

ECE 586BH: Problem Set 5: Problems and Solutions Multistage games, including repeated games, with observed moves University of Illinois Spring 01 ECE 586BH: Problem Set 5: Problems and Solutions Multistage games, including repeated games, with observed moves Due: Reading: Thursday, April 11 at beginning of class

More information

4: SINGLE-PERIOD MARKET MODELS

4: SINGLE-PERIOD MARKET MODELS 4: SINGLE-PERIOD MARKET MODELS Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 4: Single-Period Market Models 1 / 87 General Single-Period

More information

13.1 Infinitely Repeated Cournot Oligopoly

13.1 Infinitely Repeated Cournot Oligopoly Chapter 13 Application: Implicit Cartels This chapter discusses many important subgame-perfect equilibrium strategies in optimal cartel, using the linear Cournot oligopoly as the stage game. For game theory

More information

The value of Side Information in the Secondary Spectrum Markets

The value of Side Information in the Secondary Spectrum Markets The value of Side Information in the Secondary Spectrum Markets Arnob Ghosh, Saswati Sarkar, Randall Berry Abstract arxiv:602.054v3 [cs.gt] 22 Oct 206 We consider a secondary spectrum market where primaries

More information

Answers to Microeconomics Prelim of August 24, In practice, firms often price their products by marking up a fixed percentage over (average)

Answers to Microeconomics Prelim of August 24, In practice, firms often price their products by marking up a fixed percentage over (average) Answers to Microeconomics Prelim of August 24, 2016 1. In practice, firms often price their products by marking up a fixed percentage over (average) cost. To investigate the consequences of markup pricing,

More information

Prisoner s dilemma with T = 1

Prisoner s dilemma with T = 1 REPEATED GAMES Overview Context: players (e.g., firms) interact with each other on an ongoing basis Concepts: repeated games, grim strategies Economic principle: repetition helps enforcing otherwise unenforceable

More information

Q1. [?? pts] Search Traces

Q1. [?? pts] Search Traces CS 188 Spring 2010 Introduction to Artificial Intelligence Midterm Exam Solutions Q1. [?? pts] Search Traces Each of the trees (G1 through G5) was generated by searching the graph (below, left) with a

More information

Dynamic Contracts. Prof. Lutz Hendricks. December 5, Econ720

Dynamic Contracts. Prof. Lutz Hendricks. December 5, Econ720 Dynamic Contracts Prof. Lutz Hendricks Econ720 December 5, 2016 1 / 43 Issues Many markets work through intertemporal contracts Labor markets, credit markets, intermediate input supplies,... Contracts

More information

In the Name of God. Sharif University of Technology. Graduate School of Management and Economics

In the Name of God. Sharif University of Technology. Graduate School of Management and Economics In the Name of God Sharif University of Technology Graduate School of Management and Economics Microeconomics (for MBA students) 44111 (1393-94 1 st term) - Group 2 Dr. S. Farshad Fatemi Game Theory Game:

More information

EconS 424 Strategy and Game Theory. Homework #5 Answer Key

EconS 424 Strategy and Game Theory. Homework #5 Answer Key EconS 44 Strategy and Game Theory Homework #5 Answer Key Exercise #1 Collusion among N doctors Consider an infinitely repeated game, in which there are nn 3 doctors, who have created a partnership. In

More information

Bargaining Order and Delays in Multilateral Bargaining with Asymmetric Sellers

Bargaining Order and Delays in Multilateral Bargaining with Asymmetric Sellers WP-2013-015 Bargaining Order and Delays in Multilateral Bargaining with Asymmetric Sellers Amit Kumar Maurya and Shubhro Sarkar Indira Gandhi Institute of Development Research, Mumbai August 2013 http://www.igidr.ac.in/pdf/publication/wp-2013-015.pdf

More information

Pricing Problems under the Markov Chain Choice Model

Pricing Problems under the Markov Chain Choice Model Pricing Problems under the Markov Chain Choice Model James Dong School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853, USA jd748@cornell.edu A. Serdar Simsek

More information

An Approximation Algorithm for Capacity Allocation over a Single Flight Leg with Fare-Locking

An Approximation Algorithm for Capacity Allocation over a Single Flight Leg with Fare-Locking An Approximation Algorithm for Capacity Allocation over a Single Flight Leg with Fare-Locking Mika Sumida School of Operations Research and Information Engineering, Cornell University, Ithaca, New York

More information

UCLA Department of Economics Ph.D. Preliminary Exam Industrial Organization Field Exam (Spring 2010) Use SEPARATE booklets to answer each question

UCLA Department of Economics Ph.D. Preliminary Exam Industrial Organization Field Exam (Spring 2010) Use SEPARATE booklets to answer each question Wednesday, June 23 2010 Instructions: UCLA Department of Economics Ph.D. Preliminary Exam Industrial Organization Field Exam (Spring 2010) You have 4 hours for the exam. Answer any 5 out 6 questions. All

More information

Economics 209A Theory and Application of Non-Cooperative Games (Fall 2013) Repeated games OR 8 and 9, and FT 5

Economics 209A Theory and Application of Non-Cooperative Games (Fall 2013) Repeated games OR 8 and 9, and FT 5 Economics 209A Theory and Application of Non-Cooperative Games (Fall 2013) Repeated games OR 8 and 9, and FT 5 The basic idea prisoner s dilemma The prisoner s dilemma game with one-shot payoffs 2 2 0

More information

Equilibrium payoffs in finite games

Equilibrium payoffs in finite games Equilibrium payoffs in finite games Ehud Lehrer, Eilon Solan, Yannick Viossat To cite this version: Ehud Lehrer, Eilon Solan, Yannick Viossat. Equilibrium payoffs in finite games. Journal of Mathematical

More information

In the Name of God. Sharif University of Technology. Microeconomics 2. Graduate School of Management and Economics. Dr. S.

In the Name of God. Sharif University of Technology. Microeconomics 2. Graduate School of Management and Economics. Dr. S. In the Name of God Sharif University of Technology Graduate School of Management and Economics Microeconomics 2 44706 (1394-95 2 nd term) - Group 2 Dr. S. Farshad Fatemi Chapter 8: Simultaneous-Move Games

More information

A Theory of Favoritism

A Theory of Favoritism A Theory of Favoritism Zhijun Chen University of Auckland 2013-12 Zhijun Chen University of Auckland () 2013-12 1 / 33 Favoritism in Organizations Widespread favoritism and its harmful impacts are well-known

More information

Price cutting and business stealing in imperfect cartels Online Appendix

Price cutting and business stealing in imperfect cartels Online Appendix Price cutting and business stealing in imperfect cartels Online Appendix B. Douglas Bernheim Erik Madsen December 2016 C.1 Proofs omitted from the main text Proof of Proposition 4. We explicitly construct

More information

Microeconomic Theory August 2013 Applied Economics. Ph.D. PRELIMINARY EXAMINATION MICROECONOMIC THEORY. Applied Economics Graduate Program

Microeconomic Theory August 2013 Applied Economics. Ph.D. PRELIMINARY EXAMINATION MICROECONOMIC THEORY. Applied Economics Graduate Program Ph.D. PRELIMINARY EXAMINATION MICROECONOMIC THEORY Applied Economics Graduate Program August 2013 The time limit for this exam is four hours. The exam has four sections. Each section includes two questions.

More information

MA300.2 Game Theory 2005, LSE

MA300.2 Game Theory 2005, LSE MA300.2 Game Theory 2005, LSE Answers to Problem Set 2 [1] (a) This is standard (we have even done it in class). The one-shot Cournot outputs can be computed to be A/3, while the payoff to each firm can

More information

Not 0,4 2,1. i. Show there is a perfect Bayesian equilibrium where player A chooses to play, player A chooses L, and player B chooses L.

Not 0,4 2,1. i. Show there is a perfect Bayesian equilibrium where player A chooses to play, player A chooses L, and player B chooses L. Econ 400, Final Exam Name: There are three questions taken from the material covered so far in the course. ll questions are equally weighted. If you have a question, please raise your hand and I will come

More information