Optimal Information Design for Reputation Building


Erik Lillethun

February 23, 2017

Most recent version here: http://econweb.ucsd.edu/~elilleth/pdfs/lillethun_jmp.pdf

Abstract

Conventional wisdom holds that ratings and review platforms serve consumers best when they reveal the maximum amount of information to consumers at all times. This paper shows within a stylized model how this may not be true. The channel is that partial information may incentivize reputation-minded firms to invest more in quality. Committing to publish all reviews can lead to a cold start problem, where there is a failure to attract early adopters, thereby shutting down the source of information. To find a solution to this problem, I use a dynamic Bayesian Persuasion model in which a long-run firm with a persistent type interacts with a sequence of short-run consumers. When the platform designs the public information policy to maximize total consumer welfare, there is a policy with three phases that converges to optimal as reviews become frequent. In the first phase, the platform reveals reviews with an interior probability and consumers learn about the firm. In the second phase, the consumers observe all reviews, and the firm always produces high quality. Finally, in the third phase, new reviews are hidden entirely and the firm produces low quality without damage to its reputation. When the designer has weaker commitment power and may revise its policy at a small cost, a repeated three phase policy is robust to revisions and remains optimal.

Department of Economics, University of California, San Diego. Address: Department of Economics, University of California, San Diego, 9500 Gilman Dr. #0508, La Jolla, CA 92093-0508, USA, Email: elilleth@ucsd.edu. I am grateful to Joel Watson, Nageeb Ali, and Joel Sobel for their continual support and guidance. I would also like to thank Simone Galperti, Isla Globus-Harris, David Coyne, and UC San Diego seminar participants for their helpful comments.

1 Introduction

Reviews of consumer products and services have long been an important part of the portfolio of information that consumers use to make purchasing decisions. For example, the Better Business Bureau has been operating for over 100 years, and Consumer Reports magazine began publication in 1936. Once the internet became ubiquitous, websites began collecting large masses of reviews based on actual consumer experiences. Today, these take the form of reviews left on the websites of online marketplaces (most notably Amazon in the U.S.) and independent platforms designed almost exclusively to collect reviews for consumers to use (such as Yelp, TripAdvisor, and Angie's List). The common thread shared by all of these platforms is that consumers make purchasing decisions over time based on a current stock of public reviews, and after purchasing, they may contribute their own experience to this stock.

Even when the reviews themselves are honest, the platform can control what information is presented to consumers, whether to present a biased sample of reviews or to conceal negative reviews. This is a controversial practice. Consumers care a great deal about the quality of the information presented to them on these platforms, and allegations of information suppression make headlines. Sometimes the controversy centers on concealing or delaying certain types of reviews (Kane [2015], Cochrane [2011], Picchi [2014]). In other cases, platforms use secret proprietary algorithms to assign scores or rankings, such as the TripAdvisor Popularity Ranking, which consumers often suspect of manipulating information in a biased way (Vivion [2013], Needham [2016]). Either way, the popular consensus is that consumers suffer whenever a platform withholds information from them, and that a benevolent review site should reveal all user reviews immediately, exactly, and entirely. Contrary to this common intuition, this paper finds that consumers may be best off when there is some concealing of reviews.
One piece missing from the common understanding of the role of a review platform is that firms make production decisions over time in order to build and maintain good reputations on these sites. Since building a good reputation involves producing a high quality product or service, reputation building benefits consumers. Even more importantly, high initial quality encourages early consumers to buy the product (early adopters) and review it for the benefit of later consumers. With low expected initial quality, there are no early adopters and there is no learning about the firm, which is called the cold start problem. When high quality is expensive to produce, a review platform which reveals all reviews immediately may not be able to support high initial quality, even when high quality improves total welfare. On the other hand, better outcomes can be achieved with an information policy which rewards firms that produce high quality by sometimes allowing them to produce low quality without any negative consequences.

[Figure 1: Consumer Payoffs in the Optimal Policy and Full Information Policy. Horizontal axis: prior beliefs (0 to 1); vertical axis: consumer payoffs; curves: Full Info Policy and Optimal Policy; annotations: Cold Start Problem, Higher quality.]

This paper uses a model of reputation pioneered by Kreps, Milgrom, Roberts, and Wilson [1982] and Fudenberg and Levine [1989]. There is a monopolistic firm that may be one of two types. The good type always produces high quality. The normal type may choose between producing high or low quality, where high quality is more costly than low quality. Consumers choose whether to buy the product without first observing its quality. This stage game is repeated over a finite horizon with the same firm but a different consumer each time. High quality is possible even with a finite horizon, because the normal firm can build and maintain a high consumer belief that it is a good type (i.e., a good reputation), thereby staying in business. When costs are low and there is sufficient patience, the temptation to shirk and produce low quality does not overwhelm the firm's decision making until the very end of the time horizon. When costs are high or the firm is less patient, the normal firm needs extra incentives to encourage it to produce high quality. One problem with revealing all reviews (which I call "full information") is that as soon as the firm shirks, all future consumers know it is not a good type, and they refuse to buy from the firm in the future. To give the firm stronger incentives, high quality can be rewarded by later allowing the firm to shirk and get away with it. The main result of this paper shows that a very stark version of this type of incentive maximizes the consumers' welfare. This optimal policy induces higher average quality from the normal type of firm and overcomes the cold start problem, as shown in Figure 1.

I explicitly characterize an information policy featuring three phases: partial disclosure, full disclosure, and no disclosure (hence, I call it a "PFN policy"). Initially, if beliefs are too low, there is a partial disclosure learning phase. In the learning phase, reviews are revealed with a probability that makes the firm indifferent, so the firm mixes and consumers learn. A lack of bad reviews in the learning stage drives consumer beliefs upwards to a specific cutoff. In the second phase, the platform threatens to reveal all reviews publicly (full disclosure), while the firm produces high quality. In the third phase, the platform commits to concealing all reviews (no disclosure), and the firm gets away with low quality. The third phase policy rewards a firm with a good reputation (one that did not produce low quality in the second phase) by letting it produce low quality while still selling to consumers. The learning phase may be essential for making the third phase reward effective, since low beliefs will lead the consumer to avoid buying the product in the third phase. In the limit as reviews become frequent, this policy is optimal.

Then, I adapt these results to an infinite horizon model. In the infinite horizon proof, the optimal policy must feature smoother payoffs over time compared to the straightforward PFN policy. I show that when the horizon is infinite, a policy which repeats the PFN policy converges to optimal as the frequency of repetition increases. However, the one shot PFN policy is also optimal, as it achieves the same initial expected payoff as the repeated policy. Nevertheless, the smoother payoffs of the repeated policy are appealing in a way that I formalize. I show that for frequent repetition, this policy is robust to revising the policy (a weakening of the designer's commitment power). Specifically, I change the model so that the designer can publicly revise the policy at any time for some small one time cost.
For any revision cost, there exists a repeated PFN policy where the designer never revises in equilibrium. On the other hand, the one shot PFN policy is not robust to revision for small revision costs. As a result, the one shot PFN policy fails to incentivize high quality, whereas the frequently repeated PFN policy succeeds.

Finally, I consider the problem where the firm is a provider of credence goods. In other words, the consumers have idiosyncratic preferences for different levels of service, but only the firm knows what these preferences are. Service providers such as mechanics, doctors, and contractors fall into this category, since they have the expertise to diagnose the customers' problems. As shown in Ely and Välimäki [2003], the reputation building incentives of these firms can be harmful, as they may try to excessively signal that they are not the type that upsells. They do so by providing cheaper levels of service where more expensive service is warranted. In a three period example, I show that the designer's optimal policy is nearly reversed in this bad reputation model, with no disclosure of reviews early on and full disclosure later. The early concealment period removes the firm's incentives to signal its type.

This paper joins the literature on static Bayesian Persuasion, initiated by Kamenica and Gentzkow [2011], and dynamic Bayesian Persuasion, studied in Ely [2015]. I consider a dynamic information design problem with non-trivial strategic interactions, a forward looking agent, and a dual incentive problem (the designer motivates both the firm and the consumers). There is also a significant literature pertaining to the cold start problem. Previously established solutions to the cold start problem usually come in one of two classes: machine learning and information obfuscation. Machine learning solutions (Schein, Popescul, Ungar, and Pennock [2002], Elahi, Ricci, and Rubens [2016]) try to give users accurate recommendations or ratings using tangentially related information such as a user's own history, other users' histories, and similarities across products (and importantly, they process user feedback to hone recommendations). However, there still must be an existing stock of information to start from. In previous information obfuscation solutions (as in Che and Horner [2015]), there must still be an external source of private information about the product. The main result of my paper solves the cold start problem without any exogenous or indirectly related sources of information and uses only the most fruitful source of information, user reviews of the product in question.

The optimal concealment of information found in this paper is complementary to the channel identified by Kremer, Mansour, and Perry [2014] and Che and Horner [2015]. In these papers, there is a sequence of consumers who may experiment (e.g., buy the product) and learn some information, or they may play it safe but learn nothing. Experimentation has positive externalities, because future consumers can learn more about the product.
Concealing information encourages experimentation, as a higher bar must be reached to convince consumers not to choose the product. These papers are about social learning about a fixed product whose quality does not change over time. As such, there are no product quality and reputation concerns in their models. These concerns play a starring role in my model. Moreover, their optimal policies feature early concealment instead of the late concealment result of this paper.

The most closely related papers combine reputation with information design. Dellarocas [2006], Bar-Isaac and Deb [2016], Horner and Lambert [2016], and Pei [2015] all study ways of optimally obscuring information. However, these papers do not tackle the cold start problem, nor do they characterize a dynamic information policy that is optimal across all policies. My goal here is to show how the information policy that is optimal for consumers involves concealing information from them.

This paper also adds to the debate over the permanence of reputations. Cripps, Mailath, and Samuelson [2004] show that in a similar model with full information, the consumers learn the firm's type with certainty in the limit as $t \to \infty$, which means the firm's reputation must break down at some point. On the other hand, Ekmekci [2011] finds that there exist information policies where the firm's reputation remains fully intact on every history, although these are not necessarily generally optimal policies. My main infinite horizon result shows that there is an optimal policy in which the normal type of firm has a positive probability of maintaining a good reputation permanently.

There are potentially interesting research projects in which consumers produce reviews strategically, or where reviews may not be legitimate (coming from consumers who did not purchase or from biased parties who are not consumers, such as the firm or another competing firm). This paper is not about the strategic incentives of reviewers. Instead, it focuses on the reputation building incentives of firms. Therefore, for the rest of this paper, I assume that reviews are complete, honest representations of consumer experiences.

This paper proceeds as follows: First, an optimal policy is derived in the setting of a simple three period example. Then, the general finite horizon, continuous time model is explained. Next, the paper presents the main finite horizon result followed by some discussion. This result is then adapted for an infinite horizon model. The paper proceeds to analyze the robustness of these policies to future revisions from the designer. Then, there is an extension to a bad reputation model. Finally, some extensions are discussed before concluding. All omitted proofs are in the Appendix.

2 Example

Consider a three period model ($t \in \{1, 2, 3\}$) with no discounting (there must be at least three periods to demonstrate the main problem).
There are three categories of agents in this model: a designer (the review platform), a firm, and the consumers. In each period, a single monopolistic firm chooses a product quality $q_t$, which is either high ($H$) or low ($L$). A short-lived consumer who only lives for a single period chooses to either buy ($y$) or not buy ($n$). The payoff matrix for each period is shown in Figure 2. The cost of producing high quality is $c > 1$, whereas low quality has no cost. This example has a price of 1, and the consumer values the high quality product at 2 and the low quality product at 0.

          y             n
    H     1 - c, 1      -c, 0
    L     1, -1         0, 0

Figure 2: Firm and Consumer Stage Payoffs

Nature chooses whether the firm is committed to producing high quality (type $C$) or has the normal choice between high and low quality (type $N$). Only the firm observes its type. Let $\mu_1$ be the prior probability that Nature selects type $C$. In general, let $\mu_t$ be the consumer's belief at the beginning of period $t$. Since the consumers always get a weakly higher payoff from type $C$, I call this the good type, while type $N$ is the normal type. The firm is a long-run player which maximizes the sum of its stage payoffs.

There is also a designer (the review platform) who observes the quality choice of the firm at each period if and only if that period's consumer chooses $y$ (by observing the reviews of diligent, honest consumers). The designer can send public messages $m_t$ in each period $t$ at the very end of the stage game (the third period message is irrelevant and is ignored). Moreover, the designer can commit to any mapping from histories to distributions of cheap talk messages. Let the space of messages be $\{G, B\}$ for all $t$ (only two messages are needed for the optimal policy). Intuitively, $G$ (the good signal) will be indicative of the good type, and $B$ (the bad signal) will be indicative of the normal type. Using the probability of signal realization $G$ to represent any message distribution, the designer's policy is any mapping from histories to probabilities of sending $G$. The designer's objective is to maximize the sum of consumer payoffs. [1]

A full information policy immediately reveals every firm quality choice: for every history, the designer sends $G$ whenever the most recent quality is high and $B$ whenever the most recent quality is low. This paper focuses on sequential equilibria (which are henceforth called just "equilibria"). A full information equilibrium fails to overcome the cold start problem for low priors, because the normal firm does not have sufficient incentive to produce high quality, which scares away the potential early adopter (the $t = 1$ consumer).
Moreover, when beliefs are barely high enough to attract an early adopter, there is still a substantial loss compared to a complete information benchmark.

Proposition 1. In a full information equilibrium, the normal firm plays $L$ in every period. The designer's expected payoff is
$$\begin{cases} 0 & \text{if } \mu_1 < \tfrac{1}{2} \\ 4\mu_1 - 1 & \text{if } \mu_1 \geq \tfrac{1}{2} \end{cases}$$

[1] If the platform is a monopolist and gets profits solely from ad revenue, such a policy is also profit maximizing, because it still provides valuable information to consumers.
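The payoff in Proposition 1 can be checked by direct enumeration. The following is a minimal sketch, assuming the example's numbers (price 1, consumer values of 2 and 0) and a normal firm that plays L in every period; the function name `full_info_payoff` is hypothetical and not part of the paper.

```python
def full_info_payoff(mu1: float) -> float:
    """Designer's expected payoff (sum of consumer payoffs) under full information.

    Assumptions (taken from the three period example): price 1, consumer value
    2 for high quality and 0 for low quality, normal firm plays L every period.
    """
    buy_high = 2 - 1  # consumer payoff from buying a high quality product
    buy_low = 0 - 1   # consumer payoff from buying a low quality product

    # Period 1: the consumer buys only if the expected payoff is non-negative.
    first = mu1 * buy_high + (1 - mu1) * buy_low
    if first < 0:
        # Cold start: nobody buys, so no review is ever written.
        return 0.0

    # Periods 2 and 3: the first review reveals the type, so only the good
    # type keeps selling, and each later consumer earns buy_high from it.
    later = 2 * mu1 * buy_high
    return first + later


if __name__ == "__main__":
    for mu1 in (0.25, 0.5, 0.75, 1.0):
        print(mu1, full_info_payoff(mu1))
```

For priors of at least 1/2 the enumeration reduces to 4 * mu1 - 1, and for lower priors it returns 0.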

[Figure 3: Designer's Expected Payoff in Various Policies when $c = \tfrac{3}{2}$. Horizontal axis: prior belief $\mu_1$ (0 to 1); vertical axis: designer's expected payoff (0 to 3); curves: Full Info and Optimal Policy.]

Proposition 1 reveals that a symmetric belief of $\tfrac{1}{2}$ is special. At this belief, when the consumer expects the normal firm to choose low quality, the consumer is indifferent. Whether the consumer buys or not is going to be crucial for designing rewards for the firm. I will refer to $\mu_t \geq \tfrac{1}{2}$ as the firm having a good reputation in period $t$ and $\mu_t < \tfrac{1}{2}$ as the firm having a bad reputation in period $t$. In order for rewards to properly incentivize the firm to produce high quality, the firm must first have a good reputation. The optimal policy will be split into two cases: a high prior case where the firm starts with a good reputation, and a low prior case where a good reputation must be earned.

Figure 3 shows the relevant designer payoffs as a function of the prior belief when $c = \tfrac{3}{2}$. The dashed red line graphs the payoff from a full information policy. For $\mu_1 < \tfrac{1}{2}$, full information fails to overcome the cold start problem, so neither type of firm ever sells. For $\mu_1 \geq \tfrac{1}{2}$, there is no cold start problem. However, there is still room for improvement since the normal firm shirks and the consumer buys in period 1.

Note that for $c > 2$, no policy can improve on full information. For these high costs, even letting the firm sell at low quality for periods 2 and 3 cannot induce it to produce high quality in the first period. Unable to provide effective incentives, the designer helps the consumers best by giving them full information, allowing the second and third consumers to avoid buying from the normal firm. In the subsequent discussion on the optimal policy, the relevant region of costs is $c \in (1, 2]$, where designing incentives can make a difference.

2.1 High Prior

Assume that $\mu_1 \geq \tfrac{1}{2}$, which means that the first consumer buys no matter what the normal firm's first period strategy is. Although a policy is allowed to have signals after period $t$ depend on the history before $t$, this is unnecessary in the optimal policy. Furthermore, using an argument similar to that found in Kamenica and Gentzkow [2011], if signal realization $B$ results in market breakdown, then $B$ should drive consumer beliefs all the way to 0. This follows because it allows realization $G$ to be as likely as possible, preventing a breakdown with a higher probability. This implies that $G$ is always sent after high quality, so the only missing piece is the probability of sending $G$ after low quality. The following proposition gives the optimal probability:

Proposition 2. If $\mu_1 \geq \tfrac{1}{2}$, the optimal policy takes the following form:
1. $\Pr(m_1 = G \mid q_1 = H) = \Pr(m_1 = B \mid q_1 = L) = 1$
2. $\Pr(m_2 = G \mid q_2 = H) = 1$, $\Pr(m_2 = G \mid q_2 = L) = c - 1$

Part 1 shows that there is full information in the first period, leading to high quality. Part 2 specifies the concealment in period 2. The firm gets away with period 2 shirking with probability $c - 1$. This gives the firm a high payoff in the last period conditional on high quality in the first period. This leads to higher quality for the first consumer than is otherwise possible. However, it comes at the expense of the last consumer, who sometimes buys a low quality product. The overall effect is still positive, because the last consumer sometimes observes a bad signal and does not buy.

The designer's payoff in the optimal policy is represented by the solid blue line in Figure 3. This payoff lies strictly above the full information policy payoff for all $\mu_1 \geq \tfrac{1}{2}$. For high priors, the optimal policy payoff converges to the full information policy payoff as $c \to 2$.
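The incentive logic behind the high prior policy can be sketched numerically. The code below is an illustration under stated assumptions, not the paper's proof: it takes the example's stage payoffs (price 1, cost c of high quality), assumes full disclosure in period 1, and solves for the smallest probability of sending G after low quality in period 2 that keeps the normal firm willing to play H in period 1. The function names are hypothetical.

```python
def min_concealment_prob(c: float) -> float:
    """Smallest Pr(m2 = G | q2 = L) that makes H in period 1 worthwhile.

    Assumed stage payoffs for the firm: 1 - c from selling high quality,
    1 from selling low quality, 0 from not selling.
    """
    # Shirking in period 1 earns 1, is revealed, and ends all future sales.
    deviate = 1.0
    # Complying earns (1 - c) in period 1, then 1 from shirking in period 2,
    # then 1 in period 3 with probability x (when the bad review is hidden).
    # Solve (1 - c) + 1 + x >= deviate for the smallest x.
    x = deviate - (1.0 - c) - 1.0
    if not 0.0 <= x <= 1.0:
        raise ValueError("no probability in [0, 1] provides the incentive")
    return x


def high_prior_payoff(mu1: float, c: float) -> float:
    """Sum of expected consumer payoffs under the policy, for mu1 >= 1/2."""
    if mu1 < 0.5:
        raise ValueError("this sketch covers the high prior case only")
    x = min_concealment_prob(c)
    t1 = 1.0                            # both types produce high quality
    t2 = mu1 * 1 + (1 - mu1) * (-1)     # normal type shirks, consumer buys
    t3 = mu1 * 1 + (1 - mu1) * x * (-1) # normal type sells only if concealed
    return t1 + t2 + t3


if __name__ == "__main__":
    print(min_concealment_prob(1.5))
    print(high_prior_payoff(0.5, 1.5), high_prior_payoff(0.5, 2.0))
```

Under these assumptions the concealment probability equals c - 1, which lies in [0, 1] exactly when the cost is between 1 and 2, and the payoff at c = 2 coincides with the full information value 4 * mu1 - 1.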
2.2 Low Prior

With a low prior ($\mu_1 < \tfrac{1}{2}$), the firm must first drive the consumers' beliefs above $\tfrac{1}{2}$, so that the third consumer buys the normal firm's low quality product. This requires some mixing from the firm in the first period, because producing high quality with certainty pools it completely with the good type firm, so beliefs cannot change. However, consumers are served best when the probability of low quality is minimized. The general structure of the optimal policy is the same as for the high prior except for the mixing in the first period. However, this assumes that full information can be improved upon. In this short discrete time example, a lower prior requires a lower probability of high quality in the first period in order to increase beliefs enough. If this probability is too low, there is a cold start problem, so no consumer buys. This leads to the following optimal policy:

Proposition 3.
1. If $\tfrac{1}{4} \leq \mu_1 < \tfrac{1}{2}$, the optimal policy is the same as in Proposition 2 and $\Pr(q_1 = H) = \frac{\mu_1}{1 - \mu_1}$.
2. If $\mu_1 < \tfrac{1}{4}$, any policy is optimal.

The range of priors $\tfrac{1}{4} \leq \mu_1 < \tfrac{1}{2}$ shows the power of the optimal policy to prevent the cold start problem. In this range, full information results in no consumer ever buying. However, in the optimal policy, the firm mixes in the first period to drive beliefs upward (a prerequisite to obtaining a later shirking reward) but still chooses high quality with high enough probability to convince the first consumer to buy.

When $\mu_1 < \tfrac{1}{4}$, no policy can reward the firm enough for producing high quality in period 1, so there is low quality and consumers never buy. It is worth noting that the lower bound $\mu_1 \geq \tfrac{1}{4}$ is an artifact of the time horizon. For lower beliefs, three periods is not enough time to build up beliefs slowly (playing $H$ often enough to encourage buying) while still leaving enough time for the firm to reap the rewards of its good reputation. Any interior belief, given a sufficiently long time horizon, can lead to an outcome better than full information.

Figure 3 shows the optimal policy in this range of priors. The solid blue line starts off flat at 0 payoff (same as the full information line) and jumps to a positive and quickly increasing payoff at $\mu_1 = \tfrac{1}{4}$. Note that the slope of the optimal policy payoff line is greater in $[\tfrac{1}{4}, \tfrac{1}{2})$ than it is in $[\tfrac{1}{2}, 1]$. This reflects the loss associated with having the firm build its reputation. The firm must sometimes choose low quality in the first period. Yet the first consumer still buys, because the probability of high quality is large enough. As a result, the first consumer may receive a low, though still positive, expected payoff.

There are three lessons from this example. First, if the prior is low, there is initial mixing by the normal firm to build up beliefs. Second, there is full information early on. Finally, there is concealment of information later on. The result is (possibly) mixed quality, followed by high quality, followed by low quality. For sufficiently high beliefs, the consumers always buy unless they receive a perfect signal that the firm is a normal type.
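The first period mixing in the low prior case can be illustrated with a short Bayes' rule calculation. This is a sketch under the example's assumptions (full disclosure in period 1, consumer payoffs of 1 and -1 from buying high and low quality); it takes the mixing probability mu1 / (1 - mu1) as given and checks that it lifts the posterior to exactly 1/2. The function name is hypothetical.

```python
def low_prior_mixing(mu1: float):
    """Return (mixing probability, posterior after a good signal, consumer 1 payoff).

    Assumes 0 < mu1 < 1/2, full disclosure in period 1, and that the normal
    firm plays H with probability mu1 / (1 - mu1).
    """
    if not 0.0 < mu1 < 0.5:
        raise ValueError("this sketch covers priors strictly between 0 and 1/2")

    pi = mu1 / (1 - mu1)  # normal firm's probability of high quality

    # Bayes' rule after a good first period signal: the good type always
    # generates it, the normal type only when it played H.
    posterior = mu1 / (mu1 + (1 - mu1) * pi)

    # First consumer's expected payoff from buying: +1 for high quality,
    # -1 for low quality.
    prob_high = mu1 + (1 - mu1) * pi
    first_payoff = prob_high * 1 + (1 - prob_high) * (-1)
    return pi, posterior, first_payoff


if __name__ == "__main__":
    for mu1 in (0.2, 0.25, 0.4):
        pi, posterior, payoff = low_prior_mixing(mu1)
        print(mu1, round(pi, 4), round(posterior, 4), round(payoff, 4))
```

The first consumer's payoff works out to 4 * mu1 - 1, which is non-negative only for priors of at least 1/4, in line with the threshold in Proposition 3.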

3 Model

For the general analysis, I consider a finite horizon continuous time model. Continuous time simplifies the characterization of the optimal policy without changing it qualitatively. [2] As before, there is a designer, a firm, and a continuum of consumers (one for each time $t$). At each time $t \in [0, T]$ (for some $T < \infty$), the firm and consumer make simultaneous choices: the firm chooses quality $q(t) \in \{L, H\}$, and the consumer chooses action $a(t) \in \{y, n\}$, indicating whether or not the consumer buys. The payoff matrix below describes the firm's flow payoffs and the consumer's payoffs at any time $t$:

          y               n
    H     p - c, b - p    -c, 0
    L     p, -p           0, 0

Figure 4: Firm and Consumer Payoffs

The price of the good being sold is fixed at $p > 0$, but the firm may adjust the quality. [3] The consumer's value of high quality compared to low quality is $b > p$. Finally, $c > 0$ is the marginal cost of producing high quality instead of low quality. Assume that $b > c > p$. The upper bound ensures that high quality is socially efficient. The lower bound focuses attention on the case where a full information policy is suboptimal.

The designer observes the arrivals of a Poisson process with time $t$ arrival rate of $\lambda \cdot 1\{q(t) = L\} \cdot 1\{a(t) = y\}$ for parameter $\lambda > 0$ ($1\{\cdot\}$ is an indicator function). This is a perfect bad news model, since the good type of firm can never produce an arrival of this process. However, the normal type can imitate the good type for as long as it pays flow cost $c$. The factor $1\{a(t) = y\}$ differentiates this from the standard perfect bad news model. In this model, news can only arrive while consumers are buying. A natural interpretation of this model is as follows: The designer's information comes entirely from consumer reviews. Consumers can only submit reviews if they buy. Finally, consumers do not always notice low quality ($\lambda < \infty$). The signal that the designer observes at time $t$ will be called $x(t)$, where $x(t) = 1$ if there was bad news at time $t$ and $x(t) = 0$ otherwise.
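To make the perfect bad news structure concrete, here is a small sketch of the arrival rate and of how the designer's belief would move when no bad news arrives. The belief formula is ordinary Bayes' rule under an assumption that is mine, not the paper's: the normal type shirks and consumers buy throughout the interval, so the normal type avoids bad news with probability exp(-lam * duration). All function names are hypothetical.

```python
import math


def arrival_rate(lam: float, q: str, a: str) -> float:
    """Bad news arrival rate: lam only when quality is low and the consumer buys."""
    return lam if (q == "L" and a == "y") else 0.0


def belief_after_no_news(mu0: float, lam: float, duration: float) -> float:
    """Designer's belief in the good type after `duration` with no bad news.

    Assumption for this sketch: the normal type plays L and consumers buy
    for the whole interval, so bad news arrives at constant rate lam.
    """
    # The good type never generates bad news; the normal type survives
    # without an arrival with probability exp(-lam * duration).
    survive_normal = math.exp(-lam * duration)
    return mu0 / (mu0 + (1 - mu0) * survive_normal)


if __name__ == "__main__":
    print(arrival_rate(2.0, "L", "y"), arrival_rate(2.0, "L", "n"))
    for duration in (0.0, 0.5, 1.0, 2.0):
        print(duration, round(belief_after_no_news(0.3, 1.0, duration), 4))
```

When consumers do not buy, the rate is zero and the belief cannot move, which is the feature that separates this setting from the standard perfect bad news model.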
The designer sends messages $m(t)$ over time, which are immediately observed by the firm and all consumers. A public history $h^t \in H$ is the history of past messages:

[2] A key difference is that the policy does not need to randomize in order to drive the firm to indifference. Rather, this can be achieved by controlling the duration of the firm's reward period.
[3] A version of this game where prices are determined by Nash bargaining given consumer beliefs yields similar results.

$$h^t = \{m(t')\}_{t' < t}.$$

Figure 5: Timeline at time t
1. Firm and consumer choose $q(t)$ and $a(t)$
2. Designer observes signal
3. Designer sends message $m(t)$ according to $\sigma$
4. Firm and consumers observe $m(t)$ and update beliefs

Consumers observe only the public history, so a consumer strategy is a mapping $a : H \to \{y, n\}$ (mixed strategies are not allowed, although this is not restrictive, because the designer never wants consumers to mix). The designer's history $h^t_d$ contains all past messages and signal arrivals. Let $\{t_i\}_{i \in \{1,\dots,n\}}$ be a history of bad news arrival times up to and including time $t$. Then the designer's time $t$ history is $h^t_d = (\{t_i\}_{i \in \{1,\dots,n\}}, h^t) \in H_d$. Then, a designer's policy is $\sigma : H_d \to \Delta(M)$, where $M$ is an arbitrary message space satisfying $|M| \ge 2$. Finally, the normal firm knows the public history and the bad news arrivals prior to time $t$ (call this $\{t_i\}_{i \in \{1,\dots,m\}}$) [4] and also knows its actual history of quality: $h^t_f = (\{q(t')\}_{t' < t}, \{t_i\}_{i \in \{1,\dots,m\}}, h^t) \in H_f$. The firm chooses from strategies $q : H_f \to \{H, L\}$. For simplicity, $q(h^t_f)$ may refer to the probability of $H$ on firm history $h^t_f$. This definition of strategies implies a time $t$ timeline as shown in Figure 5.

Let the designer's and consumer's realized beliefs before the stage game at time $t$ be $\mu(t)$ and $\nu(t)$, respectively. There is a common prior, so $\mu(0) = \nu(0)$. These beliefs depend on the observed histories, so $\nu(t) = \nu(h^t)$ and $\mu(t) = \mu(h^t_d)$. The consumer at time $t$ best responds according to the stage game in Figure 4 given beliefs $\nu(t)$. The cutoff belief for the consumer given that the normal type chooses $L$ is $\nu^*$, where $\nu^* b - p = 0$, which implies that $\nu^* = \frac{p}{b}$ (this is the analogue of belief $\frac{1}{2}$ in the example). Any policy and strategy profile induces a stochastic process $(q(t), a(t))$. Let $E_{q,a,\sigma}[\,\cdot \mid h]$ be the expectation conditional on observing history $h$ and believing strategies $q$ and $a$ determine quality and buying (probability 0 histories can lead to any beliefs).
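The consumer's stage-game best response can be sketched in a few lines of Python. This is a minimal illustration under the payoffs of Figure 4 as read here (buying yields $b - p$ under high quality and $-p$ under low quality); the function names and the parameter `q_hat` (the believed probability that the normal type chooses $H$) are assumptions introduced for the example, not notation from the paper.

```python
def cutoff_belief(b: float, p: float) -> float:
    """Belief nu* solving nu* * b - p = 0 (normal type assumed to play L)."""
    return p / b


def consumer_buys(nu: float, q_hat: float, b: float, p: float) -> bool:
    """Return True when buying gives a non-negative expected payoff.

    nu    : belief that the firm is the good type
    q_hat : believed probability that the normal type chooses H (hypothetical name)
    """
    prob_high = nu + (1.0 - nu) * q_hat  # total probability of high quality
    return prob_high * b - p >= 0.0
```

With `q_hat = 0` the rule reduces to comparing `nu` with `cutoff_belief(b, p)`, which is the cutoff described in the text.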
Given policy $\sigma$ and believing the consumers to act according to strategy $\hat a$, at each time $t$, the firm maximizes $E_{q,\hat a,\sigma}\left[\int_t^T u_f(q(s), a(s))\, ds \mid h^t_f\right]$, where $u_f(q, a)$ is the firm's flow payoff from Figure 4. For any given policy $\sigma$, an equilibrium is a strategy profile $(q, a)$, a believed profile $(\hat q, \hat a)$, and beliefs $\nu$ such that

[4] It may seem unreasonable to assume that the firm knows about all of the reviews past consumers have sent to the designer even when the designer conceals or otherwise obscures them from view. However, consider the limit as $\lambda \to \infty$. In the limit, buying consumers always notice low quality, so the firm does know the history of submitted reviews. Finite $\lambda$ distorts the designer's information about submitted reviews, but this assumption is reasonable for large $\lambda$.

1. For every $h^t \in H$, $a(h^t) = y$ if and only if $\nu(h^t) + [1 - \nu(h^t)]\, E_{\hat q,\hat a,\sigma}[q(h^t_f) \mid h^t] \ge \nu^*$.
2. For every $h^t_f \in H_f$, $q$ maximizes $E_{q,\hat a,\sigma}\left[\int_t^T u_f(q(s), a(s))\, ds \mid h^t_f\right]$.
3. For every $h^t \in H$, $\nu(h^t)$ is derived from Bayes Rule given $\mu(0)$, $\hat q$, $\hat a$, and $\sigma$ whenever possible. When $h^t$ is impossible, $\nu(h^t) = 0$.
4. $(\hat q, \hat a) = (q, a)$.

The beliefs consistency condition (3) required by this definition of equilibrium is a mild one. [5] Beliefs are updated according to Bayes Rule whenever possible, as in a weak perfect Bayesian equilibrium. Since consumers know the true policy chosen by the designer, the only probability 0 public histories are those with unexpected messages given this policy. Since the good type's strategy is fixed, unexpected messages can only be generated by the normal type, so the consumers' beliefs on these histories should be 0.

I will restrict the designer to policies that only depend on current beliefs, the current signal, and time, so the message distribution on any designer history is $\sigma(\mu(t), \nu(t), x(t), t)$. For any policy $\sigma$, $\mathcal{E}(\sigma)$ is the set of equilibrium profiles satisfying the above four conditions and a Markov condition. This Markov condition prevents the players from acting based on payoff-irrelevant aspects of the history. Only the current beliefs and time are payoff-relevant, so the firm's strategy is $q(\mu(t), \nu(t), t)$ and the consumer's strategy is $a(\nu(t), t)$. The time $t$ firm continuation payoffs are denoted $V_f(\mu(t), \nu(t), t)$, and designer continuation payoffs are denoted $V_d(\mu(t), \nu(t), t)$. The designer chooses a policy to maximize

$$\sup_{(\hat q,\hat a) \in \mathcal{E}(\sigma)} E_{\hat q,\hat a,\sigma}\left[\int_0^T u_c(q(t), a(t))\, dt\right],$$

where $u_c(q, a)$ is the consumer's payoff given in Figure 4. In other words, the designer evaluates policies based on their best equilibrium outcomes. An optimal policy is one for which

$$V_d(\mu(0), \nu(0), 0) = \max_{\sigma} \left\{ \sup_{(\hat q,\hat a) \in \mathcal{E}(\sigma)} E_{\hat q,\hat a,\sigma}\left[\int_0^T u_c(q(t), a(t))\, dt\right] \right\} \equiv V_d^*.$$

4 Optimal Policy

Before describing the main result, I formally define full information and no information.
A policy is full information at time $t$ if there exist messages $m_H, m_L \in M$ such that for all histories $h^t_d$, $\sigma(h^t_d)$ places probability 1 on $m_L$ whenever $x(t) = 1$ and places probability 1 on $m_H$ otherwise. A policy is no information at time $t$ if there exists $m_\phi \in M$ such that for all histories $h^t$, $\sigma(h^t)$ places probability 1 on $m_\phi$. [6]

[5] The consistency condition here is the same as in a sequential equilibrium.
[6] It is possible to define full information and no information more generally than this, but the given definitions suffice for the main result.
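As a small illustration of these two definitions, the sketch below encodes each policy as a map from the current signal $x(t)$ to a message. The string labels standing in for $m_H$, $m_L$, and $m_\phi$ and the function names are assumptions made for the example.

```python
M_H, M_L, M_PHI = "m_H", "m_L", "m_phi"  # illustrative message labels


def full_information(x: int) -> str:
    """Reveal the signal: m_L after bad news (x == 1), m_H otherwise."""
    return M_L if x == 1 else M_H


def no_information(x: int) -> str:
    """Send the same message regardless of the signal."""
    return M_PHI
```

Under `no_information` the message is independent of `x`, so observing it cannot move consumer beliefs.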

[Figure 6: Disclosure Probability in a PFN Policy. Vertical axis: $z(t)$, from 0 to 1; horizontal axis: $t$, with marks at $\tau_1$, $\tau_2$, and $T$.]

A PFN policy is a policy separated into three sequential time intervals by cutoffs $\tau_1$ and $\tau_2$ such that:
1. Partial Disclosure: For all $t \in [0, \tau_1)$, the designer reveals bad news at time $t$ with probability $z(t)$, and the normal firm chooses high quality with probability $\rho(t) \in (0, 1)$.
2. Full Disclosure: For all $t \in [\tau_1, \tau_2)$, there is full information and high quality at time $t$.
3. No Disclosure: For all $t \in [\tau_2, T]$, there is no information and low quality from the normal type at time $t$.

Intuitively, the first phase is a learning phase, the second phase is the consumer reward phase, and the last phase is the firm reward phase. Let $V_d^*(\lambda)$ be the designer's payoff from an optimal policy given arrival rate $\lambda$, and let $V_d^{\sigma}(\lambda)$ be the designer's payoff from policy $\sigma$ given $\lambda$ (these also depend on other parameters, which are omitted). The main result proves that a PFN policy can achieve payoffs arbitrarily close to optimal for large $\lambda$:

Theorem 1. There exists a mapping $\sigma(\lambda)$ specifying a PFN policy for each $\lambda$ such that $V_d^{\sigma(\lambda)}(\lambda) - V_d^*(\lambda) \to 0$ as $\lambda \to \infty$.

Figure 6 illustrates a PFN policy for a fixed $\lambda$. The solid black line gives the probability of revealing a bad news arrival at each point in time. The learning phase disclosure

probability is designed so that the firm is indifferent between high and low quality. In this phase, the firm always mixes, so beliefs trend upwards in the absence of revealed bad news. As time passes without a bad news disclosure, the firm's continuation payoff rises, so a lower disclosure probability makes the firm indifferent. In the second phase, $z(t) = 1$ to induce high quality. Finally, the third phase features $z(t) = 0$ to reward the firm by letting it produce low quality without consequences. As $\lambda$ increases, learning happens in a shorter time. In the limit as $\lambda \to \infty$, learning is instantaneous, so $\tau_1 \to 0$ (indicated by the blue dashed arrows). This last point is not immediately obvious, as a faster arrival rate requires a lower disclosure rate to keep the firm indifferent. Nevertheless, the proof will show that the effect of the arrival rate dominates, and learning converges to instantaneous.

Proof. First, I construct the PFN policy and derive its equilibrium behavior and payoffs.

Lemma 1. For any $\lambda$ sufficiently large and $\xi$ sufficiently small, there is a PFN policy with the following properties:
1. First cutoff: τ1 (λ) = max { λ ( p)
2. Second cutoff: τ 2 (λ, ξ) = p T p [ ( ) p ] } ν 1 ν(0) (1 ρ) 1, 0 1 ν ν(0) τ 1 1 λ ξ
3. Learning phase disclosure probability: $z(t) = \frac{c}{\lambda (c - p)\, t + c}$
4. Learning phase high quality probability: $\rho = \max\left\{\frac{p - \nu(0)\, b}{b - \nu(0)\, b}, 0\right\}$

Proof. See Section A.4. Q.E.D.

Lemma 1 accomplishes several things. First of all, it proves that an incentive compatible PFN policy always exists for large $\lambda$. Secondly, this policy has a learning phase that shrinks to duration 0 as $\lambda \to \infty$. Furthermore, the firm's payoff at the start of the high effort phase can be shrunk to the firm's indifference point by reducing $\xi$, as long as $\lambda$ is sufficiently high for each $\xi$. Finally, Lemma 1 explicitly defines a PFN policy which can be made arbitrarily close to optimal, which is important for applying such a policy in practice. Let $\sigma$ be the policy from Lemma 1 and let $V_d^{\sigma} \equiv \lim_{\lambda \to \infty} V_d^{\sigma(\lambda)}$.
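The three-phase structure of a PFN policy can be written as a piecewise disclosure schedule. The following is a rough sketch rather than the construction in Lemma 1: the cutoffs `tau1` and `tau2` and the learning-phase schedule `z_learning` are taken as inputs, and `example_schedule` is a purely hypothetical decreasing function used only to exercise the code.

```python
from typing import Callable


def pfn_disclosure_probability(
    t: float,
    tau1: float,
    tau2: float,
    T: float,
    z_learning: Callable[[float], float],
) -> float:
    """Probability that a bad news arrival at time t is revealed."""
    if not 0.0 <= tau1 <= tau2 <= T:
        raise ValueError("need 0 <= tau1 <= tau2 <= T")
    if not 0.0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    if t < tau1:
        return z_learning(t)  # partial disclosure (learning phase)
    if t < tau2:
        return 1.0            # full disclosure (consumer reward phase)
    return 0.0                # no disclosure (firm reward phase)


def example_schedule(t: float) -> float:
    """Hypothetical decreasing learning-phase schedule, for illustration only."""
    return 1.0 / (1.0 + t)
```

For instance, `pfn_disclosure_probability(2.0, 1.0, 3.0, 5.0, example_schedule)` falls in the full disclosure interval, while any `t` in `[3.0, 5.0]` falls in the no disclosure interval.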
In the limit, the learning period of the PFN policy shrinks to zero duration, so $\tau_1$ can be ignored. Furthermore, $\xi$ can be made arbitrarily small. Define $\tau^*(t) \equiv \frac{p}{c}(T - t) + t$ and $\hat\nu(t) \equiv \min\left\{\frac{1 - \nu^*}{1 - \nu(t)} \cdot \frac{\nu(t)}{\nu^*}, 1\right\}$. To fill in the unspecified parts of the value function, for any $\nu < \nu^*$, let beliefs be immediately driven up to $\nu^*$. This is done with a vanishing learning phase in which the firm chooses high quality with probability $\rho(\nu) = \max\left\{\frac{p - \nu b}{(1 - \nu)\, b}, 0\right\}$.
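The three limit objects just defined are easy to evaluate numerically. The sketch below implements $\tau^*(t)$, $\hat\nu$, and $\rho(\nu)$ as stated; the function names and the parameter values in the usage note are illustrative assumptions, and the interpretations in the comments are one reading of the formulas rather than statements from the paper.

```python
def tau_star(t: float, T: float, p: float, c: float) -> float:
    """Switching time tau*(t) = (p / c) * (T - t) + t."""
    return (p / c) * (T - t) + t


def rho(nu: float, b: float, p: float) -> float:
    """Learning-phase probability of high quality, max{(p - nu*b) / ((1 - nu)*b), 0}."""
    if nu >= p / b:  # numerator is non-positive here; also avoids dividing by 0 at nu = 1
        return 0.0
    return (p - nu * b) / ((1.0 - nu) * b)


def nu_hat(nu: float, b: float, p: float) -> float:
    """min{ (1 - nu*) / (1 - nu) * nu / nu*, 1 } with nu* = p / b."""
    nu_star = p / b
    if nu >= nu_star:  # ratio is at least 1 here; also avoids dividing by 0 at nu = 1
        return 1.0
    return (1.0 - nu_star) / (1.0 - nu) * nu / nu_star
```

With the illustrative values $b = 4$, $c = 2$, $p = 1$, $T = 10$, one gets `tau_star(0, 10, 1, 2) == 5.0`, and a firm that supplies high quality on $[0, 5)$ and low quality on $[5, 10]$ earns $(p - c) \cdot 5 + p \cdot 5 = 0$ if consumers buy throughout. At $\nu = 0.1$, `rho` sets the consumer's flow payoff $\nu (b - p) + (1 - \nu)(\rho b - p)$ to zero, and `nu_hat` is the weight on normal types for which the belief $\nu / (\nu + (1 - \nu)\hat\nu)$ equals $\nu^* = 0.25$.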

Note that $p - \rho(\nu)\, b > 0$, $\forall \nu$, so the normal firm contributes negative flow payoffs to the designer in the learning phase. The time at which the normal firm switches from high to low quality is always $\tau^*(0)$. Then the designer's limit payoff is

$$V_d^{\sigma}(\mu, \nu, t) = \mu\, (T - t)(b - p) + (1 - \mu)\, \hat\nu\, \left[ 1\{t < \tau^*(0)\}\, (\tau^*(0) - t)\, b - p\, (T - t) \right].$$

Now, the proof must confirm that this limiting policy is optimal amongst all possible information policies. Consider a variation on the designer's problem where the designer learns the firm's type immediately after choosing a policy. The solution to this problem will give a weakly higher payoff than the optimal policy in the original problem, since the designer can always ignore the firm's type. Using the dynamic obfuscation principle of Ely [2015], the designer can choose the distribution of posterior beliefs of the consumer subject to the constraint that this distribution must be consistent with the prior. The designer's approximate value function for small $dt > 0$ is

$$V_d(\nu, t) \approx \max_{f \in \Delta([0,1]),\ E[f] = \nu} E_f\left\{ u(\nu, t)\, dt + V_d(\nu(t + dt), t + dt) \right\}.$$

Here, $u(\nu, t)$ is the designer's expectation of the consumer's flow payoff with beliefs $\nu$ given the optimal policy. Consumer beliefs do not evolve on their own, so $\nu(t + dt) = \nu$. Using first-order Taylor approximations around $dt = 0$, subtracting $V_d(\nu, t)$ from both sides, dividing by $dt$, and taking the limit as $dt \to 0$ gives the appropriate Hamilton-Jacobi-Bellman (HJB) equation:

$$0 = \max_{f \in \Delta([0,1]),\ E[f] = \nu} E_f\left\{ u(\nu, t) + \frac{\partial V_d}{\partial t}(\nu, t) \right\}.$$

This is a problem in the standard form found in Aumann, Maschler, and Stearns [1995] and Kamenica and Gentzkow [2011], so the solution to the maximization is the concavification (denoted by $\mathrm{cav}\{\cdot\}$) of the expression in braces:

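$$0 = \mathrm{cav}\left\{ u(\nu, t) + \frac{\partial V_d}{\partial t}(\nu, t) \right\}.$$

[Figure 7: $u(\nu, t) + \frac{\partial V_d^{\sigma}}{\partial t}(\nu, t)$ for $t < \tau^*(0)$. Horizontal axis: $\nu$, with marks at 0, $\nu^*$, and 1.]

In the conjectured optimal policy, if $t < \tau^*(0)$ and $0 < \nu < \nu^*$, then the policy is in the learning phase. The derivative of the designer's continuation payoff is [7]

$$\frac{\partial V_d^{\sigma}}{\partial t}(\nu, t) = -\nu\, (b - p) - (1 - \nu)\, \hat\nu\, (b - p).$$

The flow payoffs for this case are $u(\nu, t) = \nu\, (b - p) + (1 - \nu)(\rho(\nu)\, b - p) = 0$. If $\nu \ge \nu^*$, then $u(\nu, t) = b - p$. Therefore,

$$u(\nu, t) + \frac{\partial V_d^{\sigma}}{\partial t}(\nu, t) = \begin{cases} 0 & \text{if } \nu \ge \nu^* \\ -\nu\, (b - p) - (1 - \nu)\, \hat\nu\, (b - p) & \text{if } 0 < \nu < \nu^* \\ 0 & \text{if } \nu = 0 \end{cases}$$

This is graphed in Figure 7. The solid line is $u(\nu, t) + \frac{\partial V_d^{\sigma}}{\partial t}(\nu, t)$ and the dashed blue line shows where the concavification differs. The concavification is equal to zero, as desired.

[7] Note that the common prior assumption implies that the expectation of $\mu$ is equal to $\nu$.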
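The piecewise expression above can be checked numerically by building it from its two ingredients, the flow payoff $u$ and the time derivative of the value function, and confirming that the sum is zero at $\nu = 0$ and on $[\nu^*, 1]$ and negative in between. This is a sanity check on a grid under the illustrative values $b = 4$, $p = 1$ (an assumption made for the example), not a general proof.

```python
def bracket_term(nu: float, b: float, p: float) -> float:
    """u(nu, t) + dV/dt(nu, t) for t < tau*(0), assembled from its two pieces."""
    nu_star = p / b
    if nu >= nu_star:
        flow = b - p  # u(nu, t) = b - p when nu >= nu*
        nu_hat = 1.0
    else:
        rho = (p - nu * b) / ((1.0 - nu) * b)  # learning-phase mixing probability
        flow = nu * (b - p) + (1.0 - nu) * (rho * b - p)
        nu_hat = (1.0 - nu_star) / (1.0 - nu) * nu / nu_star
    dv_dt = -nu * (b - p) - (1.0 - nu) * nu_hat * (b - p)
    return flow + dv_dt


b, p = 4.0, 1.0  # illustrative parameter values
grid = [i / 100 for i in range(101)]
values = [bracket_term(nu, b, p) for nu in grid]
```

Because the function is never positive and equals zero at both ends of the interval $[0, \nu^*]$, the smallest concave function lying above it is identically zero, which is the conclusion drawn in the text.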

If $t \ge \tau^*(0)$, then no information is being given to consumers in the conjectured policy. If $\nu < \nu^*$, then the consumers never buy, leading to constant flow payoffs of 0. If $\nu \ge \nu^*$, then the consumers buy and there is a constant flow payoff of $\nu b - p$. In either case, the expression in braces is constant at 0, so the concavification is trivially constant at 0. Q.E.D.

The fact that there is an optimal policy that is so simple is quite striking. The designer has at its disposal all possible signaling rules with which to manipulate the actions of the firm and the consumers, yet everything boils down to a straightforward combination of full information and no information (except for the learning period, which shrinks to 0 as $\lambda \to \infty$). Returning to the motivation of the designer as a review platform, the optimal platform design is simple: Publish just enough reviews early on to separate out some normal firms, then publicize all reviews for a time and only begin concealing reviews after the firm has had an unblemished reputation for a certain period of time. The stylized takeaway is that early information revelation followed by concealment is very effective at incentivizing reputation building and high quality from firms, while also overcoming the cold start problem and preventing market shutdown.

5 Infinite Horizon

Now, I consider the same model with only two changes: there is no time horizon (i.e., $T = \infty$), and the firm and designer discount their flow payoffs with discount rate $r > 0$. In this setting, there is a potential for high quality under the full information policy. As long as $p > c$, low $r$ and high $\lambda$ can lead to high quality in a full information equilibrium. However, when $c > p$, this is impossible. [8] Since the consumers are short-lived players, the standard folk theorem does not hold, because consumers are unwilling to take sufficiently low short-term payoffs to give the firm rewards for high quality. However, the designer can still incentivize the firm by manipulating information as long as $c < b$.
Therefore, no additional assumptions are necessary in the infinite horizon model. A one shot PFN policy may not be optimal in the infinite horizon, because the designer's value function is not everywhere concave in $\nu$. If $t$ is large enough, the normal

[8] The firm's equilibrium payoff cannot exceed its pure-action Stackelberg payoff, which is its maximum payoff when it publicly commits to its action before the consumer acts (for a discrete time proof, see Chapter 2.7 of Mailath and Samuelson [2006]). Announcing a commitment to $H$ guarantees the firm a negative payoff, since $p < c$. Announcing $L$ leads consumers to not buy, for a payoff of 0. Since the firm cannot ever have a continuation payoff exceeding 0, the condition $\lambda V_f(\nu, t) \ge c$ can never hold. This is the full information high quality condition (with a grim trigger punishment), so the firm always produces low quality.

types contribute negatively to the designer's value, since they will switch from $H$ to $L$ very soon (or have already). The designer can overcome this problem by smoothing out the payoffs over time. One natural way of doing this is simply by repeating the finite horizon PFN policy. A policy is a repeated PFN policy with cycle length $T$ if for all $k \in \{0, 1, \dots\}$, the policy restricted to $[kT, (k + 1)T)$ is a finite horizon PFN policy with horizon $T$. Such a policy is illustrated in Figure 8.

[Figure 8: Disclosure Probability in a Repeated PFN Policy with Cycle Length T. Vertical axis: $z(t)$, from 0 to 1; horizontal axis: $t$.]

Theorem 2. There exists a mapping $\sigma(\lambda)$ specifying a repeated PFN policy for each $\lambda$ such that $V_d^{\sigma(\lambda)}(\lambda) - V_d^*(\lambda) \to 0$ as $\lambda \to \infty$.

Proof. The proof works by characterizing the optimal repeated PFN policy for any $T$ and then showing that this is optimal amongst all policies in the limit as $T \to 0$. Since the mapping $\sigma(\lambda)$ can specify policies such that $T \to 0$ as $\lambda \to \infty$, this will be sufficient to prove the result. Note that the learning phase need not change in any substantial way from the policy constructed in Theorem 1 (the designer must account for the firm's discount rate, but this distinction vanishes as $\lambda \to \infty$). The following Lemma gives the designer payoff specified by the optimal repeated PFN policy in the limit as $T \to 0$ and $\lambda \to \infty$:

Lemma 2. Let $\sigma^{rep}(\lambda, T)$ be the optimal repeated PFN policy with cycle length $T$ given $\lambda$. Then

$$r \lim_{\lambda \to \infty,\ T \to 0} V_d^{\sigma^{rep}(\lambda, T)}(\lambda) = \nu\, (b - p) + (1 - \nu)\, \hat\nu\, \frac{p}{c}\, (b - c).$$

Proof. See Section A.5. Q.E.D.

For an optimal policy, the value function is

$$V_d(\nu, t) \approx \max_{f \in \Delta([0,1]),\ E[f] = \nu} E_f\left\{ \int_t^{t + dt} e^{-r (s - t)}\, u(\nu, s)\, ds + e^{-r\, dt}\, V_d(\nu, t + dt) \right\}.$$

The first-order Taylor approximation of $e^{-r\, dt}\, V_d(\nu, t + dt)$ around $dt = 0$ is $V_d(\nu, t) - r\, V_d(\nu, t)\, dt + \frac{\partial V_d}{\partial t}(\nu, t)\, dt$. Rearranging, dividing by $dt$, and taking the limit as $dt \to 0$,

$$r\, V_d(\nu, t) = \max_{f \in \Delta([0,1]),\ E[f] = \nu} E_f\left\{ u(\nu, t) + \frac{\partial V_d}{\partial t}(\nu, t) \right\} = \mathrm{cav}\left\{ u(\nu, t) + \frac{\partial V_d}{\partial t}(\nu, t) \right\}.$$

Note that in the conjectured optimal policy, $\frac{\partial V_d^{\sigma}}{\partial t}(\nu, t) = 0$, $\forall \nu, t$. For $\nu \ge \nu^*$, the normal firm chooses high quality $\frac{p}{c}$ proportion of the time, and for $\nu \in (0, \nu^*)$, the normal firm chooses high quality with probability $\rho(\nu)$, so

$$u(\nu, t) + \frac{\partial V_d^{\sigma}}{\partial t}(\nu, t) = \begin{cases} \nu\, (b - p) + (1 - \nu)\left(\frac{p}{c}\, b - p\right) & \text{if } \nu \ge \nu^* \\ \nu\, (b - p) + (1 - \nu)(\rho(\nu)\, b - p) & \text{if } 0 < \nu < \nu^* \\ 0 & \text{if } \nu = 0 \end{cases}$$

By the definition of $\rho(\nu)$, this expression is constant at 0 for all $\nu \in [0, \nu^*)$. For $\nu \ge \nu^*$, the expression is identical to $r\, V_d^{\sigma}$. For $0 < \nu < \nu^*$ it is strictly less than $r\, V_d^{\sigma}$, due to the loss from learning (consumers sometimes buy despite low quality), but both are equal to 0 when $\nu = 0$. Thus, the concavification makes them equal, as shown in Figure 9. The solid black line is the graph of $u(\nu, t) + \frac{\partial V_d^{\sigma}}{\partial t}(\nu, t)$, and the dotted blue line shows where the concavification differs. Hence, the limiting value function is optimal. Q.E.D.

Theorem 2 shows that alternating between full information and no information is close to optimal. However, the limit payoff of the repeated PFN policy is $\nu\, (b - p) + (1 - \nu)\, \hat\nu\, \frac{p}{c}\, (b - c)$, which is the same as the one shot PFN policy at $t = 0$. Therefore, a PFN policy similar to that used in Theorem 1 is still optimal:

Corollary 1. There exists a one shot PFN policy where the normal firm transitions from high to low quality at time $\tau_2 = \tau_1 + \max\left\{ \frac{1}{r} \ln \frac{\lambda c}{r c + \lambda (c - p)}, 0 \right\}$ that is approximately optimal for large $\lambda$.
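The concavification step in this proof can be illustrated numerically. The sketch below compares the expression in braces (with the time derivative set to zero) against the limit payoff from Lemma 2 on a grid of beliefs and checks three things: the Lemma 2 expression lies weakly above the bracket term, the two coincide at $\nu = 0$ and for $\nu \ge \nu^*$, and the Lemma 2 expression is concave on the grid. The parameter values $b = 4$, $c = 2$, $p = 1$ and all function names are assumptions chosen for the example.

```python
def flow_term(nu: float, b: float, c: float, p: float) -> float:
    """u(nu, t) + dV/dt(nu, t) with dV/dt = 0 in the stationary limit."""
    if nu >= p / b:
        return nu * (b - p) + (1.0 - nu) * ((p / c) * b - p)
    return 0.0  # learning region: rho(nu) sets the flow payoff to zero


def r_times_value(nu: float, b: float, c: float, p: float) -> float:
    """nu*(b - p) + (1 - nu)*nu_hat*(p/c)*(b - c), the limit in Lemma 2."""
    nu_star = p / b
    if nu >= nu_star:
        nu_hat = 1.0
    else:
        nu_hat = (1.0 - nu_star) / (1.0 - nu) * nu / nu_star
    return nu * (b - p) + (1.0 - nu) * nu_hat * (p / c) * (b - c)


def is_concave_on_grid(ys: list, tol: float = 1e-9) -> bool:
    """Second differences of a concave function on an even grid are non-positive."""
    return all(ys[i - 1] - 2.0 * ys[i] + ys[i + 1] <= tol for i in range(1, len(ys) - 1))


b, c, p = 4.0, 2.0, 1.0  # illustrative values with b > c > p
grid = [i / 100 for i in range(101)]
f_vals = [flow_term(nu, b, c, p) for nu in grid]
rv_vals = [r_times_value(nu, b, c, p) for nu in grid]
```

On $[0, \nu^*]$ the product $(1 - \nu)\hat\nu$ is linear in $\nu$, so `r_times_value` is a straight line from the origin to the bracket term's value at $\nu^*$, which is the chord that a concavification would draw.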

[Figure 9: $u(\nu, t) + \frac{\partial V_d^{\sigma}}{\partial t}(\nu, t)$ in the optimal policy. Horizontal axis: $\nu$, with marks at 0, $\nu^*$, and 1; vertical axis marks at 0, $p(b - c)/c$, and $b - p$.]

Both the repeated and the one shot PFN policy feature permanence of reputations. There is a positive probability (probability $\hat\nu$) that the consumers never learn a normal firm's type, even in the limit as $t \to \infty$. Theorem 2 shows that for the parameter ranges considered here, this is optimal. Essentially, permanent reputations are valuable to the designer, because they allow for the construction of permanent incentives for the firm to produce high quality. By keeping the consumers uncertain, the designer can continue to threaten the firm with the release of bad reviews if it does not produce high quality. In light of Corollary 1, repeating the PFN policy may seem needlessly complicated. However, in the next section, I will show that the repeated PFN policy outperforms the one shot PFN policy when the designer has some freedom to revise its policy.

6 Policy Revision

Thus far, I have assumed that the designer commits to its policy once and can never revise it. However, policy revisions are common in practice. For example, Amazon modified its product rating algorithm in 2015 [9], and TripAdvisor rolled out a new version of its Popularity Ranking algorithm in early 2016. [10] Presumably, a platform could redesign its policy to try to improve equilibrium product quality. However, the temptation to do so can neutralize the incentives of the original policy. Therefore, the designer would prefer a policy that does not tempt it to redesign the policy later. Now, consider a policy revision model in which there is only partial commitment.

[9] http://www.geekwire.com/2015/amazon-changes-its-influential-formula-for-calculating-product-ratings/
[10] https://www.tripadvisor.com/tripadvisorinsights/n2701/changes-tripadvisor-popularity-ranking-algorithm

In this new model, the designer is free at the very beginning of any time $t$ to announce a new policy or keep the same policy. [11] If the designer announces a new policy, it incurs a one time additive cost $\varepsilon > 0$. This encompasses various costs such as the costs of writing new computer code or the costs of publicizing the change in policy. At each time $t$ and history $h^t_d$, the designer can either not revise, getting the continuation payoff of the existing policy, or revise and get $V_d^* - \varepsilon$. This defines the designer's payoffs in a modified game in which the designer adjusts its policy over time. A policy $\sigma$ is called robust to revision if there exists an equilibrium in this modified game where the designer chooses $\sigma$ at $t = 0$ and never revises.

In the infinitely repeated game with no revisions, both the one shot PFN policy and the repeated PFN policy are approximately optimal. However, the former policy tempts the designer to revise the policy. In the PFN policy, the low quality phase is bad for consumers, and it is permanent. The designer would like to return to the original $t = 0$ policy to initiate a new high quality phase. As a result, the firm does not believe that the designer will follow through on its original policy and reward it for high quality. Therefore, this policy fails to incentivize the firm to produce high quality.

Proposition 4. There exists $\bar\varepsilon > 0$ such that if $\varepsilon < \bar\varepsilon$, no PFN policy is robust to revision.

Proof. In a PFN policy, at any time in the low quality phase, the designer's continuation payoff is $\frac{1}{r}[\nu b - p]$. The continuation payoff from the optimal revised policy is $\frac{1}{r}\left[\nu\, (b - p) + (1 - \nu)\, \frac{p}{c}\, (b - c)\right]$. Revision is strictly preferred if

$$\varepsilon < \frac{1}{r}\left[\nu\, (b - p) + (1 - \nu)\, \frac{p}{c}\, (b - c)\right] - \frac{1}{r}[\nu b - p] = \frac{1}{r} \cdot \frac{p\, b}{c}\, (1 - \nu) \equiv \bar\varepsilon.$$

Q.E.D.

The repeated PFN policy avoids these problems when $T$ is small, because on any history, the designer's continuation payoff is very close to the highest possible continuation payoff on that history.

Proposition 5.
For any $\varepsilon > 0$, there exists $\bar T$ such that for all $T < \bar T$, the optimal repeated PFN policy with cycle length $T$ is robust to revision for large $\lambda$.

Proof. First, consider the possibility of deviations in the learning phase. For large $\lambda$, the current policy is close to optimal for all consumer beliefs $\nu$. Therefore, the (approximately) best possible revision is to restart from the $t = 0$ policy. However, this is exactly the same as the current policy, so revision is never a best response. For the high and low quality phases, the most tempting conditions for revision are at the start of a low quality phase. However, even at the start of a low quality phase, the

[11] Note that this does not allow the designer to renege on its previous commitments. For example, suppose at some time $t$, there is a bad news arrival that is supposed to be revealed to the consumers. The designer may not suddenly change the policy to hide this bad news.

designer's continuation payoff converges to the upper bound payoff of $\frac{1}{r}\left[\nu\, (b - p) + (1 - \nu)\, \frac{p}{c}\, (b - c)\right]$ as $T \to 0$. Therefore, revision is not beneficial to the designer for small enough $T$. Q.E.D.

Since the designer is already getting arbitrarily close to the optimal payoff at all times $t$, it can never induce a sufficiently large improvement by revising. This shows another advantage of smoothing the value function over time. In the one shot PFN policy, the final phase gives the designer a very low continuation payoff, and it may want to revise the policy. If the designer cannot credibly commit not to reveal in the third phase, then the incentives break down, and the normal firm will not produce high quality. This is not a problem in the limiting repeated PFN policy.

Another interpretation of the repeated PFN policy is as a coarse rating scheme with bad review forgiveness. There are two ratings: thumbs up and thumbs down. The firm starts with a thumbs up rating. The thumbs down rating reveals the firm as a normal type, so it is an absorbing state. The firm may transition from thumbs up to thumbs down due to bad reviews. However, any bad reviews incurred in a low quality phase are forgiven, contingent on no bad reviews in the next high quality phase. As long as the firm erases its bad reviews by periodically choosing high quality, it maintains its thumbs up rating, and future consumers are unaware that it is a normal type.

Qualitatively similar policies can be found in practice. TripAdvisor's Popularity Ranking Index assigns a coarse numerical ranking to hotels based on their history of reviews. Their system gives more weight to recent reviews than it does to older reviews. [12] A hotel can optimize its quality while maintaining its rating by carefully alternating between high and low quality phases, switching to high quality whenever there are enough recent bad reviews.
[13] For example, suppose TripAdvisor considers reviews written within the last $M$ units of time to be recent, and a hotel's ranking will only drop when there are at least 5 recent bad reviews. Then a hotel can alternate between high and low quality, switching back to high quality only when there are 4 recent bad reviews. A sample path resulting from this sort of policy is illustrated in Figure 10.

It is surprising that an optimal policy in this setting emphasizes the recency of reviews. In the optimal policy of Horner and Lambert [2016], the recency of information is taken into account. However, in that paper there is a dynamically evolving type, so there is an inherently greater value of newer information. In this paper, the firm's type is constant

[12] https://www.tripadvisor.com/tripadvisorinsights/n684/tripadvisor-popularity-ranking-key-factors-and-how-improve
[13] Although qualitatively similar, TripAdvisor's policy differs from the policy in Theorem 2 in its history dependence, which is not allowed in the class of policies considered here.