Dynamic Pricing with Varying Cost L. Jeff Hong College of Business City University of Hong Kong Joint work with Ying Zhong and Guangwu Liu
Outline
1. Introduction
2. Problem Formulation
3. Pricing Policy
4. Regret Analysis
5. Numerical Results
Online Pricing Problem
A company sells a product online.
Customers arrive one at a time and buy one unit of the product if the price is lower than their willingness to pay (WTP).
Customers are homogeneous: they share the same WTP distribution.
The company has a menu of prices to choose from, e.g., 3.99, 4.99 and 5.99.
Question: How should the price be set?
Learning and Earning
The objective is to maximize the cumulative profit by adaptively offering different prices to different customers.
The decision maker faces a tradeoff between exploration of the acceptance probabilities at different prices (learning) and exploitation of the immediate profit (earning).
The problem was first introduced by Rothschild (1974).
Without any assumptions on the WTP distribution, the problem is typically formulated as a multi-armed bandit (MAB) problem.
Multi-armed Bandit
Originally formulated by Robbins (1952), the MAB is an important class of sequential optimization problems.
Objective: Devise a sampling policy among a group of $K \ge 2$ statistical populations (arms) that maximizes the expected cumulative reward over a finite time horizon.
Multi-armed Bandit
Policies are evaluated based on the regret,
$R[T] = T \mu_{i^*} - E\left[ \sum_{t=1}^{T} \mu_{I_t} \right]$,
where $i^*$ is the optimal arm and $I_t$ is the arm chosen in period $t$.
Lai and Robbins (1985) proved that the regret for the MAB problem must grow at least on the order of $\log T$.
The upper-confidence-bound (UCB) policy of Auer et al. (2002) has $R[T] \le C \log T$ for some constant $C > 0$.
UCB Policy
1. Initialization: Play each arm once.
2. Loop: Play the arm $j$ that maximizes
$\bar{\mu}_j + \sqrt{\frac{2 \log t}{T_j(t-1)}}$,
where $\bar{\mu}_j$ is the average reward obtained from arm $j$, $T_j(t-1)$ is the number of times arm $j$ has been played so far, and $t$ is the overall number of plays done so far.
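
The two steps of the UCB policy can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `ucb1` and the `reward_fns` interface (one zero-argument callable per arm, returning a reward in [0, 1]) are assumptions made for the example.

```python
import math

def ucb1(reward_fns, horizon):
    """Run the UCB index policy for `horizon` plays.

    reward_fns[j]() is assumed to return a reward in [0, 1] for arm j
    (a hypothetical interface chosen for this sketch).
    """
    k = len(reward_fns)
    counts = [0] * k    # T_j(t-1): times arm j has been played so far
    means = [0.0] * k   # average reward obtained from arm j
    for t in range(1, horizon + 1):
        if t <= k:
            j = t - 1   # initialization: play each arm once
        else:
            # index = average reward + exploration bonus sqrt(2 log t / T_j(t-1))
            j = max(range(k),
                    key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = reward_fns[j]()
        counts[j] += 1
        means[j] += (r - means[j]) / counts[j]   # running average update
    return counts, means
```

Calling `ucb1` with a list of reward samplers returns how often each arm was played and its estimated mean; arms with lower means should be played only rarely as the horizon grows.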
Varying Cost
In some practical applications, costs may vary across customers.
Online sales of an insurance product:
Potential customers are usually asked to fill in questionnaires before getting quotes for the product.
The insurance company can assess the potential risk (cost) of each individual customer through these questionnaires.
The cost, as is often the case, varies from customer to customer.
To maximize the cumulative profit, different premiums (prices) should be quoted to different customers based on their costs.
Other examples include the sale of perishable goods, e.g., gasoline and fresh fruit.
Notation
$T$: Total length of the selling periods (or number of customers).
$c_t$: The cost observed in period $t$. We assume the costs are i.i.d. samples from a fixed (unknown) distribution on $C$.
$p_1 < p_2 < \dots < p_K$: Price choices. We assume $p_1 > \max_{c \in C} c$.
$K = \{1, 2, \dots, K\}$: Index set of all the prices.
$\mu_k(c)$: The profit function of price $k$ when the cost is $c$,
$\mu_k(c) = E[D(p_k)] (p_k - c) = \pi_k \left( 1 - \frac{c}{p_k} \right)$,
where $\pi_k = E[D(p_k)] p_k$ is the expected revenue at $p_k$. We also assume that the observed revenue $D(p_k) p_k \in [0, 1]$.
Problem Formulation
Consider a company selling a product over $T$ (unknown) periods.
At the beginning of each period $t$, upon observing a cost $c_t$, the decision maker needs to choose a price $p_k$ where $k \in K$.
The index of the true optimal price at time $t$ is
$i^*(c_t) = \arg\max_{k \in K} \mu_k(c_t)$.
Let $I_t(c_t)$ be the index of the price chosen by a pricing policy.
Objective: Find a pricing policy that minimizes the cumulative regret
$R[T] = E\left[ \sum_{t=1}^{T} \left( \mu_{i^*(c_t)}(c_t) - \mu_{I_t(c_t)}(c_t) \right) \right]$.
Why Consider Varying Cost?
Without considering the varying cost, suppose one uses the expected cost $E(c_t)$ in making the pricing decision. The problem becomes an MAB problem:
$\max_{k \in K} \mu_k(E(c_t))$.
Considering the varying cost, the problem is
$\max_{k \in K} \mu_k(c_t)$.
By Jensen's inequality,
$\max_{k \in K} \mu_k(E(c_t)) \le E\left[ \max_{k \in K} \mu_k(c_t) \right]$.
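
The step behind the inequality, together with a small numerical check, can be written out as follows. The prices and revenues are those of the figure in the numerical results ($p_1 = 4.0$, $p_2 = 4.1$, $\pi_1 = 0.6$, $\pi_2 = 0.59$); the two-point cost distribution with equal probabilities and the symbol $g$ are assumptions made only for this illustration.

```latex
% Each \mu_k(c) is affine in c, so their pointwise maximum is convex:
g(c) := \max_{k \in K} \mu_k(c) = \max_{k \in K} \pi_k \left( 1 - \frac{c}{p_k} \right)
\quad\Longrightarrow\quad
g\big(E(c_t)\big) \le E\big[g(c_t)\big].

% Assumed example: c_t \in \{1, 3\} with probability 1/2 each, so E(c_t) = 2.
g(2) = \max\{0.3000,\ 0.3022\} = 0.3022, \qquad
E[g(c_t)] = \tfrac{1}{2}\max\{0.4500,\ 0.4461\} + \tfrac{1}{2}\max\{0.1500,\ 0.1583\} \approx 0.3041 .
```

In this example, pricing on the observed cost earns about 0.3041 per period against 0.3022 for pricing on the average cost, because the better price changes between $c = 1$ and $c = 3$.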
Special Features
Notice that $\mu_k(c) = \pi_k \left( 1 - \frac{c}{p_k} \right)$.
For any $k \in K$, the straight line $\mu_k(c)$ always passes through the fixed point $(p_k, 0)$ and through $(0, \pi_k)$, so the line is fully determined once $\pi_k$ is known.
Precisely estimating $\pi_k$ is therefore crucial.
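
Since each profit line is determined by $\pi_k$ and $p_k$ alone, the cost at which two lines cross follows from setting $\mu_1(c) = \mu_2(c)$. The sketch below is only an illustration of that algebra; the helper names `profit` and `crossover_cost` are hypothetical, and the numbers in the usage note are taken from the figure in the numerical results.

```python
def profit(pi_k, p_k, c):
    """Profit line mu_k(c) = pi_k * (1 - c / p_k)."""
    return pi_k * (1.0 - c / p_k)

def crossover_cost(pi_1, p_1, pi_2, p_2):
    """Cost c at which mu_1(c) = mu_2(c).

    Solving pi_1 (1 - c/p_1) = pi_2 (1 - c/p_2) for c gives
    c = p_1 p_2 (pi_1 - pi_2) / (pi_1 p_2 - pi_2 p_1).
    """
    denom = pi_1 * p_2 - pi_2 * p_1
    if denom == 0:
        # equal slopes pi_k / p_k: the lines never cross (or coincide)
        raise ValueError("profit lines are parallel or identical")
    return p_1 * p_2 * (pi_1 - pi_2) / denom
```

With $p_1 = 4.0$, $p_2 = 4.1$, $\pi_1 = 0.6$ and $\pi_2 = 0.59$, `crossover_cost(0.6, 4.0, 0.59, 4.1)` is about 1.64: price 1 earns more for costs below that value and price 2 earns more above it.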
Pricing Policy
1. Initialization: For $t \le K$, choose each price once.
2. Loop: For $t > K$:
Estimate the revenue for each $p_k$, and let
$\hat{\pi}_{k,t} = \frac{1}{T_k(t-1)} \sum_{i=1}^{T_k(t-1)} \Pi_i^{(p_k)}$,
where $\Pi_i^{(p_k)}$ is the $i$-th realization of the revenue of $p_k$ and $T_k(t-1)$ is the number of times $p_k$ has been chosen so far.
Write down the upper bound of the profit function for each $p_k$ in the UCB manner, and let
$\bar{\mu}_{k,t}(c) = \left( \hat{\pi}_{k,t} + \sqrt{\frac{2 \log t}{T_k(t-1)}} \right) \left( 1 - \frac{c}{p_k} \right)$.
Choose the price with index
$I_t(c_t) = \arg\max_{k \in K} \bar{\mu}_{k,t}(c_t)$.
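
The policy can be sketched as a short simulation. Several things here are assumptions made for the example and do not come from the slides: the function name `run_pricing_policy`, the choice to simulate each revenue as a 0/1 draw with mean $\pi_k$ (which keeps it in [0, 1]), and the fixed random seed.

```python
import math
import random

def run_pricing_policy(prices, true_pi, costs, seed=0):
    """Simulate the UCB-style pricing policy on a sequence of observed costs.

    prices:  increasing prices p_1 < ... < p_K, all above every cost
    true_pi: expected revenue pi_k of each price, used only to simulate sales
    costs:   observed costs c_1, ..., c_T
    """
    rng = random.Random(seed)
    K = len(prices)
    counts = [0] * K     # T_k(t-1): times price k has been chosen so far
    pi_hat = [0.0] * K   # running estimate of pi_k
    chosen = []
    for t, c in enumerate(costs, start=1):
        if t <= K:
            k = t - 1    # initialization: choose each price once
        else:
            def upper_profit(j):
                bonus = math.sqrt(2 * math.log(t) / counts[j])
                return (pi_hat[j] + bonus) * (1 - c / prices[j])
            k = max(range(K), key=upper_profit)
        # Assumed revenue model: 1 with probability pi_k, else 0.
        revenue = 1.0 if rng.random() < true_pi[k] else 0.0
        counts[k] += 1
        pi_hat[k] += (revenue - pi_hat[k]) / counts[k]
        chosen.append(k)
    return chosen, pi_hat, counts
```

Note that the cost enters only through the factor $1 - c/p_k$ at decision time, while the revenue estimates are shared across all costs, so an observation made at one cost informs decisions at every other cost.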
Main Results
Theorem (1): If $K = 2$ and $\pi_1 > \pi_2$, the cumulative regret is bounded by
$R[T] \le C_1 (\log T)^2$,
where $C_1$ is a positive constant that depends on the configuration of $\mu_1(c)$ and $\mu_2(c)$.
Regret is mainly caused by inaccurate estimation of the intersection point of $\mu_1(c)$ and $\mu_2(c)$, and it comes from the neighborhood of the intersection point.
For each $t$, the expected regret is bounded by a constant times $\frac{\log t}{t}$.
The result can be extended to $K > 2$ under some conditions.
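
One way to see how a per-period bound of order $\log t / t$ accumulates into a $(\log T)^2$ bound is the elementary calculation below. It is a sketch, not the proof of the theorem: $r_t$ (the regret incurred in period $t$) and the constant $a$ are symbols introduced here, and the initialization periods, which add at most a constant, are ignored.

```latex
E[r_t] \le a \,\frac{\log t}{t}
\quad\Longrightarrow\quad
R[T] = \sum_{t=1}^{T} E[r_t]
\le a \log T \sum_{t=1}^{T} \frac{1}{t}
\le a \log T \,(1 + \log T) = O\big((\log T)^2\big).
```

The second inequality uses $\log t \le \log T$ for $t \le T$, and the third uses the harmonic-sum bound $\sum_{t=1}^{T} 1/t \le 1 + \log T$.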
Illustration
Intuitions
The information learned at one value of the cost can also be used at other values of the cost.
Our problem is significantly more difficult than the standard MAB problem, because the profits can be arbitrarily close, making the selection very difficult.
Yet the regret is not much worse: $O((\log T)^2)$ compared to $O(\log T)$.
The regret comes from inaccurate estimation of the intersection point, which causes wrong decisions.
A Special Case: $|C| < \infty$
If $|C| < \infty$, at any cost there is a gap between $\mu_1(c)$ and $\mu_2(c)$.
Then, the inaccurate estimation of the intersection point will not happen infinitely often.
What's the implication?
A Constant Bound
Theorem (3): If $|C| < \infty$ and none of the feasible prices is inferior, then there exists a constant $C_2$ such that
$R[T] \le C_2$.
We would expect the problem with varying cost to be more difficult than the one with constant cost. It is not!
Because every price is good for some costs, one does not have to conduct exploration on prices that do not look good.
Numerical Results
The cumulative regret with respect to $T$
Figure: Cumulative regret (normalized) versus $T$ (up to $2 \times 10^4$) for $C = \{1, 3\}$ and $C = [0, 4]$, with $p_1 = 4.0$, $p_2 = 4.1$, $\pi_1 = 0.6$, $\pi_2 = 0.59$.
Q & A
Thank you!