Performance of Stochastic Programming Solutions
Operations Research
Anthony Papavasiliou
Outline
1. The Expected Value of Perfect Information
2. The Value of the Stochastic Solution
3. Basic Inequalities
4. Estimating Performance
Two-Stage Stochastic Linear Programs

min z = c^T x + E_ξ[ min_y q(ω)^T y(ω) ]
s.t.  A x = b
      T(ω) x + W y(ω) = h(ω)
      x ≥ 0, y(ω) ≥ 0

First-stage decisions: x ∈ R^{n_1}, with c ∈ R^{n_1}, b ∈ R^{m_1}, A ∈ R^{m_1 × n_1}
For a given realization ω, the second-stage data are q(ω) ∈ R^{n_2}, h(ω) ∈ R^{m_2}, T(ω) ∈ R^{m_2 × n_1}
All random elements of the problem are collected in a single random vector
ξ^T(ω) = (q(ω)^T, h(ω)^T, T_1(ω), ..., T_{m_2}(ω)),
where T_i(ω) denotes the i-th row of T(ω)
Motivation

Is it worth solving a stochastic program?
How well could we do if we knew the future?
How well could we do with a simpler model (e.g., the expected value problem)?
Section 1: The Expected Value of Perfect Information
Notation

z(x, ξ) = c^T x + Q(x, ξ) + δ(x | K_1)
Q(x, ξ) = min_y { q(ω)^T y : W y = h(ω) − T(ω) x, y ≥ 0 }

Here δ(x | K_1) is the indicator function of K_1 (equal to 0 if x ∈ K_1, +∞ otherwise).
What is the interpretation of z(x, ξ)?
Recall that K_1 = {x : A x = b, x ≥ 0} and K_2(ξ) = {x : ∃ y ≥ 0 with W y = h(ω) − T(ω) x}
It can be that z(x, ξ) = +∞ (if x ∉ K_1 ∩ K_2(ξ))
It can be that z(x, ξ) = −∞ (unbounded below)
Wait-and-See, Here-and-Now

The wait-and-see value is the expected value of reacting to ξ with perfect foresight x̄(ξ):
WS = E_ξ[ min_x z(x, ξ) ] = E_ξ[ z(x̄(ξ), ξ) ]
The here-and-now value is the optimal value of the recourse problem:
RP = min_x E_ξ[ z(x, ξ) ]
We have swapped min and E. What is the difference? Which one is more difficult to compute?
Expected Value of Perfect Information (EVPI)

The expected value of perfect information is the difference between the here-and-now and wait-and-see values:
EVPI = RP − WS
Interpretation: the value of a perfect forecast of the future
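To make WS, RP, and EVPI concrete, here is a minimal numerical sketch on a hypothetical two-scenario newsvendor-style instance (the costs and probabilities are invented for illustration and do not come from the slides); a brute-force grid search over x stands in for an LP solver:

```python
import numpy as np

# Hypothetical instance (not from the slides): order x units at unit cost 1;
# demand xi is 1 w.p. 0.8 and 3 w.p. 0.2; shortfalls are covered in the
# second stage at unit cost 2.
scenarios = [(1.0, 0.8), (3.0, 0.2)]

def z(x, xi):
    """First-stage cost plus optimal recourse cost for realization xi."""
    return x + 2.0 * max(xi - x, 0.0)

xs = np.linspace(0.0, 4.0, 401)  # grid of candidate first-stage decisions

# Wait-and-see: optimize x separately for each scenario, then average.
WS = sum(p * min(z(x, xi) for x in xs) for xi, p in scenarios)

# Here-and-now (recourse problem): a single x must work for all scenarios.
RP = min(sum(p * z(x, xi) for xi, p in scenarios) for x in xs)

EVPI = RP - WS
print(WS, RP, EVPI)  # -> WS ≈ 1.4, RP ≈ 1.8, EVPI ≈ 0.4
```

As expected, WS ≤ RP: the perfect-foresight decision maker tailors x to each scenario, while the here-and-now decision maker hedges with a single x.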
Example: Capacity Expansion Planning

We will solve this in class, using AMPL.
Section 2: The Value of the Stochastic Solution
Expected Value Problem

Expected (or mean) value problem:
EV = min_x z(x, ξ̄),  where ξ̄ = E[ξ]
Expected value solution x̄(ξ̄): an optimal solution of the expected value problem
Value of the Stochastic Solution

The expected result of using the EV solution measures the performance of x̄(ξ̄) (with optimal second-stage reactions given x̄(ξ̄)):
EEV = E_ξ[ z(x̄(ξ̄), ξ) ]
The value of the stochastic solution is
VSS = EEV − RP
Which one is easier to compute: WS, RP, or EEV? Which one is harder?
What can we say about VSS if x̄(ξ) is independent of ξ?
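A small numerical sketch of EEV and VSS, again on a hypothetical two-scenario newsvendor-style instance (invented for illustration, not from the slides), with grid search in place of an LP solver:

```python
import numpy as np

# Hypothetical instance (not from the slides): order x at unit cost 1;
# demand xi = 1 w.p. 0.8, xi = 3 w.p. 0.2; shortfalls cost 2 per unit.
scenarios = [(1.0, 0.8), (3.0, 0.2)]

def z(x, xi):
    return x + 2.0 * max(xi - x, 0.0)

xs = np.linspace(0.0, 4.0, 401)

# Expected value problem: replace xi by its mean and optimize x.
xi_bar = sum(p * xi for xi, p in scenarios)        # = 1.4
x_ev = xs[np.argmin([z(x, xi_bar) for x in xs])]   # EV solution

# EEV: expected cost of committing to the EV solution, with optimal
# second-stage reactions against the true scenarios.
EEV = sum(p * z(x_ev, xi) for xi, p in scenarios)

# RP: here-and-now optimum over the same grid.
RP = min(sum(p * z(x, xi) for xi, p in scenarios) for x in xs)

VSS = EEV - RP
print(EEV, RP, VSS)  # -> EEV ≈ 2.04, RP ≈ 1.8, VSS ≈ 0.24
```

The positive VSS quantifies how much the lazy decision maker loses by planning against the average scenario instead of the full distribution.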
Section 3: Basic Inequalities
Crystal Ball

For every ξ, we have
z(x̄(ξ), ξ) ≤ z(x*, ξ),
where x* is the optimal solution of the recourse problem.
Taking expectations on both sides yields WS ≤ RP.
Interpretation: we can do better if we have a crystal ball (i.e., if we know the future in advance).
Lazy

x* is the optimal solution of min_x E_ξ[ z(x, ξ) ], and x̄(ξ̄) is a feasible solution, therefore
RP = min_x E_ξ[ z(x, ξ) ] ≤ E_ξ[ z(x̄(ξ̄), ξ) ] = EEV
Interpretation: we do worse when we are lazy (i.e., when we do not account for uncertainty explicitly).
Would anything change if some of the x, y were integer?
Jensen's Inequality

Suppose f is convex and ξ is a random variable. Then
f(E[ξ]) ≤ E[f(ξ)]
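A quick Monte Carlo illustration of Jensen's inequality (the choice of f and the normal distribution are arbitrary, picked only for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.normal(0.0, 1.0, 100_000)   # samples of a standard normal xi

def f(t):
    return np.maximum(t, 0.0)        # a convex piecewise-linear function

lhs = f(xi.mean())                   # f(E[xi]) ≈ f(0) = 0
rhs = f(xi).mean()                   # E[f(xi)] ≈ E[max(xi, 0)] ≈ 0.399

assert lhs <= rhs                    # Jensen: f(E[xi]) <= E[f(xi)]
print(lhs, rhs)
```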
Lazy and a Liar!

Suppose c, W, T are independent of ω (i.e., ξ = h). Then EV ≤ WS.
We will show that z(x, h) = c^T x + Q(x, h) + δ(x | {x : A x = b, x ≥ 0}) is jointly convex in (x, h)
We will then show that f(ξ) = min_x z(x, ξ) is convex in ξ
From Jensen's inequality, we then have f(E[ξ]) ≤ E[f(ξ)], i.e., EV ≤ WS
Interpretation: EV (the lazy solution) is a biased estimate of the expected cost. Is it optimistic, or pessimistic?
Proof that z(x, h) is convex in (x, h)

Consider (x_1, h_1), (x_2, h_2) and λ ∈ (0, 1). Without loss of generality, assume A x_1 = b, A x_2 = b, x_1, x_2 ≥ 0.
z(x_i, h_i) = c^T x_i + q^T y_i, where y_i ∈ argmin{ q^T y : W y = h_i − T x_i, y ≥ 0 }, i ∈ {1, 2}
z(λ x_1 + (1−λ) x_2, λ h_1 + (1−λ) h_2) = c^T (λ x_1 + (1−λ) x_2) + q^T y_λ, where
y_λ ∈ argmin{ q^T y : W y = λ h_1 + (1−λ) h_2 − T(λ x_1 + (1−λ) x_2), y ≥ 0 }
λ y_1 + (1−λ) y_2 is a feasible solution of this last problem. Therefore, we have
q^T y_λ ≤ λ q^T y_1 + (1−λ) q^T y_2.
It follows that z(λ x_1 + (1−λ) x_2, λ h_1 + (1−λ) h_2) ≤ λ z(x_1, h_1) + (1−λ) z(x_2, h_2)
Proof that f(ξ) = min_x z(x, ξ) is convex in ξ

Consider ξ_1, ξ_2 with f(ξ_1) = z(x_1, ξ_1) and f(ξ_2) = z(x_2, ξ_2). Then
λ f(ξ_1) + (1−λ) f(ξ_2) = λ z(x_1, ξ_1) + (1−λ) z(x_2, ξ_2)   (by definition)
  ≥ z(λ (x_1, ξ_1) + (1−λ) (x_2, ξ_2))   (convexity of z)
  ≥ min_x z(x, λ ξ_1 + (1−λ) ξ_2) = f(λ ξ_1 + (1−λ) ξ_2)   (by definition)
Counter-Example: EV > WS

Consider the following problem:
min_{x ≥ 0} 2x + E_ξ[ ξ y ]
s.t.  y ≥ 1 − x
      y ≥ 0
where P(ξ = 1) = 3/4, P(ξ = 3) = 1/4.
Does this problem satisfy the assumptions under which we showed EV ≤ WS?
Optimal second-stage decision: y = 1 − x if 1 − x ≥ 0, y = 0 otherwise.
Trade-off: by increasing x we can push y to lower values, but we incur the certain cost 2x.
For ξ̄ = 3/4 + 3/4 = 3/2 we solve min{ 2x + (3/2) y : y ≥ 1 − x, x ≥ 0, y ≥ 0 }
Optimal solution: x̄(ξ̄) = 0, y = 1, with EV = 3/2
To compute WS, note that for ξ = 1 the optimal first-stage decision is x = 0, with a cost of 1, while for ξ = 3 the optimal first-stage decision is x = 1, with a cost of 2:
WS = (3/4)·1 + (1/4)·2 = 5/4 < EV
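The counter-example above can be checked numerically. The sketch below substitutes the optimal recourse y = max(1 − x, 0) into the objective and grid-searches over x:

```python
import numpy as np

# The slides' counter-example: cost 2x plus xi * y, with optimal recourse
# y = max(1 - x, 0); the randomness is in the objective coefficient q = xi.
scenarios = [(1.0, 0.75), (3.0, 0.25)]

def z(x, xi):
    return 2.0 * x + xi * max(1.0 - x, 0.0)

xs = np.linspace(0.0, 2.0, 201)

xi_bar = sum(p * xi for xi, p in scenarios)                     # = 3/2
EV = min(z(x, xi_bar) for x in xs)                              # x = 0 -> 3/2
WS = sum(p * min(z(x, xi) for x in xs) for xi, p in scenarios)  # = 5/4

# EV <= WS need not hold here: q depends on omega, so the convexity
# argument (which assumed c, W, T deterministic and xi = h) does not apply.
assert EV > WS
print(EV, WS)  # -> EV ≈ 1.5, WS ≈ 1.25
```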
Summary

We have established that VSS ≥ 0 and EVPI ≥ 0
VSS ≤ EEV − EV and EVPI ≤ EEV − EV
If EEV − EV = 0 then VSS = 0 and EVPI = 0 (for example, if x̄(ξ) is independent of ξ, which is rare)
The values line up as EV ≤ WS ≤ RP ≤ EEV, with EVPI = RP − WS and VSS = EEV − RP
Section 4: Estimating Performance
Central Limit Theorem

Suppose ξ(ω) is continuous. Does this complicate the computation of EV, RP, EEV and WS?
Central limit theorem: suppose {X_1, X_2, ...} is a sequence of i.i.d. random variables with E[X_i] = μ and Var[X_i] = σ² < ∞. Then, as n approaches infinity, √n (S_n − μ), where S_n = (1/n) Σ_{i=1}^n X_i, converges in distribution to a normal N(0, σ²):
√n ( (1/n) Σ_{i=1}^n X_i − μ ) →_d N(0, σ²)
Can we use the CLT? What would the X_i be in our case?
Motivating Example

The cost C of operating a facility is
C(ξ_1) = 1 under normal operations, p(ξ_1) = 0.9
C(ξ_2) = 100 under emergency operations, p(ξ_2) = 0.1
μ = 0.9·1 + 0.1·100 = 10.9
σ = sqrt( 0.9·(1 − 10.9)² + 0.1·(100 − 10.9)² ) ≈ 29.7
The rare outcome (1 out of 10 samples, on average) dominates the expected value calculation, since it contributes 0.1·100 / 10.9 ≈ 91.7% of the expected value.
From the central limit theorem, in order for the estimate of E[C] to be within 5% of μ with 95.4% confidence, we need 2σ/√n = 0.05μ, from which n ≈ 11,879!
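The arithmetic behind the sample-size requirement, using the numbers from the example:

```python
# Two-outcome cost distribution from the example: 1 w.p. 0.9, 100 w.p. 0.1.
mu = 0.9 * 1 + 0.1 * 100                              # E[C] = 10.9
var = 0.9 * (1 - mu) ** 2 + 0.1 * (100 - mu) ** 2     # Var[C] = 882.09
sigma = var ** 0.5                                    # ≈ 29.7

# Require a two-sigma half-width (≈95.4% confidence) of 5% of the mean:
#   2 * sigma / sqrt(n) = 0.05 * mu  =>  n = (2 * sigma / (0.05 * mu))^2
n = (2 * sigma / (0.05 * mu)) ** 2
print(round(n))  # -> 11879
```

The quadratic dependence on sigma/mu is what makes plain Monte Carlo expensive for heavy-tailed or rare-event costs.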
Importance Sampling

Suppose we wish to estimate E[C(ξ)], where ξ is a random variable distributed according to f(ξ)
Monte Carlo draws samples ξ_i from the distribution f(ξ) and estimates E[C(ξ)] with (1/N) Σ_{i=1}^N C(ξ_i)
Importance sampling draws samples ξ_i from the distribution g(ξ) = f(ξ) C(ξ) / E[C] and estimates E[C(ξ)] with (1/N) Σ_{i=1}^N C(ξ_i) f(ξ_i) / g(ξ_i)
Motivation of Importance Sampling

Note that
E[C(ξ)] = ∫_Ξ C(ξ) f(ξ) dξ = ∫_Ξ [ C(ξ) f(ξ) / g(ξ) ] g(ξ) dξ
The random variable C(ξ) f(ξ) / g(ξ), with ξ distributed according to g(ξ), also has expectation E[C]
Which g(ξ) minimizes the variance of this new random variable?
g(ξ) = C(ξ) f(ξ) / E[C]: every sample evaluates exactly to E[C]!
We cheated: this g(ξ) requires knowledge of E[C], which is precisely what we are estimating
But we learned something: draw samples according to the contribution C(ξ) f(ξ) to the expected value E[C]. Even if we do not know E[C], we can approximate it.
Back to the Example

Problem: the rare bad outcome has the greatest influence on the expected value
Remedy: redefine the distribution so that we observe the bad outcome more often, then reweight the expected value calculation in order to remove the bias:
q(ξ_1) = p(ξ_1) C(ξ_1) / E[C] = 0.9·1 / 10.9 = 0.9/10.9
q(ξ_2) = p(ξ_2) C(ξ_2) / E[C] = 0.1·100 / 10.9 = 10/10.9
The per-sample estimates are constant and equal to E[C]:
C(ξ_1) p(ξ_1) / q(ξ_1) = 1 · 0.9 / (0.9/10.9) = 10.9
C(ξ_2) p(ξ_2) / q(ξ_2) = 100 · 0.1 / (10/10.9) = 10.9
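A small simulation of the two-outcome example, comparing plain Monte Carlo against importance sampling with the idealized zero-variance density (which, as noted above, uses the very quantity E[C] that it is meant to estimate; in practice one would plug in an approximation of it):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-outcome facility example from the slides: C = 1 w.p. 0.9, C = 100 w.p. 0.1.
C = np.array([1.0, 100.0])
p = np.array([0.9, 0.1])
mu = p @ C                                  # E[C] = 10.9

N = 1000

# Plain Monte Carlo: sample outcomes from the original distribution p.
mc = C[rng.choice(2, size=N, p=p)]

# Importance sampling with q(xi) = p(xi) * C(xi) / E[C].
q = p * C / mu
idx = rng.choice(2, size=N, p=q)
is_est = C[idx] * p[idx] / q[idx]           # every sample equals mu

print(mc.mean(), mc.std())                  # noisy estimate of 10.9
print(is_est.mean(), is_est.std())          # ≈ 10.9 with ≈ zero variance
```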
Further Reading

BL = Birge and Louveaux, Introduction to Stochastic Programming:
4.1 BL: the expected value of perfect information
4.2 BL: the value of the stochastic solution
4.3 BL: basic inequalities
4.4 BL: the relationship between EVPI and VSS
4.5 BL: examples
4.6 BL: bounds on EVPI and VSS