Summary: Sampling Techniques
MS&E 348, Prof. Gerd Infanger, 2005/2006

Using Monte Carlo sampling for solving the problem

Monte Carlo sampling works very well for estimating multiple integrals or multiple sums over higher-dimensional ($h$-dimensional) spaces. Instead of computing expectations exactly by enumerating all possible outcomes $\omega \in \Omega$ (which, as already noted above, is impossible for problems with many random parameters due to the huge number of combinations of possible outcomes), we use Monte Carlo sampling to estimate the expectations
\[
z(\hat x^k) = E\, z^\omega(\hat x^k), \qquad G^{k+1} = E\, \pi^\omega(\hat x^k) B^\omega
\]
on each iteration $k$ of the Benders decomposition algorithm.

Crude Monte Carlo sampling

Crude Monte Carlo refers to employing Monte Carlo sampling directly, without any variance reduction techniques. On each iteration $k$ of the Benders decomposition algorithm, we draw an independent sample $S^k$ of size $|S^k| = N$ from the distribution of possible outcomes $\omega \in \Omega$ and estimate the expectations $z(\hat x^k)$ and $G^{k+1}$ using the unbiased estimators
\[
\bar z(\hat x^k) = \frac{1}{N} \sum_{\omega \in S^k} z^\omega(\hat x^k), \qquad
\bar G^{k+1} = \frac{1}{N} \sum_{\omega \in S^k} \pi^\omega(\hat x^k)\, B^\omega.
\]
The estimator $\bar z(\hat x^k)$ has true variance $\sigma^2_{\bar z}(\hat x^k) = \frac{1}{N}\, \sigma^2(\hat x^k)$, where $\sigma^2(\hat x^k)$ is the (population) variance of the second-stage costs, given $\hat x^k$, i.e.,
\[
\sigma^2(\hat x^k) = \sum_{\omega \in \Omega} \bigl( z^\omega(\hat x^k) - z(\hat x^k) \bigr)^2 p^\omega.
\]
We estimate the variance $\sigma^2(\hat x^k)$ using the sample $S^k$ as
\[
\bar\sigma^2(\hat x^k) = \frac{1}{N-1} \sum_{\omega \in S^k} \bigl( z^\omega(\hat x^k) - \bar z(\hat x^k) \bigr)^2
\]

¹ Excerpt from the DECIS User's Guide. Copyright © 1989–1998 by Gerd Infanger. All rights reserved.
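As a concrete illustration of the crude estimator and its variance estimate, the following sketch (not DECIS code; the cost function and discrete distribution are invented for illustration) estimates $E\,z^\omega$ for a small outcome set and compares it with the exact expectation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete outcomes v^omega and probabilities p^omega
outcomes = np.array([1.0, 2.0, 4.0, 8.0])
probs = np.array([0.4, 0.3, 0.2, 0.1])

def z(v):
    return v ** 2                      # illustrative second-stage cost z(v, x_hat)

N = 10_000
sample = rng.choice(outcomes, size=N, p=probs)   # independent sample S^k
costs = z(sample)

z_bar = costs.mean()                   # unbiased estimator of E z
s2 = costs.var(ddof=1)                 # sample variance (divisor N-1)
var_of_mean = s2 / N                   # estimated variance of z_bar

exact = np.sum(z(outcomes) * probs)    # exact expectation (Omega is small here)
```

With only four outcomes the exact sum is trivial; the point is that `z_bar` and `var_of_mean` require only $N$ cost evaluations, regardless of how large $\Omega$ is.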
and obtain $\bar\sigma^2_{\bar z}(\hat x^k) = \frac{1}{N}\, \bar\sigma^2(\hat x^k)$.

Importance sampling

Importance sampling is a powerful variance reduction technique: compared to crude Monte Carlo with the same sample size, one can expect a significantly better estimate. We recall that $z^\omega(\hat x^k) = z(v^\omega, \hat x^k)$, since the parameters of the second-stage subproblem are all functions of the independent random vector $V$: $f^\omega = f(v^\omega)$, $B^\omega = B(v^\omega)$, $D^\omega = D(v^\omega)$, $d^\omega = d(v^\omega)$. Suppose there is a function $\Gamma(v^\omega, \hat x^k)$ that closely approximates $z(v^\omega, \hat x^k)$ and whose expectation $\bar\Gamma(\hat x^k)$ is known or can be computed easily. We rewrite
\[
z(\hat x^k) = \sum_{\omega \in \Omega} z(v^\omega, \hat x^k)\, p(v^\omega)
\]
as
\[
z(\hat x^k) = \bar\Gamma(\hat x^k) \sum_{\omega \in \Omega} \frac{z(v^\omega, \hat x^k)}{\Gamma(v^\omega, \hat x^k)} \cdot \frac{\Gamma(v^\omega, \hat x^k)}{\bar\Gamma(\hat x^k)}\, p(v^\omega)
\]
and interpret it as
\[
z(\hat x^k) = \bar\Gamma(\hat x^k)\, E\, \frac{z(v^\omega, \hat x^k)}{\Gamma(v^\omega, \hat x^k)},
\]
where $v^\omega$ is distributed according to the probability mass function
\[
q(v^\omega, \hat x^k) = \frac{\Gamma(v^\omega, \hat x^k)}{\bar\Gamma(\hat x^k)}\, p(v^\omega),
\]
which depends on the first-stage decision $\hat x^k$ on iteration $k$. Since $\bar\Gamma(\hat x^k) = \sum_{\omega \in \Omega} \Gamma(v^\omega, \hat x^k)\, p(v^\omega)$,
\[
\sum_{\omega \in \Omega} q(v^\omega, \hat x^k) = \sum_{\omega \in \Omega} \frac{\Gamma(v^\omega, \hat x^k)\, p(v^\omega)}{\bar\Gamma(\hat x^k)} = 1.
\]
Since $p(v^\omega) \ge 0$, if all $\Gamma(v^\omega, \hat x^k) \ge 0$ then also $\bar\Gamma(\hat x^k) \ge 0$, and it follows that
\[
q(v^\omega, \hat x^k) = \frac{\Gamma(v^\omega, \hat x^k)}{\bar\Gamma(\hat x^k)}\, p(v^\omega) \ge 0.
\]
Likewise, if all $\Gamma(v^\omega, \hat x^k) \le 0$ then $\bar\Gamma(\hat x^k) \le 0$, and again $q(v^\omega, \hat x^k) \ge 0$. Either $\Gamma(v^\omega, \hat x^k) \ge 0$ for all $\omega \in \Omega$ or $\Gamma(v^\omega, \hat x^k) \le 0$ for all $\omega \in \Omega$ can be ensured by adding an appropriate constant to the approximation function. We obtain a new estimator of $z(\hat x^k)$,
\[
\bar z(\hat x^k) = \frac{1}{N}\, \bar\Gamma(\hat x^k) \sum_{\omega \in S^k} \frac{z(v^\omega, \hat x^k)}{\Gamma(v^\omega, \hat x^k)},
\]
by drawing a sample $S^k$ of size $|S^k| = N$ from the distribution $q(v^\omega, \hat x^k)$. The variance of $\bar z$ is given by
\[
\mathrm{var}(\bar z) = \frac{1}{N} \sum_{\omega \in \Omega} \left( \bar\Gamma(\hat x^k)\, \frac{z(v^\omega, \hat x^k)}{\Gamma(v^\omega, \hat x^k)} - z(\hat x^k) \right)^2 q(v^\omega, \hat x^k).
\]
One can see easily that if $\Gamma(v^\omega, \hat x^k) = z(v^\omega, \hat x^k)$ then $\mathrm{var}(\bar z) = 0$, and
\[
q^*(v^\omega, \hat x^k) = \frac{z(v^\omega, \hat x^k)}{z(\hat x^k)}\, p(v^\omega),
\]
where $q^*$ is the best importance distribution. This, of course, is only of theoretical value, since computing the best importance distribution requires knowing $z(\hat x^k)$, the very quantity we eventually want to compute. Note that the optimal importance density is proportional to the product $z(v^\omega, \hat x^k)\, p(v^\omega)$. If we use a $\Gamma(v^\omega, \hat x^k) \approx z(v^\omega, \hat x^k)$, we obtain a good importance distribution, which is approximately proportional to the product $z(v^\omega, \hat x^k)\, p(v^\omega)$.

DECIS uses additive and multiplicative (in the components of the stochastic vector $V$) approximation functions for obtaining variance reductions through importance sampling. Since $p(v^\omega) = \prod_{i=1}^h p_i(v_i^{\omega_i})$, if
\[
\Gamma(v^\omega, \hat x^k) = \sum_{i=1}^h \Gamma_i(v_i^{\omega_i}, \hat x^k)
\]
then
\[
\bar\Gamma(\hat x^k) = E \sum_{i=1}^h \Gamma_i(v_i^{\omega_i}, \hat x^k) = \sum_{i=1}^h \sum_{\omega_i \in \Omega_i} \Gamma_i(v_i^{\omega_i}, \hat x^k)\, p_i(v_i^{\omega_i}),
\]
and if
\[
\Gamma(v^\omega, \hat x^k) = \prod_{i=1}^h \Gamma_i(v_i^{\omega_i}, \hat x^k)
\]
then
\[
\bar\Gamma(\hat x^k) = E \prod_{i=1}^h \Gamma_i(v_i^{\omega_i}, \hat x^k) = \prod_{i=1}^h \sum_{\omega_i \in \Omega_i} \Gamma_i(v_i^{\omega_i}, \hat x^k)\, p_i(v_i^{\omega_i}).
\]
Instead of computing one $h$-dimensional integral or sum, in both the additive and the multiplicative case the expected value calculation of the approximation function reduces to computing $h$ one-dimensional integrals or sums.

Additive approximation function

As additive approximation function DECIS uses an additive marginal cost approximation. We introduce $z(\tau, \hat x^k)$, the cost of a base case, and compute the approximate cost as the cost of the base case plus the sum of the marginal costs in each dimension of $V$:
\[
\Gamma(v^\omega, \hat x^k) = z(\tau, \hat x^k) + \sum_{i=1}^h \Gamma_i(v_i^{\omega_i}, \hat x^k), \qquad
\Gamma_i(v_i^{\omega_i}, \hat x^k) = z(\tau_1, \ldots, \tau_{i-1}, v_i^{\omega_i}, \tau_{i+1}, \ldots, \tau_h, \hat x^k) - z(\tau, \hat x^k).
\]
Using this model, we explore the cost function at the margins, i.e., we vary the random element $V_i$ to compute the costs for all outcomes $v_i^{\omega_i}$ while we fix the other random elements at the level of the base case $\tau$. The base case can be any arbitrarily chosen point of the set of $W_i$ discrete values of $V_i$, $i = 1, \ldots, h$.
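The general importance-sampling scheme above can be sketched as follows. This is an illustration only, not DECIS code: the cost $z(v) = v^2$, the approximation $\Gamma(v) = v^2 + 1$ (kept strictly positive), and the distribution are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

outcomes = np.array([1.0, 2.0, 4.0, 8.0])   # hypothetical v^omega
p = np.array([0.4, 0.3, 0.2, 0.1])          # original probabilities p(v^omega)

def z(v):
    return v ** 2                            # cost to estimate, illustrative

def gamma(v):
    return v ** 2 + 1.0                      # approximation Gamma ~ z, all values > 0

gamma_bar = np.sum(gamma(outcomes) * p)      # known expectation of Gamma
q = gamma(outcomes) * p / gamma_bar          # importance distribution, sums to 1

N = 2_000
sample = rng.choice(outcomes, size=N, p=q)   # sample from q, NOT from p
ratios = z(sample) / gamma(sample)           # ratios are close to 1 since Gamma ~ z
z_is = gamma_bar * ratios.mean()             # importance-sampling estimator

exact = np.sum(z(outcomes) * p)              # exact value for comparison
```

Because $\Gamma$ tracks $z$ closely, the ratios $z/\Gamma$ vary little across outcomes, so the estimator has far lower variance than crude Monte Carlo at the same $N$.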
If possible, we choose $\tau_i$ as that outcome of $V_i$ which leads to the lowest costs, ceteris paribus.
Using the additive marginal cost model, we can express the expected value of the second-stage costs in the following form:
\[
z(\hat x^k) = z(\tau, \hat x^k) + \sum_{i=1}^h \bar\Gamma_i(\hat x^k) \sum_{\omega \in \Omega} \frac{z(v^\omega, \hat x^k) - z(\tau, \hat x^k)}{\sum_{j=1}^h \Gamma_j(v_j^{\omega_j}, \hat x^k)} \cdot \frac{\Gamma_i(v_i^{\omega_i}, \hat x^k)}{\bar\Gamma_i(\hat x^k)} \prod_{j=1}^h p_j(v_j^{\omega_j}),
\]
where we assume that $\sum_{i=1}^h \Gamma_i(v_i^{\omega_i}, \hat x^k) > 0$, so that at least one $\Gamma_i(v_i^{\omega_i}, \hat x^k) > 0$. Note that this formulation consists of a constant term and a sum of $h$ expectations. Given a fixed sample size $N$, we partition $N$ into $h$ sub-samples with sample sizes $N_i$, $i = 1, \ldots, h$, such that $\sum_i N_i = N$ and $N_i \ge 1$, $i = 1, \ldots, h$, where $N_i$ is approximately proportional to $\bar\Gamma_i(\hat x^k)$. The $h$ expectations are approximated separately by sampling from marginal densities. The $i$-th expectation corresponds to the $i$-th component of $V$. In generating sample points for the $i$-th expectation, we use the importance density $p_i \Gamma_i / \bar\Gamma_i$ for sampling the $i$-th component of $V$ and the original marginal densities for all other components. Denoting by
\[
\bar\mu_i(\hat x^k) = \frac{1}{N_i} \sum_{\omega \in S_i^k} \frac{z(v^\omega, \hat x^k) - z(\tau, \hat x^k)}{\sum_{j=1}^h \Gamma_j(v_j^{\omega_j}, \hat x^k)}
\]
the estimate of the $i$-th sum, we obtain
\[
\bar z(\hat x^k) = z(\tau, \hat x^k) + \sum_{i=1}^h \bar\Gamma_i(\hat x^k)\, \bar\mu_i(\hat x^k),
\]
the estimated expected value of the second-stage costs. Let $\bar\sigma_i^2(\hat x^k)$ be the sample variance of the $i$-th expectation, where we set $\bar\sigma_i^2(\hat x^k) = 0$ if $N_i = 1$. The estimated variance of the mean, $\bar\sigma^2_{\bar z}(\hat x^k)$, is then given by
\[
\bar\sigma^2_{\bar z}(\hat x^k) = \sum_{i=1}^h \frac{\bar\Gamma_i^2(\hat x^k)\, \bar\sigma_i^2(\hat x^k)}{N_i}.
\]
To derive a cut we use an analogous framework:
\[
G(\hat x^k) = \pi(\tau, \hat x^k) B(\tau) + \sum_{i=1}^h \bar\Gamma_i(\hat x^k) \sum_{\omega \in \Omega} \frac{\pi(v^\omega, \hat x^k) B(v^\omega) - \pi(\tau, \hat x^k) B(\tau)}{\sum_{j=1}^h \Gamma_j(v_j^{\omega_j}, \hat x^k)} \cdot \frac{\Gamma_i(v_i^{\omega_i}, \hat x^k)}{\bar\Gamma_i(\hat x^k)} \prod_{j=1}^h p_j(v_j^{\omega_j}).
\]
We estimate the coefficients of a cut by sampling with the same scheme, using the same sample points at hand from the computation of $\bar z(\hat x^k)$, to obtain $\bar G(\hat x^k)$, and compute
\[
\bar g(\hat x^k) = \bar z(\hat x^k) - \bar G(\hat x^k)\, \hat x^k.
\]
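The partitioned sampling scheme above can be sketched for $h = 2$. Everything here is invented for illustration (this is not DECIS code): a small two-component distribution, a cost function with a mild interaction term, and an exact expectation computed by enumeration only to check the estimate.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical two-component example (h = 2)
vals = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 3.0])]
probs = [np.array([0.5, 0.3, 0.2]), np.array([0.6, 0.4])]
h = 2

def z(v1, v2):                               # illustrative second-stage cost
    return 1.0 + v1 + 2.0 * v2 + 0.5 * v1 * v2

tau = (0.0, 0.0)                             # base case (lowest-cost outcomes)
z_tau = z(*tau)

# Marginal costs Gamma_i(v_i) and their expectations (two 1-D sums)
G = [np.array([z(v, tau[1]) for v in vals[0]]) - z_tau,
     np.array([z(tau[0], v) for v in vals[1]]) - z_tau]
Gbar = [float(np.dot(G[i], probs[i])) for i in range(h)]

# Partition N into sub-samples N_i approximately proportional to Gbar_i
N = 4000
Ni = np.maximum(1, np.round(N * np.array(Gbar) / sum(Gbar)).astype(int))

mu = []
for i in range(h):
    j = 1 - i
    qi = probs[i] * G[i] / Gbar[i]           # importance density p_i Gamma_i / Gbar_i
    ii = rng.choice(len(vals[i]), size=Ni[i], p=qi)      # component i ~ importance
    jj = rng.choice(len(vals[j]), size=Ni[i], p=probs[j])  # component j ~ original
    v1 = vals[0][ii] if i == 0 else vals[0][jj]
    v2 = vals[1][ii] if i == 1 else vals[1][jj]
    denom = G[i][ii] + G[j][jj]              # sum_j Gamma_j(v_j), > 0 by construction
    mu.append(np.mean((z(v1, v2) - z_tau) / denom))      # estimate of i-th expectation

z_bar = z_tau + sum(Gbar[i] * mu[i] for i in range(h))

exact = sum(z(a, b) * pa * pb
            for a, pa in zip(vals[0], probs[0])
            for b, pb in zip(vals[1], probs[1]))
```

Since the cost is nearly additive in the two components, the ratios inside each $\bar\mu_i$ are nearly constant and the estimate is very accurate even at modest $N$.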
Multiplicative approximation function

For the multiplicative approximation function we introduce the cost of a base case, $z(\tau, \hat x^k)$, and compute the approximate cost as the cost of the base case times the relative marginal costs in each dimension of $V$:
\[
\Gamma(v^\omega, \hat x^k) = z(\tau, \hat x^k) \prod_{i=1}^h \Gamma_i(v_i^{\omega_i}, \hat x^k), \qquad
\Gamma_i(v_i^{\omega_i}, \hat x^k) = \frac{z(\tau_1, \ldots, \tau_{i-1}, v_i^{\omega_i}, \tau_{i+1}, \ldots, \tau_h, \hat x^k)}{z(\tau, \hat x^k)}.
\]
We call this the multiplicative marginal cost model. Using this model, we explore the cost function at the margins, i.e., we vary the random element $V_i$ to compute the costs for all outcomes $v_i^{\omega_i}$ while we fix the other random elements at the level of the base case $\tau$, and compute the relative marginal cost with respect to the cost of the base case. The base case can be any arbitrarily chosen point of the set of $W_i$ discrete values of $V_i$, $i = 1, \ldots, h$, since (according to the assumptions) $z(v^\omega, \hat x^k) \ge 0$, and therefore also $\Gamma_i(v_i^{\omega_i}, \hat x^k) \ge 0$.

Using the multiplicative marginal cost model, we can express the expected value of the second-stage costs in the following form:
\[
z(\hat x^k) = \bar\Gamma(\hat x^k) \sum_{\omega \in \Omega} \frac{z(v^\omega, \hat x^k)}{\Gamma(v^\omega, \hat x^k)} \prod_{i=1}^h \frac{\Gamma_i(v_i^{\omega_i}, \hat x^k)}{\bar\Gamma_i(\hat x^k)}\, p_i(v_i^{\omega_i}).
\]
Note that using the multiplicative marginal approximation we need to estimate only one expectation. We estimate the mean by sampling using the marginal densities $p_i \Gamma_i / \bar\Gamma_i$ in each dimension $i$ of the multidimensional random vector $V$:
\[
\bar z(\hat x^k) = \bar\Gamma(\hat x^k)\, \frac{1}{N} \sum_{\omega \in S^k} \frac{z(v^\omega, \hat x^k)}{\Gamma(v^\omega, \hat x^k)},
\]
and estimate the variance of the mean, $\bar\sigma^2_{\bar z}(\hat x^k)$, as
\[
\bar\sigma^2_{\bar z}(\hat x^k) = \frac{1}{N}\, \frac{1}{N-1} \sum_{\omega \in S^k} \left( \bar\Gamma(\hat x^k)\, \frac{z(v^\omega, \hat x^k)}{\Gamma(v^\omega, \hat x^k)} - \bar z(\hat x^k) \right)^2.
\]
To derive a cut we use an analogous framework:
\[
G(\hat x^k) = \bar\Gamma(\hat x^k) \sum_{\omega \in \Omega} \frac{\pi(v^\omega, \hat x^k) B(v^\omega)}{\Gamma(v^\omega, \hat x^k)} \prod_{i=1}^h \frac{\Gamma_i(v_i^{\omega_i}, \hat x^k)}{\bar\Gamma_i(\hat x^k)}\, p_i(v_i^{\omega_i}).
\]
We estimate the coefficients of a cut by sampling with the same scheme and the sample points at hand from the computation of $\bar z(\hat x^k)$ to obtain
\[
\bar G(\hat x^k) = \bar\Gamma(\hat x^k)\, \frac{1}{N} \sum_{\omega \in S^k} \frac{\pi(v^\omega, \hat x^k) B(v^\omega)}{\Gamma(v^\omega, \hat x^k)},
\]
and we compute
\[
\bar g(\hat x^k) = \bar z(\hat x^k) - \bar G(\hat x^k)\, \hat x^k.
\]
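A sketch of the multiplicative scheme, again with invented data (not DECIS code). The cost function is deliberately chosen to be exactly multiplicative in the two components, so the ratios $z/\Gamma$ all equal one and the estimator has zero variance, the ideal case the text describes:

```python
import numpy as np

rng = np.random.default_rng(2)

vals = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 3.0])]
probs = [np.array([0.5, 0.3, 0.2]), np.array([0.6, 0.4])]

def z(v1, v2):                      # illustrative cost, here exactly multiplicative
    return (1.0 + v1) * (1.0 + v2)

tau = (0.0, 0.0)
z_tau = z(*tau)                     # base-case cost, must be > 0

# Relative marginal costs Gamma_i(v_i) = z(tau_1,..,v_i,..,tau_h) / z(tau)
g1 = np.array([z(v, tau[1]) for v in vals[0]]) / z_tau
g2 = np.array([z(tau[0], v) for v in vals[1]]) / z_tau

gbar1 = np.dot(g1, probs[0])        # two one-dimensional sums
gbar2 = np.dot(g2, probs[1])
gamma_bar = z_tau * gbar1 * gbar2   # E Gamma via the product of marginal sums

# Marginal importance densities p_i * Gamma_i / Gbar_i, one per dimension
q1 = probs[0] * g1 / gbar1
q2 = probs[1] * g2 / gbar2

N = 1_000
i1 = rng.choice(len(vals[0]), size=N, p=q1)
i2 = rng.choice(len(vals[1]), size=N, p=q2)
v1, v2 = vals[0][i1], vals[1][i2]

gamma = z_tau * g1[i1] * g2[i2]     # Gamma(v) at the sampled points
ratios = z(v1, v2) / gamma          # all exactly 1 for this cost function
z_hat = gamma_bar * ratios.mean()
```

For a cost that is only approximately multiplicative, the ratios would scatter around one and the variance estimate of the text applies.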
Control variates

Control variates is another powerful variance reduction technique, which (using the same sample size) may lead to significantly better estimates. We recall that $z^\omega(\hat x^k) = z(v^\omega, \hat x^k)$. Suppose there is a function $\Gamma(v^\omega, \hat x^k)$ that closely approximates $z(v^\omega, \hat x^k)$ (i.e., is positively correlated with it) and whose expectation $\bar\Gamma(\hat x^k)$ is known or can be computed easily. We rewrite
\[
z(\hat x^k) = \sum_{\omega \in \Omega} z(v^\omega, \hat x^k)\, p(v^\omega)
\]
as
\[
z(\hat x^k) = \sum_{\omega \in \Omega} \bigl( z(v^\omega, \hat x^k) - \alpha\, \Gamma(v^\omega, \hat x^k) \bigr)\, p(v^\omega) + \alpha\, \bar\Gamma(\hat x^k)
\]
and interpret it as
\[
z(\hat x^k) = E \bigl( z(v^\omega, \hat x^k) - \alpha\, \Gamma(v^\omega, \hat x^k) \bigr) + \alpha\, \bar\Gamma(\hat x^k).
\]
We obtain a new estimator of $z(\hat x^k)$,
\[
\bar z(\hat x^k) = \frac{1}{N} \sum_{\omega \in S^k} \bigl( z(v^\omega, \hat x^k) - \alpha\, \Gamma(v^\omega, \hat x^k) \bigr) + \alpha\, \bar\Gamma(\hat x^k),
\]
by drawing a sample $S^k$ of size $|S^k| = N$ from the distribution $p(v^\omega)$. The variance of $\bar z$ is given by
\[
\mathrm{var}(\bar z) = \frac{1}{N}\, \mathrm{var}\bigl( z(v^\omega, \hat x^k) - \alpha\, \Gamma(v^\omega, \hat x^k) \bigr)
= \frac{1}{N} \Bigl( \mathrm{var}\bigl(z(v^\omega, \hat x^k)\bigr) + \alpha^2\, \mathrm{var}\bigl(\Gamma(v^\omega, \hat x^k)\bigr) - 2\alpha\, \mathrm{cov}\bigl(z(v^\omega, \hat x^k), \Gamma(v^\omega, \hat x^k)\bigr) \Bigr).
\]
One can see easily that the larger the covariance between $z(v^\omega, \hat x^k)$ and $\Gamma(v^\omega, \hat x^k)$, the smaller the variance of the estimate of $z(\hat x^k)$. Minimizing the expression above with respect to $\alpha$, we obtain the optimal value
\[
\alpha^* = \frac{\mathrm{cov}\bigl(z(v^\omega, \hat x^k), \Gamma(v^\omega, \hat x^k)\bigr)}{\mathrm{var}\bigl(\Gamma(v^\omega, \hat x^k)\bigr)},
\]
the value of $\alpha$ that leads to the smallest variance of the estimate. When carrying out Monte Carlo sampling with control variates, we estimate both $\mathrm{cov}(z(v^\omega, \hat x^k), \Gamma(v^\omega, \hat x^k))$ and $\mathrm{var}(\Gamma(v^\omega, \hat x^k))$ using the sample $S^k$ used for estimating the expected value $z(\hat x^k)$. We estimate the coefficients of a cut using crude Monte Carlo,
\[
\bar G(\hat x^k) = \frac{1}{N} \sum_{\omega \in S^k} \pi(v^\omega, \hat x^k)\, B(v^\omega),
\]
using the sample points at hand from the computation of $\bar z(\hat x^k)$, and compute
\[
\bar g(\hat x^k) = \bar z(\hat x^k) - \bar G(\hat x^k)\, \hat x^k.
\]
DECIS uses both additive and multiplicative approximation functions as control variates. In both the additive and the multiplicative case, the expected value calculation of the approximation function requires computing $h$ one-dimensional integrals or sums.
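A minimal sketch of the control-variate estimator with the optimal $\alpha$ estimated from the same sample. The cost function, control variate, and distribution are invented for illustration (not DECIS code):

```python
import numpy as np

rng = np.random.default_rng(3)

outcomes = np.array([1.0, 2.0, 4.0, 8.0])
p = np.array([0.4, 0.3, 0.2, 0.1])

def z(v):
    return v ** 2 + np.sin(v)            # cost to estimate, illustrative

def gamma(v):
    return v ** 2                        # control variate, highly correlated with z

gamma_bar = np.sum(gamma(outcomes) * p)  # known expectation of Gamma

N = 5_000
sample = rng.choice(outcomes, size=N, p=p)   # sample from the ORIGINAL p
zs, gs = z(sample), gamma(sample)

# Optimal alpha = cov(z, Gamma) / var(Gamma), estimated from the same sample
alpha = np.cov(zs, gs, ddof=1)[0, 1] / gs.var(ddof=1)

z_cv = (zs - alpha * gs).mean() + alpha * gamma_bar   # control-variate estimator
z_crude = zs.mean()                                   # crude MC for comparison
```

Since $z - \alpha\Gamma$ is essentially the small residual $\sin(v)$, the control-variate estimate is far more accurate than the crude mean of the same sample.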
Additive approximation function

Using the additive marginal cost approximation
\[
\Gamma(v^\omega, \hat x^k) = z(\tau, \hat x^k) + \sum_{i=1}^h \Gamma_i(v_i^{\omega_i}, \hat x^k),
\]
where
\[
\Gamma_i(v_i^{\omega_i}, \hat x^k) = z(\tau_1, \ldots, \tau_{i-1}, v_i^{\omega_i}, \tau_{i+1}, \ldots, \tau_h, \hat x^k) - z(\tau, \hat x^k),
\]
we compute
\[
\bar\Gamma(\hat x^k) = z(\tau, \hat x^k) + \sum_{i=1}^h \bar\Gamma_i(\hat x^k),
\]
and
\[
\bar\Gamma_i(\hat x^k) = \sum_{\omega_i \in \Omega_i} z(\tau_1, \ldots, \tau_{i-1}, v_i^{\omega_i}, \tau_{i+1}, \ldots, \tau_h, \hat x^k)\, p_i^{\omega_i} - z(\tau, \hat x^k).
\]

Multiplicative approximation function

Using the multiplicative marginal cost approximation
\[
\Gamma(v^\omega, \hat x^k) = z(\tau, \hat x^k) \prod_{i=1}^h \Gamma_i(v_i^{\omega_i}, \hat x^k),
\]
where
\[
\Gamma_i(v_i^{\omega_i}, \hat x^k) = \frac{z(\tau_1, \ldots, \tau_{i-1}, v_i^{\omega_i}, \tau_{i+1}, \ldots, \tau_h, \hat x^k)}{z(\tau, \hat x^k)},
\]
we compute
\[
\bar\Gamma(\hat x^k) = z(\tau, \hat x^k) \prod_{i=1}^h \bar\Gamma_i(\hat x^k),
\]
and
\[
\bar\Gamma_i(\hat x^k) = \frac{1}{z(\tau, \hat x^k)} \sum_{\omega_i \in \Omega_i} z(\tau_1, \ldots, \tau_{i-1}, v_i^{\omega_i}, \tau_{i+1}, \ldots, \tau_h, \hat x^k)\, p_i^{\omega_i}.
\]
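The point of both formulas is that $\bar\Gamma$ needs only $h$ one-dimensional sums rather than one sum over the full product space $\Omega$. A small check of this for the additive case, with an invented cost function and $h = 2$ (the multiplicative case is analogous):

```python
import numpy as np
from itertools import product

# Hypothetical discrete components V_1, V_2 (h = 2); data invented.
vals = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 3.0])]
probs = [np.array([0.5, 0.3, 0.2]), np.array([0.6, 0.4])]

def z(v1, v2):                       # illustrative cost at a fixed x_hat
    return 1.0 + v1 + 2.0 * v2 + 0.5 * v1 * v2

tau = (0.0, 0.0)
z_tau = z(*tau)

# h one-dimensional sums: Gbar_i = sum_i z(tau with i-th replaced) p_i - z(tau)
Gbar1 = np.dot([z(v, tau[1]) for v in vals[0]], probs[0]) - z_tau
Gbar2 = np.dot([z(tau[0], v) for v in vals[1]], probs[1]) - z_tau
gamma_bar = z_tau + Gbar1 + Gbar2    # Gbar via h = 2 cheap sums

# Full 2-dimensional sum over Omega for comparison (W_1 * W_2 terms)
full = 0.0
for (v1, p1), (v2, p2) in product(zip(vals[0], probs[0]), zip(vals[1], probs[1])):
    gamma = z_tau + (z(v1, tau[1]) - z_tau) + (z(tau[0], v2) - z_tau)
    full += gamma * p1 * p2
```

Both computations agree exactly, while the one-dimensional route evaluates $z$ only $\sum_i W_i$ times instead of $\prod_i W_i$ times, the saving that matters when $h$ is large.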