Ch4.
Zhang Jin-Ting
Department of Statistics and Applied Probability
July 17, 2012
Outline

This chapter aims to improve the Monte Carlo integration estimator by reducing its variance using some useful techniques:
- Stratified Sampling
- Importance Sampling
- Control Variates Method
- Antithetic Variates Method
The Integration Problem

Suppose we want to estimate an integral over some region, such as
$$I_A = \int_S k(x)\,dx,$$
where $S$ is a subset of $\mathbb{R}^d$, $x$ denotes a generic point of $\mathbb{R}^d$, and $k$ is a given real-valued function on $S$; or
$$I_B = \int_{\mathbb{R}^d} h(x)f(x)\,dx,$$
where $h$ is a real-valued function on $\mathbb{R}^d$ and $f$ is a given pdf on $\mathbb{R}^d$.
The Transformed Problem: Monte Carlo Integration

It is clear that $I_B$ can be written as an expectation: $I_B = E(h(X))$, where $X \sim f$. Also, if we extend the definition of $k$ to all of $\mathbb{R}^d$ by setting $k(x) = 0$ for every $x$ that is not in $S$, then
$$I_A = \int_{\mathbb{R}^d} k(x)\,dx = \int_{\mathbb{R}^d} \frac{k(x)}{f(x)}\,f(x)\,dx = E\!\left[\frac{k(X)}{f(X)}\right]. \quad (1)$$
Notice that $k(x)/f(x)$ is well-defined except where $f$ equals 0, which is a set of probability 0. This is a simple trick that will be especially useful in the method known as Importance Sampling.
Simple Sampling

This leads to a natural Monte Carlo strategy for estimating the value of $I_B$, say. If we can generate iid random variables $X_1, X_2, \ldots$ whose common pdf is $f$, then for every $n$,
$$\hat{I}_n = \frac{1}{n} \sum_{i=1}^n h(X_i)$$
is an unbiased estimator of $I_B$.
Moreover, the strong law of large numbers implies that $\hat{I}_n$ converges to $I_B$ with probability 1 as $n \to \infty$. This method for estimating $I_B$ is called simple sampling.
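As a concrete illustration, here is a minimal Python sketch of simple sampling (NumPy assumed; the particular target density, integrand $h$, and sample size are illustrative choices, not part of the notes). The standard error it reports anticipates the confidence-interval discussion below.

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_sampling(h, sample_from_f, n):
    """Estimate I_B = E[h(X)], X ~ f, by the sample mean of h(X_i)."""
    x = sample_from_f(n)                          # iid draws X_1, ..., X_n from f
    values = h(x)
    estimate = values.mean()                      # the unbiased estimator I_hat_n
    std_error = values.std(ddof=1) / np.sqrt(n)   # estimated sqrt(var(I_hat_n))
    return estimate, std_error

# Illustrative use: E[X^2] with X ~ N(0, 1), whose true value is 1.
est, se = simple_sampling(h=lambda x: x**2,
                          sample_from_f=lambda n: rng.standard_normal(n),
                          n=100_000)
print(f"estimate = {est:.4f} +/- {1.96 * se:.4f}")   # approximate 95% CI
```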
The Variance Reduction Problem

The variance of the simple sampling estimator $\hat{I}_n$ of $I_B$ is
$$\mathrm{var}(\hat{I}_n) = \frac{\mathrm{var}(h(X))}{n} = \frac{1}{n}\left(\int_S h(x)^2 f(x)\,dx - I_B^2\right). \quad (2)$$
The variance of the estimator determines the size of the confidence interval. The $n$ in the denominator is hard to avoid in Monte Carlo, but there are various ways to reduce the numerator.
The goal of this chapter is to explore alternative sampling schemes which can achieve smaller variance for the same amount of computational effort.
Stratified Sampling

Step 1: Range Partition

Stratified sampling is a powerful and commonly used technique in population surveys and is also very useful in Monte Carlo computations. To evaluate $I_B$, stratified sampling partitions $S$ into several disjoint sets $S^{(1)}, \ldots, S^{(M)}$ (so that $S = \bigcup_{i=1}^M S^{(i)}$).
For $i = 1, \ldots, M$, let
$$a_i = \int_{S^{(i)}} f(x)\,dx = P(X \in S^{(i)}).$$
Observe that $a_1 + \cdots + a_M = 1$. Fix integers $n_1, \ldots, n_M$ such that $n_1 + \cdots + n_M = n$.
Step 2: Sub-sampling

For each $i$, generate $n_i$ samples $X^{(i)}_1, \ldots, X^{(i)}_{n_i}$ from $S^{(i)}$ having the conditional pdf
$$g(x) = \begin{cases} f(x)/a_i, & \text{if } x \in S^{(i)}, \\ 0, & \text{otherwise.} \end{cases}$$
Let $T_i = n_i^{-1} \sum_{j=1}^{n_i} h(X^{(i)}_j)$. Then
$$E(T_i) = \int_{S^{(i)}} h(x)\,\frac{f(x)}{a_i}\,dx = \frac{1}{a_i} \int_{S^{(i)}} h(x)f(x)\,dx = I_i/a_i,$$
by defining $I_i = \int_{S^{(i)}} h(x)f(x)\,dx$.
Step 3: The Stratified Estimator

Observe that $I_1 + \cdots + I_M = I_B$. The stratified estimator is
$$T = \sum_{i=1}^M a_i T_i.$$
It is unbiased because
$$E(T) = \sum_{i=1}^M a_i E(T_i) = \sum_{i=1}^M a_i I_i/a_i = I_B.$$
The variance of $T$ is
$$\mathrm{var}(T) = \sum_{i=1}^M a_i^2\,\mathrm{var}(T_i), \quad \text{where} \quad \mathrm{var}(T_i) = \frac{\int_{S^{(i)}} h(x)^2\,\frac{f(x)}{a_i}\,dx - (I_i/a_i)^2}{n_i},$$
following from (2).
Theorem (The Foundation of Stratified Sampling)

If $n_i = n a_i$ for $i = 1, \ldots, M$, then the stratified estimator has smaller variance than the simple estimator $\hat{I}_n$. In fact,
$$\mathrm{var}(\hat{I}_n) = \mathrm{var}(T) + \frac{1}{n} \sum_{i=1}^M a_i \left( \frac{I_i}{a_i} - I_B \right)^2.$$
The choice $n_i = n a_i$, called proportional allocation, gives a stratified estimator which has smaller variance than the simple estimator.
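As an illustration of proportional allocation, here is a minimal Python sketch for the special case $f = U[0,1]$ with $M$ equal-width strata (so $a_i = 1/M$ and the conditional pdf on each stratum is uniform); the specialization to a uniform target and the choice of integrand, taken from Example 1 below, are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_uniform(h, n, M):
    """Stratified estimate of I = int_0^1 h(x) dx with f = U[0,1],
    equal-width strata S^(i) = [(i-1)/M, i/M], and proportional
    allocation n_i = n * a_i = n / M (n assumed divisible by M)."""
    n_i = n // M
    edges = np.linspace(0.0, 1.0, M + 1)
    T = 0.0
    for left, right in zip(edges[:-1], edges[1:]):
        x = rng.uniform(left, right, n_i)   # draws from the conditional pdf on S^(i)
        T += (right - left) * h(x).mean()   # accumulate a_i * T_i, a_i = right - left
    return T

h = lambda x: 4 * np.sqrt(1 - x**2)         # integrand of Example 1 below; I = pi
print(stratified_uniform(h, n=10_000, M=10))
```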
Importance Sampling

Properties of Importance Sampling

Importance sampling is a very powerful method that can improve Monte Carlo efficiency by orders of magnitude in some problems. But it requires caution: an inappropriate implementation can reduce efficiency by orders of magnitude!
The Basic Idea

The method works by sampling from an artificial probability distribution that is chosen by the user, and then reweighting the observations to get an unbiased estimate. The idea is based on the identity (1):
$$I_A = \int_{\mathbb{R}^d} k(x)\,dx = \int_{\mathbb{R}^d} \frac{k(x)}{f(x)}\,f(x)\,dx = E\!\left[\frac{k(X)}{f(X)}\right].$$
It implies that $I_A$ can be estimated by
$$\hat{J}_n = \frac{1}{n} \sum_{i=1}^n \frac{k(X_i)}{f(X_i)},$$
where the $X_i$'s are iid from $f$. We call $\hat{J}_n$ the importance sampling estimator based on $f$. The identity (1) implies that $\hat{J}_n$ is unbiased.
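A minimal sketch of $\hat{J}_n$ in Python (the particular $k$ and $f$ below are illustrative assumptions; with the uniform trial density the estimator reduces to simple sampling, and the benefit of better trial densities is the subject of Example 1 below):

```python
import numpy as np

rng = np.random.default_rng(0)

def is_estimate_IA(k, f_pdf, sample_from_f, n):
    """Unbiased estimate of I_A = int k(x) dx as the mean of k(X_i)/f(X_i), X_i ~ f."""
    x = sample_from_f(n)
    return (k(x) / f_pdf(x)).mean()

# Illustrative use: I_A = int_0^1 4*sqrt(1 - x^2) dx = pi, trial pdf f = U[0,1].
k = lambda x: 4 * np.sqrt(1 - x**2)
print(is_estimate_IA(k,
                     f_pdf=lambda x: np.ones_like(x),
                     sample_from_f=lambda n: rng.uniform(0, 1, n),
                     n=100_000))
```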
The Importance Sampling Procedure

Suppose now one is interested in evaluating
$$I_B = \int_{\mathbb{R}^d} h(x)f(x)\,dx.$$
The procedure of importance sampling is as follows:
(a) Draw $X_1, \ldots, X_n$ from a trial density $g$.
(b) Calculate the importance weights $w_j = f(X_j)/g(X_j)$, for $j = 1, \ldots, n$.
(c) Approximate $I_B$ by
$$\hat{J}_{g,n} = \frac{\sum_{j=1}^n w_j h(X_j)}{\sum_{j=1}^n w_j}. \quad (3)$$
Thus, in order to make the estimation error small, one wants to choose $g$ as close in shape to $h(x)f(x)$ as possible.
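A minimal Python sketch of steps (a)-(c); the target and trial densities below are illustrative assumptions, deliberately written without their normalizing constants to show that estimator (3) tolerates this (see the remark that follows):

```python
import numpy as np

def is_estimate_IB(h, f_pdf, g_pdf, sample_from_g, n):
    """Self-normalized importance sampling estimate (3) of I_B = E_f[h(X)].
    Any multiplicative constant missing from f_pdf (or g_pdf) cancels in
    the ratio of weighted sums, so unnormalized densities suffice."""
    x = sample_from_g(n)                  # (a) draw from the trial density g
    w = f_pdf(x) / g_pdf(x)               # (b) importance weights w_j
    return np.sum(w * h(x)) / np.sum(w)   # (c) the weighted average (3)

# Illustrative use: E[X] = 0 for X ~ N(0,1) (target f), trial density g = N(1,1).
rng = np.random.default_rng(0)
print(is_estimate_IB(h=lambda x: x,
                     f_pdf=lambda x: np.exp(-x**2 / 2),           # N(0,1), no constant
                     g_pdf=lambda x: np.exp(-(x - 1)**2 / 2),     # N(1,1), no constant
                     sample_from_g=lambda n: rng.normal(1.0, 1.0, n),
                     n=100_000))
```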
An Alternative Importance Sampling Procedure

A major advantage of using (3) instead of the unbiased estimate
$$\tilde{I}_B = \frac{1}{n} \sum_{j=1}^n w_j h(X_j)$$
is that with the former we need only know the ratio $f(X)/g(X)$ up to a multiplicative constant, whereas with the latter the ratio needs to be known exactly. Although it introduces a small bias, (3) often has a smaller mean squared error than the unbiased estimate $\tilde{I}_B$.
Example 1

Let $h(x) = 4\sqrt{1-x^2}$, $x \in [0, 1]$. Let us imagine that we do not know how to evaluate $I = \int_0^1 h(x)\,dx$ (which is $\pi$, of course).
Use Simple Sampling

The simple sampling estimate is
$$\hat{I}_n = \frac{1}{n} \sum_{i=1}^n 4\sqrt{1-U_i^2},$$
where the $U_i$ are iid U[0,1] random variables. This is unbiased, with variance
$$\mathrm{var}(\hat{I}_n) = \frac{1}{n}\left(\int_0^1 h(x)^2\,dx - I^2\right) = \frac{1}{n}\left(\int_0^1 16(1-x^2)\,dx - \pi^2\right) = \frac{0.797}{n}.$$
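A quick numerical check of these figures (a sketch; the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0, 1, 1_000_000)
h_u = 4 * np.sqrt(1 - u**2)
print(h_u.mean())        # close to pi = 3.14159...
print(h_u.var(ddof=1))   # close to 0.797, so var(I_hat_n) is about 0.797/n
```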
Use Inappropriate Importance Sampling

Consider the importance sampling estimate based on the pdf $g_b(x) = 2x$, $x \in [0, 1]$. It is easy to generate $Y_i \sim g_b$ (the cdf is $F(t) = t^2$, so we can set $Y_i = F^{-1}(U_i) = \sqrt{U_i}$, where $U_i \sim U[0, 1]$). The importance sampling estimator is
$$\hat{J}^{(b)}_n = \frac{1}{n} \sum_{i=1}^n h(Y_i)/g_b(Y_i) = \frac{1}{n} \sum_{i=1}^n \frac{4\sqrt{1-Y_i^2}}{2Y_i}.$$
The estimator $\hat{J}^{(b)}_n$ has mean $I$ and variance
$$\mathrm{var}(\hat{J}^{(b)}_n) = \frac{1}{n}\,\mathrm{var}\!\left(\frac{h(Y)}{g_b(Y)}\right) = \frac{1}{n} \int_0^1 \left(\frac{h(x)}{g_b(x)} - I\right)^2 g_b(x)\,dx = +\infty.$$
Hence, the trial density $g_b(x) = 2x$ is very bad, and we need to try a different one.
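A short sketch showing the failure empirically: the mean is still consistent, but draws of $Y_i$ near 0 make the sample variance enormous and unstable across seeds (the exact values printed will vary).

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.sqrt(rng.uniform(0, 1, 1_000_000))   # Y ~ g_b via F^{-1}(u) = sqrt(u)
ratio = 4 * np.sqrt(1 - y**2) / (2 * y)     # h(Y) / g_b(Y)
print(ratio.mean())                         # still in the vicinity of pi...
print(ratio.var(ddof=1))                    # ...but far above 0.797 and unstable
```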
Use Appropriate Importance Sampling

Let $g_c(x) = (4 - 2x)/3$, $x \in [0, 1]$. The importance sampling estimator is
$$\hat{J}^{(c)}_n = \frac{1}{n} \sum_{i=1}^n \frac{4\sqrt{1-Y_i^2}}{(4 - 2Y_i)/3},$$
whose variance is
$$\mathrm{var}(\hat{J}^{(c)}_n) = \frac{1}{n}\,\mathrm{var}\!\left(\frac{h(Y)}{g_c(Y)}\right) = \frac{1}{n} \int_0^1 \left(\frac{h(x)}{g_c(x)} - I\right)^2 g_c(x)\,dx = \frac{1}{n}\left[\int_0^1 \frac{16(1-x^2)}{(4-2x)/3}\,dx - \pi^2\right] = 0.224/n.$$
Thus, the importance sampling estimate of (c) can achieve the same size of confidence interval as the simple sampling estimate while using only about one third as many generated random variables.
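A sketch of the estimator of (c); sampling from $g_c$ by the inverse-cdf method is an added detail not spelled out in the notes: $G_c(t) = (4t - t^2)/3$, so $G_c^{-1}(u) = 2 - \sqrt{4 - 3u}$.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0, 1, 1_000_000)
y = 2 - np.sqrt(4 - 3 * u)                          # Y ~ g_c via the inverse cdf
ratio = 4 * np.sqrt(1 - y**2) / ((4 - 2 * y) / 3)   # h(Y) / g_c(Y)
print(ratio.mean())        # close to pi
print(ratio.var(ddof=1))   # close to 0.224, versus 0.797 for simple sampling
```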
Control Variates Method

The Main Idea

In this method, one uses a control variate $C$, which is correlated with the sample $X$, to produce a better estimate.

The Procedure

Suppose the estimation of $\mu = E(X)$ is of interest and $\mu_C = E(C)$ is known. Then we can construct Monte Carlo samples of the form
$$X(b) = X - b(C - \mu_C),$$
which have the same mean as $X$, but a new variance
$$\mathrm{var}(X(b)) = \mathrm{var}(X) - 2b\,\mathrm{Cov}(X, C) + b^2\,\mathrm{var}(C).$$
If the computation of $\mathrm{Cov}(X, C)$ and $\mathrm{var}(C)$ is easy, then we can let $b = \mathrm{Cov}(X, C)/\mathrm{var}(C)$, in which case
$$\mathrm{var}(X(b)) = (1 - \rho_{XC}^2)\,\mathrm{var}(X) < \mathrm{var}(X).$$

A Special Case

Another situation is when we know only that $E(C)$ is equal to $\mu$. Then we can form $X(b) = bX + (1 - b)C$. It is easy to show that if $C$ is correlated with $X$, we can always choose a proper $b$ so that $X(b)$ has a smaller variance than $X$.
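A minimal sketch with an illustrative control variate of our own choosing: estimating $\mu = E(e^U)$ for $U \sim U[0,1]$ with control $C = U$, whose mean $\mu_C = 1/2$ is known exactly. Estimating the optimal $b$ from the same sample is a common shortcut; it introduces only a negligible bias for large $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.uniform(0, 1, n)
x = np.exp(u)            # X: we want mu = E(e^U) = e - 1 = 1.71828...
c = u                    # C: control variate with known mean mu_C = 1/2
b = np.cov(x, c)[0, 1] / np.var(c, ddof=1)   # estimated b = Cov(X, C)/var(C)
x_b = x - b * (c - 0.5)  # X(b) = X - b(C - mu_C): same mean, reduced variance
print(x.mean(), x_b.mean())               # both near 1.71828
print(x.var(ddof=1), x_b.var(ddof=1))     # variance shrinks by a factor (1 - rho^2)
```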
Antithetic Variates Method

The Main Idea

Suppose $U$ is a random number used in the production of a sample $X$ that follows a distribution with cdf $F$, that is, $X = F^{-1}(U)$; then $X' = F^{-1}(1 - U)$ also follows the distribution $F$. More generally, if $g$ is a monotone function, then
$$[g(u_1) - g(u_2)][g(1 - u_1) - g(1 - u_2)] \le 0$$
for any $u_1, u_2 \in [0, 1]$.
For two independent uniform random variables $U_1$ and $U_2$, we have
$$E\{[g(U_1) - g(U_2)][g(1 - U_1) - g(1 - U_2)]\} = 2\,\mathrm{Cov}(X, X') \le 0,$$
where $X = g(U)$ and $X' = g(1 - U)$. Therefore,
$$\mathrm{var}[(X + X')/2] \le \mathrm{var}(X)/2,$$
implying that using the pair $X$ and $X'$ is better than using two independent Monte Carlo draws for estimating $E(X)$.
Example 2

We return once more to the problem of estimating the integral $I = \int_0^1 4\sqrt{1-x^2}\,dx$. Choose a large even value of $n$. As usual, our simple estimator and its variance are
$$\hat{I}_n = \frac{1}{n} \sum_{i=1}^n h(U_i), \qquad \mathrm{var}(\hat{I}_n) = 0.797/n.$$
Our corresponding antithetic estimator and its variance are
$$\hat{I}^{A}_n = \frac{1}{n} \sum_{i=1}^{n/2} \left[h(U_i) + h(1 - U_i)\right],$$
$$\begin{aligned}
\mathrm{var}(\hat{I}^{A}_n) &= \frac{1}{n^2}\left\{\frac{n}{2}\left[\mathrm{var}(h(U_1)) + 2\,\mathrm{Cov}(h(U_1), h(1 - U_1)) + \mathrm{var}(h(1 - U_1))\right]\right\} \\
&= \frac{1}{n}\left[\mathrm{var}(h(U_1)) + \mathrm{Cov}(h(U_1), h(1 - U_1))\right] = 0.219/n.
\end{aligned}$$
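A sketch confirming these numbers (the sample size and seed are arbitrary; since the estimator averages $n/2$ iid pairs, the sample variance of the pairs divided by 2 estimates $n \cdot \mathrm{var}(\hat{I}^{A}_n)$):

```python
import numpy as np

rng = np.random.default_rng(0)
h = lambda x: 4 * np.sqrt(1 - x**2)

n = 1_000_000                  # a large even n
u = rng.uniform(0, 1, n // 2)
pairs = h(u) + h(1 - u)        # antithetic pairs h(U_i) + h(1 - U_i)
print(pairs.sum() / n)         # the antithetic estimator, close to pi
print(pairs.var(ddof=1) / 2)   # close to 0.219, i.e. var ~ 0.219/n vs 0.797/n
```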