Generating Random Numbers

Aim: produce random variables for a given distribution.

Inverse Method

Let $F$ be the distribution function of a univariate distribution and let
\[ F^{-1}(y) = \inf\{x : F(x) \ge y\} \]
(the generalized inverse of $F$). For a uniformly distributed random variable $U \sim U(0,1)$, let $X = F^{-1}(U)$. Then $X$ has distribution function $F$:
\[ P(X \le x) = P\big(F^{-1}(U) \le x\big) = P\big(U \le F(x)\big) = F(x). \]

Example: For the exponential distribution, we have $F(x) = 1 - \exp(-\theta x)$. Thus
\[ X = F^{-1}(U) = -\tfrac{1}{\theta} \log(1 - U) \]
is exponentially distributed with parameter $\theta$ (an R illustration is given below the references).

Remarks: The result shows that the generation of sequences of random variables (usually iid) from some given distribution rests on the production of uniform random variables. Since the algorithms for generating uniform random variables are deterministic, we call them pseudo random number generators.

References:
Robert, C.P. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer, New York.
Ripley, B.D. (1987). Stochastic Simulation. Wiley, New York.
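A minimal R sketch of the inverse method for the exponential example; the rate theta <- 2 is an arbitrary value chosen for illustration:

theta <- 2                   # illustrative rate parameter
U <- runif(10000)            # U ~ U(0,1)
X <- -log(1 - U) / theta     # inverse method: X = F^{-1}(U) is Exp(theta)
mean(X)                      # should be close to 1/theta = 0.5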
Acceptance-Rejection Method

Let $f$ be the density of some univariate distribution we want to sample from. Suppose that $f(\cdot)$ is majorized by $c\,g(\cdot)$,
\[ f(y) \le c\, g(y), \]
for some simple probability density $g$ and some constant $c > 1$. The idea of acceptance-rejection is to sample proposals $X$ from the simpler (simulation-wise) density $g$ and then to reject those proposals which are likely to be overrepresented in the sample:

Sample $X \sim g(x)$.
Sample $U \sim U(0,1)$.
Accept $X$ if $U \le \dfrac{f(X)}{c\,g(X)}$.

Example: Non-standard prior for a binomial parameter

Suppose that $Y$ is binomially distributed, $Y \sim \text{Bin}(n, \theta)$, with the non-conjugate prior distribution
\[ \pi(\theta) = 4\left(\tfrac{1}{2} - \left|\theta - \tfrac{1}{2}\right|\right), \qquad \theta \in [0,1]. \]
The posterior distribution (conditional on data $Y = y$) satisfies
\[ \pi(\theta \mid y) \propto \theta^y (1-\theta)^{n-y} \left(\tfrac{1}{2} - \left|\theta - \tfrac{1}{2}\right|\right), \qquad \theta \in [0,1]. \]
Example: Non-standard prior for a binomial parameter (contd)

Since
\[ \tfrac{1}{2} - \left|\theta - \tfrac{1}{2}\right| \le \tfrac{1}{2} - 2\left(\theta - \tfrac{1}{2}\right)^2 = 2\,\theta(1-\theta), \]
we get for some $c > 0$
\[ \pi(\theta \mid y) \le c\, \theta^{y+1} (1-\theta)^{n-y+1}. \]
The right side is proportional to the density of a beta distribution with parameters $y+2$ and $n-y+2$. This suggests the following sampling scheme (an R sketch follows below):

Define
\[ f(\theta) = \theta^y (1-\theta)^{n-y} \left(\tfrac{1}{2} - \left|\theta - \tfrac{1}{2}\right|\right), \qquad g(\theta) = 2\,\theta^{y+1} (1-\theta)^{n-y+1}. \]
Sample $\theta \sim \text{Beta}(y+2,\, n-y+2)$ and $U \sim U(0,1)$.
Accept $\theta$ if $U \le \dfrac{f(\theta)}{g(\theta)}$.

Note: We did not need to compute the proportionality factor
\[ \int_\Theta f(y \mid \theta)\, \pi(\theta)\, d\theta, \]
since it cancels in the ratio $f(\theta)/g(\theta)$.

[Figure: Acceptance-rejection for Bayesian inference with a non-standard prior. Left: posterior density $\pi(\theta \mid y)$ and envelope $c\,g(\theta)$ over $\theta \in [0,1]$; right: histogram of the accepted values of $\theta$.]
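A minimal R implementation of this sampler, assuming illustrative data n <- 20, y <- 4 (these values are not from the slides):

n <- 20; y <- 4                        # hypothetical data
f <- function(th) th^y * (1 - th)^(n - y) * (0.5 - abs(th - 0.5))
g <- function(th) 2 * th^(y + 1) * (1 - th)^(n - y + 1)
M <- 10000
th <- rbeta(M, y + 2, n - y + 2)       # proposals from the beta envelope
U <- runif(M)
draws <- th[U <= f(th) / g(th)]        # accepted proposals: a sample from pi(theta|y)
length(draws) / M                      # observed acceptance rate

Note that f and g are only known up to the same normalizing constant; the ratio f(th)/g(th) is still correct, which is exactly the point of the remark above.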
For other distributions (e.g. the normal), special methods exist. In this course, we will only use the built-in generators in R:

N(µ, σ²)                  rnorm(n, mean=0, sd=1)
U[min, max]               runif(n, min=0, max=1)
Beta(a, b)                rbeta(n, a, b)
Bin(s, p)                 rbinom(n, size, prob)
Cauchy(α, σ)              rcauchy(n, location=0, scale=1)
χ²_n(δ)                   rchisq(n, df, ncp=0)
Exp(rate)                 rexp(n, rate=1)
F_{m,n}                   rf(n, df1, df2)
Γ(a, s)                   rgamma(n, shape, rate=1, scale=1/rate)
Geom(p)                   rgeom(n, prob)
H(m, n, k)                rhyper(nn, m, n, k)
log-normal(µ, σ²)         rlnorm(n, meanlog=0, sdlog=1)
Logistic(µ, σ²)           rlogis(n, location=0, scale=1)
NegBinom(s, p)            rnbinom(n, size, prob, mu)
Poisson(λ)                rpois(n, lambda)
t_n                       rt(n, df)
Weibull(a, b)             rweibull(n, shape, scale=1)

R also provides functions for calculating the density (dF), the distribution function (pF) and the quantiles (qF), where F is the name of the distribution as in the above commands.
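For example, for the normal distribution the four functions fit together as follows (a small consistency check, not from the slides):

x <- 1.5
dnorm(x)            # density of N(0,1) at x
pnorm(x)            # distribution function: P(X <= x)
qnorm(pnorm(x))     # the quantile function inverts pnorm: returns 1.5
rnorm(5)            # five random draws from N(0,1)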
Monte Carlo Methods

Aim: Evaluate the expectation
\[ E(h(Y)) = \int h(y)\, f(y)\, dy, \qquad (1) \]
where $Y$ is some (possibly high-dimensional) random variable with distribution defined by $f(y)$.

Examples:

Suppose $\hat\theta = \hat\theta(Y)$ is an estimator for some parameter $\theta$. Quantities of interest are the bias and the standard deviation of $\hat\theta$,
\[ \text{bias}(\hat\theta) = E(\hat\theta) - \theta, \qquad \sigma(\hat\theta) = \Big( E\big(\hat\theta - E(\hat\theta)\big)^2 \Big)^{\frac{1}{2}}. \]
In cases where the estimate $\hat\theta$ is obtained by an iterative estimation procedure (e.g. by Newton-Raphson), the estimator $\hat\theta(Y)$ cannot be written in closed form and the integral (1) cannot be computed by numerical integration.

For two normally distributed samples $Y_{11}, \ldots, Y_{n_1 1}$ and $Y_{12}, \ldots, Y_{n_2 2}$, the hypothesis of equal means can be tested by the two-sample t test with test statistic
\[ T = \frac{\bar Y_1 - \bar Y_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}. \]
The p-value of the test is defined as
\[ P(T(Y) > t) = E\big( 1_{(t,\infty)}(T(Y)) \big). \]
To see how the t test performs under departures from normality, we need to evaluate the p-value (or equivalently the significance level) of the test for non-normal distributions $f(y)$.
Monte Carlo approach:

Draw a sample $Y^{(1)}, \ldots, Y^{(n)} \overset{\text{iid}}{\sim} f(y)$.
Estimate the expectation by $\frac{1}{n} \sum_{t=1}^{n} h(Y^{(t)})$ (Monte Carlo integration).

For independent samples, the Law of Large Numbers yields
\[ \frac{1}{n} \sum_{t=1}^{n} h(Y^{(t)}) \to E(h(Y)) \quad \text{as } n \to \infty. \]

Example: Two-sample t test in R

N <- 10000                  # number of MC repetitions
n1 <- 8                     # sample sizes
n2 <- 4
Y1 <- rgamma(N*n1, 1, 1)    # sample from gamma distribution
Y2 <- rgamma(N*n2, 2, 2)
# Y1 <- rnorm(N*n1, 0, 1)   # alternatively: sample from normal distribution
# Y2 <- rnorm(N*n2, 0, 2)
Y <- c(Y1, Y2)
dim(Y) <- c(N, n1+n2)       # one sample of size n1+n2 per row

tstat <- function(Y, n1, n2) {
  Y1 <- Y[1:n1]
  Y2 <- Y[(n1+1):(n1+n2)]
  # two-sample t test statistic
  T <- (mean(Y1) - mean(Y2)) / sqrt(var(Y1)/n1 + var(Y2)/n2)
  # Satterthwaite approximation of degrees of freedom
  df <- (var(Y1)/n1 + var(Y2)/n2)^2 /
        ((var(Y1)/n1)^2/(n1-1) + (var(Y2)/n2)^2/(n2-1))
  return(c(T, df))
}

# calculate test statistic for N samples
R <- apply(Y, 1, tstat, n1, n2)
# estimate significance level
a <- mean(ifelse(abs(R[1,]) > qt(0.975, R[2,]), 1, 0))
a
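The first example on the previous slide (bias and standard deviation of an estimator) is handled the same way. A minimal sketch, taking for illustration the sample median as estimator of the mean of a normal sample (this choice of estimator and parameters is not from the slides):

theta <- 0                # true parameter
n <- 25; N <- 10000       # sample size and number of MC repetitions
est <- replicate(N, median(rnorm(n, theta, 1)))
mean(est) - theta         # Monte Carlo estimate of bias(theta-hat)
sd(est)                   # Monte Carlo estimate of sigma(theta-hat)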
Parametric Bootstrap

Suppose $Y_1, \ldots, Y_n \overset{\text{iid}}{\sim} P_\theta$ and $\hat\theta = S(Y)$ is an estimator for $\theta$.

Aim: Determine the variance of $\hat\theta$.

Problem: Iterative algorithms such as the EM algorithm often do not give analytic expressions for the variance of the estimator.

Idea: Simulate from the distribution $P_\theta$ to obtain samples $Y^{(b)} = (Y^{(b)}_1, \ldots, Y^{(b)}_n)$, $b = 1, \ldots, B$, and for each sample an estimate $\hat\theta^{(b)} = S(Y^{(b)})$. Then the variance of $\hat\theta = S(Y)$ can be estimated by
\[ \hat\sigma^2(\hat\theta) = \frac{1}{B} \sum_{b=1}^{B} \big(\hat\theta^{(b)} - \bar\theta\big)^2. \]

Problem: We cannot sample from $P_\theta$ since $\theta$ is unknown.

Idea: Approximate $P_\theta$ by $P_{\hat\theta}$.

Parametric bootstrap:
Estimate $\theta$ by $\hat\theta = S(Y)$.
For $b = 1, \ldots, B$, simulate bootstrap replications $Y^{(b)}_j \overset{\text{iid}}{\sim} P_{\hat\theta}$.
Compute the bootstrap estimate $\hat\theta^{(b)}$ from the bootstrap sample $Y^{(b)}$.
Estimate the variance of $\hat\theta = S(Y)$ by
\[ \hat\sigma^2(\hat\theta) = \frac{1}{B} \sum_{b=1}^{B} \big(\hat\theta^{(b)} - \bar\theta\big)^2, \quad \text{where } \bar\theta = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^{(b)}. \]
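A minimal R sketch of the scheme, taking for illustration an exponential model with the maximum likelihood estimator of the rate (data and estimator are hypothetical choices, not from the slides):

set.seed(1)
n <- 50
Y <- rexp(n, rate = 2)            # "observed" data, here simulated for illustration
S <- function(Y) 1/mean(Y)        # MLE of the exponential rate
theta.hat <- S(Y)                 # estimate from the data
B <- 2000
theta.boot <- replicate(B, S(rexp(n, rate = theta.hat)))   # bootstrap replications from P_theta-hat
mean((theta.boot - mean(theta.boot))^2)                    # bootstrap variance estimate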
Importance Sampling

Aim: Reduction of variance.

Rewrite $E(h(Y))$ with $Y \sim f$ as
\[ \int h(y)\, f(y)\, dy = \int \frac{h(y)\, f(y)}{g(y)}\, g(y)\, dy. \]
Thus if $Z^{(1)}, \ldots, Z^{(n)} \sim g$, then
\[ \frac{1}{n} \sum_{t=1}^{n} h\big(Z^{(t)}\big)\, \frac{f\big(Z^{(t)}\big)}{g\big(Z^{(t)}\big)} \]
offers an alternative approximation of $E(h(Y))$. The variance of the Monte Carlo estimate of $E(h(Y))$ is minimized if
\[ h(y)\, \frac{f(y)}{g(y)} \approx \text{constant}. \]

Example: Suppose we want to approximate $p = P(Y > c)$ for
(i) $Y \sim N(0,1)$
(ii) $Y \sim \text{Cauchy}(0,1)$

An intuitive Monte Carlo approximation is
\[ \hat p_n = \frac{1}{n} \sum_{t=1}^{n} 1_{(c,\infty)}\big(Y^{(t)}\big), \]
where $Y^{(t)} \overset{\text{iid}}{\sim} N(0,1)$ resp. $Y^{(t)} \overset{\text{iid}}{\sim} \text{Cauchy}(0,1)$. Since $\hat p_n$ is the mean of $n$ Bernoulli random variables, its relative error is
\[ \frac{\sqrt{\operatorname{var}(\hat p_n)}}{p} = \sqrt{\frac{1-p}{p\,n}}. \]

Problem: For small $p$ the relative error is large.
Alternative approach: Let $Z^{(t)}$ be exponentially distributed on $(c, \infty)$,
\[ Z^{(t)} \sim g(z) = \exp(c - z)\, 1_{(c,\infty)}(z). \]
Then $p$ can be approximated by
\[ \tilde p_n = \frac{1}{n} \sum_{t=1}^{n} \frac{f\big(Z^{(t)}\big)}{g\big(Z^{(t)}\big)}, \]
where $f$ is the density of the (i) normal or (ii) Cauchy distribution.

[Figure: Top: running estimates $\hat p_n$ (solid) and $\tilde p_n$ (dashed), in %, against $t$ for the two cases. Bottom: the corresponding weight functions $f(z)/g(z)$ for $z \in [3, 10]$.]

Remarks:
If the ratio $f/g$ is unbounded, the weights $f(Z^{(t)})/g(Z^{(t)})$ vary widely and the approximation is largely influenced by only a few values.
The distribution $g$ should have heavier tails than $f$.
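A minimal R sketch for the normal case, with hypothetical choices c0 <- 4.5 and n <- 10000 (the slides do not fix these values):

set.seed(1)
c0 <- 4.5; n <- 10000             # threshold and sample size, chosen for illustration
p.naive <- mean(rnorm(n) > c0)    # intuitive estimate; almost surely 0 for such a small p
Z <- c0 + rexp(n)                 # Z ~ g(z) = exp(c0 - z) on (c0, Inf)
w <- dnorm(Z) / exp(c0 - Z)       # importance weights f(Z)/g(Z)
p.is <- mean(w)                   # importance sampling estimate
c(p.naive, p.is, pnorm(c0, lower.tail = FALSE))   # compare with the exact value

The naive estimator typically returns 0 here, while the importance sampling estimate is accurate: every draw $Z^{(t)}$ lands in the rare region, and the weights do the bookkeeping.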
Markov Chain Monte Carlo

Suppose that $Y = (Y_1, \ldots, Y_d)^T$ has density $f(y)$.

Aim: Approximate
\[ E(h(Y)) \approx \frac{1}{n} \sum_{t=1}^{n} h\big(Y^{(t)}\big) \quad \text{with } Y^{(1)}, \ldots, Y^{(n)} \overset{\text{iid}}{\sim} f(y). \]

Problem: Independent sampling from a multivariate distribution is often difficult.

Solution: The Law of Large Numbers
\[ \frac{1}{n} \sum_{t=1}^{n} h\big(Y^{(t)}\big) \to E(h(Y)) \quad \text{as } n \to \infty \]
still applies if the $Y^{(t)}$ are (not too) dependent observations with $Y^{(t)} \sim f(y)$.

Idea: Generate a sequence of random numbers $Y^{(k)}$ which converges to a dependent sample from the joint distribution $f(y)$:
\[ Y^{(t)} \sim f(y) \quad \text{but NOT} \quad Y^{(t)} \overset{\text{iid}}{\sim} f(y). \]
We will show that this can be accomplished by sampling $Y_i^{(t)}$ from the conditional distribution
\[ f\big(y_i \,\big|\, Y_1^{(t)}, \ldots, Y_{i-1}^{(t)}, Y_{i+1}^{(t-1)}, \ldots, Y_d^{(t-1)}\big) \quad \text{for } i = 1, \ldots, d. \]
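To preview this componentwise scheme, here is a minimal sketch for a bivariate standard normal target with correlation rho (an illustrative example, not from the slides), where both conditional distributions are known normals:

rho <- 0.8; n <- 5000
Y <- matrix(0, n, 2)              # Y[t,] holds the state after sweep t
y1 <- 0; y2 <- 0                  # arbitrary starting values
for (t in 1:n) {
  # conditional distributions of the standard bivariate normal:
  y1 <- rnorm(1, mean = rho*y2, sd = sqrt(1 - rho^2))
  y2 <- rnorm(1, mean = rho*y1, sd = sqrt(1 - rho^2))
  Y[t,] <- c(y1, y2)
}
colMeans(Y)                       # approximates E(Y) = (0, 0)
cor(Y[,1], Y[,2])                 # approximates rho

The draws are dependent across sweeps, but marginally each $Y^{(t)}$ converges in distribution to $f(y)$, which is all the Law of Large Numbers above requires.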