Estimating Joint Default Probability by Efficient Importance Sampling with Applications from Bottom Up


Chuan-Hsiang Han

October 6, 2009

Abstract

This paper provides a unified approach to estimate the probability of joint default under classical models in the bottom up approach. Starting from a toy model defined on a Gaussian random variable in one dimension, we develop an importance sampling scheme and consider its variance approximation problem with a small scale. By means of the large deviation principle, the importance sampling is proved to be efficient, justified by numerical experiments in credit risk applications such as VaR and C-VaR estimation. The same approach is applicable to construct importance sampling schemes for high-dimensional problems, including some factor copula models in reduced form and structural-form models. In particular, the large deviation principle is applied to prove that all these importance sampling schemes are efficient for rare event simulation. Extensive numerical examples demonstrate the efficiency and stability of importance sampling. When stochastic correlation or stochastic volatility arises in structural-form models, the proposed importance sampling cannot be applied directly to estimate the joint default probability. We overcome this difficulty by combining the singular perturbation approximation with the importance sampling scheme. A numerical example for computing the loss density function of a credit portfolio confirms the efficiency and stability of this new importance sampling method in high dimensions.

Department of Quantitative Finance, National Tsing Hua University, Hsinchu, Taiwan, 3003, ROC, chhan@mx.nthu.edu.tw. Work supported by NSC 97-25-M-007-002-MY2, Taiwan. Acknowledgements: NCTS, National Tsing Hua University and TIMS, National Taiwan University.

Keywords: Bottom up approach, Reduced-form model, Structural-form model, Copula, Importance sampling, Singular perturbation.

1 Introduction

We aim to estimate the probability of joint default among a credit portfolio from the bottom up approach, in which the likelihood of each constituent's default is specified. For example, a reduced-form model imposes an intensity process to characterize the distribution of a constituent's default time, while a structural-form model determines the timing of a constituent's default-trigger event based on whether assets are too small relative to liabilities on a balance sheet. See [10, 24] for details on this approach. In contrast, in the top down approach the portfolio intensity is specified as in reduced form, without reference to constituents [6]; a constituent intensity is recovered by random thinning. A comparison of the bottom up and top down approaches, both in reduced form, can be found in [5].

The probability of joint default is important for predicting possible losses of a credit portfolio and for the evaluation of multi-name credit derivatives [20]. Due to the high-dimensional nature of joint default, its probability can be estimated by Monte Carlo simulation. The main advantage of the basic Monte Carlo method is its independence of dimension, while its disadvantage is the slow rate of convergence. Hence it is crucial to reduce the variance of the Monte Carlo estimator in order to improve its convergence. Control variates and importance sampling are prominent techniques for variance reduction; see [7, 23] for details, as well as for many other variance reduction techniques. Being additive, a control variate relies on the correlation between the random variable to be evaluated and the control random variable: the stronger the correlation, either positive or negative, the larger the variance reduction. In contrast, being multiplicative, importance sampling relies on how the distribution is relocated.
For example, one can add more weight to random samples falling into the region of interest and reduce the weight of other samples, keeping the sum of total weights equal to one. This approach has the potential of significant gains for rare event simulation: adding weight where the rare event happens increases its occurrence, so that the event of interest is no longer rare under the new distribution and an accurate estimate can be expected.

In this paper, we explore a unified approach to analyze variances of importance sampling estimators. We construct importance sampling schemes in an intuitive fashion, while defining small scales to amplify rare events. Then, by the large deviation principle, we approximate the first moment of an importance sampling estimator, which is unbiased for the default probability, and its second moment. When, in the asymptotic sense, a zero variance is obtained, the importance sampling scheme is verified to be efficient, and financial applications follow. We show that this approach is applicable to a one-dimensional toy model, high-dimensional factor Gaussian copula models in reduced form, and high-dimensional first passage time problems in structural form. In an application to estimate daily VaR under the jump-diffusion model (see Table 6 in Section 3.2.2), our importance sampling method performs even better than importance sampling schemes that take jump effects into account [27]. Another interesting application, beyond finance, is to compute the joint CDF of a multivariate normal; it can be viewed as an extension of the factor Gaussian copula model. Numerical results generated from our efficient importance sampling are comparable to those from the Matlab code mvncdf.m, based on the quasi Monte Carlo method developed in [4]. Moreover, there is no dimension restriction in our method, while the maximal dimension appears to be 25 in this Matlab code.

When complex models such as stochastic correlation or stochastic volatility appear in structural form, these efficient importance sampling schemes cannot be applied directly. Based on the singular perturbation method developed in [3], we derive an asymptotic expansion of the joint default probability for a stochastic correlation model. The leading-order approximation term in the expansion has an effective correlation, which is constant, so that an importance sampling scheme can be constructed as before. Hence we propose a new importance sampling scheme based on a singular perturbation approximation.
Carmona, Fouque, and Vestal [7] recently studied a first passage time problem and estimated the loss density function for a credit portfolio under a stochastic volatility model. They use interacting particle systems for variance reduction. Our new importance sampling scheme is an alternative way to treat similar problems as in [7], and we find that its numerical results perform reasonably well in accuracy and stability.

The organization of this paper is the following. Section 2 explores the fundamental ideas in a simple Gaussian random variable model with a small scale. An approximation to the variance of the efficient importance sampling estimator is obtained directly by a tail approximation, then justified by Cramér's theorem. Section 3 explores standard classical models in reduced form and in structural form; we show that importance sampling schemes can also be developed for these models. Section 4 and Section 5 provide a comprehensive treatment in the context of a random environment, such as stochastic correlation and stochastic volatility. We first apply the singular perturbation method [3] to obtain the homogenized leading-order term with an accuracy result under some smoothness and boundedness assumptions. Then we use this leading-order approximation to construct an importance sampling scheme. An example computing the portfolio loss density function is provided.

2 A Toy Model: One Dimension

We start from a simple and static model to estimate the probability of default defined by P^c = E{I(X > c)}, where the standard normal random variable X ~ N(0,1) denotes the loss of a portfolio and c > 0 denotes a loss threshold. Of course, P^c is simply a tail probability and admits the closed-form solution N(-c), where N(x) denotes the cumulative normal distribution function. Using the basic Monte Carlo method to estimate P^c is not feasible when the loss threshold c is large. This is because the number of random samples falling inside the default region {X > c} is too small (it could even be zero) relative to the number of simulations. That is, a problem of rare event simulation is encountered when the scale c is large. Applying importance sampling as a variance reduction method to treat rare event simulation has been extensively studied [6]. An intuitive idea is to shift the mean level of X from zero to µ, typically near the default threshold c, by a suitable likelihood function, so that the default event is no longer rare under the new probability measure. For example, the density function (1/sqrt(2π)) e^{-(x-µ)^2/2} is often used to change the probability measure as follows:

P^c = (1/sqrt(2π)) ∫ I(x > c) [e^{-x^2/2} / e^{-(x-µ)^2/2}] e^{-(x-µ)^2/2} dx = E_µ{I(X > c) e^{µ^2/2} e^{-µX}},   (1)

where X ~ N(µ,1) is defined under the new probability measure P_µ with the likelihood function e^{µ^2/2} e^{-µX}. Note that for any given µ, the estimator associated with (1) is unbiased for the default probability.
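The mean-shift scheme (1) is easy to sketch numerically. The following is a minimal illustration, assuming NumPy is available; the threshold c = 4, the sample size, and the helper name `is_default_prob` are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_default_prob(c, n=100_000, mu=None):
    """Importance-sampling estimate of P(X > c) for X ~ N(0,1):
    sample X ~ N(mu,1) and reweight by the likelihood ratio
    exp(mu^2/2 - mu*X), as in equation (1)."""
    if mu is None:
        mu = c                      # the shift suggested by the asymptotics
    x = rng.normal(mu, 1.0, n)
    w = np.exp(mu**2 / 2 - mu * x)  # likelihood ratio dP/dP_mu
    h = (x > c) * w
    return h.mean(), h.std(ddof=1) / np.sqrt(n)

mean, se = is_default_prob(4.0)     # exact value is 1 - N(4), about 3.17e-05
```

At this sample size the basic Monte Carlo estimator rarely sees a single sample in {X > 4}, while the shifted sampler lands there about half the time, with the weights w restoring unbiasedness.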
For the purpose of variance reduction, one seeks to minimize the second moment function P_2^c(µ) defined by

P_2^c(µ) := E_µ{I(X > c) e^{µ^2} e^{-2µX}},   (2)

which also admits a closed-form solution, e^{µ^2}(1 - N(c + µ)), due to the normality assumption. Besides computing the minimizer µ* numerically, one can obtain it through asymptotic analysis. Typically the latter determines a suitable µ a priori as an approximation to the optimal µ*, so there is no cost in solving for the minimizer of P_2^c(µ). Recall the classical tail approximation: if x → ∞,

1 - N(x) ≈ (1/(sqrt(2π) x)) e^{-x^2/2}.   (3)

By the choice µ = c and for a large loss threshold c, a straightforward calculation leads to the following approximations:

P^c = 1 - N(c) ≈ (1/(sqrt(2π) c)) e^{-c^2/2},
P_2^c(c) = e^{c^2}(1 - N(2c)) ≈ (1/(2 sqrt(2π) c)) e^{-c^2}.

Note that P_2^c(c) ≈ (P^c)^2 for large c, or precisely

lim_{c→∞} (1/c^2) log P_2^c(c) = 2 lim_{c→∞} (1/c^2) log P^c.

This result implies that with the choice of mean shift µ = c, the variance of the estimator (1), P_2^c(c) - (P^c)^2, is asymptotically zero when c is large enough. Therefore, we have proved the following asymptotic result.

Lemma 1. The variance of the estimator in (1) with µ = c approaches zero as c approaches infinity.

According to Bucklew [6], the importance sampling scheme based on

P^c = E_c{I(X > c) e^{c^2/2} e^{-cX}}

is efficient because of the asymptotic property P_2^c(c) ≈ (P^c)^2 for large c. Table 1 demonstrates the performance of two importance sampling schemes to estimate the default probability P(X > c) for X ~ N(0,1) with various loss thresholds c. In Column 2, the exact solutions N(-c) are listed. In each column of Monte Carlo simulation, including importance sampling, Mean and SE stand for the sample mean and the sample standard error, respectively. IS(µ = c) represents the scheme in (1) using the pre-determined choice µ = c suggested by the asymptotic analysis in Lemma 1, while IS(µ = µ*) represents the optimal scheme in (1) using µ = µ*, which minimizes P_2^c(µ) numerically. We observe that the standard errors obtained from these two importance sampling schemes are comparable, of the same order of accuracy, while the computing times are not: from the last row, the IS(µ = µ*) scheme takes about 50 times longer than the IS(µ = c) scheme. These numerical experiments are implemented in Matlab on a laptop PC with a 2.40 GHz Intel Duo CPU T8300.

Table 1: Estimation of the default probability P(X > c) with different loss thresholds c when X ~ N(0,1). The total number of simulations is 10,000.

c    | DP true  | Basic MC Mean | SE       | IS(µ = c) Mean | SE       | IS(µ = µ*) Mean | SE
1    | 0.1587   | 0.1566        | 0.0036   | 0.1592         | 0.0019   | 0.1594          | 0.0018
2    | 0.0228   | 0.022         | 0.0014   | 0.0227         | 3.49E-04 | 0.0225          | 3.37E-04
3    | 0.0013   | 1.00E-03      | 3.16E-04 | 0.0014         | 2.53E-05 | 0.0014          | 2.51E-05
4    | 3.17e-05 | -             | -        | 3.13E-05       | 6.62E-07 | 3.1E-05         | 6.66E-07
time |          | 0.004659      |          | 0.020904       |          | 1.06067         |

2.1 Laplace Method: Twisted or Tilted Probability

A general procedure to construct an importance sampling scheme for estimating P^c = E{I(X > c)} is the following. Assume that the density function of the real-valued random variable X is f(x) > 0 for each x ∈ R. One can change the measure by

E{I(X > c)} = E_µ{I(X > c) f(X)/f_µ(X)},

so that the density of X is f_µ(x) > 0 for each x ∈ R under the new probability measure P_µ. The twisted or tilted probability measure refers to the choice of f_µ(x) being equal to

f_µ(x) = exp(µx) f(x) / M(µ),

where M(µ) = E[exp(µX)] denotes the moment generating function of X. Substituting this choice of f_µ(x) into the second moment, one obtains

P_2^c(µ) := E_µ{I(X > c) f^2(X)/f_µ^2(X)} = E{I(X > c) f(X)/f_µ(X)} = M(µ) E{I(X > c) exp(-µX)} ≤ M(µ) exp(-µc),   (4)

where we assume that µ and c are positive numbers for this upper bound to hold. To minimize the logarithm of this upper bound, the first-order condition

d ln(M(µ) exp(-µc))/dµ = M'(µ)/M(µ) - c = 0

must be satisfied. If µ* solves M'(µ)/M(µ) = c, it follows that the expected value of X under the new probability measure P_µ* is exactly the loss threshold c.
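For a non-Gaussian illustration (not an example from the paper), the same tilting recipe applies to X ~ Exp(1): M(µ) = 1/(1-µ) for µ < 1, the condition M'(µ)/M(µ) = c gives µ* = 1 - 1/c, and the tilted density exp(µ*x) f(x)/M(µ*) is again exponential, now with mean c. A sketch assuming NumPy, with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate P(X > c) for X ~ Exp(1); the exact answer is exp(-c).
c = 20.0
mu = 1.0 - 1.0 / c                     # tilt solving M'(mu)/M(mu) = c
n = 100_000

# The tilted density is exponential with rate 1 - mu, i.e. mean c.
x = rng.exponential(scale=c, size=n)
lr = np.exp(-mu * x) / (1.0 - mu)      # f(x)/f_mu(x) = M(mu) exp(-mu x)
h = (x > c) * lr
est, se = h.mean(), h.std(ddof=1) / np.sqrt(n)
```

Under the tilted measure the mean of X is exactly the threshold c, so about e^{-1} of the samples land in {X > c} even though the target probability is of order 1e-9.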

This is confirmed by

E_µ*(X) = ∫ x f_µ*(x) dx = ∫ x exp(µ* x) f(x)/M(µ*) dx = M'(µ*)/M(µ*) = c.

From the simulation point of view, this is a favorable feature, because the rare event of default under the original probability measure is no longer rare under this new measure P_µ*. In the concrete case of X being a standard normal random variable, we deduce that

c = M'(µ)/M(µ) = µ exp(µ^2/2) / exp(µ^2/2) = µ.

That is, the minimizer µ*, which is a candidate shifted mean for X, is equal to c, so that the twisted or tilted density becomes

f_µ*(x) = exp(µ* x) f(x)/M(µ*) = exp(-(x - c)^2/2) / sqrt(2π).

This result coincides with the asymptotic choice µ = c studied in Lemma 1. There have been extensive studies and applications, see for example [6, 7], of this concept of minimizing an upper bound of the second moment under a parametrized twisted probability in order to construct an importance sampling scheme. In general, it remains to check whether such a scheme is efficient or not.

2.2 Asymptotic Analysis by Large Deviation Theory

In Lemma 1, the variance reduction of the importance sampling scheme is proved to be asymptotically optimal by an application of the tail approximation. This result can also be obtained from Cramér's theorem in large deviation theory [6, 9].

Theorem 1. (Cramér's theorem [6]) Let {X_i} be real-valued i.i.d. random variables under IP with IE|X| < ∞. For any x ≥ IEX, we have

lim_{n→∞} (1/n) ln IP(S_n/n ≥ x) = -inf_{y≥x} Γ*(y),   (5)

where S_n = Σ_{i=1}^n X_i denotes the sample sum of size n, Γ(θ) = ln IE{e^{θX}} denotes the cumulant function, and Γ*(x) = sup_{θ∈R} [θx - Γ(θ)].

From this theorem, we can deduce the following asymptotic result, which is similar to Lemma 1.

Corollary 2. Assume that X ~ N(0,1). When c → ∞, then P^c = E{I(X ≥ c)} ≈ exp(-c^2/2) and P_2^c(c) ≈ exp(-c^2).

Proof: From Theorem 1 and the moment generating function E{exp(θX)} = exp(θ^2/2), we obtain

lim_{n→∞} (1/n) ln P(Σ_{i=1}^n X_i / n ≥ x) = -x^2/2,

or equivalently P(Σ_{i=1}^n X_i / n ≥ x) ≈ exp(-n x^2/2). Note that the default probability P(X ≥ c) is equal to P(Σ_{i=1}^n X_i / sqrt(n) ≥ c), where each random variable X_i has the same distribution as X. Hence P(X ≥ c) ≈ exp(-c^2/2) for c := sqrt(n) x. With the choice µ = c, the asymptotic result P_2^c(c) ≈ exp(-c^2) can be derived similarly by an application of change of measure.

2.3 Risk Management: VaR and C-VaR Computation

Value at Risk (VaR in short) is a practical measurement to predict risk exposure in financial applications [20]. In statistics, VaR is simply a quantile of a given loss distribution. For example, given any confidence level α (0 < α < 1), estimating VaR amounts to inverting for the loss threshold c_α so that the default probability P(X > c_α) is equal to α. Hence, VaR estimation can be thought of as an inverse problem of default probability estimation. In [25] many numerical methods, such as nonlinear algebraic equation solvers, can be found to resolve this inverse part of the problem numerically.

Artzner et al. [1] provided criteria for a good risk measure to satisfy. They showed that VaR may fail to satisfy the diversification principle and proposed a generalization called the conditional value at risk (C-VaR), known as expected shortfall in risk management. C-VaR is defined as the conditional expectation E{X | X > c}, where c = VaR_α satisfies P(X ≥ VaR_α) = α. When X is standard normal, E{X | X > c} admits the closed-form solution e^{-c^2/2} / (sqrt(2π) N(-c)). The basic Monte Carlo algorithm to calculate E{X | X > c} is the following:

n_c = Σ_{i=1}^N I(X^{(i)} > c),   (N is the total number of simulations),
E{X | X > c} ≈ (1/n_c) Σ_{i=1}^{n_c} X^{(i)}, for each X^{(i)} > c.   (6)

We now derive a generic importance sampling scheme. By choosing a likelihood function Q = dP/dP̃, a new probability measure P̃ is defined; then one can obtain

E{X | X > c} = E{X I(X > c)} / P(X > c)
= Ẽ{X I(X > c) Q} / Ẽ{I(X > c) Q}
≈ [(1/N) Σ_{i=1}^N X^{(i)} I(X^{(i)} > c) Q(X^{(i)})] / [(1/N) Σ_{i=1}^N I(X^{(i)} > c) Q(X^{(i)})]
= Σ_{i=1}^{n_c} X^{(i)} Q(X^{(i)}) / Σ_{i=1}^{n_c} Q(X^{(i)})
= Σ_{i=1}^{n_c} X^{(i)} q_i, for each X^{(i)} > c,   (7)

where q_i = Q(X^{(i)}) / Σ_{j=1}^{n_c} Q(X^{(j)}). Note that under the importance sampling scheme, the C-VaR is approximated by a sum over the random samples X^{(i)}, i = 1, …, n_c, with a different weight q_i for each X^{(i)}. This differs from the basic Monte Carlo method (6), in which the weight associated with each sample is fixed at 1/n_c. We remark that the standard error in this non-equally weighted case is calculated as

s.e. ≈ sqrt( Σ_{i=1}^{n_c} (X^{(i)} - m̂)^2 q_i / n_c ), for each X^{(i)} > c,

where m̂ denotes the sample mean. In Table 2, we compare numerical results obtained from the basic Monte Carlo method, the exact solution, and the importance sampling scheme adopting µ = c in (1). The significance of this importance sampling scheme shows up particularly in the rare event regime, say c ≥ 4.

Table 2: Comparison of C-VaR by basic Monte Carlo (BMC), exact solution, and importance sampling IS(µ = c). The number of simulations is 10^6. Standard errors are shown in parentheses.

c | BMC            | Exact Sol | IS
3 | 3.285 (0.007)  | 3.283     | 3.286 (0.0047)
4 | 4.222 (0.0328) | 4.226     | 4.229 (0.0076)
5 | - (-)          | 5.187     | 5.178 (0.009)
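The weighted estimator (7) can be sketched as follows for the standard normal case with the shift µ = c, assuming NumPy; `cvar_is` is a hypothetical helper name and the sample size is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def cvar_is(c, n=200_000):
    """Self-normalized IS estimate of the C-VaR E[X | X > c], X ~ N(0,1):
    sample X ~ N(c,1), weight each tail sample by q_i as in (7)."""
    x = rng.normal(c, 1.0, n)
    q = np.exp(c**2 / 2 - c * x)     # likelihood ratio Q(X^(i))
    tail = x > c
    xs, qs = x[tail], q[tail]
    qn = qs / qs.sum()               # normalized weights q_i summing to one
    m_hat = np.sum(xs * qn)
    se = np.sqrt(np.sum((xs - m_hat) ** 2 * qn) / tail.sum())
    return m_hat, se

m_hat, se = cvar_is(5.0)   # exact value phi(5)/N(-5), about 5.1865
```

For c = 5 the basic Monte Carlo estimator typically has no tail samples at all at this sample size, while roughly half of the shifted samples exceed the threshold.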

3 Classical Models in Credit Risk: High Dimensions

Structural-form models and reduced-form models are considered the most popular models for characterizing default times in credit risk modeling [8, 10, 24]. Structural-form models describe the time to default as a default-trigger event, while reduced-form models treat default as an unexpected event whose likelihood follows a default-intensity process. As a high-dimensional extension of Section 2, here we consider the computation of the joint default probability under (1) the factor copula model in reduced form, and (2) the structural-form model.

3.1 Efficient Importance Sampling for Factor Copula Models in Reduced Form

Let F_i(t), 0 ≤ t ≤ T, denote the default time distribution of firm i. Under the Gaussian copula factor model [8], the default time of firm i, denoted by τ_i, is characterized by τ_i = F_i^{-1}(Φ(W_i)), where F_i is assumed to be the CDF of an exponential random variable with intensity λ, Φ is the CDF of the standard normal random variable, and

W_i = ρ_i Z_0 + sqrt(1 - ρ_i^2) Z_i

denotes the factor. The common factor Z_0 and the marginal factors Z_i are assumed standard normal, and the ρ_i's are assumed constant. Through the correlation between W_i and W_j, the default times τ_i and τ_j are correlated. This is a popular approach taken in copula factor methods. With the maturity T positive and finite, the default event is defined by

{τ_i = F_i^{-1}(Φ(W_i)) ≤ T} = {W_i ≤ Φ^{-1}(F_i(T))}.

When the joint default event of n firms (also called names) is considered, its probability is

E{Π_{i=1}^n I(τ_i ≤ T)} = E{Π_{i=1}^n I(W_i ≤ c_i := Φ^{-1}(F_i(T)))}.   (8)

This is a special case of the high-dimensional version of the toy model. Similarly, a Student-t copula factor model characterizes each default event by {S_i ≤ t_ν^{-1}(F_i(T))}, where S_i is a Student-t variable following the distribution t_ν and ν is the degrees of freedom. Next we focus on the estimation problem (8) as a high-dimensional version of the toy model.
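A basic Monte Carlo sketch of the estimation problem (8) under the one-factor Gaussian copula, assuming NumPy; the intensity, correlation, and portfolio size below are illustrative, not the paper's test cases.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)

# W_i = rho*Z0 + sqrt(1-rho^2)*Z_i; name i defaults before T iff
# W_i <= c_i := Phi^{-1}(F_i(T)), with F_i the Exp(lam) CDF.
n_names, rho, lam, T = 5, 0.5, 0.1, 1.0
c = NormalDist().inv_cdf(1.0 - np.exp(-lam * T))  # same threshold for all names

n_sim = 500_000
z0 = rng.normal(size=(n_sim, 1))                  # common factor
zi = rng.normal(size=(n_sim, n_names))            # idiosyncratic factors
w = rho * z0 + np.sqrt(1.0 - rho**2) * zi
p_hat = np.all(w <= c, axis=1).mean()             # joint default frequency
```

Even with 500,000 samples the joint-default hit count here is on the order of a few hundred, which is exactly the regime where an importance sampling scheme becomes valuable.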
A scale L is introduced to define the joint default probability by

P_L = E{I(X ≥ sqrt(L) C)},   (9)

where X is assumed to be a centered multivariate normal of dimension n, distributed as N(0, Σ), and C > 0 is the threshold vector. As in the toy model, let P_µ be the new probability measure so that X is distributed as N(µ, Σ) under P_µ and

P_L = E_µ{I(X ≥ sqrt(L) C) dP/dP_µ}.

The shifted mean µ is a vector of size n. The second moment of the weighted random variable is defined as

P_2^L(µ) = E_µ{(I(X ≥ sqrt(L) C) dP/dP_µ)^2}.   (10)

It is shown below that the optimal choice of µ is µ = sqrt(L) C in the asymptotic sense, so that the second moment is approximately the square of the probability of joint default.

Theorem 3. [8] Assume that the scale L and each element of the vector C ∈ R^n are positive, and X ~ N(0, Σ) is R^n-valued. We obtain the following asymptotic approximations:

lim_{L→∞} (1/L) log P_2^L(sqrt(L) C) = 2 lim_{L→∞} (1/L) log P_L = -C' Σ^{-1} C.

This implies that the importance sampling scheme

P_L = E_{sqrt(L) C}{I(X ≥ sqrt(L) C) exp(-sqrt(L) C' Σ^{-1} X + (L/2) C' Σ^{-1} C)},

where X ~ N(sqrt(L) C, Σ), is efficient for estimating the probability of joint default.

Remark: This theorem treats the multivariate normal case, which is more general than the factor Gaussian model considered above. The proof of this theorem can be found in [8], using an application of the Gärtner-Ellis Theorem [9].

Next we conduct numerical experiments to estimate the joint default probability (8) in the homogeneous case. Parameters are chosen, for i = 1, …, n, as ρ = ρ_i = 0.5 and c = c_i = 2 for each entry in C. We compare the basic Monte Carlo method, our importance sampling scheme, and the Matlab routine mvncdf.m, which is based on a quasi Monte Carlo method developed in [4]. In Table 3, we vary the number of firms n from 5 to 25. (The maximal number 25 is set in the Matlab code, while our importance sampling scheme can surely go beyond this number.) Numerics (mean and standard error) generated from the efficient importance sampling scheme are roughly of the same order of accuracy as those from the Matlab code. (Matlab performs better before n = 16, while importance sampling performs better for most of the rest.)
Moreover, this importance sampling scheme can be applied to many applications, such as the probability of k-th to default, i.e., E{I(τ_(k) ≤ T)}, where τ_(k) is the k-th order statistic of τ_1, …, τ_n. For the case of the factor Student-t model, we use a conditional importance sampling method. These applications can be found in [8].
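The mean-shift scheme of Theorem 3 can be sketched for a small equicorrelated example, assuming NumPy; the dimension, Σ, L, and C are illustrative. The likelihood ratio is the ratio of the two multivariate normal densities, exp(-µ' Σ^{-1} x + µ' Σ^{-1} µ / 2) with µ = sqrt(L) C.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 3
Sigma = 0.5 * np.eye(n) + 0.5 * np.ones((n, n))   # equicorrelation 0.5
L = 4.0
C = np.ones(n)
mu = np.sqrt(L) * C                               # shifted mean sqrt(L)*C
Sinv = np.linalg.inv(Sigma)
A = np.linalg.cholesky(Sigma)

n_sim = 200_000
x = mu + rng.normal(size=(n_sim, n)) @ A.T        # X ~ N(mu, Sigma)
lr = np.exp(-(x @ (Sinv @ mu)) + 0.5 * mu @ Sinv @ mu)
h = np.all(x >= mu, axis=1) * lr                  # joint event {X >= sqrt(L) C}
p_hat, se = h.mean(), h.std(ddof=1) / np.sqrt(n_sim)
```

On the joint exceedance set the likelihood ratio is bounded by exp(-µ' Σ^{-1} µ / 2), which is what keeps the second moment, and hence the standard error, small.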

Table 3: Estimation of the joint default probability with different numbers of firms under the Gaussian copula factor model. (c = 2, ρ = 0.5, number of simulations is 25,000.)

n  | Basic MC Mean | SE       | IS Mean  | IS SE    | Quasi MC Mean | Error
5  | 4.00E-05      | 4.00E-05 | 1.35E-05 | 5.46E-07 | 1.40E-05      | 1.3E-07
6  | 0.00E+00      | 0.00E+00 | 4.74E-06 | 2.69E-07 | 4.77E-06      | 1.4E-07
7  | 0.00E+00      | 0.00E+00 | 1.78E-06 | 1.03E-07 | 1.85E-06      | 3.93E-08
8  | 0.00E+00      | 0.00E+00 | 7.50E-07 | 5.57E-08 | 8.2E-07       | 2.39E-08
9  | 0.00E+00      | 0.00E+00 | 3.96E-07 | 2.94E-08 | 3.8E-07       | 3.49E-08
10 | 0.00E+00      | 0.00E+00 | 2.3E-07  | 1.7E-08  | 2.0E-07       | 1.62E-08
11 | 0.00E+00      | 0.00E+00 | 1.02E-07 | 1.0E-08  | 1.07E-07      | 8.07E-09
12 | 0.00E+00      | 0.00E+00 | 8.55E-08 | 9.43E-09 | 6.30E-08      | 4.77E-09
13 | 0.00E+00      | 0.00E+00 | 3.43E-08 | 4.36E-09 | 3.65E-08      | 2.83E-09
14 | 0.00E+00      | 0.00E+00 | 2.09E-08 | 2.32E-09 | 2.24E-08      | 1.7E-09
15 | 0.00E+00      | 0.00E+00 | 1.67E-08 | 2.46E-09 | 1.52E-08      | 2.30E-09
16 | 0.00E+00      | 0.00E+00 | 8.83E-09 | 1.73E-09 | 9.77E-09      | 1.73E-09
17 | 0.00E+00      | 0.00E+00 | 6.7E-09  | 8.85E-10 | 7.6E-09       | 2.59E-09
18 | 0.00E+00      | 0.00E+00 | 4.4E-09  | 5.78E-10 | 4.6E-09       | 1.08E-09
19 | 0.00E+00      | 0.00E+00 | 2.94E-09 | 5.34E-10 | 3.80E-09      | 2.5E-09
20 | 0.00E+00      | 0.00E+00 | 2.54E-09 | 4.02E-10 | 2.56E-09      | 7.42E-10
21 | 0.00E+00      | 0.00E+00 | 1.47E-09 | 3.62E-10 | 1.64E-09      | 3.34E-10
22 | 0.00E+00      | 0.00E+00 | 1.45E-09 | 2.89E-10 | 1.35E-09      | 3.99E-10
23 | 0.00E+00      | 0.00E+00 | 1.28E-09 | 2.8E-10  | 1.07E-09      | 4.47E-10
24 | 0.00E+00      | 0.00E+00 | 1.05E-09 | 1.89E-10 | 6.23E-10      | 1.67E-10
25 | 0.00E+00      | 0.00E+00 | 4.42E-10 | 1.0E-10  | 7.26E-10      | 3.44E-10

3.2 First Passage Time Problem in Structural-Form Models

Estimation of the joint default probability under structural-form models emerged early in the development of stochastic financial theory. In the models of Black and Scholes [2] and Merton [26], default can happen only at the maturity date of the debt, if the issuer's asset value is less than the debt value. Black and Scholes modeled the asset value process by a geometric Brownian motion, while Merton incorporated an additional compound Poisson jump term into the Black-Scholes model. Black and Cox [3] generalized these models by allowing default to occur at any time before the maturity date of the debt. They considered a first passage time problem for geometric Brownian motion in one dimension. Zhou [28] extended the one-dimensional geometric Brownian motion to a jump-diffusion model as Merton did, and Zhou [29] considered the joint default of two-dimensional geometric Brownian motions. A comprehensive technical review can be found in [4].

In this section, we focus on generalizing the joint default from a two-dimensional first passage time problem to any finite dimension. To estimate the probability of joint default, we develop an importance sampling scheme which can be proved to be efficient. Then we apply this methodology to models such as jump-diffusion models and stochastic correlation/volatility models, and to problems such as VaR estimation and loss density estimation.

A high-dimensional setup of the first passage time problem under correlated geometric Brownian motions is the following. We assume that each firm value process S_it, 1 ≤ i ≤ n, has the dynamics

dS_it = µ_i S_it dt + σ_i S_it dW_it,   (11)

where each σ_i is a constant volatility and the Brownian motions W_i are correlated by d⟨W_i, W_j⟩_t = ρ_ij dt. Each firm also has a constant barrier B_i, 1 ≤ i ≤ n, and its default happens at the first time the asset value S_it falls below the barrier level.
Therefore, the default time τ_i of the i-th firm is defined as

τ_i = inf{t ≥ 0 : S_it ≤ B_i}.   (12)

Let the filtration (F_t)_{t≥0} be generated by all S_it, i = 1, …, n, under a probability measure IP. At time 0, the joint default probability with terminal time T is defined by

DP = IE{Π_{i=1}^n I(τ_i ≤ T) | F_0}.   (13)
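A basic Monte Carlo sketch for (13), discretizing the correlated GBM dynamics (11) on a time grid, assuming NumPy; the two-name parameter set is illustrative, and discrete monitoring slightly underestimates continuous barrier crossing.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two correlated GBMs; name i defaults when S_it first falls below B_i.
n_names, T, n_steps, n_sim = 2, 1.0, 250, 20_000
mu = np.array([0.05, 0.05])
sigma = np.array([0.30, 0.30])
S0 = np.array([100.0, 100.0])
B = np.array([70.0, 70.0])
rho = 0.5
A = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
dt = T / n_steps

logS = np.log(S0) * np.ones((n_sim, n_names))
defaulted = np.zeros((n_sim, n_names), dtype=bool)
for _ in range(n_steps):
    dW = rng.normal(size=(n_sim, n_names)) @ A.T * np.sqrt(dt)
    logS += (mu - 0.5 * sigma**2) * dt + sigma * dW
    defaulted |= logS <= np.log(B)          # barrier hit at some grid time
dp_hat = np.all(defaulted, axis=1).mean()   # estimator of form (15)
```

With barriers this close to the initial asset values the joint default is not yet rare; pushing B down or the horizon T in makes the event rare and motivates the measure change of Section 3.2.1.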

In general there is no closed-form solution for the probability of joint default, so one has to rely on numerical methods for estimation. Deterministic methods such as numerical PDEs or binomial trees typically suffer from the curse of dimensionality, so Monte Carlo simulation becomes the feasible way to estimate (13). It is then necessary to develop an importance sampling scheme, particularly in the presence of rare event simulation.

We remark that a high-dimensional generalization of the Black-Scholes model [2] is the same as the factor copula model. Because default can only occur at the maturity T of the debt, the default event is characterized by

    {τ_i = T} = {S_iT ≤ B_i} = { W_iT ≤ c_i := [log(B_i/S_i0) − (μ_i − σ_i²/2)T] / σ_i },    (14)

where W_iT is a normal random variable with mean zero and variance T, and the correlation coefficient between W_iT and W_jT is ρ_ij. Hence the joint default event {Π_{i=1}^n I(W_iT ≤ c_i)} is of the form in (8). If the issued debt B_i is much smaller than the initial asset value S_i0, as for a highly ranked company, then c_i is negatively large, which results in a rare event simulation problem.

3.2.1 Efficient Importance Sampling Scheme

In this section we review an importance sampling scheme developed by Han and Vestal [9] to improve the convergence of Monte Carlo simulation. In addition, we provide a variance analysis to justify that the importance sampling scheme is asymptotically optimal (or efficient) in one dimension. The basic Monte Carlo simulation approximates the joint default probability (13) by the estimator

    DP ≈ (1/N) Σ_{k=1}^N Π_{i=1}^n I(τ_i^{(k)} ≤ T),    (15)

where τ_i^{(k)} denotes the k-th i.i.d. sample of the i-th default time and N denotes the number of simulations. By Girsanov's Theorem, one can construct an equivalent probability measure ĨP defined by the Radon-Nikodym derivative

    dIP/dĨP = Q_T(h) = exp( ∫_0^T h(s, S_s) · dW̃_s − (1/2) ∫_0^T |h(s, S_s)|² ds ),    (16)
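The maturity-only default event (14) can be checked numerically. Below is a hedged sketch (our own function names) that computes the thresholds c_i and estimates the joint maturity-default probability by sampling correlated Gaussians:

```python
import numpy as np

def default_thresholds(S0, mu, sigma, B, T=1.0):
    """Thresholds c_i of (14): name i defaults at maturity iff W_iT <= c_i,
    where W_iT ~ N(0, T). Sketch with our own argument names."""
    return (np.log(B / S0) - (mu - 0.5 * sigma**2) * T) / sigma

def maturity_joint_default_mc(c, corr, T=1.0, n_paths=100_000, seed=1):
    """Plain Monte Carlo for {W_iT <= c_i for all i}, corr(W_iT, W_jT) = rho_ij."""
    rng = np.random.default_rng(seed)
    C = np.linalg.cholesky(np.asarray(corr))
    W = rng.standard_normal((n_paths, len(c))) @ C.T * np.sqrt(T)
    return (W <= c).all(axis=1).mean()
```

For a highly ranked single name (B far below S_i0) the threshold c_i is strongly negative and the hit frequency collapses, which is exactly the rare-event difficulty described above.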

where S_s = (S_1s, ..., S_ns) denotes the state (asset value) vector and W_s = (W_1s, ..., W_ns) the vector of Brownian motions. The function h(s, S_s) is assumed to satisfy Novikov's condition, so that W̃_t = W_t + ∫_0^t h(s, S_s) ds is a vector of Brownian motions under ĨP. The importance sampling scheme proposed in [9] selects a constant vector h = (h_1, ..., h_n) satisfying the n conditions

    ĨE{S_iT | F_0} = B_i,  i = 1, ..., n.    (17)

In fact, (17) can be simplified by using the explicit log-normal density of S_iT, and we deduce the following system of linear equations for the h_i's:

    Σ_{j=1}^n ρ_ij h_j = μ_i/σ_i − ln(B_i/S_i0)/(σ_i T).    (18)

If the covariance matrix Σ = (ρ_ij)_{1≤i,j≤n} is non-singular, the vector h exists uniquely, so the equivalent probability measure ĨP is uniquely determined. The joint default probability defined in (13) becomes

    DP = ĨE{ Π_{i=1}^n I(τ_i ≤ T) Q_T(h) | F_0 }.    (19)

Equation (17) requires that, under the new probability measure ĨP, the expectation of each asset's value at time T equals its debt level. When the debt level B of a company is much smaller than its initial asset value S_0 (see examples in Table 4), or the returns of two names are highly negatively correlated (see examples in Table 5), joint default events are rare. Under the proposed importance sampling scheme, random samples drawn under the new measure ĨP cause more defaults than under IP. Tables 4 and 5 illustrate numerical results for estimating the (joint) default probabilities for a single name and for three names. The exact solution of the single-name default probability is

    N(−d₂⁺) + (S_0/B)^{1 − 2μ/σ²} N(−d₂⁻),    (20)

with d₂^± = [ln(S_0/B) ± (μ − σ²/2)T] / (σ√T). This result can be obtained from the distribution of the running minimum of Brownian motion. However, there is no closed-form solution for the joint default probability of the three names considered in Table 5, except in the case of zero correlation.
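Both the linear system (18) and the closed-form single-name probability (20) are a few lines of NumPy. The sketch below uses our own naming and checks that the resulting shift h indeed pushes the tilted mean of S_T to the barrier:

```python
import numpy as np
from math import erf, exp, log, sqrt

def girsanov_shift(S0, mu, sigma, B, corr, T=1.0):
    """Solve (18): sum_j rho_ij h_j = mu_i/sigma_i - ln(B_i/S_i0)/(sigma_i*T)."""
    S0, mu, sigma, B = map(np.asarray, (S0, mu, sigma, B))
    rhs = mu / sigma - np.log(B / S0) / (sigma * T)
    return np.linalg.solve(np.asarray(corr), rhs)

def single_name_default_prob(S0, B, mu, sigma, T=1.0):
    """Closed form (20): first-passage probability of a GBM below B on [0, T]."""
    N = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    nu = mu - 0.5 * sigma**2
    d_plus = (log(S0 / B) + nu * T) / (sigma * sqrt(T))
    d_minus = (log(S0 / B) - nu * T) / (sigma * sqrt(T))
    return N(-d_plus) + (S0 / B) ** (1.0 - 2.0 * mu / sigma**2) * N(-d_minus)
```

For the single-name setting of Table 4 (S_0 = 100, μ = 0.05, σ = 0.4, B = 50, T = 1) the closed form returns approximately 0.0945, matching the exact column, and for n = 1 the shift reduces to h = μ/σ − ln(B/S_0)/(σT) with ĨE{S_T} = S_0 e^{(μ−σh)T} = B.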

Table 4: Comparison of the single-name default probability by basic Monte Carlo (BMC), the exact solution, and importance sampling (IS). The number of simulations is 10⁴ and an Euler discretization of (11) is used with time step size T/400, where T is one year. Other parameters are S_0 = 100, μ = 0.05 and σ = 0.4. Standard errors are shown in parentheses.

    B    BMC               Exact Sol      IS
    50   0.0886 (0.0028)   0.0945         0.0890 (0.0016)
    20   0 (0)             7.730×10⁻⁵     7.598×10⁻⁵ (2.383×10⁻⁶)
    10   0 (0)             1.334×10⁻³⁰    1.820×10⁻³⁰ (3.444×10⁻³¹)

Table 5: Comparison of the three-name joint default probability by basic Monte Carlo (BMC) and importance sampling (IS). The number of simulations is 10⁴ and an Euler discretization of (11) is used with time step size T/100, where T is one year. Other parameters are S_10 = S_20 = S_30 = 100, μ_1 = μ_2 = μ_3 = 0.05, σ_1 = σ_2 = 0.4, σ_3 = 0.3 and B_1 = B_2 = 50, B_3 = 60. Standard errors are shown in parentheses.

    ρ      BMC                          IS
    0.3    0.0049 (6.9832×10⁻⁴)         0.0057 (1.9534×10⁻⁴)
    0      3.0000×10⁻⁴ (1.739×10⁻⁴)     6.4052×10⁻⁴ (6.9935×10⁻⁵)
    -0.3   0 (0)                        2.2485×10⁻⁵ (1.259×10⁻⁵)

3.2.2 Comment on Merton's Model

Merton [26] extended the Black-Scholes model [2] defined in (14) by adding a compound Poisson jump term to the underlying geometric Brownian motion, so that the firm value process is governed by the jump-diffusion model

    dS_t / S_t = μ dt + σ dW_t + d( Σ_{j=1}^{N_t} (Y^{(j)} − 1) ),

where μ denotes the drift rate, σ the volatility, N_t the Poisson process with intensity λ, and the logarithm of the firm's jump size Y is normally distributed with mean a and variance b². It is known [7] that, conditional on N_T = n, the distribution of S_T is

    ln S_T ~ N( log S_0 + (μ − σ²/2)T + an,  σ²T + b²n ).    (21)

The probability of default at maturity under this jump-diffusion model admits a closed-form solution [7]:

    P_JD = IE{ I( log(S_T/S_0) < B ) } = IE{ IE{ I( S_T < D := S_0 exp(B) ) | N_T = n } }
         = Σ_{n=0}^∞ [ e^{−λT} (λT)^n / n! ] N(−d₂^{(n)}),    (22)

where

    d₂^{(n)} = [ ln(S_0/D) + (μ_n − σ_n²/2)T ] / (σ_n √T),
    μ_n = μ + (n/T) log(1 + m),  σ_n = √(σ² + n b²/T),  1 + m = exp(a + b²/2).

Wang, Lin, and Fuh [27] studied the estimation of the default probability under this model. They proposed a Laplace method, similar to the derivation in Section 2.1, which changes the measure for both the jump and diffusion parts by solving a nonlinear equation. In contrast, our importance sampling scheme is simple: the measure change applies only to the diffusion part and is determined a priori, so no extra computation is needed. The numerical example studied in [27] is used as a benchmark for comparison with our importance sampling scheme. The comparison is illustrated in Table 6, in which the first four columns are taken from [27]. Our results, listed in Column 5, perform the best in all cases.

3.2.3 Asymptotic Analysis

Next we provide two theoretical verifications showing that the importance sampling scheme developed above is asymptotically optimal (or efficient, in Monte Carlo terms) for the one-dimensional first passage time problem under geometric Brownian motion. We present (1) a direct calculation and (2) an application of the Freidlin-Wentzell theorem [5, 9] from large deviation theory, approximating the default probability and the second moment of the importance sampling estimator in (19) in the regime where the scale ε := log(B/S_0) is large negative. We obtain that the second moment approximation is the square of the first moment (the default probability) approximation. Therefore we attain optimality of the variance reduction in the asymptotic sense, so the importance sampling scheme is efficient.
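Returning to the Merton closed form (22) above: the Poisson mixture can be evaluated by truncating the series. The sketch below (our own naming) uses the algebraically equivalent parametrization d₂^{(n)} = [ln(S_0/D) + (μ − σ²/2)T + an] / √(σ²T + b²n), obtained by substituting μ_n and σ_n from (22):

```python
import math

def merton_default_prob(S0, D, mu, sigma, lam, a, b2, T, n_max=60):
    """Maturity default probability (22) under Merton's jump-diffusion model:
    a Poisson-weighted mixture of Gaussian tails, truncated at n_max jumps."""
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    total, log_fact = 0.0, 0.0
    for n in range(n_max + 1):
        if n > 0:
            log_fact += math.log(n)          # running log(n!)
        if lam > 0:
            weight = math.exp(-lam * T + n * math.log(lam * T) - log_fact)
        else:
            weight = 1.0 if n == 0 else 0.0  # no jumps: pure Black-Scholes term
        var_n = sigma**2 * T + b2 * n        # conditional variance of ln S_T
        d2 = (math.log(S0 / D) + (mu - 0.5 * sigma**2) * T + a * n) / math.sqrt(var_n)
        total += weight * N(-d2)
    return total
```

With λ = 0 the series collapses to the diffusion-only term, and adding jumps (λ > 0) fattens the tail and raises the default probability, consistent with the comparison in Table 6.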

Table 6: Under Merton's jump-diffusion model, we compare the closed-form solution (22) for the single-name default probability with estimates from basic Monte Carlo, the importance sampling proposed in [27] (IS-JD) and our importance sampling (IS-D), for various default levels B. Standard errors are reported in parentheses. The first four columns are taken from [27]. Model parameters are μ = 0.06, σ = 0.2, λ = 1, a = 0, b² = 0.02, T = 1/252. The number of replications in the Monte Carlo methods is 10,000 and the number of time discretization steps in our importance sampling scheme is 100.

    B        P_JD True   P_JD Basic MC      P_JD IS-JD         P_JD IS-D
    0.02     0.05        0.0499 (0.0069)    0.050 (0.0024)     0.0482 (7.89×10⁻⁴)
    0.0298   0.01        0.01 (3.1×10⁻³)    0.01 (6.8×10⁻⁴)    0.0094 (1.7084×10⁻⁴)
    0.04     0.001       0.0010 (9.9×10⁻⁴)  0.001 (1×10⁻⁴)     0.00095 (8.38×10⁻⁵)

Theorem 4. Let S_t denote the asset value following the log-normal process dS_t = μS_t dt + σS_t dW_t with initial value S_0, and let B denote the default boundary. Let ε = log(B/S_0). We define the default probability and its importance sampling estimator by

    P^ε = IE{ I( min_{0≤t≤T} S_t ≤ B ) } = ĨE{ I( min_{0≤t≤T} S_t ≤ B ) Q_T(h) },

where the Radon-Nikodym derivative Q_T(h) is defined as in (16), so that the drifted Brownian motion W̃_t = W_t + ht is a standard Brownian motion under the new measure ĨP. The second moment of this estimator is denoted by

    P₂^ε(h) = ĨE{ I( min_{0≤t≤T} S_t ≤ B ) Q²_T(h) }.

With the choice h = μ/σ − ε/(σT), the expected value of S_T under ĨP is B; that is, ĨE{S_T} = B. When ε is small enough, or equivalently B ≪ S_0, we obtain

    P₂^ε(h) ≈ (P^ε)².

This implies that the importance sampling scheme is efficient.
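A hedged sketch of the one-dimensional scheme of Theorem 4: simulate the asset under the tilted measure and reweight each path by the likelihood ratio Q_T(h) = exp(hW̃_T − h²T/2) for the constant shift h = μ/σ − ε/(σT). Function names and the discretely monitored barrier are our own choices:

```python
import numpy as np

def efficient_is_default_prob(S0, B, mu, sigma, T=1.0,
                              n_steps=400, n_paths=10_000, seed=4):
    """Efficient importance sampling for P(min_{0<=t<=T} S_t <= B) in 1D.
    Under the tilted measure the log-asset drifts toward the barrier."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    eps = np.log(B / S0)
    h = mu / sigma - eps / (sigma * T)        # tilts E[S_T] to the barrier B
    logS = np.full(n_paths, np.log(S0))
    W_tilde = np.zeros(n_paths)
    hit = np.zeros(n_paths, dtype=bool)
    for _ in range(n_steps):
        dW = rng.standard_normal(n_paths) * np.sqrt(dt)
        W_tilde += dW
        # dynamics under the new measure: drift mu - sigma*h for the asset
        logS += (mu - sigma * h - 0.5 * sigma**2) * dt + sigma * dW
        hit |= logS <= np.log(B)
    Q = np.exp(h * W_tilde - 0.5 * h**2 * T)  # likelihood ratio dIP/dIP~
    est = hit * Q
    return est.mean(), est.std(ddof=1) / np.sqrt(n_paths)
```

With the Table 4 parameters (S_0 = 100, B = 50, μ = 0.05, σ = 0.4) this reproduces an estimate near the exact 0.0945 (up to the discretization bias of the monitored barrier), with a much smaller standard error than basic Monte Carlo.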

Table 7: Comparison of the closed-form solution for the single-name default probability with estimates from basic Monte Carlo, the optimal importance sampling and the efficient importance sampling, for various default levels B. Standard errors are reported in parentheses. Model parameters are S_0 = 100, μ = 0.1, σ = 0.3, T = 1. The number of replications in the Monte Carlo methods is 10,000 and the number of time discretization steps is 100.

    B    P^ε True     P^ε Basic MC          (α, β)           P^ε(h(α,β)) Optimal IS     P^ε(h(B,T)) Efficient IS
    40   0.0013       0.00098 (3.13×10⁻⁴)   (40.06, 1.076)   0.0011 (8.84×10⁻⁵)         0.0011 (1.96×10⁻⁵)
    30   2.83×10⁻⁵    3×10⁻⁵ (1.73×10⁻⁵)    (28.24, 1.06)    2.9×10⁻⁵ (2.3×10⁻⁸)        2.3×10⁻⁵ (4.83×10⁻⁸)
    20   2.98×10⁻⁸    0 (0)                 (20.23, 0.98)    2.2×10⁻⁸ (2.54×10⁻¹⁰)      2.7×10⁻⁸ (2.36×10⁻¹⁰)
    10   3.92×10⁻¹⁵   0 (0)                 (10.01, 1.02)    2.64×10⁻¹⁵ (3.45×10⁻¹⁷)    2.66×10⁻¹⁵ (5.2×10⁻¹⁷)

Because the proof is lengthy, it is given in the Appendix. We also remark that the same result and its extension to higher dimensions can be derived by large deviation theory; this is current research in preparation.

3.2.4 Optimal Importance Sampling Versus Efficient Importance Sampling

From the formula for the second moment Γ(α, β) in (33), one can determine the optimal (α, β) by minimizing Γ(α, β) numerically, and then use the corresponding importance sampling scheme (32) to estimate the single-name default probability. In Table 7 we compare the performance of this optimal importance sampling scheme with the efficient importance sampling scheme; the comparison is similar to Table 1 in the toy model section. The optimal importance sampling scheme takes at least 50 seconds in Matlab, while the efficient importance sampling finishes within a second: solving for (α, β) by a two-dimensional optimization in Matlab can be time consuming. Note that the optimal pairs are close to (B, T), the pair chosen by the efficient importance sampling scheme. The examples in this table demonstrate that the efficient importance sampling performs well in both computing time and accuracy.

4 Stochastic Correlation Model: Two-Dimensional Case

Hull, Predescu and White (2005) examined the effect of random correlation and incorporated a stochastic correlation model in structural form. We assume that the correlation process ρ_t = ρ(Y_t) is driven by a mean-reverting process such as an Ornstein-Uhlenbeck process. A two-name system with this stochastic correlation is given by

    dS_t^1 = r S_t^1 dt + σ_1 S_t^1 dW_t^1
    dS_t^2 = r S_t^2 dt + σ_2 S_t^2 ( ρ(Y_t) dW_t^1 + √(1 − ρ²(Y_t)) dW_t^2 )
    dY_t = (1/ε)(m − Y_t) dt + (ν√2/√ε) dZ_t,

where the correlation function ρ(·) is assumed smooth and takes values in [−1, 1]. The joint default probability is defined by

    P^ε(t, x_1, x_2, y) := IE{ Π_{i=1}^2 I( min_{t≤u≤T} S_u^i ≤ B_i ) | S_t^1 = x_1, S_t^2 = x_2, Y_t = y },    (23)

provided there is no default before time t. Note that under this stochastic correlation model the importance sampling scheme proposed previously cannot be applied directly: in the standard structural-form model all parameters, including the correlation coefficients, are constant, while here they change randomly over time. To overcome this difficulty, one can first approximate the default probability under the stochastic correlation model by another default probability in which the effective correlation is a constant, and then use this approximation to construct an importance sampling scheme. This approach has been applied to option pricing under stochastic volatility models; see for example [11, 12], in which a martingale control variate method under an effective volatility is used to improve the convergence of Monte Carlo and quasi-Monte Carlo simulation, respectively.

Next we apply a singular perturbation technique to derive the asymptotic expansion of the two-name default probability under the stochastic correlation model. The leading order term is a default probability with the effective correlation, which is constant, so the problem reduces to the standard structural-form model.
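Before homogenizing, one can always estimate (23) by brute force. A hedged Euler sketch of the two-name system with OU-driven correlation ρ(y) = sin(y); the values of r and σ_i passed in the usage below are illustrative, not taken from the text:

```python
import numpy as np

def joint_default_stochcorr_mc(S0, r, sigma, B, y0, m, nu, eps, T=1.0,
                               n_steps=400, n_paths=5_000, seed=2):
    """Basic Monte Carlo for the two-name joint default probability (23)
    under stochastic correlation rho(Y_t) = sin(Y_t). Illustrative sketch."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    alpha = 1.0 / eps                      # mean-reversion rate of Y
    logS = np.full((n_paths, 2), np.log(S0))
    Y = np.full(n_paths, y0)
    defaulted = np.zeros((n_paths, 2), dtype=bool)
    for _ in range(n_steps):
        dW1, dW2, dZ = rng.standard_normal((3, n_paths)) * np.sqrt(dt)
        rho = np.sin(Y)
        logS[:, 0] += (r - 0.5 * sigma[0]**2) * dt + sigma[0] * dW1
        logS[:, 1] += (r - 0.5 * sigma[1]**2) * dt \
                      + sigma[1] * (rho * dW1 + np.sqrt(1.0 - rho**2) * dW2)
        Y += alpha * (m - Y) * dt + nu * np.sqrt(2.0 * alpha) * dZ
        defaulted |= logS <= np.log(np.asarray(B))
    hits = defaulted.all(axis=1)
    return hits.mean(), hits.std(ddof=1) / np.sqrt(n_paths)
```

As Tables 8 and 9 below show, this plain estimator is only usable when the joint default is not too rare, which motivates combining the homogenized approximation with importance sampling.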
We are able to combine this homogenized approximation with the importance sampling scheme developed above, and then estimate the joint default probability by simulation.

4.1 Formal Expansion of the Perturbed Joint Default Probability

By an application of the Feynman-Kac formula, P^ε(t, x_1, x_2, y) solves the three-dimensional partial differential equation (PDE)

    ( (1/ε) L_0 + L_1 ) P^ε(t, x_1, x_2, y) = 0,    (24)

where the partial differential operators are

    L_0 = ν² ∂²/∂y² + (m − y) ∂/∂y
    L_1(ρ(y)) = L_{1,0} + ρ(y) L_{1,1}
    L_{1,0} = ∂/∂t + Σ_{i=1}^2 [ (σ_i² x_i²/2) ∂²/∂x_i² + μ_i x_i ∂/∂x_i ]
    L_{1,1} = σ_1 σ_2 x_1 x_2 ∂²/(∂x_1 ∂x_2).

The terminal condition is P^ε(T, x_1, x_2, y) = I_{x_1 ≤ B_1} I_{x_2 ≤ B_2} and the two boundary conditions are P^ε(t, B_1, x_2, y) = P^ε(t, x_1, B_2, y) = 0. Suppose that the perturbed joint default probability admits the expansion

    P^ε(t, x_1, x_2, y) = Σ_{i=0}^∞ ε^i P_i(t, x_1, x_2, y).

Substituting this into (24), we obtain

    0 = ( (1/ε) L_0 + L_1 )( P_0 + ε P_1 + ε² P_2 + ... )
      = (1/ε) L_0 P_0 + (L_0 P_1 + L_1 P_0) + ε (L_0 P_2 + L_1 P_1) + ε² (L_0 P_3 + L_1 P_2) + ....

By equating each term order by order in ε to zero, a sequence of PDEs must be solved. At order 1/ε, L_0 P_0(t, x_1, x_2, y) = 0, and one can choose P_0 independent of the variable y. At order 1, (L_0 P_1 + L_1 P_0)(t, x_1, x_2, y) = 0. This is a Poisson equation; as L_0 is the generator of the ergodic process Y_t, the centering condition yields ⟨L_1⟩ P_0 = 0, where ⟨·⟩ denotes averaging with respect to the invariant measure of the ergodic process Y. Thus the leading order term P_0 solves the homogenized PDE:

    ( L_{1,0} + ρ̄ L_{1,1} ) P_0(t, x_1, x_2) = 0,
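The first two steps of the hierarchy can be recorded compactly (our own LaTeX transcription, with the operators defined in (24)):

```latex
\begin{align*}
O(\varepsilon^{-1}):\quad & \mathcal{L}_0 P_0 = 0
  &&\Longrightarrow\quad P_0 = P_0(t,x_1,x_2)\ \text{(no $y$-dependence)},\\
O(1):\quad & \mathcal{L}_0 P_1 + \mathcal{L}_1 P_0 = 0
  &&\Longrightarrow\quad \langle \mathcal{L}_1 \rangle P_0
     = \bigl(\mathcal{L}_{1,0} + \bar{\rho}\,\mathcal{L}_{1,1}\bigr) P_0 = 0 .
\end{align*}
```

The implication at order 1 is the solvability (centering) condition for the Poisson equation in y, and it is exactly what replaces the random correlation ρ(y) by its invariant-measure average ρ̄.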

where

    ρ̄ = ⟨ρ(y)⟩_OU = (1/√(2πν²)) ∫ ρ(y) e^{−(y−m)²/(2ν²)} dy,    (25)

with terminal condition P_0(T, x_1, x_2) = I_{x_1 ≤ B_1} I_{x_2 ≤ B_2} and boundary conditions P_0(t, B_1, x_2) = P_0(t, x_1, B_2) = 0. A closed-form solution for P_0(t, x_1, x_2) exists, with a formulation similar to (27). Combining L_0 P_1 + L_1 P_0 = 0 with ⟨L_1⟩ P_0 = 0, we obtain L_0 P_1 = −(L_1 − ⟨L_1⟩) P_0, so that

    P_1(t, x_1, x_2, y) = −L_0^{−1} (L_1 − ⟨L_1⟩) P_0(t, x_1, x_2)
                        = −L_0^{−1} (ρ(y) − ρ̄) L_{1,1} P_0(t, x_1, x_2)
                        = −φ(y) σ_1 σ_2 x_1 x_2 ∂²P_0/(∂x_1 ∂x_2)(t, x_1, x_2),

where φ(y) solves the Poisson equation L_0 φ(y) = ρ(y) − ρ̄. A similar argument goes through for the subsequent expansion terms; we skip the lengthy derivation and simply summarize the successive terms as

    P_{n+1}(t, x_1, x_2, y) = Σ_{i+j=n+1, i≥0, j≥1} φ^{(n+1)}_{i,j}(y) L_{1,0}^i L_{1,1}^j P_n,

where a sequence of Poisson equations must be solved:

    L_0 φ^{(n+1)}_{i+1,j}(y) = φ^{(n)}_{i,j}(y) − ⟨φ^{(n)}_{i,j}(y)⟩
    L_0 φ^{(n+1)}_{i,j+1}(y) = ρ(y) φ^{(n)}_{i,j}(y) − ⟨ρ φ^{(n)}_{i,j}⟩.

We thus derive a recursive formula for calculating P^ε = P_0 + ε P_1 + ε² P_2 + ....

Remark: The asymptotic expansion presented in this section can be generalized to multi-dimensional cases.

Remark: The leading order term P_0 with the effective correlation ρ̄ in (25) admits a closed-form solution [4]. Assume that two geometric Brownian motions (S_t^1, S_t^2) have a constant correlation ρ in the following form:

    dS_t^1 = μ_1 S_t^1 dt + σ_1 S_t^1 dW_t^1
    dS_t^2 = μ_2 S_t^2 dt + σ_2 S_t^2 ( ρ dW_t^1 + √(1 − ρ²) dW_t^2 ),

where μ_1 (μ_2) denotes the drift rate and σ_1 (σ_2) the volatility with respect to the Brownian motion W_t^1 (W_t^2). When the default boundary

is deterministic of exponential type B e^{λt}, the default time can be defined as

    τ_i = inf{ t ≥ 0 : S_t^i ≤ B_i e^{λ_i t} }    (26)

for each i ∈ {1, 2}. The joint default probability with the effective correlation ρ̄ is defined by P(0, x_1, x_2) = P(τ_1 ≤ T, τ_2 ≤ T). We assume no initial default, i.e., S_0^i > B_i for each i, to avoid the trivial case. The joint default probability can be expressed as

    P(0, x_1, x_2) = P_1(0, x_1) + P_2(0, x_2) − Q_{1,2}(0, x_1, x_2),    (27)

where P_i := P(τ_i ≤ T) denotes the i-th marginal default probability and Q_{1,2} := P(τ_1 ≤ T or τ_2 ≤ T) the probability that at least one default happens. Closed-form formulas for P_1, P_2 and Q_{1,2} can be found in [4]; in particular,

    P_i = N( −d_i − (μ_i − λ_i − σ_i²/2)√T/σ_i )
        + e^{−2(μ_i − λ_i − σ_i²/2) d_i √T / σ_i} N( −d_i + (μ_i − λ_i − σ_i²/2)√T/σ_i ),

where d_i = ln(S_0^i/B_i)/(σ_i √T), and Q_{1,2} can be expressed as a series of modified Bessel functions [4] that we skip here.

4.2 Accuracy of the Leading Order Term

Our goal in this section is to show that the approximation of P^ε by P_0 is of order ε when the terminal condition P^ε(T, x_1, x_2, y) = P_0(T, x_1, x_2) = h(x_1, x_2) is smooth and bounded. To do so, one considers the expansion

    P^ε = P_0 + ε P_1 − Z^ε,

where Z^ε is a remainder which also depends on ε. Note that

    ( (1/ε) L_0 + L_1 ) Z^ε = ( (1/ε) L_0 + L_1 )( P_0 + ε P_1 − P^ε )
        = (1/ε) L_0 P_0 + (L_0 P_1 + L_1 P_0) + ε L_1 P_1
        = ε L_1 P_1    (28)

because P^ε solves the original PDE (24) and P_0 and P_1 have been chosen to cancel the first two terms. The terminal and boundary conditions of Z^ε are equal to ε P_1. Next we prove that when ((1/ε) L_0 + L_1) Z^ε = ε L_1 P_1 = O(ε), with a small boundary condition Z^ε = ε P_1 = O(ε), we obtain Z^ε = O(ε). To see this we write the probabilistic representation of the remainder term:

    Z^ε(t, x_1, x_2, y) = ε E_{x_1,x_2,y}{ P_1(T, S_T^1, S_T^2, Y_T)
        − ∫_t^T L_1 P_1(s, S_s^1, S_s^2, Y_s) I( S_s^1 ≥ B_1, S_s^2 ≥ B_2 for all t ≤ s ≤ T ) ds },

where the subscript of E denotes the conditional expectation given S_t^1 = x_1, S_t^2 = x_2, and Y_t = y. Under the smoothness and boundedness assumptions on the terminal function h and the boundedness of ρ(y), solutions of the Poisson equations grow at most linearly in y; see [3]. We conclude that

    P^ε(t, x_1, x_2, y) = P_0(t, x_1, x_2; ρ̄) + O(ε),

which shows that the error of the leading order term is of order ε.

Theorem 5. Let the initial conditions at time t, S_t^1 = x_1 ≥ B_1, S_t^2 = x_2 ≥ B_2 and Y_t = y, be given, and assume that the terminal (payoff) condition h(x_1, x_2) is smooth and bounded. For any ε > 0, there exists a constant C, independent of ε but possibly depending on y, such that

    | P^ε(t, x_1, x_2, y) − P_0(t, x_1, x_2; ρ̄) | ≤ C ε.

Remark: The accuracy analysis for a non-smooth boundary condition such as h(x_1, x_2) = I_{x_1 ≤ B_1} I_{x_2 ≤ B_2} can follow the regularization technique presented in [22].

5 Homogenization in Large Deviations: Structural-Form Model under Random Environment

We have seen in Section 4 that, under stochastic correlation, the two-name joint default probability can be approximated by the homogenized joint default probability with a constant correlation, and in Section 3.2.1 that an efficient importance sampling scheme can be used to estimate the multi-name joint default probability under the standard structural-form model.

We first study the estimation problem defined in (23) by importance sampling. A direct application of the efficient importance sampling is not possible because it requires constant correlations, but this is exactly where perturbation methods help: the leading-order approximation has a constant correlation. Our methodology for estimating the two-name joint default probability with stochastic correlation is therefore to apply the efficient importance sampling scheme using the effective correlation derived from the singular perturbation analysis.

Table 8: Two-name joint default probability estimations under a stochastic correlation model, calculated by basic Monte Carlo (BMC) and importance sampling (IS), respectively. Several time scales ε are given to compare the effect of the stochastic correlation. The number of simulations is 10⁴ and an Euler discretization scheme is used with time step size T/400, where T is one year. Other parameters are S_10 = S_20 = 100, B_1 = 50, B_2 = 40, Y_0 = m = π/4, ν = 0.5, ρ(y) = sin(y). Standard errors are shown in parentheses.

    α = 1/ε   BMC                 Importance Sampling
    0.1       0.0037 (6×10⁻⁴)     0.0032 (1×10⁻⁴)
    1         0.0074 (9×10⁻⁴)     0.0065 (2×10⁻⁴)
    10        0.012 (1×10⁻³)      0.0116 (4×10⁻⁴)
    50        0.0163 (1×10⁻³)     0.0137 (5×10⁻⁴)
    100       0.016 (1×10⁻³)      0.0132 (4×10⁻⁴)

In Table 8, the two-name joint default probabilities, roughly between 10⁻² and 10⁻³, are not very small; importance sampling improves the variance reduction ratio by a factor of at most 6.25. In Table 9 we consider a rare event simulation: the basic Monte Carlo simulation essentially cannot provide reliable estimates, while the importance sampling scheme performs well; for example, the 95% confidence intervals still give fairly reasonable estimates. From these two tables it is also interesting to observe (1) the sensitivity of the estimated joint default probability to the time scale ε: when ε decreases (α increases), the joint default probability increases while all other parameters remain unchanged.
(2) Although, from the derivation of the singular perturbation as well as Theorem 5, the time scale ε ought to be small in order to validate the leading order approximation, the importance sampling scheme still performs well even though ε = 10 is not in the small regime. This shows that the importance sampling scheme is not only theoretically efficient but also empirically robust. Similar observations are found in [11] for option pricing problems under multi-factor stochastic volatility models.
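The effective correlation ρ̄ of (25) that drives the importance sampling above is a one-dimensional Gaussian average and is cheap to compute. A sketch (our own naming) using Gauss-Hermite quadrature; for ρ(y) = sin(y) the classical identity E[sin(Y)] = sin(m) e^{−ν²/2}, Y ~ N(m, ν²), gives an exact check:

```python
import numpy as np

def effective_correlation(rho, m, nu, n_quad=60):
    """Effective correlation rho_bar = <rho(y)>_OU of (25): the average of rho
    against the OU invariant density N(m, nu^2), via probabilists' Hermite
    (Gauss-Hermite) quadrature nodes and weights."""
    x, w = np.polynomial.hermite_e.hermegauss(n_quad)   # weight exp(-x^2/2)
    return float(np.sum(w * rho(m + nu * x)) / np.sqrt(2.0 * np.pi))
```

With the Table 8 parameters (m = π/4, ν = 0.5, ρ(y) = sin(y)) this gives ρ̄ ≈ 0.624, which is the constant correlation fed into the efficient importance sampling scheme.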

Table 9: Two-name joint default probability estimations under a stochastic correlation model, calculated by basic Monte Carlo (BMC) and importance sampling (IS), respectively. Several time scales ε are given to compare the effect of the stochastic correlation. The number of simulations is 10⁴ and an Euler discretization scheme is used with time step size T/400, where T is one year. Other parameters are S_10 = S_20 = 100, B_1 = 30, B_2 = 20, Y_0 = m = π/4, ν = 0.5, ρ(y) = sin(y). Standard errors are shown in parentheses.

    α = 1/ε   BMC             Importance Sampling
    0.1       0 (0)           9.1×10⁻⁷ (7×10⁻⁸)
    1         0 (0)           7.5×10⁻⁶ (6×10⁻⁷)
    10        0 (0)           2.4×10⁻⁵ (2×10⁻⁶)
    50        10⁻⁴ (10⁻⁴)     2.9×10⁻⁵ (3×10⁻⁶)
    100       10⁻⁴ (10⁻⁴)     2.7×10⁻⁵ (2×10⁻⁶)

(In [11], a control variate method is developed by adopting the leading order price approximation to construct the martingale control.)

5.1 Comment on Stochastic Volatility Models

Our importance sampling scheme can also be extended to stochastic volatility models by means of the singular perturbation method. For simplicity we assume the driving volatility process Y_t is an Ornstein-Uhlenbeck process and is the same (homogeneous) for all names. That is, for i = 1, ..., N (the total number of firms),

    dS_it = μ_i S_it dt + σ_i exp(Y_t/2) S_it dW_it,    (29)
    dY_t = α(m − Y_t) dt + κ√(2α) dZ_t,

where d⟨W_it, W_jt⟩ = ρ_ij dt for each 1 ≤ i, j ≤ N, and Z_t can be correlated with the W_i's. Following the methodology presented above, we no longer have the simple linear system solution of (18), because the condition postulated in (17) is not solvable in general. However, assuming the shift function h in the Radon-Nikodym derivative (16) to be a constant vector, one can still obtain an asymptotic result and solve an effective linear system

    Σ_{j=1}^n ρ_ij h_j = μ_i/σ̂ − ln(B_i/S_i0)/(σ̂ T),    (30)

where σ̂ is defined as the square root of the limiting average of (1/T)∫_0^T exp(Y_t) dt as the mean-reversion rate α goes to infinity. The result (30) is derived through an asymptotic analysis by means of the singular perturbation method [3].
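Since the OU process has invariant law N(m, κ²), the limiting average of exp(Y_t) is E[exp(Y)] = exp(m + κ²/2), so σ̂ = exp((m + κ²/2)/2). A hedged sketch of the effective system (30) under that reading (names and argument order are our own):

```python
import numpy as np

def effective_shift(S0, mu, B, m, kappa, corr, T=1.0):
    """Constant Girsanov shift from the effective linear system (30),
    with sigma_hat = sqrt(E[exp(Y)]) under the OU invariant law N(m, kappa^2)."""
    sig_hat = np.exp((m + kappa**2 / 2.0) / 2.0)
    rhs = np.asarray(mu) / sig_hat - np.log(np.asarray(B) / np.asarray(S0)) / (sig_hat * T)
    return np.linalg.solve(corr, rhs)
```

This mirrors the constant-volatility system (18) with the single effective volatility σ̂ substituted for the per-name volatilities, which is the essence of the singular perturbation reduction.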

5.2 Estimation of the Loss Density Function for a Credit Portfolio Model

Next we consider a credit portfolio model with stochastic volatility. Estimation of the loss density under this model has recently been studied by Carmona, Fouque, and Vestal [7], who propose an interacting particle system for variance reduction. Their model is similar to (29), but takes a square-root process for Y_t as the variance process instead. An importance sampling scheme can be constructed similarly under this framework. The portfolio loss function is defined by

    L(T) = Σ_{i=1}^N I(τ_i ≤ T),    (31)

where T > 0 is a fixed time and the default time τ_i is the first hitting time defined in (12). The loss density function is defined by p_k := P(L(T) = k) for each k ∈ {0, 1, ..., N}. This is a different problem from estimating the joint default probability, because it concerns the probability of mixed events in which some names default and the rest survive. Although the joint default probabilities could be used to compute p_k through a combinatorial formula, this becomes inefficient when the number of firms is large. We suggest that, for each defaulting name in {1, 2, ..., N}, the corresponding element of the vector h defined in (16) fulfill the condition (17), while each of the remaining elements of h is set to 0. (This suggestion is based on the rare event situation in which all firms are highly ranked, i.e., each firm is more likely to survive than to default.)

Figure 5.2 demonstrates the estimation of the density p_k, marked by signs, together with its confidence intervals, marked by candle bars. We consider a homogeneous case with total firm number N = 25. Individual parameters are μ_i = 0.06, σ_i = 1, ρ_ij = 0.4, S_i0 = 90, B_i = 36; volatility parameters are α = 1, m = 2.4079, κ = 1, σ_0 = m; and there is no correlation between the W_i's and Z. The maturity T is one year. The number of time discretization steps is 100 and the number of simulations is 10,000. It takes about one hour on a laptop PC in Matlab to generate all numerical results in the figure.
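For reference, the loss density (31) can be estimated by plain Monte Carlo in the constant-volatility case; the sketch below (our own naming, simplified to constant volatilities rather than the stochastic volatility model of this section) counts barrier crossings per path and bins the losses:

```python
import numpy as np

def loss_density_mc(S0, mu, sigma, corr, B, T=1.0,
                    n_steps=200, n_paths=20_000, seed=3):
    """Estimate p_k = P(L(T) = k), where L(T) counts the names whose value
    hits the barrier before T under correlated GBMs. Illustrative sketch."""
    rng = np.random.default_rng(seed)
    n = len(S0)
    dt = T / n_steps
    C = np.linalg.cholesky(corr)
    logS = np.log(np.tile(S0, (n_paths, 1)))
    defaulted = np.zeros((n_paths, n), dtype=bool)
    for _ in range(n_steps):
        Z = rng.standard_normal((n_paths, n)) @ C.T
        logS += (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
        defaulted |= logS <= np.log(B)
    losses = defaulted.sum(axis=1)         # L(T) per path
    return np.bincount(losses, minlength=n + 1) / n_paths
```

The returned vector (p_0, ..., p_N) sums to one; as in the text, the tail bins are where plain Monte Carlo produces empty or unreliable counts and importance sampling is needed.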
In [7], the authors find that a classical importance sampling method may be numerically unstable under a stochastic volatility model. In contrast, we propose a new importance sampling scheme based on the combination of the large deviation principle and singular perturbation. The accuracy of our numerical results is mostly acceptable. A further comparison between the interacting particle system and importance sampling is left for future research.