Advanced Portfolio Management


May 2008 Alberto Dominguez, ASA CFA Towers Perrin, Atlanta

Topic 1: Financial Markets Modeling Techniques

Candidate will be able to:
a) Criticize the following modeling methods: deterministic vs. stochastic; single period vs. multiple period; one vs. multiple factors; realistic vs. risk-neutral; equilibrium vs. arbitrage-free; actuarial vs. capital markets; simulation vs. formula-based; mean-reversion
b) Recommend a modeling method for a given situation
c) Define and apply concepts of martingale, market price of risk and measures in single and multiple state variable contexts
d) Describe and evaluate equity and interest rate models
e) Contrast commonly used equity and interest rate models
f) Recommend an equity or interest rate model for a given situation
g) Describe issues and best practices in estimation or calibration of financial models

Readings:
- Hardy, Investment Guarantees: Modeling and Risk Management for Equity-Linked Life Insurance
  - Chapter 1 (pp. 11-14): Provision for equity-linked liabilities
  - Chapter 2: Modeling long-term stock returns
  - Chapter 3: MLE for Stock Return Models
  - Chapter 4: Left-Tail Calibration Method
  - Chapter 5: Markov Chain Monte Carlo Estimation
  - Chapter 7 (pp. 115-123): A review of option pricing theory
- Validation of Long-Term Equity Return Models for Equity-Linked Guarantees, by Hardy, Freeland and Till, NAAJ Vol. 10 No. 4, October 2006 (Sections 1-4 only)
- Babbel & Fabozzi, Investment Management for Insurers, 1999, Chapter 11: Four faces of an interest rate model
- V-C101-07 (formerly 8V-202-00): Quantitative Strategies Research Notes, Model Risk
- Hull, Options, Futures and Other Derivatives, Sixth Edition, 2006, Chapter 19: Estimating Volatilities and Correlations, and Chapter 25: Martingales and Measures
- V-C125-07: Chapters 5 & 6 of Oxford Guide to Financial Modeling

Table of Contents
- Investment Guarantees Chapter 1
- Investment Guarantees Chapter 2
- Investment Guarantees Chapter 3
- Investment Guarantees Chapter 4
- Investment Guarantees Chapter 5
- Investment Guarantees Chapter 7
- Validation of Long-Term Equity Return Models
- Investment Management for Insurers Chapter 11
- Model Risk
- Options Futures and Other Derivatives Chapter 19
- Options Futures and Other Derivatives Chapter 25
- Oxford Guide to Financial Modeling Chapter 5
- Oxford Guide to Financial Modeling Chapter 6

Investment Guarantees Chapter 1

Major Benefit Types
- Guaranteed Minimum Maturity Benefit (GMMB) or Guaranteed Minimum Withdrawal Benefit
- Guaranteed Minimum Death Benefit (GMDB)
- Guaranteed Minimum Accumulation Benefit (GMAB)
- Guaranteed Minimum Surrender Benefit (GMSB)
- Guaranteed Minimum Income Benefit (GMIB) or Guaranteed Annuitization Option (GAO)

Contract Types
- Segregated Fund Contracts (Canada)
- Variable Annuities (USA)
- Unit Linked Insurance (UK)
- Equity Indexed Annuities (USA, UK)
- Equity Linked Insurance (Germany)

Provision for Equity Linked Liabilities

Reinsurance
- Buying options from third parties equivalent to those the insurer has embedded/sold is equivalent to reinsuring the entire risk
- Reinsurers have been involved in selling such options to insurers
- Insurer likely to pass on substantial proportion of expected profit along with risk
- Insurer must be aware of counterparty risk, since maturity may be decades away
- For some markets reinsurers and other providers are increasingly unwilling to provide cover at prices acceptable to insurers (e.g. segregated fund contracts in Canada)

Dynamic Hedging
- Used by banks and hedge funds, only occasionally by insurers
- Use BSM to determine replicating portfolio and invest in it
- Portfolio will change continuously, so necessary to recalculate and adjust frequently
- BSM contains some strong assumptions that cannot be realized in practice; replicating portfolio still manages to provide a powerful method of hedging the liability
- Most academic literature assumes dynamic hedging
- Although application by actuaries in practice of financial engineering theory to management of embedded options is growing, in many areas it is still not widely accepted

Actuarial Approach
- Most common for USA, Canada and UK insurers
- Prolonged low stock market of 1973-74 awakened actuaries to possibility that GMMB of 100% of premium, which had been treated as a relatively unimportant policy tweak/rider with very little value or risk, constituted a serious potential liability
- The then-recent BSM was considered too risky and unproven to be used for unit-linked GMMB by UK actuarial profession
- If actuarial profession had taken opportunity to learn and apply option pricing theory and risk management at that time, design and management of embedded options would have been very different and actuaries would have been better placed to participate in derivatives revolution
- In 1980 Maturity Guarantees Working Party (MGWP) suggested using stochastic simulation to determine approximate distribution for guarantee liabilities and then using quantile reserving to convert distribution into usable capital requirement

- Quantile reserve had already been used for many years, particularly in non-life insurance
- Insurer assesses appropriate quantile of loss distribution (say 99%) and PV of quantile is held in riskless bonds so it can be 99% certain liability will be met
- Principle identical to VaR concept in finance, though generally applied over longer time period by insurers than banks
- Use of quantile as risk measure not actually fundamental and other risk measures may be preferable; actuarial approaches in general use stochastic simulation to project liabilities and then discount back at riskless rate, implicitly assuming capital investment in riskless bonds
- Actuarial approach still popular, particularly with actuaries, and offers valid alternative for some contracts
- CIA Segregated Funds Task Force 2000 (SFTF) used actuarial approach as underpinning methodology for determining capital requirements although a combined approach is also accommodated
- It should not be assumed that actuarial approach is incompatible with dynamic hedging
- Many practitioners who use one of these approaches harbor distrust of the other approach, often based on a lack of understanding
- Both approaches have their merits, and the two approaches in appropriate combination may lead to better risk management than either provides separately

Ad Hoc Approach
- There is a diminishing body of opinion among actuaries that all this is unnecessary or even irrelevant
- Their approach to financial guarantees is actuarial judgment, especially with low-frequency type options (e.g. GMMB = total premium over 10 years); judgment often wrong
- This view that there would never be a liability, so little or no provision needs to be made, is uncommon now and tends to be unpopular with regulators
- Equitable Life UK, large mutual more than 200 years old, oldest life office in world
  - In early 1980s issued large number of guaranteed annuity option contracts and set up no provision for guarantee
  - Guarantee in money if interest rates fell below 6.5%; at the time rates were higher than 10% and cautious long-term view was they might fall to 8%
  - Assumption that rates would never again fall below 6.5% seems very odd given that rates had been lower for decades until only 10-15 years earlier
  - In mid-1990s interest rates fell, guarantees moved into money, Equitable was forced to close to new business in December 2000
  - Had actuaries in 1980s used stochastic methods then available, it would have been clear that substantial provision would be required for this option

Pricing and Capital Requirements
- Three main considerations
  - What price should policyholder be charged for guarantee benefit?
  - How much capital should insurer hold wrt benefit through term of contract?
  - How should this capital be invested?
- Many companies focus only on first consideration, but three issues crucially interrelated
  - Price only appropriate if suitably invested to match pricing approach
  - Different risk management strategies require different levels of capital for same level of risk and therefore different prices for same guarantee
  - They are all asset-liability issues, not separate asset issues and liability issues

Investment Guarantees Chapter 2

Deterministic vs Stochastic
- There has been some effort to adapt deterministic equity return approach for equity linked liabilities; Office of the Superintendent of Financial Institutions (OSFI) in Canada mandated a deterministic test for GMMB under segregated fund contracts; this has since been superseded by recommendations of SFTF
- Problems with deterministic equity return approach
  - It is likely that any single path used to model extreme behavior relevant to GMMB will lack credibility
    - OSFI scenario for diversified equity mutual fund is immediate fall of 60% followed by returns of 5.75% per year for 10 years
    - Worst monthly S&P return of last 100 years was a fall of 35%
    - Insurers skeptical about need to reserve against such an unlikely outcome
  - It is difficult to interpret results. What does it mean to hold enough capital to satisfy a particular bad path? It will not be enough to pay guarantee with certainty unless full amount is held in riskless bonds, so how extreme must circumstances be before required deterministic amount is not enough?
  - A single path may not capture risk appropriately for all contracts, particularly if guarantee ratchets upward from time to time; one-time drop may be less damaging than a sharp rise followed by period of poor returns if guarantee depends on path rather than final value, e.g. GMAB
- True understanding of nature and sources of risk under equity-linked contracts requires stochastic analysis of guarantee liabilities
- Stochastic analysis requires credible long-term model of underlying stock return process
- Actuaries have no general agreement on form of such a model
- Financial engineers traditionally use lognormal (LN) model; nowadays a wide variety of models are applied
- Many insurers in Canada segregated fund market use LN model; SFTF gives specific guidance on LN model on grounds that it has been very popular choice in industry

Economic Theory vs Statistical Approach
- Some models are derived from economic theory (e.g. EMH or No Arbitrage Assumption)
- Statistical approach does not consider theory, instead finds model that best fits data
- Hardy, like most practitioners, uses an implicit mixture of approaches, using models that make sense in terms of market experience and intuition but are tractable

Data
- Either total return data (for variable annuity contracts) or price index data (for equity index products) can be used; Hardy uses total return data
- S&P 500 (USA) or TSE 300 (Canada, now superseded by S&P/TSX Composite)
- Selecting appropriate data series for calibration
  - There is some evidence of a shift in the Depression era, and returns may also be distorted by WW2
  - Thus it is attractive to use only more recent data

  - On other hand, for very long-term contracts we may be forecasting distributions further forward than we have considered in the model (e.g. 40 years for a GMAB), so it may not be wise to assume no future similar events, so more data would be preferred
  - However, because of the mitigating influence of mortality, lapses and discounting, cash flows beyond say 20 years may not have substantial influence on overall results
- Hardy uses 1956-1999 data [1]
- Results demonstrate that higher volatility assumption (using 1924-1999) has a very substantial effect on liability; gives density functions with much fatter tails
- S&P and TSE very similar; TSE was a bit more volatile in early 1980s and in 1956-1999 TSE had slightly lower mean return and slightly higher standard deviation
- Some traders and actuaries think history should not be used at all to predict future
  - They propose distribution parameters should be derived from current implied market volatility from traded options; usually market volatility differs very substantially from historical estimate of long-term volatility
  - Difficulties
    - Options embedded in equity linked contracts, especially GMMB, are not tradable and have effective maturities far longer than traded options; market volatility varies with term to maturity, so in absence of very long-term traded options it is not possible to state what would be appropriate market volatility for equity linked options
    - It is important to have a model of true probability measure rather than risk-neutral measure on which market prices are based
    - There is much volatility in implied market volatility; a change of 100 bp in volatility assumption for say a 10 year option may have enormous financial impact, but such movements in implied volatility are common in practice; not satisfactory to determine long-term strategies for actuarial management of equity-linked liabilities on assumptions that may be deemed utterly incorrect one day later

GMMB Liability Historical Evidence
- S&P 10 year put was in money for initial deposits made in three periods [2]
  - Substantially in money for options spanning bear market of 1929-1932
  - Moderately in money for some deposits in 1964-1965 (due to 73-74 bear market)
  - 0.1% payout for deposits in December 1968
- TSE option was in money for options spanning bear market of 1929-1932
- Extra volatility from exchange rate risk (e.g. S&P in Canada) and complications from ratchets and resets would lead to even higher liabilities than indicated for simple contracts

Lognormal Model
- Assumes that in continuous time stock returns follow Geometric Brownian Motion (GBM)
- In discrete time the implications are
  - Over any discrete time interval the stock price accumulation factor is lognormally distributed: S_{t+w}/S_t ~ LN(μw, σ√w), i.e. ln(S_{t+w}/S_t) ~ N(μw, σ²w)
  - μ is mean log return over a unit of time; σ is volatility as an annual rate
  - Returns in non-overlapping intervals are independent, consistent with EMH
  - pdf: f(x) = (1/(xσ√(2πw))) exp{ −½(ln x − μw)²/(σ²w) }
  - Pr(S_{t+w}/S_t ≤ x) = Φ((ln x − μw)/(σ√w))

[1] This seems inappropriate to me.
[2] Study assumed a rather high 2.5% management fee, which certainly can affect results.
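To make the LN accumulation-factor formulas above concrete, here is a minimal Python sketch of my own (not from Hardy); the monthly parameters are hypothetical placeholders, not fitted values, and the 10-year horizon and unit guarantee are just illustrative.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical monthly log-return parameters (illustrative only, not Hardy's fit)
mu, sigma = 0.0081, 0.045      # mean and sd of monthly log return
w = 120                        # 10 years of monthly periods

# Analytic left-tail probability: Pr(S_{t+w}/S_t <= 1) = Phi((ln 1 - mu*w) / (sigma*sqrt(w)))
p_below_1 = norm.cdf((np.log(1.0) - mu * w) / (sigma * np.sqrt(w)))

# Monte Carlo check: simulate accumulation factors and a GMMB-style put payoff (K = 1)
rng = np.random.default_rng(1)
log_acc = rng.normal(mu, sigma, size=(100_000, w)).sum(axis=1)
acc = np.exp(log_acc)                           # S_{t+w}/S_t
print("P(accumulation < 1): analytic %.4f, simulated %.4f"
      % (p_below_1, np.mean(acc < 1.0)))
print("Mean GMMB payoff per unit premium:", np.mean(np.maximum(1.0 - acc, 0.0)))
```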

- Option prices and probability distributions for payoffs under standard puts can be derived analytically
  - E(S_{t+w}/S_t) = exp{μw + ½σ²w}
  - V(S_{t+w}/S_t) = exp{2μw + σ²w} × (exp{σ²w} − 1)
- Simple, tractable and provides a reasonable approximation over short time periods
- Short-term variations appear consistent with LN; LN is less appealing for longer term problems; does not provide satisfactory fit to data
  - Fails to capture extreme price movements such as October 1987 crash; need distribution with fatter tails (leptokurtic) to fit data; difference between LN distribution and true fatter-tail distribution doesn't have severe consequences for short-term contracts but longer term implications can be substantial
  - Autocorrelations in data make significant difference in tail of distribution over longer terms but are not incorporated into LN model, which assumes independent returns; conditional LN distribution removes serial independence from model
  - Fails to capture volatility bunching, periods of high volatility often associated with severe downward stock price movements; critical omission

Autoregressive Model AR(q)
- Incorporates a tendency to revert back towards mean μ
- AR(q) has terms in (Y_{t−r} − μ) for r = 1, …, q, where q is the order
- AR(1) has process Y_t = μ + a(Y_{t−1} − μ) + σε_t = μ(1 − a) + aY_{t−1} + σε_t, where |a| < 1
  - If a is close to 1 process reverts slowly; if a is close to 0 process reverts quickly
  - If a is negative, values tend to bounce above and below mean in an alternating fashion
  - If a = 0 process simplifies to LN
- Autocorrelation function for AR(1) process is ρ_k = a^k
- Captures autocorrelation, but not extreme values or volatility bunching

ARCH and GARCH
- One idea to estimate current level of volatility is to give more weight to recent data by replacing σ²_n = (1/m) Σ_{i=1..m} u²_{n−i} with σ²_n = Σ_{i=1..m} α_i u²_{n−i}, where Σ α_i = 1
- AR Conditional Heteroskedastic (ARCH) models popular in many areas of econometrics including stock return modeling; popular for introducing stochastic volatility in a reasonably intuitive and tractable fashion
- Looking forward a single step volatility is fixed; more than one step it's a stochastic process
- ARCH(m) is an extension of weighted average idea that assumes there is a long-run average variance that is also given some weight: σ²_n = α₀σ²_ave + Σ_{i=1..m} α_i u²_{n−i}, where α₀ + Σ α_i = 1
- ARCH(1) has variance process σ²_t = a₀ + a₁(Y_{t−1} − μ)² and Y_t = μ + σ_t ε_t
  - Variance is conditional on Y_{t−1}; if yield in prior period deviates significantly from mean, expected variance increases
  - Introduced by Engle (1982) and applied to quarterly UK inflation data
  - Random occasional extreme value causes volatility clustering, a period of higher volatility after which volatility gradually returns to longer term value, but model does not allow for autocorrelation
- AR-ARCH results when ARCH is combined with AR(1) to give model Y_t = μ + a(Y_{t−1} − μ) + σ_t ε_t and σ²_t = a₀ + a₁(Y_{t−1} − μ)²
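A minimal sketch of my own simulating the AR(1)-ARCH(1) process just described; the parameter values are hypothetical, chosen only so the output looks like monthly returns.

```python
import numpy as np

def simulate_ar_arch(n, mu, a, a0, a1, seed=0):
    """Simulate the AR(1)-ARCH(1) model above:
       Y_t = mu + a*(Y_{t-1} - mu) + sigma_t * eps_t,
       sigma_t^2 = a0 + a1*(Y_{t-1} - mu)^2."""
    rng = np.random.default_rng(seed)
    y = np.empty(n)
    y_prev = mu
    for t in range(n):
        sigma_t = np.sqrt(a0 + a1 * (y_prev - mu) ** 2)   # conditional volatility
        y[t] = mu + a * (y_prev - mu) + sigma_t * rng.standard_normal()
        y_prev = y[t]
    return y

# Illustrative monthly parameters (hypothetical, not fitted)
y = simulate_ar_arch(n=600, mu=0.008, a=0.1, a0=0.0015, a1=0.3)
print("sample mean %.4f, sample sd %.4f" % (y.mean(), y.std()))
```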

- Generalized ARCH (GARCH) developed by Bollerslev (1986) is more flexible and a better fit for many econometric applications; variance process looks like an ARMA process without random term
- GARCH(1,1) estimates variance as σ²_n = α₀σ²_ave + α₁u²_{n−1} + α₂σ²_{n−1}; by far the most popular GARCH model
  - Weight assigned to u²_{n−i} is α₁α₂^{i−1}; α₂ defines the relative importance given to successive daily changes
  - Equivalent to a model where the variance V follows the stochastic process dV = a(V_ave − V)dt + ξV dz, with a = α₀ and V_ave = σ²_ave
  - Mean reverting, which is theoretically appealing since variance rates do tend to be mean reverting; average variance rate per day approaches the long-term σ²_ave as T becomes large
  - α₁ + α₂ < 1 is required so process is covariance stationary, otherwise process is mean fleeing rather than mean reverting; E[σ²_{n+t}] = σ²_ave + (α₁ + α₂)^t (σ²_n − σ²_ave)
  - σ²_n − σ²_{n−1} = α₀σ²_ave + α₁u²_{n−1} + (α₂ − 1)σ²_{n−1}, and u²_{n−1} has mean σ²_{n−1} and standard deviation √2 σ²_{n−1}; therefore σ²_n − σ²_{n−1} = α₀σ²_ave + (α₁ + α₂ − 1)σ²_{n−1} + α₁√2 σ²_{n−1}ε = (α₁ + α₂ − 1)(σ²_{n−1} − σ²_ave) + α₁√2 σ²_{n−1}ε
  - This can also be expressed as σ²_n − σ²_ave = α₁(u²_{n−1} − σ²_ave) + α₂(σ²_{n−1} − σ²_ave)
  - dV = (1 − α₁ − α₂)(V_ave − V)dt + α₁√2 V dz = a(V_ave − V)dt + ξV dz; when time is measured in years, dV = 252a(V_ave − V)dt + ξ√252 V dz
- GARCH(p,q) estimates variance from most recent p observations of u² and q most recent estimates of σ²
- Other GARCH models have been proposed that incorporate asymmetric news: σ_n depends on the sign of u_{n−1}; since for stocks volatility is inversely related to price, a negative u_{n−1} should have a bigger effect than the same positive u_{n−1}
- AR-GARCH(1,1) has process Y_t = μ + a(Y_{t−1} − μ) + σ_t ε_t and σ²_t = a₀ + a₁(Y_{t−1} − μ)² + a₂σ²_{t−1}
- ARCH or GARCH density functions can be estimated by simulation
- If a GARCH model is working well it should remove autocorrelation in u²_i
  - Ljung-Box statistic is m Σ w_k ρ²_k where ρ_k is the lag-k autocorrelation and w_k = (m+2)/(m−k)
- When assessing goodness of fit over medium terms (5-10 years) ARCH and GARCH often provide a good parsimonious fit to data; however, they are not substantially fatter tailed than LN on the crucial left side of the distribution

Regime Switching Log Normal (RSLN) Model
- Regime switching (RS) models introduced by Hamilton (1989) assume a discrete process switches randomly between K regimes, each characterized by a different parameter set, through a Markov process; easy way to create stochastic volatility
- One of the simplest is RSLN, which switches randomly between K LN processes; more tractable than ARCH/GARCH and more accurately captures extreme observed behavior with a substantially fatter tail than LN model; Bollen (1998) used it to value American options
- Two Regime RSLN (RSLN-2) provides very good fit to stock index data relevant to equity linked insurance and is parsimonious (6 parameters vs 12 for RSLN-3), so it will be the main model used by Hardy
- Let R_n be the number of periods (out of n) spent in regime 1; then P_n(r) = Pr[R_n = r] = π₁ Pr[R_n = r | start in regime 1] + π₂ Pr[R_n = r | start in regime 2], where π₁ = p₂₁/(p₁₂ + p₂₁) and π₂ = p₁₂/(p₁₂ + p₂₁) are the stationary regime probabilities [3]
- Pr(S_n ≤ x) = Σ_r Φ((ln x − μ*(r))/σ*(r)) P_n(r)
- μ*(r) = rμ₁ + (n − r)μ₂

[3] Mocciolo pp. 21-22 provides a good derivation of this result.

- σ*(r) = √(rσ₁² + (n − r)σ₂²)
- pdf: f(x) = Σ_r φ((ln x − μ*(r))/σ*(r)) / (xσ*(r)) × P_n(r)
- P_n(r) can also be used to find unconditional moments of S_n
- RSLN-2 was one of the standard models used by SFTF; because we are changing μ and σ at the same time we can capture the association of high volatility with poor returns as one regime

Empirical Model
- Use historic returns as sample space for future return simulation, each being equally likely
- Simple and quick though analytic development not possible; suffers from same problems as LN model which it closely resembles in distribution, especially losing autocorrelation in data
- One adaptation would be to sample in blocks of several months at a time
- Model produces a return distribution that is not fat-tailed enough; there is also no volatility bunching
- Used by some financial institutions for VaR calculations but these tend to be quite short-term
- One particularly useful feature is ease of constructing multivariate distributions by sampling from same date, although lagged correlations are lost

Stable Distribution Family
- If {Y_t} is a Levy process then at any fixed time Y_t has a corresponding stable distribution
- Popular especially for maturity guarantees because can be very fat tailed and distribution can be convoluted
- A distribution F is stable if any linear combination of two independent variables from the distribution can be described with the same distribution parameters: for iid X₁, X₂, X and any a > 0, b > 0 there exist c > 0, d such that aX₁ + bX₂ ~ cX + d; e.g. normal distribution is stable but Poisson distribution is not
- Characteristic function φ(t) = E[e^{iXt}] = exp{ iδt − |ct|^α (1 − iβ sign(t) z(t,α)) } where c > 0, α ∈ (0,2], β ∈ [−1,1] and z(t,α) = tan(πα/2) if α ≠ 1 or −(2/π)ln|t| if α = 1
- δ is called the location parameter and α is called the characteristic exponent
- α = 2 is normal distribution; α = 1, β = 0 is Cauchy/Lorentz distribution; α = 1/2, β = 1 is inverse Gaussian distribution
- For α < 2 distribution is so fat tailed that variance is infinite
- Not easy to use or simulate, estimating requires advanced techniques, does not incorporate autocorrelations from volatility bunching

General Stochastic Volatility Model
- Can vary volatility stochastically without regime constraints; highly adaptable but very difficult in general to estimate parameters
- A good model [4] might be Y_t = μ + σ_t ε_t, where σ_t = exp{ν_t/2} and ν_t = αν_{t−1} + ξ_t

Wilkie Model
- History
  - Early version of Wilkie model of stock and bond returns was applied to UK GMMB by MGWP, which adopted actuarial approach
  - Full version applied to insurer solvency by FoA Solvency Working Party (1986)
  - Subsequently fitted to other markets including USA and Canada

[4] Hardy errata: the stochastic volatility model described in the text is not sensible since volatility keeps increasing if a > 0 and may become negative if a < 0.

  - Has been applied in Canada segregated fund market by a number of companies
- Differs in several fundamental ways from other models
  - Multivariate; each factor incorporates some factor from connected series higher up on cascade; very useful for applications that require consistent projections of related economic series, say stock prices and inflation rates or fixed interest yields
  - Designed for long-term applications; ideally suited for projections more than 10 years and used even for 100 year projections
  - Designed to be applied to annual data; can only be adapted to more frequent data by changing AR structure of series; attempts to produce continuous form, say by constructing Brownian bridge between year-ends, add complexity
    - Unsatisfactory approximation for contracts with monthly cash flows; important when there are reset opportunities to increase guarantee mid-year
    - Annual intervals too infrequent for exploration of dynamic hedging strategies for insurers who want to hold replicating portfolio, since it is bought/sold at much shorter intervals
- Five integrated processes: CPI (q), share yield (y), share dividends (d), long bond yield (c), short bond yield (b)
- Force of inflation follows AR(1) process: δ_q(t) = μ_q + a_q(δ_q(t−1) − μ_q) + σ_q z_q(t); ultimate distribution is N(μ_q, σ²_q/(1 − a²_q))
- Share Prices and Dividends
  - Share dividend yield y(t) = exp{ w_y δ_q(t) + μ_y + yn(t) } where yn(t) = a_y yn(t−1) + σ_y z_y(t)
  - E[y(t)] = exp{μ_y} × M_δq(w_y) × exp{ σ²_y/(2(1 − a²_y)) } where for large t the mgf of δ_q is M_δq(u) = exp{ uμ_q + ½u²σ²_q/(1 − a²_q) }
  - Force of dividend δ_d(t) = w_d DM(t) + (1 − w_d)δ_q(t) + d_y σ_y z_y(t−1) + μ_d + b_d σ_d z_d(t−1) + σ_d z_d(t), where DM(t) = d_d δ_q(t) + (1 − d_d)DM(t−1); the terms are
    - Weighted average of current and past inflation
    - Dividend yield effect, where a fall in dividend yield is associated with a rise in dividend index
    - Influence from previous year's white noise term
    - White noise term
  - Dividend index is D(t) = D(t−1)exp{δ_d(t)}; price index is P(t) = D(t)/y(t); return on shares is py(t) = (P(t) + D(t))/P(t−1) − 1
- Long and Short Bond Yields
  - c(t) = cm(t) + cn(t) where cm(t) = d_c δ_q(t) + (1 − d_c)cm(t−1) and cn(t) = μ_c × exp{ a_c cn(t−1) + y_c σ_y z_y(t) + σ_c z_c(t) }
  - b(t) = c(t)e^{−bd(t)} where bd(t) = μ_b + a_b(bd(t−1) − μ_b) + b_c σ_c z_c(t) + σ_b z_b(t)
- Wilkie also describes wage inflation, inflation index bonds, exchange rates
- To run model we can start simulations at neutral values (stationary values we would obtain if all residuals were 0) rather than current environment; alternatively we can start at current date and let past data determine initial parameters
  - Convenient to start at neutral values so results not distorted by current investment conditions; for contracts some time ahead these figures are close to figures that would be obtained using formerly current starting values
  - For strategic decisions designed for immediate implementation it is appropriate to use contemporary data
- Criticism of Wilkie model (Huber 1997)

  - Evidence of permanent change in nature of economic time series around WW2 not allowed for; this criticism applies to all stationary models, but non-stationary models can have even more serious problems (impossible scenarios, explosive volatility)
  - Inconsistency with some economic theories such as EMH; Wilkie model close to random walk consistent with EMH over short term; significant debate about applicability of EMH over long term
  - Data mining problem; one way around problem is to use part of available data to fit model and rest to test fit; for a complex model with many parameters data are already scarce; however Wilkie model is substantially theory driven and data mining problem only applies to non-theory-based time-series modeling [5]
  - Huber not intending to limit actuaries to deterministic methodology although it has often been quoted in support of that view

Vector Autoregression
- A more generalized vector AR model than Wilkie is to use AR(q) for a vector of multiple correlated relevant financial series with correlations captured in a matrix
- Cascade structure makes parameter estimation easier and perhaps makes model more transparent
- Then generate values at each step x_t = (x_1t, …, x_mt)ᵀ by a vector equation x_t = μ + Σ_j A_j(x_{t−j} − μ) + LZ_t, where the A_j are matrices of AR coefficients and LLᵀ = Σ is the variance-covariance matrix of the series residuals
- Wright develops a model of this form with inflation modeled independently of other variables and included as an extra term; this has advantage that many covariance terms can be set to 0 since much correlation between series can be explained by correlation with inflation

[5] This still seems most serious of three criticisms to me.
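A toy sketch of my own of the cascade idea behind the Wilkie and Wright structures: inflation is modeled first as AR(1) and the dividend yield sits below it in the cascade. The parameters are illustrative placeholders, not Wilkie's fitted values, and only two of the five series are shown.

```python
import numpy as np

def simulate_cascade(n_years, seed=0):
    """Toy two-level cascade: AR(1) force of inflation drives the dividend yield,
    y(t) = exp(w_y*dq(t) + mu_y + yn(t)), yn(t) = a_y*yn(t-1) + sig_y*z(t)."""
    rng = np.random.default_rng(seed)
    mu_q, a_q, sig_q = 0.04, 0.6, 0.02                      # inflation (hypothetical)
    mu_y, w_y, a_y, sig_y = np.log(0.035), 1.0, 0.5, 0.15   # dividend yield (hypothetical)
    dq, yld = np.empty(n_years), np.empty(n_years)
    dq_prev, yn_prev = mu_q, 0.0                             # start at neutral values
    for t in range(n_years):
        dq[t] = mu_q + a_q * (dq_prev - mu_q) + sig_q * rng.standard_normal()
        yn = a_y * yn_prev + sig_y * rng.standard_normal()
        yld[t] = np.exp(w_y * dq[t] + mu_y + yn)
        dq_prev, yn_prev = dq[t], yn
    return dq, yld

dq, yld = simulate_cascade(50)
print("mean inflation %.3f, mean dividend yield %.3f" % (dq.mean(), yld.mean()))
```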

Investment Guarantees Chapter 3

Introduction
- MLE estimate is the parameter value that yields the highest probability of observing the given data values, found by maximizing the likelihood L; it is almost always simpler to work with the log-likelihood l
- If observations are iid, L(θ) = Π_t f(x_t; θ) and l(θ) = Σ_t ln f(x_t; θ)
- If there's serial dependence, L(θ) = f(x₁; θ) f(x₂; θ | x₁) ⋯ f(x_n; θ | x₁,…,x_{n−1}) and l(θ) = Σ_t ln f(x_t; θ | x₁,…,x_{t−1})
- Provided the likelihood can be calculated it is sometimes possible to determine parameters analytically; if not, maximization is generally easily determined using software
- Subject to some regularity conditions, MLE has many attractive properties
- Considered as a function of the random sample, the estimator is a random variable, so we can talk about its distribution and moments and estimate accuracy by considering its mean and variance

Attractive Properties of MLE Estimators
- Stationary distributions
  - {Y_t} is strictly stationary if for any sequence t_i the joint distribution for (Y_t1, …, Y_tr) is identical to that for (Y_{t1−k}, …, Y_{tr−k}), that is the joint distribution is the same regardless of where in the series the variables lie
  - {Y_t} is weakly stationary or covariance stationary if the unconditional mean is constant, E[Y_t] = μ for all t, and all covariances Cov[Y_t, Y_{t−j}] depend only on the lag j, meaning E[(Y_t − μ)(Y_{t−j} − μ)] = γ_j for all t, j
  - If the joint density for (Y_t1, …, Y_tr) is multivariate normal and covariance stationary then it is also strictly stationary
  - Attractive properties of MLE for independent samples also apply to MLE for strictly stationary time series
- Asymptotic Unbiasedness: for large sample sizes the expected value of the MLE tends to the parameter; in many cases MLE may be an unbiased estimator for all sample sizes
- Asymptotic Minimum Variance
  - Provided an estimator is unbiased or nearly unbiased, lower variance is preferred
  - In scalar case, expected information I(θ) = E[−d²l(θ)/dθ²] and asymptotic variance is I(θ)⁻¹
  - In vector case, I(θ)_ij = E[−∂²l(θ)/∂θ_i∂θ_j] and I(θ)⁻¹ is the asymptotic variance-covariance matrix
  - Inverse information function is the Cramer-Rao lower bound for the variance of an estimator; it doesn't get any better than this (MLE) for large samples and is often used as an approximate variance even where sample size is not large
  - Problems
    - Function of unknown parameter; use estimator in place of parameter
    - If likelihood function complicated, information matrix difficult to find analytically; use numerical methods
  - Delta method: asymptotic variance of a function of θ, g(θ), is V = ∇gᵀ Σ ∇g where ∇g = (∂g/∂θ₁, …, ∂g/∂θ_s)ᵀ and Σ = I(θ)⁻¹; asymptotic distribution is N(g(θ), V)
- Asymptotic (Multivariate) Normal Distribution with mean equal to parameter(s) being estimated and variance (matrix) I(θ)⁻¹; for large samples can be used to set confidence intervals
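As a concrete illustration of "maximization is generally easily determined using software", here is a minimal numerical MLE sketch of my own for the AR(1) return model, using a generic optimizer on the conditional log-likelihood; the simulated data and starting values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik_ar1(params, y):
    """Negative conditional log-likelihood of Y_t = mu + a*(Y_{t-1}-mu) + sigma*eps_t,
    conditioning on the first observation."""
    mu, a, log_sigma = params
    sigma = np.exp(log_sigma)                      # keeps sigma positive
    resid = y[1:] - (mu + a * (y[:-1] - mu))
    n = len(resid)
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + 0.5 * np.sum(resid**2) / sigma**2

# Illustrative data: simulate an AR(1) series with hypothetical parameters
rng = np.random.default_rng(0)
y = np.empty(500); y[0] = 0.008
for t in range(1, 500):
    y[t] = 0.008 + 0.1 * (y[t-1] - 0.008) + 0.04 * rng.standard_normal()

res = minimize(neg_loglik_ar1, x0=[0.0, 0.0, np.log(0.05)], args=(y,), method="Nelder-Mead")
mu_hat, a_hat, sigma_hat = res.x[0], res.x[1], np.exp(res.x[2])
print("MLE: mu=%.4f, a=%.3f, sigma=%.4f, loglik=%.2f" % (mu_hat, a_hat, sigma_hat, -res.fun))
```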

Limitations of MLE Estimators
- Asymptotic results do not apply for models that are not strictly stationary; for non-stationary models other methods may be preferable; time series in Hardy are stationary subject to some parameter constraints
- Asymptotic results cannot be relied on if parameter estimate is near boundaries of parameter space; this problem also arises for GARCH and RSLN-3; should be considered carefully especially for more complex models
- Asymptotic results are only useful if we have reasonably large sample; for small samples other estimators may perform better than MLE for both bias and variance; for smaller samples or non-stationary distributions we have no information about relationships between parameter estimates
- MLE will not tell you how close fit is; it cannot be assumed that a small standard error means model is a good fit; another model with larger errors may provide better fit

Using MLE for TSE and S&P Data
- For LN model MLE estimates are μ̂ = Σ y_t/n and σ̂ = √(Σ(y_t − μ̂)²/n), i.e. σ̂² = ((n−1)/n)s²_y; estimator for μ is unbiased for all sample sizes; estimator for σ is asymptotically unbiased but is biased for all finite samples; sample variance s²_y is unbiased for all sample sizes and is therefore often used in place of MLE; covariance matrix is Σ = ( (σ²/n, 0), (0, σ²/2n) )
- For AR, MLE results in a system of nonlinear equations that must be solved numerically; for large samples Var(μ̂) ≈ σ²/(n(1 − a²)), Var(σ̂) ≈ σ²/2n, Var(â) ≈ (1 − a²)/n, and asymptotic covariances are all zero
- For ARCH treat initial variance σ²₁ as an extra parameter; its effect on final likelihood will be small if data series is a reasonable size
- For GARCH, MLE estimates are ones that maximize Σ_i [ −ln(σ²_i) − u²_i/σ²_i ]; an alternative and more robust approach to estimating parameters in GARCH(1,1) is variance targeting, which involves setting long-run average variance to sample variance or to some other value believed to be reasonable and then estimating remaining two parameters
- For RSLN-2 likelihood is calculated recursively
  - f(ρ₁=1, y₁) = π₁ (1/σ₁)φ((y₁ − μ₁)/σ₁); f(ρ₁=2, y₁) = π₂ (1/σ₂)φ((y₁ − μ₂)/σ₂); f(y₁) = f(ρ₁=1, y₁) + f(ρ₁=2, y₁); p(ρ₁ | y₁, θ) = f(ρ₁, y₁)/f(y₁)
  - f(ρ_t, ρ_{t−1}, y_t | y_{t−1},…,y₁, θ) = p(ρ_t | ρ_{t−1}, θ) p(ρ_{t−1} | y_{t−1},…,y₁, θ) f(y_t | ρ_t, θ), where p(ρ_{t−1} | y_{t−1},…,y₁, θ) is found from the previous recursion and f(y_t | ρ_t, θ) = (1/σ_ρt)φ((y_t − μ_ρt)/σ_ρt)

Likelihood-Based Model Selection
- Parsimony indicates that more complex models require significant improvement in fit to be worthwhile; for models with an equal number of parameters it is appropriate to choose model with higher log-likelihood
- Likelihood Ratio Test (LRT)
  - Compares embedded models where a model with k₁ parameters is a special case of a more complex model with k₂ > k₁ parameters
  - If l₁ is log-likelihood of simpler model and l₂ is log-likelihood of more complex model, test statistic is 2(l₂ − l₁) and null hypothesis H₀ is that the more complex model provides no significant improvement
  - Test statistic has χ² distribution with degrees of freedom equal to k₂ − k₁

  - Even where models are not embedded LRT may be used, although in this case the χ² distribution is only an approximation
  - Even where models are embedded there may be theoretical problems with LRT; in particular it is not a valid test for the number of regimes in an RS model; results should be viewed with caution
- Other tests
  - Akaike Information Criterion (AIC) uses model that maximizes l − k; therefore each new parameter should add at least 1.0 to log-likelihood
  - Schwartz-Bayes Criterion (SBC) uses model that maximizes l − ½k ln(n), where n = sample size
- Results
  - For TSE data RSLN-2 is best model using any criterion
  - For S&P data RSLN-2 is best model using SBC, but there is a marginal improvement in fit for RSLN-3 using LRT (at p = .01) or AIC (950.7 vs 947.4); evidence in favor of RSLN-3 may not outweigh added complexity

Moment Matching
- A quick method of fitting parameters is to match mean, variance and covariances, and if necessary higher moments, of data to those of distribution
- Note that for LN model matching moments of log-returns against a normal distribution (μ̂ = ȳ = 0.987% and σ̂ = s_y = 4.145%) yields different results than matching moments of accumulation factors against a LN distribution (x̄ = exp(μ + σ²/2) and s²_x = exp(2μ + σ²)(exp(σ²) − 1)), which can be inverted to obtain μ̂ = 0.987% and σ̂ = 4.096%; first version is very close to MLE and has smaller variance than second version
- Matching moments is unreliable method of fitting parameters and emphasizes a fit in center of distribution; overall fit may not be very satisfactory and standard errors can be large; for satisfactory fit it is better to employ more than first two moments
- Common use is as starting values for an iterative optimization procedure
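A short sketch of my own showing the two moment-matching versions described above for the LN model; the simulated input series is a hypothetical stand-in for the TSE/S&P data.

```python
import numpy as np

def moment_match_lognormal(y):
    """Two moment-matching fits from monthly log returns y (the first is close to MLE)."""
    # Version 1: match moments of log returns against a normal distribution
    mu1, sigma1 = y.mean(), y.std(ddof=1)
    # Version 2: match moments of accumulation factors x = exp(y) against a lognormal,
    # then invert  E[x] = exp(mu + sigma^2/2),  Var[x] = exp(2mu + sigma^2)(exp(sigma^2) - 1)
    x = np.exp(y)
    m, v = x.mean(), x.var(ddof=1)
    sigma2 = np.sqrt(np.log(1.0 + v / m**2))
    mu2 = np.log(m) - 0.5 * sigma2**2
    return (mu1, sigma1), (mu2, sigma2)

# Illustrative data (hypothetical, not the actual TSE/S&P series)
rng = np.random.default_rng(42)
y = rng.normal(0.00987, 0.0414, size=528)      # 44 years of monthly log returns
print(moment_match_lognormal(y))
```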

Investment Guarantees Chapter 4

- MLE provides a fit of the whole distribution with emphasis on the center, which contributes more to likelihood than the tails; for separate account products though we may be more interested in left tail probability that results over a period are very poor (or right tail or both tails, for other applications)

Quantile Matching
- Determine model parameters by matching model and empirical quantiles
- Method recommended by SFTF to be required of actuaries assessing guarantee risk
- Precise choice of quantiles to match will have substantial effect on resulting parameters
- Using quantile matching we can get better fit in tail of LN model by sacrificing quality of fit to other portions of distribution, if we want to use it despite poor overall fit

Canadian Calibration Table
- SFTF does not mandate specific distribution or precise nature of equity return model because they don't want to constrain companies unnecessarily or discourage development of new models
- However some restriction was thought necessary to avoid overly optimistic assessment of guarantee liabilities
- Recommended approach allows any model to be used provided it can be calibrated to give a sufficiently fat left tail; report suggests MLE or some other suitable method be used first to estimate parameters and quantile matching then be used to adjust parameters to get adequate left tail fit; ensure left tail and center match while the sacrifice may be a poor fit in right tail
- Canadian Calibration Table specifies maximum acceptable 1/5/10 year accumulation factors at 2.5/5/10 percentiles, derived using stochastic volatility models fitted to data
- Also, 1 year accumulation factor mean should lie in range 1.10-1.12 with standard deviation at least 0.175
- Quantiles for accumulation factors
  - It is difficult to derive appropriate factors because non-overlapping empirical data is very limited (4 ten-year, 8 five-year, 43 one-year) and overlapping periods are not statistically permissible
  - We can use bootstrap method to create new pseudo-samples, e.g. breaking up data into six-month blocks (to allow for some auto-correlation) and randomly sampling 20 of them to create a ten-year time period
  - Doing so generates quantile estimates remarkably consistent with Canadian Calibration Table, which was developed by extrapolation
- LN
  - It is possible to use any two table values to solve for parameters μ and σ, but this tends to give values that lie outside acceptable range for mean
  - Recommended method is to keep mean equal to empirical mean as first equation; then we can use any one table value for second equation; using the parameter that satisfies most stringent test would indicate using 2.5 percentile of one-year factor; however must check that results in fact satisfy all eight other values; also must check that 1 year standard deviation is sufficiently large [6]
  - Calibrated LN model has very similar left tail to RSLN-2

[6] Process is illustrated with example in text and in Carmody p. 27.
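A sketch of my own of the recommended LN calibration step: keep the mean at the empirical mean and hit the 2.5th percentile of the 1-year accumulation factor. The calibration point (0.76) and mean (1.11) are placeholders, not the actual SFTF table values; eliminating μ gives a quadratic in σ.

```python
import numpy as np
from scipy.stats import norm

# Placeholder calibration inputs (NOT the actual SFTF table values)
q_cal = 0.76        # assumed maximum 2.5th percentile of the 1-year accumulation factor
mean_1yr = 1.11     # assumed empirical mean 1-year accumulation factor
z = norm.ppf(0.025) # about -1.96

# Equation 1: E[A_1yr] = exp(12*mu + 6*sigma^2) = mean_1yr
# Equation 2: ln(q_cal) = 12*mu + sqrt(12)*sigma*z
# Eliminating mu:  6*sigma^2 - sqrt(12)*z*sigma - ln(mean_1yr/q_cal) = 0
a, b, c = 6.0, -np.sqrt(12) * z, -np.log(mean_1yr / q_cal)
sigma = (-b + np.sqrt(b**2 - 4*a*c)) / (2*a)          # positive root
mu = (np.log(q_cal) - np.sqrt(12) * sigma * z) / 12
print("calibrated monthly mu=%.5f, sigma=%.5f" % (mu, sigma))
print("check mean:", np.exp(12*mu + 6*sigma**2),
      "check 2.5th percentile:", np.exp(12*mu + np.sqrt(12)*sigma*z))
```

The remaining table cells and the 1-year standard deviation would then still need to be verified against the calibrated parameters.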

- AR(1) process is similar to LN but more tedious
  - Use mean one-year accumulation as one defining equation
  - Use 2.5 percentile one-year factor as second equation
  - Variance of the n-period log accumulation is σ*² = (σ/(1 − a))² Σ_{i=1..n} (1 − aⁱ)²
  - Another reasonable approach is to keep MLE estimate of a and then use the most stringent quantile equation to solve for σ
- RSLN-2 MLE already satisfies all requirements without adjustment
- Calibration by simulation is allowed but fitted values must be adequate with high (95%) probability; useful where analytic calculation is not practical
  - Simulate m values for each accumulation factor; for each cell count how many simulated values M fall below the maximum quantile and use p = M/m as estimate of probability that accumulation factor is less than calibration value
  - Using normal approximation, if p − 1.645√(p(1−p)/m) is greater than required probability (.025, .05, .1) then we are 95% certain that parameters satisfy calibration criterion
  - If not all nine criteria are satisfied, necessary to adjust parameters and repeat simulation
- GARCH
  - Using MLE parameter estimates yields quantiles that are too small at all durations and a mean that is rather large
  - First step is to reduce μ to a value consistent with LN model
  - Increasing any other parameter will increase standard deviation and therefore increase left tail; after some experimentation it appears most appropriate in terms of overall fit to increase a₀ and leave a₁ and a₂ alone
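A minimal sketch of my own of the calibration-by-simulation test just described, applied to one cell; the calibration value, required probability and LN parameters are the same illustrative placeholders used earlier, not actual table values.

```python
import numpy as np

def calibration_check(simulated_factors, calib_value, required_prob):
    """Check one calibration cell by simulation: estimate p = M/m and require the
    95% lower confidence bound p - 1.645*sqrt(p(1-p)/m) to exceed the required probability."""
    m = len(simulated_factors)
    p = np.mean(simulated_factors < calib_value)
    lower_bound = p - 1.645 * np.sqrt(p * (1.0 - p) / m)
    return p, lower_bound, lower_bound > required_prob

# Illustrative: LN 1-year factors with hypothetical calibrated parameters,
# checked against a placeholder 2.5th-percentile calibration value of 0.76
rng = np.random.default_rng(7)
factors = np.exp(rng.normal(12 * 0.00724, 0.0532 * np.sqrt(12), size=50_000))
print(calibration_check(factors, calib_value=0.76, required_prob=0.025))
```

Note that a parameter set fitted exactly to a calibration point will typically have its lower confidence bound just below the requirement, which is why the approach calls for adjusting parameters and re-running until all cells pass with margin.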

Investment Guarantees Chapter 5

Bayesian Statistics
- Gives a scientific but straightforward method for quantifying effects of parameter uncertainty, without resorting to asymptotic arguments
- Parameter uncertainty may significantly affect estimated capital requirements
- Bayes Theorem relates joint, conditional and marginal functions: f(A, B) = f(A | B) f(B)
- Used in Bayesian parameter estimation with the unknown parameter vector θ as one random variable and the random sample used to fit the distribution as another random variable; originally the method was constrained by difficulty in combining the data distribution and the parameter distribution; modern techniques have very substantially removed this restriction and Bayesian methods are now widely used in every area of statistical inference
- Before any data are collected we may have some information about the parameter vector, expressed in terms of an assumed prior probability distribution π(θ); lacking any previous information a flat or high variance prior is chosen; after observing data x it is possible to construct the posterior probability distribution conditional on the data, π(θ | x)
- L(θ; x) = f(x | θ), f(x, θ) = L(θ; x)π(θ) and π(θ | x) = L(θ; x)π(θ)/f(x)
- Predictive distribution f(x̃ | x) = ∫ f(x̃ | θ)π(θ | x)dθ is used to model future observations of x
- E[θ | x] is a multivariate parameter estimator, giving information about how parameters are interrelated
- With MLE we know only asymptotic relationships between parameter estimates; with Bayesian approach we derive full joint distributions between parameters; price paid is additional structure imposed with prior distribution

Markov Chain Monte Carlo (MCMC)
- For all but very simple models, direct calculation of posterior distribution not possible
- For a long time this limited applicability of Bayesian approach; in 1980s MCMC technique developed
- Allows one to simulate a sample from the posterior distribution even if we don't know its analytic form, and get parameter estimates, confidence intervals and parameter correlations from the sample
- Technically MCMC is used to construct a Markov Chain that has the required posterior distribution as its stationary distribution
- If we generate a large number of simulated values, after a while the process will reach the stationary distribution (early values are burn-in and discarded) and from that point on generates random values from the posterior distribution; use these values to estimate distribution functions
- Determining when the process has converged is complex and technical; in practice a way of checking is to rerun from a few different seed values to ensure results are stable
- Method effective because we can update the parameter vector one parameter at a time; makes simulation much easier to construct
- Suppose we are estimating a three-parameter distribution θ = (μ, a, b); given data y and θ^(r) = (μ^(r), a^(r), b^(r)) we find π(μ | y, a^(r), b^(r)) and simulate a value μ^(r+1), which we can then use to simulate a^(r+1) ~ π(a | y, μ^(r+1), b^(r)) and b^(r+1) ~ π(b | y, μ^(r+1), a^(r+1))
- Problem reduces to simulating from posterior distributions for each parameter assuming known values for all remaining parameters

- π(θ_i | y, θ_{~i}) ∝ f(y | θ)π(θ)

Metropolis-Hastings Algorithm (MHA)
- If we can find a closed form for the conditional probability function we can simulate directly from that distribution (Gibbs Sampler Method); if there is no closed form we may be able to use MHA, which is relatively straightforward to apply provided the likelihood function can be calculated
- Prior distributions
  - For each parameter we assign a prior distribution before simulation; these can be independent or, if there is a reason, joint distributions for subsets of parameters can also be used
  - MLE (or if not available then center) of prior distribution is a good starting point for parameter; if variance of prior distribution is very large then prior distribution will have little effect on posterior distribution, which will depend strongly on data alone; if variance of prior distribution is low then new data will have little effect on posterior distribution, which will depend strongly on prior distribution
- Candidate distribution q(θ̃; θ_i)
  - A random value θ̃ from candidate distribution will be accepted or rejected as new value θ_i^(r+1) using an acceptance probability
  - For candidate distribution we can use any distribution that spans parameter space for θ_i but some will be more efficient than others in terms of speed with which they reach stationary distribution; choosing requires some experimentation
  - For unrestricted parameters such as mean for AR, ARCH or GARCH, normal distribution centered on previous value of parameter has advantages and is a common choice
  - Normal distribution can sometimes be used even if parameter space is restricted, provided probability of generating a value outside parameter space is kept to near impossibility; for variance parameters that must be strictly positive, popular distributions in literature are gamma and inverted gamma, and again there are advantages in centering candidate distribution on previous value of series
- Acceptance-rejection procedure
  - α = min(1, π(θ̃ | y, θ^(r+1,r)_{~i}) q(θ_i^(r) | θ̃) / [π(θ_i^(r) | y, θ^(r+1,r)_{~i}) q(θ̃ | θ_i^(r))]), where θ^(r+1,r)_{~i} denotes the other parameters at their most recent values
  - π(θ̃ | y, θ^(r+1,r)_{~i}) / π(θ_i^(r) | y, θ^(r+1,r)_{~i}) = (L(θ̃, θ^(r+1,r)_{~i})π(θ̃)/f(y)) × (f(y)/(L(θ_i^(r), θ^(r+1,r)_{~i})π(θ_i^(r))))
  - so α = min(1, L(θ̃, θ^(r+1,r)_{~i})π(θ̃)q(θ_i^(r) | θ̃) / [L(θ_i^(r), θ^(r+1,r)_{~i})π(θ_i^(r))q(θ̃ | θ_i^(r))])
  - If a random value u < α then set θ_i^(r+1) = θ̃, else set θ_i^(r+1) = θ_i^(r)
- If prior distribution is disperse it will not have large effect on calculation because it will be numerically much smaller than likelihood, so major part of acceptance probability is ratio of likelihood with candidate value to likelihood with previous value
- Process is very similar to a MC search for joint MLE
- Poor choice for candidate distribution will result in acceptance probabilities being too low (series takes a long time to converge to limiting distribution) or too high (values tend not to reach tails of limiting distribution)
- Acceptance probabilities should lie in range [0.15, 0.5] (Roberts 1996)
- It is always important to check sample paths for parameters when using MHA [7]

[7] Mocciolo pp. 44-46 provides an excellent explanation of MCMC and MHA.
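A minimal sketch of my own of a single-parameter Metropolis-Hastings update of the kind described above, with a normal candidate centered on the previous value (so the q terms cancel); the prior and candidate standard deviations and the toy data are illustrative choices, not Hardy's.

```python
import numpy as np
from scipy.stats import norm

def mh_update_mu(mu_curr, log_lik, prior_sd=0.02, cand_sd=0.005, rng=None):
    """One MH update of mu. Acceptance ratio = (likelihood x prior) at the candidate
    vs the current value, since the symmetric normal candidate terms cancel."""
    rng = rng if rng is not None else np.random.default_rng()
    mu_cand = rng.normal(mu_curr, cand_sd)
    log_alpha = (log_lik(mu_cand) + norm.logpdf(mu_cand, 0.0, prior_sd)
                 - log_lik(mu_curr) - norm.logpdf(mu_curr, 0.0, prior_sd))
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return mu_cand, True
    return mu_curr, False

# Toy example: data are iid N(mu, 0.04^2); run a short chain for mu
rng = np.random.default_rng(3)
y = rng.normal(0.01, 0.04, size=200)
log_lik = lambda mu: np.sum(norm.logpdf(y, mu, 0.04))
chain, mu, n_acc = [], 0.0, 0
for _ in range(2000):
    mu, acc = mh_update_mu(mu, log_lik, rng=rng)
    chain.append(mu); n_acc += acc
print("posterior mean %.4f, acceptance rate %.2f" % (np.mean(chain[500:]), n_acc / 2000))
```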

MCMC for RSLN
- Without strong prior information, as in this case, it is appropriate to set the variance of the prior distribution large enough that the effect of the prior distribution is small
- Model
  - Priors μ₁, μ₂ ~ N(0, 0.02²)
  - Candidate distribution for first regime is N(μ₁^(r), 0.005²) and for second regime is N(μ₂^(r), 0.02²)
  - q(μ̃ | μ₁) = (1/(0.005√(2π))) exp(−½((μ̃ − μ₁)/0.005)²), and similarly for μ₂
  - α = min(1, L(μ̃, θ_{~i}) φ(μ̃/0.02) / [L(μ₁^(r), θ_{~i}) φ(μ₁^(r)/0.02)]) since the symmetric normal candidate terms cancel; acceptance probabilities are around 0.4, and similarly for μ₂
  - Prior distribution for precision τ = σ⁻² is gamma with mean 865 and variance 849² in regime 1 and mean 190 and variance 1000² in regime 2
  - Candidate distributions are also gamma; for τ₁^(r+1) we use mean τ₁^(r) and standard deviation τ₁^(r)/2.75; for τ₂^(r+1) we use mean τ₂^(r) and standard deviation τ₂^(r)/1.5; acceptance probabilities are around 0.20-0.35
  - p_ij are constrained to lie in (0,1) which indicates a beta distribution; prior distributions are p₁₂ ~ beta(2, 48) and p₂₁ ~ beta(2, 6)
  - Candidate distributions are also beta: p̃₁₂ ~ beta(1.2, 28.8) and p̃₂₁ ~ beta(1, 3); acceptance probabilities are around 0.35
- Results
  - Very similar to MLE, which is not surprising because method is very close to MLE especially with such disperse prior distributions
  - Standard deviations also correspond closely to estimated standard errors of MLE, but these slightly understate standard errors for parameters because estimates are serially correlated; effect is reduced by using every 20th value in standard deviation calculations
  - Estimated marginal density functions for parameters are very similar for TSE and S&P in regime 1 (low volatility) but are different for regime 2 (high volatility); analysis of timing of regime switches shows that whenever S&P is in regime 2 so is TSE, but TSE also makes an occasional foray into high-volatility regime while S&P is in low-volatility regime; explanation appears to be that there are influences specific to Canadian market that do not affect US market
  - Typically using MLE we assume estimates are normally distributed; here our sample size is small and graphs show that parameter estimates are not at all normally distributed

Simulating Predictive Distribution
- Once we have generated a sample from posterior distribution for parameter vector, we can generate a sample from predictive distribution; advantage of using predictive distribution is that it implicitly allows for parameter uncertainty
- Predictive distribution can be written as E_{θ|x}[f(x̃ | θ)] while using mean of posterior distribution as point estimate is equivalent to using f(x̃ | E[θ | x])
- Around medians these two distributions will be similar
- However since the first allows for process variability and parameter variability whereas the second allows for only process variability, we expect variance of predictive distribution to be higher
- Executing this process for 10-year accumulation factors gave a distribution with fatter tails; this has financial implications for equity-linked liabilities
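A sketch of my own contrasting the predictive distribution with the point-estimate approach, using the simple LN model rather than RSLN-2 to keep it short; the "posterior sample" here is synthetic, standing in for real MCMC output, and all numbers are placeholders.

```python
import numpy as np

def predictive_vs_point(posterior_mu, posterior_sigma, n_months=120, seed=0):
    """Predictive: draw one posterior parameter vector per scenario (includes parameter
    uncertainty). Point: simulate at the posterior means only (process risk only)."""
    rng = np.random.default_rng(seed)
    n = len(posterior_mu)
    pred = np.exp(rng.normal(posterior_mu * n_months,
                             posterior_sigma * np.sqrt(n_months)))
    mu_bar, sig_bar = posterior_mu.mean(), posterior_sigma.mean()
    point = np.exp(rng.normal(mu_bar * n_months, sig_bar * np.sqrt(n_months), size=n))
    return np.percentile(pred, [1, 5, 50]), np.percentile(point, [1, 5, 50])

# Synthetic posterior sample (placeholder for real MCMC output)
rng = np.random.default_rng(1)
post_mu = rng.normal(0.008, 0.001, size=20_000)
post_sigma = np.abs(rng.normal(0.045, 0.004, size=20_000))
print(predictive_vs_point(post_mu, post_sigma))
```

The predictive percentiles sit below the point-estimate percentiles in the left tail, illustrating the fatter tails noted above.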

Investment Guarantees Chapter 7

Guarantee Liability as Derivative Security
- Actuaries in some areas have been slow to fully accept and implement financial economics; some actuaries feel it is too risky to use, but for contracts involving investment guarantees it may be more risky not to use it
- GMMB is a European put where K is the guarantee, T is the maturity date and F_T is the segregated fund value, so the payoff is (K − F_T)+, and it can be analyzed using BSM

Replication and No Arbitrage Pricing
- Suppose an insurer had a liability to pay in one month an amount exactly equal to the price at that time of one unit of a risky asset; the traditional actuarial approach would be to take the expected value of the risky asset using some realistic probability distribution and discount at some rate; such a calculation is essentially worthless; the insurer can just buy one unit of the risky asset now and it will have enough to precisely meet the liability due in one month
- Replicating portfolio will always have value P = (p*V_u + (1 − p*)V_d)e^(−r), where p* is the risk-neutral probability of an up move
- Hardy discusses the law of one price, the replicating portfolio for puts using a risky asset and a riskless asset in a one-period binomial model, and then dynamic hedging in a two-period binomial model [8]

[8] There's no need to reproduce that material in these notes; I know that stuff cold from Shreve I.
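A minimal sketch of my own of the one-period binomial replication for a put, assuming a non-dividend-paying risky asset; the numbers (S0, K, u, d, r) are purely illustrative.

```python
import numpy as np

def binomial_put_replication(S0, K, u, d, r):
    """One-period binomial replication of a European put: solve for the stock holding
    (delta) and riskless bond holding that reproduce the payoff in both states; the
    cost equals the risk-neutral price P = (p*V_u + (1-p*)V_d) e^{-r}."""
    Vu, Vd = max(K - S0 * u, 0.0), max(K - S0 * d, 0.0)
    delta = (Vu - Vd) / (S0 * (u - d))            # units of the risky asset (negative for a put)
    bond = np.exp(-r) * (Vu - delta * S0 * u)     # riskless holding at time 0
    p_star = (np.exp(r) - d) / (u - d)            # risk-neutral up probability
    price = np.exp(-r) * (p_star * Vu + (1 - p_star) * Vd)
    assert abs(delta * S0 + bond - price) < 1e-9  # replication cost equals the price
    return delta, bond, price

# Illustrative numbers (hypothetical)
print(binomial_put_replication(S0=100.0, K=100.0, u=1.1, d=0.95, r=0.02))
```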