P2.T5. Market Risk Measurement & Management — Jorion, Value-at-Risk: The New Benchmark for Managing Financial Risk, 3rd Edition. Bionic Turtle FRM Study Notes by David Harper, CFA FRM CIPM, and Deepa Raju. www.bionicturtle.com
Jorion, Chapter 6: Backtesting VaR

Contents:
- Define backtesting and exceptions and explain the importance of backtesting VaR models
- Explain the significant difficulties in backtesting a VaR model
- Verify a model based on exceptions or failure rates
Learning objectives:
- Define backtesting and exceptions and explain the importance of backtesting VaR models.
- Explain the significant difficulties in backtesting a VaR model.
- Verify a model based on exceptions or failure rates.
- Define and identify Type I and Type II errors.
- Explain the need to consider conditional coverage in the backtesting framework.
- Describe the Basel rules for backtesting.

Define backtesting and exceptions and explain the importance of backtesting VaR models.

Model validation is the process of asking, "Is this an adequate model?" and/or "Is this model consistent with reality?" Validation tools include:
- Backtesting
- Stress testing
- Independent review and oversight

Backtesting: Backtesting attempts to verify whether actual losses are reasonably consistent with projected losses. It compares the history of value at risk (VaR) forecasts to actual (realized) portfolio returns. It is important to backtest VaR models because:
- Backtesting gives a reality check on whether VaR forecasts are well calibrated,
- Backtesting is central to the Basel Committee's ground-breaking decision to allow internal VaR models for capital requirements, and
- Under the Basel II internal models approach (IMA) to market risk, banks must backtest their VaR model (in addition to stress-testing); i.e., the green/yellow/red traffic-light zones.

Exceptions: We can backtest a VaR model with relative ease. When the VaR model is perfectly calibrated, the number of observations falling outside VaR should align with the confidence level. Specifically, the percentage of observed exceptions should be approximately equal to the VaR significance level, where significance is one minus the confidence level. The number of exceptions is also known as the number of exceedances: simply the number of days on which the loss exceeds the VaR level. When too many exceptions are observed, the model is bad and underestimates risk.
This is a major problem because too little capital may be allocated to risk-taking units; penalties may also be imposed by the regulator. When too few exceptions are observed, this is also problematic because it leads to an inefficient allocation of capital across units.
In summary, the number of loss observations (e.g., daily losses) that exceed the VaR is called the number of exceedances or exceptions. For example, if the VaR model is perfectly calibrated:
- A 95.0% daily VaR should be exceeded about 13 days per year (5% × 252 = 12.6)
- A 99.0% daily VaR should be exceeded about 8 days per three years (3 × 252 × 1% = 7.6)

Jorion on backtesting: "Backtesting is a formal statistical framework that consists of verifying that actual losses are in line with projected losses. This involves systematically comparing the history of VaR forecasts with their associated portfolio returns. These procedures, sometimes called reality checks, are essential for VaR users and risk managers, who need to check that their VaR forecasts are well calibrated. If not, the models should be reexamined for faulty assumptions, wrong parameters, or inaccurate modeling. This process also provides ideas for improvement and as a result should be an integral part of all VaR systems. Backtesting is also central to the Basel Committee's ground-breaking decision to allow internal VaR models for capital requirements. It is unlikely the Basel Committee would have done so without the discipline of a rigorous backtesting mechanism. Otherwise, banks may have an incentive to understate their risk. This is why the backtesting framework should be designed to maximize the probability of catching banks that willfully understate their risk. On the other hand, the system also should avoid unduly penalizing banks whose VaR is exceeded simply because of bad luck. This delicate choice is at the heart of statistical decision procedures for backtesting."

Explain the significant difficulties in backtesting a VaR model.

There are at least two difficulties when backtesting a VaR model:
- Backtesting remains a statistical decision to accept or reject (effectively a null hypothesis), such that we cannot avoid the risk of committing one of the two possible errors (i.e., a Type I or Type II error).
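The exceedance arithmetic above can be sketched in a few lines of Python. This is a minimal illustration of the source's own calculations (12.6 and 7.6 expected exceptions); the function name is ours, not Jorion's:

```python
def expected_exceptions(confidence: float, trading_days: int) -> float:
    """Expected number of VaR exceedances for a perfectly calibrated model:
    significance level (1 - confidence) times the number of trading days."""
    return (1.0 - confidence) * trading_days

# 95.0% daily VaR over one year (252 days): about 13 exceptions
print(round(expected_exceptions(0.95, 252), 1))      # 12.6
# 99.0% daily VaR over three years (3 * 252 days): about 8 exceptions
print(round(expected_exceptions(0.99, 3 * 252), 2))  # 7.56
```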
Consequently, backtesting can never tell us with 100.0% confidence whether our model is good or bad; our decision to deem the model good or bad is itself a probabilistic (less-than-certain) evaluation.
- An actual portfolio is contaminated by (dynamic) changes in portfolio composition (i.e., trades and fees), but the VaR assumes a static portfolio.
  - Contamination will be minimized only over short horizons.
  - The risk manager should track both the actual portfolio return and the hypothetical return (representing a static portfolio). If the model passes backtesting with hypothetical but not actual returns, then the problem lies with intraday trading. In contrast, if the model does not pass backtesting with hypothetical returns, then the modeling methodology should be reexamined.
  - Sometimes a cleaned-return approximation is used instead of the actual return: the actual return minus all non-mark-to-market items such as fees, commissions, and net income.
Verify a model based on exceptions or failure rates.

We verify a model by recording the failure rate, which is the proportion of times VaR is exceeded in a given sample. Under the null hypothesis of a correctly calibrated model (null H0: correct model), the number of exceptions (x) follows a binomial probability distribution:

f(x) = C(T, x) × p^x × (1 − p)^(T − x)

The expected value of x is p × T, and the variance is σ²(x) = p × (1 − p) × T. By characterizing failures with a binomial distribution, we are assuming that exceptions (failures) are independent and identically distributed (i.i.d.) random variables.

Let's illustrate with Jorion's own example. The assumptions are:
- The backtest (aka, estimation) window is one year with 250 trading days; T = 250
- The bank employed a 99.0% confidence value at risk (VaR) model; p = 0.010
- The backtest analyzes the results of an actual, observed (realized) series of results. Because each daily outcome either exceeded the VaR or did not, the historical window of observations is characterized by an (i.i.d.) binomial distribution.

The table below illustrates (on the left) the distribution if the model is calibrated correctly, specifically when p = 1.0%, and (on the right) the distribution if the model is not calibrated correctly, in this case when p = 3.0%. Unlike the correct model, we can specify multiple incorrect models; e.g., p = 2.0% or 5.0% are both incorrect, so the incorrect-model choice is arbitrary. Finally, the red regions reflect a selected, arbitrary cutoff of five (5) or more exceptions. We will therefore reject the null (i.e., deem the model "bad") if we observe five or more exceptions. But this is a probabilistic decision that weighs the Type I versus Type II trade-off.

             Correct model    Incorrect models
No. of       p = 0.01         p = 0.02    p = 0.03    p = 0.04    p = 0.05
exceptions   T = 250          T = 250     T = 250     T = 250     T = 250
 0           8.11%            0.6%        0.0%        0.0%        0.0%
 1           20.5%            3.3%        0.4%        0.0%        0.0%
 2           25.7%            8.3%        1.5%        0.2%        0.0%
 3           21.5%            14.0%       3.8%        0.7%        0.1%
 4           13.4%            17.7%       7.2%        1.8%        0.3%
 5           6.7%             17.7%       10.9%       3.6%        0.9%
 6           2.7%             14.8%       13.8%       6.2%        1.8%
 7           1.0%             10.5%       14.9%       9.0%        3.4%
 8           0.3%             6.5%        14.0%       11.3%       5.4%
 9           0.1%             3.6%        11.6%       12.7%       7.6%
10           0.0%             1.8%        8.6%        12.8%       9.6%
11                            0.8%        5.8%        11.6%       11.1%
12                            0.3%        3.6%        9.6%        11.6%
13                            0.1%        2.0%        7.3%        11.2%
14                            0.0%        1.1%        5.2%        10.0%
15                            0.0%        0.5%        3.4%        8.2%

Cutoff of five (5) or more exceptions: reject the model (but a possible Type I error if the model is correct); four or fewer: accept the model (but a possible Type II error if the model is incorrect).
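The binomial probabilities in the table can be reproduced directly from the pmf formula above. A minimal sketch (helper name is ours), using only the standard library:

```python
from math import comb

def binom_pmf(x: int, T: int, p: float) -> float:
    """Probability of exactly x exceptions in T days,
    given a daily exceedance probability p: C(T,x) * p^x * (1-p)^(T-x)."""
    return comb(T, x) * p**x * (1.0 - p)**(T - x)

# Correct model: 99% VaR (p = 0.01) over T = 250 days,
# matching the left column of the table
for x in range(6):
    print(x, f"{binom_pmf(x, 250, 0.01):.1%}")
# 0 8.1%, 1 20.5%, 2 25.7%, 3 21.5%, 4 13.4%, 5 6.7%
```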
Specifically, in the scenario illustrated above:

The left-hand panel characterizes the distribution of a correct 99.0% model (p = 0.01). Over 250 trading days, we should not be surprised to observe, for example, three exceptions, because the probability of that outcome is fully 21.5%. On the other hand, the probability is only 0.0806% (rounded to 0.1% above) that a correct 99.0% VaR model would produce nine (9) exceptions.

Type I error: Assume we reject the null hypothesis if we observe five or more exceptions (per our selected cutoff), and say we observe six exceptions. The model might nevertheless be correct, with the six exceptions merely a random (sampling-variation) outcome. In this case, where the model is correct but we reject the null (i.e., because we expected two or three exceptions but observed six), we commit a Type I error. In fact, if the cutoff is five or more exceptions, the probability of a Type I error is 10.8% (the sum of the red-region values in the left panel, which is given by 100% − 89.2%). This 10.8% is the probability of rejecting a correct model.

Type II error: If the VaR model is actually incorrect (above-right panel, where p = 0.03) but only four or fewer exceptions are observed, we commit a Type II error by accepting (not rejecting) an incorrect model. Under this 3.0% assumption, the probability of such a Type II error is 12.8%, the cumulative probability of observing four or fewer exceptions.

The same distributions are plotted in the graphs below.
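The 10.8% and 12.8% error probabilities above follow from the binomial cumulative distribution. A minimal sketch (function name is ours) under the same assumptions (cutoff of five, T = 250):

```python
from math import comb

def binom_cdf(k: int, T: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(T, p)."""
    return sum(comb(T, x) * p**x * (1 - p)**(T - x) for x in range(k + 1))

cutoff = 5  # reject the model on 5 or more exceptions

# Type I: reject a correct model (p = 0.01): P(X >= 5)
type_1 = 1.0 - binom_cdf(cutoff - 1, 250, 0.01)
# Type II: accept an incorrect model (p = 0.03): P(X <= 4)
type_2 = binom_cdf(cutoff - 1, 250, 0.03)

print(f"Type I:  {type_1:.1%}")   # 10.8%
print(f"Type II: {type_2:.1%}")   # 12.8%
```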
Above we showed four "incorrect model" scenarios to highlight the challenge of specifying the incorrect-model scenario. Each of these assumptions reflects an incorrect 99.0% VaR model because the probabilities are, respectively, 2.0%, 3.0%, 4.0% and 5.0%. Each of these different incorrect models, of course, implies a different binomial distribution.

Under the dubious assumption of independence (recall the binomial assumes i.i.d.), the binomial model can be used to test whether the number of exceptions is acceptably small. If the number of observations is large, we can approximate the binomial with the normal distribution using the central limit theorem. Technically, one test is that both p × T and (1 − p) × T are at least 10; e.g., if T = 250 and p = 1.0%, then p × T = 2.5 is not greater than 10 and we are not strictly justified in employing the normal approximation. Even so, the binomial already tends toward the normal as the sample size is large.

Normal approximation of the binomial distribution (in applying the backtest). Jorion provides a shortcut based on the normal approximation:

z = (x − pT) / √(p(1 − p)T) ≈ N(0, 1)

where:
- x is the number of observed exceedances; e.g., VaR was exceeded x times over the backtesting window.
- pT is the ex ante expected number of exceptions; e.g., we expect a 95.0% VaR to be exceeded on (0.05 × T) days over a T-day horizon because that is the mean of the binomial distribution.
- (x − pT) is the distance from our observation (sample count) to the hypothesized mean, if the model is correct.
- √(p(1 − p)T) is the standard deviation, or standard error, which standardizes the distance, such that z is approximately a standard normal variable.
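Jorion's normal-approximation shortcut translates directly into code. A minimal sketch (function name is ours), previewing the J.P. Morgan numbers used in the next example:

```python
from math import sqrt

def backtest_z(x: int, T: int, p: float) -> float:
    """Normal-approximation z-score for x exceptions over T days:
    z = (x - p*T) / sqrt(p * (1 - p) * T)."""
    return (x - p * T) / sqrt(p * (1.0 - p) * T)

# 20 exceptions of a 95% VaR (p = 0.05) over 252 days
z = backtest_z(20, 252, 0.05)
print(round(z, 2))  # 2.14 > 1.96, so reject the null at 95% backtest confidence
```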
Example of normal approximation: 95.0% VaR and 95.0% backtest confidence

In Jorion's J.P. Morgan example (Jorion Box 6-1), the VaR confidence level and the backtest confidence are both 95.0%, while the sample consists of 252 trading days. The cutoff is given by (note the normal deviate of 1.96, because this is a two-tailed significance test):

Cutoff = (5% × 252) + 1.96 × √(5% × 95% × 252) = 12.6 + 1.96 × 3.46 = 19.38

In the actual (observed) scenario, revenue falls short of the 95.0% VaR on fully 20 days (out of 252 total days). The z value of 2.14 is larger than 1.96; put another way, 20 occurrences exceed the cutoff value of 19.38. Therefore, we reject the null hypothesis that the VaR is unbiased and decide, based on a desired 95.0% backtest confidence (which informs the 1.96), that 20 exceptions cannot be explained by luck (sampling variation) alone.

Example of normal approximation: 95.0% VaR and 99.0% backtest confidence

In order to highlight the distinction between the (one-tailed) VaR confidence level and the two-tailed backtest confidence level, the cutoff is illustrated below under the same assumptions except that the backtest confidence level is 99.0%. Notice the standard deviation and corresponding normal distribution (which serves as the approximation) are unchanged, but the cutoff shifts from 20 to 22 to reflect the higher sought backtest confidence.
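The two cutoffs above can be reproduced with the same mean-plus-deviate arithmetic. A minimal sketch (function name is ours; 2.576 is the standard two-tailed 99% normal deviate):

```python
from math import sqrt

def exception_cutoff(T: int, p: float, z_deviate: float) -> float:
    """Upper cutoff on the exception count: binomial mean p*T plus
    z_deviate standard errors, sqrt(p * (1 - p) * T)."""
    return p * T + z_deviate * sqrt(p * (1.0 - p) * T)

# 95% VaR (p = 0.05) over T = 252 days
print(round(exception_cutoff(252, 0.05, 1.96), 2))   # 19.38 at 95% backtest confidence
print(round(exception_cutoff(252, 0.05, 2.576), 2))  # 21.51 at 99% backtest confidence
```

Rounding up to whole days gives the 20- and 22-exception cutoffs cited in the text.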