Backtesting Trading Book Models Using Estimates of VaR, Expected Shortfall and Realized p-values
Alexander J. McNeil, Heriot-Watt University, Edinburgh
ETH Risk Day, 11 September 2015
AJM (HWU) Backtesting Trading Book Models ETH Risk Day 2015
Overview
1 Introduction
2 Backtesting Value-at-Risk
3 Backtesting Expected Shortfall
4 Backtesting Realized p-values
5 Conclusions
Aims
To review the standard paradigm for backtesting VaR and ask (briefly) whether elicitability has a role to play in this.
To look at whether an analogous procedure can be adopted for expected shortfall (ES). We will find that the paradigm has to be extended to one where VaR and ES estimates are jointly evaluated.
To point out that backtesting of realized p-values is a potential alternative to backtesting of VaR and ES.
Set-Up for Modelling Trading Book Losses
A bank models trading book P&L on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \mathbb{N}_0}, P)$, where $\mathcal{F}_t$ represents the information available to the risk modelling group at time $t$.
For $t \in \mathbb{N}$ the trading book loss over $[t-1, t]$ is the negative P&L, given by an expression of the form $L_t = l_{[t-1]}(X_t)$, where $l_{[t-1]}$ is an $\mathcal{F}_{t-1}$-measurable function and $X_t$ is a random vector of changes in fundamental risk factors over $[t-1, t]$.
The loss operator $l_{[t-1]}$ is determined by (i) the portfolio mapping at time $t-1$ (choice of risk factors and valuation models, portfolio weights) and (ii) the risk factor values at time $t-1$.
We denote the conditional loss distribution by
\[ F_t(x) = P(L_t \le x \mid \mathcal{F}_{t-1}) \tag{1} \]
and assume that this is a continuous distribution for all $t$.
Estimation and Backtesting
For $t \in \mathbb{N}$ the risk modelling group forms an estimate $\hat{F}_t$ of $F_t$ based on information up to time $t-1$.
The estimate $\hat{F}_t$ is used to compute estimates of Value-at-Risk and expected shortfall. We denote the true underlying values of these quantities by $\mathrm{VaR}^t_\alpha$ and $\mathrm{ES}^t_\alpha$ and the corresponding estimates by $\widehat{\mathrm{VaR}}^t_\alpha$ and $\widehat{\mathrm{ES}}^t_\alpha$.
The models may be parametric or non-parametric (e.g. historical simulation).
Backtesting consists of a series of comparisons of $L_t$ with risk measures derived from $\hat{F}_t$ (such as $\widehat{\mathrm{VaR}}^t_\alpha$) for $t = 1, \ldots, n$.
Each comparison is a one-off, unrepeatable experiment because the loss operator $l_{[t-1]}$ changes at each time point, as does the conditional distribution $F_{X_t \mid \mathcal{F}_{t-1}}$ of risk-factor changes.
Testing Philosophy
As pointed out by Davis (2014), the hypothesis of correct estimation of $F_t$ by $\hat{F}_t$ at a single time point is unfalsifiable on the basis of a single experiment. We can be almost certain that all of our models $\hat{F}_t$ are wrong!
Davis argues that the correct formalism for backtesting is to view it as a forecasting problem, and that the evaluation of the quality of the forecasts should respect the weak prequential principle of Dawid (1984): the metric we use should be based only on the predictions we make (for example the VaR estimates) and the realized losses $(L_t)$.
The standard VaR backtesting approach, based on the number and pattern of VaR violations, conforms to this paradigm. We ask:
Based on $(\widehat{\mathrm{VaR}}^t_\alpha, L_t)_{t=1,\ldots,n}$, are there any grounds to reject the hypothesis that the sequence of estimates $(\widehat{\mathrm{VaR}}^t_\alpha)_{t=1,\ldots,n}$ are the true quantiles of the conditional loss distributions $(F_t)_{t=1,\ldots,n}$?
\[ H_0 : \widehat{\mathrm{VaR}}^t_\alpha = \mathrm{VaR}^t_\alpha, \quad t = 1, \ldots, n. \]
How Agents View Backtesting
The regulator has mandated that capital for the trading book be linked to a risk measure, currently VaR. It wants to be satisfied that the bank is capable of estimating the risk measure.
The bank may have twin objectives. It wants to submit backtest results that satisfy the regulator. It doesn't want to submit risk measure estimates that are unnecessarily large.
Neither is interested in the question of whether the bank is providing optimal forecasts $\hat{F}_t$.
For internal risk management purposes, such as desk limit setting, a prudent bank might be more motivated to seek improvements in modelling.
With the proposed change to expected shortfall as the primary risk measure, the main question of interest is whether a backtesting regime for ES similar to that for VaR can be established.
Properties of VaR Violations
For $t \in \mathbb{N}$ let $I_t = I_{\{L_t > \mathrm{VaR}^t_\alpha\}}$. The process $(I_t)_{t \in \mathbb{N}}$ of indicator variables for exceedances of the true VaR values satisfies
\[ E(I_t \mid \mathcal{F}_{t-1}) = P(L_t > \mathrm{VaR}^t_\alpha \mid \mathcal{F}_{t-1}) = 1 - \alpha. \tag{2} \]
This property can be expressed in terms of a calibration function by writing
\[ E\left(h_\alpha(\mathrm{VaR}^t_\alpha, L_t) \mid \mathcal{F}_{t-1}\right) = 0, \tag{3} \]
where $h_\alpha$ is the calibration function given by
\[ h_\alpha(q, l) = I_{\{l > q\}} - (1 - \alpha). \tag{4} \]
Bernoulli Trials Process
For $t \in \mathbb{N}$ let $\hat{I}_t = I_{\{L_t > \widehat{\mathrm{VaR}}^t_\alpha\}}$. We test
\[ H_0 : E\left(h_\alpha(\widehat{\mathrm{VaR}}^t_\alpha, L_t) \mid \mathcal{F}_{t-1}\right) = 0, \quad \forall t. \tag{5} \]
Lemma: The martingale difference property (5) holds if and only if $(\hat{I}_t)_{t \in \mathbb{N}}$ is a Bernoulli trials process with $E(\hat{I}_t) = 1 - \alpha$.
This just means that $(\hat{I}_t)$ is a sequence of iid Bernoulli events.
The implication is that the number of violations in $n$ trials should be a binomially distributed random variable, $B(n, 1 - \alpha)$, and the spacings between consecutive violations should be independent geometrically distributed random variables with mean $1/(1 - \alpha)$.
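The binomial implication above can be checked directly from a backtest record. The following is a minimal sketch, not a prescribed procedure: the function names are hypothetical, and the two-sided p-value uses one common convention (summing the probabilities of all outcomes no more likely than the observed count).

```python
import math

def violation_indicators(losses, var_estimates):
    """Return 1 where the realized loss exceeds the VaR estimate, else 0."""
    return [1 if loss > var else 0 for loss, var in zip(losses, var_estimates)]

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), computed on the log scale to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def binomial_test(num_violations, n, alpha):
    """Two-sided exact p-value for H0: number of violations ~ B(n, 1 - alpha)."""
    p = 1.0 - alpha
    observed = binomial_pmf(num_violations, n, p)
    all_pmfs = (binomial_pmf(j, n, p) for j in range(n + 1))
    # Sum the probability of every outcome no more likely than the observed one.
    total = sum(pmf for pmf in all_pmfs if pmf <= observed * (1.0 + 1e-9))
    return min(1.0, total)

# Toy usage: 500 days at alpha = 0.975, so 12.5 violations are expected.
p_ok = binomial_test(12, 500, 0.975)    # close to the expected count
p_bad = binomial_test(30, 500, 0.975)   # far too many violations
```

A count near the expected 12.5 gives a large p-value, while 30 violations in 500 days gives strong evidence against the Bernoulli trials hypothesis.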
Possible Tests
A test for binomial behaviour can be based on a likelihood ratio statistic (Christoffersen 1998), a score statistic, or direct comparison with the binomial distribution.
Christoffersen (1998) proposed a test for independence of violations against the alternative of first-order Markov behaviour; a similar test is considered in Davis (2014).
Christoffersen and Pelletier (2004) proposed a test based on the spacings between violations. The null hypothesis of exponential spacings (constant hazard model) is tested against a Weibull alternative (in which the hazard function may be increasing or decreasing). See also Berkowitz et al. (2011).
A regression-based approach using the CAViaR framework of Engle and Manganelli (2004) works well. This involves fitting a model where
\[ E(\hat{I}_t \mid \mathcal{F}_{t-1}) = p_t = g(\theta_t), \qquad \theta_t = \mu + \beta \hat{I}_{t-1} + \gamma \widehat{\mathrm{VaR}}^t_\alpha \tag{6} \]
for a link function $g$, and testing that $\beta = \gamma = 0$ and $g(\mu) = 1 - \alpha$.
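As an illustration of the independence test against first-order Markov behaviour, here is a minimal sketch of a likelihood ratio statistic built from the transition counts of the violation indicators. The function name is hypothetical, and the chi-square reference distribution with one degree of freedom is the usual asymptotic approximation, which may be rough when violations are rare.

```python
import math

def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def markov_independence_test(indicators):
    """LR test of iid violations against a first-order Markov alternative."""
    if len(indicators) < 2:
        raise ValueError("need at least two observations")
    counts = [[0, 0], [0, 0]]              # counts[i][j]: transitions i -> j
    for prev, cur in zip(indicators[:-1], indicators[1:]):
        counts[prev][cur] += 1
    n00, n01 = counts[0]
    n10, n11 = counts[1]
    total = n00 + n01 + n10 + n11
    pi = (n01 + n11) / total               # violation probability under independence
    pi01 = n01 / (n00 + n01) if (n00 + n01) else 0.0
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0
    loglik_null = _xlogy(n00 + n10, 1.0 - pi) + _xlogy(n01 + n11, pi)
    loglik_alt = (_xlogy(n00, 1.0 - pi01) + _xlogy(n01, pi01)
                  + _xlogy(n10, 1.0 - pi11) + _xlogy(n11, pi11))
    lr = max(0.0, 2.0 * (loglik_alt - loglik_null))
    p_value = math.erfc(math.sqrt(lr / 2.0))   # chi-square(1) survival function
    return lr, p_value

# Toy usage: ten clustered violations versus ten evenly spaced ones.
clustered = [0] * 90 + [1] * 10
spread = ([0] * 9 + [1]) * 10
lr_clustered, p_clustered = markov_independence_test(clustered)
lr_spread, p_spread = markov_independence_test(spread)
```

Both toy sequences have the same number of violations, so a binomial test cannot tell them apart; only the clustered one is rejected by the independence test.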
Elicitability Theory in Brief
If there exist a scoring function $S(x, y)$ and a statistical functional $\varphi(\cdot)$ such that
\[ E(S(x, Y)) = \int S(x, y)\, dF_Y(y) \]
is minimized by $x = \varphi(F_Y)$ for some class of distribution functions $F_Y \in \mathcal{X}$, then $\varphi(\cdot)$ is said to be an elicitable functional and $S$ is said to be a consistent scoring function for $\varphi$ and $\mathcal{X}$. If the minimum is uniquely attained by $\varphi(F_Y)$ then $S$ is strictly consistent.
$\varphi(F_Y) = F_Y^{-1}(\alpha)$ is elicitable for continuous, strictly increasing distribution functions $F_Y$, with strictly consistent scoring function
\[ S_\alpha(x, y) = |I_{\{y \le x\}} - \alpha|\, |y - x|. \]
This can be easily verified by checking that
\[ \frac{d}{dx} E(S_\alpha(x, Y)) = -E(h_\alpha(x, Y)) = F_Y(x) - \alpha, \]
where $h_\alpha$ is the calibration function (4). It follows immediately that $F_Y^{-1}(\alpha)$ is the minimizing functional.
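The consistency of the quantile scoring function can be illustrated numerically. The sketch below assumes a standard normal $Y$ and a coarse grid of candidate forecasts (both are illustrative choices, not part of the original slides); the average score over a simulated sample should be smallest near the true 97.5% quantile, about 1.96.

```python
import random

def quantile_score(x, y, alpha):
    """S_alpha(x, y) = |1{y <= x} - alpha| * |y - x|."""
    indicator = 1.0 if y <= x else 0.0
    return abs(indicator - alpha) * abs(y - x)

def mean_score(x, sample, alpha):
    """Monte Carlo estimate of E(S_alpha(x, Y))."""
    return sum(quantile_score(x, y, alpha) for y in sample) / len(sample)

rng = random.Random(1)                       # fixed seed for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
alpha = 0.975
candidates = [1.0 + 0.05 * i for i in range(41)]   # grid on [1, 3]
best = min(candidates, key=lambda x: mean_score(x, sample, alpha))
```

The grid minimizer `best` lands next to the empirical 97.5% quantile of the sample, which is what strict consistency predicts.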
A Role for Elicitability in VaR Backtesting?
Suppose that the regulator decided to impose a financial penalty on a bank based on its backtest results according to the formula
\[ \text{Penalty} = \sum_{t=1}^{n} S_\alpha(\widehat{\mathrm{VaR}}^t_\alpha, L_t) = \sum_{t=1}^{n} |\hat{I}_t - (1 - \alpha)|\, |L_t - \widehat{\mathrm{VaR}}^t_\alpha|. \tag{7} \]
At each time $t$ elicitability theory tells us that $E(S_\alpha(x, L_t) \mid \mathcal{F}_{t-1})$ is minimized by choosing $x = F_t^{-1}(\alpha) = \mathrm{VaR}^t_\alpha$.
A bank seeking to minimize the penalty (7) should always submit its best estimate $\widehat{\mathrm{VaR}}^t_\alpha$ based on its best estimate $\hat{F}_t$ of $F_t$.
This is not what happens in practice, and a bank is free to pursue its own optimization problem subject to the condition that its backtest data appear to conform to the Bernoulli trials hypothesis.
It is now well known that expected shortfall is not an elicitable functional (Gneiting 2011; Bellini and Bignozzi 2013; Ziegel 2015).
Finding a Calibration Function for Expected Shortfall
Based on $(\widehat{\mathrm{ES}}^t_\alpha, L_t)_{t=1,\ldots,n}$, are there any grounds to reject the hypothesis that the sequence of estimates $(\widehat{\mathrm{ES}}^t_\alpha)_{t=1,\ldots,n}$ are the true expected shortfall values for the distributions $(F_t)_{t=1,\ldots,n}$?
A natural approach is to look for a function $h$ such that $E(h(\mathrm{ES}^t_\alpha, L_t) \mid \mathcal{F}_{t-1}) = 0$ for a large class of models. However, it is not possible to find such a function (a fact that is related to the non-elicitability of expected shortfall).
Instead, the backtests that have been proposed generally rely on calibration functions $h$ that also reference VaR and satisfy $E(h(\mathrm{VaR}^t_\alpha, \mathrm{ES}^t_\alpha, L_t) \mid \mathcal{F}_{t-1}) = 0$.
The question therefore becomes: based on $(\widehat{\mathrm{VaR}}^t_\alpha, \widehat{\mathrm{ES}}^t_\alpha, L_t)_{t=1,\ldots,n}$, are there any grounds to reject the hypothesis that the sequence of estimates $(\widehat{\mathrm{VaR}}^t_\alpha, \widehat{\mathrm{ES}}^t_\alpha)_{t=1,\ldots,n}$ are the true VaR and ES values for $(F_t)_{t=1,\ldots,n}$?
First Calibration Function
By the definition of expected shortfall we have that
\[ E\left((L_t - \mathrm{ES}^t_\alpha) I_t \mid \mathcal{F}_{t-1}\right) = 0. \tag{8} \]
Using the calibration function
\[ h^{(1)}(q, e, l) = \left(\frac{l - e}{e}\right) I_{\{l > q\}} \]
we define the quantity
\[ K_t = h^{(1)}(\mathrm{VaR}^t_\alpha, \mathrm{ES}^t_\alpha, L_t). \tag{9} \]
Expressions of this kind were studied in McNeil and Frey (2000), who used them to define violation residuals. The idea of analysing (9) has been further developed in Acerbi and Szekely (2014).
Clearly $(K_t)_{t \in \mathbb{N}}$ is a martingale difference satisfying $E(K_t \mid \mathcal{F}_{t-1}) = 0$.
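For completeness, the martingale difference property of $(K_t)$ can be spelled out as follows. This is a short worked derivation under the assumptions already made on the slides: $F_t$ is continuous, so that $\mathrm{ES}^t_\alpha = E(L_t \mid L_t > \mathrm{VaR}^t_\alpha, \mathcal{F}_{t-1})$. It additionally assumes $\mathrm{ES}^t_\alpha \neq 0$ so that the division is defined.

```latex
\begin{align*}
E(L_t I_t \mid \mathcal{F}_{t-1})
  &= E(L_t \mid L_t > \mathrm{VaR}^t_\alpha, \mathcal{F}_{t-1})\,
     P(L_t > \mathrm{VaR}^t_\alpha \mid \mathcal{F}_{t-1})
   = \mathrm{ES}^t_\alpha (1 - \alpha), \\
E(\mathrm{ES}^t_\alpha I_t \mid \mathcal{F}_{t-1})
  &= \mathrm{ES}^t_\alpha\, E(I_t \mid \mathcal{F}_{t-1})
   = \mathrm{ES}^t_\alpha (1 - \alpha), \\
E(K_t \mid \mathcal{F}_{t-1})
  &= \frac{1}{\mathrm{ES}^t_\alpha}\,
     E\left((L_t - \mathrm{ES}^t_\alpha) I_t \mid \mathcal{F}_{t-1}\right)
   = \frac{\mathrm{ES}^t_\alpha (1 - \alpha) - \mathrm{ES}^t_\alpha (1 - \alpha)}{\mathrm{ES}^t_\alpha}
   = 0.
\end{align*}
```

The second and third lines use the fact that $\mathrm{ES}^t_\alpha$ is $\mathcal{F}_{t-1}$-measurable, so it can be taken outside the conditional expectation.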
Second Calibration Function
Acerbi and Szekely (2014) obtained an alternative calibration function by considering
\[ E(L_t I_t \mid \mathcal{F}_{t-1}) - \mathrm{ES}^t_\alpha (1 - \alpha) = 0, \tag{10} \]
which also follows from (8). If we define
\[ h^{(2)}_\alpha(q, e, l) = \frac{l}{e}\, I_{\{l > q\}} - (1 - \alpha) \]
we can set
\[ S_t = h^{(2)}_\alpha(\mathrm{VaR}^t_\alpha, \mathrm{ES}^t_\alpha, L_t) \tag{11} \]
so that $E(S_t \mid \mathcal{F}_{t-1}) = 0$.
We use a slightly different scaling to Acerbi and Szekely (2014). Under our definition $S_t$ and $K_t$ are related by
\[ S_t = K_t + V_t, \qquad V_t = h_\alpha(\mathrm{VaR}^t_\alpha, L_t) = I_t - (1 - \alpha). \]
Note that the existence of these joint calibration functions relates to the so-called co-elicitability of VaR and ES (Lambert et al. 2008).
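The three calibration functions translate directly into code. The sketch below uses hypothetical function names and illustrative input values; it also checks the identity $S_t = K_t + V_t$ on a violation day and on a non-violation day.

```python
def h_var(q, l, alpha):
    """VaR calibration function (4): 1{l > q} - (1 - alpha)."""
    return (1.0 if l > q else 0.0) - (1.0 - alpha)

def h1(q, e, l):
    """First ES calibration function: ((l - e) / e) * 1{l > q}."""
    return (l - e) / e if l > q else 0.0

def h2(q, e, l, alpha):
    """Second ES calibration function: (l / e) * 1{l > q} - (1 - alpha)."""
    return (l / e if l > q else 0.0) - (1.0 - alpha)

# Illustrative inputs: VaR, ES and two possible realized losses.
alpha, var_t, es_t = 0.975, 1.96, 2.34
results = {}
for loss in (3.0, 0.5):                      # violation day, ordinary day
    k_t = h1(var_t, es_t, loss)
    v_t = h_var(var_t, loss, alpha)
    s_t = h2(var_t, es_t, loss, alpha)
    results[loss] = (k_t, v_t, s_t)
```

On a day without a violation $K_t = 0$ and $S_t = V_t = -(1 - \alpha)$, so all the information about the ES estimate is carried by the violation days.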
Properties of $(K_t)$, $(S_t)$ and $(V_t)$
All three are martingale difference processes ($(\mathcal{F}_t)$-adapted processes $(Y_t)$ satisfying $E(Y_t \mid \mathcal{F}_{t-1}) = 0$).
$(V_t)$ is an iid series, but it is not possible to make stronger statements about $(K_t)$ and $(S_t)$ without making stronger assumptions about the underlying model.
For example, suppose that losses $(L_t)$ follow an iid innovations model of the form
\[ L_t = l_{[t-1]}(X_t) = \sigma_t Z_t, \quad \forall t, \tag{12} \]
where $\sigma_t^2 = \mathrm{var}(L_t \mid \mathcal{F}_{t-1})$ and $(Z_t)$ forms a strict white noise (an iid process) with mean zero and variance one.
For a simple example think of a GARCH model.
Under assumption (12) the sequences $(K_t)$ and $(S_t)$ are processes of iid variables with mean zero.
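A small simulation can illustrate the mean-zero property under assumption (12). The sketch below assumes a GARCH(1,1) process with standard normal innovations and hypothetical parameter values, and it uses the true conditional VaR and ES, which for normal innovations are $\sigma_t \Phi^{-1}(\alpha)$ and $\sigma_t \phi(\Phi^{-1}(\alpha))/(1-\alpha)$.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.975
Z_ALPHA = NormalDist().inv_cdf(ALPHA)               # VaR of a standard normal
ES_Z = NormalDist().pdf(Z_ALPHA) / (1.0 - ALPHA)    # ES of a standard normal

def simulate_garch_residuals(n, omega=0.05, a=0.10, b=0.85, seed=42):
    """Simulate losses L_t = sigma_t * Z_t and return the series (K_t), (S_t)."""
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - a - b)       # start at the unconditional variance
    k_values, s_values = [], []
    for _ in range(n):
        sigma = math.sqrt(sigma2)
        loss = sigma * rng.gauss(0.0, 1.0)
        var_t, es_t = sigma * Z_ALPHA, sigma * ES_Z   # true conditional VaR, ES
        violation = 1.0 if loss > var_t else 0.0
        k_values.append((loss - es_t) / es_t * violation)
        s_values.append(loss / es_t * violation - (1.0 - ALPHA))
        sigma2 = omega + a * loss ** 2 + b * sigma2   # GARCH(1,1) recursion
    return k_values, s_values

k_values, s_values = simulate_garch_residuals(20000)
mean_k = sum(k_values) / len(k_values)
mean_s = sum(s_values) / len(s_values)
```

Because the true VaR and ES scale with $\sigma_t$, both $K_t$ and $S_t$ depend only on $Z_t$ here, which is why they are iid as well as mean zero; the sample means `mean_k` and `mean_s` should be close to zero.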
$(K_t)$ and $(S_t)$ for GARCH Process (m = 2000)
[Figure: three time-series panels plotted against Index (0 to 2000): X, Kstat and Sstat.]
Formulating Tests
Tests are based on finite realizations of the processes $(\hat{K}_t)_{t \in \mathbb{N}}$, $(\hat{S}_t)_{t \in \mathbb{N}}$ and $(\hat{V}_t)_{t \in \mathbb{N}}$ defined by
\[ \hat{K}_t = h^{(1)}(\widehat{\mathrm{VaR}}^t_\alpha, \widehat{\mathrm{ES}}^t_\alpha, L_t), \qquad \hat{S}_t = h^{(2)}_\alpha(\widehat{\mathrm{VaR}}^t_\alpha, \widehat{\mathrm{ES}}^t_\alpha, L_t), \qquad \hat{V}_t = h_\alpha(\widehat{\mathrm{VaR}}^t_\alpha, L_t). \]
These are related by
\[ \hat{S}_t = \hat{K}_t + \hat{V}_t. \tag{13} \]
We expect all three datasets to behave like realizations of martingale difference sequences.
Under assumption (12) we expect them to behave like realizations of iid mean-zero random variables.
Interpreting Tests
If the data suggest a tendency for $E(\hat{V}_t \mid \mathcal{F}_{t-1}) > 0$ then this can be interpreted as systematic underestimation of VaR.
If the data suggest a tendency for $E(\hat{K}_t \mid \mathcal{F}_{t-1}) > 0$ then this implies that
\[ E(L_t \mid L_t > \widehat{\mathrm{VaR}}^t_\alpha) - \widehat{\mathrm{ES}}^t_\alpha > 0, \]
which means that the estimated expected shortfall systematically underestimates the expected loss size when the estimated VaR is exceeded.
If the data suggest a tendency for $E(\hat{S}_t \mid \mathcal{F}_{t-1}) > 0$ then, in view of the identity (13), this could be due to one or both of the above systematic estimation biases.
Possible Tests
McNeil and Frey (2000) suggested a bootstrap hypothesis test (Efron and Tibshirani 1994) of the mean-zero hypothesis for the $(\hat{K}_t)$ values. A t test gives very similar performance.
It is possible to extend the regression approach based on CAViaR used for VaR to test $E(\hat{K}_t \mid \mathcal{F}_{t-1}) = 0$ against an explicit time series alternative in which the conditional mean is a function of $\mathcal{F}_{t-1}$-measurable variables such as past violation indicators or VaR and ES estimates. Similarly for $(\hat{S}_t)$.
Acerbi and Szekely (2014) have proposed a Monte Carlo hypothesis test. The test requires that we can sample losses from $\hat{F}_t(x)$ for $x \ge \widehat{\mathrm{VaR}}^t_\alpha$ and thus has some disadvantages.
It doesn't fit into the prequential framework because it doesn't just involve $\{(\widehat{\mathrm{VaR}}^t_\alpha, \widehat{\mathrm{ES}}^t_\alpha, L_t),\ t = 1, \ldots, n\}$.
It would be a difficult approach for an external agent like a regulator to check, since they typically do not have access to $\hat{F}_t$.
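A bootstrap test of the mean-zero hypothesis can be sketched as follows. This is an illustrative implementation of a generic one-sided bootstrap t test (alternative: mean greater than zero), not a reproduction of the exact procedure in the cited papers; the function names, the number of resamples and the choice of which residuals to pass in (for example only the values on violation days) are assumptions.

```python
import math
import random
import statistics

def t_statistic(values):
    """One-sample t statistic for the hypothesis that the mean is zero."""
    sd = statistics.stdev(values)
    if sd == 0.0:
        return 0.0   # degenerate resample: treat as no evidence either way
    return statistics.fmean(values) / (sd / math.sqrt(len(values)))

def bootstrap_mean_zero_test(values, n_boot=2000, seed=0):
    """One-sided bootstrap p-value for H0: mean = 0 against mean > 0."""
    rng = random.Random(seed)
    t_obs = t_statistic(values)
    mean = statistics.fmean(values)
    centred = [v - mean for v in values]     # impose the null hypothesis
    exceed = 0
    for _ in range(n_boot):
        resample = rng.choices(centred, k=len(values))
        if t_statistic(resample) >= t_obs:
            exceed += 1
    return t_obs, exceed / n_boot

# Toy usage: residuals centred at zero versus residuals shifted upwards.
zero_mean = [x / 10.0 for x in range(-20, 21)]
shifted = [0.5 + x / 100.0 for x in range(-20, 21)]
t_zero, p_zero = bootstrap_mean_zero_test(zero_mean)
t_shift, p_shift = bootstrap_mean_zero_test(shifted)
```

A positive shift in the residuals, which corresponds to underestimated expected shortfall, produces a p-value near zero, while the centred data give a p-value near one half.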
Example
Simulation Experiment. The true data generating mechanism is a GARCH(1,1) model with Student t innovations with 4 degrees of freedom. Models are estimated using windows of 1000 past data but are only refitted every 10 steps.
Model A. Forecaster uses an ARCH(1) model with normal innovations. This is misspecified with respect to the form of the dynamics and the distribution of the innovations.
Model B. Forecaster uses a GARCH(1,1) model with normal innovations. This is misspecified with respect to the distribution of the innovations.
Model C. Forecaster uses a GARCH(1,1) model with Student t innovations. He has identified the correct dynamics and distribution but still has to estimate the parameters of the model.
The aim is to estimate the 97.5% VaR and expected shortfall of $F_t$.
Binomial test p-values for A, B and C are 0.21, 0.07 and 0.35.
Shortfall t-test p-values for A, B and C are 0.00, 0.00 and 0.41.
Residuals Model B
[Figure: three time-series panels plotted against Index (0 to 2000): X, Kstat and Sstat.]
Residuals Model C
[Figure: three time-series panels plotted against Index (0 to 2000): X, Kstat and Sstat.]
Realized p-values
We briefly consider an alternative to backtests based on ES.
Let $U_t = F_t(L_t)$ for $t \in \mathbb{N}$. The process $(U_t)_{t \in \mathbb{N}}$ is a process of iid standard uniform variables (Rosenblatt 1952).
We define realized p-values by $\hat{U}_t = \hat{F}_t(L_t)$ for $t \in \mathbb{N}$.
Realized p-values effectively contain information about VaR violations at any level $\alpha$, since $\hat{U}_t \ge \alpha \iff L_t \ge \widehat{\mathrm{VaR}}^t_\alpha$.
It is possible to transform uniform variables to any scale. For example, if we define $\hat{Z}_t = \Phi^{-1}(\hat{U}_t)$, where $\Phi$ is the standard normal df, then we would expect that the $(\hat{Z}_t)_{t \in \mathbb{N}}$ variables are iid standard normal. Berkowitz (2001) has proposed a test based on this fact.
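As a concrete illustration, the sketch below computes realized p-values and their probit transforms for a forecaster whose model $\hat{F}_t$ is assumed to be normal with mean zero and forecast volatility $\hat{\sigma}_t$. The normal model, the function names and the input values are illustrative assumptions only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def realized_p_values(losses, forecast_sigmas):
    """U_t = F_t(L_t) for a forecast distribution N(0, sigma_t^2)."""
    return [NormalDist(0.0, s).cdf(l) for l, s in zip(losses, forecast_sigmas)]

def probit(u_values, eps=1e-12):
    """Z_t = Phi^{-1}(U_t), clipping U_t away from 0 and 1 first."""
    return [STD_NORMAL.inv_cdf(min(max(u, eps), 1.0 - eps)) for u in u_values]

# Toy usage with three days of losses and forecast volatilities.
alpha = 0.975
losses = [0.5, 3.0, -1.0]
sigmas = [1.0, 1.2, 0.8]
u_hat = realized_p_values(losses, sigmas)
z_hat = probit(u_hat)
var_hat = [NormalDist(0.0, s).inv_cdf(alpha) for s in sigmas]
violations_from_u = [u >= alpha for u in u_hat]
violations_from_var = [l >= q for l, q in zip(losses, var_hat)]
```

The two violation lists agree, which is the equivalence stated above: a single series of realized p-values encodes the violation indicators at every level $\alpha$.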
Berkowitz Test
The realized p-values can be truncated by defining
\[ \hat{U}^*_t = \min\left(\max(\hat{U}_t, \alpha_1), \alpha_2\right), \quad 0 \le \alpha_1 < \alpha_2 \le 1. \]
Applying the probit transformation we obtain truncated z-values: $\hat{Z}^*_t = \Phi^{-1}(\hat{U}^*_t)$, $t \in \mathbb{N}$.
Let $TN(\mu, \sigma^2, k_1, k_2)$ denote a normal distribution truncated to $[k_1, k_2]$.
Under the null hypothesis of correct estimation of the loss distribution, the truncated z-values are iid realizations from a $TN(0, 1, \Phi^{-1}(\alpha_1), \Phi^{-1}(\alpha_2))$ distribution.
Berkowitz applies one-sided truncation and uses a likelihood ratio test to test the null hypothesis against the alternative that the truncated z-values have an unconstrained $TN(\mu, \sigma^2, \Phi^{-1}(\alpha_1), \infty)$ distribution.
This can be extended to a joint test of uniformity in the tail and independence by making $\mu$ (and possibly $\sigma$) time dependent.
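A rough sketch of a likelihood ratio test of this kind is given below. It makes several assumptions that go beyond the slide: values clipped to the lower boundary are treated as censored observations (each contributes the probability mass below $\Phi^{-1}(\alpha_1)$), the maximization over $(\mu, \sigma)$ is a coarse grid search rather than a numerical optimizer, and the p-value uses the asymptotic chi-square distribution with two degrees of freedom. All names are hypothetical.

```python
import math
from statistics import NormalDist

def censored_loglik(tail_z, n_censored, k1, mu, sigma):
    """Log-likelihood of N(mu, sigma^2) data censored from below at k1."""
    dist = NormalDist(mu, sigma)
    loglik = n_censored * math.log(dist.cdf(k1))          # mass below k1
    loglik += sum(math.log(dist.pdf(z)) for z in tail_z)  # density in the tail
    return loglik

def tail_lr_test(z_values, alpha1=0.95):
    """LR test of (mu, sigma) = (0, 1) using only the tail above Phi^{-1}(alpha1)."""
    k1 = NormalDist().inv_cdf(alpha1)
    tail_z = [z for z in z_values if z > k1]
    n_censored = len(z_values) - len(tail_z)
    loglik_null = censored_loglik(tail_z, n_censored, k1, 0.0, 1.0)
    # Coarse grid: mu in [-1, 1], sigma in [0.5, 2]; it contains (0, 1) exactly.
    loglik_alt = max(
        censored_loglik(tail_z, n_censored, k1, i / 20.0, j / 20.0)
        for i in range(-20, 21) for j in range(10, 41)
    )
    lr = max(0.0, 2.0 * (loglik_alt - loglik_null))
    p_value = math.exp(-lr / 2.0)      # chi-square(2) survival function
    return lr, p_value

# Toy usage: exact normal quantiles, and the same values with inflated scale.
n = 1000
well_calibrated = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
heavy_tailed = [1.5 * z for z in well_calibrated]
lr_good, p_good = tail_lr_test(well_calibrated)
lr_bad, p_bad = tail_lr_test(heavy_tailed)
```

The well-calibrated z-values are not rejected, whereas z-values with too much mass in the upper tail, as would arise from a model that understates tail risk, are rejected decisively.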
Conclusions
Value-at-Risk has special properties that make it particularly natural to backtest. Namely, the violation process forms a Bernoulli trials process under any reasonable model for the losses.
The lack of a natural calibration function for expected shortfall, which is a consequence of the lack of elicitability, means that expected shortfall cannot be backtested in isolation.
However, it is feasible to develop joint backtests of ES and VaR. These can detect deficiencies of tail models that are not detected by backtesting VaR at a single level.
The recommended tests are tests of the martingale difference hypothesis for violation residuals (or the iid mean-zero hypothesis under stronger assumptions). Static bootstrap and t tests are useful, and regression-based tests can also be devised.
The Monte Carlo test of Acerbi and Szekely (2014) may be less suited to backtesting in the regulatory context.
Tests of realized p-values in the tail are an interesting and effective alternative to joint tests of VaR and ES.
For Further Reading
Acerbi, C. and Szekely, B. (2014). Back-testing expected shortfall. Risk, pages 1-6.
Bellini, F. and Bignozzi, V. (2013). Elicitable risk measures. Working paper, available at SSRN: http://ssrn.com/abstract=2334746.
Berkowitz, J. (2001). Testing the accuracy of density forecasts, applications to risk management. Journal of Business & Economic Statistics, 19(4):465-474.
Berkowitz, J., Christoffersen, P. and Pelletier, D. (2011). Evaluating value-at-risk models with desk-level data. Management Science, 57(12):2213-2227.
Christoffersen, P. (1998). Evaluating interval forecasts. International Economic Review, 39(4).
Christoffersen, P. F. and Pelletier, D. (2004). Backtesting value-at-risk: a duration-based approach. Journal of Financial Econometrics, 2(1):84-108.
Davis, M. H. A. (2014). Consistency of risk measure estimates. Preprint, available at arXiv:1410.4382v1.
Dawid, A. (1984). Present position and potential developments: some personal views. Statistical theory: the prequential approach. Journal of the Royal Statistical Society, Series A, 147(2):278-292.
Efron, B. and Tibshirani, R. J. (1994). An Introduction to the Bootstrap. Chapman & Hall, New York.
Engle, R. and Manganelli, S. (2004). CAViaR: conditional autoregressive value at risk by regression quantiles. Journal of Business & Economic Statistics, 22(4):367-381.
Gneiting, T. (2011). Making and evaluating point forecasts. Journal of the American Statistical Association, 106(494):746-762.
Lambert, N., Pennock, D. and Shoham, Y. (2008). Eliciting properties of probability distributions. In Proceedings of the 9th ACM Conference on Electronic Commerce, EC '08, ACM, New York, pages 129-138.
McNeil, A. J. and Frey, R. (2000). Estimation of tail-related risk measures for heteroscedastic financial time series: An extreme value approach. Journal of Empirical Finance, 7:271-300.
Rosenblatt, M. (1952). Remarks on a multivariate transformation. Annals of Mathematical Statistics, 23:470-472.
Ziegel, J. F. (2015). Coherence and elicitability. Mathematical Finance, doi: 10.1111/mafi.12080.