Stochastic reserving using Bayesian models - can it add value?


Stochastic reserving using Bayesian models - can it add value?

Prepared by Francis Beens, Lynn Bui, Scott Collings, Amitoz Gill

Presented to the Institute of Actuaries of Australia
17th General Insurance Seminar, 7-10 November 2010, Gold Coast

This paper has been prepared for the Institute of Actuaries of Australia's (Institute) 17th General Insurance Seminar. The Institute Council wishes it to be understood that opinions put forward herein are not necessarily those of the Institute and the Council is not responsible for those opinions.

© 2010 Finity Consulting Pty Ltd

The Institute will ensure that all reproductions of the paper acknowledge the author/s and include the above copyright statement.

The Institute of Actuaries of Australia
Level 7 Challis House, 4 Martin Place
Sydney NSW Australia 2000
Telephone: +61 2 9233 3466
Facsimile: +61 2 9233 3446
Email: actuaries@actuaries.asn.au
Website: www.actuaries.asn.au

Abstract

Stochastic reserving methods applied to traditional actuarial triangular data have been a keen area of research by the actuarial profession over the past decade. Many of these methods are complex and are often viewed as something of a black box. This paper develops Bayesian stochastic reserving models that are similar in design to traditional spreadsheet models. It shows that, under certain conditions, the Bayesian model will produce the same result as the spreadsheet model. The paper then investigates what additional benefits using a Bayesian model can bring to a valuation.

The paper explores whether using a Bayesian chain ladder and PPCI model, applied to triangular data, can do a better job of calculating the central estimate and risk margin than can traditional actuarial techniques. We find two key areas where Bayesian models allow for better modelling.

The first is skewness of actuarial parameters. We investigate the best distribution shape for common actuarial assumptions, such as chain ladder, incurred cost development, and payment factors. We find that, where these factors exhibit sample skew of more than 3, a skewed distribution can give a significantly better fit than a Normal distribution. Bayesian models allow for this.

We also investigate prior distributions, which are a distinctly Bayesian feature. Prior distributions allow actuaries to incorporate external information into their models, in a similar way to actuarial judgment being applied to traditional spreadsheet models. We investigate the ways in which prior distributions can be used, and their impact on the results.

Keywords: Bayesian models, chain ladder, PPCI, skewed distributions, prior distributions, suitability of stochastic models

The authors welcome comments and questions on this paper. You can contact the authors at francis.beens@finity.com.au

Introduction

Stochastic reserving methods have been a keen area of research for the global actuarial profession over the past decade (for reviews, see England and Verrall, 2002; and Li, 2006). One particular branch of stochastic methods is known as Bayesian methods. These methods can take a variety of forms, but share a common feature in that they take advantage of Bayesian statistical modelling techniques.

This paper aims to contribute in three areas, by:

- developing Bayesian versions of traditional spreadsheet[1] based reserving models which, under a set of standard conditions, give the same central estimate as traditional spreadsheet models
- investigating the most appropriate distribution shape or family (e.g., Normal, LogNormal, Gamma) for the parameters used in the models, and
- demonstrating how actuarial judgment can be incorporated into the models, via the use of prior distributions, and exploring the impact this has on the central estimate and risk margins.

This paper is organised as follows:

- The Background provides background information on Bayesian statistics, and discusses how stochastic models may be useful in a reserving context
- Chapter 1 presents a Bayesian chain ladder and a Bayesian PPCI model
- Chapter 2 discusses the information available from the models (essentially the distribution of reserves, plus the distribution of every parameter and result in the model), and how it may be useful
- Chapter 3 investigates distribution shape, and discusses the most appropriate distribution form for various actuarial parameters. We investigate the best fitting distribution for chain ladder on numbers, incurred cost development, and payment per claim incurred factors across 70 different claims portfolios.
- Chapter 4 discusses the use of prior information, and provides two examples of how prior information can be incorporated into Bayesian models, and the impact this can have
- The Conclusion briefly summarises the key points in the paper.
[1] Deterministic reserving models do not, of course, need to be implemented in a spreadsheet. However, spreadsheet implementations of the traditional deterministic reserving models are popular among actuaries. Throughout this paper we use "deterministic" and "spreadsheet" reserving models interchangeably.

Background

The background briefly sets out, at a high level, what Bayesian statistics is, and provides references for those interested in more information. We also discuss the justification for using stochastic reserving models, and the benefits they can bring.

What is Bayesian statistics?

Bayesian statistics is a branch of statistics that approaches statistical modelling from a slightly different perspective to classical statistics. Bayesian statistics makes use of two sources of information: the observed data (called the likelihood), and additional external information that may not necessarily be present in the observed data (called the prior). Both the likelihood and the prior are formulated as probability distributions. This is in contrast to classical statistics, which is generally limited to using the observed data only.

O'Hagan (2003) is an excellent, non-technical introduction to Bayesian statistics. He notes that the Bayesian approach to statistics provides more intuitive and meaningful inferences, answers complex questions cleanly and exactly, makes use of all available information, and is particularly well suited for decision-making. In the same volume, Jansen and Hagenaars (2003) note that Bayesian statistics holds great promises for model calibration, provides the perfect starting point for uncertainty analysis, and provides an excellent starting point for decision support. These comments suggest that Bayesian statistics is particularly well suited to actuarial applications.

O'Hagan sets out the fundamental building blocks of Bayesian analysis:

- Create a statistical model to link data to parameters. For example, the statistical model may be that development factors in an ICD model are Normally distributed.

- Formulate prior information about parameters. Prior information may be uninformative, in that it does not influence the results of the model, or informative, where it does influence the results.
An informative prior might be "the ICD factor for development period 2 will be about 1.5". Prior information is formulated as a distribution.

- Combine the two sources of information using Bayes' theorem, the two sources being the prior and the likelihood (the observed data). The end result of combining the two sources is called the posterior. Bayes' theorem states that the posterior is proportional to the likelihood multiplied by the prior, and is the formal mathematical method used to combine the two sources. The formulae can be time consuming and difficult to solve analytically, so simulation methods known as Markov chain Monte Carlo (MCMC) are generally used to solve the equations.

- Use the resulting posterior distribution to derive inferences about parameters. In addition to inference (for example, what is the probability that the true development factor is less than 1?), the posterior distribution can be used as a projection tool. In an ICD example, the posterior distribution is the distribution of an ICD factor, so by multiplying the incurred to date by the posterior distribution, we have a stochastic reserving model.
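To make the combining step concrete, here is a small numerical sketch (ours, not from the paper): a Normal likelihood with known variance combined with a Normal prior, using the standard conjugate update. The factor values and variances are hypothetical.

```python
import statistics

def posterior_normal(data, sigma2, prior_mean, prior_var):
    """Conjugate Normal-Normal update: combine a Normal likelihood
    (known variance sigma2) with a Normal prior on the mean.
    Precisions (1/variance) add; the posterior mean is a
    precision-weighted average of the prior mean and the data mean."""
    n = len(data)
    xbar = statistics.fmean(data)
    post_prec = 1.0 / prior_var + n / sigma2
    post_mean = (prior_mean / prior_var + n * xbar / sigma2) / post_prec
    return post_mean, 1.0 / post_prec

# Observed ICD factors for development period 2 (hypothetical data)
factors = [1.40, 1.55, 1.62]

# Informative prior: "the factor will be about 1.5"
m_inf, _ = posterior_normal(factors, sigma2=0.01, prior_mean=1.5, prior_var=0.001)

# Uninformative prior: a huge variance, so the data dominate
m_unf, _ = posterior_normal(factors, sigma2=0.01, prior_mean=0.0, prior_var=1e36)

print(round(m_unf, 4))   # 1.5233: the simple average of the data
print(round(m_inf, 4))   # pulled part-way towards the prior of 1.5
```

With the uninformative prior the posterior mean reproduces the simple average of the data, which is exactly the behaviour the models in this paper rely on to match the spreadsheet results.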

As mentioned in point three, Bayesian statistics involves combining a likelihood distribution with a prior distribution. Depending on the choice of distribution family, this can be difficult to do analytically. Furthermore, Bayesian models are often hierarchical, in that a prior distribution could itself be a posterior, made up of another likelihood and prior. Solving the resulting equations can be mathematically intractable. In practice, simulation methods such as Markov chain Monte Carlo are used to solve what would otherwise be impossible to solve analytically.

In this paper we use the freely available software program WinBUGS to implement Bayesian models. WinBUGS handles all of the simulation required to solve the equations. We do not go into the mathematics behind how to solve Bayesian equations; for those interested, there is a range of good papers available. Muller (2001) provides a brief note on Gibbs sampling and the Metropolis-Hastings algorithm. Neal (1993) explains Bayesian inference, why MCMC is required, and the mechanics behind Gibbs sampling. Congdon (2006) discusses the main MCMC sampling algorithms, and provides a comprehensive list of further references. Lunn, Thomas, and Spiegelhalter (2000) provide information about how WinBUGS implements MCMC sampling. A general internet search will turn up many more papers. It is important to note that being able to solve complicated equations or code a Metropolis-Hastings algorithm is not necessary for building Bayesian stochastic reserving models, as WinBUGS implements the necessary simulation techniques.

Bayesian statistics is currently used in a wide range of disciplines such as health, pharmacology, ecology, environmental science, economics, particle physics, and even archaeology (generally in radiocarbon dating). As Piwcewicz (2008) points out, Bayesian statistics is not an actuarial model in and of itself. In a similar vein, Generalised Linear Modelling is not inherently actuarial.
But you can build an actuarial model using Bayesian methods, just as you can use GLMs for a variety of actuarial purposes.

Regulatory momentum for stochastic reserving

When setting an outstanding claims reserve, actuaries first need to estimate the central estimate, or mean, of the distribution of possible outcomes (see paragraph 5 of AASB 1023; Attachment A, paragraph 18 of GPS 310). In addition, actuaries need to estimate a risk margin. Under AASB 1023 the probability of sufficiency is not set at a specific level, while in GPS 310 it is set at 75% (see paragraph 5 of AASB 1023; Attachment A, paragraph 24 of GPS 310).

Our understanding is that in Australia, it is relatively uncommon for actuaries to estimate the full liability distribution when calculating the central estimate or risk margin. Nor is it common for actuaries to use the same model for both. APRA, in their risk margins survey update (APRA, 2008, at page 12), note that most insurers retain a heavy reliance on the Bateup and Reed and the Collings and White reports. Medium and large insurers analyse internal data quite extensively, and most commonly use the stochastic chain ladder model to calculate risk margins. Bootstrapping and the Mack model are also used. There were also subjective loadings

added to allow for sources of error not explicitly considered (e.g. model error and unmodelled sources of uncertainty).

The International Accounting Standards Board recently released a draft International Financial Reporting Standard on Insurance Contracts. The Australian Accounting Standards Board has released an exposure draft of the IFRS (ED/2010/8), which notes that AASB 1023 will be replaced by the new IFRS. The draft IFRS has quite a lot of detail on calculating the central estimate and risk margin. In particular, the draft says that the starting point for an estimate of cash flows is a range of scenarios that reflect the full range of possible outcomes (para B38 of AASB ED/2010/8) and that, where a distribution can adequately describe the range of possible outcomes, a distribution may be used (para B39).

When calculating risk margins (called "risk adjustments" in the draft IFRS), only three techniques are permitted: a confidence level approach (similar to current Australian practice), a conditional tail expectation approach, and a cost of capital approach (para B73). Each of these requirements presupposes some underlying distribution (see paragraphs B75-B79 on confidence levels, B80-B83 on conditional tail expectations, and B84-B90 on cost of capital). Furthermore, where the distribution is highly skewed, a confidence level approach would not be appropriate, and a conditional tail expectation approach or cost of capital approach would need to be used (paras B95-B97).

APRA have also stated they would like to see more analysis of the uncertainty present in the gross of recovery outstanding claims provisions (APRA, 2010, page 37). Increasingly, the regulatory environment is requiring more thought to be given to the distribution of results when calculating central estimates and risk margins.

Are there other benefits?

There are a number of benefits that Bayesian stochastic reserving models bring.
The first benefit is that Bayesian modelling is flexible enough to build models that are similar to currently used reserving models. Bayesian ICD, PPCI, PPCF, PCE, or BF models are all possible reserving models. Bayesian models can easily incorporate GLMs, making Bayesian versions of a broad range of stochastic reserving models possible.

Under certain conditions (which we discuss in detail later), the Bayesian version of a model will give the same central estimate as the spreadsheet version of the model. This removes much of the black box element that many stochastic reserving models suffer from, where the stochastic model gives a result that is not reconcilable to the result from a traditional method. Using a Bayesian model, we can start with identical results to a spreadsheet model, and then observe and explain any changes to results as the initial conditions are changed.

This means the same model can be used for calculating the central estimate and the risk margin.[2] If we choose to use a PPCI model to set the central estimate, we can create a Bayesian PPCI model and use it for both the central estimate and the risk margin. Stochastic models produce a full distribution of results, so no additional work is needed to estimate the risk margin at, say, an 85% probability of sufficiency in addition to the 75% probability of sufficiency. All percentiles from the distribution of results are available. A full distribution also allows the calculation of conditional tail expectations. As ever, the accuracy of any such percentile estimates will depend on the accuracy of the input assumptions.

Bayesian models can directly incorporate prior information, or information that is external to the observed data. This is in contrast to most stochastic reserving models, which cannot incorporate information that is external to the data being analysed (see section 10.8 of Li, 2006). Bayesian models provide a formal framework for integrating actuarial judgment where an actuary does not consider the pure data alone to completely describe all of the information relevant to valuing the liabilities.

As we are dealing with distributions of results (and distributions of parameters), judgment can be used to deal more properly with skewness in actuarial parameters. Fleming (2008) and Houltram (2003) note that the average of a small sample from a skewed distribution will often be less than the average of the distribution itself. Bayesian models can incorporate this belief and assist in selecting a more appropriate central estimate.

In a sense, Bayesian models may be seen as a bridge between pure stochastic models and pure deterministic models. They allow actuaries to enhance and expand on the strengths of existing reserving approaches, without the need to start again from scratch with a purely stochastic model.
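The small-sample effect that Fleming and Houltram describe is easy to demonstrate by simulation (our sketch, with illustrative LogNormal parameters, not the paper's data): the average of a five-point sample from a right-skewed distribution falls below the distribution's true mean well over half the time.

```python
import math
import random

random.seed(42)
MU, SIGMA = 0.0, 1.0                        # illustrative LogNormal parameters
true_mean = math.exp(MU + SIGMA ** 2 / 2)   # LogNormal mean, about 1.649

trials, below = 10_000, 0
for _ in range(trials):
    sample = [random.lognormvariate(MU, SIGMA) for _ in range(5)]
    # Count samples whose average understates the true distribution mean
    if sum(sample) / len(sample) < true_mean:
        below += 1

print(below / trials)   # noticeably above 0.5
```

The sample mean is still unbiased; the point is that, for skewed distributions, the typical (median) small-sample average sits below the true mean, with occasional large samples balancing it out.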
The remainder of this paper introduces the models, explores whether actuarial parameters arise from skewed distributions, explains how prior information can be used, and discusses the benefits that Bayesian stochastic reserving models bring.

[2] The model will produce the 75th percentile of the distribution of reserves. The distribution will only include sources of uncertainty that we have included in our model; depending on the model structure this may be independent risk only, or it may be independent risk plus components of internal and/or external systemic risk. It will still be necessary to incorporate other elements of risk that are outside the model to calculate an actual risk margin.

Chapter 1 - The models

In this section we describe how the models work. As we describe them, we highlight specific aspects of the models that give them additional flexibility or insight over traditional deterministic models. In later sections of this paper we discuss this additional flexibility and insight in more depth.

There have been many papers that present a variety of Bayesian reserving models. For an easy to follow introduction to using WinBUGS to build Bayesian models, see Scollnik (2001) and Scollnik (2004). For a selection of additional Bayesian models see de Alba (2002), Ntzoufras and Dellaportas (2002), Verrall (2004; 2007), Meyers (2007a; 2007b; 2009), and Piwcewicz (2008).

We present two models: a chain ladder style model, and a PPCI style model. These models work in a similar manner to the deterministic versions of chain ladder and PPCI models that most GI actuaries would be familiar with, and that are often implemented in a spreadsheet. Under a set of standard assumptions, the stochastic models will produce the same central estimate as the spreadsheet models would. Sample model code is provided in the appendix.

Chain ladder model

Our chain ladder model can be used with numbers of claims, with payments, or with incurred cost (payments plus case estimates). In this example we demonstrate the model using numbers of claims.[3] The initial steps of the model are identical to a spreadsheet model.

The model begins with a triangle of cumulative numbers of claims, shown in Figure 1.1. Each cell can be referenced by its accident period (i) and its development period (j): Numbers[Acc i, Dev j].

Figure 1.1 Cumulative Claims Reports

Acc \ Dev j    1    2    3    4    5    6    7    8    9   10
    1         19   38   50   51   58   60   70   70   70   70
    2         21   49   61   66   70   73   73   74   74
    3         17   38   47   52   52   61   63   65
    4         23   33   41   48   56   56   61
    5         14   37   40   53   56   60
    6         19   37   41   54   59
    7         22   41   41   41
    8         23   37   45
    9         18   48
   10         20

The model then calculates chain ladder factors, or development factors, by dividing each cell in the triangle by the cell immediately to its left.
That is:

    Dev factor[Acc i, Dev j] = Numbers[Acc i, Dev j] / Numbers[Acc i, Dev j-1]

[3] Note that this data has been created solely for use in this example.
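The calculation so far can be sketched in a few lines (a Python sketch, rather than the paper's spreadsheet or WinBUGS code), using the claim numbers from Figure 1.1:

```python
# Cumulative claim numbers from Figure 1.1 (one list per accident period)
triangle = [
    [19, 38, 50, 51, 58, 60, 70, 70, 70, 70],
    [21, 49, 61, 66, 70, 73, 73, 74, 74],
    [17, 38, 47, 52, 52, 61, 63, 65],
    [23, 33, 41, 48, 56, 56, 61],
    [14, 37, 40, 53, 56, 60],
    [19, 37, 41, 54, 59],
    [22, 41, 41, 41],
    [23, 37, 45],
    [18, 48],
    [20],
]

# Dev factor[Acc i, Dev j] = Numbers[Acc i, Dev j] / Numbers[Acc i, Dev j-1]
dev_factors = [
    [row[j] / row[j - 1] for j in range(1, len(row))]
    for row in triangle
]

print(round(dev_factors[8][0], 3))   # 2.667: the 48 / 18 factor for accident period 9
```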

This gives a triangle of development factors, shown in Figure 1.2.

Figure 1.2 Claim Report Development Factors

Acc \ Dev j     2      3      4      5      6      7      8      9     10
    1         2.000  1.316  1.020  1.137  1.034  1.167  1.000  1.000  1.000
    2         2.333  1.245  1.082  1.061  1.043  1.000  1.014  1.000
    3         2.235  1.237  1.106  1.000  1.173  1.033  1.032
    4         1.435  1.242  1.171  1.167  1.000  1.089
    5         2.643  1.081  1.325  1.057  1.071
    6         1.947  1.108  1.317  1.093
    7         1.864  1.000  1.000
    8         1.609  1.216
    9         2.667

For example, the development factor for the most recent accident period is calculated as 48 / 18 = 2.667.

We then need to select development factors for each column (or "development period"). These development factors will be used to project out, or multiply through, the claim numbers to calculate an ultimate number of claims. Each of our selected development factors is not a single constant, but is actually a statistical distribution.[4] In this example, in order to produce the same results as a spreadsheet model would, we will assume each of the distributions is from the Normal family. In later sections we consider whether this is appropriate, and what the impact of assuming a different distribution family is. Figure 1.3, below, sets out how the development factors are chosen.

[4] Technically, it is not just a single development factor distribution per column. Each future cell is a separate distribution, so for each column there will be a different development factor for each accident period yet to be projected. However, these development factors all have the same data and prior distributions, and so are all the same. It is possible, however, to set different priors by accident period for a single column of development factors.

Figure 1.3 Diagram of Chain Ladder Model

[Diagram: the development factor triangle from Figure 1.2, with a selected development factor distribution ("Dev fac") for each column. The mean and standard deviation parameters of each column's distribution are each formed by combining a likelihood (the observed factors in that column) with a prior.]

We now introduce distinctly Bayesian concepts. In classical statistics, we would say that a Normal distribution has a mean of x and a variance of y, where x and y are constant parameters. In Bayesian statistics, x and y become distributions themselves. The mean parameter of the development factor distribution is itself a distribution, and the variance parameter of the development factor distribution is also itself a distribution. The mean parameter distribution and the variance parameter distribution are themselves determined by two further distributions:

1. the likelihood of the data - in this case, our data is the development factors in a particular column, and
2. the prior distribution.

The likelihood is a probability distribution that gives the probability of seeing the data that has actually been observed, given a range of possible parameter values. For example, you may see the following three data points from some random process: 3, 4, 5. The sample mean of these is 4. The true mean of the underlying process that these numbers come from might be 4, but it might be something else. The likelihood is the distribution of possible values that the true mean can take.

Based on the sample of 3, 4, 5, it would have a peak at 4: the most likely true mean from the underlying distribution is 4. But there is some chance that the true mean is lower or higher than 4.

The prior is a distribution that we specify. A prior allows us to incorporate additional information, external to the data in the model. There are two types of prior that can be used: an informative prior, which will influence the results of the model, and an uninformative prior, which will have no influence on the results of the model. With an uninformative prior, the model is entirely data driven.[5] We would use an informative prior where we have additional information to incorporate into the model. In this example we use an uninformative prior; at this stage, we would like the model to be completely driven by the observed experience. In later sections we discuss when using an informative prior may be appropriate, and the impact of using one.

Bayes' formula gives the rule for combining these two sources of information:

    Development factor parameter ∝ likelihood × prior

That is, the distribution of the chain ladder development factor parameter (e.g., the mean of the development factor) is proportional to the likelihood (the information that comes from the data) multiplied by the prior distribution (the additional information incorporated into the model). For particular distribution shapes it is possible to solve this formula analytically, although it may be tedious. For other distributions it becomes mathematically intractable. To solve the equations, therefore, we use Markov chain Monte Carlo simulation, which is implemented in the free WinBUGS software package (Lunn, Thomas, and Spiegelhalter, 2000).

For a Normal distribution, the mean of the likelihood will be the simple average of the development factors.[6] This is what our Bayesian model produces when using a Normal distribution and an uninformative prior.
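To give a flavour of what WinBUGS does behind the scenes, here is a minimal random-walk Metropolis sampler (our sketch, not the paper's code) for the mean of a Normal likelihood with a flat prior, applied to the first-column development factors of accident periods 5 to 9 from Figure 1.2. The "known" standard deviation and the tuning constants are assumed purely for illustration.

```python
import math
import random

# First-period development factors for accident periods 5-9 (Figure 1.2)
data = [2.643, 1.947, 1.864, 1.609, 2.667]
SIGMA = 0.45          # assumed known standard deviation, for the sketch only

def log_posterior(mu):
    # Flat (uninformative) prior, so log posterior = log likelihood + constant
    return sum(-0.5 * ((x - mu) / SIGMA) ** 2 for x in data)

rng = random.Random(0)
mu, chain = 2.0, []
for _ in range(40_000):
    proposal = mu + rng.gauss(0.0, 0.3)       # random-walk proposal
    # Accept with probability min(1, posterior ratio)
    if math.log(rng.random()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    chain.append(mu)
chain = chain[20_000:]                        # discard burn-in

print(sum(chain) / len(chain))   # close to 2.146, the simple average
```

With the flat prior, the posterior mean settles on the simple average of the factors, matching the spreadsheet selection; an informative prior would pull it away from that average.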
The Bayesian chain ladder model then works by multiplying out the number of claims along the diagonal by the development factor distributions. Because the development factors are distributions, the resulting projections are distributions themselves.

[5] There is active debate as to whether any prior distribution can theoretically be truly uninformative. For our purposes, we use distributions with very large variances (e.g., a Normal distribution with a variance of 10^36), which is more than uninformative enough!

[6] Note that we have used the simple average in both our Bayesian model and our spreadsheet model. By using a slightly different model design you can easily use a weighted average in the Bayesian model; see Scollnik (2004) for an example of a Bayesian model that uses weights.

Figure 1.4 Diagram of Chain Ladder Model (cont.)

Acc 5:   14   37   40   53   56   60
Acc 6:   19   37   41   54   59   ...  [68]
Acc 7:   22   41   41   41   ...  [52]
Acc 8:   23   37   45  [52]  ...  [65]
Acc 9:   18   48  [57]  ...  [82]
Acc 10:  20  [42]  [49]  ...  [71]

Projected cells are shown in brackets (means shown), with the final bracketed value in each row being the projected ultimate. The [57], for example, is 48 multiplied by our column 3 development factor distribution, and is therefore a distribution itself.

The expected value (mean) from the Bayesian chain ladder model agrees with the results from the spreadsheet model. The table below shows the results:

Table 1.1 Projected Claim Numbers

                                       Ultimate Numbers
Acc       Reported    CL         S'Sheet        Bayesian
Period    Numbers     factors    Chain Ladder   Chain Ladder
    1        70       1.000         70.0           70.0
    2        74       1.000         74.0           74.0
    3        65       1.000         65.0           65.0
    4        61       1.015         61.9           61.9
    5        60       1.088         65.3           65.3
    6        59       1.158         68.4           68.3
    7        41       1.258         51.6           51.6
    8        45       1.441         64.9           64.9
    9        48       1.702         81.7           81.6
   10        20       3.542         70.8           70.7
Total       543                    673.5          673.3

The spreadsheet model produces an ultimate number of claims of 673.5. The Bayesian model produces an ultimate of 673.3, materially the same number as the spreadsheet model. The accident period results are also materially the same for the two models. The small differences come about because the Bayesian results are from a simulation.

PPCI model

Our Bayesian PPCI model starts with a chain ladder model on numbers of claims, which is used to calculate the ultimate number of claims by accident period. The model then calculates payments per claim incurred by taking the triangle of inflation adjusted payments and dividing through by the accident period ultimate claim number means. The figure below shows the data used in this example, as well as the PPCIs.
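The spreadsheet column of Table 1.1 can be reproduced directly: take the simple average of each column of development factors, then multiply out each accident period's latest cumulative number. A deterministic sketch (means only, no simulation):

```python
# Cumulative claim numbers from Figure 1.1
triangle = [
    [19, 38, 50, 51, 58, 60, 70, 70, 70, 70],
    [21, 49, 61, 66, 70, 73, 73, 74, 74],
    [17, 38, 47, 52, 52, 61, 63, 65],
    [23, 33, 41, 48, 56, 56, 61],
    [14, 37, 40, 53, 56, 60],
    [19, 37, 41, 54, 59],
    [22, 41, 41, 41],
    [23, 37, 45],
    [18, 48],
    [20],
]

# Simple-average development factor for each column
n_dev = len(triangle[0])
factors = []
for j in range(1, n_dev):
    ratios = [row[j] / row[j - 1] for row in triangle if len(row) > j]
    factors.append(sum(ratios) / len(ratios))

# Project each accident period's latest cumulative number to ultimate
ultimates = []
for row in triangle:
    ultimate = row[-1]
    for f in factors[len(row) - 1:]:
        ultimate *= f
    ultimates.append(ultimate)

print(round(sum(ultimates), 1))   # 673.5, matching the S'Sheet column of Table 1.1
```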

Figure 1.5 - Diagram of PPCI Model

[Figure 1.5 shows the triangle of incremental, inflation-adjusted payments; the mean ultimate claim numbers by accident period from the Bayesian chain ladder (70, 74, 65, 62, 65, 68, 52, 65, 82, 71); and the resulting triangle of PPCIs (e.g. 260, 1,715, 4,563, 3,123, ... in the first row), obtained by dividing the payments by the claim number means. Each column of PPCIs then feeds a likelihood (mean and standard deviation), which is combined with the prior to give that development period's PPCI distribution.]

We then have a triangle of PPCIs. Our Bayesian PPCI model progresses in a similar way to our Bayesian chain ladder model. Each column of PPCIs becomes the data points for the PPCI distributions by development period. The selected PPCIs are distributions: they allow for the volatility in the PPCI data. As with the Bayesian chain ladder model, for this example we use a Normal distribution and an uninformative prior; at this stage, the model is completely driven by the observed experience. For a Normal distribution, the mean of the likelihood will be the simple average of the PPCIs. This is what the Bayesian PPCI model produces when using a Normal distribution and an uninformative prior.
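The division step can be sketched in a few lines. This is a minimal sketch with a small illustrative triangle and assumed ultimate claim numbers (not the paper's data), dividing the payments shown in the triangle by each accident period's ultimate count, as in Figure 1.5:

```python
import numpy as np

# Inflation-adjusted payments (illustrative; np.nan marks future cells).
inc = np.array([
    [15000.0, 35000.0, 112000.0],
    [20000.0, 65000.0, np.nan],
    [18000.0, np.nan, np.nan],
])

# Mean ultimate claim numbers by accident period, e.g. taken from the
# posterior of the Bayesian chain ladder on claim counts.
ult_numbers = np.array([70.0, 74.0, 65.0])

# PPCI triangle: divide each row of payments by that row's ultimate count.
ppci = inc / ult_numbers[:, None]
print(np.round(ppci, 1))
```

Each column of the resulting `ppci` array is then the data feeding that development period's likelihood.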

The final step in the Bayesian PPCI model is to multiply the PPCI distributions by the claim number distributions. This allows the model to capture uncertainty in the number of claims, as well as uncertainty in the average incremental payments made on each claim. As with the Bayesian chain ladder model, the Bayesian PPCI model gives the same central estimate as the spreadsheet model, when using a Normal distribution and an uninformative prior. Table 1.2 sets out the results from the spreadsheet model and the Bayesian model:

Table 1.2 Projected Outstanding Payments

          Outstanding Claims
Acc       S'Sheet PPCI    Bayesian PPCI
Period    ($'000)         ($'000)
1             0               0
2             0               0
3             0               0
4             7               7
5            30              30
6            83              83
7           113             113
8           258             258
9           565             566
10          533             534
Total     1,590           1,593

The spreadsheet PPCI model produces outstanding claims of $1.590 million, while the Bayesian PPCI model produces $1.593 million; once again, these are materially the same, and the results by accident period are also the same.

Benefits of Bayesian models

We have seen that our Bayesian models produce the same expected, or mean, results as the spreadsheet models, when using a Normal distribution and an uninformative prior. This is an advantage, as it means that we can understand and explain the central estimate result as we would for spreadsheet models. In itself, it is not a reason for changing over to use Bayesian models. The three reasons why we might like to use Bayesian models are:

- Results
- Distribution shapes
- Prior distributions

The remainder of the paper goes through those reasons. Firstly, we discuss the results available from a Bayesian model. The results available are useful to understand the overall distribution of reserves, but also the distribution of each development factor (and thus potential values for each development factor).

Secondly, we discuss whether a Normal distribution is in fact appropriate for general insurance actuarial parameters, and what the impact of changing the distribution may be. Thirdly, we discuss how informative priors can be used, and how this can change the results.

Chapter 2 The outputs and results

Our Bayesian model gives us not only a central estimate, but also a full distribution of the results. It also gives a full distribution for each of the parameters in the model.

Full distribution of results

Having the full distribution of results makes it possible to use any percentile on the distribution. It is as easy to get the 75% probability of sufficiency for an APRA risk margin as it is to get the 65% or 90% probability of sufficiency. The figure below shows the distribution of outstanding claims from the example PPCI model:

Figure 2.1 Outstanding Claim Distribution - PPCI

[Figure 2.1 plots the probability density of outstanding claims ($'000) from the PPCI model, with the fitted Normal curve and the mean, median, 75% and 95% points marked.]

From a risk margins perspective, using a Bayesian model means you can use a single model to produce both the central estimate and risk margin. Rather than use a spreadsheet PPCI model for the central estimate and an ICD bootstrap model for the risk margin, you can use a Bayesian PPCI for both the central estimate and risk margin. The Bayesian model will give the same central estimate as a spreadsheet model would.

While we have presented a chain ladder and PPCI model, there is no reason why a Bayesian PCE or PPCF could not be used. If you can build a model in a spreadsheet, it will generally be possible to build a Bayesian version of the model. It is also possible to build more complicated models, for example, a PPCF in operational time, where the average claim sizes are calculated using a GLM of some sort; Bayesian models can be used to fit GLMs, and the PPCF model could then combine the GLM component with the PPCF component.
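Reading off any probability of sufficiency is then just a percentile of the simulated reserve distribution. A small illustrative sketch, with a right-skewed LogNormal sample (assumed parameters) standing in for the MCMC output:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated outstanding-claims distribution ($'000): a stand-in for the
# posterior sample a Bayesian model would produce.
reserves = rng.lognormal(mean=np.log(1500), sigma=0.25, size=200_000)

# Any probability of sufficiency is just a percentile of the simulations.
for p in (50, 65, 75, 90, 95):
    print(f"{p}% probability of sufficiency: {np.percentile(reserves, p):,.0f}")
```

The same sample gives the central estimate (its mean) and the APRA 75% risk margin, which is the single-model advantage described above.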

Distribution of each of the parameters

Bayesian models also give a distribution of all of the stochastic parameters in the model. You can look at any or all of these distributions, in order to check for things such as the reasonableness of assumptions, or what the key drivers of overall volatility are. The figures below show the probability density functions of development factors from the Bayesian chain ladder on numbers presented in Chapter 1. The second subscript identifies which development period is shown; for example, devfac[10,3] shows the distribution of the period 3 development factor:

Figure 2.2 Chain Ladder Model Distribution of Development Factors

[Figure 2.2 shows the probability density functions of the development factor parameters, one panel per development period.]

These distributions can also be used when thinking about how to specify prior distributions.

With an uninformative prior, the resulting distribution is essentially the likelihood. 7 Looking at the distribution can help to decide whether you could observe a parameter different from what the data suggests. For example, could you observe a development factor of 1.4 in the second development period? The following figure shows the distribution for just development factor 2.

Figure 2.3 Development Period 2 Distribution

[Figure 2.3 plots the probability density of the period 2 development factor, over the range 0.50 to 2.00.]

Purely based on the data, the chance of seeing a development factor that is 1.4 or greater is around 3.8%. So a selection for an expected value of 1.4, without anything else informing our judgment, would appear conservative. However, you may think 1.4 is appropriate based on additional information, for example:

- This distribution is based purely on the actual development factors seen to date. This assumes future experience will be similar to the data present in our triangle. You may have more data available, or there may be external factors supporting your belief that the future will be different (in which case you could use an appropriate prior distribution to inform the data in the model).
- The 3.8% is based on a Normal distribution. If you assume, say, a LogNormal distribution, the probability of seeing a factor that is 1.4 or greater is higher, at around 4.9%.

The important point to note is that full distributions of all of the factors in the model are available for diagnostic and selection purposes.

7 Not quite: it is the likelihood with some uncertainty over the parameters. The pure likelihood takes our data as given, and assumes the data gives the true mean and standard deviation. In Bayesian statistics, our data are simply a sample of the true underlying distribution, so our model will also capture some uncertainty in the parameters themselves.
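The tail-probability comparison above is easy to reproduce. In the sketch below the Normal parameters are assumed values chosen only to land near the quoted 3.8% (the paper's fitted parameters are not shown, so the exact figures will differ), and the LogNormal is matched to the same mean and variance:

```python
import numpy as np
from scipy import stats

# Assumed Normal parameters for the period 2 development factor,
# chosen so the tail probability lands near the paper's 3.8%.
mu, sd = 1.09, 0.175
p_normal = stats.norm.sf(1.4, loc=mu, scale=sd)

# A LogNormal matched to the same mean and variance has a fatter right
# tail; scipy parameterizes it by shape s (log-sd) and scale exp(log-mean).
sigma2 = np.log(1 + (sd / mu) ** 2)
p_lognormal = stats.lognorm.sf(1.4, s=np.sqrt(sigma2),
                               scale=mu * np.exp(-sigma2 / 2))

print(f"P(factor >= 1.4): Normal {p_normal:.1%}, LogNormal {p_lognormal:.1%}")
```

With these assumed parameters the Normal tail probability is about 3.8%, and the moment-matched LogNormal gives a slightly higher probability, illustrating the direction of the effect described above.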

Chapter 3 Distribution shape

In this part of the paper we look at whether Chain Ladder on Numbers factors, Incurred Cost Development factors, and Payments Per Claim Incurred amounts are more likely to come from a Normal distribution (no skewness), or a skewed distribution (such as a Gamma or LogNormal distribution). Choosing the most appropriate distribution shape can have a significant impact on the central estimate and risk margin.

For a 15 by 15 sized triangle there will be 14 development factors for the first development period, 13 factors for the second period, 12 for the third period, and so on. Determining the best distribution shape (Normal, LogNormal, Gamma, etc.) based on a sample of 14 observations is difficult. A modest difference in a single observation could mean that a Normal distribution appears to be a better fit than a LogNormal distribution, or vice versa. Each individual portfolio is too small to provide any definitive results. In order to counter this, we tested the best fitting distribution shape to factors from 70 different portfolios, drawn from 13 different insurers. By testing 70 different portfolios, we aim to discover if particular distribution shapes are appropriate more often than not.

Distribution shape

The figure below shows a Normal distribution:

Figure 3.1 Normal Distribution

[Figure 3.1 plots a Normal distribution (mean 5, sd 1), with the mean, mode, and median marked at the same point.]

In a Normal distribution there is no skewness, i.e., the mean, mode, and median are all the same.

In a skewed distribution, the mode (most likely value) can be quite different from the median (50th percentile) and the mean (expected value), depending on the level of skewness. The figure below shows a Gamma distribution:

Figure 3.2 Gamma Distribution

[Figure 3.2 plots a Gamma distribution (shape 2, scale 1), with the mode, median, and mean marked in increasing order.]

For a skewed distribution, where we have a representative sample, the sample average may not be a good estimate of the true mean, depending on the presence of tail observations in the data. This issue is further exacerbated when attempting to measure the 75th or 90th percentile.

Testing skewness

We tested the factors from 70 different portfolios, from 13 different insurers. We split the data into two groups: short tail (about 50 portfolios) and long tail (about 20). We tested whether the factors from each portfolio were best fit by the following distributions:

- Normal
- LogNormal
- Uniform
- Gamma
- Exponential
- Pareto

For each portfolio, we split the triangles into development periods. Most of the triangles were either 39 by 39 or 28 by 28, giving a reasonable number of factors for early development periods.
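The ordering of mode, median, and mean for the Gamma in Figure 3.2 can be checked directly:

```python
from scipy import stats

# The Gamma distribution from Figure 3.2 (shape 2, scale 1).
shape, scale = 2.0, 1.0
dist = stats.gamma(a=shape, scale=scale)

mode = (shape - 1) * scale          # mode of a Gamma, valid for shape >= 1
median = dist.median()
mean = dist.mean()

# For a right-skewed distribution: mode < median < mean.
print(f"mode {mode:.2f} < median {median:.2f} < mean {mean:.2f}")
```

The gap between the three statistics is exactly what makes the sample average an unreliable estimate of the true mean when the sample is small and skewed.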

Figure 3.3 Example of Development Factors by Development Period

Dev        Dev        Dev        Dev        Dev
period 1   period 2   period 3   period 4   period 5
2.000      1.316      1.020      1.137      1.034
2.333      1.245      1.082      1.061      1.043
2.235      1.237      1.106      1.000      1.173
1.435      1.242      1.171      1.167      1.000
2.643      1.081      1.325      1.057      1.071
1.947      1.108      1.317      1.093
1.864      1.000      1.000
1.609      1.216
2.667

Our tests involved creating a new model for each of the distributions. For each development period, for each portfolio, we used WinBUGS, with an uninformative prior, to fit a distribution to the factors. For a single development factor period, we used the Deviance Information Criterion (DIC) to estimate which distribution was the best. Lunn, Thomas, Best, and Spiegelhalter (2000), Congdon (2006) and Spiegelhalter (2006) discuss the mathematics behind the DIC.

The DIC measures, roughly, the deviance between our data points and what our fitted distribution would have predicted the data points to be. It is a standard test used for testing the goodness of fit of different Bayesian models. We used uninformative priors, so WinBUGS estimated the parameters to give a distribution that fit the data as closely as possible, given the particular distribution shape / family. The best distribution is the one with the lowest deviance.

A very small difference between the deviance of the best fitting distribution and the deviance of the Normal distribution suggests that both distributions could fit the data equally well. It is only when the difference in deviance is big enough that you could conclusively say that a particular distribution is a better fit than a Normal distribution. A rule of thumb is that a difference in deviance of more than 7 to 10 is big enough (Holsinger, 2010). We used a difference of 8 in our tests: if the difference in deviance between a Normal distribution and the best fit is less than 8, we say Normal is the best fit. Only if the difference in deviance is 8 or more do we say that the other distribution is the best fit.
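The full DIC includes a penalty for the effective number of parameters; as a simplified stand-in, the sketch below compares plain deviance at the maximum-likelihood fit for the period 1 factors in Figure 3.3, applying the difference-of-8 rule:

```python
import numpy as np
from scipy import stats

# Development period 1 factors from Figure 3.3.
factors = np.array([2.000, 2.333, 2.235, 1.435, 2.643,
                    1.947, 1.864, 1.609, 2.667])

def deviance_normal(x):
    mu, sd = x.mean(), x.std(ddof=0)          # MLE fit
    return -2 * stats.norm.logpdf(x, mu, sd).sum()

def deviance_lognormal(x):
    logs = np.log(x)
    mu, sd = logs.mean(), logs.std(ddof=0)    # MLE fit on logs
    return -2 * stats.lognorm.logpdf(x, s=sd, scale=np.exp(mu)).sum()

d_norm, d_logn = deviance_normal(factors), deviance_lognormal(factors)

# The paper's rule: only call the alternative "best" if it beats Normal
# by a deviance difference of 8 or more.
best = "LogNormal" if d_norm - d_logn >= 8 else "Probably Normal"
print(f"Deviance: Normal {d_norm:.1f}, LogNormal {d_logn:.1f} -> {best}")
```

For a small, roughly symmetric sample like this the deviance difference is far below 8, so the verdict is "Probably Normal", mirroring how the tables that follow classify most development periods.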
The results are set out in the tables below.

Table 3.1 Best Fit Distribution - Chain Ladder on Reported Numbers Factors

Short tail portfolios
Development        1        2        3        4        8        12       Total
Probably Normal    90.2%    95.0%    100.0%   100.0%   100.0%   100.0%   96.8%
Exponential        0.0%     0.0%     0.0%     0.0%     0.0%     0.0%     0.0%
Gamma              0.0%     0.0%     0.0%     0.0%     0.0%     0.0%     0.0%
LogNormal          9.8%     5.0%     0.0%     0.0%     0.0%     0.0%     3.2%
Pareto             0.0%     0.0%     0.0%     0.0%     0.0%     0.0%     0.0%

Long tail portfolios
Development        1        2        3        4        8        12       Total
Probably Normal    78.3%    73.9%    91.3%    95.2%    100.0%   100.0%   88.3%
Exponential        0.0%     0.0%     0.0%     0.0%     0.0%     0.0%     0.0%
Gamma              0.0%     0.0%     0.0%     0.0%     0.0%     0.0%     0.0%
LogNormal          21.7%    26.1%    8.7%     4.8%     0.0%     0.0%     11.7%
Pareto             0.0%     0.0%     0.0%     0.0%     0.0%     0.0%     0.0%

Table 3.1 shows that, for chain ladder factors, which measure the development of reported numbers of claims, Normal distributions are appropriate for nearly all of our short tail portfolios, and for most of our long tail portfolios. At earlier periods of development, a minority of long tail portfolios have LogNormally distributed development factors.

Table 3.2 Incurred Cost Development Factors

Short tail portfolios
Development        1        2        3        4        8        12       Total
Probably Normal    43.5%    80.0%    90.9%    88.4%    95.8%    100.0%   79.3%
Exponential        0.0%     0.0%     0.0%     0.0%     0.0%     0.0%     0.0%
Gamma              0.0%     0.0%     0.0%     0.0%     0.0%     0.0%     0.0%
LogNormal          56.5%    20.0%    9.1%     11.6%    4.2%     0.0%     20.7%
Pareto             0.0%     0.0%     0.0%     0.0%     0.0%     0.0%     0.0%

Long tail portfolios
Development        1        2        3        4        8        12       Total
Probably Normal    47.8%    52.2%    56.5%    73.9%    83.3%    89.5%    69.6%
Exponential        0.0%     0.0%     0.0%     0.0%     4.2%     0.0%     0.6%
Gamma              0.0%     0.0%     0.0%     0.0%     0.0%     0.0%     0.0%
LogNormal          52.2%    47.8%    43.5%    26.1%    12.5%    10.5%    29.7%
Pareto             0.0%     0.0%     0.0%     0.0%     0.0%     0.0%     0.0%

Table 3.2 shows that, for incurred cost development factors, Normal distributions are appropriate for most of our short tail portfolios, but for only around half of our long tail portfolios. The first three development periods for nearly half of all long tail portfolios are significantly better modelled with a LogNormal distribution. For longer development periods (out to 12 development periods), a small number (around 10%) of long tail portfolios are better modelled with a LogNormal distribution.

Table 3.3 Payments Per Claim Incurred Amounts

Short tail portfolios
Development        1        2        3        4        8        12       Total
Probably Normal    28.9%    64.9%    56.8%    37.1%    29.0%    9.5%     38.8%
Exponential        2.6%     0.0%     5.4%     2.9%     9.7%     38.1%    8.3%
Gamma              0.0%     8.1%     5.4%     5.7%     6.5%     0.0%     4.4%
LogNormal          47.4%    16.2%    8.1%     17.1%    48.4%    47.6%    30.6%
Pareto             21.1%    10.8%    24.3%    37.1%    6.5%     4.8%     18.0%

Long tail portfolios
Development        1        2        3        4        8        12       Total
Probably Normal    9.5%     52.4%    36.8%    40.0%    15.8%    0.0%     24.6%
Exponential        0.0%     0.0%     5.3%     5.0%     10.5%    18.8%    6.9%
Gamma              4.8%     4.8%     15.8%    0.0%     0.0%     6.3%     4.6%
LogNormal          57.1%    33.3%    21.1%    25.0%    15.8%    25.0%    29.2%
Pareto             28.6%    9.5%     21.1%    30.0%    57.9%    50.0%    34.6%

Not unexpectedly, Table 3.3 shows that payments per claim incurred amounts tend to be better modelled by non-Normal distributions. This is particularly the case for long tail portfolios, where a variety of other distribution shapes, primarily LogNormal and Pareto, are significantly better than Normal distributions at modelling the factors.

To develop a quick rule of thumb as to when you should consider a non-Normal distribution, we have compared the sample skewness of each data set to the best fitting distribution for ICD and PPCI factors. Figure 3.4 shows the results for ICD factors, and Figure 3.5 shows the results for PPCI factors:

Figure 3.4 ICD Factor Skewness

[Figure 3.4 plots, by sample skewness (from -4 to 6), the number of ICD data sets best fit by a Normal versus a non-Normal distribution, with the percentage not Normal overlaid.]

Figure 3.5 PPCI Skewness

[Figure 3.5 plots, by sample skewness (from -3 to 7), the number of PPCI data sets best fit by each distribution (Normal, Exponential, Gamma, LogNormal, Pareto), with the percentage not Normal overlaid.]

For ICD models, most data sets have some skewness. Nonetheless, a Normal distribution tends to be adequate while the sample skewness is less than about 3. Where sample skewness is 3 or higher, LogNormal distributions tend to be more appropriate. For PPCI amounts, non-Normal distributions tend to be appropriate once skewness is above 2. A good rule of thumb, then, is: if your sample skewness for a number of development periods is 3 or greater, you should consider using a non-Normal distribution.

Impact

In some cases using a skewed distribution (such as a LogNormal) to model our parameters instead of a Normal distribution will lead to a material difference in the central estimate, and in other cases it will not. In addition to possible differences in the central estimate, using skewed distributions can have an impact on risk margins, as the results for the 75th or higher percentile can change.

Finding that LogNormally distributed development factors can produce the same result as Normally distributed development factors (at least when using uninformative prior distributions, so our model is entirely data driven) can be somewhat counter-intuitive. We know that a LogNormal distribution is skewed, and would generally expect this to lead to a higher central estimate. However, the impact of changing distribution will depend on the shape of the data. Development factors are often clumped around 1, particularly for later development periods. For data clumped around 1, with a relatively narrow range, the difference between the best fitting Normal distribution and the best fitting LogNormal distribution can be very small. Figure 3.6 demonstrates this.
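The skewness rule of thumb can be packaged as a quick screen. In this sketch the helper function and both samples are illustrative; the second sample's single large value mimics an occasional very large payment:

```python
import numpy as np
from scipy import stats

def suggest_distribution(sample, threshold=3.0):
    """Apply the rule of thumb: flag a development period for a
    non-Normal fit when its sample skewness reaches the threshold."""
    skew = stats.skew(np.asarray(sample, dtype=float))
    flag = "consider non-Normal" if skew >= threshold else "Normal likely adequate"
    return skew, flag

# Late-period factors clumped around 1 (illustrative):
s_low, flag_low = suggest_distribution([1.02, 1.00, 1.01, 0.99, 1.03, 1.00, 1.01])
# A column containing one very large observation (illustrative):
s_high, flag_high = suggest_distribution([1.0] * 13 + [10.0])

print(round(s_low, 2), flag_low)
print(round(s_high, 2), flag_high)
```

For PPCI amounts the paper suggests the lower threshold of 2, which the `threshold` argument accommodates.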

Figure 3.6 Normal and LogNormal Distributions Fit to Data

[Figure 3.6 shows a histogram of data clumped around 1, over the range 0.5 to 1.5, with the fitted Normal and LogNormal distributions overlaid.]

In this particular instance, the data is clumped relatively equally around 1. The Normal and LogNormal distributions both look very similar over this particular range of data. Of course, we often do not have the volume of data shown in Figure 3.6. Instead, we have something similar to Figure 3.7, where the small data bars are equivalent to 1 data point, and the large data bars are equivalent to 2.

Figure 3.7 Normal and LogNormal Distributions Fit to Data

[Figure 3.7 shows the same overlaid Normal and LogNormal fits, but to a sparse data set of only a handful of points.]

An important factor to keep in mind is that the ultimate result is the product of a number of development factor distributions. Therefore, even relatively modest differences in distributions can potentially produce materially different results when a number of development factors are compounded together.
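A small simulation illustrates the compounding point: five development factors with identical means and standard deviations (assumed values), modelled as Normals versus moment-matched LogNormals:

```python
import numpy as np

rng = np.random.default_rng(5)
n_sims = 200_000

# Five development factors, each with the same assumed mean and sd.
mu, sd = 1.10, 0.15
normal_prod = rng.normal(mu, sd, size=(n_sims, 5)).prod(axis=1)

# LogNormals matched to the same per-factor mean and variance.
sigma2 = np.log(1 + (sd / mu) ** 2)
logn_prod = rng.lognormal(np.log(mu) - sigma2 / 2, np.sqrt(sigma2),
                          size=(n_sims, 5)).prod(axis=1)

# The compounded means are close, but the LogNormal product has a
# noticeably fatter right tail.
for label, prod in (("Normal", normal_prod), ("LogNormal", logn_prod)):
    print(label, round(prod.mean(), 3), round(np.percentile(prod, 95), 3))
```

Even though each individual factor distribution looks similar, the compounded 95th percentile separates, which is the mechanism behind the risk-margin differences in the tables that follow.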

The following figures show the distribution of outstanding claims for three different portfolios where we have used a Normal distribution on factors, followed by a LogNormal distribution. 8 These models all had uninformative priors. The first table and figure show the results from an ICD model on a short tail, stable portfolio.

Table 3.4 Short-tail ICD Results

             Mean     75%      95%
Normal       5,442    5,872    6,526
LogNormal    5,453    5,884    6,567
Difference   0.2%     0.2%     0.6%

Figure 3.8 Short-tail ICD Distribution

[Figure 3.8 plots the probability density of the reserve under Normal and LogNormal ICD factors.]

The results show very little difference between the Normal and LogNormal distributions. This particular portfolio is quite short tail, with fairly stable development.

8 The absolute size of the results has been masked to ensure confidentiality.

The second table and figure show the results from an ICD model, on a portfolio with a somewhat longer tail, and with more volatility than the first set of results:

Table 3.5 Longer-tail ICD Results

             Mean     75%      95%
Normal       4,930    6,862    10,251
LogNormal    5,146    7,174    11,189
Difference   4.4%     4.6%     9.2%

Figure 3.9 Longer-tail ICD Distribution

[Figure 3.9 plots the probability density of the reserve under Normal and LogNormal ICD factors.]

In this case, the model with LogNormally distributed parameters does produce a slightly higher result, although it is still a fairly modest difference at 4%. The shape of the distribution is slightly different, and at the 95% level, the LogNormally distributed parameters are suggesting a reserve that is around 9% higher. This particular portfolio had slightly longer development than our first one, and development factors that are substantially more volatile: the relative volatility for each column of development factors is about 5 times higher than in the first portfolio.

The third table and figure show the results from a PPCI model:

Table 3.6 Long-tail PPCI Results

             Mean     75%      95%
Normal       13,043   13,801   14,908
LogNormal    24,847   22,327   54,914
Difference   91%      62%      268%

Figure 3.10 Long-tail PPCI Distribution

[Figure 3.10 plots the probability density of the reserve under Normal and LogNormal PPCI amounts.]

The model with LogNormally distributed PPCI factors produces a significantly higher result than the model with Normally distributed PPCI factors. The actual PPCI amounts for this portfolio are very volatile and show a significant level of skewness. There are occasional very large payments, and this drives the higher level of reserves. The shape of the distribution between the two models is very different, and the model with LogNormally distributed PPCIs has a significantly fatter tail.

Small samples

It stands to reason that the average of a small sample drawn from a skewed distribution will often be less than the average of the underlying distribution (see Fleming, 2008, and Houltram, 2003). In fact, the average of a small sample is likely to be closer to the mode of the true distribution than to the mean of the true distribution. We have seen that longer tail portfolios, in particular, are more likely to have skewed distributions of actuarial parameters. Taking the average of the observed parameters may lead to material understatement of the outstanding claims reserve.
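This small-sample effect is easy to demonstrate by simulation, here with a LogNormal severity using an assumed sigma:

```python
import numpy as np

rng = np.random.default_rng(0)

# A skewed "true" severity distribution with an assumed parameter.
sigma = 1.5
true_mean = np.exp(sigma ** 2 / 2)      # mean of LogNormal(0, sigma)

# Draw many small samples, as if each were one development period's data.
n_obs, n_trials = 8, 20_000
samples = rng.lognormal(0.0, sigma, size=(n_trials, n_obs))
sample_means = samples.mean(axis=1)

# The sample average usually falls below the true mean, because the
# occasional very large observation is often absent from a small sample.
frac_below = (sample_means < true_mean).mean()
print(f"true mean {true_mean:.2f}, "
      f"median sample mean {np.median(sample_means):.2f}, "
      f"P(sample mean < true mean) = {frac_below:.0%}")
```

Well over half of the small-sample averages sit below the true mean, and the typical (median) sample average sits well below it, which is the understatement risk described above.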

Bayesian stochastic reserving models can be constructed to allow for this skewness, using prior distributions. This is discussed in further detail in Chapter 4.

Chapter 4 Prior distributions

In this part of the paper we consider prior distributions, and the impact that choosing an informative prior can have on our results.

Choosing priors

In the example model presented in Chapter 1, each of the development factors was a distribution, calculated as the combination of the likelihood (data) and the prior (actuarial judgment).

Figure 4.1 Components of a Bayesian Development Factor

[Figure 4.1 shows the column 1 development factor distribution formed by combining the likelihood from the data (the column 1 mean and standard deviation) with the prior distributions.]

Up to now our prior distributions have been uninformative, or flat, such as a Uniform distribution over a very large range. Adding some shape to the priors, e.g., assuming a Normal distribution, will begin to change the resulting development factor distribution.

Up to now we have been focusing on the likelihood, and assuming the prior was uninformative. We have been using Bayesian statistics as a handy tool for statistical inference (that is, generating the likelihood distribution) and combining multiple distributions (multiplying out parameters). Introducing a prior distribution moves us into a fully Bayesian world. The final answer will be a blend of the likelihood of the data and the pre-specified prior distribution.

The prior must be specified as a distribution. This means you need to specify all of the parameters of the prior (or even distributions of these parameters). For a Normally distributed prior, this would involve specifying the mean and the standard deviation.
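For a Normal likelihood with a known observation standard deviation, the blend of prior and data has a closed form, which makes the trade-off easy to see: the posterior mean is a precision-weighted average of the prior mean and the data mean. A sketch using the period 2 factors from Figure 3.3 and an assumed observation standard deviation:

```python
import numpy as np

def posterior_normal(data, prior_mean, prior_sd, obs_sd):
    """Conjugate Normal-Normal update for a development factor's mean:
    the posterior is a precision-weighted blend of prior and data."""
    n = len(data)
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = n / obs_sd ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * np.mean(data))
    return post_mean, np.sqrt(post_var)

# Observed period 2 development factors (from Figure 3.3) ...
factors = np.array([1.316, 1.245, 1.237, 1.242, 1.081, 1.108, 1.000, 1.216])
# ... combined first with a near-uninformative prior, then an informative one.
flat = posterior_normal(factors, prior_mean=1.0, prior_sd=1e6, obs_sd=0.10)
informative = posterior_normal(factors, prior_mean=1.40, prior_sd=0.05, obs_sd=0.10)

print(f"flat prior -> mean {flat[0]:.3f}; "
      f"informative prior -> mean {informative[0]:.3f}")
```

With the flat prior the posterior mean is simply the data average; with the informative prior it is pulled toward the prior mean of 1.40, and how far it is pulled depends on the prior standard deviation relative to the volume of data, which is the behaviour explored in the rest of this chapter.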