Non-linearity issues in PD modelling
Amrita Juhi, Lucas Klinkers
May 2017
Content
- Introduction
- Identifying non-linearity
- Causes of non-linearity
- Performance
Introduction
Economic and/or regulatory capital held for credit risk is driven by three parameters:
1. Probability of Default, PD (%): the probability that a client will be unable to meet its debt obligations in the next 12 months.
2. Exposure at Default, EAD (monetary amount): what is the expected exposure at the moment of default?
3. Loss Given Default, LGD (%): how much of the outstanding exposure should we expect to lose?
Introduction
PD modelling can be divided into two parts:
1. Clients are ranked on the basis of creditworthiness (scorecard model).
2. Scores are converted into probabilities of default (PD model).
The literature has focused mainly on the ranking performance of the model (Gini coefficient); the accuracy of the PD prediction has received far less attention.
Introduction
In business operations, the accuracy of the PD is important for capital calculation purposes (AIRB and Vasicek). A more accurate PD decreases the Margin of Conservatism (MoC) and, with it, regulatory capital.
PDs are overestimated, which increases the MoC that has to be incorporated in the model.
* High log-odds mean low PD.
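The note that high log-odds mean low PD implies that the log-odds are taken as the odds of non-default. A minimal sketch of the conversion in both directions, assuming that convention (the function names are hypothetical and do not come from the slides):

```python
import math


def pd_to_log_odds(pd):
    """Log-odds of non-default: ln((1 - PD) / PD). Assumed sign convention."""
    return math.log((1.0 - pd) / pd)


def log_odds_to_pd(log_odds):
    """Inverse of pd_to_log_odds: PD = 1 / (1 + exp(log_odds))."""
    return 1.0 / (1.0 + math.exp(log_odds))


# Under this convention a lower PD gives higher log-odds.
for pd in (0.10, 0.01, 0.001):
    print(f"PD = {pd:.3f}  ->  log-odds = {pd_to_log_odds(pd):.3f}")
```

Under the opposite convention, ln(PD / (1 - PD)), the signs simply flip.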
Introduction
Logistic regression maps the PD onto a [0,1] scale based on explanatory variables. The relationship between the PD and the explanatory variables is non-linear, as shown in the example below.
Introduction
Linearity assumption of logistic regression: the explanatory variables are linearly related to the log-odds of the PD.
[Figure panels: Risk driver A, Risk driver B, Risk driver C]
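Written out, the linearity assumption can be stated as follows. This is a sketch in assumed notation: the symbols x_A, x_B, x_C for the three risk drivers and the β coefficients do not appear in the slides, and the log-odds are taken as the odds of non-default, consistent with the note that high log-odds mean low PD.

```latex
\ln\!\left(\frac{1-\mathrm{PD}}{\mathrm{PD}}\right)
  = \beta_0 + \beta_A x_A + \beta_B x_B + \beta_C x_C
\qquad\Longleftrightarrow\qquad
\mathrm{PD} = \frac{1}{1 + \exp\!\left(\beta_0 + \beta_A x_A + \beta_B x_B + \beta_C x_C\right)}
```

The left-hand form is linear in each risk driver; the right-hand form shows why the PD itself is a non-linear function of the drivers, bounded between 0 and 1.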
Introduction
Non-linearity can still be present.
[Figure panels: Risk driver A, Risk driver B, Risk driver C; PD model (risk drivers A, B and C)]
Even if the individual risk drivers are linearly related to the log-odds, the PD model predictions deviate non-linearly from the observed log-odds.
Identifying non-linearity
Identifying non-linearity
Value of the γ₂ parameter?
Significance of the γ₂ parameter?
Identifying non-linearity
Run a regression with a squared factor. Expected values if linearity holds:
Constant (γ₀): 0
Score (γ₁): 1
Score² (γ₂): 0
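The test can be sketched as a logistic regression of the observed outcome on the score and the squared score. The sketch below rests on several assumptions that the slides do not state: the score is taken to be the model's predicted log-odds of non-default, the outcome is coded 1 for no default, the fit uses a plain Newton-Raphson routine written for this illustration (not any particular library), and the portfolio is synthetic with arbitrarily chosen coefficients. All function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logit(rows, y, iterations=25):
    """Newton-Raphson maximum likelihood for P(y = 1) = 1 / (1 + exp(-x.gamma)).

    Returns the estimates and the information matrix of the last iteration.
    """
    k = len(rows[0])
    gamma = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            z = sum(g * v for g, v in zip(gamma, x))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * x[i] * x[j]
        gamma = [g + s for g, s in zip(gamma, solve(info, grad))]
    return gamma, info


def squared_score_test(scores, non_default):
    """Regress the outcome on score and score^2.

    Returns ([gamma0, gamma1, gamma2], standard errors).
    """
    rows = [[1.0, s, s * s] for s in scores]
    gamma, info = fit_logit(rows, non_default)
    k = len(gamma)
    # Standard errors: square roots of the diagonal of the inverse information.
    ses = [math.sqrt(solve(info, [1.0 if j == i else 0.0 for j in range(k)])[i])
           for i in range(k)]
    return gamma, ses


# Synthetic portfolio (hypothetical): true log-odds are quadratic in the score.
random.seed(0)
scores = [random.gauss(3.0, 1.5) for _ in range(3000)]
non_default = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(0.15 + 0.85 * s + 0.025 * s * s))) else 0
    for s in scores
]

gamma, ses = squared_score_test(scores, non_default)
for name, g, se in zip(("constant", "score", "score^2"), gamma, ses):
    print(f"{name:9s} estimate = {g:7.3f}   z-statistic = {g / se:7.2f}")
```

If linearity held, the estimates would be close to 0, 1 and 0. The z-statistic of the squared term addresses the significance question. On a sample of this size the estimate of the squared term can be noisy, so the printed values are illustrative only.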
Identifying non-linearity
Example dataset. Run the regression with the squared factor:
Constant: 0.15 (p < 0.01)
Score: 0.85 (p < 0.01)
Score²: 0.025 (p < 0.01)
The non-linear parameter is significantly different from zero, confirming that non-linearity is present in this data.
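To get a feel for the size of the effect, the fitted quadratic can be evaluated at a few scores. This is a sketch that assumes the score is the model's predicted log-odds of non-default and that the fitted quadratic is used as the corrected log-odds; the slides do not spell out either point, and the function names are hypothetical.

```python
import math

# Constant, score and score^2 coefficients reported for the example dataset.
GAMMA = (0.15, 0.85, 0.025)


def corrected_log_odds(score, gamma=GAMMA):
    """Quadratic in the score: gamma0 + gamma1 * score + gamma2 * score^2."""
    g0, g1, g2 = gamma
    return g0 + g1 * score + g2 * score * score


def log_odds_to_pd(log_odds):
    """Assumed convention: log-odds = ln((1 - PD) / PD), so high log-odds mean low PD."""
    return 1.0 / (1.0 + math.exp(log_odds))


for score in (1.0, 3.0, 5.0, 7.0):
    adjusted = corrected_log_odds(score)
    print(f"score {score:.1f}: uncorrected PD {log_odds_to_pd(score):.4%}, "
          f"corrected log-odds {adjusted:.3f}, corrected PD {log_odds_to_pd(adjusted):.4%}")
```

With these coefficients the corrected log-odds equal the score at about 1.03 and 4.97, the two roots of 0.025 s² - 0.15 s + 0.15 = 0, and differ from it elsewhere, which is the non-linear deviation the test picks up.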
Causes of non-linearity
Lloyds banking data from McDonald, Ross et al. (2012).
McDonald, Ross et al. (2012) identified possible causes:
- Differences in the distributions of default data and non-default data.
- Correlation between risk drivers and its effect on maximum likelihood convergence.
- Missing values exacerbate the correlation issue.
Causes of non-linearity
Differences in distribution: the scores of default data and non-default data are distributed differently.
Causes of non-linearity (1/2)
Correlation: maximum likelihood converges well in the case of low correlation.
[Figure: X1 and X3]
Causes of non-linearity (2/2)
Correlation: high correlation causes the parameters to be interchangeable, which makes it hard to determine the optimal set.
[Figure: X2 and X4]
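The interchangeability of parameters can be illustrated with the limiting case of perfect correlation. In the sketch below the second driver is an exact copy of the first, so every coefficient pair with the same sum gives the same log-likelihood and maximum likelihood cannot single out one set. The data are made up, the names are hypothetical, and real risk drivers would be highly rather than perfectly correlated, which makes the likelihood surface nearly flat rather than exactly flat.

```python
import math


def log_likelihood(betas, rows, outcomes):
    """Bernoulli log-likelihood of a logistic model without intercept."""
    total = 0.0
    for x, y in zip(rows, outcomes):
        z = sum(b * v for b, v in zip(betas, x))
        p = 1.0 / (1.0 + math.exp(-z))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Made-up driver values; the second driver duplicates the first exactly.
x1 = [-1.5, -0.5, 0.0, 0.4, 1.0, 2.0]
rows = [[v, v] for v in x1]
outcomes = [0, 0, 1, 0, 1, 1]

# All four coefficient pairs sum to 2 and therefore fit the data equally well.
for betas in ([1.0, 1.0], [2.0, 0.0], [0.0, 2.0], [5.0, -3.0]):
    print(betas, round(log_likelihood(betas, rows, outcomes), 6))
```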
Performance
Example dataset. The bucketed prediction error decreases with the non-linear prediction:

Bucket                 Prediction error    Non-linear prediction error
1                      -1.7%               0.3%
2                      -12%                -8%
3                      -3.9%               -3.8%
4                      2.9%                0.01%
5                      14%                 7.7%
6                      35%                 23%
7                      49%                 30%
8                      36%                 12%
9                      33%                 1.7%
10                     8.9%                -27%
Mean error             16.3%               3.5%
Mean absolute error    19.8%               11.4%
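The two summary rows can be recomputed from the bucket values. A minimal sketch, using the rounded figures exactly as displayed in the table (the function names are hypothetical):

```python
# Bucket errors in percent, copied from the table (buckets 1 to 10).
prediction_error = [-1.7, -12.0, -3.9, 2.9, 14.0, 35.0, 49.0, 36.0, 33.0, 8.9]
nonlinear_error = [0.3, -8.0, -3.8, 0.01, 7.7, 23.0, 30.0, 12.0, 1.7, -27.0]


def mean_error(errors):
    """Signed average: positive and negative bucket errors can cancel."""
    return sum(errors) / len(errors)


def mean_absolute_error(errors):
    """Average magnitude: no cancellation between buckets."""
    return sum(abs(e) for e in errors) / len(errors)


for label, errors in (("linear", prediction_error), ("non-linear", nonlinear_error)):
    print(f"{label:10s} mean error = {mean_error(errors):6.2f}%   "
          f"mean absolute error = {mean_absolute_error(errors):6.2f}%")
```

From the displayed values this gives roughly 16.1% and 19.6% for the original prediction and 3.6% and 11.4% for the non-linear prediction. The small differences from the reported summary rows are presumably due to rounding of the displayed bucket values.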
Conclusions
- The presence of non-linearity in PD modelling can be identified.
- Correcting for non-linearity gives a lower prediction error and therefore decreases the MoC.
[Diagram labels: Regulatory Capital, PD accuracy]
Next steps
- More research into the causes of non-linearity and improved identification.
- Research into different methods of correcting for non-linearity.
Questions?
Appendix - Data statistics: correlation
Appendix - Bucket performance
Appendix - Skewness as indicator
Appendix - Example data normality
Appendix - Predicted vs observed: squared transformation
Appendix - AIC & SBIC
Appendix - Math behind score^2