Modeling Private Firm Default: PFirm


Modeling Private Firm Default: PFirm
Grigoris Karakoulas
Business Analytic Solutions
May 30th, 2002

Outline
- Problem Statement
- Modelling Approaches
- Private Firm Data Mining
- Model Development
- Model Evaluation
- Explaining Model Prediction
- Discussion

Problem Statement
- A loan is commonly considered to be in default if any of the following occur:
  - a loan is classified as non-accrual
  - a borrower is 90 days or more past due in its principal or interest payments
  - a borrower has filed for bankruptcy protection
  - a loan is partially or fully written off
- Only a few quantitative models for private Middle Market firms; most banks use judgmental models

Problem Statement
- Growing interest for private firm models due to Basel II Accord and loan securitization
- Quantitative models can be used as a decisioning tool to:
  - automate mechanical tasks such as financial assessment of a company
  - analyze multidimensional interactions
  - simulate complex what-if scenarios
  - provide early warning signals

Problem Statement
- Given historical data from annual financial statements of defaulted and non-defaulted firms, estimate the probability $P\{y_{t+k} \mid X_t\}$ that a firm will default ($y = 1$) within the next $k$ months from the financial statement date $t$
- For a short-term horizon model, $k = 12$ months

Problem Statement
Independent variables from the literature
- Coverage ratios
  - EBIT / interest
  - EBITDA / interest
- Profitability ratios
  - (net income - extraordinary items) / total assets
  - EBIT / total assets
- Leverage ratios
  - total liabilities / net worth
  - total liabilities / total assets

Problem Statement
Independent variables (cont.)
- Liquidity
  - working capital / total assets
  - current assets / current liabilities
  - cash / total assets
- Activity ratios
  - accounts payable
  - accounts receivable
- Growth ratios (net sales, net income)
- Financial size (assets)
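As a rough illustration of how the coverage, profitability, leverage and liquidity ratios listed above could be derived from one financial statement, here is a minimal sketch. The `Statement` fields, the `compute_ratios` name and the choice to return `None` on a zero denominator are assumptions made for this example, not part of PFirm.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """One annual financial statement (hypothetical field names, one currency unit)."""
    ebit: float
    ebitda: float
    interest: float
    net_income: float
    extraordinary_items: float
    total_assets: float
    total_liabilities: float
    net_worth: float
    current_assets: float
    current_liabilities: float
    cash: float


def safe_div(num, den):
    """Return num / den, or None when the denominator is zero (an assumption)."""
    return num / den if den != 0 else None


def compute_ratios(s):
    """Compute the ratios named on the slides from a single statement."""
    working_capital = s.current_assets - s.current_liabilities
    return {
        # Coverage
        "ebit_to_interest": safe_div(s.ebit, s.interest),
        "ebitda_to_interest": safe_div(s.ebitda, s.interest),
        # Profitability
        "adj_net_income_to_assets": safe_div(
            s.net_income - s.extraordinary_items, s.total_assets),
        "ebit_to_assets": safe_div(s.ebit, s.total_assets),
        # Leverage
        "liabilities_to_net_worth": safe_div(s.total_liabilities, s.net_worth),
        "liabilities_to_assets": safe_div(s.total_liabilities, s.total_assets),
        # Liquidity
        "working_capital_to_assets": safe_div(working_capital, s.total_assets),
        "current_ratio": safe_div(s.current_assets, s.current_liabilities),
        "cash_to_assets": safe_div(s.cash, s.total_assets),
    }
```

Returning `None` instead of raising keeps a missing or zero denominator visible as a missing value, which a later data cleansing step would have to handle.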

Modelling Approaches
- Discriminant Analysis (DA) for estimation of generative models
- Limitations of DA
  - assumes explanatory variables have a multivariate normal distribution
  - requires the proportion of default/non-default in the sample to be the same in the population
  - linear classification rule

Modelling Approaches
- Probit and Logit (discriminative) models
  - $y^*_{t+k} = b x_t + u_t$
  - $y = 1$ if $y^*_{t+k} \ge 0$; $y = 0$ otherwise
  - assumptions about distribution of $u_t$
- pros: estimation of expected probability of default
- violation of assumption about distribution of defaults in the population makes parameter estimates biased
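To make the latent-variable formulation above concrete: when $u_t$ has a distribution that is symmetric around zero, $P\{y = 1\} = P\{u_t \ge -b x_t\} = F(b x_t)$, where $F$ is the logistic CDF for logit and the standard normal CDF for probit. The sketch below only evaluates these two link functions; the function names and the coefficient values in the usage note are hypothetical, and no estimation is shown.

```python
import math


def linear_score(b, x):
    """Inner product b.x of coefficients and input variables."""
    return sum(bi * xi for bi, xi in zip(b, x))


def logit_pd(score):
    """P(y = 1) when u follows a standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-score))


def probit_pd(score):
    """P(y = 1) when u follows a standard normal distribution."""
    return 0.5 * (1.0 + math.erf(score / math.sqrt(2.0)))
```

For example, with made-up coefficients `b = [-1.0, 2.0]` and inputs `x = [0.5, 0.25]`, the score is zero and both links return a probability of one half.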

Modelling Approaches
- Instead of $y_i$ being the (0/1) random variable, suppose the length of time $t_i$ that firm $i$ survives is the random variable
  - each firm either defaults during the sample period, survives the sample period, or leaves the sample for some other reason
- The hazard function $h_d(t; x, b)$ gives the instantaneous probability of the length of time $t$ ending with default, conditional on surviving up to that time
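The "instantaneous probability" described above is, under the usual continuous-time convention, a rate defined as a limit. The expression below is the textbook form of a cause-specific hazard, not a formula taken from the slides; the event label "exit by default" inside the probability is notation introduced here.

```latex
h_d(t; x, b) = \lim_{\Delta \to 0^{+}}
  \frac{P\{\, t \le t_i < t + \Delta,\ \text{exit by default} \mid t_i \ge t;\ x, b \,\}}{\Delta}
```

Firms that survive the sample period or leave it for another reason contribute only through the conditioning event $t_i \ge t$.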

Modelling Approaches
- With hazard models there is no need to assume independence between firm-year observations, as with previous approaches
- All the above modelling approaches are parametric
  - a lot of effort for crafting the form of the model
  - difficult to capture interactions amongst variables

Private Firm Data Mining
- History of financial statements of Canadian companies since 1991
- Exclude real estate firms, financial institutions and government as obligors
- Data cleansing
- Database of private firms
  - 2,177 obligors
  - 8,757 financial statements

Private Firm Data Mining
- Candidate Input Variables:
  - 34 financial variables: debt service coverage, profitability, liquidity, leverage, activity, growth, financial size
  - type of financial statement: 1 for audited and unqualified; 2 for reviewed and compiled; 0 otherwise
- Target Variable: 0/1 (1 = default) in the next 12 months from the F/S date

Private Firm Data Mining
Construct the dataset of observations
- for each defaulted ("bad") obligor, construct one observation of the input variables from financial statements with date at least 12 months prior to default and no more than 24 months prior to default
- for each good obligor and for each financial statement date in our database, construct an observation of the input variables
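The two construction rules above can be sketched as follows. The tuple layout, the function names and the whole-month date arithmetic are assumptions for illustration. The slides do not say which statement is used when several fall in the 12 to 24 month window, so this sketch keeps the one closest to default, which is also an assumption.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def build_observations(statements, default_dates):
    """Build (obligor_id, fs_date, inputs, target) observations.

    statements:    list of (obligor_id, fs_date, inputs) tuples
    default_dates: dict mapping each bad obligor_id to its default date
    """
    observations = []
    bad_choice = {}  # one chosen observation per bad obligor
    for obligor_id, fs_date, inputs in statements:
        default_date = default_dates.get(obligor_id)
        if default_date is None:
            # Good obligor: every statement date yields an observation.
            observations.append((obligor_id, fs_date, inputs, 0))
            continue
        lag = months_between(fs_date, default_date)
        if 12 <= lag <= 24:
            # Assumption: prefer the eligible statement closest to default.
            best = bad_choice.get(obligor_id)
            if best is None or fs_date > best[1]:
                bad_choice[obligor_id] = (obligor_id, fs_date, inputs, 1)
    observations.extend(bad_choice.values())
    return observations
```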

Private Firm Data Mining
Training/Test split of dataset
- Test (out-of-sample) set contains obligors with F/S dates since 1998/02 (temporal constraint)
  - 454 obligors; 760 F/S records
- Training set contains obligors not in test set (cross-sectional constraint) and with F/S dates prior to 1998/02
  - 1446 obligors; 4495 records
- temporal + cross-sectional constraints = true out-of-sample testing
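A minimal sketch of a split that respects both constraints at once is given below. The record layout and the function name are hypothetical, and the cutoff is taken as the first day of February 1998 because the slide gives only the month. Under this reading, statements of test obligors dated before the cutoff are used in neither set.

```python
from datetime import date

CUTOFF = date(1998, 2, 1)  # "1998/02" on the slide; day of month is an assumption


def split_train_test(records, cutoff=CUTOFF):
    """Split (obligor_id, fs_date, ...) records into training and test lists."""
    # Temporal constraint: an obligor belongs to the test set if it has
    # any statement dated on or after the cutoff.
    test_obligors = {r[0] for r in records if r[1] >= cutoff}
    test = [r for r in records if r[0] in test_obligors and r[1] >= cutoff]
    # Cross-sectional constraint: training obligors never appear in the test set.
    train = [r for r in records if r[0] not in test_obligors and r[1] < cutoff]
    return train, test
```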

Private Firm Data Mining
Descriptive statistics of some financial ratios in training set

Attributes           Median    25% Quartile   75% Quartile
Total Assets ($M)    3.947     1.896          10.271
Inventory/COGS       0.1648    0.0861         0.279
Liabilities/Assets   0.693     0.4928         0.85
Net Income Growth    6.235     -38.14         77.5
Net Income/Assets    0.0795    0.037          0.1428
Quick Ratio          0.9107    0.5752         1.4496
RE/A                 0.2359    0.0786         0.4147
Sales Growth         7.625     -1.28          20.94
Cash/Assets          0.0675    0.0148         0.1774
EBIT/Interest        3.33      1.56           8.78


Model Development
Predictive performance
- true bads: actual defaults (bads) correctly predicted as defaults
- true goods: actual good obligors correctly predicted as good
- false bads: actual good obligors incorrectly predicted as defaults (Type II Error)
- false goods: actual defaults incorrectly predicted as good (Type I Error)
- In a probabilistic model there is a tradeoff between true goods and false goods
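The four outcomes above can be counted for any cutoff on the predicted score. The sketch below assumes the score is p(no default) and that an obligor is predicted good when its score is at or above the cutoff; both the function name and that convention are assumptions for illustration.

```python
def confusion_counts(p_no_default, is_bad, cutoff):
    """Count true/false goods and bads at a given cutoff on p(no default)."""
    counts = {"true_bads": 0, "true_goods": 0, "false_bads": 0, "false_goods": 0}
    for p, bad in zip(p_no_default, is_bad):
        predicted_good = p >= cutoff
        if bad and predicted_good:
            counts["false_goods"] += 1   # actual default predicted as good
        elif bad:
            counts["true_bads"] += 1
        elif predicted_good:
            counts["true_goods"] += 1
        else:
            counts["false_bads"] += 1    # actual good predicted as default
    return counts
```

Raising the cutoff removes false goods but also removes true goods, which is the tradeoff stated in the last bullet.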

Model Development
[Figure: distributions of predicted score for good and bad obligors. x-axis: Predicted Score; y-axis: Probability. Labels: Good, Bad, Cutoff, False Goods, False Bads.]
How can we induce from data good and bad distributions with little overlap?

Model Development
Receiver-Operating Characteristic (ROC) curve
[Figure: ROC curves for a Perfect Model, a Good Model and a Random Model, with regions A and B marked. x-axis: Percentile of p(no default) for goods, 0% to 100%; y-axis: Proportion of false goods, 0% to 100%.]

Model Development
- Area under ROC curve is the probability that a randomly selected bad obligor will have a predicted score of no-default less than that of a randomly selected good obligor
  - a measure of separability of two distributions
- Use the area under ROC curve as the performance criterion in an algorithm that learns a model from data
  - criterion = 2*AUROC - 1
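The probabilistic reading of the area under the ROC curve given above can be computed directly by comparing every (good, bad) pair of scores. This is a small illustrative sketch, quadratic in the sample size; counting tied scores as one half is a common convention assumed here, and the function names are hypothetical.

```python
def auroc(scores_good, scores_bad):
    """Estimate P(score of a bad obligor < score of a good obligor).

    Scores are predicted p(no default). Ties count as 1/2 (assumption).
    """
    wins = 0.0
    for g in scores_good:
        for b in scores_bad:
            if b < g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(scores_good) * len(scores_bad))


def criterion(area):
    """Performance criterion from the slide: 2*AUROC - 1."""
    return 2.0 * area - 1.0
```

A random model has an area of 0.5 and a criterion of 0, while perfectly separated distributions give an area and a criterion of 1.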

Model Development
- NBTree is an in-house technique for learning a probabilistic model from data
  - a decision tree (discriminant model) where internal nodes are partitioning the data into subsets and each leaf node contains a generative model for estimating conditional probability using variables not in the path to that leaf
- Let $X = [x_1, x_2, \ldots, x_n]$ be the vector of input variables (financial ratios) and $Y$ the output binary variable (default event)

Model Development
- To compute probability of default $P\{Y = 1 \mid x_1, x_2, \ldots, x_n\}$ one needs to make assumptions for independence amongst input variables
- NBTree learns these assumptions from data by recursively building a decision tree

[Figure: decision tree with a root split on $x_3$. The branch $x_3 \le value_3$ continues with a split on $x_1$ (...); the branch $x_3 > value_3$ ends in a leaf PROB.]

PROB = $P\{Y \mid X, x_3 > value_3\}$, where $X$ denotes the variables excluding $x_3$
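The idea of a tree whose leaves hold generative models can be illustrated with a deliberately small sketch: a single fixed split on one variable, with a naive Bayes model in each leaf that uses only the variables not on the path. This is not the in-house NBTree algorithm, which learns the tree recursively. The fixed split, the 0/1 leaf features, the Laplace smoothing and all function names are assumptions made for this example.

```python
def fit_naive_bayes(rows, labels, features):
    """Bernoulli naive Bayes on 0/1 features with Laplace smoothing."""
    n = {c: sum(1 for y in labels if y == c) for c in (0, 1)}
    prior = {c: (n[c] + 1) / (len(labels) + 2) for c in (0, 1)}
    cond = {}
    for f in features:
        for c in (0, 1):
            ones = sum(1 for r, y in zip(rows, labels) if y == c and r[f] == 1)
            cond[(f, c)] = (ones + 1) / (n[c] + 2)  # P(feature = 1 | class c)
    return prior, cond, features


def nb_prob_default(model, row):
    """P(Y = 1 | leaf features), assuming independence within the leaf."""
    prior, cond, features = model
    like = {}
    for c in (0, 1):
        p = prior[c]
        for f in features:
            q = cond[(f, c)]
            p *= q if row[f] == 1 else 1.0 - q
        like[c] = p
    return like[1] / (like[0] + like[1])


def fit_one_split_tree(rows, labels, split_feature, threshold, leaf_features):
    """Partition on split_feature and fit one naive Bayes model per leaf."""
    tree = {}
    for side in ("le", "gt"):
        idx = [i for i, r in enumerate(rows)
               if (r[split_feature] <= threshold) == (side == "le")]
        tree[side] = fit_naive_bayes([rows[i] for i in idx],
                                     [labels[i] for i in idx], leaf_features)
    return tree


def predict_pd(tree, row, split_feature, threshold):
    """Route the row to its leaf, then apply that leaf's naive Bayes model."""
    side = "le" if row[split_feature] <= threshold else "gt"
    return nb_prob_default(tree[side], row)
```

Because each leaf fits its own conditional probabilities, the effect of a leaf variable on the predicted PD can differ between the two branches, which is one way such a model can capture dependencies that a single naive Bayes model would miss.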

Model Development
- Feature Selection is a hard problem
  - various heuristic approaches, e.g. forward selection, backward selection
- Use an in-house feature selection technique based on genetic algorithms for searching for a best subset of input variables such that the NBTree model has the biggest area under the ROC curve

Model Development
Our feature selection technique selected a best set of model variables (PFirm)
- Profitability1
- Profitability2
- Liquidity1
- Liquidity2
- Leverage1
- Profitability3
- Leverage2
- Leverage3
- Growth1
- Growth2

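Model Development
Graphical Representation of PFirm Model
[Figure: PFirm decision tree with internal nodes X1, X2, X3 and X4 and a PD at each leaf. Branch labels as shown: X1: <= 1016.5, > 1016.5; X2: <= 1.205, > 1.205; X3: <= -0.0989, > -0.989; X4: <= -46, > -46 & < 152, > 152.]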

Model Evaluation
Four benchmark models (Appendix) on the out-of-sample (test) dataset:
- RiskCalc 10-variable model
  - NB. Since RiskCalc is continuously recalibrated by Moody's, its performance is in-sample rather than out-of-sample
- Altman's 5-variable model, by refitting it on our training data
- Shumway's model, by refitting it on our training data
- NI/TA - TL/TA (naïve predictor)

Model Evaluation
[Figure: model comparison graphs]

Model Evaluation
Summary of comparisons based on area under ROC curve in previous graphs and accuracy ratio for area under CAP curve

Model            AUROC     Accuracy Ratio
PFirm            0.6628    0.6542
NI/TA - TL/TA    0.4358    0.4324
RiskCalc         0.5539    0.5396
Altman           0.4774    0.4736
Shumway          0.4605    0.4578

Model Evaluation
Main points from evaluation:
- PFirm seems to be robust to changes in the cycle, since it is trained on expansion years and tested on recession years
- Altman's, Shumway's and naïve-predictor models have almost the same performance
  - they are linear models, in contrast to PFirm and RiskCalc, which are non-linear and perform better
- One of the reasons that PFirm performs better than RiskCalc is that PFirm captures co-dependencies amongst variables

Explaining Model Prediction
Case Study: XYZ Corp., classified date: July 2001

                 May-98                    May-99                    May-00
Variable         Percentile  Rel. Contr.   Percentile  Rel. Contr.   Percentile  Rel. Contr.
profitability1   53.00%      0.73%         21.00%      -4.75%        3.00%       -26.63%
profitability2   33.00%      0.00%         14.00%      0.00%         3.00%       -0.01%
liquidity1       40.00%      -0.01%        41.00%      0.00%         42.00%      0.00%
liquidity2       52.00%      0.00%         30.00%      0.00%         43.00%      0.00%
leverage1        39.00%      -67.70%       61.00%      58.20%        55.00%      23.42%
profitability3   34.00%      -31.46%       15.00%      -32.66%       8.00%       -39.21%
leverage2        81.00%      0.09%         99.00%      0.64%         100.00%     2.67%
leverage3        45.00%      0.00%         18.00%      -0.01%        14.00%      -0.01%
growth1          NaN         NaN           95.00%      1.81%         8.00%       -0.70%
growth2          NaN         NaN           19.00%      -1.92%        6.00%       -7.35%

PD: 0.008518 (May-98), 0.017634 (May-99), 0.692126 (May-00)

Discussion
- PFirm is built on in-house techniques for feature selection and model development
- NBTree is a non-parametric modeling technique that combines the advantages of discriminant and generative techniques
- The evaluation results show that PFirm performs better than benchmark models, including RiskCalc
- Work is underway for incorporating industry factors into PFirm

Appendix: Benchmarks
RiskCalc: a three-stage model
- total assets
- net income/assets
- net income growth
- interest coverage
- quick ratio
- cash & equivalents/assets
- inventories/COGS
- sales growth
- liabilities/assets
- retained earnings/assets

Appendix: Benchmarks
- Two linear models for predicting the probability of default for 1 and 5 years
- Each model is estimated in three stages:
  (i) transform the input data of the model variables into percentiles (binning)
  (ii) build univariate default models by separately fitting each transformed model variable to the target variable
  (iii) use the output of the above model to fit a linear probit model for predicting default
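Stages (i) and (ii) above can be sketched for a single variable as a rank-based percentile transform followed by an observed default rate per percentile bin. This is only an illustration of the general idea, not Moody's procedure: the function names, the equal-width percentile bins and the use of raw bin default rates as the univariate model are assumptions, and the probit fit of stage (iii) is not shown.

```python
import math
from bisect import bisect_right


def to_percentiles(values):
    """Stage (i): map each value to the fraction of observations <= it."""
    ordered = sorted(values)
    n = len(values)
    return [bisect_right(ordered, v) / n for v in values]


def univariate_default_rates(percentiles, defaults, n_bins):
    """Stage (ii): observed default rate in each equal-width percentile bin.

    Returns a list of length n_bins; empty bins are reported as None.
    """
    totals = [0] * n_bins
    bads = [0] * n_bins
    for p, d in zip(percentiles, defaults):
        # Percentiles lie in (0, 1], so ceil(p * n_bins) - 1 is a valid bin index.
        k = math.ceil(p * n_bins) - 1
        totals[k] += 1
        bads[k] += d
    return [bads[k] / totals[k] if totals[k] else None for k in range(n_bins)]
```

In a full pipeline the per-variable default rates would then serve as the transformed inputs to the final probit model.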

Appendix: Benchmarks
- Altman's: logistic regression model
  $Z = b_1 (\text{WorkingCapital}/\text{TotalAssets}) + b_2 (\text{RetainedEarnings}/\text{TotalAssets}) + b_3 (\text{EBIT}/\text{TotalAssets}) + b_4 (\text{BookEquity}/\text{TotalLiabilities}) + b_5 (\text{Sales}/\text{TotalAssets})$
- Shumway's: logistic regression model
  $S = b_1 (\text{NetIncome}/\text{TotalAssets}) + b_2 (\text{TotalLiabilities}/\text{TotalAssets}) + b_3 (\text{CurrentAssets}/\text{CurrentLiabilities})$

Modelling Private Firm Default: PFirm
Grigoris Karakoulas
Business Analytic Solutions
grigoris.karakoulas@cibc.ca