Machine Learning Performance over Long Time Frame

Machine Learning Performance over Long Time Frame
Yazhe Li, Tony Bellotti, Niall Adams
Imperial College London
yli16@imperial.ac.uk
Credit Scoring and Credit Control Conference, Aug 2017
Yazhe Li (Imperial College London) Consumer Credit Risk Aug 2017 1 / 29

Acknowledgments
Yazhe Li is a PhD student in the Department of Mathematics, Imperial College London.
This work is supervised by Dr Tony Bellotti (Imperial College London) and Professor Niall Adams (Imperial College London).

Machine Learning in Consumer Credit Risk
Common machine learning methods in the credit risk industry include:
- Logistic Regression
- Penalized Logistic Regression
- Decision Trees
Various studies have investigated further machine learning algorithms in the credit risk industry:
- Random Forests
- Boosted Regression Trees

Methods Background
Penalized Logistic Regression: penalized logistic regression adds penalty terms to the likelihood function of logistic regression.
\[
\text{Objective Function} = L(\beta; x) - \lambda \left[ (1 - \alpha) \tfrac{1}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right], \quad \text{where } \lambda > 0 \text{ and } 0 \le \alpha \le 1 \tag{1}
\]
It is designed for parameter shrinkage and variable selection.
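As a minimal sketch of the penalty term in Equation (1), the following Python functions evaluate the elastic-net penalty and subtract it from a given log-likelihood value. The function names and the idea of passing the log-likelihood as a plain number are assumptions made for illustration; they are not taken from the talk.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Return lam * [(1 - alpha) * 0.5 * ||beta||_2^2 + alpha * ||beta||_1]."""
    if lam <= 0 or not 0.0 <= alpha <= 1.0:
        raise ValueError("require lam > 0 and 0 <= alpha <= 1")
    l2_squared = sum(b * b for b in beta)   # squared L2 norm (ridge part)
    l1 = sum(abs(b) for b in beta)          # L1 norm (lasso part)
    return lam * ((1.0 - alpha) * 0.5 * l2_squared + alpha * l1)


def penalized_objective(log_likelihood, beta, lam, alpha):
    """Penalized objective: log-likelihood minus the elastic-net penalty."""
    return log_likelihood - elastic_net_penalty(beta, lam, alpha)
```

Setting alpha = 1 leaves only the L1 term, which is the lasso case (LLR) used later in the talk; alpha = 0 leaves only the ridge term.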

Methods Background
Decision Trees

Methods Background
Although decision trees are easy to interpret, they are also unstable.
Several ensemble methods based on the tree model, such as boosted regression trees and random forests, have been designed to address this:
- Random forests: build approximately uncorrelated trees and average them.
- Boosted regression trees: sequentially fit many trees to the training set and combine them, weighted by their learning rates.

Research Gaps in Current Literature
Several research gaps remain that are relevant to credit risk:
1. Temporal issue: the relationship between distribution changes in the portfolio (i.e. population drift) and credit risk model performance is an area that needs investigation [5].
2. Extreme class imbalance: high imbalance (one class is rare compared to the other) is a common problem in the credit risk industry. For example, the mortgage default rate can be as low as 0.5% in some data sets. How extreme imbalance influences model behavior in the financial industry remains an open question.

Research Hypotheses
Two hypotheses prior to our experiment:
- Non-linear models (machine learning algorithms) are generally superior to linear models in credit risk modelling, since non-linear models can capture the non-linear patterns in the credit data set.
- Parsimonious models are more robust than complex models over time, because high model complexity can lead to overfitting.

Data Description
Freddie Mac (a US federal government sponsored enterprise) provides decade-long US mortgage credit data, which include several years with extremely low default rates.
The characteristics of the Freddie Mac data address both research gaps: high imbalance and temporal issues.
A mortgage is defined as in default when the borrower is more than 180 days past due on a repayment of their home loan.
In our experiment, the target variable is whether a mortgage moved to default status within the two years following the first payment date.

Data Description
Figure: Sample size and default rate from 2003 to 2013

Experiment Description
After the data preparation process, we deploy five models: Balanced Random Forest (BRF) [1], Boosted Regression Trees (BRT), Undersampled Boosted Regression Trees (BRTU) [3], Logistic Regression (LR) and Lasso Penalized Logistic Regression (LLR).
Experiment procedure:
1. We use data from an individual year (for example, year 2000) as a training set to train the five models.
2. The five models are then used to forecast the data for the four quarters of the third year after the training year (for example, year 2003).
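The pairing of training years and forecast quarters in the procedure above can be sketched as follows. The training range 2000 to 2010 is an assumption inferred from the reported forecast range 2003 to 2013; the function name and the `offset_years` parameter are hypothetical.

```python
def forecast_schedule(first_train_year=2000, last_train_year=2010, offset_years=3):
    """List (train_year, forecast_year, quarter) triples.

    Each model is trained on one calendar year and then scores the four
    quarters of the year that lies `offset_years` after the training year.
    """
    schedule = []
    for train_year in range(first_train_year, last_train_year + 1):
        forecast_year = train_year + offset_years
        for quarter in (1, 2, 3, 4):
            schedule.append((train_year, forecast_year, quarter))
    return schedule
```

With the default arguments the schedule contains 44 forecast quarters, which agrees with the "44 quarters" mentioned later in the talk.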

Experiment Description
Experiment notes:
- The "two-year gap" in our procedure is needed to record the default status of the mortgages in the training set.
- We use AUC as the performance metric.
- In the forecast process, we bootstrap each quarter's data 100 times in order to calculate the mean and the standard deviation of AUC.
- The efficacy of these models for mortgage default forecasting is observed over an 11-year time frame that includes the financial crisis period, which allows us to observe performance over an extended period.
- LR is regarded as the reference benchmark, because it is in common use now [6].
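A minimal sketch of the bootstrap step described in the notes above: resample one quarter's labels and scores with replacement, compute AUC on each resample, and report the mean and standard deviation. The rank-based AUC formula and the choice to redraw resamples that contain only one class are assumptions of this sketch, not details given in the talk.

```python
import random
import statistics


def auc(labels, scores):
    """AUC via the rank-sum formulation; tied scores share their average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1          # 1-based average rank of the tie group
        rank_sum_pos += average_rank * sum(pairs[k][1] for k in range(i, j + 1))
        i = j + 1
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc(labels, scores, n_boot=100, seed=0):
    """Return (mean, standard deviation) of AUC over n_boot bootstrap resamples."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("data must contain both classes")
    rng = random.Random(seed)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:                      # assumption: redraw single-class resamples
            aucs.append(auc(y, [scores[i] for i in idx]))
    return statistics.mean(aucs), statistics.stdev(aucs)
```

Labels are expected to be 0 (no default) or 1 (default), and scores are the predicted default probabilities or any monotone score.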

Empirical Results
Figure: Forecast AUC from 2003 to 2013

Empirical Results
1. We notice the declining performance of LR in the financial crisis period; the other, more advanced methods still perform well.
2. We never observe one classifier continuously dominating LR; there is no "clear winner" in this experiment.

Empirical Results
Figure: rank in each quarter vs. number of quarters
We also use the average rank (from 1 = best to 5 = worst), based on AUC, to evaluate the algorithms' performance.
The average ranks are: LLR (2), BRF (2.13), BRT (3.52), BRTU (3.56), LR (3.77).
Friedman's test [2] shows that, in our experiment, there is a significant difference among the models' performance ranks: Friedman χ² = 51.727 and p-value < 10^-9.
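The ranking and the Friedman statistic can be sketched in a few lines of Python. This is an illustrative version without tie correction; the function names are hypothetical, and the number of quarters (44) is inferred from the forecast range rather than stated on this slide.

```python
def average_ranks(auc_table):
    """Average rank per model; auc_table has one row per quarter, one AUC per model.

    Rank 1 goes to the highest AUC in a quarter; tied AUCs share their average rank.
    """
    k = len(auc_table[0])
    totals = [0.0] * k
    for row in auc_table:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared_rank = (i + j) / 2 + 1
            for m in range(i, j + 1):
                totals[order[m]] += shared_rank
            i = j + 1
    return [t / len(auc_table) for t in totals]


def friedman_chi2(mean_ranks, n_blocks):
    """Friedman chi-square from average ranks over n_blocks quarters (no tie correction)."""
    k = len(mean_ranks)
    centre = (k + 1) / 2
    return 12 * n_blocks / (k * (k + 1)) * sum((r - centre) ** 2 for r in mean_ranks)
```

Plugging in the rounded average ranks above, `friedman_chi2([2, 2.13, 3.52, 3.56, 3.77], 44)` gives about 51.6, close to the reported 51.727; the small gap is plausibly due to rounding of the published ranks.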

Empirical Results
It is important to check to what extent machine learning algorithms perform better than the benchmark algorithm LR.
Thus the highest-ranked technique (LLR) and the second best (BRF) are compared with LR (the worst performer) using a permutation test [4], to check whether there is a significant difference in the mean AUC.

Table: Permutation test p-values
Methods       p-value   AUC Difference
LLR vs BRF    0.3385     0.0049
LR vs BRF     10^-4     -0.0663
LR vs LLR     10^-4     -0.0614

The table shows that both LLR and BRF appear to perform better than LR. However, there is no apparent difference between LLR and BRF.
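A permutation test on the difference in mean AUC can be sketched as below. The talk does not say whether the test was paired, how many permutations were used, or how the p-value was computed, so the unpaired two-sample design, the default of 10,000 permutations and the add-one correction are all assumptions of this sketch.

```python
import random


def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for the difference in means of samples a and b.

    Returns (observed_difference, p_value). Group labels are shuffled n_perm
    times; the p-value uses an add-one correction so it is never exactly zero.
    """
    rng = random.Random(seed)
    n_a, n_b = len(a), len(b)
    observed = sum(a) / n_a - sum(b) / n_b
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here `a` and `b` would hold the per-quarter mean AUC values of two models, for example LR and BRF.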

Discussion
Overall, the results indicate that the efficacy of machine learning algorithms varies over a long time frame.
Both LLR and BRF provide comparatively reliable predictions and significantly outperform LR.

Discussion
1. LLR:
- captures important variables
- is easily interpreted
- extends the existing standard credit scoring model (i.e. LR)
2. BRF:
- is able to select important variables
- has the capacity to handle highly imbalanced data [1]
- our initial experiment results show that BRF outperforms RF in all 44 quarters

Discussion
Table: Lasso coefficient table (2005)
Variable            Coefficient
score               -0.0073
numberborrowers     -0.2688
servicer            -0.5721
LTV                 -0.0211
occupancystatuss     0.7250
OIR                  0.4608
Intercept           -4.7054
other variables      0

Figure: Variable importance of BRF in 2005

Discussion: 3-Year Gap
Figure: Forecast AUC from 2004 to 2013 (3-year gap)

Discussion: 4-Year Gap
Figure: Forecast AUC from 2005 to 2013 (4-year gap)

Discussion
Our results contradict the two prior hypotheses:
- If we use LLR as our linear model, both the non-linear model BRF and the linear model LLR provide a reliable forecast.
- The parsimonious model (LR) is not more robust than a complex model (BRF) over time.
If we increase the time gap to 3 or 4 years, logistic regression still shows declining performance in the financial crisis.

Conclusion
- The efficacy of machine learning algorithms varies, which shows that continuing to use a single kind of model is not appropriate.
- Overall, both LLR and BRF provide a comparatively reliable forecast.
- As the gap time increases, the models' efficacy decreases.
- The declining performance of LR during the financial crisis is significant.

Future Work
- Issues with using logistic regression on highly imbalanced data sets, and remedies for its declining performance in the financial crisis (to be discussed in another talk).
- In financial applications, the costs of false positive and false negative errors differ, which is critical when measuring a model's effectiveness for operational purposes.
- Incorporating cost information into the model building process is meaningful in the credit risk industry.

References I
[1] L. Breiman, C. Chen, and A. Liaw. Using random forest to learn imbalanced data. J. of Machine Learning Research, (666), 2004.
[2] M. Friedman. A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics, 11(1):86-92, 1940.
[3] H. He and E. A. Garcia. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9):1263-1284, 2009.
[4] T. Hesterberg, D. S. Moore, S. Monaghan, A. Clipson, and R. Epstein. Bootstrap methods and permutation tests. Introduction to the Practice of Statistics, 5:1-70, 2005.
[5] G. Krempl and V. Hofer. Classification in presence of drift and latency. In Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, pages 596-603. IEEE, 2011.
[6] S. Lessmann, B. Baesens, H.-V. Seow, and L. C. Thomas. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247(1):124-136, 2015.

Thanks for your attention! Any questions?

Appendix: Empirical Results SD
Figure: SD of forecast AUC from 2003 to 2013

Appendix: Empirical Results SD
Stability is another important criterion for judging the performance of a classifier. We find:
- No algorithm has a consistently lower standard deviation.
- All classifiers' standard deviations are relatively low in 2007/2008.

Appendix: Empirical Results SD
Figure: Mean and SD of AUC vs number of sample points