The analysis of credit scoring models Case Study Transilvania Bank

Author: Alexandra Costina Mahika

Introduction

The lending industry has grown rapidly over the past 50 years, and the number of credit applicants keeps growing. Although the decision to grant a loan is most often based on the credit analyst's past experience, intuition alone is not enough; hence the need to develop sound credit scoring models. An assessment is relevant when the scoring model reveals not only the ranking of clients at the time of application but also their likely future payment behavior. Credit scoring models have long been debated in statistics and even in artificial intelligence. Continuous research in this field has increased the accuracy with which a model can classify a given consumer, and because even a slight increase in accuracy can mean large gains for a bank, researchers have developed increasingly sophisticated models. Credit scoring, and also behavior scoring, respond to the bank's need to make safe investments by lending to applicants who fall within the margin of risk the bank is willing to assume.

This paper is intended as an insight into the field of scoring, whose main concern is the continuous development of models. The study draws on a review of the field, which identifies the following analytical methods as relevant: logistic regression, discriminant analysis and neural networks. The thesis is divided into three chapters: an overview of the phenomenon based on a review of the literature, a presentation of the analytical methods used, and their translation into a practical approach.

The case study was conducted on the scoring model used by BT and aims, using discriminant analysis, to answer the question: which future credit applicants will fail to meet their payments? Then, after the bad credits have been eliminated, the aim is to identify, using TOPSIS multi-criteria analysis, which of the customers with a probability of default below 4% the bank should reward with an additional product. The study was conducted on a sample of 250 BT customers who were granted a Practic BT personal loan and whose payment behavior was tracked for 12 months after the loan was granted. The model generated from these records was then applied to another 50 records (BT prospective customers), and those classified with a probability of default below 4% were passed through a second filter. This time, the new set of variables analyzed refers to a higher level of customer loyalty. Together, the scoring determined by discriminant analysis and the TOPSIS classification reveal the ideal client typology for BT.

Chapter 1 Credit Scoring: Concept and theoretical insight

Loans are usually a bank's most important investment. When granting a loan, banks assume risks related to debtors not paying on time. Banks take on a certain amount of risk because they use the money of depositors, shareholders (equity) or loans from other banks to fund their placements. When we analyze credit risk we must remember that it is closely correlated with reinvestment risk.
A customer's failure to repay a debt on time delays the reinvestment of funds that the bank has raised from the market. Credit analysis is the assessment of credit risk, which must be weighed against what the bank expects to obtain from the lending process, namely earnings. These may be direct earnings (interest and fees charged by the bank) or indirect gains (selling other products and services to customers once the relationship has been established).

Credit scoring is the process of modeling the decision to grant credit. It is carried out by banks or other financial institutions and involves statistical methods such as discriminant analysis or time series analysis. Based on a statistical analysis of historical data, certain financial variables are considered important in assessing the borrower's financial stability and strength. This information is summarized and used in a scoring system for accepting or rejecting credit. Pre-scoring is a preliminary score, calculated during the counseling stage from primary information provided by the applicant without supporting documents. Scoring applies a set of predetermined criteria according to which the borrower is assigned a defined risk class and is then accepted or rejected. The analysis is used both for the lending decision and for determining the customer's maximum degree of indebtedness, which is taken into account when assessing the capacity to repay the loan.

The lending industry has developed rapidly in recent years, focusing on the best methods for determining the efficiency of scoring models. The financial crisis has called into question the effectiveness of established models, hence the need to refine scoring models. To lower the reserves they must hold for customer exposures, banks apply the principles laid down by Basel II and have identified a number of key factors that must be analyzed to determine customers' creditworthiness. Multiple studies have contributed to improving scoring accuracy and to a consistent ranking of statistical models. Despite this, classification errors persist, resulting in the misclassification of groups of customers with low potential eligibility.
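The scoring step described above (predetermined criteria, assignment to a risk class, acceptance or rejection, plus a ceiling on indebtedness) can be illustrated with a simple points-based scorecard. This is a minimal sketch, not the Anacred model: the names `Applicant`, `points`, `risk_class`, `max_new_installment` and `decide`, the criteria, the point values, the class cut-offs and the 40% debt-to-income ceiling are all hypothetical, chosen only to show the mechanics.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical parameters, for illustration only (not BT's Anacred settings).
MAX_DEBT_TO_INCOME = 0.40                        # assumed ceiling: installments / net income
RISK_CLASSES = [(70, "A"), (50, "B"), (0, "C")]  # (minimum points, class), best class first


@dataclass
class Applicant:
    age: int
    years_with_employer: float
    net_income: float               # monthly net income, Lei
    existing_installments: float    # monthly installments on other loans, Lei
    delays_last_12_months: int


def points(a: Applicant) -> int:
    """Add up the points awarded on a fixed set of predetermined (hypothetical) criteria."""
    total = 20 if 30 <= a.age <= 50 else 10
    total += 25 if a.years_with_employer >= 3 else 10
    total += 30 if a.delays_last_12_months == 0 else 0
    total += 25 if a.net_income >= 1500 else 10
    return total


def risk_class(score: int) -> str:
    """Map a point total to the first class whose minimum it reaches."""
    for minimum, label in RISK_CLASSES:
        if score >= minimum:
            return label
    return RISK_CLASSES[-1][1]


def max_new_installment(a: Applicant) -> float:
    """Largest extra monthly installment that keeps total debt service within the ceiling."""
    return max(0.0, MAX_DEBT_TO_INCOME * a.net_income - a.existing_installments)


def decide(a: Applicant, requested_installment: float) -> tuple[str, bool]:
    """Return (risk class, accepted?): class C or an installment above the ceiling is rejected."""
    cls = risk_class(points(a))
    accepted = cls != "C" and requested_installment <= max_new_installment(a)
    return cls, accepted


applicant = Applicant(age=35, years_with_employer=4, net_income=1800,
                      existing_installments=300, delays_last_12_months=0)
print(decide(applicant, requested_installment=400))  # ('A', True)
```

Here the 35-year-old applicant scores 100 points, falls into class A and, under the assumed 40% ceiling, can take on at most 420 Lei of new monthly installments. A real scorecard would derive its points from a statistical model rather than set them by hand.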

Yoon Seong Kim and So Young used discriminant analysis and logistic regression to build a scoring model that not only allows the bank to lend safely to a customer but also predicts the customer's future payment behavior. They classified customers into four categories: clients who met their payments and will continue to do so; clients who met their payments but will fall behind; clients who did not meet their payments but will eventually pay; and clients who did not meet their payments and will continue not to. Lyn C. Thomas derived a stereotype of the bad credit client. He used a combination of discriminant analysis, linear regression and multi-criteria analysis, and concluded that a number of variables (payment history, job security) can be integrated into the scoring model to help predict the applicant's future payment behavior. The sample used for his analysis contained equal numbers of borrowers who paid as agreed and borrowers who defaulted. Other comparative studies have found that neural networks produce promising results, and a study by Altman et al. (1994) showed that discriminant analysis, logistic regression and neural networks achieve approximately the same degree of accuracy. Robert B. Avery, Paul S. Calem and Glenn B. Canner showed the importance of taking situational circumstances into account and raised issues that may affect the accuracy of scoring. Studies show that failing to account for macroeconomic circumstances can trigger default risk and lead to errors in the econometric model. It is difficult, however, to incorporate such exceptional circumstances into the model, mainly because lending institutions face limitations in measuring them.
The linear regression model used in that study shows that factors such as the overall economic situation and the employer's performance (elements which, in theory, are unrelated to the customer's profile) can cause difficulties in loan repayment. Although the analysis at the time of a credit application is normally done for individual applicants, we cannot underestimate the impact that the environment can have on the evolution of payments.

Chapter 2 Methods of analysis

Previous studies of the factors that influence scoring were based on three statistical models, or a combination of them. Although the results showed that the independent variables largely determine the classification into categories, multiple tests proved necessary. This paper aims not only to determine the variables that influence a customer's creditworthiness, but also to track the customer's payment behavior over the 12 months after the loan is granted, to find out whether the customer keeps the commitment to make the scheduled payments on time. Earlier studies demonstrated the importance of ranking customers through methods such as logistic regression, neural networks and discriminant analysis.

Logistic regression, also called logit analysis, is a multivariate data analysis method that is used increasingly often because it requires fewer assumptions than, for example, discriminant analysis. Logistic regression models the relationship between a set of independent variables x_i (categorical or continuous) and a dichotomous dependent variable Y (nominal, binary). Unlike classical linear regression, the dependent variable is categorical rather than continuous.

Neural networks are considered a form of non-linear regression and have proved highly relevant and suitable for building credit scoring models. They have mainly been used in corporate scoring, where data are relatively scarcer than in the case of retail applicants.
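A minimal sketch of the logit model described above, in plain Python: the probability that Y = 1 (here, default) is the logistic function of a linear score, log(p / (1 - p)) = b0 + sum(b_i * x_i), and the coefficients are estimated by maximizing the log-likelihood with batch gradient ascent. The two features, the eight data points and the function names are synthetic and purely illustrative; the study itself worked in SPSS, and a statistics package would normally be used for estimation.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """P(Y = 1 | x) under the logit model log(p / (1 - p)) = bias + sum(w_i * x_i)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximum-likelihood estimates via batch gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            error = yi - predict_proba(weights, bias, xi)  # residual on the probability scale
            for j in range(k):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / n
    return weights, bias


# Synthetic, illustrative data (not the BT sample):
# x = [delays in past 12 months, years with current employer], y = 1 for default.
X = [[0, 5], [0, 3], [1, 4], [0, 6], [3, 1], [4, 0.5], [2, 1], [5, 2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights, bias = fit_logistic(X, y)
probs = [predict_proba(weights, bias, x) for x in X]
predicted = [1 if p > 0.5 else 0 for p in probs]
```

Unlike discriminant analysis, this estimation makes no assumption of normally distributed predictors with equal covariance matrices, which is one reason the method is described as requiring fewer conditions.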

Neural networks and classification trees are expert systems: automated procedures capable of learning. Neural networks were designed to mimic the human brain and combine elements of nonlinear regression, discriminant analysis and cluster models. Recently, neural networks have been used predominantly to build credit scoring models and have shown greater accuracy than traditional statistical methods such as discriminant analysis and logistic regression. However, these results depend on a large database and hold only if the independent variables have high discriminating power (errors occur when the analysis includes irrelevant variables).

Discriminant analysis is designed for situations in which the independent variables are quantitative and the dependent variable is qualitative. It groups observations according to a categorical grouping variable and aims to identify which of several quantitative independent variables explain the differences between the categories of the qualitative variable. The steps involved in conducting a discriminant analysis are formulation, estimation, determination of significance, interpretation and validation.

Chapter 3 Case Study Transilvania Bank

A comparative analysis of the three methods led me to consider discriminant analysis the most relevant for Transilvania Bank, since it is the methodology used to build the Anacred application. In assessing the creditworthiness of customers, BT uses a scoring model developed with discriminant analysis. The Anacred application determines not only the maximum amount that can be lent and the level of indebtedness, but also the applicant's customer profile.
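To make the estimation step of discriminant analysis concrete, here is a minimal sketch of a two-group Fisher linear discriminant in plain Python. The discriminant weights are w = S^-1 (m1 - m0), where m0 and m1 are the group means and S is the pooled within-group covariance matrix, and a case is assigned to group 1 when its score w . x exceeds the midpoint between the projected group means. It assumes two predictors, equal priors and equal misclassification costs; the predictors and the ten data points are synthetic, and SPSS's own output (canonical functions, prior probabilities, variable selection) may differ.

```python
def mean(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pooled_covariance(g0, g1):
    """Pooled within-group 2 x 2 covariance matrix of two groups."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (g0, g1):
        m = mean(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(g0) + len(g1) - 2
    return [[v / dof for v in row] for row in s]


def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def fisher_lda(g0, g1):
    """Return weights w = S^-1 (m1 - m0) and the midpoint cut-off c."""
    m0, m1 = mean(g0), mean(g1)
    s_inv = inverse_2x2(pooled_covariance(g0, g1))
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [dot(s_inv[0], diff), dot(s_inv[1], diff)]
    c = 0.5 * (dot(w, m0) + dot(w, m1))  # equal priors and misclassification costs
    return w, c


def classify(w, c, x):
    """Group 1 (good) if the discriminant score exceeds the cut-off, else group 0 (bad)."""
    return 1 if dot(w, x) > c else 0


# Synthetic data: x = [years with current employer, net income in thousand Lei]
bad = [[1, 1.0], [0.5, 1.2], [2, 0.8], [1, 1.5], [1.5, 0.9]]   # group 0
good = [[4, 1.8], [3, 2.0], [5, 1.6], [6, 2.2], [3.5, 1.7]]    # group 1
w, c = fisher_lda(bad, good)
print([classify(w, c, x) for x in bad + good])  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The validation step would then classify cases that were not used to estimate w and c, as the hold-out and cross-validated rows of the classification table do.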

The discriminant analysis was conducted on a sample of 250 records of clients who were granted consumer credit between May 2009 and May 2010. Their payment behavior was followed for 12 months in order to determine the profile of customers who will default (defined as more than three consecutive late installments, 15+ days past due). The aim is to determine the typology of customers who will not honor their repayment schedule even though they are eligible at the time the documentation is analyzed. The purpose of the study is to determine, on a prospective sample of 50 bank customers seeking a loan, which of them the bank will reward with an additional product (a credit card with a 0 limit). While the variables that influence the scoring are determined by discriminant analysis, the ideal customer to be rewarded with a new product is identified using the TOPSIS method.

The variables used to build the scoring model are primarily qualitative, which is why we chose to code them on interval scales. The independent variables were: age, marital status, home phone, mobile phone, ownership status, last school graduated, years with current employer, works for the state or private sector, net income, dependents, other loans (#) and delays in payment in the past 12 months. The dependent variable of the scoring model was Good / Bad (whether or not the client had delayed payments). The analysis was performed in SPSS on a database of 250 customers to whom the bank granted a loan product between May 2009 and May 2010 and whose payment behavior was tracked for 12 months. Another 50 records completed the data set and were used to validate the customer typology: their probability of default was calculated with the previously determined scoring model.

Classification results: the table below is used to assess how well the discriminant function performs and whether it works equally well for each group of the dependent variable. In the original classification of the selected cases (the data used to build the model), 128 of the 139 customers who did not default were correctly classified, and 18 of the 28 "bad credits" were classified accordingly; overall, 87.4% of these cases were correctly classified. In cross-validation, 85.6% of the selected cases were correctly classified, and 83.1% of the unselected (hold-out) cases were correctly classified. Among the 50 ungrouped cases (the prospective customers), 7 were predicted as "bad" and 43 as "good". A cross-validated success rate of 85.6% indicates that the model is relevant: fewer than 1 in 5 cases are misclassified. Of course, the model can still be improved, which is why BT continually reassesses it.

Classification Results (b, c, d)
Dependent variable Good / Bad (0 = bad, 1 = good); columns show predicted group membership.

Cases          Classification              Actual     Pred. 0   Pred. 1   Total
Selected       Original, count             0          18        10        28
                                           1          11        128       139
               Original, %                 0          64.3      35.7      100
                                           1          7.9       92.1      100
               Cross-validated (a), count  0          18        10        28
                                           1          14        125       139
               Cross-validated (a), %      0          64.3      35.7      100
                                           1          10.1      89.9      100
Not selected   Original, count             0          7         7         14
                                           1          7         62        69
                                           Ungrouped  7         43        50
               Original, %                 0          50.0      50.0      100
                                           1          10.1      89.9      100
                                           Ungrouped  14.0      86.0      100

a. Cross-validation is done only for the cases in the analysis. In cross-validation, each case is classified by the functions derived from all cases other than that case.
b. 87.4% of selected original grouped cases correctly classified.
c. 83.1% of unselected original grouped cases correctly classified.
d. 85.6% of selected cross-validated grouped cases correctly classified.
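The percentages in footnotes b, c and d follow directly from the counts in the classification table: the hit rate is the number of correctly classified cases (the diagonal) divided by the total. The short sketch below recomputes them from the reported counts, taking rows as the actual group (0 = bad, 1 = good) and columns as the predicted group; the function names are illustrative. Ungrouped cases have no actual group, so no hit rate can be computed for them.

```python
def hit_rate(matrix):
    """Share of cases on the diagonal of a square confusion matrix (rows = actual group)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def row_percentages(matrix):
    """Percentage of each actual group assigned to each predicted group."""
    return [[100 * v / sum(row) for v in row] for row in matrix]


# Counts from the classification table: rows = actual group (0 = bad, 1 = good),
# columns = predicted group. Ungrouped cases are excluded because they have no actual group.
tables = {
    "Selected, original": [[18, 10], [11, 128]],
    "Selected, cross-validated": [[18, 10], [14, 125]],
    "Unselected, original": [[7, 7], [7, 62]],
}

for name, counts in tables.items():
    print(f"{name}: {100 * hit_rate(counts):.1f}%")
# Selected, original: 87.4%
# Selected, cross-validated: 85.6%
# Unselected, original: 83.1%
```

The row percentages also show where the model is weakest: only 64.3% of the bad credits are recognized as bad, against 92.1% of the good ones, so an overall hit rate can hide a much lower rate on the minority group.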

Of the 50 prospective applicants, 8 were classified by the discriminant model as good customers with a probability above 96%. To determine which of these 8 will receive the extra product, a TOPSIS ranking is computed on a new set of variables: debit card, deposits, salary received through BT, minimum turnover, AEGON insurance, Groupama insurance, direct debit and internet banking. Taking into account these 8 credit applicants and the 8 classification criteria, TOPSIS reveals the following ideal BT client:

Age: 30-50 years
Marital status: not married
Home phone: yes
Cell phone: yes
Ownership: owner
Last school graduated: technical
Years with current employer: 3+ years
Works for the state or private sector: private
Income: 1500-2000 Lei
Dependents: 0
Other loans (#): 2
Delays in payment: 0
Debit card: Mastercard
Internet banking: no
Direct debit: 1
Deposit: deposit in euro
AEGON: no
Groupama: CASCO
Salary received through BT: >12 months
Turnover: >3000 Lei/month

Conclusions

This paper aims to give an overview of the determinants of scoring and of the technical models used. The applied part presents the scoring model used by one of the best-known financial institutions, BT. The results show that the goal is not for a client to have most of the characteristics, but to have the ideal combination of them. If I were to use another set of records, the study would not reveal exactly the same profile (some of the variables are redundant), but we can note the weight of each selection criterion and predict the typology of future applicants.
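The TOPSIS ranking used in the case study to select the ideal client can be sketched as follows. The sketch follows the standard formulation: vector-normalize each criterion, apply the weights, find the positive-ideal and negative-ideal values, measure each alternative's Euclidean distance d+ and d- to them, and rank by the closeness coefficient d- / (d+ + d-). The paper does not report the criterion weights or the applicants' values, so the three applicants, their four criteria, the equal weights and the benefit/cost labels below are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness coefficient of each alternative (row) to the ideal solution.

    matrix[i][j] is alternative i's value on criterion j; benefit[j] is True when
    larger values of criterion j are better and False for cost criteria.
    """
    k = len(weights)
    # 1. Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(k)]
    v = [[weights[j] * row[j] / norms[j] if norms[j] else 0.0 for j in range(k)]
         for row in matrix]
    # 2. Positive-ideal (best) and negative-ideal (worst) value of each criterion.
    columns = list(zip(*v))
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    # 3. Euclidean distances to both ideals and relative closeness d- / (d+ + d-).
    scores = []
    for row in v:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Hypothetical applicants: [BT products held, months of salary received through BT,
# monthly turnover in Lei, delays in payment]; the last criterion is a cost.
applicants = ["A", "B", "C"]
matrix = [[6, 24, 3500, 0], [2, 6, 1200, 2], [4, 12, 3000, 1]]
scores = topsis(matrix, weights=[0.25] * 4, benefit=[True, True, True, False])
ranking = [name for _, name in sorted(zip(scores, applicants), reverse=True)]
print(ranking)  # ['A', 'C', 'B']
```

Because applicant A is best on every criterion its closeness is exactly 1, and B, worst on every criterion, scores 0. With real data no applicant usually dominates, and the weights then decide the trade-offs, which is consistent with the conclusion that the ideal client has the right combination of characteristics rather than most of them.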