Could Decision Trees Improve the Classification Accuracy and Interpretability of Loan Granting Decisions?


Jozef Zurada
Department of Computer Information Systems, College of Business, University of Louisville

Abstract

The paper compares the classification performance of eight models: logistic regression (LR), neural network (NN), radial basis function neural network (RBFNN), support vector machine (SVM), case-based reasoning (CBR), and three decision trees (DTs). We build the models and test their classification accuracy rates on a historical data set provided by a German financial institution. The data set contains 21 financial attributes of 1000 customers. Although at the time of application the institution deemed all of these individuals qualified to obtain a loan, 300 of them defaulted on the loan and 700 paid it off. To obtain reliable and unbiased error estimates for each of the eight models, we apply 10-fold cross-validation and repeat each experiment 10 times. We found that in the overall classification accuracy rates at the 0.5 probability cut-off, two of the three DT models significantly outperformed (at α=0.05) the remaining models. We then concentrate on the DT models and compare their performance at the 0.3 and 0.7 cut-off levels, which are more likely to be used by financial institutions. The DT models not only classify better than the other models, but the knowledge they learn in the form of if-then rules is easy to interpret, makes sense, and might be of value to financial institutions, which may have to explain the reasons for a loan denial.

1. Introduction

In recent years the banking industry has experienced fast growth in new mortgages for commercial and residential properties, refinancing, credit cards, and home equity loans, to name a few. According to [7], for example, about 12% of retail expenditure in 1994 was made using credit cards, amounting to a total of about 36 billion British pounds.
This rapid growth also occurred in the former Eastern Bloc countries of Central and Eastern Europe that are now members of the European Union. In 2004 Czech and Slovak banks recorded 33.8% and 36.7% increases in their retail loans, respectively [17]. With this growth, however, there have been significant losses from loan defaults. For example, Manufacturers Hanover's $3.5 billion commercial property portfolio was burdened with $385 million in non-performing loans [14]. Also, since 2008 many U.S. and world financial institutions have reported mounting losses from bad loans and experienced serious levels of financial distress. Some of them sought help from their governments or went bankrupt.

In response, financial institutions have been applying traditional statistical methods and various non-parametric AI methods to increase the accuracy of credit decisions. The ultimate goal of these methods is to detect more precisely the creditworthy applicants, who are granted a loan, thereby increasing profits, and the non-creditworthy applicants, who are denied a loan, thus decreasing losses. Even slight improvements in the accuracy of predicting the creditworthiness of customers can generate substantial future savings. At the same time, many financial institutions [such as the Farm Service Agency (FSA), which lends to farmers] are subject to the provisions of the Equal Credit Opportunity Act (ECOA, 1975) and are obliged by law to give applicants a clear explanation of why a loan is denied.

The paper compares the classification performance of eight models: logistic regression (LR), neural network (NN), radial basis function neural network (RBFNN), support vector machine (SVM), case-based reasoning (CBR), and three decision trees (DTs). We build the models and test their classification accuracy rates on a historical data set provided by a German financial institution. The data set contains 21 financial attributes of 1000 customers. Although at the time of application the institution deemed all of these individuals qualified to obtain a loan, 300 of them defaulted on the loan and 700 paid it off. To obtain reliable and unbiased error estimates for each of the eight models, we apply 10-fold cross-validation and repeat each experiment 10 times. We found that in the overall classification accuracy rates at the 0.5 probability cut-off, two of the three DT models significantly outperformed (at α=0.05) the remaining models. We then concentrate on the DT models and compare their performance at the 0.3 and 0.7 cut-off levels, which are more likely to be used by financial institutions.

Several of the mentioned models, however, are considered black boxes, as the knowledge they learn is opaque and cannot be easily interpreted. For example, all NN architectures store their knowledge in the internal connections between neurons, called weights, which are represented by small numeric values. Although there are techniques for extracting understandable knowledge in the form of if-then rules from NNs, this knowledge is generally not reproducible: an NN trained on exactly the same data, but initialized with a different set of initial weights, is very likely to produce a different set of rules. The LR and SVM models are somewhat better understood, as the knowledge they learn is expressed in regression coefficients and support vector parameters, respectively. The same applies to neuro-fuzzy systems, which store knowledge in the form of fuzzy rules, fuzzy sets, and their membership functions. DTs, however, represent the most transparent, intuitive, and reproducible knowledge because it takes the form of if-then rules that are simple to explain.
The rules make sense and are easy to interpret for business managers, who would need to approve their implementation, and they can be presented to loan applicants as the reason for denying a loan. Considering the ECOA provision and the facts that the DTs perform better than our other models in classifying loans and that the knowledge they learn can be expressed in the form of simple and intuitive if-then rules, the DTs are an obvious choice for the final models.

The paper is organized as follows. Section 2 discusses the previous studies. Section 3 briefly covers several selected methods with the emphasis on DTs, whereas section 4 describes the data set used in the computer simulation. The results are included in section 5. Finally, section 6 concludes the paper and gives some recommendations for future work.

2. Prior research

Many studies in the literature report on the accuracy of machine learning methods such as NNs, DTs, fuzzy systems, genetic algorithms, rough sets, and LR analysis in predicting customer loan default rates. The vast majority of these studies, however, concentrated on the classification accuracy of various models built on data sets drawn from different financial contexts and paid little attention to the models' interpretability. For example, Barney et al. [1] compared the performance of NNs and LR analyses in distinguishing the farmers who had defaulted on their FSA loans from those who paid off the loans as scheduled. The authors found that NNs outperform LR in correctly classifying farmers into those who made timely payments and those who did not. However, neither method could clearly explain the reason for a loan denial. Jagielska et al. [9] investigated the credit risk classification abilities of NNs, fuzzy logic, genetic algorithms, DTs, and rough sets and concluded that the genetic/fuzzy approach compared more favorably with the neuro/fuzzy and rough set approaches and with DTs.
Piramuthu [11],[12] investigated the beneficial aspects of using both NNs and neuro-fuzzy systems for credit-risk evaluation decisions. The author employed three real-world data sets that involved credit-risk evaluation in various forms: credit approval, loan default, and bank failure prediction. NNs performed significantly better than neuro-fuzzy systems in terms of classification accuracy on both training and testing data. Unlike the neuro-fuzzy systems, however, which generate if-then rules that could be partially interpreted, the NN could not explain the rationale behind its credit granting/denial decision. West [18] investigated the credit scoring accuracy of five NN architectures and compared them to traditional statistical methods. The neural architectures included multilayer perceptron, mixture-of-experts, RBFNN, learning vector quantization, and fuzzy adaptive resonance; the traditional models included discriminant analysis, LR, k nearest neighbor, kernel density estimation, and DTs. Using two real-world data sets and testing the models using 10-fold cross-validation, the author found that among the neural architectures the mixture-of-experts and RBFNN did best, whereas among the traditional methods LR analysis was the most accurate. Thomas [16] surveyed the techniques for forecasting the financial risk of lending to consumers. Yang et al. [20] examined the application of neural networks to an early warning system for loan risk assessment. Zurada J. and Zurada M. [22] and Zurada [23] compared the performance of data mining techniques, including decision trees, in predicting the creditworthiness of customers. Also, Feldman and Gross [4] applied decision trees for detecting mortgage default rates. Huang et al. [8] investigated almost a dozen classification techniques including an NN, RBFNN, CBR, rule extraction techniques, Naïve Bayesian classifier, and ensemble classifiers. Using a bank's customers' checking account information, they obtained overall classification accuracy rates between 70% and 81% for the dependent variable taking three distinct values: declined, risky, and good. Yang [21] used SVMs and kernel attribute selection techniques to enhance the predictive performance of the scoring model and add transparency to the final model, respectively. The author tested the models on two real-world data sets containing a large number of attributes and samples, and concluded that kernel methods yield classification results of about 69% and 70% for good and bad loans, respectively, while the stepwise logistic regression analysis produced accuracy rates of about 90% and 24% for the good and bad classes, respectively. The 24% classification accuracy rate for bad loans was certainly unacceptable.

To summarize, it appears that many of the papers we studied concentrate on the overall classification accuracy rates that different tools produce as well as the percentages of good and bad loans classified correctly. In the available literature, the task of extracting useful rules from the created models, and of identifying the attributes that have the best predictive power, seemed to be of secondary importance. Rules could be easily explained to lenders, who would need to approve their implementation, as well as to loan applicants as the reason for denying a loan.

3. Description of selected methods used in this study

In this section we devote our main attention to DTs. They yielded overall classification results better than the other models, and at the same time the rules they produce are intuitive and easy to interpret. As LR and feed-forward NNs with back-propagation are very popular and frequently used in a variety of business and scientific applications, we only describe the three remaining, relatively less known methods (CBR, RBFNN, and SVM), though less formally and in much less detail.

3.1. Decision trees

DTs are fairly simple and widely applied tools for classification. A tree is built of nodes and branches. It has 3 types of nodes: a root node (a top node), internal nodes, and leaf nodes. In a binary tree, the top node has no incoming branches and two outgoing branches. Each internal node has exactly one incoming branch and two outgoing branches. Finally, each leaf node has exactly one incoming branch and no outgoing branches. Each leaf node is assigned a class label. Branches coming out of the root and other internal nodes contain attribute test conditions that separate cases with different characteristics. One of the greatest advantages of DTs is that knowledge can be extracted and represented in the form of classification if-then rules between the input attributes and the target attribute. The operation of DTs is based on the ID3 or C4.5 divide-and-conquer algorithms [10],[13] and on search heuristics which make the clusters at the nodes gradually purer by progressively reducing impurity in the original data set. The algorithms place the attribute that has the most predictive power at the top node of the tree, and they have to find the optimum number of splits and determine where to partition the data to maximize the information gain. The fewer the splits, the more explainable the output is (there are fewer rules to understand). Selecting the best split is based on the degree of impurity of the child nodes.
For example, a node which contains only cases of class good or only cases of class bad has the smallest impurity, 0. Similarly, a node that contains an equal number of cases of class good and class bad has the highest impurity, 1. Impurity can be measured by the well-established concepts of entropy and information gain. Other popular measures include Gini reduction, classification error, and chi-square. We formally introduce these measures below. Given a collection S, containing positive (good) and negative (bad) examples of some target concept, the entropy of S relative to this Boolean classification is

$$\mathrm{Entropy}(S) = -p_{good} \log_2 p_{good} - p_{bad} \log_2 p_{bad}$$

where p_good is the proportion of positive examples in S and p_bad is the proportion of negative examples in S. If the output variable takes on k different values, with p_i denoting the proportion of examples in S that belong to class i, then the entropy of S relative to this k-wise classification is defined as

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{k} p_i \log_2 p_i$$
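The entropy definitions above can be checked numerically. The following is a minimal sketch, not part of the study: the function name `entropy` and the choice of Python are illustrative assumptions, and the 700/300 counts simply mirror the good/bad proportions of the data set.

```python
import math

def entropy(counts):
    """Base-2 entropy of a non-empty node, given the number of cases per class."""
    total = sum(counts)
    # Empty classes are skipped, following the convention 0 * log2(0) = 0.
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

pure_node = entropy([50, 0])         # only one class present: lowest impurity
balanced_node = entropy([50, 50])    # equal numbers of good and bad: highest impurity
whole_data_set = entropy([700, 300]) # 700 good and 300 bad loans
```

With two classes the value lies between 0 and 1; with k classes the maximum is log2(k), reached when all classes are equally frequent.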

The Gini index and the classification error of S are defined as

$$\mathrm{Gini}(S) = 1 - \sum_{i=1}^{k} p_i^2$$

$$\mathrm{Classification\ error}(S) = 1 - \max_i (p_i)$$

For example, if impurity is measured by entropy, the information gain, Gain(S,A), of an attribute A relative to a collection of examples S can be computed as

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \mathrm{Entropy}(S_v)$$

where Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has the value v, i.e., $S_v = \{ s \in S \mid A(s) = v \}$.

Chi-squared splitting criteria measure the reduction in variability of the target distribution in the branch (child) nodes. Specifically, the likelihood ratio Pearson chi-squared test statistic is a measure of association between the categories of the dependent variable and the branch nodes. This test statistic can be used to judge the worth of the split; it measures the difference between the observed cell counts and what would be expected if the branches and target classes were independent.

Classifying a test record is straightforward once a DT has been built. Starting from the root node, one applies the test condition to the case and follows the appropriate branch based on the outcome of the test. This leads either to another internal node, for which a new test condition is applied, or to a leaf node. The predicted class label associated with the leaf node is then assigned to the record. The predicted class assigned to each node depends on three factors: (1) the assumed prior probability of each class within the data set, (2) the decision loss or cost matrix, and (3) the fraction of cases with each outcome in the training data set that end up in each node. The criteria used to assign predicted classes to each node are as follows.
If C(good|bad) is the cost of classifying bad as good, π(bad) is the prior probability of bad, N_bad is the number of cases of class bad in the data set, and N_bad(t) is the number of cases of class bad in node t, then the node is assigned class bad if

$$C(good \mid bad)\, \pi(bad)\, \frac{N_{bad}(t)}{N_{bad}} > C(bad \mid good)\, \pi(good)\, \frac{N_{good}(t)}{N_{good}}$$

holds for the competing class good. For more details on DTs, refer to [5],[6],[10],[15], and SAS Enterprise Miner.

3.2. The other selected methods

SVM algorithms form a blend of linear modeling and instance-based learning. They deal with classification tasks and are particularly useful in solving nonlinear problems, as they use linear models to implement nonlinear class boundaries. They select a few critical boundary instances, called support vectors, from each class and form a linear discriminant function that separates them as widely as possible. Using a nonlinear mapping, the algorithms transform the input space into a new space. A linear model built in the new space represents a nonlinear decision boundary in the original space. To implement nonlinear mappings, the SVM algorithms use the polynomial, radial basis function, or sigmoid kernel functions. The algorithms also implement a special kind of model, a maximum margin hyperplane, which gives the greatest separation between the classes. The instances with the minimum distance to the hyperplane are called support vectors. Compared with DTs, for example, SVMs are slow but often yield accurate classifiers because they create subtle and complex decision boundaries.

An RBFNN differs from a feed-forward NN with back-propagation in the way the hidden neurons perform computations. Each neuron represents a point in input space, and its output for a given training pattern depends on the distance between its point and the pattern. The closer these two points are, the stronger the activation. The RBFNN uses nonlinear bell-shaped Gaussian activation functions whose width may be different for each neuron. The output layer forms a linear combination of the outputs of the neurons in the hidden layer, which is fed to the sigmoid function. A network learns two sets of parameters: the centers and widths of the Gaussian functions, obtained by clustering, and the weights used to form the linear combination of the outputs of the hidden layer. As the first set of parameters can be obtained independently of the second set, an RBFNN learns almost instantly if the number of hidden units is much smaller than the number of training patterns. Unlike a feed-forward NN with back-propagation, however, the RBFNN cannot learn to ignore irrelevant attributes because it gives them the same weight in distance computations.

Broadly construed, CBR is the method of solving new problems based on the solutions of similar past cases. The method requires no model to be fitted or function to be estimated. Instead it requires all cases with their known solutions to be maintained in memory, and when a prediction is required, the method recalls items from memory and predicts the value of the target. In solving a new case, the CBR approach retrieves a case it deems sufficiently similar and uses that case as a basis for solving the new case.

The method uses a k-nearest neighbor algorithm to classify cases. The k-nearest neighbor algorithm takes a data set of existing cases and a new case to be classified, where each existing case in the data set is composed of a set of variables and the new case has one value for each variable. The algorithm computes the distance between each existing case and the new case: the normalized Euclidean or Manhattan distance for numeric attributes, or the Hamming distance for nominal or ordinal attributes. The k existing cases that have the smallest distances to the new case are the k-nearest neighbors of that case. Each of the k-nearest neighbors votes, with its own target value, on the target value for the new case. The votes are the posterior probabilities for the class dependent variable. There are two challenging tasks in the successful application of CBR: choosing the right value for k and choosing the proper distance measure. For more details on the three methods refer to [19].

4. Data set used in the study

All data sets used in modeling loan classification tasks have an inherent problem, as they contain only those customers who were actually granted a loan. There are other individuals whose loan applications were rejected, and we do not know whether or not they would have been at risk. In other words, all data sets used in loan granting contexts consist of records about individuals that banks considered to be creditworthy, as all of them obtained a loan. Although these considerations do not affect the validity of the analysis, we should keep them in mind. The data sets used to build loan classification models come from various loan-granting contexts. For example, Desai et al. [2] used 3 unbalanced data sets and 18 features that described the financial characteristics of about 900 customers of 3 credit unions.
The Australian scoring data [13] were similar but more balanced, with 307 and 383 individuals who experienced loan defaults and paid off a loan, respectively. Zurada J. and Zurada M. [22] used a data set containing 5860 samples and 13 variables. In this study, we use a data set that is the same as, or very similar to, the one used in [3], [5], [18], and [23]. The data set contains the following 21 financial attributes (expressed on nominal, ordinal, or interval scales) describing each of 1000 customers: (1) AGE (of applicant in years), (2) AMOUNT (of credit requested), (3) CHECKING (balance in existing checking account), (4) COAPP (other debtors or guarantors), (5) DEPENDS (number of dependents), (6) DURATION (length of loan in months), (7) EMPLOYED (length of present employment), (8) EXISTCR (number of existing accounts at this bank), (9) FOREIGN (foreign worker or not), (10) HISTORY (credit history), (11) HOUSING (rent, own, free), (12) INSTALLP (debt as a percent of disposable income), (13) JOB (employment status), (14) MARITAL (marital status and gender), (15) OTHER (other installment loans), (16) PROPERTY (collateral property for loan), (17) PURPOSE (reason for loan request), (18) RESIDENT (years at current address), (19) SAVINGS (savings account balance), (20) TELEPHON (telephone: none or registered under the customer's name), (21) GOOD_BAD (credit rating status: bad [loan denied] or good [loan granted]), the output variable.

5. Experiments and results

We built and tested eight models (LR, NN, RBFNN, SVM, CBR, and three DT methods, i.e., chi-square, entropy reduction, and Gini reduction) and compared their predictive performance on the same data set containing 1000 samples and 21 attributes. The attributes are commonly used in loan application contexts. We applied 10-fold cross-validation and ran each experiment 10 times to obtain unbiased and reliable error estimates.
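The resampling scheme just described, 10-fold cross-validation repeated 10 times, can be sketched as follows. This is only an illustration under stated assumptions: the generator name `repeated_kfold`, its parameters, and the plain (non-stratified) random partitioning are hypothetical choices, since the text does not say how the folds were drawn.

```python
import random

def repeated_kfold(n_cases, k=10, repeats=10, seed=0):
    """Yield (run, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for run in range(repeats):
        idx = list(range(n_cases))
        rng.shuffle(idx)                       # a fresh random partition for every run
        folds = [idx[f::k] for f in range(k)]  # k nearly equal-sized, disjoint folds
        for f in range(k):
            test_idx = folds[f]
            # Training cases are all cases outside the current test fold.
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            yield run, f, train_idx, test_idx

# 1000 customers, 10 folds, 10 runs: 100 train/test splits in total.
splits = list(repeated_kfold(1000))
```

Each of the 100 splits trains on 900 cases and tests on the remaining 100, and averaging the accuracy over all splits gives a single estimate per model.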
This technique also ensured that the training data sets used to build the models were fully independent of the test data sets on which the performance of the models was evaluated. We calculated the average classification accuracy rates over all folds and runs. Using one of the three DTs as the reference point, we performed a paired two-tailed t-test to find out whether the differences in the performance of the models are statistically significant at α=0.05 [19]. We chose to report only the results for the three cut-offs of 0.3 (30%), 0.5 (50%), and 0.7 (70%). For example, a 0.3 cut-off means that a Type II error (classifying a bad loan as a good loan) is 3.3 times more costly than a Type I error (classifying a good loan as a bad loan). This cut-off may be applicable to situations in which banks do not secure smaller loans, i.e., do not hold any collateral. The 0.5 cut-off means that the costs of making Type I and Type II errors are the same, whereas the 0.7 cut-off implies that the cost of making a Type I error is smaller than the cost of a Type II error. This cut-off may typically be used when a financial institution secures larger loans by holding collateral such as the customer's home.

Table 1 shows that at the 0.5 cut-off, the entropy reduction DT method performs significantly better in the overall classification accuracy rates than the seven other methods. The Gini DT comes a close second. These trees are also superior to the remaining models in classifying good loans. However, LR and NN seem to be more effective in detecting bad loans. Because the overall classification accuracy rates of the entropy reduction and Gini DTs seem to outperform the other six models, we choose DTs for further analysis.

Table 1. Correct classification accuracy rates [%] for the 8 models at a 0.5 probability cut-off
Columns: DT (Entropy Reduction), DT (Chi Square), DT (Gini), LR, NN, RBFNN, SVM, CBR
Overall: w w 71.8 w 73.5 w 75.5 w 73.6 w
Good Loans: w w 81.3 w 86.0 w 87.4 w 89.6 w
Bad Loans: w b 49.6 b b 43.4 w
w: Significantly worse than the entropy reduction DT at the 0.05 significance level.
b: Significantly better than the entropy reduction DT at the 0.05 significance level.

Consequently, in Table 2 we present the classification accuracy rates for the three DTs at the 0.3 and 0.7 cut-off levels. While the differences in the overall classification rates and in detecting good loans are statistically insignificant across all methods at a 0.3 cut-off, the chi square method outperformed the two other tree models in detecting bad loans.

Table 2. Correct classification accuracy rates [%] for the 3 DT models at the 0.3 and 0.7 probability cut-offs
Columns: DT (Entropy Reduction), DT (Chi Square), DT (Gini) at cut-off=0.3; DT (Entropy Reduction), DT (Chi Square), DT (Gini) at cut-off=0.7
Overall:
Good Loans:
Bad Loans: b w 5.8 w
w: Significantly worse than the entropy reduction DT at the 0.05 significance level.
b: Significantly better than the entropy reduction DT at the 0.05 significance level.

The receiver operating characteristic (ROC) charts are graphical displays that give a global measure of the predictive accuracy of the three DT models (Figure 1). They display the sensitivity against 1-specificity of a classifier for a range of cut-offs. Sensitivity is a measure of accuracy for predicting events (detecting bad loans) that is equal to the number of true positives divided by the total number of actual positives.
Specificity is a measure of accuracy for predicting nonevents (detecting good loans) that is equal to the number of true negatives divided by the total number of actual negatives, so 1-specificity is the share of actual negatives that are misclassified. Each point on the curves represents a cut-off probability. Points closer to the upper-right corner correspond to low cut-off probabilities. Points in the lower left correspond to higher cut-off probabilities. The extreme points (1,1) and (0,0) represent no-data rules where all cases are classified into class 1 or class 0, respectively. The performance quality of the models is demonstrated by the degree to which the ROC curves push upward and to the left. The area under the curves provides a quantitative performance measure. The area ranges from 50, for a worthless model, to 100, for a perfect classifier. The ROC charts indicate that the predictive power of the three DT models is good and that the three curves vary somewhat between the 0.3 and 0.6 cut-off levels. The most noticeable differences between the three models can be seen at the 0.5 cut-off value.
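How a ROC curve and the area under it arise from a set of cut-offs can be illustrated with a short sketch. The function names `roc_points` and `auc` and the six scored cases are hypothetical and are not taken from the study; the event is taken to be a bad loan, so the horizontal axis (1-specificity) is the share of good loans wrongly flagged as bad.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per cut-off.

    scores: predicted probability of the event (here, a bad loan).
    labels: 1 for an actual event (bad loan), 0 for a nonevent (good loan).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cut-off above every score: no case is flagged as bad
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule, on a 0-1 scale."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores for three bad (1) and three good (0) loans.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
area = auc(roc_points(scores, labels))
```

Multiplying the returned area by 100 gives the 50-to-100 scale used in the text. Lowering the cut-off moves along the curve from (0,0) toward (1,1), which is why low cut-offs sit near the upper-right corner.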

In Figure 2 we present the DT which not only exhibited one of the best classification accuracy rates, but also used the fewest attributes to classify individuals into those who paid the loan off and those who defaulted on it. All three DTs have been very consistent in identifying the following 5 input attributes (out of 20) on which the classification rules are based: CHECKING (balance in existing checking account), DURATION (length of loan in months), SAVINGS (savings account balance), HISTORY (credit history), and PURPOSE (reason for loan request).

Figure 1. The ROC charts for the chi square, entropy reduction, and Gini DTs, and the worthless model (sensitivity plotted against 1-specificity, with the 0.3, 0.5, and 0.7 cut-offs marked)

The textual form of the pruned tree with 17 leaves and 25 nodes, created using the first three variables, follows.

checking = 1
|   duration <= 11: 1 (39.0/7.0)
|   duration > 11
|   |   savings = 1
|   |   |   duration <= 15: 1 (60.0/25.0)
|   |   |   duration > 15: 0 (130.0/48.0)
|   |   savings = 2
|   |   |   duration <= 15: 0 (3.0)
|   |   |   duration > 15: 1 (7.0/2.0)
|   |   savings = 3: 1 (5.0/2.0)
|   |   savings = 4: 1 (5.0)
|   |   savings = 5: 0 (25.0/11.0)
checking = 2
|   savings = 1
|   |   duration <= 22: 1 (91.0/25.0)
|   |   duration > 22: 0 (61.0/19.0)
|   savings = 2: 0 (47.0/23.0)
|   savings = 3: 1 (11.0/3.0)
|   savings = 4
|   |   duration <= 10: 0 (2.0)
|   |   duration > 10: 1 (12.0/2.0)
|   savings = 5: 1 (45.0/7.0)
checking = 3: 1 (63.0/14.0)
checking = 4: 1 (394.0/46.0)

Legend:
(a) Checking Account Balance: 1 = less than 0 DM; 2 = more than 0 but less than 200 DM; 3 = at least 200 DM; 4 = no checking account
(b) Length of Loan [in months]
(c) Savings Account Balance: 1 = less than 100 DM; 2 = at least 100, but less than 500 DM; 3 = at least 500, but less than 1000 DM; 4 = at least 1000 DM; 5 = unknown / no savings account
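Because every path through the pruned tree listed above is an if-then rule, the listing can be transcribed directly into code. The sketch below is a hand transcription for illustration only; the names `classify` and `leaf_accuracy` are hypothetical, and inputs are assumed to use the category codes from the legend.

```python
def classify(checking, duration, savings):
    """Return 1 (good) or 0 (bad) by following the pruned tree listed above.

    checking and savings are the legend's category codes (assumed valid);
    duration is the length of the loan in months.
    """
    if checking == 1:
        if duration <= 11:
            return 1
        if savings == 1:
            return 1 if duration <= 15 else 0
        if savings == 2:
            return 0 if duration <= 15 else 1
        return 0 if savings == 5 else 1   # savings 3 and 4 lead to good leaves
    if checking == 2:
        if savings == 1:
            return 1 if duration <= 22 else 0
        if savings == 2:
            return 0
        if savings == 4:
            return 0 if duration <= 10 else 1
        return 1                          # savings 3 and 5 lead to good leaves
    return 1                              # checking 3 and 4 are good leaves

def leaf_accuracy(total, misclassified=0.0):
    """Share of the cases reaching a leaf that carry the leaf's class label."""
    return (total - misclassified) / total

# Leaf annotated (130.0/48.0): 82 of 130 cases are classified correctly.
share_correct = leaf_accuracy(130.0, 48.0)
```

For instance, `classify(1, 16, 1)` follows checking = 1, duration > 11, savings = 1, duration > 15 and ends in the leaf labeled 0 (bad).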

Figure 2. The example DT

The first level split is on the CHECKING attribute, and the second and third level splits are on DURATION or SAVINGS. The 1 or 0 following a test condition is a class label representing good or bad, respectively. The two numbers in parentheses in a leaf, such as (39.0/7.0), indicate the total number of instances that reach that leaf and the number of misclassified instances, respectively. In Figure 2 we show the graphical form of the DT with the first three variables only. (Because of space limitations and practicality we do not show bigger trees with four or five attributes.) One can see that it is easy to extract the rules from the tree. One of the rules that can be extracted from the bottom left of the tree is:

IF checking=1 (less than 0 DM) AND duration > 11 AND savings=1 (less than 100 DM) AND duration > 15 months THEN the predicted value of the leaf is 0 (bad); 82 out of 130 cases (63.1%) are classified correctly as bad loans and 48 out of 130 cases (36.9%) are misclassified.

6. Conclusion and suggestions for future research

Using a single data set, the paper evaluates the predictive performance of eight models, including three models involving DTs. The study also discusses the forms of knowledge learned by these models. The simulation results show that the DTs' overall classification accuracy rates are better than those of the other models and that the knowledge they learn during training is represented in the form of intuitive, easy-to-understand rules that make sense. DTs are linear models that separate the instance space using lines, planes, or hyperplanes, and they deal with one variable at a time. Their implementation might be considered by financial institutions which by law may be obliged to give reasons for a loan denial.

More testing is needed using several larger data sets with different attributes describing the financial status of loan applicants. It is also proposed to investigate neuro-fuzzy systems, which can learn interpretable fuzzy rules and fuzzy sets from input-output data pairs.

7. References

[1] D.K. Barney, O.F. Graves, and J.D. Johnson, "The Farmers Home Administration and Farm Debt Failure Prediction", Journal of Accounting and Public Policy, 1999, Vol. 18.
[2] V.S. Desai, J.N. Crook, and G.A. Overstreet, Jr, "A Comparison of Neural Networks and Linear Scoring Models in the Credit Union Environment", European Journal of Operational Research, 1996, Vol. 95.
[3] L. Fahrmeir and A. Hamerle, Multivariate Statistical Modeling Based on Generalized Linear Programs, Springer-Verlag, Berlin.
[4] D. Feldman and S. Gross, "Mortgage Default: Classification Tree Analysis", Journal of Real Estate Finance and Economics, 2005, Vol. 30.
[5] P. Giudici, Applied Data Mining: Statistical Methods for Business and Industry, John Wiley & Sons Ltd., Chichester, West Sussex, England.
[6] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA.
[7] D. Hand and W. Henley, "Statistical Classification Method in Consumer Scoring: A Review", Journal of the Royal Statistical Society, Series A (Statistics in Society), 1997, Vol. 160.
[8] Y-H. Huang, C-M. Hung, and H.C. Jiau, "Evaluation of Neural Networks and Data Mining Methods on a Credit Assessment Task for Class Imbalance Problem", Nonlinear Analysis: Real World Applications, 2006, 7.
[9] I. Jagielska, C. Matthews, and T. Whitfort, "An Investigation into the Application of Neural Networks, Fuzzy Logic, Genetic Algorithms, and Rough Sets to Automated Knowledge Acquisition for Classification Problems", Neurocomputing, 1999, Vol. 24.
[10] T.M. Mitchell, Machine Learning, WCB/McGraw-Hill, Boston, Massachusetts.
[11] S. Piramuthu, "Financial Credit-Risk Evaluation with Neural and Neurofuzzy Systems", European Journal of Operational Research, 1999, Vol. 112.
[12] S. Piramuthu, "Feature Selection for Financial Credit-Risk Evaluation Decisions", INFORMS Journal on Computing, 1999, Vol. 11, No. 3.
[13] J.R. Quinlan, "Simplifying Decision Trees", International Journal of Man-Machine Studies, 1987, Vol. 27.
[14] E. Rosenberg and A. Gleidt, "Quantitative Methods in Credit Management: A Survey", Operations Research, 1994, Vol. 42.
[15] P-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley.
[16] L.C. Thomas, "A Survey of Credit and Behavioral Scoring: Forecasting Financial Risk of Lending to Consumers", International Journal of Forecasting, 2000, Vol. 16.
[17] M. Vojtek and E. Kocenda, "Credit Scoring Methods", Czech Journal of Economics and Finance, 2006, Vol. 56, Issue 3-4.
[18] D. West, "Neural Network Credit Scoring Models", Computers & Operations Research, 2000, Vol. 27.
[19] I.H. Witten and E. Frank, Data Mining: Practical Learning Tools and Techniques, Morgan Kaufmann Publishers.
[20] B. Yang, L.X. Li, H. Ji, and J. Xu, "An Early Warning System for Loan Risk Assessment Using Artificial Neural Networks", Knowledge-Based Systems, 2001, Vol. 14.
[21] Y. Yang, "Adaptive Credit Scoring with Kernel Learning Methods", European Journal of Operational Research, 2007, Vol. 183, Issue 3.
[22] J. Zurada and M. Zurada, "How Secure are 'Good Loans': Validating Loan-Granting Decisions and Predicting Default Rates on Consumer Loans", Review of Business Information Systems, 2002, 6(3).
[23] J. Zurada, "Rule Induction Methods for Credit Scoring", Review of Business Information Systems, 2007, Vol. 11, 2.
