Could Decision Trees Improve the Classification Accuracy and Interpretability of Loan Granting Decisions?


Jozef Zurada
Department of Computer Information Systems, College of Business, University of Louisville

Abstract

The paper compares the classification performance of eight models: logistic regression (LR), neural network (NN), radial basis function neural network (RBFNN), support vector machine (SVM), case-based reasoning (CBR), and three decision trees (DTs). We build the models and test their classification accuracy rates on a historical data set provided by a German financial institution. The data set contains 21 financial attributes of 1000 customers. Although at the time of application the institution deemed all of these individuals qualified to obtain a loan, 300 of them defaulted and 700 paid the loan off. To obtain reliable and unbiased error estimates for each of the eight models we apply 10-fold cross-validation and repeat the experiment 10 times. We found that in the overall classification accuracy rates at a 0.5 probability cut-off, two of the three DT models significantly outperformed (at α=0.05) the remaining models. We then concentrate on the DT models and compare their performance at the 0.3 and 0.7 cut-off levels, which are more likely to be used by financial institutions. The DT models not only classify better than the other models, but the knowledge they learn in the form of if-then rules is easy to interpret, makes sense, and might be of value to financial institutions which may have to explain the reasons for a loan denial.

1. Introduction

In recent years the banking industry has experienced fast growth in new mortgages for commercial and residential properties, refinancing, credit cards, and home equity loans, to name a few. According to [7], for example, about 12% of retail expenditure in 1994 was made using credit cards, amounting to a total of about 36 billion British pounds.
This rapid growth also occurred in the former Eastern Bloc countries of Central and Eastern Europe that are now members of the European Union. In 2004 Czech and Slovak banks recorded 33.8% and 36.7% increases in their retail loans, respectively [17]. With this growth, however, there have been significant losses from loan defaults. For example, Manufacturers Hanover's $3.5 billion commercial property portfolio was burdened with $385 million in non-performing loans [14]. Also, since 2008 many U.S. and world financial institutions have reported mounting losses from bad loans and experienced serious levels of financial distress. Some of them sought help from their governments or went bankrupt.

In response, financial institutions have been applying traditional statistical methods and various non-parametric AI methods to increase the accuracy of credit decisions. The ultimate goal of these methods is to detect more precisely the creditworthy applicants, who are granted a loan, thereby increasing profits, and the non-creditworthy applicants, who are denied a loan, thus decreasing losses. Even slight improvements in the accuracy of predicting the creditworthiness of customers can generate substantial future savings. At the same time, many financial institutions [such as the Farm Service Agency (FSA), which lends to farmers] are subject to the provisions of the Equal Credit Opportunity Act (ECOA, 1975) and are obliged by law to give applicants a clear explanation of why a loan is denied.

The paper compares the classification performance of eight models: logistic regression (LR), neural network (NN), radial basis function neural network (RBFNN), support vector machine (SVM), case-based reasoning (CBR), and three decision trees (DTs). We build models and test their

classification accuracy rates on a historical data set provided by a German financial institution. The data set contains 21 financial attributes of 1000 customers. Although at the time of application the institution deemed all of these individuals qualified to obtain a loan, 300 of them defaulted and 700 paid the loan off. To obtain reliable and unbiased error estimates for each of the eight models we apply 10-fold cross-validation and repeat the experiment 10 times. We found that in the overall classification accuracy rates at a 0.5 probability cut-off, two of the three DT models significantly outperformed (at α=0.05) the remaining models. We then concentrate on the DT models and compare their performance at the 0.3 and 0.7 cut-off levels, which are more likely to be used by financial institutions.

Several of the mentioned models, however, are considered black boxes because the knowledge they learn is opaque and cannot be easily interpreted. For example, all NN architectures store their knowledge in the internal connections between neurons, called weights, which are represented by small numeric values. Although there are techniques for extracting understandable knowledge in the form of if-then rules from NNs, this knowledge is generally not reproducible: an NN trained on exactly the same data, but initialized with a different set of weights, is very likely to produce a different set of rules. The LR and SVM models are somewhat better understood because the knowledge they learn is expressed in regression coefficients and support vector parameters, respectively. The same applies to neuro-fuzzy systems, which store knowledge in the form of fuzzy rules and fuzzy sets with their membership functions. DTs, however, represent the most transparent, intuitive, and reproducible knowledge because it takes the form of simple if-then rules.
The rules make sense and are easy to interpret by business managers, who would need to approve their implementation, and they can be given to loan applicants as the reason for denying a loan. Considering the ECOA provision, the fact that the DTs perform better than our other models in classifying loans, and the fact that the knowledge they learn can be expressed as simple and intuitive if-then rules, the DTs are an obvious choice for the final models.

The paper is organized as follows. Section 2 discusses previous studies. Section 3 briefly covers several selected methods with the emphasis on DTs, whereas section 4 describes the data set used in the computer simulation. The results are presented in section 5. Finally, section 6 concludes the paper and gives some recommendations for future work.

2. Prior research

Many studies in the literature report on the accuracy of machine learning methods such as NNs, DTs, fuzzy systems, genetic algorithms, rough sets, and LR analysis in predicting customer loan default rates. The vast majority of these studies, however, concentrated on the classification accuracy of various models built on data sets drawn from different financial contexts and paid little attention to the models' interpretability. For example, Barney et al. [1] compared the performance of NNs and LR analysis in distinguishing farmers who had defaulted on their FSA loans from farmers who paid off their loans as scheduled. The authors found that NNs outperform LR in correctly classifying farmers into those who made timely payments and those who did not. However, neither method could clearly explain the reason for a loan denial. Jagielska et al. [9] investigated the credit risk classification abilities of NNs, fuzzy logic, genetic algorithms, DTs, and rough sets and concluded that the genetic/fuzzy approach compared more favorably with the neuro/fuzzy and rough set approaches and with DTs.
Piramuthu [11],[12] investigated the beneficial aspects of using both NNs and neuro-fuzzy systems for credit-risk evaluation decisions. The author employed three real-world data sets that involved credit-risk evaluation in various forms: credit approval, loan default, and bank failure prediction. NNs performed significantly better than neuro-fuzzy systems in terms of classification accuracy on both training and testing data. Unlike the neuro-fuzzy systems, however, which generate if-then rules that can be partially interpreted, the NN could not explain the rationale behind its credit granting/denial decision. West [18] investigated the credit scoring accuracy of five NN architectures and compared them to traditional statistical methods. The neural architectures were multilayer perceptron, mixture-of-experts, RBFNN, learning vector quantization, and fuzzy adaptive resonance; the traditional models were discriminant analysis, LR, k nearest neighbor, kernel density estimation, and DTs. Using two real-world data sets and testing the models with 10-fold cross-validation, the author found that among the neural architectures the mixture-of-experts and RBFNN did best, whereas among the traditional methods LR analysis was the most accurate. Thomas [16] surveyed the techniques for forecasting the financial risk of lending to consumers. Yang et al. [20] examined the application of neural

networks to an early warning system for loan risk assessment. Zurada J. and Zurada M. [22] and Zurada [23] compared the performance of data mining techniques, including decision trees, in predicting the creditworthiness of customers. Also, Feldman and Gross [4] applied decision trees to detecting mortgage default rates. Huang et al. [8] investigated almost a dozen classification techniques including an NN, RBFNN, CBR, rule extraction techniques, a Naïve Bayesian classifier, and ensemble classifiers. Using a bank's customers' checking account information they obtained overall classification accuracy rates between 70% and 81% for a dependent variable taking three distinct values: declined, risky, and good. Yang [21] used SVMs and kernel attribute selection techniques to enhance the predictive performance of the scoring model and to add transparency to the final model, respectively. The author tested the models on two real-world data sets containing a large number of attributes and samples, and concluded that kernel methods yield classification accuracy of about 69% and 70% for good and bad loans, respectively, while stepwise logistic regression analysis produced accuracy rates of about 90% and 24% for the good and bad classes, respectively. The 24% classification accuracy rate for bad loans was certainly unacceptable.

To summarize, many of the papers we studied concentrate on the overall classification accuracy rates that different tools produce as well as on the percentages of good and bad loans classified correctly. In the available literature, the task of extracting useful rules from the created models, and of identifying the attributes that have the best predictive power, seems to be of secondary importance. Rules could be easily explained to lenders, who would need to approve their implementation, and to loan applicants as the reason for denying a loan.

3. Description of selected methods used in this study

In this section we devote our main attention to DTs. They yielded better overall classification results than the other models, and at the same time the rules they produce are intuitive and easy to interpret. As LR and feed-forward NNs with back-propagation are very popular and frequently used in a variety of business and scientific applications, we only describe the three remaining, relatively less known methods (CBR, RBFNN, and SVM), though less formally and in much less detail.

3.1. Decision trees

DTs are fairly simple and widely applied tools for classification. A tree is built of nodes and branches. It has 3 types of nodes: a root node (the top node), internal nodes, and leaf nodes. In a binary tree, the root node has no incoming branches and two outgoing branches. Each internal node has exactly one incoming branch and two outgoing branches. Finally, each leaf node has exactly one incoming branch and no outgoing branches. Each leaf node is assigned a class label. Branches coming out of the root and the other internal nodes contain attribute test conditions that separate cases with different characteristics. One of the greatest advantages of DTs is that knowledge can be extracted and represented in the form of classification if-then rules between the input attributes and the target attribute.

The operation of DTs is based on the ID3 or C4.5 divide-and-conquer algorithms [10],[13] and on search heuristics which make the clusters at the nodes gradually purer by progressively reducing impurity in the original data set. The algorithms place the attribute that has the most predictive power at the top node of the tree, and they have to find the optimum number of splits and determine where to partition the data to maximize the information gain. The fewer the splits, the more explainable the output is (there are fewer rules to understand). Selecting the best split is based on the degree of impurity of the child nodes.
For example, a node which contains only cases of class good or only cases of class bad has the smallest impurity = 0. Similarly, a node that contains an equal number of cases of class good and class bad has the highest impurity = 1. Impurity can be measured by the well-established concept of entropy and information gain. Other popular measures include Gini reduction, classification error, and chi square. We formally introduce these measures below. Given a collection S, containing the positive (good) and negative (bad) examples of some target concept, the entropy of S relative to this Boolean classification is

\[ \mathrm{Entropy}(S) = -p_{good} \log_2 p_{good} - p_{bad} \log_2 p_{bad} \]

where p_good is the proportion of positive examples in S and p_bad is the proportion of negative examples in S. If the output variable takes on k different values, then the entropy of S relative to this k-wise classification is defined as

\[ \mathrm{Entropy}(S) = -\sum_{i=1}^{k} p_i \log_2 p_i \]
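The impurity measures named in this passage can be illustrated with a short Python sketch. It assumes the standard textbook definitions of entropy, the Gini index, classification error, and information gain (the entropy reduction achieved by a split, with each child weighted by its share of the cases). The function names are illustrative and do not come from the paper. The 700/300 class counts are the paper's own. The per-branch counts for the CHECKING attribute are not reported directly: they are back-calculated here from the leaf statistics of the tree printed in Section 5, on the assumption that those leaves cover all 1000 cases.

```python
from math import log2

def entropy(counts):
    """Entropy of a node given its per-class case counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def classification_error(counts):
    """Share of cases that do not belong to the majority class."""
    return 1.0 - max(counts) / sum(counts)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Root node of the German data set: 700 good and 300 bad loans.
root = [700, 300]
print(round(entropy(root), 4))               # 0.8813
print(round(gini(root), 2))                  # 0.42
print(round(classification_error(root), 2))  # 0.3

# Extreme cases mentioned in the text.
print(entropy([50, 50]))   # 1.0 (equal mix, highest impurity)
print(entropy([100, 0]))   # 0.0 (pure node)

# (good, bad) counts per CHECKING value 1..4; an assumption derived
# from the printed tree's leaf counts, not a figure quoted in the paper.
checking_children = [(139, 135), (164, 105), (49, 14), (348, 46)]
gain = information_gain(root, checking_children)
print(round(gain, 4))
```

Note that the maximum impurity of 1 for an equal two-class mix holds for entropy; under these definitions the Gini index and the classification error both peak at 0.5 for the same node.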

\[ \mathrm{Gini}(S) = 1 - \sum_{i=1}^{k} p_i^2 \]

\[ \mathrm{Classification\ error}(S) = 1 - \max_i (p_i) \]

For example, if impurity is measured by entropy, the information gain, Gain(S,A), of an attribute A relative to a collection of examples S can be computed as

\[ \mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v) \]

where Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has the value v, i.e., \( S_v = \{ s \in S \mid A(s) = v \} \).

Chi-squared splitting criteria measure the reduction in variability of the target distribution in the branch (child) nodes. Specifically, the likelihood ratio Pearson chi-squared test statistic is a measure of association between the categories of the dependent variable and the branch nodes. This test statistic can be used to judge the worth of the split; it measures the difference between the observed cell counts and what would be expected if the branches and target classes were independent.

Classifying a test record is straightforward once a DT has been built. Starting from the root node, one applies the test condition to the case and follows the appropriate branch based on the outcome of the test. This leads either to another internal node, for which a new test condition is applied, or to a leaf node. The predicted class label associated with the leaf node is then assigned to the record. The predicted class assigned to each node depends on three factors: (1) the assumed prior probability of each class within the data set, (2) the decision loss or cost matrix, and (3) the fraction of cases with each outcome in the training data set that end up in each node. The criteria used to assign predicted classes to each node are as follows.
If C(good|bad) is the cost of classifying a bad case as good, π(bad) is the prior probability of class bad, N_bad is the number of cases of class bad in the data set, and N_bad(t) is the number of cases of class bad in node t (with C(bad|good), π(good), N_good, and N_good(t) defined analogously for class good), then the node is assigned class bad if

\[ \frac{C(\mathrm{good} \mid \mathrm{bad}) \, \pi(\mathrm{bad}) \, N_{bad}(t)}{C(\mathrm{bad} \mid \mathrm{good}) \, \pi(\mathrm{good}) \, N_{good}(t)} > \frac{N_{bad}}{N_{good}} \]

The comparison is made against every other class; with two classes, good is the only alternative. For more details on DTs, refer to [5],[6],[10],[15], and SAS Enterprise Miner.

3.2. The other selected methods

SVM algorithms form a blend of linear modeling and instance-based learning. They deal with classification tasks and are particularly useful in solving nonlinear problems because they use linear models to implement nonlinear class boundaries. They select a small number of critical boundary instances, called support vectors, from each class and form a linear discriminant function that separates them as widely as possible. Using a nonlinear mapping, the algorithms transform the input space into a new space. A linear model built in the new space represents a nonlinear decision boundary in the original space. To implement nonlinear mappings, the SVM algorithms use polynomial, radial basis function, or sigmoid kernel functions. The algorithms also implement a special kind of model, a maximum margin hyperplane, which gives the greatest separation between the classes. The instances with the minimum distance to the hyperplane are the support vectors. Compared with DTs, for example, SVMs are slow but often yield accurate classifiers because they create subtle and complex decision boundaries.

An RBFNN differs from a feed-forward NN with back-propagation in the way the hidden neurons perform computations. Each neuron represents a point in input space, and its output for a given training pattern depends on the distance between its point and the pattern. The closer these two points are, the stronger the activation. The RBFNN uses nonlinear bell-shaped Gaussian activation functions whose width may be different for each neuron. The output layer forms a linear combination of the outputs of the hidden-layer neurons, which is fed to the sigmoid function. A network learns two sets of parameters: the centers and widths of the Gaussian functions, obtained by clustering, and the weights used to form the linear combination of the hidden-layer outputs. As the first set of parameters can be obtained independently of the second set, an RBFNN learns almost instantly if the number of hidden units is much smaller than the number of training patterns. Unlike a feed-forward NN with back-propagation, however, the RBFNN cannot learn to ignore irrelevant attributes because it gives all attributes the same weight in distance computations.

Broadly construed, CBR is the method of solving new problems based on the solutions of similar past cases. The method requires no model to be fitted or function to be estimated. Instead it requires all cases with their known solutions to be maintained in memory, and when a prediction is required, the method recalls items from memory and predicts the value of the target. In solving a new case, the CBR approach retrieves a case it deems sufficiently similar and uses that case as a basis for solving the new case.

The method uses a k-nearest neighbor algorithm to classify cases. The k-nearest neighbor algorithm takes a data set of existing cases and a new case to be classified, where each existing case in the data set is composed of a set of variables and the new case has one value for each variable. The algorithm computes the distance between each existing case and the new case: the normalized Euclidean or Manhattan distance for numeric attributes, or the Hamming distance for nominal or ordinal attributes. The k existing cases that have the smallest distances to the new case are its k-nearest neighbors. Each of the k-nearest neighbors votes, with its own target value, on the target value for the new case. The votes are the posterior probabilities for the class dependent variable. There are two challenging tasks in the successful application of CBR: choosing the right value for k and choosing the proper distance measure. For more details on the three methods refer to [19].

4. Data set used in the study

All data sets used in modeling loan classification tasks have an inherent problem: they contain only those customers who were actually granted a loan. There are other individuals who applied for a loan and were turned down, and we do not know whether or not they would have been at risk. In other words, all data sets used in loan granting contexts consist of records about individuals whom banks considered creditworthy, as all of them obtained a loan. Although these considerations do not affect the validity of the analysis, we should keep them in mind. The data sets used to build loan classification models come from various loan-granting contexts. For example, Desai et al. [2] used 3 unbalanced data sets and 18 features that described the financial characteristics of about 900 customers of 3 credit unions.
The Australian scoring data [13] were similar but more balanced, with 307 individuals who experienced loan defaults and 383 who paid off a loan. Zurada J. and Zurada M. [22] used a data set containing 5860 samples and 13 variables. In this study, we use the same data set as, or one very similar to, that used by [3], [5], [18], and [23]. The data set contains the following 21 financial attributes (expressed on nominal, ordinal, or interval scales) describing each of 1000 customers: (1) AGE (of applicant in years), (2) AMOUNT (of credit requested), (3) CHECKING (balance in existing checking account), (4) COAPP (other debtors or guarantors), (5) DEPENDS (number of dependents), (6) DURATION (length of loan in months), (7) EMPLOYED (length of present employment), (8) EXISTCR (number of existing accounts at this bank), (9) FOREIGN (foreign worker or not), (10) HISTORY (credit history), (11) HOUSING (rent, own, free), (12) INSTALLP (debt as a percent of disposable income), (13) JOB (employment status), (14) MARITAL (marital status and gender), (15) OTHER (other installment loans), (16) PROPERTY (collateral property for loan), (17) PURPOSE (reason for loan request), (18) RESIDENT (years at current address), (19) SAVINGS (savings account balance), (20) TELEPHON (telephone: none or registered under the customer's name), and (21) GOOD_BAD (credit rating status: bad [loan denied] or good [loan granted]), the output variable.

5. Experiments and results

We built and tested eight models (LR, NN, RBFNN, SVM, CBR, and three DT methods, i.e., chi-square, entropy reduction, and Gini reduction) and compared their predictive performance on the same data set containing 1000 samples and 21 attributes. The attributes are commonly used in loan application contexts. We applied 10-fold cross-validation and ran each experiment 10 times to obtain unbiased and reliable error estimates.
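The repeated 10-fold protocol can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the tool used in the study: the helper name `repeated_kfold_accuracy`, the unstratified random shuffling, the fixed seed, and the placeholder majority-class model are all hypothetical choices made here. Only the 700/300 class split and the 10 folds by 10 runs come from the paper.

```python
import random

def repeated_kfold_accuracy(labels, fit, k=10, repeats=10, seed=0):
    """Mean test accuracy over `repeats` runs of k-fold cross-validation.

    `fit(train_idx)` must return a function that predicts the label of a
    case index; it only ever sees the training indices of the current fold.
    """
    rng = random.Random(seed)
    n = len(labels)
    fold_scores = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)                        # new random partition per run
        folds = [order[i::k] for i in range(k)]   # k disjoint test folds
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            predict = fit(train_idx)
            correct = sum(predict(i) == labels[i] for i in test_idx)
            fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / len(fold_scores)

# 1 = good (paid off), 0 = bad (defaulted), matching the paper's 700/300 split.
labels = [1] * 700 + [0] * 300

def fit_majority(train_idx):
    """Placeholder model: always predict the majority class of the training fold."""
    good = sum(labels[i] for i in train_idx)
    majority = 1 if good * 2 >= len(train_idx) else 0
    return lambda i: majority

mean_acc = repeated_kfold_accuracy(labels, fit_majority)
print(round(mean_acc, 3))  # 0.7
```

The majority-class placeholder always predicts good, so its averaged accuracy equals the share of good loans. Any real model would have to beat that 70% baseline to be useful.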
This technique also ensured that the training data sets used to build the models were fully independent of the test data sets on which the performance of the models was evaluated. We calculated the average classification accuracy rates over all folds and runs. Using one of the three DTs as the reference point, we performed a paired two-tailed t-test to find out whether the differences in the performance of the models are statistically significant at α=0.05 [19]. We chose to report only the results for the three cut-offs of 0.3 (30%), 0.5 (50%), and 0.7 (70%). For example, a 0.3 cut-off means that a Type II error (classifying a bad loan as a good loan) is 3.3 times more costly than a Type I error (classifying a good loan as a bad loan). This cut-off may be applicable to situations in which banks do not secure smaller loans, i.e., do not hold any collateral. The 0.5 cut-off means that the costs of making Type I and Type II errors are the same, whereas the 0.7 cut-off implies that the cost of making a Type I error is smaller than the cost of a Type II error. This cut-off may typically be used when a financial institution secures larger loans by holding collateral such as the customer's home. Table 1 shows that at the 0.5 cut-off, the entropy reduction DT performs significantly better in the overall classification accuracy rates than the

seven other methods. The Gini DT comes a close second. These trees are also superior to the remaining models in classifying good loans. However, LR and NN seem to be more effective in detecting bad loans. Because the overall classification accuracy rates of the entropy reduction and Gini DTs seem to outperform the other six models, we choose DTs for further analysis.

Table 1. Correct classification accuracy rates [%] for the 8 models at a 0.5 probability cut-off
Columns: DT (Entropy Reduction), DT (Chi Square), DT (Gini), LR, NN, RBFNN, SVM, CBR
Overall: w w 71.8 w 73.5 w 75.5 w 73.6 w
Good Loans: w w 81.3 w 86.0 w 87.4 w 89.6 w
Bad Loans: w b 49.6 b b 43.4 w
w Significantly worse than the entropy reduction DT at the 0.05 significance level.
b Significantly better than the entropy reduction DT at the 0.05 significance level.

Consequently, in Table 2 we present the classification accuracy rates for the three DTs at the 0.3 and 0.7 cut-off levels. While the differences in the overall classification rates and in detecting good loans are statistically insignificant across all methods at a 0.3 cut-off, the chi square method outperformed the two other tree models in detecting bad loans.

Table 2. Correct classification accuracy rates [%] for the 3 DT models at the 0.3 and 0.7 probability cut-offs
Columns: DT (Entropy Reduction), DT (Chi Square), DT (Gini) at cut-off = 0.3; DT (Entropy Reduction), DT (Chi Square), DT (Gini) at cut-off = 0.7
Rows: Overall, Good Loans, Bad Loans
Bad Loans: b w 5.8
w Significantly worse than the entropy reduction DT at the 0.05 significance level.
b Significantly better than the entropy reduction DT at the 0.05 significance level.

The receiver operating characteristic (ROC) charts are graphical displays that give a global measure of the predictive accuracy of the three DT models (Figure 1). They display the sensitivity against 1-specificity of a classifier for a range of cut-offs. Sensitivity is a measure of accuracy for predicting events (detecting bad loans) that is equal to the number of true positives divided by the total number of actual positives.
Specificity is a measure of accuracy for predicting nonevents (detecting good loans) that is equal to the number of true negatives divided by the total number of actual negatives; the horizontal axis of the chart, 1-specificity, is therefore the share of actual negatives classified as events. Each point on the curves represents a cut-off probability. Points closer to the upper-right corner correspond to low cut-off probabilities. Points in the lower left correspond to higher cut-off probabilities. The extreme points (1,1) and (0,0) represent no-data rules where all cases are classified into class 1 or class 0, respectively. The performance quality of the models is demonstrated by the degree to which the ROC curves push upward and to the left. The area under the curves provides a quantitative performance measure. The area ranges from 50, for a worthless model, to 100, for a perfect classifier. The ROC charts indicate that the predictive power of the three models is good and that the three curves vary somewhat between the 0.3 and 0.6 cut-off levels. The most noticeable differences between the three models can be seen at the 0.5 cut-off value.
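How the ROC points and the area under the curve arise from cut-off probabilities can be sketched as follows. The scores and labels below are made-up illustration data, not results from the study, and the function names are hypothetical. The sketch assumes that a case is classified as an event (bad loan) when its predicted probability of being bad is at least the cut-off, and it approximates the area with the trapezoid rule over the chosen cut-offs plus the two extreme points.

```python
def roc_points(scores, labels, cutoffs):
    """Return (1 - specificity, sensitivity) for each cut-off.

    labels: 1 = event (bad loan), 0 = nonevent (good loan).
    scores: predicted probability of the event.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points):
    """Trapezoid-rule area, with the extreme points (0,0) and (1,1) added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical predicted probabilities of default for ten loans.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

points = roc_points(scores, labels, cutoffs=[0.3, 0.5, 0.7])
for cutoff, (fpr, tpr) in zip([0.3, 0.5, 0.7], points):
    print(cutoff, round(fpr, 3), round(tpr, 3))

# Reported on the paper's 50-100 scale.
print(round(100 * area_under_curve(points), 1))  # 79.2
```

Lowering the cut-off moves a point toward the upper-right corner, because more cases are flagged as bad: both sensitivity and 1-specificity rise together.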

In Figure 2 we present the DT which not only exhibited one of the best classification accuracy rates, but also used the fewest attributes to classify individuals into those who paid the loan off and those who defaulted. All three DTs have been very consistent in identifying the following 5 input attributes (out of 20) on which the classification rules are based: CHECKING (balance in existing checking account), DURATION (length of loan in months), SAVINGS (savings account balance), HISTORY (credit history), and PURPOSE (reason for loan request).

[Figure 1. The ROC charts for the chi square, entropy reduction, and Gini DTs, and the worthless model. Chart title: ROC Charts for Three DTs. Axes: Sensitivity against 1-Specificity, with selected cut-offs marked on the curves.]

The textual form of the pruned tree with 17 leaves and 25 nodes created using the first three variables follows.

checking = 1
|   duration <= 11: 1 (39.0/7.0)
|   duration > 11
|   |   savings = 1
|   |   |   duration <= 15: 1 (60.0/25.0)
|   |   |   duration > 15: 0 (130.0/48.0)
|   |   savings = 2
|   |   |   duration <= 15: 0 (3.0)
|   |   |   duration > 15: 1 (7.0/2.0)
|   |   savings = 3: 1 (5.0/2.0)
|   |   savings = 4: 1 (5.0)
|   |   savings = 5: 0 (25.0/11.0)
checking = 2
|   savings = 1
|   |   duration <= 22: 1 (91.0/25.0)
|   |   duration > 22: 0 (61.0/19.0)
|   savings = 2: 0 (47.0/23.0)
|   savings = 3: 1 (11.0/3.0)
|   savings = 4
|   |   duration <= 10: 0 (2.0)
|   |   duration > 10: 1 (12.0/2.0)
|   savings = 5: 1 (45.0/7.0)
checking = 3: 1 (63.0/14.0)
checking = 4: 1 (394.0/46.0)

Legend:
(a) Checking Account Balance: 1 = less than 0 DM; 2 = more than 0 but less than 200 DM; 3 = at least 200 DM; 4 = no checking account
(b) Length of Loan [in months]
(c) Savings Account Balance: 1 = less than 100 DM; 2 = at least 100, but less than 500 DM; 3 = at least 500, but less than 1000 DM; 4 = at least 1000 DM; 5 = unknown / no savings account
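The printed tree can be encoded and applied to a case with a few lines of Python. The sketch below is one possible encoding, not the software used in the study: the helper names (`leaf`, `nominal`, `numeric`, `classify`, `leaves`) and the dictionary layout are assumptions made for illustration. Each leaf carries the class label (1 = good, 0 = bad), the number of cases that reach it, and the number of those cases it misclassifies, copied from the listing above.

```python
def leaf(label, n, wrong=0):
    return {"label": label, "n": n, "wrong": wrong}

def nominal(attr, branches):
    """Split on a coded attribute; `branches` maps each code to a subtree."""
    return {"attr": attr, "kind": "nominal", "branches": branches}

def numeric(attr, threshold, low, high):
    """Binary split: `low` for attr <= threshold, `high` for attr > threshold."""
    return {"attr": attr, "kind": "numeric", "threshold": threshold,
            "low": low, "high": high}

TREE = nominal("checking", {
    1: numeric("duration", 11,
               leaf(1, 39, 7),
               nominal("savings", {
                   1: numeric("duration", 15, leaf(1, 60, 25), leaf(0, 130, 48)),
                   2: numeric("duration", 15, leaf(0, 3), leaf(1, 7, 2)),
                   3: leaf(1, 5, 2),
                   4: leaf(1, 5),
                   5: leaf(0, 25, 11),
               })),
    2: nominal("savings", {
        1: numeric("duration", 22, leaf(1, 91, 25), leaf(0, 61, 19)),
        2: leaf(0, 47, 23),
        3: leaf(1, 11, 3),
        4: numeric("duration", 10, leaf(0, 2), leaf(1, 12, 2)),
        5: leaf(1, 45, 7),
    }),
    3: leaf(1, 63, 14),
    4: leaf(1, 394, 46),
})

def classify(case, node=TREE):
    """Walk from the root to a leaf; return the leaf and the conditions met."""
    conditions = []
    while "label" not in node:
        attr, value = node["attr"], case[node["attr"]]
        if node["kind"] == "nominal":
            conditions.append(f"{attr} = {value}")
            node = node["branches"][value]
        elif value <= node["threshold"]:
            conditions.append(f"{attr} <= {node['threshold']}")
            node = node["low"]
        else:
            conditions.append(f"{attr} > {node['threshold']}")
            node = node["high"]
    return node, conditions

def leaves(node=TREE):
    """Yield every leaf of the tree."""
    if "label" in node:
        yield node
    elif node["kind"] == "nominal":
        for child in node["branches"].values():
            yield from leaves(child)
    else:
        yield from leaves(node["low"])
        yield from leaves(node["high"])

# A 20-month loan, negative checking balance, savings under 100 DM.
found, rule = classify({"checking": 1, "duration": 20, "savings": 1})
print("IF " + " AND ".join(rule) + f" THEN class {found['label']}")
print(f"{found['n'] - found['wrong']} of {found['n']} cases in this leaf are classified correctly")

all_leaves = list(leaves())
print(len(all_leaves), sum(l["n"] for l in all_leaves))  # 17 1000
```

Because `classify` records each test it passes, the path it returns is exactly the if-then rule for that leaf, which is the form of explanation a lender could give to an applicant.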

Figure 2. The example tree

The first level split is on the CHECKING attribute and the second and third level splits are on DURATION or SAVINGS. The 1 or 0 that follows a branch condition is a class label representing good and bad, respectively. The two numbers in parentheses in a leaf, for example (39.0/7.0), indicate the total number of instances that reach that leaf and the number of misclassified instances, respectively. In Figure 2 we show the graphical form of the tree with the first three variables only. (Because of space limitations and practicality we do not show bigger trees with four or five attributes.) One can see that it is easy to extract the rules from the tree. One of the rules that can be extracted from the bottom left of the tree is: IF checking=1 (less than 0 DM) AND duration > 11 AND savings=1 (less than 100 DM) AND duration > 15 months THEN the predicted value of the leaf is 0 (bad); 82 out of 130 cases (63.1%) are classified correctly as bad loans and 48 out of 130 cases (36.9%) are misclassified.

6. Conclusion and suggestions for future research

Using a single data set, the paper evaluates the predictive performance of eight models, including three DT models. The study also discusses the forms of knowledge learned by these models. The simulation results show that the DTs' overall classification accuracy rates are better than those of the other models and that the knowledge they learn during training is represented in the form of easy to understand, intuitive rules that make sense. DTs are linear models that separate the instance space using lines, planes, or hyperplanes, and they deal with one variable at a time. Their implementation might be considered by financial institutions which by law may be obliged to give reasons for a loan denial.

More testing is needed using several larger data sets with different attributes describing the financial status of loan applicants. It is also proposed to investigate neuro-fuzzy systems, which can learn interpretable fuzzy rules and fuzzy sets from input-output data pairs.

7. References

[1]. D.K. Barney, O.F. Graves, and J.D. Johnson, "The Farmers Home Administration and Farm Debt Failure Prediction", Journal of Accounting and Public Policy, 1999, Vol. 18.
[2]. V.S. Desai, J.N. Crook, and G.A. Overstreet, Jr, "A Comparison of Neural Networks and Linear Scoring Models in the Credit Union Environment", European Journal of Operational Research, 1996, Vol. 95.
[3]. L. Fahrmeir, and A. Hamerle, Multivariate Statistical Modeling Based on Generalized Linear Programs, Springer-Verlag, Berlin.
[4]. D. Feldman, and S. Gross, "Mortgage Default: Classification Tree Analysis", Journal of Real Estate Finance and Economics, 2005, Vol. 30.
[5]. P. Giudici, Applied Data Mining: Statistical Methods for Business and Industry, John Wiley & Sons Ltd., Chichester, West Sussex, England.
[6]. J. Han, and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA.
[7]. D. Hand, and W. Henley, "Statistical Classification Method in Consumer Scoring: A Review", Journal of the Royal Statistical Society, Series A (Statistics in Society), 1997, Vol. 160.
[8]. Y-H. Huang, C-M. Hung, and H.C. Jiau, "Evaluation of Neural Networks and Data Mining Methods on a Credit Assessment Task for Class Imbalance Problem", Nonlinear Analysis: Real World Applications, 2006, 7.
[9]. I. Jagielska, C. Matthews, and T. Whitfort, "An Investigation into the Application of Neural Networks, Fuzzy Logic, Genetic Algorithms, and Rough Sets to Automated Knowledge Acquisition for Classification Problems", Neurocomputing, 1999, Vol. 24.
[10]. T.M. Mitchell, Machine Learning, WCB/McGraw-Hill, Boston, Massachusetts.
[11]. S. Piramuthu, "Financial Credit-Risk Evaluation with Neural and Neurofuzzy Systems", European Journal of Operational Research, 1999, Vol. 112.
[12]. S. Piramuthu, "Feature Selection for Financial Credit-Risk Evaluation Decisions", INFORMS Journal on Computing, 1999, Vol. 11, No. 3.
[13]. J.R. Quinlan, "Simplifying Decision Trees", International Journal of Man-Machine Studies, 1987, Vol. 27.
[14]. E. Rosenberg, and A. Gleidt, "Quantitative Methods in Credit Management: A Survey", Operations Research, 1994, Vol. 42.
[15]. P-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley.
[16]. L.C. Thomas, "A Survey of Credit and Behavioral Scoring: Forecasting Financial Risk of Lending to Consumers", International Journal of Forecasting, 2000, Vol. 16.
[17]. M. Vojtek, and E. Kocenda, "Credit Scoring Methods", Czech Journal of Economics and Finance, 2006, Vol. 56, Issue 3-4.
[18]. D. West, "Neural Network Credit Scoring Models", Computers & Operations Research, 2000, Vol. 27.
[19]. I.H. Witten, and E. Frank, Data Mining: Practical Learning Tools and Techniques, Morgan Kaufmann Publishers.
[20]. B. Yang, L.X. Li, H. Ji, and J. Xu, "An Early Warning System for Loan Risk Assessment Using Artificial Neural Networks", Knowledge-Based Systems, 2001, Vol. 14.
[21]. Y. Yang, "Adaptive Credit Scoring with Kernel Learning Methods", European Journal of Operational Research, 2007, Vol. 183, Issue 3.
[22]. J. Zurada, and M. Zurada, "How Secure are "Good Loans": Validating Loan-Granting Decisions and Predicting Default Rates on Consumer Loans", Review of Business Information Systems, 2002, 6(3).
[23]. J. Zurada, "Rule Induction Methods for Credit Scoring", Review of Business Information Systems, 2007, Vol. 11, 2.

Naïve Bayesian Classifier and Classification Trees for the Predictive Accuracy of Probability of Default Credit Card Clients

Naïve Bayesian Classifier and Classification Trees for the Predictive Accuracy of Probability of Default Credit Card Clients American Journal of Data Mining and Knowledge Discovery 2018; 3(1): 1-12 http://www.sciencepublishinggroup.com/j/ajdmkd doi: 10.11648/j.ajdmkd.20180301.11 Naïve Bayesian Classifier and Classification Trees

More information

The Classification Performance of Multiple Methods and Datasets: Cases from the Loan Credit Scoring Domain

The Classification Performance of Multiple Methods and Datasets: Cases from the Loan Credit Scoring Domain Journal of International Technology and Information Management Volume 23 Issue 1 Article 5 2014 The Classification Performance of Multiple Methods and Datasets: Cases from the Loan Credit Scoring Domain

More information

Credit Card Default Predictive Modeling

Credit Card Default Predictive Modeling Credit Card Default Predictive Modeling Background: Predicting credit card payment default is critical for the successful business model of a credit card company. An accurate predictive model can help

More information

STOCK PRICE PREDICTION: KOHONEN VERSUS BACKPROPAGATION

STOCK PRICE PREDICTION: KOHONEN VERSUS BACKPROPAGATION STOCK PRICE PREDICTION: KOHONEN VERSUS BACKPROPAGATION Alexey Zorin Technical University of Riga Decision Support Systems Group 1 Kalkyu Street, Riga LV-1658, phone: 371-7089530, LATVIA E-mail: alex@rulv

More information

A COMPARATIVE STUDY OF DATA MINING TECHNIQUES IN PREDICTING CONSUMERS CREDIT CARD RISK IN BANKS

A COMPARATIVE STUDY OF DATA MINING TECHNIQUES IN PREDICTING CONSUMERS CREDIT CARD RISK IN BANKS A COMPARATIVE STUDY OF DATA MINING TECHNIQUES IN PREDICTING CONSUMERS CREDIT CARD RISK IN BANKS Ling Kock Sheng 1, Teh Ying Wah 2 1 Faculty of Computer Science and Information Technology, University of

More information

International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, ISSN

International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18,   ISSN Volume XII, Issue II, Feb. 18, www.ijcea.com ISSN 31-3469 AN INVESTIGATION OF FINANCIAL TIME SERIES PREDICTION USING BACK PROPAGATION NEURAL NETWORKS K. Jayanthi, Dr. K. Suresh 1 Department of Computer

More information

Accepted Manuscript. Enterprise Credit Risk Evaluation Based on Neural Network Algorithm. Xiaobing Huang, Xiaolian Liu, Yuanqian Ren

Accepted Manuscript. Enterprise Credit Risk Evaluation Based on Neural Network Algorithm. Xiaobing Huang, Xiaolian Liu, Yuanqian Ren Accepted Manuscript Enterprise Credit Risk Evaluation Based on Neural Network Algorithm Xiaobing Huang, Xiaolian Liu, Yuanqian Ren PII: S1389-0417(18)30213-4 DOI: https://doi.org/10.1016/j.cogsys.2018.07.023

More information

Stock Trading Following Stock Price Index Movement Classification Using Machine Learning Techniques

Stock Trading Following Stock Price Index Movement Classification Using Machine Learning Techniques Stock Trading Following Stock Price Index Movement Classification Using Machine Learning Techniques 6.1 Introduction Trading in stock market is one of the most popular channels of financial investments.

More information

Developing a Risk Group Predictive Model for Korean Students Falling into Bad Debt*

Developing a Risk Group Predictive Model for Korean Students Falling into Bad Debt* Asian Economic Journal 2018, Vol. 32 No. 1, 3 14 3 Developing a Risk Group Predictive Model for Korean Students Falling into Bad Debt* Jun-Tae Han, Jae-Seok Choi, Myeon-Jung Kim and Jina Jeong Received

More information

How To Prevent Another Financial Crisis On Wall Street

How To Prevent Another Financial Crisis On Wall Street How To Prevent Another Financial Crisis On Wall Street Helin Gao helingao@stanford.edu Qianying Lin qlin1@stanford.edu Kaidi Yan kaidi@stanford.edu Abstract Riskiness of a particular loan can be estimated

More information

International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, ISSN

International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18,   ISSN International Journal of Computer Engineering and Applications, Volume XII, Issue II, Feb. 18, www.ijcea.com ISSN 31-3469 AN INVESTIGATION OF FINANCIAL TIME SERIES PREDICTION USING BACK PROPAGATION NEURAL

More information

Role of soft computing techniques in predicting stock market direction

Role of soft computing techniques in predicting stock market direction REVIEWS Role of soft computing techniques in predicting stock market direction Panchal Amitkumar Mansukhbhai 1, Dr. Jayeshkumar Madhubhai Patel 2 1. Ph.D Research Scholar, Gujarat Technological University,

More information

Statistical and Machine Learning Approach in Forex Prediction Based on Empirical Data

Statistical and Machine Learning Approach in Forex Prediction Based on Empirical Data Statistical and Machine Learning Approach in Forex Prediction Based on Empirical Data Sitti Wetenriajeng Sidehabi Department of Electrical Engineering Politeknik ATI Makassar Makassar, Indonesia tenri616@gmail.com

More information

An introduction to Machine learning methods and forecasting of time series in financial markets

An introduction to Machine learning methods and forecasting of time series in financial markets An introduction to Machine learning methods and forecasting of time series in financial markets Mark Wong markwong@kth.se December 10, 2016 Abstract The goal of this paper is to give the reader an introduction

More information

Modeling customer revolving credit scoring using logistic regression, survival analysis and neural networks

Modeling customer revolving credit scoring using logistic regression, survival analysis and neural networks Modeling customer revolving credit scoring using logistic regression, survival analysis and neural networks NATASA SARLIJA a, MIRTA BENSIC b, MARIJANA ZEKIC-SUSAC c a Faculty of Economics, J.J.Strossmayer

More information

Iran s Stock Market Prediction By Neural Networks and GA

Iran s Stock Market Prediction By Neural Networks and GA Iran s Stock Market Prediction By Neural Networks and GA Mahmood Khatibi MS. in Control Engineering mahmood.khatibi@gmail.com Habib Rajabi Mashhadi Associate Professor h_mashhadi@ferdowsi.um.ac.ir Electrical

More information

PERFORMANCE COMPARISON OF THREE DATA MINING MODELS FOR BUSINESS TAX AUDIT

PERFORMANCE COMPARISON OF THREE DATA MINING MODELS FOR BUSINESS TAX AUDIT PERFORMANCE COMPARISON OF THREE DATA MINING MODELS FOR BUSINESS TAX AUDIT 1 TSUNG-NAN CHOU 1 Asstt Prof., Department of Finance, Chaoyang University of Technology. Taiwan E-mail: 1 tnchou@cyut.edu.tw ABSTRACT

More information

Session 5. Predictive Modeling in Life Insurance

Session 5. Predictive Modeling in Life Insurance SOA Predictive Analytics Seminar Hong Kong 29 Aug. 2018 Hong Kong Session 5 Predictive Modeling in Life Insurance Jingyi Zhang, Ph.D Predictive Modeling in Life Insurance JINGYI ZHANG PhD Scientist Global

More information

DATA SUMMARIZATION AND VISUALIZATION

DATA SUMMARIZATION AND VISUALIZATION APPENDIX DATA SUMMARIZATION AND VISUALIZATION PART 1 SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS 294 PART 2 PART 3 PART 4 VISUALIZATION: GRAPHS AND TABLES FOR SUMMARIZING AND ORGANIZING DATA 296

More information

An enhanced artificial neural network for stock price predications

An enhanced artificial neural network for stock price predications An enhanced artificial neural network for stock price predications Jiaxin MA Silin HUANG School of Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR S. H. KWOK HKUST Business

More information

ScienceDirect. Detecting the abnormal lenders from P2P lending data

ScienceDirect. Detecting the abnormal lenders from P2P lending data Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 91 (2016 ) 357 361 Information Technology and Quantitative Management (ITQM 2016) Detecting the abnormal lenders from P2P

More information

LIFT-BASED QUALITY INDEXES FOR CREDIT SCORING MODELS AS AN ALTERNATIVE TO GINI AND KS

LIFT-BASED QUALITY INDEXES FOR CREDIT SCORING MODELS AS AN ALTERNATIVE TO GINI AND KS Journal of Statistics: Advances in Theory and Applications Volume 7, Number, 202, Pages -23 LIFT-BASED QUALITY INDEXES FOR CREDIT SCORING MODELS AS AN ALTERNATIVE TO GINI AND KS MARTIN ŘEZÁČ and JAN KOLÁČEK

More information

Natural Customer Ranking of Banks in Terms of Credit Risk by Using Data Mining A Case Study: Branches of Mellat Bank of Iran

Natural Customer Ranking of Banks in Terms of Credit Risk by Using Data Mining A Case Study: Branches of Mellat Bank of Iran Jurnal UMP Social Sciences and Technology Management Vol. 3, Issue. 2,2015 Natural Customer Ranking of Banks in Terms of Credit Risk by Using Data Mining A Case Study: Branches of Mellat Bank of Iran Somayyeh

More information

Can Twitter predict the stock market?

Can Twitter predict the stock market? 1 Introduction Can Twitter predict the stock market? Volodymyr Kuleshov December 16, 2011 Last year, in a famous paper, Bollen et al. (2010) made the claim that Twitter mood is correlated with the Dow

More information

Improving Stock Price Prediction with SVM by Simple Transformation: The Sample of Stock Exchange of Thailand (SET)

Improving Stock Price Prediction with SVM by Simple Transformation: The Sample of Stock Exchange of Thailand (SET) Thai Journal of Mathematics Volume 14 (2016) Number 3 : 553 563 http://thaijmath.in.cmu.ac.th ISSN 1686-0209 Improving Stock Price Prediction with SVM by Simple Transformation: The Sample of Stock Exchange

More information

International Journal of Advance Engineering and Research Development REVIEW ON PREDICTION SYSTEM FOR BANK LOAN CREDIBILITY

International Journal of Advance Engineering and Research Development REVIEW ON PREDICTION SYSTEM FOR BANK LOAN CREDIBILITY Scientific Journal of Impact Factor (SJIF): 4.72 International Journal of Advance Engineering and Research Development Volume 4, Issue 12, December -2017 e-issn (O): 2348-4470 p-issn (P): 2348-6406 REVIEW

More information

Understanding neural networks

Understanding neural networks Machine Learning Neural Networks Understanding neural networks An Artificial Neural Network (ANN) models the relationship between a set of input signals and an output signal using a model derived from

More information

Applications of Neural Networks in Stock Market Prediction

Applications of Neural Networks in Stock Market Prediction Applications of Neural Networks in Stock Market Prediction -An Approach Based Analysis Shiv Kumar Goel 1, Bindu Poovathingal 2, Neha Kumari 3 1Asst. Professor, Vivekanand Education Society Institute of

More information

Model Maestro. Scorto TM. Specialized Tools for Credit Scoring Models Development. Credit Portfolio Analysis. Scoring Models Development

Model Maestro. Scorto TM. Specialized Tools for Credit Scoring Models Development. Credit Portfolio Analysis. Scoring Models Development Credit Portfolio Analysis Scoring Models Development Scorto TM Models Analysis and Maintenance Model Maestro Specialized Tools for Credit Scoring Models Development 2 Purpose and Tasks to Be Solved Scorto

More information

A Neuro-fuzzy Approach to Bad Debt Recovery in Healthcare

A Neuro-fuzzy Approach to Bad Debt Recovery in Healthcare 2014 47th Hawaii International Conference on System Science A Neuro-fuzzy Approach to Bad Debt Recovery in Healthcare Donghui Shi Department of Computer Engineering School of Electronics and Information

More information

Decision Trees An Early Classifier

Decision Trees An Early Classifier An Early Classifier Jason Corso SUNY at Buffalo January 19, 2012 J. Corso (SUNY at Buffalo) Trees January 19, 2012 1 / 33 Introduction to Non-Metric Methods Introduction to Non-Metric Methods We cover

More information

International Journal of Computer Engineering and Applications, Volume XII, Issue I, Jan. 18, ISSN

International Journal of Computer Engineering and Applications, Volume XII, Issue I, Jan. 18,   ISSN A.Komathi, J.Kumutha, Head & Assistant professor, Department of CS&IT, Research scholar, Department of CS&IT, Nadar Saraswathi College of arts and science, Theni. ABSTRACT Data mining techniques are becoming

More information

Prediction of Stock Closing Price by Hybrid Deep Neural Network

Prediction of Stock Closing Price by Hybrid Deep Neural Network Available online www.ejaet.com European Journal of Advances in Engineering and Technology, 2018, 5(4): 282-287 Research Article ISSN: 2394-658X Prediction of Stock Closing Price by Hybrid Deep Neural Network

More information

Business Strategies in Credit Rating and the Control of Misclassification Costs in Neural Network Predictions

Business Strategies in Credit Rating and the Control of Misclassification Costs in Neural Network Predictions Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2001 Proceedings Americas Conference on Information Systems (AMCIS) December 2001 Business Strategies in Credit Rating and the Control

More information

DFAST Modeling and Solution

DFAST Modeling and Solution Regulatory Environment Summary Fallout from the 2008-2009 financial crisis included the emergence of a new regulatory landscape intended to safeguard the U.S. banking system from a systemic collapse. In

More information

Predicting the Success of a Retirement Plan Based on Early Performance of Investments

Predicting the Success of a Retirement Plan Based on Early Performance of Investments Predicting the Success of a Retirement Plan Based on Early Performance of Investments CS229 Autumn 2010 Final Project Darrell Cain, AJ Minich Abstract Using historical data on the stock market, it is possible

More information

Modeling Private Firm Default: PFirm

Modeling Private Firm Default: PFirm Modeling Private Firm Default: PFirm Grigoris Karakoulas Business Analytic Solutions May 30 th, 2002 Outline Problem Statement Modelling Approaches Private Firm Data Mining Model Development Model Evaluation

More information

Neural Network Prediction of Stock Price Trend Based on RS with Entropy Discretization

Neural Network Prediction of Stock Price Trend Based on RS with Entropy Discretization 2017 International Conference on Materials, Energy, Civil Engineering and Computer (MATECC 2017) Neural Network Prediction of Stock Price Trend Based on RS with Entropy Discretization Huang Haiqing1,a,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 2, Mar Apr 2017

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 2, Mar Apr 2017 RESEARCH ARTICLE Stock Selection using Principal Component Analysis with Differential Evolution Dr. Balamurugan.A [1], Arul Selvi. S [2], Syedhussian.A [3], Nithin.A [4] [3] & [4] Professor [1], Assistant

More information

Bond Market Prediction using an Ensemble of Neural Networks

Bond Market Prediction using an Ensemble of Neural Networks Bond Market Prediction using an Ensemble of Neural Networks Bhagya Parekh Naineel Shah Rushabh Mehta Harshil Shah ABSTRACT The characteristics of a successful financial forecasting system are the exploitation

More information

THE investment in stock market is a common way of

THE investment in stock market is a common way of PROJECT REPORT, MACHINE LEARNING (COMP-652 AND ECSE-608) MCGILL UNIVERSITY, FALL 2018 1 Comparison of Different Algorithmic Trading Strategies on Tesla Stock Price Tawfiq Jawhar, McGill University, Montreal,

More information

Research on Enterprise Financial Management and Decision Making based on Decision Tree Algorithm

Research on Enterprise Financial Management and Decision Making based on Decision Tree Algorithm Research on Enterprise Financial Management and Decision Making based on Decision Tree Algorithm Shen Zhai School of Economics and Management, Urban Vocational College of Sichuan, Chengdu, Sichuan, China

More information

Analyzing Life Insurance Data with Different Classification Techniques for Customers Behavior Analysis

Analyzing Life Insurance Data with Different Classification Techniques for Customers Behavior Analysis Analyzing Life Insurance Data with Different Classification Techniques for Customers Behavior Analysis Md. Saidur Rahman, Kazi Zawad Arefin, Saqif Masud, Shahida Sultana and Rashedur M. Rahman Abstract

More information

The Use of Artificial Neural Network for Forecasting of FTSE Bursa Malaysia KLCI Stock Price Index

The Use of Artificial Neural Network for Forecasting of FTSE Bursa Malaysia KLCI Stock Price Index The Use of Artificial Neural Network for Forecasting of FTSE Bursa Malaysia KLCI Stock Price Index Soleh Ardiansyah 1, Mazlina Abdul Majid 2, JasniMohamad Zain 2 Faculty of Computer System and Software

More information

Automated Options Trading Using Machine Learning

Automated Options Trading Using Machine Learning 1 Automated Options Trading Using Machine Learning Peter Anselmo and Karen Hovsepian and Carlos Ulibarri and Michael Kozloski Department of Management, New Mexico Tech, Socorro, NM 87801, U.S.A. We summarize

More information

Lecture 9: Classification and Regression Trees

Lecture 9: Classification and Regression Trees Lecture 9: Classification and Regression Trees Advanced Applied Multivariate Analysis STAT 2221, Spring 2015 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department of Mathematical

More information

An Improved Approach for Business & Market Intelligence using Artificial Neural Network

An Improved Approach for Business & Market Intelligence using Artificial Neural Network Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Introducing GEMS a Novel Technique for Ensemble Creation

Introducing GEMS a Novel Technique for Ensemble Creation Introducing GEMS a Novel Technique for Ensemble Creation Ulf Johansson 1, Tuve Löfström 1, Rikard König 1, Lars Niklasson 2 1 School of Business and Informatics, University of Borås, Sweden 2 School of

More information

Lending Club Loan Portfolio Optimization Fred Robson (frobson), Chris Lucas (cflucas)

Lending Club Loan Portfolio Optimization Fred Robson (frobson), Chris Lucas (cflucas) CS22 Artificial Intelligence Stanford University Autumn 26-27 Lending Club Loan Portfolio Optimization Fred Robson (frobson), Chris Lucas (cflucas) Overview Lending Club is an online peer-to-peer lending

More information

Fitting financial time series returns distributions: a mixture normality approach

Fitting financial time series returns distributions: a mixture normality approach Fitting financial time series returns distributions: a mixture normality approach Riccardo Bramante and Diego Zappa * Abstract Value at Risk has emerged as a useful tool to risk management. A relevant

More information

Performance analysis of Neural Network Algorithms on Stock Market Forecasting

Performance analysis of Neural Network Algorithms on Stock Market Forecasting www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 9 September, 2014 Page No. 8347-8351 Performance analysis of Neural Network Algorithms on Stock Market

More information

International Journal of Research in Engineering Technology - Volume 2 Issue 5, July - August 2017

International Journal of Research in Engineering Technology - Volume 2 Issue 5, July - August 2017 RESEARCH ARTICLE OPEN ACCESS The technical indicator Z-core as a forecasting input for neural networks in the Dutch stock market Gerardo Alfonso Department of automation and systems engineering, University

More information

DATA MINING ON LOAN APPROVED DATSET FOR PREDICTING DEFAULTERS

DATA MINING ON LOAN APPROVED DATSET FOR PREDICTING DEFAULTERS DATA MINING ON LOAN APPROVED DATSET FOR PREDICTING DEFAULTERS By Ashish Pandit A Project Report Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science

More information

Wage Determinants Analysis by Quantile Regression Tree

Wage Determinants Analysis by Quantile Regression Tree Communications of the Korean Statistical Society 2012, Vol. 19, No. 2, 293 301 DOI: http://dx.doi.org/10.5351/ckss.2012.19.2.293 Wage Determinants Analysis by Quantile Regression Tree Youngjae Chang 1,a

More information

AN ARTIFICIAL NEURAL NETWORK MODELING APPROACH TO PREDICT CRUDE OIL FUTURE. By Dr. PRASANT SARANGI Director (Research) ICSI-CCGRT, Navi Mumbai

AN ARTIFICIAL NEURAL NETWORK MODELING APPROACH TO PREDICT CRUDE OIL FUTURE. By Dr. PRASANT SARANGI Director (Research) ICSI-CCGRT, Navi Mumbai AN ARTIFICIAL NEURAL NETWORK MODELING APPROACH TO PREDICT CRUDE OIL FUTURE By Dr. PRASANT SARANGI Director (Research) ICSI-CCGRT, Navi Mumbai AN ARTIFICIAL NEURAL NETWORK MODELING APPROACH TO PREDICT CRUDE

More information

Abstract Making good predictions for stock prices is an important task for the financial industry. The way these predictions are carried out is often

Abstract Making good predictions for stock prices is an important task for the financial industry. The way these predictions are carried out is often Abstract Making good predictions for stock prices is an important task for the financial industry. The way these predictions are carried out is often by using artificial intelligence that can learn from

More information

Modeling and Forecasting Customer Behavior for Revolving Credit Facilities

Modeling and Forecasting Customer Behavior for Revolving Credit Facilities Modeling and Forecasting Customer Behavior for Revolving Credit Facilities Radoslava Mirkov 1, Holger Thomae 1, Michael Feist 2, Thomas Maul 1, Gordon Gillespie 1, Bastian Lie 1 1 TriSolutions GmbH, Hamburg,

More information

Estimating term structure of interest rates: neural network vs one factor parametric models

Estimating term structure of interest rates: neural network vs one factor parametric models Estimating term structure of interest rates: neural network vs one factor parametric models F. Abid & M. B. Salah Faculty of Economics and Busines, Sfax, Tunisia Abstract The aim of this paper is twofold;

More information

Examining Long-Term Trends in Company Fundamentals Data

Examining Long-Term Trends in Company Fundamentals Data Examining Long-Term Trends in Company Fundamentals Data Michael Dickens 2015-11-12 Introduction The equities market is generally considered to be efficient, but there are a few indicators that are known

More information

ABSTRACT. KEYWORDS: Credit Risk, Bad Debts, Credit Rating, Credit Indices, Logistic Regression INTRODUCTION AHMAD NAGHILOO 1 & MORADI FEREIDOUN 2

ABSTRACT. KEYWORDS: Credit Risk, Bad Debts, Credit Rating, Credit Indices, Logistic Regression INTRODUCTION AHMAD NAGHILOO 1 & MORADI FEREIDOUN 2 BEST: Journal of Management, Information Technology and Engineering (BEST: JMITE) Vol., Issue, Jun 05, 59-66 BEST Journals THE RELATIONSHIP BETWEEN CREDIT RISK AND BAD DEBTS THROUGH OPTIMUM CREDIT RISK

More information

A DECISION SUPPORT SYSTEM FOR HANDLING RISK MANAGEMENT IN CUSTOMER TRANSACTION

A DECISION SUPPORT SYSTEM FOR HANDLING RISK MANAGEMENT IN CUSTOMER TRANSACTION A DECISION SUPPORT SYSTEM FOR HANDLING RISK MANAGEMENT IN CUSTOMER TRANSACTION K. Valarmathi Software Engineering, SonaCollege of Technology, Salem, Tamil Nadu valarangel@gmail.com ABSTRACT A decision

More information

Stock Market Prediction System

Stock Market Prediction System Stock Market Prediction System W.N.N De Silva 1, H.M Samaranayaka 2, T.R Singhara 3, D.C.H Wijewardana 4. Sri Lanka Institute of Information Technology, Malabe, Sri Lanka. { 1 nathashanirmani55, 2 malmisamaranayaka,

More information

Tree Diagram. Splitting Criterion. Splitting Criterion. Introduction. Building a Decision Tree. MS4424 Data Mining & Modelling Decision Tree

Tree Diagram. Splitting Criterion. Splitting Criterion. Introduction. Building a Decision Tree. MS4424 Data Mining & Modelling Decision Tree Introduction MS4424 Data Mining & Modelling Decision Tree Lecturer : Dr Iris Yeung Room No : P7509 Tel No : 2788 8566 Email : msiris@cityu.edu.hk decision tree is a set of rules represented in a tree structure

More information

Classification of Capital Expenditures and Revenue Expenditures: An Analysis of Correlation and Neural Networks

Classification of Capital Expenditures and Revenue Expenditures: An Analysis of Correlation and Neural Networks Classification of Capital Expenditures and Revenue Expenditures: An Analysis of Correlation and Neural Networks Fadzilah Siraj a, Nurazzah Abu Bakar b, Adnan Abolgasim c a,b,c College of Arts and Sciences

More information

Predictive modelling around the world Peter Banthorpe, RGA Kevin Manning, Milliman

Predictive modelling around the world Peter Banthorpe, RGA Kevin Manning, Milliman Predictive modelling around the world Peter Banthorpe, RGA Kevin Manning, Milliman 11 November 2013 Agenda Introduction to predictive analytics Applications overview Case studies Conclusions and Q&A Introduction

More information

Pattern Recognition Chapter 5: Decision Trees

Pattern Recognition Chapter 5: Decision Trees Pattern Recognition Chapter 5: Decision Trees Asst. Prof. Dr. Chumphol Bunkhumpornpat Department of Computer Science Faculty of Science Chiang Mai University Learning Objectives How decision trees are

More information

CS 475 Machine Learning: Final Project Dual-Form SVM for Predicting Loan Defaults

CS 475 Machine Learning: Final Project Dual-Form SVM for Predicting Loan Defaults CS 475 Machine Learning: Final Project Dual-Form SVM for Predicting Loan Defaults Kevin Rowland Johns Hopkins University 3400 N. Charles St. Baltimore, MD 21218, USA krowlan3@jhu.edu Edward Schembor Johns

More information

Pattern Recognition by Neural Network Ensemble

Pattern Recognition by Neural Network Ensemble IT691 2009 1 Pattern Recognition by Neural Network Ensemble Joseph Cestra, Babu Johnson, Nikolaos Kartalis, Rasul Mehrab, Robb Zucker Pace University Abstract This is an investigation of artificial neural

More information

Model Maestro. Scorto. Specialized Tools for Credit Scoring Models Development. Credit Portfolio Analysis. Scoring Models Development

Model Maestro. Scorto. Specialized Tools for Credit Scoring Models Development. Credit Portfolio Analysis. Scoring Models Development Credit Portfolio Analysis Scoring Models Development Scorto TM Models Analysis and Maintenance Model Maestro Specialized Tools for Credit Scoring Models Development 2 Purpose and Tasks to Be Solved Scorto

More information

$tock Forecasting using Machine Learning

$tock Forecasting using Machine Learning $tock Forecasting using Machine Learning Greg Colvin, Garrett Hemann, and Simon Kalouche Abstract We present an implementation of 3 different machine learning algorithms gradient descent, support vector

More information

Data based stock portfolio construction using Computational Intelligence

Data based stock portfolio construction using Computational Intelligence Data based stock portfolio construction using Computational Intelligence Asimina Dimara and Christos-Nikolaos Anagnostopoulos Data Economy workshop: How online data change economy and business Introduction

More information

Journal of Internet Banking and Commerce

Journal of Internet Banking and Commerce Journal of Internet Banking and Commerce An open access Internet journal (http://www.icommercecentral.com) Journal of Internet Banking and Commerce, December 2017, vol. 22, no. 3 STOCK PRICE PREDICTION

More information

Credit scoring with boosted decision trees

Credit scoring with boosted decision trees MPRA Munich Personal RePEc Archive Credit scoring with boosted decision trees Joao Bastos CEMAPRE, School of Economics and Management (ISEG), Technical University of Lisbon 1. April 2008 Online at http://mpra.ub.uni-muenchen.de/8156/

More information

Market Variables and Financial Distress. Giovanni Fernandez Stetson University

Market Variables and Financial Distress. Giovanni Fernandez Stetson University Market Variables and Financial Distress Giovanni Fernandez Stetson University In this paper, I investigate the predictive ability of market variables in correctly predicting and distinguishing going concern

More information

Two kinds of neural networks, a feed forward multi layer Perceptron (MLP)[1,3] and an Elman recurrent network[5], are used to predict a company's

Two kinds of neural networks, a feed forward multi layer Perceptron (MLP)[1,3] and an Elman recurrent network[5], are used to predict a company's LITERATURE REVIEW 2. LITERATURE REVIEW Detecting trends of stock data is a decision support process. Although the Random Walk Theory claims that price changes are serially independent, traders and certain

More information

Adeptness Comparison between Instance Based and K Star Classifiers for Credit Risk Scrutiny

Adeptness Comparison between Instance Based and K Star Classifiers for Credit Risk Scrutiny Adeptness Comparison between Instance Based and K Star Classifiers for Credit Risk Scrutiny C. Lakshmi Devasena 1 Department of Operations and IT, IBS, Hyderabad, IFHE University, Hyderabad, Tamilnadu,

More information

Stock Price and Index Forecasting by Arbitrage Pricing Theory-Based Gaussian TFA Learning

Stock Price and Index Forecasting by Arbitrage Pricing Theory-Based Gaussian TFA Learning Stock Price and Index Forecasting by Arbitrage Pricing Theory-Based Gaussian TFA Learning Kai Chun Chiu and Lei Xu Department of Computer Science and Engineering The Chinese University of Hong Kong, Shatin,

More information
