ScienceDirect. Detecting the abnormal lenders from P2P lending data

Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 91 (2016) 357-361
Information Technology and Quantitative Management (ITQM 2016)

Detecting the abnormal lenders from P2P lending data

Haifeng Li a,*, Yuejin Zhang a, Ning Zhang a, Hengyue Jia a
a School of Information, Central University of Finance and Economics, Beijing, China

Abstract

Online peer-to-peer lending is a new but useful financing method for small enterprises, conducted through websites. To reduce the risk of this method, we study how to predict the potential lenders who may have a bad credit score. We use an outlier detection method to find abnormal lenders, and we find that the detected outliers have bad credit scores with high probability.

© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the Organizing Committee of ITQM 2016

Keywords: trust model; credit score; classification; P2P

1. Introduction

Online peer-to-peer lending is a new but useful financing method for small enterprises. How to finance small and micro enterprises effectively has attracted much attention, and the problem is especially important in China. With advances in information technology, a new type of financing, online peer-to-peer (P2P) lending, has become an important alternative to traditional financing. Online P2P lending allows people to lend and borrow funds directly through an online intermediary, without the mediation of financial institutions.

1.1. Motivation

When a lender (in this paper, the user who applies for funds) wants to acquire capital from an online P2P company, a risk arises. A traditional bank can audit an applicant's background through his or her application documents, but this is impossible for P2P companies or for the borrowers. Consequently, it is never known in advance whether a lender has a good or a bad credit score, and finding the lenders with bad credit scores is a very challenging problem. Many studies have focused on this problem and proposed useful methods.

* Corresponding author. Tel.: +8613691380799. E-mail address: mydlhf@cufe.edu.cn.
1877-0509 © 2016 Published by Elsevier B.V. doi:10.1016/j.procs.2016.07.095

1.2. Related Works

Smith [1] extended the broad credit risk and credit migration literature, prominent in corporate bond and securities risk pricing, to an analysis of the drift of consumer credit scores. A rich data set of residential mortgages was used to observe credit score migration after loan origination and to test whether credit score transitions can serve as a precursor to default and prepayment. The results indicated that credit scores provide investors and servicing agents with signals about default potential, in a fashion similar to credit ratings on commercial paper. Soner [2] proposed a three-stage hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) credit scoring model based on statistical techniques and neuro-fuzzy methods. Its performance was compared with conventional, commonly used models using 10-fold cross-validation on the credit card data of an international bank operating in Turkey. The results demonstrated that the proposed model consistently outperformed Linear Discriminant Analysis, Logistic Regression Analysis and Artificial Neural Network (ANN) approaches in terms of average correct classification rate and estimated misclassification cost. Arya et al. [3] addressed the question of what determines a poor credit score. The authors compared estimated credit scores with measures of impulsivity, time preference, risk attitude and trustworthiness in an effort to identify the preferences that underlie credit behavior. Data were collected through an incentivized decision-making lab experiment together with financial and psychological surveys, and credit scores were estimated with an online FICO credit score estimator based on survey data supplied by the participants.
Preferences were assessed using a survey measure of impulsivity and experimental measures of time preference, risk preference and trustworthiness. Controlling for income differences, the authors found that the credit score was correlated with measures of impulsivity, time preference and trustworthiness. Based on trust theories, Chen et al. [4] developed an integrated trust model specifically for the online P2P lending context to better understand the critical factors that drive lenders' trust. The model was empirically tested with survey data from 785 online lenders of PaiPaiDai, the first and largest online P2P platform in China. The results showed that both trust in borrowers and trust in intermediaries significantly influence lenders' lending intention. Emekter et al. [5] used data from Lending Club, one of the popular online P2P lending platforms, to explore the characteristics of P2P loans, evaluate their credit risk and measure loan performance. They found that credit grade, debt-to-income ratio, FICO score and revolving line utilization played an important role in loan defaults, and that loans with a lower credit grade and a longer duration were associated with a higher mortality rate, a result consistent with the Cox proportional hazards test. They also found that the higher interest rates charged to high-risk borrowers were not enough to compensate for the higher probability of default; thus, Lending Club must find ways to attract borrowers with high FICO scores and high incomes in order to sustain its business. Harris [6] investigated the practice of credit scoring and introduced the clustered support vector machine (CSVM) for credit scorecard development. It is well known that, as historical credit scoring datasets grow large, highly accurate nonlinear SVM-based techniques become computationally expensive.
Accordingly, he compared the CSVM with other nonlinear SVM-based techniques and showed that the CSVM can achieve comparable classification performance while remaining relatively cheap computationally.

In this paper, we also address this problem and propose an outlier detection method based on the online documents of the lenders. The method detects abnormal lenders from their general features. The rest of the paper is organized as follows: Section 2 presents the data related to the lenders, Section 3 introduces our detection method, and Section 4 concludes the paper.

2. Dataset Preparation and Data Processing

We use data crawled from a website, a BBS (bulletin board system) on which lenders discuss issues related to P2P lending. After preprocessing, we obtain a dataset with 18 properties, described in Table 1. In this dataset, the title and the descriptions are string information, which is not useful for our method. In addition, we transform continuous property values, such as age, into discrete values with an equal-width (equal-interval) binning method, and we convert the credit rate and the other string-type properties into integer properties.

Table 1. The characteristics of the dataset
Properties: Title, Amount, Annual interest rate, Repayment Time, Descriptions, Credit rate, Successful loan number, Failed loan number, Gender, Age, Borrowed credit score, Lending credit score, Overdue, Membership score, Prestige, Forum currency, Contribution, Group
Record Count:

Since not all properties are relevant to our problem, we employ randomized logistic regression to filter out the properties that have little impact and obtain the final property set. As shown in Fig. 1, age, membership score, group and amount contribute very little to the prediction, so we remove these properties. Fig. 1 also shows that the failed loan number, the payback time and the borrowed credit score appear to have a much larger impact on the final prediction results.

Fig. 1. The impacts of the properties
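The property filtering step can be illustrated with a small stability-selection sketch in the spirit of randomized logistic regression: an L1-penalized logistic regression is refitted on random subsamples with randomly rescaled features, and each property is scored by how often it keeps a non-zero weight. This is a minimal sketch, not the authors' implementation. The paper does not report the penalty, the number of resamplings, or which property served as the prediction target, so the values of `lam`, `scaling`, `sample_fraction` and `n_resampling` are assumptions, the function names are hypothetical, and the features are assumed to be numeric and on comparable scales (for example, the binned and integer-coded properties described above).

```python
import math
import random


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, lr=0.5, iters=200):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    X is a list of numeric feature rows, y a list of 0/1 labels.
    Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad[j] += err * xi[j]
            grad_b += err
        for j in range(p):
            step = w[j] - lr * grad[j] / n
            # Soft-thresholding is the proximal step of the L1 penalty; it sets
            # weak coefficients exactly to zero.
            w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
        b -= lr * grad_b / n  # the intercept is not penalized
    return w, b


def randomized_logistic_scores(X, y, lam=0.1, scaling=0.5, sample_fraction=0.75,
                               n_resampling=20, seed=0):
    """Selection frequency (0..1) of each feature over randomized L1 logistic fits."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resampling):
        rows = rng.sample(range(n), int(sample_fraction * n))
        # Shrinking feature j by a random factor in [scaling, 1] is equivalent to
        # raising its L1 penalty by the inverse factor for this resampling.
        factors = [rng.uniform(scaling, 1.0) for _ in range(p)]
        Xs = [[X[i][j] * factors[j] for j in range(p)] for i in rows]
        ys = [y[i] for i in rows]
        w, _ = fit_l1_logistic(Xs, ys, lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_resampling for c in counts]
```

`randomized_logistic_scores(X, y)` returns one score per column of `X`; a property whose score stays near zero would be a candidate for removal, and the cut-off that counts as "little impact" is a further choice the text does not specify.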

3. Outlier Detecting Method

In this section, we use an outlier detection method to perform our analysis. Outlier detection methods can generally be classified into four types: statistics-based, proximity-based, density-based and cluster-based. Since statistics-based methods require information about the data distribution, which is not available for our dataset, they cannot be used here. In addition, proximity-based and density-based methods are inefficient on massive data; thus, we choose the cluster-based method, which works as follows. First, we cluster the data into K groups and compute the center of each group. Second, we compute the distance from every data object to its nearest center. Third, we compute the relative distance $\beta$ of each data object as

$$\beta = \frac{D(d, \mathrm{center})}{M(d_i, \mathrm{center})}$$

where $D(d, \mathrm{center})$ is the distance between the data object $d$ and its nearest center, and $M(d_i, \mathrm{center})$ is the median of the distances between all data objects $d_i$ and their nearest centers. Finally, we compare the relative distance with a specified threshold, and the data objects whose relative distance exceeds the threshold are reported as outliers.

Fig. 2. Clusters when K = 5, 10, 100, 1000

We run the method with the threshold set to 10. Fig. 2 shows the mining results for K = 5, 10, 100 and 1000; the x-axis is the ID of each data object and the y-axis is its relative distance. As can be seen, the smaller K is, the more effective the method, so we choose K = 5 to obtain the final results. Among all 31 outliers, only 6 users have good credit scores, while the other 25 users have overdue records. As a result, this outlier detection method can be regarded as a new way to find lenders with bad credit scores.

Acknowledgements

This research is supported by the National Natural Science Foundation of China (61100112, 61309030, 61309029), the Beijing Higher Education Young Elite Teacher Project (YETP0987), the Key Project of the National Social Science Foundation of China (13AXW010), and the 121 of CUFE Talent Project Young Doctor Development Fund in 2014 (QBJ1427).

References

[1] B.C. Smith. Stability in consumer credit scores: Level and direction of FICO score drift as a precursor to mortgage default and prepayment. Journal of Housing Economics, 2011.
[2] A. Soner. An empirical comparison of conventional techniques, neural networks and the three stage hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) model for credit scoring analysis: The case of Turkish credit card data. European Journal of Operational Research, 2012.
[3] S. Arya, C. Eckel, C. Wichman. Anatomy of the credit score. Journal of Economic Behavior & Organization, 2013.
[4] D. Chen, F. Lai, Z. Lin. A trust model for online peer-to-peer lending: a lender's perspective. Information Technology and Management, 2014.
[5] R. Emekter, Y. Tu, B. Jirasakuldech, M. Lu. Evaluating credit risk and loan performance in online Peer-to-Peer (P2P) lending. Applied Economics, 2014.
[6] T. Harris. Credit scoring using the clustered support vector machine. Expert Systems with Applications, 2015.
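The cluster-based outlier detection of Section 3 (cluster into K groups, take each object's distance to its nearest center, divide by the median of those distances, and compare with a threshold) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes numeric feature vectors, Euclidean distance and plain Lloyd's k-means; the paper states neither the distance measure nor how the centers were initialized, so the deterministic initialization below is an assumption, and the names `kmeans`, `relative_distances` and `detect_outliers` are hypothetical. The defaults K = 5 and threshold = 10 follow the text.

```python
import math
from statistics import median


def kmeans(points, k, iters=100):
    """Lloyd's k-means returning k centers.

    Assumption: initial centers are k points spread evenly through the input
    order (the paper does not describe its initialization).
    """
    n = len(points)
    centers = [list(points[i * n // k]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers


def relative_distances(points, centers):
    """beta = distance to the nearest center / median of all such distances."""
    d = [min(math.dist(p, c) for c in centers) for p in points]
    m = median(d)
    if m == 0:
        raise ValueError("median distance is zero; beta is undefined")
    return [di / m for di in d]


def detect_outliers(points, k=5, threshold=10.0):
    """Indices of the objects whose relative distance exceeds the threshold."""
    beta = relative_distances(points, kmeans(points, k))
    return [i for i, b in enumerate(beta) if b > threshold]
```

Because $\beta$ is normalized by the median distance, it is scale-free: $\beta = 10$ means an object lies ten times farther from its nearest center than a typical object does. In this sketch, if K is large enough that an isolated, far-away object becomes a cluster of its own, its distance to the nearest center is zero and it is not flagged.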