Parametric versus nonparametric methods in risk scoring: an application to microcredit


Empir Econ (2014) 46
Parametric versus nonparametric methods in risk scoring: an application to microcredit
Manuel A. Hernandez · Maximo Torero
Received: 9 May 2012 / Accepted: 5 February 2013 / Published online: 9 May 2013
© Springer-Verlag Berlin Heidelberg 2013

Abstract  The importance of credit access to improve economic opportunities in developing markets is well established in the literature. However, there exists a strong need to mitigate adverse selection problems in microlending. A risk scoring model that more accurately predicts the likelihood of repayment of potential borrowers can help address this market imperfection and benefit both lenders and borrowers. This paper compares the performance of nonparametric versus semiparametric and traditional parametric risk scoring models based on default probabilities. We show the advantages of relying on less structured, data-driven methods for risk scoring using both simulated data and data from credit loans granted to small and microenterprises in rural Peru. The estimation results indicate that nonparametric methods lead to a better evaluation of creditworthiness and can help prevent including potential bad borrowers and excluding good borrowers from sensitive microcredit markets.

Keywords  Risk scoring · Microcredit · Default models · Nonparametric methods
JEL Classification  C14 · O16 · G17

M. A. Hernandez · M. Torero, Markets, Trade and Institutions Division, IFPRI, Washington, DC 20006, USA (m.a.hernandez@cgiar.org; m.torero@cgiar.org)

1 Introduction

The importance of credit in improving economic opportunities in developing markets is well documented in the literature (Armendariz and Morduch 2005; Coleman 2006; Ghosh et al. 2000; Khandker 2005). One way in which formal credit to the poor has

expanded in recent years is through microfinance.[1] Microfinance is generally targeted to self-employment activities and does not require collateral. The idea is to provide financial services to low-income clients who originally lack access to banking and related services, and to extend credit based on the reputation of the borrower (i.e., his/her borrowing and repayment behavior). However, since reputation is difficult to measure at the initial stage and contracts are hard to enforce, lending risks in developing credit markets are potentially high.[2]

A more precise measure of a borrower's riskiness can help, in this sense, to mitigate adverse selection problems, i.e., to lessen the wrong choices made when the characteristics of borrowers are imperfectly observed by the lender or when credit history is insufficient or lacking. In particular, a risk scoring model that more accurately predicts the creditworthiness of a borrower can reduce this market imperfection, which is particularly acute in underdeveloped markets, and benefit both lenders and borrowers. An accurate risk scoring system is not necessarily an instrument to discriminate against high credit risks, but an important tool for lending institutions to design an adequate portfolio of borrowing options (e.g., terms and amounts of loans/grants, interest rates) that is sensitive to a varying range of risk scores. Borrowing units with varying risk scores, in turn, are better able to choose a credit or grant plan appropriate for their needs. Ultimately, the sustainability of any lending system depends critically on properly assessing the risk associated with each borrower; hence the importance of a proper statistical model underlying the risk ranking, hitherto missing in most developing credit markets.[3]

This paper assesses the performance of different risk scoring models with an application to microcredit.
In particular, we compare the performance of standard probabilistic models with semi- and nonparametric methods based on predicted default probabilities. Semi- and nonparametric models impose less structure than parametric models on the functional form of the conditional probability of default and on the link function governing this decision. We show the advantages of relying on less structured, data-driven methods for risk scoring using both simulated data and data from credit loans granted to small and microenterprises (SMEs) in rural Peru. Compared to traditional parametric models, nonparametric methods provide a more accurate measure of the risk associated with individual loans and can help prevent including potentially bad borrowers in, and excluding good borrowers from, sensitive microcredit markets.

[1] As of December 2010, microfinance institutions reported reaching more than 205 million borrowers worldwide (Maes and Reed 2012). A separate issue is whether microcredit has been an effective tool for lifting poor people out of poverty by funding their microenterprises and increasing their wealth, considering that a large number of small businesses have been created through microcredit but only a few have matured into larger businesses. Recent work evaluating the impact of microfinance using randomized field experiments provides mixed evidence on the effects of microcredit on household income and consumption (e.g., Banerjee et al. 2010; Dupas and Robinson 2009; Karlan and Zinman 2011).
[2] There are also concerns that lending institutions have managed to sustain low interest rates and relatively high default rates due to subsidies and soft loans. Grameen Bank, for example, which charges an average real interest rate of 10 %, experienced losses close to 18 % of its outstanding loans from 1985 to 1996 after properly adjusting for portfolio size (Armendariz and Morduch 2005).
[3] See also Schreiner (2000) for additional discussion of credit scoring in microfinance.

To the best of our knowledge, no previous study has formally examined the potential gains for client scoring in microfinance from using models that impose less structure on how the covariates relate to the decision to default. In addition, given the curse of dimensionality that affects semi- and nonparametric methods, these models seem well suited for microcredit, as microfinance institutions typically have less information (fewer variables) than traditional credit institutions.

The remainder of the paper is organized as follows. Section 2 further discusses the potential advantages of using models that impose less structure on functional forms for risk scoring, including a simulation analysis. Section 3 compares the performance of nonparametric methods versus semiparametric and conventional probabilistic models using data from a microfinance institution in rural Peru. Section 4 presents concluding remarks.

2 Credit scoring

A credit score is designed to indicate the creditworthiness of the borrowing unit by assigning a risk number based on a default probability. In well-developed credit markets, the use of credit scores is an integral part of the lending process and has proven to be an effective tool in determining the riskiness of borrowers, but in developing economies their increased use is relatively new.[4] This section discusses alternative methods that can be used in microlending to construct credit scores based on default probabilities and illustrates the potential advantages of using nonparametric estimators over more parametric methods.

2.1 Alternative risk scoring methods

Associated with every potential borrower there is a probability of default conditional on the terms of the loan being requested. This probability depends on the borrower's attributes as well as on external factors, which are not borrower-specific.
The primary purpose of risk scoring models is to rank borrowers by estimating such probabilities. For example, consider a loan request from a smallholder farmer with limited resource endowments (e.g., little land and livestock). Borrower characteristics may include current assets (if any), income, credit history, and outstanding debt, while external factors may include crop price volatility. In general, estimation is conducted by relating a discrete variable, denoting failure or delayed payments, to the borrower characteristics and external factors. The process of assigning a discrete variable to different borrowers is often referred to as segmentation of borrowers.

[4] Microfinance data in developing countries have been rather unexploited in general terms, in part due to the lack of information sharing across lending institutions.

In practice, individuals can be divided into different categories based on their level of risk. For example, we can think of a binary (0,1) variable that distinguishes between individuals who never default and those who have defaulted at least once on past loans (i.e., low- vs. high-risk individuals). Alternatively, we can consider a count variable, which captures the exact number of times

an individual has defaulted in the past and may more accurately account for the riskiness of borrowers.[5] An appropriate econometric model, such as a binary choice model or a count model, is then used to estimate default probabilities based on borrowers' attributes and other factors. The result of this estimation is a risk algorithm which, once statistically tested for appropriateness, can be applied to assess the default probabilities associated with different borrowers under various loan terms and conditions. Hereafter we focus on binary discrete-choice models, considering also the nature of our data (i.e., short-term loans granted, in several cases for the first time, to smallholder farmers, and repayable in a single full installment).

A suitable statistical discrete-choice model should accurately represent the underlying relationship between borrower-specific attributes, external factors, and the defined binary variable reflecting the odds of a loan being of high or low risk. The specific assumption about the functional form of this relationship becomes crucial for classifying borrowers into risk categories. Thus, the accuracy of the risk ranking is contingent on the choice of the functional form linking the decision to default to the covariates: the precision of the model predictions is likely to vary significantly depending on whether the chosen functional form is correct. For example, conditional on using appropriate covariates, imposing a linear relationship between the odds of defaulting and the covariates, when this relationship is not necessarily linear in all variables, might lead to erroneous conclusions (see Sect. 2.2). Hence, allowing the data to reveal the functional form (i.e., to fit the best functional form) is preferable to imposing specific and most likely erroneous functional assumptions.
Statistical models in which specific functional forms are not imposed are known as nonparametric estimation methods. There is also a class of models, called semiparametric models, which impose partial restrictions on the functional form. Although very well suited for credit scoring research, nonparametric methods have surprisingly not been widely applied in this area. A plausible explanation is that most implementable nonparametric methods have been developed relatively recently. It is also true that credit scoring databases in developed markets typically involve over one hundred variables (Hand and Henley 1997), which can limit the use of fully nonparametric methods due to the curse of dimensionality inherent in these models. But this is not the case for lending institutions in underdeveloped markets, which generally have more limited information (fewer variables).

[5] We could also consider a continuous variable measuring the percentage of the loan (installments) repaid by each individual.

Formally, binary choice models are generally specified as index function models, which posit the existence of an unobservable latent index variable y* related to a vector of covariates X such that

Y = \begin{cases} 1 & \text{if } y^* = X\beta + \varepsilon > 0 \\ 0 & \text{if } y^* = X\beta + \varepsilon \le 0 \end{cases}   (1)

where Y is the observed binary outcome (Y equals one if the individual defaults and zero otherwise), the explanatory variables X contain the set of factors

that affect the likelihood of defaulting, and ε is an error term.[6] Equation (1) assumes a linear parametric link between the decision to default (Y) and the explanatory variables X. The conditional probability of default is then given by

P(Y = 1 \mid X) = E(Y \mid X) = g(X\beta)   (2)

where g(·) is the distribution function of the error term ε. In the case of a Probit model, g(·) is assumed to be a normal distribution, while in the case of a Logit model it is assumed to be a logistic distribution. Hence, different distributional assumptions for ε lead to different functional forms for the conditional probability of Y = 1. The parameter vector β is estimated by maximum likelihood.

A semiparametric single index model assumes that g(·) is an unknown distribution function. This model is semiparametric in nature since the functional form of the linear index is specified, while g(·) is left unspecified. Klein and Spady (1993) suggest a semiparametric likelihood approach to obtain the parameters in Eq. (2).[7] More specifically, the maximum likelihood estimator of β is given by

\hat{\beta} = \arg\max_{\beta} L_n(\beta, g) = \sum_{i=1}^{n} \left[ y_i \ln \hat{g}_{-i}(X_i'\beta) + (1 - y_i) \ln\left(1 - \hat{g}_{-i}(X_i'\beta)\right) \right]   (3)

where g(·) is approximated for each individual (borrower) i through the leave-one-out nonparametric kernel estimator

\hat{g}_{-i}(X_i'\beta) = \frac{\sum_{j \ne i} k\!\left((X_j - X_i)'\beta / h\right) Y_j}{\sum_{j \ne i} k\!\left((X_j - X_i)'\beta / h\right)},

k(·) is a kernel function (e.g., a Gaussian kernel), and h is the bandwidth, which is estimated jointly with the coefficient vector when maximizing the leave-one-out log likelihood function.[8]

In a nonparametric setting, in turn, the conditional probability of default is modeled as

P(Y = 1 \mid X) = E(Y \mid X) = f(X)   (4)

where the functional form f(·) is unknown. The decision to default is not assumed to depend on X through a linear combination X'β, but only through a link function f(·) entirely driven by the data.
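The leave-one-out estimator inside Eq. (3) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the paper's implementation (the paper's estimates use the np package in R); the function names, the Gaussian kernel choice, and the fixed bandwidth in the demo are our assumptions:

```python
import numpy as np

def loo_link_estimate(index, y, h):
    """Leave-one-out Nadaraya-Watson estimate of g(v) = E[Y | X'beta = v]."""
    d = (index[None, :] - index[:, None]) / h   # pairwise (v_j - v_i) / h
    w = np.exp(-0.5 * d ** 2)                   # Gaussian kernel weights
    np.fill_diagonal(w, 0.0)                    # leave one out: drop j = i
    g = (w @ y) / w.sum(axis=1)
    return np.clip(g, 1e-6, 1 - 1e-6)           # keep the log likelihood finite

def semiparametric_loglik(beta, h, X, y):
    """Klein-Spady-type objective: Eq. (3) with g replaced by its LOO estimate."""
    g = loo_link_estimate(X @ beta, y, h)
    return np.sum(y * np.log(g) + (1 - y) * np.log(1 - g))

# Demo on simulated single-index data: the objective favors the true index.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
beta_true = np.array([1.0, -0.5])               # first coefficient normalized to one
y = (X @ beta_true + rng.normal(size=500) > 0).astype(float)
ll_true = semiparametric_loglik(beta_true, 0.3, X, y)
ll_wrong = semiparametric_loglik(np.array([1.0, 2.0]), 0.3, X, y)
```

In practice, β and the bandwidth h are chosen jointly by maximizing this objective with a numerical optimizer, subject to the location and scale normalizations discussed in footnote 8.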
[6] The assumption that the threshold is zero is without loss of generality provided that X includes a constant.
[7] An alternative estimator can be found in Ichimura (1993), but it is less efficient than the Klein and Spady estimator for binary choice models.
[8] Klein and Spady add a trimming function to the log likelihood function, although trimming does not seem to matter in their simulations. Single index models further require two identification conditions under which the parameter vector β and the function g(·) can be sensibly estimated. First, the set of explanatory variables X must contain at least one continuous variable. Second, β cannot be identified without some location and scale restrictions (normalizations). One popular location normalization is to not include a constant in X; one popular scale normalization is to assume that the first component of X has a unit coefficient and is a continuous variable. For further details on single index model estimation refer to Li and Racine (2006).

Assuming that f(·) is a smooth function,

it can be estimated using kernel methods. Two widely used methods are the local constant and local linear conditional mean estimators. The local linear kernel estimator, implemented below, is based on the following minimization problem:

\min_{a,b} \sum_{i=1}^{n} \left( Y_i - a - (X_i - x)'b \right)^2 k\!\left( \frac{X_i - x}{h} \right).   (5)

Following Li and Racine (2006), if â = â(x) and b̂ = b̂(x) are the solutions to (5), it can be shown that â(x) is a consistent estimator of the link function f(x) and b̂(x) is a consistent estimator of the slope ∂f(x)/∂x. The vector of smoothing parameters or bandwidths h = {h_1,…,h_q}, where q is the number of covariates in X, can be estimated by least squares cross-validation. In particular, h_1,…,h_q are chosen to minimize

\frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{f}_{-i}(X_i) \right)^2 M(X_i)

using any standard numerical optimization procedure, where f̂_{-i}(X_i) is the leave-one-out local linear kernel estimator of f(X_i) and 0 ≤ M(·) ≤ 1 is a weighting function used to avoid a slow estimation convergence rate. As indicated by Li and Racine (2006), least squares cross-validation methods smooth out irrelevant regressors.[9]

In general, the local linear and local constant least-squares estimators share many properties. We implement the local linear estimator because, compared to the local constant estimator, it does not suffer from a potentially large bias near the support boundaries of the estimated function (Fan and Gijbels 1996). The local linear estimator also outperformed the local constant estimator in our analysis. It is noteworthy, however, that while the Probit, Logit, and single index models explicitly recognize the discrete nature of the modeled outcome (i.e., to default or not), the local linear conditional mean estimator does not, and hence should be seen as a (nonparametric) approximation to a model where the dependent variable is the probability that the binary outcome equals one.
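The minimization in (5) is a weighted least-squares problem with a closed-form solution at each evaluation point x. A minimal NumPy sketch (illustrative names; a product Gaussian kernel with a fixed bandwidth vector rather than the cross-validated one):

```python
import numpy as np

def local_linear(x0, X, Y, h):
    """Local linear fit at x0: min over (a, b) of
    sum_i (Y_i - a - (X_i - x0)'b)^2 * K_h(X_i - x0), as in Eq. (5).
    Returns (a_hat, b_hat), estimating f(x0) and its gradient at x0."""
    Z = X - x0                                    # centered regressors
    w = np.exp(-0.5 * (Z / h) ** 2).prod(axis=1)  # product Gaussian kernel
    D = np.column_stack([np.ones(len(X)), Z])     # design matrix [1, X_i - x0]
    WD = D * w[:, None]
    coef = np.linalg.solve(D.T @ WD, WD.T @ Y)    # weighted least squares
    return coef[0], coef[1:]

# Demo: the local linear estimator is exact when f is truly linear.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
Y = 1.0 + 2.0 * X[:, 0]                           # f(x) = 1 + 2x, no noise
a_hat, b_hat = local_linear(np.array([0.3]), X, Y, np.array([0.2]))
```

Here a_hat recovers f(0.3) = 1.6 and b_hat the slope 2; in practice the bandwidths h_1,…,h_q would be chosen by the least squares cross-validation criterion described above.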
Further, the local linear estimator may yield fitted values that are not proper probabilities, i.e., values greater than one or less than zero.[10] We also fitted a nonparametric conditional mode model, which explicitly models the conditional probability of default, but this model was also outperformed by the local linear estimator, as discussed below.[11]

[9] An alternative selection method is the standard rule-of-thumb procedure, in which the bandwidth for covariate X_s is defined as h_s = X_{s,sd} n^{-1/(4+q)}, where X_{s,sd} is the sample standard deviation of X_s, n is the number of observations in the working sample, and q is the total number of covariates in X.
[10] In this sense, the local linear estimator is similar to the standard linear probability model. We thank an anonymous referee for noting this.
[11] See Racine (2008) for further details on nonparametric conditional mode models.

2.2 A simulation analysis

A simple example can demonstrate the advantages of using a nonparametric estimation procedure over semiparametric and parametric methods. Let the default probability of an individual depend on the loan amount and asset size. Assume that there exists an unknown threshold asset size below which the default probability rises considerably,

while for asset sizes above the threshold the default probability does not depend on asset size. This is equivalent to assuming that after reaching a certain level of assets (e.g., a certain amount of land or other fixed assets), the likelihood of repayment is relatively high and does not vary much across individuals. The standard scoring models (e.g., Probit and Logit) assume that the odds of default are linear in all explanatory variables and that the underlying distribution governing the relationship between the probability of default and the explanatory variables is known. These methods will then incorrectly estimate the risk of default by missing such nonlinearities, which can lead, for example, to the potential exclusion of good borrowers from the market, as we show in the next simulation exercise.

As in Klein and Spady (1993), let X_1 be a standard normal variate truncated at ±2 and standardized to have unit variance, and let X_2 be a chi-squared variate truncated at 6 and standardized to have zero mean and unit variance. The range of X_1 is (−2.26, 2.26) and that of X_2 is (−1.54, 2.41). Without loss of generality, further assume that the true default model is given by

y_i^* = \begin{cases} X_{i1} - 3X_{i2} + u_i & \text{if } X_{i2} \le 0.7 \\ X_{i1} - 1 + u_i & \text{if } X_{i2} > 0.7 \end{cases}   (6)

for i = 1,…,1,000, with Y_i = 1 if y_i^* > 0 and Y_i = 0 otherwise, where the u_i are standard normal. Following the example above, X_1 can be interpreted as the loan amount and X_2 as asset size, which affects the odds of default in a nonlinear fashion. In particular, the latent default index y* increases at a constant rate with the loan amount (X_1) and decreases with asset size (X_2) up to a threshold level (0.7, the 75th percentile of X_2), after which there is no correlation between y* and X_2.
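The design in Eq. (6) can be reproduced in a few lines. A sketch, with two caveats: the degrees of freedom of the chi-squared variate and the exact standardization constants are not reported here, so we assume three degrees of freedom and standardize empirically, and we truncate by rejection sampling:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

def truncated(draw, keep, n):
    """Rejection sampling: draw batches until n values satisfy the predicate."""
    out = np.empty(0)
    while out.size < n:
        z = draw(n)
        out = np.concatenate([out, z[keep(z)]])
    return out[:n]

# X1: standard normal truncated at +/-2; X2: chi-squared truncated at 6
x1 = truncated(lambda m: rng.normal(size=m), lambda z: np.abs(z) <= 2, n)
x2 = truncated(lambda m: rng.chisquare(3, size=m), lambda z: z <= 6, n)  # df = 3 assumed
x1 = (x1 - x1.mean()) / x1.std()    # standardize to zero mean, unit variance
x2 = (x2 - x2.mean()) / x2.std()

u = rng.normal(size=n)
# Eq. (6): latent default index with a kink in X2 at the 0.7 threshold
ystar = np.where(x2 <= 0.7, x1 - 3.0 * x2 + u, x1 - 1.0 + u)
Y = (ystar > 0).astype(int)
```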
Hence, after reaching the threshold of 0.7, the likelihood of default does not further decrease with X_2 (see Fig. 1a).

Using these simulated data, Table 1 compares the in-sample predictive performance of a standard Probit model, Klein and Spady's single index model, and a nonparametric model following a local-linear least-squares procedure.[12] The indicators reported include the mean squared prediction error and several performance measures based on converting the estimated default probabilities into a binary prediction using the standard 0.5 rule: if the estimated probability is greater than 0.5, the individual is predicted to default; if it is less than or equal to 0.5, the individual is predicted not to default. The binary (1/0) predictions are then compared with the observed default/non-default behavior captured by Y.

The results confirm the advantages of using a fully data-driven method over the alternatives. The nonparametric model shows a lower mean squared prediction error (MSPE) than the other two models. Similarly, the overall predictive performance of the nonparametric model based on the McFadden et al. (1977) standard measure is 86.6 versus 85.9 % for the single index model and 81.7 % for the Probit model.[13] In terms of the correct classification rate, the corresponding figures are 87.4, 86.8, and 83.2 %. A separate analysis for default and non-default cases further indicates that, although there are no major differences in the predictive performance of the three models for the default cases (the accuracy rate is between 88 and 89 %), the Probit model performs relatively poorly for the non-default cases: 74 % accuracy versus 85.1 and 86.4 % for the single index and nonparametric models.[14]

Fig. 1 a Simulating a nonlinear relationship between the odds of default and asset size (latent default index y* plotted against X_2). b Predicted probability of default on asset size using simulated data (conditional Pr(Y = 1) against X_2 for the Probit, single index, and nonparametric regression models). Note: In (a), y_i^* = X_{i1} − 3X_{i2} + u_i if X_{i2} ≤ 0.7 and y_i^* = X_{i1} − 1 + u_i if X_{i2} > 0.7, for i = 1,…,1,000, where X_1 is a standard normal variate truncated at ±2 and standardized, X_2 is a chi-squared variate truncated at 6 and standardized by centering and dividing by 1.511, and u is a standard normal variate. In (b), the predicted conditional probability of Y = 1 on X_2 was derived holding X_1 constant at its median value. The single index estimates are based on the Klein and Spady (1993) estimator using a second-order Gaussian kernel; the nonparametric estimates follow a local linear least-squares procedure, also with a Gaussian kernel.

[12] While the Probit model is implemented in Stata, the single index and nonparametric models are implemented in R using the np package.
[13] The McFadden et al. (1977) performance measure is equal to p_11 + p_22 − p_12^2 − p_21^2, where p_ij is the ijth entry (expressed as a fraction of the sum of all entries) in the 2×2 confusion matrix of actual versus predicted (0,1) outcomes.
[14] The Logit and linear probability models also perform very similarly to the Probit model. Details are available upon request.
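The measures reported in Table 1 are simple functions of the estimated probabilities and observed outcomes. A sketch (illustrative function name) applying the 0.5 rule, the McFadden et al. (1977) composite measure, and the sensitivity/specificity definitions used in the table:

```python
import numpy as np

def classification_report(p_hat, y, cutoff=0.5):
    """Apply the 0.5 rule and compute the indicators used in Table 1."""
    pred = (p_hat > cutoff).astype(int)      # predict default iff p_hat > 0.5
    # confusion-matrix entries as fractions of all observations
    p11 = np.mean((pred == 1) & (y == 1))    # correctly predicted defaults
    p22 = np.mean((pred == 0) & (y == 0))    # correctly predicted non-defaults
    p12 = np.mean((pred == 0) & (y == 1))    # missed defaults
    p21 = np.mean((pred == 1) & (y == 0))    # false alarms
    return {
        "mspe": np.mean((y - p_hat) ** 2),             # mean squared prediction error
        "mcfadden": p11 + p22 - p12 ** 2 - p21 ** 2,   # McFadden et al. (1977)
        "correct_rate": p11 + p22,                     # correct classification rate
        "sensitivity": p11 / (p11 + p12) if p11 + p12 > 0 else float("nan"),
        "specificity": p22 / (p22 + p21) if p22 + p21 > 0 else float("nan"),
    }

# Demo with four observations classified perfectly
rep = classification_report(np.array([0.9, 0.1, 0.8, 0.2]), np.array([1, 0, 1, 0]))
```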

Table 1  In-sample predictive performance of alternative binary choice models using simulated data

Indicator                                            Probit    Single index   Nonparametric

Full sample, 1,000 observations
  Mean squared prediction error                         -           -              -
  Predictive performance                             81.7 %      85.9 %         86.6 %
  Correct default/non-default classification rate    83.2 %      86.8 %         87.4 %
  Correct default classification rate
    (sensitivity), 597 defaults                      89.5 %      87.9 %         88.1 %
  Correct non-default classification rate
    (specificity), 403 non-defaults                  74.0 %      85.1 %         86.4 %

Low asset value (X_2 ≤ 0.7), 756 observations
  Mean squared prediction error                         -           -              -
  Predictive performance                             83.1 %      87.3 %         87.8 %
  Correct default/non-default classification rate    84.8 %      88.1 %         88.5 %
  Correct default classification rate
    (sensitivity), 538 defaults                      96.8 %      94.4 %         92.9 %
  Correct non-default classification rate
    (specificity), 218 non-defaults                  55.0 %      72.5 %         77.5 %

High asset value (X_2 > 0.7), 244 observations
  Mean squared prediction error                         -           -              -
  Predictive performance                             74.6 %      79.8 %         82.1 %
  Correct default/non-default classification rate    78.3 %      82.8 %         84.0 %
  Correct default classification rate
    (sensitivity), 59 defaults                       22.0 %      28.8 %         44.1 %
  Correct non-default classification rate
    (specificity), 185 non-defaults                  96.2 %     100.0 %         96.8 %

The predictive performance and classification rates are based on converting the estimated default probabilities into a binary prediction using the standard 0.5 rule. The predictive performance measure follows McFadden et al. (1977); it equals p_11 + p_22 − p_12^2 − p_21^2, where p_ij is the ijth entry in the standard 2×2 confusion matrix of actual versus predicted (0,1) outcomes, with entries expressed as fractions of the sum of all entries.
Sensitivity is the percentage of cases in which individuals who default are also predicted to default, while specificity is the percentage of cases in which individuals who do not default are also predicted not to default. The single index results are based on the Klein and Spady (1993) estimator using a second-order Gaussian kernel. The nonparametric results follow a local linear least-squares procedure, also with a Gaussian kernel.

The relative advantage of using nonparametric techniques over parametric and semiparametric alternatives becomes much clearer once we account for the fact that the odds of default are not linear in some of the explanatory variables. Table 1 also shows that the Probit model tends to overestimate the probability of default for individuals with asset holdings below the threshold (X_2 ≤ 0.7) and to underestimate it for individuals with asset holdings above the threshold (X_2 > 0.7). The single index model, which still imposes a linear combination on the way the covariates affect the decision to default, also considerably underestimates the probability of default for borrowing units with asset holdings above the threshold. More specifically, for low asset values, the Probit

model exhibits a correct non-default classification rate of 55 versus 72.5 and 77.5 % for the single index and nonparametric models; for high asset values, the Probit and single index models reveal a correct default classification rate of only 22 and 28.8 % versus 44.1 % for the nonparametric model.[15]

The analysis above confirms the importance of using less restrictive, data-driven risk scoring models, which can provide a more accurate credit assessment and help prevent excluding (including) potentially good ("bad") borrowers from the generally sensitive credit markets of most developing countries. Figure 1b reveals that the differences in the estimated probabilities of default across models are particularly acute for asset holdings close to the threshold (0.7), which makes the potential exclusion and inclusion of good and bad borrowing units particularly sensitive to the risk scoring model implemented and the decision criteria adopted. This issue becomes more complex if we consider that thresholds are not necessarily observed in reality and that there might be nonlinearities in the odds of defaulting with respect to additional covariates typically included in credit scoring models, such as loan size, household income, and debt ratio.[16]

It is also worth mentioning, however, that the benefit of not imposing distributional assumptions on the error term, or of imposing fewer restrictions on the functional form of the link function, may come at a cost. First, by imposing less structure, semi- and nonparametric methods require more data to achieve the same degree of precision as a correctly specified parametric model. In addition, measurement error, a plausible scenario in microcredit scoring in developing countries, may be harder to deal with in semi- and nonparametric methods than in parametric methods.
On this matter, we repeated the simulation exercise above but, after generating the latent default index y*, contaminated the explanatory variables (first X_2 and then both X_1 and X_2) with an additive error term drawn from a normal distribution with mean zero and increasing standard deviations (1, 2, 3, 4, 5, 10, and 20). Figure 4 in the Appendix shows that as the variance of the measurement error in the regressors increases, the difference in performance between the Probit and the semi- and nonparametric models shrinks. We measure performance using the MSPE and the overall predictive performance measure of McFadden et al. (1977). Still, even for a sufficiently large measurement error (i.e., a standard deviation greater than 5), the Probit model does not necessarily outperform the single index and nonparametric models.

[15] Note also that the differences in the MSPEs across models are more pronounced for high asset values, largely explained by the much lower correct default classification rate of the Probit and single index models.
[16] Of course, it is possible that the odds of defaulting are linear in all covariates; but even in this (implausible) scenario, data-driven methods will perform at least similarly to linear models.

3 An application in Peru

We now compare the in- and out-of-sample performance of risk scoring models based on nonparametric estimators versus semiparametric and more traditional parametric methods, using data from a microfinance institution in rural Peru. We use a dataset of loans granted to SMEs linked to the agricultural sector.

3.1 Data

The dataset includes information on 2,899 different short-term loans granted to 1,393 clients (SMEs) of a municipal savings bank in a rural area of Peru.[17] All of the credits were provided for agricultural activities, with the purpose of financing working capital. The loans were granted after January 1, 2008, and all of them expired sometime before July 31. The amounts were granted in local currency (Nuevos Soles) and had to be paid back in a single full installment at the end of the loan term.

The data were collected by credit analysts from the bank, based on at least one visit to the client prior to loan approval and on documentation provided by the clients. For the client's evaluation, applicants had to present documents validating the borrower's information, including property titles, land leasing agreements, bills or receipts of advance grain purchases, and any other kind of related guarantees.

Based on the information collected during the loan application, a total of ten variables were considered for the risk scoring (refer to Table 3 in the Appendix). Most of these variables are commonly used for risk assessment in microfinance and include socioeconomic characteristics of the client as well as loan characteristics. In particular, we account for the age, gender, education level, and marital status of the client; the size of the enterprise and the number of years in the business; whether the client owns real estate; and the amount, term, and interest rate of the loan.[18] The dependent variable, in turn, is a binary (1/0) variable indicating whether the client defaulted. According to the bank's regulations, individuals are legally considered to default if they do not make the required one-time full payment of the loan within three days after the end of the loan term. We follow the same definition to determine whether an individual defaulted. Table 4 presents summary statistics of the data.
As can be seen, 69 % of the client base is male, 70 % is married, and clients' ages range from 19 to 85. The education level of the borrowers is relatively low, with only 10 % having some kind of tertiary education. The firms are clearly small, with an average size of 3.3 workers and a maximum of 15 workers. Similarly, the average client has been in the business for 19 years, and two thirds own some form of real estate. Regarding the loans, the average loan size was 4,197 Nuevos Soles (around 1,422 US dollars using the average exchange rate of 2009), interest rates ranged between 2.9 and 4.5 %, and loans had to be paid back, on average, within six months. Also note that more than half of the client base defaulted (58 %).

3.2 Estimation results

We estimate the probability of default as a function of socioeconomic and loan characteristics using three alternative binary choice models. Similar to the simulation analysis, we implement a Probit model, Klein and Spady's single index model, and a nonparametric model applying a local-linear least-squares procedure. 19 For evaluation purposes, we randomly partition our dataset into a design sample for model estimation (60 % of the observations) and a test sample for further analysis (40 % of the observations), maintaining the population proportions of default and non-default actions in the two samples. This is a standard cross-validation procedure that allows us to conduct both in- and out-of-sample assessments of the estimated models. 20

The data for estimating default models should ideally include both accepted and rejected applicants. In practice, however, it is usually only possible to identify default and non-default cases among clients who were granted a credit (as in our data). This implies that there could be a potential bias in the data generating process for these models, which captures the procedures used by the financial institution to accept or reject applicants. The estimators could reflect this bias, although it is not possible to determine its direction (Capon 1982). In our case, the savings bank has a very lenient acceptance procedure and the rejection rate is negligible, which suggests that any potential selection bias in our data is marginal. The selection process basically consists of validating all the information presented by the applicant during the application process.

The full estimation results are reported in Table 5. The left panel of the table presents the estimated coefficients for the Probit model. In this parametric setup, where the probability of defaulting is assumed to change monotonically with a linear index of the covariates, asset ownership, a shorter loan term, and a larger firm size, for example, decrease the probability of default, while older, less educated clients and men have a higher probability of default. The coefficients, however, are not statistically significant at conventional levels in most cases.

17 The name of the bank is omitted for confidentiality reasons.
18 Unfortunately, we only have information on asset (real estate) ownership but not on asset value. We also do not have information on debt ratio.
The middle panel reports the estimated coefficients for the single index model using a Gaussian kernel function of order two, with the bandwidth or smoothing parameter estimated from the data. The sign and magnitude of these coefficients should be interpreted with caution, given that they are normalized with respect to the first variable in the set of explanatory variables (i.e., the client's age). In this setting, most of the explanatory variables are statistically significant at the 5 % level.

Finally, the right panel shows the corresponding bandwidths for each regressor estimated through a local-linear least-squares cross-validation procedure. We use a Gaussian kernel type for the continuous variables and the Li and Racine (2004) unordered categorical kernel type for the discrete variables. This cross-validation method assigns large bandwidth values to regressors that enter the model linearly (which appears to be the case for the loan amount) and relatively small bandwidth values to regressors that enter nonlinearly. The reported p-values of the significance tests indicate that most of the explanatory variables are statistically significant in this nonparametric setting; only the loan amount is not. The significance tests follow Racine (1997) and Racine et al. (2006), and are based on a bootstrapping procedure with 1,000 replications.

19 We estimate a random-effects Probit model since a client may be observed more than once in the database.
20 We also considered alternative data partitions (e.g., 70-30) and obtained qualitatively similar results. The results are also not sensitive to repeated random partitions.
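The local-linear estimation with mixed continuous/discrete kernels and cross-validated bandwidths can be sketched with statsmodels' `KernelReg`; this is an assumption about tooling, not the authors' implementation, and statsmodels' default unordered kernel is the Aitchison-Aitken type rather than the Li and Racine (2004) kernel used in the paper:

```python
import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

rng = np.random.default_rng(1)

# Simulated stand-in: one continuous regressor (say, client age) and one
# binary regressor (say, asset ownership); the default probability is
# nonlinear in the continuous variable.
n = 150
age = rng.uniform(20, 80, n)
owner = rng.integers(0, 2, n).astype(float)
p = 0.5 + 0.3 * np.sin(age / 10.0) - 0.15 * owner
y = (rng.random(n) < p).astype(float)

# Local-linear least squares (reg_type='ll') with bandwidths chosen by
# least-squares cross-validation (bw='cv_ls'): a Gaussian kernel for the
# continuous variable ('c') and an unordered categorical kernel for the
# discrete one ('u').
kr = KernelReg(endog=y, exog=np.column_stack([age, owner]),
               var_type='cu', reg_type='ll', bw='cv_ls')
fitted, _ = kr.fit()
# As footnote 21 notes for the paper's estimates, local-linear fitted
# values can fall slightly outside [0, 1].
```

A large cross-validated bandwidth on a regressor signals an approximately linear effect; a small one signals nonlinearity, which is the diagnostic used in the text.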

For comparison across models, Table 5 also reports the marginal effects of the regressors on the conditional probability of default, evaluated at the median values. For continuous variables, the simulated change is equivalent to one standard deviation, while for discrete variables the change is from 0 to 1. Note, however, that the marginal effects in the semi- and nonparametric models are not unique, as these effects vary across evaluation points (as shown in Fig. 3 below for business size and client age).

According to the Probit model, the two most important (and statistically significant) factors are asset ownership and the client's age: owning real estate decreases the probability of default by almost 14 percentage points, while a one standard deviation increase in the client's age (12.6 years) increases the likelihood of defaulting by 4.5 percentage points. In the case of the single index model, the two most important factors (at the median values) are education level and loan size: having tertiary education and a one standard deviation increase in the loan amount (4,900 Nuevos Soles or 1,660 US dollars) decrease the probability of default by 49 and 48 percentage points, respectively. Education level is also one of the most important factors (at the median values) in the nonparametric model, together with the loan term: an individual with tertiary education has a 34-percentage-point lower probability of defaulting, while a one standard deviation increase in the loan term (4 months) decreases the likelihood of defaulting by 39 percentage points. Hence, the models yield varying results regarding the effects of the covariates on the estimated probability of default; we discuss this further below.

Turning to the predictive performance of the models, Table 2 compares their in- and out-of-sample performance.
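The finite-difference marginal effects just described (a one-standard-deviation change from the median profile for continuous variables, a 0-to-1 change for discrete ones) can be sketched for the Probit case as follows; the coefficients and the median age below are made up for illustration and are not the Table 5 estimates, while the 12.6-year standard deviation is taken from the text:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical Probit coefficients (constant, age, real-estate ownership)
# and a hypothetical median profile -- for illustration only.
beta = np.array([-0.2, 0.03, -0.45])
x_med = np.array([1.0, 41.0, 0.0])   # constant, median age, no real estate
sd_age = 12.6                        # one s.d. of client age, as in the text

def p_default(x, beta):
    """Probit probability of default at covariate profile x."""
    return norm.cdf(x @ beta)

# Continuous variable: shift the median profile by one standard deviation.
x_hi = x_med.copy()
x_hi[1] += sd_age
me_age = p_default(x_hi, beta) - p_default(x_med, beta)

# Discrete variable: change from 0 to 1 at the median profile.
x_own = x_med.copy()
x_own[2] = 1.0
me_own = p_default(x_own, beta) - p_default(x_med, beta)
```

Because the semi- and nonparametric conditional mean functions are not globally linear, repeating this computation at different evaluation points gives different effects, which is why the text stresses that their marginal effects are not unique.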
Similar to the simulation analysis, a simple contrast of the within-sample performance across models confirms the higher accuracy of a less structured, data-driven approach over the alternatives. 21 As shown in the top panel of the table, the MSPE of the nonparametric model (0.153) is much lower than that of the Probit model (0.251) and the semiparametric model (0.190). In addition, the overall predictive performance of the nonparametric estimator based on the 0.5 rule is 75 %, versus 67.1 % for the single index model and 44 % for the Probit model, the best-performing parametric model. 22 The poorer performance of the Probit model is largely explained by its poor predictions for individuals who do not default on their payments (i.e., identification of good borrowers).

Despite the better fit of the data-driven method on the sample used for the estimation, it is necessary to evaluate whether this method also provides a better out-of-sample fit than the alternative methods. Ultimately, we want to examine whether the proposed method will help lending institutions correctly identify and select their current and future clients. The out-of-sample assessment of the estimated models, reported in the bottom panel of Table 2, corroborates the advantages of using a data-driven approach over alternative specifications. The nonparametric model exhibits a lower mean squared prediction error (0.205) than the semiparametric and parametric models.

21 As indicated above, the local linear model may yield fitted values greater than one or less than zero. In this case, the fitted values range between -0.01 and 1.06, where 14 observations (out of 1,739) are greater than one and one observation is less than zero.
22 The predictive performance (both in-sample and out-of-sample) of the Logit and linear probability models is very similar to that of the Probit model. Further details are available upon request.
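The indicators reported in Table 2 (MSPE, the 0.5-rule classification rates, sensitivity, specificity, and the McFadden et al. 1977 measure described in the note to the table) can be computed from actual outcomes and predicted default probabilities as in this sketch, which is our reading of the definitions rather than the authors' code:

```python
import numpy as np

def performance(y, p_hat, cutoff=0.5):
    """Table 2-style indicators from actual outcomes y (1 = default) and
    predicted default probabilities p_hat."""
    mspe = np.mean((y - p_hat) ** 2)
    pred = (p_hat >= cutoff).astype(int)          # standard 0.5 rule

    # 2x2 confusion matrix, entries as fractions of all observations:
    # p[i, j] = share with actual outcome i and predicted outcome j.
    p = np.array([[np.mean((y == i) & (pred == j)) for j in (0, 1)]
                  for i in (0, 1)])

    sensitivity = p[1, 1] / (p[1, 0] + p[1, 1])   # defaults predicted to default
    specificity = p[0, 0] / (p[0, 0] + p[0, 1])   # non-defaults predicted as such
    correct = p[0, 0] + p[1, 1]                   # overall classification rate
    # McFadden et al. (1977) measure from the note to Table 2:
    # p11 + p22 - p12^2 - p21^2 (indices here are 0-based).
    mcfadden = p[0, 0] + p[1, 1] - p[0, 1] ** 2 - p[1, 0] ** 2
    return mspe, correct, sensitivity, specificity, mcfadden
```

For instance, with y = [1, 1, 0, 0] and p_hat = [0.9, 0.8, 0.1, 0.2], every case is classified correctly under the 0.5 rule, so sensitivity, specificity, and the McFadden measure all equal one.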

Table 2 Predictive performance of alternative binary choice models using loan data from SMEs in rural Peru

Indicator                                          Probit model   Single index model   Nonparametric model

In-sample performance, 1,739 observations
Mean square predicted error                        0.251          0.190                0.153
Predictive performance                             44.0 %         67.1 %               75.0 %
Correct default/non-default classification rate    58.8 %         71.7 %               78.1 %
Correct default classification rate
  (sensitivity), 1,009 defaults                    94.9 %         85.4 %               91.4 %
Correct non-default classification rate
  (specificity), 730 non-defaults                  8.8 %          52.7 %               59.7 %

Out-of-sample performance, 1,160 observations
Mean square predicted error                                                            0.205
Predictive performance                             43.8 %         57.2 %               61.0 %
Correct default/non-default classification rate    58.6 %         64.1 %               67.1 %
Correct default classification rate
  (sensitivity), 673 defaults                      94.9 %         77.0 %               81.6 %
Correct non-default classification rate
  (specificity), 487 non-defaults                  8.4 %          46.2 %               47.0 %

The predictive performance and classification rates are based on converting the estimated default probabilities to a binary prediction using the standard 0.5 rule. The predictive performance measure follows McFadden et al. (1977); the measure is equal to p_11 + p_22 - p_12^2 - p_21^2, where p_ij is the ijth entry of the standard 2x2 confusion matrix of actual versus predicted (0,1) outcomes, in which the entries are expressed as fractions of the sum of all entries. Sensitivity accounts for the percentage of cases in which individuals defaulting are also predicted to default, while specificity measures the percentage of cases in which individuals not defaulting are also predicted to not default. The parametric results are based on the estimation of a Probit model with random effects. The single index results are based on the Klein and Spady (1993) estimator using a Gaussian kernel function of order two.
The nonparametric results follow a local linear least-squares procedure using a Gaussian kernel type for the continuous explanatory variables and the Li and Racine (2004) unordered categorical kernel type for the discrete variables.

Similarly, the overall predictive performance of the nonparametric model is over 17 percentage points higher than that of the Probit model and close to four percentage points higher than that of the single index model. More specifically, the nonparametric method has a predictive performance of 61 %, versus 57.2 % for the semiparametric model and 43.8 % for the Probit model. Alternatively, the data-driven method shows 67.1 % correct predictions, versus 64.1 % for the single index model and 58.6 % for the parametric method. As in the case of the in-sample evaluations, most of the differences between the models arise when examining their accuracy among the non-defaulting cases (487 out of 1,160 credits). The nonparametric method correctly predicts non-defaults for 47 % of them (229 credits), while the single index model has an accuracy rate of 46.2 % (225 credits) and the Probit model an accuracy rate of only 8.4 % (41 credits). Among the defaulting cases (673 credits), the data-driven approach has an accuracy rate of 81.6 % (549 credits), versus 77 % (518 credits) for the semiparametric approach

and 94.9 % (639 credits) for the parametric specification. Thus, while the parametric approach does a good job identifying potential bad borrowers, it has serious limitations when identifying good borrowers.

As noted earlier, the local linear conditional mean model also exhibits a better performance than alternative nonparametric methods. For comparison purposes, Table 6 reports both in- and out-of-sample performance indicators for the local linear model versus the local constant conditional mean model and the conditional mode model, which explicitly models the conditional probability of default. In sample, the overall predictive performance of the local linear model is 2.4 and 1 percentage points higher than that of the local constant and conditional mode models, respectively; the local linear model also exhibits a lower mean squared predicted error (0.153) than the local constant model. Out of sample, the performance of the local linear model is 0.1 and 1.3 percentage points higher than that of the other two models, and again the local linear estimator shows a lower mean squared predicted error (0.205) than the local constant estimator. Another interesting pattern that emerges from the table is that all three nonparametric methods outperform the parametric (Probit) model and the semiparametric single index model.

An alternative way to evaluate the out-of-sample performance of credit score models is to examine the number of good clients the model rates as bad (Type I error) and the number of bad clients the model rates as good (Type II error) for varying cutoff values of the probability of default. So far, we have used the standard 0.5 rule for the performance assessment. Figure 2a, b compares the percentage of good borrowers rejected and the percentage of bad borrowers accepted across the three estimated models for different cutoff values.
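The cutoff sweep behind these comparisons can be sketched as follows, with Type I and Type II errors defined as in the text and hypothetical outcome and probability vectors standing in for the estimated models:

```python
import numpy as np

def error_curves(y, p_hat, cutoffs):
    """Type I and Type II error rates across cutoff values, as in Fig. 2:
    Type I  = share of good clients (y == 0) rated as bad (rejected),
    Type II = share of bad clients (y == 1) rated as good (accepted)."""
    good, bad = (y == 0), (y == 1)
    type1, type2 = [], []
    for c in cutoffs:
        reject = p_hat >= c            # predicted default -> rejected
        type1.append(np.mean(reject[good]))
        type2.append(np.mean(~reject[bad]))
    return np.array(type1), np.array(type2)
```

At a cutoff of zero every applicant is rejected (Type I error of one, Type II of zero); as the cutoff rises past every predicted probability the two rates swap, which is the trade-off the figure traces out for each model.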
In the case of Type I errors, the data-driven method outperforms the other methods for almost the entire range of cutoff values. This means that lending institutions will better identify good clients by relying on the nonparametric credit score approach under basically any rejection rule (cutoff value). Note that the Probit model performs poorly at accepting good clients for cutoff values lower than 0.6; furthermore, for cutoff values lower than 0.5 the percentage of good borrowers rejected is close to 100 %. In the case of Type II errors, we observe that the Probit model performs better than the nonparametric and semiparametric models for lower cutoff values. Hence, for more stringent acceptance rules (low cutoff values) the lending institution will do better using a parametric approach to identify bad clients, although this will also result in a much higher rejection rate of good clients (Type I error) compared to the other methods. For more lenient acceptance rules (cutoff values higher than 0.6), the nonparametric model generally outperforms the other models.

Fig. 2 a Comparison of Type I errors (percentage of "good" borrowers rejected across cutoff values). b Comparison of Type II errors (percentage of "bad" borrowers accepted across cutoff values). Note The parametric estimates are based on the estimation of a Probit model with random effects. The single index estimates are based on the Klein and Spady (1993) estimator using a Gaussian kernel function of order two. The nonparametric estimates follow a local linear least-squares procedure using a Gaussian kernel type for the continuous explanatory variables and the Li and Racine (2004) unordered categorical kernel type for the discrete variables.

The lower overall performance of the Probit model could be linked to the lack of sufficient, relevant controls to model the probability of default (note that most of the control variables in the Probit specification are not statistically significant). As indicated above, data limitations prevent us from controlling for asset value or debt ratio, for example. 23 However, the inclusion of additional controls could eventually increase the predictive performance of all models, but not necessarily of one model in particular. Data-driven methods will generally perform at least as well as alternative methods given that they do not impose ex ante assumptions about functional forms.

23 We also do not account for the probability of crop failure or climate conditions, but these variables are unlikely to explain default behavior in this case since the loans analyzed were granted to smallholder farmers operating in a particular rural area in Peru.

A standard specification error or link test also indicates that the parametric regression models do not appear to be properly specified in this case, which further motivates the use of semi- and nonparametric models. In particular, Table 7 reports the results of the link test based on Tukey (1949) and Pregibon (1979), after fitting a Probit, Logit, and linear probability model. The test consists of refitting each model using its linear predicted values and the square of these values; if the model is properly specified, the linear predicted value should be a statistically significant predictor, but the squared value should have no predictive power. Formally, the test evaluates whether the link function relating the dependent variable to the explanatory variables is properly specified; the test is also often interpreted as evaluating whether, conditional on the specification, the explanatory variables are correctly specified.
The results for all three models suggest the existence of a specification (link) error at a 5 % significance level.


More information

CASE STUDY 2: EXPANDING CREDIT ACCESS

CASE STUDY 2: EXPANDING CREDIT ACCESS CASE STUDY 2: EXPANDING CREDIT ACCESS Why Randomize? This case study is based on Expanding Credit Access: Using Randomized Supply Decisions To Estimate the Impacts, by Dean Karlan (Yale) and Jonathan Zinman

More information

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

More information

Alternative VaR Models

Alternative VaR Models Alternative VaR Models Neil Roeth, Senior Risk Developer, TFG Financial Systems. 15 th July 2015 Abstract We describe a variety of VaR models in terms of their key attributes and differences, e.g., parametric

More information

Developing a Risk Group Predictive Model for Korean Students Falling into Bad Debt*

Developing a Risk Group Predictive Model for Korean Students Falling into Bad Debt* Asian Economic Journal 2018, Vol. 32 No. 1, 3 14 3 Developing a Risk Group Predictive Model for Korean Students Falling into Bad Debt* Jun-Tae Han, Jae-Seok Choi, Myeon-Jung Kim and Jina Jeong Received

More information

LINKED DOCUMENT F1: REGRESSION ANALYSIS OF PROJECT PERFORMANCE

LINKED DOCUMENT F1: REGRESSION ANALYSIS OF PROJECT PERFORMANCE LINKED DOCUMENT F1: REGRESSION ANALYSIS OF PROJECT PERFORMANCE A. Background 1. There are not many studies that analyze the specific impact of decentralization policies on project performance although

More information

Expected utility inequalities: theory and applications

Expected utility inequalities: theory and applications Economic Theory (2008) 36:147 158 DOI 10.1007/s00199-007-0272-1 RESEARCH ARTICLE Expected utility inequalities: theory and applications Eduardo Zambrano Received: 6 July 2006 / Accepted: 13 July 2007 /

More information

Logit Models for Binary Data

Logit Models for Binary Data Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis These models are appropriate when the response

More information

Omitted Variables Bias in Regime-Switching Models with Slope-Constrained Estimators: Evidence from Monte Carlo Simulations

Omitted Variables Bias in Regime-Switching Models with Slope-Constrained Estimators: Evidence from Monte Carlo Simulations Journal of Statistical and Econometric Methods, vol. 2, no.3, 2013, 49-55 ISSN: 2051-5057 (print version), 2051-5065(online) Scienpress Ltd, 2013 Omitted Variables Bias in Regime-Switching Models with

More information

High-Frequency Data Analysis and Market Microstructure [Tsay (2005), chapter 5]

High-Frequency Data Analysis and Market Microstructure [Tsay (2005), chapter 5] 1 High-Frequency Data Analysis and Market Microstructure [Tsay (2005), chapter 5] High-frequency data have some unique characteristics that do not appear in lower frequencies. At this class we have: Nonsynchronous

More information

Quant Econ Pset 2: Logit

Quant Econ Pset 2: Logit Quant Econ Pset 2: Logit Hosein Joshaghani Due date: February 20, 2017 The main goal of this problem set is to get used to Logit, both to its mechanics and its economics. In order to fully grasp this useful

More information

ONLINE APPENDIX (NOT FOR PUBLICATION) Appendix A: Appendix Figures and Tables

ONLINE APPENDIX (NOT FOR PUBLICATION) Appendix A: Appendix Figures and Tables ONLINE APPENDIX (NOT FOR PUBLICATION) Appendix A: Appendix Figures and Tables 34 Figure A.1: First Page of the Standard Layout 35 Figure A.2: Second Page of the Credit Card Statement 36 Figure A.3: First

More information

THE EQUIVALENCE OF THREE LATENT CLASS MODELS AND ML ESTIMATORS

THE EQUIVALENCE OF THREE LATENT CLASS MODELS AND ML ESTIMATORS THE EQUIVALENCE OF THREE LATENT CLASS MODELS AND ML ESTIMATORS Vidhura S. Tennekoon, Department of Economics, Indiana University Purdue University Indianapolis (IUPUI), School of Liberal Arts, Cavanaugh

More information

Pension fund investment: Impact of the liability structure on equity allocation

Pension fund investment: Impact of the liability structure on equity allocation Pension fund investment: Impact of the liability structure on equity allocation Author: Tim Bücker University of Twente P.O. Box 217, 7500AE Enschede The Netherlands t.bucker@student.utwente.nl In this

More information

Financial Liberalization and Neighbor Coordination

Financial Liberalization and Neighbor Coordination Financial Liberalization and Neighbor Coordination Arvind Magesan and Jordi Mondria January 31, 2011 Abstract In this paper we study the economic and strategic incentives for a country to financially liberalize

More information

Depression Babies: Do Macroeconomic Experiences Affect Risk-Taking?

Depression Babies: Do Macroeconomic Experiences Affect Risk-Taking? Depression Babies: Do Macroeconomic Experiences Affect Risk-Taking? October 19, 2009 Ulrike Malmendier, UC Berkeley (joint work with Stefan Nagel, Stanford) 1 The Tale of Depression Babies I don t know

More information

Lecture 21: Logit Models for Multinomial Responses Continued

Lecture 21: Logit Models for Multinomial Responses Continued Lecture 21: Logit Models for Multinomial Responses Continued Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University

More information

Online Appendix. income and saving-consumption preferences in the context of dividend and interest income).

Online Appendix. income and saving-consumption preferences in the context of dividend and interest income). Online Appendix 1 Bunching A classical model predicts bunching at tax kinks when the budget set is convex, because individuals above the tax kink wish to decrease their income as the tax rate above the

More information

Bayesian Methods for Improving Credit Scoring Models

Bayesian Methods for Improving Credit Scoring Models Bayesian Methods for Improving Credit Scoring Models Gunter Löffler, Peter N. Posch *, Christiane Schöne First Version: January 2004. This Version: 31st May 2005 Department of Finance, University of Ulm,

More information

Linda Allen, Jacob Boudoukh and Anthony Saunders, Understanding Market, Credit and Operational Risk: The Value at Risk Approach

Linda Allen, Jacob Boudoukh and Anthony Saunders, Understanding Market, Credit and Operational Risk: The Value at Risk Approach P1.T4. Valuation & Risk Models Linda Allen, Jacob Boudoukh and Anthony Saunders, Understanding Market, Credit and Operational Risk: The Value at Risk Approach Bionic Turtle FRM Study Notes Reading 26 By

More information

Factors Affecting Foreign Investor Choice in Types of U.S. Real Estate

Factors Affecting Foreign Investor Choice in Types of U.S. Real Estate JOURNAL OF REAL ESTATE RESEARCH Factors Affecting Foreign Investor Choice in Types of U.S. Real Estate Deborah Ann Ford* Hung-Gay Fung* Daniel A. Gerlowski* Abstract. Using transaction level data, we present

More information

Vlerick Leuven Gent Working Paper Series 2003/30 MODELLING LIMITED DEPENDENT VARIABLES: METHODS AND GUIDELINES FOR RESEARCHERS IN STRATEGIC MANAGEMENT

Vlerick Leuven Gent Working Paper Series 2003/30 MODELLING LIMITED DEPENDENT VARIABLES: METHODS AND GUIDELINES FOR RESEARCHERS IN STRATEGIC MANAGEMENT Vlerick Leuven Gent Working Paper Series 2003/30 MODELLING LIMITED DEPENDENT VARIABLES: METHODS AND GUIDELINES FOR RESEARCHERS IN STRATEGIC MANAGEMENT HARRY P. BOWEN Harry.Bowen@vlerick.be MARGARETHE F.

More information

TAXES, TRANSFERS, AND LABOR SUPPLY. Henrik Jacobsen Kleven London School of Economics. Lecture Notes for PhD Public Finance (EC426): Lent Term 2012

TAXES, TRANSFERS, AND LABOR SUPPLY. Henrik Jacobsen Kleven London School of Economics. Lecture Notes for PhD Public Finance (EC426): Lent Term 2012 TAXES, TRANSFERS, AND LABOR SUPPLY Henrik Jacobsen Kleven London School of Economics Lecture Notes for PhD Public Finance (EC426): Lent Term 2012 AGENDA Why care about labor supply responses to taxes and

More information

INTERNATIONAL REAL ESTATE REVIEW 2002 Vol. 5 No. 1: pp Housing Demand with Random Group Effects

INTERNATIONAL REAL ESTATE REVIEW 2002 Vol. 5 No. 1: pp Housing Demand with Random Group Effects Housing Demand with Random Group Effects 133 INTERNATIONAL REAL ESTATE REVIEW 2002 Vol. 5 No. 1: pp. 133-145 Housing Demand with Random Group Effects Wen-chieh Wu Assistant Professor, Department of Public

More information

Volume 37, Issue 2. Handling Endogeneity in Stochastic Frontier Analysis

Volume 37, Issue 2. Handling Endogeneity in Stochastic Frontier Analysis Volume 37, Issue 2 Handling Endogeneity in Stochastic Frontier Analysis Mustafa U. Karakaplan Georgetown University Levent Kutlu Georgia Institute of Technology Abstract We present a general maximum likelihood

More information

Random Variables and Applications OPRE 6301

Random Variables and Applications OPRE 6301 Random Variables and Applications OPRE 6301 Random Variables... As noted earlier, variability is omnipresent in the business world. To model variability probabilistically, we need the concept of a random

More information

Final Exam Suggested Solutions

Final Exam Suggested Solutions University of Washington Fall 003 Department of Economics Eric Zivot Economics 483 Final Exam Suggested Solutions This is a closed book and closed note exam. However, you are allowed one page of handwritten

More information

Contrarian Trades and Disposition Effect: Evidence from Online Trade Data. Abstract

Contrarian Trades and Disposition Effect: Evidence from Online Trade Data. Abstract Contrarian Trades and Disposition Effect: Evidence from Online Trade Data Hayato Komai a Ryota Koyano b Daisuke Miyakawa c Abstract Using online stock trading records in Japan for 461 individual investors

More information

Laplace approximation

Laplace approximation NPFL108 Bayesian inference Approximate Inference Laplace approximation Filip Jurčíček Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic Home page: http://ufal.mff.cuni.cz/~jurcicek

More information

For Online Publication Additional results

For Online Publication Additional results For Online Publication Additional results This appendix reports additional results that are briefly discussed but not reported in the published paper. We start by reporting results on the potential costs

More information

ECS171: Machine Learning

ECS171: Machine Learning ECS171: Machine Learning Lecture 15: Tree-based Algorithms Cho-Jui Hsieh UC Davis March 7, 2018 Outline Decision Tree Random Forest Gradient Boosted Decision Tree (GBDT) Decision Tree Each node checks

More information

Retirement. Optimal Asset Allocation in Retirement: A Downside Risk Perspective. JUne W. Van Harlow, Ph.D., CFA Director of Research ABSTRACT

Retirement. Optimal Asset Allocation in Retirement: A Downside Risk Perspective. JUne W. Van Harlow, Ph.D., CFA Director of Research ABSTRACT Putnam Institute JUne 2011 Optimal Asset Allocation in : A Downside Perspective W. Van Harlow, Ph.D., CFA Director of Research ABSTRACT Once an individual has retired, asset allocation becomes a critical

More information

A Statistical Analysis to Predict Financial Distress

A Statistical Analysis to Predict Financial Distress J. Service Science & Management, 010, 3, 309-335 doi:10.436/jssm.010.33038 Published Online September 010 (http://www.scirp.org/journal/jssm) 309 Nicolas Emanuel Monti, Roberto Mariano Garcia Department

More information

Appendix B: Methodology and Finding of Statistical and Econometric Analysis of Enterprise Survey and Portfolio Data

Appendix B: Methodology and Finding of Statistical and Econometric Analysis of Enterprise Survey and Portfolio Data Appendix B: Methodology and Finding of Statistical and Econometric Analysis of Enterprise Survey and Portfolio Data Part 1: SME Constraints, Financial Access, and Employment Growth Evidence from World

More information

Calculating the Probabilities of Member Engagement

Calculating the Probabilities of Member Engagement Calculating the Probabilities of Member Engagement by Larry J. Seibert, Ph.D. Binary logistic regression is a regression technique that is used to calculate the probability of an outcome when there are

More information

Modelling the potential human capital on the labor market using logistic regression in R

Modelling the potential human capital on the labor market using logistic regression in R Modelling the potential human capital on the labor market using logistic regression in R Ana-Maria Ciuhu (dobre.anamaria@hotmail.com) Institute of National Economy, Romanian Academy; National Institute

More information

Annual risk measures and related statistics

Annual risk measures and related statistics Annual risk measures and related statistics Arno E. Weber, CIPM Applied paper No. 2017-01 August 2017 Annual risk measures and related statistics Arno E. Weber, CIPM 1,2 Applied paper No. 2017-01 August

More information

Leasing and Debt in Agriculture: A Quantile Regression Approach

Leasing and Debt in Agriculture: A Quantile Regression Approach Leasing and Debt in Agriculture: A Quantile Regression Approach Farzad Taheripour, Ani L. Katchova, and Peter J. Barry May 15, 2002 Contact Author: Ani L. Katchova University of Illinois at Urbana-Champaign

More information

Online Appendix from Bönke, Corneo and Lüthen Lifetime Earnings Inequality in Germany

Online Appendix from Bönke, Corneo and Lüthen Lifetime Earnings Inequality in Germany Online Appendix from Bönke, Corneo and Lüthen Lifetime Earnings Inequality in Germany Contents Appendix I: Data... 2 I.1 Earnings concept... 2 I.2 Imputation of top-coded earnings... 5 I.3 Correction of

More information

Econometrics is. The estimation of relationships suggested by economic theory

Econometrics is. The estimation of relationships suggested by economic theory Econometrics is Econometrics is The estimation of relationships suggested by economic theory Econometrics is The estimation of relationships suggested by economic theory The application of mathematical

More information

Determinants of Households

Determinants of Households Determinants of Households Default Probability in Uruguay Abstract María Victoria Landaberry This paper estimates models on the default probability of households in Uruguay considering sociodemographic

More information

A Note on the Oil Price Trend and GARCH Shocks

A Note on the Oil Price Trend and GARCH Shocks MPRA Munich Personal RePEc Archive A Note on the Oil Price Trend and GARCH Shocks Li Jing and Henry Thompson 2010 Online at http://mpra.ub.uni-muenchen.de/20654/ MPRA Paper No. 20654, posted 13. February

More information

Discussion Reactions to Dividend Changes Conditional on Earnings Quality

Discussion Reactions to Dividend Changes Conditional on Earnings Quality Discussion Reactions to Dividend Changes Conditional on Earnings Quality DORON NISSIM* Corporate disclosures are an important source of information for investors. Many studies have documented strong price

More information

CHAPTER 4 DATA ANALYSIS Data Hypothesis

CHAPTER 4 DATA ANALYSIS Data Hypothesis CHAPTER 4 DATA ANALYSIS 4.1. Data Hypothesis The hypothesis for each independent variable to express our expectations about the characteristic of each independent variable and the pay back performance

More information

**BEGINNING OF EXAMINATION** A random sample of five observations from a population is:

**BEGINNING OF EXAMINATION** A random sample of five observations from a population is: **BEGINNING OF EXAMINATION** 1. You are given: (i) A random sample of five observations from a population is: 0.2 0.7 0.9 1.1 1.3 (ii) You use the Kolmogorov-Smirnov test for testing the null hypothesis,

More information

TOURISM GENERATION ANALYSIS BASED ON A SCOBIT MODEL * Lingling, WU **, Junyi ZHANG ***, and Akimasa FUJIWARA ****

TOURISM GENERATION ANALYSIS BASED ON A SCOBIT MODEL * Lingling, WU **, Junyi ZHANG ***, and Akimasa FUJIWARA **** TOURISM GENERATION ANALYSIS BASED ON A SCOBIT MODEL * Lingling, WU **, Junyi ZHANG ***, and Akimasa FUJIWARA ****. Introduction Tourism generation (or participation) is one of the most important aspects

More information

Premium Timing with Valuation Ratios

Premium Timing with Valuation Ratios RESEARCH Premium Timing with Valuation Ratios March 2016 Wei Dai, PhD Research The predictability of expected stock returns is an old topic and an important one. While investors may increase expected returns

More information

DO TARGET PRICES PREDICT RATING CHANGES? Ombretta Pettinato

DO TARGET PRICES PREDICT RATING CHANGES? Ombretta Pettinato DO TARGET PRICES PREDICT RATING CHANGES? Ombretta Pettinato Abstract Both rating agencies and stock analysts valuate publicly traded companies and communicate their opinions to investors. Empirical evidence

More information

Note on Assessment and Improvement of Tool Accuracy

Note on Assessment and Improvement of Tool Accuracy Developing Poverty Assessment Tools Project Note on Assessment and Improvement of Tool Accuracy The IRIS Center June 2, 2005 At the workshop organized by the project on January 30, 2004, practitioners

More information

Effects of Tax-Based Saving Incentives on Contribution Behavior: Lessons from the Introduction of the Riester Scheme in Germany

Effects of Tax-Based Saving Incentives on Contribution Behavior: Lessons from the Introduction of the Riester Scheme in Germany Modern Economy, 2016, 7, 1198-1222 http://www.scirp.org/journal/me ISSN Online: 2152-7261 ISSN Print: 2152-7245 Effects of Tax-Based Saving Incentives on Contribution Behavior: Lessons from the Introduction

More information

List of tables List of boxes List of screenshots Preface to the third edition Acknowledgements

List of tables List of boxes List of screenshots Preface to the third edition Acknowledgements Table of List of figures List of tables List of boxes List of screenshots Preface to the third edition Acknowledgements page xii xv xvii xix xxi xxv 1 Introduction 1 1.1 What is econometrics? 2 1.2 Is

More information

Snapshot Images of Country Risk Ratings: An International Comparison

Snapshot Images of Country Risk Ratings: An International Comparison Snapshot Images of Country Risk Ratings: An International Comparison Suhejla Hoti Department of Economics, University of Western Australia, (Suhejla.Hoti@uwa.edu.au) Abstract: Country risk has become a

More information

An analysis of momentum and contrarian strategies using an optimal orthogonal portfolio approach

An analysis of momentum and contrarian strategies using an optimal orthogonal portfolio approach An analysis of momentum and contrarian strategies using an optimal orthogonal portfolio approach Hossein Asgharian and Björn Hansson Department of Economics, Lund University Box 7082 S-22007 Lund, Sweden

More information

Econometric Computing Issues with Logit Regression Models: The Case of Observation-Specific and Group Dummy Variables

Econometric Computing Issues with Logit Regression Models: The Case of Observation-Specific and Group Dummy Variables Journal of Computations & Modelling, vol.3, no.3, 2013, 75-86 ISSN: 1792-7625 (print), 1792-8850 (online) Scienpress Ltd, 2013 Econometric Computing Issues with Logit Regression Models: The Case of Observation-Specific

More information

A Micro Data Approach to the Identification of Credit Crunches

A Micro Data Approach to the Identification of Credit Crunches A Micro Data Approach to the Identification of Credit Crunches Horst Rottmann University of Amberg-Weiden and Ifo Institute Timo Wollmershäuser Ifo Institute, LMU München and CESifo 5 December 2011 in

More information

On the Revelation of Asymmetric Information of the Private Insurance Companies in the U.S. Crop Insurance Program

On the Revelation of Asymmetric Information of the Private Insurance Companies in the U.S. Crop Insurance Program On the Revelation of Asymmetric Information of the Private Insurance Companies in the U.S. Crop Insurance Program Alan P. Ker A. Tolga Ergün Keywords: crop insurance, intermediaries, asymmetric information,

More information