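Table III
Classification results (%). Column 1: total correct classification; column 2: correct classification of good clients; column 3: correct classification of bad clients; column 4: percentage of bad clients accepted into the good group.

Model                          1      2      3      4
Discriminant analysis         65.4   62.2   78.0    8.1
Linear regression model       55.1   47.0   87.5    6.2
Probit model                  71.9   76.4   54.1   13.1
Poisson model                 62.4   57.7   81.8    7.3
Negative binomial II model    63.3   58.9   80.6    7.6
Two-step procedure            64.9   61.1   79.8    7.6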
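The following arithmetic check on Table III is not part of the original analysis. It rests on two readings of the columns, both of which are our assumptions:

- Column 1 is the weighted average of columns 2 and 3 over a single evaluation sample. Under this reading the sketch solves for the implied share of good clients.
- Column 4 is the percentage of bad clients among all clients predicted good. Under this reading the sketch recomputes column 4.

The function names are hypothetical.

```python
# Table III as printed: (total %, good correct %, bad correct %, bad accepted as good %)
TABLE_III = {
    "Discriminant analysis": (65.4, 62.2, 78.0, 8.1),
    "Linear regression model": (55.1, 47.0, 87.5, 6.2),
    "Probit model": (71.9, 76.4, 54.1, 13.1),
    "Poisson model": (62.4, 57.7, 81.8, 7.3),
    "Negative binomial II model": (63.3, 58.9, 80.6, 7.6),
    "Two-step procedure": (64.9, 61.1, 79.8, 7.6),
}


def implied_good_share(total, good, bad):
    """Solve total = w * good + (1 - w) * bad for the share w of good clients."""
    return (bad - total) / (bad - good)


def implied_bad_among_accepted(total, good, bad):
    """Assumed reading of column 4: % of bad clients among those predicted good."""
    w = implied_good_share(total, good, bad)
    good_accepted = w * good / 100.0                # good clients predicted good
    bad_accepted = (1.0 - w) * (1.0 - bad / 100.0)  # bad clients predicted good
    return 100.0 * bad_accepted / (good_accepted + bad_accepted)


for name, (total, good, bad, col4) in TABLE_III.items():
    w = implied_good_share(total, good, bad)
    col4_hat = implied_bad_among_accepted(total, good, bad)
    print(f"{name:27s} w = {w:.3f}  column 4: implied {col4_hat:5.2f}, printed {col4:5.1f}")
```

Under these assumptions, every row implies a good-client share of roughly 0.80. Each row also reproduces column 4 to within about 0.2 percentage points, which is compatible with rounding in the printed table.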
Table II
Parameter estimates (standard errors in parentheses)

Variable     Poisson             Neg. Bin. II
Intercept     0.398 (0.358)       0.398 (0.474)
P1            0.086 (0.027)       0.086 (0.069)
P2           -0.005 (0.034)      -0.005 (0.088)
P3           -0.640 (0.344)      -0.640 (1.236)
P4           -0.251 (0.028)      -0.251 (0.072)
P5           -0.248 (0.026)      -0.248 (0.065)
P6            0.208 (0.042)       0.207 (0.105)
P7            0.073 (0.047)       0.073 (0.116)
P8            0.067 (0.057)       0.067 (0.146)
P9            0.143 (0.066)       0.143 (0.165)
E1           -0.074 (0.054)      -0.074 (0.145)
E2           -0.029 (0.052)      -0.029 (0.140)
F1           -0.411 (0.032)      -0.411 (0.080)
F2            1.420 (0.031)       1.420 (0.092)
F3           -0.138 (0.035)      -0.138 (0.090)
F4            0.003 (0.001)       0.004 (0.018)
C2           -0.137 (0.072)      -0.137 (0.178)
C3            0.531 (0.163)       0.531 (1.257)
α                                 1.340
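This sketch makes the standard-error comparison concrete. It recomputes z-ratios (coefficient divided by standard error) from Table II. It then lists the variables whose significance changes between the Poisson and Type II negative binomial fits, at the two-sided 5% level under a normal approximation (|z| > 1.96). Two choices here are our assumptions: the 1.96 cut-off, and the label "Intercept" for the first row.

```python
# Table II: variable -> (Poisson coef, Poisson s.e., NB II coef, NB II s.e.)
TABLE_II = {
    "Intercept": (0.398, 0.358, 0.398, 0.474),
    "P1": (0.086, 0.027, 0.086, 0.069),
    "P2": (-0.005, 0.034, -0.005, 0.088),
    "P3": (-0.640, 0.344, -0.640, 1.236),
    "P4": (-0.251, 0.028, -0.251, 0.072),
    "P5": (-0.248, 0.026, -0.248, 0.065),
    "P6": (0.208, 0.042, 0.207, 0.105),
    "P7": (0.073, 0.047, 0.073, 0.116),
    "P8": (0.067, 0.057, 0.067, 0.146),
    "P9": (0.143, 0.066, 0.143, 0.165),
    "E1": (-0.074, 0.054, -0.074, 0.145),
    "E2": (-0.029, 0.052, -0.029, 0.140),
    "F1": (-0.411, 0.032, -0.411, 0.080),
    "F2": (1.420, 0.031, 1.420, 0.092),
    "F3": (-0.138, 0.035, -0.138, 0.090),
    "F4": (0.003, 0.001, 0.004, 0.018),
    "C2": (-0.137, 0.072, -0.137, 0.178),
    "C3": (0.531, 0.163, 0.531, 1.257),
}
Z_CRIT = 1.96  # assumed: two-sided 5% level under a normal approximation


def significant(coef, se):
    """True when |coef / se| exceeds the critical value."""
    return abs(coef / se) > Z_CRIT


# Variables significant under one model but not under the other
flips = [name for name, (bp, sp, bn, sn) in TABLE_II.items()
         if significant(bp, sp) != significant(bn, sn)]
# Ratio of NB II to Poisson standard errors, row by row
se_ratio = {name: sn / sp for name, (_, sp, _, sn) in TABLE_II.items()}

print("significance changes for:", flips)
for name, ratio in se_ratio.items():
    print(f"{name:9s} s.e.(NB II) / s.e.(Poisson) = {ratio:5.2f}")
```

Every NB II standard error exceeds its Poisson counterpart. P1, P9, F3, F4 and C3 lose significance once overdispersion is allowed for.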
Table I
Number of defaults   Absolute frequency   Number of defaults   Absolute frequency
 0                   3002                 14                   13
 1                    502                 15                   11
 2                    187                 16                    4
 3                    138                 17                    5
 4                    233                 18                    8
 5                    160                 19                    6
 6                    107                 20                    3
 7                     80                 22                    1
 8                     59                 24                    1
 9                     53                 28                    1
10                     41                 29                    1
11                     28                 30                    1
12                     34                 31                    1
13                     10                 34                    1
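Table I can be summarized in a few lines. This sketch recomputes three quantities:

- the sample size;
- the share of clients with no defaults (the 64% quoted in Section 4);
- the unconditional mean and variance of the count.

A variance far above the mean is the kind of overdispersion that the Poisson mean-variance restriction rules out. Covariates may, however, absorb part of it.

```python
# Table I: number of defaulted instalments -> absolute frequency
TABLE_I = {
    0: 3002, 1: 502, 2: 187, 3: 138, 4: 233, 5: 160, 6: 107, 7: 80,
    8: 59, 9: 53, 10: 41, 11: 28, 12: 34, 13: 10, 14: 13, 15: 11,
    16: 4, 17: 5, 18: 8, 19: 6, 20: 3, 22: 1, 24: 1, 28: 1, 29: 1,
    30: 1, 31: 1, 34: 1,
}

n = sum(TABLE_I.values())
share_zero = TABLE_I[0] / n
mean = sum(k * f for k, f in TABLE_I.items()) / n
variance = sum(f * (k - mean) ** 2 for k, f in TABLE_I.items()) / n

print(f"n = {n}, share with no defaults = {share_zero:.1%}")
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

On these frequencies, n = 4691, the mean is about 1.59 and the variance is about 10.1.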
Adequate use of count data models without a mean-variance restriction is useful for finding which variables are the most influential in the process studied. It should be noted that estimation required asymptotic approximations for standard errors. Another important issue is the possibility of estimating truncated negative binomial models in order to study truncated samples, which are of great interest in this context. Estimation results are not shown in the text to keep it brief, but they show a change in the influence of many variables. Income, for example, which was not significant, becomes significant in the truncated model; that is to say, once a payment has been missed, higher incomes mean a significantly smaller expected number of defaults. The possibility of establishing a two-step procedure for prediction has been stressed. This procedure combines discriminant analysis and modelling, and it leads to a good global classification rate while keeping the percentage of bad clients accepted as good well below 10%. Further research is needed to disentangle some obscure points, such as model selection or misspecification in truncated models. In this situation, although prepayment has not been considered, one should look for a way to include the duration of repayment at sample collection and its influence on the final estimation and classification results.

6 References
Altman, E.I., R.B. Avery, R.A. Eisenbeis and J.F. Sinkey (1981) Application of Classification Techniques in Business, Banking and Finance. JAI Press, Greenwich, CT.
Boyes, W.J., Hoffman, D.L. and S.A. Low (1989) 'An Econometric Analysis of the Bank Credit Scoring Problem', Journal of Econometrics, 40, 3-14.
Cameron, A.C. and P.K. Trivedi (1986) 'Econometric Models Based on Count Data: Comparison and Application of Some Estimators and Tests', Journal of Applied Econometrics, 1, 29-53.
Frome, E.L., Kutner, M.H. and J.J. Beauchamp (1973) 'Regression Analysis of Poisson-Distributed Data', Journal of the American Statistical Association, 68, 344, 935-940.
Gouriéroux, C., Monfort, A. and A. Trognon (1984a) 'Pseudo-Maximum Likelihood Methods: Theory', Econometrica, 52, 681-700.
Gouriéroux, C., Monfort, A. and A. Trognon (1984b) 'Pseudo-Maximum Likelihood Methods: Applications to Poisson Models', Econometrica, 52, 701-720.
Grogger, J.T. and R.T. Carson (1991) 'Models for Truncated Counts', Journal of Applied Econometrics, 6, 225-238.
Guillén, M. (1992) Análisis econométrico del credit scoring. Modelos Count Data. Ph.D. Dissertation, University of Barcelona, Spain.
Hausman, J., Hall, B.H. and Z. Griliches (1984) 'Econometric Models for Count Data with an Application to the Patents-R&D Relationship', Econometrica, 52, 909-938.
Lee, L.-F. (1986) 'Specification Test for Poisson Regression Models', International Economic Review, 27, 689-706.
McCullagh, P. and J.A. Nelder (1983) Generalized Linear Models. Monographs on Statistics and Applied Probability. Chapman and Hall, London.
Mullahy, J. (1986) 'Specification and Testing of Some Modified Count Data Models', Journal of Econometrics, 33, 3, 341-366.
Myers, J.H. and W. Forgy (1963) 'The Development of Numerical Credit Evaluation Systems', Journal of the American Statistical Association, 58, 303, 779-806.
Steenackers, A. and M.J. Goovaerts (1989) 'A Credit Scoring Model for Personal Loans', Insurance: Mathematics and Economics, 8, 31-34.
4 Results
In 1990, a Spanish financial institution provided a data set containing almost 5000 clients who had been granted credit in the previous four years. The type of credit of interest is known as a personal loan. Personal loans are characterized by the fact that the amount of money granted is moderate. Usually, the loan is repaid over a short period of time, often in monthly instalments that are constant over the repayment period and not very large compared to individual income. Normally, loans of this kind are related to the purchase of goods, most commonly automobiles.
The variables included in the models are divided into different groups according to the source of the information. Some are items answered by the individual applying for credit. Others are information that, although it may be provided by the applicant, the financial institution is able to check in its own files. Variables in the models fall into one of the following three groups:
Personal variables (date of birth, marital status, number of children, ...)
Socio-economic variables (net monthly income, housing ownership, ...)
Financial variables (monthly mortgage, availability of a credit card, amount requested, ...)
The main innovation regarding the variables used in the credit scoring model is that the variables above provide the information needed to create new variables, which are the ones finally used in the model. Modifications are made in two different senses and have two well-established objectives:
a) On the one hand, new combinations may preserve the confidentiality of the discriminant functions or models being implemented.
b) On the other hand, by using some new variables, the model can cope with interactions.
Table I shows the absolute frequencies for the variable of interest, that is, the number of unpaid instalments. Note that 64% of clients have no defaults. Table II shows estimates and standard errors for three models. For estimation purposes, some individuals were eliminated from the original sample.
Individuals whose repayment had lasted less than six months at sample collection were excluded from the estimation process. The reason was that there was not enough information about their repayment behaviour, so subsequent classification could be misleading. It is interesting to see that the parameter estimates are the same, except for variable F4. Note, however, how estimating a Poisson model leads to distorted standard errors, because heterogeneity is not taken into account; in Table II the Poisson standard errors are systematically smaller. Classification results are shown in Table III.
Finally, a two-step procedure for prediction is proposed. First, discriminant analysis is used to predict future behaviour. Second, when the score obtained lies within a critical band (around 50%), a truncated negative binomial model is used to predict the number of defaulted instalments. If this number is large, the individual is classified into the bad group; if it is small, the prediction is good. The original criterion given by the bank is used to define large and small. In Table III, the first column gives the total correct classification, the second the correct classification of good clients, the third the correct classification of bad clients, and the fourth the percentage of bad clients accepted into the good group.

5 Final remarks
Classification problems in the context of credit-granting decisions may use count data models because of the characteristics of the dependent variable. In fact, the number of defaulted payments is the variable used to define whether a client is good (repaying) or bad (defaulter).
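The two-step prediction rule described in Section 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on three assumptions:

- The width of the critical band around 50% (`band`) is a hypothetical parameter.
- The cut-off separating a "large" from a "small" number of defaults (`max_defaults`) is also hypothetical, because the bank's criterion is not given.
- The truncated negative binomial prediction is taken to be the conditional mean E(Y | Y > 0) = E(Y)/(1 - P(Y = 0)), using the parameterization λ = exp(x'β), θ = λ^c/α from Section 2.

```python
def nb_zero_prob(lam, alpha, c):
    """P(Y = 0) = (1 + lambda/theta)^(-theta) with theta = lambda**c / alpha."""
    theta = lam ** c / alpha
    return (1.0 + lam / theta) ** (-theta)


def truncated_nb_mean(lam, alpha, c):
    """E(Y | Y > 0) = E(Y) / P(Y > 0), with E(Y) = lambda."""
    return lam / (1.0 - nb_zero_prob(lam, alpha, c))


def two_step_classify(p_good, lam, alpha, c, band=0.1, max_defaults=3):
    """Classify an applicant as 'good' or 'bad'.

    p_good: discriminant-analysis probability of belonging to the good group.
    lam: exp(x'beta) from a fitted truncated NB model (hypothetical input).
    band, max_defaults: hypothetical tuning parameters.
    """
    # Step 1: outside the critical band, the discriminant score decides.
    if p_good >= 0.5 + band:
        return "good"
    if p_good <= 0.5 - band:
        return "bad"
    # Step 2: inside the band, predict the number of defaulted instalments.
    expected_defaults = truncated_nb_mean(lam, alpha, c)
    return "bad" if expected_defaults > max_defaults else "good"


print(two_step_classify(0.55, lam=1.0, alpha=1.0, c=0.0))  # prints "good"
print(two_step_classify(0.55, lam=5.0, alpha=1.0, c=0.0))  # prints "bad"
```

In the usage lines, with c = 0 and α = 1, the truncated mean is 2 for λ = 1 and 6 for λ = 5. Only the second exceeds the hypothetical cut-off of 3.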
For the Poisson model, eliminating irrelevant terms, the objective log-likelihood function is:
$$\ell(y;\beta) = \sum_{i=1}^{n}\left[y_i\,x_i'\beta - \exp(x_i'\beta)\right].$$
For the Type II negative binomial model, the expression is:
$$\sum_{i=1}^{n}\log \ell(y_i; x_i, \beta, \alpha) = \sum_{i=1}^{n}\log\left[\frac{\Gamma(y_i+\alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y_i+1)}\,\frac{(\alpha e^{x_i'\beta})^{y_i}}{(1+\alpha e^{x_i'\beta})^{y_i+\alpha^{-1}}}\right].$$
To estimate standard errors, asymptotic approximations based on pseudo maximum likelihood were used (Gouriéroux, Monfort and Trognon, 1984a, 1984b). For the negative binomial model truncated at zero, the log-likelihood is:
$$\ln L = \sum_{i=1}^{m}\Big[\ln\Gamma(y_i+\alpha^{-1}\lambda_i^{c}) - \ln\Gamma(\alpha^{-1}\lambda_i^{c}) - \ln\Gamma(y_i+1) + y_i\ln\alpha + (1-c)\,y_i\ln\lambda_i - (y_i+\alpha^{-1}\lambda_i^{c})\ln(1+\alpha\lambda_i^{1-c}) - \ln\big(1-P(Y_i=0)\big)\Big],$$
where $\lambda_i = \exp(x_i'\beta)$, $P(Y_i=0) = (1+\alpha\lambda_i^{1-c})^{-\alpha^{-1}\lambda_i^{c}}$, and m is the number of individuals in the truncated sample, that is, those for which $y_i > 0$. For estimation purposes, a particular value for c was fixed and […] was estimated in a previous step (for details, see Guillén, 1992).

3 Prediction in credit scoring models
Usually, the performance of credit scoring models is evaluated through the percentage of correct classifications among individuals who have already applied for credit, according to their subsequent behaviour. Nevertheless, the percentage of bad clients that would be classified as good by the scoring is a very important issue. It is this measure that should be minimized, since the smaller it is, the smaller the risk of granting credit to potential defaulters. Sometimes, studies in this area use part of the sample for estimation and another part to check the predictive performance of the estimated models. In this work, we do not set part of the sample apart. An initial discriminant analysis was done with the whole sample, and we did not want to alter this in view of a final comparison of different approaches. The definition of good and bad clients was based on the number of monthly instalments that were defaulted. When using discriminant analysis, a score was associated with each individual. The score is a transformation of the probability of having been drawn from each of the two populations under study.
If the estimated probability of being a good client is greater than the estimated probability of being bad, the individual is predicted to belong to the good group (and conversely when it is smaller). This prediction is compared with the client's actual behaviour. When this is done for all individuals in the sample, an estimate of the classification rates is obtained. With count data models, prediction has to be performed in two steps. First, the expected number of defaulted monthly instalments is computed. Then, the individual is labelled "predicted good" or "predicted bad" following the same criterion that is used to define good and bad clients in the sample. Finally, predicted and actual behaviour are compared to obtain estimated classification rates. These rates may be used to compare the performance of this methodology with that of traditional approaches.
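The count-model evaluation described above can be sketched as follows. Three points are assumptions rather than details from the text:

- The cut-off `max_defaults` stands in for the bank's definition of a bad client, which the text does not give, so it is a hypothetical parameter.
- The expected counts are assumed to come from an already fitted model.
- The last rate reads "bad accepted as good" as the share of bad clients among those predicted good.

```python
def label(count, max_defaults):
    """Hypothetical bank criterion: 'bad' if count exceeds max_defaults."""
    return "bad" if count > max_defaults else "good"


def classification_rates(expected_counts, observed_counts, max_defaults):
    """Compare predicted and actual labels (assumes both groups occur)."""
    predicted = [label(m, max_defaults) for m in expected_counts]
    actual = [label(y, max_defaults) for y in observed_counts]
    pairs = list(zip(predicted, actual))
    n_good = sum(a == "good" for _, a in pairs)
    n_bad = len(pairs) - n_good
    n_pred_good = sum(p == "good" for p, _ in pairs)
    return {
        "total": sum(p == a for p, a in pairs) / len(pairs),
        "good": sum(p == a == "good" for p, a in pairs) / n_good,
        "bad": sum(p == a == "bad" for p, a in pairs) / n_bad,
        "bad_among_predicted_good":
            sum(p == "good" and a == "bad" for p, a in pairs) / n_pred_good,
    }


# Toy data (invented for illustration): expected vs observed defaults
print(classification_rates([0.2, 3.5, 0.8, 0.1, 2.0], [0, 5, 2, 0, 3], max_defaults=1))
```

In the toy data, four of the five clients are classified correctly. Both good clients are recognized, two of the three bad ones are caught, and one of the three clients predicted good is actually bad.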
where $y_0 \in \{0,1,2,\dots\}$ and $x_0 \in \mathbb{R}^k$. The parameter λ is related to the covariates through the expression below, which preserves positivity:
$$\lambda = \exp(x_0'\beta). \tag{1}$$
This model is a generalized linear model (McCullagh and Nelder, 1983) and was presented in an equivalent form by Gouriéroux, Monfort and Trognon (1984a, 1984b). It is known that the Poisson model does not account for heterogeneity, since it imposes a mean-variance restriction. Negative binomial models are one possible generalization, allowing a flexible relationship between mean and variance. Negative binomial models arise when a disturbance term ε is introduced in relationship (1), so that
$$\ln(\lambda) = X\beta + \varepsilon.$$
If exp(ε) is assumed to have a Gamma distribution, this leads to a conditional negative binomial distribution for the dependent variable Y. In fact, the negative binomial distribution requires two parameters, unless a constant term is present in X and exp(ε) is taken to have mean equal to one. Thus, both parameters are related to X in the following way:
$$\lambda = \exp(X\beta) \tag{2}$$
$$\theta = \frac{1}{\alpha}\left(\exp(X\beta)\right)^{c}, \quad \text{for } \alpha > 0 \text{ and } c \text{ a fixed constant.} \tag{3}$$
Different values of the constant c give different negative binomial models and different types of (conditional) mean-variance relationship. For c = 1, taking X fixed, $\mathrm{Var}(Y) = (1+\alpha)E(Y)$, which Cameron and Trivedi (1986) called the Type I negative binomial model. Likewise, for c = 0 the Type II model is obtained, and the relationship is $\mathrm{Var}(Y) = E(Y)\,(1+\alpha E(Y))$.
When truncated data need to be studied, particularly when the truncation point is zero, modelling is based on the fact that
$$P(Y_i = y_i \mid y_i > 0) = \frac{P(Y_i = y_i)}{P(Y_i > 0)}, \quad y_i = 1,2,\dots$$
To obtain the truncated negative binomial model, we first note that
$$P(Y_i = y_i) = \frac{\Gamma(y_i+\theta_i)}{\Gamma(\theta_i)\,\Gamma(y_i+1)}\left(\frac{\lambda_i}{\theta_i}\right)^{y_i}\left(1+\frac{\lambda_i}{\theta_i}\right)^{-(y_i+\theta_i)}, \quad y_i = 0,1,2,\dots$$
Then, using (2) and (3), the following expression is obtained:
$$P(Y_i = y_i) = \frac{\Gamma(y_i+\alpha^{-1}\lambda_i^{c})}{\Gamma(\alpha^{-1}\lambda_i^{c})\,\Gamma(y_i+1)}\,\alpha^{y_i}\lambda_i^{(1-c)y_i}\,(1+\alpha\lambda_i^{1-c})^{-(y_i+\alpha^{-1}\lambda_i^{c})}.$$
Finally,
$$P(Y_i = y_i \mid y_i > 0) = \frac{\Gamma(y_i+\alpha^{-1}\lambda_i^{c})}{\Gamma(\alpha^{-1}\lambda_i^{c})\,\Gamma(y_i+1)}\,\alpha^{y_i}\lambda_i^{(1-c)y_i}\,(1+\alpha\lambda_i^{1-c})^{-(y_i+\alpha^{-1}\lambda_i^{c})}\,\big(1-P(Y_i=0)\big)^{-1}.$$
Log-likelihood functions for the untruncated models are stated below.
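The probabilities above translate directly into code. This sketch, written for illustration only, evaluates four things:

- log P(Y = y) in the (α, c) parameterization of (2) and (3);
- its zero-truncated version;
- the truncated log-likelihood over positive counts, with λ_i = exp(x_i'β);
- a numerical check that c = 1 gives Var(Y) = (1 + α)E(Y) and c = 0 gives Var(Y) = E(Y)(1 + αE(Y)).

The function names are hypothetical, and only the Python standard library is used.

```python
import math


def nb_log_pmf(y, lam, alpha, c):
    """log P(Y = y) with lambda = lam and theta = lam**c / alpha."""
    theta = lam ** c / alpha
    r = alpha * lam ** (1.0 - c)  # lambda / theta
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + y * math.log(r) - (y + theta) * math.log1p(r))


def truncated_nb_log_pmf(y, lam, alpha, c):
    """log P(Y = y | Y > 0) for y = 1, 2, ..."""
    if y < 1:
        raise ValueError("the zero-truncated model is defined for y >= 1")
    p0 = math.exp(nb_log_pmf(0, lam, alpha, c))
    return nb_log_pmf(y, lam, alpha, c) - math.log1p(-p0)


def truncated_nb_loglik(beta, alpha, c, X, y):
    """Truncated log-likelihood: sum of log P(Y_i = y_i | y_i > 0)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        lam = math.exp(sum(b * x for b, x in zip(beta, x_i)))
        total += truncated_nb_log_pmf(y_i, lam, alpha, c)
    return total


def moments(lam, alpha, c, y_max=500):
    """Mean and variance computed numerically from the pmf on 0..y_max-1."""
    probs = [math.exp(nb_log_pmf(y, lam, alpha, c)) for y in range(y_max)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


print("Type I  (c = 1):", moments(2.0, 0.5, 1.0))  # variance close to (1 + 0.5) * 2
print("Type II (c = 0):", moments(2.0, 0.5, 0.0))  # variance close to 2 * (1 + 0.5 * 2)
```

Working on the log scale with `math.lgamma` and `math.log1p` avoids overflow in the Gamma functions for large counts.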
classes: good and bad. Good clients would repay the money completely, whereas bad clients would be defaulters. A database was available, containing information on almost 5000 clients. First, a discriminant analysis was performed on the covariates in order to produce a discriminant rule that would serve as a basis for the decision to grant credit. Many authors have suggested this procedure in the same situation. For example, Myers and Forgy (1963) worked in this way on a database from a financial institution. More recently, Steenackers and Goovaerts (1989) provided a similar approach for a Belgian bank. Afterwards, looking more carefully at the available data, we tried to find alternative ways to estimate the expected level of debt for potential borrowers. The crucial point was to note that the behaviour of clients was represented exclusively by a variable counting the number of defaulted instalments, that is, the number of times the client did not pay the money as agreed when the credit was granted. In fact, this variable was the basis for the definition of the two populations: good and bad clients. So, instead of using techniques to classify individuals into populations, in this work we suggest that a sensible approach is to model the variable counting the number of defaulted instalments. This is a way to obtain a model that predicts the expected level of debt for new applicants. Moreover, we want to find an adequate model for the borrowers who have already defaulted on at least one payment, in order to see whether there is a structural change in the model. This would lead to the interpretation that the process generating the transition from no missed payments to one is different from the process generating the transition from any nonzero number of defaults to one more. Mullahy (1986) proposed hurdle models to cope with this situation. Here, our purpose is to use count data models to predict the number of times that an applicant for credit will fail to pay the agreed instalment for repaying the credit. So, two different issues are addressed: modelling and prediction.
Poisson models are typically used in situations where the dependent variable is a count, for example the number of patents applied for by firms (Hausman, Hall and Griliches, 1984). Alternatively, negative binomial distribution models, which take into account the heterogeneity observed in the data, have also been used (e.g., Cameron and Trivedi, 1986). Other discrete distributions are also useful for modelling count data, but they are not considered in this paper. Here, we use Poisson models, negative binomial models and their truncated versions to model our dependent variable. In the next section, we present the model specification and estimation procedure. Afterwards, we explain how prediction performance will be evaluated, and finally, the results for the particular data set are presented. In the last part, we draw some conclusions and suggest possible ways to improve our approach.

2 Model specification and estimation
The basic models for count data have been studied by many authors, such as El Sayyad (1973), Frome, Kutner and Beauchamp (1973), Terza (1985), Cameron and Trivedi (1986) and Mullahy (1986). Let Y be a dependent discrete non-negative variable. Assume that Y is the variable of interest in a population from which we have a random sample of size n. Assume that X is a vector of k explanatory variables that will be used for modelling Y. Let $y = (y_1, \dots, y_n)'$ be the $n \times 1$ vector of observations of Y. Let $X = (x_1, \dots, x_k)$ be the $n \times k$ matrix of observations of the explanatory variables, where $x_j = (x_{j1}, \dots, x_{jn})'$ is the vector corresponding to the n observations of the j-th variable, $j = 1, \dots, k$. The Poisson model is obtained when the conditional probability takes the following form:
$$P(Y = y_0 \mid X = x_0) = \frac{\exp(-\lambda)\,\lambda^{y_0}}{y_0!},$$
Count Data Models for a Credit Scoring System

Montserrat Guillén
Departament d'Econometria, Estadística i Economia Espanyola
Universitat de Barcelona

Manuel Artís
Departament d'Econometria, Estadística i Economia Espanyola
Universitat de Barcelona

Paper presented at the Third Meeting of the European Conference Series in Quantitative Economics and Econometrics on Econometrics of Duration, Count and Transition Models. Paris, December 10-11, 1992.

Abstract
Credit scoring systems created for the evaluation of new applications are based on the available statistical information on the behaviour of former clients with credit. Financial institutions usually apply discriminant analysis techniques to create these systems, but these techniques lack good properties due, for example, to the presence of non-normal variables. As an alternative, future repayment behaviour is predicted by means of the expected number of unpaid instalments. The use of this latter variable suggests that appropriate models might be of interest, in which some exogenous covariates are included in order to specify the expected level of debt. At this point, prepayment is not explicitly considered. These models should be used as explanatory tools when evaluating the level of risk involved in personal credit transactions. Negative binomial distribution models prove particularly useful when heterogeneity is taken into account. Some results related to prediction performance are shown for different model specifications, using data from a Spanish bank.

Keywords: count data, NBD models, credit scoring
AMS Classification: 90A19
Abbreviated title: Count Data Models

1 Introduction
The application of statistical techniques to the analysis of decision problems involving classification has developed considerably in recent decades. In economics and finance especially, some particular problems have offered a wide range of possible applications for theoretical results. For a review, see Altman, Avery, Eisenbeis and Sinkey (1981). The work presented here was motivated by our wish to develop a credit scoring system.
This means that a financial institution wished to find a way to classify new clients applying for credit into two different

Mailing address: Montserrat Guillén, Departament d'Econometria, Estadística i Economia Espanyola, Universitat de Barcelona, Diagonal 690, E-08034 Barcelona, Spain. (e-mail: gullen@rscd2.ub.es)