An Empirical Study on Default Factors for US Sub-prime Residential Loans

Kai-Jiun Chang, Ph.D. Candidate, National Taiwan University, Taiwan

ABSTRACT

This research aims to identify the loan characteristics that are predictive of loan defaults and to fit the relationship between default rates and those characteristics. The empirical results show that the loan characteristics with forecasting power are credit scores, loan-to-value ratios, payment methods, and lien status.

Keywords: Default Risk, US Sub-prime Residential Loans, Credit Scores, Loan-to-value Ratios

INTRODUCTION

Originating mortgages for real estate purchases or investments is an important activity for financial institutions, and it exposes them to several kinds of risk. This study analyzes one of the major risks of concern: credit risk. Credit risk, also commonly referred to as default risk, is the uncertainty a financial institution faces as to whether borrowers will repay principal and interest on the schedule agreed when the contract is signed. To reduce this uncertainty, financial institutions examine various factors before lending. Commonly examined factors include the credit history of the borrower, the loan-to-value ratio of the collateral, and the purpose of the loan. These factors are known to the institution at the time the loan is originated. Whether the outcome of a residential mortgage loan can be predicted at origination is therefore very important to institutions engaged in mortgage lending.

Given the practical importance of default prediction, several academic studies and institutional models have been devoted to the topic.
Gjava (2000) analyzed 137,000 sub-prime loans originated between 1995 and 2000 under a proportional hazard assumption and found that borrower credit, loan-to-value ratios, owner occupancy status, and payment schedule are the main determinants of default. He also noted that changes in economic conditions after origination play an important role in explaining defaults. Danis and Pennington-Cross (2005) sampled 22,799 loans from 1996 through 2003 and examined the history of each loan until default. They found relationships between borrower credit, loan-to-value ratios, and defaults. They also noted the interaction between prepayments and defaults: some potential defaults end up as prepayments, which makes default projection more difficult. Dubitsky et al. (2006) found that, among the common static drivers of default, credit score has the largest impact. Loans with higher loan-to-value ratios also have higher default rates, and documentation type, loan size, loan purpose, and method of payment have their own effects. Their analysis also identified the relative importance of housing price appreciation and of default seasoning after origination in explaining sub-prime mortgage defaults. Hayre et al. (2008) studied mortgage default behavior across all mortgage types and found that loan-to-value ratios, loan characteristics, credit score, owner occupancy status, loan purpose, documentation status, and payment shock, among other factors, have a direct or indirect impact on default. They also found that housing price appreciation and changes in borrowers' family status, including job loss, divorce, and illness in the family, trigger delinquencies.

The literature shows that loan default is affected by many factors. Among loan characteristics at origination, the borrower's credit score and the loan-to-value ratio are the most frequently cited. The studies also identify many factors that cannot be predicted when the loan is originated, including housing price appreciation and changes in the borrower's family status. Together they suggest that static models based on loan characteristics at origination have value, but that the dynamics after origination are very important, too.

Model Specification

Assume there are n sub-prime residential mortgage loans in the market. For a given loan i, only two outcomes can happen: default or no default. The default event for each loan is independent of the others. We define the state variable for loan i as

$$ y_i = \begin{cases} 1 & \text{if the loan defaulted} \\ 0 & \text{if the loan did not default} \end{cases} $$

and the probability of each state as

$$ \Pr(y_i = 1) = p_i, \qquad \Pr(y_i = 0) = 1 - p_i $$

where $p_i$ is a function of a vector of loan characteristic variables x, so that we can equivalently write it as $p_i(x)$. To ensure that $p_i(x)$ takes a value between 0 and 1, we assume that it has the logistic functional form

$$ \operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i} = X_i^T \beta $$

where $\beta$ is the vector of coefficients. Combining these assumptions, the model can be written as

$$ \operatorname{logit}(\Pr(y_i = 1)) = X_i^T \beta $$

In this study, I fit this model using the loan characteristic variables listed below as inputs. I first choose the appropriate variables and then fit a model based on the selected variables.

$$ X_i^T = (1,\ \text{Occupancy status},\ \text{Loan purpose},\ \text{Balloon indicator},\ \text{Loan amount},\ \text{Term},\ \text{Loan-to-value ratio},\ \text{Payment method},\ \text{Credit score},\ \text{Lien position},\ \text{Income document type}) $$ [1]

[1] Please see the appendix for the definition and meaning of these characteristics.
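As an illustration of the specification above, the following minimal sketch shows how a linear predictor $X_i^T \beta$ is mapped to a default probability through the inverse logit. The function names and the numeric values of `x_i` and `beta` are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse of logit: maps any real number to a probability in (0, 1)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)  # avoids overflow for very negative eta
    return e / (1.0 + e)

def default_probability(x, beta):
    """p_i = inv_logit(X_i^T beta); x and beta are equal-length sequences."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return inv_logit(eta)

# Hypothetical loan: a leading 1 for the intercept plus two made-up inputs.
x_i = [1.0, 0.5, -1.2]
beta = [-1.0, 0.8, 0.3]  # illustrative values only, not fitted coefficients
p_i = default_probability(x_i, beta)
```

Whatever the value of the linear predictor, the resulting `p_i` stays strictly between 0 and 1, which is the reason for choosing the logistic form.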

Data

GMAC-RFC, a subsidiary of GMAC Financial Services, is a major residential mortgage originator in the US. It originates residential mortgage loans, acts as servicer of these loans, and securitizes them into mortgage backed securities, which are publicly traded in the fixed income market. To provide investors with analytical information, GMAC-RFC publishes detailed information on the loans underlying the mortgage backed securities on the following website: https://www.gmacrfc.com/investors/index.htm

Investors can download information about the residential mortgage loans from the website. For this study, I retrieved the following two kinds of data from the GMAC-RFC website:

Loan characteristics data at the origination of the loans
Default and loss severity records for defaulted loans

The loans in this study are those securitized by GMAC-RFC in the mortgage backed security labeled 2001-KS1. They were mainly originated in early 2001 and come from all over the US. The borrowers are sub-prime borrowers, whose credit is below that of the average US borrower and who therefore have higher default rates.

The US is a large country with significant regional differences in economic conditions. As the literature review shows, changes in economic or family status after origination play an important role in default, but they are not modeled in this study. Because ZIP codes are available in the data, I take the loans from a single region to control for after-origination factors. The loans used in this study are those from ZIP codes beginning with 7, that is, the southern part of the US [2]. I chose this area because it had the highest default rate among the 10 US geographical regions. There are 1,950 loans in the dataset, with 444 defaults.
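The regional restriction described above amounts to a filter on the ZIP code prefix. The sketch below is only illustrative: the record layout (`zip`, `defaulted`) and the three sample loans are hypothetical and do not reflect the GMAC-RFC file format. ZIP codes are kept as strings so that leading zeros are preserved.

```python
def select_region(loans, zip_prefix="7"):
    """Keep loans whose ZIP code (a string) starts with the given prefix."""
    return [loan for loan in loans if loan["zip"].startswith(zip_prefix)]

def default_rate(n_defaults, n_loans):
    """Share of loans that defaulted."""
    return n_defaults / n_loans

# Hypothetical records, for illustration only.
sample = [
    {"zip": "75001", "defaulted": True},
    {"zip": "02139", "defaulted": False},
    {"zip": "77002", "defaulted": False},
]
region = select_region(sample)
sample_rate = default_rate(sum(loan["defaulted"] for loan in region), len(region))

# Counts reported in the text: 444 defaults among 1,950 loans.
reported_rate = default_rate(444, 1950)  # about 0.228
```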
There are three possible outcomes for the loans since origination in 2001: (1) the loan is prepaid, so that GMAC-RFC receives the principal earlier than scheduled and suffers no default; (2) the loan is still outstanding; or (3) the loan has defaulted. The loan terms are either 15 or 30 years (see the appendix), but these loans prepay very fast. Sub-prime loans carry higher interest rates, so borrowers prepay quickly once their credit is cured in order to save interest expense. As of March 2008, less than 10% of the loans were still outstanding; the majority had either prepaid or defaulted. GMAC-RFC does not suffer default on prepaid loans. Under the assumption that the remaining outstanding loans will not default, the outcome of every loan is known.

[2] Source: http://en.wikipedia.org/wiki/zip_code

Variable Selection

Variable selection is an important part of this study. I start with backward elimination based on the literature review and the AIC criterion, and then conduct a forward search to find the proper model form, taking interactions among variables into account.

Step 1: Narrow down explanatory variable candidates using backward elimination

1. I start with the model containing all explanatory variables in the data set.
2. Because the order of explanatory variables matters in backward elimination, the variables are ordered by relative importance according to the literature review. Credit score and loan-to-value ratio appear in all of the studies, so they enter the model first. The remaining variables are in roughly decreasing order of importance.

3. I perform backward elimination using the AIC criterion.
4. I end up with four variables: FICO (credit score), LTV (loan-to-value ratio), lien status, and payment method.

Step 2: Finding proper model form with consideration for interactions

1. In addition to finding the proper explanatory variables, it is important to consider the interactions among them. I first look at the analysis of deviance table from the model in step 1 to determine the relative order of the variables, and then perform a forward search.
2. I start with the model containing the variable with the highest deviance explained and add a second variable. I then perform a significance test between the two models.
3. I add all interaction terms to the model in 2.2 and perform a significance test between the models from 2.2 and 2.3. If there is no significant difference, the interaction terms are not important and I return to 2.2.
4. If there is a significant difference between the models from 2.2 and 2.3, I run stepwise regression with the AIC criterion to find the proper interaction term to include in the model. Then I return to 2.2.
5. Repeating 2.2 to 2.4, I find a preliminary form for the model, including interaction terms.

Step 3: Considering effects of unused explanatory variables

1. To avoid leaving out potential explanatory variables, I add the variables omitted in step 1 to the preliminary model found in step 2, one at a time, and test for significance.
2. It turns out that no variable omitted in step 1 is significant enough to be added back to the preliminary model.

Step 4: Analyzing goodness of fit

I analyze goodness of fit through a test of significance against the null model, a comparison of error rates, and an examination of residuals. For the statistical methods used in this study, see Dobson (2002) or Gelman and Hill (2007).

Empirical Results

The selected model contains two continuous variables, credit score (FICO) and loan-to-value ratio (LTV), as well as two categorical variables, lien status and payment method.
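The AIC and deviance comparisons used in the variable selection steps can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study. It assumes ungrouped binary outcomes, for which the residual deviance equals minus twice the log-likelihood, so that AIC = deviance + 2k for a model with k coefficients. It also assumes that the drop in deviance between nested models is referred to a chi-square distribution. The function names are hypothetical. As input, the example uses the rounded deviances and residual degrees of freedom that the paper reports in Table 3 for the null and fitted models.

```python
import math

def aic_from_deviance(deviance, n_params):
    """AIC for ungrouped binary data, where deviance = -2 * log-likelihood."""
    return deviance + 2 * n_params

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # Q(1/2, z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def deviance_test(dev_small, dev_large, df_diff):
    """Drop in deviance from the smaller to the larger nested model, with p-value."""
    drop = dev_small - dev_large
    return drop, chi2_sf(drop, df_diff)

n_loans = 1950
null_dev, null_resid_df = 2092.0, 1949   # intercept only
fit_dev, fit_resid_df = 2007.0, 1944     # six coefficients

null_aic = aic_from_deviance(null_dev, n_loans - null_resid_df)
fit_aic = aic_from_deviance(fit_dev, n_loans - fit_resid_df)
drop, p_value = deviance_test(null_dev, fit_dev, null_resid_df - fit_resid_df)
```

Under these assumptions the fitted model has the lower AIC, and the drop in deviance of 85 on 5 degrees of freedom gives a p-value far below 0.0001. The same two functions apply to any pair of nested models compared during the forward search.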
I use indicator variables to represent the categorical variables. For payment method, 1 represents a fixed interest rate and 0 an adjustable interest rate. For lien status, 1 represents a 2nd lien and 0 a 1st lien. There is an additional interaction term between the 2nd lien indicator and the LTV ratio. Table 1 shows the fitted results on the log-odds (logit) scale. The fitted model is as follows.

$$ \operatorname{logit}(p_i) = -1.13352 - 0.00599\,\mathrm{FICO} + 4.02545 \cdot 1_{\mathrm{2ndLien}} + 0.04332\,\mathrm{LTV} - 0.26964 \cdot 1_{\mathrm{FixedPayment}} - 0.05413 \cdot 1_{\mathrm{2ndLien}} \cdot \mathrm{LTV} $$

Table 1: The fitted results on the log-odds scale

                           Estimate       (Std. Error)   p-value
Intercept                  -1.13352       (0.89296)      0.204
FICO                       -0.00599***    (0.00116)      <0.0001
Factor(2nd Lien)            4.02545***    (0.77177)      <0.0001
LTV                         0.04332***    (0.00766)      <0.0001
Factor(Fixed Payment)      -0.26964**     (0.12358)      0.029
LTV & Factor(2nd Lien)     -0.05413*      (0.03169)      0.088
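To make the fitted equation concrete, the sketch below evaluates it for a single loan. The coefficients are those of Table 1. The function name and the example inputs are hypothetical, and LTV is assumed to be expressed in percentage points (for example 80 for an 80% ratio), which the source does not state explicitly.

```python
import math

# Coefficients from Table 1 (log-odds scale).
COEF = {
    "intercept": -1.13352,
    "fico": -0.00599,
    "second_lien": 4.02545,
    "ltv": 0.04332,
    "fixed_payment": -0.26964,
    "ltv_x_second_lien": -0.05413,
}

def predicted_default_probability(fico, ltv, second_lien, fixed_payment):
    """Default probability from the fitted logit model.

    second_lien and fixed_payment are 0/1 indicators; ltv is assumed to be
    in percentage points (an assumption, not stated in the source).
    """
    eta = (
        COEF["intercept"]
        + COEF["fico"] * fico
        + COEF["second_lien"] * second_lien
        + COEF["ltv"] * ltv
        + COEF["fixed_payment"] * fixed_payment
        + COEF["ltv_x_second_lien"] * ltv * second_lien
    )
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical 1st lien loan, FICO 600, LTV 80.
p_adjustable = predicted_default_probability(600, 80, second_lien=0, fixed_payment=0)  # about 0.22
p_fixed = predicted_default_probability(600, 80, second_lien=0, fixed_payment=1)       # about 0.18
```

For these hypothetical inputs, switching from an adjustable to a fixed rate lowers the predicted default probability, in line with the negative coefficient on the fixed payment indicator.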

The fitted coefficients indicate that higher loan-to-value ratios are accompanied by higher default rates and that better credit scores result in lower default rates. Fixed interest rate loans have lower default probabilities than adjustable rate loans. 2nd lien loans have higher default rates than 1st lien loans, and the model penalizes 2nd lien loans heavily. The interaction term offsets part of the large coefficient on the 2nd lien indicator, in the author's reading mitigating it when 2nd lien loans are backed by collateral with a low loan-to-value ratio, which reduces default risk.

Because the outcome in a logistic regression is binary, a traditional residual plot does not provide much insight into the goodness of fit. We therefore use binned residual plots to assess the fit. I first examine the residuals against the fitted values of the model. All the binned residuals lie within the 95% confidence band, and the residuals show no specific pattern. To check the form of the explanatory variables, I examine binned residuals against the continuous explanatory variables. All three plots indicate that the residuals are distributed evenly on both sides of the zero line without specific patterns. Thus the explanatory variables are in proper form and no transformations are necessary. There are some outliers in the plot of binned residuals against credit score, but they are not significant. We can therefore conclude that this is a proper fit.

To assess the validity of the fitted model further, I compare its error rate with that of the null model and test its significance against the null model. Table 2 shows the error rate comparison. Table 3 shows the result of the significance test.

Table 2: Error rates comparison

Model           Error rate
Null model      0.227692307692308
Fitted model    0.227179487179487

The comparison shows that the fitted model improves on the error rate of the null model, which supports the validity of the model. The improvement, however, is limited.

Table 3: Test of significance

                Resid. Df   Resid. Dev   Df   Deviance   p-value
Null model      1949        2092
Fitted model    1944        2007         5    85         <0.0001

Although the improvement in error rate is limited, the test of significance indicates a significant difference between the fitted model and the null model.

Conclusion and Further Analysis

The fitted results show that the predictive loan-level characteristics which financial institutions can observe at origination are credit score, loan-to-value ratio, payment method, and lien status. The credit score, or FICO, is a statistically fitted summary of the borrower's credit history and ability to manage debt. The relatively large deviance it explains confirms the importance of examining borrowers' credit history in predicting their future behavior. The loan-to-value ratio relates the loan to the collateral, which provides a cushion against default. Higher loan-to-value ratios are predicted to result in higher default rates, because lenders face higher risk when collateral is insufficient.

Payment shock and lien status are also found to affect default rates. Loans with adjustable interest rates have higher default probabilities than loans with fixed interest rates. In the US, adjustable rate borrowers usually enjoy lower interest payments in the early years but face an upward payment shock when the interest rate resets upward, while fixed rate borrowers pay a fixed interest rate throughout the loan term. This payment shock in adjustable rate loans could explain their higher default rates. Lien status is the other variable identified as affecting default rates, with 2nd liens fitted to have much higher default probabilities. 2nd liens are riskier for lenders, as lenders do not have the first claim against the collateral in case of default. Borrowers of 2nd lien loans are usually more indebted, as they have to pay both the 1st and the 2nd lien loans. This higher financial burden could explain the higher default rates.

The error rate improves on that of the null model, but only to a limited extent. The literature review shows that, in addition to loan characteristics at origination, default is affected by various time-dependent factors afterwards, including changes in economic conditions or family status. My results suggest that loan-level characteristics at origination are predictive of future defaults, but that their predictive power is limited.

There are several ways to improve the model. Given the potential time-dependent behavior, fitting the model within a survival analysis framework could be a better way to incorporate time. As defaults and prepayments interact, modeling them as competing risks could help us better understand the dynamics.

Appendix: Definition of Loan Characteristics Terms

Occupancy status: whether or not the house underlying the loan is occupied by the owner.

Loan purpose: the borrower's purpose in taking out the loan. I have collapsed the data into two categories. The purpose is purchase if the borrower took the loan to buy a new house, and equity refinance if the borrower took the loan for financial purposes, including refinance, home improvement, or debt consolidation.

Balloon indicator: the method for principal repayment.
If the principal is to be repaid in a short period long after the origination of the loan, so that the payment swells like a balloon, the loan is a balloon loan. Otherwise, the principal is amortized evenly throughout the entire loan period. Balloon loan borrowers enjoy lower interest payments at the early stage but are required to pay a higher amount when the principal repayment window starts.

Amount: the amount of the loan. This is a continuous variable.

Term: the expected term of the loan. There are two categories, 15 years and 30 years. For the same amount of principal, a 15-year loan involves a higher monthly payment, which is a heavier burden on the borrower, but the overall liability falls faster.

Loan-to-value ratio: lenders typically require the house underlying the loan as collateral. The loan-to-value ratio is the amount of the loan divided by the value of the property.

Payment method: the method for calculating interest. There are two types, fixed rate and adjustable rate. Fixed rate means that the interest rate of the loan is fixed throughout the term. The interest calculation for an adjustable rate (hybrid) mortgage is more complicated: the rate is usually fixed for the first 2 years and floats afterwards. Hybrid borrowers typically pay a lower interest rate in the first 2 years and a higher rate once the loan becomes floating.

Credit score: the FICO score is a commonly accepted credit score in the US. It is calculated from the borrower's credit history, current liabilities, and ability to manage debt. It is a continuous variable.

Lien position: a loan is a 1st lien if the financial institution has the priority claim on the value of the collateral in case of default. If the claim ranks next after the first, the loan is a 2nd lien. A 2nd lien is usually riskier because its lender can claim only after the 1st lien lender.

Income document: financial institutions typically require borrowers to provide a set of documents as evidence of their continued income. If the documents meet all requirements, the loan is called full document. Sometimes financial institutions make loans despite incomplete documents, in which case the loan is called reduced document.

REFERENCES

Danis, M.A. and A. Pennington-Cross (2005), A dynamic look at subprime loan performance, Journal of Fixed Income, 15, No. 1: 28-39.
Dobson, A. (2002), An introduction to generalized linear models, Chapman & Hall/CRC.
Dubitsky, R. et al. (2006), Subprime prepayment, default, and severity models, Credit Suisse fixed income research.
Gelman, A. and J. Hill (2007), Data analysis using regression and multilevel/hierarchical models, Cambridge University Press.
Gjava, I. (2000), Prepayments on RFC fixed-rate subprime/home equity loans, Salomon Smith Barney fixed income research.
Hayre, L.S., R. Young, and J. Chen (2008), Modeling of mortgage defaults, Journal of Fixed Income, 17, No. 4: 6-30.