Predicting Charitable Contributions

By Lauren Meyer

Executive Summary

Charitable contributions depend on many factors, from financial security to personal characteristics. This report focuses on demographic and financial information from 2,000 families who participated in the 2004 Survey of Consumer Finances. Specifically, two regression models are introduced. The first indicates whether or not a family will make any charitable contribution, while the second predicts the amount of the contribution. The models identify certain financial and demographic variables as important determinants of charitable contributions. Among other things, both models find that older, more educated families with greater incomes, savings, and inheritances are likely to contribute more money to charity.

Section 1. Introduction

Charitable contributions are something that society values. As a society, we almost expect people earning over some threshold to contribute large amounts to charity. Many ask: why does one person need all that wealth? Even the average citizen usually enjoys contributing to a cause in which they believe. Yet people's behavior tends to vary a great deal, even in similar situations. Despite this, is it possible to accurately predict the charitable contributions of individuals and families?

It is logical to think that one's personal values will influence the amount one contributes to charity. There are some very generous people who find ways to donate even when they themselves are not in the best financial position. Personal characteristics, such as generosity, selflessness, and the degree to which one is materialistic, could all be influential in predicting charitable contributions. However, there must be more objective financial evidence that can serve as a better predictor. Personal characteristics aside, we would expect a billionaire to contribute more to charity than someone with an income of $30,000 per year. Income, along with numerous other financial measures, could be considered an important predictor of charitable contributions.

This report uses data from the 2004 Survey of Consumer Finances (SCF) to estimate the charitable contributions of families, given some demographic and financial information about them. The SCF is conducted by the Federal Reserve Board and collects information concerning the balance sheet, pension, income, and other demographic characteristics of U.S. families. One question asked of these families is the amount of their charitable contributions in 2003, which is the key variable of interest for this report.

The remainder of the report is organized as follows. Section 2 explains important characteristics of the data. Section 3 explores the models chosen to represent the data and to explain charitable contributions as well as possible. Section 4 offers concluding thoughts. Finally, more detailed analysis is provided in the appendices.

Section 2. Data Characteristics

The SCF is a nationally representative cross-sectional survey. Observational data is collected every three years by the National Opinion Research Center, a national organization for research and computing at the University of Chicago, via on-site and phone interviews. In order to study the factors affecting charitable contributions, 2,000 observations were randomly selected from the SCF database. Two thousand observations are enough to ensure a representative sample, yet are easier to work with than the original 4,500 observations.

When examining charitable contributions, it is expected that many families will not contribute anything to charity. On the other hand, among those who do contribute, there is a wide range of amounts contributed. Based on this understanding, I found it best to first predict whether or not a family made a charitable contribution, using all 2,000 data points. Next, I predict how much a given charitable contribution would be, using only observations with positive charitable contributions.

Of the 2,000 observations from the SCF, 1,050 households contributed something to charity. The smallest contribution amount reported was $500, while the largest was $9,070,000. Figure 1 below shows the distribution of charitable contributions, after removing the observations with no contribution. The numbers above the bars indicate how many observations fell into each range.

Figure 1.

From Figure 1, notice that 1,032 of the 1,050 observations, or more than 98% of positive charitable contributions, were for amounts of $1,000 or less. This graph indicates that the data is skewed to the right. A logarithmic transformation will be used for charitable contributions in order to symmetrize this skewed distribution. This transformation is useful because it pulls in extreme contribution values yet retains the original ordering, so that the largest contribution amounts are still the largest amounts on the logarithmic scale. Figure 2 shows the new distribution of charitable contributions after the logarithmic transformation. While this transformation does not perfectly symmetrize the data, it does serve well in spreading out the observations.

Figure 2. Distribution of Logarithmic Charitable Contributions (horizontal axis: Logarithmic Charitable Contribution)

Up to this point, I have not introduced the variables that will be used to predict charitable contributions. The SCF has more than 5,000 questions that were answered by participating families. Of these variables, there are many which could affect charitable contributions. The details of variable selection for this report are discussed in Section 3. There are, of course, some variables that we expect to have a great influence on charitable contributions. For example, it seems reasonable that income would be a strong predictor. Figure 3 shows the relationship between income and logarithmic charitable contributions, which was chosen as the dependent variable of interest. Notice that this plot indicates a quadratic component for the explanatory variable INCOME. Charitable contributions are lower at the three highest income values than at lower, but still very high, incomes. While this may be surprising, it is a fact of the data that will be taken into account during the model selection process. Figure 4 confirms that some very high income families contributed relatively little to charity. These three points are both outliers and high leverage points. However, I think it is important to leave these points in the analysis because they reflect a fact of the real world that would likely be seen in other random samples from the SCF as well. Appendix A provides additional information about the outliers and high leverage points.

Figure 3. Logarithmic Charitable Contributions vs. Income in Thousands

Figure 4. Charitable Contributions in Thousands vs. Income in Thousands

Section 3. Model Selection and Interpretation

Section 2 established that logarithmic charitable contributions will be the dependent variable and that income will be a strong predictor of contributions. It also foreshadowed that a squared income term could be useful in the model. This section introduces the rest of the explanatory variables and the models composed of these variables. First, the logistic regression model is introduced, which predicts whether or not a charitable contribution was made. This model is not concerned with the amount of a contribution. Next, the linear regression model, which predicts the amount of a contribution given that one occurred, is introduced and explained. For clarity, the first model will be referred to as the contribution indication model, and the second model will be referred to as the contribution amount model. These names serve as a reminder of the purpose of each model. After introducing both models, Section 3 continues with an explanation of how these models were judged to be the best, along with a discussion of alternative models.
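The effect of the logarithmic transformation discussed in Section 2 can be illustrated with a short sketch. The eight contribution amounts below are hypothetical values chosen only to span the reported range ($500 to $9,070,000); they are not SCF observations. The `skewness` helper is an illustrative moment-based estimate, not necessarily the measure used in the report, and the natural logarithm is assumed because Appendix F names the dependent variable lnchar.

```python
import math

def skewness(values):
    """Moment-based sample skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical positive contributions in dollars, listed in increasing order.
contributions = [500, 750, 1_200, 2_500, 6_000, 20_000, 150_000, 9_070_000]

# Natural log of each contribution (assumed form of the transformation).
log_contributions = [math.log(c) for c in contributions]

raw_skew = skewness(contributions)
log_skew = skewness(log_contributions)

print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

In this toy sample the transformed values remain in the same order as the raw amounts, and the skewness drops without reaching zero, which matches the report's remark that the transformation does not perfectly symmetrize the data.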

Contribution Indication Model

The contribution indication model is a nonlinear, logistic regression model with the following explanatory variables and regression coefficients:

Explanatory Variable              Coefficient    Signif.
Income                            1.327E-06      **
Age                               1.537E-02      ***
Pension, Annuities Income         1.005E         *
Life Insurance (FACE AMT)         7.622E-07      ***
Spouse Age                        1.054E-02      ***
# Businesses Managed              4.700E         ***
Savings Bond Value                6.252E-05      **
Support to Family/Friends         2.485E         *
Spouse Education (=Bachelors)     6.145E-01      ***
Spouse Education (=Masters)       1.355E         ***
Amount Owed on Mortgages          3.136E-06      ***
Total Savings                     1.939E-06
Total Inherited                   4.717E-07
Assets                            -7.811E-08     *
Saving Habits (Not regularly)     -5.326E        ***

The coefficients in the above table describe how the explanatory variables affect the dependent variable, which is a binary variable with a value of 1 if a charitable contribution occurred. For instance, the negative coefficient associated with saving habits indicates that a family who does not save regularly will be less likely to make a charitable contribution. The z-values, reported with the R output in Appendix C, are another important statistic associated with each variable. The larger the z-value in absolute value, the more significant that variable is in explaining charitable contributions. The asterisks also serve as a significance indicator. For example, *** indicates that the variable is significant at the one-tenth of a percent level.

Many of the variables and coefficients in this model seem reasonable, maybe even expected. It makes sense that a family with more income would be more likely to make a charitable contribution. Other variables such as AGE and SPOUSE AGE have positive coefficients, indicating that older families are more likely to contribute to charity. This could be because older households no longer need to support children. LIFE INSURANCE amount and number of BUSINESSES_MANAGED are also positively related to charitable contributions. This is partly because they are positively correlated with income. Appendix B contains a table of correlation coefficients for the contribution indication model. Also, business managers may be more likely to contribute to charity because it creates a good image for their company.
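A logistic coefficient is often easier to read as a multiplicative change in the odds of contributing. The sketch below shows one common way to interpret such coefficients; it is not a calculation taken from the report. It uses the AGE and INCOME coefficients as they appear in the worked example later in this section, and the function name `odds_multiplier` is hypothetical.

```python
import math

def odds_multiplier(coefficient, change):
    """Factor by which the odds of contributing are multiplied when an
    explanatory variable rises by `change`, holding the others fixed."""
    return math.exp(coefficient * change)

# Ten additional years of respondent age (AGE coefficient 1.537E-02).
age_effect = odds_multiplier(1.537e-02, 10)

# An additional $100,000 of income (INCOME coefficient 1.327E-06).
income_effect = odds_multiplier(1.327e-06, 100_000)

print(f"odds multiplier, +10 years of age:   {age_effect:.3f}")
print(f"odds multiplier, +$100,000 income:   {income_effect:.3f}")
```

Read this way, ten more years of age raise the odds of contributing by roughly 17%, and an extra $100,000 of income by roughly 14%, all else equal.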

The variable ASSETS has a surprising negative coefficient. This negative coefficient indicates that we would expect a family with more assets to be less likely to make charitable contributions. This could be because these families do not have as much liquid cash available, which is usually the form of payment for charitable contributions. The positive coefficient for MORTGAGE_OWED is also surprising. I originally expected families with a large outstanding mortgage to be less likely to make charitable contributions. However, a mortgage is not generally considered bad debt, so perhaps families with these large mortgages are more affluent and still able to make contributions despite their mortgage debt.

The Akaike's Information Criterion (AIC) for this model was relatively low compared to the other models considered; a lower AIC implies a better model fit. Further summary data, from the statistical program R, can be found in Appendix C. Summary statistics are available in Appendix D.

Contribution Amount Model

The contribution amount model uses many of the same variables that were seen in the contribution indication model. The table below presents each explanatory variable along with its linear regression coefficient and significance; the t-values appear with the R output in Appendix F.

Explanatory Variable              Coefficient    Signif.
Income                            2.614E-07      ***
Squared Income                    -3.767E-15     ***
Age                               1.107E-02      ***
Pension, Annuities Income         4.648E         **
Life Insurance (FACE AMT)         8.580E-08      ***
Spouse Age                        1.031E-02      ***
Education (=Bachelors Degree)     2.476E-01
Education (=Masters Degree)       5.843E         **
# Businesses Managed              1.102E         ***
CD Value                          2.288E-07      ***
Support to Family/Friends         4.269E         ***
Value of Stocks                   1.115E-08      **
Spouse Education (=Bachelors)     3.374E-01      *
Spouse Education (=Masters)       4.065E         *
Amount Owed on Mortgages          2.009E-07      *
Credit Line Available             3.369E-07      **
Total Savings                     9.519E-08      *
Total Inherited                   4.303E-08      ***
Total in Checking                 2.738E-07      ***

One important variable in the model is SQUARED INCOME. This variable was added based on the plot of LOG(CHARITABLE_CONT) vs. INCOME that was shown in Section 2. SQUARED INCOME has a negative coefficient, as expected, indicating that as INCOME increases, LOG(CHARITABLE_CONT) increases at a decreasing rate. This squared term also explains the decreasing contribution amounts at very large income levels.

Interestingly, CD_VALUE and STOCK_VALUE were useful in predicting a contribution amount, rather than BOND_VALUE, which was important in the contribution indication model. Both CD and stock value have positive coefficients, indicating that higher amounts in CDs and stocks are generally associated with a larger charitable contribution.

In this model, EDUCATION of the respondent is an important explanatory variable in addition to SPOUSE_EDUCATION, which was seen in the contribution indication model. Higher education levels are generally associated with high income levels, although the correlation coefficients between education and income for this data set were not strong. Appendix E has correlation coefficients for the contribution amount model.

T-values, like z-values, indicate how significant each variable is in predicting LOG(CHARITABLE_CONT). Every variable except EDUCATION (=BACHELORS) has a t-value greater than two in absolute value. The coefficient of determination for this model, R^2 = 0.554, indicates that the model explains 55.4% of the variability in logarithmic charitable contributions. The coefficient of determination adjusted for degrees of freedom, R_a^2 = 0.5458, was not much lower. The size of the typical error, s, is reported with the R output in Appendix F. Summary statistics for this model are in Appendix G.

One concern with this model was the presence of collinearity, which occurs when one explanatory variable is nearly a linear combination of the other explanatory variables. To examine the presence of collinearity, variance inflation factors (VIF) were calculated for each variable. The highest VIF was that of INCOME. While its value is somewhat large, it is not greater than 10, the point at which severe collinearity would exist. The VIFs for the other variables can be found in Appendix H.
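The two fit diagnostics mentioned above follow standard textbook formulas, sketched below. The report does not show its own computations, so this is a check under stated assumptions: the sample size (1,050) and the number of predictors (19) are taken from the degrees of freedom printed in Appendix F, and the function names are hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Coefficient of determination adjusted for degrees of freedom."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def vif(r2_j):
    """Variance inflation factor for one explanatory variable, where r2_j is
    the R^2 from regressing that variable on all other explanatory variables."""
    return 1 / (1 - r2_j)

# R^2 = 0.554 with 1,050 observations and 19 predictors (assumed from Appendix F).
r2_adj = adjusted_r2(0.554, 1050, 19)
print(f"adjusted R^2: {r2_adj:.4f}")

# A VIF of 10 corresponds to an auxiliary R^2 of 0.9.
print(f"VIF at auxiliary R^2 of 0.9: {vif(0.9):.1f}")
```

The adjusted value agrees with the reported R_a^2 of 0.5458, and the VIF threshold of 10 corresponds to a variable that is 90% explained by the other explanatory variables.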

An Example Using These Models

An example will help clarify how to use the two models. Suppose we have the following information about the Smith family. Mr. Smith was the survey respondent. Family income is $140,000 per year. The family is receiving no pension or annuity income. The family holds a life insurance policy on Mr. Smith with a FACE amount of $500,000. Mr. Smith is 46 and Mrs. Smith is 42. Both Mr. and Mrs. Smith have earned bachelor's degrees. They do not manage any businesses and do not support family or friends monetarily. They have CDs worth $15,000, stocks worth $20,000, and savings bonds worth $10,000. They still owe $100,000 on their mortgage. They own a yacht worth $40,000, which is considered ASSETS. Their available credit line is $50,000. The Smiths have $50,000 in savings and $25,000 in checking. They recently inherited $100,000 when Mr. Smith's father passed away. Their savings habits are defined as regular by the SCF.

First, using the contribution indication model, we can find the probability that the Smiths make a charitable contribution. Let z denote the linear predictor and Π(z) the logit probability, where \hat{\beta}_0 is the model intercept reported in Appendix C:

\[
\begin{aligned}
z = \hat{\beta}_0 &+ (1.327 \times 10^{-6})(140{,}000) + (1.537 \times 10^{-2})(46) + (7.622 \times 10^{-7})(500{,}000) \\
&+ (1.054 \times 10^{-2})(42) + (6.252 \times 10^{-5})(10{,}000) + 6.145 \times 10^{-1} \\
&+ (3.136 \times 10^{-6})(100{,}000) + (1.939 \times 10^{-6})(50{,}000) + (4.717 \times 10^{-7})(100{,}000) \\
&+ (-7.811 \times 10^{-8})(40{,}000)
\end{aligned}
\]

For the logit case,

\[
\Pi(z) = \frac{e^{z}}{1 + e^{z}} = 0.805 .
\]

The probability that the Smiths make some contribution to charity is 0.805.

Next, let's see how much we would expect the Smiths to contribute to charity, given that they make a contribution. Here \hat{\beta}_0 is the intercept of the contribution amount model reported in Appendix F:

\[
\begin{aligned}
\log(\text{Contribution}) = \hat{\beta}_0 &+ (2.614 \times 10^{-7})(140{,}000) + (-3.767 \times 10^{-15})(140{,}000^{2}) + (1.107 \times 10^{-2})(46) \\
&+ (8.580 \times 10^{-8})(500{,}000) + (1.031 \times 10^{-2})(42) + 2.476 \times 10^{-1} \\
&+ (2.288 \times 10^{-7})(15{,}000) + (1.115 \times 10^{-8})(20{,}000) + 3.374 \times 10^{-1} \\
&+ (2.009 \times 10^{-7})(100{,}000) + (3.369 \times 10^{-7})(50{,}000) + (9.519 \times 10^{-8})(50{,}000) \\
&+ (4.303 \times 10^{-8})(100{,}000) + (2.738 \times 10^{-7})(25{,}000) \approx 7.92
\end{aligned}
\]

Since log(contribution) is approximately 7.92, we expect the Smiths to contribute about $2,767 to charity.
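The arithmetic of the Smith example can be retraced with the sketch below. Two values are assumptions rather than figures stated in the worked example: the amount-model intercept of 6.263 (the mantissa printed in Appendix F, with its lost exponent assumed to be e+00) and the spouse-education term 3.374E-01 (mantissa from the coefficient table, exponent from the example). The indication-model intercept is not recoverable, so the code only backs out the linear predictor implied by the reported probability of 0.805. All function and variable names are hypothetical.

```python
import math

def logistic(z):
    """Logit probability: e^z / (1 + e^z)."""
    return math.exp(z) / (1 + math.exp(z))

def logit(p):
    """Inverse of the logistic function: the z that yields probability p."""
    return math.log(p / (1 - p))

# Linear predictor implied by the reported probability of contributing.
implied_z = logit(0.805)

# ASSUMPTION: intercept mantissa from Appendix F, exponent assumed e+00.
ASSUMED_INTERCEPT = 6.263

# Contribution amount model terms for the Smith family (coefficient * value).
terms = [
    2.614e-07 * 140_000,        # income
    -3.767e-15 * 140_000 ** 2,  # squared income
    1.107e-02 * 46,             # respondent age
    8.580e-08 * 500_000,        # life insurance face amount
    1.031e-02 * 42,             # spouse age
    2.476e-01,                  # respondent education = bachelors
    2.288e-07 * 15_000,         # CD value
    1.115e-08 * 20_000,         # stock value
    3.374e-01,                  # ASSUMPTION: spouse education = bachelors
    2.009e-07 * 100_000,        # amount owed on mortgages
    3.369e-07 * 50_000,         # credit line available
    9.519e-08 * 50_000,         # total savings
    4.303e-08 * 100_000,        # total inherited
    2.738e-07 * 25_000,         # total in checking
]

log_contribution = ASSUMED_INTERCEPT + sum(terms)
expected_dollars = math.exp(log_contribution)

print(f"implied z for P = 0.805: {implied_z:.3f}")
print(f"log(contribution):       {log_contribution:.3f}")
print(f"expected contribution:   ${expected_dollars:,.0f}")
```

Under these assumptions the linear predictor comes to about 7.93 and the implied contribution to roughly $2,770, in line with the $2,767 reported above; small differences would be expected from coefficient rounding.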

Determining the Final Models

Both models are fairly complex due to the large number of variables. However, this was expected, as charitable contributions are complex to predict. Many other models were considered, but the two chosen models provide the best estimates. Goodness of fit measures were discussed earlier in this section.

Stepwise regression was used as a first step in determining both models. This was important because the original data set had over five thousand variables. After narrowing these down to 70 variables, which can be seen in Appendix I, stepwise regression was run for the logistic and linear regression models. The final models are not exactly what the stepwise regression recommended; they were created after looking at scatter plots, correlations, and variable coefficients. As Section 2 demonstrated, the scatter plot of LOG(CHARITABLE_CONT) vs. INCOME suggested the squared income term, which would not be identified by stepwise regression.

In Section 2, it was also noted that three points were both outliers and high leverage points. After removing the three most unusual points, as identified in the residuals vs. leverages plot, the adjusted R^2 increased to 56.2% from 54.6%. Appendix J has more information about the regression model after removing these three points. These points were left in the data during model selection.

One drawback of the contribution amount model is evident in the residuals vs. fitted plot in Appendix K. This plot indicates that heteroscedasticity is present. The variance of the residuals starts out relatively small, grows, and then slightly decreases. The plot also shows residuals becoming more consistently negative as fitted values increase. In other words, the model overestimates charitable contributions for observations with large fitted values. Despite this, the chosen model is still the best, as all other models considered had the same problem. I feel more qualitative variables are necessary to solve this issue. This idea is expanded upon in the recommendations portion of Section 4. Also relating to residuals, the histogram of standardized residuals appears normally distributed.

Alternative Models

One alternative model involves adding an INCOME^3 term. Doing this increases the adjusted R^2 to 59.7% and also improves the residuals vs. fitted plot discussed earlier. I did not use this term because it increases the model complexity. I also questioned whether an INCOME^3 term made economic sense and whether the term would be significant if a different data sample from the SCF were taken. As mentioned earlier, the data used for this report was a subset of 2,000 of the 4,500 observations in the SCF. I would suggest rerunning the contribution amount model introduced above with all 4,500 observations. At that point, it would be easier to judge whether INCOME^3 would actually be useful. The reason is that my subset of 2,000 had only three data points with INCOME well over $40,000,000. These data points are influential in the analysis, and it would be useful to have more data points with such extremely high INCOME.

Section 4. Summary and Concluding Remarks

Despite the wide variability associated with charitable contributions, the models show that it is possible to predict both whether a family will make a contribution and how much that contribution would be. The recommended explanatory variables for estimating the probability that a contribution will be made are: the family's income level, the age of the respondent and their spouse, amount of life insurance purchased, amount of income from pensions or annuities, number of businesses managed, amount of support provided to others, amount of money in bonds, savings, and assets, total amount inherited, spouse's education level, amount owed on mortgages, and saving habits of the family. To predict the amount of money likely to be contributed to charity, the following additional variables are recommended: income squared, education level of the respondent, value of CDs and stocks, the total amount in checking, and the credit line available to the family.

The study looked at cross-sectional data from the Survey of Consumer Finances, which mainly has demographic and financial information about families. I believe the coefficient of determination, R^2, could be dramatically improved if additional data sources could be brought into the study. In particular, subjective information would improve the study, such as how important one feels it is to donate to charity and other personal attributes that reflect how caring, giving, and materialistic the respondent is. It would also be interesting to consider a longitudinal study, looking at how much charitable contributions change for the same families year after year. This would allow personal characteristics to be held steadier, and the focus could be turned to financial variables, as we would expect salary and net worth to increase with time.

Appendices: Table of Contents

Appendix A: Standardized residuals vs. Leverages for the Contribution Amount Model
Appendix B: Table of Correlations for the Contribution Indication Model
Appendix C: Summary data for predicting whether a contribution was made
Appendix D: Summary Statistics for the Contribution Indication Model
Appendix E: Table of Correlations for the Contribution Amount Model
Appendix F: Summary data for predicting the amount of a charitable contribution
Appendix G: Summary Statistics for the Contribution Amount Model
Appendix H: Variance Inflation Factors
Appendix I: Variables Considered from the SCF and Variables Created from the SCF
Appendix J: R output after removing three unusual points
Appendix K: Residuals vs. Fitted Values for the Contribution Amount Model

Appendix A: Residuals vs. Leverage for the Contribution Amount Model

[Plot of Standardized Residual vs. Leverage, with points labeled by Observation Number]

Appendix B: Table of Correlations for the Contribution Indication Model

Columns: Income, Age, Spouse Age, Pension_Annuity_Income, Saving_Habits, Businesses_Managed, Support, SavingBond_Value, Life_Insurance, Spouse_Educ, OwedOnMortgage, TotalSavings, Assets
Rows: Age, Spouse Age, Pension_Annuity_Income, Saving_Habits, Businesses_Managed, Support, SavingBond_Value, Life_Insurance, Spouse_Educ, OwedOnMortgage, TotalSavings, Assets, Total Inherit

Appendix C. Summary Data for Model 1: Predicting whether a contribution was made

Call:
glm(formula = CharityInd ~ AGE + SPOUSEAGE + INCOME + Pension_Annuity_Income +
    factor(saving_habits) + Businesses_Managed + Support + SavingBond_Value +
    Life_Insurance + factor(spouse_educ) + OwedOnMortgages + TotalSavings +
    Assets + TotalInherit, family = binomial(link = logit), data = mydata)

Deviance Residuals:
Min 1Q Median 3Q Max
e e e e e+00

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) e e < 2e-16 ***
AGE 1.537e e ***
SPOUSEAGE 1.054e e e-05 ***
INCOME 1.327e e **
Pension_Annuity_Income 1.005e e *
factor(saving_habits) e e e-06 ***
Businesses_Managed 4.700e e e-05 ***
Support 2.485e e *
SavingBond_Value 6.252e e **
Life_Insurance 7.622e e ***
factor(spouse_educ) e e e-06 ***
factor(spouse_educ) e e e-08 ***
OwedOnMortgages 3.136e e e-06 ***
TotalSavings 1.939e e
Assets e e *
TotalInherit 4.717e e

Null deviance: on 1999 degrees of freedom
Residual deviance: on 1984 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 9

Appendix D: Summary Statistics for the Contribution Indication Model

Variable Mean Median SD Min Max
INCOME 701, , ,440, , ,000,
AGE
Pension_Annuity_Income 13, , ,000,
Life_Insurance 459, , ,880, ,000,
SPOUSEAGE
Businesses_Managed
Support 8, , ,
SavingBond_Value 2, , ,
OwedOnMortgages 106, , ,000,
Total Savings 100, , ,000,
Assets 251, ,070, , ,000,
TotalInherit 196, ,120, ,000,000.00

Appendix E: Table of Correlations for the Contribution Amount Model

Columns: log(char), Income, SqIncome, Age, Pension_Annuity_Income, Life_Insurance, SpouseAge, Education, Businesses_Managed, CD_Value, Support, Stock_Value, Spouse_Educ, Mortgages Owed, Credit Line, Total Savings, Total Inherit
Rows: Income, SqIncome, Age, Pension_Annuity_Income, Life_Insurance, SpouseAge, Education, Businesses_Managed, CD_Value, Support, Stock_Value, Spouse_Educ, Mortgages Owed, Credit Line, Total Savings, Total Inherit, Total Checking

Appendix F. Summary Data for Model 2: Predicting the amount of a charitable contribution

Call:
lm(formula = lnchar ~ INCOME + SqIncome + AGE + Pension_Annuity_Income +
    Life_Insurance + SPOUSEAGE + factor(education) + Businesses_Managed +
    CD_Value + Support + Stock_Value + factor(spouse_educ) + OwedOnMortgages +
    CreditLine + TotalSavings + TotalInherit + TotalChecking, data = PositiveCont)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.263e e < 2e-16 ***
INCOME 2.614e e < 2e-16 ***
SqIncome e e < 2e-16 ***
AGE 1.107e e ***
Pension_Annuity_Income 4.648e e **
Life_Insurance 8.580e e e-07 ***
SPOUSEAGE 1.031e e e-07 ***
factor(education) e e
factor(education) e e **
Businesses_Managed 1.102e e e-05 ***
CD_Value 2.288e e ***
Support 4.269e e e-07 ***
Stock_Value 1.115e e **
factor(spouse_educ) e e *
factor(spouse_educ) e e *
OwedOnMortgages 2.009e e *
CreditLine 3.369e e **
TotalSavings 9.519e e *
TotalInherit 4.303e e e-05 ***
TotalChecking 2.738e e e-11 ***

Residual standard error: on 1030 degrees of freedom
Multiple R-Squared: , Adjusted R-squared:
F-statistic: on 19 and 1030 DF, p-value: < 2.2e-16

Appendix G: Summary Statistics for the Contribution Amount Model

Variable Mean Median SD Min Max
CHARITYCONT 89, , , ,070,
lnchar
INCOME 1,290, , ,670, , ,000,
AGE
Pension_Annuity_Income 20, , ,
Life_Insurance 808, , ,530, ,000,
SPOUSEAGE
Businesses_Managed
Support 15, , ,
CD_Value 81, , ,000,
OwedOnMortgages 167, , , ,500,
CreditLine 83, , ,000,
Total Savings 184, , , ,000,
TotalInherit 358, ,300, ,000,
TotalChecking 199, , ,130, ,500,000.00

Appendix H: Variance Inflation Factors

Variable Variance Inflation Factor
INCOME
SqIncome
AGE
Pension_Annuity_Income
Life_Insurance
SPOUSEAGE
EDUCATION
Businesses_Managed
CD_Value
Support
Stock_Value
Spouse_Educ
OwedOnMortgages
CreditLine
TotalSavings
TotalInherit
TotalChecking

Appendix I: Variables created from the Survey of Consumer Finance

OwedOnMortgages = X805 + X905+ X X1044
CreditLine = X X X1126
PBusIncome = X X X3332+ X3337+ X X X X X X3430
CurrentPensionBal = X X6467 +X X X X6487
TotalSavings = X X X X X X X3765
Assets = X4022+X4026+X4030+X4018-X4032
CashSettle = X5504+X5507+X5510+X5513+X5516+X5519
FuturePensionAMT = X5608+X5616+X5624+X5632+X5640+X5648
TotalInherit = X5804+X5809+X5814+X5818
TotalChecking = X3506+X3510+X3514+X3518+X3522+X3526+X3529
TotalAnnuities = X X6580
PropertyWorth = X X X X2002+ X2012
OtherLoans = X X X X X X2940
PensionReceived = X X X X X X5434

Appendix I: Variables Considered: From the SCF

Age X14
Spouse Age X19
Income (as Wage or Salary) X5702
Number of people in Household X101
Gender X8021
Education Level X5901
Expectations for Economy X301
Amt Business Income X5704
Nontax Investment Income X5706
Interest income X5708
Dividend Income X5710
Stock, Bond, Real Estate Income X5712
Rent, Trust, Royalties Income X5714
Pension, Annuities Income X5722
Child Support, Alimony Income X5718
Credit: Turned down in last 5 years X407
CC Bank: Amount of new charges X412
CC Bank: Amount still owed X413
CC Bank: Credit Limit X414
Mortgage1: # years X806
# Lines of Credit X1102
# Loans owed to Respondent X1403
# vehicles owned X2202
Foreseeable Major Expenses X3010
How Much Financial Risk will they take on X3014
Don't Save- spend more than income X3015
Save regularly X3020
# business actively managed X3105
Total Value of CDs X3721
# of Savings/Money Market accounts X3728
Value of Cash/Call Money Account X3930
Total MKT Value of Stock funds X3822
Respondent AMT Earned before Taxes X4112
Respondent: Any pensions through jobs X4135
Amt to support friends/relatives X5734
Expected to Inherit X5821
Respondent Race X6809
# people in PEU X7001
Respondent Marital Status X8023
Use Computer to Manage $ X6497
Currently Smoke? X7380
R: How old you'll live to be? X7381
How they Rate Retirement Income X3023
Total Number Mutual Funds X3820
Value of Savings Bonds X3902
Total MKT Value Bonds X6706
Total Value Mutual Funds X6704
Total MKT Value Stocks X3915
Currently Receiving Pension PMTs X4140
Face AMT of Life Insurance Policies X4003
Spouse earnings before taxes X4712
Spouse grade completed X6101
Spouse year of birth X6108
NPEU: Total Amount Owed on Mortgages X6437
NPEU: Amt in Debt X6439
Number of Properties X6688
Number of Checking Accounts X6695

Appendix J: R output after removing three unusual points.

Call:
lm(formula = lnchar ~ INCOME + SqIncome + AGE + Pension_Annuity_Income +
    Life_Insurance + SPOUSEAGE + factor(education) + Businesses_Managed +
    CD_Value + Support + Stock_Value + factor(spouse_educ) + OwedOnMortgages +
    CreditLine + TotalSavings + TotalInherit + TotalChecking, data = PositiveCont,
    subset = -c(1008, 775, 424))

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.26e e < 2e-16 ***
INCOME 3.15e e < 2e-16 ***
SqIncome -5.76e e < 2e-16 ***
AGE 1.06e e **
Pension_Annuity_Income 4.34e e **
Life_Insurance 7.23e e e-05 ***
SPOUSEAGE 1.07e e e-08 ***
factor(education)2 2.74e e *
factor(education)3 6.23e e ***
Businesses_Managed 9.56e e ***
CD_Value 1.98e e **
Support 3.88e e e-06 ***
Stock_Value 1.21e e ***
factor(spouse_educ)2 2.67e e *
factor(spouse_educ)3 3.44e e
OwedOnMortgages 1.97e e *
CreditLine 3.16e e **
TotalSavings 2.05e e ***
TotalInherit 4.00e e e-05 ***
TotalChecking 3.63e e e-07 ***
---
Signif. codes: 0 *** ** 0.01 *

Residual standard error: 1.3 on 1027 degrees of freedom
Multiple R-Squared: 0.57, Adjusted R-squared:
F-statistic: 71.6 on 19 and 1027 DF, p-value: <2e-16

Appendix K: Residuals vs. Fitted Values for the Contribution Amount Model
