ST 350 Lecture Worksheet #33 Reiland

SOLUTIONS

Lotteries: Good Idea or Scam?

Lotteries have become important sources of revenue for many state governments. However, people have criticized lotteries for several reasons. To establish the lottery, politicians promise that the lottery revenues will be used for education; however, they frequently cut back an equal amount on the state's funding for education when the lottery revenues begin appearing in the state's coffers. Another attack has been made by those who claim that lotteries are a tax on the poor and uneducated. To examine the validity of the second claim, a random sample of 100 adults was asked how much they spend on lottery tickets and interviewed about various socioeconomic variables.

The purpose of the study was to test the following beliefs:
i) Relatively uneducated people spend more on lotteries than do relatively educated people
ii) Older people buy more lottery tickets than younger people
iii) People with more children spend more on lottery tickets than people with fewer children
iv) Relatively poor people spend a greater proportion of their income on lottery tickets than relatively rich people

The following data were collected:
$y$: amount spent on lottery tickets as a percentage of total household income
$x_1$: number of years of education
$x_2$: age
$x_3$: number of children
$x_4$: personal income (in thousands of dollars)

Output from Excel is shown below.

Correlation Matrix:

            Education    Age       Children   Income
Education    1
Age         -0.1782      1
Children     0.1073      0.1072    1
Income       0.7339     -0.0418    0.0801     1

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.6584
R Square             0.4335
Adjusted R Square    0.4096
Standard Error       2.91
Observations         100

ANOVA
             df    SS        MS       F       Significance F
Regression    4    615.4     153.9    18.17   0.0000
Residual     95    804.3     8.47
Total        99    1419.8

             Coefficients   Standard Error   t Stat   P-value
Intercept    11.91          1.79              6.67    0.0000
Education    -0.430         0.132            -3.26    0.0016
Age           0.0292        0.0252            1.16    0.2501
Children      0.0934        0.224             0.42    0.6780
Income       -0.0745        0.0277           -2.69    0.0085

1. Write the least squares prediction equation and interpret the coefficients of the model.

$\hat{y} = 11.91 - 0.430(\text{Education}) + 0.0292(\text{Age}) + 0.0934(\text{Children}) - 0.0745(\text{Income})$

Interpretation:
i) Intercept: the intercept value 11.91 does not have a meaningful interpretation in the context of this data (if Education, Age, Children, and Income are all 0, then the model predicts that 11.91% of total family income is spent on the lottery), but its value is still important for making accurate predictions when using the model.
ii) Education: computationally, after allowing for the linear effects of Age, Children, and Income, 1 additional year of Education results in the regression model predicting a decrease of 0.43% of total household income spent on the lottery. Note that it is NOT correct to say that 1 additional year of education causes a 0.43% reduction in the percentage of total household income spent on the lottery.
iii) Age: computationally, after allowing for the linear effects of Education, Children, and Income, 1 additional year of Age results in the regression model predicting an increase of 0.0292% of total household income spent on the lottery. Note that it is NOT correct to say that 1 additional year of Age causes a 0.0292% increase in the percentage of total household income spent on the lottery.

iv) Children: computationally, after allowing for the linear effects of Education, Age, and Income, 1 additional child results in the regression model predicting an increase of 0.0934% of total household income spent on the lottery. Note that it is NOT correct to say that 1 additional child causes a 0.0934% increase in the percentage of total household income spent on the lottery.
v) Income: computationally, after allowing for the linear effects of Education, Age, and Children, an additional $1,000 of personal income (1 unit of $x_4$, the Income variable) results in the regression model predicting a decrease of 0.0745% of total household income spent on the lottery. Note that it is NOT correct to say that an additional $1,000 of personal income causes a 0.0745% decrease in the percentage of total household income spent on the lottery.

2. What percent of the variation in the percentage of total household income spent on the lottery is explained by the differences in the explanatory variables? (express as a percent; use 1 decimal place)

From the regression output, $R^2 = 0.4335$; this means that 43.35% of the variation in the percent of total household income spent on the lottery is explained by differences in the explanatory variables Education, Age, Children, and Income.

3. Is the complete model useful for predicting percent of household income spent on the lottery?

Global F test:
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a:$ at least 1 $\beta_i \ne 0$, $i = 1, 2, 3, 4$.
Test statistic (from output): $F = \dfrac{MS_{\text{Regression}}}{MS_{\text{Error}}} = \dfrac{153.9}{8.47} = 18.17$
P-value (from output, Significance F, to 4 decimal places): 0.0000
Conclusion: Reject the null hypothesis and conclude that at least 1 $\beta_i \ne 0$. The complete model is useful for predicting the percent of household income spent on the lottery. NOTE that this does NOT imply that this is the BEST model.

4. Test each of the individual beliefs i) - iv) at the 5% significance level.
i) Relatively uneducated people spend more on lotteries than do relatively educated people
$H_0: \beta_1 = 0$, $H_a: \beta_1 \ne 0$.
$b_1 = -0.43$, $t = \dfrac{-0.43}{0.132} = -3.26$; P-value = 0.0016 (from the above regression output).
Conclusion: reject $H_0: \beta_1 = 0$ in favor of the alternative $H_a: \beta_1 \ne 0$. Since $b_1 < 0$, it appears that people with more years of education spend LESS on the lottery when Age, Children, and Income are accounted for; that is, relatively uneducated people spend MORE on lotteries than do relatively educated people.

ii) Older people buy more lottery tickets than younger people
$H_0: \beta_2 = 0$, $H_a: \beta_2 \ne 0$.

$b_2 = 0.0292$, $t = \dfrac{0.0292}{0.0252} = 1.16$; P-value = 0.2501 (from the above regression output).
Conclusion: do not reject $H_0: \beta_2 = 0$. When Education, Children, and Income are accounted for, Age is not linearly related to the percentage of household income spent on the lottery.

iii) People with more children spend more on lottery tickets than people with fewer children
$H_0: \beta_3 = 0$, $H_a: \beta_3 \ne 0$.
$b_3 = 0.0934$, $t = \dfrac{0.0934}{0.224} = 0.42$; P-value = 0.6780 (from the above regression output).
Conclusion: do not reject $H_0: \beta_3 = 0$. When Education, Age, and Income are accounted for, the number of Children is not linearly related to the percentage of household income spent on the lottery.

iv) Relatively poor people spend a greater proportion of their income on lottery tickets than relatively rich people
$H_0: \beta_4 = 0$, $H_a: \beta_4 \ne 0$.
$b_4 = -0.0745$, $t = \dfrac{-0.0745}{0.0277} = -2.69$; P-value = 0.0085 (from the above regression output).
Conclusion: reject $H_0: \beta_4 = 0$. When Education, Age, and Children are accounted for, personal Income is linearly related to the percentage of household income spent on the lottery. Since $b_4 < 0$, the fitted model predicts a smaller percentage at higher incomes, which is consistent with belief iv).

5. Are the required conditions satisfied?

[Figure: Histogram of the residuals (horizontal axis: Residuals; vertical axis: Frequency)]
[Figure: Plot of Residuals versus Predicted (horizontal axis: Predicted; vertical axis: Residuals)]

The regression model is
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \varepsilon_i, \quad i = 1, \ldots, n$
with the following assumptions on the error terms $\varepsilon_i$:
i) For all $i$, $E(\varepsilon_i) = 0$
ii) For all values of $x_1, x_2, x_3, x_4$, $SD(\varepsilon_i) = \sigma_\varepsilon$ for all $i$
iii) The distribution of the error $\varepsilon_i$ is normal, $i = 1, \ldots, n$
iv) The errors associated with different values of the response variable $y$ are independent.

Summary of the assumptions: $\varepsilon_i \sim \text{iid } N(0, \sigma_\varepsilon)$ for all $x_1, x_2, x_3, x_4$, where iid denotes independent and identically distributed.
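Assumption ii) (constant standard deviation of the errors) can also be checked numerically, as an informal complement to looking at a residual plot. The sketch below, in Python, sorts the observations by predicted value and compares the standard deviation of the residuals in the lower and upper halves; clearly unequal spreads suggest a cone-shaped pattern. The function name `spread_by_half` and the sample numbers are hypothetical, chosen only for illustration. They are NOT the worksheet's data, and this comparison is not a formal test.

```python
import statistics


def spread_by_half(predicted, residuals):
    """Return (sd_low, sd_high): the residual standard deviation for the
    lower and upper half of the predicted values."""
    # Pair each predicted value with its residual and sort by predicted value.
    pairs = sorted(zip(predicted, residuals))
    mid = len(pairs) // 2
    low = [r for _, r in pairs[:mid]]
    high = [r for _, r in pairs[mid:]]
    return statistics.stdev(low), statistics.stdev(high)


# Hypothetical values for illustration only (NOT the worksheet's data).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
residuals = [0.2, -0.3, 0.4, -0.5, 1.5, -2.0, 2.5, -3.0]

sd_low, sd_high = spread_by_half(predicted, residuals)
print(f"Mean of residuals: {statistics.mean(residuals):.2f}")
print(f"SD of residuals, lower half of predictions: {sd_low:.2f}")
print(f"SD of residuals, upper half of predictions: {sd_high:.2f}")
print(f"Ratio (upper / lower): {sd_high / sd_low:.2f}")
```

A ratio near 1 is what assumption ii) would lead one to expect; a ratio far from 1 points to a spread that changes with the predicted value.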

The histogram of the residuals shows an approximately mound-shaped (normal) pattern with some left skewness and a mean of approximately 0, so we can be reasonably comfortable with assumptions i) and iii). To check assumptions ii) and iv) we can examine residual plots to look for patterns. The cone-shaped pattern of the residual plot calls into question assumption ii) but not assumption iv).

6. a. Estimate the percentage of income spent on the lottery by adults with 11 years of education who are 45 years old with 3 children and annual income of $30,000.

$\hat{y} = 11.91 - 0.43(11) + 0.0292(45) + 0.0934(3) - 0.0745(30) = 6.54$

b. Estimate the percentage of income spent on the lottery by adults with 16 years of education who are 45 years old with 3 children and annual income of $60,000.

$\hat{y} = 11.91 - 0.43(16) + 0.0292(45) + 0.0934(3) - 0.0745(60) = 2.15$

7. Since education and income are highly correlated, let's eliminate income from the model and observe how the estimates change. The output is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.62486173
R Square             0.39045218
Adjusted R Square    0.37140381
Standard Error       3.00248144
Observations         100

ANOVA
             df    SS             MS          F          Significance F
Regression    3    554.3600991    184.7867    20.49793   2.39533E-10
Residual     96    865.4299009    9.014895
Total        99    1419.79

             Coefficients   Standard Error   t Stat      P-value
Intercept    13.1714062     1.77677665        7.413091   4.85E-11
Education    -0.6913761     0.092145484      -7.503093   3.15E-11
Age           0.02011054    0.025796646       0.77958    0.437556
Children      0.10280375    0.231431694       0.444208   0.657892

a. Is the complete model useful for predicting percent of income spent on the lottery?

Global F test:
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a:$ at least 1 $\beta_i \ne 0$, $i = 1, 2, 3$.
Test statistic (from output): $F = \dfrac{MS_{\text{Regression}}}{MS_{\text{Error}}} = \dfrac{184.79}{9.01} = 20.498$
P-value (from output, Significance F, to 4 decimal places): 0.0000

Conclusion: Reject the null hypothesis and conclude that at least 1 $\beta_i \ne 0$. The complete model is useful for predicting the percent of household income spent on the lottery. NOTE that this does NOT imply that this is the BEST model.

b. Test each of the individual beliefs i) - iii) at the 5% significance level.

i) Relatively uneducated people spend more on lotteries than do relatively educated people
$H_0: \beta_1 = 0$, $H_a: \beta_1 \ne 0$.
$b_1 = -0.69$, $t = \dfrac{-0.69}{0.092} = -7.5$; P-value = 0.0000 (to 4 decimal places, from the above regression output).
Conclusion: reject $H_0: \beta_1 = 0$ in favor of the alternative $H_a: \beta_1 \ne 0$. Since $b_1 < 0$, it appears that people with more years of education spend LESS on the lottery when Age and Children are accounted for; that is, relatively uneducated people spend MORE on lotteries than do relatively educated people.

ii) Older people buy more lottery tickets than younger people
$H_0: \beta_2 = 0$, $H_a: \beta_2 \ne 0$.
$b_2 = 0.0201$, $t = \dfrac{0.0201}{0.0258} = 0.7796$; P-value = 0.4376 (from the above regression output).
Conclusion: do not reject $H_0: \beta_2 = 0$. When Education and Children are accounted for, Age is not linearly related to the percentage of household income spent on the lottery.

iii) People with more children spend more on lottery tickets than people with fewer children
$H_0: \beta_3 = 0$, $H_a: \beta_3 \ne 0$.
$b_3 = 0.1028$, $t = \dfrac{0.1028}{0.2314} = 0.4442$; P-value = 0.6579 (from the above regression output).
Conclusion: do not reject $H_0: \beta_3 = 0$. When Education and Age are accounted for, the number of Children is not linearly related to the percentage of household income spent on the lottery.
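
The summary statistics used in this worksheet can be re-derived from the sums of squares, coefficients, and standard errors reported in the two regression outputs. The following is a minimal Python sketch of that arithmetic: $R^2 = SS_{\text{Regression}} / SS_{\text{Total}}$, $F = MS_{\text{Regression}} / MS_{\text{Error}}$, and $t = b_j / SE(b_j)$. The dictionary layout and the function name `summarize` are illustrative assumptions; the numbers are copied from the outputs above, so results for the first model agree with Excel only up to the rounding of the reported values.

```python
def summarize(model):
    """Recompute R^2, adjusted R^2, F, and t ratios from reported output."""
    n, k = model["n"], model["k"]          # observations, number of predictors
    ss_reg, ss_res = model["ss_reg"], model["ss_res"]
    ss_total = ss_reg + ss_res
    ms_reg = ss_reg / k                    # regression df = k
    ms_res = ss_res / (n - k - 1)          # residual df = n - k - 1
    r2 = ss_reg / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    t = {name: b / se for name, (b, se) in model["coef"].items()}
    return {"r2": r2, "adj_r2": adj_r2, "F": ms_reg / ms_res, "t": t}


# Model with Education, Age, Children, and Income (values as reported above).
full = {
    "n": 100, "k": 4, "ss_reg": 615.4, "ss_res": 804.3,
    "coef": {"Education": (-0.430, 0.132), "Age": (0.0292, 0.0252),
             "Children": (0.0934, 0.224), "Income": (-0.0745, 0.0277)},
}

# Model with Income removed (values as reported above).
reduced = {
    "n": 100, "k": 3, "ss_reg": 554.3600991, "ss_res": 865.4299009,
    "coef": {"Education": (-0.6913761, 0.092145484),
             "Age": (0.02011054, 0.025796646),
             "Children": (0.10280375, 0.231431694)},
}

for label, model in (("with Income", full), ("without Income", reduced)):
    s = summarize(model)
    print(f"{label}: R^2 = {s['r2']:.4f}, adj R^2 = {s['adj_r2']:.4f}, "
          f"F = {s['F']:.2f}")
    for name, t_ratio in s["t"].items():
        print(f"  {name}: t = {t_ratio:.2f}")
```

Running the sketch for both models side by side makes the effect of dropping Income easy to see: the Education coefficient and its t ratio grow in magnitude, while $R^2$ falls.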