How exogenous is exogenous income? A longitudinal study of lottery winners in the UK

Dita Eckardt
London School of Economics

Nattavudh Powdthavee
CEP, London School of Economics and MIAESR, University of Melbourne

29th August 2014

Abstract

This note investigates whether gains from lottery wins are strictly randomized across lottery winners. Using a unique longitudinal dataset of lottery winners in Britain, we offer new evidence that certain socio-economic characteristics strongly predict future winnings in the national lottery. This is the case even after controlling for individual fixed effects in the estimation. Researchers working with lottery datasets are thus advised to check, whenever they can, whether their outcome variables also strongly predict the amount of future lottery wins.

JEL: C5; D1
Keywords: income; lottery wins; quasi-experiment; BHPS; longitudinal
1. Introduction

In an attempt to create a setting as close as possible to the idealized laboratory experiment, economists have increasingly used data on lottery wins to study the causal effect of income on a variety of behaviors and outcomes. These include, for example, savings and consumption behaviors (Imbens et al., 2001), mental health (Gardner & Oswald, 2007; Apouey & Clark, 2014), physical health and mortality (Lindahl, 2005), and the likelihood of becoming self-employed (Lindh & Ohlsson, 1996; Georgellis et al., 2005). These studies take it almost for granted that, once non-winners are excluded from the sample, the amount won is strictly exogenous across lottery winners in the year of winning. The assumption is held to apply even when certain groups of individuals, e.g. those with higher levels of social capital, are more likely than others to participate in the lottery and thus become recipients of windfall gains (Georgellis et al., 2008).

The current note, however, argues that the amount won may not be as randomly distributed across lottery winners as these authors have assumed, and that more care must be taken when working with lottery datasets. Unobserved individual fixed effects such as personality traits, as well as information on lottery-playing behaviors that is not normally collected in surveys (e.g. how often a person played and how much they spent on lottery tickets), may be correlated with both the amount won and the personal outcomes of interest. To test this hypothesis, we ask: "Conditioning on having won some money from the lottery within the last twelve months, to what extent can last year's socio-economic characteristics predict the size of this year's lottery win?"

2. Data and Methods
The data used in this paper come from the British Household Panel Survey (BHPS). This is a nationally representative random sample of households, containing over 25,000 unique adult individuals, conducted between September and Christmas of each year from 1991 (Taylor et al., 2002). In every year from 1996 to 2008, the BHPS asked its participants the following question about windfall income: "About how much in total did you receive? Win on the football pools, national lottery or other form of gambling." In modern Britain, the national lottery is overwhelmingly the main form of gambling relevant to this question, so for succinctness we shall refer to these as lottery wins.

The size of win in our data set varies from tiny (one pound) to substantial (approximately 185,000 pounds); see Figure 1 for the distribution of the log amount of win. If we examine lottery winners at the year of winning, we have 15,461 observations on actual lottery wins (on 6,573 distinct individuals). Many people also won money from the lottery more than once in our panel; from 1997, the average number of years in which the same person won the lottery is 2.12, with a standard deviation of 2. We are unable to control for the number of tickets purchased; the number is not recorded in the data set.

Our first approach involves estimating the following lottery win equation:

\[ \log(\text{real lottery win})_{it} = \alpha + X_{it-1}'\beta + e_{it}, \tag{Eq. 1} \]

where $X_{it-1}$ is a vector of lagged characteristics of individual $i$, which includes gender, age, age-squared, log of real net household income per capita (excluding lottery win), education, marital status, employment status, mental distress (GHQ-12: Caseness), self-assessed health, homeownership, frequency of talking to neighbors, frequency of seeing friends, and social class. The error term, $e_{it}$, is assumed to be i.i.d., and the null hypothesis is that the estimated $\beta$s obtained from an
ordinary least squares (OLS) model will not be statistically significantly different from zero if the amount of lottery win at $t$ is strictly exogenous.

The longitudinal nature of the BHPS allows $e_{it}$ to be further decomposed into a time-invariant individual effect component, $\eta_i$, and a time-varying component, $\varepsilon_{it}$, as follows:

\[ e_{it} = \eta_i + \varepsilon_{it}. \tag{Eq. 2} \]

Eq. (1) can then be estimated using a fixed effects (FE) estimator. Finally, in order to test for the switch from winning less to winning more, we also estimate a conditional logit (CL) equation whose outcome is an indicator variable equal to 0 for wins of 1-99 pounds and 1 for wins of 100 pounds or more. Note that all regressions include regional and year dummies as additional control variables, with robust standard errors clustered at the individual level.

3. Results

Table 1 reports the estimates obtained from running the OLS, FE, and CL models on a restricted sample of only the lottery winners at the year of winning. It contains a number of findings that might have been hard to predict. While the majority of Column 1's coefficients are not statistically significant, we find a sizeable difference in future lottery wins by gender; on average, men win around 16% more than women from participating in the lottery between t-1 and t. Conditioning on winning at t, people with lower education at t-1 win significantly and sizably more from playing the lottery than the highly educated. The same applies to people with higher lagged household incomes, although the opposite is true for those living with a partner. Moreover, we also find strong evidence that the winning amount at t tends to be higher for people who were self-employed, had better mental health,
had more children, and were more pro-social at t-1, on average. Yet although these explanatory variables are statistically significant, the model explains only around 4% of the variation in future lottery wins.

Moving on to the FE estimates reported in Column 2, we can see that less than 2% of the within-person variation in future lottery wins can be explained by our model. Income continues to enter the FE specification in a positive and statistically significant manner, although its coefficient has fallen by more than half. The education effects are now imprecisely estimated, but this could be because education is a slow-moving variable in a panel. A within-person change from not meeting people every day to meeting people on most days is now highly correlated with receiving larger wins in the next twelve months.

Nevertheless, while we have uncovered some important socio-economic predictors of log winnings even in a FE specification, Column 3 demonstrates that very little of the switching between winning small amounts (i.e., 1-99 pounds) and winning moderate to large amounts (i.e., 100 pounds or more) can be explained using the lagged variables in our model. This implies that much of the selection into larger sums of money probably occurs at the lower tail of the lottery win distribution, i.e. among wins of less than 100 pounds.

4. Conclusion

One of the main practical difficulties in the study of income effects on economic outcomes and behaviors is finding a quasi-experimental setting where some individuals are randomly assigned more money than others. For quite some time now, data on lottery winners, such as those provided by the BHPS, have offered economists a tangible solution to this problem. However, this note shows that although very little of the variation in lottery wins can be explained by individuals' socio-economic characteristics, we still cannot take for granted that the amount won is strictly
randomized across the entire distribution of lottery winners. We conclude that robustness checks on whether or not our outcome variables of interest also predict the amount of future wins should always be performed on a case-by-case basis.
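The robustness check recommended above can be sketched on simulated data. The following is a minimal illustration only, not the estimation code behind Table 1: the function names (`ols_slope`, `within_slope`, `simulate_winners`), the toy data-generating process, and all parameter values are assumptions made for this example. It simulates repeat winners whose unobserved fixed trait raises both a lagged outcome and the log win, then regresses the log win on the lagged outcome with pooled OLS and with a within (fixed effects) transformation. Control variables and clustered standard errors are omitted for brevity.

```python
import random
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a one-regressor OLS of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def within_slope(ids, x, y):
    """Fixed-effects (within) slope: demean x and y by individual, then OLS."""
    rows = defaultdict(list)
    for k, person in enumerate(ids):
        rows[person].append(k)
    x_dm, y_dm = [], []
    for idx in rows.values():
        mean_x = sum(x[k] for k in idx) / len(idx)
        mean_y = sum(y[k] for k in idx) / len(idx)
        x_dm.extend(x[k] - mean_x for k in idx)
        y_dm.extend(y[k] - mean_y for k in idx)
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)


def simulate_winners(n_people=500, n_wins=4, seed=0):
    """Toy panel (assumed, not BHPS data): a fixed trait drives both variables."""
    rng = random.Random(seed)
    ids, lagged_outcome, log_win = [], [], []
    for person in range(n_people):
        trait = rng.gauss(0, 1)  # unobserved individual fixed effect
        for _ in range(n_wins):
            ids.append(person)
            lagged_outcome.append(trait + rng.gauss(0, 1))
            # The lagged outcome has no direct effect on the log win here;
            # any pooled correlation comes only from the shared trait.
            log_win.append(3.0 + 0.5 * trait + rng.gauss(0, 1))
    return ids, lagged_outcome, log_win


ids, lagged_outcome, log_win = simulate_winners()
pooled = ols_slope(lagged_outcome, log_win)
within = within_slope(ids, lagged_outcome, log_win)
print(f"pooled OLS slope:  {pooled:.3f}")
print(f"within (FE) slope: {within:.3f}")
```

In this toy setup the pooled slope is positive while the within slope is close to zero, because the only link between the two variables is the fixed trait. With real data, a lagged outcome that still predicts the size of the win after the within transformation would be a warning sign of the kind reported in Column 2 of Table 1.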
References

Apouey, B., and Clark, A.E. 2014. Winning but feeling no better? The effect of lottery prizes on physical and mental health. Health Economics, in press.

Gardner, J., and Oswald, A.J. 2007. Money and mental wellbeing: a longitudinal study of medium-sized lottery wins. Journal of Health Economics, 26(1), 49-60.

Georgellis, Y., Sessions, J.G., and Tsitsianis, N. 2005. Windfalls, wealth, and the transition to self-employment. Small Business Economics, 5, 407-428.

Georgellis, Y., Sessions, J.G., and Tsitsianis, N. 2008. Social capital and windfalls: empirical evidence. Economics Letters, 99(3), 521-525.

Imbens, G.W., Rubin, D.B., and Sacerdote, B.I. 2001. Estimating the effect of unearned income on labor earnings, savings, and consumption: Evidence from a survey of lottery players. American Economic Review, 91, 778-794.

Lindahl, M. 2005. Estimating the effect of income on health and mortality using lottery prizes as an exogenous source of variation in income. Journal of Human Resources, XL, 144-168.

Lindh, T., and Ohlsson, H. 1996. Self-employment and windfall gains: evidence from the Swedish lottery. Economic Journal, 106, 1515-1526.

Taylor, M.F., Brice, J., Buck, N., and Prentice-Lane, E. 2002. British Household Panel Survey User Manual. Colchester: University of Essex.
Figure 1: The Frequency Distribution of (Log) Real Lottery Wins in the BHPS Dataset

[Figure omitted. Vertical axis: Density (0 to 0.8); horizontal axis: lgwindfall (0 to 15).]

Note: The vertical axis gives a proportion; the horizontal axis is the logarithm of the size of a win. A log windfall of 5 is approximately 150 pounds. A log windfall of 8 is approximately 3,000 pounds.
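As a worked check of the scale in Figure 1, and assuming that lgwindfall is a natural logarithm (as the Ln notation in Table 1 suggests), the conversions between log windfalls and pound amounts are:

\[ e^{5} \approx 148, \qquad e^{8} \approx 2981, \qquad \ln(185{,}000) \approx 12.1 . \]

The last value indicates roughly where the largest win reported in the data section (approximately 185,000 pounds) would sit on the horizontal axis, ignoring any adjustment from nominal to real pounds.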
Table 1: Predicting the amount of lottery win using lagged variables

                                              Ln(real lottery win)                    Winning 100+
Socio-economic characteristics at t-1         OLS                 FE                  CL
----------------------------------------------------------------------------------------------------------
Ln(real net household income per capita)      0.196*** [0.034]    0.082** [0.038]     0.135 [0.118]
Male                                          0.164*** [0.041]
Age                                           0.012 [0.008]       0.101* [0.054]      0.119 [0.182]
Age-squared                                   -0.0002* [0.0001]   -0.000 [0.000]      -0.000 [0.000]
Highest education: A-level                    -0.117*** [0.044]   -0.026 [0.100]      0.024 [0.277]
Highest education: University degree          -0.363*** [0.071]   -0.263 [0.239]      0.112 [0.662]
Living as couple                              -0.221*** [0.074]   -0.238* [0.144]     -0.876** [0.348]
Widowed                                       0.010 [0.079]       -0.033 [0.118]      -0.316 [0.283]
Divorced                                      0.004 [0.108]       -0.076 [0.225]      -0.036 [0.535]
Separated                                     -0.138 [0.112]      -0.167 [0.199]      -0.539 [0.462]
Never married                                 -0.173 [0.130]      0.024 [0.200]       0.139 [0.556]
Self-employed                                 0.498*** [0.142]    0.094 [0.116]       0.069 [0.255]
Unemployed                                    0.129 [0.144]       -0.060 [0.134]      0.120 [0.350]
Retired                                       -0.050 [0.121]      -0.104 [0.122]      -0.079 [0.254]
Maternity leave                               -0.135 [0.175]      0.050 [0.223]       0.616 [0.763]
Family care                                   -0.003 [0.123]      0.011 [0.134]       0.259 [0.270]
FT student                                    -0.020 [0.169]      -0.242 [0.232]      -0.303 [0.519]
Long-term illness/disabled                    -0.127 [0.136]      -0.042 [0.155]      0.189 [0.392]
Government training scheme                    -0.362 [0.617]      -0.404 [0.251]
Other employment types                        0.214 [0.295]       0.188 [0.269]       0.619 [0.816]
Mental distress (GHQ-12: Likert)              -0.007** [0.003]    -0.000 [0.004]      -0.009 [0.011]
Health: poor                                  0.043 [0.100]       0.054 [0.105]       -0.168 [0.354]
Health: fair                                  0.032 [0.100]       0.050 [0.105]       -0.158 [0.349]
Health: good                                  0.001 [0.102]       -0.042 [0.109]      -0.339 [0.355]
Health: excellent                             -0.012 [0.107]      -0.096 [0.114]      -0.594 [0.370]
Number of children                            0.049* [0.028]      0.040 [0.038]       0.100 [0.106]
Own home outright                             -0.038 [0.050]      0.070 [0.083]       0.129 [0.222]
Frequency of talking to neighbours
  Once or twice a month                       -0.171 [0.146]      -0.108 [0.121]      -0.228 [0.364]
  Once or twice a week                        -0.191 [0.156]      -0.117 [0.117]      -0.282 [0.341]
  Most days                                   -0.109 [0.155]      -0.059 [0.115]      -0.312 [0.330]
Frequency of meeting people
  Once or twice a month                       0.347 [0.282]       0.908** [0.463]     1.066 [1.214]
  Once or twice a week                        0.424 [0.274]       0.900* [0.463]      0.926 [1.177]
  Most days                                   0.551** [0.270]     1.026** [0.460]     1.135 [1.174]
Constant                                      1.563*** [0.516]    -1.552 [2.324]
----------------------------------------------------------------------------------------------------------
Observations                                  12,421              12,421              3,607
R-squared                                     0.042
Within R-squared                                                  0.017
Pseudo R-squared                                                                      0.033
Number of individuals                                             5,391               834

Note: *<10%; **<5%; ***<1%. Standard errors are in brackets. Reference groups are: female; married; employed; health: very poor; talked to neighbors less than once or twice a month; meeting people less than once or twice a month. Regional dummies and wave dummies were included in all regressions. Social class dummies were included in Columns 1 and 2 but not in Column 3 as the CL model would not converge otherwise.
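Because the dependent variable in Columns 1 and 2 is in logs, a coefficient b on a dummy variable corresponds to an exact proportional difference of exp(b) - 1, which the coefficient itself only approximates. The short sketch below applies this standard conversion to a few OLS coefficients from Table 1; the helper name `log_points_to_percent` is an assumption introduced for this example.

```python
import math


def log_points_to_percent(beta):
    """Exact percentage difference implied by a dummy coefficient in a log-linear model."""
    return (math.exp(beta) - 1.0) * 100.0


# OLS coefficients on dummy variables, taken from Column 1 of Table 1.
ols_dummies = {
    "Male": 0.164,
    "Highest education: University degree": -0.363,
    "Living as couple": -0.221,
    "Self-employed": 0.498,
}

for name, beta in ols_dummies.items():
    print(f"{name}: {beta:+.3f} log points = {log_points_to_percent(beta):+.1f}%")
```

For small coefficients the two readings are close (0.164 log points is just under 18%), while for larger ones such as the self-employment coefficient the gap widens. The conversion does not apply to the income coefficient, which is an elasticity because income also enters in logs.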