Econ 551 Government Finance: Revenues Winter 2018 Given by Kevin Milligan Vancouver School of Economics University of British Columbia Lecture 8c: Taxing High Income Workers ECON 551: Lecture 8c 1 of 34
Agenda
1. Overview
2. Deriving top tax rates
3. Taxable income elasticity
4. Milligan and Smart (2015)
Overview
Let's recall again the Mirrlees optimal income tax formula we derived a few weeks ago:
\[ \frac{T'}{1 - T'} = \left(1 + \frac{1}{\varepsilon}\right) \frac{1 - F(w_0)}{w_0 f(w_0)} \]
For high-ability workers:
  o High wage (\(w_0\)), so productivity gain from their work matters more.
  o Few workers above them in the distribution, so smaller revenue gain \((1 - F(w_0))\).
Both these things push to lower marginal tax rates at high ranges.
But:
  o For most SWFs, we want to redistribute away from high-ability workers.
  o Tax gained on their labour is big: \(w_0 f(w_0)\). This pushes to lower marginal tax rates at low ranges.
So once again, it's complicated.
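To see the "few workers above them" force at work, here is a minimal sketch that evaluates the right-hand side of the formula at several wage levels. The lognormal wage distribution, its parameters, and the value of \(\varepsilon\) are assumptions made only for illustration; none of them come from the lecture.

```python
import math

def hazard_term(w, mu=0.0, sigma=1.0):
    """(1 - F(w)) / (w f(w)) for an assumed lognormal wage distribution."""
    x = (math.log(w) - mu) / sigma
    survival = 0.5 * math.erfc(x / math.sqrt(2))                           # 1 - F(w)
    w_density = math.exp(-0.5 * x * x) / (sigma * math.sqrt(2 * math.pi))  # w * f(w)
    return survival / w_density

def marginal_rate(w, eps, mu=0.0, sigma=1.0):
    """Solve T'/(1 - T') = (1 + 1/eps) * hazard_term(w) for T'."""
    k = (1 + 1 / eps) * hazard_term(w, mu, sigma)
    return k / (1 + k)

eps = 0.5                           # assumed elasticity, illustration only
wages = [0.5, 1.0, 2.0, 5.0, 10.0]  # in units of the median wage
rates = [marginal_rate(w, eps) for w in wages]

for w, rate in zip(wages, rates):
    print(f"w = {w:>4.1f}: T' = {rate:.1%}")
```

Under these assumptions the implied marginal rate falls as the wage rises. That pattern depends on the lognormal tail; with a Pareto tail the term \((1 - F(w))/(w f(w))\) is constant rather than declining.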
Threshold for paying top rate in 2017:

Jurisdiction   Bracket threshold   Rate
NL             $179,214            18.3%
PE             $98,316             18.4%
NS             $150,000            21.0%
NB             $152,100            20.3%
QC             $103,915            25.8%
ON             $220,000            20.5%
MB             $68,005             17.4%
SK             $129,214            15.0%
AB             $303,900            15.0%
BC             $108,460            14.7%
Fed            $202,800            33.0%

US Federal Tax Brackets (2017), single filer:
$0 to $9,325*: 10%
$9,325* to $37,950: 15%
$37,950 to $91,900: 25%
$91,900 to $191,650: 28%
$191,650 to $416,700: 33%
$416,700 to $418,400: 35%
$418,400+: 39.6%

Of course, the taxable income definition is quite different. But still there is something to be learned from this comparison.
Deriving top tax rates
Goal will be to define a revenue-maximizing top tax rate: the rate at which behavioural and mechanical effects offset.
This is sometimes referred to as the peak of the Laffer Curve.
http://youtu.be/dxpvyieptwa?t=30s
The Legend of the Laffer Curve
It was traced out on a napkin during a 1974 meal at the Two Continents Restaurant in Washington.
Argued that above some point, higher taxes would lead to lower revenues.
At the dinner were:
  o Arthur Laffer
  o Jude Wanniski (WSJ), who coined the term
  o Donald Rumsfeld
  o Dick Cheney
There are many historical precedents of this notion (Hume, Smith, others).
Location of the Laffer peak is an empirical question, but clearly of theoretical importance.
Deriving the revenue-maximizing tax rate
The development of these formulae follows Saez, Slemrod, and Giertz (2012, pp. 6-9).
Notation:
\(\tau\): Tax rate on top earners.
\(z\): Reported income of top earner.
\(\bar{z}\): Threshold to be in top bracket.
\(z^m\): Average income of those in the top bracket.
\(N\): Number of taxpayers in top bracket.
Elasticity:
\[ e = \frac{1 - \tau}{z} \, \frac{\partial z}{\partial (1 - \tau)} \]
Pareto parameter
This can be manipulated further by noticing that the ratio of the mean \(z^m\) to the threshold \(\bar{z}\) can be used to recover the Pareto coefficient \(\alpha\) in the following way.
Define the ratio of the mean to the threshold as
\[ \beta = \frac{z^m}{\bar{z}}. \]
This coefficient \(\beta\) is called the inverted Pareto coefficient. It can be shown that the Pareto coefficient \(\alpha\) can be expressed as
\[ \alpha = \frac{\beta}{\beta - 1}. \]
It follows with some basic manipulation that
\[ \alpha = \frac{z^m}{z^m - \bar{z}}. \]
Pareto estimates
[Figure: Atkinson, Piketty, and Saez (2011)]
Pareto estimates: a quick check on the data
Let's check on these numbers using data from the Canada Revenue Agency here:
http://www.cra-arc.gc.ca/gncy/stts/t1fnl-eng.html
Use the provided $250,000 threshold for 2011 tax year.
Total income for these taxpayers is $170,898,924,000.
Total number of taxfilers is 290,260.
So, average income of those over the threshold, \(z^m\), is $588,779.
\(\beta\) is the ratio of this average to the threshold: 2.36.
\[ \alpha = \frac{\beta}{\beta - 1} = \frac{2.36}{1.36} = 1.74. \]
This is a bit bigger than the 1.70 implied by the \(\beta = 2.42\) reported by Atkinson, Piketty, and Saez (2011) for 2005.
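The arithmetic of this check can be reproduced in a few lines. The sketch below uses only the figures quoted on the slide; the variable names are ours.

```python
# Figures quoted on the slide (2011 tax year, $250,000 threshold).
threshold = 250_000
total_income = 170_898_924_000
num_filers = 290_260

mean_income = total_income / num_filers                   # z^m
beta = mean_income / threshold                            # inverted Pareto coefficient
alpha = beta / (beta - 1)                                 # Pareto coefficient
alpha_direct = mean_income / (mean_income - threshold)    # same thing, written with z^m

# Comparison value: beta = 2.42 from Atkinson, Piketty, and Saez (2011) for 2005.
alpha_aps = 2.42 / (2.42 - 1)

print(f"z^m = {mean_income:,.0f}, beta = {beta:.2f}, alpha = {alpha:.2f}")
print(f"alpha implied by beta = 2.42: {alpha_aps:.2f}")
```

Both expressions for \(\alpha\) give the same number, which is the "basic manipulation" linking \(\beta/(\beta - 1)\) to \(z^m/(z^m - \bar{z})\).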
Putting it all together
Building blocks:
  o Behavioural effect: revenue loss from lower reported taxable income.
  o Mechanical effect: revenue gain from higher tax collected on reported income.
Let's write them down and equate them for a given change in tax rates \(d\tau\).
Mechanical effect:
\[ dM \equiv N (z^m - \bar{z}) \, d\tau \]
Behavioural effect:
\[ dB \equiv N \tau \, dz^m \]
First rearrange the elasticity formula to solve for the change in reported income:
\[ dz^m = -e \, z^m \frac{d\tau}{1 - \tau} \]
so that
\[ dB = -N e \, z^m \frac{\tau}{1 - \tau} \, d\tau \]
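Equate and solve
\[ dR = dM + dB \]
\[ dM \equiv N (z^m - \bar{z}) \, d\tau \]
\[ dB = -N e \, z^m \frac{\tau}{1 - \tau} \, d\tau \]
Setting \(dR = 0\) gives
\[ \frac{\tau}{1 - \tau} = \frac{1}{e} \cdot \frac{z^m - \bar{z}}{z^m} \]
Revenue-maximizing rate is therefore:
\[ \tau^* = \frac{1}{1 + e \left( \frac{z^m}{z^m - \bar{z}} \right)} \]
Noticing that we can substitute in the Pareto coefficient \(\alpha\):
\[ \tau^* = \frac{1}{1 + e \alpha} \]
Q: What happens for the highest income taxpayer? (Hint: what is \(\beta\) at the top?)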
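As a numerical check on the derivation, the sketch below evaluates the mechanical and behavioural effects at three tax rates and confirms that they cancel at \(1/(1 + e\alpha)\). All input values (number of taxpayers, threshold, mean income, elasticity, size of the rate change) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mechanical_effect(n, z_mean, z_bar, d_tau):
    """dM = N (z^m - z_bar) d_tau: extra tax collected on income above the threshold."""
    return n * (z_mean - z_bar) * d_tau

def behavioural_effect(n, z_mean, tau, e, d_tau):
    """dB = -N e z^m tau/(1 - tau) d_tau: tax lost because reported income falls."""
    return -n * e * z_mean * tau / (1 - tau) * d_tau

def revenue_change(tau, n, z_mean, z_bar, e, d_tau):
    """dR = dM + dB for a small rate change d_tau starting from rate tau."""
    return mechanical_effect(n, z_mean, z_bar, d_tau) + behavioural_effect(n, z_mean, tau, e, d_tau)

# Hypothetical inputs, for illustration only.
n, z_bar, z_mean, e, d_tau = 1_000, 250_000, 500_000, 0.25, 0.01

alpha = z_mean / (z_mean - z_bar)   # Pareto coefficient, here 2.0
tau_star = 1 / (1 + e * alpha)      # revenue-maximizing rate

for tau in (0.40, tau_star, 0.80):
    d_r = revenue_change(tau, n, z_mean, z_bar, e, d_tau)
    print(f"tau = {tau:.3f}: dR = {d_r:,.0f}")
```

Below the revenue-maximizing rate a small increase raises revenue, above it revenue falls, and at the rate itself the two effects offset.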
Some values for \(\tau^*\)

                                  alpha
e        1.4      1.5      1.6      1.7      1.8      1.9      2.0
0.10     87.7%    87.0%    86.2%    85.5%    84.7%    84.0%    83.3%
0.25     74.1%    72.7%    71.4%    70.2%    69.0%    67.8%    66.7%
0.50     58.8%    57.1%    55.6%    54.1%    52.6%    51.3%    50.0%
0.75     48.8%    47.1%    45.5%    44.0%    42.6%    41.2%    40.0%

As \(\alpha\) gets bigger, income is more equitably distributed: less gain from taxing high incomes.
As \(e\) gets bigger, larger efficiency cost from taxing high incomes.
Saez, Slemrod, and Giertz find \(e\) of 0.25 reasonable. Assume \(\alpha = 1.5\) for US. Gives 72.7% revenue-maximizing rate.
But, in UK Mirrlees Review, Brewer, Saez, and Shephard (2011) found \(e = 0.46\), \(\alpha = 1.67\). Gives 56.6% revenue-maximizing rate.
We will get to Canada later in the lecture.
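The table can be regenerated directly from the formula for the revenue-maximizing rate, \(1/(1 + e\alpha)\). A minimal sketch follows; the function name is ours and the grid values are the ones shown on the slide.

```python
def revenue_maximizing_rate(e, alpha):
    """Revenue-maximizing top rate: 1 / (1 + e * alpha)."""
    return 1 / (1 + e * alpha)

alphas = [1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
elasticities = [0.10, 0.25, 0.50, 0.75]

# Header row, then one row per elasticity.
print("e     " + "".join(f"{a:>7.1f}" for a in alphas))
for e in elasticities:
    row = "".join(f"{revenue_maximizing_rate(e, a):>7.1%}" for a in alphas)
    print(f"{e:<6.2f}{row}")

us_rate = revenue_maximizing_rate(0.25, 1.5)    # Saez, Slemrod, and Giertz values
uk_rate = revenue_maximizing_rate(0.46, 1.67)   # Brewer, Saez, and Shephard values
print(f"US: {us_rate:.1%}, UK: {uk_rate:.1%}")
```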
Should we aim to push top rate to \(\tau^*\)?
Diamond and Saez (2011 JEP) argue that we ought to push the top tax rate up until it raises no revenue at the margin: the peak of the Laffer curve.
"Raising the tax rate on the top percentile obviously reduces the utility of high income tax filers. If we denote by g the social marginal value of $1 of consumption for top income earners (measured relative to government revenue), the direct welfare cost is g multiplied by the change in tax revenue collected. Because the government values redistribution, the social marginal value of consumption for top bracket tax filers is small relative to that of the average person in the economy, and so g is small and as a first approximation can be ignored."
Feldstein (2012 JEL) reviews the Mirrlees Review, and disagrees.
"I find the words 'if society places some positive value on the welfare of those with income in the top tax bracket' quite amazing. Who speaks for this society? What kind of nation places no value on the welfare of those with income in the top tax bracket, treating them only as the revenue producing property of the state?"
Digging into the taxable income elasticity concept
Historically (into the 1980s), we thought mostly about labour supply responses to taxes.
  o In general, male labour supply is not very wage-elastic.
  o Is this different for high earners? Positional competition? High value of marginal leisure?
New tax responsiveness literature looks more broadly.
  o Incorporates not just real responses, but accounting or shifting responses.
  o Shifting: across time, across tax bases, out of country.
  o Invest in tax shelters.
Example: Trusts to split income
Feldstein (1999): all we need is taxable income elasticity
Notation:
\(L\): Leisure
\(C\): Consumption
\(E\): Excluded compensation (e.g. fancy office)
\(D\): Deductions
Time available = 1
\[ \max U = U(L, C, E, D) \]
Subject to budget constraint
\[ C = (1 - t)\,[w(1 - L) - E - D] \]
Rewrite this as:
\[ C \cdot \frac{1}{1 - t} = w - wL - E - D \]
Insight: We don't need to worry about different elasticities for \(L\), \(E\), \(D\). Same tax wedge \(\frac{1}{1 - t}\) for all of them. Just need to calculate overall elasticity with respect to this tax wedge.
(Chetty 2009 caveat: tax avoidance involves some transfer; not all social waste.)
Taxing Top Incomes in Canada: Milligan and Smart (2015)
Try to get estimates of taxable income elasticity for Canada.
  o Q: Why might Canadian elasticity be different?
Use top share data now provided freely by CANSIM 204-0002
http://www5.statcan.gc.ca/cansim/a26?lang=eng&retrlang=eng&id=2040002
Also use Canadian Tax and Credit Simulator
http://faculty.arts.ubc.ca/kmilligan/ctacs/
Exploit differences in tax rates across provinces and years.
  o Q: Is there a difference between provincial and federal elasticity?
Focus on period from 1988 to 2011. Discard 1982-1987 because of 1988 tax reform.
  o Q: Why might tax reform matter?
Top tax rates across provinces
Scatterplot: top 1% share vs. top tax rates
Deriving the estimating equation
Take an individual with income \(y_{it}\) and facing tax rate \(\tau_{it}\). Could estimate:
\[ \log y_{it} = \alpha_i + \delta_t + \beta \log(1 - \tau_{it}) + \varepsilon_{it} \]
But, we can do better if we can group people into a high and a low group.
Take the aggregate income in the low and high group, \(Y_{Lt}\) and \(Y_{Ht}\). Assume only the high group reacts to taxes:
\[ \log Y_{Ht} = \alpha_H + \delta_t + \beta \log(1 - \tau_{Ht}) + \varepsilon_{Ht} \]
\[ \log Y_{Lt} = \alpha_L + \delta_t + \varepsilon_{Lt} \]
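Deriving the estimating equation
Now let's manipulate the shares.
First, define the income share of the high group:
\[ s_{Ht} = \frac{Y_{Ht}}{Y_{Ht} + Y_{Lt}} \]
Now let's take the log of this share, over one minus the share:
\[ z_{Ht} \equiv \log\left( \frac{s_{Ht}}{1 - s_{Ht}} \right) \]
And estimate
\[ z_{Ht} = \alpha + \beta \log(1 - \tau_{Ht}) + u_t \]
But notice this is just the difference-in-difference model from above:
\[ \log\left( \frac{s_{Ht}}{1 - s_{Ht}} \right) = \log s_{Ht} - \log(1 - s_{Ht}) = \log Y_{Ht} - \log Y_{Lt} \]
Subtracting the low-group equation from the high-group equation removes the common time effect \(\delta_t\), so \(\alpha = \alpha_H - \alpha_L\) and \(u_t = \varepsilon_{Ht} - \varepsilon_{Lt}\).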
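The sketch below illustrates why the log-odds of the top share works as a dependent variable. It simulates one jurisdiction over 24 years with a strong common time effect, checks the identity \(\log(s/(1-s)) = \log Y_H - \log Y_L\), and recovers the elasticity by simple OLS. Every number in it (true elasticity, intercepts, noise levels, range of tax rates) is a made-up assumption, and the single-regressor OLS is a simplification of the panel specification used in the paper.

```python
import math
import random

rng = random.Random(0)
beta_true = 0.7               # hypothetical elasticity
alpha_h, alpha_l = 2.0, 4.0   # hypothetical group intercepts

x, z = [], []
for t in range(24):
    tau = rng.uniform(0.40, 0.55)              # top marginal rate in year t
    delta_t = 0.03 * t + rng.gauss(0, 0.05)    # time effect common to both groups
    log_net = math.log(1 - tau)
    y_h = math.exp(alpha_h + delta_t + beta_true * log_net + rng.gauss(0, 0.01))
    y_l = math.exp(alpha_l + delta_t + rng.gauss(0, 0.01))
    share = y_h / (y_h + y_l)
    z_t = math.log(share / (1 - share))
    # Identity: log-odds of the share equals the difference in log incomes.
    assert math.isclose(z_t, math.log(y_h) - math.log(y_l), abs_tol=1e-9)
    x.append(log_net)
    z.append(z_t)

# Simple OLS slope of z on log(1 - tau).
x_bar = sum(x) / len(x)
z_bar = sum(z) / len(z)
beta_hat = (sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
            / sum((xi - x_bar) ** 2 for xi in x))
print(f"true beta = {beta_true}, estimated beta = {beta_hat:.3f}")
```

Because the time effect enters both groups identically, it drops out of the dependent variable, and the slope is estimated from tax-rate variation alone.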
Features of this approach
Advantages
  o Can control for time trends. Other work using national time series can't do this.
  o Base is pretty much constant over this time period, and across provinces. Ideal.
Disadvantages
  o Can only capture provincial income elasticity.
  o Membership in top share not fixed; mobility could matter.
Basic results

                    (1)             (2)         (3)          (4)          (5)          (6)
                    Basic           Add         Unweighted   Log share    Linear       Quadratic
                    specification   income                   dependent    provincial   provincial
                                                             variable     trends       trends
Observations        240             240         240          240          240          240
R-squared           0.941           0.97        0.949        0.97         0.975        0.988
Log (1-MTR)         1.068**         0.689***    0.723**      0.640***     0.794**      0.510*
                    [0.441]         [0.238]     [0.293]      [0.210]      [0.323]      [0.264]
Log Total Income                    0.729***    0.805***     0.631***     0.903***     1.523***
                                    [0.0767]    [0.0341]     [0.0690]     [0.179]      [0.209]
Comparing across income definitions

                    (1)             (2)             (3)             (4)
                    Total Income    Total Income    Market Income   Market Income
                    No Capital      With Capital    No Capital      With Capital
                    Gains           Gains           Gains           Gains
Observations        240             240             240             240
R-squared           0.97            0.96            0.964           0.953
Log (1-MTR)         0.689***        0.817**         0.723***        0.791**
                    [0.238]         [0.364]         [0.243]         [0.335]
Log Total Income    0.729***        0.766***        0.605***        0.643***
                    [0.0767]        [0.0990]        [0.0527]        [0.0735]
Results for different income groups

                    (1)             (2)             (3)             (4)
                    P90             P95             P99             P99.9
Observations        240             240             240             190
R-squared           0.962           0.969           0.97            0.952
Log (1-MTR)         0.0246          0.221           0.689***        1.451***
                    [0.219]         [0.218]         [0.238]         [0.541]
Log Total Income    0.424***        0.511***        0.729***        0.893***
                    [0.0533]        [0.0636]        [0.0767]        [0.162]
Extension: simulate impact of provincial rate increases
Imagine a province raises its top tax rate by 5 percentage points for the top 1 percent of earners in that province.
Potential revenue gain will depend on:
  o Elasticity (common across provinces)
  o Provincial income distribution: how much of total income is in the top 1%?
  o Current tax rate: the higher it is already, the bigger the behavioural effect.
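The role of the current tax rate can be illustrated with the mechanical and behavioural effects from earlier in the lecture: dividing the behavioural loss by the mechanical gain gives an offset share of \(e \, \alpha \, \tau / (1 - \tau)\). The sketch below is not the simulation from Milligan and Smart (2015). It treats the combined rate as if a single government collected it, uses a marginal formula for a discrete change, and borrows \(e = 0.689\) (the P99 coefficient on Log (1-MTR)) and \(\alpha = 1.74\) (the CRA quick check) purely as illustrative inputs. The three starting rates are hypothetical.

```python
def behavioural_offset_share(tau, e, alpha):
    """Share of the mechanical revenue gain lost to behavioural response:
    e * alpha * tau / (1 - tau). A value above 1 means revenue falls."""
    return e * alpha * tau / (1 - tau)

e = 0.689      # illustrative elasticity (P99 coefficient from the results table)
alpha = 1.74   # illustrative Pareto coefficient (CRA quick check)

for tau in (0.40, 0.45, 0.50):   # hypothetical current combined top rates
    share = behavioural_offset_share(tau, e, alpha)
    print(f"current rate {tau:.0%}: behavioural offset = {share:.0%} of mechanical gain")
```

Under these assumptions the offset grows quickly with the starting rate, which is the sense in which a higher current rate leaves less room for a further increase.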
Revenue impact per capita of a 5% increase on top 1%
Summary
Taxing high earners requires careful attention to possible means of tax avoidance.
Will be substantial behavioural impact on revenues at current tax rates for provinces.
Provinces have much less leeway to increase taxes.
Raises the question of federal vs provincial taxation of higher income. See Milligan and Smart (2017) for a model.