Example: Histogram for US household incomes from 2015 Table:


Income level              Relative frequency
$0 - $14,999              11.6%
$15,000 - $24,999         10.5%
$25,000 - $34,999         10%
$35,000 - $49,999         12.7%
$50,000 - $74,999         16.7%
$75,000 - $99,999         12.1%
$100,000 - $149,999       14.1%
$150,000 - $199,999       6.2%
$200,000 and over         6.1%

Starting with the table of income distribution, we first draw the horizontal axis...

[Figure: empty axes. Horizontal axis: 2015 U.S. Household Income ($1000s), 0 to 250; vertical axis: % per $1000, 0 to 1.2.]

Using a density scale, we draw rectangles over each class interval whose areas equal the percentages of the families in those intervals.

The height of each rectangle is equal to the percentage of the observations in the corresponding class interval divided by the length of the class interval (the width of the rectangle).
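The height rule can be sketched in a few lines of Python. This is only an illustration: the class boundaries and percentages come from the income table, while closing the open-ended top class ("$200,000 and over") at 250 is an assumption taken from the plotted axis range, and the function name `density_heights` is made up for this sketch.

```python
# Class boundaries in $1000s and relative frequencies in percent (from the table).
# Assumption: the open-ended top class is closed at 250, as on the plotted axis.
edges = [0, 15, 25, 35, 50, 75, 100, 150, 200, 250]
percents = [11.6, 10.5, 10.0, 12.7, 16.7, 12.1, 14.1, 6.2, 6.1]


def density_heights(edges, percents):
    """Height of each rectangle: percentage divided by class width."""
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    return [p / w for p, w in zip(percents, widths)]


heights = density_heights(edges, percents)
for lo, hi, h in zip(edges, edges[1:], heights):
    print(f"{lo:>3} to {hi:<3}: {h:.3f} % per $1000")

# Sanity check: the rectangle areas (height x width) add back up to 100%.
total_area = sum(h * (hi - lo) for h, lo, hi in zip(heights, edges, edges[1:]))
print(f"total area: {total_area:.1f}%")
```

Because each area is height times width, the areas recover the original percentages, which is exactly what the density scale is designed to do.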

The end result should look like this:

[Figure: density-scale histogram. Horizontal axis: 2015 U.S. Household Income ($1000s), 0 to 250; vertical axis: % per $1000, 0 to 1.2.]

The vertical scale here is percent per $1000, i.e., it is the relative frequency (percentage) divided by the width of the intervals (which in this case are measured in $1000s).

It's always a good idea to label the axes.

If, for example, we use the relative frequency scale instead of the density scale, the histogram looks like this:

[Figure: relative-frequency histogram. Horizontal axis: 2015 U.S. Household Income ($1000s), 0 to 250; vertical axis: %, 0 to 16.]

This histogram reports the information accurately, but it is misleading. The bins for the higher incomes seem to be much bigger than the bins for the lower incomes because they are wider.

(*) If bins have different widths use the density scale.
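One way to see the distortion is to compare the drawn area of each bar under the two scales. The sketch below uses the table's percentages; the upper boundary of 250 for the top class is again an assumption based on the plotted axis, and the variable names are illustrative.

```python
# Class boundaries in $1000s and percentages from the income table.
# Assumption: the open-ended top class is closed at 250.
edges = [0, 15, 25, 35, 50, 75, 100, 150, 200, 250]
percents = [11.6, 10.5, 10.0, 12.7, 16.7, 12.1, 14.1, 6.2, 6.1]
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

# Relative-frequency scale: bar height is the raw percentage,
# so the drawn area is percentage x width.
raw_areas = [p * w for p, w in zip(percents, widths)]

# Density scale: bar height is percentage / width,
# so the drawn area is the percentage itself.
density_areas = [(p / w) * w for p, w in zip(percents, widths)]

raw_total = sum(raw_areas)
for lo, hi, p, a in zip(edges, edges[1:], percents, raw_areas):
    share = 100 * a / raw_total
    print(f"{lo:>3} to {hi:<3}: {p:>4.1f}% of households, "
          f"{share:>4.1f}% of the drawn area")
```

For instance, the 100 to 150 class holds only slightly more households than the 0 to 15 class (14.1% versus 11.6%), yet on the relative-frequency scale its bar covers roughly four times the area.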

Comment: If all the bins in the distribution table have the same width, then the appearance of the histogram will be the same for all three scales. Only the units (and numbers) on the vertical scale will change.

Example: Distribution of coal (by weight) in Christmas stockings of 40 children at Wool's orphanage.

Ounces of coal      Number of stockings
0 - 5               2
5 - 10              4
10 - 15             8
15 - 20             8
20 - 25             10
25 - 30             4
30 - 35             4

Histogram with frequency scale:

[Figure: histogram. Horizontal axis: Ounces of coal per stocking, 5 to 35; vertical axis: Number of stockings, 2 to 10.]

Histogram with relative frequency scale:

[Figure: histogram. Horizontal axis: Ounces of coal per stocking, 5 to 35; vertical axis: Percentage of stockings, 5% to 25%.]

Histogram with density scale:

[Figure: histogram. Horizontal axis: Ounces of coal per stocking, 5 to 35; vertical axis: Percentage per ounce of coal, 1 to 5.]
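The three vertical scales for the coal data can be computed side by side. This is a minimal sketch using the counts from the table; the variable names are illustrative.

```python
# Equal-width classes (ounces of coal) and stocking counts from the table.
edges = [0, 5, 10, 15, 20, 25, 30, 35]
counts = [2, 4, 8, 8, 10, 4, 4]
total = sum(counts)  # 40 stockings

# Relative frequency: percentage of stockings in each class.
relative = [100 * c / total for c in counts]

# Density: percentage per ounce (relative frequency / class width).
density = [r / (hi - lo) for r, lo, hi in zip(relative, edges, edges[1:])]

print("frequency:         ", counts)
print("relative frequency:", relative)
print("density:           ", density)

# With equal widths, every scale has the same shape: each bar is the
# same fraction of the tallest bar on all three scales.
shape_freq = [c / max(counts) for c in counts]
shape_rel = [r / max(relative) for r in relative]
shape_dens = [d / max(density) for d in density]
print("same shape:", shape_freq == shape_rel == shape_dens)
```

The tallest bar is 10 stockings, 25%, or 5% per ounce depending on the scale, which matches the top values on the three vertical axes.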

Statistics and parameters

Tables, histograms and other charts are used to summarize large amounts of data. Often, an even more extreme summary is desirable.

A number that summarizes population data is called a parameter.
A number that summarizes sample data is called a statistic.

Observations:

(*) Population parameters are (more or less) constant.
(*) Sample statistics vary with the sample, i.e., their values depend on the particular sample chosen. A sample statistic can be thought of as a variable.
(*) Sample statistics are known because we can compute them from the (available) sample data, while population parameters are often unknown, because data for the entire population is often unavailable.
(*) One of the most common uses of sample statistics is to estimate population parameters.

Measures of central tendency

The most extreme way to summarize a list of numbers is with a single, typical value. The most common choices are the mean and median.

The mean (average) of a set of numbers is the sum of all the values divided by the number of values in the set.

The median of a set of numbers is the middle number, when the numbers are listed in increasing (or decreasing) order. The median splits the data into two equally sized sets: 50% of the data lies below the median and 50% lies above. (If the number of numbers in the set is even, then the median is the average of the two middle values.)

The mean and median are different ways of describing the center of the data.

Another statistic that is often used to describe the typical value is the mode, which is the most frequently occurring value in the data.

Example. Find the mean, median and mode of the following set of numbers:

{12, 5, 6, 8, 12, 17, 7, 6, 14, 6, 5, 16}.

The mean (average).

$$\frac{12 + 5 + 6 + 8 + 12 + 17 + 7 + 6 + 14 + 6 + 5 + 16}{12} = \frac{114}{12} = 9.5.$$

The median. Arrange the data in ascending order, and find the average of the middle two values in this case, since there are an even number of values:

5, 5, 6, 6, 6, 7, 8, 12, 12, 14, 16, 17

$$\text{median} = \frac{7 + 8}{2} = 7.5.$$

The mode is 6, because 6 occurs most frequently (three times).
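The same three summaries can be checked with a short Python sketch that follows the definitions directly. The helper names `mean`, `median` and `mode` are written out here for illustration (Python's standard `statistics` module offers equivalents).

```python
from collections import Counter

data = [12, 5, 6, 8, 12, 17, 7, 6, 14, 6, 5, 16]


def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted list (average of the two middle
    values when the list has an even number of entries)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Most frequently occurring value (first one found if tied)."""
    return Counter(values).most_common(1)[0][0]


print(mean(data), median(data), mode(data))  # 9.5 7.5 6
```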

Comments:

(*) The mean is sensitive to outliers, i.e., extreme values in the data (much bigger or much smaller than most of the data). Big outliers pull the mean up and small outliers pull the mean down.
(*) The median gives a better sense of the "middle" when the data is skewed in one direction or the other.
(*) The mean is easier to use in mathematical formulas.
(*) Both the median and the mean leave out a lot of information. E.g., each one separately tells us nothing about the spread of the data or where we might find peaks (modes) in the distribution, etc.
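To illustrate the first two comments, the sketch below takes the twelve numbers from the earlier example and replaces the largest value, 17, with 170. The modified list is hypothetical, invented only to show the effect of one large outlier.

```python
from statistics import mean, median

data = [12, 5, 6, 8, 12, 17, 7, 6, 14, 6, 5, 16]

# Hypothetical variant: the largest value (17) becomes a big outlier (170).
with_outlier = [170 if x == 17 else x for x in data]

print("original:     mean =", mean(data), " median =", median(data))
print("with outlier: mean =", mean(with_outlier), " median =", median(with_outlier))
```

The mean jumps from 9.5 to 22.25, while the median stays at 7.5, because only the ordering of the middle values matters for the median.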

On the other hand, if we know both, then the relative positions of the mean and median provide some information about how the data is distributed...

In this histogram the mean is bigger than the median.

[Figure: histogram with 50% of the area on each side of the median; the mean lies to the right of the median.]

This is an indication that there are large outliers: the histogram has a longer tail on the right. We say that the data is skewed to the right.

In this histogram the mean is smaller than the median.

[Figure: histogram with 50% of the area on each side of the median; the mean lies to the left of the median.]

This is an indication that there are small outliers: the histogram has a longer tail on the left. We say that the data is skewed to the left.

If the mean and median are (more or less) equal, then the tails of the distribution are (more or less) the same, and the data has a (more or less) symmetric distribution around the mean/median, as depicted below.

[Figure: symmetric histogram with 50% of the area on each side; the median and mean (more or less) coincide.]

Example: Here is the histogram that we constructed before:

[Figure: density-scale histogram. Horizontal axis: 2015 U.S. Household Income ($1000s), 0 to 250; vertical axis: % per $1000, 0 to 1.2.]

The histogram is skewed to the right, indicating that the mean will be larger than the median in this case.

(*) The mean income (estimated from the sample data) is about $79,263.
(*) We can find the (approximate) median by reading the histogram.
(*) Remember: the area of each bar represents the percentage of the population with income in the corresponding range.

We find the areas of the bars, starting from the leftmost interval (0 to 15), and stop when we reach 50%.

[Figure: the same histogram with the class boundaries 0, 15, 25, 35, 50, 75, 100, 150, 200, 250 marked on the horizontal axis and the mean, $79,263, indicated.]

0 to 15: area ≈ (0.78% per $1000) × $15,000 = 11.7%
15 to 25: area ≈ (1.05% per $1000) × $10,000 = 10.5%
25 to 35: area ≈ (1% per $1000) × $10,000 = 10%
35 to 50: area ≈ (0.85% per $1000) × $15,000 = 12.75%

0 to 50: area ≈ 11.7% + 10.5% + 10% + 12.75% = 44.95%... Need another 5%.

50 to 75: area ≈ (0.66% per $1000) × $25,000 = 16.5%. Need to go a little less than one third of the way from 50 to 75 to get another 5%...

Median ≈ $57,500.
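The bar-by-bar walk can be sketched in code. This version is only an approximation under stated assumptions: it uses the percentages from the income table rather than heights read off the plot, it assumes incomes are spread evenly within each class (linear interpolation), and the function name `median_from_bins` is made up for this sketch. The upper boundary of 250 for the top class is an assumption that does not affect the median.

```python
# Class boundaries in $1000s and percentages from the income table.
edges = [0, 15, 25, 35, 50, 75, 100, 150, 200, 250]
percents = [11.6, 10.5, 10.0, 12.7, 16.7, 12.1, 14.1, 6.2, 6.1]


def median_from_bins(edges, percents):
    """Accumulate bar areas from the left and interpolate inside the
    class where the running total first reaches 50%."""
    cumulative = 0.0
    for lo, hi, p in zip(edges, edges[1:], percents):
        if cumulative + p >= 50:
            fraction = (50 - cumulative) / p  # share of this bar still needed
            return lo + fraction * (hi - lo)
        cumulative += p
    raise ValueError("percentages add up to less than 50")


median_k = median_from_bins(edges, percents)
print(f"Approximate median: ${median_k * 1000:,.0f}")
```

With the table's percentages the result lands near $57,800, slightly above the value read from the plot, since the heights read off the plot are rounded.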

More precise estimate (using all of the survey data): Median ≈ $56,516

[Figure: the same density-scale histogram of 2015 U.S. Household Income ($1000s) with the median, $56,516, and the mean, $79,263, marked.]

The mean and median describe the middle of the data in somewhat different ways:

The median divides the histogram into two halves of equal area: it divides the data into two equal halves.

[Figure: Percentage of US Households per Income (data from 2006 Economic Survey); the median is marked, with 50% of the data on each side.]

The mean is the balance point of the data:

[Figure: Percentage of US Households per Income (data from 2006 Economic Survey); the mean is marked as the balance point.]

Averages and medians give a snapshot of a set of data.

If the data comprises more than one variable, we can divide the data into categories with respect to one variable, and study the average/median of another variable in each category separately. This allows researchers to discern relationships between different variables.

Example: The following graph comes from the 2005 American Community Survey of the US Census Bureau. It plots median household income by state.

Figure 1. Median Household Income in the Past 12 Months With 90-Percent Confidence Intervals by State: 2005

[Figure: one row per state showing the 2005 estimate and its 90-percent confidence interval. Horizontal axis: $30,000 to $65,000. Rows, in the order shown: New Jersey, Maryland, Connecticut, Hawaii, Massachusetts, New Hampshire, Alaska, Virginia, California, Delaware, Minnesota, Rhode Island, Colorado, Illinois, New York, Washington, Nevada, Utah, District of Columbia, Wisconsin, United States, Wyoming, Michigan, Vermont, Georgia, Pennsylvania, Arizona, Indiana, Nebraska, Iowa, Ohio, Oregon, Kansas, Maine, Florida, Texas, Missouri, Idaho, North Dakota, North Carolina, South Dakota, South Carolina, Montana, Tennessee, New Mexico, Kentucky, Oklahoma, Alabama, Louisiana, Arkansas, West Virginia, Mississippi.]

Source: U.S. Census Bureau, 2005 American Community Survey (from "Income, Earnings, and Poverty Data From the 2005 American Community Survey," U.S. Census Bureau).

Notational Interlude:

The population mean (a parameter) is denoted by the Greek letter $\mu$ ("mu"). If there are several variables being studied, we put a subscript on the $\mu$ to tell us which variable it pertains to. For example, if we have data for population height ($h$) and population weight ($w$), the mean height would be denoted by $\mu_h$ and the mean weight by $\mu_w$.

The mean of a set of sample data (a statistic) is denoted by putting a bar over the variable. E.g., if $\{h_1, h_2, h_3, \ldots, h_n\}$ is a sample of heights, then the average of this sample would be denoted by $\bar{h}$.

The median is usually denoted by $m$ or $M$, and sometimes by $Q_2$ (more on this later).

We can use summation notation to simplify the writing of (long) sums:

$$h_1 + h_2 + h_3 + \cdots + h_n = \sum_{j=1}^{n} h_j = \sum h_j.$$

For example we can write:

$$\bar{h} = \frac{h_1 + h_2 + \cdots + h_n}{n} = \frac{1}{n}\,(h_1 + h_2 + \cdots + h_n) = \frac{1}{n}\sum h_j.$$

Comment: The point of summation notation is to simplify expressions that involve sums with many terms, or in some cases, an unspecified number of terms. All the usual rules/properties of addition continue to hold. In particular

(i) $\sum (h_j \pm w_j) = \sum h_j \pm \sum w_j$,
(ii) $\sum (a \cdot h_j) = a \cdot \left(\sum h_j\right)$, and
(iii) $\sum c = n \cdot c$ (here $n$ is the number of constant terms).
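The three rules are easy to check numerically. In the sketch below the lists `h` and `w` and the constants `a` and `c` are hypothetical values chosen only for the check; integers are used so that every comparison is exact.

```python
# Hypothetical sample values, used only to check the rules numerically.
h = [150, 162, 171, 180]
w = [55, 60, 72, 81]
a = 3   # constant multiplier in rule (ii)
c = 7   # constant term in rule (iii)
n = len(h)

# Sample mean written with summation notation: (1/n) * sum of h_j.
mean_h = sum(h) / n

# (i) the sum of sums (or differences) is the sum (or difference) of the sums
rule_i_plus = sum(hj + wj for hj, wj in zip(h, w)) == sum(h) + sum(w)
rule_i_minus = sum(hj - wj for hj, wj in zip(h, w)) == sum(h) - sum(w)

# (ii) a constant factor can be pulled out of the sum
rule_ii = sum(a * hj for hj in h) == a * sum(h)

# (iii) adding the constant c a total of n times gives n * c
rule_iii = sum(c for _ in range(n)) == n * c

print(mean_h, rule_i_plus, rule_i_minus, rule_ii, rule_iii)
```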

Measuring the spread of the data

The mean and median describe the middle of the data. To get a better sense of how the data is distributed, statisticians also use measures of dispersion.

The range is the distance between the smallest and largest values in the data.

The interquartile range is the distance between the value separating the bottom 25% of the data from the rest and the value separating the top 25% of the data from the rest. In other words, it is the range of the middle 50% of the data.

Example: In the histogram describing household income distribution, about 25% of all households have incomes below $28,000 and about 25% of all households have incomes above $105,000 (by the table, 73.6% of households have incomes below $100,000), so the interquartile range is about $105,000 - $28,000 = $77,000.
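Both measures can be sketched in Python. The range uses the twelve numbers from the earlier mean/median example. The quartiles are estimated from the income table under two assumptions: incomes are spread evenly within each class (linear interpolation), and the open-ended top class is closed at 250. The function name `percentile_from_bins` is made up for this sketch.

```python
# Range of a small data set (the numbers from the mean/median example).
data = [12, 5, 6, 8, 12, 17, 7, 6, 14, 6, 5, 16]
data_range = max(data) - min(data)
print("range:", data_range)

# Class boundaries in $1000s and percentages from the income table.
# Assumption: the open-ended top class is closed at 250.
edges = [0, 15, 25, 35, 50, 75, 100, 150, 200, 250]
percents = [11.6, 10.5, 10.0, 12.7, 16.7, 12.1, 14.1, 6.2, 6.1]


def percentile_from_bins(edges, percents, q):
    """Value (in $1000s) below which about q percent of the area lies,
    interpolating linearly inside the class that contains it."""
    cumulative = 0.0
    for lo, hi, p in zip(edges, edges[1:], percents):
        if cumulative + p >= q:
            return lo + (hi - lo) * (q - cumulative) / p
        cumulative += p
    return edges[-1]


q1 = percentile_from_bins(edges, percents, 25)
q3 = percentile_from_bins(edges, percents, 75)
iqr = q3 - q1
print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f} (in $1000s)")
```

With the table's percentages this gives a lower quartile near $27,900 and an upper quartile near $105,000, so an interquartile range of roughly $77,000.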

The standard deviation:

The standard deviation of a set of numbers is something like the average distance of the numbers from their mean. Technically, it is a little more complicated than that.

If $x_1, x_2, x_3, \ldots, x_n$ are numbers and $\bar{x}$ is their mean, then one candidate for measuring spread is the average deviation from the mean:

$$\frac{(x_1 - \bar{x}) + (x_2 - \bar{x}) + \cdots + (x_n - \bar{x})}{n} = \frac{1}{n}\sum (x_j - \bar{x}).$$

Potential problem: positive terms and negative terms in the sum can cancel each other out... How much cancellation?

$$\frac{1}{n}\sum (x_j - \bar{x}) = \frac{1}{n}\sum x_j - \frac{1}{n}\sum \bar{x} = \bar{x} - \frac{\overbrace{\bar{x} + \bar{x} + \cdots + \bar{x}}^{n}}{n} = \bar{x} - \frac{n\,\bar{x}}{n} = \bar{x} - \bar{x} = 0.$$

Complete cancellation!

Instead, statisticians use the standard deviation, which is given by

$$SD_x = \sqrt{\frac{1}{n}\sum (x_j - \bar{x})^2}.$$

In words, the SD is the root of the mean of the squared deviations of the numbers from their mean.

(*) Squaring the deviations fixes the cancellation problem...
(*)... but exaggerates both very small deviations (making them smaller) and very large deviations (making them bigger)...
(*)... and also changes the scale (e.g., from inches to squared inches).
(*) Taking the square root of the average squared deviation fixes both of these problems (to a certain extent).
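A short sketch makes the contrast concrete: the average deviation from the mean comes out to zero, while the root of the mean squared deviation does not. The data are the twelve numbers from the earlier mean/median example, and the helper names are illustrative. The divisor is $n$, as in the formula above (this matches `statistics.pstdev` in Python's standard library, not the $n - 1$ variant).

```python
import math

data = [12, 5, 6, 8, 12, 17, 7, 6, 14, 6, 5, 16]


def average_deviation(values):
    """Mean of the signed deviations from the mean (always cancels to 0)."""
    m = sum(values) / len(values)
    return sum(x - m for x in values) / len(values)


def standard_deviation(values):
    """Root of the mean of the squared deviations from the mean."""
    m = sum(values) / len(values)
    mean_square = sum((x - m) ** 2 for x in values) / len(values)
    return math.sqrt(mean_square)


print("average deviation: ", average_deviation(data))
print("standard deviation:", round(standard_deviation(data), 3))
```

For this data set the squared deviations add up to 217, so the SD is the square root of 217/12, a little over 4.25.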

(*) If a lot of the data is far from the mean, then many of the $(x_j - \bar{x})^2$ terms will be quite large, so the mean of these terms will be large and the SD of the data will be large.
(*) In particular, outliers can make the SD bigger. (Outliers have an even bigger effect on the range of the data.)
(*) On the other hand, if the data is all clustered close to the mean, then all of the $(x_j - \bar{x})^2$ terms will be fairly small, so their mean will be small and the SD will be small.

To be continued...