PDQ-Notes Reynolds Farley PDQ-Note 7 Quantiles and Medians
The mean of a distribution is an excellent measure of central tendency. If we sum the years of age reported by all persons living in a state and then divide by the number of people in the computation, we obtain the average age, that is, the mean number of years of life lived by people in that state.

The median is the other very common measure of central tendency. It is the number that divides a distribution into its upper and lower halves. For example, if you obtain the median age for residents of a state, you will know that one-half of the population reported ages younger than the median while the other half reported older ages.

For many economic indicators, the median is used as the measure of central tendency rather than the mean. This is because persons with very high incomes or earnings substantially raise the mean of a distribution of incomes or earnings, while their great incomes or earnings have very little impact upon the median. If economic polarization occurs such that the rich get richer over time while the poor stay about the same in income or earnings, the mean of the income or earnings distribution may increase, even rapidly, while the median hardly changes.

After selecting the 1990 PUMS 5% data set and bringing the Query Setup window to your screen, move your cursor to the down arrow in the far right corner of the Query Type box. Click there, and you will find Quantile as a mode. Highlight the Quantile mode to obtain a number of quantile points in the distribution of any quantitative data item.

When you highlight the Quantile mode and look toward the bottom of the Query Setup window, you will notice that two new boxes have appeared. One of these, Quantile Expression, is the box where you enter the data item whose median or other quantile points you wish to learn. You may use a data item from the data set, such as age, or a created data item, such as 1.39*rpincome.
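The earlier point about polarization, where the mean rises while the median hardly changes, can be checked with a few lines of Python. This is only an illustrative sketch run outside PDQ-Explore; the incomes are invented for the example and are not census figures.

```python
from statistics import mean, median

# Invented incomes for nine people in an earlier year (illustration only).
incomes_before = [12_000, 18_000, 22_000, 27_000, 30_000,
                  34_000, 41_000, 60_000, 150_000]

# Later year: the lower seven incomes are unchanged, the top two double.
incomes_after = incomes_before[:-2] + [120_000, 300_000]

mean_before, mean_after = mean(incomes_before), mean(incomes_after)
median_before, median_after = median(incomes_before), median(incomes_after)

print("mean:  ", round(mean_before), "->", round(mean_after))
print("median:", median_before, "->", median_after)
```

The mean moves sharply because it depends on every value, while the median depends only on the middle of the ordered list, which the top incomes do not touch.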
To the immediate left of the Quantile Expression box is a Quantile Order box containing 10 as a default value. This is where you type in the number of slices into which you wish to divide the distribution. If you are interested in the median only, enter 2, since you are slicing the distribution into halves to find the median. If you are interested in the quartile points of the distribution, that is, the 25th percentile point, the median, and the 75th percentile point, enter 4. That is, you want to slice the distribution into four units, each with the same number of observations. If you enter 5 into the Quantile Order box, you will obtain the quintile points of the distribution by slicing it into five parts (but none of these points is the precise median). If you are interested in knowing what amount of earnings distinguishes the top 1 percent of earners from the bottom 99 percent, enter 100 into the Quantile Order box.

Please note that the Quantile Order box has up and down arrows providing you with choices of quantile points. If you wish just the median, scroll until 2 appears in the Quantile Order box. If you wish to obtain 20 quantile points, scroll to 20.

Page 1 2001-12-18
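The relationship between the Quantile Order and the cut points it produces can be sketched in Python. The helper names percentile_labels and cut_points are hypothetical, the ages are invented, and the interpolation rule used by Python's statistics.quantiles is an assumption here: PDQ-Explore may compute its cutoffs differently.

```python
from statistics import median, quantiles

def percentile_labels(order):
    """Percentile positions of the interior cut points for a given order."""
    return [100 * i / order for i in range(1, order)]

def cut_points(values, order):
    """Interior cut points that split values into `order` equal-count slices.

    Uses the 'inclusive' interpolation of statistics.quantiles, which is an
    assumption and not necessarily the rule PDQ-Explore applies.
    """
    return quantiles(values, n=order, method="inclusive")

ages = [60, 61, 63, 64, 66, 67, 70, 72, 75, 79, 84, 90]  # invented ages

print(percentile_labels(2), cut_points(ages, 2))  # order 2: the median only
print(percentile_labels(4), cut_points(ages, 4))  # order 4: quartile points
print(percentile_labels(5), cut_points(ages, 5))  # order 5: no cut at 50
```

With an order of 100, the last interior cut point, cut_points(values, 100)[-1], is the value separating the top 1 percent from the bottom 99 percent.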
Example 1 Distribution of Men and Women Over Age 59 by Deciles

At this point, let's try an example of using PDQ-Explore in the Quantile mode. You might be interested in the decile points of the age distributions of men and women who were at least 60 at the last census date. If so, type the following into the Universe/Selection box in the Expert Query window:

    age>59

To obtain these decile points separately for men and women, type the following into the Repeat For Each (Dimension 3) box:

    sex

Because your interest is in the distribution by age, type the following into the Quantile Expression box:

    age

Because you wish to obtain decile points in the distribution, type the following into the Quantile Order box:

    10

Example 2 Distribution of Income for Young Male and Female Physicians

You might be interested in the quintile points in the distribution of total income for young men and women who were physicians and who reported at least some income. For this run with PDQ-Explore, type the following into the Universe/Selection box:

    age>29 & age<50 & occup=84 & rpincome<>0

Note that occup=84 selects persons who reported physician as their occupation and rpincome<>0 selects physicians who had positive or negative incomes. Once again, you wish to compare men and women, so type the following into the Repeat For Each (Dimension 3) box:

    sex

For this analysis, the data item whose quintile points you wish to obtain is:

    rpincome

Since we are interested in the quintile points of the income distributions, we type the following into the Quantile Order box, or use the scroll bar to scroll down to 5:

    5

After you make the appropriate entries into the boxes in the Quantile mode, click the Results tab. In just a few seconds, you will see the quantile points you requested for whatever data item is in the Quantile Expression box.
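For readers who want to see the logic of Example 2 outside PDQ-Explore, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the records are invented, the sex labels are plain strings rather than census codes, the helper names are hypothetical, and the interpolation rule of statistics.quantiles may differ from the one PDQ-Explore uses.

```python
from statistics import quantiles

# Invented person records; field names mirror the data items in Example 2.
records = [
    {"age": 34, "sex": "male", "occup": 84, "rpincome": 95_000},
    {"age": 41, "sex": "male", "occup": 84, "rpincome": 140_000},
    {"age": 38, "sex": "male", "occup": 84, "rpincome": -5_000},
    {"age": 45, "sex": "male", "occup": 84, "rpincome": 210_000},
    {"age": 31, "sex": "male", "occup": 84, "rpincome": 60_000},
    {"age": 33, "sex": "female", "occup": 84, "rpincome": 80_000},
    {"age": 36, "sex": "female", "occup": 84, "rpincome": 120_000},
    {"age": 47, "sex": "female", "occup": 84, "rpincome": 150_000},
    {"age": 30, "sex": "female", "occup": 84, "rpincome": 55_000},
    {"age": 42, "sex": "female", "occup": 84, "rpincome": 98_000},
    {"age": 55, "sex": "male", "occup": 84, "rpincome": 180_000},   # too old
    {"age": 35, "sex": "female", "occup": 95, "rpincome": 40_000},  # other occupation
    {"age": 40, "sex": "male", "occup": 84, "rpincome": 0},         # zero income
    {"age": 29, "sex": "female", "occup": 84, "rpincome": 70_000},  # too young
]

def in_universe(r):
    """Mirrors the selection: age>29 & age<50 & occup=84 & rpincome<>0."""
    return 29 < r["age"] < 50 and r["occup"] == 84 and r["rpincome"] != 0

def quintiles_by_sex(rows):
    """Group selected incomes by sex (Repeat For Each), then cut into fifths."""
    groups = {}
    for r in rows:
        if in_universe(r):
            groups.setdefault(r["sex"], []).append(r["rpincome"])
    return {sex: quantiles(vals, n=5, method="inclusive")
            for sex, vals in groups.items()}

selected = [r for r in records if in_universe(r)]
result = quintiles_by_sex(records)
for sex, cuts in result.items():
    print(sex, cuts)
```

Note that the record with a negative income is kept, because rpincome<>0 excludes only incomes of exactly zero.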
Quantile Results

The results screen for a query in the Quantile mode produces five columns of data for the data item listed in the Quantile Expression box:

N-tile: This reports the quantile whose value is shown to the immediate right. If you asked for the decile points of a distribution, you will see 10th, 20th, 30th and so forth on your screen. The 50th N-tile is the median.

Cutoff: This is the numerical value in the distribution of the data item associated with the N-tile point shown to the left. The numerical value associated with the 20th N-tile, for example, separates the bottom 20 percent of the distribution from the top 80 percent.

Percent of Aggregate: This shows the percent of the total value of the data item you have analyzed that is held by the slice of the distribution identified by the number in the N-tile column to the left. Recall that in the Quantile mode, identical percents of the total number of people, households, families or housing units are in each slice. If you are dealing with deciles, every slice, that is, every decile, will include exactly one-tenth of the number of people or households. However, the bottom 10 percent of a distribution does not ordinarily have or receive 10 percent of the data item whose distribution you are analyzing. The lowest 10 percent of families typically obtain much less than 10 percent of the total income received by all families. And the youngest 10 percent of the population does not have 10 percent of the total years of age reported by the group you are studying. The Percent of Aggregate column reports the share of the total value of the data item received or held by the slice of the distribution under consideration.

Cumulative Percent: Numbers in this column show the cumulative percent of the total value of the data item received or held by the slice under consideration and by every lower slice. The cumulative percent associated with the median, or 50th N-tile, is the share of the total income, age or whatever quantity is being studied that is received or held by the lower half of the distribution.

Cumulative Aggregate: These numbers are similar to those in the Cumulative Percent column but are expressed in the units of the data item, such as years for age and dollars for rpincome or income1.
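The five result columns can be reproduced on a small list of numbers to see how they relate to one another. The sketch below is an approximation, not PDQ-Explore's actual computation: the function name quantile_table is hypothetical, the incomes are invented, and the cutoff rule (the first value of the next slice, or the largest value plus 1 for the last slice) is an assumption that may not match how the program handles ties.

```python
def quantile_table(values, order):
    """Build rows with the five columns described for Quantile results.

    Assumes a non-empty list whose total is not zero.
    """
    data = sorted(values)
    n, total = len(data), sum(data)
    rows, running = [], 0
    for i in range(1, order + 1):
        lo, hi = (i - 1) * n // order, i * n // order
        piece = data[lo:hi]              # equal-count slice of the distribution
        running += sum(piece)            # aggregate held by this and lower slices
        # Assumed cutoff rule: first value of the next slice, or max + 1 at the top.
        cutoff = data[hi] if hi < n else data[-1] + 1
        rows.append({
            "n_tile": 100 * i / order,
            "cutoff": cutoff,
            "percent_of_aggregate": 100 * sum(piece) / total,
            "cumulative_percent": 100 * running / total,
            "cumulative_aggregate": running,
        })
    return rows

incomes = [5, 10, 15, 20, 30, 40, 50, 80, 150, 600]  # invented values, total 1000
rows = quantile_table(incomes, 5)
for row in rows:
    print(row)
```

In this invented example each quintile holds two of the ten values, yet the bottom quintile holds 1.5 percent of the aggregate while the top quintile holds 75 percent, which is the contrast the Percent of Aggregate column is meant to reveal.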
Cautions when Using the Quantile Mode

Please think carefully about the data items you enter in the Repeat for each (Axis 3) box. If you enter data items such as occup (for occupation), pob (for place of birth), industry (for industry of employment) or ancstry1 (for first reported ancestry), you will be asking for the quantiles of several hundred distributions in separate tables. You will not be able to readily interpret such an extensive amount of output.

Please also think about the number you enter in the Quantile Order box. If you are interested in just the median or just the decile points of the distribution, you will produce unnecessarily elaborate and cluttered results if you enter 100 into the Quantile Order box.

Many census data items are top coded. The number in the Cutoff column associated with the 100.00 N-tile in Quantile results is equal to the largest reported value plus 1 for the data item specified in the analysis. For example, age was top coded at 90 years in 1980 and 1990, so Quantile queries using this data item will show 91 as the cutoff value for the 100.00 N-tile. Of course, some people reported ages above 90 in the census, but their ages were top coded at 90. Earnings and income data items were also top coded, so the numbers associated with the 100.00 N-tile in the Cutoff column equal the maximum reported earnings or income (subject to possible top coding) plus one dollar.

All census data items have numerical codes, but not all of those codes may be interpreted quantitatively. You may enter any data item you wish into the Quantile Expression box on the Expert Query window and obtain the median or decile points of its distribution. But numerical codes for states were assigned on an alphabetical basis, so knowing the median or deciles of that distribution tells you nothing useful.
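The top-coding caution can be illustrated with a short Python sketch. The ages are invented and the helper name top_coded is hypothetical; the top code of 90 and the rule that the final cutoff equals the largest value plus 1 are taken from the discussion of top coding above.

```python
TOP_CODE = 90  # age top code stated above for the 1980 and 1990 data

def top_coded(age):
    """Return the age as it would be stored after top coding."""
    return min(age, TOP_CODE)

reported = [62, 75, 88, 93, 101]           # invented reported ages
coded = [top_coded(a) for a in reported]   # what the data set would hold

# Cutoff for the 100.00 N-tile: largest stored value plus 1.
final_cutoff = max(coded) + 1

print(coded)
print(final_cutoff)
```

Because the stored values never exceed the top code, the final cutoff says nothing about how old the oldest respondents actually were.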