An Application of Alternative Weighting Matrix Collapsing Approaches for Improving Sample Estimates

Section on Survey Research Methods

An Application of Alternative Weighting Matrix Collapsing Approaches for Improving Sample Estimates

Linda Tompkins (1), Jay J. Kim (2)
(1) Centers for Disease Control and Prevention, National Center for Health Statistics, 3311 Toledo Road, Room 3115, Hyattsville, MD, 20782
(2) Centers for Disease Control and Prevention, National Center for Health Statistics, 3311 Toledo Road, Room 3111, Hyattsville, MD, 20782

Abstract

When creating sample weights, most U.S. government agencies combine small race groups such as American Indians and Asians with Whites, disregarding the different coverage ratios of the groups. This paper examines this methodology using the 2003 National Health Interview Survey (NHIS) data of the National Center for Health Statistics (NCHS) and reports the effect on the sample weights and estimates, specifically for Whites, American Indians (AI) and Asians. Two alternative weighting approaches will be used in an effort to reduce the bias.

KEY WORDS: coverage ratio, sample weighting, cell collapsing.

1. Introduction

Before final weights are developed for survey data, a poststratification (ratio or initial adjustment) factor (PSF) is calculated for each cell (row or column) of a weighting matrix and applied to the cell. However, for some cells, poststratification factors cannot be computed. For example, if the sample count is zero for a cell, it is impossible to calculate the PSF because the denominator of the fraction involved is zero. Also, if the raw sample count for a fraction is small, the fraction would be considered unstable. Because these situations occur in many surveys, cells are checked as to whether they have enough raw sample cases to stand by themselves. Additionally, for most surveys, each cell is checked to see whether its PSF lies within an acceptable range. This ratio criterion ensures that the final weights are not too large or too small. It should be noted that very large or very small weights can inflate the variance of estimates. If a cell fails either of the above tests, it is combined with another cell.

The cell collapsing strategy described above has merits.
However, Kim (2004) raised a potential problem with combining cells that differ in coverage ratios. Let $N_i$ be the control count for cell $i$, $\hat{N}_i$ the initially weighted sample count for cell $i$, and $f_i = N_i / \hat{N}_i$, $i = 1, 2$, the Initial Adjustment Factor (IAF) for cells 1 and 2. Then $1 / f_i$, $i = 1, 2$, is the coverage ratio for cell $i$. Let $N_2 = c N_1$. The PSF for the combined cell was expressed by Kim (2004) as follows.

For cell 1:

    $f_1 \cdot \frac{(1 + c) f_2}{c f_1 + f_2}$,    (1)

and for cell 2:

    $f_2 \cdot \frac{(1 + c) f_1}{c f_1 + f_2}$.    (2)

Before collapsing, the PSF for cell 1 is $f_1$. However, because of collapsing, as shown in equation (1), $f_1$ is modified by $\frac{(1 + c) f_2}{c f_1 + f_2}$, which is called the Collapsing Adjustment Factor (CAF) for cell 1 by Kim, et al. (2005). Similarly, for cell 2, the CAF is $\frac{(1 + c) f_1}{c f_1 + f_2}$. Using the above formulas, we can make the following observations: when $c = 10$ and $f_1 / f_2 = 4.0$, cell 1 will lose 73 percent of its own weight to cell 2. For the same $c$, if $f_1 / f_2 = .25$, cell 1 will gain an additional 214 percent of its own weight from cell 2. Note that this weight shift is artificial. Thus, Kim (2004) and Kim and Tompkins (2007) claimed that the current approach of cell collapsing can introduce bias, which can often be large.

Most surveys collapse a cell (row or column) with another if the PSF (ratio) for the cell is greater than 2. This standard collapsing procedure allows the PSF of the poorly covered cell to decrease below 2. Hence, Kim (2004) proposed to truncate (censor) the PSF for the cell at 2 to make sure that the PSF for that cell is 2 or at least 2, depending on the method. Kim, et al. (2007) implemented these two approaches of weight truncation in their simulation studies and found that the latter outperforms the former and the standard collapsing procedure.

When creating sample weights, most U.S. government agencies combine small race groups such as American Indians and Asians with Whites, disregarding the different coverage ratios of the groups. This paper examines this methodology using the 2003 National Health Interview Survey (NHIS) data of the National Center for Health Statistics (NCHS) and reports the effect on the sample weights and estimates, specifically for Whites, American Indians (AI) and Asians. Two alternative weighting approaches will be used in an effort to reduce the bias.

2. Cell Collapsing and Alternative Weighting Approaches

The NHIS uses the following weighting matrix:

Table 1. Weighting Matrix

                           < 1 yr   1-4   5-9   10-14   15-19   ...
Hispanic              M
                      F
Non-Hispanic Black    M
                      F
Non-Hispanic Other    M
                      F

In the above table, M stands for male and F for female. The non-Hispanic other category, as mentioned before, includes all non-Hispanic races other than non-Hispanic Blacks (i.e., it includes Whites, American Indians, Asians, Native Hawaiians and Pacific Islanders and all multiple race groups).

It is interesting to see how much the coverage ratios differ among the race groups in the other race category. Tables 2a and 2b present coverage ratios for Whites, American Indians (AI) and Asians by age category from the 2003 NHIS.

Table 2a. Coverage Ratios for 2003 NHIS - Males

Age Group   White    AI    Asian
< 1          .85    .17     .33
1-4          .80    .44     .66
5-9          .79    .88     .59
10-14        .80    .65     .54
15-17        .84    .46     .75
18-19        .61    .26     .55
20-24        .59    .55     .51
25-29        .60    .44     .31
30-34        .67    .54     .65
35-44        .67    .32     .53
45-49        .65    .51     .63
50-54        .67    .54     .57
55-64        .70    .53     .47
65-74        .75    .44     .44
75+          .71    .51     .65

Table 2b. Coverage Ratios for 2003 NHIS - Females

Age Group   White    AI    Asian
< 1          .82     -      .38
1-4          .80    .43     .71
5-9          .84    .70     .78
10-14        .76    .95     .70
15-17        .77    .25     .67
18-19        .72    .10     .50
20-24        .59    .57     .50
25-29        .68    .39     .56
30-34        .75    .46     .57
35-44        .76    .59     .59
45-49        .76    .36     .67
50-54        .80    .31     .53
55-64        .78    .62     .45
65-74        .75    .12     .48
75+          .76    .36     .64

In Table 2a, except for one age group (5-9 years), White males always have higher coverage ratios than American Indian males.
Also, White males always have higher coverage ratios than Asian males, without exception. One extreme case is the age group less than 1, where the coverage ratio for White males is .85, while that for American Indians is .17. The coverage ratio for American Indian males age < 1 is only 1/5 of that for Whites. For the same age group, the Asian coverage ratio is less than half that of Whites. Of 15 male age groups, 7 have coverage ratios less than .5 for American Indians. For the 18-19, 20-24 and 25-29 years age groups, coverage ratios for Whites are also low, but those for American Indians and Asians are even lower, sometimes less than half of that for Whites.

As for females in Table 2b, Whites always have higher coverage ratios than American Indians, with one exception (10-14 years of age). Also, Whites are better covered than Asians for all age groups. For the 18-19 years age group, the coverage ratio for Whites is more than 7 times that of American Indians. For the 65-74 years age group, the coverage ratio for Whites is more than 6 times that of American Indians. Quite often the White coverage ratio is much better than that of American Indians.
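
To see why coverage gaps of this size matter, the Collapsing Adjustment Factors of equations (1) and (2) can be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, and the formula is written in the form CAF_1 = (1 + c) f_2 / (c f_1 + f_2), which is an assumption checked against the numeric CAF value (.1984358) reported in Example 1. It reproduces the two weight shifts quoted in the introduction for c = 10.

```python
def collapsing_adjustment_factors(f1, f2, c):
    """Return (CAF for cell 1, CAF for cell 2).

    f1, f2: initial adjustment factors (control count / weighted sample count).
    c:      ratio of control counts, N2 = c * N1.
    Assumed form: CAF_1 = (1 + c) * f2 / (c * f1 + f2), and symmetrically for cell 2.
    """
    denom = c * f1 + f2
    return (1 + c) * f2 / denom, (1 + c) * f1 / denom


def weight_shift_percent(caf):
    """Percent of its own weight a cell gains (+) or loses (-) through collapsing."""
    return 100.0 * (caf - 1.0)


if __name__ == "__main__":
    # Only the ratio f1 / f2 matters here, so f2 is set to 1 for convenience.
    for ratio in (4.0, 0.25):
        caf1, caf2 = collapsing_adjustment_factors(ratio, 1.0, 10)
        print(f"f1/f2 = {ratio}: cell 1 weight shift = {weight_shift_percent(caf1):.0f} percent")
```

A poorly covered cell (large f_1 relative to f_2) gives up weight to the better covered cell, and the loss grows as the better covered cell becomes larger relative to it (larger c).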

The following example demonstrates the effect on weights and estimates when two cells with very different coverage ratios are combined.

Example 1. Suppose we have the following initially weighted sample counts, control counts and initial adjustment factors for 2 cells, one for Whites and the other for American Indians, in Table 3.

Table 3. Sample Weighting Data

          $\hat{N}_i$     $N_i$     $f_i$
AI             50           300     6
White      17,000        20,000     1.17647

When the White and American Indian cells in the above table are combined, the new PSF for the combined cell is

    $\frac{300 + 20{,}000}{50 + 17{,}000} = 1.1906158$.    (3)

The original PSF for American Indians was 6, but the new PSF is 1.1906158. Hence, the new weighted total for American Indians is $1.1906158 \times 50 \approx 60$. Since the control count is 300, we observe an underestimation of 240, which equates to an 80 percent underestimation of American Indians in this cell. On the other hand, the original PSF for Whites is 1.18, but the new PSF is 1.1906158. Thus, the new weighted total is 20,240, which is greater than the control count (20,000). In other words, Whites picked up an additional weight of 240 due to collapsing. This amount is 1.2 percent of the control count (20,000). Note that a 1.2 percent overestimation for Whites is negligible, but an 80 percent underestimation for American Indians is large.

In fact, the Collapsing Adjustment Factor (CAF) for cell 1 from equation (1) has been implicitly applied to $f_1$ (6) to reach 1.1906158 in equation (3). That is, the CAF for cell 1 is:

    $\frac{1.17647 (20{,}000 / 300 + 1)}{6 (20{,}000 / 300) + 1.17647} = .1984358$

The new PSF for cell 1 is

    $6 (.1984358) = 1.1906148$.    (4)

There is a slight difference between the values in equations (3) and (4), which is due to rounding error.

As mentioned before, the category of White males age < 1 has a much higher coverage ratio than American Indians and Asians. The same observation can be made for females. Consequently, both White males and females age < 1 were overestimated by 7 percent in 2003.
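
The arithmetic of Example 1 can be reproduced with a few lines of code. This is a minimal sketch under the assumption that collapsing simply applies the pooled PSF (sum of control counts over sum of initially weighted counts) to every cell in the group; the dictionary layout and function names are hypothetical and only the numbers come from Table 3.

```python
# Table 3 values: initially weighted sample count (n_hat) and control count (n_ctrl).
CELLS = {
    "AI": {"n_hat": 50.0, "n_ctrl": 300.0},
    "White": {"n_hat": 17_000.0, "n_ctrl": 20_000.0},
}


def combined_psf(cells):
    """Pooled poststratification factor for a group of collapsed cells."""
    total_ctrl = sum(cell["n_ctrl"] for cell in cells.values())
    total_hat = sum(cell["n_hat"] for cell in cells.values())
    return total_ctrl / total_hat


def collapse_report(cells):
    """Return the pooled PSF and, per cell, its own PSF, new weighted total and percent error."""
    psf = combined_psf(cells)
    report = {}
    for name, cell in cells.items():
        weighted_total = psf * cell["n_hat"]
        report[name] = {
            "own_psf": cell["n_ctrl"] / cell["n_hat"],
            "weighted_total": weighted_total,
            # Negative values are underestimates relative to the control count.
            "pct_error": 100.0 * (weighted_total - cell["n_ctrl"]) / cell["n_ctrl"],
        }
    return psf, report


if __name__ == "__main__":
    psf, report = collapse_report(CELLS)
    print(f"combined PSF = {psf:.7f}")
    for name, row in report.items():
        print(f"{name}: own PSF = {row['own_psf']:.5f}, "
              f"weighted total = {row['weighted_total']:.1f}, "
              f"error = {row['pct_error']:.1f} percent")
```

Because the pooled PSF is dominated by the large cell, the small cell's error is large in relative terms while the large cell's error stays near 1 percent.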
For both genders, in all except two age groups, Whites are better covered than American Indians, which causes the former to absorb weights from the latter. As a result, American Indians, overall, were underestimated by 29.7 percent, as will be seen in section 3. Similarly, Asians were underestimated by 20.7 percent.

To rectify this problem, we propose two alternative weighting procedures. The first is to weight American Indians and Asians independently. American Indians had 197 raw sample cases, which is enough for independent sample weighting. The number of sample persons is 1,200 for Asians, which is more than enough for independent sample weighting.

The second procedure is to artificially inflate to .5 the coverage ratios which are originally lower than .5. This is to protect the sample cases in the cells whose coverage ratios are too low, or whose PSF is too high. This approach is to ensure that the final weighted total in the cell is at least half the control count. According to this approach, the PSF can sometimes go much higher than 2. This approach is somewhat consistent with the weight truncation approach by Kim, et al. (2007). They considered two approaches of weight truncation: one allows the PSF to go over the threshold (2), but the other does not. The approach proposed here is similar in spirit to the former. The protection of the weights in the poorly covered cells is greater in the approach proposed here because the PSF for this new approach can increase much more than that considered by Kim, et al.

Example 2 (Table 4) numerically illustrates the approach proposed here.

Table 4. Sample Weighting Data

          $\hat{N}_i$   Inflated $\hat{N}_i$     $N_i$     $f_i$      Adjusted $f_i$
AI             50             150                  300     6          2
White      17,000                               20,000     1.17647

In Table 4, we set $f_i$ for American Indians equal to 2, instead of 6 as in Table 3. To do so, we had to multiply $\hat{N}_i$ (50) by 3 to make it 150. In other words, to make sure that $f_i = 2$, we had to artificially inflate $\hat{N}_i$ by a factor of 3. If the original $f_i$ were 3 (this means $\hat{N}_i = 100$), then we would have had to artificially inflate $\hat{N}_i$ by a factor of 1.5, instead of 3.
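
The inflation step in Table 4 and the collapse that follows from it can be sketched as below. This is an illustration under stated assumptions rather than the production weighting code: the function names and the `min_coverage` parameter are hypothetical, the cell's initially weighted count is inflated only when its coverage ratio falls below the floor of .5, and the protected cell's final PSF is taken to be the inflation factor times the pooled PSF.

```python
def inflation_factor(n_hat, n_ctrl, min_coverage=0.5):
    """Factor by which n_hat must be multiplied so the coverage ratio reaches min_coverage.

    Returns 1.0 when the cell is already covered at or above the floor.
    """
    coverage = n_hat / n_ctrl
    if coverage >= min_coverage:
        return 1.0
    return min_coverage / coverage


def collapse_with_inflation(protected, other, min_coverage=0.5):
    """Collapse a poorly covered cell with a better covered one after inflation.

    protected, other: (n_hat, n_ctrl) pairs.
    Returns (inflation factor, pooled PSF, PSF applied to the protected cell,
    resulting weighted total for the protected cell).
    """
    n_hat_p, n_ctrl_p = protected
    n_hat_o, n_ctrl_o = other
    k = inflation_factor(n_hat_p, n_ctrl_p, min_coverage)
    pooled_psf = (n_ctrl_p + n_ctrl_o) / (k * n_hat_p + n_hat_o)
    protected_psf = k * pooled_psf
    return k, pooled_psf, protected_psf, protected_psf * n_hat_p


if __name__ == "__main__":
    # American Indian and White cells from Table 4.
    k, pooled, ai_psf, ai_total = collapse_with_inflation((50.0, 300.0), (17_000.0, 20_000.0))
    print(f"inflation factor = {k:.2f}, pooled PSF = {pooled:.5f}")
    print(f"AI PSF = {ai_psf:.5f}, AI weighted total = {ai_total:.1f}")
    print(f"shortfall = {100 * (300.0 - ai_total) / 300.0:.0f} percent")
```

Setting `min_coverage` to a higher value would protect more cells at the cost of larger PSFs, which is the trade-off the floor of .5 is meant to balance.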

When the White and American Indian cells in the above table are combined, the new PSF for the combined cell is

    $\frac{300 + 20{,}000}{150 + 17{,}000} = 1.18367$.    (5)

The new PSF for Whites is 1.18367, but that for American Indians is 3(1.18367) = 3.55101. Compare 1.1906158 to 3.55101 for the American Indian cell's PSF. The new cell estimate for American Indians is 50(3.55101) = 177.5505. Since the control count is 300, we observe an underestimation of 122, which equates to an approximate 41 percent underestimation of American Indians in this cell. This is a big improvement in comparison to the result of the original cell collapsing approach.

3. Alternative Sample Weighting

When independently weighting the sample for American Indians and Asians, a minimum raw sample count of 20 was used for cell collapsing. That is, starting with the age group < 1 cell, if the raw sample count was less than 20 for a cell, it was combined with the next nearest cell. It should be noted that no artificial inflation of the weights was done while combining cells within each of the race groups. Artificial inflation of the weights was, however, employed in collapsing American Indians and Asians with Whites. After weighting was completed, weights for each sample unit were accumulated for American Indians and Asians; the results are shown in Tables 5 and 6, respectively.

Table 5. American Indian Weighting (in 1,000s)

               Total Weight       Control Count
Current        1,496 (-29.7%)     2,127
Inflated       1,752 (-17.4%)     2,127
Independent    2,127              2,127

As Table 5 shows, when we rely on the current weighting procedure, i.e., when American Indians are collapsed with Whites for weighting, the weight total for American Indians is 29.7 percent lower than its control count. On the other hand, when a special measure was taken to protect the weights in the cells whose coverage ratios were lower than .5, the weight total improved over the current approach by 12.3 percentage points. However, the inflation approach still underestimates the control count by 17.4 percent. There are two reasons for this.
First, we did not take any measure to protect the cells whose coverage ratios were higher than .5, even if the coverage ratio for American Indians was lower than that for Whites. Second, even though we gave higher PSFs to cells whose coverage ratios were lower than .5, we did not raise the ratio all the way to the same level as that for Whites. As can be predicted, when the independent weighting approach was used, the total weight is the same as the control count.

Table 6. Asian Sample Weighting (in 1,000s)

               Total Weight       Control Count
Current        9,369 (-20.7%)     11,817
Inflated       9,753 (-17.5%)     11,817
Independent    11,817             11,817

As shown in Table 6, when Asian cells are collapsed with Whites for weighting, as in the current approach, Asians are underestimated by 20.7 percent. Note that this underestimation rate is better than that for American Indians. This is because Asians, in general, have better coverage ratios than American Indians for both genders. When the inflation approach was used, the weighted total improved over the current approach by only 3.2 percentage points. This improvement is much lower than that observed for American Indians. The difference is due to the fact that 16 out of 30 American Indian age groups have coverage ratios less than .5, but for Asians, the same observation could be made for only 7 age groups.

Prevalence rates for 4 health characteristics (diabetes, health insurance coverage, overnight hospital stay and asthma) were calculated under the three cell collapsing approaches. It should be noted that one rate for each race was computed, just as in published survey reports. Table 7 presents prevalence rates for American Indians.

Table 7. Prevalence Rates for American Indians - Weighted Total as Denominator

                           Current    Inflated    Independent
Diabetes                     9.22       9.43        10.28
Insurance                   64.90      63.72        65.33
Overnight Hospital Stay      7.73       8.67         8.25
Asthma                      17.41      16.32        18.04

In Table 7, for all 4 health characteristics, the prevalence rate for the independent weighting approach is higher than that for the current weighting approach. The biggest difference can be observed for diabetes.
The independent weighting approach provides a prevalence rate for diabetes that is more than 1 percentage point (in absolute terms) higher than that of the current approach. It is 11 percent higher in relative terms. The inflation approach's rate is higher for 2 characteristics than the current approach's rate, but it is lower than the independent approach's rate. However, for 2 other characteristics, the prevalence rate for the inflation approach is lower than that of the current approach. Table 8 presents prevalence rates for Asians.

Table 8. Prevalence Rates for Asians - Weighted Total as Denominator

                           Current    Inflated    Independent
Diabetes                     4.35       4.50         4.70
Insurance                   83.49      83.44        83.70
Overnight Hospital Stay      4.85       5.05         5.09
Asthma                       5.96       5.84         5.83

As shown in Table 8, the prevalence rate for the independent weighting approach is higher than that for the current weighting approach, except for asthma. The inflation approach provides prevalence rates closer to those of the independent weighting approach for all variables, except for health insurance. The difference in prevalence rates between the current and the independent weighting approaches is much smaller for Asians than for American Indians. This may be due to the fact that the coverage ratios for Asians are much more stable than those for American Indians.

Note that in calculating the prevalence rates in Tables 7 and 8, estimated counts were used for both numerators and denominators. However, control (population) counts instead of estimated counts (weighted totals) can be used for the denominator, while estimated counts are still used for the numerator. For example, suppose researchers want to calculate the prevalence rates for American Indians or Asians in certain age groups or regions of the nation, since the NCHS report does not show the rates for regions. To do so, they can cumulate the weights of, for example, diabetic people in the regions and compute the prevalence rates using the cumulated weights as the numerator and the population count as the denominator. The following two tables show the prevalence rates calculated in that manner:

Table 9. Prevalence Rates for American Indians - Control Count as Denominator

                           Current    Inflated    Independent
Diabetes                     6.48       7.77        10.28
Insurance                   45.65      52.49        65.33
Overnight Hospital Stay      5.44       7.14         8.25
Asthma                      12.25      13.44        18.04

Tables 7 and 9 show the prevalence rates for American Indians.
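
The link between the two denominators can be checked directly. The sketch below assumes that the numerator (cumulated weights of persons with the characteristic) is identical in both calculations, so a control-count rate equals the weighted-total rate multiplied by the ratio of the weighted total to the control count. The variable and function names are hypothetical; the figures are the American Indian values from Table 5 and Table 7.

```python
# American Indian totals in thousands (Table 5).
CONTROL_COUNT = 2_127
WEIGHTED_TOTAL = {"current": 1_496, "inflated": 1_752, "independent": 2_127}

# Prevalence rates in percent with the weighted total as denominator (Table 7).
RATES_WEIGHTED = {
    "diabetes": {"current": 9.22, "inflated": 9.43, "independent": 10.28},
    "insurance": {"current": 64.90, "inflated": 63.72, "independent": 65.33},
    "overnight_stay": {"current": 7.73, "inflated": 8.67, "independent": 8.25},
    "asthma": {"current": 17.41, "inflated": 16.32, "independent": 18.04},
}


def rate_with_control_denominator(rate_weighted, weighted_total, control_count):
    """Rescale a rate from the weighted-total denominator to the control-count denominator."""
    return rate_weighted * weighted_total / control_count


def control_denominator_table(rates, weighted_totals, control_count):
    """Apply the rescaling to every characteristic and weighting approach."""
    return {
        characteristic: {
            approach: round(
                rate_with_control_denominator(rate, weighted_totals[approach], control_count), 2
            )
            for approach, rate in by_approach.items()
        }
        for characteristic, by_approach in rates.items()
    }


if __name__ == "__main__":
    table = control_denominator_table(RATES_WEIGHTED, WEIGHTED_TOTAL, CONTROL_COUNT)
    for characteristic, by_approach in table.items():
        print(characteristic, by_approach)
```

Under independent weighting the weighted total equals the control count, so the choice of denominator makes no difference for that approach.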
The rates in Table 7 are computed with the weighted total in the denominator and those in Table 9, with the population count in the denominator. The rates in Table 9 are much lower than those in Table 7, except for those for the independent weighting method, which are the same. The rate for the current approach in Table 9 is 29.7 percent lower than that in Table 7 for each of the four health characteristics. Similarly, the rates for the inflation approach in Table 9 are 17.6 percent lower than those in Table 7. In Table 9, the rates for the current approach are almost one third lower than those for the independent weighting approach. The rates for the inflation approach are between those of the other two approaches.

Table 10. Prevalence Rates for Asians - Control Count as Denominator

                           Current    Inflated    Independent
Diabetes                     3.45       3.71         4.70
Insurance                   66.19      68.87        83.70
Overnight Hospital Stay      3.85       4.17         5.09
Asthma                       4.73       4.82         5.83

Both Tables 8 and 10 show the prevalence rates for Asians. The relationship between Tables 8 and 10 is the same as that between Tables 7 and 9. The rates in Table 10 are much lower than the rates in Table 8, except for those for the independent weighting method, which remain the same. The rate for the current approach in Table 10 is 20.7 percent lower than that in Table 8 for each of the four health characteristics. Similarly, the rates for the inflation approach in Table 10 are 17.4 percent lower than those in Table 8. Again, these differences are due to the different denominators, that is, the weighted total or the control count.

Comparisons between the rates in Tables 7 and 9 and between the rates in Tables 8 and 10 show that when the prevalence rates are calculated, it is better to use the weighted totals as the denominator for American Indians and Asians.

4. Concluding Remarks

Thus far, we have observed that combining cells with varying coverage ratios results in under- and overestimation of population (control) counts. In order to alleviate this problem, we proposed independent weighting and weight inflation approaches for collapsing cells, implemented these approaches using NHIS data and compared them with the current weighting procedure.

Currently, American Indians and Asians are combined with Whites for sample weighting. However, coverage rates for Whites are better, often much better, than those for American Indians in 28 out of 30 age groups. Coverage rates for 3 age groups for American Indians are extremely low, i.e., they are in the 10-17 percent range, while they are at least 72 percent for Whites. Because of this, the current weighting approach underestimated American Indians by 29.7 percent. Also, Whites consistently had better coverage ratios than Asians, and as a result, the current weighting approach underestimated Asians by 20.7 percent.

We also estimated the prevalence rates for diabetes, health insurance coverage, overnight hospital stay and asthma using the weights developed by three different cell collapsing approaches. For all 4 health characteristics, American Indians show higher prevalence rates when they are weighted independently than when they are weighted as a part of the Other race category (i.e., when they are weighted while combined with Whites). The American Indian diabetes prevalence rate is more than 1 percentage point higher when the independent weighting approach is used (10.28%) than when the current weighting approach is used (9.22%). The weight inflation approach shows mixed results for American Indians. For 2 characteristics, the weight inflation approach showed higher prevalence rates than the current weighting approach, whereas for 2 others, the reverse was observed. For Asians, the prevalence rate for the independent weighting approach is higher than that for the current weighting approach, except for asthma. The inflation approach provides prevalence rates closer to those of the independent weighting approach, except for health insurance.

The prevalence rate can be calculated using two methods. One is to use weighted counts for both numerator and denominator, and the other is to use weighted counts for the numerator, but population counts for the denominator. The first approach was used for the tables above. However, if the second approach were to be used, the rates would be underestimated by 29.7 percent for American Indians and by 20.7 percent for Asians with the current collapsing approach, and by 17.7 percent and 17.4 percent, respectively, with the inflation approach. This is because their weighted totals are lower than their respective population counts. Thus, the first approach is recommended for computing the prevalence rates.

The public use micro data (PUM) file from the survey data we used for this study has been released to the general public. Note that the PUM file contains sample weights for sample persons in the file. Some users of the PUM file might want to accumulate weights for American Indians or Asians, say with diabetes, to come up with the number of diabetic American Indians or Asians in the nation or some region of the nation. However, the result would be a gross underestimation of the true values for the reason mentioned above. A better approach to getting the number of diabetic American Indians or Asians in the nation or a region would be to calculate the prevalence rate using weighted counts for both the numerator and the denominator and then to multiply the rate by the American Indian or Asian population count, respectively.

In conclusion, the independent weighting approach for American Indians and Asians may produce more realistic weights, and therefore, more accurate estimates. In addition, the current approach appears to underperform when compared to the inflation approach, even though the latter can be further fine tuned.

5. References

Kim, J. J. (2004). Effect of collapsing rows/columns of weighting matrix on weights. Proceedings of the Section on Survey Methods Research, American Statistical Association, CD.

Kim, J. J., Li, J., and Valliant, R. (2007). Cell collapsing in poststratification. To be published in Survey Methodology.

Kim, J. J. and Tompkins, L. (2007). Comparisons of current and alternative collapsing approaches for improved health estimates. Paper presented at the 11th Biennial CDC/ATSDR Symposium on Statistical Methods, Atlanta, Georgia, April 17-18, 2007.

DISCLAIMER: The findings and conclusions in this paper are those of the authors and do not necessarily represent the views of the National Center for Health Statistics, Centers for Disease Control and Prevention.