ESTP course on Small Area Estimation
Statistics Finland, Helsinki, 29 September - 2 October 2014
Topic 3: Direct estimators for domains
Risto Lehtonen, University of Helsinki
Eurostat

Lecture topics: Monday 29 Sept.
Topic 3: Direct estimators for domains
  Definitions and notation
  Estimation of domain totals for planned and unplanned domains
  Horvitz-Thompson estimator
  Hájek estimator
  Variance estimation
  Example

Definitions and notation - 1
Fixed and finite population $U = \{1, 2, \ldots, k, \ldots, N\}$, where $k$ refers to the label of a population element.
The fixed population is said to be generated from a superpopulation.
Variable of interest: $y$.
For practical purposes, we are interested in one particular realized population $U$ with $(y_1, y_2, \ldots, y_N)$, not in the more general properties of the process (or model) explaining how the population evolved.
NOTE: In the design-based approach, the values of the variable of interest are regarded as fixed but unknown quantities. The only source of randomness is the sampling design, and our conclusions should apply to hypothetical repeated sampling from the fixed population.

Definitions and notation - 2
Basic parameters for study variable $y$ for the whole population:
  Total: $t = \sum_{k \in U} y_k$
  Mean: $\bar{y} = \sum_{k \in U} y_k / N$
We discuss here the estimation of totals.
In practice, the values $y_k$ of $y$ are observed in an $n$-element sample $s \subset U$, which is drawn by a sampling design giving probability $p(s)$ to each sample $s$.
NOTE: The sampling design can be complex, involving stratification, clustering and several sampling stages.

Definitions and notation - 3
The design expectation of an estimator $\hat{t}$ of the population total $t$ is determined by the probabilities $p(s)$.
Let $\hat{t}(s)$ denote the value of the estimator, which depends on the $y$ values observed in sample $s$.
Expectation: $E(\hat{t}) = \sum_{s} p(s)\,\hat{t}(s)$
Design unbiased estimator: $E(\hat{t}) = t$
Design variance: $Var(\hat{t}) = \sum_{s} p(s)\,[\hat{t}(s) - E(\hat{t})]^2$
NOTE: $Var(\hat{t})$ is an unknown parameter. An estimator of the design variance is denoted by $\hat{V}(\hat{t})$.

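As a small illustration of these definitions, the design expectation and design variance can be computed exactly by enumerating every possible sample. The sketch below assumes simple random sampling without replacement (SRSWOR), so that all samples share the same probability $p(s)$; the population values and the sizes N = 4, n = 2 are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population values y_1..y_N (not from the course data).
y = [3, 5, 8, 12]
N, n = len(y), 2
t = sum(y)  # true population total

# Assumption: SRSWOR, so each of the C(N, n) samples has the same p(s).
samples = list(combinations(range(N), n))
p = Fraction(1, len(samples))

def t_hat(s):
    """Expansion estimator of the total under SRSWOR: (N/n) * sample sum."""
    return Fraction(N, n) * sum(y[k] for k in s)

# E(t_hat) = sum over s of p(s) * t_hat(s)
expectation = sum(p * t_hat(s) for s in samples)

# Var(t_hat) = sum over s of p(s) * (t_hat(s) - E(t_hat))^2
variance = sum(p * (t_hat(s) - expectation) ** 2 for s in samples)

print(expectation == t, variance)
```

In a real survey only one sample is observed, so the variance cannot be computed this way and has to be estimated from that single sample.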
Definitions and notation - 4
Variance estimators are derived in two steps:
(1) The theoretical design-based variance $Var(\hat{t})$ (or its approximation, if the theoretical design variance is intractable) is derived.
(2) The derived quantity is estimated by a design unbiased or design-consistent estimator $\hat{V}(\hat{t})$.
NOTE: An estimator is design consistent if its design bias and variance tend to zero as the sample size increases.

Definitions and notation - 5
Inclusion probability: an observation $k$ is included in the sample with probability $\pi_k = P(k \in s)$.
The inverse probabilities $a_k = 1/\pi_k$ are called design weights.
Sample membership indicator: $I_k = I\{k \in s\}$, with value 1 if $k$ is in the sample and 0 otherwise.
Expectation of the sample membership indicator: $E(I_k) = \pi_k$
The probability of including both elements $k$ and $l$ ($k \neq l$) is $\pi_{kl} = E(I_k I_l)$, with inverse $a_{kl} = 1/\pi_{kl}$ ($a_{kl} = a_k$ when $k = l$).
The covariance of $I_k$ and $I_l$ is $Cov(I_k, I_l) = \pi_{kl} - \pi_k \pi_l$.

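To make the first- and second-order inclusion probabilities concrete, the following sketch computes them by enumeration. It assumes SRSWOR with hypothetical sizes N = 5 and n = 2; the helper name `pi` is illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: SRSWOR of n = 2 elements from N = 5 (hypothetical sizes).
N, n = 5, 2
samples = list(combinations(range(N), n))
p = Fraction(1, len(samples))  # equal probability p(s) for every sample

def pi(k, l=None):
    """Inclusion probability pi_k (one argument) or pi_kl (two arguments)."""
    l = k if l is None else l
    return sum(p for s in samples if k in s and l in s)

pi_0 = pi(0)                     # first-order: n/N
pi_01 = pi(0, 1)                 # second-order: n(n-1) / (N(N-1))
a_0 = 1 / pi_0                   # design weight a_k = 1/pi_k
cov_01 = pi_01 - pi_0 * pi(1)    # Cov(I_k, I_l) = pi_kl - pi_k * pi_l

print(pi_0, pi_01, a_0, cov_01)
```

The covariance is negative here because, in a fixed-size design, including one element leaves less room for another.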
Estimation for domains
Domain estimation of totals or averages of the variable of interest $y$ over $D$ non-overlapping domains $U_d \subset U$, $d = 1, 2, \ldots, D$, with possibly known domain sizes $N_d$.
Example: The population of a country is divided into $D$ domains by regional classification, with $N_d$ households in domain $U_d$. The aim is to estimate statistics on household income for the regional areas (domains).
The key parameter is the domain total:
  $t_d = \sum_{k \in U_d} y_k$,
where $y_k$ refers to the measurement for household $k$.

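Why are domain totals important?
Totals are the basic and simplest descriptive statistics for continuous (or binary) study variables.
Many other, more complex statistics are functions of totals.
Domain ratio: $R_d = t_{yd} / t_{zd} = \sum_{k \in U_d} y_k \big/ \sum_{k \in U_d} z_k$
  Estimator: $\hat{R}_d = \hat{t}_{yd} / \hat{t}_{zd} = \sum_{k \in s_d} a_k y_k \big/ \sum_{k \in s_d} a_k z_k$
Domain mean: $\bar{y}_d = t_d / N_d$
  Estimator: $\hat{\bar{y}}_d = \hat{t}_d / N_d$ or $\hat{\bar{y}}_d = \hat{t}_d / \hat{N}_d$
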
Estimation for planned domains - 1
The sample is divided into subsamples $s_d$, $d = 1, \ldots, D$.
Planned domains: stratified sampling with domains = strata.
The population domains $U_d$ can be regarded as separate subpopulations.
Domain sizes $N_d$ in domains $U_d$ are assumed known.
Sample size $n_d$ in domain sample $s_d \subset U_d$ is fixed in advance.
Standard population estimators are applicable as such.

Estimation for planned domains - 3
NOTES
Stratified sampling with a suitable allocation scheme (e.g. optimal (Neyman) or power (Bankier) allocation) is advisable in practical applications, in order to obtain control over domain sample sizes.
Singh, Gambino and Mantel (1994) describe allocation strategies to attain reasonable accuracy for small domains, still retaining good accuracy for large domains.

Estimation for unplanned domains - 1
Unplanned domains: a single sample $s$ of size $n$ is drawn from population $U$. Domain samples are $s_d = s \cap U_d$.
Domain sample sizes $n_d$ cannot be considered fixed but are random.
Extended domain variable of interest $y_d$ is defined as:
  $y_{dk} = y_k$ for $k \in U_d$ and $y_{dk} = 0$ for $k \notin U_d$.
In other words, $y_{dk} = I\{k \in U_d\}\, y_k$.
Because $t_d = \sum_{k \in U_d} y_k = \sum_{k \in U} y_{dk}$, we can estimate the domain total of $y$ by estimating the population total of $y_d$.

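The extended domain variable is easy to construct in code. The sketch below uses a hypothetical five-element population with two domains, and checks that the population total of the extended variable equals the domain total; the function name `extended` is illustrative.

```python
# Hypothetical population: values y_k and the domain label of each element.
y = [10.0, 20.0, 30.0, 40.0, 50.0]
domain = [1, 2, 1, 2, 2]

def extended(y, domain, d):
    """Extended domain variable: y_dk = y_k if k is in U_d, else 0."""
    return [yk if dk == d else 0.0 for yk, dk in zip(y, domain)]

y_1 = extended(y, domain, 1)

# Domain total computed directly over U_d ...
t_1_direct = sum(yk for yk, dk in zip(y, domain) if dk == 1)
# ... and as the population total of the extended variable over all of U.
t_1_extended = sum(y_1)

print(y_1, t_1_direct, t_1_extended)
```

Because the extended variable is defined for every element of the sample, any estimator of a population total can be reused for a domain without subsetting the data.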
Estimation for unplanned domains - 2
NOTES
The contribution of extra variance caused by random domain sample sizes can be incorporated in variance expressions and computation.
SAS survey procedures (SURVEYMEANS, SURVEYREG etc.) can handle the unplanned domains case by using the DOMAIN statement with extended domain y-variables and extended residuals.

Horvitz-Thompson estimator of domain totals
The Horvitz-Thompson (HT) estimator (expansion estimator) is the basic design-based direct estimator of the domain total $t_d = \sum_{k \in U_d} y_k$, $d = 1, \ldots, D$:
  $\hat{t}_{dHT} = \sum_{k \in U_d} I_k y_k / \pi_k = \sum_{k \in s_d} y_k / \pi_k = \sum_{k \in s_d} a_k y_k$   (1)
HT estimates of domain totals are additive: they sum up to the HT estimator $\hat{t}_{HT} = \sum_{k \in s} a_k y_k$ of the population total $t = \sum_{k \in U} y_k$.
As $E(I_k) = \pi_k$, the HT estimator is design unbiased for $t_d$.

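Both properties of the HT estimator, additivity over domains and design unbiasedness, can be checked numerically on a toy population. This is a minimal sketch assuming SRSWOR, where every design weight equals N/n; the population values, domain labels and function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population with two domains.
y = [10, 20, 30, 40, 50]
domain = [1, 2, 1, 2, 2]
N, n = len(y), 3
a = Fraction(N, n)  # assumption: SRSWOR, so a_k = 1/pi_k = N/n for all k

def ht_domain(s, d):
    """HT estimate of the domain total: sum of a_k * y_k over s_d."""
    return sum(a * y[k] for k in s if domain[k] == d)

def ht_total(s):
    """HT estimate of the population total: sum of a_k * y_k over s."""
    return sum(a * y[k] for k in s)

samples = list(combinations(range(N), n))

# Additivity: domain estimates sum to the population estimate in every sample.
additive = all(ht_domain(s, 1) + ht_domain(s, 2) == ht_total(s) for s in samples)

# Design unbiasedness: the average over all equally likely samples equals t_1.
mean_ht_1 = sum(ht_domain(s, 1) for s in samples) / len(samples)
t_1 = sum(yk for yk, dk in zip(y, domain) if dk == 1)

print(additive, mean_ht_1, t_1)
```

Some samples contain no element of a given domain, in which case the HT estimate for that domain is zero; unbiasedness holds on average over samples, not for each sample.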
Variance estimation for HT - 1
Standard variance estimator for $\hat{t}_{dHT}$ under planned domains:
  $\hat{V}(\hat{t}_{dHT}) = \sum_{k \in s_d} \sum_{l \in s_d} (a_k a_l - a_{kl})\, y_k y_l$   (2)
An alternative Sen-Yates-Grundy formula:
  $\hat{V}(\hat{t}_{dHT}) = \sum_{k \in s_d} \sum_{l \in s_d;\, k < l} \left( \frac{a_{kl}}{a_k a_l} - 1 \right) (a_k y_k - a_l y_l)^2$   (3)
NOTE: Both (2) and (3) are somewhat impractical... Why?

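Formulas (2) and (3) need the joint weights $a_{kl} = 1/\pi_{kl}$ for every pair of sampled elements, which are rarely available for complex designs. One case where they are known in closed form is SRSWOR within a stratum, which the sketch below assumes in order to evaluate both formulas on hypothetical data and confirm that they agree.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: SRSWOR within one stratum (planned domain); sizes are hypothetical.
N_d, n_d = 10, 4
y_s = [4, 7, 1, 9]  # hypothetical sampled values in s_d

a_k = Fraction(N_d, n_d)                             # 1/pi_k = N_d/n_d
a_kl = Fraction(N_d * (N_d - 1), n_d * (n_d - 1))    # 1/pi_kl for k != l

def a_pair(k, l):
    """Joint weight a_kl, with the convention a_kl = a_k when k == l."""
    return a_k if k == l else a_kl

idx = range(n_d)

# Formula (2): double sum over all ordered pairs, including k == l.
v_standard = sum((a_k * a_k - a_pair(k, l)) * y_s[k] * y_s[l]
                 for k in idx for l in idx)

# Formula (3), Sen-Yates-Grundy: sum over unordered pairs k < l.
v_syg = sum((a_kl / (a_k * a_k) - 1) * (a_k * y_s[k] - a_k * y_s[l]) ** 2
            for k, l in combinations(idx, 2))

print(v_standard, v_syg)
```

In this special case both values also match the textbook SRSWOR expression $N_d^2 (1 - n_d/N_d)\, s^2 / n_d$, where $s^2$ is the sample variance of the $y$ values.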
Variance estimation for HT - 2
Variance estimation for planned domains in practice:
  SUDAAN: standard formula (2)
  SAS macro CLAN: Sen-Yates-Grundy formula (3)
Variance estimators (2) and (3) are impractical because of $a_{kl} = 1/\pi_{kl}$: they require the second-order inclusion probabilities $\pi_{kl}$.
Approximations to $\pi_{kl}$ for fixed-size without-replacement (WOR) probability proportional-to-size (πps) designs:
  Hájek (1964) and Berger (2004, 2005) approximation
  Särndal (1996) approximation
  Berger and Skinner (2005) jackknife variance estimator
  Kott (2006) delete-a-group jackknife variance estimator

Variance estimation for HT - 3
Variance estimation for planned domains in practice:
  $\hat{V}_A(\hat{t}_{dHT}) = \frac{1}{n_d (n_d - 1)} \sum_{k \in s_d} (n_d a_k y_k - \hat{t}_{dHT})^2$   (4)
For example, SAS procedure SURVEYMEANS uses (4).

Variance estimation for HT - 3
Unplanned domains: the variance estimator should account for random domain sizes.
Approximate variance estimator using extended domain variables $y_d$:
  $\hat{V}_U(\hat{t}_{dHT}) = \frac{1}{n (n - 1)} \sum_{k \in s} (n a_k y_{dk} - \hat{t}_{dHT})^2$,   (5)
where $n$ is the total sample size.
NOTE: e.g. SAS procedure SURVEYMEANS uses (5).
NOTE: Extended domain variables are $y_{dk} = I\{k \in U_d\}\, y_k$. Recall: $y_{dk} = y_k$ if $k \in U_d$, 0 otherwise.

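The two approximate variance estimators differ only in which sample they sum over and which sample size they use. The sketch below implements both as plain functions and applies them to hypothetical data with equal design weights; the function names and numbers are illustrative, and this is not the SAS implementation.

```python
def v_planned(a, y):
    """Approximation (4): sum over the domain sample s_d, with fixed n_d."""
    n_d = len(y)
    t_hat = sum(ak * yk for ak, yk in zip(a, y))
    return sum((n_d * ak * yk - t_hat) ** 2
               for ak, yk in zip(a, y)) / (n_d * (n_d - 1))

def v_unplanned(a, y, in_domain):
    """Approximation (5): sum over the whole sample s, using y_dk."""
    n = len(y)
    # Extended domain variable: y_dk = y_k inside the domain, 0 outside.
    y_d = [yk if flag else 0 for yk, flag in zip(y, in_domain)]
    t_hat = sum(ak * ydk for ak, ydk in zip(a, y_d))
    return sum((n * ak * ydk - t_hat) ** 2
               for ak, ydk in zip(a, y_d)) / (n * (n - 1))

# Hypothetical sample of n = 6 with equal weights; the first 3 are in domain d.
a = [10, 10, 10, 10, 10, 10]
y = [4, 6, 5, 7, 3, 8]
in_domain = [True, True, True, False, False, False]

v4 = v_planned(a[:3], y[:3])        # treats n_d = 3 as fixed by design
v5 = v_unplanned(a, y, in_domain)   # treats n_d as random

print(v4, v5)
```

On these numbers (5) is much larger than (4), which reflects the extra variance contributed by the random domain sample size.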
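Hájek estimator of domain totals
Hájek type direct estimator:
  $\hat{t}_{dH(N)} = N_d\, \hat{\bar{y}}_d = \frac{N_d}{\hat{N}_d} \sum_{k \in s_d} a_k y_k$   (6)
where
  $\hat{\bar{y}}_d = \sum_{k \in s_d} a_k y_k / \hat{N}_d$ are estimated domain means
  $\hat{N}_d = \sum_{k \in s_d} a_k$ are estimated sizes of population domains
Assuming the domain sizes $N_d$ are known, we expect better results with the Hájek estimator (Särndal, Swensson and Wretman 1992).
The variance of $\hat{t}_{dH(N)}$ is estimated by
  $\hat{V}(\hat{t}_{dH(N)}) = \left( \frac{N_d}{\hat{N}_d} \right)^2 \sum_{k \in s_d} \sum_{l \in s_d} (a_k a_l - a_{kl}) (y_k - \hat{\bar{y}}_d)(y_l - \hat{\bar{y}}_d)$   (7)
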
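A short sketch of estimator (6) next to the HT estimator may help show what the known domain size adds. The weights, values and domain size below are hypothetical, and the function name is illustrative.

```python
def ht_and_hajek(a, y, N_d):
    """Return (HT, Hajek) estimates of a domain total from the domain sample s_d."""
    t_ht = sum(ak * yk for ak, yk in zip(a, y))   # HT: sum of a_k * y_k over s_d
    N_hat = sum(a)                                # estimated domain size
    y_bar_hat = t_ht / N_hat                      # estimated domain mean
    return t_ht, N_d * y_bar_hat                  # Hajek: N_d * estimated mean

# Hypothetical domain sample with unequal design weights and known N_d.
a = [10, 12, 8]
y = [4, 6, 5]
N_d = 33

t_ht, t_hajek = ht_and_hajek(a, y, N_d)

# If y is constant in the domain, Hajek reproduces N_d * c exactly,
# whereas HT gives N_hat * c and so inherits the error in N_hat.
t_ht_c, t_hajek_c = ht_and_hajek(a, [5, 5, 5], N_d)

print(t_ht, t_hajek, t_ht_c, t_hajek_c)
```

The constant-value case illustrates why the Hájek estimator is expected to do better when $N_d$ is known: it is calibrated to the true domain size rather than to its estimate.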
EXAMPLE: HT and Hájek estimators for domain totals
Real population data from Western Finland (Statistics Finland)
Domains: D = 12 regional areas = strata
  Planned domains for HT and Hájek
  Unplanned domains for HT
Study variable y: disposable income (registers)
Auxiliary data: sizes of population domains
Sample size: n = 1,000 households (dwelling units)
Sampling: stratified πps (WOR type probability proportional to size sampling) with household size as the size variable
Details: see separate pdf sheet and Table 1

Table 2. Mean absolute relative error MARE (%) and mean coefficient of variation MCV (%) of direct HT and Hájek estimators of totals for minor, medium-size and major domains, for planned domains (HT and Hájek) and unplanned domains (HT).

                               HT                            Hájek
Auxiliary information:         None                          Domain sizes
Domain sample size class       MARE %   MCV1 %   MCV2 %      MARE %   MCV1 %
Minor   (8 ≤ n_d ≤ 33)         11.5     11.9     28.3        5.3      10.9
Medium  (34 ≤ n_d ≤ 45)        7.6      9.0      20.3        6.4      9.0
Major   (46 ≤ n_d ≤ 277)       12.5     5.2      9.6         4.7      5.6

MCV1: Assuming planned domains for HT and Hájek
MCV2: Assuming unplanned domains for HT

References
Berger, Y.G. (2004). A simple variance estimator for unequal probability sampling without replacement. Journal of Applied Statistics 31, 305-315.
Berger, Y.G. (2005). Variance estimation with highly stratified sampling designs with unequal probabilities. Australian & New Zealand Journal of Statistics 47, 365-373.
Berger, Y.G. and C.J. Skinner (2005). A jackknife variance estimator for unequal probability sampling. Journal of the Royal Statistical Society, Series B, 67, 79-89.
Hájek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics 35, 1491-1523.
Kott, P.S. (2006). Delete-a-group variance estimation for the general regression estimator under Poisson sampling. Journal of Official Statistics 22, 759-767.
Särndal, C.-E. (1996). Efficient estimators with simple variance in unequal probability sampling. Journal of the American Statistical Association 91, 1289-1300.
Singh, M.P., J. Gambino and H.J. Mantel (1994). Issues and strategies for small area data. Survey Methodology 20, 3-14.