Small-Area Estimation based on Survey Data from a Left-Censored Fay-Herriot Model


Eric V. Slud and Tapabrata Maiti
University of Maryland College Park & Iowa State University
July 28, 2005

Abstract. We study Small Area Estimation based on data obtained by left-censored responses from a Fay-Herriot (1979) normal-error model. The problem is motivated by the Census Bureau's ongoing Small Area Income and Poverty Estimation (SAIPE) project. In that project a FH model is fitted to a logarithmically transformed response variable (the count of sampled poor children within a CPS-sampled county), and PSU's providing responses of 0 are discarded. We provide alternative small-area estimates and associated mean-squared error formulas, support them with a simulation study, and apply them to a SAIPE data analysis.

Key words: bias correction, Fay-Herriot model, left-censoring, left-truncation, mean-squared error, misspecified model, survey estimation.

This paper describes research and analysis of its authors, and is released to inform interested parties and encourage discussion. Results and conclusions are the authors' and have not been endorsed by the Census Bureau.

1 Introduction

Small-area estimation is becoming increasingly important in survey applications. This is particularly true in those fields of official statistics where legislative mandates require socioeconomic estimates within narrower jurisdictions than direct estimates from national surveys can accurately describe. An especially prominent and successful small-area project of this sort is the US Census Bureau's Small Area Income and Poverty Estimation (SAIPE) program. This ongoing effort is mandated by Title I of the US federal Code and is currently funded under the No Child Left Behind Act. It estimates, among other things, the numbers of poor school-age children by state, county, and ultimately school district, based upon data from the Current Population Survey (CPS), the Internal Revenue Service (IRS), Food Stamps, and the latest decennial census. At the county level (the level to which we restrict attention here), the estimation methodology fits a linear regression model to logarithms of counts of poor school-age children related to the householders in CPS-sampled households. The regression model has a form introduced by Fay and Herriot (1979), in which sampling error and a random effect at the small-area level are separated. The variance of one of them (usually the sampling error) is assumed known, based either on direct survey estimates or on a generalized variance estimation model.

In SAIPE and many other applications of Fay-Herriot models, area-level Small Area models are specified using a transformation (most often, the logarithm) of the original sampled data or of weighted estimators derived from it. PSU's where sampled responses fall below a (possibly PSU-dependent) threshold may be dropped. For example, in SAIPE, counties with 0 sampled poor children are dropped from the estimating equations defining national parameter values. In this paper, we develop formulas showing the top-order effect of such left-truncation in biasing model parameters and SAE's. We show how the bias could be corrected approximately, starting from estimates based on a Fay-Herriot model ignoring censoring. We also show how more accurate parameter estimates and SAE's could be derived by treating the data as left-censored.

The problem treated here is generic in small-area estimation because Fay-Herriot models are so widely used. Left-censoring can arise from transformation and excluded zeroes, as in SAIPE. It can also arise for structural reasons, for example when an establishment survey imposes lesser reporting requirements on small units. As an illustration, the Energy Information Administration's monthly crude oil report is based on a survey (EIA-813) in which respondent companies that carry or store more than 1000 barrels of crude oil are required to file data monthly.
Another example is the wind-erosion data of the US National Resource Inventory Survey (Nusser and Goebel 1997), collected annually to produce average wind erosion at the national level. State and county authorities are interested in county-level estimates of wind erosion. However, some counties are typically discarded from the data analysis because their observed wind erosion remains stable and small over time. The methods of this paper would apply to such surveys, possibly after nonlinear transformation of the sampled measurements, when small-area estimates are needed and there are good linear-model predictors for the response.

2 FH Model for SAIPE Small Area Estimation

One of the most prevalent mixed-effect linear models used in small-area estimation (Ghosh & Rao 1994), including SAIPE, is the Fay-Herriot (1979) model (FH). It is described as follows. For each PSU indexed by $i = 1, \ldots, m$, assume that a sample size $n_i$ and a $p$-dimensional vector $x_i$ of predictor variables are known. Assume also that response variables satisfying

$$y_i = x_i^{tr}\beta_0 + u_i + e_i, \qquad u_i \sim N(0, \sigma_0^2), \quad e_i \sim N(0, s_i) \qquad (1)$$

are observed whenever $n_i > 0$. Here $\beta_0 \in \mathbb{R}^p$ is a vector of unknown fixed-effect coefficients, and $u_i$, $e_i$ are respectively PSU random effects and sampling errors, independent of each other within and across PSU's. The variances $s_i$ are assumed to be known functions of $n_i$, except possibly for a constant $v_e$ of proportionality; the usual form is $s_i \equiv v_e/n_i$. Ordinarily, the parameter $\sigma^2$ is unknown and estimated, with $\sigma_0^2$ denoting its true value for the observed data, while $v_e$ is known. In SAIPE it also makes sense to treat $\sigma_0^2$ as known and $v_e$ as unknown. In that case $\sigma_0^2$ is estimated from an auxiliary model fitted to the most recent decennial census data (cf. Citro and Kalton 2000, App. A). A parallel treatment of the issues discussed here can be given for this case.

Small area estimates (SAE's) based on such FH models are statistics designed to estimate, with small mean squared error (MSE), the parameters

$$\vartheta_i = x_i^{tr}\beta_0 + u_i, \qquad i = 1, \ldots, m.$$

The values $y_i$ are generally direct survey estimators of the target small-area parameters $\vartheta_i$ in the sampled PSU's. However, they may be unacceptably variable because of the small sample size $n_i$. In the SAIPE log-count FH models, $y_i$ is the observed log number of poor children in the $i$th PSU (county). The small-area parameter for the count itself is defined by exponentiating:

$$\vartheta_i^* = \exp(\vartheta_i) \equiv \exp(x_i^{tr}\beta_0 + u_i) \qquad (2)$$

2.1 SAE Formulas

In the FH model, we consider estimators for $\vartheta_i$ based on the data $\{(y_i, n_i) : n_i > 0,\ i \le m\}$ above. These are the EBLUP estimators (cf. Prasad and Rao 1990, Ghosh and Rao 1994, Rao 2003)

$$\hat\vartheta_i = x_i^{tr}\hat\beta + \hat\gamma_i\,(y_i - x_i^{tr}\hat\beta) \qquad (3)$$

where $(\hat\beta, \hat\sigma^2)$ (or $(\hat\beta, \hat v_e)$) are the maximum likelihood (ML) estimators in the model (1), and $\hat\gamma_i = \hat\sigma^2/(\hat\sigma^2 + s_i)$. (Prasad and Rao (1990) and Lahiri and Rao (1995) treated large-sample properties of EBLUP estimators but restricted attention to moment-based estimators. An analogous theory for ML estimators has been provided by Datta and Lahiri (2000).) We follow the convention that $\hat\gamma_i \equiv 0$ (so that $\hat\vartheta_i = x_i^{tr}\hat\beta$) when $n_i = 0$. For future reference we also define the notations

$$\tau_i = \sigma_0^2 + s_i, \qquad \gamma_i = \sigma_0^2/\tau_i, \qquad \eta_i = x_i^{tr}\beta_0, \qquad \hat\eta_i = x_i^{tr}\hat\beta,$$

and let $\phi(z)$, $\Phi(z)$ respectively denote the standard normal density and distribution function.

In later sections, we consider estimation of parameters under model (1) for different mechanisms that cause the survey data to be incomplete. We study several likelihoods and estimators for $\nu_0 \equiv (\beta_0, \sigma_0^2)$. In each case, those estimators could be substituted into (3) to create small-area predictors. We also explore and compare other likelihood-based SAE's.

3 Left-Censored FH-Model Data

Consider now data $y_i$ satisfying model (1) that are reported in such a way that the exact response value $y_i$ is observed only if $y_i \ge \kappa_i$, where $\kappa_i \in \mathbb{R}$ is a known real threshold. For count data, the threshold $\kappa_i = 1$ applies. If instead $y_i$ denotes the logarithm of a rate, $y_i = \log(\mathrm{count}_i/n_i)$, then a finite transformed value is observed only if $e^{y_i} \ge 1/n_i$, i.e., $\kappa_i = -\log(n_i)$. Assume that the predictor variables and sample sizes $n_i$ are available for all PSU's. Then the threshold-based sampling framework described above clearly results in the classic left-censored data structure (Klein and Moeschberger 2003):

$$\{(x_i,\ n_i,\ \max(y_i, \kappa_i),\ I_{[y_i \ge \kappa_i]}) : i = 1, \ldots, m\} \qquad (4)$$

with corresponding log-likelihood

$$l_{cens}(\beta, \sigma^2) = \sum_{i=1}^m \Big\{ -\frac12 I_{[y_i \ge \kappa_i]}\Big[\log(2\pi(\sigma^2 + s_i)) + \frac{(y_i - x_i^{tr}\beta)^2}{\sigma^2 + s_i}\Big] + I_{[y_i < \kappa_i]} \log \Phi\Big(\frac{\kappa_i - x_i^{tr}\beta}{\sqrt{\sigma^2 + s_i}}\Big) \Big\} \qquad (5)$$
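Both the EBLUP (3) and the censored log-likelihood (5) are direct to evaluate. What follows is a minimal NumPy/SciPy sketch, not the authors' code. The function names and the boolean arrays `sampled` ($n_i > 0$) and `observed` (the indicator $I_{[y_i \ge \kappa_i]}$) are illustrative assumptions. `norm.logcdf` is used so that $\log\Phi$ stays finite deep in the lower tail.

```python
import numpy as np
from scipy.stats import norm


def eblup(y, X, s, beta_hat, sigma2_hat, sampled):
    """EBLUP (3): x_i' beta_hat + gamma_hat_i (y_i - x_i' beta_hat), with gamma_hat_i = 0 when n_i = 0."""
    synthetic = X @ beta_hat
    gamma_hat = np.where(sampled, sigma2_hat / (sigma2_hat + s), 0.0)
    resid = np.where(sampled, y - synthetic, 0.0)   # unsampled PSUs contribute no residual
    return synthetic + gamma_hat * resid


def loglik_cens(beta, sigma2, X, s, y, kappa, observed):
    """Left-censored FH log-likelihood (5).

    observed[i] plays the role of I[y_i >= kappa_i]; for censored PSUs the value in y
    is ignored (it may hold max(y_i, kappa_i) = kappa_i).
    """
    tau = sigma2 + s
    mean = X @ beta
    ll_obs = -0.5 * (np.log(2 * np.pi * tau) + (y - mean) ** 2 / tau)
    ll_cens = norm.logcdf((kappa - mean) / np.sqrt(tau))   # log Phi(.), stable in the lower tail
    return float(np.sum(np.where(observed, ll_obs, ll_cens)))
```

For example, `loglik_cens(beta, sigma2, X, s, np.maximum(y, kappa), kappa, y >= kappa)` evaluates (5) on data recorded in the form (4).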

In this setting, the value $y_i$ is replaced by an indicator whenever $y_i < \kappa_i$. That information has not always been used in surveys, but the data do include the indices of the sampled PSU's that have below-threshold responses. In some surveys, there may instead be no information whatever on PSU's within the sampling frame that would have been sampled but would have produced below-threshold responses. In such cases, $m$ itself (as an overall characteristic of the sampling frame) would be unknown, and the data would consist only of

$$\{(x_i,\ n_i,\ y_i) : i \le m,\ y_i \ge \kappa_i\} \qquad (6)$$

This is a left-truncated sample (Klein and Moeschberger 2003), with log-likelihood

$$l_{trunc}(\beta, \sigma^2) = \sum_{i:\, y_i \ge \kappa_i} \Big\{ -\frac12\Big[\log(2\pi(\sigma^2 + s_i)) + \frac{(y_i - x_i^{tr}\beta)^2}{\sigma^2 + s_i}\Big] - \log\Big(1 - \Phi\Big(\frac{\kappa_i - x_i^{tr}\beta}{\sqrt{\sigma^2 + s_i}}\Big)\Big) \Big\} \qquad (7)$$

For survey data from a sampling frame assumed to satisfy (1), there are at least four ways the statistician could analyze the data to provide parameter estimators $\hat\nu = (\hat\beta, \hat\sigma^2)$ to be substituted into small-area estimators.

(A) One can drop the PSU's with below-threshold responses. That is, one estimates the parameters $\nu = (\beta, \sigma^2)$ from the dataset (6), using standard FH methodology based on complete data. The underlying log-likelihood here is

$$l_{compl}(\beta, \sigma^2) = -\frac12 \sum_{i:\, y_i \ge \kappa_i} \Big\{ \log(2\pi(\sigma^2 + s_i)) + \frac{(y_i - x_i^{tr}\beta)^2}{\sigma^2 + s_i} \Big\} \qquad (8)$$

(B) One can recognize that the FH complete-data log-likelihood (8) is misspecified under the sampling framework (6). One then corrects the parameter estimators for bias and robustly estimates their large-sample variances.

(C) One can drop the PSU's with below-threshold responses, but analyze the left-truncated dataset (6) using the log-likelihood (7).

(D) Finally, one can analyze the left-censored dataset (4) using the left-censored regression model log-likelihood (5).
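The two likelihoods for the reduced data (6) differ only by the normalizing terms $-\log(1-\Phi(\cdot))$, and option (A) ignores those terms. Below is a minimal sketch of (8) and (7). It assumes the arrays passed in have already been restricted to $\{i : y_i \ge \kappa_i\}$, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def loglik_compl(beta, sigma2, X, s, y):
    """Complete-data FH log-likelihood (8); arrays hold only PSUs with y_i >= kappa_i."""
    tau = sigma2 + s
    return float(-0.5 * np.sum(np.log(2 * np.pi * tau) + (y - X @ beta) ** 2 / tau))


def loglik_trunc(beta, sigma2, X, s, y, kappa):
    """Left-truncated log-likelihood (7): (8) renormalized by P(y_i >= kappa_i) for each retained PSU."""
    z = (kappa - X @ beta) / np.sqrt(sigma2 + s)
    # norm.logsf(z) = log(1 - Phi(z)), evaluated without cancellation
    return loglik_compl(beta, sigma2, X, s, y) - float(np.sum(norm.logsf(z)))
```

Because each $-\log(1-\Phi(z_i))$ is nonnegative, (7) always lies at or above (8), and the two coincide when the thresholds are far below the data.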

The Census Bureau's SAIPE methodology has essentially been option (A) above. Historically (up through its 1997 county-level estimates of numbers of poor children), the SAIPE program treated its log-transformed county-level counts of poor related sampled children in the CPS (ASEC) through a slightly modified FH model (1), used as a complete-data model. Strictly speaking, the FH model was used only in a variant form. In that form, $\sigma^2$ is taken as known (fitted within a generalized-variance framework from decennial census data) and $s_i = v_e/n_i^{\alpha}$, with $v_e$ treated as unknown. The value of $\alpha$ used in practice has been $\alpha = 1/2$ up through the 1995 estimates and $\alpha = 1/4$ starting with the SAIPE 1997 production estimates.

Our first objective in this paper is to study option (B) as a way of deriving correct large-sample estimators of parameters from left-censored data. We aim to assess, approximate and correct for the biases that arise in (A). Next, the truncated dataset (6) contains strictly less information (in both the vernacular and technical senses) than the censored dataset (4). There is therefore no good motivation for option (C) above, even though its model is correctly specified. When the left-censored form (4) of the data is available, a fully informative likelihood-based treatment (D) will provide better large-sample estimators than those based on the reduced data (6).

The remainder of the paper is organized as follows. First, in Section 4, we consider option (A), which estimates parameters using the complete-data FH parameter estimates on (6) and ignores the left-truncation. We exhibit a top-order approximation of the estimation biases and numerically assess the quality of that approximation under several scenarios. We then study the consequences of the biases for small-area estimation. Next, in Section 5, we derive maximum-likelihood estimators based on a parametric left-censored-data likelihood. We compare the large-sample behavior and asymptotic variances of those estimators with the ones found in (B).
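All methods are compared first in a realistic finite-sample simulation study in Section 6, and then in a real-data SAIPE example in Section 7. Overall conclusions are drawn in Section 8.

4 Misspecified Analysis of Censored Data as Complete

For the data set observed in the form (6), consider the naive estimator $\tilde\nu = (\tilde\beta, \tilde\sigma^2)$ of $\nu_0 = (\beta_0, \sigma_0^2)$ obtained by maximizing $l_{compl}(\beta, \sigma^2)$ in (8). This estimator is equivalent to the one actually used in SAIPE for the log-count model. It is given by the same formulas as before, restricted to the observed data. Namely, $\tilde\beta \equiv \tilde\beta(\tilde\sigma^2)$, where (with $x^{\otimes 2} \equiv x x^{tr}$)

$$\tilde\beta(t) = \Big(\sum_{i=1}^m \frac{I_{[y_i \ge \kappa_i]}}{t + s_i}\, x_i^{\otimes 2}\Big)^{-1} \sum_{i=1}^m \frac{I_{[y_i \ge \kappa_i]}}{t + s_i}\, x_i y_i,$$

and

$$\tilde\sigma^2 = \arg\min_t \sum_{i=1}^m I_{[y_i \ge \kappa_i]} \Big\{ \log(t + s_i) + \frac{(y_i - x_i^{tr}\tilde\beta(t))^2}{t + s_i} \Big\}.$$

These estimators are not derived from a likelihood or moment criterion based on the left-truncated data model, so one might expect them to be somewhat biased. They turn out to be biased. We calculate the large-sample limits $\bar\nu = (\bar\beta, \bar\sigma^2)$ of $\tilde\nu = (\tilde\beta, \tilde\sigma^2)$ as follows. First,

$$\tilde\beta - \beta_0 = \Big(\sum_{i=1}^m \frac{I_{[y_i \ge \kappa_i]}}{\tilde\sigma^2 + s_i}\, x_i^{\otimes 2}\Big)^{-1} \sum_{i=1}^m \frac{I_{[y_i \ge \kappa_i]}}{\tilde\sigma^2 + s_i}\, x_i (u_i + e_i).$$

Next, denote the variable of integration for the standard normal deviate $(u_i + e_i)/\sqrt{\tau_i}$ by $z$, and define

$$\xi_i \equiv (\kappa_i - \eta_i)/\sqrt{\tau_i} = (\kappa_i - x_i^{tr}\beta_0)/\sqrt{\sigma_0^2 + s_i}.$$

Noting that $y_i = \eta_i + u_i + e_i$, we have

$$E\big(I_{[y_i \ge \kappa_i]}(u_i + e_i)\big) = \sqrt{\tau_i} \int_{\xi_i}^{\infty} z\, \frac{e^{-z^2/2}}{\sqrt{2\pi}}\, dz = \sqrt{\frac{\tau_i}{2\pi}}\, e^{-\xi_i^2/2}.$$

Now apply a nonidentical-summand Law of Large Numbers (uniform over the parameter $t = \sigma^2$) within the previous expression for $\tilde\beta - \beta_0$. It follows that $\tilde\beta - \beta_0$ differs, by an amount asymptotically negligible in probability, from

$$\Big(\sum_{i=1}^m \frac{1 - \Phi(\xi_i)}{\tilde\sigma^2 + s_i}\, x_i^{\otimes 2}\Big)^{-1} \sum_{i=1}^m \frac{x_i \sqrt{\tau_i}}{\sqrt{2\pi}\,(\tilde\sigma^2 + s_i)}\, e^{-\xi_i^2/2}.$$

In the same spirit, we deduce that for large $m$ the estimator $\tilde\beta(t)$ differs asymptotically negligibly in probability from $\bar\beta(t)$, defined by

$$\bar\beta(t) = \beta_0 + \Big(\sum_{i=1}^m \frac{1 - \Phi(\xi_i)}{t + s_i}\, x_i^{\otimes 2}\Big)^{-1} \sum_{i=1}^m \frac{x_i \sqrt{\tau_i}}{\sqrt{2\pi}\,(t + s_i)}\, e^{-\xi_i^2/2} \qquad (9)$$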
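The estimators $\tilde\beta(t)$ and $\tilde\sigma^2$ defined at the start of this section can be computed by profiling. For each trial $t$, $\tilde\beta(t)$ is a weighted least-squares fit on the retained PSU's, and $\tilde\sigma^2$ minimizes the one-dimensional profile criterion. The sketch below is an illustration of option (A) under that reading and is not the SAIPE production code. The search bound `t_max` is a hypothetical default of ours.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def beta_tilde(t, X, y, s):
    """Weighted least squares beta_tilde(t) over the retained PSUs (weights 1/(t + s_i))."""
    w = 1.0 / (t + s)
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)


def fit_complete_data(X, y, s, t_max=None):
    """Option (A): maximize l_compl (8) by profiling out beta; inputs are retained PSUs only."""
    if t_max is None:
        t_max = 10.0 * np.var(y) + 1.0   # hypothetical search bound, not from the paper

    def profile_objective(t):
        r = y - X @ beta_tilde(t, X, y, s)
        return np.sum(np.log(t + s) + r ** 2 / (t + s))

    t_hat = minimize_scalar(profile_objective, bounds=(1e-8, t_max), method="bounded").x
    return beta_tilde(t_hat, X, y, s), float(t_hat)
```

Applied to simulated data truncated at a constant $\kappa$, this naive fit shows the pattern derived above. The intercept drifts upward, the slope is flattened, and $\tilde\sigma^2$ falls below $\sigma_0^2$.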

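Similarly, as $m \to \infty$, $\tilde\sigma^2$ differs asymptotically negligibly from the argument $t = \bar\sigma^2$ minimizing

$$\sum_{i=1}^m \int_{\xi_i}^{\infty} \Big\{ \log(t + s_i) + \frac{(\sqrt{\tau_i}\, z - x_i^{tr}(\bar\beta(t) - \beta_0))^2}{t + s_i} \Big\} \frac{e^{-z^2/2}}{\sqrt{2\pi}}\, dz$$
$$= \sum_{i=1}^m \Big[ (1 - \Phi(\xi_i)) \Big\{ \log(t + s_i) + \frac{\tau_i + (x_i^{tr}(\bar\beta(t) - \beta_0))^2}{t + s_i} \Big\} + \frac{e^{-\xi_i^2/2}}{\sqrt{2\pi}} \cdot \frac{\tau_i \xi_i - 2\sqrt{\tau_i}\, x_i^{tr}(\bar\beta(t) - \beta_0)}{t + s_i} \Big]$$

Next, substitute formula (9) left-multiplied by $(\bar\beta(t) - \beta_0)^{tr} \sum_i (t + s_i)^{-1} x_i^{\otimes 2}(1 - \Phi(\xi_i))$. This gives

$$\bar\sigma^2 = \arg\min_t \sum_{i=1}^m \Big[ (1 - \Phi(\xi_i)) \Big( \log(t + s_i) + \frac{\tau_i}{t + s_i} \Big) + \frac{e^{-\xi_i^2/2}}{\sqrt{2\pi}} \cdot \frac{\sqrt{\tau_i}\,(\kappa_i - x_i^{tr}\bar\beta(t))}{t + s_i} \Big] \qquad (10)$$

We collect our conclusions in the following Theorem. Further justifications of the steps can be found in the Appendix.

Theorem 4.1 Under the assumptions (a)-(c), as $m \to \infty$, the estimators $\tilde\beta$, $\tilde\sigma^2$ maximizing (8) differ asymptotically negligibly from $\bar\beta \equiv \bar\beta(\bar\sigma^2)$ and $\bar\sigma^2$, respectively, defined above in formulas (9) and (10). Moreover, consider the limiting case where $m^{-1}\sum_{i=1}^m I_{[y_i \ge \kappa_i]} \to 1$ as $m \to \infty$. In that case, to top order,

$$\bar\sigma^2 - \sigma_0^2 \approx \delta_{\sigma^2} \equiv \frac{(2\pi)^{-1/2}\, m^{-1} \sum_{i=1}^m \xi_i\, e^{-\xi_i^2/2}/\tau_i}{m^{-1} \sum_{i=1}^m (1 - \Phi(\xi_i))/\tau_i^2}$$

$$\bar\beta - \beta_0 \approx \delta_\beta \equiv \Big( m^{-1} \sum_{i=1}^m \frac{1 - \Phi(\xi_i)}{\tau_i}\, x_i^{\otimes 2} \Big)^{-1} m^{-1} \sum_{i=1}^m \frac{x_i}{\sqrt{2\pi \tau_i}}\, e^{-\xi_i^2/2}$$

It is not hard to check that the calculation just completed is essentially the same as finding the arguments $\bar\nu = (\bar\beta, \bar\sigma^2)$ that minimize the Kullback-Leibler distance between two models. The first is model (A), with log-likelihood (8) for the data on $\{i : y_i \ge \kappa_i\}$; the second is the correct log-likelihood (5). Our derivation of $\bar\nu$ thus follows the well-established lines of the asymptotic misspecified-model theory in White (1982). In that theory the estimator $\tilde\nu$ is regarded as the maximum-likelihood estimator under the misspecified complete-data likelihood for (6).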
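The corrections $(\delta_\beta, \delta_{\sigma^2})$ of Theorem 4.1 depend on the unknown $(\beta_0, \sigma_0^2)$ only through $\tau_i$ and $\xi_i$. They can therefore be evaluated at any trial value. Iterating them from the naive estimates gives the correction and re-substitution scheme set out below, which leads to (13)-(15). The following is a minimal sketch under that reading; the function names, the tolerance and the iteration cap are our own assumptions. Sums run over all $m$ PSU's, including censored ones, since only $x_i$, $s_i$ and $\kappa_i$ enter.

```python
import numpy as np
from scipy.stats import norm


def bias_corrections(beta, sigma2, X, s, kappa):
    """Top-order biases (delta_beta, delta_sigma2) of Theorem 4.1, with (beta0, sigma2_0)
    replaced by a trial value (beta, sigma2); sums run over all m PSUs, censored or not."""
    m = X.shape[0]
    tau = sigma2 + s
    xi = (kappa - X @ beta) / np.sqrt(tau)
    surv, dens = norm.sf(xi), norm.pdf(xi)             # 1 - Phi(xi_i), phi(xi_i)
    M = (X.T * (surv / tau)) @ X / m
    delta_beta = np.linalg.solve(M, X.T @ (dens / np.sqrt(tau)) / m)
    delta_sigma2 = np.mean(xi * dens / tau) / np.mean(surv / tau ** 2)
    return delta_beta, delta_sigma2


def method_b(beta_tilde, sigma2_tilde, X, s, kappa, tol=1e-10, max_iter=500):
    """Correct, re-estimate the bias, re-substitute:
    beta^(k+1) = beta_tilde - delta_beta^(k), sigma2^(k+1) = sigma2_tilde - delta_sigma2^(k)."""
    beta_tilde = np.asarray(beta_tilde, dtype=float)
    beta, sigma2 = beta_tilde.copy(), float(sigma2_tilde)
    for _ in range(max_iter):
        d_beta, d_sigma2 = bias_corrections(beta, sigma2, X, s, kappa)
        beta_new, sigma2_new = beta_tilde - d_beta, sigma2_tilde - d_sigma2
        step = np.max(np.abs(beta_new - beta)) + abs(sigma2_new - sigma2)
        beta, sigma2 = beta_new, sigma2_new
        if step < tol:
            break
    return beta, float(sigma2)
```

With light censoring the corrections change slowly with $(\beta, \sigma^2)$, so the fixed-point iteration settles quickly. With no censoring ($\xi_i \to -\infty$), both corrections vanish and the naive estimates are returned unchanged.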

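We continue by finding the form of the asymptotic variances of these estimators. Under suitable regularity conditions, described in the Appendix, we have as $m \to \infty$

$$\sqrt{m}\begin{pmatrix} \tilde\beta - \bar\beta \\ \tilde\sigma^2 - \bar\sigma^2 \end{pmatrix} \xrightarrow{\ \mathcal D\ } N\big(0,\ A^{-1}\Sigma A^{-1}\big) \qquad (11)$$

where the $(p+1) \times (p+1)$ matrices $A$, $\Sigma$ are defined by the nonrandom limits

$$A = \lim_{m\to\infty} \frac{1}{2m} \sum_{i=1}^m E\Big[ \nabla^2_{b,t}\, I_{[y_i \ge \kappa_i]} \Big\{ \log(t + s_i) + \frac{(y_i - x_i^{tr} b)^2}{t + s_i} \Big\} \Big]_{(b,t) = \bar\nu}$$

$$\Sigma = \lim_{m\to\infty} \frac{1}{4m} \sum_{i=1}^m \mathrm{Var}\Big[ \nabla_{b,t}\, I_{[y_i \ge \kappa_i]} \Big\{ \log(t + s_i) + \frac{(y_i - x_i^{tr} b)^2}{t + s_i} \Big\} \Big]_{(b,t) = \bar\nu}$$

$A$ and $\Sigma$ are given in block-decomposed form,

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}^{tr} & A_{22} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^{tr} & \Sigma_{22} \end{pmatrix},$$

with $p \times p$ upper-left blocks $A_{11}$, $\Sigma_{11}$, in formulas (26)-(31) in the Appendix.

Our objective in this Section has been to contrast the actual large-sample behavior (11) of the complete-data estimators with the nominal behavior that complete-data FH formulas would predict:

$$\sqrt{m}\begin{pmatrix} \tilde\beta - \beta_0 \\ \tilde\sigma^2 - \sigma_0^2 \end{pmatrix} \approx N\Big(0,\ \begin{pmatrix} \Sigma_\beta & 0 \\ 0 & \Sigma_{\sigma^2} \end{pmatrix}\Big) \qquad (12)$$

where

$$\Sigma_\beta = \Big( m^{-1}\sum_{i=1}^m \frac{x_i^{\otimes 2}}{\sigma_0^2 + s_i}\, I_{[y_i \ge \kappa_i]} \Big)^{-1}, \qquad \Sigma_{\sigma^2} = \Big( m^{-1}\sum_{i=1}^m \frac{I_{[y_i \ge \kappa_i]}}{2(\sigma_0^2 + s_i)^2} \Big)^{-1}.$$

The major result of this section concerns the setting where left-censoring exists but is relatively light. There, the extent of bias in parameter estimation can be well approximated and estimated, and the mean-squared estimation error can be estimated robustly. To accomplish this, we essentially want estimators of $\nu_0 = (\beta_0, \sigma_0^2)$ to substitute into the terms $\tau_i$, $\xi_i$ within the formulas for $\delta_\beta$, $\delta_{\sigma^2}$ in Theorem 4.1. We will see below that the differences between $\bar\nu = (\bar\beta, \bar\sigma^2)$ and $\nu_0$ are often large enough that direct substitution of $\tilde\nu$ for $\nu_0$ is a bad idea. On the other hand, this substitution does give a preliminary estimator of $(\delta_\beta, \delta_{\sigma^2})$ that corrects $\tilde\nu$ in the right direction. Correcting $\tilde\nu$ by these preliminary estimated biases then provides an improved estimator of $\nu_0$. That estimator can in turn be substituted into an improved estimator of $(\delta_\beta, \delta_{\sigma^2})$. The correction, bias estimation and re-substitution steps can be iterated to obtain a sequence of estimators $\nu^{(k)} = (\beta^{(k)}, (\sigma^{(k)})^2)$, $k \ge 0$, according to the scheme

$$(\beta^{(0)}, (\sigma^{(0)})^2) \equiv (\tilde\beta, \tilde\sigma^2), \qquad \tau_i^{(k)} = (\sigma^{(k)})^2 + s_i, \qquad \xi_i^{(k)} = (\kappa_i - x_i^{tr}\beta^{(k)})/\sqrt{\tau_i^{(k)}},$$

$$\delta_\beta^{(k)} = \Big( m^{-1}\sum_{i=1}^m \frac{1 - \Phi(\xi_i^{(k)})}{\tau_i^{(k)}}\, x_i^{\otimes 2} \Big)^{-1} m^{-1}\sum_{i=1}^m \frac{x_i\, \phi(\xi_i^{(k)})}{\sqrt{\tau_i^{(k)}}}, \qquad \delta_{\sigma^2}^{(k)} = \frac{m^{-1}\sum_{i=1}^m \xi_i^{(k)} \phi(\xi_i^{(k)})/\tau_i^{(k)}}{m^{-1}\sum_{i=1}^m (1 - \Phi(\xi_i^{(k)}))/(\tau_i^{(k)})^2},$$

$$\beta^{(k+1)} = \tilde\beta - \delta_\beta^{(k)}, \qquad (\sigma^{(k+1)})^2 = \tilde\sigma^2 - \delta_{\sigma^2}^{(k)}.$$

For a fixed dataset, the limit of this sequence of estimators as $k \to \infty$ can equivalently be given by the solution of the following estimating equations. (The solution is assumed unique, which appears to be the case in practice.)

$$\hat\beta = \tilde\beta - \Big( m^{-1}\sum_{i=1}^m \frac{1 - \Phi(\hat\xi_i)}{\hat\tau_i}\, x_i^{\otimes 2} \Big)^{-1} m^{-1}\sum_{i=1}^m \frac{x_i\, \phi(\hat\xi_i)}{\sqrt{\hat\tau_i}} \qquad (13)$$

$$\hat\sigma^2 = \tilde\sigma^2 - \frac{m^{-1}\sum_{i=1}^m \hat\xi_i\, \phi(\hat\xi_i)/\hat\tau_i}{m^{-1}\sum_{i=1}^m (1 - \Phi(\hat\xi_i))/\hat\tau_i^2} \qquad (14)$$

where

$$\hat\tau_i = \hat\sigma^2 + s_i, \qquad \hat\xi_i = (\kappa_i - x_i^{tr}\hat\beta)/\sqrt{\hat\tau_i}. \qquad (15)$$

The estimators given by equations (13)-(15) are the method-(B) estimators $\hat\nu \equiv \hat\nu_B$; we omit the B subscript for simplicity. The solution $\hat\nu$ to these equations cannot be asserted to be consistent, but only approximately so. Nevertheless, we will see that this estimator performs very well. We summarize our result in: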

Theorem 4.2 Under the hypotheses of Theorem 4.1, suppose censoring is rare in the sense that $m^{-1}\sum_{i=1}^m I_{[y_i \ge \kappa_i]} \to 1$ in probability as $m \to \infty$. Then the estimators $(\hat\beta, \hat\sigma^2)$ defined as solutions of the estimating equations (13)-(15) are approximately consistent estimators of $(\beta_0, \sigma_0^2)$. Here "approximately consistent" means that the errors are of smaller order than $\|\delta_\beta\| + |\delta_{\sigma^2}|$, with $(\delta_\beta, \delta_{\sigma^2})$ given in Theorem 4.1.

We shall examine finite-sample aspects of this approximation in the simulation study of Section 6 below. At this point, we briefly consider the exact calculation, within specific scenarios, of the large-sample differences between the limits $(\bar\beta, \bar\sigma^2)$ and $(\beta_0, \sigma_0^2)$, and of the bias approximations given by Theorem 4.1. Suppose for illustration that $x_i = (1, w_i)^{tr} \in \mathbb{R}^2$ with $w_i \sim N(0, \sigma_w^2)$. Suppose further that the sample sizes $n_i$ are integer-valued random variables distributed independently of $w_i$, with $n_i \sim \mathrm{Unif}\{10, 11, \ldots, 50\}$ (discrete uniform), and that $\kappa_i = \kappa$ is the same for all $i$. Then in a large-$m$ limit, formulas (9)-(10) and (26)-(31) become expectations. The quantities $\bar\beta$, $\bar\sigma^2$, $\Sigma_\beta$, $\Sigma_{\sigma^2}$ and the diagonal elements of $A^{-1}\Sigma A^{-1}$ can then be calculated numerically. So can the approximations $\delta_\beta$ and $\delta_{\sigma^2}$ appearing on the right-hand sides of the approximate equalities in Theorem 4.1. We did this by coding a function in R, and we exhibit the results in Table 1 for several combinations of the parameter values $\beta_0$, $\sigma_0^2$, $\sigma_w^2$, $v_e$. In each case, we quantify the degree of censoring through the parameter $c$ (proportion censored), where

$$c = \lim_{m\to\infty} m^{-1}\sum_{i=1}^m P(y_i < \kappa_i) = E\,\Phi\Big(\frac{\kappa - x_i^{tr}\beta_0}{\sqrt{\sigma_0^2 + v_e/n_i}}\Big).$$

There are two immediate conclusions from the tabulated results. First, the corrections $\delta_\beta$, $\delta_{\sigma^2}$ approximate the corresponding biases $\bar\beta - \beta_0$ and $\bar\sigma^2 - \sigma_0^2$ remarkably well. Some discrepancies are visible at 20% censoring, and the errors become progressively smaller as the degree of censoring decreases. Second, the robust or sandwich-formula variances given by the diagonal elements of $A^{-1}\Sigma A^{-1}$ are generally very similar to the simpler nominal variances given by the diagonal elements of $\Sigma_\beta$ together with $\Sigma_{\sigma^2}$.

4.1 Small Area Estimates and MSE

We now consider alternative SAE's and their MSE's according to the analysis options (A)-(D) listed in Section 3.
Recall that in all cases, we view the data as being governed by the left-censored FH model with data (4) and log-likelihood (5).

Table 1: Limits $(\bar\beta, \bar\sigma^2)$, asymptotic variance parameters, and proportion $c$ of PSU's censored, for misspecified (complete-data) estimators of $(\beta_0, \sigma_0^2)$ when FH model data are actually left-censored at a constant threshold $\kappa$. In the upper portion of the Table (above the double line), $\beta_0 = (1, 1.5)$, $\sigma_w^2 = 2$, and the sample sizes $n_i$ are discrete-uniform over $\{10, \ldots, 50\}$. In the lower portion, $\beta_0 = (1, 1)$, $\sigma_w^2 = 1$, and the $n_i$ are uniform on $\{30, 70, 100\}$. The asymptotic biases $\bar\beta - \beta_0$ and $\bar\sigma^2 - \sigma_0^2$ should be compared with the approximations $\delta_\beta$ and $\delta_{\sigma^2}$, respectively, from Theorem 4.1. Diagonal elements of $\Sigma_\beta$ are given in the column headed $\Sigma_\beta$. The diagonal elements of $A^{-1}\Sigma A^{-1}$ are given in the columns $\Sigma^R_\beta$ and $\Sigma^R_{\sigma^2}$ (R for Robust). Columns involving $\beta$ have two entries, corresponding to the first and second components.

κ | σ0² | v_e | c | β̄ | σ̄² | δ_β | δ_σ² | Σ_β | Σ_σ² | Σ^R_β | Σ^R_σ²
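The large-$m$ limits behind Table 1 can be approximated by replacing the expectations over $(w_i, n_i)$ in (9)-(10) with averages over a large simulated covariate sample. The Python sketch below does this for one illustrative scenario. It is a stand-in for the authors' R function, not a reproduction of it. The inputs in the usage line ($\beta_0 = (1, 1.5)$, $\sigma_0^2 = 0.5$, $\sigma_w^2 = 2$, $v_e = 5$, $\kappa = -2$) and the range $\{10, \ldots, 50\}$ for $n_i$ are assumptions chosen for illustration. They are not claimed to match any row of Table 1.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm


def large_sample_limits(beta0, sigma2_0, sigma2_w, v_e, kappa, n_draws=200_000, seed=0):
    """Approximate c, (beta_bar, sigma2_bar) from (9)-(10), and (delta_beta, delta_sigma2)
    from Theorem 4.1, averaging over simulated (w_i, n_i) instead of exact expectations."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=np.sqrt(sigma2_w), size=n_draws)
    n = rng.integers(10, 51, size=n_draws)            # n_i ~ Unif{10, ..., 50} (assumed)
    X = np.column_stack([np.ones(n_draws), w])
    s = v_e / n
    tau = sigma2_0 + s
    xi = (kappa - X @ beta0) / np.sqrt(tau)
    surv, dens = norm.sf(xi), norm.pdf(xi)            # 1 - Phi(xi_i), phi(xi_i)

    def beta_bar(t):                                  # formula (9)
        M = (X.T * (surv / (t + s))) @ X
        return beta0 + np.linalg.solve(M, X.T @ (np.sqrt(tau) * dens / (t + s)))

    def objective(t):                                 # criterion minimized in (10)
        b = beta_bar(t)
        return np.mean(surv * (np.log(t + s) + tau / (t + s))
                       + dens * np.sqrt(tau) * (kappa - X @ b) / (t + s))

    sigma2_bar = minimize_scalar(objective, bounds=(1e-4, 5 * sigma2_0), method="bounded").x
    M0 = (X.T * (surv / tau)) @ X / n_draws           # Theorem 4.1 approximations
    delta_beta = np.linalg.solve(M0, X.T @ (dens / np.sqrt(tau)) / n_draws)
    delta_sigma2 = np.mean(xi * dens / tau) / np.mean(surv / tau ** 2)
    return {"c": float(np.mean(norm.cdf(xi))), "beta_bar": beta_bar(sigma2_bar),
            "sigma2_bar": float(sigma2_bar), "delta_beta": delta_beta,
            "delta_sigma2": float(delta_sigma2)}


res = large_sample_limits(np.array([1.0, 1.5]), 0.5, 2.0, 5.0, -2.0)
```

Comparing `res["beta_bar"] - beta0` with `res["delta_beta"]`, and `res["sigma2_bar"] - 0.5` with `res["delta_sigma2"]`, corresponds to comparing the columns $\bar\beta$, $\bar\sigma^2$ with $\delta_\beta$, $\delta_{\sigma^2}$ in Table 1.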

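The simplest option is the EBLUP estimator (3) arising in the misspecified complete-data analysis (A). After censored PSU's are treated as wholly unobserved, it becomes

$$\tilde\vartheta_i^A = \frac{\tilde\sigma^2}{\tilde\sigma^2 + s_i}(y_i - x_i^{tr}\tilde\beta)\, I_{[y_i \ge \kappa_i]} + x_i^{tr}\tilde\beta.$$

Recall that the actual asymptotic behavior of the usual ML estimators $(\tilde\beta, \tilde\sigma^2)$ is given by (11). To top order, we find under option (A), using the small-area predictors $\tilde\vartheta_i^A$:

$$MSE_i^A \approx E\Big( \frac{\tilde\sigma^2}{\tilde\sigma^2 + s_i}(y_i - x_i^{tr}\tilde\beta)\, I_{[y_i \ge \kappa_i]} + x_i^{tr}(\tilde\beta - \beta_0) - u_i \Big)^2.$$

Now substitute $(\bar\beta, \bar\sigma^2)$ for $(\tilde\beta, \tilde\sigma^2)$, with $o_P(1)$ error, and recall that conditionally given $y_i$, $u_i \sim N\big(\frac{\sigma_0^2}{\tau_i}(y_i - \eta_i),\ \frac{s_i\sigma_0^2}{\tau_i}\big)$. Then

$$MSE_i^A = o_P(1) + \frac{\sigma_0^2 s_i}{\tau_i} + E\Big( (y_i - x_i^{tr}\beta_0)\Big\{ \frac{\bar\sigma^2}{\bar\sigma^2 + s_i} I_{[y_i \ge \kappa_i]} - \frac{\sigma_0^2}{\tau_i} \Big\} + \Delta_i \Big\{ 1 - \frac{\bar\sigma^2}{\bar\sigma^2 + s_i} I_{[y_i \ge \kappa_i]} \Big\} \Big)^2$$

where, as in Section 9.3, we denote $\Delta_i = x_i^{tr}(\bar\beta - \beta_0)$. Next, substituting the formula for $\mu_1$ from Section 9.3, we find

$$MSE_i^A = \frac{\sigma_0^2 s_i}{\tau_i} + \Big( \sqrt{\tau_i}\,\frac{\bar\sigma^2}{\bar\sigma^2 + s_i}\,\phi(\xi_i) + \Delta_i\Big(1 - \frac{\bar\sigma^2}{\bar\sigma^2 + s_i}(1 - \Phi(\xi_i))\Big) \Big)^2 + \mathrm{Var}\Big( \frac{\bar\sigma^2}{\bar\sigma^2 + s_i}(y_i - x_i^{tr}\bar\beta)\, I_{[y_i \ge \kappa_i]} - \frac{\sigma_0^2}{\tau_i}(y_i - x_i^{tr}\beta_0) \Big) + o_P(1)$$

Finally, substituting the formulas for $\mu_1$, $\mu_2$ from Section 9.3 gives, after some further algebraic reductions,

$$MSE_i^A = \sigma_0^2 + \big(1 - \Phi(\xi_i) + \xi_i\phi(\xi_i)\big)\Big\{ \tau_i\Big(\frac{\bar\sigma^2}{\bar\sigma^2 + s_i}\Big)^2 - 2\sigma_0^2\,\frac{\bar\sigma^2}{\bar\sigma^2 + s_i} \Big\} + \Delta_i^2\Big\{ 1 - 2(1 - \Phi(\xi_i))\frac{\bar\sigma^2}{\bar\sigma^2 + s_i} + (1 - \Phi(\xi_i))\Big(\frac{\bar\sigma^2}{\bar\sigma^2 + s_i}\Big)^2 \Big\} + 2\Delta_i\sqrt{\tau_i}\,\phi(\xi_i)\,\frac{\bar\sigma^2}{\bar\sigma^2 + s_i}\Big( \frac{s_i}{\bar\sigma^2 + s_i} + \frac{\sigma_0^2}{\tau_i} \Big)$$

This MSE formula includes a bias term due to the discrepancy between $\beta_0$ and the large-sample limit $\bar\beta$ of the misspecified ML estimator $\tilde\beta$. It does not include (nominal or robust) asymptotic-variance terms for $\tilde\beta$ or $\tilde\sigma^2$, since these are $O(1/m)$. The magnitude of the bias has been approximated in Theorem 4.1. We have seen that under realistic parameter combinations with left-censoring of 20% or less, this approximation is very good, correcting more than 95% of the bias.

A natural first approach to improving the small-area predictions is to replace $\tilde\beta$ within the SAE formula by its bias-corrected version. This uses the estimators $(\hat\beta, \hat\sigma^2)$ defined by (13)-(14), as follows:

$$\tilde\vartheta_i^B = \frac{\hat\sigma^2}{\hat\sigma^2 + s_i}(y_i - x_i^{tr}\hat\beta)\, I_{[y_i \ge \kappa_i]} + x_i^{tr}\hat\beta.$$

An expression for the mean-squared error $MSE_i^B$ of this SAE can be derived along the same lines as the formula for $MSE_i^A$ above. One main difference is that the term corresponding to $\Delta_i$ in the new formula is much smaller than $\Delta_i$. However, as can be verified from simulations, $MSE_i^B$ is often quite a bit larger than $MSE_i^A$. The main point is that in many examples $\hat\beta$ estimates $\beta_0$ quite accurately, yet there is an intrinsic positive bias in $\tilde\vartheta_i^B$. This bias arises because the correction of $x_i^{tr}\hat\beta$ by residuals is applied only when those residuals are above the threshold $\kappa_i - x_i^{tr}\hat\beta$. A slightly improved small-area estimator using the adjustment (B) therefore corrects $\tilde\vartheta_i^B$ by its estimated bias, namely

$$\check\vartheta_i^B = \frac{\hat\sigma^2}{\hat\sigma^2 + s_i}(y_i - x_i^{tr}\hat\beta)\, I_{[y_i \ge \kappa_i]} + x_i^{tr}\hat\beta - \frac{\hat\sigma^2}{\sqrt{\hat\sigma^2 + s_i}}\,\phi\Big(\frac{\kappa_i - x_i^{tr}\hat\beta}{\sqrt{\hat\sigma^2 + s_i}}\Big) \qquad (16)$$

5 Analysis using Left-Censored Data Likelihood

The method of analysis proposed above as method (D) is a left-censored parametric linear regression model. Its slightly unusual feature is the PSU-dependent variance term $\sigma^2 + s_i$ of Fay and Herriot. With the $s_i$ terms absent, such analyses have appeared in the older survival analysis literature. Recent survival literature has instead emphasized the semiparametric censored linear regression model, in which $e_i$ is absent and the distribution of $u_i$ is unknown with mean 0; cf. Buckley and James (1979), Tsiatis (1990), Ritov (1990), and Ying (1993). As in Klein and Moeschberger (2003), maximum-likelihood estimation of $(\beta_0, \sigma_0^2)$ using the parametric log-likelihood $l_{cens}$ is straightforward but does not lead to explicit formulas. Write $z_i \equiv z_i(\beta, \sigma^2) \equiv (\kappa_i - x_i^{tr}\beta)/\sqrt{\sigma^2 + s_i}$. In this notation, the likelihood equations determining the ML estimates are:

$$\sum_{i=1}^m x_i\Big\{ I_{[y_i \ge \kappa_i]}\frac{y_i - x_i^{tr}\beta}{\sigma^2 + s_i} - \frac{I_{[y_i < \kappa_i]}}{\sqrt{\sigma^2 + s_i}}\,\frac{\phi(z_i)}{\Phi(z_i)} \Big\} = 0 \qquad (17)$$

$$\sum_{i=1}^m \frac{1}{\sigma^2 + s_i}\Big\{ I_{[y_i \ge \kappa_i]}\Big(\frac{(y_i - x_i^{tr}\beta)^2}{\sigma^2 + s_i} - 1\Big) - I_{[y_i < \kappa_i]}\, z_i\,\frac{\phi(z_i)}{\Phi(z_i)} \Big\} = 0 \qquad (18)$$

The asymptotic variance-covariance matrix for the resulting maximum likelihood estimates of $(\beta, \sigma^2)$ is readily computed and estimated as the inverse of the per-PSU information matrix

$$I(\beta_0, \sigma_0^2) = \begin{pmatrix} I_{11} & I_{12} \\ I_{12}^{tr} & I_{22} \end{pmatrix} = -m^{-1}\, E\big(\nabla^2_{\beta,\sigma^2}\, l_{cens}(\beta_0, \sigma_0^2)\big) \qquad (19)$$

with

$$I_{11} = m^{-1}\sum_{i=1}^m \frac{x_i^{\otimes 2}}{\tau_i}\Big[ 1 - \Phi(\xi_i) + \xi_i\phi(\xi_i) + \phi^2(\xi_i)/\Phi(\xi_i) \Big]$$

$$I_{12} = m^{-1}\sum_{i=1}^m \frac{x_i}{2\tau_i^{3/2}}\,\phi(\xi_i)\big[ 1 + \xi_i^2 + \xi_i\phi(\xi_i)/\Phi(\xi_i) \big]$$

$$I_{22} = m^{-1}\sum_{i=1}^m \frac{1}{4\tau_i^2}\Big[ 2(1 - \Phi(\xi_i)) + \xi_i\phi(\xi_i)\big(1 + \xi_i^2 + \xi_i\phi(\xi_i)/\Phi(\xi_i)\big) \Big]$$

Let $(\hat\beta^D, (\hat\sigma^D)^2)$ denote the censored-data ML estimators obtained by solving the likelihood equations (17)-(18). We know from standard maximum-likelihood estimation theory (whose regularity conditions are easily satisfied in the present setting) that

$$\sqrt{m}\begin{pmatrix} \hat\beta^D - \beta_0 \\ (\hat\sigma^D)^2 - \sigma_0^2 \end{pmatrix} \xrightarrow{\ \mathcal D\ } N\big(0,\ (I(\beta_0, \sigma_0^2))^{-1}\big)$$

We briefly compare the asymptotic variances (normalized by $m$) of the estimators $(\hat\beta^D, (\hat\sigma^D)^2)$ with those of $(\tilde\beta, \tilde\sigma^2)$, in the first illustrative case considered in calculating Table 1. In this case $x_i = (1, w_i)^{tr}$, $w_i \sim N(0, 2)$, $\beta_0 = (1, 1.5)$, $\sigma_0^2 = 0.5$, and $n_i$ is discrete-uniform on $\{10, 11, \ldots, 50\}$. The nominal (complete-data) Fay-Herriot variances for the two components of $\tilde\beta$ and for $\tilde\sigma^2$ are .550, .38 and .453, respectively. The diagonal elements of the correct asymptotic variance matrix in (11) for the Fay-Herriot estimators are .490, .307 and .442. By contrast, the corresponding variances for the censored-data ML estimators are .623, .340 and .843. At first sight, it is puzzling that the ML component estimators do not all have smaller variances. But recall that the FH and censored-ML estimators are not directly comparable. The former are biased, and the asymptotic variability of the bias estimators has so far not been taken into account.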
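Method (D) needs only a numerical maximizer of (5) and the information matrix (19). The following is a minimal SciPy sketch under our own choices, not the authors' implementation. It optimizes $\log\sigma^2$ so that the variance stays positive, uses BFGS with numerical gradients instead of solving (17)-(18) directly, and starts from least squares on the observed PSU's. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def fit_censored_ml(X, y, s, kappa, observed):
    """Method (D): maximize l_cens (5) numerically; sigma^2 is optimized on the log scale."""
    def negloglik(theta):
        beta, sigma2 = theta[:-1], np.exp(theta[-1])
        tau = sigma2 + s
        mean = X @ beta
        ll_obs = -0.5 * (np.log(2 * np.pi * tau) + (y - mean) ** 2 / tau)
        ll_cens = norm.logcdf((kappa - mean) / np.sqrt(tau))
        return -np.sum(np.where(observed, ll_obs, ll_cens))

    # starting values: least squares on the observed PSUs (the naive complete-data fit)
    b0, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)
    r = y[observed] - X[observed] @ b0
    s0 = max(np.var(r) - np.mean(s[observed]), 0.05)
    res = minimize(negloglik, np.append(b0, np.log(s0)), method="BFGS")
    return res.x[:-1], float(np.exp(res.x[-1]))


def censored_information(X, s, kappa, beta, sigma2):
    """Per-PSU information matrix (19) assembled from the blocks I11, I12, I22."""
    m, p = X.shape
    tau = sigma2 + s
    xi = (kappa - X @ beta) / np.sqrt(tau)
    Phi, phi = norm.cdf(xi), norm.pdf(xi)
    mills = np.exp(norm.logpdf(xi) - norm.logcdf(xi))   # phi/Phi, stable for very negative xi
    I11 = (X.T * ((1 - Phi + xi * phi + phi * mills) / tau)) @ X / m
    I12 = X.T @ (phi * (1 + xi ** 2 + xi * mills) / (2 * tau ** 1.5)) / m
    I22 = np.mean((2 * (1 - Phi) + xi * phi * (1 + xi ** 2 + xi * mills)) / (4 * tau ** 2))
    info = np.empty((p + 1, p + 1))
    info[:p, :p], info[:p, p], info[p, :p], info[p, p] = I11, I12, I12, I22
    return info
```

As a check on the block formulas, when the thresholds are far below the data ($\xi_i \to -\infty$), `censored_information` reduces to the complete-data Fay-Herriot information: $m^{-1}\sum x_i^{\otimes 2}/\tau_i$, a zero off-diagonal block, and $m^{-1}\sum (2\tau_i^2)^{-1}$. The inverse of the matrix, divided by $m$, approximates the covariance of $(\hat\beta^D, (\hat\sigma^D)^2)$.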

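5.1 Small Area Estimates and MSE

How would the censored-data likelihood be used to generate small-area estimators? The most natural generalization of the EBLUP idea is to estimate $\vartheta_i$ in PSU's without sample data as $x_i^{tr}\hat\beta^D$. In PSU's with sampled data, we estimate it by

$$\hat\vartheta_i^D \equiv \Big( x_i^{tr}\beta + I_{[y_i \ge \kappa_i]} E(u_i \mid y_i) + I_{[y_i < \kappa_i]} E(u_i \mid y_i < \kappa_i) \Big)\Big|_{(\beta, \sigma^2) = (\hat\beta^D, (\hat\sigma^D)^2)} \qquad (20)$$

Here the superscript D refers to the censored-data ML of method (D) discussed in Section 3. The SAE $\hat\vartheta_i^D$ necessarily takes a different form according to whether the response $y_i$ in the $i$th PSU is left-censored. The conditional expectations within $\hat\vartheta_i^D$ are $E(u_i \mid y_i) = (y_i - x_i^{tr}\beta_0)\sigma_0^2/\tau_i$ and

$$E(u_i \mid y_i < \kappa_i) = \frac{\sigma_0^2}{\tau_i}\cdot\frac{\sqrt{\tau_i}\int_{-\infty}^{\xi_i} z\,\phi(z)\,dz}{\Phi(\xi_i)} = -\frac{\phi(\xi_i)\,\sigma_0^2}{\Phi(\xi_i)\sqrt{\tau_i}}.$$

Therefore,

$$\hat\vartheta_i^D = x_i^{tr}\hat\beta^D + I_{[y_i \ge \kappa_i]}\frac{(\hat\sigma^D)^2}{(\hat\sigma^D)^2 + s_i}(y_i - x_i^{tr}\hat\beta^D) - I_{[y_i < \kappa_i]}\frac{\phi(\hat\xi_i^D)\,(\hat\sigma^D)^2}{\Phi(\hat\xi_i^D)\sqrt{\hat\tau_i^D}} \qquad (21)$$

where $\hat\tau_i^D \equiv (\hat\sigma^D)^2 + s_i$ and $\hat\xi_i^D \equiv (\kappa_i - x_i^{tr}\hat\beta^D)/\sqrt{\hat\tau_i^D}$.

To calculate MSE's, we need to take into account the probability that the observation in the $i$th PSU has been left-censored, with $x_i$, $\kappa_i$, $n_i$ fixed. For $n_i > 0$ (as in the case of $MSE_i^A$ and $MSE_i^B$), the calculation is as follows:

$$MSE_i^D = E\Big( I_{[y_i \ge \kappa_i]}(\hat\vartheta_i^D - \vartheta_i)^2 + I_{[y_i < \kappa_i]}(\hat\vartheta_i^D - \vartheta_i)^2 \Big)$$

Then, using the consistency of the ML estimators, we find to top order

$$MSE_i^D \approx E\Big( I_{[y_i \ge \kappa_i]}\Big\{ \frac{\sigma_0^2}{\tau_i}(y_i - x_i^{tr}\beta_0) - u_i \Big\}^2 + I_{[y_i < \kappa_i]}\Big\{ \frac{\phi(\xi_i)\sigma_0^2}{\Phi(\xi_i)\sqrt{\tau_i}} + u_i \Big\}^2 \Big)$$
$$= E\Big( I_{[y_i \ge \kappa_i]}\Big(\frac{\sigma_0^2}{\tau_i}(u_i + e_i) - u_i\Big)^2 + I_{[y_i < \kappa_i]}\Big[ \Big(u_i - \frac{\sigma_0^2}{\tau_i}(u_i + e_i)\Big)^2 + \frac{\sigma_0^4}{\tau_i}\Big(\frac{u_i + e_i}{\sqrt{\tau_i}} + \frac{\phi(\xi_i)}{\Phi(\xi_i)}\Big)^2 \Big] \Big)$$

The second line holds because the cross-term in the expanded square multiplying $I_{[y_i < \kappa_i]}$ has expectation 0. Evaluating the final expectations now shows

$$MSE_i^D \approx \frac{s_i\sigma_0^2}{\tau_i} + \frac{\sigma_0^4}{\tau_i}\,\Phi(\xi_i)\Big( \frac{\int_{-\infty}^{\xi_i} z^2\phi(z)\,dz}{\Phi(\xi_i)} - \frac{\phi^2(\xi_i)}{\Phi^2(\xi_i)} \Big) = \frac{s_i\sigma_0^2}{\tau_i} + \frac{\sigma_0^4}{\tau_i}\Big( \Phi(\xi_i) - \xi_i\phi(\xi_i) - \frac{\phi^2(\xi_i)}{\Phi(\xi_i)} \Big) \qquad (22)$$

Formula (22) for $MSE_i^D$ indicates that this MSE is close to the (top-order) nominal Fay-Herriot MSE of $s_i\sigma_0^2/\tau_i$. Consider moderate censoring, i.e., the case where most of the values $\xi_i = (\kappa_i - x_i^{tr}\beta_0)/\sqrt{\tau_i}$ are less than $-0.5$, say. Our numerical experience in Table 1 shows that the bias term in $MSE^B$ is then quite small. Meanwhile, the function $\Phi(\xi) - \xi\phi(\xi) - \phi^2(\xi)/\Phi(\xi)$ is bounded between 0 and .083 for $\xi \le -.5$. In moderate-censoring settings, the comparison of $MSE_i^D$ with the MSE of predictors that use $\bar\sigma^2$ in place of $\sigma_0^2$ is therefore effectively between $s_i\sigma_0^2/\tau_i$ and $s_i(s_i\sigma_0^2 + \bar\sigma^4)/(\bar\sigma^2 + s_i)^2$. It is easily checked that the second of these is never smaller than the first. Their difference is $s_i^2(\bar\sigma^2 - \sigma_0^2)^2/\{(\bar\sigma^2 + s_i)^2\tau_i\}$, which is noticeable when $\bar\sigma^2$ is far from $\sigma_0^2$. In our numerical calculations for Table 1, we found that $\bar\sigma^2$ can easily differ from $\sigma_0^2$ by more than 5% when the censoring proportion $c$ is no more than 10%. We conclude that the mean-squared errors for the small-area estimators $\hat\vartheta_i^D$ are likely to be much better than for $\tilde\vartheta_i^B$ or $\tilde\vartheta_i^A$.

In the following Section, we test these theoretical predictions of finite-sample SAE and MSE behavior in a simulation study, using the design matrix and parameters of the SAIPE 1993 data (Citro and Kalton 2000). There we also compare the top-order formula (22), which has remainders $o(1)$, with a more elaborate formula (33) derived in the Appendix, which has remainders $O(1/m)$.

6 Comparative Simulation Study

We conducted a simulation study to check the performance of the estimation methodology presented in the previous sections. Our simulation design closely imitates the situation encountered in the U.S. Census Bureau's ongoing SAIPE project, described in greater detail below in Section 7.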
For simplicity and confidentiality, the covariates used in the simulation are pseudo-values, simulated once only from a multivariate normal distribution. This distribution has the same means and variances as the original covariates for all US counties used in the SAIPE 1993 log-rate model for poverty among school-age children related to sampled householders. The covariates are as described in Section 7. The coefficients used to generate the response variables are $\beta = (.860, .236, .33, .9, .393)$. As in the SAIPE project, the sample sizes $n_i$ are the actual US Current Population Survey (CPS) numbers of sampled households. We used them for subsets of the first $m$ alphabetically ordered US counties, after deleting Los Angeles county (by far the largest one). We deleted it because, in simulations not reported here with fixed values of $\sigma^2$ as small as .04, the L.A. county SAE was very erratic and distorted the summary measures of MSE. With $\beta$ fixed, we generated values $\{y_i\}_{i=1}^m$ in simulations with $N = 1000$ iterations, according to model (1) with $s_i = v_e/n_i$. We then left-censored the generated values as $y_i \mapsto \max(\kappa_i, y_i)$, where $\kappa_i = -\log n_i$. We explored various combinations of simulation parameter values $(m, \sigma^2, v_e)$. We display results only for $m = 100, 500, 1000$, cross-classified with four labelled combinations for $(\sigma_0^2, v_e)$:

Par:             1          2         3         4
(σ0², v_e):   (.5, 30)   (.5, 7)   (1, 30)   (1, 7)

These $\sigma_0^2$ values are somewhat larger than the values fixed in the SAIPE log-count and log-rate models described in Citro & Kalton (2000). They are nevertheless reasonable, being similar to the values $(\sigma^2, v_e)$ jointly fitted by maximum likelihood to the SAIPE log-rate FH model on 1993 data.

Table 2 shows the averages over strata of the small-area parameters, the simulated SAE biases, and the true MSE under methods A = FH, B = Badj and D = Cens. The first set of A, B, D columns displays SAE bias and the second set displays MSE. In all cases, method D yields by far the smallest bias, as well as the smallest range. The bias for methods A and B is generally positive, and much less so for method D. In terms of both bias and MSE of SAE, method B appears always inferior to A. Table 3 summarizes the mean and standard-error behavior of the maximum likelihood parameter estimates of $(\beta, \sigma^2)$ obtained by methods A, B and D. It covers the same range of simulations as Table 2, for parameter combination Par = 2.
(We calculated, but do not display, the corresponding results for the other parameter combinations, since the results were very similar.) Here also, method D performs best. As $m$ increases, the MLE clearly converges to the correct values. In our simulation, the censoring rate varies between 5% to 4%. Method A produces SAE's biased considerably above their targets, with bias-squared as a percentage of MSE ranging from 4 to 2%, and Method B did not materially improve the results. However, method D results in very accurate parameter estimates, at least for large samples.

Table 2: Average small-area parameters, censoring proportions, SAE biases and MSE's over 1000 simulation iterations, for various $(m, \sigma^2, v_e)$ combinations, with $\beta$ fixed at (.860, .236, .33, .9, .393) as described in the text.

m | Par | %Cens | θ | SAE Bias: A, B, D | MSE: A, B, D

A striking feature of Table 2 is that the MSE's under method B are systematically larger than those of method A. This holds despite the clear indication of Table 3 that the parameter estimators under method B are systematically closer to the truth (or to those of method D) than those of method A. The apparent paradox is resolved by recalling the SAE formulas for methods A and D. The formula in D explicitly adjusts the fixed-effect predictor downward for each unobserved PSU known to be below threshold, while the formula in methods A and B takes the fixed-effect predictor as is. The finding, then, is that the MSE and bias become worse if one uses the method-A SAE formula with the method-B or method-D parameter estimators. If the simulated FH model were the correct one, the Census Bureau's current SAIPE practice, method A, would clearly produce serious bias in SAE's, with MSE's too large by 0-20%. Note, however, that these biases and MSE's are expressed on the measurement scale of the underlying FH model, which in the case of SAIPE is logarithmic. Comparisons between the performance of Methods A, B, and D on real SAIPE data are given in Section 7 below.
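The resolution above turns on the form of the predictor rather than on the parameter estimates. This can be seen in a minimal Monte Carlo sketch for a single PSU, assuming the true parameters are known so that only the predictor form differs. The sketch compares the A-form predictor $\eta_i + \gamma_i(y_i - \eta_i)I_{[y_i \ge \kappa_i]}$ with the D-form predictor (21). It checks the empirical MSEs against (22) and against the top-order $MSE_i^A$ formula of Section 4.1, taking $\bar\sigma^2 = \sigma_0^2$ and $\Delta_i = 0$. With these inputs the two formulas differ by exactly $\sigma_0^4\phi^2(\xi_i)/(\tau_i\Phi(\xi_i))$. The function names and the parameter values in the test are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm


def g_censored(xi):
    """Phi(xi) - xi*phi(xi) - phi(xi)^2/Phi(xi): the censored-PSU factor in (22)."""
    Phi, phi = norm.cdf(xi), norm.pdf(xi)
    return Phi - xi * phi - phi ** 2 / Phi


def mse_d(sigma2_0, s, xi):
    """Top-order MSE (22) of the censored-data predictor."""
    tau = sigma2_0 + s
    return s * sigma2_0 / tau + sigma2_0 ** 2 / tau * g_censored(xi)


def mse_a(sigma2_0, s, xi, sigma2_bar, delta):
    """Top-order MSE of the complete-data (A) predictor; delta = x_i'(beta_bar - beta_0)."""
    tau = sigma2_0 + s
    Phi, phi = norm.cdf(xi), norm.pdf(xi)
    g = sigma2_bar / (sigma2_bar + s)
    return (sigma2_0 + (1 - Phi + xi * phi) * (tau * g ** 2 - 2 * sigma2_0 * g)
            + delta ** 2 * (1 - 2 * (1 - Phi) * g + (1 - Phi) * g ** 2)
            + 2 * delta * np.sqrt(tau) * phi * g * (s / (sigma2_bar + s) + sigma2_0 / tau))


def simulate_predictor_mse(sigma2_0, s, eta, kappa, n_rep=400_000, seed=0):
    """Empirical MSEs of the A-form and D-form predictors for one PSU, both with TRUE parameters."""
    rng = np.random.default_rng(seed)
    u = rng.normal(scale=np.sqrt(sigma2_0), size=n_rep)
    y = eta + u + rng.normal(scale=np.sqrt(s), size=n_rep)
    tau = sigma2_0 + s
    xi = (kappa - eta) / np.sqrt(tau)
    obs = y >= kappa
    shrunk = sigma2_0 / tau * (y - eta)
    pred_a = eta + np.where(obs, shrunk, 0.0)          # fixed-effect predictor taken as is
    pred_d = eta + np.where(obs, shrunk, -norm.pdf(xi) * sigma2_0 / (norm.cdf(xi) * np.sqrt(tau)))
    target = eta + u
    return np.mean((pred_a - target) ** 2), np.mean((pred_d - target) ** 2)
```

The function `g_censored` also gives a numerical check of the bound quoted after (22): $g(-0.5) \approx 0.0828$, and $g$ lies between 0 and that value for $\xi \le -0.5$.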

Table 3: Parameter Estimates and their SEs, for Par = 2 case.

m     Method  β̂0    β̂1       β̂2       β̂3       β̂4       σ̂²
100   A       ( )   (0.464)  (0.367)  (0.97)   (0.558)  (0.098)
      B       ( )   (0.459)  (0.360)  (0.93)   (0.546)  (0.097)
      D       ( )   (0.46)   (0.363)  (0.927)  (0.540)  (0.05)
500   A       ( )   (0.92)   (0.48)   (0.426)  (0.225)  (0.048)
      B       ( )   (0.93)   (0.44)   (0.420)  (0.222)  (0.047)
      D       ( )   (0.95)   (0.44)   (0.425)  (0.222)  (0.052)
1000  A       ( )   (0.36)   (0.00)   (0.322)  (0.47)   (0.033)
      B       ( )   (0.36)   (0.097)  (0.38)   (0.44)   (0.032)
      D       ( )   (0.37)   (0.098)  (0.322)  (0.46)   (0.035)

Table 4: Empirical MSE_D values averaged over i (MSED), in simulations of size m = 100, 500, 1000, along with relative differences between theoretical MSE and MSED (RD), and between estimated MSE and MSED (RB).

m     Par   MSED   RD   RB   RD2   RB2

Table 4 compares the empirical, theoretical, and estimated quantities MSE_D within the same SAIPE-style simulation as in the other Tables. The empirical MSEs using small-area estimators $\hat\vartheta_D$ are calculated directly, over the 1000 simulation iterations in each of three simulations, with m = 100, 500 and 1000 strata. The theoretical MSEs are calculated using formulas (22) and (33) with the true parameters $(\beta, \sigma^2, v_e)$ substituted, and the estimated MSEs from the same formulas with method-D parameter estimators substituted (based on fixed $v_e$). Each of these MSEs is then averaged over the m strata of the simulation. Column RD [respectively RD2] gives the relative difference between the theoretical MSE based on (22) [resp. (33)] and MSE_D; and column RB [resp. RB2] gives the relative differences between the MSE estimators, based on the corresponding formulas, and MSE_D. Table 4 shows that the MSE formulas (22) and (33) under-estimate the actual empirical MSE_D on average, by an amount which is no more than 10% for (22) and 5% for (33), for all combinations (m, Par) tried. The relative errors for estimated MSEs based on plugged-in rather than true parameter values are somewhat worse, but substantially so only for m = 100. None of the theoretical or estimated MSEs shows a clear decrease with m in the Table, but this is because the simulations with different m are considerably different. (For example, see the differences in censoring percentages across m displayed in Table 2.)

7 Real-data Comparisons

7.1 Log-rate model for SAIPE 1993 data

As described in the Introduction, our motivation for this paper came from the small-area estimation method of the SAIPE program based on a Fay-Herriot model for log-transformed county child-poverty response data. We present model-fitting results using methods A, B, and D on the SAIPE data for income-year 1993 using CPS samples aggregated across (...).
As described elsewhere (Slud 2003, 2004), the log-rate model for SAIPE 1993 data uses as response variable $y_i$ the log-transformed CPS-weighted ratio estimate of the county child poverty rate (among children aged 5-17 related to householders) in those counties appearing in the pooled CPS sample, and as predictors the following four log-transformed variables (LTAXRT, LSTMPRT, LFILRT, LCPRT):

LTAXRT: the logarithm of the 1993 county total of poor-child IRS exemptions over total child exemptions;
LSTMPRT: the logarithm of the ratio of the number of people receiving food stamps over the 1993 demographic estimate of county population;
LFILRT: the logarithm of the ratio of the county number of IRS child exemptions over the 1993 estimated resident child population;
LCPRT: the logarithm of the child poverty rate estimated from the previous decennial census, adjusted to the CPS universe definitions of resident householders and related children.

Here $n_i$ is the number of households sampled, even though the relevant sampled units would be children, because the count of CPS-sampled children was not directly available for the SAIPE data. The Fay-Herriot model (1) with this response and these predictors, where $\sigma_0^2$ is fixed (in 1993) at 0.04, $s_i = v_e/n_i$, and $(\beta, v_e)$ are the unknown parameters, is the county log-rate model, which differs from (but closely approximates) the model (for log counts of poor related children, by county) actually specified in SAIPE. Slud (2003, 2004) has studied the relationship of the log-rate model to the actual SAIPE log-count model, as well as the goodness of fit of small-area predictors based on the model, both to the CPS data and (in decennial census years) to the census county child-poverty rates (adjusted to CPS universe definitions).

In the 1993 SAIPE data, there were 1488 counties in the aggregated CPS sample, of which only 1184 had a positive number of sampled poor children. The 304 sampled counties with no sampled poor children, which were ignored in fitting the model, represent a censoring rate of 304/1488 = 20.4%. However, as indicated in Section 2, we model the dropping of counties with 0-counts by saying that the log ratios $y_i$ of the counts of poor children divided by $n_i$ are still normally distributed variates which are left-censored by the values $\kappa_i = \log(1/n_i)$, i.e., known only to take some value less than $\kappa_i$. Table 5 exhibits the maximum likelihood parameter estimates of $(\beta, v_e)$ obtained by Methods A, B, and D, with $\sigma_0^2 = .04$ fixed.
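The censoring mechanism just described can be sketched as follows: a county with zero sampled poor children contributes no log ratio, only the information that its latent log ratio lies below $\kappa_i = \log(1/n_i)$. The helper `censored_log_ratio` is hypothetical and simplifies the response to a raw count ratio, whereas the SAIPE response is a CPS-weighted ratio estimate. The last line reproduces the censoring-rate arithmetic quoted in the text.

```python
import math


def censored_log_ratio(poor_count, n_households):
    """Return (y_i, kappa_i, is_censored) for one sampled county.

    Simplified illustration: y_i = log(poor_count / n_i) when the count is
    positive; a zero count is recorded as left-censored at kappa_i = log(1 / n_i).
    """
    if n_households <= 0:
        raise ValueError("a sampled county must have n_i > 0")
    kappa = math.log(1.0 / n_households)
    if poor_count > 0:
        # the smallest positive count (1) gives y_i = kappa_i, so y_i >= kappa_i
        return math.log(poor_count / n_households), kappa, False
    return None, kappa, True


print(censored_log_ratio(3, 40))   # observed county
print(censored_log_ratio(0, 40))   # censored county: y_i unknown, below kappa_i
print(f"{304 / 1488:.1%}")         # censoring rate in the 1993 data
```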
Methods B and D yield very similar parameter estimates, except for the intercept and LSTMPRT coefficients and $v_e$. (The LFILRT coefficients have large standard errors, so they are not as different as they look.) However, as we now proceed to show, none of these models fits very well. As a preliminary indication, consider the predicted censoring percentage according to these models, with estimates plugged into formula (23): .09 for Method A, .096 for Method B, and .25 for Method D. Thus, although the corrections given by Methods B and D do bring the estimated value of c slightly higher, they still fall far below the observed censoring rate of .204.

Table 5: Parameters fitted by Method A=FH, B=Badj, and D=Cens, to SAIPE 1993 data with predictor means subtracted and $\sigma_0^2 = .04$ fixed.

        Int   LTAXRT  LSTMPRT  LFILRT  LCPRT
        β0    β1      β2       β3      β4      v_e
FH
Badj
Cens

Table 6: Parameters fitted by Method A=FH, B=Badj, and D=Cens, to SAIPE 1993 data with predictor means subtracted and $v_e$ fixed at the value obtained by ML over $(\beta, \sigma^2, v_e)$ within the left-censored log-likelihood (5).

        Int   LTAXRT  LSTMPRT  LFILRT  LCPRT
        β0    β1      β2       β3      β4      σ²
FH
Badj
Cens

One might say that the value .04 for $\sigma^2$, artificially fixed by a method documented in Kalton and Citro (2000) that involves fitting a linear model with the analogous predictors to the previous (in this case, the 1990) decennial census, is an obstacle to finding a close fit to the log CPS weighted ratio estimate by county that is being used as the response variable. For this reason, we re-fit the models by jointly maximizing the censored-data likelihood over $(\beta, \sigma^2, v_e)$, which can be accomplished by repeatedly applying a function coded in R, alternately maximizing (5) over $(\beta, v_e)$ for fixed $\sigma^2$ and over $(\beta, \sigma^2)$ for fixed $v_e$. The resulting joint estimates are $\sigma^2 = .277$, $v_e = 34.33$. Now fixing this $v_e$ value and re-estimating parameters using Methods A and B gives the parameter estimates summarized in Table 6. We fixed $v_e$ in this way because, according to our model, the censored-data likelihood most fully describes the data, and we are investigating whether this fully specified left-censored model fits adequately.
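A sketch of the alternating scheme just described, written in Python rather than R and simplified to an intercept-only mean so that it needs only the standard library. It assumes the left-censored log-likelihood (5) has the usual censored-normal form (constants dropped): an uncensored area contributes $-\tfrac12\log\tau_i - (y_i - \beta_0)^2/(2\tau_i)$ and a censored one $\log\Phi((\kappa_i - \beta_0)/\sqrt{\tau_i})$, with $\tau_i = \sigma^2 + v_e/n_i$. The golden-section search, search intervals and data are illustrative choices, not the authors' code. Each half-step is accepted only if it does not lower the log-likelihood, so the sequence of values is non-decreasing.

```python
import math
import random


def log_Phi(z):
    """log of the standard normal cdf; erfc keeps accuracy in the left tail."""
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def censored_loglik(beta0, sigma_sq, v_e, y, kappa, cens, n):
    """Assumed form of the left-censored FH log-likelihood (intercept-only mean,
    additive constants dropped)."""
    ll = 0.0
    for y_i, k_i, c_i, n_i in zip(y, kappa, cens, n):
        tau = sigma_sq + v_e / n_i
        if c_i:   # only known: y_i < kappa_i
            ll += log_Phi((k_i - beta0) / math.sqrt(tau))
        else:     # observed y_i: normal log-density
            ll += -0.5 * math.log(tau) - (y_i - beta0) ** 2 / (2.0 * tau)
    return ll


def golden_max(f, lo, hi, tol=1e-3):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:               # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                     # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def fit_alternating(y, kappa, cens, n, beta0, sigma_sq, v_e, rounds=3):
    """Alternately maximize over (beta0, v_e) with sigma_sq fixed, then over
    (beta0, sigma_sq) with v_e fixed; beta0 is profiled by an inner search."""
    def ll(b, s2, ve):
        return censored_loglik(b, s2, ve, y, kappa, cens, n)

    def best_b(s2, ve):
        return golden_max(lambda b: ll(b, s2, ve), -10.0, 10.0)

    cur = ll(beta0, sigma_sq, v_e)
    for _ in range(rounds):
        ve = golden_max(lambda t: ll(best_b(sigma_sq, t), sigma_sq, t), 1e-3, 100.0)
        b = best_b(sigma_sq, ve)
        cand = ll(b, sigma_sq, ve)
        if cand >= cur:           # accept only non-decreasing steps
            beta0, v_e, cur = b, ve, cand
        s2 = golden_max(lambda t: ll(best_b(t, v_e), t, v_e), 1e-3, 10.0)
        b = best_b(s2, v_e)
        cand = ll(b, s2, v_e)
        if cand >= cur:
            beta0, sigma_sq, cur = b, s2, cand
    return beta0, sigma_sq, v_e, cur


# Made-up data with true beta0 = -1.5, sigma^2 = 0.5, v_e = 7.
rng = random.Random(7)
m = 200
n = [rng.randint(5, 60) for _ in range(m)]
kappa = [math.log(1.0 / n_i) for n_i in n]
latent = [-1.5 + rng.gauss(0.0, math.sqrt(0.5)) + rng.gauss(0.0, math.sqrt(7.0 / n_i))
          for n_i in n]
cens = [y_i < k_i for y_i, k_i in zip(latent, kappa)]
y = [max(y_i, k_i) for y_i, k_i in zip(latent, kappa)]
start = censored_loglik(0.0, 1.0, 10.0, y, kappa, cens, n)
b_hat, s2_hat, ve_hat, ll_hat = fit_alternating(y, kappa, cens, n, 0.0, 1.0, 10.0)
print(round(b_hat, 2), round(s2_hat, 2), round(ve_hat, 1), ll_hat >= start)
```

Separating $\sigma^2$ from $v_e$ relies on variation in the $n_i$; with nearly equal sample sizes the two variance components would be poorly identified.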

Table 7: Limiting parameter estimates from Methods A=FH, B=Badj, and D=Cens, with $v_e$ fixed, if the left-censored model held with $n_i$ and predictor-variable covariances as in SAIPE 1993, and if the predictors were multivariate normally distributed, identically across counties.

        Int   LTAXRT  LSTMPRT  LFILRT  LCPRT
        β0    β1      β2       β3      β4      σ²
FH
Badj
Cens

Again we can check whether any of the fitted models in Table 6 gives adequate estimates of the censoring proportion c. The respective plug-in estimated values of c from these models are .095 for Method A, .099 for B, and .22 for D. Thus, none of the estimation methods with the alternatively chosen value for $v_e$ provides a close estimate of c.

Another indication of the (lack of) fit of the fully specified left-censored FH model to the data is given by the discrepancies between the Method B and Method D estimators. The formulas for the limiting values $\beta^*, \sigma^{2*}$ of the Method A estimators and the corresponding Method-B adjusted values $\beta^* - \delta_\beta$, $\sigma^{2*} - \delta_{\sigma^2}$ can be evaluated numerically, analogously to the calculation done for Table 1, if the centered covariates $X_i$ are treated as being multivariate normally distributed with means 0 and variances estimated from their empirical covariance matrix based on the 1488 SAIPE counties in 1993. The results of the calculation are given in Table 7. We can see clearly that if the left-censored FH model held precisely with the parameters fitted by Method B in Table 6, and if the predictor variables were (iid) multivariate normal, then the Method B and Method D parameter estimates would agree extremely closely in large-m samples. This is something we saw in the m = 1000 simulations in Section 6, e.g. in Table 3, but definitely do not observe in the actual SAIPE 1993 data. The combined failure of the fitted models in Tables 5 and 6 to provide accurate estimators of c, or to match closely between Methods B and D, strongly suggests that the left-censored FH model does not adequately fit the SAIPE 1993 data.
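The plug-in censoring check used in this subsection can be sketched as follows, taking formula (23) to be $c = m^{-1}\sum_i \Phi\big((\kappa_i - x_i^{tr}\beta)/\sqrt{\sigma^2 + v_e/n_i}\big)$, as it appears in the Appendix, with fitted $(\beta, \sigma^2, v_e)$ substituted. The function names and toy inputs are hypothetical; for the SAIPE 1993 fits, the observed rate to compare against is $304/1488 \approx .204$.

```python
import math


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predicted_censoring(X, n, beta, sigma_sq, v_e):
    """Plug-in estimate of the censoring proportion c, thresholds kappa_i = log(1/n_i)."""
    total = 0.0
    for x_i, n_i in zip(X, n):
        xb = sum(b * x for b, x in zip(beta, x_i))
        kappa = math.log(1.0 / n_i)
        total += Phi((kappa - xb) / math.sqrt(sigma_sq + v_e / n_i))
    return total / len(n)


def observed_censoring(cens):
    """Fraction of sampled areas with a censoring indicator set."""
    return sum(cens) / len(cens)


# Toy check: if x_i'beta equals kappa_i, the single term is Phi(0) = 1/2.
print(predicted_censoring([[1.0]], [4], [math.log(0.25)], 0.04, 10.0))   # 0.5
```

A fitted model whose predicted c sits well below the observed fraction, as in the .09 to .25 versus .204 comparisons above, under-predicts how often areas fall below threshold.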
Given that the predictor variables do linearly predict the log-rate responses very strongly (Kalton and Citro 2000, Slud 2003), how are we to understand the lack of fit? We cannot yet exclude the possibility either that the underlying county random effects $u_i$ are non-normal or that the sample sizes $n_i$ figuring in the thresholds $\kappa_i$ do not behave independently of the responses $y_i$ conditionally given the covariates $x_i$. So we proceed by examining the behavior of the above-threshold residuals and comparing it to the model predictions.

Figure 1: Histograms for residuals from Method D ($v_e$ fixed) and Method A ($\sigma^2 = .04$ fixed) fits for observed SAIPE 1993 counties with non-zero counts of poor children, overlaid with estimated county-averaged conditional density for uncensored observations, as described in text. Upper panel: "Overlay of Histogram for Residuals from Method D Fit and county averaged conditional density for uncensored obs"; lower panel: the same for the Method A fit. Vertical axes: scaled relative frequency; horizontal axes: residuals from the Method D (upper) and Method A (lower) fits.

We consider next a diagnostic for lack of fit based on graphical comparison between the histogram of residuals from the Method-D or Method-A model and the corresponding conditional density constructed for the observed ($n_i > 0$) population of counties. Figure 1 shows, respectively, histograms of the residuals $y_i - x_i^{tr}\hat\beta_D$ from Method D (with fixed $v_e = 34.33$) and $y_i - x_i^{tr}\hat\beta_A$ from Method A (with $\sigma^2 = .04$ fixed) for the 1184 uncensored counties, i.e., counties with $y_i > \kappa_i$. If the left-censored Fay-Herriot model were valid, and the parameters $\beta_0, v_e, \sigma_0^2$ known, then in each county for which $y_i > \kappa_i$ is known to have occurred, the conditional density of $y_i - x_i^{tr}\beta_0$ would be

$$\tau_i^{-1/2}\, I_{[t \ge \kappa_i - x_i^{tr}\beta_0]}\, \frac{\phi(t/\sqrt{\tau_i})}{1-\Phi(\xi_i)}, \qquad \tau_i = \sigma_0^2 + s_i, \quad \xi_i = \frac{\kappa_i - x_i^{tr}\beta_0}{\sqrt{\tau_i}}.$$

It follows that the displayed histogram ought to be close to the average conditional density over all such counties,

$$\frac{1}{1184}\sum_{i:\, y_i > \kappa_i} \tau_i^{-1/2}\, I_{[t \ge \kappa_i - x_i^{tr}\beta_0]}\, \frac{\phi(t/\sqrt{\tau_i})}{1-\Phi(\xi_i)}.$$

Therefore, we overlaid the histogram with this density, where the respective estimates $(\hat\beta_D, \hat\sigma^2_D, 34.33)$ and $(\hat\beta_A, .04, \hat v_e^A)$ are substituted for the parameters $(\beta_0, \sigma^2, v_e)$. The resulting picture is Figure 1. Although not perfect, because of a slight skewness and mean shift to the left, the fit of the theoretical conditional densities to the histograms in these graphs is strikingly good. Both of these graphs relate only to the behavior of above-threshold response values $y_i$, and they differ in that the population-wide parameters in the upper (Method D) panel are estimated from all the data, including the below-threshold observations, while in the lower (Method A) panel only the above-threshold observations were used in fitting.
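A sketch of the overlaid conditional density and of the truncated-normal mean of an above-threshold residual, $E[y_i - x_i^{tr}\beta_0 \mid y_i > \kappa_i] = \sqrt{\tau_i}\,\phi(\xi_i)/(1-\Phi(\xi_i))$, assuming known parameters with $\tau_i = \sigma_0^2 + s_i$. The county list is hypothetical, and the trapezoid integration is only a numerical check that each conditional density integrates to one and has the stated mean.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cond_density(t, xb, kappa, tau):
    """Density of the residual t = y_i - x_i'beta_0 given y_i > kappa_i."""
    if t < kappa - xb:
        return 0.0
    xi = (kappa - xb) / math.sqrt(tau)
    return phi(t / math.sqrt(tau)) / (math.sqrt(tau) * (1.0 - Phi(xi)))


def avg_cond_density(t, counties):
    """County-averaged conditional density over uncensored counties (xb, kappa, tau)."""
    return sum(cond_density(t, *c) for c in counties) / len(counties)


def truncated_mean(xb, kappa, tau):
    """Theoretical mean of an above-threshold residual."""
    xi = (kappa - xb) / math.sqrt(tau)
    return math.sqrt(tau) * phi(xi) / (1.0 - Phi(xi))


def trapezoid(f, a, b, steps=20000):
    """Composite trapezoid rule on [a, b]."""
    h = (b - a) / steps
    return h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, steps)) + 0.5 * f(b))


counties = [(-1.0, -2.0, 1.0), (-1.5, -3.0, 0.6)]   # hypothetical (xb, kappa, tau)
lo = min(k - xb for xb, k, _ in counties)
area = trapezoid(lambda t: avg_cond_density(t, counties), lo, 10.0)
c0 = counties[0]
mean_num = trapezoid(lambda t: t * cond_density(t, *c0), c0[1] - c0[0], 10.0)
print(round(area, 3), round(mean_num, 4), round(truncated_mean(*c0), 4))
```

Because each density puts zero mass below its own threshold $\kappa_i - x_i^{tr}\beta_0$, the average density has a ragged left edge, one reason the histogram comparison is informative about the thresholds rather than only about normality.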
Partly as a result, the empirical average of the above-threshold residuals $y_i - x_i^{tr}\hat\beta_D$ for the model depicted in the upper panel was 0.30, which corresponded fairly well to the estimated theoretical average

$$\frac{1}{1184}\sum_{i:\, y_i > \kappa_i} \frac{\phi(\hat\xi_i^D)}{1-\Phi(\hat\xi_i^D)}\,\Big(\hat\sigma^2_D + \frac{v_e}{n_i}\Big)^{1/2} = .90,$$

while the empirical average of the Model A residuals from the lower panel was more sharply discrepant from its estimated theoretical average

$$\frac{1}{1184}\sum_{i:\, y_i > \kappa_i} \frac{\phi(\hat\xi_i^A)}{1-\Phi(\hat\xi_i^A)}\,\Big(.04 + \frac{\hat v_e^A}{n_i}\Big)^{1/2} = .60.$$

Here $\hat\xi_i^D$ and $\hat\xi_i^A$ denote $\xi_i$ evaluated at the respective fitted parameters. Overall, the diagnostics of fit suggest that the lack of fit of the left-censored FH model to the SAIPE 1993 data is largely due to the failure of the left-censoring threshold $\kappa_i$ in a FH model to describe the phenomenon of observing 0 poor children in a sampled county, rather than to a failure of distributional assumptions concerning random effects.

8 Summary and Conclusions

We have shown, theoretically and through simulations, the considerable bias and inflation of MSE that can result from ignoring left-censoring in a Fay-Herriot model, by comparison with an estimation methodology (for both population and small-area parameters) based on a censored-data likelihood. We have provided a method B (estimating equations (13)-(15)) of adjusting the parameter estimators derived by ignoring left-censoring, a method which provides answers extremely close to the censored-data ML estimators when censoring is up to about 10%. As shown in the example of Section 7, this result provides a useful check on the correctness of the model assumptions. Theoretical results, including formulas and estimators for the MSE of SAEs derived from censored-data (method D) ML estimators, are corroborated by a simulation study presented in Section 6. In the left-censored simulations, SAEs which adjust fixed-effect predictors downward in areas known to be left-censored clearly outperform SAEs which do not. As with other small-area methods, the results presented here are unavoidably parametric. Yet in the motivating SAIPE example, we have seen that the lack of fit is apparently due not to the failure of distributional assumptions, but to the simple model of independent sampling and left-censoring. In that example, there is in fact some clustering of samples: the CPS samples clusters of four nearby housing units, which are more similar to one another than are area-wide units.
Thus, in areas with small samples, there might be a so-far unmodelled tendency for the below-threshold proportion to be larger than allowed for in model (1). In future work, we will study models with parameters for inflation of this below-threshold category.

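9 Appendix

9.1 Regularity Conditions

It is assumed throughout the paper that the response variables $y_i$, $i = 1, \ldots, m$, satisfy the model (1), and also that:

(a) The random vectors $x_i$ and sample sizes $n_i$ are either uniformly bounded or are realizations of independent identically distributed variates with finite fourth moments.

(b) Whether random or not, the vectors $x_i \in \mathbb{R}^p$ are such that as m gets large, $E(m^{-1}\sum_i \|x_i\|^4)$ are uniformly bounded, and, with probability approaching 1 as $m \to \infty$, $m^{-1}\sum_i x_i^{\otimes 2}$ is a positive definite matrix, where for a column vector v we denote $v^{\otimes 2} = v v^{tr}$.

(c) The following large-sample (almost-sure) limits exist as $m \to \infty$, with error terms $O_P(m^{-1/2})$:
$$\lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^m \frac{x_i^{\otimes 2}}{\sigma_0^2 + s_i}, \qquad \lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^m \frac{1}{2(\sigma_0^2 + s_i)^2}.$$

(d) For each compact subinterval J of $(0, \infty)$, the following large-sample limits exist uniformly over $t \in J$:
$$\lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^m \frac{x_i^{\otimes 2}}{(t+s_i)^k}\Big(1 - \Phi\Big(\frac{\kappa_i - x_i^{tr}\beta_0}{\sqrt{\sigma_0^2 + s_i}}\Big)\Big), \quad k = 1, 2, 3,$$
$$\lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^m \frac{1}{(t+s_i)^k}\Big(1 - \Phi\Big(\frac{\kappa_i - x_i^{tr}\beta_0}{\sqrt{\sigma_0^2 + s_i}}\Big)\Big), \quad k = 0, 1, 2, 3,$$
$$\lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^m \frac{x_i^{j}\,\kappa_i^{1-j}}{(t+s_i)^k}\,\phi\Big(\frac{\kappa_i - x_i^{tr}\beta_0}{\sqrt{\sigma_0^2 + s_i}}\Big), \quad j = 0, 1, \ k = 1, 2, 3.$$

Moreover, the limiting proportion censored must be strictly less than 1:
$$c \equiv \lim_{m\to\infty} \frac{1}{m}\sum_{i=1}^m \Phi\Big(\frac{\kappa_i - x_i^{tr}\beta_0}{\sqrt{\sigma_0^2 + s_i}}\Big) < 1 \tag{23}$$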

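9.2 Proof of Theorem 4.1

Throughout, $\tau_i = \sigma_0^2 + s_i$ and $\xi_i = (\kappa_i - x_i^{tr}\beta_0)/\sqrt{\tau_i}$. Beyond the regularity conditions, the Theorem assumes that the quantity c is close (but not necessarily equal) to 0. The regularity conditions directly imply the (uniform) large-sample convergence of both $\hat\beta(t)$ and $\tilde\beta(t)$ to

$$\beta^*(t) \equiv \beta_0 + \Big(\lim_{m\to\infty} \frac{1}{m}\sum_i \frac{x_i^{\otimes 2}}{t+s_i}\,(1-\Phi(\xi_i))\Big)^{-1} \lim_{m\to\infty} \frac{1}{m}\sum_i \frac{x_i\sqrt{\tau_i}\,\phi(\xi_i)}{t+s_i} \tag{24}$$

and also the convergence of $\hat\beta'(t) = \frac{d}{dt}\hat\beta(t)$ and of $\tilde\beta'(t) = \frac{d}{dt}\tilde\beta(t)$ to

$$\frac{d}{dt}\beta^*(t) = \Big(\lim_{m\to\infty} \frac{1}{m}\sum_i \frac{x_i^{\otimes 2}}{t+s_i}\,(1-\Phi(\xi_i))\Big)^{-1} \Big\{ -\lim_{m\to\infty} \frac{1}{m}\sum_i \frac{x_i\sqrt{\tau_i}\,\phi(\xi_i)}{(t+s_i)^2} + \lim_{m\to\infty} \frac{1}{m}\sum_i \frac{x_i^{\otimes 2}}{(t+s_i)^2}\,(1-\Phi(\xi_i))\,(\beta^*(t) - \beta_0) \Big\} \tag{25}$$

and of the second derivatives $\hat\beta''(t)$, $\tilde\beta''(t)$, both to the same limit. Now, restricting attention to a small neighborhood of t values (not depending on m) on which the minimum of the right-hand side of (10) is unique, we find by differentiation that the minimizer $\tilde\sigma^2$ is determined as the root of the function

$$\lim_{m\to\infty} \frac{1}{m}\sum_i \Big[ (1-\Phi(\xi_i))\,\frac{t - \sigma_0^2 - \{x_i^{tr}(\beta^*(t) - \beta_0)\}^2}{(t+s_i)^2} - \frac{\sqrt{\tau_i}\,\{x_i^{tr}(\beta_0 - \beta^*(t)) + \kappa_i - x_i^{tr}\beta^*(t)\}}{(t+s_i)^2}\,\phi(\xi_i) \Big]$$

whose derivative also has a uniform limit. Under the hypotheses of Theorem 4.1, $\hat\beta(t) \to_P \beta^*(t)$ and $\tilde\sigma^2$ is close to $\sigma_0^2$, and by (24) and the boundedness away from 1 of c in (23),

$$\beta^*(\tilde\sigma^2) - \beta_0,\ \hat\beta - \beta_0 = O_P\Big(\lim_{m\to\infty} \frac{1}{m}\sum_i \frac{\sqrt{\tau_i}\,\phi(\xi_i)}{\tilde\sigma^2 + s_i}\Big).$$

It follows by inspection of the displayed function above with root $\tilde\sigma^2$ that, to top order,

$$\Big[\lim_{m\to\infty} \frac{1}{m}\sum_i \frac{1-\Phi(\xi_i)}{\tau_i^2}\Big]\,(\tilde\sigma^2 - \sigma_0^2) \approx \lim_{m\to\infty} \frac{1}{m}\sum_i \frac{\phi(\xi_i)}{\tau_i}\,\frac{\kappa_i - x_i^{tr}\beta_0}{\sqrt{\tau_i}}.$$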
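For an intercept-only mean ($x_i \equiv 1$, so $x_i^{\otimes 2} = 1$), the limit (24) reduces to a ratio of scalar sums, which makes it easy to check numerically. The sketch below, with made-up sample sizes and parameters, compares that expression with a large-m Monte Carlo version of the weighted least-squares fit to the uncensored areas only, using weights $1/(t+s_i)$. It assumes that this weighted fit is what the Method-A estimator at variance t reduces to for an intercept-only model; the function names are hypothetical.

```python
import math
import random


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def limit_beta_intercept(t, beta0, sigma0_sq, v_e, n):
    """Intercept-only version of the limit (24), with kappa_i = log(1/n_i)."""
    num = den = 0.0
    for n_i in n:
        s_i = v_e / n_i
        tau = sigma0_sq + s_i
        xi = (math.log(1.0 / n_i) - beta0) / math.sqrt(tau)
        den += (1.0 - Phi(xi)) / (t + s_i)
        num += math.sqrt(tau) * phi(xi) / (t + s_i)
    return beta0 + num / den


def monte_carlo_beta(t, beta0, sigma0_sq, v_e, n, rng):
    """WLS intercept fitted to the uncensored areas only (the Method-A idea)."""
    num = den = 0.0
    for n_i in n:
        s_i = v_e / n_i
        y_i = beta0 + rng.gauss(0.0, math.sqrt(sigma0_sq)) + rng.gauss(0.0, math.sqrt(s_i))
        if y_i > math.log(1.0 / n_i):          # censored areas are simply dropped
            num += y_i / (t + s_i)
            den += 1.0 / (t + s_i)
    return num / den


rng = random.Random(11)
n = [rng.randint(5, 60) for _ in range(100000)]
t = 0.5
lim = limit_beta_intercept(t, -1.5, 0.5, 7.0, n)
mc = monte_carlo_beta(t, -1.5, 0.5, 7.0, n, rng)
print(round(lim, 3), round(mc, 3))   # both exceed beta0 = -1.5: upward bias
```

The positive gap $\beta^*(t) - \beta_0$ is the large-sample counterpart of the generally positive method-A bias seen in the simulations, and it vanishes as the thresholds move far below the mean.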
