Labour Force Survey in Belarus: determination of sample size, sample design, statistical weighting

Natallia Bokun
Belarus State Economic University, e-mail: nataliabokun@rambler.ru

Abstract

The first experience of forming sampling frames in Belarus for the Labour Force Survey (LFS) is analyzed. Various options for determining the sample size are shown. Some issues of sample design and estimation are considered.

Keywords: sample size, selection, three-stage sample, iterative weighting.

1 Introduction

In Belarus, until recently, data on the size and structure of the labour force were compiled once a year, when the balance of labour resources was calculated. The major sources of information on the labour market were continuous reporting by organizations, administrative sources and the census. Although this system measured the indicators of the economically active population in adequate detail, it could not provide monthly or annual estimates of the actual level of unemployment, which according to the 2009 Census was 6-7 times higher than the registered level. Nor did it allow employment to be estimated by age or occupational group, or employment status, underemployment, etc. to be determined. These factors created the need for a specialized Labour Force Survey.

The National Statistical Committee of the Republic of Belarus, together with foreign and national experts, is carrying out preparatory work on the implementation of the Labour Force Survey (LFS). In November 2011 a test sample survey was conducted. Since 2012 the LFS has been carried out on a regular basis. The purposes are:
- to obtain empirical statistics on the labour force, economically active population, employed and unemployed;
- to obtain empirical statistics on the labour force, employed and unemployed by sex, region, rural and urban area;
- to determine real labour force demand and supply.

Frequency of the results: quarterly and annual. LFS data will be widely used to analyse the labour market, to assess the actual level of unemployment, and to support management decisions in the field of employment.

The survey covers the whole country: urban and rural areas in each region. Private households are surveyed. Participation in the survey is voluntary. The target population comprises all residents aged 15-74 years.

2 Sample size

Calculating the sample size, on which the representativeness, duration and cost of the survey largely depend, is the most important stage of sampling. The recommended strategy is to apply the appropriate formula while taking into account several factors connected with sample precision, design effect (deff), household size and non-response. These factors are:
- the need to select a key indicator by which the sample size is calculated;
- the precision, i.e. the required relative sampling error;
- the desired confidence level;
- the estimated (or known) proportion of the population in the specified target group;
- the predicted coverage rate, or prevalence, for the specified indicator;
- the sample deff;
- the average household size;
- an adjustment for potential loss of sampled households due to non-response.

As a key indicator it is recommended to select one of the most important indicators for the survey and on its basis to estimate the maximum sample size, yielding an estimate for the smallest (not less than [...].5%) stratum of the population. Selection of the target group and key indicator includes the following stages:
1. Selection of two or three target populations that comprise small percentages of the total population (1-year, 2-year, 5-year age groups) (Multiple Indicator Cluster Survey Manual (2009), p. 4.8).
2. Review of important indicators based on these groups, ignoring indicators that have very low (less than 5%) or very high (more than 50%) prevalence.
3. The maximal indicator value, calculated for a target group (10-15% of the population), is 15-20%.
4. Do not pick from desirable low coverage indicators an indicator that is already acceptably low.

The key indicator used in the Belarusian LFS is the real unemployment rate (by the Census results). Target groups are economically active populations (rural, urban, by regions, 5-year groups).
The design effect (deff) describes the influence of the sample structure on the size of the sampling error; it is defined as the ratio of the sampling variance of the actual stratified cluster sample to that of a simple random sample of the same overall sample size. International statistical practice has shown that the optimal value of deff is 1.5 (Multiple Indicator Cluster Survey Manual (2009), p. 4.3-4.8), which may sometimes be high.

The following sample size formula is used (Bokun, N., Chernysheva, T. (1997), p. 44-53; Multiple Indicator Cluster Survey Manual (2009), p. 4.5-4.8, 4.[...]):

$$ n = \frac{4\, r\,(1 - r)\, f \cdot 1.1}{(0.12\, r)^2\, p\, n_h}, \qquad (1) $$

where $n$ is the required sample size for the key indicator; 4 is the factor to achieve the 95% level of confidence (t-criterion); $r$ is the predicted prevalence for the key indicator; 1.1 is the factor needed to raise the sample size by 10% for non-response; $f$ is the symbol for deff (1.5); $0.12\,r$ is the recommended relative sampling error (95% level of confidence); $p$ is the proportion of the total population upon which the indicator $r$ is based; $n_h$ is the average household size.

Several types of sample size calculation were carried out:
- random selection for the rural and urban population of each region;
- random selection for Belarus (for target groups);
- random selection for each region;
- stratified sampling for each region.

In the first variant the small surveyed group is the economically active population; in the second it is the economically active population in a particular age range (15-20, 20-24 or 15-74 years). In the third and fourth variants the key indicator is the unemployment rate for the total population: the proportion of unemployed in the population aged 15-74 years. In this case there is no need to use small surveyed groups in the calculation; to determine the sample size for each area the classic formula of sampling theory is used (Bokun, N., Chernysheva, T. (1997), p. 7-50, 44-53), adjusted for deff, non-response and the average number of persons aged 15-74 per household. Examples of sample size determination are given in Tables 1-3.

Table 1. Sample size for LFS

Columns, in order: variant; target group (group size in parentheses); real unemployment rate, %, r; target group size relative to the total population, p, and to the 15-74-year group, p'; average household size, n_h; average number of persons aged 15-74 per HH; predicted sample size by the formula above, in general form and with f = 1.5.

Economically active population aged 20-24 (565833): 6067 0.7 5.95 7.5.43.94 8860 8860
Economically active population aged 15-74 in rural areas (0567): 69346 6.6.06 4.0.43.94 638 605
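The sample size formula above is easy to check numerically. The following Python sketch is only an illustration: the function name `required_households` and the input values are hypothetical and are not taken from the survey, while the default constants (factor 4, deff 1.5, non-response factor 1.1, relative error 0.12) follow the definitions given with the formula.

```python
def required_households(r, p, n_h, deff=1.5, rel_error=0.12,
                        nonresponse_factor=1.1, confidence_factor=4.0):
    """Sample size in households for a key indicator with prevalence r.

    r   : predicted prevalence of the key indicator (0 < r < 1)
    p   : share of the total population on which r is based
    n_h : average household size
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    if p <= 0.0 or n_h <= 0.0:
        raise ValueError("p and n_h must be positive")
    numerator = confidence_factor * r * (1.0 - r) * deff * nonresponse_factor
    denominator = (rel_error * r) ** 2 * p * n_h
    return numerator / denominator


# Hypothetical inputs, chosen only to show the mechanics of the formula.
n = required_households(r=0.10, p=0.10, n_h=2.5)
print(f"required sample size: {n:.0f} households")

# A smaller target group (smaller p) requires a proportionally larger sample.
print(f"with half the target group: {required_households(0.10, 0.05, 2.5):.0f}")
```

The sketch makes the sensitivity of the result visible: the required size is inversely proportional to p and n_h, which is why small target groups drive the overall sample size.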

Table 2. Sample size for LFS. Variant 3

Region            N         Unemployed   w       h      n, without deff   n, with deff
Brest region      0737      50065        0.047   .9     350               3380
Vitebsk region    979845    3708         0.038   .87    4480              43
Gomel region      398       46840        0.04    .89    40                3946
Grodno region     8963      3757         0.038   .87    4474              4308
Minsk             53844     5693         0.037   .06    49                4043
Minsk region      387       37345        0.033   .94    4997              48
Mogilev region    868907    385          0.044   .97    365               353
Total             75885     9799         0.040   .94    9397              833

N: population aged 15-74; Unemployed: number of unemployed; w: proportion of unemployed in the population aged 15-74 years; h: average number of persons aged 15-74 per HH; n: sample size, number of households. Without deff: relative standard error 0.06, relative limiting error 0.12. With deff: relative standard error 0.075, relative limiting error 0.15.

Table 3. Sample size for LFS. Variant 4

Region            N, urban   N, rural   w, urban   w, rural   n, without deff   n, with deff
Brest region      785        3450       0.048      0.043      987               98
Vitebsk region    77698      547        0.039      0.035      88                44
Gomel region      844646     888        0.040      0.044      55                3788
Grodno region     589695     39568      0.04       0.03       773               460
Minsk             53844                 0.037                 4                 637
Minsk region      636        4870       0.034      0.033      570               3855
Mogilev region    67056      98346      0.044      0.046      09                334
Total             5705730    80655      0.040      0.038      903               8657

N: population aged 15-74; w: proportion of unemployed in the population aged 15-74 years; n: sample size, number of households. In both columns the relative standard error is 0.06 and the relative limiting error is 0.12.

Calculation results for the different variants show that the required annual sample size is 26-29 thousand households, or 28 thousand on average. Without taking non-response into account the sample size is 22 thousand. The predicted sampling fraction is therefore 0.6%, or 22000 HH. It is planned to survey 7000 HH on a quarterly basis.

3 Sample design

A territorial three-stage sample is used: the primary unit is a city or village council; the secondary unit is a census enumeration district or village (zone); the final sampling unit is a household.
As a sampling frame for each stage of selection, data sets built from the 2009 Census are used:
- the set of cities in each region (first stage);
- the set of village councils in each region (first stage);
- census enumeration districts in each selected city (second stage);
- villages (settlements) in each selected village council (second stage);
- the totality of households in each census enumeration district and village (third stage).

Annual updating of the lists of enumeration areas and HH is assumed. At each stage units are selected by systematic sampling with probability proportional to population size or to the number of households. The variables used for stratification are administrative districts and urban/rural area.

The first stage. Towns, including urban settlements, and rural councils are selected. First, the towns that must be included in the survey are identified. The population-size criterion for their selection is calculated from the maximum interviewer workload (40 HH), the sampling fraction (k = n/N) and the average household size (2.43 according to the 2009 Census):

$$ \tilde{a} = \frac{40 \cdot 2.43}{0.006} = 16200. $$

Thus the sample includes all "large" cities with a population of 17 thousand people or more. Urban settlements with a population of less than 17 thousand people are selected systematically or randomly within each region. Their number depends on the pre-planned number of interviewers and on the proportions of the population in small and medium-sized towns (Table 4). There are 78 cities to be surveyed (43 large, 35 small and medium-sized), which represent over 38% of the total number of cities in Belarus.

Table 4. The composition of the sampling frame for LFS

Columns, in order: cities; village councils; selected enumeration areas; settlements in the village councils; selected households (urban, rural, total).

Brest region: 3 3 3 560 560 40
Vitebsk region: 4 0 34 47 70 00 390
Gomel region: 4 0 38 7 3040 00 440
Grodno region: 8 36 40 30 3560
Minsk: - 56 - 4480 - 4480
Minsk region: 3 6 8 33 40 90 460
Mogilev region: 8 3 5 560 968 358
Total: 78 68 48 80 9840 860 8000

The second stage. In urban areas, enumeration areas are selected according to the census; in rural settlements, according to the census or village council records. They are selected either according to a predetermined workload and number of interviewers, or by a combination of random and systematic selection with probability proportional to population size.

The third stage. In the selected sites in urban areas and settlements in rural areas, lists of residential apartments and housing estates are compiled. From the updated inventory of housing units, HHs in urban and rural areas are randomly selected.

4 Statistical weighting

The weighting methodology is based on assigning a corresponding statistical weight to each individual unit.
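Because the weights depend on the selection probabilities produced by the sample design, it may help to see how those probabilities arise. The Python sketch below illustrates two steps of the design in Section 3: the population threshold above which a town is included with certainty, and systematic selection with probability proportional to size among the remaining towns. The function names, the list of town sizes and the random seed are hypothetical; only the workload of 40 HH, the household size of 2.43 and the sampling fraction of 0.006 come from the text.

```python
import random


def certainty_threshold(max_hh_per_interviewer, avg_hh_size, sampling_fraction):
    """Population size above which a town is included with certainty."""
    return max_hh_per_interviewer * avg_hh_size / sampling_fraction


def pps_systematic(sizes, n_select, rng):
    """Systematic selection with probability proportional to size.

    Returns the selected indices and their inclusion probabilities.
    """
    total = float(sum(sizes))
    step = total / n_select
    if max(sizes) > step:
        raise ValueError("a unit exceeds the sampling interval; "
                         "treat it as a certainty unit first")
    start = rng.random() * step  # random start in [0, step)
    selected, upper, unit = [], 0.0, -1
    for k in range(n_select):
        target = start + k * step
        # Advance until the cumulative size passes the target point.
        while upper <= target and unit < len(sizes) - 1:
            unit += 1
            upper += sizes[unit]
        selected.append(unit)
    probabilities = {i: n_select * sizes[i] / total for i in selected}
    return selected, probabilities


threshold = certainty_threshold(40, 2.43, 0.006)
towns = [5000, 12000, 3000, 8000, 7000, 10000, 5000]  # hypothetical populations
small = [size for size in towns if size < threshold]
chosen, probs = pps_systematic(small, 3, random.Random(1))
print(round(threshold), chosen, probs)
```

The inclusion probability of a selected unit is the number of units drawn multiplied by its share of the total size, which is the quantity whose reciprocal enters the base weight.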

HH weights are calculated as the reciprocal of the overall selection probability:

$$ B_i = \frac{1}{p_1\, p_2\, p_3}, \qquad (2) $$

where $p_1$ is the probability of selecting a city or a rural soviet; $p_2$ is the probability of selecting each polling district in cities, zones and rural soviets; $p_3$ is the probability of selecting each household within the census enumeration district or zone.

For the case of non-response an additional array of HH is reserved, amounting to not less than 20% of the total sample (28000 × 0.2 ≈ 6000).

Individual weights are based on iterative weighting (Multiple Indicator Cluster Survey Manual (2009); Metodika provedenia bazovyh obsledovanij naselenija (1997)). One of two approaches can be used: a) a simplified method; b) iterative weighting (2 or more iterations).

The simplified method (SM) calculates individual weights from the size of gender-age groups, separately for rural and urban areas:

$$ k_{u,ij} = \frac{S_{ij}}{S_{b,ij}}, \qquad (3) $$

where $k_{u,ij}$ is the individual weight of the i-th gender-age group in the urban (rural) area of the j-th region; $S_{ij}$ is the size of the i-th gender-age group in the urban (rural) area in the total population; $S_{b,ij}$ is the size of the i-th gender-age group in the urban (rural) area that has been selected within the region.

Iterative weighting (IW) involves several iterations.

Iteration I:
a) weights are calculated separately by sex, urban and rural areas;
b) the first correction coefficient ($k_1$) is calculated; the weighting variables are region, sex, rural/urban;
c) the second correction coefficient ($k_2$) is calculated; the variables are region, sex, five-year groups.

Individual weights are equal within each region, five-year group and type of settlement.

Iteration II: the basic weight and the intermediate extrapolated data are adjusted further, using the same criteria as in the first iteration.

Final individual weights for each five-year group:

$$ K_i = B_b \cdot k_1 \cdot k_2 \cdot k_3, \qquad (4) $$

where $B_b = S_j / s_j$; $k_1 = S_t / S_t^E$; $k_2 = S_{jt} / S_{jt}^E$; $S_j$, $s_j$ are the population sizes in the j-th sex-age group based on the results of the Census and of the survey; $S_t$ is the population size in the t-th group by rural (urban) area and sex (from the Census data); $S_t^E$ is the extrapolated population size in the t-th group (by $B_b$); $S_{jt}$ is the population size in the jt-th sex-age rural (urban) group; $S_{jt}^E$ is the extrapolated population size in the jt-th group (by $B_b$ and $k_1$); $k_3$ is the generic correction coefficient calculated in the second iteration ($k_3 = k_{31} \cdot k_{32} \cdots k_{3n}$).

Preliminary results of iterative weighting for the unemployment rate and employment rate, calculated for Mogilev region (Table 5), show that the resulting sample is representative. Relative errors for the region do not exceed 7-8%: for the number of unemployed 6%, for the number of employed 1.8%, for the unemployment rate 6.6%.

Table 5. Indicators of sample representativeness. Mogilev region

Columns, in order: value in the general population; extrapolated value (SM, IW); error in absolute terms (SM, IW); error in % (SM, IW).

Total number of employed: 5056 5063 55876 9360 9645.8.87
Urban area: 400763 40333 496 99 069.95.57
- Male: 9868 94658 05508 640 0850 6.5 5.8
- Female: 07894 07675 07454 440 0. 0.
Rural area: 05754 03898 094 840 984.76 0.96
- Male: 57064 558 558 836 0.3 3.3 0.0006
- Female: 48690 48670 47686 003 984.0.06
Total number of employed, by sex:
- Male: 49933 49885 60736 0804 085 4.4 4.6
- Female: 56584 56346 5540 444 06 0.57 0.47
Total number of unemployed: 4064 4050 385 3 899 5.49 4.9
Urban area: 3995 3094 933 663 76 9.08 9.4
- Male: 9876 0046 838 495 665 8.3 9.06
- Female: 0 049 095 69 998 0.67 9.0
Rural area: 869 846 979 550 763 5.99 8.3
- Male: 6065 593 657 507 640 7.7 9.75
- Female: 564 485 607 43.63 4.69
Total number of unemployed, by sex:
- Male: 5940 5977 4953 987 04 3.96 4.0
- Female: 4684 4533 3558 6 975 8.3 7.9

The results of trial calculations and of testing the first version of the sampling methodology and software show that the main difficulties are associated with the use of different weighting schemes, determining the number of iteration steps, the evaluation of structural indicators of employment and unemployment, and the presence of atypical employment at the level of primary units (cities, districts).
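The weighting steps of this section can be illustrated with a short Python sketch. It computes a household base weight as the reciprocal of the product of the three stage probabilities and then applies successive correction coefficients, each equal to a control total divided by the corresponding extrapolated total. This is a generic raking (iterative proportional fitting) illustration under simplifying assumptions, not the exact scheme used in the survey: the function names, the four respondent records and all control totals are hypothetical.

```python
from collections import defaultdict


def base_weight(p1, p2, p3):
    """Household base weight: reciprocal of the overall selection probability."""
    return 1.0 / (p1 * p2 * p3)


def iterative_weighting(records, control_totals, iterations=2):
    """Rescale weights so weighted counts match the control totals.

    Each pass handles one variable at a time; the correction coefficient
    for a category is control total / extrapolated (weighted) total.
    """
    for _ in range(iterations):
        for variable, totals in control_totals.items():
            extrapolated = defaultdict(float)
            for rec in records:
                extrapolated[rec[variable]] += rec["weight"]
            for rec in records:
                category = rec[variable]
                rec["weight"] *= totals[category] / extrapolated[category]
    return records


# Hypothetical respondents, all with the same stage probabilities.
w0 = base_weight(0.5, 0.1, 0.2)
records = [
    {"area": "urban", "sex": "male", "weight": w0},
    {"area": "urban", "sex": "female", "weight": w0},
    {"area": "rural", "sex": "male", "weight": w0},
    {"area": "rural", "sex": "female", "weight": w0},
]
# Hypothetical census control totals for each weighting variable.
control_totals = {
    "area": {"urban": 300.0, "rural": 100.0},
    "sex": {"male": 220.0, "female": 180.0},
}
iterative_weighting(records, control_totals)
for rec in records:
    print(rec["area"], rec["sex"], round(rec["weight"], 2))
```

With real data the margins rarely agree after a single pass, which is why the number of iterations and the choice of weighting variables matter for the final estimates.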

5 Concluding remarks

The use of three-stage territorial sampling and iterative weighting provides reliable information on a large number of variables of the LFS conducted in Belarus. However, the standard errors calculated for the unemployment rate and the number of unemployed by gender-age group at regional level are rather high ([...]%). Moreover, given the workload and the limited number of interviewers ([...]00), it is not possible to survey the estimated number of HH (28000) every quarter. On the basis of the selected annual array of HH (28000), built by regions, four randomly generated sub-samples are formed, one for each quarter (each includes 7000 HH).

The annual array makes it possible to obtain sufficiently representative data at the level of the republic and the regions for most indicators (number of employed, unemployed, the economically active population, employment and unemployment rates, including by sex-age group and by urban and rural area). The quarterly array makes it possible to estimate the indicators with an acceptable degree of accuracy ([...]%) only at the level of the country. To improve representativeness by region, the survey indicators can be formed on the basis of three samples, as the average for three consecutive quarters. It is also possible to increase the number of iterations or to use alternative weighting schemes.

References

BOKUN, N., CHERNYSHEVA, T. (1997): Metody vyborochnyh obsledovanij. Minsk.
COCHRAN, W. (1997): Sampling techniques. John Wiley and Sons, Inc., New York.
Multiple Indicator Cluster Survey Manual. Eurostat, 2009.
Metodika provedenia bazovyh obsledovanij naselenija. Kiev, 2008.