EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY

HIGHER CERTIFICATE IN STATISTICS, 2012

MODULE 8 : Survey sampling and estimation

Time allowed: One and a half hours

Candidates should answer THREE questions.

Each question carries 20 marks. The number of marks allotted for each part-question is shown in brackets.

Graph paper and Official tables are provided.

Candidates may use calculators in accordance with the regulations published in the Society's "Guide to Examinations" (document Ex1).

The notation log denotes logarithm to base e. Logarithms to any other base are explicitly identified, e.g. $\log_{10}$.

Note also that $\binom{n}{r}$ is the same as ${}^{n}C_{r}$.

This examination paper consists of 5 printed pages, each printed on one side only. This front cover is page 1. Question 1 starts on page 2.

There are 4 questions altogether in the paper.

HC Module 8 2012
RSS 2012
1. The organisers of a large marathon selected a simple random sample of 1000 athletes from the 10 000 who entered. Athletes were notified as a condition of entry to the marathon that if they were selected they would have to provide a urine sample and answer questions about their training. The primary purpose of the sampling was to estimate the proportion p of all athletes using steroids and performance-enhancing drugs. Of the 1000 athletes tested, 35 were found to be positive.

(i) Give approximate 95% and 99% confidence intervals for the proportion of all athletes using steroids and performance-enhancing drugs.

(ii) For future marathons, the organisers' aim is to sample a sufficient number so that the half-width of the 95% confidence interval for p is less than 0.01. If the organisers were to use simple random sampling, show that this aim should be achieved with a sample of about 1300.

(iii) The 1000 sampled athletes were asked to give the average weekly mileage X they ran during the 8 weeks preceding the marathon. The mean of X in this sample was 46.8 and the sample standard deviation was 6.2 miles. Calculate a 95% confidence interval for the mean of X for the 10 000 runners who entered the marathon. Explain what this confidence interval shows.

(iv) The organisers would, for practical reasons, much prefer to use a systematic sample. Explain briefly what assumptions this would require, and what advice you might give them.

(v) If any athletes had refused to be tested for steroids and performance-enhancing drugs when selected, how might this have led to bias in the survey results? (3)
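The arithmetic behind the interval and sample-size parts of Question 1 can be sketched as follows. This is one possible working, not an official solution, and it rests on assumptions that are not stated in the paper: a normal approximation, a finite population correction (1 - n/N) in the interval calculations, the divisor n - 1 in the variance of the sample proportion, and no finite population correction in the sample-size calculation (which is the version that gives a figure near 1300). The function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

N = 10_000       # athletes who entered (from the question)
n = 1_000        # athletes sampled (from the question)
p_hat = 35 / n   # sample proportion testing positive


def z_value(level):
    """Two-sided standard normal critical value, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def proportion_ci(p_hat, n, N, level):
    """Approximate CI for a proportion under SRS (assumes fpc and n - 1 divisor)."""
    f = n / N
    se = sqrt((1 - f) * p_hat * (1 - p_hat) / (n - 1))
    half = z_value(level) * se
    return p_hat - half, p_hat + half


def srs_size_for_half_width(p_guess, half_width, level=0.95):
    """Sample size giving the target half-width, ignoring the fpc (assumption)."""
    return z_value(level) ** 2 * p_guess * (1 - p_guess) / half_width ** 2


def mean_ci(mean, sd, n, N, level=0.95):
    """Approximate CI for a population mean under SRS (assumes fpc)."""
    se = sqrt(1 - n / N) * sd / sqrt(n)
    half = z_value(level) * se
    return mean - half, mean + half


print("95% CI for p:", proportion_ci(p_hat, n, N, 0.95))
print("99% CI for p:", proportion_ci(p_hat, n, N, 0.99))
print("n for half-width 0.01:", srs_size_for_half_width(p_hat, 0.01))
print("95% CI for mean mileage:", mean_ci(46.8, 6.2, n, N))
```

Applying the finite population correction to the sample-size step as well would give a smaller required sample, so the choice of convention matters when comparing with the figure quoted in the question.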
2. A regional council wishes to assess the amount of hazardous waste produced by the 6231 manufacturing companies in its area. The companies are split into three strata: (1) basic metal industries; (2) food, textiles and mineral products; (3) other manufacturing. A simple random sample of companies was taken in each stratum, and for each company the total amount of hazardous waste produced in 2003 was measured.

                                     Hazardous waste ('000 tonnes)
Stratum (h)    $N_h$     $n_h$       $\bar{y}_h$      $s_h$
1                92        11          166.6          207.7
2              1612        61            7.7           14.7
3              4527       292            0.3            4.5
Total          6231       364

Define $N_h$, $n_h$, $\bar{y}_h$ and $s_h$.

Estimate the mean amount of hazardous waste produced per company and obtain an estimate of the standard error of your estimator. Give an approximate 95% confidence interval for the mean amount of hazardous waste per company.

Compute the sample sizes in the strata if proportional allocation had been used for this survey. Give brief reasons why a stratified sample using proportional allocation should give much more precise results than a simple random sample of 364 units. Explain briefly whether the allocation actually used has been effective in improving precision compared with a proportional allocation.

The council wishes to report estimates of the mean amount of hazardous waste produced per company for each of the three strata, as supporting information. Obtain a point estimate and an approximate 95% confidence interval for the mean amount of hazardous waste produced per company for the basic metal industries.
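The stratified calculations asked for in Question 2 can be sketched as below, using the figures from the table. This is an illustrative working, not an official solution. It assumes the usual stratified estimator with weights N_h / N, a finite population correction within each stratum, and a normal critical value of 1.96 for the interval; the function names are hypothetical, and the proportional allocation is left unrounded.

```python
from math import sqrt

# One tuple per stratum: (N_h, n_h, ybar_h, s_h), copied from the table.
strata = [
    (92, 11, 166.6, 207.7),    # basic metal industries
    (1612, 61, 7.7, 14.7),     # food, textiles and mineral products
    (4527, 292, 0.3, 4.5),     # other manufacturing
]


def stratified_mean(strata):
    """Weighted mean of the stratum means, with weights N_h / N."""
    N = sum(N_h for N_h, _, _, _ in strata)
    return sum(N_h / N * ybar for N_h, _, ybar, _ in strata)


def stratified_se(strata):
    """Estimated standard error of the stratified mean (assumes fpc per stratum)."""
    N = sum(N_h for N_h, _, _, _ in strata)
    var = sum(
        (N_h / N) ** 2 * (1 - n_h / N_h) * s_h ** 2 / n_h
        for N_h, n_h, _, s_h in strata
    )
    return sqrt(var)


def proportional_allocation(strata, n_total):
    """Unrounded stratum sample sizes proportional to N_h."""
    N = sum(N_h for N_h, _, _, _ in strata)
    return [n_total * N_h / N for N_h, _, _, _ in strata]


mean = stratified_mean(strata)
se = stratified_se(strata)
z = 1.96  # assumed normal critical value for an approximate 95% interval
print("stratified mean ('000 tonnes):", mean)
print("estimated standard error:", se)
print("approximate 95% CI:", (mean - z * se, mean + z * se))
print("proportional allocation:", proportional_allocation(strata, 364))
```

Rounding the proportional allocation to whole companies may leave the total one unit away from 364, so a final adjustment by hand could be needed.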
3. A researcher selects a simple random sample of 2055 farms from the 75 308 farms in a large region in a developing country, and the number of cattle (y) and the total area under cattle (x) were recorded for each farm. The results were as follows.

    Sample total number of cattle, $\sum y_i = 25\,751$
    Sample total area (hectares), $\sum x_i = 62\,989$

The sum of the squares is $\sum y_i^2 = 596\,737$. The total area under cattle in this region is 2 353 365.

Using the mean of the simple random sample, estimate the total number of cattle in the region, and the standard error of your estimator.

The researcher seeks your advice on how the supplementary information on the area under cattle in the region might be used to estimate the total number of cattle in the region. Discuss briefly why either a ratio or regression estimator could be appropriate for these data. Explain how you would decide whether to use a ratio or regression estimator. (3)

The researcher decides to use a ratio estimator, and asks you to comment on his results compared with those obtained in the first part of this question. You may assume that the ratio estimate of the total number of cattle in the region is 962 055, and its estimated standard error is 14 020.7. Comment on the relative standard errors. If it was suggested to you that the ratio estimate should not be used because it is biased, how would you reply?

Explain how and why stratification and clustering might be useful in such a survey, and what practical problems they could help to overcome.
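The simple-random-sample estimate of the total in Question 3, and its comparison with the quoted ratio estimate, can be sketched as follows. This is an illustrative working, not an official solution. It assumes the expansion estimator N times the sample mean, a sample variance with divisor n - 1, and a finite population correction; the ratio estimate and its standard error are taken as given in the question rather than recomputed, and the function names are hypothetical.

```python
from math import sqrt

N = 75_308          # farms in the region
n = 2_055           # farms sampled
sum_y = 25_751      # sample total number of cattle
sum_y_sq = 596_737  # sample sum of squares of cattle counts


def srs_total(N, n, sum_y):
    """Expansion estimate of the population total: N times the sample mean."""
    return N * sum_y / n


def srs_total_se(N, n, sum_y, sum_y_sq):
    """Estimated standard error of the expansion estimate (assumes fpc)."""
    s_sq = (sum_y_sq - sum_y ** 2 / n) / (n - 1)  # sample variance of y
    return N * sqrt((1 - n / N) * s_sq / n)


total = srs_total(N, n, sum_y)
se = srs_total_se(N, n, sum_y, sum_y_sq)

# Ratio estimate and standard error as quoted in the question.
ratio_total, ratio_se = 962_055, 14_020.7

print("SRS total estimate:", total)
print("SRS standard error:", se)
print("relative SE, SRS mean:", se / total)
print("relative SE, ratio estimator:", ratio_se / ratio_total)
```

Expressing each standard error as a fraction of its own estimate puts the two estimators on a common scale, which is one way to approach the comparison the question asks for.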
4. A careers advisor at a university is interested in finding out details relating to occupations of the university's graduates one, three and seven years after graduation. The university administration holds student records on a computer database and could easily provide a list of all students who graduated in a given year, with the following information for each student.

    Identification number
    Name
    Sex
    Mode of study (full-time or part-time)
    Level of qualification (post-graduate, first degree, other undergraduate)
    Faculty/subject
    Date awarded
    Current address (e-mail or home) or telephone number
    Domicile prior to year of entry

The careers advisor is wondering whether to use a cross-sectional sample survey of those students who graduated one, three and seven years ago, or a longitudinal study of a sample of last year's graduates.

Explain how a cross-sectional sample survey and a longitudinal study differ in their construction and use. (7)

The careers advisor decides to use a longitudinal study of a sample of last year's graduates. Discuss the potential advantages and disadvantages of a longitudinal study for the proposed survey.

Outline a possible survey methodology that the careers advisor could use to follow up on the occupations of graduates. You should discuss factors such as identification of the population, constructing a sampling frame, the sampling methodology, and how the study should be conducted, mentioning any difficulties the university is likely to encounter in attempting to contact students.