European Conference on Quality in Official Statistics (Q2016)
Madrid, 31 May-3 June 2016

Uncertainties in the Swedish PPI and SPPI

Kristina Strandberg 1, Anders Norberg 2, Marcus Fridén 3

1 Statistics Sweden, Stockholm, Sweden; kristina.strandberg@scb.se
2 Statistics Sweden, Stockholm, Sweden; anders.norberg@scb.se
3 Statistics Sweden, Stockholm, Sweden; marcus.friden@scb.se

Abstract

In estimating the uncertainties in a sample survey, it is easy to concentrate on the sampling error, since it can often be quantified numerically. In the Swedish PPI and SPPI there is an established formula for estimating the sampling error. The formula takes into account the multi-stage sampling design as well as the finite population correction. However, a significant part of the uncertainties in these surveys stems from non-sampling errors, such as specification error, measurement error and data processing error. An effort has been made to estimate the impact of these errors and, for each stratum, the error contribution from non-sampling sources. It is not clear how to combine the sampling error with the non-sampling error into an overall measure of total error. We propose a method to estimate the total survey error and to identify the strata with the largest total uncertainty.

Keywords: variance estimation, non-sampling error, total survey error.

1. Introduction

The work of exploring the uncertainties in the Producer and Import Price Index (PPI) and the Services Producer Price Index (SPPI) was prompted by a project at Statistics Sweden whose main task was mapping the uncertainties in the Swedish Gross Domestic Product (GDP). A better understanding of the uncertainties in the primary statistical sources would be a first step towards mitigating errors in the final product. Studies on the topic have been carried out in project form (Isaksson et al. 2014 and 2015). In the pilot study, a number of sensitivity analyses for the Swedish GDP were conducted. These investigated the impact on the GDP of different input data sources for product groups and industries, of varying price indexes, and of different deflation methods.
The main study further explored the new insights made possible by the sensitivity analyses. One focus was the SCM method, a method for automatic balancing of the national accounts (Calzaroni and Puggioni 1998). The method was first described in 1942 and takes its name from its creators: Stone, Champernowne and Meade. A short description of the method is given in chapter 2. Throughout the studies, variance estimation methods for the PPI and the SPPI were examined. The conclusions are summarized in chapter 3. Different ways to evaluate non-sampling errors were looked into, and these are discussed in chapter 4. A necessary effort to evaluate the total uncertainties of these indexes, though not scientifically stringent, is presented in chapter 5.

2. Price Indexes and the National Accounts

2.1. Producer and Import Price Index and Services Producer Price Index

The PPI is a monthly survey that aims to show the average change in prices at the producer and import stages for different product groups. Prices are measured at the first distribution stage, when products exit the production process from Swedish producers or when products cross the Swedish customs frontier entering the Swedish market. The quarterly SPPI aims to show the average price change for services produced in Sweden.

2.2. Population, Sample Selection and Estimation

Objects in the PPI and the SPPI are transactions. The target population is formed by all transactions referring to goods and services sold by a Swedish producer or imported by a Swedish importer. Market weights for product groups are calculated from the previous year's sales. Design weights, on the other hand, stem from the sampling design. These weights are combined to form the weights used in estimation. With this information, the price change between the current period and a base period is calculated for each of the product groups in the survey.
A sampling frame is constructed from combinations of companies and product groups taken from other surveys, such as Foreign Trade in Goods and the Structural Business Statistics. From the frame,
a first-stage πps sample of companies is selected within each stratum/product group. In a second stage, Statistics Sweden and the respondent cooperate in selecting a typical transaction to follow monthly. In total, about 9000 transaction prices were collected for the PPI/SPPI during 2015.

2.3. National Accounts

One important aspect of the national accounts (NA) is that all values are presented in current prices as well as in fixed (inflation-free) prices. Turning a current-price value into a fixed-price value is called deflation. Most fixed-price calculations in the Swedish NA are done by dividing values in current prices by corresponding price indexes. The indexes used are the Consumer Price Index (CPI), the PPI, the SPPI and the Building Price Index. Uncertainties in these price indexes are thereby carried directly into the NA.

There are three approaches for calculating the GDP in the NA: the production approach, the expenditure approach and the income approach. In theory these estimates should be of the same size. However, since they are calculated from different, independent sources, there is usually a discrepancy. A post-adjustment ("balancing") of these estimates is therefore necessary in order to comply with the requirements of the accounting system. Balancing the accounts is a big task and relies almost exclusively on subjective methods.

In short, the SCM is a generalized least squares method that uses inverted uncertainty measures for industries and product groups as weights. The method is used to minimize the sum of squared discrepancies between the estimates in the accounting system. A fully automatic and reproducible balancing method could be an important tool for creating benchmark values for the NA. The method has been used, for example, by Calzaroni and Puggioni (1998) and Chen (2006 and 2012).

3. Sampling Errors

Results from earlier investigations of the variance estimation methods were studied and confirmed by a simulation study. Two variance estimators were picked out as the best based on
their performance in the simulation. For sample allocation we use a formula by Wingren (2009) (Fig. 1); the finite population correction is, however, not necessary for this purpose. For quality measures, Rosén's (2010) variance estimation formula for πps sampling (Fig. 2) proved to be the best.

Wingren's method (1)

$$V(I) = \sum_{j} w_j^2 \, \hat V(\bar I_j), \quad \text{where} \quad \hat V(\bar I_j) = \frac{1 - f_j}{n_j} \cdot \frac{1}{n_j - 1} \sum_{i=1}^{n_j} \left( I_{ij} - \bar I_j \right)^2 \quad \text{and} \quad \bar I_j = \frac{1}{n_j} \sum_{i=1}^{n_j} I_{ij}$$

j = stratum
i = observation
I_ij = price ratio for observation i, stratum j
n_j = number of observations in the sample in stratum j
f_j = finite sample correction (in terms of market shares)
w_j = weights used in estimation

Rosén's method (2)

The variance estimates are valid for population totals. We adapt the method to our purposes according to d).

a) The population total, τ(y), is estimated by

$$\hat\tau(y) = \sum_{i \in s} \frac{y_i}{\lambda_i},$$

where s is the sample and the λ_i are the selection probabilities.

b) To a good approximation, the variance of the estimator is given by:
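$$V\left[\hat\tau(y)\right] \approx \frac{N}{N-1} \sum_{i=1}^{N} \lambda_i (1 - \lambda_i) \left( \frac{y_i}{\lambda_i} - \frac{\sum_{k=1}^{N} y_k (1 - \lambda_k)}{\sum_{k=1}^{N} \lambda_k (1 - \lambda_k)} \right)^2,$$

where N is the population size and n the sample size.

c) A consistent estimator of V[τ̂(y)] is given by:

$$\hat V\left[\hat\tau(y)\right] = \frac{n}{n-1} \sum_{i \in s} (1 - \lambda_i) \left( \frac{y_i}{\lambda_i} - \frac{\sum_{k \in s} y_k (1 - \lambda_k) / \lambda_k}{\sum_{k \in s} (1 - \lambda_k)} \right)^2$$

d) We let y_i = log(index_i). The estimated population total for y divided by the total weight,

$$\frac{\hat\tau(y)}{\sum_{i \in s} 1/\lambda_i} = \frac{\sum_{i \in s} y_i / \lambda_i}{\sum_{i \in s} 1/\lambda_i},$$

then becomes a weighted mean of the (log) indexes.

4. Non-sampling Errors

A well-known problem in the production of statistics is that non-sampling errors can rarely be estimated. For our purposes, we asked expert personnel to decide whether the non-sampling error contribution is Low, Medium or High. Non-sampling errors have been evaluated at product group level, i.e. at the three-, four- or five-digit level of SPIN¹. The aim has been to take into account all sources of error that are not covered by the sampling variance. The following five sources of error have been identified:

Specification error²
Frame error
Non-response error
Measurement error
Data processing error

¹ The Swedish version of the European product classification CPA.
² Specification error, proposed by Biemer et al. (2014), is not defined as an error source in the Code of Practice.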
Frame error is assumed to be distributed evenly across product groups, and therefore not to differ much between them. The same reasoning is applied to data processing, which is carried out automatically in the software. The only data processing done manually is quality adjustment, which is always judgmental and therefore introduces a certain degree of uncertainty. This has been taken into account in the overall assessment, but it has not been given as much impact as the error sources mentioned below.

Specification error is considered to make the largest contribution to the overall error. Specification error means that we do not measure exactly what we want to measure. In the PPI, such errors are due to the use of list prices and hourly-rate methods. The total proportion of list-price and hourly-rate based methods has been calculated for each product group. An occurrence of between 0 and 5 percent is deemed Low, over 5 percent but less than 50 percent is deemed Medium, and 50 percent or more is deemed High. The remaining error sources, non-response and measurement error, have also been evaluated and are considered low risk.

These five separate sources of error have been combined into one overall measure. This measure could be seen as a risk rather than an error. For example, prices for personal computers are measured by highly experienced staff using A-methods. Still, the character of the product group, such as fast product development and frequent quality adjustments, results in a higher risk of faulty values being collected.

The PPI consists of a total of 508 strata. Of these, 101 were deemed to have a Low non-sampling error contribution, 260 a Medium contribution and 147 a High contribution. The corresponding numbers for the 69 SPPI strata are 19, 27 and 23. It should be pointed out that evaluating the non-sampling error in this manner is a highly subjective method.
The result will undoubtedly be affected by the prior knowledge and experience of the person or persons doing the evaluation. For the work presented in this paper, one subject expert handled the entire task. A better method would be to obtain several opinions and use their average.
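The specification-error rule in this chapter can be written as a small function. This is a minimal sketch, assuming each observation in a product group carries a method code. The codes "transaction", "list" and "hourly" and both function names are hypothetical; the thresholds are the ones stated in the text. The sketch covers only the specification-error component. The overall Low/Medium/High rating also draws on expert judgement of the other four error sources.

```python
def list_price_share(methods: list[str]) -> float:
    """Percentage of a product group's observations priced by list price
    or an hourly-rate method (method codes are hypothetical)."""
    if not methods:
        raise ValueError("product group has no observations")
    flagged = sum(m in ("list", "hourly") for m in methods)
    return 100.0 * flagged / len(methods)


def spec_error_rating(share_pct: float) -> str:
    """Map the list-price/hourly-rate share (percent) to Low/Medium/High
    using the thresholds stated in the text."""
    if not 0.0 <= share_pct <= 100.0:
        raise ValueError("share_pct must lie between 0 and 100")
    if share_pct <= 5.0:
        return "Low"       # between 0 and 5 percent
    if share_pct < 50.0:
        return "Medium"    # over 5 but less than 50 percent
    return "High"          # 50 percent or more


# Hypothetical product group: 20 observations, 3 of them list prices.
group = ["transaction"] * 17 + ["list"] * 3
share = list_price_share(group)
print(share, spec_error_rating(share))  # prints: 15.0 Medium
```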
5. Combining Error Measures

5.1 Random Errors vs. Bias

It is reasonable to think, for example, that list prices (which are used in place of hard-to-collect transaction prices) are systematically higher than transaction prices, from which discounts etc. have been deducted. Hence, by definition, a list price is a source of bias. It is harder to determine whether a list price will result in an overestimation or an underestimation of the index. List prices are usually fixed for a period of time. This probably results in periods of systematic underestimation of the true price movements, but not necessarily an underestimation of change. When the list price is adjusted to the market, it is safe to assume that the price movement will overestimate that period's true price movement. Monthly indexes are then averaged to create a yearly index, and it is hard to say whether the final result will be an over- or an underestimation of the true value.

We argue that non-sampling errors in the PPI and the SPPI can mainly be regarded as random: we cannot say that a certain error source always results in an overestimation or an underestimation. Random errors will show up in the variance estimate, so there is a risk that adding separately estimated non-sampling errors will overestimate the total uncertainty. We have not seen any evidence that a stratum is unreasonably punished with a high total uncertainty measure. It is, however, entirely possible for a stratum to be almost completely covered by the sample, and thereby have very little (if any) sampling error, yet still have a substantial amount of uncertainty added from non-sampling sources. The combination of the sampling error and this non-sampling error is what we are trying to estimate.

5.2 Creating One Uncertainty Measure

It is not clear how to combine the sampling errors with the non-sampling errors. Literature studies have not produced any obvious solutions.
We propose a method where the assessed non-sampling errors are given numeric values and then combined with the estimated sampling error.
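The estimated sampling error that enters this combination comes from Rosén's estimator in chapter 3. The following is a minimal Python sketch of steps a) to d), assuming the selection probabilities λ_i are known for every sampled observation in a stratum. The function names are hypothetical. Dividing the variance of the total by the squared total weight, which treats that weight as fixed, is an assumption of this sketch and is not specified in the text.

```python
import math


def rosen_total_and_variance(y, lam):
    """Estimate a total from a πps sample (step a) and its variance with
    Rosén's consistent estimator (step c)."""
    if len(y) != len(lam) or len(y) < 2:
        raise ValueError("need at least two paired (y_i, lambda_i) values")
    if any(not 0.0 < p <= 1.0 for p in lam):
        raise ValueError("selection probabilities must lie in (0, 1]")
    n = len(y)
    total = sum(yi / p for yi, p in zip(y, lam))
    denom = sum(1.0 - p for p in lam)
    if denom == 0.0:
        # Every unit was selected with certainty: no sampling variance.
        return total, 0.0
    a_hat = sum(yi * (1.0 - p) / p for yi, p in zip(y, lam)) / denom
    var = n / (n - 1) * sum((1.0 - p) * (yi / p - a_hat) ** 2
                            for yi, p in zip(y, lam))
    return total, var


def weighted_log_index(indexes, lam):
    """Step d): y_i = log(index_i); the estimated total divided by the total
    weight is a weighted mean of the log indexes. Dividing the variance by
    the squared total weight treats that weight as fixed (an assumption)."""
    y = [math.log(ix) for ix in indexes]
    total, var = rosen_total_and_variance(y, lam)
    total_weight = sum(1.0 / p for p in lam)
    return total / total_weight, var / total_weight ** 2


print(rosen_total_and_variance([1.0, 3.0, 2.0], [0.5, 0.5, 0.5]))  # prints: (12.0, 6.0)
```

As a sanity check, when all selection probabilities are equal the estimator reduces to the simple-random-sampling formula N²(1 − f)s²/n. In the example, N = 6, n = 3 and s² = 1 give the printed variance of 6.0.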
As a starting point, we assume that the sampling errors and the non-sampling errors each account for 50% of the total uncertainty in the entire survey (i.e. all strata combined). This is a strong assumption, but we do not have any evidence suggesting that the balance tips either way. We developed this algorithm:

1. The assessed non-sampling errors are given numerical values according to Low = 1, Medium = 3 and High = 9.
2. The errors are rescaled so that sampling errors and non-sampling errors each account for 50% of the total uncertainty in the survey, for all strata combined.
3. For each stratum, the Total Uncertainty (TU) is calculated as: TU = Sample Variance + Assessed Non-sampling Error
4. An indicator is calculated as: Indicator = weight × TU
5. Plot standard deviation vs. stratum weight for all strata.
6. In the plot, mark strata with high indicator values.

For the Swedish PPI, this algorithm results in the plot shown in Fig. 3. For the sake of clarity, strata with extreme weights (e.g. mining, petroleum and auto manufacturing) are excluded from the plot; these strata should always be examined carefully. As the plot shows, it is the combination of variance, non-sampling uncertainty and weight that decides whether a stratum is marked as influential.
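Steps 1 to 4 and step 6 of the algorithm can be sketched as follows. This illustration rests on stated assumptions that the paper does not spell out:

- The 50/50 rescaling uses unweighted sums over strata.
- The standard deviation for the plot is taken as the square root of TU.
- High indicator values are marked by taking the top_k strata.

The strata, weights and variances below are hypothetical.

```python
import math
from dataclasses import dataclass

# Step 1: numeric values for the assessed non-sampling error.
SCORES = {"Low": 1.0, "Medium": 3.0, "High": 9.0}


@dataclass
class Stratum:
    name: str         # must be unique among the strata
    weight: float     # stratum weight in the index
    samp_var: float   # estimated sampling variance of the stratum
    rating: str       # assessed non-sampling error: "Low", "Medium" or "High"


def total_uncertainty(strata: list[Stratum], top_k: int = 1):
    if not strata:
        raise ValueError("no strata given")
    # Step 2 (assumption: unweighted sums): rescale the scores so that,
    # summed over all strata, they equal the summed sampling variances,
    # i.e. each part is 50% of the total.
    scale = sum(s.samp_var for s in strata) / sum(SCORES[s.rating] for s in strata)
    rows = []
    for s in strata:
        tu = s.samp_var + scale * SCORES[s.rating]  # step 3: TU
        rows.append({
            "name": s.name,
            "weight": s.weight,
            "tu": tu,
            "sd": math.sqrt(tu),           # assumed y-axis for the step-5 plot
            "indicator": s.weight * tu,    # step 4
        })
    # Step 6 (hypothetical cut-off rule): mark the top_k indicator values with "H".
    ranked = sorted(rows, key=lambda r: r["indicator"], reverse=True)
    marked = {r["name"] for r in ranked[:top_k]}
    for r in rows:
        r["flag"] = "H" if r["name"] in marked else ""
    return rows, scale


strata = [
    Stratum("A", weight=0.10, samp_var=0.4, rating="Low"),
    Stratum("B", weight=0.05, samp_var=0.2, rating="High"),
    Stratum("C", weight=0.20, samp_var=0.0, rating="Medium"),  # fully covered stratum
]
rows, scale = total_uncertainty(strata, top_k=1)
for r in rows:
    print(r["name"], round(r["sd"], 3), round(r["indicator"], 4), r["flag"])
```

Stratum C echoes the point made in section 5.1. It has no sampling variance at all, yet it still receives a non-sampling contribution to its TU.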
Fig. 3. A plot of standard deviation vs. weight for strata in the PPI. Strata with extreme weights are excluded from the plot. Strata with influential total uncertainty, i.e. a high indicator value, are marked with an H.

6. Final Remarks

Under the umbrella of the GDP project, work was carried out with three major aims:

1. Gaining a better understanding of the error profile in the PPI and the SPPI
2. Creating a measure of total uncertainty to use in the SCM method for automatic balancing of the NA
3. Identifying strata where error mitigation efforts are needed the most
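Aim 2 concerns feeding uncertainty measures into SCM balancing, which chapter 2 describes as a generalized least squares adjustment with inverted uncertainty measures as weights. The sketch below is minimal and assumes a diagonal covariance matrix V whose entries are the uncertainty measures, together with linear accounting constraints A x = 0. Under these assumptions the adjusted estimates are x − V Aᵀ(A V Aᵀ)⁻¹ A x. The three-approach GDP figures, their variances and the function name are hypothetical. How per-stratum TU values would be aggregated into uncertainties for NA items is not specified here.

```python
import numpy as np


def scm_balance(x, v, A):
    """Adjust estimates x so that A @ x_bal = 0 while minimising
    sum((x_bal - x)**2 / v), i.e. GLS with inverse uncertainties as weights."""
    x = np.asarray(x, dtype=float)
    V = np.diag(np.asarray(v, dtype=float))
    A = np.asarray(A, dtype=float)
    # x_bal = x - V A^T (A V A^T)^{-1} A x
    correction = V @ A.T @ np.linalg.solve(A @ V @ A.T, A @ x)
    return x - correction


# Hypothetical GDP estimates from the production, expenditure and income
# approaches, with hypothetical uncertainty measures used as variances.
x = [100.0, 104.0, 98.0]
v = [1.0, 4.0, 1.0]
A = [[1.0, -1.0, 0.0],    # production - expenditure = 0
     [1.0, 0.0, -1.0]]    # production - income = 0
print(scm_balance(x, v, A))
```

With these "all estimates equal" constraints, the result is the inverse-variance weighted mean (100 + 104/4 + 98)/(1 + 1/4 + 1) ≈ 99.56. The estimate with the largest uncertainty therefore absorbs most of the adjustment.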
A careful study of the error profile of the PPI and the SPPI helped shed light on where quality improvement efforts would be most efficient. After evaluating the different error sources, we found that mitigating non-sampling errors is the most efficient way of improving the quality of the surveys. By combining the sampling variance with the non-sampling error according to the algorithm presented, we were able to create numeric estimates of the total uncertainty of each stratum in the PPI and the SPPI. While the estimates are not statistically stringent, they could be used in the SCM method to balance the NA automatically. The estimates are also used to create a plot in which strata with large uncertainties are identified. This information can be used in quality reports and as a tool in the manual work of balancing the NA.

7. References

Biemer, P., Trewin, D., Bergdahl, H. and Japec, L. (2014), A System for Managing the Quality of Official Statistics, Journal of Official Statistics, Vol. 30, pp. 381-415.

Calzaroni, M. and Puggioni, A. (1998), Evaluation and Analysis of the Quality of the National Accounts Aggregates, Report to the Task Force on Accuracy Assessment of National Accounts, Eurostat.

Chen, B. (2006), A Balanced System of Industry Accounts and Structural Distribution of Aggregate Statistical Discrepancy, working paper WP2006-8, Bureau of Economic Analysis, Washington, DC.

Chen, B. (2012), A Balanced System of U.S. Industry Accounts and Distribution of Aggregate Statistical Discrepancy by Industry, Journal of Business and Economic Statistics, Vol. 30 (2), pp. 202-211.

Isaksson, A., Xie, Y., Strandberg, K., Lindholm, P., Lennmalm, A. and Lennartsson, D. (2014), Känslighetsanalys BNP Förstudie [Sensitivity analysis of GDP: pilot study], Statistics Sweden.
Isaksson, A., Fridén, M., Lennartsson, A., Lennmalm, A., Saltveit, M., Strandberg, K. and Xie, Y. (2015), Känslighetsanalys BNP Huvudstudie [Sensitivity analysis of GDP: main study], Statistics Sweden.

Rosén, B. (2010), Teori och praktik för urvalsundersökningar [Theory and practice of sample surveys], Kompendium, Matematisk statistik, Stockholms universitet, Sweden.

Stone, R., Champernowne, D.G. and Meade, J.E. (1942), The Precision of National Income Estimates, The Review of Economic Studies, Vol. 9, pp. 111-135.

Wingren, J-E. (2009), Variance Analysis for PPI and SPPI, final technical implementation report, Price Statistics Unit, Statistics Sweden.