EUROPEAN COMMISSION
EUROSTAT
Directorate F: Social statistics
Unit F-5: Education, health and social protection

Luxembourg, 23 March 2017
DOC SP-2017-06-Annex 2
https://circabc.europa.eu/w/browse/26803710-8227-45b9-8c56-6595574a4499
ESSPROS Expert Group (1486): https://circabc.europa.eu/w/browse/96adef83-8ee4-4c76-8c37-83502f932ec9

Working Group Social Protection
4-5 April 2017

Data validation: rules for the module on pension beneficiaries
Item 6 of the Agenda

Meeting of the Working Group Social Protection statistics
Luxembourg, 4-5 April 2017
BECH Building, Room Quetelet
ANNEX 2 OF ITEM 6
DATA VALIDATION RULES FOR THE MODULE ON PENSION BENEFICIARIES

This document lists the validation rules for data collected through the ESSPROS module on pension beneficiaries.

Note: in this document, a cell refers to the actual value of a single observation, which is described by a set of dimensions.

0. Validation level 0 (format and file structure)

Checks at validation level 0 concern the format and structure of the files transmitted to Eurostat. The file structure is still under discussion (see point 4 of the Working Group agenda), so a detailed list will be provided at a later date.

DICTIONARIES CHECK

Individual observations are identified by a number of dimensions. Some of these dimensions are fixed by predefined code lists (dictionaries). Examples are the dimensions that specify the country and the detailed classifications (see Appendix I of the ESSPROS methodology). For each fixed dimension, the values used must conform to the agreed values in the corresponding code list.

In the current data structure the coded dimensions are Country, Item and Scheme. The period is also a dimension that identifies an observation, but it is not coded. These rules will be updated once the new data structure (see item 5 of the agenda) is adopted, to reflect its dimensions and associated code lists.

0.1. Country codes are valid entries in the country code list (dictionary)

Every country code used must be one of the values in the country list.

Rule: 0.1 Country codes used are defined in the country code list (dictionary)
Threshold: none
Severity: Fatal
Message: ERROR 0.1: code for country not defined in the country_list.
Examples: GR (fatal error); FX (fatal error); CR (fatal error).
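Rules 0.1 to 0.3 all apply the same dictionary check. A coded value is accepted only if it appears in the agreed code list. The Python sketch below illustrates this for the country dimension. `COUNTRY_LIST` is a hypothetical excerpt and does not reproduce the official ESSPROS country_list. The helper name `check_code` is also illustrative.

```python
# Sketch of the dictionary checks (rules 0.1 to 0.3).
# COUNTRY_LIST is a hypothetical excerpt, NOT the official country_list.
COUNTRY_LIST = {"BE", "DE", "EL", "FR", "LU"}


def check_code(code, code_list, rule, dimension):
    """Return the fatal-error message for an undefined code, or None if valid."""
    if code in code_list:
        return None
    return f"ERROR {rule}: code for {dimension} not defined in the {dimension}_list."


for code in ["LU", "GR", "FX", "CR"]:
    print(code, check_code(code, COUNTRY_LIST, "0.1", "country"))
```

Set membership is an exact string comparison. As a result, variants of a valid code that differ only in case or surrounding whitespace are rejected as well.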
0.2. Item codes are valid entries in the item code list (dictionary)

Every item code used must be one of the values in the item list.

Rule: 0.2 Item codes used are defined in the item code list (dictionary)
Threshold: none
Severity: Fatal
Message: ERROR 0.2: code for item not defined in the item_list.
Examples: 1000001 (fatal error); 2000000 (fatal error); "1151111" (fatal error).

0.3. Scheme codes are valid entries in the scheme code list (dictionary)

Every scheme code used must be one of the values in the scheme list.

Rule: 0.3 Scheme codes used are defined in the scheme code list (dictionary)
Threshold: none
Severity: Fatal
Message: ERROR 0.3: code for scheme not defined in the scheme_list.
Examples: SCH1 (fatal error); allschemes (fatal error); "All Schemes" (fatal error).

0.4. Flags are valid entries in the flag code list (dictionary)

Every flag used must be one of those in the flag code list ("b", "e", "p").

Rule: 0.4 Flags used are defined in the flag code list (dictionary)
Threshold: none
Severity: Fatal
Message: ERROR 0.4: code for flag used in cell is not defined in the flag_list.
Examples: 12 z (fatal error); 12 S (fatal error); - (fatal error).

1. Validation level 1 (checks on the content of the file)

A) Format and content of dimension identifiers

1.1. Cell contents are not negative

Cell values must be 0 or positive. A negative value generates a fatal error.

Rule: 1.1 Cell contents are positive or "0"
Threshold: none
Severity: Fatal
Message: ERROR 1.1: The value in cell <cell> is negative.
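The following Python sketch shows rules 0.4 and 1.1 on a cell that has already been split into its numeric value and an optional flag. `None` stands for an unflagged cell. The function name and the cell identifiers ("c1", "c2", ...) are hypothetical placeholders.

```python
# Sketch of rules 0.4 (flag dictionary) and 1.1 (no negative values).
# Assumes the cell was already split into value and flag (None = no flag).
FLAG_LIST = {"b", "e", "p"}


def check_value_and_flag(cell, value, flag=None):
    """Return the list of fatal-error messages for one cell."""
    errors = []
    if flag is not None and flag not in FLAG_LIST:
        errors.append(f"ERROR 0.4: code for flag used in cell {cell} "
                      "is not defined in the flag_list")
    if value < 0:
        errors.append(f"ERROR 1.1: The value in cell {cell} is negative")
    return errors


print(check_value_and_flag("c1", 12, "e"))   # no errors
print(check_value_and_flag("c2", 12, "z"))   # flag not in the flag list
print(check_value_and_flag("c3", -5))        # negative value
```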
1.2. Number of pension beneficiaries is an integer

The number of pension beneficiaries should be an integer. In the current questionnaires a cell value can be a string, because flags are appended to the end of the number of beneficiaries with no space in between. Let N be the length of the cell value. If N = 1, the cell value must be an integer. If N > 1, the substring of the cell value from position 1 to N-1 must be an integer. Any case where this does not hold generates a fatal error.

Rule: 1.2 Number of pension beneficiaries is an integer
Threshold: none
Severity: Fatal
Message: ERROR 1.2: The value in cell <cell> is not an integer.
Examples: cell value 1.2 (fatal error); cell value 1.2*3p (fatal error).

1.3. Flags are correctly concatenated to the numeric values

Flags are appended to the end of numeric values with no space in between. This rule applies only to the current data transmission format (Excel questionnaire). It will become obsolete under the proposed file structure (discussed under item 5 of the Agenda).

This rule applies when the length of the cell value is N > 1 and the cell value is not a number. In that case, the substring from position 1 to N-1 must be an integer, and character N must be a non-numeric, non-blank character. Any case where this does not hold generates a fatal error.

Rule: 1.3 Flags are concatenated to the numeric values without any space
Threshold: none
Severity: Fatal
Message: ERROR 1.3: flag in cell <cell> is not correctly concatenated.
Examples: 12 e (fatal error); "12*e" (fatal error); "12@1" (fatal error).
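Rules 1.2 and 1.3 can be read as two substring tests on the raw cell string. The Python sketch below implements them as literally stated. Rule 1.2 is tested on the first N-1 characters, or on the whole value when N = 1. Rule 1.3 is tested only when the value is not a number. Because the two definitions overlap, the sketch reports every rule that fails, so one value can fail both. The function names are illustrative, and "is a number" is assumed to mean "parses as a float".

```python
import re

_INT = re.compile(r"[0-9]+")  # ASCII digits only


def is_number(text):
    """True if `text` parses as a (possibly non-integer) number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_cell_format(raw):
    """Return the rule numbers ("1.2", "1.3") that a raw cell string fails."""
    failed = []
    n = len(raw)
    # Rule 1.2: whole value (N = 1) or first N-1 characters (N > 1) is an integer.
    head = raw if n == 1 else raw[:-1]
    if not _INT.fullmatch(head):
        failed.append("1.2")
    # Rule 1.3: for non-numeric values with N > 1, the first N-1 characters are
    # an integer and the last character (the flag) is non-numeric and non-blank.
    if n > 1 and not is_number(raw):
        last = raw[-1]
        if not (_INT.fullmatch(raw[:-1]) and not last.isdigit() and not last.isspace()):
            failed.append("1.3")
    return failed


for value in ["125", "125e", "1.2", "1.2*3p", "12 e", "12@1"]:
    print(repr(value), check_cell_format(value))
```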
1.4 For each item and scheme, the total number of beneficiaries is equal to the sum of the data by gender

For each item and scheme (including Total schemes), the sum of the numbers of beneficiaries by gender must be equal to the total number of beneficiaries.

Rule: 1.4 For the optional item e) and for each scheme, the sum of data by gender = total number of beneficiaries
Threshold: none
Severity: Fatal
Message: ERROR 1.4: sum of data by gender for item and scheme doesn't sum up to the total.

The following two checks are conceptually the same; they differ only in the threshold.

1.5 For each item, the total number of beneficiaries (Total schemes) is greater than or equal to the number of beneficiaries of the scheme with the largest value

For each item, the number of beneficiaries at Total schemes level is by definition less than or equal to the sum of the numbers of beneficiaries by scheme. This is because double counting must be eliminated at Total schemes level when a beneficiary receives the same kind of pension from two different schemes (double counting type 2). However, the total number of beneficiaries cannot be smaller than the number of beneficiaries of any single scheme.

Rule: 1.5 For item X, the number of beneficiaries at all-schemes level has to be greater than or equal to the number of beneficiaries for the scheme with the largest number of beneficiaries:
ItemX(All schemes) >= max(ItemX(Scheme[i])) for i = 1, ..., n, where n is the number of schemes
Threshold: none
Severity: Fatal
Message: ERROR 1.5: Total number of beneficiaries for item is smaller than the value for item and scheme[i].

1.6 Integrity rules linking items are satisfied (absolute value)

The items in the code lists are linked by equations (integrity rules) that must be respected for data consistency (see the list below). In the pension beneficiaries module the operator linking aggregated data and their components is not "=". Because double counting is eliminated, an aggregate can be less than or equal to the sum of its components.
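The Python sketch below shows rules 1.4 and 1.5 for a single item. The data layout is a hypothetical assumption: a dictionary keyed by scheme code, with "ALL" for Total schemes, where each entry holds "T" (total), "M" (males) and "F" (females). For rule 1.5, the total is compared with every individual scheme. This is equivalent to comparing it with the maximum, and it names the offending scheme[i] as the error message requires.

```python
# Sketch of rules 1.4 and 1.5 for one item.
# Hypothetical layout: scheme code -> {"T": total, "M": males, "F": females};
# "ALL" stands for Total schemes.


def check_item(item, data):
    errors = []
    for scheme, v in data.items():
        # Rule 1.4: the gender breakdown must add up exactly to the total.
        if v["T"] != v["M"] + v["F"]:
            errors.append(f"ERROR 1.4: sum of data by gender for item {item} "
                          f"and scheme {scheme} doesn't sum up to the total")
    total = data["ALL"]["T"]
    for scheme, v in data.items():
        # Rule 1.5: the all-schemes total must not be below any single scheme.
        if scheme != "ALL" and total < v["T"]:
            errors.append(f"ERROR 1.5: Total number of beneficiaries for item {item} "
                          f"is smaller than the value for item {item} and scheme {scheme}")
    return errors


data = {  # hypothetical figures
    "ALL": {"T": 9000, "M": 4000, "F": 5000},
    "S01": {"T": 5000, "M": 2500, "F": 2500},
    "S02": {"T": 4500, "M": 2000, "F": 2400},  # 2000 + 2400 != 4500
}
for message in check_item("1000000", data):
    print(message)
```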
An equation may fail because one of the components is unavailable for some reason, or because of an error in the reported data. In either case, the failure should be explained. Any absolute difference greater than 1 generates a warning.

Rule: 1.6 The integrity rules defined on the items are satisfied (aggregate <= sum of components)
Threshold: absolute difference of 1
Severity: Fatal
Message: ERROR 1.6: equation eq_number not satisfied for cell: absolute difference greater than 1.
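A minimal Python sketch of rule 1.6 follows. It assumes that "absolute difference" means the amount by which the aggregate exceeds the sum of its components. A shortfall is allowed, because double counting has been removed. An excess larger than the threshold of 1 is reported. The function name, the equation number and the message format are illustrative. The figures come from the example accompanying this rule (aggregate 115200, components summing to 115121).

```python
def check_integrity(eq_number, aggregate, components, threshold=1):
    """Rule 1.6 sketch: aggregate <= sum(components), tolerating an excess
    of at most `threshold` (the absolute difference allowed by the rule)."""
    excess = aggregate - sum(components)
    if excess > threshold:
        return (f"ERROR 1.6: equation {eq_number} not satisfied "
                f"(absolute difference {excess})")
    return None


# The example gives only the sum of the four components (115121),
# so it is passed as a single component here; equation number 1 is hypothetical.
print(check_integrity(1, 115200, [115121]))
```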
Examples (equations expressed with the current code classification):
Equation: 1000000 <= 1120110 + 1130110 + 1140111 + 1160113
Reported values: 1000000 = 115200; 1120110 + 1130110 + 1140111 + 1160113 = 115121
Since 115200 > 115121, the result is a fatal error.

The integrity rules defined on the items used in the pension beneficiaries module are:
Total pension beneficiaries <= Total pension beneficiaries in disability function + Total pension beneficiaries in old age function + Total pension beneficiaries in survivors' function + Total pension beneficiaries in unemployment function
Total pension beneficiaries in disability function <= Disability pension beneficiaries + Beneficiaries receiving early retirement benefits due to reduced capacity to work
Total pension beneficiaries in old age function <= Old-age pension beneficiaries + Anticipated old age pension beneficiaries + Partial pension beneficiaries
Old-age pension beneficiaries <= Old-age pension beneficiaries NMT + Old-age pension beneficiaries MT
Anticipated old age pension beneficiaries <= Anticipated old age pension beneficiaries NMT + Anticipated old age pension beneficiaries MT
Partial pension beneficiaries <= Partial pension beneficiaries NMT + Partial pension beneficiaries MT
Total pension beneficiaries in survivors' function <= Survivors' pension beneficiaries NMT + Survivors' pension beneficiaries MT
Total pension beneficiaries in unemployment function <= Beneficiaries receiving early retirement benefits for labour market reasons NMT + Beneficiaries receiving early retirement benefits for labour market reasons MT

DOUBLE COUNTING CHECKS

Pension beneficiaries are defined as recipients of periodic cash benefits reported under the ESSPROS pension classifications identified in Appendix III of the ESSPROS manual. Individuals may receive multiple pensions.
In the ESSPROS pension beneficiaries module, the figure reported for a classification should be the number of beneficiaries. This applies to any classification, whether it covers a particular type of pension or a group of pension types. People who receive multiple pensions within the classification must not be double counted. ESSPROS identifies six types of such double counting. The following validation rules check how double counting has been treated.

1.7 Double counting type 1 in the data

More than one benefit classified under a single type of pension (the same elementary item) may be recorded in one cell of a single scheme. In that case, pensioners receiving more than one of these benefits should be counted only once. See Appendix III, paragraph 4.2, section A of the ESSPROS manual. This check generates a warning.
Related section of the Quality Report: section 2.2 (a).

Item d (supplementary optional item) corresponds to the number of benefits by scheme and gender. This check is possible only when item d is transmitted.

Rule: 1.7 Double counting type 1: pensioners receiving more than one of the benefits classified under the same item are counted only once.
Condition: item d - 1160113 - 1140111 - 1130110 - 1120110 > 0
Threshold: > 0
Severity: Warning
Message: WARNING 1.7: Double counting of Type 1 has been treated.

1.8 Double counting type 2 in the data

Beneficiaries may receive the same type of pension (the same elementary item) from more than one scheme. This double counting has to be removed when reporting beneficiaries at all-schemes level. Pensioners who receive a specific type of pension from two or more schemes are then counted only once. See Appendix III, paragraph 4.2, section B of the ESSPROS manual. This check generates a warning.
Related section of the Quality Report: section 2.2 (b).

Rule: 1.8 Double counting type 2: for item X, the value at total-schemes level is less than or equal to the sum of the values at scheme level:
ItemX(All schemes) <= sum of ItemX(Scheme[i]) for i = 1, ..., n, where n is the number of schemes
Threshold: none
Severity: Warning
Message: WARNING 1.8: Double counting of Type 2 has been treated.
Example: item 1121111, 2 schemes: Scheme 01 = 5000, Scheme 02 = 3000; Total = 2000, which is less than Scheme 01 + Scheme 02 = 8000: warning.
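Rules 1.7 and 1.8 produce informative warnings. They signal that double counting has been removed, not that the data are wrong. The Python sketch below assumes that the rule 1.8 warning is raised when the all-schemes value is strictly below the sum over schemes, by analogy with the "<" conditions in rules 1.9 to 1.13. For rule 1.7, `function_totals` stands for the values of items 1120110, 1130110, 1140111 and 1160113. Function names are illustrative.

```python
# Sketch of the type 1 and type 2 double-counting warnings (rules 1.7, 1.8).


def warn_type1(item_d, function_totals):
    """Rule 1.7: item d (number of benefits) minus the beneficiary totals
    1120110, 1130110, 1140111 and 1160113 is greater than 0."""
    if item_d - sum(function_totals) > 0:
        return "WARNING 1.7: Double counting of Type 1 has been treated"
    return None


def warn_type2(all_schemes, by_scheme):
    """Rule 1.8: the all-schemes value is below the sum over schemes."""
    if all_schemes < sum(by_scheme):
        return "WARNING 1.8: Double counting of Type 2 has been treated"
    return None


# Example for item 1121111 given with rule 1.8: schemes 5000 and 3000, total 2000.
print(warn_type2(2000, [5000, 3000]))
```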
1.9 Double counting type 3 in the data

Beneficiaries may receive both a non-means-tested (NMT) and a means-tested (MT) version of a pension of a specific type. This double counting has to be treated when the beneficiaries are added up to derive the total number of beneficiaries receiving that type of pension. It is treated only at the level of the corresponding category. See Appendix III, paragraph 4.2, section C of the ESSPROS manual. This check generates a warning.
Related section of the Quality Report: section 2.2 (c).

Rule: 1.9 Double counting type 3: pensioners receive both a non-means-tested and a means-tested version of a specific pension type.
Condition: category pension X < category pension X (NMT) + category pension X (MT)
Threshold: none
Severity: Warning
Message: WARNING 1.9: Double counting of Type 3 has been treated.
Example: 1121111 = 6000, 1122111 = 2000, NMT + MT = 8000, 1120111 = 7000: warning.

1.10 Double counting type 4 in the disability function

Beneficiaries may receive two or more pension benefits within the disability function. Such beneficiaries should be counted only once in the total beneficiaries reported for the function as a whole. See Appendix III, paragraph 4.2, section D.1 of the ESSPROS manual. This check generates a warning.
Related section of the Quality Report: section 2.2 (d).

Rule: 1.10 Double counting type 4: pensioners receive two or more pension benefits within the disability function.
Condition: 1120110 < 1120111 + 1120112
Threshold: none
Severity: Warning
Message: WARNING 1.10: Double counting of Type 4 has been treated in disability function.
Example: 1120111 = 5800, 1120112 = 320, 1120110 = 5800: warning.
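Rule 1.9 compares a category total with the sum of its NMT and MT versions. The Python sketch below uses the figures from the rule 1.9 example. The function and argument names are illustrative and are not part of the ESSPROS specification.

```python
# Rule 1.9 sketch: type 3 double counting has been treated at category level
# when the category total is below NMT + MT.


def warn_type3(category_total, nmt, mt):
    """Return the WARNING 1.9 text when category_total < nmt + mt, else None."""
    if category_total < nmt + mt:
        return "WARNING 1.9: Double counting of Type 3 has been treated"
    return None


# Figures from the rule 1.9 example: NMT 6000 + MT 2000 = 8000, category 7000.
print(warn_type3(7000, 6000, 2000))
```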
1.11 Double counting type 4 in the old-age function

Beneficiaries may receive two or more pension benefits within the old-age function. Such beneficiaries should be counted only once in the total beneficiaries reported for the function as a whole. See Appendix III, paragraph 4.2, section D.1 of the ESSPROS manual. This check generates a warning.
Related section of the Quality Report: section 2.2 (d).

Rule: 1.11 Double counting type 4: pensioners receive two or more pension benefits within the old-age function.
Condition: 1130110 < 1130111 + 1130112 + 1130113
Threshold: none
Severity: Warning
Message: WARNING 1.11: Double counting of Type 4 has been treated in old age function.
Example: 1130111 = 1720, 1130112 = 120, 1130113 = 0, 1130110 = 1720: warning.

1.12 Double counting type 5 in the data

Some beneficiaries receive at least one pension classified within either the old-age function or the survivors' function. Their total number has to be calculated for the cell "Total beneficiaries in old-age and survivors' functions" (item 1190110). Pensioners who simultaneously receive pensions reported within both functions should be counted only once. See Appendix III, paragraph 4.2, section D.2 of the ESSPROS manual. This check generates a warning.
Related section of the Quality Report: section 2.2 (d).

Rule: 1.12 Double counting type 5: in the cell "Total beneficiaries in old-age and survivors' functions" (item 1190110), beneficiaries simultaneously receiving pensions within both functions are counted only once.
Condition: 1190110 < 1130110 + 1140111
Threshold: none
Severity: Warning
Message: WARNING 1.12: Double counting of Type 5 has been treated between old age and survivors function.
Example: 1130110 = 100, 1140110 = 250, 1190110 = 300: warning.
1.13 Double counting type 6 in the data

Some beneficiaries receive at least one pension of any type. Their total number has to be calculated for the cell "Total number of pension beneficiaries" (item 1000000). Pensioners who simultaneously receive pensions reported under multiple functions should be counted only once. See Appendix III, paragraph 4.2, section D.2 of the ESSPROS manual. This check generates a warning.
Related section of the Quality Report: section 2.2 (d).

Rule: 1.13 Double counting type 6: in the cell "Total number of pension beneficiaries" (item 1000000), beneficiaries receiving more than one pension, of whatever type, are counted only once.
Condition: 1000000 < 1120110 + 1130110 + 1140111 + 1160113
Threshold: none
Severity: Warning
Message: WARNING 1.13: There is a risk of double counting Type 6.
Example: 1120110 = 100, 1130110 = 150, 1140110 = 250, 1160113 = 70, 1000000 = 500: warning.

Rules for the optional data by residence

For more detailed information about the rules defined on the optional data by residence, please refer to document DOC SP-2016-07-Annex 5.

1.14 The total number of pension beneficiaries is greater than or equal to the number of beneficiaries living outside the country

The total number of pension beneficiaries should be greater than or equal to the number of pension beneficiaries living outside the country.

Rule: 1.14 Item 1000000 >= number of pension beneficiaries living outside the country (optional data e))
Threshold: none
Severity: Warning
Message: WARNING 1.14: Total number of pension beneficiaries is less than number of pension beneficiaries living outside the country.

Some countries provide data on the number of beneficiaries living outside the country at scheme level and by gender. In these cases, the following validation rules should be applied to the optional data.
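Rules 1.10 to 1.13 share one pattern. When an aggregate item falls below the sum of its component items, this signals that double counting has been removed. The rules can therefore be driven from a table of equations. In the Python sketch below, the item codes are taken from the conditions stated in those rules. Several elements are hypothetical: the dictionary layout, the choice to skip a rule when an item is missing, and all cell values except those copied from the examples of rules 1.10 and 1.11.

```python
# Double-counting equations from rules 1.10 to 1.13:
# (rule, aggregate item, component items).
DOUBLE_COUNTING_RULES = [
    ("1.10", "1120110", ["1120111", "1120112"]),
    ("1.11", "1130110", ["1130111", "1130112", "1130113"]),
    ("1.12", "1190110", ["1130110", "1140111"]),
    ("1.13", "1000000", ["1120110", "1130110", "1140111", "1160113"]),
]


def double_counting_warnings(cells):
    """Return the rules whose aggregate is below the sum of its components.

    `cells` maps item codes to values for one scheme, gender and year;
    a rule is skipped when any of its items is missing (sketch assumption)."""
    fired = []
    for rule, aggregate, components in DOUBLE_COUNTING_RULES:
        if any(code not in cells for code in [aggregate, *components]):
            continue
        if cells[aggregate] < sum(cells[c] for c in components):
            fired.append(rule)
    return fired


cells = {
    "1120111": 5800, "1120112": 320, "1120110": 5800,              # rule 1.10 example
    "1130111": 1720, "1130112": 120, "1130113": 0, "1130110": 1720,  # rule 1.11 example
    "1140111": 250, "1160113": 70, "1190110": 1970, "1000000": 7500,  # hypothetical
}
print(double_counting_warnings(cells))
```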
1.15 The total number of beneficiaries is equal to the sum of the data by gender

For each scheme (including Total schemes), the sum of the numbers of beneficiaries by gender must be equal to the total number of beneficiaries. For the optional data the error severity is Warning (it is Fatal for the mandatory data).

Rule: 1.15 For each item and scheme, the sum of data by gender = total number of beneficiaries
Threshold: none
Severity: Warning
Message: WARNING 1.15: sum of data by gender for item and scheme doesn't sum up to the total.

1.16 The total number of beneficiaries (Total schemes) is greater than or equal to the number of beneficiaries of the scheme with the largest value

The number of beneficiaries at Total schemes level is by definition less than or equal to the sum of the numbers of beneficiaries by scheme. This is because double counting must be eliminated at Total schemes level when a beneficiary receives the same kind of pension from two different schemes (double counting type 2). However, the total number of beneficiaries cannot be smaller than the number of beneficiaries of any single scheme. For the optional data the error severity is Warning (it is Fatal for the mandatory data).

Rule: 1.16 For the optional item e), the number of beneficiaries at all-schemes level has to be greater than or equal to the number of beneficiaries for the scheme with the largest number of beneficiaries:
Item e)(All schemes) >= max(Item e)(Scheme[i])) for i = 1, ..., n, where n is the number of schemes
Threshold: none
Severity: Warning
Message: WARNING 1.16: Total number of beneficiaries for item is smaller than the value for item and scheme[i].
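For the optional residence data, rules 1.14 to 1.16 repeat the mandatory-data checks, but with severity Warning. The Python sketch below runs all three on one item. Several details are hypothetical: the layout (scheme codes mapped to "T"/"M"/"F" counts of beneficiaries living outside the country, with "ALL" for Total schemes), the scheme codes "S01" and "S02", and the figures. The sketch also assumes that rule 1.14 compares the Total-schemes figures.

```python
# Sketch of rules 1.14 to 1.16 on the optional residence data (item e)).
# All three rules have severity Warning for optional data.


def check_residence_data(total_1000000, outside):
    warnings = []
    # Rule 1.14: item 1000000 >= beneficiaries living outside the country
    # (assumed to compare Total-schemes figures).
    if total_1000000 < outside["ALL"]["T"]:
        warnings.append("WARNING 1.14")
    # Rule 1.15: the gender breakdown adds up, for every scheme including ALL.
    for scheme, v in outside.items():
        if v["T"] != v["M"] + v["F"]:
            warnings.append(f"WARNING 1.15 ({scheme})")
    # Rule 1.16: the ALL figure is not below the largest single scheme.
    largest = max((v["T"] for s, v in outside.items() if s != "ALL"), default=0)
    if outside["ALL"]["T"] < largest:
        warnings.append("WARNING 1.16")
    return warnings


outside = {  # hypothetical figures
    "ALL": {"T": 120, "M": 70, "F": 50},
    "S01": {"T": 100, "M": 60, "F": 40},
    "S02": {"T": 30, "M": 10, "F": 15},  # 10 + 15 != 30
}
print(check_residence_data(1000, outside))
```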
2. Validation level 2 (intra-domain, intra-source)

2.1 Time series checks: outlier detection

The time series checks are performed at scheme level. The range of years analysed can vary according to the tool used; a four-year range seems a reasonable choice. The threshold is currently set at roughly 10%. As for the core system data, specific thresholds should be defined for each item (possibly by country). Any case where the threshold is not respected should be explained. This check generates a warning.

Rule: 2.1 Time series checks: outlier detection
Threshold: 10%
Severity: Warning
Message: ERROR 2.1: cell is a potential outlier.

2.2 Revision checks: compare the revised values with those previously transmitted

The revision checks are performed at scheme level. Any case where a value within a scheme is revised by more than the threshold should be explained. The threshold is currently set at roughly 10%. This check generates a warning.

Rule: 2.2 Revision checks: compare the revised values with those previously transmitted
Threshold: 10%
Severity: Warning
Message: ERROR 2.2: Value in cell is significantly different from the data provided previously.
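The document does not fix the outlier method for rule 2.1. It specifies only a window of about four years and a rough 10% threshold. For illustration only, the Python sketch below flags a year when its value differs by more than 10% from the previous available year within the window. It also flags a revision under rule 2.2 when the revision changes the previously transmitted value by more than 10%. Item-specific or country-specific thresholds, as the text suggests, would replace the default `threshold`. The function names and the sample series are hypothetical.

```python
def outlier_years(series, threshold=0.10, window=4):
    """Rule 2.1 sketch (assumed method): within the last `window` years of a
    scheme-level series {year: value}, flag years whose relative change from
    the previous available year exceeds `threshold`."""
    years = sorted(series)[-window:]
    flagged = []
    for prev, year in zip(years, years[1:]):
        base = series[prev]
        # A zero base gives no relative change; such years are not flagged here.
        if base and abs(series[year] - base) / abs(base) > threshold:
            flagged.append(year)
    return flagged


def revision_warning(new, old, threshold=0.10):
    """Rule 2.2 sketch: True when a revision exceeds `threshold` in relative terms."""
    if old == 0:
        return new != 0
    return abs(new - old) / abs(old) > threshold


series = {2012: 1000, 2013: 1020, 2014: 1200, 2015: 1210, 2016: 1215}
print(outlier_years(series))        # 2014 changes by about 18% from 2013
print(revision_warning(1150, 1000))  # a 15% revision
```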