Improving the accuracy of estimates for complex sampling in auditing


Improving the accuracy of estimates for complex sampling in auditing (*)

Y. G. Berger (1), P. M. Chiodini (2), M. Zenga (2)
(1) University of Southampton (UK)
(2) University of Milano-Bicocca (Italy)
14-06-2017

(*) The research leading to these results has received support under the European Commission's 7th Framework Programme (FP7/2013-2017) under grant agreement n. 312691, InGRID Inclusive Growth Research Infrastructure Diffusion.

Berger, Chiodini, Zenga Improving the accuracy of estimates 14-06-2017 1 / 26

Overview
1. Audit Sampling
2. Definitions
3. Simulation study
4. Conclusions and Future research

AUDIT SAMPLING
In auditing, the goal is to verify that the account values reported by the company are not materially misstated.
To determine the total error in the amount reported by the company, auditors would have to audit every account in the population of accounts. This is not feasible: it is too costly and too time-consuming.
In practice, auditors verify only a sample of accounts and use it to estimate the error in the whole population of accounts (total error amount and error rate).
Common statistical methods for selecting an audit sample are sampling without replacement and sampling with probability proportional to size (PPS).

SAMPLING PLANS IN AUDITING
The auditor's only interest is to verify that the error rate is within a pre-assigned value; if the observed rate is greater than the pre-assigned value, a census is carried out.
In practice, all the sampling methods are reliable for audit sampling, but Monetary Unit Sampling (MUS), a particular case of PPS, is the most popular.
MUS directs effort towards high-valued items, which have the greatest potential for large overstatements.
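
To make the idea of MUS concrete, here is a minimal sketch of one common way to draw such a sample: systematic selection along the cumulated book values, so that each monetary unit has the same chance of being hit. The slides do not say which MUS variant was used; the function name `mus_systematic`, the fixed sampling interval and the random start are assumptions made for illustration only.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def mus_systematic(book_values, n, seed=0):
    """Return the indices of the items hit by n systematically spaced monetary units.

    Illustrative assumption: fixed-interval selection with a random start.
    An item whose book value exceeds the interval can be hit more than once.
    """
    rng = random.Random(seed)
    total = sum(book_values)
    interval = total / n                      # monetary units between two hits
    start = rng.uniform(0, interval)          # random start inside the first interval
    cumulated = list(accumulate(book_values)) # upper monetary boundary of each item
    hits = [start + k * interval for k in range(n)]
    # The item containing monetary unit h is the first one whose cumulated value reaches h.
    return [min(bisect_left(cumulated, h), len(cumulated) - 1) for h in hits]

# Hypothetical toy population: one large invoice dominates the selection.
selected = mus_systematic([10, 10, 10, 970], n=4)
```

In this toy population the large item is selected for at least three of the four hits, which is the sense in which MUS concentrates effort on high-valued items.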

SAMPLING PLANS IN AUDITING
Auditors frequently choose their sampling strategy on the basis of personal experience.
In general, book value distributions are highly positively skewed, with different percentages of errors.
Two main scenarios can occur:
- a population with a relatively large number of small accounts combined with a high rate of mistakes
- a population with a relatively large number of accounts combined with a small rate of mistakes

DEFINITIONS
An accounting population consists of $N$ line items with book (or recorded) values $y_1, y_2, \ldots, y_N$ and total book amount $T_y$ defined by:
$$T_y = \sum_{i=1}^{N} y_i.$$
The audited (true) amounts of the $N$ line items in the population are denoted by $x_1, x_2, \ldots, x_N$, and the total audited amount is:
$$T_x = \sum_{i=1}^{N} x_i.$$

DEFINITIONS
The error in item $i$ is $z_i = y_i - x_i$, $1 \le i \le N$.
When $z_i > 0$, the $i$-th item is said to be overstated, and when $z_i < 0$, it is understated. When $z_i = 0$, the account is said to be error free.
The total error amount is defined as:
$$T_z = \sum_{i=1}^{N} z_i.$$
For $y_i \ne 0$,
$$t_i = \frac{z_i}{y_i} = \frac{y_i - x_i}{y_i}$$
is called the fractional error or taint.
The values $(x_1, x_2, \ldots, x_N)$ are unknown before sampling, whereas $(y_1, y_2, \ldots, y_N)$ are known. It is assumed that the amount of any overstatement does not exceed the stated recorded value.

DEFINITIONS
The purpose of the audit is to estimate the total error amount $T_z$ from the examination of a sample of $n$ items of the account, using the errors observed in the sample:
$$\sum_{i=1}^{n} z_i = \sum_{i=1}^{n} t_i y_i.$$
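
The definitions above can be illustrated with a short sketch. The helper name `errors_and_taints` and the three invoice values are hypothetical; the code only applies $z_i = y_i - x_i$ and $t_i = z_i / y_i$ to a toy set of items.

```python
def errors_and_taints(book, audited):
    """Compute the errors z_i = y_i - x_i and the taints t_i = z_i / y_i.

    Items with a zero book value get a taint of None, since the taint
    is only defined for y_i != 0.
    """
    errors = [y - x for y, x in zip(book, audited)]
    taints = [z / y if y != 0 else None for z, y in zip(errors, book)]
    return errors, taints

# Hypothetical items: one error free, one overstated, one understated.
book = [100, 200, 50]      # y_i, known before sampling
audited = [100, 150, 60]   # x_i, known only after the audit
errors, taints = errors_and_taints(book, audited)

# The total error can be written with the errors or with taints times book values.
total_from_errors = sum(errors)
total_from_taints = sum(t * y for t, y in zip(taints, book))
```

Both totals agree, which is the identity $\sum z_i = \sum t_i y_i$ used on the slide.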

Stringer Bound (Stringer, 1963)
Let $T_1, \ldots, T_n$ be independent random variables which represent the taintings. The distribution of these taintings is some unknown mixture of distributions on the interval $[0, 1]$, so that $\Pr(0 \le T_i \le 1) = 1$.
Let $\mu = E(T_i)$ and let $0 \le t_{1:n} \le t_{2:n} \le \ldots \le t_{n:n} \le 1$ be the order statistics of $(T_1, T_2, \ldots, T_n)$.
For $\alpha \in (0, 1)$ and $i = 0, 1, \ldots, n-1$, let $p = p_n(i; 1-\alpha)$ be the unique solution of:
$$\sum_{k=0}^{i} \binom{n}{k} p^k (1-p)^{n-k} = \alpha,$$
with $p_n(n; 1-\alpha) = 1$.

Stringer Bound (Stringer, 1963)
The Stringer upper bound for the total overstatement error is obtained by combining the upper limits for the sample error rates with the taints:
$$\hat{T}_z = T_y \, p_n(0; 1-\alpha) + T_y \sum_{i=1}^{n} \left[ p_n(i; 1-\alpha) - p_n(i-1; 1-\alpha) \right] t_{n-i+1:n},$$
where $p_n(i; 1-\alpha)$ is the $(1-\alpha)$ upper confidence limit for the binomial parameter when $i$ errors are observed in a sample of size $n$.

Stringer Bound (Stringer, 1963)
Equivalently, for a given $\alpha$, $n$ and number of errors $i$, it is possible to find the value $p_n(i; 1-\alpha)$ that satisfies:
$$\sum_{j=0}^{i} \binom{n}{j} \left[ p_n(i; 1-\alpha) \right]^j \left[ 1 - p_n(i; 1-\alpha) \right]^{n-j} = \alpha.$$
The Stringer bound is sometimes calculated using the Poisson approximation to obtain the upper confidence limits $p_n(i; 1-\alpha)$.
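
The Stringer formula can be sketched in a few lines. The code below is an illustration, not the authors' implementation: the function names are hypothetical, the binomial limits $p_n(i; 1-\alpha)$ are found by plain bisection on the binomial sum (exact binomial, not the Poisson approximation), and the taints are assumed to lie in $[0, 1]$.

```python
from math import comb

def binom_cdf(i, n, p):
    """P(at most i errors) in a binomial sample of size n with error rate p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(i + 1))

def upper_limit(i, n, alpha, tol=1e-12):
    """(1 - alpha) upper confidence limit p_n(i; 1 - alpha), found by bisection."""
    if i >= n:
        return 1.0                      # convention p_n(n; 1 - alpha) = 1
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The binomial sum decreases in p: above alpha means the root lies to the right.
        if binom_cdf(i, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def stringer_bound(taints, total_book, alpha=0.05):
    """Stringer upper bound for the total overstatement error."""
    n = len(taints)
    ordered = sorted(taints, reverse=True)   # t_{n:n}, t_{n-1:n}, ..., t_{1:n}
    p_prev = upper_limit(0, n, alpha)
    bound = p_prev
    for i, t in enumerate(ordered, start=1):
        p_i = upper_limit(i, n, alpha)
        bound += (p_i - p_prev) * t          # [p_n(i) - p_n(i-1)] * t_{n-i+1:n}
        p_prev = p_i
    return total_book * bound

# Hypothetical sample of 50 items with three tainted items.
bound = stringer_bound([0.0] * 47 + [0.2, 0.5, 0.1], total_book=1_000_000)
```

With no observed errors the bound reduces to $T_y \, p_n(0; 1-\alpha) = T_y (1 - \alpha^{1/n})$, which gives a quick sanity check of the bisection step.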

Empirical Likelihood estimator
Empirical likelihood (EL) is a non-parametric likelihood.
Hartley and Rao first introduced it in 1968 in the context of survey sampling, as the scale-load approach.
Since the early 2000s the EL approach has also been developed in the survey sampling literature.
The EL approach provides non-parametric confidence intervals similar to parametric likelihood ratio intervals.

Empirical Likelihood estimator
The shape and the orientation of the EL intervals are completely determined by the data.
Chen et al. (2003) obtained EL intervals for the population mean in populations containing many zero values, which is the case in audit sampling.
Parametric likelihood ratio intervals based on parametric mixture distributions perform better than the standard normal theory intervals in terms of coverage, but EL intervals perform better under deviations from the assumed mixture model: their non-coverage rates below the lower bound are closer to the nominal value, and their lower bounds are larger.

EL approach (Berger, 2016)
Let $U$ be a finite population of $N$ units. $\theta_0$ is the unique solution of $G(\theta) = 0$, where
$$G(\theta) = \sum_{i \in U} g_i(\theta).$$
The empirical log-likelihood function is defined as
$$\ell(m) = \log\Big( \prod_{i \in S} m_i \Big) = \sum_{i \in S} \log(m_i). \qquad (1)$$
$m_i$ is estimated by the value $\hat{m}_i$ which maximizes $\ell(m)$ subject to
$$m_i \ge 0, \qquad \sum_{i \in S} m_i c_i = C.$$

EL approach (Berger, 2016)
The solution is given by
$$\hat{m}_i = \{ (t + \eta)^{\top} c_i \}^{-1} = (\pi_i + \eta^{\top} c_i)^{-1},$$
where $\pi_i = n p_i$.

Maximum EL estimator
The empirical log-likelihood ratio function is defined by
$$\hat{r}(\theta) = 2 \{ \ell(\hat{m}) - \ell(\hat{m}, \theta) \},$$
where
$$\ell(\hat{m}) = \sum_{i \in S} \log(\hat{m}_i), \qquad \ell(\hat{m}, \theta) = \sum_{i \in S} \log(\hat{m}_i(\theta)).$$
$\hat{m}_i(\theta)$ is the value that maximizes (1) subject to
- $m_i \ge 0$
- $\sum_{i \in S} m_i c_i^{*} = C^{*}$, with $c_i^{*} = (c_i^{\top}, 0)^{\top}$ and $C^{*} = (C^{\top}, 0)^{\top}$
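
To give a feel for how a log-likelihood ratio of this kind is computed, the sketch below implements the classical empirical likelihood ratio for a mean under independent, equally weighted observations (Owen, 1988). This is a swapped-in simplification and not the design-based estimator of the slides: there are no inclusion probabilities $\pi_i$ and no design constraints $c_i$, and the function name `el_ratio_stat` is hypothetical.

```python
from math import log

def el_ratio_stat(x, mu, iters=200):
    """Empirical log-likelihood ratio statistic for H0: mean = mu (i.i.d. case).

    The optimal weights are w_i = 1 / (n * (1 + lam * (x_i - mu))), where lam
    solves sum_i (x_i - mu) / (1 + lam * (x_i - mu)) = 0.
    """
    d = [xi - mu for xi in x]
    d_min, d_max = min(d), max(d)
    if not (d_min < 0 < d_max):
        return float("inf")              # mu lies outside the range of the data
    # All weights stay positive only for lam strictly inside this interval.
    lo, hi = -1.0 / d_max, -1.0 / d_min
    for _ in range(iters):
        lam = (lo + hi) / 2
        g = sum(di / (1 + lam * di) for di in d)
        # g decreases in lam, so a positive value means the root lies to the right.
        if g > 0:
            lo = lam
        else:
            hi = lam
    lam = (lo + hi) / 2
    return 2 * sum(log(1 + lam * di) for di in d)

# Hypothetical taints with many zeros, as is typical in audit samples.
taints = [0.0] * 15 + [0.1, 0.2, 0.3, 0.25, 0.15]
stat = el_ratio_stat(taints, mu=0.08)
```

Under standard conditions the statistic is compared with a chi-square quantile with one degree of freedom (3.841 at the 95% level), and the confidence interval is the set of values `mu` for which the statistic stays below that quantile. The interval is not forced to be symmetric around the sample mean, which is what "shape and orientation determined by the data" refers to.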

HOW TO EVALUATE AUDITING RISK
Previous studies have already demonstrated that different sampling plans, such as Systematic Sampling or Probability Proportional to Size plans (such as Unrestricted Random Sampling, Lahiri Sampling, ...), are all reliable for all sample sizes.
We evaluate the efficiency of the EL Bound relative to the Stringer Bound.
The parameter of interest is the error-per-euro ratio (taint).

SIMULATION STUDY
We simulated data using a real accounting population of credit invoices of an audited company.
[Figure: histogram of the invoice values; x-axis: Value (0 to 5000), y-axis: Frequency (0 to 40)]

SIMULATION STUDY
$X \sim \text{LogNormal}(7.001, 1.71)$.
$N = 10000$.
$m = 1000$ samples.
Errors randomly associated with the $X_i$ (no fraudulent hypothesis).
Error rates:
+ 5%
+ 10%
+ 20%
Taint simulated values:
- 0.1-0.3
- 0.5-0.7
- 0.2-0.7
Sample fractions:
* 0.05
* 0.1
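
A setup of this kind could be generated along the following lines. Several details are not stated on the slide and are assumptions here: 1.71 is read as the standard deviation of the log values, taints are drawn uniformly within the stated range, the items in error are chosen completely at random, and the sample is a simple random sample without replacement. The function names are hypothetical.

```python
import random

def simulate_population(N=10_000, mu=7.001, sigma=1.71, error_rate=0.05,
                        taint_range=(0.1, 0.3), seed=0):
    """Simulate values and taints for one scenario (assumptions as stated above)."""
    rng = random.Random(seed)
    values = [rng.lognormvariate(mu, sigma) for _ in range(N)]
    n_errors = round(error_rate * N)
    in_error = set(rng.sample(range(N), n_errors))   # errors unrelated to the values
    low, high = taint_range
    taints = [rng.uniform(low, high) if i in in_error else 0.0 for i in range(N)]
    return values, taints

def draw_sample(values, taints, fraction=0.05, seed=1):
    """Simple random sample without replacement of the given sampling fraction."""
    rng = random.Random(seed)
    n = round(fraction * len(values))
    idx = rng.sample(range(len(values)), n)
    return [values[i] for i in idx], [taints[i] for i in idx]

values, taints = simulate_population()
sample_values, sample_taints = draw_sample(values, taints, fraction=0.05)
```

Repeating `draw_sample` with $m = 1000$ different seeds and computing a bound on each sample would give the replicates needed for the comparison criteria.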

SIMULATION STUDY
To compare the different estimators, Tightness, MSE and Coverage Probability have been computed.
* Tightness indicates how close the bound value is to the true error rate:
$$\frac{AV(\hat{t}) - t}{t} \times 100.$$
If it is small, the sampling method is said to be tight; if it is large, it is said to be conservative.
* Variability of the bound: this is an indicator of the uncertainty of the bound. It is measured by the Mean Squared Error (MSE):
$$\frac{1}{m} \sum_{i=1}^{m} (\hat{t}_i - t)^2,$$
where $\hat{t}_i$ is the estimated value of the error rate at the $i$-th replicate.
* Coverage probability: for a specific bound, this is the proportion of replications for which the bound is greater than or equal to the true population error amount. A bound is considered unreliable if its coverage is significantly below the specified nominal coverage; otherwise it is reliable.
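
The three criteria can be computed from the replicated bounds as in the sketch below. The function name `evaluate_bound` and the four replicate values are hypothetical; $AV(\hat{t})$ is read as the average of the bounds over the replicates.

```python
def evaluate_bound(bounds, true_rate):
    """Return (tightness in percent, MSE, coverage) over the replicates."""
    m = len(bounds)
    average = sum(bounds) / m                                # AV(t_hat)
    tightness = (average - true_rate) / true_rate * 100
    mse = sum((b - true_rate) ** 2 for b in bounds) / m
    coverage = sum(b >= true_rate for b in bounds) / m       # share of bounds covering t
    return tightness, mse, coverage

# Hypothetical bounds from four replicates, with a true error rate of 5%.
tightness, mse, coverage = evaluate_bound([0.06, 0.07, 0.04, 0.05], true_rate=0.05)
```

A bound that always exceeds the true rate by a wide margin has coverage 1 but large tightness and MSE, which is the trade-off examined in the case studies.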

Case Study I: Taint 0.1-0.3

                  EL                            Stringer
% Error  % n/N    Tight.   MSE        COV       Tight.   MSE        COV
5        5        35.16%   8.745E-05  0.976     49.13%   8.012E-04  1
5        10       24.12%   3.275E-05  0.999     32.16%   3.121E-04  1
10       5        23.75%   1.311E-04  0.978     30.88%   6.532E-04  1
10       10       15.84%   2.003E-04  0.987     21.38%   9.041E-04  0.996
20       5        15.50%   3.096E-04  0.969     19.44%   8.521E-04  0.988
20       10       34.78%   1.281E-04  0.962     47.20%   8.301E-04  0.991

Case Study II: Taint 0.5-0.7

                  EL                            Stringer
% Error  % n/N    Tight.   MSE        COV       Tight.   MSE        COV
5        5        35.18%   9.451E-05  0.964     61.31%   1.713E-03  1
5        10       24.12%   7.921E-04  0.963     41.21%   1.312E-03  1
10       5        23.89%   4.251E-05  0.9106    39.90%   9.101E-04  0.9984
10       10       16.50%   1.741E-04  0.984     25.86%   1.700E-04  1
20       5        15.43%   7.096E-05  0.978     23.46%   3.512E-04  1
20       10       10.99%   2.283E-05  0.984     15.73%   7.192E-04  1

Case Study III: Taint 0.2-0.7

                  EL                            Stringer
% Error  % n/N    Tight.   MSE        COV       Tight.   MSE        COV
5        5        37.37%   2.217E-04  0.971     70.82%   1.653E-03  1
5        10       24.77%   4.628E-05  0.99      38.27%   7.510E-04  1
10       5        25.23%   1.115E-04  0.971     36.36%   1.210E-03  0.998
10       10       16.44%   4.941E-05  0.97      24.96%   2.612E-04  1
20       5        16.91%   2.140E-04  0.912     24.09%   1.300E-03  0.989
20       10       11.09%   8.162E-05  0.953     16.27%   6.010E-04  0.995

Results
* EL is generally better than the Stringer Bound (SB) for estimating the error rate.
* EL is tighter for the estimation of the real error rate. SB is more conservative.
* The MSE of SB is greater than that of EL.
* Both EL and SB can be considered reliable.

Conclusions and Future research
* We introduced the EL Bound as an upper bound for the error rate.
* In general, the EL Bound performs better than the Stringer Bound.
+ Other distributions (Dagum, Gamma, ...) for the X values.
+ Other hypotheses on the generation of the errors (on the tails, ...).

References
Arens, A.A., Loebbecke, J.K. (1981). Applications of Statistical sampling to Auditing, Prentice-Hall, Inc.
Berger, Y. G., De La Riva Torres, O. (2015). Empirical likelihood confidence intervals for complex sampling designs, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(2), 319-341.
Chen, J., Chen, S. Y., Rao, J. N. K. (2003). Empirical likelihood confidence intervals for the mean of a population containing many zero values, The Canadian Journal of Statistics, 31, 53-68.
Chiodini, P.M., Zenga, M. (2014). Efficiency of the sample plans for symmetric and non-symmetric distributions in auditing: a comparison, in Contribution to sampling Statistics, ed. Mecatti F., Conti PL., Ranalli MG.
Horgan, J.M. (2008). Monetary-unit sampling old and new, School of Computing, Dublin City University, Dublin, Ireland.
Nandram, H.N. (2009). Applications of Statistical sampling to Auditing, Monetary unit sampling: Improving estimation of the total audit error, Advances in Accounting, incorporating Advances in International Accounting, 25, 174-182.
Owen, Art B. (1988). Empirical likelihood ratio confidence intervals for a single functional, Biometrika, 75(2), 237-49.
Rao, J. N. K. (2006). Empirical Likelihood Methods for Sample Survey Data: An Overview, Austrian Journal of Statistics, 35(2-3), 191-196.
Stringer, K.W. (1963). Practical aspects of statistical sampling in auditing, Proceedings of the Business and Economics Statistics Section, 405-411, ASA.