Control Chart for Autocorrelated Processes with Heavy Tailed Distributions

Heldermann Verlag, Economic Quality Control, ISSN 0940-5151, Vol. 23 (2008), No. 2, 197-206

Keoagile Thaga

Abstract: Standard control charts are constructed under the assumption that the observations taken from the process of interest are independent over time. In practice, however, the observations are often correlated. This paper considers the problem of monitoring a process whose observations can be represented by a first-order autoregressive model with a heavy tailed distribution. We propose a chart whose control limits are computed from the process mean and the standard error of the least absolute deviation (LAD) estimator, for the case in which the process quality characteristic follows a heavy tailed t-distribution. This chart has narrower control limits than the usual chart, because for heavy tailed distributions the standard error of the least absolute deviation estimator is smaller than that of the ordinary least squares (OLS) estimator.

Key words and phrases: Least absolute deviation, autocorrelated processes, heavy tailed distribution.

1 Introduction

Statistical process control techniques are widely used in industry for process monitoring and improvement, and control charts are commonly used to achieve these objectives. Most of the control charts discussed in the literature are designed under the assumption that, when only the inherent sources of variation are present, the process produces item characteristics that are independent and identically distributed over time. In some applications, however, the assumption of independent characteristics is not realistic. For instance, variables from tanks, reactors and recycle streams in chemical processes show significant serial correlation (see Harris and Ross [5]). In other cases, the dynamics of the process induce correlation between variables that are closely spaced in time.
If the sampling interval used for process monitoring in these situations is short, the correlation can seriously affect the properties of standard control charts developed under the independence assumption (see Maragah and Woodall [9], VanBrackle and Reynolds [18], Lu and Reynolds [7, 8] and Runger, Willemain and Prabhu [12]). When the variables are correlated, the process mean is not constant, and it may be more realistic to assume that the process mean wanders continually even when the process is in control. Positive autocorrelation of the process variables can cause severe negative bias in traditional estimators of the standard deviation. This bias produces control limits that are much tighter than desired. Lu and Reynolds [7] observed that tight control limits combined with autocorrelation in the variables can result in a false alarm rate much higher than expected. This leads to unnecessary searches for nonexistent special causes of variation, and it can also lead to a loss of confidence in control charts and, eventually, to their abandonment. Corrective action taken after false alarms can also introduce variability into the process, making the control chart less effective and more expensive to use. Negative autocorrelation, on the other hand, can lead to wider control limits, which make the chart insensitive to shifts in the process mean. It is therefore important to take autocorrelation among the process variables into account when designing a process monitoring scheme, in particular a control chart, in order to obtain the full benefit from its use.

Maragah and Woodall [9] observed that autocorrelation can itself be a source of variability. They proposed that if a process is being controlled to a target value and the cause of the autocorrelation can be found, then that cause should be removed from the process. The effect of autocorrelation has been studied for several types of control charts. Vasilopoulos and Stamboulis [19] studied a modification of the X-chart limits for process variables that are correlated within samples. Maragah and Woodall [9] studied the effect of autocorrelation on the retrospective X-chart. Atienza, Tang and Ang [1], Lu and Reynolds [7] and others studied the effect of autocorrelation on cumulative sum (CUSUM) charts, and Lu and Reynolds [8] studied its effect on exponentially weighted moving average (EWMA) control charts. Recently, new control charts have been proposed for autocorrelated data. Two approaches have been advocated for dealing with this phenomenon.
The first approach applies standard control charts to the original observations but adjusts the control limits and the methods of estimating parameters to account for the autocorrelation in the variables (see VanBrackle and Reynolds [18], Lu and Reynolds [8]). This approach is particularly applicable when the level of autocorrelation is not high. The second approach fits a time series model to the process variable. The procedure forecasts each observation from previous values and then computes the forecast errors, or residuals. These residuals are plotted on standard control charts because, when the process is in control, the fitted time series model is the same as the true process model and the parameters are estimated without error, the residuals are independent and identically distributed normal random variables (see Montgomery and Mastrangelo [10]; Lu and Reynolds [8]; Runger, Willemain and Prabhu [12]; Cheng and Thaga [2]; Thaga, Kgosi and Gabaitiri [14]; and Thaga and Yadavalli [13]). Control charts based on residuals seem to work well when the level of autocorrelation is high. When the level of autocorrelation is low, forecasting is more difficult and residual charts are not very effective at detecting process changes.

A fundamental assumption in the typical application of the control charts for autocorrelated processes discussed above is that the random errors are independent and identically distributed normal random variables. In some applications, however, the process variables may have heavy tailed distributions. When the random errors follow a heavy tailed distribution, standard control charts based on the normality assumption are not appropriate: the standard error of the least squares estimates is overestimated, so the chart's control limits are wide and the chart is slow to detect process shifts.

Several charts based on outlier resistant statistics have been proposed for use when there are outliers in the process measurements. These include, among others, the charts of Ferrell [3], whose control limits are calculated using the median, midrange and median range. Langenberg and Iglewicz [6] proposed charts whose control limits are determined by the trimmed mean of the subgroup means and the trimmed mean of the ranges. White and Schroeder [21] proposed a chart constructed by plotting subgroup box plots; this chart uses the subgroup median and the subgroup interquartile range. Rocke [11] proposed a series of robust control charts that use combinations of subgroup trimmed and untrimmed means, medians, ranges and interquartile ranges. Most of these charts use resistant statistics to determine the control limits and then monitor the subgroup means for out-of-control signals. The median charts are less sensitive to process shifts, since the median is not affected by outliers or extreme values. The chart that plots the mean and range with control limits determined from the subgroup means and the interquartile ranges is more effective in detecting mean shifts.

For a heavy tailed distribution, extreme observations are not necessarily outliers or signs of assignable causes of variation. Thavaneswaran and Thaga [15] proposed a chart based on the least absolute deviation that is effective in monitoring processes whose quality measurements follow a heavy tailed distribution. That chart is effective when the process variables are independent over time.
We propose, and show in this article, that for autocorrelated processes whose variables follow a heavy tailed distribution, a chart whose control limits are determined using the standard errors of the least absolute deviation estimators performs better than a chart whose control limits are calculated using the normal-theory standard errors of the ordinary least squares (OLS) estimators.

2 Least Absolute Deviations Estimators

Consider a regression model of the form

$$Y_i = h(x_i, \theta) + \epsilon_i. \tag{1}$$

The $\epsilon_i$ are assumed to be independent for $i = 1, \ldots, n$, with a symmetric distribution. A least absolute deviations (LAD) estimator of $\theta$ is a solution to the problem

$$\min_{\theta} \sum_{i=1}^{n} \left| Y_i - h(x_i, \theta) \right|. \tag{2}$$

Here the deviation between the response variable $Y_i$ and its approximation $h(x_i, \theta)$ provided by the model is measured by the $L_1$ distance instead of the $L_2$ distance used for the least squares estimate. A difficulty with the LAD method arises from the non-differentiability of the objective function in equation (2). The absolute value function $|x|$ is continuous and differentiable at every point except zero, where its left and right derivatives are unequal. Because of this, an approach similar to that of Thavaneswaran and Heyde [16] can be followed, provided the derivative at every point is replaced by the right derivative

$$\frac{\partial^{+} |x|}{\partial x} = I_{\{x \ge 0\}} - I_{\{x < 0\}}. \tag{3}$$

That is, the LAD estimating function is given by

$$g_{\mathrm{LAD}}(y, \theta) = \sum_{i=1}^{n} \frac{\partial h(x_i, \hat\theta_n)}{\partial \theta} \left[ I_{\{Y_i - h(x_i, \hat\theta_n) \ge 0\}} - I_{\{Y_i - h(x_i, \hat\theta_n) < 0\}} \right]. \tag{4}$$

Let $f$ be the conditional density function of the error given $X$, such that $f(0) > 0$. Under suitable regularity conditions (Gourieroux and Monfort [4]), it can be shown that the information associated with the estimating function is

$$4 f^2(0)\, I, \tag{5}$$

where

$$I = E\left[ \sum_{i=1}^{n} \frac{\partial h(x_i, \hat\theta_n)}{\partial \theta} \, \frac{\partial h(x_i, \hat\theta_n)}{\partial \theta^{\top}} \right]. \tag{6}$$

In a first-order autoregressive model, $h(x_i, \xi) = \phi x_{i-1}$ and

$$X_i = \xi + \phi X_{i-1} + \epsilon_i, \tag{7}$$

where $\xi$ is the level parameter of the process (the process mean is $\xi/(1-\phi)$), $\phi$, with $-1 < \phi < 1$, is the autoregressive parameter, and the $\epsilon_i$ are uncorrelated random variables with density function $f(x)$ such that $\sup_x f(x) < \infty$ and $f(0) > 0$. The $\epsilon_i$ have mean 0 and variance $\sigma_\epsilon^2$. It can be shown that the information associated with the LAD estimating function for the parameter $\xi$ is

$$\frac{(2 f(0))^2 \, E[\epsilon_i^2]}{1 - \phi^2}. \tag{8}$$

The LAD estimating function is more efficient than the ordinary least squares (OLS) estimating function if the distribution of the error term satisfies $4 f^2(0) > 1/\sigma^2$, where $\sigma^2$ is the variance of the error term. For normally distributed errors, $f(0) = 1/(\sigma\sqrt{2\pi})$, so $4 f^2(0) = 2/(\pi \sigma^2) < 1/\sigma^2$ and the least squares estimating function is more efficient than the LAD estimating function. When the errors have a Cauchy distribution, $\sigma^2$ is infinite, so $1/\sigma^2 = 0 < 4 f^2(0)$ (see Gourieroux and Monfort [4]), and the LAD estimating function is more efficient than the least squares estimating function.
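To make problem (2) and the estimating function (4) concrete, the sketch below treats the no-intercept case $h(x_i, \phi) = \phi x_{i-1}$, ignoring $\xi$ (a simplifying assumption). For this one-parameter case the minimizer of (2) is a weighted median of the ratios $Y_i / x_i$ with weights $|x_i|$, so no numerical optimizer is needed. The function names and the simulated series (seed, length, $t_2$ innovations) are hypothetical choices for illustration, not part of the method described in this article.

```python
import random


def lad_slope_through_origin(x, y):
    """LAD estimate of phi in y_i = phi * x_i + e_i (no intercept term).

    Because sum |y_i - phi * x_i| = sum |x_i| * |y_i / x_i - phi| (terms with
    x_i == 0 do not depend on phi), the minimizer of problem (2) is a weighted
    median of the ratios y_i / x_i with weights |x_i|.
    """
    pairs = sorted((yi / xi, abs(xi)) for xi, yi in zip(x, y) if xi != 0)
    if not pairs:
        raise ValueError("need at least one nonzero regressor value")
    half = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for ratio, w in pairs:
        cumulative += w
        if cumulative >= half:  # first ratio reaching half the total weight
            return ratio
    return pairs[-1][0]  # only reached through floating-point rounding


def lad_objective(phi, x, y):
    """Objective of problem (2) for h(x_i, phi) = phi * x_i."""
    return sum(abs(yi - phi * xi) for xi, yi in zip(x, y))


def lad_estimating_function(phi, x, y):
    """Estimating function (4): dh/dphi = x_i times the right derivative (3)."""
    return sum(xi * (1 if yi - phi * xi >= 0 else -1) for xi, yi in zip(x, y))


def ols_slope_through_origin(x, y):
    """Least squares estimate of phi, for comparison."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


if __name__ == "__main__":
    # Hypothetical AR(1) series with phi = 0.75 and t(2) innovations, drawn as
    # Z / sqrt(V / 2) with V ~ chi-square(2) = Gamma(shape 1, scale 2).
    rng = random.Random(1)
    series = [0.0]
    for _ in range(500):
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(1.0, 2.0)
        series.append(0.75 * series[-1] + z / (v / 2.0) ** 0.5)
    lagged, current = series[:-1], series[1:]
    print("LAD estimate of phi:", lad_slope_through_origin(lagged, current))
    print("OLS estimate of phi:", ols_slope_through_origin(lagged, current))
```

For the four points $x = (1, 2, 3, 4)$, $y = (2, 4, 6, 100)$ the weighted median gives 2.0, while the least squares slope is $428/30 \approx 14.27$, so one extreme value moves the OLS estimate far more than the LAD estimate. The estimating function is positive just below 2.0 and negative just above it, as expected at the LAD solution.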

3 The New Control Chart

When the process variables are correlated, the variable at time $t$ can be represented as $X_t = \xi + \phi X_{t-1} + \epsilon_t$. The Shewhart control chart for the process variables, assuming normally distributed errors, uses the control limits

$$\mathrm{LCL} = \hat\theta - 3 \frac{\sigma_\epsilon}{\sqrt{(1 - \phi^2) n}}, \qquad \mathrm{CL} = \hat\theta, \qquad \mathrm{UCL} = \hat\theta + 3 \frac{\sigma_\epsilon}{\sqrt{(1 - \phi^2) n}}, \tag{9}$$

where $\hat\theta = \xi/(1 - \phi)$ is the mean of the observations and $\sigma_\epsilon / \sqrt{(1 - \phi^2) n}$ is the standard error of the least squares estimator $\hat\theta$.

We propose a chart based on the LAD estimator. Its control limits are computed using the standard error of the LAD estimator, which is more precise than the OLS estimator for heavy tailed distributions. These control limits are

$$\mathrm{LCL} = \hat\theta - 3 \frac{2 f(0) \sigma_\epsilon}{\sqrt{(1 - \phi^2) n}}, \qquad \mathrm{CL} = \hat\theta, \qquad \mathrm{UCL} = \hat\theta + 3 \frac{2 f(0) \sigma_\epsilon}{\sqrt{(1 - \phi^2) n}}, \tag{10}$$

where $2 f(0) \sigma_\epsilon / \sqrt{(1 - \phi^2) n}$ is the standard error of the LAD estimator. The chart is constructed by plotting the subgroup means against time or sample number, with the control limits given in equation (10). If the errors have a t-distribution with $\nu$ degrees of freedom, then

$$4 f^2(0) = \frac{4 \left( \Gamma\left[ \frac{\nu + 1}{2} \right] \right)^2}{\nu \pi \left( \Gamma\left[ \frac{\nu}{2} \right] \right)^2}. \tag{11}$$

Since $4 f^2(0) < 1$ for every $\nu$ (for example, $4 f^2(0) = 0.5$ when $\nu = 2$), the standard error of the LAD estimating function is always less than that of the OLS estimating function. Therefore a chart based on the LAD estimator produces narrower control limits than a chart based on the OLS estimator for heavy tailed autocorrelated processes.

Figure 1 shows the ARL curves of the control charts based on the LAD and OLS estimators. The two charts are compared by adjusting their control limits so that they have the same in-control ARL. The out-of-control ARLs show that the chart based on the LAD estimator is more sensitive than the chart based on the OLS estimator in detecting both small and large shifts in heavy tailed autocorrelated processes.
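The quantities in (9) to (11) and the efficiency condition $4 f^2(0) > 1/\sigma^2$ from Section 2 can be checked numerically. The sketch below is a minimal illustration assuming unit-scale t errors: it evaluates $4 f^2(0)$ from (11), compares it with $1/\sigma^2$ using the t variance $\nu/(\nu - 2)$, and computes both sets of limits with the LAD standard error taken as $2 f(0) \sigma_\epsilon / \sqrt{(1 - \phi^2) n}$. The parameter values ($\xi = 1$, $\sigma_\epsilon = 1$, $n = 5$, $\nu = 2$) and all function names are hypothetical.

```python
import math


def four_f0_squared_t(nu):
    """4 f^2(0) for a unit-scale Student t density with nu degrees of freedom, as in (11)."""
    f0 = math.gamma((nu + 1) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return 4.0 * f0 ** 2


def t_variance(nu):
    """Variance of the t distribution: nu / (nu - 2) for nu > 2, infinite otherwise."""
    return nu / (nu - 2.0) if nu > 2 else math.inf


def lad_more_efficient(four_f0_sq, variance):
    """Efficiency condition of Section 2: 4 f^2(0) > 1 / sigma^2."""
    return four_f0_sq * variance > 1.0


def shewhart_limits(theta_hat, std_error, width=3.0):
    """(LCL, CL, UCL) for a chart centred at theta_hat."""
    return theta_hat - width * std_error, theta_hat, theta_hat + width * std_error


def ols_and_lad_limits(xi, phi, sigma_eps, n, nu):
    """Limits (9) and (10) with theta_hat = xi / (1 - phi) and t(nu) errors."""
    theta_hat = xi / (1.0 - phi)
    se_ols = sigma_eps / math.sqrt((1.0 - phi ** 2) * n)
    se_lad = math.sqrt(four_f0_squared_t(nu)) * se_ols  # 2 f(0) * se_ols
    return shewhart_limits(theta_hat, se_ols), shewhart_limits(theta_hat, se_lad)


if __name__ == "__main__":
    for nu in (1, 2, 3, 4, 5, 10, 30):
        k = four_f0_squared_t(nu)
        print(f"nu={nu:2d}  4f^2(0)={k:.4f}  LAD more efficient: "
              f"{lad_more_efficient(k, t_variance(nu))}")
    ols, lad = ols_and_lad_limits(xi=1.0, phi=0.75, sigma_eps=1.0, n=5, nu=2)
    print("OLS limits:", ols)
    print("LAD limits:", lad)
```

With these inputs both charts share the centre line $\hat\theta = 4$, and the LAD limits are $2 f(0) = \sqrt{0.5} \approx 0.707$ times as wide as the OLS limits. For the listed values of $\nu$, the efficiency condition holds for $\nu = 1$ to 4 and fails for $\nu = 5$, 10 and 30.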
The ARLs of these charts were computed with the level of autocorrelation fixed at $\phi = 0.75$. For example, to detect a $2\sigma$ shift in the mean, the chart based on the OLS estimator signals on average at the 120th sample, while the chart based on the LAD estimator signals on average at the 50th sample. For a $3\sigma$ shift in the mean, the OLS-based chart signals on average at the 45th sample, while the LAD-based chart signals on average at the 19th sample.

Figure 1: The ARL curves for the OLS estimator and LAD estimator based control charts.

4 An Example

To illustrate how the Shewhart chart based on the normal distribution assumption and the proposed Shewhart-type chart based on the least absolute deviation respond to process changes, we use a simulated set of autocorrelated data. Specific process changes are introduced and both charts are plotted to monitor them. The process was simulated from a first-order autoregressive model with autoregressive parameter $\phi = 0.75$. We simulated 60 samples of size 5 for a process whose quality characteristic follows a t-distribution with 2 degrees of freedom. These data are used to construct the charts shown in Figures 2 and 3: Figure 2 shows the standard Shewhart chart based on the normal distribution assumption, and Figure 3 shows the chart constructed using the LAD estimator. Both figures show that the process is in control. To introduce a process change, we simulated the next 40 process variables from a t-distribution with 3 degrees of freedom, appended these data to the data simulated above and plotted the combined data in Figures 4 and 5. The chart based on the normal distribution assumption, shown in Figure 4, does not signal a shift in the process distribution. The chart that uses the LAD estimator, shown in Figure 5, first signals a shift in the distribution at the 69th observation. The chart based on the OLS estimator does not detect this shift because its control limits are wider for heavy tailed data.
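A data-generating sketch in the spirit of this example follows. It is not the simulation code used for the figures: the intercept $\xi = 0$, the random seed, reading "the next 40 process variables" as 40 further subgroups of size 5, estimating $\sigma_\epsilon$ from the simulated in-control innovations, and all function names are assumptions. The t variates are drawn as $Z / \sqrt{V/\nu}$ with $V \sim \chi^2_\nu$, and the limits follow (9) and (10) with $n = 5$ and $f(0)$ for the $t_2$ distribution.

```python
import math
import random
import statistics


def t_variate(nu, rng):
    """Student t draw: Z / sqrt(V / nu), Z ~ N(0, 1), V ~ chi-square(nu) = Gamma(nu/2, scale 2)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)
    return z / math.sqrt(v / nu)


def simulate_ar1(dfs, xi, phi, rng):
    """X_t = xi + phi * X_{t-1} + e_t with e_t ~ t(dfs[t]); starts at the mean xi / (1 - phi)."""
    x_prev = xi / (1.0 - phi)
    xs, errors = [], []
    for nu in dfs:
        e = t_variate(nu, rng)
        x_prev = xi + phi * x_prev + e
        xs.append(x_prev)
        errors.append(e)
    return xs, errors


def subgroup_means(xs, size):
    """Means of consecutive, non-overlapping subgroups of the given size."""
    return [statistics.fmean(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def first_signal(points, lcl, ucl):
    """1-based index of the first point outside [lcl, ucl], or None."""
    for k, p in enumerate(points, start=1):
        if p < lcl or p > ucl:
            return k
    return None


# Hypothetical settings: 60 in-control subgroups with t(2) errors,
# then 40 subgroups with t(3) errors, subgroup size 5.
rng = random.Random(2008)
PHI, XI, SIZE = 0.75, 0.0, 5
N_IN, N_OUT = 60, 40
dfs = [2] * (N_IN * SIZE) + [3] * (N_OUT * SIZE)
xs, errors = simulate_ar1(dfs, XI, PHI, rng)
means = subgroup_means(xs, SIZE)

# Plug-in estimates from the in-control data (assumption: how sigma_eps is
# estimated is not specified here, and t(2) itself has infinite variance).
theta_hat = statistics.fmean(xs[:N_IN * SIZE])
sigma_eps = statistics.stdev(errors[:N_IN * SIZE])
two_f0 = 2.0 * math.gamma(1.5) / (math.sqrt(2.0 * math.pi) * math.gamma(1.0))  # t(2)
se_ols = sigma_eps / math.sqrt((1.0 - PHI ** 2) * SIZE)
se_lad = two_f0 * se_ols

for name, se in (("OLS", se_ols), ("LAD", se_lad)):
    lcl, ucl = theta_hat - 3.0 * se, theta_hat + 3.0 * se
    print(f"{name}: limits ({lcl:.3f}, {ucl:.3f}), first signal at subgroup "
          f"{first_signal(means, lcl, ucl)}")
```

The signal positions depend on the seed and on the plug-in estimate of $\sigma_\epsilon$, which is unstable for $t_2$ data, so the sketch is not expected to reproduce Figures 2 to 5; it only shows how the two sets of limits are applied to the same subgroup means.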

Figure 2: The OLS estimator based chart for an in-control heavy tailed distribution.

Figure 3: An LAD estimator based chart for an in-control heavy tailed distribution.

Figure 4: The OLS estimator based chart for an out-of-control heavy tailed distribution.

Figure 5: An LAD estimator based chart for an out-of-control heavy tailed distribution.

5 Conclusion

This article proposes a control chart for autocorrelated processes that is more effective than the normal distribution based control chart in detecting process shifts when the process follows a heavy tailed distribution. The chart uses the standard error of the least absolute deviation estimator to estimate the process variability. We use the LAD estimator because, for heavy tailed distributions, it provides more information about the process than the ordinary least squares estimator.

References

[1] Atienza, O. O., Tang, L. C. and Ang, B. W. (2002): A CUSUM Scheme for Autocorrelated Observations. Journal of Quality Technology 34, 188-199.
[2] Cheng, S. W. and Thaga, K. (2005): Max-CUSUM Chart for Autocorrelated Processes. Statistica Sinica 15, 527-546.
[3] Ferrell, E. B. (1953): Control Charts Using Midranges and Medians. Industrial Quality Control 9, 30-34.
[4] Gourieroux, C. and Monfort, A. (1995): Statistics and Econometric Models, Vol. 1. Cambridge University Press, Cambridge.
[5] Harris, T. J. and Ross, W. H. (1991): Statistical Process Control Procedures for Correlated Observations. The Canadian Journal of Chemical Engineering 69, 48-57.
[6] Langenberg, P. and Iglewicz, B. (1986): Trimmed Mean and R Charts. Journal of Quality Technology 18, 152-161.
[7] Lu, C. W. and Reynolds Jr., M. R. (2001): CUSUM Charts for Monitoring an Autocorrelated Process. Journal of Quality Technology 33, 316-334.
[8] Lu, C. W. and Reynolds Jr., M. R. (1999): Control Charts for Monitoring the Mean and Variance of Autocorrelated Processes. Journal of Quality Technology 31, 259-274.
[9] Maragah, H. D. and Woodall, W. H. (1992): The Effect of Autocorrelation on the Retrospective X-Charts. Journal of Statistical Computation and Simulation 40, 29-42.
[10] Montgomery, D. C. and Mastrangelo, C. M. (1991): Some Statistical Process Control Methods for Autocorrelated Data. Journal of Quality Technology 23, 179-193.
[11] Rocke, D. M. (1989): Robust Control Charts. Technometrics 31, 173-184.
[12] Runger, G. C., Willemain, T. R. and Prabhu, S. (1995): Average Run Lengths for CUSUM Control Charts Applied to Residuals. Communications in Statistics - Theory and Methods 24, 273-282.
[13] Thaga, K. and Yadavalli, V. S. S. (2007): Max-EWMA Chart for Autocorrelated Processes. South African Journal of Industrial Engineering 18, 131-152.
[14] Thaga, K., Kgosi, P. M. and Gabaitiri, L. (2007): Max-Chart for Autocorrelated Processes. Economic Quality Control 22, 87-105.
[15] Thavaneswaran, A. and Thaga, K. (2007): Control Chart for Heavy Tailed Distributions. Submitted.
[16] Thavaneswaran, A. and Heyde, C. C. (1999): Prediction via Estimating Functions. Journal of Statistical Planning and Inference 77, 89-101.
[17] Thavaneswaran, A., Macpherson, B. D. and Abraham, B. (1998): An Application of Filtering to Statistical Process Control. Quality Improvement Through Statistical Methods. Birkhäuser, Boston, 109-120.
[18] VanBrackle III, L. N. and Reynolds Jr., M. R. (1997): EWMA and CUSUM Control Charts in the Presence of Correlation. Communications in Statistics - Simulation and Computation 26, 979-1008.
[19] Vasilopoulos, A. V. and Stamboulis, A. P. (1978): Modification of Control Chart Limits in the Presence of Data Correlation. Journal of Quality Technology 10, 22-30.
[20] Wardell, D. G., Moskowitz, H. and Plante, R. D. (1994): Run-Length Distributions of Special-Cause Control Charts for Correlated Processes. Technometrics 36, 3-17.
[21] White, E. M. and Schroeder, R. (1987): A Simultaneous Control Chart. Journal of Quality Technology 19, 1-10.

Keoagile Thaga
Department of Statistics
University of Botswana
Private Bag 0022
Gaborone
Botswana
THAGAK@mopipi.ub.bw