ROBUST CHAUVENET OUTLIER REJECTION


Submitted to the Astrophysical Journal Supplement Series
Preprint typeset using LaTeX style emulateapj v. 12/16/11

M. P. Maples, D. E. Reichart 1, T. A. Berger, A. S. Trotter, J. R. Martin, M. L. Paggen, R. E. Joyner, C. P. Salemi, D. A. Dutton
Department of Physics and Astronomy, University of North Carolina at Chapel Hill, Chapel Hill, NC

ABSTRACT

Sigma clipping is commonly used in astronomy for outlier rejection, but the number of standard deviations beyond which one should clip data from a sample ultimately depends on the size of the sample. Chauvenet rejection accounts for this, but, like sigma clipping, depends on the sample's mean and standard deviation. If these are not known in advance, which is generally the case (else why make the measurement in the first place?), they must be measured from the sample itself, and consequently can be contaminated by the very outliers one is trying to reject. To this end, we present a variation on Chauvenet rejection that we call robust Chauvenet rejection, and show it to be significantly more effective for a wide variety of contaminant types, even when a significant, even dominant, fraction of the sample is contaminated, and especially when the contaminants are strong. Furthermore, we have developed a bulk-rejection variant, which significantly decreases computing times, and the technique can be applied both to weighted data and to model fitting. We are using the technique extensively as we develop the Skynet Robotic Telescope Network's image-processing library, particularly for single-dish radio mapping. The algorithm may be used by anyone at and the source code is available there as well.

Subject headings: methods: statistical; methods: data analysis
1. INTRODUCTION

Whether combining multiple measurements into a single value, or fitting a model to multiple measurements, outliers (meaning contaminants due to other physical processes, drawn from different statistical distributions, including errors in measurement) can result in incorrect inferences. Sigma clipping is a method of outlier rejection that is commonly used in astronomy, where data are rejected if they are more than a certain number of standard deviations from the sample's mean, assuming that the sample is otherwise distributed normally. For example, it is a staple of aperture photometry, used to reject signal above the noise (other sources, cosmic rays, bad pixels, etc.) when measuring the background level in a surrounding annulus.

Sigma clipping, however, is crude in a number of ways, the first being where to set the threshold. For example, if working with 100 data points, 2-sigma variations are expected but 4-sigma variations are not, so one might choose to set the threshold between 2 and 4. However, if working with 10^4 points, 3-sigma variations are expected but 5-sigma variations are not, in which case a greater threshold should be applied.

Chauvenet rejection is a generalization of sigma clipping that takes into account the number of points in the sample (Chauvenet 1863). The criterion for Chauvenet rejection is:

N P(>|z|) < 0.5,  (1)

where N is the number of points and P(>|z|) is the cumulative probability of being more than z standard deviations from the mean, assuming a Gaussian distribution. We apply this criterion iteratively, rejecting only one point at a time for increased stability, but consider the case of (bulk) rejecting all points that meet this criterion each iteration in §9. In either case, after each iteration (1) we lower N by the number of points that we rejected, and (2) we re-estimate the mean and standard deviation, which are used to compute each point's z value, from the remaining points.

1 reichart@unc.edu
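As a concrete illustration, the iterative criterion above can be sketched as follows. This minimal version uses the non-robust mean and standard deviation, and omits the small-N correction factors and minimum-sample safeguards discussed later in the paper; the function name is ours:

```python
import math

import numpy as np


def chauvenet_reject(data):
    """Iteratively reject the single greatest outlier while
    N * P(>|z|) < 0.5, re-estimating the mean and standard
    deviation from the surviving points after each rejection."""
    x = np.asarray(data, dtype=float)
    while len(x) > 2:
        mu, sigma = x.mean(), x.std(ddof=1)
        z = np.abs(x - mu) / sigma
        worst = int(np.argmax(z))
        # Two-sided Gaussian tail probability: P(>|z|) = erfc(z / sqrt(2))
        if len(x) * math.erfc(z[worst] / math.sqrt(2.0)) < 0.5:
            x = np.delete(x, worst)  # reject the greatest outlier
        else:
            break
    return x
```

For example, `chauvenet_reject([0.1, -0.2, 0.05, 0.15, -0.1, 100.0])` rejects the 100.0 on the first pass and then stops, since none of the five remaining points meets the criterion.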
However, traditional Chauvenet rejection, as well as sigma clipping, suffers from the fact that neither the mean nor the standard deviation is robust: both are sensitive to the very outliers that they are being used to reject. In this paper, we develop and evaluate increasingly robust replacements for both of these quantities.

The median is a common, robust replacement for the mean, being both less sensitive to outliers and easy to compute. The mode is even more robust, but not uniquely defined for continuously distributed data. We adopt a half-sample mode approach (e.g., Bickel & Frühwirth 2005), and define how we compute both quantities in §2. The median and the mode equal the mean in the limit of a Gaussian distribution, but measure it with decreasing precision.

In §3, we develop three increasingly robust replacements for the standard deviation. In §4, we calibrate these 2 × 3 = 6 techniques, as well as two less-robust comparison techniques that use the mean and standard deviation, using uncontaminated data.

In §5, we consider the case of two-sided contaminants, meaning that outliers are as likely to be positive as negative. In this case, the mean, median, and mode are all three, on average, insensitive to outliers, even in the limit of a large fraction of the sample being contaminated. Consequently, this is a good case in which to evaluate our replacements for the standard deviation.

In §6, we consider the more challenging case of one-sided contaminants, meaning that the outliers are predominantly positive (or negative). In this case, the mean, median, and mode are all three increasingly sensitive to contaminants as their fraction of the sample increases, with the mean being the least robust and the mode being the most robust.

In §7, we consider the case of rejecting outliers from mildly non-Gaussian distributions. In §5, §6, and §7, we show that robust Chauvenet rejection is more accurate but less precise than regular Chauvenet rejection. In §8, we show that by applying robust and regular Chauvenet rejection in sequence, the results are both accurate and precise. In §9, we evaluate the effectiveness of bulk rejection, which can be significantly less demanding computationally. In §10, we consider the case of weighted data. In §11, we exercise both of these techniques with an astronomical example. In §12, we show how robust Chauvenet rejection can be applied to model fitting. In §13, we compare robust Chauvenet rejection to Peirce rejection, which is a well-known, non-iterative outlier rejection technique that can also be applied to model fitting, and that has a reputation of being superior to regular Chauvenet rejection. We summarize our findings in §14.

2. MEDIAN AND MODE

Instead of using the mean, which is sensitive to outliers and consequently not robust, we use (1) the median and (2) the mode. The median is given by sorting a data set and taking its middle value if the number of data points, N, is odd, and the average of its two middle values if N is even. Adopting a half-sample mode approach (e.g., Bickel & Frühwirth 2005), we define the mode as follows. Sort the data, x_i, and for every index j in the first half of the data set, including the middle value if N is odd, let k be the largest integer such that:

k ≤ j + 0.5 N.  (2)

Of these (j, k) combinations, select the one for which x_k − x_j is smallest. If multiple combinations meet this criterion, let j be the smallest of their j values and k be the largest of their k values.
Restricting oneself to only the k − j + 1 values between and including j and k, repeat this procedure, iterating to completion. Take the median of the final k − j + 1 (typically two) values.

3. 68.3-PERCENTILE DEVIATION

Instead of using the standard deviation, which is also sensitive to outliers, we sort the absolute values of the deviations from either (1) the median or (2) the mode, and either measure or model the 68.3-percentile value, in three increasingly robust ways.

The first way is to simply take the 68.3-percentile value from the sorted distribution. This is equivalent to the standard deviation in the limit of a Gaussian distribution, and works well as long as less than 40%–70% of the measurements are contaminated (see §4 and §5). However, sometimes a greater fraction of the sample may be contaminated. In this case, we model the 68.3-percentile deviation from the lower-deviation measurements.

Fig. 1. Sorted deviations from the median, all drawn from a Gaussian distribution of standard deviation σ = 1. The measured 68.3-percentile deviation is also 1.

Consider the case of N measurements, distributed normally and sorted by the absolute value of their deviations from µ (equal to either the median or the mode). If weighted uniformly (however, see §10), the percentile of the ith element is given by:

(i − 1 + Δi)/N = P(<δi/σ),  (3)

where P(<δi/σ) is the cumulative probability of being within δi/σ standard deviations of the mean, δi is the ith sorted deviation, σ is the 68.3-percentile deviation, and 0 < Δi < 1 is the bin center. We set Δi = 0.683 to yield intuitive results in the limit that N → 1 and µ is known a priori (§10). Solving for δi yields:

δi = √2 σ erf⁻¹[(i − 0.317)/N].  (4)

Consequently, if one plots δi vs. √2 erf⁻¹[(i − 0.317)/N], the distribution is linear, and the slope of this line yields σ (see Figure 1).
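The half-sample mode of §2 can be sketched compactly. Ties are broken here by taking the first (smallest-j) window, a simplification of the tie rule in the text, and the function name is ours:

```python
import numpy as np


def half_sample_mode(x):
    """Half-sample mode (e.g., Bickel & Fruehwirth 2005): repeatedly keep
    the most tightly clustered half of the sorted data, then take the
    median of what remains."""
    x = np.sort(np.asarray(x, dtype=float))
    while len(x) > 2:
        n = len(x)
        half = (n + 1) // 2  # window holding about half the points
        # Width x[k] - x[j] of every contiguous window of `half` points.
        widths = x[half - 1:] - x[:n - half + 1]
        # Smallest-width window; ties broken by the first (smallest-j) one.
        j = int(np.argmin(widths))
        x = x[j:j + half]
    return float(np.median(x))
```

For example, `half_sample_mode([1.0, 2.0, 2.1, 2.2, 9.0])` ignores the stragglers at 1.0 and 9.0 and returns 2.05, the median of the tightest remaining pair.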
However, if a fraction of the sample is contaminated, the shape of the distribution changes: the slope steepens, and (1) if the value from which the deviations are measured (the median or the mode) still approximates that of the uncontaminated measurements, and (2) if the contaminants are drawn from a sufficiently broader distribution, the curve breaks upward (see Figure 2, upper left).2 Consequently, we model the 68.3-percentile deviation of the uncontaminated measurements in three increasingly accurate ways: (1) by simply using the percentile value, as described above (e.g., Figure 2, upper right); (2) by fitting a zero-intercept line to the √2 erf⁻¹[(i − 0.317)/N] < √2 erf⁻¹(0.683) = 1 data and using the fitted slope (e.g., Figure 2, lower left); and (3) by fitting a broken line of intercept zero (see Appendix A for fitting details) to the same data and using the fitted slope of the first component (e.g., Figure 2, lower right).

2 If the median or the mode no longer approximates that of the uncontaminated measurements, the curve can instead break downward, making the following three 68.3-percentile deviation measurement techniques decreasingly robust, instead of increasingly robust (see §6, Figure 17).

Fig. 2. Upper left: 100 sorted deviations from the median, with fraction f1 = 0.5 drawn from a Gaussian distribution of standard deviation σ1 = 1, and fraction f2 = 0.5, representing contaminated measurements, drawn from a Gaussian distribution of standard deviation σ2 = 10. Upper right: Zoom-in of the upper-left panel, with the 68.3-percentile deviation measured using technique 1, yielding a pre-rejection value of σ1 = . Lower left: Zoom-in of the upper-left panel, with the 68.3-percentile deviation measured using technique 2, yielding a pre-rejection value of σ1 = . Lower right: Zoom-in of the upper-left panel, with the 68.3-percentile deviation measured using technique 3, yielding a pre-rejection value of σ1 = . See Figure 3 for post-rejection versions and measured values.

We then iteratively Chauvenet reject the greatest outlier (§1), using either (1) the median or (2) the mode instead of the mean, and the 68.3-percentile deviation instead of the standard deviation.3 The effect of this on the data presented in Figure 2 can be seen in Figure 3, for each of our three, increasingly robust, 68.3-percentile deviation measurement techniques.

3 With the following exception: we never reject down to a sample of identical measurements. In the standard case of producing a single measurement from multiple measurements, we always leave at least two distinct measurements. In the more general case of fitting a multiple-parameter model to multiple measurements (see §12), we always leave at least M + 1 distinct measurements, where M is the number of model parameters.

4. CALIBRATION

We begin by calibrating the 2 × 3 = 6 techniques introduced in §2 and §3, as well as two less-robust comparison techniques, using uncontaminated data. The comparison techniques use the mean and standard deviation (1) without and (2) with iterated Chauvenet rejection.
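Techniques 1 and 2 from §3 can be sketched as follows (technique 3, the broken-line fit of Appendix A, is omitted for brevity). The bisection `_erfinv` helper and the function names are ours, and the small-N correction factors of §4 are not applied:

```python
import math

import numpy as np


def _erfinv(p):
    """Bisection inverse of math.erf on [0, 6), adequate for 0 <= p < 1."""
    lo, hi = 0.0, 6.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def percentile_deviation(x, center, technique=1):
    """68.3-percentile deviation of x about `center` (the median or mode)."""
    dev = np.sort(np.abs(np.asarray(x, dtype=float) - center))
    n = len(dev)
    if technique == 1:
        # Technique 1: read the 68.3-percentile value off the sorted deviations.
        return float(np.quantile(dev, 0.683))
    # Technique 2: the ith sorted deviation satisfies
    # delta_i = sigma * sqrt(2) * erfinv((i - 0.317) / n)   (Equation 4),
    # so fit a zero-intercept line delta_i = sigma * t_i to the t_i < 1 data.
    t = np.array([math.sqrt(2.0) * _erfinv((i - 0.317) / n)
                  for i in range(1, n + 1)])
    m = t < 1.0
    # Least-squares slope of a line through the origin.
    return float(np.sum(t[m] * dev[m]) / np.sum(t[m] * t[m]))
```

On uncontaminated Gaussian data, both techniques recover the standard deviation; technique 2's advantage only appears once the upper tail is contaminated and the fit is restricted to the lower-deviation points.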
For each sample size 2 ≤ N ≤ 100, as well as for N = 1000, we drew 100,000 samples from a Gaussian distribution of mean µ = 0 and standard deviation σ = 1, and then recovered µ and σ using each technique. Averaged over the 100,000 samples, the recovered value of µ was always 0, and the recovered value of σ was 1 in the limit of large N. However, all of the techniques, including the traditional comparison techniques,4 underestimated σ in the limit of small N (see Figure 4). In Figure 4, we plot correction factors by which measured standard and 68.3-percentile deviations need to be multiplied to yield the correct result, on average. We make use of these correction factors throughout this paper, to prevent overaggressive rejection in small samples.

4 It is well known that although the variance can be computed

5. TWO-SIDED CONTAMINANTS

We now evaluate the effectiveness of the 2 × 3 = 6 techniques introduced in §2 and §3, and of the two traditional comparison techniques introduced in §4, at rejecting outliers. For sample sizes N = 1000, 100, and 10, we draw f1 N uncontaminated measurements from a Gaussian distribution of mean µ1 = 0 and standard deviation σ1 = 1, and f2 N contaminated measurements, where f2 = 1 − f1. In this section, we model the contaminants as two-sided, meaning that outliers are as likely to be positive as negative. We draw them from a Gaussian distribution of mean µ2 = 0 and standard deviation σ2, and add them to uncontaminated measurements, drawn as above.

In the case of two-sided contaminants, the mean, median, and mode are all three, on average, insensitive to outliers, even in the limit of a large fraction of the sample being contaminated (f2 → 1; see Figure 6). Consequently, this is a good case in which to evaluate the effectiveness of our three, increasingly robust, 68.3-percentile deviation techniques. (We explore the more challenging case of one-sided contaminants in §6.)

For each technique and sample size, we draw 100 samples for each combination of f2 = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1 and σ2 = 1, 1.6, 2.5, 4.0, 6.3, 10, 16, 25, 40, 63, 100 (see Figure 5), and plot the average recovered µ1 in Figure 6, the uncertainty in the recovered µ1 in Figure 7, the average recovered σ1 in Figure 8, and the uncertainty in the recovered σ1 in Figure 9. As expected with two-sided contaminants, the average recovered µ1 is always 0.
However, the uncertainty in the recovered µ1, the average recovered σ1, and the uncertainty in the recovered σ1 are all susceptible to contamination, especially when f2 and σ2 are large. However, our increasingly robust 68.3-percentile deviation measurement techniques are increasingly effective at rejecting outliers in large-f2 samples, allowing σ1 to be measured significantly more accurately, and both µ1 and σ1 to be measured significantly more precisely. Note that this comes at a marginal cost: when applied to uncontaminated samples, our increasingly robust measurement techniques recover µ1 and σ1 with degrading precision (Figures 7 and 9). This suggests that one can reach a point of diminishing returns; however, this is a drawback that we largely eliminate in §8.

Given this, when robust Chauvenet rejecting two-sided contaminants, we recommend using (1) the median (because it is just as accurate as the mode in this case, more precise, and computationally faster) and (2) the 68.3-percentile deviation as measured by technique 3 from §3 (the broken-line fit).

without bias using Bessel's correction, the standard deviation cannot, and the correction depends on the shape of the distribution. For a normal distribution, without rejection of outliers, the correction is given by √((N − 1)/2) Γ((N − 1)/2)/Γ(N/2), which matches what we determined empirically, and plot in the upper-left panel of Figure 4 (solid black curve).

6. ONE-SIDED CONTAMINANTS

We now repeat the analysis of §5, but for the more challenging case of one-sided contaminants, which we model by drawing values from only the positive side of a Gaussian distribution of mean µ2 = 0 and standard deviation σ2.
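The standard-deviation correction factor for a normal sample without rejection, √((N − 1)/2) Γ((N − 1)/2)/Γ(N/2) (the standard 1/c4 unbiasing factor), can be evaluated directly; the function name is ours, and log-gamma is used for numerical stability at large N:

```python
import math


def std_correction_factor(n):
    """Multiplicative factor c such that E[c * s] = sigma for the sample
    standard deviation s of a normal sample of size n (no rejection):
    sqrt((n - 1) / 2) * Gamma((n - 1) / 2) / Gamma(n / 2)."""
    return math.sqrt((n - 1) / 2.0) * math.exp(
        math.lgamma((n - 1) / 2.0) - math.lgamma(n / 2.0))
```

For N = 2 this gives √(π/2) ≈ 1.2533, and the factor approaches 1 as N grows; the corresponding factors for the robust techniques have no such closed form and are determined empirically (Figure 4, Appendix B).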
This case is more challenging because even though the median is more robust than the mean, and the mode is more robust than the median, even the mode will be biased in the direction of the contaminants (see Figure 10), and increasingly so as the fraction of the sample that is contaminated increases (see Figures 11 and 12). Furthermore, as µ (equal to the mean, the median, or the mode) becomes more biased in the direction of the one-sided contaminants, σ (equal to the standard deviation or the 68.3-percentile deviation, as measured by any of the techniques presented in §3) becomes more biased as well, (1) because of the contaminants, and (2) because it is measured from µ.

However, σ can be measured with less bias if it is measured using only the deviations that are in the opposite direction from the contaminants (in this case, the deviations below µ; Figure 11). Since the direction of the contaminants might not be known a priori, or since the contaminants might not be fully one-sided, instead lying between the cases presented in §5 and §6, we measure σ both below and above µ,5 and use the smaller of these two measurements when rejecting outliers (Figure 12). Note that using the smaller of these two measurements should only be done if the uncontaminated measurements are normally distributed (see §7).

For the same techniques presented in §5, except now computing σ both below and above µ and adopting the smaller of the two, and for the same sample sizes presented in §5, we plot the average recovered µ1 in Figure 13, the uncertainty in the recovered µ1 in Figure 14, the average recovered σ1 in Figure 15, and the uncertainty in the recovered σ1 in Figure 16. With one-sided contaminants, all four of these are susceptible to contamination, especially when f2 and σ2 are large.
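The below/above measurement described here can be sketched with technique 1 (the percentile value); the function name is ours, and measurements exactly at µ are simply included in both sides rather than at 50% weight as in the text's footnote:

```python
import numpy as np


def below_above_deviation(x, mu):
    """68.3-percentile deviation measured separately from the deviations
    below and above mu, returning (sigma_minus, sigma_plus)."""
    x = np.asarray(x, dtype=float)
    below = mu - x[x <= mu]   # deviations of points at or below mu
    above = x[x >= mu] - mu   # deviations of points at or above mu
    return (float(np.quantile(below, 0.683)),
            float(np.quantile(above, 0.683)))


# For one-sided (or in-between) contaminants with normally distributed
# uncontaminated measurements, reject using the smaller of the two:
# sigma = min(below_above_deviation(x, mu))
```

When the contaminants are all positive, sigma_plus is inflated while sigma_minus remains close to the uncontaminated scale, which is why the smaller of the two is the less biased choice.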
However, for a fixed σ-measurement technique, our increasingly robust µ-measurement techniques are increasingly effective at rejecting outliers in large-f2 samples, allowing µ1 and σ1 to be measured both significantly more accurately and significantly more precisely. However, when µ1 cannot be measured accurately, as is the case with the mean and the median when f2 is large (Figures 10, 11, and 12), our (otherwise) increasingly robust σ-measurement techniques are decreasingly effective at rejecting outliers (see Figure 17). However, the mode can measure µ1 significantly more accurately (Figures 10, 11, and 12), even when f2 is large, though with decreasing effectiveness in the low-N limit. In any case, when µ1 is measured accurately, all of these techniques are nearly equally effective, because σ1 is measured on the nearly uncontaminated side of each sample's distribution.

Given this, when robust Chauvenet rejecting one-sided contaminants, we recommend using (1) the mode, and (2) the 68.3-percentile deviation as measured by technique 1 from §3 (the 68.3% value, because it is essentially as accurate as the other techniques (in this

5 When computing σ below or above µ, if a measurement equals µ, we include it in both the below and above calculations, but with 50% weight for each (see §10).

Fig. 3. Figure 2, after iterated Chauvenet rejection. Upper left: Using the 68.3-percentile deviation from technique 1, yielding a final measured value of σ1 = . Upper right: Zoom-in of the upper-left panel. Middle left: Using the 68.3-percentile deviation from technique 2, yielding a final measured value of σ1 = . Middle right: Zoom-in of the middle-left panel. Lower left: Using the 68.3-percentile deviation from technique 3, yielding a final measured value of σ1 = . Lower right: Zoom-in of the lower-left panel.

Fig. 4. Correction factors by which standard and 68.3-percentile deviations, measured from uncontaminated data, need to be multiplied to yield the correct result, on average, and to avoid overaggressive rejection of outliers in small samples: (1) for the case of no rejection, using the mean and standard deviation (solid black curves; see Footnote 4); (2) for the case of Chauvenet rejection, using the mean and standard deviation (dashed black curves); (3) for the case of Chauvenet rejection, using the median and 68.3-percentile deviation as measured by technique 1 from §3 (solid red curves), as measured by technique 2 from §3 (solid green curves), and as measured by technique 3 from §3 (solid blue curves); and (4) for the case of Chauvenet rejection, using the mode and 68.3-percentile deviation as measured by technique 1 (dotted red curves), technique 2 (dotted green curves), and technique 3 (dotted blue curves). Upper left: For the simplest case of computing a single σ (standard or 68.3-percentile deviation), using the deviations both below and above µ (the mean, the median, or the mode; see §5). Lower left: For the case of computing separate σ below and above µ (σ− and σ+, respectively) and using the smaller of the two when rejecting outliers (see §6). Lower right: For the same case, but using σ− to reject outliers below µ and σ+ to reject outliers above µ (see §7). Note that technique 3 defaults to technique 2 when the two are statistically equivalent (see Appendix A), or when fitting to fewer than three points (e.g., when N < 4 for the cases in the top row and when N < 7 for the median cases in the bottom row). Similarly, technique 2 defaults to technique 1 when fitting to fewer than two points (e.g., when N < 3 for the cases in the top row and when N < 5 for the median cases in the bottom row). Oscillations are not noise, but odd-even effects (e.g., with equally weighted data, when N is odd, use of the median always results in at least one zero deviation, requiring a larger correction factor). We use look-up tables for N ≤ 100 and power-law approximations for N > 100 (see Appendix B).

Fig. 5. Blank contaminant strength (σ2) vs. fraction of sample (f2) figure. Each pixel corresponds to either a recovered quantity (µ1 or σ1) or the uncertainty in a recovered quantity (Δµ1 or Δσ1), measured from 100 samples with contaminants modeled by f2 and σ2. This figure is provided as a reference, as axis information would be too small to be easily readable in upcoming figures.

case), more precise,6 and computationally faster). When robust Chauvenet rejecting contaminants that are neither one-sided nor two-sided, but an in-between case, with contaminants that are both high and low, but not in equal proportion or strength, we recommend using the smaller of the below- and above-measured percentile deviations, as in the one-sided case, but recommend using (1) the mode (which is just as effective as the median at eliminating two-sided contaminants (§5), but more effective at eliminating one-sided contaminants), and (2) the 68.3-percentile deviation as measured by technique 3 from §3 (the broken-line fit, which is more effective than the other techniques at eliminating two-sided contaminants (§5) and essentially as effective at eliminating one-sided contaminants).

7. NON-NORMAL UNCONTAMINATED DISTRIBUTIONS

In §5 and §6, we assumed that the uncontaminated measurements were drawn from a Gaussian distribution.
Although this is often a reasonable assumption, sometimes one might need to admit the possibility of an asymmetric (see §7.1) or a peaked or flat-topped (see §7.2) distribution for the uncontaminated measurements.

7.1. Asymmetric Uncontaminated Distributions

In this case, it is better to use the σ (equal to the standard deviation or the 68.3-percentile deviation, as measured by any of the techniques presented in §3) measured from the deviations below µ (equal to the mean, the median, or the mode) to reject outliers below µ, and the σ measured from the deviations above µ to reject outliers above µ, assuming that the distribution is only mildly non-normal, even if this means not always using the smaller of the two σ values, as can be done with normally distributed uncontaminated measurements (§6). However, this weakens one's ability to reject outliers, particularly when one-sided contaminants dominate the sample. Even if the uncontaminated measurements are not asymmetrically distributed, simply admitting the possibility can significantly reduce one's ability to remove contaminants, so this is a decision that should be made with care.

To demonstrate this, we repeat the analysis of §6, not changing the uncontaminated measurements, but changing the assumption that we make about their distribution, instead admitting the possibility of asymmetry. We plot the average recovered µ1 in Figure 18, the uncertainty in the recovered µ1 in Figure 19, the average recovered below-measured σ1− in Figure 20, the uncertainty in the recovered σ1− in Figure 21, the average recovered above-measured σ1+ in Figure 22, and the uncertainty in the recovered σ1+ in Figure 23.

6 As in the case of two-sided contaminants, when applied to uncontaminated samples, our increasingly robust measurement techniques recover µ1 and σ1 with degrading precision (Figures 14 and 16), but again, this is a drawback that we largely eliminate in §8.
The results should be similar to the one-sided contaminant results for µ1 (Figure 13), Δµ1 (Figure 14), σ1− (Figure 15), and Δσ1− (Figure 16), and to the two-sided contaminant results for σ1+ (Figure 8) and Δσ1+ (Figure 9), but for about half as many measurements. This appears to be the case, especially with the more robust techniques. Since this case approximates both one-sided and two-sided results, when robust Chauvenet rejecting contaminants, we recommend using (1) the mode and (2) the 68.3-percentile deviation as measured by technique 3 from §3 (the broken-line fit), for the same reasons that we recommend using this combination when rejecting in-between contaminants from normally distributed uncontaminated measurements (§6).

It should be noted that if we also change the uncontaminated measurements to be asymmetrically distributed, instead of merely admitting the possibility that they are asymmetrically distributed, the mean, median, and mode then mean different things, in the sense that they mark different parts of the distribution, even in the limit of large N and no contaminants. Furthermore, deviations, however measured, from each of these µ measurements likewise then mean different things. A deeper exploration of these differences, and of their effects on contaminant removal, is beyond the scope of this paper. However, as long as the asymmetry is mild, the effectiveness of this technique should not differ greatly from what has been presented here.
It should also be noted that in the simpler case of two-sided contaminants, this technique differs very little from what has been presented in §5, except that σ1−, Δσ1−, σ1+, and Δσ1+ are each determined with about half as many measurements (the measurements on each quantity's side of µ1).

7.2. Peaked or Flat-Topped Uncontaminated Distributions

Consider the following generalization of the Gaussian (technically called an exponential power distribution):

p(δ) = κ/(√(2π) σ) e^(−(1/2)|δ/σ|^(2κ)),  (5)

which reduces to a Gaussian when κ = 1, but results in

Fig. 6. Average recovered µ1 for increasingly robust measurement techniques and decreasing sample sizes (N), for two-sided contaminants. See Figure 5 for contaminant strength (σ2) vs. fraction of sample (f2) axis information. As expected with two-sided contaminants, the recovered values are 0, independent of contaminant fraction or strength. Variation about zero is due to drawing only 100 samples, and is larger for larger values of f2 and σ2, and for smaller values of N (see Figure 7). The colors are scaled logarithmically, and cut off at 0.02, to match the upcoming figures, permitting direct comparison of colors between figures.

Fig. 7. Uncertainty in the recovered µ1 for increasingly robust measurement techniques and decreasing sample sizes (N), for two-sided contaminants. See Figure 5 for contaminant strength (σ2) vs. fraction of sample (f2) axis information. The effect of the contaminants, without rejection, can be seen in the first column: larger contaminant fractions and strengths, as well as smaller sample sizes, result in less precise recovered values of µ1. However, increasingly robust measurement techniques are increasingly effective at rejecting outliers in large-f2 samples, allowing µ1 to be measured significantly more precisely. Note that this comes at a marginal cost: when applied to uncontaminated samples (f2 = 0), these techniques recover µ1 with degrading precisions, of Δµ1/σ1 ≈ 1.0 N^(−1/2), 1.0 N^(−1/2), 1.3 N^(−1/2), 1.3 N^(−1/2), 1.3 N^(−1/2), 7.6 N^(−1/2), 7.6 N^(−1/2), and 7.6 N^(−1/2), respectively. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 8. Average recovered σ1 for increasingly robust measurement techniques and decreasing sample sizes (N), for two-sided contaminants. See Figure 5 for contaminant strength (σ2) vs. fraction of sample (f2) axis information. The effect of the contaminants, without rejection, can be seen in the first column: larger contaminant fractions and strengths result in larger recovered values of σ1. However, increasingly robust measurement techniques are increasingly effective at rejecting outliers in large-f2 samples, allowing σ1 to be measured significantly more accurately. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 9. Uncertainty in the recovered σ1 for increasingly robust measurement techniques and decreasing sample sizes (N), for two-sided contaminants. See Figure 5 for contaminant strength (σ2) vs. fraction of sample (f2) axis information. The effect of the contaminants, without rejection, can be seen in the first column: larger contaminant fractions and strengths, as well as smaller sample sizes, result in less precise recovered values of σ1. However, increasingly robust measurement techniques are increasingly effective at rejecting outliers in large-f2 samples, allowing σ1 to be measured significantly more precisely. Note that this comes at a marginal cost: when applied to uncontaminated samples (f2 = 0), these techniques recover σ1 with degrading precisions of Δσ1/σ1 ≈ 0.7 N^(−1/2), 0.8 N^(−1/2), 1.0 N^(−1/2), 1.0 N^(−1/2), 2.3 N^(−1/2), 1.5 N^(−1/2), 1.5 N^(−1/2), and 2.8 N^(−1/2), respectively. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 10. Left: 1000 measurements, with fraction f1 = 0.15 drawn from a Gaussian distribution of mean µ1 = 0 and standard deviation σ1 = 1, and fraction f2 = 0.85, representing contaminated measurements, drawn from the positive side of a Gaussian distribution of mean µ2 = 0 and standard deviation σ2 = 10, and added to uncontaminated measurements, drawn as above. The measurements have been binned, and the mean (solid red line), median (solid green line), and mode (solid blue line) have been marked. The dashed black curve marks the theoretical, or large-N, distribution, and for this the mean, median, and mode have also been marked, with dashed lines. Right: Zoom-in of the left panel, with smaller bins. A large f2 was chosen to more clearly demonstrate that the mode is biased in the direction of the contaminants, albeit only marginally. Also, the sample mode differs from the theoretical mode more than the sample median and mean differ from the theoretical median and mean, due to noise peaks caused by random sampling. This is typical, and is why the mode, although significantly more accurate, is less precise.

peaked (positive-kurtosis) distributions when κ < 1 and flat-topped (negative-kurtosis) distributions when κ > 1 (see Figure 24). The standard deviation of this distribution is σ/√κ. For this distribution, Chauvenet's criterion (Equation 1) implies that measurements are rejected if their deviations are greater than a certain number of σ/√κ (standard deviations), instead of σ, as in the pure Gaussian case. Furthermore, Equation 4 becomes:

δi = √2 (σ/√κ) erf⁻¹[(i − 0.317)/N],  (6)

which is proportional to σ/√κ, instead of σ. Consequently, the techniques presented in this paper work identically if the uncontaminated measurements are distributed not normally but peaked or flat-topped in this specific way. Of course, not all peaked and flat-topped distributions are of this specific form.
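Equation 5 can be transcribed directly. This sketch takes the quoted prefactor at face value (it is exactly normalized only at κ = 1), and the function name is ours:

```python
import math


def exp_power_pdf(delta, sigma=1.0, kappa=1.0):
    """Exponential power generalization of the Gaussian (Equation 5):
    kappa = 1 recovers the Gaussian; kappa < 1 gives a peaked
    (positive-kurtosis) profile, kappa > 1 a flat-topped one."""
    prefactor = kappa / (math.sqrt(2.0 * math.pi) * sigma)
    return prefactor * math.exp(-0.5 * abs(delta / sigma) ** (2.0 * kappa))
```

At κ = 1 this is the standard normal density; for κ > 1 the density falls off more slowly near the peak and more steeply in the tails, which is the flat-topped behavior described in the text.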
However, if only mildly peaked or flat-topped, this form is a good, first-order approximation, and consequently we conclude that the techniques presented in this paper are not overly sensitive to our assumption of Gaussianity for the uncontaminated measurements.

8. ACCURACY VS. PRECISION

In general, we have found that the mode is as accurate as the median (in the case of two-sided contaminants) or more accurate (in the case of one-sided contaminants), yet the mode is up to 5.8 times less precise than the median, and up to 7.7 times less precise than the mean. We have also found that when µ (equal to the median or the mode) is measured accurately, our increasingly robust 68.3-percentile deviation measurement techniques are either equally accurate (in the case of one-sided contaminants) or increasingly accurate (in the case of two-sided contaminants), yet technique 3 (the broken-line fit) is up to 2.2 times less precise than technique 2 (the linear fit), up to 2.4 times less precise than technique 1 (the 68.3% value), and up to 3.6 times less precise than the standard deviation. Consequently, there appears to be a trade-off between accuracy and precision. But can we have the best of both? In this section, we evaluate using our robust Chauvenet rejection techniques in sequence with regular Chauvenet rejection. Regular Chauvenet rejection, which uses the mean and standard deviation, is the most susceptible to contamination by outliers, but it is also the most precise when not significantly contaminated by outliers. By applying robust Chauvenet rejection first, we best eliminate the outliers that contaminate regular Chauvenet rejection, allowing us to capitalize on its precision without its inaccuracy.
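Regular Chauvenet rejection, referred to throughout this section, is straightforward to implement. The following is a minimal, illustrative sketch (the function name is ours, and this is not the paper's reference implementation), applying Equation 1 iteratively and rejecting one point — the most discrepant — per iteration, as described in the introduction:

```python
import math

def chauvenet_reject(data):
    """Iteratively reject the most discrepant point while it
    satisfies Chauvenet's criterion, N * P(>|z|) < 0.5."""
    data = list(data)
    while len(data) > 2:
        n = len(data)
        mu = sum(data) / n
        sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
        if sigma == 0:
            break
        # Most discrepant point and its z-value
        worst = max(data, key=lambda x: abs(x - mu))
        z = abs(worst - mu) / sigma
        # For a Gaussian, P(>|z|) = erfc(z / sqrt(2))
        if n * math.erfc(z / math.sqrt(2)) < 0.5:
            data.remove(worst)  # reject one point, then re-estimate
        else:
            break
    return data
```

For example, `chauvenet_reject([0.9, 0.95, 0.98, 1.0, 1.02, 1.05, 1.1, 5000.0])` rejects the 5000.0 contaminant on the first iteration and then stops, keeping the other seven points. Note that the mean and standard deviation are re-estimated after every rejection, which is exactly why a strong contaminant can initially inflate both and mask weaker outliers.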
We demonstrate the success of this approach for the following combinations of contaminant types and best-option robust Chauvenet rejection techniques: The median + technique 3 (the broken-line fit) is the best option for two-sided contaminants (contaminants that are both high and low, in equal proportion and strength; §5). We plot the average recovered µ_1, the uncertainty in the recovered µ_1, the average recovered σ_1, and the uncertainty in the recovered σ_1 in the third column of Figures 25–28, respectively. The mode + technique 1 (the 68.3% value) is the

Fig. 11. Measurements, with fraction f_1 = 1 − f_2 drawn from a Gaussian distribution of mean µ_1 = 0 and standard deviation σ_1 = 1, and fraction f_2 = 0.15 (top row), 0.5 (middle row), and 0.85 (bottom row), representing contaminated measurements, drawn from the positive side of a Gaussian distribution of mean µ_2 = 0 and standard deviation σ_2 = 10, and added to uncontaminated measurements, drawn as above. Left column: Median (black line) and 68.3-percentile deviations, measured both below and above the median, using technique 1 from §3 (red lines), technique 2 from §3 (green lines), and technique 3 from §3 (blue lines). Right column: Same as the left column, except using the mode instead of the median. The mode performs better, especially in the limit of large f_2. The 68.3-percentile deviation performs better when paired with the mode, and when measured in the opposite direction from the contaminants. See Figure 12 for post-rejection versions.

Fig. 12. Figure 11, after iterated Chauvenet rejection, using the smaller of the below- and above-measured 68.3-percentile deviations, in this case as measured by technique 1 from §3. Techniques 2 and 3 from §3 yield similar post-rejection samples and µ and σ measurements. The mode continues to perform better in the limit of large f_2.

Fig. 13. Average recovered µ_1 for increasingly robust measurement techniques and decreasing sample sizes (N), for one-sided contaminants. See Figure 5 for contaminant strength (σ_2) vs. fraction of sample (f_2) axis information. The effect of the contaminants, without rejection, can be seen in the first column: Larger contaminant fractions and strengths result in larger recovered values of µ_1. However, for a fixed σ-measurement technique, our increasingly robust µ-measurement techniques are increasingly effective at rejecting outliers in large-f_2 samples, allowing µ_1 to be measured significantly more accurately. However, when µ_1 cannot be measured accurately, as is the case with the mean and the median when f_2 is large (Figures 10, 11, and 12), our (otherwise) increasingly robust σ-measurement techniques are decreasingly effective at rejecting outliers (see Figure 17). This can be seen in columns 3–5, which use the 68.3-percentile deviation as measured by technique 1 from §3, the 68.3-percentile deviation as measured by technique 2 from §3, and the 68.3-percentile deviation as measured by technique 3 from §3, respectively. However, the mode can measure µ_1 significantly more accurately (Figures 10, 11, and 12), even when f_2 is large, though with decreasing effectiveness in the low-N limit. In any case, when µ_1 is measured accurately, all of these techniques are nearly equally effective, because σ_1 is measured on the nearly uncontaminated side of each sample's distribution. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 14. Uncertainty in the recovered µ_1 for increasingly robust measurement techniques and decreasing sample sizes (N), for one-sided contaminants. See Figure 5 for contaminant strength (σ_2) vs. fraction of sample (f_2) axis information. The effect of the contaminants, without rejection, can be seen in the first column: Larger contaminant fractions and strengths, as well as smaller sample sizes, result in less precise recovered values of µ_1. However, to the degree that µ_1 can be measured accurately (Figure 13), all of our Chauvenet rejection techniques are effective at removing outliers (and nearly equally so, since σ_1 is measured on the nearly uncontaminated side of each sample's distribution), allowing µ_1 to be measured significantly more precisely. Note that, as in the case of two-sided contaminants (Figure 7), when applied to uncontaminated samples (f_2 = 0), these techniques recover µ_1 with degrading precisions of ∆µ_1/σ_1 ≈ 1.0N^(−1/2), 1.0N^(−1/2), 1.3N^(−1/2), 1.3N^(−1/2), 1.3N^(−1/2), 7.4N^(−1/2), 7.4N^(−1/2), and 7.4N^(−1/2), respectively. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 15. Average recovered σ_1 for increasingly robust measurement techniques and decreasing sample sizes (N), for one-sided contaminants. See Figure 5 for contaminant strength (σ_2) vs. fraction of sample (f_2) axis information. The effect of the contaminants, without rejection, can be seen in the first column: Larger contaminant fractions and strengths generally result in larger recovered values of σ_1. However, to the degree that µ_1 can be measured accurately (Figure 13), all of our Chauvenet rejection techniques are effective at removing outliers (and nearly equally so, since σ_1 is measured on the nearly uncontaminated side of each sample's distribution), allowing σ_1 to be measured significantly more accurately. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 16. Uncertainty in the recovered σ_1 for increasingly robust measurement techniques and decreasing sample sizes (N), for one-sided contaminants. See Figure 5 for contaminant strength (σ_2) vs. fraction of sample (f_2) axis information. The effect of the contaminants, without rejection, can be seen in the first column: Larger contaminant fractions and strengths, as well as smaller sample sizes, generally result in less precise recovered values of σ_1. However, to the degree that µ_1 can be measured accurately (Figure 13), all of our Chauvenet rejection techniques are effective at removing outliers (and nearly equally so, since σ_1 is measured on the nearly uncontaminated side of each sample's distribution), allowing σ_1 to be measured significantly more precisely. Note that, as in the case of two-sided contaminants (Figure 9), when applied to uncontaminated samples (f_2 = 0), these techniques recover σ_1 with degrading precisions of ∆σ_1/σ_1 ≈ 0.8N^(−1/2), 0.9N^(−1/2), 1.2N^(−1/2), 1.3N^(−1/2), 2.7N^(−1/2), 2.5N^(−1/2), 2.8N^(−1/2), and 3.9N^(−1/2), respectively. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 17. Left: Sorted deviations from below the median of 100 measurements. A fraction f_1 = 0.15 of these measurements are drawn from a Gaussian distribution of mean µ_1 = 0 and standard deviation σ_1 = 1, and a fraction f_2 = 0.85, representing contaminated measurements, are drawn from the positive side of a Gaussian distribution of mean µ_2 = 0 and standard deviation σ_2 = 10, and added to uncontaminated measurements, drawn as above. The standard deviation, measured below the median, is marked (black arrow). Right: Zoom-in of the left panel, but with the 68.3-percentile deviation, also measured below the median, using technique 1 from §3 (68.3% value, red), technique 2 from §3 (linear fit, green), and technique 3 from §3 (broken-line fit, blue), instead marked. In this case, the median significantly overestimates µ_1, measuring 5.81 instead of 0, and consequently the curve breaks downward instead of upward. When this happens, our normally increasingly robust σ-measurement techniques are decreasingly accurate, measuring σ_1 = 4.58, 5.11, 5.78, and 6.32, respectively, instead of 1. In other words, these techniques are only increasingly robust if µ_1 is measured sufficiently accurately. This is the case with the mode, even when f_2 is large, but is not the case with the mean and the median when f_2 is large (Figures 10 and 11), even post-rejection (Figure 12).

best option for one-sided contaminants (§6). We plot the average recovered µ_1, the uncertainty in the recovered µ_1, the average recovered σ_1, and the uncertainty in the recovered σ_1 in the third column of Figures 29–32, respectively. The mode + technique 3 (the broken-line fit) is the best option for in-between cases (contaminants that are both high and low, but not in equal proportion or strength; §6), and/or if the uncontaminated distribution is taken to be asymmetric (§7).
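As a concrete illustration of technique 1 (the 68.3% value) measured below the median, as in Figure 17: one sorts the deviations of the below-median points and reads off their 68.3rd percentile. The sketch below is ours, not the paper's implementation; in particular, the linear interpolation between ranks is our assumption, and details may differ from §3 of the paper:

```python
def deviation_683_below(data):
    """68.3-percentile deviation, measured from below the median
    (an illustrative version of technique 1, the 68.3% value).
    Assumes at least one point lies below the median."""
    xs = sorted(data)
    n = len(xs)
    # Median: middle value, or average of the two middle values
    median = xs[n // 2] if n % 2 else 0.5 * (xs[n // 2 - 1] + xs[n // 2])
    # Sorted (positive) deviations of the below-median points
    devs = sorted(median - x for x in xs if x < median)
    # 68.3rd percentile of the sorted deviations, linearly interpolated
    pos = 0.683 * (len(devs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(devs) - 1)
    return devs[lo] + (devs[hi] - devs[lo]) * (pos - lo)
```

Measuring the deviation only on the side opposite the contaminants, as this function does for positive contaminants, is what lets σ_1 be estimated from the nearly uncontaminated half of the distribution.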
The former case we similarly demonstrate in the limit of two-sided contaminants in Figures 33–36, and in the limit of one-sided contaminants in Figures 37–40. The latter case behaves very similarly to Figures 33–36 in the limit of two-sided contaminants, and similarly to Figure 37 (µ_1), Figure 38 (∆µ_1), Figure 39 (σ_1), Figure 40 (∆σ_1), Figure 35 (σ_1+), and Figure 36 (∆σ_1+), in the limit of (positive) one-sided contaminants, but we will not explore this case further in this paper (§7). In all cases, robust Chauvenet rejection followed by regular Chauvenet rejection results in vastly improved precisions, comparable to those of regular Chauvenet rejection when not significantly contaminated by outliers, with only small compromises in accuracy. The small compromises in accuracy, when they occur, are due to the robust techniques not eliminating enough outliers before regular Chauvenet rejection is applied. We further improve on this by sequencing (1) our best-option robust technique from above and (2) our most precise robust technique, the median + technique 1 (the 68.3% value), to eliminate more outliers before (3) regular Chauvenet rejection. In nearly all cases, this either leaves the accuracies and the precisions the same, or improves them, by as much as 30%. These are worthwhile gains, particularly given the computational efficiency of the additional step, but they are also difficult to see given the logarithmic scaling that we use in Figures 25–40. Consequently, we instead plot the improvement over column 3, multiplied by 100, in column 4. Both of these sequencing techniques, as well as a bulk-rejection variant of the latter technique that we present in §9, require the calculation of new correction factors, which we do as in §4 and plot in Figure 41.

9. BULK REJECTION

So far, we have rejected only one outlier, the most discrepant outlier, at a time, recomputing µ_1 and σ_1 after each rejection.
This can be computationally time-consuming, particularly with large samples, so now we evaluate the effectiveness of bulk rejection. In this case, we reject all measurements that meet Chauvenet's criterion each iteration, recomputing µ_1 and σ_1 once per iteration instead of once per rejection. However, bulk rejection works only if σ_1 is never significantly underestimated. If this happens, even if only for a single iteration, significant over-rejection can occur. Furthermore, each of the techniques that we have presented can fail in this way, under the right (or wrong) conditions: With one-sided contaminants, when µ_1 cannot be measured accurately (Figure 13), the standard deviation underestimates the 68.3-percentile devia-

Fig. 18. Average recovered µ_1 for increasingly robust measurement techniques and decreasing sample sizes, for one-sided contaminants, admitting the possibility of (mildly, asymmetrically) non-normally distributed uncontaminated measurements. See Figure 5 for σ_2 vs. f_2 axis labels. The results are similar to those of Figure 13, but with decreased effectiveness with the less robust techniques. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 19. Uncertainty in the recovered µ_1 for increasingly robust measurement techniques and decreasing sample sizes, for one-sided contaminants, admitting the possibility of (mildly, asymmetrically) non-normally distributed uncontaminated measurements. See Figure 5 for σ_2 vs. f_2 axis labels. The results are similar to those of Figure 14, but with decreased effectiveness with the less robust techniques. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 20. Average recovered σ_1 for increasingly robust measurement techniques and decreasing sample sizes, for one-sided contaminants, admitting the possibility of (mildly, asymmetrically) non-normally distributed uncontaminated measurements. See Figure 5 for σ_2 vs. f_2 axis labels. The results are similar to those of Figure 15, but with decreased effectiveness with the less robust techniques. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 21. Uncertainty in the recovered σ_1 for increasingly robust measurement techniques and decreasing sample sizes, for one-sided contaminants, admitting the possibility of (mildly, asymmetrically) non-normally distributed uncontaminated measurements. See Figure 5 for σ_2 vs. f_2 axis labels. The results are similar to those of Figure 16, but with decreased effectiveness with the less robust techniques. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 22. Average recovered σ_1+ for increasingly robust measurement techniques and decreasing sample sizes, for one-sided contaminants, admitting the possibility of (mildly, asymmetrically) non-normally distributed uncontaminated measurements. See Figure 5 for σ_2 vs. f_2 axis labels. The results are similar to those of Figure 8, but with decreased effectiveness with the less robust techniques. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 23. Uncertainty in the recovered σ_1+ for increasingly robust measurement techniques and decreasing sample sizes, for one-sided contaminants, admitting the possibility of (mildly, asymmetrically) non-normally distributed uncontaminated measurements. See Figure 5 for σ_2 vs. f_2 axis labels. The results are similar to those of Figure 9, but with decreased effectiveness with the less robust techniques. The colors are scaled logarithmically, between 0.02 and 100.

tion as measured by technique 1 (the 68.3% value), which underestimates the 68.3-percentile deviation as measured by technique 2 (the linear fit), which underestimates the 68.3-percentile deviation as measured by technique 3 (the broken-line fit; Figure 17). In this case, the latter technique overestimates σ_1. However, the former three techniques can either overestimate σ_1 or underestimate it, sometimes significantly. With one-sided or two-sided contaminants, when µ_1 can be measured accurately, technique 3 (the broken-line fit) is as accurate (§6) or more accurate (§5) than the other techniques, but it is also the least precise (§8), meaning that it is as likely to underestimate σ_1 as overestimate it, and, again, sometimes significantly. Note also that one can transition between these two cases: µ_1 often begins inaccurately measured but ends accurately measured, after iterations of rejections (Figures 11 and 12).

A solution that works in all cases is to measure σ_1 using both technique 2 (the linear fit) and technique 3 (the broken-line fit), and adopt the larger of the two for bulk rejection. When µ_1 cannot be measured accurately, the deviation curve breaks downward, and the broken-line fit is the most conservative option (Figure 17). When µ_1 can be measured accurately, the deviation curve breaks upward, and the linear fit is a sufficiently conservative option (Figures 2 and 3). (Technique 1, the 68.3% value, is in this case a more conservative option, but can be overly conservative, bulk-rejecting too few points per iteration.) We use the same µ-measurement technique as we use for individual rejection.

Fig. 24. Exponential power distribution (Equation 5), for κ = 0.5 (peaked), 0.7, 1 (Gaussian), 1.4, and 2 (flat-topped).

Finally, once bulk rejection is done, we follow up with individual rejection, as described in the second-to-last paragraph of §8. Individual rejection (1) is significantly faster now that most of the outliers have already been bulk pre-rejected, and (2) ensures accuracy with precision (§8). We plot the results in column 5 of Figures 25–40, and, desirably, they do not differ significantly from those of column 4. Speed-up times are presented in Table 1.

10. WEIGHTED DATA

We now consider the case of weighted data. In this case, the mean is given by:

µ = Σ_{i=1}^{N} w_i x_i / Σ_{i=1}^{N} w_i, (7)

where x_i are the data and w_i are the weights. When the mean is measured from the sample, the standard deviation is given by:

σ = [ Σ_{i=1}^{N} w_i (x_i − µ)² / ( Σ_{i=1}^{N} w_i − ∆ Σ_{i=1}^{N} w_i² / Σ_{i=1}^{N} w_i ) ]^{1/2}, (8)

where ∆ = 1 when summing over data both below and above the mean, and we take ∆ = 0.5 when summing over data either only below or only above the mean.

To determine the weighted median, sort the data and the weights by x_i. First, consider the following, crude definition: Let j be the smallest integer such that:

Σ_{i=1}^{j} w_i ≥ 0.5 Σ_{i=1}^{N} w_i. (9)

The weighted median could then be given by µ = x_j, but this definition would be very sensitive to edge effects. Instead, we define the weighted median as follows. Let:

s_j = Σ_{i=1}^{j} 0.5 (w_{i−1} + w_i), (10)

where w_0 = 0, and let j be the smallest integer such that:

s_j ≥ 0.5 Σ_{i=1}^{N} w_i. (11)

The weighted median is then given by interpolation:

µ = x_{j−1} + (x_j − x_{j−1}) [0.5 Σ_{i=1}^{N} w_i − s_{j−1}] / (s_j − s_{j−1}), (12)

where s_0 = 0.
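The weighted mean, standard deviation, and interpolated weighted median translate directly into code. The following is a minimal sketch (function names are ours), with `delta` playing the role of ∆ in the standard-deviation formula and the cumulative sums s_j built with w_0 = 0, as defined above:

```python
import math

def weighted_mean(x, w):
    # Equation 7: weighted average of the data
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

def weighted_std(x, w, delta=1.0):
    """Equation 8. delta = 1 when summing over data both below and
    above the mean; delta = 0.5 when summing only below or only above."""
    mu = weighted_mean(x, w)
    sw = sum(w)
    num = sum(wi * (xi - mu) ** 2 for xi, wi in zip(x, w))
    return math.sqrt(num / (sw - delta * sum(wi * wi for wi in w) / sw))

def weighted_median(x, w):
    """Equations 10-12: interpolated weighted median."""
    pairs = sorted(zip(x, w))          # sort data and weights by x_i
    xs = [xi for xi, _ in pairs]
    ws = [wi for _, wi in pairs]
    target = 0.5 * sum(ws)
    # s[j] = sum over i <= j of 0.5*(w_{i-1} + w_i), with w_0 = 0
    s = [0.0]
    prev = 0.0
    for wi in ws:
        s.append(s[-1] + 0.5 * (prev + wi))
        prev = wi
    # Smallest j with s_j >= target (Equation 11)
    j = next(k for k in range(1, len(s)) if s[k] >= target)
    if j == 1:
        return xs[0]                   # single-point edge case
    # Equation 12: interpolate between x_{j-1} and x_j
    return xs[j - 2] + (xs[j - 1] - xs[j - 2]) * (target - s[j - 1]) / (s[j] - s[j - 1])
```

With uniform weights, the standard deviation with ∆ = 1 reduces to the usual sample standard deviation (denominator N − 1), and the interpolated weighted median of [1, 2, 3, 4] is 2.5, matching the unweighted median; this is the edge-effect robustness the interpolated definition is designed to provide.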

Fig. 25. Average recovered µ_1 given two-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§5, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 26. Uncertainty in recovered µ_1 given two-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§5, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 27. Average recovered σ_1 given two-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§5, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 28. Uncertainty in recovered σ_1 given two-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§5, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 29. Average recovered µ_1 given one-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 30. Uncertainty in recovered µ_1 given one-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 31. Average recovered σ_1 given one-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 32. Uncertainty in recovered σ_1 given one-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 33. Average recovered µ_1 given in-between contaminants, in the limit of two-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 34. Uncertainty in recovered µ_1 given in-between contaminants, in the limit of two-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 35. Average recovered σ_1 given in-between contaminants, in the limit of two-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 36. Uncertainty in recovered σ_1 given in-between contaminants, in the limit of two-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 37. Average recovered µ_1 given in-between contaminants, in the limit of one-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 38. Uncertainty in recovered µ_1 given in-between contaminants, in the limit of one-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 39. Average recovered σ_1 given in-between contaminants, in the limit of one-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 40. Uncertainty in recovered σ_1 given in-between contaminants, in the limit of one-sided contaminants, for, from left to right: (1) regular Chauvenet rejection; (2) best-option robust Chauvenet rejection (§6, §8); (3) column 2 followed by column 1 (§8); (4) column 2 followed by our most precise robust Chauvenet rejection option followed by column 1, plotted as improvement over column 3, multiplied by 100 (§8); (5) best-option robust Chauvenet bulk pre-rejection followed by column 4 (§9); (6) same as column 5, except for weighted data, with weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.3 (§10); (7) same as column 5, except for weights distributed uniformly from zero, corresponding to σ_w/µ_w ≈ 0.58 (§10); and (8) same as column 5, except for weights distributed inversely over one dex, corresponding to σ_w/µ_w ≈ 0.73 (§10). See Figure 5 for σ_2 vs. f_2 axis labels. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 41. Correction factors by which standard and 68.3-percentile deviations, measured from uncontaminated data, need to be multiplied to yield the correct result, on average, and to avoid overaggressive rejection of outliers in small samples, (1) for the case of our best-option robust techniques (see below; black curves, from Figure 4); (2) for the case of (1) followed by regular Chauvenet rejection (red curves); (3) for the case of (1) followed by our most precise robust technique, the median + technique 1 (the 68.3% value), followed by regular Chauvenet rejection (green curves); and (4) for the case of bulk rejection (see §9) followed by (3) (blue curves). Upper left: for our best-option robust technique for two-sided contaminants, the median + technique 3 (the broken-line fit), in which we compute a single σ using the deviations both below and above µ (§5). Upper right: for our best-option robust technique for one-sided contaminants, the mode + technique 1 (the 68.3% value), in which we compute separate σ below and above µ (σ− and σ+, respectively) and use the smaller of the two when rejecting outliers (§6). Lower left: for our best-option robust technique for in-between cases, the mode + technique 3 (the broken-line fit), in which we also use the smaller of σ− and σ+ when rejecting outliers (§6). Lower right: for our best-option robust technique if the uncontaminated distribution is taken to be asymmetric, the mode + technique 3 (the broken-line fit), in which we use σ− to reject outliers below µ and σ+ to reject outliers above µ (§7). We use look-up tables for N ≤ 100 and power-law approximations for N > 100 (see Appendix B).

To determine the weighted mode, we again follow the half-sample mode approach (§2). For every j such that:

s_j ≤ 0.5 Σ_{i=1}^{N} w_i, (13)

let k be the largest integer such that:

s_k ≤ s_j + 0.5 Σ_{i=1}^{N} w_i, (14)

and for every k such that:

s_k ≥ 0.5 Σ_{i=1}^{N} w_i, (15)

let j be the smallest integer such that:

s_j ≥ s_k − 0.5 Σ_{i=1}^{N} w_i. (16)

TABLE 1
Time in Milliseconds to Measure µ_1 and σ_1^a

Contaminant Type:       2-Sided    1-Sided    2-Sided    1-Sided
RCR Type:^b             Median-T3  Mode-T1    Mode-T3    Mode-T3
Corresponding Figures:
Bulk Pre-Rejection:     No   Yes   No   Yes   No   Yes   No   Yes
Corresponding Column:
N =
N =
N =

^a Averaged over the 121,000 samples in each σ_2 vs. f_2 figure in columns 4 vs. 5 of Figures 25–40, using a single AMD Opteron 6168 processor. Measuring the mode is ∼1.6N^0.05 times slower than measuring the median, and technique 3 (the broken-line fit) is ∼1.2 times slower than technique 1 (the 68.3% value), but bulk pre-rejection is ∼0.65N^0.21 (2-sided) to ∼0.16N^0.73 (1-sided) times faster than no bulk pre-rejection, where N is the sample size. Time to completion is proportional to N^α, where α ≈ 2 (no bulk pre-rejection) or 1 < α < 2 (bulk pre-rejection), plus an overhead constant, which dominates when N is small. In the case of weighted data (see §10), completion times are roughly 1 + 0.7N^0.4 times longer.
^b + RCR (Median-T1) + CR (§9)

Of these (j, k) combinations, select the one for which x_k − x_j is smallest. If multiple combinations meet this criterion, let j be the smallest of their j values and k be the largest of their k values. Restricting oneself to only the k − j + 1 values between and including j and k, repeat this procedure, iterating to completion. Take the weighted median of the final k − j + 1 values.

To determine the weighted 68.3-percentile deviation, measured either from the weighted median or the weighted mode, sort the deviations δ_i = |x_i − µ|, and the weights, by δ_i. Analogously to the weighted median above, first consider the following, crude definition: let j be the smallest integer such that:

Σ_{i=1}^{j} w_i ≥ 0.683 Σ_{i=1}^{N} w_i. (17)

The weighted 68.3-percentile deviation could then be given by σ = δ_j, but, again, this definition would be very sensitive to edge effects. Instead, we define the weighted 68.3-percentile deviation, for technique 1 (the 68.3% value), as follows.
Let:

s_j = Σ_{i=1}^{j} (0.317 w_{i−1} + 0.683 w_i), (18)

where w_0 = 0, and let j be the smallest integer such that:

s_j ≥ 0.683 Σ_{i=1}^{N} w_i. (19)

The weighted 68.3-percentile deviation, for technique 1, is then given by interpolation:

σ = δ_{j−1} + (δ_j − δ_{j−1}) × [0.683 Σ_{i=1}^{N} w_i − s_{j−1}] / (s_j − s_{j−1}), (20)

where s_0 = 0. For techniques 2 (the linear fit) and 3 (the broken-line fit), the 68.3-percentile deviation is given by plotting δ_i vs. √2 erf^{−1}(s_i / Σ_{i=1}^{N} w_i) and fitting as before (§3), except to weighted data (e.g., Appendix A). Note that, as defined here, all of these measurement techniques reduce to their unweighted counterparts (§2 and §3) when all of the weights, w_i, are equal. Note also that the correction factors (§4) that one uses depend on the weights of the data.

7 We center these not halfway through each bin, as we do for the weighted median and weighted mode, but 68.3% of the way through each bin. The need for this can be seen in the case of µ being known a priori, in the limit of one measurement having significantly more weight than the rest, or in the limit of N = 1.
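These weighted definitions can be sketched in code. The following is our own minimal rendering, not the paper's implementation, of the weighted half-sample mode (Equations 13–16) and the technique-1 weighted 68.3-percentile deviation (Equations 18–20); function names are illustrative, and a plain cumulative-weight median and cumulative s_j stand in for the bin-centered versions used for the weighted median and mode in the text:

```python
# Minimal sketch of the weighted half-sample mode (Eqs. 13-16) and the
# technique-1 weighted 68.3-percentile deviation (Eqs. 18-20).
# Illustrative only: s_j in weighted_mode is the plain cumulative weight.
from itertools import accumulate

def weighted_median(values, weights):
    # Crude cumulative-weight median, sufficient for this sketch.
    pairs = sorted(zip(values, weights))
    total, c = sum(weights), 0.0
    for v, w in pairs:
        c += w
        if c >= 0.5 * total:
            return v
    return pairs[-1][0]

def weighted_mode(values, weights):
    # Iterated half-sample mode: keep the narrowest window holding half
    # the total weight (Eqs. 13-16), then iterate to completion.
    pairs = sorted(zip(values, weights))
    while len(pairs) > 2:
        xs = [v for v, _ in pairs]
        s = list(accumulate(w for _, w in pairs))
        half = 0.5 * s[-1]
        cands = []
        for j in range(len(xs)):            # Eqs. 13-14
            if s[j] <= half:
                k = max(i for i in range(len(xs)) if s[i] <= s[j] + half)
                cands.append((j, k))
        for k in range(len(xs)):            # Eqs. 15-16
            if s[k] >= half:
                j = min(i for i in range(len(xs)) if s[i] >= s[k] - half)
                cands.append((j, k))
        width = min(xs[k] - xs[j] for j, k in cands)
        tied = [(j, k) for j, k in cands if xs[k] - xs[j] == width]
        j, k = min(j for j, _ in tied), max(k for _, k in tied)
        if (j, k) == (0, len(pairs) - 1):   # window no longer shrinks
            break
        pairs = pairs[j:k + 1]
    return weighted_median([v for v, _ in pairs], [w for _, w in pairs])

def weighted_683_percentile(deviations, weights):
    # Technique 1: interpolated weighted 68.3 percentile (Eqs. 18-20),
    # with each bin centered 68.3% of the way through (w_0 = delta_0 = 0).
    pairs = sorted(zip(deviations, weights))
    d = [v for v, _ in pairs]
    w = [x for _, x in pairs]
    s, c, prev = [], 0.0, 0.0
    for wi in w:
        c += 0.317 * prev + 0.683 * wi      # Eq. 18
        prev = wi
        s.append(c)
    target = 0.683 * sum(w)                 # right-hand side of Eq. 19
    for j, sj in enumerate(s):
        if sj >= target:
            s0 = s[j - 1] if j > 0 else 0.0
            d0 = d[j - 1] if j > 0 else 0.0
            return d0 + (d[j] - d0) * (target - s0) / (sj - s0)  # Eq. 20
    return d[-1]
```

With all weights equal, both functions reduce to their unweighted counterparts, as noted above; with a single measurement, the percentile deviation reduces to that measurement's deviation, which is the behavior Footnote 7 requires.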
To this end, for each of the four scenarios that we consider in §8, corresponding to the four panels of Figure 41, we have computed correction factors for the case of bulk rejection (§9) followed by individual rejection as described in the second-to-last paragraph of §8, for five representative weight distributions: (1) all weights equal (see Figure 42, solid black curves, same as Figure 41, blue curves); (2) weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.1 (Figure 42, solid red curves); (3) weights distributed normally with σ_w/µ_w = 0.3 (Figure 42, solid green curves); (4) weights distributed uniformly from zero (i.e., low-weight points as common as high-weight points; Figure 42, solid blue curves), corresponding to σ_w/µ_w ≈ 0.58; and (5) weights distributed inversely over one dex (i.e., low-weight points more common than high-weight points, with the sum of the weights of the low-weight points as impactful as the sum of the weights of the high-weight points; Figure 42, solid purple curves), corresponding to σ_w/µ_w ≈ 0.73. From these, we have produced empirical approximations, as functions of (1) N and (2) σ_w/µ_w of the x_i = √2 erf^{−1}(s_i / Σ_{i=1}^{N} w_i) < 1 points, which can be used with any sample of similarly distributed weights (Figure 42, dashed curves; see Appendix B). We demonstrate these for the latter three weight distributions listed above in columns 6, 7, and 8, respectively, of Figures 25–40, and, desirably, they do not differ significantly from those of column 5, in which σ_w/µ_w = 0, although there is some decrease in effectiveness in the low-N, high-σ_w/µ_w limit.

11. EXAMPLE: APERTURE PHOTOMETRY

The Skynet Robotic Telescope Network is a global network of fully automated, or robotic, volunteer telescopes,

Fig. 42. Same as the blue curves from Figure 41, but for five representative weight distributions: (1) all weights equal (solid black curves, same as the blue curves from Figure 41); (2) weights distributed normally with standard deviation as a fraction of the mean σ_w/µ_w = 0.1 (solid red curves); (3) weights distributed normally with σ_w/µ_w = 0.3 (solid green curves); (4) weights distributed uniformly from zero (i.e., low-weight points as common as high-weight points; solid blue curves), corresponding to σ_w/µ_w ≈ 0.58; and (5) weights distributed inversely over one dex (i.e., low-weight points more common than high-weight points, with the sum of the weights of the low-weight points as impactful as the sum of the weights of the high-weight points; solid purple curves), corresponding to σ_w/µ_w ≈ 0.73. From these, we have produced empirical approximations, as functions of (1) N and (2) σ_w/µ_w of the x_i = √2 erf^{−1}(s_i / Σ_{i=1}^{N} w_i) < 1 points, which can be used with any sample of similarly distributed weights (dashed curves; see Appendix B).

scheduled through a common web interface.8 Currently, our optical telescopes range in size from 14 to 40 inches, and span four continents. Recently, we have added Skynet's first radio telescope, Green Bank Observatory's 20-meter (Martin et al., in prep.). We are incorporating robust Chauvenet rejection techniques into Skynet's image-processing library, beginning with our single-dish mapping algorithm (Martin et al., in prep.).
Here, we use robust Chauvenet rejection extensively: (1) to eliminate contaminants during gain calibration; (2) to measure the noise level of the data along scans, and as a function of time, to aid in background subtraction along the scans; (3) to combine local, background-level models into global models, for background subtraction along entire scans; (4) to eliminate contaminants if signal and telescope-position clocks must be synchronized post facto from the background-subtracted data; (5) to measure the noise level of the background-subtracted data across scans, and as a function of time, to aid in radio-frequency interference (RFI) cleaning; and (6) to combine local models of the background-subtracted, RFI-cleaned signal into a global model, describing the entire observation. After this, we locally model and fit a surface to the background-subtracted, time-delay-corrected, RFI-cleaned data, filling in the gaps between the signal measurements to produce the final image (e.g., Figure 43).
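All of these pipeline steps rest on iterated rejection against the Chauvenet criterion of Equation 1. A minimal sketch of that basic loop (regular, not robust, Chauvenet rejection; the function name is ours) is:

```python
# Sketch of regular, iterative Chauvenet rejection (Equation 1):
# reject the single most deviant point while N * P(>|z|) < 0.5,
# re-estimating the mean and standard deviation from the survivors
# after each rejection.
import math

def chauvenet_reject(data):
    data = list(data)
    while len(data) > 2:
        n = len(data)
        mu = sum(data) / n
        sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / (n - 1))
        if sigma == 0:
            break
        # Most deviant remaining point and its z value.
        worst = max(data, key=lambda x: abs(x - mu))
        z = abs(worst - mu) / sigma
        # Two-sided Gaussian tail probability P(>|z|) = erfc(z / sqrt(2)).
        if n * math.erfc(z / math.sqrt(2)) < 0.5:
            data.remove(worst)   # reject one point per iteration
        else:
            break
    return data
```

As the text emphasizes, this loop measures µ and σ from the contaminated sample itself, which is exactly the weakness the robust variants address.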

Fig. 43. Upper left: signal measurement positions from an on-the-fly raster mapping of Cas A, made with Green Bank Observatory's 20-meter diameter telescope, in L band. Lower left: raw image, which has been surface modeled (to fill in the gaps between the signal measurements, without additionally blurring the image), but has not been background subtracted, time-delay corrected, or RFI cleaned. Lower right: final image, which has been background subtracted, time-delay corrected, RFI cleaned, and then surface modeled (Martin et al., in prep.).

Furthermore, each pixel in the final image is weighted, equal to the proximity-weighted number of data points that contributed to its determination (e.g., Figure 44). Here, we demonstrate another application of robust Chauvenet rejection: aperture photometry, in this case of the primary source, Cas A, in the lower-right panel of Figure 43. We have centered the aperture on the source, and have selected its radius to match that of the minimum between the source and its first Airy ring (see Figure 45). We sum all of the values in the aperture, but from each we must also subtract off the average background-level value, which we measure from the surrounding annulus. We have selected the annulus to extend from the radius of the aperture to the edge of the image (Figure 45). However, it is heavily contaminated, by the source's Airy rings and by other sources. This is a good case to demonstrate robust Chauvenet rejection, because (1) a large fraction, f_2, of the pixels in the annulus are contaminated, and (2) they are strongly contaminated, with σ_2 large compared to the background noise level, σ_1. It is also a good case to demonstrate bulk pre-rejection (§9), because

Fig. 44. Left: angular scale, in telescope beamwidths, over which data points are most strongly weighted when modeling the surface about each pixel in the lower-right panel of Figure 43. Variations are due to larger-than-expected gaps in the mapping pattern (upper-left panel of Figure 43), or due to gaps produced by the RFI-cleaning algorithm. Right: proximity-weighted number of data points that contributed to the surface model at each pixel in the lower-right panel of Figure 43. Variations are due to variations in the weighting scale (left panel), and due to over-densities of data points at the bottom and top edges, where the telescope slows down when changing direction (Martin et al., in prep.).

there are a large number of pixels in the annulus, and to demonstrate robust Chauvenet rejection of weighted data (§10, Figure 44). These are one-sided contaminants, so we follow bulk pre-rejection with RCR (Mode + Technique 1) + RCR (Median + Technique 1) + CR (§8, Figures 29–32). The rejected pixels have been excised from the left panel of Figure 45. If one suspected an in-between case, with some low contaminants as well, we would instead follow bulk pre-rejection with RCR (Mode + Technique 3) + RCR (Median + Technique 1) + CR (§8, Figures 33–40). The rejected pixels for this case have been excised from the right panel of Figure 45. For these two cases, the post-rejection background level is measured to be ± and ± , respectively, which is a significant improvement over the pre-rejection value, ± (gain calibration units).

12. MODEL FITTING

So far, we have only applied robust Chauvenet rejection in cases where uncontaminated measurements are distributed, either symmetrically (§5, §6) or asymmetrically (§7), about a single, parameterized value, y. In particular, we have introduced increasingly robust ways of measuring y, or, to put it differently, of fitting y to measurements, namely: the mean, the median, and the mode (§2, §10).
We have also introduced techniques: (1) to more robustly identify outlying deviations from y, for rejection (§3–§7 and §10); (2) to more precisely measure y, without sacrificing robustness (§8); and (3) to more rapidly measure y (§9). In this section, we show that robust Chauvenet rejection can also be applied when measurements are distributed not about a single, parameterized value, but about a parameterized model, y({x}|{θ}), where {x} are the model's independent variables, and {θ} are the model's parameters. But first, we must introduce increasingly robust ways of fitting y({x}|{θ}) to measurements, now given by {{x_i (+σ_{x+,i}, −σ_{x−,i})}, y_i (+σ_{y+,i}, −σ_{y−,i})}. These are generalizations of the mean, the median, and the mode, and replace them in our algorithms. From these best-fit, or baseline, models, deviations can be calculated, after which our remaining techniques can be applied without modification.

Usually, models are fitted to measurements by maximizing a likelihood function.9 For example, if:

σ_{x−,i} = σ_{x+,i} = 0, (21)

σ_{y−,i} = σ_{y+,i} = σ_{y,i}, (22)

and

χ² = Σ_i [(y_i − y({x}|{θ})) / σ_{y,i}]² ≈ N − M, (23)

where N is the number of independent measurements, and M is the number of non-degenerate model parameters, this function is simple: L ∝ e^{−χ²/2}, in which case maximizing L is equivalent to minimizing χ². If these conditions are not met, L, and its maximization, can be significantly more involved (e.g., Reichart 2001; Trotter 2011). Regardless, such maximum-likelihood approaches are generalizations of the mean, and consequently are not robust. To see this, again consider the simple case of the single-parameter model: y({x}|{θ}) = y. Minimizing Equation 23 with respect to y (i.e., solving ∂χ²/∂y = 0 for y) yields a best-fit parameter value, and a best-fit model, of ŷ = (Σ_i y_i/σ²_{y,i}) / (Σ_i 1/σ²_{y,i}) = (Σ_i w_i y_i)/(Σ_i w_i). This is just the weighted mean of the measurements (Equation 7), which is not robust.

One could imagine iterating between (1) maximizing L to establish a best-fit model, and (2) applying robust Chauvenet rejection to the deviations from this model, to eliminate outliers, but given that (1) is not robust, this would be little better than iterating with regular Chauvenet rejection, which relies on the weighted mean. Instead, we retain our previous algorithms, but replace the weighted mean, the weighted median, and the weighted mode with generalized versions, maintaining the robustness, and precision, of each. We generalize the weighted mean as above, with maximum-likelihood model fitting. We generalize the weighted median and the weighted mode as follows. First, consider the case of an M-parameter model where for any combination of M measurements, a unique set of parameter values, {θ}_j, can be straightforwardly calculated.

9 Or, by maximizing the product of a likelihood function and a prior probability distribution, if the latter is available.

Fig. 45. Same as the lower-right panel of Figure 43, except that contaminated pixels (contaminated by other sources, Airy rings, etc.) have been robust Chauvenet rejected within an annulus in which we are measuring the background level, (1) assuming that the contaminants are one-sided (left), and (2) assuming that the contaminants are an in-between case, with some low contaminants as well (right).
Furthermore, imagine doing this for all 1 ≤ j ≤ N!/[M!(N − M)!] combinations of M measurements, and weighting each calculated parameter value by the product of: (1) the weights of each of the M measurements that went into its calculation, and (2) a weight indicating how accurately the parameter value was determined.10,11 Our generalizations are then given by: (1) the weighted median of {{θ}_j}, and (2) the weighted mode of {{θ}_j}. Although more sophisticated implementations can be imagined, here we define these quantities simply, and such that they reduce to the weighted median and the weighted mode, respectively, in the limit of the single-parameter model, just as the maximum-likelihood technique above reduces to the weighted mean in this limit. For the weighted median of {{θ}_j}, we calculate the weighted median for each model parameter separately. For the weighted mode of {{θ}_j}, we determine the half-sample for each model parameter separately, but then include only the intersection of these half-samples in the next iteration.12 We demonstrate these techniques, and

10 For example, consider a linear model, given by y(x) = b + m(x − x̄), with the scatter of the data about this model given by σ_y(x). Given two measurements, (x_1, y_1) and (x_2, y_2), m = (y_2 − y_1)/(x_2 − x_1) can be calculated with uncertainty σ_m = {[σ_y²(x_2) + σ_y²(x_1)]/(x_2 − x_1)²}^{1/2}, and b = [(x_2 − x̄)y_1 + (x̄ − x_1)y_2]/(x_2 − x_1) can be calculated with uncertainty σ_b = {[(x_2 − x̄)²σ_y²(x_1) + (x_1 − x̄)²σ_y²(x_2)]/(x_2 − x_1)²}^{1/2}. Consequently, we additionally weight m by w_m ∝ σ_m^{−2} and b by w_b ∝ σ_b^{−2}, which, if σ_y(x) is constant, as it is in Figures 46 and 47, simplify to w_m ∝ (x_2 − x_1)² and w_b ∝ (x_2 − x_1)²/[(x_1 − x̄)² + (x_2 − x̄)²].

11 If there is a prior probability distribution, this weight should also be multiplied by the prior probability of the calculated parameter value.

12 In this case, iteration ends either: (1) as before, if the next

the maximum-likelihood technique, for a simple, linear, but contaminated, model in Figure 46. In Figure 47, we apply robust Chauvenet rejection as before (§8, §9), but using our generalizations instead of the weighted mode, the weighted median, and the weighted mean.13 Even in the face of heavy contamination, this approach can be very effective at recovering the original, underlying correlation. However, there are some considerations and limitations:

Due to statistical and/or systematic variations, some combinations of M measurements, even M uncontaminated measurements, might not correspond to a well-defined combination of the model's parameter values. For example, consider an exponential model, e.g., y(x) = be^{m(x−x̄)}, that asymptotes from positive values to zero (i.e., b > 0 and m < 0), but with measurements, y_i, that are occasionally negative due to statistical and/or systematic variations. Since there are no combinations of b and m that yield both positive and negative values of y(x), such a combination of measurements, say y_1 and y_2, where x_1 < x_2, must instead be assigned a reasonable, limiting combination of b and m values, such as m = −∞ and:

b = 0 (x_1 < x̄); y_1 (x_1 = x̄); +∞ (x_1 > x̄, y_1 > 0); 0 (x_1 > x̄, y_1 = 0); −∞ (x_1 > x̄, y_1 < 0), (24)

if y_1 > y_2; m = +∞ and:

b = 0 (x_2 > x̄); y_2 (x_2 = x̄); +∞ (x_2 < x̄, y_2 > 0); 0 (x_2 < x̄, y_2 = 0); −∞ (x_2 < x̄, y_2 < 0), (25)

if y_1 < y_2; and both of these, each with half weight, if y_1 = y_2. Simply excluding such combinations would bias calculation of the weighted median of {{θ}_j}, in this case, toward higher values of b and lower values of m. At the same time, such combinations should be excluded from calculation of the weighted mode of {{θ}_j}, lest they be returned artificially. We demonstrate robust Chauvenet rejection applied to such an exponential model in Figures 48 and 49.14,15

intersection would be unchanged (§2, §10), or (2) if the next intersection would be null.
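The pairwise construction of {{θ}_j}, with the generalized weighted median taken component-wise, can be sketched for the linear model of Footnote 10. This is our own illustrative rendering, not the paper's code, assuming unit measurement weights and constant σ_y, so that the additional weights reduce to w_m ∝ (x_2 − x_1)² and w_b ∝ (x_2 − x_1)²/[(x_1 − x̄)² + (x_2 − x̄)²]:

```python
# Sketch of the pairwise {theta}_j construction for the linear model
# y(x) = b + m*(x - xbar), with the generalized weighted median taken
# component-wise (Footnote 10 weights; unit data weights; constant sigma_y).
from itertools import combinations

def weighted_median(values, weights):
    pairs = sorted(zip(values, weights))
    total, c = sum(weights), 0.0
    for v, w in pairs:
        c += w
        if c >= 0.5 * total:
            return v
    return pairs[-1][0]

def pairwise_median_line(xs, ys):
    xbar = sum(xs) / len(xs)
    ms, wms, bs, wbs = [], [], [], []
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # redundant abscissas cannot determine m and b
        m = (y2 - y1) / (x2 - x1)
        b = ((x2 - xbar) * y1 + (xbar - x1) * y2) / (x2 - x1)
        ms.append(m)
        wms.append((x2 - x1) ** 2)                       # w_m
        bs.append(b)
        wbs.append((x2 - x1) ** 2 /
                   ((x1 - xbar) ** 2 + (x2 - xbar) ** 2))  # w_b
    # Generalized weighted median: weighted median of each parameter.
    return weighted_median(bs, wbs), weighted_median(ms, wms)
```

Pairs with identical abscissas are discarded, in line with the redundant-information bullet below; replacing the two weighted medians with the intersected half-sample procedure would give the generalized weighted mode instead.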
13 When rejecting outliers at the end of robust Chauvenet rejection, using the mean and the standard deviation, the 1 in Equation 8 is instead equal to M, the number of non-degenerate model parameters, when summing over data both below and above the mean, and half of this when summing over data either only below or only above the mean.

14 In this case, we approximate the additional weights that are needed to calculate (1) the weighted mode of {{θ}_j} and

Combinations of M measurements with redundant independent-variable information cannot be used to determine all M of the model's parameters. Furthermore, if, in this case, any of the model's parameters can be determined, they will be overdetermined. For example, a plane cannot be uniquely determined by three co-linear measurements. However, if this line happens to run parallel to one of the model's axes, at least one, and possibly two, of the model's parameters (i.e., the plane's slope along this axis, and the plane's normalization, if defined along this line) can be determined. But they will be overdetermined, given three measurements for only one or two parameters. In the interest of simplicity, we discard these (usually rare) combinations completely, noting that uncontaminated measurements selected in this way are unlikely to be preferentially under- or over-estimates, and consequently their exclusion is unlikely to bias calculation of the weighted median of {{θ}_j}, let alone of the weighted mode of {{θ}_j}. However, more sophisticated implementations can also be imagined.

If a fraction, 1 − f, of the measurements is uncontaminated, a smaller fraction, (1 − f)^M, of the set of model solutions, {{θ}_j}, is uncontaminated. So, the higher the dimension of the model, and hence of the model's parameter space, the more difficult it becomes for our generalization of the mode, in
Given two measurements, (x 1,y 1 ) and (x 2,y 2 ), m = lny 2 lny 1 can be calculated with uncertainty σ m = (x 2)+σ x 2 x 1 σ lny 2 lny 2 (x 1) (x 2 x 1 ) 2, and lnb = ( ) ( ) x2 x lny x 2 x 1 x1 x lny 1 x 2 x 2 can be calculated with uncertainty σ lnb = (x 1)+(x 1 x) 2 σ 1 (x 2 x) 2 σ lny 2 lny 2 (x 2) (x 2 x 1 ) 2 (see Footnote 10), where σ lny (x) is the scatter about the linearized model, ln y(x) (see final bullet point). Consequently, we additionally weight m by w m σm 2 and lnb by w lnb σ 2 lnb. However, if σ y(x) = σ y is constant, as it is in Figures 48 and 49, then σ lny (x) is not constant, and consequently cannot be dropped from these expressions as σ y can be in our linear example. 10 Instead, σ +lny (x) ln[y(x)+σ y] lny(x) σ y/y(x) and σ lny (x) lny(x) ln[y(x) σ y] σ y/y(x) when σ y y(x), andσ +lny (x) ln[σ y/y(x)]andσ lny (x) whenσ y y(x). Since the σ y y(x) measurements are the most informative, we approximate σ lny σ y/y(x), which, conservatively, underestimates the weights for the less-informative, σ y y(x) measurements. Consequently, our weight expressions simplify to w m (x 2 x 1 ) 2 y 2 (x 2 )+y 2 (x 1 ) and w (x b 2 x 1 ) 2 (x 1 x) 2 y 2 (x 2 )+(x 2 x) 2 y 2 (x 1 ) (see Footnote 20). Also unlike in our linear example, 10 these weights depend upon the model. Consequently, we update them after each iteration, using the new baseline model to calculate new weights. For the initial baseline model: (1) We adopt weights corresponding to σ lny (x) constant, or equivalently, to y(x) constant; (2) using these weights, we calculate the weighted mode of {{θ} j } ; (3) using this model, we calculate new weights; and (4) we repeat (2) and (3) until convergence. 15 For the maximum-likelihood fits, we use the Gauss-Newton algorithm, since the model is nonlinear. The Gauss-Newton algorithm requires an initial guess, for which we use the baseline model from the previous iteration.

Fig. 46. Left column: 201 measurements, with fraction f_1 = 1 − f_2 drawn from a Gaussian distribution of mean y(x) = x and standard deviation 1, and fraction f_2 = 0.15 (top row), 0.5 (middle row), and 0.85 (bottom row), representing contaminated measurements, drawn from the positive side of a Gaussian distribution of mean zero and standard deviation 10, and added to uncontaminated measurements, drawn as above. Right column: models, {θ}_j, calculated from each pair of measurements in the panel to the left, using y(x) = b + m(x − x̄) and model parameters b and m. Each calculated parameter value is weighted,10 and darker points correspond to models where the product of these weights is in the top 50%. Purple corresponds to the original, underlying model, and in both columns, blue corresponds to the weighted mode of {{θ}_j}, green corresponds to the weighted median of {{θ}_j}, and red corresponds to maximum-likelihood model fitting. The weighted mode of {{θ}_j} performs the best, especially in the limit of large f_2. Maximum-likelihood model fitting performs the worst. See Figure 47 for post-rejection versions.

Fig. 47. Figure 46, after robust Chauvenet rejection. Here, we have performed bulk rejection using our generalization of the mode, followed by our most general robust technique for symmetrically distributed uncontaminated measurements (our generalization of the mode + technique 3, the broken-line fit), followed by our most precise robust technique (our generalization of the median + technique 1, the 68.3% value), followed by regular Chauvenet rejection using our generalization of the mean (§8, §9). Robust Chauvenet rejection proves effective, even in the face of heavy contamination.

Fig. 48. Left column: 101 measurements, with fraction f_1 = 1 − f_2 drawn from a Gaussian distribution of mean y(x) = e^{−(x−0.5)} and standard deviation 1, and fraction f_2 = 0.15 (top row), 0.5 (middle row), and 0.85 (bottom row), representing contaminated measurements, drawn from the positive side of a Gaussian distribution of mean zero and standard deviation 10, and added to uncontaminated measurements, drawn as above. Right column: models, {θ}_j, calculated from each pair of measurements in the panel to the left, using y(x) = be^{m(x−x̄)}, Equations 24 and 25, and model parameters ln b and m (see final bullet point). Each calculated parameter value is weighted,14 and darker points correspond to models where the product of these weights is in the top 50%. Purple corresponds to the original, underlying model, and in both columns, blue corresponds to the weighted mode of {{θ}_j}, green corresponds to the weighted median of {{θ}_j}, and red corresponds to maximum-likelihood model fitting. The contaminants have a greater, relative, effect on the high-x/low-y measurements than on the low-x/high-y measurements, biasing the calculated models toward shallower slopes and higher normalizations (i.e., toward the upper right, in the panels on the right). The weighted mode of {{θ}_j} most successfully overcomes this bias as f_2 → 0.5, but all three techniques fail as f_2 → 0.85. See Figure 49 for post-rejection versions.

Fig. 49. Figure 48, after robust Chauvenet rejection. Here, we have again performed bulk rejection using our generalization of the mode, followed by our most general robust technique for symmetrically distributed uncontaminated measurements (our generalization of the mode + technique 3, the broken-line fit), followed by our most precise robust technique (our generalization of the median + technique 1, the 68.3% value), followed by regular Chauvenet rejection using our generalization of the mean (§8, §9). Robust Chauvenet rejection proves effective in the face of fairly heavy contamination, but is unable to overcome bias introduced by the contaminants (Figure 48; as opposed to bias introduced by poor model design; see final two bullet points) as f_2 → 0.85.

particular, to latch on to a desirable solution. Or, to put it another way, the higher M, the lower the f beyond which robust Chauvenet rejection fails. We demonstrate this for a simple three-parameter model in Figures 50 and 51, using the same contamination fractions as in Figures 46–49.16,17

Naturally, our generalization of the mode, in particular, is most effective if the uncontaminated subset of {{θ}_j} is maximally concentrated. Often, this is but a matter of good model design, or of avoiding standard modeling pitfalls. For example, consider linear (or linearized; see final bullet point) y_i vs. x_i data. Modeling such data with y(x) = b + m(x − x̄) results in a largely uncorrelated, near-maximally concentrated distribution of, at least the highest-weight, b vs. m values.18 However, model-

16 This example is more involved than our previous examples in that, given M measurements, the model's M parameters cannot be solved for analytically. Consequently, we again use the Gauss-Newton algorithm, not just for the maximum-likelihood fits at the end of robust Chauvenet rejection,15 but now also to populate {{θ}_j}. Furthermore, the uncertainties in these calculated parameter values, σ_θ = (σ_θ1, ..., σ_θM), are given by the matrix at the heart of the Gauss-Newton algorithm, which, when N = M, is simply the inverse Jacobian: σ_θ = J^{−1}σ_y. Here, σ_y = (σ_y1, ..., σ_yN=M), with each component, σ_yi, drawn from a Gaussian distribution of mean zero and standard deviation σ_y(x_i). Mathematically, this is equivalent to setting each σ_yi = σ_y(x_i) and, when adding terms in this matrix-vector multiplication, doing so in quadrature. (This yields the same uncertainty expressions that we derived in our previous examples.10,14) As before, each calculated parameter value is then additionally weighted by w_θ ∝ σ_θ^{−2}.
In the case of this trigonometric example, σ_y(x) = σ_y is constant, and consequently (1) a factor of σ_y² results, which (2) can be dropped from these expressions, as in our linear example.10,14 In cases where one instead models ln y (see final bullet point), with σ_θ = J^{−1}σ_lny, one can similarly group and drop the resulting factors of σ_lny² if σ_lny(x) is constant, or model them if it is not: e.g., if modeling ln y with σ_y(x) constant, and consequently σ_lny(x) not constant, we model σ_lny² ∝ y^{−2}(x), as in our linearized exponential example.14

17 This example is also more involved than our previous examples in that the same data can result in (1) different, but equivalent, solutions, and (2) different, but not equivalent, solutions, both of which can unintentionally bias the weighted median of {{θ}_j} and the weighted mode of {{θ}_j}. In the former case, we map all equivalent solutions onto their simplest form: (1) if m < 0, we map m → −m and b → −b; (2) then, if x_0 ≥ 2π/m, we map x_0 → x_0 − floor(mx_0/2π)(2π/m); and (3) then, if x_0 ≥ π/m, we map x_0 → x_0 − π/m and b → −b. In the latter case, one can always find shorter-period/higher-m solutions. Which solution the Gauss-Newton algorithm finds depends upon the initial guess that it is given.15 For this, we use the baseline model from the previous iteration, with the initial baseline model provided by the user. This is analogous to centroiding algorithms in astrometry. If a user clicks anywhere in a star's vicinity, such algorithms arrive at the same solution for the star's center. But if the user clicks too far away, another star's center will be found instead. On this note, an improved version of the Gauss-Newton algorithm is required to ensure local convergence.
For example, each time an iteration results in a poorer fit, we do not apply the increment vector, and then decrease it by 50% for future iterations; each time an iteration results in an improved fit, we apply the increment vector, and then increase it by 50% for future iterations. 18 Here, x is the weighted average of the non-rejected x i values, which we update after each iteration. In the case of our ingsuchdatawithy(x) = b+m(x x 0 )forx 0 x or x 0 x results in a correlated, and hence dispersed and not near-maximally concentrated, distribution, making robust Chauvenet rejection less effective, and in this case, unnecessarily (see Figure 52). 19 However, sometimes such correlations cannot be avoided. For example, if presented with quadratic y i vs. x i data, one can design away correlations between two of the model s three parameters, but not between all three of them simultaneously: If one models the data with y(x) = b+m 1 (x x)+ m 2 (x x) 2, both (1) the highest-weight b vs. m 1 values and (2) the highest-weight m 1 vs. m 2 values will, for the most part, be uncorrelated, but the highest-weight b vs. m 2 values will be marginally (negatively) correlated, which we demonstrate in Figure 53. Despite this, robust Chauvenet rejection is still effective through fairly high contamination fractions, which we demonstrate in Figure 54. Another example of good model design is proper choice of basis. For example, when modeling measurements with an exponential function, e.g., y(x) = be m(x x) (, or with ) a power-law function, m, e.g., y(x) = b x/e lnx one normally models lnb instead of b. 14,19 This is called choice of basis. When performing maximum-likelihood model fitting, choice of basis does not affect the best fit, but can yield a more concentrated/symmetric probability distribution for the parameter of interest, and hence more concentrated/symmetric error bars. 
Likewise, choice of basis can yield a more concentrated/symmetric} distribution of calculated parameter values, {{θ} j. While this does not affect the weighed median of } {{θ} j, it can affect } the weighted mode of {{θ} j. However, unless the scatter of the uncontaminated measurements about the model is significant, this is usually a very small effect, and in practice either form of this parameter can be used. 20 linearized exponential example, 14 with σ y(x) constant and consequently σ lny (x) not constant, we additionally weight each x i by σ 2 lny (x i) y 2 (x i ) when calculating x. 19 In the ( case of ) a linearized power-law ( model: ) lny(x) = m. lnb + m lnx lnx, or y(x) = b x/e lnx Hence, the reference value of x is instead given by weighted logarithmic averaging. } When calculating (1) the weighted } mode of {{θ} j and (2) the weighted median of {{θ} j, we additionally weight m by w m (lnx 2 lnx 1 ) 2 (lnx 2 lnx 1 ) 2 σ 2 lny (x 2)+σ 2 lny (x 1) (lnx 1 lnx) 2 σ 2 lny (x 2)+(lnx 2 lnx) 2 σ 2 lny (x 1) and lnb by w b (see Footnote 14), and whencalculatinglnx, weadditionallyweighteachlnx i byσ 2 lny (x i) (see Footnote 18). If σ lny (x) is constant, it can be dropped from these expressions. 10,14,16,18 If instead σ y(x) is constant, and consequently σ lny (x) is not constant, we again approximate σ 2 lny (x) y 2 (x). 14,16,18 20 Modeling, e.g., y(x) = be m(x x) with model parameter b,
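The step-size safeguard described in Footnote 17 amounts to a simple damped Gauss-Newton iteration. The following sketch illustrates the accept/reject rule only; it is not the authors' implementation, and the two-parameter exponential test model, function names, and starting guess are our own:

```python
import math

def sse(params, xs, ys):
    """Sum of squared residuals for the test model y = a*exp(b*x)."""
    a, b = params
    return sum((a * math.exp(b * x) - y) ** 2 for x, y in zip(xs, ys))

def gauss_newton_step(params, xs, ys):
    """One undamped Gauss-Newton increment for y = a*exp(b*x),
    solving the 2x2 normal equations J^T J delta = -J^T r directly."""
    a, b = params
    g11 = g12 = g22 = r1 = r2 = 0.0
    for x, y in zip(xs, ys):
        e = math.exp(b * x)
        ja, jb = e, a * x * e          # partial derivatives df/da, df/db
        res = a * e - y
        g11 += ja * ja; g12 += ja * jb; g22 += jb * jb
        r1 += ja * res; r2 += jb * res
    det = g11 * g22 - g12 * g12
    return ((-g22 * r1 + g12 * r2) / det, (g12 * r1 - g11 * r2) / det)

def damped_gauss_newton(xs, ys, guess, iterations=100):
    """Gauss-Newton with the accept/reject safeguard: a step that
    worsens the fit is not applied, and the increment is shrunk by 50%;
    a step that improves the fit is applied, and the increment is grown
    by 50% for future iterations."""
    params = list(guess)
    scale = 1.0
    best = sse(params, xs, ys)
    for _ in range(iterations):
        delta = gauss_newton_step(params, xs, ys)
        trial = [p + scale * d for p, d in zip(params, delta)]
        trial_sse = sse(trial, xs, ys)
        if trial_sse < best:   # improved fit: apply, then grow the step
            params, best = trial, trial_sse
            scale *= 1.5
        else:                  # poorer fit: do not apply, shrink the step
            scale *= 0.5
    return params

xs = [0.1 * i for i in range(20)]
ys = [2.0 * math.exp(0.7 * x) for x in xs]   # noiseless y = 2*exp(0.7*x)
a, b = damped_gauss_newton(xs, ys, guess=(1.0, 0.1))
print(round(a, 2), round(b, 2))              # → 2.0 0.7
```

Because a rejected step is never applied, the fit can only improve; the 50% growth on accepted steps lets the increment recover its length after a string of rejections, in the same spirit as Levenberg-Marquardt damping.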

55 Robust Chauvenet Rejection

Application of robust Chauvenet rejection to parameterized models is potentially a very broad topic, with applications spanning not only science, but all disciplines. Here, we have but scratched the surface with a few simple examples.

13. PEIRCE REJECTION

Regular Chauvenet rejection is sigma clipping plus a rule for selecting a reasonable number of sigma for the threshold, given N measurements (§1). It is straightforward to use, and as such has been adopted as standard by many government and industry laboratories, and is commonly taught at universities (Ross 2003). However, it is not the only approach one might take to reject outliers. For example, even Chauvenet (1863) deferred to Peirce's approach (1852; Gould 1855),21 which has recently seen new life with its own implementation in the R programming language (Dardis 2012). Instead of assuming 0.5 in Equation 1, Peirce derives this value from probability theory, and finds (1) that it is weakly dependent on N, asymptoting to 0.5 as N increases, and (2) that it decreases with subsequent rejections. However, unlike Peirce's approach, Chauvenet rejection is amenable to (1) N, (2) the mean, and (3) the standard deviation being updated after each rejection (§1). Peirce's approach requires all three of these quantities to remain fixed until all rejections have been completed, and as such Peirce rejection is less robust than Chauvenet rejection, at least when the latter is implemented iteratively, as we have done. Furthermore, our correction factors (§3, §8-§10) empirically account for the above, weak dependence on N, as well as for differences in implementation (e.g., our use of one-sided deviation measurements, our use of robust quantities, etc.). In Figures 55-58, we compare Peirce rejection to (1) regular Chauvenet rejection and (2) robust Chauvenet rejection, for both two-sided and one-sided contaminants.
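The iterative implementation of Equation 1 that this comparison presupposes (reject at most one point per pass, then update N, the mean, and the standard deviation) can be sketched as follows. This is an illustrative rendering with invented sample values, not the Skynet library's code:

```python
import math

def chauvenet_reject(data):
    """Iteratively reject the single most deviant point while it
    satisfies the Chauvenet criterion N * P(>|z|) < 0.5, re-estimating
    N, the mean, and the standard deviation after every rejection."""
    kept = list(data)
    while len(kept) > 2:
        n = len(kept)
        mean = sum(kept) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in kept) / (n - 1))
        worst = max(kept, key=lambda v: abs(v - mean))
        z = abs(worst - mean) / std
        # For a Gaussian, P(>|z|) = erfc(z / sqrt(2))
        if n * math.erfc(z / math.sqrt(2)) < 0.5:
            kept.remove(worst)
        else:
            break
    return kept

sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0]
print(chauvenet_reject(sample))
# → [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
```

Note that this sketch estimates the mean and standard deviation in the usual, non-robust way; the point of robust Chauvenet rejection is precisely to replace these estimates with more robust ones.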
Given our iterative implementation, and our correction factors, we find regular Chauvenet rejection to be comparable to Peirce rejection when the contaminants are two-sided and N is low, and better than Peirce rejection otherwise. Robust Chauvenet rejection is significantly better than both of these approaches.

14. SUMMARY

The most fundamental act in science is measurement. By combining multiple measurements, one can better constrain a quantity's true value, and its uncertainty. However, measurements, and consequently samples of measurements, can be contaminated. Here, we have introduced, and thoroughly tested, a technique that, while not perfect, is very effective at identifying which measurements in a sample are contaminated, even if they constitute most of the sample, and especially if the contaminants are strong (making contaminated measurements easier to identify). We have considered both symmetrically (§5) and asymmetrically (§6) distributed contamination of both symmetrically (§5, §6) and asymmetrically (§7) distributed uncontaminated measurements, making recommendations in all four cases. We have considered accuracy vs. precision, combining robust and regular Chauvenet rejection techniques to achieve both (§8). We have considered the practical cases of bulk rejection (§9), weighted data (§10), and model fitting (§12), again making recommendations after thorough testing. Finally, we have developed a simple web interface for robust Chauvenet rejection.22 Users may upload a data set, and select from the above scenarios. They are returned their data set with outliers flagged, and with µ1 and σ1 robustly measured. Source code is available here as well.

We gratefully acknowledge the support of the National Science Foundation, through the following programs and awards: ESP , MRI-R , AAG , , and , ISE , HBCU-UP , TUES , and STEM+C . We also gratefully acknowledge the support of the Mt.
Cuba Astronomical Foundation, the Robert Martin Ayers Sciences Fund, and the North Carolina Space Grant Consortium.

APPENDIX

A. BROKEN-LINE FIT THROUGH ORIGIN

Let x_i = √2 erf^{−1}[(i − 0.317)/N] and y_i = δ_i. We model these data as a broken line that passes through the origin:

$$
y = \begin{cases} \sigma_1 x, & i \le m \\ \sigma_1 x_m + \sigma_2 (x - x_m), & i \ge m \ \text{and}\ x_i \le 1, \end{cases} \tag{A1}
$$

where σ1 is the slope of the line for i ≤ m, and our modeled 68.3-percentile deviation, and σ2 is the slope of the line for i ≥ m and x_i ≤ 1. We model the break to occur at x_m, instead of between points, for simplicity. Let the fitness of this three-parameter model be measured by χ²₃ (Equation A2, below).

20 (cont.) instead of ln y(x) = ln b + m(x − x̄) with model parameter ln b, yields a different inverse Jacobian, and a different model for the scatter of the data, σ_y(x) vs. σ_{ln y}(x). But together, these yield the same expressions for the weights, w_b = w_{ln b} and w_m.14,19 Consequently, the difference when calculating the weighted mode of {{θ}_j} is really just one of concentration: using b instead of ln b favors lower values, but again, usually only marginally. (In the case of significant positive contamination, this can actually be favorable, but again, usually only marginally.)

21 It is interesting to note that this topic, although statistical in nature, originates in the field of astronomy. Two of these publications are in the early volumes of the Astronomical Journal, and the third, Chauvenet's A Manual of Spherical and Practical Astronomy, was a standard reference for decades.

22

Fig. 50. Left column: 43 measurements, with fraction f1 = 1 − f2 drawn from a Gaussian distribution of mean y(x) = 3 sin x and standard deviation 1, and fraction f2 = 0.15 (top row), 0.5 (middle row), and 0.85 (bottom row), representing contaminated measurements, drawn from a Gaussian distribution of mean zero and standard deviation 10, and added to uncontaminated measurements, drawn as above. Right columns: models, {θ}_j, calculated from each triplet of measurements in the panel to the left, using y(x) = b sin[m(x − x_0)] and model parameters b, m, and x_0.17 Each calculated parameter value is weighted,16 and darker points correspond to models where the product of these weights is in the top 50%. Purple corresponds to the original, underlying model, and in all columns, blue corresponds to the weighted mode of {{θ}_j}, green corresponds to the weighted median of {{θ}_j}, and red corresponds to maximum-likelihood model fitting. The weighted mode of {{θ}_j} performs the best, especially in the limit of large f2. Maximum-likelihood model fitting performs the worst. See Figure 51 for post-rejection versions.

Fig. 51. Figure 50, after robust Chauvenet rejection. Here, we have again performed bulk rejection using our generalization of the mode, followed by our most general robust technique for symmetrically distributed uncontaminated measurements (our generalization of the mode + technique 3, the broken-line fit), followed by our most precise robust technique (our generalization of the median + technique 1, the 68.3% value), followed by regular Chauvenet rejection using our generalization of the mean (§8, §9). Robust Chauvenet rejection proves effective in the face of fairly heavy contamination, but is unable to overcome the greater fraction of contaminated models as f2 → 0.85: 1 − (1 − 0.85)³ = 0.996625 vs. 1 − (1 − 0.85)² = 0.9775.
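The contaminated-model fractions quoted for f2 = 0.85 follow from a one-line binomial argument: a model computed from M measurements is contaminated if any one of them is, with probability 1 − (1 − f2)^M. A quick check (illustrative only):

```python
def contaminated_fraction(f2, M):
    """Probability that a model built from M measurements contains
    at least one contaminated measurement."""
    return 1 - (1 - f2) ** M

print(round(contaminated_fraction(0.85, 2), 6))  # pairs    → 0.9775
print(round(contaminated_fraction(0.85, 3), 6))  # triplets → 0.996625
```

This is why higher-M models are harder: the fraction of uncontaminated M-tuples, (1 − f2)^M, shrinks geometrically with M.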

Fig. 52. Left column: 101 uncontaminated measurements, drawn from a Gaussian distribution of mean y(x) = e^{(x − 0.5)} and standard deviation 1. Right column: models, {θ}_j, calculated from each pair of measurements in the panel to the left, using y(x) = b e^{m(x − x̄)}, Equations 24 and 25, and model parameters ln b and m (see final bullet point). Each calculated parameter value is weighted,14 and darker points correspond to models where the product of these weights is in the top 50%. Purple corresponds to the original, underlying model, and in both columns, blue corresponds to the weighted mode of {{θ}_j}, green corresponds to the weighted median of {{θ}_j}, and red corresponds to maximum-likelihood model fitting. Top row: here, we additionally weight each x_i by σ_{ln y}^{−2}(x_i) ∝ y²(x_i) when calculating x̄,18 which results in a fairly uncorrelated/fairly concentrated distribution for the highest-weight ln b vs. m values, and consequently the weighted mode of {{θ}_j}, in particular, is less susceptible to imprecision. Bottom row: here, we do not additionally weight each x_i when calculating x̄, which results in a strongly correlated/dispersed distribution for the highest-weight ln b vs. m values, and consequently the weighted mode of {{θ}_j}, in particular, is more susceptible to imprecision.

$$
\chi_3^2 = \sum_{i=1}^{N'} \left[ y(x_i \,|\, \sigma_1, \sigma_2, m) - y_i \right]^2, \tag{A2}
$$

where N′ = floor(0.683N) is the number of points for which x_i ≤ 1. Then, for a given break point, m, the best fit is given by dχ²₃/dσ1 = dχ²₃/dσ2 = 0, yielding:

Fig. 53. Upper left: 43 uncontaminated measurements, drawn from a Gaussian distribution of mean y(x) = 10(x − 0.5) + 20(x − 0.5)² and standard deviation 1. Upper right and bottom row: models, {θ}_j, calculated from each triplet of measurements in the upper-left panel, using y(x) = b + m_1(x − x̄) + m_2(x − x̄)² and model parameters b, m_1, and m_2. Each calculated parameter value is weighted,16 and darker points correspond to models where the product of these weights is in the top 50%. Purple corresponds to the original, underlying model, and in all panels, blue corresponds to the weighted mode of {{θ}_j}, green corresponds to the weighted median of {{θ}_j}, and red corresponds to maximum-likelihood model fitting. The highest-weight b vs. m_1 values, corresponding to linear y_eff(x) ≡ y(x) − m̂_2(x − x̄)² ≈ b + m_1(x − x̄), and the highest-weight m_1 vs. m_2 values, corresponding to linear y_eff(x) ≡ [y(x) − b̂]/(x − x̄) ≈ m_1 + m_2(x − x̄), are largely uncorrelated, but the highest-weight b vs. m_2 values, corresponding to non-linear y_eff(x) ≡ y(x) − m̂_1(x − x̄) ≈ b + m_2(x − x̄)², are marginally, negatively correlated: since (x − x̄)² is always positive, if m_2 is high, b tends to be low, to compensate.

$$
\begin{pmatrix} \sigma_1 \\ \sigma_2 \end{pmatrix}
=
\begin{pmatrix}
\sum_{i=1}^{m} x_i^2 + x_m^2 \sum_{i=m+1}^{N'} 1 & x_m \sum_{i=m+1}^{N'} (x_i - x_m) \\
x_m \sum_{i=m+1}^{N'} (x_i - x_m) & \sum_{i=m+1}^{N'} (x_i - x_m)^2
\end{pmatrix}^{-1}
\begin{pmatrix}
\sum_{i=1}^{m} x_i y_i + x_m \sum_{i=m+1}^{N'} y_i \\
\sum_{i=m+1}^{N'} (x_i - x_m) y_i
\end{pmatrix}. \tag{A3}
$$

We use a recursive partitioning algorithm to efficiently find the value of m for which χ²₃ is minimized. We restrict m > 1 to avoid the following pathological case: if µ is measured by the median and N is odd, one of the measured values will always equal the median value, and consequently y_1 will always be zero; m = 1 would then imply σ1 = 0, but without meaning.
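Appendix A's broken-line fit can be sketched as follows. This is an illustrative, brute-force rendering: where the paper uses a recursive partitioning search over the break point m, we simply scan every m, and we obtain √2 erf^{−1}(·) from the standard normal inverse CDF. The function name and the synthetic check below are our own:

```python
import math
from statistics import NormalDist

def broken_line_sigma(deviations):
    """Fit the Appendix A broken line through the origin to sorted
    deviations delta_i: y = sigma1*x for i <= m, and
    y = sigma1*x_m + sigma2*(x - x_m) for i >= m and x_i <= 1,
    scanning every break point m (the paper instead uses a recursive
    partitioning search). Returns sigma1, the modeled 68.3-percentile
    deviation. Assumes non-negative deviations."""
    ys = sorted(deviations)
    n = len(ys)
    nprime = math.floor(0.683 * n)     # number of points with x_i <= 1
    ys = ys[:nprime]
    # x_i = sqrt(2) * erfinv((i - 0.317) / N), via the normal inverse CDF
    xs = [NormalDist().inv_cdf(((i - 0.317) / n + 1.0) / 2.0)
          for i in range(1, nprime + 1)]
    best = None
    for m in range(2, nprime):         # m > 1 avoids the pathological case
        xm = xs[m - 1]
        # 2x2 linear system of Equation A3
        a11 = sum(x * x for x in xs[:m]) + xm * xm * (nprime - m)
        a12 = xm * sum(x - xm for x in xs[m:])
        a22 = sum((x - xm) ** 2 for x in xs[m:])
        b1 = sum(x * y for x, y in zip(xs[:m], ys[:m])) + xm * sum(ys[m:])
        b2 = sum((x - xm) * y for x, y in zip(xs[m:], ys[m:]))
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            continue
        s1 = (a22 * b1 - a12 * b2) / det
        s2 = (a11 * b2 - a12 * b1) / det
        chi2 = (sum((s1 * x - y) ** 2 for x, y in zip(xs[:m], ys[:m]))
                + sum((s1 * xm + s2 * (x - xm) - y) ** 2
                      for x, y in zip(xs[m:], ys[m:])))
        if best is None or chi2 < best[0]:
            best = (chi2, s1, s2, m)
    return best[1]

# Synthetic check: deviations lying exactly on a line of slope 2
# through the origin should return sigma1 = 2.
dist = NormalDist()
deltas = [2.0 * dist.inv_cdf(((i - 0.317) / 100 + 1.0) / 2.0)
          for i in range(1, 101)]
print(round(broken_line_sigma(deltas), 3))   # → 2.0
```

The identity used for the abscissae is √2 erf^{−1}(p) = Φ^{−1}((p + 1)/2), with Φ^{−1} the standard normal inverse CDF.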

Fig. 54. Left column: 43 measurements, with fraction f1 = 1 − f2 drawn from a Gaussian distribution of mean y(x) = 10(x − 0.5) + 20(x − 0.5)² and standard deviation 1, and fraction f2 = 0.15 (top row), 0.5 (middle row), and 0.85 (bottom row), representing contaminated measurements, drawn from the positive side of a Gaussian distribution of mean zero and standard deviation 10, and added to uncontaminated measurements, drawn as above. Blue corresponds to the weighted mode of {{θ}_j}, green corresponds to the weighted median of {{θ}_j}, and red corresponds to maximum-likelihood model fitting. Right column: after robust Chauvenet rejection. Here, we have again performed bulk rejection using our generalization of the mode, followed by our most general robust technique for symmetrically distributed uncontaminated measurements (our generalization of the mode + technique 3, the broken-line fit), followed by our most precise robust technique (our generalization of the median + technique 1, the 68.3% value), followed by regular Chauvenet rejection using our generalization of the mean (§8, §9). Robust Chauvenet rejection proves effective in the face of fairly heavy contamination, but is unable to overcome the greater fraction of contaminated models as f2 → 0.85 (see third bullet point).

Fig. 55. Average recovered µ1 for (1) no rejection, (2) Peirce rejection, (3) regular Chauvenet rejection, and (4) robust Chauvenet rejection, for two-sided contaminants (left) and one-sided contaminants (right). See Figure 5 for contaminant strength (σ2) vs. fraction of sample (f2) axis information. In the case of two-sided contaminants, all techniques recover µ1 ≈ 0 (Figure 6). In the case of one-sided contaminants, regular Chauvenet rejection, as implemented in this paper, is superior to Peirce rejection, and robust Chauvenet rejection is superior to regular Chauvenet rejection. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 56. Average recovered µ1 for (1) no rejection, (2) Peirce rejection, (3) regular Chauvenet rejection, and (4) robust Chauvenet rejection, for two-sided contaminants (left) and one-sided contaminants (right). See Figure 5 for contaminant strength (σ2) vs. fraction of sample (f2) axis information. Peirce rejection is of course better than no rejection, but only at low contamination fractions and strengths. Regular Chauvenet rejection, as implemented in this paper, is comparable to Peirce rejection in the case of two-sided contaminants and low N, but is otherwise superior to Peirce rejection. Robust Chauvenet rejection is superior to both Peirce rejection and regular Chauvenet rejection, albeit with marginally reduced precision at low N (§8). The colors are scaled logarithmically, between 0.02 and 100.

Fig. 57. Average recovered σ1 for (1) no rejection, (2) Peirce rejection, (3) regular Chauvenet rejection, and (4) robust Chauvenet rejection, for two-sided contaminants (left) and one-sided contaminants (right). See Figure 5 for contaminant strength (σ2) vs. fraction of sample (f2) axis information. Peirce rejection is of course better than no rejection, but only at low contamination fractions and strengths. Regular Chauvenet rejection, as implemented in this paper, is comparable to Peirce rejection in the case of two-sided contaminants and low N, but is otherwise superior to Peirce rejection. Robust Chauvenet rejection is superior to both Peirce rejection and regular Chauvenet rejection. The colors are scaled logarithmically, between 0.02 and 100.

Fig. 58. Average recovered σ1 for (1) no rejection, (2) Peirce rejection, (3) regular Chauvenet rejection, and (4) robust Chauvenet rejection, for two-sided contaminants (left) and one-sided contaminants (right). See Figure 5 for contaminant strength (σ2) vs. fraction of sample (f2) axis information. Peirce rejection is of course better than no rejection, but only at low contamination fractions and strengths. Regular Chauvenet rejection, as implemented in this paper, is comparable to Peirce rejection in the case of two-sided contaminants and low N, but is otherwise superior to Peirce rejection. Robust Chauvenet rejection is superior to both Peirce rejection and regular Chauvenet rejection, albeit with marginally reduced precision at low N (§8). The colors are scaled logarithmically, between 0.02 and 100.


More information

David Tenenbaum GEOG 090 UNC-CH Spring 2005

David Tenenbaum GEOG 090 UNC-CH Spring 2005 Simple Descriptive Statistics Review and Examples You will likely make use of all three measures of central tendency (mode, median, and mean), as well as some key measures of dispersion (standard deviation,

More information

not to be republished NCERT Chapter 2 Consumer Behaviour 2.1 THE CONSUMER S BUDGET

not to be republished NCERT Chapter 2 Consumer Behaviour 2.1 THE CONSUMER S BUDGET Chapter 2 Theory y of Consumer Behaviour In this chapter, we will study the behaviour of an individual consumer in a market for final goods. The consumer has to decide on how much of each of the different

More information

Chapter 8 Estimation

Chapter 8 Estimation Chapter 8 Estimation There are two important forms of statistical inference: estimation (Confidence Intervals) Hypothesis Testing Statistical Inference drawing conclusions about populations based on samples

More information

In terms of covariance the Markowitz portfolio optimisation problem is:

In terms of covariance the Markowitz portfolio optimisation problem is: Markowitz portfolio optimisation Solver To use Solver to solve the quadratic program associated with tracing out the efficient frontier (unconstrained efficient frontier UEF) in Markowitz portfolio optimisation

More information

Panel Regression of Out-of-the-Money S&P 500 Index Put Options Prices

Panel Regression of Out-of-the-Money S&P 500 Index Put Options Prices Panel Regression of Out-of-the-Money S&P 500 Index Put Options Prices Prakher Bajpai* (May 8, 2014) 1 Introduction In 1973, two economists, Myron Scholes and Fischer Black, developed a mathematical model

More information

Pricing & Risk Management of Synthetic CDOs

Pricing & Risk Management of Synthetic CDOs Pricing & Risk Management of Synthetic CDOs Jaffar Hussain* j.hussain@alahli.com September 2006 Abstract The purpose of this paper is to analyze the risks of synthetic CDO structures and their sensitivity

More information

Spike Statistics. File: spike statistics3.tex JV Stone Psychology Department, Sheffield University, England.

Spike Statistics. File: spike statistics3.tex JV Stone Psychology Department, Sheffield University, England. Spike Statistics File: spike statistics3.tex JV Stone Psychology Department, Sheffield University, England. Email: j.v.stone@sheffield.ac.uk November 27, 2007 1 Introduction Why do we need to know about

More information

Frequency Distribution and Summary Statistics

Frequency Distribution and Summary Statistics Frequency Distribution and Summary Statistics Dongmei Li Department of Public Health Sciences Office of Public Health Studies University of Hawai i at Mānoa Outline 1. Stemplot 2. Frequency table 3. Summary

More information

Probability. An intro for calculus students P= Figure 1: A normal integral

Probability. An intro for calculus students P= Figure 1: A normal integral Probability An intro for calculus students.8.6.4.2 P=.87 2 3 4 Figure : A normal integral Suppose we flip a coin 2 times; what is the probability that we get more than 2 heads? Suppose we roll a six-sided

More information

Online Appendix. income and saving-consumption preferences in the context of dividend and interest income).

Online Appendix. income and saving-consumption preferences in the context of dividend and interest income). Online Appendix 1 Bunching A classical model predicts bunching at tax kinks when the budget set is convex, because individuals above the tax kink wish to decrease their income as the tax rate above the

More information

Chapter 4. The Normal Distribution

Chapter 4. The Normal Distribution Chapter 4 The Normal Distribution 1 Chapter 4 Overview Introduction 4-1 Normal Distributions 4-2 Applications of the Normal Distribution 4-3 The Central Limit Theorem 4-4 The Normal Approximation to the

More information

Prob and Stats, Nov 7

Prob and Stats, Nov 7 Prob and Stats, Nov 7 The Standard Normal Distribution Book Sections: 7.1, 7.2 Essential Questions: What is the standard normal distribution, how is it related to all other normal distributions, and how

More information

Copyright 2011 Pearson Education, Inc. Publishing as Addison-Wesley.

Copyright 2011 Pearson Education, Inc. Publishing as Addison-Wesley. Appendix: Statistics in Action Part I Financial Time Series 1. These data show the effects of stock splits. If you investigate further, you ll find that most of these splits (such as in May 1970) are 3-for-1

More information

Consistent estimators for multilevel generalised linear models using an iterated bootstrap

Consistent estimators for multilevel generalised linear models using an iterated bootstrap Multilevel Models Project Working Paper December, 98 Consistent estimators for multilevel generalised linear models using an iterated bootstrap by Harvey Goldstein hgoldstn@ioe.ac.uk Introduction Several

More information

Robust Critical Values for the Jarque-bera Test for Normality

Robust Critical Values for the Jarque-bera Test for Normality Robust Critical Values for the Jarque-bera Test for Normality PANAGIOTIS MANTALOS Jönköping International Business School Jönköping University JIBS Working Papers No. 00-8 ROBUST CRITICAL VALUES FOR THE

More information

Notes on bioburden distribution metrics: The log-normal distribution

Notes on bioburden distribution metrics: The log-normal distribution Notes on bioburden distribution metrics: The log-normal distribution Mark Bailey, March 21 Introduction The shape of distributions of bioburden measurements on devices is usually treated in a very simple

More information

On Some Test Statistics for Testing the Population Skewness and Kurtosis: An Empirical Study

On Some Test Statistics for Testing the Population Skewness and Kurtosis: An Empirical Study Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 8-26-2016 On Some Test Statistics for Testing the Population Skewness and Kurtosis:

More information

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 05 Normal Distribution So far we have looked at discrete distributions

More information

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE AP STATISTICS Name: FALL SEMESTSER FINAL EXAM STUDY GUIDE Period: *Go over Vocabulary Notecards! *This is not a comprehensive review you still should look over your past notes, homework/practice, Quizzes,

More information

UNIT 4 MATHEMATICAL METHODS

UNIT 4 MATHEMATICAL METHODS UNIT 4 MATHEMATICAL METHODS PROBABILITY Section 1: Introductory Probability Basic Probability Facts Probabilities of Simple Events Overview of Set Language Venn Diagrams Probabilities of Compound Events

More information

Louisiana State University Health Plan s Population Health Management Initiative

Louisiana State University Health Plan s Population Health Management Initiative Louisiana State University Health Plan s Population Health Management Initiative Cost Savings for a Self-Insured Employer s Care Coordination Program Farah Buric, Ph.D. Ila Sarkar, Ph.D. Executive Summary

More information

starting on 5/1/1953 up until 2/1/2017.

starting on 5/1/1953 up until 2/1/2017. An Actuary s Guide to Financial Applications: Examples with EViews By William Bourgeois An actuary is a business professional who uses statistics to determine and analyze risks for companies. In this guide,

More information

Value at Risk and Self Similarity

Value at Risk and Self Similarity Value at Risk and Self Similarity by Olaf Menkens School of Mathematical Sciences Dublin City University (DCU) St. Andrews, March 17 th, 2009 Value at Risk and Self Similarity 1 1 Introduction The concept

More information

Prepared By. Handaru Jati, Ph.D. Universitas Negeri Yogyakarta.

Prepared By. Handaru Jati, Ph.D. Universitas Negeri Yogyakarta. Prepared By Handaru Jati, Ph.D Universitas Negeri Yogyakarta handaru@uny.ac.id Chapter 7 Statistical Analysis with Excel Chapter Overview 7.1 Introduction 7.2 Understanding Data 7.2.1 Descriptive Statistics

More information

STAT 157 HW1 Solutions

STAT 157 HW1 Solutions STAT 157 HW1 Solutions http://www.stat.ucla.edu/~dinov/courses_students.dir/10/spring/stats157.dir/ Problem 1. 1.a: (6 points) Determine the Relative Frequency and the Cumulative Relative Frequency (fill

More information

Data Analysis. BCF106 Fundamentals of Cost Analysis

Data Analysis. BCF106 Fundamentals of Cost Analysis Data Analysis BCF106 Fundamentals of Cost Analysis June 009 Chapter 5 Data Analysis 5.0 Introduction... 3 5.1 Terminology... 3 5. Measures of Central Tendency... 5 5.3 Measures of Dispersion... 7 5.4 Frequency

More information

Commonly Used Distributions

Commonly Used Distributions Chapter 4: Commonly Used Distributions 1 Introduction Statistical inference involves drawing a sample from a population and analyzing the sample data to learn about the population. We often have some knowledge

More information

Window Width Selection for L 2 Adjusted Quantile Regression

Window Width Selection for L 2 Adjusted Quantile Regression Window Width Selection for L 2 Adjusted Quantile Regression Yoonsuh Jung, The Ohio State University Steven N. MacEachern, The Ohio State University Yoonkyung Lee, The Ohio State University Technical Report

More information

The Control Chart for Attributes

The Control Chart for Attributes The Control Chart for Attributes Topic The Control charts for attributes The p and np charts Variable sample size Sensitivity of the p chart 1 Types of Data Variable data Product characteristic that can

More information

AIRCURRENTS: PORTFOLIO OPTIMIZATION FOR REINSURERS

AIRCURRENTS: PORTFOLIO OPTIMIZATION FOR REINSURERS MARCH 12 AIRCURRENTS: PORTFOLIO OPTIMIZATION FOR REINSURERS EDITOR S NOTE: A previous AIRCurrent explored portfolio optimization techniques for primary insurance companies. In this article, Dr. SiewMun

More information

Fundamentals of Statistics

Fundamentals of Statistics CHAPTER 4 Fundamentals of Statistics Expected Outcomes Know the difference between a variable and an attribute. Perform mathematical calculations to the correct number of significant figures. Construct

More information

The Fallacy of Large Numbers

The Fallacy of Large Numbers The Fallacy of Large umbers Philip H. Dybvig Washington University in Saint Louis First Draft: March 0, 2003 This Draft: ovember 6, 2003 ABSTRACT Traditional mean-variance calculations tell us that the

More information

The Binomial Distribution

The Binomial Distribution The Binomial Distribution January 31, 2018 Contents The Binomial Distribution The Normal Approximation to the Binomial The Binomial Hypothesis Test Computing Binomial Probabilities in R 30 Problems The

More information

A.REPRESENTATION OF DATA

A.REPRESENTATION OF DATA A.REPRESENTATION OF DATA (a) GRAPHS : PART I Q: Why do we need a graph paper? Ans: You need graph paper to draw: (i) Histogram (ii) Cumulative Frequency Curve (iii) Frequency Polygon (iv) Box-and-Whisker

More information

discussion Papers Some Flexible Parametric Models for Partially Adaptive Estimators of Econometric Models

discussion Papers Some Flexible Parametric Models for Partially Adaptive Estimators of Econometric Models discussion Papers Discussion Paper 2007-13 March 26, 2007 Some Flexible Parametric Models for Partially Adaptive Estimators of Econometric Models Christian B. Hansen Graduate School of Business at the

More information

Comments on Michael Woodford, Globalization and Monetary Control

Comments on Michael Woodford, Globalization and Monetary Control David Romer University of California, Berkeley June 2007 Revised, August 2007 Comments on Michael Woodford, Globalization and Monetary Control General Comments This is an excellent paper. The issue it

More information

SHRIMPY PORTFOLIO REBALANCING FOR CRYPTOCURRENCY. Michael McCarty Shrimpy Founder. Algorithms, market effects, backtests, and mathematical models

SHRIMPY PORTFOLIO REBALANCING FOR CRYPTOCURRENCY. Michael McCarty Shrimpy Founder. Algorithms, market effects, backtests, and mathematical models SHRIMPY PORTFOLIO REBALANCING FOR CRYPTOCURRENCY Algorithms, market effects, backtests, and mathematical models Michael McCarty Shrimpy Founder VERSION: 1.0.0 LAST UPDATED: AUGUST 1ST, 2018 TABLE OF CONTENTS

More information

SOLUTIONS TO THE LAB 1 ASSIGNMENT

SOLUTIONS TO THE LAB 1 ASSIGNMENT SOLUTIONS TO THE LAB 1 ASSIGNMENT Question 1 Excel produces the following histogram of pull strengths for the 100 resistors: 2 20 Histogram of Pull Strengths (lb) Frequency 1 10 0 9 61 63 6 67 69 71 73

More information

The Binomial Distribution

The Binomial Distribution The Binomial Distribution January 31, 2019 Contents The Binomial Distribution The Normal Approximation to the Binomial The Binomial Hypothesis Test Computing Binomial Probabilities in R 30 Problems The

More information

Edgeworth Binomial Trees

Edgeworth Binomial Trees Mark Rubinstein Paul Stephens Professor of Applied Investment Analysis University of California, Berkeley a version published in the Journal of Derivatives (Spring 1998) Abstract This paper develops a

More information

Chapter 7. Inferences about Population Variances

Chapter 7. Inferences about Population Variances Chapter 7. Inferences about Population Variances Introduction () The variability of a population s values is as important as the population mean. Hypothetical distribution of E. coli concentrations from

More information

Descriptive Statistics Bios 662

Descriptive Statistics Bios 662 Descriptive Statistics Bios 662 Michael G. Hudgens, Ph.D. mhudgens@bios.unc.edu http://www.bios.unc.edu/ mhudgens 2008-08-19 08:51 BIOS 662 1 Descriptive Statistics Descriptive Statistics Types of variables

More information

Trading With Time Fractals to Reduce Risk and Improve Profit Potential

Trading With Time Fractals to Reduce Risk and Improve Profit Potential June 16, 1998 Trading With Time Fractals to Reduce Risk and Improve Profit Potential A special Report by Walter Bressert Time and price cycles in the futures markets and stocks exhibit patterns in time

More information

THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management

THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management BA 386T Tom Shively PROBABILITY CONCEPTS AND NORMAL DISTRIBUTIONS The fundamental idea underlying any statistical

More information

Solving dynamic portfolio choice problems by recursing on optimized portfolio weights or on the value function?

Solving dynamic portfolio choice problems by recursing on optimized portfolio weights or on the value function? DOI 0.007/s064-006-9073-z ORIGINAL PAPER Solving dynamic portfolio choice problems by recursing on optimized portfolio weights or on the value function? Jules H. van Binsbergen Michael W. Brandt Received:

More information

NOTES ON THE BANK OF ENGLAND OPTION IMPLIED PROBABILITY DENSITY FUNCTIONS

NOTES ON THE BANK OF ENGLAND OPTION IMPLIED PROBABILITY DENSITY FUNCTIONS 1 NOTES ON THE BANK OF ENGLAND OPTION IMPLIED PROBABILITY DENSITY FUNCTIONS Options are contracts used to insure against or speculate/take a view on uncertainty about the future prices of a wide range

More information

Chapter 4 Random Variables & Probability. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

Chapter 4 Random Variables & Probability. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Chapter 4.5, 6, 8 Probability for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random variable =

More information

Case Study: Heavy-Tailed Distribution and Reinsurance Rate-making

Case Study: Heavy-Tailed Distribution and Reinsurance Rate-making Case Study: Heavy-Tailed Distribution and Reinsurance Rate-making May 30, 2016 The purpose of this case study is to give a brief introduction to a heavy-tailed distribution and its distinct behaviors in

More information

SPC Binomial Q-Charts for Short or long Runs

SPC Binomial Q-Charts for Short or long Runs SPC Binomial Q-Charts for Short or long Runs CHARLES P. QUESENBERRY North Carolina State University, Raleigh, North Carolina 27695-8203 Approximately normalized control charts, called Q-Charts, are proposed

More information

Appendix CA-15. Central Bank of Bahrain Rulebook. Volume 1: Conventional Banks

Appendix CA-15. Central Bank of Bahrain Rulebook. Volume 1: Conventional Banks Appendix CA-15 Supervisory Framework for the Use of Backtesting in Conjunction with the Internal Models Approach to Market Risk Capital Requirements I. Introduction 1. This Appendix presents the framework

More information

Time Observations Time Period, t

Time Observations Time Period, t Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard Time Series and Forecasting.S1 Time Series Models An example of a time series for 25 periods is plotted in Fig. 1 from the numerical

More information

Much of what appears here comes from ideas presented in the book:

Much of what appears here comes from ideas presented in the book: Chapter 11 Robust statistical methods Much of what appears here comes from ideas presented in the book: Huber, Peter J. (1981), Robust statistics, John Wiley & Sons (New York; Chichester). There are many

More information

Probability distributions relevant to radiowave propagation modelling

Probability distributions relevant to radiowave propagation modelling Rec. ITU-R P.57 RECOMMENDATION ITU-R P.57 PROBABILITY DISTRIBUTIONS RELEVANT TO RADIOWAVE PROPAGATION MODELLING (994) Rec. ITU-R P.57 The ITU Radiocommunication Assembly, considering a) that the propagation

More information

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION In Inferential Statistic, ESTIMATION (i) (ii) is called the True Population Mean and is called the True Population Proportion. You must also remember that are not the only population parameters. There

More information

Overview. Transformation method Rejection method. Monte Carlo vs ordinary methods. 1 Random numbers. 2 Monte Carlo integration.

Overview. Transformation method Rejection method. Monte Carlo vs ordinary methods. 1 Random numbers. 2 Monte Carlo integration. Overview 1 Random numbers Transformation method Rejection method 2 Monte Carlo integration Monte Carlo vs ordinary methods 3 Summary Transformation method Suppose X has probability distribution p X (x),

More information