Improved Inference for Signal Discovery Under Exceptionally Low False Positive Error Rates

(to appear in Journal of Instrumentation)

Igor Volobouev & Alex Trindade
Dept. of Physics & Astronomy, Texas Tech University
Dept. of Mathematics & Statistics, Texas Tech University

December 2018

Edgeworth Expansions for Mixture Models

Signal Strength Determination by Maximum Likelihood

The model is a signal/background density mixture without nuisance parameters:

$$p(x \mid \alpha) = \alpha\, s(x) + (1 - \alpha)\, b(x) \tag{1}$$

The signal fraction $\alpha$ is estimated by maximizing the log-likelihood:

$$l(\alpha) = \sum_{i=1}^{n} \log p(x_i \mid \alpha), \tag{2}$$

$$\hat\alpha := \arg\max_{\alpha \in \mathbb{R}} l(\alpha). \tag{3}$$

Goal: produce accurate tests of $H_0: \alpha = 0$ vs. $H_1: \alpha > 0$.

The only unknown parameter is $\alpha$; a toy problem... But we'll make it more realistic at the end.

alex.trindade@ttu.edu

Example: Flat Background With Gaussian Signal

Let $b(x)$ follow a uniform distribution on $[0, 1]$, and $s(x)$ a truncated Gaussian on $[0, 1]$:

$$b(x) = \begin{cases} 1, & \text{if } x \in [0, 1] \\ 0, & \text{if } x \notin [0, 1] \end{cases} \tag{4}$$

$$s(x) = \begin{cases} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \Big/ \int_0^1 e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy, & \text{if } x \in [0, 1] \\ 0, & \text{if } x \notin [0, 1] \end{cases} \tag{5}$$

This will be the model used in simulations, and whenever specific settings of the signal are needed, we use

$$\mu = 0.5 \quad \text{and} \quad \sigma = 0.1. \tag{6}$$
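The model and the maximization of the log-likelihood can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names (`sample`, `score`, `fit_alpha`) are hypothetical, and bisection on the score is one convenient choice of optimizer, not necessarily the one used by the authors. It relies on the identity $p(x \mid \alpha)/b(x) = 1 + \alpha u$ with $u = s(x)/b(x) - 1$, so the score is $\sum_i u_i/(1 + \alpha u_i)$, which is strictly decreasing in $\alpha$.

```python
import math
import random

MU, SIGMA = 0.5, 0.1  # signal settings from Eq. (6)

def _norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Denominator of Eq. (5): integral of the Gaussian kernel over [0, 1].
NORM = SIGMA * math.sqrt(2.0 * math.pi) * (
    _norm_cdf((1.0 - MU) / SIGMA) - _norm_cdf((0.0 - MU) / SIGMA))

def b(x):
    """Flat background density, Eq. (4)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def s(x):
    """Truncated Gaussian signal density, Eq. (5)."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    return math.exp(-0.5 * ((x - MU) / SIGMA) ** 2) / NORM

def sample(n, alpha, rng):
    """Draw n points from the mixture of Eq. (1); sampling needs 0 <= alpha <= 1."""
    xs = []
    for _ in range(n):
        if rng.random() < alpha:
            x = rng.gauss(MU, SIGMA)
            while not 0.0 <= x <= 1.0:   # rejection step for the truncation
                x = rng.gauss(MU, SIGMA)
            xs.append(x)
        else:
            xs.append(rng.random())      # flat background
    return xs

def score(alpha, u):
    """First derivative of l(alpha), written with u_i = s(x_i)/b(x_i) - 1."""
    return sum(ui / (1.0 + alpha * ui) for ui in u)

def fit_alpha(xs, iters=200):
    """Maximize l(alpha) over the real line by bisection on the score."""
    u = [s(x) / b(x) - 1.0 for x in xs]
    if max(u) <= 0.0 or min(u) >= 0.0:
        raise ValueError("need points with s > b and points with s < b")
    # l(alpha) is defined only where 1 + alpha * u_i > 0 for every i.
    lo = -1.0 / max(u) * (1.0 - 1e-9)
    hi = -1.0 / min(u) * (1.0 - 1e-9)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid, u) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(12345)
alpha_hat = fit_alpha(sample(1000, 0.0, rng))  # data generated under H0
```

Because $\alpha$ is not restricted to $[0, 1]$, the fitted value can be negative for background-only data.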

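Notation

- $l_i(\alpha) = \partial^i l / \partial \alpha^i$, the $i$-th derivative of $l(\alpha)$
- $J(\alpha) = -l_2(\alpha)$
- Expected information number: $I(\alpha) = E[J(\alpha)]$
- Observed information number: $J(\hat\alpha) = -l_2(\hat\alpha)$

Assume the usual regularity conditions for consistency and asymptotic normality of $\hat\alpha$ are satisfied:

$$\sqrt{n}\,(\hat\alpha - \alpha) \xrightarrow{d} N\big(0, \bar I(\alpha)^{-1}\big), \qquad \bar I(\alpha) = \lim_{n \to \infty} \frac{1}{n} I(\alpha)$$

$$\Longrightarrow \quad \hat\alpha \approx N\big(\alpha, \sigma^2_{\hat\alpha}(\alpha)\big), \qquad \sigma^2_{\hat\alpha}(\alpha) = I(\alpha)^{-1}$$

(By not restricting $\alpha \in [0, 1]$ we avoid exotic asymptotics at the boundaries...)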

A Gamut of Tests for $\alpha$

Lacking a UMP test, we have the following:

Table: Promising statistics for tests on $\alpha$.

| Method | Statistic | Value |
|---|---|---|
| Likelihood Ratio | $T_{LR}$ | $2[l(\hat\alpha) - l(0)]$ |
| Wald (Expected) | $T_W$ | $\hat\alpha^2 I(0)$ |
| Wald (Observed) | $T_{W2}$ | $\hat\alpha^2 J(\hat\alpha)$ |
| Score | $T_S$ | $l_1(0)^2 / I(0)$ |
| Wald-type 3 | $T_{W3}$ | $\hat\alpha^2 / \sigma_3^2$ |
| Wald-type 4 | $T_{W4}$ | $\hat\alpha^2 / \sigma_4^2$ |

The Wald-type 3 & 4 statistics are variants of $T_{W2}$ (used by physicists) that use shortcuts for computing $J(\alpha)$ so as to avoid differentiating $l(\alpha)$.
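The four classical entries of the table can be computed directly once $\hat\alpha$ is known. The sketch below is illustrative only: `compute_statistics` and its argument names are hypothetical, and it uses the mixture-specific identities $l(\alpha) - l(0) = \sum_i \log(1 + \alpha u_i)$ and $J(\alpha) = \sum_i u_i^2/(1 + \alpha u_i)^2$ with $u_i = s(x_i)/b(x_i) - 1$, which follow from Eqs. (1) and (2). The Wald-type 3 and 4 statistics are omitted because $\sigma_3$ and $\sigma_4$ are not defined here.

```python
import math

def compute_statistics(u, alpha_hat, info0):
    """Classical test statistics for H0: alpha = 0.

    u         : list of u_i = s(x_i)/b(x_i) - 1
    alpha_hat : maximum likelihood estimate of alpha
    info0     : expected information I(0), supplied by the caller
    """
    # 2[l(alpha_hat) - l(0)], using p(x|alpha)/b(x) = 1 + alpha * u.
    t_lr = 2.0 * sum(math.log1p(alpha_hat * ui) for ui in u)
    # Wald with expected information evaluated at the null.
    t_w = alpha_hat ** 2 * info0
    # Wald with observed information J(alpha_hat) = -l_2(alpha_hat).
    j_hat = sum((ui / (1.0 + alpha_hat * ui)) ** 2 for ui in u)
    t_w2 = alpha_hat ** 2 * j_hat
    # Score statistic: l_1(0) = sum of u_i.
    t_s = sum(u) ** 2 / info0
    return {"T_LR": t_lr, "T_W": t_w, "T_W2": t_w2, "T_S": t_s}
```

Note that only the score statistic avoids fitting $\hat\alpha$ altogether.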

Higher-Order Asymptotics

For one-sided testing use the signed version of any of the statistics (say $T$) in the Table:

$$R = \operatorname{sgn}(\hat\alpha)\sqrt{T}.$$

Under $H_0$, to first order $R \approx Z$, where $Z \sim N(0, 1)$, whence

$$p\text{-value} = P(Z > r), \qquad r = \operatorname{sgn}(\hat\alpha)\sqrt{t}.$$

In general, $R_n \approx Z$ to $k$-th order means that

$$\text{approx. error} = R_n - Z = O_p(n^{-k/2})$$

and

$$P(R_n \le r) = \Phi(r) + a_{1,n}\, n^{-1/2} + a_{2,n}\, n^{-1} + a_{3,n}\, n^{-3/2} + \cdots + a_{k-1,n}\, n^{-(k-1)/2} + O(n^{-k/2}).$$
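As a small illustration of the first-order recipe, the signed root and its normal-theory p-value can be computed as follows. The function names are hypothetical; the normal tail is evaluated with `math.erfc`, which stays accurate far out in the tail.

```python
import math

def normal_tail(r):
    """P(Z > r) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(r / math.sqrt(2.0))

def first_order_p_value(t, alpha_hat):
    """p-value from the signed root r = sgn(alpha_hat) * sqrt(t)."""
    r = math.copysign(math.sqrt(t), alpha_hat)
    return normal_tail(r)
```

A negative $\hat\alpha$ gives a negative $r$ and therefore a p-value above 0.5, as expected for a one-sided test of $\alpha > 0$.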

Tools for Higher-Order Asymptotic Theory

- Taylor expansions of $l(\alpha)$ near the true value of $\alpha$
- Joint cumulants for the derivatives of $l(\alpha)$ under $H_0$: $n\,\nu_{ijkl}$ = $(i, j, k, l)$-th joint cumulant of $\{l_1(0), \ldots, l_4(0)\}$
- Edgeworth-type series: construct an approximate pdf for $R$ (which is approximately $N(0, 1)$) via the Gram-Charlier expansion:

$$f_R(z) = \phi(z)\left[1 + \sum_{j=1}^{\infty} \beta_j H_j(z)\right],$$

where $H_j(z)$ are the Hermite polynomials. The coefficients $\beta_j$ are chosen to match the cumulants $\kappa_j$ of $R_n$.

Final Edgeworth Expansion for CDF of $R$

Integrate the Gram-Charlier expansion and collect terms in powers of $n^{-1/2}$:

$$F_R(z) = \Phi(z) - \phi(z)\left[\kappa_1 + \frac{1}{6}\kappa_3 H_2(z) + \frac{1}{2}\left(\kappa_2 - 1 + \kappa_1^2\right) z + \left(\frac{1}{6}\kappa_1\kappa_3 + \frac{1}{24}\kappa_4\right) H_3(z) + \frac{1}{72}\kappa_3^2 H_5(z)\right] + O(n^{-3/2}). \tag{7}$$

(There are some technical assumptions on the cumulant behavior of $R$...)
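For readers who want to evaluate this expansion numerically, here is a minimal sketch. It assumes the $H_j$ are the probabilists' Hermite polynomials ($H_2 = z^2 - 1$, $H_3 = z^3 - 3z$, $H_5 = z^5 - 10z^3 + 15z$), which is the convention under which integrating $\phi H_j$ gives $-\phi H_{j-1}$. The names `hermite` and `edgeworth_cdf` are hypothetical.

```python
import math

def hermite(j, z):
    """Probabilists' Hermite polynomial H_j(z), via H_{k+1} = z*H_k - k*H_{k-1}."""
    if j == 0:
        return 1.0
    h_prev, h = 1.0, z
    for k in range(1, j):
        h_prev, h = h, z * h - k * h_prev
    return h

def edgeworth_cdf(z, k1, k2, k3, k4):
    """Approximate CDF of R from its first four cumulants, following Eq. (7)."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * math.erfc(-z / math.sqrt(2.0))
    bracket = (k1
               + k3 / 6.0 * hermite(2, z)
               + 0.5 * (k2 - 1.0 + k1 ** 2) * z
               + (k1 * k3 / 6.0 + k4 / 24.0) * hermite(3, z)
               + k3 ** 2 / 72.0 * hermite(5, z))
    return big_phi - phi * bracket
```

With $\kappa_1 = 0$, $\kappa_2 = 1$, $\kappa_3 = \kappa_4 = 0$ the bracket vanishes and the function reduces to $\Phi(z)$.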

Relating Cumulants of Log-Likelihood to those of Statistic

The above expression for $F_R(z)$ holds for any statistic $R$ which is approximately $N(0, 1)$. The challenge is to express (approximate) the $\kappa_j$ (which are unknown) in terms of the $\nu_{ijkl}$ (which can be computed).

- This has to be done case-by-case for each statistic $R$: start from suitable Taylor expansions in probability, and use some tricks...
- In the 20th century this required a lot of bookkeeping. In the 21st century it can be replaced with careful programming of a symbolic algebra system (Maple/Mathematica).

The relationships in the above challenge had been worked out (to 3rd order) for the classical statistics (LR, Wald, Score), so we:

- corrected some typos in the existing expressions (Severini, 2000),
- worked out 4th order expansions for the classical statistics, and
- worked out 3rd order expansions for the non-classical statistics.

More Notation

- $E$ denotes expectation under the null: $E[q] := \int q(x)\, b(x)\, dx$.
- $E_s$ denotes expectation under the signal: $E_s[q] := \int q(x)\, s(x)\, dx$.

Defining $V_i := E\, l_i(0)$, the Edgeworth expansions for all the statistics in Table 1 depend only on the following (dimensionless & location-scale invariant) expressions:

$$\gamma := \frac{V_3}{2\,(-V_2)^{3/2}} = \frac{E_s\!\left[\frac{s^2}{b^2}\right] - 3E_s\!\left[\frac{s}{b}\right] + 2}{\left(E_s\!\left[\frac{s}{b}\right] - 1\right)^{3/2}}, \tag{8}$$

$$\rho := -\frac{V_4}{6 V_2^2} = \frac{E_s\!\left[\frac{s^3}{b^3}\right] - 4E_s\!\left[\frac{s^2}{b^2}\right] + 6E_s\!\left[\frac{s}{b}\right] - 3}{\left(E_s\!\left[\frac{s}{b}\right] - 1\right)^{2}}. \tag{9}$$
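The right-hand sides of (8) and (9) only need the three integrals $E_s[(s/b)^k]$, $k = 1, 2, 3$, so they are easy to evaluate numerically. The sketch below does this for the flat background with truncated Gaussian signal ($\mu = 0.5$, $\sigma = 0.1$) using a composite Simpson rule. The helper names and the choice of quadrature are assumptions made for illustration, not part of the original analysis.

```python
import math

MU, SIGMA = 0.5, 0.1  # signal settings of the flat-background example

def _norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Normalization of the Gaussian truncated to [0, 1].
NORM = SIGMA * math.sqrt(2.0 * math.pi) * (
    _norm_cdf((1.0 - MU) / SIGMA) - _norm_cdf((0.0 - MU) / SIGMA))

def s(x):
    return math.exp(-0.5 * ((x - MU) / SIGMA) ** 2) / NORM

def b(x):
    return 1.0

def simpson(f, lo, hi, m=2000):
    """Composite Simpson rule with an even number m of sub-intervals."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0

def es_ratio_power(k):
    """E_s[(s/b)^k]: integral of (s/b)^k * s over [0, 1]."""
    return simpson(lambda x: (s(x) / b(x)) ** k * s(x), 0.0, 1.0)

def gamma_rho():
    m1, m2, m3 = (es_ratio_power(k) for k in (1, 2, 3))
    var = m1 - 1.0                                    # E_s[s/b] - 1
    gamma = (m2 - 3.0 * m1 + 2.0) / var ** 1.5        # Eq. (8)
    rho = (m3 - 4.0 * m2 + 6.0 * m1 - 3.0) / var ** 2  # Eq. (9)
    return gamma, rho

gamma, rho = gamma_rho()
```

Written in terms of $u = s/b - 1$, which has mean zero under the null, $\gamma$ and $\rho$ are the skewness and kurtosis of $u$, so $\rho \ge \gamma^2 + 1$ always holds; this makes a convenient sanity check on the output.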

Resulting Higher-Order Distributions of Test Statistics

Plugging the (thus) approximated $\hat\kappa_j$ into $F_R(z)$ in (7) gives, e.g.:

Signed Wald statistic:

$$P(R_W \le z) = \Phi(z) - \phi(z)\left[ n^{-1/2}\left(\frac{1}{6}\gamma H_2(z)\right) + n^{-1}\left(\frac{1}{2}(\rho - \gamma^2 - 1)\, z + \frac{1}{24}(\rho - 3) H_3(z) + \frac{\gamma^2}{72} H_5(z)\right)\right] + O(n^{-3/2}).$$

Signed likelihood ratio statistic:

$$P(R_{LR} \le z) = \Phi(z) - \phi(z)\left[ n^{-1/2}\left(-\frac{\gamma}{6}\right) + n^{-1}\left(\frac{1}{12}(3\rho - 2\gamma^2)\, z\right)\right] + O(n^{-3/2}).$$

But Wait: why do we need higher-order asymptotics?

- When $n$ is small (typically not the case in these experiments).
- When the Type I error rate ($q_0$) is very small. How small? In signal-hunting particle physics experiments the gold standard is $5\sigma$:

$$q_0 = P(Z > 5) \approx 2.87 \times 10^{-7}.$$

This puts us way out in the tail of the $N(0, 1)$...

Quantifying Deviations From Normality

Consider the normal approximation error

$$\Delta R(r) = r - \tilde r, \qquad \tilde r = \Phi^{-1}(F_R(r)).$$

- With the Edgeworth-approximated $F_R(\cdot)$: $\tilde r$ behaves as a standard normal value to an accuracy of $O(n^{-3/2})$ under $H_0$.
- If $R$ is exactly $N(0, 1)$: $\Delta R(r) = 0$.
- Large values of $\Delta R(r)$: the Edgeworth approximation had a large effect in normalizing $R$.
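The normal approximation error $r - \Phi^{-1}(F_R(r))$ is straightforward to compute once a tail probability for $R$ is available. The sketch below is an illustration with hypothetical names: it works with the upper tail $1 - F_R(r)$ and uses the symmetry $\Phi^{-1}(1 - q) = -\Phi^{-1}(q)$ to avoid cancellation, and it demonstrates the idea with a deliberately simplified tail that keeps only a skewness term of the form $\phi(r)\,\frac{\gamma}{6\sqrt{n}}\,(r^2 - 1)$. The values $\gamma = 1.109$ and $n = 200$ are assumed inputs.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def normal_tail(r):
    return 0.5 * math.erfc(r / math.sqrt(2.0))

def delta_r(tail_prob, r):
    """Normal approximation error r - r_tilde, r_tilde = Phi^{-1}(F_R(r)).

    tail_prob(r) must return 1 - F_R(r), a number strictly between 0 and 1.
    """
    r_tilde = -_STD.inv_cdf(tail_prob(r))
    return r - r_tilde

def skew_only_tail(r, n=200, gamma=1.109):
    """Toy tail: N(0,1) tail plus a single n^{-1/2} skewness correction."""
    phi = math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)
    return normal_tail(r) + phi * gamma / (6.0 * math.sqrt(n)) * (r * r - 1.0)

exact = delta_r(normal_tail, 5.0)      # exactly normal statistic
skewed = delta_r(skew_only_tail, 5.0)  # right-skewed statistic
```

A positive value means the statistic's tail is heavier than normal at $r$, so the observed $r$ corresponds to a smaller normal-equivalent value $\tilde r$.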

Ex: $\Delta R(r)$ for Flat Background With Gaussian Signal

[Figure: four panels plotting $\Delta R(r)$ against $r$ for Wald (Expected) $R_W$, Wald (Observed) $R_{W2}$, Score $R_S$, and Likelihood Ratio $R_{LR}$. Each panel shows curves for $n = 200$, $n = 1000$, $n = 5000$, and a fourth sample size whose value is missing from this copy.]

Some Deviations From Normality for this Example

$\mu = 0.5$ and $\sigma = 0.1$ fix the values of $\gamma$ and $\rho$ (the numerical values are missing from this copy).

- Wald statistic at $r = 5$ for $n = 200$: $\Delta R_W(r) = 0.25$ means the p-value is wrong by a factor of $P(Z > 5)/P(Z > 5.25) \approx 3.8$, so higher signal significance will be claimed than is supported by the data.
- LR statistic at $r = 4$ for $n = 200$: $\Delta R_{LR}(r)$ = [value missing] means the p-value is wrong by a factor of $P(Z > 4)/P(Z > 3.995) \approx 0.98$, so the reported signal significance will be about right.

Simulations: Quantify Accuracy of Edgeworth-Approx

- $m = 10^9$ Monte Carlo replicates for $n = 200, 1000, 5000$, and a fourth sample size (value missing from this copy).
- Compared distributions of $R$ with Edgeworth predictions.
- For the 3 classical statistics ($R_W$, $R_{LR}$, $R_S$): distributional shape parameters (mean, standard deviation, skewness, kurtosis) are in good agreement with the $O(n^{-3/2})$ predictions for $n$ [range missing]. The agreement worsens for $n$ = [value missing].
- Statistically significant disagreements with $N(0, 1)$ predictions are shown in red in the next Table.

Simulations: Predicted Mean (Equals 0 if Exactly $N(0, 1)$)

[Table: columns are $n$, $O(n^{-3/2})$ prediction, simulated value, and simulation uncertainty; row labels are $R_W$ (four rows), $R_{LR}$, and $R_S$. The numerical entries are missing from this copy.]

Simulations: Predicted Survival Probabilities ($R_W$ & $R_{W2}$)

[Figure: four panels showing log base 10 of the survival probability against $r$, two for $R_W$ and two for $R_{W2}$, each comparing the $N(0, 1)$ prediction, the $O(n^{-1})$ prediction, and the $O(n^{-3/2})$ prediction. The first $R_W$ panel is for $n = 200$; the sample sizes of the other panels are missing from this copy.]

Simulations: Predicted Survival Probabilities ($R_{LR}$ & $R_S$)

[Figure: four panels showing log base 10 of the survival probability against $r$, two for $R_{LR}$ and two for $R_S$, each comparing the $N(0, 1)$ prediction, the $O(n^{-1})$ prediction, and the $O(n^{-3/2})$ prediction. The second $R_S$ panel is for $n = 1000$; the sample sizes of the other panels are missing from this copy.]

Simulations: Quantiles (Predicted vs. Simulated)

Table: $O(n^{-3/2})$ Edgeworth predictions for the $(1 - q_0)$-th quantiles of the statistics in Table 1 (± std. err.), compared to their corresponding values computed based on $10^9$ simulations. (Predictions deviating by more than twice the std. err. from their simulated values are bolded.)

[Table layout: one "Predicted" row (value ± std. err.) and one "Simulated" row for each of $R_W$, $R_{W2}$, $R_{W3}$, $R_{W4}$, $R_{LR}$, $R_S$; columns for $n = 200$, $n = 1000$, $n = 5000$, and a fourth sample size. The numerical entries are missing from this copy.]

Simulations: Type I Errors (Nominal Level is $q_0 = P(Z > 5)$)

Table: Type I error probabilities: rejection is based on the $O(n^{-3/2})$ Edgeworth-predicted quantiles from the previous Table. (Values that deviate from the nominal value by more than twice the simulation uncertainty, whose value is missing from this copy, are bolded.)

[Table layout: rows indexed by $n$; columns $R_W$, $R_{W2}$, $R_{W3}$, $R_{W4}$, $R_{LR}$, $R_S$. The numerical entries are missing from this copy.]

Type II Errors: Strategy

Since all statistics have the same asymptotics as the MLE (to 1st order), when the truth is $\alpha = \alpha_1$, we have

$$R_n \approx N\big(\alpha_1, \sigma^2_{\hat\alpha}(\alpha_1)\big).$$

Let $c_n$ be the Edgeworth-predicted quantiles from the Table such that

$$P_{\alpha=0}(R_n > c_n) = q_0, \qquad c_n \approx 5\sigma_{\hat\alpha}(0).$$

Then the power is:

$$1 - \beta = P_{\alpha=\alpha_1}(R_n > c_n) = P\left(Z > \frac{c_n - \alpha_1}{\sigma_{\hat\alpha}(\alpha_1)}\right).$$

Thus choosing $\alpha_1 = 5\sigma_{\hat\alpha}(0)$ we should have, as $n \to \infty$:

$$1 - \beta \to P(Z > 0) = 0.5.$$

(Keeps difficulty in finding signal approx constant...)
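The first-order power formula and the choice $\alpha_1 = 5\sigma_{\hat\alpha}(0)$ can be sketched as follows. The helper names are hypothetical. The expression $\sigma_{\hat\alpha}(0) = I(0)^{-1/2}$ with $I(0) = n\,(E_s[s/b] - 1)$ is derived here from the mixture likelihood rather than quoted from the slides, and the value 1.8209 used for $E_s[s/b] - 1$ in the flat-background example is likewise an assumed input from a separate numerical evaluation.

```python
import math

def normal_tail(z):
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def sigma_alpha_hat_null(n, var_u):
    """Cramer-Rao uncertainty under H0: I(0)^(-1/2) with I(0) = n * var_u,
    where var_u = E_s[s/b] - 1."""
    return 1.0 / math.sqrt(n * var_u)

def approx_power(c_n, alpha1, sigma1):
    """First-order power 1 - beta = P(Z > (c_n - alpha1) / sigma1)."""
    return normal_tail((c_n - alpha1) / sigma1)

# Assumed input for the flat background / Gaussian signal example.
sigma0 = sigma_alpha_hat_null(200, 1.8209)
alpha1 = 5.0 * sigma0
```

When $c_n$ coincides with $\alpha_1$ the argument of the tail probability is zero and the power is exactly 0.5, whatever $\sigma_{\hat\alpha}(\alpha_1)$ is.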

Type II Errors: Settings

Table: Values of the Cramer-Rao uncertainty for $H_0$, $\sigma_{\hat\alpha}$, and corresponding values of $\alpha = \alpha_1$ used as the actual model signal fraction under $H_1$.

[Table layout: columns $n$, $\sigma_{\hat\alpha}$, and $\alpha_1 = 5\sigma_{\hat\alpha}$. The numerical entries are missing from this copy.]

Simulations: Type II Errors

Table: Type II error probabilities (determined empirically), using the predicted & simulated quantiles from the Table. The smallest predicted value at each $n$ is in bold. (The simulation uncertainty value is missing from this copy.)

[Table layout: one "Predicted" row and one "Simulated" row for each of $R_W$, $R_{W2}$, $R_{W3}$, $R_{W4}$, $R_{LR}$, $R_S$; columns indexed by sample size $n$. The numerical entries are missing from this copy.]

Insight: Why does LR perform so well?

Often remarked in math-stat books...

Mykland (1999) proves that the $k$-th cumulant of $R_{LR}$ vanishes to $O(n^{-k/2})$ for all $k \ge 3$:

- $\kappa_3 = 0$ to $O(n^{-3/2})$ (but $\kappa_j \ne 0$ for $j \le 2$)
- $\kappa_4 = 0$ to $O(n^{-2})$ (but $\kappa_j \ne 0$ for $j \le 3$)
- etc.

Mykland speculates: "this fact... would seem to be the main asymptotic property governing the accuracy behavior... of $R_{LR}$."

Why? Because the high-order cumulants are precisely the coefficients of the highest degree $H_k(\cdot)$ in the Edgeworth expansion...

Extension 1: Nuisance parameters in signal & background

- Doable: $s(x)$ and $b(x) \to b(x \mid \phi)$. Extend everything we have done to the nuisance parameter setting (multivariate Edgeworth expansions).
- Problem: $s(x) \to s(x \mid \theta)$ means $\theta$ is not identifiable under $H_0$: classical inference for treating nuisance parameters then breaks down...
- Davies (Biometrika, 1987): the appropriate p-value is an excursion probability

$$p\text{-value} = P\left(\max_{\theta \in \Theta} R(\theta) > c\right)$$

- Theory of Random Fields (TRF): emerged as the only analytical solution so far (large-scale searches in neuroimaging, astrophysics, etc.)
  - $R(\theta)$ is viewed as a Gaussian random field over the manifold $\Theta \subset \mathbb{R}^d$
  - $\phi$ has been profiled out of $R(\theta, \phi)$: $\phi \to \hat\phi$
  - provides a closed-form approximation when $c$ is large...

TRF: Adler & Taylor (2007), Random Fields and Geometry, Springer

- Excursion set of the field above level $c$: $A_c = \{\theta \in \Theta : R(\theta) > c\}$
- Euler characteristic of the excursion set: $\varphi(A_c)$ = geometric property of the field
- Fundamental result in TRF:

$$E[\varphi(A_c)] = \sum_{i=0}^{d} a_i f_i(c)$$

  - $a_i$: positive constants (to be determined by Monte Carlo)
  - $f_i(\cdot)$: known universal functions
- For large $c$ (Taylor et al., Annals of Probability, 2005):

$$p\text{-value} = P\left(\max_{\theta \in \Theta} R(\theta) > c\right) \approx E[\varphi(A_c)] =: p_{global}$$

Nuisance in signal & background: improving TRF

(THIS IS THE MAIN CONTRIBUTION OF THIS PAPER!)

Suppose $\theta \ne \emptyset$ and $\phi \ne \emptyset$.

- Solution 1 (straightforward): treat all parameters via TRF in conjunction with Edgeworth $O(n^{-3/2})$ normalized versions of the LR statistic:

$$r \to \tilde r = \Phi^{-1}(F_R(r))$$

- Solution 2 (exotic): adjust the global significance of the test statistic, leading to a (conservative) estimate of $p_{global}$ in the context of TRF...

$$p_{global} = P\big(R_{LR}(\hat\theta) > r(\hat\theta)\big)$$

  - $r(\theta)$ is the observed (local) value of $R_{LR}(\theta)$ computed from the sample,
  - $\hat\theta = \arg\max_{\theta \in \Theta} r(\theta)$.

Algorithm for Solution 2: details...

- Normal approximation error for each observed (local) $r \equiv r(\theta)$, as before:

$$\Delta R(r(\theta)) = r(\theta) - \tilde r(\theta)$$

- Locate: $\theta^* = \arg\max_{\theta \in \Theta} \Delta R(r(\theta))$. The search can use the same grid as the TRF search for $\hat\theta = \arg\max r(\theta)$.
- Calculate the global significance of the signal $p_{global}$ via TRF, and express it in terms of the global $r$:

$$r_{global} = \Phi^{-1}(1 - p_{global})$$

- Adjust the global $r$:

$$r^{adj}_{global} = r_{global} - \Delta R(r(\theta^*))$$

- The global (adjusted) p-value is then:

$$p^{adj}_{global} = 1 - \Phi\big(r^{adj}_{global}\big)$$
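The steps of this algorithm translate almost line by line into code. The sketch below is a hedged illustration: the function name and its inputs (a TRF-based `p_global` and two parallel lists of local values $r(\theta)$ and their Edgeworth-normalized counterparts $\tilde r(\theta)$ on the search grid) are hypothetical, and it assumes the adjustment subtracts the largest local normal approximation error from $r_{global}$.

```python
from statistics import NormalDist

_STD = NormalDist()

def adjusted_global_p(p_global, local_r, local_r_tilde):
    """Adjust a TRF global p-value by the worst local normal approximation error.

    p_global      : global p-value obtained from the TRF calculation
    local_r       : observed local values r(theta) on the search grid
    local_r_tilde : Edgeworth-normalized values for the same grid points
    """
    # Step 1: normal approximation error at each grid point.
    deltas = [r - rt for r, rt in zip(local_r, local_r_tilde)]
    # Step 2: largest error over the grid (the value at theta*).
    worst = max(deltas)
    # Step 3: global r, using Phi^{-1}(1 - p) = -Phi^{-1}(p) for tail accuracy.
    r_global = -_STD.inv_cdf(p_global)
    # Steps 4 and 5: adjusted global r and adjusted global p-value.
    r_adj = r_global - worst
    return _STD.cdf(-r_adj)
```

Since the largest error is used, a positive error anywhere on the grid increases the reported p-value, which is what makes the estimate conservative.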

Extension 2: Random sample size

Suppose $x_1, \ldots, x_N \overset{iid}{\sim} p(x \mid \alpha)$ and $N \sim \mathcal{P}(\nu(\alpha))$. Then we have two cases:

- If $\nu(\alpha)$ is independent of $\alpha$, so that $\nu(\alpha) \equiv \nu$, inference on $\alpha$ is as before:

$$l(\alpha, \nu) = \sum_{i=1}^{n} \log p(x_i \mid \alpha) - \nu + n \log \nu = l(\alpha) + l(\nu)$$

- Otherwise, the log-likelihood is not separable, but under a simplified regime where $\alpha = \nu_s/(\nu_s + \nu_b)$ with $\nu_b$ known, we have

$$l(\nu_s) = \sum_{i=1}^{n} \log p(x_i \mid \alpha) - (\nu_s + \nu_b) + n \log(\nu_s + \nu_b)$$

and can now re-do all calculations for the new parameter $\nu_s$...

THE END!
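The non-separable case is easy to evaluate numerically. The following sketch uses hypothetical names and takes the densities $s(x_i)$ and $b(x_i)$ at the observed points as plain lists; like the expression above, it drops the constant $-\log n!$ of the Poisson term.

```python
import math

def extended_loglik(nu_s, nu_b, s_vals, b_vals):
    """l(nu_s) for a Poisson-distributed sample size with known nu_b.

    s_vals, b_vals : signal and background densities at the observed x_i
    """
    total = nu_s + nu_b
    alpha = nu_s / total                 # signal fraction implied by the rates
    n = len(s_vals)
    shape = sum(math.log(alpha * sv + (1.0 - alpha) * bv)
                for sv, bv in zip(s_vals, b_vals))
    return shape - total + n * math.log(total)
```

A useful check is the algebraic identity $l(\nu_s) = \sum_i \log\big(\nu_s s(x_i) + \nu_b b(x_i)\big) - (\nu_s + \nu_b)$, which follows by absorbing $n \log(\nu_s + \nu_b)$ into the sum.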
