Probabilistic Meshless Methods for Bayesian Inverse Problems
Jon Cockayne
July 8, 2016
Co-Authors: Chris Oates, Tim Sullivan, Mark Girolami
What is PN?

Many problems in mathematics have no analytical solution and must be solved numerically. In Probabilistic Numerics we phrase such problems as inference problems and construct a probabilistic description of the numerical error.

This is not a new idea [Kadane, 1985, Diaconis, 1988, O'Hagan, 1992, Skilling, 1991]! There has been much recent development on integration, optimization, ODEs, PDEs... see http://probnum.org/
What does this mean for PDEs?

A prototypical linear PDE: given $g$, $\kappa$, $b$, find $u$ such that

\[-\nabla\cdot(\kappa(x)\,\nabla u(x)) = g(x) \quad \text{in } D\]
\[u(x) = b(x) \quad \text{on } \partial D\]

For general $D$, $\kappa(x)$ this cannot be solved analytically. The majority of PDE solvers produce an approximation of the form

\[\hat{u}(x) = \sum_{i=1}^{N} w_i \, \phi_i(x)\]

We want to quantify the error from finite $N$ probabilistically.
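As a minimal illustration of this finite-dimensional structure, a Python sketch; the Gaussian basis functions and their placement here are hypothetical choices, not a method from the talk, and a real solver would obtain the weights $w$ by collocation, a Galerkin projection, or similar:

```python
import numpy as np

# Hypothetical Gaussian radial basis functions on [0, 1].
centers = np.linspace(0.0, 1.0, 10)  # N = 10 basis centres

def phi(x, c, width=0.1):
    """One basis function phi_i, centred at c."""
    return np.exp(-((x - c) / width) ** 2)

def u_hat(x, w):
    """Evaluate u_hat(x) = sum_i w_i * phi_i(x) at points x."""
    return sum(w_i * phi(x, c) for w_i, c in zip(w, centers))
```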
What does this mean for PDEs?

Inverse problem: given partial information about $g$, $b$, $u$, find $\kappa$ such that

\[-\nabla\cdot(\kappa(x)\,\nabla u(x)) = g(x) \quad \text{in } D\]
\[u(x) = b(x) \quad \text{on } \partial D\]

Bayesian inverse problem: prior $\kappa \sim \Pi_\kappa$; data $u(x_i) = y_i$; posterior $\kappa \mid y \sim \Pi^y_\kappa$. We want to account for an inaccurate forward solver in the inverse problem.
Why do this?

Using an inaccurate forward solver in an inverse problem can produce biased and overconfident posteriors.

[Figure: comparison of inverse problem posteriors over $\theta$ produced using the Probabilistic Meshless Method (PMM) vs. symmetric collocation, for $m_A = 4, 8, 16, 32, 64$ design points.]
Forward Problem
Abstract Formulation

\[A u(x) = g(x) \quad \text{in } D\]

Forward inference procedure: prior $u \sim \Pi_u$; data $A u(x_i) = g(x_i)$; posterior $u \mid g \sim \Pi^g_u$.
Posterior for the forward problem

Use a Gaussian process prior $u \sim \Pi_u = \mathcal{GP}(0, k)$. Assuming linearity, the posterior $\Pi^g_u$ is available in closed form [Cockayne et al., 2016, Särkkä, 2011, Cialenco et al., 2012, Owhadi, 2014]:

\[\Pi^g_u = \mathcal{GP}(m_1, \Sigma_1)\]
\[m_1(x) = \bar{A} K(x, X) \left[ A \bar{A} K(X, X) \right]^{-1} g\]
\[\Sigma_1(x, x') = k(x, x') - \bar{A} K(x, X) \left[ A \bar{A} K(X, X) \right]^{-1} A K(X, x')\]

where $\bar{A}$ is the adjoint of $A$. Observation: the mean function is the same as in symmetric collocation!
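As a concrete illustration of these formulas, a minimal NumPy sketch; the callables `k`, `Abar_k` and `AAbar_k`, evaluating $k$, $\bar{A}k$ and $A\bar{A}k$, are assumptions to be supplied by the user, e.g. by symbolic differentiation as sketched for the toy example below:

```python
import numpy as np

def pmm_posterior(x_star, X, g, k, Abar_k, AAbar_k):
    """Closed-form PMM posterior at points x_star, given design points X
    and right-hand-side values g = g(X).

    k(x, x'), Abar_k(x, x') and AAbar_k(x, x') must evaluate the prior
    kernel, Abar applied to its second argument, and A, Abar applied to
    its first and second arguments, respectively (vectorised)."""
    K_cross = Abar_k(x_star[:, None], X[None, :])  # Abar K(x*, X)
    K_obs = AAbar_k(X[:, None], X[None, :])        # A Abar K(X, X)
    m1 = K_cross @ np.linalg.solve(K_obs, g)       # posterior mean
    # For a symmetric kernel, A K(X, x*) equals Abar K(x*, X) transposed.
    Sigma1 = (k(x_star[:, None], x_star[None, :])
              - K_cross @ np.linalg.solve(K_obs, K_cross.T))
    return m1, Sigma1
```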
Theoretical Results

Theorem (Forward Contraction). For a ball $B_\epsilon(u_0)$ of radius $\epsilon$ centred on the true solution $u_0$ of the PDE, we have

\[1 - \Pi^g_u[B_\epsilon(u_0)] = O\!\left( \frac{h^{2\beta - 2\rho - d}}{\epsilon} \right)\]

where $h$ is the fill distance, $\beta$ the smoothness of the prior, $\rho < \beta - d/2$ the order of the PDE, and $d$ the input dimension.
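For instance, instantiating the bound for the one-dimensional toy problem below ($d = 1$, $\rho = 2$ for a second-order operator) gives

\[1 - \Pi^g_u[B_\epsilon(u_0)] = O\!\left( \frac{h^{2\beta - 5}}{\epsilon} \right),\]

so the smoother the prior (larger $\beta$), the faster the posterior contracts as the fill distance $h \to 0$.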
Toy Example

\[\nabla^2 u(x) = g(x) \quad x \in (0, 1)\]
\[u(x) = 0 \quad x \in \{0, 1\}\]

To associate with the notation from before:

\[\Pi_u = \mathcal{GP}(0, k(x, y)), \qquad A = \frac{d^2}{dx^2}, \qquad \bar{A} = \frac{d^2}{dy^2}\]
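One way to obtain the operator-transformed kernels needed above is symbolic differentiation. A sketch assuming a squared-exponential prior kernel with a hypothetical length-scale (the talk does not fix a choice of $k$):

```python
import sympy as sp

x, y = sp.symbols("x y")
ell = 0.2                                  # hypothetical length-scale
k_expr = sp.exp(-(x - y) ** 2 / (2 * ell ** 2))

Abar_k_expr = sp.diff(k_expr, y, 2)        # Abar k = d^2 k / dy^2
AAbar_k_expr = sp.diff(Abar_k_expr, x, 2)  # A Abar k = d^4 k / dx^2 dy^2

# Vectorised callables compatible with pmm_posterior above.
k = sp.lambdify((x, y), k_expr, "numpy")
Abar_k = sp.lambdify((x, y), Abar_k_expr, "numpy")
AAbar_k = sp.lambdify((x, y), AAbar_k_expr, "numpy")
```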
Forward problem: posterior samples

[Figure: posterior mean, truth, and samples of $u$ over $x \in [0, 1]$, for $g(x) = \sin(2\pi x)$.]
Forward problem: convergence

[Figure 2: convergence in the number of design points $m_A$. (a) Mean error from truth. (b) Trace of posterior covariance $\mathrm{Tr}(\Sigma_1)$.]
Inverse Problem
Recap

\[-\nabla\cdot(\kappa(x)\,\nabla u(x)) = g(x) \quad \text{in } D\]
\[u(x) = b(x) \quad \text{on } \partial D\]

Now we need to incorporate the forward posterior measure $\Pi^g_u$ into the posterior measure for the inverse problem, $\Pi^y_\kappa$.
Incorporation of Forward Measure

Assuming the data in the inverse problem are

\[y_i = u(x_i) + \xi_i, \qquad \xi \sim N(0, \Gamma), \qquad i = 1, \ldots, n\]

implies the standard likelihood

\[p(y \mid \kappa, u) = N(y; u, \Gamma)\]

But we don't know $u$! Marginalise the forward posterior $\Pi^g_u$ to obtain a PN likelihood:

\[p_{PN}(y \mid \kappa) \propto \int p(y \mid \kappa, u) \, d\Pi^g_u = N(y; m_1, \Gamma + \Sigma_1)\]
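In code, the PN likelihood amounts to a one-line change to a standard Gaussian likelihood: the forward posterior covariance is added to the noise covariance. A sketch reusing the hypothetical `pmm_posterior` from earlier:

```python
import numpy as np
from scipy.stats import multivariate_normal

def pn_log_likelihood(y, X_obs, X_design, g_design, Gamma,
                      k, Abar_k, AAbar_k):
    """log N(y; m1, Gamma + Sigma1): the forward solver's uncertainty
    Sigma1 inflates the observation noise Gamma. The kernel callables
    depend on the current kappa through the operator A."""
    m1, Sigma1 = pmm_posterior(X_obs, X_design, g_design,
                               k, Abar_k, AAbar_k)
    return multivariate_normal.logpdf(y, mean=m1, cov=Gamma + Sigma1)
```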
Inverse Contraction

Denote by $\Pi^y_\kappa$ the posterior for $\kappa$ from likelihood $p$, and by $\Pi^y_{\kappa,PN}$ the posterior for $\kappa$ from likelihood $p_{PN}$.

Theorem (Inverse Contraction). Assume $\Pi^y_\kappa \to \delta(\kappa_0)$ as $n \to \infty$. Then $\Pi^y_{\kappa,PN} \to \delta(\kappa_0)$ provided

\[h = o\!\left( n^{-1/(\beta - \rho - d/2)} \right)\]
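As a worked instance of this condition: for the one-dimensional toy problem ($d = 1$, $\rho = 2$) it reads

\[h = o\!\left( n^{-1/(\beta - 5/2)} \right),\]

i.e. the fill distance of the forward solver's design must shrink at a rate tied to the amount of inverse-problem data.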
Back to the Toy Example

\[-\nabla\cdot(\kappa\,\nabla u(x)) = \sin(2\pi x) \quad x \in (0, 1)\]
\[u(x) = 0 \quad x \in \{0, 1\}\]

Infer $\kappa \in \mathbb{R}^+$; data generated for $\kappa = 1$ at $x = 0.25, 0.75$ and corrupted with independent Gaussian noise $\xi \sim N(0, 0.01^2)$.
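A minimal grid-based sketch of this inverse problem, reusing the callables from the earlier sketches; this is a simplified illustration, not the talk's experiment: boundary conditions are omitted from the conditioning, a flat prior on $\kappa$ is assumed, the design points are hypothetical, and synthetic data come from the analytic solution $u(x) = \sin(2\pi x)/(4\pi^2)$ for $\kappa = 1$:

```python
import numpy as np

X_design = np.linspace(0.05, 0.95, 20)  # hypothetical design points
g_design = np.sin(2 * np.pi * X_design)
X_obs = np.array([0.25, 0.75])
Gamma = 0.01 ** 2 * np.eye(2)

# Synthetic data: u(x) = sin(2*pi*x)/(4*pi^2) solves -u'' = sin(2*pi*x)
# with u(0) = u(1) = 0, i.e. the kappa = 1 case.
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * X_obs) / (4 * np.pi ** 2) + rng.normal(0, 0.01, 2)

# For constant kappa, A = -kappa d^2/dx^2, so the operator kernels from
# the sympy sketch just rescale with kappa.
kappas = np.linspace(0.1, 2.0, 200)
log_post = np.array([
    pn_log_likelihood(y, X_obs, X_design, g_design, Gamma, k,
                      lambda a, b, s=kap: -s * Abar_k(a, b),
                      lambda a, b, s=kap: s ** 2 * AAbar_k(a, b))
    for kap in kappas
])
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, kappas)          # normalised grid posterior
```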
Posteriors for κ

[Figure: (a) posterior distributions over $\theta$ for different numbers of design points, $m_A = 4, 8, 16, 32, 64$, probabilistic vs. standard. (b) Convergence of the posterior distributions with the number of design points $n$.]
Nonlinear Example: Steady-State Allen-Cahn
Allen-Cahn

A prototypical nonlinear model:

\[\delta \nabla^2 u(x) + \delta^{-1}\left(u(x)^3 - u(x)\right) = 0 \quad x \in (0, 1)^2\]
\[u(x) = -1 \quad x_1 \in \{0, 1\}, \; 0 < x_2 < 1\]
\[u(x) = +1 \quad x_2 \in \{0, 1\}, \; 0 < x_1 < 1\]

Goal: infer $\delta$ from 16 equally spaced observations of $u(x)$ in the interior of the domain.

[Figure: three steady-state solutions on $(0,1)^2$, titled Negative Stable, Unstable, and Positive Stable.]
Allen-Cahn: Forward Solutions

Nonlinear PDE: non-GP posterior sampling schemes are required, see [Cockayne et al., 2016].

[Figure: forward posterior solutions on $(0,1)^2$ under an initial design and an optimal design.]
Allen-Cahn: Inverse Problem

[Figure: comparison of posteriors for $\delta$ with different solver resolutions, when using (a) the PMM forward solver with PN likelihood ($\ell = 5, 10, 20, 40, 80$) vs. (b) an FEA forward solver with Gaussian likelihood (meshes from 5x5 to 50x50).]
Conclusions
We have shown...

- How to build probability measures for the forward solution of PDEs.
- How to use this to make robust inferences in PDE inverse problems, even with inaccurate forward solvers.

Coming soon...

- Evolutionary problems ($\partial / \partial t$).
- More profound nonlinearities.
- Non-Gaussian priors.
References

I. Cialenco, G. E. Fasshauer, and Q. Ye. Approximation of stochastic partial differential equations by a kernel-based collocation method. Int. J. Comput. Math., 89(18):2543-2561, 2012.

J. Cockayne, C. J. Oates, T. Sullivan, and M. Girolami. Probabilistic meshless methods for partial differential equations and Bayesian inverse problems. arXiv preprint arXiv:1605.07811, 2016.

P. Diaconis. Bayesian numerical analysis. Statistical Decision Theory and Related Topics IV, 1:163-175, 1988.

J. B. Kadane. Parallel and sequential computation: a statistician's view. Journal of Complexity, 1:256-263, 1985.

A. O'Hagan. Some Bayesian numerical analysis. Bayesian Statistics, 4:345-363, 1992.

H. Owhadi. Bayesian numerical homogenization. arXiv preprint arXiv:1406.6668, 2014.

S. Särkkä. Linear operators and stochastic partial differential equations in Gaussian process regression. In Artificial Neural Networks and Machine Learning - ICANN 2011, Espoo, Finland, Proceedings, Part II, pages 151-158. Springer, 2011.

J. Skilling. Bayesian solution of ordinary differential equations. Maximum Entropy and Bayesian Methods, 50:23-37, 1991.