Simulation of Extreme Events in the Presence of Spatial Dependence

Nicholas Beck, Bouchra Nasri, Fateh Chebana, Marie-Pier Côté, Juliana Schulz, Jean-François Plante, Martin Durocher, Marie-Hélène Toupin, Jean-François Quessy, Jonathan Jalbert, Véronique Tremblay, Nouredine Daili
Desjardins General Insurance Group

Problem
- Evaluate the risk of coastal floods (from salt water)
- Deterministic model for the propagation of water inland
- Create a model to simulate water levels on the coast
- Capture spatial dependence to evaluate the risk properly

Data and related choices
- Measurements from tide gauges at 21 locations, east of Québec City
- At most 50 years of data (much fewer at some locations)
- Interest in Z, the annual maximum water level
- L, the normal (theoretical) high-tide level, is known
- Longitude and latitude are used for location

Working hypothesis #1: no temporal trend in the annual maxima
[Figure: annual maximum water level vs. year (0 to 50) at 6 sites: X1700, X1805, X2145, X2780, X3057 and X835.]
Climate change is ignored: insurance contracts are for one year.
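To probe working hypothesis #1 at a single gauge, one simple option is a rank-based (Mann-Kendall-type) trend test of the annual maxima against the year. The sketch below uses SciPy's `kendalltau`; it is an illustration, not necessarily the check used in this study, and the series, the function name `trend_test` and the parameter values are hypothetical.

```python
import numpy as np
from scipy import stats


def trend_test(years, annual_max):
    """Kendall's tau between year and annual maximum; missing years (NaN) are dropped."""
    years = np.asarray(years, dtype=float)
    annual_max = np.asarray(annual_max, dtype=float)
    keep = ~np.isnan(annual_max)
    tau, p_value = stats.kendalltau(years[keep], annual_max[keep])
    return tau, p_value


# Synthetic, trend-free annual maxima for one hypothetical gauge (not the study data)
rng = np.random.default_rng(1)
years = np.arange(1970, 2020)
z = stats.genextreme.rvs(c=-0.07, loc=3.0, scale=0.2, size=years.size, random_state=rng)
z[:15] = np.nan  # early years missing, as at gauges with short records
tau, p_value = trend_test(years, z)
print(f"tau = {tau:.3f}, p-value = {p_value:.3f}")
```

A small p-value would argue against the no-trend hypothesis at that gauge; with at most 50 years per site, such a test has limited power.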

Working hypothesis #2: spatial trend in the mean
[Figure: annual maxima vs. longitude. Left: raw data (Z); right: corrected for tide level (Z − L).]

Notation
Let $\{Z(s)\}$ be a random field, where $Z(s)$ is the annual maximum at location $s \in \mathbb{R}^2$. The joint distribution is
$H(z_1, \dots, z_n) = \Pr\{Z(s_1) \le z_1, \dots, Z(s_n) \le z_n\}$.
We are looking for a model for H.

Plan
- Marginal model: generalized extreme value (GEV)
- Spatial model: t copula
- Solution #1: frequentist approach
- Solution #2: Bayesian approach
- Conclusion

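Extreme value theory
Theorem (Fisher-Tippett). Let $Y_1, \dots, Y_m$ be an iid sequence of random variables and $Z_m = \max\{Y_1, \dots, Y_m\}$. If there exist $b_m > 0$ and $a_m \in \mathbb{R}$ such that $(Z_m - a_m)/b_m$ converges in distribution to H, for some non-degenerate distribution H, then H is a generalized extreme value (GEV) distribution, that is,
$F(y; \mu, \sigma, \xi) = \exp\left\{-\left(1 + \xi \frac{y - \mu}{\sigma}\right)^{-1/\xi}\right\}$, for $\xi \neq 0$ and $1 + \xi \frac{y - \mu}{\sigma} > 0$,
$F(y; \mu, \sigma, \xi) = \exp\left\{-\exp\left(-\frac{y - \mu}{\sigma}\right)\right\}$, for $\xi = 0$ and $y \in \mathbb{R}$.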
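The GEV cdf in the theorem can be coded directly. This minimal sketch (the function name `gev_cdf` is ours) handles both cases and the support constraint; SciPy's `genextreme` parameterizes the shape as c = -ξ, so it can serve as a cross-check.

```python
import numpy as np
from scipy import stats


def gev_cdf(y, mu, sigma, xi):
    """GEV cdf F(y; mu, sigma, xi) as stated in the Fisher-Tippett theorem."""
    s = (np.asarray(y, dtype=float) - mu) / sigma
    if xi == 0.0:
        return np.exp(-np.exp(-s))  # Gumbel case
    t = 1.0 + xi * s
    with np.errstate(all="ignore"):
        inside = np.exp(-np.maximum(t, 1e-300) ** (-1.0 / xi))
    # Outside the support (t <= 0): below the lower endpoint when xi > 0 (F = 0),
    # above the upper endpoint when xi < 0 (F = 1).
    outside = 0.0 if xi > 0 else 1.0
    return np.where(t > 0, inside, outside)


# SciPy's genextreme uses the opposite sign convention for the shape: c = -xi.
y = np.array([2.9, 3.0, 3.2, 3.6])
print(gev_cdf(y, mu=3.0, sigma=0.2, xi=0.07))
print(stats.genextreme.cdf(y, c=-0.07, loc=3.0, scale=0.2))
```

The two printed rows should agree; the sign flip on the shape is a frequent source of errors when moving between the GEV notation above and SciPy.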

Extreme value theory
In particular, when m is large enough, $Z_m$ is approximately $\mathrm{GEV}(\mu, \sigma, \xi)$, and this approximation still holds, under mild conditions, when $Y_1, \dots, Y_m$ are not independent: short- and medium-range autocorrelation "disappears" when taking the maximum (yay!).

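Copula approach
Decomposition:
$H(z_1, \dots, z_{21}) = C_{\Sigma_{\theta,\gamma}}\{F_{\beta_1}(z_1), \dots, F_{\beta_{21}}(z_{21})\}$,
where $C_{\Sigma_{\theta,\gamma}}$ is a copula and the $F_{\beta_i}$ are the marginal distributions.
Isotropic random field: $(\Sigma_{\theta,\gamma})_{ij} = g_\gamma(\delta_{ij}/\theta)$, where $\delta_{ij}$ is the distance between sites $s_i$ and $s_j$.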

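Copula approach: Student t copula
- Non-zero upper tail dependence: $\lambda_U = \lim_{u \to 1^-} \Pr\{X_1 > F_1^{-1}(u) \mid X_2 > F_2^{-1}(u)\}$
- Closed under margins: any subset of sites again follows a t copula, with the corresponding sub-matrix of $\Sigma$
- Power exponential model: $g_\gamma(\delta/\theta) = \exp[-3(\delta/\theta)^\gamma]$, where θ is a range parameter and γ is a smoothness parameter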
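For a bivariate t copula with correlation ρ and ν degrees of freedom, the upper tail-dependence coefficient has the standard closed form λ_U = 2 t_{ν+1}(-√((ν+1)(1-ρ)/(1+ρ))), where t_{ν+1} is the Student t cdf. It is positive for every ρ > -1, whereas a Gaussian copula has λ_U = 0 for ρ < 1. The sketch below (function name hypothetical) evaluates it with SciPy for the 8 degrees of freedom used in the figures; the correlations are illustrative.

```python
import numpy as np
from scipy import stats


def t_upper_tail_dependence(rho, df):
    """lambda_U of a bivariate t copula with correlation rho and df degrees of freedom."""
    arg = -np.sqrt((df + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * stats.t.cdf(arg, df + 1.0)


# Correlations one might see between nearby and distant gauges (illustrative values)
for rho in (0.9, 0.5, 0.2):
    print(rho, round(float(t_upper_tail_dependence(rho, df=8)), 3))
```

Larger ν pushes λ_U toward 0, recovering the Gaussian behaviour; this is why a t copula can produce simultaneous extremes at several sites more often than a Gaussian one with the same correlation.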

Copula approach: effect of the tail dependence
[Figure: simulated fields on the unit square (longitude vs. latitude). Panels: Gaussian copula; t copula with 8 degrees of freedom.]

Copula approach: effect of the range parameter θ
[Figure: simulated fields on the unit square (longitude vs. latitude). Panels: t copula with 8 degrees of freedom, θ = 1000 and θ = 3000.]

Copula approach: effect of the smoothness parameter γ
[Figure: simulated fields on the unit square (longitude vs. latitude). Panels: t copula with 8 degrees of freedom, smoothness 1 and 0.2 (labelled "nu" in the panel titles).]

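Frequentist model: marginal GEV model
Marginal distribution with covariates: $Z^* = Z - L \sim \mathrm{GEV}(\mu(s_{lon}), \sigma, \xi)$
- Location: $\mu(s_{lon}) = a_0 + a_1 s_{lon} + a_2 s_{lon}^2$
- Constant scale σ and shape ξ
Likelihood for the marginal model (observations treated as independent; spatial dependence is handled in the second step):
$\mathcal{L}(a_0, a_1, a_2, \sigma, \xi) = \prod_{k=1}^{n} f(z_k^*; a_0, a_1, a_2, \sigma, \xi)$,
where n is the number of (year, site) observations and f is the GEV density.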
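A sketch of the marginal maximum-likelihood step, assuming SciPy's `genextreme` (shape c = -ξ) for the log-density and Nelder-Mead for the optimization; these are our choices, not necessarily the authors'. The data are synthetic, σ is optimized on the log scale to keep it positive, and, unlike the slide, the covariate is standardized longitude for numerical stability, so the fitted coefficients are not on the same scale as the reported ones.

```python
import numpy as np
from scipy import optimize, stats


def neg_log_lik(params, z_star, lon):
    """-log L for Z* ~ GEV(a0 + a1*lon + a2*lon**2, sigma, xi), (year, site) pairs independent."""
    a0, a1, a2, log_sigma, xi = params
    mu = a0 + a1 * lon + a2 * lon**2
    ll = stats.genextreme.logpdf(z_star, c=-xi, loc=mu, scale=np.exp(log_sigma))  # SciPy: c = -xi
    return -np.sum(ll) if np.all(np.isfinite(ll)) else np.inf  # inf outside the GEV support


# Synthetic data: 21 hypothetical sites x 40 years; covariate = standardized longitude
rng = np.random.default_rng(7)
lon_sites = np.linspace(-70.0, -55.0, 21)
x = np.repeat((lon_sites - lon_sites.mean()) / lon_sites.std(), 40)
mu_true = 1.0 + 0.1 * x + 0.05 * x**2
z_star = stats.genextreme.rvs(c=-0.07, loc=mu_true, scale=0.2, random_state=rng)

start = np.array([z_star.mean(), 0.0, 0.0, np.log(z_star.std()), 0.1])
fit = optimize.minimize(
    neg_log_lik, start, args=(z_star, x), method="Nelder-Mead",
    options={"maxiter": 20000, "maxfev": 20000, "xatol": 1e-8, "fatol": 1e-8},
)
a0_hat, a1_hat, a2_hat, log_sigma_hat, xi_hat = fit.x
print(a0_hat, a1_hat, a2_hat, np.exp(log_sigma_hat), xi_hat)
```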

Frequentist model: marginal GEV model (estimates)
$\hat\mu(s_{lon}) = 4.74 + 0.16\, s_{lon} + 0.002\, s_{lon}^2$, $\hat\sigma = 0.20$, $\hat\xi = 0.07$

Frequentist model: estimation of the location parameters µ

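Frequentist model: copula estimation
The marginal parameters, estimated in the first step, are treated as known: $\hat\beta_i = (\hat\mu_i, \hat\sigma, \hat\xi)$.
Second step: pseudo-likelihood
$\mathcal{L}(\theta, \gamma) = \prod_{i=1}^{n} c_{\Sigma_{\theta,\gamma}}\{F_{\hat\beta_1}(Z_{i,1}), \dots, F_{\hat\beta_{d_i}}(Z_{i,d_i})\}$,
where c is the copula density and $d_i$ is the number of non-missing sites at time $i = 1, \dots, n$.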
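The pseudo-likelihood can be sketched with SciPy, assuming the pseudo-observations u = F_β̂(Z) have already been computed from the fitted margins and that a full correlation matrix is supplied. The t-copula log-density is the multivariate t log-density of x_j = t_ν^{-1}(u_j) minus the univariate t log-densities; missing sites are handled by taking the sub-matrix for the sites observed that year. Function names and the toy data are hypothetical.

```python
import numpy as np
from scipy import stats


def t_copula_logpdf(u, corr, df):
    """log c(u) for a t copula: multivariate t log-density minus univariate t log-densities."""
    x = stats.t.ppf(u, df)
    joint = stats.multivariate_t(loc=np.zeros(len(u)), shape=corr, df=df).logpdf(x)
    return joint - np.sum(stats.t.logpdf(x, df))


def pseudo_log_lik(u_rows, corr_full, df):
    """Sum over years of log c on the sites observed that year (NaN marks a missing site).

    Closure under margins: the sub-copula of the observed sites is a t copula whose
    correlation matrix is the matching sub-matrix of corr_full.
    """
    total = 0.0
    for u in u_rows:
        obs = ~np.isnan(u)
        if obs.sum() < 2:
            continue  # a one-dimensional copula density is 1, so it adds log(1) = 0
        total += t_copula_logpdf(u[obs], corr_full[np.ix_(obs, obs)], df)
    return total


# Hypothetical pseudo-observations for 3 years x 3 sites, with gaps
corr = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]])
u_rows = np.array([[0.91, 0.85, 0.70], [0.20, np.nan, 0.35], [np.nan, 0.55, np.nan]])
print(pseudo_log_lik(u_rows, corr, df=21))
```

Maximizing over (θ, γ) would wrap this in an optimizer that rebuilds the correlation matrix from the inter-site distances at each trial value.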

Frequentist model: evolution of the dependence ($\hat\theta = 2850$, $\hat\gamma = 0.37$, df = 21)
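To see what the reported estimates imply, the sketch below evaluates the correlation as a function of distance, assuming the power-exponential form g(δ/θ) = exp[-3(δ/θ)^γ] (our reading of the garbled formula in the copula slides) and Euclidean distances. The slides do not state the distance unit, and the coordinates used in the positive-definiteness check are hypothetical.

```python
import numpy as np


def power_exponential_corr(dist, theta, gamma):
    """Assumed form g(delta/theta) = exp(-3 * (delta/theta)**gamma)."""
    return np.exp(-3.0 * (np.asarray(dist, dtype=float) / theta) ** gamma)


def correlation_matrix(coords, theta, gamma):
    """Correlation matrix of an isotropic field from planar coordinates (Euclidean distances)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return power_exponential_corr(dist, theta, gamma)


theta_hat, gamma_hat = 2850.0, 0.37  # estimates reported on the slide
for delta in (0.0, 10.0, 100.0, 1000.0, 2850.0):
    print(delta, round(float(power_exponential_corr(delta, theta_hat, gamma_hat)), 3))
```

Under this form the correlation equals exp(-3) ≈ 0.05 at δ = θ̂, so θ acts as a practical range, and γ̂ < 1 makes the decay steep at short distances.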

Frequentist model: boxplots of simulated (green) vs. true (blue) values

Hierarchical Bayesian model
1) Data level: given θ, $Z(s)$ has copula $C_\theta$, and $Z(s_i) \mid \mu(s_i), \sigma, \xi \sim \mathrm{GEV}(\mu(s_i), \sigma, \xi)$.
2) Process level: non-informative priors are used for σ, ξ and $\theta = (\theta_1, \theta_2)$, and $\mu(s) \mid \beta, \tau, \varrho \sim N_d$.
3) Hyperpriors: $\beta = (\beta_0, \beta_1)$, ϱ and τ have non-informative priors.

Hierarchical Bayesian model
Given the parameters µ, σ, ξ and θ, the joint distribution of the maxima is
$F_{Z \mid \mu, \sigma, \xi, \theta}(z) = C_\theta[F\{z(s_1) \mid \mu_1, \sigma, \xi\}, \dots, F\{z(s_d) \mid \mu_d, \sigma, \xi\}]$,
where $C_\theta$ is a t copula and F is the cdf of the GEV distribution. Fitting is performed by MCMC.

Hierarchical Bayesian model: copula parameters
The degrees of freedom of the t copula are fixed. The correlation depends on the distance between the stations:
$\rho_z(s_i, s_j) = \theta_1^{\,\mathrm{dist}(s_i, s_j)^{\theta_2}}$,
with priors $\theta_1 \sim U(0, 1)$ and $\theta_2 \sim U(0, 2)$.
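Writing θ1^(d^θ2) = exp{-(d/φ)^θ2} with φ = (-ln θ1)^(-1/θ2) shows that this is again a powered-exponential correlation, which is positive definite for 0 < θ2 ≤ 2; this is consistent with the U(0, 2) prior on θ2. The sketch below builds the matrix for hypothetical station coordinates and hypothetical parameter values inside the prior supports, and checks that identity numerically.

```python
import numpy as np


def bayes_copula_corr(dist, theta1, theta2):
    """rho_z(s_i, s_j) = theta1 ** (dist ** theta2), with 0 < theta1 < 1 and 0 < theta2 <= 2."""
    return theta1 ** (np.asarray(dist, dtype=float) ** theta2)


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(21, 2))  # hypothetical station coordinates
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
theta1, theta2 = 0.8, 1.2  # hypothetical values inside the U(0, 1) and U(0, 2) supports
R = bayes_copula_corr(dist, theta1, theta2)

# Same family written with a range parameter phi: exp(-(d / phi) ** theta2)
phi = (-np.log(theta1)) ** (-1.0 / theta2)
print(np.allclose(R, np.exp(-(dist / phi) ** theta2)), np.linalg.eigvalsh(R).min())
```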

Hierarchical Bayesian model: marginal parameters
At the process level, we use non-informative priors for σ and ξ, while, for $s = (s_1, \dots, s_d)$,
$\mu(s) \sim N_d(X(s)\beta, \Sigma(s))$,
where $X(s_i)\beta = \beta_0 + \beta_1 x_i$, $x_i$ is the deterministic predicted maximum tide level at station $s_i$, and $\Sigma(s) = (\rho_\mu(s_i, s_j))_{d \times d}$ with $\rho_\mu(s_i, s_j) = \tau \exp\{-\mathrm{dist}(s_i, s_j)/\varrho\}$. A non-informative prior is used for β, $\varrho \sim U(0, \max_{i,j} \mathrm{dist}(s_i, s_j))$ and $f_\tau(\tau) \propto 1/\tau$.
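One draw of µ(s) from the process-level distribution can be sketched as follows, assuming given values of β0, β1, τ and ϱ. This samples the prior layer only; it is not the MCMC update, which would also condition on the data. Station positions, tide predictions and parameter values are hypothetical, and a one-dimensional position along the coast is used only to keep the example short. A tiny diagonal jitter before the Cholesky factorization is a common numerical safeguard.

```python
import numpy as np


def sample_mu(x_tide, dist, beta0, beta1, tau, varrho, rng):
    """One draw of mu(s) ~ N_d(beta0 + beta1 * x, Sigma) with Sigma_ij = tau * exp(-dist_ij / varrho)."""
    mean = beta0 + beta1 * np.asarray(x_tide, dtype=float)
    cov = tau * np.exp(-np.asarray(dist, dtype=float) / varrho)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(mean.size))  # tiny jitter for stability
    return mean + chol @ rng.standard_normal(mean.size)


rng = np.random.default_rng(11)
x_tide = np.array([2.1, 2.4, 3.0, 4.2, 5.1])     # hypothetical predicted maximum tide levels
pos = np.array([0.0, 40.0, 120.0, 300.0, 650.0])  # hypothetical positions along the coast
dist = np.abs(pos[:, None] - pos[None, :])
mu_draw = sample_mu(x_tide, dist, beta0=0.5, beta1=1.0, tau=0.04, varrho=200.0, rng=rng)
print(mu_draw)
```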

Goodness-of-fit testing
[Figure: QQ plots for the GEV margins, estimated vs. empirical quantiles. (a) Station 2; (b) Station 11.]
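The QQ plots compare the ordered annual maxima at a station with GEV quantiles at matching probabilities. A sketch of how the plotted pairs can be computed, assuming Weibull plotting positions i/(n+1) (one common choice; the slides do not say which was used) and SciPy's `genextreme`; the record and parameters are hypothetical.

```python
import numpy as np
from scipy import stats


def gev_qq_pairs(z, mu, sigma, xi):
    """(empirical, estimated) quantile pairs for a GEV QQ plot at one station."""
    z = np.asarray(z, dtype=float)
    z = np.sort(z[~np.isnan(z)])
    p = np.arange(1, z.size + 1) / (z.size + 1.0)  # Weibull plotting positions
    estimated = stats.genextreme.ppf(p, c=-xi, loc=mu, scale=sigma)  # SciPy shape c = -xi
    return z, estimated


# Hypothetical station record and fitted parameters (not the study's values)
rng = np.random.default_rng(5)
z_obs = stats.genextreme.rvs(c=-0.07, loc=2.2, scale=0.2, size=35, random_state=rng)
emp, est = gev_qq_pairs(z_obs, mu=2.2, sigma=0.2, xi=0.07)
print(np.corrcoef(emp, est)[0, 1])
```

Points close to the diagonal indicate that the fitted GEV margin describes that station well; in the hierarchical model, one could instead average the estimated quantiles over posterior draws.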

Goodness-of-fit testing (cont.)
[Figure: boxplots of the GEV location parameter for each of the 21 stations.]

Simulation
Simulation from this model can be done with the following (simple!) algorithm:
(1) Generate the parameters β, τ, ϱ, µ(s), σ, ξ and θ.
(2) Generate a realization $u = (u_1, \dots, u_d)$ from the copula $C_\theta$.
(3) Transform the copula realization with the GEV margins: $x_i = F^{-1}(u_i \mid \mu(s_i), \sigma, \xi)$, $i = 1, \dots, d$.
The model can also be used for interpolation between sites, since the hierarchical structure accounts for spatial dependence.
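Steps (2) and (3) can be sketched with NumPy and SciPy, assuming one set of parameter values is already available (here replaced by fixed, hypothetical values for 3 sites). A t-copula draw is obtained from a multivariate normal vector with correlation matrix R divided by √(W/ν), with W ~ χ²_ν shared across sites, followed by the univariate t cdf; each coordinate is then mapped through the GEV quantile function. The function name is hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_annual_maxima(mu_s, sigma, xi, corr, df, n_sims, rng):
    """Steps (2)-(3): draw from a t copula, then apply the GEV quantile function site by site."""
    mu_s = np.asarray(mu_s, dtype=float)
    chol = np.linalg.cholesky(corr)
    g = rng.standard_normal((n_sims, mu_s.size)) @ chol.T  # rows ~ N(0, corr)
    w = rng.chisquare(df, size=(n_sims, 1))                # one chi-square mixing variable per year
    u = stats.t.cdf(g / np.sqrt(w / df), df)               # rows are t-copula draws in (0, 1)^d
    return stats.genextreme.ppf(u, c=-xi, loc=mu_s, scale=sigma)  # SciPy shape c = -xi


# Step (1) replaced by fixed, hypothetical parameter values
rng = np.random.default_rng(2024)
corr = np.array([[1.0, 0.7, 0.4], [0.7, 1.0, 0.6], [0.4, 0.6, 1.0]])
sims = simulate_annual_maxima([3.0, 2.5, 6.9], sigma=0.2, xi=0.07, corr=corr, df=8, n_sims=10000, rng=rng)
print(np.quantile(sims, 0.99, axis=0))  # per-site 100-year levels under these assumptions
```

Drawing new parameter values in step (1) for each simulated year, for example from the MCMC output, would propagate parameter uncertainty into the simulated maxima.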

Conclusion
- Applicable model with reasonable results
- Scalable
- Annual maxima vs. multiple claims in the same year
- Room to improve the modelling of µ and σ
- Need to check the fit on simultaneous data
- Great team!