KERNEL PROBABILITY DENSITY ESTIMATION METHODS

S. Towers
State University of New York at Stony Brook

Abstract

Kernel Probability Density Estimation techniques are fast growing in popularity in the particle physics community. This note gives an overview of these techniques, and compares their signal/background discrimination performance to that of an artificial neural network.

1 Introduction

Over the past ten years, the particle physics community has become conversant with a number of quite sophisticated multivariate techniques that exploit higher-order correlations between variables to achieve optimum separation between signal and background. Kernel Probability Density Estimation (PDE) methods have recently become part of this arsenal (see, for instance, references [1] and [2]), and are based on the premise that continuous, differentiable functions can be exactly modelled by the infinite sum of some other, appropriately chosen, kernel function [3]. Fourier series are a familiar example of this concept; all periodic functions can be expressed as infinite sums of sine and cosine terms.

We will restrict this overview to non-parametric PDE methods that use a Gaussian kernel. The choice of a Gaussian kernel is a natural one for particle physics applications, since nearly all variables we analyse have been Gaussian-smeared by detector resolution or other effects. Non-parametric simply means that no assumptions are made about the form of the probability density functions (PDFs) from which the samples are drawn.

A typical application of Gaussian kernel PDEs begins with a sample of N Monte Carlo events generated in a d-dimensional parameter space. The Monte Carlo events are distributed according to some (unknown) PDF, f(x). A Gaussian kernel PDE method estimates the value of the PDF at a point x by the sum of Gaussians centred at the Monte Carlo generated points t_i:

    \hat f(\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{(2\pi)^{d/2}\, h^{d}\, |\Sigma|^{1/2}} \exp\!\left( -\frac{(\mathbf{x}-\mathbf{t}_i)^{T}\, \Sigma^{-1}\, (\mathbf{x}-\mathbf{t}_i)}{2h^{2}} \right),    (1)

where \Sigma is a covariance matrix and h is an additional scaling factor. The optimal forms of \Sigma and h are a matter of debate. We will discuss two options here:

The static-kernel PDE method determines \Sigma from the covariance matrix of the overall sample. The scale factor h is set to N^{-1/(d+4)} [1]. Because the parameters of the resulting Gaussian kernel are the same for all points, this is known as a static kernel method.

The Gaussian Expansion Method (GEM), developed by the author, determines \Sigma from the covariance matrix of the N_G points in the Monte Carlo sample that are spatially closest to \mathbf{x} in the normalised parameter space (where all variables are normalised to lie between 0 and 1). The parameter h is set to one. In general, the parameter N_G must be large enough that the elements of \Sigma are approximately Gaussian distributed about the true value, but small enough that the local structure of the original PDF close to \mathbf{x} is imitated by the kernel. The author has found that the quality of the estimate does not strongly depend on the exact value of N_G.

Since the covariance matrix \Sigma depends on the density of points in the parameter space, the parameters of the Gaussian kernel change for different \mathbf{x}. Thus this technique is known as a dynamic, or adaptive, kernel method. The statistical variance of the GEM PDF estimate is given by:

    \mathrm{Var}\!\left[\hat f(\mathbf{x})\right] = \frac{1}{N(N-1)} \left[ \sum_{i=1}^{N} G_i(\mathbf{x})^{2} - \frac{1}{N} \left( \sum_{i=1}^{N} G_i(\mathbf{x}) \right)^{2} \right],    (2)

where G_i(\mathbf{x}) denotes the i-th Gaussian term in the sum of Equation 1.

Note that there are only one or two free parameters in both methods, and they are rather intuitive in nature. This is in sharp contrast to a typical neural network, which usually has a number of tuning parameters, the meaning of which is usually not transparent to the average user. Both static-kernel PDE and GEM have relative advantages and disadvantages, which we will explore in the next section.

[Figure 1: two panels, vertical axes labelled "arbitrary normalisation"; (a) PDF as estimated by alphapde (static-kernel PDE), (b) PDF as estimated by GEM]

Fig. 1: A comparison of the performance of the static-kernel PDE method to that of GEM. Both figures display an analytic PDF (dashed line), which is used to generate a sample of events randomly drawn from the distribution (points). (a) and (b) show the static-kernel PDE and GEM estimates of the PDF, respectively (solid lines), based on the randomly generated sample. The estimate from the static-kernel PDE method has a smaller variance than that of the GEM method, but is more biased. The shaded region in (b) indicates the statistical variance of the GEM estimate, as provided by Equation 2. Note that neither method uses binned data to estimate the PDF. The data are binned here for display purposes only.

A colour version of this paper is available at http://www-d0.fnal.gov/~smjt/durham/pde.ps
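The two estimators are straightforward to sketch in code. The following is a minimal Python/NumPy illustration, not the author's implementation: the function names are mine, and the GEM sketch adopts one plausible reading of the text, in which a single local covariance matrix is computed from the `n_g` sample points nearest the evaluation point.

```python
import numpy as np

def kernel_terms(x, sample, cov, h):
    """The N Gaussian terms G_i(x) appearing in Eq. (1)."""
    _, d = sample.shape
    inv = np.linalg.inv(cov)
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * h ** d * np.sqrt(np.linalg.det(cov)))
    diff = sample - np.asarray(x, dtype=float)           # x - t_i for every i
    mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)    # (x-t_i)^T Sigma^-1 (x-t_i)
    return norm * np.exp(-mahal / (2.0 * h * h))

def static_kernel_pde(x, sample):
    """Static kernel: Sigma from the full sample, h = N^(-1/(d+4))."""
    n, d = sample.shape
    cov = np.cov(sample, rowvar=False).reshape(d, d)
    h = n ** (-1.0 / (d + 4))
    return kernel_terms(x, sample, cov, h).mean()

def gem_pde(x, sample, n_g=100):
    """GEM-style adaptive kernel: Sigma from the n_g points nearest x, h = 1.

    Returns the estimate and its statistical variance, Eq. (2).
    """
    n, d = sample.shape
    dist2 = ((sample - np.asarray(x, dtype=float)) ** 2).sum(axis=1)
    near = sample[np.argsort(dist2)[:n_g]]               # n_g spatially closest points
    cov = np.cov(near, rowvar=False).reshape(d, d)
    g = kernel_terms(x, sample, cov, 1.0)
    est = g.mean()
    var = (np.sum(g * g) - g.sum() ** 2 / n) / (n * (n - 1.0))
    return est, var

# Quick check on a 1-D standard normal sample, where the true f(0) is about 0.3989.
rng = np.random.default_rng(0)
pts = rng.normal(size=(2000, 1))
est_static = static_kernel_pde([0.0], pts)
est_gem, var_gem = gem_pde([0.0], pts)
```

For the static kernel this reduces to ordinary fixed-bandwidth kernel density estimation, so a library routine such as `scipy.stats.gaussian_kde` (whose default Scott bandwidth is also of the N^{-1/(d+4)} form) gives comparable results.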

2 Comparison of static-kernel PDE and GEM

A contrived example best shows the primary differences between the static-kernel PDE method and GEM. Figure 1 displays an analytic PDF (indicated by the dashed line) that both the static-kernel PDE method and GEM estimate using the same sample of events randomly drawn from the PDF (the points). The static-kernel PDE method clearly gives a much smoother estimate than that of GEM. However, the static-kernel PDE estimate is biased, whereas the GEM estimate is nearly unbiased. The old adage "you can't get something for nothing" applies well here; the static-kernel PDE estimate of any PDF will always be biased (although in the limit of infinite Monte Carlo statistics the bias disappears), but will always have a smaller statistical variance than the GEM estimate. The bias is most prevalent in cases where the true PDF is non-differentiable, or has valleys. The variance of the GEM estimate, indicated by the shaded region in (b), is comparable to the variance of the binned Monte Carlo generated events. In the limit of infinite statistics, the variance goes to zero, and both GEM and the static-kernel PDE method then perform equally.

While we know a priori the form of the true PDF in this case, in general it is unknown. Thus, in a real analysis situation, when Monte Carlo samples of limited size are used for training, it may be difficult to assess the bias inherent in the static-kernel PDE method. However, the speed of application of the static-kernel PDE method relative to that of GEM may make its use desirable if very large samples of Monte Carlo are involved (the GEM method is slower than static-kernel PDE, because the local covariance matrix must be calculated for each point in the Monte Carlo samples).

[Figure 2: background efficiency versus signal efficiency for MLPfit, GEM, static-kernel PDE, and the ideal curve; vignette shows the signal and background distributions]

Fig. 2: A comparison of the performance of PDE methods to that of an artificial neural network for the signal and background distributions displayed in the vignette. The training of the neural network fails in this case because the correlations in the parameter space are highly non-linear, and the number of events used in the training samples is relatively small. Based on the same small training samples, however, the performance of GEM and the static-kernel PDE method are similar, and close to the ideal (shown in black).
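Once a PDE has been built for signal and for background, the discriminant is simply the ratio f_s(x) / (f_s(x) + f_b(x)), and scanning a cut on it traces out background-efficiency versus signal-efficiency curves like those in Figure 2. Below is a self-contained Python/NumPy sketch of this on a Gaussian-smeared ring; the sample sizes, smearing widths, and function names are my own illustrative choices, not the values used for the figures.

```python
import numpy as np

rng = np.random.default_rng(1)

def ring_sample(n, smear, rng):
    """Points on a unit ring, Gaussian-smeared in the radial direction."""
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    r = 1.0 + smear * rng.normal(size=n)
    return np.column_stack((r * np.cos(phi), r * np.sin(phi)))

def static_pde(points, sample):
    """Static-kernel Gaussian PDE (Eq. 1) evaluated at each row of `points`."""
    n, d = sample.shape
    h = n ** (-1.0 / (d + 4))
    cov = np.cov(sample, rowvar=False)
    inv = np.linalg.inv(cov)
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * h ** d * np.sqrt(np.linalg.det(cov)))
    out = np.empty(len(points))
    for k, x in enumerate(points):
        diff = sample - x
        mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)
        out[k] = norm * np.exp(-mahal / (2.0 * h * h)).mean()
    return out

# Tight signal ring, much broader background ring; training samples kept small.
sig_train = ring_sample(500, 0.1, rng)
bkg_train = ring_sample(500, 0.5, rng)
sig_test = ring_sample(1000, 0.1, rng)
bkg_test = ring_sample(1000, 0.5, rng)

def discriminant(points):
    fs = static_pde(points, sig_train)
    fb = static_pde(points, bkg_train)
    return fs / (fs + fb)

d_sig = discriminant(sig_test)
d_bkg = discriminant(bkg_test)

# One point on the efficiency curve: cut at the median signal response,
# giving roughly 50% signal efficiency by construction.
cut = np.median(d_sig)
sig_eff = (d_sig > cut).mean()
bkg_eff = (d_bkg > cut).mean()
```

Sweeping `cut` over the range of observed discriminant values, rather than fixing it at the signal median, produces the full efficiency curves of the figures.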

3 Comparison of PDE performance to that of a Neural Network

The author has found, for the most part, that nearly all signal/background discrimination problems confronted in actual analysis situations are handled equally well by both artificial neural networks and PDEs. One advantage of using PDEs is that they easily provide an intuitive visual means of determining what the PDF looks like in the multi-dimensional parameter space.

There are, in addition, a few types of discrimination problems that are better suited to the PDE approach, rather than that of neural networks. Take, for instance, the contrived example presented in the vignette of Figure 2. The signal is distributed according to a 2D Gaussian-smeared annular ring PDF, while the background is similarly distributed except the width of the smearing is much broader. Because the correlations in the parameter space are highly non-linear in this situation, it can be quite difficult to successfully train a neural network to discriminate between signal and background, especially if the statistics of the signal and background training samples are sparse. The background efficiency versus signal efficiency curves are shown for static-kernel PDE, GEM, and a neural network, MLPfit [4], all based on small training samples of signal and background. Both static-kernel PDE and GEM attain discrimination performance that is close to the ideal, but the training of the neural network fails in this case.

[Figure 3: background efficiency versus signal efficiency for MLPfit, GEM, and static-kernel PDE; vignette shows the signal and background distributions]

Fig. 3: A comparison of the performance of PDE methods to that of an artificial neural network for the signal and background distributions displayed in the vignette. The correlations in the parameter space are somewhat non-linear for the background, and the size of the training samples is again relatively small, but the discrimination performance of the neural network nonetheless approaches that of the PDE methods. In most real analysis situations, the discrimination performance of artificial neural networks and PDE methods is comparable.

An example of a case where both PDEs and neural networks perform comparably well is given in Figure 3. In this example the signal and background are slightly separated in both dimensions of the parameter space. The two dimensions are uncorrelated for the signal, while the background is slightly kidney-shaped (which is perhaps not obvious in the vignette displayed in the figure). The background efficiency versus signal efficiency curves are shown for static-kernel PDE, GEM, and MLPfit. Despite

the sparse training statistics, the discrimination performance of the neural network is comparable to that of the PDE methods.

4 Summary

Gaussian kernel PDE methods are quite easy to implement, and yield convenient graphical interpretations of the PDF in the multi-dimensional parameter space. The methods are easy to understand, and the few parameters needed to tune the methods are quite intuitive in nature. The discrimination performance of the methods is comparable to that of artificial neural networks, and thus they offer an interesting alternative to the use of neural networks in a multivariate analysis.

References

[1] B. Knuteson et al., (2001) physics/182.
[2] DØ Collaboration, V.M. Abazov et al., Phys. Rev. Lett. 87 (2001) 231801.
[3] T. Hastie et al., The Elements of Statistical Learning, Springer-Verlag (2001).
[4] MLPfit, J. Schwindling, http://schwind.home.cern.ch/schwind/mlpfit.html