Many-core Accelerated LIBOR Swaption Portfolio Pricing


2012 SC Companion: High Performance Computing, Networking Storage and Analysis

Jörg Lotze, Paul D. Sutton, Hicham Lahlou
Xcelerit, Dunlop House, Fenian Street, Dublin 2, Ireland
{jorg.lotze,paul.sutton,hicham.lahlou}@xcelerit.com

Abstract This paper describes the acceleration of a Monte-Carlo algorithm for pricing a LIBOR swaption portfolio using multi-core CPUs and GPUs. Speedups of up to 305x are achieved on two Nvidia Tesla M2050 GPUs and up to 20.8x on two Intel Xeon E5620 CPUs, compared to a sequential CPU implementation. This performance is achieved by using the Xcelerit platform, writing sequential, high-level C++ code and adopting a simple dataflow programming model. It avoids the complexity involved when using low-level high-performance computing frameworks such as OpenMP, OpenCL, CUDA, or SIMD intrinsics. The paper provides an overview of the Xcelerit platform, details how high performance is achieved through various automatic optimisation and parallelisation techniques, and shows how the tool can be used to implement portable accelerated Monte-Carlo algorithms in finance. It illustrates the implementation of the Monte-Carlo LIBOR swaption portfolio pricer and gives performance results. A comparison of the Xcelerit platform implementation with an equivalent low-level CUDA version shows that the overhead introduced is less than 1.5% in all scenarios.

I. INTRODUCTION

Pricing financial derivatives is one of the most important problems in computational finance. This is often a compute-intensive task, especially when considering large portfolios or derivatives with complicated features. Typically, closed-form solutions only exist for the simplest of cases, for example single-asset European options with the Black-Scholes assumptions applied [1].
In all other cases, numerical solutions have to be found, e.g., applying lattice-based methods (for example binomial or trinomial trees), finite difference schemes, or Monte-Carlo simulations. These algorithms are computationally demanding and financial institutions worldwide are currently exploring high-performance computing (HPC) hardware such as multi-core CPUs, grids, and GPUs to deal with these demands. However, financial analysts are primarily mathematicians and typically have little or no experience in programming HPC hardware using development frameworks such as OpenMP [2], OpenCL [3], or CUDA [4], or using SIMD intrinsics.

This paper examines the acceleration of a Monte-Carlo algorithm to price a portfolio of LIBOR swaptions. The Xcelerit platform is used to exploit HPC hardware including multi-core CPUs and GPUs using high-level, portable C++ source code. The Xcelerit platform permits users to avoid the challenges associated with programming using the low-level frameworks mentioned above, yet still achieve high-performance applications. Applications can be efficiently executed on different many-core processors and are compiled from a single source codebase.

The Xcelerit platform has a GPU back-end using Nvidia's CUDA Toolkit [4], and we provide comparisons to a low-level CUDA implementation in this paper. CUDA is a proprietary programming toolkit that defines a programming model, a language based on C++, and an API for programming Nvidia GPUs. A compiler for GPU code and a large set of GPU-based libraries for specific purposes are included in the toolkit. Using CUDA requires expert knowledge about the GPU hardware architecture, the CUDA C++ language, and data-parallel programming and synchronisation techniques. The Xcelerit platform avoids that complexity by providing a programming interface on a higher level of abstraction.
Fully utilising the compute power of multi-core CPUs, including their Single Instruction Multiple Data (SIMD) instruction sets, requires parallel and low-level programming expertise. The Xcelerit platform automates this task, improving programmer productivity and code portability.

The paper is structured as follows. The algorithm used to price a LIBOR swaption portfolio is presented in Sec. II. Sec. III provides an overview of the Xcelerit platform, its programming model, and how applications are developed. This section also describes the automatic parallelisation and optimisation techniques applied within the platform to achieve high performance. Monte-Carlo simulations are common in computational finance, not only for pricing financial derivatives but also in risk management algorithms. Therefore a generic strategy for implementing financial Monte-Carlo algorithms using the Xcelerit platform is detailed in Sec. IV. This approach is applied in Sec. V to the LIBOR swaption portfolio pricing algorithm, and performance results are presented for both multi-core CPU and GPU hardware. A detailed performance comparison with an equivalent low-level CUDA implementation is given in Sec. VI, and Sec. VII concludes the paper.

II. PRICING A LIBOR SWAPTION PORTFOLIO

Swaptions are options on financial swap contracts, i.e., they provide one party with the right to enter a swap agreement at a future date, where a pre-determined fixed interest rate is exchanged for a floating rate [5]. The London Interbank Offered Rate (LIBOR) is the interest rate applied for loans between banks, and is calculated on a daily basis for ten different currencies and 15 borrowing terms ranging from overnight to one year (i.e., it is a floating rate subject to change every day) [5], [6]. In a LIBOR swaption, the floating interest rate of the swap agreement is the LIBOR rate.

To value a portfolio of LIBOR swaptions, a stochastic model that can predict the future development of the LIBOR rate for a given term is required. Based on this model, the value of each swaption can be determined, and the overall portfolio value can be computed by applying a payoff function. The evaluation is typically performed using a Monte-Carlo simulation, i.e., a large number of different possible developments of the LIBOR rate over the time steps is simulated using random numbers, the portfolio is valued for each case, and the final result is obtained by computing an average of all paths. For high accuracy, a large number of Monte-Carlo paths is required, typically more than 100,000.

A. Algorithm

In this paper, we apply the algorithm introduced in [7], which we briefly outline in the following. We denote the forward LIBOR rate by $L_i^n$ for the time interval $[i\delta, (i+1)\delta]$, where $\delta$ is the LIBOR interval. If the simulation time step is chosen equal to the LIBOR interval, the forward LIBOR rates at the times $n\delta < i\delta$ can be approximated by the equations

$$L_i^{n+1} = L_i^n \exp\left( \left( \sigma_{i-n-1} S_i^n - \tfrac{1}{2} \sigma_{i-n-1}^2 \right) \delta + \sigma_{i-n-1} Z^n \sqrt{\delta} \right) \tag{1}$$

for all $i > n$, where $n = 0, \ldots, N_{mat} - 1$. The number of time steps to maturity is denoted by $N_{mat}$. The variable $Z^n$ is a standard normal distributed random variable for the $n$-th time step, and

$$S_i^n = \sum_{j=n+1}^{i} \frac{\sigma_{j-n-1} \, \delta L_j^n}{1 + \delta L_j^n}. \tag{2}$$

This model treats the volatility $\sigma$ as a function of time to maturity, and a rate remains fixed once its maturity is reached. Therefore, $L_i^{n+1} = L_i^n$ for all $i \le n$.
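As a rough illustration of the recursion in (1) and (2), the following plain C++ sketch evolves the forward rates along a single path. It is not the implementation from [7] or the Xcelerit actor; the function name `evolve_libor_path`, the use of `std::vector`, and the double-precision types are assumptions made for this example only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: evolve forward LIBOR rates along one Monte-Carlo path.
// L     : N forward rates, updated in place (L[i] holds L_i^n at step n)
// sigma : volatilities indexed by time to maturity (at least N-1 entries)
// z     : N_mat standard normal samples, one per time step
// delta : LIBOR interval, taken equal to the simulation time step
void evolve_libor_path(std::vector<double>& L,
                       const std::vector<double>& sigma,
                       const std::vector<double>& z,
                       double delta)
{
    const std::size_t N = L.size();
    const std::size_t n_mat = z.size();
    assert(n_mat <= N && sigma.size() + 1 >= N);
    const double sqrt_delta = std::sqrt(delta);

    for (std::size_t n = 0; n < n_mat; ++n) {
        double S = 0.0;  // running sum S_i^n from (2), built from the old rates
        for (std::size_t i = n + 1; i < N; ++i) {
            const double sig = sigma[i - n - 1];
            S += sig * delta * L[i] / (1.0 + delta * L[i]);
            // update (1): drift term plus the random shock for this time step
            L[i] *= std::exp((sig * S - 0.5 * sig * sig) * delta
                             + sig * z[n] * sqrt_delta);
        }
        // rates with i <= n have reached maturity and are left unchanged
    }
}
```

Note that the same sample `z[n]` drives all rates within a time step, which is why only $N_{mat}$ samples are needed per path.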
Based on these forward LIBOR rates, the payoff $V$ of a portfolio of $N_{opt}$ swaptions with the swap rates $C_j$ and maturities $T_j$ (with $j = 0, \ldots, N_{opt} - 1$) can be computed as

$$V = \left\{ \prod_{i=0}^{N_{mat}-1} \frac{1}{1 + \delta L_i} \right\} \sum_{j=0}^{N_{opt}-1} 100 \left( 1 - B_{T_j} - C_j S_{T_j} \right)^+, \tag{3}$$

where

$$S_m = \sum_{i=0}^{m-1} \delta B_i, \tag{4}$$

$$B_m = \prod_{i=0}^{m-1} \frac{1}{1 + \delta L_{N_{mat}+i}}. \tag{5}$$

Thus, to price a portfolio of LIBOR swaptions using a Monte-Carlo simulation, the following steps have to be taken:
i) Draw $N_{mat} \times N_{paths}$ standard normal random samples ($N_{paths}$ is the number of Monte-Carlo paths),
ii) Compute the LIBOR forward rates for each path using (1),
iii) Compute the portfolio payoff for each path using (3), and
iv) Average all paths to obtain the final portfolio value.

B. Greeks

For financial institutions, it is valuable to determine not only the value of an instrument, but also its sensitivity to changes in the parameters on which this value depends. This permits the application of hedging techniques to compensate for the risk associated with one asset by adding other assets with reverse characteristics to the institution's portfolio [5]. These sensitivities are denoted by Greek letters and are commonly referred to as the Greeks. For the LIBOR swaption portfolio pricing detailed in this paper, we focus on the $\lambda$ Greek as an example, which denotes the percentage change in derivative value per percentage change in the underlying price [8]. It is sometimes also called $\Omega$ or elasticity, and is defined as

$$\lambda = \frac{\partial V}{\partial S} \, \frac{S}{V}, \tag{6}$$

where $V$ is the derivative's value and $S$ is the price of the underlying asset. Here we use an adjoint method to compute the value of $\lambda$ simultaneously with the portfolio value, as detailed in [9].

C. Computational Complexity

Acceptable accuracies for a Monte-Carlo simulation are generally achieved for 100,000 paths or more. This means that, for $N_{mat} = 40$, at least 100,000 x 40 = 4 million random numbers must be drawn, and the equations outlined above have to be computed for each path.
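To make the per-path work concrete, the payoff in (3) to (5) can be sketched in plain C++ as below. This is an illustrative reading of those equations, not the code from [7]; the function name `portfolio_payoff`, the argument layout, and the assumption that maturities are given in units of $\delta$ are all choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: discounted portfolio payoff for one simulated path.
// L          : forward rates at the end of the path simulation
// swap_rates : C_j, one per swaption
// maturities : T_j in units of delta, one per swaption
// n_mat      : number of time steps to maturity, delta: LIBOR interval
double portfolio_payoff(const std::vector<double>& L,
                        const std::vector<double>& swap_rates,
                        const std::vector<std::size_t>& maturities,
                        std::size_t n_mat, double delta)
{
    assert(swap_rates.size() == maturities.size());
    double value = 0.0;
    for (std::size_t j = 0; j < swap_rates.size(); ++j) {
        const std::size_t m = maturities[j];
        assert(n_mat + m <= L.size());
        double B = 1.0;  // B_i, starting from the empty product B_0 = 1
        double S = 0.0;  // accumulates S_m = sum_{i=0}^{m-1} delta * B_i, eq. (4)
        for (std::size_t i = 0; i < m; ++i) {
            S += delta * B;
            B /= 1.0 + delta * L[n_mat + i];  // B is now B_{i+1}, eq. (5)
        }
        const double swap = 1.0 - B - swap_rates[j] * S;
        if (swap > 0.0) value += 100.0 * swap;  // (x)^+ = max(x, 0)
    }
    for (std::size_t i = 0; i < n_mat; ++i)     // discount factor in (3)
        value /= 1.0 + delta * L[i];
    return value;
}
```

Averaging the return value over all paths gives step iv) of the procedure.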
Furthermore, when estimating the Greeks the algorithm becomes significantly more complex. For example, a straightforward sequential C language implementation of the algorithm including Greeks for 15 swaptions, $N_{mat} = 40$, and 128K paths (here K denotes 1024) takes 20 seconds on a Xeon E5620 CPU. For a financial institution that has many assets to value and typically uses more paths for better accuracy, this time is not acceptable. Therefore high-performance parallel processors have to be considered for this algorithm.

III. THE XCELERIT PLATFORM

The Xcelerit platform consists of the Xcelerit SDK at its core and several add-ons providing specialised functions or interfaces. The Xcelerit SDK permits the efficient use of many-core processors, i.e., multi-core CPUs and GPUs, from a single codebase written in a high-level programming language such as C++, C, C#, or Java (the code samples in this paper use the C++ API). It is available for the Linux and Windows operating systems. Its dataflow programming model, based on the Synchronous Dataflow (SDF) model of computation [10], allows source code to be automatically optimised and parallelised without requiring programmers to include any parallel constructs. This ensures that user code is simple and portable, making it an attractive framework for implementing financial algorithms such as the LIBOR swaption portfolio pricing explained in Sec. II.

Add-ons for the Xcelerit SDK include statistical functions and random number generators, as well as interfaces to Excel and Matlab. A complete package for computational finance, Xcelerit Quant, combines all add-ons relevant to this domain.

A. Programming Model

To use the Xcelerit API, compute-intensive algorithms are expressed as a dataflow graph, i.e., a succession of processing stages (termed actors) which are connected together. A series of actors is continuously applied to streams of data flowing through the graph. Source actors generate the data, generic actors take data from their input ports and compute the output data to be placed into their output ports, and sink actors consume data (for example, save it into a file). Fig. 1 shows a simple example dataflow graph, generating two streams of random numbers, computing their sum, and writing the results to memory. Generic actors can be configured using parameters and constant look-up tables.

[Figure: two RandomSrc source actors feed a Sum generic actor, which feeds a MemoryWriter sink actor.]
Fig. 1. A simple Xcelerit dataflow graph, computing the sum of two random streams of numbers.

To describe a program using the Xcelerit API, source actors, sink actors, and generic actors are instantiated and connected to a dataflow graph, which is then executed in parallel on the available multi-core CPUs or GPUs. The use of this programming model enables parallel execution while ensuring that common parallel programming bugs such as race conditions or deadlocks cannot be introduced.

B. Application Development

Fig. 2 gives an overview of the components of the Xcelerit SDK.
Developing applications using the Xcelerit SDK involves the following steps:
i) Identify the performance-critical part of the application through profiling,
ii) For these bottlenecks, map the algorithm to a dataflow graph,
iii) Implement the dataflow actors needed (or choose from provided library actors), and
iv) Instantiate actors, connect them to a flow graph, and execute the graph.

Fig. 2. The Xcelerit SDK Architecture.

Using the C++ API, actors are implemented as a class inheriting from a common base class. Input and output ports, and optionally parameters and constant look-up tables, are added to the public interface as required. A run() method is implemented which performs the core computation of the actor, i.e., it takes the data from the input ports, processes it, and places the results into the output ports. It may use the parameter values and constant look-up tables, as well as helper functions and other classes, for the computation. These actors are then compiled by the Xcelerit Compiler Driver, which drives a set of compilation tools and existing standard compilers for CPUs and GPUs. C++ actors are compiled into a single binary file that holds both GPU and CPU code. This permits the runtime to pick the appropriate implementation depending on the available compute resources.

For composing a dataflow graph, actor objects are instantiated and connected using C++ operators. The graph can then be executed, and the Xcelerit SDK runtime will automatically handle its efficient execution on the available compute resources.

This simple methodology makes the Xcelerit SDK a good candidate to exploit high-performance many-core processors for algorithms such as the LIBOR swaption portfolio pricer described in Sec. II. Through the use of a dataflow programming model, it ensures high-performance parallel execution while maintaining programmer productivity.
To ease development, the Xcelerit SDK comes with a set of development tools, e.g., a profiler, debugger integration, IDE plug-ins, and a library of often-used functions and actors. In addition, conventional CPU and GPU debuggers and profilers can be used.

C. Under the Hood

This section gives an insight into how the Xcelerit SDK works under the hood. It explains the optimisations employed in order to improve the performance.

Several types of parallelism can be extracted for the efficient execution of Xcelerit programs. These types of parallelism are extracted automatically from sequential Xcelerit-enabled user code by the Xcelerit Compiler Driver and Runtime System. Efficient parallel code is generated, and CPU and GPU memory access optimisations are carried out.

Fig. 3. Pipeline parallelism (a), and data parallelism (b) in dataflow graphs as applied by the Xcelerit SDK. The round arrow represents a concurrent thread of execution.

Fig. 4. Comparison of a sequential reduction (a) and a parallel version (b) for adding 8 values together. The sequential version takes 7 time steps to complete, while the parallel version finishes in 3.

1) Pipeline Parallelism: Pipeline parallelism is a form of task parallelism, where every stage in a processing pipeline is executed concurrently on different sets of data. This form of parallelism, illustrated in Fig. 3(a), yields a significant gain in performance, especially when the actors are of similar computational complexity.

2) Data Parallelism: It is possible to execute multiple instances of the same actor function in parallel, each working on a different set of the input and output data. This form of parallelism, illustrated in Fig. 3(b), provides efficient load balancing between concurrent executions, as all execute the same code. Data parallelism is the primary form of parallelism supported by GPUs.

3) Vectorisation (SIMD): Single-instruction multiple-data (SIMD) parallelism works at a much finer granularity. As the name implies, it applies single processor instructions to vectors of data at once, reducing the total number of instructions needed. It is a form of data parallelism which requires hardware support, i.e., the underlying hardware must provide special SIMD instructions and wide registers to hold vector data.
Modern CPUs support SIMD in different instruction sets, e.g., SSE (16-byte data vectors), AVX (32-byte data vectors), or AltiVec (different widths). GPUs from AMD also provide SIMD instructions. An AVX multiplication operation can, for example, compute the element-wise product of vectors of 8 single-precision floating-point numbers simultaneously, which results in an approximately 8x faster computation compared to using scalar multiply instructions. The Xcelerit API is implemented specifically to allow back-end compilers to carry out automatic SIMD vectorisation. Note that the next release, currently under development, will include more direct methods to implement vectorised computations from a high-level API.

Fig. 5. Example memory and cache hierarchy of typical CPUs (a) and GPUs (b). On the left is an example of a 6-core CPU of Intel's Nehalem architecture, and on the right is a typical GPU of Nvidia's Fermi architecture.

4) Parallel Reductions: Reductions are central to many algorithms, where partial results are combined to a final result. It is therefore crucial for the compiler to generate efficient code for reductions. There are many ways to implement reductions in parallel, where partial reductions are computed concurrently and reduced further in subsequent steps. Fig. 4 illustrates a sequential and a parallel implementation of a reduction that sums up all individual items in a buffer. The parallel sum finishes after 3 steps, whereas the sequential version needs 7 steps to complete. More generally, the parallel reduction needs $\log_2 N$ steps, while the sequential version needs $N - 1$ steps. On GPUs, shared memory (between threads) and efficient thread block sizes can be used to speed up the process.
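The step count of the parallel reduction can be illustrated with a small pairwise (tree) sum in plain C++. This sketch runs sequentially and only mimics the structure of the parallel version; it is not how the Xcelerit SDK implements its reduction actors, and the name `tree_sum` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: pairwise (tree) sum over a copy of the input.
// Each pass of the outer loop corresponds to one parallel reduction step:
// the additions inside a pass touch disjoint elements and are independent,
// so a parallel implementation could execute them concurrently.
// On return, `steps` holds the number of passes performed.
double tree_sum(std::vector<double> v, std::size_t& steps)
{
    steps = 0;
    if (v.empty()) return 0.0;
    for (std::size_t stride = 1; stride < v.size(); stride *= 2) {
        for (std::size_t i = 0; i + stride < v.size(); i += 2 * stride)
            v[i] += v[i + stride];  // combine two partial sums
        ++steps;
    }
    return v[0];
}
```

For the 8 values of Fig. 4 this performs 3 passes, compared with the 7 additions a sequential loop would execute one after another.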
The Xcelerit SDK provides a set of highly optimised built-in reduction actors and allows users to easily configure them with their own reduction functions.

5) Memory Access Optimisations: Typically, processors have a hierarchy of memory, starting from registers (the fastest and smallest), over several levels of cache (fast, larger with every level), to external memory (slowest and largest). Being able to place data used by actors in memory at a level close to the processing element can improve the performance of the computation significantly. The Xcelerit SDK exploits memory locality and performs cache optimisations where possible. That is, it ensures that the data needed is kept physically close to the processing element, using the memory hierarchy present in today's systems efficiently. For illustration, Fig. 5 shows the architecture of a typical CPU and GPU with different levels of cache and memory.

Furthermore, the Xcelerit SDK ensures that while a thread is waiting for data to arrive from memory, other threads can compute (overlapping computation with memory access). For GPUs, Xcelerit applications make efficient use of fast shared memory and registers where possible. Fast global memory access is ensured where possible (coalescing).

All of these optimisations ensure that all processors involved in the computation are kept busy and are not held back by memory access latencies. Note that these optimisations are automatic and do not require code annotations.

IV. MONTE-CARLO SIMULATIONS USING THE XCELERIT PLATFORM

Monte-Carlo simulations involve simulating a large number of random paths and computing the per-path results individually. These results are then combined using statistical analysis in order to compute the final results. Generally, the more paths used in the simulation, the more accurate the result. For pricing financial derivatives using a Monte-Carlo method, typically the following steps are followed:
i) Generate random samples,
ii) Calculate the payoff for each path, using the random samples,
iii) Average the payoffs of all paths, and
iv) Save the results.

The final result is a numerical estimate for the value of the derivative. This approach can be used to obtain the price of derivatives of different complexity. For example, several sources of uncertainty can easily be incorporated, correlations between underlyings in multi-asset options (also called basket options or rainbow options) can be modelled, or different probability distributions can be incorporated. Furthermore, Monte-Carlo simulations can also be applied in risk management and other areas, following a similar approach. They are a powerful tool for financial analysts.

[Figure: RandomSrc (random samples) -> PathCalc (payoff calculation) -> Reduce (reduce paths/option) -> Writer (write results).]
Fig. 6. Typical Xcelerit Dataflow Graph for Monte-Carlo Simulations.

An Xcelerit dataflow graph for the general steps involved in a Monte-Carlo simulation is shown in Fig. 6. It is a straightforward mapping of the algorithm explained above, and can be implemented directly.

A. Random Number Generation

It is critical for any Monte-Carlo simulation that the pseudorandom number generator used exhibits the desired statistical properties, and has a long enough period to avoid repetitions of the data. The accuracy of the final result is highly dependent on the quality of the random number generator used. The Xcelerit SDK Statistics Add-on provides a set of high-quality generators. Note that user-defined generators can also be implemented for specific requirements.

B. Per-Path Computation

This part of the algorithm is typically straightforward, as each path can be computed independently using different random numbers. Payoff functions can be of arbitrary complexity, depending on the type of derivative to be priced. Simple functions only need the final value of the underlying assets, while others might involve weighted payoffs of several underlyings. For risk computations, the per-path calculation can be of a different nature, for example computing the expected loss for each scenario. With the Xcelerit platform, this part is implemented within a custom generic actor, taking the per-path random numbers from its input port and computing the associated payoff for the output port.

C. Reduction and Writer

Averaging the individual per-path payoffs can be achieved by using the provided parallel reduction actors in the Xcelerit SDK. Usually multiple derivatives must be priced, which means a block-wise reduction is required to average all paths per option.
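A block-wise mean reduction of this kind can be sketched in plain C++ as follows. The sketch is sequential and assumes that the payoffs of each option are stored contiguously, one block of paths after another; that layout, and the name `blockwise_mean`, are assumptions for illustration and do not describe the Xcelerit reduction actors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: average consecutive blocks of per-path payoffs,
// producing one mean value per option.
// Assumes payoffs are laid out as [option 0 paths..., option 1 paths..., ...].
std::vector<double> blockwise_mean(const std::vector<double>& payoffs,
                                   std::size_t paths_per_option)
{
    assert(paths_per_option > 0 && payoffs.size() % paths_per_option == 0);
    std::vector<double> means(payoffs.size() / paths_per_option, 0.0);
    for (std::size_t k = 0; k < means.size(); ++k) {
        double sum = 0.0;
        for (std::size_t p = 0; p < paths_per_option; ++p)
            sum += payoffs[k * paths_per_option + p];
        means[k] = sum / static_cast<double>(paths_per_option);
    }
    return means;
}
```

Each block is independent of the others, so the outer loop is a natural candidate for data-parallel execution.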
The block-wise reduction outputs a stream of option values (as illustrated in Fig. 6), which are received by a Writer sink actor and, for example, stored to a file. If only a single instrument is valued, as in the LIBOR swaption portfolio case, a full reduction can be used. This is a sink actor which simply reduces all per-path values on its input and stores the final result in a variable. The additional Writer component is not required in this case. If the provided reductions are not sufficient, users can configure a generic reduction sink actor with their own reduction function, which will then be executed in parallel.

V. LIBOR SWAPTION PRICING WITH THE XCELERIT SDK

This section presents the implementation of the LIBOR swaption portfolio pricing algorithm described in Sec. II using the Xcelerit SDK. Performance results for a number of different configurations are presented and compared with a sequential reference implementation.

A. Implementation

The first step in the implementation is to map the algorithm to a dataflow graph, as shown in Fig. 7. The algorithm includes the computation of the λ Greek, hence the two sink actors: one for the swaption portfolio value and another for the associated λ. Only a single portfolio is priced, rather than multiple options, and therefore a full reduction sink can be used for both outputs (instead of a block-wise reduction and writer combination, as described in Sec. IV-C).

[Figure: RandomSrc (normal distribution) -> LiborSwaptionGreek (payoff and λ calc.) -> MeanReduce (portfolio value) and MeanReduce (λ value).]
Fig. 7. Xcelerit Dataflow Graphs for LIBOR Swaption Portfolio Pricing (with λ Greek).

To provide the random samples, the Xcelerit-provided source actor RandomSrc is set up with the MRG32K3a random number generator [11] and a normal distribution. The number of samples needed, the mean and standard deviation, and the generator seed are set up in the constructor. The LiborSwaptionGreek actor is user-defined, with an outline given below:

    class LiborSwaptionGreek : public Actor {
    public:
      Input<float> z;          // rand. sample
      Output<float> v, lb;     // value/lambda
      Constant<float> swaprates, L0, lambda;
      Constant<int> maturities;
      Parameter<int> nopt;
      ...                      // setup / construct

      // core algorithm
      actor void run() const {
        float L[NN], L2[L2_SIZE];  // temporary values
        float *L_b = L;
        // copy initial LIBOR rates from constant
        copy(L0.begin(), L0.end(), L);
        // LIBOR rates calc., store reverse paths
        path_calc_b1(L, L2);
        // compute portfolio value, updating L_b
        v[0] = portfolio_b(L, L_b);
        // Reverse path calc for Greek
        path_calc_b2(L_b, L2);
        // Greek is the last entry in L_b
        lb[0] = L_b[NN-1];
      }

    private:
      actor void path_calc_b1(float* L, float* L2) const;
      actor float portfolio_b(float* L, float* L_b) const;
      actor float path_calc_b2(float* L_b, float* L2) const;
    };

As can be seen, the actor has one input for the random samples, and two outputs for the portfolio value and λ. It is initialised with the swaption data, i.e., the number of swaptions, maturities, and swap rates, and a set of initial LIBOR rates and λ values. From these, the portfolio value and λ are computed using the random input samples. The two reduction sinks are provided by the Xcelerit SDK; they simply compute the mean of all input values.

The code for the actor instantiation, construction of the dataflow graph, and graph execution is as follows:

    // instantiate actors
    RandomSrc<float, RNDGEN_MRG32K3a, RNDDIST_NORMAL>
        samplegen(numpaths*nummat, SEED, 0.0f, 1.0f);
    LiborSwaptionGreek libor(swaprates, maturities, nopt, lambda0, L0);
    MeanReduce<float> meanvalue(&value);
    MeanReduce<float> meanlambda(&lambda);

    // construct dataflow graph
    Flowgraph f;
    f += samplegen >> libor,
         libor.v >> meanvalue,
         libor.lb >> meanlambda;

    // execute graph
    f.run();

It constructs objects of all needed actors, creates a dataflow graph by connecting ports using the >> operator, and runs the graph. This executes the application in a highly efficient way on multi-core CPUs or GPUs, depending on the available resources.

B. Results

To evaluate the performance of the Xcelerit platform implementation, the application is executed on a system with the following configuration:
    CPUs: 2x Intel Xeon E5620, hyperthreading off (8 cores)
    GPUs: 2x Nvidia Tesla M2050, ECC off
    RAM: 24GB
    OS: RedHat Enterprise Linux 5.4, 64bit
    CUDA SDK version: 4.2
    GPU driver version:
    Xcelerit SDK version:
    Compiler: NVCC with GCC 4.1 as host compiler
    Compiler flags: -O3 -DNDEBUG

Execution times for the LIBOR swaption pricing algorithm (with Greeks) are measured for single and double precision, using path numbers between 4K and 1024K (here K is 1024), a portfolio of 15 swaptions, and $N_{mat} = 40$. The full dataflow graph execution is measured, including the random number generator and reduction as well as the CPU/GPU data transfers (managed by the Xcelerit SDK). These execution times are compared to an equivalent sequential implementation using a single CPU core. This sequential implementation uses the host API (CPU) of Nvidia's curand random number generator library [12] to ensure that a generator of a quality comparable to Xcelerit's built-in random number generator is used.

TABLE I
SPEEDUPS ACHIEVED FOR THE LIBOR SWAPTION PRICER USING 512K PATHS (GPUS: TESLA M2050).

    Precision   1 GPU   2 GPUs
    single      155x    297x
    double      89x     171x

Fig. 8 shows the speedups achieved on GPU hardware with one and two GPUs. Using a single GPU, speedups of up to 155x can be realised for a single precision implementation for 512K paths. By using two M2050 GPUs, this speedup figure can be increased to 297x. This is an improvement by a factor of 1.9 when adding an extra GPU, which shows very good scalability, without changing the source code or re-compiling. The remaining gap to perfect scaling can be explained by the unavoidable overhead involved in managing multiple GPUs and splitting the data and computation between them.
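The scaling figures quoted above follow directly from the speedups in Tab. I. As a short worked check, writing $s_1$ and $s_2$ for the speedups on one and two GPUs (symbols introduced here for illustration only):

```latex
\begin{align*}
\text{single precision:}\quad
  \frac{s_2}{s_1} &= \frac{297}{155} \approx 1.92, &
  \frac{s_2}{2\,s_1} &= \frac{297}{310} \approx 0.96, \\
\text{double precision:}\quad
  \frac{s_2}{s_1} &= \frac{171}{89} \approx 1.92, &
  \frac{s_2}{2\,s_1} &= \frac{171}{178} \approx 0.96.
\end{align*}
```

In both precisions, the second GPU therefore delivers roughly 96% of the ideal doubling at 512K paths.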

[Figure: speedup versus number of paths (4K, 16K, 64K, 256K, 1024K) on Nvidia Tesla M2050 GPU(s); series: 2 GPU single, 1 GPU single, 2 GPU double, 1 GPU double.]
Fig. 8. LIBOR swaption pricing performance on the GPU for double and single precision, compared to a sequential implementation running on a single core.

TABLE II
TIME COMPARISON XCELERIT SDK VS. CUDA FOR LIBOR SWAPTION PORTFOLIO PRICER (TIMES IN MILLISECONDS).

                Paths   All     DT(1)   RNG(2)   CE(3)    RED(4)
    CUDA        32K     42.0                              <
    Xcelerit    32K
    Overhead    32K     0%      90%     8%       -2%      0%
    CUDA        128K                                      <
    Xcelerit    128K
    Overhead    128K    1.5%    70%     11%      -1.2%    67%
    CUDA        512K
    Xcelerit    512K
    Overhead    512K    1.4%    40%     24%      -0.8%    67%

    (1) CPU/GPU data transfers and memory allocations
    (2) Random number generation
    (3) Core LIBOR per-path calculation
    (4) Reductions (mean of all per-path values)

All GPU results for 512K paths are also summarised in Tab. I. Speedups on multi-core CPUs range between 7.4x and 7.9x for 128K paths and more (using the 8 CPU cores in the test system). The difference between single and double precision is insignificant on the CPU. A new version of the Xcelerit SDK providing a direct, simple API for vectorised computations is currently under development. Using this, preliminary results have shown speedups of 20.8x (single precision) and 13.6x (double precision) on the same system for 256K paths. This significantly higher speedup is due to a more efficient use of SIMD, and the difference between single and double precision is due to the different number of vector elements fitting into the available SIMD registers (128 bits wide).

VI. COMPARISONS WITH A LOW-LEVEL CUDA IMPLEMENTATION

In this section, we compare the performance and overhead of using the Xcelerit SDK for the LIBOR swaption portfolio pricer on GPUs with an equivalent low-level implementation using CUDA directly.

A. CUDA Implementation

The CUDA reference implementation which serves as a basis is available from Oxford University [13].
We believe this code has a reasonable optimisation level which reflects the performance of the CUDA framework. The kernels have been left untouched, but for fairness of comparison the random number generator has been replaced with Nvidia's curand library [12] (as the originally used generator is not publicly available), and the mean reduction, originally done on the CPU, has been replaced by a parallel GPU-based version using the Thrust library [14]. Further, the number of threads per block has been adjusted from the original 64 (optimised for older GPUs) to 256 to be optimal for the Tesla M2050. All other code is identical to the original implementation [13].

As the reference CUDA implementation uses single precision floats and executes on a single GPU, we compare it with the equivalent variant of the Xcelerit SDK implementation. Adding multiple GPUs to the CUDA version would involve changes to the program architecture, using multiple host threads and a different approach to the management of the data transfers. With the Xcelerit SDK, this is all handled automatically. Additional GPUs or CPU cores can be used without the need for source code changes or even recompilation of the application. A single binary makes it possible to fully leverage the available processing hardware.

B. Metrics

For the tests, the same machine configuration as mentioned in Sec. V is used. The following metrics are used for comparison:
    Overall application runtime
    Individual times: data transfers and memory allocations (DT), random number generation (RNG), core execution (CE), reduction (RED)
    Visual Profiler performance and efficiency metrics (detailed below)
These are compared for a range of different Monte-Carlo path numbers.

C. Results

Tab. II shows the overall application runtime as well as a breakdown of the individual tasks for a range of different Monte-Carlo path numbers.
As can be seen, the overhead introduced by the Xcelerit SDK is within 1.5% of the CUDA version in all cases. Slightly more time is taken for the memory allocations and data transfers with the Xcelerit SDK due to buffering of data between actors. The random number generator also takes slightly more time than the cuRAND version used in the reference implementation. This is due to the more generic implementation of the Xcelerit SDK version. The most relevant part for the overall application is the core LIBOR function (CE in the table), which computes the per-path portfolio values for all Monte-Carlo paths; all other parts are insignificant for the overall result. From the CE column in Tab. II it can be seen that, for this core function, the Xcelerit SDK is slightly faster than the original CUDA version.

TABLE III. NVIDIA VISUAL PROFILER METRICS FOR XCELERIT SDK VS. CUDA FOR LIBOR SWAPTION PORTFOLIO PRICER.
          Paths  Reg/Thr  GlbLd  GlbSt  DRAM   Branch  Occup
CUDA      32K             %      100%   42.8%  12.5%   55.4%
Xcelerit  32K             %      100%   46.8%  12.2%   56.3%
CUDA      128K            %      100%   48.3%  12.4%   64.1%
Xcelerit  128K            %      100%   52.3%  12.2%   64.3%
CUDA      512K            %      100%   49.5%  12.5%   65.5%
Xcelerit  512K            %      100%   53.5%  12.1%   65.7%

In the following, we take a closer look at the core execution function using the Nvidia Visual Profiler [15]. The following metrics reported by the profiler have been chosen for comparison:
- Reg/Thr: Number of registers used per GPU thread (for information only)
- GlbLd: Global Load Efficiency, i.e., efficiency of reading from global device memory (higher is better)
- GlbSt: Global Store Efficiency, i.e., efficiency of writing to global device memory (higher is better)
- DRAM: Utilisation of the available DRAM bandwidth (higher is better)
- Branch: Branch divergence overhead (lower is better)
- Occup: Occupancy of the available processor cores (higher is better)

The results are presented in Tab. III. As can be seen, most of the metrics show approximately the same results for the CUDA kernel and the Xcelerit SDK actor. The biggest difference is in the global load efficiency, where the Xcelerit SDK achieves approximately 92% and the CUDA kernel only 20%. This explains the difference in the core execution times reported in Tab. II. The Xcelerit SDK takes care that the memory accesses on the device are coalesced, i.e., the data and memory reads are arranged in a fashion that avoids the serialisation of threads within a warp (a group of 32 threads) while accessing memory. This is not always possible, but in this application the benefit clearly shows.

D. Summary

As shown in this section, the overhead of the Xcelerit SDK compared to a low-level CUDA implementation is negligible for the LIBOR swaption portfolio pricing algorithm. This small overhead is the cost of developing algorithms on a much higher level (increasing productivity), with the added benefit of generating portable binaries that can run on any number of GPUs or multi-core CPUs.

VII. CONCLUSIONS

This paper has presented the acceleration of a Monte-Carlo LIBOR swaption portfolio pricer using the Xcelerit platform. It has shown that a dramatic performance increase can be achieved on GPUs (up to 305x on 2 Nvidia Tesla M2050 GPUs) while avoiding the complexity of low-level programming frameworks. The same application can also be executed on multi-core CPUs without recompilation, achieving speedups of up to 20.8x on 2 Intel Xeon E5620 CPUs. Comparison with an equivalent low-level CUDA implementation has shown that the performance overhead added by the Xcelerit platform is very small (<1.5%). Details on how the Xcelerit platform achieves this high level of performance have been presented, and a general strategy for implementing financial Monte-Carlo algorithms has been outlined. Thus, it has been shown that the Xcelerit platform can be used to implement complex financial algorithms such as the LIBOR swaption pricer using straightforward high-level programming techniques, while still achieving high performance and portability across HPC processing platforms.

REFERENCES
[1] F. Black and M. Scholes, "The pricing of options and corporate liabilities," The Journal of Political Economy.
[2] "OpenMP Application Program Interface," OpenMP Architecture Review Board, Rev. 3.1, Jul. [Online]. Available: org/mp-documents/openmp3.1.pdf
[3] (2011) OpenCL - The open standard for parallel programming of heterogeneous systems. Khronos Group. [Online]. Available:
[4] (2012) NVIDIA CUDA Toolkit. NVIDIA Corporation. [Online]. Available:
[5] J. C. Hull, Fundamentals of Futures and Option Markets, 7th ed., D. Battista, Ed. Pearson Education, Inc.
[6] E. V. Murphy, "LIBOR: Frequently asked questions," Congressional Research Service, Washington, DC, CRS Report for Congress R42608, Jul. [Online]. Available: pdf
[7] M. Giles, "Libor notes." [Online]. Available: ox.ac.uk/gilesm/libor/libor_notes.pdf
[8] E. G. Haug, The Complete Guide to Option Pricing Formulas. McGraw-Hill Professional.
[9] M. Giles, "Monte carlo evaluation of sensitivities in computational finance," Oxford University Computing Laboratory, Oxford, UK, Tech. Rep. 07/12, Jun. [Online]. Available: uk/1090/1/na pdf
[10] E. A. Lee and D. G. Messerschmitt, "Synchronous data flow," Proc. IEEE, vol. 75, no. 9, Sep.
[11] P. L'Ecuyer, R. Simard, E. J. Chen, and W. D. Kelton, "An object-oriented random-number package with many long streams and substreams," Oper. Res., vol. 50, no. 6, Nov.
[12] (2012) NVIDIA cuRAND Random Number Generation library. NVIDIA Corporation. [Online]. Available: cuda/curand
[13] M. Giles. (2007) Libor monte carlo application. Oxford University Computing Laboratory. Oxford, UK. [Online]. Available: http://people.maths.ox.ac.uk/gilesm/hpc/
[14] (2012) Thrust library. NVIDIA Corporation. [Online]. Available:
[15] (2011) NVIDIA Visual Profiler. NVIDIA Corporation. [Online]. Available:

Accelerating Financial Computation

Accelerating Financial Computation Accelerating Financial Computation Wayne Luk Department of Computing Imperial College London HPC Finance Conference and Training Event Computational Methods and Technologies for Finance 13 May 2013 1 Accelerated

More information

Accelerating Quantitative Financial Computing with CUDA and GPUs

Accelerating Quantitative Financial Computing with CUDA and GPUs Accelerating Quantitative Financial Computing with CUDA and GPUs NVIDIA GPU Technology Conference San Jose, California Gerald A. Hanweck, Jr., PhD CEO, Hanweck Associates, LLC Hanweck Associates, LLC 30

More information

Outline. GPU for Finance SciFinance SciFinance CUDA Risk Applications Testing. Conclusions. Monte Carlo PDE

Outline. GPU for Finance SciFinance SciFinance CUDA Risk Applications Testing. Conclusions. Monte Carlo PDE Outline GPU for Finance SciFinance SciFinance CUDA Risk Applications Testing Monte Carlo PDE Conclusions 2 Why GPU for Finance? Need for effective portfolio/risk management solutions Accurately measuring,

More information

GPU-Accelerated Quant Finance: The Way Forward

GPU-Accelerated Quant Finance: The Way Forward GPU-Accelerated Quant Finance: The Way Forward NVIDIA GTC Express Webinar Gerald A. Hanweck, Jr., PhD CEO, Hanweck Associates, LLC Hanweck Associates, LLC 30 Broad St., 42nd Floor New York, NY 10004 www.hanweckassoc.com

More information

Analytics in 10 Micro-Seconds Using FPGAs. David B. Thomas Imperial College London

Analytics in 10 Micro-Seconds Using FPGAs. David B. Thomas Imperial College London Analytics in 10 Micro-Seconds Using FPGAs David B. Thomas dt10@imperial.ac.uk Imperial College London Overview 1. The case for low-latency computation 2. Quasi-Random Monte-Carlo in 10us 3. Binomial Trees

More information

SPEED UP OF NUMERIC CALCULATIONS USING A GRAPHICS PROCESSING UNIT (GPU)

SPEED UP OF NUMERIC CALCULATIONS USING A GRAPHICS PROCESSING UNIT (GPU) SPEED UP OF NUMERIC CALCULATIONS USING A GRAPHICS PROCESSING UNIT (GPU) NIKOLA VASILEV, DR. ANATOLIY ANTONOV Eurorisk Systems Ltd. 31, General Kiselov str. BG-9002 Varna, Bulgaria Phone +359 52 612 367

More information

Algorithmic Differentiation of a GPU Accelerated Application

Algorithmic Differentiation of a GPU Accelerated Application of a GPU Accelerated Application Numerical Algorithms Group 1/31 Disclaimer This is not a speedup talk There won t be any speed or hardware comparisons here This is about what is possible and how to do

More information

CUDA-enabled Optimisation of Technical Analysis Parameters

CUDA-enabled Optimisation of Technical Analysis Parameters CUDA-enabled Optimisation of Technical Analysis Parameters John O Rourke (Allied Irish Banks) School of Science and Computing Institute of Technology, Tallaght Dublin 24, Ireland Email: John.ORourke@ittdublin.ie

More information

Stochastic Grid Bundling Method

Stochastic Grid Bundling Method Stochastic Grid Bundling Method GPU Acceleration Delft University of Technology - Centrum Wiskunde & Informatica Álvaro Leitao Rodríguez and Cornelis W. Oosterlee London - December 17, 2015 A. Leitao &

More information

Liangzi AUTO: A Parallel Automatic Investing System Based on GPUs for P2P Lending Platform. Gang CHEN a,*

Liangzi AUTO: A Parallel Automatic Investing System Based on GPUs for P2P Lending Platform. Gang CHEN a,* 2017 2 nd International Conference on Computer Science and Technology (CST 2017) ISBN: 978-1-60595-461-5 Liangzi AUTO: A Parallel Automatic Investing System Based on GPUs for P2P Lending Platform Gang

More information

Financial Mathematics and Supercomputing

Financial Mathematics and Supercomputing GPU acceleration in early-exercise option valuation Álvaro Leitao and Cornelis W. Oosterlee Financial Mathematics and Supercomputing A Coruña - September 26, 2018 Á. Leitao & Kees Oosterlee SGBM on GPU

More information

Near Real-Time Risk Simulation of Complex Portfolios on Heterogeneous Computing Systems with OpenCL

Near Real-Time Risk Simulation of Complex Portfolios on Heterogeneous Computing Systems with OpenCL Near Real-Time Risk Simulation of Complex Portfolios on Heterogeneous Computing Systems with OpenCL Javier Alejandro Varela, Norbert Wehn Microelectronic Systems Design Research Group University of Kaiserslautern,

More information

Financial Risk Modeling on Low-power Accelerators: Experimental Performance Evaluation of TK1 with FPGA

Financial Risk Modeling on Low-power Accelerators: Experimental Performance Evaluation of TK1 with FPGA Financial Risk Modeling on Low-power Accelerators: Experimental Performance Evaluation of TK1 with FPGA Rajesh Bordawekar and Daniel Beece IBM T. J. Watson Research Center 3/17/2015 2014 IBM Corporation

More information

Pricing Early-exercise options

Pricing Early-exercise options Pricing Early-exercise options GPU Acceleration of SGBM method Delft University of Technology - Centrum Wiskunde & Informatica Álvaro Leitao Rodríguez and Cornelis W. Oosterlee Lausanne - December 4, 2016

More information

HIGH PERFORMANCE COMPUTING IN THE LEAST SQUARES MONTE CARLO APPROACH. GILLES DESVILLES Consultant, Rationnel Maître de Conférences, CNAM

HIGH PERFORMANCE COMPUTING IN THE LEAST SQUARES MONTE CARLO APPROACH. GILLES DESVILLES Consultant, Rationnel Maître de Conférences, CNAM HIGH PERFORMANCE COMPUTING IN THE LEAST SQUARES MONTE CARLO APPROACH GILLES DESVILLES Consultant, Rationnel Maître de Conférences, CNAM Introduction Valuation of American options on several assets requires

More information

Ultimate Control. Maxeler RiskAnalytics

Ultimate Control. Maxeler RiskAnalytics Ultimate Control Maxeler RiskAnalytics Analytics Risk Financial markets are rapidly evolving. Data volume and velocity are growing exponentially. To keep ahead of the competition financial institutions

More information

GRAPHICAL ASIAN OPTIONS

GRAPHICAL ASIAN OPTIONS GRAPHICAL ASIAN OPTIONS MARK S. JOSHI Abstract. We discuss the problem of pricing Asian options in Black Scholes model using CUDA on a graphics processing unit. We survey some of the issues with GPU programming

More information

Assessing Solvency by Brute Force is Computationally Tractable

Assessing Solvency by Brute Force is Computationally Tractable O T Y H E H U N I V E R S I T F G Assessing Solvency by Brute Force is Computationally Tractable (Applying High Performance Computing to Actuarial Calculations) E D I N B U R M.Tucker@epcc.ed.ac.uk Assessing

More information

Applications of Dataflow Computing to Finance. Florian Widmann

Applications of Dataflow Computing to Finance. Florian Widmann Applications of Dataflow Computing to Finance Florian Widmann Overview 1. Requirement Shifts in the Financial World 2. Case 1: Real Time Margin 3. Case 2: FX Option Monitor 4. Conclusions Market Context

More information

PRICING AMERICAN OPTIONS WITH LEAST SQUARES MONTE CARLO ON GPUS. Massimiliano Fatica, NVIDIA Corporation

PRICING AMERICAN OPTIONS WITH LEAST SQUARES MONTE CARLO ON GPUS. Massimiliano Fatica, NVIDIA Corporation PRICING AMERICAN OPTIONS WITH LEAST SQUARES MONTE CARLO ON GPUS Massimiliano Fatica, NVIDIA Corporation OUTLINE! Overview! Least Squares Monte Carlo! GPU implementation! Results! Conclusions OVERVIEW!

More information

Black-Scholes option pricing. Victor Podlozhnyuk

Black-Scholes option pricing. Victor Podlozhnyuk Black-Scholes option pricing Victor Podlozhnyuk vpodlozhnyuk@nvidia.com Document Change History Version Date Responsible Reason for Change 0.9 007/03/19 Victor Podlozhnyuk Initial release 1.0 007/04/06

More information

Domokos Vermes. Min Zhao

Domokos Vermes. Min Zhao Domokos Vermes and Min Zhao WPI Financial Mathematics Laboratory BSM Assumptions Gaussian returns Constant volatility Market Reality Non-zero skew Positive and negative surprises not equally likely Excess

More information

Hedging Strategy Simulation and Backtesting with DSLs, GPUs and the Cloud

Hedging Strategy Simulation and Backtesting with DSLs, GPUs and the Cloud Hedging Strategy Simulation and Backtesting with DSLs, GPUs and the Cloud GPU Technology Conference 2013 Aon Benfield Securities, Inc. Annuity Solutions Group (ASG) This document is the confidential property

More information

Barrier Option. 2 of 33 3/13/2014

Barrier Option. 2 of 33 3/13/2014 FPGA-based Reconfigurable Computing for Pricing Multi-Asset Barrier Options RAHUL SRIDHARAN, GEORGE COOKE, KENNETH HILL, HERMAN LAM, ALAN GEORGE, SAAHPC '12, PROCEEDINGS OF THE 2012 SYMPOSIUM ON APPLICATION

More information

AD in Monte Carlo for finance

AD in Monte Carlo for finance AD in Monte Carlo for finance Mike Giles giles@comlab.ox.ac.uk Oxford University Computing Laboratory AD & Monte Carlo p. 1/30 Overview overview of computational finance stochastic o.d.e. s Monte Carlo

More information

Local Volatility FX Basket Option on CPU and GPU

Local Volatility FX Basket Option on CPU and GPU www.nag.co.uk Local Volatility FX Basket Option on CPU and GPU Jacques du Toit 1 and Isabel Ehrlich 2 Abstract We study a basket option written on 10 FX rates driven by a 10 factor local volatility model.

More information

Accelerated Option Pricing Multiple Scenarios

Accelerated Option Pricing Multiple Scenarios Accelerated Option Pricing in Multiple Scenarios 04.07.2008 Stefan Dirnstorfer (stefan@thetaris.com) Andreas J. Grau (grau@thetaris.com) 1 Abstract This paper covers a massive acceleration of Monte-Carlo

More information

Gas storage: overview and static valuation

Gas storage: overview and static valuation In this first article of the new gas storage segment of the Masterclass series, John Breslin, Les Clewlow, Tobias Elbert, Calvin Kwok and Chris Strickland provide an illustration of how the four most common

More information

S4199 Effortless GPU Models for Finance

S4199 Effortless GPU Models for Finance ADAPTIV Risk management, risk-based pricing and operational solutions S4199 Effortless GPU Models for Finance 26 th March 2014 Ben Young Senior Software Engineer SUNGARD SunGard is one of the world s leading

More information

The Dynamic Cross-sectional Microsimulation Model MOSART

The Dynamic Cross-sectional Microsimulation Model MOSART Third General Conference of the International Microsimulation Association Stockholm, June 8-10, 2011 The Dynamic Cross-sectional Microsimulation Model MOSART Dennis Fredriksen, Pål Knudsen and Nils Martin

More information

Computational Finance in CUDA. Options Pricing with Black-Scholes and Monte Carlo

Computational Finance in CUDA. Options Pricing with Black-Scholes and Monte Carlo Computational Finance in CUDA Options Pricing with Black-Scholes and Monte Carlo Overview CUDA is ideal for finance computations Massive data parallelism in finance Highly independent computations High

More information

Reconfigurable Acceleration for Monte Carlo based Financial Simulation

Reconfigurable Acceleration for Monte Carlo based Financial Simulation Reconfigurable Acceleration for Monte Carlo based Financial Simulation G.L. Zhang, P.H.W. Leong, C.H. Ho, K.H. Tsoi, C.C.C. Cheung*, D. Lee**, Ray C.C. Cheung*** and W. Luk*** The Chinese University of

More information

Efficient Reconfigurable Design for Pricing Asian Options

Efficient Reconfigurable Design for Pricing Asian Options Efficient Reconfigurable Design for Pricing Asian Options Anson H.T. Tse, David B. Thomas, K.H. Tsoi, Wayne Luk Department of Computing Imperial College London, UK {htt08,dt10,khtsoi,wl}@doc.ic.ac.uk ABSTRACT

More information

Numerix Pricing with CUDA. Ghali BOUKFAOUI Numerix LLC

Numerix Pricing with CUDA. Ghali BOUKFAOUI Numerix LLC Numerix Pricing with CUDA Ghali BOUKFAOUI Numerix LLC What is Numerix? Started in 1996 Roots in pricing exotic derivatives Sophisticated models CrossAsset product Excel and SDK for pricing Expanded into

More information

HPC IN THE POST 2008 CRISIS WORLD

HPC IN THE POST 2008 CRISIS WORLD GTC 2016 HPC IN THE POST 2008 CRISIS WORLD Pierre SPATZ MUREX 2016 STANFORD CENTER FOR FINANCIAL AND RISK ANALYTICS HPC IN THE POST 2008 CRISIS WORLD Pierre SPATZ MUREX 2016 BACK TO 2008 FINANCIAL MARKETS

More information

Benchmarks Open Questions and DOL Benchmarks

Benchmarks Open Questions and DOL Benchmarks Benchmarks Open Questions and DOL Benchmarks Iuliana Bacivarov ETH Zürich Outline Benchmarks what do we need? what is available? Provided benchmarks in a DOL format Open questions Map2Mpsoc, 29-30 June

More information

NAG for HPC in Finance

NAG for HPC in Finance NAG for HPC in Finance John Holden Jacques Du Toit 3 rd April 2014 Computation in Finance and Insurance, post Napier Experts in numerical algorithms and HPC services Agenda NAG and Financial Services Why

More information

Remarks on stochastic automatic adjoint differentiation and financial models calibration

Remarks on stochastic automatic adjoint differentiation and financial models calibration arxiv:1901.04200v1 [q-fin.cp] 14 Jan 2019 Remarks on stochastic automatic adjoint differentiation and financial models calibration Dmitri Goloubentcev, Evgeny Lakshtanov Abstract In this work, we discuss

More information

Collateralized Debt Obligation Pricing on the Cell/B.E. -- A preliminary Result

Collateralized Debt Obligation Pricing on the Cell/B.E. -- A preliminary Result Collateralized Debt Obligation Pricing on the Cell/B.E. -- A preliminary Result Lurng-Kuo Liu Virat Agarwal Outline Objectivee Collateralized Debt Obligation Basics CDO on the Cell/B.E. A preliminary result

More information

Monte-Carlo Pricing under a Hybrid Local Volatility model

Monte-Carlo Pricing under a Hybrid Local Volatility model Monte-Carlo Pricing under a Hybrid Local Volatility model Mizuho International plc GPU Technology Conference San Jose, 14-17 May 2012 Introduction Key Interests in Finance Pricing of exotic derivatives

More information

Outline. GPU for Finance SciFinance SciFinance CUDA Risk Applications Workstation Testing. Enterprise Testing Dell and NVIDIA solutions Conclusions

Outline. GPU for Finance SciFinance SciFinance CUDA Risk Applications Workstation Testing. Enterprise Testing Dell and NVIDIA solutions Conclusions Outline GPU for Finance SciFinance SciFinance CUDA Risk Applications Workstation Testing Monte Carlo PDE Enterprise Testing Dell and NVIDIA solutions Conclusions 2 Why GPU for Finance? Need for effective

More information

Application of High Performance Computing in Investment Banks

Application of High Performance Computing in Investment Banks British Computer Society FiNSG and APSG Public Application of High Performance Computing in Investment Banks Dr. Tony K. Chau Lead Architect, IB CTO, UBS January 8, 2014 Table of contents Section 1 UBS

More information

Milliman STAR Solutions - NAVI

Milliman STAR Solutions - NAVI Milliman STAR Solutions - NAVI Milliman Solvency II Analysis and Reporting (STAR) Solutions The Solvency II directive is not simply a technical change to the way in which insurers capital requirements

More information

Efficient Reconfigurable Design for Pricing Asian Options

Efficient Reconfigurable Design for Pricing Asian Options Efficient Reconfigurable Design for Pricing Asian Options Anson H.T. Tse, David B. Thomas, K.H. Tsoi, Wayne Luk Department of Computing Imperial College London, UK (htt08,dtl O,khtsoi,wl)@doc.ic.ac.uk

More information

Automatic Generation and Optimisation of Reconfigurable Financial Monte-Carlo Simulations

Automatic Generation and Optimisation of Reconfigurable Financial Monte-Carlo Simulations Automatic Generation and Optimisation of Reconfigurable Financial Monte-Carlo s David B. Thomas, Jacob A. Bower, Wayne Luk {dt1,wl}@doc.ic.ac.uk Department of Computing Imperial College London Abstract

More information

F1 Acceleration for Montecarlo: financial algorithms on FPGA

F1 Acceleration for Montecarlo: financial algorithms on FPGA F1 Acceleration for Montecarlo: financial algorithms on FPGA Presented By Liang Ma, Luciano Lavagno Dec 10 th 2018 Contents Financial problems and mathematical models High level synthesis Optimization

More information

Computational Finance. Computational Finance p. 1

Computational Finance. Computational Finance p. 1 Computational Finance Computational Finance p. 1 Outline Binomial model: option pricing and optimal investment Monte Carlo techniques for pricing of options pricing of non-standard options improving accuracy

More information

Load Test Report. Moscow Exchange Trading & Clearing Systems. 07 October Contents. Testing objectives... 2 Main results... 2

Load Test Report. Moscow Exchange Trading & Clearing Systems. 07 October Contents. Testing objectives... 2 Main results... 2 Load Test Report Moscow Exchange Trading & Clearing Systems 07 October 2017 Contents Testing objectives... 2 Main results... 2 The Equity & Bond Market trading and clearing system... 2 The FX Market trading

More information

Handbook of Financial Risk Management

Handbook of Financial Risk Management Handbook of Financial Risk Management Simulations and Case Studies N.H. Chan H.Y. Wong The Chinese University of Hong Kong WILEY Contents Preface xi 1 An Introduction to Excel VBA 1 1.1 How to Start Excel

More information

Multi-level Stochastic Valuations

Multi-level Stochastic Valuations Multi-level Stochastic Valuations 14 March 2016 High Performance Computing in Finance Conference 2016 Grigorios Papamanousakis Quantitative Strategist, Investment Solutions Aberdeen Asset Management 0

More information

New GPU Pricing Library

New GPU Pricing Library New GPU Pricing Library! Client project for Bank Sarasin! Highly regarded sustainable Swiss private bank! Founded 1841! Core business! Asset management! Investment advisory! Investment funds! Structured

More information

Design of a Financial Application Driven Multivariate Gaussian Random Number Generator for an FPGA

Design of a Financial Application Driven Multivariate Gaussian Random Number Generator for an FPGA Design of a Financial Application Driven Multivariate Gaussian Random Number Generator for an FPGA Chalermpol Saiprasert, Christos-Savvas Bouganis and George A. Constantinides Department of Electrical

More information

FAILURE RATE TRENDS IN AN AGING POPULATION MONTE CARLO APPROACH

FAILURE RATE TRENDS IN AN AGING POPULATION MONTE CARLO APPROACH FAILURE RATE TRENDS IN AN AGING POPULATION MONTE CARLO APPROACH Niklas EKSTEDT Sajeesh BABU Patrik HILBER KTH Sweden KTH Sweden KTH Sweden niklas.ekstedt@ee.kth.se sbabu@kth.se hilber@kth.se ABSTRACT This

More information

Mark Redekopp, All rights reserved. EE 357 Unit 12. Performance Modeling

Mark Redekopp, All rights reserved. EE 357 Unit 12. Performance Modeling EE 357 Unit 12 Performance Modeling An Opening Question An Intel and a Sun/SPARC computer measure their respective rates of instruction execution on the same application written in C Mark Redekopp, All

More information

Real-Time Market Data Technology Overview

Real-Time Market Data Technology Overview Real-Time Market Data Technology Overview Zoltan Radvanyi Morgan Stanley Session Outline What is market data? Basic terms used in market data world Market data processing systems Real time requirements

More information

Why know about performance

Why know about performance 1 Performance Today we ll discuss issues related to performance: Latency/Response Time/Execution Time vs. Throughput How do you make a reasonable performance comparison? The 3 components of CPU performance

More information

RunnING Risk on GPUs. Answering The Computational Challenges of a New Environment. Tim Wood Market Risk Management Trading - ING Bank

RunnING Risk on GPUs. Answering The Computational Challenges of a New Environment. Tim Wood Market Risk Management Trading - ING Bank RunnING Risk on GPUs Answering The Computational Challenges of a New Environment Tim Wood Market Risk Management Trading - ING Bank Nvidia GTC Express September 19 th 2012 www.ing.com ING Bank Part of

More information

A Portable and Fast Stochastic Volatility Model Calibration using Multi and Many-Core Processors

A Portable and Fast Stochastic Volatility Model Calibration using Multi and Many-Core Processors A Portable and Fast Stochastic Volatility Model Calibration using Multi and Many-Core Processors Matthew Dixon Department of Analytics University of San Francisco San Francisco, CA mfdixon@usfca.edu Jörg

More information

2.1 Mathematical Basis: Risk-Neutral Pricing

2.1 Mathematical Basis: Risk-Neutral Pricing Chapter Monte-Carlo Simulation.1 Mathematical Basis: Risk-Neutral Pricing Suppose that F T is the payoff at T for a European-type derivative f. Then the price at times t before T is given by f t = e r(t

More information

High Performance Risk Aggregation: Addressing the Data Processing Challenge the Hadoop MapReduce Way

High Performance Risk Aggregation: Addressing the Data Processing Challenge the Hadoop MapReduce Way High Performance Risk Aggregation: Addressing the Data Processing Challenge the Hadoop MapReduce Way A. Rau-Chaplin, B. Varghese 1, Z. Yao Faculty of Computer Science, Dalhousie University Halifax, Nova

More information

PARELLIZATION OF DIJKSTRA S ALGORITHM: COMPARISON OF VARIOUS PRIORITY QUEUES

PARELLIZATION OF DIJKSTRA S ALGORITHM: COMPARISON OF VARIOUS PRIORITY QUEUES PARELLIZATION OF DIJKSTRA S ALGORITHM: COMPARISON OF VARIOUS PRIORITY QUEUES WIKTOR JAKUBIUK, KESHAV PURANMALKA 1. Introduction Dijkstra s algorithm solves the single-sourced shorest path problem on a

More information

Smoking Adjoints: fast evaluation of Greeks in Monte Carlo calculations

Smoking Adjoints: fast evaluation of Greeks in Monte Carlo calculations Report no. 05/15 Smoking Adjoints: fast evaluation of Greeks in Monte Carlo calculations Michael Giles Oxford University Computing Laboratory, Parks Road, Oxford, U.K. Paul Glasserman Columbia Business

More information

Anne Bracy CS 3410 Computer Science Cornell University

Anne Bracy CS 3410 Computer Science Cornell University Anne Bracy CS 3410 Computer Science Cornell University These slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. Complex question How fast is the

More information

Razor Risk Market Risk Overview

Razor Risk Market Risk Overview Razor Risk Market Risk Overview Version 1.0 (Final) Prepared by: Razor Risk Updated: 20 April 2012 Razor Risk 7 th Floor, Becket House 36 Old Jewry London EC2R 8DD Telephone: +44 20 3194 2564 e-mail: peter.walsh@razor-risk.com

More information

Option Pricing with the SABR Model on the GPU

Option Pricing with the SABR Model on the GPU Option Pricing with the SABR Model on the GPU Yu Tian, Zili Zhu, Fima C. Klebaner and Kais Hamza School of Mathematical Sciences, Monash University, Clayton, VIC3800, Australia Email: {yu.tian, fima.klebaner,

More information

Monte Carlo Option Pricing

Monte Carlo Option Pricing Monte Carlo Option Pricing Victor Podlozhnyuk vpodlozhnyuk@nvidia.com Mark Harris mharris@nvidia.com Document Change History Version Date Responsible Reason for Change 1. 2/3/27 vpodlozhnyuk Initial release

More information

Binomial American Option Pricing on CPU-GPU Hetergenous System

Binomial American Option Pricing on CPU-GPU Hetergenous System Binomial American Option Pricing on CPU-GPU Hetergenous System Nan Zhang, Chi-Un Lei and Ka Lok Man Abstract We present a novel parallel binomial algorithm to compute prices of American options. The algorithm

More information

Numerical Methods in Option Pricing (Part III)

Numerical Methods in Option Pricing (Part III) Numerical Methods in Option Pricing (Part III) E. Explicit Finite Differences. Use of the Forward, Central, and Symmetric Central a. In order to obtain an explicit solution for the price of the derivative,

More information

DRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics

DRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics Chapter 12 American Put Option Recall that the American option has strike K and maturity T and gives the holder the right to exercise at any time in [0, T ]. The American option is not straightforward

More information
