Many-core Accelerated LIBOR Swaption Portfolio Pricing
2012 SC Companion: High Performance Computing, Networking Storage and Analysis

Jörg Lotze, Paul D. Sutton, Hicham Lahlou
Xcelerit, Dunlop House, Fenian Street, Dublin 2, Ireland
{jorg.lotze,paul.sutton,hicham.lahlou}@xcelerit.com

Abstract: This paper describes the acceleration of a Monte-Carlo algorithm for pricing a LIBOR swaption portfolio using multi-core CPUs and GPUs. Speedups of up to 305x are achieved on two Nvidia Tesla M2050 GPUs and up to 20.8x on two Intel Xeon E5620 CPUs, compared to a sequential CPU implementation. This performance is achieved by using the Xcelerit platform, writing sequential, high-level C++ code and adopting a simple dataflow programming model. It avoids the complexity involved when using low-level high-performance computing frameworks such as OpenMP, OpenCL, CUDA, or SIMD intrinsics. The paper provides an overview of the Xcelerit platform, details how high performance is achieved through various automatic optimisation and parallelisation techniques, and shows how the tool can be used to implement portable accelerated Monte-Carlo algorithms in finance. It illustrates the implementation of the Monte-Carlo LIBOR swaption portfolio pricer and gives performance results. A comparison of the Xcelerit platform implementation with an equivalent low-level CUDA version shows that the overhead introduced is less than 1.5% in all scenarios.

I. INTRODUCTION

Pricing financial derivatives is one of the most important problems in computational finance. This is often a compute-intensive task, especially when considering large portfolios or derivatives with complicated features. Closed-form solutions typically exist only for the simplest cases, for example single-asset European options with the Black-Scholes assumptions applied [1].
In all other cases, numerical solutions have to be found, e.g., by applying lattice-based methods (for example binomial or trinomial trees), finite difference schemes, or Monte-Carlo simulations. These algorithms are computationally demanding, and financial institutions worldwide are currently exploring high-performance computing (HPC) hardware such as multi-core CPUs, grids, and GPUs to deal with these demands. However, financial analysts are primarily mathematicians and typically have little or no experience in programming HPC hardware using development frameworks such as OpenMP [2], OpenCL [3], or CUDA [4], or using SIMD intrinsics.

This paper examines the acceleration of a Monte-Carlo algorithm to price a portfolio of LIBOR swaptions. The Xcelerit platform is used to exploit HPC hardware including multi-core CPUs and GPUs using high-level, portable C++ source code. The Xcelerit platform permits users to avoid the challenges associated with programming in the low-level frameworks mentioned above, yet still achieve high-performance applications. Applications can be executed efficiently on different many-core processors and are compiled from a single source codebase.

The Xcelerit platform has a GPU back-end using Nvidia's CUDA Toolkit [4], and we provide comparisons to a low-level CUDA implementation in this paper. CUDA is a proprietary programming toolkit that defines a programming model, a language based on C++, and an API for programming Nvidia GPUs. A compiler for GPU code and a large set of GPU-based libraries for specific purposes are included in the toolkit. Using CUDA requires expert knowledge of the GPU hardware architecture, the CUDA C++ language, and data-parallel programming and synchronization techniques. The Xcelerit platform avoids that complexity by providing a programming interface at a higher level of abstraction.
Fully utilising the compute power of multi-core CPUs, including their Single Instruction Multiple Data (SIMD) instruction sets, requires parallel and low-level programming expertise. The Xcelerit platform automates this task, improving programmer productivity and code portability.

The paper is structured as follows. The algorithm used to price a LIBOR swaption portfolio is presented in Sec. II. Sec. III provides an overview of the Xcelerit platform, its programming model, and how applications are developed. This section also describes the automatic parallelisation and optimisation techniques applied within the platform to achieve high performance. Monte-Carlo simulations are common in computational finance, not only for pricing financial derivatives but also in risk management algorithms. Therefore a generic strategy for implementing financial Monte-Carlo algorithms using the Xcelerit platform is detailed in Sec. IV. This approach is applied in Sec. V to the LIBOR swaption portfolio pricing algorithm, and performance results are presented for both multi-core CPU and GPU hardware. A detailed performance comparison with an equivalent low-level CUDA implementation is given in Sec. VI, and Sec. VII concludes the paper.

II. PRICING A LIBOR SWAPTION PORTFOLIO

Swaptions are options on financial swap contracts, i.e., they provide one party with the right to enter a swap agreement at a future date, where a pre-determined fixed interest rate
is exchanged for a floating rate [5]. The London Interbank Offered Rate (LIBOR) is the interest rate applied for loans between banks, and is calculated on a daily basis for ten different currencies and 15 borrowing terms ranging from overnight to one year (i.e., it is a floating rate subject to change every day) [5], [6]. In a LIBOR swaption, the floating interest rate of the swap agreement is the LIBOR rate.

To value a portfolio of LIBOR swaptions, a stochastic model that can predict the future development of the LIBOR rate for a given term is required. Based on this model, the value of each swaption can be determined, and the overall portfolio value can be computed by applying a payoff function. The evaluation is typically performed using a Monte-Carlo simulation, i.e., a large number of different possible developments of the LIBOR rate over the time steps is simulated using random numbers, the portfolio is valued for each case, and the final result is obtained by averaging over all paths. For high accuracy, a large number of Monte-Carlo paths is required, typically more than 100,000.

A. Algorithm

In this paper, we apply the algorithm introduced in [7], which we briefly outline in the following. We denote the forward LIBOR rate at time step $n$ by $L_i^n$ for the time interval $[i\delta, (i+1)\delta]$, where $\delta$ is the LIBOR interval. If the simulation time-step is chosen equal to the LIBOR interval, the forward LIBOR rates at the times $n\delta < i\delta$ can be approximated by the equations

$$L_i^{n+1} = L_i^n \exp\left( \left( \sigma_{i-n-1} S_i^n - \tfrac{1}{2} \sigma_{i-n-1}^2 \right) \delta + \sigma_{i-n-1} Z^n \sqrt{\delta} \right) \qquad (1)$$

for all $i > n$, where $n = 0, \ldots, N_{mat} - 1$. The number of time steps to maturity is denoted by $N_{mat}$. The variable $Z^n$ is a standard normal distributed random variable for the $n$-th time-step, and

$$S_i^n = \sum_{j=n+1}^{i} \frac{\sigma_{j-n-1}\, \delta L_j^n}{1 + \delta L_j^n}. \qquad (2)$$

This model treats the volatility $\sigma$ as a function of time to maturity, and a rate remains fixed once its maturity is reached. Therefore, $L_i^{n+1} = L_i^n$ for all $i \le n$.
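As an illustration of equations (1) and (2), the update can be sketched in plain C++ as below. This is a minimal sketch and not the paper's actor code: the function name `advance_libor`, the use of `double`, and the `std::vector` containers are assumptions made for the example. The running sum `S` accumulates the terms of (2) as `i` increases, and each term uses the rate before it is overwritten by (1).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): advances the forward rates L
// in place through z.size() time steps, following equations (1) and (2).
//   L      forward rates L_i, i = 0 .. N-1
//   sigma  volatilities indexed by time to maturity, sigma[i-n-1]
//   z      one standard normal sample Z^n per time step
//   delta  LIBOR interval
void advance_libor(std::vector<double>& L, const std::vector<double>& sigma,
                   const std::vector<double>& z, double delta)
{
    const std::size_t N = L.size();
    assert(sigma.size() + 1 >= N);  // sigma[i-n-1] must exist for all i > n
    const double sqrt_delta = std::sqrt(delta);

    for (std::size_t n = 0; n < z.size(); ++n) {
        double S = 0.0;  // running sum S_i^n of equation (2)
        for (std::size_t i = n + 1; i < N; ++i) {
            const double sig = sigma[i - n - 1];
            // add the j = i term of (2); L[i] still holds L_i^n here
            S += sig * delta * L[i] / (1.0 + delta * L[i]);
            // equation (1): rates with i <= n are never touched
            L[i] *= std::exp((sig * S - 0.5 * sig * sig) * delta
                             + sig * z[n] * sqrt_delta);
        }
    }
}
```

Rates with index `i <= n` are left unchanged by the loop bounds, which mirrors the statement that a rate stays fixed once its maturity is reached.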
Based on these forward LIBOR rates, the payoff $V$ of a portfolio of $N_{opt}$ swaptions with the swap rates $C_j$ and maturities $T_j$ (with $j = 0, \ldots, N_{opt} - 1$) can be computed as

$$V = \left\{ \prod_{i=0}^{N_{mat}-1} \frac{1}{1 + \delta L_i} \right\} \sum_{j=0}^{N_{opt}-1} 100 \left( 1 - B_{T_j} - C_j S_{T_j} \right)^{+}, \qquad (3)$$

where

$$S_m = \sum_{i=0}^{m-1} \delta B_i, \qquad (4)$$

$$B_m = \prod_{i=0}^{m-1} \frac{1}{1 + \delta L_{N_{mat}+i}}. \qquad (5)$$

Thus, to price a portfolio of LIBOR swaptions using a Monte-Carlo simulation, the following steps have to be taken:
i) Draw $N_{mat} \times N_{paths}$ standard normal random samples ($N_{paths}$ is the number of Monte-Carlo paths),
ii) Compute the LIBOR forward rates for each path using (1),
iii) Compute the portfolio payoff for each path using (3), and
iv) Average all paths to obtain the final portfolio value.

B. Greeks

For financial institutions, it is valuable to determine not only the value of an instrument, but also its sensitivity to changes in the parameters on which this value depends. This permits the application of hedging techniques, which compensate for the risk associated with one asset by adding other assets with reverse characteristics to the institution's portfolio [5]. These sensitivities are denoted by Greek letters and are commonly referred to as the Greeks. For the LIBOR swaption portfolio pricing detailed in this paper, we focus on the λ Greek as an example, which denotes the percentage change in derivative value per percentage change in the underlying price [8]. It is sometimes also called Ω or elasticity, and is defined as

$$\lambda = \frac{\partial V}{\partial S} \cdot \frac{S}{V}, \qquad (6)$$

where $V$ is the derivative's value and $S$ is the price of the underlying asset. Here we use an adjoint method to compute the value of λ simultaneously with the portfolio value, as detailed in [9].

C. Computational Complexity

Acceptable accuracies for a Monte-Carlo simulation are generally achieved for 100,000 paths or more. This means that, for $N_{mat} = 40$, a total of 4 million random numbers must be drawn, and the equations outlined above have to be computed for each path.
Furthermore, when estimating the Greeks the algorithm becomes significantly more complex. For example, a straightforward sequential C language implementation of the algorithm including Greeks for 15 swaptions, $N_{mat} = 40$, and 128K paths (here K denotes 1024) takes 20 seconds on a Xeon E5620 CPU. For a financial institution which has many assets to value and typically uses more paths for better accuracy, this time is not acceptable. Therefore high-performance parallel processors have to be considered for this algorithm.

III. THE XCELERIT PLATFORM

The Xcelerit platform consists of the Xcelerit SDK at its core and several add-ons providing specialised functions or interfaces. The Xcelerit SDK permits the efficient use of many-core processors, i.e., multi-core CPUs and GPUs, from a single codebase written in a high-level programming language such as C++, C, C#, or Java (the code samples in this paper use the C++ API). It is available for the Linux and Windows operating systems. Its dataflow programming model, based on the Synchronous Dataflow (SDF) model of
computation [10], allows source code to be automatically optimised and parallelised without requiring programmers to include any parallel constructs. This ensures that user code is simple and portable, making it an attractive framework for implementing financial algorithms such as the LIBOR swaption portfolio pricing explained in Sec. II. Add-ons for the Xcelerit SDK include statistical functions and random number generators, and interfaces to Excel and Matlab. A complete package for computational finance, Xcelerit Quant, combines all add-ons relevant to this domain.

A. Programming Model

To use the Xcelerit API, compute-intensive algorithms are expressed as a dataflow graph: a succession of processing stages (termed actors) which are connected together. A series of actors is continuously applied to streams of data flowing through the graph. Source actors generate the data, generic actors take data from their input ports and compute the output data to be placed into their output ports, and sink actors consume data (for example, saving it to a file). Fig. 1 shows a simple example dataflow graph, generating two streams of random numbers, computing their sum, and writing the results to memory. Generic actors can be configured using parameters and constant look-up tables.

Fig. 1. A simple Xcelerit dataflow graph, computing the sum of two random streams of numbers. (Two RandomSrc source actors feed a Sum generic actor, whose output goes to a MemoryWriter sink actor.)

To describe a program using the Xcelerit API, source actors, sink actors, and generic actors are instantiated and connected to form a dataflow graph, which is then executed in parallel on the available multi-core CPUs or GPUs. The use of this programming model enables parallel execution while ensuring that common parallel programming bugs such as race conditions or deadlocks cannot be introduced.

B. Application Development

Fig. 2 gives an overview of the components of the Xcelerit SDK.
Developing applications using the Xcelerit SDK involves the following steps:
i) Identify the performance-critical part of the application through profiling,
ii) For these bottlenecks, map the algorithm to a dataflow graph,
iii) Implement the dataflow actors needed (or choose from provided library actors), and
iv) Instantiate actors, connect to a flow graph, and execute the graph.

Fig. 2. The Xcelerit SDK Architecture.

Using the C++ API, actors are implemented as a class inheriting from a common base class. Input and output ports and optionally parameters and constant look-up tables are added to the public interface as required. A run() method is implemented which performs the core computation of the actor, i.e., it takes the data from the input ports, processes it, and places the results into the output ports. It may use the parameter values and constant look-up tables, and helper functions and other classes for the computation. These actors are then compiled by the Xcelerit Compiler Driver, which drives a set of compilation tools and existing standard compilers for CPUs and GPUs. C++ actors are compiled into a single binary file that holds both GPU and CPU code. This permits the runtime to pick the appropriate implementation depending on the available compute resources.

For composing a dataflow graph, actor objects are instantiated and connected using C++ operators. The graph can then be executed and the Xcelerit SDK runtime will automatically handle its efficient execution on the available compute resources.

This simple methodology makes the Xcelerit SDK a good candidate to exploit high performance many-core processors for algorithms such as the LIBOR swaption portfolio pricer described in Sec. II. Through the use of a dataflow programming model, it ensures high performance parallel execution while maintaining programmer productivity.
To ease development, the Xcelerit SDK comes with a set of development tools, e.g., a profiler, debugger integration, IDE plug-ins, and a library of often-used functions and actors. In addition, conventional CPU and GPU debuggers and profilers can be used.

C. Under the Hood

This section gives an insight into how the Xcelerit SDK works under the hood. It explains the optimisations employed in order to improve the performance. Several types of parallelism can be extracted for the efficient execution of Xcelerit programs. These types of parallelism are
extracted automatically from sequential Xcelerit-enabled user code by the Xcelerit Compiler Driver and Runtime System. Efficient parallel code is generated and CPU and GPU memory access optimisations are carried out.

Fig. 3. Pipeline parallelism (a), and data parallelism (b) in dataflow graphs as applied by the Xcelerit SDK. The round arrow represents a concurrent thread of execution.

Fig. 4. Comparison of a sequential reduction (a) and a parallel version (b) for adding 8 values together. The sequential version takes 7 time steps to complete, while the parallel version finishes in 3.

1) Pipeline Parallelism: Pipeline parallelism is a form of task parallelism, where every stage in a processing pipeline is executed concurrently on different sets of data. This form of parallelism, illustrated in Fig. 3(a), yields a significant gain in performance, especially when the actors are of similar computational complexity.

2) Data Parallelism: It is possible to execute multiple instances of the same actor function in parallel, each working on a different set of the input and output data. This form of parallelism, illustrated in Fig. 3(b), provides efficient load balancing between concurrent executions, as all execute the same code. Data parallelism is the primary form of parallelism supported by GPUs.

3) Vectorisation (SIMD): Single-instruction multiple-data (SIMD) parallelism works at a much finer granularity. As the name implies, it applies single processor instructions to vectors of data at once, reducing the total number of instructions needed. It is a form of data parallelism which requires hardware support, i.e., the underlying hardware must provide special SIMD instructions and wide registers to hold vector data.
Modern CPUs support SIMD in different instruction sets, e.g., SSE (16-byte data vectors), AVX (32-byte data vectors), or AltiVec (different widths). GPUs from AMD also provide SIMD instructions. An AVX multiplication operation can, for example, compute the element-wise product of vectors of 8 single-precision floating point numbers simultaneously, which results in an up to approximately 8x faster computation compared to using scalar multiply instructions. The Xcelerit API is implemented specifically to allow back-end compilers to carry out automatic SIMD vectorization. Note that the next release, currently under development, will include more direct methods to implement vectorised computations from a high-level API.

Fig. 5. Example memory and cache hierarchy of typical CPUs (a) and GPUs (b). On the left is an example of a 6-core CPU of Intel's Nehalem architecture, and on the right is a typical GPU of Nvidia's Fermi architecture. (Panel (a): per-core registers, 64KB L1 cache and 256KB L2 cache, a shared 8MB L3 cache, and main memory. Panel (b): per-core registers, shared memory, constant memory, and main memory.)

4) Parallel Reductions: Reductions, where partial results are combined to a final result, are central to many algorithms. It is therefore crucial for the compiler to generate efficient code for reductions. There are many ways to implement reductions in parallel, where partial reductions are computed concurrently and reduced further in subsequent steps. Fig. 4 illustrates a sequential and a parallel implementation of a reduction that sums up all individual items in a buffer. The parallel sum finishes after 3 steps, whereas the sequential version needs 7 steps to complete. More generally, the parallel reduction needs $\log_2 N$ steps, while the sequential version needs $N - 1$ steps. On GPUs, shared memory (between threads) and efficient thread block sizes can be used to speed up the process.
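The step counts above can be checked with a small pairwise (tree) reduction, sketched here in plain C++. This is an illustrative sketch, not the Xcelerit reduction actor: the function name `tree_sum` and the optional `steps` counter are assumptions of the example, and the inner loop runs sequentially here although all of its additions within one pass are independent and could be executed concurrently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of any SDK): sums a buffer by pairwise
// reduction. Each pass halves the number of live elements, so the number
// of passes is ceil(log2(N)) instead of the N - 1 sequential additions.
double tree_sum(std::vector<double> buf, std::size_t* steps = nullptr)
{
    std::size_t n = buf.size();
    std::size_t passes = 0;
    if (n == 0) {
        if (steps) *steps = 0;
        return 0.0;
    }
    while (n > 1) {
        const std::size_t half = (n + 1) / 2;
        // The additions of one pass touch disjoint elements, so a parallel
        // implementation could assign one thread to each of them.
        for (std::size_t i = 0; i + half < n; ++i)
            buf[i] += buf[i + half];
        n = half;
        ++passes;
    }
    if (steps) *steps = passes;
    return buf[0];
}
```

For a buffer of 8 values the loop performs 3 passes, matching the parallel case of Fig. 4.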
The Xcelerit SDK provides a set of highly-optimised built-in reduction actors and allows users to easily configure them with their own reduction functions.

5) Memory Access Optimisations: Typically, processors have a hierarchy of memory, starting from registers (the fastest and smallest), over several levels of cache (fast, larger with every level), to external memory (slowest and largest). Being able to place data used by actors in memory at a level close to the processing element can improve the performance of the computation significantly. The Xcelerit SDK exploits memory locality and performs cache optimisations where possible. That is, it ensures that the data needed is kept physically close to
the processing element, using the memory hierarchy present in today's systems efficiently. For illustration, Fig. 5 shows the architecture of a typical CPU and GPU with different levels of cache and memory. Furthermore, the Xcelerit SDK ensures that while a thread is waiting for data to arrive from memory, other threads can compute (overlapping computation with memory access). For GPUs, Xcelerit applications make efficient use of fast shared memory and registers where possible. Fast global memory access is ensured where possible (coalescing). All of these optimisations help to keep the processors involved in the computation busy rather than held back by memory access latencies. Note that these optimisations are automatic and do not require code annotations.

IV. MONTE-CARLO SIMULATIONS USING THE XCELERIT PLATFORM

Monte-Carlo simulations involve simulating a large number of random paths and computing the per-path results individually. These results are then combined using statistical analysis in order to compute the final results. Generally, the more paths used in the simulation, the more accurate the result. For pricing financial derivatives using a Monte-Carlo method, typically the following steps are followed:
i) Generate random samples,
ii) Calculate the payoff for each path, using random samples,
iii) Average the payoffs of all paths, and
iv) Save the results.

Fig. 6. Typical Xcelerit Dataflow Graph for Monte-Carlo Simulations. (RandomSrc: random samples; PathCalc: payoff calculation; Reduce: reduce paths/option; Writer: write results.)

The final result is a numerical estimate for the value of the derivative. This approach can be used to obtain the price of derivatives of different complexity. For example, several sources of uncertainty can easily be incorporated, correlations between underlyings in multi-asset options¹ can be modelled, or different probability distributions can be incorporated.
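The generic steps i) to iii) can be sketched sequentially in standard C++ as follows. This is a minimal sketch under stated assumptions rather than Xcelerit code: the function name `monte_carlo_mean` and its parameters are hypothetical, `std::mt19937_64` stands in for the generator discussed in this paper, and the payoff is supplied by the caller as a function of one path's random samples. Step iv), saving the result, is left to the caller.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical helper: sequential Monte-Carlo estimate of an expected payoff.
// num_paths must be greater than zero.
double monte_carlo_mean(
    std::size_t num_paths, std::size_t samples_per_path, unsigned seed,
    const std::function<double(const std::vector<double>&)>& payoff)
{
    std::mt19937_64 gen(seed);  // stand-in generator, chosen for the example
    std::normal_distribution<double> normal(0.0, 1.0);
    std::vector<double> z(samples_per_path);

    double sum = 0.0;
    for (std::size_t p = 0; p < num_paths; ++p) {
        for (double& x : z) x = normal(gen);  // i) random samples for this path
        sum += payoff(z);                     // ii) per-path payoff
    }
    return sum / static_cast<double>(num_paths);  // iii) average over paths
}
```

Because each path depends only on its own samples, the body of the outer loop is the part that maps naturally onto a per-path actor, and the accumulation onto a reduction.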
Furthermore, Monte-Carlo simulations can also be applied in risk management and other areas, following a similar approach. They are a powerful tool for financial analysts. An Xcelerit dataflow graph for the general steps involved in a Monte-Carlo simulation is shown in Fig. 6. It is a straightforward mapping of the algorithm explained above, and can be implemented directly.

¹ Multi-asset options are also called basket options or rainbow options.

A. Random Number Generation

It is critical for any Monte-Carlo simulation that the pseudorandom number generator used exhibits the desired statistical properties, and has a long enough period to avoid repetitions of the data. The accuracy of the final result is highly dependent on the quality of the random number generator used. The Xcelerit SDK Statistics Add-on provides a set of high-quality generators. Note that user-defined generators can also be implemented for specific requirements.

B. Per-Path Computation

This part of the algorithm is typically straightforward, as each path can be computed independently using different random numbers. Payoff functions can be of arbitrary complexity, depending on the type of derivative to be priced. Simple functions only need the final value of the underlying assets, while others might involve weighted payoffs of several underlyings. For risk computations, the per-path calculation can be of a different nature, for example computing the expected loss for each scenario. With the Xcelerit platform, this part is implemented within a custom generic actor, taking the per-path random numbers from its input port and computing the associated payoff for the output port.

C. Reduction and Writer

Averaging the individual per-path payoffs can be achieved by using the parallel reduction actors provided in the Xcelerit SDK. Usually multiple derivatives must be priced, which means a block-wise reduction is required to average all paths per option.
This outputs a stream of option values (as illustrated in Fig. 6), which are received by a Writer sink actor and, for example, stored to a file. If only a single instrument is valued, as in the LIBOR swaption portfolio case, a full reduction can be used. This is a sink actor which simply reduces all per-path values on its input and stores the final result in a variable. The additional Writer component is not required in this case. If the provided reductions are not sufficient, users can configure a generic reduction sink actor with their own reduction function, which will then be executed in parallel.

V. LIBOR SWAPTION PRICING WITH THE XCELERIT SDK

This section presents the implementation of the LIBOR swaption portfolio pricing algorithm described in Sec. II using the Xcelerit SDK. Performance results for a number of different configurations are presented and compared with a sequential reference implementation.

A. Implementation

The first step in the implementation is to map the algorithm to a dataflow graph, as shown in Fig. 7. The algorithm includes the computation of the λ Greek, hence the two sink actors: one for the swaption portfolio value and another for the associated λ. Only a single portfolio is priced, rather than multiple options, and therefore a full reduction sink can be used for both outputs (instead of a block-wise reduction and writer combination, as described in Sec. IV-C). To provide the random samples, the Xcelerit-provided source actor RandomSrc is set up with the MRG32K3a
random number generator [11] and a normal distribution. The number of samples needed, the mean and standard deviation, and the generator seed are set up in the constructor.

Fig. 7. Xcelerit Dataflow Graphs for LIBOR Swaption Portfolio Pricing (with λ Greek). (RandomSrc: normal distribution; LiborSwaptionGreek: payoff and λ calc.; two MeanReduce sinks: portfolio value and λ value.)

The LiborSwaptionGreek actor is user-defined, with an outline given below:

```cpp
class LiborSwaptionGreek : public Actor {
public:
  Input<float> z;                         // rand. sample
  Output<float> v, lb;                    // value/lambda
  Constant<float> swaprates, L0, lambda;
  Constant<int> maturities;
  Parameter<int> nopt;
  ... // setup / construct

  // core algorithm
  actor void run() const {
    float L[NN], L2[L2_SIZE];             // temporary values
    float *L_b = L;
    // copy initial LIBOR rates from constant
    copy(L0.begin(), L0.end(), L);
    // LIBOR rates calc., store reverse paths
    path_calc_b1(L, L2);
    // compute portfolio value, updating L_b
    v[0] = portfolio_b(L, L_b);
    // Reverse path calc for Greek
    path_calc_b2(L_b, L2);
    // Greek is the last entry in L_b
    lb[0] = L_b[NN-1];
  }

private:
  actor void path_calc_b1(float* L, float* L2) const;
  actor float portfolio_b(float* L, float* L_b) const;
  actor float path_calc_b2(float* L_b, float* L2) const;
};
```

As can be seen, the actor has one input for the random samples, and two outputs for the portfolio value and λ. It is initialised with the swaption data, i.e., the number of swaptions, maturities, and swap rates, and a set of initial LIBOR rates and λ values. From these, the portfolio value and λ are computed using the random input samples. The two reduction sinks are provided by the Xcelerit SDK; they simply compute the mean of all input values.

The code for the actor instantiation, construction of the dataflow graph, and graph execution is as follows:

```cpp
// instantiate actors
RandomSrc<float, RNDGEN_MRG32K3a, RNDDIST_NORMAL>
    samplegen(numpaths*nummat, SEED, 0.0f, 1.0f);
LiborSwaptionGreek libor(swaprates, maturities, nopt, lambda0, L0);
MeanReduce<float> meanvalue(&value);
MeanReduce<float> meanlambda(&lambda);

// construct dataflow graph
Flowgraph f;
f += samplegen >> libor,
     libor.v >> meanvalue,
     libor.lb >> meanlambda;

// execute graph
f.run();
```

It constructs objects of all needed actors, creates a dataflow graph by connecting ports using the >> operator, and runs the graph. This executes the application efficiently on multi-core CPUs or GPUs, depending on the available resources.

B. Results

To evaluate the performance of the Xcelerit platform implementation, the application is executed on a system with the following configuration:
CPUs: 2 Intel Xeon E5620, hyperthreading off (8 cores)
GPUs: 2 Nvidia Tesla M2050, ECC off
RAM: 24GB
OS: RedHat Enterprise Linux 5.4, 64bit
CUDA SDK version: 4.2
GPU driver version:
Xcelerit SDK version:
Compiler: NVCC with GCC 4.1 as host compiler
Compiler flags: -O3 -DNDEBUG

TABLE I
SPEEDUPS ACHIEVED FOR THE LIBOR SWAPTION PRICER USING 512K PATHS (GPUS: TESLA M2050).

Precision   1 GPU   2 GPUs
single      155x    297x
double      89x     171x

Execution times for the LIBOR swaption pricing algorithm (with Greeks) are measured for single and double precision, using path numbers between 4K and 1024K (here K is 1024), a portfolio of 15 swaptions, and $N_{mat} = 40$. The full dataflow graph execution is measured, including the random number generator and reduction as well as the CPU/GPU data transfers (managed by the Xcelerit SDK). These execution times are compared to an equivalent sequential implementation using a single CPU core. This sequential implementation uses the host API (CPU) of Nvidia's curand random number generator [12] to ensure that a generator of a quality comparable to Xcelerit's built-in random number generator is used. Fig.
8 shows the speedups achieved on GPU hardware with one and two GPUs. Using a single GPU, speedups of up to 155x can be realised for a single precision implementation for 512K paths. By using two M2050 GPUs, this speedup figure can be increased to 297x. This is an improvement by a factor of 1.9x when adding an extra GPU, which shows very good scalability, achieved without changing the source code or re-compiling. The remaining gap to ideal 2x scaling can be explained by the unavoidable overhead involved in managing multiple GPUs and splitting the data and computation between them.
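As a worked check of the scaling figure, the ratio of the two-GPU to the one-GPU speedup can be computed from the 512K-path numbers quoted above (155x and 297x in single precision, 89x and 171x in double precision as listed in Table I). The parallel efficiency $E$ used here, the scaling factor divided by the number of GPUs, is the usual textbook definition and is introduced for illustration; it is not a metric reported in the paper.

$$\frac{297}{155} \approx 1.92, \qquad E_{single} = \frac{1}{2} \cdot \frac{297}{155} \approx 0.96,$$

$$\frac{171}{89} \approx 1.92, \qquad E_{double} = \frac{1}{2} \cdot \frac{171}{89} \approx 0.96.$$

Both precisions therefore scale by roughly the same factor, with about 4% of the ideal two-GPU throughput lost to multi-GPU management.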
Fig. 8. LIBOR swaption pricing performance on the GPU for double and single precision, compared to a sequential implementation running on a single core. (Speedup on Nvidia Tesla M2050 GPU(s) versus number of paths, 4K to 1024K; curves for 1 and 2 GPUs in single and double precision.)

All GPU results for 512K paths are also summarised in Tab. I. Speedups on multi-core CPUs range between 7.4x and 7.9x for 128K paths and more (using the 8 CPU cores in the test system). The difference between single and double precision is insignificant on the CPU. A new version of the Xcelerit SDK providing a direct, simple API for vectorised computations is currently under development. Using this, preliminary results have shown speedups of 20.8x (single precision) and 13.6x (double precision) on the same system for 256K paths. This significantly higher speedup is due to a more efficient use of SIMD, and the difference between single and double precision is due to the different number of vector elements fitting into the available SIMD registers (128 bit wide).

VI. COMPARISONS WITH A LOW-LEVEL CUDA IMPLEMENTATION

In this section, we compare the performance and overhead of using the Xcelerit SDK for the LIBOR swaption portfolio pricer on GPUs with an equivalent low-level implementation using CUDA directly.

TABLE II
TIME COMPARISON XCELERIT SDK VS. CUDA FOR LIBOR SWAPTION PORTFOLIO PRICER (TIMES IN MILLISECONDS).

Version    Paths   All     DT¹    RNG²   CE³     RED⁴
CUDA       32K     42.0    <
Xcelerit   32K
Overhead   32K     0%      90%    8%     -2%     0%
CUDA       128K            <
Xcelerit   128K
Overhead   128K    1.5%    70%    11%    -1.2%   67%
CUDA       512K
Xcelerit   512K
Overhead   512K    1.4%    40%    24%    -0.8%   67%

¹ CPU/GPU data transfers and memory allocations
² Random number generation
³ Core LIBOR per-path calculation
⁴ Reductions (mean of all per-path values)

A. CUDA Implementation

The CUDA reference implementation which serves as a basis is available from Oxford University [13].
We believe this code has a reasonable optimisation level which reflects the performance of the CUDA framework. The kernels have been left untouched, but for comparison fairness the random number generator has been replaced with Nvidia's curand library [12] (as the originally-used generator is not publicly available), and the mean reduction, originally done on the CPU, has been replaced by a parallel GPU-based version using the Thrust library [14]. Further, the number of threads per block has been adjusted from the original 64 (optimised for older GPUs) to 256 to be optimal for the Tesla M2050. All other code is identical to the original implementation [13].

As the reference CUDA implementation uses single precision floats and executes on a single GPU, we compare with the equivalent variant of the Xcelerit SDK implementation. Adding multiple GPUs to the CUDA version would involve changes to the program architecture, using multiple host threads and a different approach to management of the data transfers. With the Xcelerit SDK, this is all handled automatically. Additional GPUs or multi-core CPU cores can be used without the need for source code changes or even recompilation of the application. A single binary makes it possible to fully leverage the available processing hardware.

B. Metrics

For the tests, the same machine configuration as mentioned in Sec. V is used. The following metrics are used for comparison:
- Overall application runtime
- Individual times: data transfers and memory allocations (DT), random number generation (RNG), core execution (CE), reduction (RED)
- Visual Profiler performance and efficiency metrics (detailed below)
These are compared for a range of different Monte-Carlo path numbers.

C. Results

Tab. II shows the overall application runtime as well as a breakdown of the individual tasks for a range of different Monte-Carlo path numbers.
As can be seen in Tab. II, the overall overhead introduced by the Xcelerit SDK compared to the CUDA version is at most 1.5% in all cases. Slightly more time is taken for the memory allocations and data transfers with the Xcelerit SDK due to buffering of data between actors. The random number generator also takes slightly more time than the cuRAND version used in the reference implementation, because the Xcelerit SDK version is implemented more generically.

The most relevant part for the overall application is the core LIBOR function (CE in the table), which computes the per-path portfolio values for all Monte-Carlo paths; all other parts are insignificant for the overall result. From the results in Tab. II it can be seen that, for this core function, the Xcelerit SDK is slightly faster than the original CUDA version (the CE overhead is negative).

In the following we take a closer look at the core execution function using the Nvidia Visual Profiler [15]. The following metrics reported by the profiler have been chosen for comparison:
- Reg/Thr: Number of registers used per GPU thread (for information only)
- GlbLd: Global Load Efficiency, i.e., efficiency of reading from global device memory (higher is better)
- GlbSt: Global Store Efficiency, i.e., efficiency of writing to global device memory (higher is better)
- DRAM: Utilization of the available DRAM bandwidth (higher is better)
- Branch: Branch divergence overhead (lower is better)
- Occup: Occupancy of the available processor cores (higher is better)

TABLE III
NVIDIA VISUAL PROFILER METRICS FOR XCELERIT SDK VS. CUDA FOR LIBOR SWAPTION PORTFOLIO PRICER.

           Paths   Reg/Thr  GlbLd   GlbSt   DRAM    Branch  Occup
CUDA       32K              %       100%    42.8%   12.5%   55.4%
Xcelerit   32K              %       100%    46.8%   12.2%   56.3%
CUDA       128K             %       100%    48.3%   12.4%   64.1%
Xcelerit   128K             %       100%    52.3%   12.2%   64.3%
CUDA       512K             %       100%    49.5%   12.5%   65.5%
Xcelerit   512K             %       100%    53.5%   12.1%   65.7%

The results are presented in Tab. III. As can be seen, most of the metrics show approximately the same results for the CUDA kernel and the Xcelerit SDK actor. The biggest difference is in the global load efficiency, where the Xcelerit SDK achieves approximately 92% and the CUDA kernel only 20%. This explains the difference in the core execution times reported in Tab. II. The Xcelerit SDK ensures that the memory accesses on the device are coalesced, i.e., the data and memory reads are arranged so that the threads within a warp (a group of 32 threads) are not serialised while accessing memory. This is not always possible, but in this application the benefit clearly shows.
D. Summary

As shown in this section, the overhead of the Xcelerit SDK compared to a low-level CUDA implementation is negligible for the LIBOR swaption portfolio pricing algorithm. This small overhead is the cost of developing algorithms on a much higher level (increasing productivity), with the added benefit of generating portable binaries that can run on any number of GPUs or multi-core CPUs.

VII. CONCLUSIONS

This paper has presented the acceleration of a Monte-Carlo LIBOR swaption portfolio pricer using the Xcelerit platform. It has shown that a dramatic performance increase can be achieved on GPUs (up to 305x on 2 Nvidia Tesla M2050 GPUs) while avoiding the complexity of low-level programming frameworks. The same application can also be executed on multi-core CPUs without recompilation, achieving speedups of up to 20.8x on 2 Intel Xeon E5620 CPUs. Comparison with an equivalent low-level CUDA implementation has shown that the performance overhead added by the Xcelerit platform is very small (<1.5%). Details of how the Xcelerit platform achieves this level of performance have been presented, and a general strategy for implementing financial Monte-Carlo algorithms has been outlined. Thus, it has been shown that the Xcelerit platform can be used to implement complex financial algorithms such as the LIBOR swaption pricer using straightforward high-level programming techniques, while still achieving high performance and portability across HPC processing platforms.

REFERENCES

[1] F. Black and M. Scholes, The pricing of options and corporate liabilities, The Journal of Political Economy, pp ,
[2] OpenMP Application Program Interface, OpenMP Architecture Review Board, Rev. 3.1, Jul [Online]. Available: org/mp-documents/openmp3.1.pdf
[3] (2011) OpenCL - The open standard for parallel programming of heterogeneous systems. Khronos Group. [Online]. Available:
[4] (2012) NVIDIA CUDA Toolkit. NVIDIA Corporation. [Online]. Available:
[5] J. C. Hull, Fundamentals of Futures and Option Markets, 7th ed., D. Battista, Ed. Pearson Education, Inc.,
[6] E. V. Murphy, LIBOR: Frequently asked questions, Congressional Research Service, Washington, DC, CRS Report for Congress R42608, Jul [Online]. Available: pdf
[7] M. Giles, Libor notes, [Online]. Available: ox.ac.uk/gilesm/libor/libor_notes.pdf
[8] E. G. Haug, The Complete Guide to Option Pricing Formulas. McGraw-Hill Professional,
[9] M. Giles, Monte carlo evaluation of sensitivities in computational finance, Oxford University Computing Laboratory, Oxford, UK, Tech. Rep. 07/12, Jun [Online]. Available: uk/1090/1/na pdf
[10] E. A. Lee and D. G. Messerschmitt, Synchronous data flow, Proc. IEEE, vol. 75, no. 9, pp , Sep
[11] P. L'Ecuyer, R. Simard, E. J. Chen, and W. D. Kelton, An object-oriented random-number package with many long streams and substreams, Oper. Res., vol. 50, no. 6, pp , Nov
[12] (2012) NVIDIA curand Random Number Generation library. NVIDIA Corporation. [Online]. Available: cuda/curand
[13] M. Giles. (2007) Libor monte carlo application. Oxford University Computing Laboratory. Oxford, UK. [Online]. Available: http://people.maths.ox.ac.uk/gilesm/hpc/
[14] (2012) Thrust library. NVIDIA Corporation. [Online]. Available:
[15] (2011) NVIDIA Visual Profiler. NVIDIA Corporation. [Online]. Available:
More information