Financial Risk Modeling on Low-power Accelerators: Experimental Performance Evaluation of TK1 with FPGA

Rajesh Bordawekar and Daniel Beece
IBM T. J. Watson Research Center
3/17/2015
© 2014 IBM Corporation

Outline
- Motivation
- Monte Carlo Option Pricing
  - Path Generation
  - Accumulator Forward Option
- Parallelization on TK1
- Experimental Evaluation
- Conclusions and Future Work

Motivation
- Monte Carlo simulation is used extensively in financial modeling
- Monte Carlo is a compute-bound problem
- FPGAs and GPUs are increasingly being used for accelerating financial kernels
- Low power consumption of FPGAs is a key advantage over enterprise-class GPUs (e.g., a K40)
- Lower price enables building price-competitive clusters
- Focus of this work:
  - Evaluate exploitation of the TK1 for accelerating financial Monte Carlo (specifically, pricing esoteric options)
  - Compare performance and power consumption

Pricing via Monte Carlo Simulation
- Used for pricing esoteric options: no analytic solution, typically 10% to 20% of pricing functions in a portfolio
- Low-I/O, high-compute workload: suitable for accelerators such as FPGAs and GPUs
- Focus of this work: Accumulator Forward Options

Pricing Function: Accumulator Forward Option
- Option on a stock with defined strike and barrier prices
- At fixed intervals (e.g., each month):
  - the seller is obliged to sell at the strike price
  - the buyer is obliged to buy at the strike price
- No downside limit: the buyer can lose a lot of money
- Limited upside: the contract terminates if the price exceeds the barrier
- Must use a Monte Carlo approach for pricing: no analytic solution
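The contract rules above can be sketched as a per-path cash-flow function. This is a minimal illustration under stated assumptions, not the pricing function used in the talk: the struct `AccumulatorTerms`, its fields, the fixed share quantity, checking the barrier only on observation dates, and continuous discounting over a 365-day year are all hypothetical choices made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical contract terms; field names are illustrative assumptions.
struct AccumulatorTerms {
    double strike;         // price at which shares change hands
    double barrier;        // knock-out level
    double shares;         // shares exchanged per observation date
    double rate;           // continuously compounded risk-free rate
    int    interval_days;  // days between observation dates, e.g. 30
};

// Discounted cash flow to the buyer along one simulated path.
// path[d] is the simulated stock price at the end of day d + 1.
double accumulator_cash_flow(const std::vector<double>& path,
                             const AccumulatorTerms& t) {
    assert(t.interval_days > 0);
    const std::size_t step = static_cast<std::size_t>(t.interval_days);
    double value = 0.0;
    for (std::size_t d = step - 1; d < path.size(); d += step) {
        const double s = path[d];
        if (s > t.barrier) break;  // limited upside: contract terminates
        const double years = static_cast<double>(d + 1) / 365.0;
        // Buyer pays the strike and receives stock worth s; no floor on losses.
        value += t.shares * (s - t.strike) * std::exp(-t.rate * years);
    }
    return value;
}
```

Averaging this value over all simulated paths gives the Monte Carlo price estimate. A path that stays below the strike produces an unbounded-in-principle negative value, which is the "no downside limit" property.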

Core Computation of the Accumulator Forward Options
- Stochastic paths (10^6) of stock prices for 365 days
  - Quasi-random number generation (Sobol)
  - Gaussian distribution (inverse normal)
  - Path generation (Black-Scholes)
- Compute cash flows (pricing function) for each path

Sobol Sequences
- Low-discrepancy, quasi-random numbers uniformly distributed on the interval (0, 1): an inverse-normal transformation is required to obtain Gaussian samples
- Two parameters: number of samples and number of dimensions
  - 10^6 samples (paths) in 365 dimensions (days)
- Faster convergence compared to other techniques
- Excellent implementations available with very long periods
  - Joe & Kuo (sequential), basis of the CURAND Sobol QRNG
- Easy to generate: exploits bit-vector operations, e.g., shift, xor, and masking of constants
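The inverse-normal transformation maps a uniform sample u in (0, 1) to the value x with Φ(x) = u, where Φ is the standard normal CDF. The sketch below inverts Φ by Newton iteration on `std::erfc`; this is a swapped-in technique chosen because it is easy to verify, not the method used in the talk (GPU codes typically use a rational approximation instead). The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Standard normal CDF, expressed through the complementary error function.
double normal_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Inverse standard normal CDF by Newton iteration started at x = 0.
// Requires 0 < u < 1, so a Sobol sample equal to 0 must be skipped or shifted.
double inverse_normal(double u) {
    assert(u > 0.0 && u < 1.0);
    const double inv_sqrt_2pi = 0.3989422804014327;
    double x = 0.0;
    for (int iter = 0; iter < 100; ++iter) {
        const double pdf = inv_sqrt_2pi * std::exp(-0.5 * x * x);
        const double step = (normal_cdf(x) - u) / pdf;  // Newton update
        x -= step;
        if (std::fabs(step) < 1e-13) break;
    }
    return x;
}
```

Starting from 0, the iterates approach the root from one side because Φ is concave for x > 0 and convex for x < 0, so the loop does not overshoot in exact arithmetic; the iteration cap is only a safeguard.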

Black-Scholes Stochastic Model
- The Black-Scholes model describes the evolution of a stock's price through a stochastic differential equation (SDE) that expresses the percentage change as increments of a Brownian motion:

$$\frac{dS_t}{S_t} = r\,dt + \sigma\,dW_t$$

- $S_t$: stock price at time $t$
- $r$: drift (mean rate of return)
- $\sigma$: volatility of the price
- $W_t$: Brownian motion, a normally distributed random variable (mean 0, variance $t$)

SDE Solution

$$S_t = S_0 \exp\left(\left(r - \tfrac{1}{2}\sigma^2\right)t + \sigma\sqrt{t}\,Z\right)$$

- $S_t$: stock price at time $t$
- $S_0$: initial stock price
- $Z$: standard normal random variable (mean 0, variance 1)

[Figure: simulated stock price paths. x-axis: Days (1 to 351); y-axis: Price ($60.00 to $130.00)]
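Applying the closed-form solution one day at a time yields a discrete price path: each step multiplies the previous price by exp((r - σ²/2)Δt + σ√Δt·z). The following is a minimal sketch under assumptions: the function name `generate_path`, the daily step of 1/365 year, and passing the Gaussian draws in as a vector are illustrative choices, not details taken from the talk.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One Black-Scholes price path from pre-drawn standard normal samples.
// z[k] is the Gaussian draw for day k + 1; dt is the step in years.
std::vector<double> generate_path(double s0, double r, double sigma,
                                  const std::vector<double>& z,
                                  double dt = 1.0 / 365.0) {
    const double drift = (r - 0.5 * sigma * sigma) * dt;  // (r - sigma^2/2) dt
    const double vol = sigma * std::sqrt(dt);             // sigma sqrt(dt)
    std::vector<double> path(z.size());
    double s = s0;
    for (std::size_t k = 0; k < z.size(); ++k) {
        s *= std::exp(drift + vol * z[k]);
        path[k] = s;
    }
    return path;
}
```

With 365 draws per path, one path consumes one Sobol sample in each of the 365 dimensions, which is why the number of dimensions equals the number of days.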

Execution Flow of the Monte-Carlo Computation
1. Input
2. Quasi-random number generation (uniformly distributed)
3. Inverse normal (Gaussian distributed)
4. Stochastic path generation (Black-Scholes)
5. Compute cash flows (Accumulator Forward)
6. Results

Parallelizing the Monte-Carlo Computation on GPU
- Each thread executes one or more distinct paths
- Individual cash flows are aggregated to compute the final result

[Figure: path-based parallelization. The host launches a GPU kernel with Paths = 10^6 and Dimensions = 365; threads 0 to N each compute stochastic paths, and an aggregation step produces the result]

TK1 Implementation Details
- Issues impacting the TK1 implementation
  - Weak ARM host: need to do everything on the TK1
  - TK1 has low memory bandwidth (peak 9 GB/s): minimize device memory accesses
  - TK1 has few physical cores: limit on the thread-block count
- Core computations on the TK1 (single-precision calculations)
  - Sobol QRNG generation: using the CURAND Sobol generator versus a native implementation
  - Inverse-normal calculations
  - Sum reduction to calculate the final result: uses warp functions to reduce usage of atomicAdd()
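The idea behind using warp functions for the sum reduction is presumably the standard one: the 32 threads of a warp first combine their values in a five-step tree, so that only one thread per warp issues an atomic add. The sketch below emulates that tree on the host in plain C++; it is an illustration of the access pattern only, and the name `warp_reduce_sum` is hypothetical. On the GPU, each halving step would be a single warp shuffle (`__shfl_down` on Kepler-era CUDA, `__shfl_down_sync` in current CUDA).

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;

// Host-side emulation of a shuffle-down tree reduction over one warp.
// lane[i] plays the role of the register held by thread i of the warp.
float warp_reduce_sum(std::array<float, kWarpSize> lane) {
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        // On a GPU all lanes execute this step at once via a shuffle.
        for (std::size_t i = 0; i < offset; ++i) {
            lane[i] += lane[i + offset];
        }
    }
    return lane[0];  // lane 0 holds the warp total and would do the atomic add
}
```

With this pattern, a block of 128 threads issues 4 atomic adds instead of 128.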

Implementation of Sobol Generator
- Sobol generators follow a simple recurrence [Bratley and Fox, Algorithm 659]:

$$x_{n+1} = x_n \oplus v_c$$

  where $v_c$ is called the direction number and $c$ is the index of the rightmost zero bit of $n$
- $x_n$ is computed using the Gray code representation of $n$
  - Gray code($n$) = $\ldots g_3 g_2 g_1$
  - Gray code($n$) and Gray code($n+1$) differ in one bit
  - $x_n = g_1 v_1 \oplus g_2 v_2 \oplus \cdots$
- Generating M samples in N dimensions requires N * 32 direction numbers (32 integers per dimension)
- Calculations across dimensions are completely independent
- Within a dimension, sample i can be calculated directly by solving the recurrence
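Both forms of the generator, the step-by-step recurrence and the direct Gray-code evaluation, can be sketched for the first Sobol dimension, whose direction numbers are simply v_k = 2^(31-k) (with k counted from 0 here). Other dimensions need tabulated direction numbers such as Joe and Kuo's, which are not reproduced. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Direction numbers of the first Sobol dimension: v_k = 2^(31 - k), k = 0..31.
std::uint32_t direction(unsigned k) {
    assert(k < 32);
    return 1u << (31 - k);
}

// Index c of the rightmost zero bit of n (n must not be all ones).
unsigned rightmost_zero_bit(std::uint32_t n) {
    unsigned c = 0;
    while (n & 1u) { n >>= 1; ++c; }
    return c;
}

// Recurrence form: x_{n+1} = x_n XOR v_c.
std::uint32_t sobol_next(std::uint32_t x_n, std::uint32_t n) {
    return x_n ^ direction(rightmost_zero_bit(n));
}

// Direct form: XOR the direction numbers selected by the bits of Gray code(n).
std::uint32_t sobol_direct(std::uint32_t n) {
    std::uint32_t g = n ^ (n >> 1);  // Gray code of n
    std::uint32_t x = 0;
    for (unsigned k = 0; g != 0; ++k, g >>= 1) {
        if (g & 1u) x ^= direction(k);
    }
    return x;
}

// Scale the 32-bit integer to the unit interval (multiply by 2^-32).
double to_unit_interval(std::uint32_t x) {
    return x * (1.0 / 4294967296.0);
}
```

The direct form is what makes path-based parallelization possible: a thread can compute sample i without first generating samples 0 to i-1. For the first dimension the samples begin 0, 0.5, 0.75, 0.25, 0.375.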

Parallelizing Sobol Generator on GPU
- The Sobol parallelization strategy depends on how the overall computation is parallelized
- The current strategy uses path-based parallelization
  - Each thread executes 365 iterations, one per dimension
  - At every iteration j, thread i calculates a unique sample of index map(i) in dimension j
  - At every iteration j, each thread operates on the 32 direction numbers for dimension j
  - Total data fetched from device memory = 32 * 365 * #thread-blocks
- The current CURAND interface cannot support this execution pattern
- Reading 365 x 10^6 pre-computed random numbers from the TK1's device memory is extremely inefficient

Per-thread execution of Sobol generator

    int stride = iterations;  /* Stride = #Iterations */
    int loops = ffs(stride);
    /* gid is between 0 and #iterations */
    unsigned int gid = blockid * threads_per_block + iam;
    unsigned int directions[32];
    unsigned int X = 0, mask = 0;
    /* Fetch direction vectors for dimension j (day j) */
    unsigned g = gid ^ (gid >> 1);  /* Gray code of the sample index */
    /* We want X ^= g_k * v[k], where g_k is one or zero. */
    for (unsigned int k = 0; k < loops - 1; k++) {
        mask = -(g & 1);            /* all ones if bit k of g is set, else zero */
        X ^= mask & directions[k];
        g = g >> 1;
    }
    sobolsample_i_j = (float) X * k_2powneg32;  /* i == gid */

- Modified version of the code used in the Sobol QRNG Sample
- Uses Joe and Kuo's (ACM TOMS 2003) direction numbers

Experimental Evaluation: FPGA Setup
- Altera Stratix V connected to a Power 8 host
- Implements a 1024-dimension Sobol generator
- Result aggregation computed on the Power 8 host

Experimental Results: 10^6 Paths and 365 Days
- TK1: 12.28 sec @ 3 Watts (ARM host)
  - 0.013 sec for 1K paths
- FPGA: 0.2 sec @ 9 Watts (aggregation done on the P8 host)
  - TK1 without aggregation takes 12.17 sec
- Other architectures:
  - K40: 0.053 sec @ 68 Watts (needs CPU host)
  - x86 (IB): 1 sec, 20 threads
- Cost analysis
  - A TK1 board is at least 50x cheaper than an enterprise-class multi-core CPU + accelerator system
  - GPU has smaller NRE ($) than FPGA
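One way to read the time and power figures together is energy per run, computed as time multiplied by power. The sketch below does that arithmetic for the three platforms with reported wattage. It is a rough device-only comparison under the assumption that the quoted power is constant over the run; it ignores the Power 8 host used by the FPGA and the CPU host the K40 needs. The struct and function names are hypothetical.

```cpp
// Reported run time and power draw for one pricing run (10^6 paths, 365 days).
struct Measurement {
    const char* platform;
    double seconds;
    double watts;
};

// Energy per run in joules, assuming constant power over the run.
double energy_joules(const Measurement& m) {
    return m.seconds * m.watts;
}

// How many times faster `other` is than `baseline`.
double speedup(const Measurement& baseline, const Measurement& other) {
    return baseline.seconds / other.seconds;
}

// Figures taken from the results above.
const Measurement kTk1  = {"TK1",  12.28,  3.0};
const Measurement kFpga = {"FPGA", 0.2,    9.0};
const Measurement kK40  = {"K40",  0.053, 68.0};
```

Under these assumptions the TK1 uses about 36.8 J per run, the FPGA about 1.8 J, and the K40 about 3.6 J, so the TK1's low power draw does not offset its much longer run time in this experiment.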

Experimental Results: TK1 Performance Issues
- Three expensive components
  - Sobol calculations: xor, bit shifts; coalesced accesses to fetch 32 direction numbers
  - Inverse-normal and path calculations: exp, log, FMA operations
  - Result aggregation: uses atomicAdd()
- The number of thread blocks can affect performance
  - Using 1024 blocks of 128 threads each
- Overall GPU performance is affected by the Sobol, inverse-normal, and path calculations; the cost of accessing direction vectors is insignificant

GPU versus FPGA
- The FPGA was faster than the TK1 but somewhat slower than the K40
- The FPGA consumes more power than the TK1 but less than the K40
- GPU programming is easier than FPGA programming: more flexible and less NRE compared to FPGA
- The same code runs on the TK1 and the K40

Conclusions and Future Work
- Implemented a Monte-Carlo pricing model for Accumulator Forward Options on the TK1
- TK1 performance is affected by the computational functions (Sobol, inverse-normal, pricing)
  - Need to investigate performance optimization opportunities
- Low-power GPUs could be very competitive if run on an enterprise-class host