Accelerating Financial Computation

Wayne Luk
Department of Computing, Imperial College London
HPC Finance Conference and Training Event: Computational Methods and Technologies for Finance
13 May 2013

Accelerated System Architecture
[Figure: block diagram with CPU, accelerator, memory and I/O; arrows labelled request, result and data]
- accelerators: multiple functions
- in clouds
- common types: FPGA, GPU, mixed
Source: Wayne Wolf, Overheads for Computers as Components, 2nd ed., 2008

Acceleration@imperial
- security: Elliptic Curve Encryption
  - 35 MHz XC2V6000: 1150x vs. 2.6 GHz Xeon processor
- bio-informatics: canonical labelling
  - XC4VLX60: up to 400x vs. 2.2 GHz Quad-Opteron
- combinatorial optimisation: tabu search for TSPLIB
  - 1.15 GHz C2050: 112x vs. 2.67 GHz Xeon X5650, 12 cores
- medical imaging: 3D image registration
  - 412 MHz XC5VLX330: 108x vs. 2.5 GHz Quad-Xeon
- financial: Monte Carlo credit risk modelling
  - 233 MHz XC4VSX55: 60-100x vs. 2.4 GHz Quad-Xeon

Why Accelerators?
- features
  - parallelism: many heterogeneous cores
  - customisable: operation and data, e.g. precision
- benefits: improve over CPU-based systems
  - speed, latency, size, power, energy, cost

Challenges
- maximise efficiency: best trade-offs in
  - speed
  - size
  - power and energy
- maximise productivity
  - high-level description
  - support users + experts
  - facilitate design re-use

Customisation Example
1. Monte Carlo framework
   - HJM based interest rate derivatives payoff evaluations
   - 3 levels of functional specialisations
2. Specialisation
   - Domain-Specific Language: specialise for applications
   - optimise data-width on FPGA
3. Evaluation
   - 1.36 times faster than GPU
   - 3 times more energy efficient than GPU
Joint work with Qiwei Jin, Diwei Dong, Anson Tse, Gary Chow, David Thomas, and Stephen Weston

Background
- Monte Carlo method
  - useful numerical technique
  - used for options with no closed-form solution
  - easily parallelisable
  - time-consuming to obtain an accurate result
- FPGA: natural fit for Monte Carlo simulations
  - deep pipelining
  - customisable data-width
  - low power consumption
  - efficient random number generation
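
To make the "time-consuming to obtain an accurate result" point concrete, here is a minimal Monte Carlo sketch in Python. It is not taken from the talk: the function name `mc_european_call`, the choice of a European call under geometric Brownian motion and all parameter values are illustrative assumptions. A product with a known closed-form price is used only so that the estimate can be checked. The standard error shrinks as 1/sqrt(N), so halving the error needs four times as many paths.

```python
import math
import random


def mc_european_call(s0, strike, rate, vol, maturity, n_paths, seed=0):
    """Estimate a European call price by plain Monte Carlo.

    Returns (price estimate, standard error). All names and the
    geometric Brownian motion model are illustrative assumptions.
    """
    rng = random.Random(seed)
    discount = math.exp(-rate * maturity)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                      # one normal draw per path
        s_t = s0 * math.exp(drift + diffusion * z)   # terminal asset price
        payoff = discount * max(s_t - strike, 0.0)   # discounted call payoff
        total += payoff
        total_sq += payoff * payoff
    mean = total / n_paths
    # Unbiased sample variance of the discounted payoffs.
    variance = (total_sq / n_paths - mean * mean) * n_paths / (n_paths - 1)
    return mean, math.sqrt(variance / n_paths)
```

Each path is independent of the others, which is why the method parallelises easily across pipelines or cores.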

Concerns
- FPGA
  - complexity in mapping algorithm to hardware
  - resistant to change once the design is optimised
- real-world Monte Carlo applications
  - complex control logic
  - prone to change
  - short deadline for delivery
- financial interest rate derivatives payoff evaluation
  - family of interest rate curves
  - bespoke products: different payoffs, continuously emerging
  - Monte Carlo: can be the only feasible way of valuation

Heath-Jarrow-Morton
- Heath-Jarrow-Morton (HJM) framework
  - general mathematical model
  - models the instantaneous forward interest rate curve
- mathematical description
  - f(t,T): instantaneous forward rate at time T as seen from time t
  - σ(t,T): forward volatility, column vector of size d (no. of factors)
  - W(t): d-dimensional standard random process
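
The slide lists the symbols, but the governing equation did not survive in the transcript. For reference, the standard textbook form of the HJM dynamics under the risk-neutral measure, written with the slide's symbols, is given below. Reading W(t) as a d-dimensional standard Brownian motion is an assumption about what the slide calls a "standard random process".

```latex
\begin{aligned}
df(t,T)  &= \mu(t,T)\,dt + \sigma(t,T)^{\top}\,dW(t), \\
\mu(t,T) &= \sigma(t,T)^{\top} \int_{t}^{T} \sigma(t,s)\,ds .
\end{aligned}
```

The second line is the no-arbitrage drift condition: once the volatility σ(t,T) is chosen, the drift is fully determined, which is why the algorithm only needs the initial curve and a volatility model as inputs.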

Forward Curve Dynamics
[Figure: initial forward curve f(0,T) for 0 ≤ T ≤ 8; horizontal axis T from 0 to 8]

Forward Curve Dynamics
[Figure: forward curve f(1,T) for 1 ≤ T ≤ 8 after a random displacement; horizontal axis T from 0 to 8]

Forward Curve Dynamics
[Figure: forward curve f(2,T) for 2 ≤ T ≤ 8 after a random displacement; horizontal axis T from 0 to 8]

HJM Monte Carlo: Single Path
Input: f(0, T) = initial forward curve, σ = volatility model
Output: f(t, T) = forward surface
for t = 0 to t_max do
    for T = 0 to T_max do
        Calculate Drift: obtain σ(t, T) and calculate the drift μ
        Update Forward Surface: get f(t, t+T)
        Price Derivative State 1: use f(t, t+T) to price the target derivative
    end for
    Price Derivative State 2: use result from State 1 to price the target derivative
end for

HJM Monte Carlo: Single Path (hardware mapping)
The same single-path algorithm, annotated with the modules that implement it: Interest Rate Generator, Volatility Logic and Payoff Evaluation Logic.
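
The single-path loop can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the talk's implementation: it uses one factor (d = 1), a uniform grid where maturities are stored at absolute times T_j = j * dt, an Euler step, and a left Riemann sum for the drift integral. The function name `hjm_single_path` and its signature are hypothetical, and the payoff stages are left out.

```python
import math
import random


def hjm_single_path(f0, sigma, dt, n_steps, rng=None):
    """Evolve a one-factor HJM forward curve along one path.

    f0      -- initial forward curve, f0[j] = f(0, j * dt)
    sigma   -- volatility model, called as sigma(t, T) (assumed signature)
    Returns the forward surface as a list of curves, one per time step.
    Entries for maturities that have already passed keep their last value.
    """
    rng = rng or random.Random()
    sqrt_dt = math.sqrt(dt)
    surface = [list(f0)]
    for i in range(n_steps):
        t = i * dt
        z = rng.gauss(0.0, 1.0)     # one shock per time step (one factor)
        prev = surface[-1]
        curve = list(prev)
        integral = 0.0              # running integral of sigma(t, s) ds from t to T
        for j in range(i, len(f0)):
            T = j * dt
            vol = sigma(t, T)
            drift = vol * integral  # no-arbitrage drift mu(t, T)
            if j > i:               # maturity still alive after this step
                curve[j] = prev[j] + drift * dt + vol * sqrt_dt * z
            integral += vol * dt    # extend the integral to the next maturity
        surface.append(curve)
    return surface
```

A call such as `hjm_single_path([0.03] * 9, lambda t, T: 0.01, 0.25, 4, random.Random(0))` returns five curves (the initial curve plus four steps). The inner loop over maturities has no data dependence apart from the running integral, which is one reason the computation maps well onto a deep pipeline.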

1. Multi-level Customisation
- efficiency: two phases in development
  - model development phase
  - payoff evaluator development phase
- productivity: two types of developers
  - platform experts: expertise in target platform
  - platform users: expertise in applications
- 3 levels of modular functional specialisations: Heavy, Medium, Light

Specialisations
- Heavy
  - stable modules: highly optimised, platform dependent
  - require detailed knowledge of platform, done by experts
- Medium
  - semi-stable modules: optimised, platform dependent
  - limited variations: specified by users ahead of time
  - building blocks: in payoff evaluator development phase
- Light
  - volatile modules: still under development
  - ease of use: domain-specific languages
  - may involve platform dependent configuration files

Customisation: Two Phases
Model development phase
1. Experts develop heavily specialised modules
2. Experts and users define templates for medium-specialised modules
3. Experts optimise the modules for potential target platforms
Payoff evaluator development phase
4. Users choose a medium-specialised module as a base component and a target platform
5. Users use a platform-independent domain-specific language to generate payoff evaluators

Multi-level Customisation for HJM
[Figure: HJM Payoff Evaluation Kernel, replicated as parallel kernels; parameters come from the CPU and results go back to the CPU]
- Interest Rate Generator (hand optimised), in the Interest Rate Engine: heavily specialised module, by expert
- Volatility Logic (from template): medium-specialised module, by expert
- Payoff Evaluation Logic (programmed by user): lightly specialised module, by user
- prone to change

Customise: volatility + payoff evaluation
- From Template: max re-use
- In C-based domain-specific language

Workflow: Experts + Users
[Figure: workflow diagram with steps marked as done by expert or by user]

2. Application Specialisation Flow
- domain-specific programming environment
  - to specialise the framework to a particular application
- data-width optimisation
  - to find the optimal data format
  - ensures good performance on FPGA while retaining result accuracy

Domain Specific Programming
- C style and control-based
  - provides environment parameters per iteration
  - operator latency is implicit
- platform user
  - creates input/output variables
  - creates intermediate variables
  - defines payoff evaluation logic

Present value calculator for a Zero Coupon Bond B(t_Imax, t + T_Jmax)
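
The listing for this slide is not in the transcript. As a rough idea of what such a calculator computes, the sketch below prices a zero coupon bond from one forward curve using the standard relation B(t, t+T) = exp(-∫ f(t,s) ds), with the integral taken from t to t+T. It is written in Python rather than the talk's C-based domain-specific language, and the function name, the uniform grid and the left Riemann sum are all assumptions made for illustration.

```python
import math


def zero_coupon_bond_price(forward_curve, dt, i_start, n_periods):
    """Present value of 1 paid after n_periods grid steps.

    forward_curve -- forward rates as seen today on a uniform grid of
                     spacing dt (hypothetical layout)
    i_start       -- grid index where the bond's life begins
    n_periods     -- number of grid steps until the bond matures
    """
    # Left Riemann sum of the forward rates over the bond's life.
    rates = forward_curve[i_start:i_start + n_periods]
    integral = sum(rates) * dt
    return math.exp(-integral)
```

For a flat 5% curve with `dt = 0.5` and four periods (two years), the integral is 0.1 and the price is exp(-0.1).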

Data-Width Optimisation: Errors
- results from numerical techniques contain
  - discretisation error
  - finite precision error
- discretisation error: intrinsic
- finite precision error: increases as data-width decreases

Data-Width Optimisation
- data-width reduction: improve FPGA performance
[Figure: resource consumption for HJM Bond Option kernels with different data-widths; left axis: resource use in % for LUT, FF, BRAM and DSPs; right axis: clock frequency in MHz]

Data-Width Optimisation
- problem: determine optimal data-width
  - preserve result accuracy
  - consume minimal FPGA resources
- Welch's t-test
  - assess statistical significance of finite precision error
  - compare reduced precision and full precision
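
A software emulation of this comparison might look like the sketch below. It is an illustration under assumptions, not the talk's tool flow: reduced precision is emulated by truncating the mantissa of IEEE double values, the helper names are hypothetical, and the p-value uses a normal approximation that is only reasonable when the degrees of freedom are large. An exact p-value would need the Student t distribution.

```python
import math
import statistics
import struct


def truncate_mantissa(x, bits):
    """Keep only the top `bits` of the 52 stored mantissa bits of a double."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    mask = ~((1 << (52 - bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    (y,) = struct.unpack("<d", struct.pack("<Q", raw & mask))
    return y


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_sided_p_normal(t):
    """Two-sided p-value, normal approximation (assumes large df)."""
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
```

In use, one would generate payoff samples once at full precision and once with `truncate_mantissa` applied to the intermediate values, then lower `bits` until the p-value indicates that the two sets of results differ significantly.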

Welch's t-test: Optimised Data-Width
[Figure: p-value (log scale) against number of mantissa bits, for Swaption]

3. Results
MaxWorkstation:
- Xilinx Virtex-6 SX475T FPGA
- 4-core Intel i7-870 CPU, 2.93 GHz
- 448-core NVIDIA Tesla C2070 GPU, 1.15 GHz

                  CPU     FPGA           GPU
Compiler          Intel   Max Compiler   nvcc
Native language   C++     MaxJ           CUDA

Resource Use: Optimised Data-Width
[Figure: % resource consumption for Bond Option, Swaption and CMS Spread Option; series: Wf=53 LUT, Wf=17 LUT, Wf=53 BRAM, Wf=17 BRAM, where Wf is the number of mantissa bits; bar labels range from 2% to 42%]

Speed Up
Speed-up over single-core software implementation (times):

                    4-Core CPU   FPGA   GPU
Bond Option         4            44.8   32.8
Swaption            4            42.4   30.04
CMS Spread Option   4            39.2   27.1
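
The "1.36 times faster than GPU" figure quoted earlier in the talk can be checked against the chart values with a few lines of arithmetic. Assigning the larger series to the FPGA and the smaller one to the GPU is an assumption, made because it reproduces the quoted ratio; the transcript does not label the bars.

```python
# Speed-ups over single-core software, as read from the chart.
# Which series belongs to which device is an assumption.
fpga = {"Bond Option": 44.8, "Swaption": 42.4, "CMS Spread Option": 39.2}
gpu = {"Bond Option": 32.8, "Swaption": 30.04, "CMS Spread Option": 27.1}

# FPGA speed relative to the GPU for each product.
ratios = {name: fpga[name] / gpu[name] for name in fpga}
smallest = min(ratios.values())
```

Under that assignment the smallest ratio is the Bond Option one, 44.8 / 32.8, so the quoted figure reads as a lower bound across the three products.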

Power Consumption
[Figure: power consumption in watts for different implementations (4-Core CPU, FPGA, GPU) on Bond Option, Swaption and CMS Spread Option, measured with a power measuring socket from Olson Electronics; bar labels fall in three groups: 240/238/240 W, 183/184/184 W and 87/87/85 W]

Current Work
- extend framework to support more
  - platforms, e.g. those with multiple accelerator types
  - volatility structures, payoff evaluation functions
  - financial, risk and other applications
- improve performance + energy efficiency
  - mixed precision
  - more automation
  - run-time reconfiguration

Why Reconfigurability
- growing fabrication cost
- time-share large design
- accelerate demanding applications
- potential for low power/energy consumption
- support health monitoring
- enhance reliability + fault tolerance
- speed up design cycle: incremental development

Run-time Reconfigurability
- multiple reconfigurations: interleaved or concurrent with data processing
- mixed precision computation
  - low precision: maximise parallelism
  - high precision: improve accuracy
- multi-stage computation: multiple precisions
  - high precision: fewer iterations, each takes longer
- eliminate idle functions
  - active functions in same configuration

Recent Results: MAX3 Accelerator
- finance: pricing Asian options
  - 44.6x speed, 40.7x energy efficiency of quad-core i7-870
  - 4.6x speed, 5.5x energy efficiency of C2070 GPU
- seismic imaging: reverse time migration
  - 103x speed, 145x energy efficiency of quad-core i7-870
  - 2.5x speed, 10.2x energy efficiency of GTX280 GPU
- biomedical: genetic sequence matching
  - 293x speed of Xeon X5650 with 20 threads
  - 134x speed of NVIDIA GTX 580 GPU

Current and Future Research
- functional and performance models
  - correctness + performance: generalise reconfigurability
- aspect-oriented design: software + hardware
  - multi-source e.g. OpenCL, design re-use, portability
- machine learning: smarter systems
  - adapt to application and device behaviour at run time

Summary
- accelerators: becoming mainstream
  - improving speed, latency, size, power, energy
- key challenges
  - best trade-offs in efficiency and productivity
  - compilation, verification, performance analysis
  - models, machine learning, run-time reconfigurability