HPC IN THE POST 2008 CRISIS WORLD


GTC 2016 HPC IN THE POST 2008 CRISIS WORLD Pierre SPATZ MUREX 2016

STANFORD CENTER FOR FINANCIAL AND RISK ANALYTICS HPC IN THE POST 2008 CRISIS WORLD Pierre SPATZ MUREX 2016

BACK TO 2008

FINANCIAL MARKETS: THE PICTURE BEFORE 2008
- Margins are high and regulation costs are small.
- Flexibility of the tools, invention of new exotic features and time to market count more than performance.
- Tier 1 and big Tier 2 banks have no budget issues and invest in huge grids of computers.
- Other banks act more as intermediaries that resell products and need only an informative present value.
- Code is mainly single-threaded.
- Most quants focus only on the mathematics and disregard IT problems, and we are no different.
2015 Murex S.A.S. All rights reserved

MUREX POSITION: THE PICTURE BEFORE 2008
- We are already a leader in our market.
- Tier 1 banks plug their own models into our system and like it for being fully integrated from front office to processing.
- Murex front office teams invest heavily in risk measures, scenario flexibility, complex sensitivity computation for nested calibration cases, and automatic grid management.
- The quality of our financial model library is close to that of the biggest banks.
- Our customers who want to challenge Tier 1 banks like our models but do not want to invest in a huge infrastructure.

COMPUTATION NEEDS IN FINANCE: THE PICTURE BEFORE 2008
- Pricing and front office risk management of:
  - Exotic structured products with scripted payoffs, evaluated by Monte-Carlo
  - Credit derivatives
  - American and barrier options, evaluated by partial differential equations
- 1-year historical value at risk as a nightly batch

COMPUTERS, CHIPS AND TOOLS: THE PICTURE IN 2008
- Xeon and Opteron have 4 cores, and we have no practice of parallel programming.
- Sun Microsystems does not yet belong to Oracle, and Solaris on SPARC processors is still preferred by our customers.
- Quants love Excel, and IT wants us to do everything in Java.
- The PlayStation 3 with its Cell processor is available worldwide and can be used and programmed as a workstation under Yellow Dog Linux.
- Roadrunner, featuring a double-precision-friendly Cell processor, becomes the first computer to pass the petaflop barrier.
- NVIDIA gaming GPUs are said to be programmable using something called CUDA, and the first Unix servers with Tesla cards are delivered to some universities and research centers.
- We are playing with our first iPhones, which are powered by a low-consumption ARM processor.

CELL & ARM
- They are mostly CPUs, like Intel Xeons.
- ARM processors achieve better performance per watt by implementing simpler instructions and running at a lower frequency.
- Cell processors achieve better performance for the same number of transistors by implementing wide vector functions inside simpler and slower cores, by replacing cache with cores, and by making the programmer responsible for accessing memory through explicit, high-latency instructions.
- The Cell processor was extremely complex to program and is deprecated today, but we can consider Xeon Phi as its natural descendant, featuring a cache.

CPU & GPU: THEY ARE BOTH BUNCHES OF CORES
- CPU multi-cores run at high frequency and are optimized for fast execution of single-threaded code with an unpredictable execution stack.
- GPU many-cores run at low frequency and are optimized for batch execution of the same set of instructions across the board.
- CPUs are not specialized in computation; GPUs are flops machines.
CPUs:
- CPUs can handle a huge amount of memory.
- CPU cores have fast access to memory thanks to a huge and fast L2/L3 cache.
- CPU cores have a fast L1 cache that is managed automatically.
- CPU parallelization is better implemented at the level of the task.
- CPU multithreading is software managed.
GPUs:
- GPU memory is limited but has high bandwidth.
- GPU cores access memory with a latency but hide it by doing something else.
- GPU cores have a fast local memory managed by the programmer.
- GPU parallelization is better implemented at the level of the data.
- GPU multithreading is hardware managed.

GPU & 2008 PROBLEMS

EXOTIC STRUCTURED PRODUCTS: MONTE-CARLO WITH SCRIPTED PAYOFFS ON GPU
- Monte-Carlo is embarrassingly parallel.
- Best performance with payoff scripting/DSL by path: generate and compile CUDA/OpenCL kernels. In practice you are limited by the number of registers per CUDA core and by the complexity of the payoff.
- Best flexibility with payoff scripting/DSL by date: use your preferred interpreted scripting language on the CPU and implement vector-based operations on the GPU. In practice you are limited by the memory bandwidth of the GPU.
- Choose a good random number generator that copes with a flexible implementation and lets you replay a part of the Monte-Carlo for optimization purposes. In practice D. E. Shaw Research's Philox is great.
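The replay property of a counter-based generator can be sketched in a few lines. This is only an illustration, not Philox itself and not Murex code: a keyed hash from the Python standard library (`hashlib.blake2b`) stands in for the Philox bijection, and the names `normal_draw` and `terminal_spot` as well as the toy lognormal dynamics are assumptions made for the example. The point is that each draw is addressed by (seed, path, step), so any subset of paths can be regenerated without running the others.

```python
import hashlib
import math
import struct
from statistics import NormalDist

_INV_CDF = NormalDist().inv_cdf


def normal_draw(seed: int, path: int, step: int) -> float:
    """Standard normal draw addressed by (seed, path, step).

    Stand-in for a counter-based generator: the output depends only on
    the address, never on how many draws were made before.
    """
    msg = struct.pack("<QQQ", seed, path, step)
    (bits,) = struct.unpack("<Q", hashlib.blake2b(msg, digest_size=8).digest())
    u = ((bits >> 12) + 0.5) / (1 << 52)  # uniform strictly inside (0, 1)
    return _INV_CDF(u)


def terminal_spot(seed, path, n_steps, s0=100.0, vol=0.2, dt=1.0 / 12):
    """Toy driftless lognormal path, simulated step by step."""
    log_s = math.log(s0)
    for step in range(n_steps):
        z = normal_draw(seed, path, step)
        log_s += -0.5 * vol * vol * dt + vol * math.sqrt(dt) * z
    return math.exp(log_s)


# Full run over 200 paths, then a replay of paths 50..59 only.
full_run = [terminal_spot(42, p, 12) for p in range(200)]
replayed = [terminal_spot(42, p, 12) for p in range(50, 60)]
print(replayed == full_run[50:60])
```

With a sequential generator the replay would require regenerating (or skipping ahead through) every earlier draw; with an addressed generator each GPU thread can compute its own draws independently.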

THE LATENCY PROBLEM
- GPUs are only efficient when treating big problems, and there is a real latency when launching kernels.
- In practice, reshape your code to see more problems at the same time (sensitivities, scenarios, trades, ...), but keep in mind that GPU memory is limited.

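OPTION PRICING AND CALIBRATION: SOLVED BY PARTIAL DIFFERENTIAL EQUATIONS
- LU solvers are not GPU friendly since they are sequential.
- Choose instead a divide-and-conquer algorithm like PCR (parallel cyclic reduction): N log(N) operations, but in only log(N) steps.
- Stencil computation is more about accessing inputs than doing computation. Keep as much of your data as possible in local memory.
- 1D problems are not big enough to feed a GPU, but you have many options in your portfolios.

Worked reduction on a 7-unknown tridiagonal system. Each row is scaled by the factor shown, then each group of three consecutive rows is summed to eliminate the odd unknowns:

\[
\begin{aligned}
 2a - b &= 1 && \times \tfrac{1}{2} \\
 -a + 2b - c &= 1 && \times 1 \\
 -b + 2c - d &= 1 && \times \tfrac{1}{2} \\
 -c + 2d - e &= 1 && \times 1 \\
 -d + 2e - f &= 1 && \times \tfrac{1}{2} \\
 -e + 2f - g &= 1 && \times 1 \\
 -f + 2g &= 1 && \times \tfrac{1}{2}
\end{aligned}
\]

\[
\begin{aligned}
 b - \tfrac{1}{2} d &= 2 && \times \tfrac{1}{2} \\
 -\tfrac{1}{2} b + d - \tfrac{1}{2} f &= 2 && \times 1 \\
 -\tfrac{1}{2} d + f &= 2 && \times \tfrac{1}{2}
\end{aligned}
\]

\[
\tfrac{1}{2} d = 4
\]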
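The reduction above can be sketched as a small parallel cyclic reduction solver. This is a minimal CPU illustration in plain Python, not the production kernel: the function name `pcr_solve` and the array layout are assumptions. The slide's example only follows the rows that feed the middle unknown d, whereas this sketch applies the same elimination to every row at every step, which is the usual description of PCR. The inner loop over `i` has no dependency between iterations, so on a GPU it would map to one thread per row.

```python
def pcr_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by parallel cyclic reduction (PCR).

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i],
    with lower[0] == 0 and upper[-1] == 0.
    """
    n = len(diag)
    a, b, c, d = list(lower), list(diag), list(upper), list(rhs)
    stride = 1
    while stride < n:  # about log2(n) sweeps
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        for i in range(n):  # independent rows: one GPU thread each
            lo, hi = i - stride, i + stride
            if lo >= 0:  # eliminate the coupling to x[i - stride]
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:  # eliminate the coupling to x[i + stride]
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # Every row is now decoupled: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


# The 7-unknown system of the slide: stencil (-1, 2, -1), right-hand side 1.
n = 7
solution = pcr_solve(
    [0.0] + [-1.0] * (n - 1),
    [2.0] * n,
    [-1.0] * (n - 1) + [0.0],
    [1.0] * n,
)
print(solution)  # the middle unknown d should come out close to 8
```

Each sweep touches all N rows and there are about log2(N) sweeps, which is where the "N log(N) operations in log(N) steps" trade-off comes from.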

BACK TO TODAY

FINANCIAL MARKETS: THE PICTURE TODAY
- Lower margins, higher volumes, and high regulation costs.
- We see a trend toward exotic standardization, but we still have 40-year PRDCs in our books.
- Tier 1 banks and Murex have had GPUs in production for some time and continue to invest, while other experiments such as FPGAs for Monte-Carlo have failed.
- GPUs are mainstream in supercomputers and are here to stay.
- Medium-size banks are obliged to manage their risk and run their VaR on exotic portfolios, even when trades are asset swapped and theoretically risk free.
- CVA is our day-to-day topic, and investing only in computers without rewriting the code to be efficient and parallel friendly is no longer an option.
- A good quant is also a good computer science expert.

CVA & PFE
- A Monte-Carlo with a reduced set of paths on all the trades done with a counterpart,
- where we need to retrieve all PVs for all future paths and dates for flexible future aggregation and drill-down analysis,
- and where the composition and volume of counterparty trades may be very different.
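Why all PVs per path and date are kept can be illustrated with a toy aggregation. The sketch below assumes a cube layout `pv_cube[trade][date][path]`, uses made-up numbers, and applies the common textbook conventions (expected exposure as the mean of the positive part of the netted PV, PFE as an empirical quantile); the slides do not spell out these definitions, and the function name `exposure_profile` is hypothetical.

```python
import math


def exposure_profile(pv_cube, trade_ids, quantile=0.95):
    """Return (EE, PFE) per date for the netting set made of trade_ids.

    pv_cube[trade][date][path] holds one present value per trade,
    future date and Monte-Carlo path.
    """
    first = pv_cube[trade_ids[0]]
    n_dates, n_paths = len(first), len(first[0])
    ee, pfe = [], []
    for t in range(n_dates):
        # Net the trades path by path, then floor the exposure at zero.
        exposures = sorted(
            max(sum(pv_cube[k][t][p] for k in trade_ids), 0.0)
            for p in range(n_paths)
        )
        ee.append(sum(exposures) / n_paths)
        rank = min(n_paths - 1, math.ceil(quantile * n_paths) - 1)
        pfe.append(exposures[rank])
    return ee, pfe


# Toy cube: 2 trades, 2 future dates, 4 paths (illustrative numbers only).
pv_cube = {
    "swap": [[1.0, -2.0, 3.0, 0.5], [2.0, -1.0, 4.0, -0.5]],
    "cap": [[0.5, 0.5, -1.0, 0.0], [1.0, 0.0, -5.0, 0.5]],
}

netted = exposure_profile(pv_cube, ["swap", "cap"])  # whole counterpart
swap_only = exposure_profile(pv_cube, ["swap"])      # drill down to one trade
print(netted, swap_only)
```

Because netting happens before the floor at zero, the netted profile cannot be rebuilt from per-trade exposure profiles; the raw cube has to be stored, which is what drives the result volumes discussed in these slides.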

CVA: A FLAVOR OF THE DIFFICULTY
[Figure labels: Swaps, LCH, Foreign branches, Caps, Exotics, Many other small counterparts]
- 1 TB of results is generated when computing sensitivities for a medium-size bank, and far more for a Tier 1.
- Treating all trades or all counterparts as equivalent would be a mistake when building a system.
- Recomputing everything after a failure is not an option.

HPC FOR CVA
- Group vanilla trades and evaluate them together on GPUs, independently of their counterpart, in a compute-centric cluster (aka small nodes).
- Use GPU American Monte-Carlo with nonlinear regression for exotic trades.
- Use specific boxes with enough memory for aggregation in a big-data-centric subcluster (aka big nodes).
- Use parallel fast flash file storage as an intermediate buffer and checkpoint for the calculation chain, to ensure performance and reliability.
- Use an InfiniBand (IB) network as interconnect, able to convey several GB per second.

BACK TO THE FUTURE

THE PICTURE TOMORROW
- FRTB, with up to 15K scenarios using front office models.
- MVA, which leads to computing a historical VaR inside a Monte-Carlo, in a scalable manner, for all trades done with a CCP.
- AD and AAD are back in the game but are no game changers yet.
- Ever faster and more flexible GPUs.
- Cars become self-aware.

AD AND AAD IN A NUTSHELL
- AD is the good old forward pathwise method for computing sensitivities, but done automatically by tools.
- AAD is about the same method but generates sensitivities to all inputs and intermediate values in a single additional backward sweep, at a ridiculously low compute cost.
- AAD can be implemented either with special compilers, which are only partially compatible with GPUs, or by overloading the basic C++ scalar operators used to program the Monte-Carlo, which is totally GPU friendly.
- The overloaded operators record all operations and intermediate results of the forward sweep on a tape.
- The tape is played backward on all paths in parallel, and the derivatives per path are computed using the chain rule, keeping future results constant.
- The resulting sensitivities are finally the expectation of the sensitivities computed for each path.

[Diagram: computation graph with nodes θ, x, y, z, p. Forward sweep: x, y → z → p. Backward sweep: ∂p/∂z at z, then ∂p/∂x and ∂p/∂y at x, y.]

\[
\frac{\partial p}{\partial x} = \frac{\partial p}{\partial z}\,\frac{\partial z}{\partial x},
\qquad
\frac{\partial p}{\partial y} = \frac{\partial p}{\partial z}\,\frac{\partial z}{\partial y}
\]
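The operator-overloading approach can be sketched with a tiny tape. The slides describe overloading C++ scalar operators; the version below is a Python stand-in with hypothetical names (`Tape`, `Var`, `backward`) and a made-up payoff z = x*y + x, p = z*z, chosen only so the derivatives are easy to check by hand. It records each operation with its local derivatives during the forward sweep, plays the tape backward once per path, and then averages the per-path sensitivities.

```python
class Tape:
    """Records, for every node, its parents and local partial derivatives."""

    def __init__(self):
        self.nodes = []  # nodes[i] = [(parent_index, d node_i / d parent), ...]

    def var(self, value, parents=()):
        self.nodes.append(list(parents))
        return Var(self, len(self.nodes) - 1, value)


class Var:
    def __init__(self, tape, index, value):
        self.tape, self.index, self.value = tape, index, value

    def __add__(self, other):
        return self.tape.var(
            self.value + other.value, [(self.index, 1.0), (other.index, 1.0)]
        )

    def __mul__(self, other):
        return self.tape.var(
            self.value * other.value,
            [(self.index, other.value), (other.index, self.value)],
        )


def backward(output):
    """Single backward sweep: adjoints of the output w.r.t. every node."""
    adjoint = [0.0] * len(output.tape.nodes)
    adjoint[output.index] = 1.0
    for i in range(output.index, -1, -1):
        for parent, local in output.tape.nodes[i]:
            adjoint[parent] += adjoint[i] * local  # chain rule
    return adjoint


def path_sensitivities(x0, y0):
    """Value and sensitivities of the toy payoff on one path."""
    tape = Tape()
    x, y = tape.var(x0), tape.var(y0)
    z = x * y + x
    p = z * z
    adj = backward(p)
    return p.value, adj[x.index], adj[y.index]


def expected_sensitivities(paths):
    """Average (p, dp/dx, dp/dy) over all paths."""
    results = [path_sensitivities(x0, y0) for x0, y0 in paths]
    return tuple(sum(r[k] for r in results) / len(results) for k in range(3))


print(path_sensitivities(2.0, 3.0))
print(expected_sensitivities([(2.0, 3.0), (1.0, 0.5)]))
```

The adjoint vector also contains the sensitivity to the intermediate value z, which is the "all inputs and intermediate values in one sweep" property. The tape has to keep every intermediate result alive until the backward sweep, which is why memory traffic matters so much for this method.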

AD AND AAD: PROMISING, GOOD FOR VANILLAS, BUT
- The method is simple but the implementation can be tricky. Everything should be done to have kernels generic enough to keep the GPU fed while avoiding race conditions.
- To obtain the best performance one still needs to tweak the order of operations inside the computation tree, which often makes the method incompatible with cases where we want to keep full flexibility at the level of the post-aggregation of several detailed Monte-Carlo results.
- AAD is not applicable to all complex exotics, even if the vibrato smoothing method helps.
- AAD doesn't solve the stress test and historical VaR problems.
- AAD is also said to be memory bound. Well implemented, it is only memory bandwidth bound.

PASCAL: THE MEMORY BANDWIDTH JUMP IN A SINGLE GPU

[Chart: memory bandwidth per GPU, in GB/s]
GPU          GB/s
X80 TITAN    1000
K40           288
M2090         178
C1060         102

- It is the first time since 2008 that the number of bytes per flop has increased for a single GPU during a generation change, and maybe the last.
- Our AAD code will simply be 3.5 times faster on the next generation, but most of our algorithms are at least partially limited by GPU memory bandwidth and will also show huge benefits.
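The 3.5 figure follows directly from the chart if the code is purely bandwidth bound, since run time then scales with bytes moved divided by bandwidth. A minimal check, using the bandwidth values from the chart (with the garbled "C10160" label read as C1060) and a hypothetical helper name:

```python
# Memory bandwidth per GPU in GB/s, as read from the chart.
bandwidth_gb_s = {"C1060": 102, "M2090": 178, "K40": 288, "X80 TITAN": 1000}


def bandwidth_bound_speedup(old, new):
    """Speedup of a purely bandwidth-bound kernel when moving old -> new."""
    return bandwidth_gb_s[new] / bandwidth_gb_s[old]


generations = ["C1060", "M2090", "K40", "X80 TITAN"]
for old, new in zip(generations, generations[1:]):
    print(f"{old} -> {new}: x{bandwidth_bound_speedup(old, new):.2f}")
```

The earlier generation changes give roughly 1.6x to 1.7x on this metric, which is why the last step stands out.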

SIERRA SUPERCOMPUTER 2017-2018: A FULL-FLEDGED CVA RISK SYSTEM IN A NODE
- The revival of the big nodes
- The flops of 8 K40s
- A lot of CPU cores and memory to prepare inputs, convert outputs, interpret scripts, aggregate, query, ...
- Enough GPU/CPU interconnect speed to retrieve CVA or MVA profiles unnoticed
- NVRAM to replace external flash array storage
- Enough network bandwidth to have the flexibility of keeping results locally or remotely
- Bilateral MVA with SIMM at the same cost
- CCP MVA with full revaluation using only a few nodes

THANK YOU! PARIS NEW YORK SINGAPORE linkedin.com/company/murex twitter.com/murex_group www.murex.com info@murex.com