CUDA Implementation of the Lattice Boltzmann Method

CSE 633 Parallel Algorithms
Andrew Leach, University at Buffalo
2 Dec 2010
A. Leach (University at Buffalo) CUDA LBM Nov 2010 1 / 16

Motivation
The Lattice Boltzmann Method (LBM) reproduces the flow described by the Navier-Stokes equations accurately and efficiently. Every lattice point is updated by the same simple local rule, and this uniformity makes the method easy to parallelize. The high volume of simple calculations makes it well suited to GPGPU computing.

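LBM Degrees of Freedom
[Figure: a lattice point with its directional components F1 to F8]
Each lattice point has an associated mass density. This mass density is distributed over 9 directions (the two-dimensional D2Q9 lattice): one component at rest and eight pointing towards the neighbouring lattice points, labelled F1 to F8 in the figure.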
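A minimal C sketch of the nine degrees of freedom, assuming the standard D2Q9 velocity set and weights. The direction ordering (rest, E, N, W, S, then the four diagonals) and the names EX, EY, W, site_density and site_velocity are illustrative assumptions; the slides number their directions differently. The density is the sum of the nine components, and the velocity is the momentum divided by that density.

```c
/* Assumed D2Q9 ordering: 0 = rest, 1-4 = E, N, W, S, 5-8 = NE, NW, SW, SE. */
static const int EX[9] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
static const int EY[9] = {0, 0, 1, 0, -1, 1, 1, -1, -1};

/* Standard D2Q9 weights: 4/9 at rest, 1/9 on the axes, 1/36 on the diagonals. */
static const double W[9] = {4.0 / 9.0,
                            1.0 / 9.0, 1.0 / 9.0, 1.0 / 9.0, 1.0 / 9.0,
                            1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0};

/* Mass density at a lattice point: the sum of its nine components. */
double site_density(const double f[9]) {
    double rho = 0.0;
    for (int i = 0; i < 9; ++i) rho += f[i];
    return rho;
}

/* Macroscopic velocity: momentum (components times lattice velocities)
   divided by the density. */
void site_velocity(const double f[9], double *ux, double *uy) {
    double rho = site_density(f), mx = 0.0, my = 0.0;
    for (int i = 0; i < 9; ++i) {
        mx += f[i] * EX[i];
        my += f[i] * EY[i];
    }
    *ux = mx / rho;
    *uy = my / rho;
}
```

Setting each component equal to its weight gives a fluid at rest with unit density.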

LBM Stream
[Figure: components F1 to F8 moving to the neighbouring lattice points]
At each time step, every lattice point passes each directional mass density to its neighbour in that direction.
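The stream step can be sketched in C as a "pull": each point gathers component i from the neighbour it came from. This is a CPU sketch under stated assumptions, not the talk's kernel: it assumes a D2Q9 ordering (rest, E, N, W, S, NE, NW, SW, SE), a structure-of-arrays layout indexed by the hypothetical helper idx(), and periodic wrap-around at the edges to keep it short.

```c
#include <stddef.h>

/* Assumed D2Q9 ordering: 0 = rest, 1-4 = E, N, W, S, 5-8 = NE, NW, SW, SE. */
static const int EX[9] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
static const int EY[9] = {0, 0, 1, 0, -1, 1, 1, -1, -1};

/* Structure-of-arrays index: all points for direction 0, then direction 1, ... */
static size_t idx(int i, int x, int y, int nx, int ny) {
    return ((size_t)i * ny + y) * nx + x;
}

/* Pull-style streaming: point (x, y) receives component i from
   (x - EX[i], y - EY[i]). Edges wrap around (periodic, an assumption). */
void stream(const double *f, double *f_new, int nx, int ny) {
    for (int i = 0; i < 9; ++i)
        for (int y = 0; y < ny; ++y)
            for (int x = 0; x < nx; ++x) {
                int xs = (x - EX[i] + nx) % nx;
                int ys = (y - EY[i] + ny) % ny;
                f_new[idx(i, x, y, nx, ny)] = f[idx(i, xs, ys, nx, ny)];
            }
}
```

Writing into a separate output array avoids overwriting values that other points still need to read, which is also why GPU implementations typically keep two copies of the lattice.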

LBM Collision
[Figure: a lattice point holding the components F1 to F8 it has received]
Collision is computed from the mass densities a point has just received. The equilibrium condition is solved, and new directional mass densities are assigned.
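The slides do not name the collision operator; a common choice is the single-relaxation-time (BGK) model, sketched below in C as an assumption. The equilibrium is the standard D2Q9 form in lattice units, and tau (a hypothetical parameter name) is the relaxation time, which must exceed 0.5 for stability. The three steps match the slide: compute density and velocity from the received components, solve for the equilibrium, and relax each component towards it.

```c
/* Assumed D2Q9 ordering: 0 = rest, 1-4 = E, N, W, S, 5-8 = NE, NW, SW, SE. */
static const int EX[9] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
static const int EY[9] = {0, 0, 1, 0, -1, 1, 1, -1, -1};
static const double W[9] = {4.0 / 9.0,
                            1.0 / 9.0, 1.0 / 9.0, 1.0 / 9.0, 1.0 / 9.0,
                            1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0};

/* Standard D2Q9 equilibrium for direction i (lattice units, c_s^2 = 1/3). */
double f_eq(int i, double rho, double ux, double uy) {
    double eu = EX[i] * ux + EY[i] * uy;
    double uu = ux * ux + uy * uy;
    return W[i] * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * uu);
}

/* BGK collision at one point (an assumed operator, not confirmed by the talk). */
void collide_site(double f[9], double tau) {
    double rho = 0.0, mx = 0.0, my = 0.0;
    for (int i = 0; i < 9; ++i) {          /* moments of the received components */
        rho += f[i];
        mx += f[i] * EX[i];
        my += f[i] * EY[i];
    }
    double ux = mx / rho, uy = my / rho;
    for (int i = 0; i < 9; ++i)            /* relax towards equilibrium */
        f[i] -= (f[i] - f_eq(i, rho, ux, uy)) / tau;
}
```

Because the equilibrium has the same density and momentum as the incoming components, the collision conserves mass and momentum at every point.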

LBM Boundary Conditions
Bounce-back is implemented at solid boundaries. The inlet has a predetermined mass density. The outlet lets flow leave the domain.
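The three boundary treatments can be sketched in C as follows. These are common variants chosen for illustration, not necessarily the ones in the talk's code: full-way bounce-back (swap each component with its opposite at a solid point), an inlet column overwritten with a predetermined set of components, and a zero-gradient outlet that copies the second-to-last column. The array-of-structures layout f[(y * nx + x) * 9 + i], the D2Q9 ordering and all function names are assumptions.

```c
#include <stddef.h>

/* Assumed D2Q9 ordering (rest, E, N, W, S, NE, NW, SW, SE);
   OPP[i] is the direction opposite to i. */
static const int OPP[9] = {0, 3, 4, 1, 2, 7, 8, 5, 6};

/* Full-way bounce-back at a solid point: each incoming component is sent
   back the way it came by swapping it with its opposite. */
void bounce_back(double f[9]) {
    for (int i = 1; i < 9; ++i) {
        if (i < OPP[i]) {                  /* swap each pair only once */
            double t = f[i];
            f[i] = f[OPP[i]];
            f[OPP[i]] = t;
        }
    }
}

/* Inlet at column x = 0: every point gets the predetermined components
   f_in (for example, an equilibrium for the chosen inlet density). */
void inlet_fix(double *f, int nx, int ny, const double f_in[9]) {
    for (int y = 0; y < ny; ++y)
        for (int i = 0; i < 9; ++i)
            f[(size_t)y * nx * 9 + i] = f_in[i];
}

/* Outlet at column x = nx - 1: copy column nx - 2 so that flow can leave
   freely (zero-gradient outlet). */
void outlet_copy(double *f, int nx, int ny) {
    for (int y = 0; y < ny; ++y)
        for (int i = 0; i < 9; ++i)
            f[((size_t)y * nx + nx - 1) * 9 + i] = f[((size_t)y * nx + nx - 2) * 9 + i];
}
```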

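Code: Data Structures
[Figure: array F1 copied from HOST to Device]
Data is initialized as an array on the host. The pitch stores the width of one row in device memory, in bytes and including any alignment padding; it is returned by cudaMallocPitch(). Memory for the data read as textures is allocated on the device with cudaMallocArray(), which creates a CUDA array (an opaque, texture-oriented layout rather than plain linear memory). The array is copied from host to device with cudaMemcpy2D().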
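The pitch arithmetic can be shown without a GPU. The C sketch below mimics pitched allocation and a 2D copy on the host; alloc_pitched, pitched_elem and copy2d are hypothetical stand-ins for cudaMallocPitch() and cudaMemcpy2D(), and the fixed alignment passed in is an assumption, since the real pitch is chosen by the driver. The key point is that rows start pitch bytes apart, so the row offset is computed in bytes, not in elements.

```c
#include <stdlib.h>
#include <string.h>

/* Round a row width up to a multiple of `align` bytes (assumed alignment). */
size_t padded_pitch(size_t width_bytes, size_t align) {
    return (width_bytes + align - 1) / align * align;
}

/* Host stand-in for cudaMallocPitch: allocate `height` padded rows and
   report the pitch through *pitch. */
void *alloc_pitched(size_t width_bytes, size_t height, size_t align, size_t *pitch) {
    *pitch = padded_pitch(width_bytes, align);
    return malloc(*pitch * height);
}

/* Element (x, y) of a pitched float array: byte offset y * pitch for the
   row, then x elements within the row. */
float *pitched_elem(void *base, size_t pitch, int x, int y) {
    return (float *)((char *)base + (size_t)y * pitch) + x;
}

/* Host stand-in for cudaMemcpy2D: copy `height` rows of `width_bytes`
   between buffers whose rows are `spitch` and `dpitch` bytes apart. */
void copy2d(void *dst, size_t dpitch, const void *src, size_t spitch,
            size_t width_bytes, size_t height) {
    for (size_t r = 0; r < height; ++r)
        memcpy((char *)dst + r * dpitch, (const char *)src + r * spitch, width_bytes);
}
```

With a tightly packed host array, the source pitch is simply the row width in bytes, while the destination pitch is whatever the allocator reported.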

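Code: Textures
The stream step requires a lot of data retrieval. Texture memory offers fast, cached retrieval but has limited space. cudaBindTextureToArray() binds a texture reference to the CUDA array that holds the data, so the kernel can read it through the texture cache; binding does not copy the data. (Texture references are a legacy API that was removed in CUDA 12.)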

Code: Kernels
A kernel is launched on a grid of blocks. Each block consists of threads that all run the same kernel independently on different data (NVIDIA calls this model SIMT, a close relative of SIMD). What follows is the kernel for the stream() method. This example uses a lock-step texture look-up.

Code: Stream
[Code listing: the stream() kernel]
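The grid-of-blocks launch described on the Kernels slide can be illustrated with a serial C emulation. On the GPU the threads run concurrently; this loop only shows the index arithmetic each thread performs, blockIdx.x * blockDim.x + threadIdx.x, and the guard for threads past the last lattice point. A 1D mapping of points to threads and the names blocks_for, emulate_launch and mark_site are assumptions; the talk does not say how the lattice was mapped.

```c
/* Blocks needed so that blocks * block_size covers n points (ceiling division). */
int blocks_for(int n, int block_size) {
    return (n + block_size - 1) / block_size;
}

/* Work done by one "thread" for one lattice point. */
typedef void (*site_fn)(int site, void *ctx);

/* Serial emulation of a 1D launch: block plays blockIdx.x, thread plays
   threadIdx.x and block_size plays blockDim.x. */
void emulate_launch(int n, int block_size, site_fn body, void *ctx) {
    int grid = blocks_for(n, block_size);
    for (int block = 0; block < grid; ++block)
        for (int thread = 0; thread < block_size; ++thread) {
            int site = block * block_size + thread;   /* global thread index */
            if (site < n) body(site, ctx);            /* guard for the last block */
        }
}

/* Example body: count how often each point is visited. */
void mark_site(int site, void *ctx) {
    ((int *)ctx)[site] += 1;
}
```

Changing block_size changes the number of blocks but not which points are covered, which is why block size can be tuned for performance (as the run-time analysis explores) without affecting correctness.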

Runtime Analysis
The following slides contain graphs comparing the run times of the LBM on a laptop with a 1.3 GHz processor running sequential C code and on a single Tesla GPU running parallel CUDA code. They also explore how performance changes with block size.

Sequential vs Parallel
[Figure: run time, sequential vs parallel]

Sequential vs Parallel
[Figure: comparison of run time (s)]

Sequential vs Parallel
[Figure: speed-up (×)]
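The speed-up axis is presumably the usual ratio of run times, where T_seq is the run time of the sequential C code on the laptop and T_par that of the CUDA code on the Tesla GPU for the same problem:

$$S = \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}}$$

A value of S = 100, for example, means the GPU run finished 100 times faster than the sequential run.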

Thank You
Thanks to Dr. Graham Pullan of Cambridge University for letting me use and modify his code.

Bibliography
Alexander Wagner. A Practical Introduction to the Lattice Boltzmann Method. North Dakota State University, March 2008.
Graham Pullan. A 2D Lattice Boltzmann Flow Solver Demo. University of Cambridge. http://www.many-core.group.cam.ac.uk/projects/lbdemo.shtml